A Heterogeneous Microkernel OS for Rack-Scale Systems

Matthias Hille (Technische Universität Dresden), Nils Asmussen (Barkhausen Institut), Hermann Härtig (Technische Universität Dresden), Pramod Bhatotia (The University of Edinburgh)

ABSTRACT
Datacenters are adopting heterogeneous hardware in the form of different CPU ISAs and accelerators. Advances in low-latency and high-bandwidth interconnects enable hardware vendors to tighten the coupling of multiple CPU servers and accelerators. The closer connection of components facilitates bigger machines, which pose a new challenge to operating systems. We advocate building an OS for large heterogeneous systems by combining multiple OS design principles to leverage the benefits of each design. Because a security-oriented design, enabled by simplicity and clear encapsulation, is vital in datacenters, we choose to survey various design principles found in microkernel-based systems. We explain that heterogeneous hardware employs different mechanisms to enforce access rights, for example for memory accesses or communication channels. We outline a way to combine the enforcement mechanisms of CPUs and accelerators in one system. A consequence of this is a heterogeneous access-rights management, which is implemented as a heterogeneous capability system in a microkernel-based OS.

1 INTRODUCTION
Heterogeneous hardware is finding its way into datacenters. Cloud vendors offer x86 and ARM general-purpose cores, GPUs, TPUs and FPGAs [20, 25, 26, 28]. Today these components are hosted in discrete servers which are bundled in a rack [16]. A server consists of multiple CPUs, memory and extension cards, including accelerator cards [38]. This design offers little flexibility regarding the ratio between general-purpose and specialized compute power, meaning CPUs and accelerators, respectively. One could argue that there is flexibility within one server, namely whether or not accelerator cards are added and, if so, which kind of accelerators. However, this is not the granularity a datacenter operator thinks in. Companies which run large datacenters, like Facebook, are starting to build racks out of chassis which consist of different server types like compute blades and accelerators [4]. This shows that in a datacenter the flexibility should rather be at the level of a whole rack, to fit the demands of individual workloads and to ease component composition. From this demand for flexibility we derive an architecture that brings flexibility into a rack by means of different board types which make up the rack.

In a rack-scale design as depicted in Figure 1, a machine handled by the OS comprises the whole rack. Due to the number of components and the size of the rack, the overhead of cache coherence across the whole rack is prohibitive. Hence, we do not expect a rack to resemble a cache-coherent machine but rather a conglomeration of cache-coherent islands [37]. This leaves the OS with a heterogeneous system containing various architectures and communication schemes. On top of that, the enforcement of access rights, one of the main tasks of an OS, works differently on diverse architectures.

Various operating systems have been developed and optimized for a particular system type. One such system type are cache-coherent systems, for which highly optimized operating systems like Linux, Windows, XNU and Fiasco.OC [34] have been developed. On the contrary, there are systems like Popcorn Linux [6] and M3 [5] designed to work without cache coherence. The design of the Barrelfish multikernel [7] aims at a shared-nothing architecture, while its implementation still uses cache-coherence protocols to implement messaging. So far the requirements for different system types followed from the integration of CPUs and memory, but there are also accelerators, which have been considered in OS designs like OmniX [45] and M3. They strive to elevate accelerators to first-class citizens in the OS design, giving them direct access to OS services, as other systems also tried for selected types of accelerators [46, 47]. Finally, an OS design like LegoOS [43], considering disaggregated hardware resources, splits the OS into various components for each resource type.

In our opinion each of these OS designs solves a specific requirement, but in a flexible rack-scale datacenter design as we envision it, multiple requirements exist in parallel, like architectural heterogeneity and the heterogeneity of communication mechanisms. Hence, we propose to pick features of the individual OS designs and combine them into one heterogeneous rack-scale OS.

Besides the goals to improve flexibility and system performance and to integrate accelerators into the system, security is essential for datacenters. A cloud datacenter is constantly exposed via the network, and security breaches due to erroneous software raise serious threats to a cloud vendor's business. Microkernel-based OS designs suit the increased security requirements due to their small trusted computing base (TCB), clear encapsulation and least-privilege access policy. The small code base of the microkernel makes it possible to verify the kernel [33]. Encapsulation and a least-privilege access policy enable failure containment, limiting the severity of many errors compared to the same errors in a monolithic OS [9]. The least-privilege access policy is typically implemented by means of capabilities [34].

We propose to merge different microkernel-based OS designs into one OS to get the maximum out of both CPUs and accelerators. When combining different OS designs, diverse OS abstractions and therefore various capability types and hardware features are fused in one system. This raises the question how to design hardware that couples different hardware architectures in a system so that it is modular, secure, and provides isolation which is configurable by software. The combination of different isolation mechanisms requires the OS to control them and mediate resources of different architectures.

In this work we present a microkernel-based OS architecture for a new hardware paradigm in large datacenters. We cover the hardware landscape and the resulting aspects for the OS, with an emphasis on access-rights management. We give an outline how to combine different isolation and communication mechanisms in an OS and the implications for the capability system.

Figure 1: Architectural scheme for racks in a datacenter: A rack is equipped with two different board types: CPU boards and accelerator boards. Each board is connected to an intra-rack interconnect, represented by the blue boxes on the left. The unconstrained choice between board types provides flexibility in this design.

2 DATACENTER ARCHITECTURE
A modular and extensible hardware architecture is crucial for datacenters to facilitate fast exchange of components or quick launches of new hardware. In the following we present a hardware architecture of a rack derived from recent efforts of hardware vendors like Dell and datacenter providers including Facebook, Microsoft and Amazon [16, 23, 36, 48]. We further explain our view on how a suitable OS for such a system can be built.

2.1 Hardware Architecture
We base our architecture on the trend that resources are becoming more numerous and accelerators gain importance [3, 13, 24, 28, 55], and that interconnects advance and enable a tighter coupling of servers [23, 48]. Figure 2 depicts a snippet of a rack. We choose a machine to span a whole rack; however, the principles of our approach are also applicable to machines of smaller extent, for example a chassis comprising several boards.

Figure 2: Excerpt of the rack architecture with two board types: CPU boards and accelerator boards. CPUs have cache-coherent access to their board-local DRAM. Accelerators are connected via DTUs. The interconnects hooking up CPUs and accelerators to their board DTU (on the left) are left out for clarity.

All boards of a rack are interconnected with a fast fabric which enables close collaboration between them. The interconnect provides two main features: 1) configurable communication channels for message passing and 2) configurable access to memory ranges. This can be implemented by an interconnect like Gen-Z [15] or CXL [1], as has been showcased by Dell EMC recently [32]. Similar to these interconnects, M3 [5] introduced a component called data transfer unit (DTU) providing a configurable high-bandwidth interconnect. The interconnect between the boards is configured by a central privileged entity, labeled rack kernel in Figure 2. A DTU has configurable endpoints for this purpose, which can be programmed to be send, receive or memory endpoints. Gen-Z offers mechanisms like Access Keys to ensure isolation between components and Write MSG packets for messaging [15]. At the rack level the OS employs explicit messaging to manage the boards.
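To make the endpoint abstraction concrete, the following sketch models DTU endpoints as we understand them from M3 [5]. The types, field names and granularities are illustrative assumptions for this paper, not the actual M3 hardware interface.

```rust
// Hypothetical model of DTU endpoints; fields are illustrative assumptions,
// not the actual M3 hardware interface.

/// A DTU exposes a fixed set of endpoints, each configured by a privileged
/// kernel to one of three roles.
#[derive(Debug, Clone)]
enum Endpoint {
    /// Allows sending messages to one specific receive endpoint.
    Send { dest_board: u32, dest_ep: u16, max_msg_bytes: usize },
    /// Buffers incoming messages in a board-local ring buffer.
    Receive { buf_addr: u64, buf_slots: usize },
    /// Grants direct access to a (possibly byte-granular) memory range.
    Memory { base: u64, len: u64, writable: bool },
}

/// A DTU is essentially an array of endpoints; only a kernel may write it.
struct Dtu {
    endpoints: Vec<Option<Endpoint>>,
}

impl Dtu {
    fn new(num_eps: usize) -> Self {
        Dtu { endpoints: vec![None; num_eps] }
    }

    /// Privileged configuration: installs a role into an endpoint slot.
    /// Once installed, the hardware enforces the access rights without
    /// further kernel involvement.
    fn configure(&mut self, ep: usize, role: Endpoint) {
        self.endpoints[ep] = Some(role);
    }
}

fn main() {
    let mut dtu = Dtu::new(8);
    // The rack kernel wires endpoint 0 to a receive endpoint on board 3 and
    // grants read-only access to a remote memory range via endpoint 1.
    dtu.configure(0, Endpoint::Send { dest_board: 3, dest_ep: 2, max_msg_bytes: 512 });
    dtu.configure(1, Endpoint::Memory { base: 0x8000_0000, len: 4096, writable: false });
    println!("{:?}", dtu.endpoints[0]);
}
```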

The part of the OS running on a board sends requests to the rack kernel to set up communication channels and access to memory ranges between boards (a sketch of this request protocol follows at the end of this subsection). This might seem overprotective if the whole machine is within one trust sphere, e.g., if one customer rents the whole rack. However, it enables the implementation of the least-privilege access policy across the whole rack. The overhead for this should be minimal, since the communication channels and memory ranges are usable without interaction with the rack kernel, because hardware enforces the access rights once they are set up. Furthermore, this design enables partitioning of the rack, so that parts of the rack can be used by mutually distrusting parties. Similar to this principle, but within the scope of one machine (i.e., one board), Amazon has introduced the hardware-supported hypervisor AWS Nitro, which is backed by a security chip dedicated to managing virtualization tasks [36]. Furthermore, Zhang et al. [53] have shown that a hardware component which controls the data flow and communication can be used to guarantee information containment.

A rack in our system is made of two basic types of boards: CPU boards featuring powerful cache-coherent general-purpose cores and accelerator boards comprising either GPUs, TPUs, FPGAs or other kinds of accelerators. Both board types are equipped with memory which is also accessible from other boards (via the board DTU). The system could also be extended with a memory or storage board type to integrate NVM and disks into the system. However, in this work we concentrate on compute resources. CPU boards are very similar to today's server systems, except that they are connected to other boards via the intra-rack interconnect and they do not incorporate accelerators as extension cards.

Our architecture is similar to the Zion accelerator platform recently introduced by Facebook [31]. In the Zion platform dual-socket compute blades and accelerator modules coexist. A proprietary PCIe-based interconnect is used between the compute boards and the accelerator modules. While this interconnect is designed to provide low latency and high bandwidth, it is missing the configurability that an interconnect with a configurable component like the DTU provides.

In addition to the interconnect within a rack, each board is equipped with a network interface card. Accelerator boards internally employ the same type of interconnect as is used between the boards of a rack. An accelerator board possesses a management CPU running the OS, which configures the interconnect for the accelerators. To enable direct access to accelerators or to stream data to the network directly, accelerator boards also possess a network interface which is driven by the management CPU.
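The board-to-rack-kernel interaction described above is essentially a small request protocol. The sketch below shows what such requests could look like; the message types and their fields are our assumptions for illustration, not a specified interface.

```rust
// Hypothetical request protocol between a board kernel and the rack kernel;
// message types and fields are illustrative assumptions.

/// Identifies a DTU endpoint on a specific board.
#[derive(Debug, Clone, Copy)]
struct EpRef { board: u32, endpoint: u16 }

/// Requests a board kernel may send to the rack kernel. The rack kernel
/// checks the caller's capabilities before configuring the interconnect.
#[derive(Debug)]
enum RackRequest {
    /// Wire a send endpoint on `from` to a receive endpoint on `to`.
    OpenChannel { from: EpRef, to: EpRef, max_msg_bytes: usize },
    /// Grant `to` access to a memory range that board `owner` exports.
    ShareMemory { owner: u32, to: EpRef, base: u64, len: u64, writable: bool },
    /// Tear down a previously established channel or memory grant,
    /// resetting the hardware state (i.e., the endpoint registers).
    Revoke { at: EpRef },
}

#[derive(Debug)]
enum RackReply { Ok, Denied }

/// Stand-in for the rack kernel's request handler: after a successful
/// capability check, the hardware enforces the rights without further
/// involvement of the rack kernel.
fn handle(req: &RackRequest) -> RackReply {
    match req {
        RackRequest::OpenChannel { .. } => RackReply::Ok, // capability check elided
        RackRequest::ShareMemory { .. } => RackReply::Ok,
        RackRequest::Revoke { .. } => RackReply::Ok,
    }
}

fn main() {
    let req = RackRequest::ShareMemory {
        owner: 1,
        to: EpRef { board: 4, endpoint: 1 },
        base: 0x8000_0000,
        len: 1 << 20,
        writable: false,
    };
    let reply = handle(&req);
    println!("{:?} -> {:?}", req, reply);
}
```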
2.2 OS Cherry Picking
Each OS design targets different hardware architectures or design goals, like security by simplicity, a unified hardware abstraction, or legacy compatibility [5, 14, 22]. We want to cherry-pick suitable hardware and software features and utilize them to manage the parts of the system for which their design fits best.

Size Does Matter. Today's operating systems deployed in datacenters are monolithic. They save mode and context switches by putting many OS services into the kernel, in contrast to microkernels [35]. However, such a large TCB is hard to maintain and can lead to security issues. Therefore we believe that microkernels are a better fit for datacenters, which are constantly exposed via the network and should be secure at all times to protect customer data and business reputation. The small TCB and the clear encapsulation of a microkernel-based OS make it easier to review the system's code base, and it becomes affordable to verify important parts of the system like the microkernel itself [33]. Microkernels employ capabilities to implement the least-privilege principle, which helps to prevent privilege escalation due to errors in OS services (failure containment). The argument that most microkernel-based OSes are missing compatibility (e.g., drivers) does not hold in a cloud datacenter, because it is a closed hardware ecosystem with a limited variety of hardware components for which drivers can be maintained with reasonable effort.

Specialization. Among microkernel-based systems there is none which has all the advantages one would wish for in a datacenter with large racks consisting of heterogeneous cores and accelerators. There are L4 microkernels like Fiasco.OC and seL4 [33, 34] which are well-suited for powerful cache-coherent general-purpose CPUs. A multikernel like Barrelfish [7] uses explicit message passing, which makes the design a good fit to manage different cache-coherent islands [8] inside a rack, but it is a radical choice to completely abandon the cache-coherence mechanisms which are there anyway.

Hardware Support. Current COTS systems employ memory management units (MMUs) and IOMMUs to support hardware-enforced memory isolation. While this design is battle-tested, it requires CPUs to implement privileged-mode execution and IOMMUs to be configured by the host CPU. Systems using ISA capabilities like CHERI [52] offer fast delegation of access rights, which is a good fit for large systems; however, this comes at the price that revocation is either not possible or limited to revoking all capabilities to a resource (in CHERI this is memory), as suggested by Achermann et al. [2]. Both hardware mechanisms, MMUs and ISA capabilities, only support memory isolation but not hardware-supported message passing. The M3 system isolates CPUs and accelerators with its DTU, which provides both memory isolation via memory ranges and message passing via configurable communication channels between components. These features enable the execution of untrusted code on accelerators, fast IPC between any accelerator and/or CPU, and the provision of OS services to accelerators [5].

3 OS DESIGN CHALLENGES
The ever-growing number of compute resources (CPU cores and accelerators) is a phenomenon OS developers have been exposed to for more than a decade [11, 12]. Maintaining cache coherence for large numbers of CPU cores involves considerable overheads and complicates the system's scalability, which is a challenge for hardware development [30]. However, we do think that cache coherence will stay, but in a limited scope [37], for example in groups of CPU cores which we call cache-coherent islands. The OS, which used to leverage a homogeneous communication mechanism to manage the system, will then face cache-coherent islands which can use shared-memory communication within islands but explicit messaging to communicate between them.

Resource disaggregation is moving into the focus of the research community these days [29, 39, 43] to integrate numerous resources into a system, including the growing number of accelerators [10, 20, 28, 42, 49, 54]. One of the challenges with respect to accelerators is how to integrate them closely into the system to keep the data transfer costs low, which are projected to increase their share of energy consumption and execution time [41]. Today large numbers of accelerators are connected with high-speed interconnects [19] and form pods of accelerators which work together closely [21, 27]. However, this limited spatial integration is not suitable for all workloads. Considering this, we envision a machine to span a whole rack consisting of boards of powerful general-purpose CPUs, GPUs, TPUs, FPGAs and other accelerators.

In the heterogeneous hardware environment of a rack consisting of different boards, the OS has to handle two kinds of heterogeneity: different ISAs and different communication schemes (cache coherence vs. explicit messaging).

ISA Heterogeneity. Managing a system consisting of different ISAs requires executing different low-level implementations of an OS kernel in the same system. These kernel implementations share the same OS design choices; for example, they have a unified memory management and access-rights management. There are different ways to abstract away the heterogeneous ISAs: one way is to use software only, like Popcorn Linux and Barrelfish do [6, 7], and another is a hardware/software co-design in which a hardware component offers a unified interface to different ISAs and the abstraction is realized in hardware, as M3 does [5]. Some research OSes can handle not just general-purpose cores with different ISAs but also accelerators, which we consider an extreme but very important case of ISA heterogeneity. The challenge is to build an OS which is versatile enough to handle different ISAs and accelerators without losing the performance benefits of years of optimization to specific architectures.

Communication Heterogeneity. In order to scale, OSes optimize for one communication strategy, which is either cache coherence or explicit messaging. In cache-coherent systems OS developers use fine-grained locking and careful alignment of data structures to cache lines [50]. This prevents cache thrashing due to false sharing. Other OS designs employ explicit messaging [5, 7, 51] to communicate between the multiple kernels and OS services running the machine. OS developers will be challenged to mix both communication techniques [43] and optimize algorithms appropriately, as sketched below.
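As a concrete illustration of such mixing, the sketch below chooses shared-memory synchronization within a cache-coherent island and explicit messages across islands for one logical kernel operation. The island and message types are hypothetical; the point is only that one operation needs two coordination paths.

```rust
// Hypothetical coordination layer mixing both communication schemes;
// the types and the notion of an "island id" are illustrative assumptions.
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};

/// State shared within one cache-coherent island: plain locked memory.
struct IslandState {
    ready_threads: Mutex<u64>,
}

#[derive(Debug)]
enum KernelMsg {
    UpdateReadyCount { island: u32, count: u64 },
}

/// A kernel instance coordinates locally via shared memory and remotely
/// via explicit messages (modeled here with one channel per remote island).
struct Kernel {
    island: u32,
    local: Arc<IslandState>,
    remote: Vec<(u32, Sender<KernelMsg>)>,
}

impl Kernel {
    /// One logical operation, two coordination paths: cache-coherent
    /// shared memory inside the island, messages across islands.
    fn thread_became_ready(&self) {
        let count = {
            let mut c = self.local.ready_threads.lock().unwrap();
            *c += 1;
            *c
        };
        for (_, tx) in &self.remote {
            let _ = tx.send(KernelMsg::UpdateReadyCount { island: self.island, count });
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    let k = Kernel {
        island: 0,
        local: Arc::new(IslandState { ready_threads: Mutex::new(0) }),
        remote: vec![(1, tx)],
    };
    k.thread_became_ready();
    println!("remote island received: {:?}", rx.recv().unwrap());
}
```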
Access Rights Management. In microkernels, access rights are managed by means of capabilities. A capability is a combination of 1) a reference to a resource and 2) the access rights to this resource [17]. Microkernel-based systems are appealing due to their small and comprehensible security architecture. The small size and clear encapsulation enable the verification of parts of the system and can prevent critical errors already by design [9]. The ways to implement capabilities for access-rights management are manifold. A capability system in general enables access-rights management and enforcement. The required functionality of a capability system can be categorized as follows:

1) Delegation: The delegation of access rights can be overseen by a privileged entity like the OS kernel or a capability co-processor, which verifies whether it is allowed to delegate a particular capability. Only sparse capabilities are delegated without supervision.

2) Enforcement: A capability system needs to ensure that programs obey their access rights. First, this depends on the kind of resource a capability refers to. For memory, the most common enforcement mechanism is an MMU which is programmed by the kernel. MMUs are typically combined with partitioned capabilities, as used in L4 systems. Alternatively, there exist ISA capabilities which are enforced by a capability co-processor that checks whether the program has the necessary capabilities loaded into its capability registers. Furthermore, the M3 system uses a hardware component, the so-called data transfer unit (DTU), which can be configured to give an application access to a range of memory [5].

Another important resource is inter-process communication (IPC). Typically this is enforced by the kernel, which mediates between two communicating processes. A special case is the M3 system [5] containing a DTU which can be configured to establish a communication channel between two applications. In such a system the kernel sets up the channel but is not required to enforce it; hence it is not involved in the actual communication.

Other resources like process management and scheduling are enforced by the kernel, since it is the privileged entity in control of these features.

3) Revocation: Access rights also need to be withdrawn, which is done by revoking a capability. Revocation can be done in various ways and with differing granularity. The most coarse-grained is to revoke all access rights to a resource in a system, e.g., by changing a version number in the referenced resource. A more flexible way is to introduce indirection objects when capabilities are delegated (e.g., across trust spheres). Invalidating an indirection object then only revokes capabilities pointing to this indirection object, but not all capabilities to the resource if multiple indirection objects exist [44]. The most flexible approach is the selective revocation of a capability (and all capabilities delegated from this one). For this, the system has to track delegations, or needs to be able to reconstruct them, to find all capabilities which need to be revoked (a sketch follows below). Note that revocation also requires resetting enforcement mechanisms which cache access rights, such as TLBs.

In a heterogeneous system with a mix of enforcement mechanisms, the OS has to manage heterogeneous types of capabilities for the same resource type (e.g., kernel-enforced vs. DTU-enforced IPC). Furthermore, capabilities can cross boards, which may require the combination of cache-coherent and message-based coordination in the capability system. Designing an OS which mediates access across boards using heterogeneous capabilities and which coordinates between multiple kernels in the system is a challenging task.
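To illustrate selective revocation, the following sketch tracks delegations in a tree and revokes a capability together with everything delegated from it. It mirrors the general scheme described above, not a specific kernel's implementation; the data layout is an assumption.

```rust
// Minimal sketch of selective revocation via a delegation tree;
// the data layout is an illustrative assumption, not a real kernel's.

/// A capability: a resource reference plus rights (cf. Dennis/Van Horn [17]).
#[derive(Debug)]
struct Capability {
    resource: u64,             // opaque resource reference
    rights: u8,                // e.g., bit 0 = read, bit 1 = write
    children: Vec<Capability>, // capabilities delegated from this one
}

impl Capability {
    /// Delegated rights can only be a subset of the parent's rights.
    fn delegate(&mut self, rights: u8) -> &mut Capability {
        let child = Capability {
            resource: self.resource,
            rights: rights & self.rights,
            children: Vec::new(),
        };
        self.children.push(child);
        self.children.last_mut().unwrap()
    }

    /// Selective revocation: drop this subtree and tell the enforcement
    /// layer to flush cached rights (TLBs, DTU endpoints, ...).
    fn revoke(self) -> usize {
        let mut flushed = 1;
        for child in self.children {
            flushed += child.revoke();
        }
        // here: reset the hardware state caching this capability's rights
        flushed
    }
}

fn main() {
    let mut root = Capability { resource: 42, rights: 0b11, children: Vec::new() };
    let child = root.delegate(0b01); // read-only delegation
    child.delegate(0b01);            // delegated further
    let revoked = root.revoke();     // revokes the whole subtree
    println!("revoked {} capabilities", revoked); // prints 3
}
```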
4 A HETEROGENEOUS OS DESIGN

4.1 OS Mechanisms
In an architecture as we describe it, access rights are managed differently on individual boards. The cores of CPU boards could be equipped with MMUs or with capability co-processors, using partitioned capabilities as in L4 systems or ISA capabilities as in CHERI, respectively [34, 52]. To illustrate our design we will consider partitioned capabilities in this work.

Since not all accelerators have an MMU, we equip accelerators with a DTU that connects them to the board-local network, similar to a network-on-chip. The DTU enables the enforcement of memory access rights through the configuration of memory endpoints. In addition, it provides explicit messaging, which makes it possible to enforce IPC without intervention of the kernel. The kernel running on the management CPU only needs to configure the DTUs to set up the communication channels.

4.2 Rack OS
To enable communication between accelerator boards and CPU boards, the memory and IPC enforcement mechanisms need to be compatible across the whole rack. Therefore we merge OS mechanisms of different OS designs into a Rack OS. In our Rack OS we employ an L4-like kernel to control CPU boards and an M3-like kernel to control accelerator boards. On top of that we deploy a rack kernel which sets up sharing and communication between individual boards (cf. Figure 2). In the following we discuss how the cross-board coordination is achieved.

Memory. To share memory between accelerator and CPU boards, memory pages have to be translated to memory ranges, which can be used to configure the DTUs of accelerators such that they can access memory directly. The L4 kernel operating the CPU boards needs to have a notion of how a DTU works, such that it can request the rack kernel to configure the board's DTU appropriately. In the other direction, the M3 kernel operating the accelerator boards has to translate possibly byte-granular memory ranges to memory pages with a fixed page size. To always offer migration between both mechanisms, memory ranges would need to be page-sized as well, which would abandon the flexibility advantage of memory ranges. A solution to this could be to request a special type of memory (migratable memory) which is always page-sized, and to use byte-granular memory ranges for non-migratable memory. From the OS's perspective this principle extends the notion of pinned and non-pinned memory in today's systems.
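The translation between the two memory views could look like the sketch below: page-granular mappings convert losslessly into ranges, while a byte-granular range only converts back into pages if it was allocated as migratable (i.e., page-aligned). The types are our illustration of the scheme, not an existing kernel interface.

```rust
// Illustrative translation between page-based (L4-side) and range-based
// (DTU-side) views of memory; constants and types are assumptions.
const PAGE_SIZE: u64 = 4096;

/// L4-side view: a contiguous set of pages.
#[derive(Debug, Clone, Copy)]
struct PageMapping { first_page: u64, num_pages: u64 }

/// DTU-side view: a byte-granular memory range.
#[derive(Debug, Clone, Copy)]
struct MemRange { base: u64, len: u64 }

/// Pages always convert to a range (used when a CPU board exports
/// memory to an accelerator's DTU memory endpoint).
fn pages_to_range(m: PageMapping) -> MemRange {
    MemRange { base: m.first_page * PAGE_SIZE, len: m.num_pages * PAGE_SIZE }
}

/// A range converts to pages only if it is "migratable memory",
/// i.e., page-aligned in base and length; otherwise it stays DTU-only.
fn range_to_pages(r: MemRange) -> Option<PageMapping> {
    if r.base % PAGE_SIZE == 0 && r.len % PAGE_SIZE == 0 && r.len > 0 {
        Some(PageMapping { first_page: r.base / PAGE_SIZE, num_pages: r.len / PAGE_SIZE })
    } else {
        None // non-migratable: byte-granular flexibility retained
    }
}

fn main() {
    let migratable = MemRange { base: 0x1_0000, len: 2 * PAGE_SIZE };
    let non_migratable = MemRange { base: 0x1_0100, len: 100 };
    println!("{:?}", range_to_pages(migratable));     // Some(..)
    println!("{:?}", range_to_pages(non_migratable)); // None
    println!("{:?}", pages_to_range(PageMapping { first_page: 16, num_pages: 2 }));
}
```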

Communication. The mediation of IPC between different board types involves a transition from hardware-enforced IPC to kernel-enforced IPC. Figure 3 illustrates the steps of this transition. When an accelerator wants to send a message to a CPU, the accelerator initiates the send operation via its DTU, depicted in step ① in Figure 3 a). The message is forwarded by the accelerator's DTU to the DTU of the corresponding accelerator board, which forwards it to the CPU board. In step ② the DTU of the CPU board deposits the message in a predestined memory location. Subsequently the DTU signals the arrival of the message to the kernel of the CPU via an interrupt in step ③ in Figure 3 b). The kernel then identifies the receiver and activates the process in step ④. Finally, step ⑤ in Figure 3 depicts the receiving process reading the message from memory. For short messages the message can be injected directly into the CPU's registers to minimize the latency caused by memory reads and writes, as is common practice in microkernels [18]. For IPC from a CPU board to an accelerator, the L4 kernel has to forward the message from the sender running on a CPU via the CPU board's DTU to the accelerator board. Both scenarios require that the boards are actually allowed to communicate with each other and that the boards' DTUs have been set up by the rack kernel. The coordination via messages between the boards' kernels exhibits that we also employ OS design principles known from the multikernel approach [7].

Figure 3: Cross-board communication from an accelerator to a CPU. The accelerator's message is forwarded via the DTUs on the communication channel. The DTU of the CPU board deposits the message in memory and informs the kernel via an interrupt. The kernel then wakes up the receiver, which can now read the message.
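The five steps above can be lined up in code. The sketch below models the transition from DTU-enforced delivery to kernel-enforced delivery on the CPU side; all types (message buffers, the interrupt hook) are hypothetical stand-ins, not interfaces of an existing kernel.

```rust
// Hypothetical model of the receive path of Figure 3; buffer layout and
// kernel interfaces are illustrative assumptions.
use std::collections::VecDeque;

#[derive(Debug, Clone)]
struct Message { from_board: u32, payload: Vec<u8> }

/// Hardware-enforced side: the CPU board's DTU deposits messages into a
/// predestined memory location without kernel involvement (steps 1-2).
struct BoardDtu { inbox: VecDeque<Message> }

impl BoardDtu {
    fn deposit(&mut self, msg: Message) -> bool {
        self.inbox.push_back(msg);
        true // true = raise an interrupt for the kernel (step 3)
    }
}

/// Kernel-enforced side: the L4-like kernel reacts to the interrupt,
/// identifies the receiver and activates it (step 4).
struct CpuKernel;

impl CpuKernel {
    fn on_interrupt(&self, dtu: &mut BoardDtu) {
        while let Some(msg) = dtu.inbox.pop_front() {
            // looking up the receiver by channel is elided here
            self.activate_receiver(msg);
        }
    }

    fn activate_receiver(&self, msg: Message) {
        // step 5: the woken process reads the message from memory;
        // short messages could go directly into registers [18].
        println!("receiver reads {} bytes from board {}", msg.payload.len(), msg.from_board);
    }
}

fn main() {
    let mut dtu = BoardDtu { inbox: VecDeque::new() };
    // steps 1-2: the accelerator sent via its DTU; the CPU-board DTU deposits it.
    let irq = dtu.deposit(Message { from_board: 7, payload: vec![1, 2, 3] });
    if irq {
        CpuKernel.on_interrupt(&mut dtu); // steps 3-5
    }
}
```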

4.3 Heterogeneous Capability System
Each breed of OS kernel used in the Rack OS has its own capability types. This is because they employ different enforcement mechanisms for memory access rights and IPC. In addition to that, execution contexts are disparate due to the diverse compute hardware. A CPU has a vastly different execution context from a GPU or an FPGA. Thus some capability types, like a scheduling capability for a CPU thread, are not applicable on an accelerator board. However, it can still be useful to have the ability to activate a CPU thread, for example when an accelerator wants to signal an event to a CPU thread.

The resource heterogeneity is reflected in a heterogeneous capability system. There are capabilities which have to be mapped between different implementations (e.g., memory, IPC) and capabilities which only reference remote resources (e.g., processes / execution contexts, scheduling contexts). To enable the delegation of access rights to resources across the whole system, the kernels of the various boards need to cooperate and exchange capabilities via messages. Hence, our system comprises a distributed capability system similar to the one used in the multikernel approach. However, inside a CPU board, cache coherence can be used to coordinate capability operations between kernel threads running on different cores. The capability system is therefore not only heterogeneous regarding the capability types but also with respect to the coordination of capability operations.
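A type-level sketch of this heterogeneity: the same resource class appears with different enforcement backends, and some capabilities are only remote references. The variants below are illustrative assumptions derived from the text, not a finished design.

```rust
// Illustrative capability taxonomy for the Rack OS; variants and fields
// are assumptions derived from the text, not a finished design.

/// Memory capabilities exist in two implementations that must be mapped
/// onto each other when they cross board types (cf. Section 4.2).
#[derive(Debug)]
enum MemCap {
    MmuPages { first_page: u64, num_pages: u64 }, // L4-like CPU boards
    DtuRange { base: u64, len: u64 },             // M3-like accelerator boards
}

/// IPC is kernel-enforced on CPU boards but DTU-enforced on accelerator
/// boards; both represent the same logical channel resource.
#[derive(Debug)]
enum IpcCap {
    KernelChannel { endpoint_id: u64 },
    DtuChannel { board: u32, endpoint: u16 },
}

/// Some capabilities only *reference* a remote resource: a scheduling
/// capability for a CPU thread is not applicable on an accelerator board,
/// but activating the thread remotely can still be useful.
#[derive(Debug)]
enum RemoteCap {
    CpuThread { board: u32, thread: u64 },
    AccelContext { board: u32, ctx: u64 },
}

#[derive(Debug)]
enum Capability { Mem(MemCap), Ipc(IpcCap), Remote(RemoteCap) }

fn main() {
    // An accelerator holding a remote reference to a CPU thread could use
    // it to signal an event, even though it cannot schedule the thread.
    let cap = Capability::Remote(RemoteCap::CpuThread { board: 2, thread: 17 });
    println!("{:?}", cap);
}
```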
4.4 Additional Challenges
The OS design we propose is a combination of various operating systems into one system. It entails different kernels running on the boards comprising a rack. To manage the system as a whole, we gave an outline how to combine different isolation and enforcement mechanisms and the corresponding capability system. However, there are more building blocks of the system which we want to discuss briefly.

Memory Allocation. Memory allocation in a system with memory distributed over the various boards requires the coordination of local per-board allocators. The choice where to allocate memory and when to migrate it to another board can be left to the application developer or a runtime system. In any case the OS has to take care that memory access rights are reflected correctly with both enforcement mechanisms. A migration might be possible using the migratable-memory approach described in Section 4.2. An alternative approach is the integration of ISA capabilities, which can be used to enforce byte-granular memory access rights in page-based systems [52]. However, to preserve the capability system's full functionality, a mechanism to selectively revoke ISA capabilities would be necessary.

Scheduling. Scheduling in such a large system requires a mixture of board-local schedulers and a mechanism to trigger the scheduling of an execution context on another board, for example to optimize for latency-critical operations. Board-spanning scheduling requires introducing notions of the various execution contexts into every kernel of the system. This is also useful for setting up accelerator tasks from a CPU thread, or vice versa, to dynamically generate CPU tasks during a computation on an accelerator. The notion of an execution context also affects migration between differing ISAs or even accelerators. Previous work suggests to provide multi-ISA binaries [40], which could be extended to multi-accelerator binaries containing diverse implementations of the same problem, as sketched below.
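One way to picture such a multi-accelerator binary is a manifest that maps target architectures to per-target code images, from which a kernel picks at dispatch or migration time. This is our speculative illustration of the extension suggested above, not an existing format.

```rust
// Speculative sketch of a multi-accelerator binary manifest; the format
// is invented for illustration, extending the multi-ISA idea of [40].
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Target { X86_64, Aarch64, Gpu, Tpu, FpgaBitstream }

/// One binary carries an implementation of the same problem per target.
struct MultiBinary {
    images: HashMap<Target, Vec<u8>>, // target -> code image / bitstream
}

impl MultiBinary {
    /// At dispatch or migration time, a kernel picks the image matching
    /// the board it schedules the execution context onto.
    fn image_for(&self, target: Target) -> Option<&[u8]> {
        self.images.get(&target).map(|v| v.as_slice())
    }
}

fn main() {
    let mut images = HashMap::new();
    images.insert(Target::X86_64, vec![0x90]); // placeholder code bytes
    images.insert(Target::Gpu, vec![0x42]);
    let bin = MultiBinary { images };
    // Migrating the task from a CPU board to a GPU board selects another image.
    println!("gpu image present: {}", bin.image_for(Target::Gpu).is_some());
    println!("tpu image present: {}", bin.image_for(Target::Tpu).is_some());
}
```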
Resource Management. Resource discovery and representation is another topic where various trade-offs can be found. In a rack consisting of many boards it might be unnecessary that each kernel on every board knows all the resources of the rack. Instead, resource discovery can be done by the rack kernel, and cross-board resource requests are mediated by this central entity. Discovery of OS services can work in a similar fashion, where some services might not be usable across boards but others are. For example, an in-memory filesystem service which is designed to utilize the byte-granular memory ranges of a DTU is not usable in the same way on a CPU board with MMU-enforced page-granular memory protection.

5 CONCLUSION
To improve the scalable integration of general-purpose cores and accelerators in datacenters, we propose to combine multiple OS designs into a Rack OS to leverage the benefits of each design. Extending the isolation mechanisms of accelerators facilitates the cooperation with powerful general-purpose boards and enables the OS to provide OS services to accelerators directly. We suggest basing a datacenter OS on microkernels, offering a security-oriented design with clear encapsulation and a least-privilege access policy. The combination of multiple OS designs entails the merging of different isolation and enforcement mechanisms, resulting in a heterogeneous capability system. The operation of such a capability system requires mediating between varying enforcement mechanisms and combining two coordination methods: cache coherence and explicit messaging.

REFERENCES
[1] Compute Express Link (CXL) Promoters. Compute Express Link Specification 1.0, 2019.
[2] Reto Achermann, Robert N. M. Watson, Chris Dalton, Paolo Faraboschi, Moritz Hoffmann, Dejan Milojicic, Geoffrey Ndu, Alexander Richardson, Timothy Roscoe, and Adrian L. Shaw. Separating translation from protection in address spaces with dynamic remapping. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems (HotOS), 2017.
[3] Sandeep R Agrawal, Sam Idicula, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju, Venkatanathan Varadarajan, Cagri Balkesen, Georgios Giannikis, Charlie Roth, Nipun Agarwal, and Eric Sedlar. A many-core architecture for in-memory data processing. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2017.
[4] Alexey Andreyev. The New Facebook DC Topology, 2019. OCP Summit 2019.
[5] Nils Asmussen, Marcus Völp, Benedikt Nöthen, Hermann Härtig, and Gerhard Fettweis. M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016.
[6] Antonio Barbalace, Binoy Ravindran, and David Katz. Popcorn: A replicated-kernel OS based on Linux. In Ottawa Linux Symposium (OLS), 2014.
[7] Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The Multikernel: A New OS Architecture for Scalable Multicore Systems. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP), 2009.
[8] Andrew Baumann, Chris Hawblitzel, Kornilios Kourtis, Tim Harris, and Timothy Roscoe. Cosh: Clear OS data sharing in an incoherent world. In 2014 Conference on Timely Results in Operating Systems (TRIOS), 2014.
[9] Simon Biggs, Damon Lee, and Gernot Heiser. The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-based Designs Improve Security. In 9th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys), 2018.
[10] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference (SIGCOMM), 2013.
[11] Silas Boyd-Wickizer, Austin T Clements, Yandong Mao, Aleksey Pesterev, M Frans Kaashoek, Robert Morris, and Nickolai Zeldovich. An Analysis of Linux Scalability to Many Cores. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI), 2010.
[12] Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. Brook for GPUs: Stream computing on graphics hardware. In ACM SIGGRAPH, 2004.
[13] Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, and Doug Burger. A cloud-scale acceleration architecture. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.
[14] Xiaoxin Chen, Tal Garfinkel, E. Christopher Lewis, Pratap Subrahmanyam, Carl A. Waldspurger, Dan Boneh, Jeffrey Dwoskin, and Dan R.K. Ports. Overshadow: A virtualization-based approach to retrofitting protection in commodity operating systems. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008.
[15] Gen-Z Consortium. Gen-Z Core Specification Version 1.0, 2018.
[16] Microsoft Corporation. How Microsoft designs its cloud-scale servers. https://www.microsoft.com, 2014.
[17] Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM, 1966.
[18] Kevin Elphinstone and Gernot Heiser. From L3 to seL4: What have we learnt in 20 years of L4 microkernels? In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP), 2013.
[19] D. Foley and J. Danskin. Ultra-performance Pascal GPU and NVLink interconnect. IEEE Micro, 2017.
[20] Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, Stephen Heil, Prerak Patel, Adam Sapek, Gabriel Weisz, Lisa Woods, Sitaram Lanka, Steve Reinhardt, Adrian Caulfield, Eric Chung, and Doug Burger. A configurable cloud-scale DNN processor for real-time AI. In Proceedings of the 45th International Symposium on Computer Architecture (ISCA), 2018.
[21] N. A. Gawande, J. B. Landwehr, J. A. Daily, N. R. Tallent, A. Vishnu, and D. J. Kerbyson. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017.
[22] H. Härtig, M. Hohmuth, N. Feske, C. Helmuth, A. Lackorzynski, F. Mehnert, and M. Peter. The Nizza secure-system architecture. In 2005 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 2005.
[23] Robert Hormuth. Dell EMC's 2019 Server Trends & Observations. https://blog.dellemc.com/en-us/dell-emc-s-2019-server-trends-observations, 2019.
[24] Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, and Jason Cong. Programming and runtime support to blaze FPGA accelerator deployment at datacenter scale. In Proceedings of the Seventh ACM Symposium on Cloud Computing (SoCC), 2016.
[25] Amazon Web Services Inc. Amazon Elastic Graphics. https://aws.amazon.com/ec2/elastic-graphics/.
[26] Google Inc. Cloud TPUs – ML accelerators for TensorFlow. https://cloud.google.com/tpu/.
[27] Google Inc. System architecture – Cloud TPU. https://cloud.google.com/tpu/docs/system-architecture.
[28] Jouppi et al. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 2017.
[29] Katrinis et al. Rack-scale disaggregated cloud data centers: The dReDBox project vision. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016.
[30] Stefanos Kaxiras and Alberto Ros. A new perspective for efficient virtual-cache coherence. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA), 2013.
[31] Patrick Kennedy. Facebook Zion accelerator platform for OAM. https://www.servethehome.com/facebook-zion-accelerator-platform-for-oam.
[32] Patrick Kennedy. Gen-Z in Dell EMC PowerEdge MX and CXL implications. https://www.servethehome.com/gen-z-in-dell-emc-poweredge-mx-and-cxl-implications.
[33] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal verification of an OS kernel. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP), 2009.
[34] Adam Lackorzynski and Alexander Warg. Taming subsystems: Capabilities as universal resource access control in L4. In Proceedings of the Second Workshop on Isolation and Integration in Embedded Systems (IIES), 2009.
[35] Jochen Liedtke. On µ-kernel construction. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP), 1995.
[36] Anthony Liguori. C5 instances and the evolution of Amazon EC2 virtualization, 2018.
[37] Milo M. K. Martin, Mark D. Hill, and Daniel J. Sorin. Why on-chip cache coherence is here to stay. Communications of the ACM (CACM), 2012.
[38] Open Compute Project, Microsoft Corporation. Project Olympus 1U server mechanical specification. https://www.opencompute.org/wiki/Server/ProjectOlympus, 2017.
[39] Vlad Nitu, Boris Teabe, Alain Tchana, Canturk Isci, and Daniel Hagimont. Welcome to Zombieland: Practical and energy-efficient memory disaggregation in a datacenter. In Proceedings of the Thirteenth EuroSys Conference (EuroSys), 2018.
[40] Pierre Olivier, Sang-Hoon Kim, and Binoy Ravindran. OS support for thread migration and distribution in the fully heterogeneous datacenter. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems (HotOS), 2017.
[41] Ardavan Pedram, Stephen Richardson, Mark Horowitz, Sameh Galal, and Shahar Kvatinsky. Dark memory and accelerator-rich system optimization in the dark silicon era. IEEE Design and Test (IEEE D&T), 2017.
[42] B. Poudel, N. Kumar Giri, and A. Munir. Design and comparative evaluation of GPGPU- and FPGA-based MPSoC ECU architectures for secure, dependable, and real-time automotive CPS. In 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2017.
[43] Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang. LegoOS: A disseminated, distributed OS for hardware resource disaggregation. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018.
[44] Jonathan S. Shapiro, Jonathan M. Smith, and David J. Farber. EROS: A fast capability system. In Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles (SOSP), 1999.
[45] Mark Silberstein. OmniX: An accelerator-centric OS for omni-programmable systems. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems (HotOS), 2017.
[46] Mark Silberstein, Bryan Ford, Idit Keidar, and Emmett Witchel. GPUfs: Integrating a file system with GPUs. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2013.
[47] Hayden Kwok-Hay So and Robert Brodersen. A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH. ACM Transactions on Embedded Computing Systems (TECS), 2008.
[48] Siamak Tavallaei, Whitney Zhao, Tiffany Jin, Cheng Chen, and Richard Ding. OCP Accelerator Module (OAM), 2019. OCP Summit 2019.
[49] Mellanox Technologies. BlueField multicore system on chip. http://www.mellanox.com/related-docs/npu-multicore-processors/PB_Bluefield_SoC.pdf.
[50] Qi Wang, Yuxin Ren, Matt Scaperoth, and Gabriel Parmer. SPeCK: A kernel for scalable predictability. In 21st IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2015.
[51] David Wentzlaff and Anant Agarwal. Factored operating systems (fos): The case for a scalable operating system for multicores. ACM SIGOPS Operating Systems Review, 2009.
[52] Jonathan Woodruff, Robert N. M. Watson, David Chisnall, Simon W. Moore, Jonathan Anderson, Brooks Davis, Ben Laurie, Peter G. Neumann, Robert Norton, and Michael Roe. The CHERI capability model: Revisiting RISC in an age of risk. In Proceedings of the 41st International Symposium on Computer Architecture (ISCA), 2014.
[53] Hansen Zhang, Soumyadeep Ghosh, Jordan Fix, Sotiris Apostolakis, Stephen R. Beard, Nayana P. Nagendra, Taewook Oh, and David I. August. Architectural Support for Containment-based Security. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019.
[54] Peng Zhang, Jianbin Fang, Canqun Yang, Tao Tang, Chun Huang, and Zheng Wang. MOCL: An efficient OpenCL implementation for the Matrix-2000 architecture. In Proceedings of the 15th ACM International Conference on Computing Frontiers (CF), 2018.
[55] Whitney Zhao, Siamak Tavallaei, Richard Ding, and Tiffany Jin. OCP Accelerator Module (OAM) System: An Open Accelerator Infrastructure Project, 2019. OCP Summit 2019.