
Cocoon: Custom-Fitted Kernel Compiled on Demand

Bernhard Heinloth, Marco Ammon, Dustin T. Nguyen, Timo Hönig, Volkmar Sieh, and Wolfgang Schröder-Preikschat
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
{heinloth,marco.ammon,nguyen,thoenig,sieh,wosch}@cs.fau.de

Abstract
As computer processors and their hardware designs continuously evolve, operating systems provide many different assembly-level implementations for the same functionality. This enables support for new platforms and ensures backward compatibility for older ones at the same time. However, the source code of operating systems grows more complex and becomes much harder to maintain.
In this paper we explore ways to build made-to-measure system software by relegating work to the compiler, which has the necessary knowledge about the system at hand. We propose Cocoon, an approach for compiling a system-tailored and -optimized kernel at boot time. For two operating systems (i.e., Linux and FreeBSD) we demonstrate the soundness of the approach by means of a prototypical implementation. The implementation shows various aspects of Cocoon, such as the ability to remove hard-to-maintain code while preserving and even increasing the system performance.

Figure 1. The status quo of today's OS distribution is shipping a generic binary kernel built by the distributor, while the system tailoring might be performed by code patching during run time.
1 Introduction
To efficiently exploit the achievements of new hardware designs (i.e., performance improvements), made-to-measure system software [5, 11] is a must. But as complexity grows [16], it becomes increasingly difficult to provide hand-crafted software routines that leverage hardware offerings in the most efficient manner. Instead, system software (i.e., operating-system kernels) must be adaptable [13] and made-to-measure with a broad set of tools—in particular, with support from the compiler [10]. However, it is a chicken-and-egg problem: one would require an operating system (OS) to build a custom-fitted OS kernel. Hence, general-purpose OS kernels are commonly prebuilt by the distributor and shipped in binary format, without tailoring to an individual system, as shown in Figure 1. This approach, however, rules out the efficient use of target-system–specific build-time optimizations which, for example, exploit extensions to the instruction set architecture (ISA).
OS distributors provide kernels that are generic for a specific instruction set architecture. These kernels run on a broad set of machines and provide support for a wide range of different processors and peripheral devices. The decisive disadvantage is that system-specific adjustments (i.e., the use of available processor extensions and selection of drivers) are relocated to run time. Therefore, distributors have to simultaneously deal with two contrary issues: On the one hand, adding functionality to support all possible peripheral devices (thus bloating the kernel); on the other hand, finding the lowest common denominator of the ISA to achieve machine-code compatibility—relinquishing all improvements potentially offered by processor extensions.
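The run-time relocation just described can be sketched in plain C: probe the CPU once, then dispatch to the most suitable implementation through a function pointer. This is a minimal sketch, not the kernel's actual mechanism (Linux patches call sites in place rather than indirecting through pointers); the variant names and the feature test are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two stand-in variants; a real kernel ships hand-written assembly
 * routines instead (e.g., Linux's memcpy_orig and memcpy_erms). */
static void *memcpy_bytewise(void *to, const void *from, size_t len) {
    char *d = to;
    const char *s = from;
    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
    return to;
}

static void *memcpy_fast(void *to, const void *from, size_t len) {
    return memmove(to, from, len);  /* placeholder for an ISA-tuned copy */
}

/* Chosen once, early during boot; until then, the safe generic
 * implementation is used. */
static void *(*memcpy_impl)(void *, const void *, size_t) = memcpy_bytewise;

static void select_memcpy(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))  /* illustrative feature test */
        memcpy_impl = memcpy_fast;
#endif
}
```

After `select_memcpy()` runs, every caller of `memcpy_impl` transparently uses the variant matching the machine at hand; the cost is one indirect call per invocation, which is exactly what in-place patching avoids.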
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
PLOS'19, October 27, 2019, Huntsville, ON, Canada
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7017-2/19/10...$15.00
https://doi.org/10.1145/3365137.3365398

Today's operating systems like Linux work around both issues at run time: Support for loadable modules reduces the memory footprint by loading only the required drivers, while self-modification selects the functions most suitable to the available ISA extension(s). For example, Linux provides no less than three different variants of the memory-copy function memcpy on the x86_64 architecture [17, arch/x86/lib/memcpy_64.S]. At run time, the OS kernel chooses from the available memcpy implementations in order to efficiently execute memory-copying operations on different x86_64 CPUs.
The practices mentioned above indicate several issues with the current state of the art. Firstly, OS kernels currently

include hand-crafted, manually optimized program code for providing different implementations of various functions. This bloats the source code and requires very high maintenance efforts. Secondly, the segregation of drivers in separate compilation modules as well as the integration of assembly code in high-level programming-language source code act as a barrier for compiler analysis. Thus, optimizations leveraging additional context knowledge cannot be applied.
Both maintenance efforts and optimization barriers can be mitigated by providing generic high-level implementations while relocating system-specific adjustments exclusively to the compiler. This leads to a drastic change where generic OS kernels are transformed to highly system-specific ones. By following this path, however, OS distributors would have to provide an OS kernel for each individual system configuration instead of one kernel per supported ISA. As this is infeasible due to the large number of different ISA-peripheral configurations—and hence kernels—this paper presents a solution to the problem.
Instead of using pre-compiled OS kernels, we propose Cocoon, a custom-fitted kernel compiled on demand approach for operating-system kernels. Our approach exploits that compiling an OS kernel on the target system itself is only a matter of seconds (cf. Section 5.2) with today's technology and that state-of-the-art compilers are capable of building highly optimized kernel binaries (cf. Section 5.3). Compiling OS kernels on the target system yields necessary insight with regard to available hardware and allows the kernel to adapt data structures and synchronization techniques already at compilation time. Cocoon only includes necessary kernel code, since available peripheral devices are known. Thus, loadable-module support can be omitted completely.
Cocoon integrates with existing system designs and extends the bootstrapping process by an additional step. During this step, Cocoon builds and compiles a highly optimized system-specific OS kernel. The resulting OS kernel is used henceforth until there are changes to the kernel, such as a version update or the availability of a security fix. Cocoon either works at the level of source code (e.g., C) or at the level of intermediate representation (IR), a partially translated source code, as shown in Figure 2.

Figure 2. When compiling the OS's intermediate-representation code during run time, the kernel can fully utilize the features of the target hardware.

The latter is not only sufficient, but also reduces the kernel build time, as the compiler frontend's tasks and most optimizations have already been performed.
The contributions of this paper are threefold: Firstly, we present Cocoon, an approach aiming for target optimization by custom-fitting kernels, compiled on demand. Secondly, we present the implementation of Cocoon as an extensible framework which creates the necessary bootable build environment and utilizes this environment in conjunction with two open-source operating systems (i.e., Linux and FreeBSD). Thirdly, we evaluate the overall approach in two different scenarios: (1) we analyze and compare the start-up time of kernel images built by Cocoon, and (2) we analyze and compare the performance of memory-copying functions Cocoon achieves in comparison to stock OS kernels.
To encourage additional research on this topic, we published our tool as open-source software (https://gitlab.cs.fau.de/i4/pub/cocoon).
The paper is structured as follows: Section 2 discusses background information and related work. Section 3 presents a high-level overview of the Cocoon approach, while implementation details are shown accordingly in Section 4. In Section 5, we evaluate the current prototype of Cocoon in real-world scenarios. Section 6 concludes the paper.

2 Background
Even though target-specific compilation of software is already quite common in some domains of computer science, most of the research focuses on user-space applications. Target optimization in user space is either obtained by manually compiling all required software, or by using software based on the principle of just-in-time (JIT) compilation. However, those concepts can also be adopted and used to enhance the performance of an OS kernel.
Hunt and Larus developed the Singularity operating system written in a variant of the C# language, which is well suited for JIT compilation [7]. Even though they focused on building an operating system in a type- and memory-safe language, their approach can also be leveraged for target optimization. Their research group developed isolated kernel components which are translated from an IR to machine code at the time of installation, when all hardware features are known. The approach explores a wide range of concepts which impact the operating system's performance and security positively. Yet, the Singularity kernel and all software (both user and kernel space) are deeply coupled to C# and other concepts they introduce as part of their application binary interface. Hence, it is not possible to migrate already existing operating systems and applications.
In order to overcome the shortcomings of JIT compilation (e.g., high start-up latency), Nuzman et al. describe the concept of fat binaries, which are the combination of compiled programs and their respective IR in one file [12]. This enables
starting the program immediately while also being capable of optimizing and recompiling certain parts of the application during execution when using a suited run-time environment. Nuzman et al. build upon the LLVM IR used by the clang compiler, as we do in our approach for the FreeBSD operating system. As their JIT compiler coexists with the running application, it is possible to perform optimizations based on its behavior at run time, especially for hot paths. This approach favors already existing software projects, as it does not require changing the software's source code. Instead, modifications to the build system are sufficient. Nonetheless, due to their sole focus on user-space applications, using their approach for kernel development is difficult.
Another approach for increasing an operating system's performance is described by Pu et al.: They recognized unused potential in the implementation of system calls, which are traditionally built in a generic fashion [14]. Instead, they design a kernel which synthesizes routines, capable of handling specific system calls, on demand. This enables them to create routines incorporating knowledge about the requesting process and environment. Their prime example is the open system call and subsequent read/write calls: Whenever an application calls open on a file, the kernel can create read/write routines associated with the file descriptor. The synthesized routines can omit checks and inline specific code paths, such as the file system, based on the run-time knowledge. Even though Pu et al. achieve promising results, this approach requires a specific kernel design in mind, and a great deal of manual labor has to be spent in implementing those optimizations. Cocoon uses modern compiler technology to improve the performance and maintainability of historically grown operating systems, without having to make invasive changes to the underlying source code.
A research group at Stony Brook University also tried to improve the FreeBSD operating system by adding JIT-compiler capabilities to compile the kernel from LLVM IR [1]. In addition, it was planned to use the JIT compiler to recompile certain modules of the kernel based on user interaction.
Bellard released the small boot loader TCCBOOT as a proof of concept for compiling a Linux kernel at boot time [2]. While the packaged C compiler TinyCC achieves fast compilation and boot times for Linux 2.4.26, neither target-specific tailoring nor optimizations are performed. Due to compiler shortcomings, it requires changes to the build system and source code.

3 Approach
Boot loaders often pass through multiple stages in order to load an operating system present as a binary executable. We introduce an additional stage in which we build the actual operating system before seamlessly executing it, as shown in Figure 3.

Figure 3. Cocoon acts as an additional boot-loader stage and integrates with common ways of booting a system.

In this stage, we first load a simple and generic kernel together with a minimal environment tailored to the requirements of the kernel's build system, with all necessary tools (such as the compiler). Then, a script automatically performs all steps required to build the kernel.
Depending on the preferred way of distribution, the source code of the kernel can be acquired from either a local storage device or downloaded on demand from network locations. After verifying its integrity, the configuration phase starts: At this point, we can gather further knowledge about the target system to tailor the OS kernel to the actual hardware. The available information not only includes specific instruction sets but also attributes about symmetric multiprocessing (e.g., number of cores) and connected peripherals (for driver selection). Tailoring to the required components leads to a better-adjusted system and additionally reduces both compilation time and unnecessary memory usage during run time due to the smaller kernel size.
Now we configure a new kernel for the system according to the information collected earlier. With this knowledge, selecting the correct number and model of CPUs and picking the drivers required for buses and I/O extension boards present in the system is possible. As kernels are targeted towards a wide variety of hardware, configuration options often include "generic" options, ignoring model-specific capabilities. With our approach, specialized options are chosen instead. Because hot-pluggable devices (such as USB hardware) may not be present on system start-up, driver support for them is always included.
Following the configuration step, the actual compilation starts. Here, the compiler optimizes as much as possible, since it has knowledge about the exact CPU type. During linking, we are able to improve the performance even further by applying link-time optimization (LTO), as only a single static kernel binary (without modules) is built.
Because system responsiveness and background services are not important in the building environment, we can fully employ all available system resources for the build process.
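The hardware probing that the configuration phase performs can be illustrated in C with a CPUID query. The sketch below uses GCC/clang's <cpuid.h> and, as an example feature, the Enhanced REP MOVSB/STOSB (ERMS) capability, which is advertised in CPUID leaf 7, sub-leaf 0, EBX bit 9; how a configuration script maps such probes onto kernel options is an assumption for illustration.

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Returns 1 if the CPU advertises ERMS (CPUID leaf 7, sub-leaf 0,
 * EBX bit 9), 0 otherwise or if the leaf is unsupported. */
static int cpu_has_erms(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 9) & 1;
}
#else
/* Non-x86 build environment: report the feature as absent. */
static int cpu_has_erms(void) { return 0; }
#endif
```

A configuration script would turn the result of probes like this into kernel configuration options and compiler flags before the build starts, so the compiled kernel never has to re-detect the feature at run time.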
However, there are several additional improvements that allow speeding it up even further: To deal with minor changes in the source code (such as patches with small bug fixes), using a persistent compiler cache can have a significant impact on performance. Similar to hidden recovery partitions, we require a reserved partition for the build artifacts such as the kernel binary and debug information. In case no recompilation is required, we are able to immediately proceed with the previously compiled kernel.
Depending on the capabilities of the compiler, the source code can even be distributed in an intermediate representation. This not only avoids time-consuming lexical and syntactical analysis of several files, but also allows applying some optimizations beforehand.
As a final step, we map the new kernel into memory and boot it by jumping to the entry point.
This approach allows to automatically perform even cumbersome optimizations which require multiple runs of the operating system—these have previously been limited to experienced users due to the manual interaction necessary. For example, on initial execution, generation of profiling information can be enabled. The resulting data will be taken into account on subsequent builds to improve either the compilation [18] or the selection of kernel features [9].
Any platform capable of self-hosting is suitable for this approach: The only requirement is the ability to compile and load the target operating system.

4 Implementation
To be able to support a wide range of existing tools, we implement the approach based upon a small Linux kernel with a temporary memory-based file system—the initial RAM disk (initramfs)—providing the build environment. We use the initramfs tools of the Debian GNU/Linux distribution to create the compressed RAM disk.
Through hooks in the creation process, the compiler and all tools required for the build process are copied into the environment. Currently, we include either the GNU Compiler Collection (GCC) or the LLVM Compiler Infrastructure (in particular clang), but additional compilers can be added easily. Commonly required by the build system are basic Unix utilities provided by the GNU Core Utilities or BusyBox as well as build-automation tools like GNU Make. Additionally, we include GNU Wget or Git to retrieve the target operating system's source from remote locations.
A customizable shell script, instructing the retrieval of the source, configuration, and build process, constitutes the heart of the approach and is executed automatically during boot. While usually only a small number of sequential commands is sufficient to build the kernel, our script also performs sophisticated steps to better tailor the target operating system to the hardware. Since user interaction is possible but usually not desired in this step, boot parameters can be used as a flexible way of guiding the script.
After a successful build, we employ kexec to start our freshly created kernel: Using this system call, we can load the new binary kernel file into memory and directly boot it by jumping to its entry position, without any additional hardware initialization. The kexec-tools [3] provide a command-line interface for this functionality and support different protocols, such as the Multiboot Specification [6].
We demonstrate the universal applicability of our approach with two open-source operating systems on the i386 and x86_64 architectures.

4.1 FreeBSD Target
As FreeBSD supports being built with the LLVM-based clang compiler and its related toolchain, it offers us the possibility of using the intermediate representation emitted by the compiler as a starting point. Compared to the traditional approach commencing from the original high-level–language code, several analysis and general optimization steps have already been performed ahead of time.
Due to some peculiarities in the build process of FreeBSD, it cannot be booted directly by the kexec system call. Therefore, it is necessary to add a boot-loader stage in between Cocoon and the generated kernel binary. As FreeBSD's build system depends on its specific tooling, it does not run in a Linux-based environment. In order to generate IR files based on the kernel's source code and to extract the commands required for compiling and linking, slight modifications to the build system are needed. We apply these modifications to the 11.2 release for the i386 architecture.

4.2 Linux Target
In the case of Linux, we can boot into the latest kernel by updating the source code directly from Linus Torvalds's Git source tree [17] every time. The configuration is adjusted to the system, for example the CPU type and number of cores. While Linux has only rudimentary compile-time support for ISA extensions, we are able to apply patches that allow aggressive machine-specific compile optimizations [4].
To speed up the compilation process, we prevent arbitrary value changes in the source files (e.g., timestamps or version counters) and use a compiler cache [15] to access previously compiled fragments instead of recompiling them.
Performing the whole building process is by no means necessary: It is sufficient for us to build only the kernel binary itself (vmlinux) and omit the time-consuming steps of compressing and packaging the kernel with a loader (bzImage).

5 Evaluation
To show the broad applicability of our approach, we evaluate Cocoon with the popular open-source operating systems FreeBSD and Linux on different processor architectures (i386 and x86_64, respectively), followed by an in-depth analysis of the memory-copy function in Linux.

5.1 FreeBSD in Cocoon
We measure potential benefits in boot duration when using an IR as the basis, as opposed to source code. The tests were conducted with the FreeBSD source code. The linker used in this scenario is GNU gold (v1.14). Furthermore, we integrate LLVM 8.0.0 with the corresponding clang compiler and lld linker into Cocoon, allowing the usage of LTO and IR. As source files written in assembly cannot be transformed to IR, we generate traditional object files for those.
When compiling to object files, commencing with the 1875 C files takes 382 seconds, while using the derived IR takes about a third of the time (127 seconds). Thus, a substantial decrease in compilation time can be achieved. 1621 source files have been translated into the IR of LLVM.
Linking of the resulting object files takes about 2 seconds in both scenarios. When using LLVM's LTO capabilities for compiling and linking the IR code, the overall duration is about 242 seconds. For testing, we employ an Intel Core i5-4570 (4 cores at 3.2 GHz) with 32 GB RAM. However, all actions were performed sequentially, only utilizing one core.

Table 1. Comparison of three Cocoon-compiled kernels in multiple perf bench microbenchmarks (higher values are better for epoll/wait and mem/memcpy, lower values are better otherwise): a traditional vanilla kernel, a kernel with a memcpy implementation in C, as well as a target-optimized kernel with C memcpy.

                                  Linux Kernel
Benchmark               Vanilla      C          C-opt.
epoll/wait   [Op/s]     287 476      283 117    292 120
futex/requeue  [µs]     8.2          9.1        8.1
mem/memcpy   [GB/s]     42.5         14.2       42.5
sched/messaging [ms]    1132         1137       1010

5.2 Linux in Cocoon
For all Linux-based tests, an Intel Core i5-8400 CPU (6 cores at 2.8 GHz) with 16 GB DDR4 RAM and a 500 GB SSD was utilized. Note that both dynamic frequency scaling and energy-saving features were disabled and that the CPU lacks support for simultaneous multithreading, all of which are common sources of unwanted benchmarking jitter.
Boot times with Cocoon generally fall into two categories: the initial build of the kernel and—if there are no changes to source code, configuration, or optimization settings—the subsequent boots with the previously compiled one. The first category's boot duration is influenced by the degree of optimizations performed; the latter is invariant. As a consequence, we measure full build times for a general and a target-tailored/-optimized kernel, as well as the boot time if the kernel is reused. Since the applied configuration can heavily influence compile times, comparable configuration files are mandatory: The unoptimized kernel uses the defconfig, a very basic configuration provided by the kernel developers. The optimized kernel utilizes a further tailored configuration and employs the -march=native compiler option for tuning the emitted code to the ISA extensions at hand.
A fresh, complete build takes 190 seconds from selecting Cocoon in the stage-2 boot loader until the login prompt of the booted operating system appears. If optimizations are enabled, this boot period is further extended by 3 seconds. In contrast, subsequent booting only takes 33 seconds. However, it is still slower than a traditional, direct boot into the operating system, lasting about 10 seconds. While this might be impractical for daily use on a mobile computer, for other systems, such as servers, this overhead is negligible.
As the Cocoon approach ultimately not only strives to enhance performance but also to improve the kernel source code's maintainability by reducing dependencies on hand-written assembly code, we evaluate the performance impact of using a memcpy implementation provided in the C language as well as target-specific compiler optimizations. We compare three different kernels, all based on Linux 5.0.0 and compiled with Cocoon: A vanilla kernel with the original assembly implementations compiled with no further optimizations is used as a baseline for comparison to an unoptimized and a target-optimized kernel with a standard implementation in C. Furthermore, the same kernel configurations as in the boot-time evaluation are used. We measure the actual results with the kernel-provided perf bench tool [17, tools/perf], which comprises several microbenchmarks evaluating the performance of common operations, such as inter-process communication and futex handling. Additionally, it provides an easy way to measure memcpy performance from user space.
The insights gained from our results (see Table 1) are threefold: (1) We learn that a compiler-optimized memcpy implementation in C can achieve the same throughput as the kernel's hand-written assembly routine. (2) When analyzing all results, the importance of memcpy in overall kernel performance is evident: In all microbenchmarks, the kernel with an unoptimized C implementation performs worse than the traditional, assembly-based kernel, which is otherwise completely identical. (3) Furthermore, the overall kernel performance can be increased through target-specific compiler optimizations. We conclude that, despite having the same memcpy throughput, the natively optimized kernel achieves better results overall compared to the default one.

5.3 Memory-Copy Function
The well-known memcpy function, as shown above, is a prime example for optimization at compile time: As copying data in memory is an essential part of almost every application and operating system, both software and hardware manufacturers have a strong interest in improving its performance.
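The measurements behind Figure 4 count cycles for individual copies; a rough user-space approximation of such a measurement is sketched below. It is only an approximation: the paper's actual numbers come from a kernel module with interrupts and task scheduling disabled, while this sketch ignores serialization and scheduling noise, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
static uint64_t cycles(void) { return __rdtsc(); }  /* time-stamp counter */
#else
#include <time.h>
static uint64_t cycles(void) {  /* portable fallback: nanoseconds */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + ts.tv_nsec;
}
#endif

/* Time a single memcpy of `len` bytes. Very rough: no warm-up runs,
 * no serializing instruction around the counter reads. */
static uint64_t time_memcpy(char *dst, const char *src, size_t len) {
    uint64_t start = cycles();
    memcpy(dst, src, len);
    return cycles() - start;
}
```

In practice one would repeat the copy many times per payload size and take a minimum or median, which is essentially what benchmark loops like perf bench mem do.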

[Figure 4: bar chart comparing Linux's memcpy orig, Linux's memcpy erms, Compiler (optimized), and Compiler (const size) at payload sizes from 1 B to 4096 B; see the caption below.]
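The "Compiler (const size)" variant in Figure 4 relies on the size being a compile-time constant. A minimal illustration of this effect, with hypothetical names:

```c
#include <assert.h>
#include <string.h>

struct point { int x, y; };

/* The length argument is the compile-time constant sizeof(struct point),
 * so an optimizing compiler typically replaces this memcpy call with a
 * fixed-size load/store sequence instead of calling a generic routine. */
static void copy_point(struct point *dst, const struct point *src) {
    memcpy(dst, src, sizeof *src);
}
```

With the size unknown until run time, the compiler must fall back to a generic library call; with the size propagated as a constant, it can pick the cheapest code sequence for exactly that payload, which is the scenario the figure's fourth variant measures.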

Figure 4. OS memory-copy performance of Linux implementations and compiler-generated routines (-O3 optimized, with and without context information) relative to a traditionally optimized (-O2) byte-copy loop at different payload sizes. The bars in the background present the average number of memcpy calls during Linux boot (assigned to the closest payload size as shown).

Thus, in recent microarchitectures, both new instructions and improvements to existing ones have been introduced—and have made their way into the Linux kernel, such as the REP String Enhancement [8, Section 2.6.6] in memcpy and the Enhanced REP MOVSB Operation [8, Section 3.7.6] in memcpy_erms. During boot, Linux uses live patching to select the most appropriate assembly memory-copy function. While this mechanism can achieve high performance, it both adds complexity to the codebase and requires the operating-system programmer to know the most efficient implementation. Traditionally, this task lies within the domain of compiler developers.

    char *memcpy(char *restrict to, const char *restrict from, size_t len) {
        for (size_t p = 0; p < len; p++)
            to[p] = from[p];
        return to;
    }

Listing 1. Simple and clean memcpy implementation

Hence, we compare the throughput of all three Linux memcpy implementations with compiler-optimized versions of a simple, high-level memory-copy implementation—a loop copying data byte by byte from source to destination—as presented in Listing 1.
Our testing environment is based on Linux 5.0 with GCC 8.3.0 on an Intel Core i5-8400 processor. As software in user space can be affected by other processes on the same system, all measurements are performed in a kernel module with both interrupts and task scheduling disabled on the current CPU. Other impacts on performance assessments include hardware factors such as dynamically changing clock speeds and power-saving features, all of which were disabled.
The results presented in Figure 4 demonstrate that recent compilers are able to generate efficient code. In fact, given the right conditions, current compilers emit the same code as provided by Linux. By employing techniques such as constant propagation, it is often possible to determine the size at build time, thus enabling the compiler to choose the best implementation considering the available enhancements. Furthermore, due to additional factors (e.g., the start-up costs of REP operations), the results clearly indicate that there is no universally superior implementation—it strongly depends on the payload size.
During boot of a Linux kernel, 94 % of the executed memcpy calls copy less than 1 kB, which—on most hardware—results in using a version inferior to the one suggested by the compiler. Profiling might be a sufficient way to gather such knowledge for further compiler improvements.

6 Conclusion
Cocoon makes a fundamental change to the development and distribution of made-to-measure operating-system kernels. With the approach presented in this paper, the optimization capabilities of modern compilers are exploited to tailor the OS to the underlying hardware. Hand-crafted assembly routines are no longer required in order to utilize a system's potential performance—all this by using high-level-language source code only. To take advantage of improvements of future microarchitectures, updating the OS is no longer necessary: Upgrading to a newer compiler version is sufficient.
For future work, we will take the approach to a further level by letting the compiler reside in memory. This would enable features such as automatic hot-spot optimization and adapting to non-functional requirements (i.e., high performance, low power consumption) during run time.

Acknowledgments
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 146371743 – TRR 89 "Invasive Computing" as well as the individual research grant SCHR 603/13-1.

References
[1] Varun Agrawal, Amit Arya, Michael Ferdman, and Donald E. Porter. 2013. JIT Kernels: An Idea Whose Time Has (Just) Come. Poster presented at the 24th ACM Symposium on Operating Systems Principles (SOSP poster).
[2] Fabrice Bellard. 2004. TCCBOOT: TinyCC Boot Loader. https://bellard.org/tcc/tccboot.html
[3] Eric Biederman, Albert Herranz, and Jesse Barnes. 2019. Kexec Tools. https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/
[4] Alexey Dobriyan. 2019. Linux 5.0-ad1: -march=native support. https://lkml.org/lkml/2019/3/4/698
[5] Bryan Ford, Godmar Back, Greg Benson, Jay Lepreau, Albert Lin, and Olin Shivers. 1997. The Flux OSKit: A Substrate for Kernel and Language Research. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles. 38–51.
[6] Bryan Ford and Erich Stefan Boleyn. 1995–96. Multiboot Specification version 0.6.96. https://www.gnu.org/software/grub/manual/multiboot/multiboot.html
[7] Galen C. Hunt and James R. Larus. 2007. Singularity: Rethinking the Software Stack. SIGOPS Oper. Syst. Rev. 41, 2 (April 2007), 37–49. https://doi.org/10.1145/1243418.1243424
[8] Intel Corporation. 2019. Intel® 64 and IA-32 Architectures Optimization Reference Manual. Number 248966-041.
[9] Anil Kurmus, Reinhard Tartler, Daniela Dorneanu, Bernhard Heinloth, Valentin Rothberg, Andreas Ziegler, Wolfgang Schröder-Preikschat, Daniel Lohmann, and Rüdiger Kapitza. 2013. Attack Surface Metrics and Automated Compile-Time OS Kernel Tailoring. In Proceedings of the 20th Network and Distributed System Security Symposium (NDSS '13). 1–18. http://www4.cs.fau.de/Publications/2013/kurmus_13_ndss.pdf
[10] Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO '04). IEEE Computer Society, Washington, DC, USA. http://dl.acm.org/citation.cfm?id=977395.977673
[11] H. Massalin and C. Pu. 1989. Threads and Input/Output in the Synthesis Kernel. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles (SOSP '89). 191–201.
[12] Dorit Nuzman, Revital Eres, Sergei Dyshel, Marcel Zalmanovici, and Jose Castanos. 2013. JIT Technology with C/C++: Feedback-directed Dynamic Recompilation for Statically Compiled Languages. ACM Trans. Archit. Code Optim. 10, 4, Article 59 (Dec. 2013), 25 pages. https://doi.org/10.1145/2541228.2555315
[13] David Lorge Parnas. 1976. On the Design and Development of Program Families. IEEE Transactions on Software Engineering SE-2, 1 (March 1976), 1–9.
[14] Calton Pu, Henry Massalin, and John Ioannidis. 1988. The Synthesis Kernel. Computing Systems 1, 1 (1988), 11–32. http://www.usenix.org/publications/compsystems/1988/win_pu.pdf
[15] Joel Rosdahl and Andrew Tridgell. 2019. ccache — a fast C/C++ compiler cache. https://ccache.dev/
[16] Reinhard Tartler, Daniel Lohmann, Julio Sincero, and Wolfgang Schröder-Preikschat. 2011. Feature Consistency in Compile-Time-Configurable System Software: Facing the Linux 10,000 Feature Problem. In Proceedings of the Sixth ACM European Conference on Computer Systems (EuroSys '11). ACM, 47–60.
[17] Linus Torvalds et al. 2019. The Linux Kernel. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[18] Pengfei Yuan, Yao Guo, and Xiangqun Chen. 2014. Experiences in Profile-Guided Operating System Kernel Optimization. In Proceedings of the 5th Asia-Pacific Workshop on Systems (APSys '14). ACM, New York, NY, USA, Article 4, 6 pages. https://doi.org/10.1145/2637166.2637227