DISS. ETH NO. 24811

On the Construction of Dynamic and Adaptive Operating Systems

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH

(Dr. sc. ETH Zurich)

presented by

Gerd Zellweger

Master of Science ETH in Computer Science, ETH Zurich

born on 26.06.1987

citizen of Switzerland

accepted on the recommendation of

Prof. Dr. Timothy Roscoe (ETH Zurich), examiner Prof. Dr. Gustavo Alonso (ETH Zurich), co-examiner Prof. Dr. Jonathan Appavoo (Boston University), co-examiner

2017

Abstract

Trends in hardware development indicate that computer architectures will go through considerable changes in the near future. One reason for this is the end of Moore’s Law, which implies that CPUs can no longer simply become faster or more complex by virtue of more and smaller transistors on each new chip. Instead, applications either have to use multiple cores or specialized hardware to achieve performance gains. Another reason is the end of Dennard scaling, which means that as transistors get smaller, the power consumed per unit of chip area no longer remains constant. The implications are that large areas of a chip have to be powered down most of the time, and system software has to dynamically enable and disable the hardware that applications want to use. Operating systems as of today were not designed for such a future; rather, they assume homogeneous hardware that remains in the same static configuration during its runtime. In this dissertation, we study how operating systems must change to handle dynamic hardware, where cores and other devices in the system can power on and off individually. Furthermore, we examine how operating systems can be made more adaptive to suit the needs of applications better, or to specialize themselves for the underlying hardware. First, we present Barrelfish/DC, which decouples physical cores from the OS, and the OS itself from application state. The resulting design allows the OS to treat all available cores in the system as fully dynamic. Next, we present Badis, an OS architecture that enables applications to execute on top of adapted OS kernels and services. We show that the flexibility to run specialized kernels helps applications by optimizing the OS for the workload requirements, but also allows the OS to optimize for the complexity and heterogeneity present in modern and future machines. Finally, we propose a mechanism that promotes virtual address spaces to first-class citizens, thus enabling a process to attach to, detach from, and switch between multiple virtual address spaces. The system enables applications to quickly and dynamically change their logical view of the memory system, and we show how this can increase performance considerably.

Zusammenfassung

Trends in hardware development indicate that computer architectures will undergo considerable changes in the near future. One reason for this is the end of Moore’s Law: future CPUs can no longer simply become faster or more complex by relying on ever smaller, and therefore more numerous, transistors. Instead, programs must use multiple CPUs or specialized hardware to improve their performance. A further reason is the end of Dennard scaling, which means that as transistors shrink, power consumption no longer depends only on the occupied chip area. The consequence is that, in the future, large areas of a chip must remain powered off, and system software must dynamically power on whichever hardware an application wants to use. Today’s operating systems were designed for homogeneous hardware that remains in the same static configuration throughout its runtime. In this dissertation, we study how operating systems must change to cope with dynamic hardware in which CPUs and other devices are individually powered on and off. Furthermore, we analyze how operating systems can be made more adaptive, to better meet the requirements of applications or to better adapt to the underlying hardware. First, we present Barrelfish/DC, a system that decouples physical CPUs from the OS, and the OS itself from application state. The resulting design allows the OS to treat all CPU cores of a system as dynamic. Next, we present Badis, an OS architecture that allows applications to run on top of adapted OS kernels and system services. We show how this flexibility helps to better satisfy the requirements of applications, while at the same time allowing the OS to be optimized for complex and heterogeneous hardware. Finally, we introduce a mechanism that treats virtual address spaces as first-class objects and allows processes to create and share them, and to switch between them. The mechanism allows applications to quickly and dynamically change their view of the available memory, which in turn enables a variety of performance improvements.

Acknowledgments

The work presented in this dissertation was shaped, created, and described through collaboration and interactions with many wonderful and exceptional friends and colleagues. First of all, I would like to express my gratitude to my advisor Timothy Roscoe for always believing in me, supporting me, and mentoring me during my master’s and doctoral studies. Your optimism and joyful approach to everything you do make it truly a pleasure to work with you. Likewise, I thank Gustavo Alonso for being a fantastic and encouraging co-advisor who always gave insightful advice on my research and this dissertation. Finally, thanks to you, Jonathan, for taking part in my committee and for all the reassuring feedback you have given me. During my studies and internships, I had the opportunity to collaborate with many incredibly smart and dedicated colleagues who directly and indirectly contributed to this dissertation: Adrian Schüpbach, Alexander Merritt, Besmira Nushi, Dejan Milojicic, Denny Lin, Gabriel Kliot, Izzat El Hajj, Jana Giceva, Kornilios Kourtis, Paolo Faraboschi, Reto Achermann, Simon Gerber and Wen-mei Hwu. A special thanks also goes to all the friends I have gained during my time in the Systems Group: Akhi, Andreas, Anja, Besmira, Claude, Darko, Georgios, Gitalee, Ingo, Lefteris, Lukas, Marco, Moritz, Nina, Pratanu, Pravin, Reto, Renato, Roni, Simon and Stefan. Our ski trips, vacations, and events are unforgettable and made the last five years in the group a pleasure. Finally, for all the great things I have been able to do in life, I thank my parents Irène and Max and my brother Urs. They have been supportive like no one else and always encouraged me to do what I enjoy.

Contents

1 Introduction
  1.1 Problem statement
  1.2 Contributions
  1.3 Background: Barrelfish
    1.3.1 CPU Driver and Monitor
    1.3.2 Capabilities
    1.3.3 Scheduling
    1.3.4 Device management
  1.4 Evaluation methodology
  1.5 Overview

2 Decoupling Cores, Kernels and Operating Systems
  2.1 Motivation
    2.1.1 Hardware
    2.1.2 Software
  2.2 Related work
    2.2.1 CPU Hotplug
    2.2.2 Kernel updates
    2.2.3 Multikernels
    2.2.4 Virtualization
  2.3 Design and Implementation
    2.3.1 Booting a new core
    2.3.2 Per-core state
    2.3.3 Capabilities in Barrelfish/DC
    2.3.4 Kernel Control Blocks
    2.3.5 Replacing a kernel
    2.3.6 Kernel sharing and core shutdown
    2.3.7 Dealing with time
    2.3.8 Dealing with interrupts
    2.3.9 Application support
    2.3.10 Discussion
  2.4 Evaluation
    2.4.1 Core management operations
    2.4.2 Applications
      2.4.2.1 Ethernet driver
      2.4.2.2 Web server
      2.4.2.3 PostgreSQL
  2.5 Concluding remarks

3 A Framework for an Adaptive OS Architecture
  3.1 Motivation
    3.1.1 Use-case: Coordinated parallel data processing
    3.1.2 Use-case: Eliminating OS noise
  3.2 Related work
    3.2.1 Customization
    3.2.2 High-performance computing
    3.2.3 Scheduling parallel workloads
    3.2.4 OS abstractions for parallel execution
  3.3 Customization Goals
    3.3.1 Run-to-completion execution
    3.3.2 Co-scheduling
    3.3.3 Spatial isolation of tasks and threads
    3.3.4 OS interfaces
    3.3.5 Data aware task placement
  3.4 Design and Implementation
    3.4.1 Control plane
    3.4.2 Compute plane
    3.4.3 Discussion
  3.5 Basslet: A kernel based, task-parallel runtime system
    3.5.1 Task-parallel compute plane kernel
    3.5.2 Compute plane configuration
    3.5.3 Basslet runtime libraries
      3.5.3.1 Porting pthreads to Basslet
      3.5.3.2 Porting OpenMP to Basslet
    3.5.4 Basslet code size
  3.6 bfrt: A real-time OS kernel
  3.7 Evaluation
    3.7.1 Basslet runtime
      3.7.1.1 Interference between a pair of parallel jobs
      3.7.1.2 System throughput scale-out
      3.7.1.3 Standalone runtime comparison
    3.7.2 Performance isolation with bfrt
    3.7.3 Badis OS architecture
      3.7.3.1 Control plane applications
      3.7.3.2 Overhead of Badis enqueuing
  3.8 Concluding remarks

4 Using Multiple Address Spaces in Applications
  4.1 Motivation
    4.1.1 Memory technology
    4.1.2 Preserving pointer-based data structures
    4.1.3 Large-scale sharing of memory
    4.1.4 Problems with legacy methods
  4.2 Related work
    4.2.1 Operating systems
    4.2.2
    4.2.3 Communication and Sharing
    4.2.4 Hardware
  4.3 Design
    4.3.1 Lockable Segments
    4.3.2 Multiple Virtual Address Spaces
  4.4 Implementation
    4.4.1 Barrelfish
    4.4.2 DragonFly BSD
    4.4.3 Runtime library
    4.4.4 Discussion
  4.5 Evaluation
    4.5.1 Microbenchmarks
    4.5.2 GUPS: Addressing Large Memory
    4.5.3 Redis with Multiple Address Spaces
    4.5.4 SAMTools: In-Memory Data Structures
  4.6 Concluding remarks

5 Conclusion
  5.1 Specialized hardware
  5.2 Rack-scale systems
  5.3 Near-data processing

1 Introduction

This dissertation argues that operating systems face a significant challenge in adapting to future hardware and the diverse set of workloads we execute on machines. With the beginning of the multi-core era around 2004, we have seen renewed interest in the design and implementation of scalable operating systems, data structures, and algorithms to avoid contention in the presence of shared resources. Since then, hardware vendors have continued to scale out processor designs with more cores and larger caches. Hardware complexity has increased significantly due to the addition of new instructions, features, and highly specialized execution units. In addition, a variety of non-uniform system designs have been introduced by hardware vendors and the scientific community: asymmetric multi-core processors that trade off performance and energy characteristics in servers as well as smartphones, or reconfigurable processors whereby independent cores can be morphed into a larger, more powerful CPU and split again dynamically.

This trend is a result of the failure of Dennard scaling, which refers to the observation that transistor and voltage scaling are no longer in line with each other. As a result, we see sharp increases in power densities that prevent powering on all transistors simultaneously in a given area. This has several implications for system software: future operating systems can no longer assume that all cores are of the same type, and they need mechanisms to handle platform-specific code in the OS for different cores. The OS can no longer rely on cores or other devices being static entities that remain in the same configuration or quantity all the time. Proposed techniques such as core fusion require support in operating systems to adjust dynamically to changes in CPU arrangements. Such re-adjustments may happen in quick succession as power budgets or workloads change. Traditional operating systems are not designed with such hardware in mind and optimize for a static environment with rare changes in the compute resources. In contrast, Chapter 2 explores the design, implementation, and implications of an operating system that treats the underlying cores as dynamic devices that can appear and disappear quickly, and that can adapt to such changes with minimal disruption to running programs and services.

Furthermore, this dissertation argues for making operating systems more adaptive to better adjust to application requirements. Over the past few years, the popularity of general data-analytics applications, using machine learning, graph processing, SQL and NoSQL databases, just to name a few, has led to the introduction of a mix of systems with very different workload characteristics. Large-scale servers and rack-scale computers are intriguing platforms to execute these workloads on. First, these systems offer plenty of volatile memory to keep data access latency low. Second, they contain enough compute resources to consolidate multiple data-processing systems that provide different services to clients. While sharing leads to gains in efficiency and cost savings, it also brings several challenges for system software. Today’s sophisticated cache hierarchies, bus topologies, and CPU architectures lead to surprising interactions that negatively impact the performance of individual applications. The scheduling and resource management in operating systems needs to take this into account to maximize processing performance. However, many data-processing systems rely on intrinsic runtime systems or integrate existing runtimes for parallel processing. Every runtime makes its own, independent scheduling decisions, and each runtime’s view is limited by what little information an operating system can provide. One cause for this is that the traditional APIs and execution policies offered by default by operating systems are not expressive enough to coordinate and schedule large data-processing workloads. As a result, many different data-processing systems re-implement and replicate much code and many policy decisions that should be handled by the OS.

Chapter 3 explores a new approach to customizing an operating system for particular applications and workloads. We present an adaptive OS architecture with the ability to specialize the kernel on individual cores of a machine, all while guaranteeing seamless inter-operation with traditional OS services. The resulting system provides services tailored to specific application classes. We show one such instance by implementing an OS-based, task-parallel runtime service. The approach offers an efficient and balanced runtime environment that effectively coordinates parallel data-processing systems while simplifying the application logic of the systems themselves.


Finally, this dissertation argues for better OS mechanisms to aid with memory management and data processing, given the overall increase in the volume of data produced daily, with no foreseeable end in sight. Current advances in memory technology indicate that by 2020 we can realistically expect massive pools (petabytes) of non-volatile memory (NVM) at rack scale, which will be byte-addressable and accessible by a large number of compute nodes over low-latency networks [FKMM15]. System software needs to ensure that applications can continue to access this vast amount of available data with low overheads. Traditional block-based storage will lose importance in favor of persistent memory that is byte-addressable directly by CPUs. While such technology has the potential to simplify the traditional storage layer considerably, there are several challenges. First, main memory capacity is starting to exceed the limits of the virtual and physical address bits of today’s CPUs; processes that want to address very large amounts of memory run into the limitation that they are unable to access everything. Second, the data format inside a process typically differs from the data format used when persisting the data, or when communicating the information among different processes. In such cases, data is normally serialized and stored as byte blobs on block-based storage systems, or exchanged as byte streams over communication endpoints; the reader or receiving endpoint then de-serializes the data back into pointer-based data structures within its address space. This impedance mismatch is the cause of massive performance overheads. However, memory-centric architectures have the potential to avoid much of this overhead if communication happens between nodes that are connected to the same storage pools. Chapter 4 explores the limitations of existing OS memory APIs and how they can be avoided to fully leverage the potential of new, memory-centric architectures.

1.1 Problem statement

The work in this dissertation addresses the following problems and research questions in the context of system software.

1. Future hardware with radically new architectures poses a challenge for traditional OS designs. In order to leverage novel architecture proposals, future operating systems must adapt to a dynamic set of physical cores with regards to number, micro-architecture, or offered intrinsic properties and abilities. We explore the research question of how an OS should be structured to dynamically adjust to hardware changes with minimal effect on applications.

2. Operating system policies and APIs are traditionally designed and implemented by trying to find and satisfy the common requirements across many applications. This often proves to be a poor match for heavily specialized or resource-intensive applications. Standard operating systems are often inflexible and allow for little customization or cooperation with applications and runtime systems. In this dissertation, we take a fresh look at enabling the customization of systems code in the context of the multikernel architecture: we explore the question of how the OS can be specialized on parts of the machine using a multikernel, and investigate how this benefits applications.

3. The continued growth in main-memory capacities is pushing the limits of traditional memory systems in hardware and application software. This problem is expected to get worse with new memory technologies and memory-centric hardware architectures. This dissertation explores operating system mechanisms that benefit the programming and performance of data-processing applications in light of such new hardware.

1.2 Contributions

This dissertation makes three principal contributions:

1. It presents an operating system architecture based on the principle that all cores are fully dynamic. This is achieved by two innovations: (a) leveraging the multikernel design to separate all OS and application state from the underlying hardware cores and operating system implementation, and (b) introducing boot drivers to abstract cores as regular devices. The result is a system where native kernel code on any core can be quickly replaced, kernel state can be moved between cores, and cores can be added to and removed from the system, transparently to applications and OS services.

2. It describes the necessary additions to the architecture proposed in (1) to customize the operating system on parts of the machine while allowing seamless integration with the rest of the system. Namely, we introduce a kernel-based runtime system that supports coordinated execution of disjoint data-processing systems and can act as a drop-in replacement for their respective user-space parallel runtime systems. We show how this maximizes overall system throughput by coordinating the scheduling of different data-processing systems while minimizing hardware resource contention.

3. It designs and evaluates OS mechanisms for processes to create, structure, compose, and access multiple virtual address spaces. The main novelty of the proposed design is that it allows a process to arbitrarily switch its context between multiple address spaces in a safe and controlled way. We show that dynamically switching address spaces can be leveraged by applications in more than one way: for example, to address memory sizes larger than what is supported by the system, or to avoid communication and serialization overheads.

1.3 Background: Barrelfish

We implemented the systems proposed in this dissertation using the Barrelfish research OS. Barrelfish supports multiple hardware platforms and can run a variety of common applications such as databases and web servers. The operating system is structured as a multikernel: a distributed system of cores communicating solely via asynchronous messages. The multikernel model, originally proposed by Baumann et al. [BBD+09], is based on three design principles:

• Explicit inter-core communication: In a multikernel, cross-core communication and synchronization are performed using explicit messages. As a result, the OS does not rely on shared memory between two cores.

• Hardware-neutral OS structure: The multikernel factors the OS into hardware-neutral and hardware-dependent code. By default, code is written to be hardware-neutral, which means that adapting the OS to support a new architecture does not require extensive changes. Architecture-specific code is used in parts where performance is critical (e.g., the message-passing implementation).

• Replicated instead of shared state: Operating systems typically share data structures such as page tables, memory pools, or scheduling queues across cores, and therefore rely on locking schemes to guarantee the integrity of OS state. Instead, a multikernel copies its state across the cores and uses replication techniques to keep it consistent.
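To make the third principle concrete, the following minimal C sketch shows the replicated-state pattern: each core updates its private replica and broadcasts explicit messages rather than locking a shared structure. The inbox-based channel and all names here (chan_send, table_set, poll_inbox) are hypothetical stand-ins, not Barrelfish's actual message-passing interface.

```c
#include <stdint.h>

#define NCORES 4
#define NKEYS  1024   /* callers must pass key < NKEYS */
#define QLEN   128

struct update_msg { uint32_t key; uint64_t value; };

/* Per-core state: a private replica plus an inbox of pending updates.
 * In a real multikernel the inbox would be a message channel; no two
 * cores ever write to the same replica. */
struct core {
    uint64_t replica[NKEYS];
    struct update_msg inbox[QLEN];
    unsigned head, tail;
};

static struct core cores[NCORES];

static void chan_send(int target, struct update_msg m)
{
    struct core *c = &cores[target];
    c->inbox[c->tail++ % QLEN] = m;
}

/* Write through the local replica, then broadcast explicit messages
 * instead of updating a shared, locked structure. */
void table_set(int my_core, uint32_t key, uint64_t value)
{
    cores[my_core].replica[key] = value;
    struct update_msg m = { key, value };
    for (int t = 0; t < NCORES; t++)
        if (t != my_core)
            chan_send(t, m);
}

/* Each core drains its inbox from its own event loop, applying remote
 * updates to its local replica. */
void poll_inbox(int my_core)
{
    struct core *c = &cores[my_core];
    while (c->head != c->tail) {
        struct update_msg m = c->inbox[c->head++ % QLEN];
        c->replica[m.key] = m.value;
    }
}
```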


Although a multikernel occupies a very different point in the design space compared to traditional, monolithic systems, we note that the concepts and designs proposed in this dissertation have more general applicability and are feasible for other operating systems as well (as shown, for example, in Chapter 4, where a reference implementation in both a BSD-variant OS and Barrelfish is presented). In this section, we briefly give an overview of the relevant concepts in Barrelfish that affected and influenced the chapters presented in this thesis.

1.3.1 CPU Driver and Monitor

The kernel – also called a CPU driver in Barrelfish – is responsible for managing an individual core. It implements features such as context switching and interrupt handling, and provides a system call interface that allows applications to interact with various on-core devices (the serial device, interrupt controller, etc.). The CPU driver adopts the design philosophy of the exokernel [EKO95]: it tries to minimize abstractions and instead mediates direct hardware access securely. The security model in Barrelfish is based on a partitioned capability system (Section 1.3.2). Barrelfish represents all resources (physical memory, I/O space, interrupts, etc.) as capabilities. The CPU driver ensures – with the help of the MMU – that memory used to store capabilities cannot be modified directly by a user process. Instead, a program creates, modifies, and copies capabilities safely with system calls, by providing a reference to a capability. One advantage of the capability system for the CPU driver is that it can eliminate memory allocation from its code entirely: the CPU driver is designed such that all kernel objects are allocated by user programs on its behalf. Furthermore, the CPU driver is built as a completely event-driven and non-preemptable program and does not have any built-in concurrency. These design decisions greatly reduce the complexity of the kernel and prevent deadlocks, race conditions, and memory leaks, as well as the need to worry about running out of memory in kernel code. Anything that requires system-wide coordination is built into a trusted user-space service called the monitor. Like the CPU driver, every core runs a separate instance of the monitor program. The monitor carries out the inter-core messaging needed to initiate low-level operations that must be coordinated among many CPU drivers. Examples of such operations include TLB shootdowns, initializing cross-core communication channels, sending capabilities across cores, and setting up shared memory regions.


The Barrelfish OS, with its CPU drivers and monitors, has a structure similar to a microkernel: the functionality in the kernel is kept to a minimum, and most OS services, including device drivers and a large part of the OS code that does not require privileged hardware access, run as user-space services.

1.3.2 Capabilities

Capabilities are used in Barrelfish to enforce protection and authorization and to manipulate OS state. Barrelfish uses a capability system inspired by seL4 [EDE08, Tea06]: capabilities are implemented using a partitioned capability system [CJ75, DdBF+94], meaning only the kernel itself can access and manipulate the memory where an application’s capabilities are stored. All system resources, including physical memory, threads, communication endpoints, etc., are represented as capabilities in the system.

Capabilities are typed and can be retyped by users. However, the retype operation is privileged, and the kernel ensures that retyping capabilities adheres to the specific rules and is within the rights of the program requesting the retype operation. A capability can also be split into two or more parts: for example, one can split a capability representing a big region of memory into multiple smaller chunks. The two operations together, retyping and splitting, provide a powerful mechanism to manage resources in user-space. For example, in Barrelfish every user-space program manages its address space by retyping physical memory regions to page tables and frames (mappable regions of memory). For security reasons, the system prevents a region of type page table from being mapped writable into a process’s address space. Instead, the program can use invocations to change the page-table state and safely install or remove mappings. Invocations are regular system calls and can be thought of as calling a method on a capability object. For example, to insert a physical memory mapping into a last-level page table, a program would invoke the map operation on the page-table capability, passing it a frame capability and slot number as arguments.

To store and find capabilities, every Barrelfish process has a CSpace, which can be thought of as a two-level page table. An application constructs a CSpace by allocating capabilities of type CNode. A CNode is a region of memory that is only accessible by the CPU driver and contains the capability metadata. Since a user-space program cannot manipulate capabilities directly, it refers to them using capability references. A capability reference in Barrelfish is a 32-bit address where the top bits select the slot in the top-level CNode and the remaining bits select the capability in the second-level CNode. On every invocation, the CPU driver performs a lookup, similar to a page-table walk, to find the capability data based on the provided capability reference.

One complication of the capability model in Barrelfish over seL4 is that the multikernel does not allow shared state between cores. An application running on multiple cores needs to maintain many different CSpaces, one for every core. The monitor is responsible for keeping the capability system in sync and ensures that operations like copying or revoking capabilities across cores are executed correctly [Nev12, SKN13].
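The following C sketch illustrates this two-level lookup and a typed invocation in the style of the map operation described above. The field widths, type names, and function signatures are illustrative assumptions for this sketch; Barrelfish's actual encoding and invocation path differ in detail.

```c
#include <stdint.h>
#include <stddef.h>

/* A capability reference: 32 bits, split (in this sketch) into a
 * top-level CNode slot and a second-level slot; low-order bits are
 * ignored here. The widths are assumptions, not the real encoding. */
typedef uint32_t capref_t;
#define TOP_BITS 8
#define SUB_BITS 8

enum cap_type { CAP_NULL, CAP_CNODE, CAP_RAM, CAP_PTABLE, CAP_FRAME };

/* Capability metadata, held in CNode memory that only the CPU driver
 * may write. */
struct capability {
    enum cap_type type;
    uint64_t base, bytes;   /* physical resource the capability names */
    void *obj;              /* kernel object, e.g. a child CNode */
};

struct cnode { struct capability slots[1u << SUB_BITS]; };

/* Kernel-side lookup on every invocation: a two-level walk, analogous
 * to a page-table walk, from user-supplied reference to metadata. */
static struct capability *caps_lookup(struct cnode *root, capref_t ref)
{
    uint32_t top = (ref >> (32 - TOP_BITS)) & ((1u << TOP_BITS) - 1);
    uint32_t sub = (ref >> (32 - TOP_BITS - SUB_BITS)) & ((1u << SUB_BITS) - 1);

    struct capability *l1 = &root->slots[top];
    if (l1->type != CAP_CNODE)
        return NULL;
    return &((struct cnode *)l1->obj)->slots[sub];
}

/* An invocation is an ordinary system call naming capabilities, e.g.
 * "map this frame into this page table at this slot". */
int ptable_map(struct cnode *root, capref_t pt_ref, capref_t fr_ref, int slot)
{
    struct capability *pt = caps_lookup(root, pt_ref);
    struct capability *fr = caps_lookup(root, fr_ref);
    if (pt == NULL || fr == NULL ||
        pt->type != CAP_PTABLE || fr->type != CAP_FRAME)
        return -1;   /* the CPU driver rejects ill-typed invocations */
    /* ... write a page-table entry for fr->base at index `slot` ... */
    (void)slot;
    return 0;
}
```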

1.3.3 Scheduling

In the kernel, Barrelfish schedules a set of dispatchers. A dispatcher is the kernel object that is the nearest equivalent to a process or kernel thread in Unix. Dispatchers typically contain a set of metadata, including the physical address of the root page-table or message endpoints. They are also represented as capabilities and thus are allocated by user-space programs. The CPU driver uses Rate-Based Earliest Deadline First (RBED) [BBLB03] as the default scheduling strategy for the OS. The scheduling algorithm allocates resources to dispatchers as a percentage of CPU time such that the total is always less than or equal to 100%. Hard and soft real-time tasks declare their worst-case execution time and period. The algorithm rejects a hard real-time task if it cannot guarantee the task’s execution within the proposed constraints and the current system state. Scheduling decisions in Barrelfish are made on a per-core basis: every CPU driver executes an independent RBED scheduler. Peter et al. [Pet12] later extended Barrelfish with gang-scheduling, which made it possible for an application to request from the OS that a set of dispatchers running on multiple cores execute simultaneously. The Barrelfish library OS implements user-level threads, allowing every program to spawn multiple threads per dispatcher. The user-level thread scheduler can be customized by the application, but by default it schedules threads in a simple round-robin fashion. To react to events and notifications from the hardware (e.g., interrupts or timers), the CPU driver uses an up-call mechanism [ABLL91] to notify the user-level thread scheduler.
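A minimal sketch of the admission test this implies is shown below, assuming utilization is tracked in parts-per-million; the names and bookkeeping are illustrative, not Barrelfish's actual RBED implementation.

```c
#include <stdint.h>

/* Hypothetical admission test in the spirit of RBED: a hard real-time
 * dispatcher declares its worst-case execution time (WCET) and period,
 * and is admitted only while total utilization stays at or below 100%. */

struct rt_params {
    uint64_t wcet_us;     /* worst-case execution time per period */
    uint64_t period_us;
};

static uint64_t utilization_ppm;   /* sum of wcet/period over admitted tasks */

int admit_hard_rt(struct rt_params p)
{
    uint64_t u = p.wcet_us * 1000000ULL / p.period_us;
    if (utilization_ppm + u > 1000000ULL)
        return -1;                 /* cannot guarantee constraints: reject */
    utilization_ppm += u;
    return 0;
}
```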


1.3.4 Device management

Driver domains are responsible for controlling devices in Barrelfish. They run in user-space and can contain several driver instances, each responsible for managing a single device. A driver instance is typically written as a C library and contains the necessary code to interface with the device. An instance exports two communication interfaces: first, a generic interface through which the device manager interacts with the driver instance (e.g., to stop or restart the driver); second, a class-specific interface for a given device type through which applications interact with the device. Holistically managing the devices of an entire machine can be quite involved: each device has to be found at runtime and can disappear and reappear at any point in time. Barrelfish uses a publish–subscribe system, called Octopus [ZSR12], to notify the system about the appearance and disappearance of devices. The device manager, called Kaluga, uses Octopus to subscribe to all events that are published due to hardware changes. When an event signals the appearance or disappearance of a device, Kaluga reacts by spawning a new driver domain and instance if necessary, and then directs the instance assigned to the device to either attach to or detach from it. On attaching, the driver instance assumes control over the device by setting it into a well-known state, and starts to accept requests over the exported device interface.
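The sketch below outlines this reaction loop in C. The event names, the driver_instance layout, and the spawn_driver_for stub are hypothetical placeholders rather than the real Octopus or Kaluga interfaces, and it simplifies to a single device.

```c
#include <stdio.h>

enum dev_event { DEV_APPEARED, DEV_DISAPPEARED };

struct device { const char *class_name; int id; };

/* Generic interface each driver instance exports to the device manager;
 * the class-specific interface for applications would sit alongside it. */
struct driver_instance {
    int (*attach)(struct device *d);  /* take control; reset to known state */
    int (*detach)(struct device *d);  /* release the device */
};

static int dummy_attach(struct device *d)
{
    printf("driver: attaching to %s%d\n", d->class_name, d->id);
    return 0;
}

static int dummy_detach(struct device *d)
{
    printf("driver: detaching from %s%d\n", d->class_name, d->id);
    return 0;
}

/* Stand-in for spawning a driver domain and instance for a device class. */
static struct driver_instance *spawn_driver_for(const char *class_name)
{
    static struct driver_instance dummy = { dummy_attach, dummy_detach };
    (void)class_name;
    return &dummy;
}

/* Handler subscribed to hardware-change events, in the style of Kaluga. */
void on_device_event(enum dev_event ev, struct device *d)
{
    static struct driver_instance *inst;   /* simplification: one device */
    if (ev == DEV_APPEARED) {
        if (inst == NULL)
            inst = spawn_driver_for(d->class_name);
        inst->attach(d);
    } else if (ev == DEV_DISAPPEARED && inst != NULL) {
        inst->detach(d);
    }
}
```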

1.4 Evaluation methodology

This dissertation proposes radical changes to the design of today’s operating systems. These designs are driven by how we anticipate hardware to change in the future. It is inherently difficult to evaluate such work, since no generally accepted methodology or benchmarks exist to reason about the performance of operating systems. Furthermore, to fully leverage the potential of the OS, applications would often have to be redesigned or written from scratch. The approach we followed in this dissertation is to design experiments specifically to exercise the new parts of the system, while using relevant, well-established applications that are in use today. These applications are executed either unmodified or with minimal modifications to the code base. This allows us to compare the performance achieved on our system with the performance of the application under test running on its native host OS.


1.5 Overview

Chapter 2: introduces Barrelfish/DC, an extension of the Barrelfish operating system which treats cores as fully dynamic devices. We describe in detail the new mechanisms and changes added to the OS, and evaluate the system’s ability to adapt quickly to changes in the underlying cores and its potential to specialize the kernel on certain cores for specific workloads. This work is the result of a collaboration with Simon Gerber, Kornilios Kourtis, and Timothy Roscoe. Parts of this chapter were published at the 11th Symposium on Operating Systems Design and Implementation (OSDI) in 2014 [ZGKR14].

Chapter 3: presents Badis, a framework to adapt and customize an OS on certain parts of the machine. By leveraging the multikernel design alongside the techniques presented in Chapter 2, we show how partitioning a machine into a control and a compute plane can improve the performance of co-existing parallel runtimes. This is joint work with Jana Giceva, Timothy Roscoe, and Gustavo Alonso. Parts of the chapter were published at the 12th International Workshop on Data Management on New Hardware (DaMoN) in 2016 [GZAR16].

Chapter 4: describes SpaceJMP, an OS extension that gives applications the ability to efficiently create, modify, and share entire address spaces, and to control switching among them. We show that such an approach has several benefits, reducing serialization and communication costs in existing applications and benchmarks. This work is the result of a collaboration with researchers from industry and academia: Izzat El Hajj, Alexander Merritt, Dejan Milojicic, Reto Achermann, Paolo Faraboschi, Wen-mei Hwu, Timothy Roscoe and Karsten Schwan. Parts of this chapter were published at the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) in 2016 [EHMZ+16]. Some of the techniques discussed in this chapter are the subject of pending patent applications [EHMZM17b, EHMZM17a, EHMZM17c].

Chapter 5: concludes this dissertation and lays out possible directions towards better operating systems in light of future developments in hardware and software.

2 Decoupling Cores, Kernels and Operating Systems

The hardware landscape is increasingly dynamic. Future machines will contain large numbers of heterogeneous cores which will be powered on and off individually in response to workload changes. Cores themselves will have porous boundaries: some may be dynamically fused or split to provide more energy-efficient computation. Existing OS designs like Linux and Windows assume a static number of homogeneous cores, with recent extensions to allow core hotplugging.

This chapter introduces Barrelfish/DC, an OS design based on the principle that all cores are fully dynamic. Based on the Barrelfish research OS [Wika], it exploits the “multikernel” architecture to separate the OS state for each core. Barrelfish/DC handles dynamic cores more flexibly and with far less overhead than Linux, while bringing additional benefits in functionality, such as hotplug and on-demand runtime specialization of the OS kernel.

A key challenge with dynamic cores is safely disposing of per-core OS state when removing a core from the system: this process takes time and dominates the hardware latency of powering the core down, reducing any benefit in energy consumption. This chapter presents a technique that externalizes all the per-core OS and application state of a system into objects called OSnodes, which can be executed lazily on another core. While transparent to applications, this new design choice implies additional benefits not seen in prior systems: Barrelfish/DC can completely replace the OS kernel code running on any single core or subset of cores in the system at runtime, without disruption to any other OS or application code, including that running on the core. Kernels can be updated and bugs fixed without downtime, or replaced temporarily. Furthermore, per-core OS state can be moved between slow, low-power cores and fast, energy-hungry cores. Multiple cores’ state can be temporarily aggregated onto a single core to further trade off performance and power, or to dedicate an entire package to running a single job for a limited period. Parts of Barrelfish/DC can be moved onto and off cores optimized for particular workloads. Cores can be fused [IKKM07] transparently, and SMT threads [MDH+02, KAO05] or cores sharing functional units [BBSG11] can be selectively used for application threads or OS accelerators.

In this chapter, we present several innovations which together form Barrelfish/DC. Barrelfish/DC treats a CPU core as a special case of a peripheral device, and introduces the concept of a boot driver, which can start, stop, and restart a core while running elsewhere. We use a partitioned capability system for memory management, which allows us to completely externalize all OS state for a core. This in turn permits a kernel to be essentially stateless, and easily replaced while Barrelfish/DC continues to run. We factor the OS into per-core kernels (Barrelfish uses the term CPU driver for the kernel-mode code running on a core; in this thesis we use the term “kernel” instead, to avoid confusion with boot drivers) and OSnodes, and a Kernel Control Block provides a kernel-readable handle on the total state of an OSnode.

2.1 Motivation

Barrelfish/DC fully decouples cores from kernels (supervisory programs running in kernel mode), and moreover both of them from the per-core state of the OS as a whole and its associated applications (threads, address spaces, communication channels, etc.). This goes considerably beyond the core hotplug or dynamic core support in today’s OSes. Figure 2.1 shows the range of primitive kernel operations that Barrelfish/DC supports transparently to applications and without downtime as the system executes:

• A kernel on a core can be rebooted or replaced.

• The per-core OS state can be moved between cores.



Figure 2.1: The operations supported by a decoupled OS. Update: the entire kernel dispatching OSnode α is replaced at runtime. Move: OSnode α, containing all per-core state including applications, is migrated to another core and kernel. Park: OSnode α is moved to a core whose kernel temporarily dispatches two OSnodes. Unpark: OSnode α is transferred back to its previous core.

• Multiple per-core OS components can be relocated to temporarily “share” a core.
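A hypothetical, handle-based C interface for these four operations might look as follows; the signatures sketch the mechanisms described in this chapter and are not Barrelfish/DC's literal API.

```c
typedef int coreid_t;

/* Kernel control block: the kernel-readable handle on one OSnode. */
struct kcb;

/* Update: reboot core c into a new kernel binary; the OSnode currently
 * dispatched there survives the swap. */
int core_update_kernel(coreid_t c, const char *kernel_binary);

/* Move: relocate an OSnode -- all per-core OS and application state --
 * to the kernel running on another core. */
int osnode_move(struct kcb *n, coreid_t to);

/* Park: temporarily multiplex an OSnode onto a host core alongside the
 * OSnode already running there. */
int osnode_park(struct kcb *n, coreid_t host);

/* Unpark: give the OSnode a core (and kernel) of its own again. */
int osnode_unpark(struct kcb *n, coreid_t to);
```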

In this section, we argue why such functionality will become important in the future, based on recent trends in hardware and software.

2.1.1 Hardware

In recent years we have seen a continuous rise in core counts, both on a single chip and in a complete system, with a corresponding increase in the complexity of the memory system – non-uniform memory access and multiple levels of cache sharing. Systems software, and in particular the OS, must tackle the complex problem of scheduling both OS tasks and those of applications across a number of processors based on memory locality.

At the same time, cores themselves are becoming non-uniform: asymmetric multicore processors (AMP) [KFJ+03] mix cores of different microarchitectures (and therefore performance and energy characteristics) on a single processor. A key motivation for this is power reduction in embedded systems like smartphones: under high CPU load, complex, high-performance cores can complete tasks more quickly, resulting in power reduction in other areas of the system. Under light CPU load, however, it is more efficient to run tasks on simple, low-power cores. While migration between cores can be transparent to the OS (as is possible with, e.g., ARM’s “big.LITTLE” AMP architecture), a better solution is for the OS to manage a heterogeneous collection of cores itself, powering individual cores on and off reactively. Alternatively, Intel’s Turbo Boost feature, which increases the frequency and voltage of a core when others on the same die are sufficiently idle to keep the chip within its thermal envelope, is arguably a dynamic form of AMP [CJS+09].

At the same time, hotplug of processors, once the province of specialized machines like the Tandem NonStop systems [Bar81], is becoming more mainstream. More radical proposals for reconfiguring physical processors include Core Fusion [IKKM07], whereby multiple independent cores can be morphed into a larger CPU, pooling caches and functional units to improve the performance of sequential programs.

Ultimately, the age of “dark silicon” [EBSA+11] may well lead to increased core counts, but with a hard limit on the number that may be powered on at any given time. Performance advances and energy savings will subsequently have to derive from specialized hardware for particular workloads or operations [VSG+10]. The implications for a future OS are that it must manage a dynamic set of physical cores, and be able to adjust to changes in the number, configuration, and microarchitecture of cores available at runtime, while maintaining a stable execution environment for applications.

2.1.2 Software

Alongside hardware trends, there is increasing interest in modifying, upgrading, patching, or replacing OS kernels at runtime. Baumann et al. [BHA+05] implement dynamic kernel updates in K42, leveraging the object-oriented design of the OS, and later extend this to interface changes using object adapters and lazy update [BAW+07]. More recently, Ksplice [AK09] allows binary patching of Linux kernels without reboot, and works by comparing generated object code and replacing entire functions. Dynamic instrumentation systems like DTrace [CSL04] provide mechanisms that modify the kernel at runtime to analyze program behavior.


All these systems show that the key challenges in updating an OS online are to maintain critical invariants across the update and to do so with minimal interruption of service (the system should pause, if at all, for a minimal period). This is particularly hard in a multiprocessor kernel with shared state. With Barrelfish/DC, we argue for a system that addresses all these challenges in a single framework for core and kernel management in the OS.

2.2 Related work

Our work combines several directions in OS design and implementation: core hotplugging, kernel updates and replacement, and multikernel architectures. In this section, we discuss similar systems, along with tangential solutions such as virtualization and library operating systems.

2.2.1 CPU Hotplug

Most modern OS designs support some form of core hotplug. Since the overriding motivation is reliability, unplugging or plugging a core is considered a rare event, and the OS optimizes the common case where cores are not being hotplugged. For example, Linux CPU hotplug uses the __stop_machine() kernel call, which halts application execution on all online CPUs for typically hundreds of milliseconds [GMG12], overhead that increases further when the system is under CPU load [Lina]. We show further evidence of this cost in Section 2.4.1, where we compare Linux’s CPU hotplug with Barrelfish/DC’s core update operations.

Recognizing that processors will be configured much more frequently in the future for reasons of energy usage and performance optimization, Chameleon [PS12] identifies several bottlenecks in the existing Linux hotplug implementation due to global locks, and argues that current OSes are ill-equipped for processor sets that can be reconfigured at runtime. Chameleon extends Linux to provide support for changing the set of processors efficiently at runtime, and a scheduling framework for exploiting this new functionality. Chameleon can perform processor reconfiguration up to 100,000 times faster than Linux (version 2.6).

Barrelfish/DC is inspired in part by this work, but adopts a very different approach. Where Chameleon targets a single, monolithic shared kernel, Barrelfish/DC adopts a multikernel model and uses the ability to reboot individual kernels one by one to support CPU reconfiguration. Chameleon abstracts hardware processors behind processor proxies and execution objects, in part to handle the problem of per-core state (primarily interrupt handlers) on an offline or deconfigured processor. However, proxies can only be used for short durations, as they do not reschedule threads from an offline CPU. In contrast, Barrelfish/DC abstracts the per-core state (typically much larger in a shared-nothing multikernel than in a shared-memory kernel) behind the OSnode and kernel control block abstractions. The parking mechanism introduced in Barrelfish/DC gives the OS the ability to continue executing a parked core’s threads.

In a very different approach, Kozuch et al. [KKR09] show how commodity OS hibernation and hotplug facilities can be used to migrate a complete OS between different machines (with different hardware configurations) without virtualization.

Hypervisors are typically capable of simulating hotplugging of CPUs within a virtual machine. Barrelfish/DC can be deployed as a guest OS to manage a variable set of virtual CPUs allocated by the hypervisor. Indeed, Barrelfish/DC addresses a long-standing issue in virtualization: it is hard to fully virtualize the microarchitecture of a processor when VMs might migrate between asymmetric cores or between physical machines with different processors. As a guest, Barrelfish/DC can natively handle such heterogeneity and change without disrupting operation.

2.2.2 Kernel updates

Most modern mainstream OSes support dynamic loading and unloading of kernel modules, which can be used to update or specialize limited parts of the OS. The problem of patching system software without downtime of critical services has been a research area for some time, and several operating systems have implemented some form of dynamic updates in the past.

Solaris introduced a live upgrade mechanism [Sun01] which substantially reduced the service outage of an OS update by completely duplicating the running environment at runtime. While the original environment continued to run unhindered, the new one could be upgraded and tested. However, activation of the new system still required a full system reboot.


K42 explored updates of a running kernel, exploiting the system’s heavily object-oriented design. Baumann et al. [BHA+05] showed that by hot-swapping the object factory responsible for creating a certain type of object, together with a well-defined API for converting existing objects to a new version, it is possible to achieve dynamic updates at the granularity of an individual class. A quiescent state for applying the update is found by having only short-lived kernel threads, which are grouped into epochs: by tracking live threads in every generation, it is possible to determine when all threads using a specific object instance have terminated. The original proposal transformed every object at the time an update was loaded; later, the technique was extended to allow changes to an object’s interface and to apply updates lazily, the next time the object is referenced [BAW+07].

Linux has limited support for replacing the entire kernel using the kexec facility. However, this overwrites existing, non-persistent state and can be viewed as essentially a fast reboot of the machine. Ksplice [AK09] can patch a running Linux kernel without the need for a reboot by replacing code in the kernel at the granularity of complete functions. It uses the Linux stop_machine() call to ensure that no CPU is currently executing a function to be replaced, and places a branch instruction at the start of the obsolete function to direct execution to the replacement code. Systems like Ksplice replace individual functions across all cores at the same time. In contrast, Barrelfish/DC replaces entire kernels, but on a subset of cores at a time. Ksplice makes sense for an OS where all cores must execute in the same, shared-memory kernel and the overhead incurred by quiescing the entire machine is unavoidable.

Otherworld [DS10] also enables kernel updates without disrupting applications, with a focus on recovering from system crashes. Otherworld can microreboot the system kernel after a critical error without clobbering running applications’ state, and then attempt to restore applications that were running at the time of the crash by recreating application memory spaces, open files, and other resources.

Proteos [GKT13] uses a similar approach to Barrelfish/DC by replacing applications in their entirety instead of applying patches to existing code. In contrast to Ksplice, Proteos automatically applies state updates while preserving pointer integrity in many cases, which eases the burden on programmers to write complicated state-transformation functions. In contrast to Barrelfish/DC, Proteos does not upgrade kernel-mode code but focuses on updates for OS processes running in user-space, in a micro-kernel environment. However, much of the OS functionality in Barrelfish/DC resides in user-space as well, and the proposed techniques to automate state migration between old and new versions would also be applicable to kernel upgrades in Barrelfish/DC.

Rather than relying on a single, system-wide kernel, Barrelfish/DC exploits the multikernel environment to offer both greater flexibility and better performance: kernels and cores can be updated dynamically with (as we show in Section 2.4) negligible disruption to the rest of the OS.

2.2.3 Multikernels

Multikernels such as Akaros [RKZB11], Barrelfish, fos [WGB+10], Hive [CRD+95], and Tessellation [LKB+09] are based on the observation that modern hardware is a networked system, and so it is advantageous to model the OS as a distributed system. For example, Barrelfish runs a small kernel on each core in the system, and the OS is built as a set of cooperating processes, each running on one of these kernels, sharing no memory, and communicating via message passing. Multikernels are motivated both by the scalability advantages of sharing no cache lines between cores and by the goal of supporting future hardware with heterogeneous processors and little or no cache-coherent or shared physical memory.

Barrelfish/DC exploits the multikernel design for a new reason: dynamic and flexible management of the cores and kernels of the system. A multikernel can naturally run different versions of kernels on different cores. These versions can be tailored to the hardware, or specialized for different workloads. Furthermore, since (unlike in monolithic kernels) the state on each core is relatively decoupled from the rest of the system, multikernels are a good match for systems where cores come and go, and intuitively should support reconfiguration of part of the hardware without undue disruption to software running elsewhere on the machine. Finally, the shared-nothing multikernel architecture allows us to wrap kernel state and move it between different kernels without worrying about potentially harmful concurrent accesses.

Multikernels have been combined with traditional OS designs such as Linux [Jos10, NSN+11] so as to run multiple Linux kernels on different cores of the same machine using different partitions of physical memory, in order to provide performance isolation between applications. Popcorn Linux [She13] boots a modified Linux kernel in this fashion, and supports kernel- and user-space communication channels between kernels [SBRQ13], as well as process migration between kernels. In principle, Popcorn extended with the ideas in Barrelfish/DC could be combined with Chameleon in a two-level approach to dynamic processor support.

While their goals of security and availability differ somewhat from those of Barrelfish/DC, KeyKOS [Har85] and EROS [SSF99] use partitioned capabilities to provide an essentially stateless kernel. Memory in KeyKOS is persistent, and KeyKOS allows updates of the OS while running, achieving continuity by restoring from disk-based checkpoints of the entire capability state. Barrelfish/DC, by contrast, achieves continuity by distributing the capability system, restarting only some of the kernels at a time, and preserving each kernel’s portion of the capability system across the restart.

2.2.4 Virtualization

Virtualization techniques encapsulate a complete OS and its applications in a virtual machine (VM). Multiple VMs are then multiplexed over a set of cores by the virtual machine monitor. Nothing in our work precludes or requires virtualization techniques. Nevertheless, virtualization touches our work in two ways. First, our techniques might be used inside the hypervisor to manage the physical processors in the system. Second, the ability to specialize the kernel on certain cores achieves a similar effect to virtualization by running different operating system implementations side by side.

Library operating system approaches like Drawbridge [PBWH+11] and Bascule [BLF+13] encapsulate a single application’s state into a container which offers high-level abstractions (relative to hardware) like threads, sockets, and virtual address spaces, and communicates with the OS over a small, well-defined API. Drawbridge- and Bascule-encapsulated programs have been shown to run on Barrelfish, but the underlying OS (whether Barrelfish or Windows) views a library OS as no different from any other application, and so Drawbridge operates at a very different level of abstraction from our techniques.

2.3 Design and Implementation

We now describe how Barrelfish/DC decouples cores, kernels, and the rest of the OS. We focus entirely on mechanism, and so do not address scheduling and policies for kernel replacement, core power management, or application migration. Note also that our main motivation in Barrelfish/DC is adapting the OS for performance and flexibility, and so we do not consider fault tolerance and isolation for now. We first describe how Barrelfish/DC boots a new core, and then present in stages the problem of per-core state when removing a core, discussing the Barrelfish/DC capability system and kernel control block. We then discuss the challenges of time and interrupts, and finish with a discussion of the wider implications of the design.

2.3.1 Booting a new core

Current CPU hotplug approaches assume a single, shared kernel and a homogeneous (albeit NUMA) machine with a variable number of active cores up to a fixed limit, and so a static in-kernel table of cores (whether active or inactive) suffices to represent the current hardware state. Bringing a core online is a question of turning it on, updating this table, and creating per-core state when needed. Previous versions of Barrelfish also adopted this approach, and booted all cores during system initialization, though there has been experimental work on dynamic booting of heterogeneous cores [Men11].

Barrelfish/DC targets a broader hardware landscape, with complex machines comprising potentially heterogeneous cores. Furthermore, since Barrelfish/DC runs a different kernel instance on each core, there is no reason why the same kernel code should run everywhere. In fact, we evaluate some benefits of not doing so more closely in Chapter 3. For this reason, we argue for an OS representation of a core on the machine which abstracts the hardware-dependent mechanisms for bringing that core up (with some kernel) and down. Barrelfish/DC therefore introduces the concept of a boot driver: a piece of code that runs on a “home core”, manages a “target core”, and encapsulates the hardware functionality to boot, suspend, resume, and power down the latter. Currently, boot drivers run as processes, but they closely resemble device drivers and could equally run as software objects within another process. A new core is brought online as follows:

1. The new core is detected by some platform-specific mechanism (e.g., ACPI) and its appearance registered with the device management subsystem.

2. Barrelfish/DC selects and starts an appropriate boot driver for the new core.

3. Barrelfish/DC selects a kernel binary and arguments for the new core, and directs the boot driver to boot the kernel on the core.

4. The boot driver loads and relocates the kernel, and executes the hardware protocol to start the new core.

5. The new kernel initializes and uses existing Barrelfish protocols for integrating into the running OS.
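To make the division of labor concrete, the following is a minimal C sketch of a boot driver’s bring-up path. Every function and type name here is a hypothetical illustration of the steps above, not the actual Barrelfish/DC interface:

    /* Hypothetical sketch of a boot driver bringing a target core online.
     * All names below are illustrative; the real Barrelfish/DC code differs. */
    #include <stddef.h>

    typedef unsigned int coreid_t;
    struct elf_image { size_t size; /* ... */ };
    struct kcb;                      /* kernel control block, Section 2.3.4 */

    struct elf_image *load_kernel_binary(const char *path);
    void             *alloc_kernel_memory(size_t bytes);
    void             *relocate_kernel(struct elf_image *img, void *base);
    struct kcb       *alloc_kcb(void);
    int               arch_start_core(coreid_t target, void *entry,
                                      struct kcb *kcb, const char *args);

    int boot_core(coreid_t target, const char *kernel_path, const char *args)
    {
        /* Steps 3 and 4: load and relocate the selected kernel image. */
        struct elf_image *img = load_kernel_binary(kernel_path);
        if (img == NULL)
            return -1;
        void *entry = relocate_kernel(img, alloc_kernel_memory(img->size));

        /* A fresh KCB for the new kernel instance. */
        struct kcb *kcb = alloc_kcb();

        /* Step 4, continued: execute the hardware protocol to start the
         * core (e.g., INIT/SIPI on x86), handing over the entry point,
         * the KCB address, and the kernel command line. */
        return arch_start_core(target, entry, kcb, args);
    }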

The boot driver abstraction treats CPU cores much like peripheral devices, and allows us to reuse the OS’s existing device and hotplug management infrastructure [ZSR12] to handle new cores and to select drivers and kernels for them. It also separates the hardware-specific mechanism for booting a core from the policy question of which kernel binary to boot the core with. Boot drivers remove most of the core boot process from the kernel: in Barrelfish/DC we have entirely replaced the existing multiprocessor booting code for multiple architectures (which was spread throughout the system) with boot drivers, resulting in a much simpler system structure and reduced code in the kernels themselves.

Booting a core (and, indeed, shutting it down) in Barrelfish/DC involves only two processes: the boot driver on the home core, and the kernel on the target core. For this reason, we require no global locks or other synchronization in the system, and the performance of these operations is not impacted by load on other cores. We demonstrate these benefits experimentally in Section 2.4.1. Since a boot driver for a core requires (as with a device driver) at least one existing core to execute, there is a potential dependency problem as cores come and go. For the PC platform we focus on here, this is straightforward since any core can run a boot driver for any other core, but we note that in general the problem is the same as that of allocating device drivers to cores. Boot drivers provide a convenient abstraction of hardware and are also used to shut down cores, but this is not the main challenge in removing a core from the system.

2.3.2 Per-core state

Taking a core out of service in a modern OS is a more involved process than booting one, since modern multicore OSes include varying amounts of per-core kernel state. If they did not, removing a core would simply require migrating any running threads elsewhere, updating the scheduler, and halting the core.


The challenge is best understood by drawing a distinction between the global state in an OS kernel (i.e., the state which is shared between all running cores in the system) and the per-core state, which is only accessed by a single core. The kernel state of any OS is composed of these two categories. In older versions of Unix, for example, all kernel state was global and protected by locks. In practice, however, a modern OS keeps per-core state for scalability of scheduling, memory allocation, virtual memory, etc. Per-core data structures reduce write sharing of cache lines, which in turn reduces interconnect traffic and the cache miss rate due to coherency misses. For example, Linux and Windows use per-core scheduling queues and distributed memory allocators. Corey [BWCC+08] allowed configurable sharing of page tables between cores, and many Linux scaling enhancements (e.g., [BWCM+10]) have been of this form. K42 [ADK+07] adopted reduced sharing as a central design principle, and introduced the abstraction of clustered objects, essentially global proxies for pervasive per-core state. Multikernels like Barrelfish [BBD+09] push this idea to its logical conclusion, sharing no data (other than message channels) between cores. Multikernels are an extreme point in the design space, but are useful for precisely this reason: they highlight the problem of consistent per-core state in modern hardware. As core counts increase, we can expect the percentage of OS state that is distributed in more conventional OSes to increase.

Shutting down a core therefore entails disposing of this state without losing information or violating system-wide consistency invariants. This may impose significant overhead. For example, Chameleon [PS12] devotes considerable effort to ensuring that per-core interrupt handling state is consistent across CPU reconfiguration. As more state becomes distributed, this overhead will increase. Worse, how to dispose of this state depends on what it is: removing a per-core scheduling queue means migrating threads to other cores, whereas removing a per-core memory allocator requires merging its memory pool with another allocator elsewhere. Rather than implementing a succession of piecemeal solutions to this problem, in Barrelfish/DC we adopt the radical approach of lifting all the per-core OS state out of the kernel, so that it can be reclaimed lazily without delaying the rest of the OS. This design provides the means to completely decouple per-core state from both the underlying kernel implementation and the core hardware.


[Figure 2.2: State in the Barrelfish/DC OSnode. The OSnode (§ 2.3.2) is captured by the partitioned capability system (§ 2.3.3): the KCB (§ 2.3.4) roots the scheduler state, the capability derivation tree, the timer offset, and the IRQ state, which in turn reference the frames, CNodes, and PCBs that make up the kernel state.]

We find it helpful to use the term OSnode to denote the total state of an OS kernel local to a particular core. In Linux the OSnode changes with different versions of the kernel; Chameleon identifies this state by manual annotation of the kernel source code. In Barrelfish, the OSnode is all the state – there is no shared global data.

2.3.3 Capabilities in Barrelfish/DC

Barrelfish/DC captures the OSnode using its capability system: all memory and other resources maintained by the core (including interrupts and communication end-points) are represented by capabilities, and thus the OSnode is represented by the capability set of the core. The per-core state of Barrelfish/DC is shown schematically in Figure 2.2.

Barrelfish/DC’s capability system, an extension of that in Barrelfish [SKN13], is derived from the partitioned capability scheme used in seL4 [EDE08, KEH+09, Tea06]. In seL4 (and Barrelfish), all regions of memory are referred to by capabilities, and capabilities are typed to reflect what the memory is used for. For example, a “frame” capability refers to memory that the holder can map into their address space, while a “c-node” capability refers to memory that is used to store the bit representations of capabilities themselves. The security of the system as a whole derives from the fact that only a small, trusted computing base (the kernel) holds both a frame capability and a c-node capability to the same memory, and can therefore fabricate capabilities. A capability for a region can be split into two smaller regions, and also retyped according to a set of system rules that preserve integrity. Initially, memory regions are of type “untyped”, and must be explicitly retyped to “frame”, “c-node”, or some other type. This approach has the useful property that a process can allocate memory without being able to access its contents.
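The allocate-without-access property can be illustrated with a short sketch; the call names below are ours, loosely in the style of partitioned capability systems, and not the actual Barrelfish or seL4 API:

    /* Illustrative retype pattern in a partitioned capability system.
     * All names are hypothetical. */
    typedef unsigned long cap_t;
    enum cap_type { CAP_UNTYPED, CAP_FRAME, CAP_CNODE };

    cap_t request_untyped(unsigned long bytes);          /* from the memory server */
    void  cap_split(cap_t region, cap_t out[2]);         /* two smaller regions */
    cap_t cap_retype(cap_t region, enum cap_type type);  /* checked by system rules */

    void allocate_objects(void)
    {
        cap_t untyped = request_untyped(2 * 4096);
        cap_t half[2];
        cap_split(untyped, half);

        /* A frame the holder may later map into its address space... */
        cap_t frame = cap_retype(half[0], CAP_FRAME);

        /* ...and a c-node whose memory stores capability representations.
         * The process holds this capability but can never map or touch the
         * underlying memory: allocation without access. */
        cap_t cnode = cap_retype(half[1], CAP_CNODE);

        (void)frame; (void)cnode;
    }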

This property is used in seL4 to remove any dynamic memory allocation from the kernel, greatly simplifying both the formal specification of the kernel and its subsequent proof [EDE08]. All kernel objects (such as process control blocks or page tables) are allocated by user-level processes which cannot themselves access them directly. A key insight of Barrelfish/DC is that this approach can externalize the kernel state entirely, as follows.

2.3.4 Kernel Control Blocks

In developing Barrelfish/DC, we examined the Barrelfish kernel to identify all the data structures which were not direct (optimized) derivations of information already held in the capability tree (and which could therefore be reconstructed dynamically from the tree). We then eliminated from this set any state that did not need to persist across a kernel restart. For example, the runnable state and other scheduling parameters of a process [2] are held in the process’ control block, which is part of the capability system. However, the scheduler queues themselves do not need to persist across a change of kernel, since (a) any scheduler will need to recalculate them based on the current time, and (b) the new scheduler may have a completely different policy and associated data structures anyway. What remained was remarkably small; it consists of:

• The minimal scheduling state: the head of a linked list of process control blocks.

• Interrupt state. We discuss interrupts in Section 2.3.8.

• The root of the capability derivation tree, from which all the per-core capabilities can be reached.

• The timer offset, discussed in Section 2.3.7.

[2] Technically, it is a Barrelfish “dispatcher”, the core-local representation of a process. A process usually consists of a set of distinct “dispatchers”, one in each OSnode.

In Barrelfish/DC, we introduce a new memory object, the Kernel Control Block (KCB), and an associated capability type, holding this data in a standard format. The KCB is small: for 64-bit x86 it is about 28 KiB in size, almost all of which is used by communication endpoints for interrupts.
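Schematically, the KCB can be pictured as a small C structure along the following lines. This layout is our illustration only; the actual Barrelfish/DC definition differs, and the interrupt endpoint array accounts for almost all of the roughly 28 KiB:

    /* Illustrative layout of a Kernel Control Block; names and sizes are ours. */
    #include <stdint.h>

    #define MAX_IRQS 256

    struct dcb;                                /* dispatcher (process) control block */
    struct cte;                                /* capability table entry */
    struct endpoint { uint64_t words[12]; };   /* placeholder message binding */

    struct kcb {
        struct dcb     *sched_head;            /* minimal scheduling state: head of
                                                  the list of dispatcher control blocks */
        struct cte     *cap_root;              /* root of the capability derivation tree */
        uint64_t        timer_offset;          /* Section 2.3.7 */
        uint64_t        shutdown_time;         /* local time recorded at kernel halt */
        struct endpoint irq_endpoints[MAX_IRQS]; /* Section 2.3.8; dominates the size */
    };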

2.3.5 Replacing a kernel

The KCB effectively decouples the per-core OS state from the kernel. This allows Barrelfish/DC to shut down a kernel on a core (under the control of the boot driver running on another core) and replace it with a new one. The currently running kernel saves a small amount of persistent state in the KCB, and halts the core. The boot driver then loads a new kernel with an argument supplying the address of the KCB. It then restarts the core (using an IPI on x86 machines), causing the new kernel to boot. This new kernel then initializes any internal data structures it needs from the KCB and the OSnode capability database.

The described technique allows for arbitrary updates of kernel-mode code. By design, the kernel does not access state in the OSnode concurrently; therefore, the OSnode is guaranteed to be in a quiescent state before we shut down a core. The simplest case for updates requires no changes in any data structures reachable by the KCB and can be performed as described, by simply replacing the kernel code. Updates that require a transformation of the data structures may require a one-time adaptation function to execute during initialization, whose overhead depends on the complexity of the function and the size of the OSnode. The worst-case scenario is one that requires additional memory, since the kernel by design delegates dynamic memory allocation to userspace. As we show in Section 2.4, replacing a kernel can be done with little performance impact on processes running on the core, even device drivers.
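Putting the pieces together, the boot-driver side of a kernel replacement might look like the following hedged outline. The names are again hypothetical; prepare_kernel stands for the load-and-relocate steps of the earlier boot sketch:

    /* Hypothetical outline of replacing the kernel under a live OSnode. */
    typedef unsigned int coreid_t;
    struct kcb;

    void *prepare_kernel(const char *path);   /* load + relocate, as in boot_core() */
    void  halt_core(coreid_t target);         /* old kernel saves state to the KCB */
    int   arch_start_core(coreid_t target, void *entry,
                          struct kcb *kcb, const char *args);

    int replace_kernel(coreid_t target, struct kcb *kcb, const char *new_kernel)
    {
        /* Prepare first to minimize downtime: the old kernel keeps running
         * while the new image is loaded and relocated. */
        void *entry = prepare_kernel(new_kernel);
        if (entry == NULL)
            return -1;

        /* The running kernel persists its small persistent state (timer
         * offset, ...) into the KCB and halts the core. */
        halt_core(target);

        /* Restart the core (an IPI on x86) into the new kernel, passing the
         * address of the existing KCB; the new kernel rebuilds its internal
         * structures from the KCB and the OSnode capability database. */
        return arch_start_core(target, entry, kcb, NULL);
    }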


2.3.6 Kernel sharing and core shutdown

As we mentioned above, taking a core completely out of service involves not simply shutting down the kernel, but also disposing of or migrating all the per-core state on the core, and this can take time. Like Chameleon, Barrelfish/DC addresses this problem by deferring it: we immediately take the core down, but keep the OSnode running in order to be able to dismantle it lazily. To facilitate this, we created a new kernel which is capable of multiplexing several KCBs (using a simple extension to the existing scheduler). Performance of two active OSnodes sharing a core is strictly best-effort, and is not intended to be used for any case where application performance matters. Rather, it provides a way for an OSnode to be taken out of service in the background, after the core has been shut down. Note that there is no need for all cores in Barrelfish/DC to run this multiplexing kernel, or, indeed, for any cores to run it when it is not being used: it can simply replace an existing kernel on demand. In practice, we find that there is no performance loss when running a single KCB above a multiplexing kernel.

Decoupling kernel state allows attaching and detaching KCBs from a running kernel. The entry point for kernel code takes a KCB as an argument. When a new kernel is started, a fresh KCB is provided to the kernel code. To restart a kernel, the KCB is detached from the running kernel code, the core is shut down, and the KCB is provided to the newly booted kernel code. We rely on shared physical memory when moving OSnodes between cores. This goes against the original multikernel premise that assumes no shared memory between cores. However, an OSnode is still always in use by strictly one core at a time; therefore, the benefits of avoiding concurrent access to OSnode state remain. The combination of state externalization via the KCB and kernel sharing on a single core has a number of further applications, which we describe in Section 2.3.10.
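The scheduler extension for multiplexing KCBs is conceptually tiny. The following is our hedged sketch (names and the ring structure are illustrative; the 20 ms quantum matches the configuration used later in Section 2.4.2):

    /* Sketch: round-robin time-multiplexing of attached KCBs. */
    struct dcb;
    struct kcb {
        struct kcb *next;          /* attached KCBs form a ring */
        struct dcb *sched_head;    /* per-OSnode scheduling state */
        /* ... */
    };

    #define KCB_QUANTUM_MS 20      /* two scheduler time slices, Section 2.4.2 */

    static struct kcb *current_kcb;

    void program_timer(int ms);
    void dispatch(struct dcb *runqueue);

    void kcb_quantum_expired(void)
    {
        current_kcb = current_kcb->next;     /* switch to the next OSnode */
        program_timer(KCB_QUANTUM_MS);
        dispatch(current_kcb->sched_head);   /* schedule from its runqueue */
    }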

2.3.7 Dealing with time

One of the complicating factors in starting the OSnode with a new kernel is the passage of time. Each kernel maintains a per-core internal clock (based on a free-running timer, such as the local APIC), and expects this to increase monotonically. The clock is used for per-core scheduling and other time-sensitive tasks, and is also available to application threads running on the core via a system call. Unfortunately, the hardware timers used are rarely synchronized between cores. Some hardware platforms (for example, modern PCs) define these timers to run at the same rate on every core (regardless of power management), but they may still be offset from each other. On other hardware platforms, these clocks may simply run at different rates between cores.

In Barrelfish/DC we address this problem with two fields in the KCB. The first holds a constant offset from the local hardware clock; the OS applies this offset whenever the current time value is read. The second field is set to the current local time when the kernel is shut down. When a new kernel starts with an existing KCB, the offset field is reinitialized to the difference between this old time value and the current hardware clock, ensuring that local time for the OSnode proceeds monotonically.
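In code, the two-field scheme reduces to a few lines. This is our condensed sketch, not the Barrelfish/DC source:

    /* Sketch: keeping the per-OSnode clock monotonic across kernel restarts. */
    #include <stdint.h>

    uint64_t read_hw_timer(void);       /* free-running timer, e.g. local APIC */

    static struct {
        uint64_t timer_offset;          /* applied on every clock read */
        uint64_t shutdown_time;         /* local time saved at shutdown */
    } kcb_time;

    uint64_t current_time(void)
    {
        return read_hw_timer() + kcb_time.timer_offset;
    }

    void on_kernel_shutdown(void)
    {
        kcb_time.shutdown_time = current_time();   /* persist into the KCB */
    }

    void on_kernel_start_with_existing_kcb(void)
    {
        /* Local time resumes exactly where it stopped, so it proceeds
         * monotonically even if the new core's timer differs. */
        kcb_time.timer_offset = kcb_time.shutdown_time - read_hw_timer();
    }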

2.3.8 Dealing with interrupts

Interrupts pose an additional challenge when moving an OSnode between cores. It is important that interrupts from hardware devices are always routed to the correct kernel. In Barrelfish, interrupts are mapped to messages delivered to processes running on the target core. Some interrupts (such as those from network cards) should “follow” the OSnode to its new core, whereas others should not. We identify three categories of interrupt.

1. Interrupts which are used exclusively by the kernel, for example a local timer interrupt used to implement preemptive scheduling. Handling these interrupts is internal to the kernel, and their sources are typically per-core hardware devices like APICs or performance counters. In this case, there is no need to take additional actions when reassigning KCBs between cores.

2. Inter-processor interrupts (IPIs), typically used for asynchronous communication between cores. Barrelfish/DC uses an indirection table that maps OSnode identifiers to the physical core running the corresponding kernel (see the sketch after this list). When one kernel sends an IPI to another, it uses this table to obtain the hardware destination address for the interrupt. When detaching a KCB from a core, its entry is updated to indicate that its kernel is unavailable. Similarly, attaching a KCB to a core updates the location to the new core identifier.

3. Device interrupts, which should be forwarded to a specific core (e.g., via IOAPICs and PCIe bridges) running the handler for the device’s driver.
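For the IPI case, the indirection table from category 2 can be sketched as follows (names are ours, not the Barrelfish/DC source):

    /* Sketch of the OSnode-to-core indirection table used when sending IPIs. */
    typedef unsigned int coreid_t;
    typedef unsigned int osnode_id_t;

    #define MAX_OSNODES   64
    #define KCB_DETACHED  ((coreid_t)-1)

    static coreid_t osnode_location[MAX_OSNODES];

    void arch_send_ipi(coreid_t core, int vector);

    void send_ipi_to_osnode(osnode_id_t dst, int vector)
    {
        coreid_t core = osnode_location[dst];  /* resolved at send time */
        if (core == KCB_DETACHED)
            return;                            /* kernel currently unavailable */
        arch_send_ipi(core, vector);
    }

    /* Updated whenever a KCB is detached from, or attached to, a core. */
    void osnode_detached(osnode_id_t id)             { osnode_location[id] = KCB_DETACHED; }
    void osnode_attached(osnode_id_t id, coreid_t c) { osnode_location[id] = c; }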

When Barrelfish/DC device drivers start up, they request forwarding of device interrupts by providing two capability arguments to their local kernel: an opaque interrupt descriptor (which conveys authorization to receive the interrupt) and a message binding. The interrupt descriptor contains all the architecture-specific information about the interrupt source needed to route the interrupt to the right core. The kernel associates the message binding with the architectural interrupt and subsequently forwards interrupts to the message channel.

For the device and the driver to continue normal operation after a move, the interrupt needs to be re-routed to the new core, and a new mapping set up for the (existing) driver process. This could be done either transparently by the kernel or explicitly by the device driver. We choose the latter approach to simplify the kernel. When a Barrelfish/DC kernel shuts down, it disables all interrupts. When a new kernel subsequently resumes an OSnode, it sends a message (via a scheduler upcall) to every process which had an interrupt registered. Each driver process responds to this message by re-registering its interrupt, and then checking with the device directly to see if any events have been missed in the meantime (ensuring any race condition is benign). In Section 2.4.2.1 we show the overhead of this process.
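From a driver’s perspective, the resume protocol is small. A hedged sketch with hypothetical names:

    /* Hypothetical driver-side handling of the re-registration upcall a new
     * kernel issues when it resumes an OSnode. */
    typedef unsigned long cap_t;

    extern cap_t irq_descriptor;     /* authorization to receive the interrupt */
    extern cap_t message_binding;    /* channel the kernel forwards IRQs to */

    void irq_register(cap_t desc, cap_t binding);   /* kernel call */
    int  device_has_pending_events(void);           /* read device registers */
    void handle_device_events(void);

    void on_reregister_upcall(void)
    {
        /* Re-install interrupt forwarding on the new core. */
        irq_register(irq_descriptor, message_binding);

        /* The device may have raised events while the OSnode was offline;
         * poll it directly so any lost interrupt stays a benign race. */
        if (device_has_pending_events())
            handle_device_events();
    }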

2.3.9 Application support

From the perspective of applications which are oblivious to the allocation of physical cores (and which deal solely with threads), the additional functionality of Barrelfish/DC is completely transparent. However, many applications such as language runtimes and database systems deal directly with physical cores, and tailor their scheduling of user-level threads accordingly. For these applications, Barrelfish/DC can use the existing scheduler activation [ABLL91] mechanism for process dispatch in Barrelfish to notify userspace of changes in the number of online processors, much as it can already convey the allocation of physical cores to applications.


2.3.10 Discussion

From a broad perspective, the combination of boot drivers and replaceable kernels is a radically different view of how an OS should manage processors on a machine. Modern general-purpose kernels such as Linux try to support a broad set of requirements by implementing different behaviors based on build-time and run-time configuration. Barrelfish/DC offers an alternative: instead of building complicated kernels that try to do many things, build simple kernels that do one thing well. While Linux selects a single kernel at boot time for all cores, Barrelfish/DC not only allows selecting per-core kernels, but also changing this selection on-the-fly.

When replacing kernels, Barrelfish/DC assumes that the OSnode format (in particular, the capability system) remains unchanged. If the in-memory format of the capability database changes, then the new kernel must perform a one-time format conversion when it boots. It is unclear how much of a limitation this is in practice, since the capability system of Barrelfish has changed relatively little since its inception, but one way to mitigate the burden of writing such a conversion function is to exploit the fact that the format is already specified in a domain-specific, high-level language called Hamlet [DBR09], and to derive the conversion function automatically. By design, the Barrelfish kernel does not support dynamic memory allocation; therefore, a transformation that requires more memory to represent the new OSnode format would be difficult to support. The kernel would have to rely on the applications to allocate memory on its behalf during the conversion.

The ability to exchange the entire kernel code quickly is of course limited by the fact that the applications already running as part of an OSnode expect the same ABI as before the update. The ABI includes the existing capability types, among other things. However, it is fairly trivial to add new capability types for future applications, or to fix bugs in the existing implementation. Ultimately, Barrelfish/DC could be coupled with user-space update mechanisms such as Proteos [GKT13] to make the system more versatile, for example to also apply updates to the monitor service.

Barrelfish/DC currently assumes cache-coherent cores, where the OS state (i.e., the OSnode) can easily be migrated between cores by passing physical addresses. The lack of cache-coherency can be handled with cache flushes during OSnode migrations, but on hardware platforms without shared memory, or with different physical address spaces on different cores, the OSnode might require considerable transformation to migrate between cores. The Barrelfish/DC capability system does contain all the information necessary to correctly swizzle pointers when copying the OSnode between nodes, but the copy or serialization is likely to be expensive, and dealing with shared-memory application state (which Barrelfish fully supports outside the OS) is a significant challenge. Similar problems arise when the OSnode is migrated between heterogeneous processors (e.g., with different endianness). The current OSnode format is underspecified in this regard. However, the OSnode data structures, including many accessors, are already generated C code based on the Hamlet specification, which means such constraints can in principle be handled by expanding the DSL compiler. There is no requirement for the boot driver to share memory with its target core, as long as it has a mechanism for loading a kernel binary into the latter's address space (e.g., via DMA) and controlling the core itself.

In case an OSnode is to be disposed of entirely, the system needs to dismantle it. This means revoking all resources belonging to the OSnode, which also implies stopping the execution of all dispatchers. Barrelfish/DC relies on an up-call mechanism to notify applications about an impending OSnode disappearance. The application can then take the necessary steps to migrate away from the affected OSnode and inform its counterparts on other cores. In practice, this can bring significant complexity for certain applications, but can also be a no-op depending on the software design. It is possible to adopt solutions similar to process migration in distributed systems or virtual machines, a problem that has received a lot of attention in the past [MDP+00]. In addition, such an event should be rare in Barrelfish/DC, as an OSnode can be parked temporarily with minimal overhead.

While Barrelfish/DC applications are notified when the core set they are running on changes (via the scheduler activations mechanism), they are currently insulated from knowledge about hardware core reconfigurations. However, there is no reason why this must always be the case. There may be applications (such as databases, or language runtimes) which can benefit from being notified about such changes to the running system, and we see no reason to hide this information from applications which can exploit it.

2.4 Evaluation

We present here a performance evaluation of Barrelfish/DC. First (Section 2.4.1), we measure the performance of starting and stopping cores in Barrelfish/DC and in Linux.


Name             Memory    Processors              Freq.
2×2 Santa-Rosa   8 GiB     2x2c Opteron 2200       2.8 GHz
4×4 Shanghai     16 GiB    4x4c Opteron 8380       2.5 GHz
2×10 IvyBridge   256 GiB   2x10c Xeon E5-2670 v2   2.5 GHz
1×4 Haswell      32 GiB    1x4c Xeon E3-1245 v3    3.4 GHz

Table 2.1: Architectural details of different systems we use in our evaluation.

Second (Section 2.4.2), we investigate the behavior of applications when we restart kernels and when we park OSnodes. We perform experiments on the set of x86 machines shown in Table 2.1. Hyperthreading, TurboBoost, and SpeedStep technologies are disabled on machines that support them, as they complicate cycle counter measurements. TurboBoost and SpeedStep can change the processor frequency in unpredictable ways, leading to high fluctuation in repeated experiments. The same is true for Hyperthreading, due to the sharing of hardware logic between logical cores. However, TurboBoost and Hyperthreading are both relevant for this work as discussed in Section 2.

2.4.1 Core management operations

In this section, we evaluate the performance of managing cores in Barrelfish/DC, and also in Linux using the CPU Hotplug facility [Ash]. We consider two operations: shutting down a core (down) and bringing it back up again (up). Bringing up a core in Linux is different from bringing up a core in Barrelfish/DC. In Barrelfish/DC, each core executes a different kernel which needs to be loaded by the boot driver, while in Linux all cores share the same code. Furthermore, because cores share state in Linux, core management operations require global synchronization, resulting in stopping application execution on all cores for an extended period of time [GMG12]. Stopping cores is also different between Linux and Barrelfish/DC. In Linux, applications executing on the halting core need to be migrated to other online cores before the shutdown can proceed, while in Barrelfish/DC we typically would move a complete OSnode after the shutdown and not individual applications. In Barrelfish/DC, the down time is the time it takes the boot driver to send an appropriate IPI to the core to be halted, plus the propagation time of the IPI and the cost of the IPI handler on the receiving core. For the up operation we take two measurements: the boot driver cost to prepare a new kernel up until (and including) the point where it sends an IPI to the starting core (driver), and the cost in the booted core from the point it wakes up until the kernel is fully online (core).

Barrelfish/DC:
                      idle                           load
                  down         up (ms)           down         up (ms)
                  (µs)      driver   core        (µs)       driver    core
2×2 Santa-Rosa   2.7 / —[a]    29     1.2      2.7 / —      34 ± 17    1.2
4×4 Shanghai     2.3 / 2.6     24     1.0      2.3 / 2.7    46 ± 76    1.0
2×10 IvyBridge   3.5 / 3.7     10     0.8      3.6 / 3.7    23 ± 52    0.8
1×4 Haswell      0.8 / —[a]     7     0.5      0.8 / —       7 ± 0.1   0.5

Linux:
                      idle                      load
                  down (ms)   up (ms)       down (ms)      up (ms)
2×2 Santa-Rosa   131 ± 25     20 ± 1       5049 ± 2052     26 ± 5
4×4 Shanghai     104 ± 50     18 ± 3       3268 ± 980      18 ± 3
2×10 IvyBridge    62 ± 46     21 ± 7       2265 ± 1656     23 ± 5
1×4 Haswell       46 ± 40     14 ± 1       2543 ± 1710     20 ± 5

The same results in cycles (down columns ×10³; all other columns ×10⁶):

Barrelfish/DC:
2×2 Santa-Rosa   8 / —         85     3.4      8 / —        97 ± 49    3.5
4×4 Shanghai     6 / 6         63     2.6      6 / 7       115 ± 192   2.6
2×10 IvyBridge   9 / 10        27     2.1      9 / 10       59 ± 133   2.1
1×4 Haswell      3 / —         26     1.9      2.9 / —      26 ± 0.40  2.0

Linux:
2×2 Santa-Rosa   367 ± 41      56 ± 2.0    14139 ± 5700    74 ± 21
4×4 Shanghai     261 ± 127     44 ± 2.0     8170 ± 2452    46 ± 8
2×10 IvyBridge   155 ± 116     53 ± 2.0     5663 ± 4141    57 ± 12
1×4 Haswell      156 ± 137     50 ± 0.5     8647 ± 5816    69 ± 16

Table 2.2: Performance of core management operations for Barrelfish/DC and Linux (3.13) when the system is idle and when the system is under load. For the Barrelfish/DC down columns, the value after the slash shows the cost of stopping a core on another socket with regard to the boot driver. [a] We do not include this number for Santa-Rosa because it lacks synchronized timestamp counters, nor for Haswell because it has only a single package.

In Linux, we measure the latency of starting or stopping a core using the log entries of the smpboot module and a sentinel line echoed to /dev/kmsg. For core shutdown, smpboot reports when the core becomes offline, and we insert the sentinel right before the operation is initiated. For core boot, smpboot reports when the operation starts, so we insert the sentinel line right after the operation completes. For both Barrelfish/DC and Linux we consider two cases: an idle system (idle), and a system with all cores under load (load). In Linux, we use the stress tool [str] to spawn a number of workers equal to the number of cores that continuously execute the sync system call. In Barrelfish/DC, since the file-system is implemented as a user-space service, we spawn an application that continuously performs memory management system calls on each core of the system.
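For reference, a sentinel of this kind can be injected with a few lines of C (a sketch; in practice a shell echo to /dev/kmsg works equally well, the tag string is arbitrary, and the program must have permission to write /dev/kmsg):

    /* Write a sentinel into the kernel log so its timestamp can be
       correlated with the messages emitted by smpboot. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "hotplug-bench: sentinel\n";   /* arbitrary tag */
        int fd = open("/dev/kmsg", O_WRONLY);
        if (fd < 0)
            return 1;
        write(fd, msg, strlen(msg));
        close(fd);
        return 0;
    }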


Table 2.2 summarizes our results. We show both time (msecs and µsecs) and cycle counter units for convenience. All results are obtained by repeating the experiment 20 times and calculating the mean value. We include the standard deviation where it is non-negligible.

Stopping cores: The cost of stopping cores in Barrelfish/DC ranges from 0.8 µs (Haswell) to 3.5 µs (IvyBridge). Barrelfish/DC does not share state across cores, and as a result no synchronization between cores is needed to shut one down. Furthermore, Barrelfish/DC's shutdown operation consists of sending an IPI, which will cause the core to stop after a minimal operation on the KCB (saving the timer offset). In fact, the cost of stopping a core in Barrelfish/DC is small enough to observe the increased cost of sending an IPI across sockets, leading to an increase of 5% in stopping time on IvyBridge and 11% on Shanghai. These numbers are shown in Table 2.2, in the Barrelfish/DC down columns after the slash. As these measurements rely on timestamp counters being synchronized across packages, we are unable to present the cost increase of a cross-socket IPI on the Santa-Rosa machine, whose timestamp counters are only synchronized within a single package. In stark contrast, the cost of shutting down a core in Linux ranges from 46 ms to 131 ms. More importantly, the shutdown cost in Linux explodes when applying load, while it generally remains the same for Barrelfish/DC. For example, the average time to power down a core in Linux on Haswell increases by a factor of 55 when we apply load.

Starting cores: For Barrelfish/DC, the setup cost in the boot driver (driver) dominates the cost of starting a core (core). Figure 2.3 shows a breakdown of the costs for bringing up a core on all micro-architectures. Starting core corresponds to the core column in Table 2.2, while the rest corresponds to operations performed by the boot driver: loading the image from storage, allocating memory, ELF loading and relocation, etc. Loading the kernel from the file system is the most expensive operation. If multiple cores are booted with the same kernel, this image can be cached, significantly improving the time to start a core, as shown in the second bar in Figure 2.3. We note that the same costs will dominate the restart operation, since shutting down a core has negligible cost compared to bringing it up. Downtime can be minimized by first doing the necessary preparations in the boot driver and only then halting and restarting the core. Even though Barrelfish/DC has to prepare the kernel image, when idle, the cost of bringing up a core for Barrelfish/DC is similar to the Linux cost (Barrelfish/DC is faster on our Intel machines, while the opposite is true for our AMD machines). Bringing a core up can take from 7 ms (Barrelfish/DC/Haswell) to 29 ms (Barrelfish/DC/Santa-Rosa).


[Figure 2.3: Breakdown of the cost of bringing up a core for various machines: (a) 2×2 Santa-Rosa, (b) 4×4 Shanghai, (c) 2×10 IvyBridge, (d) 1×4 Haswell. For each machine, bars show the time (ms) with and without fetching the kernel image, broken down into VFS load, malloc (CPU driver), malloc (monitor), ELF loading, ELF relocation, sending IPIs, and starting the core.]


Load affects the cost of booting up a core to varying degrees. In Linux, such an effect is not observed on the Shanghai machine, while on the Haswell machine load increases the average start time by 33%. The effect of load when starting cores is generally stronger in Barrelfish/DC (e.g., on IvyBridge the cost is more than doubled) because the boot driver time-shares its core with the load generator.

Overall, Barrelfish/DC has minimal overhead stopping cores. For starting cores, results vary significantly across different machines, but the cost of bringing up cores in Barrelfish/DC is comparable to the respective Linux cost.

2.4.2 Applications

In this section, we evaluate the behavior of real applications under two core management operations: restarting, where we update the core's kernel as the application runs, and parking. For parking, we run the application on a core with a normal kernel and then move its OSnode onto a multi-KCB kernel running on a different core. While the application is parked, it shares that core with another OSnode. We use a naive multi-KCB kernel that runs each KCB for 20 ms, which is twice the scheduler time slice. Finally, we move the application back to its original core. In all experiments, the application starts by running alone on its core. All experiments were executed on the Haswell machine from Table 2.1.

2.4.2.1 Ethernet driver

Our first application is a Barrelfish NIC driver for the Intel 82574 chipset, which we modify for Barrelfish/DC to re-register its interrupts when instructed by the kernel (see Section 2.3.8). During the experiment we use ping from a client machine to send ICMP echo requests to the NIC. We run ping as root with the -A switch, where the inter-packet intervals adapt to the round-trip time. The ping manual states: “on networks with low rtt this mode is essentially equivalent to flood mode.”

Figure 2.4a shows the effect of the restart operation on the round-trip time latency experienced by the client. Initially, the ping latency is 0.042 ms on average with small variation. Restarting the kernel produces two outliers (packets 2307 and 2308, with an RTT of 11.1 ms and 1.07 ms, respectively). Note that 6.9 ms is the measured latency to bring up a core on the respective Haswell machine (Table 2.2).

We present latency results for the parking experiment in a timeline (Figure 2.4b) and in a cumulative distribution function (CDF) graph (Figure 2.4c). Measurements taken when the driver’s OSnode runs exclusively on a core are denoted Exclusive, while measurements where the OSnode shares the core are denoted Shared. When parking begins, we observe an initial latency spike (from 0.042 ms to 73.4 ms). The spike is caused by the parking operation, which involves sending a KCB capability reference from the boot driver to the multi-KCB kernel as a message. [3] After the initial coordination, outliers are only caused by KCB time-sharing (maximum: 20 ms, mean: 5.57 ms). After unparking the driver, latency returns to its initial levels. Unparking does not cause the same spike as parking because we do not use messages: we halt the multi-KCB kernel and directly pass the KCB reference to a newly booted kernel.

2.4.2.2 Web server

In this experiment we examine how a web server [4] that serves files over the network behaves when its core is restarted and when its OSnode is parked. We initiate a transfer on a client machine in the server’s LAN using wget, and plot the achieved bandwidth for each 50 KiB chunk when fetching a 1 GiB file. Figure 2.5a shows the results for the kernel restart experiment. The effect in this case is negligible on the client side: we were unable to pinpoint the exact location of the update from the data measured on the client, and the actual download times during kernel updates were indistinguishable from a normal download. As expected, parking leads to a number of outliers caused by KCB time-sharing (Figures 2.5b and 2.5c). The average bandwidth before parking is 113 MiB/s with a standard deviation of 9 MiB/s, whereas during parking the average bandwidth is slightly lower at 111 MiB/s with a higher standard deviation of 19 MiB/s.

2.4.2.3 PostgreSQL

Next, we run a PostgreSQL [Wikc] database server on Barrelfish/DC, using TPC-H [Tra] data with a scaling factor of 0.01, stored in an in-memory file-system. We measure the latency of a repeated CPU-bound query (query 9 in TPC-H) issued by a client over a LAN. PostgreSQL was pinned to a single core during these experiments.

[3] We follow the Barrelfish approach, where kernel messages are handled by the monitor, a trusted OS component that runs in user-space.
[4] The Barrelfish native web server.


[Figure 2.4: Ethernet driver behavior when restarting kernels and parking OSnodes: (a) RTT per packet across a kernel restart, (b) RTT timeline while parking (Exclusive vs. Shared), (c) CDF of RTTs while parking.]


[Figure 2.5: Webserver behavior when restarting kernels and parking OSnodes: (a) bandwidth per 50 KiB chunk across a kernel restart, (b) bandwidth timeline while parking (Exclusive vs. Shared), (c) CDF of bandwidth while parking.]


Figure 2.6a shows how a restart affects client latency. Before rebooting, the average query latency is 36 ms. When a restart is performed, the first affected query has a latency of 51 ms. After a few perturbed queries, latency returns to its initial value. Figures 2.6b and 2.6c show the effect of parking the OSnode that contains the PostgreSQL server. As before, during normal operation the average latency is 36 ms. When the kernel is parked, we observe two sets of outliers: one (with more points) with a latency of about 76 ms, and one with a latency close to 56 ms. This happens because, depending on their timing, some queries wait for two KCB time slices (20 ms each) of the co-hosted kernel, while others wait only for one.

Overall, we argue that kernel restart incurs acceptable overhead for online use. Parking, as expected, causes a performance degradation, especially for latency-critical applications. This is, however, inherent in any form of resource time-sharing. Furthermore, with improved KCB-scheduling algorithms the performance degradation can be reduced or tuned (e.g., via KCB priorities).

2.5 Concluding remarks

Barrelfish/DC presents a radically different vision of how cores are exploited by an OS and the applications running above it, and implements this vision in a viable software stack: the notion that OS state, kernel code, and execution units should be decoupled and freely interchangeable. Barrelfish/DC is an OS whose design assumes that all cores are dynamic. As hardware becomes more dynamic, and scalability concerns increase the need to partition or replicate state across cores, system software will have to change its assumptions about the underlying platform and adapt to a new world with constantly shifting hardware. Barrelfish/DC offers one approach to meeting this challenge.


[Figure 2.6: PostgreSQL behavior when restarting kernels and parking OSnodes: (a) query latency across a kernel restart, (b) query latency timeline while parking (Exclusive vs. Shared), (c) CDF of query latencies while parking.]

3 A Framework for an Adaptive OS Architecture

This chapter presents Badis, an OS architecture which dynamically specializes the system software on a select subset of CPUs to run different flavors of operating systems. We will see how Badis gives users the means to have different lightweight OS stacks (including kernel, library OS and runtime) customized for particular classes of applications and workload requirements in a multi-kernel OS. The key problem that Badis addresses is simply that one size does not fit all when choosing appropriate OS resource management policies and mechanisms for a mix of diverse workloads and their often conflicting requirements. In this chapter, we will show how Badis can customize OS interfaces, mechanisms and policies to provide a more tailored service for given workload classes. While the key ideas have broad applicability, this chapter explores the ability of Badis to provide better application support through OS customization for two use-cases:

• Although rarely published, commercial database engines like Oracle and SQL Server have for decades either developed new or modified existing OS stacks with custom policies and mechanisms. The demand for specialization is becoming even more prevalent with the need to execute diverse workloads (graph processing, R, machine learning, etc.) on top of traditional query processing engines. Data appliances like SAP Hana [FCP+12] and BigDAWG [EDS+15], which concurrently serve a diverse set of workloads, must efficiently schedule heterogeneous workloads on modern multi-core machines which themselves have become increasingly complex [GARH14, LBKN14, PLTA14, LLF+16]. We show challenges in such systems in Section 3.3. Then, in Section 3.5, we use Badis to design and implement Basslet, a kernel-based runtime for parallel data processing that addresses the presented limitations.

• Despite years of development to provide real-time support in Linux and other general-purpose kernels [Cor], many users resort to specialized real-time OSes, or modified versions of general-purpose OSes [The]. As a second use-case for Badis, we demonstrate support for hard real-time applications in Section 3.6 by fully provisioning an entire core for a given program with a custom, real-time kernel.

Badis builds on the techniques of Barrelfish/DC presented in Chapter 2 and exploits its ability to dynamically replace the kernel on a core within hundreds of microseconds. This enables dynamic adaptation of the cores and the OS to changing workload requirements.

3.1 Motivation

3.1.1 Use-case: Coordinated parallel data processing

Database workloads are becoming more complex and heterogeneous. One trend is to consolidate OLTP and OLAP into a single engine [FCP+12, GAK12, KN11, EDS+15]. The results to date show that, in such systems, load interaction causes a significant performance loss [PWM+14]. This class of mixed workload data processing applications is one key motivation for Badis and where its design can give immediate benefits. Substantial work is done by researchers towards parallelization of key database op- erators [RSBJ14, KKL+09, BATO13, YRV11]. It is challenging to schedule such par- allel data-processing algorithms on modern machines due to a variety of effects ob- served in multicore: NUMA [LPM+13, LQF15], hardware islands [PLTA14], data movement across parallel operators [LBKN14, LDC+09], and spatio/temporal schedul- ing of complex query plans [GARH14]. Techniques to reduce interference typically rely on the adaption of runtimes [TMS11] or OS [ZBF10, LCX+12] or use hardware support [TLW+09, BYD+15].


Figure 3.1 illustrates how a naïve, default execution of a simple, concurrent workload on Linux can result in poor scaling, both for the clients (slower runtime) and for the data processing engine (lower throughput). The experiment measures the overall throughput obtained when concurrent clients each submit a sequence of pagerank (PR) OpenMP jobs over the LiveJournal dataset to a Linux server. The machine (4×12 Magny-Cours) has 8 NUMA nodes with 6 cores each. We add one socket (6 cores) with every additional client and, as is default practice, let OpenMP choose the level of parallelism for each job. Ideally, the throughput should increase linearly with the number of clients (the “Ideal” line). Instead, the throughput rises only slowly because the response time for each job increases (the “Linux/OpenMP” line), even though in principle each client has the same resources available as a single client (1 NUMA node, 6 cores).

Many factors contribute to this poor scaling: poor co-location of threads within a single parallel job which need to communicate, migration of threads by the OS, inappropriate selection of the degree of parallelism by the OpenMP runtime, cache pollution due to context switching on a single node between threads of different clients, straggler threads, and memory contention due to poor NUMA placement of data relative to threads. For instance, in the experiment, the OpenMP runtime assumes by default that it can use all the cores available (or the programmer sets this as a parameter). In both cases, this is done without knowing what else is happening in the machine and what resources are needed by the concurrent tasks. What looks like an optimal choice for a single job may turn out to be a sub-optimal choice for both that job and everybody else’s. From the experiment, we can see that even if over-subscription is avoided by limiting each client to six parallel threads (the “Fix parallelism” line), the achieved throughput still lags behind due to sub-optimal thread placement by the default Linux scheduler.

Similar problems exist with other mechanisms for performance isolation, like Linux containers. They rely on the system administrator or a higher-level workload manager to assign specific resources (e.g., core-pinning, disk, and network I/O provisioning), which requires a system-wide view. The OS could arbitrate in such situations, but modern conventional operating systems, like Linux, optimize for load balancing across cores [LLF+16] and do not account for how that affects the performance of the applications running atop.
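For illustration, the “Fix parallelism” configuration amounts to capping each client’s job at one NUMA node’s worth of threads, along the following lines. Here pagerank_step is a hypothetical stand-in for the real per-vertex kernel:

    /* Sketch: pinning a job's degree of parallelism to six threads (one NUMA
       node on the 4x12 Magny-Cours) instead of letting OpenMP claim all
       cores. This avoids over-subscription but, as the experiment shows,
       does not fix thread placement. */
    #include <omp.h>

    void pagerank_step(int v);   /* hypothetical per-vertex update */

    void run_pagerank(int iterations, int nvertices)
    {
        omp_set_num_threads(6);              /* cap the parallelism */
        for (int it = 0; it < iterations; it++) {
            #pragma omp parallel for schedule(static)
            for (int v = 0; v < nvertices; v++)
                pagerank_step(v);
        }
    }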


[Figure 3.1: System throughput for executing concurrent pagerank jobs: throughput (PR/min) versus number of clients for the Ideal, Basslet, Fix parallelism, and Linux/OpenMP configurations.]

3.1.2 Use-case: Eliminating OS noise

A critical issue in operating systems is guaranteeing consistent completion times for a given task. Any variability in the completion time of a task is referred to as jitter. The elimination of such jitter is critical for many different workloads. We discuss two of them in more detail:

• Workloads subject to real-time constraints must guarantee a response or reaction to an event within a specified time. Such applications are often critical in environments for controlling or monitoring physical equipment. In such scenarios, it is considered a failure if the processing is not completed within an agreed-upon deadline, regardless of system load.

• High performance computing (HPC) workloads run at very large scale, often involving hundreds to thousands of machines. In such systems, jitter can lead to severe performance penalties: delaying an individual task on a machine for much longer than the average completion time can unnecessarily delay the whole computation from proceeding. The effects and causes have been well studied in the HPC community [HSL10].

[Figure 3.2: A selfish detour benchmark to measure OS noise, plotting outlier duration (ns) against sample index (on the order of 10¹¹ samples). Each sample represents an outlier that was running 9x slower than expected.]

There are several sources in a system that cause jitter: the operating system, the hardware, or interactions with other applications. In this use-case, we focus specifically on the elimination of operating system noise, an important contributor to jitter in the workloads mentioned above. As an example of noise, we present a recording of a selfish detour benchmark performed with the netgauge [HMLR07] tool on a Linux machine (1×4 Haswell). Netgauge is designed to quantify operating system noise for HPC scenarios. In this benchmark, a tight loop is executed, and we measure the time for each iteration. If an iteration takes longer than the minimum iteration time times a particular threshold, then the timestamp (detour) is recorded. The graph in Figure 3.2 shows all outliers that took nine times longer to complete than the minimal recorded iteration of the loop. We can see that there is a non-negligible number of outliers due to noise. The OS causes these delays by interrupting the execution of the benchmark to perform some other work.
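The core of such a benchmark fits in a few lines. The following is our condensed sketch of a selfish-detour-style probe; netgauge’s actual implementation is more careful about calibration and recording:

    /* Condensed sketch of a selfish-detour loop: flag iterations that take
       much longer than the fastest observed iteration. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>                     /* __rdtsc() */

    #define SAMPLES   (1ull << 24)
    #define THRESHOLD 9                        /* "9x slower than expected" */

    int main(void)
    {
        uint64_t min = UINT64_MAX;
        uint64_t prev = __rdtsc();
        for (uint64_t i = 0; i < SAMPLES; i++) {
            uint64_t now = __rdtsc();
            uint64_t delta = now - prev;
            prev = now;
            if (delta < min)
                min = delta;                   /* track the fastest iteration */
            else if (delta >= THRESHOLD * min) /* a detour: the OS intervened */
                printf("detour at %" PRIu64 ": %" PRIu64 " cycles\n", i, delta);
        }
        return 0;
    }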

Modern general-purpose kernels such as Linux try to support a broad set of requirements by implementing different behaviors based on build-time and run-time configuration. However, satisfying the stringent requirements of real-time or HPC systems remains difficult. For OS noise specifically, the issue has been well studied on Linux, and several operations that require global coordination (read-copy-update (RCU), TLB shootdowns, etc.) or interrupts have been identified as causes of noise. To date, it remains a challenge to completely eliminate noise in general-purpose operating systems. Thus, despite years of development of real-time support features in Linux and other general-purpose kernels [Cor], many users resort to specialized real-time OSes or modified versions of general-purpose OSes [The]. Similarly, super-computing systems initially proposed the idea of using lightweight kernels [RMG+15, GGIW10, KB05] to reduce noise and customize the OS for their needs.

3.2 Related work

3.2.1 Operating System customization

Although the motivation for kernel updates as discussed in Section 2.2.2 is usually to fix bugs with little downtime, quite similar techniques have often been proposed to specialize the system at runtime to better meet the requirements of running applications and workloads. VINO [ESG+94] and SPIN [BSP+95] let applications override the default policies of an operating system by injecting custom code into the kernel at runtime. While SPIN relied on the use of a safe language to guarantee that the injected code does not compromise the OS, VINO used techniques to isolate C and C++ code in a sandbox, with additional support from the compiler. The hot-swap and interposition mechanisms in K42 [SAH+03] can replace a component fully, or add additional components in front of an existing component by rerouting method invocations. The authors show how several applications improve their performance by using customized file-system code or page replacement algorithms.

An extreme point in this design space is the Unikernel [MMR+13], which was motivated by the rise of cloud computing. The ability to easily rent an entire fleet of servers and deploy custom operating systems on them resulted in the idea to eschew the protection and isolation features known from a traditional operating system and instead run only a single application per machine instance. A Unikernel compiles and links a single application directly with a minimal set of required operating system services. The resulting binary is then deployed either on a virtual or a physical machine. Application and OS code then execute in a single address space. Aside from specializing the OS to the application, the Unikernel has additional benefits, allowing link-time optimizations and significantly reducing the trusted computing base. A challenge with Unikernels is the significant domain expertise and low-level knowledge which is shifted to the application developer when deploying applications as part of a Unikernel. EbbRT [SCD+16] addresses this by splitting OS services into reusable components that applications rely on. The glue that holds application and OS services together consists of a light-weight, event-driven runtime.

In Badis, the unit that is exchanged for specialization purposes is always a complete per-core kernel. This is different from K42, SPIN, or VINO, which focused on more fine-grained modifications within an existing kernel. The presented mechanism would in principle also allow the user to run a (trusted) Unikernel on a specific core. However, while a Unikernel typically focuses on a single application, Badis aims to provide better-tailored services for running multiple applications together on a single system. As an example, we will see an application of Badis providing a task-based execution service on a subset of the machine.

3.2.2 High-performance computing

Super-computing systems initially proposed the idea of using customized lightweight kernels (LWKs) [RMG+15]. Generally, high-performance computing (HPC) systems are susceptible to OS “noise”. Such noise, aggravated by the scale at which these systems run, results in severe performance problems [HSL10], which warrant the case for a specialized OS to run their workloads. A common characteristic of HPC LWKs is that they are in charge of reserving resources which are then passed directly to the applications. Full-weight kernels (FWKs), on the other hand, tend to maintain ownership over resources and instead expose a high-level interface to abstract them.

Over time, several lightweight kernel implementations were designed and used as part of HPC systems. Catamount [KB05] tries to reduce OS overheads by giving control over memory, CPU, or network resources to applications. The system architecture divides a machine into a few service processors and multiple compute processors. The service processors run a regular Linux OS, whereas the compute processors use a minimal kernel. A runtime system is responsible for coordinating the distribution of jobs on many compute nodes, even across machines. Jobs executed on the compute nodes can communicate with the service nodes using an RPC library. CNK [GGIW10] aims to be lightweight while still providing a Linux-like operating environment for HPC applications. Compared to Catamount, it includes support for large portions of GNU/Linux code (e.g., the GNU C library and the NPTL threading library) in user-space. The kernel itself has support for memory management and features a simple, non-preemptive scheduler that supports a fixed number of threads per core. The CNK kernel does not support I/O and offloads these tasks to other nodes running Linux. For many HPC applications, CNK leads to a reduction of OS noise, fewer L1 cache misses, and fewer TLB misses compared to running with Linux. A different approach, followed by ZeptoOS [BIYC06], is to take an existing Linux image and make modifications to the OS (reserving memory in advance for applications, removing OS daemons and services as much as possible, etc.) to reduce any OS overhead.

3.2.3 Scheduling parallel workloads

Many systems have addressed the challenges of scheduling parallel applications on modern multicore systems. Tessellation [LKB+09] used space-time partitioning to factor the machine’s resources into separate units and virtualize them for user-level runtimes. In contrast to Badis, Tessellation does not specialize the OS kernels for either task- or thread-based workloads. Instead, it relies on user-level schedulers [CEH+13] for fine-grained thread and memory management. Such two-level scheduling has also been adopted in other operating systems [MSLM91, ABLL91, WA09]. The HPC community is actively exploring scheduling techniques such as gang-scheduling [Ous82] and flexible co-scheduling [FFPF05]. The scheduling policy for Basslet is to run tasks in a coordinated fashion with temporal locality. In contrast to gang-scheduling, such a system does not require complex scheduling logic across the whole machine [Pet12]. Fos [WA09] proposed a cooperative scheduling model for OS services, motivated by the expectation that the number of cores will soon be as large as the number of parallel tasks in a system.

Programming models and runtimes as provided by Cilk [FLR98], X10 [CGS+05], or OpenMP [Ope15] offer a convenient way to express parallel algorithms which is simple both for the programmer to use and for the runtime to parallelize. However, they typically have little knowledge about the overall system state (for example, libgomp in dynamic mode uses the average load over the last 15 minutes to estimate the amount of parallelism [Fre]) and contend for resources in the presence of multiple parallel runtimes. Several approaches have been proposed to deal with this issue. Lithe [PHA09] addresses the problem of composability of such parallel runtimes within a single application by building a hierarchical assignment of “harts” (1:1 abstractions of physical hardware threads) in order to avoid over-subscription. Windows uses fibers [Micc] as a light-weight alternative to threads for co-operative multitasking; in comparison to Badis, fibers are multiplexed on a set of OS threads. Callisto [HMM14] coordinates multiple runtimes by injecting a shared library into the existing runtime to coordinate their execution within a single machine and avoid resource contention. It relies, however, on the cooperation of all parallel applications with each other. Rossbach et al. [RCS+11] introduce a PTask abstraction for describing parallel work units on GPUs, and database systems use task-based or morsel-driven parallelism [LBKN14, PSM+15] in their implementations.

3.2.4 OS abstractions for parallel execution

Operating systems may provide support beyond threads for executing parallel programs on NUMA machines. Various features to query the topology of the machine or isolate applications exist. However, existing APIs like the NUMA policy library (libnuma) are often not detailed enough or too general to be useful [CGHT17, KAH+16]. Solaris [Ora09] introduced locality groups to describe NUMA machines and to set thread and memory affinities for an entire group. Badis uses NUMA-sized instances and, in addition to spatial scheduling, provides coordinated co-scheduling and run-to-completion tasks. Control groups (cgroups) [Heo] are used in Linux to divide tasks into hierarchical groups and perform fine-grained scheduling for every group. Industry solutions like Docker containers [Mer14] rely on cgroups to solve dependency conflicts and provide Linux distribution independence. In contrast to Badis, the resource provisioning of a Docker container is done manually by a system administrator; the solution provided by Badis is orthogonal and relies on customized kernels for the compute plane. Hypervisors like Xen [BDF+03] can run multiple operating systems and kernels on the same machine by relying on hardware virtualization and/or partially emulating non-virtualizable hardware. The idea of specializing the OS with multiple virtual machines has been proposed in the past [BDSK+08]. Badis does not need to rely on virtualization techniques because it leverages the multikernel architecture with a well-defined shared communication interface between different kernels. While nothing would prevent running the different kernels in a virtual environment, doing so is a trade-off between better isolation guarantees against misbehaving kernels and avoiding the overheads of virtualization.

3.3 Customization Goals

Next, we discuss in more detail the factors that impact performance for data processing or cause noise during execution of critical tasks. Then we will see how the Badis OS architecture helps to solve such issues.

3.3.1 Run-to-completion execution

Most conventional operating systems today use a preemptive scheduling model: they divide time into small slices and alternate the execution of all runnable processes by switching the context to a new process at the beginning of each time-slice. While this model has proven successful over time, there are two cases where a context switch can significantly impact the execution time or cause outliers: First, synchronization-heavy applications running with preemptive scheduling can suffer from the well-known convoy effect [Bla79], especially when a thread is context-switched while holding a lock or when reaching a critical barrier [HMM14, OWZS13]. Second, data processing applications are cache-sensitive, and executing two data-intensive algorithms at the same time can result in cache pollution. To evaluate the worst-case impact of context switching on a data-sensitive application, we show results from the micro-benchmark provided by Liu et al. [LDS07]. The benchmark uses a synthetic program that performs accesses on an array of floating-point numbers to mimic random accesses. Two identical instances of the program are executed, and a context switch is made from one instance to the other after the array has been fully read. The time for the context switch is then calculated by subtracting the time it took for one of the instances to read the array without being preempted. Figure 3.3 gives the measured results from running this benchmark on three machines (4×12 Magny-Cours, 2×10 IvyBridge, and 1×4 Haswell) with different last-level cache (LLC) sizes (between 5 and 25 MiB).
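In essence, the benchmark works as sketched below (our reconstruction of the setup, with both processes assumed to be pinned to the same core; the timing and pinning code is omitted, and all names are ours rather than taken from [LDS07]):

#include <stdlib.h>
#include <unistd.h>

/* One pass over the working set; pollutes the caches. */
static double traverse(volatile float *a, size_t n) {
    double sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

int main(void) {
    size_t n = (16u << 20) / sizeof(float);          /* e.g., 16 MiB working set */
    volatile float *a = calloc(n, sizeof(float));
    int p1[2], p2[2]; char tok = 'x';
    pipe(p1); pipe(p2);
    if (fork() == 0) {                               /* second, identical instance */
        for (;;) { read(p1[0], &tok, 1); traverse(a, n); write(p2[1], &tok, 1); }
    }
    for (int r = 0; r < 100; r++) {                  /* timed passes (timer omitted) */
        traverse(a, n);
        write(p1[1], &tok, 1); read(p2[0], &tok, 1); /* force a context switch */
    }
    return 0;
}

Subtracting the duration of an unpreempted pass from that of a pass including the hand-off yields the context-switch cost, including the cache-pollution penalty.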


[Figure 3.3 plot: cost of a context switch (ms, 0–7) vs. working set size (MB, 0–35) for MagnyCours (5 MB LLC), Haswell (8 MB LLC), and IvyBridge (25 MB LLC).]

Figure 3.3: The time of a context switch, including time lost due to cache-pollution with increasing working set sizes.

The results show that the cost of a context switch can increase from 0.7 microseconds to more than 6 milliseconds. For comparison, depending on the machine type and the working set size of the application, a context switch may thus take longer than a standard scheduling time-slice of modern Linux schedulers (about 6 milliseconds [Linb]). We further note that the cost of a context switch is highest when the working set is roughly equal to the size of the LLC. In order to optimize their execution on multicore architectures, modern data processing operators are highly tuned to use working sets of exactly the cache size [BATO13, CR07]. Thus, each preemption has the potential to destroy the carefully arranged data locality. User-space runtime libraries capture more information about the program’s parallelism, but they lack the necessary global view and runtime information from the OS. The example in Section 3.1.1 is one such instance, where the OpenMP programs assume they have all the cores to themselves and do not solve the initial problem of dividing compute resources. The OS is then forced to preempt and temporally schedule the clients.


Therefore, we argue that short-running, latency- or time-critical jobs should be executed without preemption. Badis can provide a run-to-completion model for application jobs. This is best achieved by implementing a customized kernel on certain cores, as doing so also fully removes the possibility of OS tasks interfering with the execution.

3.3.2 Co-scheduling

Unfortunately, running a single thread to completion is not enough for the optimal execution of parallel programs. A group of threads working on the same operation should ideally run simultaneously, a requirement also called strict co-scheduling. Otherwise, a delay in individual tasks can slow down the entire parallel job. This phenomenon, also known as stragglers, often becomes apparent in combination with barriers that programs install to wait for a specific computation to end. Prominent use-cases for co-scheduling are workloads from the HPC community [HSL10], but increasingly also data-processing workloads. The main challenge in both cases is to avoid stragglers in synchronization steps, which can impact performance [OWZS13, HMM14, GSS+13]. Another well-known scheduling strategy is gang-scheduling, which requires that all threads of a process execute simultaneously. In large data processing systems, gang-scheduling is typically not required and is counter-productive, as the number of parallel jobs tends to be much larger than the number of available cores. Badis gives applications the means to benefit from custom, variable strategies in parts of the system by combining radical approaches such as gang- or co-scheduling with regular preemptive scheduling. We explore a combination of co-scheduling and run-to-completion scheduling for parallel jobs as part of the Basslet runtime described later in Section 3.5.

3.3.3 Spatial isolation of tasks and threads

Modern multicore machines offer many opportunities for resource sharing when executing parallel programs, and hence for load interference. For example, Figure 3.4 shows all the places where hardware interference may occur on an Intel Sandy Bridge machine used in some of the experiments presented in Section 3.7.

[Figure 3.4 diagram: processor layout and multi-threaded core of an Intel Sandy Bridge machine with the five sharing points labeled.]

Figure 3.4: An Intel Sandy Bridge processor with labeled opportunities for resource sharing: (1) last-level cache, (2) local DRAM controller, (3) bus interconnect, (4) L1 and L2 caches, and (5) simultaneous multithreading.

The main factors for interference in multicore machines are cache contention, but increasingly also the memory subsystem (due to sharing of DRAM bandwidth, the memory controller, and memory access scheduling) [SSG+15]. Thus, we conclude that there is more to allocating cores than counting them: other resources such as shared caches and DRAM bandwidth must be accounted for as well. In some cases, like the example above, sharing them can lead to performance interference. In other cases, threads belonging to the same parallel job can benefit from constructive resource sharing to lower the cost of synchronization and communication. Given the properties of modern multicore machines, one innate hardware island is a NUMA node. Having its own cache hierarchy, several DRAM channels, and a memory bus, it is suitable for sharing among communicating [DGT13] and data-sharing tasks [PSM+15, LBKN14]. Moreover, the same properties make it ideal for achieving performance isolation: it can restrict destructive resource sharing resulting from cache pollution [LDC+09] and bandwidth contention [TMV+11]. Arguably, in contexts like data processing, the unit of allocation should be entire NUMA regions rather than individual cores.

3.3.4 OS interfaces

Traditional OS interfaces offer little to no opportunity for parallel applications to express the properties of their algorithms. For example, the OS does not distinguish between threads working on one concurrent task over another. In Linux, system developers have for many years, in the absence of more expressive OS interfaces, abused the cgroups mechanism to group threads (as opposed to processes) in order to let the OS know that they ought to be scheduled differently [Jon]. We argue that the operating system should provide parallel applications with an API that allows such information (a group of threads working on the same parallel job) to be passed down to the OS, together with hints on how to manage the allocated resources.

3.3.5 Data aware task placement

Finally, data processing applications are sensitive to the location of the data they access, in particular when executing on large NUMA systems. A common algorithmic pattern in data processing is to access the data in multiple stages, thereby creating a data-flow path between parallel sub-jobs [RTG14, LBKN14]. In such cases, it is important to preserve data locality and reduce the traffic across the machine’s interconnect between the communicating jobs. Prior studies have shown the effect that data and thread placement can have on performance [PSM+15]. Thus, we need to enable such applications to specify a preference for the node on which a job should execute (moving the computation to the data), and to declare that a sequence of parallel operations forms a pipeline and should therefore execute on the same set of resources. That way, the system can avoid unwanted migrations or inappropriate placement decisions by the OS. The policy decisions of how the Basslet runtime allocates and shares its compute resources are based on the needs and requirements discussed above and are explained in more detail in Section 3.5.2.

3.4 Design and Implementation

Badis is primarily an OS-based approach to compute resource management. It is implemented as an extension of the Barrelfish OS, with a key difference: in Badis, at any point in time the cores of a multicore machine are divided into a control plane and a compute plane, as shown in Figure 3.5. The control plane runs the regular Barrelfish OS, whereas the compute plane executes customized kernels on its cores. The ability of Barrelfish/DC to quickly exchange or update kernels on a set of cores provides the mechanism for this customization.

[Figure 3.5 diagram: control plane cores running Barrelfish (BF) and compute plane cores running light-weight kernels (LWK A, LWK B) on top of two NUMA nodes with DRAM and an accelerator with GDDRAM.]

Figure 3.5: The system architecture of Badis. Its key idea is that at any point in time, the cores in a multicore machine are partitioned into a control and a compute plane.

For many complex applications, the need for compliance with existing conventional OS services remains, and it is therefore critical that both types of OS kernels are available. The multikernel design allows this, and dynamically managing the customized kernels on the compute plane cores enables adaptability based on workload requirements. The rest of this section describes the division into control and compute planes as well as the interaction between the two.

3.4.1 Control plane

The control plane runs the regular Barrelfish kernel, including various user-space OS services and drivers, which form the core of the existing operating system. Applications typically start their execution as threads on the control plane, regardless of whether they execute all of their code on the control plane’s OS stack or offload tasks to the compute plane. Apart from executing applications’ threads, the control plane also hosts the OS service responsible for setting up the compute plane. In essence, it has to manage the compute plane and expose an interface so that applications can interact with compute plane kernels. We explain these parts next in more detail. The initial partitioning of the resources allocated to the two planes happens at boot time through the Badis management service. The service is directly integrated with the OS device manager and takes care of the set-up. At boot time, it reserves a certain number of cores for compute plane kernels and launches specific kernels on each of these cores. The initial set-up is statically configured, but the system can in principle also change the configuration dynamically at runtime. This makes sense when the OS has a suitable policy for making reconfiguration decisions, or when a system administrator requests it. The Badis management service uses the boot drivers presented in Chapter 2 to spawn cores with new, customized kernels. The compute plane cores are handled as a special case in the system and are not announced as regular cores; for traditional APIs used on the control plane (e.g., get_nprocs), the compute plane cores remain invisible. For the communication between kernels on the control and compute plane, Badis uses task queues, which are serviced by one or more compute plane kernels. They are implemented as capabilities in the system and thus have specific properties and invocations associated with them. The Badis management service takes care of the creation and distribution of queues to kernels. Currently, this happens only when a kernel is started; which queues to assign to which kernels is determined by reading the provided system configuration. Queues are created by retyping them from frame capabilities and are not mappable in user-space. However, they can be modified in kernel space, and indirectly by applications through invocations (e.g., to enqueue a job). From an application’s perspective, interacting with Badis’ compute plane resembles interacting with task-parallel runtimes or batch systems: applications enqueue jobs in the available queues (Figure 3.6). Badis provides two main abstractions for jobs: tasks and parallel tasks (ptasks). A task is the equivalent of a function call with additional arguments (i.e., it is implemented as a register context plus a reference to the process’ address space), whereas a ptask is a combination of tasks that logically belong together. Applications use invocations on queue capabilities to enqueue tasks and ptasks. During an enqueue operation, the Barrelfish kernel copies the job (provided as a ptask) to the queue and signals the compute plane kernel instance(s) servicing the queue that work is available. In user-space, applications using Badis can rely on a library with a set of high-level functions (Figure 3.7) for use on the control plane (to create, enqueue, and wait for tasks to complete). Badis also supports aborting ptasks, either by removing them while they are pending in a queue or by stopping the corresponding computation on the compute plane. The additional kernel ABI includes system calls for aborting a ptask, querying the current status of a task, and the enqueue capability invocation for adding a ptask to a queue.

[Figure 3.6 diagram: an application’s threads (1) enqueue a parallel task, which is (2) distributed to the Badis compute plane, where (3) tasks are distributed within the instance and (4) dispatched to its cores.]

Figure 3.6: The Badis architecture: Applications submit parallel tasks to the ptask queue(s). The Badis compute plane dequeues ptasks and dispatches the individual tasks to the hardware contexts owned by the compute plane.

In addition, applications can use the messaging facility in the Barrelfish library OS for communication between threads on the control plane and tasks on the compute plane. The implementation is based on the algorithm proposed by Bershad et al. [BALL91] and optimizes for low-latency communication on multicore systems. We will see in Section 3.5.3 how these facilities are used by systems building on top of Badis to enable system call forwarding to the control plane.

3.4.2 Compute plane

The compute plane consists of several, potentially different, customized OS stacks: specialized light-weight kernels with corresponding user-space libraries. Compute plane cores either have a separate kernel per core, or many cores form an instance running the same kernel. This flexibility allows Badis to specialize in various ways: to meet specific requirements of applications, to strictly partition resources (e.g., caches, memory controllers), or to simplify many aspects of the OS for heterogeneous hardware resources. The exact behavior of a compute plane kernel is implementation specific.

Badis Data Structures
struct task  { task_fn fun; void* arg; ... }
struct ptask { struct task* tasks; size_t count; ... }

Kernel ABI additions
badis_rt_enqueue(queue, ptask) → ptid
badis_rt_abort(ptid) → error_code
badis_rt_status(ptid) → error_code
badis_is_compute_plane() → false

Application API
badis_ptask_create(tasks) → struct ptask*
badis_ptask_enqueue(queue, ptask, ...)
badis_ptask_free(ptask)
badis_ptask_wait(ptask)
badis_ptask_abort(ptask)

Figure 3.7: Badis APIs as provided to applications running on the control plane for interaction with compute plane kernels.

However, at some point after start-up and core initialization, a compute plane kernel will usually check the received task queues for work and start to execute any enqueued tasks. Dequeuing a job from the queue(s) is essentially the same as starting a thread: in case the task is dispatched for the first time, the kernel switches to the address space of the owning process; otherwise it initializes the register set with the provided register values (including stack and instruction pointer) and switches to user-mode. While not a strict requirement, the compute plane kernels we present avoid taking part in any globally coordinated OS routines, to get rid of any OS interference. They also do not need a KCB. Such simplicity comes at a price, because certain operations are no longer possible. For example, while a job submitted for execution on the compute plane inherits the address space of the corresponding process running on the control plane, the compute plane is unable to manipulate it. On the other hand, compute plane kernels can be constructed without any OS state, which makes them easy to dispose of. We discuss these shortcomings in Section 3.5.3 and describe ways around them. The minimal system call ABI that a compute plane kernel needs to implement currently contains just four calls (Figure 3.8), for exiting a task, printing, and retrieving information about where a task is running in the system.


Badis Compute Plane Kernel ABI
badis_print(buf, length)
badis_exit(error, retval)
badis_get_info() → (instance_id, core_id, task_id)

Figure 3.8: The minimally required Badis compute plane kernel ABI.
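To illustrate, a task body running on a compute plane kernel could use this ABI roughly as follows (a minimal sketch; we assume the info tuple is returned through out-parameters, and the prototypes below are our own rendering of Figure 3.8):

#include <stddef.h>
#include <stdio.h>   /* snprintf */

extern void badis_print(const char *buf, size_t length);
extern void badis_exit(int error, long retval);
extern void badis_get_info(int *instance_id, int *core_id, int *task_id);

/* A task entry point executing on a compute plane core. */
void hello_task(void *arg) {
    int instance, core, task;
    badis_get_info(&instance, &core, &task);   /* where is this task running? */

    char buf[64];
    int len = snprintf(buf, sizeof(buf), "task %d on core %d (instance %d)\n",
                       task, core, instance);
    badis_print(buf, len);                     /* output relayed by the kernel */

    badis_exit(0, 0);                          /* hand the core back */
}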

3.4.3 Discussion

The flexible design of the compute plane allows kernel customization. For example, one can specialize the kernel scheduling policy and mechanism for specific workload requirements: transactional queries, which are typically short-running, can be executed by a pool of threads scheduled at very fine time-intervals [TTH16] and spatially placed to benefit from constructive LLC sharing [PPB+12]; HPC-like workloads can be scheduled using gang-scheduling [Ous82]; and a mix of synchronization-heavy algorithms can be executed using Callisto’s scheduling policies [HMM14]. Further customizations of the OS mechanisms, policies, and interfaces can also be made for other resources: memory management, interaction with I/O devices, etc. A light-weight OS kernel can also be specialized for the heterogeneous computational resources present in modern and future multicore machines. Such architectures should not need to be engaged in system-wide decision making or execute heavy OS services. Instead, they should run a thin layer of the operating system, optimized for job execution, that provides the necessary system services. There are many applications for specialized kernels, including kernels tailored for running databases or language run-times, for debugging or profiling, or for directly executing verified user code as in Google’s Native Client [YSD+09]. In the next two sections, we discuss two concrete instantiations of compute plane kernels for Badis and their interaction with the rest of the system in more detail.

3.5 Basslet: A kernel based, task-parallel runtime system

In this section, we describe Basslet, a task-based runtime system that is directly integrated with the OS. Basslet is built on top of Badis and has three main components, which we describe more closely:


• A customized kernel for the Badis compute plane that allows co-scheduling and run-to-completion execution of data processing jobs.

• A deployment configuration for Badis that sets up the compute plane for data processing systems by minimizing interference and maximizing overall system throughput.

• A Basslet library that provides API-compatible interfaces for the POSIX threading and OpenMP libraries. This makes it possible to run unmodified OpenMP or multi-threaded applications with Basslet.

3.5.1 Task-parallel compute plane kernel

We implemented the compute plane kernel by forking the existing Barrelfish kernel. A Barrelfish kernel in its basic form supports multi-tasking, implements a capability system, and has a set of drivers for core-local devices such as interrupt controllers, timers, or MMUs. The Basslet kernel differs from an original Barrelfish CPU driver in the following key aspects:

• Barrelfish CPU drivers typically do not share any memory across cores but rather rely on message passing in user-space services for coordination. In Basslet, all cores on the compute plane are grouped into instances. The Basslet kernel uses all cores within an instance to distribute the individual tasks of a parallel task and execute them together. To coordinate this effort, the kernels within an instance do share internal task queues and use inter-processor interrupts as well as compare-and-exchange instructions for coordination and task distribution.

• The conventional Barrelfish CPU driver adopts a preemptive model for scheduling processes. The Basslet kernel instead implements a task-based scheduling model that uses strict co-scheduling for the tasks within a ptask and always runs ptasks to completion. This is achieved by executing the Basslet kernels of an instance in one of two modes, master or worker (a sketch of the resulting dispatch loop follows this list). The master is notified by the control plane once a new ptask is available. It then tries to acquire a lock on the queue and fetch the ptask. If successful, it notifies the workers in the instance (i.e., the other cores) that a new ptask is available. Then, it runs the task scheduler, which dispatches tasks until all the tasks of the ptask have finished execution. The workers immediately start dispatching tasks after they are woken up by the master and go back to sleep once the ptask is fully executed.

• Finally, the Basslet kernel does not take part in any distributed OS operations and therefore does not run any user-space OS services, contains no capability database, and disables most interrupts, except those explicitly used by Basslet for cross-core communication.
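The master/worker protocol can be summarized by the following sketch (a simplified reconstruction; all helper names are assumed, not taken from the actual kernel sources):

#include <stdbool.h>
#include <stddef.h>

struct task; struct ptask;

extern void          wait_for_enqueue_ipi(void);        /* signal from control plane */
extern bool          try_lock_queue(void);
extern void          unlock_queue(void);
extern struct ptask *dequeue_ptask(void);
extern void          wake_workers(struct ptask *pt);    /* IPI the other cores */
extern struct ptask *wait_for_master(void);             /* sleep until woken */
extern struct task  *fetch_next_task(struct ptask *pt); /* compare-and-exchange based */
extern void          run_to_completion(struct task *t);

void basslet_dispatch_loop(bool is_master) {
    for (;;) {
        struct ptask *pt;
        if (is_master) {
            wait_for_enqueue_ipi();
            if (!try_lock_queue())
                continue;
            pt = dequeue_ptask();
            unlock_queue();
            wake_workers(pt);
        } else {
            pt = wait_for_master();
        }
        struct task *t;
        while ((t = fetch_next_task(pt)) != NULL)
            run_to_completion(t);        /* tasks are never preempted */
        /* workers go back to sleep; the master waits for the next ptask */
    }
}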

With a strict run-to-completion model, there is obviously always a danger of one of the tasks misbehaving or blocking indefinitely due to program bugs. Such a scenario could essentially stall an entire compute plane instance. As a solution, we use watchdogs: the local APIC timer is programmed to interrupt the task in case its execution takes too long. If the timeout is reached, the entire ptask is aborted and a failure is reported back to the application, which can then react accordingly (for example, by restarting the ptask). We note that this is an extreme point in the design space; other approaches, such as migrating back to a preemptive model for long-running tasks or finishing them on the control plane after a certain time, would be viable alternatives as well.

3.5.2 Compute plane configuration

The default Basslet deployment configuration for Badis aims to minimize negative performance impact through resource interference. Therefore, it partitions the Basslet kernel instances such that each spans an entire NUMA node for spatial task isolation. On current systems, this setup isolates not just CPUs but also the last-level and every other shared cache on a socket, as well as the memory controllers. We evaluate the benefit of such a deployment for parallel data processing workloads more closely in Section 3.7.1. For queues, Badis assigns a single, private queue to every Basslet instance (e.g., NUMA node) in the system, as well as a global queue that is served by all instances. Programs can therefore control job execution by choosing where a given job should be executed. While this fairly simple setup has proven sufficient for our workloads, more sophisticated configurations would also be possible. For example, instances interleaved across sockets can maximize performance for memory-bound workloads where data is interleaved across the machine, at the potential cost of interfering with and slowing down other parallel jobs in the system. Instances can also be spawned at smaller or larger scales, e.g., on core-groups in the SPARC M7 [AJK+15]. The actual setup therefore highly depends on the underlying hardware and the objective.
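In code, choosing between a private, per-node queue and the global queue might look as follows (a sketch using the API from Figure 3.7; node_queue, global_queue, and fill_tasks are assumed names, with the queue handles obtained from the Badis management service):

struct ptask *pt = badis_ptask_create(ntasks);
fill_tasks(pt);   /* populate pt->tasks[i].task / .data */

if (target_node >= 0)
    /* Data-aware placement: only the chosen NUMA node's instance serves this queue. */
    badis_ptask_enqueue(node_queue[target_node], pt, &ptid);
else
    /* No preference: any instance may pick the ptask off the global queue. */
    badis_ptask_enqueue(global_queue, pt, &ptid);

badis_ptask_wait(ptid);
badis_ptask_free(pt);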


Basslet Application API (synchronization)
bas_mutex_init(bas_mutex)
bas_mutex_lock(bas_mutex)
bas_mutex_unlock(bas_mutex)
bas_mutex_release(bas_mutex)
bas_cond_init(bas_cond)
bas_cond_wait(bas_cond, bas_mutex)
bas_cond_signal(bas_cond)
bas_cond_broadcast(bas_cond)
bas_cond_release(bas_cond)
bas_barrier_wait(bas_barrier)

Basslet Application API (memory)
malloc(size)
free(ptr)

Basslet Message API
bas_send(dest, msg, arg1, ...)
bas_wait_for(msg) → [data]

Figure 3.9: Basslet runtime API

3.5.3 Basslet runtime libraries

Tasks running on a compute-plane kernel can use the API provided by the Basslet library (Figure 3.9). It currently supports a set of synchronization functions (compatible with pthreads) and dynamic memory allocation using malloc and free. The synchronization APIs are implemented using x86 atomic instructions with exponential back-off. In practice, and as others have reported [DGT13], we found that this works well for our set-up, as tasks are never preempted and share a last-level cache. A Basslet kernel cannot allocate or map new memory into an address space, because it does not hold the state of the capability database and shares the address space with the executing task’s parent thread. Therefore, if malloc ever runs out of internal buffer space while called from within a task, it forwards the allocation request to the control plane using the message passing APIs. Such system call forwarding from the compute to the control plane is similar to the mechanism proposed in FlexSC [SS10]. Basslet currently supports parallel algorithms using OpenMP as well as pthreads. We briefly explain how a program can be adapted from using pthreads to Badis tasks, and how an OpenMP runtime can be implemented on top of Basslet.
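A sketch of this forwarding path inside the allocator, using the Basslet message API from Figure 3.9 (local_pool_alloc, CONTROL_PLANE, and the message tags are assumed names, not the library’s actual internals):

void *basslet_malloc(size_t size) {
    void *p = local_pool_alloc(size);      /* task-local buffer, no kernel involved */
    if (p != NULL)
        return p;

    /* Pool exhausted: the compute plane kernel cannot grow the address space,
       so ask the control plane to perform the allocation on our behalf. */
    bas_send(CONTROL_PLANE, MSG_MALLOC, size);
    return (void *)bas_wait_for(MSG_MALLOC_REPLY);
}

Because the round-trip stalls the calling task, the local pool should be sized so that this path is taken only in exceptional cases.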


3.5.3.1 Porting pthreads to Basslet

A straight-forward conversion from pthreads to Badis tasks can be fairly minimal: pthread_create becomes a combination of ptask_create and ptask_enqueue, whereas ptask_wait can be used in place of joining the individual threads. Listing 3.1 shows a simple example of a pthread program ported to Basslet. With the Basslet runtime library (Figure 3.9), it is possible to allocate memory and use various synchronization mechanisms that are compatible with the pthread synchronization API. While such a restricted set is relatively minimal, we found it sufficient for porting a set of widely used data-processing operators running inside a database. In case other OS operations are required, the programmer can always use the messaging API to route requests back to the control plane. This can incur additional latency and may stall tasks unnecessarily; therefore, appropriate measures (buffering, caching, etc.) should be taken to make sure this happens only in exceptional cases.

Listing 3.1: Example of porting POSIX thread creation using the Badis and Basslet APIs

struct ptask *pt = badis_ptask_create(nthreads);
for (i = 0; i < nthreads; i++) {
    pt->tasks[i].task = fn;
    pt->tasks[i].data = &args[i];
    // pthread_create(&tid[i], NULL, fn, &args[i]);
}
badis_ptask_enqueue(q, pt, &ptid);
badis_ptask_wait(ptid);
// for (i = 0; i < nthreads; i++) pthread_join(tid[i], NULL);

3.5.3.2 Porting OpenMP to Basslet

We implemented an OpenMP [Ope15] runtime library using Basslet. OpenMP programs consist of C/C++ code with annotations, added by the programmer or a DSL compiler, to automatically parallelize loops and other constructs. The annotations are parsed by the OpenMP compiler, which generates intermediate functions that compartmentalize loops into chunks that can be computed in parallel. The process leads to the generation of a series of calls to the OpenMP runtime library, which transforms the annotations into calls that distribute the loop execution across many tasks. The runtime library is also in charge of balancing the work across different cores. As an example, a simple implementation in Badis for the #pragma omp parallel annotation is given in Listing 3.2. The final implementation in Basslet is slightly more complicated: (a) OpenMP typically assumes that thread 0, which initially called the GOMP_parallel_start function, also takes part in the execution of the parallel section. In Badis’ case, this thread would usually run on the control plane and therefore does not participate in task execution. In our implementation, we deliberately let only tasks execute the parallel parts; (b) often, a loop encloses the #pragma omp parallel statements to iterate on some data until convergence is reached. In that case, it is desirable to run the consecutive #pragma omp parallel constructs on the same instance and in sequence to benefit from cache reuse. There are several ways to implement this in Badis: the simplest is to use a queue dedicated to a particular Basslet instance and enqueue the ptasks there. Alternatively, a single ptask can be spawned, and the Basslet runtime can then use the message passing interface to hand off work units to the individual tasks. Basslet currently supports the parallel pragma as well as dynamic and static for loops in OpenMP. Features like OpenMP tasks or teams are at the moment not supported, as the data-processing algorithms we used typically did not rely on them.

Listing 3.2: Example of OMP runtime using Badis and Basslet APIs for #pragma omp parallel

GOMP_parallel_start(void *fn, void *data, int nthreads) {
    struct ptask *pt = badis_ptask_create(nthreads - 1);
    for (int i = 0; i < nthreads - 1; i++) {
        pt->tasks[i].task = fn;
        pt->tasks[i].data = data;
    }
    badis_ptask_enqueue(q, pt, &ptid);
}

GOMP_parallel_end(void) {
    badis_ptask_wait(ptid);
    badis_ptask_free(pt);
}
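For reference, the user-level code that exercises this path looks like ordinary OpenMP; a fragment such as the following is lowered by the compiler into the GOMP calls above (process_chunk and data are placeholders, and the outlined function name is whatever the compiler generates):

/* User code: */
#pragma omp parallel
{
    process_chunk(data);
}

/* ... is lowered by the OpenMP compiler into, roughly:
 *   GOMP_parallel_start(outlined_fn, data, nthreads);
 *   outlined_fn(data);      // stock libgomp: thread 0 participates here;
 *   GOMP_parallel_end();    // in the Basslet port, only tasks execute it
 */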

3.5.4 Basslet code size

The Badis user-space library is currently about 1.5k lines of code. However, it relies on several libraries from the existing Barrelfish source tree (for memory allocation, data-structures, and messaging). The Basslet user-space library adds another 1.5k lines (with the OpenMP runtime implementation accounting for about 1k lines of code). The changes to the existing Barrelfish kernel to adapt it into a Basslet kernel were relatively small: overall, we changed 20 files in the tree, added 844 lines of code, and removed 122 lines. This includes some changes to the build scripts, which exclude some files entirely (such as the code to manipulate the capability data-structure, which is not required on the compute plane). The Basslet kernel itself currently supports the x86-64 architecture. One important hardware requirement is a fast notification mechanism between the compute and control plane, which can be either interrupt- or memory-based (for example, using something similar to Intel’s monitor and mwait instructions).

3.6 bfrt: A real-time OS kernel

Badis can offer hard real-time support by dedicating an entire compute plane core exclusively to an application and using a specialized kernel to eliminate OS jitter. To evaluate such a scenario, we built bfrt, a kernel which has no scheduler (since it targets a single application) and takes no interrupts. Its sole purpose is to measure the reduction of noise in the system by eliminating any operating system interference on a dedicated core, while still allowing application code to execute on it safely.

3.7 Evaluation

We perform experiments for Badis on a set of x86 machines shown in Table 3.1. For Basslet, our experiments focus mainly on an AMD Magny-Cours machine with the following properties: a Dell 06JC9T board with four AMD Opteron 6174 processors. Each processor has two 6-core dies, each forming a NUMA node with 16 GiB of memory and a 5 MiB LLC. The main advantage of this machine is that it has more NUMA nodes (eight in total), allowing us to scale the compute plane up to seven instances.

Name              Memory   Sockets × Cores                Freq.
1×4 Haswell       32 GiB   1×4c Xeon E3-1245 v3           3.4 GHz
2×10 IvyBridge    256 GiB  2×10c Xeon E5-2670 v2          2.5 GHz
4×8 SandyBridge   512 GiB  4×8c Xeon E5-4640              2.4 GHz
4×8 Bulldozer     64 GiB   4×8c AMD Opteron 6378          2.4 GHz
4×12 Magny-Cours  128 GiB  4×12c AMD Magny-Cours (HY-D1)  2.2 GHz

Table 3.1: Architectural details of the systems used in our evaluation.

3.7.1 Basslet runtime

The first set of experiments evaluates the efficiency of the Basslet runtime, together with Badis, when scheduling concurrent parallel workloads. It also compares against the performance of the same workloads when executed using the default OpenMP and Linux schedulers. Our evaluation focuses on parallel data processing applications. From the GreenMarl [HCSO12] graph application suite (git revision 4c0d62e) we execute the following algorithms: (1) PageRank (PR), (2) Single-Source Shortest Path (SSSP), and (3) HopDistance (HD). We evaluate the performance of the three algorithms on the LiveJournal graph, the largest available social graph in the SNAP dataset [LK14]. It has 4.8 M 32-bit nodes and 68 M 32-bit edges (about 300 MiB of raw binary data). The presented measurements do not include the time for loading the graph into memory. For a relational DB workload we use one of the most common operators, the hashjoin (HJ), whose implementation has been tuned for modern hardware [TABO13] and is open-source¹.

3.7.1.1 Interference between a pair of parallel jobs

The first experiment measures the effects of interference when different pairs of parallel jobs execute concurrently.

¹ http://www.systems.ethz.ch/sites/default/files/multicore-hashjoins-0_1_tar.gz


(a) Linux + OpenMP 12/24
        PR     HD     SSSP
SSSP    2.70   2.19   2.47
HD      2.50   2.13   2.07
PR      2.14   2.15   1.88

(b) Basslet + Badis 12/24
        PR     HD     SSSP
SSSP    1.01   1.00   1.00
HD      1.07   1.13   1.04
PR      1.03   1.01   1.02

(c) Linux + OpenMP 6/12
        PR     HD     SSSP
SSSP    1.72   1.60   1.77
HD      1.68   1.58   1.44
PR      1.23   1.25   1.17

(d) Basslet + Badis 6/12
        PR     HD     SSSP
SSSP    1.07   1.00   1.01
HD      1.03   1.04   1.05
PR      1.00   1.03   0.90

Figure 3.10: Numbers report slow-down of SSSP, HD and PR algorithms (for algorithm in row) when co-executed with a partner algorithm (column) vs. running the algorithm alone. Graphs contrast Linux + OpenMP scheduling vs. Basslet + Badis scheduling.

We execute the three GreenMarl graph algorithms on separate OpenMP runtime instances. We run them concurrently either using (1) the default Linux scheduler, with OpenMP choosing the degree of parallelism, or using (2) the Basslet compute kernel runtime. The reported numbers are normalized to a baseline experiment, which measured each algorithm’s runtime when run in isolation on 6 and 12 cores (1 and 2 NUMA nodes, respectively). When executing a pair of algorithms concurrently, for both setups we doubled the allocated resources by assigning 12 and 24 cores (2 and 4 NUMA nodes, respectively). Ideally, the normalized time should be 1, as for twice the load there are twice the resources. The results from the run on Linux are presented in Figure 3.10a and Figure 3.10c. The heatmaps show the calculated slowdown of the noisy execution relative to the runtime in isolation. The first observation is that there is significant performance degradation for all combinations of algorithm pairs. In some cases, it can reach up to a 2.7x slowdown, despite there being enough resources for both to execute well. The second observation is that the degradation and interference get worse with a higher degree of parallelism and with the number of NUMA nodes used. This is an important insight, especially because NUMA nodes on more recent machines are becoming bigger (with more cores, larger caches, etc.), and the effects of internal resource sharing are going to be exacerbated. In contrast, when the same combinations of algorithm pairs are executed on Basslet+Badis, the normalized runtimes are as expected (Figure 3.10b, Figure 3.10d), which means the jobs’ runtimes are almost unaffected compared to their runs in isolation. These results confirm the benefits of Basslet’s runtime design decisions, especially (1) the non-preemptive co-scheduling of the tasks belonging to the same ptask; (2) the spatial isolation of ptasks on complete NUMA nodes; and (3) the data-aware task placement, which was particularly important for the tight pipeline of ptasks within a loop, as generated by the GreenMarl compiler. As a result, Basslet’s scheduler delivers the desired performance isolation, even in noisy environments.

3.7.1.2 System throughput scale-out

We revisit the problem statement experiment presented in Section 3.1, measuring how well different scheduling approaches do when increasing the number of clients in the system as well as the resources allocated. As a first step, we perform the same experiment on several different machines, to verify that the effects we observed on the 4×12 Magny-Cours machine are not a special case. We use three additional machines: the 4×8 SandyBridge, the 2×10 IvyBridge with four NUMA nodes, and the 4×8 Bulldozer, which has an architecture similar to the 4×12 Magny-Cours machine with 2 NUMA nodes per socket. Hyper-threading is disabled on the Intel machines. Again, for this experiment we run the PageRank algorithm. The baseline run uses the cores on the first NUMA node (the degree of parallelism used is, thus, machine-dependent). The results are presented in Figure 3.11 and confirm that the problem with per-client throughput scaling in concurrent workloads is exhibited on all four multicore machines.

[Figure 3.11 plot: per-client throughput (PR/min/client) vs. number of clients (1–8) on the IvyBridge, SandyBridge, MagnyCours, and Bulldozer machines.]

Figure 3.11: Expanding the problem statement experiment (recall Figure 3.1) on four different machines.

As a second step, we return the focus to the AMD Magny-Cours machine and measure the performance of three different scheduling approaches: (1) concurrency as supported internally within OpenMP when running on top of the default Linux scheduler, (2) concurrency as supported by Linux when executing multiple OpenMP runtimes, one for each client, and (3) throughput as achieved by Basslet. Note that in the previous experiment we used setup (2). As a client, we again use the PageRank algorithm running on the LiveJournal social graph, using 6 cores. For every subsequent client (another instance of the PageRank algorithm) we allow the system to use 6 additional cores; the rest of the cores are disabled. The reported system throughput is the inverse of the total time needed to execute all PageRanks (i.e., the throughput as perceived per client). The results are presented in Figure 3.12. They show that the performance interference among multiple clients, when either Linux or OpenMP+Linux schedules the resources, increases as we add more clients, despite sufficient resources being available. The machine has forty-eight cores, so there are enough resources to execute eight concurrent PageRanks (within one OpenMP runtime), or eight OpenMP runtimes on top of Linux. Basslet achieves almost perfect per-client throughput scale-out up to seven clients. The final six cores, belonging to the first NUMA node, are dedicated to the control plane, which limits the scalability to seven Basslet instances.

[Figure 3.12 plot: per-client throughput (PR/min/client) vs. number of clients (1–8) for Ideal, Basslet, Linux+OpenMP, and OpenMP.]

Figure 3.12: Throughput scale-out when executing multiple PRs using a default Linux+OpenMP scheduler versus Basslet.

3.7.1.3 Standalone runtime comparison

The goal of this experiment is to compare the absolute runtimes of the algorithms when executed on Basslet+Badis versus Linux. All algorithms run in isolation with a degree of parallelism equal to the number of cores in a single NUMA node: six. For both systems, the algorithms were executed on cores belonging to the same NUMA node. The results, shown in Table 3.2, indicate that the algorithms’ runtimes on Basslet are comparable to those measured on Linux. We also point out that the compute plane could be further customized to improve the performance of such workloads.


                              Execution time (ms)
Algorithm (input data)        Linux    Basslet
Hash join (128M x 128M)       4787     3316
PageRank (LiveJournal)        6712     6509
Hop-Dist (LiveJournal)        515      542
SSSP (LiveJournal)            3390     3491

Table 3.2: Runtime of parallel algorithms executing on Linux versus Basslet.

3.7.2 Performance isolation with bfrt

We illustrate the benefits of Badis by executing a specialized kernel for running hard real-time applications where eliminating OS jitter is required. To ensure that the application runs uninterrupted, we assign it a dedicated core running the bfrt kernel described in Section 3.6. We evaluate the performance isolation that can be achieved with our specialized kernel compared to the isolation provided by:

• an unmodified Barrelfish kernel

• a Linux 3.13 kernel where we set the application to run with real-time priority

We run our experiments on the 1×4 Haswell machine, ensuring that no other applications or services are running on the same core. To measure OS jitter we use a synthetic benchmark that only performs memory stores to a single location. Our benchmark is intentionally simple to minimize performance variance caused by architectural effects. We sample the timestamp counter every 10^3 iterations, for a total of 10^6 samples. Figure 3.13a shows a histogram of the sampled cycles, where for all systems most of the values fall into the 6–7 kcycle range (i.e., 6–7 cycles of latency per iteration). Figure 3.13b presents the CDF for the 6–7 kcycle range, showing that there are no significant differences between the three systems in this range. Contrary to the bfrt dedicated kernel, where all of the samples are in the 6–7 kcycle range, in Linux and Barrelfish we observe significant outliers that fall outside this range. Since we run the experiment on the same hardware, under the same configuration, we attribute the outliers to OS jitter. In Barrelfish the outliers reach up to 68 kcycles (excluded from the graph). Linux performs better than Barrelfish, but its outliers still reach 27–28 kcycles. We ascribe the worse behavior of Barrelfish compared to Linux to the OS services Barrelfish runs in user-space.

[Figure 3.13 plots: (a) histograms of all samples (count vs. kilocycles, 6–33) for Barrelfish, Linux 3.13, and the bfrt dedicated kernel; (b) CDF for the samples in the range of 6–7k cycles.]

Figure 3.13: Number of cycles measured for 10^3 iterations of a synthetic benchmark for bfrt, Barrelfish, and Linux using real-time priorities.

                  1 Client            50 Clients
System / Op.      GET       SET       GET       SET
Barrelfish        89405     75711     255251    173979
Badis w/ load     88362     76173     254123    184559

Table 3.3: Redis throughput (operations/sec) running on Barrelfish versus Badis.
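For concreteness, the synthetic benchmark can be reconstructed roughly as follows (a sketch assuming x86-64 with GCC or Clang, where __rdtsc is provided by x86intrin.h; the constants mirror the sampling described above):

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>              /* __rdtsc() */

#define NSAMPLES 1000000            /* 10^6 samples           */
#define NITERS   1000               /* 10^3 stores per sample */

static volatile uint64_t sink;      /* the single store target */
static uint64_t samples[NSAMPLES];

int main(void) {
    for (size_t s = 0; s < NSAMPLES; s++) {
        uint64_t start = __rdtsc();
        for (int i = 0; i < NITERS; i++)
            sink = i;               /* memory store to one location */
        samples[s] = __rdtsc() - start;
    }
    /* On a jitter-free core, every sample lands in the 6-7 kcycle band;
       outliers betray OS interference (cf. Figure 3.13). */
    for (size_t s = 0; s < NSAMPLES; s++)
        printf("%llu\n", (unsigned long long)samples[s]);
    return 0;
}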

3.7.3 Badis OS architecture

We have confirmed the benefits of the customized kernels and runtimes for parallel and real-time workloads. We now evaluate the impact of Badis and the control and compute plane partitioning on the rest of the OS.

3.7.3.1 Control plane applications

Executing a program that relies on the traditional thread-based Barrelfish scheduler on the control plane is unaffected by the workload running on the compute plane. To demonstrate this, we execute the PageRank workload on the Badis compute plane concurrently with a Redis key-value store [Wikd] running on the control plane. The Redis engine serves 1 and 50 clients issuing a series of GET/SET requests, and we compare the throughput (operations/second) when executing the same workload on the original Barrelfish OS (on the same set of resources) to running it on the Badis control plane alongside the PageRanks. The results (Table 3.3) show that for both workloads the observable difference in performance is small (within 10%) and can primarily be attributed to noise across experiment runs. This experiment confirms that (parts of) applications that need to be executed on a standard thread-based scheduler are unaffected by the control-/compute-plane separation. They may, however, benefit from additional performance isolation, as bandwidth-intensive jobs can be offloaded to the compute plane.


[Figure 3.14 plot: latency (cycles/tuple/client, 0–140) vs. number of clients (0–12), showing the total time (incl. queuing) and the runtime within a Basslet instance.]

Figure 3.14: Measuring the overhead of an enqueue syscall and the queuing effects in Badis using HJ.

3.7.3.2 Overhead of Badis enqueuing

This experiment measures the overhead (additional time) of enqueuing parallel jobs for execution on the compute plane. It measures the time of issuing the enqueue system call in two situations: (1) when the compute plane still has enough resources to execute the new job concurrently, and (2) when it is saturated and the new job needs to queue. For this experiment, we use the parallel hashjoin operator as the parallel job, and the compute plane runs Basslet instances. We measure the execution time of the hashjoin (HJ) before invoking ptask_enqueue and after returning from ptask_wait, as well as the execution time of the algorithm within the Basslet instance. In order to generate enough load to saturate the compute plane (i.e., to spawn more hashjoin clients), we dedicated the cores on two NUMA nodes to the control plane and used the cores on the remaining six NUMA nodes for the compute plane, running six Basslet instances. The performance of the hashjoin is typically evaluated in number of cycles per output tuple [TABO13]. The join in this experiment is executed on input relations with 32 million 64-bit tuples. The results are shown in Figure 3.14 and indicate that the cost of enqueuing the ptask is quite low. They also show that as soon as there are not enough Basslet instances in the compute plane to take over the enqueued ptasks, the wait time increases. We further note that the runtime within a Basslet instance remains stable for all jobs, despite the noise in the system.

3.8 Concluding remarks

This chapter presented Badis, an OS architecture that allows the OS kernel and services to be dynamically adjusted based on workload requirements. Badis enables complex applications to execute efficiently on modern machines by providing an adaptive and customizable OS stack. Badis currently uses quite simple (though effective) policies over the basic mechanism, and this chapter has focused strongly on the latter. The policy design space, however, is large. For instance, control over the compute plane could be exercised by a query optimizer in a database engine, which knows where data resides and the cost functions of data-processing operators, and can decide which operators to prioritize and how to parallelize each of them. The optimizer would run in the control plane. The control plane could also implement policies for relinquishing resources early based on utilization: inspecting the Badis queues provides a clear indication of idle cores and allows them to be reallocated before the application releases the resources itself. By building the Basslet runtime, we showed how to leverage the customized kernels with a custom, integrated runtime scheduler: we can achieve almost linear throughput scale-out and predictable runtimes for heavy analytical workload mixes. With the bfrt kernel we showed how an architecture like Badis can successfully eliminate most of the noise introduced by the operating system. While the Badis prototype is implemented over a multikernel, it is reasonable to ask whether similar benefits could be obtained by modifying a monolithic kernel like Linux. A radically new scheduler, leveraging recent proposals for fast core reconfigurability in Linux [PSK15], might be able to achieve similar results for certain workloads but may introduce a penalty for others. Currently, the Basslet compute plane kernel is not a good fit for I/O due to the strict run-to-completion policy, but the Badis architecture makes it easy to integrate the control/data plane design proposed by systems like Arrakis [PLZ+14]. Even though the current prototype runs on homogeneous 64-bit x86 machines, it is natural to extend the compute plane to support heterogeneous accelerators such as GPGPUs, co-processors like the Xeon Phi, or to control and support near-data processing accelerators on a memory controller.

4 Using Multiple Address Spaces in Applications

The volume of data processed by applications is increasing dramatically, and the amount of physical memory in machines is growing to meet this demand. However, effectively using this memory poses challenges for programmers. The main challenges tackled in this chapter are addressing more physical memory than the size of a virtual address space, maintaining pointer-based data structures across process lifetimes, and sharing very large memory objects between processes. It is plausible to anticipate that main memory capacity will exceed the virtual address space size supported by CPUs today (currently 256 TiB, with 48 virtual address bits). This is made particularly likely by non-volatile memory (NVM) devices – with larger capacity than DRAM – expected to appear in memory systems by 2020. While some vendors have already increased the number of virtual address (VA) bits in their processors, adding VA bits has implications on performance, power, production, and estate cost that make it undesirable for low-end and low-power processors. Adding VA bits also implies longer TLB miss latency, which is already a bottleneck in current systems (up to 50% overhead in scientific applications [MCV08]). Processors supporting virtual address spaces smaller than the available physical memory force in-memory data structures to be partitioned across multiple processes or to be mapped in and out of virtual memory, using techniques similar to how we manage I/O today.

Preserving pointer-based data structures beyond process boundaries requires data conversion via serialization techniques, which incurs a large performance overhead. Sharing and storing pointers in their original form across process lifetimes requires guaranteed acquisition of specific VA locations. Providing this guarantee is not always feasible, and even when it is, it may necessitate mapping datasets residing at conflicting VA locations in and out of the virtual address space. Languages can hide some of this nuisance from programmers by relying on special pointer representations; however, this results in non-standard programming techniques. Sharing data among processes requires communication protocols with a server process (using sockets, pipes, etc.), impacting programmability and incurring communication channel overheads. Sharing data via traditional shared memory requires tedious communication and synchronization between all client processes for growing the shared region or guaranteeing consistency on writes. To address these challenges, we present SpaceJMP, a set of APIs and OS mechanisms to manage memory. SpaceJMP applications create their own virtual address spaces (VASes) as first-class objects, independent of processes. Detaching VASes from processes enables a single process to simultaneously execute in multiple VASes, such that threads of that process can switch between these VASes in a lightweight manner. It also enables a single VAS to be shared by multiple processes or to exist in the system on its own, in a self-contained manner, as temporary, persistent storage. SpaceJMP allows a process to use an arbitrary amount of VA space by placing data in multiple address spaces. If a process runs out of virtual memory, it does not need to modify data mappings or create new processes; it simply creates more address spaces and switches among them. In SpaceJMP, a process can rely on the guaranteed availability of VA locations, which means it can use pointer-based data structures without relying on special pointers to circumvent address conflicts. Clients that live in a shared environment with a server do not need to communicate with a server process but can synchronize with each other over shared-region management. SpaceJMP builds on a wealth of historical techniques in memory management and virtualization but revisits them for modern data-centric applications in the light of expected, new memory-centric architectures. We will see in the evaluation, however, that the presented methods are also applicable on today’s hardware, delivering more flexibility and performance for large data-centric applications compared with current OS facilities.
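As a foretaste of the programming model, first-class address-space operations have roughly the following flavor (the names, types, and signatures here are illustrative sketches, not the exact SpaceJMP interface presented later):

/* Illustrative (assumed) prototypes for first-class VAS operations. */
typedef int vas_t;      /* a VAS object, independent of any process */
typedef int vhandle_t;  /* a process's handle onto an attached VAS  */

vas_t     vas_create(const char *name, int perms);
vhandle_t vas_attach(vas_t vas);
void      vas_switch(vhandle_t h);

void example(void) {
    vas_t v = vas_create("dataset", 0660);  /* VAS outlives the creating process */
    vhandle_t h = vas_attach(v);            /* attach the calling process to it  */
    vas_switch(h);                          /* threads switch in cheaply...      */
    /* ... dereference pointers stored in the VAS at their original addresses,
       with no serialization or pointer translation, then switch back out. */
}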


4.1 Motivation

The primary objectives of this work are as follows:

• Addressing the insufficient VA bits for processes in today’s hardware

• Methods for effectively preserving pointer-based data structures with little overhead and without special programming techniques

• The efficient sharing of large amounts of memory between processes

We discuss these challenges in Section 4.1.1, Section 4.1.2, and Section 4.1.3, respectively.

4.1.1 Memory technology

Non-Volatile Memory (NVM) technologies promise persistence with low latency, better scaling, and lower power consumption. This will enable massive pools of densely packed memory with much larger capacity than today's DRAM-based systems. Combined with high-radix optical switches [VSM+08], these future memory systems will appear to the processing elements as single, petabyte-scale "load-store domains" [FKMM15]. One implication of this trend is that the physical memory accessible from a CPU will exceed the size that its VA bits can address. While almost all modern processor architectures support 64-bit addressing, CPU implementations pass fewer bits to the virtual memory translation unit because of power and performance implications. When designing a CPU core, vendors have to strike a balance between optimizing the implementation for their highest-volume market (e.g., computing platforms with many GiBs of main memory) and serving the needs of high-end machines requiring much larger address spaces. Most CPUs today are limited to 48 virtual address bits (i.e., 256 TiB) and 44-46 physical address bits (16-64 TiB). On such hardware, the challenge is to support applications that want to address large physical memories without paying the cost of increasing processor address bits across the board. One solution is partitioning physical memory across multiple processes, which incurs unnecessary inter-process communication overheads and is tedious to program. Another is mapping memory partitions in and out of the VAS, which has the overheads discussed in Section 4.1.4.


4.1.2 Preserving pointer-based data structures

Maintaining pointer-based data structures beyond process lifetimes without paying serialization overhead has motivated persistent-memory programming models to adopt region-based programming paradigms [CCA+11, CBB14, VTS11]. Such approaches provide a more natural means for applications to interact with data. A challenge, however, is the representation of a pointer across process lifetimes: a region that contains pointers may, when remapped later, end up at a different virtual address, invalidating all pointers into it. To solve this problem, such languages typically rely on special pointers (e.g., offset-based pointers, where the runtime address is computed from the region's start address and a stored offset). This limits programmability, portability, and ease of use for programmers. It also hinders the adoption of legacy code in such environments, as legacy code typically requires substantial changes to adopt a new pointer model. Another solution is to require memory regions to be mapped at fixed virtual addresses by all parties. This creates degenerate scenarios where memory is mapped in and out whenever memory regions overlap, the drawbacks of which are discussed in Section 4.1.4.
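To make the special-pointer workaround concrete, the following is a minimal sketch (our own illustration, not taken from any of the cited systems) of an offset-based pointer in C: instead of a raw address, the structure stores the offset of the target from the region base, and every dereference recomputes the absolute address from the region's current mapping.

#include <stddef.h>
#include <stdint.h>

/* A pointer stored as an offset from the start of its region.
 * Offset 0 encodes NULL, so the region's first byte cannot be a target. */
typedef struct { uint64_t off; } rel_ptr;

static inline void rel_set(rel_ptr *p, void *region_base, void *target)
{
    p->off = target ? (uint64_t)((char *)target - (char *)region_base) : 0;
}

static inline void *rel_get(const rel_ptr *p, void *region_base)
{
    return p->off ? (char *)region_base + p->off : NULL;
}

Every pointer field in the region must use this representation, and every dereference must know the region base; this is exactly the programmability cost described above.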

4.1.3 Large-scale sharing of memory

Scenarios where multiple processes share large amounts of data require effective mechanisms for processes to access and manage shared data efficiently and safely. An old and well-known approach is the client-server model, whereby client processes communicate with one or more servers (e.g., key-value stores [Wikd, Wik03, OAE+11] that manage and serve data). Such approaches typically rely on network interfaces that incur significant communication and serialization overhead. Future memory-centric systems, where a set of compute nodes is connected to a large, byte-addressable storage pool, present an opportunity to eliminate much of this cost by sharing access to the same memory between clients and server.

4.1.4 Problems with legacy methods

Traditional UNIX operating systems are based on the concept of using the file abstraction to describe all accessible data. As systems become memory-centric, these interfaces become increasingly inadequate: they expose an abstraction modeled on the traditional workings of a mechanical disk (e.g., seek, read, write). Forcing programmers to use such abstractions in the future will likely lead to performance bottlenecks and usability limitations; we present one such instance in Section 4.5.4.

[Plot: latency in msec (log scale, 10^-4 to 10^4) vs. region size (2^15 to 2^35 bytes) for map, map (cached), unmap, and unmap (cached).]

Figure 4.1: Page-table construction (mmap) and removal (munmap) costs in Linux, using 4 KiB pages. Does not include page-zeroing costs.

Changing memory maps in the critical path of an application has significant performance implications, and scaling to large memories further exacerbates the problem. Conventional methods such as mmap are typically slow and not scalable [CKZ13]. Figure 4.1 shows that constructing page tables for a 1 GiB region using 4 KiB pages takes about 5 ms; for 64 GiB, the cost is about 2 seconds. Such costs are prohibitive on the fast path. Mapping cached page tables, as prototyped in Barrelfish, minimizes these costs, and SpaceJMP provides a basis for caching page table entries.
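The measurement behind Figure 4.1 can be approximated with standard POSIX calls. The following sketch (our illustration, not the benchmark used for the figure) times mmap and munmap of anonymous regions, using Linux's MAP_POPULATE flag to force eager page-table construction; unlike Figure 4.1, this also includes the page-zeroing costs.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

static double ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    for (int shift = 20; shift <= 36; shift += 4) {
        size_t sz = 1UL << shift;
        struct timespec t0, t1, t2;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* MAP_POPULATE builds the page tables eagerly (Linux-specific). */
        void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (p == MAP_FAILED) { perror("mmap"); continue; }
        munmap(p, sz);
        clock_gettime(CLOCK_MONOTONIC, &t2);

        printf("2^%d bytes: map %.3f ms, unmap %.3f ms\n",
               shift, ms(t0, t1), ms(t1, t2));
    }
    return 0;
}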


In addition to the performance implications, the inflexibility of legacy interfaces is a major concern: memory-centric computing requires a careful organization of the virtual address space, and data access and the underlying mappings will need to be installed by the applications themselves. Interfaces such as mmap provide only limited control. In Linux, for example, mmap does not safely abort if a request is made to map a region of memory that overlaps an existing region; it simply maps over it. Moreover, memory sharing is tightly coupled to coarse-grained protection bits configured on the backing file via access control lists (ACLs). This forces an unavoidable translation between two different security models: the ACL-backed file system on one side, and on the other the much finer-grained page-based protection of conventional virtual memory systems, or the object-based protection of capability systems like CHERI [WNW+17]. All this, together with linkers that dynamically relocate libraries or randomize stack locations, prevents applications from controlling their memory layout effectively.

4.2 Related work

SpaceJMP is related to a wide range of work both in OS design and hardware support for memory management.

4.2.1 Operating systems

SpaceJMP is influenced by the design of Mach [RTY+87], in particular the concept of a memory object mappable by different processes. Such techniques were also adopted in the Opal [CBHLL92] and Nemesis [Han99] SASOSes to describe memory segments characterized by fixed virtual offsets. Lindstrom [LRD95] expands these notions with shareable containers that hold code segments and private memory; protection is enforced with a capability model. Application threads, called loci, enter containers to execute and manipulate data. Loci motivated our approach to the client-server model. Single-address-space operating systems (SASOSes) (e.g., Opal [CBHLL92, CLBhL92] and IVY [Li88]) eliminate duplicate page tables, but protection domains still need to be enforced by additional hardware (protection lookaside buffers) or page groups [KCE92]. Systems like Mondrix [WRA05] provide isolation by enforcing protection at fixed entry and exit points of cross-domain calls, much like traditional "call gates"


or the CAP computers’ “enter capabilities” [Wil79]. Such mechanisms complement SpaceJMP by providing protection capabilities at granularities other than a page within a VAS itself. Also, SASOSes assume large virtual address spaces, but we predict a limit on virtual address bits in the future. More recent work such as SMARTMAP [BPH08] evaluates data-sharing on high-performance machines. In the 90s, research focused on exporting functionality to , reducing OS complexity, and enabling applications to set memory management policies [EKO95, RKZB11, Han99]. Our work is supported by such ideas, as applications are exposed to an interface to compose virtual address spaces, with protection enforced by the OS, as with seL4 [EDE08]. The idea of using multiple address spaces has mainly been applied to achieve protection in a shared environment [CBHLL92, TKM99], and to address the mismatch of abstractions provided by the OS interfaces [AL91, RKZB11]. Some operating systems provide similar APIs compared to SpaceJMP in order to simplify kernel development by executing the kernel directly in user-space [Eco, Dik06]. Dune [BBM+12] uses virtualization hardware to sandbox threads of a process in their own VAS. SpaceJMP goes further by promoting a virtual address space to a first-class citizen in the OS, allowing applications to compose logically related memory regions for more efficient sharing.

4.2.2 Memory management

Memory management is one of the oldest areas of computer systems, and many different designs have been proposed. Most familiar OSes support address spaces for protection and isolation, with the threads of a process sharing a single virtual address space. The OS uses locks to ensure thread safety, and the resulting contention forces memory-management-intensive programs to be split into multiple processes [WA09]. RadixVM [CKZ13] maintains a radix tree from virtual addresses to metadata and allows concurrent operations on non-overlapping memory regions. Making the VAS abstraction a first-class citizen provides a more natural expression of sharing and isolation semantics, while extending the use of such scalable virtual memory optimizations. BSD operating system kernels are designed around the concept of virtual memory objects, enabling more efficient memory mappings, similar to UVM [CP99]. Earlier 32-bit editions of Windows gave applications only 2 GiB of virtual address space, but allowed raising this limit to 3 GiB for memory-intensive


applications [Mica]. Furthermore, Address Windowing Extensions [Micb] gave applications the means to allocate physical memory above 4 GiB and quickly place it inside reserved windows in the 32-bit address space, allowing applications to continue using regular 32-bit pointers. Distributed shared memory systems [KCDZ94] gained little adoption in the past, but have recently attracted renewed interest [NHM+15].

4.2.3 Communication and Sharing

Strong isolation limits the flexibility to share complex, pointer-rich data structures, and shared memory objects duplicate page-table structures in each process. OpenVMS [NK98] shares page tables for the subtree of a global section, and Munroe patented sharing data among multiple virtual address spaces [MPS04]. Anderson et al. [ALBL91] recognized that the increased use of RPC-based communication requires both sender and receiver to be scheduled to transfer data, involving a context switch and kernel support (LRPC [BALL90]) or shared memory between cores (URPC [BALL91]). In both cases, data serialization, cache misses, and buffering lead to overhead [KC94, CLR94, TLL94]. Compared to existing IPC mechanisms like Mach Local RPC [Dra90] or overlays [Lev00], SpaceJMP distinguishes itself by crossing VAS boundaries rather than task or process boundaries. Ideally, bulk data transfer between processes avoids copies [SR12, SWP01, CCC14, JC14], although there are cases where security precludes this [DP93, JC14]. Collaborative data processing with multiple processes often requires pointer-rich data structures to be serialized into a self-contained format and sent to other processes, which induces overhead in web services but also on single multi-core machines [HJR+03, CFKL99, SM12, JKH+04]. Cyclone [GMJ+02] has memory regions with varying lifetimes and checks the validity of pointers in the scopes where they are dereferenced. Systems like QuickStore [WD94] provide shared-memory implementations of object stores, leveraging the virtual memory system to make objects available via their virtual addresses. QuickStore, however, incurs the cost of in-place pointer swizzling for objects whose virtual address ranges overlap, as well as the cost of mapping memory itself. SpaceJMP elides swizzling by attaching alternate VASes, combined with VAS-aware pointer tracking for safe execution.


Mondrix [WRA05] aims for safety by enforcing the isolation of protection domains within the Linux kernel, based on a combination of hardware and software. Cross-domain calls involve permission checks by (modified) hardware and happen at fixed entry or exit points (switch gates); a thread needs to switch into a protection domain to execute its code. This is similar to enter capabilities [SSF99].

4.2.4 Hardware

We have already discussed the trade-offs involved in tagged TLBs and observed that SpaceJMP would benefit from a more advanced tagging architecture than that found in x86 processors. The HP PA-RISC [MLM+86] and Intel Itanium [Int, ZRMH00] architectures both divide the virtual address space into windows onto distinct address regions, with the mapping controlled by region registers holding the VAS identifier. SpaceJMP could also profit from the adoption of Direct Segments [BGC+13] in general-purpose hardware: SpaceJMP segments are currently backed by the underlying page-table structure, but integrating our segment API with Direct Segments would yield further benefits by reducing TLB miss and page-translation overheads. Alternatively, RMM [KGA+15] proposes hardware support that adds a redundant mapping for large ranges of contiguous pages. RMM, too, is a potentially good match for SpaceJMP segments, requiring fewer TLB entries than a traditional page-based system, and SpaceJMP could be extended straightforwardly to implement RMM's eager allocation strategy. Other work proposes hardware-level optimizations that enable finer granularities than a page [KGA+15, SPR+15]. The CODOMs [VBYN+14] architecture allows trusted and untrusted software components to reside in the same address space and interact with each other. Sharing memory locations is achieved using hardware capability registers to grant temporary memory access from untrusted domains, and the hardware allows different modules to communicate efficiently by simply invoking procedure calls. In dIPC [VJN+17], the authors extend CODOMs with inter-process communication by having threads of different applications reside in the same address space. Similarly, the SpaceJMP Redis example enables client threads to temporarily enter the server's address space; the hardware support in CODOMs would benefit SpaceJMP as well, by further isolating clients within a server address space. Bailey et al. [BCGL11] argue that the availability of huge and fast non-volatile, byte-addressable memory (NVM) will fundamentally influence OS structure. Persistent storage will


be managed by existing virtual memory mechanisms [MAK+13, CNF+09] and accessed directly by the CPU. Atlas [CBB14] examines durability when accessing persistent storage by using locks and explicit publishing to persistent memory. SpaceJMP provides a foundation for future designs of memory-centric operating system architectures.

4.3 Design

We now describe the design of SpaceJMP and define the terminology used in this chapter. SpaceJMP provides two key abstractions: lockable segments encapsulate sharing of in-memory data, and virtual address spaces represent sets of non-overlapping segments.

4.3.1 Lockable Segments

In SpaceJMP, all data and code used by the system exist within segments. The term "segment" has been used to refer to many different concepts over the years; in SpaceJMP, a segment should be thought of as an extension of the model used in Unix to hold code, data, stack, etc.: a single, contiguous area of virtual memory containing code and data, with a fixed virtual start address and size, together with meta-data describing how to access the content in memory. With every segment we store the backing physical frames, the mapping from its virtual addresses to physical frames, and the associated access rights. For safe access to regions of memory, all SpaceJMP segments can be made lockable. In order to switch into an address space, the OS must acquire a reader/writer lock on each lockable segment in that address space. Each lock acquisition is tied to the access permissions of its corresponding segment: if the segment is mapped read-only, the lock is acquired in shared mode, supporting multiple readers (i.e., multiple reading address spaces) and no writers. Conversely, if the segment is mapped writable, the lock is acquired exclusively, ensuring that only one client at a time is allowed in an address space with that segment mapped. Lockable segments are the unit of data sharing and protection in SpaceJMP. Together with address space switching, they provide a fast and secure way to guarantee safe, concurrent access to memory regions by many independent clients.
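This rule maps directly onto a reader/writer lock per segment. The following user-level sketch (ours, with hypothetical structures standing in for the kernel's) shows the lock step of a switch: shared acquisition for read-only mappings, exclusive acquisition for writable ones.

#include <stdbool.h>
#include <pthread.h>

/* Hypothetical representation of a lockable segment. */
struct segment {
    pthread_rwlock_t lock;  /* reader/writer lock guarding the data */
    bool writable;          /* how the VAS being entered maps it */
    bool lockable;
};

struct vas {
    struct segment **segs;
    int nsegs;
};

/* Acquire the locks of all lockable segments in the VAS being entered. */
static void vas_lock_segments(struct vas *v)
{
    for (int i = 0; i < v->nsegs; i++) {
        struct segment *s = v->segs[i];
        if (!s->lockable)
            continue;
        if (s->writable)
            pthread_rwlock_wrlock(&s->lock); /* exclusive: one writer */
        else
            pthread_rwlock_rdlock(&s->lock); /* shared: many readers */
    }
}

/* Release them when switching away. */
static void vas_unlock_segments(struct vas *v)
{
    for (int i = 0; i < v->nsegs; i++)
        if (v->segs[i]->lockable)
            pthread_rwlock_unlock(&v->segs[i]->lock);
}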


[Diagram: on the left, the traditional process abstraction with a single address space (text, data, heap, stack); on the right, the SpaceJMP process abstraction, where one process attaches to several address spaces (with AScurrent designating the active one) and text, data, and stack segments are shared across address spaces.]

Figure 4.2: Contrasting SpaceJMP and Unix.

4.3.2 Multiple Virtual Address Spaces

In most contemporary OSes, a process or kernel thread is associated with a single virtual address space (VAS), assigned when the execution context is created. In contrast, SpaceJMP virtual address spaces are first-class OS objects, created and manipulated independently of threads or processes. The contrast between SpaceJMP and traditional Unix-like (or other) OSes is shown in Figure 4.2. SpaceJMP can emulate regular Unix processes (and does, as we discuss in the next section), but provides a way to rapidly switch sections of mappings to make different sets of segments accessible. The SpaceJMP API is shown in Figure 4.3. A process in SpaceJMP can create, delete, and enumerate VASes. A VAS can then be attached by one or more processes (via vas_attach), and a process can switch its context between the VASes it is attached to at any point during its execution. A VAS can also continue to exist beyond the lifetime of its creating process (e.g., it may be attached to other processes). The average user only needs to worry about manipulating VASes; advanced users and library developers may directly manipulate the segments within a VAS. Heap allocators, for instance, must manage the segments that service the allocations (see Section 4.4). Segments can be attached to, or detached from, a VAS. In practice, some segments (such as global OS mappings, code segments, and thread stacks) are widely shared between the VASes attached to the same process. seg_attach and seg_detach allow installing segments either process-specifically, using the VAS handle (vh), or globally for all


VAS API – for applications.

vas_find(name) → vid
vas_create(name,perms) → vid
vas_clone(vid) → vid
vas_attach(vid) → vh
vas_detach(vh)
vas_switch(vh)
vas_ctl(cmd,vid[,arg])

Segment API – for library developers.

seg_find(name) → sid
seg_alloc(name,base,size,perms) → sid
seg_clone(sid) → sid
seg_attach(vid,sid)    seg_attach(vh,sid)
seg_detach(vid,sid)    seg_detach(vh,sid)
seg_ctl(sid,cmd[,arg])

Figure 4.3: SpaceJMP interface.

type *t; vasid_t vid; vhandle_t vh; segid_t sid;

// Example use of segment API.
va = 0xC0DE; sz = (1UL << 35);
vid = vas_create("v0", 660);
sid = seg_alloc("s0", va, sz, 660);
seg_attach(vid, sid);

// Example use of VAS API.
vid = vas_find("v0");
vh = vas_attach(vid);
vas_switch(vh);
t = malloc(...);
*t = 42;

Figure 4.4: Example SpaceJMP usage.

processes attached to the VAS, using the vid. Rather than reinventing the wheel, our permission model for segments and address spaces uses the existing security models available in the OS. In DragonFly BSD, for example, we rely on ACLs to restrict access to segments and address spaces for processes or process groups; in Barrelfish, we use the capability system provided by the OS (Section 4.4.1) to control access. To modify permissions, the user can clone a segment or VAS and use seg_ctl or vas_ctl to change the meta-data of the new object (e.g., its permissions) accordingly. A spawned process still receives its initial VAS from the OS; the exposed functionality, however, allows the user process to construct and use additional VASes. A typical call sequence to create a new VAS is shown in Figure 4.4.
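Figure 4.4 covers the single-process case. The following sketch (ours, assuming a spacejmp.h header that declares the Figure 4.3 API and the types from Figure 4.4) extends it to the sharing scenario of Section 4.1.3: a producer publishes a VAS with a globally attached data segment, and a consumer process attaches to it later.

#include "spacejmp.h"   /* assumed header for the Figure 4.3 API */

/* Producer: create a VAS with one globally visible data segment. */
void producer(void)
{
    vasid_t  vid = vas_create("shared", 660);
    segid_t  sid = seg_alloc("data", 0x600000000000UL, 1UL << 30, 660);
    seg_attach(vid, sid);            /* via vid: visible to all attachers */

    vhandle_t vh = vas_attach(vid);  /* library adds private code/stack */
    vas_switch(vh);
    /* ... build pointer-rich structures at guaranteed addresses ... */
}

/* Consumer: possibly a different process, possibly a later lifetime. */
void consumer(void)
{
    vasid_t   vid = vas_find("shared");
    vhandle_t vh  = vas_attach(vid);
    vas_switch(vh);
    /* ... pointers stored by the producer are valid verbatim ... */
}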


4.4 Implementation

We built 64-bit x86 implementations of SpaceJMP for the Barrelfish and DragonFly BSD operating systems, which occupy very different points in the design space for memory management and provide a broad perspective on the implementation trade-offs for SpaceJMP.

4.4.1 Barrelfish

We recall from Section 1.3 that, in contrast to traditional UNIX systems, Barrelfish prohibits dynamic memory allocation in the kernel. Barrelfish instead relies on one or more user-space memory servers which allocate physical memory on behalf of applications. The capability system allows an application to retype memory into different kinds of objects; retype operations are requested through system calls and checked by the kernel. The security model guarantees that a user-space process can safely allocate memory for kernel-level objects (page tables, dispatcher control blocks, etc.) and modify them safely through capability invocations. For example, in order to install a physical frame into a last-level page table, a user-space process passes the capability for the page table, the capability for the frame, and the slot number at which to install the frame in the table. During the map invocation, the kernel checks that both the frame and page-table capabilities are valid and writes the physical address into the correct slot (along with the specified access rights, etc.). The capability system in Barrelfish exposes a fine-grained access control scheme for all system objects to applications. It therefore allows processes great flexibility in implementing policies, while security-relevant actions are performed safely via explicit capability invocations. The result is that SpaceJMP is implemented almost entirely in user space, and no additional logic in the kernel was needed: all VAS management operations translate into series of explicit capability invocations that were already supported in Barrelfish. We use this functionality to explicitly share and modify page tables without kernel participation. In particular, we implemented a SpaceJMP service to track the VASes created in the system, together with their attached segments and attaching processes, similar to the kernel extensions employed in BSD. Processes interact with this user-level service via RPCs.
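In outline, the kernel side of such a map invocation behaves like the sketch below. All names here are hypothetical illustrations of the flow just described, not Barrelfish's actual interface; the two lookup and write helpers are declared but elided.

#include <stdint.h>

/* Hypothetical capability representation. */
enum cap_type { CAP_FRAME, CAP_PTABLE };

struct cap {
    enum cap_type type;
    uint64_t      paddr;   /* physical address of the object */
};

#define PTABLE_ENTRIES 512

/* Stubs standing in for kernel internals (elided). */
struct cap *cspace_lookup(unsigned ref);
void ptable_write_entry(uint64_t pt_paddr, unsigned slot,
                        uint64_t frame_paddr, unsigned flags);

/* Kernel side of a map invocation: validate both capabilities,
 * then install the frame into the requested slot. */
int sys_map(unsigned ptable_ref, unsigned frame_ref,
            unsigned slot, unsigned flags)
{
    struct cap *pt = cspace_lookup(ptable_ref);
    struct cap *fr = cspace_lookup(frame_ref);

    if (!pt || !fr || pt->type != CAP_PTABLE || fr->type != CAP_FRAME)
        return -1;              /* capability checks failed */
    if (slot >= PTABLE_ENTRIES)
        return -1;

    ptable_write_entry(pt->paddr, slot, fr->paddr, flags);
    return 0;
}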


Upon attaching to a VAS, a process obtains a new capability to a root page table to be filled in with mapping information. Switching to the VAS is a capability invocation that replaces the thread's root page table with the one of the VAS. This is guaranteed to be a safe operation, as the capability system enforces that only valid mappings exist. Initially, all page tables other than the root of a VAS are shared among the attached processes, allowing easy propagation of updated mappings. The Barrelfish kernel is therefore unaware of SpaceJMP objects. To enforce their proper reclamation, we can rely on the capability revocation mechanism: revoking the process' root page-table capability prevents the process from switching into the VAS. The result is a pure user-space implementation of SpaceJMP, enabling us to enforce custom policies, such as alignment constraints or the selection of alternative page sizes. We also implemented the ability to cache the page-table structures of a subtree and patch them into a VAS upon segment attachment, which results in fast attach and detach latencies for segments.

4.4.2 DragonFly BSD

DragonFly BSD [Wikb] is a scalable, minimalistic kernel originally derived from FreeBSD 4, supporting only the x86-64 architecture. Our SpaceJMP implementation includes modifications to the BSD memory subsystem and an address-space-aware implementation of the user-space malloc interfaces. Memory subsystem: DragonFly BSD has a nearly identical memory system to FreeBSD, both of which derive from the memory system in the Mach kernel [RTY+87]; the following descriptions therefore also apply to similar systems. The BSD memory subsystem is based on the concept of VM objects [RTY+87], which abstract storage (files, swap, raw memory, etc.). Each object contains a set of physical pages that hold the associated content. A SpaceJMP segment is a wrapper around such an object, backed only by physical memory and additionally containing global identifiers (e.g., a name) and protection state. Physical pages are reserved at the time a segment is created and are not swappable. Furthermore, a segment may contain a set of cached translations to accelerate attachment to an address space; these translations may be cached globally in the kernel if the segment is shared with other processes.


Address spaces in BSD, as in Linux, are represented by two layers: a high-level set of region descriptors (virtual offset, length, permissions) and a single instance of the architecture-specific translation structures used by the CPU. Each region descriptor references a single VM object, telling the page-fault handler where to ask for physical pages. A SpaceJMP segment can therefore be created as a VM object and added directly to an address space, with minimal modifications to the core OS kernel implementation. Sharing segments with other processes is straightforward to implement, as a segment can always be attached to an existing address space, provided it does not overlap a previously mapped region. Sharing an address space is slightly more complicated: as mentioned in Section 4.3, we only share segments that are also visible to other processes. Doing so allows applications to attach their own private segments (such as their code or their stack) into an address space before they switch into it; such private segments would otherwise conflict heavily, as every process tends to have its stack and program code at roughly the same locations. The underlying address space representation that BSD uses (the vmspace object) always represents a particular instance of an address space with concrete segments mapped; sharing a vmspace instance directly with other processes therefore does not work. Instead, when sharing an address space, the OS shares just the set of memory segments that comprise the VAS. A process then attaches to the VAS, creating a vmspace, which it can then switch into. A slight modification of the process context structure was necessary to hold references to more than one vmspace object, along with a pointer to the current address space. Knowing the active address space is required for the correct operation of the page-fault handler and of system calls which modify the address space, such as the segment attach operation. Inadvertent address collisions may arise between segments, such as those for stack, heap, and code. For example, attaching to a VAS may fail if a (global) segment within it conflicts with the attaching process' (private, fixed-address) code segments. Our current implementation in DragonFly BSD avoids this by ensuring that globally visible and process-private segments are created in disjoint address ranges. VAS switching: Attaching to a global VAS creates a new process-private instance of a vmspace object, in which the process code, stack, and global regions are mapped. A program or runtime initiates a switch via a system call; the kernel identifies the vmspace object specified by


the call, then simply overwrites CR3 (the control register indicating the currently active page table on an x86 CPU core) with the physical address of the page table of the new VAS. Other registers, such as the stack pointer, are not modified. The kernel may also initiate a switch transparently, for example after a page fault, or triggered by any other event.
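Conceptually, the process-context change and the switch path reduce to a few lines. The sketch below uses hypothetical names (proc_vas_state, sys_vas_switch, and the stub helpers), not the actual DragonFly BSD code.

#include <stdint.h>

struct vmspace;                  /* BSD's per-address-space object */

#define MAX_ATTACHED_VAS 16

/* Hypothetical extension of the process context: the attached
 * vmspaces plus the currently active one, consulted by the
 * page-fault handler and address-space system calls. */
struct proc_vas_state {
    struct vmspace *attached[MAX_ATTACHED_VAS];
    int             nattached;
    struct vmspace *current;
};

/* Stubs standing in for kernel internals (elided). */
extern struct vmspace *proc_lookup_vas(int vh);
extern void            proc_set_current_vas(struct vmspace *vm);
extern uint64_t        vmspace_root_pagetable(struct vmspace *vm);

static inline void load_cr3(uint64_t phys)
{
    __asm__ volatile("mov %0, %%cr3" :: "r"(phys) : "memory");
}

/* Kernel handler for the vas_switch system call: only CR3 changes;
 * the stack pointer and other registers are left untouched. */
int sys_vas_switch(int vh)
{
    struct vmspace *vm = proc_lookup_vas(vh);
    if (vm == NULL)
        return -1;
    proc_set_current_vas(vm);
    load_cr3(vmspace_root_pagetable(vm)); /* flushes untagged TLB entries */
    return 0;
}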

4.4.3 Runtime library

We developed a library to facilitate application development against the SpaceJMP kernel interface, hiding complexities such as locating the boundaries of process code, globals, and stack for use within an address space across switches. The library performs much of the bookkeeping involved in attaching an address space to a process: the private segments holding the process' program text, globals, and thread-specific stacks are attached to the process-local vmspace object using the VAS handle. Furthermore, the library provides allocation of heap space (malloc) within a specific segment while inside an address space. SpaceJMP complicates heap management, since programs need to allocate memory from different segments depending on their needs. These segments may not be attached to every address space of the process; moreover, a call to free memory can only be executed by a process if it is currently in an address space which has the corresponding segment attached. To manage this complexity, the SpaceJMP allocator is built over Doug Lea's dlmalloc [Lea], using its notion of a memory space (mspace). An mspace is the allocator's internal state, and it may be placed at an arbitrary location. Our library supports distinct mspaces for individual segments and provides wrapper functions for malloc and free which supply the correct mspace instance to dlmalloc, depending on the currently active address space and segment.
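The wrapper scheme can be sketched as follows (our simplification; it assumes dlmalloc built with MSPACES support, exposed here through an assumed dlmalloc.h header, and a hypothetical current_segment() helper that resolves the active VAS and segment).

#include <stddef.h>
#include "dlmalloc.h"   /* assumed: dlmalloc compiled with MSPACES=1 */

struct sj_segment {
    void   *base;       /* the segment's fixed virtual start address */
    size_t  size;
    mspace  msp;        /* allocator state, placed inside the segment */
};

/* Hypothetical: resolve the active segment of the active VAS. */
extern struct sj_segment *current_segment(void);

void sj_heap_init(struct sj_segment *s)
{
    /* The mspace lives inside the segment itself, so the allocator's
     * metadata is valid in every VAS that maps the segment. */
    s->msp = create_mspace_with_base(s->base, s->size, 1 /* locked */);
}

void *sj_malloc(size_t n)
{
    return mspace_malloc(current_segment()->msp, n);
}

void sj_free(void *p)
{
    /* Only legal while a VAS with the owning segment is active. */
    mspace_free(current_segment()->msp, p);
}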

4.4.4 Discussion

The current implementations of SpaceJMP are relatively ad hoc, and there are several directions for improvement. More effective caching of page-table structures would be beneficial for performance, but imposes additional constraints on virtual addresses to be efficient. For example, mapping an 8 KiB segment on the boundary of a PML4 slot requires 7 page tables: one PML4 table and two tables each of PDPT, PDT, and PT.


Requiring certain alignment constraints for the virtual addresses of segments would avoid crossing such high-level page-table boundaries, so that page-table structures can be cached more efficiently. Hardware support for SpaceJMP is also an interesting source of future work. Modern x86-64 CPUs support tagging of TLB entries with a compact (e.g., 12-bit) address space identifier to reduce the overhead incurred by a full TLB flush on every address space switch. Our current implementations reserve the tag value zero to always trigger a TLB flush on a context switch; by default, all address spaces use tag value zero, and the user can request that a tag be assigned to an address space using vas_ctl. The trade-off can be complex: the use of many tags can decrease overall TLB coverage (particularly since in SpaceJMP many address spaces share the same translations) and result in lower overall performance instead of faster context switches. Furthermore, this trade-off is hardware-specific. Only a single TLB tag can be current in an x86 MMU, whereas other hardware architectures (such as the "domain bits" of the ARMv7-A architecture [ARM14] or the protection attributes of PA-RISC [WS92]) offer more flexible specification of translations, which would allow TLB entries to be shared across VAS boundaries. SpaceJMP also increases the burden on the programmer significantly: the fact that any given object may or may not be accessible, depending on the address space a thread currently resides in, makes programs harder to write. The compiler can help with static analysis to prove when dereferences are safe, or insert checks before any pointer dereference that cannot be proven safe statically. A trivial solution would be to tag pointers with the ID of the address space they belong to and insert checks on the tags before any pointer dereference, as in the sketch below. While this introduces a runtime overhead, some architectures help make it more efficient; for example, ARM's AArch64 [ARM15] supports the notion of "tagged pointers", where eight bits of every pointer can be made available to software and are not passed to the MMU. Finally, SpaceJMP can serve as a basis for further applications: efficient copy-on-write implementations on top of SpaceJMP could expose further functionality such as fast snapshotting and versioning. It is also possible to achieve more fine-grained isolation by using different address spaces for different threads within a process; for example, light-weight contexts [LVOE+16] used similar techniques to protect cryptography libraries, private keys, and other sensitive data from other threads within the same process.
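A minimal version of such pointer tagging might look like this (our sketch; current_vas_id() is a hypothetical helper): the top byte of a pointer carries the VAS ID, which is checked and stripped before each dereference. AArch64's tagged-pointer support would let software keep the tag in place instead of stripping it.

#include <assert.h>
#include <stdint.h>

/* Hypothetical: ID of the VAS the thread currently executes in. */
extern uint8_t current_vas_id(void);

#define TAG_SHIFT 56   /* top byte of a 64-bit pointer */

static inline void *tag_ptr(void *p, uint8_t vas_id)
{
    return (void *)(((uintptr_t)p & ((1UL << TAG_SHIFT) - 1))
                    | ((uintptr_t)vas_id << TAG_SHIFT));
}

static inline void *check_and_strip(void *p)
{
    uint8_t tag = (uintptr_t)p >> TAG_SHIFT;
    assert(tag == current_vas_id() && "pointer used in wrong VAS");
    return (void *)((uintptr_t)p & ((1UL << TAG_SHIFT) - 1));
}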


Name    Memory    Processors              Freq.
M1      92 GiB    2x12c Xeon X5650        2.66 GHz
M2      256 GiB   2x10c Xeon E5-2670v2    2.50 GHz
M3      512 GiB   2x18c Xeon E5-2699v3    2.30 GHz

Table 4.1: Large-memory platforms used in our study.

Operation      DragonFly BSD    Barrelfish
CR3 load       130 / 224        130 / 224
system call    357 / –          130 / –
vas_switch     1127 / 807       664 / 462

Table 4.2: Breakdown of context-switch costs on M2, in cycles. The second number in each pair is with TLB tags enabled.

4.5 Evaluation

In this section, we evaluate potential benefits afforded through SpaceJMP using three applications across domains: (i) a single-threaded benchmark derived from the HPCC GUPS [PBV+06] code to demonstrate the advantages of fast switching and scalable

access to many address spaces; (ii) RedisJMP, an enhanced implementation of the data-center object-caching middleware Redis, designed to leverage the multi-address-space programming model and lockable segments to ensure consistency among clients; and (iii) a genomics tool, to demonstrate ease of switching with complex, pointer-rich data structures. Three hardware platforms, code-named M1, M2, and M3 (Table 4.1), support our measurements. All are dual-socket server systems, varying in total memory capacity, core count, and CPU micro-architecture; symmetric multi-threading and dynamic frequency scaling are disabled. Unless otherwise mentioned, the DragonFly BSD SpaceJMP implementation is used.

4.5.1 Microbenchmarks

This section explores the trade-offs of our approach to address space manipulation and evaluates its performance characteristics as an RPC mechanism (Figure 4.6). We begin with an evaluation of VAS modification overheads, followed by a breakdown of context-switching costs with and without the TLB-tagging optimization.


[Plot: page-touch latency in cycles vs. number of 4 KiB pages (0 to 2000), for "Switch (Tag Off)", "Switch (Tag On)", and "No context switch".]

Figure 4.5: Impact of TLB tagging (M3) on a random-access workload. Tagging retains translations, and can lower costs of VAS switching.

Recall from Figure 4.1 that page-table modification does not scale, even in optimized, mature OS implementations. The reason is that entire page-table subtrees must be created, at a cost directly proportional to the region size and inversely proportional to the page size. When restricted to a single per-process address space, changing the translations for a range of virtual addresses using mmap and munmap incurs these costs. Copy-on-write optimizations can ultimately only reduce these costs for large, sparsely accessed regions, and random-access workloads that stress large areas incur higher page-fault overheads with this technique. With SpaceJMP, these costs are removed from the critical path by switching translations instead of modifying them; we demonstrate this performance impact on the GUPS workload in Section 4.5.2. Given the ability to switch into other address spaces at arbitrary points, a process in SpaceJMP will context-switch at more frequent intervals than is typical, for example due to task rescheduling, potentially thousands of times per second. Table 4.2 breaks down the immediate costs imposed by an address space switch in our DragonFly BSD implementation; the system call imposes the largest cost. Subsequent costs are incurred as TLB misses trigger the page-walking hardware to fetch new translations.


[Plot: latency in cycles (log scale, 10^2 to 10^6) vs. transfer size in bytes (4 to 256k), for SpaceJMP, URPC L, and URPC X.]

Figure 4.6: Comparison of URPC and SpaceJMP on Barrelfish (M2) as an alternative solution for fast local RPC communication.

While the immediate costs can only be improved within the micro-architecture, the subsequent costs can be improved with TLB tagging. Notice in Table 4.2 that changing the CR3 register becomes more expensive with tagging enabled, as it invokes additional hardware circuitry that must consider extra TLB-resident state upon a write. Naturally, these are hardware-dependent costs. The overall cost of switching nevertheless improves, due to reduced TLB misses on shared OS entries. We measured the impact of tagging in the TLB directly, using a random page-walking benchmark we wrote: for a given set of pages, it loads one cache line from a randomly chosen page; a write to CR3 is then introduced between iterations, and the cost in cycles to access the cache line is measured, first without and then with tags enabled (Figure 4.5). With tags, the cost of accessing a cache line drops towards the cost incurred without CR3 writes; the benefit tails off, however, as the working set grows. As expected, the benefits gained using tags are limited by a combination of TLB capacity (for a given page size) and the sophistication of the TLB prefetchers: in our experiment, access latencies for larger working sets match the latencies seen when the TLB is flushed.


Finally, we compare the latency of switching address spaces in SpaceJMP with that of issuing a remote procedure call to another core. In this benchmark, an RPC client issues a request to a server process on a different core and waits for the acknowledgment. The exchange consists of two messages, each containing either a 64-bit key or a variable-sized payload. We compare this with the same semantics in SpaceJMP: switching into the server's VAS and accessing the data directly, copying it into the process-local address space. Figure 4.6 shows the comparison between SpaceJMP and different RPC backends. Our point of comparison is the (highly optimized) Barrelfish low-latency RPC, stripped of stub code to expose only the raw low-level mechanism. In the low-latency case, both client and server busy-wait, polling different circular buffers of cache-line-sized messages in a manner similar to FastForward [GMV08]. This is the best-case scenario for Barrelfish RPCs; a real-world case would add overhead for marshalling, polling multiple channels, etc. We see a slight difference between intra-socket (URPC L) and inter-socket (URPC X) performance. In all cases, the latency grows once the payload exceeds the buffer size. SpaceJMP is only out-performed by intra-socket URPC for small messages (due to system call and context switch overheads). Between sockets, the interconnect overhead for RPC dominates the cost of switching the VAS; in this case, using TLB tags further reduces latency.

4.5.2 GUPS: Addressing Large Memory

To deal with the limits of virtual memory addressability, applications adopt various solutions for accessing larger physical memories. In this section, we use the GUPS benchmark [PBV+06] to compare two approaches in use today with a design using SpaceJMP. We ask two key questions: (i) how can applications address large physical memory regions, and (ii) what are the limitations of these approaches? GUPS is appropriate for this: it measures the ability of a system to scale when applying random updates to a large in-memory array. The array is one large logical table of integers, partitioned into some number of windows. GUPS executes a tight loop that, for some number of updates per iteration, computes a random index within a given window for each update and then mutates the value at that index. After each set of updates is applied, a new window is chosen at random. Figure 4.7 illustrates the performance comparison between the three approaches, with performance reported as the rate of updates applied, in millions per second.


[Plot: MUPS per process vs. number of address spaces (1 GiB windows, 1 to 128), for SpaceJMP, MP, and MAP, each with update set sizes 16 and 64.]

Figure 4.7: Comparison of three designs to program large memories with GUPS (M3). Update set sizes 16 and 64.

The first approach (MAP) leverages address space remapping: a traditional process may logically extend the reach of its VAS by dynamically remapping portions of it to different regions of physical memory. Our implementation of this design uses the BSD mmap and munmap system calls (configured to attach to existing pages in the kernel's page cache) to open new windows for writing. The second traditional approach (MP) uses multiple processes: each process is assigned a distinct portion of the physical memory (a window). In our experiment, one process acts as master and the rest as slaves; the master process sends RPC messages using OpenMPI to the slave process holding the appropriate portion of physical memory, then blocks, waiting for the slave to apply the batch of updates before continuing, thereby simulating a single thread applying updates to a large global table. Each process is pinned to a core. We compare these techniques with VAS switching, modifying GUPS to take advantage of SpaceJMP. Unlike the first technique, we do not modify mappings when changing windows, but instead represent each window as a segment in its own VAS; pending updates are maintained in a shared heap segment mapped into all attached address spaces. Figure 4.8 illustrates the rate of VAS switching and TLB misses.
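Before turning to the results, the SpaceJMP variant of the update loop can be summarized in a short sketch (ours; window_vh, window_table, and window_entries are assumed to be set up during initialization, with the table segment at the same fixed address in every window's VAS).

#include <stdint.h>
#include <stdlib.h>
#include "spacejmp.h"                 /* assumed SpaceJMP API header */

#define NWINDOWS 64
#define UPDATES_PER_WINDOW 1024

extern vhandle_t window_vh[NWINDOWS]; /* one attached VAS per window */
extern uint64_t *window_table;        /* same fixed address in every VAS */
extern size_t    window_entries;

/* One GUPS iteration: pick a window, enter its VAS, apply updates. */
void gups_iteration(void)
{
    int w = rand() % NWINDOWS;
    vas_switch(window_vh[w]);         /* replaces remapping on the
                                       * critical path */
    for (int i = 0; i < UPDATES_PER_WINDOW; i++) {
        size_t idx = (size_t)rand() % window_entries;
        window_table[idx] ^= (uint64_t)rand();
    }
}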


[Plot: rate in thousands per second (log scale) vs. number of address spaces (1 GiB windows, 1 to 128), for VAS switches and TLB misses, each with update set sizes 16 and 64.]

Figure 4.8: Rate of VAS switching and TLB misses for GUPS executed with SpaceJMP, averaged across 16 iterations. TLB tagging is disabled. A larger window size would produce a greater TLB miss rate using one window.

For a single window (no remapping, RPC, or switching) all three designs perform equally well. With more windows, changing windows immediately becomes prohibitively expensive for the MAP design, as it requires modifying the address space on the critical path. For the MP and SpaceJMP designs, the amount of CPU cache used grows with each added window, due to cache-line fills from updates as well as the growth in the page-translation structures pulled in by the page walker. The SpaceJMP implementation performs at least as well as the multi-process implementation, despite frequent context switches and despite all data and multiple translation tables competing for the same set of CPU caches (with MP, only one translation table resides on each core). Above 36 cores on M3, the performance of MP drops due to the busy-wait characteristics of the OpenMPI implementation. The same trends are visible across a range of update set sizes (16 and 64 in the figure). Finally, a design leveraging SpaceJMP is more flexible, as a single process can independently address multiple address spaces without message passing. This experiment shows that SpaceJMP occupies a useful point in the design space


between multiple-process and page-remapping techniques – there is tangible benefit from switching between multiple address spaces rapidly on a single thread.

4.5.3 Redis with Multiple Address Spaces

In this section, we investigate the trade-off in adapting an existing application to use SpaceJMP and the potential benefits over traditional programming models. We use Redis (v3.0.2) [Wikd], a popular in-memory key-value store. Clients interact with Redis using UNIX domain or TCP/IP sockets by sending commands, such as SET and GET, to store and retrieve data. Our experiments use local clients and UNIX domain sockets for performance.

We compare a basic single-threaded Redis instance with RedisJMP, which exploits SpaceJMP by eliding the use of socket-based communication. RedisJMP avoids a server process entirely, retaining only the server data; clients access the server data by switching into its address space. RedisJMP is therefore implemented as a client-side library, and the server data is initialized lazily by its first client. This client creates a new lockable segment for the state, maps it into a newly created address space, switches into this address space, and runs the initialization code to set up the Redis data structures. We replaced the Redis memory allocator with one based on SpaceJMP and moved all Redis global variables to a statically allocated region inside the lockable segment. Each client creates either one or two address spaces with the lockable segment mapped read-only or read-write, and invokes commands by switching into the newly created address space, executing server code directly. Our RedisJMP implementation currently supports only basic operations on simple data types. Some Redis features, such as publish-subscribe, would be more challenging to support, but could be implemented in a dedicated notification service.

RedisJMP uses lockable segments to provide parallel read access to the Redis state, but two further modifications to the Redis code were needed. First, Redis creates heap objects even for GET (read-only) requests when parsing commands, which would require read-write access to the lockable segment for all commands; we therefore attach a small, per-client scratch heap to each client's server address space. Second, Redis grows and shrinks its hash tables asynchronously with respect to queries; we modified it to resize and rehash entries only when a client holds an exclusive lock on the address space.
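Schematically, a client-side read then proceeds as in the following sketch (ours; everything except the SpaceJMP calls is a hypothetical name): switch into the read-only server VAS, run the lookup against the server's data structures directly, copy the value into a process-local buffer, and switch back.

#include <stddef.h>
#include <string.h>
#include "spacejmp.h"   /* assumed SpaceJMP API header */

extern vhandle_t server_ro_vh; /* server VAS, segment mapped read-only */
extern vhandle_t home_vh;      /* the client's original VAS */

/* Hypothetical: the Redis dictionary lookup, run as server code. */
extern const char *redis_dict_get(const char *key, size_t *len);

/* 'out' must lie in a segment that is also mapped in the server VAS
 * (e.g., the client's stack, which the runtime library attaches). */
size_t redisjmp_get(const char *key, char *out, size_t outcap)
{
    vas_switch(server_ro_vh);  /* segment lock taken in shared mode */
    size_t len = 0;
    const char *v = redis_dict_get(key, &len);
    if (len > outcap)
        len = outcap;
    if (v != NULL)
        memcpy(out, v, len);
    vas_switch(home_vh);
    return v ? len : 0;
}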


[Plots: requests per second vs. number of clients for (a) GET requests (RedisJMP, RedisJMP with TLB tags, Redis, and Redis 6x) and (b) SET requests (RedisJMP, Redis); (c) maximum throughput for a varying mix of GET and SET requests (0-100% SETs) for RedisJMP and Redis.]

Figure 4.9: Performance comparison of Redis vs. a version of Redis using SpaceJMP.


We used the redis-benchmark tool from the Redis distribution to compare the throughput of RedisJMP with that of the original on a single machine (M1). The benchmark simulates multiple clients by opening multiple file descriptors and sending commands, asynchronously polling and waiting for a response before sending the next command. For SpaceJMP, we modified redis-benchmark to use our API, with individual processes as clients that all attach the same server segment.

Figure 4.9a and Figure 4.9b show the performance of the GET and SET commands (4-byte payload) for a regular Redis server and RedisJMP. With a single client (one thread), SpaceJMP outperforms a single server instance of Redis by a factor of 4x for GET and SET requests, by reducing communication overhead. The maximum read throughput of a single-threaded Redis server is naturally limited by the clock speed of a single core, whereas RedisJMP allows multiple readers to access the address space concurrently. We therefore also compare the throughput of a single RedisJMP instance with six independent Redis instances (Redis 6x), paired with six instances of redis-benchmark, running on the twelve-core machine. Even in this case, at full utilization RedisJMP still serves 36% more requests than the six regular Redis instances. We also compare SpaceJMP with and without TLB tagging and notice a slight performance improvement using tags, until synchronization overhead limits scalability. For TLB misses, we measured a rate of 8.9M misses per second with a single client and 3.6M per core per second at full utilization (using 12 clients); with TLB tagging, the miss rate was lower: 2.8M misses per second for a single client and 0.9M per core per second (using 12 clients). The total number of address space switches per second equals twice the request rate, for any number of clients.

For SET requests, we sustain a high request rate until too many clients contend on the segment lock. This is a fundamental SpaceJMP limit, but we anticipate that a more scalable lock design than our current implementation would yield further improvements. Figure 4.9c shows maximum system throughput as the percentage of SET requests increases. The write lock has a large impact on throughput even when only 10% of the requests are SETs, but RedisJMP still outperforms traditional file-based communication.

In summary, small changes suffice to adapt an existing application to SpaceJMP, and they can make a single-threaded implementation both scale better and sustain higher single-thread performance by reducing IPC overhead.


[Bar chart: normalized time for Flagstat, Qname Sort, Coordinate Sort, and Index, comparing BAM, SAM, and SpaceJMP.]

Figure 4.10: SAMTools vs. an implementation with SpaceJMP. BAM and SAM are alternative in-memory serialization methods; SpaceJMP has no serialization.

4.5.4 SAMTools: In-Memory Data Structures

Finally, we show the benefit of using SpaceJMP as a mechanism to keep data structures in memory, avoiding both regular file I/O and memory-mapped files. SAMTools [LHW+09] is a toolset for processing DNA sequence alignment information. It operates on multiple file formats that encode aligned sequences and performs various operations such as sorting, indexing, filtering, and collecting statistics and pileup data. Each of these operations parses file data, performs a computation, and may write output to another file. Much of the CPU time is spent converting between serialized and in-memory representations of data. We implemented a version of SAMTools that uses SpaceJMP to keep data in its in-memory representation, avoiding the frequent data conversions (such as the serialization of large data structures) that make up the majority of its execution time. Instead of storing the alignment information in a file according to a schema, we retain the data in a virtual address space and persist it between process executions. Each process operating on the data switches into the address space, performs its operation on the data structure, and keeps its results in the address space for the next process to use. Figure 4.10 compares the performance of the SpaceJMP version to the original SAMTools operations on Sequence Alignment/Map (SAM) and BGZF-compressed (BAM) files of sizes 3.1 GiB and 0.9 GiB, respectively. It is evident from the graph that keeping data in memory with SpaceJMP results in a significant speedup. The SAM and BAM files are stored on an in-memory file system, so the impact of disk access in the original tool is completely factored out; the performance improvement comes mainly from avoiding data conversion between pointer-rich and serialized formats.


[Bar chart: normalized time for Flagstat, Qname Sort, Coordinate Sort, and Index, comparing MMAP and SpaceJMP; absolute runtimes in seconds are printed above each bar.]

Figure 4.11: Use of mmap vs. SpaceJMP in SAMTools. Absolute runtime in seconds shown above each bar.

We also implemented a version of SAMTools that uses memory-mapped files to keep data structures in memory. Processes access data by calling mmap on a file, and region-based programming is used to build the data structures within the file. To maximize fairness, we make mmap as lightweight as possible: we use an in-memory file system for the files and pass flags to mmap to exclude the regions from the core file and to inform the pager not to gratuitously sync dirty pages back to disk. Moreover, we stop the timers before process exit, to exclude the implicit cost of unmapping the data. Figure 4.11 compares the performance of SpaceJMP to this counterpart using memory-mapped files. The two have comparable performance; the flexibility provided by SpaceJMP over memory-mapped files (i.e., not using special pointers or managing address conflicts) therefore comes at no cost. Moreover, with caching of translations enabled, we expect SpaceJMP's performance to improve further. It is noteworthy that flagstat shows a more significant improvement from SpaceJMP than the other operations in Figure 4.11: flagstat runs much more quickly than the others, so the time spent performing a VAS switch or mmap accounts for a larger fraction of the total time.

4.6 Concluding remarks

SpaceJMP exposes multiple address spaces and lockable segments to processes. The implementations of SpaceJMP in Barrelfish and BSD show promising benefits in micro-benchmarks and in real-world applications that program massive memories and pointer-rich data structures. SpaceJMP is an extension of existing OS mechanisms that


overcomes the limitation of virtual address bits and gives processes and applications more flexibility over memory management. As an OS mechanism, SpaceJMP can in the future also help to handle the growing complexity of memory. Future memory systems are likely to include a combination of several heterogeneous hardware modules with quite different characteristics: a volatile tier for performance, a persistent tier for capacity, combined with different levels of memory-side caching, as well as private memory not accessible by everyone, and so on. Applications today use cumbersome techniques involving explicit copying and DMA to operate in these environments. SpaceJMP is a basis for operating dynamically in a complex heterogeneous memory system and provides application support for efficient programming.


5 Conclusion

Over the last decade, commodity computer systems have diverged significantly from single-core architectures: first came the introduction of multicore, followed by the addition of more specialized, manycore hardware in the form of GPUs and other accelerators. Single machines were scaled up to encompass an entire rack. New storage technologies increased I/O performance and capacities significantly, and combinations of large storage layers with fast, optical interconnects were proposed to provide low-latency data access in such systems. All these efforts are driven by the demand to process growing data volumes faster and with less energy. The challenge for systems research is to find suitable OS designs for such hardware, along with adequate mechanisms that the OS exposes to applications. To tackle this challenge, in Chapter 2 we presented an OS design that treats all cores as fully dynamic. More concretely, we showed how to decouple all OS state from the underlying core and kernel itself. We achieve this by leveraging the partitioned capability system of Barrelfish/DC and a kernel control block that captures all the state of a core inside an OSnode. With that, the OS can quickly add or remove cores in a system, move per-core state arbitrarily between cores, or multiplex two OSnodes on a single core. Performance measurements of real applications and device drivers show that the approach is practical enough to be used for many purposes, such as online kernel upgrades, core hot-plugging, or multiplexing two cores on one. In Chapter 3, we revisited an old problem: the standard, inflexible models and policies that the OS provides to applications. With Badis, we showed how one can

customize the entire OS stack, including the kernel, library OS, and runtimes. The ability to customize makes it possible to maximize performance by tailoring parts of the system to specific application classes. We evaluated Badis with two use-cases: a kernel that successfully eliminates OS noise and provides hard real-time guarantees to applications, and a task-based runtime, integrated with the operating system, that coordinates multiple parallel runtimes within a machine. In Chapter 4, we demonstrated a system that addresses the problem of accessing memory larger than the size of a single address space, as well as data sharing and communication with low overheads. The presented system, SpaceJMP, is a set of OS extensions to manage memory and introduces two novel features: first, it gives applications the ability to create, modify, and switch address spaces directly; second, it implements lockable segments to coordinate access to address spaces shared across different processes. We showed that the approach is feasible and often performs better than known solutions for sharing pointer-rich data structures in memory, or for communication between client and server processes. In a broader context, this dissertation investigates several relevant aspects of a radically different vision of how operating systems should exploit hardware and how to present it to applications. After more than 50 years of miniaturization, we expect transistors to stop decreasing in size by 2020 due to physical limitations [VDC17]. The implications are that hardware can no longer gain more functionality by merely adding transistors to a chip, nor rely on improvements in power consumption or performance through shrinking transistors. In the past few years, we saw a clear trend to incorporate specialized hardware for important, compute-intensive applications to speed up processing. A clear indication of this is the rising popularity of accelerator cards such as general-purpose graphics processing units (GPGPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). These examples allow one to speculate about possible design aspects of hardware in the post-Moore era and how this dissertation is relevant in such a world. If we envision a possible future machine, having reached its maximum potential in the size and number of transistors, a logical conclusion would be to shift the development focus towards maximum adaptability and reconfigurability of its logic circuits, specializing the system without the use of FPGA boards or other technologies. Such a hypothetical machine could then go far beyond the capabilities of an FPGA by dynamically reconfiguring every part of the hardware: this would give the possibility to restructure

108 5.1. Specialized hardware

arbitrary components at runtime, for example by changing memory to trade off storage for compute space; by modifying the interconnect network, adding, removing, or changing links to increase capacities or reduce latencies; by resizing, adding, or removing caches of any kind; and eventually by instantiating and destroying customized compute cores at arbitrary locations, not just next to caches: on the interconnect, at the memory controller, and so on. It is easy to imagine that the system software for such hardware needs to be highly adaptive and dynamic to manage such a machine.

Barrelfish/DC shows that an OS that treats all cores and devices as fully dynamic is realistically achievable, with very little overhead for applications. Badis, together with Barrelfish/DC, further extends the OS, giving it the ability to launch various customized OS services on specialized cores and devices by defining a way to manage and interact with them. SpaceJMP offers applications the means to dynamically and quickly adapt and change their logical view of physical memory and storage by exchanging their virtual page tables.

In the remainder of this chapter, we discuss several areas of immediate interest in which this work can be extended and improved to build a full system software stack to control such future hardware.

5.1 Specialized hardware

Our current evaluations are based on Intel and AMD multicore machines. These multiprocessor architectures are homogeneous (i.e., all processors are identical). In the future, we expect systems to become increasingly heterogeneous, containing a combination of processors with different functionality, performance characteristics, and potentially various ISAs. The techniques presented in Barrelfish/DC (Chapter 2) also apply to heterogeneous systems: for example, Barrelfish has since been extended with a range of core boot drivers that support many different architectures, including ARM-based variants and the Intel Xeon Phi co-processor on x86-64 systems.

Today, a system typically contains one or more accelerators in the form of GPGPUs, FPGAs, or ASICs. As accelerators are maturing, the need for system software services on such devices arises [SFW14]. Their reduced functionality makes accelerators a prime target for installing custom, light-weight OS kernels. Badis provides a framework to do so and allows for seamless interaction with the control-plane OS. The advent of FPGAs makes it easier for OS designers to experiment directly with hardware by customizing it or adding functionality.
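
As a rough illustration of the boot-driver idea mentioned above, the sketch below hides per-architecture boot protocols behind a single dispatch function. The driver table, function names, and log messages are hypothetical and greatly simplified compared to the real implementation.

    /* Hypothetical boot-driver dispatch for heterogeneous cores. */
    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>

    struct boot_driver {
        const char *arch;                   /* e.g., "x86_64", "armv8" */
        int (*boot)(int coreid, const char *kernel_image);
    };

    static int boot_x86(int core, const char *img) {
        printf("x86_64: INIT/SIPI sequence for core %d, image %s\n", core, img);
        return 0;
    }

    static int boot_arm(int core, const char *img) {
        printf("armv8: PSCI CPU_ON for core %d, image %s\n", core, img);
        return 0;
    }

    static const struct boot_driver drivers[] = {
        { "x86_64", boot_x86 },
        { "armv8",  boot_arm },
    };

    /* Start a (possibly specialized) kernel on any core, regardless of ISA. */
    static int boot_core(const char *arch, int core, const char *img) {
        for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++)
            if (strcmp(drivers[i].arch, arch) == 0)
                return drivers[i].boot(core, img);
        return -1; /* no boot driver for this architecture */
    }

    int main(void) {
        boot_core("x86_64", 4, "/boot/realtime_kernel"); /* a Badis-style custom kernel */
        boot_core("armv8", 1, "/boot/cpu_driver");
        return 0;
    }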

FPGAs offer much potential for co-designing OS APIs with hardware support. For example, in Section 4.4.4 we discussed several trade-offs and ideas for better TLB tagging support in combination with SpaceJMP. Lockable segments could also benefit from transactional hardware support to make the proposed mechanism safer and more resilient. Intel's Transactional Synchronization Extensions (TSX) are an example of hardware that falls short for lockable segments in SpaceJMP: although transactional memory would be incredibly useful for a system like SpaceJMP, to avoid extensive locking overheads and to guarantee consistency to clients as part of the OS kernel, TSX always aborts in case of a ring transition (i.e., every time an address-space switch occurs). Therefore, the transaction cannot be initiated by the OS.
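
The limitation is easy to reproduce from user space. In the sketch below, an ordinary system call stands in for the ring transition that an address-space switch would cause; on TSX-capable hardware (compile with gcc -mrtm) the transaction always takes the abort path.

    /* Demonstrates why TSX cannot protect an address-space switch: any
     * ring transition aborts the hardware transaction. */
    #define _GNU_SOURCE
    #include <immintrin.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        long slot = 0;
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            syscall(SYS_getpid); /* ring transition: the transaction aborts here */
            slot = 42;
            _xend();
            printf("committed, slot=%ld\n", slot); /* never reached on TSX hardware */
        } else {
            /* The path lockable segments would always end up on: the OS has
             * to fall back to conventional locking. */
            printf("aborted (status 0x%x), slot=%ld\n", status, slot);
        }
        return 0;
    }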

5.2 Rack-scale systems

Rack-scale systems are systems with hundreds of compute nodes connected over optical interconnects or low-latency networks. Such systems promise the best of both worlds of scale-up and scale-out architectures: a vast number of cores paired with low data-access and communication latencies. From an OS perspective, these architectures introduce a variety of exciting design decisions and challenges. We already discussed the problem of spatial and temporal scheduling for applications on big machines as part of the Basslet runtime system in Section 3.5.

Another challenge is dealing with core or node failures in rack-scale systems. While Barrelfish/DC from Chapter 2 and the concept of dynamic cores do not directly address reliability issues or deal with core failures, the idea of an encapsulated state within a KCB and a per-core kernel leads to small, partitioned fault domains, which makes it easier to reason about, isolate, and handle failures. The available mechanisms provide a good starting point for tackling this problem within a multikernel.

Finally, rack-scale architectures that share a global load-store domain (e.g., HP's The Machine) have the potential to eschew large serialization overheads by relying on better message-passing infrastructures as part of the OS. SpaceJMP is one approach to communicate without serialization overheads in scenarios where clients and data providers have direct access to the data storage.
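
As a sketch of how such serialization-free communication could look, consider the fragment below. The vas_* calls are hypothetical placeholders for the SpaceJMP primitives of Chapter 4 (declared here but not implemented), and the fixed root address is an assumed convention between client and server.

    /* Sketch: a client walks the server's pointer-rich data structure in
     * place by switching into a shared virtual address space, so nothing
     * is serialized or copied. */
    #include <stddef.h>

    typedef int vas_t;
    struct node { int key; struct node *next; };

    extern vas_t vas_find(const char *name); /* look up a named, shared VAS */
    extern int   vas_attach(vas_t vas);      /* gain the right to enter it  */
    extern int   vas_switch(vas_t vas);      /* swap in its page tables     */

    /* Root of the server's list, at an address both sides agreed upon. */
    #define SHARED_ROOT ((struct node *)0x600000000000ULL)

    int client_sum(void) {
        vas_t shared = vas_find("server-heap");
        vas_attach(shared);
        vas_switch(shared); /* pointers in the shared heap remain valid */
        int sum = 0;
        for (struct node *n = SHARED_ROOT; n != NULL; n = n->next)
            sum += n->key;  /* read in place: nothing is marshalled */
        return sum;
    }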

5.3 Near-data processing

Near-data processing (NDP) loosely refers to the practice of placing processing capabilities as close as possible to the data. Researchers and companies are currently
investigating several architectural options to achieve this: some make use of specialized CPUs, FPGAs, or custom accelerators placed directly on the DRAM controller or the interconnect network, which reduces the distance between a regular CPU socket and data storage considerably. Others modify the electronic circuitry of the memory cells to provide simple operations like copy, addition, or subtraction on entire DRAM row buffers. The added hardware gives the system the means to offload modifications to memory at scale, with high energy efficiency and extremely low latency.

There are several challenges when designing system software for this use-case, and the topics presented in this dissertation are applicable to many of them. First, the use of non-standard or heterogeneous processing units at remote locations in the system raises the issue of how near-data compute hardware is launched, initialized, and controlled. The concept of boot drivers, introduced in Chapter 2, is applicable here, as it allows the OS to quickly boot a custom processing device and to replace its software or control the hardware during runtime.

Second, the system software that runs on such hardware, and the interface that it exposes, must be defined carefully. The use-case likely demands customized execution and scheduling policies, as well as APIs to send processing work to and interact with near-data units; such policies and APIs are typically neither found nor useful in general-purpose operating systems. Therefore, NDP calls for customized system software on a select set of hardware. Badis, as presented in Chapter 3, explores a similar use-case by specializing regular CPU sockets for data processing.

Finally, the ability to directly offload computation to compute units near memory raises the issue of preserving the protection provided by a conventional process and address space. The Barrelfish capability model allows the entire memory-protection state of a process to be conveniently represented as a set of capabilities. Together with a system like SpaceJMP (Chapter 4), this enables a compelling method for communicating and mirroring page-table state on NDP compute entities, all directly in user space. Furthermore, SpaceJMP can be extended to keep multiple views of one address space with different protection bits, and therefore to quickly change access permissions by switching between different page-table representations of the same mappings. Different protections for the same logical view can further increase security by compartmentalizing the various trusted and untrusted code that runs on NDP devices.
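
A minimal sketch of this proposed extension, assuming hypothetical vas_view_* calls that do not exist in SpaceJMP today: two page-table views map the same segments with different protection bits, so permissions change at the cost of an address-space switch rather than an mprotect() call over every mapping.

    /* Sketch: switching between two protection views of one address space. */
    typedef int vas_t;
    typedef int view_t;

    enum prot { PROT_R = 1, PROT_RW = 3 };

    /* Hypothetical calls: create a view of a VAS with the given protection
     * bits, and install a view by switching page tables. */
    extern view_t vas_view_create(vas_t vas, enum prot prot);
    extern int    vas_view_switch(view_t view);

    /* Run untrusted NDP code with read-only access, then restore writes. */
    void run_untrusted(vas_t vas, void (*ndp_kernel)(const void *), const void *arg) {
        view_t ro = vas_view_create(vas, PROT_R);
        view_t rw = vas_view_create(vas, PROT_RW);

        vas_view_switch(ro); /* same mappings, write permission stripped    */
        ndp_kernel(arg);     /* the untrusted code cannot modify the data   */
        vas_view_switch(rw); /* trusted code regains write access instantly */
    }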


List of Tables

2.1 Architectural details of different systems we use in our evaluation.
2.2 Performance of core management operations for Barrelfish/DC and Linux (3.13) when the system is idle and when the system is under load. For the Barrelfish/DC down column, the value after the slash shows the cost of stopping a core on another socket with regard to the boot driver. (a) We do not include this number for Santa-Rosa because it lacks synchronized timestamp counters, nor for Haswell because it only includes a single package.

3.1 Architectural details of different systems we use in our evaluation.
3.2 Runtime of parallel algorithms executing on Linux versus Basslet.
3.3 Redis throughput (operations/sec) running on Barrelfish versus Badis.

4.1 Large-memory platforms used in our study.
4.2 Breakdown of context switching. Measurements on M2 in cycles. Numbers in bold are with tags enabled.

List of Figures

2.1 Shows the supported operations of a decoupled OS. Update: the entire kernel, dispatching OSnode α, is replaced at runtime. Move: OSnode α, containing all per-core state including applications, is migrated to another core and kernel. Park: OSnode α is moved to a new core and kernel that temporarily dispatches two OSnodes. Unpark: OSnode α is transferred back to its previous core.
2.2 State in the Barrelfish/DC OSnode.
2.3 Breakdown of the cost of bringing up a core for various machines.
2.4 Ethernet driver behavior when restarting kernels and parking OSnodes.
2.5 Webserver behavior when restarting kernels and parking OSnodes.
2.6 PostgreSQL behavior when restarting kernels and parking OSnodes.

3.1 System throughput for executing concurrent pagerank jobs.
3.2 A selfish detour benchmark to measure OS noise. Each sample represents an outlier that was running 9x slower than expected.
3.3 The time of a context switch, including time lost due to cache pollution with increasing working set sizes.
3.4 An Intel Sandy Bridge processor with labeled opportunities for resource sharing: (1) last-level cache, (2) local DRAM controller, (3) bus interconnect, (4) L1 and L2 caches, and (5) simultaneous multithreading.
3.5 Shows the system architecture of Badis. Its key idea is that at any point in time, the cores in a multicore machine are partitioned into a control and a compute plane.

3.6 The Badis architecture: applications submit parallel tasks to the ptask queue(s). The Badis compute plane dequeues ptasks and dispatches the individual tasks to the hardware contexts owned by the compute plane.
3.7 Badis APIs as provided to applications running on the control plane for interaction with compute plane kernels.
3.8 Badis compute plane minimally required API.
3.9 Basslet runtime API.
3.10 Numbers report slow-down of SSSP, HD, and PR algorithms (for the algorithm in the row) when co-executed with a partner algorithm (column) vs. running the algorithm alone. Graphs contrast Linux + OpenMP scheduling vs. Basslet + Badis scheduling.
3.11 Expanding the problem statement experiment (recall Figure 3.1) on four different machines.
3.12 Throughput scale-out when executing multiple PRs using a default Linux + OpenMP scheduler versus Basslet.
3.13 Number of cycles measured for 10^3 iterations of a synthetic benchmark for bfrt, Barrelfish, and Linux using real-time priorities.
3.14 Measuring the overhead of an enqueue syscall and the queuing effects in Badis using HJ.

4.1 Page table construction (mmap) and removal (munmap) costs in Linux, using 4KiB pages. Does not include page zeroing costs.
4.2 Contrasting SpaceJMP and Unix.
4.3 SpaceJMP interface.
4.4 Example SpaceJMP usage.
4.5 Impact of TLB tagging (M3) on a random-access workload. Tagging retains translations, and can lower costs of VAS switching.
4.6 Comparison of URPC and SpaceJMP on Barrelfish (M2) as an alternative solution for fast local RPC communication.
4.7 Comparison of three designs to program large memories with GUPS (M3). Update set sizes 16 and 64.


4.8 Rate of VAS switching and TLB misses for GUPS executed with SpaceJMP, averaged across 16 iterations. TLB tagging is disabled. A larger window size would produce a greater TLB miss rate using one window.
4.9 Performance comparison of Redis vs. a version of Redis using SpaceJMP.
4.10 SAMTools vs. an implementation with SpaceJMP. BAM and SAM are alternative in-memory serialization methods; SpaceJMP has no serialization.
4.11 Use of mmap vs. SpaceJMP in SAMTools. Absolute runtime in seconds shown above each bar.


Bibliography

[ABLL91] T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy. “Scheduler Activations: Effective Kernel Support for the User-level Management of Parallelism.” In Proceedings of the 13th ACM Symposium on Operating Systems Principles, pp. 95–109. 1991.

[ADK+07] J. Appavoo, D. Da Silva, O. Krieger, M. Auslander, M. Ostrowski, B. Rosenburg, A. Waterland, R. W. Wisniewski, J. Xenidis, M. Stumm, and L. Soares. “Experience distributing objects in an SMMP OS.” ACM Transactions on Computer Systems, vol. 25, no. 3, 2007.

[AJK+15] K. Aingaran, S. Jairath, G. Konstadinidis, S. Leung, P. Loewenstein, C. McAllister, S. Phillips, Z. Radovic, R. Sivaramakrishnan, D. Smentek, et al. “M7: Oracle’s next-generation SPARC processor.” IEEE Micro, vol. 35, no. 2, 36–45, 2015.

[AK09] J. Arnold and M. F. Kaashoek. “Ksplice: Automatic Rebootless Kernel Updates.” In Proceedings of the EuroSys Conference, pp. 187–198. 2009.

[AL91] A. W. Appel and K. Li. “Virtual Memory Primitives for User Programs.” In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 96–107. 1991.

[ALBL91] T. E. Anderson, H. M. Levy, B. N. Bershad, and E. D. Lazowska. “The Interaction of Architecture and Operating System Design.” In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 108–120. 1991.

[ARM14] ARM Ltd. ARM Architecture Reference Manual: ARMv7-A and ARMv7- R Edition, 2014. ARM DDI 0406C.c.

[ARM15] ARM Ltd. “ARMv8-A Architecture.”, 2015. http://www.arm.com/products/processors/armv8-architecture.php, accessed 2017-11-20.

[Ash] A. Raj. “CPU hotplug Support in the Linux Kernel.” https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html, accessed 2017-11-20.

[BALL90] B. N. Bershad, T. E. Anderson, E. D. Lazowska, and H. M. Levy. “Lightweight Remote Procedure Call.” ACM Transactions on Computer Systems, vol. 8, no. 1, 37–55, 1990.

[BALL91] B. N. Bershad, T. E. Anderson, E. D. Lazowska, and H. M. Levy. “User- level Interprocess Communication for Shared Memory Multiprocessors.” ACM Transactions on Computer Systems, vol. 9, no. 2, 175–198, 1991.

[Bar81] J. F. Bartlett. “A NonStop Kernel.” In Proceedings of the 8th ACM Symposium on Operating Systems Principles, pp. 22–29. 1981.

[BATO13] C. Balkesen, G. Alonso, J. Teubner, and M. T. Özsu. “Multi-core, Main-memory Joins: Sort vs. Hash Revisited.” Proceedings of the VLDB Endowment, vol. 7, no. 1, 85–96, 2013.

[BAW+07] A. Baumann, J. Appavoo, R. W. Wisniewski, D. D. Silva, O. Krieger, and G. Heiser. “Reboots Are for Hardware: Challenges and Solutions to Updating an Operating System on the Fly.” In Proceedings of the USENIX Annual Technical Conference, pp. 1–14. 2007.

[BBD+09] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. “The multikernel: a new OS architecture for scalable multicore systems.” In Proceedings of the 22nd ACM Symposium on Operating System Principles, pp. 29–44. 2009.

[BBLB03] S. A. Brandt, S. A. Banachowski, C. Lin, and T. Bisson. “Dynamic Integrated Scheduling of Hard Real-Time, Soft Real-Time and Non-Real-Time Processes.” In Proceedings of the 24th IEEE Real-Time Systems Symposium. 2003.


[BBM+12] A. Belay, A. Bittau, A. Mashtizadeh, D. Terei, D. Mazières, and C. Kozyrakis. “Dune: Safe User-level Access to Privileged CPU Features.” In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pp. 335–348. 2012.

[BBSG11] M. Butler, L. Barnes, D. D. Sarma, and B. Gelinas. “Bulldozer: An Approach to Multithreaded Compute Performance.” IEEE Micro, vol. 31, no. 2, 6–15, 2011.

[BCGL11] K. Bailey, L. Ceze, S. D. Gribble, and H. M. Levy. “Operating System Implications of Fast, Cheap, Non-volatile Memory.” In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems, pp. 2–2. 2011.

[BDF+03] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. “Xen and the Art of Virtualization.” In Proceedings of the 19th ACM Symposium on Operating Systems Principles, SOSP ’03, pp. 164–177. 2003.

[BDSK+08] M. Butrico, D. Da Silva, O. Krieger, M. Ostrowski, B. Rosenburg, D. Tsafrir, E. Van Hensbergen, R. W. Wisniewski, and J. Xenidis. “Specialized Execution Environments.” SIGOPS Operating Systems Review, vol. 42, no. 1, 106–107, 2008.

[BGC+13] A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift. “Efficient Virtual Memory for Big Memory Servers.” In Proceedings of the 40th Annual International Symposium on Computer Architecture, pp. 237–248. 2013.

[BHA+05] A. Baumann, G. Heiser, J. Appavoo, D. Da Silva, O. Krieger, R. W. Wisniewski, and J. Kerr. “Providing Dynamic Update in an Operating System.” In Proceedings of the USENIX Annual Technical Conference, pp. 279–291. 2005.

[BIYC06] P. Beckman, K. Iskra, K. Yoshii, and S. Coghlan. “Operating System Issues for Petascale Systems.” SIGOPS Operating Systems Review, pp. 29–33, 2006.


[Bla79] M. Blasgen, J. Gray, M. Mitoma, and T. Price. “The Convoy Phenomenon.” SIGOPS Operating Systems Review, vol. 13, no. 2, 20–25, 1979.

[BLF+13] A. Baumann, D. Lee, P. Fonseca, L. Glendenning, J. R. Lorch, B. Bond, R. Olinsky, and G. C. Hunt. “Composing OS Extensions Safely and Efficiently with Bascule.” In Proceedings of the EuroSys Conference, pp. 239–252. 2013.

[BPH08] R. Brightwell, K. Pedretti, and T. Hudson. “SMARTMAP: Operating System Support for Efficient Data Sharing Among Processes on a Multicore Processor.” In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pp. 25:1–25:12. 2008.

[BSP+95] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers, and S. Eggers. “Extensibility Safety and Performance in the SPIN Operating System.” In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pp. 267–283. 1995.

[BWCC+08] S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. “Corey: An Operating System for Many Cores.” In Proceedings of the 8th Symposium on Operating Systems Design and Implementation, pp. 43–57. 2008.

[BWCM+10] S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. “An Analysis of Linux Scalability to Many Cores.” In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, pp. 1–8. 2010.

[BYD+15] J. Brock, C. Ye, C. Ding, Y. Li, X. Wang, and Y. Luo. “Optimal Cache Partition-Sharing.” In Proceedings of the 44th International Conference on Parallel Processing, pp. 749–758. 2015.

[CBB14] D. R. Chakrabarti, H.-J. Boehm, and K. Bhandari. “Atlas: Leveraging Locks for Non-volatile Memory Consistency.” In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages and Applications, pp. 433–452. 2014.


[CBHLL92] J. Chase, M. Baker-Harvey, H. Levy, and E. Lazowska. “Opal: A Single Address Space System for 64-bit Architectures.” SIGOPS Operating Systems Review, vol. 26, no. 2, 1992.

[CCA+11] J. Coburn, A. M. Caulfield, A. Akel, L. M. Grupp, R. K. Gupta, R. Jhala, and S. Swanson. “NV-Heaps: Making Persistent Objects Fast and Safe with Next-generation, Non-volatile Memories.” In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 105–118. 2011.

[CCC14] T.-H. Chien, C.-J. Chen, and R.-G. Chang. “An Adaptive Zero-Copy Strategy for Ubiquitous High Performance Computing.” In Proceedings of the 21st European MPI Users’ Group Meeting, pp. 139–144. 2014.

[CEH+13] J. A. Colmenares, G. Eads, S. Hofmeyr, S. Bird, M. Moretó, D. Chou, B. Gluzman, E. Roman, D. B. Bartolini, N. Mor, K. Asanović, and J. D. Kubiatowicz. “Tessellation: Refactoring the OS Around Explicit Resource Containers with Continuous Adaptation.” In Proceedings of the 50th Annual Design Automation Conference, pp. 1–10. 2013.

[CFKL99] B. Carpenter, G. Fox, S. H. Ko, and S. Lim. “Object Serialization for Marshalling Data in a Java Interface to MPI.” In Proceedings of the ACM 1999 Conference on Java Grande, pp. 66–71. 1999.

[CGHT17] G. Chatzopoulos, R. Guerraoui, T. Harris, and V. Trigonakis. “Abstracting Multi-Core Topologies with MCTOP.” In Proceedings of the 12th European Conference on Computer Systems, pp. 544–559. 2017.

[CGS+05] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. “X10: An Object-oriented Approach to Non-uniform Cluster Computing.” In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, pp. 519–538. 2005.

[CJ75] E. Cohen and D. Jefferson. “Protection in the Hydra Operating System.” In Proceedings of the 5th ACM Symposium on Operating Systems Principles, pp. 141–160. 1975.


[CJS+09] J. Charles, P. Jassi, A. N. S, A. Sadat, and A. Fedorova. “Evaluation of the Intel Core i7 Turbo Boost feature.” In Proceedings of the IEEE International Symposium on Workload Characterization. 2009.

[CKZ13] A. T. Clements, M. F. Kaashoek, and N. Zeldovich. “RadixVM: Scalable Address Spaces for Multithreaded Applications.” In Proceedings of the 8th ACM European Conference on Computer Systems, pp. 211–224. 2013.

[CLBhL92] J. S. Chase, H. M. Levy, M. Baker-Harvey, and E. D. Lazowska. “How to Use a 64-Bit Virtual Address Space.” Tech. rep., Department of Computer Science and Engineering, University of Washington, 1992.

[CLR94] S. Chandra, J. R. Larus, and A. Rogers. “Where is Time Spent in Message-passing and Shared-memory Programs?” In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 61–73. 1994.

[CNF+09] J. Condit, E. B. Nightingale, C. Frost, E. Ipek, B. Lee, D. Burger, and D. Coetzee. “Better I/O Through Byte-addressable, Persistent Memory.” In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 133–146. 2009.

[Cor] J. Corbet. “Deadline scheduling for Linux.” https://lwn.net/Articles/356576/, accessed 2017-11-20.

[CP99] C. D. Cranor and G. M. Parulkar. “The UVM Virtual Memory System.” In Proceedings of the 1999 USENIX Annual Technical Conference. 1999.

[CR07] J. Cieslewicz and K. A. Ross. “Adaptive Aggregation on Chip Multiprocessors.” In Proceedings of the VLDB Endowment, pp. 339–350. 2007.

[CRD+95] J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta. “Hive: Fault Containment for Shared-memory Multiprocessors.” In Proceedings of the 15th ACM Symposium on Operating Systems Principles, pp. 12–25. 1995.

[CSL04] B. M. Cantrill, M. W. Shapiro, and A. H. Leventhal. “Dynamic Instrumentation of Production Systems.” In Proceedings of the USENIX Annual Technical Conference, pp. 15–28. 2004.


[DBR09] P.-E. Dagand, A. Baumann, and T. Roscoe. “Filet-o-Fish: practical and dependable domain-specific languages for OS development.” In Proceedings of the 5th Workshop on Programming Languages and Operating Systems. 2009.

[DdBF+94] A. Dearle, R. di Bona, J. Farrow, F. Henskens, A. Lindström, J. Rosenberg, and F. Vaughan. “Grasshopper: An Orthogonally Persistent Operating System.” Computing Systems, vol. 7, no. 3, 289–312, 1994.

[DGT13] T. David, R. Guerraoui, and V. Trigonakis. “Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask.” In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pp. 33–48. 2013.

[Dik06] J. Dike. User Mode Linux. Pearson Education, 2006.

[DP93] P. Druschel and L. L. Peterson. “Fbufs: A High-bandwidth Cross-domain Transfer Facility.” In Proceedings of the 14th ACM Symposium on Operating Systems Principles, pp. 189–202. 1993.

[Dra90] R. Draves. “A Revised IPC Interface.” In USENIX MACH Symposium, pp. 101–122. 1990.

[DS10] A. Depoutovitch and M. Stumm. “Otherworld: Giving Applications a Chance to Survive OS Kernel Crashes.” In Proceedings of the EuroSys Conference, pp. 181–194. 2010.

[EBSA+11] H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger. “Dark Silicon and the End of Multicore Scaling.” In Proceedings of the 38th Annual International Symposium on Computer Architecture, pp. 365–376. 2011.

[Eco] A. Economopoulos. “A peek at the DragonFly Virtual Kernel (part 1).” https://lwn.net/Articles/228404/, accessed 2017-11-20.

[EDE08] D. Elkaduwe, P. Derrin, and K. Elphinstone. “Kernel design for isolation and assurance of physical memory.” In Proceedings of the 1st Workshop on Isolation and Integration in Embedded Systems, pp. 35–40. 2008.


[EDS+15] A. Elmore, J. Duggan, M. Stonebraker, M. Balazinska, U. Cetintemel, V. Gadepally, J. Heer, B. Howe, J. Kepner, T. Kraska, et al. “A Demonstration of the BigDAWG Polystore System.” Proceedings of the VLDB Endowment, vol. 8, no. 12, 2015.

[EHMZ+16] I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic, R. Achermann, P. Faraboschi, W.-m. Hwu, T. Roscoe, and K. Schwan. “SpaceJMP: Programming with Multiple Virtual Address Spaces.” In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 353–368. 2016.

[EHMZM17a] I. El Hajj, A. Merritt, G. Zellweger, and D. Milojicic. “Persistent virtual address spaces.”, 2017. WO Patent App. PCT/US2016/015,661.

[EHMZM17b] I. El Hajj, A. Merritt, G. Zellweger, and D. Milojicic. “Switch process virtual address space.”, 2017. WO Patent App. PCT/US2015/049,726.

[EHMZM17c] I. El Hajj, A. Merritt, G. Zellweger, and D. Milojicic. “Versioning virtual address spaces.”, 2017. WO Patent App. PCT/US2016/015,814.

[EKO95] D. R. Engler, M. F. Kaashoek, and J. O’Toole, Jr. “Exokernel: an operating system architecture for application-level resource management.” In Proceedings of the 15th ACM Symposium on Operating Systems Principles, SOSP ’95, pp. 251–266. 1995.

[ESG+94] Y. Endo, M. Seltzer, J. Gwertzman, C. Small, K. A. Smith, and D. Tang. “VINO: The 1994 Fall Harvest.” Technical Report TR-34-94, Center for Research in Computing Technology, Harvard University, 1994.

[FCP+12] F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. “SAP HANA Database: Data Management for Modern Business Applications.” SIGMOD Record, vol. 40, no. 4, 45–51, 2012.

[FFPF05] E. Frachtenberg, D. G. Feitelson, F. Petrini, and J. Fernandez. “Adaptive Parallel Job Scheduling with Flexible Coscheduling.” IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 11, 1066–1077, 2005.

[FKMM15] P. Faraboschi, K. Keeton, T. Marsland, and D. Milojicic. “Beyond Processor-centric Operating Systems.” In 15th Workshop on Hot Topics in Operating Systems. 2015.


[FLR98] M. Frigo, C. E. Leiserson, and K. H. Randall. “The Implementation of the Cilk-5 multithreaded language.” In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 1998.

[Fre] Free Software Foundation Inc. “libgomp: proc.c gomp_dynamic_max_threads().” https://github.com/gcc-mirror/gcc/blob/edd716b6b1caa1a5cb320a8cd7f626f30198e098/libgomp/config/posix/proc.c#L55, accessed 2017-11-20.

[GAK12] G. Giannikis, G. Alonso, and D. Kossmann. “SharedDB: killing one thousand queries with one stone.” Proceedings of the VLDB Endowment, vol. 5, no. 6, 526–537, 2012.

[GARH14] J. Giceva, G. Alonso, T. Roscoe, and T. Harris. “Deployment of Query Plans on Multicores.” Proceedings of the VLDB Endowment, vol. 8, no. 3, 233–244, 2014.

[GGIW10] M. Giampapa, T. Gooding, T. Inglett, and R. W. Wisniewski. “Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene’s CNK.” In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. 2010.

[GKT13] C. Giuffrida, A. Kuijsten, and A. S. Tanenbaum. “Safe and Automatic Live Update for Operating Systems.” In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 279–292. 2013.

[GMG12] T. Gleixner, P. E. McKenney, and V. Guittot. “Cleaning Up Linux’s CPU Hotplug for Real Time and Energy Management.” SIGBED Review, vol. 9, no. 4, 49–52, 2012.

[GMJ+02] D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. “Region-based Memory Management in Cyclone.” In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 282–293. 2002.

[GMV08] J. Giacomoni, T. Moseley, and M. Vachharajani. “FastForward for Efficient Pipeline Parallelism: A Cache-optimized Concurrent Lock-free Queue.” In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 43–52. 2008.

[GSS+13] J. Giceva, T.-I. Salomie, A. Schüpbach, G. Alonso, and T. Roscoe. “COD: Database/Operating System Co-Design.” In Proceedings of the Conference on Innovative Data Systems Research. 2013.

[GZAR16] J. Giceva, G. Zellweger, G. Alonso, and T. Roscoe. “Customized OS Support for Data-processing.” In Proceedings of the 12th International Workshop on Data Management on New Hardware, pp. 2:1–2:6. 2016.

[Han99] S. M. Hand. “Self-paging in the Nemesis Operating System.” In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, pp. 73–86. 1999.

[Har85] N. Hardy. “KeyKOS Architecture.” SIGOPS Operating Systems Review, vol. 19, no. 4, 8–25, 1985.

[HCSO12] S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. “Green-Marl: A DSL for Easy and Efficient Graph Analysis.” In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 349–362. 2012.

[Heo] T. Heo. “Control Group v2.” https://www.kernel.org/doc/Documentation/cgroup-v2.txt, accessed 2017-11-20.

[HJR+03] M. Hericko, M. B. Juric, I. Rozman, S. Beloglavec, and A. Zivkovic. “Object Serialization Analysis and Comparison in Java and .NET.” ACM SIGPLAN Notices, vol. 38, no. 8, 44–54, 2003.

[HMLR07] T. Hoefler, T. Mehlan, A. Lumsdaine, and W. Rehm. “Netgauge: A Network Performance Measurement Framework.” In Proceedings of High Performance Computing and Communications, vol. 4782, pp. 659–671. 2007.

[HMM14] T. Harris, M. Maas, and V. J. Marathe. “Callisto: Co-scheduling Parallel Runtime Systems.” In Proceedings of the 9th European Conference on Computer Systems, pp. 1–14. 2014.


[HSL10] T. Hoefler, T. Schneider, and A. Lumsdaine. “Characterizing the Influence of System Noise on Large-Scale Applications by Simulation.” In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. 2010.

[IKKM07] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. “Core Fusion: Accommodating Software Diversity in Chip Multiprocessors.” In Proceedings of the 34th Annual International Symposium on Computer Architecture, pp. 186–197. 2007.

[Int] Intel Corporation. Intel Itanium Architecture Software Developer’s Manual. Document Number: 245315.

[JC14] B. Jeremia and F. Claudio. “Bulk Transfer over Shared Memory.” Technical report, ETH Zurich, 2014.

[JKH+04] M. B. Juric, B. Kezmah, M. Hericko, I. Rozman, and I. Vezocnik. “Java RMI, RMI Tunneling and Web Services Comparison and Performance Analysis.” ACM SIGPLAN Notices, vol. 39, no. 5, 58–65, 2004.

[Jon] J. Corbet. “Thread-level management in control groups.” https://lwn.net/Articles/656115/, accessed 2017-11-20.

[Jos10] A. Joshi. “Twin-Linux: Running independent Linux Kernels simultaneously on separate cores of a multicore system.” In Proceedings of the Linux Symposium, pp. 101–108. 2010.

[KAH+16] S. Kaestle, R. Achermann, R. Haecki, M. Hoffmann, S. Ramos, and T. Roscoe. “Machine-Aware Atomic Broadcast Trees for Multicores.” In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 33–48. USENIX Association, 2016.

[KAO05] P. Kongetira, K. Aingaran, and K. Olukotun. “Niagara: A 32-Way Multithreaded Sparc Processor.” IEEE Micro, vol. 25, no. 2, 21–29, 2005.

[KB05] S. M. Kelly and R. Brightwell. “Software architecture of the light weight kernel, Catamount.” In Cray User Group, pp. 16–19. 2005.


[KC94] V. Karamcheti and A. A. Chien. “Software Overhead in Messaging Layers: Where Does the Time Go?” In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 51–60. 1994.

[KCDZ94] P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. “TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems.” In Proceedings of the USENIX Winter 1994 Technical Conference, pp. 10–10. 1994.

[KCE92] E. J. Koldinger, J. S. Chase, and S. J. Eggers. “Architecture Support for Single Address Space Operating Systems.” In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 175–186. 1992.

[KEH+09] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. “seL4: Formal Verification of an OS Kernel.” In Proceedings of the 22nd ACM Symposium on Operating System Principles. 2009.

[KFJ+03] R. Kumar, K. I. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen. “Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction.” In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 81–92. 2003.

[KGA+15] V. Karakostas, J. Gandhi, F. Ayar, A. Cristal, M. D. Hill, K. S. McKinley, M. Nemirovsky, M. M. Swift, and O. Ünsal. “Redundant Memory Mappings for Fast Access to Large Memories.” In Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 66–78. 2015.

[KKL+09] C. Kim, T. Kaldewey, V. W. Lee, E. Sedlar, A. D. Nguyen, N. Satish, J. Chhugani, A. Di Blas, and P. Dubey. “Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs.” Proceedings of the VLDB Endowment, vol. 2, no. 2, 1378–1389, 2009.


[KKR09] M. A. Kozuch, M. Kaminsky, and M. P. Ryan. “Migration Without Virtualization.” In Proceedings of the 12th Workshop on Hot Topics in Operating Systems, pp. 10–15. 2009.

[KN11] A. Kemper and T. Neumann. “HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots.” In Proceedings of the IEEE International Conference on Data Engineering, pp. 195–206. 2011.

[LBKN14] V. Leis, P. Boncz, A. Kemper, and T. Neumann. “Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age.” In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 743–754. 2014.

[LCX+12] L. Liu, Z. Cui, M. Xing, Y. Bao, M. Chen, and C. Wu. “A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems.” In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 367–376. 2012.

[LDC+09] R. Lee, X. Ding, F. Chen, Q. Lu, and X. Zhang. “MCC-DB: minimizing cache conflicts in multi-core processors for databases.” Proceedings of the VLDB Endowment, vol. 2, no. 1, 373–384, 2009.

[LDS07] C. Li, C. Ding, and K. Shen. “Quantifying the Cost of Context Switch.” In Proceedings of the Workshop on Experimental Computer Science. 2007.

[Lea] D. Lea. “dlmalloc: A Memory Allocator.” http://g.oswego.edu/dl/html/malloc.html, accessed 2017-11-20.

[Lev00] J. Levine. Linkers and Loaders. Morgan Kaufmann, 2000.

[LHW+09] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and the 1000 Genome Project Data Processing Subgroup. “The Sequence Alignment/Map format and SAMtools.” Bioinformatics, vol. 25, no. 16, 2078–2079, 2009.

[Li88] K. Li. “IVY: A Shared Virtual Memory System for Parallel Computing.” Proceedings of the International Conference on Parallel Processing, vol. 88, 94, 1988.


[Lina] Linaro Ltd. “CPU Hotplug.” https://wiki.linaro.org/WorkingGroups/PowerManagement/Archives/Hotplug, accessed 2017-11-20.

[Linb] Linux Developers. “Linux 4.6-rc7 Scheduler Source code.” https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/fair.c?id=refs/tags/v4.6-rc7#n39, accessed 2017-11-20.

[LK14] J. Leskovec and A. Krevl. “SNAP Datasets: Stanford Large Network Dataset Collection.”, 2014. http://snap.stanford.edu/data, accessed 2017-11-20.

[LKB+09] R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanović, and J. Kubiatowicz. “Tessellation: Space-Time Partitioning in a Manycore Client OS.” In Proceedings of the 1st USENIX Workshop on Hot Topics in Parallelism. 2009.

[LLF+16] J.-P. Lozi, B. Lepers, J. Funston, F. Gaud, V. Quéma, and A. Fedorova. “The Linux Scheduler: A Decade of Wasted Cores.” In Proceedings of the 11th European Conference on Computer Systems, pp. 1–16. 2016.

[LPM+13] Y. Li, I. Pandis, R. Müller, V. Raman, and G. M. Lohman. “NUMA-aware algorithms: the case of data shuffling.” In Proceedings of the Conference on Innovative Data Systems Research. 2013.

[LQF15] B. Lepers, V. Quéma, and A. Fedorova. “Thread and Memory Placement on NUMA Systems: Asymmetry Matters.” In Proceedings of the USENIX Annual Technical Conference, pp. 277–289. 2015.

[LRD95] A. Lindstrom, J. Rosenberg, and A. Dearle. “The Grand Unified Theory of Address Spaces.” In Proceedings of the 5th Workshop on Hot Topics in Operating Systems, pp. 66–71. 1995.

[LVOE+16] J. Litton, A. Vahldiek-Oberwagner, E. Elnikety, D. Garg, B. Bhattacharjee, and P. Druschel. “Light-weight Contexts: An OS Abstraction for Safety and Performance.” In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, pp. 49–64. 2016.


[MAK+13] I. Moraru, D. G. Andersen, M. Kaminsky, N. Tolia, P. Ranganathan, and N. Binkert. “Consistent, Durable, and Safe Memory Management for Byte-addressable Non Volatile Main Memory.” In Proceedings of the 1st ACM SIGOPS Conference on Timely Results in Operating Systems, pp. 1:1–1:17. 2013.

[MCV08] C. McCurdy, A. L. Cox, and J. Vetter. “Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors.” In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, pp. 95–104. 2008.

[MDH+02] D. T. Marr, F. Binns, D. L. Hill, G. Hinton, D. A. Koufaty, J. A. Miller, and M. Upton. “Hyper-Threading Technology Architecture and Microarchitecture.” Intel Technology Journal, 2002.

[MDP+00] D. S. Milojičić, F. Douglis, Y. Paindaveine, R. Wheeler, and S. Zhou. “Process Migration.” ACM Computing Surveys, vol. 32, no. 3, 241–299, 2000.

[Men11] D. Menzi. “Support for heterogeneous cores for Barrelfish.” Master’s thesis, Department of Computer Science, ETH Zurich, 2011.

[Mer14] D. Merkel. “Docker: Lightweight Linux Containers for Consistent Development and Deployment.” Linux Journal, 2014.

[Mica] Microsoft Corp. “4-Gigabyte Tuning: BCDEdit and Boot.ini.” https://msdn.microsoft.com/en-us/library/windows/desktop/bb613473(v=vs.85).aspx, accessed 2017-11-20.

[Micb] Microsoft Corp. “Address Windowing Extensions.” https://msdn.microsoft.com/en-us/library/windows/desktop/aa366527(v=vs.85).aspx, accessed 2017-11-20.

[Micc] Microsoft Corp. “Fibers.” https://msdn.microsoft.com/en-us/library/windows/desktop/ms682661(v=vs.85).aspx, accessed 2017-11-20.

[MLM+86] M. J. Mahon, R. B.-L. Lee, T. C. Miller, J. C. Huck, and W. R. Bryg. “The Hewlett-Packard Precision Architecture: The Processor.” Hewlett-Packard Journal, vol. 37, no. 8, 16–22, 1986.


[MMR+13] A. Madhavapeddy, R. Mortier, C. Rotsos, D. Scott, B. Singh, T. Gazagnaire, S. Smith, S. Hand, and J. Crowcroft. “Unikernels: Library Operating Systems for the Cloud.” In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 461–472. 2013.

[MPS04] S. Munroe, S. Plaetzer, and J. Stopyro. “Computer system having shared address space among multiple virtual address spaces.”, 2004. US Patent 6,681,239.

[MSLM91] B. D. Marsh, M. L. Scott, T. J. LeBlanc, and E. P. Markatos. “First-class User-level Threads.” In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, SOSP ’91, pp. 110–121. 1991.

[Nev12] M. Nevill. “An Evaluation of Capabilities for a Multikernel.” Master’s thesis, ETH Zurich, 2012.

[NHM+15] J. Nelson, B. Holt, B. Myers, P. Briggs, L. Ceze, S. Kahan, and M. Oskin. “Latency-Tolerant Software Distributed Shared Memory.” In Proceedings of the USENIX Annual Technical Conference, pp. 291–305. 2015.

[NK98] K. L. Noel and N. Y. Karkhanis. “OpenVMS Alpha 64-bit Very Large Memory Design.” Digital Tech. Journal, vol. 9, no. 4, 33–48, 1998.

[NSN+11] Y. Nomura, R. Senzaki, D. Nakahara, H. Ushio, T. Kataoka, and H. Taniguchi. “Mint: Booting Multiple Linux Kernels on a Multicore Processor.” In Proceedings of the International Conference on Broadband and Wireless Computing, Communication and Applications, pp. 555–560. 2011.

[OAE+11] J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. M. Rumble, E. Stratmann, and R. Stutsman. “The Case for RAMCloud.” Communications of the ACM, vol. 54, no. 7, 121–130, 2011.

[Ope15] OpenMP Architecture Review Board. “OpenMP Application Program Interface Version 4.5.”, 2015.

[Ora09] Oracle Inc. “Programming Interfaces Guide.”, 2009. https://docs.oracle.com/cd/E18752_01/pdf/817-4415.pdf, accessed 2017-11-20.


[Ous82] J. Ousterhout. “Scheduling Techniques for Concurrent Systems.” IEEE Distributed Computer Systems, 1982.

[OWZS13] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. “Sparrow: Distributed, Low Latency Scheduling.” In Proceedings of the 24th ACM Symposium on Operating Systems Principles, pp. 69–84. 2013.

[PBV+06] S. Plimpton, R. Brightwell, C. Vaughan, K. Underwood, and M. Davis. “A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark.” In Proceedings of the IEEE International Conference on Cluster Computing, pp. 1–7. 2006.

[PBWH+11] D. E. Porter, S. Boyd-Wickizer, J. Howell, R. Olinsky, and G. C. Hunt. “Rethinking the Library OS from the Top Down.” In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 291–304. 2011.

[Pet12] S. Peter. “Resource Management in a Multicore Operating System.” Ph.D. thesis, ETH Zurich, 2012.

[PHA09] H. Pan, B. Hindman, and K. Asanović. “Lithe: Enabling Efficient Composition of Parallel Libraries.” In Proceedings of the 1st USENIX Workshop on Hot Topics in Parallelism. 2009.

[PLTA14] D. Porobic, E. Liarou, P. Tözün, and A. Ailamaki. “ATraPos: Adaptive transaction processing on hardware Islands.” In Proceedings of the IEEE International Conference on Data Engineering, pp. 688–699. 2014.

[PLZ+14] S. Peter, J. Li, I. Zhang, D. R. K. Ports, D. Woos, A. Krishnamurthy, T. Anderson, and T. Roscoe. “Arrakis: The Operating System is the Control Plane.” In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, pp. 1–16. 2014.

[PPB+12] D. Porobic, I. Pandis, M. Branco, P. Tözün, and A. Ailamaki. “OLTP on Hardware Islands.” Proceedings of the VLDB Endowment, vol. 5, no. 11, 1447–1458, 2012.

[PS12] S. Panneerselvam and M. M. Swift. “Chameleon: Operating System Support for Dynamic Processors.” In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 99–110. 2012.


[PSK15] S. Panneerselvam, M. Swift, and N. S. Kim. “Bolt: Faster Reconfiguration in Operating Systems.” In Proceedings of the USENIX Annual Technical Conference, pp. 511–516. 2015.

[PSM+15] I. Psaroudakis, T. Scheuer, N. May, A. Sellami, and A. Ailamaki. “Scaling Up Concurrent Main-memory Column-store Scans: Towards Adaptive NUMA-aware Data and Task Placement.” Proceedings of the VLDB Endowment, vol. 8, no. 12, 1442–1453, 2015.

[PWM+14] I. Psaroudakis, F. Wolf, N. May, T. Neumann, A. Böhm, A. Ailamaki, and K. Sattler. “Scaling Up Mixed Workloads: A Battle of Data Freshness, Flexibility, and Scheduling.” In Performance Characterization and Benchmarking. Traditional to Big Data, pp. 97–112. 2014.

[RCS+11] C. J. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. “PTask: Operating System Abstractions to Manage GPUs As Compute Devices.” In Proceedings of the 23rd ACM Symposium on Operating Systems Principles, pp. 233–248. 2011.

[RKZB11] B. Rhoden, K. Klues, D. Zhu, and E. Brewer. “Improving Per-node Efficiency in the Datacenter with New OS Abstractions.” In Proceedings of the 2nd ACM Symposium on Cloud Computing, pp. 25:1–25:8. 2011.

[RMG+15] R. Riesen, A. B. Maccabe, B. Gerofi, D. N. Lombard, J. J. Lange, K. Pedretti, K. Ferreira, M. Lang, P. Keppel, R. W. Wisniewski, R. Brightwell, T. Inglett, Y. Park, and Y. Ishikawa. “What is a Lightweight Kernel?” In Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers. 2015.

[RSBJ14] S. Ray, B. Simion, A. D. Brown, and R. Johnson. “Skew-resistant Parallel In-memory Spatial Join.” In Proceedings of the 26th International Conference on Scientific and Statistical Database Management, pp. 1–12. 2014.

[RTG14] P. Roy, J. Teubner, and R. Gemulla. “Low-latency Handshake Join.” Proceedings of the VLDB Endowment, pp. 709–720, 2014.

[RTY+87] R. Rashid, A. Tevanian, M. Young, D. Golub, R. Baron, D. Black, W. Bolosky, and J. Chew. “Machine-independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures.” In Proceedings of the 2nd International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 31–39. 1987.

[SAH+03] C. A. N. Soules, J. Appavoo, K. Hui, R. W. Wisniewski, D. D. Silva, G. R. Ganger, O. Krieger, M. Stumm, M. A. Auslander, M. Ostrowski, B. S. Rosenburg, and J. Xenidis. “System Support for Online Reconfiguration.” In USENIX Annual Technical Conference, pp. 141–154. USENIX, 2003.

[SBRQ13] M. Sadini, A. Barbalace, B. Ravindran, and F. Quaglia. “A Page Coherency Protocol for Popcorn Replicated-kernel Operating System.” In Proceedings of the ManyCore Architecture Research Community Symposium. 2013.

[SCD+16] D. Schatzberg, J. Cadden, H. Dong, O. Krieger, and J. Appavoo. “EbbRT: A Framework for Building Per-Application Library Operating Systems.” In 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 671–688. 2016.

[SFW14] M. Silberstein, B. Ford, and E. Witchel. “GPUfs: The Case for Operating System Services on GPUs.” Communications of the ACM, vol. 57, no. 12, 68–79, 2014.

[She13] B. H. Sheldon. “Popcorn Linux: enabling efficient inter-core communication in a Linux-based multikernel operating system.” Master’s thesis, Virginia Polytechnic Institute and State University, 2013.

[SKN13] A. Singhania, I. Kuz, and M. Nevill. “Capability Management in Barrelfish.” Technical Note 013, Barrelfish Project, ETH Zurich, 2013.

[SM12] A. Sumaray and S. K. Makki. “A Comparison of Data Serialization Formats for Optimal Efficiency on a Mobile Platform.” In Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication, pp. 48:1–48:6. 2012.

[SPR+15] V. Seshadri, G. Pekhimenko, O. Ruwase, O. Mutlu, P. B. Gibbons, M. A. Kozuch, T. C. Mowry, and T. Chilimbi. “Page Overlays: An Enhanced Virtual Memory Framework to Enable Fine-grained Memory Management.” In Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 79–91. 2015.


[SR12] B. P. Swenson and G. F. Riley. “A New Approach to Zero-Copy Message Passing with Reversible Memory Allocation in Multi-core Architectures.” In Proceedings of the ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation, pp. 44–52. 2012.

[SS10] L. Soares and M. Stumm. “FlexSC: Flexible System Call Scheduling with Exception-less System Calls.” In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, pp. 33–46. 2010.

[SSF99] J. S. Shapiro, J. M. Smith, and D. J. Farber. “EROS: A Fast Capability System.” In Proceedings of the 17th ACM Symposium on Operating Systems Principles, pp. 170–185. 1999.

[SSG+15] L. Subramanian, V. Seshadri, A. Ghosh, S. Khan, and O. Mutlu. “The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-application Interference at Shared Caches and Main Memory.” In Proceedings of the 48th International Symposium on Microarchitecture, pp. 62–75. 2015.

[str] “Stress Load Generator.” http://people.seas.harvard.edu/~apw/ stress/, accessed 2017-11-20.

[Sun01] Sun Microsystems, Inc. Solaris Live Upgrade 2.0 Guide. Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303-4900, USA, 2001.

[SWP01] P. Shivam, P. Wyckoff, and D. Panda. “EMP: Zero-copy OS-bypass NIC-driven Gigabit Ethernet Message Passing.” In Proceedings of the ACM/IEEE Conference on Supercomputing, pp. 57–57. 2001.

[TABO13] J. Teubner, G. Alonso, C. Balkesen, and M. T. Ozsu. “Main-memory Hash Joins on Multi-core CPUs: Tuning to the Underlying Hardware.” In Proceedings of the IEEE International Conference on Data Engineering, pp. 362–373. 2013.

[Tea06] T. S. Team. seL4 Reference Manual Version 2.0.0. NICTA, 2006. https://sel4.systems/Info/Docs/seL4-manual-2.0.0.pdf.


[The] The Linux Foundation. “Real-Time Linux.” https://wiki.linuxfoundation.org/realtime/start, accessed 2017-11-20.

[TKM99] M. Takahashi, K. Kono, and T. Masuda. “Efficient Kernel Support of Fine-grained Protection Domains for Mobile Code.” In Proceedings of the 19th IEEE International Conference on Distributed Computing Systems, pp. 64–73. 1999.

[TLL94] C. A. Thekkath, H. M. Levy, and E. D. Lazowska. “Separating Data and Control Transfer in Distributed Operating Systems.” In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 2–11. 1994.

[TLW+09] M. Tiwari, X. Li, H. M. G. Wassel, F. T. Chong, and T. Sherwood. “Execution Leases: A Hardware-supported Mechanism for Enforcing Strong Non-interference.” In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 493–504. 2009.

[TMS11] L. Tang, J. Mars, and M. L. Soffa. “Contentiousness vs. Sensitivity: Improving Contention Aware Runtime Systems on Multicore Architectures.” In Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, pp. 12–21. 2011.

[TMV+11] L. Tang, J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. “The Impact of Memory Subsystem Resource Sharing on Datacenter Applications.” In Proceedings of the 38th Annual International Symposium on Computer Architecture, pp. 283–294. 2011.

[Tra] Transaction Processing Performance Council. “TPC-H.” http://www.tpc.org/tpch/, accessed 2017-11-20.

[TTH16] B. Teabe, A. Tchana, and D. Hagimont. “Application-specific Quantum for Multi-core Platform Scheduler.” In Proceedings of the 11th European Conference on Computer Systems. 2016.

[VBYN+14] L. Vilanova, M. Ben-Yehuda, N. Navarro, Y. Etsion, and M. Valero. “CODOMs: Protecting Software with Code-centric Memory Domains.” In Proceedings of the 41st Annual International Symposium on Computer Architecture, pp. 469–480. 2014.


[VDC17] J. S. Vetter, E. P. DeBenedictis, and T. M. Conte. “Architectures for the Post-Moore Era.” IEEE Micro, vol. 37, no. 4, 6–8, 2017.

[VJN+17] L. Vilanova, M. Jordà, N. Navarro, Y. Etsion, and M. Valero. “Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC.” In Proceedings of the 12th European Conference on Computer Systems, pp. 16–31. 2017.

[VSG+10] G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor. “Conservation Cores: Reducing the Energy of Mature Computations.” In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 205–218. 2010.

[VSM+08] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, A. Davis, N. Binkert, R. G. Beausoleil, and J. H. Ahn. “Corona: System implications of emerging nanophotonic technology.” In ACM SIGARCH Computer Architecture News, vol. 36, pp. 153–164. IEEE Computer Society, 2008.

[VTS11] H. Volos, A. J. Tack, and M. M. Swift. “Mnemosyne: Lightweight Persistent Memory.” In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 91–104. 2011.

[WA09] D. Wentzlaff and A. Agarwal. “Factored operating systems (fos): the case for a scalable operating system for multicores.” SIGOPS Operating Systems Review, vol. 43, no. 2, 76–85, 2009.

[WD94] S. J. White and D. J. DeWitt. “QuickStore: A High Performance Mapped Object Store.” In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 395–406. 1994.

[WGB+10] D. Wentzlaff, C. Gruenwald III, N. Beckmann, K. Modzelewski, A. Belay, L. Youseff, J. Miller, and A. Agarwal. “An Operating System for Multicore and Clouds: Mechanisms and Implementation.” In ACM Symposium on Cloud Computing. 2010.

[Wika] Wikipedia. “Barrelfish — Wikipedia, The Free Encyclopedia.” https://en.wikipedia.org/wiki/Barrelfish, accessed 2017-11-20.


[Wikb] Wikipedia. “DragonFly BSD — Wikipedia, The Free Encyclopedia.” https://en.wikipedia.org/wiki/DragonFly_BSD, accessed 2017-11-20.

[Wikc] Wikipedia. “PostgreSQL — Wikipedia, The Free Encyclopedia.” https://en.wikipedia.org/wiki/PostgreSQL, accessed 2017-11-20.

[Wikd] Wikipedia. “Redis — Wikipedia, The Free Encyclopedia.” https://en.wikipedia.org/wiki/Redis, accessed 2017-11-20.

[Wik03] Wikipedia. “Memcached — Wikipedia, The Free Encyclopedia.”, 2003. https://en.wikipedia.org/wiki/Memcached, accessed 2017-11-20.

[Wil79] M. V. Wilkes. The Cambridge CAP Computer and Its Operating System (Operating and Programming Systems Series). North-Holland Publishing Co., 1979.

[WNW+17] R. N. M. Watson, P. G. Neumann, J. Woodruff, M. Roe, J. Anderson, J. Baldwin, D. Chisnall, B. Davis, A. Joannou, B. Laurie, S. W. Moore, S. J. Murdoch, R. Norton, S. Son, and H. Xia. “Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 6).” Technical Report UCAM-CL-TR-907, University of Cambridge, Computer Laboratory, 2017.

[WRA05] E. Witchel, J. Rhee, and K. Asanović. “Mondrix: Memory Isolation for Linux Using Mondriaan Memory Protection.” In Proceedings of the 20th ACM Symposium on Operating Systems Principles, pp. 31–44. 2005.

[WS92] J. Wilkes and B. Sears. “A comparison of Protection Lookaside Buffers and the PA-RISC Protection Architecture.” Technical Report HPL–92–55, Computer Systems Laboratory, Hewlett-Packard Laboratories, 1992.

[YRV11] Y. Ye, K. A. Ross, and N. Vesdapunt. “Scalable Aggregation on Multicore Processors.” In Proceedings of the 7th International Workshop on Data Management on New Hardware. 2011.

[YSD+09] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. “Native Client: A Sandbox for Portable, Untrusted x86 Native Code.” In Proceedings of the 30th IEEE Symposium on Security and Privacy, pp. 79–93. 2009.

[ZBF10] S. Zhuravlev, S. Blagodurov, and A. Fedorova. “Addressing Shared Resource Contention in Multicore Processors via Scheduling.” In Proceedings of the 15th Conference on Architectural Support for Programming Languages and Operating Systems, pp. 129–142. 2010.

[ZGKR14] G. Zellweger, S. Gerber, K. Kourtis, and T. Roscoe. “Decoupling Cores, Kernels, and Operating Systems.” In 11th USENIX Symposium on Operating Systems Design and Implementation, pp. 17–31. 2014.

[ZRMH00] R. Zahir, J. Ross, D. Morris, and D. Hess. “OS and Compiler Considerations in the Design of the IA-64 Architecture.” In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 212–221. 2000.

[ZSR12] G. Zellweger, A. Schuepbach, and T. Roscoe. “Unifying Synchronization and Events in a Multicore OS.” In Proceedings of the 3rd Asia-Pacific Workshop on Systems. 2012.

Curriculum Vitae

Gerd Zellweger

Education

2013 - 2017 Doctoral Candidate in Computer Science ETH Zurich, Switzerland

2009 - 2012 Master of Science (M. Sc.) in Computer Science ETH Zurich, Switzerland

2006 - 2010 Bachelor of Science (B. Sc.) in Computer Science ETH Zurich, Switzerland

Professional Experience

2013-2017 Research Assistant and Doctoral Candidate ETH Zurich, Switzerland - Advisor: Prof. Dr. Timothy Roscoe - Research projects: Barrelfish OS

Summer 2015 Research Associate HP Labs, Palo Alto, USA

Summer 2014 Research Intern Microsoft Research, Redmond, USA

2012 Software Engineer sc-n.ch, Zurich, Switzerland

Selected Publications

2016 So many performance events, so little time. Gerd Zellweger, Denny Lin, Timothy Roscoe. In APSys’16.

2016 Customized OS support for data processing. Jana Giceva, Gerd Zellweger, Gustavo Alonso, Timothy Roscoe. In DaMoN’16.

2016 SpaceJMP: Programming with Multiple Virtual Address Spaces. Izzat El Hajj, Alexander Merritt, Gerd Zellweger, Dejan Milojicic, Reto Achermann, Paolo Faraboschi, Wen-mei Hwu, Timothy Roscoe, Karsten Schwan. In ASPLOS’16.

2014 Decoupling Cores, Kernels, and Operating Systems. Gerd Zellweger, Simon Gerber, Kornilios Kourtis, Timothy Roscoe. In OSDI’14.

2012 Unifying Synchronization and Events in a Multicore OS. Gerd Zellweger, Adrian Schuepbach, Timothy Roscoe. In APSys’12.

Teaching and Mentoring Experience

Fall 2017 Systems Programming & Computer Architecture

Spring 2017 Application oriented programming with MATLAB

Fall 2016 Systems Programming & Computer Architecture

Spring 2016 Programming & Problem solving

Fall 2015 Systems Programming & Computer Architecture

Spring 2015 Parallel Programming

Fall 2014 Systems Programming & Computer Architecture

Spring 2014 Parallel Programming

Fall 2013 Computer Science for Biology, Pharmaceutical Sciences, HST

Spring 2013 Computer Science for Biology, Pharmaceutical Sciences, HST
