
Scalability of Microkernel-Based Systems

Zur Erlangung des akademischen Grades eines

DOKTORS DER INGENIEURWISSENSCHAFTEN

von der Fakultät für Informatik der Universität Fridericiana zu Karlsruhe (TH) genehmigte

DISSERTATION

von

Volkmar Uhlig

aus Dresden

Tag der mündlichen Prüfung: 30.05.2005
Hauptreferent: Prof. Dr. rer. nat. Gerhard Goos, Universität Fridericiana zu Karlsruhe (TH)
Korreferent: Prof. Dr. sc. tech. (ETH) Gernot Heiser, University of New South Wales, Sydney, Australia
Karlsruhe, 15.06.2005


Abstract

Microkernel-based systems divide the operating system functionality into individual and isolated components. The system components are subject to application-class protection and isolation. This structuring method has a number of benefits, such as fault isolation between system components, safe extensibility, co-existence of different policies, and isolation between mutually distrusting components. However, such strict isolation limits the information flow between subsystems, including information that is essential for performance and scalability in multiprocessor systems.

Semantically richer kernel abstractions scale at the cost of generality and minimality, two desired properties of a microkernel. I propose an architecture that allows for dynamic adjustment of scalability-relevant parameters in a general, flexible, and safe manner. I introduce isolation boundaries for microkernel resources and the system processors; the boundaries are controlled at user level. Operating system components and applications can transform their semantic information into three basic parameters relevant for scalability: the involved processors (depending on their relation and interconnect), the degree of concurrency, and groups of resources. I developed a set of mechanisms that allow a kernel to:

1. efficiently track processors on a per-resource basis, with support for very large numbers of processors,

2. dynamically and safely adjust lock primitives at runtime, including full deactivation of kernel locks in the case of no concurrency,

3. dynamically and safely adjust locking granularity at runtime,

4. provide a scalable translation look-aside buffer (TLB) coherency algorithm that uses versions to minimize interprocessor interference for concurrent memory resource re-allocations, and

5. efficiently track and communicate resource usage in a component-based operating system.

Based on my architecture, it is possible to efficiently co-host multiple isolated, independent, and loosely coupled systems on larger multiprocessor systems, and also to fine-tune individual subsystems that have different and potentially conflicting scalability and performance requirements.

I describe the application of my techniques to a real system: L4Ka::Pistachio, the latest variant of an L4 microkernel. L4Ka::Pistachio is used in a variety of research and industry projects. Introducing a new dimension to a system — parallelism of multiprocessors — naturally introduces new complexity and overheads. I evaluate my solutions by comparing them against the most challenging competitor: the uniprocessor variant of the very same and highly optimized microkernel.

Zusammenfassung

Mikrokernbasierte Systeme teilen die Betriebssystemfunktionalität in unabhängige und isolierte Komponenten auf. Die Systemkomponenten unterliegen dabei denselben Isolations- und Schutzbedingungen wie normale Nutzeranwendungen. Solch eine Systemstruktur hat eine Vielzahl von Vorteilen, wie zum Beispiel Fehlerisolation zwischen Systemkomponenten, sichere Erweiterbarkeit, die Koexistenz mehrerer unterschiedlicher Systemrichtlinien und die strikte Isolation zwischen Komponenten, die sich gegenseitig mißtrauen. Gleichzeitig erzeugt strikte Isolation auch Barrieren für den Informationsfluß zwischen den individuellen Subsystemen; diese Informationen sind essentiell für die Performanz und die Skalierbarkeit in Multiprozessorsystemen.

Kernabstraktionen mit semantisch höherem Gehalt skalieren auf Kosten der Allgemeinheit und der Minimalität, zwei erwünschte Eigenschaften von Mikrokernen. In dieser Arbeit wird eine Architektur vorgestellt, die es erlaubt, die für die Skalierbarkeit relevanten Parameter generisch, flexibel und sicher dynamisch anzupassen. Es werden Isolationsschranken für die Mikrokernressourcen und Systemprozessoren eingeführt, welche unter der Kontrolle von Nutzerapplikationen stehen. Die Betriebssystemkomponenten und Anwendungen können das ihnen zur Verfügung stehende semantische Wissen in die folgenden drei skalierbarkeitsrelevanten Basisparameter umwandeln: die involvierten Prozessoren (abhängig von den Prozessorbeziehungen und dem Speichersubsystem), den Grad der Parallelität und Ressourcengruppierungen. Es wurden die folgenden Methoden und Mechanismen entwickelt:

1. eine effiziente Methode zur Speicherung und Auswertung der zu einer Ressource zugehörigen und relevanten Prozessoren,

2. ein dynamisches und sicheres Synchronisationsprimitiv, welches zur Laufzeit angepaßt werden kann (dies beinhaltet die vollständige Deaktivierung von Kernsperren für den Fall, daß keine Parallelität vorhanden ist),

3. die dynamische und sichere Anpassung der Granularität von Sperren zur Laufzeit,

4. ein skalierbarer translation-look-aside buffer (TLB) Kohärenzalgorithmus, der zur Vermeidung von wechselseitigen Beeinflussungen von Prozessoren ein Versionsschema nutzt, sowie

5. ein Mechanismus zur effizienten Ermittlung und Weiterleitung von Ressourcennutzungsinformationen in einem komponentenbasierten Betriebssystem.

Basierend auf dieser Architektur ist es möglich, mehrere unabhängige, isolierte und lose verbundene Systeme gleichzeitig auf einem Mehrprozessorsystem zu betreiben. Desweiteren ermöglicht die Architektur, einzelne Subsysteme individuell feinabzustimmen. Dies ist selbst dann möglich, wenn Subsysteme unterschiedliche oder sogar widersprüchliche Skalierbarkeits- und Performanceanforderungen haben.

Die Techniken werden an einem real existenten System exemplarisch evaluiert: dem L4Ka::Pistachio-Mikrokern, der die neueste L4-Version darstellt. L4Ka::Pistachio wird aktiv in einer Reihe von Forschungs- und Industrieprojekten eingesetzt. Die Einführung der neuen Dimension Parallelität von Prozessoren erhöht sowohl die Komplexität als auch die Kosten. Die Effizienz der vorgestellten Lösungen wird daher an dem größten Konkurrenten gemessen: der Uniprozessorvariante desselben hochoptimierten Mikrokerns.

Acknowledgements

First, I would like to thank my supervisor, Prof. Gerhard Goos. After the loss of Jochen Liedtke, he not only ensured the survival of our group but also its blossoming. It took me a while to realize the wisdom behind many of his demanding requests, and I am extremely thankful for his patience and guidance over the last few years. I am particularly thankful for much of his advice in areas that were not even related to this work.

Jochen Liedtke, who passed away in 2001, was most influential on my interest in research in general and operating systems in particular. I am grateful to him in many ways: he was a great researcher, excellent teacher, mentor, and friend. This work would not exist without his groundbreaking achievements, his astonishing intellect, and his ability to help all of the people around him thrive.

I want to specifically thank Joshua LeVasseur, who is a good friend and colleague. With him, it was not only possible to achieve excellent research results, but also to have a life in Karlsruhe. I hope we will continue to work together over the coming years. Thanks also to Susanne for her endless supply of excellent food and many fun hours.

Thanks to Jan Stoß for his efforts in our joint projects, including but not limited to multiprocessor L4. Andreas Haeberlen and Curt Cramer have both been influential in many ways; I enjoyed the constant technical and political discussions. I still owe Andy a two-lettered rubber stamp. I had endless hours of brainstorming and discussions with Lars Reuther, and many of my research ideas (including those in this thesis) were born with him on the other end of the phone line.

I am thankful to the people of the System Research Lab of Intel MTL, in particular Sebastian Schönberg and Rich Uhlig. It was a wonderful time working in such an inspiring environment and getting deep insights into the hardware architecture. Rich is an excellent manager and good friend and was supportive in many ways. Intel supported this research with a generous research grant, which financed the required multiprocessor hardware and also supported me during the last three months of my thesis. I really enjoyed the productive and intellectually challenging work with Sebastian, but also our regular Wednesday evenings at Imbri Hall.

I thank Gernot Heiser for his support, for reading initial drafts of this thesis, and for coming over from Australia for the defense. Although we have had many flame wars on internal and external mailing lists, it is always a joy to work with him. I also want to thank Gernot for offering me a PhD position in Sydney when things got rough.

I want to thank Julie Fast for editing the final version of this thesis, and also for her support and encouragement. I also have to thank Monika Schlechte, my sister, and my parents for their constant support on this interesting journey. Monika was probably most influential on the directions of my life, including starting a research career. Jan Langert and Michaela Friedrich were of great support during the hardest years of my PhD studies in 2002. I want to thank them for their patience, the uncountable hours on the phone, and their very helpful advice.

I have to thank the people of the System Architecture Group for making L4 and L4Ka::Pistachio such a great success. I want to specifically mention Espen Skoglund and Uwe Dannowski, who helped make V4 happen, James McCuller for the most amazing IT environment ever, and Gerd Liefländer for shouldering the primary teaching load and thus enabling our research results (even though Gerd was always skeptical that I would finish on time). Without those people this work would not have been possible. I also want to thank Frank Bellosa, who is now the new head of our group. He has not only been very supportive in the short period we have been working together, but also gave great advice during my interviewing time.

Last, but not least, I have to thank the whole L4 community for the insightful discussions on microkernel design. We still have a long way to go and I hope it will be as much fun in the upcoming years.

Contents

1 Introduction

2 Facts and Notation

3 Related Work
3.1 Scalable Systems
3.1.1 Operating System Scalability
3.1.2 Clustering
3.2 Locks and Synchronization
3.2.1 Locks: Software Approaches
3.2.2 Locks: Hardware Approaches
3.2.3 Lock-free Synchronization
3.2.4 Message-based Synchronization
3.2.5 Read-Copy Update
3.3 Partitioning
3.3.1 Hardware Partitioning
3.3.2 Software Partitioning and Virtual Machines
3.4 Microkernel-based Systems
3.4.1 The Microkernel Argument
3.4.2 Multiprocessor

4 Microkernel Performance Adaptation
4.1 Overview
4.2 Tracking Parallelism
4.2.1 Processor Clusters
4.2.2 Processor Cluster Mask
4.3 Kernel Synchronization
4.3.1 Trade-offs
4.3.2 Dynamic Lock Primitive
4.3.3 Lock Granularity
4.4 TLB Coherency
4.4.1 Independence of Coherency Updates
4.4.2 Version Vector

4.5 Event Logging for Application Policies
4.5.1 Overview
4.5.2 Data Accumulation
4.5.3 Event Sources and Groups
4.5.4 Log Entries and Log Configuration
4.5.5 Log Analysis
4.6 Summary

5 Application to the L4 Microkernel
5.1 Overview of L4
5.2 Requirements
5.2.1 Design Goals
5.2.2 Multiprocessor Extensions to L4
5.3 Inter-Process Communication
5.3.1 Communication Scenarios
5.3.2 Performance Trade-offs
5.3.3 Adaptive IPC Operation
5.3.4 Remote for Lock-based IPC
5.3.5 Discussion on Tightly-coupled Systems
5.4 Memory Management
5.4.1 User-level
5.4.2 Scalability vs. Performance Trade-off
5.4.3 Mapping Database
5.4.4 TLB Coherency
5.5 User-level Policy Management
5.5.1 Kernel Event Logging
5.5.2 Scheduling (Load Balancing)
5.5.3 Kernel Memory Management
5.6 Summary

6 Experimental Verification and Evaluation
6.1 Evaluation Platform
6.2 Inter-process Communication
6.2.1 Processor-local IPC
6.2.2 Cross-processor IPC
6.2.3 Parallel Programming
6.2.4 Scalability and Independence
6.3 Event Logging
6.4 Memory Management
6.4.1 Locking Overhead
6.4.2 Scalability
6.5 Summary

7 Conclusion
7.1 Contributions of This Work
7.2 Suggestions for Future Work
7.3 Concluding Remarks

Chapter 1

Introduction

In the last few years, two important trends have been changing the systems area. First, performance scaling by increasing processor frequency is reaching the point where further frequency increases are uneconomical. The current leakage that is inherent in the small structure sizes of today's high-frequency processors results in massive energy dissipation that is emitted as heat. In order to further increase compute power, processor development is therefore shifting its focus to increasing parallelism. All primary processor vendors are providing or announcing systems with a large number of tightly coupled processors that were previously available only in mainframe-class systems.

Second, the increasing variety of use cases of computers as multi-purpose devices is setting new demands on operating systems which remain unfulfilled by traditional operating systems, such as Linux and Windows. These new demands require radically new system designs and cannot be met by simply extending the existing systems with new features. Examples of these requirements are a higher level of security and confidentiality of critical information, real-time support, and safe sharing of hardware resources in consolidated environments of potentially untrusted clients.

I argue that microkernels present a viable alternative for structuring operating systems that fulfill these new system requirements, provided they scale on large multiprocessor systems.

Scalability of operating systems for shared-memory multiprocessors is a complex problem that requires careful design of the complete software stack. The complexity stems from the structure of the underlying hardware architecture. With an increasing number of processors, resource access latencies become non-uniform, which has to be considered by the operating system. Naïve synchronization and resource allocation schemes that are efficient in smaller multiprocessor systems often lead to resource contention, starvation, high overhead, and ultimately to system saturation on more sophisticated hardware.

The operating system is an omnipotent software layer on top of the hardware and therefore a crucial contributor to overall scalability. It manages and multiplexes all hardware resources and mediates them between the competing applications. Depending on resource usage patterns, the degree of resource sharing, and object granularity, the operating system selects the most appropriate and efficient synchronization algorithm, and thereby minimizes overhead and maximizes scalability. The choice of the optimal algorithm requires detailed semantic information about system subjects and objects. For example, the relevant literature differentiates between as many as ten different synchronization schemes.

In microkernel-based systems, the microkernel is only a very thin software layer that multiplexes the hardware between isolated or cooperating entities in a secure, safe, and extensible manner. The managed resources are limited to basic memory pages, processor resources, and processor time. All high-level operating system abstractions, such as file systems, network stacks, and device drivers, are implemented as unprivileged user-level servers.

This system structure drastically reduces the complexity of the kernel itself, and previous research validates that it is possible to construct real-time [53] and secure systems [90] on a microkernel. However, moving operating system constructs out of the kernel also eliminates detailed semantic information that is required to achieve optimal scalability. The microkernel lacks the necessary information to make an educated decision on the most efficient strategy.

Previous research on multiprocessor microkernel systems therefore compromised either on the scalability aspect (by choosing one specific synchronization policy), or on the strict separation of mechanism and policy, blurring the line between kernel and applications. The latter approach loses the desired properties of a microkernel.

In this thesis I present solutions that allow for an uncompromised microkernel design while achieving excellent scalability with low overhead. My solutions comprise four areas: I developed a space- and time-efficient tracking scheme for parallelism, allowing every kernel object to be tagged with a processor mask. Using the tracking mask, the kernel derives the potential concurrency on kernel objects and adjusts its synchronization scheme.

I developed dynamic locks that can be safely enabled and disabled at runtime, thus eliminating synchronization overhead while still guaranteeing functional correctness in the kernel. Based on the dynamic locking scheme, I derived a method for dynamically adjusting the lock granularity for divisible objects. This provides applications with control over lock granularity in the microkernel. Applications can dynamically choose between coarse-grain locking for lower runtime overhead or fine-grain locking for a higher level of parallelism.

Memory is a primary resource managed at application level but enforced by the microkernel. Besides synchronization on kernel metadata, the microkernel has to enforce coherency of the translation look-aside buffer (TLB). I present an algorithm that decouples memory permission updates from outstanding TLB updates. With this scheme, multiple processors can manipulate permissions to memory objects in parallel while still minimizing the overhead for TLB coherency updates.

Finally, I present a low-overhead event logging mechanism that transfers scheduling-relevant resource usage data between system components. The mechanism uses memory as a high-bandwidth and low-overhead transport mechanism and thereby induces only a marginal runtime overhead.
It enables application-level schedulers to efficiently manage and allocate kernel resources, such as allocating threads to processors and managing the per-processor kernel memory pools.

I describe the application of my techniques to a real system: L4Ka::Pistachio, the latest variant of an L4 microkernel. I played a significant role in the design and development of L4Ka::Pistachio. At this time, it supports nine different hardware architectures and is used in a number of industry and research projects. Introducing a new dimension to a system — parallelism of multiprocessors — naturally introduces new complexity and overheads. I evaluate my solutions by comparing them against the most challenging competitor: the uniprocessor variant of the very same and highly optimized microkernel.

I see my contribution as extending minimalistic microkernel design, epitomized by L4, to the important domain of multiprocessors. However, the solutions are not restricted to microkernels, but are also applicable in the area of virtual machines, monolithic operating systems, and even applications.
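To make the first of these mechanisms more concrete, the sketch below shows how a kernel object could be tagged with a processor mask and how a kernel could test for the absence of concurrency. This is a minimal illustration under simplifying assumptions: the flat 64-bit bitmask and all names are hypothetical and do not reflect the more compact encoding developed later in this thesis.

    /* Illustrative sketch only: a per-object processor mask used to derive
     * the potential concurrency on a kernel object.  A plain 64-bit bitmask
     * is used here for clarity. */
    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint64_t cpu_mask;   /* bit i set: processor i may access the object */
    } kobject_tracking_t;

    static inline void track_access(kobject_tracking_t *t, unsigned cpu)
    {
        t->cpu_mask |= (uint64_t)1 << cpu;
    }

    /* True if zero or one bit is set: at most one processor touches the
     * object, so its lock could in principle be disabled. */
    static inline bool object_is_processor_local(const kobject_tracking_t *t)
    {
        return (t->cpu_mask & (t->cpu_mask - 1)) == 0;
    }

A kernel that finds an object processor-local by such a test could, for example, skip the object's lock entirely on the common path.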

Organization

This thesis is structured as follows: In Chapter 2, I define common terms and principles required for the following sections. In Chapter 3, I evaluate related work on synchronization, on scalability of multiprocessor operating systems in general, and on microkernel-based systems in particular.

In Chapter 4, I develop my principles for adaptive synchronization primitives for the kernel. In Section 4.1, I discuss the structural differences of microkernel-based systems as compared to traditional operating systems for large multiprocessor systems. Then, I develop my resource tracking scheme (Section 4.2), which forms the foundation for the dynamic locking scheme described in Section 4.3. In Section 4.4, I describe the TLB coherency tracking algorithm that decouples memory permission management and TLB coherency updates. Finally, in Section 4.5, I develop the event-logging mechanism that enables the microkernel and isolated system components to efficiently exchange the resource usage information that is required for user-level scheduling and multiprocessor resource allocation policies.

Chapter 5 shows how I applied my methodology to L4Ka::Pistachio, an L4 microkernel. In Section 5.1, I give a general overview of the L4 primitives required for the later development. In Section 5.2, I specify functional and performance requirements and define design goals and non-goals. I then introduce the multiprocessor extensions to the uniprocessor kernel abstractions. In Section 5.3, I describe the implications of parallelism for L4's most performance-critical primitive: inter-process communication (IPC). I use dynamic locks for IPC to eliminate the synchronization overhead for the most important common cases. Section 5.4 details scalable user-level memory management, applying dynamic locking and the TLB coherency scheme to L4. Finally, in Section 5.5, I show how event tracing can be used to manage kernel memory and threads via a user-level scheduler.

In Chapter 6, I evaluate the performance of my design for L4Ka::Pistachio. After detailing the peculiarities of my evaluation platform in Section 6.1, I evaluate the performance and scalability of the multiprocessor IPC primitive on the same and on different processors (Section 6.2), the event logging scheme (Section 6.3), and user-level memory management (Section 6.4). Chapter 7 includes a summary and suggestions for future work.

Chapter 2

Facts and Notation

In this chapter I introduce common terminology and principles of operating systems and multiprocessor hardware that are used throughout the later chapters of this dissertation.

Operating System and Microkernels

An operating system (OS) is a program (or a set of programs) which mediates access to the basic computing resources provided by the underlying hardware. Most operating systems create an environment in which an application can run safely and without interference from other applications. In addition, many OSes provide the application with an abstract, machine-independent interface to hardware resources that is portable across different platforms.

There are two popular views of an operating system: the operating system as a resource manager, or the operating system as an abstract virtual machine. In the resource-manager view, the operating system acts as an arbiter for system resources. These resources include disks, networks, processors, time, and others. The resources are shared among the various applications depending on individual applications' requirements, security demands, and priority.

An alternative view of operating systems is that of an abstract virtual machine. Each virtual machine provides a level of abstraction that hides most of the idiosyncrasies of lower-level machines. A virtual machine presents a complete interface to the user of that machine. This principle can be applied recursively.

An operating system provides an interface to its applications to enhance the underlying hardware capabilities. This interface is more portable, provides protection among competing applications, and has a higher level of abstraction than the bare hardware. Briefly, operating systems typically provide services in the following areas: (i) program creation, (ii) program execution, (iii) access to I/O devices, (iv) controlled access to devices, (v) error detection and response, and (vi) accounting.

A number of different architectural organizations are possible for an operating system; the two most relevant structures are monolithic and client-server. In a monolithic system, all OS functions are integrated into a single system image.

The advantage is good performance, with the disadvantage of high complexity. In monolithic systems, all subsystems share the same protection domain, and faults in one subsystem can propagate into others. Client-server systems employ a microkernel that mediates resource access to the bare hardware. The OS subsystems are implemented as applications that run in their own address spaces. The microkernel provides inter-process communication for interaction between the subsystems. Microkernels provide significantly better fault isolation because subsystems are confined to their own protection domains.

Multiprocessor Systems

In multiprocessor systems, multiple connected processors operate in parallel. Depending on the processor interconnect, such systems are differentiated into (i) clusters (or multicomputers) and (ii) shared-memory multiprocessors. In clusters, each processor has its own dedicated memory and is an autonomous computer; the computers communicate via a network. Shared-memory multiprocessors have common memory for code and data, and communication between processors may take place via shared memory.

One general classification of shared-memory multiprocessors is based on how processes are assigned to processors. The two fundamental approaches are master/slave and symmetric. For master/slave, the operating system always runs on a particular processor and the other processors execute applications. In symmetric multiprocessor systems (SMP), the operating system can execute on any processor.

For a small number of processors, multiprocessor systems commonly use a shared memory bus that connects all processors and the memory. The shared bus structure limits the overall memory bandwidth, because only one processor can access the bus at a time. With an increasing number of processors, the shared bus becomes a performance bottleneck. Large-scale systems therefore use either multiple interconnected buses or per-processor memory; the buses are interconnected via memory routers. These systems, however, exhibit different access latencies for memory that is local to the accessing processor compared to remote memory. The non-uniform memory access property gave these systems their name: NUMA.

With increasingly high integration, two other multiprocessor structures have become more common: simultaneous multithreading (SMT) and chip-level multiprocessing (CMP). SMT uses the multiple-issue features of modern superscalar processors to hide the latencies of resource stalls, such as cache misses. SMT processors [108] provide multiple threads of execution that share the functional units of one processor. On resource stalls, the processor automatically switches to another thread. The primary goal of SMT is to increase the utilization of the functional units of one processor. CMP is an SMP implemented on a single chip. Multiple processor cores typically share a common second- or third-level cache and interconnect. CMPs are also often referred to as multicore processors.

Caches

Cache memory is intended to provide memory speed approaching that of the fastest memory available while, at the same time, providing a large memory size at the price of less expensive memory. The cache contains a copy of portions of main memory. When the processor attempts to read a word of memory, a check is made to determine whether the word is in the cache. If so, the word is delivered to the processor; otherwise a block of memory is fetched from main memory into the cache.

A consistency problem arises in multiprocessor systems, where each processor has its own cache. When one processor modifies a datum, the other processors' caches may still contain the old value, which is now incorrect. Cache coherency mechanisms that are implemented in hardware transparently update the processor caches. In bus-based systems, processors can snoop the memory bus for addresses that they have cached. When a write operation is observed to a location of which a cache has a copy, the cache controller invalidates its own copy. Various models and protocols have been devised for maintaining cache coherency (e.g., MESI, MSI, MOSI, and MOESI).

In cache-coherent NUMA systems (ccNUMA), processors have no common memory bus, which prohibits cache snooping for cache coherency. Instead, ccNUMA systems use directory tables to keep track of which processors cache a specific word. On modification, the hardware sends directed update and invalidate messages to the affected processors. This scheme has a significantly higher overhead compared to snoopy caches and requires specific attention when designing a system.

Memory Management Unit

The memory management unit (MMU) of a processor is responsible for handling the memory accesses requested by the CPU. Among other tasks, the MMU translates virtual addresses to physical addresses (to implement virtual memory), enforces memory protection, and controls the caching strategy. The virtual address space is divided into pages of size 2^N, usually a few KBytes. When accessing memory, the lower N bits remain unchanged while the upper bits select the (virtual) page number. The page number is used to index into a page table that contains the translation from virtual address to physical memory page.

To reduce the translation latency, processors cache page table entries in the translation look-aside buffer (TLB) [66]. TLBs work similarly to normal caches and contain the most recently used page table entries. Given a virtual address, the processor first inspects the TLB, and if the page table entry is present (i.e., a TLB hit) the physical address is retrieved. Otherwise, the processor examines the page table and, if a valid entry is found, loads the entry into the TLB. If the page table contains an invalid entry, the processor signals a page fault exception to the operating system.
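The translation step just described can be summarized by the following sketch. It is purely conceptual: real MMUs perform this lookup in hardware and use multi-level page tables, and all structure names here are illustrative.

    /* Conceptual sketch of address translation with a software-visible TLB.
     * A single-level page table and a tiny fully associative TLB are
     * assumed purely for illustration. */
    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS   12                       /* 4-KByte pages: 2^12     */
    #define PAGE_SIZE   (1u << PAGE_BITS)
    #define PAGE_MASK   (PAGE_SIZE - 1)
    #define TLB_ENTRIES 64

    typedef struct { uint64_t vpn, pfn; bool valid; } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];
    extern uint64_t page_table[];                /* vpn -> pfn, 0 = invalid */

    static bool translate(uint64_t vaddr, uint64_t *paddr)
    {
        uint64_t vpn    = vaddr >> PAGE_BITS;    /* virtual page number     */
        uint64_t offset = vaddr & PAGE_MASK;     /* unchanged low N bits    */

        for (unsigned i = 0; i < TLB_ENTRIES; i++)   /* 1. TLB hit?         */
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *paddr = (tlb[i].pfn << PAGE_BITS) | offset;
                return true;
            }

        uint64_t pfn = page_table[vpn];          /* 2. walk the page table  */
        if (pfn == 0)
            return false;                        /* 3. raise a page fault   */

        tlb[0] = (tlb_entry_t){ vpn, pfn, true }; /* refill (no real policy) */
        *paddr = (pfn << PAGE_BITS) | offset;
        return true;
    }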

Similar to memory caches, TLBs need to be kept coherent between multiple processors. While cache coherency for memory is implemented in hardware, enforcement of TLB coherency is left to the operating system. When updating a page table entry, the operating system has to explicitly issue a TLB invalidation instruction. With very few exceptions, architectures require explicit invalidation of TLB entries on remote processors: the updating processor sends an inter-processor interrupt (IPI) to the remote processors, which then invoke the TLB invalidation instruction, updating their respective TLBs. This remote invalidation is commonly referred to as TLB shoot-down.

Chapter 3

Related Work

Related work to this thesis can be classified into the following four areas:

Scalable systems. My thesis proposes a number of mechanisms that control and adjust kernel algorithms for resource allocation and management. In Section 3.1 I discuss the general problem of multiprocessor scalability. I review previous work that addresses the problem of operating system scalability, discuss the relevant findings, and relate them to my work.

Locks and synchronization. Locks are an important primitive for mutual exclusion in multiprocessor systems but also a main source of limited scalability and overhead. My thesis proposes an adaptable locking mechanism. In Section 3.2 I describe the scalability problem of locks and other approaches that address lock overhead and lock contention in scalable systems.

Partitioning. I use resource partitioning to isolate independent subsystems and thereby avoid potential interference. In Section 3.3 I review other approaches — software and hardware — that use resource partitioning to achieve better scalability.

Microkernel-based systems. This thesis investigates scalability of microkernels. As the disappointing performance results of early approaches have shown, microkernels require careful design and implementation. In Section 3.4 I describe the state of the art in microkernel-based systems, laying the ground for many design decisions, but also discuss other related work in the field of microkernels on multiprocessors.

3.1 Scalable Systems

Multiprocessor scalability is addressed at various levels of the system stack: the hardware level, the operating system, and the application and algorithmic level.

At the hardware level, a higher degree of circuit integration and the limitations of frequency scaling put a stronger focus on parallelism. Most microprocessor vendors now ship or have announced multicore versions of their processors, and offer multiprocessors based on a single shared bus. Single-bus systems, however, are not scalable beyond a small number of processors, because the memory bus becomes the system's performance bottleneck.

The first scalable shared-memory machines were based on Omega networks [101], such as the NYU Ultracomputer [3], the IBM RP3 [89], and the BBN Butterfly [28]. In these systems, each processor has locally accessible memory, but is able to access the memory of remote processors via the memory interconnect. Accesses to remote memory have a higher latency than accesses to local memory. The non-uniformity of memory accesses gives these machines their name: NUMA. Alternative interconnect structures are ring or grid topologies, as used by the Stanford DASH [68] and KSR [24] architectures, and the HECTOR [118] multiprocessor.

A critical design issue of all shared-memory architectures is the cache coherency scheme. Each processor in a multiprocessor system has its own cache; hence data can appear in multiple copies among the various caches. Cache coherence ensures that all processors have identical copies of a datum. In shared-bus systems, snoopy caches [11] provide a very simple method for cache coherence, whereas NUMA systems employ directory-based cache-coherency approaches [50]. In this scheme, a directory keeps track of which processors have cached a given memory block. When a processor wishes to write into that block, the directory sends point-to-point messages to processors with an updated copy, thus invalidating all other copies.

Cache-coherency updates and cache-migration costs are the prime limiting factors for scalability and need to be considered for software and system construction. The overhead for cache-coherency updates has implications on all areas of multiprocessor systems, including the selection of synchronization primitives (such as locks), the overhead for data sharing between processors, and the placement of memory allocations in order to reduce the cost of memory accesses.

The metric for software scalability is speedup. According to Amdahl's law [4], the maximum speedup that parallelism can provide is bounded by the inverse of the fraction that represents the serial portion of the task. Theoretical parallel computing models, such as PRAM [38, 87] and LogP [31], provide abstractions of hardware machines in a portable, high-level manner. For example, LogP incorporates four parameters into its model: communication latency, communication overhead, the gap between consecutive messages, and the number of processors. However, LogP and PRAM assume that they can distribute the workload across processors and model a strategy for load distribution. This strategy is not applicable to operating systems, and theoretical models therefore consider the operating system as a given extension of the hardware platform.
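For reference, the bound from Amdahl's law mentioned above can be stated explicitly. With s the serial fraction of the task and N the number of processors, the achievable speedup is

    % Amdahl's law: the speedup is limited by the serial fraction alone.
    S(N) \;=\; \frac{1}{\,s + \frac{1 - s}{N}\,} \;\le\; \frac{1}{s}

so even a small serial fraction (for example, s = 0.1) caps the speedup at 10, regardless of the number of processors.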

3.1.1 Operating System Scalability

Unrau's thesis [113] is probably the first to address operating system scalability in a new and systematic manner. Instead of addressing algorithmic speedup, he proposes the concept of the operating system as a service center. His work bases the scalability evaluation on queuing theory [65] with its three fundamental performance metrics: throughput, utilization, and response time. Based on his analysis, Unrau derives the following three fundamental design criteria for a scalable operating system, quantified as the runtime overhead of an operating system function.

Preserving parallelism: The operating system must preserve the parallelism afforded by the application. If several threads of an executing application request independent operating system services, then they must be serviced in parallel. Otherwise the OS becomes a bottleneck limiting scalability and application speedup.

Bounded overhead: The overhead for each independent operating system service call must be bounded by a constant, independent of the number of processors. If the overhead of each service call increases with the number of processors, the system will ultimately saturate.

Preserving locality: The operating system must preserve the locality of the application. It is important to consider memory access locality in large-scale systems, because many large-scale multiprocessors have non-uniform memory access times, where the cost of accessing memory is a function of the distance among accessing processors, and because cache consistency incurs higher overhead in large systems.

While these three design principles are necessary requirements for scalable systems, they are not sufficient, for the following reason: Unrau's requirements only address the scalability of a given system. This approach leaves two important aspects unaddressed: overhead and isolation.

According to Unrau's first and second design rules, a system is still considered scalable if the actual execution time of a function is significantly higher than the achievable minimum. The problem, then, is to determine the minimally achievable overhead, which depends on the specific system scenario and on the number of processors.

In order to determine the minimal overhead, I propose to incorporate an additional aspect: the isolation property of an operation. This proposal is based on the following observation from hardware partitioning: Hardware partitioning divides a large multiprocessor system into a set of independent smaller systems. The individual subsystems are fully isolated and behave (almost) like a physically partitioned multiprocessor system. Most importantly, the isolated subsystems preserve their performance and scalability properties. Hence, the achievable minimum overhead for an operation which is executed on (or affects) only a subset of the processors of the system should be identical to the overhead of a system of the size of the subsystem.

The overhead of an operation is a function of its isolation boundaries; the boundaries are represented by the processors affected by the operation. I propose a fourth design rule that addresses the overhead of a system function in relation to its isolation boundaries:

Preserving isolation: The operating system must preserve the isolation properties of an application. When applications are restricted to a subset of processors, the overhead for a system operation should be identical to the overhead of a system with the size of the processor subset.

An implication of this requirement is that operations which are restricted to a single processor should also incur only the overhead of a single-processor system. A system with more processors has an inherently higher management overhead. This overhead includes memory resources that need to be allocated and more complex algorithms needed to accommodate hardware peculiarities which do not exist in smaller hardware configurations. Therefore, it is hard to strictly follow the postulated isolation requirement. However, in some cases it is possible to trade memory footprint against lower performance for the uncommon case.

Unrau's third design rule addresses the special case of NUMA access overhead. More generally, the operating system should use memory that is isolated to one (or a subset) of the system's processors.

3.1.2 Clustering

Hierarchical clustering is a way to structure shared-memory multiprocessor operating systems for scalability [113, 114]. The basic unit of structuring is a cluster. A cluster provides the functionality of an efficient, small-scale SMP that comprises only a small number of processors. On larger systems, multiple clusters are instantiated such that each cluster manages a group of neighboring processing modules. A processing module consists of a small number of processors, local memory, and the memory interconnect. Major system services are replicated to each cluster so that independent requests can be handled locally. In a later paper, Gamsa et al. [41] list a number of problems with this approach, including poor locality, increased complexity, and difficulties in supporting customized policies.

Clustered objects [7, 41] (as used in Tornado and K42) extend the idea of a clustered system structure to the OS object granularity (i.e., a C++ object instance). A clustered object presents the illusion of a single object but is actually composed of multiple object instances. Each individual instance handles calls from a subset of the processors, and the object instances present a collective whole to all processors. The specific implementations of clustered object instances can differ to reflect the degree of parallelism and the access patterns. The key to the implementation is the use of a per-processor translation table that contains a reference to a handling method. The indirection table allows the creation of object representatives on demand by installing a default fault handler in the table. When an object is first referenced on a processor, the default handler catches the access attempt and instantiates a local representative.

Clustered objects are a powerful method for runtime adaptation. In particular, it is possible to dynamically replace object code and the object's synchronization model via the indirection table. The faulting scheme, however, only supports a forward path: the per-processor representatives are established on access faults, but there is no similar event for the destruction of representatives or for a fall-back to a less scalable scheme that may incur lower overhead. While the authors describe a garbage collection mechanism very similar to read-copy update, there is no equivalent for down-sizing (or de-parallelizing) a clustered object. While this may not pose a problem for short-lived kernel objects, it is critical for long-lived objects with significantly changing access patterns, as seen in microkernels with user-level resource management.
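The per-processor indirection described above can be sketched as follows. The structure layout and the lazy NULL check (standing in for the installed default fault handler) are simplifications of the Tornado/K42 mechanism, and all identifiers are hypothetical.

    /* Sketch of a clustered-object indirection table: each processor has a
     * slot that either points to its local representative or causes the
     * miss handler to create one on first access.  Locking is omitted. */
    #include <stdlib.h>

    #define NCPUS 16

    struct rep;                                   /* per-processor representative */
    typedef struct rep *(*miss_handler_t)(int cpu);

    struct clustered_object {
        struct rep    *table[NCPUS];              /* per-processor indirection   */
        miss_handler_t miss;                      /* default "fault" handler     */
    };

    static struct rep *lookup(struct clustered_object *o, int cpu)
    {
        struct rep *r = o->table[cpu];
        if (r == NULL) {
            /* First reference on this processor: instantiate a local
             * representative on demand and install it in the table. */
            r = o->miss(cpu);
            o->table[cpu] = r;
        }
        return r;
    }

As the text notes, this scheme only ever adds representatives; there is no corresponding path that shrinks the table again when the access pattern changes.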

3.2 Locks and Synchronization

Locks provide individual processors with exclusive access to shared data and critical sections of code. The most common locks for in-kernel synchronization are spin locks, where a shared memory location stores the lock state (i.e., taken or free). In order to acquire the lock, the processor atomically replaces the content of the lock variable with the value that denotes the taken state. If the previous value was free, the lock was acquired successfully; otherwise the processor retries in a loop. The overhead for locks falls into three different categories: (i) the runtime overhead for the lock primitive itself, (ii) the cache footprint for the lock, and (iii) the utilization of memory bus bandwidth.

Locks are implemented with atomic processor primitives. A variety of alternative atomic primitives have been proposed and implemented on shared memory architectures in order to minimize locking overheads. Many of those special primitives support one particular synchronization operation [45, 59, 68]. Although special-purpose primitives have advantages in certain scenarios, common processor architectures support a set of general-purpose primitives, such as FETCH&Φ [46], COMPARE&SWAP [25], and the pair LOADLINKED–STORECONDITIONAL [61]; a minimal spin-lock sketch based on these primitives follows the list below.

• The FETCH&Φ primitive takes two parameters: the address of the destination and a value parameter. The primitive atomically reads the value from the destination, computes the new value as Φ(original value, parameter), stores it, and returns the original value.

Example primitives are TEST&SET, FETCH&ADD, and FETCH&STORE; the latter is also known as EXCHANGE, atomically replaces a register value with a memory location, and is the atomic operation with the lowest overhead on IA-32.

• COMPARE&SWAP takes three parameters: the address of the destination, an expected value, and a new value. If the original value at the destination address is identical to the expected value, it is replaced atomically with the new value. The return value indicates success or failure.

• The pair LOADLINKED–STORECONDITIONAL must be used together to read, modify, and write a shared location. LOADLINKED returns the value stored at the shared location and at the same time sets a reservation associated with the processor and the location. The reservation remains valid until another processor accesses the recorded cache line. STORECONDITIONAL checks the reservation; if it is still valid, it writes the new value and returns success, otherwise failure. The reservation is commonly implemented in the processor cache, and its invalidation can easily be embedded in the cache snooping protocol. When a STORECONDITIONAL fails, it fails locally and does not cause any bus traffic.
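As referenced above, the following is a minimal sketch of a spin lock built from such a general-purpose primitive, using C11 atomics; atomic_exchange plays the role of FETCH&STORE (TEST&SET). It is illustrative only and ignores back-off and fairness.

    /* Naive TEST&SET spin lock on top of a general-purpose atomic primitive. */
    #include <stdatomic.h>

    typedef struct { atomic_int locked; } spinlock_t;   /* 0 = free, 1 = taken */

    static inline void spin_lock(spinlock_t *l)
    {
        while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            ;                                            /* retry until we saw 0 */
    }

    static inline void spin_unlock(spinlock_t *l)
    {
        atomic_store_explicit(&l->locked, 0, memory_order_release);
    }

Every failed exchange still writes to the lock's cache line; this is exactly the coherency traffic that the software refinements discussed next (TEST&TEST&SET, MCS queues) try to avoid.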

3.2.1 Locks: Software Approaches

The naïve use of atomic operations can induce significant overhead in larger multiprocessor systems. Processors that compete for a lock create memory bus traffic due to the cache coherency protocol. Alternative locking schemes, such as Mellor-Crummey and Scott locks (MCS locks) [82], reduce the overhead with locking queues and employ spinning on local memory. Local spinning reduces the overhead of the cache coherence protocol from all competing processors to one.

The optimal synchronization scheme depends on the peculiarities of the scenario, such as the level of contention, the memory interconnect of the competing processors, and the access frequency and pattern. Less concurrency and less complex memory interconnects (such as the simple snoopy caches of SMP systems) also require less complex lock mechanisms and code paths, thus reducing runtime overhead and cache footprint. Overhead and complexity increase when locks are contended and lock attempts cross NUMA node boundaries. However, no single best locking scheme exists. Anderson [6] observed that the choice between spin locks and MCS queue locks depends on the level of lock contention. Lim [76] proposes dynamic runtime adaptation between both lock protocols: depending on the degree of lock contention, the code adapts the lock protocol. For low-contention cases (with fewer than four processors) the scheme uses spin locks with their significantly lower runtime overhead, whereas in high-contention cases the more expensive, but better scaling, MCS locks are used. McKenney [79] argues that the choice of the best locking strategy depends on a large number of factors, such as the duration of the critical section, the read-to-modify ratio, contention, and complexity. He differentiates as many as ten alternative design patterns for choosing an optimal locking strategy.

Unrau et al. [115] address the problem of overhead vs. parallelism for fine-granular locks. They propose a hybrid coarse-grain/fine-grain locking strategy that has the low latency and space overhead of a coarse-grain locking strategy while providing the high concurrency of fine-grain locks. The idea is to implement fine-granular locks using non-atomic memory operations and to protect the lock data structures with a single coarse lock. The coarse lock must be held in order to acquire the fine-granular locks, which then can be implemented as a simple reserve bit. When spinning on the lock, only the reserve bit is tested, which does not require acquisition of the coarse lock.

The scheme has two limitations. First, the coarse lock still has to be taken by all processors and thus needs to migrate between the different processor caches. Real fine-granular locks can eliminate that inter-dependency and overhead. Second, in order to reduce contention on the coarse lock, processors spin on the reserve bit of the object; only when the reserve bit gets cleared do the processors retry the lock acquisition via the coarse lock. That scheme puts restrictions on object destruction and prohibits free memory reuse: a lock that is embedded in an object which is released and re-allocated may never be cleared, and processors spinning on such a lock will spin infinitely. The authors solve this problem by introducing object classes for memory in order to guarantee that the particular reserve bit eventually reaches an unlocked state.
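A sketch of the hybrid scheme of Unrau et al. is shown below, under the assumption of a single global coarse lock; memory-ordering details are omitted and all names are illustrative, so this is not their implementation.

    /* Hybrid coarse-grain/fine-grain locking: per-object "reserve bits" are
     * only set while the coarse lock is held, so they need no atomic
     * instructions; waiters spin on the reserve bit without touching the
     * coarse lock.  Real code would add memory barriers. */
    extern void coarse_lock(void);
    extern void coarse_unlock(void);

    struct object {
        volatile int reserved;      /* fine-grain lock bit */
        /* ... object data ... */
    };

    static void object_lock(struct object *o)
    {
        for (;;) {
            while (o->reserved)     /* spin on the reserve bit only        */
                ;
            coarse_lock();          /* serialize the actual acquisition    */
            if (!o->reserved) {
                o->reserved = 1;    /* plain store is safe under the coarse lock */
                coarse_unlock();
                return;
            }
            coarse_unlock();        /* lost the race: spin again */
        }
    }

    static void object_unlock(struct object *o)
    {
        o->reserved = 0;            /* waiters observe the cleared bit */
    }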

3.2.2 Locks: Hardware Approaches

In addition to software-based approaches, research has proposed special hardware support to minimize lock overhead. Goodman et al. proposed the Queue-On-Lock-Bit primitive (QOLB) [45], which was also the first proposal for a distributed, queue-based locking scheme. QOLB maintains a hardware queue of waiting processors in the cache so that processors can spin locally. On lock release, the first processor in the queue receives a specially formed cache-coherency packet; the other processors remain spinning. Kägi et al. [63] compare the throughput of TEST&SET locks, TEST&TEST&SET locks [95], MCS locks, LH locks [77], M locks [77], and QOLB locks. The QOLB locks outperform all other lock mechanisms; however, they rely on support for the special QOLB primitive. The simple TEST&SET and TEST&TEST&SET locks perform well under low contention, but throughput degenerates quickly for more than four processors.

Rajwar et al. [92] reduce the runtime overhead for locks via speculative lock elision (SLE). SLE is a micro-architectural technique similar to speculative execution. The processor dynamically identifies synchronization operations, predicts them as unnecessary, and elides them. By removing these operations, the program behaves as if synchronization were not present in the program. Conflicts that would violate correctness due to the missing synchronization primitives are detected via the cache-coherency mechanisms, without executing the actual synchronization operations. Safe dynamic lock removal exploits the properties of locks and critical sections as they are commonly implemented. If data is not concurrently accessed between the lock acquisition and release operations, both operations can be elided. Data written within the critical section is buffered in an intermediate buffer while the lock variable is monitored. If the operation completes without a violation of atomicity, the buffer is written back; otherwise the operation is restarted with a normal lock.

Speculative lock elision addresses the problem of the high overhead of lock primitives in cases of no concurrency. While SLE is a generic and transparent solution, it has limitations. SLE requires an extension of the hardware micro-architecture and is thus not applicable to currently available hardware architectures. The size of the critical section is further limited by the size of the write buffer. Since the processor monitors the cache line of the lock, SLE induces cache and memory footprint for the lock variable. Furthermore, nesting of locks requires multiple locks to be monitored concurrently.

3.2.3 Lock-free Synchronization

An alternative to lock-based synchronization is lock-free synchronization. Critical code sections are designed such that they prepare their results out of line and then try to commit them using an atomic update instruction such as COMPARE&SWAP. The most prominent operating systems using lock-free synchronization are the Cache kernel [48], Synthesis [78], and Fiasco [58], which is an L4 microkernel variant. (A more detailed discussion of Fiasco follows in Section 3.4.2.)

The Cache kernel and Synthesis run on architectures with a COMPARE&SWAP2 instruction (Motorola 68K). The authors report lock-free synchronization as a viable alternative; however, they do not address the performance implications of frequent atomic operations or of lock contention. For example, the Synthesis kernel was only tested on a two-way machine with a shared memory bus, and the Cache kernel on a four-way machine. The authors argue that the overhead induced by lock-free synchronization, measured in number of instructions, is minimal. However, the overhead of the COMPARE&SWAP2 operation is reported to be as high as 114 cycles. The reported overhead for a variety of common kernel operations using lock-free synchronization vs. unsynchronized operations is between 50 and 350 percent.

Lock-free synchronization via atomic operations further eliminates the possibility of critical-section fusing [79], where multiple critical sections are protected by a single lock. Hence, while a single lock may be perfectly sufficient, scalable, and very cheap, the lock-free synchronization scheme adds the overhead of an atomic COMPARE&SWAP2 to every critical section.
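The prepare-out-of-line, commit-with-an-atomic-update pattern can be illustrated with a lock-free list push using C11 compare-and-swap. This is a generic sketch, not code from the kernels cited above (which rely on the 68K's two-word COMPARE&SWAP2).

    /* Lock-free push onto a singly linked list: prepare the node out of
     * line, then commit with compare-and-swap and retry on conflict. */
    #include <stdatomic.h>
    #include <stdlib.h>

    struct node { int value; struct node *next; };

    static _Atomic(struct node *) list_head;

    static void push(int value)
    {
        struct node *n = malloc(sizeof *n);      /* 1. prepare out of line      */
        n->value = value;

        struct node *old = atomic_load(&list_head);
        do {
            n->next = old;                       /* 2. link against current head */
        } while (!atomic_compare_exchange_weak(&list_head, &old, n));
                                                 /* 3. commit; retry on conflict */
    }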

3.2.4 Message-based Synchronization

Chaves et al. [26] discuss alternative in-kernel synchronization schemes for large-scale shared memory multiprocessors. The authors distinguish between three primary synchronization mechanisms for kernel–kernel communication:

Remote memory access: The operation executes on processor i, reading and writing processor j's memory as necessary.

Remote invocation: Remote invocation is based on sending a message from processor i to processor j, asking j to perform the operation on behalf of i.

Bulk data transfer: The kernel moves the data required by the operation from processor j to i, where it is inspected or modified and possibly copied back.

The work discusses the performance trade-offs of the different synchronization schemes. The authors conclude that the choice between remote invocation and remote access is highly dependent on the cost of the remote invocation mechanism, the cost of the atomic operations used for synchronization, and the ratio of remote to local memory access time. Despite coming to this conclusion, the authors do not discuss methods to resolve the open problem.

A number of operating systems use in-kernel messaging for synchronization: most prominent are the DragonFly BSD project [1], which bases most in-kernel synchronization on inter-processor messages, and Tornado [41]. I am not aware of any previous work that provides the choice of dynamic selection between message-based and lock-based synchronization in one kernel.
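A remote invocation path of the kind compared by Chaves et al. might look like the following sketch: the requesting processor deposits a request in the target processor's mailbox and raises an IPI, and the target executes the operation locally on its own data. The mailbox layout, send_ipi(), and the per-processor array are hypothetical, and overflow handling is omitted.

    /* Sketch of kernel-to-kernel remote invocation via per-processor
     * mailboxes.  Because processor j performs the operation itself in its
     * IPI handler, data owned by j needs no locks. */
    #include <stdatomic.h>
    #include <stdbool.h>

    struct request {
        void (*fn)(void *);
        void *arg;
        atomic_bool ready;               /* set last, with release semantics */
    };

    #define MAILBOX_SIZE 64

    struct mailbox {
        struct request slots[MAILBOX_SIZE];
        atomic_uint    next;             /* multi-producer slot allocation */
    };

    extern struct mailbox mailboxes[];   /* one mailbox per processor (hypothetical) */
    extern void send_ipi(int cpu);       /* hypothetical IPI primitive */

    static void remote_invoke(int cpu, void (*fn)(void *), void *arg)
    {
        struct mailbox *m = &mailboxes[cpu];
        struct request *r =
            &m->slots[atomic_fetch_add(&m->next, 1) % MAILBOX_SIZE];

        r->fn  = fn;
        r->arg = arg;
        atomic_store_explicit(&r->ready, true, memory_order_release);
        send_ipi(cpu);                   /* the IPI handler on 'cpu' runs fn(arg)
                                            for each ready slot and clears it */
    }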

3.2.5 Read-Copy Update

For read-mostly data structures, performance can be greatly improved by using asymmetric locking primitives that provide reduced overhead for read-side accesses in exchange for more expensive write-side accesses. Read-copy update (RCU) [80, 81] takes this idea to the extreme, permitting read-side accesses with no locking or synchronization at all. This means that updates do not block reads, so a read-side access that completes shortly after an update can return old data.

Data structures in parallel systems cannot be considered stable unless a particular update policy is followed, such as holding locks on the data. After the locks are released, the system cannot make any prior-knowledge assumptions about the state of the data which was protected by the locks. Thus, if a thread holds no lock, it cannot make any assumption about any data structure that is protected by any lock. When a thread holds no locks, it is in a quiescent state with respect to any lock-protected data.

The fundamental idea of read-copy update is the ability to determine when all threads have passed through a quiescent state since a particular point in time. Afterward, it is guaranteed that the threads see the effects of all changes made prior to that interval. This guarantee significantly simplifies many locking algorithms and eliminates many existence locks. The processors signal the event of passing a quiescent state by circulating a token. When the processor that currently possesses the token reaches a quiescent state, it passes the token on to the next processor. A new RCU epoch starts with the completion of a full round trip of the token. By then it is guaranteed that all processors have passed a quiescent state; thus the effects of an operation of the previous epoch are complete and globally visible.

Operating systems have well-known code paths where it is structurally guaranteed that the current thread holds no locks and thus is in a quiescent state. Examples of such code paths are the exit from kernel to application level, the processor entering the idle loop, and context switches. Read-copy update is used in a variety of scenarios in operating systems, such as deferred deletion of list members or module existence locks for kernel-loadable modules. In this thesis I apply the core idea of RCU to a novel domain: kernel lock primitives.
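The token-based epoch detection can be sketched as follows (a minimal, illustrative C fragment; the processor count, names, and polling style are assumptions, not the implementation used later in this thesis):

#include <stdatomic.h>

#define NCPUS 16

static _Atomic unsigned long global_epoch = 1;  /* completed token round trips  */
static _Atomic int token_holder = 0;            /* CPU currently holding token  */

/* Called on the well-known quiescent code paths of processor `cpu'
   (kernel exit, idle loop, context switch). */
static void quiescent_state(int cpu)
{
    if (atomic_load(&token_holder) != cpu)
        return;                                  /* not our turn                */
    int next = (cpu + 1) % NCPUS;
    if (next == 0)
        atomic_fetch_add(&global_epoch, 1);      /* token completed one round   */
    atomic_store(&token_holder, next);           /* pass the token on           */
}

/* An update recorded while global_epoch was e is globally visible once
   global_epoch exceeds e + 1: at least one full round trip happened
   strictly after the record. */
static int epoch_expired(unsigned long e)
{
    return atomic_load(&global_epoch) > e + 1;
}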

3.3 Partitioning

Multiprocessor partitioning divides a large multiprocessor system into a set of independent, smaller multiprocessor subsystems. Each subset acts relatively autonomously and independently of the others. The strict independence between partitions eliminates interference and information leakage, and it also restricts the level of parallelism within each partition. The scalability requirements for an operating system in a partition are therefore limited by the physical resources (i.e., processors) allocated to it.

Partitioning shares a subset of goals with scalable microkernel-based systems. The hard boundaries eliminate the overhead induced by a high degree of parallelism, such as cache and TLB coherency, and also restrict the degree of parallelism and thus the possibility for contention. Hence, operating systems that run within a partition can act as if they were running on a smaller overall system, even though the hardware may have significantly more processors. The OS can omit precautions for processors that are not part of the partition and use the best synchronization scheme for the partition's specific resource configuration.

3.3.1 Hardware Partitioning

Hardware partitioning is a well-known technique from mainframe-class computers and was recently introduced on enterprise-class UNIX systems. Hardware resources, including processors, memory, and I/O devices, are allocated to individual partitions. The partitions are strictly isolated from each other via specialized hardware support.

Brown Associates [32] compares the Sun E10000 and Unisys ES7000 systems, which both support hardware partitioning. The Sun E10000 [102] supports up to sixteen domains that achieve complete isolation while sharing the interconnect. Registers on the system boards define the domain per board. All components controlled by the system board become part of the domain. Failures within a domain do not affect other domains, except for failures of system-wide components.

The Unisys ES7000 [112] supports up to eight partitions, and the quantity of processors, memory, and I/O devices in a partition can be freely configured. Resources between partitions can be adjusted dynamically. The ES7000 memory system supports private memory as well as memory that is shared among partitions. Similar hardware partitioning is supported by IBM Sequent and HP (nPar).

The limiting factors of hardware partitioning are the relatively static resource allocation and the strict partition boundaries. Resources that are allocated to one partition cannot easily be shared by another. In particular, it is not possible to share processor resources between partitions in case a processor falls idle while another system is overloaded.

An important feature of partitioning is fault containment. The two possible failure types are hardware failures and software failures. While microkernel-based systems naturally achieve containment of software failures of applications via address space protection, hardware fault containment requires specific support in the microkernel (hardware fault containment is not addressed in this work).

3.3.2 Software Partitioning and Virtual Machines

Similar to the hardware-based partitioning approach, software partitioning multiplexes the hardware resources between different operating system instances. Each OS instance has a subset of the overall system resources.

Software partitioning is often used in the context of virtual machines (VMs) [43], where multiple concurrent operating systems compete for the system resources. The virtual machines are under the control of a privileged virtual machine monitor (VMM) that serves as the controlling entity and resource manager. VMs and microkernel-based systems are very similar with respect to their isolation goals. They differ insofar as microkernel-based systems provide a set of abstractions that enable fine-granular resource isolation and recursive resource control for all system services, whereas most VMMs solely target monolithic coarse-grain hardware multiplexing [15]. As our group has shown, a microkernel can serve as the core platform for virtualization [69, 111], while the opposite is often not the case because of the lack of high-performance communication primitives.

Besides our work on scalable virtual machines [111], the other most prominent research addressing VMs for multiprocessor systems is Cellular Disco [47].2 Cellular Disco aims to provide the advantages of hardware partitioning and scalable operating systems while avoiding the scalability bottlenecks. Cellular Disco uses a small, privileged VMM that multiplexes resources of different virtual machines. The system is divided into cells that act independently and interact via an in-kernel RPC mechanism. A primary goal of the messaging system is to preserve independence for fault containment and to ensure survival of cells in case of remote failures.

While Cellular Disco shares some goals with my thesis, it differs in the way these goals are achieved. Cellular Disco specifically targets VMs and provides the core services, such as scheduling, page allocation, and device drivers, within the privileged VMM. Thus, the VMM follows a traditional monolithic operating

2That does not include commercial systems, such as IBM's OS 390.

system design paradigm, with in-kernel policies. A key design decision of Cellular Disco is to assume that the code is correct. The assumption is warranted by the fact that the size of the virtual machine monitor is small. The authors report a prototype kernel of 50K lines of code (compared to 13K for L4Ka::Pistachio); however, it is co-located with an SGI IRIX 6.4 OS for device accesses. Despite the inherent dependence on IRIX for correctness, such co-location also has implications for scalability. The scalability of an important aspect of the system — hardware device accesses — becomes dependent on the overall scalability of IRIX. Also, the work does not specifically address the scalability of multiprocessor systems but has a stronger focus on general problems of VMs, such as eliminating double paging and memory balancing policies.

3.4 Microkernel-based Systems

3.4.1 The Microkernel Argument

A common method for structuring operating systems is a monolithic system, which integrates all OS functionality in a single system image (e.g., UNIX [93] and VMS [44]). The main advantage of such a monolithic structure is good performance and global visibility of system state. Monolithic systems, however, have a number of disadvantages. Since all system services reside within the monolithic system image, all subsystems share a single protection space and may interfere. Faults within one subsystem can propagate to others with potentially fatal consequences for the system.

In order to extend such systems with new functionality, such as different scheduling policies, file systems, or device drivers, the extension has to be integrated with the kernel. The extension is code that is injected into the kernel and runs with full privileges. Thus, the injected code becomes globally trusted, but may be neither trustworthy nor error-free. A variety of approaches address trustworthiness, such as simple code signing [33, 83], sandboxing [119], interpreted languages [84], type-safe languages [18], domain-specific languages [40], and proof-carrying code [85, 86].

In contrast to monolithic designs, in microkernel-based systems the operating system functionality is implemented as user-level servers. These servers run in separate address spaces and communicate using inter-process communication (IPC) built into the microkernel. Extensibility is achieved by replacing the system servers, similar to normal applications.

Microkernels isolate operating system components from each other via address spaces and also provide them with separate resources. Both properties enable multiple OS services with different resource policies to co-exist on one system. These include systems with different security requirements [90] or systems with real-time and non-real-time demands [53, 97]. The second important property is the size and complexity of microkernels. Microkernels are tiny and thus can be used as a secure platform for custom, application-specific operating systems [62, 75].

First-Generation Microkernels

First-generation microkernels failed to fulfill the promise of better structured and more flexible systems. The most prominent kernel is Mach [2], and it provides a good example of the problems of first-generation microkernels. Mach was designed to exploit modern hardware features such as multiprocessors and to enable new application domains. At the same time, Mach was supposed to provide backward compatibility to existing operating systems such as UNIX [93]. As a result, Mach was constructed by refactoring a UNIX kernel and introducing new kernel interfaces. The UNIX functionality was moved to application servers.

Mach led to a number of important innovations, such as user-level pagers [123], with the result of a tremendous growth of the API surface area and an increase of kernel complexity and size. The primary problem of Mach, however, was its opulent and extremely slow IPC mechanism, which resulted in severe performance problems for systems constructed on Mach.

Second Generation Microkernels

Second generation microkernels addressed the above problems with radical new designs. Liedtke [71] identified poor structuring and high cache footprint of Mach’s IPC primitive as the main performance problems. By careful design of the com- munication mechanism, second generation microkernels such as L4 [71], Exok- ernel3 [34, 62], and EROS [98] exhibited the envisaged properties with superior performance to other extension methods. The fundamental design principles for most second generation microkernels are:

• The kernel provides a minimal set of abstractions: threads and address spaces.4

• The kernel provides a minimal set of mechanisms in order to multiplex hard- ware in a safe and secure manner. The key primitive is inter-process commu- nication which enables extensibility of the system via interacting user-level servers [73].

• The kernel should be policy free. The policies should be provided by the system servers using the kernel mechanisms.

3Exokernels differ from microkernels insofar as they avoid high-level abstractions and directly multiplex the hardware. I still discuss them together with microkernels because many goals are very similar. 4More generally, each microkernel needs at least one abstraction for execution (or an activity) and one abstraction for a protection domain to permanently bind resources.

Based on these design principles a number of microkernels were developed that yielded excellent performance for both system software and user applications. Härtig et al. [54] showed that L4 not only has excellent IPC performance but also incurs a minimal overhead when running a monolithic operating system (Linux) on top of the microkernel. The reported overhead for macrobenchmarks was between 5 and 10 percent compared to monolithic Linux. Still, microbenchmarks showed overheads of a factor of three for short-running system calls. Colleagues and I [110] further reduced the microkernel-induced communication overhead from a factor of three to about thirty percent.

3.4.2 Multiprocessor Microkernels

One aspect of previous research in the area of microkernels is multiprocessor systems. The reduced complexity of microkernels as compared to monolithic systems makes them a feasible system architecture. The Hydra kernel [122] was designed with the goal of separating mechanisms and policy and thereby provides an exploration testbed for multiprocessor systems.

Mach targets large-scale multiprocessor systems and provides lightweight threading in order to utilize the parallelism offered by the hardware [20]. Being a first-generation microkernel, Mach employs a traditional scalability approach comparable to many monolithic multiprocessor OSes. The in-kernel subsystems were optimized on a case-by-case basis [64] by first identifying and then removing scalability bottlenecks. However, Mach still exhibited its poor IPC performance.

The Raven microkernel [94] specifically targets multiprocessor systems with a focus on minimizing the overhead for kernel invocations. Raven provides user-level scheduling with a kernel-based callback (similar to scheduler activations [5]), user-level interrupt handling, user-level interprocess communication, and user-level synchronization primitives. While moving significant system parts to application level, Raven uses a number of global data structures, thus limiting its scalability. For example, the memory pool is protected by a global lock and accesses to task objects are serialized.

Chen [27] extended the uniprocessor Exokernel system to multiprocessor systems. The fundamental idea of Exokernel is to expose the hardware properties directly to application level, thus avoiding the overhead introduced by high-level OS abstractions. The OS itself is implemented as a library at application level. The multiprocessor Exokernel uses a variety of alternative synchronization strategies. Similar to my approach described in this thesis, Exokernel localizes kernel resources and uses in-kernel messaging to manipulate remote resources. However, the choice of synchronization primitives is ad hoc and static. The specific kernel implementation dictates the synchronization granularity and may result in high overheads. This is, for example, reflected in the memory subsystem, where the reported cost for memory mappings is three times higher than on the uniprocessor version of the kernel. Furthermore, the overhead of the kernel's IPC primitive (even for uniprocessor systems) is almost an order of magnitude higher than that of L4.

The most prominent work in the area of scalable microkernel systems is Tornado [41] and its successor K42 [8]. K42 is a microkernel-based system that employs many of the previously described techniques for scalability, including read-copy update, clustered objects, and lock-free synchronization. K42 scales well to a large number of processors, provides a Linux-compatible API, and reuses Linux components in the kernel [9]. K42 differs from my approach insofar as K42 blurs the line between kernel and application level and allows the kernel to be extended in order to overcome performance bottlenecks. For example, K42 allows for co-location of device drivers if performance requires it. K42's underlying microkernel provides access to kernel-internal data structures via a wide kernel interface (e.g., for scheduling information). K42 also achieves adaptability of synchronization mechanisms, however via runtime code replacement and with the limitations of the fault-based scheme described in Section 3.1.2.

L4

Besides the work presented in this thesis, there are at least two other L4 microkernel variants that support multiprocessor systems. Liedtke [70] briefly discusses the implications of multiprocessor support and argues that locking will have a significant performance impact on all critical system operations including IPC. Potts et al. [91] realized a multiprocessor version of L4 on the Alpha architecture with the primary focus on the performance of the local-case IPC. All inter-processor operations use message-based synchronization schemes. The kernel, however, does not support kernel memory pool management and also lacks a TLB coherency scheme.

Hohmuth [58] describes Fiasco, an L4 kernel that focuses on real-time properties via non-blocking synchronization. Fiasco uses a locking-with-helping scheme in order to minimize thread blocking time in real-time systems. If thread A tries to acquire a lock that is held by a preempted thread B, A donates its time to B. After the lock is released, B immediately switches back to A. Fiasco's helping scheme is also used across processor boundaries. The approach, however, is questionable for larger multiprocessor systems for the following reasons: Helping may lead to temporary migration of threads between processors. The overhead induced by the required cache migrations (in particular the active working set of the thread's kernel stack) is significant. Even small kernel operations require multiple cache lines to be migrated. The probability of a hot cache working set of a preempted thread in L4 is extremely high due to the very short execution time of kernel operations. Furthermore, the dynamic migration effectively requires all kernel objects to be protected with atomic lock operations. Hohmuth reports an increase in cost for the kernel thread lock on multiprocessor systems of 209 percent for a Pentium 4. The reported overhead for the lock-with-helping primitive (which is acquired on the critical path) is four times higher than a complete IPC operation in L4Ka::Pistachio. The work does not discuss scalability aspects, such as kernel memory management, scheduling, TLB coherency, or the implications of helping

on overall scalability.

Hohmuth argues that a helping scheme is more general, because it is not limited to microkernels, where operations are in most cases very short. My work specifically targets microkernels, and the trade-off of generality vs. performance clearly favors performance. My goal is to achieve maximal performance and scalability without requiring extensibility of the kernel. The baseline is thereby the uniprocessor version of L4Ka::Pistachio with its unrivaled IPC performance on almost all existing hardware platforms.

Chapter 4

Microkernel Performance Adaptation

In this chapter I present an approach for controlling and adjusting performance- and scalability-relevant parameters of multiprocessor systems. My approach is based on efficient tracking of parallelism and dynamic adaptation of multiprocessor synchronization mechanisms (i.e., locks).

My goal was to develop a set of general mechanisms that lead to a system construction methodology with excellent scalability properties and low overhead while preserving strict orthogonality and isolation between independent subsystems. The mechanisms and methodologies target microkernels, component-based systems, and virtual machines on small- and large-scale multiprocessor systems with up to about one hundred processors. Some mechanisms are usable beyond this specific domain. The dynamic locking scheme I developed can serve as a drop-in replacement for standard spin locks and clustered locks. The processor tracking scheme is a drop-in replacement for processor sets as widely used in scalable UNIX variants.

This chapter is organized as follows: Section 4.1 defines the problem that my approach intends to solve, states the general scalability and performance goals, and narrows down the target hardware architecture. Afterward, I derive a set of requirements for achieving the performance goals, which I discuss in the remaining sections of this chapter. In Section 4.2, I describe a mechanism to efficiently track and restrict resource usage to a subset of processors. Section 4.3 then compares the overhead of kernel synchronization mechanisms, and I propose a scheme that allows for safe runtime adaptation toward the most efficient algorithm. Section 4.4 describes a TLB coherency tracking scheme that decouples permission updates from TLB shoot-downs, thereby reducing TLB coherency overhead and minimizing inter-processor dependencies. Finally, Section 4.5 describes a generic event logging mechanism that enables user-level components to efficiently monitor and evaluate events across component boundaries.

4.1 Overview

The design guidelines for scalable operating systems, as defined by Unrau [113] (preserving parallelism, bounded overhead, processor locality), leave performance and overhead aspects of multiprocessor algorithms unaddressed. As argued before, by strictly following these three guidelines, a system is still considered scalable even if the overhead is multiple orders of magnitude higher than the achievable minimum.

Minimal overhead and maximum performance for a particular algorithm depend on a variety of parameters, most importantly the degree of parallelism and concurrency and the (memory) interconnect between the involved processors. The decision for one or the other alternative depends on the specifics of the system (both software and hardware), access pattern, access frequency, and so on. It is possible to optimize on a case-by-case basis when incorporating semantic knowledge on resource usage. In operating systems, such semantic knowledge is available for many high-level abstractions, such as the degree of sharing of a file, the number of threads in an address space, physical memory that backs a device and therefore is reallocated infrequently, and many others.

In microkernel systems such high-level system abstractions are implemented as user-level services. Hence, detailed semantic knowledge is not available to the microkernel itself. Instead it operates on the bare hardware resources: memory, time, and the processor resources (e.g., caches and TLB). These resources are all treated identically, independent of the specific usage case. The lack of detailed high-level information either leads to an overly conservative design that favors scalability over performance, or to a well-performing but less scalable system. Introducing more semantic knowledge via high-level abstractions, however, defeats the minimality principles of microkernels. From that starting point I define the following primary design goals:

• The overhead for kernel operations in a multiprocessor has to reflect the degree of parallelism and concurrency. Semantic knowledge about resource usage that is available at the application level needs to be considered at the kernel level.

• The microkernel abstractions and mechanisms should remain minimal with a strict focus on orthogonality and independence. The microkernel itself should not be extended by multiprocessor-specific policies in order to achieve better scalability or less overhead. Instead, the kernel must provide mechanisms such that applications can adjust and fine-tune the kernel's behavior.

• All optimizations and adaptations must be safe and enforce protection and isolation.

Furthermore, I define a set of secondary design goals:

• A scalable OS design has to honor Unrau's design principles: preservation of parallelism, bounded overhead, and memory locality in NUMA systems. This requirement applies to the microkernel as well as to OS components at application level.

• The design must be efficient on a variety of multiprocessor systems, from small-scale shared memory to large-scale NUMA architectures. The sup- ported number of processors must be independent of architectural specifics, such as the number of bits per register.

• The architecture should target contemporary hardware with high memory access latencies, high cache-coherency overhead for NUMA configurations, and potentially long IPI latencies. My specific target hardware architecture is IA-32.

• The performance baseline is the uniprocessor kernel. Operations that are restricted to a single processor should have the same overhead as if executed on a single processor system.

Approach

The separation of operating system functionality into isolated components restricts information availability and information flow and thus limits microkernel-based systems in their optimization potential. Figure 4.1 gives a schematic comparison of information flow in a monolithic OS versus a microkernel-based system. Achieving the same flexibility and optimization potential for the microkernel-based system requires (i) an efficient information flow between the kernel and components, and (ii) mechanisms to control and adjust a system's behavior.

While these are general requirements, I will now narrow them to the specific problem domain of multiprocessor systems and scalability. In order to achieve optimal performance, the kernel has to adapt its synchronization strategy and synchronization granularity. Both depend on the degree of parallelism, the frequency of operations, and the granularity at which individual objects are modified. This information has to be provided to the kernel by applications and OS components and furthermore needs to be reflected in the kernel's algorithms and data structures. By breaking high-level semantic information on resources down to a set of generic attributes, it is possible to provide it to the kernel in a unified manner. I developed the following four mechanisms to counteract the limitations that are inherent to the microkernel's system structure:

Tracking of parallelism. Applications can explicitly specify the expected parallelism for a kernel object and the kernel keeps track of it. Based on that information, the kernel can make an informed decision on the synchronization strategy.

[Figure: (a) applications on top of a single operating system on the hardware; (b) applications and OS components on top of the microkernel on the hardware.]

Figure 4.1: Schematic comparison of information flow in (a) monolithic systems and (b) microkernel-based systems. In a monolithic system all information is combined in a single system image. A microkernel-based system requires mechanisms to control and adapt the behavior of components and mechanisms for efficient control flow.

Explicit specification overcomes the lack of semantic information about the parallelism of kernel resources. I developed a fixed-size data structure that enables efficient tracking of parallelism and locality for every kernel object; it forms the basis for adaptive locking.

Adaptive locking. The usage pattern of resources may change over time. Appli- cations need to be able to readjust the system’s behavior and need to adapt the synchronization scheme, such as lock granularity and the synchroniza- tion primitive.

TLB coherency tracking. Memory is a primary resource that is managed by the microkernel and is the basis for isolation of components. Memory permissions are stored in page tables and cached in the processor's TLB. After revocation of memory permissions, the kernel has to enforce TLB coherency, which is an expensive operation. By combining multiple TLB updates into one when manipulating a set of page table entries, the shoot-down overhead can be reduced. However, the combined update has similar negative implications for parallelism as critical-section fusing. I developed a tracking algorithm that decouples permission updates and TLB shoot-downs. The algorithm uses a TLB version that allows for combining multiple TLB shoot-downs into one. The inter-dependencies between processors are eliminated by bypassing outstanding shoot-downs.

Event tracing. Feedback-based policies incorporate runtime information that is

derived from monitoring system behavior. In multiprocessor systems, many resource allocation policies — including thread balancing and memory pooling — are based on monitoring the system's behavior and predicting the future. In a component-based system, the information is distributed and thus unavailable to schedulers. Event tracing provides an efficient mechanism to monitor and transfer relevant data between the kernel and components, but also across components. Having relevant kernel information at hand avoids the necessity of introducing multiprocessor kernel policies and leaves such decisions to applications. A prerequisite of such an approach is minimal overhead for data collection, transport, and evaluation.

In the following sections I describe these four mechanisms in detail.

4.2 Tracking Parallelism

Multiprocessor optimizations depend on specific knowledge of the degree of parallelism on data structures and the locality of operations. Based on both properties, the operating system and applications can select the most suitable (e.g., best performing) algorithm for a specific task. In monolithic systems, it is in many cases possible to derive or infer the degree of parallelism from the system structure. For example, it is unlikely that the data structure representing a file, which is opened by a single-threaded application, will experience high contention by many processors; the most likely processor to access the data structure is the home processor of the thread. In this example, the relevant information is encoded indirectly by (i) the locality of the thread(s) and (ii) the number of threads that may manipulate the file.

If such indirect knowledge on parallelism and locality is unavailable, it can alternatively be expressed explicitly with a bitmap. Each bit in the bitmap denotes one processor in the system. The degree of parallelism is encoded by the number of bits set, and locality is expressed by the specific bit positions. This encoding scheme is efficient and accurate; however, it does not scale. The size of the bitmap grows linearly with the number of processors in the system. For kernel objects (e.g., the previously mentioned file structure) that contain a processor bitmap, the memory footprint and cache footprint depend on the number of processors in the system. Also, the runtime overhead for deriving the number of bits set depends on the size of the bitmap and thus on the total number of processors. Both properties (i.e., unbounded size and unbounded runtime) violate Unrau's scalability requirement of bounded overhead.

Tracking processor affinity and processor permissions requires an efficient, fixed-size encoding. Processor sets (as used by many UNIX kernels) solve the space problem with a level of indirection; the bitmap is replaced by a pointer to the bitmap. However, such an indirection scheme not only increases the runtime overhead but also has the disadvantage that it is non-trivial to derive whether or not the bitmap is still referenced.

I developed an encoding scheme that has both desired properties: it has a fixed size and is independent of the number of processors. Similar to floating-point encoding, I trade accuracy for space. The trade-off limits the freedom to combine arbitrary processors in the mask, which is not a limiting factor in the common cases of resource allocation, for the following reasons. Non-uniform memory increases the cost of remote accesses. Any well-designed system should therefore decrease the frequency of accesses to remote resources with increasing cost. The most common and relevant approaches for reducing the remote access frequency are (i) to explicitly favor local over remote resources (and reallocate resources on migration), (ii) to minimize resources that are shared across many processors (e.g., by replication), and (iii) to co-locate tasks that have a high degree of sharing on neighboring processors.

In order to minimize overheads, a well-designed operating system for multiprocessor architectures will therefore try to minimize the degree of sharing of physical resources with increasing access costs. One can conclude that such systems will preferably limit resource sharing to nearby processors, that is, processors with low access latencies to the same memory.

4.2.1 Processor Clusters

The specific resource access costs depend on the memory interconnect and its physical limitations. Shared-bus interconnects do not scale due to bandwidth limitations, while point-to-point connections do not scale because of the quadratic number of interconnects between processors. The commonly used topologies, such as hypercubes [55, 56] and fat trees [67], trade performance against hardware complexity and have only very few neighboring processors. The specific number depends on a variety of hardware factors. For example, most NUMA systems combine a few processors via a shared memory bus which then connects via a memory router to other processors. In multi-core and multi-threaded systems, processors share one memory subsystem, and in some cases processors even share the same caches. Based on the limits of the memory subsystems one can conclude that in any well-designed multiprocessor system:

1. most resources will be accessed by only a very small subset of processors,

2. the processors that share the resources will be located relatively close to each other.

Based on that observation, I developed an encoding scheme that avoids both scalability problems stated before. I use an encoding scheme that is specifically tailored towards the common scenario of neighboring processors. First, I group processors into clusters of neighboring processors. The clusters are constructed by considering the access latencies to shared resources. Each processor has its unique numerical ID in the system. I assign the IDs in a way that neighboring processors have neighboring IDs. The organization of the memory interconnect makes the ordering process straightforward: first sort processors of the same core, then the same node, and so on (a small sketch of such an ID assignment follows Figure 4.2).

Similar to the normal bitmap encoding, where one processor is expressed by one bit, I encode one cluster per bit. The scheme can be applied recursively, resulting in a cluster hierarchy. Multiple clusters are then covered by a single bit. The number of processors per bit is encoded in the cluster order field as a power of two. Obviously, when grouping state about multiple processors into a single bit, the encoding loses accuracy. Figure 4.2 shows an example encoding of a processor set with different cluster orders. The example has alternative encodings for a bitmap covering 16 processors. Starting with a cluster order of 0, each processor has an individually assigned bit. The encoding is accurate and requires as many bits as processors. For the cluster order of 1, 2 processors are encoded per bit. The encoding requires half the number of bits but also reduces accuracy by 50 percent. While in the initial encoding the bitmap selects 8 processors, the order 1 encoding is less accurate and selects 10 processors (respectively, 12 processors for order 2 and 16 processors for order 3).

CPU ID (hex):             0 1 2 3 4 5 6 7 8 9 A B C D E F
Order 0, 1 CPU per bit:   1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0
Order 1, 2 CPUs per bit:   1   1   1   1   1   0   0   0
Order 2, 4 CPUs per bit:     1       1       1       0
Order 3, 8 CPUs per bit:         1               1

Figure 4.2: Cluster-mask encoding for different cluster orders. The sample values show the reduction of accuracy with increasing order.
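A sketch of such an ID assignment (the field widths are illustrative assumptions, not dictated by any particular hardware) packs hardware threads, cores, and nodes into neighboring IDs:

static inline unsigned cpu_id(unsigned node, unsigned core, unsigned thread)
{
    /* Thread in bit 0, core in bits 1..3, node above: processors that
       share a core or node end up with neighboring IDs. */
    return (node << 4) | (core << 1) | thread;
}

With such a numbering, processors that share a cache or memory bus fall into the same cluster already at small cluster orders.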

4.2.2 Processor Cluster Mask

Processor clustering reduces the number of bits required to cover a large number of processors, at the cost of accuracy. As argued before, for performance reasons many operations on resources are restricted to only a subset of the system's processors. The specific processors depend on the resource access costs. The described assignment of processor IDs to physically neighboring processors favorably encodes them in neighboring bits of the bitmap.

Instead of encoding all processors in the bitmap, I restrict the bitmap to cover only a subset of the processors. The subset of processor IDs is encoded by a start ID (or offset) and the cluster order. The number of bits in the bitmap gives the

Offset (8/16 bit) | Order (8/16 bit) | Bitmap (16/32 bit)

Figure 4.3: Cluster-mask encoding for 32-bit and 64-bit systems (the first size applies to 32-bit, the second to 64-bit systems). The structure size is identical to the natural word width of the processor.

upper bound of the subset. For example, a bitmap with 16 bits and an order of 1 covers a total of 16 · 2^1 = 32 neighboring processors. In comparison, the same-size bitmap with an order of 5 covers 16 · 2^5 = 512 neighboring processors. The start processor ID makes it possible to offset the covered range of the bitmap. The processor IDs that are covered by a bitmap with b bits lie in the interval (2^order · offset, 2^order · (offset + b)].1

The cluster bitmap can encode a very large number of processors in a fixed-size data structure. For the binary representation I defined the following set of requirements:

1. The data structure must have a size that is identical to the processor's natural register width, such that it can be efficiently stored, efficiently passed around (i.e., from application to kernel in a register), and used with atomic operations (such as compare-exchange).

2. Testing for a processor must have very low overhead, because such operations are in many cases performance critical.

3. The data structure has to provide identical accuracy for all processors in the system (i.e., there should be no penalty for processors with a particular ID).

4. The runtime overhead of operations on the data structure (e.g., lookup, test, or merge) must be bounded, independent of the number of processors in the system.

These four goals are fulfilled by the data structure depicted in Figure 4.3. It contains three parts: (i) a bitmap for encoding individual processors, (ii) a field that denotes the processor order, and (iii) an offset field that specifies the starting processor ID of the bitmap.

A processor p in a system with a maximum of p_max = 2^k (k ∈ ℕ0) processors is selected by the cluster mask m = (bitmap, order, offset) if f(m, p) > 0, with

f(m, p) ≡ bitmap mod 2^((⌊(p + p_max)/2^order⌋ − offset) mod ⌊p_max/2^order⌋).

1Note that the interval ignores wrap-arounds.

The equation expresses a bit test using the cluster encoding of Figure 4.2. It can be efficiently implemented via simple rotate and logical bit operations.

Furthermore, I define a set of operations for the cluster mask, most importantly the merge operation for two cluster masks. Merge is similar to a logical OR operation insofar as it combines the bits of two bitmaps into a new bitmap. Additionally, it adjusts the order and offset such that all previously specified processors are still selected by the new mask. The cluster mask m′ that is merged from m1 and m2 must satisfy the condition ∀i ∈ {0, 1, ..., n − 1} : f(m1, i) + f(m2, i) > 0 ⇒ f(m′, i) > 0. For cluster masks that have identical orders and offsets, the merge operation is a simple logical OR of both bitmaps.

The number of processors specified by the cluster mask is the number of set bits in the bitmap multiplied by 2^order. Furthermore, the order field serves as an indicator for optimizing the locking strategy. A higher value of the order encodes more processors (and thus a higher probability of contention) as well as potential overhead and unfairness in the memory subsystem. The next section covers this aspect in more detail.
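The following C sketch shows the 32-bit variant of the cluster mask together with the bit test and the simple merge case described above. The field widths follow Figure 4.3; the helper names are illustrative.

#include <stdint.h>

typedef struct {
    uint16_t bitmap;   /* one bit per cluster of 2^order processors     */
    uint8_t  order;    /* log2 of the number of processors per bit      */
    uint8_t  offset;   /* first covered cluster, in units of 2^order    */
} cluster_mask_t;

#define MASK_BITS 16u

/* Bit test corresponding to f(m, p) > 0: map processor p to its cluster,
   subtract the offset, and wrap around the bitmap width. Unsigned
   arithmetic handles the wrap-around even when the cluster index is
   smaller than the offset, because MASK_BITS divides 2^32. */
static int mask_covers(cluster_mask_t m, unsigned p)
{
    unsigned bit = ((p >> m.order) - m.offset) % MASK_BITS;
    return (m.bitmap >> bit) & 1u;
}

/* Merge for two masks with identical order and offset: a plain OR of the
   bitmaps. The general case re-encodes both masks at the coarser order
   and adjusted offset first; it is omitted here. */
static cluster_mask_t mask_merge_same_order(cluster_mask_t a, cluster_mask_t b)
{
    cluster_mask_t r = a;
    r.bitmap = (uint16_t)(a.bitmap | b.bitmap);
    return r;
}

Counting the covered processors then amounts to popcount(bitmap) · 2^order, and the whole structure fits into a single 32-bit register, as required.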

4.3 Kernel Synchronization

The kernel uses synchronization to ensure consistency of data in the kernel and to order operations. The two primary synchronization schemes available in shared-memory multiprocessors are memory-based locks (and atomic operations) and in-kernel messages. Both schemes have advantages and disadvantages, and there is no single optimal solution. I propose to provide multiple alternative synchronization schemes for the same data structures. Applications can then choose the optimal strategy based on their specific local knowledge of access and usage patterns. The degrees of freedom of the locking strategy are the lock primitive (i.e., memory- or message-based) and the locking granularity for divisible kernel objects. In the case of no parallelism, the lock primitive is unnecessary, only induces runtime overhead and additional cache footprint, and could be disabled altogether.

The cluster mask described in Section 4.2.2 enables the kernel to efficiently store and evaluate the degree of parallelism on a per-kernel-object basis. By tagging each kernel object with such a mask, applications have detailed control over the synchronization primitives used for each individual object. Processors that are specified in the cluster mask can manipulate kernel objects directly using memory-based synchronization; all other processors need to use message-based synchronization. The cluster mask provides the required information for an informed decision. It also enforces that only those processors specified in the mask have direct memory access to the object. For a mask that specifies a single processor, the lock primitive can be safely disabled.

In this section I first discuss the performance trade-offs of the alternative synchronization schemes. The better-performing scheme thereby depends on the specific access pattern. The choice between the three synchronization schemes — coarse-grain locking, fine-grain locking, and kernel messages — has to be made on a case-by-case basis and may change over time. A static selection becomes suboptimal when locking pattern and parallelism change significantly over time.

I propose a synchronization scheme that eliminates this deficiency by incorporating the additional knowledge provided via the processor cluster mask. When the cluster mask changes, the kernel adjusts the synchronization primitive between normal locks and message-based primitives. A problem of this approach is that updates to the mask are not propagated instantaneously to all relevant processors. Thus, during a transition period some processors may use the previous, while others use the new, synchronization primitive. Incorrect (or inconsistent) synchronization, however, could lead to data corruption in the kernel and erroneous behavior. I describe a new locking primitive — dynamic locks — that can be safely adjusted at runtime and guarantees correctness during the transition period from one locking scheme to another. Dynamic locks can be enabled and disabled at runtime and allow the kernel to dynamically eliminate locking overhead for unshared resources. Built upon the base primitive of dynamic locks, I develop a scheme that cascades multiple dynamic locks in order to dynamically and safely adjust the lock granularity, addressing the cost–concurrency trade-off.

4.3.1 Trade-offs

The performance trade-off for in-kernel synchronization is based on three parameters: (i) the ratio of local vs. remote accesses, (ii) the overhead of the lock primitive, and (iii) the degree of concurrency. Based on these parameters, one or the other synchronization method is more efficient.

Message-based synchronization. In message-based synchronization a datum has an associated home processor. Only the home processor can manipulate the datum, thereby achieving mutual exclusion and strict ordering. Message-based synchronization incurs no overhead for manipulations on the home processor, but a significantly higher overhead for accesses from remote processors. Hence, the overall cost depends on the fraction f_remote of all operations that are remote, with the total synchronization cost t_total = f_remote · t_message. The overhead of message-based synchronization can be divided into the cost for message setup and signaling, the delivery latency of the message, and the overhead for processing. The signaling may require an inter-processor interrupt, which induces cost on the sender side and also on the receiver side (for the interruption itself and the interrupt acknowledgment on the hardware). After sending a message, the sender can either actively wait for completion or block and invoke the scheduler. In the latter case the latency of the overall

operation increases, but the initiator of the message does not busy-wait until the message gets delivered to the remote processor. Busy-waiting adds the message delivery latency to the overall cost of the operation. When invoking the scheduler, the processor is freed for other tasks that can be executed while the in-kernel message is in transit. However, the preemption may induce a performance penalty later on, for replacements of the preempted task's active cache working set (memory and TLB).

Lock-based synchronization. Locks provide mutual exclusion based on memory ordering that is enforced in the memory subsystem. The overall overhead of locks can be divided into the cost of the lock operation (instruction execution and cache footprint), t_lock, and the wait time t_wait when the lock is already taken. The average lock wait time depends on the number of competing processors p and the average lock holding time t_hold and is t_wait = t_hold · (p − 1)/2. The cache coherency updates additionally induce indirect costs in the memory subsystem. More sophisticated synchronization primitives, such as semaphores, can release the processor to other runnable tasks. Such primitives, however, are in most cases too heavyweight for the short critical sections of microkernels, and the overhead outweighs the potential benefits.

The choice between one or the other synchronization scheme depends on the specifics of the workload. In the following, I compare the costs and derive a decision metric. The metric is based on the preceding simplistic cost analysis. Hence, it leaves many aspects of algorithms and hardware unaddressed; however, it is sufficient to reflect the general performance and overhead tendencies.

Message-based synchronization favors local operations over remote. It elimi- nates the locking overhead in the local case. Hence, message-based syn- chronization has a lower overhead if the overall cost for locks (for the local and the remote case) is higher than the cost for sending and processing the message including all additional costs (e.g., interrupt handling, interrupting processing on the remote processor, etc.). Message-based synchronization is therefore preferable for infrequent remote operations or when multiple re- mote operations can be combined in a single remote request for minimizing the startup overhead.

Lock-based synchronization yields better overall performance than message-based synchronization if the cost of locks on processor-local data is less than the message overhead from remote processors. Another performance aspect is the lock granularity for divisible objects. Depending on the degree of concurrency, the lock overhead, the number of concurrently manipulated parts, and the overall lock holding time, either coarse-grain or fine-grain locks may result in better overall performance.

Instruction     Opteron 1.6 GHz   P3 800 MHz   P4 Prescott 2.8 GHz
xchg            16 cycles         19 cycles    121 cycles
lock cmpxchg    18 cycles         26 cycles    139 cycles
lock decb       18 cycles         22 cycles    131 cycles

Table 4.1: Cost of atomic instructions for different IA-32 processors

Let’s consider an object that can be divided into n independent parts (e.g., a hash table). The runtime overhead for a lock acquisition is tlock and the lock hold time is thold. The average wait time for p competing processors is twait = (p − 1)/2 · thold. The total synchronization overhead ttotal to modify m out of n parts with p processors evenly competing for all parts is ttotal = twait/n + m ·tlock = (p − 1)/(2n) ·thold + m ·tlock The following two common cases are of particular relevance:

• No or low concurrency: For p = 1 the overall cost is t_total = m · t_lock. The synchronization overhead depends on the number of modified parts m, which can be reduced by decreasing the granularity (i.e., by using coarse-grain locks).

• High concurrency: For p ≫ 1 the overall cost is dominated by the wait time t_total = (p − 1)/(2n) · t_hold. It can be reduced via fine-granular locking (i.e., by increasing n) or by reducing the lock holding time.

Please note that this analysis is very simplistic. In particular, it does not include cache effects, such as multiple acquisitions of the same lock, or the additional overhead induced by cache coherency in NUMA systems. It also assumes an equal distribution of lock acquisitions across all locks.
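To make the trade-off concrete, the following back-of-the-envelope calculation plugs purely illustrative numbers into the formula above: t_lock = 20 cycles, t_hold = 100 cycles, and an object split into n = 16 parts of which m = 2 are modified.

\begin{align*}
p = 1:\;& t_{total} = m \cdot t_{lock} = 2 \cdot 20 = 40 \text{ cycles}\\
p = 8:\;& t_{total} = \tfrac{p-1}{2n} \cdot t_{hold} + m \cdot t_{lock}
          = \tfrac{7}{32} \cdot 100 + 2 \cdot 20 \approx 62 \text{ cycles}\\
\intertext{whereas a single coarse lock ($n = 1$, $m = 1$) gives}
p = 1:\;& t_{total} = 20 \text{ cycles}, \qquad
p = 8:\; t_{total} = \tfrac{7}{2} \cdot 100 + 20 = 370 \text{ cycles.}
\end{align*}

Under low concurrency the coarse lock wins by a factor of two; under high concurrency the fine-grained variant wins by roughly a factor of six. This is exactly the situation the dynamic adjustment described in the following sections addresses.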

For today’s architectures the cost of tlock is increasingly high and for short critical sections even more expensive than the actual operation that is protected by the lock. Table 4.1 compares the minimal overhead for atomic instructions on a variety of IA-32 processors. The fundamental idea is to use the available application-specific knowledge on the degree of concurrency and access pattern (local or remote) and adapt the kernel’s synchronization mechanisms. The adjustable parameters are the overhead of the lock primitive tlock and the lock granularity n. The lock primitive can be adjusted in two ways. First, it can be eliminated by switching from lock-based synchronization to message based synchronization. Message-based synchroniza- 2 tion has a lower runtime overhead for fremote · tmessage < tlock. Alternatively, the locking scheme can be adapted, for example by switching from spin locks to MCS locks as proposed by Lim [76].

2Note that the inequality does not address the implication of higher latencies on overall performance.

Figure 4.4: Flow chart of dynamic kernel synchronization (start; if the current CPU is in the cluster mask, acquire the lock if it is enabled, manipulate the object, and release the lock). If the current processor is not specified in the cluster mask, the algorithm can either use an in-kernel messaging scheme or fail (not shown).

Adjustment of the synchronization scheme is based on explicit application feedback. As opposed to applications, the kernel cannot trust any application claims of non-existing parallelism but needs to enforce correctness. On every resource access the kernel explicitly validates correctness. Figure 4.4 depicts the general control flow for adaptive in-kernel synchronization (a corresponding code sketch follows below). Kernel objects are tagged with a cluster mask. If a processor is not specified in the mask, the object cannot be manipulated directly but requires a message to a remote processor. If an application specifies an incorrect cluster mask for a kernel object, the application either pays a performance penalty or the operation fails altogether. In either case the application is unable to corrupt the internal kernel state.

When changing the synchronization scheme (e.g., between lock-based and message-based), the kernel has to guarantee correctness during the transition period. Switching between synchronization primitives takes place in two stages. First, the new synchronization primitive is enabled while the previous one remains active. Second, the previous primitive is deactivated, but only after it is structurally guaranteed that the new scheme is honored by all processors.
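The control flow of Figure 4.4 can be summarized in the following C sketch. cluster_mask_t and mask_covers() are taken from the sketch in Section 4.2.2; the dynamic-lock helpers and deliver_remote_request() are assumed, illustrative names rather than the kernel's actual interface.

/* Dynamic lock and remote-delivery helpers assumed to exist elsewhere. */
typedef struct dynamic_lock dynamic_lock_t;
extern void dynamic_lock_acquire(dynamic_lock_t *l);   /* no-op when disabled */
extern void dynamic_lock_release(dynamic_lock_t *l);

struct kernel_object {
    cluster_mask_t mask;        /* processors allowed to access directly */
    dynamic_lock_t *lock;       /* dynamic lock protecting the object    */
    /* ... object data ... */
};

extern int deliver_remote_request(struct kernel_object *o,
                                  void (*op)(struct kernel_object *));

static int modify_object(struct kernel_object *o, unsigned self_cpu,
                         void (*op)(struct kernel_object *))
{
    if (!mask_covers(o->mask, self_cpu))
        /* Not in the mask: no direct memory access; forward the request
           to a processor inside the mask (or fail, cf. Figure 4.4). */
        return deliver_remote_request(o, op);

    dynamic_lock_acquire(o->lock);  /* skipped internally if disabled */
    op(o);                          /* manipulate the object          */
    dynamic_lock_release(o->lock);
    return 0;
}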

4.3.2 Dynamic Lock Primitive

In order to accomplish dynamic adaptability, I extend the memory-based lock primitives by an enable state. The enable state denotes whether or not the lock primitive is active. In the disabled state, the lock operation (e.g., an atomic operation) is not executed, thus incurring no (or only marginal) overhead.

One problem is that switching between the enabled and disabled states has a transition period, during which processors of a multiprocessor system have an inconsistent view of the enabled-state of the lock primitive. Until the transition period is completed, the kernel must preserve the old semantics of the synchronization primitive. When a lock is to become disabled, the lock primitive must remain active until no processor tries to acquire the lock anymore. Similarly, when a lock is to become enabled for a critical section that was dedicated to a single processor before, no other processor can enter the critical section until the previous owner is aware of the new lock semantics.

For common kernel lock primitives it is impossible to derive whether remote processors are actively trying to acquire the lock. For example, spin locks are implemented via a shared memory location. A processor acquires the lock by atomically replacing the memory content with the value that denotes a locked state. If the previously stored value was locked, the processor retries the operation. It is not possible to directly derive when an update of the lock state will have propagated to all processors. A time-based scheme — disable the lock and wait x cycles — is infeasible, because it is not possible to predict x. Even by adapting the spin loop such that it explicitly checks for a state change, it is not possible to predict memory access latencies in large multiprocessor systems. Furthermore, hardware mechanisms, such as IA-32's system management mode, eliminate any predictability.3

Read-copy update successfully addresses the same problem with its epoch scheme [80]. RCU passes a token between the processors; passing the token ensures that the processor executes a safe code path (within the kernel) that is guaranteed to hold no data references. Obviously, the RCU token thus also guarantees that a processor is not actively trying to acquire a lock. Hence, a full round trip of the token guarantees that all outstanding lock attempts have completed and that all processors are aware of the update of the lock primitive.

RCU delays the destruction of data objects as long as references may still be held. Freed objects are explicitly tracked in a separate list; the list is processed at the expiration of an RCU epoch. Such explicit tracking is infeasible for locks due to the required memory footprint and also because locks are in most cases embedded in objects and those objects may get released. Instead of externally tracking outstanding lock state updates, I integrate the state updates with the lock primitive. This scheme eliminates the need for explicit tracking and avoids the existence problem of the object the lock is embedded in. Furthermore, only those locks that are actively used get modified.

Each dynamic lock carries an RCU epoch counter that denotes when the state switch is safe. The lock code tests whether the lock is in a transition period. If the stored RCU epoch counter denotes that the epoch has expired, the locking code finalizes the state switch. In addition to the enabled and disabled states, the lock has two intermediate states: promoting and demoting (see Figure 4.5).

3System management mode is activated by a non-maskable interrupt. The processor enters a special system mode that is not under control of the operating system. The code is provided by the platform vendor and loaded at system boot time. There are no time bounds whatsoever and the platform vendor only guarantees that the system gets reactivated at some point.

Figure 4.5: State diagram for promotion and demotion of spin locks. An enabled lock is demoted via the demoting state and becomes disabled once the RCU epoch is exceeded; a disabled lock is enabled either directly (on the local processor) or, from a remote processor, via the promoting state once the RCU epoch is exceeded.


Only when the lock is in the disabled state can the locking code be skipped safely. Figure 4.6 depicts the transition from an enabled to a disabled lock state, including the required transition period for the RCU epoch.

When enabling a disabled lock I distinguish two cases. If the lock is disabled because the protected object is only accessible from its home processor, the algorithm must ensure that the processor is not within the critical section. If the lock is promoted on the home processor itself, it is sufficient to change the lock's state from disabled to enabled (see Figure 4.7b). However, when another processor initiates the lock promotion, the lock cannot be taken until the completion of a complete RCU epoch cycle (see Figure 4.7a).

The additional functionality induces a higher overhead on the lock primitive, as the lock has a higher cache footprint to encode the lock state and RCU epoch, and also a higher execution overhead for the additional checks. One can minimize the overhead by carefully locating the state and RCU epoch data and by using an efficient data encoding of the lock state.

Figure 4.6: Delayed lock demotion based on the RCU epoch. After disabling the lock, it has to remain active for one complete RCU epoch to ensure that there are no more outstanding lock attempts by other processors.


Figure 4.7: Lock promotion of a disabled lock from (a) a remote processor and (b) a local processor. The lock is restricted to CPU A. When CPU B enables the lock (a), it cannot be taken until after at least one RCU epoch in A; the lock promotion therefore requires two RCU epochs in B. When the lock is enabled directly on CPU A (b), it can be enabled immediately because only A could potentially hold the lock (which it does not).

The cache footprint for the state is a few bits that are mostly read and only modified when the state of the lock changes. Co-locating those bits with other read-only data that is accessed anyway minimizes the cache footprint and the overhead due to cache-coherency updates. Testing the state can be used to prefetch other data. The RCU epoch counter is only accessed during the state transition periods and is thus in general less performance critical.

The four states can be efficiently encoded in two bits. The encoding should be optimized for the two performance-critical cases, that is, when the lock is in the disabled or enabled state. I specifically chose a zero value for the disabled state, since in all other states the lock operation has to be performed. On many architectures a test for zero is particularly optimized. The following listing shows sample code for the spin-lock primitive extended with the dynamic state check. A value of zero in the lock variable (spinlock) denotes that the lock is free; otherwise the lock is taken. The overhead for a disabled lock using this encoding is a single jump instruction.

    if (state != disabled) {
        while (test&set(spinlock))
            { /* wait */ }
        if (state != enabled && LockEpoch+1 < GlobalEpoch)
        {
            /* perform lock state manipulation */
        }
    }

    /* critical section */

    spinlock = 0;    /* lock release */

The corresponding (hand-optimized) assembler spin-lock code on IA-32 is as follows. (For clarity, the listing only shows the code for the uncontended case and omits the code for the spin loop and promotion/demotion.)

    1  mov   state, %ecx
    2  jecxz critical_section    ; jump if ecx = 0 (lock disabled)
    3  xchg  %ecx, spinlock
    4  test  %ecx, %ecx
    5  jnz   spin_loop           ; lock taken -> spin
    6  test  $2, state
    7  jnz   promote_demote      ; handle RCU epoch
    8  critical_section:
    9      ...
    10 mov   $0, spinlock        ; release lock

The lock code has the same register footprint as a normal spin lock and one more data reference for the lock state. The additionally introduced code on the critical path consists of lines 2, 6, and 7. For normal spin locks, line 1 would not need an explicit memory reference; however, the register has to be preinitialized with a non-zero value. The specific state encoding combines the state test and register initialization in a single instruction. In the disabled state the code requires one taken jump (line 2); in the enabled (and unlocked) state it requires three non-taken jumps. Note that the code assumes a state encoding of disabled=0 and enabled=1 (with 2 and 3 for the two transition states). On architectures with a more relaxed memory consistency model (e.g., PowerPC and Itanium), the unlock operation additionally requires a memory barrier. If the overhead of a test of the lock state is lower than the cost of the memory fence, the unlock operation should be executed conditionally.
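A minimal sketch of the transition handling that the listings above leave abstract ("perform lock state manipulation") is shown below. The state encoding follows the listings; the structure layout, the global epoch variable, and the function name are illustrative assumptions rather than the kernel's actual implementation.

    enum lock_state { DISABLED = 0, ENABLED = 1, PROMOTING = 2, DEMOTING = 3 };

    struct dyn_lock {
        volatile unsigned spinlock;    /* 0 = free, non-zero = taken           */
        volatile unsigned state;       /* enum lock_state                      */
        unsigned          lock_epoch;  /* RCU epoch when the switch started    */
    };

    extern unsigned global_epoch;      /* advanced by the RCU token round-trip */

    /* Called with the spin lock held whenever state is PROMOTING or DEMOTING. */
    static void finalize_transition(struct dyn_lock *l)
    {
        /* Act only after a full RCU epoch has passed since the switch was
           requested; before that, remote processors may still rely on the
           old lock semantics. */
        if (l->lock_epoch + 1 < global_epoch) {
            if (l->state == DEMOTING)
                l->state = DISABLED;   /* no outstanding lock attempts remain */
            else if (l->state == PROMOTING)
                l->state = ENABLED;    /* all processors now take the lock    */
        }
    }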

4.3.3 Lock Granularity

With dynamic locks as the base primitive, I derive a method for dynamically adjusting the lock granularity. As discussed in Section 4.3.1, the optimal lock granularity depends on the specific workload characteristics and the access pattern to an object. Fine-grained locking induces a high overhead in non-contention cases, whereas coarse-grained locking may introduce bottlenecks and limit scalability.

The decision for a coarse-grained vs. a fine-grained locking strategy depends on the specifics of the algorithm, that is, where to lock and what gets protected by locks. A detailed discussion of locking strategies is beyond the scope of this work and thus I only address the general principles. Fine-granular locking requires a divisible data structure where each part can be individually locked without compromising the overall structural integrity.

Figure 4.8: Cascaded spin locks to control locking granularity. Only one of the locks — coarse or fine — is enabled at a time. Reconfiguration at run time addresses contention vs. overhead.

The data structure is divided into multiple parts and each part is protected by an individual lock. The lock granularity depends on the number of independent parts of the overall data structure.

I make the lock granularity adjustable by introducing multiple locks for the same object — one for each locking granularity. In order to lock an object, all locks have to be acquired, while some locks can be disabled. Acquiring all locks ensures that processors synchronize on the same locks, whichever lock may be active at the time. Deadlocks can be avoided by simple lock ordering [106]; for example, coarse-grain locks are always acquired before fine-grain locks. The lock release has to take place in reverse order of the acquisition. Figure 4.8 shows an example of multiple cascaded locks protecting a hash table and individual clusters of the table.

Switching the lock granularity for an object takes place in multiple stages. To preserve correctness, at no point in time should two processors use a different lock granularity (i.e., one uses coarse-grain, the other fine-grain locks). A safe transition between the lock modes therefore requires an intermediate step, where both locks — coarse and fine grain — are active. After that transition period, when all processors are aware of the new lock strategy, the previous lock can be disabled. The completion is tracked with the RCU epoch counter, similar to the dynamic locks.

At first, the cascade may appear overly expensive due to the additional memory footprint for the fine-granular locks. However, efficient encoding and dynamic memory allocation reduce the overhead. Instead of maintaining a lock state for each individual fine-grain lock, it is sufficient to maintain the overall lock granularity for all locks in a single field. The memory overhead for a two-level lock cascade is therefore identical to dynamic locks: two bits for the state and the RCU epoch counter.

Since dynamic locks only access the lock variables when the locks are enabled, the memory required for the lock variable can be dynamically allocated (i.e., when first switching to fine-grain locking). The following listing shows pseudo code for a two-level lock cascade including the code that handles the dynamic adjustment.

    enum State = {Coarse, Fine, ToCoarse, ToFine};

    if (State != Fine) {
        lock (CoarseLock);
        if (State != Coarse AND LockEpoch+1 < GlobalEpoch) {
            if (State == ToCoarse) {
                State = Coarse;
                /* potentially release fine lock memory */
            }
            else
                State = Fine;
        }
    }

    for index = 0 to NumberObjects {
        if (State != Coarse) lock (FineLock[index]);

        /* critical section */

        if (State != Coarse) unlock (FineLock[index]);
    }

    unlock (CoarseLock);

A special case of the scheme occurs when the fine-granular objects themselves are not locked but modified with an atomic operation (e.g., atomic compare-and-swap). In coarse-grain lock mode, all objects are protected by a single lock and the data members can be modified with normal (non-atomic) memory operations. In fine-granular synchronization mode, manipulations are performed with multiprocessor-safe atomic operations. The following pseudo code shows that scenario.

    enum State = {Coarse, Fine, ToCoarse, ToFine};

    if (State != Fine) {
        lock (CoarseLock);
        if (State != Coarse && LockEpoch+1 < GlobalEpoch) {
            if (State == ToCoarse)
                State = Coarse;
            else
                State = Fine;
        }
    }

    if (State == Coarse)
        object = newvalue;
    else
        CompareAndSwap(object, newvalue, oldvalue);

    unlock (CoarseLock);

The overhead for the special handling is minor on most architectures. For example, IA-32's cmpxchg instruction requires an additional instruction prefix to differentiate between the synchronized multiprocessor version and the unsynchronized version of the instruction. A conditional jump over the one-byte instruction prefix, depending on the lock state, induces only marginal overhead and code complexity, while reducing the cost from 136 to 12 cycles (Pentium 4, 2.2 GHz). Itanium's predicates allow for a similarly efficient implementation.

4.4 TLB Coherency

In the previous section I addressed the performance aspects of coarse- vs. fine-granular synchronization in the kernel. TLB coherency updates have similar performance trade-offs that are addressed in this section.

Operating systems store memory address translations and permissions in page tables. The translations are cached in the processors' TLBs. After modifications to page tables, processors may still have old values of the modified page table entries cached in their TLB; page tables and TLBs are thus inconsistent. Consistency is recreated by invalidating the affected TLB entries. All processor architectures offer specific TLB invalidation instructions for that purpose.

When page tables are shared between multiple processors, a page table entry may be cached in the TLBs of multiple processors. On most architectures, the scope of a TLB invalidation operation is limited to the executing processor. The modification of page permissions thus requires a remote TLB invalidation — a TLB shoot-down — that is executed on the remote processor. A shoot-down requires an IPI and interrupts the active operation on the remote processor. Remote TLB shoot-downs have a high overhead similar to message-based synchronization. The changed permissions in the page table become effective only after all remote processors invalidate the respective TLB entries. In one special case — when the permissions to a page get extended — the page fault handler can avoid the necessity of the TLB shoot-down. When the page access raises a fault because of a stale TLB entry, the fault handler validates the page permissions and, if they are correct, only reloads the entry and restarts.

If the completion of a system function depends on the completion of permission updates, then the remote TLB shoot-down latency directly adds to the function's cost and latency. Typical examples for such scenarios are permission updates for


Figure 4.9: Implication of combined TLB shoot-down on latency for individual entries. The first row shows the combined operation for updating four page table entries (A–D) followed by a combined TLB shoot-down. The rows below show the latency for each individual update.

copy-on-write (as used for fork) and page swapping to disk. Both operations depend on the completion of the permission update for their semantic correctness. I define the following two design goals for the kernel's TLB coherency algorithm:

1. The overhead for TLB shoot-downs — and thus the absolute number of shoot-downs — should be minimal. The kernel has to combine as many TLB shoot-downs as possible into a single remote invocation. Shoot-downs should be reduced to only those processors that require an update (i.e., no global TLB shoot-down).

2. The latency of independent memory updates and therefore the latency of TLB shoot-downs should be minimal.

The first design goal is achieved by combining multiple TLB coherency updates into one single shoot-down cycle after all page tables are up-to-date. However, a combined shoot-down violates the second design goal: low latency. When multiple processors concurrently manipulate memory permissions for the same memory object, the operations of those processors become inter-dependent. An update of page permissions is completed after the completion of the TLB coherency cycle (i.e., when all remote TLBs are valid). By combining multiple TLB shoot-downs into one, the duration of the permission update is the length of the overall operation for all updates (see Figure 4.9).

In order to ensure correctness, permission updates have to be synchronized between all processors. When one processor changes the memory permissions


Figure 4.10: Concurrent permission update on two processors. CPU 0 updates the permissions for multiple pages including A. In order to guarantee consistency it locks A and releases the lock after the TLBs are coherent. When CPU 1 tries to manipulate A shortly after the lock acquisition by CPU 0, it has to wait until completion of the operation on CPU 0.

but postpones the shoot-down, a second permission update by another processor must not complete before the first shoot-down is completed. Otherwise, the second processor incorrectly assumes active page permissions (e.g., as required for copy-on-write) that are not yet enforced. The inter-dependency is a performance problem because inexpensive permission updates that only require a simple memory access suddenly become long-running operations that also include the latency for remote TLB shoot-downs. Figure 4.10 depicts the delay for a concurrent operation that depends on the completion of another processor's update. The inter-dependency is also a scalability problem, because operations that are independent and may be processed in parallel are now serialized.

4.4.1 Independence of Coherency Updates

The delayed TLB shoot-down dynamically creates coarse-grain memory objects, although individual memory objects may have fine-granular locks. The problem is the tight coupling of the modification of memory permissions with the TLB shoot-down. The necessary TLB shoot-downs are derived and accumulated when updating the page table. That information is only available to the processor that modifies the page permissions, creating the aforementioned inter-dependency. Remote processors become dependent on the completion of the operation because they are not aware of what TLB shoot-downs are still outstanding.

To counteract the scalability limitations induced by the combined TLB updates, I propose an algorithm that eliminates inter-dependencies between page permission updates and TLB shoot-downs. The previously described TLB consistency scheme is based on a strict causal order; operations are globally ordered and thus safe. I relax the ordering model by separating page table updates from the TLB coherency updates. The algorithm works as follows:

The modification of memory permissions requires a corresponding TLB update. Only after the update completes are the permissions active and globally enforced. When all required TLB shoot-downs have completed, an operation that relies on the page permissions stored in the page table is safe. However, that requirement is independent of a specific processor; the TLB shoot-down operation can be initiated by any processor in the system.

In order to detect still-outstanding TLB updates, I introduce a per-processor TLB update epoch. When a processor updates the memory permissions of a memory object, it tags the object with its processor ID and its current TLB epoch. Furthermore, the processor records the necessary TLB updates, that is, all remote processors and TLB entries that need to be invalidated (details follow in 4.4.2). A new TLB epoch starts after all shoot-downs are completed.

When a second processor updates the memory permissions of the same object, it compares the stored epoch counter with the corresponding processor's current TLB epoch. When they differ, the TLB shoot-down cycle is complete and the page table permissions are authoritative. However, when the stored epoch is identical to the stored processor's current epoch, the TLB update is still outstanding and the TLB shoot-down is initiated by the second processor.

To preserve the ordering of operations, the second processor updates the processor ID and epoch counter stored with the memory object. Furthermore, it includes the recorded TLB entries into its own shoot-down list and updates the list according to the performed updates. Thus, if a third processor updates the permissions of the memory object, it now becomes dependent on the second processor's TLB epoch.

The formal presentation of the algorithm is as follows: Let D be the set of TLB entries t_i that need to be invalidated, with t = (v, p, a), where v denotes the virtual address of the entry that is invalidated, p the processor that may cache the entry, and a the address space identifier on that processor. Let o be the memory object for which the permissions are changed, with o = (D, e, p). o.e denotes the TLB epoch p.e of processor p stored in o, and o.D denotes the set of dirty TLB entries, with o.D = {t}. When changing the memory permissions for object o such that the set D of TLB entries has to be invalidated, then o.D′ = D if o.e ≠ p.e, and o.D′ = D ∪ o.D otherwise.
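The following sketch illustrates the epoch check in code. The data structures and helper names are assumptions chosen to mirror the notation above (o.p, o.e, o.D, p.e); the dirty set is simplified to a processor bitmap.

    #define MAX_CPUS 64

    struct shootdown_set {
        unsigned long cpu_mask;          /* simplified stand-in for o.D        */
    };

    struct cpu_tlb_state {
        unsigned long tlb_epoch;         /* p.e: advances when all shoot-downs
                                            of the previous epoch completed    */
    };

    struct mem_object {
        unsigned              owner;     /* o.p */
        unsigned long         epoch;     /* o.e */
        struct shootdown_set  dirty;     /* o.D */
    };

    extern struct cpu_tlb_state cpu[MAX_CPUS];

    /* Called by processor 'self' when it changes permissions of 'o' and
       has computed its own set 'd' of dirty TLB entries. */
    static void record_update(struct mem_object *o, unsigned self,
                              struct shootdown_set d)
    {
        if (o->epoch == cpu[o->owner].tlb_epoch)
            /* Previous shoot-downs are still outstanding: take them over so
               completion no longer depends on the previous owner. */
            d.cpu_mask |= o->dirty.cpu_mask;
        /* else: the stored epoch expired; the page table is authoritative. */

        o->dirty = d;
        o->owner = self;
        o->epoch = cpu[self].tlb_epoch;
    }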

4.4.2 Version Vector

The described algorithm stores the TLB entries that need to be invalidated with the object. The required storage memory depends on the number of affected TLB entries, the number of processors that concurrently manipulate the object, and the frequency of updates (i.e., whether updates complete or many TLB shoot-downs are outstanding). A naïve tracking algorithm that only records the dirty TLB entries has an unbounded memory consumption and is therefore unfeasible. I reduce the memory requirements by incorporating architectural knowledge about the MMU.

While the TLB epoch tracking is a general mechanism applicable to all hardware architectures, the tracking of outstanding TLB shoot-downs depends on very specific details of the MMU, cache, and TLB. The influential properties include whether or not the TLB is tagged, the TLB shoot-down primitives and their cost, the overhead for TLB refills, and also the cache line transfer costs and the latency of an IPI. A detailed analysis for a wide variety of MMUs is beyond the scope of this work, and thus I only provide an exemplary validation and discussion for my reference architecture: Intel's IA-32.

IA-32's MMU has dedicated data and instruction TLBs with a hardware-walked two-level page table.4 An address-space switch is realized by loading a different page-table base pointer into the processor's CR3 control register. The architecture does not define address-space identifiers and all (non-global) TLB entries are automatically flushed on every CR3 reload. An update of CR3 therefore serves two purposes: (i) to switch between virtual address spaces and (ii) to invalidate the TLB. The architecture additionally provides the invlpg instruction for selective invalidation of individual TLB entries.

A TLB shoot-down can either be performed by multiple invocations of invlpg or via a complete invalidation. A complete invalidation has the disadvantage that it also invalidates entries that are still in use and thus need to be reloaded. On the other hand, individual invalidations have a significant overhead (apparently, invlpg is not optimized). In Figure 4.11 I compare the overhead of individual shoot-downs against a complete invalidation plus the cost for the re-population of the TLB. The break-even point for the used test environment (a 2.2 GHz Pentium 4 Xeon) is at eight TLB entries. Starting with nine entries it is more efficient to invalidate the complete TLB rather than invalidating individual entries. Obviously, the specific break-even point depends on the architecture's TLB size and overhead for shoot-downs.

The lack of address-space identifiers reduces the amount of data that needs to be tracked. The dirty TLB entries are represented by t = (v, p). On modification of the permissions of a page v in a page table that is shared across a set of processors p_0, ..., p_{j−1}, the set of dirty TLB entries is D = {(v, p_0), (v, p_1), ..., (v, p_{j−1})}. The memory footprint of a simple encoding of the set is |D| · sizeof(t), which does not scale for many processors. Thus, the algorithm requires a more compact representation.

The TLB shoot-downs ensure that no stale entries are cached in remote TLBs. Most TLB replacement algorithms are nondeterministic, and a TLB shoot-down may target a TLB entry that is no longer in the TLB or never was. A shoot-down strategy still preserves functional correctness if it targets more processors: all stale TLB entries will definitely be evicted; however, the additional (and unnecessary)

4At the time of writing, IA-32 defines four different page table formats. For this work I only consider the standard 32-bit virtual and physical addressing mode with support for 4MB super pages and the global tag [29].


Figure 4.11: Comparison of TLB shoot-down costs with complete re-population against an entry-by-entry shoot-down for a Xeon Pentium 4 2.2 GHz with 128 TLB entries. The costs include the overhead for the invalidation instruction (521 cycles) and the required cache line transfers (about 80 cycles). The break-even point is at eight entries. The negative slope of the repopulation cost curve reflects the decreasing number of repopulated entries for an increasing number of shoot-down values. Note that the overhead for the required IPI is not shown.

shoot-downs may incur a higher cost. Based on that observation, the set D = {(v, p_0), (v, p_1), ..., (v, p_{j−1})} can also be encoded as the tuple (v, m), whereby m denotes the processor cluster mask (as defined in Section 4.2.2). The shoot-down is correct if the following condition holds: ∀i: (∃t ∈ D: t = (v, p_i)) ⇒ f(m, i) > 0. The cluster mask's fixed size reduces the memory footprint for tracking an updated page table entry from |D| · sizeof(t) to sizeof((m, v)), where |D| < n. Note that the limitations of the memory bus (such as NUMA) make highly shared memory resources unlikely, and therefore the encoding via the cluster mask is accurate for the vast majority of cases (see Section 4.2).5

The manipulation of the memory permissions of an object may require a TLB shoot-down of n entries D = {(v_0, m_0), (v_1, m_1), ..., (v_{n−1}, m_{n−1})}. The memory footprint depends on the maximum size of the object (and thus the maximum number of potential dirty TLB entries). The overhead of the processor architecture's by-entry shoot-down operation limits the number of entries that need to be recorded. As shown in Figure 4.11, the cost for an entry-by-entry TLB invalidation exceeds the cost for a complete invalidation for more than eight entries.
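The measured break-even point suggests a simple strategy for the local part of a shoot-down, sketched below. The helper functions and the threshold constant are assumptions; the value of eight is specific to the measured Pentium 4 Xeon and would differ on other processors.

    #define SHOOTDOWN_BREAK_EVEN 8

    extern void flush_entire_tlb(void);                 /* e.g., CR3 reload */
    extern void invalidate_tlb_entry(unsigned long va); /* e.g., invlpg     */

    static void local_shootdown(const unsigned long *va, unsigned n)
    {
        if (n > SHOOTDOWN_BREAK_EVEN) {
            flush_entire_tlb();
        } else {
            for (unsigned i = 0; i < n; i++)
                invalidate_tlb_entry(va[i]);
        }
    }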

5Note that accuracy addresses the encoding of processors, not the accuracy of the algorithm itself.

Hence, the hardware limits the memory footprint to max(|D|) = 8 entries; beyond that, a complete TLB invalidation becomes more efficient.

The number of remote shoot-downs can be further reduced by testing whether or not the currently active remote page table (i.e., the current CR3 value) corresponds to the address space of the tracked TLB entry. If the processor loads a new value into CR3 it implicitly invalidates all TLB entries and the explicit shoot-down can be omitted. However, the test misses the scenario where CR3 is reloaded multiple times and happens to match the recorded value. The simple equality test then delivers a false positive and indicates a required TLB shoot-down although the TLB is already up-to-date.

Instead, I introduce a TLB version that increases on every CR3 reload (including context switches and TLB invalidations). The version significantly decreases the probability of a false positive. After modification of memory permissions in page tables, the algorithm stores the TLB versions for all affected processors in a vector w = (w_0, w_1, ..., w_{n−1}). If the remote TLB version has already increased at shoot-down time, the shoot-down is superfluous and can be omitted. The maximum size of the vector depends on the number of processors, and one vector is required per memory object.

The last optimization is to reduce the number of vectors to one per processor at the cost of some additional TLB shoot-downs. At system startup time the kernel preallocates a version vector per processor. The version vector also contains the storage space for the individual entry-by-entry shoot-downs. Memory objects contain (1) a reference to the version vector of processor p, (2) a processor cluster mask o.m, and (3) the TLB epoch o.e, for an overall memory footprint of three processor words per memory object.

After modification of page tables, the version vector needs to reflect the required TLB shoot-downs (including those still outstanding from other processors). If the TLB epoch that is stored with the object is still active (o.e = p.e), the shoot-downs of the previous processor did not yet complete. In that case the algorithm updates the current processor's version vector by incorporating the still outstanding shoot-downs: w′ = (w′_0, w′_1, ..., w′_{n−1}) with w′_i = max(w^1_i, w^2_i) for all i ∈ {0, 1, ..., n−1} with f(o.m, i) > 0, where w^1 denotes the current processor's version vector and w^2 the vector referenced by the object. Additionally, the algorithm updates o.e with the current processor's TLB epoch p.e, the reference to the processor o.p = p, and the cluster mask o.m.

The described optimizations drastically reduce the memory footprint, making TLB tracking feasible. The reduced memory footprint comes at the cost of accuracy and potentially results in a higher overhead compared to the optimal case. However, the TLB version vector can eliminate many unnecessary remote notifications via IPIs if the remote processor has already performed a CR3 reload, either due to a concurrent TLB shoot-down from another processor or due to normal context switch activities. As initially emphasized, the solution is specifically optimized for IA-32's MMU; other architectures may have different cost-performance trade-offs.
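A sketch of the version vector merge described above is given below; w^1 corresponds to the current processor's vector and w^2 to the vector referenced by the object. Types, sizes, and helper names are illustrative assumptions, and the plain bitmap test stands in for the cluster mask function f(m, i).

    #define MAX_CPUS 64

    struct version_vector {
        unsigned long version[MAX_CPUS];   /* remote TLB versions to wait for */
    };

    struct mem_object {
        struct version_vector *vec;        /* vector of the last updating CPU */
        unsigned long cluster_mask;        /* o.m */
        unsigned long epoch;               /* o.e */
        unsigned      owner;               /* o.p */
    };

    /* Stand-in for f(m, i) > 0; the real kernel uses the clustered
       encoding of Section 4.2.2 rather than a plain bitmap. */
    static int covers(unsigned long mask, unsigned i)
    {
        return (mask >> i) & 1;
    }

    /* Incorporate the still-outstanding shoot-downs of 'o' into the
       current processor's vector: w'_i = max(w1_i, w2_i). */
    static void merge_outstanding(struct version_vector *mine,
                                  const struct mem_object *o)
    {
        for (unsigned i = 0; i < MAX_CPUS; i++)
            if (covers(o->cluster_mask, i) &&
                o->vec->version[i] > mine->version[i])
                mine->version[i] = o->vec->version[i];
    }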

4.5 Event Logging for Application Policies

In uniprocessor systems, scheduling is the activity of deciding which thread of control gets to run on the CPU. In multiprocessor systems, scheduling has another dimension: not only deciding when a thread will run but also where it will run (i.e., on which processor). In order to make informed resource scheduling decisions (not limited to thread allocation), a dynamic scheduler needs to combine information from a variety of sources. The vast majority of scheduling algorithms (naturally) assume the data availability of a monolithic system structure; a single entity maintains all system state and provides full accessibility and visibility.

The common definition of scheduling includes two dimensions: time and locality. Although time and locality are inter-dependent, they can be handled autonomously. In the following, I will primarily focus on the aspect that is specific to multiprocessor systems: the allocation of threads to processors.

Fundamental to microkernel-based systems is a divided system structure with highly distributed information that is required for resource scheduling. A scheduler needs to evaluate runtime information and therefore needs access to runtime data in the first place. Strict isolation and independence creates a boundary that obstructs the free information flow available in monolithic systems.

The major challenge is that resource scheduling takes place sporadically and infrequently, yet the scheduling decision requires up-to-date information. Efficient scheduling requires sufficiently accurate data that, in most cases, is never evaluated by the scheduler. Direct access to system-internal data structures minimizes the overhead in monolithic systems, whereas a microkernel system requires additional action to extract, accumulate, and provide that data. Hence, isolation increases the constant runtime overhead for system components as well as the kernel and therefore decreases the overall system performance.

To address this problem, I developed an efficient mechanism for runtime data aggregation with a high-bandwidth interface to a resource scheduler: shared memory. A primary design goal is to have a single mechanism that is applicable to the microkernel as well as to system components.

The section is structured as follows: First, I discuss the properties of data that are the basis for scheduling decisions. Afterward, I describe mechanisms to collect and deliver scheduling-relevant data to a scheduling component. While system-level components can be freely adapted (and potentially replaced at runtime), the microkernel is static. The wide variety of existing resource scheduling algorithms requires a flexible mechanism that covers as many usage scenarios as possible. While very detailed event logging can most likely fulfill that requirement, it incurs a prohibitively high runtime overhead on performance-critical code paths. I therefore develop an adaptable event logging scheme that can be dynamically enabled and adjusted, minimizing the overall performance impact on the system.

4.5.1 Overview

Resource scheduling policies base their decisions on system runtime information, such as the current load situation [37] derived from the run-queue length [14, 96], and the communication pattern between tasks [36, 88, 99]. An important performance aspect for scheduling is the resource affinity of processes, such as active cache working sets or NUMA memory [16, 17, 107].

Efficient resource allocation and scheduling in multiprocessor systems is a widely researched area. A detailed discussion or comparison of individual resource scheduling policies is beyond the scope of this work. For a detailed discussion I refer to an excellent survey of parallel job scheduling strategies for multiprocessor systems by Feitelson [35]. In this work, I solely focus on the mechanisms required for information gathering in order to realize scheduling algorithms.

Dynamic schedulers try to extrapolate future system behavior from past behavior. Hence, they require statistical data on system events in relation to the schedulable principals and system resources. A naïve call-back on event occurrence can provide very detailed and timely information to the scheduler. In a monolithic system, a callback is a simple function call, whereas in a microkernel-based system a callback may require multiple privilege level changes and address space switches. Thus, callbacks induce an unacceptable runtime overhead for frequent events.

The acceptable overhead for runtime data collection has to be considered for two scenarios. First, the overhead of a scheduling algorithm is obviously bounded by the maximum achievable benefit. If the potential performance benefit is less than the overall cost of data aggregation for deriving a scheduling decision, a random allocation policy is preferable. The second relevant performance metric is the cost in steady state, after an optimal resource allocation is reached. Here, the algorithm should only incur a minor runtime overhead.

The upper bound on the frequency of scheduling decisions is given by the overhead induced by the resource reallocation. The costs fall into two categories: the cost of the operation itself and the follow-on costs. Applications benefit from a migration only after the overhead of both categories is amortized. In today's multiprocessor architectures, the follow-on costs induced by cache working set migrations are the dominating performance factor. The time spans between re-balancing and migration decisions are therefore often in the order of multiple milliseconds.6

4.5.2 Data Accumulation

Event recording is an effective method for offline or postponed system analysis. On event occurrence, the system writes a log entry that sufficiently identifies the event source, the event reason, the relevant system state, and the point in time. Later, the system behavior and causal dependencies can be reconstructed from the logged information.

6The load balancer of Linux 2.6.10, for example, makes re-balancing dependent on the cache migration overhead. The minimal re-balancing period for SMP processors is 2.5ms whereas it is 10ms for NUMA nodes.

Event logging is primarily used for debugging purposes and bottleneck elimination [121]. I apply the fundamental idea of event recording, but as a method for online data collection for user-level schedulers. The microkernel and system components record system events as log entries into a memory-backed log space. Shared memory provides a low-overhead access method to system events for the scheduler while preserving strict separation between the individual components (i.e., the producer and consumer) [42].

The sheer amount of events and event sources renders a simplistic per-event logging scheme unfeasible. However, with increasing frequency of event occurrence, the information carried per event decreases. Scheduling decisions are in many cases based on a quantitative analysis (such as the number of events per time interval [35]). In those cases the cost can be reduced via data aggregation, for example by only counting the number of events instead of logging individual event instances.

The overhead for logging falls into three primary categories: (i) overhead for the event recording code, (ii) overhead for data analysis by the scheduler, and (iii) overhead induced by the higher cache footprint for the logged information. The overhead of event recording can be reduced by the following three aggregation methods:

Less frequent logging of the same event. Combining multiple event occurrences in a single log entry reduces the frequency of logging and thus the runtime overhead. The minimal execution overhead induced by event logging is an arithmetic add (for event counting) followed by a conditional branch into the event logging code (see the sketch after this list). The more events are combined in a single log entry, the lower the logging overhead becomes. The overhead for event counting itself is minimal (the cache footprint of the counter and two instructions).

Logging of fewer event sources. The second method is to reduce the number of event sources, which can be achieved by conditional logging and filtering. Depending on the scheduling policy, not all system events are relevant and get evaluated. Conditional logging of only the relevant events reduces the static runtime overhead. Filtering allows more flexibility for event selection than a simple on/off scheme; however, a complex filtering scheme may induce a higher runtime cost.

Logging of less information per event. Finally, the overhead can be reduced by limiting the amount of log data and thus the overhead of generating the log entry itself.
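The following sketch shows the first aggregation method, counting with a configurable threshold: the fast path is a single arithmetic operation and a conditional branch, and a log entry is written only when the threshold is reached. The structure and the logging hook are illustrative assumptions.

    struct event_counter {
        unsigned threshold;   /* events per log entry               */
        unsigned remaining;   /* countdown until the next log entry */
    };

    extern void write_log_entry(unsigned events);   /* assumed logging hook */

    static inline void count_event(struct event_counter *c)
    {
        if (--c->remaining == 0) {          /* infrequent slow path */
            write_log_entry(c->threshold);
            c->remaining = c->threshold;
        }
    }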

Event aggregation automatically reduces the overhead for data analysis in the scheduler. The scheduler handles already compacted and preprocessed log entries. Similarly, filtering eliminates irrelevant data from the log file and thus reduces the runtime overhead for data analysis. Using separate log buffers per event source

reduces the event parsing overhead and also the memory and cache footprint. Here, the scheduler can implicitly derive data from knowledge about the log buffer. The memory and cache footprint for event logging depends on the frequency of events, frequency of log entries, and the log entry size. In many cases, statistical analysis considers only a single or very few entries that happened last. A circular log buffer, which is exactly sized for its required back log, can minimize the overall cache footprint.

4.5.3 Event Sources and Groups

Event sources differ in their expressiveness and potentially require additional context information to become useful for a scheduler. I identified the following three primary event types:

Access, modification, or occurrence. This event type denotes the occurrence of one specific event, such as the access to a resource or the execution of a specific function. It is unrelated to other events in the system. Event logging may either log the particular occurrence or simply increase an event counter. The event has four parameters: (i) a resource identifier, (ii) an identifier of the resource principal, (iii) the number of accesses, and (iv) a time stamp. The resource identifier uniquely identifies the accessed resource and the resource principal identifies the subject that performed the access. Based on the logged resource principal, a scheduler can account resource usage to a system's accounted subjects, such as threads or tasks.

Entry and exit. In time-sharing systems, resource utilization is accounted to resource principals during their time of activity. The relevant information for counted events, such as resource usage, time, and hardware performance counters, is the number of events from activation until preemption. It can be derived by recording the absolute value of an event counter at the point of activation and at the point of preemption. The difference of both values provides the number of events during the time of activity (a sketch follows this list). Compared to on-access logging, entry–exit logs have a significantly lower log footprint and additionally cover asynchronous event counters, such as hardware-provided performance counters. Entry–exit logging combines the preemption event of a principal with the system information (i.e., counters and time). An entry–exit event has four parameters: (i) a resource principal, (ii) an event counter, (iii) the new value, and (iv) a time stamp.

Asynchronous events. Asynchronous events are independent of a resource principal and execution. On event occurrence, the system logs the system state and the point in time. A typical example for this event class is a periodic

interval timer or a hardware event counter. Asynchronous events have the same parameters as on-access events.
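The following sketch illustrates entry–exit logging as described above: on every activation and preemption the kernel records the absolute counter value, and the scheduler later attributes the difference between consecutive records to the principal that was active in between. The entry layout and helper functions are illustrative assumptions.

    struct entry_exit_entry {
        unsigned           principal;   /* resource principal (thread/task) */
        unsigned           counter_id;  /* which counter is sampled         */
        unsigned long long value;       /* absolute counter value           */
        unsigned long long timestamp;   /* point in time                    */
    };

    extern struct entry_exit_entry *next_log_entry(void);          /* assumed */
    extern unsigned long long read_counter(unsigned counter_id);   /* assumed */
    extern unsigned long long read_timestamp(void);                /* assumed */

    /* Logged on every activation and preemption of a principal. */
    static void log_principal_switch(unsigned principal, unsigned counter_id)
    {
        struct entry_exit_entry *e = next_log_entry();
        e->principal  = principal;
        e->counter_id = counter_id;
        e->value      = read_counter(counter_id);
        e->timestamp  = read_timestamp();
    }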

Events can be grouped together if at least one event parameter is identical. In that case the specific parameter can be omitted in the log. The specific event grouping depends on the event type and on which values are useful for a particular scheduling policy. For example, time stamps may not be required for a scheduler that performs periodic sampling.

Entry–exit events are bound to resource principals. The primary resource principals in systems (e.g., threads or tasks) may not sufficiently represent inter-dependencies, such as client–server relations or act-on-behalf. While fine-granular resource principals allow for a detailed analysis, they also increase the number of entry–exit events and thus log entries.

Banga et al. [13] propose resource containers for separating traditional first-class resource principals from resource accounting. Instead, resource usage is accounted to a shared container with multiple associated threads or tasks. The concept of grouping resource principals in orthogonal resource accounting domains is an efficient method for event filtering as well.

4.5.4 Log Entries and Log Configuration

Event logging is on the critical code path and low overhead is therefore of paramount importance. While at application level the application designer has the freedom to tune the log structure and log values toward one or a few specific scheduling policies, the microkernel lacks such freedom. The kernel's static code base requires a runtime configuration of the log facility that is flexible but still incurs only a marginal overhead.

Runtime configuration for performance and event tracing is well known for microprocessors. Most modern microprocessors support hardware performance monitors (HPM) that provide statistical information on hardware events [29, 30]. HPMs are used primarily for performance and bottleneck analysis [22, 103], but have also been proposed as a data source for scheduling decisions [120]. HPMs are configured via control registers and count hardware event occurrences. More sophisticated performance monitoring facilities, such as Intel's Pentium 4, even provide event logs.

Similar to HPMs, I introduce kernel event-log control fields. Kernel events are associated with a control field that allows for precise control of the generated event log entries, including deactivation of the log facility altogether. The control field is constructed for maximum flexibility of statistical analysis with low overhead. Each control field consists of the following set of parameters:

Log pointer and log size. The log is a circular ring buffer denoted by a log pointer and a fixed size. The size is encoded as a bit mask that is logically AND-ed with the log pointer. By using a mask it is possible to arithmetically derive the start and the size of the log buffer.

This way, buffer start, buffer size, and the current log index are encoded in only two fields. The encoding restricts the possible log buffer alignment but simplifies overflow checks (see the sketch following this list). Each entry has a fixed size that depends on the logged values per event. A simple offset calculation delivers the next entry. A log buffer with a single entry serves as an event counter.

Overwrite or count. The scheduler can specify whether an event occurrence should overwrite the log event counter or add the counter to the previous value in the log. This selector enables event tracing (like the time of the last event) or event counting.

Time stamp and time source. A system may provide multiple time sources, and the cost for recording a time stamp may differ. While the processor cycle counter provides a highly accurate time source, the overhead is significant on some architectures.7 Alternatively, the system may use a less accurate interval timer, or a simple counter if the time stamp is solely required to derive causal dependencies. For many events the time of the event is irrelevant (or known) and logging the time stamp can be avoided altogether.

Threshold. For very frequent events a per-event log is too expensive. Therefore, each control field additionally contains an event counter that is incremented per event. A counter overflow leads to recording of the event and a reload of the counter with the programmed threshold value.
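A sketch of the pointer-and-mask encoding and the threshold counter of a log control field follows. The field names are assumptions; the essential point is that a power-of-two, size-aligned buffer lets a single mask both wrap the index and implicitly encode buffer start and size, and that a mask of zero degenerates into a single-entry event counter.

    struct log_ctrl {
        unsigned long current;    /* address of the current log entry    */
        unsigned long mask;       /* buffer size - 1 (power-of-two size) */
        unsigned      threshold;  /* events per log entry                */
        unsigned      remaining;  /* countdown until the next entry      */
    };

    /* Advance to the next entry of 'entry_size' bytes, wrapping within
       the buffer.  With mask == 0 the pointer never moves, so the single
       entry acts as an event counter. */
    static unsigned long next_entry(struct log_ctrl *c, unsigned entry_size)
    {
        unsigned long start  = c->current & ~c->mask;          /* buffer start */
        unsigned long offset = (c->current + entry_size) & c->mask;
        c->current = start | offset;
        return c->current;
    }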

For higher flexibility, event sources and thresholds are decoupled from the log buffer. Multiple event sources can be freely associated either with a single event-log control field or with individual logs. Per-processor control fields preserve independence between processors. In order to scale, there is no synchronization between processors and the log buffers are located in processor-local memory. Figure 4.12 depicts two alternative log configurations, one with a combined log and a second configuration using the event log as a counter.

4.5.5 Log Analysis

Having a detailed log of resource information, schedulers further need to evaluate the logged data. The structure with a shared memory log makes the logging facility a classical producer–consumer problem, with the logger being the producer and the scheduler being the consumer (as shown in Figure 4.13). When a scheduler processes a log file, new entries may be written concurrently, potentially leading to the scheduler evaluating inaccurate data. I discuss possible counter-measures against the evaluation of inaccurate data, with a focus on minimal runtime overhead for both producer and consumer.

7The Intel Pentium 4 architecture has an overhead of 125 cycles for a time stamp counter read.


Figure 4.12: Event log control. Events refer to a log control structure that refers to a log buffer, log entry count, and the logged values. Multiple events can be logged into the same log buffer (a). A single-entry log provides a simple counter or most recent event value (b).


Figure 4.13: Log analysis via shared memory. The microkernel and a user-level server log events into memory that is shared with the scheduler. The scheduler processes the logged entries and derives its scheduling logic.

The runtime overhead of atomic operations renders explicit synchronization via locks unfeasible. One can differentiate between two event types: (i) infrequently occurring events and (ii) frequently occurring events. For infrequent events, the probability of concurrent logging and analysis is low. Either a repeated analysis run or a copy-out strategy into a separate buffer eliminates the race condition.

Frequently occurring events are the basis for statistical analysis, and individual event occurrences are of less importance. Missing one of the frequent events therefore has insignificant implications on the overall scheduling policy. In order to avoid explicit locking, the analysis algorithm repeats the log analysis run in case the log file changed during the run. However, repetitive analysis is an unbounded operation and may lead to a livelock where the producer constantly updates the log and the consumer never completes its operation. The livelock can be avoided

by careful system construction. I derived the following log analysis strategy for frequent events. The logger first writes the log values and afterward updates the index pointer into the log file. The analysis code ignores the current entry because updates may still be in-flight and the data may be invalid. At the end of the analysis the code validates whether the index into the log file remained constant and whether a wrap-around took place. The following listing shows sample code for the analysis algorithm; the loop bound LogSize and the index arithmetic are illustrative.

    int OldIndex, OldTimeStamp;
    do {
        OldIndex = LogCtrl.Index;
        OldTimeStamp = Log[OldIndex].TimeStamp;

        for (int Index = 1; Index < LogSize; Index++)
        {
            /* evaluate Log[(OldIndex + Index) % LogSize]; the current
               entry (Index = 0) is skipped as it may still be in-flight */
        }

        /* retry if the index advanced or the start entry was overwritten
           (wrap-around) during the analysis run */
    } while (LogCtrl.Index != OldIndex ||
             Log[OldIndex].TimeStamp != OldTimeStamp);

The livelock occurs in case the runtime of the analysis is longer than the time difference between two log entries. This is critical when events occur and get logged at a high rate. Since event logging is triggered by code execution, it either requires preemption of the analysis code on the same processor or parallel execution of analysis and log code on different processors. In order to avoid cache migration overhead, processor-local log analysis is preferable over remote analysis.

Due to the inherent performance degradation of long-running analyses (as argued in Section 4.5.1), log analysis that requires a full time slice is impractical. In the unlikely case of a preemption during the analysis run which additionally creates new log entries, a simple retry is sufficient and the next run will most likely complete. Additionally, preemption precautions that inform the kernel of a critical section (e.g., as provided by L4, K42, and Raven) can further reduce the likelihood of an undesired preemption.

This scheme is sufficient as long as events are analyzed on the same processor on which the events occur and are logged. When using a scheduler that accesses the logs of a remote processor, the likelihood of an analysis–log conflict increases. For logs with only very few entries, or when the event frequency is sufficiently low, a simple retry method is sufficient. However, when the time interval between two events on the remote processor is shorter than the runtime of the log analysis, the algorithm needs to take further precautions. One possibility is to perform a partial analysis and to combine the different parts at the end. Alternatively, event logging can be explicitly deactivated until the analysis completes. This approach avoids corruption of log data at the cost of accuracy. A more detailed discussion of the design alternatives and trade-offs for efficient log analysis is given in [100].

4.6 Summary

In this chapter I described four mechanisms for adjusting kernel synchronization primitives in order to reflect the degree of parallelism of resource sharing. The approach is founded on an efficient tracking scheme for parallelism (Section 4.2), the processor cluster mask, which allows every object to be tagged with a space-efficient processor bitmap. The cluster mask uses an encoding scheme that trades accuracy for space with an increasing number of encoded processors, similar to floating point encodings.

I presented dynamic locks, which can be enabled and disabled at runtime in a safe manner (Section 4.3). Dynamic locks overcome the overhead of a single synchronization strategy (lock vs. message based) in the kernel. Using the information on parallelism encoded in the cluster mask, the lock primitives can be dynamically adjusted.

Dynamic locks further allow for safe and dynamic adjustment of the locking granularity (Section 4.3.3). In the case of no concurrency, coarse-grain locking yields better performance and reduces the cache footprint for the lock variable. In the case of high concurrency, fine-granular locking reduces lock contention, increases parallelism, and thus yields better performance. The best strategy depends on the specific workload, and a static common-case selection can result in suboptimal performance. By cascading multiple dynamic locks, the synchronization strategy, coarse-grain or fine-grain locking, can be dynamically chosen and adapted at runtime.

In Section 4.4 I presented a TLB coherency scheme that enables a kernel to batch expensive TLB shoot-downs for multiple memory permission updates. Instead of updating TLBs on a per-entry basis, multiple updates are combined. While this scheme reduces the remote-processor signaling overhead, it also introduces inter-dependencies between independent and parallel operations in the memory subsystem. I developed a TLB epoch mechanism that allows processors to safely bypass outstanding TLB updates by detecting non-completed shoot-downs and initiating the shoot-down themselves.

Finally, in Section 4.5 I presented an event-logging mechanism for system components including the microkernel itself. In contrast to monolithic systems, where resource usage information is directly available to the OS, scheduling-relevant information is distributed and isolated between the system components. Event logging via shared memory provides a high-bandwidth and low-overhead channel between resource managers and the scheduler, thereby efficiently overcoming the additional isolation restrictions.

In the next chapter I describe the application of those mechanisms to a specific microkernel.

Chapter 5

Application to the L4 Microkernel

In this chapter I describe the application of multiprocessor performance adaptation to L4Ka::Pistachio, an L4 microkernel. L4Ka::Pistachio is the basis for a variety of research projects, including the development of multi-server systems [42] and para-virtualization efforts [69]. I and colleagues [111] have shown that L4Ka::Pistachio is able to efficiently run multiple instances of a multiprocessor operating system on a single multiprocessor system.

L4 is an ongoing research effort and therefore a moving target. In this thesis, I refer to the general concepts of the latest L4 specification, Version X.2 Rev. 6 [49], which I co-authored. Furthermore, I extended the specification to address scalability aspects. L4Ka::Pistachio is a group effort by a number of people; however, I implemented a significant part of the main kernel and maintain the IA-32 specific part of the kernel.

In this chapter, I concentrate on the aspects of L4Ka::Pistachio that relate to the main subject of this thesis: the scalability of microkernel-based systems. I have described other aspects, including the reasoning for individual design decisions, in a technical report that accompanies this thesis [109].

The chapter is organized as follows. In Section 5.1, I give a general overview of the core L4 concepts that are required in later sections. In Section 5.2, I define my requirements and general goals for a scalable kernel. This is followed by three sections that describe the specific application of the ideas of performance adaptation to L4Ka::Pistachio. In Section 5.3, I detail the construction of the inter-process communication primitive. Section 5.4 describes the application of dynamic lock granularity (developed in Section 4.3.3) and TLB coherency tracking (Section 4.4) to L4's memory management mechanisms. In Section 5.5, I describe the mechanisms for efficient resource tracking that provide the basis for user-level resource managers.

5.1 Overview of L4

L4 provides two abstractions in addition to bare hardware: threads and address spaces.

Threads are the abstraction for an activity; processor time is multiplexed between threads. A thread is represented by processor state (register context), a unique global identifier, and an association to an address space. Each thread belongs to exactly one address space at any time. In L4, threads are also the fundamental abstraction for controlled transfer of execution between different protection domains (i.e., address spaces).

Address spaces provide the abstraction for protection and isolation; resource permissions are bound to address spaces. Address spaces are passive objects that are manipulated by threads. Address spaces have no explicit names, but are named via a thread associated with the particular space. Permissions in L4 are bound to address spaces; all threads within an address space have the same rights and can freely manipulate each other. This model has implications and significant limitations on multiprocessors, which are discussed later in more detail. In L4, the address-space concept is used for different resources, including memory and I/O ports.

Furthermore, L4 features two mechanisms for permission control: IPC and resource mapping.

IPC is the mechanism for controlled transfer of data, resource permissions, and control of execution between exactly two threads. If the threads reside in different address spaces, IPC implicitly crosses protection domain boundaries and transfers data between them. IPC is a synchronous message transfer and requires mutual agreement between both communication parties in the form of a rendezvous. During the rendezvous, the message content is transferred from the sender to the receiver (a sketch of a simple IPC call follows this list).

Resource mapping is the mechanism for durable delegation of resource permissions. Resource permissions are granted via the map operation, which is part of IPC and thus also requires mutual agreement. Map transfers resource permissions from the sender's address space to the receiver's address space. The permissions to the resource are either identical to the sender's permissions or more restrictive. The map operation can be applied recursively. Permission revocation is performed via the unmap primitive and is involuntary for the receiver of a memory mapping. Unmap is a recursive operation which revokes all dependent mappings as well. A more detailed description follows in Section 5.4.
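As an illustration of the rendezvous-style IPC described above, the following sketch performs a simple call to a server using the convenience interface of L4Ka::Pistachio's user-level library. The header names and function signatures are recalled from the X.2 API and should be treated as assumptions; the server thread ID and the message layout are hypothetical.

    #include <l4/types.h>
    #include <l4/ipc.h>
    #include <l4/message.h>

    /* Send two words to 'server' and return the first word of its reply. */
    L4_Word_t rpc_add(L4_ThreadId_t server, L4_Word_t a, L4_Word_t b)
    {
        L4_Msg_t msg;
        L4_MsgClear(&msg);               /* empty message, label 0         */
        L4_MsgAppendWord(&msg, a);       /* untyped words of the request   */
        L4_MsgAppendWord(&msg, b);
        L4_MsgLoad(&msg);                /* copy into the message registers */

        L4_MsgTag_t tag = L4_Call(server);   /* rendezvous: send and wait  */
        if (L4_IpcFailed(tag))
            return 0;                    /* error details are in the TCRs  */

        L4_Word_t result;
        L4_StoreMR(1, &result);          /* first untyped word of the reply */
        return result;
    }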

Permissions are associated with an address space; any thread of an address space can manipulate (and potentially revoke) resource permissions.

L4 features an in-kernel priority-based round-robin scheduler that allocates time to individual threads. When the time slice of a thread expires, the kernel preempts the thread and chooses the next runnable thread within the same priority level. When a priority level becomes empty, the kernel considers the next lower priority level. If no more runnable threads remain in the system, the kernel switches the processor into a low-power state.

5.2 Requirements

In this section I describe the design goals (and non-goals) for a multiprocessor version of the L4 microkernel. The description has a specific focus on the multiprocessor aspects. Afterward, I describe the extensions and modifications I make to the original uniprocessor kernel design. The intricate interdependencies between nearly all kernel subsystems require a good understanding and a detailed discussion of many individual aspects of kernel components. I discuss them in more detail in a separate report [109].

5.2.1 Design Goals

I define four major design goals for L4Ka::Pistachio: scalability, performance, policy freedom of new mechanisms, and compatibility and transparency. Additionally, I define the following set of non-goals (i.e., aspects that were of no interest even though addressed by important related L4 work): real time [58], fine-granular user-level memory management [52, 74], and requirements of highly secure systems [60]. For all design decisions I clearly favored goals over non-goals.

Scalability

Scalability is the primary goal of this work. Following Unrau's design principles [113] for scalable operating systems, the microkernel has to fulfill the three construction principles:

1. Preserve parallelism. The parallelism of applications must be preserved throughout the microkernel; requests from independent applications have to be served independently and in parallel.

2. Bounded overhead. The overhead of operations must be bounded and inde- pendent of the number of processors.

3. Preserve locality. The kernel has to preserve the locality of the applications. From this requirement I derive that (i) the kernel code and data need to be local to the processors, (ii) upon application migration the kernel meta data has to be relocated, (iii) applications and OS services that manage memory resources need to be aware of processor locality, and (iv) mechanisms are required to manage locality explicitly.

Performance

The performance of microkernel primitives is paramount, with a particular focus on the IPC primitive. I closely follow the design principle for microkernels as postulated by Liedtke: "Ipc performance is the Master. Anything which may lead to higher ipc performance has to be discussed. [...]" [70].
I evaluate the achievable performance based on two factors. First, the uniprocessor version of the kernel gives the upper performance bound for operations that are local to one processor. For all critical operations I choose a structure identical (or almost identical) to the single-processor kernel. Second, for operations that are either cross-processor or require concurrency precautions (i.e., locking) I target the minimal achievable overhead of the macro operations. A macro operation thereby combines multiple smaller and potentially independent operations into one larger operation.
The optimization at the macro-operation level allows the kernel to reduce costs by combining the startup overhead of multiple micro-operations, for example via critical-section fusing [79], or by performing bulk operations on a remote processor. Instead of initiating remote operations one by one, the kernel combines all operations and initiates a single inter-processor request.
In cases where the kernel has to repeatedly acquire locks, the fusing of multiple micro-operations can result in reduced concurrency, longer overall waiting time for a lock, and potentially unfairness and starvation. Here, I explicitly move the policy decision via a hint to application level. Applications can then choose between coarse-grained locking with lower overhead and fine-grained locking with higher overhead but more concurrency.

Policy Freedom of New Primitives

A primary design goal of L4 is to have no policies in the kernel. To a large extent that goal is achieved, with some exceptions such as the in-kernel scheduling policy. In this work I add new kernel mechanisms to achieve scalability; a further design goal is to not introduce additional kernel policies with these new mechanisms.

Compatibility and Transparency

Developing an operating system from scratch is a tremendous multi-person-year effort. Therefore, I set a primary design goal that enabled me to leverage the numerous previous and ongoing system development projects for the L4 kernel [42, 54, 69, 72, 111]. I favored kernel interfaces, mechanisms, and abstractions compatible with the uniprocessor version over significantly different ones. For operations that violate isolation restrictions, I favor transparent completion with a performance penalty rather than explicit failure with notification. Transparent completion preserves compatibility with existing uniprocessor software and algorithms, whereas explicit failure notification requires that applications be extended with multiprocessor error-handling code.

5.2.2 Multiprocessor Extensions to L4

Based on the design goals described in the previous section, I extend the uniprocessor L4 microkernel as follows:

Threads. The abstraction for execution is the thread. I extend the thread with an additional attribute that denotes a thread's home processor. The kernel does not automatically migrate threads; instead, they are bound to their home processor. Thread migration (e.g., for load balancing purposes) is a user-level policy and only takes place upon explicit application request.

Address spaces. In L4, address spaces serve two primary purposes. First, an address space is a kernel object that associates virtual addresses with physical resources. Second, the address space provides protection boundaries. Threads which are associated with the same address space have the same rights to objects, including the right to manipulate peer threads in the same address space.
NUMA systems require processor-local memory (see previous section) and therefore different address translations depending on a thread's home processor. I separate virtual-memory address translation from thread permissions. A thread belongs to a permission domain that specifies the peer threads, and to a memory address space for the virtual-to-physical memory translations. An active thread can migrate between memory address spaces, which provides a mechanism for implementing processor-local memory. The memory managers of the address spaces the thread migrates from and to need to ensure that both address spaces have the same memory content mapped. The migration only changes the association to physical resources, that is, the memory access latency, but preserves the memory semantics (also see [109]).

Per-processor resources. Scalability requires parallel execution of independent application requests (see previous section). I replicate all central kernel data structures and provide per-processor instances. That includes scheduling queues, the kernel memory pool, and the mapping database memory pool.
Processor-local resources require an indirection that references the current processor's instance. I minimize the overhead induced by the indirection via the virtual memory subsystem. The per-processor kernel data structures are located in a specific area of the kernel's address space. Each processor has different memory mappings in that area, such that all accesses address the corresponding processor-specific instance of kernel data. On processors with software-loaded TLBs, the TLB-miss handler treats that area specially.

On processors with hardware-walked page tables, the kernel creates per-processor instances of the top-level page table. Each instance contains specific translations for its processor. For identical page mappings (such as common kernel mappings) processors may share the same set of page tables.

One implication of the scheme is that data stored in the per-processor area cannot (easily) be accessed remotely, because it is only directly accessible by the specific processor.
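The following C++ sketch illustrates how such a processor-local area can be accessed without an explicit CPU-indexed lookup. The base address kPerCpuAreaBase and the layout of cpu_local_t are hypothetical; the actual kernel layout differs.

    #include <cstdint>

    // Hypothetical fixed virtual address of the per-processor kernel area.
    // Each processor maps its own physical backing at this same virtual
    // address, so the very same pointer resolves to different data on
    // each processor.
    constexpr uintptr_t kPerCpuAreaBase = 0xFFFFF000u;

    struct cpu_local_t {
        uint32_t cpu_id;
        void*    scheduler_queues;   // per-processor scheduler state
        void*    kmem_pool;          // per-processor kernel memory pool
    };

    static inline cpu_local_t* get_cpu_local()
    {
        // No indirection through a CPU-indexed array is needed; the
        // virtual memory mapping selects the correct instance.
        return reinterpret_cast<cpu_local_t*>(kPerCpuAreaBase);
    }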

Kernel messaging. I provide an in-kernel messaging mechanism for remote execution. The kernel uses a preallocated set of request buffers per processor to initiate a remote execution. The kernel sends an inter-processor interrupt in case the remote processor is actively running. If a processor runs out of free request buffers, it busy-waits until a request buffer is freed up; the scheme avoids kernel deadlocks by continuously serving remote requests while waiting. All request handlers are constructed in such a way that the kernel can neither deadlock nor livelock [117].
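A minimal sketch of this rule is shown below. The types and functions (RequestQueue, send_ipi, current_cpu) are placeholders and not the actual kernel interface; the point is that the busy-wait loop keeps draining the processor's own mailbox.

    struct request_t { void (*handler)(void*); void* arg; };

    // Hypothetical per-processor mailbox with a fixed pool of request buffers.
    struct RequestQueue {
        bool       try_enqueue(const request_t& r);   // fails if no buffer is free
        request_t* dequeue();                          // nullptr if empty
    };

    extern RequestQueue mailbox[];     // one per processor
    extern void send_ipi(int cpu);
    extern int  current_cpu();

    void send_remote_request(int dst_cpu, const request_t& req)
    {
        // Busy-wait for a free buffer, but keep serving our own incoming
        // requests so that two processors waiting on each other cannot
        // deadlock.
        while (!mailbox[dst_cpu].try_enqueue(req)) {
            while (request_t* r = mailbox[current_cpu()].dequeue())
                r->handler(r->arg);
        }
        send_ipi(dst_cpu);   // wake the remote processor if it is running
    }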

Read-copy-update epoch. For dynamic lock adaptation and also to avoid existence locks, the kernel uses read-copy-update epochs. At boot time the kernel forms the processor clusters (see Section 4.2.1). Based on the clusters it creates a ring of processors that circulate a token. The token is passed by writing a value into a dedicated per-processor memory field. Each processor regularly tests its associated field and, when set, passes the token to its successor processor in the ring. The test for the token is at two locations in the kernel's code path: on user-kernel crossings and on timer ticks. The timer tick t_tick limits the maximum time t_round for a round trip of a token in a system with n processors to t_round ≤ 2n · t_tick. User-kernel crossings include the performance-sensitive IPC primitive.
The ring scheme generates two cache misses every time the token passes a processor. The first cache miss occurs when the token is read after it got modified by the predecessor in the processor ring. The second cache miss occurs when the token is passed on to the successor in the ring. Thus, in an n-processor system the scheme has an overhead of 2 cache misses every n system calls. Considering the high cache-miss penalty relative to the overall cost of system calls, a strict token-passing scheme incurs too much overhead, in particular when n is small. Based on the number of processors in the system, I reduce the cache-miss frequency by only testing for the token every mth kernel entry. The required counter is local to each processor and thus remains in exclusive state in each processor's cache. Note that this optimization does not affect the maximum latency of a token round trip. In very large systems it is possible to form multiple sub-rings that include a subset of the processors in the system.
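The token test can be sketched as follows. The code is illustrative only; kTestInterval stands for the value of m, and the real kernel keeps this state in the processor-local area described above.

    #include <atomic>
    #include <cstdint>

    // Per-processor state; in the kernel this lives in the processor-local area.
    struct TokenState {
        std::atomic<uint64_t> token;      // written by the predecessor in the ring
        uint64_t              last_seen;  // last token value we forwarded
        uint32_t              skip;       // kernel entries since the last test
    };

    constexpr uint32_t kTestInterval = 16;   // assumed value of m

    // Called on user-kernel crossings (and, without the skip counter, on timer ticks).
    void check_rcu_token(TokenState* self, TokenState* successor)
    {
        if (++self->skip < kTestInterval)
            return;                            // amortize the cache-miss cost
        self->skip = 0;

        uint64_t t = self->token.load(std::memory_order_acquire);
        if (t != self->last_seen) {
            self->last_seen = t;
            // One epoch has passed locally; objects deferred before the
            // previous round may now be reclaimed.
            successor->token.store(t, std::memory_order_release);
        }
    }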

5.3 Inter-process Communication

Inter-process communication is the central mechanism for thread interaction, thread synchronization, controlled protection-domain crossing, and resource delegation. IPC performance is of paramount importance in a microkernel system. Operations which require two simple privilege-level changes in a monolithic operating system require two IPC invocations in a microkernel. Thus, IPC adds an immediate overhead on all invoked system operations. Due to careful design, L4's IPC primitive has a very low overhead on uniprocessor systems. A detailed analysis of individual design decisions and principles was given by Liedtke [70]. In that paper, Liedtke briefly addresses multiprocessor systems and proposes a lock on the critical path. Such a locking scheme, however, would impose a performance penalty of up to 20 percent (see Section 6.2).
In this section, I describe the most performance-relevant usage scenario for dynamic lock adaptation for L4: the IPC primitive. Each thread in the system carries a processor isolation mask that denotes its primary scope for communication. According to the mask's restrictions, the kernel adjusts the lock primitive of the thread. In the following, I give an overview of the relevant communication scenarios in a microkernel-based system and their performance and latency requirements. Afterward, I discuss two alternative approaches to a cross-processor IPC primitive: (i) based on kernel locks and (ii) based on kernel messages. I conclude the section with a brief discussion on the IPC primitive for multiprocessor systems with significantly different cache migration overhead and IPI latencies.

5.3.1 Communication Scenarios

The microkernel's IPC primitive is used for a variety of purposes, but two scenarios are most common and relevant: (i) client-server interaction and (ii) IPC-based signaling and notification.
The following discussion is based on two core assumptions. First, I assume that the programmer has detailed knowledge of the application behavior and communication patterns of the workload. Design decisions always aim at maximizing performance for the well-designed application, rather than inducing overhead to accommodate unoptimized applications.
Second, based on the structure of current multiprocessor architectures, I make the assumption that the overhead for cache migration, synchronization, and signaling is so high that frequent cross-processor interaction in the sense of a remote procedure call is infeasible. This assumption stems from (i) the overhead for asynchronous inter-processor signaling and (ii) the cost of cache-line migrations (even in multi-core systems) relative to the frequency of remote invocations and the potentially achievable speedup. New features in upcoming processor architectures, however, may change the specific performance trade-offs. Furthermore, I do not address resource interference between threads of SMT processors, but consider them as independently executing.

Figure 5.1: Client-server IPC model for OS services on a microkernel. Arrows denote the execution paths for IPC.

Client-Server Communication

In a microkernel-based system, the operating system functionality is not executed in privileged mode in the kernel, but unprivileged at application level. In order to invoke an operating system service, applications send an IPC to an OS server. Client-server communication is the most performance-critical operation, because it replaces all simple kernel invocations in a monolithic OS by two IPCs.
In most cases, the client depends on the results of the server invocation and needs to block until completion of the remote operation. The IPC invocation thus follows the instruction stream of the application similar to a normal system-call invocation in a monolithic kernel. However, instead of stopping user-level activity and entering kernel mode to execute the system function, the kernel switches to the OS server's address space (see Figure 5.1). The switch is initiated by the client (by invoking the IPC primitive), authorized by the server (by accepting a message from the client), and executed by the microkernel.
IPC in the client-server scenario is therefore a mechanism to (i) guarantee that the remote server is in a consistent state (signaled by waiting for an incoming message) and (ii) transfer control into the server's address space. It is important to note that client-server relations do not usually make use of the parallelism threading would provide, but explicitly avoid it. In order to minimize the IPC overhead, the scheduler is not invoked when switching from client to server thread and the server executes on the client's time slice instead. This scheme was first proposed by Bershad et al. [19].
In multiprocessor systems the client request could theoretically be handled by a server thread on a remote processor; the following reasons, however, make such an approach infeasible for blocking requests:

• A client that relies on the result of the server request will be preempted, while the server needs to be activated. Performing such an operation across processor boundaries requires (i) invocation of the scheduler on the client’s processor and (ii) activation of the server on the remote processor (which may further lead to preemption of an active thread on the remote CPU).

• The server requires access to client data in order to handle a client request (at least for the request parameters, but potentially more). For remote operations, such data have to be transferred via memory instead of the processor's register file. Hence, a remote invocation requires more instructions, and also incurs overhead due to cache line migrations. Furthermore, operating on memory that is shared between client and server incurs overhead for migration of the active cache working set between processors.1

• There is an inherently higher overhead for starting a remote operation compared to a local operation (even if small on tightly-coupled systems). The initiation of a remote operation either raises an interrupt or at least writes into a shared memory location (resulting in cache line migration). In the local case, the operation stays within the instruction stream of one processor. A remote operation may yield better performance in only one case: if the overhead for context-switching from client to server is higher than the overhead for remote processor signaling plus the working-set migration. Such a scenario may occur if the to-be-transferred cache working set is very small, the remote processor has already activated the address space of the target server, the cache working set of the remote server is already loaded, and the processor is idle (i.e., does not require interruption).2

• Dynamic migration of worker threads is infeasible on multiprocessors due to the cache migration overhead. On well-designed SMT and tightly integrated multicore systems, however, this constraint may be less important.

Based on the previous analysis I made the design decision to strictly favor the performance of processor-local IPC over cross-processor IPC. This decision imposes a design discipline that minimizes cross-processor communication and has general implications on the structure of the system.
Unrau's first design rule (preserving parallelism) requires that independent application requests can be served in parallel. When applied to a client-server operating system, that rule requires at least one server handler thread per processor. In order to scale, the server needs to be aware of the parallelism and provide a sufficient number of worker threads. The threads have to be distributed across all processors such that clients can contact a local server thread and thus avoid the described overhead for the remote invocation (see Figure 5.2).

1 SMT systems often share the L1 cache for different threads. Thus, the working set remains in the same cache if the processor supports L1 sharing.
2 Each processor has a dedicated idle thread. Since it only executes in kernel mode there is no reason to switch to a special address space. Upon activation the idle thread therefore remains in the last active user address space.

Figure 5.2: Example for a client-server configuration with a multiprocessor-aware server. The server provides processor-local threads to eliminate cross-processor IPC. (The microkernel is omitted in the figure.)

Synchronization and Notification

Synchronization is a latency-sensitive operation. The time from the release operation of a synchronization primitive until a previously blocked thread starts executing influences the responsiveness and increases the overall runtime of a task. The design goal is to minimize that latency.
The cost of an IPC that unblocks a thread on the same processor is low compared to alternative high-level kernel synchronization primitives (such as semaphores or mutexes). Hence, IPC is a sufficient mechanism for inter-thread synchronization. Furthermore, using one general mechanism for a variety of purposes is advantageous because it reduces kernel and application complexity, as well as the overall cache footprint.
Signaling across processor boundaries requires three operations: (i) test and modify the thread's state from blocked to ready, (ii) enqueue the thread into its home processor's scheduling list, and (iii) notify the remote processor to reschedule. These three steps can be realized in two different ways. First, the data structures are manipulated directly and therefore require some form of synchronization (e.g., locks). Second, the operation is initiated via an in-kernel message and performed by the thread's home processor itself. In that case no locks are required because operations are serialized by the remote CPU.
Direct manipulation has a performance advantage because it allows a thread to perform multiple remote notifications without being dependent on the execution of remote processors. It uses the parallelism of the memory subsystem, which operates independently from the processors. An in-kernel messaging scheme can only achieve the same level of parallelism and overhead when the notification is asynchronous and without completion notification. Such an approach moves the complexity of error handling to applications. The number of potential usage cases that do not require completion notification is quite limited. Moving this functionality to user level results in significant code duplication because every user-level application would have to consider the case of lost messages. Furthermore, the common method for detecting lost messages is timeouts, which should be avoided at all cost.
Waiting for completion of the remote operation can be achieved in two ways. Either the sending processor polls for message arrival on the remote processor (which adds the cross-processor signaling latency to the operation), or the initiating thread blocks until arrival of a confirmation message. The latter adds the overhead of a context switch and a scheduler invocation to each cross-processor notification. Figure 5.3 compares the alternatives and highlights the latency implications of each approach.

Figure 5.3: Comparison of the message-based and the lock-based cross-processor IPC mechanisms. Message-based IPC has a significantly higher latency.
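The direct-manipulation (lock-based) variant can be sketched as follows. The sketch assumes a per-thread IPC lock and placeholder helpers (enqueue_remote_ready, send_reschedule_ipi); it is not the actual kernel code, and it already includes the priority check that is discussed in Section 5.3.4.

    #include <mutex>

    enum class ThreadState { Blocked, Ready, Running };

    struct Tcb {
        ThreadState state;
        std::mutex  ipc_lock;    // per-thread IPC synchronization lock
        int         home_cpu;
    };

    // Placeholders for the real kernel facilities.
    extern void enqueue_remote_ready(Tcb* t);       // requeue list of t's home CPU
    extern void send_reschedule_ipi(int cpu);
    extern int  current_priority(int cpu);

    // Direct-manipulation wakeup: the sender updates the remote thread's
    // state itself instead of delegating the work via an in-kernel message.
    void wake_remote(Tcb* t, int prio)
    {
        std::lock_guard<std::mutex> g(t->ipc_lock); // (i) test and modify the state
        if (t->state != ThreadState::Blocked)
            return;
        t->state = ThreadState::Ready;
        enqueue_remote_ready(t);                    // (ii) enqueue at the home processor
        if (prio > current_priority(t->home_cpu))
            send_reschedule_ipi(t->home_cpu);       // (iii) notify only when needed
    }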

5.3.2 Performance Trade-offs

The design of the IPC mechanism on multiprocessor systems has to take a large number of parameters into account. These are not limited to performance and latency, but also include user-level communication protocol complexity, transparent process migration for load distribution, and the atomicity and non-interruptibility of operations. A detailed reasoning is beyond the scope of this thesis and is discussed in [109]. Here, I only give the design parameters of the multiprocessor IPC primitive.

• Similar to the uniprocessor version, IPC addresses individual threads. Thread locality is explicitly exposed to application level. Multi-threaded servers need to expose the processor-specific thread IDs to communication partners (i.e., clients). This design decision favors IPC performance over locality transparency.

• For protocol transparency, the IPC primitive must work on the same processor as well as across processor boundaries. Limiting IPC to threads that are located on the same processor would require that all applications support error handling in case they are migrated to a different processor while in the middle of an IPC (e.g., waiting for a response of a thread that is suddenly on a different processor).

• IPC should be a generic, low-overhead, and low-latency primitive such that it is applicable to a wide range of scenarios. Multiprocessor support must have minimal impact on local communication.

As I argued in the previous section, cross-processor thread synchronization is extremely latency sensitive, and an in-kernel message-based synchronization scheme performs less efficiently than a lock-based scheme. On the other hand, the majority of all IPCs take place between threads that reside on the same processor. Here, the locks required for efficient cross-processor IPC induce a constant overhead in the local case, and a messaging scheme would be preferable for the few cases of thread migration.
When considering the most common communication relations of threads in a system, one notices very specific patterns. For each thread it is possible to derive a communication profile to partner threads that are either primarily on the same processor or primarily on remote processors. For example, server threads that have identical peers per processor communicate locally in almost all cases and only in very few exceptional cases with remote threads (usually as a result of a thread migration). Threads that frequently synchronize with remote threads have a higher number of cross-processor communications relative to local IPC.
Fortunately, applications are aware of the specific communication patterns of their threads. For example, a server spawns parallel handler threads for processor-local communication, while a parallel program creates remote worker threads that synchronize frequently. That application-specific knowledge can be used beneficially to fine-tune the communication primitive.

5.3.3 Adaptive IPC Operation

Instead of favoring one IPC primitive over the other, the kernel supports both IPC variants: message-based and lock-based. Having two alternative code paths in the kernel only incurs a marginal overhead because (i) support for cross-processor IPC was one of the fundamental design requirements and (ii) to a large extent the code is identical for both alternatives.
Based on its communication profile, each thread can derive a set of preferred remote processors that is expressed in a processor isolation mask. Based on that mask, the kernel fine-tunes the communication primitive; processors that are specified in the mask use a lock-based IPC scheme, while all others use a message-based synchronization scheme for IPC. Depending on the order specified in the processor mask, the kernel may further fine-tune the lock primitive itself (i.e., use spin locks or MCS locks). Accesses are restricted to a single processor if the isolation mask only specifies the home processor of a thread (or alternatively no processor). In that case the kernel disables the lock on the IPC path using the dynamic lock demotion scheme described in Section 4.3.2. I want to emphasize that the processor isolation mask performance-tunes the IPC primitive but does not enforce communication restrictions. Hence, it is still possible to communicate with threads that are outside the bounds of the mask; however, the primitive then uses the more expensive message-based code path.
Dynamic adaptation provides the optimal case for two scenarios: high-performance lock-free communication in the local case, and low-latency communication in the remote case. Adaptability eliminates the requirement for a special kernel synchronization primitive. Figure 5.4 shows the state diagram of L4's IPC primitive; I specifically marked the multiprocessor extensions and dynamic locks. The code path contains two locks that are acquired for the common client-server communication case: one for the send phase and a second for the receive path. A complete round-trip IPC therefore requires a total of four locks.
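The path selection can be sketched as follows; the field names and the flat 64-bit mask are assumptions made for the example, not the actual kernel encoding.

    #include <cstdint>

    struct Tcb {
        uint64_t isolation_mask;    // one bit per processor: preferred lock-based peers
        int      home_cpu;
        bool     ipc_lock_enabled;  // dynamic lock demotion state
    };

    enum class IpcPath { LocalLockFree, LockBased, MessageBased };

    // Decide the IPC code path for a send from 'src' to 'dst' (sketch).
    IpcPath choose_ipc_path(const Tcb* src, const Tcb* dst, int this_cpu)
    {
        if (dst->home_cpu == this_cpu)
            // Local IPC: with the lock dynamically disabled this is the
            // uniprocessor fast path.
            return src->ipc_lock_enabled ? IpcPath::LockBased
                                         : IpcPath::LocalLockFree;

        if (src->isolation_mask & (1ull << dst->home_cpu))
            return IpcPath::LockBased;     // expected remote partner: use locks

        return IpcPath::MessageBased;      // rare remote partner: kernel message
    }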

5.3.4 Remote Scheduling for Lock-based IPC

After completion of a remote IPC operation, the partner thread becomes runnable and has to be considered by the remote scheduler. If the priority of the currently active thread on the remote processor is lower than the priority of the thread activated by the IPC, the remote processor should invoke the scheduler. Rescheduling is achieved by sending an IPI to interrupt the current execution. Alternatively, if the currently executing thread has a higher priority than the activated one, the thread can simply be enqueued into the ready list and the normal scheduling mechanism will activate it at some later point in time. That case requires no IPI and avoids the accompanying overhead.

Figure 5.4: State diagram for multiprocessor IPC with support for dynamic locks and kernel messages. Elements with a bold border denote multiprocessor additions to the uniprocessor state diagram. The locks can be dynamically enabled and disabled. (Error handling is omitted for clarity.)

In order to scale, the kernel has per-processor structures for the scheduler. These structures include one ready list per scheduling priority and a wakeup list to handle IPC timeouts. To lower the overhead for the processor-local case, scheduling lists are unsynchronized and therefore cannot be accessed from remote processors. Instead, each scheduler features one or more requeue lists that enable remote processors to use memory-based queues. When a thread is activated from remote, it is enqueued into a requeue list of its home processor's scheduler. Requeue lists are protected by per-list locks; multiple lists per scheduler further reduce concurrency from many remote processors and thus the potential for lock contention. Figure 5.5 depicts the structure for one processor in more detail.

Figure 5.5: Requeue list for remote processors to avoid locks on the critical path for the scheduling queues. One list exists for each processor in the system.

At first glance, supporting multiple requeue lists seems to require multiple list pointers in each thread control block. However, that is not the case, for the following reason: threads only have to be enqueued into the ready list when their thread state switches from blocked to ready. That switch is a controlled operation via either IPC, a timeout, or a few other operations (see [109]). All state switches from a blocked to a ready state have to be synchronized via the IPC synchronization lock of the thread. When a processor holds that lock, no other processor in the system can concurrently manipulate the thread's state; but then also no other processor will try to enqueue the thread into a requeue list, and a single pointer is sufficient.
When the scheduler is invoked, it parses the requeue lists and requeues threads according to their priority into the normal scheduling queues. First, the scheduler checks the list head for valid entries and only then acquires the lock. Checking even a larger number of requeue lists therefore only incurs a modest runtime overhead.
The requeue list is a singly-linked list of thread control blocks; the complexity is O(1) for insertions. Remote processors can only add members to the list and only the home processor can remove members as part of the requeue operation. Requeuing threads therefore also has a complexity of O(1) per list member; however, the latency for one specific thread with a total of n threads in the requeue list is O(n).
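The requeue-list handling can be sketched as follows. The sketch uses a plain mutex for the per-list lock and placeholder names; it only illustrates the push-from-remote and drain-at-home split described above.

    #include <mutex>

    struct Tcb {
        Tcb* requeue_next;   // a single pointer suffices (see text)
        int  prio;
    };

    struct RequeueList {
        std::mutex lock;     // taken by remote processors only
        Tcb*       head = nullptr;
    };

    // Remote side: O(1) push onto the target processor's requeue list.
    void remote_enqueue(RequeueList& rq, Tcb* t)
    {
        std::lock_guard<std::mutex> g(rq.lock);
        t->requeue_next = rq.head;
        rq.head = t;
    }

    extern void ready_list_insert(Tcb* t);   // the unsynchronized local ready queues

    // Home side, called from the scheduler: cheap empty check before locking.
    void drain_requeue(RequeueList& rq)
    {
        if (rq.head == nullptr)
            return;                          // common case: no remote activations
        Tcb* list;
        {
            std::lock_guard<std::mutex> g(rq.lock);
            list = rq.head;
            rq.head = nullptr;
        }
        while (list) {
            Tcb* next = list->requeue_next;
            ready_list_insert(list);         // requeue according to priority
            list = next;
        }
    }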

5.3.5 Discussion on Tightly-coupled Systems

At the beginning of this section I narrowed the design space to systems that have noticeable cache-line migration overheads and relatively high IPI-signaling latencies. However, embedded multicore and SMT processors may show different behavior. Those processors often have IPI latencies as low as ten cycles and efficient cache sharing with minimal cache migration overhead via on-chip direct cache-to-cache transfers. A thread allocation model with hard processor affinities per thread may then be suboptimal. In the following, I discuss the implications of such a system structure for the IPC primitive.
When the cache migration overhead between two processor threads is reduced significantly (or even drops to zero), a fixed thread association may lead to suboptimal processor utilization. Multiple processor threads are then better off sharing one scheduling queue. Yet achieving maximum parallelism still requires one handler thread per processor thread in system servers. The direct IPC addressing scheme may lead to bottlenecks if all clients contact the same server thread. I envisage two alternative schemes to overcome the problem (without practically validating their effectiveness). First, server threads could be grouped such that multiple threads can be addressed with a single name. Alternatively, the kernel could append a processor tag to the thread IDs of servers. Both schemes, however, introduce an in-kernel policy, and the first scheme requires another lock on the critical path. Specific lock primitives for such tightly coupled processors would provide a trade-off between shared data structures and high synchronization overhead (see also Section 7.2).

5.4 Address Space Management

In L4, address spaces and system memory resources are managed at application level. Address spaces are constructed by user-level servers. Initially, all physical memory is mapped to the root address space σ0; new address spaces are constructed recursively by mapping regions of virtual memory from one address space to another.
The kernel's abstraction for memory is the page mapping. L4 lacks any knowledge of the semantics of the referenced page frames and treats all pages identically. For example, L4 does not differentiate between physical page frames that are RAM and others that are hardware devices. The advantage of such a scheme is a greatly simplified kernel abstraction with one uniform resource type. However, such uniformity may require overly conservative kernel algorithms with significantly higher overhead in runtime cost and memory footprint, or poor scalability.
I want to emphasize that the memory management mechanism in L4 only controls the permissions to memory resources. The kernel thus solely operates on meta data, such as page tables. In this section, I address the problem of significantly differing resource allocation patterns for memory resources in L4.
The section is structured as follows: First, I give a brief overview of memory management models in L4-based systems and describe their specific allocation patterns (refer to [12] for a more detailed description). Based on those patterns, I discuss the cost vs. scalability trade-offs and derive a set of requirements for the kernel mechanism. Then, I describe how these requirements are reflected in the kernel mechanisms that track the recursive memory mappings. A limiting factor for concurrency and thus scalability of the memory management mechanism is TLB coherency on multiprocessor systems; I apply the TLB versioning scheme described in Section 4.4 to the recursive address space model.
The recursive virtual address-space model enables user-level memory managers to perform NUMA-aware memory allocations, such as transparent page replication using processor-local memory. A detailed description is beyond the scope of this thesis and is given in the accompanying technical report [109].

5.4.1 User-level Memory Management

User-level memory managers fall into two main categories: (i) managers that partition resources, and (ii) managers that operate on resources. The hierarchical memory model allows for re- and sub-partitioning via an intermediate address space. Partitioning is achieved by mapping pages into the intermediate address space, which then applies the partitioning policy and maps the resources on to the next level in the hierarchy. The relatively low kernel resource overhead for an address space makes such a simple model feasible. The second class of resource managers operates on the resources themselves. For example, a file system would allocate memory resources to cache files, or a user-level server for anonymous memory would provide paged memory.
Memory allocations of applications also differ significantly depending on the applications' life span and level of parallelism. Applications allocate memory resources in three phases. In the first phase, the application populates the address space by faulting in the active working set. Afterward, the application enters a steady-state phase where it may access files or share data with other applications or servers. When the application finally exits, the kernel frees the address space and removes established mappings all at once. Obviously, for short-lived applications the set-up and tear-down phases dominate the resource management patterns, whereas for long-lived applications (and servers) the steady-state phase is more relevant.
While the specific resource management patterns depend on the specifics of the application and memory resource manager, memory resource allocation follows three primary allocation patterns:

• frequent manipulation of very few pages (map and unmap),

• frequent manipulation of many pages at once for (i) address-space tear-down, and (ii) restriction of page permissions (e.g., copy-on-write for UNIX's fork), and

• infrequent manipulation of large sets of pages or super pages for bulk resource reallocation.

These allocation patterns are the most critical for performance and scalability since they can result in long synchronization periods that limit scalability. Therefore, I will specifically focus the following discussion on them.
Rooting all physical memory in one address space (σ0) is a very powerful and flexible mechanism; however, it also introduces a dependency chain between all address spaces in the system. Because intermediate memory managers (including σ0) could modify or revoke mappings at any point in time, the kernel meta data requires explicit synchronization. In the vast majority of cases, however, manipulations of the same permissions are restricted to a single processor. An overly fine-granular synchronization scheme incurs a significant runtime overhead and cache footprint. The overhead of a single taken lock is one to two orders of magnitude higher than the actual operation of updating the page table. Hence, an overly conservative synchronization scheme that solely focuses on scalability would induce a massive overhead.
Based on the common resource manipulation patterns I identify four scenarios that need special consideration for scalability, as shown in Figure 5.6.
The first scenario is where one page frame is mapped into two address spaces (Figure 5.6a). Subsequent mappings should be possible without interference and serialization of B and C.
The second scenario is the concurrent access or manipulation of mappings at different levels in the map hierarchy (Figure 5.6b). This scenario occurs (i) when the kernel restricts access rights and (ii) when the kernel reads access information for a page; the operation of the child C should be independent of other sub-mappings of parent A. Complete page revocation is less critical, because it eliminates the inter-dependency between parent and child.
The third scenario addresses mappings of differently-sized pages (Figure 5.6c). The parent A operates on a super-set of the pages of the child C. Similar to the previous scenario, parent and child (and also siblings B and C) should be independent of each other.
The fourth scenario is the concurrent manipulation of partially overlapping sets of mappings (Figure 5.6d). In the shown case, the parent manipulates mappings to B and C. Operations performed by the child in address space C should be independent of mappings to B.
A naïve solution that maximizes scalability for all four scenarios requires (i) complete independence of individual mappings and (ii) a TLB coherency scheme that is strictly enforced after each individual permission update. The latter requirement stems from L4's strict consistency model for page permissions: the invoked operation can only complete when permission updates are propagated to all processors and potentially stale TLB entries are updated (i.e., invalidated). A simple cost analysis reveals that such a solution has a massive overhead that would eliminate and even reverse the benefits of improved scalability. For example, the overhead for a TLB coherency update is often multiple orders of magnitude higher than the cost of the actual in-kernel meta-data manipulation.

5.4.2 Scalability vs. Performance Trade-off

The scalability vs. performance trade-offs for the hierarchical memory management scheme are the common design trade-offs for all multiprocessor structures. Fine-granular locking results in better scalability, while coarse-granular locking has a lower overhead. The memory subsystem additionally requires consideration of the overhead and latency of TLB coherency updates.

Figure 5.6: Scenarios of concurrent page permission manipulation in L4: (a) siblings, (b) parent-child, (c) super pages and sub pages, (d) multiple combined pages.

The scalability limitations of shared memory buses put a natural limit on the common degree of parallelism for memory pages. Large-scale sharing of memory pages will result in poor performance due to the NUMA overhead. Instead, NUMA systems create replicas that are located in local memory with a lower access penalty. Hence, a well-designed system will only share very few pages across a large number of processors. Those shared pages are used for global data in massively parallel applications that run concurrently on a large number of processors. However, it is very unlikely that in those cases the page permissions change at a high frequency.
User-level memory management in a microkernel-based system is based on a mutual contract between two entities: one party that provides a resource and a second party that uses the resource. That contract includes access rights, availability, and, for example, guarantees on the content (e.g., that data is not arbitrarily modified or leaked). I add another dimension to that contract: concurrency.
Applications and system servers that implement the higher-level system abstractions have the required semantic information (which the microkernel lacks) for an educated decision on the synchronization strategy of individual memory mappings. Each partner in the resource contract has to express its preferred synchronization granularity and the expected level of parallelism. The resource contract is established when the resource permissions are transferred between both partners. The kernel correspondingly reflects that contract in its internal data structures. Based on my initially stated scalability goals for L4, I support the following two scalability contracts:

• A new memory mapping can be created as dependent on or independent from the parent mapping. All dependent mappings are protected by the same lock, while independent mappings can be manipulated concurrently.

• Sub-mappings of super pages can be synchronized either at the granularity of the super page or at the granularity of an individual mapping. The granularity can be dynamically readjusted by the owner of the super page at runtime.

Figure 5.7: Memory subsystem boundaries; a boundary decouples the subsystem inter-dependencies.

These two contract types create subsystem boundaries that isolate subsystems and break the dependency chain to σ0. Figure 5.7 illustrates this with a sample configuration.
The initially stated performance goal (minimal achievable overhead for an operation) focuses primarily on the overhead of the three common scenarios that I listed in the previous section. I specifically focus on the frequent (performance-critical) operations; these are (i) frequent mapping and unmapping of only a few pages, and (ii) frequent manipulation of large sets of pages at once.
When manipulating the page permissions of only a few pages, fine-granular locking has a relatively low overhead. This is in contrast to the cost of the overall operation, such as entering and exiting the kernel and potentially enforcing TLB coherency across processors. When manipulating the permissions of large sets of pages, the kernel should minimize the number of taken locks and, even more important, minimize the number of TLB shoot-downs. For such large-scale operations, I therefore use a delayed lock-release scheme. Before each release of a kernel lock, the algorithm tests whether the same lock would be reacquired immediately afterward. In that case, the algorithm avoids the overhead by skipping the release and reacquire operations altogether.
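A sketch of the delayed lock-release scheme is shown below. It assumes a lock associated with each map node (in the actual kernel this is the split node introduced in Section 5.4.3); update_entry stands for the actual permission update, and the structure is simplified.

    struct SplitNode {
        void lock();
        void unlock();
    };

    struct MapNode {
        SplitNode* split;
        MapNode*   next;
        int        depth;
    };

    extern void update_entry(MapNode* n);

    // Walk a subtree and update permissions, releasing a lock only when the
    // next node is protected by a different lock.
    void bulk_update(MapNode* start)
    {
        SplitNode* held = start->split;
        held->lock();
        int start_depth = start->depth;
        MapNode* n = start;
        do {
            update_entry(n);
            MapNode* next = n->next;
            if (next && next->depth > start_depth && next->split != held) {
                held->unlock();          // only drop the lock when it changes
                held = next->split;
                held->lock();
            }
            n = next;
        } while (n && n->depth > start_depth);
        held->unlock();
    }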

5.4.3 Mapping Database

The kernel keeps track of recursively delegated resource permissions in a data structure called the mapping database. The mapping database is L4's most complex data structure. For each physical page frame the kernel maintains an n-ary tree that is rooted in σ0. A variety of alternative data representations have been proposed before; each structure thereby addresses a different kernel design rationale. Völp [116] and Hohmuth [57] focus on avoiding long interrupt latencies and unbounded priority inversion. Haeberlen [51] proposes an encoding scheme that allows for partial preemption of kernel resources that back the mapping database structure. Szmajda [104] focuses primarily on optimizations for systems with software-loaded TLBs.
I extend L4Ka::Pistachio's uniprocessor implementation to accommodate the initially stated scalability goals and subsystem isolation requirements. First, I give a brief introduction to the general operations, followed by a description of the multiprocessor extensions.

General Operations

The mapping database has a number of design requirements. The database contains one entry per mapping that is stored in kernel memory, a scarce resource. Therefore, a compact data representation for mapping entries is of paramount importance. The restricted kernel stack space requires non-recursive algorithms. Insertions and deletions of entries need to be fast operations with a complexity of O(1). Finally, a primary functional requirement is the support for different hardware page sizes.
L4Ka::Pistachio's mapping database realizes the n-ary tree via a sorted doubly-linked list of map nodes. The order of map nodes in the list reflects the map hierarchy. Sub-mappings are placed after their parent mappings, and each node contains a depth field representing the level in the map hierarchy. When starting at an arbitrary map node in the list, all map nodes that follow directly and have a larger depth value are sub-mappings.
The mapping database is optimized for two performance-critical operations: (i) finding the map node for a given virtual address (via its page table entry), and (ii) finding the page-table entry from a map node. For the first case, the kernel maintains a reference to the map node with the page table. For the second operation, the kernel stores a reference to the page-table entry in a compressed format within the map node.
The sorted list structure allows for an efficient insertion scheme. On map, the algorithm allocates a new map node and inserts it after the parent node. The node insertion into the doubly-linked list requires updating two pointers. On unmap, the algorithm walks the node list until it finds a map node that has a depth field that is less than or equal to the depth field of the start map node. The algorithm is sketched in the following listing.

    StartDepth = Node.depth;
    do {
        Unmap(Node);            /* revoke the mapping of the current node */
        Node = next(Node);      /* advance in the sorted list */
    } while (Node.depth > StartDepth);
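For comparison, the insertion path can be sketched as follows. The field names are illustrative only; the two pointer updates mentioned above correspond to the links of the existing neighbor nodes, while the new node's own links are initialized here as well.

    struct MapNode {
        MapNode* prev;
        MapNode* next;
        int      depth;
    };

    // Insert 'child' directly behind 'parent' so that the list order keeps
    // reflecting the map hierarchy; the depth field marks the tree level.
    void insert_after(MapNode* parent, MapNode* child)
    {
        child->depth = parent->depth + 1;
        child->prev  = parent;
        child->next  = parent->next;
        if (parent->next)
            parent->next->prev = child;
        parent->next = child;
    }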

Figure 5.8: Mapping hierarchy and corresponding database structure for pages a and b: (a) mapping hierarchy, (b) mapping database structure. The level in the mapping hierarchy is represented by the numerical value. Page b is split into sub-pages via an indirection array. Capitalized letters denote address space IDs.

Super pages are realized via an intermediate array that has one start pointer for each sub page. Figure 5.8 illustrates a typical mapping scenario and shows the corresponding data structure of the mapping database.

Multiprocessor Extensions

The doubly-linked list is a space-conservative encoding; however, it requires the kernel to explicitly lock the list on insertions and deletions of nodes. The list-based structure of the mapping database violates the independence of subsystems and thereby contradicts one of Unrau's core requirements for scalability: preserving the parallelism of independent applications. In the list structure, the last map node of one map tree and the top node of the next subtree cross-reference each other. Manipulation of entries in one subtree therefore requires locking of neighboring subtrees. Such a structure either requires an overly broad locking scheme (e.g., on a per-subtree basis), or multiple locks for a simple list manipulation. Multiple locks additionally introduce the danger of deadlocks, which then has to be addressed via lock ordering or specific deadlock detection schemes. Although careful locking can guarantee the structural integrity of the data structure, the doubly-linked list will still require locks that cover independent nodes and subtrees. Because the selection of neighboring nodes depends on the order of previous map and unmap operations, applications can neither anticipate what other nodes an operation affects, nor what nodes an operation depends on.
I address this problem by dynamic structural modification of the n-ary tree. I introduce split nodes that structurally separate independent subtrees. The creation of split nodes is under the control of both partnering applications at the time of mapping. Split nodes thereby serve two primary purposes. First, they structurally separate the mapping tree into multiple independent subtrees. Second, split nodes also serve as synchronization objects. Figure 5.9 depicts the kernel structure of the previous example extended with the additional subsystem boundaries.

Figure 5.9: Mapping tree structure including split nodes. Concurrent operations are possible within independent subtrees.

Split nodes are inserted into the normal doubly-linked list and contain a reference to the independent (split-off) subtree. The lock within the split node serializes accesses to all entries within one subtree, however not across split-node boundaries (i.e., child split nodes). Each map node contains an additional reference to its parent split node, such that the corresponding split node can be found without requiring an intact list structure or a lock. The lock for a specific subtree can be derived by a double pointer indirection: the reference to the map node is stored with the page table, and the map node itself references the corresponding split node. The following listing shows a code sketch for a mapping database lookup including the split-node lock.

    MapNode = LookupPageTable(vaddr);
    if (MapNode != NULL) {
        SplitNode = MapNode->SplitNode;
        SplitNode->Lock();
        if (IsValid(MapNode)) {
            /* perform operation */
        }
        SplitNode->Unlock();
    }

Super-page Locking

Applications can map sub-pages out of a larger super page. The mapping database reflects the division of a super page with an intermediate node array. The array has one entry for each sub page, and the specific number of entries is determined by the hardware-supported page sizes. The mapping database is rooted in one top-level map node that covers σ0's complete address space. That super node is then divided via an intermediate array that covers a large set of super pages, which then gets further subdivided, and so on. The topmost map node has an associated split node that protects the complete mapping database with a giant lock.
When a memory manager splits a super page into multiple sub pages, the manager has two alternative locking strategies. First, it can apply a giant lock to all subtrees and thus minimize the lock overhead and lock footprint. Alternatively, it can lock individual sub pages with a higher overall cost but increased concurrency. The optimal locking strategy depends on a variety of parameters, such as the specific page frame (RAM vs. device memory), the frequency of remapping, the level of sharing, and others. Moreover, for resource managers that have long resource hold times (e.g., a root resource manager), the optimal locking strategy may change over time.
In the absence of one general strategy, I provide dynamic adaptation of the lock granularity for sub pages that are mapped out of one super page. The map-node array is protected by a cascade of dynamic locks, using the scheme I developed in Section 4.3.3. In order to lock an entry within the array, the kernel first acquires a primary lock that protects the complete array, followed by another lock for each individual entry. However, the adaptive scheme effectively disables one of the two levels: either the primary coarse-grain lock or the fine-grain locks. Switching between the lock granularities is under the control of the memory manager that maps the super page. When establishing a new mapping, the memory manager can specify whether the locking strategy for the super page should change or remain as is. The memory that backs the fine-grain locks is allocated and deallocated on demand.
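The cascade can be sketched as follows. DynamicLock is a simplified stand-in for the dynamic locks of Section 4.3 (the real demotion protocol is more involved); exactly one of the two levels is normally enabled.

    #include <mutex>

    // Simplified dynamic lock: acquisition becomes a no-op when disabled.
    struct DynamicLock {
        bool       enabled = true;
        std::mutex m;
        void lock()   { if (enabled) m.lock(); }
        void unlock() { if (enabled) m.unlock(); }
    };

    struct SubPageArray {
        DynamicLock  array_lock;     // coarse: covers the whole super page
        DynamicLock* entry_locks;    // fine: one per sub page, allocated on demand
    };

    // Acquire permission to modify one sub-page entry (sketch).
    void lock_entry(SubPageArray& a, unsigned idx)
    {
        a.array_lock.lock();          // no-op if fine-grained locking is selected
        a.entry_locks[idx].lock();    // no-op if the giant lock is selected
    }

    void unlock_entry(SubPageArray& a, unsigned idx)
    {
        a.entry_locks[idx].unlock();
        a.array_lock.unlock();
    }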

5.4.4 TLB Coherency

L4's page permission model uses strict consistency. After completion of an unmap operation, the kernel guarantees that permissions are revoked. For the multiprocessor kernel I had two design choices: (i) restrict the consistency model to a single processor only and push the consistency to user level, or (ii) extend the strict consistency model to all processors.
The first model is infeasible for the following reasons: User-level enforced consistency requires that a resource owner trusts the resource consumer to perform the TLB invalidation cycle on all processors. Alternatively, the resource owner would need one thread on each processor on which the page may be accessed. This requirement applies throughout the whole map hierarchy. L4's resource delegation model is an integral part of IPC, and restricting the extent of resource access permissions for processors via IPC restrictions is not only cumbersome but violates the orthogonality of principles. I therefore chose the second model. Besides reduced complexity, in many cases it is possible to beneficially use architecture-specific behavior, such as IA-32's forced TLB invalidation on context switches [29].
I defined the following design goals for the mapping database's TLB coherency scheme, in line with the overall scalability and performance goals:

• Minimize number of remote TLB shoot-downs. TLB shoot-downs have high startup and execution costs and the kernel should therefore try to avoid them.

• Minimize number of affected processors per TLB shoot-down. TLB shoot-downs should only target processors that actually require a shoot-down, instead of a global broadcast. This design goal is also a requirement for scalability, because otherwise the overhead for TLB shoot-downs would increase with the number of processors in the system (and thus violate Unrau's second design rule).

• Minimize dependency between parallel operations. The completion of an operation must be isolated to the specific resource and not depend on the completion of others.

In order to achieve these goals I introduced a number of kernel-internal tracking mechanisms. Each address space object carries an additional processor tracking mask as described in Section 4.2. The mask keeps track of the potential pollution of TLBs and is derived from the locality of all threads of the address space. When threads migrate between processors, the kernel updates the mask accordingly. The kernel derives the overall set of processors that require a TLB shoot-down by merging the processor masks of the modified address spaces upon permission revocation.3
For unmap operations that cover a large number of mappings, the kernel performs a single combined TLB shoot-down at the end of the update. This optimization, however, may lead to a consistency problem when multiple processors operate on the same mappings. The modification or removal of a map entry by one processor may make another processor complete its operation before all TLBs are in a consistent state. This can happen if the mapping database is already in the state the second processor expects. For example, two processors concurrently reduce page access permissions from read-write to read-only. The first processor updates all page permissions but postpones the TLB shoot-down. The second processor now also reduces the permissions, but only finds read-only permissions. No updates are required and therefore no TLB shoot-downs are issued. However, the operation is only safe after the first processor finishes its TLB coherency update. Hence, without special precautions this scenario would show incorrect behavior that could lead to data leakage or corruption.

3For a definition of the merge operation refer to Section 4.2.2.

There are two critical cases that need special consideration. In the first case the page permissions are reduced but the page itself remains mapped. In the second case, the page permissions are revoked completely. The problem can be further divided into two parts: an existence problem of map nodes and the actual TLB shoot-down. In the following I address both problems in detail.

Map-Node Existence

A thread that performs an unmap operation specifies an area of its own virtual address space. The kernel uses the thread’s associated page table to find the corresponding mapping node and all subsequent mappings. Hence, when the unmap operation clears the page table entry, it also removes the evidence of the previous mapping. Any other thread incorrectly considers the virtual memory area as clean. In an alternative case, the thread still finds a reference to the map node, but the entry is removed by another processor before the thread manages to look it up. I address both problems via a read-copy-update release scheme. Manipulations of map nodes are always protected by a lock on their superseding split nodes. When a map node is removed, the kernel leaves the reference to its split node alive but marks the node as invalid. Invalidating the map nodes enables the algorithm to detect the described race condition. Preserving references to completely unmapped nodes requires special handling in the mapping database. When the kernel revokes the page permissions in the page table, it marks the page table entry as invalid but leaves the reference to the mapping node intact. Subsequent map and unmap operations not only check for a valid page table entry but also for still active map nodes. The map node references serve as an indicator for potentially still ongoing parallel operations. All freed map nodes are enqueued into an RCU free list with their split-node and page table reference pointers still intact. When the RCU epoch expires, the kernel finally clears the map-node references from the page tables before the map nodes are put back into the free memory pool. Clearing the map-node reference or overwriting it with a new mapping requires an atomic compare-exchange to detect potentially concurrent map attempts.4 Figure 5.10 illustrates the described scenario for two address spaces.
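A minimal C sketch of this release scheme is shown below, assuming illustrative names (map_node_t, rcu_enqueue, free_to_pool); it captures the two essential steps: invalidating the node while keeping its references intact, and clearing the page-table reference with an atomic compare-exchange once the RCU epoch has expired.

    #include <stdbool.h>
    #include <stdatomic.h>

    struct split_node;

    typedef struct map_node {
        struct split_node *split;      /* reference kept alive until the RCU epoch ends */
        _Atomic(void *)   *pte;        /* page table entry that still points at this node */
        bool               valid;      /* cleared on unmap, checked by racing operations  */
        struct map_node   *rcu_next;   /* link in the RCU free list                       */
    } map_node_t;

    extern void rcu_enqueue(map_node_t *node);    /* assumed: defer to the RCU free list */
    extern void free_to_pool(map_node_t *node);   /* assumed: return memory to the pool  */

    /* Unmap path: invalidate the node but keep all references intact. */
    void release_map_node(map_node_t *node)
    {
        node->valid = false;           /* split-node lock is held by the caller */
        rcu_enqueue(node);             /* deferred reclamation */
    }

    /* RCU epoch expired: now it is safe to sever the page-table reference. */
    void rcu_reclaim(map_node_t *node)
    {
        void *expected = node;
        /* A concurrent map may already have overwritten the entry; the
         * compare-exchange detects that case and leaves the new mapping alone. */
        atomic_compare_exchange_strong(node->pte, &expected, NULL);
        free_to_pool(node);
    }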

TLB Shoot-down

The kernel synchronizes outstanding TLB coherency updates on split nodes. With the described map-node tracking scheme, the kernel can always safely derive the associated map node for a particular page mapping. The map node in turn references the corresponding split node. TLB updates are parallelized with the TLB versioning algorithm described in Section 4.4. The kernel allocates one TLB version vector per processor.

4Unmap and over-mapping (one map entry is replaced by another) have a few corner cases that require special care. However, a detailed discussion is beyond the scope of this work.


Figure 5.10: Read-copy-update map-node release. A page is mapped in address spaces A and B and has corresponding map nodes. Both map nodes hold references to a common split node (a). After an unmap of the page in B, the map node is unlinked from the mapping tree and enqueued into the RCU stand-by list (b). The references between the page table, map node, and split node remain intact.

The split nodes contain three fields for TLB tracking: (i) a cluster mask for tracking the affected processors, (ii) a TLB version vector epoch counter, and (iii) a reference to the TLB version vector. After a modification of page permissions but before the release of the split node lock, the kernel updates all three fields as follows: When iterating over a mapping tree, the kernel computes the logical OR of all cluster masks of modified address spaces. The kernel then stores the computed cluster mask in the split node, thereby denoting which processors require a TLB shoot-down in order to reach a consistent state. Then, the kernel updates its TLB version vector and merges in the old version vector (see Section 4.4 for details). If there is an update in progress (i.e., the mapping tree was modified but the TLB shoot-down is not yet finished), the old and new cluster masks get merged. The TLB epoch counter stored with the split node enables the kernel to detect such an outstanding update. If the epoch counter is still identical to the processor’s current epoch, the TLB shoot-down is not yet completed and requires special care. The kernel then updates the reference to the current processor’s TLB version vector as well as the version vector epoch counter. After completion of all these updates, the kernel releases the split node lock.

Map and unmap operations achieve a consistent state by performing the outstanding remote TLB shoot-downs themselves. Based on the processor’s version vector and the calculated processor-cluster dirty bitfield, the kernel derives the remote processors that require a TLB shoot-down. However, instead of doing a brute-force shoot-down based on the dirty bitfield, the kernel checks the remote versions first and only forces invalidation for those processors whose TLB version has not yet advanced. Hence, if two processors operate on a set of pages and one operation completes earlier and triggers the TLB invalidation, the second operation may not require another TLB invalidation at all. Furthermore, on architectures where the TLB is automatically invalidated on context switches, normal context switch activity may eliminate the need for a TLB shoot-down altogether.
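The following C sketch illustrates the version-based filter under assumed names (split_node_t, current_tlb_version, send_shootdown_ipi): only processors in the dirty mask whose TLB version has not advanced past the recorded version are interrupted.

    typedef unsigned long proc_mask_t;
    typedef unsigned long tlb_version_t;

    typedef struct split_node {
        proc_mask_t    dirty_mask;      /* processors that still need a shoot-down       */
        tlb_version_t  epoch;           /* epoch of the recorded version vector           */
        tlb_version_t *version_vector;  /* per-processor TLB versions at update time      */
        /* ... lock, tree links, ... */
    } split_node_t;

    extern tlb_version_t current_tlb_version(int cpu);  /* assumed kernel helper */
    extern void send_shootdown_ipi(int cpu);             /* assumed kernel helper */

    /* Performed by whichever operation completes first: only processors whose
     * TLB version has not advanced past the recorded version are interrupted. */
    void complete_tlb_coherency(split_node_t *sn, int num_cpus)
    {
        for (int cpu = 0; cpu < num_cpus; cpu++) {
            if (!(sn->dirty_mask & (1UL << cpu)))
                continue;                               /* not affected */
            if (current_tlb_version(cpu) > sn->version_vector[cpu])
                continue;                               /* already flushed, e.g., by a
                                                           context switch or another unmap */
            send_shootdown_ipi(cpu);                    /* force an invalidation */
        }
    }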

5.5 User-level Policy Management

Providing policy-free kernel primitives is one of the primary design goals for L4 and also applies to the multiprocessor extensions. Per-processor resource allocation and scheduling require explicit mechanisms that allow user-level servers to perform dynamic re-allocation and resource balancing. The two primary resources are threads (for load balancing) and per-processor kernel memory. Efficient allocation of kernel resources from application level requires runtime feedback and safe mechanisms for resource re-allocation. In this section I describe the application of event logging and user-controlled resource management for individual processors in the multiprocessor configuration.

5.5.1 Kernel Event Logging

The information on resource utilization is distributed across a number of components, including the microkernel and multiple application-level system servers. While in the majority of cases servers can trivially provide usage data to a user-level scheduler (such as access information for certain resources via the IPC interface), some information is solely available in the kernel.

In order to expose microkernel-specific runtime information, I extend the kernel with the event logging mechanism described in Section 4.5. The two central design goals are scalability and minimal performance impact on the overall system. A per-processor log buffer with unsynchronized access preserves independence between the system processors. For minimal overhead I instrument the kernel with a set of hand-optimized assembler macros. Furthermore, I use runtime code adaptation for the very few events that are on the critical path for IPC. In case event logs are disabled, the kernel can remove the logging code via binary rewriting [105].

The kernel events that are relevant to the scheduler are specific to the individual scheduling policies. As argued before, a one-size-fits-all solution is unfeasible because of the significant overhead in cache footprint and execution time. For example, a simple performance counter read (rdpmc) on a Pentium 4 costs 141 cycles and a time stamp counter read (rdtsc) costs 88 cycles. Unconditional execution on the critical IPC path would induce an overhead of 14 and 9 percent, respectively (or 23 percent when logging both counters).

The kernel instrumentation consists of three components: the event logging instrumentation for kernel events, a set of log control registers, and the log buffer. Log control registers and log buffers are located in user-readable memory, such that a user-level scheduler has fast access to the logged data.5 Access to the log can be restricted to avoid leakage of security-relevant information.

The kernel uses a double indirection for logging. Each kernel event is associated with a specific control register. The control register contains a reference to the event log, a counter, and a start value with which the counter is reinitialized. On every event the kernel decrements the counter; when it reaches zero the kernel logs an entry and reinitializes the counter with the start value. The log buffer is described by a 64-bit data structure that contains a reference to the buffer, the buffer size (encoded as a mask), and flags that specify the log format. Schedulers have the following per-log configuration flags:

• Current principal. Logs the current resource principal.

• Event parameter. Events have an associated event parameter that is specific to the event type. When enabled, the kernel writes the parameter into the log.

• Time stamp. The kernel supports four alternative time-stamp modes: (i) no time stamp (i.e., off), (ii) hardware time stamp counter (rdtsc on IA-32), (iii) the kernel-internal time (via a periodic interval timer), and (iv) a per-processor counter that is incremented on each log event (for a causal order of events).

• Event counter. The counter contains the delta of entry–exit events or the number of occurrences of a counted event. A separate flag denotes whether the new value overwrites the previous entry in the log or is added to it. This choice enables a scheduler either to accumulate events or to log only the most recent event.

• Event type. Each kernel event has a unique event ID. By logging the event ID, it is possible to merge the events from multiple event sources into one combined log buffer.

5The system uses kernel-provided memory that is allocated at system start time. Dynamic kernel memory allocation schemes, as proposed by Haeberlen [52], are a viable alternative to avoid the limitations of static allocation. However, safe kernel resource management was of less importance for this work and is not further discussed.

The kernel provides two configuration interfaces, one for the event control (event counters and the associated log) and a second for the log itself. The association between kernel event sources and logs can be freely configured by the scheduler. Hence, it is possible to combine multiple events in a single log and also to distribute the events into individual and independent logs. The log memory itself is a linear, unstructured memory buffer and is managed by privileged applications. The kernel does not apply any consistency checks (besides simple boundary checks), and an incorrect configuration will lead to log corruption, but without endangering the kernel’s consistency.
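A possible layout of the two levels of indirection is sketched below in C; the field names and the log_event helper are illustrative assumptions, not the kernel's actual interface.

    #include <stdint.h>

    #define LOG_FLAG_PARAM 0x1     /* illustrative flag: also record the event parameter */

    typedef struct log_buffer {
        uint32_t *base;            /* user-readable memory, allocated at system start   */
        uint32_t  size_mask;       /* buffer size encoded as a mask (power of two)       */
        uint32_t  flags;           /* format flags: principal, parameter, time stamp, ...*/
        uint32_t  head;            /* current write position                             */
    } log_buffer_t;

    typedef struct log_ctrl {
        log_buffer_t *log;         /* reference to the event log                         */
        int32_t       counter;     /* decremented on every event                         */
        int32_t       reload;      /* start value after the counter reaches zero         */
    } log_ctrl_t;

    /* Called from the (non-critical) logging path of an instrumented event. */
    static inline void log_event(log_ctrl_t *ctrl, uint32_t event_id, uint32_t param)
    {
        if (--ctrl->counter > 0)
            return;                                 /* event is sampled, not logged yet */
        ctrl->counter = ctrl->reload;

        log_buffer_t *log = ctrl->log;
        log->base[log->head++ & log->size_mask] = event_id;
        if (log->flags & LOG_FLAG_PARAM)
            log->base[log->head++ & log->size_mask] = param;
        /* unsynchronized: the buffer is strictly per-processor */
    }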

Event Sources

The most important kernel events are those that either cannot be derived without specific kernel support or would require trustworthy applications. The latter is an assumption that does not hold for most scenarios. The kernel exports the following per-processor event sources:

1. Change of number of runnable threads. L4’s in-kernel scheduler hides the number of runnable threads in the system. Previous research [14, 96] identifies the run-queue length as an important indicator for load imbalances and balancing decisions. This event source exposes the run-queue length of the current processor. Note that the common case of client-server IPC does not change the number of runnable threads in the system. Hence, the event is off the critical IPC path.

2. Idle time. When a processor falls idle the kernel enters a low power state for that processor. The processor gets reactivated by a hardware, timer, or interprocessor interrupt. The overall per-processor idle time is relevant for a scheduler in order to re-allocate the load from highly loaded processors to underutilized processors. Idle-time logging is obviously less performance critical since the processor is idle anyway.

3. Kernel memory pool. The kernel maintains a per-processor memory pool that serves the main kernel memory consumers: thread control blocks, address space objects, page tables, and the mapping database. In order to scale, each processor has an individual memory pool. In case a pool runs short, memory has to be reallocated between processors. In order to keep the kernel policy-free (as stated in the initial design goals), pool re-balancing is a policy implemented at application level. The kernel mechanism is detailed in Section 5.5.3. The in-kernel memory allocator keeps track of the available memory in the free pool. Memory allocations and deallocations create log entries such that the memory manager is aware of each processor’s current pool status.

In addition to the per-processor events, the kernel has a number of per-resource-principal events. L4’s resource principals are threads, and the most important system event is IPC. Previous research identifies communication patterns as an important indicator for process inter-relations. Scheduling policies consider those relations for more efficient thread placement. However, logging IPC on a per-thread basis is unfeasible due to the extremely high frequency of IPC in a component-based system and the large number of threads. On the other hand, many communication relations between threads are well-known, static, and irrelevant for a scheduler. Based on the fundamental idea of resource containers, I introduce an additional logical accounting level on top of the kernel’s resource principals: thread domains. Each thread is associated with exactly one domain and its association is managed by a privileged user-level server. Only switches between different thread domains trigger a kernel event. The domain ID is stored within the kernel’s thread control block. For threads within the same domain the additional event-logging runtime overhead is reduced to a simple comparison and a conditional, non-taken jump. Domains have a configuration register that associates a log with the domain. The association can be changed from application level and multiple domains may share a single log buffer. Furthermore, the kernel has one special domain — domain zero — that does not lead to event recording. Initially, all threads are allocated to that domain. By co-locating the domain ID with other frequently read data on the critical path, the execution overhead is negligible.
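The fast-path check can be pictured as in the following C sketch, where tcb_t and log_ipc_event are illustrative names; only a switch between different domains leaves the fast path.

    typedef struct tcb {
        unsigned domain_id;        /* co-located with other hot TCB fields on the IPC path */
        /* ... */
    } tcb_t;

    extern void log_ipc_event(tcb_t *from, tcb_t *to);   /* assumed slow-path helper */

    static inline void ipc_account(tcb_t *from, tcb_t *to)
    {
        /* Common case: both threads belong to the same domain --
         * a single comparison and a non-taken branch. */
        if (from->domain_id != to->domain_id)
            log_ipc_event(from, to);    /* slow path; assumed to also filter domain zero */
    }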

5.5.2 Thread Scheduling (Load Balancing)

L4 threads are bound to their specific home processor and the kernel does not apply a thread migration policy (a design decision detailed in Section 5.2.2). Thread migration is solely initiated by user-level load balancers. The core migration operation is migrate thread T to processor Pdest.

The frequency of thread migrations is bounded by the cost of migrating a thread’s active cache working set. The primary thread migration scenarios are initial placement and load balancing. The kernel places newly created threads on the home processor of the creator; thus, thread creation is a processor-local operation that does not require interaction or synchronization with other processors. Thread creation is a privileged operation and can only be executed by a restricted set of (privileged) threads [49]. The design scales because the privileged server can spawn multiple, processor-local worker threads that can operate in parallel.

Thread migration is performed via a system call that relocates the thread to its new home processor. Migration may be initiated from any of the processors, independent of the current home location of the to-be-migrated thread. Such flexibility is required to support a wide variety of load balancing and placement schemes, such as work stealing [21] and push-based policies. There are three potential migration scenarios: (i) from the current to a remote processor, (ii) from a remote to the current processor, and (iii) between two remote processors.


Figure 5.11: Concurrent migration attempts from CPU 0 and CPU 2. The thread on CPU 0 successfully completes the migration before the second migration request is started.

The lock-free per-processor scheduler structures force an in-kernel message-based scheme for migration requests. Because remote queues must not be accessed, a scheduler needs to send a request message to the migrated thread’s home processor in order to dequeue it from all scheduling queues, update the home processor, and potentially update the thread’s processor isolation mask. The migration is finalized by enqueuing the thread into the new home processor’s requeue list. With the next scheduling interval the thread gets integrated into the normal scheduling queue and the migration completes.

Since thread migration can be initiated from all processors, the design may lead to a race condition. When two processors concurrently try to migrate the same thread, both processors will initiate a migration request to the home processor. Global ordering of message-based kernel synchronization guarantees that one of the migration requests completes successfully while the other one fails (see Figure 5.11). The request that comes second will find that the to-be-migrated thread is no longer located on the home processor. However, simplistic forwarding of the migration request to the thread’s new home processor may result in starvation, since the thread may have been migrated again in the meantime.

I eliminate the race condition via kernel-user interface design. The solution is generically applicable to other user–kernel race scenarios and is exemplified here for the thread migration case.

Concurrent migration requests (as described in the scenario above) represent an unsynchronized resource allocation scheme. Enforcing a strict order of operations in the kernel requires additional synchronization. Such synchronization, however, moves synchronization from application level into the kernel and contradicts the initial design constraint: maximizing performance for the well-designed application. Careful interface design can achieve the same result without the additional overhead.

Thread scheduling is based on the current system state and an intended system state. Thread migration is the operation that initiates the transition between both states. Two concurrent migration requests express one of the following four cases: (i) the first migration request has not yet completed and a new migration request is already initiated, (ii) the second migration request started after the first but arrived earlier due to message processing delays, (iii) the user-level scheduler did not synchronize migration at all (which effectively is a bug), and (iv) two schedulers run an attack against the kernel. Only the first and second case require correct handling by the kernel. In both cases the race condition manifests as a timing problem of concurrently issued migration requests on different processors.

I enable applications to detect the race via an additional temporal parameter and thereby move the request ordering out of the kernel to the application. For thread migrations, the scheduler has to specify an additional parameter that identifies the expected system state (i.e., the scheduler’s view of the current system state), expressed by the current home processor of the thread. Hence, a thread migration operation has three parameters: the thread T, the source processor Psource, and the destination processor Pdest. A race condition occurs if Psource ≠ Thome, in which case the kernel operation fails completely. The scheduler can then recover gracefully and restart the operation.
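A C sketch of such an interface is shown below; the identifiers (sys_migrate_thread, dequeue_from_scheduler, enqueue_on_requeue_list) are illustrative assumptions. The kernel merely compares the caller's expected home processor with the actual one and fails the request if that view is stale.

    typedef int cpu_t;

    typedef struct tcb {
        cpu_t home_cpu;      /* only modified on the thread's home processor */
        /* ... */
    } tcb_t;

    typedef enum { MIGRATE_OK, MIGRATE_STALE_SOURCE } migrate_result_t;

    extern void dequeue_from_scheduler(tcb_t *t);            /* assumed helper */
    extern void enqueue_on_requeue_list(tcb_t *t, cpu_t p);  /* assumed helper */

    /* Executed on the thread's home processor after the in-kernel request
     * message arrives there. */
    migrate_result_t sys_migrate_thread(tcb_t *t, cpu_t p_source, cpu_t p_dest)
    {
        if (t->home_cpu != p_source)
            return MIGRATE_STALE_SOURCE;    /* lost the race: the scheduler retries
                                               with an updated view of the system state */
        dequeue_from_scheduler(t);          /* remove from all local scheduling queues  */
        t->home_cpu = p_dest;
        enqueue_on_requeue_list(t, p_dest); /* picked up at the next scheduling interval */
        return MIGRATE_OK;
    }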
The kernel supports migration of groups of threads, which minimizes the per-thread migration overhead and also preserves scheduling inter-dependencies.6 The operation is still safe and bounded by requiring that all threads reside on the same source processor and are migrated to the same destination processor. Since thread migration requires execution on the threads’ home processor (which is identical for all of them), the migration is race-free once started and guaranteed either to complete successfully or to fail as a whole.

5.5.3 Kernel Memory Management

The recursive virtual address space model is a flexible mechanism that decouples memory management and memory allocation policies from the kernel. Yet, the kernel consumes memory for kernel metadata, such as thread control blocks, page tables, and the mapping database. In order to scale, each processor has a separate in-kernel memory pool. Since control over the physical memory resources is at application level, the kernel depends on user-level applications to manage the kernel pools.

6Some synchronization constructs assume that high-priority threads always preempt low-priority threads. When threads are migrated one-by-one, threads that have been mutually exclusive may run concurrently during the migration phase.

Similar to the solution proposed by Liedtke et al. [74], I use a cooperative memory resource allocation scheme between the kernel and a user-level memory manager. The kernel has one system thread per processor that handles memory allocations and deallocations. The root memory server, σ0, can grant memory to the system threads, which then place the memory into the per-processor pool. The operation is safe because the kernel enforces that no user application has access rights to the page.

Per-processor pools incur a balancing and locality problem. The memory pool for allocation and deallocation depends on the resource principal (i.e., the thread) that performs the respective operation. Depending on thread locality and system structure, kernel memory may be allocated only on one processor and released only on another. Hence, the memory pool of one processor depletes while another processor’s pool overflows. Furthermore, to preserve memory locality in NUMA systems, memory has to be local to individual processors. Mixing memory from different memory pools may result in poor performance when remote memory is used for critical kernel metadata (such as thread control blocks).

I address both problems by strictly preserving pool locality for memory. Kernel memory is allocated at page granularity and free pages are maintained in a linked list. In addition to the primary free list, the kernel maintains two further memory lists, a remote standby list and a requeue list. When a page is granted to the kernel, the kernel records a home processor for the page in the mapping database. On page deallocation the kernel validates whether or not the page is freed on the home processor and either enqueues the page back into the normal memory pool or into the remote standby list. When the RCU token arrives at the processor, the kernel walks the standby list and relocates the pages from the standby list into the requeue lists of the pages’ home processors. Furthermore, the processor places all pages from its own requeue list back into the free pool. Figure 5.12 illustrates the scheme for a configuration with three processors.

On processors that share one memory bus I enable an allocation optimization. Instead of assigning memory pages to one individual home processor, pages are assigned to a home processor cluster. The kernel directly places a page into the free list if it is released by any processor specified in the processor cluster. That optimization still preserves NUMA locality but reduces some of the requeuing overhead.

Each processor logs the free pages in the kernel-provided event log. The user-level memory managers monitor the event log and rebalance memory between the memory pools. Rebalancing is realized by first revoking memory from the kernel and granting it back to the kernel on another processor (as shown in Figure 5.13).
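The deallocation path and the token-driven requeue step could look as in the following C sketch; all identifiers (kpage_t, kmem_pool_t, kfree_page, kmem_rcu_tick) are illustrative assumptions.

    typedef int cpu_t;

    typedef struct kpage {
        cpu_t         home_cpu;    /* recorded when the page was granted to the kernel */
        struct kpage *next;
    } kpage_t;

    typedef struct kmem_pool {
        kpage_t *free;             /* pages available for local allocation             */
        kpage_t *standby;          /* pages that belong to other processors            */
        kpage_t *requeue;          /* pages returned to this processor by other CPUs   */
    } kmem_pool_t;

    /* Deallocation: local pages go straight back to the free list, foreign pages
     * are parked on the standby list until the RCU token arrives. */
    void kfree_page(kmem_pool_t *mypool, kpage_t *page, cpu_t current_cpu)
    {
        if (page->home_cpu == current_cpu) {
            page->next   = mypool->free;
            mypool->free = page;
        } else {
            page->next      = mypool->standby;
            mypool->standby = page;
        }
    }

    /* Executed while this processor holds the RCU token; since there is only one
     * token in the system, no two processors touch requeue lists concurrently. */
    void kmem_rcu_tick(kmem_pool_t pools[], kmem_pool_t *mypool)
    {
        /* Hand foreign pages back to their owners' requeue lists ... */
        kpage_t *p = mypool->standby;
        while (p != NULL) {
            kpage_t *next = p->next;
            p->next = pools[p->home_cpu].requeue;
            pools[p->home_cpu].requeue = p;
            p = next;
        }
        mypool->standby = NULL;

        /* ... and move pages that other processors returned into the free pool. */
        p = mypool->requeue;
        while (p != NULL) {
            kpage_t *next = p->next;
            p->next      = mypool->free;
            mypool->free = p;
            p = next;
        }
        mypool->requeue = NULL;
    }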

5.6 Summary

In this chapter I described the application of dynamic locks, TLB coherency tracking, and event logging to L4Ka::Pistachio, an L4 microkernel. The fundamental design paradigms of L4 are minimalism, strict orthogonality of abstractions, and user-level policies.


Figure 5.12: Per-processor kernel memory pools with processor-local memory. CPU 2 holds the RCU token and moves the pages from its requeue list into its free list. Furthermore, CPU 2 empties its standby list and places the pages into the page owner’s requeue list. Requeue operations require possession of the RCU token. Since there is a single token in the system, no concurrent requeue operations can take place and list synchronization is not required.

In microkernel-based systems, operating system invocations are replaced by inter-process communication. Hence, the performance of the IPC primitive is of paramount importance for overall system performance. Multiprocessor synchronization primitives induce unnecessary overhead on the IPC mechanism, while a second communication primitive specifically tailored for cross-processor communication would increase the kernel’s cache footprint and also lead to higher application complexity. The cluster mask combined with dynamic locks provides a flexible and efficient solution that accommodates both communication scenarios with a single kernel IPC primitive (Section 5.3).

Section 5.4 described the multiprocessor extensions to L4’s virtual address space model. L4 maintains a kernel data structure that tracks the memory permissions of address spaces on a per-page basis. I extended the kernel data structures such that applications can adapt them for coarse-grain or fine-grain locking. Furthermore, the developed tracking scheme for TLB coherency updates decouples parallel page permission updates.

Finally, in Section 5.5 I presented the application of the event-logging scheme to user-level resource scheduling. In monolithic kernels, resource balancing policies are implemented as kernel policies. A kernel policy contradicts L4’s design principles. The presented event-logging mechanism provides runtime feedback to a user-level load balancer, which can then reallocate resources as needed.


Figure 5.13: Memory balancing between kernel memory pools. In (1) the page is reclaimed on CPU 1, in (2) the managing thread on CPU 3 is notified, which then (3) places it into the local memory pool on CPU 3.

The scheme was used for two important kernel resources: threads and kernel memory. The following chapter evaluates the performance and scalability of the described design for the L4Ka::Pistachio kernel.

Chapter 6

Experimental Verification and Evaluation

In this chapter, I evaluate the performance of my kernel design as implemented in L4Ka::Pistachio. I evaluate the overhead and scalability of all individual kernel operations in a set of microbenchmarks. I compare the costs of individual kernel operations for different multiprocessor workload configurations: intra- vs. inter-processor operations and low vs. high concurrency. Scalability is evaluated according to Unrau’s design requirements for scalable operating systems (preserved parallelism of operations, bounded operations independent of the number of processors, and memory locality). Furthermore, the benchmarks evaluate the overhead of the multiprocessor primitives relative to the baseline performance given by the uniprocessor kernel.

Section 6.1 describes the hardware details of the evaluation platform that was used for the different benchmarks. Section 6.2 discusses the performance and scalability of the IPC primitive. Section 6.3 then evaluates the performance and scalability of the event-logging facility, followed by an evaluation of the kernel memory management subsystem in Section 6.4. Finally, I summarize the evaluation in Section 6.5.

6.1 Evaluation Platform

All performance evaluations were performed on IA-32 systems. The main test system was an IBM xSeries 445 server with eight 2.2 GHz Intel Xeon processors based on the Pentium 4 (P4) micro-architecture. Each physical processor had an 8 KB L1 data cache, a 12 Kµop trace cache, a shared 512 KB L2 cache, and a shared 2 MB L3 cache. It further featured a 64-entry data TLB and a 64-entry instruction TLB.1 Each processor had two logical threads (HyperThreads) enabled. The system consisted of two processor boards with four processors each.

1Intel family 15, model 2, stepping 6

Operation                                           Overhead (cycles)
atomic exchange (xchg)                                           125
atomic compare-exchange (lock cmpxchg)                           136
atomic decrement (lock dec)                                      123
address space switch (mov %r, CR3)                               827
entering and exiting kernel (sysenter, sysexit)                  135
entering and exiting kernel (int, iret)                          839
L1 hit                                                             2
L1 cache miss, L2 hit                                             19
L2 cache miss, L3 hit                                             43
memory access latency                                            206

Table 6.1: Overheads of individual operations on the test system.

Each board had separate memory (2 GB each) and both boards were interconnected via a proprietary NUMA memory interconnect developed by IBM.

The Pentium 4’s micro-architecture has some specifics that I want to highlight. The P4 features a trace cache that caches instructions as translated µops instead of the normal instructions as read from memory. The cache is virtually tagged and the P4 does not support address space identifiers. On an address space switch the processor not only invalidates its 64 I-TLB and 64 D-TLB entries, but also flushes the trace cache and the L1 D-cache. Hence, on every address space switch the processor invalidates a significant amount of its active working set, resulting in a higher context switch overhead than on other IA-32 micro-architectures, such as Intel’s Pentium III and AMD’s Opteron.

IA-32 has a relatively strong memory ordering model, defined as write ordered with store-buffer forwarding [29]. The highlights include: (1) reads can be carried out speculatively and in any order, (2) reads can pass buffered writes, but the processor is self-consistent, and (3) writes to memory are always carried out in program order (with some exceptions). Such a strong ordering model has implications for speculative execution and creates a high overhead for atomic instructions, because the processor has to honor the ordering model and loses some optimization potential of speculative execution. Table 6.1 lists the measured costs for a set of important operations on the test system, including the costs for atomic memory operations, entering and exiting the kernel, and switching address spaces.

The test system had eight processors with a total of sixteen processor contexts. Literature suggests that scalability problems in many cases only become visible for systems with more than 32 processors. I was limited by these hardware constraints; however, I argue that the results are still relevant and expressive. As argued in Section 3.1.1, scalability of operating systems is not evaluated by the speedup of an operation but by its response time. An operating system is considered scalable if the response time is independent of the number of processors in the system. In the following, I evaluate the overhead of individual kernel operations and their response time with increasing parallelism. I assumed that the hardware architecture of the test system scaled sufficiently.

The number of processors also limited the evaluation of the effectiveness of the cluster mask. While the reference implementation was able to handle the full cluster mask, the processors of the test system always fit into the provided 16 bits of the bitmap.

6.2 Inter-process Communication

I evaluated the overhead of the IPC primitive for the most important communication scenarios: client-server communication and cross-processor communication. In the client-server scenario, I further differentiated between two cases: (1) inter-address-space IPC, where both communicating partners resided in different address spaces, and (2) intra-address-space IPC, where both communicating threads resided in the same address space. Intra-address-space IPC is relevant for worker-thread scenarios where a central distributor hands off requests to the workers, but also for in-address-space synchronization threads or memory pagers [12]. All microbenchmarks measured the overhead of the kernel operation by sending messages in a tight loop that was executed one thousand times. Afterward, I calculated the average cost for one IPC operation.
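The measurement loop can be pictured as in the following C sketch; the ipc_call wrapper and thread identifiers are assumptions, and the time-stamp counter (rdtsc) supplies the cycle counts on the test system.

    #include <stdint.h>
    #include <stdio.h>

    #define ITERATIONS 1000

    extern void ipc_call(int server_tid, int message_bytes);  /* assumed send-and-wait wrapper */

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    void run_ipc_benchmark(int server_tid, int message_bytes)
    {
        uint64_t start = rdtsc();
        for (int i = 0; i < ITERATIONS; i++)
            ipc_call(server_tid, message_bytes);   /* one request/reply round trip */
        uint64_t cycles = rdtsc() - start;

        printf("%d-byte message: %llu cycles per round trip\n",
               message_bytes, (unsigned long long)(cycles / ITERATIONS));
    }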

6.2.1 Processor-local IPC

Inter-address-space IPC is the common case for client-server interaction and is used to transfer request and response payloads for user-level IPC protocols; the operation is completely executed on one processor. The cost is dominated by the previously mentioned overhead due to TLB and cache invalidations.

I used the uniprocessor IPC primitive as the baseline and compared it against the multiprocessor IPC with disabled and enabled locks. The results showed that the performance of the lock-free version on multiprocessor systems was almost identical to the uniprocessor variant. The overhead stemmed from a few additional checks that were required to differentiate between local and remote threads. These extra instructions increased the cost by about 30 cycles (or about 3 percent).

The overhead for IPC on the P4 micro-architecture was dominated by the cost of kernel entry and exit and the overhead due to cache and TLB invalidations on each context switch. The primary costs were due to entering and exiting the kernel (135 cycles) and the overhead for switching address spaces by reloading CR3. The cost of an address space switch including its follow-on cost accounted for 629 cycles.


Figure 6.1: Inter-address-space IPC performance on the same processor for the uniprocessor kernel and the multiprocessor kernel with enabled and disabled lock primitives. The overhead is dominated by TLB and L1 cache flushes on the Pentium 4 architecture. The initial jump in the curve reflects startup costs for the message copy loop.

The remaining cost for the IPC and the benchmark code was 226 cycles. A more detailed break-down was not possible because of secondary effects in the micro-architecture. In some cases, the insertion of additional no-ops led to performance improvements.

The IPC variant with spin locks enabled showed an overhead between 16 and 20 percent, depending on the message size. The two required locks induced an overhead of 190 cycles, which was below the predicted 250 cycles. I attributed the difference partially to additional wrapper code and partially to secondary effects such as pipeline stalls and serialization (more details follow). Figure 6.1 shows a detailed graph that compares IPC on a uniprocessor kernel against IPC on a multiprocessor kernel. The graph shows the performance of both multiprocessor IPC variants, with enabled and with disabled locks.

Figure 6.2 shows the same microbenchmark as before, except for intra-address-space IPC. The overhead between the uniprocessor and the multiprocessor variant with disabled locks was between 30 and 80 cycles, depending on the message length. However, the cost for uniprocessor IPC did not show a linear increase of runtime relative to message length and even dropped significantly for longer messages. I was able to confirm the effect in multiple experiments and attributed it to specifics of the P4 micro-architecture. The copy loop used an optimized copy mechanism that favors complete cache line transfers. There were no serializing instructions on the critical path, allowing the processor to perform heavy speculation. The curve showed the expected constant increase with the introduction of the additional multiprocessor support code.


Figure 6.2: Intra-address-space IPC performance on the same processor with enabled and disabled lock primitives. The cost of the lock primitive is dominant when compared to inter-address-space IPC, where the highest overhead is due to TLB and cache flushes.

The lower overall cost of intra-address-space IPC resulted in a higher relative increase due to spin locks. The two additional locks added between 35 and 64 percent overhead to each IPC.

6.2.2 Cross-processor IPC

In the next experiment, I evaluated the performance of cross-processor IPC. The experiment used the same microbenchmark with two threads that repeatedly sent a message to each other. This time, however, both threads were located on different physical processors on the same NUMA node. Neither processor ran any other workload and therefore did not have to preempt other threads or switch between address spaces.

The microbenchmark evaluated the round-trip latency of cross-processor IPC with lock-based versus message-based synchronization. The lock-based synchronization acquired a spin lock on the remote processor, transferred the message, and released the lock. Then, the receiver thread was enqueued into the remote requeue list and an IPI was sent in order to trigger a rescheduling cycle on the remote processor. The message-based synchronization required two in-kernel messages: one for initiating the IPC, and a second indicating that the message delivery could start. An additional IPI may have been required to trigger the remote rescheduling cycle (see also Figure 5.3).

As expected, the latency for the message-based scheme was significantly higher than for the lock-based scheme.


Figure 6.3: Cross-processor IPC performance between two independent processors on the same memory bus. The lock-based version used a spin lock on the remote thread and signaled a rescheduling cycle. The lock-free version required two in-kernel RPCs per message.

The latency increased by between 153 and 170 percent, depending on the message length. Figure 6.3 shows a detailed graph comparing the latency of both primitives.

6.2.3 Parallel Programming An important workload for large multiprocessor systems is parallel programs. A parallel program splits one task into a set of subtasks. The subtasks can be solved on different processors in parallel and a finalization step combines the partial re- sults. Common parallel programming environments are MPI [39] and OpenMP [10]. MPI distributes the workload via messages to workers which then compute autonomously and do not require shared memory. In contrast OpenMP fundamen- tally depends on shared memory between workers and uses compiler support for automated parallelizations (e.g., for loops). A performance requirement for all parallel programming models is a low- latency and low-overhead communication mechanism across processor boundaries. I put the primary focus on the OpenMP fork-join programming model, since MPI is unlikely to base on the microkernel’s IPC primitive.2 OpenMP

2MPI is commonly used for large scientific computations that are long running and often distributed over multiple physical machines. The workloads are explicitly parallelized and hand-optimized with the goal of maximizing parallelism and minimizing overall computation time. Therefore, MPI applications are unlikely to run in shared-workload environments but have complete control over the system and the scheduling regime. Simple spin-waiting on a shared memory location at user level has a lower latency and is significantly more efficient than using a blocking IPC primitive. The compute time wasted for spinning is irrelevant since there are no other runnable jobs in the system.


Figure 6.4: OpenMP fork-join programming model for parallel programs. CPU C executes the master thread that distributes the load and waits for completion at join points.

OpenMP uses one master thread that controls the general flow of the program and a set of worker threads that execute the parallelizable subtasks. The execution model can be summarized as frequent fork-join sequences initiated by the master thread (as illustrated in Figure 6.4).

The literature suggests a variety of schemes for synchronization and workload distribution for the fork-join model. In time-sharing environments, it is generally unfeasible to have idle threads spin until new work is available, since doing so may significantly degrade overall system performance. The significant overhead of on-demand creation of worker threads favors a model with a pool of pre-allocated worker threads. The threads are started but block if there are no outstanding requests. When new requests become available, the master unblocks the workers. Obviously, the latency of the wakeup operation is extremely performance critical.

I evaluated the latency of IPC in the context of an OpenMP microbenchmark as proposed by Bull [23]: the PARALLEL directive. This particular benchmark evaluates the overhead of the synchronization primitives of the underlying OpenMP framework, including the operating system’s scheduling and signaling mechanisms. The actually executed operation is a simple mathematical computation which is irrelevant for the benchmark (but is there to avoid dead-code elimination by the compiler).

Support of a fully-fledged OpenMP suite was beyond the scope of this work. Therefore, I evaluated the performance-critical operations with a manual implementation of the test cases. The benchmark created one worker thread per processor. The worker threads entered a blocked state and waited for IPC delivery from the master thread. In order to distribute the workload among the workers, the master thread sent an IPC to each of the worker threads. Afterward, the master thread waited for a notification message.

The last completing thread sent a notification back to the master; that thread was determined via a shared memory variable that is decremented atomically. The following is a code sketch of the benchmark in C:

    int CompletionCount;

    void MasterThread()
    {
        CompletionCount = NumWorkers + 1;
        for (idx = 1; idx <= NumWorkers; idx++) {
            SendMessage (WorkerThreadId[idx]);
        }
        while (AtomicDecrement (CompletionCount) > 0) {
            WaitForMessage (AnyThread);
        }
    }

    void WorkerThread()
    {
        while (true) {
            WaitForMessage (MasterThreadId);
            DoWork();
            if (AtomicDecrement (CompletionCount) == 0) {
                SendMessage (MasterThreadId);
            }
        }
    }

I measured the performance of the lock-based and the message-based IPC primitive for an increasing number of processors. Figure 6.5 compares the latency for both IPC types. As expected, the lock-based IPC primitive had a significant performance advantage over the message-based model. The latency increased by a factor of 2.5 for two processors up to a factor of 3.6 for sixteen processors. Note that the knee in the curve was induced by the higher memory and signaling latency when communicating across NUMA node boundaries.

The high overhead for message-based IPC can be explained by two factors. First, message-based IPC required multiple in-kernel messages and thus had a longer overall latency. Second, the lock-based IPC scheme was able to operate in parallel to the rescheduling IPI. The master thread enqueued the worker into the requeue list and signaled the IPI. During the IPI delivery it could already start sending a message to the next worker. In the message-based IPC, the master thread was blocked for the duration of two in-kernel message deliveries (from the master’s to the worker’s CPU and back).

The benchmark showed the significant benefit of lock-based IPC for latency-sensitive cross-processor signaling.


Figure 6.5: Signaling latency for the fork-join programming model as used in OpenMP. The signaling latency is directly related to the overall execution time and therefore limits the speedup that can be achieved via a parallelized workload.

6.2.4 Scalability and Independence

In this benchmark, I evaluated the scalability of the IPC primitive following Unrau’s design principles: (1) preserving parallelism in the kernel and (2) bounded overhead of operations independent of the number of processors. The qualitative analysis follows my extended construction principle: preserving isolation.

The benchmark was identical to the previously described IPC benchmark: two threads on the same processor repeatedly sent a message to each other. This time I ran the benchmark on multiple processors in parallel in order to evaluate potential interference, serialization, and thus limits of scalability. The benchmark created a pair of threads on each processor. All processors started the benchmark synchronously. After completion of the benchmark, each thread pair reported the execution time into a processor-local log. The benchmark code was located in CPU-local memory and thus avoided interference at application level.

Figure 6.6 shows the IPC overhead for different message sizes. In addition, I varied the number of processors that ran the benchmark in parallel, starting with one processor and increasing up to eight. The reported execution time is the average execution time for all active processors. There was no interference and the operation scaled optimally according to Unrau’s scalability requirements. The benchmark was limited to one processor context (hyperthread) per physical processor in order to avoid interference in the instruction units of SMT processors.


Figure 6.6: Parallel inter-address-space IPC with an increasing number of processors. The overhead per IPC was averaged over all processors. The curve shows no interference between the processors.

6.3 Event Logging

I evaluated the overhead and scalability of the event logging facility as described in Section 5.5.1. The benchmark addressed two aspects: the overhead of the event logging instrumentation on the kernel performance and the scalability of the operation (following Unrau’s scalability requirements).

I measured the overhead of event logging on the critical IPC path. The benchmark is similar to the previous one, with two threads that sent a message to each other. Each thread in the system was associated with a log domain. The domain ID was stored in the thread control block. IPC events were only logged for communicating threads of different domains. The baseline performance was a kernel with the event logging instrumentation disabled (i.e., no logging code in the kernel).

The test for equal domains requires three additional assembler instructions on the critical code path. The actual logging code is moved off the critical code path under the assumption that in the common case IPC will take place between threads within the same domain. Each domain has an associated counter and a reference to a log control field. The log control field specifies which values are logged and contains a reference to the log buffer. The log buffer size was 128 bytes.

Figure 6.7 and Figure 6.8 show the overhead for different logging configurations on the 8-way 2.2 GHz Xeon. The additional domain test for the common case incurred a negligible overhead and for the inter-address-space case even outperformed the non-instrumented code path. In order to achieve comparable results, I replaced the logging test with two 6-byte no-op instructions; otherwise, the kernel binaries were identical.


Figure 6.7: Logging overhead for the Intel Xeon 2.2 GHz for inter-address-space IPC. (a) baseline performance, kernel contains no logging code; (b) common case, kernel contains logging code, but threads belong to the same domain and no log is recorded; (c) accounting code executed but no data written into the log; (d) log of one entry (4 bytes); (e) log of four entries (16 bytes); (f) log of five entries (20 bytes) including the processor’s time-stamp counter read via rdtsc.

I further evaluated the scalability of the logging facility. Similar to the IPC scalability benchmark in Section 6.2.4, I ran multiple parallel threads to evaluate the independence and potential interference. The reported values are the average cost per IPC for messages of increasing size, from 0 bytes to 252 bytes. As expected, the logging code did not influence scalability; the overhead remained constant for all processors and was identical to the overhead for a single processor (see Figure 6.9).

6.4 Memory Management

I evaluated the overhead of the unmap primitive using coarse-grain vs. fine-grain locks as described in Section 5.4. I created a test task with one thread in an address space that mapped 1024 4-KByte pages out of one 4-MByte page. In the first case, each mapping was protected by an individual lock, while in the second case all mappings were protected by one coarse lock. In the benchmark I evaluated the latency for repeatedly removing the write permissions from the pages using unmap. The unmap operation walks the page table, performs a lookup of the associated map node, acquires the lock, and then clears the permissions.


Figure 6.8: Logging overhead for the Intel Xeon 2.2 GHz for intra-address-space IPC (legend as in Figure 6.7). The benchmarks for (a) native and (b) disabled logs show an anomaly for the zero-byte message that I attributed to trace-cache rebuilding on the Pentium 4 [29].

I chose this operation as a general representative for unmap because it allowed the benchmark to repeatedly perform the same operation on the mapping database, thereby avoiding side effects such as cache effects or different map node allocations. When revoking write permissions, the kernel performs exactly the same operations as a normal unmap except for finally freeing the map node. Freeing the map node only adds a static runtime overhead; using local memory pools makes the operation independent of other processors.

6.4.1 Locking Overhead

The benchmark repeatedly revoked write permissions via the unmap operation for a varying number of pages (powers of two). The benchmark repeated the operation 200 times and calculated the average cycles per mapping and the 95% confidence interval. The overhead for entering and exiting the kernel was the dominating cost of the operation when modifying only a few pages.3 When the benchmark unmapped a larger number of pages (e.g., as required for implementing copy-on-write for fork or for the deletion of an address space), the lock overhead accounted for 185 cycles per mapping, which is a 41 percent runtime overhead.
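The statistics could be computed as in the following C sketch, assuming a time_unmap_cycles helper that times one batched unmap invocation; the 95% confidence interval uses the normal approximation, which is reasonable for 200 samples.

    #include <math.h>
    #include <stdio.h>

    #define RUNS 200

    extern double time_unmap_cycles(int pages);   /* assumed: one timed unmap call */

    void report_unmap_cost(int pages)
    {
        double sample[RUNS], sum = 0.0, sq = 0.0;

        for (int i = 0; i < RUNS; i++) {
            sample[i] = time_unmap_cycles(pages) / pages;   /* cycles per mapping */
            sum += sample[i];
        }
        double mean = sum / RUNS;

        for (int i = 0; i < RUNS; i++)
            sq += (sample[i] - mean) * (sample[i] - mean);
        double stddev = sqrt(sq / (RUNS - 1));

        /* 95% confidence interval for the mean, normal approximation (n = 200). */
        double ci = 1.96 * stddev / sqrt((double)RUNS);

        printf("%d pages: %.1f +/- %.1f cycles per mapping\n", pages, mean, ci);
    }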

3IA-32 supports two different methods for entering and exiting the kernel: sysenter and int. The sysenter operation is highly optimized and almost an order of magnitude faster than int. In the current implementation of L4Ka::Pistachio sysenter is only used for the IPC primitive. All other system calls — including unmap — use the slower int operation. The main reason for this design decision was to maximize the available registers for the IPC operation. An alternative and already considered system-call encoding would drastically reduce the base overhead for unmap.


Figure 6.9: Average execution time of parallel inter-address-space IPC with event logging for an increasing number of processors. The reported values are the average costs for messages of 0 bytes to 252 bytes.

Application    Apache 2    MySQL    sshd    Emacs    bash
Pages              2661     4453    1043     1513     185

Table 6.2: Page working sets for a set of common Linux applications. The values are derived from a standard Suse 9.1 Linux system.

Figure 6.10 depicts the measured overhead for coarse-grain and fine-grain locking for different numbers of pages. The benchmark showed that for moderately populated address spaces with only 64 pages (corresponding to 256 KByte of memory), the runtime overhead was 15 percent; it was 24 percent for 128 pages and 32 percent for 256 pages. For comparison, Table 6.2 lists the number of mappings for some common Linux applications. Even the smallest application — the bash shell — has a memory working set of 185 pages. Additionally, the individual locks incur a higher cache footprint with one cache line per lock (in order to avoid false sharing), increasing the footprint in the data cache.

6.4.2 Scalability

In another benchmark, I evaluated the scalability of the two locking schemes. In parallel to the previously described task, I created additional tasks, one on each processor, each of which mapped one of the memory pages into its address space. In the first case, the memory was mapped with an individual lock per mapping.


Figure 6.10: Runtime overhead for unmap per page for varying number of pages using coarse-grain and fine-grain locks.

In the second case, all page mappings were protected by one coarse lock. I evaluated the scalability of the mechanism when permissions to the same memory page are manipulated in parallel on multiple processors.

In the benchmark I derived two sets of performance results. First, I evaluated the runtime overhead per mapping for multiple mappings. In contrast to the previous benchmark, I batched 32 unmaps in a single system call, reducing the runtime overhead for entering and exiting the kernel and also the runtime deviation. I compared the costs per mapping for an increasing number of parallel unmap operations with coarse-grain and fine-grain locks, shown in Figure 6.11.

With fine-grain locks, the cost per mapping remained almost constant, independent of the number of processors (see Figure 6.11a). The graph shows two anomalies: one between 4 and 5 processors and a second between 8 and 9 processors. In the benchmark I placed the parallel unmap operations first on physical processors (1 to 8), followed by logical processors (9 to 16). The first knee, at 5 processors, occurs when the unmap operations hit the first processor on the remote NUMA node (processors 5 to 8). This overhead decreased with an increasing number of unmapped pages, since the cache-line transfer was only necessary for one page out of the overall set of pages. The second, more drastic overhead increase was incurred by the parallel execution of multiple processor threads on the same physical processor (SMT). The two processor threads competed for execution resources and interfered with each other. While the overhead per mapping increased by a factor of two, it remained constant for 9 to 16 parallel unmap operations. Figure 6.11(c) shows an enlarged graph for the first 8 (physical) processors.

[Figure 6.11 plot data: cycles (y-axis) versus number of processors (x-axis) for 1 to 256 unmapped pages; panels (a) fine locks, (b) coarse lock, (c) fine locks (first 8 processors), (d) coarse lock (first 8 processors).]

Figure 6.11: Unmap performance for one processor repeatedly unmapping a set of pages. (a) and (c) show the overhead with fine-grain locking, while (b) and (d) show the same benchmark with coarse-grain locking.

The same benchmark with coarse-grain locking showed very different results. The per-mapping overhead increased steeply with the number of concurrently operating processors. As described, the microkernel's unmap algorithm performs critical-section fusing if two consecutive mappings are protected by the same lock. Thus, with an increasing number of pages, the overall number of acquired locks decreases, and with it the waiting time for the lock. Figure 6.11(b) depicts the cost per mapping for an increasing number of processors. Compared to the single-processor case, the overhead for a single-page unmap increased by a factor of 3.8 for two processors, 13.2 for three processors, and up to a factor of 537.3 for sixteen parallel processors. For 64 pages, the overhead increased by a factor of 1.93 for two processors, 1.96 for three processors, and up to a factor of 20.4 for sixteen processors.

In the same benchmark, I also measured the overhead for each individual processor unmapping a page. Each processor ran a task that repeatedly tried to unmap one page from its address space that is shared with the primary task.

In the first case, the page mapping was protected by a separate lock, and in the second case all page mappings were protected by the same lock.

Figure 6.12 compares the overheads of the single-page unmap for an increasing number of parallel operations with coarse-grain and fine-grain locking. The x-axes denote the number of processors. The figures show individual graphs for different numbers of unmapped pages of the primary task. Since the primary task performed critical-section fusing, its average lock-holding time increased. For coarse locking, the longer lock-holding time translated into a longer lock-waiting time for the concurrently executing single-page unmaps. The overall cost of the operation was determined by having all processors time the latency of 200 unmap operations. Afterward, the total execution time of all processors was summed up and divided by the total number of unmap invocations.

Figure 6.12(a) shows that the cost for unmaps remained almost constant for up to eight processors. With nine processors, the cost per mapping increased, which has to be attributed to interference of processor threads on the same physical processor. Figure 6.12(c) shows the enlarged version of the graph for the first eight processors.

In the case of coarse locking, the overhead for unmapping increased significantly for five processors. A more detailed analysis of the overhead for individual processors revealed that the reason for the drastic increase was unfairness in the memory subsystem of the NUMA configuration. At first, it appeared that the fifth processor starved on the lock while the other four processors (which were located on the same NUMA node) had a higher probability of acquiring the lock. However, this would not explain the significant decrease in the six-processor configuration. Actually, in the configuration with five processors, the average lock-acquisition latency was almost identical for all processors but twice as high as in the configuration with four processors (40K cycles with four processors vs. 78K cycles with five). With six processors the situation changed; the configuration had four active processors on one NUMA node and two on the second node. The two processors on the second NUMA node had a significantly higher probability of acquiring the lock due to the lower lock competition: after the lock was released by CPU 5, there was a high chance that CPU 6 would immediately acquire it. While the average unmap time per mapping increased to 123K cycles for CPUs 2 to 4, it decreased to 52K cycles for CPUs 5 and 6. The lock effectively ping-ponged between processors 5 and 6. With the significantly lower cost for those two processors, the overall cost per mapping decreased compared to the five-CPU configuration.
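
The cost metric used above can be made concrete with a small sketch. The following is a minimal illustration of the per-processor measurement loop under stated assumptions, not the benchmark code of this chapter: unmap_batch() is a hypothetical wrapper around the kernel's unmap system call, and rdtsc() reads the IA-32 time-stamp counter.

    /* Minimal sketch of the per-processor timing loop; illustration only. */
    #include <stdint.h>

    #define ROUNDS     200   /* timed unmap invocations per processor */
    #define BATCH_SIZE  32   /* unmaps batched per system call (1 in the
                                per-processor single-page benchmark)   */

    extern void unmap_batch(void *pages[], int n);   /* hypothetical wrapper */

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Returns the cycles this processor spent in ROUNDS unmap calls. The
     * reported cost per mapping is the sum of these totals over all
     * processors divided by the total number of unmap invocations,
     * i.e. processors * ROUNDS * BATCH_SIZE. */
    uint64_t time_unmaps(void *pages[BATCH_SIZE])
    {
        uint64_t total = 0;
        for (int i = 0; i < ROUNDS; i++) {
            uint64_t start = rdtsc();
            unmap_batch(pages, BATCH_SIZE);
            total += rdtsc() - start;
        }
        return total;
    }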

6.5 Summary

In this section, I evaluated the performance of dynamic lock adaptation and tracking of parallelism applied to the L4Ka::Pistachio microkernel. Dynamic locks are used on the critical IPC path and in the virtual memory subsystem. The benchmarks confirmed the initial hypothesis that a single synchronization mechanism is insufficient for the microkernel and that an adaptive scheme is required instead. In summary, the following performance characteristics were observed:

[Figure 6.12 plot data: cycles (y-axis) versus number of processors (x-axis) for 1 to 128 unmapped pages; panels (a) fine locks, (b) coarse lock, (c) fine locks (first 8 processors), (d) coarse lock (first 8 processors).]

Figure 6.12: Average unmap cost for parallel unmaps of independent pages on different processors. Fine-grain locks [(a) and (c)] show linear speedup, while with coarse locking the overhead increases drastically [(b) and (d)].

• The runtime overhead of locks for inter-address-space IPC was 16 to 20 percent compared to a lock-free variant. For intra-address-space IPC the lock overhead was between 35 and 64 percent. The overhead was eliminated by dynamic lock elimination.

• The overhead of message-based synchronization for cross-processor IPC compared to the lock-based solution was 153 to 170 percent depending on the message length.

• The cost of IPC on the same processor remained constant with an increasing number of processors in the system and thus scales. All accessed kernel data structures are local to the processor.

• The event-logging scheme provided kernel resource information to applications. The check performed when logging was disabled (i.e., a test of a configuration register) induced negligible overhead. In some cases, the Pentium 4 microarchitecture even provided better performance than without the additional logging instrumentation. The overhead of the operation was constant and independent of other processors and thus scales.

• Benchmarks of the virtual memory subsystem showed a runtime overhead for fine-grain locks of between 15 and 37 percent compared to coarse-grain locks (not accounting for the higher cache footprint of the locks). Dynamic adaptation of the lock granularity was able to eliminate that overhead.

• When multiple threads concurrently revoked page permissions, fine-grain locks combined with the TLB invalidation mechanism scaled linearly up to the number of available physical processors. For SMT threads the overhead increased due to interference on shared execution resources, but remained constant as more threads were added. With coarse locks, the average cost per page mapping increased by between a factor of two and three orders of magnitude.

In the following chapter, I summarize my work and address future work.

Chapter 7

Conclusion

This chapter concludes the dissertation with a summary of its contributions followed by suggestions for future work. Finally, I give some concluding remarks.

7.1 Contributions of This Work

In this dissertation, I addressed multiprocessor scalability of microkernel-based systems. I developed methodologies to strictly separate scalability and performance-relevant synchronization and coherency schemes from the resource management policies. Such strict separation is a novel approach and has not been considered in previous work. My solutions comprise four main contributions:

Tracking of parallelism. (Section 4.2) I developed a tracking scheme for resource usage: the processor cluster mask. The cluster mask uses a space-efficient encoding that is independent of the number of processors in the system. It is used for permission and TLB dirty tracking and is the fundamental building block for the remaining contributions.
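
As an illustration only (not the kernel's actual data structure or field layout), the following sketch shows one way such a space-efficient encoding can look: a small, fixed-size bitmap in which each bit stands for a cluster of 2^order contiguous processor IDs. All names and the 16-bit width are assumptions made for this example.

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative cluster mask: 16 mask bits plus a cluster order. Bit i covers
     * processors [i << order, ((i + 1) << order) - 1], so the encoding stays the
     * same size regardless of the processor count. Assumes cpu >> order < 16. */
    typedef struct {
        uint16_t bits;   /* one bit per cluster of processors */
        uint8_t  order;  /* log2 of the cluster size          */
    } cluster_mask_t;

    static inline void cluster_mask_add(cluster_mask_t *m, unsigned cpu)
    {
        m->bits |= (uint16_t)(1u << (cpu >> m->order));
    }

    static inline bool cluster_mask_contains(const cluster_mask_t *m, unsigned cpu)
    {
        return (m->bits >> (cpu >> m->order)) & 1u;
    }

    /* Merging two masks, e.g. when propagating usage up a mapping hierarchy,
     * is a single OR once both masks use the same order. */
    static inline void cluster_mask_merge(cluster_mask_t *dst, const cluster_mask_t *src)
    {
        dst->bits |= src->bits;   /* assumes dst->order == src->order */
    }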

Adaptive lock primitive. (Section 4.3) I developed a dynamic lock primitive that, depending on the degree of parallelism, can be dynamically and safely enabled and disabled at runtime. I apply the fundamental principle of read-copy update epochs to a new domain: dynamic instruction adaptation. By cascading multiple dynamic locks, applications can safely adjust the lock granularity of kernel objects. Hereby, dynamic locks enable critical-section fusing and critical-section splitting at runtime.
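
The following is a minimal sketch of the idea of such a dynamic lock under stated assumptions; it is not the L4Ka::Pistachio implementation. The epoch primitive rcu_wait_for_epoch() is hypothetical and stands for a read-copy-update-style wait until every processor has passed a quiescent point, and the caller is assumed to establish, for example via the processor cluster mask, that disabling the lock is safe because the protected object is accessed from a single processor only.

    #include <stdbool.h>

    /* Hypothetical epoch primitive: returns after every processor has passed a
     * quiescent point, i.e. no processor still executes on the old code path. */
    extern void rcu_wait_for_epoch(void);

    typedef struct {
        volatile bool enabled;   /* is mutual exclusion currently required? */
        volatile int  held;      /* spin-lock word, used only when enabled  */
    } dynlock_t;

    static inline void dynlock_acquire(dynlock_t *l)
    {
        if (!l->enabled)
            return;                       /* no concurrency: skip the atomic op */
        while (__sync_lock_test_and_set(&l->held, 1))
            while (l->held)
                ;                         /* spin */
    }

    static inline void dynlock_release(dynlock_t *l)
    {
        if (l->enabled)
            __sync_lock_release(&l->held);
    }

    /* Precondition (established by the caller, e.g. from the cluster mask): the
     * protected object is now accessed from a single processor only. The epoch
     * wait drains operations that may still be on the locked code path. */
    void dynlock_disable(dynlock_t *l)
    {
        l->enabled = false;
        rcu_wait_for_epoch();
    }

    /* Only after a full epoch is it guaranteed that no operation that observed
     * enabled == false is still inside a critical section; from then on the
     * lock again provides mutual exclusion and concurrent access is allowed. */
    void dynlock_enable(dynlock_t *l)
    {
        l->held = 0;
        l->enabled = true;
        rcu_wait_for_epoch();
    }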

TLB Coherency Epoch. (Section 4.4) I developed a TLB coherency tracking scheme that decouples page-permission updates from the outstanding TLB shoot-downs of remote processors. Parallel page-permission updates can be completed by initiating the TLB update remotely. My approach allows the combination of expensive TLB coherency updates into a single remote shoot-down while still providing fine-granular synchronization on memory objects. In Section 4.4.2, I propose a specific variant for IA-32, the TLB version vector.

Event Logging. (Section 4.5) In order to efficiently transport resource-usage information between the kernel and a user-level scheduler, and also between isolated operating system components, I developed a configurable event-logging scheme. By using per-processor logs and providing fine-granular control over the logged data, it is possible to provide detailed resource-usage information to schedulers in an efficient and scalable manner.

I validated my proposed design with the L4Ka::Pistachio microkernel. The primary directions and results of this work have been influenced by research I undertook that is not specifically addressed in this dissertation. Because of this, I want to briefly mention it here.

In order to validate the general design of a multiprocessor microkernel, I developed a multiprocessor para-virtualization environment based on the Linux kernel. Multiple Linux instances serve as a complex application workload on L4Ka::Pistachio that stresses the kernel's primitives and is extremely performance sensitive. In the context of VMs, I developed an efficient method to avoid excessive lock-spinning times due to preemption of the virtual processors of a multiprocessor environment. Furthermore, I developed a scheme to load-balance the per-VM workload considering the overall load situation. The load-balancing scheme compensates for differing processor allocations for one VM and is an important application of the event-logging scheme described in Section 4.5. The results of this work are detailed in [111].
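
Returning to the TLB coherency contribution above, the TLB version vector can be pictured with the following sketch. It illustrates the general idea rather than the kernel's algorithm: each processor advances a per-processor version counter after flushing its TLB, a permission update records a snapshot of the relevant counters, and the memory behind the mapping may be reused only once every processor that might still cache the mapping has advanced past its snapshot. All names and the fixed processor count are assumptions made for this example.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_CPUS 16   /* example size only */

    /* One version counter per processor, advanced after each local TLB flush. */
    static volatile uint32_t tlb_version[MAX_CPUS];

    /* Called by a processor once it has invalidated its own TLB. */
    static inline void tlb_flush_done(unsigned cpu)
    {
        tlb_version[cpu]++;
    }

    /* Snapshot recorded when a page permission is revoked. */
    typedef struct {
        uint32_t version[MAX_CPUS];
    } tlb_epoch_t;

    static inline void tlb_epoch_record(tlb_epoch_t *e)
    {
        for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++)
            e->version[cpu] = tlb_version[cpu];
    }

    /* The revocation is globally effective once every processor that might
     * cache the old entry has flushed since the snapshot was taken. cpu_set
     * is a bitmap of those processors, standing in for the cluster mask. */
    static inline bool tlb_epoch_passed(const tlb_epoch_t *e, uint32_t cpu_set)
    {
        for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++)
            if ((cpu_set & (1u << cpu)) &&
                (int32_t)(tlb_version[cpu] - e->version[cpu]) <= 0)
                return false;
        return true;
    }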

7.2 Suggestions for Future Work

I see four major areas for future work:

First, I primarily focused on one architecture with specific performance properties: IA-32. The trade-offs for other hardware architectures are substantially different and need further investigation. In particular, IA-32's strong memory-ordering model results in a very high synchronization penalty, which may be less substantial on other processor architectures.

Second, current developments in the area of SMT and multicore systems are changing the hardware properties. In particular, tight integration of caches and improved IPI delivery latency reduce the overheads of inter-processor interaction. These new architectural properties also need to be reflected in the microkernel interface. For example, the strict separation of processor threads in SMT systems could turn out to be overly restrictive, and a granularity that addresses processor cores may be more appropriate.

Third, the fundamental idea of alternative lock primitives that depend on the degree of parallelism is currently implemented as a software solution. The software-based approach requires additional code and thus incurs runtime overhead. The overhead could be eliminated by explicit architectural support in processors. Such support could include optimized operations on the cluster mask (such as simple tests and merges) and support for dynamic locks. For example, a lock primitive or memory operation could carry a coherency identifier that, depending on the processor isolation mask, serves as a filter for coherency traffic in the memory (or cache) subsystem.

Fourth, the new microkernel primitives require extensive testing and evaluation in a variety of real-world scenarios, including large-scale databases. While I showed low overhead and independence for the individual kernel primitives, it remains open how the kernel behaves in more complex environments. The camouflage of microkernels as virtual machine monitors or hypervisors brings many prevalent microkernel issues to industry today. The construction methodologies differ insofar as fine-granular decomposition of a monolithic system is of less importance. Nevertheless, the general mechanisms for controlling scalability from application level are just as relevant and applicable to a wide range of resource management problems.

7.3 Concluding Remarks

In this thesis I have described a methodology for adjusting scalability-relevant parameters of multiprocessor microkernels. The shift towards highly parallel processor architectures is at its beginning, and one can expect further drastic increases in the number of processor contexts. This change includes all areas of computing, from embedded devices to large-scale servers. Furthermore, the increasing complexity of operating systems and differing business demands call for radically differing OS structures. Virtual machines and highly customized appliance-like systems will be commonplace.

Such systems require a flexible and efficient microkernel that shows excellent performance and scalability. I developed a set of mechanisms that lay the groundwork for the construction of scalable systems on top of a microkernel. I validated and evaluated them with L4Ka::Pistachio, a widely used and versatile microkernel developed at the University of Karlsruhe.

Bibliography

[1] The DragonFly BSD Project. http://www.dragonflybsd.org, (accessed March 2005).

[2] M. Accetta, R. Baron, D. Golub, R. Rashid, A. Tevanian, and M. Young. Mach: A new kernel foundation for UNIX development. In Proceedings of the Summer 1986 USENIX Technical Conference and Exhibition, June 1986.

[3] George S. Almasi and Allan Gottlieb. Highly Parallel Computing. Benjamin/Cummings division of Addison Wesley Inc., Redwood City, CA, 2nd edition, 1994.

[4] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. AFIPS Proceedings of the SJCC, 31:483–485, 1967.

[5] T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy. Scheduler activations: Effective kernel support for the user-level management of parallelism. ACM Transactions on Computer Systems, 10(1), 1993.

[6] T. E. Anderson. The performance implications of spin-waiting alternatives for shared-memory multiprocessors. In Proceedings of the 1989 International Conference on Parallel Processing, volume II - Software, University Park, Penn, August 1989.

[7] Jonathan Appavoo. Clustered Objects. PhD thesis, University of Toronto, 2005.

[8] Jonathan Appavoo, Marc Auslander, Dima DaSilva, David Edelsohn, Orran Krieger, Michal Ostrowski, et al. K42 overview. Whitepaper, August 2002.

[9] Jonathan Appavoo, Marc Auslander, Dima DaSilva, David Edelsohn, Orran Krieger, Michal Ostrowski, et al. Providing a Linux API on the scalable K42 kernel. In Freenix, 2003.

[10] The OpenMP ARB. OpenMP. http://www.openmp.org.

[11] J. Archibald and J-L. Baer. Cache coherence protocols: Evaluation using a multiprocessor simulation model. ACM Transactions on Computer Systems, 4(4):273–298, November 1986.

[12] Mohit Aron, Luke Deller, Kevin Elphinstone, Trent Jaeger, Jochen Liedtke, and Yoonho Park. The SawMill framework for virtual memory diversity. In 8th Asia-Pacific Computer Systems Architecture Conference, Bond University, Gold Coast, QLD, Australia, January 29–February 2 2001.

[13] Gaurav Banga, Peter Druschel, and Jeffrey C. Mogul. Resource containers: A new facility for resource management in server systems. In Operating Systems Design and Implementation, pages 45–58, 1999.

[14] A. Barak and A. Shiloh. A distributed load-balancing policy for a multicomputer. Software Practice & Experience, 15(9):901, September 1985.

[15] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, et al. Xen and the art of virtualization. In 19th ACM Symposium on Operating Systems Principles (SOSP), pages 164–177, Bolton Landing, NY, October 2003.

[16] James M. Barton and Nawaf Bitar. A scalable multi-discipline, multiple- processor scheduling framework for IRIX. In Dror G. Feitelson and Larry Rudolph, editors, Job Scheduling Strategies for Parallel Processing, pages 45–69. Springer-Verlag, 1995.

[17] F. Bellosa and M. Steckermeier. The performance implications of locality information usage in shared-memory multiprocessors. Journal of Parallel and Distributed Computing, 37(1):1–2, August 1996.

[18] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. Fiuczynski, D. Becker, S. Eggers, and C. Chambers. Extensibility, safety and performance in the SPIN operating system. In 15th ACM Symposium on Operating Systems Principles (SOSP), pages 267–284, Copper Mountain Resort, CO, December 1995.

[19] Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy. Lightweight remote procedure call. In 12th ACM Symposium on Operating Systems Principles (SOSP), volume 23, pages 102–113, October 3–6 1989.

[20] David L. Black. Scheduling support for concurrency and parallelism in the Mach operating system. IEEE Computer Magazine, 23(5):35, May 1990.

[21] Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. In 35th Annual Symposium on Foundations of Computer Science, pages 356–368, Santa Fe, New Mexico, 20–22 November 1994. IEEE.

[22] Bryan R. Buck and Jeffrey K. Hollingsworth. Using hardware performance monitors to isolate memory bottlenecks. In Proceedings of Supercomputing'2000, Dallas, TX, November 2000. UMCP.

[23] J. Mark Bull and Darragh O'Neill. A microbenchmark suite for OpenMP 2.0. In 3rd European Workshop on OpenMP, September 2001.

[24] Henry Burkhardt, III, S. Frank, B. Knobe, and James Rothnie. Overview of the KSR1 computer system introduction to the KSR1. Technical Report KSR-TR-9202001, Kendall Square Research, Boston, February 1992.

[25] Richard P. Case and Andris Padegs. Architecture of the IBM System/370. Communications of the ACM, 21(1):73–96, January 1978.

[26] Eliseu M. Chaves, Jr., Thomas J. LeBlanc, Brian D. Marsh, and Michael L. Scott. Kernel-kernel communication in a shared-memory multiprocessor. In The Symposium on Experiences with Distributed and Multiprocessor Systems, Atlanta, GA, March 1991.

[27] Benjie Chen. Multiprocessing with the Exokernel operating system. Master's thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.

[28] BBN Advanced Computers. Inside the Butterfly-Plus. Cambridge, MA, October 1987.

[29] Intel Corp. IA-32 Intel Architecture Software Developer’s Manual, Volume 1–3, 2004.

[30] Intel Corp. Intel XScale Core, Developer’s Manual, 2004.

[31] David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten von Eicken. LogP: Towards a realistic model of parallel computation. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.

[32] D. H. Brown Associates, Inc. Unisys ES7000 challenges Sun E10000. Whitepaper, http://www.microsoft-sap.com/docs/brown white paper.pdf, September 2000.

[33] L. Van Doorn, P. Homburg, and A. S. Tanenbaum. Paramecium: An extensible object-based kernel. In Fifth Workshop on Hot Topics in Operating Systems (HotOS-V), pages 86–89, Orcas Island, WA, May 1995.

[34] Dawson R. Engler, M. Frans Kaashoek, and James W. O'Toole. Exokernel: An operating system architecture for application-level resource management. In 15th ACM Symposium on Operating Systems Principles (SOSP), pages 251–266, Copper Mountain Resort, CO, December 1995. ACM SIGOPS.

[35] Dror G. Feitelson. Job scheduling in multiprogrammed parallel systems. IBM Research Report RC 19790 (87657), August 1997.

[36] Dror G. Feitelson and Larry Rudolph. Coscheduling Based on Run-Time Identification of Activity Working Sets. International Journal of Parallel Programming, 23(2):136–160, April 1995.

[37] Dror G. Feitelson, Larry Rudolph, Uwe Schwiegelshohn, Kenneth C. Sevcik, and Parkson Wong. Theory and practice in parallel job scheduling. In Dror G. Feitelson and Larry Rudolph, editors, Job Scheduling Strategies for Parallel Processing, pages 1–34. Springer Verlag, 1997. Lect. Notes Comput. Sci. vol. 1291.

[38] Steven Fortune and James Wyllie. Parallelism in random access machines. In Conference Record of the Tenth Annual ACM Symposium on Theory of Computing, pages 114–118, San Diego, California, 1–3 May 1978.

[39] MPI Forum. MPI: A message-passing interface standard. The International Journal of Supercomputing and High Performance Computing, pages 159–416, February 1994.

[40] Eran Gabber, Christopher Small, John Bruno, José Brustoloni, and Avi Silberschatz. The Pebble component-based operating system. In Proceedings of the 1999 USENIX Annual Technical Conference, Monterey, CA, USA, June 1999.

[41] Benjamin Gamsa, Orran Krieger, Jonathan Appavoo, and Michael Stumm. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In 3rd Symposium on Operating Systems Design and Implementation (OSDI), 1999.

[42] Alain Gefflaut, Trent Jaeger, Yoonho Park, Jochen Liedtke, Kevin Elphinstone, Volkmar Uhlig, Jonathon E. Tidswell, Luke Deller, and Lars Reuther. The SawMill Multiserver Approach. In 9th SIGOPS European Workshop, Kolding, Denmark, September 2000.

[43] Robert P. Goldberg. Survey of virtual machine research. IEEE Computer Magazine, 7(6):34–45, 1974.

[44] Ruth E. Goldenberg and Saro Saravanan. VMS for Alpha Platforms—Internals and Data Structures, volume 1. DEC Press, Burlington, MA, preliminary edition, 1992.

[45] J. R. Goodman, M. K. Vernon, and P. J. Woest. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), volume 24, Boston, MA, April 1989.

[46] Allan Gottlieb and Clyde P. Kruskal. Coordinating parallel processors: A partial unification. ACM News, October 1981.

[47] Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, and Mendel Rosenblum. Cellular Disco: resource management using virtual clusters on shared-memory multiprocessors. In 17th ACM Symposium on Operating Systems Principles (SOSP), Kiawah Island, SC, 1999.

[48] Michael Greenwald and David Cheriton. The synergy between non-blocking synchronization and operating system structure. In 2nd Symposium on Operating Systems Design and Implementation (OSDI), pages 123–136, Berkeley, October 28–31 1996. USENIX Association.

[49] System Architecture Group. L4 X.2 Reference Manual. University of Karlsruhe, Germany, 6th edition, June 2004.

[50] Anoop Gupta, Wolf-Dietrich Weber, and Todd Mowry. Reducing memory and traffic requirements for scalable directory-based cache coherence schemes. Technical Report CSL-TR-90-417, Computer Systems Lab, Stanford University, Stanford, CA, March 1990.

[51] Andreas Haeberlen. Managing kernel memory resources from user level. Master's thesis, University of Karlsruhe, http://i30www.ira.uka.de/teaching/theses/pastthesis, 2003.

[52] Andreas Haeberlen and Kevin Elphinstone. User-level management of kernel memory. In Proceedings of the 8th Asia-Pacific Computer Systems Architecture Conference, Aizu-Wakamatsu City, Japan, September 24–26 2003.

[53] Claude-Joachim Hamann, Frank Mehnert, Hermann Härtig, Jean Wolter, Lars Reuther, Martin Borriss, Michael Hohmuth, Robert Baumgartl, and Sebastian Schönberg. DROPS OS support for distributed multimedia applications. In 8th SIGOPS European Workshop, Sintra, Portugal, December 11 2001.

[54] Hermann Härtig, Michael Hohmuth, Jochen Liedtke, Sebastian Schönberg, and Jean Wolter. The performance of microkernel-based systems. In 16th ACM Symposium on Operating System Principles (SOSP), Saint-Malo, France, October 1997.

[55] C.-T. Ho. Optimal communication primitive and graph embeddings on hypercubes. PhD thesis, Yale University, 1990.

[56] C.-T. Ho and L. Johnsson. Distributed routing algorithm for broadcasting and personalized communication in hypercubes. In Proc. 1986 Int. Conf. Par. Proc., pages 640–648, 1986.

[57] Michael Hohmuth. The Fiasco kernel: System architecture. Technical Report TUD-FI02-06-Juli-2002, TU Dresden, 2002.

[58] Michael Hohmuth. Pragmatic Nonblocking Synchronization for Real-Time Systems. PhD thesis, Technische Universität Dresden, October 2002.

[59] IEEE. IEEE Std 1596–1992: IEEE Standard for Scalable Coherent Interface. IEEE, Inc., August 1993.

[60] Trent Jaeger, Kevin Elphinstone, Jochen Liedtke, Vsevolod Panteleenko, and Yoonho Park. Flexible access control using IPC redirection. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, Rio Rico, AZ, March 29–30 1999.

[61] E. H. Jensen, G. W. Hagensen, and J. M. Broughton. A new approach to exclusive data access in shared memory multiprocessors. Technical Report 212–157, Lawrence Livermore Laboratory, 1987.

[62] M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. Application performance and flexibility on Exokernel systems. In 16th ACM Symposium on Operating Systems Principles (SOSP), Saint-Malo, France, October 5–8 1997.

[63] Alain Kägi, Doug Burger, and James R. Goodman. Efficient synchronization: Let them eat QOLB. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA-97), volume 25,2 of Computer Architecture News, pages 170–180, New York, June 2–4 1997. ACM Press.

[64] A. Langerman, J. Boykin, S. LoVerso, and S. Mangalat. A highly-parallelized Mach-based Vnode filesystem. USENIX, pages 297–312, Winter 1990.

[65] Edward D. Lazowska. Quantitative System Performance, Computer System Analysis Using Queuing Network Models. Prentice-Hall Inc, Englewood Cliffs, NJ 07632, 1984.

[66] F. Lee. Study of ’look aside’ memory. IEEE Transactions on Computers, C-18(11), November 1969.

[67] C. E. Leiserson. Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Transactions on Computers, C-34:892–901, October 1985.

[68] Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford Dash multiprocessor. IEEE Computer, 25(3):63–79, March 1992.

[69] Joshua LeVasseur, Volkmar Uhlig, Jan Stoess, and Stefan Götz. Unmodified device driver reuse and improved system dependability via virtual machines. In 6th Symposium on Operating Systems Design and Implementation (OSDI), San Francisco, CA, December 2004.

[70] Jochen Liedtke. Improving IPC by kernel design. In 14th ACM Symposium on Operating System Principles (SOSP), Asheville, NC, December 1993.

[71] Jochen Liedtke. On µ-kernel construction. In 15th ACM Symposium on Operating System Principles (SOSP), Copper Mountain Resort, CO, December 1995.

[72] Jochen Liedtke, Uwe Dannowski, Kevin Elphinstone, Gerd Liefländer, Espen Skoglund, Volkmar Uhlig, Christian Ceelen, Andreas Haeberlen, and Marcus Völp. The L4Ka vision, April 2001.

[73] Jochen Liedtke, Kevin Elphinstone, Sebastian Schönberg, Hermann Härtig, Gernot Heiser, Nayeem Islam, and Trent Jaeger. Achieved IPC performance (still the foundation for extensibility). In 6th Workshop on Hot Topics in Operating Systems (HotOS). IBM, May 1997.

[74] Jochen Liedtke, Nayeem Islam, and Yoonho Park. Preventing denial of service attacks on a microkernel for WebOSes. In 6th Workshop on Hot Topics in Operating Systems (HotOS), May 1997.

[75] Jochen Liedtke, Vsevolod Panteleenko, Trent Jaeger, and Nayeem Islam. High-performance caching with the Lava Hit-Server. In USENIX 1998 Annual Technical Conference, June 1998.

[76] Beng-Hong Lim. Reactive synchronization algorithms for multiprocessors. PhD thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.

[77] Peter Magnusson, Anders Landin, and Erik Hagersten. Efficient software synchronization on large cache coherent multiprocessors. Technical Report ISRN SICS-R–94-07–SE, Swedish Institute of Computer Science, 1994.

[78] H. Massalin and C. Pu. A lock-free multiprocessor OS kernel. Technical Report CUCS–005–91, Columbia University, 1991.

[79] Paul E. McKenney. Selecting locking primitives for parallel programming. Communications of the ACM, 39(10):75–82, 1996.

[80] Paul E. McKenney, Dipankar Sarma, Andrea Arcangeli, Andi Kleen, Orran Krieger, and Rusty Russell. Read-copy update. In The Ottawa Linux Symposium, June 2002.

[81] Paul E. McKenney and John D. Slingwine. Read-copy update: Using execution history to solve concurrency problems. In Parallel and Distributed Computing and Systems, Las Vegas, NV, October 1998.

[82] John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1):21–65, February 1991.

[83] Microsoft. Authenticode technology. Microsoft's Developer Network Library, October 1996.

[84] J. Mogul, R. Rashid, and M. Accetta. The packet filter: An efficient mechanism for user-level network code. In 11th ACM Symposium on Operating Systems Principles (SOSP), volume 21, pages 39–51, 1987.

[85] George C. Necula. Proof-carrying code. In 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '97), Paris, January 1997.

[86] George C. Necula and Peter Lee. Safe kernel extensions without run-time checking. In 2nd Symposium on Operating Systems Design and Implementation (OSDI), Seattle, WA, October 1996.

[87] D. Nussbaum and A. Agarwal. Scalability of parallel machines. Communications of the ACM, 34(3):56, March 1991.

[88] John K. Ousterhout. Scheduling techniques for concurrent systems. In Third International Conference on Distributed Computing Systems, pages 22–30, May 1982.

[89] G. Pfister, W. Brantley, D. George, S. Harvey, W. Kleinfelder, K. McAuliffe, E. Melton, V. Norton, and J. Weiss. The IBM research parallel processor prototype (RP3): Introduction and architecture. In Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, IL, August 1985.

[90] Birgit Pfitzmann, James Riordan, Christian Stüble, Michael Waidner, and Arnd Weber. The PERSEUS system architecture. Technical Report RZ 3335 (#93381), IBM Research Division, Zurich Laboratory, April 2001.

[91] Daniel Potts, Simon Winwood, and Gernot Heiser. Design and implementation of the L4 microkernel for Alpha multiprocessors. Technical Report UNSW-CSE-TR-0201, University of New South Wales, Australia, February 10 2002.

[92] Ravi Rajwar and James R. Goodman. Speculative lock elision: Enabling highly concurrent multithreaded execution. In Proceedings of the 34th Annual International Symposium on Microarchitecture, pages 294–305, Austin, Texas, December 1–5, 2001. IEEE Computer Society TC-MICRO and ACM SIGMICRO.

[93] D. M. Ritchie and K. Thompson. The UNIX time-sharing system. Communications of the ACM, 17(7):365–375, July 1974.

[94] Stuart Ritchie. The Raven kernel: a microkernel for shared memory multiprocessors. Technical Report TR-93-36, University of British Columbia, Department of Computer Science, 30 April 1993.

[95] Larry Rudolph and Zary Segall. Dynamic decentralized cache schemes for MIMD parallel processors. In Proceedings of the 11th Annual International Symposium on Computer Architecture, pages 340–347, Ann Arbor, Michigan, June 5–7, 1984. IEEE Computer Society and ACM SIGARCH.

[96] Larry Rudolph, Miriam Slivkin-Allalouf, and Eli Upfal. A simple load balancing scheme for task allocation in parallel machines. In ACM Symposium on Parallel Algorithms and Architectures, pages 237–245, 1991.

[97] Sebastian Schönberg. Using PCI-Bus Systems in Real-Time Environments. PhD thesis, University of Technology, Dresden, 2002.

[98] Jonathan S. Shapiro, Jonathan M. Smith, and David J. Farber. EROS: a fast capability system. In 17th ACM Symposium on Operating Systems Principles (SOSP), pages 170–185, Kiawah Island, SC, 1999.

[99] Patrick G. Sobalvarro and William E. Weihl. Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors. In Dror G. Feitelson and Larry Rudolph, editors, Job Scheduling Strategies for Parallel Processing, pages 106–126. Springer-Verlag, 1995.

[100] Jan Stoess. Using operating system instrumentation and event logging to support user-level multiprocessor scheduler. Master's thesis, University of Karlsruhe, Germany, April 2005.

[101] Harold S. Stone. High-Performance Computer Architecture. Addison-Wesley, 2nd edition, 1990.

[102] Sun, Inc. Sun Enterprise 10000 server. http://www.sun.com/servers/highend/e10000, (accessed March 2005).

[103] Peter F. Sweeney, Matthias Hauswirth, Brendon Cahoon, Perry Cheng, Amer Diwan, David Grove, and Michael Hind. Using hardware performance monitors to understand the behavior of Java applications. In Proceedings of the 3rd Virtual Machine Research and Technology Symposium, San Jose, CA, May 2004.

[104] Cristan Szmajda. Calypso: A portable translation layer. In 2nd Workshop on Microkernel-based Systems, http://www.disy.cse.unsw.edu.au/publications.pml, Lake Louise, Canada, October 2001.

[105] Ariel Tamches and Barton P. Miller. Fine-grained dynamic instrumentation of commodity operating system kernels. In 3rd Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, Louisiana, 1999.

[106] Andrew S. Tanenbaum. Modern Operating Systems. Prentice Hall, 2nd edition, 2001.

[107] Josep Torrellas, Andrew Tucker, and Anoop Gupta. Benefits of cache-affinity scheduling in shared-memory multiprocessors. In ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pages 272–274, May 1993.

[108] Dean Tullsen, Susan Eggers, and Henry Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995.

[109] Volkmar Uhlig. Design rationale for L4 on multiprocessors. Technical report, University of Karlsruhe, Germany, May 2005.

[110] Volkmar Uhlig, Uwe Dannowski, Espen Skoglund, Andreas Haeberlen, and Gernot Heiser. Performance of address-space multiplexing on the Pentium. Technical Report 2002-01, University of Karlsruhe, 2002.

[111] Volkmar Uhlig, Joshua LeVasseur, Espen Skoglund, and Uwe Dannowski. Towards scalable multiprocessor virtual machines. In Proceedings of the 3rd Virtual Machine Research and Technology Symposium, pages 43–56, San Jose, CA, May 6–7 2004.

[112] Unisys, Inc. ES7000 Servers. http://www.unisys.com/products/es7000 linux, (accessed March 2005).

[113] Ronald Unrau. Scalable Memory Management through Hierarchical Symmetric Multiprocessing. Ph.D. thesis, University of Toronto, Toronto, Ontario, January 1993.

[114] Ronald Unrau, Michael Stumm, and Orran Krieger. Hierarchical clustering: A structure for scalable multiprocessor operating system design. Technical Report CSRI-268, University of Toronto, March 1992.

[115] Ronald C. Unrau, Orran Krieger, Benjamin Gamsa, and Michael Stumm. Experiences with locking in a NUMA multiprocessor operating system kernel. In Symposium on Operating Systems Design and Implementation (OSDI), pages 139–152, Berkeley, CA, USA, November 1994. USENIX Association.

[116] Marcus Völp. Design and implementation of the recursive virtual address space model for small scale multiprocessor systems. Master's thesis, University of Karlsruhe, http://i30www.ira.uka.de/teaching/theses/pastthesis, 2002.

[117] T. H. von Eicken. Active Messages: an Efficient Communication Architecture for Multiprocessors. Ph.D. thesis, Computer Science, Graduate Division, University of California, Berkeley, CA, 1993.

[118] Zvonko Vranesic, Michael Stumm, Ron White, and David Lewis. The Hector multiprocessor. IEEE Computer Magazine, 24(1), January 1991.

[119] Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient software-based fault isolation. ACM SIGOPS Operating Systems Review, 27(5):203–216, December 1993.

[120] Boris Weissman. Performance counters and state sharing annotations: A unified approach to thread locality. In 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), San Jose, CA, October 3–7, 1998.

[121] Robert W. Wisniewski and Bryan Rosenburg. Efficient, unified, and scalable performance monitoring for multiprocessor operating systems. In SC2003: Igniting Innovation, Phoenix, AZ, November 2003.

[122] W. Wulf, E. Cohen, W. Corwin, A. Jones, R. Levin, C. Pierson, and F. Pollack. HYDRA: The kernel of a multiprocessor operating system. Communications of the ACM, 17(6):337–345, June 1974.

[123] Michael Wayne Young. Exporting a User Interface to Memory Management from a Communication-Oriented Operating System. PhD thesis, Carnegie Mellon University, 1989.