L4 Microkernels: the Lessons from 20 Years of Research and Deployment

1 L4 Microkernels: The Lessons from 20 Years of Research and Deployment GERNOT HEISER and KEVIN ELPHINSTONE, NICTA and UNSW, Sydney, Australia The L4 microkernel has undergone 20 years of use and evolution. It has an active user and developer community, and there are commercial versions that are deployed on a large scale and in safety-critical systems. In this article we examine the lessons learnt in those 20 years about microkernel design and implementation. We revisit the L4 design articles and examine the evolution of design and implementation from the original L4 to the latest generation of L4 kernels. We specifically look at seL4, which has pushed the L4 model furthest and was the first OS kernel to undergo a complete formal verification of its implementation as well as a sound analysis of worst-case execution times. We demonstrate that while much has changed, the fundamental principles of minimality, generality, and high inter-process communication (IPC) performance remain the main drivers of design and implementation decisions. Categories and Subject Descriptors: D.4.7 [Operating Systems]: Organization and Design General Terms: Design, Performance, Security, Verification Additional Key Words and Phrases: Microkernels, L4, seL4, IPC, message passing, minimality, performance, formal verification, real time, worst-case execution time, virtualization ACM Reference Format: Gernot Heiser and Kevin Elphinstone. 2016. L4 microkernels: The lessons from 20 years of research and deployment. ACM Trans. Comput. Syst. 34, 1, Article 1 (April 2016), 29 pages. DOI: http://dx.doi.org/10.1145/2893177 1. INTRODUCTION Twenty years ago, Liedtke [1993a] demonstrated with his L4 kernel that microkernel IPC could be fast, a factor 10–20 faster than contemporary microkernels. This was the start of a journey that created a whole family of microkernels and led to large-scale commercial deployment as well as to systems of unprecedented assurance. Microkernels minimize the functionality that is provided by the kernel: The kernel provides a set of general mechanisms, while user-mode servers implement the actual operating system (OS) services [Brinch Hansen 1970; Levin et al. 1975]. Application code obtains a system service by communicating with servers via an interprocess communication (IPC) mechanism, typically message passing. Hence, IPC is on the critical path of any service invocation, and low IPC costs are essential. By the early 1990s, IPC performance had become the achilles heel of microkernels: The typical cost for a one-way message was around 100 μs, which was too high for building performant systems. This resulted in a trend to move core services back into NICTA is funded by the Australian Government as represented by the Department of Broadband, Communi- cations and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program. Authors’ addresses: G. Heiser and K. Elphinstone, School of Computer Science & Engineering, UNSW Sydney NSW 2052, Australia; Correspondence email: {gernot, k.elphinstone}@unsw.edu.au. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected]. c 2016 ACM 0734-2071/2016/04-ART1 $15.00 DOI: http://dx.doi.org/10.1145/2893177 ACM Transactions on Computer Systems, Vol. 34, No. 1, Article 1, Publication date: April 2016. 1:2 G. Heiser and K. Elphinstone the kernel [Condict et al. 1994]. There were also arguments that high IPC costs were an (inherent?) consequence of the structure of microkernel-based systems [Chen and Bershad 1993]. In this context, the order-of-magnitude improvement of IPC costs Liedtke demonstrated was quite remarkable. It was followed by work discussing the philosophy and mechanisms of L4 [Liedtke 1995, 1996], the demonstration of a paravirtualised Linux on L4 with only a few percent overhead [Hartig¨ et al. 1997], the deployment of L4 kernels on billions of mobile devices and in safety-critical systems; and, finally, the comprehensive formal verification, including a functional correctness proof of the implementation and a complete proof chain of high-level security properties down to the executable binary [Klein et al. 2014]. L4 also had a strong influence on other research systems, such as Pebble [Gabber et al. 1999], EROS [Shapiro et al. 1999], and Barrelfish [Baumann et al. 2009]. In this article we examine the development of L4 since the mid-1990s. Specifically, we look at what makes modern L4 kernels tick, how this relates to Liedtke’s original design and implementation rules, and which of his microkernel “essentials” have passed the test of time. We specifically examine how the lessons of the past have influenced the design of the latest generation of L4 microkernels, exemplified by seL4 [Klein et al. 2009], but point out where other current L4 versions have made different design decisions. 2. BACKGROUND 2.1. The L4 Microkernel Family L4 evolved from an earlier system, called L3, developed at GMD by Liedtke [1993b] in the mid-1980s on i386 platforms. L3 was a complete OS with built-in persistence, and it already featured user-mode drivers [Liedtke et al. 1991], still a characteristic of L4 microkernels. It was commercially deployed in a few thousand installations (mainly schools and legal practices). Like all “microkernels” at the time, L3 suffered from IPC costs on the order of 100 μs. Liedtke initially used L3 to try out new ideas, and what he referred to as “L3” in early publications [Liedtke 1993a] was actually an interim version of a radical re-design. He first used the name “L4” with the “V2” ABI circulated in the community from 1995. In the following we refer to this version as the “original L4.” Liedtke implemented it completely in assembler on i486-based PCs and soon ported it to the Pentium. This initial work triggered a 20yr evolution, with multiple ABI revisions and from- scratch implementations, as depicted in Figure 1. It started with TU Dresden and UNSW reimplementing the ABI, with necessary adaptations, on 64-bit Alpha and MIPS processors, the latter implemented all longer-running operations in C. Both kernels achieved submicrosecond IPC performance [Liedtke et al. 1997a] and were released as open source. The UNSW Alpha kernel was the first multiprocessor version of L4. Liedtke, who had moved from GMD to IBM Watson, kept experimenting with the ABI in what became known as Version X . GMD and IBM imposed an IP regime which proved too restrictive for other researchers, prompting Dresden to implement a new x86 version from scratch, called Fiasco, in reference to their experience in trying to deal with IP issues. The open-source Fiasco was the first L4 version written almost completely in a higher-level language (C++) and is the oldest L4 codebase still actively maintained. It was the first L4 kernel with significant commercial use (estimated shipments up to 100,000). After Liedtke’s move to Karlsruhe, he and his students did their own from-scratch implementation, Hazelnut, which was written in C and was the first L4 kernel that was ported (rather than reimplemented) to another architecture (from Pentium to ARM). ACM Transactions on Computer Systems, Vol. 34, No. 1, Article 1, Publication date: April 2016. L4 Microkernels: The Lessons from 20 Years of Research and Deployment 1:3 Fig. 1. The L4 family tree (simplified). Black arrows indicate code, green arrows ABI inheritance. Box colours indicate origin as per key at the bottom left. Karlsruhe’s experience with Version X and Hazelnut resulted in a major ABI revision, V4, aimed at improving kernel and application portability and multiprocessor support and addressing various other shortcomings. After Liedtke’s tragic death in 2001, his students implemented the design in a new open-source kernel, L4Ka::Pistachio (“Pistachio” for short). It was implemented in C++ on x86 and PowerPC; at UNSW/NICTA we soon after ported it to MIPS, Alpha, 64-bit PowerPC, and ARM.1 In most of these ports, less than 10% of the code changed. At NICTA we then retargeted Pistachio for use in resource-constrained embedded systems, resulting in a fork called NICTA::Pistachio-embedded (“L4-embedded”). It saw massive-scale commercial deployment when Qualcomm adopted it as a protected-mode real-time OS for the firmware of their wireless modem processors. It is now running on the security processor of all recent Apple iOS devices [Apple Inc 2015]. NICTA spinout Open Kernel Labs (OK Labs) took on the support and further development of this kernel, renaming it the OKL4 microkernel.2 Another deployed version is PikeOS, a commercial V2 clone by German company Sysgo, certified for use in safety-critical avionics and deployed in aircraft and trains.3 The influence of KeyKOS [Hardy 1985] and EROS [Shapiro et al. 1999] and an increased focus on security resulted in the adoption of capabilities [Dennis and Van Horn 1966] for access control, first with the 2.1 release of OKL4 (2008) and soon followed by Fiasco; Fiasco was renamed Fiasco.OC in reference to its use of object capabilities. Aiming for formal verification, which seemed infeasible for a code base not designed for the purpose, we instead opted for a from-scratch implementation for our capability-based seL4 kernel.

Load more