From L3 to Sel4: What Have We Learnt in 20 Years of L4 Microkernels? Kevin Elphinstone, Gernot Heiser NICTA and University of New South Wales

From L3 to seL4: What Have We Learnt in 20 Years of L4 Microkernels? Kevin Elphinstone, Gernot Heiser NICTA and University of New South Wales SOSP'13 1993 Improving IPC by Kernel Design [SOSP] ©2013 Gernot Heiser, NICTA 2 SOSP'13 1993 IPC Performance [µs] 400 Mach i486 @ 50 MHz 300 200 115 µs Culprit: Cache L4 footprint 100 [SOSP’95] 5 µs raw copy 0 0 2000 4000 6000 Message Length [B] ©2013 Gernot Heiser, NICTA 3 SOSP'13 IPC Performance over 20 Years Name Year Processor MHz Cycles µs Original 1993 i486 50 250 5.00 Original 1997 Pentium 160 121 0.75 L4/MIPS 1997 R4700 100 86 0.86 L4/Alpha 1997 21064 433 45 0.10 Hazelnut 2002 Pentium 4 1,400 2,000 1.38 Pistachio 2005 Itanium 1,500 36 0.02 OKL4 2007 XScale 255 400 151 0.64 NOVA 2010 i7 Bloomfield (32-bit) 2,660 288 0.11 seL4 2013 i7 Haswell (32-bit) 3,400 301 0.09 seL4 2013 ARM11 532 188 0.35 seL4 2013 Cortex A9 1,000 316 0.32 ©2013 Gernot Heiser, NICTA 4 SOSP'13 Core Microkernel Principle: Minimality A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e. permitting competing implementations, would prevent the implementation of the system’s required functionality. [SOSP’95] ©2013 Gernot Heiser, NICTA 5 SOSP'13 Minimality: Source Code Size Name Architecture C/C++ asm total kSLOC Original i486 0 6.4 6.4 L4/Alpha Alpha 0 14.2 14.2 L4/MIPS MIPS64 6.0 4.5 10.5 Hazelnut x86 10.0 0.8 10.8 Pistachio x86 22.4 1.4 23.0 L4-embedded ARMv5 7.6 1.4 9.0 OKL4 3.0 ARMv6 15.0 0.0 15.0 Fiasco.OC x86 36.2 1.1 37.6 seL4 ARMv6 9.7 0.5 10.2 ©2013 Gernot Heiser, NICTA 6 SOSP'13 L4 Family Tree API Inheritance Code Inheritance L4-embed. seL4 OKL4 Microvisor L4/MIPS OKL4 µKernel L4/Alpha Codezero L3 → L4 “X” Hazelnut Pistachio Fiasco Fiasco.OC UNSW/NICTA GMD/IBM/Karlsruhe NOVA Dresden OK Labs P4 → PikeOS Commercial Clone 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 ©2013 Gernot Heiser, NICTA 7 SOSP'13 L4 Deployments – in the Billions SiMKo 3 “Merkelphone” ©2013 Gernot Heiser, NICTA 8 SOSP'13 seL4: Unprecedented Dependability Confiden- Availability Integrity tiality Integrity Proof Proof [ITP’11] Non-interference Abstract [S&P’13] Model Functional correctness [SOSP’09] Proof Proof Translation C Imple- • First & only OS kernel correctness mentation with security proofs to [PLDI’13] binary code • First & only protected- Timeliness Proof mode OS kernel with [RTSS’11, Binary sound timeliness analysis EuroSys’12] code ©2013 Gernot Heiser, NICTA 9 SOSP'13 L4 Design and Implementation Implement. Tricks [SOSP’93] Design Decisions [SOSP’95] • Process kernel • Synchronous IPC • Virtual TCB array • Rich message structure, • Lazy scheduling arbitrary out-of-line messages • Direct process switch • Zero-copy register messages • Non-preemptible • User-mode page-fault handlers • Non-portable • Threads as IPC destinations • Non-standard calling • IPC timeouts convention • Hierarchical IPC control • Assembler • User-mode device drivers • Process hierarchy • Recursive address-space construction Objective: Minimise cache footprint and TLB misses ©2013 Gernot Heiser, NICTA 10 SOSP'13 DESIGN ©2013 Gernot Heiser, NICTA 11 SOSP'13 L4 Synchronous IPC Thread1 Thread2 Running Blocked Blocked Running Rendezvous model Send (dest, msg) Wait (src, msg) ….... Kernel copy Kernel executes in sender’s context • copies memory data directly to receiver (single-copy) • leaves message registers unchanged during context switch (zero copy) ©2013 Gernot Heiser, NICTA 12 SOSP'13 “Long” IPC Sender address space Kernel copy Page fault! Receiver address space LONG IPC ABANDONED • IPC page faults are nested exceptions ⇒ In-kernel concurrency – L4 executes with interrupts disabled for performance, no concurrency • Must invoke untrusted usermode page-fault handlers – potential for DOSing other thread Why have long IPC? • Timeouts to avoid DOS attacks • POSIX-style APIs – complexity write (fd, buf, nbytes) • Usually prefer shared buffers ©2013 Gernot Heiser, NICTA 13 SOSP'13 Timeouts Limit IPC Thread1 Thread2 blocking Running Blocked Blocked Running time Send (dest, msg) IPC Timeouts ….... Wait (src, msg) Thread1 Kernel Running BlockedABANDONED copy in seL4, OKL4 Rcv(NIL_THRD, delay) • No theory/heuristics for determining timeouts • Typically server reply ….... Timed with zero TO, else ∞ wait • Can do timed wait with timer syscall ©2013 Gernot Heiser, NICTA 14 SOSP'13 Synchronous IPC Issues Thread1 Running Blocked Worker_Th IO_Th Running Blocked Blocked Running Initiate_IO(…,…) Unblock (IO_Th) ….... Call (IO,msg) IO_Wait(…,…) Not Sync(Worker_Th) generally possible Sync(IO_Th) ….... • Sync IPC forces multi-threaded code! • Also poor choice for multi-core ©2013 Gernot Heiser, NICTA 15 SOSP'13 Asynchronous Notifications Thread1 Thread2 Running Blocked Blocked Running Sync IPC w = Poll (…) complementedasync …... w = Wait (…) with Send (Thr_2, …) ….... ….... • Delivers few bits (destructively) Send (Thr_2, …) • Kernel only buffers single word • Maps well to interrupts, exceptions • Thread can wait for Server Sync Async synchronous and Client Driver asynchronous messages concurrently ©2013 Gernot Heiser, NICTA 16 SOSP'13 IPC Destination Naming RPC reply from wrong thread! Original L4 Client Server addressed IPC to threads Client Server Client Server IPC Load balancer Workers All IPCs duplicated! Thread IDs Client must do load balancing? replaced by IPC “endpoints” (ports)• Inefficient designs • Poor information hiding • Covert channels [Shapiro ‘02] ©2013 Gernot Heiser, NICTA 17 SOSP'13 Endpoints Client Server Client Sync EP Server Rcv IPC Send • Sync EP queues senders/receivers • Does not buffer messages 0x01 Async EP 0x10 0x30 0x000x010x110x31 • Async EP accumulates bits ©2013 Gernot Heiser, NICTA 18 SOSP'13 Other Design Issues IPC Control: “Clans & Chiefs” Process Hierarchy Chief Hierarchical IPC outside clan resource IPC re-directs to chief management Create Clan Hierarchies replaced by cap- delegatable Inflexible, clumsy, based access inefficient hierarchies! control ©2013 Gernot Heiser, NICTA 19 SOSP'13 IMPLEMENTATION ©2013 Gernot Heiser, NICTA 20 SOSP'13 Virtual TCB Array Fast TCB & stack lookup Thread ID VM TCB TCB Trades cache for TLB footprint Trades TLB and virtual address space footprint TCB stack for cache proper Kernel and kernel • Not worthwhile on memory Get own modern processors! TCB base • Stacks can dominate by masking kernel memory use! stack pointer ©2013 Gernot Heiser, NICTA 21 SOSP'13 “Lazy” Scheduling • In IPC-based systems, threads block and unblock frequenty Idea: leave blocked • Many ready queue manipulations threads in ready queue, scheduler cleans up thread_t schedule() { foreach (prio in priorities) { foreach (thread in runQueue[prio]) { Scheduler execution if (isRunnable(thread)) return thread; time is unbounded! else schedDequeue(thread); } } “Benno scheduling”: return idleThread; • All threads on ready queue } are runnable • All runnable threads in ready queue except the running one ©2013 Gernot Heiser, NICTA 22 SOSP'13 L4 Design and Implementation Implement. Tricks [SOSP’93] Design Decisions [SOSP’95] • Process kernel • Synchronous IPC • Virtual TCB array • Rich message structure, • Lazy scheduling arbitrary out-of-line messages • Direct process switch • Zero-copy register messages • Non-preemptible • User-mode page-fault handlers • Non-portable • Threads as IPC destinations • Non-standard calling • IPC timeouts convention • Hierarchical IPC control • Assembler • User-mode device drivers • Process hierarchy • Recursive address-space construction ©2013 Gernot Heiser, NICTA 23 SOSP'13 What are the Principles? • Minimality is excellent driver of design decisions – L4 kernels have become simpler over time – Policy-mechanism separation (user-mode page-fault handlers) – Device drivers really belong to user level – Minimality is key enabler for formal verification! • IPC speed still matters – But not everywhere, premature optimisation is wastive – Compilers have got so much better – Verification does not compromise performance – Verification invariants can help improve speed! [Shi, OOPSLA’13] • Also found that capabilities are the way to go – Shapiro (EROS) was right • However, a clean abstraction of time still elusive ©2013 Gernot Heiser, NICTA 24 SOSP'13 Conclusions • Details changed, but principles remained • Microkernels rock! (If done right!) Thank you! We’re hiring: • Chair in Software Systems • Postdocs / junior faculty ©2013 Gernot Heiser, NICTA 25 SOSP'13 .

From L3 to Sel4: What Have We Learnt in 20 Years of L4 Microkernels? Kevin Elphinstone, Gernot Heiser NICTA and University of New South Wales

Quantifying Security Impact of Operating-System Design

The OKL4 Microvisor: Convergence Point of Microkernels and Hypervisors

Microkernel Vs

Operating Systems & Virtualisation Security Knowledge Area

Login Dec 07 Volume 33

Hype and Virtue

Flexible Task Management for Self-Adaptation of Mixed-Criticality Systems with an Automotive Example

Faults in Linux: Ten Years Later a Case for Reproducible Scientiﬁc Results

An Overview of Minix 3

A Secure Computing Platform for Building Automation Using Microkernel-Based Operating Systems Xiaolong Wang University of South Florida, [email protected]

Systematic Setup of Compartmentalised Systems on the Sel4 Microkernel

Intel SGX Explained