The OKL4 Microvisor: Convergence Point of Microkernels and Hypervisors

Gernot Heiser, Ben Leslie

Open Kernel Labs and NICTA and UNSW Sydney, Australia

Microkernels vs Hypervisors

> Hypervisors = “microkernels done right?” [Hand et al, HotOS ‘05]
  • Talks about “liability inversion”, “IPC irrelevance” …

> What’s the difference anyway?

What are Hypervisors?

> Hypervisor = “virtual machine monitor”
  • Designed to multiplex multiple virtual machines on a single physical machine

[Figure: two virtual machines (apps running on a guest OS each) multiplexed on a single hypervisor]

> Invented in ‘60s to time-share with single-user OSes
> Re-discovered in ‘00s to work around broken OS resource management

What are Microkernels?

> Designed to minimise kernel code
  • Remove policy, services, retain mechanisms
  • Run OS services in user mode
  • Software-engineering and dependability reasons
  • L4: ≈ 10 kLOC, Xen: ≈ 100 kLOC, Linux: ≈ 10,000 kLOC

[Figure: apps, OS servers and device drivers running as user-mode components on a microkernel]

> IPC performance critical (highly optimised)
  • Achieved by API simplicity, cache-friendly implementation
> Invented 1970 [Brinch Hansen], popularised late ‘80s (Mach, Chorus)

What’s the Difference?

> Both contain all code executing at highest privilege level
  • Although hypervisor may contain user-mode code as well
> Both need to abstract hardware resources
  • Hypervisor: abstraction closely models hardware
  • Microkernel: abstraction designed to support wide range of systems
> What must be abstracted?
  • Memory
  • CPU
  • I/O
  • Communication

What’s the difference?

Resource       | Hypervisor                    | Microkernel
Memory         | Virtual MMU (vMMU)            | Address space                         (just page tables in disguise)
CPU            | Virtual CPU (vCPU)            | Thread or scheduler activation        (just kernel-scheduled activities)
I/O            | • Simplified virtual device   | • IPC interface to user-mode driver   (the real difference?)
               | • Driver in hypervisor        | • Interrupt IPC
               | • Virtual IRQ (vIRQ)          |
Communication  | Virtual NIC, with driver      | High-performance
               | and network stack             | message-passing IPC
Summary        | Modelled on HW, re-uses SW    | Minimal overhead, custom API

Closer Look at I/O and Communication

[Figure: hypervisor I/O: a guest VM reaches a driver VM through a virtual device and virtual NW provided by the hypervisor; microkernel I/O: apps, servers and a user-mode device driver communicate directly via IPC]

> Communication is critical for I/O
  • Microkernel IPC is highly optimised
  • Hypervisor inter-VM communication is frequently a bottleneck

Hypervisors vs Microkernels: Summary

> Fundamentally, both provide similar abstractions
> Optimised for different use cases
  • Hypervisor designed for virtual machines
    • API is hardware-like to ease guest ports
  • Microkernel designed for multi-server systems
    • API seems to provide more OS-like abstractions

Hypervisors vs Microkernels: Drawbacks

Hypervisors:
> Communication is Achilles heel
  • More important than expected
  • Critical for I/O
  • Plenty of attempts to improve it in Xen
> Most hypervisors have big TCBs
  • Infeasible to achieve high assurance of security/safety
  • In contrast, microkernel implementations can be proved correct

Microkernels:
> Not ideal for virtualization
  • API not very effective
  • L4 virtualization performance close to hypervisor, but effort much higher
  • Virtualization needed for legacy
> L4 model uses kernel-scheduled threads for more than exploiting parallelism
  • Kernel imposes policy
  • Alternatives exist, e.g. scheduler activations

Could we get the best of both?

Best of Both Worlds: The OKL4 Microvisor

Microvisor = microkernel + hypervisor
> Microkernel with a hypervisor API (sketched below):
  • vCPU: scheduled by the microvisor on a real CPU
    • Multiple vCPUs per VM for an OS running on multiple cores
  • vMMU: looks like a large TLB that caches mappings
  • vIRQ: virtualize interrupts and provide asynchronous inter-VM signals
> Plus asynchronous channel abstraction
  • For high-bandwidth, low-latency communication
> Small TCB: ≈ 10 kLOC
> Suitable for multi-server systems

[Figure: a VM (apps), a virtual device and a stand-alone device driver running on the microvisor, communicating through a channel]
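To make the shape of such an API concrete, here is a minimal C sketch of how the four abstractions above could be exposed to a guest. All names, types and signatures are hypothetical illustrations for this talk; they are not the actual OKL4 hypercall interface.

```c
/* Hypothetical sketch of a microvisor-style hypercall interface.
 * The names and signatures are illustrative only, NOT the real OKL4 API;
 * they show one way the vCPU, vMMU, vIRQ and channel abstractions could
 * be presented to a guest. */

#include <stddef.h>
#include <stdint.h>

typedef uint32_t vcpu_id_t;
typedef uint32_t virq_t;
typedef uint32_t channel_t;

/* vCPU: the microvisor schedules virtual CPUs onto physical cores. */
int vcpu_start(vcpu_id_t vcpu, uintptr_t entry, uintptr_t stack);
int vcpu_stop(vcpu_id_t vcpu);

/* vMMU: behaves like a large software-loaded TLB that caches the guest's
 * mappings; the guest explicitly installs and removes entries. */
int vmmu_map(uintptr_t guest_virt, uintptr_t guest_phys,
             size_t size, unsigned access_rights);
int vmmu_unmap(uintptr_t guest_virt, size_t size);

/* vIRQ: virtualised interrupts, also usable as asynchronous
 * inter-VM signals. */
int virq_register_handler(virq_t irq, void (*handler)(virq_t));
int virq_raise(vcpu_id_t target_vcpu, virq_t irq);

/* Channel: asynchronous, single-copy data transfer between VMs;
 * completion is typically signalled by raising a vIRQ. */
int channel_send(channel_t ch, const void *buf, size_t len);
int channel_recv(channel_t ch, void *buf, size_t len);
```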

But Liedtke Said: “IPC Shall Be Synchronous!”

> Early microkernels had poorly-performing semi-synchronous IPC (see the sketch below)
  • Asynchronous send, synchronous receive
  • Requires message buffering in kernel
  • Each message is copied twice!

[Figure: send(dest, buf) copies the message from the sender’s buffer into a kernel buffer; receive(*, buf) copies it again into the receiver’s buffer (two copies per message)]
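A minimal user-space model of this double-copy scheme, using POSIX threads to stand in for the sender, the receiver and the kernel’s message buffer. It is an illustration of the cost described above, not real kernel code.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MSG_SIZE 64

static char kernel_buffer[MSG_SIZE];            /* the in-kernel message buffer */
static int  msg_present = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Asynchronous send: deposit the message in the kernel buffer and return. */
static void ipc_send(const char *buf)
{
    pthread_mutex_lock(&lock);
    memcpy(kernel_buffer, buf, MSG_SIZE);       /* copy 1: sender -> kernel */
    msg_present = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Synchronous receive: block until a message is buffered, then copy it out. */
static void ipc_receive(char *buf)
{
    pthread_mutex_lock(&lock);
    while (!msg_present)
        pthread_cond_wait(&cond, &lock);
    memcpy(buf, kernel_buffer, MSG_SIZE);       /* copy 2: kernel -> receiver */
    msg_present = 0;
    pthread_mutex_unlock(&lock);
}

static void *receiver(void *arg)
{
    char buf[MSG_SIZE];
    (void)arg;
    ipc_receive(buf);
    printf("received: %s\n", buf);
    return NULL;
}

int main(void)
{
    pthread_t t;
    char msg[MSG_SIZE] = "hello";
    pthread_create(&t, NULL, receiver, NULL);
    ipc_send(msg);                              /* returns after copy 1, no rendezvous */
    pthread_join(t, NULL);
    return 0;
}
```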

But Liedtke Said: “IPC Shall Be Synchronous!”

> Early microkernels had poorly-performing semi-synchronous IPC
> Liedtke designed L4 IPC fully synchronously to avoid such overheads (see the sketch below)
  • Rendezvous model: each operation blocks until partner is ready
  • No buffering, message copied directly from sender to receiver
  • Requires kernel-scheduled threads to avoid blocking whole app!

[Figure: send(dest, buf) and receive(*, buf) rendezvous in the kernel; the message is copied once, directly from the sender’s buffer into the receiver’s buffer]
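For contrast, the same exchange under a rendezvous model, again as a user-space pthreads sketch rather than actual L4 code: both partners block until the other arrives, and the message is copied exactly once, straight into the receiver’s buffer.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MSG_SIZE 64

static char *receiver_buf = NULL;           /* registered by the waiting receiver */
static int   delivered = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Sender blocks until a receiver is waiting, then copies directly. */
static void ipc_send(const char *buf)
{
    pthread_mutex_lock(&lock);
    while (receiver_buf == NULL)            /* wait for rendezvous partner */
        pthread_cond_wait(&cond, &lock);
    memcpy(receiver_buf, buf, MSG_SIZE);    /* the single copy */
    receiver_buf = NULL;
    delivered = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* Receiver announces its buffer, then blocks until a sender has delivered. */
static void ipc_receive(char *buf)
{
    pthread_mutex_lock(&lock);
    receiver_buf = buf;
    delivered = 0;
    pthread_cond_broadcast(&cond);
    while (!delivered)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

static void *sender(void *arg)
{
    char msg[MSG_SIZE] = "ping";
    (void)arg;
    ipc_send(msg);                          /* blocks until the receiver is ready */
    return NULL;
}

int main(void)
{
    pthread_t t;
    char buf[MSG_SIZE];
    pthread_create(&t, NULL, sender, NULL);
    ipc_receive(buf);
    printf("received: %s\n", buf);
    pthread_join(t, NULL);
    return 0;
}
```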

OKL4 Microvisor Communication is Asynchronous

> In our experience, real-world embedded systems require asynchronous IPC
  • Maps more naturally to hardware events
  • In the past, we added asynchronous notification (lightweight signals) to L4 IPC
> The OKL4 Microvisor only provides fully asynchronous communication (see the sketch below):
  • vIRQs as asynchronous signals
  • Channels for asynchronous data transfer
    • Still single-copy, no kernel buffers!
    • Synchronised e.g. by sending a vIRQ

[Figure: the sender deposits data in the channel buffer; the receiver’s read(channel, data) copies it out (single copy, no kernel buffer)]
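The sketch below models this channel in user-space C: the sender produces data directly into a buffer shared with the receiver and raises a vIRQ-like notification, and the receiver’s read performs the single copy of the transfer. The structure and function names are illustrative assumptions, not the real OKL4 interface.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define CHAN_SIZE 64

/* A channel: a buffer shared between sender and receiver, plus a
 * pending-vIRQ flag standing in for the asynchronous notification. */
struct channel {
    char   buf[CHAN_SIZE];
    size_t len;
    int    virq_pending;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
};

static struct channel chan = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cond = PTHREAD_COND_INITIALIZER,
};

/* Sender: produce data directly in the channel buffer (no extra copy),
 * then raise a vIRQ to notify the receiver; the sender never blocks
 * waiting for its peer. */
static void channel_produce(struct channel *ch)
{
    pthread_mutex_lock(&ch->lock);
    snprintf(ch->buf, CHAN_SIZE, "sensor reading: %d", 42);
    ch->len = strlen(ch->buf) + 1;
    ch->virq_pending = 1;                   /* "raise vIRQ" */
    pthread_cond_signal(&ch->cond);
    pthread_mutex_unlock(&ch->lock);
}

/* Receiver: wait for the vIRQ, then copy the data out of the channel
 * buffer; this is the single copy of the whole transfer. */
static size_t channel_read(struct channel *ch, char *data)
{
    size_t len;
    pthread_mutex_lock(&ch->lock);
    while (!ch->virq_pending)
        pthread_cond_wait(&ch->cond, &ch->lock);
    len = ch->len;
    memcpy(data, ch->buf, len);             /* single copy: channel -> receiver */
    ch->virq_pending = 0;
    pthread_mutex_unlock(&ch->lock);
    return len;
}

static void *receiver(void *arg)
{
    char data[CHAN_SIZE];
    size_t n;
    (void)arg;
    n = channel_read(&chan, data);
    printf("got %zu bytes: %s\n", n, data);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, receiver, NULL);
    channel_produce(&chan);
    pthread_join(t, NULL);
    return 0;
}
```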

Performance: lmbench

Benchmark      Native      Virtualized   Overhead (abs)        Overhead (%)
null syscall   0.6 µs      0.96 µs       0.36 µs (180 cy)      60 %
read           1.14 µs     1.31 µs       0.17 µs (85 cy)       15 %
stat           4.73 µs     5.05 µs       0.32 µs (160 cy)      7 %
fstat          1.58 µs     2.24 µs       0.66 µs (330 cy)      42 %
open/close     9.12 µs     8.23 µs       -0.89 µs (-445 cy)    -10 %
select(10)     2.62 µs     2.98 µs       0.36 µs (180 cy)      14 %
sig handler    1.77 µs     2.05 µs       0.28 µs (140 cy)      16 %
pipe latency   41.56 µs    54.45 µs      12.89 µs (6.4 kcy)    31 %
UNIX socket    52.76 µs    80.90 µs      28.14 µs (14 kcy)     53 %
fork           1,106 µs    1,190 µs      84 µs (42 kcy)        8 %
fork+execve    4,710 µs    4,933 µs      223 µs (112 kcy)      5 %
system         7,583 µs    7,796 µs      213 µs (107 kcy)      3 %

Beagle Board (500 MHz ARMv7)

Performance: Netperf on Loopback Device

Type  Benchmark    Native        Virtualized   Overhead
TCP   Throughput   651 Mib/s     630 Mib/s     3 %
      CPU load     99 %          99 %          0 %
      Cost         12.5 µs/KiB   12.9 µs/KiB   3 %
UDP   Throughput   537 Mib/s     516 Mib/s     4 %
      CPU load     99 %          99 %          0 %
      Cost         15.2 µs/KiB   15.8 µs/KiB   4 %

Beagle Board (500 MHz ARMv7)

Conclusions

> Classical hypervisors and microkernels both have significant drawbacks
  • Hypervisor inter-VM communication primitives are slow, which hurts I/O
  • Hypervisors also have huge TCBs
  • Microkernel APIs not ideal for virtualization
> Microvisor combines best of both
  • Virtualization-oriented API
  • Microkernel (i.e. small) code base
  • Fast communication channels for multi-server designs (stand-alone drivers)
> OKL4 Microvisor shows excellent virtualization performance
> Suitability for multi-server designs still needs to be proven
