Copyright Notice

These slides are distributed under the Creative Commons Attribution 3.0 License.
You are free:
- to share — to copy, distribute and transmit the work
- to remix — to adapt the work
Under the following conditions:
- Attribution. You must attribute the work (but not in any way that suggests
  that the author endorses you or your use of the work) as follows:
  "Courtesy of Gernot Heiser, UNSW"
The complete license text can be found at
http://creativecommons.org/licenses/by/3.0/legalcode

Virtual Machines
COMP9242 2009/S2 Week 6

©2009 Gernot Heiser UNSW/NICTA. Distributed under Creative Commons Attribution License

Virtual Machines

- "A virtual machine (VM) is an efficient, isolated duplicate of a real machine"
- Duplicate: VM should behave identically to the real machine
  - Programs cannot distinguish between execution on real or virtual hardware
  - Except for:
    - Fewer resources available (and potentially different between executions)
    - Some timing differences (when dealing with devices)
- Isolated: several VMs execute without interfering with each other
- Efficient: VM should execute at a speed close to that of real hardware
  - Requires that most instructions are executed directly by real hardware

Virtual Machines, Simulators and Emulators

- Simulator
  - Provides a functionally accurate software model of a machine
  - May run on any hardware
  - Is typically slow (order of 1000× slowdown)
- Emulator
  - Provides a behavioural model of hardware (and possibly software)
  - Not fully accurate
  - Reasonably fast (order of 10× slowdown)
- Virtual machine
  - Models a machine exactly and efficiently
  - Minimal slowdown
  - Needs to be run on the physical machine it virtualizes (more or less)

Types of Virtual Machines

- Contemporary use of the term VM is more general
- Called virtual machines even if there is no correspondence to an existing
  real machine
  - E.g. Java virtual machine
  - Can be viewed as virtualizing at the ABI level
  - Also called process VM
- We only concern ourselves with virtualizing at the ISA level
  - ISA = instruction-set architecture (hardware–software interface)
  - Also called system VM
  - Will later see subclasses of this

Virtual Machine Monitor (VMM), aka Hypervisor

- Program that runs on real hardware to implement the virtual machine
- Controls resources
  - Partitions hardware
  - Schedules guests
  - Mediates access to shared resources, e.g. console
  - Performs world switch
- Implications:
  - Hypervisor executes in privileged mode
  - Guest software executes in unprivileged mode
  - Privileged instructions in the guest cause a trap into the hypervisor
  - Hypervisor interprets/emulates them
  - Can have extra instructions for hypercalls
[Diagram: Guest OS running on the Hypervisor, which runs on the Hardware]

Why Virtual Machines?

- Historically used for easier sharing of expensive mainframes
  - Run several (even different) OSes on the same machine
  - Each on a subset of the physical resources
  - Can run single-user single-tasked OS in a time-sharing system
    → legacy support
  - Uniform view of hardware
  - "World switch" between VMs
- Gone out of fashion in the 80s
  - Time-sharing OSes commonplace
  - Hardware too cheap to worry...

Why Virtual Machines?

- Renaissance in recent years for improved isolation
- Server/desktop virtual machines
  - Improved QoS and security
  - Complete encapsulation
    → replication, migration, checkpointing, debugging
  - Different concurrent OSes, e.g. Linux and Windows
  - Total mediation
- Would be mostly unnecessary if OSes were doing their job...
[Diagram: two VMs (apps on a guest OS on virtual RAM), with the hypervisor
mapping each VM's virtual RAM onto a memory region of physical RAM]


Native vs. Hosted VMM

- Native/Classic/Bare-metal/Type-I: hypervisor runs directly on the hardware
- Hosted/Type-II: hypervisor runs on top of a host OS
- Hosted VMM can run beside native apps
  - Sandbox untrusted apps
  - Run a second OS
  - Less efficient:
    - Guest privileged instruction traps into the host OS, forwarded to the
      hypervisor
    - Return to the guest requires a native OS system call
  - Convenient for running an alternative OS environment on the desktop

VMM Types

- Classic: as above
- Hosted: runs on top of another OS
  - e.g. VMware Player/Fusion
- Whole-system: virtual hardware and operating system
  - Really an emulation
  - E.g. Virtual PC (for Macintosh)
- Physically partitioned: allocate actual processors to each VM
- Logically partitioned: time-share processors between VMs
- Co-designed: hardware specifically designed for the VMM
  - E.g. Transmeta Crusoe, IBM i-Series
- Pseudo: no enforcement of partitioning
  - Guests at the same privilege level as the hypervisor
  - Really an abuse of the term "virtualization"
  - e.g. products with "optional isolation"

Virtualization Mechanics

- Traditional "trap and emulate" approach:
  - Guest attempts to access a physical resource
  - Hardware raises an exception (trap), invoking the hypervisor's exception
    handler
  - Hypervisor emulates the result, based on access to the virtual resource
- Most instructions do not trap
  - Makes efficient virtualization possible
  - Requires that the VM ISA is (almost) the same as the physical processor ISA

  Guest:                         VMM exception handler:
    ld  r0, curr_thrd              lda r1, vm_reg_ctxt
    ld  r1, (r0,ASID)              ld  r2, (r1,ofs_r0)
    mv  CPU_ASID, r1   ← traps     sto r2, (r1,ofs_ASID)
    ld  sp, (r1,kern_stk)

Virtualization Mechanics: Address Translation

- Guest maintains its own (virtual) page table, translating guest virtual
  addresses to guest physical addresses
- Hypervisor maintains a guest memory map, translating guest physical
  addresses to physical addresses
- Shadow page table: hypervisor mirrors the guest's page table updates, so
  the hardware walks a table mapping guest virtual directly to physical
  addresses
[Diagram: application address → (virtual) guest page table → hypervisor's
guest memory map → physical address of data; hypervisor holds the virtual
PT pointer, guest OS holds its own PT pointer]
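The trap-and-emulate cycle above can be sketched in a few lines. This is a toy model: the `VirtualCPU` class, the instruction names, and the single-register "machine" are all illustrative, not any real hypervisor API.

```python
class VirtualCPU:
    """Virtualized privileged state, kept by the hypervisor per guest."""
    def __init__(self):
        self.asid = 0          # virtual address-space ID

def trap_to_hypervisor(vcpu, op, arg):
    """Exception handler: emulate the instruction on the virtual state."""
    if op == "mv_cpu_asid":
        vcpu.asid = arg        # update the *virtual* ASID, not the real one

def run_guest(vcpu, program):
    """Execute guest instructions; privileged ones trap to the hypervisor."""
    for op, arg in program:
        if op == "mv_cpu_asid":          # privileged: traps in user mode
            trap_to_hypervisor(vcpu, op, arg)
        # innocuous instructions (loads, ALU ops) run directly on hardware
    return vcpu

vcpu = run_guest(VirtualCPU(), [("ld", 0), ("mv_cpu_asid", 5), ("ld", 1)])
```

The point of the sketch is the division of labour: innocuous instructions never enter the dispatcher in a real system, only the trap path does.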

Requirements for Virtualization

Definitions:
- Privileged instruction: executes in privileged mode, traps in user mode
  - Note: a trap is required, a NO-OP is insufficient!
- Privileged state: determines resource allocation
  - Includes privilege mode, addressing context, exception vectors, ...
- Sensitive instruction: control-sensitive or behaviour-sensitive
  - Control-sensitive: changes privileged state
  - Behaviour-sensitive: exposes privileged state
    - Includes instructions which are NO-OPs in user but not privileged mode
- Innocuous instruction: not sensitive

Note:
- Some instructions are inherently sensitive
  - e.g. TLB load
- Others are sensitive in some context
  - e.g. store to page table

Trap-and-Emulate Requirements

- An architecture is virtualizable if all sensitive instructions are
  privileged
- Can then achieve accurate, efficient guest execution
  - By simply running the guest binary on the hypervisor
- VMM controls resources
- Virtualized execution is indistinguishable from native, except:
  - Resources are more limited (running on a smaller machine)
  - Timing is different (if there is an observable time source)
- Recursively virtualizable machine: a VMM can be built without any timing
  dependence
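The virtualizability criterion above reduces to a set inclusion: every sensitive instruction must be privileged. A minimal sketch, with made-up stand-in instruction sets rather than complete ISAs:

```python
def virtualizable(sensitive, privileged):
    """True iff all sensitive instructions trap when run in user mode."""
    return set(sensitive) <= set(privileged)

# A well-behaved hypothetical ISA: every sensitive instruction is privileged.
clean_isa = virtualizable(sensitive={"tlb_load", "set_asid"},
                          privileged={"tlb_load", "set_asid", "halt"})

# x86-like problem case: an instruction that exposes/changes privileged
# state (here called "popf") executes in user mode without trapping.
x86ish = virtualizable(sensitive={"popf", "set_cr3"},
                       privileged={"set_cr3"})
```

`clean_isa` comes out true and `x86ish` false, matching the slide's point that one non-trapping sensitive instruction is enough to break trap-and-emulate.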


Virtualization Overheads

- VMM needs to maintain virtualized privileged machine state
  - Processor status
  - Addressing context
  - Device state...
- VMM needs to emulate privileged instructions
  - Translate between virtual and real privileged state
  - e.g. guest → real page tables
- Virtualization traps are expensive on modern hardware
  - Can be 100s of cycles (x86)
- Some OS operations involve frequent traps
  - STI/CLI for mutual exclusion
  - Frequent page table updates during fork()...
- Recent architecture extensions provide virtualization support

Unvirtualizable Architectures

- x86: lots of unvirtualizable features
  - e.g. sensitive push of the PSW is not privileged
  - Segment and interrupt descriptor tables in virtual memory
  - Segment descriptors expose the privilege level
- Itanium: mostly virtualizable, but
  - Interrupt vector table in virtual memory
  - THASH instruction exposes the hardware page-table address
- MIPS: mostly virtualizable, but
  - Kernel registers k0, k1 (needed to save/restore state) user-accessible
  - KSEG addresses used for physical addressing in the kernel
  - Performance issue with virtualizing KSEG addresses
- ARM: mostly virtualizable, but
  - Some instructions undefined in user mode (banked registers, CPSR)
  - PC is a GPR; exception return is MOVS to PC, doesn't trap
- Most others have problems too

Impure Virtualization

- Used for two reasons:
  - Unvirtualizable architectures
  - Performance problems of virtualization
- Change the guest OS, replacing sensitive instructions
  - By trapping code (hypercalls)
  - By in-line emulation code
- Two standard approaches:
  - Binary translation: modifies the binary
  - Para-virtualization: changes the ISA

Binary Translation

- Locate sensitive instructions in the guest binary and replace them
  on-the-fly by emulation code or a hypercall
  - Pioneered by VMware
  - Can also detect combinations of sensitive instructions and replace them
    by a single emulation
  - Doesn't require source, uses the unmodified native binary
- In this respect appears like pure virtualization
  - Very tricky to get right (especially on x86!)
  - Needs to make some assumptions about sane behaviour of the guest

  Original:                      Translated:
    ld r0, curr_thrd               ld r0, curr_thrd
    ld r1, (r0,ASID)               ld r1, (r0,ASID)
    mv CPU_ASID, r1    ← trap      jmp fixup_15
    ld sp, (r1,kern_stk)           ld sp, (r1,kern_stk)
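The rewrite shown above can be sketched as a scan over a basic block: each sensitive instruction is spliced out into a fixup stub and replaced by a jump. The instruction strings mirror the slide's pseudo-assembly; nothing here is a real translator.

```python
SENSITIVE = {"mv_cpu_asid"}      # opcodes the VMM must emulate

def translate(block):
    """Replace each sensitive instruction by a jump to a fixup stub."""
    out, stubs = [], {}
    for i, insn in enumerate(block):
        if insn.split()[0] in SENSITIVE:
            label = f"fixup_{i}"
            stubs[label] = insn          # emulated later by the VMM
            out.append(f"jmp {label}")
        else:
            out.append(insn)             # innocuous: copied unchanged
    return out, stubs

code = ["ld r0, curr_thrd", "ld r1, (r0,ASID)",
        "mv_cpu_asid r1", "ld sp, (r1,kern_stk)"]
translated, stubs = translate(code)
```

A real translator works on machine code, caches translated blocks, and handles instruction-length changes; the mapping from sensitive instruction to out-of-line emulation stub is the part this sketch keeps.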

Para-Virtualization

- New name, old technique
  - Unix server [Golub et al, 90], [Härtig et al, 97], Disco
    [Bugnion et al, 97]
  - Name coined by Denali [Whitaker et al, 02], popularised by Xen
    [Barham et al, 03]
- Idea: manually port the guest OS to a modified ISA
  - Augment by explicit hypervisor calls (hypercalls)
    - Use a more high-level API to reduce the number of traps
    - Remove unvirtualizable instructions
    - Remove "messy" ISA features which complicate virtualization
- Generally out-performs pure virtualization and binary rewriting
- Drawbacks:
  - Significant engineering effort
  - Needs to be repeated for each guest-ISA-hypervisor combination
  - Para-virtualized guest needs to be kept in sync with the native guest
  - Requires source
[Diagram: Guest running on Hypervisor on Hardware]

Virtualization Techniques

- Impure virtualization methods enable new optimisations
  - Due to the ability to control the ISA
- E.g. maintain some virtual machine state inside the VM:
  - e.g. interrupt-enable bit (in a virtual PSR)
  - Guest can update it without (expensive) hypervisor invocation
    - Requires changing the guest's idea of where this bit lives
  - Hypervisor knows about the VM-local virtual state and can act accordingly
    - e.g. queue virtual interrupts until the guest enables them in the
      virtual PSR

  Guest disables interrupts without trapping (VPSR in guest memory):
    mov r1, #VPSR
    ldr r0, [r1]
    orr r0, r0, #VPSR_ID
    sto r0, [r1]
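The virtual-PSR optimisation can be sketched as follows. All names (`SharedState`, `vpsr_id`, the IRQ strings) are illustrative: the point is only that the guest flips a bit in shared memory with a plain store, and the hypervisor queues interrupts while that bit is set.

```python
class SharedState:
    """VM-local virtual state, visible to both guest and hypervisor."""
    def __init__(self):
        self.vpsr_id = False   # virtual "interrupts disabled" bit
        self.pending = []      # interrupts queued by the hypervisor

def guest_disable(vm):
    vm.vpsr_id = True          # plain store: no trap, no hypercall

def guest_enable(vm):
    vm.vpsr_id = False
    delivered, vm.pending = vm.pending, []
    return delivered           # queued interrupts delivered on enable

def hypervisor_interrupt(vm, irq):
    if vm.vpsr_id:
        vm.pending.append(irq) # guest has masked: queue the virtual IRQ
        return None
    return irq                 # deliver immediately

vm = SharedState()
guest_disable(vm)
queued = hypervisor_interrupt(vm, "timer")   # masked: queued, not delivered
delivered = guest_enable(vm)                 # now the timer IRQ arrives
```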


Virtualization Techniques

- E.g. lazy update of virtual machine state
  - Virtual state is kept inside the hypervisor
  - Keep a copy of the virtual state inside the VM
  - Allow temporary inconsistency between the local copy and the real VM state
  - Synchronise state on the next forced hypervisor invocation
    - Actual trap
    - Explicit hypercall when physical state must be updated,
      e.g. to add a mapping
- Example: guest enables FPU
  - No need to invoke the hypervisor at this point
  - Hypervisor syncs state on virtual kernel exit

Virtualization Techniques

Page table implementation options
- Strict shadowing of the virtual page table
  - Write-protect PTs ⇒ force a trap into the hypervisor on each update
  - Can combine multiple updates in a single hypercall, e.g. during fork()
- Lazy shadowing of the virtual page table
  - Identify synchronisation points
  - Possible due to TLB semantics
    - Real PT updates only become effective once loaded into the TLB
    - Explicit TLB loads and flushes are natural synchronisation points
  - PTs are big ⇒ need to tell the hypervisor which part to sync
- Expose real page tables (write-protected)
  - Emulate updates
  - Guest must deal with PT reads differing from what was written
- Complex trade-offs
  - Xen changed approach several times
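Lazy shadowing as described above can be sketched with two dictionaries standing in for page tables (the class and field names are made up): guest writes go to the guest-visible table, and the hypervisor batch-syncs its shadow copy only at a TLB flush.

```python
class ShadowPaging:
    """Toy lazy shadow page table: sync only at TLB flush points."""
    def __init__(self):
        self.guest_pt = {}     # guest-visible page table
        self.shadow_pt = {}    # hypervisor's real (shadow) table
        self.syncs = 0

    def guest_map(self, vpage, frame):
        # Temporarily inconsistent with the shadow: no trap, no hypercall.
        self.guest_pt[vpage] = frame

    def tlb_flush(self):
        # Natural synchronisation point: mappings only take effect once
        # loaded into the TLB, so batch-sync the whole (dirty) table here.
        self.shadow_pt = dict(self.guest_pt)
        self.syncs += 1

mmu = ShadowPaging()
for vpage in range(3):          # e.g. fork() installing several mappings
    mmu.guest_map(vpage, 100 + vpage)
mmu.tlb_flush()                 # one sync instead of three traps
```

Strict shadowing would increment `syncs` on every `guest_map`; the lazy variant trades one batched sync for the cost of tracking which parts of a large PT are dirty.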

Virtualization Techniques

Virtual memory tricks
- Over-committing memory
  - Like classical virtual memory
  - Sum of guest physical RAM > physical RAM
- Page sharing
  - Multiple VMs running the same guest have a lot of common pages
    - Text segments, zeroed pages
  - Hypervisor detects pages with the same content
    - Keeps a hash of every page
    - Uses copy-on-write to map those to a single copy
  - Up to 60% memory savings [Waldspurger 02]
- Memory reclamation using ballooning
  - Load a pseudo device driver (balloon driver) into the guest; it colludes
    with the hypervisor
  - To reclaim memory, the hypervisor instructs the driver to request memory
  - Hypervisor can re-use memory hoarded by the balloon driver
  - Guest controls which memory it gives up (and may page out)

Device Virtualization Techniques

Full virtualization of device
- Hypervisor contains the real device driver
- Native guest driver accesses the device as usual
  - Implies virtualizing all device registers etc.
  - Trap on every device access
- Virtualizes at the device interface
- Hypervisor implements device-sharing policies
- Drawbacks:
  - Must re-implement/port all drivers in the hypervisor
    - Unfeasible for contemporary hardware
  - Very expensive (frequent traps)
  - Will not work on most devices
    - Timing constraints violated
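Content-based page sharing can be sketched by hashing page contents and mapping identical pages onto one frame; subsequent writes would have to copy-on-write. The page IDs and frame representation are illustrative, not VMware's actual data structures.

```python
import hashlib

def share_pages(guest_pages):
    """Map page IDs with identical content onto a single shared frame."""
    frames, mapping = {}, {}
    for page_id, content in guest_pages.items():
        digest = hashlib.sha256(content).hexdigest()
        frames.setdefault(digest, content)   # first copy becomes the frame
        mapping[page_id] = digest            # later writes must COW
    return mapping, frames

pages = {("vm1", 0): b"\x00" * 4096,         # zeroed pages are common
         ("vm2", 0): b"\x00" * 4096,
         ("vm2", 1): b"text-segment"}
mapping, frames = share_pages(pages)
```

Three guest pages collapse onto two frames here; a production implementation also verifies candidate pages byte-for-byte rather than trusting the hash alone.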

Device Virtualization Techniques

Virtual device drivers
- Guest OS contains virtual drivers
  - Forward guest I/O requests to the real driver via hypercalls
  - Very simple drivers
- Need only support a small number of different virtual devices
  - e.g. one type of virtual NIC, one type of virtual disk
- Virtualizes at the driver interface
- Hypervisor implements device-sharing policies
- Drawback: must re-implement/port all drivers in the hypervisor
  - Unfeasible for contemporary hardware
  - ... unless a complete OS becomes the hypervisor (KVM)

Device Virtualization Techniques

Driver inside host (for Type-II VMMs)
- Guest OS contains virtual drivers
- Hypervisor passes requests through to the host OS's drivers
- Fits the Type-II model (the VMM is just an app)


Device Virtualization Techniques

Device-driver OS
- Special guest OS contains the real drivers
  - Xen "Dom0 guest"
- Hypervisor passes requests from the virtual driver through to the driver OS
- Can re-use the driver guest's native drivers unchanged
- Drawbacks:
  - Driver invocation requires a full context switch
  - Driver OS + all drivers becomes part of the VMM
    - Very large TCB
- Can improve the TCB by running each driver in its own guest OS instance
  - Full encapsulation of drivers [LeVasseur et al 04]

Device Virtualization Techniques

Native real driver in guest
- Guest allowed to "own" the device
- Hypervisor not involved in I/O
  - Native I/O performance!
- In general insecure and thus infeasible
- Possible for:
  - Simple devices not doing DMA
    - But sharing is an issue
  - With hardware support
    - Virtualization-friendly devices, e.g. IBM channel architecture
    - IO-MMU: maps I/O space to RAM, under control of the hypervisor,
      e.g. Intel VT-d

Soft Layering aka Pre-Virtualization

- Combines advantages of pure and para-virtualization [LeVasseur et al, 08]
- Automated para-virtualization, not unlike binary translation
- Core idea: post-process ("afterburn") assembly code (compiler output)
  - Prepares ("pre-virtualizes") the code
  - More flexible than binary re-writing
    - Uses semantic info from the compiler
    - Replaces instruction sequences by hypercalls
      - Hooks onto macros etc.
    - No need to keep addresses invariant
      - Linker will fix up address changes
    - Can expand code for virtualization
    - Can do much virtualization in-line
      - Avoids branches to virtualization code
      - e.g. page table updates (PTs are write-protected)
- Disadvantage: needs source (at least the assembler output of the compiler)

Soft Layering aka Pre-Virtualization

- 2nd idea: do the actual fix-up at load time
  - Leave the original (unvirtualized) instructions in the binary
  - Pad with no-ops where virtualization expands the code
    - A jump (to virtualization code) may need more space than the
      virtualized instruction
  - Add info to the ELF file describing where to patch
  - Overwrite unvirtualized instructions (and no-ops) during load
  - Link in hypervisor-specific user-level VMM code ("wedge")
- Advantage: the actual binary is hypervisor-neutral
  - Can be patched (at load time) for any supported hypervisor
  - Can run on bare hardware without any patching
    - No-ops have very little performance effect (0.15%)
  - Has most of the properties of pure virtualization
    - ... except for much improved performance
  - Pre-virtualization doesn't have to be perfect
    - No harm if some sensitive instructions are missed
    - They will be subject to normal (pure) virtualization
      - ... as long as the instruction traps
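The load-time fix-up can be sketched as patching annotated sites in an instruction list. The ELF-style `patch_sites` table and the instruction strings are hypothetical; real pre-virtualization patches machine code in place, using the no-op padding when the hypercall sequence is longer than the original instruction.

```python
def patch_for_hypervisor(binary, patch_sites, hypercall):
    """Overwrite annotated sensitive-instruction slots at load time."""
    patched = list(binary)          # original binary stays untouched
    for site in patch_sites:
        patched[site] = hypercall   # no-op slot after it absorbs growth
    return patched

# Hypervisor-neutral binary: sensitive instruction + no-op padding in place.
binary = ["ld r0, curr_thrd", "mv_cpu_asid r1", "nop", "ld sp, (r1,kern_stk)"]

# Loaded under a hypervisor: the annotated site becomes a hypercall.
on_hv = patch_for_hypervisor(binary, patch_sites=[1], hypercall="hcall set_asid")
# On bare hardware the same binary runs unpatched.
```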

Soft Layering aka Pre-Virtualization

- 3rd idea: feedback loop for optimisation
  - Initially only substitute the most important subset of instructions
    - Non-trapping sensitive instructions
    - Obviously performance-critical ones
  - Profile virtualization traps at run time
    - Hypervisor records location and frequency
  - Use this to reduce virtualization overheads
    - Identify hot spots from the profiling data
    - Annotate hot spots in the source code
    - Add replacement rules to the pre-virtualizer
    - Re-run pre-virtualization and link
- Advantage: guided optimization
  - Similar to "optimized para-virtualization" [Magenheimer & Christian 04]
  - But less ad hoc

Hardware Virtualization Support

- Intel VT-x/VT-i: virtualization support for x86/Itanium
  - Introduces a new processor mode: VMX root mode, for the hypervisor
  - In root mode, the processor behaves like pre-VT x86
  - In non-root mode, all sensitive instructions trap to root mode ("VM exit")
    - Orthogonal to privilege rings, i.e. each mode has 4 ring levels
    - Very expensive traps (700+ cycles on Core processors)
      - Not used by VMware for that reason [Adams & Agesen 06]
  - Supported by Xen for pure virtualization (as an alternative to
    para-virtualization)
  - Used exclusively by KVM
    - KVM uses the whole Linux system as the hypervisor!
    - Implemented by a loadable driver that turns on root mode
  - VT-i (Itanium) also reduces the virtual address-space size for non-root
- Similar: AMD (Pacifica), PowerPC
- Other processor vendors working on similar features
  - ARM TrustZone is a partial solution
- Aim is virtualization of unmodified legacy OSes


Virtualization Performance Enhancements (VT-x)

- Hardware shadows some privileged state
  - "Guest state area" containing segment registers, PT pointer, interrupt
    mask etc.
  - Swapped by hardware on VM entry/exit
  - Guest access to those does not cause a VM exit
  - Reduces hypervisor traps
- Hypervisor-configurable register makes some VM exits optional
  - Allows delegating handling of some events to the guest
    - e.g. interrupts, floating-point enable, I/O bitmaps
    - Selected exceptions, e.g. syscall exception
  - Reduces hypervisor traps
- Exception injection allows forcing certain exceptions on VM entry
- Extended page tables (EPT) provide two-stage address translation
  - Guest virtual → guest physical by the guest's PT
  - Guest physical → physical by the hypervisor's PT
  - TLB refill walks both PTs in sequence

I/O Virtualization Enhancements (VT-d)

- Introduces a separate I/O address space
- Mapped to the physical address space by an I/O MMU
  - Under hypervisor control
- Makes DMA safely virtualizable
  - Device can only read/write RAM that is mapped into its I/O space
- Useful not only for virtualization
  - Safely encapsulated user-level drivers for DMA-capable devices
- AMD IOMMU is essentially the same
- Similar features existed on high-end Alpha and HP boxes
- ... and, of course, IBM channels since the '70s...
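The two-stage EPT walk above can be sketched with dictionaries standing in for the two page tables (the frame numbers are of course invented): stage 1 uses the guest's own table, stage 2 the hypervisor's.

```python
PAGE = 4096

def translate(vaddr, guest_pt, ept):
    """Two-stage translation, as performed on a TLB refill."""
    gframe = guest_pt[vaddr // PAGE]   # stage 1: guest virtual -> guest physical
    mframe = ept[gframe]               # stage 2: guest physical -> machine
    return mframe * PAGE + vaddr % PAGE

guest_pt = {0: 7, 1: 3}    # guest virtual page  -> guest physical frame
ept      = {7: 42, 3: 9}   # guest physical frame -> machine frame

paddr = translate(0x10, guest_pt, ept)   # page 0, offset 0x10
```

The hardware composes the two mappings on a refill and caches only the combined guest-virtual-to-machine translation in the TLB, which is why neither table needs shadowing.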

Halfway There: ARM TrustZone

- ARM TrustZone extensions introduce:
  - New processor mode: monitor
    - Similar to VT-x root mode
    - Banked registers (PC, LR)
    - Can run an unmodified guest OS binary in non-monitor kernel mode
  - New privileged instruction: SMI
    - Enters monitor mode
  - New processor status: secure
    - Partitioning of resources
      - Memory and devices marked secure or insecure
    - In secure mode, the processor has access to all resources
    - In insecure mode, the processor has access to insecure resources only
- Monitor switches world (secure ↔ insecure)
- Really only supports one virtual machine (guest in the insecure world)
  - Need another hypervisor and para-virtualization for multiple guests
[Diagram: secure world (apps on OS, plus monitor) beside insecure world
(apps on guest OS) on one processor]

Uses of Virtual Machines

- Multiple (identical) OSes on the same platform
  - The original raison d'être
  - These days driven by server consolidation
  - Interesting variants of this:
    - Different OSes (Linux + Windows)
    - Old version of the same OS (Win2k for stuff broken under Vista)
    - OS debugging (most likely uses a Type-II VMM)
- Checkpoint-restart
  - Minimise lost work in case of a crash
  - Useful for debugging, incl. going backwards in time
    - Re-run from the last checkpoint to the crash, collect traces, invert
      the trace from the crash
  - Live system migration
    - Load balancing, environment take-home
- Ship application with a complete OS
  - Reduce dependency on the environment
  - "Java done right"
- How about embedded systems?

Why Virtualization for Embedded Systems?

Use case 1: Mobile phone processor consolidation
- High-end phones run a high-level OS (Linux) on the application processor
  - Supports complex UI software
- Baseband processing supported by a real-time OS (RTOS)
- Medium-range phone needs less grunt
  - Can share the processor: two VMs on one physical processor
  - Hardware cost reduction
[Diagram: UI software on Linux and baseband software on an RTOS, moved from
two processors onto one processor with a hypervisor]

Why Virtualization for Embedded Systems?

Use case 1a: License separation
- Linux desired for various reasons
  - Familiar, high-level API
  - Large developer community
  - Free
- Other parts of the system contain proprietary code
- Manufacturer doesn't want to open-source it
- User VM contains Linux + GPL software; proprietary baseband software runs
  in a separate VM on the RTOS


Why Virtualization for Embedded Systems?

Use case 1b: Software-architecture abstraction
- Support for a product series
  - Range of related products of varying capabilities
- Same low-level software for high- and medium-end devices
- Benefits:
  - Time-to-market
  - Engineering cost

Why Virtualization for Embedded Systems?

Use case 1c: Dynamic processor allocation
- Allocate a share of the baseband processor to the application OS
  - Provide extra CPU power during high-load periods (media play)
  - Better processor utilisation
    - Higher performance with lower-end hardware
  - HW cost reduction

Why Virtualization for Embedded Systems?

Use case 2: Certification re-use
- Phones need to be certified to comply with communication standards
- Any change that (potentially) affects comms needs re-certification
- UI part of the system changes frequently
- Encapsulation of the UI
  - Avoids the need for costly re-certification

Why Virtualization for Embedded Systems?

Use case 2a: Open phone with user-configured OS
- Give users control over the application environment
  - Perfect match for Linux
- Requires strong encapsulation of the application environment
  - Provided by a VM
  - Without undermining performance!

©2009 Gernot Heiser UNSW/NICTA. Distributed under Creative Commons Attribution License 41 ©2009 Gernot Heiser UNSW/NICTA. Distributed under Creative Commons Attribution License 42 Why Virtualization for Embedded Systems? Why Virtualization for Embedded Systems?

Use case 2b: Phone with private and enterprise environment Use case 2c: Security ! Work phone environment integrated with enterprise IT system ! Protect against exploits ! Private phone environment contains sensitive personal data ! Modem software attacked by UI exploits ! Mutual distrust between the environments ! strong isolation needed • Compromised application OS could compromise RT side Attack • Could have serious consequences Private Enterprise ! e.g. jamming cellular network UI phone env phone env ! Virtualization protects SW Access • Separate apps and system code into SW Linux Linux Baseband libjpeg Software different VMs

OS OS RTOS Hypervisor Hypervisor

Processor Processor Core

©2009 Gernot Heiser UNSW/NICTA. Distributed under Creative Commons Attribution License 43 ©2009 Gernot Heiser UNSW/NICTA. Distributed under Creative Commons Attribution License 44

Why Virtualization for Embedded Systems?

Use case 3: Mobile internet device (MID) with enterprise app
- MID is an open device, controlled by its owner
- Enterprise app is closed and controlled by the enterprise IT department
- Hypervisor provides isolation

Why Virtualization for Embedded Systems?

Use case 3a: Environment with minimal trusted computing base (TCB)
- Minimise exposure of a highly security-critical service to other code
- Avoid even an OS, provide a minimal trusted environment
  - Need a minimal programming environment
  - Requires basic OS functionality
  - Goes beyond the capabilities of a normal hypervisor
    → generalised hypervisor
[Diagram: apps on Linux beside critical code on a special-purpose OS, both
on a generalised hypervisor]

Why Virtualization for Embedded Systems?

Use case 3b: Point-of-sale (POS) device
- May be stand-alone or integrated with another device (e.g. phone)
- Financial services providers require strong isolation
  - Traditionally a dedicated processor for PIN/key entry
  - Use a dedicated virtual processor instead
    → HW cost reduction

Why Virtualization for Embedded Systems?

Use case 4: DRM on open device
- Device runs Linux as the application OS, uses a Linux-based media player
- DRM must not rely on Linux
- Need trustworthy code that
  - Loads media content into on-chip RAM
  - Decrypts (crypto) and decodes (codec) the content
  - Allows the Linux-based player to display it
- Need to protect the data from the guest OS

Why Virtualization for Embedded Systems?

Use case 4a: IP protection in set-top box
- STB runs Linux for the UI, but also contains highly valuable IP
  - Highly-efficient, proprietary compression algorithm
- Operates in a hostile environment
  - Reverse engineering of algorithms
- Need highly-trustworthy code (secure loader) that
  - Loads code from Flash into on-chip RAM
  - Decrypts the code
  - Runs the code protected from interference

Why Virtualization for Embedded Systems?

Use case 5: Automotive control and infotainment
- Trend to processor consolidation in the automotive industry
  - Top-end cars have > 100 CPUs!
  - Cost, complexity and space pressures to reduce this by an order of
    magnitude
  - AUTOSAR OS standard addressing this for control/convenience functions
- Increasing importance of infotainment
  - Driver information and entertainment
  - Not addressed by AUTOSAR
- Increasing overlap of infotainment and control/convenience
  - e.g. park-distance control using the infotainment display
  - Benefits from being located on the same CPU

Why Virtualization for Embedded Systems?

Future use case: multicore resource management (esp. power)
! Hypervisor is the virtualization layer that allows turning off idle resources
[Diagram: Guest 1 and Guest 2, each with two VCPUs; the hypervisor maps the VCPUs onto four CPUs, consolidating them onto fewer CPUs as load drops so idle CPUs can be powered off]

Enterprise vs Embedded Systems VMs

Homogeneous vs heterogeneous guests
! Enterprise: many similar guests
• hypervisor size irrelevant
• VMs scheduled round-robin
! Embedded: 1 HLOS + 1 RTOS
• hypervisor resource-constrained
• interrupt latencies matter
[Diagram: enterprise: apps on Linux, Linux and Windows guests over one hypervisor; embedded: UI software on a Linux guest and baseband software on an RTOS over one hypervisor]
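The consolidation idea behind this use case can be sketched in a few lines. This is an illustrative model only, not the interface of any real hypervisor: the `consolidate` function, the load values and the VCPU names are all assumptions made for the example.

```python
# Sketch (assumed model): a hypervisor periodically packs runnable
# VCPUs onto as few physical CPUs as possible, so that the remaining
# idle CPUs can be powered off.

def consolidate(vcpus, num_cpus, capacity_per_cpu=1.0):
    """Greedy first-fit packing of VCPU loads (0.0-1.0) onto CPUs.

    Returns (assignment, cpus_in_use); CPUs not in use can be
    powered down until load rises again."""
    loads = [0.0] * num_cpus
    assignment = {}
    # Place the heaviest VCPUs first to reduce fragmentation.
    for name, load in sorted(vcpus.items(), key=lambda kv: -kv[1]):
        for cpu in range(num_cpus):
            if loads[cpu] + load <= capacity_per_cpu:
                loads[cpu] += load
                assignment[name] = cpu
                break
        else:
            raise RuntimeError("overcommitted: no CPU can take " + name)
    cpus_in_use = sum(1 for l in loads if l > 0)
    return assignment, cpus_in_use

# Two 2-VCPU guests, mostly idle: all four VCPUs fit on one core,
# so three of the four CPUs can be turned off.
vcpus = {"g1.v0": 0.3, "g1.v1": 0.1, "g2.v0": 0.2, "g2.v1": 0.2}
assignment, in_use = consolidate(vcpus, num_cpus=4)
```

A real hypervisor would also migrate VCPUs back out as load grows; the point here is only that the hypervisor, sitting below all guests, is the one place with the global view needed to make this decision.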

Core Difference: Isolation vs Cooperation

Enterprise:
! Independent services
! Emphasis on isolation
! Inter-VM communication is secondary
• performance secondary
! VMs connected to the Internet (and thus to each other)

Embedded:
! Integrated system
! Cooperation with protection
! Inter-VM communication is critically important
• performance crucial
! VMs are subsystems accessing shared (but restricted) resources

Enterprise vs Embedded Systems VMs

Devices in enterprise-style virtual machines
! Hypervisor owns all devices
! Drivers in hypervisor
• need to port all drivers
• huge TCB
! Drivers in privileged guest OS (Dom0)
• can leverage guest's driver support
• need to trust driver OS
• still huge TCB!
[Diagram: apps on guests with virtual drivers; drivers either inside the trusted hypervisor or in a privileged Dom0 driver OS]

Enterprise vs Embedded Systems VMs

Devices in embedded virtual machines
! Some devices owned by a particular VM
! Some devices shared
! Some devices too sensitive to trust to any guest
! Driver OS too resource-hungry
! Use isolated drivers
• protected from other drivers
• protected from guest OSes
[Diagram: guests with virtual drivers plus a native isolated driver, all over the hypervisor on the processor]

Isolation vs Cooperation: Scheduling

Enterprise:
! Round-robin scheduling of VMs
! Guest OS schedules its apps

Embedded:
! Global view of scheduling
! Schedule threads, not VMs
[Diagram: tasks from all VMs placed into global priority bands Prio 1 to Prio 4]

! Similar for energy management:
• energy is a global resource
• optimal per-VM energy policies are not globally optimal
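The scheduling contrast above can be sketched as two pick-next policies. The data model (VMs mapping to lists of `(name, priority)` tasks, lower number meaning higher priority) is an assumption for illustration, not any real hypervisor's scheduler interface.

```python
# Sketch (hypothetical data model): enterprise-style round-robin over
# VMs vs. embedded-style global scheduling of threads by priority.

def pick_round_robin(vms, last_vm):
    """Enterprise: run the next VM in turn; that guest OS then picks
    among its own tasks (here: its highest-priority one)."""
    order = sorted(vms)
    nxt = order[(order.index(last_vm) + 1) % len(order)]
    task = min(vms[nxt], key=lambda t: t[1])  # lower number = higher prio
    return nxt, task

def pick_global(vms):
    """Embedded: schedule threads, not VMs; the globally
    highest-priority runnable task runs, whatever VM owns it."""
    all_tasks = [(vm, t) for vm, tasks in vms.items() for t in tasks]
    return min(all_tasks, key=lambda vt: vt[1][1])

# An RTOS interrupt handler (prio 1) must preempt Linux work (prio 3);
# round-robin over VMs cannot guarantee this, global scheduling can.
vms = {"linux": [("ui", 3), ("net", 4)], "rtos": [("isr", 1)]}
```

With these inputs, `pick_global` always selects the RTOS interrupt handler, while `pick_round_robin` will happily hand the CPU back to Linux after the RTOS's turn, which is exactly why interrupt latencies matter in the embedded case.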


Inter-VM Communication Control

Modern embedded systems are multi-user devices!
! Eg a phone has three classes of "users":
• the network operator(s)
  • assets: cellular network
• content providers
  • assets: media content
• the owner of the physical device
  • assets: private data, access keys
! They are mutually distrusting
• need to protect integrity and confidentiality against internal exploits
• need control over information flow
• strict control over who has access to what
• strict control over communication channels
[Diagram: media player, codec and crypto beside UI, private data and comms stack on HLOS and RTOS guests, with operator, media-provider and owner stakes marked on their assets]

Inter-VM Communication Control

! Different "users" are mutually distrusting
! Need strong protection / information-flow control between them
! Isolation boundaries ≠ VM boundaries
• some are much smaller than VMs
  • individual buffers, programs
• some contain VMs
• some overlap VMs
! Need to define information flow between isolation domains
[Diagram: isolation boundaries drawn across the phone-software picture, cutting across VM boundaries]

High Safety/Reliability Requirements

! Software complexity is mushrooming in embedded systems too
• millions of lines of code
! Some have very high safety or reliability requirements
! Need divide-and-conquer approach to software reliability
• highly componentised systems to enable fault tolerance
[Diagram: componentised system on OKL4: real-time comms, secure object loader, HyperCell™, library manager, file system, TCP/IP and user-interface components, plus apps on OK Linux and network, display and flash drivers]
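The strict control over communication channels between isolation domains can be sketched as an explicit whitelist that the system checks on every channel setup. The policy format, the domain names and the `open_channel` function are all hypothetical, chosen to mirror the phone example above.

```python
# Sketch (hypothetical policy format): inter-domain communication is
# only possible over channels the system designer whitelisted,
# giving strict control over who may talk to whom.

ALLOWED = {                      # (sender, receiver) pairs the policy permits
    ("ui", "media_player"),
    ("media_player", "codec"),
    ("comms_stack", "crypto"),
}

def open_channel(sender, receiver):
    """Refuse any channel the information-flow policy does not list.

    Entries are directional: allowing ui -> media_player says nothing
    about media_player -> ui, so confidentiality policies (no flow of
    media content back out) can be expressed too."""
    if (sender, receiver) not in ALLOWED:
        raise PermissionError(f"flow {sender} -> {receiver} denied")
    return (sender, receiver)    # stand-in for a real channel object
```

A separation kernel or microkernel enforces exactly this kind of policy at the mechanism level, typically via capabilities to communication endpoints rather than a global table.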


Componentisation for IP Blocks

! Match HW IP blocks with SW IP blocks
! HW IP owner provides matching SW blocks
• encapsulate SW to ensure correct operation
• stable interfaces despite changing HW/SW boundary
[Diagram: SW blocks (plus other SW) on the hypervisor, each matched to an IP block on the processor]

Componentisation for Security (MILS)

! MILS architecture: multiple independent levels of security
! Approach to making security verification of complex systems tractable
! Separation kernel provides strong security isolation between subsystems
! High-grade verification requires small components
[Diagram: isolated domains on a separation kernel over the processor, with isolation boundaries between them]


Embedded Systems Requirements

! Sliding scale of isolation, from individual program to VM running a full-blown OS
• isolation domains, information-flow control
! Global scheduling and power management
• no strict VM-hypervisor hierarchy
• increased hypervisor-guest interaction
! High degree of sharing is essential and performance-critical
• high-bandwidth, low-latency communication, subject to security policies
! Real-time response
• fast and predictable switches to device driver / RT stack
! High safety/security requirements
• need to maintain a minimal TCB
• need to support a componentised software architecture / MILS

Virtualization in embedded systems is good, but different from enterprise
• requires more than just a hypervisor, also needs general OS functionality
• a perfect match for a good microkernel, such as OKL4...
