Virtual Machines (VM)

Layered model of computation
- Software and hardware divided into logical layers
- Layer n receives services from layer n – 1 and provides services to client layer n + 1
- Layers interact through well-defined programming interfaces

Virtual layer
- Software emulation of a hardware or software layer n
- Transparent to layer n + 1
- Provides service to layer n + 1 as expected from the real layer n
- Virtual layer n can run at some layer m ≠ n in the real system

[Diagram: layer n + 1 runs over virtual layer n in the virtual system, which actually executes at layer m of the real system]
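A minimal C sketch (all names invented for illustration) of the idea: a virtual layer implements the same programming interface as the real layer n, so layer n + 1 cannot tell which one it is using.

```c
/* Minimal sketch: a "layer" is just an interface.  A virtual layer implements
 * the same interface as the real layer n, so layer n + 1 cannot tell which
 * one it is talking to. */
#include <stdio.h>

struct layer_n {                       /* well-defined programming interface */
    int (*read_block)(int block, char *buf);
};

static int real_read(int block, char *buf)    { return sprintf(buf, "real block %d", block); }
static int virtual_read(int block, char *buf) { return sprintf(buf, "emulated block %d", block); }

static const struct layer_n real_layer    = { real_read };
static const struct layer_n virtual_layer = { virtual_read };   /* emulation of layer n */

/* Layer n + 1: written against the interface only. */
static void layer_n_plus_1(const struct layer_n *below)
{
    char buf[64];
    below->read_block(7, buf);
    printf("%s\n", buf);
}

int main(void)
{
    layer_n_plus_1(&real_layer);       /* real system */
    layer_n_plus_1(&virtual_layer);    /* virtual system, transparent to the client */
    return 0;
}
```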


Examples of Virtual Systems
- Web browser exchanges data with a server
[Diagram: browser as virtual Web server: client application over local OS, protocol stack, and real hardware exchanges data across the network with the server application over the server OS, protocol stack, and real hardware]
- Cloud computing
  - Service level agreement (SLA) specifies infrastructure requirements
  - User sees a hardware/software configuration and its performance
  - Provider assembles a virtual configuration that meets the SLA requirements
  - May be implemented in any way

Types of Virtual Machine
- Process virtual machine
  - VM provides application interpretation above the OS
- Hosted virtual machine
  - Virtual machine monitor (VMM) runs above the primary (host) OS and below the guest OS
  - Provides the guest OS with a software emulation of a real hardware system
- System VM
  - Emulation of a system-level hardware environment
  - Runs above the physical hardware and below one or more OSs
[Diagram: layer stacks for a basic system, a system VM, a hosted VM, and a process VM]

Process VM Example — Java
- Designed for program portability between platforms
- Provides a standard interface to software
  - Java VM located above a standard OS
  - Interface to the hardware is implementation dependent
  - I/O operations performed by calls to the OS
- Java compiled to bytecode
  - Bytecode usually run (interpreted) in the Java VM
- Java without a VM
  - Java bytecode processor in IBM mainframes
  - Native machine language (ISA) is Java bytecode
  - Executes Java bytecode without interpretation
- http://java.sun.com/docs/books/tutorial/getStarted/intro/definition.html

Hosted VM Example — Guest OS Over Host OS
- DOS command-line interface over Windows (virtual 86 mode)
  - Windows allocates a 1 MB virtual memory space and copies the DOS kernel into low memory
  - System calls handled by the guest DOS kernel
  - DOS accesses to hardware are trapped and served by the Windows host OS
  - Responses returned to DOS
- Concurrent DOS windows
  - Multiple allocations of 1 MB virtual memory spaces
- DEBUG application running in a virtual DOS machine
  - Sees the 1 MB memory space allocated by Windows
  - Register values: Windows emulates real values to DOS; DEBUG emulates DOS values to the user
- Parallels, VirtualBox, VMware, DOSBox, … host Windows, DOS, … as guest OSs over a host OS
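At its core, a process VM such as the Java VM is an interpreter loop over guest bytecode. A toy C sketch of that fetch/decode/execute cycle (a made-up stack machine, not real Java bytecode):

```c
/* Toy stack-machine interpreter: illustrates how a process VM runs guest
 * bytecode on a host CPU; opcodes and program are invented for the example. */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *code)
{
    int stack[64], sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {              /* fetch and decode one guest opcode */
        case OP_PUSH:  stack[sp++] = code[pc++];            break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];    break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);       break;  /* host OS call */
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    int program[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
    run(program);                          /* prints 42 */
    return 0;
}
```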


Virtual Machine in IBM z/990 Mainframe
- Hardware
  - CPUs, I/O system, internal communication network
- VMM (hypervisor)
  - Operator console for partitioning/configuring CPUs and I/O
  - Provides hardware emulation as an abstraction to the OS layer
- OS
  - Each logical partition (LPAR) runs a separate instance of an operating system
  - Run z/OS, MVS, VM, Unix, Linux, Windows, … instances in parallel
  - Non-Windows OS versions expect to see a hypervisor (not hardware)
- User
  - User sees a single-user interface provided by one OS
[Diagram: users over OS/LPAR instances, over the VMM — Systems Manager — Hypervisor, over the hardware]

VM as System Management Tool
- Isolate user environments on a single hardware platform
  - Multiple copies of a single OS running independently
  - Multiple operating systems running concurrently
  - Maintain higher security
- Hardware redundancy
  - High availability
  - Recovery management
- Resource management
  - Hardware pooling: assemble a hardware cluster; map applications to hardware efficiently
  - Load balancing: remap applications to hardware
[Diagram: applications over multiple OSs over VMMs on two servers]

z/990 Parallel Sysplex
- Parallel Sysplex
  - Couples 2 to 32 instances of z/OS into a single system
  - Applications divide work and data among LPARs
  - High capacity for very large workloads
  - Resource sharing
  - Dynamic workload balancing
- Geographical diversity
  - Coupled LPARs on remote physical systems
  - Physical backup
  - Automatic failure recovery
  - Continuous availability
[Diagram: users over LPAR/OS instances, coupling facility, systems manager, hardware (processors, RAM, I/O)]

Model for Server Systems
- Old file server model
  - Run one application per physical server
  - Server specified for worst-case load
  - Large number of typically underutilized servers
- Competition from mainframes
  - Huge aggregate capacity
  - VMM provides dynamic load balancing
  - Hardware provides centralized power, cooling, monitoring, backup
  - High SAR — scalability, availability, reliability
  - Lower cost per served client than a server farm
- Virtualization in servers
  - Partition hardware resources to run independent applications
  - IA-32 and IA-64 ISA support
  - I/O chipset support


HP Virtual Partitions (vPars)
- Uniprocessor or multiprocessor system partitioned into independent virtual partitions
[Diagram: vPars boot order]

System VM Organization
- Hypervisor / virtual machine monitor (VMM)
  - Lowest layer above the physical hardware (host)
  - Creates virtual machine (VM) environments for guest OSs
  - Allocates physical host resources to virtual resources
- VM overhead
  - Processor-intensive applications — low overhead
    - Infrequent use of OS calls
    - Most instructions run directly on hardware
  - I/O-intensive applications — high overhead
    - Frequent use of OS calls
    - OS calls for I/O services run in emulation
  - I/O-limited applications
    - Program throughput limited by I/O latency
    - Emulation adds relatively small overhead

Hewlett-Packard, "Installing and Managing HP-UX Virtual Partitions (vPars)"

VMM Requirements
- Hardware abstraction
  - Guest environment must replicate the hardware
  - VMM must present a well-defined software interface to the OS
- Protection
  - Isolate guests from one another
  - Protect the VMM from guest OS and application software
  - Guest software cannot change the allocation of physical resources
- Privilege
  - VMM runs in kernel mode
  - Guest OSs and applications run in user mode
  - Any OS or application access to hardware causes a trap to the VMM
  - VMM catches every access to the hardware abstraction layer (HAL)
- Hardware support for VMM
  - Virtualization primitives built into the mainframe ISA

Virtualization Awareness
- Virtualization-aware guest OS
  - OS written to run above a VMM/hypervisor
  - Expects to interact with a virtual host
  - Does not expect full or direct control of physical hardware
  - OS code interfaces with hypervisor code
  - No need to remap ("bluff") pointers intended for real hardware
  - May be presented with a view of the real system for limited operations
  - Example — mainframe OS
    - Writes I/O outputs to the hypervisor interface
    - Does not attempt to configure I/O hardware devices
    - A particular OS may be given direct control of a particular I/O device
- Virtualization-unaware guest OS
  - OS written to run above physical hardware
  - Expects full and direct control of real hardware
  - Requires extensive intervention and remapping by the VMM


Hardware Emulation Activities
- OS sees hardware through these operations
  - OS instructions cause the CPU to initiate memory and I/O operations
  - I/O devices initiate DMA operations and interrupt the CPU

Real operation | VMM emulation
CPU memory access, read data or instruction | Translate the data/instruction from guest to host format; remap the address space
CPU memory access, write data | Read/write to real host memory
CPU I/O device access, read data or instruction | Translate the data/instruction from guest to host format; remap the I/O port space
CPU I/O device access, write data | Read/write to the real host I/O device
I/O device DMA or IRQ action | VMM manages the I/O device DMA; translates OS handlers from guest format to host format

Full/Partial System Emulation
- Full system emulation
  - VMM intervenes in every OS access to hardware
  - Translates guest ISA to host ISA
  - Translates memory size and organization
  - Translates guest configuration instructions to the host chipset and I/O devices
  - Translates guest drivers to host drivers
  - CPU emulation example: run a Nintendo game on a PC by translating each Nintendo instruction to the IA-32 instruction set
- Partial system emulation
  - Part of the host hardware presented to the OS unchanged
  - VMM passes guest operations to the host with minimal intervention
  - Most system VMs emulate a subset/superset of the real host hardware
  - CPU emulation only in special cases
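As a concrete illustration of the address-space remapping row above, a hypothetical C sketch of how a VMM might back guest RAM with a host buffer and remap a guest-physical address before performing the access on the guest's behalf (sizes and names invented):

```c
/* Hypothetical sketch: guest-physical -> host pointer remapping inside a VMM.
 * Guest "RAM" is simply a host buffer; every guest memory operation that the
 * VMM emulates is translated into an access to this buffer. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define GUEST_RAM_SIZE (16u * 1024 * 1024)   /* 16 MB of emulated guest RAM */

static uint8_t *guest_ram;                   /* host memory backing the guest */

static uint8_t *remap_guest_addr(uint32_t guest_phys)
{
    if (guest_phys >= GUEST_RAM_SIZE)        /* outside emulated RAM: MMIO, etc. */
        return NULL;                         /* would be routed to device emulation */
    return guest_ram + guest_phys;           /* remapped host address */
}

static uint32_t guest_read32(uint32_t guest_phys)
{
    uint8_t *p = remap_guest_addr(guest_phys);
    return p ? *(uint32_t *)p : 0xFFFFFFFFu;
}

int main(void)
{
    guest_ram = calloc(GUEST_RAM_SIZE, 1);
    *(uint32_t *)remap_guest_addr(0x1000) = 42;   /* emulated guest write */
    printf("guest [0x1000] = %u\n", guest_read32(0x1000));
    free(guest_ram);
    return 0;
}
```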

Software Emulation of I/O Hardware
- Advantages
  - VMM provides emulation of widely supported device hardware
  - Guest OS runs available device drivers without modification
- Difficulties
  - Requires very accurate device emulation, including hardware revisions and "bug emulation"
  - Performance issues
    - VMM intervention on every guest OS access to an I/O device
    - Context switch from the guest OS to the VMM
    - VMM emulates the I/O access and accesses the real I/O device
    - Context switch back to the guest OS with the response
    - Adds considerable overhead; emulation is compute-intensive and increases CPU utilization
- Least-bad case
  - Virtual device = real device
  - Remap I/O ports — no change to driver operation

Bootstrap Process in System VM
- Workstation without VMM
  - System boot: CPU loads the initial system loader (ISL); ISL points to the system boot device; boot device contains the OS; OS loader writes to the host I/O space
  - Device discovery: chipset and I/O devices respond; OS loads drivers for host devices; OS provides the user interface
- Workstation with VMM
  - System boot: CPU loads the ISL; ISL points to the system boot device; boot device contains the VMM; VMM loader writes to the host I/O space
  - Device discovery: chipset and I/O devices respond; VMM loads drivers for host devices; VMM provides an administrator interface
  - Secondary boot: administrator configures VM partitions; administrator points the VMM to the device containing the OS boot image; VMM boots the OS into a partition; OS loader writes to the virtual I/O space
  - Device discovery: VMM responds for I/O devices; OS loads drivers for virtual devices; OS provides the user interface
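A hypothetical C sketch of the emulation step between the two context switches described above: the VMM handles a trapped guest I/O port access against an invented emulated device (port numbers and registers are made up for the example):

```c
/* Hypothetical sketch: VMM handler for a trapped guest "out" / "in" on an
 * emulated serial-port-like device.  The guest OS driver is unmodified; the
 * VMM emulates the device registers it expects to find. */
#include <stdint.h>
#include <stdio.h>

#define EMU_DATA_PORT   0x3F8     /* invented: data register of emulated device */
#define EMU_STATUS_PORT 0x3FD     /* invented: status register */

static uint8_t status_reg = 0x20; /* "transmit buffer empty" */

/* Called by the VMM after a trap caused by a guest port write. */
void vmm_emulate_port_write(uint16_t port, uint8_t value)
{
    if (port == EMU_DATA_PORT)
        fputc(value, stderr);     /* forward guest output to a real host device */
    /* other ports: update emulated device state only */
}

/* Called by the VMM after a trap caused by a guest port read. */
uint8_t vmm_emulate_port_read(uint16_t port)
{
    if (port == EMU_STATUS_PORT)
        return status_reg;        /* guest driver polls this before writing */
    return 0xFF;                  /* unimplemented register */
}

int main(void)
{
    if (vmm_emulate_port_read(EMU_STATUS_PORT) & 0x20)
        vmm_emulate_port_write(EMU_DATA_PORT, 'A');   /* emulated guest output */
    return 0;
}
```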


Virtualization Difficulties for IA-32
- IA-32 designed to provide hardware support to the OS
  - Memory segmentation
  - Virtual memory and paging
  - Task management
  - Interrupt management
  - Protection and privilege for segmentation, paging, interrupts
- Workaround virtualization
  - Treat the OS like a user application
  - Can create a kludge on IA-32 systems
  - IA-32 operating systems expect to have the highest privilege and can easily discover their lower privilege
[Diagram: application over OS over hardware, versus application over OS over VMM over hardware, with user, OS kernel, virtual, and real layers]

Memory Resource Compression
- OS manages resources using IA-32 system tables
  - Assigns the pointer to the page table root (directory)
  - Manages page table entries
  - Manages memory segmentation with descriptor tables limited to 8 K entries
    - Global descriptor table (GDT): maps a segment pointer to a virtual address; defines segment type (code, data, system) and privilege level
    - Interrupt descriptor table (IDT): maps interrupts and traps to service routines
- Memory compression
  - VMM must reserve part of guest virtual memory for its own management
  - OS expects to see the full virtual memory space
- Table resource compression
  - VMM requires entries in the GDT and IDT for management of the OS
  - VMM must prevent OS access to its descriptors
  - OS expects full control of all 8 K table entries
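A hypothetical sketch of the table-compression bookkeeping described above: the VMM reserves a few GDT slots for its own descriptors and screens guest descriptor updates so the guest never touches them (slot numbers and structures invented):

```c
/* Hypothetical sketch: VMM reserves the top GDT slots for its own descriptors
 * and rejects (or would relocate) guest attempts to use them. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define GDT_ENTRIES        8192          /* IA-32 descriptor tables: 8 K entries */
#define VMM_RESERVED_FIRST 8188          /* invented: last 4 slots kept by the VMM */

static uint64_t shadow_gdt[GDT_ENTRIES]; /* the GDT actually loaded in hardware */

static bool vmm_guest_sets_descriptor(unsigned slot, uint64_t descriptor)
{
    if (slot >= VMM_RESERVED_FIRST) {
        /* Guest expected full control of all 8 K entries; the VMM must hide
         * this conflict, e.g. by faulting or remapping the guest slot. */
        fprintf(stderr, "guest touched reserved GDT slot %u\n", slot);
        return false;
    }
    shadow_gdt[slot] = descriptor;       /* pass harmless updates through */
    return true;
}

int main(void)
{
    vmm_guest_sets_descriptor(40, 0x00CF9A000000FFFFull);   /* ordinary guest code segment */
    vmm_guest_sets_descriptor(8190, 0);                     /* conflicts with VMM's slots */
    return 0;
}
```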

Ring Aliasing
- Privilege rings
  - Memory segments assigned a privilege level from 0 (highest) to 3 (lowest)
  - Stored in the segment descriptor (table entry defining the segment)
  - Copied into the code segment selector (pointer to the segment via its descriptor)
  - Access rights for code limited to segments of the same or lower privilege
  - User mode ~ ring 3; OS kernel mode ~ ring 0
- Deprivileging (ring aliasing)
  - Run the VMM at ring 0 and the OS at ring 1
- Issues
  - Paging distinguishes only two privilege levels (user/supervisor)
  - 4-level privilege not supported in 64-bit systems
  - OS can read its CPL from the code segment selector
  - Guest OS can determine that it does not control the CPU
[Diagram: access granted or denied by comparing CPL with DPL across rings 0–3; CPL is the privilege level of the code segment, DPL the privilege level of the data access or branch target]

Non-Faulting Access to Privileged State
- Privileged registers
  - Control the configuration of hardware systems
  - GDTR: pointer to the GDT; IDTR: pointer to the IDT; LDTR: pointer to the LDT; TR: pointer to the current task segment
- VMM must
  - Intercept OS accesses to privileged registers
  - Provide virtual values determined for the guest environment
- Access to privileged registers in IA-32
  - Access by unprivileged software is usually prevented
    - Causes a protection fault
    - VMM emulates the response to the guest instruction
  - Some unprivileged accesses to privileged state do not fault
    - On user access to system state: protection fault on write, but no fault on read
    - Guest OS can determine that it does not control the CPU
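The non-faulting reads above are exactly what lets a deprivileged guest detect the VMM. A small sketch using GCC-style inline assembly on x86 (assuming UMIP is not enabled, since UMIP makes these reads fault) that reads GDTR and IDTR from unprivileged code without triggering a fault:

```c
/* SGDT/SIDT store the descriptor-table registers and, on classic IA-32,
 * execute without faulting even at CPL 3, so a guest (or any user program)
 * can observe table bases that a VMM may not have virtualized. */
#include <stdint.h>
#include <stdio.h>

#pragma pack(push, 1)
struct desc_ptr {            /* format stored by SGDT/SIDT: 16-bit limit, base */
    uint16_t  limit;
    uintptr_t base;
};
#pragma pack(pop)

int main(void)
{
    struct desc_ptr gdtr, idtr;

    __asm__ volatile("sgdt %0" : "=m"(gdtr));   /* no fault, even unprivileged */
    __asm__ volatile("sidt %0" : "=m"(idtr));

    printf("GDTR base %#lx limit %#x\n", (unsigned long)gdtr.base, gdtr.limit);
    printf("IDTR base %#lx limit %#x\n", (unsigned long)idtr.base, idtr.limit);
    /* A VMM using workaround virtualization must accept that the guest sees
     * these real (host-chosen) values, or intercept them by other means. */
    return 0;
}
```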


System Calls and Interrupts
- System calls
  - Application in ring 3 invokes the OS in ring 0
  - Requires an indirect mechanism (call gate) that redirects to a hidden ring 0 address
    - VMM must emulate call gates
  - SYSENTER instruction provides fast calls to ring 0
    - Will call the VMM instead of the guest OS
  - SYSEXIT instruction ends the SYSENTER routine
    - Faults to ring 0 if executed from lower privilege
  - VMM must emulate the response to SYSENTER/SYSEXIT
- Interrupts
  - Interrupts can be masked by controlling the interrupt flag (IF)
  - VMM must mask interrupts and handle them by emulation
  - Some OSs toggle IF frequently, requiring many VMM interventions

Intel Virtualization Technology (VT)
- Virtual machine monitor
  - Hardware boots (third-party) VMM software instead of an OS
  - VMM configures hardware resources among guest systems
  - Remaps hardware locations to virtual pointers for guests
  - OSs boot within guest partitions
- Hardware support for virtualization
  - VT-enabled processors alternate between operating modes
    - Root mode grants full hardware control to the VMM
    - Non-root mode presents virtual pointers to the guest OS
  - VT-enabled chipset
    - Grants control of I/O to root mode
    - Remaps I/O channels for non-root mode
- Operating system
  - Sees the virtual machine as a real system
  - Operates in ring 0 for maximum privilege
  - Sends instructions to hardware pointers in the usual way
[Diagram: ring 3 user; VMX non-root ring 0 guest OS with virtual full privilege; VMX root VMM with real full privilege]
- http://www.intel.com/technology/itj/2006/v10i3/index.htm
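A hypothetical sketch of the interrupt-flag emulation described above: without hardware support, the VMM tracks a virtual IF for the guest and defers interrupt delivery until the guest re-enables interrupts (all names invented):

```c
/* Hypothetical sketch: software emulation of the guest's interrupt flag.
 * The VMM never lets the guest clear the real IF; it tracks a virtual IF
 * and queues interrupts while the guest "has interrupts disabled". */
#include <stdbool.h>
#include <stdio.h>

static bool virtual_if = true;       /* guest's view of IF */
static int  pending_vector = -1;     /* one queued interrupt, for brevity */

static void deliver_to_guest(int vector) { printf("inject vector %d\n", vector); }

/* Guest executed CLI/STI, trapped (or intercepted) by the VMM. */
void vmm_on_guest_cli(void) { virtual_if = false; }
void vmm_on_guest_sti(void)
{
    virtual_if = true;
    if (pending_vector >= 0) {                 /* deliver what was held back */
        deliver_to_guest(pending_vector);
        pending_vector = -1;
    }
}

/* Real interrupt arrived while the guest was running. */
void vmm_on_external_interrupt(int vector)
{
    if (virtual_if)
        deliver_to_guest(vector);
    else
        pending_vector = vector;               /* guest masked it; queue it */
}

int main(void)
{
    vmm_on_guest_cli();
    vmm_on_external_interrupt(0x21);           /* queued */
    vmm_on_guest_sti();                        /* injected now */
    return 0;
}
```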

System Issues in Virtualization
- CPU virtualization support
  - Handles operations initiated by the CPU
  - Memory access by guest software: VMM assigns a virtual address space to the guest OS
  - I/O access by guest software: VMM translates OS driver output for the host device
- Chipset virtualization support
  - Handles operations initiated by an I/O device
  - Interrupts and DMA accesses by I/O devices are intercepted by the VMM and remapped
[Diagram: CPU, PCI host-to-bus bridge (bus controller), ROM, RAM, PCI expansion bus, ISA bridge, graphics and I/O devices, ISA/EISA bus, disk]

VT-x for IA-32 Processor Virtualization
- Virtual machine extensions (VMX)
  - VMX root operation
    - Operating mode designed for the VMM
    - Grants highest-privilege access to host CPU hardware state
  - VMX non-root operation
    - Operating mode designed for the guest OS
    - Presents the OS with a virtual host configured by the VMM
    - OS sees standard ring 0 access to virtual IA-32 resources
    - OS accesses to privileged state are trapped by the VMM
- Mode transitions
  - VM entry: VMX root operation → VMX non-root operation
  - VM exit: VMX non-root operation → VMX root operation
[Diagram: VMX non-root guest OS over VMX root VMM over the host hardware, with VM entry and VM exit transitions]


Virtual Machine Control Structure
- Virtual-machine control structure (VMCS)
  - Used for mode-transition management
- VM entry
  - Saves processor state to the VMCS host-state area
  - Loads processor state from the VMCS guest-state area
- VM exit
  - Saves processor state to the VMCS guest-state area
  - Loads processor state from the VMCS host-state area
- VMCS host-state area
  - Segment register selectors for VMM operation
  - Privileged system table pointers (GDTR, IDTR, TR, page table root)
- VMCS guest-state area
  - Segment register selectors for OS operation
  - Virtual system table pointers determined by the VMM
  - Interrupt flag (IF)

VMCS Details
- Referenced by physical address
  - No page table entry in any guest address space
  - Location determined by VMM software
- VMCS structure
  - Not determined by the architecture; accessed only through VMCS access instructions
  - VMM author chooses the implementation
- VM entry
  - Loads table pointers from the VMCS
  - Pointer updates cause a context shift to the VM process
  - VMM can optionally inject a virtual event (interrupt) to cause a VM response
- VM exit
  - VM saves its context to memory
  - All VMs exit to a common entry point in the VMM
  - VM exit records the details of the reason for the exit in the VMCS
  - VMM provides the detailed response to the VM exit
  - VMM physical address space is not mapped into any guest OS virtual address space
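The VMCS access instructions referred to above are VMREAD and VMWRITE, usable only in VMX root operation. A minimal sketch with GCC-style inline assembly; the field encodings shown are the commonly published values, but treat them as assumptions to verify against the Intel SDM:

```c
/* Sketch: VMCS fields are never read or written through ordinary loads and
 * stores, only through VMREAD/VMWRITE in VMX root operation.  This code
 * only illustrates the access pattern; it would #UD outside VMX. */
#include <stdint.h>

#define VMCS_GUEST_RIP        0x681Eu   /* guest-state area: guest RIP */
#define VMCS_EXCEPTION_BITMAP 0x4004u   /* control field: which exceptions exit */
#define VMCS_EXIT_REASON      0x4402u   /* read-only: why the last VM exit happened */

static inline uint64_t vmcs_read(uint64_t field)
{
    uint64_t value;
    __asm__ volatile("vmread %1, %0" : "=r"(value) : "r"(field) : "cc");
    return value;
}

static inline void vmcs_write(uint64_t field, uint64_t value)
{
    __asm__ volatile("vmwrite %1, %0" : : "r"(field), "r"(value) : "cc");
}

/* Example use inside a VMM: make guest #BP (vector 3) cause a VM exit. */
static void trap_guest_breakpoints(void)
{
    uint64_t bitmap = vmcs_read(VMCS_EXCEPTION_BITMAP);
    vmcs_write(VMCS_EXCEPTION_BITMAP, bitmap | (1u << 3));
}
```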

VMCS Control Fields
- Settable options for interrupt virtualization
  - External-interrupt exiting: VM exit on an external interrupt; external interrupts not maskable by the guest
  - Interrupt-window exiting: VM exit as soon as the guest allows interrupts
- Guest/host masks for control register virtualization
  - Status flags in control registers determine processor options
  - VMM masks selected flags to prevent writes by the guest
  - Guest write to a masked flag causes a VM exit
  - Guest reads of a masked flag return the value specified by the VMM in the VMCS
- VM exit bitmaps
  - VMM chooses the subset of guest actions that cause a VM exit
  - Exception bitmap — 32 exceptions that optionally cause a VM exit
  - I/O bitmap — each 16-bit I/O port can be set to cause a VM exit on guest access
  - Instruction bitmaps — select privileged instructions that cause a VM exit

VT-x Solves Virtualization Problems
- Ring aliasing and compression
  - Guest software runs at its intended privilege level
- Address-space compression
  - Guest/VMM transitions can change the virtual address space
  - Guest software has full use of its own address space
  - VMCS resides in the physical address space and does not use the linear address space
- Non-faulting access to privileged state
  - VMM allows guest OS access to privileged registers
  - Accesses cause a transition to the VMM, controlled through the VMCS
- System calls
  - Guest OS runs at ring 0 as intended
- Interrupts
  - Response to interrupts controlled through the VMCS


VT-x Interrupt Virtualization
- External interrupt
  - If the external-interrupt exiting option is not set: the guest OS handles the interrupt and continues
  - If set: VM exit to the VMM; the VMM services the interrupt and updates its system tables; VM entry with event injection replicates the interrupt to the guest
  - Possible updates — interrupt tables, system registers, I/O configuration, ...
- Exception
  - If the exception is not set in the VM exit bitmap: the guest OS handles the exception and continues
  - If set: VM exit to the VMM; the VMM prepares its system tables; VM entry with event injection replicates the exception to the guest
  - Possible updates — page tables, system registers, I/O configuration, ...
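A hypothetical sketch of the same flow from the VMM side: read the exit reason from the VMCS, service the event, and schedule injection on the next VM entry. Exit-reason numbers and the interrupt-information format follow commonly published VT-x encodings and should be checked against the SDM; vmcs_read/vmcs_write are the wrappers from the earlier sketch:

```c
/* Sketch of a VMM's VM-exit dispatch for interrupts and exceptions,
 * reusing vmcs_read()/vmcs_write() from the earlier sketch. */
#include <stdint.h>

#define VMCS_EXIT_REASON          0x4402u
#define VMCS_ENTRY_INTR_INFO      0x4016u  /* event-injection field */

#define EXIT_REASON_EXCEPTION_NMI 0u
#define EXIT_REASON_EXT_INTERRUPT 1u

#define INTR_INFO_VALID           (1u << 31)
#define INTR_TYPE_EXT_INTR        (0u << 8)
#define INTR_TYPE_HARD_EXCEPTION  (3u << 8)

extern uint64_t vmcs_read(uint64_t field);
extern void     vmcs_write(uint64_t field, uint64_t value);
extern void     vmm_service_interrupt(void);   /* update VMM tables, devices, ... */

/* Queue an event so the next VM entry injects it into the guest. */
static void inject_event(uint32_t vector, uint32_t type)
{
    vmcs_write(VMCS_ENTRY_INTR_INFO, INTR_INFO_VALID | type | vector);
}

void vmm_handle_exit(uint32_t guest_vector)
{
    uint32_t reason = (uint32_t)vmcs_read(VMCS_EXIT_REASON) & 0xFFFFu;

    switch (reason) {
    case EXIT_REASON_EXT_INTERRUPT:            /* external-interrupt exiting set */
        vmm_service_interrupt();               /* VMM handles the real interrupt */
        inject_event(guest_vector, INTR_TYPE_EXT_INTR);   /* replicate to guest */
        break;
    case EXIT_REASON_EXCEPTION_NMI:            /* exception selected in the bitmap */
        inject_event(guest_vector, INTR_TYPE_HARD_EXCEPTION);
        break;
    default:
        break;                                 /* other exits handled elsewhere */
    }
}
```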

VT-d for PCI Chipset Virtualization
- VMM allocates resources to guest OSs
  - Virtual address space
  - Virtual I/O devices mapped to real I/O devices
  - Guest OS accesses the real I/O device through the VMM mapping
- DMA remapping
  - Guest OS configures its virtual I/O devices
  - Real I/O device must write to the guest OS through the emulation mapping
- Interrupt remapping
  - Real I/O devices may interrupt the CPU
  - Each interrupt is intended for one guest OS
  - Real I/O device must deliver the interrupt to that guest OS through emulation
[Diagram: CPU, bridge, RAM, VMM, guest OS with a driver process, real I/O devices]

DMA Protection Domains
- Protection domain
  - Subset of physical memory allocated for device-initiated DMA
  - Protection domains may be allocated to
    - A guest OS — enables device-initiated DMA operations into the guest address space
    - A general software-generated virtual I/O address space
- I/O device
  - May be assigned to a protection domain
  - Can only perform DMA to its assigned protection domain
- DMA address translation
  - An I/O device DMA request to the bridge contains a memory address
  - VT-d treats the request address as a DMA virtual address (DVA)
  - The DVA may be the guest physical address (GPA) of a guest OS mapping or a general software-generated virtual I/O address
  - The DVA is translated to a host physical address (HPA)

http://www.intel.com/technology/itj/2006/v10i3/index.htm


Mapping I/O Devices to Protection Domains
- PCI device requester ID
  - Identifies the DMA device and request
  - Fields: PCI bus | device | function
  - Assigned by PCI configuration software during device discovery
- Root-entry table
  - Index — 8-bit bus number from the requester ID
  - Entry — pointer to a context-entry table
- Context-entry table
  - Index — 8-bit device/function number from the requester ID
  - Entry — pointer to the page structure used to translate the DVA
- Page structure
  - Multilevel table structure, similar to IA-32 page tables

Address Space Overview
[Diagram: on the CPU side, VMM page tables map guest virtual addresses (GVA) to guest physical addresses (GPA) to host physical addresses (HPA); on the device side, the root-entry/context-entry tables and DMA page structures map the device's DVA to an HPA]
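A hypothetical C sketch of the lookup just described: the bus number from the requester ID indexes the root-entry table, the device/function number indexes the context-entry table, and the selected page structure translates the DVA. The structures are invented stand-ins, not the real VT-d formats:

```c
/* Hypothetical sketch: requester-ID-driven selection of a protection domain's
 * translation structure, in the spirit of VT-d root/context tables. */
#include <stdint.h>
#include <stddef.h>

struct domain_pages;                              /* opaque: multilevel page structure */
typedef uint64_t (*translate_fn)(struct domain_pages *, uint64_t dva);

struct context_entry {                            /* indexed by device/function (8 bits) */
    struct domain_pages *pages;
    translate_fn         translate;               /* DVA -> HPA walk for this domain */
};

struct root_entry {                               /* indexed by bus number (8 bits) */
    struct context_entry *context_table;          /* 256 entries */
};

static struct root_entry root_table[256];

/* requester ID = bus(8) | device(5) | function(3) */
uint64_t dma_remap(uint16_t requester_id, uint64_t dva)
{
    uint8_t bus   = requester_id >> 8;
    uint8_t devfn = requester_id & 0xFF;

    struct context_entry *ctx_table = root_table[bus].context_table;
    if (ctx_table == NULL || ctx_table[devfn].pages == NULL)
        return UINT64_MAX;                        /* device not assigned to any domain */

    /* Walk the domain's page structure to produce the host physical address. */
    return ctx_table[devfn].translate(ctx_table[devfn].pages, dva);
}
```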

IA-32 Interrupt Handling
- Legacy interrupts
  - Interrupt controller in the chipset handles device interrupts
    - Programmable Interrupt Controller (PIC) integrated into the ISA chipset
    - APIC (Advanced PIC) integrated into the PCI chipset
  - I/O device assigned an interrupt request (IRQ) connection to the APIC
  - APIC translates the device IRQ to an 8-bit CPU interrupt number n and sends a hardware interrupt signal (INTR) to the processor
  - CPU loads 64-bit entry n from the Interrupt Descriptor Table (IDT)
  - Entry points to the Interrupt Service Routine (ISR)

Message Interrupt Handling
- Local APIC
  - CPU interrupt controller
  - Receives/decodes local interrupt signals
  - Receives interrupt messages from the I/O APIC
- I/O APIC
  - PCI chipset interrupt controller
  - Receives/decodes device IRQ signals
  - Sends/receives interrupt messages
- Message signaled interrupts (MSI)
  - I/O APIC in the PCI chipset formats the IRQ signal into a structured message
  - Message transferred on the PCI bus as a device-initiated DMA operation
  - Local APIC in the CPU receives and decodes the message

IA-32 Intel Architecture Software Developer’s Manual
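A small sketch of composing such an interrupt message. The address/data layout follows the commonly published x86 MSI format (destination APIC ID in the address, vector in the data word); treat the exact bit positions as assumptions to check against the manual cited above:

```c
/* Sketch: build the address/data pair for a message signaled interrupt.
 * The device later "delivers" the interrupt by a DMA write of msg.data
 * to msg.address on the PCI bus; the local APIC decodes the write. */
#include <stdint.h>
#include <stdio.h>

struct msi_message {
    uint32_t address;    /* where the DMA write goes (APIC address range) */
    uint16_t data;       /* what is written (vector, delivery mode, ...) */
};

static struct msi_message msi_compose(uint8_t dest_apic_id, uint8_t vector)
{
    struct msi_message msg;
    msg.address = 0xFEE00000u | ((uint32_t)dest_apic_id << 12); /* APIC base + destination */
    msg.data    = vector;                                       /* fixed delivery mode */
    return msg;
}

int main(void)
{
    struct msi_message m = msi_compose(0 /* CPU 0 */, 0x41 /* vector */);
    printf("MSI write: address=%#x data=%#x\n", m.address, m.data);
    return 0;
}
```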


Interprocessor Interrupts
- Interprocessor Interrupt (IPI)
  - A subset of APIC interrupt messages
  - CPU writes to the interrupt command register (ICR) in its local APIC
  - Local APIC issues the IPI message on the system bus
  - Used to boot and spawn threads in a multiprocessor system

Interrupt Remapping
- Message signaled interrupt (MSI)
  - Encodes the interrupt vector and destination processor
  - Real I/O device is not aware of the guest OS view of the emulated I/O device
- VMM must intercept MSI
  - VMM redefines the interrupt message format
  - Provides a substitute MSI
- The remapped DMA write request contains
  - A message identifier
  - No interrupt attributes (vector and destination processor)
  - The requester ID of the real I/O device generating the interrupt
- Requester ID mapped through the table structure (root/context tables)
  - Points to an interrupt remapping table (IRT)
  - IRT entry provides the vector and destination processor
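A hypothetical sketch of the remapping step: the substitute message carries only a handle, and the handle (validated against the requester ID) selects an interrupt remapping table entry that supplies the real vector and destination. The structures are invented, not the exact VT-d IRTE layout:

```c
/* Hypothetical sketch: remapped interrupt delivery.  The substitute MSI
 * carries only a handle; the IRT entry holds the attributes the VMM chose. */
#include <stdint.h>
#include <stdbool.h>

struct irt_entry {
    bool     present;
    uint16_t allowed_requester_id;   /* which device may use this entry */
    uint8_t  vector;                 /* real vector delivered to the CPU */
    uint8_t  dest_apic_id;           /* real destination processor */
};

#define IRT_SIZE 256
static struct irt_entry irt[IRT_SIZE];

/* Called when a remappable interrupt message arrives from a device. */
bool remap_interrupt(uint16_t requester_id, uint16_t handle,
                     uint8_t *vector_out, uint8_t *dest_out)
{
    if (handle >= IRT_SIZE || !irt[handle].present)
        return false;                              /* blocked: no mapping */
    if (irt[handle].allowed_requester_id != requester_id)
        return false;                              /* blocked: wrong device */
    *vector_out = irt[handle].vector;
    *dest_out   = irt[handle].dest_apic_id;
    return true;                                   /* deliver the remapped interrupt */
}
```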

Caching of Remapping Structures
- VT-d supports hardware caching of remapping tables
  - Root/context tables
  - Paging structures
  - IOTLB
  - Interrupt remapping table entries
- VMM is responsible for maintaining the remapping cache
  - Must invalidate stale cache entries
- Remapping errors
  - A DMA access request that cannot be remapped returns an error message
  - Device response to the error is implementation dependent
  - Errors are logged to the VMM
  - VMM may reset the cache or the I/O device configuration tables

VirtualBox
- Open-source hosted VMM by Oracle (Sun Microsystems)
- Runs on Intel and AMD hardware
- Runs above Windows, Linux, Mac OS X (Intel), Solaris host OSs
- Provides a VM with a guest OS
  - Standard DOS, Windows, Linux, OS/2, FreeBSD, Solaris
- Uses hardware virtualization support if available (not required)
- Scheduling
  - Host OS grants a timeslice to the VM
  - VM sub-processes are scheduled by the guest OS
[Diagram: applications over the guest OS over the VirtualBox hypervisor over the host OS over x86 hardware]


VirtualBox Architecture
- Front-end (client)
  - VirtualBox hypervisor
  - Runs above the host OS
  - Without Intel VT, performs workaround virtualization
    - Hypervisor runs in ring 0 of the guest context
    - Guest OS runs as a user program in ring 1 of the guest context
  - Limited use of Intel VT if available
- Back-end (server)
  - Ring 0 driver in the host OS (VirtualBox driver)
  - Copes with the "gory details of x86 architecture"
  - Allocates physical memory for the VM (guest OS)
  - Saves/restores guest CPU context during a host interrupt
    - Registers and descriptor tables
  - No intervention in guest OS process management
[Diagram: application, guest OS, hypervisor, host OS with the VirtualBox driver, x86 hardware]

CPU Operating Modes
- Runs native (not emulated) on the CPU
  - Host applications at ring 3
  - Host OS code at ring 0
  - Guest "safe" application code at ring 3
    - Non-system activities
    - Makes system calls to the guest OS
- Runs native on the CPU at ring 1
  - Guest OS ring 0 code
  - VirtualBox driver handles the "gory details" of the workaround
- Runs emulated on the CPU at ring 3
  - Guest application code that causes guest OS interventions
    - Disabling interrupts
    - Traps of prohibited accesses
  - Each instruction interpreted by the VirtualBox driver
  - Interpreted code runs on the CPU instead of native code

Xen
- Open-source system VMM
  - Runs on Intel and AMD x86 hardware
  - Runs directly above the hardware
  - Linux required to build and install Xen
- Provides VMs with guest OSs
  - Linux, Solaris, Windows XP, 2003 Server
  - Hardware virtualization support required for Windows guest OSs
- Para-virtualization for Linux/Unix guest OSs
  - OS kernel modified to support Xen explicitly
  - Operating systems ported to run on Xen, a similar effort to porting the OS to a new hardware platform
  - Para-virtual machine architecture very similar to native hardware
  - User-space applications and libraries not modified

Xen Architecture
- Xen hypervisor
  - Directly above the hardware
  - Boots the system on start-up
- Domain 0
  - Initialized by the hypervisor on boot
  - Runs XenLinux — a modified Linux kernel
  - Provides Domain Management and Control (DMC)
- Domain U
  - VM running a guest OS
[Diagram: DMC and applications over XenLinux (Domain 0) and guest OSs (Domain U) over the Xen hypervisor over x86 hardware]
- Xen Architecture Overview, http://wiki.xensource.com/xenwiki


Hypervisor
- Full privilege
  - Operates directly on the hardware in ring 0
- Functions
  - CPU scheduling for virtual machines
  - Memory partitioning
  - Provides a hardware abstraction to the virtual machines
- No awareness of
  - Networking
  - External storage devices
  - Video
  - Common I/O
[Diagram: Domain 0 (I/O drivers) and Domain U guests over the Xen hypervisor (scheduler, partitioner, process list, page tables, I/O) over x86 hardware (CPU, memory)]

Domain 0
- XenLinux
  - Modified Linux kernel running in a unique VM over the hypervisor
  - Direct privileged access rights to physical I/O resources
  - Provides I/O virtualization to Domain U guest VMs
- Generic I/O drivers
  - Network Backend Driver: manages the local networking hardware; processes all VM networking requests from Domain U guests
  - Block Backend Driver: manages the local storage disk; processes all read/write data requests from Domain U guests

Domain U PV
- Domain U PV guests
  - Paravirtualized VMs running modified Linux/UNIX kernels
- OS expectations
  - No direct access to host hardware
  - Shares host hardware with other VMs
- Guest drivers provide I/O access
  - PV Network Driver and PV Block Driver
  - Access the backend drivers in Domain 0

Domain U HVM
- Domain U HVM guests
  - Fully virtualized machines
  - Run standard Windows or another unmodified OS
  - OS runs in VMX non-root operation with VT-x
- OS expectations
  - Not aware of virtualization
  - Not sharing hardware with other VMs
  - Normal hardware access for boot
- Xen virtual firmware runs in VMX root operation with VT-x
  - Simulates the BIOS expected by the OS on initial startup
- I/O support
  - No special drivers in the guest
  - Domain 0 runs a Qemu-dm daemon for each HVM guest
  - Supports the Domain U HVM guest's networking and disk access requests


Domain Management
- Xend daemon
  - Python application running in Domain 0
  - System manager for the Xen environment
  - Processes requests as XML remote procedure calls (RPC)
- Qemu-dm
  - Daemon that handles networking and disk requests from Domain U HVM guests
  - Provides full emulation of hardware for standard OS I/O drivers
- Virtual firmware
  - Provides full emulation of the BIOS for a Domain U HVM guest OS
[Diagram: Xend and Qemu over XenLinux (Domain 0); XenUnix and XenLinux guests (Domain U PV); standard Windows (Domain U HVM); Xen hypervisor; x86 hardware]

Domain U PV to Domain 0 Communication
- Domain U PV guest requests I/O from Domain 0 via the hypervisor
  - No direct support in the hypervisor for I/O
- Inter-domain event channel
  - Domain 0 and each Domain U share a memory area
  - Asynchronous inter-domain interrupts implemented in the hypervisor
- Example — Domain U PV guest writes data to the hard disk
  - Guest OS sends the write request to its PV block driver
  - Guest PV block driver writes the data to the Domain 0 shared memory through the hypervisor and sends an inter-domain interrupt to Domain 0 through the hypervisor
  - Domain 0 receives the interrupt from the hypervisor, which triggers the Block Backend Driver's access to shared memory
  - Backend driver reads the blocks from the Domain U PV guest's shared memory and writes the data to the hard disk
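A minimal sketch of the shared-memory hand-off in the example above, reduced to a single-producer/single-consumer ring plus a notification hook. This is an invented stand-in for illustration, not the actual Xen grant-table or event-channel API:

```c
/* Simplified inter-domain I/O ring: the PV frontend (guest) enqueues write
 * requests into memory shared with Domain 0; the backend dequeues them and
 * performs the real disk write.  notify_domain0() stands in for the
 * inter-domain interrupt delivered through the hypervisor. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define RING_SLOTS 8
#define BLOCK_SIZE 16

struct io_request { uint64_t sector; uint8_t data[BLOCK_SIZE]; };

struct shared_ring {                       /* lives in the shared memory area */
    volatile uint32_t prod, cons;
    struct io_request req[RING_SLOTS];
};

static struct shared_ring ring;           /* stand-in for the shared page */

static void notify_domain0(void) { puts("inter-domain interrupt -> Domain 0"); }

/* Frontend (Domain U PV block driver) */
static void frontend_write(uint64_t sector, const uint8_t *data)
{
    struct io_request *r = &ring.req[ring.prod % RING_SLOTS];
    r->sector = sector;
    memcpy(r->data, data, BLOCK_SIZE);
    ring.prod++;                           /* publish the request */
    notify_domain0();                      /* via the hypervisor */
}

/* Backend (Domain 0 block backend driver), run when the interrupt arrives */
static void backend_drain(void)
{
    while (ring.cons != ring.prod) {
        struct io_request *r = &ring.req[ring.cons % RING_SLOTS];
        printf("writing sector %llu to disk\n", (unsigned long long)r->sector);
        ring.cons++;
    }
}

int main(void)
{
    uint8_t block[BLOCK_SIZE] = {0};
    frontend_write(42, block);
    backend_drain();                       /* Domain 0 side */
    return 0;
}
```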

I/O Driver Communication
[Diagram: a Unix application in Domain U PV issues a write request to the PV block driver in XenLinux; the driver writes to shared memory and raises an interrupt through the Xen hypervisor; the backend block driver in Domain 0 (XenLinux, with DMC) reads the shared memory and writes the disk; x86 hardware at the bottom]

Xen PV and HVM Performance
- Test configuration
  - Intel Xeon @ 2.3 GHz
  - 4 GB DDR2 533 MHz memory
  - 160 GB Seagate SATA disk
  - Intel E100 Ethernet controller

Dong et al., "Extending Xen with Intel Virtualization Technology", Intel Technology Journal


I/O Bottleneck
- Bottleneck — a single Ethernet controller
  - Guest OS tasks waiting for I/O access hide the performance degradation caused by virtualization
- Web server running over native Linux without Xen: threads compete above 2.5 Gbps
- Web server running over XenLinux in Domain 0: threads compete above 1.9 Gbps
- Web server running over XenLinux in Domain U PV: threads compete above 0.9 Gbps
