Virtual Machines

Modern Microprocessors, Fall 2012. Virtual Machines. Dr. Martin Land

Virtual Machine (VM)

Layered model of computation
  Software and hardware divided into logical layers
  Layer n
    Receives services from server layer n – 1
    Provides services to client layer n + 1
  Layers interact through well-defined programming interface
Virtual layer
  Software emulation of hardware or software layer n
  Transparent to layer n + 1
  Provides service to layer n + 1 as expected from real layer n
  Virtual layer n can run at some layer m ≠ n in real system

Figure: layer stack of the virtual system (n + 1, virtual n, n – 1) beside the layer stack of the real system (n + 1, m, m – 1).

Examples of Virtual Systems

Web browser exchanges data with server
  Virtual: browser on client talks to Web server
  Real: local OS, protocol stack, and hardware on client; network; hardware, protocol stack, and server OS on server
Cloud computing
  Virtual
    Service level agreement (SLA) specifies infrastructure requirements
    User sees hardware / software configuration / performance
  Real
    Provider assembles virtual configuration
    Meets SLA requirements
    May be implemented in any way

Types of Virtual Machine

Process virtual machine
  VM provides application interpretation above OS
Hosted virtual machine
  Virtual machine monitor (VMM)
  Runs above primary OS / below guest OS
  Provides guest OS with software emulation of real hardware system
System virtual machine
  Emulation of system-level hardware environment
  Runs above physical hardware and below one or more OSs

Figure: layer stacks for four configurations.
  Basic system: applications over OS over hardware
  System VM: applications over OSs over VMM over hardware
  Hosted VM: application over guest OS over VMM over OS over hardware
  Process VM: application over VM over OS over hardware

Process VM Example: Java

Designed for program portability between platforms
  Provides standard interface to software
  Java VM located above a standard OS
  Interface to hardware is implementation dependent
  I/O operations performed by calls to OS
Java compiled to bytecode
  Bytecode usually run (interpreted) in Java VM
Java without VM
  Java bytecode processor in IBM mainframes
  Native machine language (ISA) is Java bytecode
  Execute Java bytecode without interpretation

http://java.sun.com/docs/books/tutorial/getStarted/intro/definition.html

Hosted VM Example: Guest OS over OS

DOS command line interface over Windows
  Windows allocates 1 MB virtual memory space
  Copies DOS kernel into low memory
  System calls handled by guest DOS kernel
  DOS accesses to hardware
    Trapped and served by Windows host OS
    Responses returned to DOS
Concurrent DOS windows
  Multiple allocations of 1 MB virtual memory spaces
DEBUG
  Application running in virtual DOS machine
  Sees 1 MB memory space allocated by Windows
  Register values
    Windows emulates real values to DOS
    DEBUG emulates DOS values to user
Parallels, VirtualBox, VMware, DOSBox, ...
  Host Windows, DOS, … as guest OSs over host OS

Figure: DEBUG application over Virtual 86 machine over Windows over hardware.

Virtual Machine in IBM z/990 Mainframe

Hardware
  CPUs, I/O system, internal communication network
VMM (systems manager, hypervisor)
  Operator console for partitioning/configuring CPUs and I/O
  Provides hardware emulation as abstraction to OS layer
OS
  Each logical partition (LPAR) runs a separate OS instance
  Run z/OS, MVS, VM, Unix, Linux, Windows, … instances in parallel
  Non-Windows OS versions expect to see hypervisor (not hardware)
User
  User sees single-user interface provided by one OS

Figure: groups of users over OS instances in LPARs, over the VMM (systems manager, hypervisor), over hardware.

VM as System Management Tool

Isolate user environments on single hardware platform
  Multiple copies of single operating system running independently
  Multiple operating systems running concurrently
  Maintain higher security
Resource management
  Hardware redundancy
  High availability
  Recovery management
Hardware pooling
  Assemble hardware cluster
  Map applications to hardware efficiently
Load balancing
  Remap applications to hardware

Figure: applications (App1, App2, App2, App3), each over its own OS, over two VMMs on two servers.

z/990 Parallel Sysplex Model

Parallel Sysplex
  Merge 2 to 32 instances of z/OS into a single system
  Applications divide work and data among LPARs
High capacity for very large workloads
  Resource sharing
  Dynamic workload balancing
Geographical diversity
  Coupled LPARs on remote physical systems
  Physical backup
  Automatic failure recovery
Continuous availability

Figure: two physical systems, each with groups of users over LPAR OS instances over a systems manager over hardware (processors, RAM, I/O), joined by a coupling facility.

Virtualization for Server Systems

Old file server model
  Run one application per physical server
  Server specified for worst case load
  Large number of typically underutilized servers
  Huge aggregate spare capacity
Competition from mainframes
  VMM provides dynamic load balancing
  Hardware provides centralized power, cooling, monitoring, backup
  High SAR: scalability, availability, reliability
  Lower cost per served client than server farm
Virtualization in server
  Partition hardware resources to run independent applications
Intel virtualization
  IA-32 and IA-64 ISA support
  I/O chipset support

HP Virtual Partitions (vPars)

Figure: vPars boot order.

Source: Hewlett-Packard, "Installing and Managing HP-UX Virtual Partitions (vPars)"

System VM Organization

Hypervisor
  Virtual machine monitor (VMM)
  Lowest layer above physical hardware (host)
    Uniprocessor or multiprocessor system
  Creates virtual machine (VM) environments for guest OSs
  Allocates physical host resources to virtual resources
VM overhead
  Processor-intensive applications: low overhead
    Infrequent use of OS calls
    Most instructions run directly on hardware
  I/O-intensive applications: high overhead
    Frequent use of OS calls
    OS calls for I/O services run in emulation
  I/O-limited applications
    Program throughput limited by I/O latency
    Emulation adds relatively small overhead

VMM Requirements

Hardware abstraction
  Guest environment must replicate hardware
  VMM must present well-defined software interface to OS
Protection
  Isolate guests from one another
  Protect VMM from guest OS and application software
  Guest software cannot change allocation of physical resources
Privilege
  VMM runs in kernel mode
  Guest OSs and applications run in user mode
Hardware support for VMM
  Virtualization primitives built into mainframe ISA
  Any OS or application access to hardware causes trap to VMM
  VMM catches every access to hardware abstraction layer (HAL)
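
The trap requirement can be illustrated with a minimal sketch. It assumes a toy model in which every guest port access raises an exception that a monitor object catches; the names HardwareTrap, ToyVMM, and guest_port_access are hypothetical and do not correspond to any real hypervisor interface.

```python
class HardwareTrap(Exception):
    """Models the trap raised when user-mode guest code touches hardware."""

    def __init__(self, guest_id, port, value=None):
        super().__init__(f"guest {guest_id} accessed port {port:#x}")
        self.guest_id = guest_id
        self.port = port
        self.value = value  # None means a read, anything else is a write


class ToyVMM:
    """Hypothetical monitor that keeps a separate virtual port space per guest."""

    def __init__(self):
        self.virtual_ports = {}  # (guest_id, port) -> value seen by that guest

    def handle_trap(self, trap):
        key = (trap.guest_id, trap.port)
        if trap.value is None:
            # Guest read: answer from the guest's own virtual device state.
            return self.virtual_ports.get(key, 0)
        # Guest write: update only this guest's virtual device state.
        self.virtual_ports[key] = trap.value
        return None


def guest_port_access(vmm, guest_id, port, value=None):
    """Every guest access traps; the VMM emulates the response."""
    try:
        raise HardwareTrap(guest_id, port, value)
    except HardwareTrap as trap:
        return vmm.handle_trap(trap)
```

Because each guest only ever reaches its own entries in the port map, two guests that write different values to the same port number later read back their own value, which is the isolation property listed under Protection.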

Virtualization Awareness

Virtualization-aware guest OS
  OS written to run above VMM/hypervisor
  Expects to interact with virtual host
    Does not expect full or direct control of physical hardware
    OS code interfaces with hypervisor code
    No need to remap (bluff) pointers intended for real hardware
  May be presented with view of real system for limited operations
  Example: mainframe OS
    Writes I/O outputs to hypervisor interface
    Does not attempt to configure I/O hardware devices
    Particular OS may be given direct control of particular I/O device
Virtualization-unaware guest OS
  OS written to run above physical hardware
  Expects full and direct control of real hardware
  Requires extensive intervention and remapping by VMM

Hardware Emulation Activities

OS sees hardware through operations
  OS instructions cause the CPU to initiate memory and I/O operations
  I/O devices initiate DMA operations and interrupts

Real operation and corresponding VMM emulation
  CPU memory access (read data or instruction, write data)
    Translate data/instruction from guest to host format
    Remap address space
    Read/write to real host memory
  CPU I/O device access (read data or instruction, write data)
    Translate data/instruction from guest to host format
    Remap I/O port space
    Read/write to real host I/O device
  I/O device DMA or IRQ actions
    VMM manages I/O device DMA
    Translate OS interrupt handlers from guest format to host format

Figure: application over OS over VMM over hardware.

Full/Partial System Emulation

Full system emulation
  VMM intervenes in every OS access to hardware
    CPU: translates guest ISA to host ISA
    Memory: translates memory size and organization
    Chipset: translates guest configuration instructions to host
    I/O devices: translates guest driver to host driver
  CPU emulation example
    Run Nintendo game on PC
    Translate each Nintendo instruction to IA-32 instruction set
Partial system emulation
  Part of host hardware presented to OS unchanged
  VMM passes guest operations to host with minimal intervention
  Most system VMs emulate subset/superset of real host hardware
  CPU emulation only in special cases

Software Emulation of I/O Hardware

Advantages
  VMM provides emulation of widely supported device hardware
  Guest OS runs available device drivers without modification
Difficulties
  Requires very accurate device emulation
  Includes hardware revisions and "bug emulation"
Performance issues
  VMM intervention on every guest OS access to I/O device
    Context switch from guest OS to VMM
    VMM emulates I/O access and accesses real I/O device
    Context switch back to guest OS with response
  Adds considerable overhead
  Emulation is compute-intensive and increases CPU utilization
Least-bad case
  Virtual device = real device
  Remap I/O ports with no change to driver operation

Bootstrap Process in System VM

Workstation without VMM
  System boot
    CPU loads initial system loader (ISL)
    ISL points to system boot device
    Boot device contains OS
  Device discovery
    OS loader writes to host I/O space
    Chipset and I/O devices respond
    OS loads drivers for host devices
    OS provides user interface
Workstation with VMM
  System boot
    CPU loads initial system loader (ISL)
    ISL points to system boot device
    Boot device contains VMM
  Device discovery
    VMM loader writes to host I/O space
    Chipset and I/O devices respond
    VMM loads drivers for host devices
    VMM provides administrator interface
  Secondary boot
    Administrator configures VM partitions
    Administrator points VMM to device containing OS boot image
    VMM boots OS into partition
  Device discovery
    OS loader writes to virtual I/O space
    VMM responds for I/O devices
    OS loads drivers for virtual devices
    OS provides user interface

Virtualization Difficulties for IA-32

IA-32 designed to provide hardware support to OS
  Memory segmentation
  Virtual memory and paging
  Task management
  Interrupt management
  Protection and privilege for segmentation, paging, interrupts
Workaround virtualization
  Treat OS like user application
  Can create a kludge on IA-32 systems
IA-32 operating systems
  Expect to have highest privilege
  Can easily discover their lower privilege

Figure: virtual system (application over OS over VMM over hardware) beside real system (application over OS over hardware), with user and kernel privilege marked.

Memory Resource Compression

OS manages resources using IA-32 system tables
  Assigns pointer to page table root (directory)
  Manages page table entries
  Manages memory segmentation with descriptor tables limited to 8 K entries
    Global descriptor table (GDT)
      Map segment pointer to virtual address
      Define segment (code, data, system) and privilege level
    Interrupt descriptor table (IDT)
      Map interrupts and traps to service routines
Memory compression
  VMM must reserve part of guest virtual memory for management
  OS expects to see the full virtual memory space
Table resource compression
  VMM requires entries in GDT and IDT for management of OS
  VMM must prevent OS access to its descriptors
  OS expects full control of all 8 K table entries

Ring Aliasing

Privilege rings
  Memory segments assigned privilege from 0 (highest) to 3 (lowest)
    Stored in segment descriptor (table entry defining segment)
    Copied into code segment selector (pointer to segment via descriptor)
  Access rights for code limited to segments of same or lower privilege
  User mode ~ ring 3
  OS kernel mode ~ ring 0
Ring aliasing
  Deprivileging
    Run VMM at ring 0 and OS at ring 1
  Issues
    Paging protection distinguishes only two privilege levels (supervisor and user)
    4-level privilege not supported in 64-bit systems
    OS can read its CPL from code segment selector
Definitions
  CPL: privilege level of code segment
  DPL: privilege level for data access or branch target

Figure: access granted or denied as a function of CPL and DPL across rings 0 to 3.
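
The last issue above can be made concrete with a short sketch. It relies on two IA-32 facts: the low two bits of the CS selector hold the current privilege level, and a data-segment access is allowed when the numerically larger of CPL and the selector's requested privilege level (RPL) does not exceed the DPL. The function names are illustrative only.

```python
def cpl_from_selector(cs_selector: int) -> int:
    """Bits 1:0 of the CS selector hold the current privilege level."""
    return cs_selector & 0b11


def data_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """IA-32 data-segment check: the less privileged of CPL and RPL
    (the numerically larger value) must not exceed the segment's DPL."""
    return max(cpl, rpl) <= dpl


def kernel_detects_deprivileging(cs_selector: int) -> bool:
    """A kernel that expects ring 0 can notice it was moved to another ring."""
    return cpl_from_selector(cs_selector) != 0
```

A guest kernel loaded with selector 0x08 sees CPL 0, while the same kernel deprivileged to ring 1 would run with a selector such as 0x09 and could detect the difference simply by reading CS, which does not fault.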

Non-Faulting Access to Privileged State

Privileged registers
  Control configuration of hardware systems
  VMM must
    Intercept OS access to privileged registers
    Provide virtual values determined for guest environment
Access to privileged registers in IA-32
  Access by unprivileged software usually prevented
    Causes protection fault
    VMM emulates response to guest instruction
  Some unprivileged instructions access privileged state and do not fault
    GDTR: pointer to GDT
    IDTR: pointer to IDT
    LDTR: pointer to LDT
    TR: pointer to current task segment
  On user access to system state
    Protection fault on write
    No fault on read
Guest OS can determine that it does not control CPU

System Calls and Interrupts

System calls
  Application in ring 3 invokes OS in ring 0
  Require indirect mechanism (call gate)
    Redirects to hidden ring 0 address
    VMM must emulate call gates
  SYSENTER instruction provides fast calls to ring 0
    Will call VMM instead of guest OS
  SYSEXIT instruction ends SYSENTER routine
    Faults to ring 0 if executed from lower privilege
  VMM must emulate response to SYSENTER/SYSEXIT
Interrupts
  Interrupts can be masked by controlling interrupt flag (IF)
  VMM must mask interrupts and handle interrupts by emulation
  Some OSs toggle IF frequently, requiring many VMM interventions

Intel Virtualization Technology (VT)

Virtual machine monitor
  Hardware boots (3rd party) VMM software instead of OS
  VMM configures hardware resources among guest systems
  Remaps hardware locations to virtual pointers for guests
  OSs boot within guest partitions
Hardware support for virtualization
  VT-enabled processors alternate between operating modes
    Root mode grants full hardware control to VMM
    Non-root mode presents virtual pointers to guest OS
  VT-enabled chipset
    Grants control of I/O to root mode
    Remaps I/O channels for non-root mode
Operating system
  Sees virtual machine as real system
  Operates in ring 0 for maximum privilege
  Sends instructions to hardware pointers in usual way

Figure: VMX non-root mode holds user code (ring 3, user privilege) and the OS (ring 0, virtual full privilege); VMX root mode holds the VMM (real full privilege).

http://www.intel.com/technology/itj/2006/v10i3/index.htm

System Issues in Virtualization

CPU virtualization support
  Handles operations initiated by CPU
  Memory access by guest software
    VMM assigns virtual address space to guest OS
  I/O access by guest software
    VMM translates OS driver output for host device
Chipset virtualization support
  Handles operations initiated by I/O device
  Interrupts and DMA accesses by I/O device
    Intercepted by VMM and remapped

Figure: CPU, ROM, and RAM on the PCI host-to-bus bridge (bus controller); PCI (expansion) bus with graphics, I/O devices, and disk I/O; ISA bridge to the ISA/EISA bus.

VT-x for IA-32 Processor Virtualization

Virtual machine extensions (VMX)
  VMX root operation
    Operating mode designed for VMM
    Grants highest privilege access to host CPU hardware state
  VMX non-root operation
    Operating mode designed for guest OS
    Presents OS with virtual host configured by VMM
    OS sees standard ring 0 access to virtual IA-32 resources
    OS access to privileged state trapped by VMM
Mode transitions
  VM entry: VMX root operation → VMX non-root operation
  VM exit: VMX non-root operation → VMX root operation

Figure: user code and OS in VMX non-root operation above the VMM in VMX root operation above host hardware, linked by VM entry and VM exit.

Virtual Machine Control Structure

Virtual-machine control structure (VMCS)
  Used for mode transition management
  VM entry
    Saves processor state to VMCS host-state area
    Loads processor state from VMCS guest-state area
  VM exit
    Saves processor state to VMCS guest-state area
    Loads processor state from VMCS host-state area
VMCS host-state area
  Segment register selectors for VMM operations
  Privileged system table pointers (GDTR, IDTR, TR, page table root)
VMCS guest-state area
  Segment register selectors for OS operations
  Virtual system table pointers determined by VMM
    VMM physical address space not mapped to guest OS virtual address space
  Interrupt flag (IF)
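
The save/load pattern listed above can be sketched as a toy state machine. The classes Vmcs and ToyCpu and the register values are hypothetical; the sketch follows the slide's description, in which VM entry records the VMM context in the host-state area. On real VT-x hardware, as far as the Intel manuals describe it, the VMM fills in the host-state area itself before entry rather than relying on an automatic save, so treat the first line of vm_entry as a modeling shortcut.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    """Toy VMCS with only the two state areas and an exit reason."""
    host_state: dict = field(default_factory=dict)
    guest_state: dict = field(default_factory=dict)
    exit_reason: str = ""


class ToyCpu:
    def __init__(self):
        # Illustrative VMM context: system table pointers and interrupt flag.
        self.state = {"GDTR": 0x1000, "IDTR": 0x2000, "CR3": 0x3000, "IF": 0}
        self.root_mode = True

    def vm_entry(self, vmcs: Vmcs) -> None:
        assert self.root_mode, "VM entry is only valid from root operation"
        vmcs.host_state = dict(self.state)    # keep VMM context for next exit
        self.state = dict(vmcs.guest_state)   # switch to guest table pointers
        self.root_mode = False

    def vm_exit(self, vmcs: Vmcs, reason: str) -> None:
        assert not self.root_mode, "VM exit is only valid from non-root operation"
        vmcs.guest_state = dict(self.state)   # preserve guest context
        self.state = dict(vmcs.host_state)    # restore VMM context
        vmcs.exit_reason = reason             # tell the VMM why the guest stopped
        self.root_mode = True
```

Swapping the table pointers as a group is what changes the address space on each transition, so the guest never sees the VMM's GDTR, IDTR, or page table root.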

VMCS Details

Referenced by physical address
  No page table entry in any guest address space
  Location determined by VMM software
VMCS structure
  Not determined by architecture
  Defined through a set of VMCS access instructions available to the host
  VMM author chooses implementation
VM entry
  Loads table pointers from VMCS
  Pointer updates cause context shift to VM process
  VMM can optionally inject virtual event (interrupt) to cause VM response
VM exit
  VM saves context to memory
  All VMs exit to common entry point in VMM
  VM exit records details of reason for exit in VMCS
  VMM provides detailed response to VM exit

VMCS Control Fields

Settable options for interrupt virtualization
  External-interrupt exiting
    VM exit on external interrupt
    External interrupts not maskable by guest
  Interrupt-window exiting
    VM exit if guest allows interrupts
Guest/host mask for control register virtualization
  Status flags in control registers determine processor options
  VMM masks selected flags to prevent write by guest
  Guest write to masked flag causes VM exit
  Guest reads flag value specified by VMM in VMCS
VM exit bitmaps
  VMM chooses subset of guest actions that cause VM exit
  Exception bitmap: 32 exceptions that optionally cause VM exit
  I/O bitmap: each port in the 16-bit I/O port space can be set to VM exit on guest access
  Instruction bitmap: selects privileged instructions that cause VM exit
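
The guest/host mask and the I/O bitmap both reduce to simple bit tests, sketched below. The function names and the one-bit-per-port layout in a flat bytearray are assumptions made for illustration. The write check also assumes that only a write that would change a masked flag from the value the guest sees needs to exit, which is a narrower rule than "any write to a masked flag".

```python
def guest_read(real_value: int, mask: int, shadow: int) -> int:
    """Masked bits come from the VMM-chosen shadow value,
    unmasked bits from the real control register."""
    return (shadow & mask) | (real_value & ~mask)


def guest_write_exits(new_value: int, mask: int, shadow: int) -> bool:
    """Exit if the write would change any VMM-owned bit
    from the value the guest currently reads for it."""
    return ((new_value ^ shadow) & mask) != 0


def io_access_exits(bitmap: bytearray, port: int) -> bool:
    """One bit per port: a set bit means a guest access causes a VM exit."""
    return bool(bitmap[port >> 3] & (1 << (port & 7)))


def set_io_exit(bitmap: bytearray, port: int) -> None:
    """Mark a port so that guest access to it exits to the VMM."""
    bitmap[port >> 3] |= 1 << (port & 7)
```

Covering all 65536 ports at one bit each takes 8192 bytes, so a VMM can intercept a single device's ports while leaving every other port access to run without intervention.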

VT-x Solves Virtualization Problems

Ring aliasing and compression
  Guest software runs at intended privilege level
Address-space compression
  Guest/VMM transitions can change virtual address space
  Guest software has full use of its own address space
  VMCS resides in physical address space
    Does not use linear address space
Nonfaulting access to privileged state
  VMCS controls interrupts
  VMM allows guest OS access to privileged registers
  Accesses cause transition to VMM
System calls
  Guest OS runs at ring 0 as intended
Interrupts
  VMM controls response to interrupt through VMCS

VT-x Exception Handling

Exception not set in bitmap
  OS handles exception
  OS continues
Exception set in bitmap
  VM exit to VMM
  VMM services exception
  VMM updates system tables
  VM entry with event injection

Event injection replicates exception
Possible updates: page tables, system registers, I/O configuration, ...

Interrupt Virtualization

Set option external-interrupt exiting
Interrupt
  VM exit to VMM
  VMM prepares system tables
  VM entry with event injection

Event injection replicates interrupt
Possible updates: interrupt tables, system registers, I/O configuration, ...

VT-d for PCI Chipset Virtualization

VMM allocates resources to guest OSs
  Virtual address space
  Virtual I/O devices mapped to real I/O devices
  OS accesses real I/O device through VMM mapping
DMA remapping
  OS configures virtual I/O devices
  Enables device-initiated DMA operations to guest address space
  Real I/O device must write to guest OS through emulation mapping
Interrupt remapping
  Real I/O devices may interrupt CPU
  Interrupt intended for one guest OS
  Real I/O device must deliver interrupt to guest OS through emulation mapping

Figure: CPU and RAM attached to a bridge with I/O devices below it.

http://www.intel.com/technology/itj/2006/v10i3/index.htm

DMA Protection Domains

Protection domain
  Subset of physical memory allocated for device-initiated DMA
  Protection domains may be allocated to
    VMM
    Guest OS
    Driver process running under guest OS
I/O device
  May be assigned to a protection domain
  Can only perform DMA to assigned protection domain
DMA address translation
  I/O device DMA request to bridge contains memory address
  VT-d treats request address as DMA virtual address (DVA)
    Guest Physical Address (GPA) of guest OS
    General software-generated virtual I/O address
  DVA translated to Host Physical Address (HPA)

Mapping I/O Devices to Protection Domains

PCI device requester ID
  Identifies DMA device and request
  PCI bus, device, function
  Assigned by PCI configuration software during device discovery
Root-entry table
  Index: 8-bit bus number from requester ID
  Entry: pointer to context-entry table
Context-entry table
  Index: 8-bit device/function number from requester ID
  Entry: pointer to page structure used to translate DVA
Page structure
  Multilevel table structure
  Similar to IA-32 page tables
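
The two-stage lookup can be sketched as follows. The sketch assumes the usual 16-bit requester ID layout (8-bit bus in the upper byte, 8-bit device/function in the lower byte) and 4 KB pages. Nested dictionaries stand in for the root-entry table, the context-entry tables, and a single-level page map, which is a deliberate simplification of the multilevel page structure. All names are hypothetical.

```python
PAGE_SHIFT = 12                      # assume 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def split_requester_id(requester_id: int):
    """Upper byte is the bus number, lower byte the device/function number."""
    bus = (requester_id >> 8) & 0xFF
    devfn = requester_id & 0xFF
    return bus, devfn


def translate_dma(root_table: dict, requester_id: int, dva: int) -> int:
    """Translate a DMA virtual address to a host physical address,
    refusing any request that falls outside the device's protection domain."""
    bus, devfn = split_requester_id(requester_id)
    context_table = root_table.get(bus)
    if context_table is None or devfn not in context_table:
        raise PermissionError("device has no protection domain")
    page_map = context_table[devfn]           # DVA page number -> HPA frame number
    frame = page_map.get(dva >> PAGE_SHIFT)
    if frame is None:
        raise PermissionError("DVA outside the device's protection domain")
    return (frame << PAGE_SHIFT) | (dva & PAGE_MASK)
```

For a device on bus 0x02 with device/function 0xF9 whose domain maps DVA page 0x10 to host frame 0x8A, a DMA request to DVA 0x10ABC resolves to HPA 0x8AABC, and a request to any unmapped page is rejected instead of reaching memory.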

Address Space Overview

Figure: guest virtual memory (GVA) maps through page tables to guest physical memory (GPA), and the VMM maps it to host physical memory (HPA). An I/O device DMA request ID (PCI bus, device, function) selects the root-entry table, the context-entry table, and the page structures that translate the DMA virtual address (DVA) to an HPA.

IA-32 Interrupt Handling

Legacy interrupts
  Interrupt controller in chipset handles device interrupts
    Programmable Interrupt Controller (PIC) integrated into ISA chipset
    APIC (Advanced PIC) integrated into PCI chipset
  I/O device assigned interrupt request (IRQ) connection to APIC
  APIC
    Translates device IRQ to 8-bit CPU interrupt number n
    Sends hardware interrupt signal (INTR) to processor
  CPU
    Loads 64-bit entry n from Interrupt Descriptor Table (IDT)
    Entry points to Interrupt Service Routine (ISR)
Message signaled interrupts (MSI)
  I/O APIC in PCI chipset formats IRQ signal into structured message
  Message transferred on PCI bus as device-initiated DMA operation
  Local APIC in CPU receives and decodes message

Message Interrupt Handling

Local APIC
  CPU interrupt controller
  Receives/decodes local interrupt signals
  Receives interrupt messages from I/O APIC
I/O APIC
  PCI chipset interrupt controller
  Receives/decodes device IRQ signals
  Sends/receives interrupt messages

Source: IA-32 Intel Architecture Software Developer's Manual

Interprocessor Interrupts

Interprocessor Interrupt (IPI)
  Subset of APIC interrupt message table
  CPU writes to interrupt command register (ICR) in local APIC
  Local APIC issues IPI message on system bus
  Used to boot and spawn threads in multiprocessor system

Interrupt Remapping

Message signaled interrupt (MSI)
  Encodes interrupt vector and destination processor
  Real I/O device not aware of guest OS view of emulated I/O device
  VMM must intercept MSI
VMM redefines interrupt message format
  Provides substitute MSI
  DMA write request contains
    Message identifier
    No interrupt attributes (vector and destination processor)
    Requester ID of real I/O device generating interrupt
  Requester ID mapped through table structure (root/context tables)
    Points to interrupt remapping table (IRT)
    Entry provides vector and destination processor
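
The substitute-message lookup can be sketched in a few lines. The dictionary device_tables stands in for the root/context table walk and maps each requester ID directly to its interrupt remapping table; that shortcut, the field names, and the function name remap_interrupt are assumptions for illustration only.

```python
def remap_interrupt(device_tables: dict, requester_id: int, message_id: int):
    """Resolve a substitute MSI (requester ID plus message identifier)
    to the vector and destination processor chosen by the VMM."""
    irt = device_tables.get(requester_id)
    if irt is None or message_id not in irt:
        raise LookupError("interrupt blocked: no remapping entry")
    entry = irt[message_id]
    return entry["vector"], entry["destination_cpu"]
```

Because the device message carries no vector or destination of its own, the VMM can retarget a guest's interrupts by editing the table entry, without reprogramming the device.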

Caching of Remapping Structures

VT-d supports hardware caching of remapping tables
  Root/context tables
  Paging structures
  IOTLB
  Interrupt remapping table entries
VMM responsible for maintaining remapping cache
  Must invalidate stale cache entries
Remapping errors
  DMA access request returns error message
  Device response to error is implementation dependent
  Errors logged to VMM
  VMM may reset cache or I/O device configuration tables

VirtualBox

Open source hosted VMM by Oracle (Sun Microsystems)
  Runs on Intel and AMD hardware
  Runs above Windows, Linux, Mac OS X (Intel), Solaris
Provides VM with guest OS
  Standard DOS, Windows, Linux, OS/2, FreeBSD, Solaris
  Uses hardware virtualization support if available (not required)
Scheduling
  Host OS grants timeslice to VM
  VM sub-processes scheduled by guest OS

Figure: applications over guest OS over VirtualBox hypervisor over host OS over x86 hardware.

VirtualBox Architecture

Front-end (client)
  VirtualBox hypervisor
  Runs above host OS
  Without Intel VT performs workaround virtualization
    Hypervisor runs in ring 0 of guest context
    Guest OS runs as user program in ring 1 of guest context
  Limited use of Intel VT if available
Back-end (server)
  Ring 0 driver in host OS
  Copes with "gory details of x86 architecture"
  Allocates physical memory for VM (guest OS)
  Saves/restores guest CPU context during host interrupt
    Registers and descriptor tables
  No intervention in guest OS process management

Figure: application over guest OS over hypervisor, above the host OS with the VirtualBox driver.

CPU Operating Modes

Runs native (not emulated) on CPU
  Host applications at ring 3
  Host OS code at ring 0
  Guest "safe" application code at ring 3
    Non-system activities
    Makes system calls to guest OS
Runs emulated on CPU at ring 3
  Guest application code that causes guest OS interventions
    Disable interrupts
    Trap of prohibited accesses
    Executes code
  Each instruction interpreted by VirtualBox driver
  Interpreted code run in CPU instead of native code
Runs native on CPU at ring 1
  Guest OS ring 0 code
  VirtualBox driver handles "gory details" of workaround

Xen

Open source system VMM
  Runs on Intel and AMD x86 hardware
  Runs directly above hardware
  Linux required to build and install Xen
Provides VMs with guest OSs
  Linux, Solaris, Windows XP, 2003 Server
  Hardware virtualization support required for Windows guest OS
Para-virtualization for Linux/Unix guest OS
  OS kernel modified to support Xen explicitly
  Operating systems ported to run on Xen
    Similar effort to porting OS to new hardware platform
    Para-virtual machine architecture very similar to native hardware
  User space applications and libraries not modified

Source: Xen Architecture Overview, http://wiki.xensource.com/xenwiki

Xen Architecture

Xen hypervisor
  Directly above hardware
  Boots system on start-up
Domain 0
  Initialized by hypervisor on boot
  Runs XenLinux (modified Linux kernel)
  Provides Domain Management and Control (DMC)
Domain U
  VM running guest OS

Figure: Domain 0 (DMC over XenLinux) and several Domain U guests (application over guest OS) over the Xen hypervisor over x86 hardware.

Hypervisor

Full privilege
  Operates directly on hardware in ring 0
Functions
  CPU scheduling for virtual machines
  Memory partitioning
  Provides hardware abstraction to virtual machines
No awareness of
  Networking
  External storage devices
  Video
  Common I/O

Figure: Domain U guests and Domain 0 over the Xen hypervisor (scheduler with process list, partitioner with page tables) over x86 hardware (CPU, memory, I/O).

Domain 0

XenLinux
  Modified Linux kernel running in unique VM over hypervisor
  Direct privileged access rights to physical I/O resources
  Provides I/O virtualization to Domain U guest VMs
Generic I/O drivers
  Network Backend Driver
    Manages local networking hardware
    Processes all VM networking requests from Domain U guests
  Block Backend Driver
    Manages local storage disk
    Processes all read/write data requests from Domain U guests

Figure: Domain U guests and Domain 0 (I/O drivers) over the hypervisor scheduler (process list) and partitioner (page tables), over CPU, memory, and I/O.

Domain U PV

Domain U PV guests
  Paravirtualized VM running modified Linux/UNIX kernels
OS expectations
  No direct access to host hardware
  Shares host hardware with other VMs
Guest drivers provide I/O access
  Access backend drivers in Domain 0
  PV Network Driver
  PV Block Driver

Figure: PV drivers in Domain U connected to the backend driver in Domain 0.

Domain U HVM

Domain U HVM guests
  Fully virtualized machines
  Run standard Windows or other unmodified OS
  OS runs as VMX non-root operation with VT-x
OS expectations
  No hardware virtualization
  Not sharing with other VMs
  Normal hardware access for boot
    Xen virtual firmware runs as VMX root operation with VT-x
    Simulates BIOS expected by OS on initial startup
I/O support
  No special drivers
  Domain 0 runs Qemu-dm daemon for each HVM guest
  Supports Domain U HVM guest for networking and disk access requests

Figure: OS drivers in Domain U served by a daemon in Domain 0.

Domain Management

Xend daemon
  Python application running in Domain 0
  System manager for Xen environment
  Processes requests as XML remote procedure call (RPC)
Qemu-dm
  Daemon handles networking and disk requests from Domain U HVM
  Provides full emulation of hardware for standard OS I/O drivers
Virtual firmware
  Provides full emulation of BIOS for Domain U HVM guest OS

Figure: Domain 0 (Xend and Qemu over XenLinux), two Domain U PV guests (Unix application over XenUnix, Linux application over XenLinux), and one Domain U HVM guest (Windows application over standard Windows), all over the Xen hypervisor over x86 hardware.

Domain U PV to Domain 0 Communication

Domain U PV guest requests I/O from Domain 0 via hypervisor
  No direct support in hypervisor for I/O
Inter-domain event channel
  Domain 0 and each Domain U have shared memory area
  Asynchronous inter-domain interrupts implemented in hypervisor
Example: Domain U PV guest data write to hard disk
  Guest OS sends write request to PV block driver
  Guest PV block driver
    Writes data to Domain 0 shared memory through hypervisor
    Sends inter-domain interrupt to Domain 0 through hypervisor
  Domain 0 receives interrupt from hypervisor
    Triggers PV Block Backend Driver access to shared memory
  Backend Driver
    Reads blocks from Domain U PV guest shared memory
    Writes data to hard disk
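
The disk-write example above can be traced in a small simulation. The classes ToyHypervisor, BlockBackend, and PvBlockDriver are hypothetical and are not Xen interfaces. A deque stands in for the shared memory area, a dictionary stands in for the disk, and the inter-domain interrupt is delivered as a direct function call, so the asynchronous behavior of the real event channel is not modeled.

```python
from collections import deque


class ToyHypervisor:
    """Owns the shared areas and routes events; performs no I/O itself."""

    def __init__(self):
        self.rings = {}      # guest domain id -> area shared with Domain 0
        self.handlers = {}   # target domain id -> callback run on an event

    def share_ring(self, guest_id):
        self.rings[guest_id] = deque()
        return self.rings[guest_id]

    def bind_event(self, domain_id, handler):
        self.handlers[domain_id] = handler

    def send_event(self, target_id, source_id):
        # Stand-in for an inter-domain interrupt.
        self.handlers[target_id](source_id)


class BlockBackend:
    """Domain 0 side: drains shared memory and writes to the disk."""

    def __init__(self, hypervisor):
        self.hypervisor = hypervisor
        self.disk = {}       # sector number -> data
        hypervisor.bind_event(0, self.on_event)

    def on_event(self, guest_id):
        ring = self.hypervisor.rings[guest_id]
        while ring:
            sector, data = ring.popleft()
            self.disk[sector] = data


class PvBlockDriver:
    """Domain U side: queues the request, then signals Domain 0."""

    def __init__(self, hypervisor, guest_id):
        self.hypervisor = hypervisor
        self.guest_id = guest_id
        self.ring = hypervisor.share_ring(guest_id)

    def write(self, sector, data):
        self.ring.append((sector, data))
        self.hypervisor.send_event(0, self.guest_id)
```

The hypervisor object never touches the disk dictionary, which mirrors the point that the hypervisor only provides shared memory and event delivery while Domain 0 performs the I/O.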

I/O Driver Communication

Figure: a Unix application in Domain U PV sends a write request to the PV block driver, which writes to shared memory and raises an interrupt through the Xen hypervisor. The backend block driver in Domain 0 (DMC over XenLinux) receives the interrupt, reads shared memory, and writes to disk on the x86 hardware.

Xen PV and HVM Performance

Test configuration
  Intel Xeon @ 2.3 GHz
  4 GB DDR2 533 MHz memory
  160 GB Seagate SATA disk
  Intel E100 Ethernet controller

Source: Dong et al., "Extending Xen with Intel Virtualization Technology", Intel Technology Journal

I/O Bottleneck

Bottleneck: single Ethernet controller
  Guest OS tasks waiting for I/O access hide performance degradation caused by virtualization
Web server running over native Linux without Xen
  Threads compete above 2.5 Gbps
Web server running over XenLinux in Domain 0
  Threads compete above 1.9 Gbps
Web server running over XenLinux in Domain U PV
  Threads compete above 0.9 Gbps
