Xen and the Art of Virtualization

Ian Pratt, University of Cambridge and XenSource

Outline

• Virtualization Overview
• Xen Architecture
• VM Relocation deep dive
• Debugging and profiling
• Xen Research
• Demo
• Questions

Virtualization Overview

• Single OS image: OpenVZ, Vservers, Zones
  – Group user processes into resource containers
  – Hard to get strong isolation
• Full virtualization: VMware, VirtualPC, QEMU
  – Run multiple unmodified guest OSes
  – Hard to efficiently virtualize x86
• Para-virtualization:
  – Run multiple guest OSes ported to a special arch
  – The Xen/x86 arch is very close to normal x86

Benefits

• Consolidate under-utilized servers
• Avoid downtime with VM Relocation
• Dynamically re-balance workload to guarantee application SLAs
• Enforce security policy

Virtualization Possibilities

• Standing outside the OS looking in:
  – Firewalling / network IDS / inverse firewall
  – VPN tunnelling; LAN authentication
  – Virus, malware and exploit detection
  – OS patch-level status monitoring
  – Performance monitoring and instrumentation
  – Storage backup and optimization
  – Debugging support

Virtualization Benefits

• Separating the OS from the hardware
  – Users no longer forced to upgrade the OS to run on the latest hardware
• Device support is part of the platform
  – Write one rather than N
  – Better for system reliability/availability
  – Faster to get new hardware deployed
• Enables "Virtual Appliances"
  – Applications encapsulated with their OS
  – Easy configuration and management

Xen 3.0 Highlights

• x86, x86_64, ia64 and initial Power support
• Leading performance
• Secure isolation and QoS control
• SMP guest OSes
• Hotplug CPUs, memory and devices
• Guest save/restore and live relocation
• VT/AMD-V support: "HVM"
  – Run unmodified guest kernels
  – Support for progressive paravirtualization

Xen 3.0 Architecture

[Architecture diagram: VM0 (Domain 0) runs the device manager and control software over XenLinux with native device drivers and back-end drivers; VM1 and VM2 run unmodified user software over para-virtualized XenLinux guests with front-end device drivers; VM3 runs an unmodified guest OS (WinXP) using VT-x. Xen provides the control interface, safe HW interface, event channels, virtual CPU and virtual MMU, with AGP/ACPI/PCI and SMP support, on x86_32, x86_64 and IA64 hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE).]

Para-Virtualization in Xen

• Xen extensions to the x86 arch
  – Like x86, but Xen is invoked for privileged ops
  – Avoids binary rewriting
  – Minimize the number of privilege transitions into Xen
  – Modifications are relatively simple and self-contained
• Modify the kernel to understand the virtualised environment (see the sketch below)
  – Wall-clock time vs. virtual processor time
    • Desire both types of alarm timer
  – Expose real resource availability
    • Enables the OS to optimise its own behaviour
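A minimal sketch of what "modify the kernel to understand the virtualised environment" means in practice, assuming a shared-info layout and a force_evtchn_callback helper along the lines of Xen's PV interface (the names and layout here are illustrative): instead of executing privileged cli/sti instructions, the guest toggles a per-VCPU event mask in memory shared with Xen, avoiding a privilege transition entirely.

#include <stdint.h>

/* Illustrative stand-in for the per-VCPU block of Xen's shared info page. */
struct vcpu_info {
    uint8_t evtchn_upcall_pending;   /* a virtual IRQ is waiting            */
    uint8_t evtchn_upcall_mask;      /* 1 = virtual IRQs masked             */
};

extern struct vcpu_info *this_vcpu;       /* mapped from the shared info page  */
extern void force_evtchn_callback(void);  /* hypercall: deliver pending events */

/* PV replacement for the privileged 'cli' instruction: no trap into Xen. */
static inline void pv_irq_disable(void)
{
    this_vcpu->evtchn_upcall_mask = 1;
    __asm__ __volatile__("" ::: "memory");    /* compiler barrier */
}

/* PV replacement for 'sti': unmask, then pick up anything that arrived. */
static inline void pv_irq_enable(void)
{
    this_vcpu->evtchn_upcall_mask = 0;
    __asm__ __volatile__("" ::: "memory");
    if (this_vcpu->evtchn_upcall_pending)
        force_evtchn_callback();
}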

Xen 3 API support
• Linux 2.6.16/17/18 and -rc/-tip
• 2.6.5, 2.6.9.EL, 2.4.21.EL
• Available in distros: FC4, FC5, FC6, SUSE Linux 10, SLES10, Gentoo, …
• NetBSD 3, FreeBSD 7.0, OpenSolaris 10, Plan 9, Minix, …

• Linux upstream submission process agreed at Kernel Summit

Xen 2.0 Architecture

[Architecture diagram: VM0 (Domain 0) runs the device manager and control software over XenLinux with native device drivers and back-ends; VM1-VM3 run unmodified user software over guest OSes (XenLinux, Solaris) with front-end device drivers. The Xen virtual machine monitor provides the control interface, safe HW interface, event channels, virtual CPU and virtual MMU above the hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE).]

I/O Architecture

• Xen IO-Spaces delegate protected access to specified h/w devices to guest OSes
  – Virtual PCI configuration space
  – Virtual interrupts
  – (Need an IOMMU for full DMA protection)
• Devices are virtualised and exported to other VMs via Device Channels (see the sketch below)
  – Safe asynchronous shared-memory transport
  – 'Backend' drivers export to 'frontend' drivers
  – Net: use normal bridging, routing, iptables
  – Block: export any blkdev, e.g. sda4, loop0, vg3
• (Infiniband / "Smart NICs" for direct guest IO)
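A sketch of the idea behind device channels: a shared page holding fixed-size request/response slots and free-running producer/consumer indices, with the other end kicked via an event channel. This is modelled loosely on Xen's split-driver rings but simplified; the types, sizes and notify_via_evtchn hook are illustrative, not the real io/ring.h interface.

#include <stdint.h>

#define RING_SIZE 32                         /* must be a power of two */

struct blk_request  { uint64_t id, sector; uint8_t write; };
struct blk_response { uint64_t id; int16_t status; };

/* One page shared (e.g. via grant tables) between frontend and backend. */
struct ring_page {
    uint32_t req_prod, req_cons;             /* frontend produces requests */
    uint32_t rsp_prod, rsp_cons;             /* backend produces responses */
    struct blk_request  req[RING_SIZE];
    struct blk_response rsp[RING_SIZE];
};

extern void notify_via_evtchn(int port);     /* kick the other end */

/* Frontend: queue a request and notify the driver domain. */
int frontend_submit(struct ring_page *r, int evtchn, const struct blk_request *rq)
{
    if (r->req_prod - r->req_cons == RING_SIZE)
        return -1;                                       /* ring full */
    r->req[r->req_prod % RING_SIZE] = *rq;
    __sync_synchronize();                                /* publish slot before index */
    r->req_prod++;
    notify_via_evtchn(evtchn);
    return 0;
}

/* Backend: drain requests, service them, post responses. */
void backend_poll(struct ring_page *r, int evtchn,
                  void (*service)(const struct blk_request *, struct blk_response *))
{
    while (r->req_cons != r->req_prod) {
        struct blk_response rsp;
        service(&r->req[r->req_cons % RING_SIZE], &rsp);
        r->req_cons++;
        r->rsp[r->rsp_prod % RING_SIZE] = rsp;
        __sync_synchronize();
        r->rsp_prod++;
    }
    notify_via_evtchn(evtchn);
}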

System Performance

[Bar chart: relative performance (native Linux = 1.0) for SPEC INT2000 (score), Linux build time (s), OSDB-OLTP (tup/s) and SPEC WEB99 (score), running on Linux (L), Xen (X), VMware Workstation (V) and UML (U).]

specjbb2005

[Bar chart: SPECjbb2005 score, native vs. Xen, on Opteron and P4 Xeon; Xen overhead is 0.53% and 0.73% respectively.]

dbench

[Chart: dbench throughput (MB/s), native vs. Xen, for 1 to 32 processes.]

Kernel build

[Chart: kernel build time (s), native vs. Xen, for 1 to 8 virtual CPUs; 32-bit PAE, parallel make with 4 processes per CPU; Xen overhead grows from 5% to 8%, 13%, 18% and 23% as vCPUs are added. Source: XenSource, Inc, 10/06.]

TCP results

[Bar chart: relative TCP bandwidth (native Linux = 1.0) for Tx and Rx at MTU 1500 and MTU 500, on Linux (L), Xen (X), VMware Workstation (V) and UML (U).]

Scalability

[Bar chart: aggregate score for 2, 4, 8 and 16 simultaneous SPEC WEB99 instances on Linux (L) and Xen (X).]

Scalability
• Scaling many domains
  – One domain per processor core, 1 to many cores/domains
  – Presented data at OLS showing problems with 64-bit domains
    • Since then, tested 32-bit domains, which look fine
    • 64-bit did not scale because of excessive TLB flushes, which put too much pressure on the global update of tlbflush_clock in Xen. Jun has fixed this with a TLB global bit patch: there is no need to flush all TLB entries for syscalls etc., which also avoids updating tlbflush_clock. The patch also improves single-domain performance by keeping user TLB entries.

[Chart annotation: big jump with fewer atomic updates to tlbflush_clock. Setup: 64-bit Xen, Paxville processors, 1 domain per processor core, 2 vCPUs per domain (1 per hyperthread), 8 cores total, 1st core for dom0. OLS 2006.]

Scalability
• Scaling a large domain, 1 to many vCPUs
  – Elimination of writable page-tables in xen-unstable helps a lot (red bars), but there are still significant scaling problems beyond 6-way
  – Fine-grained locking is now needed in the mm functions; this should be easier to implement with writable page-tables gone (green bars have the domain big-lock removed)
  – The TLB global bit optimization also helps scalability (blue bars)

[Chart annotation: one data point still to be confirmed. Setup: 64-bit Xen, Paxville processors, 2-way = 1 processor core, 4-way = 2 cores, etc. (HT enabled), 8 cores total.]

x86_32
• Xen reserves the top of the VA space
• Segmentation protects Xen from the kernel
• System call speed unchanged
• Xen 3 now supports PAE for >4GB memory

[Diagram: 4GB/3GB/0GB address-space layout: Xen (S) in the top 64MB, ring 0; kernel (S), ring 1; user (U) below 3GB, ring 3.]

x86_64

• A large VA space makes life a lot easier, but:
• No segment limit support
  ⇒ Need to use page-level protection to protect Xen

[Diagram: 64-bit address-space layout: Kernel (U) at the top of the 2^64 space, Xen (S) below it, the region from 2^64−2^47 down to 2^47 reserved, User (U) from 2^47 down to 0.]

x86_64

• Run user-space and the kernel in ring 3, using different pagetables
  – Two PGDs (PML4s): one with user entries; one with user plus kernel entries
• System calls require an additional syscall/sysret transition via Xen
• Per-CPU trampoline to avoid needing GS in Xen

x86 CPU virtualization

• Xen runs in ring 0 (most privileged)
• Ring 1/2 for the guest OS, ring 3 for user-space
  – GPF if the guest attempts to use a privileged instruction
• Xen lives in the top 64MB of the linear address space
  – Segmentation is used to protect Xen, as switching page tables is too slow on standard x86
• Hypercalls jump to Xen in ring 0
• The guest OS may install a 'fast trap' handler (see the sketch below)
  – Direct user-space to guest-OS system calls
• MMU virtualisation: shadow vs. direct mode
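A sketch of how a PV guest registers its exception handlers with Xen, including a DPL-3 int 0x80 entry so system calls go straight from user space to the guest kernel without a hypervisor transition. The struct layout mirrors Xen's public trap_info, but the selector value, handler names and flag encoding here are illustrative.

#include <stdint.h>

struct trap_info {
    uint8_t       vector;    /* exception/interrupt vector               */
    uint8_t       flags;     /* low bits: privilege level allowed to use */
    uint16_t      cs;        /* code selector of the handler             */
    unsigned long address;   /* handler's virtual address                */
};

extern void divide_error(void), page_fault(void), int80_entry(void);
extern long HYPERVISOR_set_trap_table(struct trap_info *table);

#define GUEST_KERNEL_CS 0xe019      /* illustrative ring-1 kernel selector */

static struct trap_info trap_table[] = {
    { 0,    0, GUEST_KERNEL_CS, (unsigned long)divide_error },
    { 14,   0, GUEST_KERNEL_CS, (unsigned long)page_fault   },
    /* DPL 3 lets user space raise int 0x80 directly; the handler is
     * entered in the guest kernel without bouncing through Xen.      */
    { 0x80, 3, GUEST_KERNEL_CS, (unsigned long)int80_entry  },
    { 0, 0, 0, 0 }                  /* terminator */
};

void install_guest_traps(void)
{
    HYPERVISOR_set_trap_table(trap_table);
}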

Para-Virtualizing the MMU
• Guest OSes allocate and manage their own PTs
  – Hypercall to change the PT base
• Xen must validate PT updates before use
  – Allows incremental updates, avoids revalidation
• Validation rules applied to each PTE:
  1. Guest may only map pages it owns*
  2. Pagetable pages may only be mapped RO
• Xen traps PTE updates and emulates them, or 'unhooks' the PTE page for bulk updates (see the sketch below)
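A sketch of the explicit PV MMU interface: instead of writing a PTE directly, the guest asks Xen to validate and apply the update, and can batch several updates into one hypercall to cut privilege transitions. The structure and hypercall mirror Xen's public mmu_update interface, but treat the exact types and constants below as illustrative.

#include <stdint.h>

typedef uint16_t domid_t;
#define DOMID_SELF ((domid_t)0x7FF0)

struct mmu_update {
    uint64_t ptr;    /* machine address of the PTE to modify */
    uint64_t val;    /* new PTE contents                      */
};

extern long HYPERVISOR_mmu_update(struct mmu_update *req, unsigned int count,
                                  unsigned int *done, domid_t dom);

/* Batch several PTE writes into a single hypercall. Xen validates each
 * entry: the guest may only map pages it owns, and PT pages stay RO. */
int set_ptes(const uint64_t *pte_machine_addrs, const uint64_t *new_vals,
             unsigned int n)
{
    struct mmu_update req[16];
    unsigned int i, done = 0;

    for (i = 0; i < n && i < 16; i++) {
        req[i].ptr = pte_machine_addrs[i];
        req[i].val = new_vals[i];
    }
    return HYPERVISOR_mmu_update(req, i, &done, DOMID_SELF);
}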

MMU Virtualization: Direct-Mode
[Diagram: the guest OS reads its page tables directly; writes go through the Xen VMM, which validates them before they reach the hardware MMU.]

Writeable Page Tables: 1 – Write fault
[The first guest write to a page-table page takes a page fault into Xen.]

Writeable Page Tables: 2 – Emulate?
[Xen decides whether to simply emulate the single write.]

Writeable Page Tables: 3 – Unhook
[Otherwise Xen unhooks the page-table page, so further guest writes proceed without faulting.]

Writeable Page Tables: 4 – First Use
[The first use of the unhooked page table takes a page fault.]

Writeable Page Tables: 5 – Re-hook
[Xen validates the modified entries and re-hooks the page-table page.]

MMU Micro-Benchmarks

[Bar chart: lmbench page fault (µs) and process fork (µs) latency on Linux (L), Xen (X), VMware Workstation (V) and UML (U).]

SMP Guest Kernels

• Xen supports multiple VCPUs per guest
  – Virtual IPIs sent via Xen event channels
  – Currently up to 32 VCPUs supported
• Simple hotplug/unplug of VCPUs
  – From within the VM or via the control tools
  – Optimize the one-active-VCPU case by binary patching spinlocks
• NB: many applications exhibit poor SMP scalability – often better off running multiple instances, each in their own OS

SMP Guest Kernels

• Takes great care to get good SMP performance while remaining secure
  – Requires extra TLB synchronization IPIs
• SMP scheduling is a tricky problem
  – Wish to run all VCPUs at the same time
  – But strict gang scheduling is not work-conserving
  – Opportunity for a hybrid approach
• The paravirtualized approach enables several important benefits
  – Avoids many virtual IPIs
  – Allows 'bad preemption' avoidance
  – Auto hotplug/unplug of CPUs

Driver Domains

[Architecture diagram: a dedicated driver domain runs native device drivers and back-end drivers, exporting the device channel interface to front-end drivers in the guest domains (XenLinux, XenBSD); Domain 0 keeps the device manager and control software. The Xen virtual machine monitor provides the control interface, safe HW interface, event channels, virtual CPU and virtual MMU above the hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE).]

Isolated Driver VMs

• Run device drivers in separate domains
• Detect failure, e.g.
  – Illegal access
  – Timeout
• Kill the domain and restart it
• E.g. 275ms outage from a failed Ethernet driver
[Chart: throughput vs. time (s) across a driver-domain restart.]

IO Virtualization

• IO virtualization in s/w incurs overhead
  – Latency vs. overhead tradeoff
    • More of an issue for network than storage
  – Can burn 10-30% more CPU than native
• Direct h/w access from VMs
  – Multiplexing and protection in h/w
  – Xen Infiniband support
  – Smart NICs / HCAs

Smart NICs, e.g. Solarflare

• Accelerated routes set up by Dom0
  – Then the DomU can access the hardware directly
• The NIC has many Virtual Interfaces (VIs)
  – VI = filter + DMA queue + event queue
• Allows untrusted entities to access the NIC without compromising system integrity

[Diagram: before/after comparison: traffic flowing Dom0 → DomU via the hypervisor vs. DomUs accessing the hardware directly.]

Early results

[Bar chart: STREAM rx (Mbps), STREAM tx (Mbps) and TCP/RR (transactions per 100ms) for vanilla Xen vs. direct guest IO.]

Xen Dom0 CPU Usage

[Bar chart: CPU usage for vanilla Xen (Dom0 + DomU), direct guest IO (Dom0 + DomU) and Xen Dom0 (Dom0 only), across STREAM rx, STREAM tx and TCP/RR.]

VT-x / AMD-V: HVM

• Enables guest OSes to be run without modification
  – E.g. legacy Linux, Windows XP/2003
• The CPU provides vmexits for certain privileged instructions (see the sketch below)
• Shadow page tables are used to virtualize the MMU
• Xen provides simple platform emulation
  – BIOS, APIC, IOAPIC, RTC, net (pcnet32), IDE emulation
• Install paravirtualized drivers after booting for high-performance IO
• Possibility for CPU and memory paravirtualization
  – Non-invasive hypervisor hints from the OS
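A purely conceptual sketch of the HVM execution loop: the hypervisor resumes the guest with a VM-entry, the CPU hands control back (a vmexit) on privileged events, and each exit is handled by emulation (shadow MMU, platform devices) or forwarded to the device model. The exit reasons and helpers below are illustrative, not Xen's actual VT-x code.

/* Conceptual HVM vCPU loop; names and exit reasons are illustrative. */
enum exit_reason { EXIT_CPUID, EXIT_IO_PORT, EXIT_PAGE_FAULT, EXIT_HLT };

struct vcpu;
extern enum exit_reason vm_enter(struct vcpu *v);    /* VMLAUNCH/VMRESUME    */
extern void emulate_cpuid(struct vcpu *v);
extern void forward_to_device_model(struct vcpu *v); /* e.g. QEMU-DM in dom0 */
extern void shadow_page_fault(struct vcpu *v);       /* fix up shadow PTs    */
extern void block_until_interrupt(struct vcpu *v);

void hvm_vcpu_loop(struct vcpu *v)
{
    for (;;) {
        switch (vm_enter(v)) {          /* run the guest until a vmexit */
        case EXIT_CPUID:      emulate_cpuid(v);            break;
        case EXIT_IO_PORT:    forward_to_device_model(v);  break;
        case EXIT_PAGE_FAULT: shadow_page_fault(v);        break;
        case EXIT_HLT:        block_until_interrupt(v);    break;
        }
    }
}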

HVM Architecture
[Architecture diagram: Domain 0 (Linux xen64) runs the control panel, device models and native device drivers; 32-bit and 64-bit HVM (VMX) guests run unmodified OSes over a guest BIOS and virtual platform, with PIC/APIC/IOAPIC emulation, entered via VMExit and callbacks/hypercalls and event channels. The Xen hypervisor provides the control interface, scheduler, event channels, hypercalls, and virtualization of processor, memory and I/O (PIT, APIC, PIC, IOAPIC).]

Progressive paravirtualization

• Hypercall API available to HVM guests (see the detection sketch below)
• Selectively add PV extensions to optimize
  – Net and block IO
  – XenPIC (event channels)
  – MMU operations
    • multicast TLB flush
    • PTE updates (faster than page fault)
    • page sharing
  – Time
  – CPU and memory hotplug
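A sketch of how an HVM guest opts in to progressive paravirtualization: it detects Xen via the hypervisor CPUID leaves and then asks Xen to populate a hypercall page, after which PV drivers and other extensions can issue hypercalls. The CPUID signature and MSR-based setup follow Xen's documented HVM interface, but treat the leaf numbers, wrmsr helper and page handling below as illustrative.

#include <stdint.h>
#include <string.h>

static inline void cpuid(uint32_t leaf, uint32_t *a, uint32_t *b,
                         uint32_t *c, uint32_t *d)
{
    __asm__ __volatile__("cpuid"
                         : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                         : "a"(leaf));
}

extern void wrmsr(uint32_t msr, uint64_t val);
extern uint64_t hypercall_page_gpa;      /* guest-physical page we reserved */

/* Returns 1 if we are running as a Xen HVM guest. */
int detect_xen(void)
{
    uint32_t eax, ebx, ecx, edx;
    char sig[13];

    cpuid(0x40000000, &eax, &ebx, &ecx, &edx);
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    sig[12] = '\0';
    return strcmp(sig, "XenVMMXenVMM") == 0;
}

/* Ask Xen to fill our page with hypercall trampolines; afterwards the HVM
 * guest can use hypercalls for event channels, PV drivers, etc.          */
void enable_pv_extensions(void)
{
    uint32_t eax, ebx, ecx, edx;

    if (!detect_xen())
        return;
    cpuid(0x40000002, &eax, &ebx, &ecx, &edx);   /* ebx = hypercall MSR index */
    wrmsr(ebx, hypercall_page_gpa);
}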

PV Drivers
[Architecture diagram: as before, but the HVM guests now also run paravirtualized front-end drivers that connect over event channels to back-end drivers in Domain 0, bypassing device emulation for the IO path; the Xen hypervisor layer is unchanged.]

HVM I/O Performance

[Bar chart: network throughput in Mb/s, rx and tx, for ioemu, PV-on-HVM and PV. Measured with ttcp, 1500 byte MTU. Source: XenSource, Sep 06.]

IOEmu stage #2

[Architecture diagram: as before, but the IO emulation moves out of the Domain 0 user-space process and runs alongside each HVM guest's virtual platform, entered via VMExit; front-end/back-end drivers and event channels still connect the guests to Domain 0's native device drivers. The Xen hypervisor provides the control interface, scheduler, event channels, hypercalls, and virtualization of processor, memory and I/O (PIT, APIC, PIC, IOAPIC).]

HVM Performance

• Very application dependent
  – 5% overhead on SPECJBB (0.5% for fully PV)
  – OS-intensive applications suffer rather more
  – Performance certainly in line with existing products
  – Hardware support gets better with every new CPU
• More optimizations forthcoming
  – "V2E" for trap-intensive code sequences
  – New shadow pagetable code

Full Virtualization
• The shadow2 code shows vast improvement in processor performance/scaling
  – Boosts single-domain throughput significantly: dbench (running in a tmpfs filesystem, no I/O)
  – Many-HVM-domain scalability also improved dramatically
    • 1 to 7 HVMs scaled 1 : 6.47 with shadow2; the same test scaled 1 : 2.63 with the old shadow code

[Chart: dbench throughput vs. number of domains (1-7) for PV DomU, HVM with the original shadow code, and HVM with shadow2. Setup: 32-bit PAE Xen, Paxville processors, uniprocessor domains, one domain per core, 8 cores total, 1st core for dom0, HT not used; SMP HVM not stable. OLS 2006.]

Shadow Pagetables

[Diagram: the guest OS reads and writes its own virtual → pseudo-physical page tables; the VMM propagates updates into virtual → machine shadow tables used by the hardware MMU, and reflects accessed & dirty bits back to the guest.]
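A sketch of what the shadow propagation in the diagram amounts to: when the guest writes a PTE in its virtual → pseudo-physical tables, the VMM translates the pseudo-physical frame through the per-domain p2m map and installs the corresponding virtual → machine entry in the shadow table the MMU actually walks; hardware-set dirty bits are later reflected back. Types and helpers here are illustrative, not Xen's shadow code.

#include <stdint.h>

#define PTE_PRESENT 0x001ULL
#define PTE_DIRTY   0x040ULL
#define PTE_FLAGS   0xFFFULL
#define PFN_SHIFT   12

extern uint64_t p2m_lookup(uint64_t pfn);       /* pseudo-physical -> machine */
extern uint64_t *shadow_pte_for(uint64_t gva);  /* shadow slot for a guest VA */

/* Propagate one guest PTE into the shadow table used by the hardware MMU. */
void shadow_propagate(uint64_t gva, uint64_t guest_pte)
{
    uint64_t *spte = shadow_pte_for(gva);

    if (!(guest_pte & PTE_PRESENT)) {
        *spte = 0;                               /* not present: clear shadow */
        return;
    }
    uint64_t pfn = guest_pte >> PFN_SHIFT;       /* guest's pseudo-physical   */
    uint64_t mfn = p2m_lookup(pfn);              /* real machine frame        */
    *spte = (mfn << PFN_SHIFT) | (guest_pte & PTE_FLAGS);
}

/* Reflect a hardware-set dirty bit back into the guest's own PTE, as in
 * the diagram's "accessed & dirty bits" arrow.                           */
void shadow_sync_dirty(uint64_t *guest_pte, uint64_t shadow_pte)
{
    *guest_pte |= shadow_pte & PTE_DIRTY;
}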

Virtual Disk Storage
• LVM logical volumes are typically used to store guest virtual disk images today
  – Not as intuitive as using files
  – Copy-on-write and snapshot support far from ideal (see the sketch below)
• Storing disk images in files using Linux's "loop" driver doesn't work well
• The "blktap" driver, new in Xen 3.0.3, provides an alternative:
  – Allows all block requests to be serviced in user-space using a zero-copy AIO approach
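A toy illustration of the copy-on-write lookup a tapdisk-style plug-in performs for a sparse, snapshot-capable image format such as qcow: a two-level map from virtual disk block to file offset, where unallocated blocks read through to the parent image and are copied on first write. This is a simplified model, not the real qcow on-disk format.

#include <stdint.h>
#include <stdlib.h>

#define L2_ENTRIES  512
#define UNALLOCATED 0

struct cow_image {
    uint64_t **l1;           /* l1[i] points to an L2 table, or NULL */
    uint64_t   next_free;    /* next free offset in the image file   */
    uint64_t   block_size;
};

extern void copy_block(struct cow_image *img, uint64_t from_off, uint64_t to_off);

/* Return the file offset backing virtual block 'vblock', allocating
 * (and copying the parent's data) on the first write.               */
uint64_t cow_map_block(struct cow_image *img, uint64_t vblock,
                       uint64_t parent_off, int for_write)
{
    uint64_t  l1_idx = vblock / L2_ENTRIES;
    uint64_t  l2_idx = vblock % L2_ENTRIES;
    uint64_t *l2     = img->l1[l1_idx];

    if (!l2) {
        if (!for_write)
            return parent_off;                   /* read through to parent */
        l2 = calloc(L2_ENTRIES, sizeof(uint64_t));
        img->l1[l1_idx] = l2;
    }
    if (l2[l2_idx] == UNALLOCATED) {
        if (!for_write)
            return parent_off;
        l2[l2_idx] = img->next_free;             /* sparse allocation */
        img->next_free += img->block_size;
        copy_block(img, parent_off, l2[l2_idx]); /* copy-on-write     */
    }
    return l2[l2_idx];
}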

Blktap and tapdisk plug-ins

Blktap:
• Char device mapped by a user-space driver
• Request/completion queues and data areas
• Grant table mapping for zero-copy to/from the guest

Tapdisk plug-ins:
• Flat files and qcow
• Sparse allocation, CoW, encryption, compression
• Correct metadata updates for safety
• Optimized qcow format

Blktap IO performance

Time API

• The shared info page contains per-VCPU time records
  – "at TSC X the time was Y, and the current TSC frequency has been measured as Z"
  – gettime: read the current TSC and extrapolate (see the sketch below)
• When VCPUs migrate, update the record for the new physical CPU
• Periodic calibration of TSCs against the PIT/HPET
  – Works even on systems with un-synced TSCs
  – Update the record after CPU frequency changes (P-states)
  – Also resynchronize records after CPU halt
  – The only remaining issue is thermal throttling
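The gettime extrapolation described above, in code form: given the record "at TSC X the time was Y, with measured frequency Z", current time is Y plus the scaled TSC delta, re-read if Xen updates the record mid-read. The field names echo Xen's per-VCPU time info, but the exact fixed-point layout and version protocol below are illustrative.

#include <stdint.h>

/* Illustrative per-VCPU time record from the shared info page. */
struct vcpu_time_record {
    uint32_t version;           /* odd while Xen is updating the record  */
    uint64_t tsc_timestamp;     /* X: TSC at the last calibration        */
    uint64_t system_time_ns;    /* Y: nanoseconds since boot at that TSC */
    uint32_t tsc_to_ns_mul;     /* Z: fixed-point (.32) ns per TSC tick  */
    int8_t   tsc_shift;
};

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* gettime: read the current TSC and extrapolate from the record.
 * (A real implementation uses a wider multiply to avoid overflow.) */
uint64_t pv_gettime_ns(const volatile struct vcpu_time_record *t)
{
    uint32_t ver;
    uint64_t time, delta;

    do {
        ver   = t->version;
        __asm__ __volatile__("" ::: "memory");
        delta = rdtsc() - t->tsc_timestamp;
        if (t->tsc_shift >= 0)
            delta <<= t->tsc_shift;
        else
            delta >>= -t->tsc_shift;
        time  = t->system_time_ns + ((delta * t->tsc_to_ns_mul) >> 32);
        __asm__ __volatile__("" ::: "memory");
    } while ((ver & 1) || ver != t->version);
    return time;
}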

Xen Tools
[Diagram: in dom0, management clients (CIM, xm, web services) drive xend, which uses libxc (domain builder, control, save/restore) via the privcmd interface and dom0_op hypercalls into Xen; control state is shared with guest domains (dom1) over xenbus, with the back end in dom0 and the front end in the guest.]

VM Relocation: Motivation

• VM relocation enables:
  – High availability
    • Machine maintenance
  – Load balancing
    • Statistical multiplexing gain

Assumptions

• Networked storage
  – NAS: NFS, CIFS
  – SAN: Fibre Channel
  – iSCSI, network block dev
  – drbd network RAID
• Good connectivity
  – common L2 network
  – L3 re-routeing

Challenges

• VMs have lots of state in memory
• Some VMs have soft real-time requirements
  – E.g. web servers, databases, game servers
  – May be members of a cluster quorum
  ⇒ Minimize down-time
• Performing relocation requires resources
  ⇒ Bound and control the resources used

Relocation Strategy

Stage 0: pre-migration – VM active on host A; destination host selected (block devices mirrored)
Stage 1: reservation – initialize a container on the target host
Stage 2: iterative pre-copy – copy dirty pages in successive rounds (see the sketch below)
Stage 3: stop-and-copy – suspend the VM on host A; redirect network traffic; sync remaining state
Stage 4: commitment – activate on host B; VM state on host A released

[Animation: pre-copy migration, rounds 1, 2, …, final.]
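A sketch of the stage 2/3 logic: iterate pre-copy rounds, each sending only the pages dirtied during the previous round, adapting the transfer rate to the observed dirtying rate, and falling back to stop-and-copy once the remaining dirty set is small (or a round limit is hit). The helpers, thresholds and rate step are illustrative, not Xen's actual save/restore code.

#include <stdint.h>

struct dirty_bitmap;
extern struct dirty_bitmap *log_dirty_start(int domid);       /* enable dirty logging */
extern uint64_t dirty_count(struct dirty_bitmap *bm);         /* pages dirtied so far */
extern uint64_t observed_dirty_rate_mbps(struct dirty_bitmap *bm);
extern void send_all_pages(int domid, uint64_t rate_mbps);
extern void send_dirty_pages(struct dirty_bitmap *bm, uint64_t rate_mbps);
extern void suspend_vm(int domid);
extern void send_remaining_state(int domid);                  /* final dirty set + CPU */
extern void activate_on_target(int domid);

#define MIN_RATE_MBPS   100
#define MAX_RATE_MBPS  1000
#define RATE_STEP_MBPS   50
#define MAX_ROUNDS       30
#define SMALL_ENOUGH     50    /* pages: worth stopping the VM for */

void migrate(int domid)
{
    struct dirty_bitmap *bm = log_dirty_start(domid);
    uint64_t rate = MIN_RATE_MBPS;

    send_all_pages(domid, rate);                   /* round 0: everything */

    for (int round = 1; round < MAX_ROUNDS; round++) {
        if (dirty_count(bm) < SMALL_ENOUGH)
            break;                                 /* remaining set is tiny */
        /* Send a bit faster than the VM dirties, up to a cap, so the
         * rounds converge while bounding the impact on the service.   */
        rate = observed_dirty_rate_mbps(bm) + RATE_STEP_MBPS;
        if (rate > MAX_RATE_MBPS) rate = MAX_RATE_MBPS;
        if (rate < MIN_RATE_MBPS) rate = MIN_RATE_MBPS;
        send_dirty_pages(bm, rate);
    }

    suspend_vm(domid);                             /* stop-and-copy */
    send_remaining_state(domid);
    activate_on_target(domid);                     /* commitment    */
}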

Writable Working Set
• Pages that are dirtied must be re-sent
  – Super-hot pages, e.g. process stacks; top of the page free list
  – Buffer cache
  – Network receive / disk buffers
• The dirtying rate determines VM down-time
  – Shorter iterations → less dirtying → …

Rate Limited Relocation

• Dynamically adjust the resources committed to performing the page transfer
  – Dirty logging costs the VM ~2-3%
  – CPU and network usage are closely linked
• E.g. do the first copy iteration at 100Mb/s, then increase based on the observed dirtying rate
  – Minimize the impact of relocation on the server while minimizing down-time

Web Server Relocation

[Chart: effect of migration on web server transmission rate. Throughput of about 870 Mbit/sec drops to 765 Mbit/sec during the 62-second first pre-copy and 694 Mbit/sec during 9.8 secs of further iterations, with 165ms total downtime. 512KB files, 100 concurrent clients; samples over 100ms and 500ms.]

Iterative Progress: SPECweb

[Chart: iterative pre-copy progress for SPECweb (52s).]

Iterative Progress: Quake 3

Quake 3 Server relocation

[Chart: packet inter-arrival time during two Quake 3 server migrations: downtime of 50ms for migration 1 and 48ms for migration 2.]

System Debugging on Xen

• Guest crash dump support
  – Post-mortem analysis
• gdbserver
  – No need for a kernel debugger
• Xentrace
  – Fine-grained event tracing
  – xenmon/xentop
• Xen oprofile
  – Sample-based profiling

Xen Oprofile

[Diagram: OProfile instances in dom0, dom1 and dom2 collect PC samples from hardware events; samples are logged and merged into a single OProfile report.]

• Sample-based profiling for the whole system

Xen Oprofile example

CPU: P4 / Xeon with 2 hyper-threads, speed 2794.57 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events with a unit mask of 0x01 (mandatory) count 1000000
samples  %       image name                   app name                     symbol name
30353    12.0434 domain1-kernel               domain1-kernel               __copy_to_user_ll
7174      2.8465 xen-syms-3.0-unstable        xen-syms-3.0-unstable        do_grant_table_op
6040      2.3965 vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  net_tx_action
5508      2.1854 xen-syms-3.0-unstable        xen-syms-3.0-unstable        find_domain_by_id
4944      1.9617 xen-syms-3.0-unstable        xen-syms-3.0-unstable        gnttab_transfer
4848      1.9236 domain1-xen                  domain1-xen                  evtchn_set_pending
4631      1.8375 vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  net_rx_action
4322      1.7149 domain1-kernel               domain1-kernel               tcp_v4_rcv
4145      1.6446 xen-syms-3.0-unstable        xen-syms-3.0-unstable        hypercall
4005      1.5891 domain1-xen                  domain1-xen                  guest_remove_page
3644      1.4459 vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  hypercall_page
3589      1.4240 vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  eth_type_trans
3425      1.3590 domain1-xen                  domain1-xen                  get_page_from_l1e
2846      1.1292 domain1-kernel               domain1-kernel               eth_type_trans
2770      1.0991 vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  e1000_intr

75        0.0298 domain1-apps                 domain1-apps                 (no symbols)
69        0.0274 domain1-kernel               domain1-kernel               ns_to_timespec
69        0.0274 vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  delay_tsc
68        0.0270 domain1-kernel               domain1-kernel               do_IRQ
66        0.0262 oprofiled                    oprofiled                    odb_insert

Xen Research Projects

• Whole-system pervasive debugging
  – Lightweight checkpointing and replay
  – Cluster/distributed system debugging
• Software-implemented h/w fault tolerance
  – Exploit deterministic replay
  – Explore possibilities for replay on SMP systems
• Multi-level secure systems with Xen
  – XenSE/OpenTC: Cambridge, GCHQ, HP, …
• VM forking
  – Lightweight service replication, isolation
  – UCSD Potemkin honeyfarm project

Parallax
• Managing storage in VM clusters
• Virtualizes storage, fast snapshots
• Access-optimized storage

[Diagram: Root A, a snapshot, and Root B at the top; L1 and L2 metadata levels below; shared data blocks at the bottom, so snapshots reuse unmodified blocks.]

V2E: Taint tracking
[Diagram: a protected VM with virtual net and disk (VN, VD) runs alongside a control VM with native net and disk drivers (ND, DD) and a modified emulator (Qemu*); the VMM maintains a taint pagemap over net and disk I/O. Inbound data is marked tainted; accesses to tainted pages are trapped and the VM is switched to emulation under Qemu*, which tracks tainted data movement; taint markings are propagated back to disk.]

Current Xen Status

[Table: feature support matrix across x86_32, x86_32p, x86_64, IA64 and Power for: Privileged Domains, Guest Domains, SMP Guests, Save/Restore/Migrate, >4GB memory, Progressive PV, Driver Domains.]

Post-3.0.0 Rough Code Stats

Stats since the 3.0.0 release:

                  aliases  checkins  insertions
xensource.com          16      1281      363449
.com                   30       271       40928
intel.com              26       290       29545
hp.com                  8       126       19275
.com                    8        78       17108
valinux.co.jp           3       156       12143
bull.net                1       145       11926
ncsc.mil                3        25        6048
fujitsu.com            13       119        6442
redhat.com              7        68        4822
amd.com                 5        61        2671
virtualiron.com         5        23        1434
cam.ac.uk               1         9        1211
sun.com                 2         9         826
unisys.com              3         7         857
other                  30       189       48132

Xen Development Roadmap

• Performance tuning and optimization
  – Particularly for HVM and x86_64
• Enhanced control stack
• More automated system tuning
• Scalability and NUMA optimizations
• Better laptop/desktop support
  – OpenGL virtualization, power management
• Network optimizations

Conclusions

• Xen is a complete and robust hypervisor
• Outstanding performance and scalability
• Excellent resource control and protection
• Vibrant development community
• Strong vendor support

• Try the Xen demo CD to find out more! (or Fedora Core 6, SUSE 10.x)

• http://xensource.com/community

Thanks!

• If you're interested in working on Xen, we're looking for Research Assistants and Associates in the Lab, and also folk to work in the XenSource Cambridge office.

• [email protected]

Current Device Model
• QEMU device emulation
  – Runs in Domain 0 as a user process
  – No isolation, hard to properly account for HVM I/O
  – Doesn't scale – a new QEMU-DM is launched for every HVM domain, and it currently maps all HVM memory
  – Many costly ring transitions
    • On every I/O instruction event between QEMU and the kernel
    • On every I/O command between QEMU and the real device drivers
  – Many costly VM exits for every I/O

Stub Domain Requirements
• Requirements
  – Need a user-space context for QEMU
    • Some devices require a user process – vncserver
  – Need library support for QEMU
  – Need SMP support
  – Need to run IO emulation in kernel space
  – Need to run frontend drivers
  – Need to limit the impact on the build
  – Need to limit the QEMU maintenance impact