Xen and the Art of Virtualization
Ian Pratt, University of Cambridge and XenSource

Outline
• Virtualization Overview
• Xen Architecture
• VM Relocation deep dive
• Debugging and profiling
• Xen Research
• Demo
• Questions

Virtualization Overview
• Single OS image: OpenVZ, Vservers, Zones
  – Group user processes into resource containers
  – Hard to get strong isolation
• Full virtualization: VMware, VirtualPC, QEMU
  – Run multiple unmodified guest OSes
  – Hard to efficiently virtualize x86
• Para-virtualization: Xen
  – Run multiple guest OSes ported to special arch
  – Arch Xen/x86 is very close to normal x86

Virtualization Benefits
• Consolidate under-utilized servers
• Avoid downtime with VM Relocation
• Dynamically re-balance workload to guarantee application SLAs
• Enforce security policy

Virtualization Possibilities
• Standing outside the OS looking in:
  – Firewalling / network IDS / inverse firewall
  – VPN tunneling; LAN authentication
  – Virus, malware and exploit detection
  – OS patch-level status monitoring
  – Performance monitoring and instrumentation
  – Storage backup and optimization
  – Debugging support

Virtualization Benefits
• Separating the OS from the hardware
  – Users no longer forced to upgrade OS to run on latest hardware
• Device support is part of the platform
  – Write one device driver rather than N
  – Better for system reliability/availability
  – Faster to get new hardware deployed
• Enables "Virtual Appliances"
  – Applications encapsulated with their OS
  – Easy configuration and management

Xen 3.0 Highlights
• x86, x86_64, ia64 and initial Power support
• Leading performance
• Secure isolation and QoS control
• SMP guest OSes
• Hotplug CPUs, memory and devices
• Guest save/restore and live relocation
• VT/AMD-V support: "HVM"
  – Run unmodified guest kernels
  – Support for progressive paravirtualization

Xen 3.0 Architecture
[Diagram: Xen 3.0 architecture. VM0 runs the device manager and control
software plus native device drivers and back-end drivers; VM1–VM3 run
unmodified user software on guest OSes (XenLinux, or an unmodified guest
such as WinXP under VT-x) with front-end device drivers. The Xen VMM
provides the control interface, safe h/w interface, event channels, virtual
CPU and virtual MMU on x86_32, x86_64 and IA64 hardware (SMP, MMU, physical
memory, Ethernet, SCSI/IDE).]

Para-Virtualization in Xen
• Xen extensions to x86 arch
  – Like x86, but Xen invoked for privileged ops
  – Avoids binary rewriting
  – Minimize number of privilege transitions into Xen
  – Modifications relatively simple and self-contained
• Modify kernel to understand virtualised env.
  – Wall-clock time vs. virtual processor time
    • Desire both types of alarm timer
  – Expose real resource availability
    • Enables OS to optimise its own behaviour

Xen 3 API support
• Linux 2.6.16/17/18 and -rc/-tip
• 2.6.5, 2.6.9.EL, 2.4.21.EL
• Available in distros: FC4, FC5, FC6, SUSE Linux 10, SLES10, Ubuntu, Gentoo, …
• NetBSD 3, FreeBSD 7.0, OpenSolaris 10, Plan 9, Minix, …
• Linux upstream submission process agreed at Kernel Summit

Xen 2.0 Architecture
[Diagram: Xen 2.0 architecture. VM0 hosts the device manager and control
software with native device drivers and back-end drivers; VM1–VM3 run guest
OSes (XenLinux, Solaris) with front-end device drivers. The Xen VMM provides
the control interface, safe h/w interface, event channels, virtual CPU and
virtual MMU over the hardware (SMP, MMU, physical memory, Ethernet,
SCSI/IDE).]

I/O Architecture
• Xen IO-Spaces delegate guest OSes protected access to specified h/w devices
  – Virtual PCI configuration space
  – Virtual interrupts
  – (Need IOMMU for full DMA protection)
• Devices are virtualised and exported to other VMs via Device Channels
  – Safe asynchronous shared memory transport
  – 'Backend' drivers export to 'frontend' drivers
  – Net: use normal bridging, routing, iptables
  – Block: export any blkdev e.g. sda4, loop0, vg3
• (Infiniband / "Smart NICs" for direct guest IO)

System Performance
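The "safe asynchronous shared memory transport" behind the device channels
described above can be sketched as a single fixed-size ring shared by a
frontend and a backend. This is a toy Python model under assumed names
(`DeviceChannel`, `put_request`, etc.) — not the real Xen ring API, which
additionally uses memory barriers, grant tables and event-channel notifications:

```python
RING_SIZE = 8  # power of two, so indices wrap with a simple modulus

class DeviceChannel:
    """Toy request/response ring; requests and responses share slots."""
    def __init__(self):
        self.ring = [None] * RING_SIZE
        self.req_prod = 0   # frontend advances when publishing a request
        self.req_cons = 0   # backend advances when consuming a request
        self.rsp_prod = 0   # backend advances when publishing a response
        self.rsp_cons = 0   # frontend advances when consuming a response

    # --- frontend side -------------------------------------------------
    def put_request(self, req):
        if self.req_prod - self.rsp_cons >= RING_SIZE:
            raise BufferError("ring full")
        self.ring[self.req_prod % RING_SIZE] = req
        self.req_prod += 1  # real Xen: write barrier + event-channel notify

    def get_response(self):
        if self.rsp_cons == self.rsp_prod:
            return None     # no outstanding responses
        rsp = self.ring[self.rsp_cons % RING_SIZE]
        self.rsp_cons += 1
        return rsp

    # --- backend side --------------------------------------------------
    def handle_requests(self, service):
        while self.req_cons != self.req_prod:
            req = self.ring[self.req_cons % RING_SIZE]
            self.req_cons += 1
            self.ring[self.rsp_prod % RING_SIZE] = service(req)
            self.rsp_prod += 1

chan = DeviceChannel()
chan.put_request(("read", 4))                    # frontend queues a block read
chan.handle_requests(lambda r: ("done", r[1]))   # backend services the ring
print(chan.get_response())                       # -> ('done', 4)
```

The asymmetry mirrors the real design: the frontend only ever advances the
request producer and response consumer, the backend the other two, so no
index is written by both sides.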
[Chart: relative performance (native Linux = 1.0) for SPEC INT2000 (score),
Linux build time (s), OSDB-OLTP (tup/s) and SPEC WEB99 (score), for the
benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and
UML (U).]

specjbb2005
[Chart: SPECjbb2005 score, native vs. Xen, on Opteron and P4 Xeon; Xen is
within 0.53% and 0.73% of native.]

dbench
[Chart: dbench throughput (MB/s), native vs. Xen, for 1–32 processes.]

Kernel build
[Chart: kernel build time (s), native vs. Xen, for 1–8 virtual CPUs; 32-bit
PAE, parallel make with 4 processes per CPU. Xen overhead grows from 5% at
1 vCPU through 8%, 13% and 18% to 23% at 8 vCPUs. Source: XenSource, Inc,
10/06.]

TCP results
[Chart: TCP bandwidth (Mbps), relative to native, for Tx and Rx at MTU 1500
and MTU 500, on Linux (L), Xen (X), VMware Workstation (V), and UML (U).]

Scalability
[Chart: aggregate score for 2, 4, 8 and 16 simultaneous SPEC WEB99 instances
on Linux (L) and Xen (X).]

Scalability
• Scaling many domains
  – One domain per processor core, 1 to many cores/domains
  – Data presented at OLS showed problems with 64-bit domains
    • Since then, tested 32-bit domains, which look fine
    • 64-bit did not scale because of excessive TLB flushes, which put too
      much pressure on the global update of tlbflush_clock in Xen. Jun has
      fixed this with the TLB global-bit patch: no need to flush all TLB
      entries for syscalls etc., which also avoids updating tlbflush_clock.
      The patch also improves single-domain performance by keeping user TLB
      entries.
Big jump with fewer atomic updates to tlbflush_clock

[Chart: 64-bit Xen, Paxville processors, 1 domain per processor core,
2 vCPUs/domain (1 per hyperthread), 8 cores total, 1st core for dom0.
OLS 2006.]

Scalability
• Scaling a large domain, 1 to many vCPUs
  – Elimination of writable page tables in xen-unstable helps a lot (red
    bars) but significant scaling problems remain beyond 6-way
  – Now need fine-grained locking in mm functions; should be easier to
    implement with writable page tables gone (green bars have the domain
    big-lock removed)
  – TLB global-bit optimization also helps scalability (blue bars)
Need to confirm this data point

[Chart: 64-bit Xen, Paxville processors; 2-way = 1 processor core,
4-way = 2 cores, etc. (HT enabled); 8 cores total.]
OLS 2006

x86_32
[Diagram: 32-bit address space — Xen (S) in the top 64MB below 4GB, kernel
(S) below it under 3GB, user (U) beneath; Xen runs in ring 0, the kernel in
ring 1, user-space in ring 3.]
• Xen reserves top of VA space
• Segmentation protects Xen from kernel
• System call performance unchanged
• Xen 3 now supports PAE for >4GB mem
x86_64
• Large VA space makes life a lot easier, but:
• No segment limit support
  → Need to use page-level protection to protect hypervisor
[Diagram: 64-bit layout — Kernel (U) at the top of the 2^64 space, Xen (S)
below it, the region between 2^64−2^47 and 2^47 reserved, User (U) from
2^47 down to 0.]

x86_64
• Run user-space and kernel in ring 3 using different pagetables
  – Two PGDs (PML4s): one with user entries; one with user plus kernel
    entries
• System calls require an additional syscall/sysret via Xen
• Per-CPU trampoline to avoid needing GS in Xen
[Diagram: user (U) and kernel (U) both in ring 3, Xen (S) in ring 0, with
syscall/sysret transitions via Xen.]

x86 CPU virtualization
• Xen runs in ring 0 (most privileged)
• Ring 1/2 for guest OS, 3 for user-space
  – GPF if guest attempts to use privileged instr
• Xen lives in top 64MB of linear addr space
  – Segmentation used to protect Xen, as switching page tables is too slow
    on standard x86
• Hypercalls jump to Xen in ring 0
• Guest OS may install 'fast trap' handler
  – Direct user-space to guest OS system calls
• MMU virtualisation: shadow vs. direct-mode

Para-Virtualizing the MMU
• Guest OSes allocate and manage own PTs
  – Hypercall to change PT base
• Xen must validate PT updates before use
  – Allows incremental updates, avoids revalidation
• Validation rules applied to each PTE:
  1. Guest may only map pages it owns*
  2. Pagetable pages may only be mapped RO
• Xen traps PTE updates and emulates, or 'unhooks' PTE page for bulk updates

MMU Virtualization: Direct-Mode
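The two PTE validation rules above can be sketched as a toy check. The
names here (`Hypervisor`, `owner`, `is_pagetable`, `validate_pte`) are
invented for illustration and are not Xen's real data structures:

```python
class Hypervisor:
    """Toy model of Xen's per-PTE validation rules."""
    def __init__(self):
        self.owner = {}            # machine frame -> owning domain
        self.is_pagetable = set()  # frames known to hold page tables

    def validate_pte(self, domain, frame, writable):
        if self.owner.get(frame) != domain:
            return False           # rule 1: guest may only map pages it owns
        if frame in self.is_pagetable and writable:
            return False           # rule 2: PT pages may only be mapped RO
        return True

xen = Hypervisor()
xen.owner[100] = "dom1"            # ordinary data frame owned by dom1
xen.owner[200] = "dom1"            # frame 200 holds one of dom1's page tables
xen.is_pagetable.add(200)

print(xen.validate_pte("dom1", 100, writable=True))   # True: own data page
print(xen.validate_pte("dom2", 100, writable=False))  # False: not the owner
print(xen.validate_pte("dom1", 200, writable=True))   # False: PT must be RO
print(xen.validate_pte("dom1", 200, writable=False))  # True: RO map of own PT
```

Because every mapped PT page is read-only, any guest write to a page table
traps into Xen, which is exactly what makes the trap-and-emulate and
unhook paths on the following slides possible.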
[Diagram: direct-mode MMU — guest reads go straight to the page tables
walked by the hardware MMU; guest writes are validated by the Xen VMM.]

Writeable Page Tables: 1 – Write fault
[Diagram: the first guest write to a page table takes a page fault into the
Xen VMM; guest reads still go directly to the page tables.]

Writeable Page Tables: 2 – Emulate?
[Diagram: on the write fault Xen decides whether to emulate the single PTE
update ("emulate? yes") or to unhook the page for bulk updates.]

Writeable Page Tables: 3 – Unhook
[Diagram: the PT page is unhooked from the page tables, so subsequent guest
reads and writes access it as ordinary memory without faulting.]

Writeable Page Tables: 4 – First Use
[Diagram: the first hardware use of the unhooked page table causes a page
fault into the Xen VMM.]

Writeable Page Tables: 5 – Re-hook
[Diagram: Xen validates the batched updates and re-hooks the page into the
page tables walked by the hardware MMU.]

MMU Micro-Benchmarks
[Chart: lmbench page fault (µs) and process fork (µs) latencies, relative to
native, on Linux (L), Xen (X), VMware Workstation (V), and UML (U).]

SMP Guest Kernels
• Xen supports multiple VCPUs per guest
  – Virtual IPIs sent via Xen event channels
  – Currently up to 32 VCPUs supported
• Simple hotplug/unplug of VCPUs
  – From within VM or via control tools
  – Optimize one-active-VCPU case by binary patching spinlocks
• NB: Many applications exhibit poor SMP scalability – often better off
  running multiple instances, each in their own OS

SMP Guest Kernels
• Takes great care to get good SMP performance while remaining secure
  – Requires extra TLB synchronization IPIs
• SMP scheduling is a tricky problem
  – Wish to run all VCPUs at the same time
  – But strict gang scheduling is not work-conserving
  – Opportunity for a hybrid approach
• Paravirtualized approach enables several important benefits
  – Avoids many virtual IPIs
  – Allows 'bad preemption' avoidance
  – Auto hotplug/unplug of CPUs

Driver Domains
[Diagram: driver domains. VM0 hosts the device manager and control
software; VM1 is a driver domain running native device drivers and back-end
drivers; VM2 and VM3 run guest OSes (XenLinux, XenBSD) with front-end
drivers connected over the device channel interface. The Xen VMM provides
the control interface, safe h/w interface, event channels, virtual CPU and
virtual MMU over the hardware (SMP, MMU, physical memory, Ethernet,
SCSI/IDE).]

Isolated Driver VMs
• Run device drivers in separate domains
• Detect failure, e.g.
  – Illegal access
  – Timeout
• Kill domain, restart
• E.g. 275ms outage from failed Ethernet driver
[Chart: throughput over time (0–40s) around a driver-domain restart.]

IO Virtualization
• IO virtualization in s/w incurs overhead
  – Latency vs. overhead tradeoff
    • More of an issue for network than storage
  – Can burn 10–30% more CPU than native
• Direct h/w access from VMs
  – Multiplexing and protection in h/w
  – Xen infiniband support
  – Smart NICs / HCAs

Smart NICs e.g. Solarflare
• Accelerated routes set up by Dom0
  – Then DomU can access hardware directly
• NIC has many Virtual Interfaces (VIs)
  – VI = filter + DMA queue + event queue
• Allow untrusted entities to access the NIC without compromising system
  integrity
[Diagram: before/after — DomU traffic flowing via Dom0 and the hypervisor
to the hardware, vs. DomU accessing the NIC directly.]

Early results
[Chart: STREAM rx (Mbps), STREAM tx (Mbps) and TCP/RR (transactions per
100ms) for vanilla Xen vs. direct guest IO.]

Xen Dom0 CPU Usage
[Chart: CPU usage for STREAM rx, STREAM tx and TCP/RR — vanilla Xen
(Dom0+DomU), direct guest IO (Dom0+DomU), and Xen Dom0 alone.]

VT-x / AMD-V: HVM
• Enable guest OSes to be run without modification
  – E.g. legacy Linux, Windows XP/2003
• CPU provides vmexits for certain privileged instrs
• Shadow page tables used to virtualize MMU
• Xen provides simple platform emulation
  – BIOS, APIC, IOAPIC, RTC, net (pcnet32), IDE emulation
• Install paravirtualized drivers after booting for high-performance IO
• Possibility for CPU and memory paravirtualization
  – Non-invasive hypervisor hints from OS

HVM Architecture
[Diagram: HVM architecture. Domain 0 (Linux xen64) runs the control
panel/CLI, device models and virtual back-end drivers; guest VMs (VMX,
32-bit and 64-bit) run unmodified OSes with a guest BIOS on a virtual
platform, entered and exited via VMExit, with callback/hypercall and event
channel paths and PIC/APIC/IOAPIC emulation. The Xen hypervisor provides
the control interface, scheduler, event channels and hypercalls, and
virtualizes processor, memory and I/O (PIT, APIC, PIC, IOAPIC).]

Progressive paravirtualization
• Hypercall API available to HVM guests
• Selectively add PV extensions to optimize
  – Net and block IO
  – XenPIC (event channels)
  – MMU operations
    • multicast TLB flush
    • PTE updates (faster than page fault)
    • page sharing
  – Time
  – CPU and memory hotplug

PV Drivers
[Diagram: as the HVM architecture above, but with paravirtualized front-end
drivers (FE) in the unmodified guests connected to virtual back-end drivers
in Domain 0, bypassing device emulation for IO.]

HVM I/O Performance
[Chart: HVM I/O throughput (Mb/s), rx and tx, for ioemu, PV-on-HVM and PV
drivers. Measured with ttcp, 1500-byte MTU. Source: XenSource, Sep 06.]

IOEmu stage #2
[Diagram: IOEmu stage #2 — the IO emulation for each HVM guest runs
alongside that guest's virtual platform rather than as a process in
Domain 0; front-end/back-end drivers, callback/hypercall, VMExit and event
channel paths over the Xen hypervisor as before.]

HVM Performance
• Very application dependent
  – 5% SPECJBB (0.5% for fully PV)
  – OS-intensive applications suffer rather more
  – Performance certainly in line with existing products
  – Hardware support gets better with every new CPU
• More optimizations forthcoming
  – "V2E" for trap-intensive code sequences
  – New shadow pagetable code

Full Virtualization
• Shadow2 code shows vast improvement in processor performance/scaling
  – Boosts single-domain throughput significantly – dbench (running in a
    tmpfs filesystem, no I/O)
  – Many-HVM-domain scalability also improved dramatically
    • 1 to 7 HVMs scaled 1 : 6.47 with shadow2; the same test scaled
      1 : 2.63 with the old shadow code
[Chart: DomU and HVM throughput for 1–7 domains with the original shadow
code vs. shadow2. 32-bit PAE Xen, Paxville processors, uniprocessor
domains, one domain per core, 8 cores total, 1st core for dom0, HT not
used; SMP HVM not stable. OLS 2006.]

Shadow Pagetables
[Diagram: shadow pagetables — the guest OS reads and writes its own
virtual → pseudo-physical page tables; the VMM propagates updates into the
shadow virtual → machine tables walked by the hardware MMU, and reflects
accessed & dirty bits back into the guest tables.]

Virtual Disk Storage
• LVM logical volumes are typically used to store guest virtual disk images
  today
  – Not as intuitive as using files
  – Copy-on-write and snapshot support far from ideal
• Storing disk images in files using Linux's "loop" driver doesn't work well
• The "blktap" driver, new in Xen 3.0.3, provides an alternative:
  – Allows all block requests to be serviced in user-space using a
    zero-copy AIO approach

Blktap and tapdisk plug-ins
• Char device mapped by user-space driver
• Request/completion queues and data areas
• Grant table mapping for zero-copy to/from guest
• Flat files and qcow
• Sparse allocation, CoW, encryption, compression
• Correct metadata update for safety
• Optimized qcow format

Blktap IO performance

Time API
• Shared info page contains per-VCPU time records
  – "at TSC X the time was Y, and the current TSC frequency has been
    measured as Z"
  – gettime: read current TSC and extrapolate
• When VCPUs migrate, update record for new physical CPU
• Periodic calibration of TSCs against PIT/HPET
  – Works even on systems with un-synced TSCs
  – Update record after CPU frequency changes (p-state)
  – Also, resynchronize records after CPU halt
  – Only issue is thermal throttling

Xen Tools
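The Time API record above extrapolates as time = Y + (TSC_now − X) / Z.
A toy sketch with invented names and illustrative numbers (the real record
lives in the shared info page and uses scaled fixed-point arithmetic):

```python
class TimeRecord:
    """Per-VCPU calibration record: 'at TSC X the time was Y, freq Z'."""
    def __init__(self, tsc_stamp, time_stamp, tsc_hz):
        self.tsc_stamp = tsc_stamp     # X: TSC value at calibration
        self.time_stamp = time_stamp   # Y: wall-clock seconds at calibration
        self.tsc_hz = tsc_hz           # Z: measured TSC frequency (Hz)

    def gettime(self, tsc_now):
        # read current TSC and extrapolate from the calibration point
        return self.time_stamp + (tsc_now - self.tsc_stamp) / self.tsc_hz

rec = TimeRecord(tsc_stamp=1_000_000, time_stamp=100.0, tsc_hz=2_000_000_000)
# 2e9 cycles after calibration on a 2 GHz TSC = one second of wall-clock time
print(rec.gettime(1_000_000 + 2_000_000_000))   # -> 101.0
```

Because only the record is shared, Xen can handle VCPU migration, p-state
changes and un-synced TSCs just by rewriting (X, Y, Z) for the new CPU —
the guest's gettime path never changes.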
[Diagram: Xen tools stack — CIM, xm and web services talk to xend in dom0;
xend uses libxc (builder, control, save/restore) over the privileged
command driver and dom0_ops into Xen, with xenbus connecting back-ends in
dom0 to front-ends in dom1.]

VM Relocation: Motivation
• VM relocation enables:
  – High-availability
    • Machine maintenance
  – Load balancing
    • Statistical multiplexing gain

Assumptions
• Networked storage
  – NAS: NFS, CIFS
  – SAN: Fibre Channel
  – iSCSI, network block dev
  – drbd network RAID
• Good connectivity
  – common L2 network
  – L3 re-routing
[Diagram: Xen hosts connected to shared networked storage.]

Challenges
• VMs have lots of state in memory
• Some VMs have soft real-time requirements
  – E.g. web servers, databases, game servers
  – May be members of a cluster quorum
  → Minimize down-time
• Performing relocation requires resources
  → Bound and control resources used

Relocation Strategy
Stage 0: pre-migration — VM active on host A; destination host selected
  (block devices mirrored)
Stage 1: reservation — initialize container on target host
Stage 2: iterative pre-copy — copy dirty pages in successive rounds
Stage 3: stop-and-copy — suspend VM on host A; redirect network traffic;
  sync remaining state
Stage 4: commitment — activate on host B; VM state on host A released

Pre-Copy Migration: Rounds 1, 2, … Final

Writable Working Set
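The iterative pre-copy stage above can be sketched with a toy model in
which each round re-sends only the pages dirtied during the previous
round; the dirty rate and thresholds here are illustrative parameters, not
measured values:

```python
def migrate(n_pages=1024, dirty_rate=0.3, threshold=16, max_rounds=10):
    """Toy pre-copy: each round re-sends the pages dirtied meanwhile.

    dirty_rate: fraction of just-copied pages re-dirtied per round
    (shorter rounds copy fewer pages, so fewer get re-dirtied).
    When the dirty set shrinks below `threshold`, stop-and-copy the rest.
    """
    to_send = n_pages          # round 1: every page counts as dirty
    rounds = 0
    while to_send > threshold and rounds < max_rounds:
        rounds += 1
        print(f"round {rounds}: pre-copying {to_send} pages")
        to_send = int(to_send * dirty_rate)
    print(f"stop-and-copy: suspend VM, send final {to_send} pages, activate")
    return rounds, to_send

migrate()   # rounds shrink 1024 -> 307 -> 92 -> 27, then stop-and-copy 8
```

The `max_rounds` cap matters: a workload whose dirtying rate exceeds the
copy rate never converges, which is exactly the "writable working set"
problem on the next slide.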
• Pages that are dirtied must be re-sent
  – Super-hot pages
    • e.g. process stacks; top of page free list
  – Buffer cache
  – Network receive / disk buffers
• Dirtying rate determines VM down-time
  – Shorter iterations → less dirtying → …

Rate-Limited Relocation
• Dynamically adjust resources committed to performing page transfer
  – Dirty logging costs VM ~2–3%
  – CPU and network usage closely linked
• E.g. first copy iteration at 100Mb/s, then increase based on observed
  dirtying rate
  – Minimize impact of relocation on server while minimizing down-time

Web Server Relocation
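The rate-limiting policy above — first iteration at a low rate, later ones
tracking the observed dirtying rate — can be sketched as follows. The
function name, headroom constant and all numbers are illustrative:

```python
def next_rate(current_mbps, observed_dirty_mbps,
              floor=100, ceiling=1000, headroom=50):
    """Pick the copy rate for the next pre-copy round.

    The first iteration runs at `floor`; later iterations run just above
    the observed dirtying rate (dirty rate + headroom), never decreasing
    and never exceeding the link `ceiling`.
    """
    rate = max(current_mbps, observed_dirty_mbps + headroom)
    return min(max(rate, floor), ceiling)

rate = 100                       # first copy iteration at 100 Mb/s
for dirty in (120, 240, 400):    # dirtying rate observed in each round
    rate = next_rate(rate, dirty)
    print(f"dirty {dirty} Mb/s -> copy at {rate} Mb/s")
# -> 170, 290, 450 Mb/s
```

Keeping the copy rate only slightly above the dirtying rate is what bounds
the impact on the server: bandwidth ramps up just enough for each round to
shrink the dirty set, instead of saturating the link from the start.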
[Chart: effect of migration on web server transmission rate (Mbit/sec vs.
elapsed time, 0–130s; 512KB files, 100 concurrent clients, samples over
100ms and 500ms). Throughput drops from 870 Mbit/sec to 765 Mbit/sec during
the 62s first precopy, to 694 Mbit/sec during 9.8s of further iterations,
with 165ms total downtime.]

Iterative Progress: SPECweb
[Chart: iterative pre-copy progress for SPECweb; 52s total.]

Iterative Progress: Quake 3

Quake 3 Server relocation
[Chart: packet interarrival time (secs) during two Quake 3 server
migrations over 70s of elapsed time; migration 1 downtime 50ms,
migration 2 downtime 48ms.]

System Debugging on Xen
• Guest crash dump support
  – Post-mortem analysis
• gdbserver
  – No need for a kernel debugger
• Xentrace
  – Fine-grained event tracing
  – xenmon/xentop
• Xen oprofile
  – Sample-based profiling

Xen Oprofile
[Diagram: OProfile in dom0 collects PC samples and h/w event counts from
dom0, dom1 and dom2 into a sample log and produces the oprofile report.]
• Sample-based profiling for the whole system

Xen Oprofile example
CPU: P4 / Xeon with 2 hyper-threads, speed 2794.57 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events with a unit mask of 0x01 (mandatory) count 1000000
samples  %        image name                   app name                     symbol name
30353    12.0434  domain1-kernel               domain1-kernel               __copy_to_user_ll
7174      2.8465  xen-syms-3.0-unstable        xen-syms-3.0-unstable        do_grant_table_op
6040      2.3965  vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  net_tx_action
5508      2.1854  xen-syms-3.0-unstable        xen-syms-3.0-unstable        find_domain_by_id
4944      1.9617  xen-syms-3.0-unstable        xen-syms-3.0-unstable        gnttab_transfer
4848      1.9236  domain1-xen                  domain1-xen                  evtchn_set_pending
4631      1.8375  vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  net_rx_action
4322      1.7149  domain1-kernel               domain1-kernel               tcp_v4_rcv
4145      1.6446  xen-syms-3.0-unstable        xen-syms-3.0-unstable        hypercall
4005      1.5891  domain1-xen                  domain1-xen                  guest_remove_page
3644      1.4459  vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  hypercall_page
3589      1.4240  vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  eth_type_trans
3425      1.3590  domain1-xen                  domain1-xen                  get_page_from_l1e
2846      1.1292  domain1-kernel               domain1-kernel               eth_type_trans
2770      1.0991  vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  e1000_intr
…
75        0.0298  domain1-apps                 domain1-apps                 (no symbols)
69        0.0274  domain1-kernel               domain1-kernel               ns_to_timespec
69        0.0274  vmlinux-syms-2.6.16.13-xen0  vmlinux-syms-2.6.16.13-xen0  delay_tsc
68        0.0270  domain1-kernel               domain1-kernel               do_IRQ
66        0.0262  oprofiled                    oprofiled                    odb_insert

Xen Research Projects
• Whole-system pervasive debugging
  – Lightweight checkpointing and replay
  – Cluster/distributed system debugging
• Software-implemented h/w fault tolerance
  – Exploit deterministic replay
  – Explore possibilities for replay on SMP systems
• Multi-level secure systems with Xen
  – XenSE/OpenTC: Cambridge, Intel, GCHQ, HP, …
• VM forking
  – Lightweight service replication, isolation
  – UCSD Potemkin honeyfarm project

Parallax
• Managing storage in VM clusters
• Virtualizes storage, fast snapshots
• Access-optimized storage
[Diagram: Parallax block address tree — L1 nodes under Root A and Snapshot
Root B sharing L2 nodes and data blocks.]
V2E: Taint Tracking
[Diagram: a protected VM's virtual net and disk devices (VN, VD) are served
by the control VM's native drivers (ND, DD) through the VMM, which keeps a
taint pagemap; a Qemu-based emulator sits alongside the protected VM.
Inbound data is marked tainted, taint markings are tracked through
emulation and propagated on data movement, and a disk extension marks
tainted data written to disk.]

Current Xen Status
[Table: feature support matrix across x86_32, x86_32p, x86_64, IA64 and
Power for: privileged domains, guest domains, SMP guests,
save/restore/migrate, >4GB memory, progressive PV, driver domains
(post-3.0.0).]

Rough Code Stats
                  aliases  checkins  insertions
xensource.com          16      1281      363449
ibm.com                30       271       40928
intel.com              26       290       29545
hp.com                  8       126       19275
novell.com              8        78       17108
valinux.co.jp           3       156       12143
bull.net                1       145       11926
ncsc.mil                3        25        6048
fujitsu.com            13       119        6442
redhat.com              7        68        4822
amd.com                 5        61        2671
virtualiron.com         5        23        1434
cam.ac.uk               1         9        1211
sun.com                 2         9         826
unisys.com              3         7         857
other                  30       189       48132
(Stats since the 3.0.0 release)

Xen Development Roadmap
• Performance tuning and optimization
  – Particularly for HVM and x86_64
• Enhanced control stack
• More automated system tuning
• Scalability and NUMA optimizations
• Better laptop/desktop support
  – OpenGL virtualization, power management
• Network optimizations

Conclusions
• Xen is a complete and robust hypervisor
• Outstanding performance and scalability
• Excellent resource control and protection
• Vibrant development community
• Strong vendor support
• Try the Xen demo CD to find out more! (or Fedora Core 6, SUSE 10.x)
• http://xensource.com/community

Thanks!
• If you're interested in working on Xen, we're looking for Research
  Assistants and Associates in the Lab, and also folk to work in the
  XenSource Cambridge office.
• [email protected]

Current Device Model
• QEMU device emulation
  – Runs in Domain 0 as a user process
  – No isolation to properly account for HVM I/O
  – Doesn't scale – a new qemu-dm is launched for every HVM domain, and it
    currently maps all HVM memory
  – Many costly ring transitions
    • On every I/O instruction event between QEMU and kernel
    • On every I/O command between QEMU and real device drivers
  – Many costly VM exits for every I/O

Stub Domain Requirements
• Need user-space context for QEMU
  – Some devices require a user process – vncserver
• Need library support for QEMU
• Need SMP support
• Need to run IO emulation in kernel space
• Need to run frontend drivers
• Need to limit impact to build
• Need to limit QEMU maintenance impact