Xen and the Art of Virtualization

Ian Pratt
University of Cambridge and XenSource

Outline
  • Virtualization Overview
  • Xen Architecture
  • VM Relocation deep dive
  • Debugging and profiling
  • Xen Research
  • Demo
  • Questions

Virtualization Overview
  • Single OS image: OpenVZ, Vservers, Zones
      • Group user processes into resource containers
      • Hard to get strong isolation
  • Full virtualization: VMware, VirtualPC, QEMU
      • Run multiple unmodified guest OSes
      • Hard to efficiently virtualize x86
  • Para-virtualization: Xen
      • Run multiple guest OSes ported to special arch
      • Arch Xen/x86 is very close to normal x86

Virtualization Benefits
  • Consolidate under-utilized servers
  • Avoid downtime with VM Relocation
  • Dynamically re-balance workload to guarantee application SLAs
  • Enforce security policy

Virtualization Possibilities
  • Standing outside the OS looking in:
      • Firewalling / network IDS / Inverse Firewall
      • VPN tunneling; LAN authentication
      • Virus, malware and exploit detection
      • OS patch-level status monitoring
      • Performance monitoring and instrumentation
      • Storage backup and optimization
      • Debugging support

Virtualization Benefits
  • Separating the OS from the hardware
      • Users no longer forced to upgrade OS to run on latest hardware
  • Device support is part of the platform
      • Write one device driver rather than N
      • Better for system reliability/availability
      • Faster to get new hardware deployed
  • Enables "Virtual Appliances"
      • Applications encapsulated with their OS
      • Easy configuration and management

Xen 3.0 Highlights
  • x86, x86_64, ia64 and initial Power support
  • Leading performance
  • Secure isolation and QoS control
  • SMP guest OSes
  • Hotplug CPUs, memory and devices
  • Guest save/restore and live relocation
  • VT/AMDV support: "HVM"
      • Run unmodified guest kernels
      • Support for progressive paravirtualization

Xen 3.0 Architecture
[Figure: four VMs on the Xen Virtual Machine Monitor. VM0: Device Manager & Control s/w on GuestOS (XenLinux) with Back-End and Native Device Drivers. VM1, VM2: Unmodified User Software on GuestOS (XenLinux) with Front-End Device Drivers. VM3: Unmodified User Software on Unmodified GuestOS (WinXP) with Front-End Device Drivers. Xen layer: Control IF, Safe HW IF, Event Channel, Virtual CPU, Virtual MMU. Other labels: AGP, ACPI, PCI, SMP, VT-x, x86_32, x86_64, IA64. Bottom layer: Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE).]

Para-Virtualization in Xen
  • Xen extensions to x86 arch
      • Like x86, but Xen invoked for privileged ops
      • Avoids binary rewriting
      • Minimize number of privilege transitions into Xen
      • Modifications relatively simple and self-contained
  • Modify kernel to understand virtualised env.
      • Wall-clock time vs. virtual processor time
          • Desire both types of alarm timer
      • Expose real resource availability
          • Enables OS to optimise its own behaviour

Xen 3 API support
  • Linux 2.6.16/17/18 and -rc/-tip
  • 2.6.5, 2.6.9.EL, 2.4.21.EL
  • Available in distros: FC4, FC5, FC6, SuSE Linux 10, SLES10, Ubuntu, Gentoo, …
  • NetBSD 3, FreeBSD 7.0, OpenSolaris 10, Plan9, minix, …
  • Linux upstream submission process agreed at Kernel Summit

Xen 2.0 Architecture
[Figure: four VMs on the Xen Virtual Machine Monitor. VM0: Device Manager & Control s/w on GuestOS (XenLinux) with Back-Ends and Native Device Drivers. VM1, VM2: Unmodified User Software on GuestOS (XenLinux) with Front-End Device Drivers. VM3: Unmodified User Software on GuestOS (Solaris) with Front-End Device Drivers. Xen layer: Control IF, Safe HW IF, Event Channel, Virtual CPU, Virtual MMU. Bottom layer: Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE).]

I/O Architecture
  • Xen IO-Spaces delegate to guest OSes protected access to specified h/w devices
      • Virtual PCI configuration space
      • Virtual interrupts
      • (Need IOMMU for full DMA protection)
  • Devices are virtualised and exported to other VMs via Device Channels
      • Safe asynchronous shared memory transport
      • 'Backend' drivers export to 'frontend' drivers
      • Net: use normal bridging, routing, iptables
      • Block: export any blkdev, e.g. sda4, loop0, vg3
  • (Infiniband / "Smart NICs" for direct guest IO)

System Performance
[Figure: relative scores (axis 0.0 to 1.1) for SPEC INT2000 (score), Linux build time (s), OSDB-OLTP (tup/s) and SPEC WEB99 (score). Caption: Benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U).]

specjbb2005
[Figure: Native vs. Xen scores (axis 0 to 9000) on Opteron and P4 Xeon; bar annotations 0.53% and 0.73%.]

dbench
[Figure: Native vs. Xen throughput in MB/s (axis 0 to 300) against # processes (1, 2, 4, 8, 16, 32).]

Kernel build
[Figure: Native vs. Xen build time in seconds (axis 0 to 350) against # Virtual CPUs (1, 2, 4, 6, 8). 32b PAE; parallel make, 4 processes per CPU. Legible bar annotations: 13%, 18%, 23%. Source: XenSource, Inc: 10/06.]

TCP results
[Figure: relative bandwidth (axis 0.0 to 1.1) for Tx, MTU 1500 (Mbps); Rx, MTU 1500 (Mbps); Tx, MTU 500 (Mbps); Rx, MTU 500 (Mbps). Caption: TCP bandwidth on Linux (L), Xen (X), VMWare Workstation (V), and UML (U).]

Scalability
[Figure: aggregate score (axis 0 to 1000) for 2, 4, 8 and 16 simultaneous instances. Caption: Simultaneous SPEC WEB99 Instances on Linux (L) and Xen (X).]

Scalability
  • Scaling many domains
      • One domain per processor core, 1 to many cores/domains
      • Presented data showing problems at OLS on 64-bit domains
          • Since then, tested 32-bit domains, which look fine
          • 64-bit did not scale because of excessive TLB flushes, which put too much pressure on the global update of tlbflush_clock in Xen. Jun has fixed this with the TLB global bit patch: there is no need to flush all TLB entries for syscalls etc., so tlbflush_clock is not updated on those paths either. The patch also improves single-domain performance by keeping user TLB entries.
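The contention described on the Scalability slide can be illustrated with a toy model. The sketch below assumes, purely for illustration, that tlbflush_clock behaves like a single counter shared by all CPUs and bumped on every flush, and that a stamp comparison is used to decide whether a CPU might still hold a stale entry. The class, the function names and the workload numbers are hypothetical and are not taken from Xen's source or from the measurements on the slides.

```python
class FlushClock:
    """Toy model of one globally shared TLB-flush clock (an assumption)."""

    def __init__(self, n_cpus):
        self.clock = 1                      # shared counter, bumped on every flush
        self.cpu_last_flush = [0] * n_cpus  # clock value at each CPU's last flush
        self.shared_updates = 0             # writes to the shared counter

    def flush(self, cpu):
        # Every flush touches the shared counter; with many CPUs flushing
        # often, this is the update the slide describes as under pressure.
        self.clock += 1
        self.shared_updates += 1
        self.cpu_last_flush[cpu] = self.clock

    def stamp(self):
        # Stamp taken when a page stops being used for some purpose.
        return self.clock

    def needs_flush(self, cpu, page_stamp):
        # The CPU may hold a stale entry unless it flushed after the stamp.
        return self.cpu_last_flush[cpu] <= page_stamp


def shared_updates(n_cpus, syscalls, switches, flush_on_syscall):
    """Count shared-counter writes for a hypothetical per-CPU workload."""
    fc = FlushClock(n_cpus)
    for cpu in range(n_cpus):
        for _ in range(switches):
            fc.flush(cpu)                   # context switches always flush here
        if flush_on_syscall:
            for _ in range(syscalls):
                fc.flush(cpu)               # behaviour before the global-bit patch
    return fc.shared_updates


without_patch = shared_updates(8, syscalls=1000, switches=10, flush_on_syscall=True)
with_patch = shared_updates(8, syscalls=1000, switches=10, flush_on_syscall=False)
print(without_patch, with_patch)
```

In this model, removing the flush from the syscall path removes almost all writes to the shared counter, which is the qualitative effect the slide attributes to the TLB global bit patch.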
[Figure: scaling many domains. Annotation: big jump with fewer atomic updates to tlbflush_clock. Setup: 64-bit Xen; Paxville processors; 1 domain per processor core; 2 vCPUs per domain, 1 per hyperthread; 8 cores total; 1st core for dom0. Data from OLS2006.]

Scalability
  • Scaling a large domain, 1 to many vCPUs
      • Elimination of writable page-tables in xen-unstable helps a lot (red bars below), but there are still significant scaling problems beyond 6-way
      • Now need fine-grain locking in mm functions; this should be easier to implement with writable page-tables gone (green bars have the domain big-lock removed)
      • TLB global bit optimization also helps scalability (blue bars)
[Figure: scaling a large domain. Annotation: need to confirm this data point. Setup: 64-bit Xen; Paxville processors; 2-way = 1 processor core, 4-way = 2 cores, etc. (HT enabled); 8 cores total. Data from OLS2006.]

x86_32
  • Xen reserves top of VA space
  • Segmentation protects Xen from kernel
  • System call speed unchanged
  • Xen 3 now supports PAE for >4GB mem
[Figure: address-space layout from 0GB to 4GB. User (U, ring 3) from 0GB; Kernel (S, ring 1) from 3GB; Xen (S, ring 0) at the top of 4GB.]

x86_64
  • Large VA space makes life a lot easier, but:
      • No segment limit support
      • Need to use page-level protection to protect hypervisor
[Figure: address-space layout from 0 to 2^64. User (U) from 0 to 2^47; Reserved from 2^47 to 2^64-2^47; Xen (S) and Kernel (U) above 2^64-2^47.]

x86_64
  • Run user-space and kernel in ring 3 using different pagetables
      • Two PGDs (PML4s): one with user entries; one with user plus kernel entries
  • System calls require an additional syscall/ret via Xen
  • Per-CPU trampoline to avoid needing GS in Xen
[Figure: User (U) in r3, Kernel (U) in r3, Xen (S) in r0, linked by syscall/sysret.]

x86 CPU virtualization
  • Xen runs in ring 0 (most privileged)
  • Ring 1/2 for guest OS, 3 for user-space
      • GPF if guest attempts to use privileged instr
  • Xen lives in top 64MB of linear addr space
      • Segmentation used to protect Xen as switching page tables too slow on standard x86
  • Hypercalls jump to Xen in ring 0
  • Guest OS may install 'fast trap' handler
      • Direct user-space to guest OS system calls
  • MMU virtualisation: shadow vs. direct-mode

Para-Virtualizing the MMU
  • Guest OSes allocate and manage own PTs
      • Hypercall to change PT base
  • Xen must validate PT updates before use
      • Allows incremental updates, avoids revalidation
  • Validation rules applied to each PTE:
      1. Guest may only map pages it owns*
      2. Pagetable pages may only be mapped RO
  • Xen traps PTE updates and emulates, or 'unhooks' PTE page for bulk updates

MMU Virtualization: Direct-Mode
[Figure: guest reads and guest writes pass between the Guest OS (in the Virtual Machine), the Xen VMM and the Hardware MMU.]

Writeable Page Tables: 1 - Write fault
[Figure: guest reads proceed; the first guest write causes a page fault into the Xen VMM.]

Writeable Page Tables: 2 - Emulate?
[Figure: on the first guest write, the Xen VMM decides whether to emulate ("emulate?" / "yes").]

Writeable Page Tables: 3 - Unhook
[Figure: the page-table page is unhooked (marked X); guest reads and guest writes go to it directly.]

Writeable Page Tables: 4 - First Use
[Figure: with the page still unhooked (marked X), first use causes a page fault into the Xen VMM.]

Writeable Page Tables: 5 - Re-hook
[Figure: the Xen VMM validates the page and re-hooks it.]

MMU Micro-Benchmarks
[Figure: relative scores (axis 0.0 to 1.1) for Page fault (µs) and Process fork (µs). Caption: lmbench results on Linux (L), Xen (X), VMWare Workstation (V), and UML (U).]

SMP Guest Kernels
  • Xen supports multiple VCPUs per guest
      • Virtual IPIs sent via Xen event channels
      • Currently up to 32 VCPUs supported
  • Simple hotplug/unplug of VCPUs
      • From within VM or via control tools
      • Optimize one active VCPU case by binary patching spinlocks
  • NB: Many applications exhibit poor SMP scalability; often better off running multiple instances each in their own OS

SMP Guest Kernels
  • Takes great care to get good SMP performance while remaining secure
      • Requires extra TLB synchronization IPIs
  • SMP scheduling is a tricky problem
      • Wish to run all VCPUs at the same time
      • But, strict gang scheduling is not work conserving
      • Opportunity for a hybrid approach
  • Paravirtualized approach enables several important benefits
      • Avoids many virtual IPIs
      • Allows 'bad preemption' avoidance
      • Auto hot plug/unplug of CPUs

Driver Domains
[Figure: four VMs on the Xen Virtual Machine Monitor. VM0: Device Manager & Control s/w on GuestOS (XenLinux) with Back-End and Native Device Driver. VM1: Driver Domain on GuestOS (XenLinux) with Back-End and Native Device Driver. VM2: Unmodified User Software on GuestOS (XenLinux) with Front-End Device Drivers. VM3: Unmodified User Software on GuestOS (XenBSD) with Front-End Device Drivers. Xen layer: Control IF, Safe HW IF, Event Channel, Virtual CPU, Virtual MMU. Bottom layer: Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE).]

Device Channel Interface

Isolated Driver VMs
  • Run device drivers in separate domains
  • Detect failure e.g.
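The two PTE validation rules listed on the Para-Virtualizing the MMU slide can be sketched as a small check. This is a minimal illustration rather than Xen's implementation: the PTE record, the ownership table and the set of page-table frames are hypothetical stand-ins, and the exceptions hinted at by the asterisk on rule 1 are not given on the slide, so they are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PTE:
    """Hypothetical page-table entry: target frame plus two flag bits."""
    frame: int
    writable: bool
    present: bool = True


def validate_pte(pte, domain, frame_owner, pagetable_frames):
    """Return True if `domain` may install `pte`.

    frame_owner maps frame number -> owning domain (assumed bookkeeping).
    pagetable_frames is the set of frames currently used as page tables.
    """
    if not pte.present:
        return True                    # a non-present entry maps nothing
    if frame_owner.get(pte.frame) != domain:
        return False                   # rule 1: may only map pages it owns
    if pte.frame in pagetable_frames and pte.writable:
        return False                   # rule 2: page-table pages are RO only
    return True


# Hypothetical state: domain 1 owns frames 10 and 11; frame 11 is a page table.
owners = {10: 1, 11: 1, 20: 2}
pt_frames = {11}

print(validate_pte(PTE(10, writable=True), 1, owners, pt_frames))
print(validate_pte(PTE(11, writable=True), 1, owners, pt_frames))
print(validate_pte(PTE(20, writable=False), 1, owners, pt_frames))
```

Checking each entry as it is written is what lets updates be incremental: an already validated page table stays valid, so only the changed entry needs to be examined.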

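The device channels from the I/O Architecture and Driver Domains slides are described only as a safe asynchronous shared-memory transport between 'backend' and 'frontend' drivers. One common way to build such a transport is a pair of fixed-size rings with free-running producer and consumer indices; the sketch below assumes that design for illustration. It is not Xen's ring layout, the event-channel notification is left out, and all class and method names are hypothetical.

```python
class Ring:
    """Single-producer, single-consumer ring with free-running indices."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.prod = 0   # advanced only by the producer
        self.cons = 0   # advanced only by the consumer

    def is_full(self):
        return self.prod - self.cons == self.size

    def push(self, item):
        if self.is_full():
            return False                       # caller must retry later
        self.slots[self.prod & (self.size - 1)] = item
        self.prod += 1
        return True

    def pop(self):
        if self.cons == self.prod:
            return None                        # ring is empty
        item = self.slots[self.cons & (self.size - 1)]
        self.cons += 1
        return item


class BlockBackend:
    """Hypothetical backend: serves sector reads from an in-memory 'disk'."""

    def __init__(self, disk, requests, responses):
        self.disk = disk
        self.requests = requests
        self.responses = responses

    def service(self):
        handled = 0
        # Stop when the response ring is full so no reply is ever dropped.
        while not self.responses.is_full():
            req = self.requests.pop()
            if req is None:
                break
            req_id, sector = req
            self.responses.push((req_id, self.disk.get(sector)))
            handled += 1
        return handled


# Frontend side: queue two read requests, let the backend run, read replies.
requests, responses = Ring(4), Ring(4)
backend = BlockBackend({0: b"boot", 7: b"data"}, requests, responses)
requests.push((1, 7))
requests.push((2, 0))
handled = backend.service()
replies = [responses.pop(), responses.pop()]
print(handled, replies)
```

Because each index is written by only one side, neither driver needs a lock shared with the other, which is one reason a ring suits a transport between mutually untrusting domains.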