Xen and the Art of Virtualization

Xen and the Art of Virtualization

Xen and The Art of Virtualization Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt & Andrew Warfield SOSP 2003 Additional source: Ian Pratt on xen (xen source) 1 www.cl.cam.ac.uk/research/srg/netos/papers/2005-xen-may.ppt Para virtualization 2 Virtualization approaches z Full virtualization z OS sees exact h/w OS z OS runs unmodified z Requires virtualizable VMM architecture or work around H/W z Example: Vmware z Para Virtualization z OS knows about VMM OS z Requires porting (source code) z Execution overhead VMM z Example Xen, denali H/W 3 The Xen approach z Support for unmodified binaries (but not OS) essential z Important for app developers z Virtualized system exports has same Application Binary Interface (ABI) z Modify guest OS to be aware of virtualization z Gets around problems of x86 architecture z Allows better performance to be achieved z Expose some effects of virtualization z Translucent VM OS can be used to optimize for performance z Keep hypervisor layer as small and simple as possible z Resource management, Device drivers run in privileged VMM z Enhances security, resource isolation 4 Paravirtualization 5 z Solution to issues with x86 instruction set z Don’t allow guest OS to issue sensitive instructions z Replace those sensitive instructions that don’t trap to ones that will trap z Guest OS makes “hypercalls” (like system calls) to interact with system resources z Allows hypervisor to provide protection between VMs z Exceptions handled by registering handler table with Xen z Fast handler for OS system calls invoked directly z Page fault handler modified to read address from replica location z Guest OS changes largely confined to arch-specific code z Compile for ARCH=xen instead of ARCH=i686 z Original port of Linux required only 1.36% of OS to be modified 5 Para-Virtualization in Xen z Arch xen_x86 : like x86, but Xen hypercalls required for privileged operations z Avoids binary rewriting z Minimize number of privilege transitions into Xen z Modifications relatively simple and self-contained z Modify kernel to understand virtualized environment. z Wall-clock time vs. virtual processor time z Xen provides both types of alarm timer z Expose real resource availability z Enables OS to optimise behaviour 6 x86 CPU virtualization z Xen runs in ring 0 (most privileged) z Ring 1/2 for guest OS, 3 for user-space z General Processor Fault if guest attempts to use privileged instruction z Xen lives in top 64MB of linear address space z Segmentation used to protect Xen as switching page tables too slow on standard x86 z Hypercalls jump to Xen in ring 0 z Guest OS may install ‘fast trap’ handler z Direct user-space to guest OS system calls z MMU virtualisation: shadow vs. direct-mode 7 x86_32 z Xen reserves top of VA 4GB space Xen S z Segmentation protects Kernel S Xen from kernel 3GB z System call speed unchanged ring 0 User U ring 1 ring 3 z Xen 3.0 now supports >4GB mem with 0GB Processor Address Extension (64 bit etc) 8 Xen VM interface: CPU z CPU z Guest runs at lower privilege than VMM z Exception handlers must be registered with VMM z Fast system call handler can be serviced without trapping to VMM z Hardware interrupts replaced by lightweight event notification system z Timer interface: both real and virtual time 9 Xen virtualizing CPU z Many processor architectures provide only 2 levels (0/1) z Guest and apps in 1, VMM in 0 z Run Guest and app as separate processes z Guest OS can use the VMM to pass control between address spaces z Use of software TLB with address space tags to minimize CS overhead 10 XEN: virtualizing CPU in x86 z x86 provides 4 rings (even VAX processor provided 4) z Leverages availability of multiple “rings” z Intermediate rings have not been used in practice since OS/2; x86- specific z An O/S written to only use rings 0 and 3 can be ported; needs to modify kernel to run in ring 1 11 CPU virtualization z Exceptions that are called often: z Software interrupts for system calls z Page faults z Improve Allow “guest” to register a ‘fast’ exception handler for system calls that can be accessed directly by CPU in ring 1, without switching to ring-0/Xen z Handler is validated before installing in hardware exception table: To make sure nothing executed in Ring 0 privilege. z Doesn’t work for Page Fault z Only code in ring 0 can read the faulting address from register 12 Xen 13 Some Xen hypercalls 14 z See http://lxr.xensource.com/lxr/source/xen/include/public/xen.h #define __HYPERVISOR_set_trap_table 0 #define __HYPERVISOR_mmu_update 1 #define __HYPERVISOR_sysctl 35 #define __HYPERVISOR_domctl 36 14 Xen VM interface: Memory z Memory management z Guest cannot install highest privilege level segment descriptors; top end of linear address space is not accessible z Guest has direct (not trapped) read access to hardware page tables; writes are trapped and handled by the VMM z Physical memory presented to guest is not necessarily contiguous 15 Memory virtualization choices z TLB: challenging z Software TLB can be virtualized without flushing TLB entries between VM switches z Hardware TLBs tagged with address space identifiers can also be leveraged to avoid flushing TLB between switches z x86 is hardware-managed and has no tags… z Decisions: z Guest O/Ss allocate and manage their own hardware page tables with minimal involvement of Xen for better safety and isolation z Xen VMM exists in a 64MB section at the top of a VM’s address space that is not accessible from the guest 16 Xen memory management 17 z x86 TLB not tagged z Must optimise context switches: allow VM to see physical addresses z Xen mapped in each VM’s address space z PV: Guest OS manages own page tables z Allocates new page tables and registers with Xen z Can read directly z Updates batched, then validated, applied by Xen 17 Memory virtualization z Guest O/S has direct read access to hardware page tables, but updates are validated by the VMM z Through “hypercalls” into Xen z Also for segment descriptor tables z VMM must ensure access to the Xen 64MB section not allowed z Guest O/S may “batch” update requests to amortize cost of entering hypervisor 18 x86_32 z Xen reserves top of VA 4GB space Xen S z Segmentation protects Kernel S Xen from kernel 3GB z System call speed unchanged ring 0 User U ring 1 ring 3 z Xen 3.0 now supports >4GB mem with 0GB Processor Address Extension (64 bit etc) 19 Virtualized memory 20 management VM1 VM2 z Each process in each VM has 1 2 1 5 1 2 1 5 its own VAS 2 3 2 6 2 3 2 6 z Guest OS deals with real (pseudo-physical) pages, Xen 2 8 maps physical to machine 3 12 z For PV, guest OS uses 5 14 hypercalls to interact with memory 6 9 z For HVM, Xen has shadow 2 4 page tables (VT instructions help) 3 16 5 17 6 18 20 TLB when VM1 is running VP PID PN 1 1 ? 2 1 ? 1 2 ? 2 2 ? 21 MMU Virtualization: shadow mode 22 Shadow page table z Hypervisor responsible for trapping access to virtual page table z Updates need to be propagate back and forth between Guest OS and VMM z Increases cost of managing page table flags (modified, accessed bits) z Can view physical memory as contiguous z Needed for full virtualization 23 MMU virtualization: direct mode z Take advantage of Paravirtualization z OS can be modified to be involved only in page table updates z Restrict guest OSes to read only access z Classify Page frames into frames that holds page table z Once registered as page table frame, make the page frame R_ONLY z Can avoid the use of shadow page tables 24 Single PTE update 25 On write PTE : Emulate guest reads Virtual → Machine first guest write Guest OS yes emulate? Xen VMM Hardware MMU 26 Bulk update z Useful when creating new Virtual address spaces z New Process via fork and Context switch z Requires creation of several PTEs z Multipurpose hypercall z Update PTEs z Update virtual to Machine mapping z Flush TLB z Install new PTBR 27 Batched Update Interface guest reads PD Virtual → Machine guest PT PT PT writes Guest OS validation Xen VMM Hardware MMU 28 Writeable Page Tables: create new entries guest reads PD Virtual → Machine guest writes X PT PT PT Guest OS Xen VMM Hardware MMU 29 Writeable Page Tables : First Use—validate mapping via TLB guest reads Virtual → Machine guest writes X Guest OS page fault Xen VMM Hardware MMU 30 Writeable Page Tables : Re- hook guest reads Virtual → Machine guest writes Guest OS validate Xen VMM Hardware MMU 31 Physical memory z Memory allocation for each VM specified at boot z Statically partitioned z No overlap in machine memory z Strong isolation z Non-contiguous (Sparse allocation) z Balloon driver z Add or remove machine memory from guest OS 32 Xen memory management 33 z Xen does not swap out memory allocated to domains z Provides consistent performance for domains z By itself would create inflexible system (static memory allocation) z Balloon driver allows guest memory to grow/shrink z Memory target set as value in the XenStore z If guest above target, free/swap out, then release to Xen z If guest below target, can increase usage z Hypercalls allow guests to see/change state of memory z Physical-real mappings z “Defragment” allocated memory 33 Xen VM interface: I/O z I/O z Virtual devices (device descriptors) exposed as asynchronous I/O rings to guests z Event notification is by means of an upcall as opposed to interrupts 34 I/O z Handle interrupts z Data transfer z Data written to I/O buffer pools in each domain z These Page frames pinned by Xen 35 Details: I/O z I/O Descriptor Ring: 36 I/O rings REQUEST CONSUMER (XEN) REQUEST PRODUCER

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    53 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us