PAC-498B ESX Server Architectural Directions
Beng-Hong Lim, Director of R&D

This presentation may contain VMware confidential information.
Copyright © 2005 VMware, Inc. All rights reserved. All other marks and names mentioned herein may be trademarks of their respective companies.

Agenda
- ESX Server overview
- Technology trends in the datacenter
- ESX Server architectural directions
  - CPU virtualization
  - I/O virtualization
  - Scalable performance
  - Power efficiency and management
- Conclusions

ESX Server Core Virtualization
[Diagram: per-VM components (a VMX process and a VMM per VM) sit above the VMkernel, which contains the virtual I/O stack, storage stack, network stack, device drivers, and the VMkernel hardware interface; a Service Console with its own device drivers runs alongside, all on top of the hardware.]

ESX Server Enterprise-Class Features
[Diagram: the same stack extended with enterprise-class functionality: third-party agents in the Service Console; VM File System and Distributed Virtual NIC & Switch in the virtual I/O stack; and resource management (CPU scheduling, memory scheduling, storage bandwidth, network bandwidth) in the VMkernel.]

ESX Server in the Datacenter
[Diagram: management solutions and VirtualCenter distributed services (VMotion, provisioning, backup, DRS, DAS) drive the ESX Server stack through the SDK, the VirtualCenter agent, and third-party agents.]

Server Technology Trends
- Multi-core CPUs: 16 to 32 CPU cores per server
- 64-bit systems: multi-terabytes of memory
- Power-aware architectures: adaptive throttling of CPUs and server hardware
- Converged I/O fabrics and interfaces: shared high-speed interface to network and storage
- Network-based, virtualized storage
- Stateless servers

The Future Datacenter
Each server:
- Many CPU cores
- 64-bit: lots of memory
- Shared, high-bandwidth connection to network and external storage
- Stateless
- Power-hungry
[Diagram: power-hungry, stateless servers connected through a network to shared storage.]

Virtualization is Key
- Abundant compute resources on a server
  - Virtualization is inherently scalable and parallel
  - The killer app for efficiently utilizing many CPUs and multi-terabytes of memory
- Power management increasingly important
  - Higher compute densities and rising utility costs
  - Maximize performance per watt across all servers
  - Distributed resource scheduling can optimize for this metric
- Transforms system management
  - Breaks the bond between hardware and applications
  - Ease of scale-up management in a scale-out environment

The Virtual Datacenter
- ESX Server virtualizes individual servers
- VirtualCenter synthesizes virtualized servers into a giant computer
- ESX Server and VirtualCenter map applications and virtual machine topologies onto physical resources
- The result: a distributed virtual computer

CPU Virtualization
[Diagram: ESX Server stack with the VMMs highlighted as the CPU virtualization layer.]

Basic idea: directly execute guest code until it is not safe to do so. Options for handling unsafe code:
- Trap and emulate: the classic mainframe approach
- Avoid unsafe code and call into the VMM instead: paravirtualization
- Dynamically transform unsafe code into safe code: binary translation

Tradeoffs among the methods:
                 Trap and    Para-            Binary
                 Emulate     virtualization   Translation
Performance      Average     Excellent        Good
Compatibility    Excellent   Poor             Excellent
Sophistication   Average     Average          High

CPU Virtualization Directions
- New technologies: 64-bit CPUs, VT/Pacifica, Linux/Windows paravirtualization; many guest OS types
- Flexible architecture supports a mix of guests and VMM types
- A separate VMM per virtual machine
- Simultaneously run 32-bit, 64-bit, and paravirtualized guests
- Use the most efficient method for the hardware and guest OS

[Diagram: four VMs, each paired with a matching monitor (Para VMM, BT VMM-32, VT VMM-64, BT VMM-32) on top of the VMkernel.]

I/O Virtualization
[Diagram: ESX Server stack with the virtual I/O stack, storage and network stacks, and device drivers highlighted as the I/O virtualization layer.]

I/O Virtualization Paths
Paths to the physical device:
1. Hosted/Split I/O: via a separate host/VM
2. Native I/O: via the vmkernel
3. Passthrough I/O: the guest directly drives the device; needs hardware support and sacrifices functionality

[Diagram: a guest OS issues I/O against an emulated device; three arrows show the Hosted/Split path through a Service Console I/O stack and driver, the Native path through the VMkernel I/O stack and driver, and the Passthrough path straight to the device.]
Which I/O path to use? Evaluating the I/O paths:
- Compatibility: hardware vendors can re-use existing device drivers
- Performance (per watt): high I/O performance, low CPU occupancy
- Isolation: contain device driver faults
- Virtualization functionality: virtual machine portability; resource sharing and multiplexing; offloading guest functionality into the virtualization layer

I/O Virtualization: Compatibility
                Hosted/Split   Native   Passthrough
Compatibility   Good           Poor     Good
- Hosted/Split and Passthrough can re-use device drivers from existing OSes
- Native requires new or ported drivers; providing a DDK and driver APIs eases driver development and porting

Hosted/Split I/O Performance

[Diagram: a Service Console or trusted virtual machine hosts the backend and native device drivers; virtual machines 1-3 run frontend device drivers that communicate with the backend through the VMkernel.]

- Microkernel-style communication, with context-switch and scheduling overhead unless CPUs are dedicated
- Scalability limits in the Service Console or driver VM

Native I/O Performance
[Diagram: virtual machines 1-3 run frontend device drivers; the backend and native device drivers live inside the VMkernel.]

- Direct calls between frontend and backend
- The backend can run on any CPU: scalable

Passthrough I/O Performance
[Diagram: virtual machines 1-3 run device drivers that drive the hardware directly; the VMkernel performs interrupt routing.]

- The guest OS driver drives the device directly
- The VMkernel may have to handle and route interrupts

I/O Virtualization: Performance
                Hosted/Split   Native   Passthrough
Compatibility   Good           Poor     Good
Performance     Poor           Good     Good

- Hosted/Split incurs switching and scheduling overheads, or consumes dedicated CPUs
- Native and Passthrough are efficient and scalable
- Passthrough avoids an extra driver layer, but runs more code non-natively

I/O Virtualization: Isolation, Today
                Hosted/Split   Native   Passthrough
Compatibility   Good           Poor     Good
Performance     Poor           Good     Good
Isolation       None           None     N/A
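An I/O MMU maps and protects device DMA so that a faulty driver or malicious guest cannot scribble over arbitrary machine memory. A toy sketch of that check, as a minimal illustration only (all names are invented; real I/O MMUs such as Intel's VT-d enforce this in hardware page tables):

```python
# Toy model of I/O MMU DMA protection. A device may only DMA through
# pages explicitly mapped for it; everything else is rejected.
PAGE = 4096

class ToyIOMMU:
    def __init__(self):
        # Per-device map: guest-physical page number -> machine page number.
        self.maps = {}

    def map_page(self, device, guest_pfn, machine_pfn):
        self.maps.setdefault(device, {})[guest_pfn] = machine_pfn

    def translate_dma(self, device, guest_addr):
        """Translate a DMA target address, refusing unmapped pages."""
        pfn, offset = divmod(guest_addr, PAGE)
        mapping = self.maps.get(device, {})
        if pfn not in mapping:
            raise PermissionError(f"{device}: DMA to unmapped page {pfn:#x}")
        return mapping[pfn] * PAGE + offset

iommu = ToyIOMMU()
iommu.map_page("nic0", guest_pfn=0x10, machine_pfn=0x7F3)
assert iommu.translate_dma("nic0", 0x10 * PAGE + 0x80) == 0x7F3 * PAGE + 0x80
```

The same mechanism serves all three I/O paths: it confines what a driver (wherever it runs) can reach via DMA.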
- Passthrough would allow a malicious guest to crash the system, so it is not an option today
- All three methods need an I/O MMU to map and protect DMA

I/O Virtualization: Isolation, Future
                Hosted/Split   Native   Passthrough
Compatibility   Good           Poor     Good
Performance     Poor           Good     Good
Isolation       Good           Good     Good

- Hosted/Split and Passthrough can isolate drivers within a virtual machine, using I/O MMUs
- Native can isolate drivers within in-kernel protection domains, using VT/Pacifica and I/O MMUs
- Isolation is not a substitute for testing and qualification

I/O Virtualization: Functionality
                Hosted/Split   Native   Passthrough
Compatibility   Good           Poor     Good
Performance     Poor           Good     Good
Isolation       Good           Good     Good
Functionality   Good           Good     Poor

- Passthrough precludes offloading functionality from the guest into the virtualization layer, e.g., NIC teaming, SAN multipathing
- Passthrough sacrifices some key virtualization capabilities: VM portability, VMotion

I/O Virtualization Direction
                Hosted/Split   Native   Passthrough
Compatibility   Good           Poor     Good
Performance     Poor           Good     Good
Isolation       Good           Good     Good
Functionality   Good           Good     Poor
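One way to read the completed tradeoff table is as a small decision procedure. A sketch that encodes the ratings as data and filters the paths by a set of requirements (illustrative only, not an actual ESX Server policy):

```python
# The I/O path tradeoff table, encoded as data.
TRADEOFFS = {
    "Hosted/Split": {"compatibility": "Good", "performance": "Poor",
                     "isolation": "Good", "functionality": "Good"},
    "Native":       {"compatibility": "Poor", "performance": "Good",
                     "isolation": "Good", "functionality": "Good"},
    "Passthrough":  {"compatibility": "Good", "performance": "Good",
                     "isolation": "Good", "functionality": "Poor"},
}

def viable_paths(requirements):
    """Return the I/O paths rated 'Good' in every required dimension."""
    return [path for path, ratings in TRADEOFFS.items()
            if all(ratings[dim] == "Good" for dim in requirements)]

# Power-efficient performance plus full virtualization functionality
# leaves Native I/O (at the cost of new or ported drivers):
print(viable_paths(["performance", "functionality"]))  # ['Native']
```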
Future datacenter implications:
- Power-efficient performance favors Native and Passthrough
- Stateless servers and converged I/O interfaces mean fewer devices to support, easing compatibility

I/O Virtualization Direction
- Optimize Native I/O for selected devices, with driver isolation
- Migrate from Hosted/Split I/O to Passthrough I/O when the hardware is ready
- Hosted/Split I/O can be synthesized by proxying through a Passthrough I/O VM

[Diagram: build sequence showing the Hosted/Split path through the Service Console giving way to guests that drive devices directly via Passthrough I/O, while Native I/O with isolated drivers remains in the VMkernel.]

Hardware Support for Passthrough
To preserve key virtualization capabilities, Passthrough hardware should support:
- Device sharing: multiple virtual endpoints
- Snapshots and VMotion: save/restore of device state
- Page sharing and VMotion: demand paging
- Virtual machine portability: a standard device abstraction
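As a rough illustration of the save/restore requirement: to survive a snapshot or a VMotion, a passthrough device must be able to quiesce, serialize its state, and have that state re-created on an identical device at the destination. A hypothetical sketch of such a contract (every name here is an assumption for illustration, not a real VMware or hardware API):

```python
# Hypothetical save/restore contract for a migratable passthrough device.
from dataclasses import dataclass, field

@dataclass
class DeviceState:
    registers: dict = field(default_factory=dict)
    in_flight_requests: list = field(default_factory=list)

class MigratablePassthroughDevice:
    def __init__(self):
        self.state = DeviceState()
        self.running = True

    def quiesce(self):
        """Stop issuing new DMA and drain in-flight requests."""
        self.running = False
        self.state.in_flight_requests.clear()

    def save(self):
        """Serialize device state so the destination can resume it."""
        self.quiesce()
        return {"registers": dict(self.state.registers)}

    def restore(self, saved):
        """Re-create the saved state on an identical device at the target."""
        self.state.registers = dict(saved["registers"])
        self.running = True

src = MigratablePassthroughDevice()
src.state.registers["mac"] = "00:50:56:ab:cd:ef"
dst = MigratablePassthroughDevice()
dst.restore(src.save())
assert dst.state.registers["mac"] == "00:50:56:ab:cd:ef"
```

Without hardware cooperation on each of these steps, the virtualization layer cannot see or rebuild the device state, which is why Passthrough sacrifices VMotion today.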
There is active industry interest in hardware support for Passthrough I/O. Please contact VMware if interested.

Other I/O Virtualization Directions
- More network-based storage support: iSCSI and NAS in ESX Server 3.0
- I/O accelerators
  - Offload engines: offload guest or vmkernel I/O
  - Intel I/OAT: guest and vmkernel usage
- I/O bandwidth management: important for shared interfaces and converged I/O fabrics
- Paravirtualization
  - Reduce hardware requirements for Passthrough I/O
  - Define a standard paravirtual I/O interface

Scalable Performance
[Diagram: two builds of the ESX Server stack. In the first, the VMX processes run in the Service Console alongside the SDK, VirtualCenter agent, and third-party agents. In the second, the Service Console becomes just another virtual machine and the VMXes run directly on the VMkernel.]

Scalable Performance
[Chart: Windows 2000 boot time in seconds vs. the number of idle Windows 2000 guests, measured on an 8-CPU DL-760 with 16GB RAM, comparing ESX 2 (VMX on the Service Console) against ESX 3 (VMX on the VMkernel).]

Power Efficiency and Management
- Increasing CPU power consumption
- Increasing compute and power densities with multi-core CPUs and stateless servers
- Significant cost to power and cool a datacenter
- Limits to datacenter power and cooling capability

Server Power Management
- Dynamic power consumption varies roughly as voltage squared times frequency; with voltage scaled down alongside frequency, power falls roughly as the cube of frequency
- New hardware support for dynamically adjusting voltage/frequency
- Load-balance across minimally powered CPUs
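A back-of-the-envelope check of the cube relationship, assuming dynamic power P ~ V²f with voltage scaled in proportion to frequency (so P ~ f³); units are arbitrary:

```python
# Toy model: dynamic CPU power ~ f**3 when supply voltage scales with
# frequency (P ~ V**2 * f and V ~ f).
def power(freq):
    return freq ** 3

one_full_speed = power(1.0)             # 1.0
two_half_speed = 2 * power(0.5)         # 2 * 0.125 = 0.25
print(two_half_speed / one_full_speed)  # 0.25
```

With perfectly parallel work, two half-speed CPUs deliver the same throughput at roughly a quarter of the dynamic power.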
- With parallelism, two half-speed CPUs are more efficient than one full-speed CPU

Datacenter Power Management
- Dynamic, power-aware load-balancing across servers with VMotion
- Consider the fixed power consumption per server
- Balance powering off servers vs. throttling CPUs
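The power-off-vs-throttle balance can be sketched as a tiny optimization: each powered-on server pays a fixed cost, while cube-law dynamic power makes it cheap to run servers at partial speed. The constants and model below are illustrative assumptions, not measured data:

```python
# Sketch: pick how many servers to keep powered on for a given aggregate
# load, trading per-server fixed power against cube-law dynamic power.
FIXED_W = 100.0      # assumed fixed power per powered-on server
PEAK_DYN_W = 150.0   # assumed dynamic power of one server at full load

def cluster_power(total_load, n_servers):
    """total_load in units of fully-busy servers, spread evenly."""
    util = total_load / n_servers
    if util > 1.0:
        return float("inf")  # infeasible: servers would be overcommitted
    return n_servers * (FIXED_W + PEAK_DYN_W * util ** 3)

def best_server_count(total_load, max_servers):
    return min(range(1, max_servers + 1),
               key=lambda n: cluster_power(total_load, n))

# For an aggregate load of 2 busy servers' worth of work across a pool of
# 8, consolidating (VMotion-ing VMs away and powering servers off) wins,
# but not all the way down: the cube law favors a few throttled servers.
print(best_server_count(total_load=2.0, max_servers=8))  # 3
```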
- Need an efficient virtualization layer

Recap
- Consider the datacenter of the future
  - Stateless, power-hungry virtualized servers with many CPUs and lots of memory
  - Virtualized network “backplane”
  - Virtualized network-based storage
  - A global virtual computer
- Impact on ESX Server architecture
  - CPU virtualization: multiple VMM types
  - I/O virtualization: Native and Passthrough I/O
  - Scalable performance: relieve bottlenecks
  - Power efficiency: minimize CPU consumption

Role of Virtualization Hardware Support
- Phase 1: hardware for correctness
  - Trap or exit on unsafe code
  - Safe device access, driver isolation
- Phase 2: hardware as accelerator
  - Speed up virtualization software
  - Fast VM enter/exit, nested paging, I/O offloading
- Hardware support does not eliminate the need for virtualization software, just as hardware support does not eliminate the need for operating systems

ESX Server Architecture Today
[Diagram: today's architecture: SDK, VirtualCenter agent, and third-party agents in the Service Console; a VMX and VMM per VM; the VMkernel's virtual I/O stack with VM File System and Distributed Virtual NIC and Switch; resource management (CPU scheduling, memory scheduling, storage bandwidth, network bandwidth); storage and network stacks; device drivers; and the VMkernel hardware interface on the hardware.]

ESX Server Architecture in the Future
[Diagram: the future architecture: SDK, management, and third-party agents run against a POSIX API rather than a Service Console; a mix of monitor types (Para-VMM-32, Para-VMM-64, VT VMM-64) serves the VMs; Passthrough I/O coexists with the VMkernel's virtual I/O stack; resource management adds power management; and the device drivers/modules are isolated above the VMkernel hardware interface.]
Call to Action
- Hardware vendors
  - Build performance-focused virtualization assists
  - Build hardware for fully-functional Passthrough I/O
  - Work with VMware on Native I/O devices and drivers
- Software vendors
  - Support standard interfaces: OS, apps, management
  - VMI for transparent paravirtualization
- Datacenter architects and administrators
  - Virtualize now, and get ready for the future virtual datacenter

VMware will use relevant technology to provide the broadest, most flexible, highest-performance virtualization platform.

Backup Slides

This presentation covers potential and uncommitted future directions. Details about future releases of our products are available in select sessions at VMworld, including:
- PAC879: The Next Phase of Virtual Infrastructure: Introducing ESX Server 3.0 and VirtualCenter 2.0
- PAC177: Distributed Availability Services Architecture
- PAC484: Consolidated Backup with ESX Server: In-Depth Review
- PAC485: Managing Data Center Resources Using the VirtualCenter Distributed Resource Scheduler
- PAC532: iSCSI and NAS in ESX Server 3

Overview of ESX Server
- Mature x86 hypervisor-based virtualization, considered the best server virtualization approach
- Highlights:
  - Sophisticated resource management
  - Enterprise networking and storage support
  - Integrated VMFS for managing virtual disks
  - Broadest support for x86 OSes
  - Transparent VM migration (VMotion)

I/O Virtualization
- Full virtualization of I/O devices: a standardized virtual device set provides hardware independence and virtual machine portability
- An intermediate layer for enterprise-class functionality; examples:
  - VMFS: store and manipulate virtual disks
  - Storage multipathing: link failover
  - Virtual network switch: VLAN, NIC teaming
- How to route data from/to physical devices?

Other scalability factors
- Many CPUs per server: scalable scheduler algorithms, large SMP VMs
- Large memory and storage addressing: 64-bit vmkernel
- Many storage targets: VMFS scaling, LUN-mapping schemes
- Many servers: scalable distributed resource scheduling

Observations
- CPU and I/O virtualization: no single technique satisfies all the requirements; provide a transparent choice of the best technique for the hardware and customer needs; hardware support can help virtualization software
- Evolve towards the vision and requirements of the future datacenter: scalability and power management; stateless servers and converged I/O fabrics