PAC-498B ESX Server Architectural Directions

Beng-Hong Lim, Director of R&D
This presentation may contain VMware confidential information.

Copyright © 2005 VMware, Inc. All rights reserved. All other marks and names mentioned herein may be trademarks of their respective companies.

Agenda
• ESX Server overview
• Technology trends in the datacenter
• ESX Server architectural directions
  • CPU virtualization
  • I/O virtualization
  • Scalable performance
  • Power efficiency and management
• Conclusions

ESX Server Core Virtualization

[Architecture diagram: VMs with per-VM VMX processes and Virtual Machine Monitors (VMMs), a Service Console, and the VMkernel providing the virtual I/O stack, storage stack, network stack, device drivers, and hardware interface on top of the hardware.]

ESX Server Enterprise-Class Features

[Architecture diagram: the core stack extended with third-party agents, a distributed virtual NIC and switch, the Virtual Machine File System, and enterprise-class resource management in the VMkernel: CPU scheduling, memory scheduling, storage bandwidth, and network bandwidth.]

ESX Server in the Datacenter

[Architecture diagram: management solutions and distributed virtualization services layered on ESX Server through the SDK and VirtualCenter agent: VMotion, provisioning, backup, DRS, DAS, and other VirtualCenter distributed services.]

Agenda
• Overview of ESX Server
• Technology trends in the datacenter
• ESX Server architectural directions
  • CPU virtualization
  • I/O virtualization
  • Scalable performance
  • Power efficiency and management
• Conclusions

Server Technology Trends
• Multi-core CPUs
  • 16 to 32 CPU cores per server
• 64-bit systems
  • Multi-terabytes of memory
• Power-aware architectures
  • Adaptive throttling of CPUs and server hardware
• Converged I/O fabrics and interfaces
  • Shared high-speed interface to network and storage
• Network-based, virtualized storage
• Stateless servers

The Future Datacenter

Each server:
• Many CPU cores
• 64 bits: lots of memory
• Shared, high-bandwidth connection to the network and external storage
• Stateless
• Power-hungry

[Diagram: servers connected to the network and storage.]

Virtualization is Key
• Abundant compute resources on a server
  • Virtualization is inherently scalable and parallel
  • The killer app for efficiently utilizing many CPUs and multi-terabytes of memory
• Power management increasingly important
  • Higher compute densities and rising utility costs
  • Maximize performance per watt across all servers
  • Distributed resource scheduling can optimize for this metric
• Transforms system management
  • Breaks the bond between hardware and applications
  • Ease of scale-up management in a scale-out environment

The Virtual Datacenter

• ESX Server virtualizes individual servers
• VirtualCenter synthesizes virtualized servers into a giant computer
• ESX Server and VirtualCenter map applications and virtual machine topologies onto physical resources

[Diagram: many virtualized servers composed into “A Distributed Virtual Computer”.]

Agenda
• Overview of ESX Server
• Technology trends in the datacenter
• ESX Server architectural directions
  • CPU virtualization
  • I/O virtualization
  • Scalable performance
  • Power efficiency and management
• Conclusions

CPU Virtualization

[Architecture diagram: the ESX Server stack with the per-VM VMMs highlighted as the CPU virtualization component.]

CPU Virtualization
• Basic idea: directly execute guest code until it is not safe to do so
• Handling unsafe code
  • Trap and emulate: classic mainframe approach
  • Avoid unsafe code, call into the VMM: paravirtualization
  • Dynamically transform to safe code: binary translation
• Tradeoffs among the methods (summarized in the table below)
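Not from the original deck: a minimal, hypothetical C sketch of the trap-and-emulate control flow described above. Guest "instructions" are just structs here; on real hardware a privileged instruction faults and the VMM emulates it against virtual CPU state. All names (vcpu_t, OP_SET_CR3, emulate) are illustrative, not ESX internals.

    /* Illustrative only: direct execution with a software "trap" on unsafe ops. */
    #include <stdio.h>

    typedef enum { OP_ADD, OP_SET_CR3 } opcode_t;   /* OP_SET_CR3 stands in for a privileged op */

    typedef struct { opcode_t op; int operand; } insn_t;
    typedef struct { int acc; int cr3; } vcpu_t;    /* toy virtual CPU state kept by the VMM */

    /* "Trap" handler: the VMM emulates the privileged instruction in software. */
    static void emulate(vcpu_t *vcpu, const insn_t *insn) {
        if (insn->op == OP_SET_CR3) {
            vcpu->cr3 = insn->operand;              /* update virtual state, not the real register */
            printf("trap: emulated privileged write, cr3=%d\n", vcpu->cr3);
        }
    }

    /* Dispatch loop: run guest code directly until an unsafe instruction traps. */
    static void run_guest(vcpu_t *vcpu, const insn_t *code, int n) {
        for (int i = 0; i < n; i++) {
            if (code[i].op == OP_SET_CR3)
                emulate(vcpu, &code[i]);            /* unsafe: fall back to the VMM */
            else
                vcpu->acc += code[i].operand;       /* safe: "direct execution" */
        }
    }

    int main(void) {
        insn_t guest[] = { {OP_ADD, 1}, {OP_SET_CR3, 42}, {OP_ADD, 2} };
        vcpu_t vcpu = {0, 0};
        run_guest(&vcpu, guest, 3);
        printf("acc=%d cr3=%d\n", vcpu.acc, vcpu.cr3);
        return 0;
    }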

                 Trap and Emulate   Paravirtualization   Binary Translation
Performance      Average            Excellent            Good
Compatibility    Excellent          Poor                 Excellent
Sophistication   Average            Average              High

CPU Virtualization Directions
• New technologies: 64-bit CPUs, VT/Pacifica, Linux/Windows paravirtualization; many guest OS types
• Flexible architecture supports a mix of guests and VMM types
  • Separate VMM per virtual machine
  • Simultaneously run 32-bit, 64-bit, and paravirtualized guests
  • Use the most efficient method for the hardware and guest OS

[Diagram: VMs running side by side, each on its own VMM (paravirtualized, binary translation, or VT; VMM-32 or VMM-64), on one VMkernel.]
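Purely as an illustration of the "most efficient method per guest and hardware" idea (not ESX code), the per-VM choice could be sketched as a small C selection function; every name and field below is hypothetical.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical VMM flavors and selection inputs. */
    typedef enum { VMM_BT_32, VMM_BT_64, VMM_VT_64, VMM_PARAVIRT } vmm_type_t;

    typedef struct {
        bool guest_is_64bit;
        bool guest_supports_paravirt;   /* e.g., a paravirtualized Linux/Windows kernel */
        bool cpu_has_vt;                /* VT/Pacifica available on this host */
    } guest_info_t;

    /* Pick a VMM per virtual machine, so mixed guest types can run side by side. */
    static vmm_type_t choose_vmm(const guest_info_t *g) {
        if (g->guest_supports_paravirt)
            return VMM_PARAVIRT;                    /* fastest when the guest cooperates */
        if (g->guest_is_64bit)
            return g->cpu_has_vt ? VMM_VT_64 : VMM_BT_64;
        return VMM_BT_32;                           /* 32-bit guests: binary translation */
    }

    int main(void) {
        guest_info_t linux64 = { true, false, true };
        printf("selected VMM type: %d\n", choose_vmm(&linux64));
        return 0;
    }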

Agenda
• Overview of ESX Server
• Technology trends in the datacenter
• ESX Server architectural directions
  • CPU virtualization
  • I/O virtualization
  • Scalable performance
  • Power efficiency and management
• Conclusions

I/O Virtualization

[Architecture diagram: the ESX Server stack with the I/O virtualization components highlighted: virtual I/O stack, distributed virtual NIC and switch, Virtual Machine File System, storage and network stacks, and device drivers in the VMkernel.]

I/O Virtualization Paths

[Diagram: a guest OS above the VMX device emulation, Service Console, VMM, and VMkernel I/O stacks, with the three paths to the physical device marked.]

Paths to the physical device:
1. Hosted/Split I/O: via a separate host/VM
2. Native I/O: via the VMkernel
3. Passthrough I/O: the guest directly drives the device; needs hardware support, sacrifices functionality
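Not part of the original slides: a small, illustrative C sketch expressing the three paths above as a per-virtual-device dispatch choice. The names and fields are hypothetical, not ESX interfaces.

    #include <stdio.h>

    typedef enum { IO_HOSTED_SPLIT, IO_NATIVE, IO_PASSTHROUGH } io_path_t;

    typedef struct {
        const char *name;
        io_path_t   path;
    } vdevice_t;

    static void issue_io(const vdevice_t *dev) {
        switch (dev->path) {
        case IO_HOSTED_SPLIT:
            /* Forward the request to a driver running in a separate host/driver VM. */
            printf("%s: queue request to driver VM / Service Console\n", dev->name);
            break;
        case IO_NATIVE:
            /* Call into a VMkernel-native driver. */
            printf("%s: call VMkernel native driver\n", dev->name);
            break;
        case IO_PASSTHROUGH:
            /* The guest driver programs the device itself; the VMkernel mostly routes interrupts. */
            printf("%s: guest drives hardware directly\n", dev->name);
            break;
        }
    }

    int main(void) {
        vdevice_t nic = { "vmnic0", IO_NATIVE };
        issue_io(&nic);
        return 0;
    }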

Which I/O path to use?

Evaluating the I/O paths
• Compatibility
  • Hardware vendors can re-use existing device drivers
• Performance (per watt)
  • High I/O performance, low CPU occupancy
• Isolation
  • Contain device driver faults
• Virtualization functionality
  • Virtual machine portability
  • Resource sharing and multiplexing
  • Offloading guest functionality into the virtualization layer

I/O Virtualization: Compatibility

               Hosted/Split   Native   Passthrough
Compatibility  Good           Poor     Good
Performance    —              —        —
Isolation      —              —        —
Functionality  —              —        —

• Hosted/Split and Passthrough can re-use device drivers from existing OSes
• Native requires new or ported drivers; provide a DDK and driver to ease driver development and porting

Hosted/Split I/O Performance

[Diagram: frontend device drivers in each virtual machine talk to a backend driver and native device drivers in a Service Console or trusted driver VM, above the VMkernel.]

• …-style communication, and scheduling overhead unless CPUs are dedicated
• Scalability limits to the Service Console or driver VM

Native I/O Performance

[Diagram: frontend device drivers in each virtual machine call into a backend and native device drivers in the VMkernel.]

• Direct calls between frontend and backend
• Backend can run on any CPU; scalable

Passthrough I/O Performance

[Diagram: each virtual machine's device driver programs the hardware directly; the VMkernel handles interrupt routing.]

• Guest OS driver drives the device directly
• VMkernel may have to handle/route interrupts

I/O Virtualization: Performance

               Hosted/Split   Native   Passthrough
Compatibility  Good           Poor     Good
Performance    Poor           Good     Good
Isolation      —              —        —
Functionality  —              —        —

• Hosted/Split incurs switching and scheduling overheads, or consumes dedicated CPUs
• Native and Passthrough are efficient and scalable
• Passthrough avoids an extra driver layer, but runs more code non-natively

I/O Virtualization: Isolation, Today

               Hosted/Split   Native   Passthrough
Compatibility  Good           Poor     Good
Performance    Poor           Good     Good
Isolation      None           None     N/A
Functionality  —              —        —

• Passthrough allows a malicious guest to crash the system, so it is not an option today
• All three methods need an I/O MMU to map and protect DMA

I/O Virtualization: Isolation, Future

               Hosted/Split   Native   Passthrough
Compatibility  Good           Poor     Good
Performance    Poor           Good     Good
Isolation      Good           Good     Good
Functionality  —              —        —

• Hosted/Split and Passthrough can isolate within a virtual machine, using I/O MMUs
• Native can isolate within in-kernel protection domains, using VT/Pacifica and I/O MMUs
• Not a substitute for testing and qualification

I/O Virtualization: Functionality

               Hosted/Split   Native   Passthrough
Compatibility  Good           Poor     Good
Performance    Poor           Good     Good
Isolation      Good           Good     Good
Functionality  Good           Good     Poor

• Passthrough precludes offloading functionality from the guest into the virtualization layer, e.g., NIC teaming, SAN multipathing
• Passthrough sacrifices some key virtualization capabilities: VM portability, VMotion

I/O Virtualization Direction

               Hosted/Split   Native   Passthrough
Compatibility  Good           Poor     Good
Performance    Poor           Good     Good
Isolation      Good           Good     Good
Functionality  Good           Good     Poor

• Future datacenter implications
  • Power-efficient performance favors Native and Passthrough
  • Stateless servers and converged I/O interfaces: fewer devices to support, eases compatibility

I/O Virtualization Direction

[Diagram: today's paths: the guest OS reaches devices through VMX device emulation and the Service Console I/O stack (Hosted/Split) or through the VMkernel I/O stack and device drivers (Native).]

• Optimize Native I/O for selected devices, driver isolation

I/O Virtualization Direction

[Diagram: the guest OS device driver above a VMM with device emulation, and an I/O stack and device driver in the VMkernel.]

• Optimize Native I/O for selected devices, driver isolation
• Migrate from Hosted/Split I/O to Passthrough I/O when hardware is ready

I/O Virtualization Direction

[Diagram: two guest OSes: one reaches the device through the VMM and VMkernel, the other drives it directly via Passthrough I/O.]

• Optimize Native I/O for selected devices, driver isolation
• Migrate from Hosted/Split I/O to Passthrough I/O when hardware is ready
• Can synthesize Hosted/Split I/O by proxying through a Passthrough I/O VM

Hardware Support for Passthrough

• To preserve key virtualization capabilities, Passthrough hardware needs to support (a hypothetical interface sketch follows):
  • Device sharing: multiple virtual end points
  • Snapshots and VMotion: save/restore of device state
  • Page sharing, VMotion: demand paging
  • Virtual machine portability: a standard device abstraction
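Purely as an illustration of the capability list above, the sketch below writes it as a hypothetical C interface for a passthrough-capable device. None of these names correspond to a real VMware or vendor API.

    /* Hypothetical, illustrative interface; each operation maps to a capability above. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct pt_device pt_device;        /* opaque physical device handle */
    typedef struct pt_vf     pt_vf;            /* one virtual end point of the device */

    typedef struct pt_device_ops {
        /* Device sharing: carve out one virtual end point per VM. */
        pt_vf *(*create_virtual_endpoint)(pt_device *dev, int vm_id);

        /* Snapshots and VMotion: save/restore the end point's state. */
        int (*save_state)(pt_vf *vf, void *buf, size_t len);
        int (*restore_state)(pt_vf *vf, const void *buf, size_t len);

        /* Page sharing and VMotion: tolerate faults on guest memory (demand paging),
         * e.g., by pausing or retrying DMA while a page is brought back. */
        int (*handle_dma_fault)(pt_vf *vf, uint64_t guest_addr);

        /* VM portability: report conformance to a standard device abstraction. */
        int (*get_standard_class)(pt_device *dev);
    } pt_device_ops;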

• Active industry interest in hardware support for Passthrough I/O; please contact VMware if interested

Other I/O Virtualization Directions
• More network-based storage support
  • iSCSI and NAS in ESX Server 3.0
• I/O accelerators
  • Offload engines: offload guest or vmkernel I/O
  • Intel I/OAT: guest and vmkernel usage
• I/O bandwidth management
  • Important for shared interfaces, converged I/O fabrics
• Paravirtualization
  • Reduce hardware requirements for Passthrough I/O
  • Define a standard paravirtual I/O interface

Agenda
• Overview of ESX Server
• Technology trends in the datacenter
• ESX Server architectural directions
  • CPU virtualization
  • I/O virtualization
  • Scalable performance
  • Power efficiency and management
• Conclusions

Scalable Performance

[Architecture diagram: the ESX Server stack with the Service Console highlighted, hosting the VMX processes, the SDK and VirtualCenter agent, and third-party agents.]

Scalable Performance

[Architecture diagram: the same stack with the Service Console running as a virtual machine and the VMX processes moved out of it.]

Scalable Performance

[Chart: Windows 2000 boot time (seconds) vs. number of idle Windows 2000 guests on an 8-CPU DL-760 with 16 GB RAM, comparing ESX 2 (VMX on the Service Console) with ESX 3 (VMX on the VMkernel).]

Agenda
• Overview of ESX Server
• Technology trends in the datacenter
• ESX Server architectural directions
  • CPU virtualization
  • I/O virtualization
  • Scalable performance
  • Power efficiency and management
• Conclusions

Power Efficiency and Management

• Increasing CPU power consumption
• Increasing compute and power densities with multi-core CPUs and stateless servers
• Significant cost to power and cool a datacenter
• Limits to datacenter power and cooling capability

Server Power Management

• Power consumption varies as the cube of voltage × frequency
• New hardware support for dynamically adjusting voltage/frequency
• Load-balance across minimally powered CPUs

[Chart: conventional throttling vs. voltage/frequency scaling.]

With parallelism, two half-speed CPUs are more efficient than one full-speed CPU (a worked example follows).
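A back-of-the-envelope justification (standard CMOS dynamic-power reasoning, not from the slides):

    % Dynamic power with voltage scaled together with frequency:
    \[
      P \;\propto\; C\,V^{2} f, \qquad V \propto f \;\Rightarrow\; P \propto f^{3}
    \]
    % One CPU at full speed vs. two CPUs at half speed, for the same total throughput:
    \[
      P_{\text{1 CPU, full speed}} \propto 1^{3} = 1,
      \qquad
      P_{\text{2 CPUs, half speed}} \propto 2 \times \left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{4}
    \]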

Datacenter Power Management
• Dynamic, power-aware load-balancing across servers with VMotion
• Consider fixed power consumption per server
• Balance powering off servers vs. throttling CPUs (a toy consolidation sketch follows)
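Not from the deck: a toy C sketch (not DRS) of the "power off vs. throttle" trade-off above. Each powered-on server pays a fixed idle cost, so if the load on one host fits on the others, evacuating it with VMotion and powering it off can beat throttling. All constants and names are invented for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    #define NHOSTS 3

    typedef struct {
        double load;       /* aggregate VM CPU demand on this host */
        double capacity;   /* CPU capacity of this host */
        bool   powered_on;
    } host_t;

    static const double IDLE_WATTS = 150.0;  /* assumed fixed cost per powered-on server */

    /* Try to evacuate host `victim` onto the other powered-on hosts (first fit). */
    static bool try_power_off(host_t h[], int n, int victim) {
        double remaining = h[victim].load;
        for (int i = 0; i < n && remaining > 0.0; i++) {
            if (i == victim || !h[i].powered_on) continue;
            double free  = h[i].capacity - h[i].load;
            double moved = free < remaining ? free : remaining;
            h[i].load += moved;              /* "VMotion" some load over */
            remaining -= moved;
        }
        if (remaining > 0.0) return false;   /* does not fit: keep host on, throttle instead */
        h[victim].load = 0.0;
        h[victim].powered_on = false;        /* evacuated: power it off */
        return true;
    }

    int main(void) {
        host_t hosts[NHOSTS] = { {0.6, 1.0, true}, {0.5, 1.0, true}, {0.3, 1.0, true} };
        if (try_power_off(hosts, NHOSTS, 2))
            printf("powered off host 2, saving ~%.0f W of idle power\n", IDLE_WATTS);
        else
            printf("host 2 stays on; rely on CPU throttling instead\n");
        return 0;
    }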

Need an efficient virtualization layer

Agenda
• Overview of ESX Server
• Technology trends in the datacenter
• ESX Server architectural directions
  • CPU virtualization
  • I/O virtualization
  • Scalable performance
  • Power efficiency and management
• Conclusions

Recap
• Consider the datacenter of the future
  • Stateless, power-hungry virtualized servers with many CPUs, lots of memory
  • Virtualized network “backplane”
  • Virtualized network-based storage
  • A global virtual computer
• Impact on ESX Server architecture
  • CPU virtualization: multiple VMM types
  • I/O virtualization: Native and Passthrough I/O
  • Scalable performance: relieve bottlenecks
  • Power efficiency: minimize CPU consumption

Role of Virtualization Hardware Support
• Phase 1: Hardware for correctness
  • Trap or exit on unsafe code
  • Safe device access, driver isolation
• Phase 2: Hardware as accelerator
  • Speed up virtualization software
  • Fast VM enter/exit, nested paging, I/O offloading

• Does not eliminate the need for virtualization software, just as hardware support does not eliminate the need for operating systems

ESX Server Architecture Today

[Architecture diagram: today's architecture: VMX processes, SDK and VirtualCenter agent, and third-party agents on the Service Console; per-VM VMMs; distributed virtual NIC and switch, Virtual Machine File System, resource management (CPU scheduling, memory scheduling, storage bandwidth, network bandwidth), storage and network stacks, and device drivers in the VMkernel.]

ESX Server Architecture in the Future

[Architecture diagram: the future architecture: SDK, third-party, and management agents run in VMs against a POSIX API instead of the Service Console; paravirtualized, VT, 32-bit, and 64-bit VMMs run side by side; the VMkernel adds Passthrough I/O, power management, and isolated device drivers/modules alongside resource management, the distributed virtual NIC and switch, the Virtual Machine File System, and the storage and network stacks.]

Call to Action
• Hardware vendors
  • Build performance-focused virtualization assists
  • Build hardware for fully functional Passthrough I/O
  • Work with VMware on Native I/O devices and drivers
• Software vendors
  • Support standard interfaces: OS, apps, management
  • VMI for transparent paravirtualization
• Datacenter architects and administrators
  • Virtualize now, and get ready for the future virtual datacenter

VMware will use relevant technology to provide the broadest, most flexible, highest-performance virtualization platform.

Backup Slides

This presentation covers potential and uncommitted future directions. Details about future releases of our products are available in select sessions at VMworld, including:

• PAC879: The Next Phase of Virtual Infrastructure: Introducing ESX Server 3.0 and VirtualCenter 2.0
• PAC177: Distributed Availability Services Architecture
• PAC484: Consolidated Backup with ESX Server: In-Depth Review
• PAC485: Managing Data Center Resources Using the VirtualCenter Distributed Resource Scheduler
• PAC532: iSCSI and NAS in ESX Server 3

Overview of ESX Server
• Mature x86 hypervisor-based virtualization
  • Considered best server virtualization approach
• Highlights:
  • Sophisticated resource management
  • Enterprise networking and storage support
  • Integrated VMFS for managing virtual disks
  • Broadest support for x86 OSes
  • Transparent VM migration (VMotion)

I/O Virtualization
• Full virtualization of I/O devices
  • Standardized virtual device set provides hardware independence, virtual machine portability
• Intermediate layer for enterprise-class functionality. Examples:
  • VMFS: store and manipulate virtual disks
  • Storage multipathing: link failover
  • Virtual network switch: VLAN, NIC teaming
• How to route data from/to physical devices?

Other scalability factors
• Many CPUs per server
  • Scalable scheduler algorithms, large SMP VMs
• Large memory and storage addressing
  • 64-bit vmkernel
• Many storage targets
  • VMFS scaling
  • LUN-mapping schemes
• Many servers
  • Scalable distributed resource scheduling

Observations
• CPU and I/O virtualization
  • No single technique satisfies all the requirements
  • Provide a transparent choice of the best technique for the hardware and customer needs
  • Hardware support can help virtualization software
• Evolve towards the vision and requirements of the future datacenter
  • Scalability and power management
  • Stateless servers and converged I/O fabrics