Delivering Transformational User Experience with Blast Extreme Adaptive Transport and NVIDIA GRID
Kiran Rao – Director, Product Management at VMware
Luke Wignall – Sr. Manager, Performance Engineering at NVIDIA
© 2014 VMware Inc. All rights reserved.

Challenges for Virtual Graphics
Professional graphics workloads require great user experience in both LAN & WAN environments.
• UX: Require “snappy” experience
• Rely on heavy encoding and decoding
• User density is limited by CPU bottleneck

VMware Horizon Gets Even Better with NVIDIA GRID
[Architecture diagram: NVIDIA GRID Virtual PC and NVIDIA GRID Virtual Workstation VMs, each with an NVIDIA graphics driver (Quadro driver for the workstation) and a vGPU; NVIDIA GRID vGPU manager in vSphere with scheduling of 3D, CE, NVENC, NVDEC; NVIDIA Tesla GPU in the server; H.264 encode; NVIDIA GRID management tools (NVSMI, NVML)]

Blast Extreme Protocol

Blast Extreme: Unified Protocol for All VMware Products
• A new VMware controlled protocol for a richer app & desktop experience
• Protocol optimized for mobile and overall lower client TCO
• Horizon remote experience features work with Blast Extreme and updated Horizon clients
• Performance on par with or exceeding all competitive protocols
• Rapid client proliferation from strong Horizon Client ecosystem
[Timeline: 2013, 2015, 2016, 2017 (BEAT)]

Blast Extreme Designed for All Use Cases
Windows VDI, RDSH Apps/Desktop & Linux VDI
Feature-Rich User Experience
• Hosted Apps & RDS Desktops
• Printing
• Scanning & Imaging Devices
• Smart Card
• USB Redirection
• Audio In/Out
• Client Drive Redirection
• Windows Media
• File Type Association
• Unified Communications
• Webcams
SDKs
• Session Enhancement SDK
• RDP VC Bridge SDK
Horizon Clients / Broadest Support For Every Use Case
• Windows, Linux, HTML, Mac, iOS, Android, Chrome OS, Thin Clients (Blast Only)

Blast Extreme – A Year Of Progress
• GRID Optimized: Superior 3D App Experience
• Broad Client Support: 125+ thin/zero clients certified
• Use Less Bandwidth: Up to 50% reduction
• Network Friendly: Blast Extreme Adaptive Transport
• WAN Optimized: Faster access across WAN
• Monitoring: More insight and control

Tesla Lineup for GRID
The most powerful data center GPUs targeted at graphics virtualization

                        M10                                 M6                        M60
GPU                     Quad Mid-level Maxwell              Single High-end Maxwell   Dual High-end Maxwell
CUDA Cores              2560 (640 per GPU)                  1536                      4096 (2048 per GPU)
Memory Size             32 GB GDDR5 (8 GB per GPU)          8 GB GDDR5                16 GB GDDR5 (8 GB per GPU)
H.264 1080p30 streams   28                                  18                        36
Max vGPU instances      64                                  16                        32
Form Factor             PCIe 3.0 Dual Slot (rack servers)   MXM (blade servers)       PCIe 3.0 Dual Slot (rack servers)
Power                   225W                                100W (75W opt)            240W / 300W (225W opt)
Thermal                 passive                             bare board                active / passive
Optimized for           User density                        Blade                     Performance

NVIDIA Blast Extreme Acceleration
• Reduces overall latency
• Offloads CPU workload to GPU
• Increases scalability
• Improves user experience
• Lowers network bandwidth demand
• Supported with single and multiple monitor use cases
[Diagram: apps send graphics commands to the GRID GPU (3D, render target, front buffer, framebuffer); context/display capture feeds the hardware encoder, which sends H.264 streams to the remote client]

Blast Extreme Adaptive Transport

Blast Extreme Adaptive Transport: BEAT
Maintain a Great User Experience Across a Wide Variety of Network Types:
• Corporate LAN
• Public Wi-Fi
• Mobile networks
Dynamically Adjust to:
• Varying speeds
• Severe packet loss

Blast Extreme Adaptive Protocol Settings
• Excellent
  – TCP Only
  – Ideal for Corporate LAN
• Typical (Default)
  – UDP for protocol transport, TCP for control and broker communications
  – Falls back to TCP if UDP connection is blocked
  – Ideal for most situations.
    Dynamically adjusts for packet loss and jitter
• Poor
  – UDP Only
  – For protocol transport, control and broker communications
  – Requires VMware Unified Access Gateway 2.9 or later
  – Necessary when network conditions are so poor that a broker connection cannot be made, typically at greater than 20% packet loss

Blast Extreme Adaptive Transport – by the Numbers
Improvements
• Over 4x faster file transfers for cross-continental connections: ~100 ms with slight packet loss (1%)
• Over 6x faster file transfers for trans-continental connections: ~200 ms with slight packet loss (1%)
• Up to 50% bandwidth reduction out-of-box over previous versions
Delivers with Challenging Networks
• Over 13x higher average frame rates under extremely poor network conditions: low bandwidth, high latency, significant packet loss (1.5 Mbps, 200 ms, 20%)
• Over 57% higher average frame rates with high latency and slight packet loss: low bandwidth, high latency, slight packet loss (10 Mbps, 200 ms, 1%)
• Over 70% higher average frame rates under poor network conditions: low bandwidth, high latency, and medium packet loss (1.5 Mbps, 200 ms, 5%)
• Over 2x faster file transfers under extreme network conditions: low bandwidth, high latency, significant packet loss (1.5 Mbps, 200 ms, 20%)

Windows 10 requires more resources for an improved User Experience
[Chart: "Windows 10 requires more CPU cycles". Host CPU utilization (%) over time, Windows 7 vs. Windows 10; annotation: 15% more CPU utilization]
[Chart: "Windows 10 requires more GPU frame buffer". Bars for Windows 7 (single 1920x1080), Windows 10 (single 1920x1080), Windows 10 (single 2560x1600), Windows 10 (dual 1920x1080)]
64 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload

Bandwidth: Office Worker Use Case – Single Display 1080p

Bandwidth: Office Worker Use Case – Dual Display 1080p

Host CPU offloading
Blast Extreme decreases CPU utilization on the host by up to 42%
[Chart: host CPU utilization and total sum, lower is better, for NOGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NOGPU-Blast-H.264 CPU, GPU-BLAST-H.264 CPU, GPU-BLAST-NVENC]
63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency

CPU only vs. NVIDIA GRID
GPU with NVENC provides an average positive increase to UX of 34%
User Experience Scale (higher is better)
  1 Unacceptable, unusable - fire someone in IT!
  2 Barely usable, borderline, but I’ll get tired of this soon
  3 Tolerable, I guess I can make do
  4 Pretty good for a virtual desktop
  5 Outstanding - as good (or almost) as physical
[Chart: per-test UX scores for "Horizon 7 with PCoIP - No GPU" vs. "Horizon 7 with Blast Extreme and H.264 HW"; gain labels: +55%, +26%, +13%, +13%, +20%, +5%, +6%, +21%, +30%, +133%, +68%, +19%, +65%, +9%]
Testing ran on two identical systems. The CPU system was loaded up to 60-80% utilization; the GPU system ran the same workload.

CLICK-TO-PHOTON CAPTURES THE OVERALL LATENCY
CLICK TO PHOTON SIMPLIFIED
• Access device: mouse button released, mouse click processed, packetized and encoded, packet transmitted
• Network latency on the WAN (e.g. 50 ms)
• Server: packet received, packet decoded, mouse click processed, application, new frame rendered, frame captured via NVIDIA NVFBC, frame encoded via NVIDIA NVENC, frame transmitted
• Network latency on the WAN (e.g. 50 ms)
• Access device: packet received, packet decoded, frame displayed

CLICK to PHOTON Latency
Comparing latency of single VM and at scale (80%) at <1 ms network latency
[Chart: latency in ms, lower is better; series "Idle, 1 VM" and "Scale, 64 VMs"; configurations: Local PC with Integrated GPU; Blast Extreme No GPU - JPEG/PNG; Blast Extreme M10-1B - JPEG/PNG; Blast Extreme No GPU - H.264 Software; Blast Extreme M10-1B - H.264 Software; Blast Extreme M10-1B - H.264 Hardware. Data labels in extraction order (bar assignment not recoverable): 250, 240, 170, 160, 185, 165, 110, 155, 125, 65, 107]
63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency

Blast Extreme Adaptive Transport User Experience: FPS in LAN & WAN Environments
[Charts: three panels comparing BEAT and PCoIP, y-axis 0-30 FPS. Panel titles: "FPS: LAN, 0% PL, 0 ms RTT"; "FPS: 10 Mbps, 200 ms RTT"; "FPS: 1.5 Mbps, 200 ms RTT". X-axis labels: bandwidth (100, 200, 300 Mbps) and packet loss (1%, 5%, 10%, 20% PL)]

Blast Extreme Adaptive Transport - Demo

Instant Clone Support for NVIDIA GRID vGPU Desktops
Overview
• Option to provision vGPU desktops using Instant Clones
• Admin to select profile during pool creation.
  Note: only 1 profile supported per ESX cluster
• Compatible with NVIDIA GRID M Series
Benefits
• Broader set of use cases – no longer have to choose between better provisioning and better 3D graphics
• 2x faster provisioning than View Composer

High Availability with vSphere 6.0 U3/6.5
• vSphere HA enables high availability of VMs in case of server HW failure, by immediately detecting the failure (and thus VM crash) and restarting the VM on another host in the cluster
• vSphere HA is now extended to NVIDIA GRID vGPU-backed VMs with vSphere 6.5/6.0 U3
• If a vGPU server fails with vSphere HA, the VMs will automatically be started on another vGPU-enabled host in the same cluster
[Diagram label: App Volumes]

Sizing Your Virtual Desktop for High Performance Graphics

High Performance VDI is NOT Your Mother’s VDI
CPU, Memory, IOPS, Rich Graphics, Density

Design First for User Experience
Ensure the right level of performance and then use that to determine density
• User experience must be the equivalent of what they are used to today
• Determine how many users you can put on a host based on requirements of users

One Size Does Not Fit All
Different users have different requirements
User requirements
• Performance
• Mobility
vCPU
• At least 2-8 vCPUs
• Recommend at least 4 vCPUs for Power Users
vRAM
• 4GB – Power User
• 8GB – Mid Eng./Video
• 16GB – Advanced Eng.
• 32GB – CAD/CAM
vGPU
• Quadro features
• 512MB-1GB – Office user
• 1-2GB – Power user
• 2-4GB – Designer/engineer
OS
• vGPU
  – 64 bit Windows
  – 64 bit Linux
• Install
  – VM Tools
  – View Agent
  – NVIDIA driver (vGPU)
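
The "determine how many users you can put on a host" step can be sketched as simple frame buffer arithmetic. The Python below is a minimal illustration, not a sizing tool: it assumes frame buffer is the only constraint (CPU, RAM, IOPS and encoder capacity are ignored), that every VM on a board uses the same profile, and that a vGPU cannot span physical GPUs. GPU counts and memory per GPU come from the Tesla lineup in this deck; the names `Board`, `vgpus_per_board` and `users_per_host` are hypothetical, and the two-board host and the 1 GB size for the M10-1B profile are assumptions, since the deck does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    """One physical Tesla board, as listed in the deck's lineup table."""
    name: str
    gpus: int             # physical GPUs on the board
    mem_per_gpu_mb: int   # frame buffer per physical GPU, in MB


# Figures taken from the lineup table (8 GB GDDR5 per GPU on all three boards).
BOARDS = {
    "M10": Board("Tesla M10", gpus=4, mem_per_gpu_mb=8192),
    "M6": Board("Tesla M6", gpus=1, mem_per_gpu_mb=8192),
    "M60": Board("Tesla M60", gpus=2, mem_per_gpu_mb=8192),
}


def vgpus_per_board(board: Board, profile_mb: int) -> int:
    """Frame-buffer-limited vGPU count for one board.

    Assumes a vGPU must fit entirely inside one physical GPU, so the
    division is done per GPU and then multiplied by the GPU count.
    """
    if profile_mb <= 0:
        raise ValueError("profile_mb must be positive")
    return board.gpus * (board.mem_per_gpu_mb // profile_mb)


def users_per_host(board: Board, boards_per_host: int, profile_mb: int) -> int:
    """Upper bound on vGPU-backed VMs per host (frame buffer only)."""
    if boards_per_host < 0:
        raise ValueError("boards_per_host must not be negative")
    return boards_per_host * vgpus_per_board(board, profile_mb)


if __name__ == "__main__":
    # With a 512 MB profile, the results match the table's "Max vGPU instances" row.
    for key in ("M10", "M6", "M60"):
        print(key, vgpus_per_board(BOARDS[key], 512))   # 64, 16, 32

    # Assumed: two M10 boards per host, 1 GB (1024 MB) per M10-1B VM.
    print(users_per_host(BOARDS["M10"], boards_per_host=2, profile_mb=1024))  # 64
```

Under these assumptions, two M10 boards with a 1 GB profile give 64 VMs, which is consistent with the "64 x Tesla M10-1B VMs on a host" test setup cited earlier. In practice the deck's guidance still applies: validate user experience first, then treat a number like this as a ceiling rather than a target.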