With NVIDIA Virtual GPU
Virtualizing AI/DL platform: NVIDIA Virtual GPU Solution
Doyoung Kim, Sr. Solution Architect

NVIDIA Virtual GPU Solution
- Support, Updates & Maintenance
- NVIDIA Virtual GPU Software
- Tesla Datacenter GPUs

How does it work?
CPU Only VDI (stack, top to bottom):
- Apps and VMs
- Hypervisor
- Server
With NVIDIA Virtual GPU (stack, top to bottom):
- Apps and VMs, with NVIDIA Graphics Drivers
- NVIDIA Virtual GPU
- Hypervisor, with NVIDIA virtualization software
- Server, with NVIDIA Tesla GPU

Evolution of Virtual GPU
- 2013: Business User
- 2016: Designers, Architects, Engineers
- 2017: Ultra High End Simulation, Photo Realism, 3D Rendering, AI/DL
- Today: Live Migration, NGC Support

GPU Compute on Virtual GPU (stack, top to bottom)
- Optimized DL Container from NGC
- Virtual Machine
- Virtual Quadro GPU (CUDA & OpenCL enabled)
- Hypervisor, with NVIDIA Virtualization Software
- Server, with NVIDIA Tesla GPU

Virtual GPU can...
✓ Run GPU compute workloads
  - Any CUDA / OpenCL application
  - Requires NVIDIA Virtual GPU 5.0 or higher
  - Requires Pascal or later GPU
✓ Be fully integrated with virtualization solutions
  - Live migration support from Virtual GPU 6.0 or higher
  - Cluster / Host / VM / Application level performance monitoring
  - Enabled for all major solutions such as vSphere / XenServer / KVM
✓ Provide every level of AI / DL platform
  - From desktop level to multi-GPU server
  - Full support for NVIDIA GPU Cloud (NGC)
  - Multi-vGPU: up to 4 vGPUs per Virtual Machine*
* Requires Virtual GPU 7.0 or higher / RHEL KVM only

Why Virtualization?
Which is best for an AI/DL research system?
Options compared: PC with Consumer GPU / GPU Server w/o Virtualization / NVIDIA GPU Virtualization
Comparison points, in extracted order:
- Manageability: Require MGMT; Fully managed
- Resource Utilization: Require Scheduler; Flexible resource mgmt
- No Tech Support; Resource Utilization; Support by NVIDIA
- Stability
- Limited CCUs (1:1); Up to 32 Users per GPU; 1 User per 1 GPU (1:1)

GPU Recommendation
Optimized for DL training & inference. NEW!
GPU                 Tesla V100 32GB                      Tesla T4
Form Factor
Cores               5,120 CUDA Core, 640 Tensor Core     2,560 CUDA Core, 320 Tensor Core
Performance         7.8TF DP, 15.7TF SP, 125TF FP16      8.1TF SP, 64.8TF FP16
Memory Size         32GB HBM2                            16GB GDDR6
Memory Bandwidth    900GB/s                              320GB/s
GPU Peer to Peer    PCIe Gen3 / NVLink                   PCIe Gen3
Power               300W                                 70W

Configuration Example #1
Standard GPU Server with 3 types of DL workload = 16 Users
- Light DL VM: User:GPU = 8:1
- Midrange DL VM: User:GPU = 4:1
- Heavy DL VM: User:GPU = 2:1
Stack: Hypervisor with NVIDIA Virtualization Software, on a GPU Server with NVIDIA Tesla V100 (x3)

Configuration Example #2
High Density GPU Server with 4 types of DL workload = 9 Users
- DL Deployment VM: User:GPU = 1:2*
- DL Deployment VM: User:GPU = 1:4*
- Heavy DL VM: User:GPU = 2:1
- Heavy DL VM: User:GPU = 1:1
Stack: Hypervisor with NVIDIA Virtualization Software, on a GPU Server with NVIDIA Tesla V100 (x8)
* Multi-vGPU support from NVIDIA Virtual GPU 7.0 (Oct. 2018)

Performance Estimate
[Bar chart: images/sec per workload type, with consumer GPU results as reference lines]

Workload                 Images/sec
Light DL DT (8:1)        111
Mid DL DT (4:1)          282
Heavy DL DT (2:1)        621
Heavy DL DT (1:1)        1140
DL Deploy VM (2:1)       2134
DL Deploy VM (4:1)       4121

Reference lines (consumer GPUs):
- Titan V: 1181 / s
- GF 1080 Ti: 513.6 / s
- GF 1070: 292.9 / s
- GF 1050: 91.4 / s

Test conditions:
- 8x Xeon Gold 6126 vCPU / 12GB vMem / V100D-4Q, 8Q, 16Q, 32Q, 2x32Q, 4x32Q
- TensorFlow benchmark using ResNet-152, batch size 64 to 512, FP16 with Tensor Cores
- Results measured in trained images per second; average value from concurrent test runs

Before / After
Before:
- GPU Servers for Deployment
- GPU DTs for Researcher
- GPU (%) gauges per machine; Out of Service
After: Virtual GPU Cluster with Resource Manager / Monitor / Automation
- DL Deployment VMs: 1:2, 1:4
- Heavy DL VMs: 2:1, 2:1, 1:1
- Mid DL VMs: 4:1 (x4)
- Light DL VMs: 8:1 (x8)

More to come
SCALABILITY
- Multi-vGPU for all / Multi-Node
- NVLINK / GPU Direct RDMA
STABILITY
- High availability
- ECC & Page Retirement
PERFORMANCE ENHANCEMENT
- Virtual GPU for Compute
- Next gen. GPU Support

Reference
- Virtual GPU Test Drive: https://www.nvidia.com/tryvgpu
- NVIDIA Virtual GPU Website: www.nvidia.com/virtualgpu
- NVIDIA Virtual GPU YouTube Channel: http://tinyurl.com/gridvideos
- Questions? Ask on our Forums: https://gridforums.nvidia.com
- NVIDIA Virtual GPU on LinkedIn: http://linkd.in/QG4A6u
- Follow us on Twitter: @NVIDIAVirt
- Technical Question | Sales Inquiry: [email protected] [email protected]

Questions?
SEOUL | NOVEMBER 7 - 8, 2018
www.nvidia.com/ko-kr/ai-conference/
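
Worked example: the User:GPU ratios and the images/sec figures from the Performance Estimate slide can be related with a little arithmetic. The following is a minimal Python sketch, not an NVIDIA tool. It assumes that the number in each profile name (V100D-4Q, 8Q, 16Q, 32Q) is the frame buffer in GB carved out of a 32GB Tesla V100, and that the profiles pair with the workload tiers in the order the slide lists them. The function names and the normalization to "one physical GPU" are illustrative choices.

```python
# Figures transcribed from the "Performance Estimate" slide (images/sec per VM).
# ASSUMPTION: frame-buffer size is read off the profile name (V100D-4Q -> 4 GB)
# and profiles pair with workload tiers in the slide's listing order.
GPU_MEMORY_GB = 32  # Tesla V100 32GB

TIERS = [
    # (label, frame buffer GB per vGPU, vGPUs per VM, images/sec per VM)
    ("Light DL DT", 4, 1, 111),
    ("Mid DL DT", 8, 1, 282),
    ("Heavy DL DT, shared", 16, 1, 621),
    ("Heavy DL DT, dedicated", 32, 1, 1140),
    ("DL Deploy VM, 2 vGPUs", 32, 2, 2134),
    ("DL Deploy VM, 4 vGPUs", 32, 4, 4121),
]


def vgpus_per_gpu(frame_buffer_gb, gpu_memory_gb=GPU_MEMORY_GB):
    """How many vGPUs of this size fit on one physical GPU (the User:GPU ratio)."""
    return gpu_memory_gb // frame_buffer_gb


def throughput_per_gpu(frame_buffer_gb, vgpus_per_vm, images_per_sec,
                       gpu_memory_gb=GPU_MEMORY_GB):
    """Images/sec normalized to one physical GPU.

    A VM's share of physical GPUs is (vGPUs per VM) * (frame buffer / GPU memory);
    dividing the per-VM throughput by that share gives a per-GPU figure.
    """
    gpu_share = vgpus_per_vm * frame_buffer_gb / gpu_memory_gb
    return images_per_sec / gpu_share


if __name__ == "__main__":
    for label, fb_gb, n_vgpus, ips in TIERS:
        per_gpu = throughput_per_gpu(fb_gb, n_vgpus, ips)
        print(f"{label:24s} {vgpus_per_gpu(fb_gb)} vGPU(s) per GPU, "
              f"{per_gpu:7.2f} images/sec per physical GPU")
```

Under these assumptions the profile sizes reproduce the 8:1, 4:1, 2:1 and 1:1 ratios used in the configuration examples. The normalized figures are only a rough comparison aid, since the slide reports averages from concurrent runs and does not state how the GPU was scheduled between VMs.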