Virtualizing AI/DL Platform: NVIDIA Virtual GPU Solution
Doyoung Kim, Sr. Solution Architect

NVIDIA Virtual GPU Solution
Support, Updates & Maintenance
NVIDIA Virtual GPU Software
Tesla Datacenter GPUs

How does it work?
CPU-only VDI (stack, top to bottom):
- Apps and VMs
- Hypervisor
- Server

With NVIDIA Virtual GPU (stack, top to bottom):
- Apps and VMs
- NVIDIA Graphics Drivers
- NVIDIA Virtual GPU
- NVIDIA virtualization software
- Hypervisor
- NVIDIA Tesla GPU
- Server

Evolution of Virtual GPU
Timeline: 2013, 2016, 2017, Today
User segments and capabilities shown along the timeline:
- Business User
- Designers, Architects, Engineers
- Ultra High End Simulation, Photo Realism, 3D Rendering, AI/DL
- Live Migration, NGC Support

GPU Compute on Virtual GPU
Stack (top to bottom):
- Optimized DL Container from NGC
- Virtual Machine
- Virtual Quadro GPU (CUDA & OpenCL enabled)
- NVIDIA Virtualization Software
- Hypervisor
- NVIDIA Tesla GPU
- Server

Virtual GPU can...
✓ Run GPU compute workloads
  - Any CUDA / OpenCL application
  - Requires NVIDIA Virtual GPU 5.0 or higher
  - Requires a Pascal or later GPU
✓ Be fully integrated with the virtualization solution
  - Live migration support from Virtual GPU 6.0 or higher
  - Supports cluster / host / VM / application level performance monitoring
  - Enabled for all major solutions such as vSphere / XenServer / KVM
✓ Provide every level of AI / DL platform
  - From desktop level to multi-GPU server
  - Fully supports NVIDIA GPU Cloud (NGC)
  - Supports up to 4 vGPUs per virtual machine (multi-vGPU)*
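The version and hardware thresholds in the list above can be written down as a small lookup. The following is only a sketch under stated assumptions: the `Deployment` type, the feature labels, and the `supported_features` helper are hypothetical names made up for illustration, not part of any NVIDIA tool. It encodes only the conditions stated on the slide (including the multi-vGPU footnote: Virtual GPU 7.0 or higher, RHEL KVM only). A real support matrix has more conditions than these.

```python
from dataclasses import dataclass
from typing import List, Tuple

# GPU architectures in release order, so "Pascal or later" becomes an index check.
ARCHITECTURES = ["Kepler", "Maxwell", "Pascal", "Volta", "Turing"]


@dataclass
class Deployment:
    """Hypothetical description of one vGPU deployment (illustration only)."""
    vgpu_version: Tuple[int, int]  # (major, minor), e.g. (7, 0)
    architecture: str              # one of ARCHITECTURES
    hypervisor: str                # e.g. "vSphere", "XenServer", "RHEL KVM"


def supported_features(d: Deployment) -> List[str]:
    """Return the slide's features whose stated thresholds are met.

    Raises ValueError if the architecture is not in ARCHITECTURES.
    """
    pascal_or_later = (
        ARCHITECTURES.index(d.architecture) >= ARCHITECTURES.index("Pascal")
    )
    features = []
    # GPU compute: Virtual GPU 5.0 or higher and a Pascal or later GPU.
    if d.vgpu_version >= (5, 0) and pascal_or_later:
        features.append("gpu_compute")
    # Live migration: Virtual GPU 6.0 or higher (only condition on the slide).
    if d.vgpu_version >= (6, 0):
        features.append("live_migration")
    # Multi-vGPU: Virtual GPU 7.0 or higher, RHEL KVM only.
    if d.vgpu_version >= (7, 0) and d.hypervisor == "RHEL KVM":
        features.append("multi_vgpu")
    return features
```

For example, `supported_features(Deployment((6, 0), "Volta", "vSphere"))` reports GPU compute and live migration but not multi-vGPU, because both the version and the hypervisor conditions for multi-vGPU fail.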
* Requires Virtual GPU 7.0 or higher / RHEL KVM only

Why Virtualization?
Which is best for an AI/DL research system?
PC with Consumer GPU
- Manageability
- Resource Utilization
- No Tech Support
- Stability
- 1 User per 1 GPU (1:1)

GPU Server w/o Virtualization
- Requires management
- Requires a scheduler
- Resource Utilization
- Limited CCUs (1:1)

NVIDIA GPU Virtualization
- Fully managed
- Flexible resource management
- Support by NVIDIA
- Up to 32 Users per GPU

GPU Recommendation
Optimized for DL training & inference
GPU                 Tesla V100 32GB                       Tesla T4 (NEW!)
Form Factor         (image)                               (image)
Cores               5,120 CUDA cores, 640 Tensor cores    2,560 CUDA cores, 320 Tensor cores
Performance         7.8TF DP, 15.7TF SP, 125TF FP16       8.1TF SP, 64.8TF FP16
Memory Size         32GB HBM2                             16GB GDDR6
Memory Bandwidth    900GB/s                               320GB/s
GPU Peer to Peer    PCIe Gen3 / NVLink                    PCIe Gen3
Power               300W                                  70W

Configuration Example #1
Standard GPU Server with 3 types of DL workload = 16 Users
VM types:
- Light DL VM (User:GPU = 8:1)
- Midrange DL VM (User:GPU = 4:1)
- Heavy DL VM (User:GPU = 2:1)

Stack (top to bottom):
- Hypervisor
- NVIDIA Virtualization Software
- NVIDIA Tesla V100 (x3)
- GPU Server

Configuration Example #2
High Density GPU Server with 4 types of DL workload = 9 Users
VM types:
- DL Deployment VM (User:GPU = 1:2*)
- DL Deployment VM (User:GPU = 1:4*)
- Heavy DL VM (User:GPU = 2:1)
- Heavy DL VM (User:GPU = 1:1)

Stack (top to bottom):
- Hypervisor
- NVIDIA Virtualization Software
- NVIDIA Tesla V100 (x8)
- GPU Server

* Multi-vGPU support from NVIDIA Virtual GPU 7.0 (Oct. 2018)

Performance Estimate
[Chart: training throughput in images/sec (y-axis, 0 to 4500) per VM type]
VM types (x-axis): Light DL DT (8:1), Mid DL DT (4:1), Heavy DL DT (2:1), Heavy DL DT (1:1), DL Deploy VM (1:2), DL Deploy VM (1:4)
Bar values recovered from the chart (images/sec), lowest to highest: 111, 282, 621, 1140, 2134, 4121
Reference lines: Titan V 1181/s, GF 1080 Ti 513.6/s, GF 1070 292.9/s, GF 1050 91.4/s
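The user-to-GPU ratios in the chart line up with the V100D vGPU profiles listed with the benchmark configuration (4Q, 8Q, 16Q, 32Q, 2x32Q, 4x32Q) if the 32GB frame buffer of one Tesla V100 is divided evenly among the users sharing it. The sketch below illustrates that arithmetic. The `vgpu_profile` helper and its label format are hypothetical, and the even-split rule is an assumption inferred from the profile names, not something stated on the slides.

```python
V100_FRAME_BUFFER_GB = 32  # Tesla V100 32GB, per the GPU recommendation table


def vgpu_profile(users: int, gpus: int = 1, board: str = "V100D") -> str:
    """Return an illustrative vGPU profile label for a users:gpus ratio.

    Assumes either several users share one GPU (users:1) or one user
    gets several full GPUs (1:gpus); mixed ratios are rejected.
    """
    if users < 1 or gpus < 1:
        raise ValueError("users and gpus must be positive")
    if users > 1 and gpus > 1:
        raise ValueError("use users:1 (shared GPU) or 1:gpus (multi-vGPU)")
    if gpus > 1:
        # Multi-vGPU: each vGPU uses the full frame buffer of one board.
        return f"{gpus}x{board}-{V100_FRAME_BUFFER_GB}Q"
    if V100_FRAME_BUFFER_GB % users != 0:
        raise ValueError("frame buffer does not divide evenly among users")
    # Shared GPU: frame buffer split evenly, e.g. 32GB / 8 users = 4GB each.
    return f"{board}-{V100_FRAME_BUFFER_GB // users}Q"


if __name__ == "__main__":
    for users, gpus in [(8, 1), (4, 1), (2, 1), (1, 1), (1, 2), (1, 4)]:
        print(f"User:GPU = {users}:{gpus} -> {vgpu_profile(users, gpus)}")
```

Under this assumption an 8:1 Light DL VM gets a 4GB slice (`V100D-4Q`), while a 1:4 DL Deployment VM gets four full 32GB vGPUs.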
Benchmark configuration:
- 8 x Xeon Gold 6126 vCPU / 12GB vMem / V100D-4Q, 8Q, 16Q, 32Q, 2x32Q, 4x32Q
- TensorFlow benchmark using ResNet-152, batch size 64 to 512, FP16 with Tensor Cores
- Results measured in trained images per second; average value from concurrent test runs

Before / After
[Diagram labels: GPU Servers for Deployment; GPU DTs for Researcher; GPU (%); Out of Service; Virtual GPU Cluster; Resource Manager / Monitor / Automation]

vGPU VMs shown in the diagram:
- DL Deployment VMs: 1:2, 1:4
- Heavy DL VMs: 2:1, 2:1, 1:1
- Mid DL VMs: 4:1 (x4)
- Light DL VMs: 8:1 (x8)

More to come
SCALABILITY / STABILITY
- Multi-vGPU for all / Multi-Node
- High availability
- ECC & Page Retirement
- NVLINK / GPU Direct RDMA

PERFORMANCE / ENHANCEMENT
- Virtual GPU for Compute
- Next gen. GPU Support

Reference
- Virtual GPU Test Drive: https://www.nvidia.com/tryvgpu
- NVIDIA Virtual GPU Website: www.nvidia.com/virtualgpu
- NVIDIA Virtual GPU YouTube Channel: http://tinyurl.com/gridvideos
- Questions? Ask on our Forums: https://gridforums.nvidia.com
- NVIDIA Virtual GPU on LinkedIn: http://linkd.in/QG4A6u
- Follow us on Twitter: @NVIDIAVirt
- Technical Question | Sales Inquiry: [email protected] [email protected]

Questions?

SEOUL | NOVEMBER 7 - 8, 2018
www.nvidia.com/ko-kr/ai-conference/