Virtualizing AI/DL platform: Virtual GPU Solution
Doyoung Kim, Sr. Solution Architect

NVIDIA Virtual GPU Solution

Support, Updates & Maintenance

NVIDIA Virtual GPU Software

Tesla Datacenter GPUs

How It Works

CPU Only VDI
- Apps and VMs
- Hypervisor
- Server

With NVIDIA Virtual GPU
- Apps and VMs: NVIDIA Graphics Drivers, NVIDIA Virtual GPU
- Hypervisor: NVIDIA virtualization software
- Server: GPU

Evolution of Virtual GPU

2013 / 2016 / 2017 / Today

- Business User
- Designers, Architects, Engineers
- Ultra High End Simulation, Photo Realism, 3D Rendering, AI/DL
- Live Migration, NGC Support

GPU Compute on Virtual GPU

- Optimized DL Container from NGC
- Virtual Machine: Virtual GPU, CUDA & OpenCL enabled
- Hypervisor: NVIDIA Virtualization Software
- Server: NVIDIA Tesla GPU

Virtual GPU can...

✓ Run GPU compute workloads
  - Any CUDA / OpenCL application
  - Requires NVIDIA Virtual GPU 5.0 or higher
  - Requires Pascal or later GPU

✓ Be fully integrated with the virtualization solution
  - Live migration support from Virtual GPU 6.0 or higher
  - Cluster / Host / VM / Application level performance monitoring
  - Enabled for all major solutions such as vSphere / XenServer / KVM

✓ Provide every level of AI / DL platform
  - From desktop level to multi-GPU server
  - Full support for NVIDIA GPU Cloud (NGC)
  - Up to 4 vGPUs per Virtual Machine (multi-vGPU)*
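The version and hardware requirements listed here (together with the multi-vGPU footnote) can be read as a small rule table. The following is a minimal sketch of that reading, not an NVIDIA API: the names `Deployment`, `supported_features`, `ARCH_ORDER`, and `MAX_VGPUS_PER_VM` are hypothetical, and the architecture list is an assumption used only to express "Pascal or later".

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed GPU architecture order, used only to express "Pascal or later".
ARCH_ORDER = ["Kepler", "Maxwell", "Pascal", "Volta", "Turing"]

# From the slide: up to 4 vGPUs per VM when multi-vGPU is available.
MAX_VGPUS_PER_VM = 4


@dataclass
class Deployment:
    vgpu_version: Tuple[int, int]  # Virtual GPU software release, e.g. (7, 0)
    gpu_arch: str                  # one of ARCH_ORDER
    hypervisor: str                # e.g. "vSphere", "XenServer", "RHEL KVM"


def supported_features(d: Deployment) -> Dict[str, bool]:
    """Map a deployment to the features the slide says it can use."""
    pascal_or_later = ARCH_ORDER.index(d.gpu_arch) >= ARCH_ORDER.index("Pascal")
    return {
        # Compute: Virtual GPU 5.0 or higher and a Pascal or later GPU.
        "gpu_compute": d.vgpu_version >= (5, 0) and pascal_or_later,
        # Live migration: Virtual GPU 6.0 or higher.
        "live_migration": d.vgpu_version >= (6, 0),
        # Multi-vGPU: Virtual GPU 7.0 or higher, RHEL KVM only.
        "multi_vgpu": d.vgpu_version >= (7, 0) and d.hypervisor == "RHEL KVM",
    }
```

For example, `supported_features(Deployment((6, 0), "Volta", "vSphere"))` reports compute and live migration as available but not multi-vGPU.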

* Requires Virtual GPU 7.0 or higher / RHEL KVM only

Why Virtualization?
Which is best for an AI/DL research system?

PC with Consumer GPU
- Manageability
- Resource Utilization
- No Tech Support
- Stability
- 1 User per 1 GPU (1:1)

GPU Server w/o Virtualization
- Require MGMT
- Require Scheduler
- Resource Utilization
- Limited CCUs (1:1)

NVIDIA GPU Virtualization
- Fully managed
- Flexible resource mgmt
- Support by NVIDIA
- Up to 32 Users per GPU

GPU Recommendation
Optimized for DL training & inference

GPU                 Tesla V100 32GB                        Tesla T4
Cores               5,120 CUDA cores, 640 Tensor cores     2,560 CUDA cores, 320 Tensor cores
Performance         7.8 TF DP, 15.7 TF SP, 125 TF FP16     8.1 TF SP, 64.8 TF FP16
Memory Size         32 GB HBM2                             16 GB GDDR6
Memory Bandwidth    900 GB/s                               320 GB/s
GPU Peer to Peer    PCIe Gen3 / NVLink                     PCIe Gen3
Power               300 W                                  70 W

Configuration Example #1
Standard GPU Server with 3 types of DL workload = 16 Users

Light DL VM       User:GPU = 8:1
Midrange DL VM    User:GPU = 4:1
Heavy DL VM       User:GPU = 2:1

Hypervisor
NVIDIA Virtualization Software
NVIDIA Tesla V100 (x3)
GPU Server

Configuration Example #2
High Density GPU Server with 4 types of DL workload = 9 Users

DL Deployment VM    User:GPU = 1:2*
DL Deployment VM    User:GPU = 1:4*
Heavy DL VM         User:GPU = 2:1
Heavy DL VM         User:GPU = 1:1

Hypervisor
NVIDIA Virtualization Software
NVIDIA Tesla V100 (x8)
GPU Server

* Multi-vGPU support from NVIDIA Virtual GPU 7.0 (Oct. 2018)

Performance Estimate

VM type                 Images/Sec
Light DL DT (8:1)              111
Mid DL DT (4:1)                282
Heavy DL DT (2:1)              621
Heavy DL DT (1:1)            1,140
DL Deploy VM (1:2)           2,134
DL Deploy VM (1:4)           4,121

Reference lines:
Titan V         1181 / s
GF 1080 TI      513.6 / s
GF 1070         292.9 / s
GF 1050         91.4 / s
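The User:GPU ratios attached to each VM type follow from how much of a 32 GB Tesla V100 each vGPU profile takes. The sketch below works through that arithmetic for the V100D profiles named in the benchmark conditions. It assumes the number before `Q` is the profile's frame buffer in GB and that an `Nx` prefix means N vGPUs attached to one VM; `parse_profile` and `user_gpu_ratio` are hypothetical helper names, not part of any NVIDIA tool.

```python
import re
from fractions import Fraction

GPU_MEMORY_GB = 32  # Tesla V100 32GB


def parse_profile(profile: str):
    """Return (vgpus_per_vm, framebuffer_gb) for names like '4Q' or '2x32Q'."""
    m = re.fullmatch(r"(?:(\d+)x)?(\d+)Q", profile)
    if m is None:
        raise ValueError(f"unrecognized profile: {profile}")
    return int(m.group(1) or 1), int(m.group(2))


def user_gpu_ratio(profile: str) -> str:
    """Express a profile as a 'User:GPU' ratio on a 32 GB GPU."""
    count, fb_gb = parse_profile(profile)
    # Users per GPU = GPU memory / memory consumed by one user's VM.
    ratio = Fraction(GPU_MEMORY_GB, count * fb_gb)
    return f"{ratio.numerator}:{ratio.denominator}"


for p in ["4Q", "8Q", "16Q", "32Q", "2x32Q", "4x32Q"]:
    print(p, user_gpu_ratio(p))
```

Under these assumptions the six profiles map to 8:1, 4:1, 2:1, 1:1, 1:2, and 1:4, the same ratios used for the Light, Mid, Heavy, and Deployment VMs.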

8 x Xeon Gold 6126 vCPU / 12 GB vMem / V100D-4Q, 8Q, 16Q, 32Q, 2x32Q, 4x32Q
TensorFlow benchmark using ResNet-152, batch size 64 to 512, FP16 with Tensor Cores
Results measured in trained images per second, averaged over concurrent test runs

Before / After

Before
- GPU Servers for Deployment
- GPU DTs for Researcher: GPU (%), Out of Service

After: Virtual GPU Cluster
- Resource Manager / Monitor / Automation
- DL Deployment VMs: vGPU 1:2, 1:4
- Heavy DL VMs: vGPU 2:1, 2:1, 1:1
- Mid DL VMs: vGPU 4:1 (x4)
- Light DL VMs: vGPU 8:1 (x8)

More to come

SCALABILITY
- Multi-vGPU for all / Multi-Node
- NVLINK / GPU Direct RDMA

STABILITY
- High availability
- ECC & Page Retirement

PERFORMANCE
- Virtual GPU for Compute

ENHANCEMENT
- Next gen. GPU Support

Reference

Virtual GPU Test Drive https://www.nvidia.com/tryvgpu

NVIDIA Virtual GPU Website www.nvidia.com/virtualgpu

NVIDIA Virtual GPU YouTube Channel http://tinyurl.com/gridvideos

Questions? Ask on our Forums https://gridforums.nvidia.com

NVIDIA Virtual GPU on LinkedIn http://linkd.in/QG4A6u

Follow us on Twitter @NVIDIAVirt

Technical Question | Sales Inquiry: [email protected] [email protected]

Questions?

SEOUL | NOVEMBER 7 - 8, 2018
www.nvidia.com/ko-kr/ai-conference/