USING CONTAINERS FOR GPU APPLICATIONS Jonathan Calmels, Felix Abecassis, 05/08/2017 OUR TEAM Core Container Technologies

Enable GPUs in the container ecosystem: • Monitoring • Orchestration • Images • Runtime • OS

2 CHALLENGES A Typical Cluster

Ubuntu 14.04 CentOS 7 Ubuntu 16.04 Drivers 367 Drivers 361 Drivers 375 4x Maxwell 4x Kepler 8x Pascal

CUDA 7.5 CUDA 7.0 cuDNN 4 CUDA 7.5 CUDA 8.0 cuDNN 6

Patches

3 CONTAINERS To the rescue

Portable and reproducible builds Ease of deployment Isolation of resources Run across heterogeneous CUDA toolkit environments (sharing the host driver) Bare Metal Performance Facilitate collaboration

4 VIRTUAL MACHINES VS CONTAINERS Not so similar

5 6 7 NVIDIA-DOCKER github.com/NVIDIA/nvidia-docker

8 DOCKERHUB IMAGES Multiple flavors

Ubuntu Ubuntu 14.04 16.04

CUDA 8.0 CUDA 8.0 runtime runtime

cuDNN v5 CUDA 8.0 cuDNN v5 CUDA 8.0 runtime devel runtime devel

NVIDIA/Caffe cuDNN v5 cuDNN v5 cuDNN v6 0.15.13 devel devel devel

DIGITS CNTK TensorFlow PyTorch 5.0

9 CONTAINERS ON NVIDIA DGX-1 NVIDIA’s Deep Learning System

10 NVIDIA-DOCKER 1.0 Internals

GPU information

http cuda+nvml nvidia driver nvidia-docker nvidia-docker-plugin

http+unix docker dockerd http+unix

container process

$ NV_GPU=0 nvidia-docker run -ti nvidia/cuda 11 LIMITS OF NVIDIA-DOCKER 1.0 Limited scope

Only for Docker CLI Docker plugins are difficult to manage Not extensible (OpenGL, Vulkan, InfiniBand, KVM, etc.) Challenging to support new architectures (Power, ARM) Difficult to integrate into the container ecosystem

12 DOCKER COMPONENTS

13 WHAT’S A CONTAINER Kernel building blocks

Init system Netlink Namespaces Capabilities BPF UnionFS LSM KVM ...

14 LIBNVIDIA-CONTAINER github.com/NVIDIA/libnvidia-container

Integrates with the container internals Agnostic of the container runtime Drop-in GPU support for runtime developers Better stability, follows driver releases Brings features seamlessly (Graphics, Display, Exclusive mode, VM, etc.)

15 NVIDIA-DOCKER 2.0 Internals

docker nvidia-runc nvidia-oci-runtime

http(s)(+unix)

container process libnvidia-container dockerd

cuda+nvml grpc+unix nvidia driver docker-containerd + shim

$ docker run -ti -e NVIDIA_VISIBLE_DEVICES=0 --runtime=nvidia nvidia/cuda 16 DEMO!

17 CONTAINER FUTURE Enable GPUs everywhere nvidia-docker 2.0 release Multi-arch support (Power, ARM) Support other container runtimes (LXC/LXD, Rkt) Additional Docker images Additional features (OpenGL, Vulkan, InfiniBand, KVM, etc.) Support for GPU monitoring (cAdvisor)

18 QUESTIONS?

19