USING CONTAINERS FOR GPU APPLICATIONS Jonathan Calmels, Felix Abecassis, 05/08/2017 OUR TEAM Core Container Technologies
Enable GPUs in the container ecosystem: • Monitoring • Orchestration • Images • Runtime • OS
2 CHALLENGES A Typical Cluster
Ubuntu 14.04 CentOS 7 Ubuntu 16.04 Drivers 367 Drivers 361 Drivers 375 4x Maxwell 4x Kepler 8x Pascal
CUDA 7.5 CUDA 7.0 cuDNN 4 CUDA 7.5 CUDA 8.0 cuDNN 6
Patches
3 CONTAINERS To the rescue
Portable and reproducible builds Ease of deployment Isolation of resources Run across heterogeneous CUDA toolkit environments (sharing the host driver) Bare Metal Performance Facilitate collaboration
4 VIRTUAL MACHINES VS CONTAINERS Not so similar
5 6 7 NVIDIA-DOCKER github.com/NVIDIA/nvidia-docker
8 DOCKERHUB IMAGES Multiple flavors
Ubuntu Ubuntu 14.04 16.04
CUDA 8.0 CUDA 8.0 runtime runtime
cuDNN v5 CUDA 8.0 cuDNN v5 CUDA 8.0 runtime devel runtime devel
NVIDIA/Caffe cuDNN v5 cuDNN v5 cuDNN v6 0.15.13 devel devel devel
DIGITS CNTK TensorFlow PyTorch 5.0
9 CONTAINERS ON NVIDIA DGX-1 NVIDIA’s Deep Learning System
10 NVIDIA-DOCKER 1.0 Internals
GPU information
http cuda+nvml nvidia driver nvidia-docker nvidia-docker-plugin
http+unix docker dockerd http+unix
container process
$ NV_GPU=0 nvidia-docker run -ti nvidia/cuda 11 LIMITS OF NVIDIA-DOCKER 1.0 Limited scope
Only for Docker CLI Docker plugins are difficult to manage Not extensible (OpenGL, Vulkan, InfiniBand, KVM, etc.) Challenging to support new architectures (Power, ARM) Difficult to integrate into the container ecosystem
12 DOCKER COMPONENTS
13 WHAT’S A CONTAINER Kernel building blocks
Init system Netlink Namespaces Capabilities Cgroups Seccomp BPF UnionFS LSM KVM Netfilter ...
14 LIBNVIDIA-CONTAINER github.com/NVIDIA/libnvidia-container
Integrates with the container internals Agnostic of the container runtime Drop-in GPU support for runtime developers Better stability, follows driver releases Brings features seamlessly (Graphics, Display, Exclusive mode, VM, etc.)
15 NVIDIA-DOCKER 2.0 Internals
docker nvidia-runc nvidia-oci-runtime
http(s)(+unix)
container process libnvidia-container dockerd
cuda+nvml grpc+unix nvidia driver docker-containerd + shim
$ docker run -ti -e NVIDIA_VISIBLE_DEVICES=0 --runtime=nvidia nvidia/cuda 16 DEMO!
17 CONTAINER FUTURE Enable GPUs everywhere nvidia-docker 2.0 release Multi-arch support (Power, ARM) Support other container runtimes (LXC/LXD, Rkt) Additional Docker images Additional features (OpenGL, Vulkan, InfiniBand, KVM, etc.) Support for GPU monitoring (cAdvisor)
18 QUESTIONS?
19