Accelerate Your AI Cloud With SUSE Linux Enterprise Server: A Virtualization Perspective
DEV-1158
Liang Yan, Senior Virtualization Engineer [email protected]
1 Outline
1. Background: Cloud & AI Hardware Accelerators
2. GPU Virtualization
3. FPGA Virtualization
4. AI Chip Virtualization
5. Conclusion
6. Q&A
2 Liang Yan
Eight years in virtualization
– IO virtualization (GPU, network)
– Memory virtualization optimization
– New architectures (ARM64/s390x)
Working from home for three years
– Louisville, Kentucky, USA
3 Background
4 Cloud
Cloud is the default option today.
– Public Cloud: 4-50 nodes is a good range
– Private Cloud
– Hybrid Cloud
5 AI
AI is everywhere today. AI / Machine Learning / Deep Learning
Autonomous vehicles: Tesla, Uber, Waymo
Healthcare: coronavirus research
Finance: hedge funds
E-commerce: delivery logistics, sales recommendations
Language translation
Face/image/sound recognition
6 AI And Cloud
Gartner report on the cloud market: 2017: $145 billion; 2018: $175 billion; 2019: $206 billion
Gartner Report on AI market:
7 AI
https://www.zhihu.com/question/57770020
8 Deep Learning
Deep learning is a branch of machine learning in which the models (typically neural networks) are built as "deep" structures with multiple layers. Deep learning is used to learn the features and patterns that best represent data.
It effectively produces algorithms that no human designed: the more data and layers a model has, the more accurate its results tend to be.
Artificial neural network models: DNN => CNN => RNN
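At its core, a single "layer" in these networks is just a matrix operation followed by a nonlinearity. A minimal sketch, assuming a fully-connected layer with ReLU (the `dense_forward` name, weight layout, and sizes are invented for illustration):

```c
/* Sketch of one fully-connected neural-network layer:
 * out = relu(W * x + b). W is rows x cols, row-major. */
#include <stddef.h>

static double relu(double v) { return v > 0.0 ? v : 0.0; }

void dense_forward(size_t rows, size_t cols,
                   const double *W, const double *b,
                   const double *x, double *out)
{
    for (size_t i = 0; i < rows; i++) {
        double acc = b[i];                 /* start from the bias */
        for (size_t j = 0; j < cols; j++)
            acc += W[i * cols + j] * x[j]; /* weighted sum of inputs */
        out[i] = relu(acc);                /* nonlinearity */
    }
}
```

A "deep" model simply chains many such layers, feeding each layer's `out` in as the next layer's `x`.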
9 Deep Learning
Image source: Shutterstock
10 Deep Learning
In practice, people rarely develop models directly; plenty of frameworks exist today!
• TensorFlow
• PyTorch
• MXNet
• Keras
• Theano
Workflow: Framework => IR (Intermediate Representation) => Backend API => Backend
Backend support:
Libraries: CUDA / OpenCL / native implementations
Vendors: GPU, FPGA, TPU
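The Framework => Backend API => Backend flow can be sketched as a small dispatch table: the framework only calls a narrow backend API, and each library plugs in its own implementation. The `ai_backend` struct and single `matmul` op below are invented for illustration; real backend APIs such as CUDA or OpenCL are far richer:

```c
/* Sketch of framework-to-backend dispatch. Each backend (CUDA,
 * OpenCL, native, ...) would provide its own ops behind one API. */
#include <stddef.h>

typedef struct {
    const char *name;
    /* One op of the hypothetical backend API: C = A*B, n x n, row-major. */
    void (*matmul)(size_t n, const float *A, const float *B, float *C);
} ai_backend;

/* "Native implementation" backend: plain CPU loops. */
static void native_matmul(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

const ai_backend native_backend = { "native", native_matmul };

/* The framework side talks only to the backend API, never to the device. */
void run_matmul(const ai_backend *be, size_t n,
                const float *A, const float *B, float *C)
{
    be->matmul(n, A, B, C);
}
```

Swapping a GPU or FPGA backend in is then just a matter of registering a different `ai_backend` instance.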
11 We Need An AI Cloud...
Because the whole AI stack is complicated and setup is painful
Because it is usually integrated with other existing systems
Because it usually comes with huge amounts of data
But…
We still want everything to work smoothly and efficiently
12 Deep Learning / Neural Network
GEMM, General Matrix to Matrix Multiplication
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        C[i][j] = 0;
        for (k = 0; k < n; k++)
            C[i][j] += A[i][k] * B[k][j];
    }
}
Think about millions of layers and parameters
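At that scale the naive triple loop is limited by memory traffic, which is exactly why tuned BLAS libraries and hardware accelerators exist. A common software-side mitigation is loop blocking (tiling) for cache reuse; a minimal sketch, with an illustrative rather than tuned tile size:

```c
/* Cache-blocked GEMM sketch: C += A*B for n x n row-major matrices.
 * The caller zeroes C first (as the naive loop does with C[i][j] = 0).
 * BLOCK is an arbitrary illustrative tile size, not a tuned value. */
#include <stddef.h>

#define BLOCK 32

void gemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                /* Work on one tile so A and B stay cache-resident. */
                for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                    for (size_t k = kk; k < kk + BLOCK && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Even so, CPU tiling only goes so far, motivating the accelerators below.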
13 AI Accelerator
FPGA: Field Programmable Gate Array
Oriented toward software algorithms
Xilinx, Altera (acquired by Intel in 2015 for $16.7B)
Supports OpenCL and HLS (High-Level Synthesis)
In 2011, Altera introduced OpenCL for FPGAs, enabling CNN algorithms
AI chips (ASICs: Application-Specific Integrated Circuits)
ASICs built specifically for DL algorithms
14 AI Cloud
GPU/FPGA As a Service
GPU:
Google Colaboratory
Paperspace Gradient
FloydHub Workspace
Lambda GPU Cloud
AWS Deep Learning AMIs
GCP Deep Learning VM Images
FPGA:
AWS + Xilinx
Alibaba Cloud + Intel Altera
Microsoft + Intel Altera
AI chips are mostly used on the mobile/edge side, for inference
15 GPU Virtualization
16 Full GPU Virtualization
As a cloud resource, a GPU should have moderate multiplexing capability
Run the native graphics driver in the VM
Achieve good performance
● Split
Time slices
Framebuffer memory
● Isolate
Clean access separation between VM and host physical device
IOMMU/mdev and VFIO
DMA
Interrupts
● Schedule
Efficient and robust
Fairly fixed for AMD, more flexible for NVIDIA (RR, BOND)
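The time-slice splitting with round-robin (RR) scheduling above can be sketched as a toy rotation over vGPU instances. The `vgpu` struct and slice accounting are invented for illustration; a real scheduler also handles preemption, priorities, and GPU state save/restore:

```c
/* Toy round-robin scheduler over vGPU instances: each VM's vGPU gets
 * the physical GPU for one fixed time slice in turn. */
#include <stddef.h>

typedef struct {
    int vm_id;                 /* owning VM */
    unsigned long slices_used; /* time slices consumed so far */
} vgpu;

/* Advance to the vGPU after `current` and charge it one slice. */
size_t rr_next(vgpu *vgpus, size_t count, size_t current)
{
    size_t next = (current + 1) % count;
    vgpus[next].slices_used++;
    return next;
}
```

Over a full rotation every vGPU is charged the same number of slices, which is the fairness property RR provides.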
17 Full GPU Virtualization
NVIDIA: mdev + vGPU, ~80%
Tesla series: Volta, Pascal, Maxwell: M6, M10, M60, P4, P6, P40, P100, V100
Tensor cores, cuDNN
Inference products: Jetson Nano, Xavier, TX2
http://www.nvidia.com/object/grid-certified-servers.html
AMD: SR-IOV + vGPU, ~90%, less code
Radeon Instinct MI25, MI50
ROCm
https://lists.freedesktop.org/archives/amd-gfx/2016-December/004075.html
Intel: mdev + vGPU, ~80%, shared VRAM
Haswell (3 VMs), Broadwell (7 VMs), Skylake, Kaby Lake
Dedicated GPU next year
OpenVINO (Open Visual Inferencing and Neural Network Optimization)
https://github.com/intel/gvt-linux/wiki
18 Full GPU Virtualization - SUSE
SUSE Status
- Intel KVMGT technically ready (evaluation version available): SLE12-SP4~, SLE15~
- NVIDIA vGPU technically ready (evaluation version available): SLE15~, SLE12-SP5~
- AMD MxGPU ongoing (trial available): SLE12-SP3~, SLE15~
- Even though not officially supported yet, setup consulting/assistance is available
- AI VM image service is available
19 Full GPU Virtualization - SUSE
SUSE Status
GPU virtualization for CaaSP (Container as a Service Platform)
- cuda-docker-2.0
- vGPU for Kata (POC)
- AI container image service is available
20 FPGA Virtualization
21 FPGA: Field Programmable Gate Array
Xilinx announced the first FPGA (XC2000) in 1985. It was used for prototype development via HDL (Verilog/VHDL). FPGAs are now starting to be used in complicated scenarios, such as DL.
https://medium.com/@ckyrkou/what-are-fpgas-c9121ac2a7ae
22 FPGA Virtualization
FPGA virtualization is a technique that provides an abstraction of the FPGA hardware.
Three main directions:
Overlays: a programmable architecture on top of an FPGA
Dynamic modules: PR (Partial Reconfiguration); Intel Altera Stratix 10, Xilinx UltraScale
FPGA resource pool: Microsoft Catapult project
Multiple nodes, cloud-oriented
Catapult v3: Bing network => NN FPGA resource pool
5,670 servers with FPGAs across 15 countries
23 FPGA Virtualization
SUSE
- Working on a POC
- Starting work on Intel FPGAs
Upstream
- Linux kernel FPGA subsystem
- Xilinx contributed Alveo FPGA accelerator drivers
- Vendors keep releasing new hardware
It is still quite new, but it is definitely an alternative option for specific customers who want to use it in a private environment.
24 AI Chips
26 AI Chips Today
Designless/fabless: software-defined chips. Optimization is more targeted, and the cost of a hardware implementation can even be lower than a software approach.
1. Mostly for deep learning / neural networks (matrix multiplication)
2. Mostly for inference, on the mobile/edge side
3. Mostly for specific usage; no general platform like CUDA
27 AI Chips Future
Linux Hardware Accelerator Subsystem
ARM + ASIC: needs a general platform (Arm NN)
AI Chips Software ecosystem
AI Chips Virtualization
28 SUSE Effort
Most of this hardware is not public, like the TPU. We keep a close eye on upstream and the vendors, and try to enable as much as possible.
1. PCIe AI chips
2. AI sticks
3. Some ARM NPU board enablement: Jetson Nano, VIM3 Pro, Google Edge TPU, Sophon Edge
Working on a POC for virtualization scenarios; currently passthrough only.
29 Conclusion
30 Heterogeneous Architecture
Hardware acceleration: workloads are so heavy that the computing load outweighs even data movement and hardware latency; hardware implementations of software algorithms
New IO virtualization trends (AWS Nitro): devices focus on the virtualization implementation (resource splitting, kernel bypass); the workload is reassigned to the backend driver; the data plane through the virtualization stack is much simpler today (a freeway).
31 SUSE Focus
1. Best bridge: stay close to hardware vendors and customers: enable new features, official tech support, friendly user experience
2. Embed framework backends in our products: the best OS on top of AI hardware accelerators, providing an engine for the AI stack
3. Focus on bottlenecks and always provide the best performance: split compute tasks automatically; advanced IO virtualization techniques
32 Questions?
Thank you for the time!
Special thanks to Raj Meel (Global Product Marketing Manager)
34 General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.