Accelerate Your AI Cloud With SUSE Linux Enterprise Server: A Perspective

DEV-1158

Liang Yan, Senior Virtualization Engineer [email protected]

1 Outline

1. Background: Cloud & AI, Hardware Accelerators
2. GPU Virtualization
3. FPGA Virtualization
4. AI Chip Virtualization
5. Conclusion
6. Q&A

2 Liang Yan

Working in virtualization for 8 years

– I/O virtualization (GPU, network)

– Memory virtualization optimization

– New architectures (ARM64, s390x)

Working from home for 3 years

– Louisville, Kentucky, USA

3 Background

4 Cloud

Cloud is the default option today.

– Public Cloud: 4-50 nodes is a typical range

– Private Cloud

– Hybrid Cloud

5 AI

AI is everywhere today.

– Autonomous vehicles: Tesla, Uber, Waymo
– Healthcare: coronavirus research
– Finance: hedge funds
– E-commerce: delivery and transport, sales recommendations
– Language translation
– Face/image/sound recognition

6 AI And Cloud

Gartner report on the cloud market:
– 2017: $145 billion
– 2018: $175 billion
– 2019: $206 billion

Gartner Report on AI market:

7 AI

https://www.zhihu.com/question/57770020

8 Deep Learning

Deep Learning is a branch of machine learning in which the models (typically neural networks) are structured as "deep" graphs with multiple layers. Deep learning is used to learn the features and patterns that best represent the data.

It produces algorithms not explicitly designed by humans: the more data and layers a model has, the more accurate its results tend to be.

Artificial neural network models: DNN => CNN => RNN
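As an illustration (not from the slides), a minimal two-layer forward pass in NumPy shows the layered structure the definition above describes; all shapes and weights here are made up:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, W1, b1, W2, b2):
    """Two-layer network: each layer is a matrix multiply plus a nonlinearity."""
    h = relu(x @ W1 + b1)   # hidden layer learns intermediate features
    return h @ W2 + b2      # output layer combines them

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                     # one input sample, 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # layer 1 parameters
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # layer 2 parameters
y = forward(x, W1, b1, W2, b2)
print(y.shape)  # (1, 2)
```

Stacking more such layers is what makes the structure "deep".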

9 Deep Learning

Image source: Shutterstock

10 Deep Learning

In practice, people rarely develop models from scratch; many frameworks exist today:
• TensorFlow
• PyTorch
• MXNet
• Keras
• Theano

Workflow: Framework => IR (Intermediate Representation) => Backend API => Backend

Backend support:
– Library: CUDA / OpenCL / native implementation
– Vendor hardware: GPU, FPGA, TPU
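To make the Framework => IR => Backend flow concrete, here is a toy sketch (illustrative only; the op names and kernel table are invented, not any real framework's API). The front end emits a tiny IR, and a "native implementation" backend lowers each op to NumPy:

```python
import numpy as np

# Toy "IR": each op is a (name, input-names) tuple, as a framework
# front end might emit it.
IR = [("matmul", ("x", "W")), ("add", ("t0", "b")), ("relu", ("t1",))]

# One possible backend: lower each IR op to a NumPy kernel.
KERNELS = {
    "matmul": lambda a, b: a @ b,
    "add": np.add,
    "relu": lambda a: np.maximum(a, 0.0),
}

def run(ir, env):
    """Interpret the IR, storing each intermediate result as t0, t1, ..."""
    for i, (op, args) in enumerate(ir):
        env[f"t{i}"] = KERNELS[op](*(env[a] for a in args))
    return env[f"t{len(ir) - 1}"]

env = {"x": np.array([[1.0, -2.0]]),
       "W": np.array([[2.0], [1.0]]),
       "b": np.array([3.0])}
print(run(IR, env))  # relu(x @ W + b) -> [[3.]]
```

Swapping the kernel table for CUDA or OpenCL kernels is, in spirit, how the same IR targets GPUs, FPGAs, or TPUs.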

11 We Need An AI Cloud...

– Because the whole AI stack is complicated and setup is painful
– Because it is usually integrated with other existing systems
– Because it usually comes with huge amounts of data

But…

We still want everything to work smoothly and efficiently.

12 Deep Learning / Neural Network

GEMM, General Matrix to Matrix Multiplication

for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        C[i][j] = 0;
        for (k = 0; k < n; k++)
            C[i][j] += A[i][k] * B[k][j];
    }
}

Think about millions of layers and parameters
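To make the cost concrete, a sketch (not from the slides) comparing a direct Python translation of the triple loop above with a vectorized call that dispatches to an optimized BLAS backend:

```python
import numpy as np

def gemm_naive(A, B):
    """Direct translation of the triple loop: O(n^3) scalar operations."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

n = 16
rng = np.random.default_rng(1)
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))

# Accelerators exist because this same O(n^3) work parallelizes well:
# np.matmul hands it to an optimized backend instead of scalar loops.
assert np.allclose(gemm_naive(A, B), A @ B)
```

At the scale of real models, the multiply-accumulate count is what GPUs, FPGAs, and AI chips are all built to absorb.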

13 AI Accelerator

GPUs: NVIDIA, AMD, Intel

FPGA (Field Programmable Gate Array)
– Software-algorithm oriented
– Xilinx, Altera (acquired by Intel in 2015 for $16.7B)
– Supports OpenCL and HLS (High-Level Synthesis)
– In 2011, Altera introduced OpenCL for FPGA; now used for CNN algorithms

AI chips (ASIC: Application-Specific Integrated Circuit): ASICs built specifically for DL algorithms

14 AI Cloud

GPU/FPGA As a Service

GPU:
– Google Colaboratory
– Paperspace Gradient
– FloydHub Workspace
– Lambda GPU Cloud
– AWS Deep Learning AMIs
– GCP Deep Learning VM Images

FPGA:
– AWS + Xilinx
– Alibaba Cloud + Intel Altera
– Microsoft + Intel Altera

AI chips are used more at the mobile/edge end, for inference.

15 GPU Virtualization

16 Full GPU Virtualization

As a cloud resident, a virtualized GPU should:
– Have moderate multiplexing capability
– Run the native graphics driver in the VM
– Achieve good performance

● Split: time slices, framebuffer memory
● Isolate: clean access separation between VM and host physical device; IOMMU/mdev and VFIO; DMA; interrupts
● Schedule: efficient and robust; fairly fixed for AMD, more flexible for NVIDIA (RR, BOND)
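As background on the mdev path mentioned above: a vGPU instance is created by writing a UUID to a sysfs node under the physical device. A hedged Python sketch of that interface; the PCI address and type name below are example values, not real ones, and the actual write requires root on a host with a supported GPU:

```python
import uuid
from pathlib import Path

def mdev_create_path(pci_addr, mdev_type):
    """Build the sysfs 'create' node for an mdev type under a PCI device."""
    return (Path("/sys/bus/pci/devices") / pci_addr /
            "mdev_supported_types" / mdev_type / "create")

def create_vgpu(pci_addr, mdev_type):
    """Writing a UUID to the create node asks the kernel to instantiate a vGPU."""
    dev_uuid = str(uuid.uuid4())
    mdev_create_path(pci_addr, mdev_type).write_text(dev_uuid)  # needs root
    return dev_uuid

# Example (illustrative values): an Intel GVT-g type on an integrated GPU.
p = mdev_create_path("0000:00:02.0", "i915-GVTg_V5_4")
print(p)
```

The resulting mdev device is then handed to the VM through VFIO, which is what gives the guest isolated, IOMMU-protected access.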

17 Full GPU Virtualization

NVIDIA: mdev + vGPU (~80%)
– Tesla series (Volta, Pascal, Maxwell): M6, M10, M60, P4, P6, P40, P100, V100
– Tensor Cores, cuDNN
– Inference products: Jetson Nano, Xavier, TX2
– http://www.nvidia.com/object/grid-certified-servers.html

AMD: SR-IOV + vGPU (~90%), less code
– Radeon Instinct MI25, MI50
– ROCm
– https://lists.freedesktop.org/archives/amd-gfx/2016-December/004075.html

Intel: mdev + vGPU (~80%), shared VRAM
– Haswell (3 VMs), Broadwell (7 VMs), Skylake, Kaby Lake
– Dedicated GPU next year
– OpenVINO (Open Visual Inference and Neural Network Optimization)
– https://github.com/intel/gvt-linux/wiki

18 Full GPU Virtualization - SUSE

SUSE Status

- Intel KVMGT: technically ready (evaluation version available), SLE12-SP4+, SLE15+
- NVIDIA vGPU: technically ready (evaluation version available), SLE15+, SLE12-SP5+
- AMD MxGPU: ongoing (trial available), SLE12-SP3+, SLE15+
- Even without official support yet, setup consulting/assistance is available
- AI VM image service is available

19 Full GPU Virtualization - SUSE

SUSE Status

GPU virtualization for CaaSP (Container as a Service Platform)

- nvidia-docker 2.0

- vGPU for Kata (POC)

- AI container image service is available

20 FPGA Virtualization

21 FPGA: Field Programmable Gate Array

Xilinx announced the first FPGA (XC2000) in 1985. FPGAs were originally used for prototype development via HDLs (Verilog/VHDL). Now they are starting to be used in more complicated scenarios, such as deep learning.

https://medium.com/@ckyrkou/what-are-fpgas-c9121ac2a7ae

22 FPGA Virtualization

FPGA virtualization is a technique that provides an abstraction to the FPGA hardware.

Three main directions:

Overlays: a programmable architecture on top of the FPGA

Dynamic modules: PR (Partial Reconfiguration), e.g. Intel Altera Stratix 10, UltraScale

FPGA resource pool: Microsoft Catapult project
– Multiple nodes, cloud oriented
– Catapult project: Bing => Network => NN (v3)
– FPGA resource pool: 5670 servers with FPGAs across 15 countries

23 FPGA Virtualization

SUSE
- Working on a POC
- Starting work on Intel FPGAs

Upstream
- Linux kernel FPGA subsystem
- Xilinx contributed Alveo FPGA accelerator drivers
- Vendors keep releasing new hardware

It’s pretty new, but definitely an alternative option for specific customers who want to use it in a private environment.

24 AI Chips

26 AI Chips Today

Design-less/fab-less: software-defined chips. Optimization is more specific, and the cost of a hardware implementation can be even cheaper than a software approach.

1. Mostly for deep learning / neural network matrix multiplication
2. Mostly for inference, at the mobile/edge end
3. Mostly for specific uses; no general platform like CUDA

27 AI Chips Future

Linux Hardware Accelerator Subsystem

ARM + ASIC: needs a general platform (Arm NN)

AI Chips Software ecosystem

AI Chips Virtualization

28 SUSE Effort

Most of this hardware is not public (e.g., TPU). We keep a close eye on upstream and vendors, and try to enable as much as possible.

1. PCIe AI chips
2. AI sticks
3. Some ARM NPU board enablement: Jetson Nano, VIM Pro3, Google Edge TPU, Sophon Edge

Working on a POC for the virtualization scenario; currently passthrough only.
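As background on the passthrough mode mentioned here: handing a whole PCI accelerator to a KVM guest is typically done with a VFIO hostdev entry in the libvirt domain XML. A minimal sketch; the PCI address is an example value:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <!-- host PCI address of the accelerator (example value) -->
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

Passthrough gives near-native performance but dedicates the device to one VM, which is why mediated (mdev-style) sharing remains the goal for these chips.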

29 Conclusion

30 Heterogeneous Architecture: the workload is so heavy that compute cost dominates even data movement and hardware latency. Hardware implementations of software algorithms.

New I/O virtualization trends (e.g., AWS Nitro):
– The device focuses on the virtualization implementation (resource splitting, kernel bypass)
– Workload is offloaded to the backend driver
– The data plane through the virtualization stack is much simpler today (a "freeway")

31 SUSE Focus

1. Best bridge: stay close to hardware vendors and customers (enable new features, official tech support, friendly user experience)

2. Embed framework backends in our products: the best OS on top of AI hardware accelerators, providing an engine for the AI stack

3. Focus on bottlenecks and always provide the best performance: split compute tasks automatically; advanced I/O virtualization techniques

32

Questions?

Thank you for the time!

Special thanks to Raj Meel (Global Product Marketing Manager)

34 General Disclaimer

This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
