FPGA Virtualization 4

Accelerate Your AI Cloud With SUSE Linux Enterprise Server: A Virtualization Perspective
DEV-1158
Liang Yan, Senior Virtualization Engineer
[email protected]

Outline
1. Background: Cloud & AI Hardware Accelerator
2. GPU virtualization
3. FPGA virtualization
4. AI chips virtualization
5. Conclusion
6. Q&A

Liang Yan
Virtualization Area for 8 years
– IO Virtualization (GPU, Network)
– Memory Virtualization Optimization
New Arch (ARM64/S390x)
Work at home for 3 years
– Louisville, Kentucky, USA

Background

Cloud
Cloud is a default option today.
– Public Cloud: 4-50 nodes is a good range
– Private Cloud:
– Hybrid Cloud

AI
AI is everywhere today.
AI / Machine Learning / Deep Learning
Autonomous vehicles: Tesla, Uber, Waymo
Healthcare: CoronaVirus
Finance: hedge funds, four meltdowns
Electronic Commerce: delivery and transport, sales recommendations
Language Translation
Face/Image/Sound Recognition

AI And Cloud
Gartner Report on Cloud market:
2017: 145 billion
2018: 175 billion
2019: 206 billion
Gartner Report on AI market:

AI
https://www.zhihu.com/question/57770020

Deep Learning
Deep Learning is a branch of machine learning in which the models (typically neural networks) are structured as "deep" graphs with multiple layers.
Deep learning is used to learn features and patterns that best represent data.
It arrives at an algorithm that is not designed by a human.
The more data and layers it has, the more accurate its results tend to be.
Artificial neural networks: DNN => CNN => RNN Model

Deep Learning
Image source: Shutterstock

Deep Learning
People do not usually develop models directly; many frameworks exist today:
• TensorFlow
• PyTorch
• MXNet
• Keras
• Theano
Workflow: Framework => IR (Intermediate Representation) => Backend API => Backend
Backend support:
Library: CUDA/OpenCL/Native Implementation
Vendor: GPU, FPGA, TPU

We Need An AI Cloud...
Because the whole AI stack is complicated and setup is painful
Because it is usually integrated with other existing systems
Because it usually comes with huge amounts of data
But… we still want everything to work smoothly and efficiently

Deep Learning / Neural Network
GEMM: General Matrix to Matrix Multiplication

    /* C = A * B for n x n matrices: n^3 multiply-add operations */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            C[i][j] = 0;
            /* dot product of row i of A and column j of B */
            for (int k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }

Think about many layers and millions of parameters

AI Accelerator
GPUs
● Nvidia
● AMD
● Intel
FPGA: Field Programmable Gate Array
  Software Algorithm Oriented
  Xilinx, Altera (acquired by Intel in 2015 for $16.7 B)
  Support OpenCL and HLS (High-Level Synthesis)
  2011: Altera introduced OpenCL for FPGA, CNN algorithms
AI chip (ASIC: Application-Specific Integrated Circuit)
  ASIC specific for DL algorithms

AI Cloud
GPU/FPGA As a Service
GPU:
  Google Colaboratory
  Paperspace Gradient
  FloydHub Workspace
  Lambda GPU Cloud
  AWS Deep Learning AMIs
  GCP Deep Learning VM Images
FPGA:
  AWS + Xilinx
  Ali Cloud + Intel Altera
  MS + Intel Altera
AI chips are used more at the mobile/edge end, for inference

GPU Virtualization

Full GPU Virtualization
As a cloud resident, it should have moderate multiplexing capability
Run native graphics driver in VM
Achieve good performance
● Split
  Time Slices
  Framebuffer memory
● Isolate
  Give a neat access between VM and Host Physical Device
  IOMMU/Mdev and VFIO
  DMA
  Interrupt
● Schedule
  Efficient and Robust
  Fairly fixed for AMD, more flexible for NVIDIA, RR, BOND

Full GPU Virtualization
Nvidia
  Mdev + vGPU ~80%
  Tesla Series: Volta Pascal Maxwell
  M6 M10 M60 P4 P6 P40 P100 V100
  Tensor processor, cuDNN
  Inference production: Jetson Nano, Xavier, TX2
  http://www.nvidia.com/object/grid-certified-servers.html
AMD
  SRIOV + vGPU ~90%, less code
  Radeon Instinct MI25, MI50
  ROCm
  https://lists.freedesktop.org/archives/amd-gfx/2016-December/004075.html
Intel
  Mdev + vGPU ~80%, shared VRAM
  Haswell (3 VMs), Broadwell (7 VMs), Skylake, Kaby Lake
  Dedicated GPU next year
  OpenVINO (Open Visual Inferencing and Neural Network Optimization)
  https://github.com/intel/gvt-linux/wiki

Full GPU Virtualization - SUSE
SUSE Status
- Intel KVMGT: technically ready (evaluation version is ready)
  SLE12-SP4~ SLE15~
- Nvidia vGPU: technically ready (evaluation version is ready)
  SLE15~ SLE12-SP5~
- AMD MxGPU: ongoing (trial is available)
  SLE12-SP3~ SLE15~
- Even though it is not officially supported yet, setup consulting/assistance is available
- AI VM image service is available

Full GPU Virtualization - SUSE
SUSE Status
GPU virtualization for CaaSP (Container as a Service Platform)
- cuda-docker-2.0
- vGPU for Kata (POC)
- AI container image service is available

FPGA Virtualization

FPGA: Field Programmable Gate Array
Xilinx announced the first FPGA (XC2000) in 1985.
It was used for prototype development with HDLs (Verilog/VHDL).
It is now starting to be used in more complicated scenarios, such as DL.
https://medium.com/@ckyrkou/what-are-fpgas-c9121ac2a7ae

FPGA Virtualization
FPGA virtualization is a technique that provides an abstraction over the FPGA hardware.
Three main directions:
Overlays: a programmable architecture on top of an FPGA
Dynamic modules: PR (Partial Reconfiguration)
  Intel Altera Stratix 10
  Xilinx UltraScale
FPGA resource pool: MS Catapult project
  Multiple nodes, cloud oriented
  Catapult Project V3
  Bing Network => NN
  FPGA resource pool
  5670 servers with FPGAs over 15 countries

FPGA Virtualization
SUSE
- Working on POC
- Starting work on Intel FPGA
Upstream
- Linux kernel FPGA subsystem
- Xilinx contributes Alveo FPGA accelerator drivers
- Vendors keep releasing new hardware
It is pretty new, but definitely an alternative option for specific customers who want to use it in a private environment.

AI Chips

AI Chips Today
Designless-Fabless: Software-Defined Chip.
Optimization is more specific; the cost of a hardware implementation is even lower than that of the software approach.
1. Mostly for Deep Learning/Neural Network matrix multiplication
2. Mostly for inference, at the mobile/edge end
3. Mostly for specific usage, no general platform like CUDA

AI Chips Future
Linux Hardware Accelerator Subsystem
ARM + ASIC
Need a general platform (ARM NN)
AI chips software ecosystem
AI chips virtualization

SUSE Effort
Most of the hardware is not public, like TPU.
We keep a close eye on upstream and vendors, and try to enable as much as possible.
1. PCIE AI Chips
2. AI sticks
3. Some ARM NPU boards
Enablement
  Jetson Nano
  VIM Pro3
  Google Edge TPU
  Sophon Edge
Working on POC for virtualization scenarios; currently passthrough only.

Conclusion

Heterogeneous Architecture
Hardware Acceleration:
  Workload is so heavy that the computing load outweighs data movement and hardware latency.
  Hardware implementation for software algorithm
New IO virtualization trends: (AWS Nitro)
  Device is focused on virtualization implementation (resource split, kernel bypass)
  Workload is re-assigned to the backend driver.
  Data plane between virtualization stack layers is much simpler today (freeway).

SUSE Focus
1. Best Bridge:
   Keep close with hardware vendors and customers: enable new features, official tech support, friendly user experience
2. Embedded backend of framework in our products
   Best OS above AI hardware accelerators, provide an engine for the AI stack
3. Focus on bottlenecks, always provide the best performance
   Split compute tasks automatically
   Advanced IO virtualization techniques

Questions?
Thank you for the time!
Special Thanks to Raj Meel (Global Product Marketing Manager)

General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose.
The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
