Khronos API Landscape (EVE, October 2019)
APIs for Accelerating Vision and Inferencing: An Overview of Options and Trade-offs
Neil Trevett, Khronos President and NVIDIA VP Developer Ecosystems
[email protected] | @neilt3d
This work is licensed under a Creative Commons Attribution 4.0 International License. © The Khronos® Group Inc. 2019

Khronos is an open, non-profit, member-driven industry consortium developing royalty-free standards, and vibrant ecosystems, to harness the power of silicon acceleration for demanding graphics rendering and computationally intensive applications such as inferencing and vision processing. Khronos has more than 150 members: roughly 40% in the US, 30% in Europe, and 30% in Asia.

Some Khronos standards relevant to embedded vision and inferencing:
- Vision and neural network inferencing runtime (OpenVX)
- Neural network exchange format (NNEF)
- Heterogeneous parallel compute (OpenCL)
- C++ parallel compute (SYCL)
- Parallel compute IR (SPIR-V)
- GPU compute acceleration (Vulkan)

Embedded Vision and Inferencing Acceleration
- Networks are trained on high-end desktop and cloud systems: training data feeds neural network training, which produces trained networks
- Trained networks are then compiled, or ingested by an inferencing engine, for deployment
- Applications link to the compiled inferencing code or to an inferencing engine library, alongside vision code or a vision library that processes sensor data
- Hardware acceleration APIs map these workloads onto diverse embedded hardware: GPUs, DSPs, FPGAs, and dedicated accelerators

Neural Network Training Acceleration
- Neural network training frameworks handle authoring and interchange; network training needs 32-bit floating-point precision
- The frameworks build on neural network acceleration libraries such as cuDNN, MIOpen, and clDNN
- Those libraries in turn use hardware acceleration APIs to drive desktop and cloud hardware, including GPUs and TPUs
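The training slide maps to a few lines of framework code. A minimal sketch, assuming PyTorch as the training framework; the layer sizes, batch size, and step count are arbitrary placeholders. Tensors default to 32-bit floats, and on an NVIDIA GPU the framework dispatches the heavy layers to cuDNN, which is the acceleration path the slide describes:

```python
# Toy float32 training loop; stand-in for "frameworks over acceleration libraries".
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny classifier standing in for a real network (hypothetical sizes).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random stand-in training data; a real pipeline would use a DataLoader.
    inputs = torch.randn(32, 128, device=device)          # float32 by default
    targets = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()   # gradients are also computed in float32
    optimizer.step()
```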
Neural Network Exchange Formats
- Before: training and inferencing fragmentation, where every inferencing engine needs a custom importer from every training framework
- After: a neural network exchange format sits between training frameworks and inference engines, enabling interoperability and common optimization and processing tools
- The two exchange-format initiatives are complementary:
  - ONNX is an open source project, initiated by Facebook and Microsoft, aimed at training interchange; it moves quickly to track authoring-framework updates and gives software stacks flexibility
  - NNEF is a defined specification with multi-company governance at Khronos, aimed at embedded inferencing import; it provides a stable bridge from training into edge inferencing engines, with the stability needed for hardware deployment

NNEF and ONNX Industry Support
- NNEF V1.0 was released in August 2018, after positive industry feedback on the provisional specification; a maintenance update was issued in September 2019, and extensions to V1.0 have been released for expanded functionality
- ONNX 1.6 was released in September 2019 and introduced support for quantization; the ONNX Runtime is being integrated with GPU inferencing engines such as NVIDIA TensorRT
- Both initiatives have broad industry backing, with a multi-company NNEF working group and a long list of ONNX supporters

NNEF Tools Ecosystem
- Import/export converters cover TensorFlow and TensorFlow Lite, Caffe and Caffe2, and ONNX (some live, some imminent when this deck was written); compound operations are captured by exporting the graph as a Python script
- The NNEF Model Zoo, now available on GitHub, is useful for checking that ingested NNEF produces acceptable results on the target system
- Tooling includes a syntax parser and validator, and OpenVX can ingest and execute NNEF files
- NNEF adopts a rigorous approach to its design lifecycle, which is especially important for safety-critical or mission-critical applications in the automotive, industrial, and infrastructure markets
- The NNEF open source projects are hosted in the Khronos NNEF-Tools GitHub repository under the Apache 2.0 license: https://github.com/KhronosGroup/NNEF-Tools

Primary Machine Learning Compilers
- TVM: imports TensorFlow Graph, Caffe, Keras, MXNet, and ONNX; front-end/IR is NNVM / Relay IR; outputs OpenCL, LLVM, CUDA, and Metal
- nGraph (with PlaidML): imports TensorFlow Graph, MXNet, PaddlePaddle, PyTorch, and ONNX; front-end/IR is nGraph / Stripe IR; outputs OpenCL, LLVM, and CUDA
- Glow: imports PyTorch and ONNX; front-end/IR is Glow Core / Glow IR; outputs OpenCL and LLVM
- TensorFlow XLA: imports TensorFlow Graph and Keras; front-end/IR is XLA HLO; outputs LLVM, TPU IR, and XLA IR, plus TensorFlow Lite / NNAPI (including hardware acceleration)
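In practice the exchange-format handoff and the compiler ingestion above chain together. A minimal sketch, assuming PyTorch, the onnx package, and TVM are installed; the model, shapes, and file name are illustrative placeholders, and an NNEF export would follow the same pattern via the NNEF-Tools converters:

```python
# Export a trained model through an exchange format, then compile it.
import torch
import onnx
import tvm
from tvm import relay

# 1. Export from the authoring framework into the exchange format (ONNX here).
model = torch.nn.Sequential(torch.nn.Linear(128, 10)).eval()
dummy_input = torch.randn(1, 128)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# 2. Ingest the interchange file into the compiler's IR (TVM's Relay front-end).
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 128)})

# 3. Apply graph-level optimizations and emit code for a target backend.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```

Here relay.frontend.from_onnx maps the interchange graph into Relay, where graph-level optimizations run before code generation for the chosen target (LLVM in this sketch; CUDA, Metal, or OpenCL are the other outputs the table lists).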
Embedded NN Compilers
- Examples include the CEVA Deep Neural Network compiler (CDNN) and the Cadence Xtensa Neural Network Compiler (XNNC)
- ML compilers follow consistent steps (a toy sketch of step 2 appears after the platform-stack notes below):
  1. Import the trained network description
  2. Apply graph-level optimizations, e.g. node fusion, node lowering, and memory tiling
  3. Decompose the graph into primitive instructions and emit programs for accelerated run-times
- Progress is fast, but this remains an area of intense research
- If compiler optimizations are effective, hardware accelerator APIs can stay 'simple' and won't need complex metacommands (combined primitive commands) like those in DirectML
- MLIR, Google's multi-level IR, enables multiple domain-specific dialects within a common framework and is experimenting with emitting Vulkan/SPIR-V

SPIR-V Ecosystem
- SPIR-V is the Khronos-defined cross-API intermediate representation, with native graphics and parallel compute support
- It is an easily parsed and extended 32-bit stream in which data-object and control-flow information is retained for effective code generation and translation
- Front-ends cover Khronos and third-party kernel and shader languages: glslang (GLSL), DXC (HLSL), clspv (OpenCL C for Vulkan), SYCL for OpenCL, and clang front-ends for OpenCL C, OpenCL C++, and ISO C++; SPIRV-Cross translates SPIR-V back out to shader languages such as MSL and HLSL
- Tooling includes the SPIR-V (dis)assembler, the SPIR-V validator, bi-directional LLVM-to-SPIR-V translators, and the SPIRV-opt and SPIRV-remap optimization tools; Khronos is cooperating closely with the clang/LLVM community
- The ecosystem spans third-party-hosted and Khronos-hosted open source projects, with IHV driver runtimes consuming SPIR-V; an environment spec for each target API is used to drive compilation
- https://github.com/KhronosGroup/SPIRV-Tools

PC/Mobile Platform Inferencing Stacks
- Each platform stack follows consistent steps:
  1. Import and optimize the trained NN model file
  2. Ingest the graph into the ML runtime
  3. Accelerate inferencing operations using the underlying low-level API
- Microsoft Windows: Windows Machine Learning (WinML), https://docs.microsoft.com/en-us/windows/uwp/machine-learning/
- Google Android: Neural Networks API (NNAPI), with TensorFlow Lite layered on top, https://developer.android.com/ndk/guides/neuralnetworks/
- Apple macOS and iOS: Core ML, consuming Core ML model files, https://developer.apple.com/documentation/coreml
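A minimal sketch of those three steps, using TensorFlow Lite as the ML runtime; the model path and dummy input are hypothetical, and on Android the runtime would attach an NNAPI or GPU delegate to get the hardware acceleration in step 3:

```python
# Load an already-converted model, ingest it into the runtime, run one inference.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # step 1: import
interpreter.allocate_tensors()                                # step 2: ingest

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Step 3: run inference (delegation to NNAPI/GPU happens inside the runtime).
dummy = np.zeros(input_details[0]["shape"], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```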
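And the promised toy sketch of step 2 from the Embedded NN Compilers page: a fusion pass over an entirely hypothetical linear-graph IR that collapses adjacent convolution and ReLU nodes, so the backend can emit one kernel instead of two. Real compilers fuse over arbitrary dataflow graphs; this only shows the idea.

```python
# Hypothetical toy IR and a single graph-level optimization: Conv+ReLU fusion.
from dataclasses import dataclass

@dataclass
class Node:
    op: str
    inputs: list

def fuse_conv_relu(graph: list) -> list:
    fused, skip = [], set()
    for i, node in enumerate(graph):
        if i in skip:
            continue
        nxt = graph[i + 1] if i + 1 < len(graph) else None
        if node.op == "conv" and nxt is not None and nxt.op == "relu":
            fused.append(Node("conv_relu", node.inputs))  # one fused kernel
            skip.add(i + 1)
        else:
            fused.append(node)
    return fused

graph = [Node("conv", ["x", "w"]), Node("relu", []), Node("pool", [])]
print([n.op for n in fuse_conv_relu(graph)])  # ['conv_relu', 'pool']
```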
Inferencing Engine Acceleration APIs
- Inferencing engines divide into cross-platform engines and platform engines (Android NNAPI, Microsoft WinML, Apple Core ML)
- Desktop IHV engines: AMD MIVisionX over MIOpen, Intel OpenVINO over clDNN, and NVIDIA TensorRT over cuDNN; MIVisionX and OpenVINO each provide both inferencing and vision acceleration
- Mobile and embedded engines: Arm Compute Library, Huawei MACE, Qualcomm Neural Processing SDK, Synopsys MetaWare EV Dev Toolkit, TI Deep Learning Library (TIDL), and VeriSilicon Acuity
- Almost all embedded inferencing engines use OpenCL to access the accelerator silicon
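Since OpenCL is the common access path to accelerator silicon, the first thing such an engine does is discover and claim a device. A minimal sketch using the third-party pyopencl bindings (an assumption for brevity; production engines typically call the OpenCL C API directly):

```python
# Enumerate OpenCL platforms/devices and create a context on one of them.
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(platform.name, "/", device.name,
              "- compute units:", device.max_compute_units)

# An inferencing engine would pick a device, then build and enqueue its kernels.
devices = cl.get_platforms()[0].get_devices()
ctx = cl.Context(devices=devices[:1])
queue = cl.CommandQueue(ctx)
```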