Standards for Vision Processing and Neural Networks
Total Page:16
File Type:pdf, Size:1020Kb
Standards for Vision Processing and Neural Networks Radhakrishna Giduthuri, AMD [email protected] © Copyright Khronos Group 2017 - Page 1 Agenda • Why we need a standard? • Khronos NNEF • Khronos OpenVX dog Network Architecture Pre-trained Network Model (weights, …) © Copyright Khronos Group 2017 - Page 2 Neural Network End-to-End Workflow Neural Network Third Vision/AI Party Applications Training Frameworks Tools Datasets Trained Vision and Neural Network Network Inferencing Runtime Network Model Architecture Desktop and Cloud Hardware Embedded/Mobile Embedded/MobileEmbedded/Mobile Embedded/Mobile/Desktop/CloudVision/InferencingVision/Inferencing Hardware Hardware cuDNN MIOpen MKL-DNN Vision/InferencingVision/Inferencing Hardware Hardware GPU DSP CPU Custom FPGA © Copyright Khronos Group 2017 - Page 3 Problem: Neural Network Fragmentation Neural Network Training and Inferencing Fragmentation NN Authoring Framework 1 Inference Engine 1 NN Authoring Framework 2 Inference Engine 2 NN Authoring Framework 3 Inference Engine 3 Every Tool Needs an Exporter to Every Accelerator Neural Network Inferencing Fragmentation toll on Applications Inference Engine 1 Hardware 1 Vision/AI Inference Engine 2 Hardware 2 Application Inference Engine 3 Hardware 3 Every Application Needs know about Every Accelerator API © Copyright Khronos Group 2017 - Page 4 Khronos APIs Connect Software to Silicon Software Silicon Khronos is an International Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for 3D graphics, Virtual and Augmented Reality, Parallel Computing, Vision Processing and Neural Networks © Copyright Khronos Group 2017 - Page 5 NNEF - Solving Neural Network Fragmentation Before NNEF – NN Training and Inferencing Fragmentation NN Authoring Framework 1 Inference Engine 1 NN Authoring Framework 2 Inference Engine 2 NN Authoring Framework 3 Inference Engine 3 Every Tool Needs an Exporter to Every Accelerator With NNEF- NN Training and Inferencing Interoperability NN Authoring Framework 1 Inference Engine 1 NN Authoring Framework 2 Inference Engine 2 NN Authoring Framework 3 Inference Engine 3 Optimization and processing tools NNEF is a Cross-vendor Neural Net file format Encapsulates network formal semantics, structure, data formats, commonly-used operations (such as convolution, pooling, normalization, etc.) © Copyright Khronos Group 2017 - Page 6 OpenVX - Solving Inferencing Fragmentation Before OpenVX – Vision and NN Inferencing Fragmentation Inference Engine 1 Hardware 1 Vision/AI Inference Engine 2 Hardware 2 Application Inference Engine 3 Hardware 3 Every Application Needs know about Every Accelerator API With OpenVX – Vision and NN Inferencing Interoperability Inference Engine 1 Hardware 1 Vision/AI Inference Engine 2 Hardware 2 Application Inference Engine 3 Hardware 3 OpenVX is an open, royalty-free standard for cross platform acceleration of computer vision and neural network applications. © Copyright Khronos Group 2017 - Page 7 Neural Net Workflow with Khronos Standards Neural Network Third Vision/AI Party Applications Training Frameworks Tools Datasets Trained Vision and Neural Network Network Inferencing Runtime Network Model Architecture Desktop and Cloud Hardware Embedded/Mobile Embedded/MobileEmbedded/Mobile Embedded/Mobile/Desktop/CloudVision/InferencingVision/Inferencing Hardware Hardware cuDNN MIOpen MKL-DNN Vision/InferencingVision/Inferencing Hardware Hardware GPU DSP CPU Custom FPGA © Copyright Khronos Group 2017 - Page 8 Application Development Workflow Vision/AI Applications Structure plus weights (quantized or not). Trained Can be augmented with private data Format Graph Inference Compiler Engine Training Framework Vendor Structure only Format OR Third Structure plus weights Party (quantized or not) Tools © Copyright Khronos Group 2017 - Page 9 NNEF Status and Roadmap • V1.0 is under development, industry comments are being sought now - NNEF has formed an advisory panel, you are invited today to participate • First version will focus on interface between framework and embedded inference engines - But will allow training as secondary goal • Support ‘First cut’ range of network types - Field is moving very fast but we aim to keep up with developments • NNEF Roadmap - Track development of new network types - Allow authoring and retraining (3rd party tools) - Address a wider range of applications (outside vision apps) - Increase the expressive power of the format © Copyright Khronos Group 2017 - Page 10 Khronos Advisory Panels Specification drafts and Requirements and invitations for feedback on requirements and feedback specification drafts Working Shared Email Advisory list and Group Repository Panel Hosted by Khronos. Khronos Members Under Khronos NDA Khronos Advisors Any company can join Invited industry experts Membership Fee $0 Cost Sign NDA and IP Framework Sign NDA and IP Framework Advisory Panels Active for NNEF, Vulkan, and OpenCL/SYCL © Copyright Khronos Group 2017 - Page 11 An open, royalty-free standard for cross platform acceleration of computer vision and neural network applications. © Copyright Khronos Group 2017 - Page 12 OpenVX Vision Engines Wide range of vision hardware architectures Middleware OpenVX provides a high-level Graph-based abstraction -> Applications Enables Graph-level optimizations! Can be implemented on almost any hardware or processor! -> Portable, Efficient Vision Processing! Software GPU Portability DSP Hardware X100 Dedicated Hardware Vision DSPs X10 Vision GPU Node Compute Vision Vision Node Node Multi-core CNN CPU Nodes Power Efficiency Power X1 Computation Flexibility Vision Processing Graph © Copyright Khronos Group 2017 - Page 13 OpenVX - Graph-Level Abstraction • OpenVX developers express a graph of image operations (‘Nodes’) - Using a C API • Nodes can be executed on any hardware or processor coded in any language - Implementers can optimize under the high-level graph abstraction • Graphs are the key to run-time power and performance optimizations - E.g. Node fusion, tiled graph processing for cache efficiency etc. Pyr Camera OpenVX Nodes t Rendering Input Output RGB YUV Gray Array of Frame Color Frame Channel Frame Image Optical Keypoints Harris Conversion Extract Pyramid Flow Track Array of Features Ftr OpenVX Graph t-1 Feature Extraction Example Graph © Copyright Khronos Group 2017 - Page 14 OpenVX Efficiency through Graphs.. Graph Memory Kernel Data Scheduling Management Fusion Tiling Split the graph Execute a sub- Reuse execution across Replace a sub- graph at tile pre-allocated the whole graph with a granularity memory for system: CPU / single faster node instead of image multiple GPU / dedicated granularity intermediate data HW Faster execution Less allocation overhead, Better memory Better use of or lower power more memory for locality, less kernel data cache and consumption other applications launch overhead local memory © Copyright Khronos Group 2017 - Page 15 Simple Edge Detector in OpenVX vx_graph g = vxCreateGraph(); vx_image input = vxCreateImage(1920, 1080); Declare Input and Output Images vx_image output = vxCreateImage(1920, 1080); vx_image horiz = vxCreateVirtualImage(g); Declare Intermediate Images vx_image vert = vxCreateVirtualImage(g); vx_image mag = vxCreateVirtualImage(g); vxSobel3x3Node(g, input, horiz, vert); Construct the Graph topology vxMagnitudeNode(g, horiz, vert, mag); vxThresholdNode(g, mag, THRESH, output); Compile the Graph Execute the Graph status = vxVerifyGraph(g); h status = vxProcessGraph(g); i S M m T o v © Copyright Khronos Group 2017 - Page 16 OpenVX Evolution Conformant Conformant Implementations Implementations New Functionality Conditional node execution Feature detection Classification operators Expanded imaging operations New Functionality Extensions Under Discussion Neural Network Acceleration New Functionality NNEF Import Expanded Nodes Functionality Graph Save and Restore 16-bit image operation Enhanced Graph Framework Programmable user AMD OpenVX Tools kernels with accelerator offload - Open source, highly optimized Safety Critical for x86 CPU and OpenCL for GPU OpenVX 1.1 SC for - “Graph Optimizer” looks at Streaming/pipelining safety-certifiable systems entire processing pipeline and removes, replaces, merges functions to improve performance OpenVX and bandwidth Roadmap - Scripting for rapid prototyping, OpenVX 1.2 without re-compiling, at production performance levels Spec released May 2017 http://gpuopen.com/compute-product/amd-openvx/ OpenVX 1.0 OpenVX 1.1 Spec released October 2014 Spec released May 2016 © Copyright Khronos Group 2017 - Page 17 AMD’s open-source implementation • Highly-optimized for x86 CPU and OpenCL GPU • Available on github.com http://github.com/GPUOpen-ProfessionalCompute-Libraries/amdovx-modules • Additional modules • LOOM: highly optimized library for real-time 360 degree video stitching • NN: neural network module built on top MIOpen (OpenCL-based machine learning primitives) Includes a tool to import pre-trained Caffe models into NN (develop branch) … © Copyright Khronos Group 2017 - Page 18 New OpenVX 1.2 Functions • Feature detection: find features useful for object detection and recognition - Histogram of gradients – HOG • Template matching - Local binary patterns – LBP • Line finding • Classification: detect and recognize objects in an image based on a set of features - Import a classifier model trained offline - Classify objects based on a set of input features • Image Processing: transform an