Khronos Openvx Webinar Sep2017

The OpenVX Standard for Portable, Efficient Vision Processing Frank Brill Radha Giduthuri Tom e r Schwartz Cadence AMD Intel September 2017 © Copyright Khronos Group 2017 - Page 1 Khronos Connects Software to Silicon Industry Consortium creating OPEN STANDARD APIs for hardware acceleration Any company is welcome – one company one vote ROYALTY-FREE specifications Conformance Tests and Adopters State-of-the art IP framework protects Software Programs for specification integrity members AND the standards and cross-vendor portability Low-level silicon APIs needed on International, non-profit organization every platform: graphics, parallel Membership and Adopters fees cover compute, vision, neural networks, Silicon operating and engineering expenses augmented and virtual reality… Strong industry momentum 100s of man years invested by industry experts Well over a BILLION people use Khronos APIs Every Day… © Copyright Khronos Group 2017 - Page 2 Khronos Open Standards 3D for the Web - VR/AR and games in-browser - Efficiently delivering runtime 3D assets Real-time 2D/3D - Virtual and Augmented Reality displays - Cross-platform gaming and UI - CAD and Product Design Vision and Neural Networks - Tracking and odometry - Scene analysis/understanding - Neural Network inferencing Parallel Computation - Machine Learning acceleration - Embedded vision processing - High Performance Computing (HPC) © Copyright Khronos Group 2017 - Page 3 OpenVX Vision Engines Wide range of vision hardware architectures Middleware OpenVX provides a high-level Graph-based abstraction -> Applications Enables Graph-level optimizations! Can be implemented on almost any hardware or processor! -> Portable, Efficient Vision Processing! Software GPU Portability DSP Hardware X100 Dedicated Hardware Vision DSPs X10 Vision GPU Node Compute Vision Vision Node Node Multi-core CNN CPU Nodes Power Efficiency X1 Computation Flexibility Vision Processing Graph © Copyright Khronos Group 2017 - Page 4 Neural Net Workflow with Khronos Standards Neural Network Training Third Vision/AI Party Applications Frameworks Tools Datasets Trained Vision and Neural Network Network Inferencing Runtime Network Model Structure Desktop and Cloud Hardware Embedded/Mobile Vision/Inferencing Embedded/Mobile Vision/Inferencing Embedded/MobileHardware Vision/Inferencing Embedded/Mobile/HardwareDesktop/Cloud Hardware OpenVX is an open, royalty-free standard for cross platform acceleration Vision/Inferencing Hardware of computer vision and neural network applications. GPU DSP CPU NNEF is a Cross-vendor Neural Net file format Custom FPGA Encapsulates network formal semantics, structure, data formats, commonly-used operations (such as convolution, pooling, normalization, etc.) © Copyright Khronos Group 2017 - Page 5 Problem: Neural Network Fragmentation Neural Network Inferencing Fragmentation toll on Applications Inference Engine 1 Hardware 1 Vision/AI Hardware 2 Application Inference Engine 2 Hardware 3 Every Application Needs know Inference Engine 3 about Every Accelerator API Neural Network Training and Inferencing Fragmentation NN Authoring Framework 1 Inference Engine 1 NN Authoring Framework 2 Inference Engine 2 NN Authoring Framework 3 Inference Engine 3 Every Tool Needs an Exporter to Every Accelerator © Copyright Khronos Group 2017 - Page 6 Solving Neural Network Fragmentation Neural Network Inferencing Fragmentation toll on Applications Inference Engine 1 Hardware 1 Vision/AI Hardware 2 Application Inference Engine 2 Inference Engine 3 Hardware 3 Neural Network Training and Inferencing Fragmentation NN Authoring Framework 1 Inference Engine 1 NN Authoring Framework 2 Inference Engine 2 NN Authoring Framework 3 Inference Engine 3 Optimization and processing tools © Copyright Khronos Group 2017 - Page 7 OpenVX in a nut-shell • OpenVX contains - a set of memory objects that abstract the physical memory - a library of predefined and customizable vision and neural network functions Image processing and image filtering Feature detectors Neural networks Control flow - a graph-based execution model to combine function enabling both task and data-independent execution • OpenVX is defined as a C API - object-oriented design - synchronous and asynchronous execution model - extend functionality using enum and callback functions © Copyright Khronos Group 2017 - Page 8 OpenVX - Graph-Level Abstraction • OpenVX developers express a graph of image operations (‘Nodes’) - Using a C API • Nodes can be executed on any hardware or processor coded in any language - Implementers can optimize under the high-level graph abstraction • Graphs are the key to run-time power and performance optimizations - E.g. Node fusion, tiled graph processing for cache efficiency etc. Pyr Camera OpenVX Nodes t Rendering Input Output RGB YUV Gray Array of Frame Color Frame Channel Frame Image Optical Keypoints Harris Conversion Extract Pyramid Flow Track Array of Features Ftr OpenVX Graph t-1 Feature Extraction Example Graph © Copyright Khronos Group 2017 - Page 9 OpenVX Efficiency through Graphs.. Graph Memory Kernel Data Scheduling Management Fusion Tiling Split the graph Execute a sub- Reuse execution across Replace a sub- graph at tile pre-allocated the whole graph with a granularity memory for system: CPU / single faster node instead of image multiple GPU / dedicated granularity intermediate data HW Faster execution Less allocation overhead, Better memory Better use of or lower power more memory for locality, less kernel data cache and consumption other applications launch overhead local memory © Copyright Khronos Group 2017 - Page 10 Simple Edge Detector in OpenVX vx_graph g = vxCreateGraph(); vx_image input = vxCreateImage(1920, 1080); Declare Input and Output Images vx_image output = vxCreateImage(1920, 1080); vx_image horiz = vxCreateVirtualImage(g); Declare Intermediate Images vx_image vert = vxCreateVirtualImage(g); vx_image mag = vxCreateVirtualImage(g); vxSobel3x3Node(g, input, horiz, vert); Construct the Graph topology vxMagnitudeNode(g, horiz, vert, mag); vxThresholdNode(g, mag, THRESH, output); Compile the Graph Execute the Graph status = vxVerifyGraph(g); h status = vxProcessGraph(g); i S M m T o v © Copyright Khronos Group 2017 - Page 11 OpenVX Evolution Conformant Conformant Implementations Implementations New Functionality Conditional node execution Feature detection Classification operators Expanded imaging operations New Functionality Extensions Under Discussion Neural Network Acceleration New Functionality NNEF Import Expanded Nodes Functionality Graph Save and Restore 16-bit image operation Enhanced Graph Framework Programmable user AMD OpenVX Tools kernels with accelerator offload - Open source, highly optimized Safety Critical for x86 CPU and OpenCL for GPU OpenVX 1.1 SC for - “Graph Optimizer” looks at Streaming/pipelining safety-certifiable systems entire processing pipeline and removes, replaces, merges functions to improve performance OpenVX and bandwidth Roadmap - Scripting for rapid prototyping, OpenVX 1.2 without re-compiling, at production performance levels Spec released May 2017 http://gpuopen.com/compute-product/amd-openvx/ OpenVX 1.0 OpenVX 1.1 Spec released October 2014 Spec released May 2016 © Copyright Khronos Group 2017 - Page 12 OpenVX SC - Safety Critical Vision Processing • OpenVX 1.1 - based on OpenVX 1.1 main specification - Enhanced determinism - Specification identifies and numbers requirements • MISRA C clean per KlocWorks v10 • Divides functionality into “development” and “deployment” feature sets - Adds requirement to support import/export extension OpenVX SC Verify Binary Import OpenVX SC Development Feature format Deployment Feature Set Set (Create Graph) Export (Execute Graph) Entire graph creation API Implementation No graph creation API dependent format © Copyright Khronos Group 2017 - Page 13 Benefits of Import-Export • Minimize overheads on the target system - Faster setup-time - Simpler OpenVX runtime - Simpler certification (ISO 26262, etc..) • Enable more graph optimizations when offline - No timing/resource constraints - Example: generate & compile specialized code for graphs © Copyright Khronos Group 2017 - Page 14 How OpenVX Compares to Alternatives Open standard API designed to be Community-driven, Open standard API designed to be Governance implemented and shipped by IHVs open source library implemented and shipped by IHVs Programming Graph defined with C API and then Immediate runtime function calls – Explicit kernels are compiled and Model compiled for run-time execution reading to and from memory executed via run-time API Built-in Vision Small but growing Vast. None. User programs their own or Functionality set of popular functions Mainly on PC/CPU call vision library over OpenCL Target Any combination of processors or Any heterogeneous combination of Mainly PCs and GPUs Hardware non-programmable hardware IEEE FP-capable processors Optimization Pre-declared graph enables Each function reads/writes memory. Any execution topology can be Opportunities significant optimizations Power performance inefficient explicitly programmed Implementations must pass Extensive Test Suite but Implementations must pass Conformance conformance to use trademark no formal Adopters program conformance to use trademark All core functions must be available Available functions vary depending on All core functions must be available Consistency in conformant implementations implementation / platform in all conformant implementations © Copyright Khronos Group 2017 - Page 15 Layered Vision/

Load more