Visionworks™ a Cuda Accelerated Computer Vision Library S6783
Total Page:16
File Type:pdf, Size:1020Kb
April 4-7, 2016 | Silicon Valley VISIONWORKS™ A CUDA ACCELERATED COMPUTER VISION LIBRARY S6783 Elif Albuz, April 4, 2016 Motivation Introduction to VisionWorks™ VisionWorks™ Software Stack AGENDA VisionWorks™ Programming Model Conclusion Demo 2 COMPUTER VISION Intelligent Video Analytics Autonomous Driving Robotics Drones Augmented Reality 3 COMPUTER VISION 4 COMPUTER VISION APP DEVELOPMENT Product Port to target & optimize Reference Implementation Concept 5 VISIONWORKS™ MOTIVATION Deliver high performance, robust computer vision primitives Depth Map Ease development of computer vision applications on Tegra platforms Optical Flow Accelerate prototype to product cycle Corner detection 6 VISIONWORKS™ AT A GLANCE CUDA accelerated library (OpenVX primitives + NVIDIA extensions + Plus Algorithms) Flexible framework for seamlessly adding user-defined primitives. Interoperability with OpenCV Thread-safe API Documentation, tutorials, sample software pipelines that teach use of primitives and framework 7 VISIONWORKS™ SUPPORTED PLATFORMS Automotive Embedded Desktop Drive PX JETSON TX1 Ubuntu Linux 14.04, Windows 8 JETSON TK1 Pro Drive PX2 JETSON TK1 8 VISIONWORKS™ TOOLKIT SOFTWARE STACK VisionWorks VisionWorks VisionWorks-Plus SfM . Object Tracker VisionWorks Source Samples NVXIO Source Samples Feature Tracking, Hough Transform, Stereo Depth Multimedia Extraction, Camera Hist Equalization.. Abstraction NVIDIA VisionWorks VisionWorks Core Library Framework & Primitive Extensions VisionWorks CUDA API OpenVX TM Framework & Primitives Khronos CUDA Acceleration Framework NVIDIA 9 VISIONWORKS™ PRIMITIVES IMAGE ARITHMETIC Stereo Block Matching Median Filter Absolute Difference IME Create Motion Field Scharr3x3 Accumulate Image IME Refine Motion Field Sobel 3x3 IME Partition Motion Field All OpenVX Accumulate Squared Accumulate Weighted Primitives FEATURES Add/ Subtract/ Multiply + GEOMETRIC Canny Edge Detector Channel Combine TRANSFORMS FAST Corners + Channel Extract Affine Warp + FAST Track Color Convert + Warp Perspective + Harris Corners + CopyImage Flip Image Harris Track Convert Depth Remap Hough Circles Magnitude Scale Image + Hough Lines MultiplyByScalar Not / Or / And / Xor ANALYSIS Phase FILTERS Histogram NVIDIA Table Lookup BoxFilter Histogram Equalization Threshold Convolution Extensions Integral Image Dilation Filter Mean Std Deviation FLOW & DEPTH Erosion Filter Gaussian Filter Min Max Locations Median Flow Gaussian Pyramid Optical Flow (LK) + + type/mode extension by NVIDIA Laplacian3x3 Semi-Global Matching NVIDIA extension primitives 10 VISIONWORKS™ PRIMITIVES • VisionWorks primitives are CUDA optimized All OpenVX (except MedianFlow & FindHomography extensions) Primitives • 85% of VisionWorks OpenVX API is also accelerated with NEON. Table of NEON optimized primitives are listed in VisionWorks Toolkit Ref. (Go to "VisionWorks API" -> "NVIDIA Extensions API" -> "Vision Primitives API”) • Primitive acceleration with VisionWorks NVIDIA • Up to 92x speedup compared to OpenCV CPU kernels on Drive PX (Ave 8x) Extensions • Up to 13x speedup compared to OpenCV CUDA kernels on Drive PX (Ave 2x) (Measured on Drive PX, OS=‘V4L' Linux Kernel='3.18.21-tegra-g06aec38' CPU Rate='1632 MHz' GPU Rate='844 MHz' EMC Rate='1600 MHz’) 11 VISIONWORKS™ SAMPLE APPLICATIONS Feature Tracker Stereo Depth OpenCV-NPP- Hough Lines & Extraction OpenVX Interop Circles + Video stabilization + Iterative Motion Estimation/Flow and other platform specific samples (available only on certain platforms) Camera Capture, OpenGL interop, Video playback 12 VISIONWORKS SAMPLE APPLICATIONS NVXIO MULTIMEDIA ABSTRACTION Camera input Interop/EGLStre Interop/EGLStre ams ams CSI ISP & Camera GFX Processing Render Vision processing Video/image file input CUDA Image/Video Image/Video Decode Encode . Streamed CPU COMPLEX video/image GPU (Multi-core NVXIO input ARM v8) AUDIO SECURITY VIDEO VIDEO 2D ENGINE ENGINE ENGINE ENCODER DECODER (VIC) (APE) SAFETY SAFETY BOOT PROC CAN PROC IMAGE ENGINE MANAGER (BPMP) (SPE) PROC (ISP) (SCE) (HSM) I/O 13 VISIONWORKS™ PLUS ALGORITHMS Structure From Motion Object Tracker 14 Programming with VisionWorks Library 15 VISIONWORKS™ PROGRAMMING MODEL VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode Standard specified Heterogeneous compute Direct CUDA API for heterogeneous API with graph advanced CUDA compute API with optimizations developers individual function calls Extensible with user defined nodes 16 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE OpenVX Immediate mode API enables developers to easily port their applications. OpenVX API Immediate mode calls are prefixed with “vxu” Ported Video Stabilization algorithm in OpenCV to VisionWorks Immediate Mode. OpenCV image Source Feature detection Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames Image Pyramid 17 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE Performance boost: Video stabilization application is accelerated by 2.6x (including the overhead for Mat to vx_image conversions) 1.4x OpenCV image Source Feature detection 0.6x 4.9x 2.3x 4.6x Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames 1.7x Image Pyramid 18 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE OpenVX API graph mode calls are prefixed with “vx” OpenVX Graph enables advanced optimizations • Buffer reuse, kernel fusion • Efficient use of streaming and CUDA textures • Automatic scheduling across processing units based on various factors (safety, perf,..) • Tiling and pipelining vision functions at sub-frame level Feature detection Image Processs pts Color Optical Warp Source & Find Conversion Flow Perspective Homography Stabilized frames Image Pyramid 19 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE Performance boost: Video stabilization application is further accelerated compared to immediate mode. Feature detection Processs pts Image Color Optical Warp & Find Source Conversion Flow Perspective Homography Stabilized frames Image Pyramid 20 VISIONWORKS CUDA API FEATURE TRACKING SAMPLE VisionWorks CUDA API enables developer with low-level access. Developer manages • Data allocations and transfer • Scheduling and pipelining Camera/image/video Input data YUV Gray Rendering/Output frame frame nvxcuColor nvxcuChannel nvxcuGaussian nvxcuOptica nvxcuHarris Convert Extract Pyramid lFlowPyrLK Track RGB frame Array of (CUDA buffer) keypoints 21 VISIONWORKS™ API SELECTION VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode Let the graph manager to Quick port from other Low level CUDA API hide overheads, optimize libraries access for advanced and manage data CUDA developers To be able to reassign CPU and GPU tasks based To be able to reassign CPU on perf. and GPU tasks based on perf. 22 DEBUGGING WITH VISIONWORKS™ Enable VisionWorks debug markers with “export NVX_PROF=nvtx” 23 VISIONWORKS™ DOCUMENTATION Installed location: /usr/share/visionWorks/docs 24 VISIONWORKS™ FACTS First Khronos OpenVX™ 1.0 compliant library (Jan 2015) VisionWorks enables key demos (CES’16 and more at GTC) 27K downloads (embedded) since release in Nov, 2015 + Installed by default on all automotive platforms Weekly VisionWorks downloads for various platforms 25 CONCLUSION • VisionWorks Toolkit delivers multiple levels of API – OpenVX Immediate Mode, OpenVX Graph Mode, VisionWorks CUDA API • Heterogeneous API enables switching from GPU to CPU – this is very powerful, reducing productization time • Delivers high performance – Offers significant speedup over CUDA optimized OpenCV functions • Adopts native media APIs on Tegra platforms and delivers ready to use code samples S6739-VisionWorks™ L6129-VisionWorks™ H6115 - Designing Toolkit Programming Toolkit LAB Session Computer Vision Tutorial Applications with – VisionWorks™ Room LL20A Room 210C Pod B 26 RESOURCES & USEFUL LINKS http://www.embedded-vision.com/ https://www.khronos.org/openvx/ https://developer.nvidia.com/embedded/visionworks VisionWorks Webinars - https://developer.nvidia.com/embedded/learn/tutorials 27 FULLY CONVOLUTIONAL NETWORK [1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). [3] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset," in CVPR Workshop on The Future of Datasets in Vision, 2015. 2015. VISIONWORKS WITH DEEP LEARNING DEMO 28 FULLY CONVOLUTIONAL NETWORK [1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). 2015. DEEP LEARNING & VISION DEMO 29 Introduction VisionWorks API OpenVX Sample Overview 30 VISIONWORKS™ Sample Applications Feature tracking Hough Lines with compressed Histogram Eq . w/Camera input with decoded video images Source Samples with multimedia I/0 NVXIO (Multimedia Abstraction) Platform Software Stack (Multimedia, Interop, GL, UI, System) 31 PLATFORMS & MULTIMEDIA API Platform Camera Decode Interop Render Encode Android Camera