April 4-7, 2016 | Silicon Valley
VISIONWORKS™ A CUDA ACCELERATED COMPUTER VISION LIBRARY S6783
Elif Albuz, April 4, 2016 Motivation Introduction to VisionWorks™ VisionWorks™ Software Stack AGENDA VisionWorks™ Programming Model Conclusion Demo
2 COMPUTER VISION
Intelligent Video Analytics
Autonomous Driving
Robotics Drones
Augmented Reality
3 COMPUTER VISION
4 COMPUTER VISION APP DEVELOPMENT
Product
Port to target & optimize
Reference Implementation
Concept
5 VISIONWORKS™ MOTIVATION
Deliver high performance, robust computer vision primitives Depth Map
Ease development of computer vision applications on Tegra platforms Optical Flow
Accelerate prototype to product cycle
Corner detection
6 VISIONWORKS™ AT A GLANCE
CUDA accelerated library (OpenVX primitives + NVIDIA extensions + Plus Algorithms)
Flexible framework for seamlessly adding user-defined primitives. Interoperability with OpenCV
Thread-safe API
Documentation, tutorials, sample software pipelines that teach use of primitives and framework
7 VISIONWORKS™ SUPPORTED PLATFORMS
Automotive Embedded Desktop
Drive PX JETSON TX1
Ubuntu Linux 14.04, Windows 8
JETSON TK1 Pro Drive PX2 JETSON TK1
8 VISIONWORKS™ TOOLKIT SOFTWARE STACK
VisionWorks VisionWorks VisionWorks-Plus SfM . . . Object Tracker
VisionWorks Source Samples NVXIO Source Samples Feature Tracking, Hough Transform, Stereo Depth Multimedia Extraction, Camera Hist Equalization.. Abstraction
VisionWorks Core NVIDIA VisionWorks
Library Framework & Primitive Extensions VisionWorks CUDA API OpenVX TM Framework & Primitives Khronos
CUDA Acceleration Framework NVIDIA
9
VISIONWORKS™ PRIMITIVES IMAGE ARITHMETIC Stereo Block Matching Median Filter Absolute Difference IME Create Motion Field Scharr3x3 Accumulate Image IME Refine Motion Field Sobel 3x3 IME Partition Motion Field All OpenVX Accumulate Squared Accumulate Weighted Primitives FEATURES Add/ Subtract/ Multiply + GEOMETRIC Canny Edge Detector Channel Combine TRANSFORMS FAST Corners + Channel Extract Affine Warp + FAST Track Color Convert + Warp Perspective + Harris Corners + CopyImage Flip Image Harris Track Convert Depth Remap Hough Circles Magnitude Scale Image + Hough Lines MultiplyByScalar
Not / Or / And / Xor ANALYSIS Phase FILTERS Histogram NVIDIA Table Lookup BoxFilter Histogram Equalization Threshold Convolution Extensions Integral Image Dilation Filter Mean Std Deviation FLOW & DEPTH Erosion Filter Gaussian Filter Min Max Locations Median Flow Gaussian Pyramid Optical Flow (LK) + + type/mode extension by NVIDIA Laplacian3x3 Semi-Global Matching NVIDIA extension primitives
10 VISIONWORKS™ PRIMITIVES
• VisionWorks primitives are CUDA optimized All OpenVX (except MedianFlow & FindHomography extensions) Primitives • 85% of VisionWorks OpenVX API is also accelerated with NEON. Table of NEON optimized primitives are listed in VisionWorks Toolkit Ref. (Go to "VisionWorks API" -> "NVIDIA Extensions API" -> "Vision Primitives API”)
• Primitive acceleration with VisionWorks NVIDIA • Up to 92x speedup compared to OpenCV CPU kernels on Drive PX (Ave 8x) Extensions • Up to 13x speedup compared to OpenCV CUDA kernels on Drive PX (Ave 2x) (Measured on Drive PX, OS=‘V4L' Linux Kernel='3.18.21-tegra-g06aec38' CPU Rate='1632 MHz' GPU Rate='844 MHz' EMC Rate='1600 MHz’)
11
VISIONWORKS™ SAMPLE APPLICATIONS
Feature Tracker Stereo Depth OpenCV-NPP- Hough Lines & Extraction OpenVX Interop Circles
+ Video stabilization
+ Iterative Motion Estimation/Flow
and other platform specific samples (available only on certain platforms)
Camera Capture, OpenGL interop, Video playback 12 VISIONWORKS SAMPLE APPLICATIONS NVXIO MULTIMEDIA ABSTRACTION
Camera input Interop/EGLStre Interop/EGLStre ams ams
CSI ISP & Camera GFX Processing Render
Vision processing
Video/image file input CUDA Image/Video Image/Video Decode Encode . . .
Streamed CPU COMPLEX video/image GPU (Multi-core NVXIO input ARM v8)
AUDIO SECURITY VIDEO VIDEO 2D ENGINE ENGINE ENGINE ENCODER DECODER (VIC) (APE)
SAFETY SAFETY BOOT PROC CAN PROC IMAGE ENGINE MANAGER (BPMP) (SPE) PROC (ISP) (SCE) (HSM)
I/O 13 VISIONWORKS™ PLUS ALGORITHMS
Structure From Motion Object Tracker
14 Programming with VisionWorks Library
15 VISIONWORKS™ PROGRAMMING MODEL
VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode
Standard specified Heterogeneous compute Direct CUDA API for heterogeneous API with graph advanced CUDA compute API with optimizations developers individual function calls Extensible with user defined nodes 16 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE OpenVX Immediate mode API enables developers to easily port their applications.
OpenVX API Immediate mode calls are prefixed with “vxu”
Ported Video Stabilization algorithm in OpenCV to VisionWorks Immediate Mode.
OpenCV image Source Feature detection
Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames
Image Pyramid
17 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE
Performance boost: Video stabilization application is accelerated by 2.6x (including the overhead for Mat to vx_image conversions)
1.4x OpenCV image Source Feature detection 0.6x 4.9x 2.3x 4.6x Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames 1.7x Image Pyramid
18 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE OpenVX API graph mode calls are prefixed with “vx” OpenVX Graph enables advanced optimizations • Buffer reuse, kernel fusion • Efficient use of streaming and CUDA textures • Automatic scheduling across processing units based on various factors (safety, perf,..) • Tiling and pipelining vision functions at sub-frame level
Feature detection
Image Processs pts Color Optical Warp Source & Find Conversion Flow Perspective Homography Stabilized frames
Image Pyramid
19 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE
Performance boost: Video stabilization application is further accelerated compared to immediate mode.
Feature detection Processs pts Image Color Optical Warp & Find Source Conversion Flow Perspective Homography Stabilized frames
Image Pyramid
20 VISIONWORKS CUDA API FEATURE TRACKING SAMPLE
VisionWorks CUDA API enables developer with low-level access. Developer manages • Data allocations and transfer • Scheduling and pipelining
Camera/image/video Input data YUV Gray Rendering/Output frame frame nvxcuColor nvxcuChannel nvxcuGaussian nvxcuOptica nvxcuHarris Convert Extract Pyramid lFlowPyrLK Track
RGB frame Array of (CUDA buffer) keypoints
21 VISIONWORKS™ API SELECTION
VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode
Let the graph manager to Quick port from other Low level CUDA API hide overheads, optimize libraries access for advanced and manage data CUDA developers To be able to reassign CPU and GPU tasks based To be able to reassign CPU on perf. and GPU tasks based on perf. 22 DEBUGGING WITH VISIONWORKS™
Enable VisionWorks debug markers with “export NVX_PROF=nvtx” 23 VISIONWORKS™ DOCUMENTATION
Installed location: /usr/share/visionWorks/docs 24 VISIONWORKS™ FACTS First Khronos OpenVX™ 1.0 compliant library (Jan 2015)
VisionWorks enables key demos (CES’16 and more at GTC)
27K downloads (embedded) since release in Nov, 2015 + Installed by default on all automotive platforms
Weekly VisionWorks downloads for various platforms
25 CONCLUSION
• VisionWorks Toolkit delivers multiple levels of API – OpenVX Immediate Mode, OpenVX Graph Mode, VisionWorks CUDA API • Heterogeneous API enables switching from GPU to CPU – this is very powerful, reducing productization time • Delivers high performance – Offers significant speedup over CUDA optimized OpenCV functions • Adopts native media APIs on Tegra platforms and delivers ready to use code samples
S6739-VisionWorks™ L6129-VisionWorks™ H6115 - Designing Toolkit Programming Toolkit LAB Session Computer Vision Tutorial Applications with – VisionWorks™ Room LL20A Room 210C Pod B 26 RESOURCES & USEFUL LINKS http://www.embedded-vision.com/ https://www.khronos.org/openvx/ https://developer.nvidia.com/embedded/visionworks
VisionWorks Webinars - https://developer.nvidia.com/embedded/learn/tutorials
27 FULLY CONVOLUTIONAL NETWORK
[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). [3] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset," in CVPR Workshop on The Future of Datasets in Vision, 2015. 2015.
VISIONWORKS WITH DEEP LEARNING DEMO 28 FULLY CONVOLUTIONAL NETWORK
[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). 2015. DEEP LEARNING & VISION DEMO 29 Introduction VisionWorks API OpenVX Sample Overview
30 VISIONWORKS™ Sample Applications
Feature tracking Hough Lines with compressed Histogram Eq . . . w/Camera input with decoded video images Source Samples with multimedia I/0
NVXIO (Multimedia Abstraction)
Platform Software Stack (Multimedia, Interop, GL, UI, System)
31 PLATFORMS & MULTIMEDIA API
Platform Camera Decode Interop Render Encode Android Camera Android API CUDA-OpenGL OpenGLES (?) Android HAL v3.0 interop? 3.0 NvMedia capture NvMedia +Gst EGLStreams OpenGLES Gst Vibrante NvMedia h264 ES (GLFW) Linux4Tegra Gst-capture Gst+OpenMAX EGLStreams OpenGLES Gst+OpenMAX Ubuntu Linux V4L through Gst+VDPAU CUDA-OpenGL OpenGL Gst 14.04 OpenCV4Tegra Interop V4W/OpenCV NVCUVID (Gst?) CUDA-OpenGL OpenGL Ffmeg/OpenCV Windows x64 Interop
Gst - Gstreamer 32 “Multi-quote slide sample.”
— Source: Either a name or publication text here, OR, a company logo to the right
“Multi-quote slide sample.”
— Source: Either a name or publication text here, OR, a company logo to the right
“Multi-quote slide sample.”
— Source: Either a name or publication text here, OR, a company logo to the right
33