April 4-7, 2016 | Silicon Valley

VISIONWORKS™ A CUDA ACCELERATED COMPUTER VISION S6783

Elif Albuz, April 4, 2016 Motivation Introduction to VisionWorks™ VisionWorks™ Software Stack AGENDA VisionWorks™ Programming Model Conclusion Demo

2 COMPUTER VISION

Intelligent Video Analytics

Autonomous Driving

Robotics Drones

Augmented Reality

3 COMPUTER VISION

4 COMPUTER VISION APP DEVELOPMENT

Product

Port to target & optimize

Reference Implementation

Concept

5 VISIONWORKS™ MOTIVATION

Deliver high performance, robust computer vision primitives Depth Map

Ease development of computer vision applications on platforms Optical Flow

Accelerate prototype to product cycle

Corner detection

6 VISIONWORKS™ AT A GLANCE

CUDA accelerated library (OpenVX primitives + extensions + Plus Algorithms)

Flexible framework for seamlessly adding user-defined primitives. Interoperability with OpenCV

Thread-safe API

Documentation, tutorials, sample software pipelines that teach use of primitives and framework

7 VISIONWORKS™ SUPPORTED PLATFORMS

Automotive Embedded Desktop

Drive PX JETSON TX1

Ubuntu 14.04, Windows 8

JETSON TK1 Pro  Drive PX2 JETSON TK1

8 VISIONWORKS™ TOOLKIT SOFTWARE STACK

VisionWorks VisionWorks VisionWorks-Plus SfM . . . Object Tracker

VisionWorks Source Samples NVXIO Source Samples Feature Tracking, Hough Transform, Stereo Depth Multimedia Extraction, Camera Hist Equalization.. Abstraction

VisionWorks Core NVIDIA VisionWorks

Library Framework & Primitive Extensions VisionWorks CUDA API OpenVX TM Framework & Primitives Khronos

CUDA Acceleration Framework NVIDIA

9

VISIONWORKS™ PRIMITIVES IMAGE ARITHMETIC Stereo Block Matching Median Filter Absolute Difference IME Create Motion Field Scharr3x3 Accumulate Image IME Refine Motion Field Sobel 3x3 IME Partition Motion Field All OpenVX Accumulate Squared Accumulate Weighted Primitives FEATURES Add/ Subtract/ Multiply + GEOMETRIC Canny Edge Detector Channel Combine TRANSFORMS FAST Corners + Channel Extract Affine Warp + FAST Track Color Convert + Warp Perspective + Harris Corners + CopyImage Flip Image Harris Track Convert Depth Remap Hough Circles Magnitude Scale Image + Hough Lines MultiplyByScalar

Not / Or / And / Xor ANALYSIS Phase FILTERS Histogram NVIDIA Table Lookup BoxFilter Histogram Equalization Threshold Convolution Extensions Integral Image Dilation Filter Mean Std Deviation FLOW & DEPTH Erosion Filter Gaussian Filter Min Max Locations Median Flow Gaussian Pyramid Optical Flow (LK) + + type/mode extension by NVIDIA Laplacian3x3 Semi-Global Matching NVIDIA extension primitives

10 VISIONWORKS™ PRIMITIVES

• VisionWorks primitives are CUDA optimized All OpenVX (except MedianFlow & FindHomography extensions) Primitives • 85% of VisionWorks OpenVX API is also accelerated with NEON. Table of NEON optimized primitives are listed in VisionWorks Toolkit Ref. (Go to "VisionWorks API" -> "NVIDIA Extensions API" -> "Vision Primitives API”)

• Primitive acceleration with VisionWorks NVIDIA • Up to 92x speedup compared to OpenCV CPU kernels on Drive PX (Ave 8x) Extensions • Up to 13x speedup compared to OpenCV CUDA kernels on Drive PX (Ave 2x) (Measured on Drive PX, OS=‘V4L' Linux Kernel='3.18.21-tegra-g06aec38' CPU Rate='1632 MHz' GPU Rate='844 MHz' EMC Rate='1600 MHz’)

11

VISIONWORKS™ SAMPLE APPLICATIONS

Feature Tracker Stereo Depth OpenCV-NPP- Hough Lines & Extraction OpenVX Interop Circles

+ Video stabilization

+ Iterative Motion Estimation/Flow

and other platform specific samples (available only on certain platforms)

Camera Capture, OpenGL interop, Video playback 12 VISIONWORKS SAMPLE APPLICATIONS NVXIO MULTIMEDIA ABSTRACTION

Camera input Interop/EGLStre Interop/EGLStre ams ams

CSI ISP & Camera GFX Processing Render

Vision processing

Video/image file input CUDA Image/Video Image/Video Decode Encode . . .

Streamed CPU COMPLEX video/image GPU (Multi-core NVXIO input ARM v8)

AUDIO SECURITY VIDEO VIDEO 2D ENGINE ENGINE ENGINE ENCODER DECODER (VIC) (APE)

SAFETY SAFETY BOOT PROC CAN PROC IMAGE ENGINE MANAGER (BPMP) (SPE) PROC (ISP) (SCE) (HSM)

I/O 13 VISIONWORKS™ PLUS ALGORITHMS

Structure From Motion Object Tracker

14 Programming with VisionWorks Library

15 VISIONWORKS™ PROGRAMMING MODEL

VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode

Standard specified Heterogeneous compute Direct CUDA API for heterogeneous API with graph advanced CUDA compute API with optimizations developers individual function  calls Extensible with user defined nodes 16 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE OpenVX Immediate mode API enables developers to easily port their applications.

OpenVX API Immediate mode calls are prefixed with “vxu”

Ported Video Stabilization algorithm in OpenCV to VisionWorks Immediate Mode.

OpenCV image Source Feature detection

Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames

Image Pyramid

17 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE

Performance boost: Video stabilization application is accelerated by 2.6x (including the overhead for Mat to vx_image conversions)

1.4x OpenCV image Source Feature detection 0.6x 4.9x 2.3x 4.6x Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames 1.7x Image Pyramid

18 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE OpenVX API graph mode calls are prefixed with “vx” OpenVX Graph enables advanced optimizations • Buffer reuse, kernel fusion • Efficient use of streaming and CUDA textures • Automatic scheduling across processing units based on various factors (safety, perf,..) • Tiling and pipelining vision functions at sub-frame level

Feature detection

Image Processs pts Color Optical Warp Source & Find Conversion Flow Perspective Homography Stabilized frames

Image Pyramid

19 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE

Performance boost: Video stabilization application is further accelerated compared to immediate mode.

Feature detection Processs pts Image Color Optical Warp & Find Source Conversion Flow Perspective Homography Stabilized frames

Image Pyramid

20 VISIONWORKS CUDA API FEATURE TRACKING SAMPLE

VisionWorks CUDA API enables developer with low-level access. Developer manages • Data allocations and transfer • Scheduling and pipelining

Camera/image/video Input data YUV Gray Rendering/Output frame frame nvxcuColor nvxcuChannel nvxcuGaussian nvxcuOptica nvxcuHarris Convert Extract Pyramid lFlowPyrLK Track

RGB frame Array of (CUDA buffer) keypoints

21 VISIONWORKS™ API SELECTION

VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode

Let the graph manager to Quick port from other Low level CUDA API hide overheads, optimize libraries access for advanced and manage data  CUDA developers  To be able to reassign CPU and GPU tasks based To be able to reassign CPU on perf. and GPU tasks based on perf. 22 DEBUGGING WITH VISIONWORKS™

Enable VisionWorks debug markers with “export NVX_PROF=nvtx” 23 VISIONWORKS™ DOCUMENTATION

Installed location: /usr/share/visionWorks/docs 24 VISIONWORKS™ FACTS First Khronos OpenVX™ 1.0 compliant library (Jan 2015)

VisionWorks enables key demos (CES’16 and more at GTC)

27K downloads (embedded) since release in Nov, 2015 + Installed by default on all automotive platforms

Weekly VisionWorks downloads for various platforms

25 CONCLUSION

• VisionWorks Toolkit delivers multiple levels of API – OpenVX Immediate Mode, OpenVX Graph Mode, VisionWorks CUDA API • Heterogeneous API enables switching from GPU to CPU – this is very powerful, reducing productization time • Delivers high performance – Offers significant speedup over CUDA optimized OpenCV functions • Adopts native media on Tegra platforms and delivers ready to use code samples

S6739-VisionWorks™ L6129-VisionWorks™ H6115 - Designing Toolkit Programming Toolkit LAB Session Computer Vision Tutorial Applications with – VisionWorks™ Room LL20A Room 210C Pod B 26 RESOURCES & USEFUL LINKS http://www.embedded-vision.com/ https://www.khronos.org/openvx/ https://developer.nvidia.com/embedded/visionworks

VisionWorks Webinars - https://developer.nvidia.com/embedded/learn/tutorials

27 FULLY CONVOLUTIONAL NETWORK

[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). [3] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset," in CVPR Workshop on The Future of Datasets in Vision, 2015. 2015.

VISIONWORKS WITH DEEP LEARNING DEMO 28 FULLY CONVOLUTIONAL NETWORK

[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). 2015. DEEP LEARNING & VISION DEMO 29 Introduction VisionWorks API OpenVX Sample Overview

30 VISIONWORKS™ Sample Applications

Feature tracking Hough Lines with compressed Histogram Eq . . . w/Camera input with decoded video images Source Samples with multimedia I/0

NVXIO (Multimedia Abstraction)

Platform Software Stack (Multimedia, Interop, GL, UI, System)

31 PLATFORMS & MULTIMEDIA API

Platform Camera Decode Interop Render Encode Android Camera Android API CUDA-OpenGL OpenGLES (?) Android HAL v3.0 interop? 3.0 NvMedia capture NvMedia +Gst EGLStreams OpenGLES Gst Vibrante NvMedia h264 ES (GLFW) Linux4Tegra Gst-capture Gst+OpenMAX EGLStreams OpenGLES Gst+OpenMAX Ubuntu Linux V4L through Gst+VDPAU CUDA-OpenGL OpenGL Gst 14.04 OpenCV4Tegra Interop V4W/OpenCV NVCUVID (Gst?) CUDA-OpenGL OpenGL Ffmeg/OpenCV Windows x64 Interop

Gst - Gstreamer 32 “Multi-quote slide sample.”

— Source: Either a name or publication text here, OR, a company logo to the right

“Multi-quote slide sample.”

— Source: Either a name or publication text here, OR, a company logo to the right

“Multi-quote slide sample.”

— Source: Either a name or publication text here, OR, a company logo to the right

33