Visionworks™ a Cuda Accelerated Computer Vision Library S6783

Total Page:16

File Type:pdf, Size:1020Kb

Visionworks™ a Cuda Accelerated Computer Vision Library S6783 April 4-7, 2016 | Silicon Valley VISIONWORKS™ A CUDA ACCELERATED COMPUTER VISION LIBRARY S6783 Elif Albuz, April 4, 2016 Motivation Introduction to VisionWorks™ VisionWorks™ Software Stack AGENDA VisionWorks™ Programming Model Conclusion Demo 2 COMPUTER VISION Intelligent Video Analytics Autonomous Driving Robotics Drones Augmented Reality 3 COMPUTER VISION 4 COMPUTER VISION APP DEVELOPMENT Product Port to target & optimize Reference Implementation Concept 5 VISIONWORKS™ MOTIVATION Deliver high performance, robust computer vision primitives Depth Map Ease development of computer vision applications on Tegra platforms Optical Flow Accelerate prototype to product cycle Corner detection 6 VISIONWORKS™ AT A GLANCE CUDA accelerated library (OpenVX primitives + NVIDIA extensions + Plus Algorithms) Flexible framework for seamlessly adding user-defined primitives. Interoperability with OpenCV Thread-safe API Documentation, tutorials, sample software pipelines that teach use of primitives and framework 7 VISIONWORKS™ SUPPORTED PLATFORMS Automotive Embedded Desktop Drive PX JETSON TX1 Ubuntu Linux 14.04, Windows 8 JETSON TK1 Pro Drive PX2 JETSON TK1 8 VISIONWORKS™ TOOLKIT SOFTWARE STACK VisionWorks VisionWorks VisionWorks-Plus SfM . Object Tracker VisionWorks Source Samples NVXIO Source Samples Feature Tracking, Hough Transform, Stereo Depth Multimedia Extraction, Camera Hist Equalization.. Abstraction NVIDIA VisionWorks VisionWorks Core Library Framework & Primitive Extensions VisionWorks CUDA API OpenVX TM Framework & Primitives Khronos CUDA Acceleration Framework NVIDIA 9 VISIONWORKS™ PRIMITIVES IMAGE ARITHMETIC Stereo Block Matching Median Filter Absolute Difference IME Create Motion Field Scharr3x3 Accumulate Image IME Refine Motion Field Sobel 3x3 IME Partition Motion Field All OpenVX Accumulate Squared Accumulate Weighted Primitives FEATURES Add/ Subtract/ Multiply + GEOMETRIC Canny Edge Detector Channel Combine TRANSFORMS FAST Corners + Channel Extract Affine Warp + FAST Track Color Convert + Warp Perspective + Harris Corners + CopyImage Flip Image Harris Track Convert Depth Remap Hough Circles Magnitude Scale Image + Hough Lines MultiplyByScalar Not / Or / And / Xor ANALYSIS Phase FILTERS Histogram NVIDIA Table Lookup BoxFilter Histogram Equalization Threshold Convolution Extensions Integral Image Dilation Filter Mean Std Deviation FLOW & DEPTH Erosion Filter Gaussian Filter Min Max Locations Median Flow Gaussian Pyramid Optical Flow (LK) + + type/mode extension by NVIDIA Laplacian3x3 Semi-Global Matching NVIDIA extension primitives 10 VISIONWORKS™ PRIMITIVES • VisionWorks primitives are CUDA optimized All OpenVX (except MedianFlow & FindHomography extensions) Primitives • 85% of VisionWorks OpenVX API is also accelerated with NEON. Table of NEON optimized primitives are listed in VisionWorks Toolkit Ref. (Go to "VisionWorks API" -> "NVIDIA Extensions API" -> "Vision Primitives API”) • Primitive acceleration with VisionWorks NVIDIA • Up to 92x speedup compared to OpenCV CPU kernels on Drive PX (Ave 8x) Extensions • Up to 13x speedup compared to OpenCV CUDA kernels on Drive PX (Ave 2x) (Measured on Drive PX, OS=‘V4L' Linux Kernel='3.18.21-tegra-g06aec38' CPU Rate='1632 MHz' GPU Rate='844 MHz' EMC Rate='1600 MHz’) 11 VISIONWORKS™ SAMPLE APPLICATIONS Feature Tracker Stereo Depth OpenCV-NPP- Hough Lines & Extraction OpenVX Interop Circles + Video stabilization + Iterative Motion Estimation/Flow and other platform specific samples (available only on certain platforms) Camera Capture, OpenGL interop, Video playback 12 VISIONWORKS SAMPLE APPLICATIONS NVXIO MULTIMEDIA ABSTRACTION Camera input Interop/EGLStre Interop/EGLStre ams ams CSI ISP & Camera GFX Processing Render Vision processing Video/image file input CUDA Image/Video Image/Video Decode Encode . Streamed CPU COMPLEX video/image GPU (Multi-core NVXIO input ARM v8) AUDIO SECURITY VIDEO VIDEO 2D ENGINE ENGINE ENGINE ENCODER DECODER (VIC) (APE) SAFETY SAFETY BOOT PROC CAN PROC IMAGE ENGINE MANAGER (BPMP) (SPE) PROC (ISP) (SCE) (HSM) I/O 13 VISIONWORKS™ PLUS ALGORITHMS Structure From Motion Object Tracker 14 Programming with VisionWorks Library 15 VISIONWORKS™ PROGRAMMING MODEL VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode Standard specified Heterogeneous compute Direct CUDA API for heterogeneous API with graph advanced CUDA compute API with optimizations developers individual function calls Extensible with user defined nodes 16 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE OpenVX Immediate mode API enables developers to easily port their applications. OpenVX API Immediate mode calls are prefixed with “vxu” Ported Video Stabilization algorithm in OpenCV to VisionWorks Immediate Mode. OpenCV image Source Feature detection Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames Image Pyramid 17 VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE Performance boost: Video stabilization application is accelerated by 2.6x (including the overhead for Mat to vx_image conversions) 1.4x OpenCV image Source Feature detection 0.6x 4.9x 2.3x 4.6x Cv::Mat to Processs pts Color Optical Warp Vx_image & Find Conversion Flow Perspective Homography Stabilized frames 1.7x Image Pyramid 18 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE OpenVX API graph mode calls are prefixed with “vx” OpenVX Graph enables advanced optimizations • Buffer reuse, kernel fusion • Efficient use of streaming and CUDA textures • Automatic scheduling across processing units based on various factors (safety, perf,..) • Tiling and pipelining vision functions at sub-frame level Feature detection Image Processs pts Color Optical Warp Source & Find Conversion Flow Perspective Homography Stabilized frames Image Pyramid 19 VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE Performance boost: Video stabilization application is further accelerated compared to immediate mode. Feature detection Processs pts Image Color Optical Warp & Find Source Conversion Flow Perspective Homography Stabilized frames Image Pyramid 20 VISIONWORKS CUDA API FEATURE TRACKING SAMPLE VisionWorks CUDA API enables developer with low-level access. Developer manages • Data allocations and transfer • Scheduling and pipelining Camera/image/video Input data YUV Gray Rendering/Output frame frame nvxcuColor nvxcuChannel nvxcuGaussian nvxcuOptica nvxcuHarris Convert Extract Pyramid lFlowPyrLK Track RGB frame Array of (CUDA buffer) keypoints 21 VISIONWORKS™ API SELECTION VisionWorks VisionWorks VisionWorks OpenVX™ OpenVX™ CUDA API Immediate Mode Graph Mode Let the graph manager to Quick port from other Low level CUDA API hide overheads, optimize libraries access for advanced and manage data CUDA developers To be able to reassign CPU and GPU tasks based To be able to reassign CPU on perf. and GPU tasks based on perf. 22 DEBUGGING WITH VISIONWORKS™ Enable VisionWorks debug markers with “export NVX_PROF=nvtx” 23 VISIONWORKS™ DOCUMENTATION Installed location: /usr/share/visionWorks/docs 24 VISIONWORKS™ FACTS First Khronos OpenVX™ 1.0 compliant library (Jan 2015) VisionWorks enables key demos (CES’16 and more at GTC) 27K downloads (embedded) since release in Nov, 2015 + Installed by default on all automotive platforms Weekly VisionWorks downloads for various platforms 25 CONCLUSION • VisionWorks Toolkit delivers multiple levels of API – OpenVX Immediate Mode, OpenVX Graph Mode, VisionWorks CUDA API • Heterogeneous API enables switching from GPU to CPU – this is very powerful, reducing productization time • Delivers high performance – Offers significant speedup over CUDA optimized OpenCV functions • Adopts native media APIs on Tegra platforms and delivers ready to use code samples S6739-VisionWorks™ L6129-VisionWorks™ H6115 - Designing Toolkit Programming Toolkit LAB Session Computer Vision Tutorial Applications with – VisionWorks™ Room LL20A Room 210C Pod B 26 RESOURCES & USEFUL LINKS http://www.embedded-vision.com/ https://www.khronos.org/openvx/ https://developer.nvidia.com/embedded/visionworks VisionWorks Webinars - https://developer.nvidia.com/embedded/learn/tutorials 27 FULLY CONVOLUTIONAL NETWORK [1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). [3] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset," in CVPR Workshop on The Future of Datasets in Vision, 2015. 2015. VISIONWORKS WITH DEEP LEARNING DEMO 28 FULLY CONVOLUTIONAL NETWORK [1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [2] Efficient Convolutional Patch Networks for Scene Understanding CVPR Workshop on Scene Understanding (CVPR-WS). 2015. DEEP LEARNING & VISION DEMO 29 Introduction VisionWorks API OpenVX Sample Overview 30 VISIONWORKS™ Sample Applications Feature tracking Hough Lines with compressed Histogram Eq . w/Camera input with decoded video images Source Samples with multimedia I/0 NVXIO (Multimedia Abstraction) Platform Software Stack (Multimedia, Interop, GL, UI, System) 31 PLATFORMS & MULTIMEDIA API Platform Camera Decode Interop Render Encode Android Camera
Recommended publications
  • Visual Development Environment for Openvx
    ______________________________________________________PROCEEDING OF THE 20TH CONFERENCE OF FRUCT ASSOCIATION Visual Development Environment for OpenVX Alexey Syschikov, Boris Sedov, Konstantin Nedovodeev, Sergey Pakharev Saint Petersburg State University of Aerospace Instrumentation Saint Petersburg, Russia {alexey.syschikov, boris.sedov, konstantin.nedovodeev, sergey.pakharev}@guap.ru Abstract—OpenVX standard has appeared as an answer II. STATE OF THE ART from the computer vision community to the challenge of accelerating vision applications on embedded heterogeneous OpenVX is intended to increase performance and reduce platforms. It is designed as a low-level programming framework power consumption of machine vision applications. It is that enables software developers to leverage the computer vision focused on embedded systems with real-time use cases such as hardware potential with functional and performance portability. face, body and gesture tracking, video surveillance, advanced In this paper, we present the visual environment for OpenVX driver assistance systems (ADAS), object and scene programs development. To the best of our knowledge, this is the reconstruction, augmented reality, visual inspection etc. first time the graphical notation is used for OpenVX programming. Our environment addresses the need to design The using of OpenVX standard functions is a way to ensure OpenVX graphs in a natural visual form with automatic functional portability of the developed software to all hardware generation of a full-fledged program, saving the programmer platforms that support OpenVX. from writing a bunch of a boilerplate code. Using the VIPE visual IDE to develop OpenVX programs also makes it possible to work Since the OpenVX API is based on opaque data types, with our performance analysis tools.
    [Show full text]
  • GLSL 4.50 Spec
    The OpenGL® Shading Language Language Version: 4.50 Document Revision: 7 09-May-2017 Editor: John Kessenich, Google Version 1.1 Authors: John Kessenich, Dave Baldwin, Randi Rost Copyright (c) 2008-2017 The Khronos Group Inc. All Rights Reserved. This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast, or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part. Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be reformatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group website should be included whenever possible with specification distributions.
    [Show full text]
  • The Importance of Data
    The landscape of Parallel Programing Models Part 2: The importance of Data Michael Wong and Rod Burns Codeplay Software Ltd. Distiguished Engineer, Vice President of Ecosystem IXPUG 2020 2 © 2020 Codeplay Software Ltd. Distinguished Engineer Michael Wong ● Chair of SYCL Heterogeneous Programming Language ● C++ Directions Group ● ISOCPP.org Director, VP http://isocpp.org/wiki/faq/wg21#michael-wong ● [email protected][email protected] Ported ● Head of Delegation for C++ Standard for Canada Build LLVM- TensorFlow to based ● Chair of Programming Languages for Standards open compilers for Council of Canada standards accelerators Chair of WG21 SG19 Machine Learning using SYCL Chair of WG21 SG14 Games Dev/Low Latency/Financial Trading/Embedded Implement Releasing open- ● Editor: C++ SG5 Transactional Memory Technical source, open- OpenCL and Specification standards based AI SYCL for acceleration tools: ● Editor: C++ SG1 Concurrency Technical Specification SYCL-BLAS, SYCL-ML, accelerator ● MISRA C++ and AUTOSAR VisionCpp processors ● Chair of Standards Council Canada TC22/SC32 Electrical and electronic components (SOTIF) ● Chair of UL4600 Object Tracking ● http://wongmichael.com/about We build GPU compilers for semiconductor companies ● C++11 book in Chinese: Now working to make AI/ML heterogeneous acceleration safe for https://www.amazon.cn/dp/B00ETOV2OQ autonomous vehicle 3 © 2020 Codeplay Software Ltd. Acknowledgement and Disclaimer Numerous people internal and external to the original C++/Khronos group, in industry and academia, have made contributions, influenced ideas, written part of this presentations, and offered feedbacks to form part of this talk. But I claim all credit for errors, and stupid mistakes. These are mine, all mine! You can’t have them.
    [Show full text]
  • Report of Contributions
    X.Org Developers Conference 2020 Report of Contributions https://xdc2020.x.org/e/XDC2020 X.Org Developer … / Report of Contributions State of text input on Wayland Contribution ID: 1 Type: not specified State of text input on Wayland Wednesday, 16 September 2020 20:15 (5 minutes) Between the last impromptu talk at GUADEC 2018, text input on Wayland has become more organized and more widely adopted. As before, the three-pronged approach of text_input, in- put_method, and virtual keyboard still causes confusion, but increased interest in implementing it helps find problems and come closer to something that really works for many usecases. The talk will mention how a broken assumption causes a broken protocol, and why we’re notdone with Wayland input methods yet. It’s recommended to people who want to know more about the current state of input methods on Wayland. Recommended background: aforementioned GUADEC talk, wayland-protocols reposi- tory, my blog: https://dcz_self.gitlab.io/ Code of Conduct Yes GSoC, EVoC or Outreachy No Primary author: DCZ, Dorota Session Classification: Demos / Lightning talks I Track Classification: Lightning Talk September 30, 2021 Page 1 X.Org Developer … / Report of Contributions IGT GPU Tools 2020 Update Contribution ID: 2 Type: not specified IGT GPU Tools 2020 Update Wednesday, 16 September 2020 20:00 (5 minutes) Short update on IGT - what has changed in the last year, where are we right now and what we have planned for the near future. IGT GPU Tools is a collection of tools and tests aiding development of DRM drivers. It’s widely used by Intel in its public CI system.
    [Show full text]
  • Blackberry QNX Multimedia Suite
    PRODUCT BRIEF QNX Multimedia Suite The QNX Multimedia Suite is a comprehensive collection of media technology that has evolved over the years to keep pace with the latest media requirements of current-day embedded systems. Proven in tens of millions of automotive infotainment head units, the suite enables media-rich, high-quality playback, encoding and streaming of audio and video content. The multimedia suite comprises a modular, highly-scalable architecture that enables building high value, customized solutions that range from simple media players to networked systems in the car. The suite is optimized to leverage system-on-chip (SoC) video acceleration, in addition to supporting OpenMAX AL, an industry open standard API for application-level access to a device’s audio, video and imaging capabilities. Overview Consumer’s demand for multimedia has fueled an anywhere- o QNX SDK for Smartphone Connectivity (with support for Apple anytime paradigm, making multimedia ubiquitous in embedded CarPlay and Android Auto) systems. More and more embedded applications have require- o Qt distributions for QNX SDP 7 ments for audio, video and communication processing capabilities. For example, an infotainment system’s media player enables o QNX CAR Platform for Infotainment playback of content, stored either on-board or accessed from an • Support for a variety of external media stores external drive, mobile device or streamed over IP via a browser. Increasingly, these systems also have streaming requirements for Features at a Glance distributing content across a network, for instance from a head Multimedia Playback unit to the digital instrument cluster or rear seat entertainment units. Multimedia is also becoming pervasive in other markets, • Software-based audio CODECs such as medical, industrial, and whitegoods where user interfaces • Hardware accelerated video CODECs are increasingly providing users with a rich media experience.
    [Show full text]
  • SID Khronos Open Standards for AR May17
    Open Standards for AR Neil Trevett | Khronos President NVIDIA VP Developer Ecosystem [email protected] | @neilt3d LA, May 2017 © Copyright Khronos Group 2017 - Page 1 Khronos Mission Software Silicon Khronos is an International Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for 3D graphics, Virtual and Augmented Reality, Parallel Computing, Neural Networks and Vision Processing © Copyright Khronos Group 2017 - Page 2 Khronos Standards Ecosystem 3D for the Web Real-time 2D/3D - Real-time apps and games in-browser - Cross-platform gaming and UI - Efficiently delivering runtime 3D assets - VR and AR Displays - CAD and Product Design - Safety-critical displays VR, Vision, Neural Networks Parallel Computation - VR/AR system portability - Tracking and odometry - Machine Learning acceleration - Embedded vision processing - Scene analysis/understanding - High Performance Computing (HPC) - Neural Network inferencing © Copyright Khronos Group 2017 - Page 3 Why AR Needs Standard Acceleration APIs Without API Standards With API Standards Platform Application Fragmentation Portability Everything Silicon runs on CPU Acceleration Standard Acceleration APIs provide PERFORMANCE, POWER AND PORTABILITY © Copyright Khronos Group 2017 - Page 4 AR Processing Flow Download 3D augmentation object and scene data Tracking and Positioning Generate Low Latency Vision Geometric scene 3D Augmentations for sensor(s) reconstruction display by optical system Semantic scene understanding
    [Show full text]
  • The Opencl Specification
    The OpenCL Specification Version: 2.0 Document Revision: 22 Khronos OpenCL Working Group Editor: Aaftab Munshi Last Revision Date: 3/18/14 Page 1 1. INTRODUCTION ............................................................................................................... 10 2. GLOSSARY ......................................................................................................................... 12 3. THE OPENCL ARCHITECTURE .................................................................................... 23 3.1 Platform Model ................................................................................................................................ 23 3.2 Execution Model .............................................................................................................................. 25 3.2.1 Execution Model: Mapping work-items onto an NDRange ........................................................................28 3.2.2 Execution Model: Execution of kernel-instances ........................................................................................30 3.2.3 Execution Model: Device-side enqueue ......................................................................................................31 3.2.4 Execution Model: Synchronization .............................................................................................................32 3.2.5 Execution Model: Categories of Kernels ....................................................................................................33 3.3 Memory
    [Show full text]
  • Standards for Vision Processing and Neural Networks
    Standards for Vision Processing and Neural Networks Radhakrishna Giduthuri, AMD [email protected] © Copyright Khronos Group 2017 - Page 1 Agenda • Why we need a standard? • Khronos NNEF • Khronos OpenVX dog Network Architecture Pre-trained Network Model (weights, …) © Copyright Khronos Group 2017 - Page 2 Neural Network End-to-End Workflow Neural Network Third Vision/AI Party Applications Training Frameworks Tools Datasets Trained Vision and Neural Network Network Inferencing Runtime Network Model Architecture Desktop and Cloud Hardware Embedded/Mobile Embedded/MobileEmbedded/Mobile Embedded/Mobile/Desktop/CloudVision/InferencingVision/Inferencing Hardware Hardware cuDNN MIOpen MKL-DNN Vision/InferencingVision/Inferencing Hardware Hardware GPU DSP CPU Custom FPGA © Copyright Khronos Group 2017 - Page 3 Problem: Neural Network Fragmentation Neural Network Training and Inferencing Fragmentation NN Authoring Framework 1 Inference Engine 1 NN Authoring Framework 2 Inference Engine 2 NN Authoring Framework 3 Inference Engine 3 Every Tool Needs an Exporter to Every Accelerator Neural Network Inferencing Fragmentation toll on Applications Inference Engine 1 Hardware 1 Vision/AI Inference Engine 2 Hardware 2 Application Inference Engine 3 Hardware 3 Every Application Needs know about Every Accelerator API © Copyright Khronos Group 2017 - Page 4 Khronos APIs Connect Software to Silicon Software Silicon Khronos is an International Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access
    [Show full text]
  • Openbricks Embedded Linux Framework - User Manual I
    OpenBricks Embedded Linux Framework - User Manual i OpenBricks Embedded Linux Framework - User Manual OpenBricks Embedded Linux Framework - User Manual ii Contents 1 OpenBricks Introduction 1 1.1 What is it ?......................................................1 1.2 Who is it for ?.....................................................1 1.3 Which hardware is supported ?............................................1 1.4 What does the software offer ?............................................1 1.5 Who’s using it ?....................................................1 2 List of supported features 2 2.1 Key Features.....................................................2 2.2 Applicative Toolkits..................................................2 2.3 Graphic Extensions..................................................2 2.4 Video Extensions...................................................3 2.5 Audio Extensions...................................................3 2.6 Media Players.....................................................3 2.7 Key Audio/Video Profiles...............................................3 2.8 Networking Features.................................................3 2.9 Supported Filesystems................................................4 2.10 Toolchain Features..................................................4 3 OpenBricks Supported Platforms 5 3.1 Supported Hardware Architectures..........................................5 3.2 Available Platforms..................................................5 3.3 Certified Platforms..................................................7
    [Show full text]
  • Khronos Template 2015
    Ecosystem Overview Neil Trevett | Khronos President NVIDIA Vice President Developer Ecosystem [email protected] | @neilt3d © Copyright Khronos Group 2016 - Page 1 Khronos Mission Software Silicon Khronos is an Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for graphics, parallel compute and vision © Copyright Khronos Group 2016 - Page 2 http://accelerateyourworld.org/ © Copyright Khronos Group 2016 - Page 3 Vision Pipeline Challenges and Opportunities Growing Camera Diversity Diverse Vision Processors Sensor Proliferation 22 Flexible sensor and camera Use efficient acceleration to Combine vision output control to GENERATE PROCESS with other sensor data an image stream the image stream on device © Copyright Khronos Group 2016 - Page 4 OpenVX – Low Power Vision Acceleration • Higher level abstraction API - Targeted at real-time mobile and embedded platforms • Performance portability across diverse architectures - Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, Dedicated hardware… • Extends portable vision acceleration to very low power domains - Doesn’t require high-power CPU/GPU Complex - Lower precision requirements than OpenCL - Low-power host can setup and manage frame-rate graph Vision Engine Middleware Application X100 Dedicated Vision Processing Hardware Efficiency Vision DSPs X10 GPU Compute Accelerator Multi-core Accelerator Power Efficiency Power X1 CPU Accelerator Computation Flexibility © Copyright Khronos Group 2016 - Page 5 OpenVX Graphs
    [Show full text]
  • Gstreamer and Dmabuf
    GStreamer and dmabuf OMAP4+ graphics/multimedia update Rob Clark Outline • A quick hardware overview • Kernel infrastructure: drm/gem, rpmsg+dce, dmabuf • Blinky s***.. putting pixels on the screen • Bringing it all together in GStreamer A quick hardware overview DMM/Tiler • Like a system-wide GART – Provides a contiguous view of memory to various hw accelerators: IVAHD, ISS, DSS • Provides tiling modes for enhanced memory bandwidth efficiency – For initiators like IVAHD which access memory in 2D block patterns • Provides support for rotation – Zero cost rotation for DSS/ISS access in 0º/90º/180º/270º orientations (with horizontal or vertical reflection) IVA-HD • Multi-codec hw video encode/decode – H.264 BP/MP/HP encode/decode – MPEG-4 SP/ASP encode/decode – MPEG-2 SP/MP encode/decode – MJPEG encode/decode – VC1/WMV9 decode – etc DSS – Display Subsystem • Display Subsystem – 4 video pipes, 3 support scaling and YUV – Any number of video pipes can be attached to one of 3 “overlay manager” to route to a display Kernel infrastructure: drm/gem, rpmsg+dce, dmabuf DRM Overview • DRM → Direct Rendering Manager – Started life heavily based on x86/desktop graphics card architecture – But more recently has evolved to better support ARM and other SoC platforms • KMS → Kernel Mode Setting – Replaces fbdev for more advanced display management – Hotplug, multiple display support (spanning/cloning) – And more recently support for overlays (planes) • GEM → Graphics Execution Manager – But the important/useful part here is the graphics/multimedia buffer management DRM - KMS • Models the display hardware as: – Connector → the thing that the display connects to • Handles DDC/EDID, hotplug detection – Encoder → takes pixel data from CRTC and encodes it to a format suitable for connectors • ie.
    [Show full text]
  • GPU Technology Conference 2010 Sessions on 2095
    GPU Technology Conference 2010 Sessions on Video Processing (subject to change) IMPORTANT: Visit www.nvidia.com/gtc for the most up-to-date schedule and to enroll into sessions to ensure your spot in the most popular courses. 2095 - Building High Density Real-Time Video Processing Systems Learn how GPU Direct can be used to effectively build real time, high performance, cost effective video processing products. We will focus especially on how to optimize bus throughput while keeping CPU load and latency minimal. Speaker: Ronny Dewaele, Barco Topics: Video Processing, Imaging Time: Thursday, September, 23rd, 16:00 - 16:50 2029 - Computer Vision Algorithms for Automating HD Post- Production Discover how post-production tasks can be accelerated by taking advantage of GPU-based algorithms. In this talk we present computer vision algorithms for corner detection, feature point tracking, image warping and image inpainting, and their efficient implementation on GPUs using CUDA. We also show how to use these algorithms to do real-time stabilization and temporal re-sampling (re-timing) of high definition video sequences, both common tasks in post-production. Benchmarking of the GPU implementations against optimized CPU algorithms demonstrates a speedup of approximately an order of magnitude. Speaker: Hannes Fassold, JOANNEUM RESEARCH Topics: Computer Vision, Video Processing Time: Wednesday, September, 22nd, 15:00 - 15:50 2125 - Developing GPU Enabled Visual Effects For Film And Video 1 The arrival of fully programable GPUs is now changing the visual effects industry, which traditionally relied on CPU computation to create their spectacular imagery. Implementing the complex image processing algorithms used by VFX is a challenge, but the payoffs in terms of interactivity and throughput can be enormous.
    [Show full text]