Khronos Standard APIs for Accelerating Vision and Inferencing

Neil Trevett, Khronos President, NVIDIA VP Developer Ecosystems
22nd September 2020
© 2020 The Khronos Group

Khronos Connects Software to Silicon
Open interoperability standards to enable software to effectively harness the power of 3D and multiprocessor acceleration: 3D graphics, XR, parallel programming, vision acceleration and machine learning.
- Non-profit, member-driven standards-defining industry consortium
- Open to any interested company
- All Khronos standards are royalty-free
- Well-defined IP Framework protects participants' intellectual property
- Founded in 2000
- >150 Members: ~40% US, 30% Europe, 30% Asia

Khronos Active Initiatives
- 3D Graphics: Desktop, Mobile, Web, Embedded and Safety Critical
- 3D Assets: Authoring and Delivery
- Portable XR: Augmented and Virtual Reality
- Parallel Computation: Vision, Inferencing, Machine Learning

Khronos Compute Acceleration Standards
Higher-level Languages and APIs: streamlined development and performance portability
- Single source C++ programming with compute acceleration
- Graph-based vision and inferencing acceleration
Lower-level APIs: direct hardware control
- GPU rendering + compute acceleration
- Intermediate Representation (IR) supporting parallel execution and graphics
- Heterogeneous compute acceleration
Increasing industry interest in parallel compute acceleration to combat the 'End of Moore's Law'.
Hardware: CPU, GPU, FPGA, DSP, AI/Tensor HW, Custom Hardware

Embedded Vision and Inferencing Acceleration
- Neural Network Training: networks trained on high-end desktop and cloud systems (Training Data in, Trained Networks out)
- Compilation or Ingestion of the trained networks
- Applications link to compiled inferencing code or call a vision/inferencing API (Compiled Code, Vision / Inferencing Engine, C++ Application Code)
- Sensor Data feeds the application
- Hardware Acceleration APIs
- Diverse Embedded Hardware (GPUs, DSPs, FPGAs): DSP, FPGA, GPU, Dedicated Hardware

NNEF Neural Network Exchange Format
Before: Training and Inferencing Fragmentation
- Training Frameworks 1, 2 and 3 each connect to Inference Engines 1, 2 and 3
- Every Inferencing Engine needs a custom importer from every Framework
After: NN Training and Inferencing Interoperability
- Training Frameworks 1, 2 and 3 and Inference Engines 1, 2 and 3 connect through NNEF
- Common optimization and processing tools

NNEF and ONNX
ONNX and NNEF are Complementary: ONNX moves quickly to track authoring framework updates; NNEF provides a stable bridge from training into edge inferencing engines.
NNEF:
- Embedded Inferencing Import
- Defined Specification
- Multi-company Governance at Khronos
- Stability for hardware deployment
- NNEF V1.0 released in August 2018, after positive industry feedback on the Provisional Specification
- Maintenance update issued in September 2019
- Extensions to V1.0 released for expanded functionality
- NNEF Working Group Participants
ONNX:
- Training Interchange
- Open Source Project
- Initiated by Facebook & Microsoft
- Software stack flexibility
- ONNX 1.6 released in September 2019; introduced support for Quantization
- ONNX Runtime being integrated with GPU inferencing engines such as NVIDIA TensorRT
- ONNX Supporters

NNEF Open Source Tools Ecosystem
- TensorFlow and TensorFlow Lite Import/Export: compound operations captured by exporting graph Python script
- ONNX Import/Export
- Caffe and Caffe2 Import/Export
- NNEF Model Zoo: now available on GitHub. Useful for checking that ingested NNEF produces acceptable results on target system
- Files, Syntax Parser and Validator, OpenVX Ingestion and Execution
NNEF adopts a rigorous approach to design lifecycle. This is especially important for safety-critical or mission-critical applications in automotive, industrial and infrastructure markets.
NNEF open source projects are hosted on the Khronos NNEF GitHub repository under Apache 2.0: https://github.com/KhronosGroup/NNEF-Tools

SYCL Single Source C++ Parallel Programming
- Complex ML frameworks can be directly compiled and accelerated
- Standard C++ Application Code, C++ Libraries (SYCL-BLAS, SYCL-DNN, SYCL-Eigen, SYCL Parallel STL) and ML Frameworks each sit on C++ Template Libraries
- C++ templates and lambda functions separate host & accelerated device code
- SYCL Compiler for OpenCL alongside the CPU Compiler
- C++ Kernel Fusion can give better performance on complex apps and libs than hand-coding
- Accelerated code passed into device OpenCL compilers
- SYCL is ideal for accelerating larger C++-based engines and applications with performance portability
Hardware: CPU, GPU, FPGA, DSP, AI/Tensor HW, Custom Hardware

SYCL Implementations
SYCL, OpenCL and SPIR-V, as open industry standards, enable flexible integration and deployment of multiple acceleration technologies.
SYCL enables Khronos to influence ISO C++ to (eventually) support heterogeneous compute.
SYCL Source Code feeds four implementations:
- DPC++ (uses LLVM/Clang, part of oneAPI): Any CPU; CUDA+PTX on NVIDIA GPUs; OpenCL + SPIR-V on Intel CPUs, Intel GPUs and Intel FPGAs
- ComputeCpp (SYCL 1.2.1 on multiple hardware): Any CPU; OpenCL+PTX on NVIDIA GPUs; OpenCL + SPIR(-V) on Intel CPUs, Intel GPUs, Intel FPGAs, AMD GPUs (depends on driver stack), Arm Mali, IMG PowerVR and Renesas R-Car
- triSYCL (open source test bed): OpenMP on Any CPU; OpenCL + SPIR/LLVM on XILINX FPGAs and POCL (open source OpenCL supporting CPUs and NVIDIA GPUs and more)
- hipSYCL (SYCL 1.2.1 on CUDA & HIP/ROCm): OpenMP on Any CPU; CUDA on NVIDIA GPUs; ROCm on AMD GPUs
Diagram labels: Experimental; Multiple Backends in Development
SYCL is beginning to be supported on multiple low-level APIs in addition to OpenCL, e.g. ROCm and CUDA.
For more information: http://sycl.tech

OpenVX Cross-Vendor Vision and Inferencing
OpenVX: high-level graph-based abstraction for portable, efficient vision processing
- Graph can contain vision processing and NN nodes, which enables global optimizations
- Optimized OpenVX drivers created, optimized and shipped by processor vendors
- Implementable on almost any hardware or processor with performance portability
- Run-time graph execution needs very little host CPU interaction
OpenVX Graph: Native Camera Control, Vision Nodes, CNN Nodes, Downstream Application Processing
NNEF Translator converts NNEF representation into OpenVX Node Graphs
- Performance comparable to hand-optimized, non-portable code
- Real, complex applications on real, complex hardware
- Much lower development effort than hand-optimized code
Hardware Implementations

OpenVX 1.3 Released October 2019
Functionality Consolidation into Core: Neural Net Extension, NNEF Kernel Import, Safety Critical etc.
Open Source Conformance Test Suite: https://github.com/KhronosGroup/OpenVX-cts/tree/openvx_1.3
OpenCL Interop: custom accelerated Nodes
Deployment Flexibility through Feature Sets: conformant implementations ship one or more complete feature sets, which enables market-focused implementations
- Baseline Graph Infrastructure (enables other Feature Sets)
- Default Vision Functions
- Enhanced Vision Functions (introduced in OpenVX 1.2)
- Neural Network Inferencing (including tensor objects)
- NNEF Kernel import (including tensor objects)
- Binary Images
- Safety Critical (reduced features for easier safety certification)
https://www.khronos.org/registry/OpenVX/specs/1.3/html/OpenVX_Specification_1_3.html
OpenVX/OpenCL Interop
- OpenVX user-kernels can access command queue and cl_mem objects to asynchronously schedule OpenCL kernel execution
- Fully asynchronous host-device operations during data exchange
- Copy or export cl_mem buffers into OpenVX data objects
- Map or copy OpenVX data objects into cl_mem buffers
Diagram labels: Application, OpenVX data objects, cl_mem buffers, OpenCL Command Queue, Runtime

Open Source OpenVX & Samples
Open Source OpenVX Tutorial and Code Samples
- https://github.com/rgiduthuri/openvx_tutorial
- https://github.com/KhronosGroup/openvx-samples
Fully Conformant Open Source OpenVX 1.3 for Raspberry Pi
- https://github.com/KhronosGroup/OpenVX-sample-impl/tree/openvx_1.3
- Raspberry Pi 3 and 4 Model B with Raspbian OS
- Memory access optimization via tiling/chaining
- Highly optimized kernels on multimedia instruction set
- Automatic parallelization for multicore CPUs and GPUs
- Automatic merging of common kernel sequences

OpenCL is Widely Deployed and Used
The industry's most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming
Categories shown: Desktop Creative Apps; Parallel Languages; Machine Learning Libraries and Frameworks; Molecular Modelling Libraries; Machine Learning Compilers; Vision, Imaging and Video Libraries; Math and Physics Libraries; Linear Algebra Libraries; Accelerated Implementations
Names shown: Intel, Modo, clDNN, Xiaomi, ForceBalance, SYCL-DNN, Vegas Pro, NNAPI, VeriSilicon, SYCL-BLAS, Synopsys MetaWare EV, TI DL Library (TIDL), CLBlast, Arm Compute Library
https://en.wikipedia.org/wiki/List_of_OpenCL_applications

OpenCL: Low-level Parallel Programming
Programming and Runtime Framework for Application Acceleration
- Offload compute-intensive kernels onto parallel heterogeneous processors
- CPUs, GPUs, DSPs, FPGAs, Tensor Processors
- OpenCL C or C++ kernel languages
Diagram: Host Code and several OpenCL C Kernel Code blocks on a CPU host; OpenCL runtime API to compile, load and execute kernels across devices (GPU, CPU, DSP)
Platform Layer API
- Query, select and initialize compute devices
Runtime API
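As a rough illustration of what "offload compute-intensive kernels onto parallel heterogeneous processors" means in the OpenCL slide above, the sketch below emulates the kernel execution model in plain Python. It is not OpenCL and calls no OpenCL API: `vector_add_kernel` and `run_over_range` are hypothetical names chosen for this example, and the loop runs the work-items one after another where a real device would run them in parallel.

```python
from typing import Callable, List


def vector_add_kernel(global_id: int, a: List[float], b: List[float],
                      out: List[float]) -> None:
    """Kernel body: executed once per work-item.

    In an OpenCL C kernel the index would come from get_global_id(0);
    here it is passed in explicitly.
    """
    out[global_id] = a[global_id] + b[global_id]


def run_over_range(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical stand-in for enqueueing a kernel over a 1-D index space.

    The work-items are independent, so running them sequentially gives the
    same result a parallel device would produce.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(a)

run_over_range(vector_add_kernel, len(a), a, b, result)
print(result)  # [11.0, 22.0, 33.0, 44.0]
```

The point of the split is that the kernel only describes the work for one index, while the host side decides how large the index space is and where it runs. In real OpenCL that host side is where the platform layer and runtime API calls would go.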