Introduction to OpenCL
RTS 2013, April 11th 2013

Adaptation and diffusion strictly forbidden without Acsys written agreement.

This lecture is brought to you by Bernard Dautrevaux
• Chief Technical Officer, Ac6
• Alumnus of École Normale Supérieure (ENS Cachan)
• Master of Mathematics
• Master of Computer Science
• University of Paris, Orsay

Should you have any questions later on, feel free to contact me at mailto:[email protected]
You can also find more information on our web sites: http://www.ac6.fr or http://www.ac6-training.com

Agenda
• Why OpenCL?
• The Goals of OpenCL
• Overview
• The OpenCL Language
• OpenCL Implementations

Why OpenCL?

Processor Parallelism
• CPUs: multiple cores provide increased performance
  • Multiprocessor programming (for example OpenMP)
• GPUs: increasingly general purpose data-parallel computing
  • Graphics APIs and Shading Languages
• Emerging intersection: Heterogeneous Computing
• OpenCL is a programming framework for heterogeneous computing resources
(Original graphics design from The Khronos Group)

The Origins of OpenCL
• AMD and ATI: merged, needed commonality across products
• NVIDIA: GPU vendor, wants to steal market share from CPU vendors
• Intel: CPU vendor, wants to steal market share from GPU vendors
• Apple: was tired of recoding for many-core and GPUs; pushed vendors to standardize
• A rough draft, "straw man" API was written
• Khronos Compute group formed, with Ericsson, Nokia, IBM, Sony, EA, Freescale, TI, STM ...
• Date shown on the figure: Dec 2008

The OpenCL Working Group
• OpenCL means Open Computing Language
• Diverse industry participation
  • Processor vendors, system OEMs, middleware vendors, application developers
• Many industry-leading experts involved in OpenCL’s design
  • A healthy diversity of industry perspectives
• Apple made the initial proposal in 2008
  • Apple is very active in the working group
  • Serving as specification editor
• The OpenCL standard is edited by The Khronos Group
  • Founded in January 2000 by a number of leading media-centric companies
    • 3Dlabs, ATI, Discreet, Evans & Sutherland, Intel, NVIDIA, SGI, Sun Microsystems...
  • Now more than 100 members, including STMicroelectronics
  • Dedicated to creating open standard APIs to enable authoring and playback of rich media
• The Khronos Group edits several standards:
  • OpenCL
  • OpenGL, OpenGL SC (Safety Critical), OpenGL ES (Embedded Systems): 3D graphics
  • OpenVG: 2D vector graphics
  • OpenMAX AL, IL, DL: streaming media recording and playback, processing, codecs
  • OpenKODE, OpenWF, WebGL, COLLADA, EGL...
• No one company “owns” the OpenCL standard, not even Apple
  • This ensures you can depend on it without fear

The OpenCL Timeline
• Six months from proposal to released OpenCL 1.0 specification
  • Due to a strong initial proposal and a shared commercial incentive
• Multiple conformant implementations shipping
  • Apple’s Mac OS X has shipped with OpenCL since Snow Leopard
• 18 months between OpenCL 1.0 (December 2008) and OpenCL 1.1 (June 2010)
  • Backwards compatibility protects software investment
• Almost 18 months again before OpenCL 1.2 (released 15th November 2011)
  • Adds new features, like device partitioning, separate compilation, better OpenGL integration…
• OpenCL is only a specification
  • It has to be implemented by “someone”
  • Apple, Intel, AMD and NVIDIA have implementations conforming to the OpenCL 1.1 specification
  • Apple, AMD and Intel (beta) support OpenCL 1.2, but NVIDIA is still at 1.1
• Timeline:
  • Jun 08: Apple proposes OpenCL working group and contributes draft specification to Khronos
  • Dec 08: Khronos publicly releases OpenCL 1.0 as royalty-free specification
  • May 09: Khronos releases OpenCL 1.0 conformance tests to ensure high-quality implementations
  • Fall 09: Multiple conformant implementations ship across diverse OS and platforms
  • Jun 10: OpenCL 1.1 Specification released and first implementations ship
  • Nov 11: Release of OpenCL 1.2
(Original graphics design from The Khronos Group)

OpenCL and the OpenGL Ecosystem
• Desktop Visual Computing
  • OpenGL and OpenCL have direct interoperability
  • OpenCL objects can be created from OpenGL Textures, Buffer Objects and Renderbuffers
• Roadmap Convergence
  • OpenGL 4.0 and OpenGL ES 2.0 are both streamlined, programmable pipelines
  • GL and ES working groups are working on convergence
  • WebGL is a positive pressure for portable 3D content on all platforms
• Mobile Visual Computing
  • Compute, graphics and AV APIs interoperate through EGL
(Original graphics design from The Khronos Group)

OpenCL: From Cell Phone to Supercomputer
• OpenCL Embedded Profile for mobile and embedded silicon
  • Relaxes some data types and precision requirements
  • Avoids the need for a separate "ES" specification
• Khronos APIs provide computing support for imaging & graphics
  • Enabling advanced applications
  • For example in Augmented Reality
• A camera phone with GPS processes images to recognize buildings and landmarks and provides relevant data from the internet
• OpenCL will enable parallel computing in new markets
(Original graphics design from The Khronos Group)

Goals of OpenCL

OpenCL Goals
• Provide a simple computing model
  • For high performance parallel computing
• Based on the ISO C99 language
  • Some restrictions
  • Some extensions
• Thread management framework
  • Application and thread synchronization
• Easy to use
  • Needs to be lightweight and efficient
• Powerful
  • Allow use of all computing resources
• Portable and predictable
  • IEEE-754 compliant rounding behavior
  • Minimum accuracy for math functions
• Provide guidelines for new hardware requirements

Uses of OpenCL
• Where OpenCL can be used:
  • Image, video and audio processing
  • Simulations and scientific calculations
  • Medical imaging
  • Financial models
  • All data-parallel algorithms that are computationally intensive
• There are many types of parallel computing
  • They can be categorized by granularity
  • From coarse to fine granularity:
    • Grid computing
    • MPI/OpenMPI
    • OpenMP/pthreads
    • SIMD
• OpenCL's goal is to cover fine-grained parallelism (threads, SIMD...)
• OpenCL was designed to work seamlessly with OpenGL
  • When OpenCL works on a GPU, it can directly share data buffers with OpenGL

Task- vs Data-Parallel Computing
• Task parallelism: coarse-grain distribution in a single system (figure: several tasks running side by side)
• Data parallelism: same (set of) operation(s) on multiple data independently, for example abs():
  • Input: 0 3 -7 5 -4 -1 3 -9
  • Output: 0 3 7 5 4 1 3 9
• OpenCL covers both task-parallelism and data-parallelism

Data-parallelism: The Box Filter
• Create a new image
• Compute the average at any “box”
• Place it at the middle pixel on the new image
• Operations can be done totally independently
  • Results are stored at another location
• Computation is quite intensive

Why put the emphasis on GPUs?
• GPUs are crazy fast floating-point number crunchers, compared here with a Core 2 Duo
  • They do almost only that
  • But do it insanely fast...
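
Returning to the two data-parallel examples above (abs() on an array and the box filter), the following is a minimal sketch in plain C, not OpenCL code. Each `*_item` function plays the role of one independent work-item, and the `run_*` loops only emulate launching one work-item per element; in an OpenCL kernel the index would instead come from the built-in `get_global_id()`. All function names are hypothetical, and the border handling (averaging only the pixels that fall inside the image) is an assumption, since the slides do not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One "work-item" of the abs() example: computes a single output element.
 * It reads only in[gid] and writes only out[gid], so items are independent. */
void abs_item(const int *in, int *out, size_t gid)
{
    out[gid] = abs(in[gid]);
}

/* Emulates running n independent work-items sequentially.
 * On a real device these could execute in any order or concurrently. */
void run_abs(const int *in, int *out, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        abs_item(in, out, gid);
}

/* One work-item of the box filter: averages the (2*radius+1)^2 box centered
 * on (x, y) and stores the result at the same position in a separate image.
 * Assumption: samples outside the image are skipped, not padded. */
void box_filter_item(const unsigned char *src, unsigned char *dst,
                     int width, int height, int radius, int x, int y)
{
    int sum = 0;
    int count = 0;

    assert(x >= 0 && x < width && y >= 0 && y < height);
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int sx = x + dx;
            int sy = y + dy;
            if (sx < 0 || sx >= width || sy < 0 || sy >= height)
                continue; /* outside the image: ignore this sample */
            sum += src[sy * width + sx];
            ++count;
        }
    }
    /* count >= 1 because (x, y) itself is always inside the image */
    dst[y * width + x] = (unsigned char)(sum / count);
}

/* Emulates a 2D launch: one work-item per output pixel.
 * src and dst must be different buffers, as the slide requires. */
void run_box_filter(const unsigned char *src, unsigned char *dst,
                    int width, int height, int radius)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            box_filter_item(src, dst, width, height, radius, x, y);
}
```

Calling `run_abs` on the slide's input (0 3 -7 5 -4 -1 3 -9) yields the slide's output row. Writing the filtered pixels to a separate `dst` buffer is what makes the box filter items independent: no item ever reads a value that another item has already overwritten.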
• GPUs are designed for highly scalable parallelism
  • They are simple ALUs
  • You can multiply them
• GPU performance is increasing faster than CPU performance
  • It’s simple to add more small cores
• GPU bandwidth is much larger than CPU bandwidth
  • Working in-card, memory to ALU
• Comparison:
  • Core 2 Duo: 150 W, 4 cores, 8.5 GiB/s, 45 Gflop/s
  • NVIDIA GTX 285: 204 W, 240 cores, 159 GiB/s, ~1 Tflop/s

Why can GPUs be so fast?
• The GPU is specialized for compute-intensive, highly parallel, simple computations
  • +, -, *, / in fixed or single precision FP
  • no error