
Introduction to OpenCL
Ezio Bartocci
Vienna University of Technology

Overview
• Overview of OpenCL for NVIDIA GPUs
• API and Languages
• Sample code walkthrough
• OpenCL Information and Resources

OpenCL – Open Computing Language
• OpenCL is an open, royalty-free standard based on a C-language extension
• It is a framework designed for parallel programming of heterogeneous systems using GPUs, CPUs, FPGAs, DSPs and other processors, including embedded mobile devices
• It was initially introduced by Apple and is now supported by NVIDIA, Intel, AMD, IBM, … (all members of the OpenCL working group)
• Managed by the Khronos Group

OpenCL versions and history (1)
OpenCL 1.0 (2008)
• OpenCL 1.0 was released with Mac OS X Snow Leopard
OpenCL 1.1 (2010)
• The Khronos Group adds significant functionality for enhanced parallel programming flexibility, functionality, and performance, including:
  • New data types, including 3-component vectors and additional image formats;
  • Handling commands from multiple host threads and processing buffers across multiple devices;
  • Operations on regions of a buffer, including read, write and copy of 1D, 2D, or 3D rectangular regions;
  • Enhanced use of events to drive and control command execution;
  • Additional OpenCL built-in C functions such as integer clamp, shuffle, and asynchronous strided copies;
  • Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.

OpenCL versions and history (2)
OpenCL 1.2 (2011)
• Most notable features include:
  • Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
  • Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
  • Enhanced image support: 1.2 adds support for 1D images and 1D/2D image arrays.
    Furthermore, the OpenGL sharing extensions now allow OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
  • Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of the underlying hardware. Examples include video encoding/decoding and digital signal processors.
  • DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11, seamless sharing between OpenCL and DX11 surfaces is enabled.

NVIDIA OpenCL Support
Operating systems
• Windows (XP, Vista, 8) 32/64 bits
• Linux (Ubuntu, RHEL, etc.) 32/64 bits
• Mac OS X Snow Leopard
Supported compilers and IDEs
• GCC for Linux
• Visual Studio for Windows
Drivers and JIT compiler
• Usually provided with the GPU drivers (e.g. CUDA drivers…)
NVIDIA SDK
• Contains example applications, the specification, the programming manual and the best practices guide.

OpenCL Language & API
Platform Layer API (called from the host)
• An abstraction layer for diverse computational resources
• Query, select and initialize compute devices
• Create compute contexts and work-queues
Runtime API (called from the host)
• Launch compute kernels
• Set kernel execution configuration
• Manage scheduling, compute, and memory resources
OpenCL Language
• Write compute kernels that run on a compute device
• C-based cross-platform programming interface
• Subset of ISO C99 with language extensions
• Includes a rich set of built-in functions
• Can be compiled just in time (JIT) or offline

OpenCL Programming Model
NDRange – N-Dimensional Range
• N can be 1, 2 or 3.
• It defines the global index space over which kernel instances execute.
Work-item
• A single kernel instance in the index space.
• Each work-item executes the same compute kernel, but on different data
• Work-items have unique global IDs from the index space
• Comparable to a thread in CUDA
Work-group
• Work-items are further grouped into work-groups
• Each work-group has a unique work-group ID
• Each work-item has a unique local ID within its work-group
• Comparable to a block of threads in CUDA

OpenCL Memory Model
[Figure: a compute device (e.g. GPU) with compute units 1 to N. Each compute unit runs one work-group of work-items 1 to M, each work-item with its own private memory, and each compute unit with its own local memory. A global/constant memory data cache sits on the device; global memory resides in compute device memory.]
Private Memory
• Read/write access
• For a single work-item only
Local Memory
• Read/write access
• For the entire work-group
Constant Memory
• Read access
• For the entire ND-range (all work-items, all work-groups)
Global Memory
• Read/write access
• For the entire ND-range (all work-items, all work-groups)

Basic Program Structure
Host program
• Platform layer
  • Query compute devices
  • Create contexts
• Runtime
  • Create memory objects associated to contexts
  • Compile and create kernel program objects
  • Issue commands to command-queue
  • Synchronization of commands
  • Clean up OpenCL resources
Compute kernel (runs on device)
• OpenCL language
  • C code with some restrictions and extensions
Buffer objects
• 1D collection of objects (like C arrays)
• Scalar and vector types, and user-defined structures
• Accessed via pointers in the compute kernel
Image objects
• 2D or 3D texture, frame-buffer, or images
• Must be addressed through built-in functions
Sampler objects
• Describe how to sample an image in the kernel
  • Addressing modes
  • Filtering modes

OpenCL Language Highlights
Function qualifiers
• The “__kernel” qualifier declares a function as a kernel
Address space qualifiers
• “__global”, “__local”, “__constant”, “__private”
Work-item functions
• get_work_dim()
• get_global_id(), get_local_id(), get_group_id(), get_local_size()
Image functions
• Images must be accessed through built-in functions
• Reads are performed through sampler objects, passed from the host or defined in the kernel source
Synchronization functions
• Barriers: all work-items within a work-group must execute the barrier function before any work-item in the work-group can continue.
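
Example: how the work-item IDs relate
The work-item functions listed above return IDs that are tied together by a simple rule. For a 1D NDRange with a zero global offset, the global ID equals the work-group ID times the work-group size plus the local ID. The sketch below is plain host-side C that only emulates this index arithmetic; it does not call the OpenCL API, and the names work_item_ids, ids_from_global, global_from_group_local and scale_all are hypothetical, introduced here for illustration. It also assumes the global size is a multiple of the work-group size, as OpenCL 1.x requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the IDs one work-item would see in a 1D NDRange. */
typedef struct {
    size_t global_id; /* value get_global_id(0) would return */
    size_t group_id;  /* value get_group_id(0) would return  */
    size_t local_id;  /* value get_local_id(0) would return  */
} work_item_ids;

/* Split a global ID into work-group ID and local ID.
 * Assumption: the global offset is zero. */
work_item_ids ids_from_global(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    work_item_ids ids;
    ids.global_id = global_id;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}

/* Inverse mapping: rebuild the global ID from group ID and local ID. */
size_t global_from_group_local(size_t group_id, size_t local_id,
                               size_t local_size)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Sequential emulation of a trivial kernel: every "work-item" runs the
 * same body (multiply by factor) on the element selected by its global ID.
 * On a real device these iterations would run in parallel. */
void scale_all(float *data, size_t global_size, size_t local_size,
               float factor)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t g = 0; g < num_groups; ++g) {
        for (size_t l = 0; l < local_size; ++l) {
            size_t gid = global_from_group_local(g, l, local_size);
            data[gid] *= factor;
        }
    }
}
```

With a work-group size of 4, global ID 10 falls in work-group 2 at local ID 2. In a real kernel the two loops disappear: each work-item obtains its own index from get_global_id(0) and processes one element.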