SYCL Building Blocks for C++ libraries Maria Rovatsou, SYCL specification editor Principal Software Engineer, Codeplay Software
© Copyright Khronos Group 2014 SYCL Building Blocks for C++ libraries
Overview SYCL standard What is the SYCL toolkit for? C++ libraries using SYCL: Under construction SYCL Building blocks asynchronous execution on multiple SYCL devices command groups for embedding computational kernels in a library buffers and accessors for data storage and access vector classes shared between host and device pipes hierarchical command group execution printing on device using cl::sycl::stream
© Copyright Khronos Group 2014 SYCL standard
© Copyright Khronos Group 2014 Khronos royalty-free open standard that works with OpenCLTM
SYCLTM for OpenCLTM 1.2 is a published specification that can be found at: https://www.khronos.org/registry/sycl/specs/sycl-1.2.pdf SYCLTM for OpenCLTM 2.2 is a published provisional specification that the group would like feedback on: https://www.khronos.org/registry/sycl/specs/sycl-2.2.pdf
© Copyright Khronos Group 2014 What is the SYCL toolkit for?
© Copyright Khronos Group 2014 What is the SYCL toolkit for?
Performance Programmability Heterogeneous Power efficiency system Longevity of software written for integrating host custom architectures Best h/w architecture multi-core CPU per application with GPUs and Use existing C++ frameworks on accelerators new targets, e.g. from HPC to Harness processing power in mobile and embedded and vice order to create much more versa powerful applications
© Copyright Khronos Group 2014 C++ libraries using SYCL building blocks: Under construction
© Copyright Khronos Group 2014 C++ libraries using SYCL building blocks: VisionCpp
1 auto q = get_queue
© Copyright Khronos Group 2014 C++ libraries using SYCL building blocks: Eigen Tensor Library
1 // use gpu 2 cl::sycl::gpu_selector s; 3 cl::sycl::queue q(s); 4 Eigen::SyclDevice sycl_device(q); 5 // create a tensor range Eigen is a C++ 6 Eigen::array
© Copyright Khronos Group 2014 C++ libraries using SYCL building blocks: Parallel STL
Parallel STL is the next version of the C++ standard library which is starting to add support for parallel algorithms.
1 vector
© Copyright Khronos Group 2014 SYCL building blocks
© Copyright Khronos Group 2014 SYCL Building Blocks: asynchronous execution on multiple SYCL devices
buffers device_selector accessors
command groups for atomically parallel_for functions for encapsulating kernels over an scheduling computational kernels index space of work items on queues
asynchronous error handling single-source offline multiple host/device compiler solutions event mechanism
C++ lambdas or functors compiled for SYCL asynchronous queue devices.
© Copyright Khronos Group 2014 SYCL Building Blocks: command groups for embedding computational kernels in a library
The data movement is command group lambda or functor managed by buffers and accessors.
1 auto myCommandGroup = ([&](cl::sycl::execution_handler& cgh) { 2 auto acc = buffer.get_access
On submit the kernel and its access requirements are 1 queue myQueue; scheduled for execution on an asynchronous queue. 2 auto myEvent = myQueue.submit(myCommandGroup);
© Copyright Khronos Group 2014 SYCL Building Blocks: buffers and accessors for data storage and access
Data is managed 1 buffer
The buffer on destruction waits for all usage of its data by the SYCL The buffer class takes runtime to be completed. ownership of the data.
© Copyright Khronos Group 2014 SYCL Building Blocks: accessors for data storage and access
Accessor class The SYCL runtime is effectively constructing an asynchronous runtime gives access to a graph that schedules all the command groups and their memory buffer in a transfers. command group
1 auto cg = [&](handler &ch) { 2 auto posAcc = positionsBuf.get_access
© Copyright Khronos Group 2014 SYCL Building Blocks: accessors for data storage and access
1 cl::sycl::svm_allocator
1 q.submit([&]( 2 cl::sycl::execution_handler
1 int result = *svmFineBuffer; Direct Usage of SVM allocated fine grained buffer pointer on the host.
© Copyright Khronos Group 2014 SYCL Building Blocks: SIMD vector classes available on host and device
1 for (int i = 0; i < 64; i++) { The vec
1 myQueue.submit([&](handler& cgh) { 2 auto ptrA = bufA.get_access
© Copyright Khronos Group 2014 SYCL Building Blocks: pipes
1 cl::sycl::static_pipe
The static_pipe is a pipe with constexpr capacity that is defined only for one target device. This is the type of pipe which can be useful for compile-time graphs and pipelines.
© Copyright Khronos Group 2014 SYCL Building Blocks: hierarchical command group kernels
parallel_for_work_group() : in this scope the code is executing once per work group and the memory is allocated in the local address space. In a parallel_for_work_item 1 auto command_group = [&](handler & cgh) { scope the code is 2 // Issue 8 work-groups of 8 work-items each 3 cgh.parallel_for_work_group
© Copyright Khronos Group 2014 SYCL Building Blocks: basic stream class
The stream object is provided to give basic support for printing from the device for debugging or notification reasons.
1 myQueue.submit([&](handler& cgh) { 2 /* We create a stream object to print from the device. */ 3 stream os(1024, 80, cgh); 4 cgh.single_task
© Copyright Khronos Group 2014 SYCL Building Blocks for C++ libraries Questions?
SYCL Building blocks asynchronous execution on multiple SYCL devices command groups for embedding computational kernels in a library buffers and accessors for data storage and access vector classes shared between host and device pipes hierarchical command group execution printing on device using cl::sycl::stream
© Copyright Khronos Group 2014 • SYCL spec and forums: www.khronos.org/opencl/sycl
• New website coming! sycl.tech
© Copyright Khronos Group 2016