Harnessing the Power of Heterogeneous Computing using Boost.Compute + OpenCL
Armin Sobhani
SHARCNET
University of Ontario Institute of Technology (UOIT)
August 15, 2018
Outline
• A quick introduction to Boost.Compute and OpenCL
• Tutorial for developing Boost.Compute applications on SHARCNET clusters:
https://git.sharcnet.ca/asobhani/bc-tutorial
Standard Template Library (STL)
• Software library for C++
• Influenced many parts of the C++ Standard Library
• Consists of 4 components: Containers, Iterators, Algorithms, Functions
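The four components can be seen working together in a few lines of standard C++. This is only an illustrative sketch; the function name sum_of_top_three is hypothetical and not part of the talk.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Returns the sum of the three largest elements
// (or of all elements if there are fewer than three).
int sum_of_top_three(std::vector<int> values)        // container
{
    // algorithm (std::sort) driven by iterators and a function object
    std::sort(values.begin(), values.end(), std::greater<int>());

    // iterator marking the end of the "top three" range
    auto last = values.size() < 3 ? values.end() : values.begin() + 3;

    // second algorithm, again operating purely on iterators
    return std::accumulate(values.begin(), last, 0);
}
```

The algorithms never touch the container directly; they only see the iterator range, which is the design that Parallel STL libraries reuse.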
Parallel STL
Why?
• Familiar to most C++ programmers
• Simplifies porting existing applications to parallel architectures
Available Implementations
C++17 Parallel Algorithms
• Intel’s open source Parallel STL
• KhronosGroup’s SYCL Parallel STL
Third-Party C++ Libraries
• Boost.Compute
• Nvidia’s Thrust
• AMD’s Bolt
Boost.Compute
header-only template library
based on OpenCL
available in Boost starting with version 1.61
OpenCL
Open Computing Language
Open standard for parallel programming of heterogeneous systems
Latest version is 2.2 (as of August 2018)
Views a computing system as consisting of a number of compute devices
Functions executed on an OpenCL device are called kernels
Works by compiling OpenCL C (a C99-based language) code at run-time to generate kernel objects
Boost.Compute vs. Thrust vs. Bolt
• Nvidia’s Thrust: doesn’t work with AMD GPUs
• AMD’s Bolt: only works with AMD CPUs/GPUs
Boost.Compute 101
Boost.Compute
STL for Parallel Devices
• Multi-Core CPUs: Intel, AMD
• GPUs: AMD, Intel, Nvidia
• Accelerators: Xeon Phi, Epiphany
• FPGAs: Altera, Xilinx
Library Architecture
Boost.Compute
STL-like API: containers, iterators, algorithms, functions, RNGs, lambda expressions
Low-Level API
OpenCL
Compute Devices (Multi-core CPU, GPU, Accelerators, FPGA)
Low-Level API
OpenCL C++ Wrapper
• OpenCL objects: buffer, context, command_queue, etc.
• Takes care of: reference counting, error checking
• Utility functions: e.g. setting up default device
Low-Level API
for (auto const& device : boost::compute::system::devices())
    std::cout << "device = " << device.name() << std::endl;
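Low-Level API Example
#include <iostream>
#include <boost/compute/core.hpp>

// look up the default compute device
auto device = boost::compute::system::default_device();
// create an OpenCL context for the device
auto ctx = boost::compute::context(device);
// create a command queue on that device
auto queue = boost::compute::command_queue(ctx, device);
// print device name
std::cout << "device = " << device.name() << std::endl;

Environment variables that control the choice of default device:
• BOOST_COMPUTE_DEFAULT_DEVICE_TYPE – type of the compute device (e.g. "GPU" or "CPU")
• BOOST_COMPUTE_DEFAULT_PLATFORM – name of the platform (e.g. "NVIDIA CUDA")
• BOOST_COMPUTE_DEFAULT_VENDOR – name of the device vendor (e.g. "Intel")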
High-Level API – Parallel STL
• Containers: array, flat_map, string
• Iterators: strided_iterator
• Random number distributions: uniform_real_distribution
High-Level API – Parallel STL
Algorithms
accumulate(), adjacent_difference(), adjacent_find(), all_of(), any_of(), binary_search(), copy(), copy_if(), copy_n(), count(), count_if(), equal(), equal_range(), exclusive_scan(), fill(), fill_n(), find(), find_end(), find_if(), find_if_not(), for_each(),
for_each_n(), gather(), generate(), generate_n(), includes(), inclusive_scan(), inner_product(), inplace_merge(), iota(), is_partitioned(), is_permutation(), is_sorted(), lower_bound(), lexicographical_compare(), max_element(), merge(), min_element(), minmax_element(), mismatch(), next_permutation(),
none_of(), nth_element(), partial_sum(), partition(), partition_copy(), partition_point(), prev_permutation(), random_shuffle(), reduce(), reduce_by_key(), remove(), remove_if(), replace(), replace_copy(), reverse(), reverse_copy(), rotate(), rotate_copy(), scatter(), search(), search_n(),
set_difference(), set_intersection(), set_symmetric_difference(), set_union(), sort(), sort_by_key(), stable_partition(), stable_sort(), stable_sort_by_key(), swap_ranges(), transform(), transform_reduce(), unique(), unique_copy(), upper_bound()
Host Sort Example
STL
#include <algorithm>
// fill the vector with some data
std::sort(v.begin(), v.end());

Boost.Compute
#include <boost/compute.hpp>
// fill the vector with some data
boost::compute::sort(v.begin(), v.end(), queue);

In both versions v is a vector in host memory; queue is a boost::compute::command_queue.
Algorithm Internals
From C++ to OpenCL:
• generate OpenCL kernels in C99
• compile at run-time
• produce OpenCL kernel objects
• run on the compute device
Custom Functions
• BOOST_COMPUTE_FUNCTION() macro produces a Boost.Compute function object
BOOST_COMPUTE_FUNCTION(int, add_three, (int x),
{
    return x + 3;
});
The resulting function object can be passed to algorithms that operate on a boost::compute::vector.
Lambda Expressions
• Not the same as C++11 lambdas
• An easy way to specify custom functions for algorithms
• Fully type-checked by the C++ compiler
using boost::compute::_1;
Expressions built from placeholders such as _1 can be passed to algorithms that operate on a boost::compute::vector.
Boost.Compute Tutorial
Available on SHARCNET GitLab
https://git.sharcnet.ca/asobhani/bc-tutorial