Harnessing the Power of Hete

Harnessing the Power of Hete

Harnessing the Power of Heterogeneous Computing using Boost.Compute + OpenCL Armin Sobhani [email protected] SHARCNET University of Ontario Institute of Technology (UOIT) August 15, 2018 Outline • A Quick introduction to Boost.Compute and OpenCL • Tutorial for developing Boost.Compute applications on SHARCNet clusters: https://git.sharcnet.ca/asobhani/bc-tutorial Heterogeneous Computing using Boost.Compute + OpenCL 2 Standard Template Library (STL) • Software library for the C++ Containers • Influenced many parts of the C++ Standard Iterators Library Algorithms • Consisting of 4 components: Functions Heterogeneous Computing using Boost.Compute + OpenCL 3 Parallel STL Why? Available Implementations C++17 Parallel Algorithms • Intel’s open source Parallel STL Familiar to most C++ programmers • KhronosGroup’s SYCL Parallel STL Third-Party C++ Libraries Simplifies porting existing applications to parallel • Boost.Compute architectures • Nvidia’s Thrust • AMD’s Bolt Heterogeneous Computing using Boost.Compute + OpenCL 4 Boost.Compute header-only template library for parallel computing based on OpenCL available in Boost starting with version 1.61 Heterogeneous Computing using Boost.Compute + OpenCL 5 OpenCL Open Computing Language Open standard for parallel programming of heterogenous systems Latest version is 2.2 Views a computing system as consisting of a number of compute devices Functions executed on an OpenCL device are called kernels Works by compiling C99 code at run-time to generate kernel objects Heterogeneous Computing using Boost.Compute + OpenCL 6 Boost.Compute vs. Thrust vs. Bolt Nvidia’s Thrust AMD’s Bolt Doesn’t work Only works with with AMD GPUs AMD CPUs/GPUs Heterogeneous Computing using Boost.Compute + OpenCL 7 Boost.Compute 101 Boost.Compute Multi-Core CPUs Intel AMD GPUs STL for Accelerators AMD Xeon Phi Intel Parallel Epiphany Nvidia Devices FPGAs Altera Xilinx Heterogeneous Computing using Boost.Compute + OpenCL 9 Library Architecture Boost.Compute STL-like API containers Lambda Iterators RNGs Expressions algorithms functions Low-Level API OpenCL Compute Devices (Multi-core CPU, GPU, Accelerators, FPGA) Heterogeneous Computing using Boost.Compute + OpenCL 10 Low-Level API OpenCL C++ Wrapper OpenCL objects Utility buffer Ta k e s c a r e o f context reference counting Functions command_queue error checking e.g. setting up etc. default device Heterogeneous Computing using Boost.Compute + OpenCL 11 Low-Level API for (auto const& device : boost::compute::system::devices()) std::cout << "device = " << device.name() << std::endl; • Heterogeneous Computing using Boost.Compute + OpenCL 12 Low-Level API Example #include <boost/compute/core.hpp> The default device is selected based on a set of heuristics and can be influenced using one of the // lookup default compute device following environment variables: auto device = boost::compute::system::default_device(); • BOOST_COMPUTE_DEFAULT_DEVICE – // create OpenCL context for the device name of the compute device (e.g. "AMD Radeon") auto ctx = boost::compute::context(device); // get default command queue • BOOST_COMPUTE_DEFAULT_DEVICE_TYPE – type of the auto queue = boost::compute::command_queue(ctx, device); compute device (e.g. "GPU" or "CPU") // print device name • BOOST_COMPUTE_DEFAULT_PLATFORM – name of the std:cout << "device = " << device.name() << std::endl; platform (e.g. "NVIDIA CUDA") • BOOST_COMPUTE_DEFAULT_VENDOR – name of the device vendor (e.g. "Intel") Heterogeneous Computing using Boost.Compute + OpenCL 13 High-Level API – Parallel STL array<T, N> buffer_iterator<T> bernoulli_distribution basic_string<CharT> constant_buffer_iterator<T> default_random_engine dynamic_bitset<> constant_iterator<T> discrete_distribution linear_congruential_engine flat_map<Key, T> counting_iterator<T> RNGs mersenne_twister_engine flat_set<T> discard_iterator normal_distribution mapped_view<T> Iterators function_input_iterator<Function> uniform_int_distribution stack<T> permutation_iterator<Elem, Index> Containers uniform_real_distribution string strided_iterator<Iterator> valarray<T> transform_iterator<Iterator, Function> vector<T> zip_iterator<IteratorTuple> Heterogeneous Computing using Boost.Compute + OpenCL 14 High-Level API – Parallel STL accumulate() for_each_n() none_of() set_difference() adjacent_difference() gather() nth_element() set_intersection() adjacent_find() generate() partial_sum() set_symmetric_ all_of() generate_n() partition() difference() any_of() includes() partition_copy() set_union() binary_search() inclusive_scan() partition_point() sort() copy() inner_product() prev_permutation() sort_by_key() copy_if() inplace_merge() random_shuffle() stable_partition() copy_n() iota() reduce() stable_sort() count() is_partitioned() reduce_by_key() stable_sort_by_key() count_if() is_permutation() remove() swap_ranges() equal() is_sorted() remove_if() transform() equal_range() lower_bound() replace() transform_reduce() exclusive_scan() lexicographical_ replace_copy() unique() fill() compare() reverse() unique_copy() fill_n() max_element() reverse_copy() upper_bound() find() merge() rotate() find_end() min_element() rotate_copy() Algorithms find_if() minmax_element() scatter() find_if_not() mismatch() search() for_each() next_permutation() search_n() Heterogeneous Computing using Boost.Compute + OpenCL 15 Host Sort Example STL Boost.Compute #include <vector> #include <boost/compute/algorithm.hpp> #include <algorithm> //#include <boost/compute/algorithm/sort.hpp> #include <vector> std::vector<int> v{…}; std::vector<int> v{…}; // fill the vector with some data // fill the vector with some data std::sort(v.begin(), v.end()); boost::compute::sort(v.begin(), v.end(), queue); Heterogeneous Computing using Boost.Compute + OpenCL 16 Host Sort Example Heterogeneous Computing using Boost.Compute + OpenCL 17 Algorithm Internals generate produce run on the OpenCL compile at OpenCL kernels in kernel compute run-time device C99 objects C++ OpenCL Heterogeneous Computing using Boost.Compute + OpenCL 18 Custom Functions BOOST_COMPUTE_FUNCTION(int, add_three, (int x), • BOOST_COMPUTE_FUNCTION() { return x + 3; macro produces Boost.Compute }); function object boost::compute::vector<int> vector = { ... }; boost::compute::transform( • Body of the function is checked vector.begin(), during OpenCL compilation vector.end(), vector.begin(), add_three, queue ); Heterogeneous Computing using Boost.Compute + OpenCL 19 Lambda Expressions • Not the same as C++11 lambdas • Easy way for specifying custom function for algorithms • Fully type-checked by the C++ compiler using boost::compute::_1; boost::compute::vector<int> vec = { ... }; boost::compute::transform(vec.begin(), vec.end(), vec.begin(), _1 + 5, queue); Heterogeneous Computing using Boost.Compute + OpenCL 20 Boost.Compute Tu t o r i a l Available on SHARCNet GitLab https://git.sharcnet.ca/asobhani/bc-tutorial Heterogeneous Computing using Boost.Compute + OpenCL 22.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    22 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us