CS 698L: Intel Threading Building Blocks

Swarnendu Biswas
Semester 2019-2020-I
CSE, IIT Kanpur

Content influenced by many excellent references; see the References slide for acknowledgements.

Approaches to Parallelism

• New languages
  • For example, Cilk, X10, Chapel
  • New concepts, but difficult to gain widespread acceptance
• Language extensions/pragmas
  • For example, OpenMP
  • Easy to extend, but requires special compiler or preprocessor support
• Library
  • For example, C++ STL, Intel TBB, and MPI
  • Works with existing environments; usually no new compiler is needed

What is Intel TBB?

• A library that helps leverage multicore performance using standard C++
  • Does not require the programmer to be an expert
    • Writing a correct and scalable parallel loop is not straightforward
  • Does not require support for new languages and compilers
  • Does not directly support vectorization
• TBB was first available in 2006
  • Current release is 2019 Update 8
  • Open-source and licensed versions available

What is Intel TBB?

• TBB works at the abstraction of tasks instead of low-level threads
  • Specify tasks that can run concurrently instead of threads
  • Specify work (i.e., tasks) instead of focusing on workers (i.e., threads)
  • Raw threads are like the assembly language of parallel programming
• Maps tasks onto physical threads, using the cache efficiently and balancing load
• Full support for nested parallelism

Advantages with Intel TBB

• Promotes scalable data-parallel programming
  • Data parallelism is more scalable than functional parallelism
  • Functional blocks are usually limited in number, while data parallelism scales with more processors
• Not tailored for I/O-bound or real-time processing
• Compatible with other threading packages and is portable
  • Can be used in concert with native threads and OpenMP
• Relies on generic programming (e.g., C++ STL)

Key Features of Intel TBB

• Generic parallel algorithms: parallel_for, parallel_for_each, parallel_reduce, parallel_scan, parallel_do, pipeline, parallel_pipeline, parallel_sort, parallel_invoke
• Concurrent containers: concurrent_hash_map, concurrent_unordered_map, concurrent_queue, concurrent_bounded_queue, concurrent_vector
• Task scheduler: task_group, structured_task_group, task, task_scheduler_init
• Synchronization primitives: atomic operations, condition_variable, various flavors of mutexes
• Memory allocators: tbb_allocator, cache_aligned_allocator, scalable_allocator, zero_allocator
• Utilities: tick_count, tbb_thread

Task-Based Programming

• Challenges with threads: oversubscription or undersubscription, scheduling policy, load imbalance, portability
  • For example, the mapping of logical to physical threads is crucial
  • The mapping also depends on whether the computation waits on external devices
• Non-trivial impact of time slicing with context switches, cache cooling effects, and lock preemption
  • Time slicing allows more logical threads than physical threads

Task-Based Programming with Intel TBB

• Intel TBB parallel algorithms map tasks onto threads automatically
• The task scheduler manages the thread pool
  • Oversubscription and undersubscription of core resources are prevented by the task-stealing technique of the TBB scheduler
• Tasks are lighter weight than threads

An Example: Hello World

#include <cstdlib>
#include <iostream>
#include <tbb/tbb.h>

using namespace std;
using namespace tbb;

class HelloWorld {
  const char* id;

public:
  HelloWorld(const char* s) : id(s) {}
  void operator()() const {
    cout << "Hello from task " << id << "\n";
  }
};

int main() {
  task_group tg;
  tg.run(HelloWorld("1"));
  tg.run(HelloWorld("2"));
  tg.wait();
  return EXIT_SUCCESS;
}
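Since tbb::task_group::run() accepts any callable, the same program can also be written with lambdas instead of a functor class. The following is a minimal sketch of that variant (not from the slides):

#include <cstdlib>
#include <iostream>
#include <tbb/tbb.h>

int main() {
  tbb::task_group tg;
  // Each lambda becomes a task that the scheduler maps onto a worker thread.
  tg.run([] { std::cout << "Hello from task 1\n"; });
  tg.run([] { std::cout << "Hello from task 2\n"; });
  tg.wait();  // Block until both tasks have finished.
  return EXIT_SUCCESS;
}

As with the functor version, the two tasks may run concurrently, so the greetings can be printed in either order.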
Another Example: Parallel loop

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <tbb/parallel_for.h>
#include <tbb/tbb.h>

using namespace std;
using namespace std::chrono;
using HRTimer = high_resolution_clock::time_point;

#define N (1 << 26)

void seq_incr(float* a) {
  for (int i = 0; i < N; i++) {
    a[i] += 10;
  }
}

void parallel_incr(float* a) {
  tbb::parallel_for(static_cast<size_t>(0), static_cast<size_t>(N),
                    [&](size_t i) { a[i] += 10; });
}

int main() {
  float* a = new float[N];
  for (int i = 0; i < N; i++) {
    a[i] = static_cast<float>(i);
  }

  HRTimer start = high_resolution_clock::now();
  seq_incr(a);
  HRTimer end = high_resolution_clock::now();
  auto duration = duration_cast<microseconds>(end - start).count();
  cout << "Sequential increment in " << duration << " us\n";

  start = high_resolution_clock::now();
  parallel_incr(a);
  end = high_resolution_clock::now();
  duration = duration_cast<microseconds>(end - start).count();
  cout << "Intel TBB Parallel increment in " << duration << " us\n";

  return EXIT_SUCCESS;
}

Initializing the TBB Library

• Control when the task scheduler is constructed and destroyed
• Specify the number of threads used by the task scheduler
• Specify the stack size for worker threads
• Not required in recent versions (TBB 2.2 and later)

#include <tbb/task_scheduler_init.h>

using namespace tbb;

int main() {
  task_scheduler_init init;
  ...
  return 0;
}
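To make the initialization slide concrete, here is a hedged sketch that asks the scheduler for an explicit number of worker threads; the thread count of 4 and the vector size are arbitrary example values, not recommendations:

#include <cstddef>
#include <vector>
#include <tbb/parallel_for.h>
#include <tbb/task_scheduler_init.h>

int main() {
  // Request at most 4 worker threads; a second constructor argument
  // could additionally set the worker threads' stack size.
  tbb::task_scheduler_init init(4);

  std::vector<float> a(1000, 0.0f);
  tbb::parallel_for(std::size_t(0), a.size(),
                    [&](std::size_t i) { a[i] += 10; });
  return 0;
}

As the slide notes, explicit initialization is optional in TBB 2.2 and later; constructing the object only matters when you want to override the default thread count or stack size.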
Thinking Parallel

• Decomposition
  • Decompose the problem into concurrent tasks
• Scaling
  • Identify concurrent tasks to keep processors busy
• Threads
  • Map tasks to threads
• Correctness
  • Ensure correct synchronization to shared resources

How to Decompose?

• Data parallelism
• Task parallelism

How to Decompose?

• Distinguishing just between data and task parallelism may not be perfect
  • Imagine TAs grading questions of varied difficulty
• Might need hybrid parallelism, pipelining, or work stealing

OpenMP vs Intel TBB

OpenMP
• Language extension consisting of pragmas, routines, and environment variables
• Supports C, C++, and Fortran
• User can control scheduling policies
• Limited to specified types (e.g., for reductions)

Intel TBB
• Library for task-based programming
• Supports C++ with generics
• Automated divide-and-conquer approach to scheduling, with work stealing
• Generic programming is flexible with types

Generic Parallel Algorithms

Generic Programming

• Best-known example is the C++ Standard Template Library (STL)
• Enables distribution of useful, high-quality algorithms and data structures
• Write the best possible algorithm with the fewest constraints (e.g., std::sort)
• Instantiate the algorithm for the specific situation
  • C++ template instantiation, partial specialization, and inlining make the resulting code efficient
• STL is not generally thread-safe

Generic Programming Example

• The compiler creates the needed versions
• T must define a copy constructor, a destructor, and operator<

template <typename T>
T max(T x, T y) {
  if (x < y) return y;
  return x;
}

int main() {
  int i = max(20, 5);
  double f = max(2.5, 5.2);
  MyClass m = max(MyClass("foo"), MyClass("bar"));
  return 0;
}

Intel Threading Building Blocks Patterns

• High-level parallel and scalable patterns
  • parallel_for: load-balanced parallel execution of independent loop iterations
  • parallel_reduce: load-balanced parallel execution of independent loop iterations that perform a reduction
  • parallel_scan: template function that computes a parallel prefix
  • parallel_while: load-balanced parallel execution of independent loop iterations with unknown or dynamically changing bounds
  • pipeline: data-flow pipeline pattern
  • parallel_sort: parallel sort

Loop Parallelization

• parallel_for and parallel_reduce
  • Load-balanced, parallel execution of a fixed number of independent loop iterations
• parallel_scan
  • A template function that computes a prefix computation (also known as a scan) in parallel
  • y[i] = y[i-1] op x[i]

TBB parallel_for

void SerialApplyFoo(float a[], size_t n) {
  for (size_t i = 0; i < n; ++i)
    foo(a[i]);
}

Class Definition for TBB parallel_for

#include "tbb/blocked_range.h"

class ApplyFoo {  // Body object
  float* const m_a;

public:
  void operator()(const blocked_range<size_t>& r) const {  // Task
    float* a = m_a;
    for (size_t i = r.begin(); i != r.end(); ++i)
      foo(a[i]);
  }
  ApplyFoo(float a[]) : m_a(a) {}
};

TBB parallel_for

#include "tbb/parallel_for.h"

void ParallelApplyFoo(float a[], size_t n) {
  parallel_for(blocked_range<size_t>(0, n, grainSize),
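The extract ends mid-call above. For reference, the body-object form of parallel_for documented by Intel TBB completes this pattern roughly as sketched below; the foo stub and the grainSize value of 1000 are assumptions for illustration, and ApplyFoo is the class from the previous slide, so this is a sketch rather than the slide's exact continuation:

#include <cstddef>
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

using namespace tbb;

void foo(float& x) { x += 1.0f; }  // stand-in for the slides' foo()

class ApplyFoo {  // body object from the previous slide
  float* const m_a;

public:
  ApplyFoo(float a[]) : m_a(a) {}
  void operator()(const blocked_range<size_t>& r) const {
    float* a = m_a;
    for (size_t i = r.begin(); i != r.end(); ++i)
      foo(a[i]);
  }
};

void ParallelApplyFoo(float a[], size_t n) {
  // grainSize is a user-chosen chunk granularity; 1000 is only an example.
  const size_t grainSize = 1000;
  // Split [0, n) into subranges and run ApplyFoo::operator() on each,
  // possibly on different worker threads.
  parallel_for(blocked_range<size_t>(0, n, grainSize), ApplyFoo(a));
}

Here grainSize sets the granularity below which a blocked_range is not split further; tuning it is workload-dependent.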
