Threading for Performance with Intel® Threading Building Blocks

Threading for Performance with Intel® Threading Building Blocks

Session: Threading for Performance with Intel® Threading Building Blocks INTEL CONFIDENTIAL Objjec tives At theee en dod of thi ssessos session ,you, you will b eae abl e t o: • List the different components of the Intel Threading Building Blocks library • Code parallel applications using TBB SOFTWARE AND SERVICES 2 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. AdAggenda Intel®®adgudgobagoud Threading Building Blocks background Generic Parallel Algorithms pp_arallel_for parallel_reduce paralle l_sort example Task Scheduler GiGeneric HihlHighly Concurren t CtiContainers Scalable Memory Allocation Low-level Synchronization Primitives SOFTWARE AND SERVICES 3 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. Intel®®gg() Threading Building Blocks (TBB) -Key Features You sp ecify tasks ((aauouwhat can run concurrently) y)ado instead of thread s • Library maps your logical tasks onto physical threads, efficiently using cache and balancing load • Full support for nested paralle lism Targets threading for scalable performance • Portable across Linux*, Mac OS*, Windows*, and Solaris* Emphasizes scalable data parallel programming • Loop parallelism tasks are more scalable than a fixed number of separate tasks Compatible with other thre ading pack age s • Can be used in concert with native threads and OpenMP Open source and lilidcensed versions availilblable SOFTWARE AND SERVICES 4 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. Limit ati ons Intel® TBB is not intended for • I/O bound processing • Real-time processing SOFTWARE AND SERVICES 5 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. CtComponents off TBB (i2)(version 2.1) Parallel algorithms paralle l_ for (improve d) parallel_reduce (improved) Concurrent containers para lle l_do ((enew) concurrent_hash _map pipeline (improved) concurrent_queue parallel_ sort concurrent_vector paralle l_scan (all improved) Taskhdlk scheduler With new functionality Synchronization primitives Utilities atomic operations tick_count various flavors of mutexes (improved) tbb_ thread (new) Memory alllltocators tbb_allocator (()new),,_g_, cache_aligned_allocator, scalable_ allocator SOFTWARE AND SERVICES 6 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. TkTask-base d Progggrammi ng with In te l® ® TBB Taaagsks are light-weiggauht entities at user-level • Intel® TBB parallel algorithms map tasks onto threads automatically • Tas k sched ul er manages the thread pool – Scheduler is unfair to favor tasks that have been most recent in the cache • Oversubscription and undersubscription of core resources is prevented by task-stealing technique of TBB scheduler SOFTWARE AND SERVICES 7 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. GiGeneric ParallelPlll AlgorithmsAlith INTEL CONFIDENTIAL GiGeneric Progggrammi ng Best knooapwn example is C++ STL Enables distribution of broadly-useful high-quality algorithms and data stttructures Write best possible algorithm with fewest constraints • Do not force particular data structure on user • Class ic examp le: STL st d::sort Instantiate algorithm to specific situation • C++ template instantiation, partial specialization, and inlining make resulting code efficient Standard Template Library, overall, is not thread-safe SOFTWARE AND SERVICES 9 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. GiGeneric Progggrammi ng - ElExamplp e The coopmpiler crea tes the need dded versio ns T must define a copy constructor and a destructor temppyplate <typename T> T max ((,y){T x, T y) { if (x < y) return y; return x; } T must define operator< int main() { inti=int i = max(20,5); dblfdouble f = max(2.5 , 5 .2) ; MClMyClass m = max(My Class (“foo ”), My Class (“bar ”)); return 0; } SOFTWARE AND SERVICES 10 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. In tel® ®gg Threa ding Bu ilding Bloc ks Pa tterns Task scheduler powers high level parallel patterns that are pre- packaged, tested, and tuned for scalability • parallel_for: load-balanced parallel execution of loop iterations where iterations are independent • parallel_reduce : load-balanced parallel execution of independent loop iterations that perform reduction (e.g . summation of array elements) • parallel_ while: load-balanced parallel execution of independent loop itera tions with unknown or dynami call y changi ng bound s (e.g. apply ing functi on to the element of link ed list) • pp_arallel_scan: temppppplate function that computes parallel prefix • pipeline: data-flow pipeline pattern • parallel_sort : parallel sort SOFTWARE AND SERVICES 11 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. GiGrain Size OpenMP haaapaas similar parameter Part of parallel_ for, not underlying task scheduler • Grain size exists to amortize overhead, not balance load • Units of granularity are loop iterations Typically only need to get it right within an order of magnitude SOFTWARE AND SERVICES 12 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. TiTuning GGirain Size too fine too coarse schdliheduling overhea d didomina tes lose poten tia l paralle lism • Tune byygg examining single-ppprocessor performance • When in doubt, err on the side of making it a little too large, so that performance is not hurt when only one core is available. SOFTWARE AND SERVICES 13 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. The paralle llf_ for TlTemplp a te template <typename Range, typename Body> void parallel_ for(const Range& range, const Body &body); Requires definition of: • A range type to iterat e over – Must define a copy constructor and a destructor – Defines is_empty () – Defines is_divisible () – Defines a splitting constructor, R(R &r, split) •A body type t hat ope rates o n t he ra nge (o r a subra nge ) – Must define a copy constructor and a destructor – DfiDefines operat()tor() SOFTWARE AND SERVICES 14 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. BdBodyy is Generi c Requirements fo r para llel_ foro Body Body::Body(const Body&) Copy constructor Bodyyy()::~Body() Destructor void Body::operator() (Range& subrange) const Apply the body to subrange. paralle l_ for partitions orig ina l range in to subranges, and dea ls out subranggyes to worker threads in a way that: • Balances load • Uses cache efficiently •Scales SOFTWARE AND SERVICES 15 Copyright © 2014, Intel Corporation. All rights reserved. In te l and the Int el logo are trad emark s or regist ered trad emark s of Int el Corporati on or its subsidi ari es in the Unit ed Stat es or other countries. * Other brands and names are the property of their respective owners. Range is GiGeneric Requirements fo r para llel_ foro Raagnge R::R (const R&) Copy constructor R::~R() Destructor bool R::is_empty() const True if range is empty bool R::is_divisible() const True if range can be partitioned R::R (R& r, split) Splitting constructor; splits r into two subranges Library provides predefined ranges • blocked_range and blocked_range2d You can define your own ranges SOFTWARE AND SERVICES 16 Copyright © 2014, Intel Corporation.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    72 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us