A Task-Based Implementation for GeantV

Joel Fuentes, Andrei Gheata

Introduction
- GeantV is a project that aims to develop a high-performance detector simulation system integrating fast and full simulation.
- This work implements a task-based approach for GeantV using Intel Threading Building Blocks (TBB).
- Intel TBB is a well-known library that provides tools to manage tasks, concurrent data structures, and parallel algorithms such as parallel for and pipeline.
- A previous implementation with TBB served as a starting point.

Intel Threading Building Blocks (TBB)
• Each thread has its own ready pool, which is a list of tasks.
• A task goes into a pool when it is allocated.
• Each thread steals tasks from other pools when necessary.

TBB Task Scheduler
Problem            Solution
Oversubscription   One TBB thread per hardware thread
Fair scheduling    Non-preemptive unfair scheduling
High overhead      Programmer specifies tasks, not threads
Load imbalance     Work-stealing balances load
Scalability        Specify tasks and how to create them, rather than threads

Task model of GeantV
[Figure: diagram of the GeantV task model. Components and annotations recoverable from the diagram:]
- Feeder task: reads a number of events from file and invokes the concurrent basketizer service, injecting the input particles.
- Basketizer(s): concurrent service; injects full baskets into the basket queue.
- Basket queue: concurrent service; baskets are enqueued and dequeued here.
- Transport task: transports one basket for one step; reuses tracks, keeping locality; may be further split into subtasks.
- Scoring: a user task that reads track info and creates "hits".
- Digitizer task: a user task working on "hits" data.
- I/O task: writes data (hits, digits, kinematics) to disk.
- Garbage collector: forces partially filled baskets into the basket queue to boost concurrency.
- Flow control task: checks "event finished?" and "queue empty?" and spawns further tasks.
- Arrow labels: spawn, inject, enqueue, dequeue, inspect, command, "dump all your baskets".

Implementation
Tasks implemented:
- InitialTask
- FlowControllerTask
- FeederTask
- TransportTask
• Tasks are described using C++ classes that have the class tbb::task as their base class.
• Task operations are implemented in the virtual method execute().
• New parallel tasks are launched using spawn().
• Once the TBB runtime library schedules a task for execution, it calls the task's execute() method in a non-preemptive manner, so the task runs to completion.
Additional classes implemented:
- ThreadData
- TaskMgrTBB
All the new classes of the task-based approach are packaged in a shared library called Geant_tbb.

How to run in TBB mode?
- Install TBB. See https://www.threadingbuildingblocks.org for details.
- Build the GeantV project for TBB mode by setting the parameters:
  $ -DUSE_TBB=ON -DTBBROOT=/your/path/to/TBB
- Execute runApp in TBB mode by setting the flag:
  $ ./runApp -i 1

Experimental Results
Model name:          Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
NUMA node0 CPU(s):   0-7,16-23
NUMA node1 CPU(s):   8-15,24-31

Static threads mode: runApp -e 10 -u 4 -t 16
TBB mode:            runApp -e 10 -u 4 -i 1 -t 16

                    Static threads mode   TBB mode
Primaries           4870                  4870
Tracks              4035173               4028541
Total steps         16358977              16331333
snext calls         16358977              16331333
phys steps          16208410              16179636
mag. field steps    0                     0
small steps         939                   916
bdr. crossings      150567                151697
RT                  2.03824s              2.64854s
CP                  31.08s                54.26s

TBB mode: nthreads=16 speed-up=20.486797 efficiency=1.280425

Conclusions
- A first implementation of a task-based approach for GeantV using TBB was deployed.
- The new approach provides more flexibility and connectivity to other task-based multi-threaded frameworks.
- The TBB mode shows some overhead compared to the static thread approach, which still has to be fully understood and addressed.
  - Possible reason: task initialization.
- There are still more tasks to implement: Scoring Task, I/O Task, Digitizer Task, and so on.
- Directory with the task-based implementation: https://gitlab.cern.ch/GeantV/geant/tree/master/vecprot_v2_tbb
- Report about this work submitted to GSoC16: http://www.face.ubiobio.cl/~jfuentes/blog/geantv

Thank you
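The task pattern from the Implementation slide (a base class with a virtual execute(), new work launched through spawn(), each task run to completion once scheduled) can be illustrated with a small standard-library stand-in. This is only a sketch: Task, Scheduler, ToyFeederTask, ToyTransportTask and runToyFlow are hypothetical names invented for the illustration. They are not the TBB API and not the GeantV classes, and the toy scheduler is single-threaded, whereas TBB runs tasks on several threads with work stealing.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Scheduler;

// Hypothetical stand-in for a task base class: the work lives in execute().
class Task {
public:
  virtual ~Task() = default;
  virtual void execute(Scheduler &sched) = 0;
};

// Hypothetical single-threaded scheduler with one ready pool.
class Scheduler {
public:
  // Put a task into the ready pool.
  void spawn(std::unique_ptr<Task> t) {
    assert(t != nullptr);
    ready_.push_back(std::move(t));
  }

  // Run tasks one at a time until the pool is empty.
  void run() {
    while (!ready_.empty()) {
      std::unique_ptr<Task> t = std::move(ready_.front());
      ready_.pop_front();
      t->execute(*this);  // non-preemptive: runs until execute() returns
    }
  }

  std::vector<std::string> log;  // records the order in which tasks ran

private:
  std::deque<std::unique_ptr<Task>> ready_;
};

// Toy transport task: "transports" one basket, identified by an index.
class ToyTransportTask : public Task {
public:
  explicit ToyTransportTask(int basket) : basket_(basket) {}
  void execute(Scheduler &sched) override {
    sched.log.push_back("transport " + std::to_string(basket_));
  }

private:
  int basket_;
};

// Toy feeder task: spawns one transport task per basket.
class ToyFeederTask : public Task {
public:
  explicit ToyFeederTask(int nbaskets) : nbaskets_(nbaskets) {}
  void execute(Scheduler &sched) override {
    sched.log.push_back("feeder");
    for (int i = 0; i < nbaskets_; ++i)
      sched.spawn(std::make_unique<ToyTransportTask>(i));
  }

private:
  int nbaskets_;
};

// Spawn a single feeder, drain the pool, and return the execution order.
std::vector<std::string> runToyFlow(int nbaskets) {
  Scheduler sched;
  sched.spawn(std::make_unique<ToyFeederTask>(nbaskets));
  sched.run();
  return sched.log;
}
```

Calling runToyFlow(2) runs the feeder first and then the two transport tasks it spawned. In this sketch the programmer only describes tasks and how they create further tasks, and the scheduler decides when each one runs.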
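The speed-up and efficiency values in Experimental Results are numerically consistent with speed-up = CP / RT and efficiency = speed-up / nthreads. The slides do not state these formulas, so this reading is an inference from the numbers, as is the interpretation of RT as real (wall-clock) time and CP as CPU time. Under those assumptions, a minimal sketch of the arithmetic (RunTimes, speedUp, efficiency and the two constants are hypothetical names):

```cpp
#include <cassert>
#include <cmath>

// Timings of one run. Assumption: RT is wall-clock time, CP is CPU time.
struct RunTimes {
  double realTime;  // RT, in seconds
  double cpuTime;   // CP, in seconds
  int nthreads;     // value passed with -t
};

// Assumed definition, inferred from the reported numbers: CP / RT.
double speedUp(const RunTimes &r) { return r.cpuTime / r.realTime; }

// Assumed definition: speed-up divided by the number of threads.
double efficiency(const RunTimes &r) { return speedUp(r) / r.nthreads; }

// RT, CP and thread count taken from the Experimental Results slide.
const RunTimes kTbbRun{2.64854, 54.26, 16};
const RunTimes kStaticRun{2.03824, 31.08, 16};
```

For the TBB run this gives a speed-up of about 20.49 and an efficiency of about 1.28, matching the reported values to within rounding. With this definition the ratio reflects how much CPU time was consumed per second of wall-clock time, so it should be read together with RT, which is lower for the static threads run.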
