A task-based implementation for GeantV

Joel Fuentes Andrei Gheata

1 Introduction

- GeantV is a project that aims to develop a high-performance detector simulation system integrating fast and full simulation.
- This work implements a task-based approach for GeantV using Intel Threading Building Blocks (TBB).

- Intel TBB is a well-known library that provides tools to manage tasks, concurrent data structures, and parallel algorithms such as parallel for, pipeline, etc.
- A previous implementation with TBB served as a starting point.

2 Intel Threading Building Blocks (TBB)

• Each thread has its own ready pool, which is a list of tasks.
• A task goes into a pool when it is spawned.
• Each thread steals tasks from other pools when necessary.
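The ready-pool idea can be illustrated with a small toy model. This is only a sketch under stated assumptions, not TBB's implementation: the names `ReadyPool` and `next_task` are hypothetical, tasks are reduced to integer ids, each pool is protected by a plain mutex, and the victim is chosen round-robin rather than at random. The owner takes its newest task, while a thief takes the oldest one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Toy ready pool (hypothetical, not TBB code): a mutex-protected deque of task ids.
class ReadyPool {
public:
  // Add a task to this pool.
  void push(int task_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    tasks_.push_back(task_id);
  }

  // The owning thread takes the most recently added task.
  bool pop_own(int& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (tasks_.empty()) return false;
    out = tasks_.back();
    tasks_.pop_back();
    return true;
  }

  // A thief takes the oldest task from the other end.
  bool steal(int& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (tasks_.empty()) return false;
    out = tasks_.front();
    tasks_.pop_front();
    return true;
  }

private:
  std::mutex mutex_;
  std::deque<int> tasks_;
};

// Find work for worker `self`: try its own pool first, then steal from the
// other pools in round-robin order (an assumption made for determinism).
inline bool next_task(std::vector<ReadyPool>& pools, std::size_t self, int& out) {
  assert(self < pools.size());
  if (pools[self].pop_own(out)) return true;
  for (std::size_t i = 1; i < pools.size(); ++i) {
    std::size_t victim = (self + i) % pools.size();
    if (pools[victim].steal(out)) return true;
  }
  return false;  // no work anywhere
}
```

With two pools and tasks 1, 2, 3 pushed into pool 0, worker 0 would pick task 3 from its own pool, and an idle worker 1 would steal task 1.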

3 TBB Task Scheduler

Problem            Solution
Oversubscription   One TBB thread per hardware thread
Fair scheduling    Non-preemptive unfair scheduling
High overhead      Programmer specifies tasks, not threads
Load imbalance     Work-stealing balances load

Specify tasks and how to create them, rather than threads.

4 Task model of GeantV

[Diagram: tasks and concurrent services of the GeantV task model]

- Feeder task: reads a number of events from file and invokes the concurrent basketizer service to inject the input particles.
- Basketizer(s): concurrent service that injects full baskets into the basket queue.
- Basket queue: concurrent service; baskets are enqueued by the basketizers and dequeued by the transport tasks.
- Transport task: transports one basket for one step, reusing tracks and keeping locality. It may be further split into subtasks.
- Scoring: a user task that reads track info and creates "hits".
- Digitizer task: a user task working on "hits" data.
- I/O task: writes data (hits, digits, kinematics) to disk.
- Garbage collector: on the "dump all your baskets" command, forces partially filled baskets into the basket queue to boost concurrency.
- Flow control task: inspects the state (event finished? queue empty?), issues commands, and spawns tasks.
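The interplay between a basketizer, the basket queue, and the garbage collector can be sketched with a single-threaded toy. It rests on several assumptions: `ToyBasketizer` and `Basket` are hypothetical names, tracks are reduced to integer ids, and the real GeantV services are concurrent while this sketch is not.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// A basket is modelled as a plain list of track ids (an assumption for brevity).
typedef std::vector<int> Basket;

// Toy basketizer (hypothetical, single-threaded): collects tracks and
// enqueues a basket as soon as it reaches its capacity.
class ToyBasketizer {
public:
  ToyBasketizer(std::size_t capacity, std::queue<Basket>& basket_queue)
      : capacity_(capacity), queue_(basket_queue) {
    assert(capacity > 0);
  }

  // Add one track; a full basket is injected into the queue immediately.
  void add_track(int track_id) {
    current_.push_back(track_id);
    if (current_.size() == capacity_) enqueue_current();
  }

  // Garbage-collector role: force a partially filled basket into the queue.
  // Returns false if there was nothing to flush.
  bool flush() {
    if (current_.empty()) return false;
    enqueue_current();
    return true;
  }

private:
  void enqueue_current() {
    queue_.push(current_);
    current_.clear();
  }

  std::size_t capacity_;
  std::queue<Basket>& queue_;
  Basket current_;
};
```

For example, adding 10 tracks to a basketizer of capacity 4 would enqueue two full baskets, and a subsequent `flush()` would enqueue a third basket holding the remaining two tracks.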

5 Implementation

Tasks implemented:
- InitialTask
- FlowControllerTask
- FeederTask
- TransportTask

• Tasks are described using C++ classes that have the class tbb::task as their base class.
• Task operations are implemented in the virtual method execute().
• New parallel tasks are launched using spawn(task &t).
• Once a task is scheduled for execution by the TBB runtime library, its execute() method is called in a non-preemptive manner and runs the task to completion.
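The pattern described above (a base class with a virtual execute(), new work launched through spawn, and each task running to completion) can be mimicked with the standard library alone. The sketch below is not TBB code and not the GeantV implementation: `ToyTask`, `ToyScheduler`, `ToyFeederTask`, and `ToyTransportTask` are hypothetical stand-ins, and the scheduler is a single-threaded loop rather than a parallel runtime.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ToyScheduler;

// Stand-in for a task base class: the work goes into a virtual execute().
class ToyTask {
public:
  virtual ~ToyTask() = default;
  virtual void execute(ToyScheduler& sched) = 0;
};

// Single-threaded toy scheduler: runs each ready task to completion
// (non-preemptively) in the order in which it was spawned.
class ToyScheduler {
public:
  void spawn(std::unique_ptr<ToyTask> t) {
    assert(t && "cannot spawn a null task");
    ready_.push_back(std::move(t));
  }

  void run_all() {
    while (!ready_.empty()) {
      std::unique_ptr<ToyTask> t = std::move(ready_.front());
      ready_.pop_front();
      t->execute(*this);  // may spawn further tasks
    }
  }

  std::vector<std::string> log;  // records what ran, for illustration only

private:
  std::deque<std::unique_ptr<ToyTask>> ready_;
};

// Hypothetical transport-like task: handles one basket.
class ToyTransportTask : public ToyTask {
public:
  explicit ToyTransportTask(int basket) : basket_(basket) {}
  void execute(ToyScheduler& sched) override {
    sched.log.push_back("transport basket " + std::to_string(basket_));
  }

private:
  int basket_;
};

// Hypothetical feeder-like task: spawns one transport task per basket.
class ToyFeederTask : public ToyTask {
public:
  explicit ToyFeederTask(int nbaskets) : nbaskets_(nbaskets) {}
  void execute(ToyScheduler& sched) override {
    sched.log.push_back("feeder");
    for (int b = 0; b < nbaskets_; ++b)
      sched.spawn(std::unique_ptr<ToyTask>(new ToyTransportTask(b)));
  }

private:
  int nbaskets_;
};
```

Spawning a `ToyFeederTask(2)` and calling `run_all()` would run the feeder first, followed by the two transport tasks it spawned.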

Additional classes implemented:
- ThreadData
- TaskMgrTBB

All the new classes that represent the task-based approach were packaged in a shared library called Geant_tbb.

6 How to run in TBB mode?

- Install TBB. See https://www.threadingbuildingblocks.org for details.

- Build the GeantV project for TBB mode by setting the parameters:
  $ -DUSE_TBB=ON -DTBBROOT=/your/path/to/TBB

- Execute runApp in TBB mode by setting the flag:
  $ ./runApp -i 1

8 Experimental Results

Test machine:
- Model name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
- CPU(s): 32
- On-line CPU(s) list: 0-31
- Thread(s) per core: 2
- NUMA node0 CPU(s): 0-7,16-23
- NUMA node1 CPU(s): 8-15,24-31

Static Threads mode:
$ runApp -e 10 -u 4 -t 16
=== Transported: 4870 primaries/4035173 tracks, total steps: 16358977, snext calls: 16358977, phys steps: 16208410, mag. field steps: 0, small steps: 939 bdr. crossings: 150567
RT=2.03824s, CP=31.08s

TBB mode:
$ runApp -e 10 -u 4 -i 1 -t 16
=== Transported: 4870 primaries/4028541 tracks, total steps: 16331333, snext calls: 16331333, phys steps: 16179636, mag. field steps: 0, small steps: 916 bdr. crossings: 151697
RT=2.64854s, CP=54.26s
nthreads=16 speed-up=20.486797 efficiency=1.280425
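The printed speed-up and efficiency for the TBB run are numerically consistent with speed-up = CP / RT and efficiency = speed-up / nthreads. This relation is inferred from the figures above, not taken from the runApp source, so the sketch below should be read as an assumption; `RunTimes`, `speed_up`, and `efficiency` are hypothetical names.

```cpp
#include <cassert>

// Timings of one run as reported in the output above.
struct RunTimes {
  double real_s;  // RT: wall-clock time in seconds
  double cpu_s;   // CP: CPU time in seconds
};

// Assumed definition: CPU time divided by wall-clock time.
inline double speed_up(const RunTimes& r) {
  assert(r.real_s > 0.0);
  return r.cpu_s / r.real_s;
}

// Assumed definition: speed-up divided by the number of threads.
inline double efficiency(const RunTimes& r, int nthreads) {
  assert(nthreads > 0);
  return speed_up(r) / nthreads;
}
```

Applied to the TBB run (RT=2.64854s, CP=54.26s, 16 threads), these functions reproduce the reported values to within rounding of the printed times. Applied to the static threads run (RT=2.03824s, CP=31.08s), the same formula gives a speed-up of about 15.2.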

9 Conclusions

- A first implementation of a task-based approach for GeantV using TBB was deployed.
- The new approach provides more flexibility and connectivity to other task-based multi-threaded frameworks.
- Some overhead of the TBB mode compared to the static thread approach has still to be fully understood and addressed.
  - Possible reason: task initialization.
- There are still more tasks to implement: Scoring Task, I/O Task, Digitizer Task, and so on.

- Directory with the task-based implementation: https://gitlab.cern.ch/GeantV/geant/tree/master/vecprot_v2_tbb
- Report about this work submitted to GSoC16: http://www.face.ubiobio.cl/~jfuentes/blog/geantv

10 Thank you
