Multithreading
Makoto Asai (SLAC PPA/SCA) November 12th, 2014 Geant4 tutorial @ ANS Winter Meeting 2014
Contents • Basics of mul threading • Event-level parallelism • How to install/configure MT mode • Race condi on • Mutex and thread local storage • How to migrate user’s code to MT • Intel Xeon Phi
Mul threading - M. Asai (SLAC) 2 The challenges of many-core era
• Increase frequency of CPU causes increase of power needs • Reached plateau around 2005 • No more increase in CPU frequency • However number of transistors/$ you can buy continues to grow • Multi/May-core era • Note: quantity memory you can buy with same $ scales slower • Expect: • Many core (double/2yrs?) • Single core performance will not increase as we were used to • Less memory/core • New software models need to take these into account: increase parallelism
CPU Clock Frequecy 1and usage: The Future of Computing Performance: Game Over or Next Level? DRAM cost: Data from 1971-2000: VLSI Research Inc. Data from 2001-2002: ITRS, 2002 Update, Table 7a, Cost-Near-Term Years, p. 172. Data from 2003-2018: ITRS, 2004 Update, Tables 7a and 7b, Cost-Near-Term Years, pp. 20-21. CPU cost: Data from 1976-1999: E. R. Berndt, E. R. Dulberger, and N. J. Rappaport, "Price and Quality of Desktop and Mobile Personal Computers: A Quarter Century of History," July 17, 2000, ;Data from 2001-2016: ITRS, 2002 Update, On-Chip Local Clock in Table 4c: Performance and Package Chips: Frequency On-Chip Wiring Levels -- Near-Term Years, p. 167. ; Average transistor price: Intel and Dataquest reports (December 2002), see Gordon E. Moore, "Our Revolution,” Mul threading - M. Asai (SLAC) 3 In Brief •Modern CPU architectures: need to introduce parallelism •Memory and its access will limit number of concurrent processes running on single chip •Solution: add parallelism in the application code
•Geant4 needs back-compatibility with user code and simple approach (physicists != computer scientists) •Events are independent: each event can be simulated separately •Multi-threading for event level parallelism is the natural choice
Mul threading - M. Asai (SLAC) 4 Geant4 Mul Threading capabili es
Mul threading - M. Asai (SLAC) 5 What is a thread?
Mul threading - M. Asai (SLAC) 6 What is a thread?
Mul threading - M. Asai (SLAC) 7 What is a thread?
Mul threading - M. Asai (SLAC) 8 What is a thread?
Mul threading - M. Asai (SLAC) 9 General Design
Mul threading - M. Asai (SLAC) 10 Simplified Master / Worker Model
• A G4 (with MT) application can be seen as simple finite state machine
Mul threading - M. Asai (SLAC) 11 Simplified Master / Worker Model
• A G4 (with MT) application can be seen as simple finite state machine • Threads do not exists before first /run/beamOn • When master starts the first run spawns threads and distribute work
Master Worker
Mul threading - M. Asai (SLAC) 12 Shared ? Personalized ? • In the mul -threaded mode, generally saying, data that are stable during the event loop are shared among threads while data that are transient during the event loop are thread-local. • Shared by all threads • Thread-local : stable during the event loop : dynamically changing for – Geometry every event/track/step – Par cle defini on – All transient objects such – Cross-sec on tables as run, event, track, step, trajectory, hit, etc. – User-ini aliza on classes – Physics processes – Sensi ve detectors – User-ac on classes
Mul threading - M. Asai (SLAC) 13 Detector geometry & Transient per event data cross-sec on tables MEMORY SPACE (tracks, hits, etc.)
AVAILABLE CORES Without MT
Ac ve cores Unused cores MEMORY SPACE
AVAILABLE CORES With MT
Ac ve cores
Mul threading - M. Asai (SLAC) 14 Shared ? Thread-local? • In general, geometry and physics tables are shared, while event, track, step, trajectory, hits, etc., as well as several Geant4 manager classes such as EevntManager, TrackingManager, SteppingManager, Transporta onManager, FieldManager, Navigator, Sensi veDetectorManager, etc. are thread-local. • Among the user classes, user ini aliza on classes (G4VUserDetectorConstruc on, G4VUserPhysicsList and newly introduced G4VUserAc onIni aliza on) are shared, while all user ac on classes and sensi ve detector classes are thread-local. – It is not straigh orward (and thus not recommended) to access from a shared class object to a thread-local object, e.g. from detector construc on to stepping ac on. – Please note that thread-local objects are instan ated and ini alized at the first BeamOn. • To avoid poten al errors, it is advised to always keep in mind which class is shared and which class is thread-local.
Mul threading - M. Asai (SLAC) 15 Sequen al mode
main()
G4RunManager G4Run
G4EventManager G4Event
G4TrackingManager G4Track
G4SteppingManager G4Step
Mul threading - M. Asai (SLAC) 16 Sequen al mode
main()
G4RunManager UserRunAc on
G4EventManager UserPrimaryGeneratorAc on
UserEventAc on
UserStackingAc on
G4TrackingManager UserTrackingAc on
G4SteppingManager UserSteppingAc on
Mul threading - M. Asai (SLAC) 17 Mul -threaded mode
main() Master thread
G4MTRunManager G4Run
G4WorkerRunManager G4WorkerRunManager G4Run G4WorkerRunManager G4Run G4Run
G4EventManager G4EventManager G4Event G4EventManager G4Event G4Event
G4TrackingManager G4TrackingManager G4Track G4TrackingManager G4Track G4Track
G4SteppingManager G4SteppingManager G4Step G4SteppingManager G4Step G4Step
Worker thread #0 Worker thread #1 Worker thread #2 Mul threading - M. Asai (SLAC) 18 Mul -threaded mode
main() Master thread
G4MTRunManager UserRunAc on
UserRun UserRun UserRun G4WorkerRunManager G4WorkerRunManager G4WorkerRunManager Ac on Ac on Ac on
UserPrimary UserPrimary UserPrimary GeneratorAc on GeneratorAc on GeneratorAc on G4Event G4Event G4Event Manager UserEventAc onManager UserEventAc onManager UserEventAc on
UserStackingAc on UserStackingAc on UserStackingAc on
G4TrackingManager UserTrackingAG4TrackingManager G4TrackingManager UserTrackingA UserTrackingA c on c on c on
G4SteppingManager UserSteppingG4SteppingManager G4SteppingManager UserStepping UserStepping Ac on Ac on Ac on Worker thread #0 Worker thread #1 Worker thread #2 Mul threading - M. Asai (SLAC) 19 Split class – case of par cle defini on • In Geant4, each par cle type has its own dedicated object of G4Par cleDefini on class. – Sta c quan es : mass, charge, life me, decay channels, etc., • To be shared by all threads. – Dedicated object of G4ProcessManager : list of physics processes this par cular kind of par cle undertakes. • Physics process object must be thread-local.