
Parallel Programming &

Murat Keçeli

Why do we need it now?

http://www.gotw.ca/publications/concurrency-ddj.htm

Why do we need it now?

Intel® Xeon Phi™ Coprocessor 7120X (16 GB, 1.238 GHz, 61 cores)

http://herbsutter.com/2012/11/30/256-cores-by-2013/

Flynn's Taxonomy (1966)

Computer architectures classified by instruction and data streams: SISD, SIMD, MISD, MIMD.

http://users.cis.fiu.edu/~prabakar/cda4101/Common/notes/lecture03.html

Multiple instruction, multiple data (MIMD)

• Shared memory
  • All processors are connected to a "globally available" memory.
  • Examples: your laptop, your smartphone, a single node in a cluster.
  • Easier to implement, but not scalable.
• Distributed memory
  • Each processor has its own individual memory.
  • Example: single processors at different nodes.
  • Data is shared through messages; harder to implement (see the sketch below).
• Hybrid (clusters, …)
• Distributed shared memory (Distributed Global Address Space)
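A minimal message-passing sketch of the distributed-memory model, using mpi4py (an assumption — the slides don't name a particular MPI binding); run with, e.g., mpiexec -n 2 python demo.py:

```python
# Distributed memory: each rank owns its data; sharing is explicit messages.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({"energy": -42.0}, dest=1)   # explicit message to rank 1
elif rank == 1:
    data = comm.recv(source=0)             # blocks until the message arrives
    print("rank 1 received", data)
```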

Grid and cloud computing

• Scalable solutions for loosely coupled jobs.
• Cloud is the evolved version of grid computing (in terms of efficiency, QoS, and reliability).
• Crowd-sourcing: SETI@HOME, FOLDIT@HOME, …
• The Clean Energy Project: 2.3 million organic compounds screened by volunteers to discover the next generation of solar materials (World Community Grid, IBM).
• We can write proposals for thermochemistry calculations on aromatic hydrocarbons.

Goals of parallel programming

• Linear speedup (strong scaling): a problem of a given size is solved N times faster on N processors.
  • You can reduce time/cost.

Speedup = serial execution time / parallel execution time: S_N = t_1 / t_N (linear speedup means S_N = N).
Efficiency: E_N = S_N / N, with 0 < E_N ≤ 1.

• Scalability (weak scaling): a problem that is N times bigger is solved in the same amount of time on N processors.
  • You can attack larger problems.
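To make the definitions concrete, plug in the timings reported later in this deck (1,3-hexadiene with MOPAC PM3: 32 minutes serial, 7 minutes on 36 cores):

```python
t1, tN, N = 32.0, 7.0, 36   # serial and parallel times (minutes), core count
S_N = t1 / tN               # speedup  ~= 4.6 (ideal would be 36)
E_N = S_N / N               # efficiency ~= 0.13 (ideal would be 1.0)
print(S_N, E_N)
```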

Amdahl's law

0 ≤ p ≤ 1: parallel portion

S_N = 1 / ((1 − p) + p/N)
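As a quick sanity check (a sketch, not from the slides), evaluating the formula shows how the serial fraction caps the speedup no matter how many cores you add:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n processors when a fraction p
    (0 <= p <= 1) of the work is parallelizable."""
    return 1.0 / ((1.0 - p) + p / n)

# With 95% parallel code, the 61-core coprocessor above tops out near 15x,
# and even infinitely many cores could never beat 1 / (1 - 0.95) = 20x.
for n in (2, 8, 61, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```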

http://en.wikipedia.org/wiki/Amdahl's_law

Parallelization Tools

• Auto-parallelization
• Libraries (Intel Threading Building Blocks, Intel MKL, …)
• Unified Parallel C, Coarray Fortran, …
• Functional programming languages (Lisp, F#)
• OpenMP (Open Multi-Processing, shared memory)
• MPI (Message Passing Interface, distributed memory)
• Java is designed for thread-level parallelism: java.util.concurrent
• Python (https://wiki.python.org/moin/ParallelProcessing)
  • Global interpreter lock: the mechanism that ensures only one thread executes Python bytecode at a time (see the sketch below).
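A minimal sketch of the GIL's practical consequence: CPU-bound Python gains nothing from threads, so parallelism usually means separate processes (multiprocessing chosen here as an illustration; SCOOP, below, takes the same process-based approach across machines):

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL, so threads would serialize here;
    # separate processes each get their own interpreter and GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    pool = Pool(processes=4)   # 4 worker processes side-step the GIL
    print(pool.map(cpu_bound, [10**6] * 4))
    pool.close()
    pool.join()
```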

How to do parallel programming

• Start with the chunk that takes the most time (profile first; see the sketch below).
• Decide the parallelization scheme based on the available hardware and software.
• Divide the chunk into subtasks such that:
  • dependencies are minimal (minimizes communication);
  • each subtask has its own data (data independence);
  • no subtask needs to wait on others' functions to finish (functional independence);
  • the workload is equally distributed (minimizes latency).
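A sketch of that first step using Python's standard profiler (the slides don't prescribe a tool; call graphs like the ones later in this deck are produced from this kind of profile):

```python
import cProfile
import pstats

def work():
    # Stand-in for the program chunk to be parallelized.
    return sum(i ** 0.5 for i in range(10**6))

cProfile.run("work()", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)   # top 10 by cumulative time
```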

SCOOP

• Scalable COncurrent Operations in Python: a distributed task module allowing concurrent parallel programming on various environments, from heterogeneous grids to supercomputers.
• Philosophy:
  • The future is parallel;
  • Simple is beautiful;
  • Parallelism should be simpler.

http://code.google.com/p/scoop/

Hello World

• Results of a map are always ordered, even if their computation was made asynchronously on multiple computers (see the sketch below).
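A minimal sketch of that guarantee (the file name hello.py is arbitrary); SCOOP programs are launched through its own launcher, python -m scoop hello.py:

```python
from scoop import futures

def hello(i):
    return "Hello from task %d" % i

if __name__ == "__main__":
    # futures.map distributes the calls across all available workers,
    # yet yields results in the same order as the inputs.
    for line in futures.map(hello, range(8)):
        print(line)
```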

http://code.google.com/p/scoop/

RMG & Thermochemistry

• Thermochemical parameters (enthalpy, entropy, heat capacity) are important for reaction equilibrium constants, kinetic parameter estimates, and thermal effects.
• They affect both the mechanism generation process and the behavior of the final model.
• Estimates are based on the group additivity approach of Benson.
  • This method is fast and can be improved by adding more parameterization.
  • Harder to parallelize: hierarchical search, database sharing.
  • Currently fails for aromatic species and is liable to fail for any species outside its parametrization scope.
  • As the applications of RMG start to vary, this module needs updating with ad hoc corrections.

QMTP (Greg Magoon)

• The quantum mechanics thermodynamic property (QMTP) module performs on-the-fly quantum and force-field calculations to obtain thermochemical parameters.
  • Must be linked to third-party programs.
  • Error checking is required.
  • Slow: speed depends on the calculation method and the software chosen.
  • Calculations are uncoupled, so much easier to parallelize (see the sketch below).
  • Both speed and reliability improvements come from outside.
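Since the calculations are uncoupled, each input can be farmed out to a worker independently. A hypothetical sketch, not RMG's actual code (the mopac command and file names are assumptions):

```python
import subprocess
from scoop import futures

def run_qm_job(input_file):
    # Launch the third-party QM program on one input; MOPAC shown as an
    # example, but any uncoupled external job works the same way.
    subprocess.check_call(["mopac", input_file])
    return input_file

if __name__ == "__main__":
    inputs = ["species1.mop", "species2.mop", "species3.mop"]  # hypothetical
    for finished in futures.map(run_qm_job, inputs):
        print("finished", finished)
```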

QMTP Design

[Figure: QMTP design diagram, from Greg Magoon's thesis (2012)]

1,3-Hexadiene without QM

[Profiler call graph, serial run: 3 minutes. Without QM, no single routine dominates: time is spread across reaction generation (model:enlarge → model:react → __init__:generateReactionsFromFamilies), group-additivity thermo estimation (model:generateThermoData → thermo:estimateThermoViaGroupAdditivity), statistical-mechanics fitting (statmechfit:fitStatmechToHeatCapacity and its heat-capacity helpers), and pressure-dependent network updates (model:updateUnimolecularReactionNetworks → network:calculateMicrocanonicalRates, network:applyModifiedStrongCollisionMethod).]

1,3-Hexadiene with MOPAC PM3

[Profiler call graph, serial run: 32 minutes. With QM on, model:generateThermoData dominates at ~93% of the runtime, almost all of it inside mopac:generateQMData: ~77% is spent waiting on the external MOPAC process (subprocess:communicate → subprocess:wait) and ~15% in RDKit geometry embedding (molecule:generateRDKitGeometries → molecule:rd_embed).]

Current situation

36 cores: 7 minutes (down from 32 minutes serial)

Problem-1

• Job submission through the grid engine fails.
  • ImportError: libRDGeneral.so.1: cannot open shared object file: No such file or directory
• More info: https://groups.google.com/forum/#!topic/scoop-users/T7bXN5x1zic
  • "SCOOP won't be handling environment variables directly (at least not the way MPI does). The next version (0.7) will contain a new feature called a prolog, which is an executable (i.e. a shell script) that SCOOP will execute at the launch of every worker. Exporting environment variables will be possible in this prolog."
• There might be a trick (.bashrc and .login are not the solution).
• The required links can be installed by root.
• Temporary solution:
  • Submit a sleep job, ssh to that node, and run interactively. Don't forget to cancel the jobs afterwards (kill -9 -1, then qdel xxxx).


Problem-2

• Jobs fail intermittently.
  • AttributeError: 'ccData' object has no attribute 'rotcons'
  • ERROR:root:Not all of the required keywords for success were found in the output file!
• Reason: unpredictable buffering of I/O by the OS.
• Trying: os.fsync, os.path.getsize (see the sketch below).
• Temporary solution: add a sleep in the code. (1 second seems OK for MOPAC jobs; Gaussian jobs are trickier.)
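A hedged alternative to a fixed sleep, building on the os.path.getsize idea above: poll until the output file exists and stops growing (the timeout and poll interval are guesses, not tuned values):

```python
import os
import time

def wait_for_output(path, timeout=30.0, poll=1.0):
    """Poll until `path` exists and its size stops changing.

    Works around OS-level I/O buffering: the external QM process may
    exit before its output file is fully flushed to disk.
    """
    deadline = time.time() + timeout
    last_size = -1
    while time.time() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size > 0 and size == last_size:
                return True       # file present and no longer growing
            last_size = size
        time.sleep(poll)
    return False                  # give up; let the caller retry or fail
```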

Conclusions

• Parallel was the future; now we all need it.
• Redesigning some portions of RMG-Py is necessary:
  • Reactor conditions, Pdep calculations, graph search
  • Database structure (a tree might not be the best option)
  • Minimize I/O.
  • Avoid writing to the home disk; move files there when the job finishes.
  • We can avoid sharing the library with workers.
• Parallel programming is a headache even for the most advanced programmers.
• You may dream that you solve some problems by sleeping, but it is only a dream; it won't last long.


Thank you all

RMG-Dev

References

• http://web0.tc.cornell.edu/Services/Education/Topics/Parallel/
• https://computing.llnl.gov/tutorials/parallel_comp/
• http://www.ibm.com/developerworks/library/wa-cloudgrid/
• http://parajava.sourceforge.net/
• http://groups.csail.mit.edu/cag/ps3/
• http://www.intel-software-academic-program.com/courses/


Monte-Carlo computation of pi

• Generate random points inside the unit square.
  • Two random numbers 0 ≤ x, y ≤ 1 give one point (x, y).
  • The fraction of points with x² + y² ≤ 1 (inside the quarter circle) approaches π/4; see the sketch below.
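A sketch along the lines of SCOOP's documented pi example: each worker counts hits inside the quarter circle, and the aggregate hit fraction estimates π/4 (run with python -m scoop picalc.py; the file name is arbitrary):

```python
from math import hypot
from random import random
from scoop import futures

def count_hits(tries):
    # Points (x, y) uniform in the unit square; hypot(x, y) < 1
    # means the point landed inside the quarter circle.
    return sum(hypot(random(), random()) < 1 for _ in range(tries))

def calc_pi(workers, tries):
    # Each worker samples `tries` points independently; the tasks are
    # uncoupled, so futures.map spreads them across all workers.
    hits = sum(futures.map(count_hits, [tries] * workers))
    return 4.0 * hits / (workers * tries)

if __name__ == "__main__":
    print(calc_pi(16, 100000))
```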

http://code.google.com/p/scoop/