EUROFUSION WPISA-REP(16) 16235

R Hatzky et al.

HLST Core Team Report 2015

REPORT

This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014–2018 under grant agreement No 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission.

This document is intended for publication in the open literature. It is made available on the clear understanding that it may not be further circulated and extracts or references may not be published prior to publication of the original when applicable, or without the consent of the Publications Officer, EUROfusion Programme Management Unit, Culham Science Centre, Abingdon, Oxon, OX14 3DB, UK or e-mail Publications.Offi[email protected]

Enquiries about Copyright and reproduction should be addressed to the Publications Officer, EUROfusion Programme Management Unit, Culham Science Centre, Abingdon, Oxon, OX14 3DB, UK or e-mail Publications.Offi[email protected]

The contents of this preprint and all other EUROfusion Preprints, Reports and Conference Papers are available to view online free at http://www.euro-fusionscipub.org. This site has full search facilities and e-mail alert options. In the JET specific papers the diagrams contained within the PDFs on this site are hyperlinked

HLST Core Team Report 2015

Contents

1. Executive Summary ...... 6
1.1. Progress made by each core team member on allocated projects ...... 6
1.2. Further tasks and activities of the core team ...... 10
1.2.1. Dissemination ...... 10
1.2.2. Training ...... 10
1.2.3. Internal training ...... 10
1.2.4. Workshops & conferences ...... 11
1.2.5. Publications ...... 11
1.2.6. Meetings ...... 12
2. Report on HLST project SOLPSOPT ...... 13
2.1. Introduction ...... 13
2.2. OpenMP parallelization of the B2 code ...... 13
2.2.1. Execution time on one core ...... 15
2.3. B2 in the HLST branch of the SVN repository ...... 17
2.3.1. Modules introduced ...... 17
2.3.2. Problems with the build system ...... 17
2.3.3. R8 parameters included ...... 17
2.3.4. OpenMP modifications ...... 18
2.3.5. Different compilers ...... 18
2.4. B2 correctness checks ...... 19
2.4.1. Benchmarks ...... 19
2.4.2. Long test ...... 20
2.5. B2 reproducibility ...... 20
2.5.1. Reproducibility and OpenMP reductions ...... 20
2.5.2. Differences due to reductions ...... 21
2.5.3. Deterministic reduction ...... 22
2.5.4. B2 Differences due to optimization ...... 22
2.6. EIRENE ...... 22
2.6.1. Code version and test cases ...... 23
2.6.2. MPI bugfixes ...... 23
2.6.3. Parallel work distribution in EIRENE ...... 24
2.7. More balanced workload distribution ...... 26
2.7.1. Execution time and particle number ...... 27
2.7.2. MPI scaling of the ITER test case ...... 28
2.8. MPI overhead for the ITER test case ...... 29
2.8.1. Summing inside a stratum ...... 29
2.9. Other parallelization strategies ...... 30
2.9.1. Client-server workload distribution ...... 31
2.10. Correctness of the results ...... 32
2.11. Plans for correctness checks ...... 33
2.12. Summary ...... 34
2.13. Bibliography ...... 35
3. Final report on HLST project COCHLEA ...... 36
3.1. Introduction ...... 36
3.2. Course of the Project ...... 36
3.3. Visit to the Principal Investigator at the UoA ...... 37
3.4. Performance and scalability of COCHLEA + OpenMP ...... 37
3.5. Conclusions and outlook ...... 39
3.6. References ...... 39
4. Final report on HLST project FWTOR-15 ...... 40
4.1. Introduction ...... 40
4.2. Transitioning to MPI + OpenMP (hybrid) parallelism ...... 40
4.3. MPI-FWTOR: strong scaling results ...... 42
4.4. Visit to the principal investigator ...... 43
4.5. Conclusions: state of MPI-FWTOR and further activities ...... 44
4.6. References ...... 45
5. Final report on HLST project BEUIFERC ...... 46
5.1. The Intel Xeon Phi hardware ...... 46
5.1.1. The Intel Xeon Phi vs Intel Sandy Bridge ...... 46
5.1.2. Intel Xeon Phi architecture ...... 47
5.2. MIC network performance ...... 47
5.3. OpenMP overhead micro-benchmark on the MIC partition of the HELIOS supercomputer ...... 51
5.4. Host, offload and native computation modes on the MIC partition ...... 52
5.5. Tests of MPI 3.0 standard ...... 53
5.5.1. Non-blocking collective communication ...... 53
5.5.2. Remote memory access ...... 55
5.5.3. Shared memory ...... 57
5.6. Test of MPI 2.0 standard ...... 58
5.7. Conclusions ...... 59
5.8. References ...... 60
6. Report on HLST project JORSTAR ...... 61
6.1. STARWALL code analysis ...... 61
6.1.1. Memory consumption analysis ...... 61
6.1.2. Computational time analysis ...... 63
6.1.3. OpenMP parallelization analysis ...... 64
6.1.4. LAPACK ...... 64
6.1.5. Bug check ...... 64
6.2. MPI parallelization ...... 65
6.2.1. Parallelization of the eigenvalue solver ...... 65
6.2.2. Parallelization of the matrix_ww ...... 67
6.2.2.1. Matrix free “dima” computation ...... 69
6.2.2.2. Matrix free “dima” computation with ScaLAPACK indexing ...... 70
6.2.3. Parallelization of the matrix_pp subroutine ...... 72
6.3. Conclusions ...... 73
6.4. References ...... 74
7. Final report on the 3DPSOLV project ...... 75
7.1. Introduction ...... 75
7.2. Discretization for 3D problem ...... 76
7.2.1. Finite difference discretization ...... 76
7.2.2. Finite element discretization ...... 80
7.3. Conclusions and outlook ...... 81
8. Final report on the MGBOUT project ...... 83
8.1. Introduction ...... 83
8.2. BOUT++ structures: discretization and parallelization ...... 84
8.3. Parallel implementation of the multigrid method ...... 87
8.3.1. Basic class for multigrid algorithm in BOUT++ ...... 88
8.3.2. Multigrid solver according to BOUT++ 1D parallelization ...... 91
8.3.3. Multigrid solver with 2D parallelization and data gathering ...... 93
8.4. Conclusions ...... 96
9. Final report on HLST project TOPOX2 ...... 97
9.1. Introduction ...... 97
9.2. GKMHD code ...... 97
9.3. Numerical method ...... 97
9.3.1. Closed flux surface region ...... 98
9.3.2. Open flux surface region ...... 99
9.4. Test-case ...... 100
9.4.1. Closed flux surfaces ...... 100
9.4.2. Extension to include the open flux surface region ...... 104
9.5. Conclusions and outlook ...... 107
9.6. References ...... 107
10. Final report on HLST project VIRIATO ...... 108
10.1. Introduction ...... 108
10.2. Code checking ...... 108
10.3. Code profiling ...... 109
10.4. 2D Fourier transforms and data transposition ...... 109
10.5. Reduced test-case and the new transpose algorithms ...... 110
10.6. Deployment to VIRIATO ...... 113
10.7. Strong scaling results of VIRIATO ...... 114
10.8. Strategy for the hybrid parallelization ...... 115
10.9. Visit to IPFN-IST Lisbon ...... 117
10.10. Summary and outlook ...... 117
10.11. References ...... 118

1. Executive Summary

1.1. Progress made by each core team member on allocated projects

In agreement with the HLST PMU responsible officer, Irina Voitsekhovitch, the individual core team members have been/are working on the projects listed in Table 1.

Project acronym   Core team member    Status
3DPSOLV           Kab Seok Kang       finished
BEUIFERC          Serhiy Mochalskyy   finished
COCHLEA           Michele Martone     finished
FWTOR-15          Michele Martone     finished
JORSTAR           Serhiy Mochalskyy   running
MGBOUT            Kab Seok Kang       finished
SOLPSOPT          Tamás Fehér         running
TOPOX2            Tiago Ribeiro       finished
VIRIATO           Tiago Ribeiro       finished

Table 1 Projects distributed to the HLST core team members.

Roman Hatzky has been involved in the support of the European users on the IFERC-CSC computer. Furthermore, he was occupied with management and dissemination tasks due to his position as core team leader. In addition, he contributed to the projects of the core team.

Tamás Fehér worked on the SOLPSOPT project. The SOLPS code package is widely used to simulate Scrape-Off Layer (SOL) plasmas. Two main components of this package are the B2 and the EIRENE codes. B2 is a plasma fluid code to simulate edge plasmas and EIRENE is a kinetic Monte-Carlo code for describing neutral particles. EIRENE is parallelized with MPI, and the B2 code is parallelized using OpenMP. The SOLPSOPT project is a continuation of the PARSOLPS project, aiming to further improve the parallelization and decrease the execution time of B2 and the coupled B2-EIRENE system. More than 25 subroutines have been parallelized, so that 90% of the execution time of the whole B2 code is now spent in parallel regions. With these changes a factor of six speedup could be achieved for the ITER test case when executed on a single compute node. Several of the subroutines were optimized to reach a speedup which is close to the bandwidth limit. As a positive side effect, these optimizations led to a speedup of the sequential version of the code: it is now 20% faster than originally. The individual subroutines have been tested separately with unit tests to ensure the correctness of the modified B2 code. The whole code was tested for both shorter and longer simulations, and the results agree with the original code. Unfortunately, the MPI version of EIRENE in the SOLPS 5.0 code package was not functioning, so it was decided to switch to the latest version of EIRENE from SOLPS-ITER. The parallel performance of the stand-alone version of EIRENE was investigated, and we found that the currently implemented parallelization strategies do not scale due to load imbalance. A simple and balanced parallelization strategy was implemented instead. Tests with different particle numbers show that it provides a reasonable speedup. During the tests it was found that the parallel version of EIRENE gives incorrect results, even if the original code version is used. A testing framework is under construction to find the underlying errors.

Serhiy Mochalskyy worked on the BEUIFERC and JORSTAR projects. In the BEUIFERC project we analyzed the performance of the MIC partition of the HELIOS supercomputer by means of different micro-benchmark tests. The network performance was tested first by means of the Intel MPI Benchmark suite. It was found that the two available Infiniband ports per node were not working properly, since they provided a too low memory bandwidth. A software solution was provided, resulting in a three times higher memory bandwidth. The different OpenMP pragma overheads were measured on both the Sandy Bridge node and the MIC card using the EPCC micro-benchmark suite. The overhead time is more than a factor of ten higher on the MIC than on the Sandy Bridge processor due to the large number of threads. New features of the MPI 3.0 standard (non-blocking collective communication, remote memory access and shared memory segments) were also tested on the HELIOS computer. For collective communication an efficient communication-calculation overlap (85–99 percent) could be achieved only with large message sizes (>10 KB), while for small messages this overlap was between 20 and 55 percent. The passive target communication in the remote memory access of the MPI 3.0 standard works properly only when the asynchronous progress support is switched on, which allows more than 95 percent overlap between calculation and synchronization to be achieved. Finally, the implementation of the shared memory segments by means of the MPI 3.0 standard was also tested on HELIOS. The tests provided correct results, showing that all MPI tasks can have direct access to a shared memory region created initially by a master task. In the JORSTAR project the STARWALL code has been analyzed for potential improvements and optimization by means of MPI parallelization. For large production runs the entire code must be parallelized, due to the local memory restriction for storing the input/output matrices and due to the computational time of different subroutines. The LAPACK subroutine for the eigenvector solver was replaced by its parallel counterpart from the ScaLAPACK library. Interestingly, the ScaLAPACK subroutine has also shown better performance in sequential mode due to the advantage of using IEEE arithmetic. Finally, good parallelization efficiency was obtained for the solver subroutine for large problem sizes. Several subroutines were rewritten in order to avoid building up too large matrices. In combination with the MPI parallelization concept this should finally make simulations of ITER-sized problems possible. A good parallel scalability was achieved in all modified subroutines, with a speed-up factor of more than 210 when using 512 cores.

Kab Seok Kang worked on the 3DPSOLV and MGBOUT projects. Kab Seok Kang visited the project coordinator to discuss the current status of the BIT2 code and the future plans. The project coordinator is currently testing the BIT2 code with the multigrid solver which was developed in the previous project KinSOL2D-3. It was especially designed to suit the current BIT2 code structure. Kab Seok Kang will further support the project coordinator to bring BIT2 in its multigrid version to production stage. The implementation of the 3-dimensional multigrid solver with the finite difference and finite element method is currently at the stage of building up the corresponding matrix equation and intergrid transfer operators. However, the project duration has come to an end and the work has to stop here. A succeeding project will be needed to finish the implementation and testing of the 3D multigrid solver in BIT3. This is not critical, as the project coordinator is still busy with testing BIT2 with its 2D multigrid solver. In parallel, he will further develop the BIT3 version. Therefore, he did not apply for a new HLST project in the support period of 2016 to continue the work on the 3D multigrid solver. Fortunately, the MGBOUT3D project in 2016 is also about the implementation of a 3D multigrid solver. Thus, the work will continue in a more global context with clear synergetic effects for a future BIT3 multigrid project, probably to be submitted in 2017.

Kab Seok Kang visited the CCFE and gave lectures on basic numerical methods and multigrid methods to the BOUT++ team. He implemented a 1D parallel multigrid solver in C++ according to the BOUT++ structure and tested it on a problem which was provided by the BOUT++ team. The results of the 1D parallel multigrid solver showed that there was a need for a more enhanced parallel multigrid solver. The structure of the multigrid solver was therefore adapted in the following way. Starting with the finest multigrid level, a 1D parallelization is used. From a certain level on, the problem becomes too small for a 1D parallelization. Therefore, the data are redistributed and the parallelization concept is switched to a 2D parallelization of the domain. Finally, for the coarsest grid level a serial solver is used. The numerical results showed that the new hybrid solver has a very good weak scaling property, which enables the user to solve larger problems on a large number of MPI tasks in reasonable time scales. The implementation was sent to the project coordinator for testing within BOUT++.

Michele Martone worked on the COCHLEA and FWTOR-15 projects. COCHLEA is a numerical C++11 code calculating the magnetic and electric fields in a complex waveguide of cylindrical symmetry with an arbitrary excitation. Most of its execution time is spent in the FDTD algorithm loops repeatedly updating a 3D grid. We have identified and applied the most relevant changes for achieving parallelism and increased efficiency. Our changes do not alter the code structure (apart from the core loops), are minimally invasive, and can optionally be switched on or off. The work followed an approach of planning and experimenting ‘offline’, then deploying everything together with the PI during a one-week visit, culminating in code acceptance on the PI’s side. With representative options, in experiments ranging from toy cases up to a full HELIOS node case, an efficiency improvement of up to ≈ 4x can be obtained serially, to be combined with a ≈ 10x OpenMP speedup obtained with the 16 threads we had available. Combined, our intervention can yield a ≈ 40x speedup on the largest cases. This allows a large simulation, like that of e.g. the last part of the ITER gyrotron beam tunnel, to complete in six days instead of a few months. Larger cases and a finer physics model (under active development by the PI) will benefit from further optimizations and distributed memory parallelism (e.g. MPI) in terms of both memory and time constraints. FWTOR is a full-wave code solving Maxwell’s equations for the propagation and absorption of electromagnetic wave beams in tokamak plasmas using the FDTD method. Most of its execution time is spent in the FDTD iterations, updating a 2D grid via several loops. Our objective was to obtain an MPI parallelization on top of the existing OpenMP one. We opted for a 2D domain distribution without padding or symmetry constraints. Parallelizing required the introduction of ghost cells around each process subdomain. The new MPI code was placed in a separate file. It is still possible to obtain an MPI-free build by switching off the preprocessor-controlled wrappers to the communication primitives. To keep continuity with the original code we used the original variable naming convention combined with global array indexing. Our largest case fitting on a single HELIOS node (≈ 5e+7 cells) can now be distributed over 32 x 32 = 1024 processes, while OpenMP partitions the workload along one dimension. Such a test case achieved a parallel MPI speedup of ≈ 200x on 512 nodes. Smaller cases have a lower parallelization efficiency, e.g. below 50% already for 32 nodes. The code is now in a state for testing production cases much larger than before. However, further improvements are possible, e.g. trading off computation for communication, grouping data into fewer messages, or avoiding certain extra exchanges. The current serialized I/O is a limiting factor for large runs; parallel I/O is thus necessary.

Tiago Ribeiro worked on the TOPOX2 and VIRIATO projects. Modern tokamaks have strongly shaped diverted magnetic structures in which the last closed flux surface is a separatrix with an X-point. This leads to high magnetic shear near the X-point region, whose effect on drift-wave turbulence is believed to be severe. This constitutes the framework within which project TOPOX was devised. In particular, the objective is the extension of a Poisson solver based on a method developed by Sadourny et al. This method is built on a triangular grid in RZ-space, with the points arranged along flux surfaces, which are topologically treated as hexagons. First, an appropriate stand-alone test-case was implemented on closed flux surfaces using both Sadourny’s method (together with the WSMP and PETSc libraries) and an alternative finite-differences pseudo-spectral solver. Their comparison demonstrated the better discretization efficiency of the former. A scheme for the extension of Sadourny’s method beyond the X-point, into the SOL, was then devised. It consists of setting an artificial boundary between the X-point and a grid node in the outermost SOL flux surface that is topologically treated as a hexagon. This ensures the hexagonal topology of every node and its six nearest neighbors on the grid (including the SOL), allowing the method to be carried over with only minor modifications. For the code implementation, several simplified analytical poloidal flux functions were devised to yield a set of nested flux surfaces mimicking a diverted tokamak equilibrium. A hexagonal mesh was constructed using an equidistant arc-length between grid nodes along the field lines to minimize the local deformation of the elemental hexagons. A potential issue related to local grid deformation was identified near the X-point. There was no time to fully test this problem, but should it prove insurmountable, new solutions will have to be devised, either by using a different poloidal distribution of nodes along the mesh’s main hexagons, or by locally relaxing the constraint that each grid node has six nearest neighbors. VIRIATO uses a unique framework to solve a reduced (4D, instead of the usual 5D) version of gyrokinetics applicable to strongly magnetized plasmas. It is pseudo-spectral perpendicular to the magnetic field and uses a spectral representation (Hermite polynomials) to handle the velocity space dependency. In terms of numerical accuracy, this is rather advantageous, as spectral representations are more powerful than grid-based ones. However, being parallelized using MPI domain decomposition over two directions in the configuration space domain, the scalability of the resulting algorithm, which relies extensively on 2D Fourier transforms, becomes a non-trivial problem. The main goal of this proposal was to improve the parallel scalability of VIRIATO, and the work has been carried out successfully. It comprised devising more efficient alternatives for the 2D Fourier transform algorithm, mostly by modifying the parallel data transpose implied by these operations. The main idea was to aggregate data before carrying out the all-to-all communication patterns, to avoid penalties due to accumulated network latency. The deployment of the new algorithms back into VIRIATO’s source code was made successfully. Compared to the original version, VIRIATO now runs the same problem size using double the number of cores while still maintaining perfect linear strong scaling.

The possibility of further improving the parallel scalability of VIRIATO was considered by devising a strategy to take advantage of the node-level NUMA topology of current HPC machines to reduce the transpose communication complexity. The idea is to split the communication into parts inside and outside a compute node. Two different strategies were devised: one based on an MPI/OpenMP hybridization and another on splitting MPI communicators together with shared memory segments, which are now part of the MPI-3 standard. These approaches are promising, but due to time constraints their implementation is left as a recommendation for future work.

1.2. Further tasks and activities of the core team

1.2.1. Dissemination

Mochalskyy, S. and Hatzky, R.: The experience of the High Level Support Team (HLST), 3rd IFERC-CSC Review Meeting, 9th March 2015, Naka, Japan.

Mochalskyy, S. and Hatzky, R.: Simulation using MIC co-processor on Helios, IFERC-CSC HELIOS MIC Workshop, 17th–18th March 2015, Maison de la Simulation, Gif-sur-Yvette, France.

1.2.2. Training

Tiago Ribeiro has visited:
• The project coordinator Nuno Loureiro to work for the VIRIATO project, 5th–25th July 2015, Lisbon, Portugal.

Kab Seok Kang has visited:
• The project coordinator David Tskhakaya to work for the 3DPSOLV project, 12th–14th January 2015, University of Innsbruck, Innsbruck, Austria.
• The project coordinator Fulvio Militello to work for the MGBOUT project, 16th–27th March 2015, CCFE, Culham, UK.

Michele Martone has visited:
• The project coordinator Christos Tsironis to work for the FWTOR-15 project, 31st August – 4th September 2015, National Technical University of Athens, Greece.
• The project coordinator Ioannis Tigelis to work for the COCHLEA project, 23rd–27th November 2015, National and Kapodistrian University of Athens, Greece.

1.2.3. Internal training

The HLST core team has attended:
• IFERC-CSC HELIOS Training, 27th–28th January 2015, IPP, Garching, Germany.
• HLST meeting at IPP, 15th April 2015, Garching, Germany.
• HLST meeting at IPP, 14th October 2015, Garching, Germany.

T. Fehér, Intel HPC Code Modernization Workshop, 19th–20th November 2015, Garching, IPP, Germany.

Kab Seok Kang has attended:
• HPC Programming-Workshop, 6th–7th May 2015, LRZ, Garching, Germany.
• MPK Symposium on Frontiers in Materials Science, Optics, Materials and Functionality, 29th–30th June 2015, POSCO International Center, Pohang, Korea.
• Joint Seminar of Max Planck POSTECH/Korea Research Initiative and Pohang Mathematics Institute on “High Performance supercomputer: theory and application”, 1st July 2015, POSTECH, Pohang, Korea.
• 23rd International Conference on Domain Decomposition Methods, 6th–10th July 2015, International Convention Center Jeju (ICC-Jeju), Jeju Island, Korea.
• C++ for Beginners Course, 19th–22nd October 2015, LRZ, Garching, Germany.
• Advanced C++ with Focus on Software Engineering, 2nd–4th November 2015, LRZ, Garching, Germany.

Michele Martone has attended:
• Advanced OpenMP@EPCC, 2nd–3rd July 2015, University of Manchester, Manchester, UK.
• Advanced C++ with Focus on Software Engineering, 2nd–4th November 2015, LRZ, Garching, Germany.

Serhiy Mochalskyy has attended:
• 17th VI-HPS Tuning Workshop, 23rd–27th February 2015, HLRS, Stuttgart, Germany.
• Intel MIC&GPU Programming Workshop, 27th–29th April 2015, LRZ, Garching, Germany.
• HPC Programming-Workshop, 6th–7th May 2015, LRZ, Garching, Germany.
• PRACE PATC Course: Node-Level Performance Engineering, 10th–11th December 2015, LRZ, Garching, Germany.

Roman Hatzky and Kab Seok Kang have attended: Numerical Methods for the Kinetic Equations of Plasma Physics (NumKin2015), 26th–30th October 2015, IPP, Garching, Germany.

1.2.4. Workshops & conferences

Roman Hatzky has attended: IPP Theory Meeting, 23rd–27th November 2015, Plau am See, Germany.

Fehér, T., SOLPS Parallel Optimization, SOLPS Optimization Working Session, 30th November 2015, IPP, Garching, Germany.

Fehér, T., Parallel Optimization, NMPP Seminar, 15th December 2015, IPP, Garching, Germany.

Kang, K.S., Scalable implementation of the parallel multigrid method on massively parallel computers, SCENT HPC Summer School @GIST, 15th–17th July 2015, GIST, Gwangju, Korea.

Kang, K.S., Scalable implementation of the parallel multigrid method on massively parallel computers, 8th Euro-Korean Conference on Science and Technology, 22nd–24th July 2015, Faculte de Medecine, Strasbourg, France.

Martone, M.: Auto-tuning shared memory parallel Sparse BLAS operations in librsb-1.2, Int. workshop on Sparse Solvers for Exascale: From Building Blocks to Applications, 23rd–25th March 2015, Greifswald, Germany.

F. da Silva, S. Heuraux, T. T. Ribeiro and D. Aguiam: REFMULF: 2D Fullwave FDTD Full Polarization Maxwell Code, Proc. 42nd EPS Conference on Plasma Physics, 22nd–26th June 2015, Lisbon, Portugal.

F. da Silva, T. Ribeiro, S. Heuraux and B. Scott: REFMULF: 2D Fullwave FDTD Full Polarization Maxwell Code for a synthetic reflectometry diagnostic, 16th European Fusion Theory Conference, 15th–18th October 2015, Lisbon, Portugal.

Ribeiro, T., Improving the scalability of the VIRIATO code, IPP Theory Meeting, 23rd–27th November 2015, Plau am See, Germany.

1.2.5. Publications

Heuraux, S., da Silva, F., Ribeiro, T., Despres, B., Campos Pinto, M., Jacquot, J., Faudot, E., Wengerowsky, S., Colas, L., and Lu, L.: Simulation as a tool to improve wave heating in fusion plasmas, J. Plasma Phys. 81, 2015, Art. No. 435810503.

Kang, K. S.: On the finite volume multigrid method: comparison of intergrid transfer operators, Computational Methods in Applied Mathematics 15, p. 189–202, 2015.

Kang, K. S.: Scalable implementation of the parallel multigrid method on massively parallel computer, Computers and Mathematics with Applications 70, p. 2701–2708, 2015.

Sonnendrücker, E., Wacher, A., Hatzky, R., Kleiber, R.: A split control variate scheme for PIC simulations with collisions, Journal of Computational Physics 295, p. 402–419, 2015.

1.2.6. Meetings

Roman Hatzky attended on a regular basis:
• CSC management meeting
• CSC European ticket meeting

2. Report on HLST project SOLPSOPT

2.1. Introduction

The Scrape-Off Layer (SOL) is directly related to the heat exhaust in tokamaks, therefore understanding the physics in this layer is crucial for improving the performance of present and future fusion devices. The SOLPS code package is widely used to simulate SOL plasmas. Numerical modelling plays an important role in understanding the basic physical concepts, interpreting results from experiments and making predictions for future experiments. SOLPS is a collection of several codes; two main components are the B2 and the EIRENE codes. B2 is a plasma fluid code to simulate edge plasmas and the EIRENE code is a kinetic Monte-Carlo code for describing neutral particles. In sequential mode it can take 1–12 weeks for ITER or DEMO simulations to converge. While EIRENE is parallelized with MPI, the B2 code was previously only partially parallelized using OpenMP, and because of this parallel simulations were not practical. In 2014, the parallelization of the B2 code was considerably improved in the framework of the PARSOLPS project, and a factor of six speedup has been reached for the ITER test case (Fehér, 2015). The SOLPSOPT project is a continuation of the PARSOLPS project, aiming to further improve the parallelization and decrease the execution time of B2 and the coupled B2-EIRENE system. Additional subroutines were parallelized in B2 and the sequential execution time of B2 was improved. The first part of the report discusses the changes in the B2 code, and gives an overview of the total speedup. The modified code was extensively tested, and the reproducibility of the results is discussed in detail. According to the original work plan, the work should have continued by coupling the modified B2 code with the MPI version of the EIRENE code. Afterwards, further optimization of B2 was planned with more drastic code changes. Unfortunately, the MPI version of the EIRENE code was not working as expected, therefore we had to focus on fixing the MPI scaling of EIRENE. The second part of the report discusses the work related to the EIRENE code. The work presented here was carried out in collaboration with Lorenz Hüdepohl from the Max Planck Computing and Data Facility (MPCDF).

2.2. OpenMP parallelization of the B2 code

The OpenMP parallelization of B2 was improved by increasing the parallel fraction of the code. The difficulty was that the execution time is distributed over many small subroutines. We have measured how long different subroutines take to execute in sequential mode¹; this is shown in the upper part of Fig. 1. The subroutines parallelized previously by F. Reid (Reid, 2010) are represented by the blue boxes. We can see several other serial subroutines (red boxes) where the code spends 1–7% of the execution time, and most of them need to be parallelized to achieve a significant speedup. This work was shared between T. Fehér and L. Hüdepohl. During the PARSOLPS and SOLPSOPT projects we have parallelized more than 25 additional subroutines. The lower part of Fig. 1 shows how the execution time in the parallelized code is divided between different subroutines. The green boxes show the additional parallel subroutines. Note that some of the parallelized subroutines still have a sequential part. Due to single core optimization, the sequential execution time became significantly shorter for B2STBC_PHYS and B2SIFR (see Section 2.2.1). In the modified code, around 90% of the execution time is spent in parallel regions (measured on one core).

¹ The exact number depends on the optimization flags, the test case and the number of time steps taken in the simulation (among other factors). For the measurements shown here, the code was compiled with the –O3 –xavx flags and ten time steps were taken in the ITER test case. The FTIMINGS tool from L. Hüdepohl was used to make the measurements.

Fig. 1 Relative execution time of subroutines in the B2 code. The upper part of the figure shows the status at the beginning of the PARSOLPS project, the lower part shows the final status. The colored boxes represent different subroutines, and the width of the boxes indicates the share from the execution time. The blue boxes were parallel when the project started; the red boxes denote the sequential subroutines, and the green boxes show the newly parallelized subroutines. The boxes that contain both red and blue or green colors are parallelized subroutines which still have a sequential part. Due to sequential optimization, the execution time of certain subroutines became shorter.

Fig. 2 Execution time and speedup of the B2 code for the ITER test case. Left: the original (trunk) version of the code is compared to the OpenMP parallelized (HLST) version. Right: strong scaling of the OpenMP code. The blue curve shows the theoretical estimate for ideal scaling (Amdahl’s law for 90% parallel code).

Fig. 2 shows the speedup of the OpenMP version of the B2 code. In the left subfigure, the speedup of the optimized OpenMP code (HLST branch) is compared to the original code (TRUNK). If we compile the code in sequential mode (without the –openmp flag), then the optimized code is 20% faster. Adding the –openmp flag introduces 5% overhead. The maximum speedup on 16 threads is a factor of six compared to the TRUNK version. The right side of Fig. 2 shows the speedup of the OpenMP code as a function of the number of threads. The blue curve shows the theoretical maximum speedup according to Amdahl's law.
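For reference, Amdahl's law for a parallel fraction p on N threads predicts a speedup S(N) = 1/((1−p) + p/N); with p = 0.9 and N = 16 this gives about 6.4, consistent with the measured factor of six. The small self-contained Fortran program below (illustrative only, not part of B2) evaluates this estimate:

program amdahl_estimate
   implicit none
   integer, parameter :: r8 = selected_real_kind(14)
   real(kind=r8), parameter :: p = 0.9_r8   ! measured parallel fraction of B2
   integer :: n
   ! Theoretical speedup S(N) = 1 / ((1 - p) + p/N) for N threads
   do n = 1, 16
      write (*,'(a,i3,a,f6.2)') 'threads = ', n, ',  speedup = ', &
           1.0_r8 / ((1.0_r8 - p) + p / real(n, r8))
   end do
end program amdahl_estimate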

2.2.1. Execution time on one core

We have two options to run the code on one core: compile it without the –openmp flag (referred to as the sequential version in the following) or compile it with the –openmp flag and set the number of threads to one. The OpenMP version is expected to take slightly longer than the sequential version due to the overhead of creating the OpenMP regions. Contrary to these expectations, a significant difference was found during the development: the OpenMP version took 20% longer than the sequential version. Table 2 lists the subroutines where the difference is larger than 0.5 seconds. Most of the overhead came from the subroutines B2SIFR and B2TFNB. These subroutines were parallelized earlier by F. Reid (Reid, 2010), and since they scaled well they were not investigated before.

sequential time (s)   omp time (s)   subroutine name   difference (s)
       77.2               187.8          B2SIFR              110.7
       96.8               115.5          B2TFNB               18.7
      108.1               110.3          ILUTERN               2.1
        3.1                 5.1          B2USMO                2.0
       95.6                96.9          B2SIHS                1.3
        1.5                 2.4          B2USCO                0.9
       25.3                26.1          B2STCX                0.7

Table 2 Execution time of different subroutines in the sequential and OpenMP versions of the B2 code (ITER 98 species benchmark, 10 time steps).

The execution time of the B2SIFR subroutine differs by 110.7 seconds. This function is used to calculate the linearized representation of the friction force between all particle species. There are no OpenMP regions inside B2SIFR. Instead, it is called from an OpenMP region as illustrated in Fig. 3.

!$OMP PARALLEL DO
do is=0, ns-1
   ! …
   call b2sifr (…,is,…)
   ! …
enddo

Fig. 3 The B2SIFR subroutine is called from an OpenMP region. Inside B2SIFR there are no OpenMP directives. Each call to B2SIFR calculates the friction force exerted on particle species is.

It turned out that the compiler uses different optimizations when the subroutine is called from an OpenMP region, and this causes a difference in the execution time. In the case of B2SIFR most of the time was spent in calculating the Coulomb logarithm. A closer look revealed that it is not even necessary to recalculate the Coulomb logarithm in each iteration, since it only depends on the background plasma temperature and density. The code was modified accordingly, and the execution time is shown in Fig. 4.


Fig. 4 Execution time of B2SIFR in its original form (left) and after optimization (right). ITER 98 species test case, 10 time steps.

The overall execution time is reduced, but the problem with the OpenMP overhead remains: the code called from the OpenMP region takes longer to execute. Nevertheless, the severity of the problem decreased: compared to the execution time of the whole code, the overhead from the B2SIFR subroutine is now only 1%.
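The B2SIFR modification described above amounts to caching (memoizing) the Coulomb logarithm and recomputing it only when the background plasma changes. The following Fortran sketch illustrates the idea with hypothetical names and a placeholder formula; it is not the actual B2SIFR implementation:

module coulomb_log_cache
   implicit none
   integer, parameter :: r8 = selected_real_kind(14)
   real(kind=r8), allocatable, save :: clog(:,:)               ! cached Coulomb logarithm
   real(kind=r8), allocatable, save :: te_old(:,:), ne_old(:,:)
contains
   subroutine get_coulomb_log (te, ne, clog_out)
      real(kind=r8), intent(in)  :: te(:,:), ne(:,:)           ! background temperature, density
      real(kind=r8), intent(out) :: clog_out(:,:)
      if (.not. allocated(clog)) then
         allocate (clog(size(te,1),size(te,2)), te_old(size(te,1),size(te,2)), &
                   ne_old(size(te,1),size(te,2)))
         te_old = -1.0_r8                                      ! force the first evaluation
         ne_old = -1.0_r8
      end if
      ! Recompute only when the background plasma has changed since the last call.
      if (any(te /= te_old) .or. any(ne /= ne_old)) then
         clog = 17.0_r8 - 0.5_r8*log(ne*1.0e-20_r8) + 1.5_r8*log(te*1.0e-3_r8)   ! placeholder formula
         te_old = te
         ne_old = ne
      end if
      clog_out = clog
   end subroutine get_coulomb_log
end module coulomb_log_cache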

With fine tuning this overhead can probably be decreased further. For example, many of the subroutines called from B2SRAL were optimized, and for those subroutines the execution time of the OpenMP version is even slightly faster (see Fig. 5) than the execution time of the sequential version. This optimization work is very time consuming and is only expected to improve the speedup by a few percent; therefore it was decided not to continue it.

Fig. 5 Execution time of subroutines called from B2SRAL. The sequential code (compiled without –openmp) is compared to the OpenMP code using one thread. Each box represents a subroutine, the first line in the box is the name of the subroutine, the second line shows the execution time of the subroutine in sequential mode, and the third line is the execution time in OpenMP mode. The times are total times including the time spent in the called subroutines. The code was executed four times and the results from the fastest run are shown in the figure.

After improving B2SIFR, the overhead of the OpenMP version of the B2 code is 5%. This is shown in Fig. 2 as the difference between the green and the light blue bars. The improvements in B2SIFR did not improve the overall speedup: the previous version of B2SIFR showed superlinear scaling, and the overhead disappeared when the code was executed on a large number of threads. So the overall speedup of the HLST version of the code using 16 threads remains a factor of six compared to the trunk version.

2.3. B2 in the HLST branch of the SVN repository

The development of the B2 code was done using the GIT repository provided by MPCDF. The original code base has changed since we started our work. These changes (up to revision 4794) were merged with the development version of the code. Afterwards, the code with the OpenMP modifications was uploaded step by step into the HLST branch of the SVN repository. Here we describe some of the steps separately.

2.3.1. Modules introduced

Most of the subroutines in the B2 code originally had Fortran 77 style procedure calls: the functions and subroutines were defined as external symbols, and no interface information was specified. This was changed by L. Hüdepohl: every file was encapsulated into a module and the EXTERNAL statements were replaced by USE statements. This way the compiler gets access to interface information about the procedures and can check whether the argument list corresponds to the procedure definition. The source files of B2 are organized in different subdirectories. The new module names reflect this directory structure; the file dirname/filename.F was enclosed into the module called DIRNAME_FILENAME.
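As an illustration of this restructuring, a file dirname/filename.F containing a free-standing subroutine would be wrapped roughly as follows; the subroutine name and body are hypothetical and only sketch the pattern:

! Before: a bare Fortran 77 style subroutine, referenced elsewhere via
!         "EXTERNAL compute_flux" without any interface checking.
! After:  the same subroutine wrapped in a module named after its path.
module dirname_filename
   implicit none
   integer, parameter :: r8 = selected_real_kind(14)
contains
   subroutine compute_flux (n, f)
      integer, intent(in) :: n
      real(kind=r8), intent(out) :: f(n)
      f = 0.0_r8                      ! placeholder body
   end subroutine compute_flux
end module dirname_filename

A caller then obtains the explicit interface simply through "use dirname_filename, only: compute_flux", and the compiler can verify the argument list at every call site.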

The use statements explicitly define the dependencies between the source files. This dependency information was used to compile only the modified code parts during the development work. The project coordinator has indicated that having module interfaces would be useful in the official code base, so these changes were committed to the HLST branch in the SVN repository.

The B2 code package contains several executables. During development, we focused on b2mn.exe, where the main calculation takes place. The other code parts had problems during compilation because of the new interface information: some circular dependencies were not resolved and the type kind parameter for certain function calls was not correct. This has been fixed in revision 4703 of the HLST branch, which contains the code with the module interfaces.

2.3.2. Problems with the build system

The development version of the B2 code in the GIT repository was equipped with the FBUILD build system by L. Hüdepohl. It provides faster compilation (up to 20% faster compared to the original build system) and several convenient features that helped testing and debugging.

In the HLST branch of the SVN repository, the original build system was retained. After introducing the new modules the dependency list for the makefile had to be updated. Creating a Fortran dependency list is a nontrivial problem, and there is no standardized way to do it. The Perl scripts of the FBUILD make-system do a decent job in listing the dependencies. In contrast, the original build system uses grep commands to list the dependencies, and has the following shortcomings:

• it fails if the USE statements are indented by more than six spaces,
• it assumes that modules are in b2mod_* files,
• it requires system modules to be flagged with the !IGNORE flag,
• it does not understand if some modules are used only conditionally.

Most of these issues could be easily fixed by modifying the grep commands.

2.3.3. R8 parameters included

The B2 code uses 8 byte reals for the calculation. The precision is explicitly defined by setting the type kind parameter for the variables and constants. These declarations were missing in some parts of the code, which could lead to the problem that is illustrated in Fig. 6. If the type kind is not specified for a constant, then the implicit conversion rules of Fortran do not specify what is assigned to the least significant digits. This happens with variable a in Fig. 6. It is a well known feature of Fortran, and it can lead to problems when we want to compare the output of the code: we cannot expect bit identical results. In revision 4712 the missing type kind specifiers were added to the code. This way, the output of the code is the same as if it was compiled with the –r8 flag.

program kindness_test
   INTEGER, PARAMETER :: R8 = SELECTED_REAL_KIND (14)
   real(kind=R8) :: a, b
   a = 42.137
   b = 42.137_R8
   write(*,*) 'Without type kind specifier', a
   write(*,*) '   With type kind specifier', b
end program

Output:
 Without type kind specifier   42.1370010375977
    With type kind specifier   42.1370000000000

Fig. 6 Usage of type kind specification. The R8 parameter is used to define double precision reals. The constant in the assignment for variable a has no type kind specifier, so the values assigned to a and b are different.

2.3.4. OpenMP modifications

The subroutines that have OpenMP modifications were uploaded into the SVN repository in separate commits. The optimized versions of B2XPFE, B2STCX, and B2STEL are considerably different from the non-optimized versions. For these subroutines a simpler OpenMP version, which contains only small modifications compared to the sequential version, is also available in the repository.

2.3.5. Different compilers

The B2 code was compiled with the Intel Fortran compiler (ifort) versions 14 and 15 during the development. The project coordinator requested that the code should also be compatible with ifort v12, which is used on the EUROfusion Gateway computer.

To compile with ifort v12, one has to remove the –align array32byte flag from the compiler options. The same flag can be specified for ifort v13 or newer. Apart from the alignment flag, in earlier versions of the optimized code three of the subroutines (B2XPFE, B2STCX, B2STEL) had assume_aligned compiler directives. This is an Intel specific directive that is used to specify the alignment of dummy arguments of the functions. The problem with this directive is that without the –align array32byte flag the compiled code would be incorrect. So if the users switch back and forth between ifort v12 and newer versions of ifort, there is a danger of omitting the –align array32byte flag and generating invalid code. A runtime check would catch such an error, but it was considered safer to remove the alignment directives from the source code. This way the compiled code should always be correct. The execution time of the affected subroutines can increase, but the change compared to the overall execution time is negligible.
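For reference, the removed directives had roughly the following form; the subroutine and argument names here are hypothetical, and the directive is only safe if the actual arguments really are 32-byte aligned, i.e. if the code was built with –align array32byte:

subroutine b2_kernel_example (n, x)
   implicit none
   integer, parameter :: r8 = selected_real_kind(14)
   integer, intent(in) :: n
   real(kind=r8), intent(inout) :: x(n)
   ! Intel-specific hint: promise the compiler that x is 32-byte aligned,
   ! enabling aligned AVX loads/stores. If the promise is broken (code built
   ! without -align array32byte), the vectorized code is invalid, which is
   ! why these directives were removed from B2.
!DIR$ ASSUME_ALIGNED x: 32
   x = 2.0_r8 * x
end subroutine b2_kernel_example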

To compile with the –openmp flag using ifort v12, the OMP PARALLEL DO SIMD directive in sources/b2stbm.f was replaced with OMP PARALLEL DO, because the SIMD directive is a new addition in the OpenMP 4.0 standard. After these considerations the B2 code was compiled and tested with ifort v12, v13.1, v14, v15, and v16b. In sequential mode all versions produce correct results, and the execution time is about the same. All versions work using one thread in OpenMP mode, but if more threads are used, then the code fails if it was compiled with ifort v12 or v13. The code stops execution with the message b2xvfx--faulty argument x-edge fne, which is an internal consistency check in one of the subroutines. The fne array is calculated in the B2XPFE subroutine. The error occurs with the optimized version of the subroutine (revision 4745). It is believed to be a compiler bug, since the same code works with ifort v14 or newer. The problem was fixed by replacing B2XPFE with the non-optimized OpenMP version (revision 4744). By switching back to the non-optimized version, we theoretically lose a factor of two speedup for B2XPFE. Overall this is a negligible difference, since only 2.5% of the total execution time is spent in B2XPFE.

To summarize, the OpenMP version of the B2 code can be compiled with ifort v12 or newer. All of these compilers can generate code for the AVX instruction set which is used in the Sandy Bridge and Ivy Bridge processors. It is however strongly recommended to use the newer compiler versions, since they contain important bug fixes.

2.4. B2 correctness checks

During the parallelization work, all the modified subroutines were tested with separate unit tests. The results agree with the original version of the subroutines up to machine precision. These tests were discussed in the final report of the PARSOLPS project. Even though the separate components of the code were tested with unit tests, we had to ensure that the whole code is correct when compiled with all the modifications. In the following we describe additional tests.

2.4.1. Benchmarks

Three test cases were used to test the correctness of the modified code as a whole. They correspond to the test cases in the benchmark folder of the SVN repository, but different numbers of time steps are used:

• AUG_16151_D: ten time steps
• AUG_16151_D+C+He: one time step
• ITER_D+T+He+Be+Ne+W: one time step

To compare the results of the simulation, the output files were investigated. It was not possible to do a simple binary comparison, because the date and time of the simulation were saved into the files. Instead, a small program was created which reads the content of the files and compares the results from different simulations.

The output of B2 is written both to formatted and unformatted files. The name, size and value of different arrays are saved into the files. The test program is saved under src/Braams/b2/src_xpb/test/check_b2_output.F90 in the SVN repository, and it was used to compare the following output files:

• b2fstate – provides the final state,
• b2fmovie – provides data for movies,
• b2ftrace – provides data on the numerical evolution,
• b2fplasma – plasma state at the end of the simulation.

During the testing the code was compiled with the following options: -fp-model source –no-vec -O2. The first option specifies that only value safe optimizations can be used.
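The essence of such a comparison is a per-array relative difference. The sketch below mirrors what check_b2_output.F90 does in spirit; module, routine and array names are illustrative and the actual B2 file parsing is omitted:

module b2_output_compare
   implicit none
   integer, parameter :: r8 = selected_real_kind(14)
contains
   ! Report the maximum and average relative difference between two arrays
   ! read from two different simulation outputs.
   subroutine compare_arrays (name, a, b)
      character(*), intent(in) :: name
      real(kind=r8), intent(in) :: a(:), b(:)
      real(kind=r8) :: rel(size(a))
      where (abs(a) > 0.0_r8)
         rel = abs(a - b) / abs(a)       ! relative difference
      elsewhere
         rel = abs(b)                    ! avoid division by zero
      end where
      write (*,'(a,2es10.2)') trim(name)//': max/avg rel. diff. ', &
           maxval(rel), sum(rel) / size(rel)
   end subroutine compare_arrays
end module b2_output_compare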

After introducing the new modules and the additional type kind specifiers (revision 4712), the code in the HLST branch was compared to the SVN trunk. When we compiled in sequential mode (without the –openmp flag), the results were binary identical to the results from the trunk version, if the latter was compiled with the –r8 flag. When the –openmp flag was included, the results were no longer identical, but they were still correct (see Section 2.5 for details).

2.4.2. Long test

The project coordinator pointed out that they had previously encountered errors which appeared only after longer simulations; it is therefore recommended to perform tests over 1000 time steps. Such long tests were done for the AUG_16151_D+C+He test case. The results of the HLST branch (containing the OpenMP changes) have been compared to the trunk (original) version of the code. As described in Section 2.3.3, some of the type kind specifiers for constants are missing in the trunk version of the code. This problem was fixed in the HLST version. To be consistent, the trunk version of the code was compiled with the –r8 flag; this way all constants are interpreted as double precision constants in both code versions. Because of the corrected precision it takes longer until the residuals converge. We have performed simulations up to 5000 time steps.

Fig. 7 Residuals of the continuity equation. The results from the trunk version of the code are shown in color; the results from the HLST branch are displayed with black dashed curves on top of the colored curves.

The residuals of the continuity equations are shown in Fig. 7, where the results from the trunk are displayed with colored curves, and the results from the HLST branch are shown with dashed lines. The HLST version of the code was executed on 20 threads. The results agree well. The final values of the variables describing the plasma state (na, ne, te, ti, po) agree up to machine precision. The variable ua shows a larger difference, but this is understandable because ua changes sign over the simulation domain, and the element by element comparison of the ua arrays leads to large relative differences where the values are close to zero. Two other variables (smfr and b2stbc_smo) also show a large discrepancy, but they depend on ua, which explains the differences.

2.5. B2 reproducibility

2.5.1. Reproducibility and OpenMP reductions

Some of the OpenMP regions include reductions; an example is shown in Fig. 8. In this example, contributions to the final value of the ue array are calculated by different threads, and the results are summed up at the end. The OpenMP standard does not define in which order the results from different threads will be added together, so round-off errors can appear. Therefore, there is no guarantee that the results will be bit identical.

In fact, the results are not bit identical compared to the results of the sequential code, even if only one OpenMP thread is used. The maximum relative difference that is introduced by the reduction clause in Fig. 8 is around 10⁻¹⁴ in the array ue.

!$OMP PARALLEL DO DEFAULT(none) SHARED(rza,ua,na,ns) REDUCTION(+:ue)
do is=0,ns-1
   ue(:,:) = ue(:,:) + rza(:,:,is)*ua(:,:,is)*na(:,:,is)
enddo
!$OMP END PARALLEL DO

Fig. 8 OpenMP parallel region with reduction over the ue array.
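The order dependence stems from the non-associativity of floating point addition. The small self-contained program below (not taken from B2, and assuming value-safe compilation, e.g. -fp-model source) shows how the summation order alone changes a double precision result:

program fp_order_test
   implicit none
   integer, parameter :: r8 = selected_real_kind(14)
   real(kind=r8) :: a, b, c
   a = 2.0_r8**53        ! spacing between neighbouring doubles here is 2
   b = 1.0_r8
   c = -a
   ! a + b rounds back to a, so the first sum loses b completely,
   ! whereas b + c is exact, so the second sum keeps it.
   write(*,*) '(a + b) + c = ', (a + b) + c   ! 0.0
   write(*,*) 'a + (b + c) = ', a + (b + c)   ! 1.0
end program fp_order_test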

2.5.2. Differences due to reductions

The error caused by the reduction spreads during the calculation, and certain variables in the output files can accumulate large errors. We investigated how large the difference was by comparing the results from a sequential simulation with an OpenMP simulation using only one thread. For these tests, a version of the B2 code was used which contained only a single reduction operation: the same loop that is illustrated in Fig. 8. All other parallel regions with reductions were removed. For the test case AUG_16151_D, the values in the output files are bit identical. For tests with higher particle numbers (AUG_16151_D+C+He, ITER 98 species), the output files can contain large differences. Table 3 lists the maximum and average relative differences for some of the output variables. Most of the variables are two or three dimensional arrays (2D spatial grid + particle species). For the ITER case, the maximal error is very high for the particle flux (fna), but the average error is 10⁻⁷, which is considered tolerable. The values describing the final plasma state (te, ti, na, po) have an average relative error of 10⁻¹⁴, except for the parallel velocity ua, which is more sensitive to the round-off errors from the reduction and has a relative error of 10⁻⁷. For several variables, there is no error or the error is close to machine precision. To ensure that the differences are indeed caused by round-off errors and not by any other mistake, two additional tests were performed. First, to emulate the round-off errors, a small random noise was added to the ue variable in the sequential code. This resulted in differences similar to those described in Table 3. Second, if the REDUCTION clause was replaced by SHARED and one thread was used for the OpenMP code, then results identical to the sequential code could be achieved.

Filename    Variable    AUG 16151 D+C+He          ITER 98 species
                        Max diff    Avg diff      Max diff    Avg diff
B2FSTATE    kinrgy      0           0             5.2e-2      6.5e-7
            ua          3.3e-8      2.3e-9        4.0e-2      4.6e-7
            fna         0           0             1.5e-2      1.9e-7
            fhe         0           0             7.8e-5      1.4e-7
            na          0           0             1.6e-13     2.9e-14
            te          0           0             9.5e-14     6.5e-14
B2FPLASMA   fna_52      0           0             4.2         4.6e-4
            rrahi       0           0             4.9e-1      2.8e-6
            smq         0           0             1.1e-1      8.1e-7
B2FMOVIE    delua       1.2e-4      7.3e-7        4.0e-2      4.6e-7
            delna       0           0             1.1e-9      5.9e-13
            delpo       3.1e-6      1.3e-6        1.4e-12     1.3e-13
            delte       8.3e-6      8.2e-7        1.3e-12     3.0e-13
B2FTRACE    data        1.8e-8      1.3e-8        1.15e-9     1.9e-10

Table 3 Relative differences in the output files. The sequential code (compiled without the –openmp flag) was compared with the OpenMP parallel code (using one thread). The table shows the maximum relative difference and the average relative difference for two different test cases: AUG 16151 D+C+He and ITER 98 species. The maximal relative difference is large for certain array elements (denoted by red), but the average difference over the whole array is moderate even for these arrays. The difference in basic physical quantities (denoted by green) is close to machine precision.

2.5.3. Deterministic reduction

Intel's OpenMP implementation offers the environment variable KMP_DETERMINISTIC_REDUCTION=true, which according to the documentation “has the effect that [...] an OpenMP reduction clause has a consistent floating point result from run to run, since round-off errors are identical.” It promises to give the same result if the same number of threads is used. In practice this flag does not work for the array reductions used in B2. This problem was reported to Intel, and we are waiting for their feedback. Until this problem is resolved it is not possible to get deterministic results with the OpenMP version of the code.
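A possible workaround, not implemented in B2, would be a manual reduction with a fixed summation order: each thread accumulates into its own copy of the array, and the partial results are then summed serially in thread order. The sketch below illustrates the idea for the loop of Fig. 8; the array shapes and the routine name are assumptions:

subroutine deterministic_ue_reduction (nx, ny, ns, rza, ua, na, ue)
   use omp_lib, only: omp_get_max_threads, omp_get_thread_num
   implicit none
   integer, parameter :: r8 = selected_real_kind(14)
   integer, intent(in) :: nx, ny, ns
   real(kind=r8), intent(in)  :: rza(nx,ny,0:ns-1), ua(nx,ny,0:ns-1), na(nx,ny,0:ns-1)
   real(kind=r8), intent(out) :: ue(nx,ny)
   real(kind=r8), allocatable :: ue_part(:,:,:)
   integer :: is, tid, nthreads

   nthreads = omp_get_max_threads()
   allocate (ue_part(nx, ny, 0:nthreads-1))
   ue_part = 0.0_r8

   ! Each thread accumulates into its own slice ue_part(:,:,tid).
   !$OMP PARALLEL PRIVATE(tid)
   tid = omp_get_thread_num()
   !$OMP DO
   do is = 0, ns-1
      ue_part(:,:,tid) = ue_part(:,:,tid) + rza(:,:,is)*ua(:,:,is)*na(:,:,is)
   end do
   !$OMP END DO
   !$OMP END PARALLEL

   ! Serial, fixed-order sum over the per-thread partial arrays: the
   ! round-off pattern is identical from run to run for a given thread count.
   ue = 0.0_r8
   do tid = 0, nthreads-1
      ue = ue + ue_part(:,:,tid)
   end do
   deallocate (ue_part)
end subroutine deterministic_ue_reduction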

2.5.4. B2 Differences due to optimization

Changes in the optimization flags or in the floating point model flag instruct the compiler to use different optimizations, and can lead to differences in the output that are similar to those in Table 3. Round-off errors in certain variables can cause surprisingly large differences in some of the output values. For example, even if we change a single array element in the B2XPFE subroutine, fne(42,16,0) = fne(42,16,0) * (1.0D0 + 1.0D-14), the maximal error for the AUG_16151_D test case becomes 10⁻² for the fmo variable (the other two test cases had a maximal error of 10⁻⁹). It was necessary to change the order of certain arithmetic operations during the optimization process. Therefore, the results from the optimized code are different compared to the results from the original version of B2. During testing it was ensured that the final physical parameters have small errors. The error in te, ti, ne, na and po was always smaller than 10⁻¹³ and the error in the variable ua was around 10⁻⁷.

2.6. EIRENE

After the work on the B2 code, the next step in the SOLPS optimization project is to ensure that the coupled B2-EIRENE system works together in parallel. The SOLPSOPT project started with the assumption that the MPI parallelized EIRENE code was working properly and scaling reasonably well. Unfortunately, the MPI version of the EIRENE code in SOLPS 5.0 was not maintained, and the changes introduced during the previous years (dating back to 2011) were only tested in the sequential version of EIRENE. As a result, the MPI version of EIRENE did not even compile. The EIRENE code exists in several versions hosted by different institutes. The EIRENE code in SOLPS 5.0 is an older version of EIRENE. The recently released SOLPS-ITER code includes the latest version of EIRENE. After discussion with the project coordinator, it was decided that we should switch to the latest version, instead of fixing the version included in SOLPS 5.0. We evaluated the parallel performance of the stand-alone version of EIRENE from the ITER repository, and found that it does not scale for the test cases that we have. In the following sections we will discuss this problem, devise a scalable parallelization strategy, and test it.

2.6.1. Code version and test cases

We use a version of EIRENE from the ITER repository (feature/HLST branch from ssh://git.ITER.org/bnd/Eirene). We use two stand-alone (EIRENE without B2) test cases: an ASDEX Upgrade (AUG) test case with 630k particles, 5 bulk ion species and 59 reactions, and an ITER test case with 73k particles, 25 bulk ion species and 26 reactions.

2.6.2. MPI bugfixes

The feature/HLST branch is based on the main ITER-develop branch of the EIRENE code. A few problems were fixed by the project coordinator and L. Hüdepohl, such that the MPI version of the code could work for the AUG test case, at least with certain MPI libraries. Unfortunately, the MPI version of the code was still very fragile, and several other problems had to be corrected before the MPI scaling could be analyzed. In the following we list the problems that were encountered.

A problem was discovered with MPI_GATHERV calls in the COLLECT_CENSUS subroutine. Two of these calls had identical source and destination buffers, which is an error that was caught by the runtime checks when the code was executed on the Hydra computer using the IBM MPI library. The problem was fixed using the MPI_IN_PLACE option, which has to be specified on the root process.

The EIRENE_BROADCAST subroutine is responsible for distributing the input data among the MPI processes. Some of the input data is stored in allocatable arrays. Process 0 allocates its own arrays, fills them with input data, and broadcasts them. The rest of the MPI tasks check the allocation status of these arrays, and if they are not allocated, or the allocated size is not in agreement with the input data, they are reallocated before receiving the data from process 0. The arrays are handled using pointers, and the ASSOCIATED intrinsic is used to check the pointer association status. This check failed because some of the pointers were not initialized, and it led to segmentation faults. According to the Fortran standard, the pointer association status is undefined initially; it has to be initialized either by the NULLIFY statement or by using the NULL() intrinsic function in an initialization expression during type declaration. The problem was fixed by initializing the pointer that caused the problem.

Unfortunately, not all the input data is handled centrally by the EIRENE_BROADCAST function. The subroutine EIRENE_INIT_REFL_HLM opens its own input file and reads data. This file was opened by each MPI process separately, which is not the recommended way of reading input files. In practice, it led to errors when a larger number of MPI processes was used. The problem was fixed by reading the data only on process 0, and then distributing it with MPI_BROADCAST.

The MPI_BARRIER function is seldom needed in a production code, but it can be useful during debugging or profiling. The EIRENE code has 27 calls to MPI_BARRIER. Some of them are for the mentioned timing and debugging purposes, but most of them are unnecessary, because they directly precede or follow collective MPI routines, which synchronize all the tasks anyway. Inside the IF3COP subroutine, an MPI_BARRIER is used to synchronize the processes. Unfortunately, it is not guaranteed that all processes call IF3COP, and therefore the MPI_BARRIER can lead to a deadlock, which became apparent using the ITER test case. The deadlock only occurs for specific combinations of strata numbers and processor numbers. The problem was fixed by removing the call to the barrier. The fix for these problems was sent to the SOLPS-ITER developers, who have incorporated them into the main development line.

The Forcheck static source code analyzer was used to check the source files of EIRENE. A couple of uninitialized variables were found, but these are not used in our test case. For one subroutine there was an inconsistency of actual and dummy argument type.
These problems were reported to the EIRENE developers, who have corrected them. The runtime checks from the Intel compiler did not find any issues. The MUST tool (RWTH Aachen University, 2015) was used to check whether the Message Passing Interface is used correctly. This is a successor of the MARMOT tool, and performs runtime checks on the usage of the MPI library. The test succeeded without any errors, only one warning was received because of a broadcast of a zero sized message. After the bugs had been fixed, the stand-alone version of EIRENE was working for the two test cases, and the parallel scaling could be analyzed.
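The sketch below illustrates, in Fortran, two of the fix patterns described above: an in-place MPI_GATHERV on the root process, and reading an auxiliary input file on process 0 only, followed by a broadcast. It is a minimal sketch with hypothetical array and file names, not the actual EIRENE code.

    ! Minimal sketch of two fix patterns: (a) MPI_IN_PLACE on the root to avoid
    ! identical send and receive buffers in MPI_GATHERV, and (b) reading an
    ! input file on process 0 only and broadcasting it to the other processes.
    subroutine gather_and_read_example(vals, nloc, counts, displs, comm)
      use mpi
      implicit none
      integer, intent(in)    :: nloc, comm
      integer, intent(in)    :: counts(:), displs(:)
      real(8), intent(inout) :: vals(:)       ! root: full array; others: local part
      real(8), allocatable   :: refl_data(:)
      real(8) :: rdum(1)
      integer :: rank, ierr, ndata, iu

      call MPI_Comm_rank(comm, rank, ierr)

      ! (a) In-place gather: the root passes MPI_IN_PLACE as send buffer, its
      !     own contribution already resides at the right offset in vals.
      if (rank == 0) then
         call MPI_Gatherv(MPI_IN_PLACE, 0, MPI_DOUBLE_PRECISION, &
                          vals, counts, displs, MPI_DOUBLE_PRECISION, 0, comm, ierr)
      else
         call MPI_Gatherv(vals, nloc, MPI_DOUBLE_PRECISION, &
                          rdum, counts, displs, MPI_DOUBLE_PRECISION, 0, comm, ierr)
      end if

      ! (b) Read an auxiliary input file on process 0 only and broadcast it,
      !     instead of every process opening the file itself.
      if (rank == 0) then
         open(newunit=iu, file='reflection.dat', form='unformatted', action='read')
         read(iu) ndata
         allocate(refl_data(ndata))
         read(iu) refl_data
         close(iu)
      end if
      call MPI_Bcast(ndata, 1, MPI_INTEGER, 0, comm, ierr)
      if (rank /= 0) allocate(refl_data(ndata))
      call MPI_Bcast(refl_data, ndata, MPI_DOUBLE_PRECISION, 0, comm, ierr)
    end subroutine gather_and_read_example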

2.6.3. Parallel work distribution in EIRENE

A simplified outline of EIRENE is presented in Fig. 9. In the AUG test case, 97% of the serial execution time is spent in the particle loop. The particles are grouped into different strata. When the code is executed in parallel, the loop over the strata and/or the loop over the particles can be executed concurrently. After following the particles, the results are summed up over the MPI processes.

1. initialize
2. for each stratum
3.    init stratum
4.    for every particle in stratum
5.       follow particle
6.    sum contributions in stratum
7. sum over strata

Fig. 9 Outline of the EIRENE algorithm. The loops in lines 2 and 4 can be parallelized. The steps in lines 1, 6, and 7 involve MPI communication.

The work is distributed in parallel by filling out a table that describes which MPI process calculates which strata (see Fig. 10 for illustration). A round-robin distribution of processes is used if there are more strata than MPI processes; in this case one process can calculate several strata. If there are more MPI processes than strata, then each process calculates only one stratum, but a stratum can be associated with more than one process. The number of particles inside a stratum is distributed evenly between the MPI processes that calculate that stratum.


Fig. 10 Work distribution: the MPI processes are assigned to the strata by filling out the PROCFORSTRA matrix. (a) Round robin distribution if Nprocs < Nstrata. (b) Strata with higher work load get additional processors if Nprocs > Nstrata.
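A minimal sketch of this assignment logic is given below, assuming a logical table PROCFORSTRA(stratum, process); the real EIRENE routine additionally takes the expected workload of each stratum into account, so the simple round-robin rule shown here is illustrative only.

    subroutine fill_procforstra(nstrata, nprocs, procforstra)
      implicit none
      integer, intent(in)  :: nstrata, nprocs
      logical, intent(out) :: procforstra(nstrata, nprocs)
      integer :: istra, iproc

      procforstra = .false.
      if (nprocs <= nstrata) then
         ! (a) Round robin: process iproc handles strata iproc, iproc+nprocs, ...
         do istra = 1, nstrata
            iproc = mod(istra - 1, nprocs) + 1
            procforstra(istra, iproc) = .true.
         end do
      else
         ! (b) More processes than strata: each process gets exactly one stratum,
         !     so a stratum can be shared by several processes; the real code
         !     gives the extra processes to the strata with the highest workload.
         do iproc = 1, nprocs
            istra = mod(iproc - 1, nstrata) + 1
            procforstra(istra, iproc) = .true.
         end do
      end if
    end subroutine fill_procforstra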

In the AUG test case, we have six strata as shown in Fig. 11. The number of particles per stratum varies strongly, as shown by the blue bars. The computational time to follow the particles depends not only on the number of particles, but also on the physical parameters in the stratum. The CPU time to calculate all particle trajectories in the different strata is shown with red bars in Fig. 11. If the number of available CPUs is larger than the number of strata, then the strata with larger workload get more CPUs. With a large number of CPUs this could lead to a balanced work distribution; with a small number of CPUs it is difficult to achieve load balance. The problem of load imbalance during stratified sampling is also discussed in the EIRENE manual (Reiter, 2009): “B2-EIRENE is heavily resorting to ‘stratified sampling’. This means the primary sources of neutral particles are split by their physical or computational origin into ‘sub-strata’, and results are then linearly superimposed finally. This […] leads to load balancing issues”.

Fig. 11 Stratified sampling for the AUG test case. Left: the sketch of the strata, right: particle number (blue bars) and computational time (red bars) in different strata.

Using the default scheduling (as illustrated in Fig. 10), the execution time of the code was measured. The speedup is shown as a function of the number of MPI processes in the left part of Fig. 12. The curve saturates quickly because of load imbalance. This imbalance is illustrated on the right-hand side of the figure for the case of eight MPI processes. The execution time of the different MPI processes is represented with bars. A large part of the execution time is spent as MPI waiting time (shown in red).


Fig. 12 Left: speedup of the EIRENE code for a fixed size ASDEX Upgrade test case (strong scaling), measured on an Intel Sandy Bridge node. Right: distribution of work among eight MPI processes. The time spent in the particle loop is shown with blue color, red denotes MPI waiting time, and green shows time spent in sequential parts of the code.

2.7. More balanced workload distribution

Our aim is to achieve reasonable scaling within a compute node (up to 16 cores). Since the number of strata is comparable to the number of MPI processes, we do not expect to achieve good scaling just by assigning processes to strata. Instead we propose to let all MPI processes calculate all strata and distribute the particles evenly between the processes, as illustrated in Fig. 13. Let us call this the APCAS (All Processes Calculate All Strata) workload distribution.

Fig. 13 Work distribution table when all MPI processes calculate all strata: every entry of the PROCFORSTRA table is set.

The advantage of this approach is that load imbalance between different strata does not affect the load balance between MPI processes. Modifying the work distribution is also relatively easy within the existing framework. The disadvantage is that now only the particle loop is truly parallel; the other steps outside the particle loop are effectively performed sequentially. The work distribution matrix (PROCFORSTRA) was modified to test the APCAS workload distribution. The green curve in Fig. 14 shows the achieved speedup. We compare the speedup to theoretical estimates based on Amdahl's law. Originally 97% of the execution time is spent in the particle loop, so the black curve in Fig. 14 gives the maximum achievable speedup for the test case. However, the overhead due to MPI communication decreases the parallel fraction to 94% (magenta curve). We can see that the measured speedup is close to this theoretical estimate.
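For reference, the theoretical curves discussed here follow Amdahl's law, which for a parallel fraction p and N processes gives the speedup

    S(N) = \frac{1}{(1-p) + p/N} .

At N = 16, a parallel fraction of p = 0.97 limits the speedup to S ≈ 11, while p = 0.94 limits it to S ≈ 8.4.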


Fig. 14 Speedup of EIRENE as a function of MPI processes. If all processes calculate all strata (APCAS strategy) then the work is distributed evenly and we reach a higher speedup (green curve). The achieved speedup is compared to theoretical estimates of the maximum speedup based on Amdahl’s law.

2.7.1. Execution time and particle number

The applicability of the APCAS workload distribution depends on how large a fraction of the execution time is spent in the particle loop. Letting all processes calculate all strata is the simplest workload distribution; it was implemented in EIRENE in the past, but was not favorable for the cases studied at that time (Reiter, 2015). Therefore, in this section we examine how the APCAS workload distribution would perform for our test cases if we change the particle number.

The execution time of the particle loop was measured as a function of the number of particles. The left side of Fig. 15 shows the fraction of the simulation time spent in the particle loop and in the strata loop for the ASDEX Upgrade test case. The code was executed serially. Our aim is to reach at least a speedup of six using the full node. Amdahl's law suggests that we would need at least a 90% parallel fraction to achieve that speedup (see Fig. 16). This 90% threshold is shown in Fig. 15 with the dashed horizontal line. The arrow marks the point with 157k particles (left subfigure) where the particle loop crosses the 90% threshold; above this, the APCAS parallelization strategy should work. The green vertical bar highlights the particle number of the original test case that we received from the project coordinator, which is a factor of four higher than what is needed for good APCAS scaling. The right side of the figure shows the same test for the ITER case. Here the 90% threshold is reached at around 73k particles, which is also the particle number used in the original test case. The APCAS parallelization strategy should therefore work for the ITER test case too, but the MPI overhead will cause problems.

The strata loop contains the particle loop plus some initialization and post-processing steps specific to each stratum; therefore, it takes a larger share of the execution time than the particle loop. This is why the default parallelization strategy in EIRENE is to distribute the strata loop in parallel. Unfortunately, this fails for our test cases because of load imbalance.
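The 90% figure follows from inverting Amdahl's law for the parallel fraction required to reach a given speedup S on N cores:

    p = \frac{1 - 1/S}{1 - 1/N} = \frac{1 - 1/6}{1 - 1/16} \approx 0.89 ,

i.e. roughly 90% of the serial execution time must be parallelizable to reach a speedup of six on 16 cores.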


Fig. 15 Time spent in the strata loop (red curve) and in the particle loop (blue curve) as a function of the number of particles for the ASDEX Upgrade test case (left) and the ITER test case (right). The arrows indicate the closest point to the 90% threshold. The green bar shows how many particles are used in the original test case.

Fig. 16 Amdahl’s law: speedup vs. number of parallel processes for different parallel fractions.

2.7.2. MPI scaling of the ITER test case

The MPI scaling of the ITER test case was measured with the particle number fixed to 73k. The speedup is shown in Fig. 17. The red curve shows the speedup achieved by the proposed APCAS strategy. The particle loop accounts for 89% of the sequential execution time, so the magenta curve shows the corresponding theoretical estimate for the speedup. This estimate does not include the overhead of the MPI communication, which decreases the parallel fraction to around 83% (black curve). Considering this overhead, the estimated speedup is very close to the achieved speedup. The MPI overhead should be decreased to improve the speedup. The strata loop accounts for 95% of the execution time; the green curve in Fig. 17 shows the expected speedup if it were possible to parallelize the strata loop ideally.

Fig. 17 Speedup of the ITER test case. The original parallelization strategy (blue) is compared to the APCAS strategy (red). Amdahl's law gives the theoretical upper limit for the achievable speedup.

2.8. MPI overhead for the ITER test case

To improve the speedup for the ITER test case, the MPI overhead should be decreased or masked with computation. MPI communication takes place at three stages of the computation:
1. Initialization (subroutine EIRENE_BROADCAST)
2. Sum inside a stratum (subroutine EIRENE_CALSTR)
3. Sum over all strata

For the ITER test case (using the proposed APCAS workload distribution), most of the MPI time is spent in EIRENE_BROADCAST and EIRENE_CALSTR. This time was measured for a simulation using 16 MPI processes. Fig. 18 shows the time spent in the subroutines EIRENE_BROADCAST and EIRENE_CALSTR on the different MPI processes. On average, 1.6 seconds are spent in MPI calls out of the 5.5 seconds of total execution time (Fig. 18, left side). Half of this time is spent in broadcasting the initial data. This step is supposedly only performed at the beginning, so its cost should be amortized away if a larger number of time steps is used.

2.8.1. Summing inside a stratum

Summing the results inside a stratum takes the other half of the communication time. This communication is implemented in the EIRENE_CALSTR subroutine, which performs an MPI_REDUCE on around 50 variables. There is room to improve the execution time of EIRENE_CALSTR. As a first step, the unnecessary temporary buffers were removed and the MPI_IN_PLACE option was used instead. The communication of 2D arrays was implemented in a very inefficient way: the rows of the array (which in Fortran are non-contiguous in memory) were copied individually into 1D buffers before calling the MPI subroutines, which caused a large overhead. This was replaced with a single MPI call for the whole array. The execution time of the communication decreased significantly, as can be seen by comparing the red bars on the left and right sides of Fig. 18.
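The following minimal sketch illustrates the improved communication pattern, assuming a 2D result array that has to be summed onto the stratum leader (rank 0); the routine and array names are hypothetical and do not correspond to the actual EIRENE_CALSTR interface.

    subroutine reduce_estimator(estim, nrows, ncols, comm)
      use mpi
      implicit none
      integer, intent(in)    :: nrows, ncols, comm
      real(8), intent(inout) :: estim(nrows, ncols)
      real(8) :: rdum(1)
      integer :: rank, ierr

      call MPI_Comm_rank(comm, rank, ierr)
      if (rank == 0) then
         ! One in-place reduction of the whole (contiguous) 2D array replaces
         ! the previous row-by-row copies into 1D buffers.
         call MPI_Reduce(MPI_IN_PLACE, estim, nrows*ncols, &
                         MPI_DOUBLE_PRECISION, MPI_SUM, 0, comm, ierr)
      else
         ! The receive buffer is not significant on non-root ranks.
         call MPI_Reduce(estim, rdum, nrows*ncols, &
                         MPI_DOUBLE_PRECISION, MPI_SUM, 0, comm, ierr)
      end if
    end subroutine reduce_estimator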


Fig. 18 Time spent in EIRENE_BROADCAST and EIRENE_CALSTR for a simulation with 16 MPI processes. Left side: original code. Right side: improved EIRENE_CALSTR.

There are around 50 calls to MPI_REDUCE in the EIRENE_CALSTR subroutine, and most of them send only a small buffer. If we consider how the execution time of MPI_REDUCE depends on the buffer size (Fig. 19), we see that up to 1 kB the execution time is nearly independent of the buffer size. This implies that it would be faster to pack the smaller buffers into a single temporary buffer and perform the communication at once, even if we account for the time needed to copy the data into the temporary buffer. Additionally, one could try to use non-blocking reduction operations. These options will be tested in the future.
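A minimal sketch of the packing idea is shown below; the variable names (flux_a, flux_b, rates) are hypothetical stand-ins for the roughly 50 small quantities reduced in EIRENE_CALSTR.

    subroutine reduce_packed(flux_a, flux_b, rates, nrates, comm)
      use mpi
      implicit none
      integer, intent(in)    :: nrates, comm
      real(8), intent(inout) :: flux_a, flux_b, rates(nrates)
      real(8), allocatable   :: buf(:)
      real(8) :: rdum(1)
      integer :: rank, n, ierr

      n = 2 + nrates
      allocate(buf(n))
      ! Packing into one contiguous buffer is cheap compared with the latency
      ! of many separate small reductions (cf. Fig. 19).
      buf(1) = flux_a
      buf(2) = flux_b
      buf(3:n) = rates

      call MPI_Comm_rank(comm, rank, ierr)
      if (rank == 0) then
         call MPI_Reduce(MPI_IN_PLACE, buf, n, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, 0, comm, ierr)
         ! Unpack the summed values on the root only.
         flux_a = buf(1)
         flux_b = buf(2)
         rates  = buf(3:n)
      else
         call MPI_Reduce(buf, rdum, n, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, 0, comm, ierr)
      end if
      deallocate(buf)
    end subroutine reduce_packed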

Fig. 19 Execution time of MPI_REDUCE as a function of the array size that is reduced (measured with the Intel MPI Benchmark (Intel, 2013), executed on a Sandy Bridge node using 16 MPI processes).

2.9. Other parallelization strategies

In the previous sections, we have proposed a simple workload distribution where all MPI processes calculate all strata (APCAS). This way only the work of the particle loop is distributed among the MPI processes. There is some extra work outside the particle loop that is performed in the strata loop (Fig. 9). The more advanced schemes currently implemented in EIRENE (Fig. 10) aim to distribute this work in parallel as well. The problem with these workload distribution strategies is that they do not scale due to load imbalance (as we have seen in Fig. 12), unless the input parameters for the strata are set up specifically in a way that allows balanced execution.

There are two main factors that lead to load imbalance with the originally implemented parallelization strategies:
• The amount of work is different in each stratum (Fig. 11).
• The particles in a stratum are evenly distributed among the MPI processes that calculate that stratum.

The first point could be alleviated by changing the particle number in the strata so that each stratum has roughly equal work. There is already a flag (NFILEK) in the code which allows rescaling of the particle number. The problem with this strategy is that we can end up with either too few particles in certain strata (inaccuracy) or too many particles in other strata (waste of resources).

The second point means that, in the original strategy, if we assign N MPI processes to a stratum, each process calculates 1/N of the total work. This is because with earlier MPI versions it was complicated to implement anything else. After all particles are processed in the stratum, the results have to be communicated to one of the processes (the stratum leader). The original implementation uses blocking collective communication, which blocks the execution until all N tasks finish the calculation in the stratum. In such a setting it was only effective to distribute the work evenly among the MPI tasks that share the stratum. The non-blocking collective communication routines introduced with the MPI 3.0 standard offer an easy way to overcome this restriction: since the MPI tasks no longer have to wait for each other after finishing a stratum, we can let them calculate different numbers of particles. This freedom can be used to assign work to each MPI process in a way that minimizes communication. A sketch of such a non-blocking per-stratum sum is shown below.

To distribute the work properly, we need to know how much computational time is needed for each stratum. It is difficult to estimate the workload of a stratum in advance, because it depends on the physical parameters in the stratum. There are two ways to solve this problem. During a SOLPS simulation we calculate several time steps, and the workload of each stratum should stay similar to that of the previous time step. Therefore, one could use the APCAS method for the first time step, measure the workload of each stratum, and use this information in the next time step to optimize the workload distribution. Alternatively, we could assign the particles dynamically in smaller chunks to the MPI processes, to avoid such an initial measurement. This could be achieved with a simple client-server model.
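The following self-contained sketch illustrates the non-blocking per-stratum sum mentioned above; the buffer and its contents are placeholders, not the actual EIRENE data.

    program nonblocking_stratum_sum
      use mpi
      implicit none
      integer, parameter :: n = 1000
      real(8) :: results(n), summed(n)
      integer :: rank, request, ierr

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      results = real(rank, 8)     ! stand-in for the per-stratum tallies

      ! Start the reduction of this stratum without blocking on the other tasks.
      call MPI_Ireduce(results, summed, n, MPI_DOUBLE_PRECISION, MPI_SUM, &
                       0, MPI_COMM_WORLD, request, ierr)

      ! ... the process could already start following particles of its next
      !     work package here ...

      call MPI_Wait(request, MPI_STATUS_IGNORE, ierr)
      call MPI_Finalize(ierr)
    end program nonblocking_stratum_sum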

2.9.1. Client-server workload distribution

There were some bugs in the implementation of the APCAS scheme, and while these were being corrected, the feasibility of a client-server workload distribution was tested. The basic idea is illustrated in Fig. 20. The total work is divided into smaller work packages. The worker processes request a work package from the server. A work package describes which stratum to calculate, how many particles to simulate, and whether there are any pre- or post-processing steps to be done. After the work is finished, a new request is sent to the server to get additional work. The server can distribute the work packages in a way that minimizes collective communication, and in this way load balance can be achieved.


Fig. 20 Client-server workload distribution. The worker processes request work from the server, which sends a package describing the work to be done.
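The following stand-alone sketch illustrates the request/reply pattern of Fig. 20 using plain MPI point-to-point messages; the package layout, tags and chunk size are hypothetical, and the actual proof-of-principle implementation in EIRENE is organized differently.

    program client_server_sketch
      use mpi
      implicit none
      integer, parameter :: TAG_REQ = 1, TAG_WORK = 2, nstrata = 6
      integer :: rank, nprocs, ierr, src, nstopped, next_stratum
      integer :: package(2), dummy(1)        ! package = (stratum, particles)
      integer :: status(MPI_STATUS_SIZE)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      dummy = 0

      if (rank == 0) then                    ! ---- server ----
         next_stratum = 1
         nstopped = 0
         do while (nstopped < nprocs - 1)
            call MPI_Recv(dummy, 1, MPI_INTEGER, MPI_ANY_SOURCE, TAG_REQ, &
                          MPI_COMM_WORLD, status, ierr)
            src = status(MPI_SOURCE)
            if (next_stratum <= nstrata) then
               package = (/ next_stratum, 10000 /)   ! next work package
               next_stratum = next_stratum + 1
            else
               package = (/ 0, 0 /)                  ! no more work: stop worker
               nstopped = nstopped + 1
            end if
            call MPI_Send(package, 2, MPI_INTEGER, src, TAG_WORK, &
                          MPI_COMM_WORLD, ierr)
         end do
      else                                   ! ---- worker ----
         do
            call MPI_Send(dummy, 1, MPI_INTEGER, 0, TAG_REQ, &
                          MPI_COMM_WORLD, ierr)
            call MPI_Recv(package, 2, MPI_INTEGER, 0, TAG_WORK, &
                          MPI_COMM_WORLD, status, ierr)
            if (package(1) == 0) exit
            ! ... follow package(2) particles of stratum package(1) ...
         end do
      end if
      call MPI_Finalize(ierr)
    end program client_server_sketch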

The implementation of this client-server workload distribution is actually not very complicated; a proof-of-principle version has already been implemented for EIRENE. It does not yet give correct physical results, but the scaling properties could be analyzed. Fig. 21 shows that this method gives superior scaling compared to the APCAS method, because the amount of collective communication is decreased. The additional speedup is encouraging, but the client-server model has one major drawback: the simple loop structure of EIRENE (shown in Fig. 9) is no longer visible; it is replaced by requests to the server. With only one MPI task, the server emulates the same loop as the original code executes, but in parallel mode the execution order is not trivial. This makes code maintenance more difficult. For this reason, it was decided not to pursue this line of optimization further. Instead, we will focus on the other optimization strategy: we will measure the total CPU time used for each stratum, and optimize the workload distribution for the next time step.

Fig. 21 Scaling of the client-server workload distribution for the ITER test case

2.10. Correctness of the results

If there is only one stratum, then the original MPI implementation should deliver exactly the same results as the APCAS scheme. Therefore, a simplified test case was prepared that is based on the AUG test case, but includes only one stratum and a lower particle number. The results from the original code and the modified code agree with each other, as expected. Unfortunately, the results of the parallel version of EIRENE do not agree with the serial results, regardless of which parallelization strategy is used.

Comparing the results of two simulations with different numbers of MPI tasks is non-trivial. The result of the Monte Carlo simulation depends on the random numbers that are used. If we initialize the random number generator differently, the result for the total pumped flux changes by 2% for the AUG test case. Such a change could mask possible MPI-related errors; therefore, if we want to test the correctness of a parallel simulation, we have to ensure that the same random number sequence is used in both the sequential and the parallel calculation. To guarantee reproducible results, we initialize the random number generator for each particle separately, based on a global particle index. This way the results should agree even if the calculation is distributed over many processors. We expect differences related to reduction operations, but these should be close to machine precision. Contrary to this expectation, we see large errors in the pumped flux, as shown in Table 4, and therefore we conclude that the MPI implementation of EIRENE is not completely correct. Other users of EIRENE have also reported wrong results with the parallel version of the code.

SURFACES AT WHICH TEST PARTICLE FLUXES ARE REDUCED

  NO.   SPECIES   PUMPED FLUX (1 MPI task)   PUMPED FLUX (2 MPI tasks)
  144   D         1.2440E-01                 1.4643E-01
  144   D2        4.4201E+01                 4.4350E+01
   -1   D         7.3568E+00                 8.1293E+00
   -7   D2        6.5334E+00                 6.1371E+00

PUMPED FLUX (ATOMIC) PER SPECIES
        D         7.4812E+00                 8.2757E+00
        D2        1.0147E+02                 1.0097E+02

TOTAL PUMPED FLUX (ATOMIC)
        PUMPTOT   1.0895E+02                 1.0925E+02

Table 4 Output of EIRENE in serial (1 MPI task) and parallel (2 MPI tasks) mode. The pumped flux is a very important output parameter of EIRENE; the differences between the two columns indicate an error in the parallel code.
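A minimal sketch of the per-particle seeding strategy described above is given below, using the Fortran intrinsic random number generator; the seeding recipe is purely illustrative, since EIRENE uses its own generator.

    subroutine init_particle_rng(global_particle_index)
      implicit none
      integer, intent(in) :: global_particle_index
      integer :: nseed, i
      integer, allocatable :: seed(:)

      call random_seed(size=nseed)
      allocate(seed(nseed))
      ! Derive a deterministic seed vector from the global particle index only,
      ! so the random sequence is independent of which MPI task follows the
      ! particle and of how many tasks are used.
      do i = 1, nseed
         seed(i) = 104729 * global_particle_index + 7919 * i
      end do
      call random_seed(put=seed)
      deallocate(seed)
    end subroutine init_particle_rng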

2.11. Plans for correctness checks

Finding potential errors in the parallel code is not trivial, because the code is parallelized on a global level. The problem is illustrated in Fig. 22. The blue boxes represent the subroutines where most of the execution time is spent. The yellow ellipses show the different modules where the input parameters and the results of the calculation are stored as module variables. The total number of variables is denoted by the number below the module name. A red arrow connecting a blue and a yellow node means that the subroutine uses variables from the module. In effect, several hundred global variables are used during the calculation. In parallel mode this data has to be shared properly among the MPI tasks. The most probable reason for the slightly incorrect parallel results is that some variables are not communicated between the tasks.

To find the errors, we will use the following strategy. Subroutines will be generated that save all the global variables from memory into a file. We can then take snapshots of the global program state at different times during execution, and pinpoint which variables differ between the parallel and serial code versions. The unit test generator that was created for B2 is the starting point for this work. The parser needs to be updated to handle derived types and private variables, and the code generator will be modified to generate read and write subroutines for each module. A proof-of-principle framework is already in place and works for simpler cases. Fig. 22 was created using data from the new test program generator. In 2016 the testing framework will be finished and used to find the MPI errors in EIRENE.
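As an illustration of the planned snapshot mechanism, the sketch below shows the kind of subroutine the generator would emit for one module; the module name and its variables are hypothetical placeholders, and the generated code for EIRENE will additionally have to handle derived types and private variables.

    module eirmod_example          ! hypothetical stand-in for an EIRENE data module
      implicit none
      real(8) :: flux_total
      integer :: nstrata
      real(8) :: tally(100, 50)
    end module eirmod_example

    subroutine dump_eirmod_example(filename)
      ! Write all variables of the module unformatted into a snapshot file,
      ! so that serial and parallel runs can be compared variable by variable.
      use eirmod_example
      implicit none
      character(len=*), intent(in) :: filename
      integer :: iu

      open(newunit=iu, file=filename, form='unformatted', action='write')
      write(iu) flux_total
      write(iu) nstrata
      write(iu) tally
      close(iu)
    end subroutine dump_eirmod_example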


Fig. 22 Dependency on module variables. The blue boxes represent subroutines, and the blue arrows between them denote subroutine calls. The yellow ellipses represent modules, and the number below the name is the number of variables defined in that module. A red arrow between a subroutine and a module denotes that the subroutine uses variables from the module; the width of the arrow is proportional to the number of variables used. Some subroutines have variables with the SAVE attribute; the number of such variables is denoted below the subroutine name in the blue boxes.

2.12. Summary

The PARSOLPS project and the first half of the SOLPSOPT project focused on improving the OpenMP parallelization of the B2 code. The approach was to incrementally increase the fraction of parallel regions in the code by parallelizing individual subroutines. The difficulty is that there are many subroutines which each take only a small share of the CPU time. More than 25 subroutines have been parallelized, and a parallel fraction of 90% was reached for the whole code. With these changes, a factor of six speedup could be achieved for the ITER test case when executed on a single compute node. The execution time of most of the subroutines scales well as we increase the number of threads. The OpenMP regions were introduced in a way that avoids large overheads. It was identified that a large part of the code is memory bandwidth limited. Several of the subroutines were optimized to reach a speedup close to the bandwidth limit. These optimizations also speed up the sequential version of the code, which is now 20% faster than originally. The individual subroutines have been tested separately with unit tests to ensure the correctness of the modified B2 code. Using the benchmarks from the SVN repository, the whole code was tested for both shorter and longer simulations, and the results agree with the original code.

The work continued by investigating the MPI version of the EIRENE code. Unfortunately, the version in the SOLPS 5.0 code package was not functioning, so it was decided to switch to the latest version of EIRENE from SOLPS-ITER. The parallel performance of the stand-alone version of EIRENE (from the SOLPS-ITER repository) was investigated for an ASDEX Upgrade and for an ITER test case. We have found that the currently implemented parallelization strategies do not scale for these test cases due to load imbalance. A simple and balanced parallelization strategy was implemented instead. Tests with different particle numbers show that it provides reasonable speedup for the two test cases. Possibilities for improving the parallelization further were also considered, and will be implemented next year. During the tests it was found that the parallel version of EIRENE gives incorrect results, even with the original code version. A testing framework is under construction to find the errors. This framework is based on the unit test framework created for the B2 code. The project will continue next year for three more months to fix this problem.

2.13. Bibliography Fehér, T. (2015). Final report on HLST project PARSOLPS.

Intel. (2013). MPI Benchmark. Retrieved 2015, from https://software.intel.com/en-us/articles/intel-mpi-benchmarks/

Reid, J. L. (2010). Porting and parallelising SOLPS on HECTOR. EUFORIA Report.

Reiter, D. (2009). The EIRENE Code User Manual.

Reiter, D. (2015). Private communication.

RWTH Aachen University. (2015). MUST MPI Runtime Error Detection Tool. Retrieved from https://doc.itc.rwth-aachen.de/display/CCP/Project+MUST

3. Final report on HLST project COCHLEA

3.1. Introduction

COCHLEA (COmputational CHain-like LEApfrog code) is a numerical code which calculates the magnetic and electric fields in a complex waveguide with cylindrical symmetry and an arbitrary excitation. It has been written by the principal investigator (PI) Dimitris Peponis, currently a PhD student at the National and Kapodistrian University of Athens (UoA), under the supervision of Prof. Ioannis Tigelis. COCHLEA has been written in the procedural subset of C++11, which is appropriate for such purposes. The underlying main numerical algorithm is the Finite-Difference Time Domain (FDTD) method, which is known to be well suited to parallelism. The HLST already had experience working with FDTD-based codes; see [R14]. The purpose of this project has been to carry out a parallelization feasibility study and tutoring, in close collaboration with the PI. Given the limited time assigned to the project (3 ppm in the last quarter of 2015) and the code size (≈ 20K lines of code), we have chosen to visit the PI at his institution for a week. Section 3.2 describes our changes to the code; in Sec. 3.3 we then describe our activity during the visit. We continue in Sec. 3.4 by analyzing the current scalability and performance properties, and close the report by outlining ideas for further optimization and commenting on how a sequel project could build upon this one.

3.2. Course of the Project

The first steps consisted in guiding the PI in developing a simple metric for sampling performance without having to run the entire code, and in skipping file I/O. A common-access revision control repository residing on a server maintained by the PI has been used to commit and track code changes. We have identified the following major code changes as necessary:
• Exchange the loop order for efficient array access
• Avoid in-loop conditional branches, e.g. by breaking and rearranging them
• Change globally some of the notation and declarations, allowing a replacement of the original arrays-of-arrays data structure with another one, e.g. avoiding indirection
• Provide a ‘multidimensional array’-like class as an alternative
• Introduce scoped variables and thus easily OpenMP in the main (3D) FDTD loops

The amount of code impacted by such changes was considerable, and their acceptance could have been problematic. To avoid this, we have worked on a separate branch of the code, and focused our loop restructuring efforts on one representative function only. Having observed a marked performance improvement, we have set up a plan to deploy these changes together with the PI on the scale of the entire code. During this ‘separate’ investigation and planning, the PI had the chance to become accustomed to certain features of C++ which we identified as being useful for the proposed solution. Our custom 2D and 3D ‘multidimensional array’ data structures are a series of template classes which can be accessed with the syntax of arrays-of-arrays, but use contiguous memory instead and allow the compiler to optimize for direct addressing. The old syntax could be preserved, but we have nevertheless agreed to use a special notation which allows easier switching between the different addressing operators.

These interventions are reversible and fully respect the spirit of working non-invasively. The next section comments on the deployment of these changes with the PI; the section after that on the performance gain.

3.3. Visit to the Principal Investigator at the UoA

To make best use of the short project time, it was chosen to include a one-week ‘consolidation’ visit to the project principal investigator (PI) just before the end of the project time, in a similar fashion to previous projects. The session began with a detailed explanation of the changes we proposed, with the help of a short presentation. Examples of running the HLST tentative version with partial OpenMP parallelism were given, and the consequences of the main changes were discussed. A fork of COCHLEA has then been initiated using the latest trunk version as a basis, with the intention of applying changes to the entire code gradually. Such an accelerated approach has been possible because:
• We could check each change for result divergence and detect problems early
• The necessary changesets could be obtained from the ‘tentative’ branch of the HLST SVN repository. We could use a shortcut approach by reimplementing only the strictly necessary ones, e.g. avoiding unnecessary preprocessor-based conditionals.
• Only one extra (header) file with HLST-specific code had to be introduced, so that it is well separated from the original COCHLEA code.

Thanks to this approach, the underlying data structure in COCHLEA could be modified transparently to the rest of the code, which syntactically remains the same. During this assisted deployment work, the most delicate changes, like introducing our ‘array type agnostic’ declarations, have been performed by HLST. After initial loop interventions performed together, in the last days of our visit the PI could parallelize further routines by himself. Except for the FDTD loops, the entire code flow has been left intact. We consider the work style described above, first planning and experimentation, then pair programming, to foster acceptance of the code, and therefore to be a good practice.

3.4. Performance and scalability of COCHLEA + OpenMP

To quantify the performance improvement introduced by our changes appropriately, we need a metric that is applicable to cases of different sizes and to different thread counts. Let this metric be the number of grid cells processed by COCHLEA per second of runtime (cps). The number of arithmetic operations per grid cell is roughly uniform across cases, therefore the cps rate should ideally stay uniform across cases. Experimentally we observe that the original code performance (Fig. 23, in red) can degrade by ≈ 75% from the smallest (a few MB in total) to the largest case (≈ 45 GB). The reason for this is the sequence of memory accesses in the FDTD loops: rearranging them for locality (Fig. 23, in blue) recovers most of that performance loss. Additionally switching on our custom array classes with direct addressing (Fig. 23, in magenta) improves performance by a few more percentage points.


Fig. 23 Serial performance from tiny cases to the largest ones for: the original version (in red); the HLST version (in magenta); the original with the array data structure (in green); the original with the restructured loops (in blue).

It is interesting to note that: 1) introducing the custom array classes alone (green) yields a smaller improvement than restructuring the loops only; 2) the combined improvement (magenta) is weaker than the product of the two; 3) at ≈ 3.1e+06 cells the speedup of the HLST version over the original exceeds ≈ 2x and for larger cases continues to increase up to ≈ 4x. We are also interested in the OpenMP scaling properties of our HLST version with 2, 4, 8 and 16 threads on a HELIOS-like node (see Fig. 24). We observe that 16 threads provide a speedup of ≈ 10x, mostly independently of the case size.

Fig. 24 Speedup of the HLST version of COCHLEA for 16 OpenMP threads (Intel Sandy Bridge).

Combining the 2–4x and ≈ 10x speedups gives an overall factor of 20x to 40x that can now be used to run existing cases faster. For instance, after the HLST intervention, COCHLEA is expected to run a simulation of the last part of the ITER gyrotron beam tunnel (14 mm x 12 mm) using a 159 x 1000 x 136 grid and complete it in ≈ 6 days on HELIOS. Previously it would have required a few months. Our OpenMP parallelization is currently based on the outermost dimension (nr), on which 2D (nf x nz) slices are the unit of serial work. The cases considered here have nr stretching from 8 to 143: these are the current limits to thread parallelism. A parallelization building on the current one will have to consider the inner loops as well, e.g. by using the collapse OpenMP clause.

3.5. Conclusions and outlook

The serial speedup from our changes on relevant cases is ≈ 2–4x, and combined with a scalability of ≈ 10x on a HELIOS node with 16 threads, this makes for a total speedup of ≈ 20–40x. Considering the short project time, we had to adapt our working style to achieve a full OpenMP parallelization: first planning the key changes and pilot testing, then a full deployment together with the PI. Our main changes to the code have been the restructuring of the FDTD loops and the global introduction of ‘array classes’. Further easy optimizations, such as grouping of common sub-expressions and avoidance of divisions, have been identified and left to the PI. Further optimizations and special cases (e.g. a smooth cylindrical waveguide) can be implemented using e.g. template programming techniques. Also, a ‘double buffering’ technique could be applied to save memory copy operations for a few arrays. The aforementioned ‘custom data type’ approach can be developed further in several directions. When considering an MPI parallelization, one could think of encapsulating the handling of ghost cells (representation, indexing and exchange) by adapting these classes suitably. The PI's future development plans for COCHLEA include a more complex geometry and 4D and 5D arrays, both of which might benefit from further specialized data structures.

3.6. References [R14] Final Report on HLST Project REFMUL2P, part of the HLST core team report 2014 [F14] Final Report on HLST Project FWTOR, part of the HLST core team report 2014

4. Final report on HLST project FWTOR-15

4.1. Introduction

FWTOR is a Fortran 95 full-wave code which solves Maxwell's equations for the propagation of electromagnetic wave beams in tokamak plasmas using the Finite-Difference Time Domain (FDTD) method. Without distributed memory parallelism FWTOR cannot handle an ITER-sized test case. The goal of the present project has been to overcome this obstacle by producing a parallel version of FWTOR. This allows exploiting the much larger memory and compute power of a machine like HELIOS, so that more physically relevant cases can be simulated. This project follows a preliminary study carried out in the last quarter of 2014, which focused on assessing parallelization and optimization possibilities. While retaining the original code structure, a tentative shared memory parallelization with OpenMP and certain optimizations were obtained back then [F14].

First we worked on a plan to obtain a distributed memory version of FWTOR, involving both data distribution and communication. Distributing data among tasks is unavoidable to enable grid sizes that do not fit in the memory of a single node and to speed up code execution. As a side effect of such a distributed grid, cells at the subdomain borders have to be replicated (ghost cells) and kept up to date by means of inter-node communication (ghost exchange). In general FWTOR has proven apt to distributed memory parallelization: no algorithmic changes were required. Nevertheless, the meaning of several variables (mainly arrays with their indices and bounds) had to be changed: certain variables continue to describe the global domain while others pertain to the local domain (and arrays) only. We were interested in a least-invasive intervention to the original code. Most of the current notation and variables (index identifiers, array bounds, loop labels) were thus reused. The code structure and readability have been retained by avoiding breaking too many loops or introducing too many conditionals. Source code changes due to communication were limited: this shall keep the code maintainable. The shared memory (OpenMP) parallel constructs introduced at the end of 2014 did not conflict with the changes we have implemented for MPI; they were left mostly unaffected.

An integral part of our work consisted in a visit to the project coordinator Christos Tsironis at the National Technical University of Athens, Greece (NTUA) shortly before the end of the project. At that time, we have dealt with the following:
• Integration / acceptance of the proposed changes to FWTOR.
• Training in minimal maintenance / debugging of the now-parallel code.
• Providing guidelines for implementing new physics without breaking the parallelism.

The success of the project will surely depend on the technical qualities of the MPI-enabled FWTOR, but ultimately on the acceptance it will gain with the project coordinator over time. Therefore we provide some details about that visit in Section 4.4. Section 4.2 describes the bulk of the transition work; Section 4.3 comments on experimental results and Section 4.5 draws conclusions and recommendations for the future.

4.2. Transitioning to MPI + OpenMP (hybrid) parallelism

As a first step, we identified the set of variables, indices and bounds to be impacted. The required communication primitives were also identified, and the code was annotated with information on the necessary ghost updates and the impacted arrays. With this information we have continued working along two distinct and independent lines. The first line introduced a number of variables with strictly local meaning. These variables are mainly domain/array dimensions. When running the code in a non-MPI configuration their meaning is global, so they do not differ from their original counterparts. When running in an MPI configuration, their meaning is local, so that data and the related computations pertain to the local subdomain only.

Fig. 25 Top: The bounds of the arrays are delimited by different colors. Our parallel version employs the same bounds and index space. Bottom: A 6x11 grid distributed over six processes laid out as 2x3 (delimited in gray, each with a different color). Ghost cells are marked by a process's color appearing on other processes. Ghost exchanges are marked with arrows in the sender's color.

Independently, an ‘HLST MPI FWTOR’ Fortran module (HMF) was developed to host the required communication primitives. These are either ghost cell exchanges or boundary ghost exchanges on the staggered grid arrays of FWTOR. Fig. 25 exemplifies the staggered arrays, the domain grid, its partitioning, and a ghost exchange primitive. The HMF module can be tested independently of FWTOR by using a stand-alone program to mimic a representative subset of FWTOR's inter-process communication. This shall keep it testable and ease its replacement or upgrade.

After having introduced the bulk of the new identifiers and having developed a version of HMF, the two had to be integrated by means of FWTOR calling the HMF subroutines and using its data. The validation tool written by the FWTOR author has therefore been employed to detect numerical discrepancies between results computed with different code/parallelization setups. This has eased the integration phase, as we could detect problems early on. The comparisons became possible only after we adapted the MPI version to output its results in the same format as the original code. Our solution gathers the distributed array slices on the root node (one row at a time) and then writes them with serial ASCII output to a single file. By retaining output file format compatibility, this solution meets our current needs in development, testing, and debugging of small cases. But for the larger runs that are now possible, this gather-based solution is too slow to be acceptable. Consequently, a parallel I/O solution will have to be developed in the future. This might be based on ADIOS by means of our HLST-ADIOS-Checkpoint [HAC] module.

In order to keep the boundary condition code readable while working with the parallel version, extra conditionals have been introduced. Their performance impact is negligible compared to communication, as the next section shows.
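As an illustration of the kind of primitive hosted in HMF, the sketch below exchanges one ghost layer in one direction of a 2D Cartesian decomposition; the routine name, the array layout and the use of MPI_Cart_shift are assumptions and do not reproduce the actual HMF interface.

    subroutine exchange_ghosts_y(field, nx, ny, comm_cart)
      use mpi
      implicit none
      integer, intent(in)    :: nx, ny, comm_cart
      real(8), intent(inout) :: field(0:nx+1, 0:ny+1)   ! local block plus ghosts
      integer :: down, up, ierr
      integer :: status(MPI_STATUS_SIZE)

      ! Neighbours in the second Cartesian direction; MPI_PROC_NULL is returned
      ! at the physical boundary, so the same call also works there.
      call MPI_Cart_shift(comm_cart, 1, 1, down, up, ierr)

      ! Send the uppermost interior row to the upper neighbour and receive
      ! its lowermost row into our lower ghost row.
      call MPI_Sendrecv(field(1:nx, ny), nx, MPI_DOUBLE_PRECISION, up,   0, &
                        field(1:nx, 0),  nx, MPI_DOUBLE_PRECISION, down, 0, &
                        comm_cart, status, ierr)

      ! And the opposite direction.
      call MPI_Sendrecv(field(1:nx, 1),    nx, MPI_DOUBLE_PRECISION, down, 1, &
                        field(1:nx, ny+1), nx, MPI_DOUBLE_PRECISION, up,   1, &
                        comm_cart, status, ierr)
    end subroutine exchange_ghosts_y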

4.3. MPI-FWTOR: strong scaling results

We have performed strong scaling tests of the now hybrid parallel (MPI + OpenMP) version of FWTOR on the HELIOS computer. That is, we use fixed cases and increase the MPI parallelism in running them. We have chosen three relevant cases, differing in grid size and number of time steps to termination. Our base case is the largest one fitting on a single HELIOS node, with 7128 x 7128 cells and needing 611k steps to completion. The two other cases (‘half’ and ‘quarter’) use respectively a quarter and a sixteenth of the cells and complete in respectively half and a quarter of the time steps. As we will now comment, the three cases scale differently, due to the interaction of different technological factors (see the performance plots in Fig. 26).

Originally, FWTOR would complete the large case in more than one year; with OpenMP it would still require over one month; now with MPI and OpenMP a few hours suffice. Looking at the largest parallel run (512 nodes, one MPI process per node), the speedup from MPI alone is over 200x; together with OpenMP it is over 2500x (with circa 12x being the OpenMP speedup); MPI-wise this is close to 50% parallel efficiency. The quarter case scales with less than 20% parallel efficiency, and the half case lies in between. Up to 32 nodes, the parallel efficiency remains over 50% in all cases; afterwards it decreases further, more markedly for the smallest case. The reason for this seems to be the increasing fraction of each time step being spent in MPI communication. In an experiment conducted with all communication switched off, the execution at 32 nodes was twice as fast as with communication, indicating that already there half of the time is spent in communication. This fraction increases as the duration of a time step computation shrinks to the range of milliseconds. In the mentioned no-communication experiment we have even observed super-linear scaling, likely because of the increasing reuse of data in the cache memory; indeed, at maximum parallelism, each node running the quarter case can host all of its data in the last level cache. But in a normal run, the message passing overhead ends up shadowing the super-linear scaling effect. Inspecting the timings of the boundary condition routine, it is evident that it is only relevant when communication occurs; in other words, the conditional statements we have included to make the code parallel do not pose any significant overhead at this stage.


Fig. 26 Top: Speedup: serial computation time per time step divided by parallel computational time; Middle: Projected total runtime of the three suggested cases. Notice how cases requiring over a year of serial computation now complete in a few hours; Bottom: Duration of the computation of a time step. The shorter this is, the larger the fraction of time spent in MPI communication, mostly because of the latency.

4.4. Visit to the principal investigator

In order to ease the handover of the now MPI-parallel FWTOR code, it was decided to include a one-week visit to the project principal investigator (PI) just before the end of the project time, in the first week of September 2015. The session began with a detailed explanation of the parallelization concepts to the PI, supported by a slide presentation. The MPI programming details were kept out of the discussion; emphasis was put on the new ‘MPI process local’ variables in the code, which are the most relevant to the PI. After the discussion, a parallel fork of FWTOR has been re-programmed by using the OpenMP-enabled version as a basis and applying the changes gradually. This approach has been possible only because:
• A working MPI version was already present, having been developed at HLST over the course of 2015
• The changes necessary to obtain that version could be obtained from the SVN repository (e.g. changes concerning the choice of ‘local’ variables, or locations of code where communication was needed)
• We could use a ‘shortcut’ approach by reimplementing only the strictly necessary changes, avoiding much of the preprocessor-based conditionals
• Most of the MPI code resides in a separate HLST-MPI-FWTOR module (HMF) intended to hide MPI details from FWTOR, which has been included ‘as is’

Thanks to this approach, the numerical schemes of FWTOR can be modified to a good degree without having to deal with MPI. However, the domain decomposition poses certain limitations; guidelines have been provided on how to cope with them. During this ‘reimplementation’ work the fundamental changes have been ‘reintroduced’ by the PI under our guidance, accompanied by a discussion of their implications. This method of work has been chosen to foster acceptance of the code and we regard it as successful.

4.5. Conclusions: state of MPI-FWTOR and further activities

FWTOR is now able to run cases that were previously precluded, primarily by run time requirements. The entire computational part of FWTOR is now parallel, via both OpenMP and MPI. The readability and the overall structure of the code are still fairly intact. Parallelism now allows running cases which are large enough to be interesting, yet treatable with the current serialized I/O. The very large cases, destined for future tokamak experiments, may expose the I/O as inefficient (e.g. several minutes) and should be handled with a parallel I/O technique. Due to time constraints we had to postpone this.

We briefly list ideas on how the scaling properties of the code could be improved:
• Piggy-backing multiple array exchanges into one. At each time step, the ghost cells of twelve arrays are exchanged, in groups of three, for a total of twelve boundary exchanges. One may reduce latency here by aggregating the communication of the three arrays into one message.
• Finer grain exchanges. The current ghost exchange primitive is general, with communication in all directions, which is not always required. One might specialize the ghost exchange routines to prevent unnecessary updates on a per-loop basis.
• A ghost layer thicker than one. Using a thicker ghost layer (say, n>1) would allow a ghost cell exchange to occur every n steps instead of every step as now. Each of these exchanges would involve n times the data, but since latency and not volume is the problem, this would likely pay off in efficiency. This technique would also introduce a very slight computational overhead.

Using the validation tool we observed results changing with the optimization level and run size. In some cases we observe the results of the parallel version diverging unacceptably from the serial ones. We consider it important to investigate this, in order to exclude a bug in the code as its cause. Testing feedback from the project coordinator is essential here. The compilation flags we would recommend at the moment for the Intel Fortran compiler are -fast -ipo -fp-model=precise. We advise that future HLST optimization interventions should focus on parallel I/O and communication optimization. If larger runs are more important, priority should be given to the former activity; otherwise to the latter. After getting acquainted with the new code, the PI will be able to decide on this.

4.6. References All reports available from http://www.efda-hlst.eu. [F14] Final Report on HLST Project FWTOR part of the HLST core team report 2014 [HAC] Final Report on HLST Project ITM-ADIOS2 part of the HLST core team report 2014

5. Final report on HLST project BEUIFERC

In modern high performance computing, accelerators are widely used due to their high peak performance and efficient energy consumption. Therefore, the HELIOS supercomputer was recently (February 2014) upgraded with the new Intel Xeon Phi co-processor. The total number of such nodes is 180. Part of the BEUIFERC project is dedicated to analyzing the performance of the new hardware by means of different micro-benchmark tests. A problem with the Infiniband connections between distinct MIC cards was identified and resolved. This allows a three times higher memory bandwidth and a symmetric network performance to be achieved. It was also observed that the OpenMP overhead is about ten times larger on the MIC in comparison to the Intel Sandy Bridge. Finally, the execution time of a test N-body code was analyzed and compared in different computational modes (host, offload and native) on the MIC partition of HELIOS.

5.1. The Intel Xeon Phi hardware

In 2010 Intel announced a new family of processors named ‘Many Integrated Core’ (MIC) that provides a high peak performance with an efficient energy consumption. Later, in 2012, Intel gave such processors the brand name ‘Intel Xeon Phi’. The MIC is a co-processor unit, in the same fashion as the GPU systems from NVIDIA, which are already broadly used in high performance computing. The first card from the MIC family was a prototype product with the codename ‘Knights Ferry’ that was released in 2010 as a proof of concept. The second generation of MIC cards was the first commercial product; it was released in 2011 under the name ‘Knights Corner’. At the beginning of 2014 these accelerators were installed on the HELIOS supercomputer. The next generation of MIC cards, with the codename ‘Knights Landing’, should be released at the end of 2015. For simplicity, the following terms will be used interchangeably: MIC = Intel Xeon Phi = Knights Corner = co-processor = accelerator.

5.1.1. The Intel Xeon Phi vs Intel Sandy Bridge

Table 5 shows the comparison of the main hardware characteristics between the Intel Xeon Phi and Intel Sandy Bridge processors, which are installed on HELIOS. A more detailed description of the MIC co-processor can be found in Ref. [1].

Processor                 Intel Sandy Bridge     Intel Xeon Phi
Number of cores           8                      60
Memory                    32 GB                  8 GB
Frequency                 2.6 GHz                1.24 GHz
Peak performance          173 GFlops/s           1 TFlops/s
Memory bandwidth          40 GB/s                160 GB/s
Instruction execution     Out-of-order           In-order

Table 5 Hardware comparison between Intel Xeon Phi and Intel Sandy Bridge.

The memory bandwidth of the Intel Xeon Phi is four times larger than that of the Intel Sandy Bridge. One can also notice the factor of six improvement in peak performance. However, the memory per core is much lower for the Xeon Phi (about 130 MB/core) compared to the Sandy Bridge (4 GB/core). The MIC peak performance per core of 16.6 GFlops/s is also smaller than the 21.6 GFlops/s obtained for the Sandy Bridge. Moreover, the ‘in-order’ instruction execution of the Intel Xeon Phi may lead to empty pipeline stages. Consequently, two to four threads per core must be hosted on the Xeon Phi in order to get a significant fraction of its peak performance, whereas one thread per core is enough on the Sandy Bridge. This makes programming of parallel modules much more complicated for the Xeon Phi in comparison to the Sandy Bridge.

5.1.2. Intel Xeon Phi architecture

The general architecture of the Intel Xeon Phi co-processor is depicted in Fig. 27 [2], including its main components: computation units, memory and network connections. The size of the boxes representing the computing elements is proportional to their peak performance, and the size of the boxes representing memory components is proportional to the amount of available memory. The thickness of the lines corresponds to the memory bandwidth. The current generation of Xeon Phi (Knights Corner) is a co-processor; it is connected to the main CPU through the PCIe bus. The next generations of MIC cards, Knights Landing and Knights Hill, will be available as stand-alone processors, which will significantly decrease the communication latency. Two distinct MIC nodes are connected by means of Infiniband (IB) ports. At present, two configurations are available on the market, with one or two IB links. In the next section the results of the network performance tests are shown for the MIC partition of the HELIOS supercomputer, which is equipped with two IB ports.

Fig. 27 The general architecture of two MIC nodes on the HELIOS supercomputer.

5.2. MIC network performance

The PingPong test from the Intel MPI Benchmark suite [3] was used to test the MIC network performance. In this test an N-byte message is sent from process one to process two by means of the MPI_Send and MPI_Recv functions. When the message is received, process two sends the same data back to process one. The communication time is then calculated as time = Δt/2, where Δt is the measured round-trip time. The test is repeated 100 times and the values are averaged. The latency is defined as the time needed to send a one-byte message. The bandwidth is estimated from the time needed to send a message of 4.2 MB between two processes.
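The measurement principle can be sketched as follows (the numbers reported in this section were obtained with the Intel MPI Benchmark itself, not with this code):

    program pingpong_sketch
      use mpi
      implicit none
      integer, parameter :: nbytes = 4200000, nrep = 100
      character, allocatable :: buf(:)
      integer :: rank, i, ierr, status(MPI_STATUS_SIZE)
      real(8) :: t0, dt

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      allocate(buf(nbytes))
      buf = 'x'

      t0 = MPI_Wtime()
      do i = 1, nrep
         if (rank == 0) then
            call MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, ierr)
            call MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, status, ierr)
         else if (rank == 1) then
            call MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, status, ierr)
            call MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, ierr)
         end if
      end do
      ! Half the averaged round-trip time gives the one-way message time.
      dt = (MPI_Wtime() - t0) / nrep / 2.0d0

      if (rank == 0) print '(a, es12.4, a)', 'one-way time: ', dt, ' s'
      call MPI_Finalize(ierr)
    end program pingpong_sketch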

First, the intra-node network performance was tested; the results are shown in Table 6. The latency between two processes running on the same CPU (0.31 µs) is about nine times smaller than on the MIC (2.70–2.84 µs). The latency increases further for the CPU–MIC communication and reaches its maximum (6.00 µs) when the message is transferred between MIC0 and MIC1. According to Fig. 27, a message sent between two MICs must pass through one QPI and two PCIe ports, which makes such communication slow with respect to its intra-node counterparts, where at most one port is involved in the transfer. The memory bandwidth shows a similar behavior to the latency. The highest bandwidth (6531 MB/s) is reached when the two tasks run on the same CPU. The bandwidth measured inside the MIC (1928–1984 MB/s) is about three times lower than that. It degrades further for the communication between CPU and MIC and reaches its minimum (414 MB/s) for the MIC0–MIC1 transfer. One can also notice that the intra-node network performance of the MIC is inhomogeneous between the two different computing units. The latency and the memory bandwidth degrade drastically when MIC cards are involved in the communication, in comparison to the simple CPU–CPU transfer.

Latency (µs), within Host 0:
           CPU0    MIC0    MIC1
  CPU0     0.31
  MIC0     3.4     2.70
  MIC1     3.83    6.00    2.84

Bandwidth (MB/s), within Host 0:
           CPU0    MIC0    MIC1
  CPU0     6531
  MIC0     1618    1928
  MIC1     480     414     1984

Table 6 Intra node network performance of the MIC partition on the HELIOS supercomputer using the default set up. Left table represents the latency, while the right shows the memory bandwidth.

The inter-node network performance was tested next. The MIC cards on the HELIOS supercomputer are connected by means of two Infiniband ports, as shown in Fig. 27. Therefore, the latency and the memory bandwidth for the host0:mic0 to host1:mic0 and host0:mic1 to host1:mic1 tests should be very similar, i.e. they should show symmetric results.

Table 7 represents the inter-node network performance between two distinct MIC nodes. One can see that the communication between two distinct MIC cards degrades further with respect to the intra-node communication, from 2.84 µs to 6.95 µs for the MIC1–MIC1 case. This degradation is even more pronounced for the memory bandwidth, which decreases from 1984 MB/s to 272 MB/s. Moreover, the expected symmetric results were not achieved. In all tests the communication between MIC0 and MIC0 on two different nodes showed better performance than between MIC1 and MIC1 under the same conditions.

Latency (µs), inter node (rows: Host 0, columns: Host 1):
         CPU0    MIC0    MIC1
CPU0     1.24
MIC0     3.80    5.97
MIC1     4.15    6.47    6.95

Bandwidth (MB/s), inter node (rows: Host 0, columns: Host 1):
         CPU0    MIC0    MIC1
CPU0     5943
MIC0     1640     984
MIC1      416     415     272

Table 7 Inter node network performance of the MIC partition on the HELIOS supercomputer using the default set up. The upper table shows the latency, the lower one the memory bandwidth.

The inter node results on the MIC partition of HELIOS were benchmarked against the results obtained from the identical PingPong test on the robin cluster, which is equipped with a single IB port between MIC nodes [2]. Surprisingly, the results were very similar, as if only one IB port was working properly. Therefore, we decided to investigate this issue in more detail. The first candidate for testing was the Direct Access Programming Library (DAPL) provider, which defines the API functions that are used for a given communication between computational elements [4]. Intel MPI can select the provider list automatically, but this does not necessarily yield the best performance. However, the provider list can be controlled manually through the environment variable I_MPI_DAPL_PROVIDER_LIST. Different sets of provider combinations were tested and it was found that they play an important role for the memory bandwidth. Table 8 shows the inter node memory bandwidth after using the provider list which gave the best performance: ofa-v2-mlx4_0-1u, ofa-v2-mcm-1. The memory bandwidth of the communication involving different MICs increased by a factor of three in comparison to the results obtained with the default provider (Table 7). However, the network performance was still not symmetric, with 3447 MB/s memory bandwidth for the mic0–mic0 transfer and 1349 MB/s for mic1–mic1.

Bandwidth (MB/s), inter node (rows: Host 0, columns: Host 1):
         CPU0    MIC0    MIC1
CPU0     5836
MIC0     4135    3447
MIC1     1780    2235    1349

Table 8 The inter node memory bandwidth of the MIC partition on the HELIOS supercomputer using the optimized DAPL provider list.

In order to test the two Infiniband connections between MIC hosts, it was decided to perform the PingPong test between all possible pairs of CPUs, which are placed on two distinct nodes. As one can see from Table 9, now both the latency and the memory bandwidth are very similar, proving that two IB ports are working properly and provide symmetric results.

Latency (µs), inter node (rows: Host 0, columns: Host 1):
         CPU0    CPU1
CPU0     1.27    1.23
CPU1     1.28    1.25

Bandwidth (MB/s), inter node (rows: Host 0, columns: Host 1):
         CPU0    CPU1
CPU0     4987    5029
CPU1     5075    5058

Table 9 The inter node network performance of the MIC partition on the HELIOS supercomputer between all possible CPU pairs. The upper table shows the latency, the lower one the memory bandwidth.

Low level debugging was necessary to clarify the different results of Tables 7 and 9. It showed that for the communication between CPUs the system automatically chooses the fastest path from one CPU to another using the different IB ports. However, when MIC cards are involved in the communication the system constantly uses only one IB port, which is assumed to be connected to CPU0. This explains why the results from the inter node PingPong test were not symmetric with respect to the memory bandwidth, which was three times higher when the message was sent between mic0–mic0 than between mic1–mic1 (Table 7). The problem was reported to the CSC support team. They recommended using an additional file named ‘dat.conf’, which should be copied onto each MIC card and which includes the explicit specification of the Infiniband providers to be used by the co-processors. The content of these files for MIC0 and MIC1 is the following:

• dat.conf for MIC0:
  ofa-v2-mcm-1 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 1" ""
  ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
• dat.conf for MIC1:
  ofa-v2-mcm-1 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_1 1" ""
  ofa-v2-mlx4_1-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_1 1" ""
Afterwards, the path to these files has to be specified in the environment variable DAT_OVERRIDE before the program execution. With this set up the results from the PingPong test between MIC cards on two distinct hosts finally became symmetric and provided low latency and high memory bandwidth (see Table 10). Nevertheless, such high performance should be achieved automatically, which makes this still an open issue. The same tests were also performed on another supercomputer named supermic, which is located at the Leibniz-Rechenzentrum (LRZ) in Garching and has a MIC partition structure similar to HELIOS, i.e. two IB ports. The results show a similar asymmetric network performance to that obtained initially on HELIOS. It seems that the problem is connected to the hardware driver, which is currently under investigation.

Latency (µs), inter node (rows: Host 0, columns: Host 1):
         MIC0    MIC1
MIC0     5.88    5.87
MIC1     5.91    5.94

Bandwidth (MB/s), inter node (rows: Host 0, columns: Host 1):
         MIC0    MIC1
MIC0     3340    3338
MIC1     3345    3330

Table 10 The network performance of the MIC partition on the HELIOS supercomputer between all possible MIC pairs located on two distinct nodes. The upper table shows the latency, the lower one the memory bandwidth.

The intelmpi4.1 library was used in the previous tests. Thus, it was natural to also check the network performance with the new intelmpi5.0.3.048 library, which is available on HELIOS. An unexpected, interesting behavior of the PingPong benchmark was detected. The first test between two distinct MIC cards, for example mic0–mic0, always provided a low memory bandwidth. All following tests for the same accelerators gave a high memory bandwidth. However, when the test was switched to other MIC cards, for example mic1–mic1, the memory bandwidth became low again, and the following tests for the same cards again produced a high bandwidth, as shown in Fig. 28. The issue was reported to CSC support and is still under investigation.

Fig. 28 The memory bandwidth between two distinct MIC nodes from the PingPong test using the intelmpi5.0.3.048 library.

5.3. OpenMP overhead micro-benchmark on the MIC partition of the HELIOS supercomputer
Establishing a parallel region with the OpenMP library incurs some overhead. The overhead time of different OpenMP constructs was measured by means of the Edinburgh Parallel Computing Centre (EPCC) OpenMP micro-benchmark suite [5]. In this benchmark the overhead is calculated as the difference in execution time between the parallel and the sequential version of the program. The benchmark is constructed in such a way that each thread of the parallel version performs the same amount of work as the sequential one.
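The measurement principle can be summarized with the following Fortran sketch (a simplified illustration of this approach, not the EPCC source; the internal delay routine stands for a fixed amount of work):

   program omp_overhead
     use omp_lib
     implicit none
     integer, parameter :: nrep = 1000
     integer            :: i
     double precision   :: t0, t_ref, t_par

     ! reference: the work is executed nrep times by a single thread
     t0 = omp_get_wtime()
     do i = 1, nrep
       call delay()
     end do
     t_ref = omp_get_wtime() - t0

     ! parallel: every thread performs the same work per repetition, so any
     ! additional time is the cost of opening/closing the parallel construct
     t0 = omp_get_wtime()
     do i = 1, nrep
!$omp parallel
       call delay()
!$omp end parallel
     end do
     t_par = omp_get_wtime() - t0

     print *, 'overhead per construct (s):', (t_par - t_ref) / nrep

   contains
     subroutine delay()                 ! fixed amount of work per call
       integer :: k
       double precision :: s
       s = 0.0d0
       do k = 1, 10000
         s = s + 1.0d-9 * k
       end do
       if (s < 0.0d0) print *, s        ! prevents the loop from being optimized away
     end subroutine delay
   end program omp_overhead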

Fig. 29 The overhead time for the ‘reduction’ OpenMP pragma as a function of the number of threads. Top figure represents the results from the Intel Sandy Bridge (red line – normal Helios node; green line – host of the MIC co-processor). Bottom figure shows the results from the MIC native mode.

The overhead for the ‘reduction’ OpenMP pragma is plotted in Fig. 29. One can see that the overhead time for the normal Intel Sandy Bridge node (Fig. 29 top, red line) is slightly smaller than for the Sandy Bridge node on the MIC partition (Fig. 29 top, green line). The overhead time increased by about a factor of ten when the test was launched on the Intel Xeon Phi (Fig. 29, bottom) and reached more than 40 µs when 240 threads were used. The overhead can be even more pronounced when the OpenMP constructs share a large dataset among the threads. In some cases it can be of the same order as the computational kernel time and result in a significant bottleneck. Fig. 30 shows the overhead time of the ‘firstprivate’ OpenMP pragma. For this pragma every thread receives a private copy of the dataset, initialized from the master thread's copy. For the test shown a 60 kB array was chosen. The overhead time did not grow much on the Sandy Bridge processor and stayed between 30–40 µs when the number of threads was increased from one to sixteen. However, on the MIC card the overhead time rose significantly (~600 µs) when more than ten threads were involved in the data sharing and reached almost 1.8 ms for 236 threads. It was also observed that the overhead time depends linearly on the array size for the ‘firstprivate’ pragma. Therefore, when performing computations on the MIC partition the user should carefully check the overhead time of the different OpenMP pragmas, especially when sharing large datasets among the threads.
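For illustration, the construct measured here has the following form (a hypothetical kernel, not the benchmark source; do_work is a placeholder for the per-thread computation):

   real(8) :: big(7500)            ! 7500 * 8 bytes = 60 kB, as in the test above

!$omp parallel firstprivate(big)
   ! each thread receives its own copy of big, initialized from the master copy;
   ! this copy-in is the overhead shown in Fig. 30 and grows linearly with the
   ! array size and with the number of threads
   call do_work(big)
!$omp end parallel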

Fig. 30 The overhead time for ‘firstprivate’ OpenMP pragma as a function of the number of threads. Top figure represents the results from the Intel Sandy Bridge processor; bottom figure shows the results from the MIC accelerator. A 60 kB array was used in this test.

5.4. Host, offload and native computation modes on the MIC partition
At present, three operational modes are available on the MIC co-processor: offload, symmetric and native. In the offload mode the host offloads a part of or all the computation to the co-processor. The OpenMP 4.0 standard provides directives for the offloading procedure, with which users can specify the offload target and the data that should be sent to the co-processor. These directives should simplify code development. Unfortunately, the data transfer between the host and the MIC is a strong bottleneck in this mode. Moreover, simultaneous computation on both the CPU and the MIC requires asynchronous offload pragmas, which are often a source of load balancing problems. However, the next generation of the MIC family might be part of the socket, in which case the co-processor nomenclature would disappear for the Intel Xeon Phi, together with the offload mode. In the second mode, named symmetric, the application runs on both the MIC and the host using two different binaries, which have to be compiled independently for the host and the Xeon Phi. They communicate by means of MPI functions and the system treats the MIC cards as additional nodes in the cluster environment. One of the differences between the MIC co-processor and a GPU accelerator is that a Linux micro operating system runs on the Intel Xeon Phi, whereas a GPU is operated by the system of the host. This allows the MIC card to be treated as another node of the supercomputer. The program execution using only the MIC co-processor without the host is called native mode. The advantage of this mode is that, in order to port an MPI/OpenMP application to the MIC, the user only needs to recompile the code with the flags for the Xeon Phi operation environment. The main disadvantage lies in the amount of memory available on the MIC card. Two versions of the Knights Corner are available on the market, with a memory of 8 GB (installed on HELIOS) and 16 GB, respectively. In comparison, the Intel Sandy Bridge nodes on HELIOS have 2 x 32 GB. Since November 2014 the native execution mode has been available on the MIC partition of the HELIOS supercomputer. It was decided to compare the execution time of a test application using the different operation modes (Table 11). For this test an N-Body code [4] was used.

Number of threads   Intel Sandy Bridge   Intel Xeon Phi (offload)   Intel Xeon Phi (native)
  1                  55                   130.61                     126.60
  2                  28                    66.47                      62.69
  4                  14                    33.78                      30.75
  8                   7                    18.78                      15.86
 16                   3.5                  12.02                       9.97
 32                   –                     7.44                       4.72
 64                   –                     6.19                       3.46
128                   –                     4.09                       1.59
236                   –                     3.96                       1.39

Table 11 The execution time (in seconds) of the test N-Body code on the Intel Sandy Bridge and the Intel Xeon Phi using different computational modes and numbers of cores.

As one can see, the execution time using one core on the Sandy Bridge is about 2.29 times shorter than with the Xeon Phi in native mode. This is consistent with the processor clock speed, which is 2.1 times higher on the Sandy Bridge than on the Intel Xeon Phi. Using the maximal number of cores on the Sandy Bridge (16), the execution time decreases to 3.5 s. The execution time shrinks from 130.61 s to 3.96 s when 236 threads are used for the computation on the MIC card in the offload mode. Such results show that simulations on the MIC card using the maximal number of threads in the offload mode can still be slower than on the Sandy Bridge node. However, in native mode on the Intel Xeon Phi with the maximal number of threads, the execution time decreased to 1.39 s, which is 2.5 times faster than on the Sandy Bridge and 2.85 times faster than the offload mode.

5.5. Tests of the MPI 3.0 standard
New features of the MPI 3.0 standard were also tested on the HELIOS cluster as part of the current project. Among them are non-blocking collective communication, one-sided communication (RMA – remote memory access) and shared memory. The tests were done by means of the Intel MPI Benchmark 4.1 suite.

5.5.1. Non-blocking collective communication
The collective communication of the MPI 1.0 and MPI 2.0 standards was extended in MPI 3.0 with non-blocking collective (NBC) communication: MPI_IBcast, MPI_IReduce, MPI_IGather, MPI_IAlltoall, etc. An example of pseudo-code for the non-blocking IBcast is shown in Fig. 31. In comparison with the blocking Bcast, the non-blocking IBcast has one extra parameter, REQUEST, which is used for task synchronization afterwards. First, the non-blocking MPI function is called (CALL MPI_IBCAST(array1,…) in our example). After this call all tasks can perform arbitrary calculations, as long as they do not use the data involved in the non-blocking transfer before the IBCAST call has been synchronized. Finally, the outstanding IBCAST is completed with an MPI_WAIT call. The advantage of such non-blocking communication is that the calculation can overlap with the communication, improving the code performance. Such overlap can be measured with the Intel MPI Benchmark 4.1 suite.
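In Fortran the pattern of Fig. 31 corresponds to the following sketch (array1, its size n and the overlapping work are placeholders):

   integer :: request, ierr
   integer :: status(MPI_STATUS_SIZE)
   double precision :: array1(n)

   ! start the non-blocking broadcast of array1 from rank 0
   call MPI_IBCAST(array1, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, request, ierr)

   ! computation that does not touch array1 can proceed here
   call do_independent_work()

   ! array1 may only be used again after the request has completed
   call MPI_WAIT(request, status, ierr)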


Fig. 31 Pseudo-code for a non-blocking MPI IBcast.

The Intel MPI intelmpi/5.0.3.048 library, which includes all new features of the MPI 3.0 standard, was tested on the HELIOS cluster. In order to maximize the overlap between communication and calculation, the asynchronous progress support has to be enabled. This can be done for the Intel MPI library via the environment variable: export MPICH_ASYNC_PROGRESS=1. Fig. 32 shows the communication-calculation overlap with ASYNC=0 (red line) and ASYNC=1 (green line). The overlap was calculated using the following formula: overlap = 100 * max(0, min(1, (t_pure + t_CPU - t_ovrl) / min(t_pure, t_CPU))), where t_pure is the time of a pure communication operation; t_CPU is the time the computation takes to complete when run concurrently with the non-blocking communication operation; and t_ovrl is the time the non-blocking communication operation takes to complete when run concurrently with a CPU activity. For more details of this benchmark refer to [3,6]. One can see that with the asynchronous progress support the overlap is about 50 percent for small message sizes and reaches almost 100 percent for large messages, while without asynchronous progress the overlap is low for small and large messages and reaches 70 percent for intermediate message sizes.

Fig. 32 The communication-computation overlap vs. the message size from the test of MPI Ialltoall function with (green line) and without (red line) the asynchronous progress support.

The number of processes involved in the communication and how they are pinned on the computational nodes play an important role for the communication-calculation overlap of the non-blocking MPI functions. We tried to find the best configuration on the HELIOS cluster, whose nodes are equipped with Intel Xeon E5 (Sandy Bridge EP, 2.7 GHz) processors with eight cores each and two processors per node. We performed these tests on four nodes, evenly distributing the MPI tasks over them. Fig. 33 shows the communication-calculation overlap for the MPI Ialltoall function using different numbers of processes. One can see that the number of tasks involved in the computation has a significant effect on the overlap. However, in all cases the overlap for small message sizes is lower than for large messages. Moreover, pinning 16 tasks on one node gave much smaller overlap values than distributing them among four nodes. This is due to the larger communication time between tasks located on different nodes, which leaves more room for the computation to hide it. Nevertheless, we measured the fastest absolute computation time when the 16 tasks were pinned on one node. From Fig. 33 we determined that the best condition (the largest overlap) for the non-blocking Ialltoall function is 8–16 processes evenly distributed among the four nodes. Therefore, we decided to use the same configuration to test the other non-blocking collective functions from MPI 3.0. Fig. 34 shows the communication-computation overlap of different MPI 3.0 non-blocking functions as a function of the message size. One can see a similar behavior for all functions. The overlap is relatively low for small message sizes (between 20 and 55 percent) and increases to 85–99 percent for large messages. These results are in agreement with similar measurements made by Intel [6].

Fig. 33 The communication-computation overlap vs. message size for the MPI Ialltoall function using different numbers of processes.

Fig. 34 The communication-computation overlap vs. message size for different MPI functions using 16 processes evenly distributed over four nodes.

5.5.2. Remote memory access
The remote memory access (RMA) or one-sided communication interface was significantly extended in the MPI-3 standard. Several major features were introduced, including different memory models, new communication and synchronization calls, and new ways of creating RMA windows, described below. The major update of the synchronization was made in the implementation of the passive target communication mode, which was tested on HELIOS. One-sided communication decouples data transfer and synchronization in such a way that unneeded synchronization is eliminated, allowing larger concurrency. For this communication type only one process specifies all communication parameters, both for the sending side and for the receiving side. Data is directly read from or written to the memory of a remote process. The memory that a process allows other processes to access by one-sided communication is called a window. Fig. 35 shows how process number one reads data from the memory window of process number two and writes data to the memory window of process number three. Memory windows belonging to different processes can have different sizes. Processes define their local window and inform the other processes about it by calling the collective function MPI_Win_create. There are two types of one-sided communication: active and passive target communication. In the former, both processes are explicitly involved in the communication. All data transfer arguments are provided by the first process, while the second participates only in the synchronization. In passive target communication only the origin process is explicitly involved in the transfer.

Fig. 35 Schematic view of the one-sided communication using a memory window.
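A minimal Fortran sketch of the passive target mode described above is given below (rank numbers, buffer names and sizes are illustrative; error handling is omitted):

   integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
   integer :: win, ierr
   double precision :: winbuf(n), sendbuf(n)

   ! every process exposes winbuf as an RMA window
   winsize = int(n, MPI_ADDRESS_KIND) * 8
   call MPI_WIN_CREATE(winbuf, winsize, 8, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

   ! passive target epoch: only the origin process is explicitly involved
   if (rank == origin_rank) then
     call MPI_WIN_LOCK(MPI_LOCK_SHARED, target_rank, 0, win, ierr)
     disp = 0
     call MPI_PUT(sendbuf, n, MPI_DOUBLE_PRECISION, target_rank, disp, &
                  n, MPI_DOUBLE_PRECISION, win, ierr)
     call MPI_WIN_UNLOCK(target_rank, win, ierr)
   end if

   call MPI_WIN_FREE(win, ierr)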

The passive target communication was tested on HELIOS by means of the Intel MPI Benchmark 4.1 suite. The truly_passive_put test was performed with and without the asynchronous progress support described above. This test measures two quantities. The first is the time consumed by the origin process to complete the MPI function MPI_Put() while the target process is waiting inside the MPI stack (MPI_Barrier); we call this time pure. The second is the time consumed by the origin process to complete the MPI function MPI_Put() while the target process performs computations outside the MPI stack and then calls MPI_Barrier(); we call this time with computation. The computational time is chosen to be about the same as the duration of the MPI_Put() call. Therefore, without any improvement techniques, time with computation should be about two times greater than time pure. But if the MPI implementation provides a truly passive mode of remote memory access, both timings should be about the same. Time pure and time with computation for the truly_passive_put test are shown in Fig. 36, at the top for small message sizes and at the bottom for large messages. The test was performed with and without asynchronous progress support. One can see that asynchronous progress brings significant overhead for small messages (green and violet colors). For messages up to 8 KB, both measured timings are about two times larger than their counterparts without the asynchronous progress support. For large message sizes this overhead vanishes.


Fig. 36 Time pure and time with computation for the truly_passive_put test with and without asynchronous progress support. Top figure for small message sizes, bottom for large message sizes.

In Fig. 36 one can also see that time pure is always 1.5–2 times smaller than time with computation when asynchronous progress support is switched off. Therefore, one-sided communication does not work perfectly in this case. However, with asynchronous progress support the difference between time pure and time with computation is about 10–20 percent for small messages and less than five percent for large message sizes. Thus, we can summarize that the passive target communication for remote memory access of the MPI 3.0 standard works properly only when asynchronous progress support is switched on. Nevertheless, it is noteworthy that for small message sizes it can be more efficient in absolute terms to switch off the asynchronous progress support due to the overhead introduced. These results are in good agreement with similar tests performed by Intel [6].

5.5.3. Shared memory
The usage of shared memory segments was also introduced in the MPI 3.0 standard. Shared memory is a portion of memory that is attached to an address space and is accessible to all processes within a group. This allows memory to be saved in a pure MPI program if the MPI tasks have a large amount of common data. Moreover, a significant speed-up can be achieved if the MPI tasks have to exchange large amounts of data within one computational node. The establishment of shared memory segments by means of the MPI 3.0 standard was tested on HELIOS. For this purpose the MPI function MPI_WIN_ALLOCATE_SHARED was used. It is a collective call executed by all tasks in the same group. Each MPI task allocates memory that is shared among all processes in the group and returns a pointer to the locally allocated segment. The locally allocated memory can then be used for load/store accesses by remote processes. The address of the locally allocated shared memory can be obtained by the other MPI tasks in the group by means of the MPI function MPI_WIN_SHARED_QUERY. In the test code one MPI task creates a shared memory segment with a size of 100 MB and fills it with an array of double precision elements. Subsequently all MPI tasks calculate the sum of all elements using the shared memory access. The results show identical sums for all MPI tasks, thereby confirming the correctness of the test code and of the shared memory access in the MPI 3.0 standard. The test presented above was performed using 16 MPI tasks within one computational node. The next step was to test the establishment of shared memory segments simultaneously on many nodes using a large number of MPI tasks. For this purpose the code was modified. The MPI 3.0 standard provides a function for splitting all MPI tasks associated with one mpi_comm_world into disjoint subgroups (MPI_Comm_split_type). Within each subgroup the processes are ranked. Therefore, each MPI task gets two rank numbers: the global rank and the local rank. Using the parameter MPI_COMM_TYPE_SHARED in the function MPI_Comm_split_type automatically creates subgroups of MPI tasks that are pinned on the same node, each with a new communicator. Afterwards each subgroup defines its own shared memory segment. The same test as described above (sum of the elements of the double precision array) was performed using 512 nodes (8192 MPI tasks). Therefore, 512 separate shared memory segments were created and each MPI task calculated the sum by accessing the data in the shared memory. The test code worked correctly and provided accurate results. Many tests were performed that deliberately pushed the test code to crash, for example by processes writing and reading data outside the shared memory segment. In all cases segmentation faults occurred and the code crashed. Thus, developing production codes using shared memory segments requires a careful evaluation of the necessary memory and a careful manipulation of the pointers that are used to access this memory. The last test was to measure the overhead time of the MPI function MPI_WIN_ALLOCATE_SHARED, which is used to establish the shared memory region. It was found that the overhead time does not depend on the size of the allocated shared memory segment. We tested allocations of one byte, one MB and one GB of shared memory. In all these tests the overhead time was about 1.2–3×10^-4 s. All aforesaid tests were performed by means of the intelmpi/5.0.3.048 library.
Unfortunately, the latest available bullxmpi/1.2.9.1 library does not yet include the functions for shared memory segments.
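A condensed Fortran sketch of the intra-node part of the test described above is shown below (one task allocates a 100 MB segment, all tasks of the node read it; names and the simple barrier synchronization are illustrative):

   program shm_test
     use mpi
     use iso_c_binding
     implicit none
     integer, parameter :: n = 13107200            ! 100 MB of double precision data
     integer :: node_comm, node_rank, win, disp_unit, ierr, i
     integer(kind=MPI_ADDRESS_KIND) :: winsize
     type(c_ptr) :: baseptr
     double precision, pointer :: a(:)

     call MPI_INIT(ierr)
     ! group the tasks that are pinned on the same node into one communicator
     call MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                              MPI_INFO_NULL, node_comm, ierr)
     call MPI_COMM_RANK(node_comm, node_rank, ierr)

     ! only the node master allocates the segment, the other tasks attach with size 0
     winsize = 0
     if (node_rank == 0) winsize = int(n, MPI_ADDRESS_KIND) * 8
     call MPI_WIN_ALLOCATE_SHARED(winsize, 8, MPI_INFO_NULL, node_comm, &
                                  baseptr, win, ierr)
     if (node_rank /= 0) &
       call MPI_WIN_SHARED_QUERY(win, 0, winsize, disp_unit, baseptr, ierr)
     call c_f_pointer(baseptr, a, [n])

     if (node_rank == 0) then
       do i = 1, n
         a(i) = dble(i)                            ! master fills the shared segment
       end do
     end if
     call MPI_BARRIER(node_comm, ierr)             ! simple synchronization before the reads

     print *, 'local rank', node_rank, 'sum =', sum(a)

     call MPI_WIN_FREE(win, ierr)
     call MPI_FINALIZE(ierr)
   end program shm_test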

5.6. Test of the MPI 2.0 standard
During the current project we provided support to the CSC team in order to resolve the ticket number #5523 – “A ticket in RT: error when changing number of CPU in a job”. An error appears when the code reads a checkpoint (binary file) with double (or more) the number of nodes than were used for writing it. The code crashes if the number of nodes involved in the reading procedure is >= 512. For the code compilation the bullxmpi library was used. Analysis by the owner of this ticket showed that the error came from a call to MPI_FILE_READ_ALL. We decided to create two small codes in order to test the MPI 2.0 standard (collective IO operations) on the HELIOS cluster. The first program produces a binary file in which each task writes a double precision array with ten elements (80 bytes). In this code we used the following MPI 2.0 functions: MPI_File_Set_View, which changes a task's view of the data in the file, and MPI_File_Write_All. The second code reads the data back from the previously generated binary file by means of the MPI function MPI_File_Read_All. Both codes were tested with the intelmpi and bullxmpi libraries. First, the code for creating the binary file was tested. It was found that up to 128 nodes (2048 MPI tasks) the code worked fine with both the intelmpi and bullxmpi libraries. However, when 256 nodes (4096 MPI tasks) were involved the code crashed with intelmpi. With the help of the CSC support team it was detected that the problem occurred in the intelmpi/5.0.3.048 library. Re-compiling the code with the new intelmpi/5.1.1.109 resolved the problem with 256 nodes, but the code crashed again with 512 nodes (8192 MPI tasks). Debugging showed that the code hung during execution of the MPI function MPI_File_Open. The problem was reported to CSC support and was escalated afterwards to the HPC support. They could identify the problem, which was connected to the srun command and the DAPL_UD provider. The problem could be avoided by launching the code with mpirun instead of srun and with the following additional environment variables:
• export I_MPI_FABRICS=shm:dapl
• export I_MPI_DAPL_UD=1
• export I_MPI_DAPL_UD_PROVIDER=ofa-v2-mlx4_0-1u
• export I_MPI_DAPL_UD_RDMA_MIXED=0
Finally, the test code for writing a binary file was tested on 1024 nodes (16384 MPI tasks). Even with all the aforesaid corrections the code crashed again. The problem was sent to CSC support and is still under investigation. In contrast to the intelmpi library, bullxmpi worked without any problems for all tested numbers of nodes (up to 512) and provided a correct output binary file. The next step was to test the code for reading the binary file. For this test we took the file that was created in the previous step using 4096 MPI tasks (256 nodes). The test code for reading the binary file was launched using different numbers of nodes, from 16 to 512. All runs completed normally and provided correct output. The test codes and output results were sent to the ticket owner and to the CSC support. It was reported that these kernel routines correctly represent the part of his program where the problem occurred. The reason why the MPI collective file reading with different node numbers crashed in the ticket owner's program but not in the test code was detected afterwards by the CSC support. The bug was neither in the program nor in the MPI function MPI_File_Read_All, but in the stripe count of the Lustre file system installed on HELIOS.
The problem was resolved by increasing the stripe count to the maximum (setstripe -c -1).
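The two test kernels can be sketched in Fortran as follows (a condensed illustration, not the actual test codes; the file name and buffer contents are illustrative and error handling is omitted):

   integer :: fh, rank, ierr
   integer :: status(MPI_STATUS_SIZE)
   integer(kind=MPI_OFFSET_KIND) :: disp
   double precision :: buf(10)

   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

   ! --- writing: each task writes its 10 doubles (80 bytes) at its own offset ---
   call MPI_FILE_OPEN(MPI_COMM_WORLD, 'test.bin', MPI_MODE_WRONLY + MPI_MODE_CREATE, &
                      MPI_INFO_NULL, fh, ierr)
   disp = int(rank, MPI_OFFSET_KIND) * 10 * 8
   call MPI_FILE_SET_VIEW(fh, disp, MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION, &
                          'native', MPI_INFO_NULL, ierr)
   call MPI_FILE_WRITE_ALL(fh, buf, 10, MPI_DOUBLE_PRECISION, status, ierr)
   call MPI_FILE_CLOSE(fh, ierr)

   ! --- reading: possibly with a different number of tasks than used for writing ---
   call MPI_FILE_OPEN(MPI_COMM_WORLD, 'test.bin', MPI_MODE_RDONLY, &
                      MPI_INFO_NULL, fh, ierr)
   call MPI_FILE_SET_VIEW(fh, disp, MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION, &
                          'native', MPI_INFO_NULL, ierr)
   call MPI_FILE_READ_ALL(fh, buf, 10, MPI_DOUBLE_PRECISION, status, ierr)
   call MPI_FILE_CLOSE(fh, ierr)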

5.7. Conclusions
The Intel Xeon Phi co-processor has a theoretical peak performance six times higher and a memory bandwidth four times larger than the Intel Sandy Bridge processor. This makes such an accelerator in principle very attractive for high performance computing. However, in order to achieve this benefit a huge programming effort must be invested. In this project the network performance was tested on the MIC partition of the HELIOS supercomputer by means of the Intel MPI Benchmark suite. It was found that the two Infiniband ports per node do not work properly by default and provide asymmetric results in the PingPong test between two distinct MIC nodes. An additional low level manual set up had to be introduced in order to achieve symmetric results with high memory bandwidth. Moreover, a problem with the intelmpi5.0 library was identified that in particular cases resulted in low memory bandwidth. The Direct Access Programming Library provider was also tested. The results show that it plays a significant role for the memory bandwidth, which could be increased by a factor of three with the correct provider list.

The overheads of different OpenMP pragmas were measured on both the Sandy Bridge node and the MIC card using the EPCC micro-benchmark suite. The overhead time increased by at least a factor of ten on the MIC in comparison to the Sandy Bridge due to the large number of threads. For some OpenMP pragmas the overhead time on the MIC card, when using the maximal number of threads, can be in the same range as the computational kernel time. Finally, the host, offload and native modes of execution on the MIC partition nodes were tested by means of an N-Body code. In the offload mode the execution time using the maximal number of threads on the MIC card is higher than on the Sandy Bridge processor. However, when the application was launched in native mode the execution time decreased by a factor of more than two in comparison to both the Sandy Bridge and the MIC in the offload mode. The non-blocking collective communication, the remote memory access and the shared memory segments of the MPI 3.0 standard were tested on HELIOS. An efficient communication-calculation overlap (85–99 percent) can be achieved only with large message sizes (>10 KB), while for small messages this overlap is between 20 and 55 percent. The passive target communication in the remote memory access of the MPI 3.0 standard works properly only when the asynchronous progress support is switched on, allowing more than 95 percent overlap between calculation and synchronization. However, for small messages (<8 KB) the overhead increases when using the asynchronous progress support, resulting in a longer absolute execution time. The implementation of the shared memory segments by means of the MPI 3.0 standard was also tested on HELIOS. The test code provided correct results, showing that all tasks have direct access (read/write) to the shared memory region that was created beforehand by the master task. This technique allows MPI programs to save a significant amount of memory if all MPI tasks share large common data. The parallel IO of the MPI 2.0 standard was also tested on HELIOS while supporting the resolution of the ticket number 5523. Two test codes were developed for parallel writing/reading of a binary file. By means of these codes a few bugs of the intelmpi library were found. They were resolved with the help of the CSC support and the codes were successfully tested afterwards on up to 512 nodes.

5.8. References
[1] Rahman R., Intel Xeon Phi Coprocessor Architecture and Tools, Apress Media, LLC, 2013.
[2] Haefele M., Final report on HLST projects BEUIFERC/GOMIC, 2014.
[3] https://software.intel.com/en-us/articles/intel-mpi-benchmarks
[4] James Reinders, Jim Jeffers, High Performance Parallelism Pearls, Morgan Kaufmann, 2015.
[5] https://www.epcc.ed.ac.uk/research/computing/performance-characterisation-and-benchmarking/epcc-openmp-micro-benchmark-suite
[6] Mikhail Brinskiy et al., “Mastering Performance Challenges with the New MPI-3 Standard”

6. Report on HLST project JORSTAR
Large scale plasma instabilities inside a tokamak can be influenced by the currents flowing in the conducting vessel wall. This involves non-linear plasma dynamics and its interaction with the wall currents. In order to study this problem the code that solves the magneto-hydrodynamic (MHD) equations, called JOREK, was coupled with the model for the vacuum region and the resistive conducting structures named STARWALL [1]. The JOREK-STARWALL model has already been applied to perform simulations of Vertical Displacement Events (VDEs), Resistive Wall Modes (RWMs), and the Quiescent H-Mode. At present, it is not possible to resolve realistic wall structures with a large number of finite element triangles due to the huge consumption of memory and wall clock time by STARWALL and the corresponding coupling routine in JOREK. Moreover, both the STARWALL code and the JOREK coupling routine are only partially parallelized via OpenMP. The aim of this project is to implement an MPI parallelization of the model that should allow realistic results to be obtained with high resolution.

6.1. STARWALL code analysis
Before starting the implementation of the MPI parallelization it was important to determine the most critical data structures and subroutines, i.e. those that consume most of the memory and execution time. The memory consumption and the execution time of the individual subroutines can be studied for different problem sizes by tuning three knobs, which directly influence the problem size:
• Number of triangles within the plasma: n_tri_p = 4*nv*n_points*2*(n_R+n_Z-2)
• Number of triangles in the wall: n_tri_w = 2*nwu*nwv
• Number of sin/cos harmonics: n_harm
We changed the problem size by varying the following parameters independently: (i) n_R and n_Z for n_tri_p, (ii) nwu and nwv for n_tri_w, and (iii) n_harm. A large scale production run corresponds to the parameters n_tri_p=2*10^5, n_tri_w=5*10^5, and n_harm=11.
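As a quick consistency check of these formulas: the wall resolution nwu=nwv=500 quoted below for a production size run gives n_tri_w = 2*500*500 = 5*10^5 wall triangles, which is exactly the targeted production value stated above.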

6.1.1. Memory consumption analysis
Fig. 37 shows the memory consumption of the most important individual subroutines during the scan of the parameter n_tri_w, performed by varying the variables nwu and nwv. For this test case we fixed n_harm=1, n_R=n_Z=15, nv=32, and n_points=10. One can see that three subroutines (matrix_wp, matrix_ww, and resistive_wall_response) are the most memory demanding in this scan. Moreover, if we further scale the problem to a production size run with nwu=nwv=500, five additional subroutines (matrix_rw, solver, dsygv, matrix_ew, matrix_pe) will consume more than 50 GB of memory. Therefore, all these subroutines must be parallelized in the final version of the code. Fig. 38 shows the memory consumption of the same subroutines as in Fig. 37, however this time with a parametric scan of the number of triangles within the plasma (n_tri_p). In this test we kept the parameters nwu=nwv=110 and n_harm=1 constant but changed n_R=n_Z. The memory consumption increases mainly in three subroutines (matrix_pp, matrix_wp, and matrix_ep), which have to be parallelized for a production run with n_tri_p=2*10^5.


Fig. 37 The memory consumption of individual subroutines of the STARWALL code during the scan over the number of the triangles discretizing the wall (n_tri_w=2*nwu*nwv).

Fig. 38 The memory consumption of individual subroutines of the STARWALL code during the scan over the number of the triangles within the plasma (n_tri_p).

The last parameter tested was the number of sin/cos harmonics (n_harm). Fig. 39 shows the memory consumption per subroutine versus n_harm, which varies from one to eleven. The value n_harm=11 corresponds to a production run. For this test case we kept the following parameters constant: nwu=nwv=80, n_R=n_Z=15. All subroutines stay at almost the same level of memory consumption, with only an insignificant growth for some of them. In order to verify that the number of sin/cos harmonics does not have a large influence on the memory consumption even when the number of triangles is increased, we performed an additional test with nwu=nwv=110. Indeed, as in the test above, the memory usage did not change much during the n_harm scan. STARWALL uses six subroutines (dpotrf, dpotrs, dgemm, dsygv, dgetrf, dgetri) from the LAPACK package that is part of the Intel MKL library. It was important to check both the size of the input matrices of these subroutines and the additional memory allocated inside them, in order to determine whether these sequential subroutines should also be replaced by their parallel analogues. A dedicated script was developed for this purpose, which measures the time spent executing the LAPACK subroutines and their memory consumption. It was found that only the dsygv LAPACK subroutine requires an additional allocation of memory, which, however, is negligible (~50–100 MB). Finally, the size of the input matrices for the production run will range between 20 GB and a few TB. Therefore, all LAPACK subroutines must be replaced by their parallel versions from other libraries like ScaLAPACK in order to distribute the input/output matrices, and hence reduce the size of the local sub-matrices.

Fig. 39 The memory consumption of individual subroutines of the STARWALL code during a scan over the number of sin/cos harmonics.

Summarizing our tests, the complete STARWALL code must be rewritten in order to distribute the memory consumption. We estimate that the production run will require about six to seven TB of physical memory, which can be allocated by using about 100 computing nodes of the HELIOS cluster.

6.1.2. Computational time analysis
The memory analysis has already shown the necessity of a complete domain decomposition of the whole code. Additionally, it was important to determine the wall clock time for the production run and to find the hot spots in the code. Fig. 40 shows the STARWALL execution time for different numbers of triangles in the wall and within the plasma (red and green lines). For a large scale production simulation on a single CPU the wall clock time would be in the range of a year.

Fig. 40 The wall clock time versus the number of triangles in the wall (n_tri_w) for different numbers of triangles within the plasma: n_R=n_Z=15 shown as red line, n_R=n_Z=25 shown as green line. The solid blue line shows the targeted numbers of triangles for a production run, while the dashed blue line presents the extrapolated scaling.

The next step was to determine the most time consuming subroutines in the code. This analysis was performed by means of the Allinea Forge profiling package. Depending on the problem size, different subroutines contribute a different percentage of the total execution time. However, among all subroutines, one (dsygv) consumes in all cases more than 40% of the total wall clock time. For the largest problem size we could run, the percentage was above 80%. Hence, this subroutine became the first candidate for the parallelization and improvement effort.

6.1.3. OpenMP parallelization analysis
STARWALL is partially parallelized by means of OpenMP directives. Its parallelization efficiency is shown in Fig. 41. The wall clock time decreases only by a factor of 1.4 with 16 threads in comparison to the sequential run. Such poor performance can be explained by Amdahl's law, which gives the maximal possible speed-up of a program that is only partially parallelized. Taking into account that all LAPACK routines are sequential, the sequential parts of STARWALL add up to about 45 percent of the total execution time; with this sequential fraction Amdahl's law gives a maximal speed-up of 1/(0.45 + 0.55/16) ≈ 2 for 16 threads. In order to confirm the poor OpenMP scalability, the code was checked with the Intel VTune performance profiler. The basic hot spots analysis is presented in Fig. 42. One can see that for most of the time only one thread is performing calculations (brown color), while the other 15 threads stay idle, as expected. These results confirm the necessity of replacing all sequential LAPACK subroutines with their parallel analogues.

6.1.4. LAPACK subroutines
As discussed earlier, the code spends most of the computation time in the execution of LAPACK subroutines. In this subsection we summarize all LAPACK subroutines which are used in STARWALL:
• dpotrf – computes the Cholesky factorization of a symmetric positive definite matrix;
• dpotrs – solves a system of linear equations with a Cholesky-factored symmetric positive definite matrix;
• dgemm – computes a matrix-matrix product for general matrices;
• dsygv – computes all eigenvalues and corresponding eigenvectors of a real generalized symmetric definite eigenproblem;
• dgetrf – computes the LU factorization of a general matrix;
• dgetri – computes the inverse of an LU-factored general matrix.

6.1.5. Bug check
Before starting the optimization and parallelization, the code was checked for correctness. The run time debugging was performed with two different compilers: Lahey and Intel. Afterwards the source code was also analyzed with the Forcheck static analyzer. Three uninitialized variables were found that could produce unexpected behavior of the code:
1) In file solver.f90: nd_w=ncoil+npot_w
2) In file matrix_ec.f90: alv=pi2*fnv
3) In file resistive_wall_respones.f90: ntri_c
These problems were reported to the project coordinator and resolved afterwards. The code was run mainly on a LINUX cluster called TOK-P, which is located at RZG, Garching. During parallel simulations a bug was detected in the standard input (stdin) system of this cluster. Within the default configuration only the process with rank=0 reads data from stdin. Adding the flag ‘-s all’ to mpirun should allow all processes involved in the computation to read data from standard input. However, this flag worked only on a single node with all MPI tasks pinned to it. For tests with two or more nodes the code got stuck at the stdin reading. The same tests were performed on HELIOS using the same compiler and compile flags. In this case the stdin reading worked properly. This bug was reported to the support team of the TOK-P cluster at RZG.

Fig. 41 Speed-up of code versus number of OpenMP threads.

Fig. 42 Basic Hotspots analysis from the Intel Vtune amplifier using 16 OpenMP threads. Brown color shows the working status of the process, while green color corresponds to the idle state.

6.2. MPI parallelization

6.2.1. Parallelization of the eigenvalue solver
The LAPACK subroutine used for the calculation of the eigenvalues and the corresponding eigenvectors was given priority for parallelization. This subroutine consumes more than 60% of the total STARWALL execution time and uses two large matrices as input parameters. The subroutine is called dsygv and a more detailed description can be found in Ref. [2]. It was replaced by its parallel version PDSYGVX from the ScaLAPACK library, which provides subroutines for linear algebra computations on distributed memory computers supporting MPI [3]. The PDSYGVX subroutine has 34 input/output parameters by means of which the user can specify the type of eigenvalue problem to be solved, which eigenvalues and eigenvectors must be computed, the calculation precision, etc. Prior to the calculation, all global matrices must be distributed over the process grid using the so-called block-cyclic scheme [3].
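A schematic Fortran sketch of this replacement is shown below. It is only an illustration of the calling sequence, not the STARWALL implementation: the block size nb, the work space query and the error handling are omitted or simplified, and the local arrays a_loc, b_loc, z_loc are assumed to hold the block-cyclically distributed matrices.

   integer :: ictxt, nprow, npcol, myrow, mycol, mloc, nloc, info, m, nz
   integer :: desca(9), descb(9), descz(9)
   integer, external :: NUMROC

   ! set up the BLACS process grid (nprow x npcol processes)
   call BLACS_GET(-1, 0, ictxt)
   call BLACS_GRIDINIT(ictxt, 'Row-major', nprow, npcol)
   call BLACS_GRIDINFO(ictxt, nprow, npcol, myrow, mycol)

   ! local dimensions of the block-cyclically distributed n x n matrices
   mloc = NUMROC(n, nb, myrow, 0, nprow)
   nloc = NUMROC(n, nb, mycol, 0, npcol)
   call DESCINIT(desca, n, n, nb, nb, 0, 0, ictxt, max(1, mloc), info)
   call DESCINIT(descb, n, n, nb, nb, 0, 0, ictxt, max(1, mloc), info)
   call DESCINIT(descz, n, n, nb, nb, 0, 0, ictxt, max(1, mloc), info)

   ! generalized symmetric eigenproblem A x = lambda B x (ibtype = 1),
   ! computing all eigenvalues and eigenvectors (jobz = 'V', range = 'A')
   call PDSYGVX(1, 'V', 'A', 'U', n, a_loc, 1, 1, desca, b_loc, 1, 1, descb,    &
                0.0d0, 0.0d0, 0, 0, abstol, m, nz, w, orfac, z_loc, 1, 1, descz, &
                work, lwork, iwork, liwork, ifail, iclustr, gap, info)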

Fig. 43 Eigenvalues from the sequential LAPACK dsygv (red points) and the parallel ScaLAPACK PDSYGVX (green points) subroutine.

Fig. 44 Eigenvector components on the left, and their absolute values on the right, from the sequential LAPACK routine dsygv (red points) and the parallel ScaLAPACK routine PDSYGVX (green points).

In order to test the correctness of the implementation of the PDSYGVX subroutine, the calculated eigenvalues and eigenvectors were compared with the results from the original (sequential) subroutine dsygv. Fig. 43 shows the calculated eigenvalues from both the dsygv (red points) and the PDSYGVX (green points) subroutines. In the case of the ScaLAPACK subroutine, 16 MPI processes distributed over 16 computational nodes (1 per node) were used. A very good agreement was found for different problem sizes. In spite of the perfect agreement of the eigenvalues, the calculated eigenvectors can differ. For some problem sizes they are identical between the dsygv and PDSYGVX subroutines. In other cases some eigenvectors have the same length but point in the opposite direction, i.e. all their components have the opposite sign (Fig. 44 on the left). They are still correct eigenvectors, as can be seen in Fig. 44 on the right, where the absolute values of all eigenvector components are shown. However, sometimes the eigenvectors even have different values of their components. Such behavior can be explained by the non-uniqueness of the eigenvector problem. If some eigenvalues are not distinct, i.e. the characteristic equation has multiple roots, these eigenvalues are called degenerate. Different bases of eigenvectors exist for these degenerate eigenvalues. Therefore, LAPACK and ScaLAPACK can deliver different components for eigenvectors corresponding to degenerate eigenvalues, but these still represent correct eigenvectors. In addition, the correctness of the new subroutine was checked by comparing the physical solution obtained with the eigenvectors from the LAPACK and ScaLAPACK libraries. The STARWALL results were in very good agreement, within an absolute error of 10^-13. An advantage of the ScaLAPACK library in comparison to LAPACK is that it benefits from the IEEE ±∞ arithmetic to accelerate the computations of the eigenvalue solver. This improvement can be seen in Fig. 45, where the execution time of the ScaLAPACK subroutine PDSYGVX using one task is compared to the execution time of the LAPACK dsygv subroutine for different problem sizes. The ScaLAPACK solver is faster than LAPACK for all problem sizes and gains a factor of more than two for large matrices.

Fig. 45 Comparison of the eigenvalue solver execution time between ScaLAPACK using one process and the LAPACK library for different problem sizes.

The parallelization efficiency of the PDSYGVX subroutine is shown in Fig. 46, on the left for a small problem size (nwu=nwv=70) and on the right for large matrices (nwu=nwv=160). For an efficient ScaLAPACK performance the matrix size should be large enough relative to the number of processes involved in the simulation [3]. Therefore, the parallelization efficiency already saturates at about 16 processes for the small problem size, where the execution time is only a few seconds. However, when large matrices are used the problem scales almost linearly. An even better performance is expected for a production run, in which nwu=nwv=500.

Fig. 46 PDSYGVX parallelization efficiency. On the left, small problem size with nwu=nwv=70; on the right, large problem size nwu=nwv=160.

6.2.2. Parallelization of the matrix_ww subroutine
The eigenvalue solver described above uses two large matrices (a_ww(npot_w,npot_w) and b_rw(npot_w,npot_w)) as input parameters. The size of these matrices for a large production run will be (250 000 × 250 000), i.e. ~500 GB for double precision components. Therefore, these matrices have to be distributed over the MPI tasks. We started the parallelization with the subroutine matrix_ww, where the matrix a_ww is built. In this subroutine the matrix a_ww is calculated from another matrix named dima(ntri_w,ntri_w). The size of this additional matrix is even larger than the size of the matrix a_ww, namely (500 000 × 500 000), i.e. ~2 TB for double precision components. Thus, the dima matrix must also be distributed over the MPI processes. The original kernel loop that corresponds to the creation of the matrix a_ww is shown in Fig. 47. One can see that the indexes of the matrices a_ww and dima are not linked. The first one gets its indexes from the additional array ipot_w, whose values range from 1 to npot_w, while the dima indexes run from 1 to ntri_w.

Fig. 47 Original kernel loop that builds the matrix a_ww.

We tried to find patterns between the a_ww and dima matrices in order to determine which components of the dima matrix are needed to calculate an equally distributed a_ww matrix. The a_ww matrix was distributed among 16 processors (Fig. 48, left). Each pink rectangle represents the global a_ww matrix, and the yellow rectangles the sub-matrices assigned to each of the 16 tasks. The dima matrix indexes that were used to calculate the local distributed matrix a_ww are shown in Fig. 48 on the right. There, the pink rectangles stand for the global dima matrix, whereas the yellow ones represent those indexes that are needed to calculate the local sub-matrices of a_ww (yellow rectangles in the left figure). One can see that the dima components which are used to build the distributed parts of a_ww are not localized but spread across the whole matrix. Hence, it would be very difficult to distribute the matrix dima efficiently.

Fig. 48 Distributed matrix a_ww on 16 processors (left) and the corresponding indexes of the matrix dima that are used to calculate the local part of a_ww (right).

6.2.2.1. Matrix free “dima” computation
As the distribution of the matrix dima could not be performed efficiently, we decided to rewrite the code in such a way that the components of the dima matrix are calculated directly at the place where they are used. In the original code version the matrix dima was pre-calculated by means of the subroutine tri_induct, which contains three nested loops. If this subroutine were straightforwardly inlined into the kernel loop (Fig. 47), which already has four nested loops, the computation time would be years even on compute clusters. Therefore, we split this subroutine into three parts: tri_induct_1, tri_induct_2 and tri_induct_3. Two of them (tri_induct_1, tri_induct_2) are called outside the kernel loop and have no significant effect on the total computation time. Inside the kernel loop only one additional nested loop, with an index running over seven points, was added. A code fragment of the new version of the kernel loop is shown in Fig. 49. One can see that the dima matrix is absent there. Instead, there is a call to tri_induct_3, where the necessary value of dima is calculated and stored in the variables dima_sca and dima_sca2. The drawback of this modification is an increase in computation time.

Fig. 49 Matrix dima free kernel loop that builds the matrix a_ww.

Fig. 50 shows the elapsed time of the kernel loop for different problem sizes using the old version of the code with the matrix dima and the new version with the dima free format. The computation time increases by about a factor of two for all problem sizes. For a large production run with n_tri_w=500 000 it was estimated to be around 111 hours on one CPU. The advantage is naturally the possibility to distribute the work and run in parallel.

The next step was to check the parallelization efficiency of the kernel loop. This test is shown in Fig. 51. One can see that a speed-up factor of ~110 can be reached when 256 tasks are involved. Therefore, the computation time of the kernel loop without the dima matrix using 256 cores would be about one hour.

Fig. 50 Computational time of the kernel loop of the subroutine matrix_ww versus the problem size using the old code version (with dima matrix) – blue line and modified kernel loop (with dima free format) – orange line.

Fig. 51 Speed-up of the kernel loop versus number of MPI tasks.

6.2.2.2. Matrix free “dima” computation with ScaLAPACK indexing
In order to use the distributed matrices as input parameters for the ScaLAPACK subroutines, they must be transformed to a special format using the so-called Block-Cyclic distribution scheme, which should speed up the calculation [3]. For example, if we consider a global matrix of size 9×9, which is mapped onto a 2×3 process grid (six tasks) with a blocking factor of two, the decomposition shown in Fig. 52 results. One can see that in this format different processes have different local matrix sizes, from 5×4 for process (0,0) to 4×2 for process (1,2). Moreover, the mapped indexes in the local distributed matrix are not sequential. For instance, on process (0,0) the first row includes the following elements of the global matrix: a11, a12, a17, a18.


Fig. 52 Example of the Block-Cyclic distribution of a 9×9 matrix into 2×2 blocks mapped onto a 2×3 process grid.

Hence, the Block-Cyclic distribution scheme described above had to be implemented in the subroutine matrix_ww in order to bring the local distributed matrix a_ww into a format compatible with the ScaLAPACK subroutines. Such an index mapping was developed and implemented in two subroutines, ScaLAPACK_mapping_i and ScaLAPACK_mapping_j, which were then inserted into the kernel loop. This index distribution causes bad scalability of the kernel loop when the structure shown in Fig. 49 is kept. Therefore, the kernel loop was rewritten once more to ensure good scalability with the ScaLAPACK mapping scheme (Fig. 53). With the new version a speed-up factor of 218 could be reached using 512 cores. For a large production run with ntri_w=500 000 the wall clock time was estimated to be about 4 hours.

Fig. 53 ScaLAPACK index mapping dima free kernel loop that builds the matrix a_ww.
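The report does not reproduce the mapping routines themselves. As an illustration of what such a global-to-local mapping typically looks like (these formulas correspond to ScaLAPACK's standard INDXG2P/INDXG2L conventions with a zero source process and block size nb; the actual ScaLAPACK_mapping_i/j routines may differ in detail), one can write:

   ! process coordinate (row or column) that owns global index ig
   integer function owner_proc(ig, nb, nprocs)
     implicit none
     integer, intent(in) :: ig, nb, nprocs
     owner_proc = mod((ig - 1) / nb, nprocs)
   end function owner_proc

   ! local (row or column) index of global index ig on its owning process
   integer function local_index(ig, nb, nprocs)
     implicit none
     integer, intent(in) :: ig, nb, nprocs
     local_index = nb * ((ig - 1) / (nb * nprocs)) + mod(ig - 1, nb) + 1
   end function local_index

For the example of Fig. 52 (nb=2, two process rows, three process columns) these formulas place global column 7 on process column 0 as local column 3, in agreement with the first local row a11, a12, a17, a18 of process (0,0).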

6.2.3. Parallelization of the matrix_pp subroutine
The next subroutine chosen for parallelization was matrix_pp. It produces the intermediate matrix (a_pp) that is used to calculate the input matrix for the eigenvalue solver. This subroutine is similar to matrix_ww described above. The main difference lies in the construction of the dima matrix, which here uses two additional matrices, dist1 and dist2, to calculate its components. The sizes of the dima and the resulting a_pp matrices also differ from the previous subroutine, because they correspond to the number of triangles within the plasma, which is ntri_p=200 000 for a large production run. On the one hand the kernel loop is more complex, on the other hand it is smaller in comparison to the matrix_ww kernel.

An additional subroutine (get_index_dima) was developed in order to determine which indexes of the matrix dima are used for computing the components of the matrix a_pp. The kernel loop of the matrix_pp subroutine is shown in Fig. 54.

The scalability of this kernel loop, depicted in Fig. 54, is shown in Fig. 55. A speed-up factor of 220 can be achieved when 512 cores are involved in the computation. For a large production run the wall clock time (with 512 cores and ntri_p=200 000) reduces to about 20 minutes.

Fig. 54 ScaLAPACK index mapping dima free kernel loop that builds the matrix a_pp in subroutine matrix_pp.


Fig. 55 Speed-up of the kernel loop in the matrix_pp subroutine versus number of MPI tasks.

6.3. Conclusions
The STARWALL code has been analyzed for potential improvements and optimization by means of MPI parallel computation. It was found that for a large production run the whole code must be parallelized, due to the lack of memory (for storing the input/output matrices) and due to the computational time of different subroutines. Six sequential LAPACK subroutines were identified that have to be replaced by their parallel analogues from the ScaLAPACK library; all of them must be replaced in the final code version because of the large size of their input matrices. During the simulation tests a few bugs were found in the code that could have led to unpredictable results. In addition, a bug in the stdin handling was detected on the TOK-P cluster of RZG, where most of the calculations were performed. The LAPACK subroutine for the eigenvalue solver was replaced by its parallel counterpart from the ScaLAPACK library. A very good agreement was found in terms of the eigenvalues. In addition, the correctness of the results was confirmed by comparing the physical solutions. The ScaLAPACK subroutine has shown better performance not only when using several processes in parallel but also in sequential mode, due to the advantage of using IEEE arithmetic. Finally, a good parallelization efficiency was obtained for this subroutine for large problem sizes. The subroutines matrix_ww, matrix_pp and tri_induct were rewritten in order to avoid the largest matrix in the code, named dima. This saves a significant fraction of the memory and makes calculations for large problem sizes possible. The subroutines were parallelized with MPI, taking into account the output index format of the matrices that is required by the ScaLAPACK subroutines. A good scalability was achieved in both subroutines, with a speed-up factor of more than 210 when 512 cores were involved in the computation.


7. Final report on the 3DPSOLV project
7.1. Introduction
Three-dimensional kinetic modelling of the plasma edge is one of the most ambitious topics in numerical plasma studies. It is required for ab initio modelling of linear plasma devices and divertor plasmas, for answering a number of long-standing questions on the role of drifts in the plasma wall transition layer (PWT), and for formulating reliable multi-dimensional boundary conditions in the PWT. This simulation study will contribute to a better understanding of the plasma dynamics in linear plasma devices and in the divertor plasmas of large fusion devices, to a more precise estimation of the particle and power fluxes to the plasma facing components, and to the improvement of plasma edge simulation codes requiring boundary conditions in the PWT. The BIT2 code performs Scrape-off-Layer (SOL) simulations in 2D real space + 3D velocity space. Due to the enhanced 2D geometry, the modelling of the SOL is more realistic than before. This is a requirement for the prediction of particle and energy loads to the plasma facing components (PFC) and for the estimation of the corresponding PFC erosion rates as well as impurity and dust generation rates. It is mandatory that the Poisson solver used in BIT2 scales to very high core numbers in order to maintain a good scaling property of the whole code. The 3D massively parallel Particle-in-Cell and Monte Carlo code BIT3 is a generalization of the BIT2 code to three dimensions. Its simulation geometry of elongated rectangular parallelepipeds allows a 1D domain decomposition for its parallelization. The extension of the BIT2 routines to 3D is a more or less straightforward task, with the exception of the 3D massively parallel Poisson solver. The aim of the project is to develop a 3D Poisson solver based on the experience gained in developing the corresponding 2D solver in BIT2. The simulation domain of BIT2 is very thin (usually with an aspect ratio Lx/Ly > 4096), as shown in Fig. 56. The domain is distributed along the x-direction for parallelization. In this one-dimensional distribution, each MPI task handles a square sub-domain with the same length and mesh size in each direction, i.e., the local aspect ratio is the same for every MPI task. This distribution reduces the complexity of the code and allows the data communication to be optimized, but it restricts the number of MPI tasks in the current BIT2 code structure (each MPI task handles a square domain with a square mesh).

Fig. 56 Simulation domain of the BIT2 code.

The new solver will be based on the multigrid method using a mixed domain-decomposition/finite-element discretization. The optimized 2D solver achieved good scaling properties for up to a few thousand cores. Contrary to the 2D version, the 3D solver is to be applied to a simpler geometry that does not contain an inner empty region. Therefore, the scaling of the 3D solver is expected to exceed that of the 2D solver. This will keep the overall scalability of the BIT3 code well above 1000 cores, which is required for ab initio modelling of linear plasma devices.

7.2. Discretization for the 3D problem
To develop a 3-dimensional multigrid solver for second order partial differential equations (PDEs), we consider the parallelepiped domain Ω with edge lengths Lx, Ly, and Lz shown in Fig. 57. In particular, we will focus on the second order PDE

where x = (x, y, z) with a large aspect ratio Lx/Ly and Lz = Ly, ϕ is the electrostatic potential, ρ is the charge density and ε is the dielectric constant. As boundary conditions we choose Dirichlet boundary conditions in the x-direction and Dirichlet or Neumann boundary conditions in the y- and z-directions. To develop a 3D multigrid solver, a general parallelepiped domain is discretized with mesh sizes hx = Lx/nx, hy = Ly/ny, and hz = Lz/nz as shown in Fig. 58. The discretized function ϕ is defined on P(xj,i,k) = P(xj, yi, zk) with xj = jhx, yi = ihy, zk = khz for 0 ≤ j ≤ nx, 0 ≤ i ≤ ny, and 0 ≤ k ≤ nz, as shown in Fig. 58.
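Since the equation itself is not reproduced in this extract, the following is only a sketch of its presumed form, written with the quantities defined above; the sign convention and the exact shape of Ω are assumptions.

\[
\nabla\cdot\bigl(\varepsilon(\mathbf{x})\,\nabla\phi(\mathbf{x})\bigr) = -\rho(\mathbf{x})
\quad\text{in }\ \Omega=(0,L_x)\times(0,L_y)\times(0,L_z),
\qquad
h_x=\tfrac{L_x}{n_x},\ \ h_y=\tfrac{L_y}{n_y},\ \ h_z=\tfrac{L_z}{n_z}.
\]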

Fig. 57 Simulation domain of the BIT3 code.

Fig. 58 Discretized domain of the BIT3 code.

We assume that ε(x,y,z) is defined on each cell (xj, yi, zk), which has its center point at ((j+1/2)hx, (i+1/2)hy, (k+1/2)hz) for 0 ≤ j ≤ nx−1, 0 ≤ i ≤ ny−1, and 0 ≤ k ≤ nz−1, in a similar way to the previous 2D case. This value shall be denoted by ε_{j+1/2, i+1/2, k+1/2}. For the 3D multigrid solver we consider two discretization methods: a finite difference method (FDM) and a finite element method (FEM). The corresponding intergrid transfer operators have to be constructed for each discretization method separately.

7.2.1. Finite difference discretization
We consider a typical second order finite difference scheme on the 3D domain. We obtain the following finite difference formulations of the first derivatives of ϕ_{j,i,k} = ϕ(xj, yi, zk)


And the corresponding finite difference formulations of the second order derivatives

where

and ε_{j±1/2, i±1/2, k±1/2} are the dielectric constants defined at the cell centers, as shown in Fig. 59.
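As an illustration only (the report's own formulas are not reproduced in this extract), the x-contribution of a standard conservative second-order difference with cell-centred ε would read

\[
\bigl[\partial_x(\varepsilon\,\partial_x\phi)\bigr]_{j,i,k}\;\approx\;
\frac{1}{h_x^2}\Bigl(
\bar\varepsilon_{j+\frac12,i,k}\,(\phi_{j+1,i,k}-\phi_{j,i,k})
-\bar\varepsilon_{j-\frac12,i,k}\,(\phi_{j,i,k}-\phi_{j-1,i,k})\Bigr),
\]

where the face coefficient \(\bar\varepsilon_{j\pm\frac12,i,k}\) is some average (e.g. arithmetic) of the four cell-centred values ε_{j±1/2, i±1/2, k±1/2} adjacent to that face; the exact averaging used in the solver is an assumption here.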


Fig. 59 Local notation for discretization of 3D domain

By summing these equations, we have

Concerning the Dirichlet boundary condition on the boundaries

including the internal conductor area, the values of the function are constants, i.e.,

On boundary points with Neumann boundary conditions, we use the central difference scheme (see Fig. 60) with ghost points outside the domain.

Fig. 60 Notation for the finite difference scheme on the Neumann boundary condition within the x+ direction (• : real nodal points, × : boundary point, ○ : ghost points)

For example, on the boundary with

we have

i.e.,

Finally, we get the discretization system

The linear restriction operator is defined by

and the linear prolongation operator is defined by

according to the notations in Fig. 61.


Fig. 61 Local notation for the intergrid transfer operators of the 3D problem (• : coarser node points)

The intergrid transfer operators are defined on the boundary in accordance with the boundary condition. The values of the function on the Dirichlet boundaries are all zero because, except on the finest level, we solve a problem with a zero-Dirichlet boundary condition. The values of the function on the Neumann boundary are defined in the same way as for a finite difference scheme with a zero-Neumann boundary condition, i.e., the values at the ghost points are the same as those at the reflected real points

when Pi,j,k is a boundary point with a Neumann condition.
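In formula form, the reflective ghost-point rule quoted above amounts to (x+ boundary, with an assumed indexing of the ghost point):

\[
\phi_{n_x+1,\,i,\,k} \;=\; \phi_{n_x-1,\,i,\,k}
\quad\Longrightarrow\quad
\frac{\phi_{n_x+1,i,k}-\phi_{n_x-1,i,k}}{2h_x}=0 ,
\]

which is the central-difference statement of a zero-Neumann condition.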

7.2.2. Finite element discretization
To define a finite element scheme, we consider the weak formulation of the problem described in Sec. 7.2.1. By multiplying with the test function ψ(x) and integrating on both sides, we get

By using Green’s theorem, we get

i.e.,


We consider the piecewise trilinear element space, i.e., the space of functions that are continuous and trilinear polynomials on each rectangular parallelepiped (Fig. 62), with a nodal finite element basis satisfying ϕ_p(x_q) = δ_{p,q}.
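As a minimal sketch — assuming the PDE form given at the beginning of Sec. 7.2 and test functions ψ that vanish on the Dirichlet part of the boundary — the weak form underlying these integrals reads

\[
\int_{\Omega}\varepsilon\,\nabla\phi\cdot\nabla\psi\,\mathrm{d}V
=\int_{\Omega}\rho\,\psi\,\mathrm{d}V
+\int_{\Gamma_N}\varepsilon\,\frac{\partial\phi}{\partial n}\,\psi\,\mathrm{d}S ,
\]

where Γ_N denotes the part of the boundary carrying Neumann conditions.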

Fig. 62 A rectangular parallelepiped element Q for the FEM of a 3D problem.

First of all, we compute the integrals on a rectangular parallelepiped, as shown in Fig. 62, with each basis function:

To generate a matrix equation, we assemble the contributions of all rectangular parallelepiped elements and obtain a discretized linear system

In comparison with the FDM stencil, which has only seven nonzero entries per row, the matrix generated by the FEM has 27 nonzero entries per row. The restriction operators for the FEM are the same as for the FDM, but the prolongation operators, which are the transposes of the restriction operators, differ on boundaries with a Neumann boundary condition.

7.3. Conclusions and outlook
Kab Seok Kang visited the project coordinator to discuss the current status of the BIT2 code and the future plans. The project coordinator is currently testing the BIT2 code with the multigrid solver which was developed in the previous project KinSOL2D-3; it was especially designed to suit the current BIT2 code structure. Kab Seok Kang will further support the project coordinator in bringing the multigrid version of BIT2 to production stage. The implementation of the 3-dimensional multigrid solver with the finite difference and finite element methods has reached the stage of building up the corresponding matrix equation and intergrid transfer operators. However, the project duration has come to an end and the work has to stop here. A succeeding project will be needed to finish the implementation and testing of the 3D multigrid solver in BIT3. This is not critical, as the project coordinator is still busy testing BIT2 with its 2D multigrid solver. In parallel, he will further develop the BIT3 version. Therefore, he did not apply for a new HLST project in the support period of 2016 to continue the work on the 3D multigrid solver. Fortunately, the MGBOUT3D project in 2016 also concerns the implementation of a 3D multigrid solver. Thus, the work will continue in a more global context, with clear synergetic effects for a future BIT3 multigrid project which will probably be submitted in 2017.

8. Final report on the MGBOUT project
8.1. Introduction
The aim of this project is to implement multigrid techniques in the BOUT++ code to improve its performance and to prepare for new physical studies. BOUT++ is a flexible, modular framework to solve plasma physics relevant differential equations. It includes problem specific differential operators, boundary conditions and geometry. The CCFE version is used to study turbulence and coherent structure motions at the edge of magnetic fusion machines. For the 2D Laplacian inversion, one of the implementations in BOUT++ uses the PETSc library to solve the problem iteratively (GMRES with restarts by default, but other algorithms are available). Without good preconditioning this is a very inefficient method which sometimes does not converge at all. At the moment the best option is a Fourier-decomposed solver. Unfortunately, using Fourier transforms is an obstacle for parallelizing in the toroidal direction and is less efficient as a preconditioner when the coefficients vary strongly in the toroidal direction. Moreover, Fourier solvers are based on a direct (parallel Thomas algorithm) inversion, which limits scaling performance. Multigrid preconditioning is an attractive alternative that should avoid these drawbacks. Due to the modular structure of BOUT++, the PETSc-based Laplace solver can be modified without affecting the rest of the code (only ~1 000 lines out of ~85 000 needed to be looked at). The current implementation constructs explicitly the matrix to be inverted. While in the long term it is probably desirable to move to matrix-free methods, the current method has the advantage that relatively straightforward modifications of the existing code allow matrices corresponding to coarser grids to be constructed within the solver class (independently of the main code), and the fields on the coarse grids can naturally be stored as a few extra PETSc vectors. Hence, the implementation of a prototype multigrid solver should be straightforward, if not perfectly elegant, in the short term. In the long term, it is planned to upgrade the code, by refactoring the mesh implementation, to be able to use multigrid methods more widely. Application of the implicit time-stepper should reduce the number of iterations per time-step and/or enable longer time steps to be taken; in either case this will reduce the number of right-hand-side (RHS) evaluations needed for a fixed amount of simulation time, which should decrease the run-time in direct proportion. Multigrid methods should enable a scalable, iterative electrostatic potential solver to be implemented which would allow 3D-coupled equations to be solved (a perpendicular Laplacian, but in strongly non-orthogonal coordinates, so that the differential operator contains first order derivatives in three dimensions but second order derivatives in only two). This is necessary for accurate solutions of the electrostatic potential near X-points and therefore of fundamental importance for scrape-off layer (SOL) physics. The redesign process with an efficient multigrid implementation should identify things like: an appropriate representation of the fields in memory; methods to iterate over and operate on fields; how to optimize the distribution of the grid(s) across processes to minimize communication overhead; and how to maintain the flexibility to easily alter the physics model (i.e. the differential equations to be solved).
Underlying all this work, and as a prerequisite for a fruitful collaboration, training activities are necessary to allow CCFE staff to gain familiarity with multigrid techniques and HLST staff to learn the structure and the functionalities of BOUT++. Following the above long term plan, we started with training activities; Kab Seok Kang then implemented the multigrid solver within the current structure of BOUT++ and investigated its numerical performance on HELIOS. In this report, we summarize the structure of BOUT++ related to the 2D Laplacian inversion, the implementation of the multigrid solver, and the numerical performance results.
8.2. BOUT++ structures: discretization and parallelization
To construct the solver, we need to know how the data are discretized and handled. In this section, we therefore summarize the data structures of BOUT++ which are related to the second order partial differential equation (PDE) solved in 2D for 3D problems.

Fig. 63 The coordinates in the poloidal plane of a tokamak in the BOUT++ code. The x and y coordinates lie in the poloidal plane. The solid lines denote x coordinate lines.

Fig. 64 Computational domain and its conformal mapping

To solve 3D tokamak-shaped problems in BOUT++, the domain is sliced in the poloidal direction (perpendicular to the y-direction) and the second order PDEs are solved on each slice (x–z space) as shown in Fig. 63. Each slice can be transformed to the reference domain (a rectangle or the unit square) with periodic boundary conditions along the z-direction by the conformal mapping g, as shown in Fig. 64.

We consider the following second order PDE

(2.1) where

By using the conformal mapping g, we get the following relations on the reference domain

With the assumption that the poloidal fields are perpendicular to the toroidal field, the components in red can be neglected and we can solve the resulting 2D problems. To get the discretized solution of (2.1), BOUT++ uses cell centered second order finite difference schemes on the reference domain (Fig. 65):

By combining these computations, we obtain a linear system with the following entries in stencil notation


Fig. 65 Notation for discretization

With

where and

From now on, we consider the parallelization concept used to solve (2.1) in BOUT++. The problems considered in BOUT++ have periodic boundary conditions in the z-direction and Dirichlet or Neumann boundary conditions in the x-direction and are discretized with Nx_global × Nz_global meshes as shown in Fig. 66. The current BOUT++ is parallelized only in the x-direction, as shown in Fig. 67, and needs to communicate data between neighboring MPI tasks in the x-direction. Even though it is not parallelized in the z-direction, it needs ghost cells at the ends of the z-direction of the domain due to the periodic boundary condition. According to this parallelization concept, we generate the discretized system of (2.1) for each MPI task.


Fig. 66 Global numbering

Fig. 67 One dimensional parallelization and its data communications

8.3. Parallel implementation of the multigrid method
In this section, we explain how to implement multigrid methods to solve the discretized system of (2.1) in a general framework.

According to the object oriented framework of BOUT++, we define a C++ class to implement the basic multigrid algorithm for the parallel cell centered finite difference scheme on a rectangular domain. We assume that each MPI task handles the same number of mesh points in each direction. From this basic class, we derive a 1D parallel, a 2D parallel, and a serial version of the multigrid algorithm by assigning a number of MPI tasks in each direction.

8.3.1. Basic class for the multigrid algorithm in BOUT++
Among the 1D parallel, the 2D parallel and the serial versions, the most general one is the 2D parallel version. Hence the basic class of the multigrid algorithm is defined on xNP × zNP sub-domains, where each sub-domain has lnx × lnz cells as shown in Fig. 68 (×). For data communication, we add guard (ghost) cells (○) at each end, which are needed for the matrix-vector multiplication, the smoothing operations, and the intergrid transfer operators. The data communication pattern is plotted in Fig. 69.

Fig. 68 Local numbering on each MPI task
The basic multigrid cycle can be given in the following form:
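Since the listing itself is not reproduced in this extract, the following is a minimal C++ sketch of such a cycle (a recursive V-cycle with pre-smoothing, restriction of the residual, coarse-grid correction and post-smoothing). The member names, the level-0-is-coarsest convention and the placeholder bodies are assumptions and do not show the actual MultigridAlg interface of Fig. 70.

#include <utility>
#include <vector>

// Minimal sketch of a recursive multigrid V-cycle, loosely following the
// structure described in the text. Level 0 is taken to be the coarsest
// level; all operator routines are placeholders for the stencil-based,
// guard-cell-communicating implementations of the report.
struct Level {
  int lnx = 0, lnz = 0;            // local cells per direction on this MPI task
  std::vector<double> stencil;     // matrix / stencil entries for this level
};

class MultigridSketch {
public:
  explicit MultigridSketch(std::vector<Level> levels) : levels_(std::move(levels)) {}

  // Approximately solve A_l x = b on level l.
  void cycleMG(int l, std::vector<double>& x, const std::vector<double>& b) {
    if (l == 0) {                                  // coarsest level
      solveCoarsest(x, b);                         // direct or Krylov solve
      return;
    }
    smooth(l, x, b);                               // pre-smoothing (e.g. Gauss-Seidel)
    std::vector<double> r = residual(l, x, b);     // r = b - A_l x
    std::vector<double> rc = restrictToCoarse(l, r);
    std::vector<double> ec(rc.size(), 0.0);        // coarse-grid correction
    cycleMG(l - 1, ec, rc);                        // one recursive visit -> V-cycle
    prolongateAndAdd(l, ec, x);                    // x += P * ec
    smooth(l, x, b);                               // post-smoothing
  }

private:
  std::vector<Level> levels_;

  // Placeholders only; real versions apply the level-l stencil and exchange
  // guard cells between neighboring MPI tasks.
  void smooth(int, std::vector<double>&, const std::vector<double>&) {}
  std::vector<double> residual(int, const std::vector<double>&,
                               const std::vector<double>& b) { return b; }
  std::vector<double> restrictToCoarse(int, const std::vector<double>& r) { return r; }
  void prolongateAndAdd(int, const std::vector<double>&, std::vector<double>&) {}
  void solveCoarsest(std::vector<double>& x, const std::vector<double>& b) { x = b; }
};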


Fig. 69 Basic data communication for the parallel multigrid algorithm

Fig. 70 Class MultigridAlg

To implement the above algorithm in C++, we construct the class MultigridAlg as a basic class. We show the definition of MultigridAlg and the function MultigridAlg::cycleMG(), which is the main multigrid algorithm, in Fig. 70 and Fig. 71.


Fig. 71 Function MultigridAlg::cycleMG()
The intergrid transfer operators, i.e., the restriction and prolongation operators, are one of the key factors in the multigrid algorithm and depend on the discretization scheme for the geometric multigrid method. For the cell centered discretization method, we have to use a zero-order intergrid transfer operator, i.e., the average value of the four fine cells corresponding to one coarse cell. According to the notation in Fig. 72 we can write

i.e., in stencil notation.

Fig. 72 Notation for the intergrid transfer operators
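Written out with an assumed index convention, the zero-order restriction quoted above (the average of the four fine cells that form one coarse cell) is

\[
\phi^{2h}_{I,J} \;=\; \tfrac14\Bigl(\phi^{h}_{2I,2J}+\phi^{h}_{2I+1,2J}+\phi^{h}_{2I,2J+1}+\phi^{h}_{2I+1,2J+1}\Bigr),
\]

and the prolongation is its transpose, i.e. each coarse-cell value is injected into its four fine cells.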

As shown in the multigrid algorithm, we need the coarse grid operators AH on each grid level. AH can be obtained by direct computation on each grid level, but to do so we would need the conformal mapping value g on each grid. To avoid computing the conformal mapping values on each coarser grid, we use an algebraic relation between the coarse and fine grid operators, in which a constant c can be chosen to obtain good performance.


In this implementation, we choose c such that AH equals the operator of the geometric version for the simplest problem. Usually Krylov subspace methods (CG or GMRES) or direct methods can be used as the coarsest level solver. These solvers are well known and fast for small problems, but they do not scale with respect to the number of degrees of freedom (DoF).

The smoothing iterations have to be simple but must reduce the high frequency error components. The damped Jacobi and Gauss-Seidel iterations are well-known smoothers. The damped Jacobi iteration converges more slowly than the Gauss-Seidel iteration, but its result does not depend on the number of MPI tasks. Therefore, we can use the damped Jacobi iteration for debugging purposes in parallel codes. In practical computations, we use the Gauss-Seidel iteration as a smoother. The multigrid method can be used as a solver and as a preconditioner for a Krylov subspace method. The multigrid method is not guaranteed to converge as a solver in some cases, so we also consider the Krylov subspace method. For this purpose, we implemented a preconditioned GMRES (PGMRES), which guarantees convergence even for nonsymmetric problems.
Table 12 The required number of iterations for the PGMRES solver with a multigrid preconditioner (GMRES) and for the multigrid solver (MG) as a function of the number of levels, for a problem of 1024 × 1024 grid cells. (GS) and (J) denote the smoother used: Gauss-Seidel (GS) or damped Jacobi (J).

To investigate the convergence of the multigrid method as a solver and as a preconditioner, we choose a problem with 1 024 × 1 024 grid cells and report in Table 12 the number of iterations required to achieve convergence as a function of the number of grid levels. In general, we obtain very good convergence results. The required number of iterations in Table 12 hardly depends on the number of levels, which is the superior property of the multigrid method. However, in the case of a multigrid solver using a damped Jacobi smoother, convergence could not be achieved.

8.3.2. Multigrid solver according to the BOUT++ 1D parallelization
First of all, we consider the 1D parallelization which is used in the current version of BOUT++. To match the notation of the class, we add extra ghost cells in the first and last sub-domains as shown in Fig. 73. For the parallel multigrid method, the coarsest level has at least one real cell per MPI task, so the selection of the coarsest level is limited and the number of DoF on the coarsest level is greater than or equal to

where Nc is the number of MPI tasks. We show the number of DoF on the coarsest level and estimate (theoretically) the required number of iterations of the GMRES solver for Nz = Nx in Table 13.


Fig. 73 One dimensional parallelization and its data communication for the multigrid algorithm

Table 13 The number of DoF on the coarsest level and the required number of iterations of GMRES

The number of iterations required by the GMRES solver increases with the square root of the number of DoF on the coarsest level, and this number of DoF increases with the square of the number of MPI tasks. Because of this limitation on the coarsest level, we either have to use a serial algorithm after gathering the data, or use a 2D parallel algorithm in order to use more levels, which gives better scaling properties. The serial algorithm achieves good results for small problem sizes in combination with a small number of MPI tasks, but leads to a heavy workload for large problem sizes, which may occur for a large number of MPI tasks. The desired algorithm would use a 2D parallelization starting from the finest level, but this requires a 2D parallelization of the whole BOUT++ code, which implies a lot of code refactoring. Instead of using a 2D parallelization from the finest level on, we consider using a 2D parallel or a serial multigrid algorithm from a certain coarser level on, which in practice is close to the coarsest level. To do this, we modify the lowest level solver of the 1D parallel multigrid solver (class Multigrid1DP). The limitation of the coarsest level, even though alleviated, also occurs for the 2D parallel multigrid algorithm in the case of a large number of MPI tasks. Therefore, if the coarsest level limitation does occur, we use a serial algorithm after gathering the data on the coarsest level of the 2D parallel multigrid algorithm, in the same way as for the 1D parallelization.
8.3.3. Multigrid solver with 2D parallelization and data gathering
The basic multigrid class MultigridAlg is implemented based on a 2D parallelization. The 1D parallel and the serial versions are easily obtained by setting the number of MPI tasks (xNP and zNP) in each direction (zNP = 1 for the 1D parallel version and xNP = zNP = 1 for the serial version), so we only need to implement the transformation of the matrix and of the vectors. First, the matrix and data transformation from the parallel algorithm to the serial one is just a gather from all MPI tasks and a reduction to all MPI tasks. To do this, we use MPI_Allreduce().
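A minimal sketch of that gather-by-reduction pattern is given below (the names, the contiguous ownership layout and the use of MPI_IN_PLACE are assumptions of this sketch): each task writes its slice into a zero-initialised global buffer, and a single MPI_Allreduce with MPI_SUM then makes the complete coarsest-level vector available on every task.

#include <mpi.h>
#include <vector>

// Gather a distributed coarsest-level vector onto every MPI task by summation.
// Each task is assumed to own 'local.size()' consecutive entries starting at
// 'offset' of a global vector of length 'nglobal'.
std::vector<double> gather_by_allreduce(const std::vector<double>& local,
                                        int offset, int nglobal, MPI_Comm comm) {
  std::vector<double> global(nglobal, 0.0);           // zero everywhere
  for (std::size_t i = 0; i < local.size(); ++i)
    global[offset + static_cast<int>(i)] = local[i];  // fill only the owned slice
  // Summing the per-task buffers reproduces the full vector on every task.
  MPI_Allreduce(MPI_IN_PLACE, global.data(), nglobal, MPI_DOUBLE, MPI_SUM, comm);
  return global;
}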

Fig. 74 Change from 1D parallelization to 2D parallelization
Next, we consider the 2D parallelization on the coarsest level. To obtain a 2D parallelization from a 1D parallelization, we choose subsets of all the sub-domains in the x-direction to form groups. Within each group, we gather and distribute data from and to the group members as shown in Fig. 74. To do this, we use MPI_Allreduce() and collect the data which will be handled by each MPI task. For the vector elements, we have to perform the MPI_Allreduce() twice, once for the 1D to 2D transformation and once for the 2D to 1D transformation. We plot such a data transformation (1D to 2D) from 16 MPI tasks to 4 × 4 MPI tasks (the number of entries per subgroup is four) in Fig. 74. To evaluate the performance of the described algorithm, we measure two quantities: the time to solution and the matrix setting time, which is the time needed to generate the matrices on all levels. We consider the strong scaling of these quantities for a problem size of 4096² DoF. The corresponding results are shown in Fig. 75. First, we use the 1D parallelization algorithm, denoted by ― and - - - (case one). In this case, the problem size on the lowest level increases in the same way as the number of MPI tasks. Second, we use the 1D parallelization combined with the serial algorithm, denoted by • • • and * * * (case two). Finally, we use the 1D parallelization combined with the 2D parallelization algorithm, denoted by ○ ○ ○ and ◊ ◊ ◊ (case three).

Fig. 75 The solution time in seconds of the multigrid method with a Gauss-Seidel smoother as a preconditioner for the PGMRES solver (in black) and the matrix setting time (in red) as a function of the number of cores for a domain of 4096² DoF (the number of DoF on the lowest level is 1024) for different parallel combinations.

For case one, the number of levels decreases as the number of cores increases, i.e., 8 levels for 32 MPI tasks, 7 levels for 64 MPI tasks, 6 levels for 128 MPI tasks, etc. This means that the number of DoF on the lowest level increases, i.e., 1024 for 32 MPI tasks, 4096 for 64 tasks, 16384 for 128 MPI tasks, etc. Therefore, more time is needed to solve on the lowest level and consequently also for the total solve, as shown in Fig. 75 (―). As expected, the setting times (matrix building times) are reduced as the number of MPI tasks is increased (Fig. 75, - - -). For cases two and three, we use the same number of levels (8 levels) and the same lowest level (1024 DoF). As expected, we obtain an improved performance for both cases. However, the solution time (• • •) and the setting time (* * *) increase for a large number of MPI tasks (more than 128 MPI tasks for the setting time and 256 MPI tasks for the solution time) because too much work accumulates on each single MPI task after the data are gathered. For case three we obtain a good performance improvement up to 512 MPI tasks, as shown in Fig. 75 (○ ○ ○ and ◊ ◊ ◊).

From Fig. 75, we conclude that the 2D parallelization algorithm has to be used for a large number of MPI tasks, whereas the serial algorithm after gathering the data might be used for mid-range numbers of MPI tasks.


Fig. 76 (Strong scaling) The solution time (in black) and matrix setting time (in red) in seconds of the multigrid method with a Gauss-Seidel smoother as a preconditioner for the PGMRES solver as a function of the number of cores.

Fig. 77 (Weak scaling) The solution time (in black) and matrix setting time (in red) in seconds of the multigrid method with a Gauss-Seidel smoother as a preconditioner for the PGMRES solver as a function of the number of cores.

Next, we consider the strong and weak scaling properties of the third algorithm, which uses a 2D parallelization on the coarsest grid level. We use a multigrid preconditioner with a Gauss-Seidel smoother for the PGMRES solver, which has the best performance among the tested solvers. In Fig. 76 and Fig. 77 we show the solution time in black and the matrix setting time in red. In Fig. 76 we consider the strong scaling for five cases with 256² DoF (―, - - -), 512² DoF (□, *), 1024² DoF (+, ×), 2048² DoF (○, ӿ), and 4096² DoF (◊, •) on the finest level. For each case, the solution time and setting time show a good strong scaling property up to 512 MPI tasks. In Fig. 77 we consider the weak scaling for five cases with 16384 DoF (+, ×), 32768 DoF (□, *), 65536 DoF (―, - - -), 131072 DoF (○, ӿ), and 262144 DoF (◊, •) per MPI task on the finest level.

Over all cases we have a good weak scaling property, especially for large numbers of DoF per MPI task, which is a typical property of the parallel multigrid algorithm.

8.4. Conclusions
At the start of the project, Kab Seok Kang visited CCFE and gave lectures on basic numerical methods and multigrid methods to the BOUT++ team. During his visit, he attended a workshop on the BOUT++ code to learn the structure and the functionalities of BOUT++. After this visit, Kab Seok Kang implemented a 1D parallel multigrid solver in C++ according to the BOUT++ structure and tested it on a problem which was provided by the BOUT++ team. The results of the 1D parallel multigrid solver showed that there was a need for a more enhanced parallel multigrid solver. The structure of the multigrid solver was therefore adapted in the following way. Starting from the finest multigrid level, a 1D parallelization is used. From a certain level on, the problem becomes too small for a 1D parallelization; therefore, the data are redistributed and the parallelization concept is switched to a 2D parallelization of the domain. Finally, for the coarsest grid level a serial solver is used. The numerical results showed that the new hybrid solver has a very good weak scaling property, which enables the user to solve larger problems on a large number of MPI tasks within a reasonable time. The implementation was sent to the project coordinator for testing within BOUT++. The BOUT++ team is now considering adding a z-directional parallelization and refactoring the library. In an already allocated follow-up project, Kab Seok Kang will give advice on the parallelization concept and modify the multigrid solver accordingly.

9. Final report on HLST project TOPOX2
9.1. Introduction
Turbulence in fusion plasmas is known to be affected by the shaping of the magnetic flux surfaces. Modern tokamaks have strongly shaped diverted magnetic structures in which the last closed flux surface is a separatrix with an X-point. This leads to high magnetic shear near the X-point region, whose effect on drift-wave turbulence is believed to be severe. A complete numerical demonstration of this is yet to be made due to the outstanding challenge of resolving all the dynamically relevant spatial scales involved. Alternatives to the standard field-alignment techniques, upon which well resolved turbulence computations depend nowadays, are needed since the latter break down at the X-point. This constitutes the framework within which the project TOPOX2 was devised, namely the challenge of building a gyrofluid turbulence code capable of simulating the whole plasma, from the magnetic axis through the core and edge to the scrape-off layer (SOL), including the divertor. The project TOPOX2 in particular is concerned with the extension to the open field-line region of a triangular grid in RZ-space, with the points arranged along flux surfaces which are topologically treated as hexagons. This grid is currently being used in the Grad-Shafranov equilibrium solver GKMHD, and the goal of the project is to continue its predecessor project (TOPOX) in order to generalize the scheme to make it work beyond the separatrix into the SOL, including the X-point. Therefore, the results of both projects are presented together, so as to maintain the natural continuity between them. This constitutes one step towards the ultimate goal (beyond the scope of the project) of extending the code to treat global turbulence using novel treatments of the parallel derivatives with non-Clebsch coordinates.

9.2. GKMHD code
One of the motivations to build the GKMHD code was to have a simplified (and extendable) test-bed for the geometrical techniques described before, which are needed for a future global turbulence code capable of resolving the whole plasma column, from the magnetic axis to the SOL in diverted magnetic structures, i.e. including the X-point(s). The GKMHD code is an axisymmetric Grad-Shafranov equilibrium solver. Its underlying model was derived consistently from gyrokinetic theory, making its future extension to a full gyro-fluid turbulence model easy. The code uses a strongly structured triangular grid in RZ-space, with the points arranged along flux surfaces which are topologically treated as hexagons. This has the advantage over the more usual quadrilateral grid of eliminating the grid singularity at the magnetic axis while keeping the poloidal spacing approximately constant on all flux surfaces. The Laplacian differential operators are expressed in the form of closed line integrals following Ref. [1]. A large part of the method's machinery consists of finding the node-index ordering and orientation of the nearest-neighbor hexagons.

9.3. Numerical method
The idea is to write the standard Poisson equation in its weak form over a closed domain. Then, noting that the Laplacian operator on the left hand side (l.h.s.) can be expressed as the divergence of a gradient, one invokes Gauss's divergence theorem to recast the volume integral into a surface integral over the enclosing surface. In two dimensions, this becomes a line integral, and if the domain, as represented in Fig. 78, is an infinitesimal hexagon, then it can be approximated with the sum

(1)

as was introduced by Sadourny, Arakawa and Mintz in Ref. [1]. The six parcels correspond to the six nearest neighbors, labelled by the local index m; the sum involves both the grid-nodes and the corresponding dual-space nodes (Voronoi diagram). On a surface discretized with a hexagonal mesh, this formula can be applied at every grid-node j.
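For orientation only — the exact geometric weights used in GKMHD are not reproduced in this extract — a finite-volume approximation of this kind typically takes the form

\[
\oint_{\partial A_j} \nabla\phi\cdot\mathbf{n}\,\mathrm{d}l \;\approx\; \sum_{m=1}^{6} \frac{\ell^{\,j}_m}{d^{\,j}_m}\,\bigl(\phi_m-\phi_j\bigr),
\]

where \(\ell^{\,j}_m\) is the length of the dual (Voronoi) edge shared with neighbor m, \(d^{\,j}_m\) the distance between grid-node j and its neighbor m, and division by the Voronoi cell area \(A_j\) then approximates the Laplacian at node j; these symbols are introduced here for the sketch only.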

Fig. 78 Elemental hexagonal grid-mesh, showing the six nearest neighbors of a grid-node and the corresponding six dual-space grid-nodes, from which the Voronoi diagram element area (blue) is calculated. The superscript j was dropped in the diagram.

9.3.1. Closed flux surface region
The global j index that labels each grid-node in the mesh is obtained by counting the grid-nodes from the centre (magnetic axis) following an outward spiral in the counter-clockwise direction. In practice, to obtain j for a particular node on flux surface k, one just needs to start counting from the branch-cut (poloidally counter-clockwise) within that flux surface and then add the number of nodes belonging to the enclosed inner flux surfaces. The latter number is given by the formula 3k(k−1), noting that the origin (magnetic axis) is labelled as flux surface 1. For instance, for a case with four flux surfaces around the magnetic axis, this number is 3·4·(4−1) = 36, as shown in Fig. 79 in blue for the outermost vertices of the six main triangles. As for the local orientation of the six nearest neighbors of each grid-node (the m-index), the choice made sets the first neighbor to be the forward point (poloidally counter-clockwise) in the same flux surface, as illustrated by the numbering of the nodes highlighted in magenta in the three examples shown. This determines the column indices of the sparse Laplacian matrix elements which arise when the set of discrete equations is written in matrix form. Further invoking flux conservation properties, it can be shown that this matrix is also symmetric. The solution can then be found using the solver method of choice. More details on this topic can be found in Section 9.4.
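As a small sketch of this numbering rule (the 1-based conventions for the flux-surface label k and the poloidal count p are assumptions made here):

// Global index of a grid-node on the hexagonal mesh, following the rule
// described above: the 3k(k-1) nodes enclosed by the inner flux surfaces
// plus the poloidal position counted from the branch-cut.
int global_index(int k, int p) {
  // k: flux-surface label (magnetic axis labelled 1),
  // p: 1-based poloidal position on surface k, counted counter-clockwise
  //    from the branch-cut.
  const int enclosed = 3 * k * (k - 1);   // e.g. k = 4 -> 36, as quoted in the text
  return enclosed + p;
}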


Fig. 79 Illustration of the hexagonal grid currently used in GKMHD (yellow flux surfaces). The global numbering is shown. The local number ordering and orientation of the nearest-neighbor hexagons is also shown with a few examples (magenta hexagons and numbers).

9.3.2. Open flux surface region
In terms of extending the method across the magnetic separatrix, whose domain encompasses the surfaces represented in red in Fig. 80, the modification proposed here is one which implies minimal changes to the method just described. The first step is to define the branch-cut to pass through the magnetic axis and the X-point (green dot symbol), and to extend it into the private flux region, i.e. the region below the X-point. Then, an artificial boundary is set between the intersection point on the outermost considered flux surface in the private flux region and a location on each leg of the outermost SOL flux surface considered (yellow dot symbols); this connection is depicted in cyan in Fig. 80. Everything outside this domain is discarded. This means that the divertor itself is not included and will have to be modelled via physical boundary conditions (sheath models) on the regions corresponding to the cyan lines in future turbulence simulations based on this grid. On the other hand, this choice still keeps the necessary magnetic structure, namely the SOL region, the private flux region and the X-point, while maintaining the hexagonal topology of the closed flux surface grid. Because the domain topology remains hexagonal, the numbering and ordering method described before can be directly applied to the domain of the open flux surface region (in red). An example of the nearest-neighbor orientation is also given in Fig. 80 for the SOL region (magenta). However, precisely at that grid-node, where the flux surface bends towards the X-point, there might be a numerical issue that is not apparent from this schematic figure but that can already be explained. Namely, the fact that the three neighbors (0,1,6) all lie in the same flux surface means that the elemental triangle formed by them can be quite deformed for realistically shaped flux surfaces, with one of its angles much larger than the other two. This is shown for realistically shaped flux surfaces in sub-section 9.4.2.

The next step is to devise a test-case whose flux surfaces have the adequate topology. This is the subject covered in the remaining text, first restricted to the closed flux surface region, using a circular disk domain to check that the method is working properly, and then generalized to include the open flux surface region according to the scheme proposed here.

Fig. 80 Proposed extension of the grid indexing to the SOL region (red flux surfaces). The global numbering of the closed flux region (greyed out) is simply continued.

9.4. Test-case
In order to proceed with the modifications proposed in the previous section in a structured way, it is advantageous to devise a stand-alone test-case. This will ultimately provide a clean way to check such modifications before they are ready to be implemented in GKMHD's solver. Additionally, devising a test-case is in practice an appropriate way to become familiar with the numerical method at hand and its properties, without having to carry the additional overhead of the GKMHD code.

9.4.1. Closed flux surfaces
Starting with closed flux surfaces, which we already know how to implement, the test-case chosen consists of solving the Poisson equation on a circular disk, with radial Dirichlet boundary conditions. In polar coordinates (r,θ), it reads

(2) with a source term given by [2]

(3) where p is an integer free parameter. This system has an analytical solution, given by Eq. (4). Setting the free parameter to p=10 yields the functions depicted in Fig. 81.

Fig. 81 Source term (3) on the left and corresponding inverse Poisson problem’s analytical solution (4) on the right, for p=10.

The standard approach to address this problem would be to build a two-dimensional solver in polar coordinates. This was done in a stand-alone test-case code, where a pseudo-spectral method based on the work by Lai [3] was implemented. It uses central second-order finite differences in the radial direction and truncated Fourier series in the poloidal direction, yielding a standard tridiagonal matrix solver. To address the singularity of the polar coordinates at the radial origin (r=0), the equidistant radial grid mesh is shifted by half a step-size from the origin, which, due to an important cancellation property, yields a closed system of equations [3]. It shall be called Lai's method henceforth.

The same test-case, Eqs. (2)–(3), was also implemented as another stand-alone code using the solver of interest for the project, namely Sadourny's line integral method on hexagonal grids [1] used in the GKMHD code, which shall be referred to as Sadourny's method in the remainder of the text. The bulk of the effort in implementing Sadourny's method lies in establishing the relations between each grid-node and its six nearest neighbors, determined by the choice made in Sec. 9.3 for their local orientation. Further, the corresponding six ratios

(5) and the elemental dual-space hexagon area in Eq. (1) must also be calculated (recall Fig. 78). Here, an alternative way to do so has been developed and checked against the original method implemented in GKMHD. The new method has the advantage that it explicitly calculates the hexagonal element areas of the Voronoi diagram (dual-space) from the dual-space nodes, unlike the original one, which approximates them as 1/3 of the areas of the element hexagons made from the grid-nodes. In spite of this, no significant impact on the accuracy was observed, but the exercise served as a validation of the original approximation.

The last step in the process uses the previous information to recast the system of equations (1) into matrix form. Doing so yields a symmetric and sparse matrix, owing to the intrinsic conservation properties of the numerical method and its 7-point stencil, respectively. Its structure is depicted in Fig. 82. Following what is done in GKMHD, the IBM-WSMP library is invoked to invert this matrix and find the solution. In practice, for that to happen, the sparse Laplacian matrix at hand must first be converted into the compressed sparse row (CSR) format. This is a non-trivial task due to the involved global grid-node neighboring relations together with the symmetry of the matrix, which together yield a spatially dependent algorithmic stencil. This procedure was checked in GKMHD and implemented in the stand-alone Sadourny test-code. The code was further extended to alternatively allow the use of the PETSc library to solve the linear problem, which, although already implemented in GKMHD, was not working there. In this case the task of representing the matrix structure is easier since no explicit format conversion is needed. Of course, other methods could also be used for the purpose of inverting the sparse matrix; in particular, the popular multigrid method would be a strong candidate. This has in fact been assessed within the scope of previous HLST projects, e.g. MGTRIOP of K. S. Kang, so it shall not be addressed here.

Fig. 82 Representation of the symmetric sparse matrix yielded by Sadourny's method for the discrete Poisson problem with four flux surfaces (grid-mesh of Fig. 79) and Dirichlet boundary conditions.

The plots in Fig. 83 show the numerical discretization of the (continuous) source term (3) of Fig. 81 for each of the two solver methods, using the same poloidal resolution of 96 grid-nodes on the outer boundary contour. The adequacy of the chosen source term for the problem under study is noteworthy. It demands high resolution in the radial periphery of the domain, as is typically the case for tokamak turbulence simulations. This renders Lai's equidistant polar grid quite inefficient, since its grid resolution increases towards the centre as 1/r². Hence, to obtain the needed mesh resolution in the outer domain, one necessarily wastes resources in the inner region, which becomes over-resolved, as seen in the right plot of Fig. 83. The same does not happen for Sadourny's hexagonal grid, illustrated in the left plot of the same figure. This property is actually one of the main motivations for using the latter, which keeps the resolution fairly constant across the whole radial domain.


Fig. 83 Discrete representation of the source term (3) for p=10 on Sadourny's hexagonal grid (left, in red) and Lai's polar grid (right, in blue). Both grids have the same number of grid-nodes on the outer radial contour, namely 96, but it is clear that the total polar grid-count (1536) is much higher than its hexagonal counterpart (817), leading to unnecessary resolution on the more internal radial contours.
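The quoted grid-counts can be recovered by a short count, under the assumptions of one node at the magnetic axis and 6m nodes on the m-th hexagonal contour (so that the outermost, 16th contour carries the 96 boundary nodes) and of 16 radial points in the polar grid:

\[
N_{\mathrm{hex}} = 1+\sum_{m=1}^{16} 6m = 1+3\cdot16\cdot17 = 817,
\qquad
N_{\mathrm{polar}} = 16\times 96 = 1536 .
\]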

Fig. 84 Absolute difference (truncation error) between the numerical and analytical solutions (4) for Sadourny’s method (left) and Lai’s method (right), with p=10 on the same grid counts as before, namely, 817 and 1536, respectively. Despite this difference, both methods yield similar precisions, namely, maximum errors of 1.86e-3 (Sadourny) and 2.01e-3 (Lai).

Since the circular test-case possesses a non-trivial analytical solution, the comparison with the numerical solution provides an important direct check of the correctness of the implementation of the methods, in particular of the one employed in GKMHD. This is an important goal of the project TOPOX, as agreed with the project coordinator (Bruce Scott), given that this code is still in a development phase and has therefore not yet been extensively tested. Moreover, the existence of analytical solutions also provides a way to directly inspect the precision of the methods. The plots in Fig. 84 show the global truncation error, defined here as the absolute value of the difference between the analytical and numerical solutions, for the same test-case used before. One sees that, despite the factor of two smaller grid-count used for Sadourny's method, it yields a precision similar to Lai's counterpart. Moreover, if one repeats the exercise increasing the grid-count in Sadourny's method to a figure (1519 grid-nodes) similar to that used before with Lai's method (1536 grid-nodes), the maximum absolute error decreases by a factor of two, from 1.86e-3 to 9.49e-4.

9.4.2. Extension to include the open flux surface region
Having devised the extended numerical scheme to include the X-point and the open flux surface region using the rules briefly outlined in Sec. 9.3.2, we need to find an appropriate domain for our test-case. It should be simple enough to be treated analytically but at the same time mimic a diverted tokamak magnetic equilibrium, therefore including an X-point. Two such candidates have been devised for this purpose.

The former consists of constructing the simplest possible poloidal magnetic flux function made out of two adjacent regions matched continuously. One with a parabolic shape, to yield circular flux surfaces,

(6) with an O-point located at (R0,Z0), and the other with a hyperbolic shape and a saddle-point, to yield an X-point located at (RX,ZX) and the corresponding divergent flux surfaces

(7) where the remaining symbols are simply rescaling constants. The drawback of this approach is that, even though the poloidal flux function is continuous and continuously differentiable (class C1), its second derivative is discontinuous. As a consequence, its Laplacian is not smooth and therefore not well suited for our numerical test-case.

The latter candidate is a bit more involved but exhibits second order continuous differentiability (class C2), meaning that its Laplacian, i.e. the source term in Poisson's equation (2), is smooth. It further yields more realistic shapes of the flux contours, although at the cost of more effort to adapt it to the test-case. It is based on a magnetic island equilibrium description according to

(8) where the meaning of the coordinates is the same as before and the remaining symbols are simply rescaling constants. This function can be further modified by multiplying its terms with extra constants so as to (i) shift the flux surfaces radially outwards to some degree, to mimic the Shafranov shift, and/or (ii) yield two separatrices instead of a single one connecting both X-points, which is experimentally denominated a disconnected double-null configuration. Here we shall not explore these possibilities. Instead, we are interested in a single-null (single X-point) configuration, since it is the simplest configuration that includes all the needed ingredients. For that purpose an extra term was added to Eq. (8) according to

(9)

The hyperbolic tangent factor in this term allows for a smooth global function with a single saddle point (X-point), therefore keeping its C2 differentiability. This ensures the existence of an analytical Laplacian, which is crucial for the comparison with the numerical solution provided by the Poisson solver. The plot of such a poloidal flux function can be seen in Fig. 85.


Fig. 85 Poloidal flux function yielded by Eq. (9). The two cuts shown in pink and green illustrate the underlying shape of the flux surfaces in the core region and at the separatrix, respectively.

An example of the flux surfaces yielded by such an analytical poloidal flux function is shown in Fig. 86, on which an appropriate hexagonal mesh was constructed for Sadourny's scheme, i.e. observing the rules laid out in Section 9.3. The closed flux surface region was discretized just like in the test-case of sub-section 9.4.1, where the radial label of the grid contours coincides with the equilibrium flux label up to the discretization resolution, as illustrated by the black dashed lines in Fig. 86. For the open flux surface region in the SOL this is no longer the case, and the radial grid-contours cross different flux surfaces near the X-point.

Using an equidistant arc-length between grid-nodes along the field lines was found to minimize the local deformation of the elemental hexagons (made by each grid-node and its six nearest neighbors) that results from the flux surface shaping. Still, the grid-nodes that lie on the horizontal line that goes through the X-point will most probably pose a challenge to the numerical method. The reason is that those nodes and two of their consecutive nearest neighbors lie on the very same flux surface, which yields a triangle with three very disparate angles, therefore resulting in high local mesh deformation, which might even render the Voronoi diagram unfeasible. This is highlighted in Fig. 86 by the light red hexagons. Whether this issue impacts the numerical solution needs to be checked with numerical tests, for which the analytical solution of our extended test-case is very helpful. In any case, alternatives to the currently proposed solution can readily be devised, namely, to use a different criterion for the grid-node distribution, for instance based on setting limiting values for the angles of the elemental hexagons. Otherwise, one could resort to a different kind of approach, whereby the property that each grid-node has six nearest neighbors is locally relaxed, therefore breaking the current global hexagonal topology of the mesh.


Fig. 86 The flux surfaces yielded by Eq. (9) are shown in blue; the six main triangles of the discrete domain are represented by the red lines and the branch-cut by the green line. The radial grid contours are represented by the dashed lines. They coincide with the flux surface contours in the closed flux surface region but not in the open flux surface region, where they cross flux surfaces near the X-point. The grid-nodes appear as red dots and, together with their six nearest neighbors, form the discrete hexagonal grid shown by the dotted lines. Four grid-nodes are highlighted with their corresponding elemental hexagons shown in colour. The red ones near the X-point illustrate the deformation of the underlying elemental triangles.

The last step, extending the test-case code with the new mesh and solving the corresponding Poisson equation numerically, constitutes the method's proof of principle. Knowing the analytical solution (9), and therefore also the Dirichlet boundary conditions for the numerical solution, and knowing the source function given by the Laplacian of (9), allows the system (2) to be solved numerically using Sadourny's method. Implementation-wise, this task is mostly complete, although a subsequent debugging phase is required as the test-case is not yet working. However, as the end of the project was reached in the meantime, it was not possible to complete the extended test-case; its current status should nevertheless allow the project coordinator to complete it and test the proposed extended Poisson solver method.

9.5. Conclusions and outlook
The project TOPOX2 proceeded as planned, with a possible extension of the Poisson solver method used in the GKMHD code devised. This task encompassed several stages, which we briefly summarise now.

Sadourny's numerical method [1], used in the GKMHD code, was implemented in a stand-alone Poisson solver test-case on a circular disk, which was used to validate it against analytical solutions. To invert the resulting sparse matrix one can choose between the WSMP and the PETSc library. A standard pseudo-spectral solver was additionally implemented as an alternative way to find the numerical solutions. This further allowed for numerical comparisons between both methods.

In order to implement the extension to include the SOL region, several simplified analytical poloidal flux functions were devised, so as to yield a set of nested flux surfaces whose topology mimics that of a diverted tokamak equilibrium. The chosen candidate uses a magnetic island equilibrium description, with the poloidal flux modified by a hyperbolic tangent function so as to mimic a single-null magnetic topology while keeping its second order differentiability. This ensures the existence of an analytical Laplacian, which is crucial for the comparison with the numerical solution provided by the Poisson solver. A hexagonal mesh was constructed for this equilibrium according to the needs of Sadourny's scheme. Using an equidistant arc-length between grid-nodes along the field lines was found to minimize the local deformation of the elemental hexagons (made by each grid-node and its six nearest neighbors) caused by the flux surface shaping. Nevertheless, a potential issue related to local grid deformation was identified near the X-point. It could not be tested numerically, since the implementation of the extended test-case could not be completed before the end of the project was reached. Even though the code implementation was nearly finished, a debugging phase, which is potentially time consuming, is still needed.

Additionally, in case the mentioned local deformation issue proves insurmountable, new solutions have to be devised, either by using a different poloidal distribution of nodes along the mesh's main hexagons, or by locally relaxing the constraint that each grid-node has six nearest neighbors. A significant amount of time would be needed for this, rendering these activities beyond the scope of the current project. However, we believe that the current status of the test-case should allow the project coordinator to take over and proceed with testing the proposed scheme in its current form.

9.6. References
[1] R. Sadourny, A. Arakawa and Y. Mintz, Mon. Weather Rev. 96 (1968) 351
[2] Trach-Minh Tran, private communication (2013)
[3] M. C. Lai, Numer. Methods Partial Differential Equations 18 (2002) 56

10. Final report on HLST project VIRIATO
10.1. Introduction
This project aims at improving the scalability of the code VIRIATO, which uses a unique framework to solve a reduced (4D instead of the usual 5D) version of gyrokinetics applicable to strongly magnetized plasmas. VIRIATO is parallelized using domain decomposition with MPI over two directions of the configuration space domain. It is pseudo-spectral in the directions perpendicular to the magnetic field and employs a high order upwind scheme for discretizing the equations along the magnetic field. The time integration is done via an iterative predictor-corrector scheme. A distinguishing feature of the code is its use of a spectral representation (Hermite polynomials) to handle the velocity space dependence. This converts the drift-kinetic equation for the electrons into a coupled set of fluid-like equations, one for each coefficient of the Hermite polynomials. In terms of numerical accuracy this is rather advantageous: spectral representations are more powerful than grid-based ones, and this framework enables the deployment of the standard tools of CFD to deal with what would otherwise be a difficult kinetic equation. On the other hand, the scalability of the resulting algorithm, which makes extensive use of two-dimensional Fourier transforms, becomes a non-trivial problem, whose solution is the main goal of this project.

Fig. 87 Strong scaling of the original VIRIATO code with 16 velocity moments and a configuration space grid-count of 256³ (red) and 512³ (blue).

10.2. Code checking
Having received the code VIRIATO from the project coordinator (PC), the first steps involved getting acquainted with it at a basic level. The very first test was to verify that it could be compiled and run on the HELIOS machine. Upon request, a representative test-case was also provided by the developers' team, together with a set of quantities calculated by the code, to be used as a reference whenever changes are made to the source code. The next step involved using Forcheck to make static checks of the compliance of the source code with the FORTRAN standard. For that to work it was necessary to change VIRIATO to use the more modern MPI module instead of the 'mpif.h' header file, which allowed Forcheck to perform a complete argument-checking test. In doing so, an issue was found related to a mismatch between the 8-byte integer variables used throughout VIRIATO (-i8 compile flag) and the default-size integer variables (MPI handles) expected by the MPI library. A complete fix for this is still to be made. Until then, even though strongly discouraged since the introduction of the MPI-3.0 standard, the original implementation, which includes the 'mpif.h' header file, has to be selected at compile time by means of a pre-processor flag introduced for that purpose. Other smaller issues were found with Forcheck and fixed accordingly. Finally, the runtime diagnostic checks available in the Intel FORTRAN compiler were also invoked, with only some warnings being reported. Repeating the procedure with the Lahey/Fujitsu compiler is left for future work.
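As an illustration of the compile-time selection between the two interfaces, the following minimal sketch shows one possible arrangement (not the actual VIRIATO source; the macro name USE_MPIFH is hypothetical):

  ! Hypothetical sketch of the compile-time selection between the MPI module
  ! and the legacy 'mpif.h' header (the macro name USE_MPIFH is illustrative).
  subroutine init_parallel(rank, nproc)
  #ifndef USE_MPIFH
    use mpi                   ! modern module: enables argument checking
  #endif
    implicit none
  #ifdef USE_MPIFH
    include 'mpif.h'          ! legacy interface: no argument checking
  #endif
    integer, intent(out) :: rank, nproc
    integer :: ierr
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
  end subroutine init_parallel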

10.3. Code profiling
To assess the code performance and make a profile of its computational cost, the Allinea MAP software has been used. This provides an easy way to measure the time spent in each component of the code, including the measurement of CPU load and MPI communication costs. No changes to the source code are necessary; only compilation with a debug flag (-g) is required. Subsequently, in order to have a set of simplified measurements optionally available at the end of every production run, the source code was also manually instrumented using the performance library Perflib, developed by the Max Planck Computing and Data Facility (MPCDF). The results revealed, unsurprisingly, that the bi-dimensional fast Fourier transforms (2D FFTs) dominate the cost, representing at least half of the total simulation time for relevant production cases. For instance, Table 14 shows the profile of a typical medium-sized domain simulation running on 256 cores, where the 2D FFTs represent 70% of the total cost.

Table 14 Snippet of VIRIATO's code profiling using MPCDF's Perflib on a medium-sized domain of 256³ grid nodes in configuration space and ng=16 velocity moments, on 256 cores of HELIOS.

This algorithm involves the combination of two sets of one-dimensional (1D) FFTs of three- or four-dimensional (3D or 4D) data, interleaved with a data transposition, demanded by the non-locality of the floating point operations involved and by the fact that one of the dimensions to be Fourier transformed is parallelized. For this reason, a major fraction of the project time has been dedicated to this part of the algorithm: on the one hand, by experimenting with different methods to perform the data transposition; on the other hand, by attempting to use parallel threaded regions in this context, as explained in the next sections.
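For reference, the kind of manual instrumentation mentioned above can be sketched as follows; this uses plain MPI_Wtime as a generic stand-in and does not reproduce the actual Perflib interface:

  ! Minimal manual-instrumentation sketch using MPI_Wtime; a generic stand-in,
  ! not the Perflib API used for the actual measurements.
  program timing_sketch
    use mpi
    implicit none
    integer :: ierr, i
    double precision :: t0, t_kernel
    real :: x(1024)
    call MPI_Init(ierr)
    t_kernel = 0.0d0
    do i = 1, 100
      t0 = MPI_Wtime()
      call random_number(x)        ! placeholder for the 2D FFT kernel
      t_kernel = t_kernel + (MPI_Wtime() - t0)
    end do
    print '(a,f10.6,a)', 'accumulated time in kernel: ', t_kernel, ' s'
    call MPI_Finalize(ierr)
  end program timing_sketch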

10.4. 2D Fourier transforms and data transposition
VIRIATO is pseudo-spectral in the plane perpendicular to the magnetic field (x, y) and uses a spectral Hermite polynomial representation in velocity space (ng coordinate). It employs the package FFTW v2.1.5 to perform the corresponding FFTs. The domain is parallelized in two directions of the configuration space, namely the y- and z-directions. Therefore, the 2D FFTs must involve the transposition of data across a subset of the total MPI tasks (cores) in an all-to-all fashion. This operation poses a challenge for scalability on a large number of cores in the perpendicular plane (npe), the reason being that its complexity, i.e. the number of MPI messages required, grows as O(npe²).

Fig. 88 Illustration of the all-to-all communication pattern for the parallel transposition of a 2D dataset parallelized over four (left) and eight (right) cores.

This is clearly shown in Fig. 88, where the all-to-all communication patterns involved in transposing the same 2D dataset distributed over four or eight cores are illustrated. The algorithm currently used for this operation was assessed in detail, and it was immediately concluded that it could be prone to suffer from the accumulation of network communication latency. More efficient alternatives were therefore implemented.
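To make the communication pattern concrete, the following minimal sketch (not VIRIATO's actual transpose routine; the array sizes, names and packing are illustrative, and npe is assumed to divide the plane size) transposes one distributed xy-plane with a single MPI_Alltoall. Since every task exchanges one block with every other task, the total number of messages grows as O(npe²):

  ! Minimal sketch of an all-to-all transpose of one distributed xy-plane.
  ! Each of the npe tasks owns nloc columns of an n x n complex array and
  ! exchanges one block with every other task (npe**2 messages in total).
  program transpose_sketch
    use mpi
    implicit none
    integer, parameter :: n = 64
    integer :: ierr, rank, npe, nloc, blk
    complex(kind(1.0d0)), allocatable :: a(:,:), sendbuf(:), recvbuf(:)
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, npe, ierr)
    nloc = n / npe                      ! columns owned by this task (npe divides n)
    blk  = nloc * nloc                  ! block exchanged with each partner
    allocate(a(n, nloc), sendbuf(npe*blk), recvbuf(npe*blk))
    a = cmplx(rank, 0.0d0, kind(1.0d0)) ! dummy data
    call pack_blocks(a, sendbuf, n, nloc, npe)
    call MPI_Alltoall(sendbuf, blk, MPI_DOUBLE_COMPLEX, &
                      recvbuf, blk, MPI_DOUBLE_COMPLEX, MPI_COMM_WORLD, ierr)
    ! ... unpack recvbuf into the transposed layout and continue with the
    !     second set of 1D FFTs ...
    call MPI_Finalize(ierr)
  contains
    subroutine pack_blocks(a, buf, n, nloc, npe)
      integer, intent(in) :: n, nloc, npe
      complex(kind(1.0d0)), intent(in)  :: a(n, nloc)
      complex(kind(1.0d0)), intent(out) :: buf(npe*nloc*nloc)
      integer :: p, i, j, k
      k = 0
      do p = 0, npe - 1                 ! block destined for task p
        do j = 1, nloc
          do i = 1, nloc
            k = k + 1
            buf(k) = a(p*nloc + i, j)
          end do
        end do
      end do
    end subroutine pack_blocks
  end program transpose_sketch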

10.5. Reduced test-case and the new transpose algorithms
Since a significant part of the work was dedicated to the parallel 2D FFT operations that are needed both in the plane perpendicular to the magnetic field and in velocity space, a simplified test-case was devised to serve as a test-bed. It includes only the code parts responsible for the 2D FFT algorithm, therefore allowing their behavior to be studied without the overhead of executing the whole VIRIATO code. It further facilitates the development and implementation of alternative algorithms, providing a direct way to perform the extensive testing needed to ensure the correctness of the new tools before they are carried back into the main VIRIATO source code. Since both the original and the newly introduced methods are available in the test-case, comparing them in terms of parallel scalability becomes very easy. The devised test-case calculates a sequence of 2D FFTs, both in the forward and backward directions, on a 4D data-set, so as to allow comparing the final doubly transformed data-set with the original untransformed one. A sequence of convolutions is also implemented to mimic the method used in VIRIATO to calculate partial derivatives in configuration space, whereby a Fourier transformed quantity F is multiplied by -ikx and -iky and then the inverse 2D FFT is calculated to yield ∂f/∂x and ∂f/∂y, respectively. Below is the explicit formula for the former

∂f/∂x = FFT2D⁻¹[ −i kx F(kx, ky) ] .    (1)

The original VIRIATO 2D FFT algorithm has been implemented in the test-case code and then extended to minimize the accumulation of network latency for large numbers of cores, as explained below. Additionally, an alternative method was also implemented for the transpose step. It uses a sequence of MPI_Sendrecv point-to-point calls dictated by a bitwise exchange pattern computed using an exclusive-OR (XOR) logical operation between the MPI task ranks, together with an appropriate data indexing within each sub-domain [1].

The original 2D FFT algorithm applies the transpose to each xy-plane sequentially, for every grid node in the remaining one (z-direction) or two (z- and ng-directions) dimensions. While this maximizes flexibility, in the sense that the same subroutine can be called for all Fourier transforms of the code inside do-loops covering the indexes of the remaining (one or two) dimensions, it is also prone to network latency accumulation. To understand this, we need to recall Fig. 88 and the corresponding discussion. We have seen that, as the number of cores used in the perpendicular plane (npe) is doubled, the size of each MPI message to be communicated across cores is halved, but the total number of messages increases by a factor of four. As the size of the messages decreases, the latency cost, a hardware limitation that can be conceptually defined as the (finite) time needed to exchange a zero-sized message, starts to dominate the communication process. This means in practice that, for large enough npe, the transpose algorithm, independently of its quality, will be negatively affected by the network latency. This is essentially the reason for the degradation of the VIRIATO strong scaling shown in Fig. 87. Since latency is a fixed cost that depends on the total number of messages exchanged, i.e. on the number of cores used (npe), and not on their size, repeating the perpendicular xy-plane transpose for each of the grid nodes in the remaining dimensions simply increases this cost linearly. To avoid this accumulation, the data in the remaining dimensions can be aggregated so as to be carried within one single transposition. The transpose algorithm was extended to do so.
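A minimal sketch of the XOR exchange schedule is given below (illustrative only; the packing and unpacking of the actual data, as well as the aggregation over the z- and ng-dimensions, which simply enlarges the block exchanged with each partner, are omitted, and the number of tasks is assumed to be a power of two):

  ! Sketch of a pairwise exchange schedule based on the XOR of the task ranks:
  ! in step k every task exchanges one message with partner ieor(rank, k), so
  ! all npe-1 exchanges proceed pairwise and without contention.
  ! (Illustrative only; the data packing/unpacking is omitted.)
  program xor_transpose_sketch
    use mpi
    implicit none
    integer :: ierr, rank, npe, k, partner
    integer :: status(MPI_STATUS_SIZE)
    double precision :: sendblk(1024), recvblk(1024)
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, npe, ierr)   ! assumed a power of two
    sendblk = dble(rank)
    do k = 1, npe - 1
      partner = ieor(rank, k)          ! bitwise XOR exchange pattern
      call MPI_Sendrecv(sendblk, size(sendblk), MPI_DOUBLE_PRECISION, partner, 0, &
                        recvblk, size(recvblk), MPI_DOUBLE_PRECISION, partner, 0, &
                        MPI_COMM_WORLD, status, ierr)
      ! ... copy recvblk into the transposed array at the location
      !     determined by 'partner' ...
    end do
    call MPI_Finalize(ierr)
  end program xor_transpose_sketch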

Fig. 89 Strong scaling of the parallel 2D FFT algorithm on the same medium-sized domain as in Fig. 87. VIRIATO's original algorithm is shown in light-blue. The extended algorithm with the data aggregated over the z-dimension and over the z- and ng-dimensions is shown in yellow and red, respectively. The dark-blue curve shows the behavior of the whole algorithm when the alternative XOR-transpose is employed together with data aggregation over the z- and ng-dimensions.

The impact of these changes on the parallel performance was measured using the test-case described previously, and the results are shown in Fig. 89 for a grid count of 256³ with 16 velocity moments. A substantial difference can be observed, especially at higher numbers of cores (npe).

The light-blue line represents the strong scaling of the original 2D FFT algorithm of VIRIATO. It mimics fairly closely the scaling of the full VIRIATO code (Fig. 87), as expected, since this algorithm represents the bulk of the computational cost therein. Up to 1024 cores, the number of MPI tasks involved in the xy-plane transpose (npe) is kept fixed, while the number of MPI tasks in the z-dimension (npez) is increased from 8 to 128. This explains why the scaling is so good in this region: the amount of FFT computation per core decreases without increasing the complexity of the transpose's all-to-all communication (recall Fig. 88). That the scaling is even super-linear seems to be related to the overhead associated with the assignment of the buffers used for MPI communication, which for lower numbers of cores are relatively large. Because this overhead is accumulated for each grid node in the z- and ng-directions, doubling npez translates into halving it, an effect that comes on top of having more resources available to calculate the same amount of FFT operations.

The yellow line corresponds to the extended algorithm with data aggregation over the z-dimension. Even though the MPI buffers in this case are even bigger than before, and so is the associated MPI-buffer overhead, there is no accumulation with the z-grid count, simply because the whole z-direction is carried in "one go" when the xy-transpose is called. Therefore it costs much less computational time and shows the expected perfect linear scaling up to 1024 cores. As npez approaches its maximum (128) for the z-grid count used (256), both methods tend towards each other, as expected, because the number of z-grid nodes per sub-domain decreases to two, so that aggregation in this dimension loses most of its role. The red line uses the same algorithm, except that it further aggregates the data in the velocity-space dimension when the transpose is carried out. If, on the one hand, the MPI-buffer overhead is higher because the messages are ng=16 times bigger than in the yellow case, on the other hand its accumulation is decreased by the same factor, so that both effects should approximately cancel. In reality this is not exactly the case, at least for the domain size used here, as can be seen from the difference between the two scaling curves (red and yellow) at the lower npe numbers.

Beyond 1024 cores, indicated by the dashed vertical line, the increase in the number of MPI tasks corresponds to an increase in npe. This immediately degrades the scaling of all curves, as expected, because the complexity of the all-to-all communication pattern involved grows as O(npe²). That the light-blue and yellow curves show a qualitatively similar behavior (shape) in this region is clear, since we concluded before that for npez=128 they converge towards each other up to a factor of two. But why do their strong scalings completely break down? In other words, why does using more resources on the same problem size result in higher computational cost? The reason is exactly the one discussed at the beginning of this section, namely the accumulation of network latency. This limitation plays a role whenever the MPI message size is small enough that the cost of its transfer, dictated by the maximum network bandwidth, is comparable to or smaller than the latency.
As npe increases, the size of the MPI messages decreases, so latency becomes the dominant communication cost; worse, since the number of messages increases, so does the total latency cost. The difference between the light-blue or yellow curves and the remaining two, whose strong scaling still shows the expected monotonically decreasing shape, comes from the data aggregation over the moment direction performed in the latter. Hence, the latency price is paid only once in those cases, whereas it is multiplied by the number of moments kept in velocity space for the yellow curve (ng=16), and additionally by the z-grid count divided by npez for the light-blue curve (ng*Nz/npez=32). The difference between the red and dark-blue curves arises solely from the different methods used to establish and execute the data exchange pattern of the transpose. The latter, based on a XOR exchange pattern, seems to be slightly more efficient than the transpose used originally in VIRIATO. In terms of speedup between the latter (fastest) and the original method (light-blue), the figures are approximately 2x, 4x, 15x and 55x for 2048, 4096, 8192 and 16384 cores, respectively.
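The reasoning above can be condensed into a simple communication cost model (our compact restatement of the argument, not a formula taken from the measurements): T_transpose ≈ N_msg · (λ + S/B), where λ is the per-message latency, S the message size and B the network bandwidth. Repeating the xy-plane transpose for every remaining grid node multiplies N_msg, and hence the fixed λ contribution, by Nz/npez and additionally by ng when no aggregation is used, which is precisely the accumulation that the extended algorithms remove.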

10.6. Deployment to VIRIATO
Since the test-case introduced in the previous section was built within a framework fully consistent with the full VIRIATO, deploying the new tools back to the main VIRIATO code could be done relatively straightforwardly. Basically, the same modules used originally in VIRIATO were extended with the new 2D FFT algorithms. We have seen in Sec. 10.5 that the best result was obtained using the data-aggregation technique together with the XOR-transpose (dark-blue curve in Fig. 89). However, there is an important practical difference between this transpose algorithm and the original one used in VIRIATO: they differ in the way the Fourier transformed data in the x-direction are stored over the sub-domains. While the original transpose (independently of the data aggregation) stores the kx modes in a compact manner over the sub-domains, the XOR-transpose distributes them evenly across sub-domains. Obviously, this difference only comes about when the grid count in the kx-direction is not a multiple of the number of cores npe, but this is more often the case than not, due to the Hermitian redundancy of the kx-spectrum. Since the untransformed dataset is purely real, the result after the first 1D Fourier transform (in the x-direction) keeps Nx/2+1 non-redundant complex Fourier modes out of the Nx real numbers upon which it acts. So, for example, if we start with eight real numbers, the non-redundant spectrum has five kx modes. If the y-dimension is parallelized over npe=4 cores, then the dimension of the transposed dataset in the kx-direction (nkx_par) is given by the smallest integer not smaller than (Nx/2+1)/npe = 5/4, i.e. nkx_par=2. This is the example illustrated in Fig. 90, for the case with Ny=4.

Fig. 90 Illustration of the different storage of the radial Fourier modes over the sub-domains yielded by the original transpose (a) and new XOR-transpose methods (b).

The figure on the left side (a) represents how the kx-modes are stored over the available sub-domains when the original transpose method is used, and the figure on the right (b) shows its counterpart for the new XOR-transpose algorithm. The two storage schemes are equivalent from the performance point of view, but they do have practical consequences for VIRIATO, the most immediate one being that the two methods cannot be used interchangeably: one or the other has to be used everywhere in the code. The other issue to take into account is that, since the mapping from the kx-modes to their position within the sub-domains changes, all numerical operations which involve it directly must be adapted for the new XOR-transpose method. One such example is the calculation of the partial derivatives in the radial (x) direction, which is made using the convolution of the dataset whose derivative is being computed with the global kx-mode number, as in Eq. (1). We can see, for instance, that in case (a) kx=1 is stored in the sub-domain of core rank pe=0, whereas in case (b) it is stored in the sub-domain of core rank pe=1.
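To make the two mappings explicit, the following minimal sketch (our illustration, assuming a block layout for the original transpose and a round-robin layout for the XOR-transpose, consistent with the kx=1 example above; VIRIATO's actual index arithmetic may differ) prints which task owns each kx mode in both schemes for the Nx=8, npe=4 example:

  ! Sketch of the two kx-mode storage schemes discussed above, assuming
  ! a block (compact) layout for the original transpose and a round-robin
  ! (cyclic) layout for the XOR-transpose.  nkx = Nx/2 + 1 non-redundant modes.
  program kx_mapping_sketch
    implicit none
    integer, parameter :: Nx = 8, npe = 4
    integer :: nkx, nkx_par, kx, pe_block, loc_block, pe_cyc, loc_cyc
    nkx     = Nx/2 + 1                       ! 5 modes for Nx = 8
    nkx_par = (nkx + npe - 1) / npe          ! ceiling(nkx/npe) = 2
    do kx = 0, nkx - 1
      pe_block  = kx / nkx_par               ! compact layout: task owning kx
      loc_block = mod(kx, nkx_par) + 1       ! local index on that task
      pe_cyc    = mod(kx, npe)               ! cyclic layout: task owning kx
      loc_cyc   = kx / npe + 1               ! local index on that task
      print '(a,i2,a,i2,i3,a,i2,i3)', 'kx=', kx, '  block:(pe,loc)=', &
            pe_block, loc_block, '   cyclic:(pe,loc)=', pe_cyc, loc_cyc
    end do
  end program kx_mapping_sketch

With these parameters the sketch reproduces the example above: in the block layout kx=1 resides on task pe=0, whereas in the cyclic layout it resides on task pe=1.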

The new global kx-mapping has been devised and implemented in the test-case. However, the impossibility of mixing both 2D FFT methods renders the deployment of the XOR-transpose to the main VIRIATO source code much more involved and time consuming. Only after the full implementation is finished can correctness tests and debugging be started. Due to time constraints, it has been decided to leave this task for future work. Since the tools (test-case) are available, the full deployment of this method can be made by VIRIATO's developers later on, if they so decide.

Contrary to the XOR-transpose method, the extended version of the original 2D FFT method used in VIRIATO (red curve in Fig. 89) can be deployed progressively into the source code. This was done by changing, one by one, all the calls to the Fourier operations, replacing the original subroutines with overloaded versions which are selected based on the number of dimensions of the input arrays. These include FFT2d_direct, FFT2d_inv, Bracket, Convol2 and Funcgm. Further, in the module transforms, two new subroutines were introduced, namely init_redistribute3 and init_redistribute4. They are responsible for the extension of the transpose using the aggregation of data in the 3rd and in the 3rd+4th dimensions, respectively, to minimize the latency accumulation, as explained in Sec. 10.5. Because it is desirable to keep all the methods (the original one and the new alternatives) available, at least initially, it was decided together with the project coordinator to use pre-processing conditionals to select the method at compile time.
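To illustrate the overloading mechanism (a minimal sketch under the assumption that only the array rank differs between the variants; the actual argument lists of VIRIATO's routines are not reproduced here), the rank-based selection can be expressed with a Fortran generic interface:

  ! Sketch of rank-based overloading: the generic name resolves to the 3D or
  ! 4D variant at compile time depending on the rank of the input array.
  ! (Argument lists and bodies are illustrative, not those of VIRIATO.)
  module fft2d_overload_sketch
    implicit none
    interface FFT2d_direct
      module procedure fft2d_direct_3d, fft2d_direct_4d
    end interface
  contains
    subroutine fft2d_direct_3d(f)
      complex(kind(1.0d0)), intent(inout) :: f(:,:,:)
      ! ... 2D FFT with the transpose aggregated over the 3rd (z) dimension ...
    end subroutine fft2d_direct_3d
    subroutine fft2d_direct_4d(f)
      complex(kind(1.0d0)), intent(inout) :: f(:,:,:,:)
      ! ... 2D FFT with the transpose aggregated over the 3rd and 4th (z, ng) dimensions ...
    end subroutine fft2d_direct_4d
  end module fft2d_overload_sketch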

10.7. Strong scaling results of VIRIATO
Before analyzing the scaling results of the full VIRIATO code, a few words are in order to guide their interpretation. We need to observe that the best results obtained with the simplified test-case, shown in the strong scaling of Fig. 89, refer to a single 4D data-set being transformed several times in sequence. They represent an idealized situation. In the full VIRIATO code the scaling is necessarily worse, for two main reasons. First, there are several 3D quantities in the code, for which only the data aggregation in the z-direction is possible. In practice this implies that the strong scaling of the 2D FFTs of those quantities will degrade much earlier, because smaller messages are involved. Second, even for the 4D quantities in the code, there are regions where using the data aggregation in both the z- and ng-directions is disallowed by data dependencies (e.g. the calls to the Funcgm subroutine inside the P_Loop). Therefore, the light-blue (original 2D FFT method) and red (extended 2D FFT method with the transpose invoking data aggregation in the z- and ng-dimensions) curves in Fig. 89 provide an indication of the lower and upper bounds of the performance expected for the full VIRIATO strong scaling.

This is indeed what is observed in Fig. 91. The light-blue curve represents the scaling of the original VIRIATO code. It has the same shape as the blue curve depicted in Fig. 87. The yellow curve uses the data aggregation in the z-direction for the transposes. This can be done everywhere in VIRIATO, since all FFT calculations involve quantities which are at least 3D. The red curve corresponds to the same method, except for the coefficients of the Hermite polynomials in velocity space. Being 4D quantities, they allow in principle the data aggregation in the ng-direction to be invoked as well (on top of the aggregation in the z-direction) during the transposes. Unfortunately, inside the P_Loop region data dependencies prevent this: there, the calculation of each moment coefficient m depends on the coefficient m-1 calculated before, so only data aggregation in the z-direction is possible. As this operation represents a significant fraction of the total time cost of VIRIATO, it necessarily contributes to the deviation of the red curve in Fig. 91 compared to the one in Fig. 89.

In practice, even though the gains are more modest than in the isolated test-case, being able to extend the region where VIRIATO shows perfect linear scaling beyond the point where the degradation used to start (vertical dashed line) is very positive, especially considering that this tendency is expected to improve when more velocity moments (ngtot=256 instead of 16) are used, which is something the code developers of VIRIATO are aiming for. Such problem sizes demand higher numbers of cores, so the absolute gain in CPU hours yielded by the new version of the code is quite important.


Fig. 91 Strong scaling (top) and corresponding speedup for the full VIRIATO code on the same medium-sized domain as in Fig. 89. The same colour coding as in Fig. 89 is used here.

10.8. Strategy for the hybrid parallelization
VIRIATO employs a Hermite representation of the velocity space, meaning that instead of one kinetic equation the code solves a hierarchy of coupled fluid-like moment equations for the coefficients of the Hermite polynomials. The original proposal for the project was to split these amongst different cores using OpenMP parallel regions. With, say, N OpenMP threads, each of them would solve a subset M/N of the M moment equations. Such a parallelization scheme would need to be implemented "on top" of the spatial MPI parallelization already employed in the code, leading to a hybrid MPI/OpenMP parallel code. However, there are two significant issues with such a model. First, since each thread would need to issue its own MPI communication calls, the level of MPI thread-safety would have to be set to the highest available, namely MPI_THREAD_MULTIPLE, which is the most complex scenario for an OpenMP/MPI hybrid implementation. Second, at the time of writing of this report, this level is not yet supported by all MPI libraries, in particular by the BullxMPI available on HELIOS.

The solution proposed here avoids these issues by creating the parallel OpenMP forks in the y-direction of the domain instead, which is already parallelized with MPI. As each MPI task in this direction can use threads (up to 16 on HELIOS) to share the work within its local sub-domain, only the master thread needs to invoke the MPI communication required by the data transposition step. Hence, only the thread-safety level MPI_THREAD_FUNNELED is required from the MPI library. More importantly, having threads sharing the memory at the node level over this dimension can greatly reduce the complexity of the all-to-all communication involved in the 2D FFTs. The reason is that, since only the MPI tasks, and not the OpenMP threads, need to be involved in the MPI transpose communication, the number of messages can be significantly reduced: the all-to-all communication complexity drops from O(npe²) to O((npe/nthreads)²) [1]. For example, with four network channels available for inter-node communication and 16 cores per node on HELIOS, one can devise an ideal situation with four MPI tasks per node, each with four OpenMP threads. This would substantially reduce the complexity of the all-to-all communication, by reducing npe by a factor of four, while still having access to the full inter-node network bandwidth for the MPI communication. However, this approach implies code modifications in the remaining parts of VIRIATO, to ensure that the extra resources (threads) allocated to the FFT algorithm are also used elsewhere, so as to avoid having idle threads during runtime. This problem becomes even more pressing as the time spent in the FFT computation is expected to decrease as a result of the optimizations proposed here, making the remaining parts of the VIRIATO code relatively more costly. There is no technical impediment to doing this other than the man-power needed to implement such changes.

An alternative to the hybrid scheme just proposed can also be devised using two sub-communicators for the npe decomposition. In this scenario, only a subset of the npe tasks would be involved in the inter-node communication (transpose), while the remaining ones would be responsible for the intra-node data communication (MPI_Gather/Scatter) or would share the data via "shared memory segments" [1], which are now part of the MPI 3.0 standard. All of the npe tasks would in any case remain available for the floating point operations, and their number does not change compared to the original MPI domain decomposition, so the remaining code lines need not be changed as they would in the hybrid version.

To explore the possibility of implementing these schemes in the 2D FFT algorithm, another simplified test-case consistent with VIRIATO's framework has been devised to study their behavior in a controlled manner. Up to now, the effort has been focused on calculating the forward 1D FFT part of the algorithm using OpenMP threads. The transformation results obtained are correct but show some degree of irreproducibility in their performance scaling with the number of threads, for reasons not yet clear. More effort needs to be put into this issue, probably by using a more sophisticated performance analysis tool such as Intel VTune. Since the end of the project was reached before these tasks could be completed, the work had to be halted. However, guidelines have been provided to the developers of VIRIATO to allow them to further explore the proposed hybridization on their own.
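As a concrete illustration of the funneled scheme described above, the following minimal sketch (illustrative only; names and sizes are not taken from VIRIATO) lets the threads of each MPI task share the local FFT work while only the master thread takes part in the transpose communication, so that MPI_THREAD_FUNNELED suffices:

  ! Sketch of the MPI_THREAD_FUNNELED hybrid pattern: threads share the local
  ! FFT work, while only the master thread takes part in the all-to-all
  ! transpose communication.  (Illustrative, not VIRIATO code.)
  program funneled_sketch
    use mpi
    use omp_lib
    implicit none
    integer :: ierr, provided, rank, npe, blk
    double precision, allocatable :: sendbuf(:), recvbuf(:)
    call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, npe, ierr)
    blk = 1024
    allocate(sendbuf(npe*blk), recvbuf(npe*blk))
    sendbuf = dble(rank)
    !$omp parallel
      ! ... threads compute their share of the local 1D FFTs here ...
      !$omp barrier
      !$omp master
      call MPI_Alltoall(sendbuf, blk, MPI_DOUBLE_PRECISION, &
                        recvbuf, blk, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
      !$omp end master
      !$omp barrier
      ! ... threads continue with the second set of 1D FFTs on recvbuf ...
    !$omp end parallel
    call MPI_Finalize(ierr)
  end program funneled_sketch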

10.9. Visit to IPFN-IST Lisbon
A trip to the IPFN–IST association in Lisbon, where the PC and their team are based, was made between 05–25.07.2015. This was very fruitful, as it came during a period when important technical decisions needed to be made regarding the next steps of the project roadmap. For instance, it was then decided to put the emphasis on improving the transpose algorithms at hand, which, as reported here, led to significant gains in the overall parallel 2D FFT algorithm of VIRIATO. The visit further allowed the extent of the required changes to be explained to VIRIATO's developer team and discussed with them in real time as the first results were being achieved. This will greatly facilitate the task of handing the optimized VIRIATO code back to them, since they are already familiar to some extent with the changes being introduced.

10.10. Summary and outlook
The main goal of this proposal was to improve the parallel scalability of VIRIATO, and the work has been carried out successfully. Having initially profiled the code on a production-sized domain, it was confirmed that the 2D FFT algorithm constitutes a hot-spot of VIRIATO in terms of computational cost. The performance of VIRIATO's original version of this algorithm was assessed in detail and more efficient alternatives have been implemented in a test-case, mostly by modifying the parallel transpose algorithm involved in these operations. The main idea was to aggregate data before carrying out the all-to-all communication patterns, to avoid penalties due to network latency accumulation. Significant gains in performance have been achieved, in some situations reaching figures of more than 50x on sixteen thousand cores of HELIOS. The newly developed tools were deployed back into VIRIATO's source code. Even though the absolute gains obtained within the full code were, as expected, much more modest, they allowed VIRIATO to run the same problem with double the number of cores while still maintaining perfect linear strong scaling.

The remaining time of the project was dedicated to the hybridization of VIRIATO. Having a hybrid MPI/OpenMP parallel code is nowadays desirable, as it allows exploiting the current trend in computer architectures, where many cores have access to shared memory banks, reducing, or even completely removing, the need for explicit communication at that level. In practice, this involves implementing a fork-join parallel execution model, where threads are spawned in cost-intensive regions of the code. Within this project we restricted ourselves to the hybridization of the FFT algorithm, where the complexity of the underlying all-to-all communication can be greatly reduced [1], therefore allowing the scalability of VIRIATO to be extended further. To that end, another test-case has been devised to exploit this technique. As the project reached its term before these tasks could be finished, a set of guidelines was given to the developers of VIRIATO on how to proceed on this topic. It is also noteworthy that, once implemented, the hybridization concept has to be extended to the remaining computationally intensive parts of VIRIATO (besides the FFTs), otherwise the extra resources allocated for the threads would be idle during those calculations.

To avoid this limitation, but still take advantage of the node-level NUMA topology of current HPC machines to reduce the transpose communication complexity, another scheme using a pure MPI implementation can be devised. Using two MPI communicators instead of one for the y-domain decomposition (npe) can achieve the goal of splitting the intra-node from the inter-node communication. To this end, either the collectives MPI_Gather/Scatter can be used inside the node to collect or distribute the data before it is exchanged across nodes, or shared memory segments can be used to avoid this step, by allowing the tasks involved in the inter-node communication to access all of the data stored on their node. Since npe does not change compared to the original pure MPI domain decomposition, no changes are required in the rest of the code. Further pursuing these activities is left for future work.

10.11. References
[1] T.T. Ribeiro and M. Haefele, Parallel Computing: Accelerating Computational Science and Engineering (ParCo 2013 conference proceedings) 25 (2014) 415
