EUROFUSION WPISA-REP(16) 16215

R Hatzky et al.

HLST Core Team Report 2014

REPORT


HLST Core Team Report 2014

This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014–2018 under grant agreement No 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission.

Contents

1. Executive Summary
1.1. Progress made by each core team member on allocated projects
1.2. Further tasks and activities of the core team
1.2.1. Dissemination
1.2.2. Training
1.2.3. Internal training
1.2.4. Workshops & conferences
1.2.5. Publications
1.2.6. Meetings
2. Final report on HLST project PARSOLPS
2.1. Introduction
2.2. Parallel performance of the B2 code
2.2.1. Previous work
2.2.2. Benchmark of previous work
2.2.3. Improvements during the PARSOLPS project
2.2.4. Speedup of individual subroutines
2.3. Factors that influence parallel performance
2.3.1. OpenMP overhead
2.3.2. OpenMP reductions
2.3.3. OpenMP loops and data layout
2.3.4. Memory bottleneck
2.3.5. Memory bandwidth
2.3.6. Discussion of the parallel performance
2.4. Further optimizations
2.5. Unit testing and parallelization
2.6. Conclusions
2.7. Bibliography
3. Final report on HLST projects BEUIFERC/GOMIC
3.1. Xeon Phi architecture and micro-benchmarks
3.1.1. General architecture
3.1.2. Vector processing cores micro-benchmarks
3.1.3. Memory micro-benchmarks
3.1.4. Multi-MIC discussion
3.2. Porting Gysela proto-app on Xeon Phi
3.2.1. Benchmark database
3.2.2. First profiling
3.2.3. Increasing the number of threads for the Xeon Phi
3.2.4. Building yet another reduced version of the code
3.2.5. Implementation details on vectorization
3.2.6. Bottom up approach
3.3. Conclusions
3.4. References
4. Final Report on HLST project KSOL2D-3
4.1. Introduction
4.2. Model problem
4.3. Multigrid method with a gathering level
4.3.1. FDM vs. FEM
4.3.2. Weak scaling for FEM
4.4. FETI-DP method
4.4.1. Results of the FETI-DP method
4.5. Conclusions
5. Final Report on HLST project HIMAT
5.1. Introduction
5.2. Basic structure of an H-matrix
5.3. Numerical experiments for 2D
5.3.1. Geometric bisection clustering method
5.4. Numerical experiments for 3D
5.4.1. Geometric bisection clustering
5.4.2. Nested dissection method
5.4.3. Memory consumption of the geometric bisection clustering and nested dissection method
5.5. Conclusions
6. Final Report on HLST project ITM-ADIOS2
6.1. Introduction
6.2. Aspects of the HELIOS file systems
6.3. Serial I/O Performance Assessment on HELIOS
6.4. I/O Performance for a trivial parallel job
6.5. Possible usage of /tmp for fast I/O
6.6. Parallel I/O Performance Assessment on HELIOS
6.7. ADIOS 1.5 and staging
6.8. I/O needs of selected ITM codes
6.9. A lightweight module “HLST-ADIOS-CHECKPOINT”
6.10. Performance aspects of HAC
6.11. ADIOS-1.5 on the EFDA Gateway
6.12. Possible Future Work
6.13. Conclusions
6.14. References
7. Final report on HLST project PARFS2
7.1. Introduction and initial project status
7.2. Overview of project activities
7.3. Time consumption in HYMAGYC
7.4. ScaLAPACK and MUMPS results on HELIOS, first tests
7.5. MUMPS and ScaLAPACK extended tests
7.6. Visit to the Project Coordinator
7.7. Conclusions
7.8. Outlook on possible future work
7.9. Appendix: Miscellaneous tests
7.9.1. WSMP Testing
7.9.2. ScaLAPACK scaling at 318 but not at 16 MPI tasks
7.9.3. MUMPS + Parallel Reordering Insufficient Workspace Error
7.10. References
8. Final Report on HLST project FWTOR
8.1. Introduction
8.2. Code acceptance and outline
8.3. Workload model and relevant metrics
8.4. Serial code run times and limits
8.5. Low code impact changes
8.6. Major code impact changes
8.7. Visit to the Project Investigators
8.8. On checkpoint / restart
8.9. Conclusions and outlook
9. Final report on HLST project REFMUL2P
9.1. Introduction
9.1. Serial code profiling and optimization
9.2. Parallelization strategy
9.3. Threaded parallelization: OpenMP
9.4. Multi-task parallelization: MPI
9.4.1. Intra-node scaling: MPI vs. OpenMP
9.4.2. Inter-node scaling
9.5. Hybrid parallelization: MPI/OpenMP
9.6. REFMULXp extended applicability
9.7. The roofline model on Intel Sandy Bridge
9.8. Conclusions and outlook
9.9. References
10. Report on HLST project TOPOX2
10.1. Introduction
10.2. GKMHD code
10.3. Numerical method
10.3.1. Closed flux surfaces
10.3.2. Open flux surfaces
10.4. Test-case
10.4.1. Closed flux surfaces
10.4.2. Open flux surfaces
10.5. Conclusions and outlook
10.6. References


1. Executive Summary

1.1. Progress made by each core team member on allocated projects

In agreement with the HLST PMU responsible officer, the individual core team members have been working, or are still working, on the projects listed in Table 1.

Project acronym   Core team member    Status
ITM-ADIOS2        Michele Martone     finished
BEUIFERC          Matthieu Haefele    finished
FWTOR             Michele Martone     finished
GOMIC             Matthieu Haefele    finished
HIMAT             Kab Seok Kang       finished
KSOL2D-3          Kab Seok Kang       finished
PARFS2            Michele Martone     finished
PARSOLPS          Tamás Fehér         finished
PARSUM4C          Roman Hatzky        finished
REFMUL2P          Tiago Ribeiro       finished
TOPOX2            Tiago Ribeiro       running

Table 1 Projects distributed to the HLST core team members.

Roman Hatzky has been involved in the support of the European users on the IFERC-CSC computer. Furthermore, he was occupied with management and dissemination tasks due to his position as core team leader. In addition, he was in charge of the PARSUM4C project and gave assistance in optimizing the Cryogenic Circuit Conductor and Coil code. Unfortunately, the involvement of the project proponents was moderate, so that hardly any progress could be achieved.

Tamás Fehér worked on the PARSOLPS project. The SOLPS code package is widely used to simulate Scrape-Off Layer (SOL) plasmas; two main components of this package are the B2 and EIRENE codes. B2 is a plasma fluid code for simulating edge plasmas and EIRENE is a kinetic Monte-Carlo code for describing neutral particles. While EIRENE is parallelized with MPI, the B2 code is only partially parallelized using OpenMP. The aim of the PARSOLPS project is to speed up the B2 code and consequently also the coupled B2-EIRENE system. According to Amdahl’s law, the maximum achievable speedup is determined by the parallel fraction of the code. In the frame of the PARSOLPS project, this fraction was significantly increased in the B2 code. The difficulty was that there are many subroutines which each take only a small share of the CPU time. More than 20 subroutines have been parallelized, and a parallel fraction of 91% has been reached for the whole code. With these changes a speedup of a factor of six can be achieved for the ITER test case when executed on one HELIOS computer node. The execution time scales well for most of the subroutines as the number of threads is increased. The OpenMP regions were introduced in a way that avoids large overheads. It was identified that a large part of the code is memory bandwidth limited. Several of the subroutines were optimized to reach a speedup which is close to the bandwidth limit. The individual subroutines have been tested separately to ensure the correctness of the modified code and to measure the speedup. To aid this work, an automatic unit test generation framework has been created.


The project continues in the next year by investigating distributed memory parallelism and coupling the improved version of B2 with the EIRENE code.

Matthieu Haefele worked on the combined BEUIFERC/GOMIC project. The Intel Xeon Phi, the first processor with the Many Integrated Cores (MIC) architecture, exhibits a peak compute performance up to a factor of six, and a peak memory bandwidth up to a factor of four, larger than the Intel Sandy Bridge processors used in HELIOS. The MIC processor is packaged as an accelerator and shares with GPU accelerators the so-called offload programming model. It consists of running the application on the normal processor (the host) while offloading the most computationally demanding parts onto the accelerator. In addition, the Xeon Phi offers a “native mode” (not available on GPUs) which allows deploying MPI applications on the Xeon Phi processor in the same way as on a standard Xeon processor. The tool chain to build an application for the Xeon Phi processor is exactly the same as for the Sandy Bridge processor. Thus, porting an application to the Xeon Phi was found to be relatively trivial compared to porting an application to GPUs. However, the network performance was found to be very inhomogeneous, which was confirmed on two different MIC clusters. In the worst case the data bandwidth is one order of magnitude smaller than on a standard Sandy Bridge partition. Initially, the objective was to port the full proto-app application to the MIC partition and then to improve its performance in a succeeding step. For this we only had to focus on the most significant part of the application: the 4D interpolation. We further simplified matters by exploring a new kind of data structure in yet another, smaller application: proto_advec. Even there, we could not get better performance on the Xeon Phi than on the Sandy Bridge. However, the project coordinator, G. Latu, managed on his side to achieve good performance on a 2D interpolation kernel. The price he had to pay was to dive into very fine architectural details of the Xeon Phi and to understand the assembly code to diagnose various issues. The goal of building a well performing 4D interpolation kernel is still an open issue and shall be tackled in collaboration with the European Intel Exascale Labs. In conclusion, porting applications to the Xeon Phi is fairly easy, but getting performance on this architecture still requires a lot of effort and skills in low-level programming.

Kab Seok Kang worked on the KSOL2D-3 and HIMAT projects. For the KSOL2D-3 project we have modified the data structure of the original multigrid method in KinSOL2D to make it possible to reduce the number of cores involved in solving the coarsest level to just a single core. This has been done both for the original finite difference scheme (FDM) and for a new implementation using finite elements (FEM). Although the matrix-vector multiplication is more costly for the FEM than for the FDM, this is easily compensated by the smaller number of iterations needed by the iterative solver. As a result the solution time of the multigrid solver is shorter for the FEM than for the FDM. In addition, the performance of the multigrid solver is further improved by gathering the data at a certain coarser level. This helps especially when a small number of degrees of freedom (DoF) per core is used. In addition, we have implemented the FETI-DP method with a finite element discretization for the SOL geometry. An implementation of the FDM for the FETI-DP method seems not to be appropriate, as it does not have as solid theoretical foundations as the FEM. The results of the FETI-DP method showed that it suffers from some limitations. As soon as the subdomains are not square-shaped, the performance drops significantly. Also the presence of the inner empty space has an impact. The strong scaling for a domain with an inner empty structure showed a less favorable scaling than for the full domain. Thus, the FETI-DP method does not seem to be a good candidate for our model problem. The multigrid method in its enhanced form, with gathering of the data at a certain coarser level and in combination with the FEM, seems to be a good choice either as a solver or as a preconditioner.

For the HIMAT project we used the HLib library to get a framework for H-matrix approximations. Unfortunately, the library is not parallelized so far, which limits the size of the test cases that can be investigated. As test cases we used the Poisson problem discretized with the Finite Difference Method (FDM), the linear, bilinear, and trilinear Finite Element Method (FEM) without overlapping elements, and the cubic B-spline FEM with overlapping elements. In both 2D and 3D we used a geometric bisection clustering and in 3D, in addition, a domain-decomposition clustering. Wherever possible, we tried to use the H-matrix format to approximate the inverse matrix and the Cholesky (or LDL) decomposition, which we both tested as a preconditioner for the GMRES iterative solver. Especially for the Cholesky and LDL decomposition we could achieve very good results in terms of memory consumption and speed for the FDM and the linear, bilinear, and trilinear FEMs. The larger the number of DoF, the smaller the memory consumption becomes in the case of the H-matrix format compared to the original one. This is even more pronounced for the 3D case compared to the 2D case. For the inverse matrix in H-matrix format a similar behavior can be observed, although less pronounced. Nevertheless, the situation is not that positive when it comes to the Cholesky and LDL decomposition for the cubic B-spline FEM with overlapping elements. For this case the domain-decomposition clustering does not work due to the overlap. Unfortunately, the geometric bisection clustering also fails for larger numbers of DoF. Therefore, further investigation has to be done on this issue.

Michele Martone worked on the ITM-ADIOS2, PARFS2 and FWTOR projects. The primary goal of the ITM-ADIOS2 project is to assess the publicly available “ADIOS” software library in terms of obtaining better I/O efficiency. A reasonably simple and robust method for using ADIOS-1.5 efficiently on HELIOS has been established. A relevant subset of ADIOS features has been studied, and the I/O performance properties on HELIOS have been characterized. We have worked on transferring our knowledge and experience to major ITM codes, and on tuning our solution to their needs. The solution is easily usable in the form of a Fortran module and on reasonably sized runs it allows users to fully exploit the I/O capabilities of the HELIOS machine. A webinar about the topic was given. In the PARFS2 project, the integration of the PARFS code into the main trunk of HYMAGYC has been completed, with the decisive contribution of the HYMAGYC project coordinator Gregorio Vlad. HYMAGYC can now be seamlessly used by its authors in parallel runs on the largest cases they have. Our investigation of the solver options has continued further with MUMPS and ScaLAPACK. The best solver we are aware of on HELIOS is MUMPS when used with Intel MPI and the PT-Scotch reordering option. This configuration seems to be asymptotically better than the ScaLAPACK one by a factor of circa 5–6x in time and by ≈ 3.5x in terms of memory. Regarding the MUMPS usage, we have determined how to avoid a slow serial bottleneck in the factorization phase, improving it by a factor of 10x to 20x (from over 600 s to circa 30 s). The new MUMPS options configuration also provides a more balanced memory usage than previously with PORD, which is important for the usage of HYMAGYC coupled to the PIC module.

FWTOR is a full-wave code solving Maxwell's equations for the propagation and absorption of electromagnetic wave beams in tokamak plasmas using the FDTD method. Most of the execution time is spent in the FDTD iterations, which at each time step update a 2D grid via several loops. The maximal case on our (HELIOS-like) test machine would currently complete in ≈ 826 days. We thus propose the following improvements: 1) OpenMP parallelism in the main loops, possible once certain subroutines are made thread safe; since the code is compute bound, this can yield up to a 16x speedup. 2) Low impact serial optimizations, through loop interchange and caching of reciprocal values and transcendental function results. 3) More demanding optimization changes, which may bring a further ≈ 2x speedup by avoiding innermost branches and the recomputation of certain values. We have obtained a prototype implementing 1) and 2). This version does not disrupt the original code structure and can be readily integrated and used. On a 16-threaded node the improved version, which additionally includes some basic optimizations, runs 21 times as fast as the original. Optimizations in 3) may further reduce this time. To get further speedup and to enable larger cases (e.g. for ITER), a distributed memory version is necessary, which will be addressed in a succeeding project.

Tiago Ribeiro worked on the REFMUL2P and TOPOX2 projects. Finite-difference time-domain (FDTD) codes are among the most popular numerical techniques for simulating x-mode reflectometry. However, the simulations can become quite demanding. The REFMUL2P project and its predecessor (REFMULXP) were devised to address this by implementing a parallel version of the x-mode code REFMULX. The code was first profiled, revealing the numerical kernel as the cost-wise hot-spot, as expected. Then the single-core performance of these code regions was improved by restricting the scope of the pointers therein to be non-aliased and by rewriting the corresponding numerical expressions so as to minimize the number of floating-point operations inside the loops. Both modifications together aided the compiler's optimization tasks and resulted in a speedup factor of eight. The development of the parallelization strategy followed, with the OpenMP threaded version implemented first. Then the full MPI task parallelization was made, which included the domain decomposition and the inter-task communication. This required significant, but necessary, changes to the source code. Finally, both paradigms were combined to yield a hybrid MPI/OpenMP version. The scaling measurements made within a single node of HELIOS showed similar behavior for the pure threaded and pure multi-tasked codes, provided that the process affinity was managed optimally. A maximum speedup of 50 was obtained. For inter-node scaling studies the pure MPI and hybrid parallel codes were used. They showed similar behavior up to 512 cores with fixed problem size. The speedup obtained was over 400, meaning that a run which originally took over four hours to finish could run in half a minute. Using higher numbers of cores required bigger problem sizes. Therefore, a weak scaling study was made, where the problem size increased proportionally to the core count. The scaling was made up to 4096 cores on HELIOS with a problem size 256 times bigger than the original. The exercise served as a basic proof-of-principle for the applicability of the parallelization techniques developed here to a planned three-dimensional code that will employ much bigger grid counts. It further became clear that parallel I/O must be addressed at some point; this is beyond the scope of this project, but provides a useful hint for future developments.

Project TOPOX2 was devised to extend a Poisson solver scheme based on a method developed by Sadourny et al. The scheme is currently used in the Grad-Shafranov equilibrium solver GKMHD, built on a triangular grid in RZ-space, with the points arranged along closed flux surfaces that are topologically treated as hexagons. The goal of the project is to extend the scheme beyond the magnetic separatrix into the SOL, including the X-point. The investment required to get acquainted with the underlying algorithm was made during its predecessor project (TOPOX). Within that scope, an appropriate stand-alone test case was developed and implemented using both Sadourny's method with the IBM-WSMP library for the matrix inversion and a standard finite-difference pseudo-spectral solver coded by hand. Their comparison demonstrated both the correctness of the implementation of the former, as well as its advantages in terms of discretization efficiency. TOPOX2 focused on the second part of the work, namely devising the extension of Sadourny's method beyond the separatrix. The solution found consists in setting an artificial boundary near the X-point region such that the outermost SOL flux surface can be treated topologically as a hexagon. Even though the divertor itself is not included in this approach, the necessary magnetic structure, namely the SOL, the private flux region and the X-point, is kept together with the hexagonal topology of the closed field line region grid. This means that Sadourny's method can be carried directly to the open field-line region with only minor modifications. Currently the effort of implementing the corresponding extended test case is ongoing.


1.2. Further tasks and activities of the core team

1.2.1. Dissemination

Fehér, T.: OpenMP parallelization of the B2 code, WPCD (Integrated Modelling work package) SOLPS Optimization Meeting, IPP, 10th – 12th December 2014, Garching, Germany.

Haefele, M.: HLST experience with MIC, IFERC-CSC HELIOS MIC Workshop, 14th May 2014, Maison de la Simulation, Gif-sur-Yvette, France.

Hatzky, R.: The experience of the High Level Support Team (HLST), 2nd IFERC-CSC Review Meeting, 17th March 2014, Rokkasho, Japan.

Hatzky, R.: The High Level Support Team, IFERC-CSC HELIOS MIC Workshop, 14th May 2014, Maison de la Simulation, Gif-sur-Yvette, France.

Hatzky, R.: The High Level Support Team — A support unit for petaflop computing, Platform for Advanced Scientific Computing (PASC), 2nd – 3rd June 2014, ETH Zurich, Zurich, Switzerland.

Hatzky, R.: The High Level Support Team + Selected projects, EUROfusion – HLST review panel, 16th September 2014, IPP, Garching, Germany.

Hatzky, R.: The High Level Support Team, 6th IFERC-CSC STanding Committee (STC) Meeting, IPP, 9th October 2014, Garching, Germany.

M. Martone has given a seminar talk at the department of computational plasma physics of IPP: Auto-tuning shared memory parallel Sparse BLAS operations with a recursive matrix layout, 17th June 2014, IPP, Garching, Germany.

T. Ribeiro has given a seminar talk at the department of computational plasma physics of IPP: Parallelization of the X-mode reflectometry full-wave code REFMULX, 28th October 2014, IPP, Garching, Germany.

The HLST core team has attended: 6th IFERC-CSC STanding Committee (STC) Meeting, IPP, 9th October 2014, Garching, Germany.

1.2.2. Training

Haefele, M.: Parallel file systems and parallel IO libraries, PRACE PATC training, 13th – 14th February 2014, Maison de la Simulation, Gif-sur-Yvette, France.

M. Martone has given a webinar for IFERC-CSC HELIOS users: Easy and fast checkpoint-restart on HELIOS, 18th June 2014, IPP, Garching, Germany.

Martone, M. has visited:
• PARFS2 project coordinator Gregorio Vlad, 14th – 18th July 2014, ENEA Frascati, Italy.
• FWTOR project coordinator Christos Tsironis, 3rd – 7th November 2014, National Technical University of Athens, Greece.

1.2.3. Internal training

The HLST core team has attended:
• IFERC-CSC Intel MIC Training, 26th – 27th February 2014, IPP, Garching, Germany.
• HLST meeting at IPP, 1st April 2014, Garching, Germany.
• HLST meeting, IPP, 8th October 2014, Garching, Germany.

K.S. Kang has attended: Winter School on Hierarchical Matrices, 24th – 27th February 2014, Max-Planck- Institute for Mathematics in the Sciences, Leipzig, Germany.

T. Fehér, M. Haefele, R. Hatzky, M. Martone, and T. Ribeiro have attended: Bull Training for IFERC-CSC HELIOS MIC training, IPP, 26th – 27th February 2014, Garching, Germany.

T. Fehér, M. Haefele, R. Hatzky, K.S. Kang, and T. Ribeiro have attended: Verification of kinetic and gyrokinetic codes, 8th – 10th April 2014, IPP, Garching, Germany.

M. Haefele and K.S. Kang have attended: PRACE PATC course: Intel MIC and GPU programming workshop, 28th – 30th April 2014, LRZ, Garching, Germany.

Tiago Ribeiro attended: HLRS Course: Iterative Linear Solvers and Parallelization (Iterative Gleichungssystemlöser und Parallelisierung), 15th – 19th September 2014, Leibniz Rechenzentrum (LRZ), Garching, Germany.

Tamás Fehér has attended: IPCC (Intel Center) Intel Xeon Phi Coprocessor Workshop, 20th – 21st October 2014, Leibniz Supercomputing Centre (LRZ), Garching, Germany.

Michele Martone and Tiago Ribeiro have attended: PRACE PATC Course: Node-Level Performance Engineering, 4th – 5th December 2014, Leibniz Supercomputing Centre (LRZ), Garching, Germany.

1.2.4. Workshops & conferences

Fehér, T.: OpenMP parallelization of the B2 code, IPP Theory Workshop, 24th – 28th November 2014, Ringberg, Germany.

Hatzky, R.: Reduction of the statistical error in electromagnetic PIC simulations, IPP Theory Workshop, 24th – 28th November 2014, Ringberg, Germany.


Hatzky, R.: Reduction of the statistical error in electromagnetic PIC simulations, Electromagnetic gyrokinetic PIC simulation workshop, 9th – 11th December 2014, PPPL, Princeton, USA.

Kang, K.S.: Scalable implementation of the parallel multigrid method on massively parallel computers, International Conference on Numerical Methods for Scientific Computations and Advanced Applications (NMSCAA'14), 19th – 22nd May 2014, Bansko, Bulgaria.

Kang, K.S.: Multigrid method on Intel Xeon Phi (MIC), European Multigrid Conference (EMG 2014), September 9th – 12th 2014, Leuven, Belgium.

Martone, M.: Auto-tuning shared memory parallel sparse BLAS operations with a recursive matrix layout, 8th International Workshop on Parallel Matrix Algorithms and Applications (PMAA14), July 2nd – 4th 2014, Università della Svizzera italiana, Lugano, Switzerland.

Ribeiro, T., da Silva, F., Heuraux, S., and Scott B.: Synthetic reflectometry probing of gyrofluid edge turbulence, 17th International Congress on Plasma Physics (ICPP 2014), 15th – 19th September 2014, Lisbon, Portugal.

1.2.5. Publications

Kang, K.S.: The parallel multigrid and domain decomposition methods for an elliptic problem on a hexagonal domain with structured triangular meshes, IPP report 19/3, Max Planck Society, MPG.PuRe, [online], 2014.

Martone, M.: Cache and Energy Efficiency of Sparse Matrix-Vector Multiplication for Different BLAS Numerical Types with the RSB Format, In: M. Bader, A. Bode, H.-J. Bungartz, M. Gerndt, G.R. Joubert, F. Peters (eds.): Parallel Computing: Accelerating Computational Science and Engineering (CSE), p. 193–202, 2014.

Ribeiro, T., Haefele, M.: NEMORB’s Fourier Filter and Distributed Matrix Transposition on Petaflop Systems, In: M. Bader, A. Bode, H.-J. Bungartz, M. Gerndt, G.R. Joubert, F. Peters (eds.): Parallel Computing: Accelerating Computational Science and Engineering (CSE), p. 415–426, 2014.

1.2.6. Meetings

Roman Hatzky attended on a regular basis:
• CSC management meeting
• CSC European ticket meeting


2. Final report on HLST project PARSOLPS

2.1. Introduction

The Scrape-Off Layer (SOL) is directly related to the heat exhaust in tokamaks; therefore, understanding the physics in this layer is crucial for improving the performance of present and future fusion devices. The SOLPS code package is widely used to simulate SOL plasmas. Numerical modelling plays an important role in understanding the basic physical concepts, interpreting results from experiments and making predictions for future experiments. SOLPS is a collection of several codes; two main components are the B2 and EIRENE codes. B2 is a plasma fluid code for simulating edge plasmas and EIRENE is a kinetic Monte-Carlo code for describing neutral particles. While EIRENE is parallelized with MPI, the B2 code is only partially parallelized using OpenMP. Due to the limited parallelization of the code, it can take 1–12 weeks for ITER or DEMO simulations to converge. The aim of the PARSOLPS project is to reduce the execution time of B2 and of the coupled B2-EIRENE system. In the frame of the PARSOLPS project, the fraction of parallel regions was significantly increased in the B2 code. In this report, we discuss the changes in the code and the achieved speedup. There are several factors that can influence the performance and limit the maximum speedup of the B2 code; these factors are also investigated. The work was carried out in collaboration with Lorenz Hüdepohl from the Rechenzentrum Garching.

2.2. Parallel performance of the B2 code

2.2.1. Previous work

The development of the B2 code was started in the 1980s by B.J. Braams (Braams, 1987). Since then the physical model in the code has been continuously extended, and consequently, the numerical implementation has also become more complicated. Now the source code of the main executable of B2 consists of around 75k lines distributed over 287 source files, in which altogether 533 functions and subroutines are defined. In contrast to the developments in the physical model, the basic numerical scheme remained the same, the so-called SIMPLE method (Patankar, 1980). In this method, the nonlinear equations are approximated by small linear systems that are solved using LU decomposition, and the nonlinear solution is sought iteratively. Part of the B2 code was already parallelized in the framework of an EUFORIA project (Reid, 2010). In that work, some of the species loops were parallelized using OpenMP directives. Around a factor of two improvement was reported for one of the ASDEX Upgrade test cases, and a 39% improvement for a tungsten ITER simulation with 92 species. The performance was limited by the fact that large parts of the code were still serial.

2.2.2. Benchmark of previous work

We use the SOLPS 5.0 version from the IPP repository and build on top of the previous parallelization work. We study a similar ITER test case, with 98 particle species (all ionization stages of D, T, He, Be, Ne and W). At the start of the project the performance of the existing parallelization was measured; it is shown in Fig. 1. The blue squares show the speedup of B2 (strong scaling) measured on the RZG Hydra supercomputer, using an Ivy-Bridge node. The limited speedup can be explained by Amdahl’s law. Consider that a part of the code can be perfectly parallelized, while other parts of the code are strictly serial. Suppose that during serial execution the parallel regions take the fraction p (≤ 1) of the total execution time, and the serial parts take 1-p. Then the theoretical maximum speedup using n parallel threads is S(n) = 1 / ((1 - p) + p/n). The parallelized fraction of B2 in this case is 46 percent, and the theoretical estimate is in good agreement with our measurements. Fig. 1 shows the theoretical maximum for other parallel fractions as well. We can see that if we want to utilize a full node effectively, then a large part of the code has to be parallelized.
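For quick reference, the Amdahl bound plotted in Fig. 1 can be evaluated with a one-line function; the following minimal sketch (the function name is ours, not part of B2) is only an illustration of the formula above:

    real(8) function amdahl_speedup(p, n)
      ! Amdahl's law: maximum speedup for a parallel fraction p (0 <= p <= 1)
      ! when the parallel part is spread over n threads.
      implicit none
      real(8), intent(in) :: p
      integer, intent(in) :: n
      amdahl_speedup = 1.0d0 / ((1.0d0 - p) + p/dble(n))
    end function amdahl_speedup

For p = 0.46 this gives at most about 1.8 even with 16 threads, while p = 0.91 (reached in Section 2.2.3) allows about 6.8.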

Fig. 1 Strong scaling of B2 for the ITER test case (blue squares). The performance can be explained by Amdahl’s law, which gives the maximum theoretical performance improvement depending on how much of the code is parallelized. These theoretical maxima are shown for different cases (46%, 80%, 90% and 95% parallelization).

2.2.3. Improvements during the PARSOLPS project

Since the existing OpenMP parallelization is strongly limited by Amdahl’s law, the obvious way to improve the situation is to increase the parallel fraction of the code. The difficulty is that the execution time is distributed over many small subroutines. We have measured how long different subroutines take to execute in sequential mode [1]; this is shown in Fig. 2. The subroutines parallelized previously by F. Reid are represented by the blue boxes. We can see several other subroutines where the code spends 1–7% of the execution time, and most of them need to be parallelized to achieve a significant speedup. This work was shared between T. Fehér and L. Hüdepohl. We have selected many of the remaining subroutines and parallelized them (green boxes in the figure). Now the subroutines with parallel regions take up 95% of the sequential execution time.

[1] The exact number depends on the optimization flags, the test case and the number of timesteps taken in the simulation (among other factors). For the measurements shown here, the sequential code version was used; it was compiled with the –O3 –xavx flags and ten timesteps were taken in the ITER test case. The FTIMINGS tool from L. Hüdepohl was used to make the measurements.

Fig. 2 Relative execution time of subroutines in the B2 code. The colored boxes represent different subroutines (among them b2stel, b2stcx, b2stbr, b2sqel, b2tfnb, sfill, b2sifr, the LU factorization, b2xpfe, saxpy, b2sihs, b2stbm, b2stbc and b2tfhi), and the width of the boxes indicates their share of the execution time. The blue boxes were parallel when the project started; the green boxes show the subroutines that have been parallelized meanwhile.

Even if 95% of the serial execution takes place in subroutines which are mostly parallel, it does not mean that we have reached 95% parallelization. Unfortunately not all of these subroutines are completely parallel. Moreover, by improving the sequential execution time of the newly parallelized subroutines, the total parallel fraction further decreased. If we measure the execution time of the modified code (on one thread), then around 91% of the execution is spent in the parallel regions. We can also measure the speedup of the OpenMP version of the whole B2 code in its current state. The results are represented by the green squares in Fig. 3. The continuous lines show (for different parallel fractions) the theoretical maximum speedup according to Amdahl’s law. The blue curve corresponds to the original code, the green curve shows the theoretical maximum for the current parallel fraction. The maximum speedup achieved so far is around six. Above ten threads, the actual performance starts to deviate from the estimate based on Amdahl’s law. In Section 2.3 we will investigate different factors that can explain this deviation.

Fig. 3 Strong scaling of B2 for the ITER test case (green squares). Theoretical estimates based on Amdahl’s law are shown for different cases (46% and 91% parallelization).


2.2.4. Speedup of individual subroutines

The execution times and the speedup of the modified subroutines were measured individually as well. The speedup of several subroutines is shown in Fig. 4 (one Ivy-Bridge node of the Hydra supercomputer of RZG was used in the test with scattered thread pinning). The speedup is compared to the original sequential version of the subroutines.

Fig. 4 Speedup of OpenMP parallelized subroutines (strong scaling). The colored curves show the speedup of the modified subroutines (b2stbm, b2xpfe, b2sqcx, b2stbr, b2sqel, b2stcx, b2stel, b2sihs) relative to the original sequential version of the same subroutine. The red curve shows the average speedup.

The OpenMP parallelized subroutines were executed only a couple of times; therefore, the curves have some statistical errors, but the overall trends are visible.

The purple curve shows the performance of the B2STBM subroutine. Apart from introducing the OpenMP regions, several unnecessary zero initializations were found and removed. This way the new version of the subroutine using one thread is 1.4 times faster than the original, and a speedup of 20 can be achieved using the full node (two Ivy-Bridge processors with 20 cores in total) when compared to the original sequential version. We can observe some plateaus in the speedup curve, most noticeably between 12–17 threads, but also between 9–11 and above 18 threads. This is the result of the static scheduling of the parallel do loop, which runs through 36 elements. There is a jump whenever the thread number is a divisor of 36. Different loop scheduling strategies can lead to slightly better results.
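As a generic illustration of such a scheduling change (this is not a modification made to B2STBM, and work_on_block is a placeholder routine invented for the example), a different schedule can simply be requested on the OpenMP directive:

    ! Generic sketch: the call stands for one of the 36 work items of the loop.
    !$omp parallel do schedule(dynamic) private(i)
    do i = 1, 36
       call work_on_block(i)   ! placeholder work routine, not from B2
    enddo
    !$omp end parallel do

With schedule(dynamic) (or a suitably chunked static schedule) the 36 iterations are distributed more evenly when the thread count does not divide 36, at the price of a slightly higher scheduling overhead.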

Several other subroutines also show good performance. A speedup close to or above ten could be achieved for the subroutines B2XPFE, B2SQCX, B2SQEL and B2STBR. The speedups of B2STEL and B2SIHS are more moderate. These results were achieved without extensive optimization of the subroutines. It is possible to improve the results, as will be presented in the next sections.

2.3. Factors that influence parallel performance

2.3.1. OpenMP overhead

During the parallelization work we introduce several parallel regions into different subroutines. Creating these parallel regions comes with a certain overhead. Here we measure the OpenMP overhead time with the OpenMP Microbenchmark Suite (Bull, 1999). The overhead for creating a PARALLEL FOR region is measured as the difference between the execution times of the sequential and parallel code segments illustrated in Table 2. The resulting overhead time is between 1 and 4 μs, as shown in Fig. 5. The sequential execution time for the shortest subroutines shown in Fig. 4 is around 2 ms, which, if ideally parallelized, would take 100 μs to run on 20 cores, so the overhead of creating a simple parallel region inside these subroutines is negligible.

Sequential code:

    for (j = 0; j < innerreps; j++) {
      delay(delaylength);
    }

OpenMP code:

    for (j = 0; j < innerreps; j++) {
    #pragma omp parallel for
      for (i = 0; i < nthreads; i++) {
        delay(delaylength);
      }
    }

Table 2 Code segments for measuring the overhead of creating a PARALLEL FOR region. The execution times of the code segments are measured and the difference (divided by innerreps) is plotted in Fig. 5.

Fig. 5 OpenMP overhead for creating a PARALLEL FOR region.

However, there are other cases where we have to be conscious of this overhead. Two examples are the B2SAXPY and SFILL subroutines, in which around 4% of the total execution time is spent. The B2SAXPY subroutine implements the standard BLAS operation y = a*x + y, while the SFILL subroutine initializes its argument to a constant value. These subroutines operate on arrays of different sizes. Since these operations are limited by the memory bandwidth, the execution time of these subroutines depends on the array size. In a test run it was measured what the typical array sizes are with which these subroutines are called (Fig. 6).

Most of the time the subroutines are called for arrays with 7448 elements, which corresponds to two full radial grids, 2*(nx+2)*(ny+2). The light blue bars show the fraction of calls that were made from parallel regions. Such calls can be performed concurrently. The red and yellow bars show the total amount of data transferred as a function of the array size. This shows that most of the data for B2SAXPY is transferred in chunks of 1.5M elements. We can see a similar picture for the SFILL subroutine.


Fig. 6 The B2SAXPY (left figure) and SFILL (right figure) subroutines operate on arrays with various sizes. As a function of array size, the number of subroutine calls and the total data volume (number of elements * number of calls) are plotted. The dark blue bars represent the relative number of calls from sequential code regions, while the light blue bars show the number of calls from parallel regions. The data volume is also shown for sequential (red) and parallel regions (yellow). The SFILL subroutine is also called for zero elements (36% of total calls).

The B2SAXPY and SFILL subroutines are memory bound, which means that the actual execution time depends on the memory bandwidth. B2SAXPY performs the same operation as the TRIAD loop (discussed in Section 2.3.5). Using a single Ivy-Bridge core, a memory bandwidth of 9.35 GB/s is available. Using this bandwidth and the number of bytes that need to be transferred per array element (3×8 bytes for B2SAXPY and 8 bytes for SFILL), we can estimate the CPU time for the subroutines. Using one thread, a typical call with 7.5k elements takes around 19 μs for B2SAXPY (7448 × 24 bytes / 9.35 GB/s) and around 6 μs for SFILL.

Comparing the execution time for 7.5k elements with the 4 μs overhead time for creating a parallel do region, we can see that we could gain a little by parallelizing B2SAXPY, but for SFILL the overhead would be too large. For such short arrays, it would be more beneficial to call the subroutines on different arrays concurrently. For larger arrays, it makes sense to introduce parallel versions of these two subroutines. The SFILL subroutine was modified by L. Hüdepohl so that it executes in parallel if the data size is large enough. A similar modification was applied to the B2SAXPY subroutine as well.
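A minimal sketch of this kind of size-dependent parallelization, using OpenMP's if clause (the routine name, the threshold value and the exact mechanism chosen in B2 are assumptions of this illustration, not the actual implementation):

    subroutine sfill_sketch(x, n, val)
      ! Fill x with the constant val; the parallel region is only activated
      ! when the array is large enough for the OpenMP overhead to pay off.
      implicit none
      integer, intent(in)    :: n
      real(8), intent(inout) :: x(n)
      real(8), intent(in)    :: val
      integer, parameter     :: nthreshold = 100000  ! assumed cut-off, not the value used in B2
      integer :: j
      !$omp parallel do if (n > nthreshold)
      do j = 1, n
         x(j) = val
      end do
      !$omp end parallel do
    end subroutine sfill_sketch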

2.3.2. OpenMP reductions

The overhead increases as we introduce more sophisticated OpenMP constructs. In this section the overhead introduced by OpenMP reductions is investigated. In Fortran, it is possible to perform OpenMP reductions on whole arrays. If we create a parallel region with a reduction to an array, then the overhead time will depend on the array size. This overhead can also be measured with the OpenMP Microbenchmark Suite. In Fig. 7, we can see the overhead time as a function of the array size. In the previous section we have seen that we frequently have operations on arrays of 7448 elements (twice the radial grid size). The overhead of a reduction for such an array would be around 100 μs, which is already a significant time. Several of the subroutines that we have investigated have computational kernels that involve a reduction. Let us consider the example shown in Table 3. In this kernel, a summation is performed over the particle species, denoted by the is index. The input data includes three-dimensional arrays (ix,iy,is) and the output is reduced to two dimensions into shi0(ix,iy). Two parallel implementations of the same kernel are shown in Table 3, one with an OpenMP reduction (first variant) and one without (second variant).

The size of the shi0 array is 3724 elements and the sequential execution time of this kernel is around 500 μs on Hydra. If we chose to have an OpenMP reduction over the shi0 array, then the overhead time would be too large. The reduction is only necessary if the is loop is parallelized. If we swap the is and iy loops, then no reduction is necessary. The speedups achieved with these two variants are compared in Fig. 8. We can see that, whenever possible, one should try to avoid reductions over large arrays. The green curve shows a test using a parallel is loop where, instead of a reduction, the summation into shi0 was performed with OpenMP atomic operations. That approach leads to very poor performance.

Fig. 7 OpenMP reduction overhead time versus the number of array elements (double precision real array), using 20 threads and scattered thread affinity, measured on one of the Ivy-Bridge nodes of the Hydra supercomputer.

is loop with OpenMP reduction:

    !$omp parallel do private(is,ix,iy) &
    !$omp reduction(+: shi0)
    do is = 0, ns-1
      do iy = -1, ny
        do ix = -1, nx
          shi0(ix,iy) = shi0(ix,iy) + rlsa(ix,iy,is) &
                        *rsa(ix,iy,is)*vol(ix,iy)*ne(ix,iy)
        enddo
      enddo
    enddo

iy loop without OpenMP reduction:

    !$omp parallel do private(is,ix,iy)
    do iy = -1, ny
      do is = 0, ns-1
        do ix = -1, nx
          shi0(ix,iy) = shi0(ix,iy) + rlsa(ix,iy,is) &
                        *rsa(ix,iy,is)*vol(ix,iy)*ne(ix,iy)
        enddo
      enddo
    enddo

Table 3 Computational kernel with a reduction in the is dimension. For every grid point (ix,iy), a summation is performed along the is index. The first variant involves an OpenMP reduction operation over the shi0 array. In the second variant, the same calculation is performed without an OpenMP reduction by swapping the first two loops.
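For reference, the atomic variant mentioned above (green curve in Fig. 8) follows the pattern sketched below; this is our illustration of the pattern, not the exact test code:

    !$omp parallel do private(is,ix,iy)
    do is = 0, ns-1
      do iy = -1, ny
        do ix = -1, nx
          !$omp atomic update
          shi0(ix,iy) = shi0(ix,iy) + rlsa(ix,iy,is) &
                        *rsa(ix,iy,is)*vol(ix,iy)*ne(ix,iy)
        enddo
      enddo
    enddo

Every accumulation then becomes an atomic memory update on a shared array, so the threads contend heavily on shi0, which is consistent with the poor scaling reported for this variant.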


Fig. 8 Speedup of the OpenMP versions of the code in Table 3 compared to the sequential version. The blue curve has a parallel is loop with reduction, the red curve has a parallel iy loop, where no reduction is necessary, and the green curve uses atomic updates instead of a reduction. Measured on one Ivy-Bridge node of Hydra using compact thread pinning.

2.3.3. OpenMP loops and data layout

We have seen in the previous section that if we have a reduction over the particle species, then it is better to execute the iy loop in parallel. On the other hand, if we calculate output into a three-dimensional array a(ix,iy,is), it is better to distribute the work based on the slowest varying is index to ensure data contiguity. This is illustrated by a separate test case, which was created as a variation of the TRIAD test. Namely, we calculate a = a + b*c, where all variables are arrays. The array shapes are adapted to the typical array shapes in B2: we have 3D arrays, where the ix and iy indices run through the grid points, and is enumerates the particle species. Table 4 shows three different ways to implement this operation in parallel.

Parallel is loop:

    !$OMP PARALLEL DO &
    !$OMP private(is, ix, iy)
    do is = 0, ns-1
      do iy = -1, ny
        do ix = -1, nx
          a(ix,iy,is) = a(ix,iy,is) + b(ix,iy,is)*c(ix,iy,is)
        enddo
      enddo
    enddo

Parallel iy loop:

    !$OMP PARALLEL DO &
    !$OMP private(is, ix, iy)
    do iy = -1, ny
      do is = 0, ns-1
        do ix = -1, nx
          a(ix,iy,is) = a(ix,iy,is) + b(ix,iy,is)*c(ix,iy,is)
        enddo
      enddo
    enddo

Parallel iy loop, transposed arrays:

    !$OMP PARALLEL DO &
    !$OMP private(is, ix, iy)
    do iy = -1, ny
      do is = 0, ns-1
        do ix = -1, nx
          a(ix,is,iy) = a(ix,is,iy) + b(ix,is,iy)*c(ix,is,iy)
        enddo
      enddo
    enddo

Table 4 OpenMP loops and data layout test. The three variants implement the same operation with different loop orderings and array layouts.

In the first variant, the parallel loop runs over the particle species (the is loop); in the second variant the parallel loop runs through the iy indices; and in the last variant, we again have the parallel iy loop, but the arrays are also permuted, so that iy is the slowest varying index.

When we initialize the arrays, we make sure that the data is properly localized for all three cases. The execution times of these loops were measured and the speedup (compared to the sequential case) is shown in Fig. 9. The green and the red curves show similar speedup. These curves have in common that the parallel loop runs through the slowest varying index, so the data processed by a thread is contiguous in memory. On the other hand, in the case of the blue curve the data for one thread is scattered in chunks of size nx. This has a detrimental effect on the execution time if scattered thread affinity is used (left panel of Fig. 9). In that case, even though all the arrays fit into the L3 cache (using the typical values nx=96, ny=36, ns=98, the size of an array is around 3 MB), the speedup is very low. The caches of the cores have to be synchronized across the two sockets of the node and this has a high cost.

Fig. 9 Speedup of array operations with scattered (left panel) and compact (right panel) thread affinity. The curves show the speedup of the OpenMP loops in Table 4: the red curve corresponds to the first variant (parallel is loop), the blue curve to the second (parallel iy loop) and the green curve to the third (parallel iy loop with transposed arrays).

If compact affinity is used, then the situation is better. Up to 10 threads we use only cores from one of the Ivy-Bridge chips, so the execution speed for the iy loop keeps up with the other implementations. But the performance drops when we utilize both Ivy-Bridge chips (both sockets). This is problematic, since the idea of using the second socket is to tap into the extra memory bandwidth available through it. The test shows that this can only be done if the arrays are defined in such a way that the parallel index is the slowest varying index. In some cases, we have reductions over particle species and operations on three-dimensional arrays in the same loop blocks. In such cases one has to choose carefully which loop to parallelize. In practice a good compromise was to have a parallel is loop with temporary arrays for the reductions.
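The following minimal sketch (our illustration, not the actual B2 code; array names follow the kernel of Table 3) shows one way such a compromise can be realized, combining a parallel is loop with thread-private temporary arrays for the species reduction:

    subroutine sum_over_species(shi0, rlsa, rsa, vol, ne, nx, ny, ns)
      ! Sketch only: each thread accumulates into its private copy shi0_tmp
      ! and the partial results are merged once per thread in a critical
      ! section, so the costly OpenMP array-reduction clause is avoided
      ! while is remains the parallel (slowest varying) index.
      implicit none
      integer, intent(in)  :: nx, ny, ns
      real(8), intent(out) :: shi0(-1:nx,-1:ny)
      real(8), intent(in)  :: rlsa(-1:nx,-1:ny,0:ns-1), rsa(-1:nx,-1:ny,0:ns-1)
      real(8), intent(in)  :: vol(-1:nx,-1:ny), ne(-1:nx,-1:ny)
      real(8) :: shi0_tmp(-1:nx,-1:ny)
      integer :: is, ix, iy

      shi0 = 0.0d0
      !$omp parallel private(is, ix, iy, shi0_tmp)
      shi0_tmp = 0.0d0
      !$omp do
      do is = 0, ns-1
         do iy = -1, ny
            do ix = -1, nx
               shi0_tmp(ix,iy) = shi0_tmp(ix,iy) + rlsa(ix,iy,is) &
                                 *rsa(ix,iy,is)*vol(ix,iy)*ne(ix,iy)
            enddo
         enddo
      enddo
      !$omp end do
      !$omp critical
      shi0 = shi0 + shi0_tmp
      !$omp end critical
      !$omp end parallel
    end subroutine sum_over_species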

2.3.4. Memory bottleneck

Most of the subroutines in the B2 code process several arrays with a relatively low number of floating-point operations per array element. Therefore, we expect that the speedup of these subroutines is limited by the available memory bandwidth. Fig. 10 shows an estimate of the bottlenecks in the different subroutines. The brown boxes represent subroutines which are memory bound, while yellow highlights compute bound subroutines (where the achievable floating-point throughput limits the execution time). In the subroutines B2SIFR and B2SIHS, on the order of ns² operations are performed on data with ns elements, where ns=98 is the number of species. Considering the large number of particle species, these subroutines seem to be compute bound. The sparse LU factorization is in principle a mixture of memory and compute bound parts, therefore it is marked with both brown and yellow colors. Most of the other subroutines seem to be memory bound. The white boxes represent subroutines for which a detailed analysis has not yet been performed, but it is expected that memory bandwidth is the limiting factor there as well. We can see that a large fraction of the code is memory bound. For these subroutines the memory bandwidth gives an upper limit on the maximum speedup that can be achieved.


Fig. 10 Bottlenecks in the subroutines of B2. Brown boxes represent subroutines that are memory bandwidth limited, yellow boxes are compute bound subroutines.

2.3.5. Memory bandwidth

The memory bandwidth can be measured with the STREAM benchmark suite (McCalpin, 1995). Here we use the TRIAD test from the benchmark suite to illustrate the problem. The TRIAD test measures the execution time of the TRIAD loop shown in Table 5 and, knowing the number of array elements processed, the data transfer rate is calculated.

Initialization:

    !$OMP PARALLEL DO
    DO j = 1,n
      a(j) = 2.0d0
      b(j) = 0.5d0
      c(j) = 0.0d0
    END DO

TRIAD test loop:

    !$OMP PARALLEL DO
    DO j = 1,n
      a(j) = b(j) + scalar*c(j)
    END DO

Table 5 Fortran code for the TRIAD benchmark. After the initialization, the data transfer rate is measured for the TRIAD test loop.

The Ivy-Bridge nodes have non-uniform memory access, so the data transfer rate depends on whether the data is located in the memory bank close to the processor or not. The data location is decided during initialization according to the first-touch page policy. If the variable initialization is performed in parallel, as shown in Table 5, then each thread touches the memory that it will later work on, so the data will be properly localized on the closest memory bank during the execution of the TRIAD loop. The resulting bandwidth is shown in Fig. 11 (left). If we remove the OpenMP directives from the initialization, then the data will not be perfectly localized. It will be stored in the memory bank of a single socket (provided it fits) and the access time from the cores of the other socket will be longer. This results in a worse memory bandwidth, as shown in Fig. 11 (right).


Fig. 11 Memory bandwidth measured by the TRIAD benchmark. Left side: test using parallel initialization, which ensures data locality; right side: sequential initialization, where the data is not properly localized. The colors correspond to different thread pinning strategies, red: scattered pinning using KMP_AFFINITY=scatter, blue: compact pinning using OMP_PROC_BIND=TRUE (one thread per core).

During the test shown in Fig. 4, the initialization was always performed sequentially, and the threads were pinned using KMP_AFFINITY=scatter. So, for memory intensive subroutines, the red curve on the right hand side of Fig. 11 shows the achievable maximum performance gain.

A good example where parallel initialization matters is the B2SQEL subroutine. The green curve in Fig. 12 shows the previously achieved speedup. Counting the total data volume that needs to be transferred during the execution of B2SQEL, and using the curves from Fig. 11, we can estimate the expected execution time and speedup. The dotted curves in Fig. 12 show the speedup limits given by the red curves from Fig. 11.

Fig. 12 Speedup of the B2SQEL subroutine. The dotted lines represent the memory bandwidth limits for non-local and local initialization. The green curve shows the previous result, the red curve is the improved speedup after changing the initialization to allocate the data locally to the working cores.

For the green curve we did not consider data locality during initialization, and the achieved speedup is in agreement with the estimate assuming non-local data. If we take care of data locality during initialization (by inserting parallel regions), then the performance increases, as shown by the red curve in Fig. 12. The speedup is close to the achievable maximum, which is given by the dotted curve. These results were obtained with a separate unit test program created for B2SQEL. They show the potential gain that could be achieved by changing the initialization.
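A minimal sketch of the kind of first-touch initialization meant here (illustrative only; the actual arrays and loops of B2SQEL differ): the data is initialized with the same parallel is loop that is later used for the computation, so each thread first touches, and thereby places, the memory pages it will work on.

    !$omp parallel do private(is, ix, iy)
    do is = 0, ns-1
      do iy = -1, ny
        do ix = -1, nx
          rsa(ix,iy,is) = 0.0d0   ! first touch by the thread that later processes this is slice
        enddo
      enddo
    enddo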


2.3.6. Discussion of the parallel performance

Fig. 3 and Fig. 4 presented the speedup of the B2 code and its subroutines after introducing the new OpenMP regions. To achieve these results, we took care to avoid large OpenMP overheads and to minimize data transfer by eliminating unnecessary initializations, but there were no further optimizations concerning memory access and data locality. Most of the subroutines in the B2 code are memory bandwidth limited, and there is still some scope for optimizing these. In particular, currently most of the initialization is done in sequential blocks, which leads to a speedup limit of 4.5x, as shown by Fig. 11 (right). This explains why the performance is lower than predicted by Amdahl’s law. Knowing that the limiting factor is the memory bandwidth, one can try further optimizations.

2.4. Further optimizations

Fig. 12 presented an example where it was possible to improve the performance by ensuring that the input data of the subroutine is properly localized in memory. It was also important to distribute the parallel work in a way that the threads process contiguous data blocks. The effect of such optimizations was investigated for the B2SRAL subroutine and the subroutines called from within it. These subroutines calculate source terms for the equations, and together they take 27% of the total execution time. The parallel loops were synchronized, so that most of the calculation is performed in parallel over the is loop, and it was ensured that the data is initialized in the local memory bank. Data alignment directives were added, and some conditional statements were factorized away to aid vectorization (see the sketch after this paragraph). Fig. 13 shows the parallel performance before and after these changes for the main subroutines called from B2SRAL. The plots show the speedup as a function of the number of threads. The green curves are the simple OpenMP implementation, and the red curves are the optimized versions. The overall speedup increased by 12%. In many cases one could get very close to the memory bandwidth limit (B2STEL, B2STCX, B2STBR_PHYS, B2SQEL, B2XPFE). But there are other subroutines where we are not yet at the bandwidth limit, and it is not clear yet whether that limit can be reached for all of them. Time consuming detailed tests are required for this type of optimization work, and the expected outcome is less than a factor of two speedup for the whole code. Therefore, when we continue the SOLPS optimization project next year, we will give lower priority to this work and consider other options that might bring more speedup, e.g. additional MPI parallelization. Another important part of the work is to ensure that the OpenMP parallelized B2 code works together with the MPI version of EIRENE. This is currently being investigated by L. Hüdepohl.
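As an illustration of what "factorizing away conditional statements" means here, consider the following generic sketch (not the actual B2SRAL code; the flag use_corr and the arrays are invented for the example):

    ! Before: the branch inside the innermost loop inhibits vectorization.
    do ix = -1, nx
       if (use_corr) then
          s(ix,iy) = s(ix,iy) + corr(ix,iy)*q(ix,iy)
       else
          s(ix,iy) = s(ix,iy) + q(ix,iy)
       endif
    enddo

    ! After: the condition is hoisted out of the loop, and each
    ! branch-free loop body can be vectorized by the compiler.
    if (use_corr) then
       do ix = -1, nx
          s(ix,iy) = s(ix,iy) + corr(ix,iy)*q(ix,iy)
       enddo
    else
       do ix = -1, nx
          s(ix,iy) = s(ix,iy) + q(ix,iy)
       enddo
    endif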

2.5. Unit testing and parallelization

In this work, several small subroutines were parallelized and tested. Running the whole code for testing purposes would be time consuming. Therefore, unit tests were created for the subroutines of interest. The unit tests were created automatically. First, the source code was analyzed by the Forcheck static code analyzer. For every procedure, the analysis enumerates all the variables that are accessible within its scope and gives information about the usage of these variables. This can be used to generate subroutines for saving all data that are required for calling the procedures of interest. The saved data can be loaded in a separate test program, and this way each procedure can be tested individually.


[Fig. 13 shows the call graph of b2sral (99.99% of the time considered) together with the subroutines b2stel (18.57%), b2stcx (15.04%), b2stbc (17.34%), b2stbc_phys (13.02%), b2stbr (19.34%), b2stbr_phys (14.86%), b2stbm (4.80%), b2xpfe (0.75%), b2sqel (8.54%) and b2sqcx (3.59%), each with its share of the execution time and its speedup plot.]

Fig. 13 Parallel performance of subroutines called from B2SRAL. The boxes represent the subroutines, and the blue arrows denote subroutine calls. The number next to the subroutine name is its share of the execution time. The speedup vs. the number of threads is shown for all subroutines. The green curves show the code with only simple OpenMP modifications, the red curves include more extensive optimizations. The blue dashed line represents the memory bandwidth limit.


To process the information from Forcheck and to generate the unit testing programs, a set of Python scripts was created. Fig. 14 illustrates the unit test generator on a simple example. The unit test generator is used to check the correctness of the parallelized subroutines and to measure the speedup. The compilation of the test programs was greatly facilitated by a new build system that was set up by L. Hüdepohl. Once the unit test files are saved from the original code, we can modify them. This allows us to change switches and parameters and therefore provide test coverage for the whole source code.

Fig. 14 Unit test generation. The Forcheck analysis gives a list of the variables used within the foo subroutine. By post-processing Forcheck's output, a test module and a test program are created. The write_foo call is inserted into the foo subroutine to save the data during program execution. This data is used later to call foo from the test_foo program.
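A minimal sketch of what such generated files could look like is given below; the file name, the data layout and the dummy body of foo are assumptions made for illustration, not the actual output of the generator.

      module foo_unit_test
        implicit none
      contains
        subroutine write_foo(n, x)          ! call inserted into the original code
          integer, intent(in) :: n
          real(8), intent(in) :: x(n)
          open(11, file='foo_input.dat', form='unformatted', status='replace')
          write(11) n
          write(11) x
          close(11)
        end subroutine write_foo

        subroutine foo(n, x)                ! stand-in for the routine under test
          integer, intent(in)    :: n
          real(8), intent(inout) :: x(n)
          x = 2.0d0*x
        end subroutine foo
      end module foo_unit_test

      program test_foo                      ! generated test driver
        use foo_unit_test
        implicit none
        integer :: n
        real(8), allocatable :: x(:)
        open(11, file='foo_input.dat', form='unformatted', status='old')
        read(11) n
        allocate(x(n))
        read(11) x
        close(11)
        call foo(n, x)                      ! run the routine on the saved input
        print *, 'foo called with n =', n, ', x(1) =', x(1)
      end program test_foo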

2.6. Conclusions

The PARSOLPS project focused on improving the OpenMP parallelization of the B2 code. The approach was to incrementally increase the fraction of parallel regions in the code by parallelizing individual subroutines. The difficulty is that there are many subroutines which each take only a small share of the CPU time. More than 20 subroutines have been parallelized, and a parallel fraction of 91% has been reached for the whole code. With these changes a speedup of a factor of six could be achieved for the ITER test case when executed on one compute node. The execution time scales well for most of the subroutines as the number of threads is increased. The OpenMP regions were introduced in a way that avoids large overheads. It was identified that a large part of the code is memory bandwidth limited. Several of the subroutines were optimized to reach a speedup which is close to the bandwidth limit.


The individual subroutines have been tested separately to ensure the correctness of the modified code and to measure the speedup. To aid this work, an automatic unit test generation framework has been created. The project will continue next year. To achieve a higher speedup, more memory bandwidth is needed, for which the code execution has to be distributed over several nodes. The investigation of that possibility will be the focus of the upcoming work.

2.7. Bibliography

Braams, B. J. (1987). A multi-fluid code for simulation of the edge plasma in tokamaks. NET Report.

Bull, J. M. (1999). EPCC OpenMP Microbenchmark. Retrieved from http://www2.epcc.ed.ac.uk/computing/research_activities/openmpbench/openmp_oldversion/ewomp.pdf

McCalpin, J. D. (1995). Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE TCCA Newsletter, December.

Patankar, S. V. (1980). Numerical Heat Transfer and Fluid Flow. Hemisphere Publishing Corporation.

Reid, J. L. (2010). Porting and parallelising SOLPS on HECTOR. EUFORIA Report.


3. Final report on HLST projects BEUIFERC/GOMIC

It has been decided to spend a significant fraction of the BEUIFERC project on giving support on the usage of the new Intel Many Integrated Cores (MIC) architecture. The aim is to support users who would like to run their code efficiently on the new MIC partition of HELIOS, which entered production in February 2014. To become familiar with the new hardware, we performed a couple of micro-benchmarks on it. Section 3.1 presents a summary of these investigations. Porting the Gysela proto-app on Xeon Phi is also part of the project, and Section 3.2 reports the first steps of this activity.

3.1. Xeon Phi architecture and micro-benchmarks

Before entering into architecture details, let us first review the various names given to the different hardware. Intel's high-end line of processors is the so-called Xeon family that started in 2001. The current standard partition of HELIOS is made of Xeon E5-2680 processors with the Sandy Bridge micro-architecture. In 2010, Intel decided to develop a new family of processors in order to improve both performance and energy efficiency. This new family is called Many Integrated Cores (MIC). The first processor belonging to this family was called Knights Ferry and was a proof of concept. The second and current processor is called Knights Corner and is the one that is installed on the MIC partition of HELIOS. Knights Landing will be the next generation and is expected to come out by the beginning of 2015. To further complicate things, Intel decided to give a marketing name to the Knights Corner processor: Intel Xeon Phi. To sum up, one can consider that for the year 2014 the following holds: MIC = Knights Corner = Intel Xeon Phi.

3.1.1. General architecture

For an exhaustive and detailed architecture review of the Xeon Phi, the reader can refer to [1]. Table 6 shows the main hardware features of the Xeon Phi compared to the Sandy Bridge processor.

Processor                              Intel Sandy Bridge (Xeon E5)   Intel Xeon Phi
Number of cores                        8                              60
Available memory                       32 GB                          8 GB²
Peak performance (double precision)    173 GFlops/s                   1 TFlops/s
Sustainable memory bandwidth           40 GB/s                        160 GB/s
Instruction execution model            Out of order                   In order

Table 6 Hardware key features for Intel Sandy Bridge [2] and Intel Xeon Phi [3] chips.

One can already notice the impressive factor of four in memory bandwidth and factor of 5.8 in peak performance which separates the Sandy Bridge from the Xeon Phi. On the other hand, the memory per core shrinks by a factor of 30, from 4 GB/core to 130 MB/core. Additionally, the peak performance per core also shrinks from 21.6 GFlops/s to 16.6 GFlops/s. We will see in Section 3.1.2 that a core of the Xeon Phi should host at least two threads to get a significant fraction of the peak performance, whereas a single thread per core is enough on Sandy Bridge. As a result, in order to reach a given performance, the required parallelization effort is much larger on the Xeon Phi than on Sandy Bridge. The Xeon Phi is not a regular processor that comes with a socket like the Sandy Bridge. It comes as a coprocessor and is plugged into the node via a PCI Express (PCIe) slot.

2 Amount of memory available on the hardware of HELIOS. 16 GB available on more recent versions.


This design is clearly a limitation and we will discuss the consequences later in Section 3.1.4. Hopefully, this limitation will disappear with the next processor of the MIC family, as Knights Landing is expected to have a regular socket.

3.1.2. Vector processing cores micro-benchmarks

In the following, we refer to the benchmarks taken from [1]. The Sandy Bridge supports the Advanced Vector eXtension (AVX) instruction set that extends the standard x86 instruction set in order to enable vector operations on 256 bit wide vector registers. This means that a single register can hold eight floating point numbers in single precision or four in double precision. The Xeon Phi has yet another instruction set that operates on 512 bit wide vector registers (16 single precision or 8 double precision numbers). The latency of vector instructions on the Xeon Phi is between two and six cycles, which is comparable to that of the Sandy Bridge. However, the instruction throughput is only one instruction every two cycles for a single thread on the Xeon Phi, whereas a single thread can issue one instruction every cycle on Sandy Bridge. As a result, to achieve similar instruction throughputs one has to run at least two threads on each core. On top of this, the out-of-order execution feature of the Sandy Bridge allows the execution of the incoming flux of instructions in any order, provided data dependencies are satisfied. This allows the filling of the different pipelines with instructions for which data are available in cache even if they were issued later. The simpler in-order execution model of the Xeon Phi leads to empty stages in the pipelines instead. In order to circumvent this issue, the hardware supports up to four threads on each core. So one has to use additional threads, e.g. three or four threads per core on the Xeon Phi. This results in a total number of 180 or 240 threads, respectively. Each core can then select in an efficient way one out of the three or four available instructions to keep the pipelines busy. Finally, memory alignment seems to be critical on the Xeon Phi, whereas it is not that pronounced on Sandy Bridge. So far we have neither found a quantification of this effect in the literature nor made our own investigation.

3.1.3. Memory micro-benchmarks

Although the peak performance can be reached by some specific computing kernels, there is no way the peak memory bandwidth given by the vendor can be reached in practice. The reasons for this are deeply buried inside the computing core and in how the different memory cache hierarchies, including the registers, are handled. However, the sustainable memory bandwidth can be evaluated by running a micro-benchmark. The results shown in Fig. 15 have been obtained by running the STREAM benchmark³, which is nowadays the standard benchmark to evaluate the sustained memory bandwidth.

3 http://www.cs.virginia.edu/stream/
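For illustration, a minimal STREAM-triad-like measurement is sketched below; it is not the official STREAM code, and the array size, repetition count and schedule are arbitrary example choices.

      program triad_bandwidth
        ! Minimal sketch: time a(i) = b(i) + q*c(i) over arrays much larger than
        ! the caches and convert the bytes moved into GB/s.
        use omp_lib
        implicit none
        integer, parameter :: n = 50000000, nrep = 10
        real(8), allocatable :: a(:), b(:), c(:)
        real(8) :: t0, t1, q, gbytes
        integer :: i, k

        allocate(a(n), b(n), c(n))
        q = 3.0d0

        !$omp parallel do schedule(static)          ! first-touch initialization
        do i = 1, n
           a(i) = 0.0d0; b(i) = 1.0d0; c(i) = 2.0d0
        end do
        !$omp end parallel do

        t0 = omp_get_wtime()
        do k = 1, nrep
           !$omp parallel do schedule(static)
           do i = 1, n
              a(i) = b(i) + q*c(i)
           end do
           !$omp end parallel do
        end do
        t1 = omp_get_wtime()

        gbytes = 3.0d0*8.0d0*real(n,8)*real(nrep,8)/1.0d9   ! three 8-byte streams
        print '(a,f8.1,a)', 'sustained triad bandwidth: ', gbytes/(t1-t0), ' GB/s'
      end program triad_bandwidth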


Fig. 15 Memory bandwidth measured by the STREAM benchmark on Intel Sandy Bridge and Intel Xeon Phi.

One can notice that the peak sustainable memory bandwidth is not achievable under every condition. On the Sandy Bridge, at least four cores should be used to reach it, regardless of how the threads are pinned inside the processor. On the Xeon Phi, the thread pinning or thread affinity does play a significant role. We have seen in Section 3.1.2 that each core of the Xeon Phi supports four hardware thread units. The "compact affinity" fills the first core with four threads, then the second core with four threads, and so on. This means that with eight threads on the Xeon Phi and "compact affinity", only two physical cores out of the 60 are in use. On the contrary, the "balanced affinity" distributes the threads as evenly as possible among the 60 cores. With 60 threads and the balanced affinity, one thread is running on each core and the maximum memory bandwidth of 174 GB/s is reached. The STREAM benchmark used previously to measure the memory bandwidth successively accesses contiguous data elements. Thanks to such a regular pattern, data prefetchers can trigger the transfer of data from memory to the cache such that the data are already in the cache when they need to be accessed. In the case of non-contiguous memory accesses, data prefetchers cannot help. The memory access triggers a cache miss and the needed data has to be brought into the cache at that moment. The time it takes to satisfy such a memory access is called the memory latency and depends on the memory hierarchy level in which the requested data resides at the time it is requested. Fig. 16 shows the results of a pointer chasing benchmark run on Sandy Bridge and Xeon Phi. For a given array size, a pointer in memory successively accesses data from this array separated by a given stride. This procedure is repeated a few thousand times and the overall elapsed time is measured. By dividing the elapsed time by the number of repetitions, we obtain the average latency of an operation for a given stride and array size. When the full array fits into a memory level, the array is loaded once from memory into this cache level and is always accessed from there afterwards. So the latency of the different memory levels can be observed on both Sandy Bridge and Xeon Phi by varying the array size. One can see that the latencies are at least a factor of four larger on the Xeon Phi compared to the Sandy Bridge as soon as the data is out of the L1 cache. The worst case happens for data sizes between 512 KB and 16 MB. Indeed, they reside in L3 on the Sandy Bridge with a latency of 16 ns, whereas they reside in memory on the Xeon Phi with a latency of 330 ns. This very strong effect is slightly smoothed out by the usage of the L2 cache of other cores, but this does not reduce the latency by more than 30%. The results obtained on the Xeon Phi match the ones found in [1].

Fig. 16 Memory latencies measured by a pointer chasing benchmark on Intel Sandy Bridge and Intel Xeon Phi.
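A minimal sketch of such a pointer-chasing measurement is given below; it is not the exact benchmark used above, and the array size, stride and access count are arbitrary example values (the array size is varied in practice to probe the different memory levels).

      program pointer_chase
        ! Minimal sketch: follow a chain of indices so that every load depends on
        ! the previous one, then divide the elapsed time by the number of accesses
        ! to obtain the average memory latency.
        use omp_lib
        implicit none
        integer, parameter :: n = 4*1024*1024     ! array size in elements
        integer, parameter :: stride = 64         ! distance between accesses
        integer, parameter :: nacc = 100000000    ! number of dependent accesses
        integer, allocatable :: nxt(:)
        integer :: i, j, k
        real(8) :: t0, t1

        allocate(nxt(n))
        do i = 1, n                               ! build a strided cyclic chain
           nxt(i) = mod(i - 1 + stride, n) + 1
        end do

        j = 1
        t0 = omp_get_wtime()
        do k = 1, nacc
           j = nxt(j)                             ! each access depends on the last
        end do
        t1 = omp_get_wtime()

        print '(a,f8.2,a)', 'average latency: ', (t1 - t0)/real(nacc,8)*1.0d9, ' ns'
        if (j < 0) print *, j                     ! keep the chain result live
      end program pointer_chase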

These high latencies in combination with smaller cache sizes bring us to the conclusion that cache-reuse implementation strategies are much less efficient on the Xeon Phi than on the Sandy Bridge. Conversely, streaming implementations that access data as contiguously as possible enable the prefetchers to kick in and allow the application to benefit from the large memory bandwidth.

3.1.4. Multi-MIC discussion

In order to use a single Xeon Phi efficiently, it was pointed out that the application has to exhibit an efficient parallelization up to 120–240 threads and enough vectorization potential to be able to fill the vector units. Even if this is achieved, the remaining question is how to use a cluster of Xeon Phis efficiently. The answer is not clear yet. For now we can only assess the two available strategies to reveal their advantages and disadvantages. Let us first recall the global architecture of a current MIC cluster. Two MIC nodes are depicted in Fig. 17 with their main components (processors, memory and network ports) and how they are connected. The Xeon Phi is not directly plugged into the motherboard of the node but rather sits on a separate board connected to the main motherboard via the PCIe bus. Therefore, the Xeon Phi is regarded as a coprocessor or accelerator in the same way a GPU is. The two CPUs are then considered as a host for the coprocessors.


Fig. 17 Architecture of two nodes of a MIC cluster like the one at IFERC-CSC (HELIOS). The thickness of the lines is proportional to the bandwidth; the size of the boxes is proportional to the amount of available memory for the memory elements and to the peak performance for the computing elements.

This definition of host and coprocessor does not tell anything about how the computations are distributed among these entities. All computations can be performed on the host, letting the coprocessors idle, or vice versa. There can be an alternation of computations between the host and the coprocessor, or concurrent computations happening at the same time on both the host and the coprocessor. This distribution is determined by the actual implementation of the application. The application developer has two different programming models to implement such a distribution of computations.

The first strategy is the so-called "offload mode". The idea is to have an MPI application normally deployed on the different nodes of the cluster. Within each node, one or two MPI ranks running on the CPUs communicate with the coprocessors and offload computations from the CPUs to the coprocessors. The advantage of this approach is that it is ready for production today. The software environment required to operate such a cluster is available on HELIOS, and this is probably the most efficient way to use this kind of hardware nowadays. On the other hand, to implement such an offload scheme efficiently, the developer has to minimize the data transfer between the host and the coprocessor, as it can very easily become a severe performance bottleneck. Additionally, distributing computations on both the host and the coprocessor requires the usage of asynchronous offload pragmas, which complicates the data synchronization within the application even further and triggers load balancing issues. Finally, the next processor in the MIC family has been announced by Intel to have a regular socket that can be directly plugged into the node. This means that the coprocessor view of the Xeon Phi might disappear, and the need for offload programming models with it. So there is a risk of investing manpower in software development that might become obsolete quite soon.

The second strategy is the so-called "native" or "symmetric mode", which is only available for the Xeon Phi coprocessor and not for GPU accelerators. The reason for this is that a proper operating system is running on the Xeon Phi coprocessor, whereas GPUs are operated by the system running on the host. As an operating system is running on the coprocessor, it is possible to create autonomous system processes and in particular MPI tasks running on the coprocessor. As a result, each coprocessor can be seen as a separate node and the MPI application has to be deployed on this set of nodes. The largest advantage of this strategy resides in software engineering aspects: the application developer stays within the well known MPI programming framework. MPI tasks are independent from each other, run concurrently on different nodes and communicate by sending and receiving messages travelling over the network. However, at least three difficulties have to be overcome to bring an application to efficient production levels by using this strategy.

- The set of nodes on which an application runs is no longer homogeneous. A node is either a "host" (two Sandy Bridge with 48 GB shared memory on HELIOS) or a "coprocessor" (Xeon Phi with 8 GB memory). The Knights Corner coprocessor is not connected to the network like a normal node; it is connected via the PCIe bus inside the host. This non-homogeneous partition of nodes is non-standard, and job schedulers are not able to handle such a configuration efficiently.

- The node non-homogeneity also triggers a load imbalance issue. The Xeon Phi is potentially up to four times more powerful than a single Sandy Bridge, with four times less memory. However, if the load is evenly distributed among the different MPI tasks, as is the case for the large majority of MPI applications, and if MPI tasks are distributed on hosts and coprocessors, it is very likely that the tasks running on the host will finish their work much faster than the tasks running on the coprocessor. The reason is simply that a single core on the coprocessor is much weaker than its counterpart on the host. As a result, a large load imbalance should be expected. To circumvent this issue one can either go back to a uniform set of nodes by using exclusively the coprocessors or only the hosts. Only the first case is of practical relevance; it could mimic the behavior of a Xeon Phi cluster and thus gives the chance to prepare the application efficiently for the next generation of MIC processors. A better strategy would consist in modifying the MPI application such that different groups of MPI tasks can have different loads. But of course, this kind of modification can be very difficult to implement and is highly application dependent. This issue also exists when the offload strategy is used; it is just that the load imbalance is obvious in "native mode", whereas it is more hidden in the "offload mode".

- Within the standard partition of HELIOS, when two MPI tasks on different nodes communicate, data are transferred from memory via the InfiniBand (IB) port of the node. They reach the IB port of the other node by crossing the IB interconnect and are finally stored in memory. Wherever the tasks are running on the machine, any point-to-point connection is transferred by the same hardware chain. As a consequence, considering a homogeneous network is a good approximation. As the Xeon Phi is connected to the PCIe bus on the host, the assumption of a homogeneous network is no longer true. For example, within a node of Fig. 17, when a point-to-point connection is established between CPU0 and CPU1, the data only have to cross the QPI interconnect, whereas a connection between MIC0 and MIC1 has to cross the PCIe, QPI and PCIe interconnects. The worst case scenario happens when MIC0 on host0 communicates with MIC0 on host1 when only a single IB port is available on the node. In this case, a point-to-point connection has to cross successively the PCIe, QPI, PCIe, IB, PCIe, QPI and PCIe interconnects.

We have performed our own measurements of the network performance on two different MIC clusters, namely robin, located in Grenoble, France at the Bull R&D center, and supermic, located in Garching, Germany at LRZ. On robin a single IB port is connected to CPU1. In addition, MIC0 is connected to CPU0 and MIC1 is connected to CPU1. The performance is measured by the ping-pong test of the Intel MPI Benchmark suite. We define the latency as the time taken by a one byte message to travel from one task to the other, and we define the bandwidth as the bandwidth achieved while transferring a 4.2 MB message between two different tasks. Only the results on robin are shown in Table 7 as the ones on supermic are similar.

Intra-node latency (µs):
  Host0 \ Host0    CPU1    MIC0    MIC1
  CPU1             0.69
  MIC0             4.90    2.73
  MIC1             4.31    7.56    3.12

Intra-node bandwidth (MB/s):
  Host0 \ Host0    CPU1    MIC0    MIC1
  CPU1             5029
  MIC0              456    2016
  MIC1             1609     416    2004

Inter-node latency (µs):
  Host0 \ Host1    CPU1    MIC0    MIC1
  CPU1             2.20
  MIC0             4.71    9.04
  MIC1             4.66    7.93    6.92

Inter-node bandwidth (MB/s):
  Host0 \ Host1    CPU1    MIC0    MIC1
  CPU1             5729
  MIC0              418     273
  MIC1             1608     418     969

Table 7 Network performance between different computing elements of the robin cluster, measured as latency (µs) and bandwidth (MB/s). The first two tables present the intra-node performance between elements of Host0; the last two tables present the performance between two elements situated on different nodes (rows: elements of Host0, columns: elements of Host1).

Considering the intra-node latencies (first table), one can notice a factor of four difference between the latency observed by two tasks running on the same CPU and two tasks running on the same MIC. The latency further degrades for the two possible CPU-MIC pairs and is finally almost 11 times worse for a MIC0-MIC1 connection than for a CPU-CPU connection. For inter-node connections between Host0 and Host1, the latencies of MIC-MIC connections degrade further, but not that much compared to the degradation of the CPU-CPU latency. Surprisingly, the latency of the CPU-MIC0, CPU-MIC1 and MIC0-MIC1 connections does not degrade that much when crossing the node boundary. The PCIe links between the CPUs and the MICs seem to be the main bottleneck for these latencies. Considering the intra-node bandwidth (second table), one can similarly notice a difference of more than a factor of two between the bandwidth measured between two tasks running on the same CPU and two tasks running on the same MIC. Performance degrades for CPU-MIC1 connections and by up to a factor of ten for CPU-MIC0 or MIC1-MIC0 connections. Going across nodes further degrades performance, especially for MIC-MIC connections, down to 273 MB/s for a MIC0-MIC0 connection. As a result, the network performance is very inhomogeneous between the different computing elements of such a MIC cluster. As the slowest path limits the whole performance level of the network, one should expect latencies five times larger and bandwidths 20 times lower than on the network of a "standard cluster" connecting only CPUs. The results obtained confirm the ones shown in Figure 9 of Ref. [3]. The authors ran their benchmarks on two nodes, each equipped with only a single IB port and 4 MIC cards. The worst case scenario, a MIC0-MIC0 connection in our case, corresponds to a MIC3-MIC3 connection in their case. Fortunately, the hardware of the MIC partition on HELIOS is more homogeneous: each node has two IB ports, one connected to each socket. This means that a MIC0/MIC0 connection in Fig. 17 will go through PCIe, PCIe, IB, PCIe and PCIe, exactly like a MIC1/MIC1 connection. But having the hardware does not seem to be enough. Indeed, supermic is equipped with it but still demonstrates similar performance as robin, which has only one IB port connected to CPU1. So setting up the appropriate software stack and operating system configuration for that type of hardware seems to be non-trivial and is still an issue.

To sum up, the "offload mode" is ready for production and is likely to provide the best performance at the time of writing this report. But the implementation costs of such a strategy are significant, and as the PCIe connection is likely to disappear in the next processor generation, there is a risk that such implementations will become obsolete. On the other hand, the "symmetric" or "native mode" is not yet ready for production because job schedulers and a proper MPI communication fabric have to be adapted to this new type of architecture. However, the implementation effort of such a strategy seems to be moderate as it still follows the standard way of MPI programming, which will hold on the next machine generation.

3.2. Porting Gysela proto-app on Xeon Phi

The Gysela proto-app is a reduced version of the Gysela code developed at CEA Cadarache. The physics model is reduced to a 4D drift-kinetic model and the code base is much smaller and simpler. The code still solves a relevant set of equations so that it can retrieve the results of a physical test case. As the code structure is very similar to Gysela, we hope that the modifications that will be introduced into the Gysela proto-app can be back-ported to Gysela with moderate effort. Furthermore, this application is limited to 8 GB of memory to run the validating test case. Therefore, it is possible to make all experiments on a single Xeon Phi in "native mode".

3.2.1. Benchmark database

The first porting activity was to build the necessary tools to keep track of the measured performance of the code on different architectures and for different sets of parameters. In collaboration with the Gysela team, we defined a corresponding data structure. In addition, a Python script has been developed to fill the data structure automatically with the data produced by a single run. A first release has been delivered and we are in the process of refining the required functionality and the actual implementation.

3.2.2. First profiling

A profiling of the application using 16 threads has been performed on the Sandy Bridge partition with the Scalasca toolkit. The results can be summarized as follows. The global_predictor_corrector subroutine represents 98% of the run time, which is split into:

- serial alloc + dealloc: < 1%
- construct_spline: 10%, split into:
  - 26% serial allocation
  - 58% memory copy
  - 16% computations
- parallel region: 89%, split into:
  - compute_feet_blockivpar: 41%
  - compute_interpole4d_blockivpar: 58%

Thus, the dominant subroutine is composed of three blocks. The first one is negligible as it consumes less than 1%. The second one represents 10% of the execution time, which is mostly spent in memory copies. In principle, it offers some room for improvement as it has not been completely parallelized. However, as it represents only 10%, we will focus on the parallel region, which consumes 89% of the dominant subroutine, and in particular on the compute_interpole4d_blockivpar routine.


3.2.3. Increasing the number of threads for the Xeon Phi

The Gysela application was originally designed to run with no more than 16 or 32 threads. However, such a number of threads is insufficient to exploit the full resources of the Xeon Phi chip. Therefore, we built a test bed within the code which allows us to study the behaviour of single subroutines under Xeon Phi conditions, i.e. we can temporarily increase the number of executing threads. Generally, the number of threads is set at the start of the program (by setting the OMP_NUM_THREADS variable appropriately) and kept constant during program execution. In our case, this variable is set to 32, so the initial phase of the application runs with 32 threads. Then, before entering the parallel region which contains the routine under investigation, a call to OMP_SET_NUM_THREADS changes the number of threads for the next parallel region. After exiting the region of interest, the number of threads is switched back to 32 in order to satisfy the parallelization constraints of the application. Of course, changing the number of threads dynamically may introduce an overhead which is not suitable for production runs. But it is acceptable in the development phase as it allows executing certain subroutines with a larger number of threads while keeping the impact on the code small. As a result, the subroutine compute_interpole4d_blockivpar executed on a Xeon Phi with 240 threads is 30% faster than when executed on a Sandy Bridge processor with eight threads. As this kernel routine is clearly memory bound, we had expected up to a factor of four acceleration because the memory bandwidth is four times larger on a Xeon Phi than on a Sandy Bridge. It seems likely that further optimization of memory alignment and vectorization could alleviate the situation.
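A minimal sketch of this temporary switching of the thread count is given below; the dummy loop stands in for the routine under investigation and the numbers are example values.

      program thread_switch_demo
        ! Minimal sketch: temporarily raise the OpenMP thread count around one
        ! parallel region and restore the application default afterwards.
        use omp_lib
        implicit none
        integer :: nthreads_default, i
        real(8) :: s(240)

        nthreads_default = omp_get_max_threads()   ! e.g. 32 for the rest of the run

        call omp_set_num_threads(240)              ! Xeon Phi-like thread count
        !$omp parallel do
        do i = 1, size(s)
           s(i) = sqrt(real(i,8))                  ! stands in for the kernel under test
        end do
        !$omp end parallel do

        call omp_set_num_threads(nthreads_default) ! back to the application default
        print *, 'sum =', sum(s)
      end program thread_switch_demo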

3.2.4. Building yet another reduced version of the code

In order to explore the implementation possibilities efficiently, we developed a strictly minimal version of gysela-protoapp: proto-advec. This application contains only the data structure allocation, the initialization and a single 4D advection with constant coefficients. The main ingredient of the advection is the 4D interpolation, the computation kernel we are focusing on. This application neither reads nor writes any file, so it can be executed in MIC "native mode" on HELIOS right now by using the micnativeloadex command. As proto-advec is much simpler, it also became trivial to try the offload mechanisms. So we could implement both the Intel and the OpenMP 4.0 offload directives and compare them to native execution on both MIC and Sandy Bridge. In Table 8 one can observe two things. First, there is almost a factor of two difference in execution time between the native MIC version of the code and the two offload versions. There is no reason for such behaviour and it has been reported to the CSC support team. Second, by restructuring the code, we could gain a factor of five in comparison to the initial code on the Xeon Phi. However, these transformations have sped up the code on Sandy Bridge as well. As a result, the code on the Xeon Phi with 177 threads is 30% faster than on Sandy Bridge. So the factor of four we expected is still missing at this stage. In the next section we will go into some further implementation details and develop different strategies to pinpoint the missing factor of four.

Exec type \ nb threads      8       16      118     177     236
Sandy Bridge               1.29    0.66
MIC native                                 1.32    1.03    1.21
MIC offload Intel                          2.52    1.79    2.14
MIC offload OMP                            2.53    1.80    2.12

Table 8 Execution time of the interpolation subroutine of the proto-advec code on Sandy Bridge, on MIC in "native mode" and on MIC in "offload mode" using Intel and OpenMP 4.0 directives. The numbers of threads used on MIC correspond to two, three and four threads per core, respectively. Indeed, one core out of 60 is reserved either by the offload library or by the micnativeloadex command.
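As an illustration of the OpenMP 4.0 offload variant mentioned above, a minimal sketch is given below; it is not the proto-advec code, and the array size and loop body are placeholders.

      program omp4_offload_demo
        ! Minimal sketch: map the input array to the coprocessor, run the loop in
        ! a target region there, and map the result back to the host.
        implicit none
        integer, parameter :: n = 1000000
        real(8) :: a(n), b(n)
        integer :: i

        b = 1.0d0

        !$omp target map(to: b) map(from: a)
        !$omp parallel do
        do i = 1, n
           a(i) = 2.0d0*b(i)                 ! stands in for the interpolation kernel
        end do
        !$omp end parallel do
        !$omp end target

        print *, 'a(n) =', a(n)
      end program omp4_offload_demo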

3.2.5. Implementation details on vectorization

As explained in Section 3.2.2, the computationally intensive part of the code is contained in a single OpenMP region. The structure of the corresponding parallel region is presented in Code 1.

          OMP PARALLEL
     1    Do ivpar=1, Nvpar
     2      OMP DO COLLAPSE(2)
     3      Do iphi=1, Nphi
     4        Do itheta=1, Ntheta
     5          Do ir=1, Nr
     6            Feet(0:3, ir, itheta, iphi) = physics()
     7      End do End do End do
     8      OMP DO COLLAPSE(2)
     9      Do iphi=1, Nphi
    10        Do itheta=1, Ntheta
    11          Do ir=1, Nr
    12            compute_iteration(ir, itheta, iphi, ivpar)
    13      End do End do End do
    14    End do
          OMP END PARALLEL

Code 1 Parallel region of interest

Code 1 reflects the fact that Gysela is based on a semi-Lagrangian method. Hence, the first loop nest computes for each grid point at time t its position at time t-∆t by following the characteristics backward in time and stores it in the Feet array. The second loop nest interpolates the value of the distribution function at time t-∆t at each foot location and stores it in the distribution function at time t. The feet computation and the interpolation are kept separate in order to detach the physics as much as possible from the numerical method. A complete separation would mean two separate loop nests of four loops each. The Feet array would be a 4D array in this case, where each element is a 4-tuple of double values. As this is too costly in memory, the Feet array is kept in 3D and is reused for each iteration in ivpar. In order to increase the parallelism, we have added the COLLAPSE(2) option to the OMP DO clause. This collapses the two loops in iphi and itheta into a single one, and the resulting Nphi x Ntheta iterations are distributed among the threads. This step is mandatory to get good performance on the Xeon Phi, as there must be enough array elements to feed the 177 or 236 threads, otherwise some threads just stay idle. The second modification we introduced at this level is a prototype modification of the compute_iteration subroutine. Initially, some arrays were passed as parameters and the compiler introduced array copies at the beginning and the end of the procedure. Declaring these arrays at the module level and removing them from the procedure parameters solves the issue. In the following we focus on the two parts of the algorithm inside the compute_iteration subroutine. The first part consists of the evaluation of the spline values at the feet of the characteristics; the algorithm is summarized in Code 2.


     1  Do dim=0, 3
     2    sbase(dim,-1) = splinem1(Feet, dim, ir, itheta, iphi, ivpar)
     3    sbase(dim, 0) = spline0(Feet, dim, ir, itheta, iphi, ivpar)
     4    sbase(dim, 1) = spline1(Feet, dim, ir, itheta, iphi, ivpar)
     5    sbase(dim, 2) = spline2(Feet, dim, ir, itheta, iphi, ivpar)
     6  End Do

Code 2 Spline function evaluation at each characteristic foot

As the vector unit of the Xeon Phi is eight elements (in double precision) wide, such a four-iteration loop containing four independent computations is unlikely to be vectorized by the compiler. So we had to restructure the code to expose some vectorization in order to help the compiler. We simply introduced iterations over the r dimension by replacing line 11 of Code 1 by

        Do ir=1, Nr, BLOC_SIZE_R

and modifying the spline function evaluation as shown in Code 3.

     1  Do dim=0, 3
     2    Do ir_loc=0, BLOC_SIZE_R-1
     3      sbase(-1, ir_loc, dim) = splinem1(Feet, dim, ir + ir_loc, …)
     4      sbase( 0, ir_loc, dim) = spline0(Feet, dim, ir + ir_loc, …)
     5      sbase( 1, ir_loc, dim) = spline1(Feet, dim, ir + ir_loc, …)
     6      sbase( 2, ir_loc, dim) = spline2(Feet, dim, ir + ir_loc, …)
     7    End Do
     8  End Do

Code 3 Vectorized version of the spline function evaluation at each characteristic foot

By setting BLOC_SIZE_R to a multiple of eight, the compiler efficiently vectorizes this portion of code and a speedup of eight is observed. The second part of the compute_iteration subroutine consists of computing the 4D interpolation of the distribution function at the characteristic feet; the algorithm is summarized in Code 4.

     1  val = 0
     2  i_feet(0:3) = lower(Feet(0:3, ir, itheta, iphi))
     3  Do k=-1, 2
     4    Do l=-1, 2
     5      Do j=-1, 2
     6        Do i=-1, 2
     7          val = val + sbase(0,l)*sbase(1,k)*\
     8                      sbase(2,i)*sbase(3,j)*\
     9                      f_old(i_feet(2)+i, i_feet(3)+j,\
    10                            i_feet(1)+k, i_feet(0)+l)
    11  End do End do End do End do
    12  f_new(ir, itheta, iphi, ivpar) = val

Code 4 Initial 4D interpolation kernel


As for the sbase computations, there are not enough iterations inside the inner loop to simply vectorize it. Collapsing several of these loops manually does not help either. So we introduced iterations over the r dimension as previously. On top of this, the fetching of the data from memory is separated from the computations. This does not improve the performance but enables one to diagnose the origin of the performance issue. Code 5 summarizes this algorithm.

     1  val = 0
     2  i_feet(0:3) = lower(Feet(0:3, ir, itheta, iphi))
     3  Do k=-1, 2
     4    Do l=-1, 2
     5      Do j=-1, 2
     6        Do i=-1, 2
     7          Do ir_loc=0, BLOC_SIZE_R
     8            f_tmp(ir_loc,i,j,k,l) = \
     9              f_old(i_feet(2)+i+ir_loc, i_feet(3)+j,\
    10                    i_feet(1)+k, i_feet(0)+l)
    11  End do End do End do End do End do
    12  Do k=-1, 2
    13    Do l=-1, 2
    14      Do j=-1, 2
    15        Do i=-1, 2
    16          Do ir_loc=0, BLOC_SIZE_R
    17            val = val + sbase(0,l)*sbase(1,k)*\
    18                        sbase(2,i)*sbase(3,j)*\
    19                        f_tmp(ir_loc,i,j,k,l)
    20  End do End do End do End do End do
    21  f_new(ir, itheta, iphi, ivpar) = val

Code 5 Vectorized version of the 4D interpolation kernel

The second loop nest vectorizes perfectly and a factor of eight in performance can be obtained when the f_tmp buffer is set to an arbitrary value instead of reading the actual values from f_old. But as soon as the real values of the distribution function from the previous time step are accessed, performance drops drastically and a factor larger than eight is lost. So, paradoxically, memory accesses are the limiting factor of this kernel, which is theoretically computation bound. Several techniques have been tested to try to improve data locality, i.e. to ensure that the data fetched from memory into the cache of a core is reused several times by the threads running on this core. 1D loop blocking techniques could improve the performance by 40% by blocking the theta loops on lines four and ten of Code 1. 2D loop blocking brings no improvement, and a two-level parallelization with nested parallel regions showed very poor performance. Several other ideas have been tried out, but without any success. In the end, the global performance of the 4D interpolation improved by 20% compared to Table 8. As the version giving this result does not use vectorization in the most computation intensive part of the kernel, a theoretical factor of eight could still be reached here. The next section gives some explanations and guidelines to improve the situation.


3.2.6. Bottom up approach

This summer G. Latu, the project coordinator, made his own experiments on MIC while he was in Marseilles at the summer school CEMRACS 2014. Based on our results, he took a bottom up approach, i.e. he started from very simple 1D interpolation kernels completely disconnected from the protoapp application and executed them on both the MIC and the Sandy Bridge (SB) architectures. He could observe better performance on MIC than on SB, and the ratio between the two was in agreement with the kind of ratio that should be obtained from the hardware specifications. Then he increased the dimensionality up to the implementation of the full 4D kernel. At each step of this process new difficulties arose and it was not trivial to keep the performance gain. The following list gives the major points of the work of G. Latu:

- The i_feet array in Code 5 causes potential jumps in memory when accessing the f_old array. This deeply impacts the performance. Some parts of the code have been rewritten to avoid such indirections.

- As the access pattern is a 4D stencil, the memory has to be prefetched in three dimensions. As the stencil is four points wide, 4^3 = 64 independent streams of data have to be fetched by the hardware prefetchers in order to be optimal. As only 16 prefetchers are available on MIC, the result can only be suboptimal.

- At the core level an important bottleneck was the data transfer from the L1 cache to the vector registers. It is very hard to write Fortran code such that the compiler generates assembly code using vector primitives for such transfers. Even harder, the only possibility to diagnose this is to examine the assembly code.

Finally, G. Latu managed to gain a speed-up of four on MIC compared to SB for the newly developed 4D interpolation kernel. Accordingly, the original routine proto_advec was adapted. But still the flop rate achieved by the new 4D interpolation kernel is only half the one achieved with its 2D counterpart on both the MIC and the SB architecture. All implementation details will be published in the CEMRACS 2014 proceedings.

3.3. Conclusions

The Intel Xeon Phi, the first commercially available processor of the Many Integrated Cores (MIC) architecture, exhibits peak performances in computation and memory bandwidth up to a factor of six and four, respectively, larger than for the Intel Sandy Bridge processors used in HELIOS. The MIC processor is packaged as an accelerator and shares with GPU accelerators the so-called offload programming model. It consists of running the application on the normal processor (the host) while offloading the very computation demanding parts onto the accelerator. In addition, the Xeon Phi offers a "native mode" (not available on GPUs) which allows deploying MPI applications on the Xeon Phi processor in the same way as it is done on a standard Xeon processor. The tool chain to build an application for the Xeon Phi processor is exactly the same as for the Sandy Bridge processor. Thus, porting an application to the Xeon Phi was found to be relatively trivial compared to porting an application to GPUs. However, the network performance was found to be very inhomogeneous, which was confirmed on two different MIC clusters. In the worst case the data bandwidth is one order of magnitude smaller than on a standard Sandy Bridge partition. Initially, the objective was to port the full protoapp application to the MIC partition and then to improve its performance in a succeeding step. For this we only had to focus on the most significant part of the application: the 4D interpolation. Further, we simplified matters by exploring a new kind of data structure by building up yet another smaller application: proto_advec. And even there, we could not get substantially better performance on the Xeon Phi than on the Sandy Bridge. However, the project coordinator, G. Latu, managed on his side to achieve good performance on a 2D interpolation kernel. The price he had to pay was to dive into very fine architectural details of the Xeon Phi and to understand the assembly code in order to diagnose various issues. The goal of building a well performing 4D interpolation kernel is still an open issue and shall be tackled in collaboration with the European Intel Exascale Labs. As a conclusion, porting applications to the Xeon Phi is fairly easy, although getting performance on this architecture still requires a lot of effort and skills in low level programming.

3.4. References

[1] Rahman, R. (2013). Intel Xeon Phi Coprocessor Architecture and Tools. Apress Open.
[2] Jianbin Fang, A. L. (n.d.). An Empirical Study of Intel Xeon Phi. Retrieved from http://arxiv.org/abs/1310.5842
[3] Karpusenko, V., & Vladimirov, A. (2014). Configuration and Benchmarks of Peer-to-Peer Communication over Gigabit Ethernet and InfiniBand in a Cluster with Intel Xeon Phi Coprocessors. Colfax International.


4. Final Report on HLST project KSOL2D-3

4.1. Introduction

The project aims at improving the Poisson solver of the BIT2 code, which is a code for Scrape-off-Layer (SOL) simulations in 2-dim real space + 3-dim velocity space. Due to the enhanced 2D geometry, the modeling of the SOL is more realistic than before. This is a requirement for the prediction of particle and energy loads to the plasma facing components (PFC), for the estimation of corresponding PFC erosion rates and of impurity and dust generation rates. The new BIT2 code incorporates a number of techniques which were originally developed for its predecessor, the BIT1 code, such as optimized memory management and fast, highly scalable nonlinear collision operators. It is mandatory that the Poisson solver used in BIT2 scales to very high core numbers in order to maintain the good scaling property of the whole code. So the work plan is to further improve the scaling of the Poisson solver in 2D. In the former project KinSOL2D, we developed a discretization of the Poisson problem based on finite differences and implemented a solver which is based on the multigrid method. The development of such a solver was a challenging task, especially reaching a good scaling property (~100 cores) for the complex SOL geometry (a hollow rectangle with >10^7 grid cells). Although the outcome was an efficient all-purpose Poisson solver, the project coordinator requested further speed-up. Namely, the minimum time required for the solution in a realistic geometry should be less than 0.1 s to afford simulations consisting of at least 10^7 time steps. As further improvement we pursue the following two approaches. One is based on the multigrid method with gathering of the data at a certain coarser level. This approach was used successfully within the MGTRI project for a structured triangulation of a hexagonal domain. The other approach is to start from the "physics-based" domain decomposition developed in BIT1. The main idea is to decompose the domain into subdomains, so that the potential field in each subdomain can be calculated separately using homogeneous boundary conditions, and to couple (and update) each subdomain boundary via physical boundary conditions. The advantage of such an approach is that the calculation of the field (with homogeneous boundary conditions) in each subdomain can be performed nearly independently. The update due to the boundary conditions requires only the exchange of small messages between the processors. As a result, the scalability of the solver should increase significantly. Among the many domain decomposition methods, the balanced domain decomposition with constraints (BDDC) method and the dual-primal finite element tearing and interconnection (FETI-DP) method are successful two-level non-overlapping domain decomposition methods for 2D and 3D problems. In the MGTRI project, we tested these two methods for a structured triangulation of a hexagonal domain with the result that the FETI-DP method performs better than the BDDC method. We plan to use the FETI-DP method as a single core coarsest level solver for the multigrid solver.

4.2. Model problem

We consider a second order elliptic partial differential equation with the coefficient ε(x,y) being defined on a rectangular domain with an internal conducting structure as shown in Fig. 18. As boundary conditions, we have to consider a homogeneous Dirichlet boundary condition for the outer wall, Φ_W(x,y,t) = 0, a homogeneous Neumann boundary condition for the inner empty space, E_n(x,y,t) = 0, and a Dirichlet boundary condition for the internal conductor, Φ_c(x,y,t) = Φ_c(t).
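For reference, a compact statement of this model problem is sketched below in LaTeX notation; the source term symbol ρ and the sign convention are assumptions made for illustration, as the report's own equation is not reproduced in this extract.

\begin{align*}
-\nabla\cdot\bigl(\varepsilon(x,y)\,\nabla\Phi\bigr) &= \rho && \text{in the rectangular domain,}\\
\Phi &= \Phi_W = 0 && \text{on the outer wall (Dirichlet),}\\
E_n = -\partial_n \Phi &= 0 && \text{on the boundary of the inner empty space (Neumann),}\\
\Phi &= \Phi_c(t) && \text{on the internal conductor (Dirichlet).}
\end{align*}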


Fig. 18 Rectangular domain with internal conducting structure (■)

Fig. 19 Stencil of the generating matrices: (a) FDM, (b) FEM

As discretization methods we consider both the finite difference method (FDM) and the finite element method (FEM). We discretize the equations on a structured mesh with ε(x,y) being defined on each cell boundary. The coefficient ε(x,y) is time dependent and will change at each time step of the simulation. The FDM has the advantage that its stencil has at most five nonzero elements, as shown in Fig. 19 (a), and is intuitively understandable. But the generated matrix for the SOL geometry is a sparse nonsymmetric matrix due to the inner Neumann boundary condition. In contrast, the stencil for the FEM is a more extended stencil which has at most nine nonzero elements, as shown in Fig. 19 (b). Thus, it is more complex than the FDM stencil and the corresponding matrix-vector multiplication, which is the basic operation of the iterative solver, becomes more costly. However, the FEM matrix of the Poisson problem is symmetric, which reduces the number of iterations of an iterative solver compared to a nonsymmetric matrix problem. Therefore, a performance comparison between the FDM and the FEM is worth investigating for the multigrid solver. Furthermore, the FETI-DP domain decomposition method seems to be mathematically well analyzed for finite elements but not for finite differences. Hence, the FETI-DP method is founded on more solid theoretical grounds for the FEM than for the FDM.

4.3. Multigrid method with a gathering level

The multigrid method is a well-known, fast and efficient algorithm to solve many classes of problems. In general, the ratio of the communication costs to the computation costs increases when the grid level is decreased, i.e. the communication costs are high on the coarser levels in comparison to the computation costs. Since the multigrid algorithm is applied on each level, the bottleneck of the parallel multigrid lies on the coarser levels, including the exact solver at the coarsest level. The number of degrees of freedom (DoF) of the coarsest level problem scales with the number of cores (parallel coarsest level limitation). To improve the performance of the parallel multigrid method, we consider reducing the number of executing cores to a single core (the simplest case) after gathering the data from all cores at a certain level (the gathering level), as shown in Fig. 20. After gathering the data on a single core, further serial multigrid levels are possible. Thus, the algorithm avoids the parallel coarsest level limitation. The optimal data gathering level for large numbers of cores is the coarsest possible parallel multigrid level, as it guarantees a minimum of data transfer in the gathering procedure. The numerical results of the multigrid algorithm showed that it has a very good semi-weak scaling property up to 24,576 cores for the large problem sizes of the MGTRI project.

Fig. 20 Multigrid algorithm with single core levels
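A minimal sketch of the gathering step is given below; it is not the actual solver code, and the routine name, the argument list and the use of rank 0 as the gathering core are illustrative assumptions.

      subroutine gather_coarse_level(comm, n_loc, x_loc, counts, displs, x_all)
        ! Minimal sketch: at the gathering level every core sends its coarse-grid
        ! data to rank 0, which then continues with the remaining serial multigrid
        ! levels; the correction is later redistributed with MPI_Scatterv.
        use mpi
        implicit none
        integer, intent(in)  :: comm, n_loc
        real(8), intent(in)  :: x_loc(n_loc)
        integer, intent(in)  :: counts(:), displs(:)
        real(8), intent(out) :: x_all(:)            ! significant on rank 0 only
        integer :: ierr

        call MPI_Gatherv(x_loc, n_loc, MPI_DOUBLE_PRECISION,          &
                         x_all, counts, displs, MPI_DOUBLE_PRECISION, &
                         0, comm, ierr)
      end subroutine gather_coarse_level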


For the problem at hand the coarsest level has to resolve the inner structures, unlike for standard problems such as the one from the MGTRI project. This leaves the coarsest level with a couple of thousand DoF, which for the multigrid method are more efficiently solved on multiple cores than on a single core. Unfortunately, this concept also comes to its limit when the total number of cores involved becomes significantly large, as is the case on the HELIOS machine with up to 64k cores. In such a case the number of DoF on the coarsest level is not sufficiently large to be distributed among all the cores. To overcome this obstacle, all data should be gathered at a certain level with an all-reduce operation for each core. As a first step, the data structure of BIT2 has to be adapted to make the gathering of the data feasible. This is in principle possible, but nevertheless the communication costs will cause an additional overhead. Instead, it seems to be more efficient to switch at the coarsest levels to the FETI-DP domain decomposition method, which has the potential to achieve reasonable efficiency even with a small number of DoF per core.

4.3.1. FDM vs. FEM

To compare the FDM and the FEM, we computed the execution time of the matrix-vector multiplication, which is the basic operation of iterative solvers like the conjugate gradient, GMRES and multigrid methods. The maximum number of nonzero elements per matrix row is five for the FDM and nine for the FEM. So we can expect the matrix-vector multiplication of the FDM to execute faster than that of the FEM. In Fig. 21 we depict the execution time of the matrix-vector multiplication as a function of the problem size, i.e. the number of grid levels (the log of the number of DoF), as a solid line for the FDM and with bullets for the FEM. We performed our tests for two different numbers of cores, on 448 cores (black) and on 896 cores (red). The numerical results in Fig. 21 confirm our expectation, i.e. the matrix-vector multiplication of the FEM is more costly compared to the FDM.
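For illustration, a minimal sparse matrix-vector product in CSR storage is sketched below (not the project code); the work per row is proportional to the number of stored nonzeros, which is why the nine-entry FEM stencil costs more per multiplication than the five-entry FDM stencil.

      subroutine csr_matvec(nrow, rowptr, colind, val, x, y)
        ! Minimal CSR sparse matrix-vector product: y = A*x.
        implicit none
        integer, intent(in)  :: nrow, rowptr(nrow+1), colind(*)
        real(8), intent(in)  :: val(*), x(*)
        real(8), intent(out) :: y(nrow)
        integer :: i, k

        !$omp parallel do private(k)
        do i = 1, nrow
           y(i) = 0.0d0
           do k = rowptr(i), rowptr(i+1) - 1        ! 5 entries (FDM) or 9 (FEM)
              y(i) = y(i) + val(k)*x(colind(k))
           end do
        end do
        !$omp end parallel do
      end subroutine csr_matvec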

Fig. 21 The execution time of the matrix-vector multiplication in seconds as a function of the number of levels on 448 cores in black and 896 cores in red. The results for the FDM are depicted by a solid line and the FEM results by bullets.

Due to the handling of the natural inner boundary condition for the SOL domain, the generated matrix for the FDM is non-symmetric. In contrast, the generated matrix for the FEM is symmetric. This gives an advantage to the FEM when it comes to the number of iterations needed by iterative solvers to achieve converged results. To confirm this, we compare the solution times of the multigrid method using both the generated matrices of the FDM and the FEM. The results are shown in Fig. 22, where we investigated the behavior of the multigrid method with a Gauss-Seidel smoother as a solver on the left and as a preconditioner of the PGMRES method on the right. Hereby we use the same notation as in Fig. 21. Although the matrix-vector multiplication is more costly for the FEM than for the FDM, this is easily compensated by a smaller number of iterations. As a result, the solution time of the multigrid solver is shorter for the FEM than for the FDM. Also for the PGMRES method the FEM usually gives better results.

Fig. 22 The execution time of the multigrid method as a solver (left) and as a preconditioner for the PGMRES method (right) in seconds as a function of the number of levels on 448 cores in black and 896 cores in red. The results for the FDM are depicted by a solid line and the FEM results by bullets.

A comparison between the results of the multigrid method as a solver (left) and as a preconditioner for the PGMRES method (right) depicted in Fig. 22 shows slightly better results for the latter. Another advantage is that the convergence of the PGMRES method is guaranteed, which is not the case for the multigrid method as a solver. In principle, the multigrid method using the FEM generated matrix could also be used as a preconditioner for the parallel conjugate gradient (PCG) method, as the generated matrix is symmetric.

4.3.2. Weak scaling for FEM

In the previous section we have confirmed that the FEM performs better than the FDM. Next we want to further improve the speed-up of the multigrid method for the FEM. For this we introduce the gathering of the data at a certain coarser level of the multigrid method. Our test problem is given by a domain consisting of a 4 x 4 square without (full) and with (cut-out) a centered inner empty space of size 3 x 3. The effect of the internal conducting structure is neglected. As in the previous section, we use a Gauss-Seidel smoother for the multigrid method as a solver. We are interested in a weak scaling with test cases of 4k to 1M DoF per core. In Fig. 23 and Fig. 24 we depict the solution time as a function of the number of cores without (red) and with (black) gathering of the data at a certain coarser level. In Fig. 23 the full domain is taken into account (full) and in Fig. 24 the domain with an inner empty space (cut-out). For the latter case the number of cores is reduced, as the cores which would have calculated the inner empty space in the full case are not needed any more. We can clearly see that the results degrade more and more as the number of cores becomes larger. The smaller the number of DoF per core is, the more pronounced this effect becomes. It is a well known effect because more DoF per core means a better ratio of computation costs to communication costs. However, the gathering of the data at a certain coarser level (black) can help to alleviate the situation significantly. In addition, for the case with the inner empty space we need fewer cores, which further improves the execution times. With a smaller number of cores the communication is more efficient, which results in faster execution times (see Fig. 24).

Fig. 23 The solution times in seconds of the multigrid method as a solver with the Gauss-Seidel smoother without (in red) and with (in black) gathering of the data at a certain coarser level. The full domain is used in a weak scaling for different numbers of DoF per core: 4k DoF (-), 16k DoF (•), 64k DoF (+), 512k DoF (○), and 1M DoF (x).

Fig. 24 The solution times in seconds of the multigrid method as a solver with the Gauss-Seidel smoother without (in red) and with (in black) gathering of the data at a certain coarser level. The domain with the inner empty space (cut-out) is used in a weak scaling for different numbers of DoF per core: 4k DoF (-), 16k DoF (•), 64k DoF (+), 512k DoF (○), and 1M DoF (x).

4.4. FETI-DP method

We focus on the non-overlapping two-level domain decomposition method (DDM) because it is intrinsically parallel and therefore achieves very good parallel efficiency. In addition, its required number of iterations does not depend on the number of subdomains (cores). To define the non-overlapping DDM, we assume that the domain is partitioned into N non-overlapping subdomains as shown in Fig. 25.


Fig. 25 Non-overlapping subdomains of the SOL domain

Fig. 26 Coarse level domains of the FETI-DP method for SOL domain

The non-overlapping DDM consists of two steps: one to solve the local problem on each subdomain, which does not overlap with any other subdomain, and the other to solve a modified local problem with imposed continuity at the boundaries of the subdomains. Depending on how the continuity is imposed on the boundaries, we distinguish between the Dirichlet-Dirichlet and Neumann-Neumann DDM cases. The BDD method belongs to the Neumann-Neumann DDM, whereas the FETI method is a Dirichlet-Dirichlet DDM. In particular, the FETI method uses Lagrange multipliers λ to impose the continuity on the inner boundary nodes. The local matrix of FETI, which comes from the Neumann boundary problem, is usually singular. To avoid this singularity, the FETI-DP method imposes continuity on the corner nodes which are part of more than one subdomain, as shown in Fig. 25. This leads to a coarser level domain for only the corner nodes, as shown in Fig. 26. The whole procedure can be expressed in the following system of equations:
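As a minimal sketch in standard FETI-DP notation (subscripts r for the remaining, i.e. non-corner, unknowns, c for the corner unknowns, and B_r for the signed Boolean jump operator; these symbols are assumed for illustration and need not coincide with the notation of the actual implementation):

\[
\begin{pmatrix}
K_{rr} & K_{rc} & B_r^{T}\\
K_{cr} & K_{cc} & 0\\
B_r    & 0      & 0
\end{pmatrix}
\begin{pmatrix} u_r \\ u_c \\ \lambda \end{pmatrix}
=
\begin{pmatrix} f_r \\ f_c \\ 0 \end{pmatrix},
\qquad
F\,\lambda = d,
\quad
F = B_r K_{rr}^{-1} B_r^{T} + B_r K_{rr}^{-1} K_{rc}\, S_{cc}^{-1}\, K_{cr} K_{rr}^{-1} B_r^{T},
\quad
S_{cc} = K_{cc} - K_{cr} K_{rr}^{-1} K_{rc},
\]

where the reduced system F λ = d for the Lagrange multipliers results from the block Gaussian elimination described next.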

Applying one step of the block Gaussian elimination, we obtain a reduced system; by eliminating again, we obtain the reduced system for λ with system matrix F, as sketched above. F has fewer DoF than the original problem, because its number of DoF equals the number of nodes on the inner boundaries. However, these matrices are denser than the original one and are globally defined. From the construction of these matrices, we can perform the matrix-vector multiplication locally, so we can solve the system by using an iterative method such as PGMRES with a locally defined preconditioner; for this purpose we define a Dirichlet preconditioner for the reduced system. After solving the reduced system, the solution of the original problem is finally recovered by back-substitution.

Due to the handling of the Neumann boundary condition of the FDM, the generated system is nonsymmetric, which is also the case for the reduced system. Therefore, we have to use PGMRES as the iterative solver.

4.4.1. Results of the FETI-DP method

To be consistent with the multigrid results, we also used a FEM discretization for the FETI-DP method. We computed the required number of iterations of the FETI-DP method as a function of the ratio of the coarse mesh size (H) to the fine mesh size (h). We considered two different subdomain cases: one with a square-shaped subdomain (Hx = Hy) and the other with a rectangular-shaped subdomain (2Hx = Hy). The required numbers of iterations are listed for the full domain in Table 9 and for the cut-out domain in Table 10. The results in Table 9 and Table 10 confirm that the required number of iterations for solving a given problem with the FETI-DP method only depends on the ratio of the mesh sizes and does not depend on the number of subdomains. In addition, the results show that the convergence for a rectangular subdomain [case b) in Table 9 and Table 10] is worse than the one for a square subdomain [case a)].

Table 9 The required number of iterations of the FETI-DP method on the full domain for different ratios of the coarse to the fine mesh. The cases of a square a) and a rectangular b) subdomain are considered.

Table 10 The required number of iterations of the FETI-DP method on the cut-out domain for different ratios of the coarse to the fine mesh. The cases of a square a) and a rectangular b) subdomain are considered.

Finally, we investigate the strong scaling property of the FETI-DP method. Again, we distinguish between the results of the full domain (see Fig. 27) and the results of the cut-out domain (see Fig. 28). The figures show a strong scaling: cases for different fixed problem sizes are considered, and the solution times are shown as a function of the number of cores. As we increase the number of cores, the subdomains that are distributed among the cores become smaller and smaller. In Fig. 27, the numbers of cores are powers of two. As we double the core count, we first divide the subdomains along the y-direction into halves, while the next division is along the x-direction. This way the shape of the subdomains alternates between square and rectangular: the shape is square for core counts 16, 64, ..., 4^n, and rectangular for 32, 128, ..., 2·4^n cores. In case of the cut-out domain (Fig. 28), the starting number of cores is different, but the shape of the subdomains also alternates between square and rectangular. In Fig. 28 the square-shaped subdomains were executed on the following numbers of cores: 112, 448, 1792, and 7168. Accordingly, the rectangular-shaped subdomains were executed on 224, 896, 3584, and 14336 cores.

Fig. 27 The solution times in seconds for the FETI-DP method on the full domain as a function of the number of cores. A strong scaling with four different DoF is depicted: 1M DoF (-), 4M DoF (•), 16M DoF (+), and 64M DoF (○). At core counts 16, 64, ..., 4^n we have square-shaped subdomains, while at core counts 32, 128, ..., 2·4^n we have rectangular-shaped subdomains.

Fig. 28 The solution times in seconds for the FETI-DP method on the cut-out domain as a function of the number of cores. A strong scaling with four different DoF is depicted: 460k DoF (-), 1.8M DoF (•), 7.3M DoF (+), and 29M DoF (○). At core counts 112, 448, 1792, and 7168 we have square-shaped subdomains, while at core counts 224, 896, 3584, and 14336 we have rectangular-shaped subdomains.

Again we can see that the efficiency of the FETI-DP method clearly degrades when rectangular-shaped subdomains are used. This effect is less pronounced for the full domain compared to the cut-out domain. The full domain case also behaves much better when it comes to the scaling: a scaling with the number of used cores can be observed, and the larger the test case is, the higher the number of cores up to which a good strong scaling persists. In contrast, for the cut-out domain a good scaling property is not observed. Thus, the FETI-DP method does not seem to be a good choice for our test problem of a domain with an inner empty space.

4.5. Conclusions

We have modified the data structure of the original multigrid method in KinSOL2D to make it possible to reduce the number of cores involved in solving the coarsest level to just a single core. This has been done both for the original finite difference scheme (FDM) and for a new implementation using finite elements (FEM). Although the matrix-vector multiplication is more costly for the FEM than for the FDM, this is easily compensated by the smaller number of iterations needed by the iterative solver. As a result, the solution time of the multigrid solver is shorter for the FEM than for the FDM. In addition, the performance of the multigrid solver is further improved by gathering the data at a certain coarser level. This helps especially when a small number of degrees of freedom (DoF) per core is used. In addition, we have implemented the FETI-DP method with a finite element discretization for the SOL geometry. An implementation of the FDM for the FETI-DP method seems not to be appropriate, as it does not have as solid theoretical foundations as the FEM. The results of the FETI-DP method showed that it suffers from some limitations. As soon as the subdomains are not square-shaped, the performance drops significantly. Also the presence of the inner empty space has an impact: the strong scaling for a domain with an inner empty structure showed a less favourable scaling than for the full domain. Thus, the FETI-DP method does not seem to be a good candidate for our model problem. Finally, we conclude that the multigrid method in its enhanced form, with gathering of the data at a certain coarser level and in combination with the FEM, seems to be a good choice either as a solver or as a preconditioner. Provided it satisfies the exact specifications of the project coordinator, D. Tskhakaya, the method still has to be tested for more realistic cases.


5. Final Report on HLST project HIMAT

5.1. Introduction

Hierarchical matrices (H-matrices) were introduced by W. Hackbusch for one-dimensional problems. These matrices are data-sparse in the sense that only a small amount of data is needed for their representation. The matrix-vector multiplication is of almost linear complexity, which is the optimal case for a linear system. In general, sums and products of these matrices are no longer in the same set, but their truncations to the H-matrix format are again of almost linear complexity. The same statement holds for the inverse of an H-matrix. The key idea for the approximation of dense matrices, e.g. the triangular factors of an LU-decomposition of the system matrix, is to reorder the matrix rows and columns so that certain sub-blocks of the reordered matrix can be approximated by low-rank matrices. These low-rank matrices can be represented by a product of two rectangular matrices, which results in significant savings in storage. H-matrices can be a useful tool for the treatment of integral operators as well as for the solution of linear systems arising from the discretization of elliptic partial differential equations. In finite element and finite difference methods, the stiffness matrix is sparse, but its LU factors are fully populated and can be approximated by an H-matrix. Such approximate H-LU factors may then be used as a preconditioner in iterative methods or as a direct solver. Hence, the representation of matrices in the H-matrix format is suitable for the efficient solution of matrix equations. The aim of this project is to assess the benefit we can get by using H-matrices in fusion simulation codes such as EUTERPE or GENE. Currently, three libraries (HLib, HLibPro, and AHMED) are available under license agreements. We exclude the libraries HLibPro and AHMED from our assessment as they are written in C++ and HLibPro can only be accessed in binary form. Instead we consider the HLib library, written in C, to compute the solution of a discretized second-order partial differential equation in 2D and 3D. We consider finite difference and finite element methods with low-order finite element spaces in 2D and 3D. In addition, we consider the bicubic B-spline finite element method, which is used in the EUTERPE code, as a special case, since its basis functions have a large support extending over more than one element of the discretization. We report the inversion error, the number of iterations of GMRES preconditioned in the H-matrix format, and the memory consumption of the H-inverse and the H-matrix factorizations.

5.2. Basic structure of an H-matrix

To construct an H-matrix, we start with a hierarchical (cluster) tree of index sets. The hierarchical tree consists of the root index set, the leaves, and intermediate index sets which are called branches, as shown in Fig. 29. We use the block cluster tree built from the given cluster trees to represent the given matrix. To construct an H-matrix, the cluster tree has to satisfy the admissibility condition. With this setup, we can construct an H-matrix as shown in Fig. 30. Each block is either an admissible block (green in Fig. 30) or an inadmissible block (red in Fig. 30) according to the given admissibility condition.


Fig. 29 Cluster tree.

Fig. 30 H-matrix structure: admissible block in green, inadmissible block in red.

In the H-matrix format we approximate an admissible (n x m) block sub-matrix A by the product of two smaller matrices B and C with dimensions n x k and k x m. This is called a low-rank approximation with rank k, as shown in Fig. 31, while we use the full matrix format for the inadmissible blocks. Whereas A has nm entries, B and C taken together have k(n+m) entries, which results in significant savings in storage if k « n and k « m.

Fig. 31 Low rank approximation.
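As an illustrative sketch only (HLib itself is a C library; the kernel, sizes, and names below are arbitrary assumptions), the storage saving of such a rank-k factorization can be checked numerically with a truncated SVD:

import numpy as np

# Build a numerically low-rank test block (a smooth kernel evaluated on two
# well-separated point sets), then truncate its SVD to rank k.
n, m, k = 600, 400, 8
x = np.linspace(10.0, 11.0, n)                 # target points
y = np.linspace(0.0, 1.0, m)                   # source points, well separated from x
A = 1.0 / np.abs(x[:, None] - y[None, :])      # admissible block of a 1/|x-y| kernel

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :k] * s[:k]                           # n x k factor
C = Vt[:k, :]                                  # k x m factor

rel_err = np.linalg.norm(A - B @ C) / np.linalg.norm(A)
full_entries = n * m                           # nm entries for the dense block
lowrank_entries = k * (n + m)                  # k(n+m) entries for the two factors
print(f"relative error {rel_err:.1e}, storage ratio {lowrank_entries / full_entries:.3f}")

For well-separated (admissible) point sets the singular values decay rapidly, so a small k already gives a small relative error while storing only k(n+m) instead of nm entries.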


Within such a structure it is possible to assemble, store and evaluate dense matrices in almost linear complexity in the size of the matrix, i.e., with a complexity of O(n (log n)^c) for n×n matrices and a moderate constant c. Instead of a plain low-rank approximation, we can also approximate the admissible blocks by a truncated singular value decomposition, A ≈ U Σ Vᵀ, as shown in Fig. 32, which is the starting point of the H²-matrix format.

Fig. 32 Singular value approximation.

5.3. Numerical experiments for 2D

In the following, we use exclusively the HLib library for the numerical experiments in this and the following section. First, we consider the Poisson problem on a 2D unit rectangular domain as a test problem. Our test problem consists of the following partial differential equation

−Δu = f in Ω,  u = 0 on ∂Ω,     (3.1)

where Ω = (0, 1)² ⊂ ℝ². It is well known that Eq. (3.1) has a unique solution. For discretization purposes, we consider a uniform discretization with n points in each direction, excluding the boundary points. So the total number of degrees of freedom (DoF) is n². We consider the standard finite difference method (FDM) and three finite element methods (FEM) for the problem of Eq. (3.1). For the FEMs, we consider the linear finite element space on triangular meshes, the bilinear finite element space on rectangular meshes, and the bicubic B-spline finite element space on rectangular meshes.
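A minimal sketch of such an FDM test matrix, assuming homogeneous Dirichlet boundary conditions and written here in Python/SciPy rather than with HLib, is:

import scipy.sparse as sp

def poisson_2d_fdm(n):
    """5-point finite-difference Laplacian on the unit square with n interior
    points per direction and homogeneous Dirichlet boundary conditions."""
    h = 1.0 / (n + 1)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))  # 1D second-difference stencil
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)) / h**2              # n^2 x n^2, symmetric positive definite

A = poisson_2d_fdm(32)
print(A.shape, A.nnz)  # (1024, 1024) with at most 5 nonzeroes per row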

5.3.1. Geometric bisection clustering method

In this section we use geometric bisection clustering for the generation of the cluster tree, as shown in Fig. 33. According to this clustering, we have to reorder the numbering of the DoF.

Fig. 33 Clustering based on geometric bisection.
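A minimal sketch of such a geometric bisection clustering (illustrative only; the function and variable names are assumptions, and the real cluster-tree construction in HLib additionally records the bounding boxes needed for the admissibility check):

import numpy as np

def bisection_cluster(indices, coords, leaf_size=16):
    """Recursive geometric bisection: split the index set along the coordinate
    direction of largest extent until clusters hold at most leaf_size indices.
    Returns a nested list representing the cluster tree."""
    if len(indices) <= leaf_size:
        return indices.tolist()
    pts = coords[indices]
    d = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))   # direction of largest extent
    mid = np.median(pts[:, d])
    left = indices[pts[:, d] <= mid]
    right = indices[pts[:, d] > mid]
    if len(left) == 0 or len(right) == 0:                   # guard against degenerate splits
        return indices.tolist()
    return [bisection_cluster(left, coords, leaf_size),
            bisection_cluster(right, coords, leaf_size)]

# Example: cluster tree for the DoF of a 32 x 32 grid
n = 32
xy = np.array([(i, j) for i in range(n) for j in range(n)], dtype=float)
tree = bisection_cluster(np.arange(n * n), xy)

Concatenating the leaves of the returned nesting from left to right yields the reordering of the DoF mentioned above.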


After reordering, we obtain the structure of the H-matrix. With this structure, we can compute the low-rank approximation on each block, as shown in Fig. 34 for the n=32 case.

Fig. 34 The rank of the low-rank approximation in each block for the n=32 case.

We consider two different methods to compute the solution of the linear systems which are generated by the FDM and by the linear and bilinear FEMs, respectively. One is obtained by using the Cholesky decomposition as a preconditioner in the preconditioned GMRES, and the other uses instead the inverse H-matrix (H-inverse) for that purpose. In both cases, we set the minimum number of DoF per block to 16 and the accuracy of the H-matrix arithmetic to 10⁻⁴. We compare the error of the inversion, measured by ||M⁻¹M − I||₂, and the number of iterations of the preconditioned GMRES in Table 11. We set the maximum number of GMRES iterations to 100 and denote with * when the GMRES does not converge and with × when we cannot run the case, usually due to excessive memory consumption.

Table 11 Numerical results for the H-matrix calculation in 2D.
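The solver setup can be illustrated with the following sketch; since HLib's H-arithmetic is not available from Python, an incomplete LU factorization merely stands in for the H-Cholesky/H-inverse preconditioner, and the test matrix is the 2D FDM Laplacian sketched above:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 64
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.identity(n), T) + sp.kron(T, sp.identity(n))).tocsc()  # FDM test matrix

# Approximate factorization used as preconditioner (a stand-in for the
# H-format Cholesky decomposition / H-inverse discussed in the text).
approx = spla.spilu(A, drop_tol=1e-4)
M = spla.LinearOperator(A.shape, matvec=approx.solve)

b = np.ones(A.shape[0])
iterations = []
x, info = spla.gmres(A, b, M=M, maxiter=100,
                     callback=lambda arg: iterations.append(arg))
print("converged" if info == 0 else "not converged",
      "after", len(iterations), "iterations")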


The number of iterations is usually very small except for the H-inverse of the FDM. As the inversion error of the Cholesky decomposition is smaller than that of the H-inverse, the number of iterations for the Cholesky decomposition is smaller than for the H-inverse. Therefore, the Cholesky decomposition in H-matrix format is more efficient than the inverse in H-matrix format. Next, we investigate the memory consumption of the H-matrix format, which should be very efficient in this respect. We compare the required memory of the inverse matrix and of the Cholesky decomposition, both in the H-matrix format, with the original format. All results related to the inverse are shown in black and all results related to the Cholesky decomposition are shown in red in Fig. 35. First we estimate the required memory for the inverse matrix and for the lower triangular factor of the Cholesky decomposition in the original format as a function of the DoF, which is denoted by a solid line. In addition, we plot the memory consumption for the FDM (+), the linear FEM (•), and the bilinear FEM (◊) using the H-matrix format as a function of the DoF.

Fig. 35 The required storage for solving as a function of the DoF in 2D. In black the results involving the inverse matrix and in red the Cholesky decomposition. Estimated memory consumption of the original format as solid line. Memory consumption of the FDM (+), the linear FEM (•), and the bilinear FEM (◊) by using the H-matrix format.

The estimated required memory of the Cholesky decomposition (just the lower triangular matrix) is almost half that of the inverse matrix in the original format, as shown by the solid lines in Fig. 35. The difference in memory consumption between the H-inverse and the Cholesky decomposition represented as an H-matrix is bigger than in the case of the original format. Both the inverse and the Cholesky decomposition show a lower memory consumption for large DoF in the H-matrix format compared to the original one. However, for the latter the effect is more pronounced and starts already at relatively small DoF. This trend is independent of the method used. Next we tried the bicubic B-spline FEM case, both for the inverse and the Cholesky decomposition in H-matrix format. For the latter, the building of the Cholesky decomposition failed for larger numbers of DoF. Hence, we focused on the inverse in H-matrix format for the bicubic B-spline FEM case and found out that it becomes an ineffective preconditioner for too large a number of DoF. We could get converged solutions only for the cases with a small number of DoF. So we investigated the numerical behavior as a function of the minimum number of DoF per block (nmin) and the accuracy of the H-matrix arithmetic (eps). We start with (nmin, eps) = (16, 10⁻⁴) and then move gradually to (32, 10⁻⁴), (16, 10⁻⁵), (32, 10⁻⁵), … if a given (nmin, eps) does not converge. We depict the error of inversion and the number of iterations of the preconditioned GMRES for the converged results in Table 12. It shows that we need more DoF per block nmin and a smaller eps as the number of DoF is increased. As a consequence we need more time for the inversion in H-matrix format and more memory. In general, it shows that the H-matrix format with geometric bisection clustering is not well suited for the bicubic B-spline FEM.

Table 12 The number of iterations of the preconditioned GMRES for the bicubic B-spline FEM case according to nmin and eps.

5.4. Numerical experiments for 3D

We now consider the Poisson problem on a 3D unit rectangular domain as a test problem, with the same geometric bisection as for the 2D case and with a clustering based on domain decomposition (DD-clustering) for generating the cluster trees. As in the previous section for the 2D case, we consider the finite difference method and two finite element methods, one on prism elements (which corresponds to the linear finite element space in 2D) and the other on cube elements (trilinear finite element space). However, we do not consider the tricubic B-spline FEM, because its 2D counterpart (the bicubic B-spline FEM) is already hard to converge for 2D problems and it does not fit the DD-clustering, as we will explain later.

5.4.1. Geometric bisection clustering

First, we consider the geometric bisection clustering as in the previous section. We can extend the 2D clustering by adding a division direction along the z-axis. The number of DoF is n³ when we set n points in each direction. For our numerical test cases we had problems computing the Cholesky decomposition in H-matrix format, so we use only the H-inverse. The error of inversion and the number of iterations of the preconditioned GMRES are depicted in Table 13. In 3D, the number of DoF is eight times larger when n is doubled in each direction, i.e., the number of DoF increases rapidly. So we do not double n as in 2D but just choose somewhat larger values of n instead.

Table 13 Numerical results of the H-matrix calculations in 3D by using the H-inverse.

We observe a somewhat diverse behavior for each case. The number of iterations for the FDM increases slowly as the number of DoF is increased. The error of inversion for the FEM on the prism elements increases rapidly; consequently, the GMRES does not converge when the number of DoF is larger than 64,000. However, for the trilinear FEM case the error of inversion remains small and the number of iterations is constant.

5.4.2. Nested dissection method

Next, we consider the nested dissection method, which is one of the reordering methods and exploits the concept of separation. The main idea is to separate the vertices in a matrix graph into three parts, two of which have no coupling between each other. The third one, referred to as an interior boundary or separator, contains couplings with (possibly both of) the other two parts. The nodes of the separated parts are numbered first and the nodes of the separator are numbered last. This process is then repeated recursively in each subgraph. An illustration of the resulting sparsity pattern is shown in Fig. 36 for the first two decomposition steps. In domain decomposition terminology, we recursively subdivide the domain into an interior boundary and the resulting two disjoint subdomains.

Fig. 36 Illustration of a clustering based on domain-decomposition in 2D.
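A minimal sketch of this recursive numbering for a rectangular grid graph (illustrative only; the names are assumptions, and production orderings are usually obtained from libraries such as METIS or SCOTCH):

def nested_dissection(rows, cols):
    """Recursive nested-dissection ordering of a grid graph given by the index
    lists rows x cols. The two separated parts are numbered first and the
    separator line is numbered last, recursively."""
    if len(rows) <= 2 or len(cols) <= 2:
        return [(i, j) for i in rows for j in cols]
    if len(rows) >= len(cols):                              # split along the longer side
        mid = len(rows) // 2
        part1, sep, part2 = rows[:mid], [rows[mid]], rows[mid + 1:]
        return (nested_dissection(part1, cols)
                + nested_dissection(part2, cols)
                + [(i, j) for i in sep for j in cols])
    mid = len(cols) // 2
    part1, sep, part2 = cols[:mid], [cols[mid]], cols[mid + 1:]
    return (nested_dissection(rows, part1)
            + nested_dissection(rows, part2)
            + [(i, j) for i in rows for j in sep])

# New ordering of an 8 x 8 grid: each separator comes after the parts it separates.
order = nested_dissection(list(range(8)), list(range(8)))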

A favorable property of such an ordering is that a subsequent LU factorization maintains a major part of its sparsity structure, i.e., no fill-in occurs in the large, off-diagonal zero matrix blocks. In order to obtain a nearly optimal complexity, DD-clustering approximates the nonzero, off-diagonal blocks in the H-matrix representation and computes them using H-matrix arithmetic. The blocks on the diagonal and their LU factorizations are stored as full matrices. A visualization of the DD-clustering of a 3D domain and the resulting block structure are presented in Fig. 37.

Fig. 37 Clustering based on domain-decomposition in 3D.


To apply the DD-clustering, we have to be able to choose a separator, which can easily be defined for the FDM and for the FEM on prism and cube elements, where the matrix has nonzero elements only for consecutive node indices in each direction. Because the resulting matrix of the bicubic (tricubic) B-spline finite element space has more nonzero elements in each direction, we cannot choose such a separator, i.e., the cubic B-spline finite element space does not allow DD-clustering. So we do not consider the tricubic B-spline FEM. For the same reasons as in the case of the geometric bisection clustering, we test several n from 16 to 200 and use the LDL decomposition, which is a variant of the Cholesky decomposition. We report the error of the inversion and the number of iterations of the preconditioned GMRES in Table 14. The results show that the inversion errors increase very slowly as the number of DoF increases and that the number of iterations is almost constant for the FDM and the trilinear FEM. For the FEM on the prism elements the inversion errors and the number of iterations change rapidly for specific numbers of DoF.

Table 14 Numerical results for the domain-decomposition clustering H-matrix calculation in 3D by using the LDL decomposition.

5.4.3. Memory consumption of the geometric bisection clustering and nested dissection method

Finally, we consider the memory consumption of the H-matrix format in 3D. The memory consumption of the inverse and of the LDL decomposition of the matrices from the FDM and FEMs grows more strongly for the 3D problems than for the 2D problem. In 2D, the estimated memory consumption is O(n³) for n nodes in each direction, i.e., O(N^(3/2)) for N = n² DoF. In 3D, the estimated memory consumption is O(n⁵) for n nodes in each direction, i.e., O(N^(5/3)) for N = n³ DoF. As we did for the 2D case, we compare again the required memory of the inverse matrix and of the LDL decomposition, both in the H-matrix format, with the original format. All results related to the inverse are shown in black and all results related to the LDL decomposition are shown in red in Fig. 38. First we estimate the required memory for the inverse matrix and for the LDL decomposition in the original format as a function of the DoF, which is denoted by a solid line. The results for the H-inverse have been obtained with the geometric bisection clustering and those for the LDL decomposition with the DD-clustering. In both cases, we denote the FDM by +, the FEM on prism elements by •, and the trilinear FEM by ◊.


Fig. 38 The required storage for solving as a function of the DoF in 3D. In black the results involving the inverse matrix obtained with the geometric bisection clustering and in red the LDL decomposition with the domain-decomposition clustering. Estimated memory consumption of the original format as solid line. Memory consumption of the FDM (+), the FEM on prism elements (•), and the trilinear FEM (◊) using the H-matrix format.

As for the 2D case, both the inverse and the LDL decomposition show a lower memory consumption for large DoF in the H-matrix format compared to the original one. However, the effect is more pronounced in 3D than in the 2D case. Especially for the LDL decomposition the memory consumption is much lower for the H-matrix format than for the original format.

5.5. Conclusions

For our assessment we focused on the HLib library. Unfortunately, the library is not parallelized so far, which limits the size of the test cases that can be investigated. As test cases we used the Poisson problem discretized with the finite difference method (FDM), with the linear, bilinear, and trilinear finite element methods (FEM) without overlapping elements, and with the cubic B-spline FEM with overlapping elements. In both 2D and 3D we used a geometric bisection clustering and in 3D in addition a domain-decomposition clustering. Wherever possible, we used the H-matrix format to approximate the inverse matrix and the Cholesky (or LDL) decomposition, both of which we tested as preconditioners for the GMRES iterative solver. Especially for the Cholesky and LDL decompositions we could achieve very good results in terms of memory consumption and speed for the FDM and the linear, bilinear, and trilinear FEMs. The larger the number of degrees of freedom (DoF), the smaller the memory consumption becomes in the H-matrix format compared to the original one. This is even more pronounced for the 3D case than for the 2D case. For the inverse matrix in H-matrix format a similar behavior can be observed, however less pronounced. Nevertheless, the situation is not that positive when it comes to the Cholesky and LDL decompositions for the cubic B-spline FEM with overlapping elements. For this case the domain-decomposition clustering does not work due to the overlap. Unfortunately, also the geometric bisection clustering fails for larger numbers of DoF. Therefore, further investigation has to be done on this issue.

6. Final Report on HLST project ITM-ADIOS2

6.1. Introduction

Lengthy, computationally intensive plasma physics simulations require periodic on-disk storage of intermediate results. There can be several reasons for this: prevention of data loss due to accidental hardware or software failures, monitoring or processing the results of an ongoing simulation, or a restart possibility for prolonging the simulation across multiple runs, beyond the batch system constraints for single executions. However, as a consequence of the ever increasing ratio of computing power to Input/Output (I/O) subsystem throughput, a relevant fraction of (wall clock) time is consumed by “waiting” for I/O operations to complete. Usage of “parallel I/O” techniques may reduce this time by increasing the I/O throughput and thus the efficiency of resource usage. Also techniques of overlapping I/O and computation, so-called “staging” techniques, are known to improve I/O efficiency. The project ITM-ADIOS2 has been launched by ITM (Integrated Tokamak Modelling), which is an institutional effort devoted to the development of a standardized environment for tokamak simulations. This effort involves several simulation codes, supported by a geographically distributed community of physicists and institutions. Given the current heterogeneity of the involved codes, solutions with a high degree of compatibility are preferred. The results of the present investigation are targeted at this community, for which simplicity of possible solutions is of importance. The primary goal is to assess the publicly available “ADIOS” software library (see [1]) in terms of obtaining better I/O efficiency. ADIOS [1] is an open source software library for large file system I/O developed at Oak Ridge National Laboratory. It offers a single interface to many I/O methods; among its goals are interoperability and high speed I/O. This provides the basis for fast and reliable checkpoint/restart. This project is the successor of a first ITM-ADIOS project, carried out by HLST during 2011–2012 [2]. Back then, the file system performance of the HELIOS and HPC-FF computers was characterized both in serial and parallel usage. The HPC-FF computer installation (hosted at the Jülich Supercomputing Centre, Germany) served the plasma physics community until mid-2013, and consisted of 1080 nodes, 8 cores each, for a total of 8640 cores. Using techniques developed on HPC-FF, the study was extended to the HELIOS machine, hosted in Rokkasho, Japan, and put into operation at the beginning of 2012. HELIOS is considerably larger than HPC-FF: 4410 nodes, 16 cores each, for a total of 70,560 computing cores, of which a maximum of 65,536 are allowed to participate in a single parallel job. In the past, parallel I/O had been studied using the ADIOS-1.3 library. Much of the effort went into studying the MPI-IO interaction with the user-available LUSTRE parameters. In this way, an elaborate benchmark program able to achieve close to peak I/O performance was developed. With optimized parameters, it achieved close to 15 GB/s on HPC-FF and 35 GB/s on HELIOS. This is respectively ≈ 75% and ≈ 50% of the maximum bandwidth. The project ITM-ADIOS provided ready-to-use examples of I/O code. However, most of that study was carried out using the aforementioned, elaborate benchmark program with a rather complicated structure. The current project aims primarily at improving our techniques for using ADIOS as simply and effectively as possible and at transferring the acquired knowledge to ITM scientists.
The following sections present activities that have been carried out, together with a discussion of new results and a perspective on near-term work. A technical report with recommendations for users is also in preparation.

6.2. Aspects of the HELIOS file systems

HELIOS features three large file systems: $PROJECT, $WORKDIR, and $SCRATCHDIR, of which the first is designated for long-term keeping of data, while the last two are intended to be used by applications with large I/O needs. Additionally, each node features a local solid state disk (SSD) capable of ≈ 180 MB/s I/O, mounted on the /tmp directory. Experiments conducted in the previous ITM-ADIOS project gave us a solid understanding of the $WORKDIR and $SCRATCHDIR I/O performance. For that we used a serial program to perform binary I/O based on the Unix “dd” tool. The synchronizing option was switched on to ensure that the measured performance would be the effective one (for updated measurements see Section 6.3). Effective performance indicates what the I/O hardware (LUSTRE installation + network) is capable of, by measuring the speed at which the data travels to disk. User-perceived (unsynchronized) performance might be higher because operating system level caches keep the data buffered for the disk transfer. However, unsynchronized I/O does not guarantee that the data are actually stored on disk: a crash might still occur on a subsystem, and data would get lost. Moreover, unsynchronized I/O speed is influenced by the node's free available memory or the network traffic situation at a given moment. In judging the quality of a hardware installation, effective performance may be more interesting to look at, whereas in other contexts user-perceived performance may prove more interesting. The next section illustrates results obtained in recent experiments.
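The distinction between effective and user-perceived performance can be reproduced with a small sketch (Python here instead of dd; the path and sizes are arbitrary assumptions): without the final fsync() the measured rate reflects the page cache rather than the disks.

import os, time

def write_bandwidth(path, size_mb=512, block_mb=1, effective=True):
    """Write size_mb MB and return the bandwidth in MB/s; with effective=True
    the data is flushed to disk (fsync) before the clock is stopped."""
    block = b"\0" * (block_mb << 20)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    t0 = time.perf_counter()
    for _ in range(size_mb // block_mb):
        os.write(fd, block)
    if effective:
        os.fsync(fd)               # force the data out of the OS caches
    elapsed = time.perf_counter() - t0
    os.close(fd)
    return size_mb / elapsed

print(f"effective: {write_bandwidth('/tmp/bwtest.bin'):.0f} MB/s, "
      f"perceived: {write_bandwidth('/tmp/bwtest.bin', effective=False):.0f} MB/s")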

6.3. Serial I/O Performance Assessment on HELIOS

As reported in [2], two LUSTRE parameters are important for serial I/O: the LUSTRE stripe size (Lss, measured in MB) and the LUSTRE stripe count (Lsc, the number of storage targets employed for a given file's storage). The experiment we report here is based on repeatedly writing 4 GB of data to files created using different values of Lsc and Lss. The best observed serial I/O reached ≈ 520 MB/s on the $SCRATCHDIR file system and occurred while using Lss = 32 MB and an Lsc between two and eight. The worst obtained result was roughly half of that, ≈ 270 MB/s, and occurred with Lsc = 1. Similarly suboptimal results (≈ 300 MB/s) occurred when using Lss ≥ 128 MB or Lsc ≥ 32. As one can verify from Fig. 39, these values are quite similar on both $WORKDIR and $SCRATCHDIR.
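For reference, the striping parameters of a LUSTRE file can be set before writing it, e.g. with the lfs utility; the sketch below is an assumption in the sense that the exact option spelling (-s vs. -S for the stripe size) differs between LUSTRE versions, so the local "lfs setstripe --help" output should be consulted.

import subprocess

target = "/path/on/scratchdir/outfile.bin"   # hypothetical output file on $SCRATCHDIR
# Create the (empty) file with Lsc = 4 stripes of Lss = 32 MB each; the data
# written to it afterwards inherits this layout.
subprocess.run(["lfs", "setstripe", "-c", "4", "-S", "32M", target], check=True)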

[Fig. 39, two panels: write bandwidth (MiB/s) of 'dd bs=64M count=64' (default Lss: 1, count: 4) on Helios /csc/scratch2 and /csc/workdir2, plotted over the stripe size Lss (MiB) and the stripe count Lsc (OSTs per stripe).]

Fig. 39: Serial writing I/O performance on HELIOS $SCRATCHDIR and $WORKDIR file systems.

Although the general picture matches that from the old results in Ref. [2], one may observe a generalized degradation of performance, mostly in the range of 10–20% (see Fig. 40).


[Fig. 40, two panels: average write-bandwidth difference (%) with respect to the previous study for the same dd experiment on /csc/scratch2 and /csc/workdir2, plotted over the stripe size Lss (MiB) and the stripe count Lsc (OSTs per stripe).]

Fig. 40: Serial writing I/O performance on HELIOS $SCRATCHDIR and $WORKDIR file systems, relative difference to the results obtained in our previous study [2].

The complete characterization of this degradation has been submitted to CSC via the ticket system and the problem is being handled by the system administrators. Their findings so far indicate that the degradation is due to the more recent LUSTRE software versions, which trade some (serial) performance for increased stability.

6.4. I/O Performance for a trivial parallel job

Parallel I/O involves all computing resources: the computing nodes, the network interconnect, and the storage area systems. We established a simple test to measure the performance of the I/O system under a heavy “parallel” load, while avoiding most of the user-space software complexity. Namely, we launched a job consisting of independent parallel processes writing to independent files, without using the MPI message interface and relying only on the SLURM job manager. In our experiment, each process wrote 512 MB to a uniquely named file, using the synchronization option, so that the effective performance could be studied.

[Fig. 41: throughput (MB/s) on $WORKDIR of 'dd if=/dev/zero bs=1M count=512 conv=fdatasync' run in parallel, plotted against the number of parallel MPI processes (16 per node), each writing to its own 512 MB file.]

Fig. 41: I/O writing performance on $WORKDIR obtained using 1 to 1024 nodes performing serial I/O. LUSTRE parameters were chosen according to the “rule of thumb” we had extrapolated in [2].

The results show (see Fig. 41) that after a near-to-linear ascending phase from 16 to 64 parallel processes (1 to 4 nodes), the throughput stabilizes at ≈ 32 nodes, giving a plateau located at ≈ 22 GB/s. Given the simplified conditions under which this test has been performed, we regard these results as near to the optimum.

6.5. Possible usage of /tmp for fast I/O

The throughput results of the previous section are limited to ≈ 22 GB/s, which is already reached for the quite small number of 512 MPI processes out of a maximum of 65,536 (or 32 nodes out of 4096). Instead, the system's /tmp file systems might be used for higher throughput rates. Indeed, as /tmp is local to each node, its total storage capacity grows linearly with the number of nodes. Since each local file system is capable of ≈ 180 MB/s (or ≈ 11 MB/s per process), one would expect the capacity of e.g. a 1024-node job to be ≈ 180 GB/s. The parallel environment erases the user-generated contents of /tmp after each job's termination. Although this is a perfectly reasonable policy, it rules out a straightforward implementation of checkpoint-restart based on copying data after job termination. A hypothetical solution based on synchronous I/O to /tmp followed by an asynchronous (that is, overlapped with computation) transfer to $WORKDIR/$SCRATCHDIR would have the potential of boosting performance over LUSTRE by 900%. The current amount of /tmp space made available by CSC is ≈ 50 GB. Results of a performance test using the “dd” utility are as predicted and are displayed in Fig. 42. Notice the comparison with the $WORKDIR writing speed: at 2048 processes (128 nodes) break-even occurs and writing to /tmp beyond that point becomes (overall) faster.
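A minimal sketch of this two-stage strategy (the names and paths are assumptions; a real implementation would also have to handle restart data placement and error recovery):

import shutil, threading

def checkpoint_with_async_drain(write_checkpoint, local_path, remote_path):
    """Write a checkpoint synchronously to the node-local /tmp SSD, then copy it
    to the shared LUSTRE file system in the background, overlapped with the
    computation that follows."""
    write_checkpoint(local_path)                 # fast, node-local, synchronous
    drain = threading.Thread(target=shutil.copyfile,
                             args=(local_path, remote_path), daemon=True)
    drain.start()
    return drain                                 # join() before job end or the next checkpoint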

[Fig. 42: I/O writing throughput, including parallel environment overhead (MB/s), plotted against the number of parallel MPI processes (16 per node), each writing a 512 MB file, for $WORKDIR, /tmp, and the ideal scaling on /tmp.]

Fig. 42: Serial I/O from a parallel job writing to /tmp and $WORKDIR compared. The figure shows the total written data divided by the time it took to run the writing job. Therefore, a small parallel environment overhead is also being measured (up to 10% at 32,768 processes). Notice the almost linear scaling of the aggregated /tmp I/O capacity. The largest run (65,536 processes) was slower than expected due to a parallel environment synchronization failure.

6.6. Parallel I/O Performance Assessment on HELIOS

The results in this section have been obtained by using ADIOS in an MPI-enabled program. Running a few significant cases with the ADIOS-1.3 based IABC benchmark from [2] suggested that the previous project's I/O performance can be reproduced. However, due to API incompatibilities, the source code of the old IABC needed to be adapted to ADIOS-1.5. Instead of updating the old C benchmark, we worked on obtaining a minimal, yet representative, Fortran based benchmark aimed at establishing a “model ADIOS usage”. After experiments and communication with the ADIOS developers, we found a solution which seems to be simple and efficient enough to be proposed to the ITM community.

We ran our benchmark with an increasing number of MPI tasks (or MPI processes) Pntasks, always with 16 tasks per node, and with either 128, 256 or 512 MB of data per task (Pdpt) being written to a single file, as in:

• Pntasks (Number of Tasks) = 16, 32, 64, 128, 256, 512, 1024, 2048, 4096
• Pdpt (Data per Task) = 128, 256, 512 MB

Our pilot tests with the new solution gave the performance shown in Fig. 43.

[Fig. 43: ADIOS-1.5 I/O write performance with the AGGREGATE I/O method; throughput (MB/s) plotted against the number of parallel MPI processes (16 per node, each writing to a file), for $WORKDIR and $SCRATCHDIR with 128, 256, and 512 MB per file.]

Fig. 43: ADIOS I/O write performance on HELIOS, for different I/O size per task and total number of MPI tasks. Notice overall write capacity saturation being already approached at a small number of 512 tasks (32 nodes).

We can see that, similarly to the results in Fig. 41, saturation is already reached at 512 tasks (32 nodes) with ≈ 25 GB/s. As expected, the cases with larger Pdpt reached saturation with fewer MPI processes. It is interesting to note that, unlike in our past investigation [2], here it is ADIOS itself which chooses the LUSTRE striping parameters (actually, Lsc = 1, Lss = 1 MB). Reducing the number of files to one per node may be beneficial (see [2]), but apart from this we do not expect significant improvements to be within easy reach. These tests are not representative of all possible data distributions and configurations; see Section 6.10 for performance notes on the solution we recommend to our users.

6.7. ADIOS 1.5 and staging

ADIOS-1.5 offers three staging methods ([3], Sec. 2.3.8), based respectively on the DataSpaces, DIMES, and Flexpath transport systems. According to the documentation and to clarifications from the ADIOS authors, these methods are not yet in production.

6.8. I/O needs of selected ITM codes

After having assessed the performance capabilities of ADIOS-1.5 on HELIOS, we proceeded to identify the main I/O needs of three ITM community codes: NEMORB, GENE and GEMZA. These are FORTRAN codes, and so we use FORTRAN terminology in the description that follows. The major I/O operations in these codes occur on one to a few large global arrays:

• In the case of NEMORB, the array is REAL(:,:); for GENE it can be either COMPLEX(:,:,:,:,:,:) or REAL(:,:,:,:,:,:) (it is compile-time dependent); for GEMZA it is REAL(:,:,:,:). In all cases, compiler switches are used by the authors to select either REAL(KIND=4) or REAL(KIND=8). In the case of GENE, support for different precisions or either real/complex in the same build may be useful.
• The local slices of GENE's global array are of hyper-rectangular SHAPE, whose SIZE and distribution can be determined at run time: a full 6-dimensional decomposition is thus possible. The local slice sizes are replicated on each computing task, with a different position in the global array.
• The indexing convention is that of starting at zero and of ALLOCATE'ing using global indices. NEMORB's array is distributed along one dimension only, on which the local sizes can vary (they depend on the number of local particles). The other dimension is replicated (that is, not distributed).
• GEMZA's arrays are ALLOCATE'd according to local, one-based indexing.

In all cases, the capability of reading back the data with a different number of parallel tasks was required for preprocessing and flexibility purposes.

6.9. A lightweight FORTRAN module “HLST-ADIOS-CHECKPOINT”

ADIOS provides a certain degree of generality with its features: there are many options at the user's disposal. Therefore, enabling an application to use ADIOS may require a certain knowledge of the library and effort to maintain it. In our experience on HELIOS, achieving fast I/O with ADIOS requires the choice of certain (ADIOS-specific) parameters. For this reason, it has been decided to provide ITM users with a lightweight wrapper around the ADIOS library, in the form of a FORTRAN module named “hlst_adios_checkpoint”, abbreviated in the following as “HAC”. The module is to be specialized at build time to our users' use cases, as mentioned in the previous section. With these constraints, HAC's interface is kept at a minimum and could be easily documented, relieving the users from having to consult any official ADIOS documentation. This also shields the users from many possible programming errors. The correctness, performance and maintenance aspects are dealt with within the module maintained by HLST, thus relieving ITM users from such burdens. In order to use HAC, an ITM user has to include only the FORTRAN source in the code base; the interface is made up of merely six subroutines, respectively for:

• ADIOS library initialization and finalization (at program begin or end)
• checkpoint and restart (each with one single call)
• checkpoint file metadata read
• test/benchmark (the same subroutine for either of the two, driven by environment variables and intended to act as a complete program)

Three preprocessor symbols determine the numerical type and the array dimension count the module is specialized for. The module is exhaustively documented with a few pages in Unix man format. Compliance with standard and correct Fortran 2003 is enforced by using the (proprietary) “Forcheck” static analysis tool and the strictest compiler flags. Correctness for a variety of use cases (e.g. different data distributions, several and differently sized I/Os in a row) can be checked by invoking the included test/benchmark suite. Additionally, we performed a test with the Valgrind [4] (GPL licensed) tool when stress-testing ADIOS serially to find out about memory leaks (“valgrind --leak-check=full”). After finalization of the ADIOS-1.5 library, we observed that a number of memory chunks appeared to be left allocated by ADIOS; in other words, there are memory leaks. In our typical cases, these seem to be proportional in size and quantity to the number of I/Os performed, less than 10 kB in a few chunks each time. This may lead to problems in very long running and I/O intensive executions. We have informed the ADIOS authors with a diagnosis of this problem and about minor error reporting issues. Therefore, forthcoming releases (1.7 onwards) should be more robust in this regard.

6.10. Performance aspects of HAC

According to our past experience, an I/O library is not necessarily efficient in all the cases that may possibly arise. For this reason, we did a performance test of ADIOS-1.5 (through the use of HAC) in the following different situations:

• different MPI libraries (either bullxmpi or intelmpi)
• different file systems: $SCRATCHDIR, $WORKDIR
• different workloads: either 50, 100, 150, 300 MB per task
• different job sizes: from 1 to 128 nodes (as seen in a previous section, 32 nodes suffice to saturate the I/O system)

In all these cases, we did not observe any behaviour particularly diverging from what was expected. So, for instance, one may expect above a dozen GB/s for 16–32 nodes, in general more with larger workloads from each task, and no major differences depending on the MPI library used. As to the practical performance in a real world environment, we have received communication that HAC is able to speed up the checkpoint-restart times of GENE on HELIOS by a factor of a few tens, and to achieve a throughput in the expected range.

6.11. ADIOS-1.5 on the EFDA Gateway

We made an ADIOS installation (versions 1.3.1 and 1.5.0) available together with our HAC module on the ITM gateway cluster. This is the primary machine for ITM code development and integration. Having a reference ADIOS installation allows ITM users to test our ADIOS based solutions and use them further in production on HELIOS. Every ITM user now has access to the code, documentation and instructions of HAC (working with ADIOS-1.5) [5].

6.12. Possible Future Work

There are two types of possible additional developments of the present project: a) those concerning the HAC module; b) further experimental work. Both depend on the reception of HAC by the community and on their I/O needs. At the present stage, the HAC module supports a single given numerical type and number of dimensions per built instance; the specification occurs exclusively via the preprocessor. Enabling the module to support different array configurations in a single build may serve the requirements of minor, but still relevant, I/Os. In the final stages of this project, ADIOS-1.6 has been released. This version does not bring improvements in the staging capabilities, which is an area of particular interest, but it may be worth adapting HAC to it to keep pace with the API evolution.

6.13. Conclusions

A reasonably simple and robust method for using ADIOS-1.5 efficiently on HELIOS has been established. A relevant subset of ADIOS features has been studied, and the I/O performance properties of our machine of reference (HELIOS) have been characterized. We have worked on transferring our knowledge and experience to major ITM codes, and on tuning our solution to their needs. The solution is easily usable in the form of a Fortran module and, on reasonably sized runs, it allows users to fully exploit the I/O capabilities of the HELIOS machine. We plan to disseminate information about our work via a webinar to interested users.

6.14. References

[1] “ADIOS: Adaptable IO System (ADIOS)” library website, Oak Ridge National Laboratory, accessed December 2013, http://www.olcf.ornl.gov/center-projects/adios
[2] “ITM-ADIOS Project Final Report”, Michele Martone, http://hlst-efda.eu
[3] “ADIOS 1.5.0 User’s Manual”, Oak Ridge National Laboratory, accessed December 2013, http://users.nccs.gov/~pnorbert/ADIOS-UsersManual-1.5.0.pdf
[4] http://www.valgrind.org
[5] https://itm.ipp.mpg.de/wiki/ITM/index.php/AvailableLibraries#ADIOS, accessed April 2014


7. Final report on HLST project PARFS2

7.1. Introduction and initial project status

The Frascati theory and modelling group has developed in the past years hybrid MHD-PIC codes (HMGC, HYMAGYC) to study the linear and nonlinear dynamics of Alfvénic-type modes in tokamaks in the presence of energetic particle populations. While a large effort has been devoted to the parallelization of the kinetic part of the code, the corresponding field solver has remained serial. The first “PARFS” (Parallel Field Solver [1]) project yielded a parallel version of the field solver (MARST) component in the “HYMAGYC” code. This required a parallel linear system solver component that was developed by HLST. It consisted of a minimally invasive layer between HYMAGYC and third-party linear solvers, starting with the MUltifrontal Massively Parallel Solver (MUMPS, [2]) and the IBM Watson Sparse Matrix Package (WSMP, [3]). The PARFS solver interface provided to HYMAGYC contained merely four subroutines covering a few thousand lines of code. These serve the purposes of: initializing the solver, beginning the matrix construction, inserting matrix coefficients, and solving the linear system. A relevant part of the underlying code converts and redistributes the matrix according to the format required by the linear solvers; the rest of the code is specific to the solver libraries being used. The solver libraries were chosen after tests done with linear systems dumped to file. At the end of the PARFS project a modified version of HYMAGYC was ready to run with the PARFS solver interface enabled on the HELIOS machine. The best solver identified by then was MUMPS-4.10 (running with Intel MPI). WSMP had been found to work on mid-sized cases, but it used too much time and memory to be competitive with MUMPS. The library [4] was used as a helper.

7.2. Overview of project activities

The key activities proposed by the project coordinator in the PARFS-2 project proposal were:

1) Improvement of integration and usability aspects (e.g. to make PARFS usable with the eigensolver MARS)
2) An extended investigation of the linear solvers with the aim of improving the efficiency, both memory- and CPU-wise

We have proceeded with respect to the first point by:

• integrating the “modified copy” of HYMAGYC obtained in the previous project into the main repository trunk
• providing help in porting the code to the ITM/EFDA and CRESCO gateways

Additionally, the project coordinator also made his part of the necessary adaptations, the most important of which were switching most of the arrays to be fully ALLOCATABLE according to the given MHD equilibrium “size” and parallelizing the time stepper. Regarding the second point we proceeded by:

• adding ScaLAPACK to the list of possible solvers in PARFS
• tuning the MUMPS solver control parameters

The different solvers can now be selected (and controlled) at runtime by the user by means of a handful of PARFS_* prefixed environment variables. The following sections discuss the testing activities and findings.

In July a visit to the project coordinator took place; we report on this as well, in Section 7.6.

7.3. Time consumption in HYMAGYC

To understand the impact of the “factorization time” (Tfac) and “solve time” (Tsol) of the solver package on the “total simulation time” (Ttot) of HYMAGYC it is sufficient to have a crude model for time consumption:

Ttot = Tfac + Nstep · (TRHS + Tsol)

Here we have neglected the initialization time of the code and introduced the number of time steps Nstep of the simulation, together with the additional overhead during one time step caused mainly by the computation of the next right-hand-side vector (TRHS). Of these:

• Tfac, Tsol, and TRHS depend on the case size, the available thread count and the MPI task count
• Nstep depends on the choice of the user and is usually large (Nstep > 10³)
• Tfac and Tsol depend strongly on the chosen solver
• TRHS increases significantly when the right-hand side is constructed with contributions from the PIC part of the code

For realistic simulations the factorization time Tfac is usually of minor interest, as the factorization occurs only once and becomes less and less relevant as the number of time steps increases. The impact of the solve time Tsol strongly depends on whether the PIC option of HYMAGYC is used or not. When it is switched on, TRHS may greatly exceed Tsol, thus shifting the linear solver preference criteria towards other aspects such as memory usage.

It is important to note, however, that a time-consuming Tfac is a big impediment in the development and testing phase of the code.
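The weight of the individual contributions can be read off directly from the model; as a sketch with the MUMPS timings of Table 15 for the largest case (Tfac ≈ 615 s, Tsol ≈ 0.26 s on 318 nodes) and an assumed TRHS:

def total_time(t_fac, t_sol, t_rhs, n_step):
    """Crude time model of Section 7.3: Ttot = Tfac + Nstep * (TRHS + Tsol)."""
    return t_fac + n_step * (t_rhs + t_sol)

# With Nstep = 10**4 time steps, Tfac contributes only a few percent of Ttot,
# while Tsol and especially TRHS dominate (TRHS = 1 s is an assumption here).
print(total_time(t_fac=615.0, t_sol=0.26, t_rhs=1.0, n_step=10**4))   # ~13215 s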

7.4. ScaLAPACK and MUMPS results on HELIOS, first tests

This section discusses the results of a first test phase, conducted on ten linear systems. The smallest features 1.4·10⁵ equations and 5.8·10⁶ nonzeroes; the largest 1.4·10⁶ and 5.5·10⁸, respectively. Different numbers of MPI tasks and OpenMP threads and different MUMPS reordering options (PORD and SCOTCH) have been taken into consideration; only one MPI library (Intel's) was considered. As mentioned, the ScaLAPACK solver is now at the HYMAGYC user's disposal. As it is a dependency of MUMPS-4.10, it was an obvious choice to include it for direct usage. We are using the ScaLAPACK version included in the Intel MKL, as well as the further prerequisite libraries BLAS and BLACS. It is important to note that neither ScaLAPACK nor MUMPS use OpenMP; the invoked Intel MKL BLAS routines do so instead. Having to solve a non-diagonally-dominant block tridiagonal complex linear system, the factorization and solve routines we use in ScaLAPACK are PZGBTRF and PZGBTRS, respectively. These require the matrix coefficients to be stored in a dense banded format, thus causing a relevant part of the band to be made up of zeroes. This is not convenient, as it wastes memory and computing time. Considering also the temporary arrays which have to be provided for PZGBTRF, the total memory needed to represent the original (block tridiagonal) panels within the dense band can exceed, even by an order of magnitude, the original requirement of the native (MARST) format. There is, however, a trade-off between memory usage and floating point performance: in principle, a dense banded format may achieve faster execution times using blocked and vectorized code. The parallel distribution of the band portions is by column blocks; as a result, the column block count cannot exceed the band width. Consequently, for the largest case considered here, 318 MPI tasks is the limit. The PARFS solver interface uses its own MPI communicator, so the solver processes are in any case a subset of the tasks running HYMAGYC. Due to the large amount of measured data and the ongoing investigations, we will limit ourselves to the most relevant quantities, which the following Table 15 summarizes.

Intel MPI    NN=16, NT=1     NN=16, NT=16    NN=318, NT=1    NN=318, NT=16
MUMPS        786 + 0.78      679 + 0.68      625 + 0.26      615 + 0.26
ScaLAPACK    3615 + 10.40    3212 + 10.40    1123 + 1.75     122 + 1.50

Table 15: Analysis and factorization (Tfac) + solve (Tsol) times in seconds for the largest case (1.4·10⁶ equations and 5.5·10⁸ nonzeroes) when using either 16 or 318 nodes (NN) with one MPI task per node, and either 1 or 16 OpenMP threads per task (NT). Here we use PORD as the MUMPS reorderer, ScaLAPACK from Intel MKL v.11, the CSC-provided MUMPS-4.10, and Intel MPI. Please recall that the OpenMP parallelism stems only from the BLAS layer provided by the MKL. For each MPI tasks/threads (NN/NT) combination, the best Tfac and Tsol are displayed in bold text.

We focus on the ScaLAPACK results first. Comparing the single-threaded factorization times (Tfac) of the largest case, one sees that they scale poorly, e.g. from 3615 s to 1123 s when going from 16 to 318 tasks, so only 3x with 20x the tasks. Conversely, the 16-threaded case scales even slightly superlinearly with the number of processes, namely from 3212 s to 122 s when going from 16 to 318 tasks, yielding a 26x speedup with 20x the tasks. It is not obvious why such an asymmetry in the MPI strong scaling should depend on whether threading is enabled or not. In principle, using 16 tasks and having large local blocks and relatively little MPI communication is a good premise for efficient threading. But instead of improving significantly, the timings go down merely from 3615 s to 3212 s. Further investigations seem to be mandatory to gain deeper insight (see the comments in Section 7.9.2).

Let us consider the solve time (Tsol) for the largest case. Going from 16 to 318 tasks (≈ 20x), the execution time decreases from 10.4 s to 1.75 s (≈ 6x). In addition, there seems to be hardly any thread parallelism in the solve phase. We observed that among the choices of 16, 64, 256 and 318 MPI tasks, the latter is the one minimizing both solve and factorization time, and also the one revealing MPI scalability. Next, we discuss the results obtained with the default MUMPS installation on HELIOS. Here, the factorization time (Tfac) takes roughly 615–786 s whatever the threads/tasks combination is. A barely recognizable scaling (1.25x) can be noticed when going from 16 to 318 MPI tasks, and of 1.15x when turning on the threads.

The solve time (Tsol) scales negligibly (1.14x) when activating threads in the 16-task case; no scaling due to threads is noticeable in the 318-task case. The MUMPS solver with 16 tasks takes 0.78 s, while with 318 tasks it consumes 0.26 s. Differently from ScaLAPACK, MUMPS represents the matrix in a sparse format (CSR: Compressed Sparse Rows) and then distributes it among tasks by blocks of rows; also unlike ScaLAPACK, the amount of temporary memory is determined (and can be managed) by MUMPS itself. MUMPS is rich in options controlling its three main phases: analysis, factorization (both contributing to Tfac) and solve (Tsol). A closer look shows that the analysis phase accounts for the major portion (often > 90%) of Tfac and appears to execute serially (with either the PORD or the SCOTCH reorderer in the current installation). This explains the virtually non-existent scaling when going from 16 to 318 tasks. These results led us to request a reinstallation of MUMPS enabled to use PT-Scotch and ParMETIS, so as to have an MPI-parallel analysis; see Section 7.5.

The analysis time seems to be roughly proportional to the number of nonzeroes; for the 5.5·10^8 nonzeroes of the largest case it takes approx. 11 minutes, with either PORD or SCOTCH. The factorization time, taking a few seconds, decreases by a factor of 4x due to threading; this is not far from the typical scaling of memory bandwidth limited operations on HELIOS' CPU (of "Intel Sandy Bridge" type). Interestingly, the factorization time after reordering with PORD is roughly twice that after SCOTCH; the time cost seems to correlate with the count of nodes in the elimination tree induced by the two orderings. Apart from the analysis phase, the scaling of the factorization time with respect to the number of MPI tasks is good and almost linear; when scaling up the problem size, this time is roughly proportional to the nonzeroes count. The solve phase virtually does not scale with the number of threads, but it does slightly (approximately 2.5x) when we increase the number of MPI tasks from 16 to 318 on the largest case, which needs 0.26 s per solve. In addition to wall clock time metrics, we also collect memory usage information by means of the "getrusage" POSIX [5] call; this allows us to quantify the maximum amount of memory used by a process, including MPI memory usage. This technique is useful with MUMPS, which performs its allocations by itself (in contrast to ScaLAPACK, which requires the whole working memory to be provided by the user). The maximal memory per process used by ScaLAPACK is almost the same among the MPI tasks, and in our experiments never exceeded that of this setup of MUMPS. The serial reordering employed by the HELIOS MUMPS installation introduces a severe asymmetry in memory usage: the root task has been observed to use in some cases five times the average with SCOTCH, or even ten times with PORD. Therefore, employing a version of MUMPS using the parallel reordering libraries PT-Scotch and/or ParMETIS should improve these results significantly w.r.t. ScaLAPACK.
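As an illustration of this measurement technique (a minimal sketch, not the actual PARFS instrumentation; the helper names are ours), the per-process peak memory can be queried with getrusage and reduced across the MPI tasks to obtain the maximal/minimal/total figures reported later in Table 17:

    #include <stdio.h>
    #include <sys/resource.h>
    #include <mpi.h>

    /* Peak resident set size of the calling process; on Linux, ru_maxrss is in kB. */
    static double peak_rss_gb(void)
    {
        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) != 0)
            return -1.0;
        return ru.ru_maxrss / (1024.0 * 1024.0);    /* kB -> GB */
    }

    /* Called by every MPI task after the solver phase. */
    static void report_memory(MPI_Comm comm)
    {
        double local = peak_rss_gb(), mx, mn, tot;
        int rank;
        MPI_Comm_rank(comm, &rank);
        MPI_Reduce(&local, &mx,  1, MPI_DOUBLE, MPI_MAX, 0, comm);
        MPI_Reduce(&local, &mn,  1, MPI_DOUBLE, MPI_MIN, 0, comm);
        MPI_Reduce(&local, &tot, 1, MPI_DOUBLE, MPI_SUM, 0, comm);
        if (rank == 0)
            printf("memory [GB]: max %.1f / min %.1f / total %.1f\n", mx, mn, tot);
    }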

Being aware that the execution time of the solve phase (Tsol) is the most critical one for production runs, one may conclude that currently MUMPS is the best choice as it yields the best results of approx. 0.26 s per solve. This is around six times faster than the result achieved with ScaLAPACK.

For test runs with only a few time steps (Nstep ≈ 10) the dominant term is usually the factorization time (Tfac). In such a case, threaded ScaLAPACK may be the better choice if the number of MPI tasks used is large. Nevertheless, as noted in Section 7.3, the ≈ 11 minute long Tfac of smaller MUMPS jobs is a relevant obstacle to development and testing activities; it is therefore of prime importance to improve it.

7.5. MUMPS and ScaLAPACK extended tests

Having determined that no parallel reorderer was available in the HELIOS MUMPS installation, we requested CSC to provide a version interfaced to PT-Scotch [6] and ParMETIS [7]. The results described in this section pertain to PT-Scotch-5.1.12 and ParMETIS-4.0.2 as compiled by CSC. These packages have threaded routines, but not in a way MUMPS can take advantage of; consequently, threading has been turned off in the installed version. With the new installation of MUMPS-4.10 we proceeded to compare the outcome of the new parallel reordering options plus others, as follows:

• MUMPS option PAR: either "not working host" (0) or "working host" (1). In the (0) case, the first (root) MPI task neither participates in the computation nor contributes to populating the matrix (within MUMPS): it serves only as coordinator of the remaining MPI tasks (see [2, p.7]). We support variant (0) via a modified matrix redistribution mechanism in PARFS, in a way that is completely transparent to HYMAGYC. Variant (1) uses all the tasks of the MPI communicator.


• MUMPS option ICNTL(29): either the PT-Scotch (1) or the ParMETIS (2) reorderer library for the analysis phase (see [2, p.30]).

• Linking to MUMPS built by CSC using either the Intel MPI or the Bull MPI.

This makes for a total of eight "configurations". For comparison, we also took ScaLAPACK measurements with the Bull MPI. We consider all these configurations with either 1 or 16 threads. As in the previous section, we ran on either 16 or 318 nodes with one MPI task per node and, due to an unexpected asymmetry we encountered in the reordering/analysis phase, also on 17 nodes. The results are summarized in Table 16.

BULL MPI         NN=16,NT=1    NN=16,NT=16   NN=17,NT=1    NN=17,NT=16   NN=318,NT=1   NN=318,NT=16
M.+PTSCOTCH,N    289.6+0.88    112.5+0.52    175.4+0.70    87.1+0.53     88.6+1.11     66.0+1.06
M.+PTSCOTCH,W    207.6+0.68    78.7+0.50     175.1+0.65    79.5+0.48     101.4+0.85    72.9+0.92
M.+PARMETIS,N    258.9+0.97    111.7+0.53    189.8+0.75    85.1+0.62     215.4+1.07    171.2+1.04
M.+PARMETIS,W    189.7+0.82    102.9+0.63    178.8+0.73    81.0+0.54     205.4+0.99    167.5+1.01
ScaLAPACK        3906.7+10.4   2817.7+10.3   3307.3+9.9    2526.9+9.77   1915.2+1.67   142.2+1.50

INTEL MPI        NN=16,NT=1    NN=16,NT=16   NN=17,NT=1    NN=17,NT=16   NN=318,NT=1   NN=318,NT=16
M.+PTSCOTCH,N    237.5+0.74    120.0+0.49    174.9+0.68    83.6+0.45     33.8+0.30     39.2+0.26
M.+PTSCOTCH,W    179.3+0.66    96.1+0.51     192.5+0.72    90.6+0.53     37.9+0.31     27.4+0.32
M.+PARMETIS,N    4528.5+0.73   4460.3+0.56   186.6+0.69    84.9+0.47     102.0+0.30    91.4+0.28
M.+PARMETIS,W    187.2+0.69    84.6+0.47     877.1+0.55    781.2+0.37    107.8+0.33    91.9+0.37
ScaLAPACK        3102.8+10.4   2809.5+10.4   3054.1+9.9    2518.7+9.9    1118.8+1.65   112.4+1.40

Table 16: Analysis and factorization (Tfac) + Solve (Tsol) times (in seconds) for the largest case available, consisting of 1.4·10^6 equations and 5.5·10^8 nonzeroes, using MUMPS (M.) or ScaLAPACK, with either PT-Scotch or ParMETIS as reorderer, with either "working host" (W) or "not working host" (N), and with either the Intel (v.4.1.3.049) or the Bull (based on OpenMPI-1.2.4.3) MPI implementation. In the MUMPS case, Tfac includes the "analysis" time occurring just before the effective factorization. For each MPI tasks/threads (NN/NT) combination, the best Tfac and Tsol are displayed in bold text.

The most notable general difference with respect to Table 15 is that Tfac is now much smaller, independently of the MPI choice. Most of the 16-task Tfac results now take between 1.5 and 4 minutes (a 2.5–6x improvement over the PORD-based analysis), while with 318 tasks they take between half a minute and 4 minutes (a 2.5–20x improvement over PORD). Although we do not show this in the tables, Tfac can be further broken down into an unthreaded "analysis" phase and a threaded "effective factorization" phase. The former is now MPI parallel, its time is spent mainly in the reordering library, and it scales up to a certain degree. The latter is performed directly by MUMPS, which uses the previously computed reordering; here the speedup reached circa 4x with 16/17 tasks (in absolute terms, roughly from ≈ 140 s to ≈ 35 s), and circa 2x (roughly from ≈ 16 s to ≈ 8 s) with 318 tasks. MUMPS itself is not threaded: the OpenMP speedup in the factorization is due to the BLAS layer. One can observe that, among all cases, the (effective) factorization times were not impacted significantly by the choice of the reorderer. In other words, the outputs of PT-Scotch and ParMETIS (and PORD, w.r.t. the previous section) were equally "good", and the criterion for choosing one over the other lies in their speed and, as we comment later, their memory usage. An interesting case to look at is the "not working host" one (the PAR=0 option of MUMPS). If set, in the three cases of interest only the remaining 16-1=15, 17-1=16 or 318-1=317 tasks are active during Tfac and Tsol. It seems that in the 16-task Intel MPI jobs this option leads the ParMETIS-ordered cases to a slowdown of 50x (up to over 4400 s) with respect to the "working host" analogues (around 83 s threaded, 190 s unthreaded). Also the "working host" 17-task job (where all 17 MPI tasks are workers) using ParMETIS is impacted by a degradation, this time in the 5x to 9x range (up to circa 900 s). Conversely, the "not working host" jobs with 17 tasks (17-1=16 workers) and the 16-task jobs had values of Tfac within the norm. This suggests that MUMPS reordering with ParMETIS under the Intel MPI with a non-power-of-two worker count can be greatly detrimental and should be avoided at all costs. The actual reason is not known to us at the moment; the CSC support is aware of the problem. The same effect seems to be present also when using PT-Scotch, although with much less degradation: at most a ≈ 30% slowdown, regardless of the MPI choice, has been encountered. With 318 MPI tasks, the ParMETIS reorderer makes Tfac ≈ 3x larger than with PT-Scotch, regardless of the MPI.

One can see that, in general, the Bull MPI Tsol times are up to 50% larger than those of the Intel MPI with 16/17 MPI tasks, and that with 318 tasks this slowdown factor reaches 3x. The new Intel MPI results (0.35–0.79 s at 16/17 tasks, 0.26–0.37 s at 318 tasks) do not improve on the ones obtained with a PORD-derived factorization (Table 15). These results discourage usage of the Bull MPI in cases with many tasks. Across all the MUMPS results, the impact of threading (OpenMP) parallelism is no more than a 3x speedup, either in Tfac or Tsol. This can indicate either badly parallelized computational routines or intrinsically hard-to-parallelize algorithms. A profiling-driven (rather than MUMPS parameter-driven) code optimization may be in order at this stage, so we recommend it as future work. Even if it yields a speedup of merely 3x, it is surely advisable to turn OpenMP threading on.
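For illustration only (this is a sketch against the MUMPS C interface, not PARFS's actual code, which drives MUMPS through its own wrappers), the PAR and ICNTL(29) settings discussed above map onto plain fields of the MUMPS instance structure:

    #include <mpi.h>
    #include "zmumps_c.h"                      /* complex double precision MUMPS C interface */
    #define ICNTL(i) icntl[(i)-1]              /* 1-based access, as in the MUMPS examples   */

    void parfs_like_mumps_init(ZMUMPS_STRUC_C *id, MPI_Comm comm)
    {
        id->comm_fortran = (MUMPS_INT) MPI_Comm_c2f(comm);
        id->par = 1;                           /* 1 = "working host", 0 = "not working host" */
        id->sym = 0;                           /* general (unsymmetric) matrix               */
        id->job = -1;                          /* initialize the MUMPS instance              */
        zmumps_c(id);

        id->ICNTL(29) = 1;                     /* parallel reorderer: 1 = PT-Scotch, 2 = ParMETIS */
        /* Depending on the MUMPS version, ICNTL(28) may also have to request the
           parallel analysis explicitly; then set the matrix and run jobs 1, 2, 3. */
    }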

BULL MPI         NN=16,NT=1        NN=16,NT=16       NN=318,NT=16
M.+PTSCOTCH,W    8.7/8.2/134.9     10.4/5.1/132.4    1.6/0.9/333.2
M.+PARMETIS,W    14.0/5.4/135.8    8.9/7.3/130.9     4.1/0.9/1109.0
ScaLAPACK        24.9/24.2/392.9   25.3/24.2/394.1   5.5/4.7/1622.5

Table 17: Memory usage measured when running HYMAGYC with three relevant solver configurations. For each configuration, the maximal / minimal / total of the task-maximal memory amounts are reported, in gigabytes. Because of the negligible differences with the NN=16 case, we omit the NN=17 data. We also omit the NN=318, NT=1 quantities, which differ only negligibly (a few percent) from those for NN=318, NT=16.

For each of the previous solver configurations, we have taken memory usage statistics using "getrusage". The MPI choice did not have a relevant impact here and, for MUMPS, the reorderer was the only option with a major impact; hence our memory usage summary in Table 17 is restricted to the essential. The imbalance between root and non-root tasks is within a factor of 2x in most of the observed cases. This is in contrast to the PORD runs (see the previous section), where the imbalance could reach 5x. When running with 16/17 MPI tasks, the memory usage of ParMETIS and PT-Scotch does not differ significantly. However, at 318 tasks we observe that the total ParMETIS memory usage is about three times as high as that with PT-Scotch, and so is the maximal per-task memory usage. The maximal memory usage per task seems slightly influenced by the "not working host" option, in which case the matrix fractions on each non-root task can be slightly (no more than 1/16 to 1/318) larger than with the option off. In extending the results to the Bull MPI, we have also considered ScaLAPACK, in the OpenMPI-compatible binary-form library provided by Intel MKL. At 16 MPI tasks and with threads, Tfac and Tsol are similar to the Intel MPI case (≈ 2810 s + 10.3 s). Unthreaded, the Bull MPI Tfac took some ≈ 20% more than Intel's, which is unexpected given that the threaded runs took the same time. At 318 MPI tasks and unthreaded, the Bull MPI Tfac phase takes almost twice as long as with the Intel MPI (1915 s instead of 1118 s), and circa 25% more when threaded (142 s instead of 112 s). The asymptotically relevant time, Tsol, is the same with both MPI implementations: ≈ 1.65 s. So it seems that, in contrast to Tfac, Tsol is not dramatically affected by the MPI; here the MKL BLAS seems to play the more significant role. As noticed previously, the Tfac thread scaling (1 to 16 threads) at 318 MPI tasks is nearly perfect (from 1915 s to 142 s). This may indicate that the failed threading scalability at 16 tasks is not a consequence of communication but rather of the Intel MKL BLAS performance, which may deserve an investigation on its own. In contrast to MUMPS's, ScaLAPACK's memory requirements are specified by an arithmetical formula which the caller code uses to allocate the workspace; this is helpful for memory load prediction. Overall, now that we use parallel reordering libraries, it is the memory use of ScaLAPACK which exceeds that of PT-Scotch+MUMPS by a significant amount: ≈ 2.5x in the 16/17-task cases, and 3x to 5x in the 318-task case. This memory requirement (due to the banded storage) is a disadvantage for ScaLAPACK w.r.t. MUMPS in configurations where memory is in short supply, i.e. when coupling HYMAGYC to the PIC module.

7.6. Visit to the Project Coordinator

In July, a one-week visit took place within the EUROfusion framework: the HLST member Michele Martone visited the project coordinator's group at the ENEA Frascati laboratories, Italy. The purpose of the visit was twofold:

• Transfer of knowledge in the usage and maintenance of HYMAGYC+PARFS, with an advanced usage of the MAKE language and test scripts.

• Development and discussion of practical testing and usage scenarios needed by the Frascati group and prospective collaborators.

The first point emphasized the precise control of the PARFS solver interface, both in the build phase and in the deployment phase. The second point served as a "hands on" transfer of knowledge in working with the new HYMAGYC build infrastructure. The scenarios that can now be dealt with easily include:

• Launch of multi-flags code builds.

• Launch of multi-physics code builds (with entire code parts enabled/disabled via the preprocessor).

The mechanisms above are now an integral part of the testing scripts. An important aspect is allowing the original HYMAGYC users to compare the quality of the different solvers / solver options and to assess the difference with respect to the original pre-existing serial solver. A precise control of the different subsystems (solver choice and its parameters) is essential here.

7.7. Conclusions

In this project, the integration of PARFS into the main trunk of HYMAGYC has been completed, with the decisive contribution of HYMAGYC project coordinator Gregorio Vlad. HYMAGYC can now be used seamlessly by its authors in parallel runs on the largest cases they have. Our investigation of the solver options has continued further with MUMPS and ScaLAPACK. The best solver we are aware of on HELIOS is MUMPS when used with the Intel MPI and the PT-Scotch reordering option. This configuration seems to be asymptotically better than the ScaLAPACK one by a factor of circa 5–6x, and in terms of memory by ≈ 3.5x. Regarding the MUMPS usage, we have determined how to avoid a slow serial bottleneck in the factorization phase, improving it by a factor of 10x to 20x (from over 600 s to circa 30 s). The new MUMPS options configuration also provides a more balanced memory usage than previously with PORD, which is important for the usage of HYMAGYC coupled to the PIC module. The next bottleneck in using MUMPS is now the solve time Tsol, which does not seem to decrease with any of the options considered.

7.8. Outlook on possible future work

For long simulations (say, thousands of time steps) the solve time Tsol is what impacts performance most; consequently, it is advised that future tuning activities start there. Analyzing the underlying MPI communication and the LAPACK/BLAS level performance may reveal potential for improvement; one may use profiling and instrumentation techniques/tools for this. One may then tailor custom-built routines, e.g. for the specifically needed BLAS primitives. Additionally, observing that the current impact of threading on Tsol is poor (≈ 2x), a specialized approach to exploiting threading might be decisive. The above observations about Tsol concern ScaLAPACK's PZGBTRS (which is a dependency of MUMPS anyway) as well, although the memory overhead of the ScaLAPACK banded format in our problem cannot be overcome. Our results show that the choice of the MPI implementation can influence performance dramatically; this suggests that tuning on the level of MPI, e.g. by using different implementation-specific options, could prove beneficial. Other activities of interest may be verifying whether there are competitive iterative methods for our problem. Theoretically, an iterative solver could provide a better solution. However, as the system lacks special (simplifying) numerical properties, the problem of finding a suitable preconditioner may be particularly challenging. Therefore, a preconditioner-rich iterative solver package (foremost, PETSc [9]) could be a good starting point.
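As a purely illustrative sketch of such a starting point (not a tested configuration: a complex-scalar PETSc build would be required, the matrix/vector assembly is omitted, and all names below are ours), a Krylov solver with a runtime-selectable preconditioner could be set up roughly as follows:

    #include <petscksp.h>

    /* Solve A x = b with a Krylov method; A, b, x are assumed to be already
       assembled PETSc objects holding the block tridiagonal system.          */
    PetscErrorCode solve_with_ksp(Mat A, Vec b, Vec x)
    {
        KSP ksp;
        PC  pc;
        PetscErrorCode ierr;

        ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
        ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);   /* 3-argument form of PETSc >= 3.5 */
        ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);    /* non-Hermitian system            */
        ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
        ierr = PCSetType(pc, PCBJACOBI); CHKERRQ(ierr);     /* a first preconditioner guess    */
        ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);       /* -ksp_type / -pc_type at runtime */
        ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
        ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
        return 0;
    }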

7.9. Appendix: Miscellaneous tests

7.9.1. WSMP Testing

We have continued testing different releases of IBM WSMP (Watson Sparse Matrix Package) [3], provided by Anshul Gupta, IBM. Successive releases of the package brought a few improvements, mainly lifting a number of limits in handling larger matrix cases than previously, although still not reaching the largest case considered. Unfortunately, the resource usage (memory and computation time) is still above that of MUMPS, so we conclude that, in its present state, the WSMP solver is not appropriate for large production runs of HYMAGYC.

7.9.2. ScaLAPACK thread scaling at 318 but not at 16 MPI tasks

ScaLAPACK is a sophisticated package using routines from the LAPACK, BLAS and BLACS [8] libraries. Apart from considering it as a solver on its own, we also use it as a necessary dependency of MUMPS. In our tests the Intel MKL version "11.1.3, 20140416" has been employed. As indicated in Table 15, we experienced an anomalous thread scaling property of the ScaLAPACK factorization routine PZGBTRF at 16 tasks/nodes: enabling all 16 threads improves Tfac only marginally (from 3615 s to 3212 s), while doing so at 318 tasks improves Tfac significantly (from 1123 s to 122 s). Since the problem size in consideration here is the same (the largest case), a consequence of having 16 (rather than 318) MPI tasks should in principle be that the per-task data is larger, and consequently the per-thread chunks should also be larger than in the 318-task case. Assuming serially efficient (e.g. cache blocked) BLAS routines, larger chunks should favour efficient parallel operation. Since the MKL's ScaLAPACK is proprietary and in binary form, an accurate profiling of PZGBTRF is not possible. Therefore, we have proceeded to compile and instrument our own version of the ScaLAPACK library and its dependencies with the SCOREP/SCALASCA tools. To that end, we have used ScaLAPACK-2.0.2, LAPACK-3.5 and the reference BLAS. Preliminary results suggest that the ZGERU BLAS routine is involved in a significant share of the computing time. However, the results of this test are not yet comparable to the ones with the Intel MKL, so further investigations are in progress to obtain reliable estimates (possibly free from the instrumentation's overhead) of the effect of ZGERU on the performance of PZGBTRF in our problem, especially with respect to threading. Moreover, one needs to understand whether Intel MKL's ZGERU behaves similarly. In principle it should not, since the Intel MKL BLAS is claimed to employ optimized (rather than merely "reference") algorithms. As ZGERU (and BLAS in general) takes a relevant fraction of the solve (PZGBTRS) routine as well, knowledge about its performance may shed light on Tsol's as well.

7.9.3. MUMPS + Parallel Reordering Insufficient Workspace Error

In certain cases (318 tasks, with either MPI library) we encountered an unexpected problem when using MUMPS with the parallel reorderers: the default MUMPS internal estimation of work space (controlled by ICNTL(14)) was declaredly not enough, leading to a "FAILURE, WORKSPACE TOO SMALL DURING ZMUMPS_266, ... LEAVING FACTORIZATION PHASE" message and an abort of the factorization phase. As a consequence, MUMPS refused to proceed with the solve phase (error "INFO=-9, INFO(2)>0"), indicating the needed memory amount instead. We have implemented in PARFS a simple "parachute" code that detects such a situation using the MUMPS diagnostic codes and re-invokes the factorization repeatedly until the requested buffer amount is acceptable to MUMPS (the effective allocation is made by MUMPS anyway). We are not aware of MUMPS options to avoid this inconvenience a priori.
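A minimal sketch of such a retry loop (not the actual PARFS code; written against the MUMPS C interface, assuming ICNTL(14) is the percentage workspace relaxation and INFO(1) = -9 the insufficient-workspace error, as per the MUMPS documentation [2]):

    #include "zmumps_c.h"
    #define ICNTL(i) icntl[(i)-1]
    #define INFO(i)  info[(i)-1]

    /* Re-invoke the factorization with a growing workspace relaxation until
       MUMPS accepts it; the actual allocation is performed by MUMPS itself.  */
    static int factorize_with_parachute(ZMUMPS_STRUC_C *id, int max_tries)
    {
        int t;
        for (t = 0; t < max_tries; ++t) {
            id->job = 2;                     /* factorization phase                        */
            zmumps_c(id);
            if (id->INFO(1) >= 0)
                return 0;                    /* success                                    */
            if (id->INFO(1) == -9)           /* workspace too small; INFO(2) says how much */
                id->ICNTL(14) += 20;         /* relax the workspace estimate by 20% more   */
            else
                break;                       /* a different error: give up                 */
        }
        return -1;
    }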

7.10. References

[1] PARFS Project Report, High Level Support Team.
[2] MUMPS: a MUltifrontal Massively Parallel sparse direct Solver (version 4.10.0), Users' guide, May 10, 2011, http://mumps.enseeiht.fr/
[3] WSMP: Watson Sparse Matrix Package, IBM, http://researcher.watson.ibm.com/researcher/view_project.php?id=1426
[4] librsb: A shared memory parallel sparse matrix computations library for the Recursive Sparse Blocks format, http://sourceforge.net/projects/librsb/
[5] POSIX Standard (IEEE Std 1003.1-2008).
[6] SCOTCH: Software package and libraries for sequential and parallel graph partitioning, static mapping and clustering, sequential mesh and hypergraph partitioning, and sequential and parallel sparse matrix block ordering, François Pellegrini, http://www.labri.fr/perso/pelegrin/scotch/
[7] Family of Graph and Hypergraph Partitioning Software (METIS/ParMETIS), George Karypis, http://glaros.dtc.umn.edu/gkhome/views/metis
[8] BLACS: Basic Linear Algebra Communication Subprograms, Quick Reference Guide, Fortran Interface, University of Tennessee, http://www.netlib.org/blacs/f77blacsqref.ps
[9] PETSc: Portable, Extensible Toolkit for Scientific Computation, http://www.mcs.anl.gov/petsc/index.html


8. Final Report on HLST project FWTOR

8.1. Introduction

FWTOR is a full-wave code, written in FORTRAN, which solves Maxwell's equations for the propagation and absorption of electromagnetic wave beams in tokamak plasmas using the FDTD method. The original expectation of the project coordinators (PC) was to obtain a parallel version of the code, including checkpoint-restart capabilities. Given the limited time assigned to the project (3 ppm), and in accordance with the PC, the scope has been restricted to an investigation determining spots for optimization and the changes required by a parallelization, with a tentative parallelization as an optional goal. The full parallelization work is to be carried out within the framework of a longer project.

8.2. Code acceptance and outline

FWTOR consists of 32 FORTRAN SUBROUTINEs in ten MODULEs and 11 source files. This makes for ≈ 5000 lines of FORTRAN code respecting the latest (2003, 2008) standards. There is no dependency on external libraries; only standard intrinsics are used. The code makes use of named constructs; this considerably eased the tracking of code changes and their discussion with the PC. In the assessment of the code, we verified code compliance under the Intel FORTRAN compiler v.15, GNU FORTRAN v.4.9 and the FORCHECK static analysis tool v.14.5.19. Running a run-time-check enabled version of the code (-check all with Intel; -fbounds-check with GNU) exhibited correct behaviour on our test cases.

8.3. Workload model and relevant metrics

Most of the FWTOR computing time is spent iterating over a number of time steps (NSTOP), each time updating values on a 2D grid of nx x nz cells. The corresponding arrays are updated via nested (2D) loops spread over different modules/routines. As these loops and routines are the bottleneck of the serial FWTOR code, they should all be modified consistently in order to achieve a good parallelization and/or optimization. For fixed input parameters (physics model options), the arithmetic load per time step is constant. Therefore, knowing NSTOP and the elapsed computing time of a single iteration is enough to estimate the total simulation time. Certain loops operate on the entire (nx x nz) grid (plasma and boundary); others are confined to the plasma region (npx x npz). The boundary region occupation and the corresponding loop workload are negligible in the cases of our interest. It is possible to control the input in a way that keeps the per-cell work constant while increasing the grid size. So, in order to compare the performance of our version with respect to that of the original, it is useful to have a performance metric that can cope with parallelism and differently sized grids. We will use cells-per-second (cps = nx·nz / single time step advance elapsed time) as such a metric, in addition to the program runtime. The grid is represented with 50 arrays occupying (14·nx·nz + 36·4·npx·npz) double precision (8 byte) locations. Imposing nx = nz (and npx = npz ≈ nx) and using a compute node equipped with 64 GB of RAM (like e.g. HELIOS), our maximal single-node case has a 7128 x 7128 grid and needs NSTOPNODE = 6.1·10^5 time steps to complete. Assuming an adequate distributed memory parallelization, an ITER relevant case would need a ≈ 2.1·10^5 x 2.1·10^5 grid occupying ≈ 5.6·10^4 GB, which needs at least 10^3 nodes to be accommodated. Assuming the same intra-node cps performance as in a single-node case, proceeding through all the required NSTOPITER ≈ 1.7·10^7 time steps would take NSTOPITER/NSTOPNODE = (1.7·10^7) / (6.1·10^5) ≈ 27 times the time of the above maximal single-node run (excluding communication time).
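As a small self-contained illustration of this workload model (the grid size, memory formula, step count and the 117 s per step of Sec. 8.4 are taken from the text; the program itself is ours, not part of FWTOR):

    #include <stdio.h>

    int main(void)
    {
        long nx = 7128, nz = 7128;           /* maximal single-node grid            */
        long npx = nx, npz = nz;             /* plasma region ~ whole grid          */
        double nstop  = 6.1e5;               /* time steps for this case            */
        double t_step = 117.0;               /* seconds per serial step (Sec. 8.4)  */

        double mem = (14.0*nx*nz + 36.0*4.0*npx*npz) * 8.0;     /* bytes */
        double cps = (double)(nx*nz) / t_step;

        printf("memory  : %.1f GiB\n", mem / (1024.0*1024.0*1024.0));
        printf("rate    : %.2e cells/s\n", cps);
        printf("runtime : %.0f days\n", nstop * t_step / 86400.0);
        return 0;
    }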


8.4. Serial code run times and limits

Experiments for this project have been carried out on a double-socket Sandy Bridge node equipped with 64 GB of RAM and supporting 16 threads; in the following we use Intel FORTRAN v.15 with the -fast -xHOST compile options. Under these conditions each time step of the maximal single-node (7128 x 7128 grid) case takes 117 s, therefore running at 4.35·10^5 cps. Given NSTOPNODE = 6.1·10^5, the total runtime would be ≈ 826 days. Assuming a perfect shared memory parallel scaling and no optimization, this time could be reduced to 1/16, i.e. to ≈ 51 days. Furthermore, assuming a hypothetically optimal distributed memory parallelization, completing an ITER relevant case would require ≈ 27·51 days, i.e. 1377 days of 1000-node compute time (not taking inter-node communication into account).

8.5. Low code impact changes

The code cannot be translated into matrix algebra in a way that would profit from e.g. optimized BLAS library calls. Instead, a number of modifications can be made to optimize and parallelize the code. We have investigated and experimented with these changes and list them succinctly (in bold text in the following) in decreasing order of estimated benefit/effort. Loop interchange (see the sketch at the end of this section) proves beneficial where 2D loops visit arrays advancing on the outermost index, thus losing contiguous access. In order to parallelize the main FDTD loops for shared memory it is necessary to declare thread-local copies of certain variables; certain routines have to be made thread safe by updating copies passed as arguments rather than data stored in module variables. In order not to be penalized by the NUMA (Non Uniform Memory Access) memory placement of modern multi-core CPUs, it is necessary to ensure that each thread's working area is effectively associated (allocated, in NUMA jargon) to that thread. This requires 1) requesting thread pinning at runtime; 2) specific post-allocation memory initialization loops. A condition for 2) to be efficient is to ensure that parallel loops visit the arrays in the same order in which they initialized them. This can be fulfilled by imposing static OpenMP loop chunk allocation and structurally similar OpenMP parallelization statements on all the loops of interest. A fraction of the trigonometric/transcendental intrinsic call results can be cached at initialization time. These calls can take ≈ 10% of the serial time, and circa half of them can be cached. A number of floating point divisions have constant or loop-constant divisors; these can be substituted with multiplications by their reciprocal. This modification can affect the numerics; our experimental error evaluation suggests it is adequate for the cases we have at hand. The above changes have been implemented in an OpenMP-enabled prototype and can be easily integrated into the main FWTOR trunk and used immediately. There is a strong interplay between these changes, so we only present here the performance effect of all of them combined. If we run our FWTOR version serially, we get a speedup of 1.57x (reducing the runtime by 36%) w.r.t. the original (6.77·10^5 vs. 4.35·10^5 cps). With all 16 threads, it is 21 times as fast as the original (9.17·10^6 vs. 4.35·10^5 cps), a maximal single-node run taking ≈ 40 days. We observed that reducing the grid size kept the cps rate stable, as long as our arrays exceeded the cache size.
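To illustrate the loop interchange (shown here in C with row-major storage for brevity; in the Fortran code the roles of the indices are reversed, since Fortran arrays are column-major, and the function and array names below are ours):

    #include <stddef.h>

    /* Hypothetical field update on an nx*nz grid stored row-major, i.e. the
       second index is contiguous in memory.                                  */
    void update_bad(size_t nx, size_t nz, double *e, const double *h,
                    double c1, double c2)
    {
        /* Inner loop advances on i: consecutive accesses are nz doubles apart. */
        for (size_t j = 0; j < nz; j++)
            for (size_t i = 0; i < nx; i++)
                e[i*nz + j] = c1 * e[i*nz + j] + c2 * h[i*nz + j];
    }

    void update_good(size_t nx, size_t nz, double *e, const double *h,
                     double c1, double c2)
    {
        /* After the interchange, the inner loop runs over the contiguous index,
           giving streaming access and enabling vectorization.                   */
        for (size_t i = 0; i < nx; i++)
            for (size_t j = 0; j < nz; j++)
                e[i*nz + j] = c1 * e[i*nz + j] + c2 * h[i*nz + j];
    }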

8.6. Major code impact changes

A second round of node-level optimizations can be applied. The necessary changes imply compromises (mainly a loss of flexibility in the code), but we estimate that, if combined, they may speed up the code by an additional factor of at least 2x. All of these changes require careful consideration by the PC. Certain innermost functions have branches on input parameter values that are constant throughout a run. Making these inputs compile-time constants (parameter) gives a big performance gain, as the compiler would produce object code with the branches eliminated. We recommend adopting this change in a compile-time optional fashion using the preprocessor (see the sketch at the end of this section). In this way one would still do development and testing with the current (variable) formulation, but launch large runs with the relevant variables as constants. A significant gain may be obtained by avoiding recomputing certain grid-associated values (e.g. the incident field). At the cost of many changes in the code structure, we estimate this may speed up the code even by 2x. The first consequence would be a slight increase of the memory needs at fixed grid size. The second consequence is reduced data reuse. The next generation of computers will likely reduce the gain from such an optimization; a hypothetical Intel MIC port of the code would probably not benefit from this change. The bulk of the code performs stencil operations: it updates cell locations using values at nearby indices. This means that subsequent accesses to certain arrays are not consecutive in memory and may cause additional memory traffic and increased latency. In these cases a proper blocking of data access in the inner loops can be applied. In our case the fraction of non-consecutive accesses is small and, even with the optimizations on, the code is still compute bound. So, at the moment, an inner blocking would not help much; if combined with the incident field optimization, it may become relevant.
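A minimal sketch of the compile-time optional constant idea (illustrated in C; all names are ours, and the FWTOR equivalent would use Fortran parameter constants together with the preprocessor in the same spirit):

    /* Build with e.g. -DFWTOR_FIXED_MODEL=2 for production runs, or without
       the define for development/testing with the runtime-variable option.   */
    #ifdef FWTOR_FIXED_MODEL
    enum { model_opt = FWTOR_FIXED_MODEL };   /* branches on it can be folded away */
    #else
    extern int model_opt;                     /* read from the input file          */
    #endif

    double cell_update(double e, double h, double c1, double c2)
    {
        /* With model_opt known at compile time, the compiler removes the
           untaken branch from this innermost function.                      */
        if (model_opt == 2)
            return c1 * e + c2 * h;
        else
            return c1 * e;
    }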

8.7. Visit to the Project Investigators

Given the short project time, it was decided to include a one-week visit to the project's principal investigators in the middle of the project (first week of November 2014). Discussion topics were:

• Preliminary project results.

• Code changes necessary to host OpenMP parallelism.

• The impact of these changes on the future FWTOR developments (PC side).

• Estimating the FWTOR workload in different cases and establishing a set of sample cases for benchmarking.

Ultimately, the goal of the visit was to discuss in advance the difficulties that may arise in accepting our solution. It was agreed to give priority to obtaining an OpenMP-enabled prototype, provided it preserves the overall code structure.

8.8. On checkpoint / restart

Whether with shared memory or distributed memory parallelism, a case occupying e.g. an entire node's memory can take several days to complete. This is a good reason to have a checkpoint / restart functionality. With the current physics, only circa 1/25 of the data allocated by a running instance would need to be written to disk (e.g. ≈ 2.5 GB for a ≈ 60 GB run), while the rest may be quickly recomputed. At current serial binary Fortran I/O speeds (e.g. ≈ 300 MB/s on HELIOS), saving the data of a single node can take ≈ 10 s. Such a checkpoint solution can easily be implemented by the PC themselves. A larger, distributed memory case, however, should rather use a parallel I/O library interface in order to avoid incurring overwhelming I/O times. According to our experience, HELIOS is capable of e.g. a sustained 15 GB/s from a thousand nodes; an ITER-sized instance of FWTOR would complete such a ≈ 2.5 TB checkpoint in less than three minutes. The HLST has mature experience with parallel I/O and may have an adequate solution based on ADIOS-1.5 once a distributed memory version of FWTOR comes into operation.

8.9. Conclusions and outlook

This three-month project analyzed potential optimizations and parallelization requirements for the FWTOR code. The code's general numerical method (FDTD) is known to be amenable to parallelization, but with a limited optimization horizon. In accordance with the PC, we directed our efforts towards a shared memory version. FWTOR's code proved to be flexible enough to host our changes and we were able to obtain an OpenMP-enabled prototype. Using a full HELIOS-like node, it can run ≈ 21 times as fast as the original serial FWTOR. Our changes so far hinder neither the readability of the code nor its main structure, so they should meet good acceptance. We recommend a number of further optimizations. These should yield at least a further 2x speedup, but here more discussion and collaboration with the PC will be necessary, because the code structure may be impacted significantly. With further work the code can be adapted to distributed-memory parallelism. Then boundary conditions and halo cells will have to be handled properly, impacting the code significantly. Algorithmically, we estimate FWTOR to adapt well to a distributed memory setting. So, although the resources (in terms of time) needed to run a full ITER case will still be prohibitive, medium-sized cases are expected to have good strong scaling properties and obtain good speedups, thus enabling the study of cases that were so far too resource-hungry.


9. Final report on HLST project REFMUL2P

9.1. Introduction

Simulation of x-mode reflectometry using a finite-difference time-domain (FDTD) code is one of the most popular numerical techniques used, as it offers a comprehensive description of the plasma phenomena. This method requires, however, a fine spatial grid discretization to keep the error to a minimum, which also implies a high-resolution time discretization to comply with the CFL stability condition. As a consequence, simulations, especially in two spatial dimensions for x-mode, can become quite demanding in computational time. Also, as the size requirements increase in an effort to simulate large devices such as JET or ASDEX Upgrade, or next generation machines like ITER, memory demands become stringent. This project was devised to address these issues by implementing a parallel version of the REFMULX code, developed at Instituto de Plasmas e Fusão Nuclear (IPFN). It extends the framework of the predecessor project REFMULXP, and therefore the results of both projects are presented together, so as to maintain the natural continuity between them.

9.1. Serial code profiling and optimization

REFMULX is a serial code that simulates x-mode reflectometry in magnetized plasmas. It evolves Maxwell's curl equations using an FDTD Yee scheme [1], where a transverse electric (TE) propagation mode is considered. The plasma effects are included via the response of the electron current density J to the electric field E of the probing wave. Such plasma-wave coupling implies solving for J as well at each time step. Two J solvers are available, namely a modified Xu and Yuan [2] one and a new kernel proposed by Després and Pinto [3]. Berenger's perfectly matched layer (PML) [4] is used as boundary condition at the limits of the domain. As the first step to guide the optimization strategy, the code was profiled, revealing the numerical kernel to be the cost-wise hot-spot, as expected. Measurements with Gprof made on HELIOS (IFERC-CSC) for a small fraction of the time steps of the REFMULX standard test-case (1500x1000 grid-count) yielded the figures listed in Table 18 for the Xu and Yuan (XY) kernel; similar results were obtained for the Després and Pinto (DP) kernel. From these it was clear that the parallelization effort needed to focus on the top five functions listed, which typically represent more than 99% of the total cost.

      %   cumulative   self              self     total
    time     seconds  seconds    calls  ms/call  ms/call  name
   34.61       30.94    30.94      700    44.19    44.19  calcHzFieldKXY
   23.19       51.66    20.72      700    29.61    29.61  calcEyFieldKXY
   21.90       71.23    19.57      700    27.96    27.96  calcJxyFieldKXY
   18.93       88.16    16.92      700    24.18    24.18  calcExFieldKXY
    1.30       89.32     1.16      700     1.66     1.66  addFLDMTRX
    0.06       89.37     0.05       22     2.27     2.27  initFLDMTRX
    0.02       89.39     0.02        4     5.00     5.00  escrvField
    0.01       89.40     0.01        1    10.00    10.00  setNe
    0.00       89.40     0.00     1400     0.00     0.00  setTCut
    0.00       89.40     0.00      700     0.00     0.00  repointJxyFields
   (…)

Table 18 Profiling of the original serial REFMULX code using Gprof with GCC on HELIOS. The standard test-case was used, but only a fraction of the time steps was calculated.

At this point it is noteworthy that the grid-count of the provided test-case was slightly increased in the y-direction, from 1000 to 1024 cells, to ease the domain decomposition according to a powers-of-two scaling law. This topic shall be covered in the subsequent sections; for the moment, the 1500x1024 grid is henceforth defined as the standard test-case. Before undertaking any attempt to parallelize these regions of the code, their serial performance was first analysed. While doing so, it was noticed that the bi-dimensional arrays used in the main four kernel functions listed in the table above were declared using a general array of pointers-to-pointers construct in the C language. By default this does not exclude the possibility of pointer aliasing, which means that writes through a given pointer can affect the values read through any other pointer available in the same context. After checking that this was indeed not the case in the aforementioned functions of REFMULX, the non-aliasing property of those pointers was declared explicitly using the __restrict keyword. In principle this aids the compiler in its task of optimization, as it relaxes the need for reloading data into the hardware registers. Doing so with the GNU C compiler (GCC v4.4.6) had little effect, with the elapsed time for the standard REFMULX test-case changing from the original 14633 s to 13275 s, as measured on HELIOS. With the Intel compiler (ICC v13.1.3), the improvement was much larger: the runtime of the same test-case went from 14965 s to 2634 s, yielding a speed-up factor of 5.7. In addition to the restriction of the pointers' scope, the expressions used in the numerical kernel were modified in order to minimize the number of floating-point operations (FLOPs) inside the loops. All the FLOPs involving constants were moved outside the loops, as these can potentially stay in the hardware registers. In particular, the expressions were changed to minimize the number of divisions. To illustrate the idea with a toy example, one should replace the expression f[i] = g[i]*(1+a)/(1-a) + h[i]/(1-a), which involves two divisions, with the equivalent expression f[i] = ( g[i]*(1+a) + h[i] ) * 1/(1-a), obtained by replacing one of the divisions with a multiplication. The reason for doing so lies in the fact that a division needs significantly more clock cycles than e.g. a multiplication or an addition. The impact of these optimizations was then assessed, revealing additional gains. The code compiled with the Intel C compiler completed the standard test-case in 1802 s, as opposed to the previous 2634 s. The code compiled with the GNU C compiler improved more significantly, taking 5760 s instead of the previous 13275 s. The larger difference in the latter case shows that ICC was at least able to perform some of these optimizations automatically, even though neither compiler reached the extent yielded by modifying the source code. The bottom line is therefore that one should always try to apply these optimizations by hand whenever possible, in order to be on the safe side.

               original    restricted    optimised    speed-up
    GNU CC     14633 s     13275 s       5760 s       2.5x
    Intel CC   14965 s     2634 s        1802 s       8.3x
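A minimal sketch combining both serial optimizations (restrict-qualified pointers plus hoisting of the loop-constant reciprocal); the function and variable names are ours, not REFMULX's:

    /* The __restrict qualifiers promise the compiler that f, g and h do not alias,
       and the loop-constant reciprocal is computed once outside the loop.          */
    void update_row(int n, double *__restrict f,
                    const double *__restrict g,
                    const double *__restrict h, double a)
    {
        const double rinv = 1.0 / (1.0 - a);   /* single division, hoisted     */
        const double c    = (1.0 + a) * rinv;

        for (int i = 0; i < n; i++)
            f[i] = g[i] * c + h[i] * rinv;     /* no divisions inside the loop */
    }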

Table 19 Serial optimization results on the REFMULX standard test-case (1500x1024 grid).

To wrap up this section, the serial code optimizations described previously led to speed-ups on HELIOS of 2.5 and 8.3 with respect to the original code, when using the GNU and Intel compilers, respectively, as summarised in Table 19. These results are illustrated by the horizontal lines in Fig. 44, with black representing the original serial code, red representing the code with the pointers' scope locally restricted inside the kernel functions, and green representing the same code with the additional factorization, i.e. the minimization of FLOPs and divisions inside the kernel loops. Further activities on the topic of node-level optimization were also attempted, namely using the roofline model for the assessment. However, this task became more difficult than expected due to some issues related to the Intel Sandy Bridge architecture, which is the one deployed on HELIOS. Their status is reported in Section 9.7.

9.2. Parallelization strategy

After the initial serial optimization of REFMULX, the project proceeded towards its main goal, namely the implementation of its parallel version and the assessment of the resulting performance gains. Since the final solution is to be applied to an envisaged three-dimensional code that will require many more resources (cores), the use of the MPI standard was proposed for the parallelization. However, in order to gain experience with the code and, at the same time, obtain improved results within a smaller time investment, it was decided to start by parallelizing the code's five hot-spot routines using OpenMP threads. The implementation of the MPI parallelization followed, so as to have both parallel paradigms available depending on the user's choice. Finally, both were combined to provide a hybrid OpenMP/MPI version, which has the additional benefit of being in line with the present tendency of increasing the number of cores per processor/socket that share common memory regions. The implementation of a pure-MPI bi-dimensionally parallelized version, which makes sense only within a three-dimensional version of the code, is left for future work.

9.3. Threaded parallelization: OpenMP

The parallelization using OpenMP threads over one of the domain's dimensions has been implemented in the optimized serial code. Since both the XY and DP kernels are largely symmetric in their two spatial directions, both in terms of the numerical stencil and of the domain grid-count, the most efficient way to proceed is to apply the thread-based parallelization #pragma omp parallel for to the slow-varying index (outer loop), which in REFMULX corresponds to the y-direction. Doing so on the five functions listed at the top of Table 18, and taking care to properly set the data scoping (shared/private variables) inside the parallel regions as well as to do multi-threaded initialization of the corresponding memory (array variables), led to the strong scaling results on HELIOS shown in Fig. 44 for the GNU C (top) and Intel C (bottom) compilers. Only a single thread per hardware core was used, and thread-pinning was activated via the environment variable OMP_PROC_BIND=TRUE, such that each thread stays bound to its allocated core during the computation, which greatly improves the results. On HELIOS, this additionally sets the thread affinity to be equidistant, meaning in practice that the threads are always distributed across both available sockets. A more involved discussion of the effect of the thread affinity on the performance is postponed to Sec. 9.4, where the comparison between multi-thread and multi-task codes is made within a HELIOS node. The solid orange curve in both plots of Fig. 44 shows the total runtime of the parallel version REFMULXp for different numbers of threads. The dashed orange curves show the time spent in each of the four kernel functions listed at the top of Table 18, which constitute the bulk of the computational cost. The horizontal dashed black, red and green lines indicate the runtimes of the original, the "pointer scope restricted" and the "optimised" serial code (see Section 9.1), respectively. Unlike the previous curves, these correspond to serial code versions and by definition have no dependence on the number of threads; their lines were extended horizontally merely for visualization purposes.
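A minimal sketch of the applied pattern (the field names and the stencil are ours, not REFMULX's): the parallel for is applied to the outer (y) loop, with explicit data scoping and a static schedule:

    #include <omp.h>

    /* Hypothetical Hz-like update: parallelize the slow-varying (outer, y) index,
       keep the loop indices private, the field arrays shared, and use a static
       schedule so that each thread always works on the same y-slab.               */
    void update_field(int ny, int nx, double **hz, double **ey, double **ex,
                      double cy, double cx)
    {
        int j, i;
        #pragma omp parallel for default(none) \
                shared(ny, nx, hz, ey, ex, cy, cx) private(j, i) schedule(static)
        for (j = 1; j < ny - 1; j++)
            for (i = 1; i < nx - 1; i++)
                hz[j][i] += cy * (ex[j+1][i] - ex[j][i])
                          - cx * (ey[j][i+1] - ey[j][i]);
    }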


Fig. 44 Strong scaling (problem size kept fixed) of REFMULX’s standard test-case on HELIOS using thread-pinning with equidistant affinity. At the top, the results obtained using the GNU C compiler (v.4.4.6) are shown. At the bottom, the same is repeated using the Intel C compiler (v.13.1.3). The absolute maximum speed-up obtained on 16 threads, compared to the original serial code, was achieved when using ICC, whose code ran 50 times faster. The GCC code was able to achieve a speedup of approximately 30x.

Compared to the optimized serial version, the threaded code necessarily involves some overhead related to the creation and destruction of the parallel regions. This is clearly noticeable for the code compiled with GCC, whereas it has a lesser effect on the code compiled with ICC. In any case, the overhead is largely compensated when multiple threads are used. Within a single node of HELIOS, which has 16 cores distributed over two sockets, the GCC code shows linear (perfect) scaling. The ICC code scaling also starts as linear (perfect) but begins to degrade beyond four threads, with saturation practically reached at eight threads. However, this does not mean that the former has better performance, since in absolute terms the latter is faster for all tested numbers of threads. Indeed, the differences between the two scaling laws arise from the performance obtained with smaller numbers of threads, which is clearly superior for the ICC code. This is consistent with the material presented in Section 9.1, where it was shown that the serial optimization employed had a far bigger impact on the ICC-compiled code than on the GCC counterpart. The difference between the dashed horizontal lines in both plots of Fig. 44 shows precisely that. The maximum speed-up factor obtained with 16 threads on a HELIOS node relative to the original serial code is slightly lower than 30x for the threaded code compiled with GCC and about 50x for the code compiled with ICC. Compared to the optimized serial code, the figures are necessarily more modest, with the corresponding factors being 11.8x and 6.0x, respectively.

9.4. Multi-task parallelization: MPI

The main implementation of the domain decomposition and data communication necessary for an MPI-parallel REFMULX code was made, to the extent that the standard test-case outlined in the previous paragraph now runs on a chosen number of MPI tasks. This implied implementing the domain decomposition of all the relevant arrays and establishing a map between their global distributed indices and the local sub-array indexing. Further, since for simplicity only one task is allowed to perform I/O operations, data that is read/written needs to be scattered/gathered across/from the remaining tasks. The runtime plotting routines were left untouched, since they are unsuited to running through batch queues on HPC machines like HELIOS. The implemented domain decomposition applies to the outer loop (slow-varying index), i.e. the y-direction, the same one chosen to distribute the threads in the OpenMP code described in the previous section. What required special care was the fact that three different staggered grids are used, depending on the quantity being computed, which means that their sizes are not exactly the same. For instance, the y-grid-count for the radial component of the electric field, Ex, is bigger by one grid-cell than its counterpart for the poloidal component of the same quantity, Ey. This is illustrated by Fig. 45.

Fig. 45 Illustration of the staggered grids for the different physical quantities in REFMULX.


To keep the domain sub-division as simple as possible, zero-padding was used on all the quantities with smaller y-grid-counts to ensure that the distributed sub-domains are same-sized for all physical quantities. This is illustrated in Fig. 46 by the empty symbols at the bottom of the grid, together with the corresponding sub-division of real grid-cells for the case of four MPI tasks. Also illustrated in the same figure is the distribution of spatially localized envelope functions, like the PML boundary layers or wave sources. The challenge lies in the fact that, because of their finite extent, they exist only in a subset of the sub-domains, which is determined by the size of the grid and the number of MPI tasks requested. Therefore, the relation between the local indexing of each sub-domain and the global indexing of the distributed domain had to be established explicitly, in order to find out to which tasks a given localized function belongs.

Fig. 46 Illustration of the zero-padding of the global domain for the quantities with smaller y-grid-count, namely, Hz, Bz, ne, Ey and Jy. The distribution of the grid-cells on four MPI tasks is also shown, including a region (pink) corresponding to a spatially localized function, like a wave source.


Fig. 47 Illustration of the four sub-domains of Ey and Jy distributed over the same number of MPI tasks, including the spatially localized function represented in Fig. 46. The real cells, guard cells, zero-pad cells and pseudo-guard cells are identified by colour. The point-to-point communication of the guard-cells between neighboring tasks is also shown.

Since the numerical stencils used require nearest neighbours only, the decomposed sub-domains on each task include one guard-cell at each end in the y-direction. These are used to communicate the neighbouring values between sub-domains using calls to MPI_Sendrecv. At both ends of the global domain, these cells are renamed pseudo-guard-cells, since they are involved in neither communication nor computation. Just like the zero-pad cells added before, their sole purpose is to ensure same-sized sub-domains on all MPI tasks; the distinction between pseudo-guard-cells and zero-pad cells is kept merely to clarify their different origins. The whole scheme is exemplified in Fig. 47 for the grids of Ey and Jy distributed over four MPI tasks, including the same spatially localized source function of Fig. 46. It is noteworthy that only one of the two kernels available in REFMULX has been parallelised with MPI, namely Xu and Yuan. Nevertheless, the similarity between them means that extending the same techniques to the second kernel (Després and Pinto) should be a relatively straightforward task to be performed by the project coordinator (PC), Filipe da Silva. Naturally, the contact with Filipe da Silva will be maintained beyond the project end, which is justified by the extent of the modifications made to the original code.
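A minimal sketch of the guard-cell exchange in the y-direction (array layout, names and tags are ours; REFMULX's actual implementation differs in detail). Each task sends its first and last real rows to its neighbours and receives into its guard rows; MPI_PROC_NULL at the global domain ends makes the pseudo-guard-cells communication-free:

    #include <mpi.h>

    /* f is the local sub-domain stored row-major as (ny_loc+2) rows of nx values:
       row 0 and row ny_loc+1 are guard cells, rows 1..ny_loc are real cells.      */
    void exchange_guard_cells(double *f, int nx, int ny_loc, MPI_Comm comm)
    {
        int rank, size, up, down;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        down = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;   /* lower  y neighbour */
        up   = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;   /* higher y neighbour */

        /* send the last real row up, receive the lower guard row from below */
        MPI_Sendrecv(&f[ny_loc * nx],       nx, MPI_DOUBLE, up,   0,
                     &f[0],                 nx, MPI_DOUBLE, down, 0,
                     comm, MPI_STATUS_IGNORE);
        /* send the first real row down, receive the upper guard row from above */
        MPI_Sendrecv(&f[1 * nx],            nx, MPI_DOUBLE, down, 1,
                     &f[(ny_loc + 1) * nx], nx, MPI_DOUBLE, up,   1,
                     comm, MPI_STATUS_IGNORE);
    }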

9.4.1. Intra-node scaling: MPI vs. OpenMP

To make a detailed comparison between the performances of the OpenMP and MPI codes, one should start by restricting the analysis to a single node. This is what is shown in Fig. 48, where the strong scaling of the standard test-case reported in Sec. 9.3 is extended with measurements for the multi-task MPI code. Just like the threaded code, the MPI code was compiled with the Intel C compiler, but it additionally uses the BullxMPI library. The solid orange curve is the same shown in the bottom plot of Fig. 44, and the blue solid line is its MPI equivalent. Both display a rather similar scaling behaviour. However, this can be strongly affected by several factors related to the HELIOS node hardware topology, as the remaining curves in the same plot show. The corresponding analysis is covered in the following lines. Firstly, as already mentioned in Sec. 9.3, thread-pinning and task-pinning were activated for all the cases discussed. Although not shown here, the fact that the processes, whether threads or tasks, stay bound to a given core throughout the simulation makes for a considerable improvement in efficiency in general, and in particular for REFMULXp's case. Secondly, where the processes are placed also makes a big difference. And finally, for the threaded version, the memory initialization also has a significant impact. Starting with the process placement (affinity), which applies to both the OpenMP and MPI codes, the difference between the orange/blue solid and dashed curves shows the impact on the scaling performance of having an equidistant/distributed placement of the threads/tasks versus a compact placement of both. Compact affinity means that the processes are placed sequentially on the cores according to their rank, filling the first socket before moving to the next one. Equidistant affinity means that the processes are placed such that the "distance" between them in terms of core ranks is the same across both sockets, whereas distributed affinity means that the processes are first split between the sockets and then placed sequentially in terms of core rank within each socket. The explicit distinction made between equidistant and distributed affinities is motivated by the HELIOS behaviour with OMP_PROC_BIND=TRUE, which varies depending on whether the SLURM command srun is used or not. Nevertheless, it should be pointed out that no significant difference is expected between them, since they are equivalent from the cache hierarchy point of view on a HELIOS node (the L1 and L2 caches are local to each core and the L3 cache is shared between all the cores within one socket). To make the picture clearer, Table 20 illustrates the practical meaning of the different binding affinities for four processes on a single HELIOS node.

Process rank               0    1    2    3

Core rank (compact)        0    1    2    3
Core rank (equidistant)    0    4    8    12
Core rank (distributed)    0    1    8    9

Table 20 Process-to-core-rank binding depending on the placement affinity, as defined in the text above, for four processes on a single HELIOS node. Core ranks 0–7 are on socket 0 and core ranks 8–15 on socket 1 (originally indicated by colour).

From Fig. 48 one sees that, up to eight cores, the difference between compact and equidistant/distributed affinities increases gradually. However, as expected, for 16 cores it vanishes because then both affinities are exactly equivalent.


Fig. 48 Strong scaling (problem size kept fixed) of REFMULX’s standard test-case on a node of HELIOS using the threaded (orange) and the multi-task (blue) code versions. The impact of the thread and task placement (affinity) is illustrated, as well as the importance of the parallel memory initialization in the threaded code. Both executables have been compiled with the Intel C compiler (v.13.1.3) and the latter further uses Bullx MPI library (v.1.2.4.3). The solid orange curve is the same presented in Fig. 44.

For the MPI code, this scaling is easy to explain, as the data dependencies across tasks are avoided with the use of guard-cells, which are synchronized with MPI communication directives outside the kernel loops. Therefore, within the kernel loops the data of each MPI task is completely private and stays within the cache hierarchy of its core. Consequently, data locality is always maximal, independently of the task placement. Having the tasks distributed across sockets, e.g. with the affinity set via the HELIOS-specific SLURM option --place, is then necessarily the best option: it maximizes the available bandwidth by providing more memory channels for the same number of cores than if the tasks were all on the same socket (compact affinity, set via the srun options --cpu_bind=cores -m block:block). Of course this holds for eight or fewer tasks, as both sockets are needed anyway for more than eight tasks, regardless of their affinity. Since the algorithm at hand is memory-bound (see end of Sec. 9.7), the impact of this is quite substantial, as the difference between the dashed and solid blue curves shows.

Even though the MPI and OpenMP results show essentially the same behaviour, it is nevertheless relevant to discuss their differences. While for the MPI code data locality is always maximal due to the usage of guard-cells, for the OpenMP code this is not the case, as there are data dependencies across threads that need to be handled. Indeed, the threads access shared regions of memory, and therefore fetching data from the cache hierarchies of other cores within the kernel loops is necessary. This is then affected by the NUMA (Non-Uniform Memory Access) topology of the Sandy Bridge architecture, for which two counter-acting effects play a role. On the one hand, as seen before, spawning the threads across both sockets of a node (equidistant affinity, set via the environment variable OMP_PROC_BIND=TRUE) doubles the available memory bandwidth compared to the case where all the threads stay on a single socket (compact affinity, set via the environment variable KMP_AFFINITY=COMPACT). On the other hand, a compact thread affinity has better data locality, since the threads are bound to neighbouring cores, yielding potentially shorter memory access latency. Indeed, when a thread sitting on socket zero needs to access a part of the shared data that resides on socket one, the access time is necessarily higher than for an intra-socket memory access. The results of Fig. 48 clearly show that, in the case of REFMULXp, of the two competing phenomena it is the former (bandwidth) that plays the dominant role and not the latter (latency), as was perhaps already expected from the previous MPI discussion: since the REFMULXp kernels are memory-bound, they are highly dependent on memory bandwidth. That the results are very similar for both the MPI and OpenMP codes shows that MPI communication and NUMA effects are sub-dominant in REFMULXp's algorithm, at least for the problem size used here within a single HELIOS node. It further shows that the NUMA effects are quite well handled by the operating system, since the OpenMP results are even marginally better than the MPI ones. Both the OpenMP and MPI optimal scaling curves (solid lines) almost saturate with eight threads/tasks. The reason is that eight processes spawned across both sockets of a HELIOS node are enough to yield (almost) the maximum memory bandwidth.
Beyond this point, namely with 16 processes, the memory bus is already saturated and, since the algorithm is memory-bound, having more compute resources available makes no difference. Finally, the last point applies to the OpenMP case only: care must be taken to ensure that the relevant memory space is allocated/initialized by all the threads involved in the parallel regions of the code. This is a consequence of the so-called first-touch page policy, which implies that the process that first touches (that is, writes to or reads from) a page of memory causes that page to be allocated in the memory bank of the socket on which the process is running. Since the cores are grouped into two sockets, a single-threaded array initialization necessarily means that the corresponding memory pages will all be allocated (if they fit) in the memory bank of the socket to which that thread belongs. In practice this limits the available memory bandwidth, since all the threads will need to access the data from that socket's memory bank, regardless of the socket they belong to. This case corresponds to the dotted orange curve in Fig. 48. Even though an equidistant thread affinity was set, the scaling behaviour follows that of the compact thread affinity up to eight threads for the reasons just mentioned. For 16 threads, it yields the worst result because, in that case, even the compact thread affinity uses both sockets' memory banks, whereas the single-thread initialization case keeps using just the first socket's memory bank.
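To illustrate the first-touch mechanism, a minimal sketch of a parallel array initialization follows. It is not taken from REFMULXp (the array name and sizes are hypothetical); it only shows the principle that each thread writes the chunk it will later update, so that the corresponding pages end up on that thread's socket.

    #include <stdlib.h>
    #include <omp.h>

    /* Hypothetical field size; in a real code the actual 2D field arrays
     * of the solver would be initialized instead. */
    #define NX 4096
    #define NY 4096

    int main(void)
    {
        double *field = malloc((size_t)NX * NY * sizeof *field);

        /* Parallel first touch: each thread writes its own chunk, so the
         * corresponding pages are allocated on that thread's socket.  The
         * schedule must match the one used later in the compute kernels. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < (long)NX * NY; ++i)
            field[i] = 0.0;

        /* ... time-stepping kernels using the same static loop partitioning ... */

        free(field);
        return 0;
    }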

9.4.2. Inter-node scaling

With the detailed behaviour of the OpenMP and MPI versions of the code within a single HELIOS node assessed in the previous section, it is now time to extend the scaling of the latter across nodes. This is presented in Fig. 49, where the previous (optimal) intra-node results are kept for reference. Beyond 16 tasks, the MPI simulation involves more than one node, and the saturation point observed with 16 tasks is overcome as more resources, in particular additional memory channels, become available. Nevertheless, for sufficiently high core-counts, the scaling starts to gradually degrade again, this time as a consequence of both the hardware limitations in terms of overhead in the MPI data communications and the so-called Amdahl's law [5]. The former is responsible for the degradation of the parallel part (dashed blue curves). Indeed, for the highest core-count presented (512), there are as many guard-cells as real cells and comparatively fewer FLOPs per task to perform, such that the communication costs start to become dominant. The latter, namely Amdahl's law, states that, as the cost of the parallel part of the code (solid cyan curve, corresponding to the sum of the dashed blue lines) decreases, the relative cost of the serial part increases, meaning that the total cost evolves asymptotically towards a saturation value given by the latter. The difference between the cost of the parallel part (solid cyan curve) and that of the total code (solid blue line) is a direct measure of this. Further, it is worth mentioning that the serial code performance optimization made in REFMULX pushed the parallel scaling performance closer to Amdahl's theoretical limit, which is why the relative speed-up factor achieved for 512 cores over a single core is only slightly over 50x for the standard REFMULX test-case. However, in absolute terms, compared to the original non-optimised serial code that was the starting point of this project (dashed black line), this corresponds to a maximum speed-up factor of ~430x.
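For reference, Amdahl's law [5] can be written as (the formula is the generic statement of the law; the numbers below are only an illustrative consistency check, not an additional measurement)

\[
S(N) \;=\; \frac{1}{\,s + \dfrac{1-s}{N}\,}, \qquad \lim_{N\to\infty} S(N) \;=\; \frac{1}{s},
\]

where s is the serial fraction of the runtime and N the number of cores. For instance, neglecting all communication overheads, a serial fraction of only s ≈ 0.02 already limits the speed-up to S(512) ≈ 46, i.e. of the same order as the slightly-over-50x observed here relative to the optimized serial code.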

Fig. 49 Strong scaling (problem size kept fixed) of REFMULX's standard test-case on HELIOS up to 32 nodes. The OpenMP scaling within a single node is kept for reference, as well as the elapsed time yielded by the serial codes (ICC). The dashed blue curves represent the parallelized kernel functions, whose sum yields the solid cyan curve. The different speed-up figures are provided relative to the original serial code and, in parentheses, relative to the optimized serial code.

9.5. Hybrid parallelization: MPI/OpenMP

As outlined in Section 9.2, the following step was to combine both parallelization approaches to yield a hybrid OpenMP/MPI parallel version of REFMULXp. The motivation lies in making the best possible use of modern hierarchical architectures, where multiple cores share memory. If one uses pure MPI, this underlying topology is simply ignored. Using OpenMP at the node level and MPI across the nodes necessarily minimizes the communication of data stored in memory, which can be potentially beneficial. Consequently, because fewer guard-cells are required, it also decreases the memory footprint compared to the pure MPI code.

To devise a hybrid code two approaches were possible: either to keep the decomposition of the slow-varying index over MPI tasks as before and enable multi-threading for each MPI task, or to move the thread-based parallelism to the fast-varying index (inner loop). The latter increases the maximum number of resources that can be used to solve a given problem size (granularity). Indeed, one could have one MPI task per grid-node in the y-direction, and then several threads (up to 16 on HELIOS) for the x-direction. Conversely, the former approach implies that the number of threads times the number of MPI tasks cannot exceed the grid-count in the y-direction. However, it requires minimal changes to the code, since the outer-loop threaded parallelization had already been implemented. And since for the problem sizes under consideration a finer granularity does not seem to be necessary, for simplicity it was the choice favoured for the hybrid implementation in REFMULXp within the scope of this project.

Because no inter-task communication is invoked at the thread level, no special care needs to be taken regarding thread safety. This means that the standard MPI_Init() call at the beginning of the code suffices, without any explicit thread initialisation; a minimal illustrative sketch of this setup is given after Table 24. One simply needs to declare how many threads-per-MPI task are to be used and to make sure that the threads are pinned with an appropriate pattern within each node, as discussed in Sec. 9.4.1. Note however that this topic is machine-dependent and as such needs to be tuned accordingly on different HPC machines. On HELIOS this can be easily achieved by specifying inside the SLURM batch script the option --place together with the environment variable OMP_PROC_BIND=TRUE and using the srun command to execute the program on the allocated resources. These ensure that the threads are distributed in a favourable way and stay bound to that pattern throughout the simulation. This pattern distributes the threads of each task across both sockets within a node, so as to use the full memory bandwidth available, but does so in a way that maximises thread locality within each socket. Namely, it first divides the total number of threads into two sets, each of which is assigned to one of the two sockets. Within each set, the threads are organized sequentially according to their rank, i.e. with a compact affinity. For instance, for one MPI task-per-node with eight threads, the task/thread affinity pattern yielded is

Task rank     0   0   0   0   0   0   0   0
Thread rank   0   1   2   3   4   5   6   7
Core rank     0   1   2   3   8   9  10  11

Table 21 Process affinity on HELIOS for one task and eight threads with srun, --place and OMP_PROC_BIND=TRUE. Cores 0-7 are on socket 0 and cores 8-15 on socket 1 (green and orange, respectively, in the original table).

which is also equivalent to the case of two MPI tasks-per-node with four threads each

Task rank     0   0   0   0   1   1   1   1
Thread rank   0   1   2   3   0   1   2   3
Core rank     0   1   2   3   8   9  10  11

Table 22 Same as above for two tasks and four threads.

and to the case of four MPI tasks-per-node with two threads each.

Task rank     0   0   1   1   2   2   3   3
Thread rank   0   1   0   1   0   1   0   1
Core rank     0   1   2   3   8   9  10  11

Table 23 Same as above for four tasks and two threads.

The final example shows a case which fills a whole HELIOS node, namely with four MPI tasks-per-node, each with four threads. The pattern then becomes

Task rank     0   0   0   0   1   1   1   1   2   2   2   2   3   3   3   3
Thread rank   0   1   2   3   0   1   2   3   0   1   2   3   0   1   2   3
Core rank     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15

Table 24 Same as above for four tasks and four threads.
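As mentioned above, a minimal illustrative sketch of such a hybrid setup follows. It is not REFMULXp code: the array sizes, the placeholder stencil and the periodic neighbour choice are assumptions made purely for illustration; only the structure (guard-cell exchange outside the kernel loops, a plain MPI_Init(), OpenMP threads over the local slow-varying index) reflects the description above.

    #include <mpi.h>
    #include <stdio.h>

    #define NX  1024     /* hypothetical fast-varying (inner) grid size */
    #define NYL 64       /* hypothetical slow-varying rows per MPI task */

    static double buf0[NYL + 2][NX], buf1[NYL + 2][NX];  /* +2 guard rows */

    int main(int argc, char **argv)
    {
        /* No MPI calls are made from within threaded regions, so a plain
         * MPI_Init() suffices (MPI_THREAD_FUNNELED semantics or below). */
        MPI_Init(&argc, &argv);
        int rank, ntasks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
        int up = (rank + 1) % ntasks, down = (rank + ntasks - 1) % ntasks;

        double (*cur)[NX] = buf0, (*nxt)[NX] = buf1, (*tmp)[NX];

        for (int step = 0; step < 100; ++step) {
            /* Guard-cell exchange of the boundary rows, outside the kernels. */
            MPI_Sendrecv(cur[NYL], NX, MPI_DOUBLE, up,   0,
                         cur[0],   NX, MPI_DOUBLE, down, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(cur[1],       NX, MPI_DOUBLE, down, 1,
                         cur[NYL + 1], NX, MPI_DOUBLE, up,   1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* OpenMP threads parallelize the loop over the local slow-varying
             * index; a placeholder stencil stands in for the actual update. */
            #pragma omp parallel for schedule(static)
            for (int j = 1; j <= NYL; ++j)
                for (int i = 1; i < NX - 1; ++i)
                    nxt[j][i] = 0.25 * (cur[j - 1][i] + cur[j + 1][i]
                                      + cur[j][i - 1] + cur[j][i + 1]);

            tmp = cur; cur = nxt; nxt = tmp;   /* swap time levels */
        }

        if (rank == 0) printf("rank 0 of %d tasks finished\n", ntasks);
        MPI_Finalize();
        return 0;
    }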


The corresponding strong scaling results are plotted in Fig. 50, where the same solid orange and blue curves of Fig. 49 are shown together with the ones for the hybrid cases using two, four, eight and 16 threads-per-MPI task. The latter turned out to be very similar to those obtained with the pure MPI version of the code, as expected given the similarity between the OpenMP and MPI single-node (optimal) scaling results of Fig. 48.

Therefore, the recommendation for the usage of REFMULXp on HELIOS for problem sizes comparable to the standard test-case is to run in pure MPI mode, since this avoids having to worry about thread affinity, which can be a challenging task. However, as soon as the problem size increases such that the total amount of available memory becomes a limitation, having several threads-per-MPI task becomes mandatory. In this case, the memory usage is necessarily smaller simply because, with fewer MPI tasks for the same amount of allocated resources (cores), fewer guard-cells are needed and less memory is necessary for the MPI buffers. As the REFMULXp results fulfil the general expectation that a hybrid code brings better memory usage at no performance cost, the recommendation in such situations is to use the code in hybrid mode, for instance with eight threads-per-MPI task.

Fig. 50 Comparison of the strong scaling of the hybrid version of the REFMULXp code on the standard test-case. The solid orange and blue curves are the same presented in Fig. 49. The remaining dashed curves refer to the cases with two, four, eight and 16 threads-per-MPI task. The behaviour stems directly from the intra-node scaling similarities between the pure OpenMP and MPI codes, as discussed in Fig. 48.

9.6. REFMULXp extended applicability

In terms of the results plotted in Fig. 49 and Fig. 50, the difference between the ideal strong scaling (dashed grey line) and the actual code performance represents a scaling degradation of the latter. As already discussed, the problem size starts to become the limiting factor beyond 256 cores, as the number of grid-cells in the distributed direction becomes of the same order as the core-count. Obviously, with bigger problems, the scaling degradation would be postponed to higher core-counts. In practice this means that, as it stands, the current parallel version of REFMULXp can, and should, be run on bigger cases, either by increasing the resolution or the domain size, e.g. to simulate ITER. In other words, new physics studies are potentially enabled. At the same time, this constitutes a necessary step towards the planned three-dimensional version of REFMULX. With this as a motivation, a weak scaling measurement of the hybrid REFMULXp was also performed, whereby the amount of work-per-core was kept fixed as the number of cores was increased. The results are shown in Fig. 51. Up to 4096 cores, the weak scaling of the parallel part of the code is virtually perfect (cyan curve). The deviation expressed by the blue curve (whole code) stems from the main code's initialization cost, which largely depends on I/O of data to files. Since the files are read/written by the master MPI task only, the data needs to be scattered/gathered to/from the remaining tasks, and since the amount of data-per-core is kept constant, more cores means more communication and bigger files to read/write serially by the master MPI task. Therefore, using REFMULXp on bigger cases, or its future three-dimensional version, will probably require some effort towards parallel I/O.

Fig. 51 Weak scaling on the hybrid parallel REFMULXp code, using eight threads-per-MPI task. Perfect scaling is observed in the parallel regions of the code, as shown by the dashed blue lines and their sum (parallel part of the code), represented by the solid cyan line. The total elapsed time (blue curve) increases mostly due to I/O operations that are part of the code’s initialization cost.

9.7. The roofline model on Intel Sandy Bridge

In an attempt to assess the node-level performance of REFMULX's kernels it was decided to apply the roofline model [6], which was developed as a method to quickly quantify the performance bounds for a given combination of architecture, algorithm and implementation. It basically states that the time to solution for a given algorithm is the maximum between the time it takes to transfer the data from the main memory to the registers (bandwidth) and the time needed to perform all the FLOPs. If the former is the slower process, the algorithm is said to be memory-bound; otherwise, it is said to be compute-bound. Typically the first step is to build the roofline model for a given architecture. For that one needs to calculate the peak performance using the vendor-provided specifications (clock rate, number of cores, etc.). Also necessary is the figure for the maximum sustained memory bandwidth, which necessarily goes over the slowest path, namely from the RAM to the registers. This needs to be measured, and the standard tool for that is the STREAM benchmark [7]. Once these steps are done, we can turn to our specific model to determine its arithmetic intensity (FLOP/Byte), which measures how many computations are performed for a given amount of data. Thus, using a standard tool to measure the FLOP rate (Likwid, PAPI, Perflib, etc.), one can compare the actual algorithm performance with its theoretical bounds, and from that gain insight into which optimizations could be beneficial.

Since we are interested in running REFMULX on HELIOS, we decided to make the roofline analysis on that architecture, which uses Intel Sandy Bridge processors. Running the STREAM benchmark there yielded a peak memory bandwidth of approximately 80 GB/s on the 16 cores available on a single (dual-socket) HELIOS node. Using the triad kernel of STREAM as an example,

    a[i] = b[i] + const * c[i]

we calculate its actual FLOP rate as follows. With a size of 60 million elements per array, the number of calculated FLOPs is (60.0E6 array elements) * (2 FLOPs per array element) = 1.2E8 FLOPs. If the STREAM benchmark is compiled without optimization (-O0), it needs 0.024 s on 16 threads to complete. Dividing the FLOP count by this figure yields a rate of 5.0 GFLOP/s. Now we use the Likwid tool [8] to measure the same quantity by accessing the hardware performance counters. The number of FLOPs measured with Likwid is close to the expected one, namely FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE = 1.26433e+08, yielding a slightly overestimated rate of 5.4 GFLOP/s. However, as soon as optimization (vectorization) is switched on, the FLOP hardware counter yields greatly overestimated values. For instance, using the -O3 compiler flag decreases the triad elapsed time to 0.019 s, but at the same time leads to a measurement of FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE = 1.93772e+08, which is almost twice the real FLOP count. Further, since vectorization is used in this case, another factor of two must be applied to calculate the FLOP rate metric from this counter. As a result, a roughly four-fold overestimated rate of 25.3 GFLOP/s is measured, instead of the expected 6.3 GFLOP/s for our case. To make sure that this problem is not specific to the HELIOS machine, the same experiment was repeated on another Sandy Bridge machine at Rechenzentrum Garching (RZG). Similarly overestimated results were obtained in this case. It seems that the floating-point instruction counters on Intel Sandy Bridge processors are incremented on instruction issue rather than on instruction execution. This leads to the observed FLOP rate overestimation, since the same instruction can potentially be re-issued several times before it is finally able to execute. Altogether, this raises doubts about whether the hardware counters of the Sandy Bridge processor family can be used for performance analysis in a consistent manner. Conversely, repeating the experiment on another Intel processor family (Nehalem) led to correctly estimated results. Such a machine (gp01 and gp02 at RZG) was therefore used for the roofline analysis of the REFMULX algorithm. It showed that the code is memory-bound and that there is still some room for node-level optimization, possibly by improving the vectorization of the main code kernels. This is however left for the future, since at that point the priority of the project was to have the OpenMP/MPI parallelized versions of the code.
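For orientation, the triad measurement described above can be reproduced with a few lines of C. The sketch below is not the STREAM source itself (the array size and the OpenMP timing are choices made here), and the roofline bound it relates to is simply min(peak FLOP rate, bandwidth x arithmetic intensity), with the triad intensity being 2 FLOP per 24 B of traffic (three double-precision arrays, write-allocate traffic neglected as in STREAM).

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 60000000L   /* 60 million elements per array, as in the text */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        const double scalar = 3.0;

        /* Parallel first touch (see Sec. 9.4.1) so that pages are spread
         * over both sockets' memory banks. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; ++i) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; ++i)
            a[i] = b[i] + scalar * c[i];       /* 2 FLOPs per element */
        double t = omp_get_wtime() - t0;

        double flops = 2.0 * (double)N;        /* 1.2e8 FLOPs for N = 60e6 */
        printf("triad: %.4f s, %.2f GFLOP/s, %.2f GB/s\n",
               t, flops / t / 1e9, 24.0 * (double)N / t / 1e9);

        free(a); free(b); free(c);
        return 0;
    }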

9.8. Conclusions and outlook

The work within the project REFMUL2P proceeded as planned, with the main goal of speeding up the REFMULX code achieved. The pure MPI and hybrid OpenMP/MPI parallel versions developed allowed the initially provided standard test-case to be run roughly two orders of magnitude faster on a few hundred cores. The main results are summarized in Table 25.


               original    optimized   OpenMP         MPI           Hybrid
               (serial)    (serial)    (16 threads)   (512 tasks)   (256 tasks x 8 threads)

elapsed time   14965 s     1802 s      298.6 s        35.0 s        35.9 s
speed-up       -           8.3x        50.1x          427.6x        416.8x

Table 25 Summary of the main performance improvements made on REFMULXp, measured in terms of the elapsed time needed to solve the code's standard test-case on HELIOS using the Intel C compiler and the BullxMPI library.

Bigger test-cases should see even bigger improvements, meaning in practice that new physics scenarios are now possible with REFMULXp. At the same time, the prospect of having a three-dimensional version of the code available soon has become more realistic, as most of the techniques developed here shall be directly deployed there. This goes naturally beyond the scope of this project, but might be the topic of future collaborations between the project coordinator Filipe da Silva and the HLST. In its present form, the code can be compiled in its serial (optimised), threaded (OpenMP), multi-task (MPI) or hybrid (OpenMP/MPI) version, depending on a pre-processor flag. This allows for an easy way to compare the versions against each other, both from a correctness and from a performance point of view. In that sense, it was important to keep the serial version working, which is also useful in terms of code portability whenever no MPI libraries are available, e.g. on a desktop machine. This paradigm provides the infrastructure for further developments of REFMULXp and its descendants by the project coordinator in an intuitive way.
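As an illustration of this single-source approach, a schematic sketch follows; the macro names USE_MPI and USE_OMP are hypothetical placeholders, not necessarily the flags actually used in REFMULXp.

    /* Schematic single-source selection of the serial, OpenMP, MPI or
     * hybrid build; the macro names are placeholders for illustration. */
    #if defined(USE_MPI)
    #include <mpi.h>
    #endif
    #if defined(USE_OMP)
    #include <omp.h>
    #endif

    void solver_init(int *argc, char ***argv, int *rank, int *ntasks)
    {
    #if defined(USE_MPI)
        MPI_Init(argc, argv);
        MPI_Comm_rank(MPI_COMM_WORLD, rank);
        MPI_Comm_size(MPI_COMM_WORLD, ntasks);
    #else
        (void)argc; (void)argv;
        *rank = 0; *ntasks = 1;      /* serial or pure-OpenMP build */
    #endif
    }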

9.9. References

[1] K. S. Yee, IEEE Trans. on Antennas and Propagation 14 (1966) 302.
[2] L. Xu and N. Yuan, IEEE Antennas and Wireless Propagation Letters 5 (2006) 335.
[3] B. Després and M. C. Pinto, private communication (2012).
[4] J.-P. Berenger, J. Comp. Physics 127 (1996) 363.
[5] G. M. Amdahl, Proceedings of the Spring Joint Computer Conference, AFIPS (1967) 483.
[6] S. Williams, A. Waterman and D. Patterson, Comm. ACM 52 (2009) 65.
[7] J. D. McCalpin, IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.
[8] J. Treibig, G. Hager and G. Wellein, Proceedings of PSTI2010, The First International Workshop on Parallel Software Tools and Tool Infrastructures (2010).


10. Report on HLST project TOPOX2

10.1. Introduction

Turbulence in fusion plasmas is known to be affected by the shaping of the magnetic flux surfaces. Modern tokamaks have strongly shaped diverted magnetic structures in which the last closed flux surface is a separatrix with an X-point. This leads to high magnetic shear near the X-point region, whose effect on drift-wave turbulence is believed to be severe. A complete numerical demonstration of this is yet to be made due to the outstanding challenge of resolving all the dynamically relevant spatial scales involved. Alternatives to the standard field-alignment techniques, upon which well resolved turbulence computations depend nowadays, are needed since these techniques break down at the X-point. This constitutes the framework within which project TOPOX2 was devised, namely the challenge of building a gyrofluid turbulence code capable of simulating the whole plasma, from the magnetic axis through the core and edge to the scrape-off layer (SOL), including the divertor. The project TOPOX2 in particular is concerned with the extension to the open field-line region of a triangular grid in RZ-space, with the points arranged along flux surfaces which are topologically treated as hexagons. This grid is currently used in the Grad-Shafranov equilibrium solver GKMHD, and the goal of the project is to continue its predecessor project (TOPOX) in order to generalize the scheme to make it work beyond the separatrix into the SOL, including the X-point. Therefore the results of both projects are presented together, so as to maintain the natural continuity between them. This constitutes one step towards the ultimate goal (beyond the scope of the project) of extending the code to treat global turbulence using novel treatments of the parallel derivatives with non-Clebsch coordinates.

10.2. GKMHD code

One of the motivations to build the GKMHD code was to have a simplified (and extendable) test-bed for the geometrical techniques described before, which are needed for a future global turbulence code capable of resolving the whole plasma column, from the magnetic axis to the SOL in diverted magnetic structures, i.e. including the X-point(s). The GKMHD code is an axisymmetric Grad-Shafranov equilibrium solver. Its underlying model was derived consistently from gyrokinetic theory, which makes its future extension to a full gyro-fluid turbulence model easy. The code uses a strongly structured triangular grid in RZ-space, with the points arranged along flux surfaces which are topologically treated as hexagons. This has the advantage over the more usual quadrilateral grid of eliminating grid singularities (at the magnetic axis) while keeping the poloidal spacing approximately constant on all flux surfaces. The Laplacian differential operators are expressed in the form of closed line integrals following Ref. [1]. A large part of the method's machinery consists of finding the node-index ordering and orientation of the nearest-neighbour hexagons.

10.3. Numerical method

The idea is to write the standard Poisson equation in its weak form over a closed domain. Thus, noting that the Laplacian operator on the left-hand side (l.h.s.) can be expressed as the divergence of a gradient, one invokes Gauss's divergence theorem to recast the volume integral into a surface integral over the enclosing surface. In two dimensions this becomes a line integral and, if the domain, as represented in Fig. 52, is an infinitesimal hexagon, it can be approximated with the sum

(1)

as introduced by Sadourny, Arakawa and Mintz in Ref. [1]. The six parcels correspond to the six nearest neighbours, labelled by the local index m, and involve both the grid-nodes and the dual-space (Voronoi diagram) nodes shown in Fig. 52. On a surface discretized with a hexagonal mesh, this formula can be applied at every grid-node j.
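Purely as orientation, a generic finite-volume line-integral discretization of the Laplacian over the hexagonal Voronoi cell of node j, with cell area A_j, edge lengths l_m and node distances d_m (this notation is an assumption made here, not that of Ref. [1]), would take the form

\[
\left(\nabla^{2}\psi\right)_{j} \;\approx\; \frac{1}{A_{j}} \sum_{m=1}^{6} \frac{\ell_{m}}{d_{m}} \left(\psi_{m}-\psi_{j}\right),
\]

which should be read as a sketch consistent with the description above rather than as the exact expression (1) of Ref. [1].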

Fig. 52 Elemental hexagonal grid-mesh, showing the six nearest neighbours of a grid-node and the corresponding six dual-space grid-nodes, from which the Voronoi diagram element area (blue) is calculated. The superscript j was dropped in the diagram.

10.3.1. Closed flux surfaces

The global j index that labels each grid-node in the mesh is obtained by counting the grid-nodes from the centre (magnetic axis) following an outward spiral in the counter-clockwise direction. In practice, to obtain j for a particular node on flux surface k one just needs to start counting from the branch-cut (poloidally counter-clockwise) within that flux surface and then add the number of nodes belonging to the enclosed inner flux surfaces. The latter number is given by the formula 3k(k-1), noting that the origin (magnetic axis) is labelled as flux surface 1. For instance, for a case with four flux surfaces around the magnetic axis, this number is 3*4*(4-1) = 36, as shown in Fig. 53 in blue for the outermost vertices of the six main triangles. As for the local orientation of the six nearest-neighbours of each grid-node, the m-index, the choice made sets the first neighbour to be the forward point (poloidally counter-clockwise) in the same flux surface, as illustrated by the numbering of the nodes highlighted in magenta in the three examples shown. This determines the column indices of the Laplacian (sparse) matrix elements which arise when the set of discrete equations is written in matrix form. Further invoking flux conservation properties, it can be shown that this matrix is also symmetric. The solution is therefore found using standard libraries, like IBM-WSMP or PETSc.
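A schematic sketch of this index arithmetic is given below. The conventions chosen here, namely that the magnetic axis is a single node with index 0, that ring k around the axis carries 6k nodes, and that counting is zero-based within each ring, are assumptions made purely for illustration and may differ from those actually used in GKMHD.

    /* Sketch of the global node numbering on the hexagonal mesh.
     * Assumption: ring k >= 1 around the axis carries 6*k nodes, the axis is
     * node 0, and p = 0,...,6*k-1 counts poloidally counter-clockwise from
     * the branch-cut.  The 3*k*(k-1) term is the node count of all enclosed
     * rings, as stated in the text. */
    static inline long global_index(int k, int p)
    {
        return 1L + 3L * k * (k - 1) + p;   /* +1 skips the axis node */
    }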


Fig. 53 Illustration of the hexagonal grid currently used in GKMHD (yellow flux surfaces). The global numbering is shown. The local number ordering and orientation of the nearest-neighbour hexagons is also shown with a few examples (magenta hexagons and numbers).

10.3.2. Open flux surfaces

In terms of extending the method to go across the magnetic separatrix into the open field-line region, represented by the red surfaces in Fig. 54, the modification proposed here is one which implies minimal changes to the method just described. The first step is to define the branch-cut to pass through the magnetic axis and the X-point (green dot symbol), and to extend it into the private flux region. Then, an artificial boundary is set between the intersection point on the outermost considered flux surface in the private flux region and a location on each leg of the outermost SOL flux surface considered (yellow dot symbols), whose connection is depicted in cyan in Fig. 54. Everything outside this domain is discarded. This means that the divertor itself is not included and will have to be modelled via physical boundary conditions (sheath models) on the regions corresponding to the cyan lines in future turbulence simulations based on this grid. On the other hand, this choice still keeps the necessary magnetic structure, namely the SOL region, the private flux region and the X-point, while maintaining the hexagonal topology of the closed field-line grid. Because the domain topology remains hexagonal, the numbering and ordering method described before can be directly applied to the domain of the open field-line region (in red). An example of nearest-neighbour orientation is also given in Fig. 54 for the SOL region (magenta). The next step is to devise a test-case whose flux surfaces have the adequate topology. This is the subject covered in the remaining text.


Fig. 54 Proposed extension of the grid indexing to the SOL region (red flux surfaces). The global numbering of the closed flux region (greyed out) is simply continued.

10.4. Test-case

In order to proceed with the modifications proposed in the previous section in a structured way, it is advantageous to devise a stand-alone test-case. This will ultimately provide a clean way to check such modifications before they are implemented in GKMHD's solver. Additionally, devising a test-case is in practice an appropriate way to become familiar with the numerical method at hand and its properties, without having to carry the additional overhead of the GKMHD code.

10.4.1. Closed flux surfaces

Starting with closed flux surfaces, which we already know how to handle, the test-case chosen consists of solving the Poisson equation on a circular disk with radial Dirichlet boundary conditions. In polar coordinates (r, θ), it reads

(2)

with a source term given by [2]

(3)

where p is an integer free parameter. This system has an analytical solution given by

(4)

Setting the free parameter to p = 10 yields the functions depicted in Fig. 55.

Fig. 55 Source term (3) on the left and corresponding Poisson’s equation analytical solution (4) on the right, for p=10.

The standard approach to address this problem would be to build a two-dimensional solver in polar coordinates. This was done as a stand-alone test-case code, in which a pseudo-spectral method based on the work by Lai [3] was implemented. It uses central second-order finite differences in the radial direction and truncated Fourier series in the poloidal direction, yielding a standard solver. To address the singularity of the polar coordinates at the radial origin (r = 0), the equidistant radial grid mesh is shifted half a step-size away from the origin, which, due to an important cancellation property, yields a closed system of equations [3]. It shall be called Lai's method henceforth. The same test-case (2)–(3) was also implemented as another stand-alone code using the solver of interest for the project, namely Sadourny's line-integral method on hexagonal grids [1] of the GKMHD code, which shall be referred to as Sadourny's method in the remainder of the text. The bulk of the effort in implementing Sadourny's method lies in establishing the relations between each grid-node and its six nearest-neighbours, determined by the choice made in Sec. 10.3 for their local orientation. Further, the corresponding six ratios

(5)

and the elemental dual-space hexagon area in (1) must also be calculated (recall Fig. 52). Here, an alternative way to do so has been developed and checked against the original method implemented in GKMHD. The former has the advantage that it explicitly calculates the hexagonal element areas of the Voronoi diagram (dual space), unlike the latter, which approximates them as 1/3 of the areas of the element hexagon made from the grid-nodes. Despite this, no significant impact on the accuracy resulted, but the exercise served as a validation of the original approximation.


The last step in the process uses the previous information to recast the system of equations (1) into matrix form. Doing so yields a symmetric and sparse matrix, due to the intrinsic conservation properties of the numerical method and to its 7-point stencil, respectively. Following what is done in GKMHD, the IBM-WSMP library is invoked to invert this matrix and find the solution. In practice, for that to happen, the Laplacian sparse matrix at hand must first be converted into the compressed sparse row (CSR) format (a schematic sketch of such a CSR assembly is given after Fig. 56). This is a non-trivial task due to the involved global grid-node neighbouring relations together with the symmetry of the matrix, which together yield a spatially dependent algorithmic stencil. This procedure was checked in GKMHD and implemented in the stand-alone Sadourny test-code, and is furthermore in the process of being extended to alternatively allow the use of the PETSc library. Of course other methods could also be used for the purpose of inverting the sparse matrix; in particular, the popular multigrid method is a strong candidate. This has in fact been assessed within the scope of previous HLST projects, e.g. the MGTRIOP project of K. S. Kang, so it shall not be addressed here.

The plots in Fig. 56 show the numerical discretization of the (continuous) source term (3) of Fig. 55 for each of the two solver methods, using the same poloidal resolution of 96 grid-nodes on the outermost contour. The adequacy of the chosen source term for the problem under study is noteworthy. It demands high resolution in the radial periphery of the domain, as is typically the case for tokamak turbulence simulations. This renders Lai's equidistant polar grid quite inefficient, since the grid resolution increases towards the centre as 1/r². Hence, to obtain the needed mesh resolution on the outer domain, one necessarily wastes resources on the inner region, which becomes over-resolved, as seen in the right plot of Fig. 56. The same does not happen for Sadourny's hexagonal grid, illustrated in the left plot of the same figure. This property is actually one of the main motivations for using the latter, which keeps the resolution fairly constant across the whole radial domain.

Fig. 56 Representation of the source term (3) for p=10 on Sadourny’s hexagonal grid (left, in red) and Lai’s polar grid (right, in blue). Both grid-counts yield the same number of grid nodes on the outer radial contour, namely 96, but it is clear that the polar total grid-count (1536) is much higher than the hexagonal counterpart (817), leading to unnecessary resolution on the more internal radial contours.
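As announced above, a schematic sketch of how a symmetric 7-point (hexagonal) stencil can be packed into CSR arrays follows. It is not the GKMHD implementation: the array names, the neighbour-table layout and the convention of storing only the upper triangle (0-based, diagonal first within each row) are assumptions made here for illustration; the exact index base and ordering required by WSMP or PETSc are not reproduced.

    #include <stdlib.h>

    /* Pack a 7-point hexagonal stencil (diagonal + 6 neighbours per node)
     * into CSR arrays, keeping only the upper triangle as usually required
     * by symmetric sparse solvers.  Column sorting within a row may
     * additionally be needed, depending on the library. */
    void stencil_to_csr(int n,                 /* number of grid-nodes        */
                        const int    nbr[][6], /* global indices of neighbours */
                        const double off[][6], /* off-diagonal coefficients   */
                        const double *diag,    /* diagonal coefficients       */
                        int **ia, int **ja, double **a)
    {
        /* First pass: count upper-triangle entries per row. */
        int *rowptr = malloc((n + 1) * sizeof *rowptr);
        rowptr[0] = 0;
        for (int j = 0; j < n; ++j) {
            int cnt = 1;                        /* diagonal entry */
            for (int m = 0; m < 6; ++m)
                if (nbr[j][m] > j) ++cnt;       /* keep column > row only */
            rowptr[j + 1] = rowptr[j] + cnt;
        }
        int nnz = rowptr[n];
        int    *col = malloc(nnz * sizeof *col);
        double *val = malloc(nnz * sizeof *val);

        /* Second pass: fill column indices and values row by row. */
        for (int j = 0; j < n; ++j) {
            int p = rowptr[j];
            col[p] = j; val[p] = diag[j]; ++p;  /* diagonal first */
            for (int m = 0; m < 6; ++m)
                if (nbr[j][m] > j) { col[p] = nbr[j][m]; val[p] = off[j][m]; ++p; }
        }
        *ia = rowptr; *ja = col; *a = val;
    }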

Since the circular test-case possesses a non-trivial analytical solution, the comparison with the numerical solution provides an important direct check of the correctness of the implementation of the methods, in particular of the one employed in GKMHD. This is one important goal of the project TOPOX, as agreed with the project coordinator (Bruce Scott), given that this code is still in a development phase and therefore has not yet been extensively tested. Moreover, the existence of analytical solutions also provides a way to directly inspect the precision of the methods. The plots in Fig. 57 show the maximum global truncation error, defined here as the absolute value of the difference between the analytical and numerical solutions, for the same test-case used before. One sees that, despite the factor of two smaller grid-count used for Sadourny's method, it yields a similar precision to Lai's counterpart. Moreover, if one repeats the exercise increasing the grid-count in Sadourny's method to a figure (1519 grid-nodes) similar to that used before with Lai's method (1536 grid-nodes), the maximum absolute error decreases by a factor of two, from 1.86e-3 to 9.49e-4. This confirms the higher efficiency of the hexagonal grid compared to the polar one.

Fig. 57 Plots of the absolute difference (error) with respect to the analytical solution (4) for Sadourny's method (left) and Lai's method (right), with p = 10 on the same grid-counts as before, namely 817 and 1536, respectively. Despite this difference, both methods yield similar precisions, with maximum errors of 1.86e-3 (Sadourny) and 2.01e-3 (Lai).

10.4.2. Open flux surfaces

Having devised the extended numerical method using the rules briefly outlined in Sec. 10.3.2, we need to find an appropriate domain for our test-case that mimics a diverted tokamak magnetic equilibrium, and therefore includes an X-point. At the same time it should be simple enough that it can be generated analytically. Two such candidates have been devised so far. The former consists of constructing the simplest possible poloidal magnetic flux function made out of two adjacent regions matched continuously: one with a parabolic shape, to yield circular flux surfaces,

(6)

with an O-point located at (R0, Z0), and the other with a hyperbolic shape and a saddle-point, to yield an X-point located at (RX, ZX) and the corresponding divergent flux surfaces

(7)

where the remaining symbols are simply rescaling constants. The flux surfaces displayed in the left plot of Fig. 58 were generated using this technique. The latter candidate yields more realistic shapes of the flux contours, but is also expected to require more effort to adapt for the test-case. It is based on a magnetic island equilibrium description according to

(8)

where, in particular, the constants c3 and c1 can be used to shift the flux surfaces radially outwards and to yield two separatrices instead of a single one connecting both X-points, respectively. The former mimics the Shafranov shift, while the latter yields what is experimentally denominated a disconnected double-null configuration. An example of flux surfaces generated this way is shown in the right plot of Fig. 58.

Fig. 58 Flux surfaces generated analytically using the poloidal magnetic flux functions given by equations (6)–(7) on the left and by equation (8) on the right.

Currently, the effort is being put into devising a way to generate hexagonal grids on these kinds of analytical flux-surface shapes, where care needs to be taken regarding the deformation of the elementary hexagons near the X-point with respect to the grid resolution, i.e. the global grid-count. As for the boundary conditions at the divertor legs, they will be simplified to Dirichlet conditions, as an initial step to prove the principle. The new sparse matrix for the domain including the SOL is being implemented in the test code, so that it can be solved with the IBM-WSMP and/or PETSc libraries, similarly to the closed-flux-surfaces-only case of Section 10.4.1.

10.5. Conclusions and outlook

The project TOPOX2 is proceeding as planned, with the new numerical method [1] implemented in a stand-alone test-case that was used to validate it against analytical solutions. The same test-case was also implemented using a standard pseudo-spectral solver method, which further allowed for numerical comparisons between both methods. The plan for the remaining time of the project is to finish the work of implementing and testing the proposed extension of the hexagonal-mesh elliptic solver to include the open flux surfaces of the SOL region of a tokamak plasma. In practice, this involves extending the test-case to include the hexagonal discretization of the analytical flux surfaces already devised to mimic the needed topological ingredients, and solving for the new sparse matrix yielded by Sadourny's method in this case. The resulting code shall be used as a prototype for the future implementation of the extended method in GKMHD by the project coordinator B. Scott.

10.6. References

[1] R. Sadourny, A. Arakawa and Y. Mintz, Mon. Weather Rev. 96 (1968) 351.
[2] Trach-Minh Tran, private communication (2013).
[3] M. C. Lai, Numer. Methods Partial Differential Equations 18 (2002) 56.
