Ref. Ares(2019)5488775 - 30/08/2019 HORIZON2020 European Centre of Excellence Deliverable D4.2 First report on code profiling and bottleneck identification, structured plan of forward activities.

D4.2

First report on code profiling and bottleneck identification, structured plan of forward activities

Carlo Cavazzoni, Fabio Affinito, Uliana Alekseeva, Claudia Cardoso, Augustin Degomme, Pietro Delugas, Andrea Ferretti, Alberto Garcia, Luigi Genovese, Paolo Giannozzi, Anton Kozhevnikov, Ivan Marri, Stephan Mohr, and Daniel Wortmann

Due date of deliverable: 31/08/2019 (month 9)
Actual submission date: 31/08/2019

Lead beneficiary: CINECA (participant number 8)
Dissemination level: PU - Public


Document information

Project acronym: MAX
Project full title: Materials Design at the Exascale Research Action
Project type: European Centre of Excellence in materials modelling, simulations and design
EC Grant agreement no.: 824143
Project starting/end date: 01/12/2018 (month 1) / 30/11/2021 (month 36)
Website: http://www.max-centre.eu
Deliverable no.: D4.2

Authors: Carlo Cavazzoni, Fabio Affinito, Uliana Alekseeva, Claudia Cardoso, Augustin Degomme, Pietro Delugas, Andrea Ferretti, Alberto Garcia, Luigi Genovese, Paolo Giannozzi, Anton Kozhevnikov, Ivan Marri, Stephan Mohr, and Daniel Wortmann

To be cited as: C. Cavazzoni et al. (2019): First report on code profiling and bottleneck identification, structured plan of forward activities. Deliverable D4.2 of the H2020 CoE MAX (final version as of 30/08/2019). EC grant agreement no: 824143, CINECA, Bologna, Italy.

Disclaimer: This document's contents are not intended to replace consultation of any applicable legal sources or the necessary advice of a legal expert, where appropriate. All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose. The user, therefore, uses the information at its sole risk and liability. For the avoidance of all doubts, the European Commission has no liability in respect of this document, which is merely representing the authors' view.


Contents

1 Executive Summary

2 Introduction

3 Scientific use cases
  3.1 List of scientific use cases
    3.1.1 QUANTUM ESPRESSO
    3.1.2 Yambo
    3.1.3 FLEUR
    3.1.4 BigDFT
    3.1.5 CP2K
    3.1.6 SIESTA

4 Profiling results, bottlenecks, and early actions
  4.1 QUANTUM ESPRESSO
    4.1.1 Profiling on pw.x
    4.1.2 Profiling of pw.x on GPUs
    4.1.3 Profiling on cp.x
  4.2 Yambo
    4.2.1 Profiling on Yambo: the GW workflow
    4.2.2 Defective TiO2 structure: MPI and OpenMP scaling
    4.2.3 Chevron-like polymer: MPI scaling
    4.2.4 Intra-node profiling on Yambo: GPUs
  4.3 FLEUR
    4.3.1 Performance of the FLEUR MAX Release 3
    4.3.2 New data layout
  4.4 BigDFT
    4.4.1 Uranium-dioxyde benchmarks - GPU
    4.4.2 Bench Submission through AiiDA
    4.4.3 Uranium-dioxyde benchmarks - KNL
  4.5 CP2K
    4.5.1 RPA calculations with CP2K
    4.5.2 Linear scaling calculations
    4.5.3 Plane-wave pseudo potential calculations
  4.6 SIESTA

5 Structured plan of forward activities
  5.1 QUANTUM ESPRESSO
  5.2 Yambo
  5.3 FLEUR
  5.4 BigDFT
  5.5 CP2K
  5.6 SIESTA

6 Conclusion and lessons learned


References


1 Executive Summary

This deliverable sets the baseline for the performance and scalability status of MAX applications towards exascale, and at the same time identifies the key bottlenecks that currently prevent MAX codes from efficiently executing a set of selected scientific use cases on future European pre-exascale and exascale systems. It is important to remark that MAX community codes are complex objects that can be configured to run in many different ways, by changing several parameters. Moreover, the computational resources (time, memory, I/O, etc.) required by different input datasets may span several orders of magnitude, making the codes display very different computational behaviours. Since it is practically impossible to explore the whole space of possible working conditions and parameters of MAX codes, we have decided to focus our effort on those parameters that are potentially blocking for scientific use cases of interest for future exascale systems. We thus report on code profiling and bottleneck identification activities referring to such use cases. While this does not necessarily imply that other scientific use cases will display the same bottlenecks or profiling patterns, the methodology and best practices adopted in this deliverable can easily be extended to all other use cases.

Concerning the profiling, we have decided to consider the simulation wall-time as the main performance metric, since the timing of the whole workflow is what actually impacts the user perception and productivity. We have also decided that profiling and, in particular, bottleneck identification should be attributed as much as possible to well-identifiable kernels and modules (e.g.: "the code does not scale primarily because of eigenvector orthogonalisation"), since this helps both code experts and scientists to review the applications. When a deeper analysis is needed (e.g., instruction- or function-level profiling), we have decided to get in contact with the PoP CoE, since these activities are at the core of their action and expertise. Given the evolution of HPC towards extreme heterogeneity, a relevant advancement of this Deliverable is the systematic inclusion of benchmarks on accelerated architectures. All the above decisions were taken as a result of several discussions during the first months of activity, and finally reviewed in a three-day face-to-face meeting at CINECA (Bologna, IT, July 10-12, 2019) involving people from Work Packages 1, 2, 3, and 4.

In this Deliverable, we report on the profiling and benchmarking campaign performed along the above lines during the first months of MAX on its six flagship codes: QUANTUM ESPRESSO, Yambo, FLEUR, BigDFT, CP2K, and SIESTA. Importantly, we have started to collect curated benchmark/profiling data (including the scientific datasets) in a dedicated MAX repository. On one side, this will allow code developers to automate the collection process (e.g. via AiiDA) and to follow the evolution of the code performance in time. On the other side, it will make the results available to the broader community of code users.


2 Introduction

As defined in the description of work, WP4 is responsible for profiling and benchmarking the releases of the MAX codes. This is done via a dedicated task, T4.4. The outcome of the activities of this task is functional to many other tasks of other work packages (WP1, WP2, WP3, and WP6), to which it provides feedback on the identified code bottlenecks and on the progress obtained in terms of performance enhancement with respect to the relevant metrics (e.g. scalability, time to solution). In particular, this report provides the baseline results for the first release of the MAX flagship codes, to be used by all developers across the centre to measure the effectiveness of the solutions adopted to improve code performance. The same information will be used by other WP4 tasks to look for co-design opportunities (e.g. a bottleneck that is linked to a specific hardware feature), or to justify a proof-of-concept with a new paradigm or technology (e.g. if one of the adopted paradigms is found to be responsible for poor performance).

This deliverable is organized as follows: in Section 3 we report the selected scientific use cases used to define the performance and bottleneck baseline for the different codes. The datasets for these use cases are stored in a dedicated MAX GitLab repository (https://gitlab.com/max-centre/benchmarks) and made available to all MAX developers, to be used also in the activities of other WPs. In Section 4 we report the results of the many benchmarks we have run, together with the identified bottlenecks and the early activities to solve them. In Section 5 we discuss the activities planned to improve the codes towards the exascale (for the given use cases). Finally, conclusions and lessons learned are reported.

3 Scientific use cases

In this Section we report on the scientific use cases, datasets and profiling goals behind the performance and bottleneck analysis performed in WP4. The cases reported are relevant for future research projects that will run on the next generation of European HPC systems. It is therefore of fundamental importance for this community to ensure that they will run efficiently, and without bottlenecks preventing them from running at all. The datasets are presented in a synthetic way using a template table, where the available information is presented with the same layout regardless of the code. The same layout can also be used in future profiling campaigns, making them easy to reproduce and compare.

3.1 List of scientific use cases The selected scientific use cases are reported as a collection of tables. More detailed information and the actual datasets are stored in the MAX Gitlab repository.

3.1.1 QUANTUM ESPRESSO

For QUANTUM ESPRESSO we have identified and collected four scientific use cases related to different identified bottlenecks. The first dataset (Table 1) refers to a challenging whole-protein DFT electronic structure simulation, meant to test the limits of both QE and current HPC systems. Indeed, this kind of dataset could represent a valid alternative to LINPACK for system evaluation and assessment, generating enough system load to stress the supercomputer while providing speedup figures on real and relevant scientific use cases. The second dataset (Table 2) has been selected for two reasons: it refers to the Car-Parrinello simulation engine of QE and it is of industrial relevance, since it was set up during an industrial prospect we ran with a large European company. The third dataset (Table 3) represents a typical materials science problem, similar to those scientists are running today on high-end HPC systems around the world and that is required to run on the next generation of supercomputers as well. Finally, the fourth case (Table 4) represents a challenging scientific benchmark, already reported in the literature [1] and the subject of a past comparative study with other codes [2]. These characteristics offer a good opportunity to build a new baseline for the most up-to-date version of QE, to be used for assessing the progress in terms of efficiency and bottleneck removal.

Title: Electronic structure of a 5H6V protein in water
Short description of the scientific use case: Compute ground state properties and ab-initio molecular dynamics (Car-Parrinello and Born-Oppenheimer) of a small protein solvated in water. This simulation can be applied to better understand protein-ligand interactions, and to improve molecular techniques applied in drug design.
Target code: cp.x and pw.x of QE 6.4.1
Description of the input dataset: Atomic positions of the 5H6V protein plus solvation waters. In total the system contains 7534 atoms of 5 species: N, O, H, C and S. Pseudopotentials are ultrasoft in the Vanderbilt form, the exchange and correlation is PBE. For the DFT problem we use an energy cut-off of 25 Ry. It computes 12252 electronic states and uses 16369 Kleinman-Bylander projectors.
Target of the profiling campaign / bottleneck: Code scalability and I/O
Target Supercomputer: Marconi (KNL partition)
Target computational resources: Simulations from 128 to 1024 compute nodes (8704 to 69632 cores), each simulation lasting approx. 3 hours
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/Quantum_Espresso/PW/5H6V and https://gitlab.com/max-centre/benchmarks/tree/master/Quantum_Espresso/CP/5H6V
Link to the code source: https://gitlab.com/QEF/q-e/-/tags/qe-6.4.1

Table 1: First scientific case for QUANTUM ESPRESSO: Electronic structure of 5H6V protein in water


Title: ZrO2 supercell
Short description of the scientific use case: Compute ground state properties and ab-initio molecular dynamics of a ZrO2 supercell under pressure. This dataset is part of an ongoing industrial prospect, and details cannot be disclosed.
Target code: cp.x and pw.x of QE 6.4.1
Description of the input dataset: Atomic positions of a ZrO2 supercell. In total the system contains 792 atoms of 2 species: Zr and O. Pseudopotentials are ultrasoft in the Vanderbilt form, the exchange and correlation is PBE. For the DFT problem we use an energy cut-off of 30 Ry.
Target of the profiling campaign / bottleneck: Code scalability and load balance
Target Supercomputer: Marconi (KNL partition)
Target computational resources: Simulations from 8 to 256 compute nodes (544 to 17408 cores), each simulation lasting approx. 30 minutes.
Link to the dataset (if available): Industrial prospect, the dataset is not public
Link to the code source: https://gitlab.com/QEF/q-e/-/tags/qe-6.4.1

Table 2: Second scientific case for QUANTUM ESPRESSO: ZrO2 supercell


Title: Graphene-Co-Ir
Short description of the scientific use case: Graphene grown on non-commensurate transition-metal substrates forms a moiré pattern that can be used as a template for molecular adsorption and partial substrate decoupling at the electronic level, while maintaining the magnetic coupling. We will compute the magnetic ground state of the Graphene/Co/Ir slab. For a recent scientific publication on the subject see Ref. [3].
Target code: QE, pw.x
Description of the input dataset: Atomic positions of a 10x10/9x9/9x9 supercell, with a total of 605 atoms with spin polarization (6 k-points, 2668×2 KS states, 7.7 M G-vectors, 256×256×270 FFT grid). LDA exchange and correlation potential. Pseudopotentials are norm conserving with an energy cut-off of 75 Ry.
Target of the profiling campaign / bottleneck: Code scalability (MPI, OpenMP, w/ and w/o GPU support), and I/O
Target Supercomputer: Marconi KNL, Piz Daint
Target computational resources: Marconi: 144 nodes, using 64 to 72 (hyper-threading) cores per node (with omp_num_threads ranging from 1 to 4 at least); a full scf calculation is expected to last about 5 hours. Piz Daint: a number of nodes similar to the KNL case, mostly looking at strong scaling.
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/Quantum_Espresso/PW/GrCoIr
Link to the code source: qe-6.4.1 from https://gitlab.com/QEF/q-e/-/tags/qe-6.4.1; qe-gpu-6.4.1a1 from https://gitlab.com/QEF/q-e-gpu/-/releases

Table 3: Third scientific case for QUANTUM ESPRESSO: Graphene-Co-Ir interfaces


Title: Carbon Nanotube CNT10POR8
Short description of the scientific use case: Compute ground state properties and ab-initio molecular dynamics (Car-Parrinello and Born-Oppenheimer) of a porphyrin-functionalised carbon nanotube.
Target code: cp.x and pw.x of QE 6.4.1
Description of the input dataset: Atomic positions of the CNT10POR8 system. In total the system contains 1532 atoms of 4 species: N, O, H, C. Pseudopotentials are ultrasoft in the Vanderbilt form, the exchange and correlation is PBE. For the DFT problem we use an energy cut-off of 25 Ry.
Target of the profiling campaign / bottleneck: Code scalability and I/O
Target Supercomputer: Marconi (KNL partition)
Target computational resources: Simulations from 16 to 1024 compute nodes (1088 to 69632 cores), each simulation lasting approx. 3 hours
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/Quantum_Espresso/PW/CNT10POR8 and https://gitlab.com/max-centre/benchmarks/tree/master/Quantum_Espresso/CP/CNT10POR8
Link to the code source: https://gitlab.com/QEF/q-e/-/tags/qe-6.4.1

Table 4: Fourth scientific case for QUANTUM ESPRESSO: Carbon Nanotube CNT10POR8


3.1.2 Yambo

For Yambo we have identified and collected two scientific use cases related to different bottlenecks. In more detail, scalability (both MPI and OpenMP) and profiling tests have been performed considering two systems of physical interest with different computational characteristics. The first use case is a defective TiO2 bulk supercell, with a cell of 72+1 atoms, 8 k-points, and a medium-range number of plane waves (about 5×10^5 to represent the density). The wavefunctions have been further truncated and represented using 40000 plane waves, while up to 2000 states are included in the sum-over-states. We have used this system to study both the MPI and OpenMP scaling of the code. The second use case is a much larger low-dimensional system, a 1D chevron-like graphene nanoribbon. The system has 136 atoms and still 8 k-points, but sits in a much larger supercell, described by means of 3.5 million plane waves (for the density). The sum-over-states are performed considering 800 bands (each represented by 4.5×10^5 plane waves). More details of the systems studied and the targeted results are described below.

Title: Defected TiO2 structures
Short description of the scientific use case: Defects in TiO2 play a key role in determining its electrical conductivity and optical properties. We will perform a GW calculation, as implemented in Yambo, on a defective TiO2 bulk supercell. This class of systems (defected TiO2) is relevant for photocatalytic applications and is currently the subject of a large amount of research. A recent publication on the subject can be found e.g. in Ref. [4].
Target code: Yambo (GW, BSE calculations)
Description of the input dataset: H-defected TiO2 2x2x3 supercell, with 72+1 (H) atoms (577 electrons, treated non-spin-polarised). The system is described using 8 k-points and up to 2000 bands in the sum-over-states. The polarisability cutoff is set to 6 Ry (1317 G-vectors).
Target of the profiling campaign / bottleneck: Overall total time-to-solution, strong scaling (and bottleneck spotting), memory handling.
Target Supercomputer: Marconi KNL
Target computational resources: Marconi: from 40 to 320 KNL nodes
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/Yambo
Link to the code source: Yambo 4.5 devel version, rev:16810 hash:29d3c00a8 (hosted in a private repo)

Table 5: First scientific case for Yambo: Defected TiO2 structures


Title: Graphene Nanoribbons
Short description of the scientific use case: One-dimensional precursor polymer for a graphene (GR) nanoribbon (GNR) with chevron-like shape. Relevant for optical properties (optoelectronic devices), it is a quite large nanostructure made of C and H, with a large number of plane waves involved in the simulations using both QE and Yambo. A recent publication on the electronic and optical properties of this system is Ref. [5].
Target code: Yambo (GW calculations)
Description of the input dataset: The system contains 136 atoms (C and H) and sits in a large unit cell of 32×66×40 Bohr^3, leading to 3,500,000 (450,000) G-vectors for the density (wavefunctions). The system is described using 8 k-points and 800 bands in the sum-over-states. A cutoff of 3 Ry is used to describe the polarisability (leading to 7423 G-vectors).
Target of the profiling campaign / bottleneck: Overall total time-to-solution, strong scaling (and bottleneck spotting), memory handling.
Target Supercomputer: Marconi KNL
Target computational resources: Marconi: 128 / 256 / 512 / 768 KNL nodes
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/Yambo
Link to the code source: Yambo 4.5 devel version, rev:16810 hash:29d3c00a8 (hosted in a private repo)

Table 6: Second scientific case for Yambo: Graphene Nanoribbons


3.1.3 FLEUR

For FLEUR two use cases have been set up, which can be used to investigate different bottlenecks. Both focus on demonstrating the scalability of the main parts of the code, in particular of the Hamiltonian setup and of the MPI communication. While the first system is a simple TiO2 insulator with point defects, with two setups of 1078 and 2156 atoms respectively, the second system is a single setup of SrTiO3 with 3750 atoms. Despite its larger number of atoms, the strong asymmetry of the second system will require the use of more k-points and hence adds the aspect of scalability in the combined k-point and eigenvalue MPI parallelisation. It is also planned to use it to investigate possible bottlenecks in some property calculators, e.g. to see conducting defect states.

Title: TiO2 structure with defects
Short description of the scientific use case: Ground state properties
Target code: FLEUR
Description of the input dataset: Two test cases: with 1078 and 2156 atoms.
Target of the profiling campaign / bottleneck: Scalability, Hamiltonian setup, MPI communication
Target Supercomputer: CLAIX2016, CLAIX2018, Hazel Hen, SuperMUC-NG
Target computational resources: CLAIX2016 and Hazel Hen: up to 256 nodes; CLAIX2018 and SuperMUC-NG: up to 512 nodes
Link to the dataset (if available): 1078 atoms: https://gitlab.com/max-centre/benchmarks/tree/master/FLEUR/fleur_big_TiO2_conv; 2156 atoms: https://gitlab.com/max-centre/benchmarks/tree/master/FLEUR/fleur_huge_TiO2_conv
Link to the code source: https://www.flapw.de/site/downloads/

Table 7: First scientific case for FLEUR: TiO2 structure with defects


Title: SrTiO3 structure with dislocations
Short description of the scientific use case: Ground state properties, electrical conductivity
Target code: FLEUR
Description of the input dataset: Atom positions of 3750 atoms.
Target of the profiling campaign / bottleneck: Scalability, Hamiltonian setup, MPI communication
Target Supercomputer: CLAIX2018, Hazel Hen, SuperMUC-NG
Target computational resources: CLAIX2018: 512 nodes for 1 k-point; Hazel Hen: 1024 nodes for 1 k-point, 4096 nodes for 4 k-points; SuperMUC-NG: 512 nodes for 1 k-point, 2048 nodes for 4 k-points
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/FLEUR/fleur_huge_SrTiO3
Link to the code source: https://www.flapw.de/site/downloads/

Table 8: Second scientific case for FLEUR: SrTiO3 structure with dislocations

3.1.4 BigDFT

The physical test system is uranium dioxide which, being the most widely used nuclear fuel, is both a technologically important and scientifically interesting material, where hybrid-functional DFT calculations could play a more important role if they became less expensive. Aside from these points, our motivation in choosing this system was primarily the fact that uranium is a challenging element to simulate, as it contains a large number of electrons even when using a pseudopotential approach. We used a uranium pseudopotential with 14 electrons and an oxygen pseudopotential with 6 electrons. Furthermore, it requires a spin-polarized treatment, with non-identical spin-up and spin-down orbitals. We used 5 different periodic cells covering a wide range of sizes, namely a small cell (12 atoms), a medium-sized cell (96 atoms), large cells (324 and 768 atoms) and a very large cell containing 1,029 atoms. Our largest system contained 17,150 Kohn-Sham orbitals.


Title: UO2
Short description of the scientific use case: Ground state properties of a challenging metal oxide with hybrid functionals
Target code: BigDFT
Description of the input dataset: Uranium dioxide, several cell sizes, large number of electrons. Inputs ranging from 164 to 17150 Kohn-Sham orbitals.
Target of the profiling campaign / bottleneck: Scalability of the CPU and GPU implementations for the PBE and PBE0 functionals, overlap between MPI communication and computation. Evaluation of the computational cost of a hybrid functional.
Target Supercomputer: Piz Daint / Marconi
Target computational resources: Marconi: 400-800 KNL nodes; Piz Daint: up to 1600 nodes (with/without GPU)
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/BigDFT/UO2
Link to the code source: http://bigdft.org

Table 9: First scientific case for BigDFT: UO2

3.1.5 CP2K

The aim of the profiling performed on CP2K is to isolate and benchmark the performance-critical parts of the code relevant for the first two scientific cases (Tabs. 10 and 11). For the third scientific case (Tab. 12) the task is to create a baseline plane-wave simulation with the SIRIUS back-end (already fully GPU accelerated) and to track the performance improvements for the duration of the project.


Title: RPA calculations on large condensed phase systems
Short description of the scientific use case: RPA is an important post-DFT approach to electronic structure, and CP2K has unique capabilities to compute the RPA energy (and related properties) for large condensed phase systems, featuring both O(N^4) and O(N^3) algorithms. The former is faster for 3D systems up to a few hundred atoms, and currently used in most applications. This O(N^4) algorithm is the target of this benchmark.
Target code: CP2K, RPA functionality
Description of the input dataset: The dataset consists of 128 water molecules in a cubic box, representative of liquid water. The water molecules are described with a high-quality basis (correlation-consistent triple-zeta basis, 53 basis functions per molecule).
Target of the profiling campaign / bottleneck: These calculations are dominated by tensor contractions, which are implemented as dense linear algebra operations, mostly PDGEMM. In the selected benchmark this routine dominates the calculation, accounting for 90% on 128 nodes of Piz Daint. The aim of this work is to integrate a highly performing, communication-optimal PDGEMM implementation (COSMA) that is available for both GPU and CPU architectures.
Target Supercomputer: Current results will be obtained on Piz Daint; future results will be obtained on the EuroHPC architectures, as well as on internationally leading supercomputers in the US.
Target computational resources: This kind of calculation can be scaled to a large number of nodes; we aim at 64-1024 nodes.
Link to the dataset (if available): https://github.com/cp2k/cp2k/tree/master/tests/QS/benchmark_mp2_rpa
Link to the code source: https://github.com/cp2k/cp2k

Table 10: First scientific case for CP2K: RPA calculations


Title: Linear scaling DFT calculations for very large systems
Short description of the scientific use case: CP2K features the capability to perform accurate DFT calculations on systems containing thousands of atoms, using a linear scaling implementation of DFT. This benchmark is an established test case to measure the performance of the underlying software infrastructure. This infrastructure is largely given by a sparse matrix library (DBCSR), which is particularly optimized to perform sparse matrix-matrix multiplications in parallel for a variety of architectures, including both CPUs and GPUs. Furthermore, the use of this library extends beyond linear scaling DFT, as it finds application in low-order correlated wavefunction theory (e.g. O(N^3) RPA).
Target code: CP2K and its DBCSR library, in particular the architecture-specific backends (e.g. libcusmm).
Description of the input dataset: The input data consists of a large sample of liquid water, namely 20736 atoms (≈ 7000 molecules). Different atom-centered basis sets will be employed.
Target of the profiling campaign / bottleneck: Sparse matrix multiplication, and in particular small matrix multiplication: DBCSR and libcusmm.
Target Supercomputer: see above
Target computational resources: Full simulation runs on 256 nodes. The main target is single-node kernel performance (e.g. CUDA, AMD, etc.)
Link to the dataset (if available): https://github.com/cp2k/cp2k/tree/master/tests/QS/benchmark_DM_LS
Link to the code source: https://github.com/cp2k/cp2k and https://github.com/cp2k/dbcsr

Table 11: Second scientific case for CP2K: Linear scaling DFT


Title: Plane Waves DFT calculations
Short description of the scientific use case: Ground state calculation of carbon materials (buckyball) using a plane-wave basis set.
Target code: CP2K + SIRIUS
Description of the input dataset: A single molecule of C60 in a box. This test covers the case of large FFT grids and a small number of atoms.
Target of the profiling campaign / bottleneck: SIRIUS plane-wave DFT implementation, in particular the MaX-developed SpFFT library.
Target Supercomputer: Piz Daint and future AMD-based supercomputers
Target computational resources: 4-64 hybrid nodes of Piz Daint
Link to the dataset (if available): https://github.com/cp2k/cp2k/tree/master/tests/QS/benchmark_SIRIUS
Link to the code source: https://github.com/cp2k/ and https://github.com/electronic-structure/SIRIUS

Table 12: Third scientific case for CP2K: plane wave DFT calculations

3.1.6 SIESTA

We have chosen as a use case the simulation of a large system of biological interest (DNA), which at the same time provides a rather complete stress test of the code. Besides the sheer number of atoms and orbitals (up to the order of 100k), which is a relevant figure for the solver stage, the presence of several atomic species (five: C, H, O, N, P) increases the complexity and thus the relative weight of the Hamiltonian setup phase.

Title: Electronic structure of a section of DNA
Short description of the scientific use case: This simulation can be used to determine the charge profile of DNA and other properties relevant to understand its dynamics.
Target code: Siesta 4.2-rc0 (development version)
Description of the input dataset: A short section of DNA, with around 715 atoms, that can be replicated to obtain larger target systems for scalability test purposes. The largest system foreseen contains around 11400 atoms.
Target of the profiling campaign / bottleneck: Code scalability and I/O, in the initialization, setup of the Hamiltonian, and solver phases.
Target Supercomputer: MareNostrum IV (BSC)
Target computational resources: From 96 to 2304 cores.
Link to the dataset (if available): https://gitlab.com/max-centre/benchmarks/tree/master/Siesta/BSC-DNA
Link to the code source: http://launchpad.net/siesta (trunk version)

Table 13: First scientific case for Siesta: Electronic structure of a section of DNA


4 Profiling results, bottlenecks, and early actions

4.1 QUANTUM ESPRESSO

4.1.1 Profiling on pw.x

The application pw.x contained in the QUANTUM ESPRESSO suite performs total energy and force calculations with plane waves and pseudopotentials or using the PAW formalism. The methodology is based on a self-consistent loop (a schematic sketch of this loop is given after the list), consisting of the following main computational tasks:

• The iterative diagonalisation of the trial Hamiltonian (H), computing the lowest NB eigenvalues and eigenfunctions (ψ). NB must be at least as large as the number of electronic states needed to build the self-consistent density, though some post-processing or spectroscopic applications may require further increasing this number. NB depends upon the number and type of atoms contained in the simulation cell. For periodic systems the Hamiltonian can be split into independent blocks, one per k-point (Bloch vector), on which, at each step, the iterative diagonalisation can be performed concurrently by a distinct MPI group. We will refer to this as pool parallelism. The same pool parallelism is applied if the system is spin-polarised, adding a net factor of two to the maximum number of usable pools;

• A sum of the contributions of each wave function to the total density. This sum runs through all occupied bands and k-points;

• A mixing of the input density with the newly computed density;

• The computation of the new Hamiltonian operator.
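
The four tasks above form the self-consistent cycle. The following minimal Python sketch (with placeholder names and a toy model Hamiltonian; these are not the actual pw.x interfaces) illustrates the structure of the loop: build the Hamiltonian from the current density, obtain the lowest NB states, accumulate the new density, and mix it with the old one until self-consistency.

```python
import numpy as np

def scf_loop(build_h, n_occ, rho0, beta=0.3, tol=1e-8, max_iter=200):
    """Schematic self-consistent loop mirroring the four tasks listed above;
    the names are placeholders, not the actual pw.x interfaces."""
    rho = rho0.copy()
    for _ in range(max_iter):
        H = build_h(rho)                                # new Hamiltonian from the density
        evals, psi = np.linalg.eigh(H)                  # stand-in for iterative (Davidson) diagonalisation
        psi_occ = psi[:, :n_occ]                        # lowest N_B states
        rho_new = np.sum(np.abs(psi_occ) ** 2, axis=1)  # sum of band contributions to the density
        if np.linalg.norm(rho_new - rho) < tol:         # self-consistency reached
            return evals[:n_occ], rho_new
        rho = (1.0 - beta) * rho + beta * rho_new       # density mixing
    return evals[:n_occ], rho

# Toy usage: a 20-site tight-binding chain with a density-dependent on-site potential.
n_sites, n_occ = 20, 5
hop = -np.eye(n_sites, k=1) - np.eye(n_sites, k=-1)
build_h = lambda rho: hop + np.diag(2.0 * rho)
evals, rho = scf_loop(build_h, n_occ, rho0=np.full(n_sites, n_occ / n_sites))
```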

If not adequately implemented, each of these steps may become a bottleneck for very large systems. The most expensive operation is typically the calculation of the Hψ products used in the iterative diagonalisation, involving both local and non-local potentials, computed on the real-space and reciprocal-space grids respectively. The number Nproj of non-local operators (projectors) depends upon the number and type of atoms. In order to apply the local and non-local potentials and to compute the charge density, the wave functions need to be transferred from the real-space to the reciprocal-space representation, and vice versa. This is done using parallel 3D FFT transforms. The data distributed on the 3D grids also represent the major memory requirement of the program, and thus the distribution of the 3D FFT grids among MPI tasks is used to distribute the memory as well as the computational workload. Performance profiling [6] done during the operation of MAX phase 1 has shown that for small- and medium-size calculations (up to a few hundred atoms in the simulation cell, up to a few thousand for both NB and Nproj) the main bottleneck is constituted by the parallel FFT performance. These findings are, for the FFT part only, confirmed by the results for the use case in Tab. 4, as shown in Fig. 1. We defer the discussion of the linear-algebra term (_diaghg) to the large-size case. Parallel FFT satisfactorily reduces the total wall-time as long as the number of MPI tasks is smaller than or comparable to the linear dimensions of the 3D FFT grids. When the number of MPI tasks exceeds this number, the scaling becomes poorer (see the sketch after the list below). Actions on this side include:


Figure 1: Scaling of the use case presented in Tab. 4. The total WALL time and the h_psi part scale with the FFT up to 2048 cores: beyond that, scaling becomes non-optimal. Exact diagonalisation (_diaghg) with parallel linear algebra (ScaLAPACK) scales up to a 16×16 BLACS grid (256 MPI tasks). The number of tasks used for parallel linear algebra is indicated by the yellow labels. 4 OpenMP threads per MPI task are used for all runs.

• Improvements to the granularity of the real- and reciprocal-space distribution over MPI tasks, i.e., making the distribution more flexible so that the grids can be distributed over more tasks;

• Extension of threaded parallelism, in order to exploit as many processors as possible in shared memory as soon as the MPI FFT scaling starts to saturate, and in case the node memory capacity limits the number of MPI tasks per node.
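
To make the saturation effect concrete, the sketch below (an illustrative model, not the actual QE FFT decomposition logic) counts how many planes of a 3D grid each MPI task receives when the grid is distributed along one direction, using the 256×256×270 grid of the use case in Tab. 3: once the task count exceeds the number of planes, some tasks are left idle and the scaling stops.

```python
def planes_per_task(nz, ntasks):
    """Block-distribute nz FFT planes over ntasks MPI ranks (illustration only)."""
    base, rest = divmod(nz, ntasks)
    return [base + (1 if r < rest else 0) for r in range(ntasks)]

nz = 270  # third dimension of the 256x256x270 grid of the Graphene-Co-Ir case (Tab. 3)
for ntasks in (64, 128, 256, 512):
    counts = planes_per_task(nz, ntasks)
    print(f"{ntasks:4d} tasks: max {max(counts)} / min {min(counts)} planes per task,"
          f" {counts.count(0)} idle tasks")
```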

We also expect that the next accelerated architectures will allow the code to keep the FFT grids of small- and medium-size systems within one node and to offload FFT and reciprocal-space operations directly to the accelerators. With these improvements and the adaptation to accelerators (see next Section), we are confident that the code will be able to exploit efficiently the advancements that will gradually be brought in by the advent of pre-exascale machines.

In large-size computations, where the 3D grids largely exceed the capacity of a single node, the FFT part scales satisfactorily, but new computational issues are introduced by the growing number Na of atoms, as the largest of our scientific test cases (Tab. 1) demonstrates. For such a case, Na is already O(10^4), but with the advent of exascale machines it is foreseeable that users will try to simulate even larger systems.

The timing, scaling, and partition of the computational wall-time for our largest test case are shown in Fig. 2. Although the number of MPI tasks used is already beyond the optimal scalability range for FFT and parallel matrix multiplication, we still have a significant scaling of the h_psi sections (540 sec for the 512 MPI × 16 OpenMP run vs. 333 sec for the 2048 MPI × 16 OpenMP run).


Figure 2: Scalability of pw.x for the protein 5H6V case. The parts in yellow (rdiaghg) contain library calls to ScaLAPACK/ELPA exact diagonalisation libraries. The parts in blue (h_psi) contain FFTs and parallel matrix multiplications. The parts in green (s_psi) contain parallel matrix multiplications only.

These times are however a small percentage of the total time, and improvements on this side will bring only a minor improvement to the overall scalability. A significant portion of the time is instead spent inside the exact diagonalisation driver routine (rdiaghg sections of Figs. 2 and 3), which takes 1350 sec with 2048 MPI tasks and 1790 sec with 512 MPI tasks. As anticipated above, this overwhelming cost and poor scaling are caused by the unprecedented number of bands used by the computation.

The most efficient iterative diagonalisation algorithm currently present in the pw.x code is block Davidson, which requires repeated exact diagonalisations of an ND × ND matrix, with ND at least two times the number of bands NB. These exact diagonalisations are performed by the library LAXLib, which uses ScaLAPACK and ELPA to achieve parallelisation. When however NB exceeds 12000, as in our case, these exact diagonalisations become the dominant term, scaling poorly even if efficient parallel linear-algebra libraries are used. In our calculation, over 25% of the time is spent in performing the parallel exact diagonalisation.

One relevant issue with unmonitored parts of the code emerged during these profiling sessions: a rather large portion of compute time, tagged as other in Fig. 2, could not be assigned to any of the main computational kernels. Profiling tools turned out to be useless: none worked properly, presumably due to the size of the run. It emerged from our analysis that the origin of this bottleneck lies in real-space Ewald sums that were not parallelised at all. The time spent in this part is usually negligible in ab-initio DFT computations (unlike in classical MD), but it increases quadratically with Na and becomes large as Na approaches 10K.


Figure 3: Scalability of pw.x for the protein 5H6V case, after the removal of the Ewald sum bottleneck.

As an early code optimisation activity, we were able to solve this bottleneck by parallelising the Ewald sums over atoms. This already improves the efficiency of the code at this scale, for the benefit of the scientists willing to use it in their next HPC projects (e.g. PRACE or EuroHPC). The improved scalability, with a much smaller other section, is reported in Fig. 3.

The calculation of the Hψ products (labeled in Fig. 2 as h_psi) can also be parallelised over bands, introducing MPI band groups, but currently only one of these band groups is used for exact diagonalisation. This forces the code to collect the results from the other groups and reduces the number of MPI tasks usable in exact diagonalisation, significantly reducing the advantages of band parallelisation for large-size systems (see Tab. 14).

tasks   band-groups   h_psi   rdiaghg   s_psi   other   total   calbec
1024    1             854     663       79      –       1596    210
1024    2             561     746       75      186     1569    161
2048    1             668     593       57      –       1450    205
2048    2             485     673       55      205     1418    134

Table 14: Iterative diagonalisation times (sec) for different band parallelism settings for the 5H6V case: The timing of h_psi scales down, but the overall scaling is limited by parallel linear algebra (rdiaghg) and communication time (included in the other section).
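
As an illustration of the early fix described above (parallelising the real-space Ewald sums over atoms), the following mpi4py sketch distributes the outer atom loop of an O(Na^2) screened pair sum over MPI ranks and reduces the partial energies. It is a schematic model only, not the actual QE implementation, which also handles periodic images and the reciprocal-space term.

```python
from mpi4py import MPI
import numpy as np
from scipy.special import erfc

def real_space_pair_sum(positions, charges, alpha, comm):
    """Screened real-space pair-energy sum with the outer atom loop
    distributed over MPI ranks; schematic illustration only."""
    rank, size = comm.Get_rank(), comm.Get_size()
    n = len(charges)
    local = 0.0
    for i in range(rank, n, size):          # round-robin distribution over atoms
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            local += charges[i] * charges[j] * erfc(alpha * r) / r
    return comm.allreduce(local, op=MPI.SUM)

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    rng = np.random.default_rng(0)
    pos = rng.random((500, 3)) * 20.0       # toy coordinates
    q = rng.choice([-1.0, 1.0], size=500)   # toy charges
    energy = real_space_pair_sum(pos, q, alpha=0.3, comm=comm)
    if comm.Get_rank() == 0:
        print("screened real-space energy:", energy)
```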


The profiling with very large systems has also shown that I/O may become an issue for computations of such size.

• Currently, pseudopotentials and restart information (an XML file) are independently read by each MPI task; with thousands of MPI tasks this may cause a significant slow-down of the reading operations.

• Together with the charge density, the program saves on exit the reciprocal space components of the wave functions. In test cases with more than a thousand wave functions, the termination phase takes a significant time.

A final possible source of bad scaling for giant calculations is the large number of projectors. The usage of projectors involves operating with matrices of dimensions Nproj × NB. The cost of building these matrices is usually negligible and so far not monitored by clocks, but it may become sizable for large-size calculations. The memory footprint of these matrices may also become a problem.
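
A back-of-the-envelope estimate using the numbers of the 5H6V case (Tab. 1: 16369 projectors and 12252 bands) illustrates the potential memory footprint; the double-complex storage assumed below is our own assumption for the example, not a statement about the code's actual data layout.

```python
n_proj, n_bands = 16369, 12252      # 5H6V case (Tab. 1)
bytes_per_element = 16              # double complex (assumption)
size_gib = n_proj * n_bands * bytes_per_element / 2**30
print(f"one projector-band matrix: {size_gib:.1f} GiB before distribution")
# roughly 3 GiB per copy: manageable if distributed, problematic if replicated per MPI task.
```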

4.1.2 Profiling of pw.x on GPUs

Since version 6.4.1 of QUANTUM ESPRESSO, pw.x can be used on systems equipped with GPUs. As the compilation for GPUs requires a specific setup and linking to architecture-specific libraries, the code is distributed GPU-ready via a dedicated repository together with specific routines. Users willing to compile and use pw.x on GPU-enabled architectures may find the code at https://gitlab.com/QEF/q-e-gpu .

The most computation-intensive kernels executed with GPU acceleration are the parallel linear algebra and, at variable levels depending upon the system size, the FFT routines. If the memory of a node is sufficient to store all the 3D grid data, the 3D FFTs are performed directly with GPU-specific libraries using the devices of one node. When it is instead necessary to distribute the 3D data among MPI tasks, the GPU acceleration is used to perform local 1D and 2D FFT operations, and the data are then scattered via MPI in order to perform the net 3D FFT operation. In the most common setup of GPU devices, this requires frequent data transfers between device and host memory. In order to improve the computational efficiency, local FFT operations and MPI data scattering are thus done concurrently on different batches of wave functions; while a batch is processed by the GPU, other batches are scattered via non-blocking MPI.

The benchmarking of pw.x on GPUs has been done running the scientific test case described in Tab. 3. The size of this test case is such as to require the 3D distribution over at least 60 MPI tasks per pool. Parallel linear algebra is performed using GPU-specific libraries on a single GPU. The number of bands (here over 2600) is in fact close to the limit that fits into the device memory of current Piz Daint nodes. For this reason, it is necessary to use the Davidson algorithm limiting the leading dimension of the Hamiltonian matrix representation in the iterative space to its minimal value ND = 2 × NB ≈ 5000. We have run the calculations from scratch to full solution. This requires 48 successive iterative diagonalisation loops. The Hamiltonian is divided into 12 irreducible blocks, for which the iterative diagonalisation may be run independently. The different parallel settings are summarised in Tab. 15.


MPI tasks   Pools   tasks per pool   time to solution (sec)
72          1       72               26774
144         1       144              25656
144         2       72               17363
240         4       60               10349

Table 15: Different setups used to run the scientific use case of Tab. 3 on the Piz Daint cluster (CSCS) with GPUs.

As can be seen in Fig. 4, the time-to-solution scales down as the number of pools increases. The pool scalability, in principle linear, is impaired because the Hamiltonian blocks require a widely variable number of iterations. In this scenario, distributing the blocks among pools levels the average time per block of each pool up to the most expensive one. When the number of pools is equal to the number of blocks, all pools are constrained by the slowest block.

As we have a moderately large number of bands, the iterative space of the Davidson eigensolver is over 5000 × 5000, and the exact diagonalisations are thus computationally important. This part of the computation, labeled as _diaghg in Fig. 5, is performed using specific routines for accelerated linear algebra on GPUs. For this reason, this part is almost independent of the number of MPI tasks per pool. Overall, the exact diagonalisation time slightly increases with the number of MPI tasks, because of the increased communication costs needed to build the representation of the Hamiltonian in the iterative space. The other part of the iterative diagonalisation time, labeled as h_psi in Fig. 5, uses instead MPI parallelism on the 3D data in real and reciprocal space, performing 3D FFT operations or matrix-vector products such as those computed in vloc_psi and calbec. The behaviour of these routines with respect to the number of MPI tasks per pool is presented in Fig. 6. While calbec shows some scaling as the number of tasks per pool increases, vloc_psi (whose task includes two 3D FFT operations) is completely saturated at a value around 72 tasks per pool.
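
The levelling effect described above can be reproduced with a short simulation (an illustration only: the per-block costs below are invented for the example, and the greedy assignment is not necessarily the scheduling used by pw.x). Distributing 12 blocks of uneven cost over an increasing number of pools, the run time is the maximum over pools of the total block time assigned to each pool.

```python
import numpy as np

def pool_time(block_costs, n_pools):
    """Assign each block to the currently least-loaded pool; the run time
    is set by the slowest pool (illustrative scheduling, not pw.x's)."""
    pools = [0.0] * n_pools
    for cost in sorted(block_costs, reverse=True):
        pools[int(np.argmin(pools))] += cost
    return max(pools)

costs = np.random.default_rng(1).uniform(1.0, 4.0, size=12)  # 12 blocks, invented costs
for n_pools in (1, 2, 4, 12):
    print(f"{n_pools:2d} pools -> run time {pool_time(costs, n_pools):.2f}")
# With 12 pools the run time equals the single most expensive block, as noted above.
```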

4.1.3 Profiling on cp.x

The application cp.x contained in the QUANTUM ESPRESSO suite performs the canonical Car-Parrinello ab-initio molecular dynamics simulation, sharing most of the modules, subroutines and parallelisation strategy with pw.x. In particular, considering the most time-consuming kernels, the FFT kernel is the same and is used in the same way in both codes. This kernel was the object of a heavy refactoring in the previous phase of the MAX project, and its scalability was improved significantly for both codes, especially for large atomic systems running on many thousands of nodes. In fact, the performance analysis of both codes presented in this chapter shows that the FFT is not the main source of performance problems for large-size systems. Instead, it is evident that linear algebra now constitutes the most critical kernel for both codes. Here cp.x and pw.x share the same low-level linear algebra drivers, but differ in the way the eigenstates are computed. In pw.x they are obtained by iterative diagonalisation of the Hamiltonian matrix (see the Section above), while in cp.x they are first obtained by minimising the Car-Parrinello Lagrangian, and then propagated adiabatically together with the ions. While they are propagated, they are kept orthogonal by applying an orthogonality constraint on the electronic degrees of freedom.


(a) Time to solution vs. npool

(b) Time to solution per block vs npool

Figure 4: Average time per pool evaluated with different pool distributions.


Figure 5: The iterative diagonalisation times for the different adopted setups. The time per block is decomposed into h_psi and _diaghg; the latter is the exact diagonalisation in the iterative space. The two different setups at 72 tasks per pool correspond to rows 1 and 3 of Tab. 15, respectively; the difference between the two is mostly due to deviations from linearity of the pool parallelism.

Figure 6: Total times per block of the general MPI parallel routines vloc_psi and calbec; the value at 72 tasks per pool is obtained as the average of setups 1 and 3 in Tab. 15.

This operation is performed in the ortho kernel, which implements an iterative constraint refinement, applying an exact diagonalisation and a sequence of matrix multiplications on matrices of size NB × NB to solve the constraint equation.

For the ZrO2 scientific use case, used in a prospect for a service required by an industrial user, we performed several runs with different combinations of parallelisation parameters. The results are shown in Fig. 7, where we report the total loop time (main_loop) and the time taken by the ortho kernel. While for a relatively small number of cores (i.e. 512) the ortho time is almost negligible, at 16000 cores the time taken by the ortho kernel represents more than 50% of the total loop time, and its scalability profile is flat (i.e. it does not scale) from 512 to 16000 cores. Analysing the code, we traced back the cause to a sub-optimal parallelisation of the matrix multiplications with doubly distributed matrices: over the linear-algebra ortho group and over the band group. In order to accelerate this scientific use case and make the code more efficient at large core counts (e.g. a system partition of roughly 1 PFlop/s peak performance), we need to improve the ortho kernel as the first priority.
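
For illustration, the cost structure of such a constraint (an exact diagonalisation plus NB × NB matrix multiplications) can be seen in the minimal Löwdin-style orthonormalisation sketch below. This is a simplified stand-in chosen for clarity, not the iterative constraint-refinement algorithm actually implemented in cp.x.

```python
import numpy as np

def lowdin_orthonormalise(c):
    """Orthonormalise the N_B states in c (rows = bands, columns = basis
    coefficients) via S^{-1/2}; a simplified stand-in for the cp.x ortho step."""
    s = c @ c.conj().T                       # N_B x N_B overlap matrix
    evals, evecs = np.linalg.eigh(s)         # exact diagonalisation (the costly part)
    s_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.conj().T
    return s_inv_sqrt @ c                    # sequence of N_B x N_B multiplications

c = np.random.default_rng(2).standard_normal((200, 4000))   # toy set of 200 bands
c_orth = lowdin_orthonormalise(c)
assert np.allclose(c_orth @ c_orth.T, np.eye(200), atol=1e-8)
```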


Figure 7: Scalability of cp.x for the ZrO2 case. For each number of cores, we tested different numbers of task groups, from 1 to 16; the best choice of task groups for each core count gives the best performance and scalability of the code. We also plot the profile of the orthogonalisation kernel, which is almost flat. This is not an issue for small numbers of cores, since its time is much smaller than the rest of the code, but it prevents the code from scaling above 10000 cores.


4.2 Yambo

4.2.1 Profiling on Yambo: the GW workflow

Yambo is an ab-initio code for calculating quasiparticle corrections and optical properties within the framework of many-body perturbation theory. Yambo relies on the Kohn-Sham (KS) wavefunctions generated by the DFT pw.x code of the QUANTUM ESPRESSO package through the interface p2y (PWscf to Yambo interface). Quasiparticle (QP) energies are obtained within the GW approximation for the self-energy. The calculation of the QP corrections involves the following main tasks:

(i) Calculation of dipole matrix elements. The dipole matrix element between the states |nk⟩ and |mk⟩ (typically one occupied and one empty) is defined as r_{nmk} = ⟨nk| r |mk⟩, and is computed within periodic boundary conditions using the relation [r, H] = p + [r, V_nl], leading to:

r_{nmk} = \frac{\langle nk \,|\, \mathbf{p} + [\mathbf{r}, V_{nl}] \,|\, mk \rangle}{\epsilon_{nk} - \epsilon_{mk}}    (1)

where \epsilon_{nk} are the KS energies. Computationally, the calculation of the dipoles involves a reduction over the space degrees of freedom (G, here) for every k-point and nm pair (usually N_k × N_occ × N_empty reductions to compute). Both wavefunctions and β projectors are needed, making the kernel both I/O and memory intense. Dipole matrix elements can be evaluated using several approaches (i.e. G-space v, shifted grids, Covariant, and R-space x; see Ref. [7] for more details). Profiling and benchmarking tests have been performed using the G-space v procedure.
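
A toy numpy illustration of Eq. (1) for a finite-dimensional model Hamiltonian is sketched below; it keeps only the ⟨nk|p|mk⟩ term and neglects the non-local commutator [r, V_nl], and it is not the G-space v implementation used in Yambo.

```python
import numpy as np

def dipole_matrix_elements(p_op, evals, evecs, n_occ):
    """Evaluate Eq. (1) for a finite-dimensional model, keeping only the
    <n|p|m> term (the [r, V_nl] commutator is neglected in this toy sketch)."""
    p_nm = evecs.conj().T @ p_op @ evecs             # momentum operator in the eigenbasis
    occ, emp = np.arange(n_occ), np.arange(n_occ, len(evals))
    denom = evals[occ, None] - evals[None, emp]      # epsilon_n - epsilon_m
    return p_nm[np.ix_(occ, emp)] / denom            # r_nm for occupied-empty pairs

rng = np.random.default_rng(3)
h = rng.standard_normal((50, 50)); h = 0.5 * (h + h.T)    # toy Hermitian Hamiltonian
p = rng.standard_normal((50, 50)); p = 0.5j * (p - p.T)   # toy Hermitian momentum operator
e, v = np.linalg.eigh(h)
r_vc = dipole_matrix_elements(p, e, v, n_occ=20)          # 20x30 occupied-empty dipole block
```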

(ii) Calculation of the density response function in the independent particle approximation, that is:

\chi^0_{GG'}(\mathbf{q}, \omega) = \frac{f_s}{N_k \Omega} \sum_{nm\mathbf{k}} \rho^*_{nm\mathbf{k}}(\mathbf{q}, G)\, \rho_{nm\mathbf{k}}(\mathbf{q}, G') \left[ \frac{f_{m\mathbf{k}}(1 - f_{n\,\mathbf{k}-\mathbf{q}})}{\omega - (\epsilon_{m\mathbf{k}} - \epsilon_{n\,\mathbf{k}-\mathbf{q}}) - i\eta} - \frac{f_{m\mathbf{k}}(1 - f_{n\,\mathbf{k}-\mathbf{q}})}{\omega - (\epsilon_{n\,\mathbf{k}-\mathbf{q}} - \epsilon_{m\mathbf{k}}) + i\eta} \right]    (2)

where f_{mk} is the occupation function and

\rho_{nm\mathbf{k}}(\mathbf{q}, G) = \langle n\mathbf{k} \,|\, e^{i(\mathbf{q}+G)\cdot \hat{\mathbf{r}}} \,|\, m\,\mathbf{k}-\mathbf{q} \rangle    (3)

represents one of the core quantities computed by the Yambo code. The calculation of \rho_{nm\mathbf{k}}(\mathbf{q}, G) requires having a large number of wavefunctions in memory and involves FFTs. Importantly, for each q and ω, χ⁰ is a dense matrix in the G, G' indexes (running up to several thousands), making the kernel very memory intense. Next, the reducible response function χ is calculated within the random phase approximation as:

\chi_{GG'}(\mathbf{q}, \omega) = \sum_{G''} \left[ I - \chi^0(\mathbf{q}, \omega)\, v(\mathbf{q}) \right]^{-1}_{G,G''} \chi^0_{G''G'}(\mathbf{q}, \omega)    (4)

Here v is the Fourier transform of the bare Coulomb potential. The solution of this equation is equivalent to a dense linear system and can be handled with parallel linear algebra techniques.
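
For each (q, ω), Eq. (4) amounts to a dense linear solve in the G, G' indices. The minimal numpy sketch below (with random placeholder matrices in place of actual response data) shows the operation that is handed to (parallel) linear algebra libraries.

```python
import numpy as np

def rpa_chi(chi0, v_diag):
    """Solve (I - chi0 v) chi = chi0 for one (q, omega); chi0 is the dense
    N_G x N_G independent-particle response, v_diag the Coulomb potential v(q+G)."""
    n = chi0.shape[0]
    a = np.eye(n) - chi0 * v_diag[None, :]    # (I - chi0 v), v applied column-wise
    return np.linalg.solve(a, chi0)           # dense solve over the G, G' indices

n_g = 1317                                    # polarisability G-vectors of the TiO2 case
rng = np.random.default_rng(4)
chi0 = 1e-3 * (rng.standard_normal((n_g, n_g)) + 1j * rng.standard_normal((n_g, n_g)))
v = 1.0 / (1.0 + np.arange(n_g))              # placeholder Coulomb diagonal
chi = rpa_chi(chi0, v)
```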


(iii) Calculation of the electronic self-energy Σ_nk. It is composed of the exchange (x) and correlation (c) parts, Σ_nk(ω) = Σ^x_nk + Σ^c_nk(ω). The exchange term is simply the Fock potential of the Hartree-Fock method and assumes the form:

\Sigma^x_{n\mathbf{k}} = - \sum_m \int_{BZ} \frac{d\mathbf{q}}{(2\pi)^3} \sum_{G} v(\mathbf{q}+G)\, |\rho_{nm\mathbf{k}}(\mathbf{q}, G)|^2\, f_{m\,(\mathbf{k}-\mathbf{q})}    (5)

while the correlation part of the self-energy is given by:

\Sigma^c_{n\mathbf{k}}(\omega) = i \sum_m \int_{BZ} \frac{d\mathbf{q}}{(2\pi)^3} \sum_{G,G'} \frac{4\pi}{|\mathbf{q}+G|^2}\, \rho_{nm\mathbf{k}}(\mathbf{q}, G)\, \rho^*_{nm\mathbf{k}}(\mathbf{q}, G') \int d\omega'\, G_{m\,\mathbf{k}-\mathbf{q}}(\omega - \omega')\, \epsilon^{-1}_{GG'}(\mathbf{q}, \omega')    (6)

where \epsilon^{-1}_{GG'}(\mathbf{q}, \omega) = \delta_{GG'} + v(\mathbf{q}+G)\,\chi_{GG'}(\mathbf{q}, \omega) and G is the non-interacting Green's function [8]. In order to avoid the inversion of large matrices for many frequencies, Yambo adopts the so-called plasmon-pole approximation (PPA) for the GW self-energy. Within the PPA, the \epsilon^{-1}_{GG'} function is approximated with a single-pole function and the frequency integral in Eq. (6) is done analytically. As implemented, the evaluation of the GW-PPA self-energy is quite CPU intensive.
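
As an illustration of the reduction structure shared by Eqs. (5) and (6), the toy snippet below evaluates the simpler exchange term for one (n, k) as a sum over bands, q-points and G-vectors, with random placeholder arrays in place of the actual ρ matrix elements and Coulomb potential.

```python
import numpy as np

def sigma_x(rho_nm, v_qg, occ, bz_weight):
    """Sigma^x_{nk} ~ - sum_{m,q,G} w_q v(q+G) |rho_{nmk}(q,G)|^2 f_{m,k-q};
    a toy evaluation of the structure of Eq. (5), not Yambo's implementation."""
    return -bz_weight * np.einsum("mqg,qg,m->", np.abs(rho_nm) ** 2, v_qg, occ)

n_bands, n_q, n_g = 40, 8, 512
rng = np.random.default_rng(5)
rho = 1e-2 * rng.standard_normal((n_bands, n_q, n_g))      # placeholder rho_{nmk}(q,G)
v = 1.0 / (1.0 + rng.random((n_q, n_g)))                   # placeholder v(q+G)
f = np.concatenate([np.ones(20), np.zeros(20)])            # occupations f_{m,k-q}
print("Sigma_x(n,k) ~", sigma_x(rho, v, f, bz_weight=1.0 / n_q))
```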

4.2.2 Defective TiO2 structure: MPI and OpenMP scaling

As a first step, we have considered a defective 2 × 2 × 3 TiO2 bulk supercell with an interstitial H impurity. The system counts 577 electrons. The DFT calculations have been performed using pw.x, adopting norm-conserving pseudopotentials with a kinetic energy cutoff (for the wavefunctions) of 80 Ry. The electronic structure data generated by pw.x have been processed by p2y for conversion to the Yambo internal format. The QP corrections have then been calculated with Yambo at the GW-PPA level. Tests have been performed to verify the efficiency of both the MPI and OpenMP parallelisation schemes. The results are reported below.

MPI scaling. These tests assess the scaling of Yambo with respect to the number of MPI tasks. The number of tasks was varied by changing the number of KNL nodes from 40 to 320, using 32 tasks per node. This corresponds to a number of MPI tasks ranging from 1280 to 10240. In all calculations we imposed 2 OpenMP threads per task, in order to take full advantage of the KNL computational resources. The results are shown in Fig. 8 and Tab. 16. As can be seen from the table, the scaling of the main computational kernels (Dipoles, χ0, Σx, Σc) is quite good, with parallel efficiencies typically larger than 75% at 240 KNL nodes, and with peaks at 90% or more for χ0 and Σc. At the same time, the overall wall-time does not show the same behaviour. In fact, the total wall-time decreases from 5433 sec (using 1280 MPI tasks) to 1987 sec (10240 MPI tasks). The most significant decrease happens for the runs up to 5120 MPI tasks; for larger numbers of MPI tasks the total wall-time is almost constant. This happens despite the consistent decrease, up to 10240 MPI tasks, of the time taken by each of the main kernels of the code, as can be seen in Fig. 8 and Tab. 16.
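The overall parallel efficiencies quoted above can be recomputed directly from the wall-times in Tab. 16; a small sketch, using the pristine-version wall-times and normalising to the 1280-task run:

# Parallel efficiency from the wall-times of Tab. 16 (pristine version),
# normalised to the smallest run: eff(N) = T(1280) * 1280 / (T(N) * N).
tasks = [1280, 2560, 3840, 5120, 6400, 7680, 8960, 10240]
wall = [5433, 3268, 2369, 2029, 1979, 1846, 1928, 1987]   # seconds
t0, n0 = wall[0], tasks[0]
for n, t in zip(tasks, wall):
    print(f"{n:6d} MPI tasks: wall-time {t:5d} s, efficiency {t0 * n0 / (t * n):.2f}")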



Figure 8: Defected rutile: Execution times for the tests performed with the yambo code to verify the efficiency of the MPI parallelisation. Left panel: timing using a pristine version of the code; Right panel: timing after the bottlenecks have been addressed. The time for each of the main tasks of the code is given separately. The total time taken to perform other tasks is labeled as "Other". Timing data are explicitly reported in Tab. 16.

For larger numbers of MPI tasks, the time taken by tasks other than the calculation of dipoles, χ0 and Σ, represented in red in Fig. 8 and labeled as "Other", dominates the performance of the code. This is the main bottleneck identified by means of this use case. The "Other" time is mainly due to the initialisation step, before any actual calculation, connected e.g. to the parallel launch of the job as well as to the setup of MPI communicators and I/O. In particular, after a deeper investigation, we have identified a bottleneck related to job launching (20-30 seconds at 200 KNL nodes, 6400 MPI tasks), perhaps connected to the use of the SLURM scheduler, and a more relevant problem connected to the creation of folders by Yambo (of the order of 5-6 minutes). We have re-written the I/O initialisation of Yambo and were able to solve the latter issue. The updated data are given in the right panel of Fig. 8.

The MPI parallelism can be distributed in different ways, exploiting the parallelism over k or q points, valence or conduction bands (v, c), or g vectors. The distribution is governed by the parameters X_CPU and X_ROLES="q.k.c.v.g" for the polarisability, DIP_CPU and DIP_ROLES="k.c.v" for the dipoles, and SE_CPU and SE_ROLEs="q.qp.b" for the self-energy. Using different numbers of tasks leads to changes in the MPI distribution over the different levels of parallelism. Depending on the system, some choices may be significantly better than others in optimising the performance of the code. This leads to small fluctuations in the reported time for each run. A possible improvement could be achieved by implementing an algorithm that determines, for a given total number of MPI tasks and a given system (i.e. a given number of bands, k and q points, and g vectors), the optimal parallel distribution of MPI tasks for each run-level.
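A possible starting point for such an algorithm is a brute-force search over the factorisations of the total task count, scoring each candidate by its worst-case load imbalance. The sketch below only illustrates this idea (role names and loop sizes are made up); it is not Yambo's scheduler.

from itertools import product

def balanced_splits(ntasks, sizes):
    """Enumerate factorisations of `ntasks` over run-levels and keep the ones
    with the smallest worst-case load imbalance.  `sizes` maps each role
    (e.g. 'k', 'c', 'v') to the length of the loop it parallelises."""
    roles = list(sizes)
    divisors = [d for d in range(1, ntasks + 1) if ntasks % d == 0]
    best, best_score = [], float("inf")
    for combo in product(divisors, repeat=len(roles)):
        prod_ = 1
        for c in combo:
            prod_ *= c
        if prod_ != ntasks:
            continue
        # imbalance of one role: ceil(size/d) relative to the ideal size/d
        score = max((-(-sizes[r] // d)) / (sizes[r] / d) for r, d in zip(roles, combo))
        if score < best_score:
            best, best_score = [dict(zip(roles, combo))], score
        elif score == best_score:
            best.append(dict(zip(roles, combo)))
    return best_score, best

# toy example: distribute 64 tasks over k-points and conduction/valence bands
print(balanced_splits(64, {"k": 12, "c": 200, "v": 40}))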

OpenMP scaling. In Yambo, the OpenMP parallelism of χ0 (Dipoles) is governed by the input variable X_Threads (DIP_Threads). By default these variables are set to zero and are controlled by the OMP_NUM_THREADS environment variable. The OpenMP parallelism for the self-energy is governed by the variable SE_Threads. In order to test the OpenMP implementation of Yambo, we have concurrently increased the X_Threads, DIP_Threads and SE_Threads variables from 1 to 40. Tests have been performed by


Pristine version
 #MPI  #threads  dipoles    χ0     χ    Σx    Σc   wall-time
 1280     2        1568   1159    53    30   2464    5433
 2560     2         997    691    49    15   1232    3268
 3840     2         606    385   148    16    848    2369
 5120     2         451    361   100     8    619    2029
 6400     2         414    301   133    12    606    1979
 7680     2         349    221   120     7    427    1846
 8960     2         414    196   123     7    381    1928
10240     2         322    165   134     6    344    1987

Fixed version
 #MPI  #threads  dipoles    χ0     χ    Σx    Σc   wall-time
 1280     2        1499   1199    69    30   2441    5347
 2560     2         766    651   109    15   1231    2920
 3840     2         532    471   133    12    834    2200
 5120     2         412    371   133     8    608    1762
 6400     2         344    294    99     7    495    1393
 7680     2         294    260   180     7    427    1495
 8960     2         256    243   196     7    345    1400
10240     2         231    201   225     6    306    1438

Table 16: Defected rutile: Tests have been performed using 32 MPI tasks per node and 2 threads. Times are given in seconds. In all considered cases, I/O χ and I/O WF operations require only a few seconds, and therefore are not reported in the table.

fixing the number of MPI tasks to 480 (60 nodes, 8 MPI tasks per node). The distribution of the MPI tasks over the different levels of parallelism has been kept constant during the test, that is, we have always imposed in the input the multilevel MPI structure X_CPU="1.1.30.8.2", X_ROLES="q.k.c.v.g" for the polarisability, DIP_CPU="1.60.8", DIP_ROLES="k.c.v" for the dipoles, and SE_CPU="1.4.120", SE_ROLEs="q.qp.b" for the self-energy. The results obtained are reported in Fig. 9 and in Tab. 17, where two horizontal dashed lines divide the plot into three parts. In particular, the first dashed line separates the no-hyperthreading regime (number of threads smaller than 8) from the hyperthreading regime (number of threads greater than 8), while the second dashed line identifies the transition between a regime where the wall-time decreases with increasing number of threads and a regime where the wall-time increases with increasing number of threads (note that the KNL technology behind Marconi A2 allows for the use of 4 threads per core, for a total of 272 virtual cores). By fully exploiting the OpenMP parallelism, we can reduce the wall-time by about a factor of five. In particular, we observe that the dipoles, χ0, and Σc routines show an excellent OpenMP scalability. On the contrary, we recorded a non-monotonic trend in the time associated with the calculation of the reducible polarisability χ; this behaviour will be carefully analysed in the near future.



Figure 9: Defected rutile: Execution times for the tests performed with the yambo code to verify the efficiency of the OpenMP parallelisation. The time for each of the main tasks of the code is given separately. The total time taken to perform other tasks is labeled as "Other".

4.2.3 Chevron-like polymer: MPI scaling

The computational study of this use case has mainly focused on the use of large KNL partitions for a single GW run. In these tests we set the number of MPI tasks per node to 8 and the number of OpenMP threads to 8, while increasing the number of KNL nodes (Marconi A2 @ CINECA) from 128 to 768, for a total of 1024 to 6144 MPI tasks (8192 to 49152 cores). The timing results of the tests are reported in Fig. 10 (left panel). Two main bottlenecks can be identified from these runs. As for the defected TiO2 structure, the "Other" part, mostly related to the initialisation process, becomes massive (up to 26 minutes of initialisation for the largest partition). The fixed version of the code significantly improves on this issue, as visible in the right panel of Fig. 10. More importantly, this use case also evidences a second, trickier, bottleneck, related to the parallel redistribution of the χ matrix after the solution of the Dyson equation (cast as a dense linear system) has been computed. This is shown as a green contribution to the columns in the left panel of Fig. 10. While the problem can easily be worked around when the χ matrix is not split across the G, G' indexes, it is less trivial when such a distribution is switched on. In passing, we also remark that this issue is tightly related to the implementation and usage of parallel I/O via HDF5 (already present in Yambo). The definition and validation of a general-purpose solution is ongoing and requires more investigation.

4.2.4 Intra-node profiling on Yambo: GPUs

Recently, an extensive activity to port Yambo to GPUs has been put in place. This has addressed the kernels computing dipoles, Hartree-Fock, linear response, GW, and the Bethe-Salpeter equation (BSE). Technically, the porting has been achieved by taking advantage of CUDA, exploiting both cuf-kernels and CUDA libraries (cuBLAS, cuFFT, and cuSOLVER).


#MPI  #threads/MPI  #threads/node  dipoles    χ0     χ    Σx    Σc   wall-time
 480        1              8         1974   1493    92    12   1191    4869
 480        6             48         1019   1062   184     5    542    2913
 480        8             64          935    849   271     5    407    2576
 480       12             96          665    576   209     6    273    1832
 480       18            144          532    407   187     7    186    1422
 480       24            192          400    356   134     5    141    1140
 480       30            240          370    353    70     4    144    1039
 480       36            288          419    586    77     4    197    1384
 480       42            336          491    968    66     9    182    1819

Table 17: Defected rutile: Tests have been performed using 60 nodes and 8 MPI tasks per node. Times are given in seconds. In all considered cases, I/O χ and I/O WF operations require only a few seconds, and therefore are not reported in the table.

Currently Yambo counts about 62 CUDA Fortran cuf kernels. Benchmarks have been performed on Piz Daint@CSCS (nodes equipped with NVIDIA Tesla P100 GPUs) and Galileo@CINECA (nodes with NVIDIA Tesla V100 GPUs). The system considered is a poly-acetylene chain, i.e. an organic polymer with the repeating unit (C2H2)n. Simulations have been performed with a version of Yambo compiled with the PGI compiler for both the CPU and GPU cases. A snapshot of the profiling tool applied to the GPU-ported version of Yambo is shown in Fig. 11. Results obtained on Piz Daint (XC50 partition) are reported in Fig. 12. Here, we compared the execution times of the irreducible polarisability χ0, the Hartree-Fock, and the self-energy routines for calculations performed on both CPUs and GPUs (from 2 to 8 nodes). The obtained results point to a 5 to 10× speedup in time to solution for the ported kernels. Noticeably, the use of CUDA Fortran for the GPU porting has a rather small impact on the code, because the accelerated parts are well localised in only a few subroutines. Remarkably, the profiling performed to optimise the performance on GPUs also led to some optimisations when running on CPUs, for instance in the routine FFT_setup, where a non-scaling overhead of a few minutes has been reduced to a few seconds.

Data concerning the intra-node profiling of Yambo on Galileo² are reported in Tab. 18. Here the results are divided into two sets concerning GW (upper part of the table) and BSE (lower part) simulations. Also in this case the results evidence a significant speedup in time-to-solution, for both GW and BSE. Typically, for GPU-accelerated systems the recommended usage model for our software is to use one MPI task per card, setting the number of OpenMP threads to the maximum available on the host in order to best exploit also its computational capabilities. Since currently available GPUs (say NVIDIA Volta) have a quite large computational power per card, with a few MPI tasks one is already able to address and exploit a very large computational partition.

² Note that Galileo mounts, on a single node, 2 NVIDIA V100 GPU cards (for testing purposes). The results of Tab. 18 compare the performance obtained on the full node (2× 8-core Intel Xeon CPU E5-2630 v3 @ 2.40GHz, Haswell) with the data obtained also exploiting the two above-mentioned V100 cards.



Figure 10: Chevron polymer: Execution times for the tests performed with the yambo code to verify the efficiency of the MPI parallelisation. Left panel: timing using a pristine version of the code; Right panel: timing after some of the bottlenecks have been addressed. The time for each of the main tasks of the code is given separately. The total time taken to perform other tasks is labeled as "Other". Note that the data corresponding to 1024 MPI tasks are the same in the two cases, since we do not expect major changes introduced by the fix.

GW    FFTsetup   XoTOT    XTOT      HF        Σ      Wall-Time
CPU     69.00    389.00    3.00    52.00    222.00     765.00
GPU      0.21     19.44    2.50     0.63      5.47      38.00

BSE   FFTsetup   XoTOT    XTOT   BSEkernel  BSEdiago   Wall-Time
CPU      2.24    376.30    2.73     67.75      1.22       477
GPU      0.25     17.22    2.37     11.69      0.74        39

Table 18: Full node vs two NVIDIA V100 GPUs on Galileo. Times are given in seconds. In the upper part we report the times for the FFT setup, the irreducible and reducible polarisability, the Hartree-Fock term, the self-energy, and the wall-time for both CPU and GPUs. Similarly, the FFT setup, the irreducible and reducible polarisability, the BSE kernel, the BSE diagonalisation, and the wall-time are reported for the BSE calculations.


Figure 11: Profiling of the GPU porting of Yambo performed on Piz Daint (XC50 partition) using NVIDIA tools. Currently the code counts about 62 cuf (CUDA Fortran) kernels, no custom kernels, plus the use of cufft, cublas, and cusolver libraries.


Figure 12: Full socket (left panel) vs GPU (right panel) timing for poly-acetylene on Piz Daint (XC50 partition). The time for each of the main tasks of the code is given separately. The total time taken to perform other tasks is labeled as "Other".

Given the very effective scalability of Yambo at both the MPI and OpenMP level (especially in the limit of a small-to-medium number of tasks), the main issue threatening the use of Yambo on GPU-accelerated machines is the memory footprint. Current GPUs typically have 16 to 32 GB of RAM per card, while Yambo can be very memory hungry (especially if the memory is distributed only over a few MPI tasks). In this respect, testing the GPU porting of the code against larger systems would be very important (and is a currently ongoing activity). So far, besides poly-acetylene, we have also tested a larger system from the Yambo benchmark suite, namely the N7 graphene nanoribbon of Ref. [9]. Already on a single node with two NVIDIA TITAN V cards, results are very positive: the calculations went through and the performance is very good. Moreover, these preliminary tests have highlighted a bottleneck connected to a computational kernel not yet ported to GPUs (the calculation of the cut-off Coulomb potential in reciprocal space). A porting of this kernel will be addressed as soon as possible.
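The memory-footprint concern above can be quantified with a back-of-the-envelope estimate: one dense χ0(G, G') block in double-precision complex takes 16·NG² bytes per frequency point, to be multiplied by the number of frequencies held in memory and compared with the 16-32 GB available per card. A small sketch (sizes purely illustrative):

# Rough memory estimate for one dense (q, omega) block of chi0_{G,G'}:
# a complex double-precision matrix of dimension nG x nG takes 16 * nG**2 bytes.
def chi0_block_gib(n_g, n_omega=1):
    return 16.0 * n_g**2 * n_omega / 2**30

for n_g in (2000, 5000, 10000, 20000):
    print(f"nG = {n_g:6d}: {chi0_block_gib(n_g):7.2f} GiB per frequency point")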


4.3 FLEUR

The performance of the DFT code FLEUR [10] was substantially optimized during the first phase (2015-2018) of the MAX project. The MPI parallelisation of the code was extended, shared memory parallelism was added, memory access was re-optimized, and interfaces to optimized libraries were implemented. Those improvements not only drastically reduced the time-to-solution (by up to 4 times) but also made it possible to simulate much larger unit cells, with over 1000 atoms [11]. Thus, the last FLEUR version of the first phase of the MAX project (MAX Release 3) will be the basis for the performance optimization process during phase two.

4.3.1 Performance of the FLEUR MAX Release 3

FLEUR is a full-potential all-electron DFT code, an implementation of the full-potential linearized augmented plane wave method (FLAPW). A usual run of the code consists of many self-consistency iteration cycles; the main parts of one cycle are: i) generation of the potential, ii) setup of the Hamiltonian and overlap matrices, iii) solution of the generalized eigenvalue problem, i.e. diagonalisation, iv) calculation of the new charge density. When several (or many) k-points need to be calculated, several (or many) independent matrices need to be set up and diagonalised. The number of k-points needed is usually inversely proportional to the size of the unit cell; e.g. for a cubic unit cell with about 1000 atoms one k-point can be sufficient.

The MPI parallelisation of the FLEUR code consists of two layers: the first one over the k-points and the second one for the distributed setup and diagonalisation of the matrices. Since the eigenvalue problems for different k-points are independent and show nearly ideal scaling, only one k-point is calculated in the test cases chosen for profiling. All self-consistency iterations are computationally identical, which is why for performance tuning purposes we only run one iteration step. The most time-consuming parts of the code are the matrix setup, the diagonalisation, and the new charge generation, as one can see from the scaling plot of the test case TiO2 with 1078 atoms (Fig. 13). These three parts scale cubically with the system size. The generation of the potential, which scales quadratically, stops playing a substantial role for unit cells of this size. The setup of the matrices ("HS setup", green) and the solution of the generalized eigenvalue problem ("Diagonalisation", blue) together are responsible for more than 95% of the computing time. These measurements were done on the CLAIX 2016 (Intel Broadwell, see Tab. 19) supercomputer at RWTH Aachen University; the plot starts with 16 nodes due to the memory requirements.

             CPU                          # cores/node   performance/core, GFlops   memory/node, GB   memory bandwidth, GB/s
CLAIX 2016   Intel Broadwell E5-2650v4         24                   35                     128                 120
CLAIX 2018   Intel Skylake Platinum 8160       48                   67                     192                 180

Table 19: Hardware systems used to perform the benchmark calculations with FLEUR.

It is apparent from the scaling diagram that the overall scalability of the code is impeded mostly by the matrix setup part.


                      Matrix setup   Diagonalisation   New charge   Total
CPI                        0.58            0.38            0.61      0.48
Performance, GFlops         7.7            20.6             5.8      12.1

Table 20: Performance counter measurements done with LIKWID, average values per core. Code: FLEUR, hardware: CLAIX 2016, one node (24 cores). Test case: CuAg, 256 atoms.

The new charge generation part, though it takes less than 5% of the execution time, can still influence the total performance. To better understand the factors affecting the performance, those two parts were investigated in more detail (the diagonalisation part is outsourced to external libraries, in this case ELPA, and shows a good scaling behaviour). Using a smaller test case (CuAg, 256 atoms) with a similar scaling behaviour, performance counters were measured and traces were collected during the parallel execution of the code. Performance counters for the matrix setup and new charge generation regions, as well as for the diagonalisation part and the total run, were measured while running the code on one node with 4 MPI processes and 6 OpenMP threads per MPI process (Tab. 20). The values for CPI (cycles per instruction) and performance (number of double precision floating point operations per second) show that the code performs quite well, but still has room for improvement, especially in the new charge part.
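The derived quantities in Tab. 20 come straight from raw counter readings; a small sketch of the arithmetic (the counter values below are hypothetical examples, not measurements from this report):

# CPI and GFlop/s from raw hardware-counter readings (LIKWID exposes
# equivalent events); the numbers below are hypothetical examples.
def cpi(cycles, retired_instructions):
    return cycles / retired_instructions

def gflops(double_precision_flops, runtime_s):
    return double_precision_flops / runtime_s / 1e9

print(cpi(2.4e10, 5.0e10))        # ~0.48 cycles per instruction
print(gflops(7.7e10, 10.0))       # ~7.7 GFlop/s for a 10 s region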

[Figure 13 plot: speedup vs. number of nodes (24 cores each); execution time on 16 nodes: 87 minutes; legend: Total, HS setup 56.5%, Diagonalization 38.9%, New charge generation 3.9%, Ideal.]

Figure 13: Strong scaling of the test case TiO2 (1078 atoms, 1 k-point, 1 self-consistency iteration step). The plot starts with 16 nodes, the percentage on the legend refers to this run. The total scaling (red) is compared with the scaling of the main parts (green, blue, magenta) and with the ideal scaling (light blue). Measurements were done on the CLAIX 2016 supercomputer at RWTH Aachen University (Tab. 19).


Figure 14: Trace of the parallel execution of the FLEUR code, matrix setup region. Only 12 of 32 MPI processes are shown; only the first MPI process is shown with its worker OpenMP threads.

Figure 15: Trace of the parallel execution of the FLEUR code, new charge density region. Only 8 of 32 MPI processes are shown; only the first MPI process is shown with its worker OpenMP threads.

Traces for the matrix setup and new charge generation regions are shown in Figs. 14 and 15, respectively. This was a parallel run on 8 nodes, with 4 MPI processes per node and 6 OpenMP threads per MPI process. To save space, only a few MPI processes are shown, and only the first one together with its worker OpenMP threads. We can see a decent load balance in the matrix setup part and a significant serial part in the new charge region, which can explain the differences in CPI and performance values between these code parts.

4.3.2 New data layout

A compact way of storing the matrices, so-called packed storage, was applied in FLEUR in the MAX Release 3 and all previous versions. At the time of implementation (i.e. several decades ago), when memory resources were scarce, it was a very reasonable optimization. Since the matrices are Hermitian, only half of each matrix was calculated and stored in a one-dimensional array.

This strategy is no longer the best nowadays. In modern HPC systems, it is not the amount of memory but the access to it that introduces the main bottleneck. The most performant algorithms are now those which are able to use and reuse the data as many times as possible, reducing the streaming of data from the main memory to the CPUs. Hence, storing the data explicitly as matrices, and therefore allowing the utilization of standard optimized routines (such as BLAS2 and BLAS3), should provide better performance.

http://www.max-centre.eu 40 HORIZON2020 European Centre of Excellence Deliverable D4.2 First report on code profiling and bottleneck identification, structured plan of forward activities.

# nodes                           16       32       64      128      256
MaX R3 (Packed storage), s    2966.62  1759.26  1141.23   839.98   680.23
Develop (Full matrix), s      1458.07   815.11   494.78   340.09   266.99
Speedup                          2.03     2.16     2.30     2.47     2.55

Table 21: Execution time of the matrix setup. Two versions are compared: packed storage (from the MAX Release 3) vs. full matrix storage (the develop branch, not released yet). Test case: TiO2, 1078 atoms. Hardware: CLAIX 2016.


Figure 16: Strong scaling of the matrix setup part. Two versions are compared: packed storage (from the MAX Release 3) vs. two-dimensional storage (the develop branch, not released yet). Test case: TiO2, 1078 atoms. Hardware: CLAIX 2016.

Restructuring the data layout in this way indeed improved performance considerably (Tab. 21): the execution time became more than two times shorter. The scalability of this part of the code also became more efficient (Fig. 16). The difference in the performance of the whole run can be seen in Fig. 17, where the execution time of one self-consistency iteration of the test case TiO2 (1078 atoms) is plotted. With this optimized version of FLEUR, the simulation of a larger variant of this test case (2156 atoms) became possible, which is also shown in this plot. Unfortunately, only three configurations were possible on this machine: the plot starts with 128 nodes because of the memory requirements, and 512 nodes is nearly the whole machine. After the second stage of the CLAIX machine, CLAIX 2018 (Tab. 19), became available, we repeated the calculations with these two test cases (Fig. 18) together with a new one: SrTiO3 with dislocations (STO).
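The gain reported in Tab. 21 comes from trading the packed triangular layout for a plain two-dimensional one, so that the setup can be expressed through level-3 BLAS. The NumPy sketch below only illustrates the two storage layouts and the kind of rank-k update a 2D layout enables; it is not FLEUR's actual matrix-setup code.

import numpy as np

# Illustration of the two layouts for a (here real symmetric) matrix H:
# full 2D storage allows a single cache-friendly BLAS-3 update, while packed
# upper-triangle storage forces per-element index arithmetic.
n, k = 2048, 64
rng = np.random.default_rng(0)
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, k))

# full 2D storage (develop-branch style): one GEMM-like rank-k update
H_full = np.zeros((n, n))
H_full += A @ B.T

# packed storage (MaX Release 3 style): upper triangle as a 1D array,
# updated element by element (only the first few rows shown)
H_packed = np.zeros(n * (n + 1) // 2)

def packed_index(i, j, n):          # 0-based index of element (i, j), i <= j
    return i * n - i * (i - 1) // 2 + (j - i)

for i in range(4):
    for j in range(i, n):
        H_packed[packed_index(i, j, n)] += A[i] @ B[j]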



Figure 17: Scaling behaviour of the test case TiO2 (1 k-point, 1 self-consistency iteration). For the smaller setup (1078 atoms), the performance improvement of the current development version compared to the MAX Release 3 is shown (green and magenta). The simulation of a bigger test case (2156 atoms) became possible. Hardware: CLAIX 2016.


Figure 18: Scaling behaviour of big test cases: TiO2 (1078 atoms, 2156 atoms) and STO (3750 atoms). Hardware: CLAIX 2018.


4.4 BigDFT

4.4.1 Uranium-dioxyde benchmarks - GPU

Thanks to a highly efficient GPU implementation of a novel algorithm for the evaluation of the exact exchange, we succeeded in reducing the cost of hybrid functional calculations in systematic basis sets by nearly one order of magnitude. As a consequence, hybrid functional calculations with our method are only about three times more expensive than a GGA functional calculation. This is a price that most scientists should be ready to pay for the significantly improved accuracy offered by such functionals. We expect that this development will pave the way for a very widespread use of hybrid functionals in combination with systematic basis sets. This in turn will greatly increase the predictive power of density-functional calculations and make them even more popular. Moreover, from an HPC viewpoint, the usage of such methods will enable extensive usage of Petaflop machines by the electronic structure community, on the brink of the exascale era.

The calculation of the exact exchange energy EX requires a double summation over all the N occupied orbitals:

\[
E_X[\hat F] = -\frac{1}{2}\sum_{i,j,\sigma} f_{i,\sigma}\, f_{j,\sigma} \int\!\!\int \frac{\rho^{\sigma}_{ij}(\mathbf r)\,\rho^{\sigma}_{ji}(\mathbf r')}{|\mathbf r - \mathbf r'|}\, d\mathbf r\, d\mathbf r'\,, \qquad (7)
\]

where we have defined $\rho^{\sigma}_{ij}(\mathbf r) = \psi^{*}_{j,\sigma}(\mathbf r)\,\psi_{i,\sigma}(\mathbf r)$. The diagonal ($i = j$) contribution to $E_X$ exactly cancels the Hartree electrostatic energy $E_H[\rho]$. We explicitly specify the (collinear) spin degrees of freedom with the index $\sigma = \uparrow,\downarrow$, together with the occupation numbers $f_{i,\sigma}$. The action of the Fock operator $\hat D_X$ to be added to the KS Hamiltonian directly stems from the $E_X$ definition:

\[
\hat D_X\,|\psi_{i,\sigma}\rangle = -\sum_{j} f_{j,\sigma}\, |\hat V^{\sigma}_{ij}\,\psi_{j,\sigma}\rangle\,, \qquad (8)
\]

where we have defined the set of operators $\hat V^{\sigma}_{ij}$ with integral kernels

\[
\langle \mathbf r|\,\hat V^{\sigma}_{ij}\,|\mathbf r'\rangle = \left[\int d\mathbf r''\, \frac{\rho^{\sigma}_{ji}(\mathbf r'')}{|\mathbf r - \mathbf r''|}\right] \delta(\mathbf r - \mathbf r')\,, \qquad (9)
\]

that is, the solution of Poisson's equation $\nabla^2 V^{\sigma}_{ij} = -4\pi\rho^{\sigma}_{ij}$. In a KS-DFT code which searches for the ground state orbitals, one has to repeatedly evaluate during the SCF procedure, for a given set of $\psi_{i,\sigma}(\mathbf r)$, the value of $E_X$ as well as the action of the corresponding Fock operator $\hat D_X$ on the entire set of occupied orbitals. To give an idea of the computational workload of our calculations, we report in Tab. 22 the number of evaluations of the Poisson solver that the calculation of $E_X$ and $\hat D_X$ (for a single wavefunction iteration) requires for the systems considered in these benchmarks.

The system we used to test our GPU-accelerated runs is the Piz Daint supercomputer at the Swiss National Supercomputing Centre (CSCS), Lugano. Each run was performed using 1 MPI process per node and 8 OpenMP threads per MPI process. During the exact-exchange computation, with communication properly overlapped, each of the GPUs spends 80% of the time computing and reaches 40 GFlop/s of sustained double-precision performance.
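The elementary kernel repeated for every co-density ρij is the solution of the Poisson equation above. The snippet below is a minimal periodic FFT-based sketch on a toy Gaussian charge, for illustration only: BigDFT's actual Poisson solver is wavelet/interpolating-scaling-function based and supports free boundary conditions, which a plain periodic FFT does not.

import numpy as np

# Solve  laplacian(V) = -4*pi*rho  on a periodic grid via FFT (toy example).
n, L = 64, 10.0                                   # grid points and box length
x = np.linspace(0.0, L, n, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
rho = np.exp(-((X - L/2)**2 + (Y - L/2)**2 + (Z - L/2)**2))   # toy co-density

g = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)      # reciprocal-space grid
GX, GY, GZ = np.meshgrid(g, g, g, indexing="ij")
G2 = GX**2 + GY**2 + GZ**2
G2[0, 0, 0] = 1.0                                 # avoid dividing by zero

V_hat = 4.0 * np.pi * np.fft.fftn(rho) / G2
V_hat[0, 0, 0] = 0.0                              # drop the G = 0 component
V = np.real(np.fft.ifftn(V_hat))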


Na           12        96         324           768            1029
# ψi^σ      164      1432        5400         12800           17150
# ρij^σ    6 658   513 372   7 292 700    81 926 400      73 539 200

Table 22: Number of Poisson solver evaluations per self-consistent iteration required to calculate the exact exchange energy and operator for the different systems used in the study. The number of atoms Na as well as the number of KS orbitals ψi is indicated.

Memory throughput on the GPU is on average more than 165 GB/s, over 65% of the theoretical peak bandwidth of the GPU, which is the limiting factor here. The CPU performance for the same run without acceleration on Piz Daint is 6.4 GFlop/s per CPU (using 8 OpenMP threads per CPU).

Figure 19: Timings (s) per iteration for PBE and PBE0, with and without GPU acceleration, for different cells of UO2 as a function of computing nodes on Piz Daint. The right panels represent the ratio between a PBE0 and a PBE SCF iteration.

As shown in Fig. 20, the GPU-accelerated version scales well down to 3 orbitals per node. When going to two or even one orbital per node, the degradation of the scalability is due to the fact that the computation time is not high enough to overlap the communications. For one of the large systems used in this study (768 atoms, i.e. 12800 orbitals), the 3200-node run (4 orbitals/node) was 75% quicker than the 1600-node run. Even larger sizes (1029 atoms, resulting in 17150 orbitals, on up to 4288 nodes) were reached during this study, although network issues arose and prevented good scalability. These limits, although only reached on sizes that are currently considered uncommon for scientific production work, could be overcome by adding a wavelet-based compression/decompression step before each communication. This transformation is already implemented on the GPU in BigDFT, and currently used only before entering the exact exchange step. On some systems this could reduce communication sizes by a factor of four. The new generation of GPU-accelerated supercomputers, like Piz Daint or Jean Zay at IDRIS in France, provides a huge increase in computing power on the GPU, while the network was


Figure 20: Strong parallel scaling of the exact exchange implementation in BigDFT for different systems (from a very small, 12 atom cell, up to a 1,029 atom cell). The ideal curve is depicted in blue, CPU, GPU and GPU-GPUDirect are included for small cells. only slightly improved. This means that this optimization is even more important. Another example of submission of the same systems in CINECA’s Marconi KNL computer can be found in the notebook as the URL [12], where the AiiDA plugin is used to run the computations.

4.4.2 Bench Submission through AiiDA

We have developed the AiiDA [13] plugin for the BigDFT code and we have inserted it in the development version of the PyBigDFT library, under stabilisation in the context of WP3. The plugin allows AiiDA to be integrated effortlessly in preexisting BigDFT notebooks, enabling asynchronous remote execution on supercomputers as well as database tracking of results. To validate its technological readiness and usefulness we have run some of the performance benchmarks with this plugin. As an illustration we show a code snippet associated with the submission of the UO2 bench on Marconi.

from BigDFT import Datasets as D, Calculators as C

# instantiate the dataset of the runs to be performed
benchData = D.Dataset(label='bench', input=inp, posinp=posinpfile,
                      queue_name='knl_usr_prod', account='Max_devel_0',
                      async=True)

# append jobs following the MPI and OMP configurations
for nodes in list_of_nodes:
    for mpi in list_of_mpi_per_node:
        for omp in list_of_threads_per_process:
            code = C.AiidaCalculator(code="@marconi", num_machines=nodes,
                                     mpiprocs_per_machine=mpi, omp=omp,
                                     walltime=...)
            id = {'name': jobname, 'nodes': nodes, 'mpi': mpi, 'omp': omp}
            benchData.append_run(id=id, runner=code)

# submit and run the jobs remotely
benchData.run()

http://www.max-centre.eu 45 HORIZON2020 European Centre of Excellence Deliverable D4.2 First report on code profiling and bottleneck identification, structured plan of forward activities.


Figure 21: Scaling of a 3-iteration PBE run with the UO2_3 input (5400 orbitals) on the Marconi system, with 43-169 nodes, 8 MPI processes/node, 8 OMP threads/process - Global view on the left, subsection "Other" on the right.


Figure 22: Scaling of a 3-iteration PBE run with the UO2_3 input (5400 orbitals) on the Marconi system, with pre-application of the pseudopotentials, with 43-169 nodes, 8 MPI processes/node, 8 OMP threads/process - Global view, subsection "Other", subsection "Convolutions".

After completion, execution and performance data can be analyzed with the PyBigDFT analysis tools, providing highly useful input on BigDFT's behavior. The time spent inside each section and subsection, the imbalance at each level, and the time spent inside routines can be inspected directly from the notebook used for submission.

4.4.3 Uranium-dioxyde benchmarks - KNL



Figure 23: Scaling of a 3-iteration PBE0 run with the UO2_3 input (5400 orbitals) on the Marconi system, with 43-169 nodes, 8 MPI processes/node, 8 OMP threads/process - Global view, subsection "Poisson Solver".

UO2_3 PBE (Fig. 21): Identifying bottlenecks in this run is simple, as every part of BigDFT is traced separately and each portion of the code can be isolated. Here, the "Other" section does not seem to offer good scaling. Zooming in on this section indeed reveals that the subsection "ApplyProj", corresponding to the application of the nonlocal pseudopotential, does not scale well on this particular run. This is due to a default choice in BigDFT to recompute the pseudopotential application each time, without storing the resulting values in memory, in order to save memory, as this part is usually very small compared to the convolutions. As this computation is local, it does not scale. An option in the input file can be used to compute the pseudopotentials once and for all and store the results, at the cost of memory occupancy (3 GB of overhead per MPI process in this case, 24 GB per node). As memory was not an issue in this case, we were able to run the experiments again and get rid of this bottleneck. Fig. 22 shows the new runs, with a maximum performance improvement of 30% in the computation phase. The convolutions section is now again the most costly, and "Rho_comput", which handles the computation of the charge density, can now be seen as the most problematic section. Convolutions are usually the most costly part of a PBE execution, and the convolutions involved in these computations will be integrated in the general convolution library libconv, using autotuning strategies to select the most optimized version of each convolution kernel according to its parameters and the underlying platform. This work is currently part of WP2.

UO2_3 PBE0 (Fig. 23): The computation of the exact exchange part is the most expensive part in these experiments, as expected. The scaling of this section is near-perfect, thus improving the computational throughput of the Poisson solver is critical. Using GPUs yields much better performance than CPUs and even KNL processors, as seen in the experiments reported previously. These benchmarks were reproduced on other, larger datasets, with up to 800 KNL nodes on the Marconi system at CINECA, confirming the results previously obtained and the overall behavior of the application. Once the AiiDA plugin is released, BigDFT will be able to provide simple and reproducible notebooks to perform benchmarking tasks and analysis on any platform it is installed on. This will allow performance comparison, bottleneck inspection and optimization on any new platform with very little effort.


4.5 CP2K

CP2K is a program to perform simulations of solid state, liquid, molecular and biological systems. It has a rich set of functionalities which allows one to run different kinds of DFT and post-DFT simulations. CP2K relies on the performance of external parallel linear algebra libraries, such as ScaLAPACK and DBCSR (a library to perform sparse matrix-matrix multiplications). The aim of this work is to isolate and benchmark performance-critical parts of the CP2K code relevant for the scientific cases (Tab. 10, 11). For the scientific case of Tab. 12, the task is to create a baseline plane-wave simulation with the SIRIUS back end (already fully GPU accelerated) and track the performance improvements for the duration of the project.

4.5.1 RPA calculations with CP2K

The evaluation of the independent particle response function χKS(r, r', ω) for each frequency point ω is the most expensive part of RPA calculations, accounting for up to 90% of the run time. Algorithmically this is expressed as a sequence of matrix-matrix multiplications, where the matrices represent the projections of the Kohn-Sham electron-hole pairs ψ†c(r)ψb(r) onto the auxiliary basis functions which are used to expand χKS(r, r', ω) in the spatial domain. The projection matrices are rectangular, with a large imbalance between the number of rows (corresponding to the number of electron-hole pairs) and the number of columns (corresponding to the number of auxiliary basis functions). For the RPA test with 128 water molecules these numbers are 17408 auxiliary basis functions and 3473408 electron-hole pairs. In CP2K the parallel matrix-matrix multiplication is done with ScaLAPACK, which has a highly volatile performance depending on internal parameters such as the block sizes of the 2D block-cyclic distribution or the BLACS grid dimensions. A much more stable performance can be obtained with the communication-avoiding algorithm for matrix multiplications implemented in the COSMA library [14]. Work is in progress to implement a PDGEMM wrapper for COSMA and interface it with the CP2K code. In Tab. 23 we show the preliminary results for the matrix-matrix multiplication benchmark with the dimensions corresponding to the 128 water molecules test case.
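To put the problem size in perspective, the dominant product for the 128-water case (dimensions from Tab. 23) costs about 2·m·n·k floating-point operations; a two-line estimate:

# Flop count of the dominant RPA matrix product for the 128-water case
# (m, n, k taken from Tab. 23): C(m x n) = A(m x k) * B(k x n) ~ 2*m*n*k flops.
m = n = 17408
k = 3473408
print(f"{2.0 * m * n * k / 1e15:.1f} PFlop per product")   # ~2.1 PFlop
# dividing by the measured times in Tab. 23 gives the sustained throughput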


Water molecules: 64 - Matrix sizes (m,n,k): 8704, 8704, 868352
                                            ScaLAPACK (CPU)   COSMA (CPU)   COSMA (GPU)
Block sizes: 3392x2176,  MPI grid: 128x2          1.92s           1.95s         1.65s
Block sizes: 32x32,      MPI grid: 128x2          4.16s           2.2s          1.95s
Block sizes: 32x32,      MPI grid: 16x16         13.83s           2.1s          1.8s

Water molecules: 128 - Matrix sizes (m,n,k): 17408, 17408, 3473408
                                            ScaLAPACK (CPU)   COSMA (CPU)   COSMA (GPU)
Block sizes: 13568x4352, MPI grid: 128x2         21.5s          22.95s        15.1s
Block sizes: 32x32,      MPI grid: 128x2         31.73s         26.75s        17.84s
Block sizes: 32x32,      MPI grid: 16x16        116.72s         25.70s        17.55s

Table 23: Benchmark of the parallel matrix-matrix multiplication which accounts for ∼90% of the RPA execution time. The cases of 64 and 128 water molecules are considered. The CPU runs were performed on multicore nodes of Piz Daint containing 36 CPU cores (2× 18-core Intel Broadwell). GPU runs were performed on the hybrid nodes of Piz Daint containing a 12-core Intel Haswell + an NVIDIA P100 card. The COSMA library is much less sensitive to the MPI grid dimensions and block sizes; this is because COSMA works with its own matrix storage layout, to which the input matrices are converted.

4.5.2 Linear scaling calculations

In the linear scaling regime the costly diagonalisation of the Hamiltonian is avoided and replaced by the evaluation of the density matrix using an expansion of the Fermi operator. This leads to a sequence of sparse matrix-matrix multiplications, which becomes the bottleneck in this type of calculations. In CP2K the sparse matrix multiplications are done using the DBCSR library [15] (formerly a part of the CP2K code base, now a standalone reusable library). To get optimal performance DBCSR uses auto-tuning, i.e. for a given CP2K basis set it optimizes several parameters of the matrix-matrix GPU multiplication kernels for all possible m, n, k triplets arising from this particular basis.

Although auto-tuning gives the best performance for the selected Gaussian basis, it has a significant drawback: for unknown m, n, k triplets for which no auto-tuning was performed (i.e. when a user picks a new basis set), the GPU back end is not generated and the CPU back end is called instead. This results in a significant drop in performance if the user picks a basis for which DBCSR was not auto-tuned. This is also a problem for HPC centers who maintain the CP2K installation: there is no way to know for which basis they need to auto-tune and compile CP2K+DBCSR. To solve this problem we have been working on the automatic determination of the optimal GPU matrix-matrix multiplication kernel parameters for arbitrary m, n, k triplets using a machine learning technique.


Figure 24: Using machine learning in combination with JIT allows new GPU kernels with optimized parameter sets to be added to the library without the additional cost of auto-tuning. This graph shows the comparison of the performance of the new GPU kernels added to the library thanks to JIT-ing with the performance of the fallback CPU kernels that would be used without these developments. Speedups easily exceed 10x for larger block sizes.

The model of the GPU kernel parameters is initially fit on a training dataset and is then used to generate GPU kernels for any m, n, k triplet with inferred parameters. These kernels are then compiled on the fly at the first invocation of the library using CUDA's Just-in-Time (JIT) compilation capabilities. The preliminary results are shown in Fig. 24. The new version of the DBCSR library with JIT capabilities was tested on a linear scaling DFT run for ≈ 7000 water molecules on 256 nodes of Piz Daint, using an accurate, molecularly optimized TZVP basis. The full application performance was improved from 1150 sec. to 540 sec. in the time-to-solution metric (>2× speedup).
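The predict-then-JIT idea can be sketched as follows; the parameter names, the tuned table, and the nearest-neighbour "model" below are all placeholders standing in for DBCSR's real fitted performance model and kernel generator.

# Toy illustration of inferring GPU kernel parameters for an unseen (m, n, k)
# triplet from previously tuned triplets, before JIT-compiling the kernel.
tuned = {                      # (m, n, k) -> tuned launch parameters (made up)
    (23, 23, 23): {"tile_m": 2, "tile_n": 2, "threads": 96},
    (32, 32, 32): {"tile_m": 4, "tile_n": 4, "threads": 128},
    (64, 64, 64): {"tile_m": 8, "tile_n": 8, "threads": 256},
}

def predict_params(m, n, k):
    """Nearest-neighbour stand-in for the fitted performance model."""
    key = min(tuned, key=lambda t: sum((a - b) ** 2 for a, b in zip(t, (m, n, k))))
    return tuned[key]

params = predict_params(45, 45, 45)    # unseen triplet -> inferred parameters
print(params)
# a kernel source string parametrised by `params` would then be JIT-compiled
# at first use (e.g. through NVRTC), instead of falling back to the CPU path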

4.5.3 Plane-wave pseudo potential calculations

As part of the ongoing project activities, the CP2K code was interfaced with the SIRIUS library [16], which implements the plane-wave pseudopotential and augmented plane wave full-potential methods of DFT. The primary objective of this development is to provide an easy validation and verification mechanism for the Gaussian basis sets of CP2K by enabling a reference plane-wave ground state calculation inside CP2K using the same input file and the same pseudopotential.


Figure 25: Distribution of time in the CP2K+SIRIUS plane-wave pseudopotential ground state calculation of a C60 molecule in a box. Top: diagonalization of the Hamiltonian and generation of the density and potential are the main time consumers. Bottom: application of the Kohn-Sham Hamiltonian is the most expensive part of the iterative diagonalization process. The FFT transformations of the wave-functions, which are done during the application of the Hamiltonian and the charge density summation, are the bottleneck for this type of calculations.

As a benchmark we picked a single C60 molecule in a large box and ran a DFT ground state calculation on 4-16 nodes to establish the baseline performance values. The results are collected in Tab. 24. The SIRIUS library measures the execution time of its individual components and provides a comprehensive timing report at the end of the run. This allows for a quick and lightweight analysis of the library performance, as shown, for example, in Fig. 25.


Number of nodes    Time to solution
       4               548.7s
       9               305.4s
      16               213.1s

Table 24: Time to solution for the plane wave pseudopotential DFT ground state of a C60 molecule computed with CP2K+SIRIUS. The runs were performed on the hybrid nodes of Piz Daint containing a 12-core Intel Haswell + an NVIDIA P100 card.


4.6 SIESTA

During the reporting period we have profiled the performance of SIESTA. We have used the powerful performance analysis tools developed at BSC³, which afford a very detailed understanding of the code's behavior while affecting its execution only minimally⁴. The profiling work was done on MareNostrum IV⁵ at the Barcelona Supercomputing Center. Each node of this machine contains 48 Intel Platinum 8160 @ 2.1 GHz CPU cores.

In Fig. 26 we show the timings and parallel efficiency (with the efficiency at 96 cores defined as 1.0) for a specific large use case, namely a short DNA strand containing about 11,000 atoms. We performed two geometry iterations, each with five iterations of the main SCF loop. As can be seen, the overall scaling is not good, with an efficiency of only 30% for 2300 cores. The main culprits for the performance degradation, taking into account CPU usage, are the initialisation stages (state_init and Setup_H0), with efficiencies of 7% and 3%, respectively, which take more than half of the overall CPU time when running on 2300 cores. The routine state_init is called before processing each new geometry, and Setup_H0 is called at the beginning of each SCF loop. Even though these routines are only called once⁶, their scaling is bad enough to considerably affect the overall performance, and they are thus bottlenecks that should be addressed.

We turned first to the routine nlefsm, part of Setup_H0. This routine computes the matrix elements of the non-local part of the pseudopotential, involving products of overlaps between Kleinman-Bylander (KB) projectors and orbitals. Orbitals are fully distributed in SIESTA, but KB projectors are handled implicitly by all processes via their positions, and in fact some of the same overlaps are computed several times by different processes. This is unavoidable and leads to reduced scaling. In addition, this routine, in its first call, was computing the interpolation tables for the overlaps as they were requested (further calls just use the tables). Hence, a reduced scaling was coupled with heavy CPU use in the initialization call, leading to a severe bottleneck. Rather than just marking it as such and including it in the list of actions to be taken later on, we felt it would be important to fix it immediately, since its presence could mask other possible bottlenecks, and it would also unnecessarily inflate the CPU time of further tests. The solution was actually quite straightforward: to pre-compute the needed interpolation tables in a more favorable setting (in fact, in a fully distributed way: routine initMatel). The initial call to nlefsm then just uses the tables and takes very little CPU time.

For the same reasons of clarity and economy, we also fixed the bottleneck in the dhscf_init routine, which sets up the data structures in the real-space grid used for computing the charge density and the matrix elements of the effective potential. Here the problem was that the pre-scheduling of the needed communications, implemented some years ago using graph-coloring techniques, became very costly for large numbers of processors. Asynchronous communications have been used instead. Finally, a number of

³ https://www.bsc.es/discover-bsc/organisation/scientific-structure/performance-tools
⁴ It often happens that a detailed profiling of a code leads to a considerable slowdown, which might affect the code execution itself and thus bias the profiling. With the BSC tools this is not the case.
⁵ https://www.bsc.es/marenostrum/marenostrum
⁶ This is true for one single SCF calculation. In the case of a geometry optimization or Molecular Dynamics, the initialisation routines are called more than once.


Figure 26: Timings and parallel efficiency of the reference version of the code, showing in red sections of reduced scalability.

smaller changes, mostly involving the distribution of work that previously was done serially in naefs, dnaefs, and kinefsm, complete the preliminary work done up to now.

Fig. 27 shows the new timings and parallel efficiency. While several red sections of reduced scalability are still shown, their associated CPU times are now quite small. Overall, the optimisation achieved in this preliminary stage of the analysis has considerably reduced the initialization effort in the program, something which is intrinsically useful but also very relevant for the operation of SIESTA as a "callable quantum engine", as foreseen in the WP1 work-package of MAX. A graphical representation of this can be seen in the trace in Fig. 28: the top panel shows the original trace, with an oversized initialization section followed by two groups of five SCF steps. In the bottom panel the initialization section has been reduced almost to zero on the scale of the whole calculation. In Fig. 29 we show a zoom of the initialization part, where the big improvements are even more noticeable. As can be seen, some routines have virtually vanished in the optimised version of the code. Some more tweaks to a few routines are still possible, but they are deferred to the next stage.

What the preliminary analysis and early fixes have uncovered is that the overall scalability of the program in this profiling run is mostly limited by the solver stage, which takes almost 90% of the CPU time but shows an efficiency of a bit less than 50% when running with 2100 cores. This is actually not a bad relative result: matrix diagonalisation, as implemented in distributed form in the ScaLAPACK library, is known to exhibit a severe performance degradation for large numbers of processes. Our calculations are done with the PEXSI solver, which circumvents diagonalisation by constructing the density matrix directly using combined pole-expansion and selected-inversion algorithms. It exploits the sparsity of the H and S matrices in SIESTA, resulting in an improved complexity scaling as a function of system size N, namely N^(0.5+D/2), where D is the dimensionality of the system. As can be seen in Ref. [17], the PEXSI solver is dramatically more efficient and scalable than ScaLAPACK for this kind of system.

Most of the remaining 10% of the CPU time is taken by the routines called to compute the Hamiltonian during the SCF cycle. These exhibit a linear scaling with system size, and their parallelisation is quite efficient: around 90% for about 1000 processes, and around 70% for 2100 MPI processes. This strong dependence on the solver is demonstrated in Fig. 30, where we compare the scaling of the initial version of the code with the optimized one. The big improvements are clearly visible. Moreover, we see that the scaling of the optimized version is virtually the same as the scaling of the solver part alone, demonstrating that this section represents the big remaining bottleneck.
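The complexity advantage quoted above can be made concrete with a two-line estimate comparing cubic-scaling diagonalisation with PEXSI's N^(0.5+D/2) law, taking D = 1 for a quasi-one-dimensional system such as the DNA strand considered here; the reference size N0 below is arbitrary and only sets the normalisation.

# Relative cost growth with system size N: dense diagonalisation ~ N^3 versus
# PEXSI's N^(0.5 + D/2); for a quasi-1D system (D = 1) the exponent is 1.0.
def relative_cost(N, N0, exponent):
    return (N / N0) ** exponent

N0, N = 1000, 11000          # e.g. growing from 1,000 to ~11,000 atoms
print("diagonalisation (N^3):", relative_cost(N, N0, 3.0))          # ~1331x
print("PEXSI, D = 1 (N^1.0): ", relative_cost(N, N0, 0.5 + 1 / 2))  # ~11x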


Figure 27: Timings and parallel efficiency of the version of the code with improved initialization routines.


Figure 28: Traces of parallel execution of the code: top panel, reference version; bottom panel: version with fixes to the initialization bottlenecks.

Figure 29: Traces showing a zoom of the initialization part: top panel, reference version; bottom panel: improved version.


Figure 30: Scaling of the initial and the optimized code, together with the scaling of the solver part alone. The data is taken from Fig. 26 and Fig. 27.


5 Structured plan of forward activities

5.1 QUANTUM ESPRESSO

Optimise multi-threading and accelerator offloading. A general optimization of the usage of multi-threading and of the offloading of massively parallel parts to accelerators will improve the performance of those use cases in which each diagonalisation pool can be contained within a single node. In these cases the efficient offloading to accelerators will make FFT and parallel linear algebra more efficient, avoiding communication bottlenecks. Multi-threading will be used to share the work done on bands (those contained within each band group) and projectors. This optimization will be crucial for the sustainability of high-throughput computations on exascale machines and will also allow the code to be used efficiently on the new generations of workstations equipped with accelerator devices.

Improve the scalability of iterative eigensolvers. Profiling of very large use cases demonstrates that the Davidson iterative diagonalisation turns out to be a major bottleneck. Davidson operates on the whole block of eigenvectors at the same time. With a moderate number of eigenstates this is in fact an advantage with respect to methods where eigenstates must be accessed sequentially, like the CG diagonalisation, but it requires an exact diagonalisation of N_D × N_D matrices with N_D ≥ 2N_B, where N_B is the number of bands. For this reason we are currently considering alternative methods which allow dealing with smaller blocks of eigenstates. Methods currently under scrutiny are RMM-DIIS [18], ParO [19] and PPCG [20].

Rationalization of I/O. The profiling of the large-size use cases indicates that the I/O policy currently adopted in QUANTUM ESPRESSO, both when reading and when writing, may become inadequate when the number of MPI tasks is very large. Planned actions include:

• Read input, restart (XML), and pseudopotential files on a single MPI task and broadcast the data to all other tasks (see the sketch after this list);

• Examine the possibility of saving large binary records (wavefunctions and charge density) in single precision;

• Consider using parallel HDF5, reorganizing the way distributed data (wavefunctions and charge density) are written in order to minimise the number of tasks and the communication load needed to re-distribute the read data across tasks, while still keeping portability with respect to the number of MPI tasks.
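A minimal sketch of the first item, using mpi4py; the file name and flat float64 layout are placeholders for illustration, not QUANTUM ESPRESSO's actual file formats or routines.

```python
# "Read on one task, broadcast to all" pattern for the first bullet above.
# The file name and data layout are placeholders, not QE's formats.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = np.fromfile("pseudo.dat", dtype=np.float64)   # only rank 0 touches the file system
    n = np.array([data.size], dtype=np.int64)
else:
    data = None
    n = np.empty(1, dtype=np.int64)

comm.Bcast(n, root=0)                                    # share the size first ...
if rank != 0:
    data = np.empty(n[0], dtype=np.float64)
comm.Bcast(data, root=0)                                 # ... then the payload itself
```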

Improve scalability and performance of the CP code. The Car-Parrinello (CP) code, for large systems with thousands of atoms (see Fig. 7), displays a bottleneck in the orthogonalisation of the electronic wave functions, which was not unexpected given the size of the matrices involved (more than 10000 × 10000). As anticipated in the previous chapter, by analysing the code we traced the cause back to a sub-optimal parallelisation of matrix multiplications with doubly distributed matrices: over the linear-algebra (ortho) group and over the band group. Moreover, we identified a few other places where atoms (usually fewer

than one thousand in number) were replicated across processors (with few atoms this is certainly the best strategy, since atoms are not the most relevant data structure in CP simulations), with the computation of the ultra-soft pseudo-potential projectors replicated as well. For large systems distributed over tens of thousands of processors, even a small fraction of the computation left undistributed will start to dominate (see the illustration after the list below). Concerning the CP code, for the next months of the MAX project, within the activity of WP1, we plan to:

• improve the parallelisation of the ortho kernel;

• optimise the computation of the ultra-soft pseudo-potential projectors.
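The impact of even a small replicated (effectively serial) fraction of the work can be quantified with a simple Amdahl-type estimate; the 1% fraction below is an arbitrary illustrative value, not a measurement on cp.x.

```python
# Amdahl-style estimate of how a small replicated fraction of the work comes to
# dominate at large MPI task counts. The 1% serial fraction is illustrative only.
serial_fraction = 0.01
for p in (100, 1_000, 10_000, 50_000):
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
    replicated_share = serial_fraction * speedup          # fraction of the runtime
    print(f"{p:>6} tasks: speedup ~ {speedup:6.1f}x, "
          f"replicated work ~ {replicated_share:5.1%} of the runtime")
```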

5.2 Yambo

I/O efficiency, including parallel I/O. While collecting the benchmark data shown in this report, mostly obtained on the KNL partition of Marconi@CINECA, we have observed a noticeable degradation of the I/O performance (and in turn of the wall time) when HDF5-based parallel I/O was in use. Yambo has recently implemented this feature and we still need to investigate extensively the reasons for the performance loss. More generally, the overall efficiency of I/O will be addressed and analysed explicitly.

Distributed data structures. Strictly related to the previous issue is the definition of the parallel layout for distributed data. This is crucial for Yambo, which deals with rather large datasets (consider, e.g., the spatially and frequency-dependent response function χ0(G, G′, ω), where the number of G-vectors can be of the order of tens of thousands or more, or the Bethe-Salpeter kernel represented on the space of transitions). Several options are currently available in Yambo (stripe distributions for full or triangular matrices, or ScaLAPACK/BLACS grids), mostly depending on the required processing of the data (including I/O dumping). An accurate evaluation of the efficiency and scalability of the conversion from one distribution to another is needed.
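As an order-of-magnitude illustration of why the layout matters, the following estimate (with made-up but plausible sizes, not taken from a specific Yambo run) computes the memory of a full χ0(G, G′, ω) block and of one row stripe per MPI task.

```python
# Back-of-the-envelope memory estimate for chi0(G, G', w) and for a row-stripe
# distribution over MPI tasks. Sizes are illustrative, not from a specific run.
n_G = 20_000                 # reciprocal-lattice vectors (tens of thousands or more)
n_w = 100                    # frequency points
bytes_per_cplx = 16          # double-precision complex

total = n_G * n_G * n_w * bytes_per_cplx
print(f"full chi0 block: {total / 1e12:.2f} TB")                          # ~0.64 TB

n_tasks = 512
rows_per_task = -(-n_G // n_tasks)                                         # ceiling division
per_task = rows_per_task * n_G * n_w * bytes_per_cplx
print(f"row stripe per task ({n_tasks} tasks): {per_task / 1e9:.2f} GB")   # ~1.3 GB
```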

MPI load imbalance. The many-body methods implemented in Yambo are usually simple to parallelise in terms of computation (in view of the large amount of computation to be performed), but they tend to suffer from load imbalance. At the moment the scheduling of MPI tasks is static, while that of the OpenMP threads, most of the time working at the same level, is dynamic. If needed, more advanced strategies to address load balancing (e.g. a master-slave approach) will be considered and evaluated.
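A generic sketch of such a dynamic master-slave (master-worker) distribution of work items over MPI tasks is given below; this is a standard pattern shown with mpi4py, not Yambo code, and the notion of a "work item" (e.g. a block of transitions or a q-point) is left abstract.

```python
# Generic master-worker (dynamic) scheduling of work items over MPI tasks,
# as an alternative to static scheduling. Standard pattern, not Yambo code.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TAG_WORK, TAG_STOP = 1, 2
n_items = 1000                                   # abstract work items

if rank == 0:                                    # master: hand out items on demand
    status = MPI.Status()
    next_item, active_workers = 0, size - 1
    while active_workers > 0:
        comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        if next_item < n_items:
            comm.send(next_item, dest=status.Get_source(), tag=TAG_WORK)
            next_item += 1
        else:
            comm.send(None, dest=status.Get_source(), tag=TAG_STOP)
            active_workers -= 1
else:                                            # workers: request items until told to stop
    status = MPI.Status()
    while True:
        comm.send(None, dest=0)                  # "I am idle"
        item = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        # ... compute the contribution of `item` here ...
```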

Alternative approaches for portability on accelerated machines. So far, the portability of Yambo on GPU-accelerated machines has been based on the use of CUDA Fortran. While numerically very efficient and easy to adopt, this approach is limited to NVIDIA GPUs and relies on a proprietary extension of Fortran (currently implemented only by the PGI compiler). In a scenario where multiple accelerator hardware platforms and software stacks are going to appear on the market (besides NVIDIA and AMD, accelerated hardware is also expected from Intel), we plan to implement and evaluate alternative porting strategies, among them directive-based porting (OpenACC and OpenMP 5.0) and/or explicit CUDA support.


Memory footprint on GPU-accelerated machines. Present results for the numerical and parallel performance of Yambo ported to GPUs are very encouraging. Nevertheless, one aspect that may become critical for Yambo use cases is the control of the memory footprint, especially in situations where one MPI task per GPU is used, i.e. when the possibilities for data distribution are limited. A detailed investigation of Yambo memory usage on GPUs, and more generally on accelerators, is currently of utmost importance.

5.3 FLEUR

Scalability of the code. After the fundamental data restructuring of the matrix setup part, the detailed investigation of performance and scalability should be repeated. We do not expect load imbalance to appear in the matrix setup part, but this assumption needs to be verified. The performance of this part will be studied in detail; in particular, algorithmically different constituents (e.g. the spherical and non-spherical matrix setup) will be investigated separately in order to identify promising optimization strategies. We also plan to improve the load balance, and hence the scalability, of the new charge part. An additional issue is the tendency of every newer supercomputer architecture to have a lower machine balance (i.e. the ratio of memory bandwidth to peak performance) than the previous one (e.g. the machine balance of the CLAIX 2018 cluster, Intel Skylake, is less than half that of CLAIX 2016, Intel Broadwell), which makes codes even more memory-bound.
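The machine-balance argument can be made concrete with rough per-node figures; the numbers below are illustrative estimates for Broadwell- and Skylake-class dual-socket nodes, not measurements on CLAIX.

```python
# Machine balance = memory bandwidth / peak floating-point performance.
# The per-node figures below are rough, illustrative estimates only.
nodes = {
    # name: (memory bandwidth [GB/s], peak DP performance [GFlop/s])
    "2-socket Broadwell (CLAIX 2016 class)": (137.0, 800.0),
    "2-socket Skylake   (CLAIX 2018 class)": (256.0, 3000.0),
}
for name, (bw, peak) in nodes.items():
    print(f"{name}: balance ~ {bw / peak:.3f} bytes/flop")
# The newer node has roughly half the balance of the older one,
# i.e. codes become more memory-bound.
```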

Possible communication bottlenecks with large benchmarks. The STO system, our biggest benchmark so far, has a special geometry such that several k-points need to be calculated. Although nearly ideal scaling over many k-points has often been observed with smaller unit cells, it should be verified for simulations with such a large unit cell. The need to communicate the eigenvectors of several large dense complex matrices (340k × 340k in this case, about 2 TB each) could lead to a bottleneck which would then need to be eliminated.
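The quoted matrix size is easily checked, assuming double-precision complex storage (16 bytes per element):

```python
# Size of one dense complex double-precision matrix of dimension ~340k,
# confirming the ~2 TB figure quoted above.
n = 340_000
bytes_per_cplx = 16
size_tb = n * n * bytes_per_cplx / 1e12
print(f"{size_tb:.2f} TB per matrix")     # ~1.85 TB
```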

Performance fluctuations. While performing the calculations with the large (>1000 atoms) benchmarks, we observed large fluctuations of the performance of the computationally intensive code parts, which led to increases of the wall-clock execution time by up to a factor of three. Performance-counter measurements have shown that in these cases the CPU frequencies of some nodes had dropped to half of their usual values. We never observed such behaviour for the smaller benchmarks; it should be investigated further.

Offloading onto GPU. Matrix setup and diagonalisation are the two most computationally intensive parts of the code and are independent for different k-points. This, and the fact that only 15% of the eigenvectors are needed, makes them ideal candidates for offloading onto an accelerator.

5.4 BigDFT

AiiDA plugin. To perform these benchmarks, an AiiDA plugin has been developed for BigDFT, allowing for a seamless integration of AiiDA into BigDFT's notebook workflows. This work will be stabilized and released in the coming weeks.
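For illustration, submitting a calculation through an AiiDA calculation plugin typically looks like the sketch below; the entry-point name 'bigdft', the code label, the structure and the input dictionary are placeholders, not the actual interface of the plugin described here.

```python
# Hedged sketch of submitting a calculation through an AiiDA plugin. The entry
# point 'bigdft', the code label and the parameters are placeholders only.
from aiida import load_profile
from aiida.engine import submit
from aiida.orm import Dict, StructureData, load_code
from aiida.plugins import CalculationFactory

load_profile()

BigDFTCalculation = CalculationFactory('bigdft')          # hypothetical entry point

structure = StructureData(cell=[[5.47, 0, 0], [0, 5.47, 0], [0, 0, 5.47]])
structure.append_atom(position=(0.0, 0.0, 0.0), symbols='U')
structure.append_atom(position=(1.37, 1.37, 1.37), symbols='O')

builder = BigDFTCalculation.get_builder()
builder.code = load_code('bigdft@marconi')                # hypothetical code label
builder.structure = structure
builder.parameters = Dict(dict={'dft': {'hgrids': 0.45}}) # hypothetical input dict

node = submit(builder)
print(f"Submitted calculation with PK {node.pk}")
```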


Performance prediction. For large-scale calculations, the number of node hours needed to perform a benchmark can easily become substantial. Users and, more importantly, developers may thus need a large amount of computational resources to identify bottlenecks and to verify that the proposed solutions provide a relevant improvement in performance. To obtain good performance on such use cases, it is common to try several algorithmic alternatives, to relax synchronizations as much as possible so that the application can use resources opportunistically, or to rely on automatic scheduling and load-balancing techniques. The efficiency of such approaches is difficult to assess and depends strongly on the target system. Furthermore, the corresponding implementations are quite complex and thus quite error-prone. It is not uncommon for dynamic and highly optimized approaches to yield efficient implementations that suffer from rare non-deterministic deadlocks or fail-stop errors which are extremely difficult to narrow down and debug. Another argument against direct experiments is that they provide only limited experimental control, hindering the reproducibility of the experiments.

Hence, both the performance and the correctness of these systems must be considered. This calls for very different techniques: most performance evaluations reason about representative or average executions, while correctness must be evaluated over all possible executions. Performance assessment through pure theoretical analysis often mandates stringent and ultimately unrealistic assumptions about the underlying platform and/or the application, which is why most research results come from empirical experiments. Virtualisation and emulation techniques can be used to run real applications on virtual machines (similarly to in-vitro experiments elsewhere). This approach greatly increases the ability to control experiments, but the process is technically very demanding, leading to increased costs in terms of time and labor. Also, the systems built to this end are typically so complex that it is difficult to assess their correctness.

Performance simulation is an appealing alternative for studying such systems. It consists in predicting the system's evolution through numerical and algorithmic models. The SimGrid framework [21] is a notable tool for performing such experiments. This versatile scientific instrument has been used for simulation studies in Grid Computing, Cloud Computing, HPC, Volunteer Computing and P2P systems. It has also been shown to be both more realistic and more scalable than its major competitors, thus lowering the boundaries between research domains.

BigDFT has already been partly ported on top of SimGrid, and very promising simulation results have been obtained [22]. In the future, we plan to include in our benchmark calculations the simulation of the performance of the same benchmark datasets that are considered in the MAX repository. If this technique proves successful, it would represent not a simple technical improvement but a radical epistemic shift: the developer would be able to test and validate the performance of a run at the exascale level without needing a huge amount of computational resources for the validation of the approach.

Libconv library. The main bottleneck identified in BigDFT is computational cost, and for non-hybrid functionals this means mainly the convolutions. Libconv is an auto-tuned, BOAST-based convolution library meant to replace most of the BigDFT convolution kernels and to provide optimized versions for each architecture and parameter value. During generation of the library, each kernel is generated for each set of parameters dozens of times with various optimizations applied, and benchmarking is performed on each one. The best

kernels are then selected and used in the resulting library, with brokers generated to automatically select the best version for every call. A generic interface has been designed to facilitate integration of the library in DFT codes, with BigDFT as the first user.
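The selection step can be illustrated with a toy auto-tuning loop; the variants below are plain Python stand-ins for the compiled kernels that BOAST/Libconv actually generate and benchmark.

```python
# Toy auto-tuning loop: time several variants of a kernel for a given problem
# size and keep the fastest. Stand-in for the compiled kernels that Libconv
# generates and benchmarks via BOAST.
import time
import numpy as np

def variant_naive(x, h):
    return np.convolve(x, h, mode='same')

def variant_fft(x, h):
    n = len(x) + len(h) - 1
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
    start = (len(h) - 1) // 2
    return y[start:start + len(x)]

def best_variant(variants, x, h, repeats=5):
    timings = {}
    for name, fn in variants.items():
        t0 = time.perf_counter()
        for _ in range(repeats):
            fn(x, h)
        timings[name] = (time.perf_counter() - t0) / repeats
    return min(timings, key=timings.get), timings

x, h = np.random.rand(2**16), np.random.rand(16)
winner, timings = best_variant({'naive': variant_naive, 'fft': variant_fft}, x, h)
print("selected kernel:", winner, timings)
```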

5.5 CP2K

Interface with the COSMA library. Preliminary results of the communication-optimal algorithm for matrix-matrix multiplication implemented in the COSMA library show good performance in standalone PDGEMM tests. The work will continue in the following directions:

• Use the PDGEMM wrapper for COSMA inside CP2K and test the full application in the production runs

• Port COSMA to AMD/ROCm

• Performance optimisation of internal COSMA functions related to the parallel and local data remapping

• Interface COSMA with SIRIUS library

DBCSR library. The DBCSR library already has highly optimised Intel CPU and NVIDIA GPU back ends. The work will continue with the creation of back ends for AMD/ROCm and for forthcoming EuroHPC architectures. We are also investigating the possibility of using transfer learning to generate optimal GPU kernel parameters for new architectures without running the auto-tuning and machine learning on them, relying only on the knowledge of the current GPU architectures for which DBCSR is already optimised.

Interface with the SIRIUS library. SIRIUS has already been ported to the NVIDIA GPU architecture and can take full advantage of GPU nodes. The work will continue with performance improvements of the SIRIUS library; in particular, its internal FFT3D implementation will be replaced by the newly developed SpFFT library. Another action that we plan to accomplish in this project is a port of SIRIUS to the AMD/ROCm architecture.

5.6 SIESTA

From our analysis of this scientific use case, it follows that the most relevant bottlenecks of the “core” part of SIESTA (the part devoted to setting up the Hamiltonian) have been resolved at the pure-MPI level. Further improvements in efficiency should come from hybrid MPI/OpenMP parallelization. SIESTA contains some OpenMP parallelism, but it is far from complete. The reason is the presence of heavy data indirection in the code, resulting from the indexing of the sparse data structures that are essential to the operation of the code. Rather than working at the loop level, it will be advantageous to explore task-based parallelism, as supported by newer versions of OpenMP. The other major area of focus for further improvement of the performance and scalability of SIESTA is the solvers, which take the Hamiltonian and overlap matrices and compute the density matrix, which encapsulates the information about the electronic structure. In fact, all the solvers currently in use in SIESTA are provided by external libraries. Ideally, it is the developers

of these libraries (and the programmers working on their optimization for specific architectures) who should be concerned with portable performance. Client codes like SIESTA, respecting the proper interfaces, should be able to simply link to the appropriate version of the library. For example, the PEXSI library is being continuously improved: a new scheme needing fewer poles was recently introduced; a new level of parallelism in the determination of the chemical potential is now available; and a new symbolic-factorization code, with better parallel-scaling properties, is in the process of being made the default. SIESTA will benefit directly from these developments.

Some of the solver libraries used in SIESTA fall directly under the purview of MAX: this is the case of the CheSS and O(N)-OMM linear-scaling libraries. They can only be used for systems with a gap separating occupied and unoccupied states, but their favorable scaling makes them quite important. For CheSS, it is planned within WP3 to improve SIESTA's efficiency through the on-the-fly generation of a system-optimized basis set of lower cardinality and hence narrower eigenvalue spectrum, thus reducing the number of polynomial terms needed in the expansion of the Fermi operator. For the O(N)-OMM solver (which was the original solver in SIESTA), the optimization work (also scheduled within WP3) is to leverage the increased performance of the DBCSR sparse-matrix multiplication library, which has been extracted from CP2K and in fact belongs to the MAX family of modules.

For the other solvers in use, our task will be to continuously monitor the performance of existing and upcoming versions of the libraries in different regimes (size of the system, number of eigenvectors sought, if relevant) and, crucially, on different architectures. This is a major effort. As an example, we have started to evaluate the use of GPUs at the solver stage, relying on libraries providing GPU support for the linear-algebra operations used. We ran the tests on the CTE-Power block of MareNostrum at BSC, whose nodes contain 40 IBM Power9 cores and 4 NVIDIA Volta GPUs. We first tested the ESSL-CUDA library, which is not portable to non-IBM systems but is readily available and offers complete code compatibility, as well as “transparent” CPU and GPU interoperability. The initial results of the benchmarks do not show dramatic speedups, and suggest that more control over data movement might be necessary. The next stage involves using the MAGMA library, which offers more control as well as better portability. This kind of profiling work for hybrid architectures and programming models will be an important part of our future actions in WP4. For the reasons of data indirection mentioned above, it is likely that most of the performance gains will come from the solver stage, and less so from the routines involved in the setup of the Hamiltonian during the SCF loop.


6 Conclusion and lessons learned

We decided to base our analysis on scientific cases of interest, rather than directly on application requirements, since the latter would have implied a deeper understanding of the constraints in hardware evolution, algorithms, languages and paradigms, which are still a matter of debate even in the HPC field. We therefore decided to focus the scope of our performance analysis on the identification of the bottlenecks that would limit a selected class of scientific cases. Such use cases have been chosen as representative of prototype calculations that may exploit future exascale systems. This approach has proven pragmatic and effective in focusing and directing our effort on code refactoring and optimisation.

As a byproduct, the identification, description and recording of several effective scientific cases is a first important result of this work, which has built an extremely valuable repository for the users' community. Along with the use cases, we now also have a baseline version of the MAX codes and a quite significant database of performance and profiling results, all stored in the MAX GitLab repository. All these data will be used by WP4, as well as by all other MAX WPs, to assess and validate the WP outcomes and progress. It is also remarkable that, even though the use of the AiiDA framework to help collect the benchmark results in a semi-automatic way was foreseen only for later activities, many of the profiling runs were already performed using AiiDA plug-ins. We think that automating the collection of these data will greatly improve the productivity of HPC experts and developers.

Nevertheless, the lack of resources to run large benchmark, profiling and debugging campaigns clearly emerged as a very critical issue for European HPC as a whole. A single run that lasts, say, 1 hour on 100K compute cores burns 100K core-hours, and to stay at the forefront of code development and exploit next-generation machines we need to run many of these large-scale test runs. We think, and this is a request to EuroHPC that we will make sure to present in the proper discussion groups, that at least 5% (10% would be ideal) of the resources of all EuroHPC systems should be dedicated to the above activities; otherwise the availability of European exascale software will be at risk. For this deliverable, we relied mostly on CPU time made available by HPC centres and by MAX developers from their own or national budgets. This cannot be taken for granted, and without a committed budget all these activities are at risk. This does not concern only MAX, but rather all CoEs as well as exascale software development in general.

As the main result of this deliverable we report the profiling analysis performed on all MAX codes for the selected scientific use cases, the identification of bottlenecks that should be solved or mitigated along the path towards exascale, and, even if not foreseen at first, some early activities to improve the performance of the codes. The collected database of benchmark results is now complete and will become the consolidated baseline against which we will assess and compare code refactoring in future MAX activities. Here it is important to remark, as anticipated in the executive summary, that bottlenecks and future improvements are expected to become more and more specific to different classes of use cases. Under these conditions it will be much harder for our applications to obtain single improvements that are effective across all working conditions.
MAX applications are in fact frameworks with many capabilities that trigger different internal algorithms and data distributions, which may display quite different computational patterns. Nevertheless, an improvement, except in specific cases, is expected to improve or at least not worsen

the performance of the codes under other working conditions. It may also happen that the removal of a deep bottleneck triggers a complete refactoring of the application, which will then require actions and efforts at the community level, even beyond MAX.

As a lesson learned in preparing this deliverable, we observe that when running scientific cases that push both the code and the HPC system to their limits in terms of resources and scalability (inside and outside the node), many problems may occur concerning the stability of the HPC system itself (indeed, reliability is one of the ten most important issues identified for exascale computing). The profiling tools (usually meant to work under controlled conditions on a fraction of the resources of the system) may fail to save traces and logs or to process them post mortem. Codes running under conditions never tested before (e.g. thousands of nodes) may display unexpected or previously unobserved bugs. Even if this is sometimes frustrating for the developers, it can be extremely helpful for improving the overall robustness of future European exascale systems and software stacks. We therefore propose to include MAX scientific cases among the codes and datasets used to assess EuroHPC systems, possibly already in the procurement phase.

References

[1] Giannozzi, P. et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. Journal of Physics: Condensed Matter 21, 395502 (2009).

[2] QUANTUM ESPRESSO vs VASP. URL https://www.nsc.liu.se/~pla/blog/2013/12/18/qevasp-part3/.

[3] Avvisati, G. et al. Orbital symmetry driven ferromagnetic and antiferromagnetic coupling of molecular systems. Nano Lett. 18, 2268–2273 (2018).

[4] Atambo, M. et al. Electronic and optical properties of doped TiO2 by many-body perturbation theory. Phys. Rev. Mater. 3, 045401 (2019).

[5] Denk, R. et al. Probing optical excitations in chevron-like armchair graphene nanoribbons. Nanoscale 9, 18326 (2017).

[6] Baroni, S. et al. First release (R1) of the MAX codes (2016). URL https://drive.google.com/file/d/0BzCqlvvD4LOiTGlHOVVZVlZOV0U/view.

[7] Sangalli, D. et al. Many-body perturbation theory calculations using the yambo code. J. Phys.: Condens. Matter 31, 325902 (2019).

[8] Marini, A., Hogan, C., Grüning, M. & Varsano, D. yambo: an ab initio tool for excited state calculations. Comput. Phys. Commun. 180, 1392–1403 (2009).

[9] Denk, R. et al. Exciton dominated optical response of ultra-narrow graphene nanoribbons. Nature Commun. 5, 4253 (2014).


[10] Blügel, S. & Bihlmayer, G. Full-potential linearized augmented planewave method. In Grotendorst, J., Blügel, S. & Marx, D. (eds.) Computational Nanoscience: Do It Yourself! - Lecture Notes, vol. 31, 85 (NIC Series, 2006). ISBN: 3-00-017350-1.

[11] Alekseeva, U., Michalicek, G., Wortmann, D. & Blügel, S. Hybrid parallelization and performance optimization of the FLEUR code: new possibilities for all-electron density functional theory. In Aldinucci, M. (ed.) Euro-Par 2018, LNCS 11014, 735–748 (Springer International Publishing AG, 2018).

[12] Uranium dioxide benchmark with BigDFT. URL https://gitlab.com/max-centre/benchmarks/blob/master/BigDFT/UO2/bench_marconi_uo2_3/benchmarking-UO2-3.ipynb.

[13] Pizzi, G., Cepellotti, A., Sabatini, R., Marzari, N. & Kozinsky, B. AiiDA: automated interactive infrastructure and database for computational science. Comp. Mat. Sci. 111, 218–230 (2016).

[14] COSMA library. URL https://github.com/eth-cscs/COSMA.

[15] DBCSR library. URL https://github.com/cp2k/dbcsr.

[16] SIRIUS library. URL https://github.com/electronic-structure/SIRIUS.

[17] Lin, L., García, A., Huhs, G. & Yang, C. SIESTA-PEXSI: massively parallel method for efficient and accurate ab initio materials simulation without matrix diagonalization. Journal of Physics: Condensed Matter 26, 305503 (2014).

[18] Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).

[19] Dai, X., Liu, Z., Zhang, X. & Zhou, A. A parallel orbital-updating based optimization method for electronic structure calculations (2015). arXiv:1510.07230.

[20] Vecharynski, E., Yang, C. & Pask, J. E. A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a Hermitian matrix. Journal of Computational Physics 290, 73–89 (2015).

[21] Casanova, H., Legrand, A. & Quinson, M. SimGrid: a generic framework for large-scale distributed experiments. In Tenth International Conference on Computer Modeling and Simulation (UKSim 2008), 126–131 (2008).

[22] Bédaride, P. et al. Toward better simulation of MPI applications on Ethernet/TCP networks. In Jarvis, S. A., Wright, S. A. & Hammond, S. D. (eds.) High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 158–181 (Springer International Publishing, Cham, 2014).
