WP 8 Project Presentations - 28 September 2020
Linear Algebra, Krylov-subspace methods, and multigrid solvers for the discovery of New Physics (LyNcs)
WP 8 Project Presentations - 28 September 2020
Jacob Finkenrath, CaSToRC, The Cyprus Institute
PRACE 6IP – WP8 – LyNcs – CaSToRC www.prace-ri.eu

PRACE 6IP WP8 LyNcs: People
▶ Constantia Alexandrou, CaSToRC – PI
▶ Emmanuel Agullo, Inria
▶ Simone Bacchio, CaSToRC
▶ Olivier Coulaud, Inria
▶ Jacob Finkenrath, CaSToRC
▶ Luc Giraud, Inria
▶ Kyriakos Hadjiyiannakou, CaSToRC
▶ Giannis Koutsou, CaSToRC
▶ Stéphane Lanteri, Inria
▶ Gilles Marait, Inria
▶ Michele Martone, LRZ
▶ Shuhei Yamamoto, CaSToRC

Project Objectives
LyNcs: Linear Algebra, Krylov-subspace methods, and multigrid solvers for the discovery of New Physics
Address the Exascale challenge for sparse linear systems & iterative solvers
▶ Targeting all levels of the software stack
  ● on the lowest level: sparse matrix-vector kernels: librsb
  ● on the intermediate level: iterative linear solver libraries: fabulous, DDalphaAMG, QUDA
  ● on the highest level: community codes: LyNcs API, tmLQCD, openQCD
Applications in:
▶ Fundamental Physics - lattice QCD
▶ Electrodynamics
▶ Computational Chemistry

Talk Outline
▶ Motivation
▶ LyNcs within PRACE 6IP WP8
  ● Project structure
  ● Tasks and Deliverables
▶ Overview of software developments
  ● Lyncs API - New community-driven software
  ● Fabulous - Iterative Block Krylov Solvers library
  ● librsb - a Sparse BLAS format library
  ● DDalphaAMG - Multigrid library for lattice QCD
▶ Outlook

LyNcs vision: Towards Exascale Computing
▶ Objective 1: Performance and Efficiency
  ● focus on efficient node-level performance by using state-of-the-art implementations and algorithms such as Block Krylov solvers and multigrid approaches
▶ Objective 2: Scaling
  ● extend scalability and speed up the time to solution beyond DD+strong scaling with alternative
parallelization strategies
▶ Objective 3: Diversity
  ● Hardware and software: the software stack of community software is rich and optimized for various hardware systems; optimizations and algorithm developments are ongoing in various efforts, such as the DOE's Exascale initiative
The focus of LyNcs is to be complementary to, and in synergy with, those efforts:
▶ on a user-friendly API to enable the rich software stack of solver and application codes
▶ on specific software optimizations and developments used in the European community

Tasks & Targets
The project identifies 4 tasks:
Task 1: Sparse linear solvers suitable for Exascale
Task 2: Prototyping new methods for Exascale
Task 3: Portable APIs
Task 4: Community code enabling
We target software implementations matching the objectives of the tasks:
Target 1: LyNcs API (Tasks 3, 4, 2) - New community-driven software - portable, flexible, modular and user-friendly
Target 2: Fabulous (Tasks 1, 2) - Iterative Block Krylov Solvers library - study and implementation of new algorithms
Target 3: librsb (Task 1) - Recursive Sparse Blocks format library - improved portability, benchmarking and code quality
Target 4: DDalphaAMG (Tasks 1, 2) - Multigrid library for lattice QCD - extension to block solver and linkage to Fabulous

Deliverables & contributions
PRACE WP 8 – Deliverables (▶), each followed by the Contribution by LyNcs (●)
▶ D8.2 (MS 6) Interim progress report: PM distribution, staff and project structure
  ● Finalized set of tasks and milestones, and established team, including any new staff.
▶ D8.3 (MS 12) Interim progress report: Public prototype software release and new development infrastructure
  ● First version of APIs from Task 3. Progress of library optimization (Task 1) and evaluation of prototype algorithms (Task 2).
▶ D8.4 (MS 24) Interim progress report: Public software release (docs, issue tracker) and integration in external codes
  ● First release of software, including documentation, testing, benchmark results and issue tracking statistics (Tasks 1 and 3). Report on evaluation of new methods of Task 2, and candidates selected for implementation within Task 1. Progress of integration into community codes (Task 4).
▶ D8.5 (MS 30) Final report: including performance results on (pre)Exascale systems
  ● Performance results on community software linked to APIs (Tasks 1, 2, 3, and 4). Report on results presented in conferences. Publication of results of Task 2 (as peer-reviewed publication or PRACE white paper).

Project Synergy
[Diagram connecting DDalphaAMG, Lyncs API, Fabulous and librsb, with the labels:]
▶ Integration of major software for Lattice QCD simulations
▶ Advance Krylov block solvers for multigrid methods
▶ Implementation of high-quality community-software standards
▶ Performance improvements of low-level kernels

Lyncs API - Overview
Lyncs-API: A Python API for LQCD applications
Flexible, portable, extendible
▶ Lyncs-API covers the entire stack, from low-level libraries to fully-fledged community codes
[Software stack diagram: C/C++ libraries and Python modules, marked as developed / wrapped / contributed to]
  ● Task and data parallelism: dask.distributed, mpi4py
  ● Distributed data: dask.Array
  ● Python bindings: cppyy
  ● LQCD libraries: DDalphaAMG, QUDA, tmLQCD, OpenQCD...
  ● Data management: numpy, cupy, h5py
Main features:
➔ Flexible framework for easy integration of external libraries
➔ Parallel tasking and modular computing
➔ Automatic benchmarking and cross-checking of different implementations
➔ High-level user-friendly Python API
➔ Low-level integration of C/C++ libraries
Contribution to Objectives
▶ O2: alternative parallelization methods will speed up the real time to solution of Markov chains beyond the strong scaling regime
▶ O3: flexible to diversity in hardware by utilization of the community software stack, e.g. QUDA for GPU systems

Lyncs API - Targeted libraries
▶ Lyncs-API provides standalone Python interfaces to several LQCD libraries: DDalphaAMG, clime, tmLQCD, QUDA, openQCD, QphiX and others to come...
In the Lyncs-API we aim to target as many LQCD libraries as possible, such that we can
▶ Optimally support various architectures with architecture-specific libraries
▶ Crosscheck and benchmark various implementations
▶ Involve all the community and be open to contributions
▶ Cover most of the research interests
▶ Give a second life to legacy codes
For doing this we have created several low- and high-level interfacing tools...
▶ Source codes are downloaded, patched and compiled using git & CMake
▶ Python bindings are automatically generated via cppyy
cppyy: Automatic C/C++ bindings
>>> cppyy.include('header.h')
>>> cppyy.load_library('lib.so')
➢ Direct access to library functions
➢ Automatic casting of Python types
➢ Support of templates and overloading

Lyncs API - Python interface
▶ Top level: Lyncs, an API that offers a high-level common framework for LQCD
▶ Task scheduling, parallelism and data distribution are managed by Dask, with improved support for MPI applications via Lyncs-mpi
▶ Available implementations of the same function are benchmarked, cross-checked and selected via tuneit.
▶ All the low-level interfaces to external libraries can be used independently of Lyncs, and the Lyncs-API is independent from them, relying on alternative implementations
>> pip install lyncs_DDalphaAMG
>> pip install lyncs[DDalphaAMG]
Dask: Natively scales Python
➢ Management of distributed data and tasks
➢ Task scheduling and parallelism via futures
➢ Distributed numpy arrays
tuneit: Tune, benchmark and crosscheck
➢ Generic-purpose package developed beside Lyncs as a necessity
➢ Function implementations/parameters are used for tuning to best performance
➢ Benchmarking and testing tools provided

Lyncs API - Development status
Available on github.com/Lyncs-API
▶ Currently in alpha-state
▶ Open-source, BSD-3-Clause license
▶ Modular structure of many packages:
  ○ divided purpose-wise, e.g. interfaces, common tools, target architecture, etc.
  ○ with independent testing, version control and installation
Contribution to Tasks 2, 3 and 4
▶ High-quality community-software standards
▶ D8.4: First release of software, including