Exascale Computing Project -- Software

Exascale Computing Project -- Software Paul Messina, ECP Director Stephen Lee, ECP Deputy Director ASCAC Meeting, Arlington, VA Crystal City Marriott April 19, 2017 www.ExascaleProject.org ECP scope and goals Develop applications Partner with vendors to tackle a broad to develop computer spectrum of mission Support architectures that critical problems national security support exascale of unprecedented applications complexity Develop a software Train a next-generation Contribute to the stack that is both workforce of economic exascale-capable and computational competitiveness usable on industrial & scientists, engineers, of the nation academic scale and computer systems, in collaboration scientists with vendors 2 Exascale Computing Project, www.exascaleproject.org ECP has formulated a holistic approach that uses co- design and integration to achieve capable exascale Application Software Hardware Exascale Development Technology Technology Systems Science and Scalable and Hardware Integrated mission productive technology exascale applications software elements supercomputers Correctness Visualization Data Analysis Applicationsstack Co-Design Programming models, Math libraries and development environment, Tools Frameworks and runtimes System Software, resource Workflows Resilience management threading, Data Memory scheduling, monitoring, and management and Burst control I/O and file buffer system Node OS, runtimes Hardware interface ECP’s work encompasses applications, system software, hardware technologies and architectures, and workforce development 3 Exascale Computing Project, www.exascaleproject.org What is a capable exascale computing system? A capable exascale computing system requires an entire computational ecosystem that: This ecosystem will be developed using • Delivers 50× the performance of today’s 20 PF a co-design approach systems, supporting applications that deliver high- to deliver new software, fidelity solutions in less time and address problems applications, platforms, of greater complexity and computational • Operates in a power envelope of 20–30 MW science capabilities at heretofore unseen • Is sufficiently resilient (perceived fault rate: ≤1/week) scale • Includes a software stack that supports a broad spectrum of applications and workloads 4 Exascale Computing Project, www.exascaleproject.org The ECP Plan of Record • A 7-year project that follows the holistic/co-design approach, which runs through 2023 (including 12 months of schedule contingency) – To meet the ECP goals • Enable an initial exascale system based on advanced architecture and delivered in 2021 • Enable capable exascale systems, based on ECP R&D, delivered in 2022 and deployed in 2023 as part of an NNSA and SC facility upgrades • Acquisition of the exascale systems is outside of the ECP scope, will be carried out by DOE-SC and NNSA-ASC facilities 5 Exascale Computing Project, www.exascaleproject.org ECP leadership team Exascale Computing Chief Technology Integration Officer Project Manager Al Geist, ORNL Paul Messina, Project Director, ANL Julia White, ORNL Stephen Lee, Deputy Project Director, LANL Communications Manager Mike Bernhardt, ORNL Application Software Hardware Project Exascale Systems Development Technology Technology Management Doug Kothe, Rajeev Thakur, Terri Quinn, Director, LLNL Jim Ang, Director, SNL Kathlyn Boudwin, Director, ORNL Director, ANL Susan Coghlan, John Shalf, Director, ORNL Bert Still, Pat McCormick, Deputy Director, ANL Deputy Director, LBNL Deputy Director, LLNL Deputy Director, LANL 8 Exascale Computing Project, www.exascaleproject.org ECP Software Technology Overview • Build a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures • Accomplished by extending current technologies to exascale where possible, performing R&D required to conceive of new approaches where necessary – Coordinate with vendor efforts; i.e., develop software other than what is typically done by vendors, develop common interfaces or services – Develop and deploy high-quality and robust software products 9 Exascale Computing Project, www.exascaleproject.org Exascale Computing Project ECP WBS 1. Project Application Hardware Software Exascale Systems Management Development Technology Technology 1.5 1.1 1.2 1.3 1.4 Project Planning DOE Science and Programming Models NRE and Management Energy Apps PathForward and Runtimes 1.5.1 1.1.1 1.2.1 1.3.1 Vendor Node and System DOE NNSA Design Tools Testbeds Project Controls & Applications 1.4.1 1.3.2 1.5.2 Risk Management 1.2.2 1.1.2 Mathematical and Other Agency Scientific Libraries Design Space Co-design Business Applications and Frameworks Evaluation and Integration Management 1.2.3 1.3.3 1.4.2 1.5.3 1.1.3 Developer Training Data Management Co-Design Procurement and Productivity and Workflows and Integration Management 1.2.4 1.3.4 1.4.3 1.1.4 Co-Design and Data Analytics and Information PathForward II Integration Visualization Vendor Node and Technology and 1.2.5 1.3.5 Quality System Design 1.4.4 Management System Software 1.1.5 1.3.6 Communications & Outreach Resilience and 1.1.6 Integrity 1.3.7 Integration Co-Design and 1.1.7 Integration 1.3.8 SW PathForward 10 Exascale Computing Project, www.exascaleproject.org 1.3.9 ST Level 3 WBS Leads Programming Models and Runtimes Rajeev Thakur, ANL 1.3.1 Tools 1.3.2 Jeff Vetter, ORNL Mathematical and Scientific Libraries and Frameworks Mike Heroux, SNL 1.3.3 Data Management and Workflows Rob Ross, ANL 1.3.4 Data Analytics and Visualization Jim Ahrens, LANL 1.3.5 System Software 1.3.6 Martin Schulz, LLNL Resilience and Integrity Al Geist, ORNL 1.3.7 Co-Design and Integration Rob Neely, LLNL 1.3.8 11 Exascale Computing Project, www.exascaleproject.org Vision and Goals for the ECP Software Stack Deliver and Provide foundational software and infrastructure to applications and facilities necessary for project success in 2021-23, while also pushing to innovate beyond that Anticipate horizon Collaborate Encourage and incentivize use of common infrastructure and APIs within the software stack Integrate Work with vendors to provide a balanced offering between lab/univ developed (open source), vendor-offered (proprietary), and jointly developed solutions Quality Deploy production-quality software that is easy to build, well tested, documented, and supported Prioritize Focus on a software stack that addresses the unique requirements of exascale – including extreme scalability, unique requirements of exascale hardware, and performance-critical components Completeness Perform regular gap analysis and incorporate risk mitigation (including “competing” approaches) in high-risk and broadly impacting areas 12 Exascale Computing Project, www.exascaleproject.org ECP Requires Strong Integration to Achieve Capable Exascale Within the stack Software Technology Application Hardware Technology Exascale Systems Development & Vendors & Facilities • To achieve a coherent software stack, we must integrate across all the focus areas – Understand and respond to the requirements from the apps but also help them understand challenges they may not yet be aware of – Understand and repond to the impact of hardware technologies and platform characteristics – Work with the facilities and vendors towards a successful stable deployment of our software technologies – Understand and respond to dependencies within the stack, avoiding duplication and scope creep – This is a comprehensive team effort — not a set of individual projects! 13 Exascale Computing Project, www.exascaleproject.org Requirements for Software Technology Derived from • Analysis of the software needs of exascale applications • Inventory of software environments at major DOE HPC facilities (ALCF, OLCF, NERSC, LLNL, LANL, SNL) – For current systems and the next acquisition in 2–3 years (CORAL, APEX) • Expected software environment for an exascale system • Requirements beyond the software environment provided by vendors of HPC systems • What is needed to meet ECP’s Key Performance Parameters (KPPs) 14 Exascale Computing Project, www.exascaleproject.org Example: An Exascale Subsurface Simulator of Coupled Flow, Transport, Reactions and Mechanics* Exascale Challenge Problem Applications & S/W Technologies • Safe and efficient use of the subsurface for geologic CO2 sequestration, petroleum extraction, geothermal energy and nuclear waste isolation Applications • Predict reservoir-scale behavior as affected by the long-term integrity of hundreds of • Chombo-Crunch, GEOS thousands deep wells that penetrate the subsurface for resource utilization Software Technologies Cited • Resolve pore-scale (0.1-10 µm) physical and geochemical heterogeneities in • C++, Fortran, LLVM/Clang wellbores and fractures to predict evolution of these features when subjected to • MPI, OpenMP, CUDA geomechanical and geochemical stressors • Raja, CHAI • Integrate multi-scale (µm to km), multi-physics in a reservoir simulator: non-isothermal multiphase fluid flow and reactive transport, chemical and mechanical effects on • Chombo AMR, PETSc formation properties, induced seismicity and reservoir performance • ADIOS, HDF5, Silo, ASCTK • Century-long simulation of a field of wellbores and their interaction in the reservoir • VisIt Development Plan Risks and Challenges Y1: Evolve GEOS and Chombo-Crunch; Coupling framework v1.0; Large scale (100 m) mechanics • Porting to exascale results in suboptimal usage across platforms test

Exascale Computing Project -- Software

Integration of CUDA Processing Within the C++ Library for Parallelism and Concurrency (HPX)

Bench - Benchmarking the State-Of- The-Art Task Execution Frameworks of Many- Task Computing

Adaptive Data Migration in Load-Imbalanced HPC Applications

MVAPICH2 2.2 User Guide

HPX – a Task Based Programming Model in a Global Address Space

Benchmarking the Intel FPGA SDK for Opencl Memory Interface

Scalable and High Performance MPI Design for Very Large

BCL: a Cross-Platform Distributed Data Structures Library

Advances, Applications and Performance of The

Beowulf Clusters — an Overview

MVAPICH2 2.3 User Guide

Improving MPI Threading Support for Current Hardware Architectures