ECP Software Technology Capability Assessment Report–Public
Total Page:16
File Type:pdf, Size:1020Kb
ECP-RPT-ST-0002-2020{Public ECP Software Technology Capability Assessment Report{Public Michael A. Heroux, Director ECP ST Jonathan Carter, Deputy Director ECP ST Rajeev Thakur, Programming Models & Runtimes Lead Jeffrey S. Vetter, Development Tools Lead Lois Curfman McInnes, Mathematical Libraries Lead James Ahrens, Data & Visualization Lead Todd Munson, Software Ecosystem & Delivery Lead J. Robert Neely, NNSA ST Lead February 1, 2020 DOCUMENT AVAILABILITY Reports produced after January 1, 1996, are generally available free via US Department of Energy (DOE) SciTech Connect. Website http://www.osti.gov/scitech/ Reports produced before January 1, 1996, may be purchased by members of the public from the following source: National Technical Information Service 5285 Port Royal Road Springfield, VA 22161 Telephone 703-605-6000 (1-800-553-6847) TDD 703-487-4639 Fax 703-605-6900 E-mail [email protected] Website http://www.ntis.gov/help/ordermethods.aspx Reports are available to DOE employees, DOE contractors, Energy Technology Data Exchange representatives, and International Nuclear Information System representatives from the following source: Office of Scientific and Technical Information PO Box 62 Oak Ridge, TN 37831 Telephone 865-576-8401 Fax 865-576-5728 E-mail [email protected] Website http://www.osti.gov/contact.html This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. ECP-RPT-ST-0002-2020{Public ECP Software Technology Capability Assessment Report{Public Office of Advanced Scientific Computing Research Office of Science US Department of Energy Office of Advanced Simulation and Computing National Nuclear Security Administration US Department of Energy February 1, 2020 Exascale Computing Project (ECP) iii ECP-RPT-ST-0002-2020{Public REVISION LOG Version Date Description 1.0 July 1, 2018 ECP ST Capability Assessment Report 1.5 February 1, 2019 Second release 2.0 February 1, 2020 Third release Exascale Computing Project (ECP) iv ECP-RPT-ST-0002-2020{Public EXECUTIVE SUMMARY The Exascale Computing Project (ECP) Software Technology (ST) Focus Area is responsible for developing critical software capabilities that will enable successful execution of ECP applications, and for providing key components of a productive and sustainable Exascale computing ecosystem that will position the US Department of Energy (DOE) and the broader high performance (HPC) community with a firm foundation for future extreme-scale computing capabilities. This ECP ST Capability Assessment Report (CAR) provides an overview and assessment of current ECP ST capabilities and activities, giving stakeholders and the broader HPC community information that can be used to assess ECP ST progress and plan their own efforts accordingly. ECP ST leaders commit to updating this document on regular basis (every six to 12 months). Highlights from this version of the report are presented here. What is new in CAR V2.0: CAR V2.0 contains the following updates relative to CAR V1.5. • We introduce the FY20{23 project structure. ECP ST now consists of 6 (up from 5) level-3 (L3) technical areas, introducing the NNSA ST L3 area, which brings into one L3 the ECP open source development efforts at NNSA labs that are of particular importance to the rest of ECP. The number of ECP ST level-4 (L4) subprojects has been reduced from 55 to 33. The strategic aggregation of projects into fewer and larger units enables us to better manage L4 subprojects consistently as a portfolio. See Section 1.2. • We describe new and enhanced project management processes and resources including our iterative planning process, new KPP-3 capability integration process, a product dictionary and dependency management database. These new project features and related dashboards enable more insight and better information to effectively manage efforts across ECP. See Section2. • The two-page summaries of each ECP L4 projects have been updated to reflect recent progress and next steps. See Section4. • The Extreme-scale Scientific Software Stack (E4S) is further described. The third release, which is also the first major public release Version 1.0, was November 18, 2019. E4S is the primary integration and delivery vehicle for ECP ST capabilities. See Section 2.1.1. • The ECP ST SDK effort has further refined its groupings. See Section 2.1.2. The Exascale Computing Project Software Technology (ECP ST) focus area represents the key bridge between Exascale systems and the scientists developing applications that will run on those platforms. ECP ST efforts contribute to 70 software products (Section 2.1.3) in six technical areas (Table1). Since the publishing of CAR V1.5, we have introduced a product dictionary of official product names, which enables more rigorous mapping of ECP ST deliverables to stakeholders (Section 2.1.4). Programming Models & Runtimes: In addition to developing key enhancements to MPI and OpenMP for scalable systems with accelerated node architectures, we are working on performance portability layers (Kokkos and RAJA) and participating in OpenMP and OpenACC software design and development that will enable applications to write much of their source code without the need to provide vendor-specific implementations for each exascale system. We anticipate that one legacy of ECP ST efforts will be a software stack that supports Intel and AMD accelerators in addition to Nvidia. See Section 4.1. Development Tools: We are enhancing existing widely used compilers (LLVM) and performance tools for next-generation platforms. Compilers are critical for heterogeneous architectures, and LLVM is the most popular compiler for heterogeneous systems. As node architectures become more complicated and concurrency even more necessary, compilers must generate optimized code for many architectures, and the impediments to performance and scalability become even harder to diagnose and fix. Development tools provide essential insight into these performance challenges and code transformation and support capabilities that help software teams generate efficient code, utilize new memory systems and more. See Section 4.2. Mathematical Libraries: High-performance scalable math libraries have enabled parallel execution of many applications for decades. ECP ST is providing the next generation of these libraries to address needs for latency hiding, improved vectorization, threading and strong scaling. In addition, we are addressing Exascale Computing Project (ECP) v ECP-RPT-ST-0002-2020{Public new demands for system-wide scalability including improved support for coupled systems and ensemble calculations. See Section 4.3. The math libraries teams are also spearheading the software development kit (SDK) initiative that is a pillar of the ECP ST software delivery strategy (Section 2.1.2). Data & Visualization: ECP ST has a large collection of data management and visualization products that provide essential capabilities for compressing, analyzing, moving and managing data. These tools are becoming even more important as the volume of simulation data we produce grows faster than our ability to capture and interpret it. See Section 4.4. SW Ecosystem & Delivery: This technical area of ECP ST provides important enabling technologies such as Spack [1], a from-source build and package manager, and container environments for high-performance computers. This area also provides the critical resources and staffing that will enable ECP ST to perform continuous integration testing, and product releases. Finally, this area engages with software and system vendors, and DOE facilities staff to assure coordinated planning and support of ECP ST products. See Section 4.5. NNSA ST: This technical area brings into one L3 area all of the NNSA-funded work in ECP ST for easier coordination with other project work at the NNSA labs. Introducing this L3 enables continued integrated planning with the rest of ECP ST while permitting flexible coordination within the NNSA labs. See Section 4.6. ECP ST Software Delivery mechanisms: ECP ST delivers software capabilities to users via several mechanisms (Section3). Almost all products are delivered via source code to at least some of their users. Each of the major DOE computing facilities provides direct support of some users for about 20 ECP ST products. About 10 products are available via vendor software stack and via binary distributions such as Linux distributions. ECP ST Project Restructuring: ECP ST completed a significant restructuring in October 2019 (Section 1.2). We increased the number of technical areas from 5 to 6 and reduced the number of L4 projects from 55 to 33. We introduced a new L4 subproject for software packaging and delivery in the L3 technical area (SW Ecosystem & Delivery) that enhances existing NNSA funding for Spack and creates