ECP Software Technology Capability Assessment Report
Total Page:16
File Type:pdf, Size:1020Kb
ECP-RPT-ST-0001-2018 ECP Software Technology Capability Assessment Report Michael A. Heroux, Director ECP ST Jonathan Carter, Deputy Director ECP ST Rajeev Thakur, Programming Models & Runtimes Lead Jeffrey Vetter, Development Tools Lead Lois Curfman McInnes, Mathematical Libraries Lead James Ahrens, Data & Visualization Lead J. Robert Neely, Software Ecosystem & Delivery Lead July 1, 2018 DOCUMENT AVAILABILITY Reports produced after January 1, 1996, are generally available free via US Department of Energy (DOE) SciTech Connect. Website http://www.osti.gov/scitech/ Reports produced before January 1, 1996, may be purchased by members of the public from the following source: National Technical Information Service 5285 Port Royal Road Springfield, VA 22161 Telephone 703-605-6000 (1-800-553-6847) TDD 703-487-4639 Fax 703-605-6900 E-mail [email protected] Website http://www.ntis.gov/help/ordermethods.aspx Reports are available to DOE employees, DOE contractors, Energy Technology Data Exchange representatives, and International Nuclear Information System representatives from the following source: Office of Scientific and Technical Information PO Box 62 Oak Ridge, TN 37831 Telephone 865-576-8401 Fax 865-576-5728 E-mail [email protected] Website http://www.osti.gov/contact.html This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. ECP-RPT-ST-0001-2018 ECP Software Technology Capability Assessment Report Office of Advanced Scientific Computing Research Office of Science US Department of Energy Office of Advanced Simulation and Computing National Nuclear Security Administration US Department of Energy July 1, 2018 Exascale Computing Project (ECP) iii ECP-RPT-ST-0001-2018 REVISION LOG Version Date Description 1.0 July 1, 2018 ECP ST Capability Assessment Report Exascale Computing Project (ECP) iv ECP-RPT-ST-0001-2018 EXECUTIVE SUMMARY The Exascale Computing Project (ECP) Software Technology (ST) Focus Area is responsible for developing critical software capabilities that will enable successful execution of ECP applications, and for providing key components of a productive and sustainable Exascale computing ecosystem that will position the US Department of Energy (DOE) and the broader high performance (HPC) community with a firm foundation for future extreme-scale computing capabilities. This ECP ST Capability Assessment Report (CAR) provides an overview and assessment of current ECP ST capabilities and activities, giving stakeholders and the broader HPC community information that can be used to assess ECP ST progress and plan their own efforts accordingly. ECP ST leaders commit to updating this document on regular basis (targeting approximately every six months). Highlights from the report are presented here. Origin of the ECP ST CAR: The CAR is a follow-on and expansion of the ECP Software Technology Gap Analysis [1]. The CAR includes a gap analysis as well as a broader description of ECP ST efforts, capabilities and plans. The Exascale Computing Project Software Technology (ECP ST) focus area represents the key bridge between Exascale systems and the scientists developing applications that will run on those platforms: ECP ST efforts contribute to 89 software products (Section 3.1) in five technical areas (Table1). 33 of the 89 products are broadly used in the HPC community and require substantial investment and transformation in preparation for Exascale architectures. An additional 23 are important to some existing applications and typically represent new capabilities that enable new usage models for realizing the potential that Exascale platforms promise. The remaining products are in early development phases, addressing emerging challenges and opportunities that Exascale platforms present. Programming Models & Runtimes: ECP ST is developing key enhancements to MPI and OpenMP, addressing in particular the important design and implementation challenges of combining massive inter-node and intra-node concurrency into an application. We are also developing a diverse collection of products that further address next generation node architectures, to improve realized performance, ease of expression and performance portability. Development Tools: We are enhancing existing widely used performance tools and developing new tools for next-generation platforms. As node architectures become more complicated and concurrency even more necessary, impediments to performance and scalability become even harder to diagnose and fix. Development tools provide essential insight into these performance challenges and code transformation and support capabilities that help software teams generate efficient code, utilize new memory systems and more. Mathematical Libraries: High performance scalable math libraries have enabled parallel execution of many applications for decades. ECP ST is providing the next generation of these libraries to address needs for latency hiding, improved vectorization, threading and strong scaling. In addition, we are addressing new demands for system-wide scalability including improved support for coupled systems and ensemble calculations. The math libraries teams are also spearheading the software development kit (SDK) initiative that is a pillar of the ECP ST software delivery strategy (Section 1.2.1). Data & Visualization: ECP ST has a large collection of data management and visualization products that provide essential capabilities for compressing, analyzing, moving and managing data. These tools are becoming even more important as the volume of simulation data we produce grows faster than our ability to capture and interpret it. SW Ecosystem & Delivery: This new technical area of ECP ST provides important enabling tech- nologies such as containers and experimental OS environments that allow ECP ST to provide requirements, analysis and design input for vendor products. This area also provides the critical resources and staffing that will enable ECP ST to perform continuous integration testings, and product releases. Finally, this area engages with software and system vendors, and DOE facilities staff to assure coordinated planning and support of ECP ST products. ECP ST Software Delivery mechanisms: ECP ST delivers software capabilities to users via several mechanisms (Section3). Almost all products are delivered via source codes to at least some of their users. Each of the major DOE computing facilities provides direct support of some users for about 20 ECP ST products. About 10 products are available via vendor software stack and via binary distributions such as Linux distributions. Exascale Computing Project (ECP) v ECP-RPT-ST-0001-2018 ECP ST Project Restructuring: ECP ST completed a significant restructuring in November 2017 (Section 1.3). We reduced the number of technical areas from 8 to 5 and reduced the number of L4 projects significantly by simplifying the organization of ATDM projects. We introduced new projects for software development kits that are a key organizational feature for designing, testing and delivering our software. Finally, we introduced a new technical area (SW Ecosystem & Delivery) that provides the critical capabilities we need for delivering a sustainable software ecosystem. ECP ST Project Overviews: A significant portion of this report includes 2-page synopses of each ECP ST project (Section4), including a project overview, key challenges, solution strategy, recent progress and next steps. Project organization: ECP ST has established a tailored project management structure using impact goals/metrics, milestones, regular project-wide video meetings, monthly and quarterly reporting, and an annual review process. This structure supports project-wide communication, and coordinated planning and development that enables 55 projects and more than 250 contributors to create the ECP ST software stack. Exascale Computing Project (ECP) vi ECP-RPT-ST-0001-2018 TABLE OF CONTENTS EXECUTIVE SUMMARY v LIST OF FIGURES x LIST OF TABLES xiii 1 Introduction 1 1.1 Background..............................................1 1.2 ECP Software Technology Approach................................4 1.2.1 Software Development Kits.................................4 1.2.2 ECP ST Software Delivery.................................6 1.3 ECP ST Project Restructuring...................................8 1.4 New Project Efforts.........................................8 1.4.1 FFTs............................................. 12 1.4.2 LLNL Math Libraries.................................... 13 2 ECP ST Technical Areas 14 2.1 Programming Models & Runtimes................................. 14 2.1.1 Scope and Requirements................................... 14 2.1.2 Assumptions and Feasibility................................. 14 2.1.3 Objectives..........................................