2017 Annual Report on the Cover: This Visualization Shows a Small Portion of a Cosmology Simulation Tracing Structure Formation in the Universe
Total Page:16
File Type:pdf, Size:1020Kb
ARGONNE LEADERSHIP COMPUTING FACILITY 2017 Annual Report On the cover: This visualization shows a small portion of a cosmology simulation tracing structure formation in the universe. The gold-colored clumps are high-density regions of dark matter that host galaxies in the real universe. This BorgCube simulation was carried out on Theta by the HACC team as part of the ALCF’s Theta Early Science Program. Image: Joseph A. Insley, Silvio Rizzi, and the HACC team, Argonne National Laboratory CONTENTS 2017 Annual Report 02 Year in Review 10 RedefiningHPC 05 Director’s Message 12 The ALCF’s New Paradigm: Simulation, Data, and 06 Operations and Science Updates Learning 08 ALCF at a Glance 14 Aurora: Argonne’s Future Exascale Supercomputer 22 Theta: A New Architecture for Simulation, Data, and Learning 34 Preparing for the Future of HPC 38 Growing the HPC Community 52 Expertise and Resources 40 Partnering with Industry for High-Impact Research 54 The ALCF Team 42 Shaping the Future of Supercomputing 60 ALCF Computing Resources 46 Engaging Current and Future HPC Researchers 64 Enabling Science with HPC Services and Tools 70 Science 72 Accessing ALCF 77 Direct Numerical Simulation of 81 Data-Driven Molecular 85 Advancing the Scalability Compressible, Turbulent Flow Engineering of Solar-Powered of LHC Workflows to Resources for Science Jonathan Poggie Windows Enable Discoveries at the Jacqueline M. Cole Energy Frontier 74 Large-Scale Computing and 78 Multiphase Simulations of Taylor Childers Visualization on the Nuclear Reactor Flows 82 Petascale Simulations for Connectomes of the Brain Igor Bolotnov Layered Materials Genome 86 Fundamental Properties Doga Gursoy, Aiichiro Nakano of QCD Matter Produced Narayanan Kasthuri 79 Atomistic Simulations at RHIC and the LHC of Nanoscale Oxides 83 Predictive Modeling of Claudia Ratti 75 Frontiers in Planetary and and Oxide Interfaces Functional Nanoporous Stellar Magnetism Through Subramanian Sankaranarayanan Materials, Nanoparticle 87 Kinetic Simulations of High-Performance Computing Assembly, and Reactive Relativistic Radiative Magnetic Jonathan Aurnou 80 Computational Engineering Systems Reconnection of Defects in Materials J. Ilja Siepmann Dmitri Uzdensky 76 Quantification of Uncertainty for Energy and Quantum in Seismic Hazard Using Information Applications 84 State-of-the-Art Simulations Physics-Based Simulations Marco Govoni of Liquid Phenomena 88 ALCF Projects Thomas H. Jordan Mark Gordon 94 ALCF Publications YEAR IN REVIEW The Argonne Leadership Computing Facility (ALCF) provides supercomputing resources and expertise to the research community to accelerate the pace of discovery and innovation in a broad range of science and engineering disciplines. As part of a Theta Early Science Program project, a research team led by Giulia Galli focused on performing high-performance calculations of nanoparticles and aqueous systems for energy applications. Image: Nicholas Brawand, The University of Chicago DIRECTOR’S MESSAGE Operating leadership-class systems for the research community means operating at the edge of computing capabilities, while offering the greatest capacity for breakthrough science. The ALCF performed its mission in 2017—launching a next-generation system, supporting a growing number of simulation and data science projects, and training future users—while making substantial progress toward new collaborations and new capabilities for new workloads. The addition of the ALCF’s latest supercomputer, Theta, doubled our capacity to do impactful science and marks the next waypoint in our journey to exascale computing. Notably, nearly all of the Theta Early Science projects reported significant science results, capping off another successful Early Science Program used to ready a new architecture for a diverse workload on day one. Michael E. Papka ALCF Director The ALCF is now operating at the frontier of data centric and high-performance supercomputing, with powerful capabilities for equally supporting data, learning, and simulation projects. We will continue to bolster the data and learning programs alongside the simulation science program as we prepare for the ALCF’s future exascale system, Aurora, expected in 2021. Our goal of science on day one, in all three areas, requires much work: Aurora will require not only innovation in parallel processing and network and storage performance, but also new software and tools to process the vast quantities of data needed to train deep learning networks to find patterns in data. We are also adding new services that make our computing and data analysis capabilities available to scientific users working in far-flung places, such as large experiments and observatories. In 2017, ALCF staff deployed a pilot program to allow easier access to ALCF systems by giving external users the ability to inject jobs from outside the facility. Other possible future services include data transfer workflows and co-scheduling betweenALCF systems. I want to take a moment to recognize the many staff-directed training, outreach, and mentoring efforts, from the summer student research program to the Argonne Training Program on Extreme-Scale Computing to our many topical seminars and hands-on workshops that serve our users of today and prepare our users of tomorrow. This extends to the growing number of activities at Argonne’s Joint Laboratory for Systems Evaluation as well, where ALCF researchers evaluate future HPC systems components and platforms. This annual report offers readers a deep dive into the many activities happening at the ALCF. As the ALCF has done throughout its history, we remain dedicated to supporting our users’ needs. It is a privilege to serve the science teams that use our resources to turn data into insights and competitive advantages. Most of all, I am deeply proud of the staff for what they do every day for our facility. ALCF 2017 ANNUAL REPORT 05 YEAR IN REVIEW Operations and Science Updates Operations When Theta, our new Intel-Cray system, entered production mode in July, we began providing our user community with a new resource equipped with advanced capabilities for research involving simulation, data analytics, and machine learning techniques. Along with Mira and Cetus, we now operate three systems on the Top 500 list of the fastest supercomputers in the world. Our goal is to ensure that our supercomputers and all of the supporting resources that go along with them—computing clusters, storage systems, network infrastructure, systems environments, and software tools—are stable, secure, and highly available to users. In 2017, our team worked tirelessly to integrate Theta into our supercomputing environment. Over the course of the year, we performed two upgrades on the system, bringing it to 24 racks, with 4,392 nodes and a peak performance Mark Fahey ALCF Director of Operations of 11.69 petaflops. We have also been heavily involved in preparations for our future exascale system, Aurora. This effort includes working closely with Intel and Cray to provide guidance on architectural features and the system’s software stack. This collaborative work will only intensify as we work toward delivering Aurora in 2021. In addition to working with the machines, we also interact directly with users, providing technical support to resolve issues as they arise. In 2017, we supported 1,000 allocations on five machines, on-boarded nearly 600 new users, and addressed more than 6,500 support tickets. We also deployed two new services to support the evolving needs of our user community. Our Jupyter Notebook service provides staff members and users with a web-based tool that is great for supporting collaborative research, as well as teaching. HTCondor-CE, a gateway software service that enables remote job submission, makes it easier for large-scale collaborative projects, like CERN’s Large Hadron Collider, to integrate ALCF computing resources into their workflows. Finally, we developed a new web application for Director’s Discretionary allocation requests. The new site allows us to better manage the request process by enabling staff members to easily filter and search for reviews, and check on the status of previous requests. 0 6 ALCF.ANL.GOV Science With Theta and Mira now operating alongside each other, we have two diverse petascale architectures to power scientific discoveries and engineering innovations. To prepare Theta to deliver science on day one, our Early Science Program (ESP) gave research teams pre-production time on the system for science and code development projects. The program was a great success, with teams achieving both application performance improvements and science advances, while paving the way for the larger ALCF user community to run on the supercomputer. In 2017, our users consumed more than 8 billion combined core-hours on our systems. Their research produced some impressive results, including Katherine Riley publications in high-impact journals, such as Science, Nature, and Physical ALCF Director of Science Review Letters. In total, our users published more than 200 papers this year. In addition to delivering scientific achievements, ALCF users also made advances in computational methods, such as data mining and deep learning, demonstrating new tools to enable data-driven discoveries. Much of this research has been enabled by our ALCF Data Science Program (ADSP), which expanded from four to eight projects in 2017. The use of emerging computational methods aligns with our paradigm shift—to support data and learning approaches alongside