S&TR March 2017
Application developers and their industry partners are working to achieve both performance and cross-platform portability as they ready science applications for the arrival of Livermore’s next flagship supercomputer.
A Center of Excellence Prepares for Sierra
In November 2014, then Secretary of Energy Ernest Moniz announced a partnership involving IBM, NVIDIA, and Mellanox to design and deliver high-performance computing (HPC) systems for Lawrence Livermore and Oak Ridge national laboratories. (See S&TR, March 2015, pp. 11–15.) The Livermore system, Sierra, will be the latest in a series of leading-edge Advanced Simulation and Computing (ASC) Program supercomputers, whose predecessors include Sequoia, BlueGene/L, Purple, White, and Blue Pacific. As such, Sierra will be expected to help solve the most demanding computational challenges faced by the National Nuclear Security Administration’s (NNSA’s) ASC Program in furthering its stockpile stewardship mission.
At peak speeds of up to 150 petaflops (a petaflop is 10¹⁵ floating-point operations per second), Sierra is projected to provide at least four to six times the performance of Sequoia, Livermore’s current flagship supercomputer. Consuming “only” 11 megawatts, Sierra will also be about five times more power efficient. The system will achieve this combination of power and efficiency through a heterogeneous architecture that pairs two types of processors—IBM Power9 central processing units (CPUs) and NVIDIA Volta graphics processing units (GPUs)—so that programs can shift from one processor type to the other based on computational requirements. (CPUs are the traditional number-crunchers of computing. GPUs, originally developed for graphics-intensive applications such as computer games, are now being incorporated into supercomputers to improve speed and reduce energy usage.) Powerful hybrid computing units known as nodes will each contain multiple CPUs and GPUs connected by an NVLink network that transfers data between components at high speeds. In addition to CPU and GPU memory, the complex node architecture incorporates a generous amount of nonvolatile random-access memory, providing memory capacity for many operations historically relegated to far slower disk-based storage.

These hardware features—heterogeneous processing elements, fast networking, and use of different memory types—anticipate trends in HPC that are expected to continue in subsequent generations of systems. However, these innovations also represent a seismic architectural shift that poses a significant challenge for both scientific application developers and the researchers whose work depends on those applications running smoothly. “The introduction of GPUs as accelerators into the production ASC environment at Livermore, starting with the delivery of Sierra, will be disruptive to our applications,” admits Rob Neely. “However, Livermore chose the GPU accelerator path only after concluding, first, that performance-portable solutions would be available in that timeframe and, second, that the use of GPU accelerators would likely be prominent in future systems.”

To run efficiently on Sierra, applications must be modified to achieve a level of task division and coordination well beyond what previous systems demanded. Building on the experience gained through Sequoia, and integrating Sierra’s ability to effectively use GPUs, Livermore HPC experts are hoping that application developers—and their applications—will be better positioned to adapt to whatever hardware features future generations of supercomputers have to offer.

The Department of Energy has contracted with an IBM-led partnership to develop and deliver advanced supercomputing systems to Lawrence Livermore and Oak Ridge national laboratories beginning this year. These powerful systems are designed to maximize speed and minimize energy consumption to provide cost-effective modeling, simulation, and big data analytics. The primary mission for the Livermore machine, Sierra, will be to run computationally demanding calculations to assess the state of the nation’s nuclear stockpile.

A Platform for Engagement

IBM will begin delivery of Sierra in late 2017, and the machine will assume its full ASC role by late 2018.
[Sierra system specifications — Compute system: speed, 120–150 petaflops; memory, 2.1–2.7 petabytes (PB); power, 11 megawatts. Compute node: CPUs and GPUs linked by a CPU–GPU interconnect; high-bandwidth coherent shared memory, 512 gigabytes (GB); solid-state drive, 800 GB. File system: usable storage, 120 PB; bandwidth, 1.0 terabytes/second.]
[Diagram comparing a current homogeneous node (memory and CPUs with multiple cores; lower complexity) with Sierra’s heterogeneous node (memory, CPUs, and GPUs; higher complexity).]
Compared to today’s relatively simpler nodes, cutting-edge Sierra will feature nodes combining several types of processing units, such as central processing units (CPUs) and graphics processing units (GPUs). This advancement offers greater parallelism—completing tasks in parallel rather than serially—for faster results and energy savings. Preparations are underway to enable Livermore’s highly sophisticated computing codes to run efficiently on Sierra and take full advantage of its leaps in performance.

Recognizing the complexity of the task before them, Livermore physicists, computer scientists, and their industry partners began preparing applications for Sierra shortly after the contract was awarded to the IBM partnership. The vehicle for these preparations is a nonrecurring engineering (NRE) contract, a companion to the master contract for building Sierra that is structured to accelerate the new system’s development and enhance its utility. “Livermore contracts for new machines always devote a separate sum to enhance development and make the system even better,” explains Neely. “Because we buy new machines so far in advance, we can work with the vendors to enhance some features and capabilities.” For example, the Sierra NRE calls for investigations into GPU reliability and advanced networking capabilities. The contract also calls for the creation of a Center of Excellence (COE)—a first for a Livermore NRE—to foster more intensive and sustained engagement than usual among domain scientists, application developers, and vendor hardware and software experts. The COE provides Livermore application teams with direct access to vendor expertise and troubleshooting as codes are optimized for the new architecture. (See the box on p. 9.)

Such engagement will help ensure that scientists can start running their applications as soon as possible on Sierra—so that the transition sparks discovery rather than being a hindrance. In turn, the interactions give IBM and NVIDIA deeper insight into how real-world scientific applications run on their new hardware. Neely observes that the new COE also dovetails nicely with Livermore’s philosophy of codesign, that is, working closely with vendors to create first-of-their-kind computing systems, a practice stretching back to the Laboratory’s founding in 1952.

The COE features two major thrusts—one for ASC applications and another for non-ASC applications. The ASC component, which is funded by the multilaboratory ASC Program and led by Neely, mirrors what other national laboratories awaiting delivery of new HPC architectures over the coming years are doing. The non-ASC component, however, was pioneered by Lawrence Livermore, specifically by the Multiprogrammatic and Institutional Computing (M&IC) Program, which helps make powerful HPC resources available to all areas of the Laboratory. One way the M&IC Program achieves this goal is by investing in smaller-scale versions of new ASC supercomputers, such as Vulcan, a quarter-sized version of Sequoia. Sierra will also have a scaled-down counterpart slated for delivery in 2018. To prepare codes to run on this new system in the same manner as the COE is doing with ASC codes, institutional funding was allocated in 2015 to establish a new component of the COE, named the Institutional COE, or iCOE.

To qualify for COE or iCOE support, applications were evaluated according to mission needs, structural diversity, and, for iCOE codes, topical breadth. Bert Still, iCOE’s lead, says, “We chose to fund preparation of a strategically important subset of codes that touches on every discipline at the Laboratory.” Applications selected by iCOE include those that help scientists understand earthquakes, refine laser experiments, or test machine learning methods. For instance, the machine learning
application chosen (see S&TR, June 2016, pp. 16–19) uses data analytics techniques considered particularly likely to benefit from some of Sierra’s architectural features because of their memory-intensive nature.

Applications under COE purview make up only a small fraction of the several hundred Livermore-developed codes presently in use. The others will be readied for Sierra later, in a phased fashion, so that lessons learned by the COE teams can save significant time and effort. Still says, “At the end of the effort, we will have some codes ready to run, but just as importantly, we will have captured and spread knowledge about how to prepare our codes.” Although some applications do not necessarily need to be run at the highest available computational level to fulfill mission needs, their developers will still need to familiarize themselves with GPU technology, because current trends suggest this approach will soon trickle down to the workhorse Linux cluster machines relied on by the majority of users at the Laboratory.

Performance and Portability

Nearly every major domain of physics in which the massive ASC codes are used has received some COE attention, and nine of ten iCOE applications are already in various stages of preparation. A range of onsite and offsite activities has been organized to bolster these efforts, such as training classes, talks, workshops focused on hardware and application topics, and hackathons. At multiday hackathons held at the IBM T. J. Watson Research Center in New York, Livermore and IBM personnel have focused on applications, leveraging experience gained with early versions of Sierra compilers. (Compilers are specialized pieces of software that convert programmer-created source code into the machine-friendly binary code that a computer understands.) With all the relevant experts gathered together in one place, hackathon groups were able to quickly identify and resolve a number of compatibility issues between compilers and applications.

In addition to such special events, a group of 14 IBM and NVIDIA personnel collaborate with Livermore application teams through the COE on an ongoing basis, with several of them at the Livermore campus full or part time. Neely notes that the COE’s founders considered it crucial for vendors to be equal partners in projects, rather than simply consultants, and that the vendors’ work align with and advance their own research in addition to Livermore’s. For instance, Livermore and IBM computer scientists plan to jointly author journal articles and are currently organizing a special journal issue on application preparedness to share what they have learned with the broader HPC and application development communities.

A “Hit Squad” for Troubleshooting

Application teams are also sharing ideas and experiences with their counterparts at the other four major Department of Energy HPC facilities. All are set to receive new leadership-class HPC systems in the next few years and are making preparations through their own COEs. Last April, technical experts from the facilities’ COEs and their vendors attended a joint workshop on application preparation. Participants demonstrated a strong shared interest in developing portable programming strategies—methods that ensure their applications continue to run well not just on multiple advanced architectures but also on the more standard cluster technologies. Neely, who chaired the

Livermore’s Center of Excellence (COE), jointly funded by the Laboratory and the multilaboratory Advanced Simulation and Computing (ASC) Program, aims to accelerate application preparation for Sierra through close collaboration among experts in hardware and software, as shown in this workflow illustration. [Workflow diagram: Inputs (applications and libraries, code developers) → COE (hands-on collaborations, on-site vendor expertise, customized training) → Outcomes (codes tuned for Sierra platform).]