
S&TR March 2017

A Center of Excellence Prepares for Sierra

Application developers and their industry partners are working to achieve both performance and cross-platform portability as they ready science applications for the arrival of Livermore's next flagship supercomputer.

In November 2014, then Secretary of Energy Ernest Moniz announced a partnership involving IBM, NVIDIA, and Mellanox to design and deliver high-performance computing (HPC) systems for Lawrence Livermore and Oak Ridge national laboratories. (See S&TR, March 2015, pp. 11–15.) The Livermore system, Sierra, will be the latest in a series of leading-edge Advanced Simulation and Computing (ASC) Program supercomputers, whose predecessors include Sequoia, BlueGene/L, Purple, White, and Blue Pacific. As such, Sierra will be expected to help solve the most demanding computational challenges faced by the National Nuclear Security Administration's (NNSA's) ASC Program in furthering its stockpile stewardship mission.

At peak speeds of up to 150 petaflops (a petaflop is 10^15 floating-point operations per second), Sierra is projected to provide at least four to six times the performance of Sequoia, Livermore's current flagship supercomputer. Consuming "only" 11 megawatts, Sierra will also be about five times more power efficient. The system will achieve this combination of power and efficiency through a heterogeneous architecture that pairs two types of processors—IBM Power9 central processing units (CPUs) and NVIDIA Volta graphics processing units (GPUs)—so that programs can shift from one processor type to the other based on computational requirements. (CPUs are the traditional number-crunchers of computing. GPUs, originally developed for graphics-intensive applications such as computer games, are now being incorporated into supercomputers to improve speed and reduce energy usage.) Powerful hybrid computing units known as nodes will each contain multiple CPUs and GPUs connected by an NVLink network that transfers data between components at high speeds. In addition to CPU and GPU memory, the complex node architecture incorporates a generous amount of nonvolatile random-access memory, providing memory capacity for many operations historically relegated to far slower disk-based storage.
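To give a sense of what shifting work from a CPU to a GPU looks like in practice, the sketch below is a minimal, illustrative CUDA example, not drawn from any Livermore code: a simple array update is moved onto the GPU, with data copied across the CPU–GPU link before and after the computation. Every name in it is hypothetical.

// saxpy_offload.cu -- illustrative only; compile with: nvcc saxpy_offload.cu
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// GPU kernel: thousands of threads each update one array element in parallel.
__global__ void scaleAdd(double *y, const double *x, double a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
  const int n = 1 << 20;                       // one million elements
  const size_t bytes = n * sizeof(double);

  // Host (CPU) arrays.
  double *x = (double *)malloc(bytes);
  double *y = (double *)malloc(bytes);
  for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

  // Device (GPU) arrays; data crosses the CPU-GPU interconnect here.
  double *dx, *dy;
  cudaMalloc(&dx, bytes);
  cudaMalloc(&dy, bytes);
  cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);

  // Launch enough 256-thread blocks to cover all n elements.
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  scaleAdd<<<blocks, threads>>>(dy, dx, 3.0, n);

  // Copy the result back to CPU memory and check one value.
  cudaMemcpy(y, dy, bytes, cudaMemcpyDeviceToHost);
  printf("y[0] = %f\n", y[0]);                 // expect 5.0

  cudaFree(dx); cudaFree(dy); free(x); free(y);
  return 0;
}

On a node like Sierra's, the challenge is performing this kind of offload across the many loops of a large physics code while keeping data movement between CPU and GPU memory to a minimum.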
These hardware features—heterogeneous processing elements, fast networking, and use of different memory types—anticipate trends in HPC that are expected to continue in subsequent generations of systems. However, these innovations also represent a seismic architectural shift that poses a significant challenge for both scientific application developers and the researchers whose work depends on those applications running smoothly. "The introduction of GPUs as accelerators into the production ASC environment at Livermore, starting with the delivery of Sierra, will be disruptive to our applications," admits Rob Neely. "However, Livermore chose the GPU accelerator path only after concluding, first, that performance-portable solutions would be available in that timeframe and, second, that the use of GPU accelerators would likely be prominent in future systems."
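What "performance-portable" means in practice is that a computational loop is written once and can then be executed by different back ends, such as a serial CPU loop or a CUDA kernel. The sketch below is a simplified, hypothetical illustration of that idea in CUDA C++; it is not Livermore's implementation, and every type and function name is invented for the example.

// portable_forall.cu -- illustrative only; compile with: nvcc --extended-lambda portable_forall.cu
#include <cuda_runtime.h>
#include <cstdio>

struct SequentialPolicy {};   // run the loop body on the CPU
struct CudaPolicy {};         // run the loop body on the GPU

// GPU path: a kernel that applies the loop body once per index.
template <typename Body>
__global__ void forallKernel(int n, Body body) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) body(i);
}

// CPU back end: an ordinary serial loop over the same body.
template <typename Body>
void forall(SequentialPolicy, int n, Body body) {
  for (int i = 0; i < n; ++i) body(i);
}

// GPU back end: launch enough threads to cover all n indices.
template <typename Body>
void forall(CudaPolicy, int n, Body body) {
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  forallKernel<<<blocks, threads>>>(n, body);
  cudaDeviceSynchronize();
}

int main() {
  const int n = 1 << 20;
  double *y;
  cudaMallocManaged(&y, n * sizeof(double));   // memory visible to both CPU and GPU

  // The loop body is written once; only the execution policy changes.
  auto body = [=] __host__ __device__ (int i) { y[i] = 2.0 * i; };

  forall(SequentialPolicy{}, n, body);         // runs on the CPU
  forall(CudaPolicy{}, n, body);               // the same body runs on the GPU

  printf("y[10] = %f\n", y[10]);               // expect 20.0
  cudaFree(y);
  return 0;
}

With a pattern like this, retargeting an application from CPUs to GPU accelerators becomes largely a matter of changing the execution policy rather than rewriting every loop by hand.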
To run efficiently on Sequoia, applications must be modified to achieve a level of task division and coordination well beyond what previous systems demanded. Building on the experience gained through Sequoia, and integrating Sierra's ability to effectively use GPUs, Livermore HPC experts are hoping that application developers—and their applications—will be better positioned to adapt to whatever hardware features future generations of supercomputers have to offer.

The Department of Energy has contracted with an IBM-led partnership to develop and deliver advanced supercomputing systems to Lawrence Livermore and Oak Ridge national laboratories beginning this year. These powerful systems are designed to maximize speed and minimize energy consumption to provide cost-effective modeling, simulation, and big data analytics. The primary mission for the Livermore machine, Sierra, will be to run computationally demanding calculations to assess the state of the nation's nuclear stockpile.

Sierra's planned configuration:
Compute node: central processing units (CPUs) and graphics processing units (GPUs) linked by a CPU–GPU interconnect; high-bandwidth coherent shared memory: 512 gigabytes (GB); solid-state drive: 800 GB.
Compute system: speed: 120–150 petaflops; memory: 2.1–2.7 petabytes (PB); power: 11 megawatts; file system usable storage: 120 PB; bandwidth: 1.0 terabytes per second.

Compared to today's relatively simpler, homogeneous nodes, cutting-edge Sierra will feature heterogeneous nodes combining several types of processing units, such as central processing units (CPUs) and graphics processing units (GPUs). This advancement offers greater parallelism—completing tasks in parallel rather than serially—for faster results and energy savings. Preparations are underway to enable Livermore's highly sophisticated computing codes to run efficiently on Sierra and take full advantage of its leaps in performance.

A Platform for Engagement

IBM will begin delivery of Sierra in late 2017, and the machine will assume its full ASC role by late 2018. Recognizing the complexity of the task before them, Livermore physicists, computer scientists, and their industry partners began preparing applications for Sierra shortly after the contract was awarded to the IBM partnership. The vehicle for these preparations is a nonrecurring engineering (NRE) contract, a companion to the master contract for building Sierra that is structured to accelerate the new system's development and enhance its utility. "Livermore contracts for new machines always devote a separate sum to enhance development and make the system even better," explains Neely. "Because we buy new machines so far in advance, we can work with the vendors to enhance some features and capabilities." For example, the Sierra NRE calls for investigations into GPU reliability and advanced networking capabilities. The contract also calls for the creation of a Center of Excellence (COE)—a first for a Livermore NRE—to foster more intensive and sustained engagement than usual among domain scientists, application developers, and vendor hardware and software experts.

The COE provides Livermore application teams with direct access to vendor expertise and troubleshooting as codes are optimized for the new architecture. (See the box on p. 9.) Such engagement will help ensure that scientists can start running their applications as soon as possible on Sierra—so that the transition sparks discovery rather than being a hindrance. In turn, the interactions give IBM and NVIDIA deeper insight into how real-world scientific applications run on their new hardware. Neely observes that the new COE also dovetails nicely with Livermore's philosophy of codesign, that is, working closely with vendors to create first-of-their-kind computing systems, a practice stretching back to the Laboratory's founding in 1952.

The COE features two major thrusts—one for ASC applications and another for non-ASC applications. The ASC component, which is funded by the multilaboratory ASC Program and led by Neely, mirrors what other national laboratories awaiting delivery of new HPC architectures over the coming years are doing. The non-ASC component, however, was pioneered by Lawrence Livermore, specifically the Multiprogrammatic and Institutional Computing (M&IC) Program, which helps make powerful HPC resources available to all areas of the Laboratory. One way the M&IC Program achieves this goal is by investing in smaller-scale versions of new ASC supercomputers, such as Vulcan, a quarter-sized version of Sequoia. Sierra will also have a scaled-down counterpart slated for delivery in 2018. To prepare codes to run on this new system in the same manner as the COE is doing with ASC codes, institutional funding was allocated in 2015 to establish a new component of the COE, named the Institutional COE, or iCOE.

To qualify for COE or iCOE support, applications were evaluated according to mission needs, structural diversity, and, for iCOE codes, topical breadth. Bert Still, iCOE's lead, says, "We chose to fund preparation of a strategically important subset of codes that touches on every discipline at the Laboratory." Applications selected by iCOE include those that help scientists understand earthquakes, refine laser experiments, or test machine learning methods. For instance, the machine learning application chosen (see S&TR, June 2016, pp. 16–19) uses data analytics techniques considered particularly likely to benefit from some of Sierra's architectural features because of their memory-intensive nature. Applications under COE purview make up ...

Performance and Portability

Nearly every major domain of physics in which the massive ASC codes are used has received some COE attention, and nine of ten iCOE applications are already in various stages of preparation. A range ...

... collaborate with Livermore application teams through the COE on an ongoing basis, with several of them at the Livermore campus full or part time. Neely notes that the COE's founders considered it crucial for vendors to be equal ...