
S&TR July/August 2013

NEW COMPUTATIONAL HEIGHTS

After demonstrating stellar performance in early science runs, Livermore’s newest and largest supercomputer has assumed dual roles as a platform for research and exascale computing preparation.



At Lawrence Livermore and across the National Nuclear Security Administration’s (NNSA’s) complex, leadership in supercomputing is not only a core competency but also essential to the national security mission. Demands on computer systems used for weapons assessments and certification continue to grow as the nuclear stockpile moves further from the test base against which simulations are calibrated. Simulation capabilities are also strained because weapons behavior spans such a range of timescales, from detonation, which happens on the micro- to millisecond scale, to nuclear fusion, which lasts mere femtoseconds.

The perpetual push for higher resolution computer simulations and faster number crunching is motivated by a desire to verify, with high confidence and without resorting to nuclear testing, that a critical national resource remains secure and functional. Says Fred Streitz, director of Livermore’s High Performance Computing Innovation Center, “Every time we jump to a new generation of computers, we open another window through which we can look to discover new science, and that helps in our quest to understand materials and phenomena with a high degree of accuracy.”

In 2006, the 100-teraflop (trillion floating-point operations) Advanced Simulation and Computing (ASC) Purple supercomputer delivered the first-ever three-dimensional, full-physics nuclear weapons simulation, and the 360-teraflop BlueGene/L was in the midst of its three-year reign as the world’s fastest computer. While these two Livermore systems were still setting records, leaders in NNSA’s ASC Program—the organization that integrates work by researchers at Los Alamos, Lawrence Livermore, and Sandia national laboratories to develop nuclear weapons simulation tools—were planning for a new computing system that would be delivered in 2012. Supplying the next level of performance necessary for supporting stockpile stewardship work, they soon realized, would require the most ambitious leap in computing ever attempted. They needed a machine that could deliver 12 to 24 Purple-class weapons calculations simultaneously to credibly evaluate the uncertainties in weapons physics. The machine also needed to have 20 to 50 times the capability of BlueGene/L for running materials science codes. These performance milestones formed the design basis for Sequoia, the “serial number one” IBM BlueGene/Q machine.


The newest and largest member of Livermore’s supercomputing arsenal, Sequoia has a peak performance speed of 20 petaflops (quadrillion floating-point operations per second). IBM designed BlueGene/Q in partnership with Lawrence Livermore and Argonne national laboratories, with national laboratory researchers providing user input at every stage of development. Livermore experts worked to create a machine that could be programmed with relative ease, while helping to explore and prepare for future computer architectures. Most critically, the machine would provide an effective platform for the existing trove of weapons codes, in which NNSA has invested several billion dollars and more than 15 years of effort.

Energy Drives Hardware Decisions

A breakthrough system with over 1.5 million processor units, or cores, and 1.6 petabytes of memory, Sequoia serves as a bridge, design-wise, between supercomputers of the past 15 years and exascale machines some 100 times faster than today’s top performer (an exaflop machine would perform at least 1 quintillion floating-point operations per second). While BlueGene/Q is still grounded in today’s computer architecture, it introduces hardware features and supports programming models likely to carry over to and potentially proliferate in tomorrow’s systems.

According to Michel McCoy, ASC program director at Livermore, computer architecture is undergoing its toughest transition in the past 70 years. The last truly revolutionary design shift came in the mid-1990s, when groups of interconnected cache-based microprocessors became standard in computers of every scale. Succeeding generations of these microprocessors have grown faster by boosting the speed and shrinking the size of transistors, effectively packing more calculations into every unit of time and space occupied by the computer. But now transistors are approaching a lower limit in size and an upper limit in speed. Although individual transistors could be pushed to run yet faster, further speeding up the millions of transistors on a typical microprocessor would drive energy demands and operational costs for large-scale computing to unsupportable levels. Researchers are therefore seeking new ways of designing processors to increase their performance, while reducing energy consumption enough to make the units affordable to operate. One promising approach, exemplified by IBM’s BlueGene series, is using large numbers of relatively low-frequency, simple processing units to achieve high cumulative performance with moderate energy usage. With a peak power requirement of less than 10 megawatts and the capability to perform 2.1 billion floating-point operations per second per watt, BlueGene/Q is among the world’s most energy-efficient supercomputers.
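Those two figures are consistent with each other. As a quick illustration only (not an official power model), the C sketch below divides the quoted peak rate by the quoted efficiency to estimate the full-load power draw:

#include <stdio.h>

/* Back-of-the-envelope check: peak performance divided by energy
 * efficiency gives an approximate power draw at full load. The
 * figures are the ones quoted in the article; this is an
 * illustrative sketch, not an official power model. */
int main(void)
{
    const double peak_flops     = 20.0e15; /* ~20 petaflops peak */
    const double flops_per_watt = 2.1e9;   /* ~2.1 gigaflops per watt */

    double watts = peak_flops / flops_per_watt;
    printf("Estimated full-load power: %.1f megawatts\n", watts / 1.0e6);
    /* Prints roughly 9.5 MW, consistent with the stated peak power
     * requirement of less than 10 megawatts. */
    return 0;
}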

[Figure: the evolution of Lawrence Livermore supercomputers from the Mainframe Era (beginning 1952) through the Vector Era (1979) and the Distributed Memory Era (1993) to the Many Core Era (2013), with performance climbing from kiloscale through megascale, gigascale, and terascale to petascale; systems shown include Purple, BlueGene/L, the BlueGene/L upgrade, Dawn, and Sequoia.] This chart shows the evolution of Lawrence Livermore supercomputers throughout the Laboratory’s history. Performance advances in early systems came through increases in the number of vacuum tubes and other components. In the late 1970s, hardware companies started building machines that used vector processing to manipulate data and speed up computing. Beginning in the mid-1990s, cache-based microprocessors were tied together through a message-passing interface; subsequent increases in speed resulted from processor technology improvements. Sequoia stands on the threshold of yet another era, which could be dominated by energy-efficient, multicore systems.


[Figure: the BlueGene/Q packaging hierarchy, from a compute chip with 18 processor units, each with an L1 cache featuring a list prefetcher, plus an integrated memory controller, chip-to-chip communication logic, and a shared L2 cache with speculative memory features, to a compute card (1 chip module), a node card (32 compute cards), and the full Sequoia system (96 racks, 3,072 node cards, 20 petaflops).] The 20-petaflop (quadrillion floating-point operations) Sequoia supercomputer boasts more than 1.5 million processing units, or cores, arranged in a compact, water-cooled, energy-efficient system. Its processing power is compressed into 96 racks, each the size of a large refrigerator, arrayed across just 3,400 square feet of floor space. Hardware features of the BlueGene/Q compute chip are highlighted in the dye photograph at far left, where L1, L2 = Level 1, 2.

BlueGene/Q’s ingenious hardware design begins with the chip. Processors, memory, and networking logic are all integrated into an exceptionally energy-efficient single chip that operates at both lower voltage and lower frequency than many of the chips found in leading supercomputers of the past decade. Each BlueGene/Q compute chip contains 18 cores—more than four times the core density of its recent predecessor, the BlueGene/P chip. Sixteen cores are used for computation, while one runs the operating system and another serves as a backup to increase the yield of usable chips. Each core has a relatively small amount of nearby memory for fast data access, called a Level 1 cache, and all the cores on the chip share a larger Level 2 cache. Each compute chip is packaged with memory chips and water-cooling components into a compute card, and compute cards are organized in sets of 32 (called a node card). Thirty-two node cards comprise a rack, and Sequoia has 96 of these racks, for a total of just fewer than 100,000 16-core processors.

Bandwidth as Performance Inhibitor

Modern supercomputers, including BlueGene/Q, have several types of on-chip memory storage. “However, real problems don’t fit in the cache,” says Bronis de Supinski, Livermore Computing’s chief technology officer. Most operations cannot simply access data from a nearby memory cache but instead must fetch data on a separate memory chip or another module. As the volume of cores and tasks has grown, the rate at which the system can retrieve data for the processor in order to complete calculations (known as bandwidth) has gradually become the limiting factor in computer performance, more so than how fast the processor can theoretically perform calculations.
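For readers who want to tally the packaging hierarchy described above, the short C sketch below multiplies out the published counts (16 compute cores per chip, one chip per compute card, 32 compute cards per node card, 32 node cards per rack, 96 racks); it is an illustrative aside, not vendor code:

#include <stdio.h>

/* Sequoia's packaging hierarchy, as described in the article:
 * 16 compute cores per chip (plus one operating-system core and one
 * spare), one chip per compute card, 32 compute cards per node card,
 * 32 node cards per rack, and 96 racks in the full system. */
int main(void)
{
    const long cores_per_chip      = 16;  /* compute cores only */
    const long chips_per_node_card = 32;  /* one chip per compute card */
    const long node_cards_per_rack = 32;
    const long racks               = 96;

    long chips = chips_per_node_card * node_cards_per_rack * racks;
    long cores = cores_per_chip * chips;

    printf("compute chips: %ld\n", chips); /* 98,304: "just fewer than 100,000" */
    printf("compute cores: %ld\n", cores); /* 1,572,864 cores */
    return 0;
}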


Hardware designers have implemented various methods to ameliorate delay and improve performance. BlueGene/Q uses hardware threading, an energy- and space-efficient alternative. Each processor supports four execution threads that share resources. The situation is analogous to four workers who must share a single tool. While one worker (thread) is waiting for data retrieval, it can lend the tool to another worker so that the core’s resources are always being used.

Keeping otherwise idle threads or cores occupied was a goal for the BlueGene/Q hardware designers. Many parallel programs contain shared data and portions of code that must be executed by a single process or thread at a time to ensure accurate results. Threads attempting to update or access the same data must take turns, through locks or other mechanisms, which can cause a queue to form. As the number of access attempts increases, program performance deteriorates. BlueGene/Q’s multiversioning Level 2 cache supports two new alternatives to locks: speculative execution and transactional memory. Together, they may improve application performance and ease the programmer’s task compared with traditional locking methods.

Using either memory feature requires that a programmer designate regions in the code as speculative. Instead of waiting for up-to-date versions of all the data it needs, which might depend on another core finishing a computation, a thread is allowed to begin speculatively performing work with the available data. The cache then checks the work. If the data are fresh, the work will be saved, and the system will gain a performance bonus because the labor was completed before the relevant value was changed by another thread. If the data are stale, the speculative work is discarded, and the operation is reexecuted with the correct value.
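The following C sketch illustrates the general idea of replacing a lock with a programmer-marked atomic region. It is a generic illustration, not Livermore or IBM code: the lock is an OpenMP critical section, and the atomic block uses GCC’s __transaction_atomic syntax (compiled with -fgnu-tm) purely as a stand-in for the vendor-specific directives that expose BlueGene/Q’s hardware transactional memory.

#include <stdio.h>

#define N 1000000

int main(void)
{
    long locked_bins[16] = {0}, tm_bins[16] = {0};

    /* Lock-based version: threads updating shared bins must take turns,
     * which can serialize them when many updates collide. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        int bin = (i * 7) % 16;
        #pragma omp critical
        locked_bins[bin]++;
    }

    /* Speculative version: each update runs optimistically and is rolled
     * back and retried only if another thread touched the same data, so
     * non-conflicting updates can proceed in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        int bin = (i * 7) % 16;
        __transaction_atomic {
            tm_bins[bin]++;
        }
    }

    printf("bin 0: locked=%ld speculative=%ld\n", locked_bins[0], tm_bins[0]);
    return 0;
}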
Another innovative memory element is BlueGene/Q’s intelligent Level 1 cache list prefetcher. This cache can learn complex memory access patterns and retrieve frequently visited data sets from the more distant memory devices, making them available in the cache when needed.

Improving computational performance and bandwidth through threading and special memory functions would largely be for naught without a high-speed method for moving the data between nodes. Most supercomputers use commodity interconnects or Infiniband for their primary connections. BlueGene/Q has a novel switching infrastructure that takes advantage of advanced fiber optics at every communication level. With the latest generation of BlueGene, IBM has also upgraded the toroidal connection between node cards from three to five dimensions for more robust interconnection and to halve the maximum latency. In this configuration, each node is connected to 10 others (instead of 6), thereby reducing the maximum distance between nodes from 72 hops to 31.

Accompanying Sequoia and connected via high-speed parallelized interconnect is one of the world’s largest file storage systems. The 55-petabyte Grove is designed to manage and store the immense amounts of data needed for Sequoia’s research functions.

Greater Parallelism in Codes

Knowing that supercomputer hardware and software must function as a team, Livermore experts have worked to ensure not only that Sequoia’s hardware will be compatible with ASC codes, but also that these codes are tuned to run on Sequoia. Because one goal is to improve the codes’ predictive capability, code preparation has required new physics models, new mathematical representations for the models, and algorithms for solving these representations at a new scale. In addition, researchers have begun analyzing threading techniques and other hardware features to determine how and where to incorporate these innovations into the codes. The undertaking began in 2009, a full three years before Sequoia arrived. Dawn, a 500-teraflop BlueGene/P machine, served as the primary platform for code preparation and scaling. Additional support was provided by BlueGene/Q simulators, other Livermore high-performance computing (HPC) systems, and starting in 2011, some small-scale BlueGene/Q prototypes.

Physicist David Richards observes, “With parallel computers, many people mistakenly think that using a million processors will automatically make a program run a million times faster than with a single processor, but that is not necessarily so. The work must be divided into pieces that can be performed independently, and how well this can be done depends on the problem.” For instance, Richards has found that running a 27 million particle molecular-dynamics simulation on 2,000 nodes of Sequoia is barely faster than doing so on 1,000 nodes. Determining the optimal level of task division for the given problem is essential. Computational physicist Bert Still divides the code readiness work into two distinct components—scaling out and scaling in. Scaling out is modifying the code to perform effectively across a greater number of nodes by breaking a problem or calculation into smaller but still functional chunks. Using more nodes allows researchers to save on calculation time, increase the problem size or resolution, or both.


Essentially, the scaling-in task entails streamlining operations within each node. In the past, programmers focused on parallelism exclusively at the node level. Sequoia offers parallelism opportunities at lower levels within the node as well. Sequoia’s individual cores are comparatively slow and equipped with rather simple logic for deciding which tasks to run next. The programmer is largely responsible for efficiently organizing and apportioning work, while ensuring that the assignment lists fit within the small memory of these processing units. Says physicist Steve Langer, “Almost any next-generation machine, whatever its architecture, will require the ability to break the code down into small work units, so the code preparation work for Sequoia is an investment in the future.”

An opportunity for parallelism also lies within single-instruction, multiple-data (SIMD) units. Modern cores are designed to perform several of the same operations at the same time, for the highest possible throughput. In the case of Sequoia, a four-wide SIMD configuration allows each core to perform up to eight floating-point operations at a time, but doing so quadruples the bandwidth requirements. Programmers must balance their code accordingly. Each core has two functional units, one for floating-point math and one for integer math and memory loading and storing operations. Livermore programmers can attempt to enhance operations by equalizing memory access and calculation functions, cycle by cycle.

Scaling Down Message Passing

The “Livermore model” for HPC programming, originally developed in the early 1990s and designed to scale through petascale system operations, relies on the message-passing interface (MPI) communication standard to tie processors together for tackling massive problems. But while MPI is very good at managing communications up to a few hundred thousand cores, it is not the optimal approach for a million-core machine. As a problem is split into more pieces, the volume of MPI communication increases dramatically, reducing available memory and potentially degrading overall performance. More communication also uses more electricity. Exploiting the full capabilities and memory bandwidth of the system demands a hybrid style of programming that combines MPI with hardware-threading methods.

Several projects were launched to assist Livermore computer scientists with applying these hybrid methods and implementing new programming styles. One such project investigated accomplishing larger tasks using a group of threads. Strategically deciding when, where, and how much threading to add to the code is crucial. Shifting to a more thread-focused scheme allows code writers to worry less about communication and more about coordinating thread activity through a strategic combination of locks, speculative execution, and transactional memory. The latter two features have just begun to be explored, but Livermore programmers suspect that the new memory features could boost performance in code sections not already optimized for threading and locks.
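A minimal sketch of that hybrid style, with MPI dividing a problem across nodes and OpenMP threads sharing the work within each node, looks roughly like the following (a generic illustration, not code from any ASC application):

/* Hybrid MPI-plus-threads sketch: MPI ranks divide the problem across
 * nodes ("scaling out"), and OpenMP threads share the rank's slice
 * within a node ("scaling in"). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_GLOBAL 100000000L

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    /* Request an MPI library that tolerates threads, with MPI calls
     * funneled through the master thread. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank owns a contiguous slice of the global problem. */
    long chunk = N_GLOBAL / nranks;
    long begin = rank * chunk;
    long end   = (rank == nranks - 1) ? N_GLOBAL : begin + chunk;

    /* Threads on the node share the rank's slice of the work. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = begin; i < end; i++)
        local += 1.0 / (double)(i + 1);   /* stand-in for real physics work */

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("result = %.6f (ranks=%d, threads/rank=%d)\n",
               total, nranks, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}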
Code preparation, enhancement, and testing have been challenging, surprising, and even gratifying. Livermore researchers have found that MPI programming functions better than anticipated on Sequoia. In addition, says Still, “While we knew that Sequoia would be a good HPC throughput engine, we were surprised how good it was as a data analytic machine. Graph analytics is representative of a large class of data science and is relevant to mission work at the Laboratory.” Sequoia maintains the highest ranking on the Graph 500 list, which measures the ability of supercomputers to solve big data problems. (See S&TR, January/February 2013, pp. 4–11.)

Sequoia has also presented new scaling complications. While preparing the laser–plasma code pf3D for a simulation requiring 1 trillion zones and 300 terabytes of memory, Langer realized that scaling up the code for Sequoia would require him to parallelize every aspect of the effort, including preparatory calculations and post-simulation data analysis.

[Photo: Preparing facility space for Sequoia was a three-year endeavor. Planners organized the 4-foot-high space beneath the rack-level floor (shown here during installation) to hold a dense assortment of electrical connections, sprinkler systems, network cables, water-cooling equipment, and custom-designed stands to support the 210-ton machine.]


Tasks previously considered relatively trivial or not particularly data intensive were now too large for a typical workstation to handle. Langer notes, “Sequoia is so big, it challenges our assumptions. We keep bumping against the sheer size of the machine.”

Integrating Sequoia into the Family

While software experts were beginning code optimization and development work for Sequoia, engineering and facilities teams were preparing to install the new system in the two-level space previously occupied by ASC Purple at Livermore’s Terascale Simulation Facility. (See S&TR, January/February 2005, pp. 22–24.) The teams first created a three-dimensional model of the computer room to determine how to best use the available space, with consideration for sustainability and energy efficiency. They then made changes to the facility on a coordinated schedule to avoid interrupting existing operations. Changes included gradually increasing the air and chilled-water temperatures to meet Sequoia’s requirements and to save energy.

Sequoia is equipped with energy-efficient features such as a novel 480-volt electrical distribution system for reducing energy losses and a water-cooling system that is more than twice as energy efficient as standard air cooling. Incorporating such systems into the space necessitated two significant modifications to IBM’s facility specifications. First, the facilities team designed innovative in-floor power distribution units to minimize congestion and reduce, by a factor of four, the conduit distribution equipment bridging the utilities and rack levels. Second, although IBM specified stainless-steel pipe for the cooling infrastructure, the team selected more economical polypropylene piping that met durability requirements and relaxed water-treatment demands. The polypropylene piping also contributed to the building’s already impressive “green” credentials as a Leadership in Energy and Environmental Design gold-rated facility. Facilities manager Anna Maria Bailey notes, “Sequoia’s electrical, mechanical, and structural requirements are demanding. Preparing for Sequoia really pushed us to think creatively to make things work.”

Integrating any first-of-its-kind computer system is challenging, but Sequoia was Livermore’s most grueling in recent history because of the machine’s size and complexity. System testing and stabilization spanned a full 14 months, but the integration schedule itself was unusually tight, leaving a mere 5 weeks between delivery of the last racks and the deadline for completing Linpack testing—a performance benchmark used to rank the world’s fastest computers.

Issues ranged from straightforward inconveniences, such as paint chips in the water system during pump commissioning and faulty adhesives on a bulk power module gasket, to more puzzling errors, such as intermittent electrical problems caused by connections bent during rack installation. Sequoia’s cooling infrastructure also presented some initial operational challenges, including uneven node-card cooling and false tripping of a leak-detection monitor.

A more serious manufacturing defect was encountered during the final integration phase. During intentionally aggressive thermal cycling as part of the Linpack testing, the team experienced a high volume of uncorrectable and seemingly random memory errors. In effect, compute cards were failing at alarming rates. The integration team began removing cards while IBM performed random dye-injection leak tests. The tests revealed that the solder attaching chips to their compute cards was, in some instances, exhibiting microscopic cracks. Investigation revealed that unevenly applied force during manufacturing tests had damaged the solder on a portion of Sequoia’s cards. These cracks had widened during thermal cycling, overheating the memory controllers beneath.

[Photo: Workers install Sequoia’s five-dimensional optical interconnect. Close coordination was necessary to ensure that the three contractors working on the machine, its file system, and the repurposed power system could proceed with their tasks simultaneously.]


The Livermore team overcame this and other integration hurdles with assistance from IBM computer scientists and BlueGene/Q’s unusually sophisticated error-detection control system software. Within 40 days of detecting the memory errors, IBM and Livermore troubleshooters had pinpointed the cause and replaced all 25,000 faulty cards. Although the system was accepted in December 2012, IBM continued to work with Livermore to fine-tune Sequoia hardware and software until the machine converted to classified operations in April 2013, demonstrating a notable level of dedication and partnership in machine deployment.

Performance on a Grand Scale

None of the integration challenges prevented Sequoia from completing, with only hours to spare, a 23-hour Linpack benchmark run at 16.324 petaflops and assuming the lead position on the Top500 Supercomputer Sites list for a 6-month period in 2012. Kim Cupps, a division leader in Livermore Computing, observes, “We’re proud that Sequoia was named the world’s fastest computer, but really, what’s important about the machine is what we can do with it. When we hear people talk about the work they’re doing on the machine and what they’ve accomplished, that’s what makes all the work worthwhile.”

The speed, scaling, and versatility that Sequoia has demonstrated to date are impressive indeed. For a few months prior to the transition to classified work and the access limitation that entails, university and national laboratory researchers conducted unclassified basic-science research on Sequoia. These science code and multiphysics simulation runs helped unearth a number of previously undetected hardware and software bugs and provided scientists with a preview of Sequoia’s capabilities, while accomplishing compelling research.

Over the course of the science runs, Sequoia repeatedly set new world records for core usage and speed. The Livermore–IBM-developed Cardioid code, which rapidly models the electrophysiology of a beating human heart at near-cellular resolution, was the first to use more than a million cores and the first to achieve more than 10 petaflops in sustained performance. Cardioid clocked in at nearly 12 petaflops while scaling with better than 90-percent parallel efficiency across all 1,572,864 cores on Sequoia. Scientists hope to use Cardioid to model various heart conditions and explore how the heart responds to certain medications. (See S&TR, September 2012, pp. 22–25.)

[Figure: A jet noise simulation performed on Sequoia by researchers from Stanford University demonstrated the feasibility of million-core fluid-dynamics simulations. A new design for a jet engine nozzle is shown in gray at left. Exhaust temperatures are in red and orange, and the sound field is blue. (Courtesy of Center for Turbulence Research, Stanford University.)]

[Figure: An OSIRIS simulation on Sequoia reveals the interaction of a fast-ignition-scale laser with dense deuterium–tritium plasma. The laser field is shown in green, the blue arrows illustrate the magnetic field lines at the plasma interface, and the red and yellow spheres are the laser-accelerated electrons that will heat and ignite the fuel. This fast-ignition simulation is the largest performed thus far on any machine.]


In another study, HACC, a highly scalable cosmology simulation created by Argonne National Laboratory, achieved 14 petaflops on Sequoia (or 70 percent of peak) in a 3.6 trillion particle benchmark run. Argonne’s code is designed to help scientists understand the nature of dark matter and dark energy. The HACC and Cardioid projects were 2012 finalists in the prestigious Gordon Bell competition for achievement in HPC.

Using a sophisticated fluid-dynamics code called CharLES, researchers at Stanford University’s Center for Turbulence Research modeled noise generation for several supersonic jet-engine designs to investigate which design results in a quieter engine. A calculation that had taken 100 hours to run on Dawn took just 12 hours on Sequoia. In one of the final science runs, a plasma physics simulation performed by Livermore scientists using the OSIRIS code also displayed magnificent scaling across the entire machine. This run demonstrated that, with a petascale computer, researchers can realistically model laser–plasma interactions at the necessary scale and speed for optimizing experimental designs used in a fusion approach called fast ignition.

In July, following a period for classified code development, science runs, and system optimization, Sequoia will become the first advanced architecture system to shoulder ASC production environment simulations for Livermore, Los Alamos, and Sandia national laboratories. Before now, stockpile stewardship and programmatic milestones were met using commercial-grade capability systems. Sequoia will serve primarily as a tool for building better weapons science models and quantifying error in weapons simulation studies. By supporting computationally demanding, high-resolution, three-dimensional physics simulations, Sequoia will allow researchers to gain a more complete understanding of the physical processes underlying past nuclear test results and the data gathered in nonnuclear tests. Weapons science results such as these may be used to improve integrated design calculations, which are suites of design packages that simulate the safety and reliability of a nuclear device.

[Figure: Purple (2005), Cielo (2012), Sequoia (2013), Trinity (2016), and Sierra (2018) plotted on axes running from gigascale to exascale and from a small number of calculations to confirm predictive accuracy to a large number of calculations to assess uncertainty, with regions labeled 2D standard resolution, 3D geometric-length-scale resolution, 3D science-based enhanced physics, 3D UQ, and 3D science-based microscopic-length-scale resolution.] Uncertainty quantification (UQ) uses statistical methods to determine likely effects of minor differences in input parameters. This chart displays the primary roles of key past, present, and future National Nuclear Security Administration resources responsible for performing UQ calculations of weapons codes. Trinity and Cielo are located at Los Alamos National Laboratory, while Purple, Sequoia, and Sierra are Livermore supercomputers. (D = dimensional.)


Integrated design calculations are the target for Sequoia’s uncertainty quantification (UQ) efforts. Researchers have endeavored to incorporate UQ into their annual weapons assessments to better understand and reduce sources of error in these studies. Until now, they lacked the computing resources to perform UQ at the desired scale and level of mathematical rigor. Scientists conducting UQ analyses will run many integrated design calculations simultaneously on Sequoia and examine how the outcomes are affected by slight variations in the input parameters. A complete study could involve hundreds or thousands of runs. Sequoia will be the first system to allow for the routine use of two-dimensional UQ studies at high resolution; it will also be capable of entry-level three-dimensional UQ studies. Routine, highly resolved, three-dimensional UQ must await more powerful platforms, such as the Livermore Sierra system slated to go into production in 2018.
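The statistical idea behind such a study can be sketched in a few lines of C: perturb an input parameter many times, rerun a model, and measure how the outputs spread. The toy model below is purely hypothetical; real UQ studies run full integrated design calculations in place of the one-line function.

/* Toy uncertainty-quantification loop: run a model many times with
 * slightly varied input parameters and measure how the output spreads.
 * The "model" is a hypothetical stand-in for a full design calculation. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double model(double parameter)
{
    return exp(-parameter) * sin(10.0 * parameter);  /* hypothetical response */
}

int main(void)
{
    const int    runs    = 1000;   /* "hundreds or thousands of runs" */
    const double nominal = 0.50;   /* nominal input parameter */
    const double jitter  = 0.01;   /* small perturbation applied to the input */

    double sum = 0.0, sumsq = 0.0;
    srand(12345);
    for (int i = 0; i < runs; i++) {
        double delta = jitter * (2.0 * rand() / (double)RAND_MAX - 1.0);
        double out   = model(nominal + delta);
        sum   += out;
        sumsq += out * out;
    }
    double mean   = sum / runs;
    double spread = sqrt(sumsq / runs - mean * mean);
    printf("mean output %.5f, spread (std. dev.) %.5f\n", mean, spread);
    return 0;
}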
As a platform for both UQ and weapons science research, Sequoia is a powerful resource that will improve the predictive capability of the ASC Program. These capabilities have broader national security applications as well, according to computational physicist Chris Clouse. “It’s not just about predicting an aging stockpile,” says Clouse. “We also need powerful computers and UQ to understand the weapons design of proliferant nation states or organizations, for instance, where we don’t have a large test base for calibration.”

Experts Prepare for the Unknown

Sequoia serves a vital function beyond weapons research. Still remarks, “Sequoia is both a production platform for UQ and weapons research and an exploration platform for future architectures. It really serves an amazing role as a prototype for advanced technology and as a way to develop a system strategy that will propel us forward.” Finding the ideal recipe for an exascale computer that balances power-consumption limits, memory and communications requirements, programmability, space limits, and many other factors may take some years. However, using Sequoia, Livermore programmers can explore how best to exploit both new and potentially enduring architectural elements, such as shared memory, vast quantities of cores, and hardware threading. Whatever the future of computing might bring, ASC codes need to be compatible. To that end, Still is leading a Laboratory Directed Research and Development project, using Sequoia, to make codes more architecturally neutral.

Livermore computational experts aim to do far more than simply react to computer architecture trends, though. Even during Sequoia’s integration, Laboratory researchers were developing a research portfolio to propel innovation and prepare for exascale computing. Through efforts such as NNSA’s Fast Forward Program and in partnership with HPC companies, they have begun exploring potential computer technologies and testing prototype hardware in Livermore environments and with Livermore simulations. Given NNSA laboratories’ expertise in leading-edge HPC and their history of successful code development with hardware companies, a collaboration between laboratory and industry experts has an excellent chance of addressing the obstacles on the path to exascale supercomputing, while ensuring that next-generation computer designs continue to meet ASC Program and other mission-driven needs.

Says McCoy, “Our Laboratory’s intellectual vitality depends on our staying in a leadership position in computing, from the perspective not only of the weapons program but also of every scientific endeavor at Livermore that depends on vital and world-class computing. The 21st-century economy will depend on using HPC better than our competitors and adversaries.” The knowledge and vision that have helped make Sequoia a success have also positioned Lawrence Livermore to help forge a path to a new era of computing.

—Rose Hansen

Key Words: Advanced Simulation and Computing (ASC); BlueGene; Cardioid; exascale; hardware threading; high-performance computing (HPC); Leadership in Energy and Environmental Design; Linpack; message-passing interface (MPI); parallel computing; single-instruction, multiple-data (SIMD) unit; stockpile verification; uncertainty quantification (UQ).

For further information contact Michel McCoy (925) 422-4021 ([email protected]).
