CORE CONCEPTS

Nascent exascale supercomputers offer promise, present challenges

Adam Mann, Science Writer

Sometime next year, managers at the US Department of Energy's (DOE) Argonne National Laboratory in Lemont, IL, will power up a calculating machine the size of 10 tennis courts and vault the country into a new age of computing. The $500-million mainframe, called Aurora, could become the world's first "exascale" supercomputer, running an astounding 10^18, or 1 quintillion, operations per second.

Aurora is expected to have more than twice the peak performance of the current supercomputer record holder, a machine named Fugaku at the RIKEN Center for Computational Science in Kobe, Japan. Fugaku and its calculation kin serve a vital function in modern scientific advancement, performing simulations crucial for discoveries in a wide range of fields. But the transition to exascale will not be easy. "As these machines grow, they become harder and harder to exploit efficiently," says Danny Perez, a physicist at Los Alamos National Laboratory in NM. "We have to change our computing paradigms, how we write our programs, and how we arrange computation and data management."

That's because supercomputers are complex beasts, consisting of cabinets containing hundreds of thousands of processors. For these processors to operate as a single entity, a supercomputer needs to pass data back and forth between its various parts, running huge numbers of computations at the same time, all while minimizing power consumption. Writing programs for such machines is not easy, and theorists will need to leverage new tools such as machine learning to make scientific breakthroughs. Given these challenges, researchers have been planning for exascale for more than a decade (1).

Multiple countries are competing to get to exascale first. China has said it would have an exascale machine by the end of 2020, although experts outside the country have expressed doubts about this timeframe even before the delays caused by the global severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic. The DOE aims to have Aurora operational sometime in 2021.

Rows of cabinets hold incredible processing power for one of the world's best supercomputers, Summit, at Oak Ridge National Laboratory in TN. Exascale computing will surpass these existing machines by leaps and bounds. Image credit: Flickr/Oak Ridge National Laboratory, licensed under CC BY 2.0.

Published under the PNAS license.

Engineers in Japan and the European Union are not far behind. "Everyone's racing to exascale," says Jack Dongarra, a computer scientist at the University of Tennessee in Knoxville. "Who gets there first, I don't know." Along with bragging rights, the nations that achieve this milestone early will have a leg up in the scientific revolutions of the future.

Computational Boost

The increase in the power of computers has long followed Moore's Law, named after Intel cofounder Gordon Moore, who observed in 1965 that the processing power of computer chips was doubling roughly every two years (2). Supercomputers shifted from being able to do thousands of operations per second to millions, then billions, then trillions per second, at a cadence of roughly a thousand-fold increase in ability per decade.

Such powerful computing required enormous amounts of electricity. Unfortunately, much of this power was getting lost as wasted heat, a considerable concern in the mid-2000s, as researchers grappled with petascale machines capable of 10^15 calculations per second.

By 2006, IBM partly solved this problem by designing chips known as graphics processing units (GPUs), meant for the newly released Sony PlayStation 3. GPUs are specialized for rapidly rendering high-resolution images. They divide complex calculations into smaller tasks that run simultaneously, a process known as parallelization, making them quicker and more energy-efficient than generalist central processing units (CPUs). GPUs were a boon for supercomputers.

In 2008, when Los Alamos National Laboratory unveiled Roadrunner, the world's first petascale supercomputer, it contained 12,960 GPU-inspired chips along with 6,480 CPUs and performed twice as well as the next best system at the time. Besides GPUs, Roadrunner included other innovations to save electricity, such as turning on components only when necessary. Such energy efficiency was important because predictions for achieving exascale back then suggested that engineers would need "something like half of a nuclear power plant to power the computer," says Perez.

For such highly interconnected supercomputers, performance might be pinched by bottlenecks, such as the ability to access memory or store and retrieve data quickly. Newer machines in fact try to avoid shuffling around information as much as possible, sometimes even recomputing a quantity rather than restoring it from slow memory. Issues with memory and data retrieval are only expected to get worse in exascale. Should any link in the chain of computation have bottlenecks, it can cascade into larger problems. This means that a machine's peak performance, the theoretical highest processing power it can reach, will be different from its real-world, sustainable performance. "In the best case, we can get to around 60 or 70 percent efficiency," says Depei Qian, an emeritus scientist at Beihang University in Beijing, China, who helps lead China's exascale efforts.

Hardware is not the only challenge; the software comes with its own set of problems. Before the transition to petascale, Moore's law brought performance improvements without having to completely rethink how a program was written. "You could just use the old programs," says Perez. "That era is over. The low hanging fruits—we've definitely plucked them."

That's partly because of those GPUs. But even before they came along, programs were parallelized for speed: They were divided into parts that ran at the same time on different CPUs, and the outputs were recombined into cohesive results. The process became even more difficult when some parts of a program had to be executed on a CPU and some on a GPU. Exascale machines will contain on the order of 135,000 GPUs and 50,000 CPUs, and each of those chips will have many individual processing units, requiring engineers to write programs that execute almost a billion instructions simultaneously.

So running existing scientific simulations on the new exascale computers is not going to be trivial. "It's not just picking out a [simulation] and putting it on a big computer," says L. Ruby Leung, an atmospheric scientist at the Pacific Northwest National Laboratory in Richland, WA. Researchers are being forced to reexamine millions of lines of code and optimize them to make use of the unique architectures of exascale computers, so that the programs can reach as close to the theoretical maximum processing power as possible.

Teams around the world are wrestling with the different tradeoffs of achieving exascale machines. Some groups have focused on figuring out how to add more CPUs for calculations, making these mainframes easier to program but harder to power. The alternative approach has been to sacrifice programmability for energy efficiency, striving to find the best balance of CPUs and GPUs without making it too cumbersome for users to run their applications. Architectures that minimize the transfer of data inside the machine, or use specialized chips to speed up specific algorithms, are also being explored.
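As a rough back-of-the-envelope reading of the efficiency figure Qian cites above (65 percent is simply assumed here as the midpoint of his 60 to 70 percent range), the gap between peak and sustained performance on a nominal exascale machine looks like this:

```latex
R_{\mathrm{sustained}} \approx E \cdot R_{\mathrm{peak}}
  = 0.65 \times 10^{18}\ \mathrm{ops/s}
  \approx 6.5 \times 10^{17}\ \mathrm{ops/s}
```

In other words, even a well-tuned exascale code would be expected to sustain only about two-thirds of the quintillion operations per second quoted on the machine's spec sheet.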
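The divide-and-recombine pattern described above, splitting a program into parts that run at the same time and then merging the outputs, can be illustrated with a minimal sketch in ordinary Python. This is not supercomputer code; real HPC programs use frameworks such as MPI and GPU kernels, and the worker count, chunk layout, and toy workload here are assumptions made purely for illustration.

```python
# Minimal sketch of parallelization: split a big calculation into chunks,
# run the chunks at the same time on different CPU cores, and recombine
# the partial results. Illustrative only; not real supercomputing code.

from multiprocessing import Pool

def partial_sum(bounds):
    """Compute one chunk of the overall sum."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

if __name__ == "__main__":
    n = 10_000_000          # total amount of "work"
    workers = 8             # stand-ins for separate processors
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]

    with Pool(workers) as pool:
        partials = pool.map(partial_sum, chunks)   # chunks run concurrently

    total = sum(partials)   # recombine the partial outputs
    print(total)
```

The same basic idea, stretched across tens of thousands of CPUs and GPUs and nearly a billion simultaneous instructions, is what makes exascale programming so demanding.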

Critical Calculations

Despite all these challenges, researchers are intent on harnessing the power of exascale machines. Science has technically already entered the exascale era with the distributed computing project Folding@home. Users can download the program and allow it to commandeer tiny bits of available processing power on their home PCs to solve biomedical conundrums. Folding@home announced in March that when all of its 700,000 participants were online, the project had the combined capacity to perform more than 1.5 quintillion operations per second. These simulation abilities have been put to use during the pandemic to search for drugs effective against COVID-19.

Indeed, the field of biochemistry makes heavy use of supercomputers, which can act like "computational microscopes" to let researchers peer closely at the otherwise invisible ways that molecules interact. In the 1990s, researchers were only able to study a single organic chemical in silico for a few virtual trillionths of a second. But today's best machines can routinely model the movement of complex entities, such as viruses, over timescales of milliseconds.

Exascale supercomputers will enable simulations that are more complex and of higher resolution, allowing researchers to explore the molecular interactions of viruses and their hosts with unprecedented fidelity. In principle, the boost in computing power could help researchers better understand how lifesaving molecules bind to various proteins, guide biomedical experiments in HIV and cancer trials, or even aid in the design of a universal influenza vaccine. And whereas current computers can only model one percent of the human brain's 100 billion neurons, exascale machines are expected to be able to simulate 10 times more of the brain's capabilities, in principle helping to elucidate memory and other neurological processes.

On a massively grander scale, the next generation of computers promises to offer insight into the potentially disastrous effects of climate change. Weather phenomena are prototypical examples of chaotic behavior in action, with countless minor feedback loops that have planetary-scale consequences. A coordinated effort is under way to build the Energy Exascale Earth System Model (E3SM), which will simulate biogeochemical and atmospheric processes over land, ocean, and ice with up to two orders of magnitude better resolution than current models (3). This should more accurately reproduce real-world observations and satellite data, helping determine where adverse effects such as sea-level rise or storm inundation might do the most damage to lives and livelihoods. Exascale power will allow climate forecasters to swiftly run thousands of simulations, introducing tiny variations in the initial conditions to better gauge the likelihood of events a hundred years hence.
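The ensemble approach in the climate paragraph above, running the same model many times with tiny variations in the initial conditions and studying the spread of outcomes, can be sketched with a toy chaotic system. The logistic map below is a textbook stand-in, not part of E3SM or any real climate code, and the perturbation size and ensemble size are illustrative assumptions.

```python
# Toy ensemble: tiny perturbations to the initial condition of a chaotic
# system (the logistic map) produce a wide spread of outcomes. The spread,
# not any single run, is what conveys the likelihood of different futures.

import random

def run_model(x0, steps=200, r=3.9):
    """Iterate the chaotic logistic map from initial condition x0."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

baseline = 0.5
ensemble = []
for _ in range(1000):                       # many perturbed runs
    perturbed = baseline + random.uniform(-1e-6, 1e-6)
    ensemble.append(run_model(perturbed))

# Summarize the spread of final states across the ensemble
print(min(ensemble), max(ensemble), sum(ensemble) / len(ensemble))
```

Because the system is chaotic, runs that start almost identically end up scattered across the range of possible states; it is this distribution of outcomes that forecasters use to gauge likelihoods.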
— havior in action, with countless minor feedback loops China which had no supercomputers as recently as that have planetary-scale consequences. A coordi- 2001 but now boasts the fourth and fifth most pow- — nated effort is ongoing into building the Energy erful machines on Earth is pursuing three exascale Exascale Earth System Model (E3SM), which will sim- projects (4). China has said that it expected the first, ulate biogeochemical and atmospheric processes Tianhe-3, to be complete this year, but project man- over land, ocean, and ice with up to two orders of agers say that the coronavirus pandemic has pushed magnitude better resolution than current models (3). back timelines. This should more accurately reproduce real-world Ironically, it is just these sorts of urgent, seemingly observations and satellite data, helping determine intractable problems—swiftly developing vaccines where adverse effects such as sea-level rise or storm and therapeutics to address COVID-19, for example— inundation might do the most damage to lives and for which exascale computers are meant. If groups can livelihoods. Exascale power will allow climate fore- solve the technical challenges, there should be an casters to swiftly run thousands of simulations, intro- impressive array of applications. Says Qian, “Super- ducing tiny variations in the initial conditions to better computing is supposed to benefit ordinary people in gauge the likelihood of events a hundred years hence. their daily lives.”

1 P. Kogge et al., Exascale computing study: Technology challenges in achieving exascale systems. Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Technical Report 15 (2008).
2 G. Moore, Cramming more components onto integrated circuits. Electronics (Basel) 38, 114–117 (1965).
3 J. Golaz et al., The DOE E3SM coupled model version 1: Overview and evaluation at standard resolution. J. Adv. Model. Earth Syst. 11, 2089–2129 (2019).
4 Y. Lu, Paving the way for China exascale computing. CCF Transactions on High Performance Computing 1, 63–72 (2019).
