Technology News

Big Iron Moves Toward Exascale Computing

Neal Leavitt

Developers and researchers face many challenges in trying to produce exascale computers, which would perform a thousand times faster than today's most powerful systems.

Supercomputing is entering a new frontier: the exaflops era, in which high-performance machines could run a thousand times faster than today's petaflops systems.

Some experts say that someone could build an exaflops machine—capable of performing 10^18 floating-point operations per second—by the end of this decade.

More speed would be welcome, as supercomputing is used in many areas—including nuclear-weapon testing simulations, analyzing the geologies of various areas for possible oil deposits, astronomy, astrophysics, financial services, life sciences, and climate modeling—that could benefit from higher performance.

"This will necessitate new hardware and software paradigms," said Arend Dittmer, director of product marketing for Penguin Computing, a high-performance computing (HPC) services provider.

But for the first time in decades, computing-technology advances might be threatened, said John Shalf, Computer Science Department head in the US Lawrence Berkeley National Laboratory's Computing Research Division.

"While transistor density on silicon is projected to increase with Moore's law, the energy efficiency of silicon is not," he noted. "Power [consumption] has rapidly become the leading design constraint for future high-performance systems."

Thus, said Dittmer, software developers will have to optimize code for power efficiency rather than just performance.

Researchers are also looking into a number of disruptive hardware technologies that could dramatically increase efficiency, including new types of memory, silicon photonics, stacked-chip architectures, and computational accelerators, explained Dimitrios S. Nikolopoulos, professor and director of research at Queen's University Belfast.

The US, China, Japan, the European Union, and Russia are each investing billions of dollars in exascale research.

Achieving HPC improvements could even help those who don't use supercomputers. "As always," explained Intel Labs Fellow Shekhar Borkar, "the technology will trickle down to mainstream computing."

However, building exascale machines faces some significant challenges.

BACKGROUND

University of Illinois at Urbana-Champaign researchers started building supercomputers in the early 1950s and parallel supercomputers in the early 1960s.

Seymour Cray, who founded Cray Research—the forerunner of today's Cray Inc.—in the 1970s, is considered the father of commercial supercomputing.

Early supercomputers were designed like mainframes but adapted for higher speed. In the 1980s, the next wave of HPC machines used custom processors.

During the 1990s, general-purpose commercial processors began offering good performance, low prices, and reduced development costs, which made them attractive for use in supercomputers, said Stanford University professor William Dally, who is also chief scientist and senior vice president of research at GPU maker Nvidia.

The first machine to break the petaflops barrier was IBM's Roadrunner in 2008.

Table 1. World's fastest supercomputers as ranked by Top500 (www.top500.org), June 2012.

Rank | Site | Manufacturer | Computer | Country | Cores | Maximum throughput (petaflops) | Power (megawatts)
1 | US Lawrence Livermore National Laboratory | IBM | Sequoia | USA | 1,572,864 | 16.30 | 7.89
2 | RIKEN Advanced Institute for Computational Science | Fujitsu | K computer | Japan | 795,024 | 10.50 | 12.66
3 | US Argonne National Laboratory | IBM | Mira | USA | 786,432 | 8.16 | 3.95
4 | Leibniz Rechenzentrum | IBM | SuperMUC | Germany | 147,456 | 2.90 | 3.52
5 | National Supercomputer Center in Tianjin | National University of Defense Technology | Tianhe-1A | China | 186,368 | 2.57 | 4.04
6 | US Oak Ridge National Laboratory | Cray | Jaguar | USA | 298,592 | 1.94 | 5.14
7 | CINECA | IBM | Fermi | Italy | 163,840 | 1.73 | 0.82
8 | Forschungszentrum Juelich | IBM | JuQUEEN | Germany | 131,072 | 1.38 | 0.66
9 | Commissariat a l'Energie Atomique | Bull | Curie thin nodes | France | 77,184 | 1.36 | 2.25
10 | National Supercomputing Center in Shenzhen | Dawning | Nebulae | China | 120,640 | 1.27 | 2.58

Source: Professor Jack Dongarra, University of Tennessee, Top500 project
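
Table 1's throughput and power columns can be combined into a rough energy-efficiency figure, petaflops per megawatt, which is numerically the same as gigaflops per watt. The short Python sketch below uses only the numbers in the table; it is an illustrative comparison, not part of the Top500 ranking itself.

```python
# Rough energy efficiency (Pflops per MW, i.e., Gflops per W) from Table 1.
systems = [
    ("Sequoia",    16.30, 7.89),
    ("K computer", 10.50, 12.66),
    ("Mira",        8.16, 3.95),
    ("SuperMUC",    2.90, 3.52),
    ("JuQUEEN",     1.38, 0.66),
]

for name, pflops, megawatts in sorted(systems, key=lambda s: s[1] / s[2], reverse=True):
    print(f"{name:<10}  {pflops / megawatts:.2f} Gflops/W")
# JuQUEEN, Sequoia, and Mira cluster near 2 Gflops/W, while the K computer
# and SuperMUC deliver a bit over 0.8 Gflops/W.
```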

Getting to exascale computing is critical for numerous reasons. Faster supercomputers could conduct calculations that have been beyond reach because of insufficient performance, noted IBM Research director of computing systems Michael Rosenfield.

Another issue is that many complex problems have a large number of parameters. The only way to deal with such problems is to simultaneously run multiple sets of calculations using different combinations of parameters, which requires tremendous computational resources.

Bill Kramer, deputy director for the National Center for Supercomputing Applications' (NCSA's) Blue Waters project, said research teams are working on difficult problems in areas such as solar science, astrophysics, astronomy, chemistry, material science, medicine, social networks, and neutron physics.

"All are far more complex to solve than what has been done in the past, and it's only now, with petascale going to exascale, that we can begin to solve these in less than a lifetime," he explained.

TOMORROW'S BIG IRON

While some aspects of supercomputing—such as the traditional forms of security it uses—are unlikely to change to enable exascale computing, others will.

For example, developers are placing processing engines inside memory, rather than outside, to overcome the bottlenecks of today's memory-to-processor connections. They are also working with alternate programming languages that optimize, enhance, and simplify parallelism, as well as communications and control approaches that improve performance.

Fastest Supercomputer: IBM's Sequoia

Sequoia, an IBM supercomputer at the US Lawrence Livermore National Laboratory, is part of the company's BlueGene/Q HPC line and performs 16.3 Pflops.

The Top500 project, in which several academic and research experts rank the world's nondistributed supercomputer systems, placed Sequoia at the top of its list in its recent semiannual report, as Table 1 shows. Ranking second was Fujitsu's K computer, which performs 10.5 Pflops.

Sequoia, which runs Linux and is primarily water cooled, consists of 96 racks, 98,304 16-core compute nodes, 1.6 million total cores, and 1.6 petabytes of RAM.

Despite being so powerful, the system at peak speeds is 90 times more energy efficient than ASC Purple and eight times more than Blue Gene/L, two other very fast IBM supercomputers.

Power consumption

Sequoia uses 7.89 megawatts at peak performance. At that rate, a one-exaflops machine would consume 400 MW, about one-fifth of Hoover Dam's entire generation capacity, said Nathan Brookwood, research fellow with semiconductor consultancy Insight 64.
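
Brookwood's 400-MW figure follows from simple proportional scaling. The sketch below is a back-of-the-envelope check that assumes power grows linearly with performance and that Sequoia's theoretical peak is roughly 20 Pflops (the 16.3-Pflops figure in Table 1 is its measured Linpack rate); both assumptions are mine, not the article's.

```python
# Back-of-the-envelope scaling of Sequoia's power draw to one exaflops,
# assuming power grows linearly with delivered performance (a simplification
# that ignores the efficiency gains an exascale design would actually need).

SEQUOIA_POWER_MW = 7.89          # megawatts at peak load (Table 1)
SEQUOIA_LINPACK_PFLOPS = 16.3    # measured Linpack throughput (Table 1)
SEQUOIA_PEAK_PFLOPS = 20.1       # approximate theoretical peak (assumed value)
EXAFLOPS_IN_PFLOPS = 1000.0

for label, pflops in [("Linpack rate", SEQUOIA_LINPACK_PFLOPS),
                      ("theoretical peak", SEQUOIA_PEAK_PFLOPS)]:
    megawatts = SEQUOIA_POWER_MW * EXAFLOPS_IN_PFLOPS / pflops
    print(f"Scaling from the {label}: about {megawatts:.0f} MW")
# Scaling from the Linpack rate: about 484 MW
# Scaling from the theoretical peak: about 393 MW (close to the 400 MW cited)
```

Either estimate is about 20 to 25 times the US Department of Energy's 20-MW goal discussed below.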


Without improvements, an exascale computer "might need its own nuclear power plant or large hydroelectric dam," said Carl Claunch, vice president and distinguished analyst with market research firm Gartner Inc.

When an early Univac powered up in the 1950s, the lights dimmed in the surrounding neighborhood, Brookwood noted. "Imagine the impact of powering up a 400-MW supercomputer and watching the lights dim in the Southwest," he continued. "Future systems must improve their performance per watt by a factor of 40 or more to deliver exascale results within reasonable power envelopes."

The US Department of Energy has set a goal that supercomputers use no more than 20 MW of power, which would require radical redesigns of processors, interconnects, and memory, noted Alan Lee, vice president of research and advanced development for chipmaker AMD.

Supercomputer designers now routinely incorporate energy-conserving features that turn off idle elements within their chips when possible. Modern chips, noted Stanford's Dally, use both power gating—in which power to parts of a chip is shut off—and clock gating—in which the power is left on but the clock is turned off.

For parallel tasks, said AMD's Lee, GPUs are more energy efficient than CPUs because they use a single instruction to perform many operations and because they run at lower voltages and frequencies.

Supercomputers could become more energy efficient by using low-power memory and components that run at lower frequencies, as well as by reducing the amount of data movement, added Lee.

In the future, said University of California, San Diego (UCSD) professor Michael Taylor, using highly specialized, low-speed application-specific coprocessors could help decrease energy consumption.

Processing

Processor performance will be a key factor in exascale computing. And parallelism, created via multiple cores on a chip working on different tasks simultaneously, is driving processor-performance improvements.

Future supercomputers will have more cores per chip, and each core will run many threads to hide latency, according to Stanford's Dally.

There is an emerging consensus that future supercomputers will be heterogeneous multicore computers, with each processing chip having different types of cores specialized for different tasks, he said.

For example, Dally explained, the majority of the cores would be throughput-optimized to execute parallelized work quickly and with minimum energy consumption, as is the case with GPUs. A small number of cores would be latency-optimized, like those in CPUs, for use in critical serial tasks.

Each type of processor provides distinct advantages, said Tony King-Smith, vice president of marketing for Imagination Technologies, which designs and licenses multimedia and communications semiconductor cores.

However, using different kinds of cores would not come without challenges. For example, programming multiple types of processors to take advantage of their distinct characteristics can be complex and time consuming. Investigating such matters is the Heterogeneous System Architecture (HSA) Foundation, a consortium of chipmakers, academics, equipment manufacturers, software companies, and operating-system vendors.

According to King-Smith, supercomputer developers will face a challenge in balancing performance, power, and chip size.

Moreover, noted Dally, an exascale machine will have to run over a billion parallel threads at any time to keep busy—many times more than today's fastest computers—and this will require new programming approaches.

Designers of exascale computers could turn to 3D processors with various layers of circuitry stacked on top of one another. Integrating a large processing system this way improves memory access and performance, added Dally.

However, this also creates challenges for cooling and power supply. For example, noted Georgia Institute of Technology assistant professor Richard Vuduc, heat builds up between layers, making them harder to cool. In addition, he said, the interlayer connections are difficult to design, and there are few tools for developing and testing 3D circuits.
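
Dally's figure of over a billion concurrent threads can be sanity-checked with simple arithmetic. The sketch below assumes cores clocked near 1 GHz and only a couple of floating-point operations retired per thread per cycle; those numbers are illustrative assumptions rather than figures from the article.

```python
# Rough estimate of how many concurrent threads an exaflops machine needs
# to stay busy, assuming modest per-thread throughput.

TARGET_FLOPS = 1e18          # one exaflops
CLOCK_HZ = 1e9               # assumed core clock of about 1 GHz
FLOPS_PER_THREAD_CYCLE = 2   # assumed flops each thread completes per cycle

threads_needed = TARGET_FLOPS / (CLOCK_HZ * FLOPS_PER_THREAD_CYCLE)
print(f"Threads needed to saturate the machine: about {threads_needed:.0e}")
# Threads needed to saturate the machine: about 5e+08

# Real codes rarely sustain peak throughput on every thread, and extra threads
# are needed to hide memory latency, which pushes the practical count past 10^9.
```
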
Memory and interchip communications

Two crucial limitations that exascale computing faces are the increasing speed disparity between a CPU and external memory, and the relative slowdown in interconnect support. Processing speeds have increased exponentially, but the connecting fabric and memory controllers are still working to keep up.

Introducing solutions such as data compression, as well as optimizing memory organization and usage by adding localized caches and thereby keeping more of the processing on chip, would improve some memory- and interconnect-related issues, said King-Smith.

Stacked DRAM placed close to the processor increases memory bandwidth while requiring significantly less power to transfer data than current designs, he noted.
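
To make the point about localized caches and reduced data movement concrete, the toy sketch below counts how many requests must leave the chip when a small, heavily reused working set is accessed with and without an on-chip cache. The LRU model and the sizes are purely illustrative assumptions, not a description of any system in the article.

```python
# Toy illustration of why keeping reused data on chip cuts off-chip traffic:
# count "off-chip" fetches for the same access stream with and without a small
# on-chip cache. The cache model and sizes are arbitrary illustrative choices.

from collections import OrderedDict

def offchip_fetches(addresses, cache_lines):
    """Count fetches that miss an LRU cache of `cache_lines` entries (0 = no cache)."""
    cache, misses = OrderedDict(), 0
    for addr in addresses:
        if cache_lines and addr in cache:
            cache.move_to_end(addr)            # hit: data already on chip
        else:
            misses += 1                        # miss: fetch over the memory interface
            if cache_lines:
                cache[addr] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used entry
    return misses

# A loop that re-reads the same 64-element tile 100 times.
accesses = [addr for _ in range(100) for addr in range(64)]
print(offchip_fetches(accesses, cache_lines=0))    # 6400: every access goes off chip
print(offchip_fetches(accesses, cache_lines=64))   # 64: the tile stays on chip after the first pass
```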

Optical links and 3D chip stacking could improve interchip communications and lower power consumption, but further research in this area is necessary, said UCSD's Taylor. He predicted that optical connections will be built onto chips during the next few years.

However, Stanford's Dally noted, some major technical hurdles must be cleared for this to happen, such as reducing the cost and power dissipation of optical links.

Internode networking

Many supercomputers use high-speed Ethernet for communication between processing nodes. However, the technology's deep router queues and tendency to occasionally drop packets could create high and unpredictable latencies unsuitable for exascale computing.

Proprietary networking technologies like those used in Cray machines and other supercomputers have lower-latency routers, topologies with fewer hops, and more efficient buffer management, said Bill Blake, Cray's senior vice president and chief technology officer. This approach provides low internode latency and high bandwidth. However, proprietary technologies are expensive.

Cooling

Supercomputers generate huge amounts of heat. If the heat is not dissipated or moved away from chips, connectors, and the machine's many other heat-sensitive components, they—and the entire system—will fail.

In the past, supercomputers have used liquid cooling and/or air cooling via fans and heat sinks. Liquid cooling is highly effective, but it can be expensive and would become much more so in exascale systems. Thus, exascale systems might implement hybrid cooling systems using both liquid and air cooling, said Cray's Blake.

A current trend, noted IBM's Rosenfield, is to use room-temperature water to efficiently conduct heat away from sensitive components without requiring water chillers, which would increase energy consumption.

According to Dally, the individual components of an exascale machine could be reliable and have low failure rates, but combining so many into one large computer increases the chance that a failure will occur somewhere.
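
Dally's observation is a standard reliability argument: even if each part fails rarely, the probability that something in a machine with millions of parts fails soon approaches certainty. The sketch below works through that arithmetic with assumed, purely illustrative per-node failure rates.

```python
# If each node independently fails in a given hour with probability p, the
# chance that at least one of N nodes fails that hour is 1 - (1 - p)**N.
# The failure rate and node counts below are illustrative assumptions only.

p_node_per_hour = 1e-6   # roughly one failure per 114 years per node (assumed)

for nodes in (1_000, 100_000, 10_000_000):
    p_system = 1 - (1 - p_node_per_hour) ** nodes
    print(f"{nodes:>10,} nodes: {p_system:.1%} chance of at least one failure per hour")
#      1,000 nodes: 0.1% chance of at least one failure per hour
#    100,000 nodes: 9.5% chance of at least one failure per hour
# 10,000,000 nodes: 100.0% chance of at least one failure per hour
```
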
"Concerted government and industry investment and collaboration are needed to overcome the challenges [of exascale computing]. Leadership is necessary … as evidenced by sovereign strategic commitments to HPC in Japan, China, Russia, and Europe," said IBM's Rosenfield.

Added King-Smith, industry consortiums such as the Khronos Group and the HSA Foundation should continue pushing mainstream adoption of technologies like heterogeneous processing and GPU computation.

In the past 20 years, US expenditures on HPC have steadily grown, but it's not certain that the country will have the first practical exascale system, noted the NCSA's Kramer. The technical challenges and uncertainties; the complexity of industrial, government, and national laboratory partnerships; and budget problems might mean there won't be the focused US effort necessary for such an expensive and technically challenging undertaking.

Current plans for a US system by 2018 are no longer likely to bear fruit, and the country may not have an exascale machine until between 2023 and 2025, Kramer said. "China, Europe, even Russia may arrive at some type of exascale system first," he added.

If exascale systems aren't built and computing performance stalls at today's levels, said the Lawrence Berkeley National Laboratory's Shalf, the information-technology industry will shift from a growth industry to a replacement industry, and future societal impacts of computing will be limited.

Neal Leavitt is president of Leavitt Communications (www.leavcom.com), a Fallbrook, California-based international marketing communications company with affiliate offices in Brazil, France, Germany, Hong Kong, India, and the UK. He writes frequently on technology topics and can be reached at [email protected].

Editor: Lee Garber, Computer; [email protected]
