
news

Technology | doi:10.1145/1978542.1978549
Tom Geller
Supercomputing’s Exaflop Target
The twin challenges of parallelism and energy consumption are enlivening supercomputers’ progress.

Asia has come out swinging at the top of June 2011’s Top500 list, which rates the world’s fastest computers based on the LINPACK benchmark. Leading the list is the K Computer, which achieved 8.2 quadrillion floating-point operations per second (petaflops), giving Japan its first appearance in the much-coveted number-one position since November 2004. It knocks China’s Tianhe-1A, at 2.6 petaflops, to second place. The U.S.’s Jaguar (1.75 petaflops) was pushed from second to third place. China’s Nebulae (1.27 petaflops) dropped from third to fourth, and Japan’s TSUBAME 2.0 (1.19 petaflops) slipped from fourth to fifth.
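The Top500 ranks machines by how fast they solve a dense system of linear equations. What follows is a toy, single-core Python stand-in for that measurement. It is not the tuned, distributed High-Performance LINPACK (HPL) code used for the list. It solves a random n×n system by Gaussian elimination with partial pivoting and times the solve. It then divides the conventional LINPACK operation count, 2/3·n³ + 2·n², by the elapsed time. The matrix size, seed, and function names are illustrative assumptions.

```python
import random
import time


def solve_dense(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Works on copies so the caller's matrix and vector are untouched.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = list(b)
    for k in range(n):
        # Partial pivoting: swap in the row with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        pivot = a[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            row = a[i]
            factor = row[k] / pivot[k]
            for j in range(k + 1, n):
                row[j] -= factor * pivot[j]
            b[i] -= factor * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def linpack_style_run(n=120, seed=1):
    """Return (flop rate in flop/s, max residual) for one random n x n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # conventional LINPACK count
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return flops / elapsed, residual


if __name__ == "__main__":
    rate, res = linpack_style_run()
    print(f"{rate / 1e6:.1f} Mflop/s, max residual {res:.1e}")
```

This pure-Python loop reaches only megaflops on ordinary hardware. The point is the bookkeeping: a fixed operation count divided by wall-clock time. That is, in spirit, how a figure like 8.2 petaflops is reported.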

Asia’s success comes at a time when national governments are reconsidering the value of these ratings, and what defines supercomputing success. The U.S. President’s Council of Advisors on Science and Technology (PCAST) recently warned that a focus on such rankings could “crowd out … fundamental research in computer science and engineering.” Even Jack Dongarra, the list’s founder and a computer science professor at the University of Tennessee, believes the rankings need to be seen in a larger context. “You can’t judge a car solely by how many RPMs its engine can do,” he says, pointing to the list and HPC Challenge Benchmark as other sources of comparative supercomputing data.

Computer scientists are also skeptical about the value of such rankings. Petaflops are not the same as useful work; practical applications need to both preserve and take advantage of that power. Xuebin Chi, director of the Supercomputing Center at the Chinese Academy of Sciences, points out that “programming [for the CPU/GPU combinations now popular in supercomputing] is more difficult than for commodity CPUs,” and that “changing from sequential to parallel code is not easy.” He predicts that such a transition would take three to five years. But even with superb programming, real-world aspects of data delivery and error correction could significantly reduce application speeds from those reported by the Top500.
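Chi’s point about moving from sequential to parallel code can be made concrete with a small, hypothetical example. The code below rewrites a sequential loop as a parallel reduction. The index range is split into chunks, each worker sums its own chunk, and the partial results are combined at the end. It uses Python threads only to show the restructuring. In CPython, threads do not speed up pure-Python arithmetic, and supercomputer codes would typically use MPI, OpenMP, or CUDA instead. The function names and the workload `f` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def f(i):
    """Stand-in for per-element work (hypothetical workload)."""
    return i * i


def sequential_sum(n):
    total = 0
    for i in range(n):
        total += f(i)
    return total


def chunk_bounds(n, workers):
    """Split range(n) into at most `workers` contiguous [lo, hi) chunks."""
    step = max(1, -(-n // workers))  # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def partial_sum(bounds):
    lo, hi = bounds
    return sum(f(i) for i in range(lo, hi))


def parallel_sum(n, workers=4):
    """Same result as sequential_sum, computed as a chunked reduction."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Chunks are independent, so they can be processed concurrently;
        # the final sum combines the partial results.
        return sum(pool.map(partial_sum, chunk_bounds(n, workers)))
```

Integer addition is associative, so the order in which partial sums are combined does not change the result. With floating-point data it can, which is one of the subtleties that makes the transition harder than it looks.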

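Regardless, the release of the November 2010 Top500 list, with China’s Tianhe-1A in first place, spurred political discussions about national commitment to high performance computing (HPC) throughout the world. In the U.S., 12 senators cited Tianhe-1A in a letter to President Obama warning that “the race is on” to develop supercomputers capable of 1,000 petaflops (1 exaflop). In asking for funding, they wrote that “Our global competitors in Asia and Europe are already at work on technology … we cannot afford to risk our leadership position in computational sciences.”

David Kahaner, founding director of the Asian Technology Information Program, believes that Tianhe-1A is the leading edge of a Chinese push not only to increase supercomputing speeds, but also to domesticate production. “It represents a real commitment from the Chinese government to develop supercomputing and the infrastructure to support it,” he says. “A Chinese domestic HPC ecosystem is evolving. Domestic components are being developed and incorporated, and their use is likely to increase.” Dongarra agrees, noting, “The rate at which they’re doing that is something we’ve not seen before with other countries.”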

16 communications of the acm | month 2009 | vol. 00 | no. 00 news

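Top500 List, June 2011.
Rank   Computer       Nation
1      K Computer     Japan
2      Tianhe-1A      China
3      Jaguar         U.S.
4      Nebulae        China
5      TSUBAME 2.0    Japan
6      Cielo          U.S.
7                     U.S.
8      Hopper         U.S.
9      Tera-100
10     Roadrunner     U.S.
Source: http://www.top500.org

Programming
ACM-ICPC World Finals
Students from Zhejiang University won the 2011 ACM International Collegiate Programming Contest World Finals in Orlando, FL, in May, beating student teams from 104 universities in the IBM-sponsored competition. En route to its championship, Zhejiang University students used IBM’s large-scale analytics and cloud computing technologies to solve eight out of 11 programming problems in five hours.
The only other team to solve eight problems, the University of Michigan, won a gold medal for finishing second. The University of Michigan team’s success was due to several factors, says coach Kevin Compton, associate professor in the school’s electrical engineering and computer science department. “First, they are very talented programmers and problem solvers,” he says. “A couple of them began entering programming competitions in high school.” In addition, between the time the regional contests were held in October 2010 and the World Finals, the Michigan team practiced intensely.
The Michigan team also created an efficient strategy for solving problems, with team members having clearly defined roles, Compton says. “Each member had a fairly specialized role, either as a coder or debugger and tester,” he says. “By following this strategy, they were able to avoid getting bogged down on a particular problem.”
Tsinghua University and St. Petersburg University finished in third and fourth place, respectively, winning gold medals. The other top 12 finalists, each of which received a silver medal, were Nizhny Novgorod State University (5th), Saratov State University (6th), Friedrich-Alexander-University Erlangen-Nuremberg (7th), Donetsk National University (8th), Jagiellonian University in Krakow (9th), Moscow State University (10th), Ural State University (11th), and University of Waterloo (12th).
Bob Violino

CPUs vs. CPU/GPU Hybrids
If the June Top500 list had been a foot race, the K Computer would have lapped the competition. At 8.2 petaflops, it wields more power than the next five supercomputers combined. The K Computer’s name alludes to the Japanese word “Kei,” meaning 10 quadrillion, and represents the researchers’ desired performance goal of 10 petaflops.

Aside from national aspirations, the Top500 list reveals technical trends in HPC research worldwide. Most notable is an increased use of general-purpose graphics processing units (GPUs) in a hybrid configuration with CPUs. The June 2011 list includes 19 supercomputers that use GPU technology; the June 2010 list contained just 10.

GPUs contain many more cores than CPUs, allowing them to perform a larger number of calculations in parallel. While originally used for graphics tasks, such as rendering every pixel in an image, GPUs are increasingly applied to a wide variety of data-intensive calculations. “If you peek a little bit further into graphics problems, they look a lot like supercomputing problems,” says Sumit Gupta, manager of Tesla Products at NVIDIA. “Modeling graphics is the same as modeling molecule movement in a chemical process.”

In today’s supercomputers, GPUs provide the brute calculation power, but rely heavily on CPUs for other tasks. For example, the number-two Tianhe-1A contains two six-core X5670 CPUs for each 448-core Tesla M2050 GPU (14,336 to 7,168); it also contains a much smaller number of eight-core Chinese-built Feiteng CPUs (2,048). Altogether, the GPUs in Tianhe-1A contribute approximately three million cores, 30 times as many as its CPUs.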
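The core counts for Tianhe-1A follow directly from the component figures quoted above, and the short calculation below multiplies them out. A GPU “core” is a far simpler execution unit than a CPU core. The resulting ratio therefore describes parallel width, not equivalent computing power.

```python
# Component counts for Tianhe-1A as quoted in the article.
XEON_CPUS, XEON_CORES_EACH = 14_336, 6
FEITENG_CPUS, FEITENG_CORES_EACH = 2_048, 8
TESLA_GPUS, TESLA_CORES_EACH = 7_168, 448


def tianhe_core_counts():
    """Return (CPU cores, GPU cores, GPU-to-CPU core ratio)."""
    cpu_cores = XEON_CPUS * XEON_CORES_EACH + FEITENG_CPUS * FEITENG_CORES_EACH
    gpu_cores = TESLA_GPUS * TESLA_CORES_EACH
    return cpu_cores, gpu_cores, gpu_cores / cpu_cores


if __name__ == "__main__":
    cpu, gpu, ratio = tianhe_core_counts()
    print(f"CPU cores: {cpu:,}  GPU cores: {gpu:,}  ratio: {ratio:.1f}")
```

This gives 102,400 CPU cores and 3,211,264 GPU cores, a ratio of about 31. That is consistent with the article’s “approximately three million” and “30 times.”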

But speed is not simply a matter of throwing more cores into the mix, as it is not easy to extract all of their processing power. First, data must be queued and managed to feed the cores, and the results must be put together when they come out. “You need the CPU to drive the GPU,” Chi explains. “If your problem can’t be fit into the GPU itself, data will need to move frequently between the two, hurting performance.” Dongarra corroborates this, saying, “The speed of moving data to the GPU and the speed of computing it once it’s there are so mismatched that the GPU must do many computations with it before you see benefits.”
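Dongarra’s point about mismatched speeds can be framed as a simple break-even model. Offloading pays only when the time to ship data over the CPU-GPU link plus the GPU compute time beats doing the same work on the CPU. The model below uses illustrative round numbers, all assumptions:
- about 8 GB/s for a PCIe 2.0 x16 link;
- roughly 515 Gflop/s double-precision peak for a Tesla M2050;
- roughly 140 Gflop/s for two six-core Xeon X5670s.

Sustained rates are lower than these peaks, and the model ignores any overlap of transfer and compute.

```python
# Illustrative assumptions, not measurements.
LINK_BYTES_PER_S = 8e9    # ~PCIe 2.0 x16, one direction
GPU_FLOPS = 515e9         # ~Tesla M2050 double-precision peak
CPU_FLOPS = 140e9         # ~two six-core Xeon X5670 sockets


def cpu_time(flops):
    return flops / CPU_FLOPS


def offload_time(flops, bytes_moved):
    """Move the data over the link, then compute on the GPU (no overlap)."""
    return bytes_moved / LINK_BYTES_PER_S + flops / GPU_FLOPS


def break_even_intensity():
    """Flops per byte moved above which offloading beats the CPU alone."""
    return (1 / LINK_BYTES_PER_S) / (1 / CPU_FLOPS - 1 / GPU_FLOPS)


def matmul_intensity(n):
    """n x n double multiply: 2n^3 flops; A and B moved in, C moved back."""
    return (2 * n**3) / (3 * 8 * n**2)


if __name__ == "__main__":
    print(f"break-even: {break_even_intensity():.1f} flops/byte")
    for n in (64, 256, 1024):
        flops, nbytes = 2 * n**3, 3 * 8 * n**2
        winner = "GPU" if offload_time(flops, nbytes) < cpu_time(flops) else "CPU"
        print(f"n={n}: {matmul_intensity(n):.1f} flops/byte -> {winner}")
```

Under these assumptions, a kernel must perform roughly two dozen floating-point operations for every byte it moves (about 190 per 8-byte value) before the GPU pays off. That is the “many computations” Dongarra describes.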

The K Computer, though, bucks this trend. Unlike Tianhe-1A and other recent large supercomputers, it does not use GPUs or accelerators. The K Computer uses 68,544 SPARC64 VIIIfx CPUs, each with eight cores, for a total of 548,352 cores. And the Japanese engineers plan to boost the K Computer’s power by increasing the number of its circuit board-filled cabinets from 672 to 800 in the near future.
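The K Computer numbers can be cross-checked with a few lines of arithmetic. The calculation below derives three figures:
- the CPU count, from the quoted 548,352 cores;
- the CPUs per cabinet, from the 672 cabinets;
- a naive performance estimate for 800 cabinets.

The linear-scaling assumption in the last step is ours, not the engineers’. Real systems rarely scale perfectly.

```python
TOTAL_CORES = 548_352      # as quoted in the article
CORES_PER_CPU = 8          # SPARC64 VIIIfx
CABINETS_NOW = 672
CABINETS_PLANNED = 800
RMAX_PFLOPS_NOW = 8.2


def k_computer_figures():
    """Return (CPUs, CPUs per cabinet, planned CPUs, naive planned petaflops)."""
    cpus = TOTAL_CORES // CORES_PER_CPU
    cpus_per_cabinet = cpus / CABINETS_NOW
    planned_cpus = cpus_per_cabinet * CABINETS_PLANNED
    # Naive assumption: performance grows linearly with cabinet count.
    planned_pflops = RMAX_PFLOPS_NOW * CABINETS_PLANNED / CABINETS_NOW
    return cpus, cpus_per_cabinet, planned_cpus, planned_pflops


if __name__ == "__main__":
    print(k_computer_figures())
```

The quoted core total implies 68,544 CPUs, or 102 per cabinet. At 800 cabinets, linear scaling would suggest just under 10 petaflops, close to the 10-petaflop “Kei” goal.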


Asia’s Ascension
Asian researchers appear to be well positioned to exploit GPUs for massively parallel supercomputing. Kahaner believes China’s relative isolation from Western influences may have led to economics that favor such innovations. “They’re not so tightly connected with U.S. vendors who have their own perception of things,” he says. “Potential bang for the buck is very strong in Asia, especially in places like China or , which are very price-sensitive markets. If your applications work effectively on those kinds of accelerator technologies, they can be very cost effective.”

According to Satoshi Matsuoka, director of the Computing Infrastructure Division at the Global Scientific Information and Computing Center of the Tokyo Institute of Technology, China’s comparatively recent entry into HPC may help it in this regard. “Six years ago, they were nowhere, almost at zero,” he says. “They’ve had less legacy to deal with.” By contrast, Gupta says, programmers in more experienced countries have to undergo re-education. “Young programmers have been tainted into thinking sequentially,” he notes. “Now that parallel programming is becoming popular, everybody is having to retrain themselves.”

These issues will only get more complicated as time progresses. Horst Simon, deputy laboratory director of Lawrence Berkeley National Laboratory, says a high level of parallelism is necessary to progress past the 3–4GHz physical limit on individual processors. “The typical one-petaflop system of today has maybe 100,000 to 200,000 cores,” says Simon. “We can’t get those cores to go faster, so we’d have to get a thousand times as many cores to get to an exaflop system. We’re talking about 100 million to a billion cores. That will require some very significant conceptual changes in how we think about applications and programming.”
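Simon’s estimate is a scaling argument. If per-core speed is stuck near its current level, a 1,000-fold increase in system speed needs roughly 1,000 times the cores, and the code below makes that explicit. The 1 Gflop/s-per-core case is a hypothetical assumption added here to show what the upper “billion cores” figure would imply. It is not from Simon.

```python
PETAFLOP = 1e15
EXAFLOP = 1e18


def flops_per_core(system_flops, cores):
    """Average sustained rate per core."""
    return system_flops / cores


def cores_needed(target_flops, per_core_flops):
    """Cores required if per-core speed stays fixed."""
    return target_flops / per_core_flops


if __name__ == "__main__":
    for cores_today in (100_000, 200_000):
        per_core = flops_per_core(PETAFLOP, cores_today)
        print(f"{cores_today:,} cores -> {per_core / 1e9:.0f} Gflop/s per core -> "
              f"{cores_needed(EXAFLOP, per_core):.0e} cores for an exaflop")
    # Hypothetical: slower, simpler cores averaging 1 Gflop/s each.
    print(f"at 1 Gflop/s per core: {cores_needed(EXAFLOP, 1e9):.0e} cores")
```

Today’s 5 to 10 Gflop/s per core translates to 100 to 200 million cores for an exaflop. A billion cores would correspond to cores averaging about 1 Gflop/s each.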

Matters of Energy
Hybrid architectures have historically had another advantage besides their parallelism: they have also usually used less energy than comparable CPU-only systems. In the November 2010 list, hybrid systems generally delivered more performance per watt than the CPU-only systems.

But the new Top500 list shows that the architectural battle over energy efficiency is still raging. The CPU-based K Computer attains an impressive 825 megaflops (Mflops) per watt, even as the third-place, CPU-based Jaguar ekes out a so-so 250 Mflops/watt. By comparison, the hybrid Tianhe-1A achieves 640 Mflops/watt, Nebulae gets about 490 Mflops/watt, and TSUBAME 2.0 gets 850 Mflops/watt. (The list’s average is 248 Mflops/watt.)

The most energy-efficient system is the U.S.’s CPU-based IBM BlueGene/Q Prototype supercomputer, which entered the Top500 in 109th place with an efficiency of 1,680 Mflops/watt. The IBM BlueGene/Q tops the Green500, a list derived from the Top500 that ranks supercomputers based on energy efficiency. But despite BlueGene/Q’s supremacy, eight of the Green500’s top 10 are GPU-accelerated machines.

Energy is no small matter. The K Computer consumes enough energy to power nearly 10,000 homes, and costs $10 million a year to operate. These costs would significantly increase in an exaflop world, notes Simon.
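The efficiency figures above also determine power draw, since power equals sustained flop/s divided by flop/s per watt. The code below ranks the six systems quoted in the article by Mflops/watt. This is a small Green500-style ordering, not the actual Green500 list. It then estimates the power an exaflop machine would need at the K Computer’s and BlueGene/Q’s efficiencies. It assumes efficiency stays constant at scale, which is an idealization.

```python
# Mflops/watt as quoted in the article (June 2011 list).
MFLOPS_PER_WATT = {
    "IBM BlueGene/Q Prototype": 1680,
    "TSUBAME 2.0": 850,
    "K Computer": 825,
    "Tianhe-1A": 640,
    "Nebulae": 490,
    "Jaguar": 250,
}


def efficiency_ranking(table):
    """System names, most efficient first."""
    return sorted(table, key=table.get, reverse=True)


def power_watts(flops, mflops_per_watt):
    """Power needed to sustain `flops` at a given efficiency."""
    return flops / (mflops_per_watt * 1e6)


if __name__ == "__main__":
    print(efficiency_ranking(MFLOPS_PER_WATT))
    k_power = power_watts(8.2e15, MFLOPS_PER_WATT["K Computer"])
    print(f"K Computer: {k_power / 1e6:.1f} MW")
    for name in ("K Computer", "IBM BlueGene/Q Prototype"):
        p = power_watts(1e18, MFLOPS_PER_WATT[name])
        print(f"exaflop at {name} efficiency: {p / 1e6:,.0f} MW")
```

At 825 Mflops/watt, 8.2 petaflops works out to roughly 10 megawatts, and an exaflop would need on the order of 1.2 gigawatts. Even BlueGene/Q’s 1,680 Mflops/watt would imply about 600 megawatts, which illustrates Simon’s point about costs.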

Looking Ahead
Despite the headlines and U.S. senators’ statements, Dongarra and colleagues are quick to dismiss the supercomputing competition as a “race.” At the same time, Dongarra expects to see an increase in Top500 scores, and notes that several projects are aiming for the 10-petaflop target, which could be realized by the end of 2012. But the real prize is the exaflop, which the U.S. government, among others, hopes to achieve by 2020.

Matsuoka believes this goal is possible, but it will be “a very difficult target,” especially when compared with traditional expectations. “Look at Moore’s law,” he says. “Computers will get about 100 times faster in 10 years. But going from petascale to exascale in 10 years is a multiple of a thousand.” Having said that, he notes that it has been done before, twice. “We went from gigaflops in 1990 to teraflops in about 10 years, and then to petaflops in another 10 years. Extrapolating from this, we could go to exascale in the next 10 years.”
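Matsuoka’s comparison is about compound growth rates. The code below converts “100 times in 10 years” and “1,000 times in 10 years” into annual growth factors and doubling times. The function names are illustrative.

```python
import math


def annual_factor(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor."""
    return total_factor ** (1 / years)


def doubling_time_years(total_factor, years):
    """Years per doubling at the constant rate implied above."""
    return years * math.log(2) / math.log(total_factor)


if __name__ == "__main__":
    for label, factor in (("Moore's law pace", 100), ("peta- to exascale", 1000)):
        print(f"{label}: x{annual_factor(factor, 10):.2f} per year, "
              f"doubling every {doubling_time_years(factor, 10):.2f} years")
```

The first rate works out to a doubling roughly every 18 months. The second requires speed to roughly double every year, the pace implied by the gigaflops-to-teraflops-to-petaflops history Matsuoka cites.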
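
But Dongarra warns that we won’t reach that stage solely by focusing on hardware. “We need to ensure that the ecosystem has some balance in it. Major changes in the hardware will require major changes in the algorithms and software,” he says. “We’re looking at machines in the next few years that could potentially have billions of operations at once. How do we exploit billion-way parallelism?”

The payoffs could be enormous. Supercomputing is already widely used in fields as diverse as weather modeling, financial predictions, animation, fluid dynamics, and data searches. Each of these fields embodies several applications. By way of example, Matsuoka says, “You can’t do genomics without very large supercomputers. Because of genomics, we have new drugs, ways of diagnosing disease, and crime investigation techniques.” While exaflop computers will spawn now-unimagined uses, any current increases in speed as we race toward that goal will greatly benefit many existing applications.

Further Reading
Anderson, M. Better benchmarking for supercomputers. IEEE Spectrum 48, 1 (Jan. 2011).
Endo, T., Nukada, A., Matsuoka, S., and Maruyama, N. Linpack evaluation on a supercomputer with heterogeneous accelerators. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, April 19–23, 2010.
Miller, C. Most popular supercomputing videos. July 13, 2010. http://www.datacenterknowledge.com/most-popular-supercomputing-videos/
Top500 list. http://www.top500.org

Tom Geller is an Oberlin, OH-based science, technology, and business writer.

© 2011 ACM 0001-0782/11/08 $10.00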
