Supercomputing's Exaflop Target

Communications of the ACM | News | Technology | doi:10.1145/1978542.1978549

Tom Geller

The twin challenges of parallelism and energy consumption are enlivening supercomputers’ progress.

Caption: The Top500’s leading supercomputer, the K Computer, is more powerful than its five closest competitors combined.

Asia has come out swinging at the top of June 2011’s Top500 list, which rates the world’s fastest computers based on the LINPACK benchmark. Leading the list is the K Computer, which achieved 8.2 quadrillion floating-point operations per second (petaflops) to give Japan its first appearance in the much-coveted number-one position since November 2004. It knocks China’s Tianhe-1A, at 2.6 petaflops, to second place. The U.S.’s Jaguar (1.75 petaflops) was pushed from second to third place. China’s Nebulae (1.27 petaflops) dropped from third to fourth, and Japan’s Tsubame 2.0 (1.19 petaflops) slipped from fourth to fifth.

Asia’s success comes at a time when national governments are reconsidering the value of these ratings, and what defines supercomputing success. The U.S. President’s Council of Advisors on Science and Technology (PCAST) recently warned that a focus on such rankings could “crowd out … fundamental research in computer science and engineering.” Even Jack Dongarra, the list’s founder and a computer science professor at the University of Tennessee, believes the rankings need to be seen in a larger context. “You can’t judge a car solely by how many RPMs its engine can do,” he says, pointing to the Graph500 list and the HPC Challenge Benchmark as other sources of comparative supercomputing data.

Computer scientists are also skeptical about the value of such rankings. Petaflops are not the same as useful work; practical applications need to both preserve and take advantage of their power. Xuebin Chi, director of the Supercomputing Center at the Chinese Academy of Sciences, points out that “programming [for the CPU/GPU combinations now popular in supercomputing] is more difficult than for commodity CPUs,” and that “changing from sequential to parallel code is not easy.” He predicts that such a transition would take three to five years. But even with superb programming, real-world aspects of data delivery and error correction could reduce application speeds significantly below those reported by the Top500.

Regardless, the release of the November 2010 Top500 list, with China’s Tianhe-1A in first place, spurred political discussions about national commitment to high-performance computing (HPC) throughout the world. In the U.S., 12 senators cited Tianhe-1A in a letter to President Obama warning that “the race is on” to develop supercomputers capable of 1,000 petaflops (1 exaflop). In asking for funding, they wrote that “Our global competitors in Asia and Europe are already at work on exascale computing technology … we cannot afford to risk our leadership position in computational sciences.”

David Kahaner, founding director of the Asian Technology Information Program, believes that Tianhe-1A is the leading edge of a Chinese push not only to increase supercomputing speeds, but also to domesticate production. “It represents a real commitment from the Chinese government to develop supercomputing and the infrastructure to support it,” he says. “A Chinese domestic HPC ecosystem is evolving. Domestic components are being developed and incorporated, and their use is likely to increase.” Dongarra agrees, noting, “The rate at which they’re doing that is something we’ve not seen before with other countries.”

CPUs vs. CPU/GPU Hybrids

If the June Top500 list had been a foot race, the K Computer would have lapped the competition. At 8.2 petaflops, it wields more power than the next five supercomputers combined. The K Computer’s name alludes to the Japanese word “Kei” for 10 quadrillion, and represents the researchers’ desired performance goal of 10 petaflops.

Aside from national aspirations, the Top500 list reveals technical trends in HPC research worldwide. Most notable is an increased use of general-purpose graphics processing units (GPUs) in a hybrid configuration with CPUs. The June 2011 list includes 19 supercomputers that use GPU technology; the June 2010 list contained just 10.

GPUs contain many more cores than CPUs, allowing them to perform a larger number of calculations in parallel. While originally used for graphics tasks, such as rendering every pixel in an image, GPUs are increasingly applied to a wide variety of data-intensive calculations. “If you peek a little bit further into graphics problems, they look a lot like supercomputing problems,” says Sumit Gupta, manager of Tesla Products at NVIDIA. “Modeling graphics is the same as modeling molecule movement in a chemical process.”

In today’s supercomputers, GPUs provide the brute calculation power, but rely heavily on CPUs for other tasks. For example, the number-two Tianhe-1A contains two six-core Intel Xeon X5670 CPUs for each 448-core Tesla M2050 GPU (14,336 CPUs to 7,168 GPUs); it also contains a much smaller number (2,048) of eight-core, Chinese-built Feiteng CPUs. Altogether, the GPUs in Tianhe-1A contribute approximately three million cores, 30 times as many as are in its CPUs.

But speed is not simply a matter of throwing more cores into the mix, as it is not easy to extract all of their processing power. First, data must be queued and managed to feed the cores, and the results must be put together when they come out. “You need the CPU to drive the GPU,” Chi explains. “If your problem can’t be fit into the GPU itself, data will need to move frequently between the two, hurting performance.” Dongarra corroborates this, saying, “The speed of moving data to the GPU and the speed of computing it once it’s there are so mismatched that the GPU must do many computations with it before you see benefits.”

The K Computer, though, bucks this trend. Unlike Tianhe-1A and other recent large supercomputers, it does not use GPUs or accelerators. The K Computer uses 68,544 SPARC64 VIIIfx CPUs, each with eight cores, for a total of 548,352 cores. And the Japanese engineers plan to boost the K Computer’s power by increasing the number of its circuit board-filled cabinets from 672 to 800 in the near future.

Top500 List, June 2011

Rank  Computer      Nation
1     K Computer    Japan
2     Tianhe-1A     China
3     Jaguar        U.S.
4     Nebulae       China
5     TSUBAME 2.0   Japan
6     Cielo         U.S.
7     Pleiades      U.S.
8     Hopper        U.S.
9     Tera-100      France
10    Roadrunner    U.S.

Source: http://www.top500.org

Programming: ACM-ICPC World Finals

Students from Zhejiang University won the 2011 ACM International Collegiate Programming Contest World Finals in Orlando, FL, in May, beating student teams from 104 universities in the IBM-sponsored competition. En route to its championship, the Zhejiang University team used IBM’s large-scale analytics and cloud computing technologies to solve eight out of 11 programming problems in five hours. The only other team to solve eight problems, the University of Michigan, won a gold medal for finishing second.

The University of Michigan team’s success was due to several factors, says coach Kevin Compton, associate professor in the school’s electrical engineering and computer science department. “First, they are very talented programmers and problem solvers,” he says. “A couple of them began entering programming competitions in high school.” In addition, between the regional contests, held in October 2010, and the World Finals, the Michigan team practiced intensely.

The Michigan team also created an efficient strategy for solving problems, with team members having clearly defined roles, Compton says. “Each member had a fairly specialized role, either as a coder or debugger and tester,” he says. “By following this strategy, they were able to avoid getting bogged down on a particular problem.”

Tsinghua University and St. Petersburg University finished in third and fourth place, respectively, winning gold medals. The other top 12 finalists, each of which received a silver medal, were Nizhny Novgorod State University (5th), Saratov State University (6th), Friedrich-Alexander-University Erlangen-Nuremberg (7th), Donetsk National University (8th), Jagiellonian University in Krakow (9th), Moscow State University (10th), Ural State University (11th), and University of Waterloo (12th).

Bob Violino

Asia’s Ascension

Asian researchers appear to be well positioned to exploit GPUs for massively parallel supercomputing. Kahaner believes China’s relative isolation from Western influences may have led to economics that favor such innovations. “They’re not so tightly connected with U.S. vendors who have their own perception of things,” he says. “Potential bang for the buck is very strong in Asia, especially in places like China or India, which are very price-sensitive markets.”

… especially when compared with traditional expectations. “Look at Moore’s law,” he says. “Computers will get about 100 times faster in 10 years. But going from petascale to exascale in 10 years is a multiple of a thousand.” Having said that, he notes that it has been done before, twice. “We went from gigaflops in 1990 to teraflops in about 10 years, and …
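The hardware and growth figures quoted in the article can be checked with a few lines of arithmetic. The Python sketch below multiplies chip counts by cores per chip for Tianhe-1A and the K Computer, and converts the Moore’s-law comparison (“about 100 times faster in 10 years” versus “a multiple of a thousand” for petascale to exascale) into an implied yearly growth rate. The helper names `total_cores`, `annual_factor`, and `doubling_time_years` are illustrative, not from the article; the growth helpers assume smooth exponential growth, and the K Computer chip count of 68,544 is the figure implied by 548,352 cores at eight cores per chip.

```python
"""Recompute the core counts and growth rates quoted in the article."""
import math


def total_cores(parts):
    """Sum count * cores_per_chip over (count, cores_per_chip) pairs."""
    return sum(count * cores for count, cores in parts)


def annual_factor(total_speedup, years):
    """Yearly growth factor implied by a total speedup over `years` years."""
    return total_speedup ** (1 / years)


def doubling_time_years(total_speedup, years):
    """Years per doubling, assuming smooth exponential growth."""
    return years * math.log(2) / math.log(total_speedup)


# Tianhe-1A: 7,168 Tesla M2050 GPUs with 448 cores each.
tianhe_gpu_cores = total_cores([(7_168, 448)])
# Tianhe-1A CPUs: 14,336 six-core Xeon X5670 plus 2,048 eight-core Feiteng.
tianhe_cpu_cores = total_cores([(14_336, 6), (2_048, 8)])
gpu_to_cpu_ratio = tianhe_gpu_cores / tianhe_cpu_cores

# K Computer: 68,544 eight-core SPARC64 VIIIfx chips (548,352 / 8).
k_cores = total_cores([(68_544, 8)])

if __name__ == "__main__":
    print(f"Tianhe-1A GPU cores: {tianhe_gpu_cores:,}")   # 3,211,264
    print(f"Tianhe-1A CPU cores: {tianhe_cpu_cores:,}")   # 102,400
    print(f"GPU/CPU core ratio: {gpu_to_cpu_ratio:.1f}")  # 31.4
    print(f"K Computer cores: {k_cores:,}")               # 548,352
    for speedup in (100, 1000):
        print(f"{speedup}x per decade: {annual_factor(speedup, 10):.2f}x per year, "
              f"doubling every {doubling_time_years(speedup, 10):.2f} years")
```

The GPU-to-CPU core ratio comes out near 31, consistent with the article’s “30 times,” and a thousandfold gain in a decade means performance would have to roughly double every year rather than about every year and a half.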
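Chi’s and Dongarra’s point about data movement can be made concrete with a deliberately simplified cost model. The model is an assumption of this sketch, not anything the article describes: offloading work to the GPU pays off only when the compute time saved exceeds the time spent moving data over the host-to-GPU link. Under that model the break-even arithmetic intensity (flops per byte transferred) is (1/B) / (1/C - 1/G) for link bandwidth B and CPU and GPU rates C and G. The rates below are hypothetical round numbers, not specifications of any machine on the Top500 list, and the model ignores overlapping transfer with compute, copying results back, and CPU time spent driving the GPU.

```python
"""Toy CPU-vs-GPU offload model (hypothetical parameters, simplified)."""


def offload_speedup(work_flops, bytes_moved, cpu_flops, gpu_flops, link_bytes_per_s):
    """Ratio of CPU-only time to (transfer + GPU compute) time."""
    t_cpu = work_flops / cpu_flops
    t_gpu = bytes_moved / link_bytes_per_s + work_flops / gpu_flops
    return t_cpu / t_gpu


def break_even_intensity(cpu_flops, gpu_flops, link_bytes_per_s):
    """Flops per transferred byte at which offloading merely breaks even."""
    if gpu_flops <= cpu_flops:
        raise ValueError("offload cannot pay off unless the GPU is faster")
    return (1 / link_bytes_per_s) / (1 / cpu_flops - 1 / gpu_flops)


# Hypothetical rates chosen only for illustration.
CPU = 100e9   # CPU: 100 Gflop/s
GPU = 500e9   # GPU: 500 Gflop/s
LINK = 8e9    # host-to-GPU link: 8 GB/s

if __name__ == "__main__":
    print(f"break-even: {break_even_intensity(CPU, GPU, LINK):.1f} flops/byte")
    data = 1e9  # move 1 GB to the GPU
    for intensity in (1, 16, 100, 1000):
        s = offload_speedup(intensity * data, data, CPU, GPU, LINK)
        print(f"{intensity:>5} flops/byte -> speedup {s:.2f}")
```

With these made-up numbers, each byte moved must be worth roughly 15.6 flops of work before the GPU wins at all, and the speedup only approaches (never reaches) the 5x ratio of raw compute rates as the work per byte grows, which is the mismatch Dongarra describes.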
