Sunway Taihulight Supercomputer Makes Its Appearance Jack Dongarra

RESEARCH HIGHLIGHTS Dongarra 265 The OX-ZEO process, with its super- Mingyuan He1,2 REFERENCE selectivity, low H /CO ratio in the syn- 1School of Chemistry and Molecular Engineering, 2 1. Jiao F, Li J and Pan X et al. Science 2016; 351: gas feedstock and rather mild operating East China Normal University, China 1065–8. conditions, is a very attractive system and 2Research Institute of Petroleum Processing, National Science Review shows promise with regard to eventual in- SINOPEC, China 3: 264–265, 2016 dustrial applications. E-mail: [email protected] doi: 10.1093/nsr/nww038 COMPUTER SCIENCE Sunway TaihuLight supercomputer makes its appearance Jack Dongarra The Sunway TaihuLight System1 [ ]is of data from memory on the SW26010 four Management Processing Elements very impressive with over 10 million is 22.4 Flops(DP)/Byte transfer, which (MPEs), four Computing Processing El- cores and a peak performance of just over shows an imbalance or an overcapacity of ements (CPEs) (a total of 260 cores), 125 Pflop/s. The Top500 list has the floating point operations per data trans- four memory controllers (MC), and a Sunway TaihuLight as the fastest com- fer from memory. By comparison the In- Network on Chip connected to the Sys- puter. The Sunway is almost three times tel Knights Landing processor with 7.2 tem Interface. Each of the four MPE, (2.75 times) as fast and three times as ef- Flops(DP)/Byte transfer. So for many CPE, and MC have access to 8 GB of ficient as the system it displaces inthe ‘real’ applications the performance on the DDR3 memory, see Fig. 1.Thereare number one spot. In fact, the sum of TaihuLight will be nowhere near the peak 40 960 nodes in the complete system. the performance for computers in posi- performance rate. Also the primary mem- The Sunway TaihuLight System is us- tion 2 through 7 is just barely greater ory for this system is on low side at 1.3 PB ing Sunway Raise OS 2.0.5 based on than the performance of the Sunway sys- (Tianhe-2 has 1.4 PB and Cray Titan sys- Linux as the operating system. The basic tem. The HPL Benchmark results at93 tem at Oak Ridge National Laboratory software stack for the many-core proces- Pflop/s or 74% of theoretical peak per- has .71 PB). sor includes basic compiler components, formance is impressive, with an efficiency The Sunway TaihuLight system, such as C/C++ and Fortran compilers, of 6 Gflops per Watt; this is the high- based on a homegrown processor, an automatic vectorization tool, and basic est efficiency of all the top computers. demonstrates the significant progress math libraries. There is also the Sunway The HPCG benchmark performance at that China has made in the domain of OpenACC, a customized parallel compi- only 0.3% of peak performance shows the designing and manufacturing large-scale lation tool that supports OpenACC 2.0 weakness of the Sunway TaihuLight ar- computation systems. A computer node syntax and targets the SW26010 many- chitecture with slow memory and mod- of this machine is based on one many- core processor. est interconnect performance. The ra- core processor chip called the SW26010 The Gordon Bell Prize [2]isawarded tio of floating point operations per byte processor. Each processor is composed of each year to recognize outstanding Figure 1. Basic layout of a node, the SW26010. 266 Natl Sci Rev, 2016, Vol. 3, No. 3 RESEARCH HIGHLIGHTS achievement in high-performance com- will develop an exascale computer during puting. The purpose of the award isto the 13th Five-Year-Plan period (2016– track the progress over time of parallel 2020). It is clear that they are on a path computing, with particular emphasis on which will take them to an exascale com- rewarding innovation in applying high- puter by 2020, well ahead of the US plans performance computing to applications for reaching exascale by 2023. in science, engineering, and large-scale data analytics. Financial support of the Jack Dongarra $10 000 award is provided by Gordon Innovative Computing Laboratory, EECS Bell, a pioneer in high-performance and Figure 2. Distribution of supercomputers. Department, University of Tennessee, USA parallel computing. There are three sub- E-mail: [email protected] missions which are finalists for the Gor- running real applications and not just a don Bell Award at SC16 that are based ‘stunt machine’. on the new Sunway TaihuLight system. China has made a big push into These three applications are as follows: high-performance computing. In 2001, REFERENCES (1) a fully implicit non-hydrostatic dy- there were no supercomputers listed on 1. Fu HH, Liao JF and Yang JZ et al. Sci China Inf namic solver for cloud-resolving atmo- the Top500 list [3] in China. Today Sci 2016; 59: 072001; doi:10.1007/s11432-016- spheric simulation; (2) a highly effective China has 167 systems on the June 2016 5588-7. global surface wave numerical simulation Top500 list compared to 165 systems 2. Gordon Bell Prize. https://en.wikipedia.org/ with ultrahigh resolution; and (3) large- in the USA, see Fig. 2. China has one- wiki/Gordon Bell Prize scale phase-field simulation for coars- third of the systems, while the num- 3. Top500 List. https://www.top500.org/lists/2016 ening dynamics based on Cahn-Hilliard ber of systems in the USA has fallen to /06/ equation with degenerated mobility. The the lowest point since the TOP500 list fact that there are sizeable applications was created. No other nation has seen and Gordon Bell contender applications such rapid growth. According to the Chi- National Science Review running on the system is impressive and nese national plan for the next generation 3: 265–266, 2016 shows that the system is capable of of high-performance computers, China doi: 10.1093/nsr/nww044.

Load more