Introduction of Fujitsu's Next-Generation Supercomputer

Introduction of Fujitsu’s next-generation supercomputer MATSUMOTO Takayuki July 16, 2014 HPC Platform Solutions Fujitsu has a long history of supercomputing over 30 years Technologies and experience of providing Post-FX10 several types of supercomputers such as vector, MPP and cluster, are inherited PRIMEHPC FX10 and implemented in latest supercomputer product line No.1 in Top500 K computer (June and Nov., 2011) FX1 World’s Fastest Vector Processor (1999) Highest Performance VPP5000 SPARC efficiency in Top500 NWT* Enterprise (Nov. 2008) Developed with NAL No.1 in Top500 PRIMEQUEST VPP300/700 (Nov. 1993) PRIMEPOWER PRIMERGY CX400 Gordon Bell Prize HPC2500 Skinless server (1994, 95, 96) ⒸJAXA VPP500 World’s Most Scalable PRIMERGY VP Series Supercomputer BX900 Cluster node AP3000 (2003) HX600 Cluster node F230-75APU AP1000 PRIMERGY RX200 Cluster node Japan’s Largest Cluster in Top500 Japan’s First (July 2004) Vector (Array) *NWT: Supercomputer Numerical Wind Tunnel (1977) 1 Copyright 2014 FUJITSU LIMITED K computer and Fujitsu PRIMEHPC series Single CPU/node architecture for multicore Good Bytes/flop and scalability Key technologies for massively parallel supercomputers Original CPU and interconnect Support for tens of millions of cores (VISIMPACT*, Collective comm. HW) VISIMPACT SIMD extension HPC-ACE HPC-ACE HPC-ACE2 Collective comm. HW Direct network Tofu Direct network Tofu Tofu interconnect 2 HMC & Optical connections CY2008~ CY2010~ CY2012~ Coming soon 40GF, 4-core/CPU 128GF, 8-core/CPU 236.5GF, 16-core/CPU 1TF~, 32-core/CPU * VISIMPACT: Virtual Single Processor by Integrated Multi-core Parallel Architecture 2 Copyright 2014 FUJITSU LIMITED Architecture continuity for compatibility Upper compatible CPU: Binary-compatible with the K computer & PRIMEHPC FX10 Good byte/flop balance New features: New instructions (stride load/store, indirect load/store, permutation, concatenation) Improved micro architecture (out-of-order, branch-prediction, etc.) For distributed parallel executions: Compatible interconnect architecture Improved interconnect bandwidth 3 Copyright 2014 FUJITSU LIMITED The K computer and the evolution of PRIMEHPC K computer PRIMEHPC Post-FX10 FX10 CPU SPARC64 VIIIfx SPARC64 IXfx SPARC64 XIfx Peak perf. 128 GFLOPS 236.5 GFLOPS 1TFLOPS ~ # of cores 8 16 32 + 2 Memory DDR3 SDRAM ← HMC Interconnect Tofu Interconnect ← Tofu Interconnect 2 System size 11PFLOPS Max. 23PFLOPS Max. 100PFLOPS Link BW 5GB/s x bidirectional ← 12.5GB/s x bidirectional 4 Copyright 2014 FUJITSU LIMITED Feature and Configuration of Post-FX10 Fujitsu designed Chassis TM SPARC64 XIfx 1 CPU/1 node 1TF~(DP)/2TF~(SP) 12 nodes/2U Chassis 32 + 2 core CPU Water cooled HPC-ACE2 support Tofu2 integrated Cabinet 200~ nodes/cabinet High-density 100% water cooled CPU Memory Board with EXCU (option) Tofu Interconnect 2 Three CPUs 12.5 GB/s×2(in/out)/link 3 x 8 Micron’s HMCs 10 links/node 8 Finisar's opt modules, BOA, for Optical technology inter-chassis connections 5 Copyright 2014 FUJITSU LIMITED Flexible SIMD operations New 256bit wide SIMD functions enable versatile operations Four double-precision calculations Stride load/store, Indirect (list) load/store, Permutation, Concatenation SIMD Stride load Indirect load Permutation Double precision reg S reg S Memory reg S1 Specified stride Memory reg S2 128 Arbitrary shuffle regs simultaneous calculation reg D reg D reg D reg D 6 Copyright 2014 FUJITSU LIMITED Tofu Interconnect 2 Successor to Tofu Interconnect Highly scalable, 6-dimensional mesh/torus topology Increased link bandwidth by 2.5 times to 12.5 GB/s Interconnect integrated into CPU System-on-chip (SoC) removes off-chip I/O Improved packaging density and energy efficiency Optical cable connection between chassis Scalable three-dimensional torus Three-dimensional torus unit Well-balanced shape available 2×3×2 7 Copyright 2014 FUJITSU LIMITED Entire software stack is enhanced for Post-FX10 Applications HPC Portal / System Management Portal Technical Computing Suite System Management High Performance Automatic parallelization compiler File System System management Fortran System control FEFS C System monitoring C++ System operation support Tools and math libraries Lustre based high performance distributed Programming support tools file system Mathematical libraries Job Management High scalability, high reliability and Job manager availability Parallel languages and libraries Job scheduler Resource management Parallel OpenMP MPI XPFortran Linux based OS (enhanced for FX series) PRIMEHPC FX series 8 Copyright 2014 FUJITSU LIMITED HPC Platform Solutions Fujitsu has a long history of supercomputing since early 1980 Technologies and experience of providing Post-FX10 several types of supercomputers such as SPARC64 XIfx Exascale vector, MPP and cluster, are inherited PRIMEHPC FX10 and implemented in latestSPARC64 IXfx supercomputer product line No.1 in Top500 Post-FX10 (June and Nov., 2011) K computer FX1 SPARC64 VIIIfx World’s Fastest Vector ProcessorFUJITSU (1999) Highest Performance SupercomputerVPP5000 SPARC efficiency in Top500 NWT* PRIMEHPC FX10Enterprise (Nov. 2008) Developed with NAL 100Pflops class No.1 in Top500 PRIMEQUEST VPP300/700 (Nov. 1993) PRIMEPOWER PRIMERGY CX400 Gordon Bell Prize HPC2500 Skinless server K (1994,computer 95, 96) ⒸJAXA VPP500 World’s Most Scalable PRIMERGY VP Series Supercomputer BX900 20Pflops class Cluster node AP3000 (2003) HX600 Cluster node F230-75APU 10Pflops class AP1000 PRIMERGY RX200 Cluster node Japan’s Largest Cluster in Top500 Japan’s First (July 2004) Vector (Array) *NWT: Supercomputer Numerical Wind Tunnel (1977) 9 Copyright 2014 FUJITSU LIMITED Open a bright future with Technical Computing 10 Copyright 2014 FUJITSU LIMITED .

Introduction of Fujitsu's Next-Generation Supercomputer

Xinya (Leah) Zhao Abdulahi Abu Outline

Fujitsu PRIMEHPC FX10 Supercomputer

Supercomputer Fugaku

Masahori Nunami (NIFS)

Computer Architectures an Overview

Programming on K Computer

40 Jahre Erfolgsgeschichte BS2000 Vortrag Von 2014

Toward Automated Cache Partitioning for the K Computer

FX10 Supercomputer System (Oakleaf-FX) with 1.13 PFLOPS Started Operation at the University of Tokyo

Fujitsu Standard Tool

Primehpc Fx10

World's Fastest Computer