Supercomputer “Fugaku” Formerly known as Post-K
Copyright 2019 FUJITSU LIMITED
Supercomputer Fugaku Project
RIKEN and Fujitsu are currently developing Japan's next-generation flagship supercomputer, the successor to the K computer, as the most advanced general-purpose supercomputer in the world
[Figure: Fujitsu supercomputer lineage from K computer through PRIMEHPC FX10 and PRIMEHPC FX100 to Supercomputer Fugaku, with labels No.1 (2017), Finalist (2016) and No.1 (2018). © RIKEN]
RIKEN and Fujitsu announced that manufacturing started in March 2019
RIKEN announced on May 23, 2019 that the supercomputer is named “Fugaku”*
*Formerly known as Post-K
Goals and Approaches for Fugaku
Goals
• High application performance
• Good usability and wide range of uses
• Keeping application compatibility
RIKEN announced predicted performance:
• More than 100x faster than K computer for GENESIS and NICAM+LETKF
• Geometric mean of speedup over K computer in 9 Priority Issues is greater than 37x
https://postk-web.r-ccs.riken.jp/perf.html
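The 37x figure is a geometric mean, which weights each application's speedup multiplicatively so that one very large speedup does not dominate the summary. A minimal sketch of that calculation follows; the function name and the sample speedups are hypothetical and are not RIKEN's per-application figures.

```python
import math

def geometric_mean(speedups):
    """Return the geometric mean of a list of positive speedup factors."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    # Averaging logarithms is equivalent to taking the n-th root of the product
    # and avoids overflow when many large factors are multiplied.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Hypothetical speedups for three applications (illustration only).
example_speedups = [10.0, 40.0, 160.0]
print(geometric_mean(example_speedups))
```

For these sample values the geometric mean is 40, while the arithmetic mean would be 70, which shows why the geometric mean is the more conservative summary for speedup ratios.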
Approaches
Develop:
1. High-performance Arm CPU A64FX in HPC and AI areas
2. Cutting-edge hardware design
3. System software stack
Achieve:
- High performance in real applications
- High efficiency in key features for AI applications
1. High-Performance Arm CPU A64FX in HPC and AI Areas
Architecture features
ISA: Armv8.2-A (AArch64 only), SVE (Scalable Vector Extension)
SIMD width: 512-bit
Precision: FP64/32/16, INT64/32/16/8
Cores: 48 computing cores + 4 assistant cores (4 CMGs)
Memory: HBM2, peak B/W 1,024 GB/s
Interconnect: TofuD, 28 Gbps x 2 lanes x 10 ports

CMG (Core Memory Group) specification
Cores: 13
L2 Cache: 8 MiB
Memory: 8 GiB, 256 GB/s

I/O
TofuD: 28 Gbps x 2 lanes x 10 ports
PCIe: Gen3 16 lanes

[Figure: A64FX block diagram with four CMGs, each attached to HBM2, linked by a Network on Chip to the TofuD controller and the PCIe controller]

Peak performance (chip level), in TOPS
Element size                  64 bits (HPC)   32 bits (HPC)   16 bits (AI)   8 bits (AI)
A64FX (Fugaku)                2.7+            5.4+            10.8+          21.6+
SPARC64 VIIIfx (K computer)   0.128           0.128           N/A            N/A
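The chip-level peak figures double each time the element size halves, which is what a fixed 512-bit SIMD width would predict. The sketch below checks that relationship and the aggregate memory bandwidth. It assumes that peak throughput scales with the number of elements per SIMD register and that the chip bandwidth is the sum over the four CMGs; both are interpretations of the slide, and the helper names are hypothetical.

```python
SIMD_BITS = 512          # SIMD width from the slide
PEAK_64BIT_TOPS = 2.7    # chip-level 64-bit figure from the slide (a lower bound, "2.7+")
CMGS = 4                 # Core Memory Groups per chip
CMG_BW_GBS = 256         # memory bandwidth per CMG in GB/s

def lanes(element_bits):
    """Number of elements that fit in one 512-bit SIMD register."""
    return SIMD_BITS // element_bits

def scaled_peak_tops(element_bits):
    """Peak estimate, assuming throughput scales with lane count (an assumption)."""
    return PEAK_64BIT_TOPS * lanes(element_bits) / lanes(64)

for bits in (64, 32, 16, 8):
    print(f"{bits:2d}-bit: {lanes(bits):2d} lanes, about {scaled_peak_tops(bits):.1f} TOPS")

# Aggregate HBM2 bandwidth, assuming the per-CMG figures simply add up.
print(CMGS * CMG_BW_GBS, "GB/s")
```

Under these assumptions the estimates reproduce the 2.7, 5.4, 10.8 and 21.6 values in the chart, and four CMGs at 256 GB/s give the 1,024 GB/s chip-level bandwidth.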
[Figure: A64FX die layout showing the cores and L2 caches of the four CMGs, the HBM2 interfaces, the TofuD interface and the PCIe interface, connected by a ring bus]
2. Cutting-edge Hardware Design
Scalable design

Configuration   Nodes    Performance [Flops]
CPU             1        2.7 T+
CMU             2        5.4 T+
BoB             16       43 T+
Shelf           48       129 T+
Rack            384      1 P+
Fugaku          150k+    400 P+

Footprint per 1 PFlops
          K computer                            Fugaku
Racks     80x compute racks & 20x disk racks    1x rack including SSDs
Area      128 m² (4 m x 32 m)                   1.1 m² (0.8 m x 1.4 m)
Nodes     8,160                                 384

CMU (CPU Memory Unit)
• 100% direct water cooling
• Single-sided blind mate connectors for water and electrical signals
• 3x QSFP for AOC (Active Optical Cables)

[Figure: CMU labelled with AOC QSFP28 (X), AOC QSFP28 (Y), AOC QSFP28 (Z), TofuD connector, TofuD cables, water coupler (IN/OUT) and PCIe connector. © RIKEN]
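The scalable-design figures on this slide can be cross-checked with simple arithmetic. The sketch below multiplies the node count of each packaging level by the per-CPU figure and compares the footprints quoted for about 1 PFlops. It assumes that peak performance scales linearly with node count; the 0.128 TFlops per K computer node is taken from the chip-level chart for SPARC64 VIIIfx, and the variable names are hypothetical.

```python
NODE_PEAK_TFLOPS = 2.7   # one A64FX CPU (one node), a lower bound ("2.7 T+")

# Nodes per packaging level, as listed on the slide.
NODES = {"CPU": 1, "CMU": 2, "BoB": 16, "Shelf": 48, "Rack": 384}

def level_peak_tflops(nodes):
    """Peak of a packaging level, assuming linear scaling with node count."""
    return nodes * NODE_PEAK_TFLOPS

for name, nodes in NODES.items():
    print(f"{name:5s} {nodes:4d} nodes  {level_peak_tflops(nodes):7.1f} TFlops")

# Footprint comparison at about 1 PFlops (areas in square metres, from the slide).
K_AREA_M2, FUGAKU_AREA_M2 = 128.0, 1.1
K_NODES, FUGAKU_NODES = 8160, 384
K_NODE_PEAK_TFLOPS = 0.128  # SPARC64 VIIIfx chip-level figure

print("K computer peak for 8,160 nodes:", K_NODES * K_NODE_PEAK_TFLOPS, "TFlops")
print("Area ratio:", round(K_AREA_M2 / FUGAKU_AREA_M2))
print("Node-count ratio:", K_NODES / FUGAKU_NODES)
```

Under the linear-scaling assumption, one rack of 384 nodes slightly exceeds 1 PFlops, which matches the "1 P+" entry, and the K computer needs roughly two orders of magnitude more floor area for comparable peak performance.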
3. System Software Stack
Fujitsu is developing system software in collaboration with RIKEN.
Fujitsu Technical Computing Suite implements development and execution environments with great usability on large-scale systems.

Software stack, from top to bottom:

Fugaku applications

Fujitsu Technical Computing Suite / RIKEN developing system software
Management software
• System management for high availability & power saving operation
• Job management for higher system utilization & power efficiency
File system
• FEFS: Lustre-based distributed file system
• LLIO: NVM-based file I/O accelerator
Programming environment
• XcalableMP
• MPI (Open MPI, MPICH)
• OpenMP, Coarray
• Compilers (C, C++, Fortran)
• Math. libs.
• Debugging and tuning tools
The programming environment:
• Exploits hardware performance by compiler optimizations such as SVE vectorization
• Supports new programming language standards and data type FP16

Linux OS / McKernel (Lightweight kernel)

Fugaku system hardware
Achieve High Performance in Real Application
WRF: Weather Research and Forecasting model (v3.8.1)
• Vectorizing loops including IF statements is the key optimization
Himeno Benchmark (Fortran90, size: XL)
• Stencil calculation to solve Poisson’s equation by the Jacobi method
High memory B/W and long SIMD length work effectively
WRF v3.8.1 (48-hour, 12km, CONUS) on 48 cores
Performance ratio* (higher is better)
Skylake (Xeon Platinum 8168), 2 CPUs      1.00
A64FX, 1 CPU, with source tuning          1.56
* Normalized by the average elapsed time for timestep of Skylake

Himeno Benchmark (Fortran90)
GFlops (higher is better)
Skylake (Xeon Platinum 8168), 2 CPUs      85
FX100, 1 CPU                              103
A64FX, 1 CPU                              346
SX-Aurora TSUBASA, 1 VE†                  286
Tesla V100, 1 GPU†                        305
† Performance evaluation of a vector supercomputer SX-aurora TSUBASA, https://dl.acm.org/citation.cfm?id=3291728
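To illustrate the kind of kernel the Himeno Benchmark exercises, the following is a minimal sketch of the Jacobi method for Poisson's equation. It is a simplified 2D 5-point version with zero boundary values, not the Himeno kernel itself, and the function name and grid size are hypothetical.

```python
def jacobi_poisson_2d(f, h, iterations):
    """Approximate the solution of -laplace(u) = f on a square grid, u = 0 on the boundary."""
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # 5-point stencil: every update reads only the previous iterate,
                # so all interior points are independent of each other.
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1]
                                    + h * h * f[i][j])
        u = new
    return u

n = 9                      # grid points per side, including the boundary
h = 1.0 / (n - 1)          # grid spacing on the unit square
f = [[1.0] * n for _ in range(n)]
u = jacobi_poisson_2d(f, h, iterations=500)
print(u[n // 2][n // 2])   # value at the centre of the grid
```

Each sweep performs only a few arithmetic operations per grid point while streaming through the whole array, and the inner loop has no dependence between points, which is why such stencils tend to benefit from high memory bandwidth and long SIMD vectors.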
Achieve High Efficiency in Key Features for AI Applications
High FP16 & INT8 peak performance and high memory peak B/W
• FP16 performance: 10.8+ TOPS, > 90%@HGEMM
• INT8 performance: 21.6+ TOPS in partial dot product
• Memory B/W: 1,024 GB/s, > 80%@STREAM Triad

[Figure: INT8 partial dot product. Four 8-bit elements A0 to A3 are multiplied element-wise by four 8-bit elements B0 to B3, and the products are combined into one 32-bit result C]
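The INT8 partial dot product in the figure can be emulated in a few lines. The sketch below multiplies four 8-bit pairs and adds the products to a 32-bit value. Treating the inputs as signed and adding into an existing accumulator are assumptions, since the slide only shows the operand widths, and the function names are hypothetical.

```python
def to_int8(x):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return ((x + 128) % 256) - 128

def to_int32(x):
    """Wrap an integer into the signed 32-bit range."""
    return ((x + 2**31) % 2**32) - 2**31

def partial_dot_int8(a, b, c=0):
    """Multiply four signed 8-bit pairs and add the products to a 32-bit accumulator c."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("a and b must each hold exactly four elements")
    total = c
    for ai, bi in zip(a, b):
        total += to_int8(ai) * to_int8(bi)
    return to_int32(total)

def dot_int8(a, b):
    """Dot product of two int8 vectors built from 4-element partial dot products."""
    if len(a) != len(b) or len(a) % 4 != 0:
        raise ValueError("vectors must have equal length that is a multiple of 4")
    acc = 0
    for k in range(0, len(a), 4):
        acc = partial_dot_int8(a[k:k + 4], b[k:k + 4], acc)
    return acc

print(partial_dot_int8([1, 2, 3, 4], [5, 6, 7, 8]))
```

Widening to 32 bits matters because even the extreme case of four products of -128 by -128 sums to 65,536, far beyond what an 8-bit or 16-bit signed result could hold.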
Functions contributing to key features in AI fields
A64FX CPU
• 2x 512-bit wide SIMD pipelines per core for FP16 and INT8
• High memory B/W and calculation throughput
Compilers & libraries
• Vectorization and software pipelining
• FP16 as data type of programming language (e.g., real (kind=2) in Fortran)
• Mathematical Library for HGEMM
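To give a feel for what an FP16 data type implies for numerical precision, the sketch below rounds values to IEEE 754 half precision using the half-precision format code of Python's standard `struct` module. It illustrates the number format only, not the Fujitsu compiler's `real (kind=2)` implementation, and the helper name is hypothetical.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision (binary16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_fp16(1.0))      # small integers and powers of two are exact
print(to_fp16(0.1))      # 0.1 is rounded to the nearest representable value
print(to_fp16(2049.0))   # above 2048, odd integers are no longer representable
```

With 11 significant bits, FP16 carries roughly three decimal digits, which is often sufficient for AI workloads but is a trade-off to be checked case by case.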
Future Plans
Supercomputer Fugaku
Operations starting around CY2021

[Figure: timeline from CY2011 to CY2021 with a "Today" marker. Fugaku: Basic Design; Design and Implementation; Manufacturing, Installation and Tuning; Operations. K computer: Development; Operations, ending Aug, 2019. © RIKEN]
Fujitsu HPC Products
Fujitsu will begin global sales of supercomputers based on the Supercomputer Fugaku technology in the 2nd half of FY2019