Post-K Computer Development

Copyright 2018 FUJITSU LIMITED

Fujitsu’s High-end HPC Development

 Fujitsu has provided HPC systems with original technologies, developed for over 40 years, to accelerate advanced research

K computer: The K computer continues to be competitive in various fields, from advanced research to manufacturing

Post-K computer: RIKEN and Fujitsu are developing the Post-K to achieve superior application performance

[Figure: Fujitsu HPC systems, including PRIMEHPC FX10, K computer, PRIMEHPC FX100, and Post-K computer. K computer badges: Gordon Bell Prize Finalist (2016), HPCG No.3 (2018), No.1 (2018).]

Japan’s Post-K Computer Development Project

 Project Overview
RIKEN and Fujitsu are currently developing the post-K computer, the most advanced general-purpose supercomputer in the world

 Project Goal and Approach
Goals:
• Application performance
• Low power consumption
• User convenience
• Ability to produce ground-breaking results
Approach:
1. High-performance CPU
2. Compilers for exploiting hardware performance
3. Lightweight Layered IO-Accelerator

1. High-performance CPU

 Microarchitecture using Fujitsu’s sophisticated technology
• High scalability with a Core Memory Group (“CMG”) configuration
• B/F ratio equivalent to that of K computer by using stacked memory
• Hardware “prefetch” that achieves high cache efficiency

Functions & Architecture          Post-K           K computer
Base ISA + SIMD Extensions        Armv8-A + SVE    SPARC-V9 + HPC-ACE
SIMD width [bit]                  512              128
# of Compute Cores per Socket     48               8
# of CMG per Socket               4                1
Memory B/F ratio                  0.4~0.5 (single value shown for both systems)

Features for High Application Performance
Hardware “Prefetch”               ✔ enhanced       ✔
Inter-core Barrier                ✔                ✔
Assistant Core                    ✔                -

[Figure: CPU block diagram. Labels: CMG, Memory, C, NOC, PCIe Controller, Tofu Interface. C: Compute core, CMG: Core Memory Group, NOC: Network on Chip]
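The table entries can be related with some simple arithmetic. The following is a minimal sketch, not taken from the slides: the function names are made up for illustration, and the bandwidth and peak-FLOPS inputs passed to `bf_ratio` are hypothetical placeholders rather than published Post-K figures. It only shows how SIMD width translates into double-precision lanes, how cores divide across CMGs, and how a B/F (bytes per FLOP) ratio is computed.

```python
# Illustrative arithmetic only. The bandwidth and FLOPS inputs used with
# bf_ratio() are hypothetical placeholders, not Post-K specifications.

def simd_lanes(simd_width_bits: int, element_bits: int = 64) -> int:
    """Number of elements one SIMD instruction processes (default: 64-bit doubles)."""
    return simd_width_bits // element_bits


def cores_per_cmg(cores_per_socket: int, cmgs_per_socket: int) -> int:
    """Compute cores in each Core Memory Group, assuming an even split."""
    return cores_per_socket // cmgs_per_socket


def bf_ratio(mem_bandwidth_gb_s: float, peak_gflops: float) -> float:
    """Bytes of memory bandwidth available per floating-point operation."""
    return mem_bandwidth_gb_s / peak_gflops


if __name__ == "__main__":
    print(simd_lanes(512))       # lanes for a 512-bit SIMD width (Post-K column)
    print(simd_lanes(128))       # lanes for a 128-bit SIMD width (K computer column)
    print(cores_per_cmg(48, 4))  # cores per CMG for 48 cores in 4 CMGs
    print(bf_ratio(1000.0, 2500.0))  # hypothetical inputs chosen to land at 0.4
```

With the table's values, a 512-bit vector holds four times as many doubles as a 128-bit one, and 48 cores over 4 CMGs gives 12 cores per CMG.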

2. Compilers for Exploiting Hardware Performance

 The compiler cooperates with hardware to improve performance

Improves memory access performance (memory bandwidth-intensive applications)
Compiler:
• Software Prefetch
• Loop-Blocking
Hardware:
• Hardware Prefetch
• Stacked Memory

Improves thread-parallel performance
Compiler:
• CMG-aware Math Library
• OpenMP 5.0 API & Fast Barrier
Hardware:
• 48 cores in 4 CMG (NUMA node)
• Inter-core Barrier

Improves computational performance (computation-intensive applications)
Compiler:
• Software Pipelining with Loop Fission
• Auto-Vectorization with SVE
Hardware:
• Out-of-Order
• 512-bit SVE
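Loop blocking, one of the compiler techniques listed above, can be illustrated by hand. The sketch below is not the Fujitsu compiler's output; it is a hand-written example with hypothetical function names and an arbitrary block size. It shows the shape of the transformation: the loop nest is tiled so that each tile of the operands is reused while it is still in cache, and the result is unchanged.

```python
# Hand-written illustration of loop blocking (tiling) on matrix multiply.
# Function names and the block size are assumptions for this sketch.

def matmul_naive(a, b):
    """Plain triple loop: c = a @ b for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same computation, with each loop split into block-sized tiles."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):            # tile over rows of a
        for kk in range(0, m, block):        # tile over the shared dimension
            for jj in range(0, p, block):    # tile over columns of b
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    print(matmul_blocked(a, b) == matmul_naive(a, b))
```

In Python the tiling brings no speedup; the point is only the loop structure. In compiled code the block size would normally be chosen to fit the cache level being targeted.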

3. Lightweight Layered IO-Accelerator (LLIO)

 Provides Fast & Stable I/O for Each Job
• Connects SSD close to compute nodes for fast shared scratch & shared cache of persistent storage
• Allocates job-dedicated SSDs for avoiding I/O conflicts
• Offloads data transfer to assistant cores

 Keeps High-Scalability
• Offloads I/O management tasks (ex. coherency, metadata handling) to assistant cores, reducing “OS jitter”

[Figure: Conceptual image of LLIO & Job Assignment. Labels: Job A, Job B, Node, Compute cores, Assistant cores, flash (LLIO), Efficiently connected by Tofu interconnect, IO Network, Persistent Storage]
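The idea of a fast layer acting as a cache in front of persistent storage can be sketched in a few lines. This is a conceptual toy only and does not represent LLIO's actual interface or implementation: the class name, its attributes, and the use of in-memory dictionaries as stand-ins for SSDs and persistent storage are all assumptions made for illustration.

```python
# Conceptual toy, not LLIO's API: a job-local fast layer that caches reads
# from a slower persistent layer. All names here are hypothetical.

class LayeredStore:
    """Two-layer store: a per-job cache in front of persistent storage."""

    def __init__(self, persistent):
        self.persistent = persistent  # dict standing in for persistent storage
        self.cache = {}               # dict standing in for a job-dedicated SSD
        self.persistent_reads = 0     # counts how often the slow layer is touched

    def read(self, path):
        # Serve from the fast layer when the data is already there.
        if path in self.cache:
            return self.cache[path]
        # Otherwise fetch from the slow layer once and keep a copy.
        self.persistent_reads += 1
        data = self.persistent[path]
        self.cache[path] = data
        return data


if __name__ == "__main__":
    store = LayeredStore({"input.dat": b"abc"})
    store.read("input.dat")
    store.read("input.dat")
    print(store.persistent_reads)  # the slow layer is read only on the first access
```

Because each job gets its own cache object in this sketch, one job's reads do not evict or contend with another's, which mirrors the motivation given above for job-dedicated SSDs.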

Post-K Hardware Features

 High-density mounting, shortening transmission distance between CPUs
• High-efficiency water cooling unit, saving space on the CPU memory unit (CMU)
• Several kinds of connectors are connected at the same time on the CMU, reducing rack space
• The back-to-back layout eliminates the work area to connect the CMUs

[Figure: Conventional connection vs. Post-K connection. Labels: Several kinds of connectors, High-efficiency water cooling unit, Cable box, CMU]

Post-K Current Status

 CPU powered-on
 Design verification testing

Development Proceeding on Schedule
