Post-K Supercomputer with Fujitsu's Original CPU, Powered by ARM

Post-K Supercomputer with Fujitsu’s Original CPU, Powered by ARM ISA Toshiyuki Shimizu June 27th, 2017 Vice President, System Development Division, Next Generation Technical Computing Unit FUJITSU LIMITED 0 Copyright 2017 FUJITSU LIMITED Outline Fujitsu’s HPC solutions Post-K development Goals and approach of Post-K Software and applications Summary 1 Copyright 2017 FUJITSU LIMITED Fujitsu Supercomputers Fujitsu has been providing high FX100 performance supercomputers for 40 years, increasing application performance while FX10 Post-K Under Development maintaining application compatibility w/ RIKEN No.1 in Top500 (June and Nov., 2011) K computer World’s Fastest FX1 Vector Processor (1999) Most Efficient Performance VPP5000 SPARC in Top500 (Nov. 2008) NWT* Enterprise Developed with NAL No.1 in Top500 PRIMEQUEST (Nov. 1993) VPP300/700 PRIMEPOWER PRIMERGY CX400 Gordon Bell Prize Skinless server (1994, 95, 96) ⒸJAXA HPC2500 VPP500 World’s Most Scalable PRIMERGY VP Series Supercomputer BX900 Cluster node AP3000 (2003) HX600 F230-75APU Cluster node AP1000 PRIMERGY RX200 Cluster node Japan’s Largest Japan’s First Cluster in Top500 Vector (Array) (July 2004) Supercomputer (1977) *NWT: Numerical Wind Tunnel 2 Copyright 2017 FUJITSU LIMITED Fujitsu HPC Solutions to Meet Customer Demands Providing supercomputers w/ Fujitsu-developed CPU & x86 clusters High performance, high scalability, and high reliability “Single system image” operation w/ Fujitsu system software Post-K is being developed to focus on high application performance and low power consumption State-of-art technologies & additional features for the future High scalability with Fujitsu-developed High © RIKEN CPU and -end K computer Post-K interconnect Co-developed with RIKEN PRIMEHPC FX10 PRIMEHPC FX100 Under Development w/ RIKEN Divisional x86 Cluster PRIMERGY x86 cluster systems Departmental support the latest CPUs and RX2530/RX2540 CX400 CX600 Workgroup accelerators 3 Copyright 2017 FUJITSU LIMITED Achievements with the K computer Prestigious Benchmark Awards TOP500: 10.5Pflops, 93% efficiency No. 7 *at SC16* 6 years from the HPCG: 602Tflops, 5.3% efficiency No. 1 initial delivery Graph500: 38.6TTEPS No. 1 HPC Challenge Class 1: No.1 in all categories No. 1 (1) Global HPL, (2) Global Random Access, (3) EP STREAM, (4) Global FFT Gordon Bell Prize Awards “First-Principles Calculation of Electron States of a Silicon Nanowire with 100,000 Atoms on the K computer” (2011) “4.45 Pflops Astrophysical N-Body Simulation on K Computer – The Gravitational Trillion-Body Problem” (2012) “Simulations of Below-Ground Dynamics of Fungi: 1.184 Pflops Attained by Automated Generation and Autotuning of Temporal Blocking Codes” (2016 finalist) 4 Copyright 2017 FUJITSU LIMITED Latest News from ISC17 Last Week K computer kept #1 positions for Graph500 & HPCG Post-K would be a very worthy successor to the K computer heritage 5 Copyright 2017 FUJITSU LIMITED Outline Fujitsu’s HPC solutions Post-K development Goals and approach of Post-K Software and applications Summary 6 Copyright 2017 FUJITSU LIMITED Post-K Goals and Approaches Post-K Goals Attains high application performance and good power efficiency Keeps application compatibility while advancing from predecessors Provides good usability and better accessibility for users Our Approaches Develops high performance and scalable CPU w/ custom designed CPU core Maintains performance balance for application compatibility & performance Adopts ARM ISA and its standard frameworks 7 Copyright 2017 FUJITSU LIMITED High & Scalable Performance Introduced in FX100 A scalable, many-core micro architecture concept, “SMaC” Assistant core for OS daemon and MPI offload to minimize OS Jitter Tofu interconnect enables the larger configuration w/ scalability Conceptual architecture of FX100 and beyond Single process multi-threading Single chip CPU compute node 12-node Tofu 6D mesh/torus Tofu single Linux on coherent memory unit interconnect Core Memory Group Compute Engine(CE) Core Core Core Hardware barrier Memory Memory Memory Memory Core Core Core CE Tofu Shared L2 $ Thin connection CE AC Dedicated memory controller Memory Memory Memory Memory Memory Memory Memory Memory Assistant core SMaC Scalable Architecture Tofu scalable interconnect 8 Copyright 2017 FUJITSU LIMITED Performance Balance Realized in FX100 End-to-end data bandwidth by SMaC Tofu interconnect is integrated into the CPU Conceptual architecture of FX100 and beyond Single process multi-threading Single chip CPU compute node 12-node Tofu 6D mesh/torus Tofu single Linux on coherent memory unit interconnect Core Memory Group Compute Engine(CE) Core Core Core Tofu is integrated Hardware barrier Memory Memory Memory Memory Core Core Core CE Tofu Shared L2 $ data end Thin connection - CE AC to Dedicated memory - controller Memorybandwidth Memory Memory Memory End Assistant core Memory Memory Memory Memory SMaC Scalable Architecture Enhanced Tofu interconnect 9 Copyright 2017 FUJITSU LIMITED Adopting ARM Standard Architecture ARMv8-A with SVE The latest HPC extension promises the high performance codes Co-operate with ARM communities and utilize OSS •Linaro, OpenHPC, OpenSFS, Open MPI, OpenMP, and etc. •Contribute our experiences w/ HPC applications and optimization Getting involved in the ARM HPC ecosystem by providing high performance machines in applications and low power consumption ARM’s standard frameworks, SBSA, SBBR, etc. Assure software compatibility among ARM platforms 10 Copyright 2017 FUJITSU LIMITED Post-K Supports a New Data Type: FP16 FP16 is one of the key features for further optimization opportunities regarding hardware, system software, and applications • Power consumption Reducing data bits & their movement • Bandwidth • Smaller data types resolve, directly Challenges • Cost Post-K CPU will support GPUs, Xeon Phi and Fujitsu processors will Packed vectors of: Modern • Half precision FP16 elements support: new data types Post-K Processors • 8 & 16 bit fixed-point elements • Existing numerical apps Mixed precision: Linear Algebra • Brand-new apps, incl. Relaxed precision: Molecular Dynamics Applications deep learning Half precision: Deep Learning 11 Copyright 2017 FUJITSU LIMITED Post-K Hardware Features Fujitsu CPU cores support the ARM SVE instruction set architecture Fujitsu CPU & Tofu maintain the programming models and provide high application performance FP16 (“giant vector throughput”) for supercomputers Functions & architecture Post-K FX100 FX10 K Instruction set architecture ARMv8-A SPARC V9 SIMD width 512bit 256bit 128bit 128bit CPU Core Double precision (64bit) ✔ ✔ ✔ ✔ Single precision (32bit) ✔ ✔ ✔ ✔ Half precision (16bit) ✔ - - - Interconnect Tofu interconnect Enhanced Tofu2 Tofu Tofu 12 Copyright 2017 FUJITSU LIMITED Outline Fujitsu’s HPC solutions Post-K development Goals and approach of Post-K Software and applications Summary 13 Copyright 2017 FUJITSU LIMITED Post-K Software Stack Valuable feedbacks through “co-design” from application R&D teams Post-K Applications (in next slides) FUJITSU Technical Computing Suite / RIKEN Advanced System Software Management Software Hierarchical File I/O Software Programming Environment System management XcalableMP for highly available & Application-oriented power saving operation file I/O middleware MPI (Open MPI, MPICH) OpenMP, COARRAY, Math Libs. Job management for Lustre-based higher system distributed file system Compilers (C, C++, Fortran) utilization & power FEFS efficiency Debugging and tuning tools Linux OS / McKernel (Lightweight Kernel) Post-K System Hardware Post-K Under Development w/ RIKEN 14 Copyright 2017 FUJITSU LIMITED Target science: 9 Priority Issues 重点課題① 生体分子システムの機能制御による革新的創薬基盤の構築重点課題② 個別化・予防医療を支援する統合計算生命科学重点課題③ 地震・津波による複合災害の ①Innovative Drug Discovery ②Personalized and Preventive ③Hazard統合的予測システムの構築 and Disaster induced by Medicine Earthquake and Tsunami Society with health and longevity Disaster prevention RIKEN Quant. Biology Center Inst. Medical Science, U. Tokyo Earthquakeand globalRes. Inst., climateU. Tokyo 重点課題⑧ 近未来型ものづくりを先導する革新的設計・製造プロセスの開発重点課題⑨ 宇宙の基本法則と進化の解明重点課題④ 観測ビッグデータを活用した ⑧ Innovative Design and Production ⑨Fundamental Laws and ④Environmental気象と地球環境の予測の高度化 Predictions with Processes for the Manufacturing Evolution of the Universe Observational Big Data Industry in the Near Future Basic science Industrial Cent. forcompetitivenessEarth Info., JAMSTEC Cent. for Comp. Science, U. Tsukuba Center for Earth Info., JAMSTEC 重点課題⑦ 次世代の産業を支える重点課題⑥ 革新的クリーンエネルギーシステムの実用化重点課題⑤ エネルギーの高効率な創出、変換・貯蔵、利用の新規基盤技術の開発 ⑦New新機能Functionalデバイス・高性能材料の創成 Devices and ⑥Innovative Clean Energy ⑤High-Efficiency Energy Creation, High-Performance Systems Conversion/Storage and Use Energy issues Inst. For Solid State Phys., U. Grad. Sch. Engineering, U. Tokyo Inst. Molecular Science, NINS 2017/06/22 GoingARM@ISC2017 15 Tokyo Presented by Dr. Sato Target science: Exploratory Issues 萌芽的課題⑪ 複数の社会経済現象の相互作用の Interactiveモデル構築とその応用研究 Models of Socio- Economic Phenomena and their Applications 萌芽的課題⑩ 基礎科学のフロンティア Frontiers－極限への挑戦 of Basic Science - challenge to extremes - 萌芽的課題⑫ 太陽系外惑星（第二の地球）の誕生と Formation太陽系内惑星環境変動の解明 of exo-planets (second Earth) and Environmental Changes of Solar Planets 萌芽的課題⑬ 思考を実現する神経回路機構の Mechanisms解明と人工知能への応用 of Neural Circuits for Human Thoughts and Artificial Intelligence 2017/06/22 GoingARM@ISC2017 16 Presented by Dr. Sato Summary Fujitsu’s HPC solutions Goals and approach of Post-K Software and applications Post-K Under Development w/ RIKEN 17 Copyright 2017 FUJITSU LIMITED 18 Copyright 2017 FUJITSU LIMITED.

Post-K Supercomputer with Fujitsu's Original CPU, Powered by ARM

Xinya (Leah) Zhao Abdulahi Abu Outline

Fujitsu PRIMEHPC FX10 Supercomputer

Supercomputer Fugaku

Masahori Nunami (NIFS)

Computer Architectures an Overview

Programming on K Computer

40 Jahre Erfolgsgeschichte BS2000 Vortrag Von 2014

Toward Automated Cache Partitioning for the K Computer

FX10 Supercomputer System (Oakleaf-FX) with 1.13 PFLOPS Started Operation at the University of Tokyo

Fujitsu Standard Tool

Primehpc Fx10

World's Fastest Computer