Looking Back on Supercomputer Fugaku Development Project
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
File System and Power Management Enhanced for Supercomputer Fugaku
File System and Power Management Enhanced for Supercomputer Fugaku Hideyuki Akimoto Takuya Okamoto Takahiro Kagami Ken Seki Kenichirou Sakai Hiroaki Imade Makoto Shinohara Shinji Sumimoto RIKEN and Fujitsu are jointly developing the supercomputer Fugaku as the successor to the K computer with a view to starting public use in FY2021. While inheriting the software assets of the K computer, the plan for Fugaku is to make improvements, upgrades, and functional enhancements in various areas such as computational performance, efficient use of resources, and ease of use. As part of these changes, functions in the file system have been greatly enhanced with a focus on usability in addition to improving performance and capacity beyond that of the K computer. Additionally, as reducing power consumption and using power efficiently are issues common to all ultra-large-scale computer systems, power management functions have been newly designed and developed as part of the up- grading of operations management software in Fugaku. This article describes the Fugaku file system featuring significantly enhanced functions from the K computer and introduces new power management functions. 1. Introduction application development environment are taken up in RIKEN and Fujitsu are developing the supercom- separate articles [1, 2], this article focuses on the file puter Fugaku as the successor to the K computer and system and operations management software. The are planning to begin public service in FY2021. file system provides a high-performance and reliable The Fugaku is composed of various kinds of storage environment for application programs and as- system software for supporting the execution of su- sociated data. -
Supercomputer Fugaku
Supercomputer Fugaku Toshiyuki Shimizu Feb. 18th, 2020 FUJITSU LIMITED Copyright 2020 FUJITSU LIMITED Outline ◼ Fugaku project overview ◼ Co-design ◼ Approach ◼ Design results ◼ Performance & energy consumption evaluation ◼ Green500 ◼ OSS apps ◼ Fugaku priority issues ◼ Summary 1 Copyright 2020 FUJITSU LIMITED Supercomputer “Fugaku”, formerly known as Post-K Focus Approach Application performance Co-design w/ application developers and Fujitsu-designed CPU core w/ high memory bandwidth utilizing HBM2 Leading-edge Si-technology, Fujitsu's proven low power & high Power efficiency performance logic design, and power-controlling knobs Arm®v8-A ISA with Scalable Vector Extension (“SVE”), and Arm standard Usability Linux 2 Copyright 2020 FUJITSU LIMITED Fugaku project schedule 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Fugaku development & delivery Manufacturing, Apps Basic Detailed design & General Feasibility study Installation review design Implementation operation and Tuning Select Architecture & Co-Design w/ apps groups apps sizing 3 Copyright 2020 FUJITSU LIMITED Fugaku co-design ◼ Co-design goals ◼ Obtain the best performance, 100x apps performance than K computer, within power budget, 30-40MW • Design applications, compilers, libraries, and hardware ◼ Approach ◼ Estimate perf & power using apps info, performance counts of Fujitsu FX100, and cycle base simulator • Computation time: brief & precise estimation • Communication time: bandwidth and latency for communication w/ some attributes for communication patterns • I/O time: ◼ Then, optimize apps/compilers etc. and resolve bottlenecks ◼ Estimation of performance and power ◼ Precise performance estimation for primary kernels • Make & run Fugaku objects on the Fugaku cycle base simulator ◼ Brief performance estimation for other sections • Replace performance counts of FX100 w/ Fugaku params: # of inst. commit/cycle, wait cycles of barrier, inst. -
Management Direction Update
• Thank you for taking the time out of your busy schedules today to participate in this briefing on our fiscal year 2020 financial results. • To prevent the spread of COVID-19 infections, we are holding today’s event online. We apologize for any inconvenience this may cause. • My presentation will offer an overview of the progress of our management direction. To meet the management targets we announced last year for FY2022, I will explain the initiatives we have undertaken in FY2020 and the areas on which we will focus this fiscal year and into the future. Copyright 2021 FUJITSU LIMITED 0 • It was just around a year ago that we first defined our Purpose: “to make the world more sustainable by building trust in society through innovation.” • Today, all of our activities are aligned to achieve this purpose. Copyright 2021 FUJITSU LIMITED 1 Copyright 2021 FUJITSU LIMITED 2 • First, I would like to provide a summary of our financial results for FY2020. • The upper part of the table shows company-wide figures for Fujitsu. Consolidated revenue was 3,589.7 billion yen, and operating profit was 266.3 billion yen. • In the lower part of the table, which shows the results in our core business of Technology Solutions, consolidated revenue was 3,043.6 billion yen, operating profit was 188.4 billion yen, and our operating profit margin was 6.2%. • Due to the impact of COVID-19, revenue fell, but because of our shift toward the services business and greater efficiencies, on a company-wide basis our operating profit and profit for the year were the highest in Fujitsu’s history. -
Integrated Report 2019 the Fujitsu Way CONTENTS
FUJITSU GROUP Integrated Report 2019 Report FUJITSU Integrated GROUP Fujitsu Group Integrated Report 2019 The Fujitsu Way CONTENTS The Fujitsu Way articulates the principles LETTER FROM THE MANAGEMENT 02 of the Fujitsu Group’s purpose in society, the values MESSAGE TO 02 MESSAGE TO SHAREHOLDERS AND INVESTORS SHAREHOLDERS AND INVESTORS it upholds, and the code of conduct that Group Takahito Tokita Representative Director and President employees follow in their daily business activities. CORPORATE VALUES By adhering to the values of the Fujitsu Way FUJITSU GROUP OVERVIEW What we strive for: 08 FUJITSU AT A GLANCE in their daily business activities, all Fujitsu Group Society and In all our actions, we protect the Environment environment and contribute to society. 10 FINANCIAL HIGHLIGHTS employees aim to enhance corporate value Profit and Growth We strive to meet the expectations of 11 ENVIRONMENT, SOCIETY, AND GOVERNANCE HIGHLIGHTS customers, employees, and shareholders. and contribute to global society. 12 MANAGEMENT CAPITAL: KEY DATA Shareholders We seek to continuously increase our 14 BOARD OF DIRECTORS / AUDIT & SUPERVISORY BOARD MEMBERS and Investors corporate value. Global Perspective We think and act from a global perspective. What we value: SPECIAL FEATURE: MANAGEMENT DIRECTION Employees We respect diversity and support individual 16 MANAGEMENT DIRECTION—TOWARD FUJITSU’S GROWTH growth. 17 Market Overview Customers We seek to be their valued and trusted Core Policy partner. 18 CORPORATE VISION Business Partners We build mutually beneficial relationships. 19 The Anticipated DX Business Technology We seek to create new value through 20 A New Company to Drive the DX Business Through our constant pursuit of innovation. -
Performance Tuning of Graph500 Benchmark on Supercomputer Fugaku
Performance tuning of Graph500 benchmark on Supercomputer Fugaku Masahiro Nakao (RIKEN R-CCS) Outline Graph500 Benchmark Supercomputer Fugaku Tuning Graph500 Benchmark on Supercomputer Fugaku 2 Graph500 https://graph500.org Graph500 has started since 2010 as a competition for evaluating performance of large-scale graph processing The ranking is updated twice a year (June and November) Fugaku won the awards twice in 2020 One of kernels in Graph500 is BFS (Breadth-First Search) An artificial graph called the Kronecker graph is used Some vertices are connected to many other vertices while numerous others are connected to only a few vertices Social network is known to have a similar property 3 Overview of BFS BFS Input:graph and root Output:BFS tree Data structure and BFS algorithm are free 4 Hybrid-BFS [Beamer, 2012] Scott Beamer et al. Direction-optimizing breadth-first search, SC ’12 It is suitable for small diameter graphs used in Graph500 Perform BFS while switching between Top-down and Bottom-up In the middle of BFS, the number of vertices being visited increases explosively, so it is inefficient in only Top-down Top-down Bottom-up 0 0 1 1 1 1 1 1 Search for unvisited vertices Search for visited vertices from visited vertices from unvisited vertices 5 2D Hybrid-BFS [Beamer, 2013] Scott Beamer, et. al. Distributed Memory Breadth- First Search Revisited: Enabling Bottom-Up Search. IPDPSW '13. Distribute the adjacency matrix to a 2D process grid (R x C) Communication only within the column process and within the row process Allgatherv, Alltoallv, -
The Blue Gene/Q Compute Chip
The Blue Gene/Q Compute Chip Ruud Haring / IBM BlueGene Team © 2011 IBM Corporation Acknowledgements The IBM Blue Gene/Q development teams are located in – Yorktown Heights NY, – Rochester MN, – Hopewell Jct NY, – Burlington VT, – Austin TX, – Bromont QC, – Toronto ON, – San Jose CA, – Boeblingen (FRG), – Haifa (Israel), –Hursley (UK). Columbia University . University of Edinburgh . The Blue Gene/Q project has been supported and partially funded by Argonne National Laboratory and the Lawrence Livermore National Laboratory on behalf of the United States Department of Energy, under Lawrence Livermore National Laboratory Subcontract No. B554331 2 03 Sep 2012 © 2012 IBM Corporation Blue Gene/Q . Blue Gene/Q is the third generation of IBM’s Blue Gene family of supercomputers – Blue Gene/L was announced in 2004 – Blue Gene/P was announced in 2007 . Blue Gene/Q – was announced in 2011 – is currently the fastest supercomputer in the world • June 2012 Top500: #1,3,7,8, … 15 machines in top100 (+ 101-103) – is currently the most power efficient supercomputer architecture in the world • June 2012 Green500: top 20 places – also shines at data-intensive computing • June 2012 Graph500: top 2 places -- by a 7x margin – is relatively easy to program -- for a massively parallel computer – and its FLOPS are actually usable •this is not a GPGPU design … – incorporates innovations that enhance multi-core / multi-threaded computing • in-memory atomic operations •1st commercial processor to support hardware transactional memory / speculative execution •… – is just a cool chip (and system) design • pushing state-of-the-art ASIC design where it counts • innovative power and cooling 3 03 Sep 2012 © 2012 IBM Corporation Blue Gene system objectives . -
MT-Lib: a Topology-Aware Message Transfer Library for Graph500 on Supercomputers Xinbiao Gan
IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 MT-lib: A Topology-aware Message Transfer Library for Graph500 on Supercomputers Xinbiao Gan Abstract—We present MT-lib, an efficient message transfer library for messages gather and scatter in benchmarks like Graph500 for Supercomputers. Our library includes MST version as well as new-MST version. The MT-lib is deliberately kept light-weight, efficient and friendly interfaces for massive graph traverse. MST provides (1) a novel non-blocking communication scheme with sending and receiving messages asynchronously to overlap calculation and communication;(2) merging messages according to the target process for reducing communication overhead;(3) a new communication mode of gathering intra-group messages before forwarding between groups for reducing communication traffic. In MT-lib, there are (1) one-sided message; (2) two-sided messages; and (3) two-sided messages with buffer, in which dynamic buffer expansion is built for messages delivery. We experimented with MST and then testing Graph500 with MST on Tianhe supercomputers. Experimental results show high communication efficiency and high throughputs for both BFS and SSSP communication operations. Index Terms—MST; Graph500; Tianhe supercomputer; Two-sided messages with buffer —————————— —————————— 1 INTRODUCTION HE development of high-performance supercompu- large graphs, a new benchmark called the Graph500 was T ting has always been a strategic goal for many coun- proposed in 2010[3]. Differently from the Top 500 used to tries [1,2]. Currently, exascale computing poses severe ef- supercomputers metric with FLOPS (Floating Point Per ficiency and stability challenges. Second) for compute-intensive su-percomputing applica- Large-scale graphs have applied to many seminars from tions. -
World's Fastest Computer
HISTORY OF FUJITSU LIMITED SUPERCOMPUTING “FUGAKU” – World’s Fastest Computer Birth of Fugaku Fujitsu has been developing supercomputer systems since 1977, equating to over 40 years of direct hands-on experience. Fujitsu determined to further expand its HPC thought leadership and technical prowess by entering into a dedicated partnership with Japan’s leading research institute, RIKEN, in 2006. This technology development partnership led to the creation of the “K Computer,” successfully released in 2011. The “K Computer” raised the bar for processing capabilities (over 10 PetaFLOPS) to successfully take on even larger and more complex challenges – consistent with Fujitsu’s corporate charter of tackling societal challenges, including global climate change and sustainability. Fujitsu and Riken continued their collaboration by setting aggressive new targets for HPC solutions to further extend the ability to solve complex challenges, including applications to deliver improved energy efficiency, better weather and environmental disaster prediction, and next generation manufacturing. Fugaku, another name for Mt. Fuji, was announced May 23, 2019. Achieving The Peak of Performance Fujitsu Ltd. And RIKEN partnered to design the next generation supercomputer with the goal of achieving Exascale performance to enable HPC and AI applications to address the challenges facing mankind in various scientific and social fields. Goals of the project Ruling the Rankings included building and delivering the highest peak performance, but also included wider adoption in applications, and broader • : Top 500 list First place applicability across industries, cloud and academia. The Fugaku • First place: HPCG name symbolizes the achievement of these objectives. • First place: Graph 500 Fugaku’s designers also recognized the need for massively scalable systems to be power efficient, thus the team selected • First place: HPL-AI the ARM Instruction Set Architecture (ISA) along with the Scalable (real world applications) Vector Extensions (SVE) as the base processing design. -
Supercomputers – Prestige Objects Or Crucial Tools for Science and Industry?
Supercomputers – Prestige Objects or Crucial Tools for Science and Industry? Hans W. Meuer a 1, Horst Gietl b 2 a University of Mannheim & Prometeus GmbH, 68131 Mannheim, Germany; b Prometeus GmbH, 81245 Munich, Germany; This paper is the revised and extended version of the Lorraine King Memorial Lecture Hans Werner Meuer was invited by Lord Laird of Artigarvan to give at the House of Lords, London, on April 18, 2012. Keywords: TOP500, High Performance Computing, HPC, Supercomputing, HPC Technology, Supercomputer Market, Supercomputer Architecture, Supercomputer Applications, Supercomputer Technology, Supercomputer Performance, Supercomputer Future. 1 e-mail: [email protected] 2 e-mail: [email protected] 1 Content 1 Introduction ..................................................................................................................................... 3 2 The TOP500 Supercomputer Project ............................................................................................... 3 2.1 The LINPACK Benchmark ......................................................................................................... 4 2.2 TOP500 Authors ...................................................................................................................... 4 2.3 The 39th TOP500 List since 1993 .............................................................................................. 5 2.4 The 39th TOP10 List since 1993 ............................................................................................... -
An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers
ORNL/TM-2020/1561 An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers Hyogi Sim Awais Khan Sudharshan S. Vazhkudai Approved for public release. Distribution is unlimited. August 11, 2020 DOCUMENT AVAILABILITY Reports produced after January 1, 1996, are generally available free via US Department of Energy (DOE) SciTech Connect. Website: www.osti.gov/ Reports produced before January 1, 1996, may be purchased by members of the public from the following source: National Technical Information Service 5285 Port Royal Road Springfield, VA 22161 Telephone: 703-605-6000 (1-800-553-6847) TDD: 703-487-4639 Fax: 703-605-6900 E-mail: [email protected] Website: http://classic.ntis.gov/ Reports are available to DOE employees, DOE contractors, Energy Technology Data Ex- change representatives, and International Nuclear Information System representatives from the following source: Office of Scientific and Technical Information PO Box 62 Oak Ridge, TN 37831 Telephone: 865-576-8401 Fax: 865-576-5728 E-mail: [email protected] Website: http://www.osti.gov/contact.html This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal lia- bility or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or rep- resents that its use would not infringe privately owned rights. Refer- ence herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not nec- essarily constitute or imply its endorsement, recommendation, or fa- voring by the United States Government or any agency thereof. -
Scalable Graph Traversal on Sunway Taihulight with Ten Million Cores
2017 IEEE International Parallel and Distributed Processing Symposium Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores Heng Lin†, Xiongchao Tang†, Bowen Yu†, Youwei Zhuo1,‡, Wenguang Chen†§, Jidong Zhai†, Wanwang Yin¶, Weimin Zheng† †Tsinghua University, ‡University of Southern California, §Research Institute of Tsinghua University in Shenzhen, ¶National Research Centre of Parallel Computer Engineering and Technology Email: {linheng11, txc13, yubw15}@mails.tsinghua.edu.cn, {cwg, zhaijidong, zwm-dcs}@tsinghua.edu.cn, [email protected] Abstract—Interest has recently grown in efficiently analyzing Although considerable amount of research has been unstructured data such as social network graphs and protein devoted to implementing BFS on large-scale distributed sys- structures. A fundamental graph algorithm for doing such task tems [3]–[5], rather less attention has been paid to systems is the Breadth-First Search (BFS) algorithm, the foundation for with accelerators. many other important graph algorithms such as calculating the Notably, only one machine among the top ten in the shortest path or finding the maximum flow in graphs. Graph500 list of November 2015 (Tianhe-2) has accelera- In this paper, we share our experience of designing and tors [2]. Furthermore, Tianhe-2 does not use the accelerators implementing the BFS algorithm on Sunway TaihuLight, a for its Graph500 benchmark test, while the accelerators are newly released machine with 40,960 nodes and 10.6 million fully employed for its Linpack performance. accelerator cores. It tops the Top500 list of June 2016 with a The question of how irregular applications, such as BFS, 93.01 petaflops Linpack performance [1]. can leverage the power of accelerators in large-scale parallel Designed for extremely large-scale computation and power computers is still left to be answered. -
Comparative HPC Performance Powerpoint
Comparative HPC Performance TOP500 Top Ten, Graph and Detail FX700 Actual Customer Benchmarks GRAPH500 Top Ten, Graph and Detail HPCG Top Ten, Graph and Detail HPL-AI Top Five, Graph and Detail Top 500 HPC Rankings – November 2020 500 450 400 350 300 Home - | TOP500 250 200 150 100 50 0 Fugaku Summit Sierra TaihuLight Selene Tianhe Juwels HPC5 Frontera Dammam 7 Rmax (k Tflop/s) Home - | TOP500 Rank Cores Rmax (TFlop/s) Rpeak (TFlop/s) Power (kW) 1 Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D, Fujitsu 7,630,848 442,010 537,212 29,899 RIKEN Center for Computational Science Japan Summit - IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR 2 2,414,592 148,600 200,795 10,096 Infiniband, IBM DOE/SC/Oak Ridge National Laboratory United States Sierra - IBM Power System AC922, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR 3 1,572,480 94,640 125,712 7,438 Infiniband, IBM / NVIDIA / Mellanox DOE/NNSA/LLNL United States 4 Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway, NRCPC 10,649,600 93,015 125,436 15,371 National Supercomputing Center in Wuxi China 5 Selene - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband, Nvidia 555,520 63,460 79,215 2,646 NVIDIA Corporation United States 6 Tianhe-2A - TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000, NUDT 4,981,760 61,445 100,679 18,482 National Super Computer Center in Guangzhou China JUWELS Booster Module - Bull Sequana XH2000 , AMD EPYC 7402 24C 2.8GHz, NVIDIA A100, Mellanox HDR 7 449,280 44,120 70,980 1,764 InfiniBand/ParTec ParaStation ClusterSuite, Atos Forschungszentrum Juelich (FZJ) Germany 8 HPC5 - PowerEdge C4140, Xeon Gold 6252 24C 2.1GHz, NVIDIA Tesla V100, Mellanox HDR Infiniband, Dell EMC 669,760 35,450 51,721 2,252 Eni S.p.A.