Supercomputer Fugaku
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  File System and Power Management Enhanced for Supercomputer FugakuFile System and Power Management Enhanced for Supercomputer Fugaku Hideyuki Akimoto Takuya Okamoto Takahiro Kagami Ken Seki Kenichirou Sakai Hiroaki Imade Makoto Shinohara Shinji Sumimoto RIKEN and Fujitsu are jointly developing the supercomputer Fugaku as the successor to the K computer with a view to starting public use in FY2021. While inheriting the software assets of the K computer, the plan for Fugaku is to make improvements, upgrades, and functional enhancements in various areas such as computational performance, efficient use of resources, and ease of use. As part of these changes, functions in the file system have been greatly enhanced with a focus on usability in addition to improving performance and capacity beyond that of the K computer. Additionally, as reducing power consumption and using power efficiently are issues common to all ultra-large-scale computer systems, power management functions have been newly designed and developed as part of the up- grading of operations management software in Fugaku. This article describes the Fugaku file system featuring significantly enhanced functions from the K computer and introduces new power management functions. 1. Introduction application development environment are taken up in RIKEN and Fujitsu are developing the supercom- separate articles [1, 2], this article focuses on the file puter Fugaku as the successor to the K computer and system and operations management software. The are planning to begin public service in FY2021. file system provides a high-performance and reliable The Fugaku is composed of various kinds of storage environment for application programs and as- system software for supporting the execution of su- sociated data.
- 
												  Performance Analysis on Energy Efficient HighPerformance Analysis on Energy Efficient High-Performance Architectures Roman Iakymchuk, François Trahay To cite this version: Roman Iakymchuk, François Trahay. Performance Analysis on Energy Efficient High-Performance Architectures. CC’13 : International Conference on Cluster Computing, Jun 2013, Lviv, Ukraine. hal-00865845 HAL Id: hal-00865845 https://hal.archives-ouvertes.fr/hal-00865845 Submitted on 25 Sep 2013 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Performance Analysis on Energy Ecient High-Performance Architectures Roman Iakymchuk1 and François Trahay1 1Institut Mines-Télécom Télécom SudParis 9 Rue Charles Fourier 91000 Évry France {roman.iakymchuk, francois.trahay}@telecom-sudparis.eu Abstract. With the shift in high-performance computing (HPC) towards energy ecient hardware architectures such as accelerators (NVIDIA GPUs) and embedded systems (ARM processors), arose the need to adapt existing perfor- mance analysis tools to these new systems. We present EZTrace a performance analysis framework for parallel applications. EZTrace relies on several core components, in particular on a mechanism for instrumenting func- tions, a lightweight tool for recording events, and a generic interface for writing traces. To support EZTrace on energy ecient HPC systems, we developed a CUDA module and ported EZTrace to ARM processors.
- 
												  BDEC2 Poznan Japanese HPC Infrastructure UpdateBDEC2 Poznan Japanese HPC Infrastructure Update Presenter: Masaaki Kondo, Riken R-CCS/Univ. of Tokyo (on behalf of Satoshi Matsuoka, Director, Riken R-CCS) 1 Post-K: The Game Changer (2020) 1. Heritage of the K-Computer, HP in simulation via extensive Co-Design • High performance: up to x100 performance of K in real applications • Multitudes of Scientific Breakthroughs via Post-K application programs • Simultaneous high performance and ease-of-programming 2. New Technology Innovations of Post-K Global leadership not just in High Performance, esp. via high memory BW • the machine & apps, but as Performance boost by “factors” c.f. mainstream CPUs in many HPC & Society5.0 apps via BW & Vector acceleration cutting edge IT • Very Green e.g. extreme power efficiency Ultra Power efficient design & various power control knobs • Arm Global Ecosystem & SVE contribution Top CPU in ARM Ecosystem of 21 billion chips/year, SVE co- design and world’s first implementation by Fujitsu High Perf. on Society5.0 apps incl. AI • ARM: Massive ecosystem Architectural features for high perf on Society 5.0 apps from embedded to HPC based on Big Data, AI/ML, CAE/EDA, Blockchain security, etc. Technology not just limited to Post-K, but into societal IT infrastructures e.g. Clouds 2 Arm64fx & Post-K (to be renamed) Fujitsu-Riken design A64fx ARM v8.2 (SVE), 48/52 core CPU HPC Optimized: Extremely high package high memory BW (1TByte/s), on-die Tofu-D network BW (~400Gbps), high SVE FLOPS (~3Teraflops), various AI support (FP16, INT8, etc.) Gen purpose CPU – Linux,
- 
												  Measuring Power Consumption on IBM Blue Gene/PView metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Springer - Publisher Connector Comput Sci Res Dev DOI 10.1007/s00450-011-0192-y SPECIAL ISSUE PAPER Measuring power consumption on IBM Blue Gene/P Michael Hennecke · Wolfgang Frings · Willi Homberg · Anke Zitz · Michael Knobloch · Hans Böttiger © The Author(s) 2011. This article is published with open access at Springerlink.com Abstract Energy efficiency is a key design principle of the Top10 supercomputers on the November 2010 Top500 list IBM Blue Gene series of supercomputers, and Blue Gene [1] alone (which coincidentally are also the 10 systems with systems have consistently gained top GFlops/Watt rankings an Rpeak of at least one PFlops) are consuming a total power on the Green500 list. The Blue Gene hardware and man- of 33.4 MW [2]. These levels of power consumption are al- agement software provide built-in features to monitor power ready a concern for today’s Petascale supercomputers (with consumption at all levels of the machine’s power distribu- operational expenses becoming comparable to the capital tion network. This paper presents the Blue Gene/P power expenses for procuring the machine), and addressing the measurement infrastructure and discusses the operational energy challenge clearly is one of the key issues when ap- aspects of using this infrastructure on Petascale machines. proaching Exascale. We also describe the integration of Blue Gene power moni- While the Flops/Watt metric is useful, its emphasis on toring capabilities into system-level tools like LLview, and LINPACK performance and thus computational load ne- highlight some results of analyzing the production workload glects the fact that the energy costs of memory references at Research Center Jülich (FZJ).
- 
												  Download SC18 Final ProgramProgram November 11-16, 2018 The International Conference Exhibits November 12-15, 2018 for High Performance KAY BAILEY HUTCHISON CONVENTION CENTER Computing, Networking, DALLAS, TEXAS, USA Storage, and Analysis Sponsored by: MICHAEL S. RAWLINGS Mayor of Dallas November 11, 2018 Supercomputing – SC18 Conference: As Mayor of Dallas, it is my pleasure to welcome SC18 – the ultimate international conference for high performance computing, networking, storage, and analysis – to the Kay Bailey Hutchison Convention Center! I applaud SC18 for bringing together high-performance computing (HPC) professionals and students from across the globe to share ideas, attend and participate in technical presentations, papers and workshops, all while inspiring the next generation of HPC professionals. Dallas is the No. 1 visitor destination in Texas and one of the top cities in the nation for meetings and conventions. In addition to having the resources to host major events and conventions, Dallas is a greatly diverse American city and a melting pot of cultures. This important convergence of uniqueness and differences is reflected throughout the sights and sounds of our city. Dallas' authentic arts, music, food, historic landmarks and urban lifestyle all contribute to the city's rich cultural makeup. Let the bold, vibrant spirit of Dallas, with a touch of Texas charm, guide the way to your BIG Dallas moments. Our city is filled with the unique experiences one can only have in Dallas – from cuisine by celebrity chefs, to the cultural landscape in one of the nation’s largest arts districts, to first-class shopping - make the most of your time in our great city! From transportation to hotels, restaurants, retail and entertainment venues, Dallas is ready to roll out our red carpet to host SC18.
- 
												  Compute System Metrics (Bof)Setting Trends for Energy-Efficient Supercomputing Natalie Bates, EE HPC Working Group Tahir Cader, HP & The Green Grid Wu Feng, Virginia Tech & Green500 John Shalf, Berkeley Lab & NERSC Horst Simon, Berkeley Lab & TOP500 Erich Strohmaier, Berkeley Lab & TOP500 ISC BoF; May 31, 2010; Hamburg, Germany Why We Are Here • “Can only improve what you can measure” • Context – Power consumption of HPC and facilities cost are increasing • What is needed? – Converge on a common basis between different research and industry groups for: •metrics • methodologies • workloads for energy-efficient supercomputing, so we can make progress towards solutions. Current Technology Roadmaps will Depart from Historical Gains Power is the Leading Design Constraint From Peter Kogge, DARPA Exascale Study … and the power costs will still be staggering From Peter Kogge, DARPA Exascale Study $1M per megawatt per year! (with CHEAP power) Absolute Power Levels Power Consumption Power Efficiency What We Have Done • Stages of Green Supercomputing – Denial – Awareness – Hype – Substance The Denial Phase (2001 – 2004) • Green Destiny – A 240-Node Supercomputer in 5 Sq. Ft. – LINPACK Performance: 101 Gflops – Power Consumption: 3.2 kW • Prevailing Views embedded processor – “Green Destiny is so low power that it runs just as fast when it is unplugged.” – “In HPC, no one cares about power & cooling, and no one ever will …” – “Moore’s Law for Power will stimulate the economy by creating a new market in cooling technologies.” The Awareness Phase (2004 – 2008) • Green Movements & Studies – IEEE Int’l Parallel & Distributed Processing Symp. (2005) • Workshop on High-Performance, Power-Aware Computing (HPPAC) Green500 • Metrics: Energy-Delay Product and FLOPS/Watt FLOPS/watt – Green Grid (2007) • Industry-driven consortium of all the top system vendors • Metric: Power Usage Efficiency (PUE) – Kogge et al., “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems, DARPA ITO, AFRL, 2008.
- 
												  Mixed Precision Block Fused Multiply-AddMixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores Pierre Blanchard, Nicholas Higham, Florent Lopez, Théo Mary, Srikara Pranesh To cite this version: Pierre Blanchard, Nicholas Higham, Florent Lopez, Théo Mary, Srikara Pranesh. Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores. SIAM Journal on Scientific Computing, Society for Industrial and Applied Mathematics, 2020, 42 (3), pp.C124-C141. 10.1137/19M1289546. hal-02491076v2 HAL Id: hal-02491076 https://hal.archives-ouvertes.fr/hal-02491076v2 Submitted on 28 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. MIXED PRECISION BLOCK FUSED MULTIPLY-ADD: ERROR ANALYSIS AND APPLICATION TO GPU TENSOR CORES∗ PIERRE BLANCHARDy , NICHOLAS J. HIGHAMz , FLORENT LOPEZx , THEO MARY{, AND SRIKARA PRANESHk Abstract. Computing units that carry out a fused multiply-add (FMA) operation with matrix arguments, referred to as tensor units by some vendors, have great potential for use in scientific computing. However, these units are inherently mixed precision and existing rounding error analyses do not support them. We consider a mixed precision block FMA that generalizes both the usual scalar FMA and existing tensor units.
- 
												  Computer Architectures an OverviewComputer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
- 
												  Programming on K ComputerProgramming on K computer Koh Hotta The Next Generation Technical Computing Fujitsu Limited Copyright 2010 FUJITSU LIMITED System Overview of “K computer” Target Performance : 10PF over 80,000 processors Over 640K cores Over 1 Peta Bytes Memory Cutting-edge technologies CPU : SPARC64 VIIIfx 8 cores, 128GFlops Extension of SPARC V9 Interconnect, “Tofu” : 6-D mesh/torus Parallel programming environment. 1 Copyright 2010 FUJITSU LIMITED I have a dream that one day you just compile your programs and enjoy high performance on your high-end supercomputer. So, we must provide easy hybrid parallel programming method including compiler and run-time system support. 2 Copyright 2010 FUJITSU LIMITED User I/F for Programming for K computer K computer Client System FrontFront End End BackBack End End (80000 (80000 Nodes) Nodes) IDE Interface CCommommandand JobJob C Controlontrol IDE IntInteerfrfaceace debugger DDebugebuggerger App IntInteerfrfaceace App Interactive Debugger GUI Data Data Data CoConvnveerrtteerr Data SamplSampleerr debugging partition official op VisualizedVisualized SSaammpplingling Stage out partition Profiler DaDattaa DaDattaa (InfiniBand) 3 Copyright 2010 FUJITSU LIMITED Parallel Programming 4 Copyright 2010 FUJITSU LIMITED Hybrid Parallelism on over-640K cores Too large # of processes to manipulate To reduce number of processes, hybrid thread-process programming is required But Hybrid parallel programming is annoying for programmers Even for multi-threading, procedure level or outer loop parallelism was desired Little opportunity for such coarse grain parallelism System support for “fine grain” parallelism is required 5 Copyright 2010 FUJITSU LIMITED Targeting inner-most loop parallelization Automatic vectorization technology has become mature, and vector-tuning is easy for programmers. Inner-most loop parallelism, which is fine-grain, should be an important portion for peta-scale parallelization.
- 
												  HPC Systems in the Next Decade – What to Expect, When, WhereEPJ Web of Conferences 245, 11004 (2020) https://doi.org/10.1051/epjconf/202024511004 CHEP 2019 HPC Systems in the Next Decade – What to Expect, When, Where Dirk Pleiter1;∗ 1Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany Abstract. HPC systems have seen impressive growth in terms of performance over a period of many years. Soon the next milestone is expected to be reached with the deployment of exascale systems in 2021. In this paper, we provide an overview of the exascale challenges from a computer architecture’s perspec- tive and explore technological and other constraints. The analysis of upcoming architectural options and emerging technologies allow for setting expectations for application developers, which will have to cope with heterogeneous archi- tectures, increasingly diverse compute technologies as well as deeper memory and storage hierarchies. Finally, needs resulting from changing science and en- gineering workflows will be discussed, which need to be addressed by making HPC systems available as part of more open e-infrastructures that provide also other compute and storage services. 1 Introduction Over several decades, high-performance computing (HPC) has seen an exponential growth of performance when solving large-scale numerical problems involving dense-matrix opera- tions. This development is well documented by the Top500 list [1], which is published twice a year since 1993. In various regions around the world, significant efforts are being per- formed to reach the next milestone of performance with the deployment of exascale systems. For various reasons, improving on the performance of HPC systems has become increasingly more difficult. In this paper, we discuss some of the reasons.
- 
												  2010 IBM and the Environment Report2010 IBM and the Environment Report Committed to environmental leadership across all of IBM's business activities IBM and the Environment - 2010 Annual Report IBM AND THE ENVIRONMENT IBM has long maintained an unwavering commitment to environmental protection, which was formalized by a corporate environmental policy in 1971. The policy calls for IBM to be an environmental leader across all of our business activities, from our research, operations and products to the services and solutions we provide our clients to help them be more protective of the environment. This section of IBM’s Corporate Responsibility Report describes IBM’s programs and performance in the following environmental areas: A Commitment to Environmental 3 Energy and Climate Programs 36 Leadership A Five-Part Strategy 36 Conserving Energy 37 Global Governance and Management 4 CO2 Emissions Reduction 42 System PFC Emissions Reduction 43 Global Environmental Management 4 Renewable Energy 45 System Voluntary Climate Partnerships 46 Stakeholder Engagement 7 Transportation and Logistics 47 Voluntary Partnerships and Initiatives 8 Initiatives Environmental Investment and Return 9 Energy and Climate Protection in 48 the Supply Chain Process Stewardship 12 Environmentally Preferable Substances 12 Remediation 52 and Materials Nanotechnology 15 Audits and Compliance 53 Accidental Releases 53 Pollution Prevention 17 Fines and Penalties 54 Hazardous Waste 17 Nonhazardous Waste 19 Awards and Recognition 55 Chemical Use and Management 20 Internal Recognition 55 External Recognition
- 
												  Management Direction Update• Thank you for taking the time out of your busy schedules today to participate in this briefing on our fiscal year 2020 financial results. • To prevent the spread of COVID-19 infections, we are holding today’s event online. We apologize for any inconvenience this may cause. • My presentation will offer an overview of the progress of our management direction. To meet the management targets we announced last year for FY2022, I will explain the initiatives we have undertaken in FY2020 and the areas on which we will focus this fiscal year and into the future. Copyright 2021 FUJITSU LIMITED 0 • It was just around a year ago that we first defined our Purpose: “to make the world more sustainable by building trust in society through innovation.” • Today, all of our activities are aligned to achieve this purpose. Copyright 2021 FUJITSU LIMITED 1 Copyright 2021 FUJITSU LIMITED 2 • First, I would like to provide a summary of our financial results for FY2020. • The upper part of the table shows company-wide figures for Fujitsu. Consolidated revenue was 3,589.7 billion yen, and operating profit was 266.3 billion yen. • In the lower part of the table, which shows the results in our core business of Technology Solutions, consolidated revenue was 3,043.6 billion yen, operating profit was 188.4 billion yen, and our operating profit margin was 6.2%. • Due to the impact of COVID-19, revenue fell, but because of our shift toward the services business and greater efficiencies, on a company-wide basis our operating profit and profit for the year were the highest in Fujitsu’s history.