Fugaku: the First ‘Exascale’ Machine

Total Pages: 16

File Type: PDF, Size: 1020 KB

Fugaku: the First 'Exascale' Machine
Satoshi Matsuoka, Director, RIKEN R-CCS / Professor, Tokyo Institute of Technology
The 6th ENES HPC Workshop (virtual), 26 May 2020

The "Fugaku" 富岳 "Exascale" Supercomputer for Society 5.0
Like Mt. Fuji, which gives the machine its name and represents the ideal of supercomputing, Fugaku combines a high peak (application capability) with a broad base (applicability and capacity): broad applications spanning simulation, data science, and AI, and a broad user base spanning academia, industry, and cloud startups.

A64FX & Fugaku 富岳 / Post-K are:
• A Fujitsu-RIKEN designed A64FX Arm v8.2 (SVE) CPU with 48/52 cores
• HPC optimized: extremely high in-package memory bandwidth (1 TB/s), on-die Tofu-D network bandwidth (~400 Gbps), high SVE floating-point throughput (~3 Teraflops per chip), and various AI support (FP16, INT8, etc.)
• A general-purpose CPU: Linux, Windows (Word), other supercomputers and clouds
• Extremely power efficient: more than 10x the power/performance efficiency of current mainstream x86 CPUs on a CFD benchmark
• The largest and fastest supercomputer built to date (circa 2020): more than 150,000 nodes (superseding LLNL Sequoia), more than 150 PB/s of memory bandwidth, a Tofu-D 6D torus network with 60 Pbps injection bandwidth (roughly 10x global IDC traffic), and 25-30 PB of NVMe L1 storage behind a many-endpoint 100 Gbps I/O network into Lustre

Brief History of R-CCS towards Fugaku
• January 2006: Next Generation Supercomputer Project (K computer) start
• July 2010: RIKEN AICS established
• August 2010: HPCI Project start
• September 2010: K computer installation start
• 2010: First meeting of SDHPC
• June 2011: #1 on TOP500
• November 2011: #1 on TOP500 at more than 10 Petaflops; ACM Gordon Bell Prize
• End of FY2011 (March 2012): SDHPC whitepaper
• April 2012: Post-K Feasibility Study start, with 3 architecture teams and 1 applications team
• June 2012: K computer construction complete
• September 2012: K computer production start
• November 2012: ACM Gordon Bell Prize
• End of FY2013 (March 2014): Post-K Feasibility Study reports
• April 2014: Post-K project start
• June 2014: #1 on Graph500
• April 2018: AICS renamed RIKEN R-CCS; Satoshi Matsuoka becomes the new Director
• August 2018: Arm A64FX announced at Hot Chips
• October 2018: NEDO 100x processor project start
• November 2018: Post-K manufacturing approved by the Prime Minister's CSTI committee
• March 2019: Post-K manufacturing start
• May 2019: Post-K named "Supercomputer Fugaku"
• July 2019: Post-Moore whitepaper start
• August 2019: K computer shutdown
• December 2019: Fugaku installation start
• April 2020: COVID-19 teams start
• May 2020: Fugaku shipment complete

Co-Design Activities in Fugaku
Multiple co-design activities have been running since 2011, combining "science by computing" with "science of computing":
• 9 priority application areas of high concern to the general public: medical/pharma, environment/disaster, energy, manufacturing, and more
• Representative applications were selected from hundreds of candidates to capture a range of computational characteristics, and the system was designed with parameters that reflect those characteristics
• Extremely tight collaboration between the co-design application centers, RIKEN, and Fujitsu
• 9 representative apps were chosen as the "target application" scenario, with the goal of up to 100x speedup over the K computer
• Additional goals: ease of programming, a broad software ecosystem, and very low power

Research Subjects of the Post-K Computer
The post-K computer will expand the fields pioneered by the K computer and also challenge new areas.

NICAM: Global Climate Simulation
A global cloud-resolving model with a 0.87 km mesh allows cumulus clouds to be resolved; month-long forecasts of Madden-Julian oscillations in the tropics have been realized.
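The "~3 Teraflops" and bytes-per-flop figures quoted above follow directly from the core count, SVE width, and clock. A minimal sketch in C, assuming the publicly stated A64FX parameters (48 compute cores, two 512-bit SVE FMA pipelines per core, 2.0 GHz normal / 2.2 GHz boost, roughly 1 TB/s of HBM2 bandwidth):

```c
#include <stdio.h>

int main(void) {
    /* Publicly stated A64FX parameters, assumed here for illustration. */
    const double cores     = 48;       /* compute cores per chip          */
    const double sve_pipes = 2;        /* 512-bit SVE FMA pipelines/core  */
    const double dp_lanes  = 512 / 64; /* FP64 lanes per 512-bit vector   */
    const double fma_flops = 2;        /* one FMA = multiply + add        */
    const double hbm2_bw   = 1.0e12;   /* ~1 TB/s per chip                */

    double peak_normal = cores * sve_pipes * dp_lanes * fma_flops * 2.0e9;
    double peak_boost  = cores * sve_pipes * dp_lanes * fma_flops * 2.2e9;

    printf("FP64 peak per node: %.3f TF (normal), %.3f TF (boost)\n",
           peak_normal / 1e12, peak_boost / 1e12);
    /* Ratio of memory bandwidth to FP64 peak; the slide quotes ~0.4 Bytes/DPF. */
    printf("Bytes per FP64 flop: %.2f\n", hbm2_bw / peak_normal);
    return 0;
}
```

Multiplying the per-node figures by the 158,976 nodes listed further down reproduces the system peaks quoted later on this page (about 488 Petaflops at 2.0 GHz and 537 Petaflops at 2.2 GHz); halving the precision doubles the rate, which is where the 1.95-2.15 Exaflops FP16 numbers come from.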
Global cloud-resolving model: Miyamoto et al. (2013), Geophys. Res. Lett., 40, 4922-4926, doi:10.1002/grl.50944.

"Big Data Assimilation" with NICAM+LETKF
High-precision simulations and high-precision observations are coupled with mutual feedback, making future-generation forecasting technologies available roughly 10 years in advance.

Fugaku's Fujitsu A64FX processor is a many-core Arm CPU...
• 48 compute cores plus 2 or 4 assistant (OS) cores
• A brand-new core design with near Xeon-class integer performance per core
• Arm v8, 64-bit: the full Arm ecosystem
• Tofu-D and PCIe Gen3 external connections

...but also an accelerated, GPU-like processor
• SVE 512-bit x 2 vector pipelines (an Arm & Fujitsu co-designed extension); integer types of 1, 2, 4, and 8 bytes plus 16-, 32-, and 64-bit floating point
• Cache and memory localization via the sector cache
• HBM2 on-package memory with massive bandwidth (roughly 0.4 bytes per double-precision flop)
• Streaming, strided, and scatter/gather memory access
• Intra-chip barrier synchronization and other memory-enhancing features
The result is GPU-like high performance in HPC, especially for CFD and weather & climate codes (even with traditional Fortran), as well as for AI and Big Data.

"Fugaku" CPU Performance Evaluation (2/3): Himeno benchmark
The Himeno benchmark (Fortran 90) is a stencil calculation that solves Poisson's equation by the Jacobi method, reported in GFLOPS (a minimal Jacobi kernel sketch follows below). († "Performance evaluation of a vector supercomputer SX-Aurora TSUBASA", SC18, https://dl.acm.org/citation.cfm?id=3291728.)

"Fugaku" CPU Performance Evaluation (3/3): WRF
For the WRF (Weather Research and Forecasting) model, vectorizing loops that include IF-constructs is the key optimization; source-code tuning using directives promotes compiler optimizations.

Fugaku system configuration (scalable design)
• CPU: 1 node; a single-socket node with HBM2 and the Tofu interconnect
• CMU (CPU Memory Unit): 2 nodes; 2x CPU
• BoB (Bunch of Blades): 16 nodes; 8x CMU
• Shelf: 48 nodes; 3x BoB
• Rack: 384 nodes; 8x Shelf
• System: 150k+ nodes as the full Fugaku system

Fugaku Total System Config & Performance (image courtesy of Fujitsu Limited)
• Total nodes: 158,976 (384 nodes/rack x 396 full racks = 152,064, plus 192 nodes/rack x 36 half racks = 6,912); c.f. the K computer's 88,128 nodes
• Theoretical peak, normal mode (CPU frequency 2.0 GHz): FP64 488 Petaflops; FP32 977 Petaflops; FP16 (AI training) 1.95 Exaflops; INT8 (AI inference) 3.90 Exaops
• Theoretical peak, boost mode (CPU frequency 2.2 GHz): FP64 537 Petaflops; FP32 1.07 Exaflops; FP16 2.15 Exaflops; INT8 4.30 Exaops
• Theoretical peak memory bandwidth: 163 PB/s
• Versus the K computer (boost mode): FP64 48x (K: 11.28 PF at all precisions); FP32 95x; FP16 190x; INT8 more than 1,500x (K: 2.82 Petaops at 64 bits); memory bandwidth 29x (K: 5.64 PB/s)

Fugaku is a Year's worth of IT in Japan
• Smartphones: about 20 million units (2/3 of annual shipments in Japan); roughly 10 W x 20 million = 200 MW; Arm ISA; iOS/Android; AI via custom inference-only ASICs (low generality)
• IDC servers, including clouds: about 300,000 units (2/3 of annual shipments in Japan); roughly 600-700 W x 300K = 200 MW including cooling; x86/Arm ISA; Linux (Red Hat etc.)/Windows; AI via GPUs
• Fugaku: 1 system; about 30 MW; Arm ISA; Linux (Red Hat etc.); AI via SVE instructions on a general-purpose CPU
• K computer: 1 system; about 15 MW; SPARC ISA; proprietary Linux; no AI acceleration
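As referenced above, the Himeno benchmark is essentially a bandwidth-bound stencil. The following is a rough illustrative sketch (not the actual Himeno code; grid dimensions and array names are placeholders) of one Jacobi sweep for the 3D Poisson equation:

```c
#include <stddef.h>

/* One Jacobi sweep for the 3D Poisson equation on an nx*ny*nz grid
 * (assumes nx, ny, nz >= 3): each interior point becomes the average of
 * its six neighbours minus the scaled source term. Bandwidth-bound, in
 * the same spirit as the Himeno benchmark. */
void jacobi_sweep(size_t nx, size_t ny, size_t nz, double h,
                  const double *restrict u, const double *restrict f,
                  double *restrict u_new)
{
#define IDX(i, j, k) ((i) * ny * nz + (j) * nz + (k))
    for (size_t i = 1; i < nx - 1; i++)
        for (size_t j = 1; j < ny - 1; j++)
            for (size_t k = 1; k < nz - 1; k++)
                u_new[IDX(i, j, k)] =
                    (u[IDX(i - 1, j, k)] + u[IDX(i + 1, j, k)] +
                     u[IDX(i, j - 1, k)] + u[IDX(i, j + 1, k)] +
                     u[IDX(i, j, k - 1)] + u[IDX(i, j, k + 1)] -
                     h * h * f[IDX(i, j, k)]) / 6.0;
#undef IDX
}
```

Each updated point reads seven neighbouring values but performs only around ten floating-point operations, which is why the slides attribute the A64FX's strong Himeno and CFD results to its HBM2 bandwidth rather than to peak FLOPS alone.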
Green500, November 2019: the A64FX prototype (Fujitsu A64FX 48C, 2 GHz) ranked #1 on the list
• 768 general-purpose A64FX CPUs, with no accelerators
• 1.9995 PFLOPS on HPL at 84.75% efficiency
• 16.876 GF/W, at power measurement quality level 2

SC19 Green500 ranking and first-appeared TOP500 edition
The "A64FX prototype", the prototype of Supercomputer Fugaku, ranked #1 at 16.876 GF/W: the first system without accelerators to take the top spot in 9.5 years, and roughly 3x better in GF/W than the second-best system without accelerators. For reference, the top Xeon machine ranked #37 at 5.843 GF/W, and Endeavor ranked #94 at 2.961 GF/W.

A64FX CPU performance evaluation for real applications
On open-source, real applications (OpenFOAM, FrontISTR, ABINIT, SALMON, SPECFEM3D, WRF, MPAS), a single A64FX at 2.2 GHz is up to 1.8x faster than a two-socket node with the latest x86 processor (24 cores, 2.9 GHz), i.e. up to 3.6x faster per socket. The high memory bandwidth and long SIMD length of the A64FX work effectively for these applications (a minimal SVE sketch follows at the end of this entry).

A64FX CPU power efficiency for real applications
Measured as performance per unit of energy consumed, the A64FX at 2.2 GHz is up to 3.7x more efficient than the same two-socket x86 system on the same applications. The high efficiency is achieved by an energy-conscious design and implementation.

Fugaku Performance Estimate on 9 Co-Design Target Apps
Performance target goals: roughly 100 times the K computer's performance for some applications within a power consumption of 30 to 40 MW. Peak performance targets to be achieved: FP64 above 400 Pflops (34x+ over K's 11.3 Pflops); FP32 above 800 Pflops (70x+ over K's 11.3 Pflops); FP16 above 1,600 Pflops (141x+); total memory bandwidth above 150 PB/s (29x+ over K's 5,184 TB/s).

Estimated application speedups over the K computer, by priority issue area:
• Health and longevity - 1. Innovative computing infrastructure for drug discovery: GENESIS (molecular dynamics for proteins), 125x+ (tuning included)
• Health and longevity - 2. Personalized and preventive medicine using big data: Genomon (genome processing, genome alignment), 8x+
• Disaster prevention and environment - 3. Integrated simulation systems of hazards induced by earthquake and tsunami: GAMERA (earthquake simulator; FEM on unstructured and structured grids), 45x+
• Disaster prevention and environment - 4. Meteorological and global environmental prediction using big data: NICAM+LETKF (weather prediction using big data; structured-grid stencil and ensemble Kalman filter), 120x+
• Energy issues - 5. New technologies for energy creation, conversion/storage, and use: NTChem (molecular electronic structure calculation), 40x+
• Energy issues - 6. Accelerated development of innovative clean energy systems: Adventure (computational mechanics system for large-scale analysis and design; unstructured grid), 35x+
• Industrial competitiveness enhancement - 7. …
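The "long SIMD length" point above, together with the earlier note that vectorizing IF-constructs was the key WRF optimization, comes down to SVE's vector-length-agnostic, predicated execution. A minimal sketch using the Arm C Language Extensions for SVE (function and variable names are placeholders; in practice the compilers auto-vectorize such loops):

```c
#include <arm_sve.h>
#include <stddef.h>

/* y[i] += a * x[i], but only where x[i] > 0: a conditional (IF-construct)
 * loop expressed with SVE predication. The same binary runs at any SVE
 * vector length; on A64FX that is 512 bits, i.e. 8 doubles per operation. */
void daxpy_if_positive(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i += svcntd()) {
        svbool_t    pg  = svwhilelt_b64_u64(i, n);    /* mask off the loop tail   */
        svfloat64_t vx  = svld1_f64(pg, &x[i]);
        svbool_t    pos = svcmpgt_n_f64(pg, vx, 0.0); /* lanes where x[i] > 0     */
        svfloat64_t vy  = svld1_f64(pos, &y[i]);
        vy = svmla_n_f64_m(pos, vy, vx, a);           /* y += a * x, active lanes */
        svst1_f64(pos, &y[i], vy);
    }
}
```

Built with an SVE-enabled compiler (for example with -march=armv8.2-a+sve), the predicate registers take the place of the branch, which is essentially what "vectorizing loops including IF-constructs" amounts to on this CPU.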
Recommended publications
  • File System and Power Management Enhanced for Supercomputer Fugaku
File System and Power Management Enhanced for Supercomputer Fugaku
Hideyuki Akimoto, Takuya Okamoto, Takahiro Kagami, Ken Seki, Kenichirou Sakai, Hiroaki Imade, Makoto Shinohara, Shinji Sumimoto

RIKEN and Fujitsu are jointly developing the supercomputer Fugaku as the successor to the K computer with a view to starting public use in FY2021. While inheriting the software assets of the K computer, the plan for Fugaku is to make improvements, upgrades, and functional enhancements in various areas such as computational performance, efficient use of resources, and ease of use. As part of these changes, functions in the file system have been greatly enhanced with a focus on usability, in addition to improving performance and capacity beyond that of the K computer. Additionally, as reducing power consumption and using power efficiently are issues common to all ultra-large-scale computer systems, power management functions have been newly designed and developed as part of the upgrading of operations management software in Fugaku. This article describes the Fugaku file system, featuring significantly enhanced functions from the K computer, and introduces new power management functions.

1. Introduction
RIKEN and Fujitsu are developing the supercomputer Fugaku as the successor to the K computer and are planning to begin public service in FY2021. Fugaku is composed of various kinds of system software for supporting the execution of … application development environment are taken up in separate articles [1, 2]; this article focuses on the file system and operations management software. The file system provides a high-performance and reliable storage environment for application programs and associated data.
  • BDEC2 Poznan Japanese HPC Infrastructure Update
BDEC2 Poznan: Japanese HPC Infrastructure Update
Presenter: Masaaki Kondo, RIKEN R-CCS / Univ. of Tokyo (on behalf of Satoshi Matsuoka, Director, RIKEN R-CCS)

Post-K: The Game Changer (2020)
1. Heritage of the K computer: high performance in simulation via extensive co-design
• High performance: up to 100x the performance of K in real applications
• Multitudes of scientific breakthroughs via Post-K application programs
• Simultaneous high performance and ease of programming
2. New technology innovations of Post-K: global leadership not just in the machine and apps, but as cutting-edge IT
• High performance, especially via high memory bandwidth: performance boosts by "factors" compared with mainstream CPUs in many HPC and Society 5.0 apps via bandwidth and vector acceleration
• Very green, e.g. extreme power efficiency: ultra power-efficient design and various power-control knobs
• Arm global ecosystem and SVE contribution: top CPU in the Arm ecosystem of 21 billion chips/year (a massive ecosystem from embedded to HPC); SVE co-design and the world's first implementation by Fujitsu
• High performance on Society 5.0 apps, including AI: architectural features for high performance on Society 5.0 apps based on Big Data, AI/ML, CAE/EDA, blockchain security, etc.
• Technology not limited to Post-K, but feeding into societal IT infrastructures, e.g. clouds

Arm64fx & Post-K (to be renamed): a Fujitsu-RIKEN designed A64FX Arm v8.2 (SVE) 48/52-core CPU. HPC optimized: extremely high in-package memory bandwidth (1 TB/s), on-die Tofu-D network bandwidth (~400 Gbps), high SVE FLOPS (~3 Teraflops), and various AI support (FP16, INT8, etc.). A general-purpose CPU running Linux, …
  • Supercomputer Fugaku
Supercomputer Fugaku
Toshiyuki Shimizu, February 18th, 2020, FUJITSU LIMITED

Outline: Fugaku project overview; co-design (approach and design results); performance and energy consumption evaluation (Green500, OSS apps, Fugaku priority issues); summary.

Supercomputer "Fugaku", formerly known as Post-K: focus and approach
• Application performance: co-design with application developers, and a Fujitsu-designed CPU core with high memory bandwidth utilizing HBM2
• Power efficiency: leading-edge silicon technology, Fujitsu's proven low-power and high-performance logic design, and power-controlling knobs
• Usability: the Arm v8-A ISA with the Scalable Vector Extension (SVE), and Arm-standard Linux

Fugaku project schedule (2011-2022): feasibility study, basic design, detailed design and implementation, then manufacturing, installation, and tuning, leading to general operation; architecture selection, application sizing, and co-design with the application groups ran in parallel with these phases.

Fugaku co-design
• Co-design goals: obtain the best performance, 100x the application performance of the K computer, within a power budget of 30-40 MW, by designing the applications, compilers, libraries, and hardware together.
• Approach: estimate performance and power using application information, performance counts from the Fujitsu FX100, and a cycle-based simulator. Computation time gets both brief and precise estimation; communication time is modelled from bandwidth and latency, with attributes for the communication patterns; I/O time is modelled separately. Then optimize the applications, compilers, and so on, and resolve bottlenecks.
• Estimation of performance and power: precise performance estimation for the primary kernels (build and run Fugaku objects on the Fugaku cycle-based simulator); brief performance estimation for the other sections (replace the FX100 performance counts with Fugaku parameters: instruction commits per cycle, barrier wait cycles, instruction …). A simplified sketch of this kind of projection model follows this entry.
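The co-design flow described above projects application time from profiles, FX100 performance counters, and a cycle-level simulator. The following is a deliberately simplified, hypothetical sketch of that kind of projection model; all structure and parameter names are invented for illustration, and the numbers are rough placeholders, not official figures.

```c
#include <stdio.h>

/* Toy projection in the spirit of the co-design flow described above:
 * scale measured kernel characteristics by machine parameters and take
 * the dominant term. The real flow used cycle-accurate simulation for
 * primary kernels; this only illustrates the idea. */
struct kernel_profile {
    double flops;           /* floating-point operations       */
    double bytes;           /* main-memory traffic in bytes    */
    double msgs, msg_bytes; /* messages and bytes communicated */
};

struct machine {
    double flops_per_s, mem_bw, net_latency, net_bw;
};

static double projected_time(struct kernel_profile k, struct machine m)
{
    double t_comp = k.flops / m.flops_per_s;
    double t_mem  = k.bytes / m.mem_bw;
    double t_net  = k.msgs * m.net_latency + k.msg_bytes / m.net_bw;
    /* Compute and memory overlap within a node; communication adds on top. */
    double t_node = t_comp > t_mem ? t_comp : t_mem;
    return t_node + t_net;
}

int main(void)
{
    struct kernel_profile k = { 1e12, 4e11, 1e4, 1e9 };       /* hypothetical kernel       */
    struct machine fx100 = { 1.0e12, 2.4e11, 1e-6, 1.25e10 }; /* rough FX100-class numbers */
    struct machine a64fx = { 3.1e12, 1.0e12, 1e-6, 5.0e10 };  /* rough A64FX-class numbers */
    printf("projected speedup: %.2fx\n",
           projected_time(k, fx100) / projected_time(k, a64fx));
    return 0;
}
```

In the real flow, the "precise" path replaces this back-of-the-envelope model with cycle-accurate simulation of the primary kernels.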
  • Download SC18 Final Program
Program, November 11-16, 2018 (Exhibits: November 12-15, 2018)
SC18: The International Conference for High Performance Computing, Networking, Storage, and Analysis
Kay Bailey Hutchison Convention Center, Dallas, Texas, USA
Sponsored by:

Michael S. Rawlings, Mayor of Dallas, November 11, 2018
Supercomputing – SC18 Conference:
As Mayor of Dallas, it is my pleasure to welcome SC18 – the ultimate international conference for high performance computing, networking, storage, and analysis – to the Kay Bailey Hutchison Convention Center! I applaud SC18 for bringing together high-performance computing (HPC) professionals and students from across the globe to share ideas, attend and participate in technical presentations, papers and workshops, all while inspiring the next generation of HPC professionals. Dallas is the No. 1 visitor destination in Texas and one of the top cities in the nation for meetings and conventions. In addition to having the resources to host major events and conventions, Dallas is a greatly diverse American city and a melting pot of cultures. This important convergence of uniqueness and differences is reflected throughout the sights and sounds of our city. Dallas' authentic arts, music, food, historic landmarks and urban lifestyle all contribute to the city's rich cultural makeup. Let the bold, vibrant spirit of Dallas, with a touch of Texas charm, guide the way to your BIG Dallas moments. Our city is filled with the unique experiences one can only have in Dallas – from cuisine by celebrity chefs, to the cultural landscape in one of the nation's largest arts districts, to first-class shopping – make the most of your time in our great city! From transportation to hotels, restaurants, retail and entertainment venues, Dallas is ready to roll out our red carpet to host SC18.
  • Mixed Precision Block Fused Multiply-Add
Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores
Pierre Blanchard, Nicholas J. Higham, Florent Lopez, Théo Mary, Srikara Pranesh. SIAM Journal on Scientific Computing, Society for Industrial and Applied Mathematics, 2020, 42 (3), pp. C124-C141, doi:10.1137/19M1289546. HAL Id: hal-02491076, https://hal.archives-ouvertes.fr/hal-02491076v2, submitted on 28 May 2020.

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Abstract. Computing units that carry out a fused multiply-add (FMA) operation with matrix arguments, referred to as tensor units by some vendors, have great potential for use in scientific computing. However, these units are inherently mixed precision and existing rounding error analyses do not support them. We consider a mixed precision block FMA that generalizes both the usual scalar FMA and existing tensor units.
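As a toy illustration of the kind of effect such an error analysis quantifies (this is not the paper's block FMA model), the snippet below rounds each product of a dot product to half precision before accumulating in single precision and compares the result with a double-precision reference. It assumes a compiler that provides the _Float16 type, such as recent GCC or Clang on AArch64.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy experiment: products rounded to half precision (like the fp16 inputs
 * of a tensor-style mixed-precision unit), accumulated in single precision,
 * compared against a double-precision reference. Illustrative only. */
int main(void)
{
    const int n = 1 << 16;
    double ref = 0.0;
    float mixed = 0.0f;
    srand(42);
    for (int i = 0; i < n; i++) {
        double a = (double)rand() / RAND_MAX;
        double b = (double)rand() / RAND_MAX;
        ref   += a * b;
        mixed += (float)(_Float16)(a * b);  /* round the product to fp16 */
    }
    printf("relative error of fp16-product / fp32-accumulate: %.2e\n",
           (double)mixed / ref - 1.0);
    return 0;
}
```

Accumulating in a wider format than the products keeps the second source of error small, which is the kind of statement the paper's analysis makes rigorous for blocked FMAs.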
  • Computer Architectures an Overview
Computer Architectures: An Overview
Contents: Microarchitecture; x86; PowerPC; IBM POWER; MIPS architecture; SPARC; ARM architecture; DEC Alpha; AlphaStation; AlphaServer; Very long instruction word; Instruction-level parallelism; Explicitly parallel instruction computing; references (article sources and contributors; image sources, licenses and contributors; article licenses).

Microarchitecture
In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures [1]. Implementations might vary due to different goals of a given design or due to shifts in technology [2]. Computer architecture is the combination of microarchitecture and instruction set design.

Relation to instruction set architecture
The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. (Figure: the Intel Core microarchitecture.) The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers to complete arithmetic logic units (ALUs) and even larger elements.
  • Programming on K Computer
Programming on the K computer
Koh Hotta, The Next Generation Technical Computing, Fujitsu Limited

System overview of the K computer
• Target performance: 10 PF over 80,000 processors, with over 640K cores and over 1 petabyte of memory
• Cutting-edge technologies: the SPARC64 VIIIfx CPU (8 cores, 128 GFlops, an extension of SPARC V9), the "Tofu" 6-D mesh/torus interconnect, and a parallel programming environment

"I have a dream that one day you just compile your programs and enjoy high performance on your high-end supercomputer." So, we must provide an easy hybrid parallel programming method, including compiler and run-time system support.

(Figure: user interfaces for programming on the K computer - IDE, command, and job-control interfaces, an interactive debugger, a data converter/sampler, and a visualized profiler on the client/front-end system, connected to the back end of 80,000 nodes, with stage-out over InfiniBand.)

Hybrid parallelism on over 640K cores
• There are too many processes to manipulate; to reduce the number of processes, hybrid thread-process programming is required
• But hybrid parallel programming is annoying for programmers: even for multi-threading, procedure-level or outer-loop parallelism was desired, yet there is little opportunity for such coarse-grain parallelism
• System support for "fine grain" parallelism is therefore required (see the hybrid sketch after this entry)

Targeting inner-most loop parallelization
Automatic vectorization technology has become mature, and vector tuning is easy for programmers. Inner-most loop parallelism, which is fine-grain, should be an important portion of peta-scale parallelization.
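The hybrid thread-process style argued for above is conventionally written as MPI across nodes plus OpenMP threads within a node. A minimal, generic sketch of that pattern (not code from the slides):

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Minimal hybrid MPI + OpenMP pattern: one MPI process per node (or socket),
 * threads across the cores, so the total process count stays manageable on
 * systems with 640K+ cores. */
int main(int argc, char **argv)
{
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    /* Fine-grained parallelism: the loop is split across threads, and each
     * thread's chunk is a candidate for SIMD vectorization. */
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + (double)(i + rank));

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%f\n",
               nranks, omp_get_max_threads(), global);
    MPI_Finalize();
    return 0;
}
```

With one rank per node and threads filling the cores, the process count drops from the core count to the node count, which is exactly the reduction the slides call for.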
  • HPC Systems in the Next Decade – What to Expect, When, Where
EPJ Web of Conferences 245, 11004 (2020), https://doi.org/10.1051/epjconf/202024511004, CHEP 2019
HPC Systems in the Next Decade – What to Expect, When, Where
Dirk Pleiter, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany

Abstract. HPC systems have seen impressive growth in terms of performance over a period of many years. Soon the next milestone is expected to be reached with the deployment of exascale systems in 2021. In this paper, we provide an overview of the exascale challenges from a computer architecture's perspective and explore technological and other constraints. The analysis of upcoming architectural options and emerging technologies allows for setting expectations for application developers, who will have to cope with heterogeneous architectures, increasingly diverse compute technologies, as well as deeper memory and storage hierarchies. Finally, needs resulting from changing science and engineering workflows will be discussed, which need to be addressed by making HPC systems available as part of more open e-infrastructures that also provide other compute and storage services.

1 Introduction
Over several decades, high-performance computing (HPC) has seen an exponential growth of performance when solving large-scale numerical problems involving dense-matrix operations. This development is well documented by the Top500 list [1], which has been published twice a year since 1993. In various regions around the world, significant efforts are being made to reach the next milestone of performance with the deployment of exascale systems. For various reasons, improving on the performance of HPC systems has become increasingly more difficult. In this paper, we discuss some of the reasons.
  • Management Direction Update
• Thank you for taking the time out of your busy schedules today to participate in this briefing on our fiscal year 2020 financial results.
• To prevent the spread of COVID-19 infections, we are holding today's event online. We apologize for any inconvenience this may cause.
• My presentation will offer an overview of the progress of our management direction. To meet the management targets we announced last year for FY2022, I will explain the initiatives we have undertaken in FY2020 and the areas on which we will focus this fiscal year and into the future.
• It was just around a year ago that we first defined our Purpose: "to make the world more sustainable by building trust in society through innovation." Today, all of our activities are aligned to achieve this purpose.
• First, I would like to provide a summary of our financial results for FY2020. The upper part of the table shows company-wide figures for Fujitsu. Consolidated revenue was 3,589.7 billion yen, and operating profit was 266.3 billion yen.
• In the lower part of the table, which shows the results in our core business of Technology Solutions, consolidated revenue was 3,043.6 billion yen, operating profit was 188.4 billion yen, and our operating profit margin was 6.2%.
• Due to the impact of COVID-19, revenue fell, but because of our shift toward the services business and greater efficiencies, on a company-wide basis our operating profit and profit for the year were the highest in Fujitsu's history.
  • 40 Jahre Erfolgsgeschichte BS2000 (40 Years of BS2000: A Success Story), presentation from 2014
Fujitsu NEXT Working Group BS2000
40 Years of BS2000: A Success Story
Achim Dewor, Fujitsu NEXT AK BS2000, 5/6 May 2014

What was actually going on 40 years ago, in 1974?
• 1974: The Apollo-Soyuz docking. The sitting presidents Richard Nixon and Leonid Brezhnev sign off on the project; preparations for a new joint docking mission take two years.
• 1974: The first VW Golf is delivered.
• 1974: Willy Brandt resigns. On 6 May 1974, Chancellor Willy Brandt steps down in the course of the espionage affair surrounding his personal aide, the GDR spy Günter Guillaume.
• 1974: Muhammad Ali versus George Foreman. "I will float like a butterfly and sting like a bee", "I am the greatest who has ever lived", "I don't always know what I'm talking about. But I know that I'm right." (quotes from Ali)
• 1974: Germany becomes football world champion. Breitner's equalizing penalty and the unforgettable, typical Müller winning goal from the turn; the sporting high point of arguably the best German team of all time.

What does "old" actually mean? Thesis: "Standing still makes you old … what is continuously developed stays forever young!"

What is "old", and what has been further developed? Will the bicycle soon be 200 years old? Which one, the left or the right: the bicycle of early 1817, or the bicycle of 2014? The soroban (Japan): in the East, from the Balkans to China, it is still used here and there as an inexpensive calculating machine in smaller shops.
  • Toward Automated Cache Partitioning for the K Computer
IPSJ SIG Technical Report
Toward Automated Cache Partitioning for the K Computer
Swann Perarnau, Mitsuhisa Sato

Abstract: The processor architecture available on the K computer (SPARC64VIIIfx) features a hardware cache partitioning mechanism called the sector cache. This facility enables software to split the memory cache into two independent sectors and to select which one will receive a line when it is retrieved from memory. Such control over the cache by an application enables significant performance optimization opportunities for memory-intensive programs, as several studies on software-controlled cache partitioning environments have already demonstrated. Most of these previous studies share the same implementation idea: the use of page coloring and an overriding of the operating system's virtual memory manager. However, the sector cache differs in several key points from these implementations, making the use of the existing analysis or optimization strategies impractical, while enabling new ones. For example, while most studies overlooked the issue of phase changes in an application because of the prohibitive cost of repartitioning, the sector cache provides this feature without any cost, allowing multiple cache distribution strategies to alternate during an execution and providing the best allocation for the current locality pattern. Unfortunately, for most application programmers, this hardware cache partitioning facility is not easy to take advantage of. It requires intricate knowledge of the memory access patterns of a code and the ability to identify data structures that would benefit from being isolated in cache. This optimization process can be tedious, with multiple code modifications and countless application runs necessary to achieve a good optimization scheme.
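The optimization described above is easiest to picture on a kernel that mixes streaming data with a small, heavily reused table. Below is a hypothetical sketch in plain C (array names and sizes are invented); the actual sector assignments are requested through Fujitsu-compiler-specific directives, which are deliberately not reproduced here.

```c
#include <stddef.h>

#define TABLE_SIZE 4096  /* small lookup table, reused on every iteration */

/* Streaming pass over a large array with a gather from a small table.
 * Without partitioning, the streaming accesses to in[]/out[] keep evicting
 * table[] from the shared cache. With a sector cache, table[] can be
 * assigned to its own sector so the streaming traffic cannot displace it
 * (on the Fujitsu compilers this is requested with vendor-specific
 * directives around this loop). */
void gather_scale(size_t n, const double *restrict in,
                  const unsigned *restrict idx, const double *restrict table,
                  double *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * table[idx[i] % TABLE_SIZE];
}
```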
  • The Post-K Project and Fujitsu ARM-SVE Enabled A64FX Processor for Energy-Efficiency and Sustained Application Performance
CEA-RIKEN Summer School, 13th June 2019, MDLS, Paris
The Post-K project and the Fujitsu Arm-SVE-enabled A64FX processor for energy efficiency and sustained application performance
Mitsuhisa Sato, Team Leader of the Architecture Development Team; Deputy Project Leader, FLAGSHIP 2020 project; Deputy Director, RIKEN Center for Computational Science (R-CCS); Professor (Cooperative Graduate School Program), University of Tsukuba

The name of our system (a.k.a. Post-K) was announced as "Fugaku" on May 23, 2019. 富岳 (Fugaku) = Mt. Fuji. (Image: http://www.bestweb-link.net/PD-Museum-of-Art/ukiyoe/ukiyoe/fugaku36/No.027.jpg)

FLAGSHIP 2020 Project
• Missions: building the Japanese national flagship supercomputer "Fugaku" (a.k.a. Post-K), and developing a wide range of HPC applications that run on Fugaku in order to solve social and science issues in Japan.
• Planned budget (FY2014 to FY2020): 110 billion JPY in total (about 1 billion US$ at 110 JPY/US$), covering research and development, manufacturing of the Fugaku system, and development of applications.
• Project organization: RIKEN is in charge of system development, with Fujitsu as the vendor partner; international collaborations include DOE, JLESC, and CEA. On the applications side, the government selected 9 social and scientific priority issues and their R&D organizations; additional projects for exploratory issues were selected in June 2016.

Target science: 9 priority issues (the first three shown)
• Priority Issue 1: Innovative drug discovery infrastructure through functional control of biomolecular systems (society with health and longevity) - RIKEN Quantitative Biology Center
• Priority Issue 2: Integrated computational life science to support personalized and preventive medicine (society with health and longevity) - Institute of Medical Science, University of Tokyo
• Priority Issue 3: Integrated prediction system for compound hazards and disasters induced by earthquake and tsunami (disaster prevention and global climate) - Earthquake Res. Inst., U. …