Fugaku: the First 'Exascale' Supercomputer - Past, Present and Future


Fugaku: the First 'Exascale' Supercomputer - Past, Present and Future
Satoshi Matsuoka, Director, RIKEN R-CCS / Prof., Tokyo Institute of Technology
Cluster 2020 Keynote Presentation, 15 Sep. 2020

Science of Computing, by Computing, for Computing
R-CCS is an international core research center in the science of high-performance computing (HPC), one of RIKEN's 12 research centers, with 18 research teams, 4 operations teams, and more than 200 FTEs.

Achievements
• First 'exascale' SC: Fugaku R&D. R&D to design and build the world's first "exascale" supercomputer started at the genesis of the center. Construction of Fugaku was officially approved by the government in Nov. 2018; installation started in Dec. 2019, with operations to start in early 2021. (Photo: Fugaku/post-K rack and CPU prototype, ©Fujitsu)
• Groundbreaking apps, e.g. large-scale earthquake disaster simulation with HPC & AI: ultra-large-scale earthquake simulation using AI for a massive-scale earthquake in the Tokyo area, modeling buildings, underground structures, and geological layers (©Tokyo Univ.). Nominated as a Gordon Bell 2018 finalist; won the SC16 and SC17 Best Poster Awards.
• Industry outreach, e.g. the RIKEN Combustion System CAE Consortium: a RIKEN consortium to develop next-generation CAE for combustors, established with 11 industry and 9 academia members. It is expected to expand the industrial users of R-CCS software as well as of Fugaku. (Simulation CAE examples)
• Top-tier awards: a high-speed algorithm for analyzing social networks won the Best Paper Award at HiPC. Dr. Matsuoka received the Achievement Award at ACM HPDC 2018, and also the Asia HPC Leadership Award at SupercomputingAsia 2019.

The "Fugaku" 富岳 "Exascale" Supercomputer for Society 5.0
Mt. Fuji representing the ideal of supercomputing:
• High peak (capability) --- acceleration of large-scale applications
• Broad base --- applicability & capacity: broad applications (simulation, data science, AI, ...) and a broad user base (academia, industry, cloud startups, ...)

Fugaku: Largest & Fastest Supercomputer Ever
'Applications First' R&D challenge, a high-risk "moonshot" R&D:
● A new high-performance, low-power Arm A64FX CPU co-developed by RIKEN R-CCS & Fujitsu, along with nationwide HPC researchers, as a national Flagship 2020 project:
  - 3x performance c.f. the top CPU in HPC apps (the "moonshot" R&D target)
  - 3x power efficiency c.f. the top CPU
  - a general-purpose Arm CPU that runs the same programs as smartphones
  - acceleration features for AI
● Fugaku x 2~3 = the entire annual IT of Japan (the arithmetic is spelled out after the table):

               Fugaku             Servers (incl. IDC)       Smartphones
  Units        1 (160K nodes)     300,000                   20 million
                                  (~annual shipment         (~annual shipment
                                  in Japan)                 in Japan)
  Power        >30 MW             600-700 W x 300,000       10 W x 20 million
               (incl. cooling)    = ~200 MW (less than      = 200 MW
                                  1/10 the efficiency
                                  c.f. Fugaku)

● Developed via extensive co-design by RIKEN, Fujitsu, the HPCI centers, etc., and the Arm ecosystem, reflecting numerous research results: the "Science of Computing" side designs the machine, while the "Science by Computing" side develops target applications in "9 priority areas" to tackle important societal problems.
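The table's power figures are round-number arithmetic; spelled out (my restatement of the slide's numbers, assuming roughly 650 W per server):

```latex
\begin{aligned}
P_{\text{servers}}     &\approx 650\,\mathrm{W} \times 3\times10^{5} \approx 200\,\mathrm{MW},\\
P_{\text{smartphones}} &= 10\,\mathrm{W} \times 2\times10^{7} = 200\,\mathrm{MW},\\
P_{\text{Fugaku}}      &\approx 30\,\mathrm{MW}\ \text{(incl. cooling)}.
\end{aligned}
```

Each annual shipment category thus draws roughly 200 MW, against Fugaku's ~30 MW for the whole 160K-node machine.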
Brief History of R-CCS towards Fugaku
• January 2006: Next Generation Supercomputer Project (K computer) start
• July 2010: RIKEN AICS established
• August 2010: HPCI Project start
• September 2010: K computer installation start; first meeting of SDHPC (Post-K)
• June 2011: K #1 on Top500
• November 2011: K #1 on Top500 at >10 petaflops; ACM Gordon Bell Award
• End of FY2011 (March 2012): SDHPC Whitepaper
• April 2012: Post-K Feasibility Study start (3 architecture teams and 1 apps team)
• June 2012: K computer construction complete
• September 2012: K computer production start
• November 2012: ACM Gordon Bell Award
• End of FY2013 (March 2014): Post-K Feasibility Study reports
• April 2014: Post-K project start
• June 2014: K #1 on Graph500
• April 2018: AICS renamed to RIKEN R-CCS; Satoshi Matsuoka becomes the new Director
• August 2018: Arm A64fx announced at Hot Chips
• October 2018: NEDO 100x processor project start
• November 2018: Post-K manufacturing approved by the Prime Minister's CSTI committee
• March 2019: Post-K manufacturing start
• May 2019: Post-K named "Supercomputer Fugaku"
• July 2019: Post-Moore Whitepaper start
• August 2019: K computer shutdown
• December 2019: Fugaku installation start
• April 2020: COVID-19 teams start
• May 2020: Fugaku shipment complete
• June 2020: Fugaku #1 on Top500 etc.

"Applications First" Co-Design Activities in Fugaku
Science by Computing meets Science of Computing for the post-K supercomputer:
• 9 priority app areas of high concern to the general public: medical/pharma, environment/disaster, energy, manufacturing, ...
• Select representatives from 100s of applications signifying various computational characteristics (Science by Computing); design the system with parameters that consider those application characteristics (Science of Computing: RIKEN R-CCS & Fujitsu, A64fx).
• Extremely tight collaboration between the co-design apps centers, RIKEN, and Fujitsu, etc.; 9 representative apps were chosen as the "target application" scenario.
• Goal: up to 100x speedup c.f. the K computer, together with ease of programming, a broad SW ecosystem, and very low power.

A64FX: Leading-Edge Si Technology
• TSMC 7nm FinFET & CoWoS packaging
• Broadcom SerDes, HBM I/O, and SRAMs
• 8.786 billion transistors, 594 signal pins
• (Die layout: four 13-core clusters, each with an L2 cache and HBM2 interface, connected by a ring bus; TofuD and PCIe interfaces on-die.) [Post-K Activities, ISC19, Frankfurt; © FUJITSU LIMITED 2019]

Fugaku's Fujitsu A64fx processor is... a many-core Arm CPU:
• 48 compute cores + 2 or 4 assistant (OS) cores
• brand-new core design, with near Xeon-class integer performance per core
• Arm v8 --- the 64-bit Arm ecosystem
• Tofu-D + PCIe 3 external connections
...but also an accelerated, GPU-like processor:
• SVE 512-bit x 2 vector extensions (Arm & Fujitsu): integer (1, 2, 4, 8 bytes) + float (16, 32, 64 bits); see the code sketch after this list
• cache + memory localization (sector cache)
• HBM2 on-package memory --- massive memory bandwidth (Bytes/DPF ~0.4)
• streaming memory access, strided access, scatter/gather, etc.
• intra-chip barrier synchronization and other memory-enhancing features
GPU-like high performance in HPC, especially CFD, e.g. weather & climate (even with traditional Fortran code), plus AI/big data.
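To make the SVE programming model above concrete, here is a minimal DAXPY kernel in C using the Arm C Language Extensions for SVE. This is my sketch, not code from the slides; the point is that the code is vector-length-agnostic, so the same binary uses the full 512-bit vectors (8 doubles per operation) on A64FX.

```c
#include <arm_sve.h>
#include <stdint.h>

/* y[i] += a * x[i], vector-length-agnostic SVE with per-element
 * predication, as advertised for the A64FX core. */
void daxpy(double a, const double *x, double *y, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntd()) {   /* svcntd(): doubles per vector */
        /* predicate masks off lanes past the end of the array */
        svbool_t pg = svwhilelt_b64(i, n);
        svfloat64_t vx = svld1_f64(pg, &x[i]);
        svfloat64_t vy = svld1_f64(pg, &y[i]);
        /* fused multiply-add: vy = vy + vx * a */
        vy = svmla_n_f64_x(pg, vy, vx, a);
        svst1_f64(pg, &y[i], vy);
    }
}
```

Built with something like `gcc -O2 -march=armv8.2-a+sve`, this should compile to the predicated SVE load/FMA/store instructions the slide's "512-bit x 2" FMA pipes execute.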
A64FX Chip Overview
Architecture features:
• Armv8.2-A (AArch64 only) with SVE 512-bit wide SIMD
• 48 computing cores + 4 assistant cores (all cores are identical)
• HBM2, 32 GiB on package
• Tofu interconnect: 6D mesh/torus, 28 Gbps x 2 lanes x 10 ports
• PCIe Gen3, 16 lanes
• CMG specification: 13 cores, L2$ 8 MiB, memory 8 GiB at 256 GB/s
• 7nm FinFET: 8,786M transistors, 594 package signal pins
• Peak performance >2.7 TFLOPS (>90% efficiency on DGEMM); memory B/W 1024 GB/s (>80% on STREAM Triad)

A64FX (Post-K) vs. SPARC64 XIfx (PRIMEHPC FX100):

  ISA (base)         Armv8.2-A       SPARC-V9
  ISA (extension)    SVE             HPC-ACE2
  Process node       7nm             20nm
  Peak performance   >2.7 TFLOPS     1.1 TFLOPS
  SIMD width         512-bit         256-bit
  # of cores         48+4            32+2
  Memory             HBM2            HMC
  Memory peak B/W    1024 GB/s       240 GB/s x2 (in/out)

A64FX Core Pipeline
A64FX enhances and inherits the superior features of SPARC64:
• inherits superscalar, out-of-order execution, branch prediction, etc.
• enhances SIMD and predicate operations: 2x 512-bit wide SIMD FMA with predicate operation, plus 4x ALU (shared w/ 2x AGEN); 2x 512-bit wide SIMD loads or 1x 512-bit wide SIMD store
(Pipeline diagram: Fetch, Issue, Dispatch, Register-Read, Execute, Cache and Memory, and Commit stages; the SIMD FLA/FLB pipes and EXA-EXD units are the enhanced blocks.)

Power Management (Cont.)
"Power knobs" for power optimization: A64FX provides a power management function called the "power knob", by which applications can change hardware configurations for power optimization. Power knobs and the energy monitor/analyzer will help users optimize the power consumption of their applications. Example knobs:
• decode width reduced to 2
• EX pipeline usage: EXA only
• FL pipeline usage: FLA only
• frequency reduction
• HBM2 B/W adjustment (in units of 10%)

Many-Core Architecture
A64FX consists of four CMGs (Core Memory Groups):
• A CMG consists of 13 cores, an L2 cache, and a memory controller; one of the 13 cores is an assistant core which handles daemons, I/O, etc.
• The four CMGs keep cache coherency by ccNUMA with an on-chip directory.
• The X-bar connection within a CMG maximizes the throughput efficiency of the L2 cache (8 MiB, 16-way), and process binding within a CMG allows linear scalability up to 48 cores.
• The on-chip network with a wide ring bus secures I/O performance (the Tofu and PCIe controllers attach to the network on chip).

High Bandwidth
Extremely high bandwidth in caches and memory: A64FX has out-of-order mechanisms in the cores, the caches, and the memory controllers, which maximize the usable bandwidth of each layer (a minimal STREAM-triad kernel follows the list). Per CMG: 12 computing cores + 1 assistant core.
• Core: 512-bit wide SIMD, 2x FMAs; >2.7 TFLOPS per chip
• L1D cache (64 KiB, 4-way): >230 GB/s load and >115 GB/s store per core; >11.0 TB/s per chip (B/F ratio = 4)
• L2 cache (8 MiB, 16-way): >115 GB/s load and >57 GB/s store per core; >3.6 TB/s per chip (B/F ratio = 1.3)
• Memory (HBM2, 8 GiB per CMG): 256 GB/s per CMG, 1024 GB/s per chip (B/F ratio ~0.37)

[Slide material © FUJITSU LIMITED 2018.]
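Since the sustained-bandwidth figure is quoted as ">80% @ STREAM Triad", a minimal version of that benchmark kernel makes the metric concrete. This is my sketch of the standard triad loop, not Fujitsu's code; the arrays must far exceed the 32 MiB of total L2 so the measurement reflects HBM2, and the byte count ignores any write-allocate traffic.

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 27)   /* 128M doubles = 1 GiB per array, far larger than cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);

    /* first-touch initialization in parallel, so pages land on the
     * ccNUMA domain (CMG) of the thread that will use them */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];          /* triad: 2 loads + 1 store */
    t = omp_get_wtime() - t;

    printf("Triad bandwidth: %.1f GB/s\n", 3.0 * 8.0 * N / t / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```

On a ccNUMA chip like this, binding threads per CMG (the "process binding in a CMG" point above) is what keeps such a loop scaling linearly to 48 cores.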
Recommended publications
  • File System and Power Management Enhanced for Supercomputer Fugaku
File System and Power Management Enhanced for Supercomputer Fugaku
Hideyuki Akimoto, Takuya Okamoto, Takahiro Kagami, Ken Seki, Kenichirou Sakai, Hiroaki Imade, Makoto Shinohara, Shinji Sumimoto
RIKEN and Fujitsu are jointly developing the supercomputer Fugaku as the successor to the K computer with a view to starting public use in FY2021. While inheriting the software assets of the K computer, the plan for Fugaku is to make improvements, upgrades, and functional enhancements in various areas such as computational performance, efficient use of resources, and ease of use. As part of these changes, functions in the file system have been greatly enhanced, with a focus on usability in addition to improving performance and capacity beyond those of the K computer. Additionally, as reducing power consumption and using power efficiently are issues common to all ultra-large-scale computer systems, power management functions have been newly designed and developed as part of the upgrading of operations management software in Fugaku. This article describes the Fugaku file system, featuring significantly enhanced functions relative to the K computer, and introduces the new power management functions.
1. Introduction
RIKEN and Fujitsu are developing the supercomputer Fugaku as the successor to the K computer and are planning to begin public service in FY2021. Fugaku is composed of various kinds of system software for supporting the execution of applications. As the application development environment is taken up in separate articles [1, 2], this article focuses on the file system and operations management software. The file system provides a high-performance and reliable storage environment for application programs and associated data.
  • Performance Analysis on Energy Efficient High-Performance Architectures
Performance Analysis on Energy Efficient High-Performance Architectures
Roman Iakymchuk, François Trahay (Institut Mines-Télécom, Télécom SudParis, 9 Rue Charles Fourier, 91000 Évry, France; {roman.iakymchuk, francois.trahay}@telecom-sudparis.eu). CC'13: International Conference on Cluster Computing, Jun 2013, Lviv, Ukraine. HAL Id: hal-00865845, https://hal.archives-ouvertes.fr/hal-00865845, submitted on 25 Sep 2013.
Abstract. With the shift in high-performance computing (HPC) towards energy-efficient hardware architectures such as accelerators (NVIDIA GPUs) and embedded systems (ARM processors) arose the need to adapt existing performance analysis tools to these new systems. We present EZTrace, a performance analysis framework for parallel applications. EZTrace relies on several core components, in particular a mechanism for instrumenting functions, a lightweight tool for recording events, and a generic interface for writing traces. To support EZTrace on energy-efficient HPC systems, we developed a CUDA module and ported EZTrace to ARM processors.
  • Science-Driven Development of Supercomputer Fugaku
Special Contribution: Science-driven Development of Supercomputer Fugaku
Hirofumi Tomita, Team Leader, Flagship 2020 Project, RIKEN Center for Computational Science
In this article, I would like to take a historical look at activities on the application side during supercomputer Fugaku's climb to the top of the supercomputer rankings, as seen from a vantage point in early July 2020. I would like to describe, in particular, the change in mindset that took place on the application side along the path toward Fugaku's top ranking in four benchmarks in 2020, a path that began with the Application Working Group and the Computer Architecture/Compiler/System Software Working Group in 2011. Somewhere along this path, the application side joined forces with the architecture/system side based on the keyword "co-design." During this journey, there were times when our opinions clashed and when efforts to solve problems came to a halt, but there were also times when we took bold steps in a unified manner to surmount difficult obstacles. I will leave the description of specific technical debates to other articles; here, I would like to describe the flow of Fugaku development from the application side, based heavily on a "science-driven" approach, along with some of my personal opinions. Actually, we are only halfway along this path. We look forward to the many scientific achievements and solutions to social problems that Fugaku is expected to bring about.
1. Introduction
On June 22, 2020, the supercomputer Fugaku became the first supercomputer in history to simultaneously rank No. 1 in four benchmark rankings. In this article, I would like to take a look back at the flow along the path toward Fugaku's top ranking as seen from the application side from a vantage point in early July 2020.
  • BDEC2 Poznan Japanese HPC Infrastructure Update
BDEC2 Poznan: Japanese HPC Infrastructure Update
Presenter: Masaaki Kondo, RIKEN R-CCS / Univ. of Tokyo (on behalf of Satoshi Matsuoka, Director, RIKEN R-CCS)
Post-K: The Game Changer (2020)
1. Heritage of the K computer, high performance in simulation via extensive co-design: high performance (up to 100x the performance of K in real applications); multitudes of scientific breakthroughs via Post-K application programs; simultaneous high performance and ease of programming.
2. New technology innovations of Post-K, giving global leadership not just in the machine & apps, but in cutting-edge IT:
• High performance, especially via high memory bandwidth: performance boost by "factors" c.f. mainstream CPUs in many HPC & Society 5.0 apps, via bandwidth & vector acceleration.
• Very green, e.g. extreme power efficiency: ultra power-efficient design & various power-control knobs.
• Arm global ecosystem & SVE contribution: top CPU in the Arm ecosystem of 21 billion chips/year; SVE co-design and the world's first implementation by Fujitsu.
• High performance on Society 5.0 apps incl. AI: architectural features for Big Data, AI/ML, CAE/EDA, blockchain security, etc.
• Technology not limited to Post-K, but feeding into societal IT infrastructures, e.g. clouds.
Arm64fx & Post-K (to be renamed)
Fujitsu-RIKEN-designed A64fx, an Arm v8.2 (SVE) 48/52-core CPU. HPC-optimized: extremely high package memory bandwidth (1 TByte/s), on-die Tofu-D network bandwidth (~400 Gbps), high SVE FLOPS (~3 teraflops), and various AI support (FP16, INT8, etc.). Gen purpose CPU – Linux,
  • Supercomputer Fugaku
Supercomputer Fugaku
Toshiyuki Shimizu, Feb. 18th, 2020, FUJITSU LIMITED
Outline: Fugaku project overview; co-design (approach and design results); performance & energy-consumption evaluation; Green500; OSS apps; Fugaku priority issues; summary.
Supercomputer "Fugaku", formerly known as Post-K: focus and approach
• Application performance: co-design with application developers, and a Fujitsu-designed CPU core with high memory bandwidth utilizing HBM2.
• Power efficiency: leading-edge Si technology, Fujitsu's proven low-power & high-performance logic design, and power-controlling knobs.
• Usability: Arm v8-A ISA with the Scalable Vector Extension (SVE), and Arm-standard Linux.
Fugaku project schedule (2011-2022): feasibility study; architecture selection & co-design with apps groups, apps sizing; basic design; detailed design & implementation; manufacturing, installation, and tuning; general operation.
Fugaku co-design:
• Co-design goals: obtain the best performance (100x apps performance over the K computer) within the power budget of 30-40 MW, by designing applications, compilers, libraries, and hardware together.
• Approach: estimate performance & power using application information, performance counts from the Fujitsu FX100, and a cycle-based simulator. Computation time: brief & precise estimation; communication time: bandwidth and latency with attributes for communication patterns; I/O time. Then optimize apps/compilers etc. and resolve bottlenecks.
• Estimation of performance and power: precise performance estimation for primary kernels (make & run Fugaku objects on the Fugaku cycle-based simulator); brief performance estimation for other sections (replace FX100 performance counts with Fugaku parameters: # of inst. commit/cycle, wait cycles of barrier, inst.
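The "brief estimation" step described above amounts to bounding each code section by its compute and memory demands. A toy roofline-style estimator shows the shape of the calculation; this is my illustration only, since the real co-design flow used FX100 performance counters and a cycle-accurate simulator.

```c
#include <stdio.h>

/* Toy roofline estimate: a section needing `flops` floating-point
 * operations and `bytes` of memory traffic can finish no faster than
 * the larger of its compute-bound and bandwidth-bound times. */
double estimate_seconds(double flops, double bytes,
                        double peak_flops, double peak_bw) {
    double t_compute = flops / peak_flops;
    double t_memory  = bytes / peak_bw;
    return t_compute > t_memory ? t_compute : t_memory;
}

int main(void) {
    /* A64FX-class parameters from the slides: >2.7 TF/s, 1024 GB/s */
    double peak_flops = 2.7e12, peak_bw = 1.024e12;
    /* hypothetical kernel: 1e12 flops, 4e12 bytes -> memory-bound */
    printf("estimated time: %.3f s\n",
           estimate_seconds(1e12, 4e12, peak_flops, peak_bw));
    return 0;
}
```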
  • Measuring Power Consumption on IBM Blue Gene/P
Measuring Power Consumption on IBM Blue Gene/P
Michael Hennecke, Wolfgang Frings, Willi Homberg, Anke Zitz, Michael Knobloch, Hans Böttiger. Comput Sci Res Dev, DOI 10.1007/s00450-011-0192-y, Special Issue Paper. © The Author(s) 2011; published with open access at Springerlink.com.
Abstract. Energy efficiency is a key design principle of the IBM Blue Gene series of supercomputers, and Blue Gene systems have consistently gained top GFlops/Watt rankings on the Green500 list. The Blue Gene hardware and management software provide built-in features to monitor power consumption at all levels of the machine's power distribution network. This paper presents the Blue Gene/P power measurement infrastructure and discusses the operational aspects of using this infrastructure on petascale machines. We also describe the integration of Blue Gene power monitoring capabilities into system-level tools like LLview, and highlight some results of analyzing the production workload at Research Center Jülich (FZJ).
The Top10 supercomputers on the November 2010 Top500 list [1] alone (which coincidentally are also the 10 systems with an Rpeak of at least one PFlops) are consuming a total power of 33.4 MW [2]. These levels of power consumption are already a concern for today's petascale supercomputers (with operational expenses becoming comparable to the capital expenses for procuring the machine), and addressing the energy challenge clearly is one of the key issues when approaching exascale. While the Flops/Watt metric is useful, its emphasis on LINPACK performance and thus computational load neglects the fact that the energy costs of memory references
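Turning sampled power readings like Blue Gene/P's into energy figures is a simple numerical integration. A hedged sketch follows; the sample format is hypothetical and is not the actual Blue Gene/P monitoring API, which exposes readings per power-distribution domain.

```c
#include <stdio.h>

/* One power sample: timestamp in seconds, power in watts (hypothetical). */
struct sample { double t; double watts; };

/* Trapezoidal integration of power over time gives energy in joules. */
double energy_joules(const struct sample *s, int n) {
    double e = 0.0;
    for (int i = 1; i < n; i++)
        e += 0.5 * (s[i].watts + s[i-1].watts) * (s[i].t - s[i-1].t);
    return e;
}

int main(void) {
    /* made-up two-minute trace of a ~40 kW rack */
    struct sample trace[] = {{0, 40e3}, {60, 42e3}, {120, 41e3}};
    printf("%.2f kWh\n", energy_joules(trace, 3) / 3.6e6);  /* 1 kWh = 3.6e6 J */
    return 0;
}
```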
  • Providing a Robust Tools Landscape for CORAL Machines
Providing a Robust Tools Landscape for CORAL Machines
4th Workshop on Extreme Scale Programming Tools @ SC15, Austin, Texas, November 16, 2015. Dong H. Ahn (Lawrence Livermore National Lab), Michael Brim (Oak Ridge National Lab), Scott Parker (Argonne National Lab), Gregory Watson (IBM), Wesley Bland (Intel)
CORAL: Collaboration of ORNL, ANL, LLNL
Objective: procure 3 leadership computers, to be sited at Argonne, Oak Ridge, and Lawrence Livermore in 2017. Current DOE leadership computers: Titan (ORNL), Sequoia (LLNL), and Mira (ANL), each operating 2012-2017. Leadership computers: the RFP requests >100 PF, 2 GiB/core main memory, local NVRAM, and science performance 4x-8x Titan or Sequoia.
Approach:
• Competitive process: one RFP (issued by LLNL) leading to 2 R&D contracts and 3 computer procurement contracts.
• For risk reduction and to meet a broad set of requirements, 2 architectural paths will be selected, and Oak Ridge and Argonne must choose different architectures.
• Once selected, a multi-year lab-awardee relationship to co-design the computers; both R&D contracts jointly managed by the 3 labs.
• Each lab manages and negotiates its own computer procurement contract, and may exercise options to meet its specific needs.
CORAL innovations and value: IBM, Mellanox, and NVIDIA awarded $325M in U.S. Department of Energy CORAL contracts.
• Scalable system solution (scale up, scale down) to address a wide range of application domains
• Modular, flexible, cost-effective
• Directly leverages OpenPOWER partnerships and IBM's Power system roadmap
• Air and water cooling
• Heterogeneous
  • Summit Supercomputer
Cooling System Overview: Summit Supercomputer
David Grant, PE, CEM, DCEP, HPC Mechanical Engineer, Oak Ridge National Laboratory; Corresponding Member of ASHRAE TC 9.9; Infrastructure Co-Lead, EEHPCWG. ORNL is managed by UT-Battelle, LLC for the US Department of Energy.
Today's presentation (#1 on the Top500 list, June and November 2018): system description; cooling system components; cooling system performance.

  Feature                   Titan                      Summit
  Application performance   Baseline                   5-10x Titan
  Number of nodes           18,688                     4,608
  Node performance          1.4 TF                     42 TF
  Memory per node           32 GB DDR3 + 6 GB GDDR5    512 GB DDR4 + 96 GB HBM2
  NV memory per node        0                          1600 GB
  Total system memory       710 TB                     >10 PB DDR4 + HBM2 + non-volatile
  System interconnect       Gemini (6.4 GB/s)          Dual-rail EDR-IB (25 GB/s)
  Interconnect topology     3D torus                   Non-blocking fat tree
  Bi-section bandwidth      15.6 TB/s                  115.2 TB/s
  Processors per node       1 AMD Opteron™,            2 IBM POWER9™,
                            1 NVIDIA Kepler™           6 NVIDIA Volta™
  File system               32 PB, 1 TB/s, Lustre®     250 PB, 2.5 TB/s, GPFS™
  Peak power consumption    9 MW                       13 MW

System description: how do we cool it? More than 100,000 liquid connections.
System description: what's in the data center? Passive RDHXs, with 215,150 ft2 (19,988 m2) of total heat-exchange surface (>20x the area of the data center). With a 70°F (21.1°C) entering water temperature, the room averages ~73°F (22.8°C) with a ~3.5 MW load and ~75.5°F (23.9°C) with a ~10 MW load.
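The loads quoted above translate into cooling-water flow through the standard heat-balance relation; the numbers below are an illustrative calculation of mine, not figures from the talk:

```latex
P = \dot{m}\, c_p\, \Delta T
\quad\Rightarrow\quad
\dot{m} = \frac{10\times10^{6}\,\mathrm{W}}
               {4186\,\mathrm{J\,kg^{-1}\,K^{-1}} \times 10\,\mathrm{K}}
        \approx 240\,\mathrm{kg/s}
```

That is, removing a 10 MW load with a 10 K water-temperature rise takes on the order of 240 L/s of flow across the heat exchangers.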
  • Download SC18 Final Program
Program: November 11-16, 2018; Exhibits: November 12-15, 2018. SC18, The International Conference for High Performance Computing, Networking, Storage, and Analysis. Kay Bailey Hutchison Convention Center, Dallas, Texas, USA.
Michael S. Rawlings, Mayor of Dallas, November 11, 2018
Supercomputing – SC18 Conference:
As Mayor of Dallas, it is my pleasure to welcome SC18 – the ultimate international conference for high performance computing, networking, storage, and analysis – to the Kay Bailey Hutchison Convention Center! I applaud SC18 for bringing together high-performance computing (HPC) professionals and students from across the globe to share ideas, attend and participate in technical presentations, papers and workshops, all while inspiring the next generation of HPC professionals. Dallas is the No. 1 visitor destination in Texas and one of the top cities in the nation for meetings and conventions. In addition to having the resources to host major events and conventions, Dallas is a greatly diverse American city and a melting pot of cultures. This important convergence of uniqueness and differences is reflected throughout the sights and sounds of our city. Dallas' authentic arts, music, food, historic landmarks and urban lifestyle all contribute to the city's rich cultural makeup. Let the bold, vibrant spirit of Dallas, with a touch of Texas charm, guide the way to your BIG Dallas moments. Our city is filled with the unique experiences one can only have in Dallas – from cuisine by celebrity chefs, to the cultural landscape in one of the nation's largest arts districts, to first-class shopping – make the most of your time in our great city! From transportation to hotels, restaurants, retail and entertainment venues, Dallas is ready to roll out our red carpet to host SC18.
  • Compute System Metrics (BoF)
Setting Trends for Energy-Efficient Supercomputing
Natalie Bates, EE HPC Working Group; Tahir Cader, HP & The Green Grid; Wu Feng, Virginia Tech & Green500; John Shalf, Berkeley Lab & NERSC; Horst Simon, Berkeley Lab & TOP500; Erich Strohmaier, Berkeley Lab & TOP500. ISC BoF; May 31, 2010; Hamburg, Germany.
Why we are here: "You can only improve what you can measure." Context: the power consumption of HPC systems and the cost of facilities are increasing. What is needed: converge on a common basis between different research and industry groups for metrics, methodologies, and workloads for energy-efficient supercomputing, so we can make progress towards solutions.
Current technology roadmaps will depart from historical gains; power is the leading design constraint (Peter Kogge, DARPA Exascale Study), and the power costs will still be staggering: $1M per megawatt per year, even with cheap power. (Charts: absolute power levels, power consumption, power efficiency.)
What we have done: the stages of green supercomputing are denial, awareness, hype, and substance.
The denial phase (2001-2004): Green Destiny, a 240-node supercomputer in 5 sq. ft. built from embedded processors; LINPACK performance 101 Gflops; power consumption 3.2 kW. Prevailing views: "Green Destiny is so low power that it runs just as fast when it is unplugged." "In HPC, no one cares about power & cooling, and no one ever will..." "Moore's Law for Power will stimulate the economy by creating a new market in cooling technologies."
The awareness phase (2004-2008): green movements & studies. The IEEE Int'l Parallel & Distributed Processing Symposium (2005) launched the Workshop on High-Performance, Power-Aware Computing (HPPAC); the Green500 adopted metrics such as the energy-delay product and FLOPS/Watt. The Green Grid (2007), an industry-driven consortium of all the top system vendors, introduced the Power Usage Effectiveness (PUE) metric. Kogge et al., "ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems," DARPA IPTO, AFRL, 2008.
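The "$1M per megawatt per year" line is easy to verify; a worked check, assuming roughly $0.11/kWh for cheap utility power:

```latex
1\,\mathrm{MW} \times 8760\,\mathrm{h/yr} = 8.76\times10^{6}\,\mathrm{kWh/yr},
\qquad
8.76\times10^{6}\,\mathrm{kWh} \times \$0.11/\mathrm{kWh} \approx \$1.0\mathrm{M}.
```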
  • Mixed Precision Block Fused Multiply-Add
Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores
Pierre Blanchard, Nicholas J. Higham, Florent Lopez, Théo Mary, Srikara Pranesh. SIAM Journal on Scientific Computing, Society for Industrial and Applied Mathematics, 2020, 42 (3), pp. C124-C141, DOI 10.1137/19M1289546. HAL Id: hal-02491076v2, https://hal.archives-ouvertes.fr/hal-02491076v2, submitted on 28 May 2020.
Abstract. Computing units that carry out a fused multiply-add (FMA) operation with matrix arguments, referred to as tensor units by some vendors, have great potential for use in scientific computing. However, these units are inherently mixed precision, and existing rounding error analyses do not support them. We consider a mixed precision block FMA that generalizes both the usual scalar FMA and existing tensor units.
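For orientation, here is the textbook rounding model behind such analyses (my summary in standard notation, not the paper's specific bounds). In one common tensor-core configuration the matrix arguments are fp16 while the accumulation is fp32:

```latex
D = \mathrm{fl}_{32}\!\left(C + A\,B\right),
\qquad A, B \in \mathbb{F}_{16}^{\,b\times b},
\quad C, D \in \mathbb{F}_{32}^{\,b\times b},
```
```latex
\mathrm{fl}_p(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1+\delta),
\quad |\delta| \le u_p,
\qquad u_{16} = 2^{-11},\ \ u_{32} = 2^{-24}.
```

The gap between the two unit roundoffs is what makes the analysis "inherently mixed precision": the products are formed from arguments already rounded at u16, while accumulation error grows only at u32.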
  • Computer Architectures an Overview
Computer Architectures: An Overview
Contents: Microarchitecture; x86; PowerPC; IBM POWER; MIPS architecture; SPARC; ARM architecture; DEC Alpha; AlphaStation; AlphaServer; Very long instruction word; Instruction-level parallelism; Explicitly parallel instruction computing; References.
Microarchitecture
In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures [1]; implementations may vary due to different goals of a given design or due to shifts in technology [2]. Computer architecture is the combination of microarchitecture and instruction set design.
Relation to instruction set architecture
The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. (Figure: the Intel Core microarchitecture.) The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers to complete arithmetic logic units (ALUs) and even larger elements.