Japan's 10 Peta FLOPS Supercomputer Development

Japan's 10 Peta FLOPS Supercomputer Development Project and its energy-saving designs

Ryutaro Himeno
1) Group Director of the R&D Group, Next-Generation Supercomputer (NGS) R&D Center
2) Deputy Program Director, R&D Center for Computational Science
3) Director, Advanced Center for Computing & Communication

Contents
1. Self-introduction
2. Japan's next-generation supercomputer R&D project
   2.1 Before the project started
   2.2 Introduction and current status
   2.3 Energy issues
3. Introduction of the focused application areas

Who am I?
[Slide of career photographs spanning 1985 to 1995; images courtesy of Nissan.]

Hardware development at RIKEN: special-purpose computers
[Chart: theoretical peak performance of Japanese and US systems from 1996 to 2010. Japanese systems include CP-PACS, the Earth Simulator (41 TFLOPS), RSCC (RIKEN), PACS-CS (Tsukuba), Tsubame (TITech), MD-GRAPE3, and GRAPE-DR (U. Tokyo/RIKEN); US systems include ASCI Red, ASCI Blue Mountain, ASCI Blue Pacific, ASCI White, ASCI Q, Red Storm, ASC Purple, BlueGene/L (367 TFLOPS, 2004), BlueGene/Q (NLCF), and the planned HPCS systems, approaching the petaflops level by the end of the decade.]
- GRAPE was developed at the University of Tokyo in 1989.
- MDM (MD-GRAPE2) was developed by RIKEN; MDM was integrated into RSCC in 2004.
- MD-GRAPE3 was developed by RIKEN.
- The second-fastest performance record, set on MD-GRAPE3, received the Gordon Bell Prize in 2006.
- MD-GRAPE3 will be integrated into RSCC soon.
(*ASCI: Accelerated Strategic Computing Initiative)

Current RSCC system, installed in 2004 at the RIKEN Advanced Center for Computing & Communication:
- Front-end servers: 7 POWER4 servers and 2 Itanium2 servers
- Storage: 20 TB of user HDD, 200 TB of tape storage, 4 (16) tape drives, 2 TB of disk cache
- Linux cluster x4: 128 nodes (256 CPUs) each, Intel Xeon 3.06 GHz, 2 GB/node, Myrinet/InfiniBand, 1.56 TFLOPS per cluster
- Linux cluster: 512 nodes (1024 CPUs), Intel Xeon 3.06 GHz, 4 GB/node, InfiniBand, 6.26 TFLOPS
- Vector-parallel computer: NEC SX-7/32, 282.5 GFLOPS, 256 GB of memory
- MD-GRAPE3: 64 TFLOPS
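The cluster figures above are consistent with a simple peak-rate estimate. The short calculation below is added for orientation only; it assumes 2 double-precision floating-point operations per cycle per Xeon CPU of that generation, a figure not stated on the slide.

```latex
% Theoretical peak of one 128-node (256-CPU) Xeon 3.06 GHz sub-cluster,
% assuming 2 double-precision FLOP per CPU per cycle:
R_{\mathrm{peak}} = 256 \times 3.06\,\mathrm{GHz} \times 2\,\frac{\mathrm{FLOP}}{\mathrm{cycle}} \approx 1.57\,\mathrm{TFLOPS}
```

Under the same assumption the 512-node cluster gives 1024 x 3.06 GHz x 2 FLOP/cycle, roughly 6.27 TFLOPS, matching the quoted 6.26 TFLOPS, so the four sub-clusters plus the large cluster together provide about 12.5 TFLOPS of scalar peak.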
Basic concept of RSCC
- Each computer architecture has its own advantages and disadvantages.
- There is no single architecture that fits a wide variety of applications.
- RSCC is therefore composed of three subsystems: a Linux cluster with Xeon processors, a vector-parallel computer, and MD-GRAPE3.
- Based on this concept, we proposed the Next Generation Supercomputer Project (NGSP) to MEXT.

Our proposal
The need for multi-scale, multi-physics simulation implies a need to integrate multiple computation components, and therefore multiple architectures. [Diagram: at present such components are coupled loosely through a switch over a slower connection; the proposed architecture is a tightly coupled heterogeneous computer in which vector nodes, scalar nodes, MD nodes, and an FPGA node all share a faster interconnect.]
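To make the coupling idea concrete, here is a minimal sketch in C with MPI. It is purely illustrative and assumes nothing about the actual NGS system software: the "vector" and "MD" roles, the rank-parity placement, and the toy interface exchange are stand-ins for what a real multi-scale code and resource manager would do on a tightly coupled heterogeneous machine.

```c
/* Illustrative sketch only: coupling a continuum solver (on "vector" ranks)
 * with a particle solver (on "MD" ranks) over a fast interconnect.
 * Role assignment by rank parity is a placeholder for a real resource manager. */
#include <mpi.h>
#include <stdio.h>

enum { ROLE_VECTOR = 0, ROLE_MD = 1 };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Pretend even ranks run on vector nodes, odd ranks on MD nodes. */
    int role = (world_rank % 2 == 0) ? ROLE_VECTOR : ROLE_MD;

    /* One sub-communicator per node type; each group solves its own part. */
    MPI_Comm role_comm;
    MPI_Comm_split(MPI_COMM_WORLD, role, world_rank, &role_comm);
    int role_size;
    MPI_Comm_size(role_comm, &role_size);

    double state = (role == ROLE_VECTOR) ? 1.0 : 2.0;   /* dummy solver state */

    for (int step = 0; step < 10; ++step) {
        /* Stand-in for each group advancing its own solver collectively. */
        double group_sum;
        MPI_Allreduce(&state, &group_sum, 1, MPI_DOUBLE, MPI_SUM, role_comm);
        state = group_sum / role_size;

        /* Tight coupling: exchange interface data with the partner of the
         * other node type (rank 2k pairs with rank 2k+1) every time step. */
        int partner = (role == ROLE_VECTOR) ? world_rank + 1 : world_rank - 1;
        if (partner >= 0 && partner < world_size) {
            double received;
            MPI_Sendrecv(&state, 1, MPI_DOUBLE, partner, 0,
                         &received, 1, MPI_DOUBLE, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            state = 0.5 * (state + received);            /* toy coupling rule */
        }
    }

    if (world_rank == 0)
        printf("coupled toy run finished on %d ranks\n", world_size);

    MPI_Comm_free(&role_comm);
    MPI_Finalize();
    return 0;
}
```

The point of the sketch is only the shape of the communication: group-local work interleaved with frequent, fine-grained exchanges across node types, which is exactly the traffic a loosely switched configuration handles poorly and a fast shared interconnect is meant to absorb.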
Policy and outline of the Next Generation Supercomputer Project
- Purpose of the policy: development, installation, and application of an advanced high-performance supercomputer system, as one of Japan's "Key Technologies of National Importance".
- Total budget: about 115 billion yen (1.15 billion US dollars).
- Period of the project: FY2006 to FY2012.

Formation of the 10-peta R&D project
- MEXT: policy and funding, through the Office for Supercomputer Development Planning.
- Industry: the Industrial Forum and the Committee for Promotion of Supercomputing R&D.
- RIKEN: project headquarters and the Next-Generation Supercomputer R&D Center (Ryoji Noyori); project leader: Tadashi Watanabe; R&D Group: Ryutaro Himeno; the RIKEN Wako Institute covers life-science simulation.
- NII: grid middleware and infrastructure.
- IMS: nano-science simulation.
- An advisory board, a project committee, and evaluation committees operate under an evaluation scheme run by MEXT and the CSTP.
- Visiting researchers join from universities, national laboratories, and computer companies.
(Note: NII is the National Institute of Informatics; IMS is the Institute for Molecular Science.)

Goals of the Next Generation Supercomputer Project
1. Development and installation of the most advanced high-performance supercomputer system.
2. Development and wide use of application software that exploits the supercomputer to the maximum extent.
3. Provision of a flexible computing environment by sharing the next-generation supercomputer through connections with other supercomputers located at universities and research institutes.
4. Establishment of an "Advanced Computational Science and Technology Center" (tentative name).

Expansion of the highest-end computers for global usage
[Chart: sustained performance (FLOPS) versus year, 1990 to 2025. National-leadership systems funded as national projects (NWT and CP-PACS, the Earth Simulator, the Next Generation Supercomputer, and a next-next-generation machine) sit at the top; each level of capability later spreads to national-infrastructure systems at institutes and universities, then to enterprises and laboratories, and eventually to personal and entertainment devices such as PCs, home servers, workstations, game machines, and digital TVs, as the leadership level climbs from roughly 10 TFLOPS toward 100 PFLOPS and beyond.]

Schedule of the project (FY2006 to FY2012)
- Processing unit: conceptual design, detailed design, prototype and evaluation, then production, installation, and adjustment.
- Front-end unit (total system software): basic design, detailed design, production and evaluation, then tuning and improvement.
- Shared file system: basic design, detailed design, then production, installation, and adjustment.
- Applications (Next-Generation Integrated Nanoscience Simulation and Next-Generation Integrated Life Simulation): development, production, and evaluation, followed by verification.
- Computer building and research building: design, then construction.

Optimization of the computer design
The design seeks an optimum system, the fastest in the world and built for operation and utilization, by balancing:
- requirements from the grand challenges, drawn from the 21 selected target applications in nano science (6), life science (6), environment/disaster prevention (3), engineering (4), and physics/astronomy (2);
- requirements from computer centers: power, floor space, reliability, operability, and cost of development, manufacturing, and maintenance [chart: power (kW) versus floor space (m2) for computer centers in Japan];
- popular numerical schemes: MO and FMO calculations, FVM and FEM, fast transforms, search algorithms, real-space density-functional methods, linear solvers, RISM, and Monte Carlo methods;
- technology surveys, watching other projects, and technology limits;
- essential element technologies: power-saving SOI, optical transmission, low-k and high-k LSI processes, and system software (OS and compilers), with spin-offs expected in consumer electronics.

Target applications and benchmark suite
The application software committee began selecting target applications for a benchmark suite in January 2006. Twenty-one codes were selected in March, and the NextBMT suite was developed from them.

NextBMT and peta-scale BMT (1/2)
Life science (6):
- SimFold: prediction of protein structure
- GNISC: inference of genetic networks from experimental gene-expression data
- MLTest: validation of statistical significance for the development of individualized medicine
- MC-Bflow: simulation for blood-flow analysis
- sievgene/myPresto: docking simulation of proteins and drugs
- ProteinDF: ab initio molecular dynamics calculation of large-scale protein systems
Nano science (6):
- GAMESS/FMO: FMO molecular orbital calculation
- Modylas: massively parallel multipurpose software for molecular dynamics calculation
- RSDFT: ab initio molecular dynamics calculation in real space
- RISM/3D-RISM: analysis of electronic states of proteins in solution with the 3D-RISM/FMO method
- PHASE: first-principles molecular dynamics simulation within the plane-wave pseudopotential formalism
- OCTA: coarse-grained molecular dynamics calculation

NextBMT and peta-scale BMT (2/2)
Physics and astronomy (2):
- LatticeQCD: study of elementary-particle and nuclear physics based on lattice QCD
- NINJA/ASURA: super-large-scale gravitational many-body simulation of the origin of celestial objects
Geophysics (3):
- NICAM: Nonhydrostatic ICosahedral Atmospheric Model for global cloud-resolving simulations
- Seism3D: simulation of seismic-wave propagation and strong ground motions
- COCO: super-resolution ocean general circulation model
Engineering (4):
- Cavitation: computation of unsteady cavitating flow by the finite difference method
- LANS: computation of compressible flow for aircraft and spacecraft analysis
- FrontSTR: structural calculation by the finite element method (FEM)
- FrontFlow/Blue: unsteady flow analysis based on large eddy simulation (LES)

System configuration
The Next-Generation Supercomputer is designed as a hybrid general-purpose supercomputer that provides the optimum computing environment for a wide range of simulations.
- Calculations will be performed in whichever processing unit is best suited to the particular simulation.
- Parallel processing in a hybrid configuration of scalar and vector units will make larger and more complex simulations possible.

Characteristics
A hybrid supercomputer system composed of scalar and vector units:
- the vector unit is suitable for continuum-physics simulation;
- the scalar unit is suitable for database applications and particle-based simulation;
- the hybrid system has both strong points for ...
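The division of labour described above can be illustrated with two generic loop patterns. The C sketch below is not taken from any of the benchmark codes; it simply contrasts a regular, unit-stride stencil update, the kind of long data-parallel loop a vector unit pipelines well, with a particle kernel that gathers and scatters through an irregular pair list, a pattern that tends to run better on cache-based scalar cores.

```c
/* Illustrative only: two kernels whose memory-access patterns suit the two
 * unit types described on the slide. Array names and sizes are arbitrary. */
#include <stddef.h>

/* Continuum-style kernel: regular, unit-stride accesses that a vectorizing
 * compiler can turn into long vector operations. */
void stencil_1d(size_t n, const double *u, double *u_new, double c)
{
    for (size_t i = 1; i + 1 < n; ++i)
        u_new[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
}

/* Particle-style kernel: indirect addressing through a pair list gives short,
 * irregular accesses and scattered updates, which vector pipelines handle
 * poorly but cache-based scalar cores tolerate well. */
void accumulate_forces(size_t n_pairs, const int *pi, const int *pj,
                       const double *x, double *f, double k)
{
    for (size_t p = 0; p < n_pairs; ++p) {
        int i = pi[p], j = pj[p];
        double d = x[i] - x[j];
        f[i] += k * d;   /* scattered stores with possible index conflicts */
        f[j] -= k * d;   /* inhibit straightforward vectorization          */
    }
}
```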