Japanese Supercomputer Development and Hybrid Accelerated Supercomputing

Taisuke Boku, Director, Center for Computational Sciences, University of Tsukuba
With courtesy of HPCI and R-CCS (first part)
HPC-AI Advisory Council @ Perth, 2019/08/27

Agenda
• Development and deployment of supercomputers in Japan
  - Tier-1 and Tier-2 systems
  - Supercomputers in national universities
  - FUGAKU (Post-K) computer
• Multi-hybrid accelerated supercomputing at U. Tsukuba
  - Today's accelerated supercomputing
  - New concept of multi-hybrid accelerated computing
  - Combining GPU and FPGA in a system
  - Programming and applications
  - Cygnus supercomputer at U. Tsukuba
• Conclusions

Development and Deployment of Supercomputers in Japan

Towards Exascale Computing
• Tier-1: the national flagship machine at RIKEN R-CCS (formerly AICS), originally developed as an MPP: the K Computer, followed by the Fugaku (Post-K) Computer.
• Tier-2: university supercomputer centers operating clusters, vector machines, GPU systems, etc.; nine national universities procure their own original systems (e.g., T2K at U. of Tsukuba, U. of Tokyo, and Kyoto U.; TSUBAME2.0 at Tokyo Tech; Oakforest-PACS (OFP) at JCAHPC, run jointly by U. Tsukuba and U. Tokyo).
• Tier-1 and Tier-2 supercomputers form HPCI and move forward to exascale computing like two wheels.
(Figure: system peak performance, 1 PF toward exascale, plotted over fiscal years 2008-2020.)

HPCI – High Performance Computing Infrastructure in Japan
• National networking program, under MEXT, to share most of the supercomputers in Japan.
• The national flagship supercomputer "K" (and "FUGAKU" in 2021) and all other representative supercomputers in the national university supercomputer centers are connected physically and logically.
• Nation-wide supercomputer sharing program: proposals are solicited twice a year from all fields of computational science and engineering, reviewed by a selection committee, and assigned computation resources.
• Large shared storage (~250 PByte), distributed over two sites and shared by all HPCI supercomputer facilities, connected by a 100 Gbps-class nation-wide network.
• "Single sign-on" system (Globus) for easy login and job scheduling across all resources.

HPCI Tier-2 Systems Roadmap (as of June 2019)
(Roadmap figure covering fiscal years 2016-2027; quoted power figures are the maximum consumption including cooling.)
• Hokkaido: HITACHI SR16000/M1 (172 TF, 22 TB), Cloud System BS2000 (44 TF, 14 TB), Data Science Cloud / Storage HA8000 / WOS7000 (10 TF, 1.96 PB), 0.16 PF (Cloud) 0.1 MW → 3.96 PF (UCC + CFL/M) 0.9 MW → 35 PF (UCC + CFL-M) 2 MW
• Tohoku: SX-ACE (707 TF, 160 TB, 655 TB/s), LX406e (31 TF), Storage (4 PB), 3D Vis, 2 MW → ~30 PF, ~30 PB/s memory BW (CFL-D/CFL-M) ~3 MW → 100-200 PF, 100-200 PB/s (CFL-D/CFL-D) ~4 MW
• Tsukuba: HA-PACS (1166 TF), COMA (PACS-IX) (1001 TF), PPX1/PPX2 (62 TF each), Cygnus 2.4 PF (TPF) 0.4 MW → PACS-XI 100 PF (TPF)
• Tokyo: Fujitsu FX10 Oakleaf/Oakbridge (1.27 PFlops, 168 TiB, 460 TB/s) (UCC + TPF), Hitachi SR16K/M1 (54.9 TF, 10.9 TiB, 28.7 TB/s), Reedbush-U/H 1.92 PF (FAC) 0.7 MW (Reedbush-U until the end of June 2020), Reedbush-L 1.4 PF (FAC) 0.2 MW, Oakforest-PACS (OFP) 25 PF (UCC + TPF), Oakbridge-II 4+ PF 1.0 MW, BDEC 60+ PF (FAC) 3.5-4.5 MW → 100+ PF (UCC + TPF) 4.5-6.0 MW → 200+ PF (FAC) 6.5-8.0 MW
• Tokyo Tech: TSUBAME 2.5 (5.7 PF, 110+ TB, 1160 TB/s) 1.4 MW, TSUBAME 3.0 (12.15 PF, 1.66 PB/s) → TSUBAME 4.0 (~100 PF, ~10 PB/s, ~2.0 MW)
• Nagoya: Fujitsu FX100 (2.9 PF, 81 TiB), Fujitsu CX400 (774 TF / 542 TF, 71 TiB) → 20+ PF (FAC/UCC + CFL-M) up to 3 MW → 100+ PF (FAC/UCC + CFL-M) up to 3 MW
• Kyoto: Cray XE6 + GB8K XC30 (983 TF), Cray XC30 (584 TF), Cray XC40 (5.5 PF) + CS400 (1.0 PF) 1.33 MW → 20-40+ PF (FAC/TPF + UCC) 2 MW → 80-150+ PF (FAC/TPF + UCC) 1.5 MW
• Osaka: NEC SX-ACE (423 TF), NEC Express5800 (22.4 TF), OCTOPUS 1.463 PF (UCC) → 3.2 PB/s, 15-25 Pflop/s (CFL-M) 1.0-1.5 MW → 25.6 PB/s, 50-100 Pflop/s (TPF) 1.5-2.0 MW
• Kyushu: HA8000 (712 TF, 242 TB), SR16000 (8.2 TF, 6 TB), FX10 (272.4 TF, 36 TB), FX10 (90.8 TF), CX400 (966.2 TF, 183 TB), Fujitsu PRIMERGY CX subsystem A + B 10.4 PF (UCC/TPF) 2.7 MW → 100+ PF (FAC/TPF + UCC/TPF) ~3 MW
• JAMSTEC: SX-ACE (1.3 PF, 320 TiB) → 100 PF, 3 MW
• ISM: UV2000 (98 TF, 128 TiB) 0.3 MW → 2 PF, 0.3 MW

FUGAKU (富岳): New National Flagship Machine
(Slides courtesy of M. Sato, RIKEN R-CCS)

FLAGSHIP2020 Project
• Missions:
  - Building the Japanese national flagship supercomputer Fugaku (a.k.a. post-K).
  - Developing a wide range of HPC applications, running on Fugaku, in order to solve social and science issues in Japan.
• Overview of the Fugaku architecture:
  - Node: Fujitsu A64FX processor (prototype board shown), manycore architecture, Armv8-A + SVE (Scalable Vector Extension), SIMD length 512 bits.
  - # of cores: 48 + (2/4 for OS); > 2.7 TF per 48 cores.
  - Co-design with application developers and high memory bandwidth utilizing on-package stacked memory (HBM2), 1 TB/s.
  - Low power: 15 GF/W (dgemm).
  - Network: TofuD, chip-integrated NIC, 6D mesh/torus interconnect.
• Status and update:
  - "Design and Implementation" completed; the official contract with Fujitsu to manufacture, ship, and install hardware for Fugaku is done.
  - RIKEN revealed #nodes > 150K.
  - The name of the system was decided as "Fugaku".
  - RIKEN announced the Fugaku early access program to begin around Q2/CY2020.

CPU Architecture: A64FX
• Armv8.2-A (AArch64 only) + SVE (Scalable Vector Extension); FP64/FP32/FP16 (https://developer.arm.com/products/architecture/a-profile/docs).
• SVE 512-bit wide SIMD; # of cores: 48 + (2/4 for OS).
• CMG (Core-Memory-Group): a NUMA node of 12 + 1 cores with 8 GiB of HBM2; four CMGs per chip.
• Co-design with application developers and high memory bandwidth utilizing on-package stacked memory: HBM2 (32 GiB).
• Leading-edge Si technology (7 nm FinFET), low-power logic design (approx. 15 GF/W for dgemm), and power-controlling knobs.
• PCIe Gen3, 16 lanes.
• Peak performance: > 2.7 TFLOPS (> 90% efficiency on dgemm); memory B/W 1024 GB/s (> 80% on STREAM); Byte per Flop: approx. 0.4 (1024 GB/s / 2.7 TFLOPS ≈ 0.38).
• The "common" programming model will be to run one MPI process per NUMA node (CMG) with OpenMP-MPI hybrid programming; 48-thread OpenMP across the whole chip is also supported.
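As a concrete illustration of the "common" programming model noted above (one MPI rank per CMG with OpenMP inside the rank), the following is a minimal, generic MPI+OpenMP sketch. It is not Fugaku-specific code from the slides: the dot-product kernel, the array size, and the assumption that the job launcher pins four ranks per node with 12 threads each onto the four CMGs are illustrative choices.

```c
/*
 * Minimal MPI+OpenMP sketch of the "one MPI rank per CMG" model described
 * above: four ranks per A64FX node (one per CMG), each rank running 12
 * OpenMP threads. The dot-product kernel and the per-rank array size are
 * illustrative only, and rank/thread pinning to CMGs is assumed to be done
 * by the job launcher and environment, not by this code.
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)   /* elements per rank, arbitrary for illustration */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));

    /* First-touch initialization: each thread touches its own chunk, so the
       pages stay in the HBM2 slice of the CMG the rank is pinned to. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) {
        a[i] = rank + 1.0;
        b[i] = 0.5 * (double)i;
    }

    /* Rank-local work parallelized over the 12 cores of one CMG ... */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < N; i++)
        local += a[i] * b[i];

    /* ... and inter-CMG / inter-node combination done with MPI. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d, threads/rank=%d, dot=%e\n",
               nprocs, omp_get_max_threads(), global);

    free(a);
    free(b);
    MPI_Finalize();
    return 0;
}
```

With a generic MPI/OpenMP stack this would typically be launched with four ranks per node and OMP_NUM_THREADS=12, so that each rank works out of the 8 GiB of HBM2 attached to its own CMG.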
Peak Performance
• HPL & STREAM per node: > 2.5 TF/node for dgemm (double precision); > 830 GB/s/node for the STREAM triad.
• Peak DP (double precision): Fugaku 400+ Pflops vs. K 11.3 Pflops (x34+).
• Peak SP (single precision): Fugaku 800+ Pflops vs. K 11.3 Pflops (x70+).
• Peak HP (half precision): Fugaku 1600+ Pflops vs. K -- (x141+).
• Total memory bandwidth: Fugaku 150+ PB/s vs. K 5.2 PB/s (x29+).
• Himeno benchmark (Fortran90)†
† "Performance evaluation of a vector supercomputer SX-Aurora TSUBASA", SC18, https://dl.acm.org/citation.cfm?id=3291728

Target Applications' Performance
• Performance targets: 100 times faster than K for some applications (tuning included); 30 to 40 MW power consumption (https://postk-web.r-ccs.riken.jp/perf.html).
• Predicted performance of the 9 target applications (as of 2019/05/14), by priority issue (speedup over K):
  1. Innovative computing infrastructure for drug discovery (Health and longevity): GENESIS, MD for proteins, x125+.
  2. Personalized and preventive medicine using big data (Health and longevity): Genomon, genome processing (genome alignment), x8+.
  3. Integrated simulation systems induced by earthquake and tsunami (Disaster prevention and environment): GAMERA, earthquake simulator (FEM on unstructured & structured grids), x45+.
  4. Meteorological and global environmental prediction using big data (Disaster prevention and environment): NICAM+LETKF, weather prediction system using big data (structured-grid stencil & ensemble Kalman filter), x120+.
  5. New technologies for energy creation, conversion/storage, and use (Energy issue): NTChem, molecular electronic structure calculation, x40+.
  6. Accelerated development of innovative clean energy systems (Energy issue): Adventure, computational mechanics system for large-scale analysis and design (unstructured grid), x35+.
  7. Creation of new functional devices and high-performance materials (Industrial competitiveness enhancement): RSDFT, ab initio program (density functional theory), x30+.
  8. Development of innovative design and production processes (Industrial competitiveness enhancement): FFB, large eddy simulation (unstructured grid), x25+.
  9. Elucidation of the fundamental laws and evolution of the universe (Basic science): LQCD, lattice QCD simulation (structured-grid Monte Carlo), x25+.

Performance Study Using the Post-K Simulator
• We have been developing a cycle-level simulator for the Post-K processor using gem5 (collaboration with U. Tsukuba).
• Kernel evaluation using a single core (Post-K simulator vs. KNL): execution time 4.2 msec vs. 5.5 msec; number of L1D misses 29569; L1D miss rate 1.19%; number of L2 misses 20; L2 miss rate 0.01% (cache statistics not reported for KNL).
• 1.3 times faster than KNL per core; with further optimization (instruction scheduling), execution time is reduced to 3.4 msec (1.6 times faster).
• This is the evaluation on L1; OpenMP multicore execution will be much faster due to the HBM memory.
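The bandwidth numbers quoted above (> 830 GB/s per node on the STREAM triad, roughly 0.4 Byte/Flop of machine balance) are measured with a kernel of the following shape. The sketch below is a simplified OpenMP illustration under stated assumptions, not the official STREAM benchmark code: the array size, repetition count, and byte accounting are my own simplified choices following common STREAM practice.

```c
/*
 * Minimal OpenMP sketch of a STREAM-triad style bandwidth measurement,
 * a[i] = b[i] + q * c[i], of the kind the per-node bandwidth figure above
 * refers to. This is NOT the official STREAM benchmark: the array size,
 * repetition count, and byte accounting (3 arrays x 8 bytes per element)
 * follow the usual convention but are simplified here for illustration.
 */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N      (1L << 27)   /* 1 GiB per array, well beyond cache capacity */
#define NTIMES 10

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double q = 3.0;

    /* First-touch initialization so each page lands near the thread using it. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) {
        a[i] = 0.0;
        b[i] = 1.0;
        c[i] = 2.0;
    }

    double best = 1.0e30;
    for (int k = 0; k < NTIMES; k++) {
        double t = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + q * c[i];
        t = omp_get_wtime() - t;
        if (t < best)
            best = t;
    }

    /* Triad moves 24 bytes per element: load b, load c, store a. */
    double gbytes = 24.0 * (double)N / 1.0e9;
    printf("best triad bandwidth: %.1f GB/s\n", gbytes / best);

    free(a);
    free(b);
    free(c);
    return 0;
}
```

On an A64FX-class node such a loop would be compiled with SVE auto-vectorization enabled (for example, -march=armv8.2-a+sve with GCC) and run with threads spread over all four CMGs; at a machine balance of about 0.4 Byte/Flop, a kernel like this is bound by memory bandwidth rather than by the floating-point pipelines.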
Fugaku Prototype Board and Rack
• "Fujitsu Completes Post-K Supercomputer CPU Prototype, Begins Functionality Trials", HPCwire, June 21, 2018.
• CPU package of approx. 60 mm x 60 mm with on-package HBM2; water cooling; AOC QSFP28 optical links (X, Y, Z) plus electrical signals on the board.
• 2 CPUs per CMU (CPU Memory Unit).
• Shelf: 48 CPUs (24 CMUs); rack: 8 shelves = 384 CPUs (8 x 48).
• CPU die photo: by Fujitsu Ltd.