2. The IBM Blue Gene/P Supercomputer


Introduction to HPC Programming
2. The IBM Blue Gene/P Supercomputer
Valentin Pavlov <[email protected]>

About these lectures
• This is the second of a series of six introductory lectures discussing the field of High-Performance Computing;
• The intended audience of the lectures are high-school students with some programming experience (preferably using the C programming language) and an interest in scientific studies, e.g. physics, chemistry, biology, etc.
• This lecture provides an overview of the IBM Blue Gene/P supercomputer's architecture, along with some practical advice about its usage.

What does “super-” mean?
• When talking about computers, the prefix “super-” does not have the same meaning as when talking about people (e.g. Superman);
• The analogy is closer to that of a supermarket – a market that sells a lot of different articles;
• Thus, a supercomputer is not to be thought of a priori as a very powerful computer, but simply as a collection of a lot of ordinary computers.

What does “super-” mean?
• Anyone with a few thousand euro to spare can build an in-house supercomputer out of cheap components (think Raspberry Pi) which would in principle not be much different from high-end supercomputers, only slower.
• Most of the information in these lectures is applicable to such ad-hoc supercomputers, clusters, etc.
• In this lecture we’ll have a look at the architecture of a real supercomputer, the IBM Blue Gene/P, and also discuss the differences with the newer version of this architecture, the IBM Blue Gene/Q.

IBM Blue Gene/P
• IBM Blue Gene/P is a modular hybrid parallel system.
• Its basic module is called a “rack” and a given configuration can have from 1 to 72 racks.
• In the full 72-rack configuration, the theoretical peak performance of the system is around 1 PFLOPS;
• Detailed information about system administration and application programming of this machine is available online from the IBM RedBooks publication series, e.g. http://www.redbooks.ibm.com/abstracts/sg247287.html

The IBM Blue Gene/P @ NCSA, Sofia
• The Bulgarian Supercomputing Center in Sofia operates and provides access to an IBM Blue Gene/P configuration that consists of 2,048 Compute Nodes, with a total of 8,192 PowerPC cores @ 850 MHz and 4 TB of RAM;
• The Compute Nodes are connected to the rest of the system through 16 channels of 10 Gb/s each;
• Its theoretical peak performance is 27.85 TFLOPS;
• Its energy efficiency is 371.67 MFLOPS/W;
• When it was put into operation in 2008, it was ranked 126th in the world in the http://top500.org list.

Why a supercomputer?
• This supercomputer is not much different from a network of 2,000 ordinary computers (a cluster), or, say, 40 separate clusters of 50 machines each;
• So why bother with a supercomputer? Because it offers several distinctive advantages:
• Energy efficient – the maximum power consumption of the system at full utilization is about 75 kW. This might seem a lot, but it is several times less than what 2,000 ordinary computers would draw.
• Small footprint – it fits in a small room, while 2,000 PCs would probably occupy a football stadium. 40 clusters of 50 machines would occupy 40 different rooms.

Why a supercomputer?
• Transparent high-speed and highly available network – the mass of cables and devices needed to interconnect 2,000 PCs would be a nightmarish mess;
• Standard programming interfaces (MPI and OpenMP) – the same ones used on clusters, so software written for a cluster would work on the supercomputer, too (at least in principle);
• High scalability to thousands of cores – in the 40-cluster scenario each cluster is small and cannot run extra large jobs;

Why a supercomputer?
• High availability at lower price – built as an integrated unit from the start, it breaks down a lot less often than 2,000 ordinary computers would. Moreover, it can be operated by a small team of staff, as opposed to 40 different teams in the many-clusters scenario.
• Better utilization, compared to the 40-cluster scenario. The centralized management allows different teams of researchers to use the processing power in a shared-resource manner, which would be very hard to do if the clusters were owned by different groups.

IBM Blue Gene/P Hardware Organization
Figure: IBM Blue Gene/P – from the CPU to the full system (Source: IBM)

Compute Nodes (CN)
• The processing power of the supercomputer stems from the multitude of Compute Nodes (CNs). There are 1,024 CNs in a rack, which totals 73,728 CNs in a full configuration.
• Each CN contains a quad-core PowerPC @ 850 MHz with a dual FPU (called the “double hummer”) and 2 GB RAM.
• Ideally, each core can perform 4 floating-point operations per cycle, thus performing at 850 × 4 = 3,400 MFLOPS. Multiplied by the number of cores, this brings the performance of a single CN to 4 × 3.4 = 13.6 GFLOPS.

Compute Nodes (CN)
• The theoretical peak performance of the whole system is thus 73,728 × 13.6 GFLOPS = 1,002,700.8 GFLOPS ≈ 1.003 PFLOPS.
• The 4 cores of a CN behave as a shared memory machine with regard to the 2 GB of RAM on the node;
• The cores on one CN do not have access to the memory of another CN, so the collection of CNs behaves as a distributed memory machine;
• Thus, the machine has a hybrid organization – distributed memory between nodes and shared memory within a node.
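The hybrid organization maps naturally onto a hybrid programming style: MPI between nodes, OpenMP within a node. The following is a minimal illustrative sketch in C (the file name hello_hybrid.c and the printed message are placeholders, not part of the Blue Gene software):

/* hello_hybrid.c – MPI handles the distributed memory between CNs,
   OpenMP threads share the memory of the 4 cores within one CN. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?   */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* how many processes?   */

    #pragma omp parallel
    {
        /* With one MPI process per CN, this loop runs on all 4 cores. */
        printf("Rank %d of %d, thread %d of %d\n",
               rank, nprocs, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

Run with one MPI process per CN, each process would typically spawn 4 OpenMP threads, one per core.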
Connectivity
• Each CN is directly connected to its immediate neighbours in all 3 directions;
• Communication between non-neighbouring nodes involves at least one node that, apart from computing, is also busy forwarding network traffic, which lowers its performance.
• The whole system thus looks like a 3D mesh, but in order to reduce the amount of forwarding it can also be configured as a 3D torus – topologically a higher-dimensional figure in which the ends of the mesh in each of the 3 directions are connected to each other.

Connectivity
• The advantage of the torus is that it halves the amount of forwarding needed, since the longest distance is now half the number of nodes in each direction.
• Connectivity with the rest of the system is achieved through special Input/Output Nodes (IONs);
• Each Node Card (32 CNs) has 1 ION through which the CNs access the shared disk storage and the other components of the system via a 10 Gb/s network;
• There are other specialized networks as well, e.g. for collective communications, etc.

Supporting Hardware
• Apart from the racks containing the CNs, the supercomputer configuration includes several other components, the most important of them being:
• Front-End Nodes (FENs) – a collection of servers to which the users connect remotely using the secure shell protocol. In the BG/P configuration they are 64-bit PowerPC machines running SuSE Linux Enterprise Server 10 (SLES 10);
• Service Node (SN) – a back-end service node that manages and orchestrates the work of the whole machine. It is off limits to the end users; only administrators have access to it.

Supporting Hardware
• File Servers (FS) – several servers that run a distributed file system which is exported to and seen by both the CNs and the FENs. The home directories of the users are stored on this distributed file system and this is where all input and output goes.
• Shared Storage – disk enclosures containing the physical HDDs over which the distributed file system spans.

Software features – cross-compilation
• In contrast to some other supercomputers and clusters, Blue Gene has two distinct sets of computing devices: the CNs – the actual work horses – and the FENs – the machines to which the users have direct access.
• CNs and FENs are not binary compatible – a program compiled to run on the FEN cannot run on the CNs and vice versa.
• This puts the users in a situation in which they have to compile their programs on the FEN (since that is the only machine they have access to), while the programs must be able to run on the CNs. This is called cross-compilation.
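For illustration only – the wrapper names and options below follow the IBM Blue Gene/P documentation, but exact paths and defaults are site-specific – cross-compiling the hybrid example above on the FEN could look like this:

# Run on the Front-End Node; the resulting binary runs only on the CNs.
mpixlc_r -O3 -qarch=450d -qtune=450 -qsmp=omp -o hello_hybrid hello_hybrid.c

The mpixlc_r wrapper invokes the thread-safe IBM XL C cross-compiler with the MPI headers and libraries; -qarch=450d targets the CNs' PowerPC 450 cores with the double-hummer FPU, and -qsmp=omp enables OpenMP. The binary will not run on the FEN itself.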
Software features – batch execution
• Since cross-compiled programs cannot run on the FEN, users cannot execute them directly – they need a way to submit a program for execution.
• This is called batch job execution. The user prepares a so-called “job control file” (JCF), in which the specifics of the job are stated, and submits the job to a resource scheduler queue. When the resource scheduler finds free resources that can execute the job, it is sent to the corresponding CNs;
• Blue Gene/P uses TWS LoadLeveler (LL) as its resource scheduler (a minimal example job control file is sketched at the end of this section);

Software features – batch execution
• An important consequence of batch execution is that programs had better not be interactive.
• While it is possible to come up with some sophisticated mechanism to wait on the queue and perform redirection in order to allow interactivity, it is not desirable, since one cannot predict exactly when the program will run.
• And when it does run and waits for user input while the user is not there, the CNs will idly waste time and power.
• Thus, all parameters of a program must be passed via configuration files, command line options or some other way, but not via user interaction.

Partitions
• The multitude of CNs is divided into “partitions” (or “blocks”). The smallest partition depends on the exact machine configuration, but is usually 32 nodes (on the machine in Sofia the smallest partition is 128 nodes; see the note below);
• Note: a partition that encompasses half a rack (512 CNs) is called a “midplane” and is the smallest partition for which the TORUS network topology can be chosen;
• When LL starts a job, it dynamically creates a correspondingly sized partition for it. After the job terminates, the partition is destroyed.
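As a deliberately minimal sketch of the batch workflow described above – keyword sets, class names and limits differ between installations, and hello_hybrid is the hypothetical executable from the earlier example – a LoadLeveler job control file for a 128-node BG/P job might look like this:

# @ job_name         = hello_hybrid
# @ job_type         = bluegene
# @ bg_size          = 128
# @ wall_clock_limit = 00:10:00
# @ output           = $(job_name).$(jobid).out
# @ error            = $(job_name).$(jobid).err
# @ queue
# No user interaction: everything the program needs is passed via options/environment.
mpirun -exe ./hello_hybrid -mode SMP -np 128 -env "OMP_NUM_THREADS=4"

The job is submitted with llsubmit and monitored with llq; based on bg_size, LoadLeveler dynamically creates a 128-node partition for the job, as described in the Partitions slide, and destroys it when the job finishes.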