Measuring Power Consumption on IBM Blue Gene/P
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Performance Analysis on Energy Efficient High
Performance Analysis on Energy Efficient High-Performance Architectures Roman Iakymchuk, François Trahay To cite this version: Roman Iakymchuk, François Trahay. Performance Analysis on Energy Efficient High-Performance Architectures. CC’13 : International Conference on Cluster Computing, Jun 2013, Lviv, Ukraine. hal-00865845 HAL Id: hal-00865845 https://hal.archives-ouvertes.fr/hal-00865845 Submitted on 25 Sep 2013 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Performance Analysis on Energy Ecient High-Performance Architectures Roman Iakymchuk1 and François Trahay1 1Institut Mines-Télécom Télécom SudParis 9 Rue Charles Fourier 91000 Évry France {roman.iakymchuk, francois.trahay}@telecom-sudparis.eu Abstract. With the shift in high-performance computing (HPC) towards energy ecient hardware architectures such as accelerators (NVIDIA GPUs) and embedded systems (ARM processors), arose the need to adapt existing perfor- mance analysis tools to these new systems. We present EZTrace a performance analysis framework for parallel applications. EZTrace relies on several core components, in particular on a mechanism for instrumenting func- tions, a lightweight tool for recording events, and a generic interface for writing traces. To support EZTrace on energy ecient HPC systems, we developed a CUDA module and ported EZTrace to ARM processors. -
The Reliability Wall for Exascale Supercomputing Xuejun Yang, Member, IEEE, Zhiyuan Wang, Jingling Xue, Senior Member, IEEE, and Yun Zhou
. IEEE TRANSACTIONS ON COMPUTERS, VOL. *, NO. *, * * 1 The Reliability Wall for Exascale Supercomputing Xuejun Yang, Member, IEEE, Zhiyuan Wang, Jingling Xue, Senior Member, IEEE, and Yun Zhou Abstract—Reliability is a key challenge to be understood to turn the vision of exascale supercomputing into reality. Inevitably, large-scale supercomputing systems, especially those at the peta/exascale levels, must tolerate failures, by incorporating fault- tolerance mechanisms to improve their reliability and availability. As the benefits of fault-tolerance mechanisms rarely come without associated time and/or capital costs, reliability will limit the scalability of parallel applications. This paper introduces for the first time the concept of “Reliability Wall” to highlight the significance of achieving scalable performance in peta/exascale supercomputing with fault tolerance. We quantify the effects of reliability on scalability, by proposing a reliability speedup, defining quantitatively the reliability wall, giving an existence theorem for the reliability wall, and categorizing a given system according to the time overhead incurred by fault tolerance. We also generalize these results into a general reliability speedup/wall framework by considering not only speedup but also costup. We analyze and extrapolate the existence of the reliability wall using two representative supercomputers, Intrepid and ASCI White, both employing checkpointing for fault tolerance, and have also studied the general reliability wall using Intrepid. These case studies provide insights on how to mitigate reliability-wall effects in system design and through hardware/software optimizations in peta/exascale supercomputing. Index Terms—fault tolerance, exascale, performance metric, reliability speedup, reliability wall, checkpointing. ! 1INTRODUCTION a revolution in computing at a greatly accelerated pace [2]. -
SEVENTH FRAMEWORK PROGRAMME Research Infrastructures
SEVENTH FRAMEWORK PROGRAMME Research Infrastructures INFRA-2007-2.2.2.1 - Preparatory phase for 'Computer and Data Treat- ment' research infrastructures in the 2006 ESFRI Roadmap PRACE Partnership for Advanced Computing in Europe Grant Agreement Number: RI-211528 D8.3.2 Final technical report and architecture proposal Final Version: 1.0 Author(s): Ramnath Sai Sagar (BSC), Jesus Labarta (BSC), Aad van der Steen (NCF), Iris Christadler (LRZ), Herbert Huber (LRZ) Date: 25.06.2010 D8.3.2 Final technical report and architecture proposal Project and Deliverable Information Sheet PRACE Project Project Ref. №: RI-211528 Project Title: Final technical report and architecture proposal Project Web Site: http://www.prace-project.eu Deliverable ID: : <D8.3.2> Deliverable Nature: Report Deliverable Level: Contractual Date of Delivery: PU * 30 / 06 / 2010 Actual Date of Delivery: 30 / 06 / 2010 EC Project Officer: Bernhard Fabianek * - The dissemination level are indicated as follows: PU – Public, PP – Restricted to other participants (including the Commission Services), RE – Restricted to a group specified by the consortium (including the Commission Services). CO – Confidential, only for members of the consortium (including the Commission Services). PRACE - RI-211528 i 25.06.2010 D8.3.2 Final technical report and architecture proposal Document Control Sheet Title: : Final technical report and architecture proposal Document ID: D8.3.2 Version: 1.0 Status: Final Available at: http://www.prace-project.eu Software Tool: Microsoft Word 2003 File(s): D8.3.2_addn_v0.3.doc -
LLNL Computation Directorate Annual Report (2014)
PRODUCTION TEAM LLNL Associate Director for Computation Dona L. Crawford Deputy Associate Directors James Brase, Trish Damkroger, John Grosh, and Michel McCoy Scientific Editors John Westlund and Ming Jiang Art Director Amy Henke Production Editor Deanna Willis Writers Andrea Baron, Rose Hansen, Caryn Meissner, Linda Null, Michelle Rubin, and Deanna Willis Proofreader Rose Hansen Photographer Lee Baker LLNL-TR-668095 3D Designer Prepared by LLNL under Contract DE-AC52-07NA27344. Ryan Chen This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, Print Production manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the Charlie Arteago, Jr., and Monarch Print Copy and Design Solutions United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes. CONTENTS Message from the Associate Director . 2 An Award-Winning Organization . 4 CORAL Contract Awarded and Nonrecurring Engineering Begins . 6 Preparing Codes for a Technology Transition . 8 Flux: A Framework for Resource Management . -
Recent Developments in Supercomputing
John von Neumann Institute for Computing Recent Developments in Supercomputing Th. Lippert published in NIC Symposium 2008, G. M¨unster, D. Wolf, M. Kremer (Editors), John von Neumann Institute for Computing, J¨ulich, NIC Series, Vol. 39, ISBN 978-3-9810843-5-1, pp. 1-8, 2008. c 2008 by John von Neumann Institute for Computing Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above. http://www.fz-juelich.de/nic-series/volume39 Recent Developments in Supercomputing Thomas Lippert J¨ulich Supercomputing Centre, Forschungszentrum J¨ulich 52425 J¨ulich, Germany E-mail: [email protected] Status and recent developments in the field of supercomputing on the European and German level as well as at the Forschungszentrum J¨ulich are presented. Encouraged by the ESFRI committee, the European PRACE Initiative is going to create a world-leading European tier-0 supercomputer infrastructure. In Germany, the BMBF formed the Gauss Centre for Supercom- puting, the largest national association for supercomputing in Europe. The Gauss Centre is the German partner in PRACE. With its new Blue Gene/P system, the J¨ulich supercomputing centre has realized its vision of a dual system complex and is heading for Petaflop/s already in 2009. In the framework of the JuRoPA-Project, in cooperation with German and European industrial partners, the JSC will build a next generation general purpose system with very good price-performance ratio and energy efficiency. -
Biomolecular Simulation Data Management In
BIOMOLECULAR SIMULATION DATA MANAGEMENT IN HETEROGENEOUS ENVIRONMENTS Julien Charles Victor Thibault A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Biomedical Informatics The University of Utah December 2014 Copyright © Julien Charles Victor Thibault 2014 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL The dissertation of Julien Charles Victor Thibault has been approved by the following supervisory committee members: Julio Cesar Facelli , Chair 4/2/2014___ Date Approved Thomas E. Cheatham , Member 3/31/2014___ Date Approved Karen Eilbeck , Member 4/3/2014___ Date Approved Lewis J. Frey _ , Member 4/2/2014___ Date Approved Scott P. Narus , Member 4/4/2014___ Date Approved And by Wendy W. Chapman , Chair of the Department of Biomedical Informatics and by David B. Kieda, Dean of The Graduate School. ABSTRACT Over 40 years ago, the first computer simulation of a protein was reported: the atomic motions of a 58 amino acid protein were simulated for few picoseconds. With today’s supercomputers, simulations of large biomolecular systems with hundreds of thousands of atoms can reach biologically significant timescales. Through dynamics information biomolecular simulations can provide new insights into molecular structure and function to support the development of new drugs or therapies. While the recent advances in high-performance computing hardware and computational methods have enabled scientists to run longer simulations, they also created new challenges for data management. Investigators need to use local and national resources to run these simulations and store their output, which can reach terabytes of data on disk. -
Supercomputer Fugaku
Supercomputer Fugaku Toshiyuki Shimizu Feb. 18th, 2020 FUJITSU LIMITED Copyright 2020 FUJITSU LIMITED Outline ◼ Fugaku project overview ◼ Co-design ◼ Approach ◼ Design results ◼ Performance & energy consumption evaluation ◼ Green500 ◼ OSS apps ◼ Fugaku priority issues ◼ Summary 1 Copyright 2020 FUJITSU LIMITED Supercomputer “Fugaku”, formerly known as Post-K Focus Approach Application performance Co-design w/ application developers and Fujitsu-designed CPU core w/ high memory bandwidth utilizing HBM2 Leading-edge Si-technology, Fujitsu's proven low power & high Power efficiency performance logic design, and power-controlling knobs Arm®v8-A ISA with Scalable Vector Extension (“SVE”), and Arm standard Usability Linux 2 Copyright 2020 FUJITSU LIMITED Fugaku project schedule 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Fugaku development & delivery Manufacturing, Apps Basic Detailed design & General Feasibility study Installation review design Implementation operation and Tuning Select Architecture & Co-Design w/ apps groups apps sizing 3 Copyright 2020 FUJITSU LIMITED Fugaku co-design ◼ Co-design goals ◼ Obtain the best performance, 100x apps performance than K computer, within power budget, 30-40MW • Design applications, compilers, libraries, and hardware ◼ Approach ◼ Estimate perf & power using apps info, performance counts of Fujitsu FX100, and cycle base simulator • Computation time: brief & precise estimation • Communication time: bandwidth and latency for communication w/ some attributes for communication patterns • I/O time: ◼ Then, optimize apps/compilers etc. and resolve bottlenecks ◼ Estimation of performance and power ◼ Precise performance estimation for primary kernels • Make & run Fugaku objects on the Fugaku cycle base simulator ◼ Brief performance estimation for other sections • Replace performance counts of FX100 w/ Fugaku params: # of inst. commit/cycle, wait cycles of barrier, inst. -
The QPACE Supercomputer Applications of Random Matrix Theory in Two-Colour Quantum Chromodynamics Dissertation
The QPACE Supercomputer Applications of Random Matrix Theory in Two-Colour Quantum Chromodynamics Dissertation zur Erlangung des Doktorgrades der Naturwissenschaften (Dr. rer. nat.) der Fakult¨atf¨urPhysik der Universit¨atRegensburg vorgelegt von Nils Meyer aus Kempten im Allg¨au Mai 2013 Die Arbeit wurde angeleitet von: Prof. Dr. T. Wettig Das Promotionsgesuch wurde eingereicht am: 07.05.2013 Datum des Promotionskolloquiums: 14.06.2016 Pr¨ufungsausschuss: Vorsitzender: Prof. Dr. C. Strunk Erstgutachter: Prof. Dr. T. Wettig Zweitgutachter: Prof. Dr. G. Bali weiterer Pr¨ufer: Prof. Dr. F. Evers F¨urDenise. Contents List of Figures v List of Tables vii List of Acronyms ix Outline xiii I The QPACE Supercomputer 1 1 Introduction 3 1.1 Supercomputers . .3 1.2 Contributions to QPACE . .5 2 Design Overview 7 2.1 Architecture . .7 2.2 Node-card . .9 2.3 Cooling . 10 2.4 Communication networks . 10 2.4.1 Torus network . 10 2.4.1.1 Characteristics . 10 2.4.1.2 Communication concept . 11 2.4.1.3 Partitioning . 12 2.4.2 Ethernet network . 13 2.4.3 Global signals network . 13 2.5 System setup . 14 2.5.1 Front-end system . 14 2.5.2 Ethernet networks . 14 2.6 Other system components . 15 2.6.1 Root-card . 15 2.6.2 Superroot-card . 17 i ii CONTENTS 3 The IBM Cell Broadband Engine 19 3.1 The Cell Broadband Engine and the supercomputer league . 19 3.2 PowerXCell 8i overview . 19 3.3 Lattice QCD on the Cell BE . 20 3.3.1 Performance model . -
QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine
N OVEL A RCHITECTURES QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine Application-driven computers for Lattice Gauge Theory simulations have often been based on system-on-chip designs, but the development costs can be prohibitive for academic project budgets. An alternative approach uses compute nodes based on a commercial processor tightly coupled to a custom-designed network processor. Preliminary analysis shows that this solution offers good performance, but it also entails several challenges, including those arising from the processor’s multicore structure and from implementing the network processor on a field- programmable gate array. 1521-9615/08/$25.00 © 2008 IEEE uantum chromodynamics (QCD) is COPUBLISHED BY THE IEEE CS AND THE AIP a well-established theoretical frame- Gottfried Goldrian, Thomas Huth, Benjamin Krill, work to describe the properties and Jack Lauritsen, and Heiko Schick Q interactions of quarks and gluons, which are the building blocks of protons, neutrons, IBM Research and Development Lab, Böblingen, Germany and other particles. In some physically interesting Ibrahim Ouda dynamical regions, researchers can study QCD IBM Systems and Technology Group, Rochester, Minnesota perturbatively—that is, they can work out physi- Simon Heybrock, Dieter Hierl, Thilo Maurer, Nils Meyer, cal observables as a systematic expansion in the Andreas Schäfer, Stefan Solbrig, Thomas Streuer, strong coupling constant. In other (often more and Tilo Wettig interesting) regions, such an expansion isn’t pos- sible, so researchers must find other approaches. University of Regensburg, Germany The most systematic and widely used nonper- Dirk Pleiter, Karl-Heinz Sulanke, and Frank Winter turbative approach is lattice QCD (LQCD), a Deutsches Elektronen Synchrotron, Zeuthen, Germany discretized, computer-friendly version of the Hubert Simma theory Kenneth Wilson proposed more than 1 University of Milano-Bicocca, Italy 30 years ago. -
Download Presentation
Computer architecture in the past and next quarter century CNNA 2010 Berkeley, Feb 3, 2010 H. Peter Hofstee Cell/B.E. Chief Scientist IBM Systems and Technology Group CMOS Microprocessor Trends, The First ~25 Years ( Good old days ) Log(Performance) Single Thread More active transistors, higher frequency 2005 Page 2 SPECINT 10000 3X From Hennessy and Patterson, Computer Architecture: A ??%/year 1000 Quantitative Approach , 4th edition, 2006 52%/year 100 Performance (vs. VAX-11/780) VAX-11/780) (vs. Performance 10 25%/year 1 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 • VAX : 25%/year 1978 to 1986 RISC + x86: 52%/year 1986 to 2002 Page 3 CMOS Devices hit a scaling wall 1000 Air Cooling limit ) 100 2 Active 10 Power Passive Power 1 0.1 Power Density (W/cm Power Density 0.01 1994 2004 0.001 1 0.1 0.01 Gate Length (microns) Isaac e.a. IBM Page 4 Microprocessor Trends More active transistors, higher frequency Log(Performance) Single Thread More active transistors, higher frequency 2005 Page 5 Microprocessor Trends Multi-Core More active transistors, higher frequency Log(Performance) Single Thread More active transistors, higher frequency 2005 2015(?) Page 6 Multicore Power Server Processors Power 4 Power 5 Power 7 2001 2004 2009 Introduces Dual core Dual Core – 4 threads 8 cores – 32 threads Page 7 Why are (shared memory) CMPs dominant? A new system delivers nearly twice the throughput performance of the previous one without application-level changes. Applications do not degrade in performance when ported (to a next-generation processor). -
HPC Hardware & Software Development At
Heiko J Schick – IBM Deutschland R&D GmbH August 2010 HPC HW & SW Development at IBM Research & Development Lab © 2010 IBM Corporation Agenda . Section 1: Hardware and Software Development Process . Section 2: Hardware and Firmware Development . Section 3: Operating System and Device Driver Development . Section 4: Performance Tuning . Section 5: Cluster Management . Section 6: Project Examples 2 © 2010 IBM Corporation IBM Deutschland Research & Development GmbH Overview Focus Areas . One of IBM‘s largest Research & Text. Skills: Hardware, Firmware, Development sites Operating Systems, Software . Founded: and Services 1953 . More than 60 Hard- . Employees: and Software projects Berlin ~2.000 . Technology consulting . Headquarter: . Cooperation with Mainz Böblingen research institutes Walldorf and universities Böblingen . Managing Director: München Dirk Wittkopp 3 3 © 2010 IBM Corporation Research Zürich Watson Almaden China Tokio Haifa Austin India 4 © Copyright IBM Corporation 2009 Research Hardware Development Greenock Rochester Boulder Böblingen Toronto Fujisawa Endicott Burlington La Gaude San Jose East Fishkill Poughkeepsie Yasu Tucson Haifa Yamato Austin Raleigh Bangalore 5 © Copyright IBM Corporation 2009 Research Hardware Development Software Development Krakau Moskau Vancouver Dublin Hursley Minsk Rochester Böblingen Beaverton Toronto Paris Endicott Santa Foster Rom City Lenexa Littleton Beijing Teresa Poughkeepsie Haifa Yamato Austin Raleigh Costa Kairo Schanghai Mesa Taipei Pune Bangalore São Paolo Golden Coast Perth Sydney -
Blue Gene®/Q Overview and Update
Blue Gene®/Q Overview and Update November 2011 Agenda Hardware Architecture George Chiu Packaging & Cooling Paul Coteus Software Architecture Robert Wisniewski Applications & Configurations Jim Sexton IBM System Technology Group Blue Gene/Q 32 Node Board Industrial Design BQC DD2.0 4-rack system 5D torus 3 © 2011 IBM Corporation IBM System Technology Group Top 10 reasons that you need Blue Gene/Q 1. Ultra-scalability for breakthrough science – System can scale to 256 racks and beyond (>262,144 nodes) – Cluster: typically a few racks (512-1024 nodes) or less. 2. Highest capability machine in the world (20-100PF+ peak) 3. Superior reliability: Run an application across the whole machine, low maintenance 4. Highest power efficiency, smallest footprint, lowest TCO (Total Cost of Ownership) 5. Low latency, high bandwidth inter-processor communication system 6. Low latency, high bandwidth memory system 7. Open source and standards-based programming environment – Red Hat Linux distribution on service, front end, and I/O nodes – Lightweight Compute Node Kernel (CNK) on compute nodes ensures scaling with no OS jitter, enables reproducible runtime results – Automatic SIMD (Single-Instruction Multiple-Data) FPU exploitation enabled by IBM XL (Fortran, C, C++) compilers – PAMI (Parallel Active Messaging Interface) runtime layer. Runs across IBM HPC platforms 8. Software architecture extends application reach – Generalized communication runtime layer allows flexibility of programming model – Familiar Linux execution environment with support for most