Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI)

Lawrence Livermore National Laboratory
Computation Directorate

Dona L. Crawford
Computation Associate Director
Lawrence Livermore National Laboratory
7000 East Avenue, L-559
Livermore, CA 94550

September 14, 2009

Dear Colleague:

Several years ago, I commissioned Alex R. Larzelere II to research and write a history of the U.S. Department of Energy’s Accelerated Strategic Computing Initiative (ASCI) and its evolution into the Advanced Simulation and Computing (ASC) Program. The goal was to document the first 10 years of ASCI: how this integrated and sustained collaborative effort reached its goals, became a base program, and changed the face of high-performance computing in a way that independent, individually funded R&D projects for applications, facilities, infrastructure, and software development never could have achieved.

Mr. Larzelere has combined the documented record with first-hand recollections of prominent leaders into a highly readable, 200-page account of the history of ASCI. The manuscript is a testament to thousands of hours of research and writing and the contributions of dozens of people. It represents, most fittingly, a collaborative effort many years in the making.

I’m pleased to announce that Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI) has been approved for unlimited distribution and is available online at https://asc.llnl.gov/asc_history/.

Sincerely,
Dona L. Crawford
Computation Associate Director
Lawrence Livermore National Laboratory

An Equal Opportunity Employer • University of California • P.O. Box 808 Livermore, California 94550 • Telephone (925) 422-2449 • Fax (925) 423-1466

Delivering Insight
The History of the Accelerated Strategic Computing Initiative (ASCI)

Prepared by: Alex R. Larzelere II
For Lawrence Livermore National Laboratory
Under Sub-Contract B545072

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory in part under Contract W-7405-Eng-48 and in part under Contract DE-AC52-07NA27344.

This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

UCRL-TR-231286

Contents

Contents
Figures
Acknowledgements
Foreword by Dona Crawford
Executive Summary
Chapter One: New Tools for Scientific Insight
Chapter Two: Building the Initiative
Chapter Three: Applications – At the Heart of Delivering Insight
    Performance and Safety Applications – The Direct Connection to the Weapons
    Materials Modeling – At the Level of First Principles
    Programming Models – Creating the Applications
Chapter Four: Platforms – Power-Plants for Simulations
    ASCI Red – Breaking the TeraFLOP/s Barrier
    ASCI Blue Pacific and Mountain – Keeping the Industry Viable
    Linux Clusters – Providing Cost Effective TeraFLOP/s
    BlueGene/L – Collaborative Innovation at its Best
    ASC Purple – Fulfilling the Promise of Power
Chapter Five – Environments for Simulation Capabilities
    Parallel Programming Tools – Enabling a New Approach to High-Performance Applications
    Scalable Visualization – Insights from Data
Chapter Six – Partnering to Deliver Insight
    Academic Alliances – Harnessing the Power of Universities
    IBM and Lawrence Livermore – A Sense of Shared Mission
    ASCI Organization – Synergy at the Edge of Chaos
Chapter Seven – Impact and Lessons Learned
Chapter Eight – Looking Toward the Future
Epilogue – A Future Envisioned by Visionaries
    Schlesinger Award Citation for Gil Weigand
    Text of Gil Weigand’s Acceptance Speech
    Schlesinger Award Citation for Vic Reis
    Text of Vic Reis’ Acceptance Speech
Appendix A – Chronology of Events
Appendix B – ASCI Platform Configurations
Appendix C – ASCI Leadership
About the Author
Glossary
Bibliography
Index

Figures

1-1 Vic Reis
1-2 ENIAC
1-3 Vignette Timeframes
2-1 Gil Weigand
2-2 ASCI Organizational Plan
3-1 Galileo’s Gravity Experiment
3-2 Example of a Finite Element Mesh
3-3 Example of a Validation Experiment
3-4 Comparison of Simulation and Experiment for Validation
3-5 Example of 3D Simulation of Rayleigh Taylor Instability
3-6 Physics Occurs at Many Scales
3-7 A 160 Million Atom Simulation of Copper Undergoing a Shock
4-1 The ASCI Red System at Sandia
4-2 ASC Red Storm at Sandia
4-3 The ASCI “Curve” as it Appeared in the 1996 ASCI Program Plan
4-4 ASCI