The Green500 List: Encouraging Sustainable Supercomputing


Wu-chun Feng and Kirk W. Cameron
Virginia Tech

The performance-at-any-cost design mentality ignores supercomputers' excessive power consumption and need for heat dissipation and will ultimately limit their performance. Without fundamental change in the design of supercomputing systems, the performance advances common over the past two decades won't continue.

Although there has been a 10,000-fold increase since 1992 in the performance of supercomputers running parallel scientific applications, performance per watt has improved only 300-fold and performance per square foot only 65-fold. In response to the lagging power- and space-efficiency improvements, researchers have had to design and construct new machine rooms, and in some cases, entirely new buildings.

Compute nodes' exponentially increasing power requirements are a primary driver behind this less efficient use of power and space. In fact, the top supercomputers' peak power consumption has been on the rise over the past 15 years, as Figure 1 shows.

Today, the 10 most powerful supercomputers on the TOP500 List (www.top500.org) each require up to 10 megawatts of peak power—enough to sustain a city of 40,000. And even though IBM BlueGene/L, the world's fastest machine, was custom-built with low-power components, the system still consumes several megawatts of power. At anywhere from $200,000 to $1.2 million per megawatt, per year, these are hardly low-cost machines.

THE ENERGY CRISIS IN SUPERCOMPUTING

Power is a disruptive technology that requires us to rethink supercomputer design. As a supercomputer's nodes consume and dissipate more power, they must be spaced out and aggressively cooled. Without exotic cooling facilities, overheating makes traditional supercomputers too unreliable for application scientists to use. Unfortunately, building exotic cooling facilities can cost as much as the supercomputer itself, and operating and maintaining the facilities costs even more.

As "The Energy-Efficient Green Destiny" sidebar details, the low-power supercomputer that we developed was extremely reliable, with no unscheduled downtime in its two-year lifespan, despite residing in a dusty warehouse without cooling, humidification, or air filtration. The hourly cost of such downtime ranges from $90,000 for a catalog sales operation to nearly $6.5 million for a brokerage operation, according to Contingency Planning Research's 2001 cost-of-downtime survey.

There's still no guarantee that the supercomputer won't fail, as Table 1 illustrates. Total cost of ownership now exceeds initial acquisition costs.

Performance at any cost

The performance-at-any-cost supercomputer design paradigm is no longer feasible. Clearly, without significant change in design, the performance gains of the past two decades won't continue. Unfortunately, performance-only metrics don't capture improvements in power efficiency. Nonetheless, performance-only metrics derived from the Linpack benchmarks and Standard Performance Evaluation Corp.'s (SPEC) code suite have significantly influenced the design of modern high-performance systems, including servers and supercomputers.
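To put those per-megawatt figures in perspective, the back-of-the-envelope sketch below (our illustration, not from the article) multiplies the 10-megawatt peak draw quoted above by the $200,000 to $1.2 million per megawatt-year cost range; it assumes, simplistically, that the machine is billed at peak draw all year.

```python
# Back-of-the-envelope estimate of annual electricity cost for a top-10
# supercomputer, using the article's figures: up to 10 MW of peak power
# at $200,000 to $1.2 million per megawatt-year. The flat-rate,
# peak-draw-all-year assumption is ours, for illustration only.
def annual_power_cost(peak_megawatts: float, dollars_per_mw_year: float) -> float:
    """Yearly cost if the machine were billed at peak draw all year."""
    return peak_megawatts * dollars_per_mw_year

for rate in (200_000, 1_200_000):
    cost = annual_power_cost(10, rate)
    print(f"10 MW at ${rate:,}/MW-year -> ${cost:,.0f} per year")
# Prints a range of $2,000,000 to $12,000,000 per year, before counting
# the cooling facilities that can cost as much as the machine itself.
```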
[Figure 1, not reproduced here, plots peak Linpack (Rmax) performance and real application performance against peak power requirements from 1993 to 2009, marking the TMC CM-5 (5 kW), Fujitsu Numerical Wind Tunnel (100 kW), Intel ASCI Red (850 kW), IBM SP ASCI White (6,000 kW), and Earth Simulator (12,000 kW), with reference points ranging from a residential air conditioner (15 kW) and a commercial data center (1,374 kW) to a high-speed electric train (10,000 kW) and a small power plant's 300,000-kW generating capacity; it also highlights the efficiency gap between peak and real application performance.]

Figure 1. Rising power requirements. Peak power consumption of the top supercomputers has steadily increased over the past 15 years.

The Energy-Efficient Green Destiny

As a first step toward reliable and available energy-efficient supercomputing, in 2002 we built a low-power supercomputer at Los Alamos National Laboratory. Dubbed Green Destiny, the 240-processor supercomputer took up 5 square feet (the size of a standard computer rack) and had a 3.2-kilowatt power budget (the equivalent of two hairdryers) when booted diskless [1, 2]. Its 101-gigaflop Linpack rating (equivalent to a 256-processor SGI Origin 2000 supercomputer or a Cray T3D MC1024-8) would have placed it at no. 393 on the 2002 TOP500 List.

Garnering widespread media attention, Green Destiny delivered reliable supercomputing with no unscheduled downtime in its two-year lifetime. It endured sitting in a dusty warehouse at temperatures of 85-90 degrees Fahrenheit (29-32 degrees Celsius) and an altitude of 7,400 feet (2,256 meters). Furthermore, it did so without air-conditioning, humidification control, air filtration, or ventilation.

Yet despite Green Destiny's accomplishments, not everyone was convinced of its potential. Comments ranged from Green Destiny being so low power that it ran just as fast when it was unplugged to the notion that no one in HPC would ever care about power and cooling. However, in the past year, we've seen a dramatic attitude shift with respect to power and energy, particularly in light of how quickly supercomputers' thermal power envelopes have increased in size, thus adversely impacting the systems' power and cooling costs, reliability, and availability.

The laboratory's Biosciences Division bought a Green Destiny replica about six months after Green Destiny's debut. In 2006, we donated Green Destiny to the division so it could run a parallel bioinformatics code called mpiBLAST. Both clusters are run in the same environment, yet half of the nodes are inoperable on the replica, which uses higher-powered processors. Hence, although the original Green Destiny was 0.150 gigahertz slower in clock speed, its productivity in answers per month was much better than the faster but often inoperable replica.

Green Destiny is no longer used for computing and resides in the Computer History Museum in Mountain View, California.

References
1. W. Feng, "Making a Case for Efficient Supercomputing," ACM Queue, Oct. 2003, pp. 54-64.
2. W. Feng, "The Importance of Being Low Power in High-Performance Computing," Cyberinfrastructure Technology Watch, Aug. 2005, pp. 12-21.

Table 1. Reliability and availability of large-scale computing systems.

ASC Q (8,192 processors). Mean time between interrupts: 6.5 hours; 114 unplanned outages/month. Outage sources: storage, CPU, memory.
ASC White (8,192 processors). Mean time between failures: 5 hours (2001) and 40 hours (2003). Outage sources: storage, CPU, third-party hardware.
PSC Lemieux (3,016 processors). Mean time between interrupts: 9.7 hours. Availability: 98.33 percent.
Google (450,000 processors, estimated). 600 reboots/day; 2-3 percent replacement/year. Outage sources: storage and memory. Availability: ~100 percent.

Source: D.A. Reed
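The sidebar's figures also give a concrete feel for the kind of efficiency metric the article argues for. A minimal sketch, assuming the flops-per-watt formulation the Green500 effort popularized (the Linpack rating and power budget are from the sidebar; the arithmetic is ours):

```python
# Performance per watt for Green Destiny, computed from the sidebar's
# numbers: a 101-gigaflop Linpack rating on a 3.2-kilowatt power budget.
# The flops-per-watt formulation is the Green500's familiar metric; the
# calculation itself is our illustration, not the article's.
linpack_gflops = 101.0   # Linpack rating from the sidebar
power_kw = 3.2           # power budget from the sidebar

mflops_per_watt = (linpack_gflops * 1000.0) / (power_kw * 1000.0)
print(f"Green Destiny: {mflops_per_watt:.1f} Mflops/watt")  # ~31.6 Mflops/watt
```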
Developing new metrics

Performance-only metrics are likely to remain valuable for comparing existing systems prior to acquisition and helping drive system design. Nonetheless, we need new metrics that capture design differences in energy efficiency. For example, two hypothetical high-performance machines could both achieve 100 teraflops running Linpack and secure a high equivalent ranking on the TOP500 List. But enable smart-power-management hardware or software [1, 2] on one machine that can sustain performance and reduce energy consumption by 10 percent, and the TOP500 rankings remain the same.

Unfortunately, metric development is fraught with technical and political challenges. On the technical side, operators must perceive the metric and its associated benchmarks as representative of the workloads typically running on the production system. On the political side, metrics and benchmarks need strong community buy-in.

THE GREEN500 LIST

We've been working to improve awareness of energy-efficient supercomputer (and data-center) design since the turn of the century. […] benchmarks for servers running commercial production codes. The diverse types of evaluations that efforts like the Green500 and SPECPower (www.spec.org/specpower) provide will give users more choice in determining efficiency metrics for their systems and applications.

Measuring efficiency

In the Green500 effort, we treat both performance (speed) and power consumption as first-class design constraints for supercomputer deployments.

Speed and workload. The supercomputing community already accepts the flops metric for the Linpack benchmark, which the TOP500 List uses. Although TOP500 principals acknowledge that Linpack isn't the be-all or end-all benchmark for high-performance computing (HPC), it continues to prevail despite the emergence of other benchmarks. As other benchmark suites gain acceptance, most notably the SPEChpc [3] and HPC Challenge benchmarks [4], we plan to extend our Green500 List methodology as mentioned. For now, since the HPC community seems to identify with the notion of a clearly articulated and easily understood single number that indicates a machine's prowess, we opt to use floating-point operations per second (flops) as a speed metric for supercomputer performance and the Linpack benchmark as a scalable workload.

ED^n metric. There are many possibilities for performance-efficiency metrics, including circuit design's ED^n metric, with E standing for the energy a system uses while running a benchmark, D for the time to complete that same benchmark [5-8], and n a weight for the delay term. However, the ED^n metrics are biased when applied to supercomputers, particularly as n increases. For example, with large values for n, the delay term dominates, so that very small changes in execution time impact the metric dramatically and render changes in E […]
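A minimal numeric sketch of that bias, using normalized, hypothetical values (ours, not the article's): holding everything else fixed, a 1 percent slowdown and a 1 percent energy increase move ED^n by very different amounts once n grows.

```python
# Sketch of the ED^n bias described above: E is the energy consumed while
# running a benchmark, D the time to complete it, and n weights the delay
# term, so the metric is E * D**n (lower is better). Values are normalized
# and hypothetical, chosen only to show the delay term's dominance.
def ed_n(energy: float, delay: float, n: int) -> float:
    return energy * delay ** n

for n in (1, 2, 3, 6):
    delay_penalty = ed_n(1.00, 1.01, n) - 1.0   # same energy, 1% slower
    energy_penalty = ed_n(1.01, 1.00, n) - 1.0  # 1% more energy, same speed
    print(f"n={n}: +1% delay -> +{delay_penalty:.1%} ED^n, "
          f"+1% energy -> +{energy_penalty:.1%} ED^n")
# At n=1 the two penalties match; by n=6 the same 1% slowdown moves the
# metric about six times as much as the 1% energy increase, which is why
# large n makes changes in E nearly invisible.
```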