Tibidabo: Making the Case for an ARM-Based HPC System


Nikola Rajovic (a,b,∗), Alejandro Rico (a,b), Nikola Puzovic (a), Chris Adeniyi-Jones (c), Alex Ramirez (a,b)

(a) Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
(b) Department d'Arquitectura de Computadors, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
(c) ARM Ltd., Cambridge, United Kingdom

∗ Corresponding author
Email addresses: [email protected] (Nikola Rajovic), [email protected] (Alejandro Rico), [email protected] (Nikola Puzovic), [email protected] (Chris Adeniyi-Jones), [email protected] (Alex Ramirez)

Preprint of the article accepted for publication in Future Generation Computer Systems, Elsevier

Note: Tibidabo is a mountain overlooking Barcelona.

Abstract

It is widely accepted that future HPC systems will be limited by their power consumption. Current HPC systems are built from commodity server processors, designed over years to achieve maximum performance, with energy efficiency being an afterthought. In this paper we advocate a different approach: building HPC systems from low-power embedded and mobile technology parts, designed over time for maximum energy efficiency, which now show promise for competitive performance.

We introduce the architecture of Tibidabo, the first large-scale HPC cluster built from ARM multicore chips, and present a detailed performance and energy efficiency evaluation. We present the lessons learned for the design and improvement in energy efficiency of future HPC systems based on such low-power cores. Based on our experience with the prototype, we perform simulations to show that a theoretical cluster of 16-core ARM Cortex-A15 chips would increase the energy efficiency of our cluster by 8.7x, reaching an energy efficiency of 1046 MFLOPS/W.

Keywords: high-performance computing, embedded processors, mobile processors, low power, cortex-a9, cortex-a15, energy efficiency

1. Introduction

In High Performance Computing (HPC), there is a continued need for higher computational performance. Scientific grand challenges (e.g., in engineering, geophysics, and bioinformatics) and other types of compute-intensive applications require increasing amounts of compute power. On the other hand, energy is increasingly becoming one of the most expensive resources and it substantially contributes to the total cost of running a large supercomputing facility. In some cases, the total energy cost over a few years of operation can exceed the cost of the hardware infrastructure acquisition [1, 2, 3].

This trend is not limited to HPC systems; it also holds true for data centres in general. Energy efficiency is already a primary concern for the design of any computer system and it is unanimously recognized that reaching the next milestone in supercomputers' performance, e.g. one EFLOPS (exaFLOPS, 10^18 floating-point operations per second), will be strongly constrained by power. The energy efficiency of a system will define the maximum achievable performance.

In this paper, we take a first step towards HPC systems developed from low-power solutions used in embedded and mobile devices. However, using CPUs from this domain is a challenge: these devices are designed neither to exploit high ILP nor to deliver high memory bandwidth. Most embedded CPUs lack a vector floating-point unit and their software ecosystem is not tuned for HPC. What makes them particularly interesting are their size and power characteristics, which allow for higher packaging density and lower cost. In the following three subsections we further motivate our proposal from several important aspects.

1.1. Road to Exascale

To illustrate our point about the need for low-power processors, let us reverse engineer a theoretical Exaflop supercomputer that has a power budget of 20 MW [4].
We will build our system using cores of 16 GFLOPS (8 ops/cycle @ 2 GHz), assuming that single-thread performance will not improve much beyond the performance that we observe today. An Exaflop machine will require 62.5 million such cores, independently of how they are packaged together (multicore density, sockets per node). We also assume that only 30-40% of the total power will actually be spent on the cores, the rest going to power supply overhead, interconnect, storage, and memory. That leads to a power budget of 6 MW to 8 MW for 62.5 million cores, which is 0.10 W to 0.13 W per core. Current high-performance processors integrating this type of core require tens of watts at 2 GHz. However, ARM processors, designed for the embedded mobile market, consume less than 0.9 W at that frequency [5], and thus are worth exploring: even though they do not yet provide a sufficient level of performance, they have a promising roadmap ahead.

1.2. ARM Processors

There is already a significant trend towards using ARM processors in data servers and cloud computing environments [6, 7, 8, 9, 10]. Those workloads are constrained by I/O and memory subsystems, not by CPU performance. Recently, ARM processors are also taking significant steps towards increased double-precision floating-point performance, making them competitive with state-of-the-art server performance.

Previous generations of ARM application processors did not feature a floating-point unit capable of supporting the throughputs and latencies required for HPC¹. The ARM Cortex-A9 has an optional VFPv3 floating-point unit [11] and/or a NEON single-instruction multiple-data (SIMD) floating-point unit [12]. The VFPv3 unit is pipelined and is capable of executing one double-precision ADD operation per cycle, or one MUL/FMA (Fused Multiply Accumulate) every two cycles. The NEON unit is a SIMD unit and supports only integer and single-precision floating-point operands, which makes it unattractive for HPC.
Then, with one double-precision floating-point arithmetic instruction per cycle (VFPv3), a 1 GHz Cortex-A9 provides a peak of 1 GFLOPS. The more recent ARM Cortex-A15 [13] processor has a fully-pipelined double-precision floating-point unit, delivering 2 GFLOPS at 1 GHz (one FMA every cycle). The new ARMv8 instruction set, which is being implemented in next-generation ARM cores, namely the Cortex-A50 Series [14], features a 64-bit address space and adds double-precision to the NEON SIMD ISA, allowing for 4 operations per cycle per unit, leading to 4 GFLOPS at 1 GHz.

¹ Cortex-A8, the processor generation prior to Cortex-A9, has a non-pipelined floating-point unit. In the best case it can deliver one floating-point ADD every ∼10 cycles; MUL and MAC have lower throughput.

1.3. Bell's Law

Our approach for an HPC system is novel because we argue for the use of mobile cores. We consider the improvements expected in mobile SoCs in the near future that would make them real candidates for HPC. As Bell's law states [15], a new computer class is usually based on lower cost components, which continue to evolve at a roughly constant price but with increasing performance from Moore's law. This trend holds today: the class of computing systems on the rise today in HPC is that of systems with large numbers of closely-coupled small cores (BlueGene/Q and Xeon Phi systems). From the architectural point of view, our proposal fits into this computing class and it has the potential for performance growth given the size and evolution of the mobile market.

1.4. Contributions

In this paper, we present Tibidabo, an experimental HPC cluster that we built using NVIDIA Tegra2 chips, each featuring a performance-optimized dual-core ARM Cortex-A9 processor. We use the PCIe support in Tegra2 to connect a 1 GbE NIC, and build a tree interconnect with 48-port 1 GbE switches.

We do not intend our first prototype to achieve an energy efficiency competitive with today's leaders.
The purpose of this prototype is to be a proof of concept to demonstrate that building such energy-efficient clusters with mobile processors is possible, and to learn from the experience. On the software side, the goal is to deploy an HPC-ready software stack for ARM-based systems, and to serve as an early application development and tuning vehicle.

Detailed analysis of performance and power distribution points to a major problem when building HPC systems from low-power parts: the system integration glue takes more power than the microprocessor cores themselves. The main building block of our cluster, the Q7 board, is designed with embedded and mobile software development in mind, and is not particularly optimized for energy-efficient operation. Nevertheless, the energy efficiency of our cluster is 120 MFLOPS/W, still competitive with Intel Xeon X5660 and AMD Opteron 6128 based clusters,² but much lower than what could be anticipated from the performance and power figures of the Cortex-A9 processor.

² In the November 2012 edition of the Green500 list, these systems are ranked 395th and 396th, respectively.

We use our performance analysis to model and simulate a potential HPC cluster built from ARM Cortex-A9 and Cortex-A15 chips with higher multicore density (number of cores per chip) and higher bandwidth interconnects, and conclude that such a system would deliver competitive energy efficiency. The work presented here and the lessons that we learned are a first step towards such a system, which will be built with the next generation of ARM cores implementing the ARMv8 architecture.

The contributions of this paper are:

• The design of the first HPC ARM-based cluster architecture, with a complete performance evaluation, energy efficiency evaluation, and comparison with state-of-the-art high-performance architectures.
• A power distribution estimation of our ARM cluster.
• Model-based performance and energy-efficiency projections of a theoretical HPC cluster with a higher multicore density and higher-performance ARM cores.
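
The back-of-the-envelope arithmetic in Sections 1.1 and 1.2 can be reproduced with a few lines of Python. The following is only an illustrative sketch: the function and variable names are our own assumptions and do not come from the paper, and the inputs are simply the figures quoted in the text (a 1 EFLOPS target, 16 GFLOPS cores, a 20 MW system budget with 30-40% reaching the cores, and the per-cycle floating-point throughputs stated for each core generation).

```python
# Illustrative check of the figures quoted in Sections 1.1 and 1.2.
# All names below are hypothetical; only the input numbers come from the text.

def cores_needed(target_flops: float, flops_per_core: float) -> float:
    """Number of cores required to reach the target peak performance."""
    return target_flops / flops_per_core


def watts_per_core(system_watts: float, core_share: float, n_cores: float) -> float:
    """Power available per core when only `core_share` of the system
    budget is actually spent on the cores."""
    return system_watts * core_share / n_cores


def peak_gflops(flops_per_cycle: float, freq_ghz: float) -> float:
    """Peak double-precision GFLOPS of one core."""
    return flops_per_cycle * freq_ghz


# Section 1.1: exascale machine within a 20 MW budget.
EXAFLOPS = 1e18                       # 1 EFLOPS target
CORE_FLOPS = peak_gflops(8, 2) * 1e9  # 8 ops/cycle at 2 GHz = 16 GFLOPS
SYSTEM_WATTS = 20e6                   # 20 MW

n_cores = cores_needed(EXAFLOPS, CORE_FLOPS)
low = watts_per_core(SYSTEM_WATTS, 0.30, n_cores)   # 30% of power on cores
high = watts_per_core(SYSTEM_WATTS, 0.40, n_cores)  # 40% of power on cores

print(f"cores: {n_cores / 1e6:.1f} million")
print(f"per-core budget: {low:.3f} W to {high:.3f} W")

# Section 1.2: peak per-core throughput at 1 GHz, as stated in the text.
for name, flops_per_cycle in [("Cortex-A9 (VFPv3)", 1),
                              ("Cortex-A15", 2),
                              ("ARMv8 NEON", 4)]:
    print(f"{name}: {peak_gflops(flops_per_cycle, 1.0):.0f} GFLOPS at 1 GHz")
```

Under these inputs the script yields 62.5 million cores and a per-core budget of roughly 0.096 W to 0.128 W, which the text rounds to 0.10 W and 0.13 W.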