The European Factor
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Performance Analysis on Energy Efficient High
Performance Analysis on Energy Efficient High-Performance Architectures Roman Iakymchuk, François Trahay To cite this version: Roman Iakymchuk, François Trahay. Performance Analysis on Energy Efficient High-Performance Architectures. CC’13 : International Conference on Cluster Computing, Jun 2013, Lviv, Ukraine. hal-00865845 HAL Id: hal-00865845 https://hal.archives-ouvertes.fr/hal-00865845 Submitted on 25 Sep 2013 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Performance Analysis on Energy Ecient High-Performance Architectures Roman Iakymchuk1 and François Trahay1 1Institut Mines-Télécom Télécom SudParis 9 Rue Charles Fourier 91000 Évry France {roman.iakymchuk, francois.trahay}@telecom-sudparis.eu Abstract. With the shift in high-performance computing (HPC) towards energy ecient hardware architectures such as accelerators (NVIDIA GPUs) and embedded systems (ARM processors), arose the need to adapt existing perfor- mance analysis tools to these new systems. We present EZTrace a performance analysis framework for parallel applications. EZTrace relies on several core components, in particular on a mechanism for instrumenting func- tions, a lightweight tool for recording events, and a generic interface for writing traces. To support EZTrace on energy ecient HPC systems, we developed a CUDA module and ported EZTrace to ARM processors. -
Examining the Viability of FPGA Supercomputing
1 Examining the Viability of FPGA Supercomputing Stephen D. Craven and Peter Athanas Bradley Department of Electrical and Computer Engineering Virginia Polytechnic Institute and State University Blacksburg, VA 24061 USA email: {scraven,athanas}@vt.edu Abstract—For certain applications, custom computational hardware created using field programmable gate arrays (FPGAs) produces significant performance improvements over processors, leading some in academia and industry to call for the inclusion of FPGAs in supercomputing clusters. This paper presents a comparative analysis of FPGAs and traditional processors, focusing on floating- point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general High-Performance Computing (HPC). Index Terms— computational accelerator, digital arithmetic, Field programmable gate arrays, high- performance computing, supercomputers. I. INTRODUCTION Supercomputers have experienced a recent resurgence, fueled by government research dollars and the development of low-cost supercomputing clusters. Unlike the Massively Parallel Processor (MPP) designs found in Cray and CDC machines of the 70s and 80s, featuring proprietary processor architectures, many modern supercomputing clusters are constructed from commodity PC processors, significantly reducing procurement costs. In an effort to improve performance, several companies offer machines that place one or more FPGAs in each node of the cluster. Configurable logic devices, of which FPGAs are one example, permit the device’s hardware to be programmed multiple times after manufacture. A wide body of research over two decades has repeatedly demonstrated significant performance improvements for certain classes of applications when implemented within an FPGA’s configurable logic [1]. Applications well suited to speed-up by FPGAs typically exhibit massive parallelism and small integer or fixed-point data types. -
Overview of the SPEC Benchmarks
9 Overview of the SPEC Benchmarks Kaivalya M. Dixit IBM Corporation “The reputation of current benchmarketing claims regarding system performance is on par with the promises made by politicians during elections.” Standard Performance Evaluation Corporation (SPEC) was founded in October, 1988, by Apollo, Hewlett-Packard,MIPS Computer Systems and SUN Microsystems in cooperation with E. E. Times. SPEC is a nonprofit consortium of 22 major computer vendors whose common goals are “to provide the industry with a realistic yardstick to measure the performance of advanced computer systems” and to educate consumers about the performance of vendors’ products. SPEC creates, maintains, distributes, and endorses a standardized set of application-oriented programs to be used as benchmarks. 489 490 CHAPTER 9 Overview of the SPEC Benchmarks 9.1 Historical Perspective Traditional benchmarks have failed to characterize the system performance of modern computer systems. Some of those benchmarks measure component-level performance, and some of the measurements are routinely published as system performance. Historically, vendors have characterized the performances of their systems in a variety of confusing metrics. In part, the confusion is due to a lack of credible performance information, agreement, and leadership among competing vendors. Many vendors characterize system performance in millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOPS). All instructions, however, are not equal. Since CISC machine instructions usually accomplish a lot more than those of RISC machines, comparing the instructions of a CISC machine and a RISC machine is similar to comparing Latin and Greek. 9.1.1 Simple CPU Benchmarks Truth in benchmarking is an oxymoron because vendors use benchmarks for marketing purposes. -
Performance of a Computer (Chapter 4) Vishwani D
ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2013 Performance of a Computer (Chapter 4) Vishwani D. Agrawal & Victor P. Nelson epartment of Electrical and Computer Engineering Auburn University, Auburn, AL 36849 ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 1 What is Performance? Response time: the time between the start and completion of a task. Throughput: the total amount of work done in a given time. Some performance measures: MIPS (million instructions per second). MFLOPS (million floating point operations per second), also GFLOPS, TFLOPS (1012), etc. SPEC (System Performance Evaluation Corporation) benchmarks. LINPACK benchmarks, floating point computing, used for supercomputers. Synthetic benchmarks. ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 2 Small and Large Numbers Small Large 10-3 milli m 103 kilo k 10-6 micro μ 106 mega M 10-9 nano n 109 giga G 10-12 pico p 1012 tera T 10-15 femto f 1015 peta P 10-18 atto 1018 exa 10-21 zepto 1021 zetta 10-24 yocto 1024 yotta ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 3 Computer Memory Size Number bits bytes 210 1,024 K Kb KB 220 1,048,576 M Mb MB 230 1,073,741,824 G Gb GB 240 1,099,511,627,776 T Tb TB ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 4 Units for Measuring Performance Time in seconds (s), microseconds (μs), nanoseconds (ns), or picoseconds (ps). Clock cycle Period of the hardware clock Example: one clock cycle means 1 nanosecond for a 1GHz clock frequency (or 1GHz clock rate) CPU time = (CPU clock cycles)/(clock rate) Cycles per instruction (CPI): average number of clock cycles used to execute a computer instruction. -
Supercomputer Fugaku
Supercomputer Fugaku Toshiyuki Shimizu Feb. 18th, 2020 FUJITSU LIMITED Copyright 2020 FUJITSU LIMITED Outline ◼ Fugaku project overview ◼ Co-design ◼ Approach ◼ Design results ◼ Performance & energy consumption evaluation ◼ Green500 ◼ OSS apps ◼ Fugaku priority issues ◼ Summary 1 Copyright 2020 FUJITSU LIMITED Supercomputer “Fugaku”, formerly known as Post-K Focus Approach Application performance Co-design w/ application developers and Fujitsu-designed CPU core w/ high memory bandwidth utilizing HBM2 Leading-edge Si-technology, Fujitsu's proven low power & high Power efficiency performance logic design, and power-controlling knobs Arm®v8-A ISA with Scalable Vector Extension (“SVE”), and Arm standard Usability Linux 2 Copyright 2020 FUJITSU LIMITED Fugaku project schedule 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Fugaku development & delivery Manufacturing, Apps Basic Detailed design & General Feasibility study Installation review design Implementation operation and Tuning Select Architecture & Co-Design w/ apps groups apps sizing 3 Copyright 2020 FUJITSU LIMITED Fugaku co-design ◼ Co-design goals ◼ Obtain the best performance, 100x apps performance than K computer, within power budget, 30-40MW • Design applications, compilers, libraries, and hardware ◼ Approach ◼ Estimate perf & power using apps info, performance counts of Fujitsu FX100, and cycle base simulator • Computation time: brief & precise estimation • Communication time: bandwidth and latency for communication w/ some attributes for communication patterns • I/O time: ◼ Then, optimize apps/compilers etc. and resolve bottlenecks ◼ Estimation of performance and power ◼ Precise performance estimation for primary kernels • Make & run Fugaku objects on the Fugaku cycle base simulator ◼ Brief performance estimation for other sections • Replace performance counts of FX100 w/ Fugaku params: # of inst. commit/cycle, wait cycles of barrier, inst. -
Measuring Power Consumption on IBM Blue Gene/P
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Springer - Publisher Connector Comput Sci Res Dev DOI 10.1007/s00450-011-0192-y SPECIAL ISSUE PAPER Measuring power consumption on IBM Blue Gene/P Michael Hennecke · Wolfgang Frings · Willi Homberg · Anke Zitz · Michael Knobloch · Hans Böttiger © The Author(s) 2011. This article is published with open access at Springerlink.com Abstract Energy efficiency is a key design principle of the Top10 supercomputers on the November 2010 Top500 list IBM Blue Gene series of supercomputers, and Blue Gene [1] alone (which coincidentally are also the 10 systems with systems have consistently gained top GFlops/Watt rankings an Rpeak of at least one PFlops) are consuming a total power on the Green500 list. The Blue Gene hardware and man- of 33.4 MW [2]. These levels of power consumption are al- agement software provide built-in features to monitor power ready a concern for today’s Petascale supercomputers (with consumption at all levels of the machine’s power distribu- operational expenses becoming comparable to the capital tion network. This paper presents the Blue Gene/P power expenses for procuring the machine), and addressing the measurement infrastructure and discusses the operational energy challenge clearly is one of the key issues when ap- aspects of using this infrastructure on Petascale machines. proaching Exascale. We also describe the integration of Blue Gene power moni- While the Flops/Watt metric is useful, its emphasis on toring capabilities into system-level tools like LLview, and LINPACK performance and thus computational load ne- highlight some results of analyzing the production workload glects the fact that the energy costs of memory references at Research Center Jülich (FZJ). -
Investigations of Various HPC Benchmarks to Determine Supercomputer Performance Efficiency and Balance
Investigations of Various HPC Benchmarks to Determine Supercomputer Performance Efficiency and Balance Wilson Lisan August 24, 2018 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2018 Abstract This dissertation project is based on participation in the Student Cluster Competition (SCC) at the International Supercomputing Conference (ISC) 2018 in Frankfurt, Germany as part of a four-member Team EPCC from The University of Edinburgh. There are two main projects which are the team-based project and a personal project. The team-based project focuses on the optimisations and tweaks of the HPL, HPCG, and HPCC benchmarks to meet the competition requirements. At the competition, Team EPCC suffered with hardware issues that shaped the cluster into an asymmetrical system with mixed hardware. Unthinkable and extreme methods were carried out to tune the performance and successfully drove the cluster back to its ideal performance. The personal project focuses on testing the SCC benchmarks to evaluate the performance efficiency and system balance at several HPC systems. HPCG fraction of peak over HPL ratio was used to determine the system performance efficiency from its peak and actual performance. It was analysed through HPCC benchmark that the fraction of peak ratio could determine the memory and network balance over the processor or GPU raw performance as well as the possibility of the memory or network bottleneck part. Contents Chapter 1 Introduction .............................................................................................. -
Compute System Metrics (Bof)
Setting Trends for Energy-Efficient Supercomputing Natalie Bates, EE HPC Working Group Tahir Cader, HP & The Green Grid Wu Feng, Virginia Tech & Green500 John Shalf, Berkeley Lab & NERSC Horst Simon, Berkeley Lab & TOP500 Erich Strohmaier, Berkeley Lab & TOP500 ISC BoF; May 31, 2010; Hamburg, Germany Why We Are Here • “Can only improve what you can measure” • Context – Power consumption of HPC and facilities cost are increasing • What is needed? – Converge on a common basis between different research and industry groups for: •metrics • methodologies • workloads for energy-efficient supercomputing, so we can make progress towards solutions. Current Technology Roadmaps will Depart from Historical Gains Power is the Leading Design Constraint From Peter Kogge, DARPA Exascale Study … and the power costs will still be staggering From Peter Kogge, DARPA Exascale Study $1M per megawatt per year! (with CHEAP power) Absolute Power Levels Power Consumption Power Efficiency What We Have Done • Stages of Green Supercomputing – Denial – Awareness – Hype – Substance The Denial Phase (2001 – 2004) • Green Destiny – A 240-Node Supercomputer in 5 Sq. Ft. – LINPACK Performance: 101 Gflops – Power Consumption: 3.2 kW • Prevailing Views embedded processor – “Green Destiny is so low power that it runs just as fast when it is unplugged.” – “In HPC, no one cares about power & cooling, and no one ever will …” – “Moore’s Law for Power will stimulate the economy by creating a new market in cooling technologies.” The Awareness Phase (2004 – 2008) • Green Movements & Studies – IEEE Int’l Parallel & Distributed Processing Symp. (2005) • Workshop on High-Performance, Power-Aware Computing (HPPAC) Green500 • Metrics: Energy-Delay Product and FLOPS/Watt FLOPS/watt – Green Grid (2007) • Industry-driven consortium of all the top system vendors • Metric: Power Usage Efficiency (PUE) – Kogge et al., “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems, DARPA ITO, AFRL, 2008. -
2010 IBM and the Environment Report
2010 IBM and the Environment Report Committed to environmental leadership across all of IBM's business activities IBM and the Environment - 2010 Annual Report IBM AND THE ENVIRONMENT IBM has long maintained an unwavering commitment to environmental protection, which was formalized by a corporate environmental policy in 1971. The policy calls for IBM to be an environmental leader across all of our business activities, from our research, operations and products to the services and solutions we provide our clients to help them be more protective of the environment. This section of IBM’s Corporate Responsibility Report describes IBM’s programs and performance in the following environmental areas: A Commitment to Environmental 3 Energy and Climate Programs 36 Leadership A Five-Part Strategy 36 Conserving Energy 37 Global Governance and Management 4 CO2 Emissions Reduction 42 System PFC Emissions Reduction 43 Global Environmental Management 4 Renewable Energy 45 System Voluntary Climate Partnerships 46 Stakeholder Engagement 7 Transportation and Logistics 47 Voluntary Partnerships and Initiatives 8 Initiatives Environmental Investment and Return 9 Energy and Climate Protection in 48 the Supply Chain Process Stewardship 12 Environmentally Preferable Substances 12 Remediation 52 and Materials Nanotechnology 15 Audits and Compliance 53 Accidental Releases 53 Pollution Prevention 17 Fines and Penalties 54 Hazardous Waste 17 Nonhazardous Waste 19 Awards and Recognition 55 Chemical Use and Management 20 Internal Recognition 55 External Recognition -
101117-Green Computing and Gpus
Wu Feng, Ph.D. Department of Computer Science Department of Electrical & Computer Engineering NSF Center for High-Performance Reconfigurable CompuAng © W. Feng, SC 2010 We have spent decades focusing on performance, performance, performance (and price/performance). © W. Feng, SC 2010 A ranking of the fastest 500 supercomputers in the world • Benchmark – LINPACK: Solves a (random) dense system of linear equaons in double-precision (64 bits) arithmeAc. • Evaluaon Metric – Performance (i.e., Speed) • Floang-Operaons Per Second (FLOPS) • Web Site – h\p://www.top500.org/ © W. Feng, SC 2010 • Metrics for Evaluang Supercomputers – Performance (i.e., Speed) • Metric: Floang-Operaons Per Second (FLOPS) – Price/Performance Cost Efficiency • Metric: AcquisiAon Cost / FLOPS • Performance & price/performance are important metrics, but … © W. Feng, September 2010 • Electrical power costs $$$$. Source: IDC, 2006 © W. Feng, SC 2010 Examples: Power, Cooling, and Infrastructure $$$ • Japanese Earth Simulator – Power & Cooling: 12 MW $10M/year © W. Feng, SC 2010 • Too much power affects efficiency, reliability, and availability. – Anecdotal Evidence from a “Machine Room” in 2001 - 2002 • Winter: “Machine Room” Temperature of 70-75° F – Failure approximately once per week. • Summer: “Machine Room” Temperature of 85-90° F – Failure approximately twice per week. – Arrenhius’ Equaon (applied to microelectronics) • For every 10° C (18° F) increase in temperature, … the failure rate of a system doubles.* * W. Feng, M. Warren, and E. Weigle, “The Bladed Beowulf: A Cost-EffecAve Alternave to TradiAonal Beowulfs,” IEEE Cluster, Sept. 2002. © W. Feng, SC 2010 • Debuted at SC 2007 • Goal: Raise Awareness in the Energy Efficiency of Supercompung – Drive energy efficiency as a first-order design constraint (on par with performance). -
Tibidabo$: Making the Case for an ARM-Based HPC System
TibidaboI: Making the Case for an ARM-Based HPC System Nikola Rajovica,b,∗, Alejandro Ricoa,b, Nikola Puzovica, Chris Adeniyi-Jonesc, Alex Ramireza,b aComputer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain bDepartment d'Arquitectura de Computadors, Universitat Polit`ecnica de Catalunya - BarcelonaTech, Barcelona, Spain cARM Ltd., Cambridge, United Kingdom Abstract It is widely accepted that future HPC systems will be limited by their power consumption. Current HPC systems are built from commodity server pro- cessors, designed over years to achieve maximum performance, with energy efficiency being an after-thought. In this paper we advocate a different ap- proach: building HPC systems from low-power embedded and mobile tech- nology parts, over time designed for maximum energy efficiency, which now show promise for competitive performance. We introduce the architecture of Tibidabo, the first large-scale HPC clus- ter built from ARM multicore chips, and a detailed performance and energy efficiency evaluation. We present the lessons learned for the design and im- provement in energy efficiency of future HPC systems based on such low- power cores. Based on our experience with the prototype, we perform simu- lations to show that a theoretical cluster of 16-core ARM Cortex-A15 chips would increase the energy efficiency of our cluster by 8.7x, reaching an energy efficiency of 1046 MFLOPS/W. Keywords: high-performance computing, embedded processors, mobile ITibidabo is a mountain overlooking Barcelona ∗Corresponding author Email addresses: [email protected] (Nikola Rajovic), [email protected] (Alejandro Rico), [email protected] (Nikola Puzovic), [email protected] (Chris Adeniyi-Jones), [email protected] (Alex Ramirez) Preprint of the article accepted for publication in Future Generation Computer Systems, Elsevier processors, low power, cortex-a9, cortex-a15, energy efficiency 1. -
Supermicro “Industry-Standard Green HPC Systems”
Confidential Industry-Standard Green HPC Systems HPC Advisory Council Brazil Conference 2014 May 26, 2014 University of São Paulo Attila A. Nagy Senior IT Consultant © Supermicro 2014 Confidential HPC systems: What the industry is doing Architecture Processor 0.4 15.4 8.0 0.6 Xeon Cluster 8.6 Opteron MPP Power 84.6 82.4 Sparc Other system share (%) system share (%) Source: The Top500 list of November 2013. http://www.top500.org Confidential HPC systems: What the industry is doing Interconnect Operating System 2.2 1.4 4.0 2.2 Infiniband Linux 10.0 GbE 15.4 41.4 Unix 10GbE Other 96.4 Custom 27.0 Cray system share (%) Other system share (%) Source: The Top500 list of November 2013. http://www.top500.org Confidential Industry-standard HPC clusters Standard x86 servers throughout . Compute nodes . Head/Management/Control nodes . Storage nodes Infiniband and/or Ethernet networks . Main interconnect . Cluster management and administration . Out-of-band management Linux OS environment . Comprehensive software stack for HPC . Large availability of HPC software tools . Large collaboration community Confidential Typical HPC Cluster IB Fabric Head & Management Nodes Campus Ethernet Fabric x86 servers/Linux Network Infiniband The Beowulf cluster concept Ethernet OOB Mgmt Network is as solidNetwork as ever!! Network Storage Nodes Compute Nodes x86 servers x86 servers Linux Linux Parallel FS Confidential Accelerator/Coprocessor usage All top ten systems in the latest Green500 list are coprocessor Accelerator/CP based* . Two petaflop systems N/A Up to 5x improvements in: 89.4 Nvidia . Power consumption . Physical space Xeon Phi 7.4 . Cost Other 0.8 2.4 GPUs/MICs: 80% of HPC users system share (%) at least testing them Source: The Top500 list of November 2013.