International Hpc Activities
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
User's Manual
Management Number:OUCMC-super-技術-007-06 National University Corporation Osaka University Cybermedia Center User's manual NEC First Government & Public Solutions Division Summary 1. System Overview ................................................................................................. 1 1.1. Overall System View ....................................................................................... 1 1.2. Thress types of Computiong Environments ........................................................ 2 1.2.1. General-purpose CPU environment .............................................................. 2 1.2.2. Vector computing environment ................................................................... 3 1.2.3. GPGPU computing environment .................................................................. 5 1.3. Three types of Storage Areas ........................................................................... 6 1.3.1. Features.................................................................................................. 6 1.3.2. Various Data Access Methods ..................................................................... 6 1.4. Three types of Front-end nodes ....................................................................... 7 1.4.1. Features.................................................................................................. 7 1.4.2. HPC Front-end ......................................................................................... 7 1.4.1. HPDA Front-end ...................................................................................... -
Interconnect Your Future Enabling the Best Datacenter Return on Investment
Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, November 2016 Mellanox Accelerates The World’s Fastest Supercomputers . Accelerates the #1 Supercomputer . 39% of Overall TOP500 Systems (194 Systems) . InfiniBand Connects 65% of the TOP500 HPC Platforms . InfiniBand Connects 46% of the Total Petascale Systems . Connects All of 40G Ethernet Systems . Connects The First 100G Ethernet System on The List (Mellanox End-to-End) . Chosen for 65 End-User TOP500 HPC Projects in 2016, 3.6X Higher versus Omni-Path, 5X Higher versus Cray Aries InfiniBand is the Interconnect of Choice for HPC Infrastructures Enabling Machine Learning, High-Performance, Web 2.0, Cloud, Storage, Big Data Applications © 2016 Mellanox Technologies 2 Mellanox Connects the World’s Fastest Supercomputer National Supercomputing Center in Wuxi, China #1 on the TOP500 List . 93 Petaflop performance, 3X higher versus #2 on the TOP500 . 41K nodes, 10 million cores, 256 cores per CPU . Mellanox adapter and switch solutions * Source: “Report on the Sunway TaihuLight System”, Jack Dongarra (University of Tennessee) , June 20, 2016 (Tech Report UT-EECS-16-742) © 2016 Mellanox Technologies 3 Mellanox In the TOP500 . Connects the world fastest supercomputer, 93 Petaflops, 41 thousand nodes, and more than 10 million CPU cores . Fastest interconnect solution, 100Gb/s throughput, 200 million messages per second, 0.6usec end-to-end latency . Broadest adoption in HPC platforms , connects 65% of the HPC platforms, and 39% of the overall TOP500 systems . Preferred solution for Petascale systems, Connects 46% of the Petascale systems on the TOP500 list . Connects all the 40G Ethernet systems and the first 100G Ethernet system on the list (Mellanox end-to-end) . -
DMI-HIRLAM on the NEC SX-6
DMI-HIRLAM on the NEC SX-6 Maryanne Kmit Meteorological Research Division Danish Meteorological Institute Lyngbyvej 100 DK-2100 Copenhagen Ø Denmark 11th Workshop on the Use of High Performance Computing in Meteorology 25-29 October 2004 Outline • Danish Meteorological Institute (DMI) • Applications run on NEC SX-6 cluster • The NEC SX-6 cluster and access to it • DMI-HIRLAM - geographic areas, versions, and improvements • Strategy for utilization and operation of the system DMI - the Danish Meteorological Institute DMI’s mission: • Making observations • Communicating them to the general public • Developing scientific meteorology DMI’s responsibilities: • Serving the meteorological needs of the kingdom of Denmark • Denmark, the Faroes and Greenland, including territorial waters and airspace • Predicting and monitoring weather, climate and environmental conditions, on land and at sea Applications running on the NEC SX-6 cluster Operational usage: • Long DMI-HIRLAM forecasts 4 times a day • Wave model forecasts for the North Atlantic, the Danish waters, and for the Mediterranean Sea 4 times a day • Trajectory particle model and ozone forecasts for air quality Research usage: • Global climate simulations • Regional climate simulations • Research and development of operational and climate codes Cluster interconnect GE−int Completely closed Connection to GE−ext Switch internal network institute network Switch GE GE nec1 nec2 nec3 nec4 nec5 nec6 nec7 nec8 GE GE neci1 neci2 FC 4*1Gbit/s FC links per node azusa1 asama1 FC McData FC 4*1Gbit/s FC links per azusa fiber channel switch 8*1Gbit/s FC links per asama azusa2 asama2 FC 2*1Gbit/s FC links per lun Disk#0 Disk#1 Disk#2 Disk#3 Cluster specifications • NEC SX-6 (nec[12345678]) : 64M8 (8 vector nodes with 8 CPU each) – Desc. -
Jack Dongarra: Supercomputing Expert and Mathematical Software Specialist
Biographies Jack Dongarra: Supercomputing Expert and Mathematical Software Specialist Thomas Haigh University of Wisconsin Editor: Thomas Haigh Jack J. Dongarra was born in Chicago in 1950 to a he applied for an internship at nearby Argonne National family of Sicilian immigrants. He remembers himself as Laboratory as part of a program that rewarded a small an undistinguished student during his time at a local group of undergraduates with college credit. Dongarra Catholic elementary school, burdened by undiagnosed credits his success here, against strong competition from dyslexia.Onlylater,inhighschool,didhebeginto students attending far more prestigious schools, to Leff’s connect material from science classes with his love of friendship with Jim Pool who was then associate director taking machines apart and tinkering with them. Inspired of the Applied Mathematics Division at Argonne.2 by his science teacher, he attended Chicago State University and majored in mathematics, thinking that EISPACK this would combine well with education courses to equip Dongarra was supervised by Brian Smith, a young him for a high school teaching career. The first person in researcher whose primary concern at the time was the his family to go to college, he lived at home and worked lab’s EISPACK project. Although budget cuts forced Pool to in a pizza restaurant to cover the cost of his education.1 make substantial layoffs during his time as acting In 1972, during his senior year, a series of chance director of the Applied Mathematics Division in 1970– events reshaped Dongarra’s planned career. On the 1971, he had made a special effort to find funds to suggestion of Harvey Leff, one of his physics professors, protect the project and hire Smith. -
The Sunway Taihulight Supercomputer: System and Applications
SCIENCE CHINA Information Sciences . RESEARCH PAPER . July 2016, Vol. 59 072001:1–072001:16 doi: 10.1007/s11432-016-5588-7 The Sunway TaihuLight supercomputer: system and applications Haohuan FU1,3 , Junfeng LIAO1,2,3 , Jinzhe YANG2, Lanning WANG4 , Zhenya SONG6 , Xiaomeng HUANG1,3 , Chao YANG5, Wei XUE1,2,3 , Fangfang LIU5 , Fangli QIAO6 , Wei ZHAO6 , Xunqiang YIN6 , Chaofeng HOU7 , Chenglong ZHANG7, Wei GE7 , Jian ZHANG8, Yangang WANG8, Chunbo ZHOU8 & Guangwen YANG1,2,3* 1Ministry of Education Key Laboratory for Earth System Modeling, and Center for Earth System Science, Tsinghua University, Beijing 100084, China; 2Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 3National Supercomputing Center in Wuxi, Wuxi 214072, China; 4College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China; 5Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 6First Institute of Oceanography, State Oceanic Administration, Qingdao 266061, China; 7Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China; 8Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China Received May 27, 2016; accepted June 11, 2016; published online June 21, 2016 Abstract The Sunway TaihuLight supercomputer is the world’s first system with a peak performance greater than 100 PFlops. In this paper, we provide a detailed introduction to the TaihuLight system. In contrast with other existing heterogeneous supercomputers, which include both CPU processors and PCIe-connected many-core accelerators (NVIDIA GPU or Intel Xeon Phi), the computing power of TaihuLight is provided by a homegrown many-core SW26010 CPU that includes both the management processing elements (MPEs) and computing processing elements (CPEs) in one chip. -
Germany's Top500 Businesses
GERMANY’S TOP500 DIGITAL BUSINESS MODELS IN SEARCH OF BUSINESS CONTENTS FOREWORD 3 INSIGHT 1 4 SLOW GROWTH RATES YET HIGH SALES INSIGHT 2 6 NOT ENOUGH REVENUE IS ATTRIBUTABLE TO DIGITIZATION INSIGHT 3 10 EU REGULATIONS ARE WEAKENING INNOVATION INSIGHT 4 12 THE GERMAN FEDERAL GOVERNMENT COULD TURN INTO AN INNOVATION DRIVER CONCLUSION 14 FOREWORD Large German companies are on the lookout. Their purpose: To find new growth prospects. While revenue increases of more than 5 percent on average have not been uncommon for Germany’s 500 largest companies in the past, that level of growth has not occurred for the last four years. The reasons are obvious. With their high export rates, This study is intended to examine critically the Germany’s industrial companies continue to be major opportunities arising at the beginning of a new era of players in the global market. Their exports have, in fact, technology. Accenture uses four insights to not only been so high in the past that it is now increasingly describe the progress that has been made in digital difficult to sustain their previous rates of growth. markets, but also suggests possible steps companies can take to overcome weak growth. Accenture regularly examines Germany’s largest companies on the basis of their ranking in “Germany’s Top500,” a list published every year in the German The four insights in detail: daily newspaper DIE WELT. These 500 most successful German companies generate revenue of more than INSIGHT 1 one billion Euros. In previous years, they were the Despite high levels of sales, growth among Germany’s engines of the German economy. -
It's a Multi-Core World
It’s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group Moore’s Law is not to blame. Intel process technology capabilities High Volume Manufacturing 2004 2006 2008 2010 2012 2014 2016 2018 Feature Size 90nm 65nm 45nm 32nm 22nm 16nm 11nm 8nm Integration Capacity (Billions of 2 4 8 16 32 64 128 256 Transistors) Transistor for Influenza Virus 90nm Process Source: CDC 50nm Source: Intel At end of day, we keep using all those new transistors. That Power and Clock Inflection Point in 2004… didn’t get better. Fun fact: At 100+ Watts and <1V, currents are beginning to exceed 100A at the point of load! Source: Kogge and Shalf, IEEE CISE Courtesy Horst Simon, LBNL Not a new problem, just a new scale… CPU Power W) Cray-2 with cooling tower in foreground, circa 1985 And how to get more performance from more transistors with the same power. RULE OF THUMB A 15% Frequency Power Performance Reduction Reduction Reduction Reduction In Voltage 15% 45% 10% Yields SINGLE CORE DUAL CORE Area = 1 Area = 2 Voltage = 1 Voltage = 0.85 Freq = 1 Freq = 0.85 Power = 1 Power = 1 Perf = 1 Perf = ~1.8 Single Socket Parallelism Processor Year Vector Bits SP FLOPs / core / Cores FLOPs/cycle cycle Pentium III 1999 SSE 128 3 1 3 Pentium IV 2001 SSE2 128 4 1 4 Core 2006 SSE3 128 8 2 16 Nehalem 2008 SSE4 128 8 10 80 Sandybridge 2011 AVX 256 16 12 192 Haswell 2013 AVX2 256 32 18 576 KNC 2012 AVX512 512 32 64 2048 KNL 2016 AVX512 512 64 72 4608 Skylake 2017 AVX512 512 96 28 2688 Putting It All Together Prototypical Application: Serial Weather Model CPU MEMORY First Parallel Weather Modeling Algorithm: Richardson in 1917 Courtesy John Burkhardt, Virginia Tech Weather Model: Shared Memory (OpenMP) Core Fortran: !$omp parallel do Core do i = 1, n Core a(i) = b(i) + c(i) enddoCore C/C++: MEMORY #pragma omp parallel for Four meteorologists in the samefor(i=1; room sharingi<=n; i++) the map. -
A Peer-To-Peer Framework for Message Passing Parallel Programs
A Peer-to-Peer Framework for Message Passing Parallel Programs Stéphane GENAUD a;1, and Choopan RATTANAPOKA b;2 a AlGorille Team - LORIA b King Mongkut’s University of Technology Abstract. This chapter describes the P2P-MPI project, a software framework aimed at the development of message-passing programs in large scale distributed net- works of computers. Our goal is to provide a light-weight, self-contained software package that requires minimum effort to use and maintain. P2P-MPI relies on three features to reach this goal: i) its installation and use does not require administra- tor privileges, ii) available resources are discovered and selected for a computation without intervention from the user, iii) program executions can be fault-tolerant on user demand, in a completely transparent fashion (no checkpoint server to config- ure). P2P-MPI is typically suited for organizations having spare individual comput- ers linked by a high speed network, possibly running heterogeneous operating sys- tems, and having Java applications. From a technical point of view, the framework has three layers: an infrastructure management layer at the bottom, a middleware layer containing the services, and the communication layer implementing an MPJ (Message Passing for Java) communication library at the top. We explain the de- sign and the implementation of these layers, and we discuss the allocation strategy based on network locality to the submitter. Allocation experiments of more than five hundreds peers are presented to validate the implementation. We also present how fault-management issues have been tackled. First, the monitoring of the infras- tructure itself is implemented through the use of failure detectors. -
Challenges in Programming Extreme Scale Systems William Gropp Wgropp.Cs.Illinois.Edu
1 Challenges in Programming Extreme Scale Systems William Gropp wgropp.cs.illinois.edu Towards Exascale Architectures Figure 1: Core Group for Node (Low Capacity, High Bandwidth) 3D Stacked (High Capacity, Memory Low Bandwidth) DRAM Thin Cores / Accelerators Fat Core NVRAM Fat Core Integrated NIC Core for Off-Chip Coherence Domain Communication Figure 2.1: Abstract Machine Model of an exascale Node Architecture 2.1 Overarching Abstract Machine Model We begin with asingle model that highlights the anticipated key hardware architectural features that may support exascale computing. Figure 2.1 pictorially presents this as a single model, while the next subsections Figure 2: Basic Layout of a Node describe several emergingFrom technology “Abstract themes that characterize moreMachine specific hardware design choices by com- Sunway TaihuLightmercial vendors. In Section 2.2, we describe the most plausible set of realizations of the singleAdapteva model that are Epiphany-V DOE Sierra viable candidates forModels future supercomputing and architectures. Proxy • 1024 RISC June• 19, Heterogeneous2016 2.1.1 Processor 2 • Power 9 with 4 NVIDA It is likely that futureArchitectures exascale machines will feature heterogeneous for nodes composed of a collectionprocessors of more processors (MPE,than a single type of processing element. The so-called fat cores that are found in many contemporary desktop Volta GPU and server processorsExascale characterized by deep pipelines, Computing multiple levels of the memory hierarchy, instruction-level parallelism -
Hardware Technology of the Earth Simulator 1
Special Issue on High Performance Computing 27 Architecture and Hardware for HPC 1 Hardware Technology of the Earth Simulator 1 By Jun INASAKA,* Rikikazu IKEDA,* Kazuhiko UMEZAWA,* 5 Ko YOSHIKAWA,* Shitaka YAMADA† and Shigemune KITAWAKI‡ 5 1-1 This paper describes the hardware technologies of the supercomputer system “The Earth Simula- 1-2 ABSTRACT tor,” which has the highest performance in the world and was developed by ESRDC (the Earth 1-3 & 2-1 Simulator Research and Development Center)/NEC. The Earth Simulator has adopted NEC’s leading edge 2-2 10 technologies such as most advanced device technology and process technology to develop a high-speed and high-10 2-3 & 3-1 integrated LSI resulting in a one-chip vector processor. By combining this LSI technology with various advanced hardware technologies including high-density packaging, high-efficiency cooling technology against high heat dissipation and high-density cabling technology for the internal node shared memory system, ESRDC/ NEC has been successful in implementing the supercomputer system with its Linpack benchmark performance 15 of 35.86TFLOPS, which is the world’s highest performance (http://www.top500.org/ : the 20th TOP500 list of the15 world’s fastest supercomputers). KEYWORDS The Earth Simulator, Supercomputer, CMOS, LSI, Memory, Packaging, Build-up Printed Wiring Board (PWB), Connector, Cooling, Cable, Power supply 20 20 1. INTRODUCTION (Main Memory Unit) package are all interconnected with the fine coaxial cables to minimize the distances The Earth Simulator has adopted NEC’s most ad- between them and maximize the system packaging vanced CMOS technologies to integrate vector and density corresponding to the high-performance sys- 25 25 parallel processing functions for realizing a super- tem. -
Presentación De Powerpoint
Towards Exaflop Supercomputers Prof. Mateo Valero Director of BSC, Barcelona National U. of Defense Technology (NUDT) Tianhe-1A • Hybrid architecture: • Main node: • Two Intel Xeon X5670 2.93 GHz 6-core Westmere, 12 MB cache • One Nvidia Tesla M2050 448-ALU (16 SIMD units) 1150 MHz Fermi GPU: • 32 GB memory per node • 2048 Galaxy "FT-1000" 1 GHz 8-core processors • Number of nodes and cores: • 7168 main nodes * (2 sockets * 6 CPU cores + 14 SIMD units) = 186368 cores (not including 16384 Galaxy cores) • Peak performance (DP): • 7168 nodes * (11.72 GFLOPS per core * 6 CPU cores * 2 sockets + 36.8 GFLOPS per SIMD unit * 14 SIMD units per GPU) = 4701.61 TFLOPS • Linpack performance: 2.507 PF 53% efficiency • Power consumption 4.04 MWatt Source http://blog.zorinaq.com/?e=36 Cartagena, Colombia, May 18-20 Top10 Rank Site Computer Procs Rmax Rpeak 1 Tianjin, China XeonX5670+NVIDIA 186368 2566000 4701000 2 Oak Ridge Nat. Lab. Cray XT5,6 cores 224162 1759000 2331000 3 Shenzhen, China XeonX5670+NVIDIA 120640 1271000 2984300 4 GSIC Center, Tokyo XeonX5670+NVIDIA 73278 1192000 2287630 5 DOE/SC/LBNL/NERSC Cray XE6 12 cores 153408 1054000 1288630 Commissariat a l'Energie Bull bullx super-node 6 138368 1050000 1254550 Atomique (CEA) S6010/S6030 QS22/LS21 Cluster, 7 DOE/NNSA/LANL PowerXCell 8i / Opteron 122400 1042000 1375780 Infiniband National Institute for 8 Computational Cray XT5-HE 6 cores 98928 831700 1028850 Sciences/University of Tennessee 9 Forschungszentrum Juelich (FZJ) Blue Gene/P Solution 294912 825500 825500 10 DOE/NNSA/LANL/SNL Cray XE6 8-core 107152 816600 1028660 Cartagena, Colombia, May 18-20 Looking at the Gordon Bell Prize • 1 GFlop/s; 1988; Cray Y-MP; 8 Processors • Static finite element analysis • 1 TFlop/s; 1998; Cray T3E; 1024 Processors • Modeling of metallic magnet atoms, using a variation of the locally self-consistent multiple scattering method. -
Financing the Future of Supercomputing: How to Increase
INNOVATION FINANCE ADVISORY STUDIES Financing the future of supercomputing How to increase investments in high performance computing in Europe years Financing the future of supercomputing How to increase investment in high performance computing in Europe Prepared for: DG Research and Innovation and DG Connect European Commission By: Innovation Finance Advisory European Investment Bank Advisory Services Authors: Björn-Sören Gigler, Alberto Casorati and Arnold Verbeek Supervisor: Shiva Dustdar Contact: [email protected] Consultancy support: Roland Berger and Fraunhofer SCAI © European Investment Bank, 2018. All rights reserved. All questions on rights and licensing should be addressed to [email protected] Financing the future of supercomputing Foreword “Disruptive technologies are key enablers for economic growth and competitiveness” The Digital Economy is developing rapidly worldwide. It is the single most important driver of innovation, competitiveness and growth. Digital innovations such as supercomputing are an essential driver of innovation and spur the adoption of digital innovations across multiple industries and small and medium-sized enterprises, fostering economic growth and competitiveness. Applying the power of supercomputing combined with Artificial Intelligence and the use of Big Data provide unprecedented opportunities for transforming businesses, public services and societies. High Performance Computers (HPC), also known as supercomputers, are making a difference in the everyday life of citizens by helping to address the critical societal challenges of our times, such as public health, climate change and natural disasters. For instance, the use of supercomputers can help researchers and entrepreneurs to solve complex issues, such as developing new treatments based on personalised medicine, or better predicting and managing the effects of natural disasters through the use of advanced computer simulations.