IBM BG/P Workshop
Lukas Arnold, Forschungszentrum Jülich, 14.-16.10.2009
Contact: [email protected]

aim of this workshop contribution
- give a brief introduction to the IBM BG/P (software and hardware)
- guide intensively through two aspects
- spend most of the time on hands-on exercises
- this is not a complete reference talk, as there are already many of those
- aimed at HPC beginners

contents
- part I - introduction to FZJ/BGP
  - systems at FZJ
  - IBM Blue Gene/P architecture overview
- part II - jugene usage
  - compiler, submission system
  - hands-on: "Hallo (MPI) World!"
- part III - PowerPC 450
  - ASIC, internal structure, compiler optimization
  - hands-on: "Matrix-Matrix-Multiplication, a.k.a. dgemm"
- part IV - 3D torus network
  - torus network strategy, linkage and usage, DMA engine
  - hands-on: "Simple Hyperbolic Solver" and "communication and computation overlap"

PART I: INTRODUCTION TO FZJ/BGP

Forschungszentrum Jülich (FZJ)
- one of the 15 Helmholtz research centers in Germany
- Europe's largest multi-disciplinary research center
- area 2.2 km², 4400 employees, 1300 scientists

Jülich Supercomputing Centre (JSC) @ FZJ
- operation of the supercomputers, user support, R&D work in the field of computer and computational science, education and training, 130 employees
- peer-reviewed provision of computer time to national and European computational science projects (NIC, John von Neumann Institute for Computing)

research fields of current projects

user support at JSC

simulation laboratories

systems @ JSC: jugene, just, hpc-ff, juropa
- total power consumption: 2.5 MW (jugene) + 0.3 MW (just) + 1.5 MW (hpc-ff + juropa) + 0.9 MW (cooling) ≈ 5 MW
- total performance: 1000 TF/s (jugene) + 300 TF/s (hpc-ff + juropa) ≈ 1300 TF/s = 1.3 PF/s
- total storage: 0.3 PB (Lustre FS) + 2.2 PB (GPFS @ 34 GB/s) + 2.5 PB (archive) ≈ 5 PB

hpc-ff + juropa
- 3288 compute nodes in total
- 2 Intel Xeon X5570 (Nehalem-EP) quad-core processors per node
- 2.93 GHz and Hyper-Threading
- 3 GB memory per physical core
- installed at JSC in April-June 2009
- 308 TFlop/s peak performance
- 274.8 TFlop/s LINPACK performance
- No. 10 in TOP500 of June 2009

jugene
- IBM Blue Gene/P system
- 72 racks (294,912 cores)
- installed at JSC in April/May 2009
- 1 PFlop/s peak performance
- 825.5 TFlop/s LINPACK performance
- No. 3 in TOP500 of June 2009
- No. 1 system in Europe

jugene setup in 60 seconds

jugene building blocks
- chip: 4 processors, 13.6 GF/s
- compute card: 1 chip, 13.6 GF/s, 2.0 GB DDR2 (4.0 GB optional)
- node card: 32 chips (4x4x2), 32 compute cards, 0-2 I/O cards, 435 GF/s, 64 GB
- rack: 32 node cards, cabled 8x8x16, 13.9 TF/s, 2 TB
- jugene system: 72 racks, cabled 72x32x32, 1 PF/s, 144 TB
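The per-level figures on the building-blocks slide follow directly from the core clock and the core count at each level. The short check below is an illustrative sketch, not part of the original slides; the 4 flops per cycle per core is an assumption based on the PowerPC 450 double FPU issuing one fused multiply-add per pipe per cycle.

```c
/* Back-of-the-envelope check of the jugene building blocks (sketch only).
 * Assumes 4 flops/cycle/core (two FPU pipes, each a fused multiply-add). */
#include <stdio.h>

int main(void)
{
    const double clock_ghz          = 0.85;  /* core clock frequency        */
    const int    flops_per_cycle    = 4;     /* assumed: 2 FPU pipes x FMA  */
    const int    cores_per_chip     = 4;
    const int    chips_per_nodecard = 32;
    const int    nodecards_per_rack = 32;
    const int    racks              = 72;

    double chip_gf = clock_ghz * flops_per_cycle * cores_per_chip;  /* 13.6 GF/s  */
    double card_gf = chip_gf * chips_per_nodecard;                  /* ~435 GF/s  */
    double rack_tf = card_gf * nodecards_per_rack / 1000.0;         /* ~13.9 TF/s */
    double sys_pf  = rack_tf * racks / 1000.0;                      /* ~1 PF/s    */
    long   cores   = (long)cores_per_chip * chips_per_nodecard
                     * nodecards_per_rack * racks;                  /* 294,912    */

    printf("chip %.1f GF/s, node card %.0f GF/s, rack %.1f TF/s, "
           "system %.2f PF/s, %ld cores\n",
           chip_gf, card_gf, rack_tf, sys_pf, cores);
    return 0;
}
```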
BG/P compute and node card
- Blue Gene/P compute ASIC: 4 cores, 8 MB cache, Cu heatsink
- SDRAM: 2 GB DDR2 memory
- node card connector: network, power

BG/P in numbers
- node properties
  - processors: 4 x PowerPC 450
  - processor frequency: 0.85 GHz
  - coherency: SMP
  - L3 cache size (shared): 8 MB
  - main store: 2 GB
  - main store bandwidth (1:2 pclk): 13.6 GB/s
  - peak performance: 13.9 GF/node
- torus network
  - bandwidth: 6 x 2 x 425 MB/s = 5.1 GB/s
  - hardware latency (nearest neighbour): 100 ns (32 B packet), 800 ns (256 B packet)
  - hardware latency (worst case): 3.2 µs (64 hops)
- tree network
  - bandwidth: 2 x 0.85 GB/s = 1.7 GB/s
  - hardware latency (worst case): 3.5 µs
- system properties (72k nodes)
  - area: 160 m²
  - peak performance: ~1 PF
  - total power: ~2.3 MW

system access
- Blue Gene/P: 73,728 compute nodes, 600 I/O nodes
- service node: control system, DB2 database
- front-end nodes: user login via SSH, job launch with mpirun
- fileserver JUST (RAID)

system access (cont.)
- compute nodes are dedicated to running the user application, and almost nothing else - simple compute node kernel (CNK)
- I/O nodes run Linux and provide a more complete range of OS services - files, sockets, process launch, signalling, debugging, and termination
- the service node performs system management services (e.g. partitioning, heartbeating, error monitoring) - transparent to application software

BG/P compute node software
- Compute Node Kernel (CNK)
  - minimal kernel
  - handles signals, function-shipping of system calls to the I/O nodes, starting/stopping jobs, threads
  - not much else
- very "Linux-like", uses glibc
  - missing some system calls (mostly fork())
  - limited support for mmap(), execve()
- but most applications that run on Linux work out of the box on BG/P

BG/P I/O node software
- I/O Node Kernel, Mini-Control Program (MCP)
- Linux
  - port of the Linux kernel, GPL/LGPL licensed
  - Linux version 2.6.16
  - very minimal distribution
- only connection from the compute nodes to the outside world
- handles syscalls (e.g. fopen()) and I/O requests
- file system support: NFS, PVFS, GPFS, Lustre FS

BG/P networks
- 3D torus network
  - only for point-to-point communication between compute nodes
  - hardware latency: 0.5-5 µs; MPI latency: 3-10 µs
  - bandwidth: 6 x 2 x 425 MB/s = 5.1 GB/s (per compute node)
  - direct memory access (DMA) unit enables communication and computation overlap (see the sketch below)
- collective network
  - one-to-all and reduction functionality (compute and I/O nodes)
  - one-way tree traversal latency: 1.3 µs; MPI: 5 µs
  - bandwidth: 850 MB/s per link

BG/P networks (cont.)
- barrier network
  - hardware latency for the full system: 0.65 µs; MPI: 1.6 µs
- 10 Gb network
  - I/O nodes only
  - file I/O, all external communication
- 1 Gb network
  - control network (boot, debug, monitor)
  - compute and I/O nodes
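To make "communication and computation overlap" concrete, here is a minimal sketch, not taken from the slides, of the non-blocking MPI pattern that the torus DMA engine is meant to progress in the background (the theme of the part-IV hands-on). The 1D ring neighbours, the buffer sizes and the dummy interior update are illustrative assumptions.

```c
/* Sketch: overlap a halo exchange with interior computation using
 * non-blocking MPI; the DMA unit can progress the transfer while we compute. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 1D ring as a stand-in for real torus neighbours */
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    enum { N = 1024, M = 1 << 20 };              /* illustrative sizes */
    double *halo_send = malloc(N * sizeof(double));
    double *halo_recv = malloc(N * sizeof(double));
    double *interior  = malloc(M * sizeof(double));
    for (i = 0; i < N; i++) halo_send[i] = (double)rank;
    for (i = 0; i < M; i++) interior[i]  = 1.0;

    MPI_Request req[2];
    /* post the halo exchange first ... */
    MPI_Irecv(halo_recv, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(halo_send, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    /* ... then compute on interior points that do not depend on the halo */
    for (i = 0; i < M; i++) interior[i] *= 0.5;  /* placeholder for the real stencil */

    /* complete the exchange before touching halo-dependent points */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(halo_send); free(halo_recv); free(interior);
    MPI_Finalize();
    return 0;
}
```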
BG/P architectural features
- low area footprint (4k cores per rack)
- high energy efficiency (2.5 kW per TF/s)
- no network hierarchy, scalable up to the full system
- easy programming based on MPI
- high reliability
- balanced system

comparison to other architectures (approximate)
- core LINPACK performance
  - BG/P: 3 GF/s
  - XT5/PWR6/x86: 7 / 12.5 / 12 GF/s
- triad memory bandwidth per core [related to GF/s]
  - BG/P: 4.4 GB/s [1.5 byte/flop]
  - XT5/PWR6/x86: 2.5 / 3.3 / (8) GB/s [0.3 / 0.25 / 0.7]
- all-to-all performance, two nodes [related to GF/s]
  - BG/P: 1 GB/s [0.08 byte/flop]
  - XT5/PWR6/x86: 3 / 3 / 2 GB/s [0.05 / 0.004 / 0.01]
- energy efficiency
  - BG/P: 300 MF/J
  - XT5/PWR6/x86: 150 / 85 / 200 MF/J

BG/P cons
- only 512 MB memory per core
- low core performance: 5 to 10 times more cores needed compared to today's general-purpose CPUs
- the torus network might not perform well for unstructured communication patterns
- cross-compilation required
- the CNK (compute node kernel) is not a full Linux system

application scaling example
(plot: PEPC performance, time in inner loop [s] versus number of cores (512-4096), on IBM BG/P (jugene), Intel Nehalem (juropa), Cray XT5 (louhi) and IBM Power6 (huygens))

application scaling example (cont.)
(plot: PEPC performance, time in inner loop [s] versus partition performance [TF/s] (1-100), same four systems)

practical information
- contact me (now or tomorrow) for a private key
- accounts will be valid until 18.10.2009
- common passphrase: (WS-kra09)
- make sure you are able to log in on jugene: ssh -i ssh_key [email protected] (see the login slide)
- have a brief look at our documentation and user info: http://www.fz-juelich.de/jsc/jugene/
- you will be able to submit jobs on 16./17.10.2009

PART II: JUGENE USAGE

login
- use the individually distributed private key
- login via: ssh -i ssh_key [email protected]
- automatically distributed to two different login nodes: jugene3 and jugene4
- see: http://www.fz-juelich.de/jsc/jugene/usage/logon/

available compilers
- cross-compilation is needed
- compilers for the front end (Power6) only:
  - GNU: gcc, gfortran, ...
  - IBM XL: xlc, xlf90, ...
- and for jugene (PowerPC 450), with MPI wrapper:
  - GNU: mpicc, mpif90, ...
  - IBM XL: mpixlc, mpixlf90, ...
- thread-safe versions available (*_r)
- FZJ info: http://www.fz-juelich.de/jsc/jugene/usage/tuning/
- IBM XL documentation: http://publib.boulder.ibm.com/infocenter/compbgpl/v9v111/index.jsp
- BG/P redbook: http://www.fz-juelich.de/jsc/datapool/jugene/bgp_appl_sg247287_V1.4.pdf

XL compiler options (optimization)
- -O2
  - default optimization level
  - eliminates redundant code
  - basic loop optimization
  - can structure code to take advantage of -qarch and -qtune settings
- -O3
  - in-depth memory access analysis
  - better loop scheduling
  - high-order loop analysis and transformations
  - inlining of small procedures within a compilation unit by default
  - pointer aliasing improvements to enhance other optimizations
  - ...
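As a concrete starting point for the "Hallo (MPI) World!" hands-on, a minimal C version is sketched below. The compile line in the comment only illustrates using the XL MPI wrapper together with the -O3/-qarch/-qtune options discussed above; the exact flag values are an assumption and may differ on jugene.

```c
/* "Hallo (MPI) World!" -- minimal sketch for the part-II hands-on.
 * Cross-compile on the front end with the XL MPI wrapper, for example
 * (flag values illustrative):
 *
 *   mpixlc -O3 -qarch=450d -qtune=450 hello.c -o hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's rank        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of tasks   */
    MPI_Get_processor_name(name, &len);     /* which node we run on    */

    printf("Hallo (MPI) World! rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```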