Intel Paragon XP/S Overview: Distributed-Memory MIMD
Recommended publications
-
TMA4280—Introduction to Supercomputing
TMA4280—Introduction to Supercomputing. NTNU, IMF, January 12, 2018.

Outline: context (challenges in Computational Science and Engineering); examples (simulation of turbulent flows and other applications); goal and means (parallel performance improvement); overview of supercomputing systems; conclusion.

Computational Science and Engineering (CSE). What is the motivation for supercomputing? To solve complex problems fast and accurately: efforts in modelling and simulation push science and engineering applications forward, and computational requirements drive the development of new hardware and software. CSE is the development of computational methods for scientific research and innovation in engineering and technology. It covers the entire spectrum of natural sciences, mathematics, and informatics: scientific model (physics, biology, medicine, ...), mathematical model, numerical model, implementation, visualization/post-processing, validation, with feedback forming a virtuous circle. It allows larger and more realistic problems to be simulated, and new theories to be tested numerically.

Outcome in industrial applications. Figure: 2004: "The Falcon 7X becomes the first aircraft in industry history to be entirely developed in a virtual environment, from design to manufacturing to maintenance." (Dassault Systèmes)

Evolution of computational power. Figure: Moore's Law: exponential increase in the number of transistors per chip, at a 1-year doubling rate (1965), revised to a 2-year rate (1975). (WikiMedia, CC-BY-SA-3.0)
-
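The course excerpt above names "parallel performance improvement" as its goal and means. Amdahl's law is not shown in the excerpt itself, but it is the standard first-order model for that goal; the sketch below (with an arbitrarily chosen parallel fraction of 0.95) is added purely as illustration:

```python
# Amdahl's law: speedup on p processors when only a fraction f of the
# work can be parallelized. Illustrative only; not from the excerpt.

def amdahl_speedup(f: float, p: int) -> float:
    """Predicted speedup with parallel fraction f on p processors."""
    return 1.0 / ((1.0 - f) + f / p)

if __name__ == "__main__":
    for p in (1, 10, 100, 1000):
        print(f"f=0.95, p={p:4d}: speedup = {amdahl_speedup(0.95, p):6.2f}")
    # Even with 95% parallel work, speedup saturates near 1/(1-f) = 20.
```

The saturation effect is why the course pairs performance improvement with algorithmic work: raising f matters more than adding processors once p is large.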
The TOP500 List and Progress in High- Performance Computing
Cover feature: Grand Challenges in Scientific Computing.

The TOP500 List and Progress in High-Performance Computing. Erich Strohmaier, Lawrence Berkeley National Laboratory; Hans W. Meuer, University of Mannheim; Jack Dongarra, University of Tennessee; Horst D. Simon, Lawrence Berkeley National Laboratory.

For more than two decades, the TOP500 list has enjoyed incredible success as a metric for supercomputing performance and as a source of data for identifying technological trends. The project's editors reflect on its usefulness and limitations for guiding large-scale scientific computing into the exascale era.

The TOP500 list (www.top500.org) has served as the defining yardstick for supercomputing performance since 1993. Published twice a year, it compiles the world's largest installations and some of their main characteristics. Systems are ranked according to their performance on the Linpack benchmark, which solves a dense system of linear equations. Over time, the data collected for the list has enabled the early identification and quantification of many important technological and architectural trends related to high-performance computing (HPC). Here, we briefly describe the project's origins, the principles guiding data collection, and what has made the list so successful during the two-decades-long transition into the exascale era.

TOP500 origins. In the mid-1980s, coauthor Hans Meuer started a small and focused annual conference that has since evolved into the prestigious International Supercomputing Conference (www.isc-hpc.com). During the conference's opening session, Meuer presented statistics about the numbers, locations, and manufacturers of supercomputers worldwide, collected from vendors and colleagues in academia and industry. Initially, it was obvious that the supercomputer label should be reserved for vector processing systems from companies such as Cray, CDC, Fujitsu, NEC, and Hitachi, each of which claimed to have the fastest system for scientific computation by some selective measure.
-
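The excerpt above says systems are ranked by their Linpack performance, i.e. how fast they solve a dense linear system Ax = b. A minimal single-node sketch of that measurement, using NumPy's LU-based dense solver as a stand-in for the real HPL code and the conventional Linpack operation count of 2/3 n³ + 2 n² flops (the count is standard, but not stated in this excerpt):

```python
# Linpack-style measurement: time a dense solve of Ax = b and divide
# the conventional operation count by the elapsed time. A sketch only,
# not the actual HPL benchmark.
import time
import numpy as np

def linpack_gflops(n: int, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(a, b)        # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # conventional Linpack count
    # Residual check, in the spirit of the benchmark's verification step
    assert np.linalg.norm(a @ x - b) / np.linalg.norm(b) < 1e-8
    return flops / elapsed / 1e9

if __name__ == "__main__":
    print(f"n=1000: {linpack_gflops(1000):.1f} Gflop/s on this machine")
```

The real benchmark scales n with memory size and runs the factorization in parallel across the whole machine; only the reported rate and the residual check carry over to this sketch.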
Thor's Hammer/Red Storm
Bill Camp & Jim Tomkins. The Design Specification and Initial Implementation of the Red Storm Architecture, in partnership with Cray, Inc. William J. Camp & James L. Tomkins, CCIM, Sandia National Laboratories, Albuquerque, NM. [email protected]

Our rubric: mission-critical engineering & science applications; large systems with a few processors per node; message passing paradigm; balanced architecture; use commodity wherever possible; efficient systems software; emphasis on scalability & reliability in all aspects; critical advances in parallel algorithms; vertical integration of technologies.

Computing domains at Sandia (from 1 to roughly 10^4 processors, spanning volume, mid-range, and peak domains): Red Storm targets the peak domain, Cplant and Beowulf clusters the mid-range, and desktops the volume domain. Red Storm is targeting the highest-end market but has real advantages for the mid-range market (from one cabinet on up).

Red Storm architecture:
• True MPP, designed to be a single system; a distributed memory MIMD parallel supercomputer.
• Fully connected 3D mesh interconnect; each compute node processor has a bi-directional connection to the primary communication network.
• 108 compute node cabinets and 10,368 compute node processors (AMD Sledgehammer @ 2.0 GHz).
• ~10 TB of DDR memory @ 333 MHz.
• Red/Black switching: ~1/4, ~1/2, ~1/4.
• 8 service and I/O cabinets on each end (256 processors for each color); on-system viz nodes may be added to the SIO partition.
• 240 TB of disk storage (120 TB per color).
• Functional hardware partitioning: service and I/O nodes, compute nodes, and RAS nodes.
• Partitioned Operating System (OS):
-
ASCI Red vs. Red Storm
7X Performance Results – Final Report: ASCI Red vs. Red Storm. Joel O. Stevenson, Robert A. Ballance, Karen Haskell, and John P. Noe, Sandia National Laboratories; Dennis C. Dinge, Thomas A. Gardiner, and Michael E. Davis, Cray Inc.

ABSTRACT: The goal of the 7X performance testing was to assure Sandia National Laboratories, Cray Inc., and the Department of Energy that Red Storm would achieve its performance requirements, which were defined as a comparison between ASCI Red and Red Storm. Our approach was to identify one or more problems for each application in the 7X suite, run those problems at two or three processor sizes in the capability computing range, and compare the results between ASCI Red and Red Storm. The first part of this paper describes the two computer systems, the 10 applications in the 7X suite, the 25 test problems, and the results of the performance tests on ASCI Red and Red Storm. During the course of the testing on Red Storm, we had the opportunity to run the test problems in both single-core mode and dual-core mode, and the second part of this paper describes those results. Finally, we reflect on lessons learned in undertaking a major head-to-head benchmark comparison.

KEYWORDS: 7X, ASCI Red, Red Storm, capability computing, benchmark

This work was supported in part by the U.S. Department of Energy. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States National Nuclear Security Administration and the Department of Energy under contract DE-AC04-94AL85000.
-
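The comparison described above reduces, per test problem, to a ratio of ASCI Red runtime to Red Storm runtime. A minimal sketch of that bookkeeping; the timings are invented placeholders, and the assumption that the target ratio is 7.0 follows from the suite's name rather than from numbers in this excerpt:

```python
# Per-problem speedup check for a Red-vs-Storm style comparison.
# Timings below are made-up placeholders, not results from the paper;
# the 7.0 target is assumed from the "7X" name.

def speedup(asci_red_seconds: float, red_storm_seconds: float) -> float:
    return asci_red_seconds / red_storm_seconds

results = {
    "app1/problemA": (4200.0, 520.0),   # (ASCI Red s, Red Storm s)
    "app2/problemB": (980.0, 150.0),
}

for name, (t_red, t_storm) in results.items():
    s = speedup(t_red, t_storm)
    status = "meets 7X" if s >= 7.0 else "below 7X"
    print(f"{name}: {s:.1f}x ({status})")
```

Running each problem at two or three processor sizes, as the paper does, would simply add a processor-count key to each entry.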
DARPA's HPCS Program: History, Models, Tools, Languages
DARPA's HPCS Program: History, Models, Tools, Languages

Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory; Robert Graybill, USC Information Sciences Institute; William Harrod, DARPA; Robert Lucas, USC Information Sciences Institute; Ewing Lusk, Argonne National Laboratory; Piotr Luszczek, University of Tennessee; Janice McMahon, USC Information Sciences Institute; Allan Snavely, University of California – San Diego; Jeffery Vetter, Oak Ridge National Laboratory; Katherine Yelick, Lawrence Berkeley National Laboratory; Sadaf Alam, Oak Ridge National Laboratory; Roy Campbell, Army Research Laboratory; Laura Carrington, University of California – San Diego; Tzu-Yi Chen, Pomona College; Omid Khalili, University of California – San Diego; Jeremy Meredith, Oak Ridge National Laboratory; Mustafa Tikir, University of California – San Diego

Abstract: The historical context surrounding the birth of the DARPA High Productivity Computing Systems (HPCS) program is important for understanding why federal government agencies launched this new, long-term high performance computing program and renewed their commitment to leadership computing in support of national security, large science, and space requirements at the start of the 21st century. In this chapter we provide an overview of the context for this work as well as the various procedures being undertaken for evaluating the effectiveness of this activity, including such topics as modeling the proposed performance of the new machines, evaluating the proposed architectures, and understanding the languages used to program these machines as well as programmer productivity issues, in order to better prepare for the introduction of these machines in the 2011-2015 timeframe.
-
Catamount Vs. Cray Linux Environment
To Upgrade or not to Upgrade? Catamount vs. Cray Linux Environment. S.D. Hammond, G.R. Mudalige, J.A. Smith, J.A. Davis, S.A. Jarvis, High Performance Systems Group, University of Warwick, Coventry, CV4 7AL, UK; J. Holt, Tessella PLC, Abingdon Science Park, Berkshire, OX14 3YS, UK; I. Miller, J.A. Herdman, A. Vadgama, Atomic Weapons Establishment, Aldermaston, Reading, RG7 4PR, UK

Abstract. Assessing the performance of individual hardware and software-stack configurations for supercomputers is a difficult and complex task, but with potentially high levels of reward. While the ultimate benefit of such system (re-)configuration and tuning studies is to improve application performance, potential improvements may also include more effective scheduling, kernel parameter refinement, pointers to application redesign and an assessment of system component upgrades. With the growing complexity of modern systems, the effort required to discover the elusive combination of hardware and software-stack settings that improve performance across a range of applications is, in itself, becoming an HPC grand challenge.

1 Introduction. Modern supercomputers are growing in diversity and complexity: the arrival of technologies such as multi-core processors, general-purpose GPUs and specialised compute accelerators has increased the potential scientific delivery possible from such machines. This is not however without some cost, including significant increases in the sophistication and complexity of supporting operating systems and software libraries. This paper documents the development and application of methods to assess the potential performance of selecting one hardware, operating system (OS) and software stack combination against another. This is of particular interest to supercomputing centres, which routinely ...
-
Early Evaluation of the Cray XT3 at ORNL
Early Evaluation of the Cray XT3 at ORNL. J. S. Vetter, S. R. Alam, T. H. Dunigan, Jr., M. R. Fahey, P. C. Roth, P. H. Worley, Oak Ridge National Laboratory, Oak Ridge, TN, USA 37831

ABSTRACT: Oak Ridge National Laboratory recently received delivery of a Cray XT3. The XT3 is Cray's third-generation massively parallel processing system. The system builds on a single processor node, the AMD Opteron, and uses a custom chip called SeaStar to provide interprocessor communication. In addition, the system uses a lightweight operating system on the compute nodes. This paper describes our initial experiences with the system, including micro-benchmark, kernel, and application benchmark results. In particular, we provide performance results for important Department of Energy application areas including climate and fusion. We demonstrate experiments on the partially installed system, scaling applications up to 3,600 processors.

KEYWORDS: performance evaluation; Cray XT3; Red Storm; Catamount; performance analysis; benchmarking.

1 Introduction. Computational requirements for many large-scale simulations and ensemble studies of vital interest to the Department of Energy (DOE) exceed what is currently offered by any U.S. computer vendor. As illustrated in the DOE SCaLeS report [30] and the High End Computing Revitalization Task Force report [17], examples are numerous, ranging from global climate change research to combustion to biology. ... computational resources utilized by a particular application. ORNL has been evaluating these critical factors on several platforms that include the Cray X1 [1], the SGI Altix 3700 [13], and the Cray XD1 [14]. This report describes the initial evaluation results collected on an early version of the Cray XT3 sited at ORNL. Recent results are also publicly available from the ORNL ...
-
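Scaling studies like the 3,600-processor runs mentioned above are usually summarized as speedup and parallel efficiency relative to a base processor count. A small sketch of that reduction; the timings are invented placeholders, not measurements from the paper:

```python
# Strong-scaling summary: speedup and parallel efficiency relative to
# the smallest run. Timings are made-up placeholders.

def efficiency(base_procs: int, base_time: float,
               procs: int, time: float) -> float:
    """Measured speedup divided by ideal speedup (procs / base_procs)."""
    speedup = base_time / time
    ideal = procs / base_procs
    return speedup / ideal

timings = [(128, 1000.0), (512, 270.0), (3600, 52.0)]  # (procs, seconds)
base_p, base_t = timings[0]
for p, t in timings:
    print(f"{p:5d} procs: speedup {base_t / t:6.2f}, "
          f"efficiency {efficiency(base_p, base_t, p, t):.2f}")
```

Efficiency dropping with processor count is the usual signature of communication overhead, which is exactly what the paper's micro-benchmarks of the SeaStar interconnect are meant to explain.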
My Group in Tennessee
High Performance Computing Technologies. Jack Dongarra, University of Tennessee, Oak Ridge National Laboratory. http://www.netlib.org/utk/people/JackDongarra/

My Group in Tennessee:
• Numerical Linear Algebra: basic algorithms for HPC (EISPACK, LINPACK, BLAS, LAPACK, ScaLAPACK)
• Heterogeneous Network Computing: PVM, MPI
• Software Repositories: Netlib, High-Performance Software Exchange
• Performance Evaluation: Linpack Benchmark, Top500, ParkBench

Computational Science. HPC offered a new way to do science: experiment, theory, computation. Computation is used to approximate physical systems. Advantages include: playing with simulation parameters to study emergent trends; possible replay of a particular simulation event; study of systems where no exact theories exist.

Why turn to simulation? Problems too large to experiment on: climate/weather modelling; data-intensive problems (data-mining, oil reservoir simulation); problems with large length and time scales (cosmology).

Automotive Industry. Huge users of HPC technology: Ford US is the 25th largest user of HPC in the world. Main uses of simulation: aerodynamics (similar to the aerospace industry), crash simulation, metal sheet forming, noise/vibrational optimization, traffic simulation. Main gains: ...

Why parallel computers? Desire to solve bigger, more realistic application problems. Fundamental limits are being approached. A more cost-effective solution. Example: weather prediction, Navier-Stokes with a 3D grid around the Earth and 6 variables per grid point (temperature, pressure, ...).
-
Survey of “High Performance Machines”
Survey of "High Performance Machines". Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

Overview:
♦ Processors
♦ Interconnect
♦ Look at the 3 Japanese HPCs
♦ Examine the Top131

History of high performance computers. Figure: aggregate systems performance, single-CPU performance, and CPU frequencies, 1980-2010. Aggregate performance rises from roughly 100 Mflop/s toward 1 Pflop/s through systems such as the X-MP, CRAY-2, Y-MP8, C90, CM-5, T3D, Paragon, T3E, ASCI Red, ASCI Blue, ASCI White, ASCI Q, and the Earth Simulator, while CPU frequencies climb only from about 10 MHz toward 10 GHz: increasing parallelism covers the gap.

A vibrant field for high performance computers: Cray X1, SGI Altix, IBM Regatta, Sun, HP, Bull, Fujitsu PRIMEPOWER, Hitachi SR11000, NEC SX-7, Apple. Coming soon: Cray RedStorm, Cray BlackWidow, NEC SX-8, IBM Blue Gene/L.

Architecture/systems continuum (loosely coupled):
♦ Commodity processor with commodity interconnect: clusters (Pentium, Itanium, Opteron, Alpha, PowerPC; GigE, Infiniband, Myrinet, Quadrics, SCI); NEC TX7; HP Alpha; Bull NovaScale 5160.
♦ Commodity processor with custom interconnect: SGI Altix (Intel Itanium 2); Cray Red Storm (AMD Opteron); IBM Blue Gene/L (?) (IBM PowerPC).
♦ Custom processor with custom
-
Parallel Machines
Lecture 6: Parallel Machines

A parallel computer is a connected configuration of processors and memories. The choice space available to a computer architect includes the network topology, the node processor, the address-space organization, and the memory structure. These choices are based on the parallel computation model, the current technology, and marketing decisions. No matter what the pace of change, it is impossible to make intelligent decisions about parallel computers right now without some knowledge of their architecture. For more advanced treatment of computer architecture we recommend Kai Hwang's Advanced Computer Architecture and Parallel Computer Architecture by Gupta, Singh, and Culler.

One may gauge which architectures are important today by the Top500 Supercomputer list published by Meuer, Strohmaier, Dongarra and Simon. The secret is to learn to read between the lines. There are three kinds of machines on the November 2003 Top 500 list:
• Distributed Memory Multicomputers (MPPs)
• Constellations of Symmetric Multiprocessors (SMPs)
• Clusters (NOWs and Beowulf clusters)
Vector supercomputers, Single Instruction Multiple Data (SIMD) machines and SMPs are no longer present on the list but used to be important in previous versions. How can one simplify (and maybe grossly oversimplify) the current situation? Perhaps by pointing out that the world's fastest machines are mostly clusters. Perhaps it will be helpful to the reader to list some of the most important machines, first sorted by type, and then by highest rank in the top 500.
-
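The lecture above distinguishes MPPs, constellations of SMPs, and clusters. One rule of thumb sometimes used in Top500 analyses to separate constellations from clusters is that a constellation has more processors per node than it has nodes. The helper below is an illustration of that rule of thumb, not the list's official classifier:

```python
# Rough machine-type heuristic: a "constellation" is dominated by fat
# SMP nodes (more processors per node than nodes); a "cluster" has many
# thin multi-processor nodes. Illustrative rule of thumb only.

def classify(nodes: int, procs_per_node: int) -> str:
    if procs_per_node > nodes:
        return "constellation"
    return "cluster" if procs_per_node > 1 else "MPP-style"

print(classify(nodes=16, procs_per_node=32))   # few fat SMP nodes
print(classify(nodes=1024, procs_per_node=2))  # many thin nodes
```

Reading the list "between the lines", as the lecture puts it, is partly about applying distinctions like this rather than taking the type labels at face value.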
It's a Sign, Not a Race
Meeting of the Advanced Scientific Computing Advisory Committee, October 17 and 18, 2002, Hilton Washington Embassy Row Hotel, 2015 Massachusetts Avenue, NW, Washington, DC.

It's a Sign, Not a Race. Jack Dongarra, University of Tennessee.

A tour de force for a supercomputer:
• Japanese system: the JAERI/JAMSTEC/NASDA/RIKEN Earth Simulator (2002).
• Target application: CFD for weather, climate, and earthquakes.
• 640 NEC SX-6 nodes (modified), 5120 CPUs with vector operations.
• 40 Tflop/s (peak).
• 7 MWatts (ASCI White: 1.2 MW; ASCI Q: 6 MW). Say 10 cents/kWh: $16.8K/day = $6M/year!
• $250-500M for things in the building.
• Footprint of 4 tennis courts.
• Expected to stay on top of the Top500 until a 60-100 Tflop ASCI machine arrives. (Earth Simulator picture from the JAERI web page.)
• Homogeneous, centralized, proprietary, expensive!

NASDA: National Space Development Agency of Japan. JAMSTEC: Japan Marine Science and Technology Center. JAERI: Japan Atomic Energy Research Institute. RIKEN: The Institute of Physical and Chemical Research.

R&D results: not revolutionary. Comparison of vector processors (Figure: processor packages of roughly 115 × 110 mm, 225 × 225 mm, and 457 × 386 mm):
• SX-4 (1995): 2 Gflop/s, 8 vector pipes, 125 MHz clock, 0.35 µm CMOS LSI, 37 × 4 = 148 LSIs.
• SX-5 (1998): 8 Gflop/s, 16 vector pipes, 250 MHz clock, 0.25 µm CMOS LSI.
• Earth Simulator (2002): 8 Gflop/s, 8 vector pipes, 500 MHz clock, 0.15 µm CMOS LSI, 32 LSIs
-
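The slide's back-of-envelope electricity estimate (7 MW at 10 cents per kWh giving $16.8K/day and about $6M/year) can be checked directly:

```python
# Verify the Earth Simulator electricity estimate from the slide:
# 7 MW drawn continuously, priced at $0.10 per kWh.

POWER_MW = 7.0
PRICE_PER_KWH = 0.10

kwh_per_day = POWER_MW * 1000 * 24          # 168,000 kWh per day
cost_per_day = kwh_per_day * PRICE_PER_KWH  # $16,800 per day
cost_per_year = cost_per_day * 365          # about $6.1M per year

print(f"${cost_per_day:,.0f}/day, ${cost_per_year / 1e6:.1f}M/year")
```

The arithmetic confirms the slide's figures: $16,800 per day, roughly $6.1M per year, before cooling overheads or price variation.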
History of Supercomputing
History of Supercomputing. Parallel Computing (MC-8836). Esteban Meneses, PhD, School of Computing, Costa Rica Institute of Technology. [email protected]. I semester, 2021. (Cover image: Marenostrum Supercomputer, Barcelona Supercomputing Center, BSC.)

What is a supercomputer? The definition is elusive:
• "a computer at the frontline of contemporary processing capacity - particularly speed of calculation" (Wikipedia)
• "a large and very fast computer" (Merriam-Webster)
• "a machine that hierarchically integrates many components to accelerate the execution of particular programs" (Esteban Meneses)
• "a device for turning compute-bound problems into I/O problems" (Ken Batcher)

Supercomputer features:
• They are expensive.
• They represent a major enhancement in computational power.
• They have a short life (approximately 5 years).
• Their performance is measured in FLOPS.

First digital computers: are they really super-computers?
• Colossus (1944): first electronic digital programmable computer. Bletchley Park, United Kingdom.
• ENIAC (1946): first electronic general-purpose computer. University of Pennsylvania, United States.
• MARK 1 (1948): first stored-program computer. University of Manchester, United Kingdom.

CDC 6600 (1964): generally regarded as the first supercomputer. Designed by Seymour Cray (the father of supercomputing) and built at Control Data Corporation. A single-CPU, RISC-like system whose memory was faster than the CPU; 3 megaFLOPS; roughly $60 million in today's dollars; used Minnesota FORTRAN. (Parallel Computing, Meneses: History)