My Group in Tennessee


High Performance Computing Technologies
Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory
http://www.netlib.org/utk/people/JackDongarra/

My Group in Tennessee
- Numerical Linear Algebra: basic algorithms for HPC (EISPACK, LINPACK, BLAS, LAPACK, ScaLAPACK)
- Heterogeneous Network Computing: PVM, MPI
- Software Repositories: Netlib, the High-Performance Software Exchange
- Performance Evaluation: Linpack Benchmark, Top500, ParkBench

Computational Science
HPC offered a new way to do science alongside experiment and theory: computation. Computation is used to approximate physical systems. Advantages include:
- playing with simulation parameters to study emergent trends
- possible replay of a particular simulation event
- studying systems where no exact theories exist

Why Turn to Simulation? ... Too Large
- climate/weather modelling
- data-intensive problems: data mining, oil reservoir simulation
- problems with large length and time scales: cosmology

Automotive Industry
Huge users of HPC technology: Ford US is the 25th largest user of HPC in the world.
Main uses of simulation:
- aerodynamics (similar to the aerospace industry)
- crash simulation
- metal sheet forming
- noise/vibrational optimization
- traffic simulation
Main gains:
- reduced time to market for new cars
- increased quality
- reduced need to build expensive prototypes
- more efficient, integrated manufacturing processes

Why Parallel Computers?
- Desire to solve bigger, more realistic application problems.
- Fundamental limits are being approached.
- A more cost-effective solution.
Example: weather prediction, solving Navier-Stokes on a 3D grid around the Earth:
- 6 variables per cell: temperature, pressure, humidity, and 3 wind velocities
- 1-kilometer cells, 10 vertical slices -> 5 x 10^9 cells
- 8 bytes per value, about 2 x 10^11 bytes = 200 GBytes in total
- 100 operations per cell at each time step, with a 1-minute time step:
  100 ops/cell x 5 x 10^9 cells / (1 min x 60 sec/min) = 8 GFlop/s

GC Computing Requirements / Grand Challenge Science
Some definitions. US Office of Science and Technology Policy: a Grand Challenge is a fundamental problem in science or engineering, with potentially broad economic, political and/or scientific impact, that could be advanced by applying High Performance Computing resources.
The Grand Challenges of High Performance Computing are those projects which are almost too difficult to investigate using current supercomputers!

GC Summary
- Computational science is a relatively new method of investigating the world.
- The current generation of high performance computers is making an impact in many areas of science.
- New Grand Challenges keep appearing, e.g. global modeling and computational geography.
- Users still want more power!
- ... and all of this applies to HPC in business. Maybe the problems in computational science are not so different from those in business?

High-Performance Computing Today
In the past decade, the world has experienced one of the most exciting periods in computer development. Computer performance improvements have been dramatic, a trend that promises to continue for the next several years. One reason for the improved performance is the rapid advance in microprocessor technology: microprocessors have become smaller, denser, and more powerful. If cars had made equal progress, you could buy a car for a few dollars, drive it across the country in a few minutes, and "park" the car in your pocket!
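To make the weather-prediction estimate above concrete, here is a minimal sketch of the same back-of-the-envelope arithmetic in Python. All inputs are the slide's own round numbers, not measured values; the slide itself rounds the memory figure to 2 x 10^11 bytes (200 GBytes).

```python
# Back-of-the-envelope resource estimate for the slide's global weather model.
# Every input below is the slide's assumption, not a measured value.

cells = 5e9            # 1 km cells, 10 vertical slices around the Earth
variables = 6          # temperature, pressure, humidity, 3 wind velocities
bytes_per_value = 8    # 8-byte (double precision) values
ops_per_cell = 100     # operations per cell per time step
time_step_s = 60       # one-minute time step, computed in real time

memory_bytes = cells * variables * bytes_per_value
flop_rate = cells * ops_per_cell / time_step_s

print(f"memory required: {memory_bytes / 1e9:.0f} GBytes")  # ~240, slide rounds to 200
print(f"sustained rate : {flop_rate / 1e9:.1f} GFlop/s")    # ~8.3, slide quotes 8
```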
The result is that microprocessor-based supercomputing is rapidly becoming the technology of preference in attacking some of the most important problems of science and engineering.

[Figure: Growth in Microprocessor Performance in the 1990s. Performance in Mflop/s, 1980-1994, for microprocessors (8087, 80287, 80387, 6881, R2000, i860, RS 6000/540, RS 6000/590, R8000, Alpha) closing the gap on Cray vector systems (Cray 1S, Cray X-MP, Cray 2, Cray Y-MP, Cray C-90, Cray T-90).]

[Figure: TOP500 - CPU Technology. Number of systems built from off-the-shelf CMOS, proprietary CMOS, and ECL processors, from the 6/93 through 11/95 lists. Source: Universität Mannheim.]

Scalable Multiprocessors
What is required?
- Must scale the local memory bandwidth linearly.
- Must scale the global interprocessor communication bandwidth.
- Scaling memory bandwidth cost-effectively requires separate, distributed memories.
- Cost-effectiveness also requires the best price-performance in individual processors.
What we get:
- compelling price/performance
- tremendous scalability
- a tolerable entry price
- the ability to tackle intractable problems

The Maturation of Highly Parallel Technology
- Affordable parallel systems now out-perform the best conventional supercomputers, and performance per dollar is particularly favorable.
- The field is thinning to a few very capable systems.
- Reliability is greatly improved.
- Third-party scientific and engineering applications are appearing.
- Business applications are appearing.
- Commercial customers, not just research labs, are acquiring systems.

Cray v Cray: Cray Research Inc. v Cray Computer Company
- CRI was founded by Seymour Cray, the father of the supercomputer, in 1972.
- Business based on vector supercomputers and later MPP:
  - Cray 1 '76, X-MP '82, Y-MP '87, C90 '92, J90 '93, T90 '95, ...
  - T3D '94, T3E '96, ...
- Seymour Cray left to form CCC in 1989 to develop exotic processor technology (Cray 1 '76, Cray 2 '85, Cray 3?). The Cray 3 was aggressively marketed.
- 1994: CCC went bust. 1995: CRI returned to profit with a huge order backlog.

Silicon Graphics Inc.
The new kids on the block ...
- Founded in 1981 as a Stanford University spin-out.
- Sales originally based on graphics workstations: graphics done in hardware, an exception to the rule that custom-built chips are less cost effective than general-purpose processors running software.
- All machines use mass-produced processors from MIPS Computer Systems, now an SGI subsidiary.

SGI Today
- No longer just biding their time.
- New markets: a move away from graphics workstations to general-purpose HPC, and the introduction of parallelism. Current: POWER CHALLENGE.
- Aim: sell affordable / accessible / entry-level / scalable HPC.
- Market position: 23% of machines in the "Top 500" list.
- Interesting asides:
  - MIPS announces a deal to supply processors for the next generation of Nintendo machines: HPC feeding into the mainstream.
  - Feb. 26, 1996: SGI buys 75% of CRI stock: low-end HPC having a strong influence on high-end HPC.
The Giants
- IBM: released the SP2 in 1994, based on workstation chips. Market position: 21% of machines in the "Top 500" list.
- DEC: Memory Channel architecture released in 1994, drawing on networking and workstation processor experience. Market position: 3% of machines in the "Top 500" list.
- Intel: early experiences with hypercube machines 1982-90; in 1995 won the contract for the US Government "Teraflops machine". Market position: 5% of machines in the "Top 500" list.
- HP Convex: HP bought Convex in 1994, to bring together workstation knowledge and HPC. Market position: 4% of machines in the "Top 500" list.
- Others: Fujitsu 7%, NEC 8%, Hitachi 3%, Tera, Meiko 2%.
... but how many of them are making a profit in MPP systems?

Scientific Computing: 1986 vs. 1996
1986:
1. Minisupercomputers (1 - 20 Mflop/s): Alliant, Convex, DEC.
2. Parallel vector processors (PVP, 20 - 2000 Mflop/s): CRI, CDC, IBM.
1996:
1. PCs (200 Mflop/s): Intel Pentium Pro.
2. RISC workstations (10 - 1000 Mflop/s): DEC, HP, IBM, SGI, Sun.
3. RISC-based symmetric multiprocessors (SMP, 0.5 - 15 Gflop/s): HP-Convex, DEC, and SGI-CRI.
4. Parallel vector processors (1 - 250 Gflop/s): SGI-CRI, Fujitsu, and NEC.
5. Highly parallel processors (1 - 250 Gflop/s): HP-Convex, SGI-CRI, Fujitsu, IBM, NEC, Hitachi.

[Figure: peak performance from 0.1 Flop/s to 10^12 Flop/s, 1950-2000, across technology generations (relays, vacuum tubes, transistors, integrated circuits, microprocessors) and architectures: scalar (Mark I, ENIAC, UNIVAC, IBM 704, CDC 1604, LARC, Stretch, CDC 6600, CDC 7600), vector (ILLIAC IV, Cray-1, Cray X-MP, Cray Y-MP, Cray-2, Cray C-90), multiprocessors and massively parallel systems (CM-2, Intel Delta, Intel Paragon), approaching the Teraflop.]

[Figure: Linpack-HPC Gflop/s, solving a system of dense linear equations (Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory). Systems range from the Cray 1 (1 proc), Cray X-MP (4 proc), Cray Y-MP (8 proc), NEC SX-2 (1 proc), Fujitsu VP-2600 (1 proc), and NEC SX-3 (4 proc) through the TMC CM-2 (2048 proc), Intel Delta (512 proc), TMC CM-5 (1024 proc), Fujitsu VPP-500 (100 and 140 proc), and Intel Paragon (3680 and 6788 proc) up to the Hitachi CP-PACS (2048 proc).]

[Figure: Performance Improvement for Scientific Computing Problems. Speed-up factors of roughly 10^0 to 10^5 between 1970 and 2000, derived both from supercomputer hardware (vector supercomputers to massively parallel systems) and from computational methods (sparse Gaussian elimination, Gauss-Seidel, successive over-relaxation, conjugate gradient, multigrid).]

Department of Energy's Accelerated Strategic Computing Initiative
A 5-year, $1B program designed to deliver tera-scale computing capability. "Stockpile Stewardship": safe and reliable maintenance of the nation's nuclear stockpile.

Virtual Environments
When the number crunchers finish crunching, the user is faced with the mammoth task of making sense of the data.
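The Linpack-HPC figure above ranks machines by the rate at which they solve a dense system of linear equations, which is also the metric behind the Top500 list mentioned at the start of the talk. As a rough illustration of how such a Gflop/s number is derived, the sketch below applies the conventional dense-LU operation count of 2/3 n^3 + 2 n^2 to a measured solve time; the problem size and timing are made-up placeholder values, not results for any real machine.

```python
def linpack_gflops(n: int, seconds: float) -> float:
    """Gflop/s for solving an n x n dense linear system in `seconds`,
    using the conventional LU operation count 2/3*n^3 + 2*n^2."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9

# Hypothetical run: a 10,000 x 10,000 system factored and solved in 90 seconds.
print(f"{linpack_gflops(10_000, 90.0):.1f} Gflop/s")  # ~7.4 Gflop/s
```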
Recommended publications
  • TMA4280—Introduction to Supercomputing
    TMA4280—Introduction to Supercomputing. NTNU, IMF, January 12, 2018.
    Outline: context (challenges in Computational Science and Engineering), examples (simulation of turbulent flows and other applications), goal and means (parallel performance improvement), overview of supercomputing systems, conclusion.
    Computational Science and Engineering (CSE). What is the motivation for supercomputing? Solve complex problems fast and accurately: efforts in modelling and simulation push sciences and engineering applications forward, and computational requirements drive the development of new hardware and software.
    CSE is the development of computational methods for scientific research and innovation in engineering and technology. It covers the entire spectrum of natural sciences, mathematics, and informatics: scientific model (physics, biology, medicine, ...), mathematical model, numerical model, implementation, visualization and post-processing, validation, with feedback forming a virtuous circle. It allows larger and more realistic problems to be simulated and new theories to be tested numerically.
    Outcome in industrial applications. Figure: 2004: "The Falcon 7X becomes the first aircraft in industry history to be entirely developed in a virtual environment, from design to manufacturing to maintenance." Dassault Systèmes.
    Evolution of computational power. Figure: Moore's Law: exponential increase in the number of transistors per chip, 1-year rate (1965), 2-year rate (1975). WikiMedia, CC-BY-SA-3.0.
  • The TOP500 List and Progress in High-Performance Computing
    COVER FEATURE: GRAND CHALLENGES IN SCIENTIFIC COMPUTING. The TOP500 List and Progress in High-Performance Computing. Erich Strohmaier, Lawrence Berkeley National Laboratory; Hans W. Meuer, University of Mannheim; Jack Dongarra, University of Tennessee; Horst D. Simon, Lawrence Berkeley National Laboratory.
    For more than two decades, the TOP500 list has enjoyed incredible success as a metric for supercomputing performance and as a source of data for identifying technological trends. The project's editors reflect on its usefulness and limitations for guiding large-scale scientific computing into the exascale era.
    The TOP500 list (www.top500.org) has served as the defining yardstick for supercomputing performance since 1993. Published twice a year, it compiles the world's largest installations and some of their main characteristics. Systems are ranked according to their performance on the Linpack benchmark, which solves a dense system of linear equations. Over time, the data collected for the list has enabled the early identification and quantification of many important technological and architectural trends related to high-performance computing (HPC). Here, we briefly describe the project's origins, the principles guiding data collection, and what has made the list so successful during the two-decades-long transition.
    TOP500 ORIGINS. In the mid-1980s, coauthor Hans Meuer started a small and focused annual conference that has since evolved into the prestigious International Supercomputing Conference (www.isc-hpc.com). During the conference's opening session, Meuer presented statistics about the numbers, locations, and manufacturers of supercomputers worldwide, collected from vendors and colleagues in academia and industry. Initially, it was obvious that the supercomputer label should be reserved for vector processing systems from companies such as Cray, CDC, Fujitsu, NEC, and Hitachi that each claimed to have the fastest system for scientific computation by some selective measure.
  • Thor's Hammer/Red Storm
    The Design Specification and Initial Implementation of the Red Storm Architecture, in partnership with Cray, Inc. William J. (Bill) Camp & James L. (Jim) Tomkins, CCIM, Sandia National Laboratories, Albuquerque, NM. [email protected]
    Our rubric: mission-critical engineering & science applications; large systems with a few processors per node; message passing paradigm; balanced architecture; use commodity wherever possible; efficient systems software; emphasis on scalability & reliability in all aspects; critical advances in parallel algorithms; vertical integration of technologies.
    (Table: computing domains at Sandia, mapping Red Storm, Cplant, Beowulf clusters, and desktops onto the peak, mid-range, and volume domains by processor count, from 1 to 10^4.) Red Storm is targeting the highest-end market but has real advantages for the mid-range market (from 1 cabinet on up).
    Red Storm Architecture: a true MPP, designed to be a single system; a distributed-memory MIMD parallel supercomputer; fully connected 3D mesh interconnect, with each compute node processor having a bi-directional connection to the primary communication network; 108 compute node cabinets and 10,368 compute node processors (AMD Sledgehammer @ 2.0 GHz); ~10 TB of DDR memory @ 333 MHz; Red/Black switching: ~1/4, ~1/2, ~1/4; 8 service and I/O cabinets on each end (256 processors for each color), possibly adding on-system viz nodes to the SIO partition; 240 TB of disk storage (120 TB per color).
    Red Storm Architecture: functional hardware partitioning into service and I/O nodes, compute nodes, and RAS nodes; partitioned Operating System (OS):
  • ASCI Red Vs. Red Storm
    7X Performance Results – Final Report: ASCI Red vs. Red Storm. Joel O. Stevenson, Robert A. Ballance, Karen Haskell, and John P. Noe, Sandia National Laboratories, and Dennis C. Dinge, Thomas A. Gardiner, and Michael E. Davis, Cray Inc. ABSTRACT: The goal of the 7X performance testing was to assure Sandia National Laboratories, Cray Inc., and the Department of Energy that Red Storm would achieve its performance requirements, which were defined as a comparison between ASCI Red and Red Storm. Our approach was to identify one or more problems for each application in the 7X suite, run those problems at two or three processor sizes in the capability computing range, and compare the results between ASCI Red and Red Storm. The first part of this paper describes the two computer systems, the 10 applications in the 7X suite, the 25 test problems, and the results of the performance tests on ASCI Red and Red Storm. During the course of the testing on Red Storm, we had the opportunity to run the test problems in both single-core mode and dual-core mode, and the second part of this paper describes those results. Finally, we reflect on lessons learned in undertaking a major head-to-head benchmark comparison. KEYWORDS: 7X, ASCI Red, Red Storm, capability computing, benchmark. This work was supported in part by the U.S. Department of Energy. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States National Nuclear Security Administration and the Department of Energy under contract DE-AC04-94AL85000.
  • DARPA's HPCS Program: History, Models, Tools, Languages
    DARPA's HPCS Program: History, Models, Tools, Languages. Jack Dongarra, University of Tennessee and Oak Ridge National Lab; Robert Graybill, USC Information Sciences Institute; William Harrod, DARPA; Robert Lucas, USC Information Sciences Institute; Ewing Lusk, Argonne National Laboratory; Piotr Luszczek, University of Tennessee; Janice McMahon, USC Information Sciences Institute; Allan Snavely, University of California – San Diego; Jeffery Vetter, Oak Ridge National Laboratory; Katherine Yelick, Lawrence Berkeley National Laboratory; Sadaf Alam, Oak Ridge National Laboratory; Roy Campbell, Army Research Laboratory; Laura Carrington, University of California – San Diego; Tzu-Yi Chen, Pomona College; Omid Khalili, University of California – San Diego; Jeremy Meredith, Oak Ridge National Laboratory; Mustafa Tikir, University of California – San Diego.
    Abstract: The historical context surrounding the birth of the DARPA High Productivity Computing Systems (HPCS) program is important for understanding why federal government agencies launched this new, long-term high performance computing program and renewed their commitment to leadership computing in support of national security, large science, and space requirements at the start of the 21st century. In this chapter we provide an overview of the context for this work as well as various procedures being undertaken for evaluating the effectiveness of this activity, including such topics as modeling the proposed performance of the new machines, evaluating the proposed architectures, understanding the languages used to program these machines, and understanding programmer productivity issues, in order to better prepare for the introduction of these machines in the 2011-2015 timeframe.
  • Catamount Vs. Cray Linux Environment
    To Upgrade or not to Upgrade? Catamount vs. Cray Linux Environment. S.D. Hammond, G.R. Mudalige, J.A. Smith, J.A. Davis, S.A. Jarvis, High Performance Systems Group, University of Warwick, Coventry, CV4 7AL, UK; J. Holt, Tessella PLC, Abingdon Science Park, Berkshire, OX14 3YS, UK; I. Miller, J.A. Herdman, A. Vadgama, Atomic Weapons Establishment, Aldermaston, Reading, RG7 4PR, UK.
    Abstract: Assessing the performance of individual hardware and software-stack configurations for supercomputers is a difficult and complex task, but one with potentially high levels of reward. While the ultimate benefit of such system (re-)configuration and tuning studies is to improve application performance, potential improvements may also include more effective scheduling, kernel parameter refinement, pointers to application redesign, and an assessment of system component upgrades. With the growing complexity of modern systems, the effort required to discover the elusive combination of hardware and software-stack settings that improve performance across a range of applications is, in itself, becoming an HPC grand challenge.
    1 Introduction: Modern supercomputers are growing in diversity and complexity: the arrival of technologies such as multi-core processors, general-purpose GPUs and specialised compute accelerators has increased the potential scientific delivery possible from such machines. This is not however without some cost, including significant increases in the sophistication and complexity of supporting operating systems and software libraries. This paper documents the development and application of methods to assess the potential performance of selecting one hardware, operating system (OS) and software stack combination against another. This is of particular interest to supercomputing centres, which ...
  • Early Evaluation of the Cray XT3 at ORNL
    Early Evaluation of the Cray XT3 at ORNL. J. S. Vetter, S. R. Alam, T. H. Dunigan, Jr., M. R. Fahey, P. C. Roth, P. H. Worley, Oak Ridge National Laboratory, Oak Ridge, TN, USA 37831.
    ABSTRACT: Oak Ridge National Laboratory recently received delivery of a Cray XT3. The XT3 is Cray's third-generation massively parallel processing system. The system builds on a single processor node, the AMD Opteron, and uses a custom chip, called SeaStar, to provide interprocessor communication. In addition, the system uses a lightweight operating system on the compute nodes. This paper describes our initial experiences with the system, including micro-benchmark, kernel, and application benchmark results. In particular, we provide performance results for important Department of Energy application areas including climate and fusion. We demonstrate experiments on the partially installed system, scaling applications up to 3,600 processors. KEYWORDS: performance evaluation; Cray XT3; Red Storm; Catamount; performance analysis; benchmarking.
    1 Introduction: Computational requirements for many large-scale simulations and ensemble studies of vital interest to the Department of Energy (DOE) exceed what is currently offered by any U.S. computer vendor. As illustrated in the DOE Scales report [30] and the High End Computing Revitalization Task Force report [17], examples are numerous, ranging from global climate change research to combustion to biology. ... ORNL has been evaluating these critical factors on several platforms that include the Cray X1 [1], the SGI Altix 3700 [13], and the Cray XD1 [14]. This report describes the initial evaluation results collected on an early version of the Cray XT3 sited at ORNL. Recent results are also publicly available from the ORNL ...
  • Survey of “High Performance Machines”
    Survey of "High Performance Machines". Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory.
    Overview: processors; interconnect; a look at the 3 Japanese HPCs; examine the Top131.
    [Figure: History of High Performance Computers, 1980-2010: aggregate systems performance rising from roughly 100 Mflop/s toward 1 Pflop/s (from the Cray X-MP era through the vector machines of Fujitsu, NEC, and Hitachi to the ASCI systems and the Earth Simulator), alongside single-CPU performance and CPU frequencies from 10 MHz to 10 GHz, showing increasing parallelism.]
    A vibrant field for high performance computers: Cray X1, SGI Altix, IBM Regatta, Sun, HP, Bull, Fujitsu PrimePower, Hitachi SR11000, NEC SX-7, Apple; coming soon: Cray RedStorm, Cray BlackWidow, NEC SX-8, IBM Blue Gene/L.
    Architecture/systems continuum, from loosely coupled:
    • Commodity processor with commodity interconnect: clusters (Pentium, Itanium, Opteron, Alpha, PowerPC; GigE, Infiniband, Myrinet, Quadrics, SCI); NEC TX7; HP Alpha; Bull NovaScale 5160.
    • Commodity processor with custom interconnect: SGI Altix (Intel Itanium 2); Cray Red Storm (AMD Opteron); IBM Blue Gene/L (?) (IBM PowerPC).
    • Custom processor with custom ...
  • Intel Paragon XP/S Overview Distributed-Memory MIMD
    Intel Paragon XP/S Overview (Intel MP Paragon XP/S 150 at Oak Ridge National Labs)
    • Distributed-memory MIMD multicomputer.
    • 2D array of nodes, performing both OS functionality and user computation: main memory physically distributed among nodes (16-64 MB per node); each node contains two Intel i860 XP processors, an application processor for the user's program and a message processor for inter-node communication.
    • Balanced design: speed and memory capacity matched to the interconnection network, storage facilities, etc.; interconnect bandwidth scales with the number of nodes; efficient even with thousands of processors.
    Paragon XP/S nodes:
    • Network Interface Controller (NIC): connects the node to its PMRC; parity-checked, full-duplex router with error checking.
    • Message processor: an Intel i860 XP processor; handles all details of sending/receiving a message between nodes, including protocols, packetization, etc.; supports global operations including broadcast, synchronization, sum, min, and, or, etc.
    • Application processor: ...
    Paragon XP/S node interconnection:
    • 2D mesh chosen after extensive analytical studies and simulation.
    • Paragon Mesh Routing Chip (PMRC) / iMRC routes traffic in the mesh: 0.75 µm, triple-metal CMOS; routes traffic in four directions and to and from the attached node at > 200 MB/s; 40 ns to make routing decisions and close the appropriate switches; transfers are parity checked, the router is pipelined, and routing is deadlock-free.
    • The backplane is an active backplane of router chips rather than a mass of cables.
  • Parallel Machines
    Lecture 6: Parallel Machines. A parallel computer is a connected configuration of processors and memories. The choice space available to a computer architect includes the network topology, the node processor, the address-space organization, and the memory structure. These choices are based on the parallel computation model, the current technology, and marketing decisions. No matter what the pace of change, it is impossible to make intelligent decisions about parallel computers right now without some knowledge of their architecture. For more advanced treatment of computer architecture we recommend Kai Hwang's Advanced Computer Architecture and Parallel Computer Architecture by Gupta, Singh, and Culler. One may gauge what architectures are important today by the Top500 Supercomputer list published by Meuer, Strohmaier, Dongarra and Simon. The secret is to learn to read between the lines. There are three kinds of machines on the November 2003 Top 500 list:
    • Distributed Memory Multicomputers (MPPs)
    • Constellations of Symmetric Multiprocessors (SMPs)
    • Clusters (NOWs and Beowulf clusters)
    Vector supercomputers, Single Instruction Multiple Data (SIMD) machines and SMPs are no longer present on the list but used to be important in previous versions. How can one simplify (and maybe grossly oversimplify) the current situation? Perhaps by pointing out that the world's fastest machines are mostly clusters. Perhaps it will be helpful to the reader to list some of the most important machines first sorted by type, and then by highest rank in the top 500
  • It's a Sign, Not a Race
    Meeting of the Advanced Scientific Computing Advisory Committee, October 17 and 18, 2002, Hilton Washington Embassy Row Hotel, 2015 Massachusetts Avenue, NW, Washington, DC. It's a Sign, Not a Race. Jack Dongarra, University of Tennessee.
    A Tour de Force for a Supercomputer
    • Japanese system: JAERI/JAMSTEC/NASDA/RIKEN Earth Simulator (2002)
    • Target application: CFD - weather, climate, earthquakes
    • 640 NEC SX-6 nodes (mod): 5120 CPUs which have vector ops
    • 40 TeraFlops (peak)
    • 7 MWatts (ASCI White: 1.2 MW; Q: 6 MW) - say 10 cents/kWh: $16.8K/day = $6M/year!
    • $250-500M for things in the building
    • Footprint of 4 tennis courts
    • Expected to stay on top of the Top500 until a 60-100 TFlop ASCI machine arrives (Earth Simulator picture from the JAERI web page)
    • Homogeneous, centralized, proprietary, expensive!
    NASDA: National Space Development Agency of Japan; JAMSTEC: Japan Marine Science and Technology Center; JAERI: Japan Atomic Energy Research Institute; RIKEN: The Institute of Physical and Chemical Research.
    R&D results: not revolutionary. Comparison of vector processors: SX-4 (1995), 2 GFlop/s, 8 vector pipes, 125 MHz clock, 0.35 µm CMOS LSI, 37x4 = 148 LSIs; SX-5 (1998), 8 GFlop/s, 16 vector pipes, 250 MHz clock, 0.25 µm CMOS LSI, 32 LSIs; Earth Simulator (2002), 8 GFlop/s, 8 vector pipes, 500 MHz clock, 0.15 µm CMOS LSI.
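    The power-cost aside in this excerpt is easy to reproduce. Below is a minimal sketch using the talk's own round numbers (7 MW drawn continuously, 10 cents per kWh); these are illustrative figures, not actual billing data.

    ```python
    # Rough electricity cost for the Earth Simulator, using the talk's round numbers.
    power_mw = 7.0          # quoted machine power draw
    price_per_kwh = 0.10    # "say 10 cents/kWh"

    cost_per_day = power_mw * 1000 * 24 * price_per_kwh   # kW * hours * $/kWh
    cost_per_year = cost_per_day * 365

    print(f"${cost_per_day:,.0f} per day")          # ~$16,800/day, as on the slide
    print(f"${cost_per_year / 1e6:.1f}M per year")  # ~$6.1M/year
    ```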
  • History of Supercomputing
    History of Supercomputing. Parallel Computing (MC-8836). Esteban Meneses, PhD, School of Computing, Costa Rica Institute of Technology. [email protected]. I semester, 2021. (Cover image: Marenostrum Supercomputer, Barcelona Supercomputing Center (BSC).)
    What is a Supercomputer? An elusive definition: a computer at the frontline of contemporary processing capacity, particularly speed of calculation (Wikipedia); a large and very fast computer (Merriam-Webster); a machine that hierarchically integrates many components to accelerate the execution of particular programs (Esteban Meneses); a device for turning compute-bound problems into I/O problems (Ken Batcher).
    Supercomputer features:
    • They are expensive
    • They represent a major enhancement in computational power
    • They have a short life (approximately 5 years)
    • Their performance is measured in FLOPS.
    First digital computers: are they really super-computers? Colossus (1944), the first electronic digital programmable computer, Bletchley Park, United Kingdom; ENIAC (1946), the first electronic general-purpose computer, University of Pennsylvania, United States; MARK 1 (1948), the first stored-program computer, University of Manchester, United Kingdom.
    CDC 6600 (1964): generally regarded as the first supercomputer, delivering 3 megaFLOPS at a cost of roughly $60 million today.
    • Designed by Seymour Cray (the father of supercomputing)
    • Built at Control Data Corporation
    • Single CPU, RISC-like system
    • Memory was faster than the CPU
    • Used Minnesota FORTRAN