CS 594 Spring 2007 Lecture 4: Overview of High-Performance Computing

CS 594 Spring 2007, Lecture 4: Overview of High-Performance Computing
Jack Dongarra, Computer Science Department, University of Tennessee

Top 500 Computers
- A listing of the 500 most powerful computers in the world.
- Yardstick: Rmax from the LINPACK MPP benchmark (TPP performance), i.e., the rate achieved solving a dense linear system Ax = b. (A small sketch of this measurement appears after the "Supercomputing Changes Over Time" slide below.)
- Updated twice a year: at the SC'xy conference in the United States in November and at the meeting in Germany in June.

What Is a Supercomputer? Why Do We Need Them?
- A supercomputer is a hardware and software system that provides close to the maximum performance that can currently be achieved.
- Almost all of the technical areas that are important to the well-being of humanity use supercomputing in fundamental and essential ways: computational fluid dynamics, protein folding, climate modeling, and national security (in particular cryptanalysis and the simulation of nuclear weapons), to name a few.
- Over the last 14 years the range of the Top500 has increased faster than Moore's Law:
  - 1993: #1 = 59.7 GFlop/s, #500 = 422 MFlop/s
  - 2007: #1 = 280 TFlop/s, #500 = 2.73 TFlop/s

Architecture/Systems Continuum (from tightly coupled to loosely coupled)
- Custom processor with custom interconnect (Cray X1, NEC SX-8, IBM Regatta, IBM Blue Gene/L): best processor performance for codes that are not "cache friendly," good communication performance, a simpler programming model, and the most expensive.
- Commodity processor with custom interconnect, i.e., hybrid systems (SGI Altix with Intel Itanium 2, Cray XT3 with AMD Opteron): good communication performance and good scalability.
- Commodity processor with commodity interconnect, i.e., clusters (Pentium, Itanium, Opteron, and Alpha processors over GigE, Infiniband, Myrinet, or Quadrics; NEC TX7, IBM eServer, Dawning): best price/performance for codes that work well with caches and are latency tolerant, but a more complex programming model.
- (The accompanying chart tracks the custom, hybrid, and commodity shares of the Top500 from June 1993 through December 2003.)

Architectures / Systems
- (Chart of the number of Top500 systems by architecture class, 1993-2006: SIMD, single processor, SMP, constellation, cluster, and MPP.)

Supercomputing Changes Over Time: 500 Fastest Systems Over the Past 14 Years
- Sum of the 500 fastest computers: from 1.167 TFlop/s in 1993 to 3.54 PFlop/s in 2006.
- The fastest computer (#1): from 59.7 GFlop/s in 1993 to 280.6 TFlop/s (IBM BlueGene/L) in 2006, passing through the Fujitsu 'NWT', Intel ASCI Red, IBM ASCI White, and NEC Earth Simulator along the way.
- The computer at position 500: from 0.4 GFlop/s in 1993 to 2.74 TFlop/s in 2006, trailing the #1 system by roughly 6-8 years; "My Laptop" is marked near the bottom of the chart for scale.
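The Rmax yardstick is simply the floating-point rate achieved while solving a dense system Ax = b. A minimal sketch of that measurement in Python (using NumPy's LU-based solver as a stand-in for an optimized HPL run, and problem sizes far smaller than real Top500 runs) might look like this:

```python
import time
import numpy as np

def linpack_rate(n, rng=np.random.default_rng(0)):
    """Time a dense solve Ax = b and report the achieved GFlop/s.

    The flop count uses the standard LINPACK convention of
    2/3*n^3 + 2*n^2 operations for the factorization and solves.
    """
    A = rng.random((n, n))
    b = rng.random(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)        # LU factorization + forward/back substitution
    elapsed = time.perf_counter() - t0
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / elapsed / 1e9     # GFlop/s

if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(f"n = {n}: {linpack_rate(n):.1f} GFlop/s")
```

The same flop-count convention is what turns a measured run time into the Rmax figures quoted throughout the list.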
28th List: The TOP10

Rank | Manufacturer | Computer | Rmax [TF/s] | Installation Site | Country | Year | #Proc | Arch
1 | IBM | BlueGene/L, eServer Blue Gene | 280.6 | DOE/NNSA/LLNL | USA | 2005 | 131,072 | Custom
2 | Sandia/Cray | Red Storm, Cray XT3 | 101.4 | NNSA/Sandia | USA | 2006 | 26,544 | Hybrid
3 | IBM | BGW, eServer Blue Gene | 91.29 | IBM Thomas Watson | USA | 2005 | 40,960 | Custom
4 | IBM | ASC Purple, eServer pSeries p575 | 75.76 | DOE/NNSA/LLNL | USA | 2005 | 12,208 | Custom
5 | IBM | MareNostrum, JS21 Cluster, Myrinet | 62.63 | Barcelona Supercomputer Center | Spain | 2006 | 12,240 | Commod
6 | Dell | Thunderbird, PowerEdge 1850, IB | 53.00 | NNSA/Sandia | USA | 2005 | 9,024 | Commod
7 | Bull | Tera-10, NovaScale 5160, Quadrics | 52.84 | CEA | France | 2006 | 9,968 | Commod
8 | SGI | Columbia, Altix, Infiniband | 51.87 | NASA Ames | USA | 2004 | 10,160 | Hybrid
9 | NEC/Sun | Tsubame, Fire x4600, ClearSpeed, IB | 47.38 | GSIC / Tokyo Institute of Technology | Japan | 2006 | 11,088 | Commod
10 | Cray | Jaguar, Cray XT3 | 43.48 | ORNL | USA | 2006 | 10,424 | Hybrid

IBM BlueGene/L, #1 ("Fastest Computer": BG/L, 700 MHz, 131K processors, 64 racks)
- 131,072 processors; a total of 18 Blue Gene systems, all in the Top100.
- Power: 1.6 MWatts (roughly 1,600 homes); 43,000 ops/s/person.
- Packaging hierarchy:
  - Chip: 2 processors, 2.8/5.6 GF/s, 4 MB embedded cache
  - Compute card: 2 chips (2x1x1), 4 processors, 5.6/11.2 GF/s, 1 GB DDR
  - Node board: 16 compute cards (32 chips, 4x4x2), 64 processors, 90/180 GF/s, 16 GB DDR
  - Rack: 32 node boards (8x8x16), 2,048 processors, 2.9/5.7 TF/s, 0.5 TB DDR
  - Full system: 64 racks (64x32x32), 131,072 processors, 180/360 TF/s, 32 TB DDR
- The compute node ASICs include all networking and processor functionality. Each compute ASIC includes two 32-bit superscalar PowerPC 440 embedded cores (note that L1 cache coherence is not maintained between these cores).
- Peak: 367 TFlop/s. Linpack: 281 TFlop/s at n = 1.8M, in about 13K seconds (about 3.6 hours), or 77% of peak. (A worked check of these figures appears after the "Customer Segments / Performance" slide below.)

Performance Projection
- (Chart extrapolating the Top500 trend lines for the list sum, N=1, and N=500 out to 2015, on an axis running from 100 MFlop/s to 1 EFlop/s; roughly 6-8 years separate the SUM and N=1 curves, and roughly 8-10 years separate N=1 and N=500.)

Customer Segments / Performance
- (Chart of the share of Top500 performance by customer segment, 1993-2006: government, classified, vendor, academic, industry, and research.)
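The headline BlueGene/L figures above are easy to sanity-check from numbers quoted on the slides; the only outside assumption in this sketch is that each 700 MHz PowerPC 440 core retires 4 floating-point operations per cycle, which matches the 2.8 GF/s per-processor figure in the packaging hierarchy:

```python
# Rough consistency check of the BlueGene/L figures quoted above.

processors = 131_072                 # from the slide
clock_hz = 700e6                     # 700 MHz PowerPC 440 cores
flops_per_cycle = 4                  # assumption: 4 flop/cycle/core -> 2.8 GFlop/s per processor

peak = processors * clock_hz * flops_per_cycle
print(f"peak         ~ {peak / 1e12:.0f} TFlop/s")          # ~367 TFlop/s

rmax = 281e12                        # Linpack result from the slide
n = 1.8e6                            # Linpack problem size from the slide
flops = 2 / 3 * n**3 + 2 * n**2      # standard LINPACK flop count
print(f"run time     ~ {flops / rmax / 3600:.1f} hours")    # ~3.8 h (slide: ~13K s, ~3.6 h)
print(f"efficiency   ~ {rmax / peak:.0%}")                  # ~77% of peak

power_watts = 1.6e6                  # 1.6 MW from the slide
print(f"energy eff.  ~ {rmax / power_watts / 1e9:.2f} GFlop/s per Watt")  # ~0.18
```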
Processor Types
- (Chart of the number of Top500 systems by processor type, 1993-2006; the legend distinguishes SIMD, vector, and scalar processors, with scalar broken out into Alpha, Power, HP, Intel, MIPS, Sparc, AMD, and other.)

Processors Used in Each of the 500 Systems
- 92% of the list uses one of three processor families: Intel 51%, IBM Power 19%, AMD 22%.
- Breakdown: Intel EM64T 22%, Intel IA-32 22%, Intel IA-64 7%, AMD x86_64 22%, IBM Power 19%, HP PA-RISC 4%, and about 1% each for Sun Sparc, HP Alpha, NEC, and Cray.

Interconnects / Systems
- (Chart of the number of Top500 systems by interconnect, 1993-2006: Gigabit Ethernet (211 systems), Myrinet (79), Infiniband (78), Quadrics, Crossbar, SP Switch, Cray Interconnect, others, and N/A.)
- GigE + Infiniband + Myrinet = 74% of the list.

Processors per System (November 2006)
- (Histogram of the number of systems by processor count, in bins ranging from 33-64 up to 64k-128k processors.)

KFlop/s per Capita (Flops/Pop), Based on the November 2005 Top500
- (Bar chart of Top500 KFlop/s per capita by country: United States, Switzerland, Israel, New Zealand, United Kingdom, Netherlands, Australia, Japan, Germany, Spain, Canada, South Korea, Sweden, Saudi Arabia, France, Taiwan, Italy, Brazil, Mexico, China, Russia, India.)
- Hint: Peter Jackson had something to do with this - New Zealand's standing comes from WETA Digital (Lord of the Rings) and has nothing to do with the 47.2 million sheep in NZ.

Environmental Burden of PC CPUs
- (Figure; source: Cool Chips & Micro 32.)

Power Consumption of World's CPUs

Year | Power (MW) | # CPUs (millions)
1992 | 180 | 87
1994 | 392 | 128
1996 | 959 | 189
1998 | 2,349 | 279
2000 | 5,752 | 412
2002 | 14,083 | 607
2004 | 34,485 | 896
2006 | 87,439 | 1,321
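A couple of quick numbers can be pulled from the CPU power table above; this is just arithmetic on the figures as quoted, and the per-CPU wattage is an aggregate average, not a rating of any particular chip:

```python
# Arithmetic on the "Power Consumption of World's CPUs" table above.
table = {          # year: (total power in MW, CPUs in millions)
    1992: (180, 87),
    2006: (87_439, 1_321),
}

for year, (mw, cpus_m) in table.items():
    watts_per_cpu = mw * 1e6 / (cpus_m * 1e6)
    print(f"{year}: ~{watts_per_cpu:.0f} W per CPU on average")
# 1992: ~2 W per CPU; 2006: ~66 W per CPU.

growth = (87_439 / 180) ** (1 / (2006 - 1992)) - 1
print(f"aggregate power grew ~{growth:.0%} per year over 1992-2006")   # ~56% per year
```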
Power Is an Industry-Wide Problem
- Google facilities are leveraging hydroelectric power at old aluminum plants ("Hiding in Plain Sight, Google Seeks More Power," by John Markoff, The New York Times, June 14, 2006).
- Google runs more than 500,000 servers worldwide. (Photo: new Google plant in The Dalles, Oregon, from the NYT, June 14, 2006.)

And Now We Want Petascale ...
- For scale: a high-speed train draws about 10 Megawatts; a conventional power plant produces about 300 Megawatts.
- What is a conventional petascale machine? Many high-speed bullet trains ... a significant start to a conventional power plant.
- ("Hiding in Plain Sight, Google Seeks More Power," The New York Times, June 14, 2006.)

Top Three Reasons for "Eliminating" Global Climate Warming in the Machine Room
3. HPC contributes to global climate warming.
   - "I worry that we, as HPC experts in global climate modeling, are contributing to the very thing that we are trying to avoid: the generation of greenhouse gases."
2. Electrical power costs $$$.
   - Japanese Earth Simulator: power & cooling of 12 MW, roughly $9.6 million/year?
   - Lawrence Livermore National Laboratory: power & cooling of HPC at $14 million/year; powering up ASC Purple drew a "panic" call from the local electrical company.
1. Reliability & availability impact productivity.
   - California declared a state of electrical emergency (July 24-25, 2006): a 50,538 MW load not expected to be reached until 2010!

Reliability & Availability of HPC Systems
(MTBI: mean time between interrupts; MTBF: mean time between failures; MTTR: mean time to restore. Source: Daniel A. Reed, RENCI.)

System | CPUs | Reliability & Availability
ASCI Q | 8,192 | MTBI 6.5 hrs; 114 unplanned outages/month; HW outage sources: storage, CPU, memory.
ASCI White | 8,192 | MTBF 5 hrs (2001) and 40 hrs (2003); HW outage sources: storage, CPU, 3rd-party HW.
NERSC Seaborg | 6,656 | MTBI 14 days; MTTR 3.3 hrs; availability 98.74%; SW is the main outage source.
PSC Lemieux | 3,016 | MTBI 9.7 hrs; availability 98.33%.
Google | ~15,000 | 20 reboots/day; 2-3% of machines replaced/year; availability ~100%; HW outage sources: storage, memory.

Fuel Efficiency: GFlops/Watt
- (Bar chart of GFlops/Watt for the Top 20 systems, based on processor power rating only (3, >100, >800); the vertical axis runs from 0 to 0.9 GFlops/Watt. Systems shown include BlueGene/L and several other Blue Gene systems, ASC Purple (p5 1.9 GHz), Columbia (SGI Altix 1.5 GHz), Thunderbird (Pentium 3.6 GHz), Red Storm (Cray XT3, 2.0 GHz), the Earth Simulator, MareNostrum (PPC 970, 2.2 GHz), Jaguar (Cray XT3, 2.4 GHz), Thunder (Intel Itanium2 1.4 GHz), Cray XT3 2.6 GHz, Apple XServe 2.0 GHz, Cray X1E (4 GB and 2 GB), ASCI Q (Alpha 1.25 GHz), IBM p5 575 1.9 GHz, and System X (2.3 GHz Apple XServe).)

Top500 Conclusions
- Microprocessor-based supercomputers have brought a major change in accessibility and affordability.
Recommended publications
  • The ASCI Red TFLOPS Supercomputer
    The ASCI Red TFLOPS Supercomputer (http://www.sandia.gov/ASCI/Red/RedFacts.htm)
    Introduction. The ASCI Red TFLOPS Supercomputer is the first step in the ASCI Platforms Strategy, which is aimed at giving researchers the five-order-of-magnitude increase in computing performance over current technology that is required to support "full-physics," "full-system" simulation by early next century. This supercomputer, being installed at Sandia National Laboratories, is a massively parallel, MIMD computer. It is noteworthy for several reasons. It will be the world's first TFLOPS supercomputer. I/O, memory, compute nodes, and communication are scalable to an extreme degree. Standard parallel interfaces will make it relatively simple to port parallel applications to this system. The system uses two operating systems to make the computer both familiar to the user (UNIX) and non-intrusive for the scalable application (Cougar). And it makes use of Commercial Commodity Off The Shelf (CCOTS) technology to maintain affordability.
    Hardware. The ASCI TFLOPS system is a distributed memory, MIMD, message-passing supercomputer. All aspects of this system architecture are scalable, including communication bandwidth, main memory, internal disk storage capacity, and I/O. The TFLOPS Supercomputer is organized into four partitions: Compute, Service, System, and I/O. The Service Partition provides an integrated, scalable host that supports interactive users, application development, and system administration. The I/O Partition supports a scalable file system and network services. The System Partition supports system Reliability, Availability, and Serviceability (RAS) capabilities. Finally, the Compute Partition contains nodes optimized for floating point performance and is where parallel applications execute.
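    The point about standard parallel interfaces is that an application written against the common message-passing API of the day could be moved to the machine with little change. Purely as an illustration (this sketch is not from the Sandia document, and it uses the mpi4py bindings rather than C simply to match the other examples on this page), a portable message-passing program looks the same on a commodity cluster or an MPP:

```python
from mpi4py import MPI   # assumes an MPI implementation and mpi4py are installed

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank contributes one value; rank 0 gathers the global sum.
local = float(rank + 1)
total = comm.reduce(local, op=MPI.SUM, root=0)

if rank == 0:
    # For P ranks the sum is P*(P+1)/2, independent of which machine runs it.
    print(f"{size} ranks, total = {total}")
```

    Run with, e.g., `mpiexec -n 4 python sum.py`; the source does not change wherever an MPI library is available.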
  • 2017 HPC Annual Report Team Would Like to Acknowledge the Invaluable Assistance Provided by John Noe
    Sandia National Laboratories, 2017 High Performance Computing Annual Report. The 2017 High Performance Computing Annual Report is dedicated to John Noe and Dino Pavlakos. Editor: Yasmin Dennig. Contributing writers: Megan Davidson, Mattie Hensley. Contributing editor: Laura Sowko. Design: Stacey Long.
    Building a foundational framework in high performance computing. Sandia National Laboratories has a long history of significant contributions to the high performance computing community and industry. Our innovative computer architectures allowed the United States to become the first to break the teraflop barrier, propelling us to the international spotlight. Our advanced simulation and modeling capabilities have been integral in high consequence US operations such as Operation Burnt Frost. Strong partnerships with industry leaders, such as Cray, Inc. and Goodyear, have enabled them to leverage our high performance computing capabilities to gain a tremendous competitive edge in the marketplace.
    As part of our continuing commitment to provide modern computing infrastructure and systems in support of Sandia's missions, we made a major investment in expanding Building 725 to serve as the new home of high performance computer (HPC) systems at Sandia. Work is expected to be completed in 2018 and will result in a modern facility of approximately 15,000 square feet of computer center space. The facility will be ready to house the newest National Nuclear Security Administration/Advanced Simulation and Computing (NNSA/ASC) prototype platform being acquired by Sandia, with delivery in late 2019 or early 2020. This new system will enable continuing advances by Sandia science and engineering staff in the areas of operating system R&D, operation cost effectiveness (power and innovative cooling technologies), user environment, and application code performance.
  • An Extensible Administration and Configuration Tool for Linux Clusters
    An extensible administration and configuration tool for Linux clusters. John D. Fogarty, B.Sc. A dissertation submitted to the University of Dublin, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, 1999.
    Declaration: I declare that the work described in this dissertation is, except where otherwise stated, entirely my own work and has not been submitted as an exercise for a degree at this or any other university. Signed: ___________________ John D. Fogarty, 15th September, 1999.
    Permission to lend and/or copy: I agree that Trinity College Library may lend or copy this dissertation upon request. Signed: ___________________ John D. Fogarty, 15th September, 1999.
    Summary: This project addresses the lack of system administration tools for Linux clusters. The goals of the project were to design and implement an extensible system that would facilitate the administration and configuration of a Linux cluster. Cluster systems are inherently scalable and therefore the cluster administration tool should also scale well to facilitate the addition of new nodes to the cluster. The tool allows the administration and configuration of the entire cluster from a single node. Administration of the cluster is simplified by way of command replication across one, some or all nodes. Configuration of the cluster is made possible through the use of a flexible variable-substitution scheme, which allows common configuration files to reflect differences between nodes. The system uses a GUI interface and is intuitively simple to use. Extensibility is incorporated into the system by allowing the dynamic addition of new commands and output display types.
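    The variable-substitution idea in the summary is easy to picture: one template configuration file is shared by all nodes, and per-node values are filled in when the file is pushed out. A minimal sketch of that scheme (the template text, node names, and variables here are invented for illustration, not taken from the dissertation):

```python
from string import Template

# One shared template; per-node values are substituted at deployment time.
ifcfg_template = Template(
    "HOSTNAME=$hostname\n"
    "IPADDR=$ipaddr\n"
    "GATEWAY=$gateway\n"
)

# Hypothetical cluster description.
nodes = {
    "node01": {"hostname": "node01", "ipaddr": "192.168.0.11", "gateway": "192.168.0.1"},
    "node02": {"hostname": "node02", "ipaddr": "192.168.0.12", "gateway": "192.168.0.1"},
}

for name, values in nodes.items():
    config = ifcfg_template.substitute(values)
    print(f"--- config for {name} ---\n{config}")
    # A real tool would copy this file to the node and, similarly,
    # replicate administrative commands across one, some, or all nodes.
```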
  • Scalability and Performance of Salinas on the Computational Plant
    Scalability and performance of Salinas on the computational plant. Manoj Bhardwaj, Ron Brightwell & Garth Reese, Sandia National Laboratories.
    Abstract: This paper presents performance results from a general-purpose, finite element structural dynamics code called Salinas, on the Computational Plant (CplantTM), which is a large-scale Linux cluster. We compare these results to a traditional supercomputer, the ASCI/Red machine at Sandia National Labs. We describe the hardware and software environment of the Computational Plant, including its unique ability to support easily switching a large section of compute nodes between multiple different independent cluster heads. We provide an overview of the Salinas application and present results from up to 1000 nodes on both machines. In particular, we discuss one of the challenges related to scaling Salinas beyond several hundred processors on CplantTM and how this challenge was addressed and overcome. We have been able to demonstrate that the performance and scalability of Salinas is comparable to a proprietary large-scale parallel computing platform.
    1 Introduction. Parallel computing platforms composed of commodity personal computers (PCs) interconnected by gigabit network technology are a viable alternative to traditional proprietary supercomputing platforms. Small- and medium-sized clusters are now ubiquitous, and larger-scale procurements, such as those made recently by the National Science Foundation for the Distributed Terascale Facility [1] and by Pacific Northwest National Lab [13], are becoming more prevalent. The cost effectiveness of these platforms has allowed for larger numbers of processors to be purchased. In spite of the continued increase in the number of processors, few real-world application results on large-scale clusters have been published.
  • Computer Architectures an Overview
    Computer Architectures: An Overview. Contents: Microarchitecture, x86, PowerPC, IBM POWER, MIPS architecture, SPARC, ARM architecture, DEC Alpha, AlphaStation, AlphaServer, Very long instruction word, Instruction-level parallelism, Explicitly parallel instruction computing.
    Microarchitecture. In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design.
    Relation to instruction set architecture. The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. (Figure: the Intel Core microarchitecture.) The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers to complete arithmetic logic units (ALUs) and even larger elements.
  • Test, Evaluation, and Build Procedures for Sandia's ASCI Red (Janus) Teraflops Operating System
    SANDIA REPORT SAND2005-3356 Unlimited Release Printed October 2005 Test, Evaluation, and Build Procedures For Sandia's ASCI Red (Janus) Teraflops Operating System Daniel W. Barnette Prepared by Sandia National Laboratories Albuquerque, New Mexico 87185 and Livermore, California 94550 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under Contract DE-AC04-94AL85000. Approved for public release; further dissemination unlimited. Issued by Sandia National Laboratories, operated for the United States Department of Energy by Sandia Corporation. NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any of their contractors, subcontractors, or their employees, make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government, any agency thereof, or any of their contractors or subcontractors. The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors. Printed in the United States of America. This report has been reproduced directly from the best available copy. Available to DOE and DOE contractors from U.S.
  • High Performance Computing
    CRC Press / Taylor & Francis: High Performance Computing, A Chapter Sampler (www.crcpress.com). Contents: 1. Overview of Parallel Computing, from Elements of Parallel Computing, by Eric Aubanel. 2. Introduction to GPU Parallelism and CUDA, from GPU Parallel Program Development Using CUDA, by Tolga Soyata. 3. Optimization Techniques and Best Practices for Parallel Codes, from Parallel Programming for Modern High Performance Computing Systems, by Pawel Czarnul. 4. Determining an Exaflop Strategy, from Programming for Hybrid Multi/Manycore MPP Systems, by John Levesque and Aaron Vose. 5. Contemporary High Performance Computing, from Contemporary High Performance Computing: From Petascale toward Exascale, by Jeffrey S. Vetter. 6. Introduction to Computational Modeling, from Introduction to Modeling and Simulation with MATLAB® and Python, by Steven I. Gordon and Brian Guilfoos.
    Chapter 1, Overview of Parallel Computing. 1.1 Introduction. In the first 60 years of the electronic computer, beginning in 1940, computing performance per dollar increased on average by 55% per year [52]. This staggering 100 billion-fold increase hit a wall in the middle of the first decade of this century. The so-called power wall arose when processors couldn't work any faster because they couldn't dissipate the heat they produced.
  • Delivering Insight: the History of the Accelerated Strategic Computing
    Lawrence Livermore National Laboratory, Computation Directorate. Dona L. Crawford, Computation Associate Director, Lawrence Livermore National Laboratory, 7000 East Avenue, L-559, Livermore, CA 94550. September 14, 2009.
    Dear Colleague: Several years ago, I commissioned Alex R. Larzelere II to research and write a history of the U.S. Department of Energy's Accelerated Strategic Computing Initiative (ASCI) and its evolution into the Advanced Simulation and Computing (ASC) Program. The goal was to document the first 10 years of ASCI: how this integrated and sustained collaborative effort reached its goals, became a base program, and changed the face of high-performance computing in a way that independent, individually funded R&D projects for applications, facilities, infrastructure, and software development never could have achieved.
    Mr. Larzelere has combined the documented record with first-hand recollections of prominent leaders into a highly readable, 200-page account of the history of ASCI. The manuscript is a testament to thousands of hours of research and writing and the contributions of dozens of people. It represents, most fittingly, a collaborative effort many years in the making. I'm pleased to announce that Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI) has been approved for unlimited distribution and is available online at https://asc.llnl.gov/asc_history/.
    Sincerely, Dona L. Crawford, Computation Associate Director, Lawrence Livermore National Laboratory.
    Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI). Prepared by: Alex R.
  • Sandia's ASCI Red, World's First Teraflop Supercomputer, Is
    Sandia's ASCI Red, world's first teraflop supercomputer, is decommissioned. Participants at informal wake recall struggles and glories of nine-year run.
    ALBUQUERQUE, N.M. - On a table in a small meeting room at Sandia National Laboratories rested a picture of the deceased: a row of identical cabinets that formed part of the entity known as ASCI Red, the world's first teraflop supercomputer. Still one of the world's 500 fastest supercomputers after all these years (nine), ASCI Red was being decommissioned. "I've never buried a computer before," said Justin Rattner, Intel Chief Technology Officer, to 30 people from Sandia and the Intel Corp. who gathered in mid-June to pay their respects. "We should go around the room so everyone can say their final farewells."
    (Photo caption: YOU'RE STILL THE ONE. By supercomputer standards, Sandia's ASCI Red, the world's first teraflop machine, was ancient, but what a run it had! Here, designer Jim Tomkins (left) talks about ASCI Red and its accomplishments with Intel officials Justin Rattner and Stephen Wheat. Rob Leland looks on at right. Photos by Paul Edward Sanchez.)
    On a nearby table sat a simple white frosted cake. Encircled top and bottom by two strings of small simulated pearls and topped by pink flowers and a silver ribbon, it resembled a hat that could be worn by a very elderly lady, and indeed, ASCI Red was very old by supercomputer standards. Sandia vice-president Rick Stulen eulogized, "ASCI Red broke all records and most importantly ushered the world into the teraflop regime.
  • Performance of Various Computers Using Standard Linear Equations Software
    CS-89-85: Performance of Various Computers Using Standard Linear Equations Software. Jack J. Dongarra*, Electrical Engineering and Computer Science Department, University of Tennessee, Knoxville, TN 37996-1301; Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831; University of Manchester. June 15, 2014.
    * Electronic mail address: [email protected]. An up-to-date version of this report can be found at http://www.netlib.org/benchmark/performance.ps. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC05-96OR22464, and in part by the Science Alliance, a state supported program at the University of Tennessee.
    Abstract: This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from the Earth Simulator to personal computers.
    1. Introduction and Objectives. The timing information presented here should in no way be used to judge the overall performance of a computer system. The results reflect only one problem area: solving dense systems of equations. This report provides performance information on a wide assortment of computers ranging from the home-used PC up to the most powerful supercomputers. The information has been collected over a period of time and will undergo change as new machines are added and as hardware and software systems improve.
  • Supercomputers: the Amazing Race Gordon Bell November 2014
    Supercomputers: The Amazing Race. Gordon Bell, November 2014. Technical Report MSR-TR-2015-2. Gordon Bell, Researcher Emeritus, Microsoft Research, Microsoft Corporation, 555 California, 94104 San Francisco, CA. Version 1.0, January 2015. Submitted to STARS, IEEE Global History Network.
    Timeline: the top 20 significant events (constrained for the draft IEEE STARS article).
    1. 1957: Fortran introduced for scientific and technical computing.
    2. 1960: Univac LARC, IBM Stretch, and Manchester Atlas finish the 1956 race to build the largest "conceivable" computers.
    3. 1964: Beginning of the Cray era with the CDC 6600 (.48 MFlops) and its functional parallel units. "No more small computers" - S. R. Cray. "First super" - G. A. Michael.
    4. 1964: IBM System/360 announcement. One architecture for commercial & technical use.
    5. 1965: Amdahl's Law defines the difficulty of increasing parallel processing performance based on the fraction of a program that has to be run sequentially.
    6. 1976: Cray 1 vector processor (26 MF), vector data. Sid Karin: "1st super was the Cray 1."
    7. 1982: Caltech Cosmic Cube (4 nodes, 64 nodes in 1983); Cray 1 cost performance x 50.
    8. 1983-93: Billion-dollar SCI, the Strategic Computing Initiative of DARPA IPTO, in response to the Japanese Fifth Generation project; redirected to supercomputing in 1990 after failure to achieve AI goals.
    9. 1982: Cray XMP (1 GF), Cray shared memory vector multiprocessor.
    10. 1984: NSF establishes the Office of Scientific Computing in response to scientists' demand and to counteract the use of VAXen as personal supercomputers.
    11. 1987: nCUBE (1K computers) achieves 400-600 speedup, Sandia winning the first Bell Prize; stimulated Gustafson's Law of Scalable Speed-Up, an Amdahl's Law corollary.
    12.
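    Items 5 and 11 of the timeline above refer to Amdahl's Law and Gustafson's scaled speedup. As a quick illustration (the 5% serial fraction here is an arbitrary example, not a figure from the report):

```python
def amdahl(serial_fraction, p):
    """Fixed-size speedup on p processors (Amdahl's Law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def gustafson(serial_fraction, p):
    """Scaled speedup on p processors (Gustafson's Law)."""
    return p - serial_fraction * (p - 1)

for p in (64, 1024):
    print(f"p={p}: Amdahl {amdahl(0.05, p):.1f}x, Gustafson {gustafson(0.05, p):.1f}x")
# With a 5% serial fraction, Amdahl's Law caps fixed-size speedup near 20x,
# while scaling the problem with the machine (Gustafson) keeps speedup growing.
```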
  • Architectural Specification for Massively
    CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2005; 17:1271–1316 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.893 Architectural specification for massively parallel computers: an experience and measurement-based approach‡ Ron Brightwell1,∗,†, William Camp1,BenjaminCole1, Erik DeBenedictis1, Robert Leland1, James Tomkins1 and Arthur B. Maccabe2 1Sandia National Laboratories, Scalable Computer Systems, P.O. Box 5800, Albuquerque, NM 87185-1110, U.S.A. 2Department of Computer Science, University of New Mexico, Albuquerque, NM 87131-0001, U.S.A. SUMMARY In this paper, we describe the hardware and software architecture of the Red Storm system developed at Sandia National Laboratories. We discuss the evolution of this architecture and provide reasons for the different choices that have been made. We contrast our approach of leveraging high-volume, mass-market commodity processors to that taken for the Earth Simulator. We present a comparison of benchmarks and application performance that support our approach. We also project the performance of Red Storm and the Earth Simulator. This projection indicates that the Red Storm architecture is a much more cost-effective approach to massively parallel computing. Published in 2005 by John Wiley & Sons, Ltd. KEY WORDS: massively parallel computing; supercomputing; commodity processors; vector processors; Amdahl; shared memory; distributed memory 1. INTRODUCTION In the early 1980s the performance of commodity microprocessors reached a level that made it feasible to consider aggregating large numbers of them into a massively parallel processing (MPP) computer intended to compete in performance with traditional vector supercomputers based on moderate numbers of custom processors.