High Performance Computing Technologies

Jack Dongarra
University of Tennessee
Oak Ridge National Laboratory
http://www.netlib.org/utk/people/JackDongarra/


My Group in Tennessee

- Numerical Linear Algebra
  - Basic algorithms for HPC
  - EISPACK, LINPACK, BLAS, LAPACK, ScaLAPACK
- Heterogeneous Network Computing
  - PVM
  - MPI
- Software Repositories
  - Netlib
  - High-Performance Software Exchange
- Performance Evaluation
  - Linpack Benchmark, Top500
  - ParkBench

Computational Science

- HPC offered a new way to do science:
  - Experiment
  - Theory
  - Computation
- Computation used to approximate physical systems
- Advantages include:
  - playing with simulation parameters to study emergent trends
  - possible replay of a particular simulation event
  - study of systems where no exact theories exist


Why Turn to Simulation? ... Too Large

- Climate/Weather Modelling
- Data intensive problems: data-mining, oil reservoir simulation
- Problems with large length and time scales: cosmology

Automotive Industry

- Huge users of HPC technology: Ford US is the 25th largest user of HPC in the world
- Main uses of simulation:
  - aerodynamics (similar to aerospace industry)
  - crash simulation
  - metal sheet forming
  - noise/vibrational optimization
  - traffic simulation
- Main gains:
  - reduced time to market of new cars;
  - increased quality;
  - reduced need to build expensive prototypes;
  - more efficient & integrated manufacturing processes


Why Parallel Computers?

- Desire to solve bigger, more realistic application problems.
- Fundamental limits are being approached.
- More cost-effective solution.

Example: Weather Prediction (Navier-Stokes with a 3D grid around the Earth)

- 6 variables per cell: temperature, pressure, humidity, 3 wind velocity components
- 1 kilometer cells
- 10 slices -> 5 x 10^9 cells
- 8 bytes per variable per cell: about 2 x 10^11 Bytes = 200 GBytes
- 100 flops performed per cell per time step
- 1 minute time step

  (100 flops/cell x 5 x 10^9 cells) / (1 min x 60 sec/min) = 8 GFlop/s

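The back-of-the-envelope numbers above are easy to reproduce. The following is a minimal C sketch of the same calculation; the figures (5 x 10^9 cells, 6 variables of 8 bytes each, 100 flops per cell, a 60-second time step) come from the slide, while the variable names and the reading of "8 bytes" as 8 bytes per variable are assumptions made here for illustration.

```c
#include <stdio.h>

/* Back-of-the-envelope sizing for the weather example above.
 * The inputs are the slide's figures; the names are illustrative. */
int main(void)
{
    double cells          = 5.0e9;   /* 1 km cells, 10 vertical slices        */
    double variables      = 6.0;     /* temperature, pressure, humidity, wind */
    double bytes_per_var  = 8.0;     /* assumed: 8 bytes per variable         */
    double flops_per_cell = 100.0;   /* work per cell per time step           */
    double step_seconds   = 60.0;    /* 1 minute time step                    */

    /* ~2.4e11 bytes, which the slide rounds to 2 x 10^11 bytes = 200 GBytes */
    double memory_bytes = cells * variables * bytes_per_var;

    /* ~8.3e9 flop/s, i.e. roughly the 8 GFlop/s quoted on the slide */
    double flop_rate = cells * flops_per_cell / step_seconds;

    printf("memory        ~ %.0f GBytes\n", memory_bytes / 1.0e9);
    printf("required rate ~ %.1f GFlop/s\n", flop_rate / 1.0e9);
    return 0;
}
```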

GC Computing Requirements


Grand Challenge Science

- US Office of Science and Technology Policy
- Some definitions: A Grand Challenge is a fundamental problem in science or engineering, with potentially broad economic, political and/or scientific impact, that could be advanced by applying High Performance Computing resources
- The Grand Challenges of High Performance Computing are those projects which are almost too difficult to investigate using current supercomputers!

GC Summary

- Computational science is a relatively new method of investigating the world
- The current generation of high performance computers is making an impact in many areas of science
- New Grand Challenges appearing, e.g., global modeling, computational geography
- Users still want more power!
- ... and all this applies to HPC in business
- Maybe the problems in computational science are not so different from those in business ...?


High-Performance Computing Today

- In the past decade, the world has experienced one of the most exciting periods in computer development
- Computer performance improvements have been dramatic - a trend that promises to continue for the next several years.
- One reason for the improved performance is the rapid advance in microprocessor technology.
- Microprocessors have become smaller, denser, and more powerful.
- If cars had made equal progress, you could buy a car for a few dollars, drive it across the country in a few minutes, and "park" the car in your pocket!
- The result is that microprocessor-based supercomputing is rapidly becoming the technology of preference in attacking some of the most important problems of science and engineering.

Growth in Performance in the 1990's

[Figure: performance in Mflop/s versus year, 1980-1994, for floating-point hardware from the 8087, 80287, 80387, and 6881 coprocessors through the R2000, i860, RS 6000/540, RS 6000/590, R8000, and Alpha microprocessors, alongside the Cray 1S, X-MP, Y-MP, Cray 2, C-90, and T-90.]


TOP500 - CPU Technology

[Figure: number of TOP500 systems per list, 6/93 to 11/95, by processor technology: ECL, proprietary CMOS, and off-the-shelf CMOS. Source: Universität Mannheim.]

Scalable Multiprocessors

What is Required?
- Must scale the local memory bandwidth linearly.
- Must scale the global interprocessor communication bandwidth.
- Scaling memory bandwidth cost-effectively requires separate, distributed memories.
- Cost-effectiveness also requires the best price-performance in individual processors.

What we get
- Compelling Price/Performance
- Tremendous scalability
- Tolerable entry price
- Tackle intractable problems


The Maturation of Highly Parallel Technology

- Affordable parallel systems now out-perform the best conventional supercomputers.
- Performance per dollar is particularly favorable.
- The field is thinning to a few very capable systems.
- Reliability is greatly improved.
- Third-party scientific and engineering applications are appearing.
- Business applications are appearing.
- Commercial customers, not just research labs, are acquiring systems.

Cray v Cray

- Cray Research Inc. v Cray Computer Company
- CRI: founded by Seymour Cray in 1972, the father of the supercomputer
- Business based on vector supercomputers & later MPP
  - Cray 1 '76, X-MP '82, Y-MP '87, C90 '92, J90 '93, T90 '95, ...
  - Cray 1 '76, Cray 2 '85, Cray 3?
  - T3D '94, T3E '96, ...
- Seymour Cray left to form CCC in 1989 to develop exotic processor technology (Cray 3)
- 1994: CCC went bust
- 1995: CRI returned to profit + huge order backlog


Silicon Graphics Inc. (SGI)

- The new kids on the block ...
- Founded in 1981 as a Stanford University spin-out
- Sales originally based on graphics workstations
  - graphics done in hardware
  - exception to the rule of custom-built chips being less cost-effective than general-purpose processors running software
- All machines use mass-produced processors from MIPS Computer Systems (now an SGI subsidiary)
- Aggressively marketed

SGI Today

- No longer just biding their time
- New markets: move away from graphics workstations to general-purpose HPC: introduction of parallelism
- Current: POWER CHALLENGE
- Aim: sell affordable / accessible / entry-level / scalable HPC
- Market position: 23% of machines in the "Top 500" list
- Interesting asides:
  - MIPS announce a deal to supply processors for the next generation of Nintendo machines: HPC feeding into the mainstream
  - Feb. 26, 1996: SGI buy 75% of CRI stock: low-end HPC having a strong influence on high-end HPC


The Giants

- IBM: released the SP2 in 1994, based on workstation chips
  - Market position: 21% of machines in the "Top 500" list
- DEC: Memory Channel architecture released in 1994, building on networking and workstation processor experience
  - Market position: 3% of machines in the "Top 500" list
- Intel: early experiences with hypercube machines 1982-90; 1995: won the contract for the US Government "Teraflops machine"
  - Market position: 5% of machines in the "Top 500" list
- HP Convex: HP bought Convex in 1994, to bring together workstation knowledge & HPC
  - Market position: 4% of machines in the "Top 500" list
- Others: Fujitsu 7%, NEC 8%, Hitachi 3%, Tera, Meiko 2%
- ... but how many of them are making a profit in MPP systems?

Scientific Computing: 1986 vs. 1996

- 1986:
  1. Minisupercomputers (1 - 20 Mflop/s): Alliant, Convex, DEC.
  2. Parallel vector processors (PVP) (20 - 2000 Mflop/s): CRI, CDC, IBM.
- 1996:
  1. PCs (200 Mflop/s): Intel Pentium Pro
  2. RISC workstations (10 - 1000 Mflop/s): DEC, HP, IBM, SGI, Sun.
  3. RISC-based symmetric multiprocessors (SMP) (0.5 - 15 Gflop/s): HP-Convex, DEC, and SGI-CRI.
  4. Parallel vector processors (1 - 250 Gflop/s): SGI-CRI, Fujitsu, and NEC.
  5. Highly parallel processors (1 - 250 Gflop/s): HP-Convex, SGI-CRI, Fujitsu, IBM, NEC, Hitachi


[Figure: peak performance in Flops, 1950-2000, from ENIAC, Mark I, UNIVAC, LARC, IBM 704, Stretch, and the CDC 1604/6600/7600 through ILLIAC IV, the Cray-1, X-MP, Y-MP, Cray-2, and C-90, CM-2, and the Intel Delta and Paragon, approaching a Teraflop; scalar, vector, multiprocessor, and massively parallel eras built from relays, vacuum tubes, transistors, and integrated circuits.]

[Figure: Linpack-HPC Gflop/s, solving a system of dense linear equations (Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory), 1980-1995: from the Cray 1, Cray X-MP, and Cray Y-MP through the NEC SX-2 and SX-3, Fujitsu VP-2600 and VPP-500, TMC CM-2 and CM-5, and Intel Delta and Paragon (6788 proc), up to the Hitachi CP-PACS (2048 proc, 360 Gflop/s).]

[Figure: Performance Improvement for Scientific Computing Problems, 1970-2000: speed-up factor derived from hardware (vector supercomputers) and speed-up factor derived from computational methods (sparse Gaussian elimination, Gauss-Seidel, successive over-relaxation, conjugate gradient, multi-grid).]

Department of Energy's Accelerated Strategic Computing Initiative

- 5-year, $1B program designed to deliver tera-scale computing capability.
- "Stockpile Stewardship" - safe and reliable maintenance of the nation's nuclear arsenal in the absence of nuclear testing.
- Advanced computations, specifically 3-D modeling and simulation capability, are viewed as the backbone of "stockpile stewardship".
- 5 generations of HPC will be delivered over the lifetime of the program.
- First machine is a single massively parallel 1.8 Tflop/s computer: an Intel Paragon based on 9000 200 Mflop/s Pentium processors, delivered to Sandia Labs by the end of 1996.
- Second machine is a $93M system from IBM consisting of clusters of shared-memory processors; the 3 Tflop/s system is scheduled for demonstration in December 1998 at LLNL.
- Third machine is a NUMA system from SGI-CRI, scheduled for LANL.
- Remaining two machines will deliver capability in the 30- and 100-Tflop/s range.


Virtual Environments

- When the number crunchers finish crunching, the user is faced with the mammoth task of making sense of the data. As visualization and computation become ever more closely coupled, new environments for scientific discovery emerge: virtual environments.

  [A screenful of raw simulation output follows on the slide - row after row of numbers such as
   0.32E-08 0.00E+00 0.00E+00 0.00E+00 0.38E-06 0.13E-05 0.22E-05 0.33E-05 0.59E-05 0.11E-04 ...]

  Do they make any sense?

Alternative Supercomputing Resources

- Vast numbers of under-utilized workstations available to use.
- Huge numbers of unused processor cycles and resources that could be put to good use in a wide variety of application areas.
- Reluctance to buy supercomputers due to their cost and short life span.
- Distributed computer resources "fit" better into today's funding model.

MIMD, multicomputer: networked workstations

- Enabling software technology: PVM (Parallel Virtual Machine), available from [email protected]
- Enabling software technology: MPI (Message Passing Interface), available from [email protected] (see the sketch below)
- Very active research area; about 150 software products; catalog available (NHSE)
- Enabling hardware technology: high-bandwidth interconnect is not here yet;
  - Ethernet: msec latencies and 100's of Kbyte/sec bandwidth are insufficient
  - other technology is on the verge of becoming available: HIPPI products, Fibre Channel, ATM.


THE METACOMPUTER: ONE FROM MANY

- Birth of a Concept
- The term "metacomputing" was coined around 1987 by NCSA Director Larry Smarr. But the genesis of metacomputing took place years earlier.
- The goal for the research community was to provide a "Seamless Web" linking the user interface on the workstation and supercomputers.

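Since MPI is named above as an enabling software technology for a networked-workstation metacomputer, here is a minimal sketch of what an MPI-1 message-passing program looks like. It assumes only a standard MPI installation; the echoed value and the process layout are made up for illustration.

```c
/* Minimal MPI-1 sketch: rank 0 sends one double to every other rank,
 * and each rank echoes it back. Compile with an MPI C compiler (e.g. mpicc)
 * and launch across workstations with the installation's mpirun. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, p;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (p = 1; p < size; p++) {
            double out = 3.14, in = 0.0;
            MPI_Send(&out, 1, MPI_DOUBLE, p, 0, MPI_COMM_WORLD);
            MPI_Recv(&in, 1, MPI_DOUBLE, p, 1, MPI_COMM_WORLD, &status);
            printf("echo from rank %d: %g\n", p, in);
        }
    } else {
        double buf = 0.0;
        MPI_Recv(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&buf, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Even this toy exchange shows the flavor of the model: every transfer, tag, and partner rank is spelled out by hand, which is why higher-level libraries and repositories matter so much here.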

Java

- Java likely to be a dominant language.
  - C++ like language
  - Taking the web/world by storm
  - No pointers or memory deallocation
  - Portability achieved via an abstract machine
- Java is a convenient user interface builder which allows one to develop customized interfaces quickly.
- The Internet is slow and getting slower; many activities focus on intranets.


MetaComputer Summary

- Many parts and functions of a metacomputer are being tested on a small scale today.
- Much research remains to create a balanced system of computational power and mass storage connected by high-speed networks.
- The ultimate goal is to have a Scalable Distributed Operating System.

Open Universal WebWindows: A Revolution in the Software Industry

- In future, one will not write software for Windows95/NT, UNIX, Digital VMS, etc.
- Rather, one will write software for WebWindows, defined as the operating environment for the World Wide Web
- WebWindows builds on top of Web Server and Web Client open interfaces, as in
  - the CGI interface for servers
  - Java or equivalent applet technology for clients
- Applications written for WebWindows will be portable to all computers running Web Servers or Clients, which hide hardware and native OS specifics.


Java Linpack Benchmark

- Should Java be taken seriously for numerical computations?
- 3 months ago the fastest Java performance was 1 Mflop/s on a 600 Mflop/s processor.
- Top performer today is 13.7 Mflop/s for a P6 using Netscape 3.0 (JIT).
- URL: http://www.netlib.org/benchmark/linpackjava/

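For context on how a figure like 13.7 Mflop/s is produced: the Linpack benchmark times the solution of a dense n-by-n linear system and converts the elapsed time into a rate using the nominal operation count 2n^3/3 + 2n^2. The sketch below shows only that conversion (the solver itself is omitted); the function name and the sample problem size and timing are illustrative, not the applet's actual values.

```c
#include <stdio.h>

/* Linpack-style rating: the benchmark charges 2n^3/3 + 2n^2 floating-point
 * operations for factoring and solving a dense n x n system and divides
 * by the measured wall-clock time. */
double linpack_mflops(int n, double seconds)
{
    double dn  = (double)n;
    double ops = (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
    return ops / (seconds * 1.0e6);
}

int main(void)
{
    /* Illustrative numbers only: a 1000 x 1000 solve timed at 49 seconds
     * would be reported as roughly 13.6 Mflop/s. */
    printf("%.1f Mflop/s\n", linpack_mflops(1000, 49.0));
    return 0;
}
```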

Metacomputing in the Future: The Future Trends...

- Long term is hard to predict - see changes over the last 5 years!!
- Can see trends, however...
- Ranging from supercomputing to Personal Digital Assistants.


Metacomputing in the Future: Hardware Trends (5-10 Years) - Computers

- Millions (100-300) of "settop" boxes
- One in every US household
- More worldwide

Metacomputing in the Future: Hardware Trends (5-10 Years) - Networks

- Networks (1-20 MByte/s) fulfill the needs of the "home" entertainment industry.
- Technologies ranging from high-bandwidth fibre to electromagnetic types such as microwave.


Metacomputing in the Future: Software Trends (5-10 Years)

- Very hard to predict in a relatively short term - JAVA has been a product for about a year!!
- Ubiquitous and pervasive WWW/JAVA-like.
- Can forget about the underlying h/w and OS.
- Metacomputing "plug-ins"
- Micro-kernel-like JAVA-based servers with add-on services that can support Metacomputing (load balancing, migration, checkpointing, etc...)

Highly Parallel Supercomputing: Where Are We?

1. Performance:
   - Sustained performance has dramatically increased during the last year.
   - On most applications, sustained performance per dollar now exceeds that of conventional supercomputers.
   But
   - Conventional systems are still faster on some applications.

2. Languages and compilers:
   - Standardized, portable, high-level languages such as HPF, PVM and MPI are available.
   But
   - Initial HPF releases are not very efficient.
   - Message passing programming is tedious and hard to debug (see the sketch after this section).
   - Programming difficulty remains a major obstacle to usage by mainstream scientists.


Highly Parallel Supercomputing: Where Are We?

1. Operating systems:
   - Robustness and reliability are improving.
   - New system management tools improve system utilization.
   But
   - Reliability is still not as good as on conventional systems.

2. I/O subsystems:
   - New RAID disks, HiPPI interfaces, etc. provide substantially improved I/O performance.
   But
   - I/O remains a bottleneck on some systems.

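To make the "tedious" point concrete, here is a minimal MPI sketch of a global dot product under an assumed block decomposition: the data distribution and the communication both have to be written out explicitly by the programmer (the array sizes and values are arbitrary), whereas a data-parallel language such as HPF expresses the same computation as an ordinary loop plus distribution directives.

```c
#include <stdio.h>
#include <mpi.h>

#define N_LOCAL 1000   /* assumed: each process owns a slice of this length */

int main(int argc, char **argv)
{
    double x[N_LOCAL], y[N_LOCAL], local = 0.0, global = 0.0;
    int i, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The decomposition is explicit: each rank initializes and
     * reduces only its own slice of the distributed vectors. */
    for (i = 0; i < N_LOCAL; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
        local += x[i] * y[i];
    }

    /* The communication is explicit too: combine the partial sums. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot product = %g\n", global);

    MPI_Finalize();
    return 0;
}
```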

The Importance of Standards I

- An ongoing thread of research in scientific computing is the efficient solution of large problems.
- Various mechanisms have been developed to perform computations across diverse platforms. The most common mechanism involves software libraries.
- Some software libraries are highly optimized for only certain platforms and do not provide a convenient interface to other computer systems.
- Other libraries demand considerable programming effort from the user, who may not have the time to learn the required programming techniques.
- While a limited number of tools have been developed to alleviate these difficulties, such tools themselves are usually available only on a limited number of computer systems.


Current Situation... Software

- Writing programs for MPP is hard ...
- But ... it is a one-off effort if written in a standard language
- Past lack of parallel programming standards ...
  - ... has restricted uptake of the technology to "enthusiasts"
  - ... has reduced portability over a range of current architectures and between future generations
- Now standards exist: PVM, MPI & HPF, which ...
  - ... allow users & manufacturers to protect their software investment
  - ... encourage the growth of a "third party" parallel software industry & parallel versions of widely used codes

The Future of HPC

- The expense of being different is being replaced by the economics of being the same
- HPC needs to lose its "special purpose" tag
- Still has to bring about the promise of scalable general-purpose computing ...
- ... but it is dangerous to ignore this technology
- Final success when MPP technology is embedded in desktop computing
- Yesterday's HPC is today's mainframe is tomorrow's workstation
- HPC systems containing all the programming tools / environments / languages / libraries / applications packages found on desktops


The Importance of Standards II

Hardware
- processors
  - commodity RISC processors
- interconnects
  - high-bandwidth, low-latency communications protocol
  - no de-facto standard yet (ATM, Fibre Channel, HPPI, FDDI)
- growing demand for a total solution:
  - robust hardware + usable software