High Performance Computing and the Computational Grid

Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

September 23, 2003

Technology Trends: Microprocessor Capacity

Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months. This prediction is called "Moore's Law." Microprocessors have become smaller, denser, and more powerful, and the trend applies not just to processors but to bandwidth, storage, etc.: 2X transistors/chip every 1.5 years, 2X memory and processor speed, and ½ size, cost, and power every 18 months.
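A minimal sketch (plain Python, illustrative numbers only, not from the slide) of the doubling rule stated above, density(t) = density(0) * 2^(t / 1.5) with t in years:

def transistor_density(initial, years, doubling_period_years=1.5):
    # Moore's Law as a simple exponential: double every 1.5 years (18 months).
    return initial * 2 ** (years / doubling_period_years)

# Example: 1 million transistors in year 0 grows to roughly 100 million
# after 10 years under an 18-month doubling period.
print(transistor_density(1e6, 10))   # ~1.0e8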

1 Moore’s Law Moore’s Law Super Scalar/Vector/Parallel 1 PFlop/s Earth Parallel Simulator

ASCI White ASCI Red 1 TFlop/s Pacific TMC CM-5 Cray T3D

Vector TMC CM-2 Cray 2 1 GFlop/s Cray X-MP Super Scalar Cray 1

1941 1 (Floating Point operations / second, Flop/s) CDC 7600 IBM 360/195 1945 100 1 MFlop/s Scalar 1949 1,000 (1 KiloFlop/s, KFlop/s) CDC 6600 1951 10,000 1961 100,000 1964 1,000,000 (1 MegaFlop/s, MFlop/s) IBM 7090 1968 10,000,000 1975 100,000,000 1987 1,000,000,000 (1 GigaFlop/s, GFlop/s) 1992 10,000,000,000 1993 100,000,000,000 1 KFlop/s 1997 1,000,000,000,000 (1 TeraFlop/s, TFlop/s) UNIVAC 1 2000 10,000,000,000,000 EDSAC 1 2003 35,000,000,000,000 (35 TFlop/s) 3 1950 1960 1970 1980 1990 2000 2010

♦ H. Meuer, H. Simon, E. Strohmaier, & JD
¾ Listing of the 500 most powerful computers in the world
¾ Yardstick: Rmax from LINPACK MPP benchmark (Ax=b, dense problem, TPP performance)
♦ Updated twice a year
¾ SC'xy in the States in November
¾ Meeting in Mannheim, Germany in June
♦ All data available from www.top500.org
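The Rmax yardstick comes from timing the solution of a dense linear system Ax=b. A minimal sketch of how such a rate can be measured, assuming NumPy and the conventional 2/3 n^3 + 2 n^2 flop count for LU factorization plus the triangular solves (an illustration, not the actual LINPACK benchmark code):

import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.time()
x = np.linalg.solve(A, b)                 # LU factorization + solve
elapsed = time.time() - t0

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # standard LINPACK operation count
print("approx. rate: %.2f GFlop/s" % (flops / elapsed / 1e9))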

TOP500 List, June 2003 (Top 10)

Rank | Manufacturer | Computer | Rmax (GFlop/s) | Installation Site | Year | # Proc | Rpeak (GFlop/s)
1 | NEC | Earth-Simulator | 35860 | Earth Simulator Center, Yokohama | 2002 | 5120 | 40960
2 | Hewlett-Packard | ASCI Q - AlphaServer SC ES45/1.25 GHz | 13880 | Los Alamos National Laboratory, Los Alamos | 2002 | 8192 | 20480
3 | Linux NetworX/Quadrics | MCR Cluster 2.4 GHz - Quadrics | 7634 | Lawrence Livermore National Laboratory, Livermore | 2002 | 2304 | 11060
4 | IBM | ASCI White, SP Power3 375 MHz | 7304 | Lawrence Livermore National Laboratory, Livermore | 2000 | 8192 | 12288
5 | IBM | SP Power3 375 MHz 16 way | 7304 | NERSC/LBNL, Berkeley | 2002 | 6656 | 9984
6 | IBM/Quadrics | xSeries Cluster Xeon 2.4 GHz - Quadrics | 6586 | Lawrence Livermore National Laboratory, Livermore | 2003 | 1920 | 9216
7 | Fujitsu | PRIMEPOWER HPC2500 (1.3 GHz) | 5406 | National Aerospace Lab, Tokyo | 2002 | 2304 | 11980
8 | Hewlett-Packard | rx2600 Itanium2 1 GHz Cluster - Quadrics | 4881 | Pacific Northwest National Laboratory, Richland | 2003 | 1540 | 6160
9 | Hewlett-Packard | AlphaServer SC ES45/1 GHz | 4463 | Pittsburgh Supercomputing Center, Pittsburgh | 2001 | 3016 | 6032
10 | Hewlett-Packard | AlphaServer SC ES45/1 GHz | 3980 | Commissariat a l'Energie Atomique (CEA), Bruyeres-le-Chatel | 2001 | 2560 | 5120

A Tour de Force in Engineering

♦ Homogeneous, centralized, proprietary, expensive!
♦ Target application: CFD (weather, climate, earthquakes)
♦ 640 NEC SX/6 nodes (mod)
¾ 5120 CPUs with vector operations
¾ Each CPU 8 GFlop/s peak
♦ 40 TFlop/s (peak)
♦ $1/2 billion for machine & building
♦ Footprint of 4 tennis courts
♦ 7 MWatts
¾ Say 10 cents/kWhr: $16.8K/day = ~$6M/year! (see the sketch after this list)
♦ Expected to stay on top of the TOP500 until a 60-100 TFlop/s ASCI machine arrives

♦ From the TOP500 (June 2003)
¾ Performance of the Earth Simulator ≈ Σ of the next top 4 computers
¾ ~10% of the performance of all 500 machines on the list
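A back-of-the-envelope check of the power-cost figures quoted above (the 7 MW draw and the 10 cents/kWh rate are the slide's own assumptions):

power_mw = 7.0                              # assumed average draw, MW
rate_per_kwh = 0.10                         # assumed electricity price, $/kWh

kwh_per_day = power_mw * 1000 * 24          # 168,000 kWh/day
cost_per_day = kwh_per_day * rate_per_kwh   # ~$16,800/day
cost_per_year = cost_per_day * 365          # ~$6.1M/year

print(cost_per_day, cost_per_year)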

TOP500 – Performance – June 2003

[Chart: TOP500 performance growth, June 1993 to June 2003. Total performance (SUM) grew from 1.17 TFlop/s to 374 TFlop/s; the #1 system (N=1) from 59.7 GFlop/s to 35.8 TFlop/s (NEC Earth Simulator); the #500 system (N=500) from 0.4 GFlop/s to 244 GFlop/s. Other #1 systems marked include the Fujitsu 'NWT' (NAL), Intel ASCI Red (Sandia), and IBM ASCI White (LLNL); a "My Laptop" reference line is also shown.]

SETI@home: Global Distributed Computing

♦ Running on 500,000 PCs, ~1300 CPU years per day
¾ 1.3M CPU years so far
♦ Sophisticated data & signal processing analysis
♦ Distributes datasets from the Arecibo Radio Telescope


SETI@home

♦ Uses thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence
♦ When a participant's computer is idle or otherwise being wasted, the software downloads a ~0.5 MB chunk of data for analysis; each chunk takes about 3 TFlop of computation and roughly 15 hours per client
♦ The results of this analysis are sent back to the SETI team and combined with those of thousands of other participants
♦ Largest distributed computation project in existence
¾ Averaging 55 TFlop/s
♦ Today a number of companies are trying this approach for profit
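A quick consistency check of the throughput figures quoted above, using only the slide's own numbers:

pcs = 500_000
aggregate_flops = 55e12            # "averaging 55 TFlop/s"

per_pc = aggregate_flops / pcs     # ~110 MFlop/s sustained per participating PC
cpu_years_per_day = pcs / 365.25   # ~1,370 if every PC ran around the clock,
                                   # in line with the quoted ~1,300 CPU years/day
print(per_pc / 1e6, cpu_years_per_day)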

♦ Google query attributes
¾ 150M queries/day (2,000/second)
¾ 100 countries
¾ 3B documents in the index
♦ Data centers
¾ 15,000 Linux systems in 6 data centers
¾ 15 TFlop/s and 1,000 TB total capability
¾ 40-80 1U/2U servers per cabinet
¾ 100 Mb Ethernet switches per cabinet with Gigabit Ethernet uplink
¾ Grown from 4,000 systems (June 2000), when it was 18M queries/day
♦ Performance and operation
¾ Simple reissue of failed commands to new servers
¾ No performance debugging; problems are not reproducible
♦ The eigenvalue problem behind PageRank (see Cleve's Corner)
¾ n = 2.7 x 10^9
¾ Forward and back links: entry (i, j) is 1 if there is a hyperlink from page i to page j
¾ Form the transition probability matrix of the Markov chain; the matrix is not sparse, but it is a rank-one modification of a sparse matrix
¾ The largest eigenvalue is equal to one; we want the corresponding eigenvector (the state vector of the Markov chain)
¾ The elements of this eigenvector are Google's PageRank (Larry Page)
♦ When you search: Google has an inverted index of the web pages
¾ words, and links to the pages that contain those words
¾ for your query of words: find the links, then order the lists of pages by their PageRank

Source: Monika Henzinger, Google & Cleve Moler
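A minimal sketch of the computation described above: power iteration toward the dominant eigenvector of the Google matrix, here on a tiny made-up link graph (the damping factor 0.85 and the 4-page graph are illustrative assumptions, not figures from the slide). It uses the fact that the full matrix is a rank-one modification of a sparse matrix, so only the sparse part is ever multiplied:

import numpy as np

# Toy web graph: L[i, j] = 1 if page i links to page j (illustrative only).
L = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = L.shape[0]
d = 0.85                              # damping factor (assumed)
out_deg = L.sum(axis=1)
P = (L / out_deg[:, None]).T          # column-stochastic transition matrix

x = np.ones(n) / n                    # start from the uniform distribution
for _ in range(100):                  # power iteration; largest eigenvalue is 1
    x = d * (P @ x) + (1 - d) / n     # rank-one "teleport" term added cheaply
    x /= x.sum()

print(x)                              # the PageRank vector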

Grid Computing is About …

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

[Diagram: "Telescience Grid" (courtesy of Mark Ellisman), linking data acquisition, analysis, advanced visualization, computational resources, imaging instruments, and large-scale databases.]

The Grid Architecture Picture

User Portal: Application, Problem Solving Environments, Science Portals, Grid Access & Info

Service Layers: Resource Discovery, Co-Scheduling & Allocation, Fault Tolerance, Authentication, Naming & Files, Events

Resource Layer: Computers, Databases, Online instruments, Software, Remote instruments, High-speed networks and routers

Some Grid Requirements – User Perspective

♦ Single sign-on: authentication to any Grid resource authenticates for all others
♦ Single compute space: one scheduler for all Grid resources
♦ Single data space: can address files and data from any Grid resource
♦ Single development environment: Grid tools and libraries that work on all Grid resources


Extensible TeraGrid Facility (ETF): Becoming Operational

[Diagram: the ETF sites connected by an extensible backplane network (30-40 Gb/s) through hubs in Los Angeles and Chicago.
Caltech (data collection analysis): 0.4 TF IA-64, IA-32 Datawulf, 80 TB storage
ANL (visualization): 1.25 TF IA-64, 96 visualization nodes, 20 TB storage
SDSC (data intensive): 4 TF IA-64, DB2/Oracle servers, 500 TB disk storage, 6 PB tape storage, 1.1 TF Power4
NCSA (compute intensive): 10 TF IA-64, 128 large-memory nodes, 230 TB disk storage, GPFS and data mining
PSC (compute intensive): 6 TF EV68, 71 TB storage, 0.3 TF EV7 shared memory, 150 TB storage server]

Atmospheric Sciences Grid

[Diagram: components of an atmospheric sciences grid: real-time data feeding a data fusion stage; a general circulation model driving a regional weather model; photo-chemical pollution, particle dispersion, and bushfire models; supported by topography and vegetation databases and an emissions inventory.]

Standard Implementation: Change Models

[Diagram: the same atmospheric grid, annotated with the coupling technologies: MPI within the general circulation, regional weather, pollution, dispersion, and bushfire models, and GASS/GridFTP/GRC and GASS transfers linking the models to the topography, vegetation, and emissions databases.]

The Computing Continuum

[Spectrum: from tightly coupled special-purpose machines, through clusters and "Grids", to highly parallel, loosely coupled "SETI / Google"-style systems.]

♦ Each strikes a different balance
¾ computation/communication coupling
♦ Implications for execution efficiency
♦ Applications have diverse needs
¾ computing is only one part of the story!


Globus Grid Services

♦ The Globus toolkit provides a range of basic Grid services
¾ Security, information, fault detection, communication, resource management, ...
♦ These services are simple and orthogonal
¾ Can be used independently; mix and match
¾ Programming model independent
♦ For each there are well-defined APIs
♦ Standards are used extensively
¾ E.g., LDAP, GSS-API, X.509, ...
♦ You don't program in Globus; it's a set of tools, like Unix

The Grid


NetSolve Grid Enabled Server

♦ NetSolve is an example of a Grid-based hardware/software/data server
♦ Based on a Remote Procedure Call model, but with …
¾ resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
♦ Ease of use is paramount
♦ It's about providing transparent access to resources


NetSolve: The Big Picture

[Diagram: client applications (Matlab, Octave, Scilab, Mathematica, C, Fortran, Excel) talk to the NetSolve agent(s), which consult a schedule database and manage a pool of servers S1-S4 and an IBP depot.]

No knowledge of the grid required; RPC-like.

NetSolve: The Big Picture

[Diagram, continued: the client submits its request with input data A and B.]

No knowledge of the grid required; RPC-like.

NetSolve: The Big Picture

[Diagram, continued: a handle is passed back to the client.]

No knowledge of the grid required; RPC-like.

NetSolve: The Big Picture

[Diagram, continued: the request is directed to server S2 ("Request S2!"); S2 executes Op(C, A, B) on the inputs A and B, and the answer C is returned to the client.]

No knowledge of the grid required; RPC-like.

[Diagram: the user's call to NetSolve(…) goes from the client to the agent within a trusted environment; the agent selects a hardware server (HWS) and a software server (SWS); software descriptions are exchanged with a software repository; the results are passed back to the user.]

NetSolve Agent

♦ Name server for the NetSolve system
♦ Information service
¾ client users and administrators can query the hardware and software services available
♦ Resource scheduler
¾ maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources


NetSolve Agent

♦ Resource scheduling (cont'd):
¾ CPU performance (LINPACK)
¾ Network bandwidth, latency
¾ Server workload
¾ Problem size / algorithm complexity
¾ Calculates a "time to compute" for each appropriate server (see the sketch below)
¾ Notifies the client of the most appropriate server
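A minimal sketch of the kind of "time to compute" estimate described above (a simplified cost model of my own, not NetSolve's actual formula): estimated time = network latency + data size / bandwidth + operation count / (server speed x free fraction).

def time_to_compute(flops, data_bytes, server):
    # Hypothetical estimate: ship the data, run the problem, return the result.
    network = 2 * server["latency_s"] + data_bytes / server["bandwidth_Bps"]
    compute = flops / (server["peak_flops"] * (1.0 - server["workload"]))
    return network + compute

servers = [
    {"name": "S1", "latency_s": 0.01, "bandwidth_Bps": 10e6,
     "peak_flops": 1e9, "workload": 0.7},
    {"name": "S2", "latency_s": 0.05, "bandwidth_Bps": 100e6,
     "peak_flops": 4e9, "workload": 0.1},
]

# Pick the server with the smallest estimate for a dense n x n solve.
n = 3000
flops = (2.0 / 3.0) * n**3
data = 8 * (n * n + n)                # A and b as 8-byte doubles
best = min(servers, key=lambda s: time_to_compute(flops, data, s))
print(best["name"])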


NetSolve Client

♦ Function-based interface
♦ The client program embeds calls from NetSolve's API to access additional resources
♦ Interfaces available for C, Fortran, Matlab, Octave, Mathematica, …
♦ Networking interactions are opaque to the user
♦ NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, … (a sketch of the calling pattern follows)
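To make the RPC-style usage concrete, here is a self-contained mock in Python of blocking versus non-blocking ("handle") invocation; the class and function names are hypothetical illustrations of the pattern, not NetSolve's actual API:

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def remote_solve(A, b):
    # Stand-in for work that a Grid server would perform remotely.
    return np.linalg.solve(A, b)

class ToyGridClient:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, A, b):
        """Blocking call: returns the result directly."""
        return remote_solve(A, b)

    def call_async(self, A, b):
        """Non-blocking call: returns a handle (future) immediately."""
        return self._pool.submit(remote_solve, A, b)

client = ToyGridClient()
A, b = np.random.rand(100, 100), np.random.rand(100)
x = client.call(A, b)              # blocking
handle = client.call_async(A, b)   # non-blocking; do other work here
x2 = handle.result()               # collect the answer later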

Hiding the Parallel Processing

♦ The user may be unaware of any parallel processing

♦ NetSolve takes care of starting the message passing system, distributing the data, and returning the results

Basic Usage Scenarios

♦ Grid-based numerical library routines
¾ The user does not need the software library on their own machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
♦ Task farming applications
¾ "Pleasantly parallel" execution, e.g. parameter studies (see the sketch after this list)
♦ Remote application execution
¾ Complete applications, with the user specifying input parameters and receiving output
♦ "Blue collar" Grid-based computing
¾ Does not require deep knowledge of network programming
¾ Level of expressiveness is right for many users
¾ The user can set things up; no "su" required
¾ In use today, with up to 200 servers in 9 countries
♦ Can plug into Globus, Condor, NINF, …
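A minimal sketch of the task-farming / parameter-study pattern mentioned above (plain Python with a local process pool; NetSolve's own farming interface would distribute the same independent tasks across Grid servers):

from concurrent.futures import ProcessPoolExecutor

def simulate(parameter):
    # Stand-in for an expensive, independent computation on one parameter value.
    return parameter, sum(i * parameter for i in range(100_000))

if __name__ == "__main__":
    parameters = [0.1 * k for k in range(50)]
    with ProcessPoolExecutor() as farm:          # each task can run anywhere
        results = dict(farm.map(simulate, parameters))
    print(len(results), "runs completed")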

University of Tennessee Deployment: Scalable Intracampus Research Grid (SInRG)

♦ Federated ownership: CS, Chemical Engineering, Medical School, Computational Ecology, Electrical Engineering
♦ Real applications, middleware development, logistical networking

[Map caption: The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, the National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives, including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, the Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX). The UT campus consists of a meshed ATM OC-12 being migrated to switched Gigabit by early 2002.]

NetSolve: Things Not Touched On

♦ Integration with other NMI tools
¾ Globus, Condor, Network Weather Service
♦ Security
¾ Using Kerberos V5 for authentication
♦ Separate server characteristics
¾ Hardware and software servers
♦ Monitoring the NetSolve network
¾ Track and monitor usage
♦ Fault tolerance
♦ Local / global configurations
♦ Dynamic nature of servers
♦ Automated adaptive algorithm selection
¾ Dynamically determine the best algorithm based on system status and the nature of the user's problem
♦ NetSolve evolving into GridRPC
¾ Being worked on under the GGF, jointly with NINF

If You Want to Participate …


Grids vs. Capability vs. Cluster Computing

♦ Not an "either/or" question
¾ Each addresses different needs
¾ Each is part of an integrated solution
♦ Grid strengths
¾ Coupling necessarily distributed resources: instruments, software, hardware, archives, and people
¾ Eliminating time and space barriers: remote resource access and capacity computing
¾ Grids are not a cheap substitute for capability HPC
♦ Capability computing strengths
¾ Supporting foundational computations: terascale and petascale "nation scale" problems
¾ Engaging tightly coupled computations and teams
♦ Clusters
¾ Low cost, group solution
¾ Potential hidden costs

Collaborators / Support

♦ TOP500
¾ H. Meuer, Mannheim U
¾ H. Simon, NERSC
¾ E. Strohmaier, NERSC
♦ NetSolve
¾ Sudesh Agrawal, UTK
¾ Keith Seymour, UTK
¾ Karin Sagi, UTK
¾ Zhiao Shi, UTK
¾ Henri Casanova, UCSD
♦ Thanks: NSF Next Generation Software program

♦ For more information …
♦ Many opportunities within my group at Tennessee
