High Performance Computing and the Computational Grid

Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

September 23, 2003

Technology Trends: Microprocessor Capacity

Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months. This prediction is called "Moore's Law." Microprocessors have become smaller, denser, and more powerful, and the trend applies not just to processors but to bandwidth, storage, etc.: 2X transistors/chip every 1.5 years, 2X memory and processor speed, and ½ size, cost, and power every 18 months.
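A minimal sketch (plain Python, illustrative numbers only, not from the slide) of the doubling rule stated above, density(t) = density(0) * 2^(t / 1.5) with t in years:

def transistor_density(initial, years, doubling_period_years=1.5):
    # Moore's Law as a simple exponential: double every 1.5 years (18 months).
    return initial * 2 ** (years / doubling_period_years)

# Example: 1 million transistors in year 0 grows to roughly 100 million
# after 10 years under an 18-month doubling period.
print(transistor_density(1e6, 10))   # ~1.0e8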

1 Moore’s Law Moore’s Law Super Scalar/Vector/Parallel 1 PFlop/s Earth Parallel Simulator

ASCI White ASCI Red 1 TFlop/s Pacific TMC CM-5 Cray T3D

Vector TMC CM-2 Cray 2 1 GFlop/s Cray X-MP Super Scalar Cray 1

1941 1 (Floating Point operations / second, Flop/s) CDC 7600 IBM 360/195 1945 100 1 MFlop/s Scalar 1949 1,000 (1 KiloFlop/s, KFlop/s) CDC 6600 1951 10,000 1961 100,000 1964 1,000,000 (1 MegaFlop/s, MFlop/s) IBM 7090 1968 10,000,000 1975 100,000,000 1987 1,000,000,000 (1 GigaFlop/s, GFlop/s) 1992 10,000,000,000 1993 100,000,000,000 1 KFlop/s 1997 1,000,000,000,000 (1 TeraFlop/s, TFlop/s) UNIVAC 1 2000 10,000,000,000,000 EDSAC 1 2003 35,000,000,000,000 (35 TFlop/s) 3 1950 1960 1970 1980 1990 2000 2010

♦ H. Meuer, H. Simon, E. Strohmaier, & JD
¾ Listing of the 500 most powerful computers in the world
¾ Yardstick: Rmax from LINPACK MPP benchmark (Ax=b, dense problem, TPP performance)
♦ Updated twice a year
¾ SC'xy in the States in November
¾ Meeting in Mannheim, Germany in June
♦ All data available from www.top500.org
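The Rmax yardstick comes from timing the solution of a dense linear system Ax=b. A minimal sketch of how such a rate can be measured, assuming NumPy and the conventional 2/3 n^3 + 2 n^2 flop count for LU factorization plus the triangular solves (an illustration, not the actual LINPACK benchmark code):

import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.time()
x = np.linalg.solve(A, b)                 # LU factorization + solve
elapsed = time.time() - t0

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # standard LINPACK operation count
print("approx. rate: %.2f GFlop/s" % (flops / elapsed / 1e9))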

TOP500 List, June 2003 (Top 10)

Rank | Manufacturer | Computer | Rmax (GFlop/s) | Installation Site | Year | # Proc | Rpeak (GFlop/s)
1 | NEC | Earth-Simulator | 35860 | Earth Simulator Center, Yokohama | 2002 | 5120 | 40960
2 | Hewlett-Packard | ASCI Q - AlphaServer SC ES45/1.25 GHz | 13880 | Los Alamos National Laboratory, Los Alamos | 2002 | 8192 | 20480
3 | Linux NetworX/Quadrics | MCR Cluster 2.4 GHz - Quadrics | 7634 | Lawrence Livermore National Laboratory, Livermore | 2002 | 2304 | 11060
4 | IBM | ASCI White, SP Power3 375 MHz | 7304 | Lawrence Livermore National Laboratory, Livermore | 2000 | 8192 | 12288
5 | IBM | SP Power3 375 MHz 16 way | 7304 | NERSC/LBNL, Berkeley | 2002 | 6656 | 9984
6 | IBM/Quadrics | xSeries Cluster Xeon 2.4 GHz - Quadrics | 6586 | Lawrence Livermore National Laboratory, Livermore | 2003 | 1920 | 9216
7 | Fujitsu | PRIMEPOWER HPC2500 (1.3 GHz) | 5406 | National Aerospace Lab, Tokyo | 2002 | 2304 | 11980
8 | Hewlett-Packard | rx2600 Itanium2 1 GHz Cluster - Quadrics | 4881 | Pacific Northwest National Laboratory, Richland | 2003 | 1540 | 6160
9 | Hewlett-Packard | AlphaServer SC ES45/1 GHz | 4463 | Pittsburgh Supercomputing Center, Pittsburgh | 2001 | 3016 | 6032
10 | Hewlett-Packard | AlphaServer SC ES45/1 GHz | 3980 | Commissariat a l'Energie Atomique (CEA), Bruyeres-le-Chatel | 2001 | 2560 | 5120

A Tour de Force in Engineering

♦ Homogeneous, centralized, proprietary, expensive!
♦ Target application: CFD (weather, climate, earthquakes)
♦ 640 NEC SX/6 nodes (mod)
¾ 5120 CPUs with vector operations
¾ Each CPU 8 GFlop/s peak
♦ 40 TFlop/s (peak)
♦ $1/2 billion for machine & building
♦ Footprint of 4 tennis courts
♦ 7 MWatts
¾ Say 10 cents/kWhr: $16.8K/day = ~$6M/year! (see the sketch after this list)
♦ Expected to stay on top of the TOP500 until a 60-100 TFlop/s ASCI machine arrives

♦ From the TOP500 (June 2003)
¾ Performance of the Earth Simulator ≈ Σ of the next top 4 computers
¾ ~10% of the performance of all 500 machines on the list
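A back-of-the-envelope check of the power-cost figures quoted above (the 7 MW draw and the 10 cents/kWh rate are the slide's own assumptions):

power_mw = 7.0                              # assumed average draw, MW
rate_per_kwh = 0.10                         # assumed electricity price, $/kWh

kwh_per_day = power_mw * 1000 * 24          # 168,000 kWh/day
cost_per_day = kwh_per_day * rate_per_kwh   # ~$16,800/day
cost_per_year = cost_per_day * 365          # ~$6.1M/year

print(cost_per_day, cost_per_year)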

TOP500 – Performance – June 2003

[Chart: TOP500 performance growth, June 1993 to June 2003. Total performance (SUM) grew from 1.17 TFlop/s to 374 TFlop/s; the #1 system (N=1) from 59.7 GFlop/s to 35.8 TFlop/s (NEC Earth Simulator); the #500 system (N=500) from 0.4 GFlop/s to 244 GFlop/s. Other #1 systems marked include the Fujitsu 'NWT' (NAL), Intel ASCI Red (Sandia), and IBM ASCI White (LLNL); a "My Laptop" reference line is also shown.]

SETI@home: Global Distributed Computing

♦ Running on 500,000 PCs, ~1300 CPU years per day
¾ 1.3M CPU years so far
♦ Sophisticated data & signal processing analysis
♦ Distributes datasets from the Arecibo Radio Telescope


SETI@home

♦ Uses thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence
♦ When a participant's computer is idle or otherwise being wasted, the software downloads a ~0.5 MB chunk of data for analysis; each chunk takes about 3 TFlop of computation and roughly 15 hours per client
♦ The results of this analysis are sent back to the SETI team and combined with those of thousands of other participants
♦ Largest distributed computation project in existence
¾ Averaging 55 TFlop/s
♦ Today a number of companies are trying this approach for profit
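A quick consistency check of the throughput figures quoted above, using only the slide's own numbers:

pcs = 500_000
aggregate_flops = 55e12            # "averaging 55 TFlop/s"

per_pc = aggregate_flops / pcs     # ~110 MFlop/s sustained per participating PC
cpu_years_per_day = pcs / 365.25   # ~1,370 if every PC ran around the clock,
                                   # in line with the quoted ~1,300 CPU years/day
print(per_pc / 1e6, cpu_years_per_day)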

♦ Google query attributes
¾ 150M queries/day (2,000/second)
¾ 100 countries
¾ 3B documents in the index
♦ Data centers
¾ 15,000 Linux systems in 6 data centers
¾ 15 TFlop/s and 1,000 TB total capability
¾ 40-80 1U/2U servers per cabinet
¾ 100 Mb Ethernet switches per cabinet with Gigabit Ethernet uplink
¾ Grown from 4,000 systems (June 2000), when it was 18M queries/day
♦ Performance and operation
¾ Simple reissue of failed commands to new servers
¾ No performance debugging; problems are not reproducible
♦ The eigenvalue problem behind PageRank (see Cleve's Corner)
¾ n = 2.7 x 10^9
¾ Forward and back links: entry (i, j) is 1 if there is a hyperlink from page i to page j
¾ Form the transition probability matrix of the Markov chain; the matrix is not sparse, but it is a rank-one modification of a sparse matrix
¾ The largest eigenvalue is equal to one; we want the corresponding eigenvector (the state vector of the Markov chain)
¾ The elements of this eigenvector are Google's PageRank (Larry Page)
♦ When you search: Google has an inverted index of the web pages
¾ words, and links to the pages that contain those words
¾ for your query of words: find the links, then order the lists of pages by their PageRank

Source: Monika Henzinger, Google & Cleve Moler
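A minimal sketch of the computation described above: power iteration toward the dominant eigenvector of the Google matrix, here on a tiny made-up link graph (the damping factor 0.85 and the 4-page graph are illustrative assumptions, not figures from the slide). It uses the fact that the full matrix is a rank-one modification of a sparse matrix, so only the sparse part is ever multiplied:

import numpy as np

# Toy web graph: L[i, j] = 1 if page i links to page j (illustrative only).
L = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = L.shape[0]
d = 0.85                              # damping factor (assumed)
out_deg = L.sum(axis=1)
P = (L / out_deg[:, None]).T          # column-stochastic transition matrix

x = np.ones(n) / n                    # start from the uniform distribution
for _ in range(100):                  # power iteration; largest eigenvalue is 1
    x = d * (P @ x) + (1 - d) / n     # rank-one "teleport" term added cheaply
    x /= x.sum()

print(x)                              # the PageRank vector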

Grid Computing is About …

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

[Diagram: "Telescience Grid" (courtesy of Mark Ellisman), linking data acquisition, analysis, advanced visualization, computational resources, imaging instruments, and large-scale databases.]

The Grid Architecture Picture

User Portal: Application, Problem Solving Environments, Science Portals, Grid Access & Info

Service Layers: Resource Discovery, Co-Scheduling & Allocation, Fault Tolerance, Authentication, Naming & Files, Events

Resource Layer: Computers, Databases, Online instruments, Software, Remote instruments, High-speed networks and routers

Some Grid Requirements – User Perspective

♦ Single sign-on: authentication to any Grid resource authenticates for all others
♦ Single compute space: one scheduler for all Grid resources
♦ Single data space: can address files and data from any Grid resource
♦ Single development environment: Grid tools and libraries that work on all Grid resources


Extensible TeraGrid Facility (ETF): Becoming Operational

[Diagram: the ETF sites connected by an extensible backplane network (30-40 Gb/s) through hubs in Los Angeles and Chicago.
Caltech (data collection analysis): 0.4 TF IA-64, IA-32 Datawulf, 80 TB storage
ANL (visualization): 1.25 TF IA-64, 96 visualization nodes, 20 TB storage
SDSC (data intensive): 4 TF IA-64, DB2/Oracle servers, 500 TB disk storage, 6 PB tape storage, 1.1 TF Power4
NCSA (compute intensive): 10 TF IA-64, 128 large-memory nodes, 230 TB disk storage, GPFS and data mining
PSC (compute intensive): 6 TF EV68, 71 TB storage, 0.3 TF EV7 shared memory, 150 TB storage server]

Atmospheric Sciences Grid

[Diagram: components of an atmospheric sciences grid: real-time data feeding a data fusion stage; a general circulation model driving a regional weather model; photo-chemical pollution, particle dispersion, and bushfire models; supported by topography and vegetation databases and an emissions inventory.]

Standard Implementation: Change Models

[Diagram: the same atmospheric grid, annotated with the coupling technologies: MPI within the general circulation, regional weather, pollution, dispersion, and bushfire models, and GASS/GridFTP/GRC and GASS transfers linking the models to the topography, vegetation, and emissions databases.]

The Computing Continuum

[Spectrum: from tightly coupled special-purpose machines, through clusters and "Grids", to highly parallel, loosely coupled "SETI / Google"-style systems.]

♦ Each strikes a different balance
¾ computation/communication coupling
♦ Implications for execution efficiency
♦ Applications have diverse needs
¾ computing is only one part of the story!


Globus Grid Services

♦ The Globus toolkit provides a range of basic Grid services
¾ Security, information, fault detection, communication, resource management, ...
♦ These services are simple and orthogonal
¾ Can be used independently; mix and match
¾ Programming model independent
♦ For each there are well-defined APIs
♦ Standards are used extensively
¾ E.g., LDAP, GSS-API, X.509, ...
♦ You don't program in Globus; it's a set of tools, like Unix

The Grid


NetSolve Grid Enabled Server

♦ NetSolve is an example of a Grid-based hardware/software/data server
♦ Based on a Remote Procedure Call model, but with …
¾ resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
♦ Ease of use is paramount
♦ It's about providing transparent access to resources


NetSolve: The Big Picture

[Diagram: client applications (Matlab, Octave, Scilab, Mathematica, C, Fortran, Excel) talk to the NetSolve agent(s), which consult a schedule database and manage a pool of servers S1-S4 and an IBP depot.]

No knowledge of the grid required; RPC-like.

NetSolve: The Big Picture

[Diagram, continued: the client submits its request with input data A and B.]

No knowledge of the grid required; RPC-like.

NetSolve: The Big Picture

[Diagram, continued: a handle is passed back to the client.]

No knowledge of the grid required; RPC-like.

NetSolve: The Big Picture

[Diagram, continued: the request is directed to server S2 ("Request S2!"); S2 executes Op(C, A, B) on the inputs A and B, and the answer C is returned to the client.]

No knowledge of the grid required; RPC-like.

[Diagram: the user's call to NetSolve(…) goes from the client to the agent within a trusted environment; the agent selects a hardware server (HWS) and a software server (SWS); software descriptions are exchanged with a software repository; the results are passed back to the user.]

NetSolve Agent

♦ Name server for the NetSolve system
♦ Information service
¾ client users and administrators can query the hardware and software services available
♦ Resource scheduler
¾ maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources


NetSolve Agent

♦ Resource scheduling (cont'd):
¾ CPU performance (LINPACK)
¾ Network bandwidth, latency
¾ Server workload
¾ Problem size / algorithm complexity
¾ Calculates a "time to compute" for each appropriate server (see the sketch below)
¾ Notifies the client of the most appropriate server
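A minimal sketch of the kind of "time to compute" estimate described above (a simplified cost model of my own, not NetSolve's actual formula): estimated time = network latency + data size / bandwidth + operation count / (server speed x free fraction).

def time_to_compute(flops, data_bytes, server):
    # Hypothetical estimate: ship the data, run the problem, return the result.
    network = 2 * server["latency_s"] + data_bytes / server["bandwidth_Bps"]
    compute = flops / (server["peak_flops"] * (1.0 - server["workload"]))
    return network + compute

servers = [
    {"name": "S1", "latency_s": 0.01, "bandwidth_Bps": 10e6,
     "peak_flops": 1e9, "workload": 0.7},
    {"name": "S2", "latency_s": 0.05, "bandwidth_Bps": 100e6,
     "peak_flops": 4e9, "workload": 0.1},
]

# Pick the server with the smallest estimate for a dense n x n solve.
n = 3000
flops = (2.0 / 3.0) * n**3
data = 8 * (n * n + n)                # A and b as 8-byte doubles
best = min(servers, key=lambda s: time_to_compute(flops, data, s))
print(best["name"])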


NetSolve Client

♦ Function-based interface
♦ The client program embeds calls from NetSolve's API to access additional resources
♦ Interfaces available for C, Fortran, Matlab, Octave, Mathematica, …
♦ Networking interactions are opaque to the user
♦ NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, … (a sketch of the calling pattern follows)
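To make the RPC-style usage concrete, here is a self-contained mock in Python of blocking versus non-blocking ("handle") invocation; the class and function names are hypothetical illustrations of the pattern, not NetSolve's actual API:

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def remote_solve(A, b):
    # Stand-in for work that a Grid server would perform remotely.
    return np.linalg.solve(A, b)

class ToyGridClient:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, A, b):
        """Blocking call: returns the result directly."""
        return remote_solve(A, b)

    def call_async(self, A, b):
        """Non-blocking call: returns a handle (future) immediately."""
        return self._pool.submit(remote_solve, A, b)

client = ToyGridClient()
A, b = np.random.rand(100, 100), np.random.rand(100)
x = client.call(A, b)              # blocking
handle = client.call_async(A, b)   # non-blocking; do other work here
x2 = handle.result()               # collect the answer later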

Hiding the Parallel Processing

♦ The user may be unaware of any parallel processing

♦ NetSolve takes care of starting the message passing system, distributing the data, and returning the results

Basic Usage Scenarios

♦ Grid-based numerical library routines
¾ The user does not need the software library on their own machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
♦ Task farming applications
¾ "Pleasantly parallel" execution, e.g. parameter studies (see the sketch after this list)
♦ Remote application execution
¾ Complete applications, with the user specifying input parameters and receiving output
♦ "Blue collar" Grid-based computing
¾ Does not require deep knowledge of network programming
¾ Level of expressiveness is right for many users
¾ The user can set things up; no "su" required
¾ In use today, with up to 200 servers in 9 countries
♦ Can plug into Globus, Condor, NINF, …
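A minimal sketch of the task-farming / parameter-study pattern mentioned above (plain Python with a local process pool; NetSolve's own farming interface would distribute the same independent tasks across Grid servers):

from concurrent.futures import ProcessPoolExecutor

def simulate(parameter):
    # Stand-in for an expensive, independent computation on one parameter value.
    return parameter, sum(i * parameter for i in range(100_000))

if __name__ == "__main__":
    parameters = [0.1 * k for k in range(50)]
    with ProcessPoolExecutor() as farm:          # each task can run anywhere
        results = dict(farm.map(simulate, parameters))
    print(len(results), "runs completed")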

University of Tennessee Deployment: Scalable Intracampus Research Grid (SInRG)

♦ Federated ownership: CS, Chemical Engineering, Medical School, Computational Ecology, Electrical Engineering
♦ Real applications, middleware development, logistical networking

[Map caption: The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, the National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives, including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, the Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX). The UT campus consists of a meshed ATM OC-12 being migrated to switched Gigabit by early 2002.]

NetSolve: Things Not Touched On

♦ Integration with other NMI tools
¾ Globus, Condor, Network Weather Service
♦ Security
¾ Using Kerberos V5 for authentication
♦ Separate server characteristics
¾ Hardware and software servers
♦ Monitoring the NetSolve network
¾ Track and monitor usage
♦ Fault tolerance
♦ Local / global configurations
♦ Dynamic nature of servers
♦ Automated adaptive algorithm selection
¾ Dynamically determine the best algorithm based on system status and the nature of the user's problem
♦ NetSolve evolving into GridRPC
¾ Being worked on under the GGF, jointly with NINF

If You Want to Participate …


Grids vs. Capability vs. Cluster Computing

♦ Not an "either/or" question
¾ Each addresses different needs
¾ Each is part of an integrated solution
♦ Grid strengths
¾ Coupling necessarily distributed resources: instruments, software, hardware, archives, and people
¾ Eliminating time and space barriers: remote resource access and capacity computing
¾ Grids are not a cheap substitute for capability HPC
♦ Capability computing strengths
¾ Supporting foundational computations: terascale and petascale "nation scale" problems
¾ Engaging tightly coupled computations and teams
♦ Clusters
¾ Low cost, group solution
¾ Potential hidden costs

Collaborators / Support

♦ TOP500
¾ H. Meuer, Mannheim U
¾ H. Simon, NERSC
¾ E. Strohmaier, NERSC
♦ NetSolve
¾ Sudesh Agrawal, UTK
¾ Keith Seymour, UTK
¾ Karin Sagi, UTK
¾ Zhiao Shi, UTK
¾ Henri Casanova, UCSD
♦ Thanks: NSF Next Generation Software program

♦ For more information …
♦ Many opportunities within my group at Tennessee
