Grid Computing
Jack Dongarra

Between Now and the End
- Today, March 30th: Grids and NetSolve; more on NetSolve
- April 6th: Felix Wolf: peer-to-peer, grid infrastructure, etc.
- April 13th: Performance Measurement with PAPI (Dan Terpstra)
- April 20th: Debugging (Shirley Moore)
- April 25th (Monday): Presentation of class projects

More on the In-Class Presentations
- Start at 1:00 on Monday, 4/25/05
- Roughly 20 minutes each
- Use slides
- Describe your project; perhaps motivate it via an application
- Describe your method/approach
- Provide comparisons and results
- See me about your topic

Distributed Computing
- The concept has been around for two decades
- Basic idea: run a scheduler across systems that sends processes to the least-used systems first
  - Maximize utilization
  - Minimize turnaround time
- Have to load executables and input files onto the selected resource
  - Shared file system
  - File transfers upon resource selection

Examples of Distributed Computing
- Workstation farms, Condor flocks, etc.
  - Generally share a file system
- SETI@home project, United Devices, etc.
  - Only one source code; copies the correct binary code and input data to each system
- Napster, Gnutella: file/data sharing
- NetSolve
  - Runs numerical kernels on any of multiple independent systems, much like a Grid solution

SETI@home: Global Distributed Computing
- Running on 500,000 PCs, ~1000 CPU years per day
  - 485,821 CPU years so far
- Sophisticated data & signal processing analysis
- Distributes datasets from the Arecibo Radio Telescope

SETI@home
- Uses thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence
- Uses data collected with the Arecibo Radio Telescope in Puerto Rico
- When a participant's computer is idle, the software downloads a 300 kilobyte chunk of data for analysis
- The results of this analysis are sent back to the SETI team and combined with those of thousands of other participants

Grid Computing - from ET to Anthrax
- Largest distributed computation project in existence
  - ~400,000 machines
  - Averaging 27 Tflop/s
- Today many companies are trying this for profit

United Devices and Cancer Screening

Grid Computing
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, and trust relationships.
- Resources (HPC systems, visualization systems & displays, storage systems, sensors, instruments, people) are integrated via "middleware" to facilitate use of all resources.

Why Grids?
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake table experiments
- Climate scientists visualize, annotate, & analyze terabyte simulation datasets
- An emergency response team couples real time data, weather model, population data

Why Grids?
- Resources have different functions, but multiple classes of resources are necessary for most interesting problems
- The power of any single resource is small compared to aggregations of resources
- Network connectivity is increasing rapidly in bandwidth and availability
- Large problems require teamwork and computation

Why Grids? (contd.)
- A multidisciplinary analysis in aerospace couples code and data in four companies
- A home user invokes architectural design functions at an application service provider
- An application service provider purchases cycles from compute cycle providers
- Scientists working for a multinational soap company design a new product
- A community group pools members' PCs to analyze alternative designs for a local road

The Grid Problem
- Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
  (from "The Anatomy of the Grid: Enabling Scalable Virtual Organizations")
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, assuming the absence of:
  - central location,
  - central control,
  - omniscience,
  - existing trust relationships.

Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, …
  - Sharing always conditional: issues of trust, policy, negotiation, payment, …
- Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, …
- Dynamic, multi-institutional virtual organisations
  - Community overlays on classic organizational structures
  - Large or small, static or dynamic

Grid vs. Internet/Web Services?
- We've had computers connected by networks for 20 years
- We've had web services for 5-10 years
- The Grid combines these things, and brings additional notions
  - Virtual Organizations
  - Infrastructure to enable computation to be carried out across these
    - Authentication, monitoring, information, resource discovery, status, coordination, etc.
- Can I just plug my application into the Grid?
  - No! Much work to do to get there!

What does this mean now for users and developers?
- There is a grand vision of the future
  - Collecting resources around the world into VOs
  - Seamless access to them, with a single sign-on
  - NEW applications that exploit them in unique ways!
  - Today we want to help you prepare for this
- There is a frustrating reality of the present
  - These technologies are not yet fully mature
  - Not fully deployed
  - Not consistent across even single VOs
- But centers and funding agencies worldwide are pushing this very, very hard
  - Better get ready now
  - You can help! Work with your centers to get this deployed

Network Bandwidth Growth
- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference = an order of magnitude per 5 years
- 1986 to 2000
  - Computers: x 500
  - Networks: x 340,000
- 2001 to 2010
  - Computers: x 60
  - Networks: x 4,000

[Figure: Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan. 2001) by Cleo Vilett; source Vinod Khosla, Kleiner Perkins Caufield & Byers.]
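
A quick check of the "order of magnitude per 5 years" figure, assuming the doubling times quoted above:

\[
\frac{2^{60/9}}{2^{60/18}} \;=\; 2^{60/18} \;\approx\; 2^{3.3} \;\approx\; 10
\]

i.e., over 60 months the ratio of network speed to computer speed grows by roughly a factor of ten.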

The Grid
- Treat CPU cycles and software like commodities.
- "Napster on steroids."
- Enable the coordinated use of geographically distributed resources, in the absence of central control and existing trust relationships.
- Computing power is produced much like utilities such as power and water are produced for consumers.
- Users will have access to "power" on demand.
- "When the Network is as fast as the computer's internal links, the machine disintegrates across the Net into a set of special purpose appliances"
  - Gilder Technology Report, June 2000

Computational Grids and Electric Power Grids
- Why the Computational Grid is like the Electric Power Grid
  - Electric power is ubiquitous
  - You don't need to know the source of the power (transformer, generator) or the power company that serves it
- Why the Computational Grid is different from the Electric Power Grid
  - Wider spectrum of performance
  - Wider spectrum of services
  - Access governed by more complicated issues
    - Security
    - Performance
    - Socio-political factors

What is Grid Computing?
- Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
[Diagram: data acquisition, advanced visualization and analysis, computational resources, imaging instruments, and large-scale databases linked by the Grid.]

The Computational Grid is…
- …a distributed control infrastructure that allows applications to treat compute cycles as commodities.
- Power Grid analogy
  - Power producers: machines, software, networks, storage systems
  - Power consumers: user applications
- Applications draw power from the Grid the way appliances draw electricity from the power utility.
  - Seamless
  - High-performance
  - Ubiquitous
  - Dependable

Introduction
- Grid systems can be classified by their usage:
  - Computational Grid
    - Distributed Supercomputing Grid
    - High Throughput Grid
  - Data Grid
  - Service Grid
    - On Demand
    - Collaborative
    - Multimedia

Introduction
- Computational Grid:
  - denotes a system that has a higher aggregate capacity than any of its constituent machines
  - it can be further categorized based on how the overall capacity is used
- Distributed Supercomputing Grid:
  - executes the application in parallel on multiple machines to reduce the completion time of a job

Introduction
- Grand challenge problems typically require a distributed supercomputing Grid; this was one of the motivating factors of early Grid research and is still driving in some quarters
- High Throughput Grid:
  - increases the completion rate of a stream of jobs arriving in real time
  - ASIC or processor design verification tests would be run on a high throughput Grid

Introduction
- Data Grid:
  - systems that provide an infrastructure for synthesizing new information from data repositories such as digital libraries or data warehouses
  - applications for these systems would be special purpose data mining that correlates information from multiple different high volume data sources

Introduction
- Service Grid:
  - systems that provide services that are not provided by any single machine
  - subdivided based on the type of service they provide
- Collaborative Grid:
  - connects users and applications into collaborative workgroups; enables real time interaction between humans and applications via a virtual workspace

Introduction
- Multimedia Grid:
  - provides an infrastructure for real time multimedia applications; requires support for quality of service across multiple different machines, whereas a multimedia application on a single dedicated machine can be deployed without QoS
  - synchronization between network and end-point QoS

Introduction
- On Demand Grid:
  - this category dynamically aggregates different resources to provide new services
  - a data visualization workbench that allows a scientist to dynamically increase the fidelity of a simulation by allocating more machines to it would be an example

The Grid

Atmospheric Sciences Grid
[Diagram: real-time data feeds a data fusion stage and a general circulation model; a regional weather model draws on topography and vegetation databases and an emissions inventory, and drives photo-chemical pollution, particle dispersion, and bushfire models, all connected by high speed networks and routers.]

The Grid Architecture Picture
[Diagram: user portals, problem solving environments, and application science portals sit on top of grid access & information services; service layers provide resource discovery, co-scheduling & allocation, fault tolerance, authentication, naming & files, and events; the resource layer contains computers, databases, online instruments, and software.]

Standard Implementation
[Diagram: the atmospheric sciences application built with standard tools; MPI is used within each model (general circulation, regional weather, photo-chemical pollution, particle dispersion, bushfire), while GASS/GridFTP/GRC handle data movement between the models, the real-time data fusion stage, and the topography, vegetation, and emissions inventory databases.]

Challenges in Grid Computing
- Reliable performance
- Trust relationships between multiple security domains
- Deployment and maintenance of grid middleware across hundreds or thousands of nodes
- Access to data across WANs
- Access to state information of remote processes
- Workflow / dependency management
- Distributed software and license management
- Accounting and billing

Basic Grid Architecture
- Clusters, and how grids are different from clusters
- Departmental Grid Model
- Enterprise Grid Model
- Global Grid Model

Advantages of Grid Computing
- Use resources scattered across the world
  - Access to more computing power
  - Better access to data
  - Utilize unused cycles
- Computing at a new level of complexity and scale
- Increased collaboration across Virtual Organizations (VOs)
  - groups of organizations that use the Grid to share resources

Examples of Grids
- TeraGrid – NSF funded, linking 5 major research sites at 40 Gb/s (www.teragrid.org)
- European Union Data Grid – grid for applications in high energy physics, environmental science, bioinformatics (www.eu-datagrid.org)
- Access Grid – collaboration systems using commodity technologies (www.accessgrid.org)
- Network for Earthquake Engineering Simulations Grid – grid for earthquake engineering (www.nees.org)

Teragrid Network

...and enabling broad based collaborations
[Map: national research networks, including NLR, TeraGrid, Gloriad, and UltraScience Net.]

Grid Possibilities
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake table experiments
- Climate scientists visualize, annotate, & analyze terabyte simulation datasets
- An emergency response team couples real time data, weather model, population data

Some Grid Usage Models
- Distributed computing: job scheduling on Grid resources with secure, automated data transfer
- Workflow: synchronized scheduling and automated data transfer from one system to the next in a pipeline (e.g. compute-viz-storage)
- Coupled codes, with pieces running on different systems simultaneously
- Meta-applications: parallel apps spanning multiple systems

Grid Usage Models
- Some models are similar to models already being used, but are much simpler due to:
  - single sign-on
  - automatic process scheduling
  - automated data transfers
- But Grids can encompass new resources like sensors and instruments, so new usage models will arise

Example Application Projects
- Earth Systems Grid: environment (US DOE)
- EU DataGrid: physics, environment, etc. (EU)
- EuroGrid: various (EU)
- Fusion Collaboratory (US DOE)
- GridLab: astrophysics, etc. (EU)
- Grid Physics Network (US NSF)
- MetaNEOS: numerical optimization (US NSF)
- NEESgrid: civil engineering (US NSF)
- Particle Physics Data Grid (US DOE)

Some Grid Requirements – Systems/Deployment Perspective
- Identity & authentication
- Authorization & policy
- Resource discovery
- Resource characterization
- Resource allocation
- (Co-)reservation, workflow
- Distributed algorithms
- Remote data access
- High-speed data transfer
- Performance guarantees
- Monitoring
- Adaptation
- Intrusion detection
- Resource management
- Accounting & payment
- Fault management
- System evolution
- Etc.

Some Grid Requirements – User Perspective
- Single allocation: if any at all
- Single sign-on: authentication to any Grid resource authenticates for all others
- Single compute space: one scheduler for all Grid resources
- Single data space: can address files and data from any Grid resource
- Single development environment: Grid tools and libraries that work on all Grid resources

Programming & Systems Challenges
- The programming problem
  - Facilitate development of sophisticated applications
  - Facilitate code sharing
  - Requires programming environments: APIs, SDKs, tools
- The systems problem
  - Facilitate coordinated use of diverse resources
  - Facilitate infrastructure sharing: e.g., certificate authorities, information services
  - Requires systems: protocols, services
    - E.g., port/service/protocol for accessing information, allocating resources

The Systems Challenges: Resource Sharing Mechanisms That…
- Address security and policy concerns of resource owners and users
- Are flexible enough to deal with many resource types and sharing modalities
- Scale to large numbers of resources, many participants, many program components
- Operate efficiently when dealing with large amounts of data & computation

The Security Problem
- Resources being used may be extremely valuable & the problems being solved extremely sensitive
- Resources are often located in distinct administrative domains
  - Each resource may have its own policies & procedures
- The set of resources used by a single computation may be large, dynamic, and/or unpredictable
  - Not just client/server
- It must be broadly available & applicable
  - Standard, well-tested, well-understood protocols
  - Integration with a wide variety of tools

The Resource Management Problem
- Enabling secure, controlled remote access to computational resources and management of remote computation
  - Authentication and authorization
  - Resource discovery & characterization
  - Reservation and allocation
  - Computation monitoring and control

Grid Systems Technologies
- Systems and security problems are addressed by new protocols & services. E.g., Globus:
  - Grid Security Infrastructure (GSI) for security
  - Globus Metadata Directory Service (MDS) for discovery
  - Globus Resource Allocation Manager (GRAM) protocol as a basic building block
    - Resource brokering & co-allocation services
  - GridFTP, IBP for data movement

The Programming Problem
- How does a user develop robust, secure, long-lived applications for dynamic, heterogeneous Grids?
- Presumably we need:
  - Abstractions and models to add to the speed/robustness/etc. of development
  - Tools to ease application development and diagnose common problems
  - Code/tool sharing to allow reuse of code components developed by others

Grid Programming Technologies
- "Grid applications" are incredibly diverse (data, collaboration, computing, sensors, …)
  - Seems unlikely there is one solution
- Most applications have been written "from scratch," with or without Grid services
- Application-specific libraries have been shown to provide significant benefits
- No new language, programming model, etc., has yet emerged that transforms things
  - But certainly still quite possible

Examples of Grid Programming Technologies
- MPICH-G2: Grid-enabled message passing
- CoG Kits, GridPort: portal construction, based on N-tier architectures
- GDMP, Data Grid Tools, SRB: replica management, collection management
- Condor-G: simple workflow management
- Legion: object models for Grid computing
- NetSolve: network enabled solver
- Cactus: Grid-aware numerical solver framework
  - Note the tremendous variety and application focus

MPICH-G2: A Grid-Enabled MPI
- A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
  - Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
- Globus services for authentication, resource allocation, executable staging, output, etc.
- Programs run in the wide area without change (see the sketch after the Grid Events slide below)
- See also: MetaMPI, PACX, STAMPI, MAGPIE
- www.globus.org/mpi

Grid Events
- Global Grid Forum: working meeting
  - Meets 3 times/year, alternates U.S.-Europe, with the July meeting as the major event
- HPDC: major academic conference
  - HPDC-11 in Scotland with GGF-8, July 2002
- Other meetings include
  - IPDPS, CCGrid, EuroGlobus, Globus Retreats
- www.gridforum.org, www.hpdc.org
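
To make the "programs run without change" point on the MPICH-G2 slide concrete, here is an ordinary MPI program in C. Nothing in it is Grid-specific; the claim is that code like this can simply be relinked against MPICH-G2 and launched across Globus-managed sites. The program is a generic sketch, not code from the slides, and the launch mechanics (job description, staging) are site-specific and omitted.

    /* hello_grid.c - plain MPI; under MPICH-G2 the same source is relinked
     * against the Grid-enabled MPI library and started via Globus tools. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);

        /* Each process reports where it landed, possibly a different site. */
        printf("Process %d of %d running on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }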

Useful References
- Book (Morgan Kaufmann)
  - www.mkp.com/grids
- Perspective on Grids
  - "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", IJSA, 2001
  - www.globus.org/research/papers/anatomy.pdf
- All URLs in this section of the presentation, especially:
  - www.gridforum.org, www.grids-center.org, www.globus.org

Emergence of Grids
- Grids enable much more than applications running on multiple computers (which can be achieved with MPI alone)
  - virtual operating system: provides a global workspace/address space via a single login
  - automatically manages files, data, accounts, and security issues
  - connects other resources (archival data facilities, instruments, devices) and people (collaborative environments)

Globus Grid Services
- Globus provides a range of basic Grid services
  - Security, information, fault detection, communication, resource management, ...
- These services are simple and orthogonal
  - Can be used independently, mixed and matched
  - Programming model independent
- For each there are well-defined APIs
- Standards are used extensively
  - E.g., LDAP, GSS-API, X.509, ...
- You don't program "in Globus"; it's a set of tools, like Unix

Grids Are Inevitable
- Inevitable (at least in HPC):
  - leverages the computational power of all available systems
  - manages resources as a single system, which is easier for users
  - provides the most flexible resource selection and management, and load sharing
  - researchers' desire to solve bigger problems will always outpace the performance increases of single systems; just as multiple processors are needed, "multiple multiprocessors" will be deemed so

Basic Grid Building Blocks
- IBP – Internet Backplane Protocol: middleware for managing and using remote storage (RPC-like request/reply between clients and storage depots)
- NetSolve – solving computational problems remotely (client, agent, and computational servers; servers may use MPI, Condor, ... on workstations, clusters, or MPPs)
- Condor – harnessing idle workstations for high-throughput computing (customer and owner agents, submission and execution agents, checkpoint files, remote I/O)

Maturation of Grid Computing
- Research focus is moving from building basic infrastructure and application demonstrations to
  - Usable production environments
  - Application performance
  - Scalability → Globalization
- Development, research, and integration are happening outside of the original infrastructure groups
- Grids are becoming a first-class tool for scientific communities
  - GriPhyN (Physics), BIRN (Neuroscience), NVO (Astronomy), Cactus (Physics), …

Broad Acceptance of Grids as a Critical Platform for Computing
- Widespread interest from industry in developing computational Grid platforms
  - IBM, Sun, Entropia, Avaki, Platform, …
  - On August 2, 2001, IBM announced a new corporate initiative to support and exploit Grid computing. AP reported that IBM was investing $4 billion into building 50 computer server farms around the world.

Broad Acceptance of Grids as a Critical Platform for Computing
- Widespread interest from government in developing computational Grid platforms
  - NSF's Cyberinfrastructure
  - NASA's Information Power Grid
  - DOE's Science Grid

Grids Form the Basis of a National Information Infrastructure
- August 9, 2001: NSF awarded $53,000,000 to SDSC/NPACI and NCSA/Alliance for the TeraGrid
- TeraGrid will provide in aggregate
  - 13.6 trillion calculations per second
  - Over 600 trillion bytes of immediately accessible data
  - 40 gigabit per second network speed
  - A new paradigm for data-oriented computing
  - Critical for disaster response, genomics, environmental modeling, etc.
[Figure: spectrum of systems, from heterogeneous distributed collections of workstations and clusters to homogeneous massively parallel systems.]

Distributed and Parallel Systems
- Distributed systems:
  - Gather (unused) resources
  - Steal cycles
  - System software manages resources
  - System software adds value
  - 10%-20% overhead is OK
  - Resources drive applications
  - Time to completion is not critical
  - Time-shared
- Massively parallel systems:
  - Bounded set of resources
  - Apps grow to consume all cycles
  - Application manages resources
  - System software gets in the way
  - 5% overhead is the maximum
  - Apps drive purchase of equipment
  - Real-time constraints
  - Space-shared

NetSolve: Basic Usage Scenarios
- Grid based numerical library routines
  - The user doesn't have to have the software library on their machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
- Task farming applications
  - "Pleasantly parallel" execution
  - e.g. parameter studies
- Remote application execution
  - Complete applications, with the user specifying input parameters and receiving output
- "Blue collar" Grid based computing
  - Does not require deep knowledge of network programming
  - Level of expressiveness is right for many users
  - Users can set things up themselves; no "su" required
  - In use today, with up to 200 servers in 9 countries
- Can plug into Globus, Condor, NINF, …

Network Enabled Server
- NetSolve is an example of a grid based hardware/software server.
- Ease of use is paramount.
- Based on an RPC model, but with …
  - resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
- Other examples are NEOS from Argonne and NINF from Japan.
- Uses resources; does not tie together geographically distributed resources for a single application.

NetSolve: The Big Picture
[Diagram, repeated over four slides: clients (Matlab, Mathematica, C, Fortran, Web) talk to NetSolve agent(s), which consult a schedule database; computational servers S1-S4 and an IBP depot hold the data. The client sends a request such as Op(C, A, B) to the agent, receives the choice of server (e.g. "S2!") and a handle, the chosen server performs Op on A and B, and the answer (C) comes back. No knowledge of the grid is required; it is RPC-like.]

NetSolve Agent
- Name server for the NetSolve system.
- Information service
  - client users and administrators can query the hardware and software services available.
- Resource scheduler
  - maintains both static and dynamic information regarding the NetSolve server components, used for the allocation of resources

NetSolve Agent (cont'd)
- Resource scheduling (cont'd):
  - CPU performance (LINPACK).
  - Network bandwidth, latency.
  - Server workload.
  - Problem size/algorithm complexity.
  - Calculates a "time to compute" for each appropriate server.
  - Notifies the client of the most appropriate server.

NetSolve - Load Balancing
- The NetSolve agent predicts execution times and sorts the servers.
- The prediction for a server is based on:
  - Its distance over the network (latency and bandwidth, with statistical averaging)
  - Its performance (LINPACK benchmark)
  - Its workload
  - The problem size and the algorithm complexity
- A quick estimate from cached data (is the workload out of date?)
- (A sketch of this kind of estimate appears after the NetSolve Client slide below.)

NetSolve Client
- Function based interface.
- The client program embeds calls to NetSolve's API to access additional resources.
- Interfaces available for C, Fortran, Matlab, Mathematica, and Java.
- Opaque networking interactions.
- NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, …
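
The following is a rough sketch of the "time to compute" estimate described on the load-balancing slide above. It is not NetSolve source code: the ServerInfo fields, the constants, and the formula are illustrative assumptions that simply combine the factors listed there (problem complexity, LINPACK-style peak rate, current workload, network latency and bandwidth).

    /* Illustrative per-server estimate; all details are assumptions. */
    #include <stdio.h>

    typedef struct {
        const char *name;
        double peak_mflops;     /* LINPACK-style performance rating      */
        double workload;        /* fraction of the CPU already busy, 0-1 */
        double latency_s;       /* network latency in seconds            */
        double bandwidth_mbps;  /* network bandwidth in Mbit/s           */
    } ServerInfo;

    /* complexity in floating-point operations, data_mbits shipped in total */
    static double time_to_compute(const ServerInfo *s,
                                  double complexity_flops, double data_mbits)
    {
        double compute  = complexity_flops /
                          (s->peak_mflops * 1e6 * (1.0 - s->workload));
        double transfer = s->latency_s + data_mbits / s->bandwidth_mbps;
        return compute + transfer;
    }

    int main(void)
    {
        ServerInfo s = { "S2", 800.0, 0.25, 0.05, 100.0 };
        double n     = 2000.0;                         /* dense solve, order n   */
        double flops = 2.0 / 3.0 * n * n * n;          /* ~ (2/3) n^3 for LU     */
        double mbits = (n * n + 2.0 * n) * 64.0 / 1e6; /* A, b, x as 8-byte reals */
        printf("estimated time on %s: %.2f s\n",
               s.name, time_to_compute(&s, flops, mbits));
        return 0;
    }

The agent would evaluate such an estimate for every candidate server and hand the client the sorted list, which is what the "client tries each one in turn" step on the next slide relies on.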

NetSolve Client
- Intuitive and easy to use.
- Matlab matrix multiply example:
  - A = matmul(B, C);
    becomes
    A = netsolve('matmul', B, C);
  - Possible parallelism is hidden.

NetSolve Client
- i. Client makes a request to the agent.
- ii. Agent returns a list of servers.
- iii. Client tries each one in turn until one executes successfully or the list is exhausted.

NetSolve - MATLAB Interface
Easy to "switch" solvers:
    >> define sparse matrix A
    >> define rhs
    >> [x, its] = netsolve( 'itmeth', 'petsc', A, rhs );
    >> [x, its] = netsolve( 'itmeth', 'aztec', A, rhs );
    >> [x, its] = netsolve( 'solve', 'superlu', A, rhs );
    >> [x, its] = netsolve( 'solve', 'ma28', A, rhs );

NetSolve - FORTRAN Interface
Synchronous call:
          parameter( MAX = 100 )
          double precision A(MAX,MAX), B(MAX)
          integer IPIV(MAX), N, INFO, LWORK
          integer NSINFO
          ...
          call DGESV(N,1,A,MAX,IPIV,B,MAX,INFO)
    becomes
          call NETSL('DGESV()',NSINFO, N,1,A,MAX,IPIV,B,MAX,INFO)
Asynchronous calls are also available.
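
The C interface is analogous. The fragment below mirrors the Fortran NETSL call above; it is a sketch only, and the exact C prototype (the variadic netsl call, the header name, and how the LAPACK INFO output is passed) should be checked against the NetSolve documentation rather than taken from here.

    /* Sketch of the equivalent blocking call from C; prototype details
     * (header name, argument conventions) are assumptions. */
    #include "netsolve.h"     /* assumed NetSolve client header */

    #define MAX 100

    int solve_remotely(double *a, double *b, int *ipiv, int n)
    {
        int info;

        /* Local LAPACK:  dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info);
         * Remote NetSolve call, mirroring the Fortran example above:      */
        int status = netsl("dgesv()", n, 1, a, MAX, ipiv, b, MAX, &info);

        return (status < 0) ? status : info;  /* NetSolve error vs. LAPACK info */
    }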

Hiding the Parallel Processing
- The user may be unaware of parallel processing.
- NetSolve takes care of starting the message passing system, distributing the data, and returning the results.

Problem Description File
- A Problem Description File defines the problem specification used to add functional modules to a NetSolve server.
- It is a wrapper that provides the binding between the NetSolve client interface and the server function being integrated.
- A complex syntax defines input/output objects, calling sequences, libraries to link, etc.
- It is parsed by NetSolve to create a "service" program.

Generating New Services in NetSolve
- Add additional functionality
  - Describe the interface (arguments)
  - Generate the wrapper
  - Install it into the server
- New service added!
[Diagram: a Java GUI feeds a problem description through the NetSolve parser/compiler; the generated service is added to the services already on the server.]

Problem Description Specification
- Specifies the calling interface between GridSolve and the service routine
- Original NetSolve problem description files
  - Strange notation
  - Difficult for users to understand
        @PROBLEM dgesv
        @DESCRIPTION This is a linear solver for dense matrices
          from the LAPACK Library. Solves Ax=b.
        @INPUT 2
        @OBJECT MATRIX DOUBLE A Double precision matrix
        @OBJECT VECTOR DOUBLE b Right hand side
        @OUTPUT 1
        @OBJECT VECTOR DOUBLE x …
- Previous attempts to simplify this involved GUI front-ends
- In GridSolve, the format is totally re-designed
  - Specified in a manner similar to normal function prototypes
  - Similar to Ninf

GridSolve Problem Description (DGESV)
Original Fortran subroutine:
        SUBROUTINE DGESV(N,NRHS,A,LDA,IPIV,B,LDB,INFO)
        INTEGER INFO, LDA, LDB, N, NRHS
        INTEGER IPIV( * )
        DOUBLE PRECISION A( LDA, * ), B( LDB, * )
GridSolve IDL specification:
        SUBROUTINE dgesv(IN int N, IN int NRHS,
            INOUT double A[LDA][N], IN int LDA,
            OUT int IPIV[N], INOUT double B[LDB][NRHS],
            IN int LDB, OUT int INFO)
        "This solves Ax=b using LAPACK"
        LANGUAGE = "FORTRAN"
        LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
        COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
        MAJOR = "COLUMN"

GridSolve Interface Definition Language
- Data types: int, char, float, double
- Argument passing modes:
  - IN -- input only; not modified
  - INOUT -- input and output
  - OUT -- output only
  - VAROUT -- variable length output-only data
  - WORKSPACE -- server-side allocation of workspace; not passed as part of the calling sequence
- Argument size
  - Specified as an expression using scalar arguments, e.g.
        ddot(IN int n, IN double dx[n*incx], IN int incx, …
  - All typical operators are supported (+, -, *, /, etc.).

Problem Attributes
- MAJOR
  - Row or column major; depends on the implementation of the service routine
- LANGUAGE
  - Language in which the service routine is implemented; currently C or Fortran
- LIBS
  - Additional libraries to be linked
- COMPLEXITY
  - Theoretical complexity of the service routine, specified in terms of the arguments

Building the Services
    cd GridSolve/src/problem
    make check
- When building a new service, the server should be restarted, but thereafter it is not necessary.
- For more detailed documentation, consult the manual:
        cd GridSolve/doc/ug
        make
        ghostview ug.ps

NetSolve: How to Install Software
[Diagram: a NetSolve server daemon with its computational modules, stubs, and problem description files; new software is registered with the system.]
- Users can install new components
- Problem description files
- A Java applet to generate them

NetSolve: How It Works
[Diagram: the server daemon registers with the agent; the client queries the agent, downloads stubs at run time, sends its request to the chosen server, and receives the reply.]
- Problem description files are portable
- The client downloads stubs at run time
- A Java applet to generate the problem description files

Task Farming - Multiple Requests to a Single Problem
- One solution:
  - Many calls to netslnb( );   /* non-blocking */
- The farming solution:
  - A single call to netsl_farm( );
- The request iterates over an "array of input parameters."
- Adaptive scheduling algorithm.
- Useful for parameter sweeps and independently parallel applications.
- (A sketch of both styles appears after the Data Persistence slide below.)

Data Persistence
- Chain together a sequence of NetSolve requests.
- Analyze the parameters to determine data dependencies. Essentially a DAG is created in which nodes represent computational modules and arcs represent data flow.
- Transmit the superset of all input/output parameters and make it persistent near the server(s) for the duration of the sequence execution.
- Schedule the individual request modules for execution.
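
As promised above, here is a sketch in C of the two task-farming styles named on the Task Farming slide. The request-handle type and the netslnb/netslwt/netsl_farm prototypes are assumptions modeled on the calls mentioned in the slides and demos; they show the shape of the two approaches, not the verified NetSolve API.

    /* Hand-rolled farming with non-blocking calls vs. the farming interface.
     * "solve()" is a placeholder problem name; all prototypes are assumed. */
    #include "netsolve.h"            /* assumed NetSolve client header */

    #define NTASKS 100

    void farm_by_hand(double *a[], double *b[], double *x[], int n)
    {
        int req[NTASKS];

        /* one non-blocking request per parameter set */
        for (int i = 0; i < NTASKS; i++)
            req[i] = netslnb("solve()", n, a[i], b[i], x[i]);

        /* collect the results as they complete */
        for (int i = 0; i < NTASKS; i++)
            netslwt(req[i]);
    }

    /* The farming call replaces the loop with a single request that iterates
     * over arrays of inputs; the argument conventions here are illustrative. */
    void farm_with_netsl_farm(double *a[], double *b[], double *x[], int n)
    {
        netsl_farm("i=0,99", "solve()", n, a, b, x);
    }

In the hand-rolled version the client owns the scheduling; with netsl_farm the adaptive scheduling mentioned on the slide is left to NetSolve.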

Data Persistence (cont'd)
    netsl_begin_sequence( );
    netsl("command1", A, B, C);
    netsl("command2", A, C, D);
    netsl("command3", D, E, F);
    netsl_end_sequence(C, D);
[Diagram: the sequence takes A, B, and E as inputs. command1(A, B) produces C on a server; command2(A, C) produces D; command3(D, E) produces the final result F. The intermediate outputs C and D stay near the servers instead of returning to the client.]

NetSolve Authentication with Kerberos
- Kerberos is used to maintain access control lists and manage access to computational resources.
- NetSolve properly handles authorized and non-authorized components together in the same system.

NetSolve Authentication with Kerberos
[Diagram of the protocol:]
- Servers register their presence with the agent
- The client requests a ticket from the Kerberos KDC
- The client issues a problem request; the agent responds with a list of servers
- The client sends a work request to a server; the server replies requesting authentication credentials
- The client sends the ticket and the input to the server; the server authenticates it and returns the solution

Server Software Repository
- Dynamic downloading of new software.
- Enhances server capabilities without shutdown and restart.
- The repository is maintained independently of the server.
[Diagram: hardware and software repositories feeding a NetSolve server.]

NetSolve: A Plug into the Grid
[Diagram, repeated over two slides: C and Fortran clients call into the NetSolve middleware (resource discovery, fault tolerance, system management, resource scheduling), which talks through Globus, NetSolve, Ninf, and Condor proxies to the corresponding Grid back-ends (Globus servers, NetSolve servers, Ninf servers, Condor).]

NetSolve: A Plug into the Grid
[Diagram, extended over a third slide: PSE front-ends (SCIRun, Matlab, Mathematica, custom) sit on top and reach the same middleware and proxies through remote procedure calls.]

NPACI Alpha Project - MCell: 3-D Monte-Carlo Simulation of Neuro-Transmitter Release Between Cells
- UCSD (F. Berman, H. Casanova, M. Ellisman), Salk Institute (T. Bartol), CMU (J. Stiles), UTK (Dongarra, R. Wolski)
- Studies how neurotransmitters diffuse and activate receptors in synapses
- [Figure legend: blue unbounded; red singly bounded; green doubly bounded, closed; yellow doubly bounded, open]

MCell: 3-D Monte-Carlo Simulation of Neuro-Transmitter Release Between Cells
- Developed at the Salk Institute and CMU
- In the past, manually run on available workstations
- Transparent parallelism, load balancing, fault tolerance
- Fits the farming semantics and needs of NetSolve
- Collaboration with the AppLeS project for scheduling tasks
[Diagram: a list of seeds drives many independent script runs, farmed out by NetSolve.]

NetSolve and SCIRun
- SCIRun torso defibrillator application – Chris Johnson, U of Utah

IPARS
- Integrated Parallel Accurate Reservoir Simulator
  - TICAM, UT Austin; Director, Dr. Mary Wheeler (Mary Wheeler's group)
- Portable and modular reservoir simulator
  - Models waterflood, black oil, compositional, well management, recovery processes, …
- Reservoir and environmental simulation
  - models black oil, waterflood, compositions
  - 3D transient flow of multiple phases
- Integrates existing simulators
- The framework simplifies development
  - Provides solvers and handling for wells, table lookup
  - Provides pre/postprocessor, visualization
- Full IPARS access without installation
- IPARS interfaces: C, FORTRAN, Matlab, Mathematica, and Web

IPARS-enabled Web Interface
[Diagram: IPARS-enabled servers behind a web server; a web interface acts as the NetSolve client.]

IPARS-enabled Web Interface
[Diagram: IPARS-enabled servers, a web server, and the web interface acting as the NetSolve client.]
- The NetSolve server does post-processing for visualization.
- Possible rendering of the visualization over the internet using web browsers.

University of Tennessee Deployment: Scalable Intracampus Research Grid (SInRG)
- Federated ownership: CS, Chemical Engineering, Medical School, Computational Ecology, Electrical Engineering
- Real applications, middleware development, logistical networking
- The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, the National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, the Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX). The UT campus consists of a meshed ATM OC-12 being migrated to switched Gigabit by early 2002.

NetSolve Monitor
- http://anaka.cs.utk.edu:8080/monitor/signed.html

Demo1 – Blocking Calls
- This demo runs through 3 calls: sorting, solving a system of linear equations, and finding the eigenvalues of a matrix.
    [c] = netsolve('dqsort',b);
    [x,y,z,info] = netsolve('dgesv',a,b);
    [a,wr,wi,vl,vr,info] = netsolve('dgeev','N','V',a);
- This invokes a quick sort algorithm and the LAPACK routines for Ax=b and Ax=λx.
- It has one input, the size of the problem.

Demo2 – Non-Blocking Calls
- This example shows a non-blocking call to NetSolve.
    [rr1] = netsolve_nb('send','dgesv',a1,b1);
    while status1 < 0,
      [status1] = netsolve_nb('probe',rr1);
    end
    [aa1,ipiv1,x1,info] = netsolve_nb('wait',rr1);

Demo7 – Sparse Matrix
- This example solves a sparse matrix problem using SuperLU, MA28, PETSc, and AZTEC.
    [x] = netsolve('sparse_direct_solve','SUPERLU',A,rhs,0.3,1);
    [x] = netsolve('sparse_direct_solve','MA28',A,rhs,0.3,1);
    [x,its] = netsolve('sparse_iterative_solve','PETSC',A,rhs,1.e-6,500);
    [x,its] = netsolve('sparse_iterative_solve','AZTEC',A,rhs,1.e-6,500);

Demo3 – SuperLU
- This example solves a sparse matrix problem using the SuperLU software from Sherry Li and Jim Demmel at Berkeley.
    [x] = netsolve('sparse_direct_solve','SUPERLU',A,rhs,0.3,1);

Demo4 – MA28
- This example solves a sparse matrix problem using the MA28 software from the Harwell-Rutherford Library.
    [x] = netsolve('sparse_direct_solve','MA28',A,rhs,0.3,1);

Demo5 – PETSc
- This example solves a sparse matrix problem using the PETSc software from Argonne National Lab.
- Parallel processing is used in NetSolve to solve the problem.
    [x,its] = netsolve('sparse_iterative_solve','PETSC',A,rhs,1.e-6,500);

Demo6 – AZTEC
- This example solves a sparse matrix problem using the AZTEC software from Sandia National Lab.
- Parallel processing is used in NetSolve to solve the problem.
    [x,its] = netsolve('sparse_iterative_solve','AZTEC',A,rhs,1.e-6,500);

Things Not Touched On
- Hierarchy of agents
  - A more scalable configuration
- Monitoring the NetSolve network
  - Track and monitor usage
- Network status
  - Network Weather Service
- Internet Backplane Protocol
  - Middleware for managing and using remote storage
- Fault tolerance
  - Volker Strumpen's Porch
- Local / global configurations
- Automated adaptive algorithm selection
  - Dynamically determine the best algorithm based on system status and the nature of the user's problem

Thanks
- Fran Berman
  - Director, San Diego Supercomputer Center
- Jay Boisseau
  - Director, Texas Advanced Computing Center
