Between Now and the End
- March 30th and April 6th: Grid Computing Today (Grids and NetSolve); Felix Wolf: more on Grid computing, peer-to-peer, grid/NetSolve infrastructure, etc.
- April 13th: Performance Measurement with PAPI (Jack Dongarra, Dan Terpstra)
- April 20th: Debugging (Shirley Moore)
- April 25th (Monday): Presentation of class projects
More on the In-Class Presentations
- Start at 1:00 on Monday, 4/25/05
- Roughly 20 minutes each
- Use slides
- Describe your project, perhaps motivate via application
- Describe your method/approach
- Provide comparison and results
- See me about your topic

Distributed Computing
- Concept has been around for two decades
- Basic idea: run a scheduler across systems that runs processes on the least-used systems first
  - Maximize utilization
  - Minimize turnaround time
- Have to load executables and input files to the selected resource
  - Shared file system
  - File transfers upon resource selection
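The "least-used systems first" policy above can be sketched as a toy scheduler. The host names and load numbers here are made up for illustration; real schedulers such as Condor use far richer state.

```python
# Toy sketch of the "least-used system first" policy described above.
# Loads are fractions of CPU in use; all names and values are hypothetical.
def pick_least_used(loads):
    """Return the host with the lowest current load."""
    return min(loads, key=loads.get)

loads = {"hostA": 0.90, "hostB": 0.15, "hostC": 0.55}
print(pick_least_used(loads))  # -> hostB
```

A real system would also refresh the load table periodically and transfer the executable and input files to the chosen host, as the slide notes.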
Examples of Distributed Computing
- Workstation farms, Condor flocks, etc.
  - Generally share a file system
- SETI@home project, United Devices, etc.
  - Only one source code; copies the correct binary code and input data to each system
- Napster, Gnutella: file/data sharing
- NetSolve
  - Runs numerical kernels on any of multiple independent systems, much like a Grid solution

SETI@home: Global Distributed Computing
- Running on 500,000 PCs, ~1000 CPU years per day
- 485,821 CPU years so far
- Sophisticated data & signal processing analysis
- Distributes datasets from the Arecibo Radio Telescope
SETI@home
- Uses thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence
- Uses data collected with the Arecibo Radio Telescope in Puerto Rico
- When a participant's computer is idle or being wasted, the software downloads a 300-kilobyte chunk of data for analysis
- The results of this analysis are sent back to the SETI team and combined with those of thousands of other participants

Grid Computing: from ET to Anthrax
- Largest distributed computation project in existence
  - ~400,000 machines
  - Averaging 27 Tflop/s
- Today many companies are trying this for profit
United Devices and Cancer Screening

Grid Computing
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, and trust relationships.
- Resources (HPC systems, visualization systems & displays, storage systems, sensors, instruments, people) are integrated via 'middleware' to facilitate use of all resources.
Why Grids?
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake table experiments
- Climate scientists visualize, annotate, & analyze terabyte simulation datasets
- An emergency response team couples real time data, weather model, population data

Why Grids?
- Resources have different functions, but multiple classes of resources are necessary for the most interesting problems
- The power of any single resource is small compared to aggregations of resources
- Network connectivity is increasing rapidly in bandwidth and availability
- Large problems require teamwork and computation
Why Grids? (contd.)
- A multidisciplinary analysis in aerospace couples code and data in four companies
- A home user invokes architectural design functions at an application service provider
- An application service provider purchases cycles from compute cycle providers
- Scientists working for a multinational soap company design a new product
- A community group pools members' PCs to analyze alternative designs for a local road

The Grid Problem
- Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
  (from "The Anatomy of the Grid: Enabling Scalable Virtual Organizations")
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, assuming the absence of:
  - central location
  - central control
  - omniscience
  - existing trust relationships
Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, …
  - Sharing always conditional: issues of trust, policy, negotiation, payment, …
- Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, …
- Dynamic, multi-institutional virtual organizations
  - Community overlays on classic org structures
  - Large or small, static or dynamic

Grid vs. Internet/Web Services?
- We've had computers connected by networks for 20 years
- We've had web services for 5-10 years
- The Grid combines these things and brings additional notions:
  - Virtual Organizations
  - Infrastructure to enable computation to be carried out across these
  - Authentication, monitoring, information, resource discovery, status, coordination, etc.
- Can I just plug my application into the Grid? No! Much work to do to get there!
What does this mean now for users and developers?
- There is a grand vision of the future
  - Collecting resources around the world into VOs
  - Seamless access to them, with a single sign-on
  - NEW applications that exploit them in unique ways!
  - Today we want to help you prepare for this
- There is a frustrating reality of the present
  - These technologies are not yet fully mature
  - Not fully deployed
  - Not consistent across even single VOs
- But centers and funding agencies worldwide are pushing this very, very hard
  - Better get ready now
  - You can help! Work with your centers to get this deployed

Network Bandwidth Growth
- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference = order of magnitude per 5 years
- 1986 to 2000: computers ×500; networks ×340,000
- 2001 to 2010: computers ×60; networks ×4,000
- [Graph: Moore's Law vs. storage improvements vs. optical improvements. From Scientific American (Jan 2001) by Cleo Vilett; source Vinod Khosla, Kleiner, Caufield and Perkins.]
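The claimed "order of magnitude per 5 years" gap follows directly from the two doubling periods quoted on the slide:

```python
# Computers double every 18 months, networks every 9 months.
# Over 60 months, each factor and their ratio:
months = 60
computer_growth = 2 ** (months / 18)   # about 10x
network_growth = 2 ** (months / 9)     # about 100x
gap = network_growth / computer_growth
print(round(gap, 1))  # -> 10.1, i.e. roughly one order of magnitude
```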
The Grid
- Treat CPU cycles and software like commodities
- "Napster on steroids"
- Enable the coordinated use of geographically distributed resources, in the absence of central control and existing trust relationships
- Computing power is produced much like utilities such as power and water are produced for consumers
- Users will have access to "power" on demand
- "When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances" (Gilder Technology Report, June 2000)

Computational Grids and Electric Power Grids
- Why the Computational Grid is like the Electric Power Grid
  - Electric power is ubiquitous
  - You don't need to know the source of the power (transformer, generator) or the power company that serves it
- Why the Computational Grid is different from the Electric Power Grid
  - Wider spectrum of performance
  - Wider spectrum of services
  - Access governed by more complicated issues: security, performance, socio-political factors
What is Grid Computing?
- Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
- [Figure: data acquisition, advanced visualization & analysis, computational resources, imaging instruments, and large-scale databases linked by the Grid]

The Computational Grid is…
- …a distributed control infrastructure that allows applications to treat compute cycles as commodities
- Power Grid analogy
  - Power producers: machines, software, networks, storage systems
  - Power consumers: user applications
- Applications draw power from the Grid the way appliances draw electricity from the power utility
- Seamless, high-performance, ubiquitous, dependable
Introduction
- Grid systems can be classified depending on their usage:
  - Computational Grid
    - Distributed Supercomputing
    - High Throughput
  - Data Grid
  - Service Grid
    - On Demand
    - Collaborative
    - Multimedia

Introduction
- Computational Grid: denotes a system that has a higher aggregate capacity than any of its constituent machines
  - It can be further categorized based on how the overall capacity is used
- Distributed Supercomputing Grid: executes the application in parallel on multiple machines to reduce the completion time of a job
Introduction
- Grand challenge problems typically require a distributed supercomputing Grid; this was one of the motivating factors of early Grid research and is still a driver in some quarters
- High Throughput Grid: increases the completion rate of a stream of jobs arriving in real time
  - ASIC or processor design verification tests would be run on a high throughput Grid

Introduction
- Data Grid: systems that provide an infrastructure for synthesizing new information from data repositories such as digital libraries or data warehouses
- Applications for these systems would be special-purpose data mining that correlates information from multiple different high-volume data sources
Introduction
- Service Grid: systems that provide services that are not provided by any single machine
- Subdivided based on the type of service they provide
- Collaborative Grid: connects users and applications into collaborative workgroups; enables real-time interaction between humans and applications via a virtual workspace

Introduction
- Multimedia Grid: provides an infrastructure for real-time multimedia applications; requires support for quality of service across multiple different machines, whereas a multimedia application on a single dedicated machine can be deployed without QoS
- Requires synchronization between network and end-point QoS
Introduction
- On Demand Grid: this category dynamically aggregates different resources to provide new services
- Example: a data visualization workbench that allows a scientist to dynamically increase the fidelity of a simulation by allocating more machines to it

The Grid
- [Figure]
Atmospheric Sciences Grid
- [Figure: real-time data plus topography, vegetation, emissions, and inventory databases feed a data fusion stage; a general circulation model, regional weather model, photo-chemical pollution model, particle dispersion model, and bushfire model are coupled over high speed networks and routers]

The Grid Architecture Picture
- User portals: problem solving environments, application science portals, Grid access & info portals
- Service layers: resource discovery, co-scheduling & allocation, fault tolerance, authentication, naming & files, events
- Resource layer: computers, databases, online instruments, software
Standard Implementation
- [Figure: the same atmospheric grid annotated with middleware: MPI within each model; GASS/GridFTP/GRC for real-time data and data fusion; GASS for access to the topography, vegetation, emissions, and inventory databases]

Challenges in Grid Computing
- Reliable performance
- Trust relationships between multiple security domains
- Deployment and maintenance of grid middleware across hundreds or thousands of nodes
- Access to data across WANs
- Access to state information of remote processes
- Workflow / dependency management
- Distributed software and license management
- Accounting and billing
Basic Grid Architecture
- Clusters, and how grids differ from clusters
- Departmental Grid Model
- Enterprise Grid Model
- Global Grid Model

Advantages of Grid Computing
- Use resources scattered across the world
  • Access to more computing power
  • Better access to data
  • Utilize unused cycles
- Computing at a new level of complexity and scale
- Increased collaboration across Virtual Organizations (VOs)
  • groups of organizations that use the Grid to share resources
Examples of Grids
- TeraGrid: NSF funded, linking 5 major research sites at 40 Gb/s (www.teragrid.org)
- European Union Data Grid: grid for applications in high energy physics, environmental science, bioinformatics (www.eu-datagrid.org)
- Access Grid: collaboration systems using commodity technologies (www.accessgrid.org)
- Network for Earthquake Engineering Simulations Grid: grid for earthquake engineering (www.nees.org)

TeraGrid Network
- [Figure]
...and enabling broad based collaborations
- [Figure: national research networks, including NLR, TeraGrid, Gloriad, and UltraScience Net]
Grid Possibilities
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake table experiments
- Climate scientists visualize, annotate, & analyze terabyte simulation datasets
- An emergency response team couples real time data, weather model, population data

Some Grid Usage Models
- Distributed computing: job scheduling on Grid resources with secure, automated data transfer
- Workflow: synchronized scheduling and automated data transfer from one system to the next in a pipeline (e.g. compute-viz-storage)
- Coupled codes, with pieces running on different systems simultaneously
- Meta-applications: parallel apps spanning multiple systems
Grid Usage Models
- Some models are similar to models already being used, but are much simpler due to:
  - single sign-on
  - automatic process scheduling
  - automated data transfers
- But Grids can encompass new resources like sensors and instruments, so new usage models will arise

Example Application Projects
- Earth Systems Grid: environment (US DOE)
- EU DataGrid: physics, environment, etc. (EU)
- EuroGrid: various (EU)
- Fusion Collaboratory (US DOE)
- GridLab: astrophysics, etc. (EU)
- Grid Physics Network (US NSF)
- MetaNEOS: numerical optimization (US NSF)
- NEESgrid: civil engineering (US NSF)
- Particle Physics Data Grid (US DOE)
Some Grid Requirements: Systems/Deployment Perspective
- Identity & authentication; authorization & policy
- Resource discovery, characterization, and allocation
- (Co-)reservation, workflow
- Distributed algorithms; remote data access
- High-speed data transfer; performance guarantees
- Monitoring; adaptation; intrusion detection
- Resource management; accounting & payment
- Fault management; system evolution
- Etc.

Some Grid Requirements: User Perspective
- Single allocation (if any at all)
- Single sign-on: authentication to any Grid resource authenticates for all others
- Single compute space: one scheduler for all Grid resources
- Single data space: can address files and data from any Grid resource
- Single development environment: Grid tools and libraries that work on all Grid resources
Programming & Systems Challenges
- The programming problem
  - Facilitate development of sophisticated applications
  - Facilitate code sharing
  - Requires programming environments: APIs, SDKs, tools
- The systems problem
  - Facilitate coordinated use of diverse resources
  - Facilitate infrastructure sharing: e.g., certificate authorities, information services
  - Requires systems: protocols, services
    - E.g., port/service/protocol for accessing information, allocating resources

The Systems Challenge: Resource Sharing Mechanisms That…
- Address the security and policy concerns of resource owners and users
- Are flexible enough to deal with many resource types and sharing modalities
- Scale to large numbers of resources, many participants, many program components
- Operate efficiently when dealing with large amounts of data & computation
The Security Problem
- Resources being used may be extremely valuable & the problems being solved extremely sensitive
- Resources are often located in distinct administrative domains
  - Each resource may have its own policies & procedures
- The set of resources used by a single computation may be large, dynamic, and/or unpredictable
  - Not just client/server
- The solution must be broadly available & applicable
  - Standard, well-tested, well-understood protocols
  - Integration with a wide variety of tools

The Resource Management Problem
- Enabling secure, controlled remote access to computational resources and management of remote computation
  - Authentication and authorization
  - Resource discovery & characterization
  - Reservation and allocation
  - Computation monitoring and control
Grid Systems Technologies
- Systems and security problems are addressed by new protocols & services, e.g. in Globus:
  - Grid Security Infrastructure (GSI) for security
  - Globus Metadata Directory Service (MDS) for discovery
  - Globus Resource Allocation Manager (GRAM) protocol as a basic building block
  - Resource brokering & co-allocation services
  - GridFTP, IBP for data movement

The Programming Problem
- How does a user develop robust, secure, long-lived applications for dynamic, heterogeneous Grids?
- Presumably need:
  - Abstractions and models to add to the speed/robustness/etc. of development
  - Tools to ease application development and diagnose common problems
  - Code/tool sharing to allow reuse of code components developed by others
Grid Programming Technologies
- "Grid applications" are incredibly diverse (data, collaboration, computing, sensors, …)
  - Seems unlikely there is one solution
- Most applications have been written "from scratch," with or without Grid services
- Application-specific libraries have been shown to provide significant benefits
- No new language, programming model, etc. has yet emerged that transforms things
  - But certainly still quite possible

Examples of Grid Programming Technologies
- MPICH-G2: Grid-enabled message passing
- CoG Kits, GridPort: portal construction, based on N-tier architectures
- GDMP, Data Grid Tools, SRB: replica management, collection management
- Condor-G: simple workflow management
- Legion: object models for Grid computing
- NetSolve: network-enabled solver
- Cactus: Grid-aware numerical solver framework
- Note the tremendous variety and application focus
MPICH-G2: A Grid-Enabled MPI
- A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
- Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
- Globus services for authentication, resource allocation, executable staging, output, etc.
- Programs run in the wide area without change
- See also: MetaMPI, PACX, STAMPI, MAGPIE
- www.globus.org/mpi

Grid Events
- Global Grid Forum: working meeting
  - Meets 3 times/year, alternates U.S.-Europe, with the July meeting as the major event
- HPDC: major academic conference
  - HPDC-11 in Scotland with GGF-8, July 2002
- Other meetings include IPDPS, CCGrid, EuroGlobus, Globus Retreats
- www.gridforum.org, www.hpdc.org
Useful References
- Book (Morgan Kaufmann): www.mkp.com/grids
- Perspective on Grids: "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", IJSA, 2001
  www.globus.org/research/papers/anatomy.pdf
- All URLs in this section of the presentation, especially: www.gridforum.org, www.grids-center.org, www.globus.org

Emergence of Grids
- But Grids enable much more than apps running on multiple computers (which can be achieved with MPI alone)
- Virtual operating system:
  - provides a global workspace/address space via a single login
  - automatically manages files, data, accounts, and security issues
  - connects other resources (archival data facilities, instruments, devices) and people (collaborative environments)
Globus Grid Services
- The Globus toolkit provides a range of basic Grid services
  - Security, information, fault detection, communication, resource management, ...
- These services are simple and orthogonal
  - Can be used independently; mix and match
  - Programming model independent
- For each there are well-defined APIs
- Standards are used extensively
  - E.g., LDAP, GSS-API, X.509, ...
- You don't program in Globus; it's a set of tools, like Unix

Grids Are Inevitable
- Inevitable (at least in HPC):
  - leverages the computational power of all available systems
  - manages resources as a single system, which is easier for users
  - provides the most flexible resource selection and management, load sharing
- Researchers' desire to solve bigger problems will always outpace performance increases of single systems; just as multiple processors are needed, 'multiple multiprocessors' will be deemed so
Basic Grid Building Blocks
- Computational resources: clusters, MPPs, workstations
- Middleware: MPI, Condor, ...
- IBP (Internet Backplane Protocol): middleware for managing and using remote storage
- NetSolve: solving computational problems remotely via an RPC-like client/agent/server model
- Condor: harnessing idle workstations for high-throughput computing (submission and execution agents, checkpoint files, remote I/O)

Maturation of Grid Computing
- Research focus is moving from building basic infrastructure and application demonstrations to:
  - Usable production environments
  - Application performance
  - Scalability -> globalization
- Development, research, and integration happening outside of the original infrastructure groups
- Grids becoming a first-class tool for scientific communities
  - GriPhyN (Physics), BIRN (Neuroscience), NVO (Astronomy), Cactus (Physics), …
Broad Acceptance of Grids as a Critical Platform for Computing
- Widespread interest from industry in developing computational Grid platforms
  - IBM, Sun, Entropia, Avaki, Platform, …
  - On August 2, 2001, IBM announced a new corporate initiative to support and exploit Grid computing. AP reported that IBM was investing $4 billion into building 50 computer server farms around the world.
- Widespread interest from government in developing computational Grid platforms
  - NSF's Cyberinfrastructure
  - NASA's Information Power Grid
  - DOE's Science Grid
Grids Form the Basis of a National Information Infrastructure
- August 9, 2001: NSF awarded $53,000,000 to SDSC/NPACI and NCSA/Alliance for TeraGrid
- TeraGrid will provide in aggregate:
  • 13.6 trillion calculations per second
  • Over 600 trillion bytes of immediately accessible data
  • 40 gigabit per second network speed
  • A new paradigm for data-oriented computing
  • Critical for disaster response, genomics, environmental modeling, etc.

Distributed and Parallel Systems
- [Figure: a spectrum from distributed systems (e.g. SETI@home, Condor clusters of workstations) to massively parallel homogeneous systems (e.g. Beowulf clusters, TFlops-class machines)]

  Distributed systems                  | Massively parallel systems
  Gather (unused) resources            | Bounded set of resources
  Steal cycles                         | Apps grow to consume all cycles
  System SW manages resources          | Application manages resources
  System SW adds value                 | System SW gets in the way
  10%-20% overhead is OK               | 5% overhead is maximum
  Resources drive applications         | Apps drive purchase of equipment
  Time to completion is not critical   | Real-time constraints
  Time-shared                          | Space-shared
Basic Usage Scenarios
- Grid based numerical library routines
  - User doesn't have to have the software library on their machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
- Task farming applications
  - "Pleasantly parallel" execution, e.g. parameter studies
- Remote application execution
  - Complete applications, with the user specifying input parameters and receiving output
- "Blue Collar" Grid Based Computing
  - Ease-of-use paramount
  - Does not require deep knowledge of network programming
  - Level of expressiveness right for many users
  - User can set things up; no "su" required
  - Use resources, rather than tie together geographically distributed resources for a single application

NetSolve Network Enabled Server
- NetSolve is an example of a grid based hardware/software server
- Based on an RPC model, but with resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
- Other examples are NEOS from Argonne and NINF from Japan
- In use today, with up to 200 servers in 9 countries
- Can plug into Globus, Condor, NINF, …
NetSolve: The Big Picture
- [Figure sequence: clients (Matlab, Mathematica, C, Fortran, Web) contact the NetSolve agent(s), which keep a schedule database over servers S1-S4 and IBP depots holding data objects A, B, C]
- The client sends a request Op(C, A, B); the agent hands back a handle to a chosen server (e.g. S2); the server performs the operation and returns the answer, C
- No knowledge of the grid required; RPC-like
NetSolve Agent
- Name server for the NetSolve system
- Information service
  - Client users and administrators can query the hardware and software services available
- Resource scheduler
  - Maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources

NetSolve Agent (cont'd)
- Resource scheduling considers:
  - CPU performance (LINPACK)
  - Network bandwidth, latency
  - Server workload
  - Problem size / algorithm complexity
- Calculates a "time to compute" for each appropriate server
- Notifies the client of the most appropriate server
NetSolve - Load Balancing
- The NetSolve agent predicts the execution times and sorts the servers
- Prediction for a server based on:
  • Its distance over the network (latency and bandwidth, with statistical averaging)
  • Its performance (LINPACK benchmark)
  • Its workload
  • The problem size and the algorithm complexity
- Quick estimate from cached data; workload may be out of date

NetSolve Client
- Function-based interface
- The client program embeds calls from NetSolve's API to access additional resources
- Interface available to C, Fortran, Matlab, Mathematica, and Java
- Opaque networking interactions
- NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, …
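The agent's "time to compute" ranking can be sketched as a simple cost model: compute time scaled by server speed and load, plus network transfer time. The formula, server names, and numbers below are illustrative assumptions, not NetSolve's actual code.

```python
# Illustrative cost model for ranking servers, in the spirit of the
# agent's prediction (LINPACK rate, workload, bandwidth, latency).
# All parameters and values here are hypothetical.
def time_to_compute(flops, mflops_rate, workload, data_bytes, bandwidth, latency):
    """Estimate seconds = compute time on a loaded server + data transfer."""
    compute = flops / (mflops_rate * 1e6 * (1.0 - workload))
    network = latency + data_bytes / bandwidth
    return compute + network

# Rank two hypothetical servers for a 1000x1000 dense solve (~2/3 n^3 flops).
n = 1000
flops = 2.0 / 3.0 * n ** 3
servers = {
    "fast-but-busy": time_to_compute(flops, 500, 0.8, 8e6, 1e7, 0.01),
    "slow-but-idle": time_to_compute(flops, 100, 0.1, 8e6, 1e7, 0.01),
}
print(min(servers, key=servers.get))  # the agent would notify the client of this one
```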
NetSolve Client
- Intuitive and easy to use
- Matlab matrix multiply, e.g.:
    A = matmul(B, C);
  becomes
    A = netsolve('matmul', B, C);
- Possible parallelisms hidden

NetSolve Client
i. Client makes request to agent.
ii. Agent returns list of servers.
iii. Client tries each one in turn until one executes successfully or the list is exhausted.
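The three-step protocol above (ask the agent, get a ranked list, try servers in turn) can be sketched as a failover loop. The `query_agent`/`submit` callables stand in for NetSolve's real RPCs; the fake servers below are purely illustrative.

```python
# Sketch of the client-side failover loop described on the slide.
def solve_with_failover(query_agent, submit, problem):
    for server in query_agent(problem):      # ii. agent returns ranked list
        try:
            return submit(server, problem)   # iii. try each server in turn
        except ConnectionError:
            continue                         # that server failed; try the next
    raise RuntimeError("all servers failed")

# Demo with fake stand-ins: the first server is down, the second succeeds.
def fake_agent(problem):
    return ["down-host", "up-host"]

def fake_submit(server, problem):
    if server == "down-host":
        raise ConnectionError
    return f"{problem} solved on {server}"

print(solve_with_failover(fake_agent, fake_submit, "matmul"))
# -> matmul solved on up-host
```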
NetSolve - MATLAB Interface
Easy to 'switch' solvers with NetSolve:

  >> define sparse matrix A
  >> define rhs
  >> [x, its] = netsolve( 'itmeth', 'petsc', A, rhs );
  >> [x, its] = netsolve( 'itmeth', 'aztec', A, rhs );
  >> [x, its] = netsolve( 'solve', 'superlu', A, rhs );
  >> [x, its] = netsolve( 'solve', 'ma28', A, rhs );

NetSolve - FORTRAN Interface
Synchronous call:

      parameter( MAX = 100 )
      double precision A(MAX,MAX), B(MAX)
      integer IPIV(MAX), N, INFO, LWORK
      integer NSINFO

      call DGESV(N,1,A,MAX,IPIV,B,MAX,INFO)

becomes

      call NETSL('DGESV()',NSINFO,N,1,A,MAX,IPIV,B,MAX,INFO)

Asynchronous calls also available.
Hiding the Parallel Processing
- The user may be unaware of parallel processing
- NetSolve takes care of starting the message passing system, data distribution, and returning the results

Problem Description File
- Defines the problem specification; used to add functional modules to a NetSolve server
- A wrapper provides the binding between the NetSolve client interface and the server function being integrated
- Complex syntax defines input/output objects, calling sequences, libraries to link, etc.
- Parsed by NetSolve to create a "service" program
Generating New Services in NetSolve
- Add additional functionality
- Describe the interface (arguments) in a problem description file (a Java GUI can help)
- Generate the wrapper; the NetSolve parser/compiler creates the service
- Install into the server: new service added!

Problem Description Specification
- Specifies the calling interface between GridSolve and the service routine
- Original NetSolve problem description files:
  - Strange notation; difficult for users to understand
  - Previous attempts to simplify involved GUI front-ends

    @PROBLEM dgesv
    @DESCRIPTION This is a linear solver for dense matrices
      from the LAPACK Library. Solves Ax=b.
    @INPUT 2
    @OBJECT MATRIX DOUBLE A Double precision matrix
    @OBJECT VECTOR DOUBLE b Right hand side
    @OUTPUT 1
    @OBJECT VECTOR DOUBLE x …

- In GridSolve, the format is totally re-designed
  - Specified in a manner similar to normal function prototypes
  - Similar to Ninf
GridSolve Problem Description (DGESV)

Original Fortran subroutine:

      SUBROUTINE DGESV(N,NRHS,A,LDA,IPIV,B,LDB,INFO)
      INTEGER INFO, LDA, LDB, N, NRHS
      INTEGER IPIV( * )
      DOUBLE PRECISION A( LDA, * ), B( LDB, * )

GridSolve IDL specification:

      SUBROUTINE dgesv(IN int N, IN int NRHS,
        INOUT double A[LDA][N], IN int LDA,
        OUT int IPIV[N], INOUT double B[LDB][NRHS],
        IN int LDB, OUT int INFO)
      "This solves Ax=b using LAPACK"
      LANGUAGE = "FORTRAN"
      LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
      COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
      MAJOR = "COLUMN"

GridSolve Interface Definition Language
- Data types: int, char, float, double
- Argument passing modes:
  - IN: input only; not modified
  - INOUT: input and output
  - OUT: output only
  - VAROUT: variable length output-only data
  - WORKSPACE: server-side allocation of workspace; not passed as part of the calling sequence
- Argument sizes are specified as expressions using the scalar arguments, e.g.
    ddot(IN int n, IN double dx[n*incx], IN int incx, …
- All typical operators supported (+, -, *, /, etc.)
Problem Attributes
- MAJOR: row or column major; depends on the implementation of the service routine
- LANGUAGE: language in which the service routine is implemented; currently C or Fortran
- LIBS: additional libraries to be linked
- COMPLEXITY: theoretical complexity of the service routine, specified in terms of the arguments

Building the Services

  cd GridSolve/src/problem
  make check

- When building a new service, the server should be restarted, but thereafter it is not necessary.
- For more detailed documentation, consult the manual:

  cd GridSolve/doc/ug
  make
  ghostview ug.ps
NetSolve: How It Works
- The client (with stubs) queries the agent, sends a request to the server daemon, which runs computational modules described by NetSolve problem description files, and receives the reply
- The client downloads stubs at run time

NetSolve: How to Install Software
- Users can install new components by registering problem description files
- Problem description files are portable
- A Java applet can generate them
Task Farming
- Many calls to netslnb( ); /* non-blocking */
- Farming solution: a single call to netsl_farm( );
- The request iterates over an "array of input parameters"
- Adaptive scheduling algorithm
- Useful for parameter sweeps and independently parallel applications

Data Persistence - Multiple Requests To Single Problem
- Solution: chain together a sequence of NetSolve requests
- Analyze parameters to determine data dependencies; essentially a DAG is created where nodes represent computational modules and arcs represent data flow
- Transmit the superset of all input/output parameters and make it persistent near the server(s) for the duration of the sequence execution
- Schedule individual request modules for execution
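The farming idea (one logical call fanning out over an "array of input parameters") corresponds roughly to the pattern below. This is a sketch: real NetSolve dispatches each request to a different server with adaptive scheduling; here threads stand in for servers just to show the independently parallel structure.

```python
# Sketch of task farming: one call iterates over a parameter array,
# and the independent requests run concurrently (threads stand in
# for remote NetSolve servers).
from concurrent.futures import ThreadPoolExecutor

def farm(func, param_array):
    """Apply func to every parameter concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(func, param_array))

results = farm(lambda x: x * x, [1, 2, 3, 4, 5])
print(results)  # -> [1, 4, 9, 16, 25]
```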
Data Persistence (cont'd)

  netsl_begin_sequence( );
  netsl("command1", A, B, C);
  netsl("command2", A, C, D);
  netsl("command3", D, E, F);
  netsl_end_sequence(C, D);

- Sequence(A, B, E): the client ships inputs A and B; the server computes command1(A, B) -> C; the intermediate output C stays at the server for command2(A, C) -> D; the client then supplies input E for command3(D, E) -> F, the final result

NetSolve Authentication with Kerberos
- Kerberos is used to maintain Access Control Lists and manage access to computational resources
- NetSolve properly handles authorized and non-authorized components together in the same system
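The sequence above forms a small DAG: command1(A,B) -> C, command2(A,C) -> D, command3(D,E) -> F, with C and D persisting server-side. A sketch of that bookkeeping (names mirror the slide; the dict-based store and the arithmetic ops are illustrative, not NetSolve internals):

```python
# Sketch of the request sequence on the slide: intermediates C and D
# stay "near the server" (here: in a local store) instead of making a
# round trip back to the client; only the final F is returned.
def run_sequence(inputs, ops):
    """inputs: client-supplied values (A, B, E); ops: command name -> function."""
    store = dict(inputs)
    dag = [("command1", ["A", "B"], "C"),   # nodes: modules, arcs: data flow
           ("command2", ["A", "C"], "D"),
           ("command3", ["D", "E"], "F")]
    for name, deps, out in dag:
        store[out] = ops[name](*(store[d] for d in deps))
    return store["F"]

ops = {"command1": lambda a, b: a + b,
       "command2": lambda a, c: a * c,
       "command3": lambda d, e: d - e}
print(run_sequence({"A": 2, "B": 3, "E": 1}, ops))  # C=5, D=10, F=9 -> 9
```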
NetSolve Authentication with Kerberos (cont'd)
- Client issues a problem request; the agent responds with a list of servers
- Client sends a work request to a server; the server replies requesting authentication credentials
- Client requests a ticket from the KDC
- Client sends the ticket and input to the server; the server authenticates and returns the solution

Server Software Repository
- Servers register their presence with the agent
- Dynamic downloading of new software
- Enhances server capabilities without shutdown and restart
- Repository maintained independently of the server
NetSolve: A Plug into the Grid
[Architecture diagram: client interfaces (C, Fortran) call into grid middleware (resource discovery, fault tolerance, system management, resource scheduling), which dispatches through proxies (Globus, NetSolve, Ninf, Condor) to grid back-ends (NetSolve servers, Ninf servers, Globus, Condor).]
NetSolve: A Plug into the Grid
[Diagram, extended with PSE front-ends: SCIRun, Matlab, Mathematica, and custom clients issue remote procedure calls through the C/Fortran interfaces into the grid middleware and out through the Globus/NetSolve/Ninf/Condor proxies to the grid back-ends.]

NPACI Alpha Project - MCell: 3-D Monte-Carlo Simulation of Neuro-Transmitter Release in Between Cells
- UCSD (F. Berman, H. Casanova, M. Ellisman), Salk Institute (T. Bartol), CMU (J. Stiles), UTK (J. Dongarra, R. Wolski)
- Study how neurotransmitters diffuse and activate receptors in synapses.
- [Visualization legend: blue unbounded, red singly bounded, green doubly bounded closed, yellow doubly bounded open.]
MCell: 3-D Monte-Carlo Simulation of Neuro-Transmitter Release in Between Cells
- Developed at: Salk Institute, CMU
- In the past, manually run on available workstations.
- Transparent parallelism, load balancing, fault tolerance.
- Fits the farming semantics and needs of NetSolve.
- Collaboration with the AppLeS project for scheduling tasks.
- [Diagram: a list of seeds drives many independent MCell scripts.]

NetSolve and SCIRun
- SCIRun torso defibrillator application (Chris Johnson, U of Utah).
IPARS
- Integrated Parallel Accurate Reservoir Simulator.
- TICAM, UT-Austin; Director, Dr. Mary Wheeler.
- Portable and modular reservoir simulator.
- Models waterflood, black oil, compositional, well management, and recovery processes.
- Reservoir and environmental simulation: 3-D transient flow of multiple phases.
- Integrates existing simulators; the framework simplifies development.
- Provides solvers, handling for wells, and table lookup.
- Provides pre/postprocessor and visualization.
- Full IPARS access without installation.
- IPARS interfaces: C, FORTRAN, Matlab, Mathematica, and Web.

IPARS Web Interface
[Diagram: IPARS-enabled servers behind a Web server, reached from a Web-interface NetSolve client.]
IPARS Web Interface (cont'd)
- NetSolve server post-processing for visualization.
- Possible rendering of the visualization via the Internet using web browsers.

University of Tennessee Deployment: Scalable Intracampus Research Grid (SInRG)
- Federated ownership: CS, Chemical Engineering, Medical School, Computational Ecology, Electrical Engineering.
- Real applications, middleware development, logistical networking.
- The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection.
- An OC-3 ATM link routes IP traffic between the Knoxville campus, the National Transportation Research Center, and Oak Ridge National Laboratory.
- UT participates in several national networking initiatives, including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, the Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX).
- The UT campus consists of a meshed ATM OC-12 being migrated to switched Gigabit by early 2002.
NetSolve Monitor
http://anaka.cs.utk.edu:8080/monitor/signed.html

Demo1 - Blocking Calls
This demo runs through three calls: sorting, solving a system of linear equations, and finding the eigenvalues of a matrix.

  [c] = netsolve('dqsort',b);
  [x,y,z,info] = netsolve('dgesv',a,b);
  [a,wr,wi,vl,vr,info] = netsolve('dgeev','N','V',a);

This invokes a quicksort algorithm and the LAPACK routines for Ax = b and Ax = λx. It has one input, the size of the problem.
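A blocking call such as netsolve('dgesv',a,b) returns only when the remote solve of Ax = b finishes. For reference, a small pure-Python Gaussian elimination shows the computation that the server performs in that call; this is a local stand-in for illustration, not the LAPACK routine itself:

```python
def solve_axb(A, b):
    # Gaussian elimination with partial pivoting on a small dense system,
    # a local stand-in for what LAPACK's dgesv does on the NetSolve server.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A|b]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pick pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):                 # eliminate below the pivot
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):                # back substitution
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

x = solve_axb([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])  # expect [0.8, 1.4]
```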
Demo2 - Non-Blocking Calls
This example shows a non-blocking call to NetSolve.

  [rr1] = netsolve_nb('send','dgesv',a1,b1);
  status1 = -1;
  while status1 < 0,
      [status1] = netsolve_nb('probe',rr1);
  end
  [aa1,ipiv1,x1,info] = netsolve_nb('wait',rr1);

Demo7 - Sparse Matrix
This example solves a sparse matrix problem using SuperLU, MA28, PETSc, and AZTEC.

  [x] = netsolve('sparse_direct_solve','SUPERLU',A,rhs,0.3,1);
  [x] = netsolve('sparse_direct_solve','MA28',A,rhs,0.3,1);
  [x,its] = netsolve('sparse_iterative_solve','PETSC',A,rhs,1.e-6,500);
  [x,its] = netsolve('sparse_iterative_solve','AZTEC',A,rhs,1.e-6,500);
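Demo2's send/probe/wait pattern maps naturally onto futures: submit the request, poll while doing other work, then collect the result. A Python analogue (the real interface is netsolve_nb('send'|'probe'|'wait', ...); the names below are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_solve(a, b):
    time.sleep(0.05)  # pretend this is a remote dgesv call
    return a + b

pool = ThreadPoolExecutor(max_workers=1)
request = pool.submit(slow_solve, 2, 3)   # like netsolve_nb('send', ...)
while not request.done():                 # like the netsolve_nb('probe', rr1) loop
    time.sleep(0.01)                      # client is free to do other work here
result = request.result()                 # like netsolve_nb('wait', rr1)
pool.shutdown()
```

The payoff is overlap: the client computes or issues further requests while the server works, instead of blocking for the whole solve.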
Demo3 - SuperLU
This example solves a sparse matrix problem using the SuperLU software from Sherry Li and Jim Demmel at Berkeley.

  [x] = netsolve('sparse_direct_solve','SUPERLU',A,rhs,0.3,1);

Demo4 - MA28
This example solves a sparse matrix problem using the MA28 software from the Harwell-Rutherford library.

  [x] = netsolve('sparse_direct_solve','MA28',A,rhs,0.3,1);
Demo5 - PETSc
This example solves a sparse matrix problem using the PETSc software from Argonne National Lab. Parallel processing is used in NetSolve to solve the problem.

  [x,its] = netsolve('sparse_iterative_solve','PETSC',A,rhs,1.e-6,500);

Demo6 - AZTEC
This example solves a sparse matrix problem using the AZTEC software from Sandia National Lab. Parallel processing is used in NetSolve to solve the problem.

  [x,its] = netsolve('sparse_iterative_solve','AZTEC',A,rhs,1.e-6,500);
Things Not Touched On
- Hierarchy of agents: a more scalable configuration.
- Monitoring the NetSolve network: track and monitor usage and network status.
- Network Weather Service.
- Internet Backplane Protocol: middleware for managing and using remote storage.
- Fault tolerance: Volker Strumpen's Porch.
- Local / global configurations.
- Automated adaptive algorithm selection: dynamically determine the best algorithm based on system status and the nature of the user's problem.

Thanks
- Fran Berman, Director, San Diego Supercomputer Center
- Jay Boisseau, Director, Texas Advanced Computing Center