Globus: a Metacomputing Infrastructure Toolkit

Globus A Metacomputing Infrastructure Toolkit y Ian Foster Carl Kesselman httpwwwglobusorg Abstract Emerging highp erformance applications require the ability to exploit diverse ge ographically distributed resources These applications use highsp eed networks to in tegrate sup ercomputers large databases archival storage devices advanced visualiza tion devices andor scientic instruments to form networked virtual supercomputers or metacomputers While the physical infrastructure to build such systems is b ecoming widespread the heterogeneous and dynamic nature of the metacomputing environment p oses new challenges for developers of system software parallel to ols and applications In this article we introduce Globus a system that we are developing to address these challenges The Globus system is intended to achieve a vertically integrated treatment of application middleware and network A lowlevel toolkit provides basic mechanisms such as communication authentication network information and data access These mechanisms are used to construct various higherlevel metacomputing services such as parallel programming to ols and schedulers Our longterm goal is to build an Adaptive Wide Area Resource Environment AWARE an integrated set of higherlevel services that enable applications to adapt to heterogeneous and dynamically changing meta computing environments Preliminary versions of Globus comp onents were deployed successfully as part of the IWAY networking exp eriment Introduction New classes of highp erformance applications are b eing developed that require unique capa bilities not available in a single computer Such applications are enabled by the construction of networked virtual supercomputers or metacomputers execution environments in which highsp eed networks are used to connect sup ercomputers databases scientic instruments and advanced display devices p erhaps lo cated at geographically distributed sites In prin ciple networked virtual sup ercomputers can b oth increase accessibility to sup ercomputing capabilities and enable the assembly of unique capabilities that could not otherwise b e created in a costeective manner Exp erience with highsp eed networking testb eds has demonstrated convincingly that there are indeed applications of considerable scientic and economic imp ortance that can b enet from metacomputing capabilities For example the IWAY networking exp eriment which connected sup ercomputers and other resources at dierent sites across North America saw groups develop applications in areas as diverse as largescale scientic simulation collab orative engineering and sup ercomputerenhanced scientic instruments Mathematics and Computer Science Division Argonne National Lab oratory fostermcsanlgov y The Beckman Institute California Institute of Technology carlcompbiocaltechedu Metacomputers have much in common with b oth distributed and parallel systems yet also dier from these two architectures in imp ortant ways Like a distributed system a networked sup ercomputer must integrate resources of widely varying capabilities connected by p otentially unreliable networks and often lo cated in dierent administrative domains However the need for high p erformance can require programming mo dels and interfaces radically dierent from those used in distributed systems As in parallel computing metacomputing applications often need to schedule communications carefully to meet p erformance requirements However the heterogeneous and dynamic nature of metacomputing systems limits the applicability of parallel computing to ols and techniques These considerations suggest that while metacomputing can build on distributed and parallel software technologies it also requires signicant advances in mechanisms techniques and to ols The Globus pro ject is intended to accelerate these advances In a rst phase we are developing and deploying a metacomputing infrastructure toolkit providing basic capabilities and interfaces in areas such as communication information resource lo cation resource scheduling authentication and data access Together these to olkit comp onents dene a metacomputing abstract machine on which can b e constructed a range of alternative infrastructures services and applications Figure We ourselves are building parallel programming to ols and resource discovery and scheduling services and other groups are working in other areas Our longterm goal in the Globus pro ject is to address the problems of conguration and p erformance optimization in metacomputing environments These are challenging issues b ecause of the inherent complexity of metacomputing systems the fact that resources are often only identied at runtime and the dynamic nature of resource characteristics We b elieve that successful applications must b e able to congure themselves to t the execution environment delivered by the metacomputing system and then adapt their b ehavior to subsequent changes in resource characteristics We are investigating the design of higher level services layered on the Globus to olkit that enable the construction of such adaptive applications We refer collectively to these services as forming an Adaptive Wide Area Resource Environment or AWARE The rest of the article is as follows In Section we introduce general characteristics of metacomputing systems and requirements for metacomputing infrastructure In Section we describ e the Globus architecture and the techniques used to supp ort resourceaware applications In Section we describ e ma jor to olkit comp onents In Sections and we describ e higherlevel services constructed with the to olkit and testb eds that have deployed to olkit comp onents We conclude in Section with a discussion of current system status and future plans Metacomputing We use the term metacomputer to denote a networked virtual sup ercomputer constructed dynamically from geographically distributed resources linked by highsp eed networks For example Figure shows a metacomputer constructed during the IWAY exp eriment for realtime pro cessing of image data from a meteorological satellite Figure illustrates an example of such a system a somewhat simplied version of this architecture was used in the IWAY for realtime image pro cessing of a data stream from a meteorological satellite Metacomputing like more mainstream applications of distributed computing is moti vated by a need to access resources not lo cated within a single computer system Frequently Globus services: AWARE Other services (actual/planned) Higher level MPI, CC++, I-Soft . Legion AppLeS . services CAVEcomm scheduler CORBA HPC++ Globus metacomputing abstract machine Globus Comms Resource Information Data access toolkit Authent- . (Nexus) (al)location ication service (RIO) modules Heterogeneous, geographically distributed devices and networks Meta computing I - W A Y G U S T O . testbeds Fig The Globus toolkit the driving force is economic the resources in questionfor example sup ercomputers are to o exp ensive to b e replicated Alternatively an application may require resources that would not normally b e colo cated b ecause a particular conguration is required only rarely for example a collab orative engineering environment that connects the virtual reality systems design databases and sup ercomputers required to work on a particular engineering problem Finally certain unique resourcessuch as sp ecialized databases and p eoplecannot b e replicated In each case the ability to construct networked virtual sup ercomputers can provide qualitatively new capabilities that enable new approaches to problem solving Metacomputing Applications Scientists and engineers are just b eginning to explore the new applications enabled by networked sup ercomputing The IWAY exp eriment identied four signicant application classes Desktop supercomputing These applications couple highend graphics capabilities with remote sup ercomputers andor databases This coupling connects users more tightly with computing capabilities while at the same time achieving distance indep endence b etween resources developers and users Smart instruments These applications connect users to instruments such as microscop es telescop es or satellite downlinks that are themselves coupled with remote sup ercomputers This computational enhancement can enable b oth quasi realtime pro cessing of instrument output and interactive steering Col laborative environments A third set of applications couple multiple virtual environments so that users at dierent lo cations can interact with each other and Fig Networked supercomputers are constructed dynamically from geographically distributed resources This gure shows a collection of parallel supercomputers networkconnected workstations and a satel lite downlink with sup ercomputer simulations Distributed supercomputing These applications couple multiple computers to tackle problems that are to o large for a single computer or that can b enet from executing dierent problem comp onents on dierent computer architectures We can distinguish scheduled and unscheduled mo des of op eration In scheduled mo de resources once acquired are dedicated to an application In unscheduled mo de applications use otherwise idle resources that may b e reclaimed if needed Condor is one system that supp orts this mo de of op eration In general scheduled mo de is required for tightly coupled simulations particularly those with time constraints while unscheduled mo de is appropriate for lo osely coupled applications that can adapt to timevarying resources

Globus: a Metacomputing Infrastructure Toolkit

A Flexible Middleware for Metacomputing a Flexible

Four-Dimensional Model for Describing the Status of Peers in Peer-To-Peer Distributed Systems

From Computational Science to Internetics: Integration of Science with Computer Science

Using Metacomputing Tools to Facilitate Large Scale Analyses Of

Metacomputing Across Intercontinental Networks S.M

G2-P2P: a Fully Decentralised Fault-Tolerant Cycle-Stealing Framework

Web Based Metacomputing

Decision Support System on the Grid 1 Introduction

Kenneth A. Hawick PD Coddington HA James

Sorcer: Computing and Metacomputing Intergrid

Distributed Frameworks and Parallel Algorithms for Processing Large-Scale Geographic Data

Unstructured Peer-To-Peer Networks for Sharing Processor Cycles