Globus

A Metacomputing Infrastructure Toolkit

y

Ian Foster Carl Kesselman

httpwwwglobusorg

Abstract

Emerging highp erformance applications require the ability to exploit diverse ge

ographically distributed resources These applications use highsp eed networks to in

tegrate sup ercomputers large databases archival storage devices advanced visualiza

tion devices andor scientic instruments to form networked virtual or

metacomputers While the physical infrastructure to build such systems is b ecoming

widespread the heterogeneous and dynamic nature of the metacomputing environment

p oses new challenges for developers of system software parallel to ols and applications

In this article we introduce Globus a system that we are developing to address these

challenges The Globus system is intended to achieve a vertically integrated treatment

of application and network A lowlevel toolkit provides basic mechanisms

such as communication authentication network information and data access These

mechanisms are used to construct various higherlevel metacomputing services such as

parallel programming to ols and schedulers Our longterm goal is to build an Adaptive

Wide Area Resource Environment AWARE an integrated set of higherlevel services

that enable applications to adapt to heterogeneous and dynamically changing meta

environments Preliminary versions of Globus comp onents were deployed

successfully as part of the IWAY networking exp eriment

Introduction

New classes of highp erformance applications are b eing developed that require unique capa

bilities not available in a single computer Such applications are enabled by the construction

of networked virtual supercomputers or metacomputers execution environments in which

highsp eed networks are used to connect sup ercomputers databases scientic instruments

and advanced display devices p erhaps lo cated at geographically distributed sites In prin

ciple networked virtual sup ercomputers can b oth increase accessibility to sup ercomputing

capabilities and enable the assembly of unique capabilities that could not otherwise b e

created in a costeective manner

Exp erience with highsp eed networking testb eds has demonstrated convincingly that

there are indeed applications of considerable scientic and economic imp ortance that can

b enet from metacomputing capabilities For example the IWAY networking exp eriment

which connected sup ercomputers and other resources at dierent sites across North

America saw groups develop applications in areas as diverse as largescale scientic

simulation collab orative engineering and sup ercomputerenhanced scientic

instruments

Mathematics and Computer Science Division Argonne National Lab oratory fostermcsanlgov

y

The Beckman Institute California Institute of Technology carlcompbiocaltechedu

Metacomputers have much in common with b oth distributed and parallel systems

yet also dier from these two architectures in imp ortant ways Like a distributed

system a networked sup ercomputer must integrate resources of widely varying capabilities

connected by p otentially unreliable networks and often lo cated in dierent administrative

domains However the need for high p erformance can require programming mo dels

and interfaces radically dierent from those used in distributed systems As in parallel

computing metacomputing applications often need to schedule communications carefully

to meet p erformance requirements However the heterogeneous and dynamic nature of

metacomputing systems limits the applicability of to ols and techniques

These considerations suggest that while metacomputing can build on distributed

and parallel software technologies it also requires signicant advances in mechanisms

techniques and to ols The Globus pro ject is intended to accelerate these advances In

a rst phase we are developing and deploying a metacomputing infrastructure toolkit

providing basic capabilities and interfaces in areas such as communication information

resource lo cation resource scheduling authentication and data access Together these

to olkit comp onents dene a metacomputing abstract machine on which can b e constructed

a range of alternative infrastructures services and applications Figure We ourselves

are building parallel programming to ols and resource discovery and scheduling services and

other groups are working in other areas

Our longterm goal in the Globus pro ject is to address the problems of conguration and

p erformance optimization in metacomputing environments These are challenging issues

b ecause of the inherent complexity of metacomputing systems the fact that resources are

often only identied at runtime and the dynamic nature of resource characteristics We

b elieve that successful applications must b e able to congure themselves to t the execution

environment delivered by the metacomputing system and then adapt their b ehavior to

subsequent changes in resource characteristics We are investigating the design of higher

level services layered on the Globus to olkit that enable the construction of such adaptive

applications We refer collectively to these services as forming an Adaptive Wide Area

Resource Environment or AWARE

The rest of the article is as follows In Section we introduce general characteristics of

metacomputing systems and requirements for metacomputing infrastructure In Section

we describ e the Globus architecture and the techniques used to supp ort resourceaware

applications In Section we describ e ma jor to olkit comp onents In Sections and we

describ e higherlevel services constructed with the to olkit and testb eds that have deployed

to olkit comp onents We conclude in Section with a discussion of current system status

and future plans

Metacomputing

We use the term metacomputer to denote a networked virtual sup ercomputer constructed

dynamically from geographically distributed resources linked by highsp eed networks For

example Figure shows a metacomputer constructed during the IWAY exp eriment for

realtime pro cessing of image data from a meteorological satellite

Figure illustrates an example of such a system a somewhat simplied version of this

architecture was used in the IWAY for realtime image pro cessing of a data stream from a

meteorological satellite

Metacomputing like more mainstream applications of is moti

vated by a need to access resources not lo cated within a single computer system Frequently

Globus services: AWARE Other services (actual/planned) Higher level MPI, CC++, I-Soft . . . Legion AppLeS . . . services CAVEcomm scheduler CORBA HPC++

Globus metacomputing abstract machine Globus Comms Resource Information Data access toolkit Authent- . . . (Nexus) (al)location ication service (RIO) modules

Heterogeneous, geographically distributed devices and networks Meta computing I - W A Y G U S T O . . .

testbeds

Fig The Globus toolkit

the driving force is economic the resources in questionfor example sup ercomputers

are to o exp ensive to b e replicated Alternatively an application may require resources

that would not normally b e colo cated b ecause a particular conguration is required only

rarely for example a collab orative engineering environment that connects the virtual

reality systems design databases and sup ercomputers required to work on a particular

engineering problem Finally certain unique resourcessuch as sp ecialized databases and

p eoplecannot b e replicated In each case the ability to construct networked virtual

sup ercomputers can provide qualitatively new capabilities that enable new approaches to

problem solving

Metacomputing Applications

Scientists and engineers are just b eginning to explore the new applications enabled

by networked sup ercomputing The IWAY exp eriment identied four signicant

application classes

Desktop supercomputing These applications couple highend graphics capabilities

with remote sup ercomputers andor databases This coupling connects users more

tightly with computing capabilities while at the same time achieving distance

indep endence b etween resources developers and users

Smart instruments These applications connect users to instruments such as

microscop es telescop es or satellite downlinks that are themselves coupled with

remote sup ercomputers This computational enhancement can enable b oth quasi

realtime pro cessing of instrument output and interactive steering

Col laborative environments A third set of applications couple multiple virtual

environments so that users at dierent lo cations can interact with each other and

Fig Networked supercomputers are constructed dynamically from geographically distributed

resources This gure shows a collection of parallel supercomputers networkconnected workstations

and a satel lite downlink

with sup ercomputer simulations

Distributed supercomputing These applications couple multiple computers to tackle

problems that are to o large for a single computer or that can b enet from executing

dierent problem comp onents on dierent computer architectures We

can distinguish scheduled and unscheduled mo des of op eration In scheduled mo de

resources once acquired are dedicated to an application In unscheduled mo de

applications use otherwise idle resources that may b e reclaimed if needed Condor

is one system that supp orts this mo de of op eration In general scheduled mo de

is required for tightly coupled simulations particularly those with time constraints

while unscheduled mo de is appropriate for lo osely coupled applications that can adapt

to timevarying resources

Metasystem Characteristics

While the IWAY and other networking testb eds have provided intriguing hints of what

future metacomputing systems may lo ok like and how they may b e used the denition and

application of such systems remain research problems Nevertheless we can make general

observations ab out their characteristics see also

 Scale and the need for selection To date most metacomputing exp eriments have b een

p erformed on relatively small testb eds with the site IWAY b eing the largest In

the future we can exp ect to deal with much larger collections from which resources

will b e selected for particular applications according to criteria such as connectivity

cost security and reliability

 Heterogeneity at multiple levels Both the computing resources used to construct virtual sup ercomputers and the networks that connect these resources are often highly

heterogeneous Heterogeneity can arise at multiple levels ranging from physical

devices through system software to scheduling and usage p olicies

 Unpredictable structure Traditionally highp erformance applications have b een

developed for a single class of system with wellknown characteristicsor even for

one particular computer In contrast metacomputing applications can b e required

to execute in a wide range of environments constructed dynamically from available

resources Geographical distribution and complexity are other factors that make it

dicult to determine system characteristics such as network bandwidth and latency

a priori

 Dynamic and unpredictable behavior Traditional highp erformance systems use

scheduling discipline s such as space sharing or gangscheduling to provide exclusive

and hence predictableaccess to pro cessors and networks In metacomputing

environments resourcesesp ecially networksare more likely to b e shared One

consequence of sharing is that b ehavior and p erformance can vary over time For

example in wide area networks built using the Proto col suite network

characteristics such as latency bandwidth and jitter may vary as trac is rerouted

Largescale metasystems may also suer from network and resource failures In

general it is not p ossible to guarantee even minimum quality of service requirements

 Multiple administrative domains The resources used by metacomputing applications

often are not owned or administered by a single entity The need to deal with multiple

administrative entities complicates the already challenging network security problem

as dierent entities may use dierent authentication mechanisms authorization

schemes and access p olicies The need to execute usersupplied co de at dierent

sites introduces additional concerns

Fundamental to all of these issues is the need for mechanisms that allow applications to

obtain realtime information ab out system structure and state use that information to make

conguration decisions and b e notied when information changes Required information

can include network activity available network interfaces pro cessor characteristics and

authentication mechanisms Decision pro cesses can require complex combinations of these

data in order to achieve ecient endtoend conguration of complex networked systems

The Globus Metacomputing Infrastructure Toolkit

A number of pioneering eorts have pro duced useful services for the metacomputing

application developer For example Parallel Virtual Machine PVM and the

Message Passing Interface MPI provide a machineindepende nt communication layer

Condor provides a uniform view of pro cessor resources Legion builds system

comp onents on a distributed ob jectoriented mo del and the Andrew File System AFS

provides a uniform view of le resources Each of these systems has b een proven eective

in largescale application exp eriments

Our goal in the Globus pro ject is not to comp ete with these and other related eorts

but rather to provide basic infrastructure that can b e used to construct p ortable high

p erformance implementations of a range of such services To this end we fo cus on a the

development of lowlevel mechanisms that can b e used to implement higherlevel services

and b techniques that allow those services to observe and guide the op eration of these

mechanisms If successful this approach can reduce the complexity and improve the quality

of metacomputing software by allowing a single lowlevel infrastructure to b e used for

many purp oses and by providing solutions to the conguration problem in metacomputing

systems

To demonstrate that the Globus approach is workable we must show that it is p ossible

to use a single set of lowlevel mechanisms to construct ecient implementations of diverse

services on multiple platforms As we rep ort b elow we have had some initial successes in

the communication area our Nexus communication library has b een used to construct high

p erformance implementations of diverse parallel programming interfaces However this is

certainly not a conclusive demonstration and the nal verdict on the Globus approach

must await the results of further research

In the following we rst introduce the Globus to olkit and then describ e the mechanisms

that allow higherlevel services to observe and guide the op eration of to olkit comp onents

The Globus Toolkit

The Globus to olkit comprises a set of mo dules Each mo dule denes an interface which

higherlevel services use to invoke that mo dules mechanisms and provides an implemen

tation which uses appropriate lowlevel op erations to implement these mechanisms in dif

ferent environments

Currently identied to olkit mo dules are as follows These descriptions fo cus on

requirements Sections and address implementation status

 Resource location and al location This comp onent provides mechanisms for expressing

application resource requirements for identifying resources that meet these require

ments and for scheduling resources once they have b een lo cated Resource lo cation

mechanisms are required b ecause applications cannot in general b e exp ected to know

the exact lo cation of required resources particularly when load and resource avail

ability can vary Resource allo cation involves scheduling the resource and p erforming

any initialization required for subsequent pro cess creation data access etc In some

situationsfor example on some sup ercomputerslo cation and allo cation must b e

p erformed in a single step

 Communications This comp onent provides basic communication mechanisms These

mechanisms must p ermit the ecient implementation of a wide range of communica

tion metho ds including message passing remote pro cedure call distributed shared

memory streambased and multicast Mechanisms must b e cognizant of network

quality of service parameters such as jitter reliability latency and bandwidth

 Unied resource information service This comp onent provides a uniform mechanism

for obtaining realtime information ab out metasystem structure and status The

mechanism must allow comp onents to p ost as well as receive information Supp ort

for coping and access control is also required

 Authentication interface This comp onent provides basic authentication mechanisms

that can b e used to validate the identity of b oth users and resources These

mechanisms provide building blo cks for other security services such as authorization

and data security that need to know the identity of parties engaged in an op eration

 Process creation This comp onent is used to initiate computation on a resource once

it has b een lo cated and allo cated This task includes setting up executables creating an execution environment starting an executable passing arguments integrating the

new pro cess into the rest of the computation and managing termination and pro cess

shutdown

 Data access This comp onent is resp onsible for providing highsp eed remote access

to p ersistent storage such as les Some data resources such as databases may b e

accessed via distributed database technology or the Common Ob ject Request Broker

Architecture CORBA The Globus data access mo dule addresses the problem of

achieving high p erformance when accessing parallel le systems and networkenabled

IO devices such as the High Performance Storage System HPSS

Together the various Globus to olkit mo dules can b e thought of as dening a

metacomputing virtual machine The denition of this virtual machine simplies application

development and enhances p ortability by allowing programmers to think of geographically

distributed heterogeneous collections of resources as unied entities

Supp ort for ResourceAware Services and Applications

Metacomputing applications often need to op erate networking and computing resources

at close to maximum p erformance Hence metacomputing environments must allow

programmers to observe dierences in system resource characteristics and to guide how

these resources are used to implement higherlevel services Achieving these goals without

compromising p ortability is a signicant challenge for the designer of metacomputing

software

We use the Globus communication mo dule to illustrate some of these issues This

mo dule must select for each call to its communication functions one of several lowlevel

mechanisms On a lo cal area network communication might b e p erformed with TCPIP

while in a parallel computer sp ecialized highp erformance proto cols typically oer higher

bandwidth and lower latencies In a wide area environment sp ecialized ATM proto cols can

b e more ecient The ability to manage proto col parameters TCP packet size network

quality of service further complicates the picture The choice of lowlevel mechanism used

for a particular communication is a nontrivial problem that can have signicant implications

for application p erformance

Globus to olkit mo dules address this problem by providing interfaces that allow the

selection pro cess to b e exp osed to and guided by higherlevel to ols and applications

These interfaces provide rulebased selection resource prop erty inquiry and notication

mechanisms

 Rulebased selection Globus mo dules can identify selection p oints at which choices

from among alternatives resources parameter values etc are made Asso ciated

with each selection p oint is a default selection rule provided by the mo dule developer

eg use TCP packet size X use TCP over ATM A rule replacement

mechanism allows higherlevel services to sp ecify alternative strategies use TCP

packet size Y use sp ecialized ATM proto cols

 Resource property inquiry Information provided by the unied information service

Section can b e used to guide selection pro cesses within b oth Globus mo dules and

applications that use these mo dules For example a user might provide a rule that

states use ATM interface if load is low otherwise Internet hence using information

ab out network load to guide resource selection

 Notication A notication mechanism allows a higherlevel service or application

to sp ecify constraints on the quality of service delivered by a Globus service and to

0 1 2 EP SP EP SP

SP

Fig Nexus communication mechanisms The gure shows three contexts address spaces

and three communication links Three startpoints in context reference endpoints in contexts

and

name a callback function that should b e invoked if these constraints are violated

This mechanism can b e used for example to switch b etween networks when one

b ecomes loaded

Higherlevel services and applications can use Globus selection inquiry and notication

mechanisms to congure computations eciently for available resources andor to adapt

b ehavior when the quantity andor quality of available resources changes dynamically

during execution For example consider an application that p erforms computation on

one computer and transfers data over a wide area network for visualization at remote sites

At startup time the application can determine available computational p ower and network

capacity and congure its computational and communication structures appropriately eg

it might decide to use compression for some data but not others During execution

notication mechanisms allow it to adapt to changes in network quality of service

We use the term Adaptive Wide Area Resource Environment AWARE to denote

a set of application interfaces higherlevel services and adaptation p olicies that enable

sp ecic classes of applications to exploit a metacomputing environment eciently We

are investigating AWARE comp onents for several applications and anticipate developing

AWARE to olkits for dierent classes of metacomputing application

Globus Toolkit Comp onents

We now describ e in more detail the Globus communications information service authen

tication and data access services In each case we outline how the comp onent maps to

dierent implementations is used to implement dierent higherlevel services and supp orts

the development of AWARE services and applications

Communications

The Globus communications mo dule is based on the Nexus communication library

Nexus denes ve basic abstractions no des contexts threads communication links and

remote service requests Figure The Nexus functions that manipulate these abstractions

constitute the Globus communication interface This interface is used extensively by other

Globus mo dules and has also b een used to construct various higherlevel services including

parallel programming to ols Section The Active Messages and Fast Messages

systems have similarities in goals and approach but there are also signicant dierences

Nexus programs bind communication startp oints and endp oints to form communication

links If multiple startp oints are b ound to an endp oint incoming communications are

interleaved in the same manner as messages sent to the same no de in a message passing

system If a startp oint is b ound to multiple endp oints communication results in a multicast

op eration A startp oint can b e copied b etween pro cessors causing new communication links

to b e created that mirror the links asso ciated with the original startp oint This supp ort

for copying means that startp oints can b e used as global names for ob jects These names

can b e communicated and used anywhere in a distributed system

A communication link supp orts a single communication op eration an asynchronous

remote service request RSR An RSR is applied to a startp oint by providing a pro cedure

name and a data buer For each endp oint linked to the startp oint the RSR transfers

the data buer to the address space in which the endp oint is lo cated and remotely invokes

the sp ecied pro cedure passing the endp oint and the data buer as arguments A lo cal

address can b e asso ciated with an endp oint in which case startp oints asso ciated with the

endp oint can b e thought of as global p ointers to that address

The Nexus interface and implementation supp ort rulebased selection Section

of the metho dssuch as proto col compression metho d and quality of serviceused to

p erform communication Dierent communication metho ds can b e asso ciated with

dierent communication links with selection rules determining which metho d should b e

used when a new link is established These mechanisms have b een used to supp ort multiple

communication proto cols and selective use of secure communication in heterogeneous

environments

Metacomputing Information Service

As noted ab ove metacomputing environments dep end critically on access to information

ab out the underlying networked sup ercomputing system Required information can include

the following

 Conguration details ab out resources such as the amount of memory CPU sp eed

number of no des in a parallel computer or the number and type of network interfaces

available

 Instantaneous p erformance information such as p ointtopoint network latency

available network bandwidth and CPU load

 Applicationsp eci c information such as memory requirements or program structures

found eective on previous runs

Dierent data items will have dierent scop es of interest and security requirements

but some information at least may p otentially b e required globally by any system

comp onent Information may b e obtained from multiple sources for example from

standard information services such as the Network Information Service NIS or Simple

Network Management Proto col SNMP Management Information Base MIB from

sp ecialized services such as the Network Weather Service or from external sources

such as the system manager or an application

Globus denes a single unied access mechanism for this wide range of information

called the Metacomputing Information Service MIS Logically each MIS entry comprises

an index and zero or more asso ciated keyvalue pairs Frequently the index is the name of

a metacomputing resource and the asso ciated keyvalue pairs dene resource attributes

Applications can query the MIS to determine for example the value currently asso ciated with a particular key

rs

protocols udp

udpc

startups ss rsh

login foster

ssport

sun sun sun

protocolsatm udp

atminterfacec

udpinterfacew

login itf

startups rsh

Fig RDB entries describing the network interfaces authentication mechanisms and process

creation mechanisms supported by one RS and three Sun workstations

The MIS implementation currently uses a textual database format Figure A proxy

mechanism supp orts the integration of other data services such as SNMP and NIS and

allows remote access Issues such as caching and replication are dened by higherlevel

Globus services built on the MIS We are extending the MIS with a notication interface

that will allow higherlevel services or applications to request notications when sp ecied

conditions in the MIS b ecome true

Authentication Metho ds

The initial version of the Globus authentication mo dule supp orted password Unix RSH

and Secure So cket Layer authentication To increase the degree of abstraction at the to olkit

interface we are moving towards the use of the Generic Security System GSS GSS

denes a standard pro cedure and API for obtaining credentials passwords or certicates

for mutual authentication client and server and for messageoriented encryption and

decryption GSS is indep endent of any particular security mechanism and can b e layered

on top of dierent security metho ds such as Kerb eros and SSL

GSS must b e altered and extended to meet the requirements of metacomputing

environments As a metacomputing system may use dierent authentication mechanisms

in dierent situations and for dierent purp oses we require a GSS implementation that

supp orts the concurrent use of dierent security mechanisms In addition inquiry and

selection functions are needed so that higherlevel services implementing sp ecic security

p olicies can select from available lowlevel security mechanisms

Data Access Services

Services that provide metacomputing applications with access to p ersistent data can face

stringent p erformance requirements and must supp ort access to data lo cated in multiple

administrative domains Distributed le systems such as the Network File System and

Distributed File System address remote access to some extent but have not b een designed

for highp erformance applications Parallel le systems and IO libraries have b een designed

for p erformance but not for distributed execution

To address these problems the Globus data access mo dule denes primitives that

Application programs

ChemIO MPI-IO

A D I O RIO server R I O A D I O Local file systems

Local file systems

Fig RIO mechanisms for highperformance access to remote le systems

provide remote access to parallel le systems This remote IO RIO interface Figure

is based on the abstract IO device ADIO interface ADIO denes an interface for

op ening closing reading and writing parallel les It do es not dene semantics for caching

le replication or parallel le descriptor semantics Several p opular IO systems have b een

implemented eciently on ADIO RIO extends ADIO by adding transparent remote

access and global naming using a URLbased naming scheme RIO remote access features

use Nexus mechanisms

HigherLevel Services

In the preceding section we describ ed the core interfaces and services provided by the

Globus to olkit These interfaces are not intended for application use Rather they are

intended to b e used to construct higherlevel policy comp onents These p olicy comp onents

can serve the function of middleware on which yet higherlevel comp onents are constructed

to create applicationlevel interfaces In the following we describ e two such middleware

comp onents

Parallel Programming Interfaces

Numerous parallel programming interfaces have b een adapted to use Globus authentication

pro cess creation and communication services hence allowing programmers to develop

metacomputing applications using familiar to ols These interfaces include a complete

implementation of MPI and hence to ols layered on top of MPI such as many High

Performance Fortran systems Comp ositional C a parallel extension to C

Fortran M a taskparallel Fortran nPerl a version of the Perl scripting language extended

with remote reference and remote pro cedure call mechanisms and NexusJava a Java class

library that supp orts remote pro cedure calls

We say a few words here ab out the Globus implementation of MPI A Nexus

device supp orts the abstract device interface used within the MPICH implementation

of MPI Globus mechanisms are used for authentication and pro cess startup

These comp onents are suciently integrated that in the IWAY a user could allo cate

a heterogeneous collection of resources and then start a program simply by typing

impirun Under the covers of course Globus mechanisms are used to select appropriate

authentication pro cess creation and communication mechanisms

Unied Certicatebased Authentication

The Globus authentication interface can b e used to implement a range of dierent security

p olicies We are currently investigating a p olicy that denes a global public keybased

authentication space for all users and resources That is we provide a centralized authority

that dene systemwide names accounts for users and resources These names allow

an application to use a single user id and password for all resources They also p ermit

the application to verify the identity of requested resources Note that this p olicy do es not

address authorization resources can use their usual mechanisms to determine the users to

which they will grant access

While not practical for largescale op en environments the use of a centralized authority

to identify users and resources is appropriate for limitedscale testb ed environments such as

the IWAY and GUSTO see b elow The approach has the signicant advantage that it can

b e implemented easily with current certicatebased authentication proto cols such as that

provided in the Secure So cket Library SSL Note that while names that is certicates

are issued by a centralized certicate authority the authentication of users and services

involves only the agents b eing authenticated it do es not require any interaction with the

issuing authority

In the longer term authentication and authorization schemes must address the

requirements of larger dynamic heterogeneous communities in which trust relationships

span multiple administrative domains and can b e irregular and selective Some member

organizations will b e more trusting of sp ecic members than others still others may

b e comp etitors Some members may feel the need to control all trust relationships

explicitly even if it means that fewer community assets are available for their use this may

stimulate the evolution of subcommunities with their own set of trust and authorization

relationships Community members willing to delegate to other members the ability to

extend relationships on their b ehalf may more fully enjoy the b enets of membership in

the greater community

Our certicatebased p olicy can b e extended to supp ort limited forms of trust

delegation Globus resource certicates can b e given to sites with multiple resources or

users These sites can in turn set up a lo cal certicate authority which signs certicates

that it issues with the certicate issued by the Globus authority This situation is acceptable

if the user or resource trusts the administration of the site issuing the Globussigned

certicate

Globus Testbeds and Exp eriences

We have referred ab ove to the IWAY exp eriment in which a number of Globus comp onents

were rst deployed as part of the ISoft software environment These comp onents

included the Nexus communication library Kerb erosbased authentication a pro cess

creation mechanism and a centralized scheduler for compute resources While inadequate in

a number of resp ects eg nonscalable no scheduling of network resources this exp eriment

did demonstrate the advantages of the Globus approach of providing basic mechanisms I

WAY applications developed with a variety of parallel to ols MPI CAVEcomm CC

etc were p orted to the IWAY environment by adapting those to ols to use Globus

mechanisms for authentication pro cess creation and communication If Globus had

enforced a particular parallel programming interface considerable eort would have b een

required to adapt existing applications to use this interface

We are currently working with the highp erformance computing community to dene

additional testb eds We describ e here just one of these the Globus Ubiquitous Sup er

computing Testbed GUSTO GUSTO is intended initially at least as a computer science

rather than an application testb ed meaning that the initial development fo cus is on de

ploying and evaluating basic mechanisms for authentication scheduling communication

and information infrastructure In this resp ect it complements eorts such as the IWAY

that fo cus more on applications

GUSTO is intended to span approximately fteen sites ve core sites are active as of

November encompassing a number of sup ercomputers IBM SP Silicon Graphics

Power Challenge etc and workstations Basic connectivity is via the Internet although

some machines are also connected via OC Mbsec ATM networks Eventually we

exp ect most GUSTO sites to b e accessible over the National Science Foundations OC

vBNS network Authentication is based on the uniform certicate mechanism describ ed

ab ove The initial information service is provided by a centralized information server that

maintains information ab out resource conguration and current network characteristics

This information is up dated dynamically providing a more accurate view of system

conguration than was available in ISoft

Conclusions and Future Work

The Globus pro ject is attacking the metacomputing software problem from the b ottom up

by developing basic mechanisms that can b e used to implement a variety of higherlevel ser

vices Communication resource lo cation resource allo cation information authentication

data access and other services have b een identied and considerable progress has b een

made toward constructing quality implementations The denition development applica

tion evaluation and renement of these comp onents are ongoing pro cesses that we exp ect

to pro ceed for the next two years at least We hop e to involve more of the metacomputing

community in this pro cess by adapting relevant higherlevel services eg applicationlevel

scheduling p erformance steering to use Globus mechanisms and by participating

in the construction of additional testb eds eg GUSTO

The Globus pro ject is also addressing the conguration problem in metacomputing sys

tems with the goal of pro ducing an Adaptive Wide Area Resource Environment that sup

p orts the construction of adaptive services and applications We have introduced selection

information and notication mechanisms and have dened Globus comp onent interfaces so

that these mechanisms can b e used to guide the conguration pro cess Preliminary exp eri

ments with dynamic communication selection suggest that these conguration mechanisms

can have considerable value

In summary we list three areas in which we b elieve the Globus pro ject has already

made contributions and in which we hop e to see considerable further progress

 The denition of a core metacomputing system architecture on which a range of

alternative metacomputing environments can b e built

 The development of a framework that allows applications to resp ond to dynamic

b ehaviors in the underlying metacomputing environment and the denition and

evaluation of various adaptation p olicies

 The demonstration in testb eds such as the IWAY and GUSTO that useful higher

level services can b e layered eectively on top of the interfaces dened by the

Globus to olkit and that automatic conguration mechanisms can b e used to enhance

p ortability and p erformance

Acknowledgments

We gratefully acknowledge the contributions made by Steve Tuecke Jonathan Geisler

Craig Lee Steve Schwab Warren Smith Jace Mogill and John Garnett to the design

and implementation of Globus comp onents It is also a pleasure to acknowledge the

contributions of Tom DeFanti and Rick Stevens and the many other participants in the

IWAY pro ject

This work was supp orted by the National Science Foundations Center for Research in

Parallel Computation under Contract CCR by the Defense Advanced Research

Pro jects Agency under contract NC and by the Mathematical Information

and Computational Sciences Division subprogram of the Oce of Computational and

Technology Research US Department of Energy under Contract WEng

References

F Berman R Wolski S Figueira J Schopf and G Shao Applicationlevel scheduling on

distributed heterogeneous networks in Pro ceedings of Sup ercomputing ACM

C Catlett and L Smarr Metacomputing Communications of the ACM pp

K M Chandy and C Kesselman CC A declarative concurrent object oriented program

ming notation in Research Directions in Ob ject Oriented Programming The MIT Press

pp

T DeFanti I Foster M Papka R Stevens and T Kuhfuss Overview of the IWAY Wide

area visual supercomputing International Journal of Sup ercomputer Applications

pp

D Diachin L Freitag D Heath J Herzog W Michels and P Plassmann Remote engineering

tools for the design of pollution control systems for commercial boilers International Journal

of Sup ercomputer Applications pp

T L Disz M E Papka M Pellegrino and R Stevens Sharing visualization experiences among

remote virtual environments in International Workshop on High Performance Computing for

Computer Graphics and Visualization SpringerVerlag pp

I Foster J Geisler C Kesselman and S Tuecke Managing multiple communication

methods in highperformance networked computing systems Journal of Parallel and Distributed

Computing To app ear

I Foster J Geisler W Nickless W Smith and S Tuecke Software infrastructure for the

IWAY highperformance distributed computing experiment in Pro c th IEEE Symp on High

Performance Distributed Computing IEEE Computer So ciety Press pp

I Foster J Geisler and S Tuecke MPI on the IWAY A widearea multimethod implementa

tion of the Message Passing Interface in Pro ceedings of the MPI Developers Conference

IEEE Computer So ciety Press pp

I Foster N Karonis C Kesselman G Ko enig and S Tuecke A secure communications in

frastructure for highperformance distributed computing preprint Mathematics and Computer

Science Division Argonne National Lab oratory Argonne Ill

I Foster C Kesselman and S Tuecke The Nexus approach to integrating multithreading and

communication Journal of Parallel and Distributed Computing To app ear

A Geist A Beguelin J Dongarra W Jiang B Manchek and V Sunderam PVM Parallel

Virtual MachineA Users Guide and Tutorial for Network Parallel Computing MIT Press

A Grimshaw and W Wolf Legion a view from feet in Pro c th IEEE Symp on

High Performance Distributed Computing IEEE Computer So ciety Press pp

A Grimshaw W Wulf J French A Weaver and P Reynolds Jr Legion The next logical

step toward a nationwide virtual computer Tech Rep CS Department of Computer

Science University of Virginia

W Gropp E Lusk N Doss and A Skjellum A highperformance portable implementation

of the MPI message passing interface standard Technical Rep ort ANLMCSTM Math

ematics and Computer Science Division Argonne National Lab oratory Argonne Ill

W Gropp E Lusk and A Skjellum Using MPI Portable Parallel Programming with the

Message Passing Interface MIT Press

C Lee C Kesselman and S Schwab Nearrealtime satel lite image processing Metacomputing

in CC Computer Graphics and Applications pp

J Linn Generic security service application program interface Internet RFC

M Litzkow M Livney and M Mutka Condor a hunter of id le workstations in Pro c th

Intl Conf on Distributed Computing Systems pp

A Mainwaring Active Message applications programming interface and communication sub

system organization tech rep Dept of Computer Science UC Berkeley Berkeley CA

C Mechoso et al Distribution of a Coupledocean General Circulation Model across high

speed networks in Pro ceedings of the th International Symp osium on Computational Fluid

Dynamics

J Morris et al Andrew A distributed personal computing environment Communications of

the ACM

J Nieplo cha and R Harrison Shared memory NUMA programming on the IWAY in Pro c

th IEEE Symp on High Performance Distributed Computing IEEE Computer So ciety Press

pp

M Norman et al Galaxies collide on the IWAY An example of heterogeneous widearea

collaborative supercomputing International Journal of Sup ercomputer Applications

pp

S Pakin M Lauria and A Chien High performance messaging on workstations Il linois Fast

Messages fm for Myrinet in Pro ceedings of Sup ercomputing IEEE Computer So ciety

Press

D Reed C Elford T Madhyastha E Smirni and S Lamm The Next Frontier Interactive

and Closed Loop Performance Steering in Pro ceedings of the ICPP Workshop on

Challenges for Parallel Pro cessing Aug pp

R Thakur W Gropp and E Lusk An AbstractDevice Interface for Implementing Portable

ParallelIO Interfaces in Pro ceedings of The th Symp osium on the Frontiers of Massively Parallel Computation Octob er