Digital Software and Data Repositories for
Support of Scientific Computing

Ronald Boisvert

National Institute of Standards and Technology

Shirley Browne†

University of Tennessee

Jack Dongarra

University of Tennessee and Oak Ridge National Lab

Eric Grosse

AT&T Bell Laboratories

Abstract

This paper discusses the special characteristics and needs of software
repositories and describes how these needs have been met by some existing
repositories. These repositories include Netlib, the National HPCC
Software Exchange, and the GAMS Virtual Repository. We also describe
some systems that provide on-line access to various types of scientific data.
Finally, we outline a proposal for integrating software and data
repositories into the world of digital document libraries, in particular CNRI's
ARPA-sponsored Digital Library project.



The work described in this paper was sponsored by NASA under Grant No.
NAG5-2736, by the National Science Foundation under Grant No. ASC-9103853,
and by AT&T Bell Laboratories.

† Author to whom correspondence should be directed. 107 Ayres Hall,
Computer Science Department, University of Tennessee, Knoxville,
TN 37996-1301, (615) 974-5886, [email protected]

1 Introduction

Most work on digital libraries has focused on storage, retrieval, and display
of digital forms of documents. A number of on-line software repositories have
been developed that provide access to software and software artifacts. These
document and software efforts have been mostly independent, with little
attention paid to integrating the two types of libraries and to developing common
principles for organization and operation.

This paper discusses the special characteristics and needs of software
repositories and describes how these needs have been met by some existing
repositories. These repositories include Netlib [20, 12], the National HPCC Software
Exchange [13], and the GAMS Virtual Repository [8]. We also describe some
systems that provide on-line access to various types of scientific data. Finally,
we outline a proposal for integrating software and data repositories into the
world of digital document libraries, in particular CNRI's ARPA-sponsored
Digital Library project [28, 26].

2 Characteristics of Some Existing Software Repositories

2.1 Netlib

Netlib began services in 1985 to fill a need for cost-effective, timely distribution
of high-quality mathematical software to the research community. Some of the
libraries Netlib distributes, such as EISPACK, LINPACK, FFTPACK, and
LAPACK, have long been used as important tools in scientific computation and
are widely recognized to be of high quality. The Netlib collection also includes
a large number of newer, less well-established codes. Most of the software is
written in Fortran, but programs in other languages, such as C and C++, are
also available.

Netlib sends, by return electronic mail, requested routines together with
subsidiary routines and any requested documents or test programs supplied by
the software authors [20]. Xnetlib, an interactive tool for software and document
distribution [19], uses an X Window interface and TCP/IP connections to allow
users to receive replies to their requests within a matter of seconds. The interface
provides a number of modes and searching mechanisms to facilitate searching
through a large distributed collection of software and documents. World Wide
Web browsers such as Mosaic and Netscape can also be used to access Netlib
via HTTP and FTP.¹

Although the original focus of the Netlib repository was on mathematical
software, the collection has grown to include other software such as networking
tools and tools for visualization of multiprocessor performance data, technical
reports and papers, a Whitepages Database, benchmark performance data, and
information about conferences and meetings. The number of Netlib servers has
grown from the original two, one of them at Oak Ridge National Laboratory
(initially at Argonne National Laboratory), to servers in Norway, the United
Kingdom, Germany, Australia, Japan, and Taiwan. A mirroring mechanism
keeps the repository contents at the different sites consistent on a daily
basis and automatically picks up new material from distributed editorial
sites [24].

¹ Netlib is accessible from a WWW browser at http://www.netlib.org/

Netlib differs from other publicly available software distribution systems,
such as Archie, in that the collection is moderated by an editorial board and
the software contained in it is widely recognized to be of high quality. However,
the Netlib repository is not intended to replace commercial software.
Commercial software companies provide value-added services in the form of support.
Although the Netlib collection is moderated, its software comes with no
guarantee of reliability or support. Rather, the lack of bureaucratic, legal, and
financial impediments encourages researchers to submit their codes by ensuring
that their work will be made available quickly to a wide audience.

2.2 The National HPCC Software Exchange (NHSE)

The National HPCC Software Exchange (NHSE) is an Internet-accessible
resource that will facilitate the exchange of software and information among
research and computational scientists involved with High Performance
Computing and Communications (HPCC) [13]. The purpose of the NHSE is to
promote the development of discipline-oriented software and document
repositories and of contributions to and use of such repositories by Grand Challenge
teams, as well as by other members of the high performance computing
community. The target audiences for the NHSE include HPCC application and
computer scientists, users of government supercomputer centers, and potential
industrial users. A prototype of the NHSE is accessible from a WWW browser
at http://www.netlib.org/nse/.

The scope of the NHSE is software and software-related artifacts produced
by and for the HPCC Program. Software-related artifacts include algorithms,
specifications, designs, and software documentation. The following types of
software are to be made available:

• Systems software and software tools. This category includes parallel
  processing tools such as parallel compilers, message-passing communication
  subsystems, and parallel monitors and debuggers.

• Data analysis and visualization tools.

• Basic building blocks for accomplishing common computational and
  communication tasks. These building blocks will be of high quality and
  transportable across platforms. Building blocks are meant to be used by Grand
  Challenge teams and other researchers in implementing programs to solve
  computational problems. Use of high-quality transportable components
  will speed implementation, as well as increase the reliability of computed
  results.

• Research codes that have been developed to solve difficult computational
  problems. Many of these codes will have been developed to solve specific
  problems and thus will not be reusable as is. Rather, they will serve as
  proofs of concept and as models for developing general-purpose reusable
  software for solving broader classes of problems. The development of this
  reusable software is expected to be undertaken by commercial companies,
  rather than by academic researchers.

A catalog of the software currently available from the NHSE is accessible at
http://www.netlib.org/nse/sw_survey.html.

Although the different disciplines will maintain their own software
repositories, users should not need to access each of these repositories separately.
Rather, the NHSE will provide a uniform interface to a virtual HPCC software
repository which will be built on top of the distributed set of discipline-oriented
repositories. The interface will assist the user in locating relevant resources
and in retrieving these resources. A combined browse/search interface will
allow the user to explore the various HPCC areas and become familiar with the
available resources. A longer term goal of the NHSE is to provide users with
domain-specific expert help in locating and understanding relevant resources.

2.3 GAMS Virtual Repository

The Guide to Available Mathematical Software (GAMS) project of the National
Institute of Standards and Technology (NIST) studies techniques to provide
scientists and engineers with improved access to reusable computer software
components available to them for use in mathematical modeling and statistical
analysis. One of the products of this work is the GAMS system, an on-line
cross-index and virtual repository of mathematical software [8]. GAMS performs the
function of an inter-repository and inter-package cross-index, collecting and
maintaining data about software available from external repositories and presenting
it as a homogeneous whole. It also provides the functions of a repository itself
(i.e., retrieval). However, instead of maintaining the cataloged software itself,
it provides transparent on-demand access to repositories managed by others.

GAMS currently contains information on more than 9800 problem-solving
software modules from about 85 packages found in four physically distributed
software repositories (three maintained at NIST and Netlib). In addition to most
of the software in the Netlib collection, GAMS cross-indexes individual
components in large multipurpose libraries such as IMSL, NAG, PORT, STARPAC
and SLATEC, as well as capabilities of statistical analysis systems such as
DATAPLOT and SAS. Both public-domain and commercial software is cataloged,
and although source code of proprietary software products is not available
through GAMS, related items such as documentation and example programs
often are.

All problem-solving software modules in GAMS are assigned one or more
problem classifications from a 736-node tree-structured taxonomy of
mathematical and statistical problems developed as part of the project [9]. Users
can browse through modules in any given problem class. To find an appropriate
class, one can use the taxonomy as a decision tree, or enter keywords
which are then mapped to problem classes. Search filters can be declared which
allow users to specify preferences such as computing precision or programming
language. In addition, users can browse through all modules in a given package,
or all modules with a given name. Each module's abstract lists the retrievable
objects associated with the module, such as documentation, examples, test
programs, source code and dependencies. More than 32,000 such objects can be
retrieved.
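The browsing, keyword mapping, and filtering described above can be illustrated with a small sketch. It is not the GAMS implementation: the class codes, titles, module names, and the `find_classes` and `modules_under` helpers are all hypothetical, and the keyword mapping is reduced to a substring test on class titles.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemClass:
    """One node of a tree-structured problem taxonomy."""
    code: str
    title: str
    children: list = field(default_factory=list)
    modules: list = field(default_factory=list)  # modules classified here


def find_classes(node, keyword):
    """Return codes of all classes whose title contains the keyword."""
    hits = [node.code] if keyword.lower() in node.title.lower() else []
    for child in node.children:
        hits.extend(find_classes(child, keyword))
    return hits


def modules_under(node, **filters):
    """Collect modules in this class and its subclasses that match all filters."""
    found = [m for m in node.modules
             if all(m.get(k) == v for k, v in filters.items())]
    for child in node.children:
        found.extend(modules_under(child, **filters))
    return found


# Hypothetical fragment of a taxonomy with made-up codes and modules.
dense = ProblemClass("X1a", "Dense linear systems", modules=[
    {"name": "solve_a", "language": "Fortran", "precision": "double"},
    {"name": "solve_b", "language": "C", "precision": "single"},
])
sparse = ProblemClass("X1b", "Sparse linear systems", modules=[
    {"name": "solve_c", "language": "Fortran", "precision": "double"},
])
root = ProblemClass("X1", "Linear algebra", children=[dense, sparse])

matches = find_classes(root, "linear systems")
fortran = [m["name"] for m in modules_under(root, language="Fortran")]
```

Keeping the filters as keyword arguments means a preference such as precision or language can be added without changing the traversal.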

At the core of the GAMS system is a relational database of information
about available software. This database is maintained at NIST, which provides
a classification service for the repositories it indexes. The GAMS network server
provides this information to network clients using a specialized protocol over
TCP/IP connections. In addition, a gateway to the World Wide Web has been
developed using the Common Gateway Interface (CGI) facility of NCSA's httpd
server.²

3 Indexing and Searching of Software Objects

Cataloguing information for software objects serves two purposes: (1) to supply
material from which a searchable index may be constructed, and (2) to supply
information needed by the user to select/reject search hits and to obtain and
use selected software. The field names and definitions used for cataloguing in
a particular library are described in the data model used by that library. For
example, for document libraries, the CSTR project [26] has adopted RFC 1357
[14] as its data model.

3.1 Data Models

Data models used for document digital libraries are not in general suitable for
use by software repositories. Although some fields are useful in both settings
(e.g., author, title, abstract), software cataloguing requires a number of additional
fields. A field that appears in most major catalogs is a requirements field that
lists the hardware, operating system, and other software needed to use the
catalogued item. Another important field in the case of software
repositories such as Netlib, where software is author-supported, is a contact
field giving an electronic mail address to which questions and bug reports may
be sent. Still another field used by many software repositories is a certification
field that tells at what level the software has been certified and possibly includes
pointers to certification artifacts such as completed checklists and testing results.

² Accessible at http://gams.nist.gov/

The Reuse Library Interoperability Group (RIG) has developed and
approved the Basic Interoperability Data Model (BIDM) as a standard data model
for software repository interoperability, and the BIDM has been submitted for
balloting as an IEEE standard [3]. The BIDM will be used as a lowest
common denominator for interoperation between software repositories. However,
because of considerable variation in the purpose, contents, and application
domains of different software repositories, no single data model will be suitable
for all, and important cataloguing information will be lost in exporting to the
BIDM. An example where such loss occurs is the certification field. This field
was not included in the BIDM because of the wide variety of certification and
evaluation methods in use at different repositories. The RIG also encountered
difficulty in developing controlled vocabulary lists, because again different sets
of terms were appropriate for different repositories. The approach now being
taken by the RIG is to define a standard for an Extensible Uniform Data Model
(EUDM), which will be a meta-model a repository can use to describe the data
model it is using. For example, using the EUDM, a repository will be able to
define its certification methods and the meanings of different certification levels.
As a member of the RIG, Netlib is participating in development and promotion
of standard data models for software repositories.
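The loss of information on export to a lowest-common-denominator model can be made concrete with a minimal sketch. The field list in `COMMON_FIELDS` is an assumption chosen for illustration and does not reproduce the actual BIDM; `CatalogRecord` and `export_common` are likewise hypothetical names.

```python
from dataclasses import dataclass, asdict

# Hypothetical set of fields shared by every repository; the real BIDM
# field list is not reproduced here.
COMMON_FIELDS = ("name", "author", "abstract")


@dataclass
class CatalogRecord:
    name: str
    author: str
    abstract: str
    requirements: str = ""   # hardware, operating system, other software
    contact: str = ""        # address for questions and bug reports
    certification: str = ""  # repository-specific certification level


def export_common(record):
    """Split a record into (common-model fields, non-empty fields lost on export)."""
    data = asdict(record)
    common = {k: data[k] for k in COMMON_FIELDS}
    lost = {k: v for k, v in data.items() if k not in COMMON_FIELDS and v}
    return common, lost


# Made-up record used only to exercise the export.
record = CatalogRecord(
    name="example-solver",
    author="A. Author",
    abstract="Solves an example problem.",
    requirements="Fortran 77 compiler",
    certification="level 2",
)
common, lost = export_common(record)
```

Returning the dropped fields alongside the exported ones makes it visible which repository-specific information, such as the certification level, a receiving repository never sees.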

3.2 Software Classification

A number of studies have shown that proper classification of software objects
contributes to effective location of the objects by potential reusers [15, 31, 22].
Classification is carried out by assigning codes and/or keywords from a
classification scheme or thesaurus. Classifications and thesauri developed for indexing
documents, such as the INSPEC classification [1] and INSPEC thesaurus [2],
are inadequate for indexing software objects. Firstly, these tools cover a broad
range of topics and cover software-related topics only superficially. Thus, they
do not allow the user to discriminate finely enough among the available software
objects. Secondly, effective classification of software objects requires that the
function of the object be indexed [10]. Because documents are not used as
software is, terms related to function have not generally been included in thesauri
developed for document indexing.

A classification scheme that has been developed specifically for use in
indexing mathematical software is the GAMS hierarchy [9]. The GAMS scheme has
been widely adopted by network-accessible repositories such as Netlib and by
commercial mathematical software libraries such as NAG and IMSL. A
successor to GAMS is currently under development. The new scheme will refine areas
needing better discrimination and will add new categories to encompass recent
developments in numerical algorithms. The new scheme will also be reorganized
so as to be less cumbersome.

An HPCC software thesaurus is under development as part of the NHSE.
This thesaurus uses GAMS categories for mathematical software but defines
new terms for other areas of high performance computing. Some of these terms
are drawn from an existing HPCC glossary [25] and from a book that gives
an overview of parallel computing [21]. Terms from the INSPEC thesaurus [2]
are being used for application areas. The thesaurus has a faceted structure
with facets for application area, problem area, function, algorithm, and target
architecture. The thesaurus will be made available on-line in hypertext form to
assist with searching the NHSE, similar to [30]. Hypertext links from terms to
scope and definition notes will also be provided.

3.3 Search Interfaces

Netlib currently provides a search interface³ that allows the user to do
field-specific keyword searching. For example, the user may search by author,
filename, abstract keywords, or GAMS classification. Search results are returned
as a hypertext list of catalog records from which the user may select files to
view or download.
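A field-specific keyword search of this kind can be sketched in a few lines. The record layout, field names, and the `search` function are assumptions made for illustration; the actual Netlib index format and matching rules are not described in this paper.

```python
def search(catalog, **criteria):
    """Return records whose named fields contain the given keywords.

    Matching is a case-insensitive substring test, and every criterion
    must match (logical AND).
    """
    def matches(record):
        return all(
            keyword.lower() in str(record.get(field_name, "")).lower()
            for field_name, keyword in criteria.items()
        )
    return [r for r in catalog if matches(r)]


# Made-up catalog records; the field names are illustrative assumptions.
catalog = [
    {"file": "alpha.f", "author": "Smith",
     "abstract": "sparse linear solver", "gams": "X1b"},
    {"file": "beta.f", "author": "Jones",
     "abstract": "dense linear solver", "gams": "X1a"},
    {"file": "gamma.c", "author": "Smith",
     "abstract": "fast Fourier transform", "gams": "X2"},
]

by_author = search(catalog, author="smith")
narrowed = search(catalog, author="smith", abstract="linear")
```

Combining criteria narrows the hit list, which is the main way a user can cut down a large result set before browsing the returned records.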

Searching by keywords or classification codes often returns a large number
of search hits, leaving the user to sort through them. Further discrimination
often cannot be provided by an overall classification scheme, but requires use of
a domain-specific knowledge base. Such knowledge bases have been constructed
for specialized domains, including differential equations [29, 27, 4] and
approximation [23]. We are experimenting with providing on-line hypertext interfaces
to such knowledge bases. For example, we have provided a hypertext version of
a decision tree for approximation algorithms.⁴

We have also developed a prototype expert help system to assist users in
selecting software within specific domains [7, 8]. An advisory system for a
given problem class helps the user discriminate between problem-solving
software modules for that class. For a given problem class, a set of features is
partitioned into a small set of feature classes, and information is encoded about
how each feature applies to each software module. Prototype user interfaces
have been developed that allow the user to interact with choice widgets for
the various features. The system provides more specific and effective help in
selecting software than a domain-independent search interface.
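One way to picture the encoded feature information is as a table of modules against features, queried with the features the user selects. The sketch below is only an illustration of that idea: the feature names, module names, and the ranking rule in `advise` are assumptions, not the design of the prototype described in [7, 8].

```python
# Hypothetical feature table for one problem class: for each module,
# which features it supports. Module and feature names are made up.
FEATURES = {
    "module_a": {"stiff": True, "error_control": True, "dense_output": False},
    "module_b": {"stiff": False, "error_control": True, "dense_output": True},
    "module_c": {"stiff": True, "error_control": False, "dense_output": True},
}


def advise(feature_table, required):
    """Rank modules by how many of the user's required features they support.

    Returns (module, satisfied_count) pairs, best match first; ties are
    broken alphabetically by module name.
    """
    scored = [
        (name, sum(1 for f in required if supported.get(f, False)))
        for name, supported in feature_table.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranking = advise(FEATURES, ["stiff", "dense_output"])
```

Ranking rather than strict filtering keeps near misses visible, which may matter when no module satisfies every requested feature.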

³ Accessible from a WWW browser at http://www.netlib.org/nse/netlib_query.html

⁴ Accessible from a WWW browser at http://www.netlib.org/a/catalog.html

4 Retrieval of Software Objects

Once users have located relevant software objects, they need to be able to make
use of them. Modes of use include the following:

1. Downloading, configuring, compiling, and executing a complete program
   or package.

2. Downloading routines and combining them with a user-supplied main
   program and other user-supplied routines before compiling and executing.

3. Using retrieved source code as a starting point for developing software
   with similar functionality.

4. Downloading templates or archetypes that provide a framework for writing
   actual code.

5. Using a remote execution service, i.e., shipping input data over a network
   to a remote execution server and then retrieving the resulting output data.

4.1 Downloading Files

A user may download files from Netlib by sending an email request or by
clicking on the filenames from Xnetlib, GAMS, or WWW interfaces. Netlib files
are organized into directories. Some directories contain a single package, such
as the LAPACK directory, while others contain programs for a particular
domain, such as the OPTimization directory. Each directory contains an index
file containing catalog records for the files in the directory. Most directories also
contain a readme file giving an overview of the directory. Some directories have
subdirectories, for example for the different Fortran precisions available and for
test and example programs. A user may initially do a keyword search to locate
relevant directories and then browse the index files for those directories to locate
relevant files, which may then be downloaded.

When downloading a routine from Netlib, a user may make use of a
dependency-checking mechanism that allows retrieval of the entire dependency tree for that
routine. The user may specify that a subtree be omitted, however, if those
routines have been retrieved previously. There is also an automatic tar facility that
builds and returns a tar file of any Netlib directory or subdirectory upon user
request. Binary executable files for several Netlib packages are also available,
for example, for the Xnetlib and HeNCE packages.
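The dependency-checking retrieval described above amounts to a traversal of a call graph with some subtrees pruned. The following is a minimal sketch under that reading; the graph contents and the `dependency_closure` function are hypothetical and do not reflect how Netlib stores or computes dependencies.

```python
def dependency_closure(routine, depends_on, omit=()):
    """Return the routine plus everything it depends on, depth-first.

    Routines listed in `omit` are skipped, and so is anything reachable
    only through them, mirroring the option to leave out code that has
    been retrieved previously.
    """
    ordered, seen = [], set(omit)

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        ordered.append(name)
        for child in depends_on.get(name, ()):
            visit(child)

    visit(routine)
    return ordered


# Made-up dependency graph: each routine maps to the routines it calls.
DEPENDS_ON = {
    "solve": ["factor", "backsub"],
    "factor": ["pivot", "scale"],
    "backsub": ["scale"],
}

full = dependency_closure("solve", DEPENDS_ON)
partial = dependency_closure("solve", DEPENDS_ON, omit=["factor"])
```

In the second call `scale` is still returned because `backsub` needs it, while `pivot`, reachable only through the omitted `factor`, is left out.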

Users may download files from the NHSE via a WWW browser. The NHSE
provides a searchable catalog of HPCC software, as well as a browseable listing
by category.⁵ Each entry in the catalog includes a URL that may be
clicked on to retrieve either the software or more information about it. Clicking on
these URLs connects the user to the software provider's home site; i.e., the
NHSE provides an interface to a virtual distributed repository consisting of a
large number of independently maintained physical repositories. In the near
future, the NHSE will switch from using URLs to using location-independent
names [11]. Use of location-independent names will allow files to be moved
without requiring references to them to be changed and will permit transparent
mirroring and reliable caching.

⁵ Accessible at http://www.netlib.org/nse/sw_survey.html

Care should always be exercised when downloading and using files obtained
over a network, especially tar files and executables. Although Netlib is
regarded by most users as a trusted source, it would be possible for someone to
impersonate a Netlib server and make dangerous tar or executable files
available, purportedly from Netlib. Because source code is unlikely to be examined
closely by the user, deliberately introduced bugs or other malicious
modifications might also slip past in source code form. In the near future, both Netlib
and the NHSE will use public key cryptography to allow users to authenticate
the source of downloaded files.
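The paper does not say how the planned public-key authentication will work, so the sketch below substitutes a simpler and weaker check: comparing a downloaded file against a SHA-256 digest that is assumed to have been obtained over a trusted channel. It detects modification in transit but, unlike a digital signature, does not by itself prove who published the file. The function names and sample bytes are illustrative.

```python
import hashlib
import hmac


def digest_of(data):
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def is_untampered(data, trusted_digest):
    """Check downloaded bytes against a digest obtained from a trusted channel."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest_of(data), trusted_digest)


# Stand-in for a downloaded source file and a modified copy of it.
original = b"      subroutine demo\n      end\n"
published = digest_of(original)   # assumed to be distributed securely
tampered = original.replace(b"demo", b"evil")

ok = is_untampered(original, published)
bad = is_untampered(tampered, published)
```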

4.2 Templates and Archetypes

A template is a description of a general algorithm rather than the executable
object code or the source code more commonly found in a conventional software
library [5]. Templates may be customized by the user; for example, they can
be configured for the specific data structure of a problem or for the specific
computing system on which the problem is to run. Templates are written in a
language-independent Algol-like structure, which is readily translatable into a
target language such as Fortran or C. A collection of templates focusing on
iterative methods for solving large sparse linear systems is available from Netlib.⁶
For each template, the following is provided: a mathematical description of the
flow of the iteration; discussion of convergence and stopping criteria; suggestions
for applying a method to special matrix types (e.g., banded systems); advice for
tuning (for example, which preconditioners are applicable and which are not);
tips on parallel implementations; and hints as to when to use a method, and
why.

A program archetype is (a) a program design strategy appropriate for a
restricted class of problems, and (b) a collection of program designs with (c)
implementations of exemplar problems in one or more programming languages
and optimized for a collection of target machines. The program design strategy
includes archetype-specific information about methods of deriving a program
from a specification, methods of parallelizing sequential programs, the program
structure, methods of reasoning about correctness and performance, empirical
data on performance measurements and tuning for different kinds of machines,
and suggestions for test suites. A project at Caltech is exploring the question
of whether a library of parallel program archetypes can be used to reduce the effort
required to produce correct, efficient programs.⁷

⁶ Accessible at http://www.netlib.org/linalg/html_templates/report.html

4.3 Remote Execution

If a user needs to run a program only infrequently, and if compiling and
installing the program involves considerable overhead, the user may prefer to
take advantage of a remote execution service if one is available. Netlib has
experimented with making remote execution of the Fortran-to-C converter (f2c)
and Fortran checking (ftnchek) programs available. We configured the Mosaic
WWW browser to invoke the Tcl language interpreter to execute downloaded
files of type application/x-safe-tcl. We then made downloadable user interfaces
for the f2c and ftnchek programs available on a Netlib server. Users could then
download the interface modules and use them to interact with the remote
execution services. The user interfaces allowed the user to select files to be transferred
for processing and to set various options. Our work on remote execution is
experimental at this stage because a safe client execution environment for Tcl has
not yet been rigorously defined, although researchers at Sun Microsystems are
working on it.

Software for using ORNL's GRAIL and GENQUEST remote execution
services for doing DNA sequence analysis, gene assembly, and sequence comparison
is available through the NHSE.⁸ These services cannot be used from a WWW
browser, but require downloading specialized X-Windows client software. In the
future, the NHSE plans to support use of such services from a WWW browser
by means of a safe execution environment for downloadable Safe-Tcl code, so
that the client module may be executed directly from the browser.

Another possibility for remote execution is to allow users to upload
executable code to a Netlib server and run it there. For example, the user might
want to send an agent that would sift through and summarize computer
performance data residing on Netlib. A search service such as Harvest might send
an agent that would summarize the contents of Netlib and stream the summary
back to an indexing engine. We are investigating the provision of this kind of
user-directed remote execution using the Plan 9 operating system developed at
AT&T Bell Labs.

4.4 Change Notification

Some digital document libraries have a notification service that informs
subscribers of newly available documents. The notification service for a software
repository is somewhat different, because it informs subscribers of changes and
bug fixes to the software as much or more than of additions of new software.

⁷ More information is available at
http://www.etext.caltech.edu/Papers2/ArchetypeOverview/ArchPaper.html

⁸ More information is available at http://avalon.epm.ornl.gov/

In the early days of the Netlib repository, when all access was by email and the
traffic was mostly from professional numerical analysts, we relied on log files
to send out notification of important bug fixes to everyone who had retrieved
affected files. Now, because access is more anonymous and a wider spectrum of
users is involved, the old scheme has been replaced by explicit subscription.
People may indicate interest in specific Netlib directories, using subscribe and
unsubscribe commands. Automatic notification is sent, on a daily basis, when
files in the directory are changed. The subscriber lists also give the authors
and editors a way to judge what community is particularly interested in a given
Netlib collection.
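The subscription scheme can be sketched as a small registry that collects a day's changes and groups them per subscriber. This is an illustrative model only: the class, its method names, and the addresses and file names in the example are assumptions, and the real service is driven by email commands rather than method calls.

```python
from collections import defaultdict


class NotificationService:
    """Tracks per-directory subscribers and builds one daily digest each."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # directory -> addresses
        self.changed = defaultdict(list)      # directory -> files changed today

    def subscribe(self, address, directory):
        self.subscribers[directory].add(address)

    def unsubscribe(self, address, directory):
        self.subscribers[directory].discard(address)

    def record_change(self, directory, filename):
        self.changed[directory].append(filename)

    def daily_digest(self):
        """Return {address: sorted changed paths} and reset the change log."""
        digest = defaultdict(list)
        for directory, files in self.changed.items():
            for address in self.subscribers[directory]:
                digest[address].extend(f"{directory}/{name}" for name in files)
        self.changed.clear()
        return {address: sorted(paths) for address, paths in digest.items()}


# Illustrative data: two subscribers, one of whom later unsubscribes.
service = NotificationService()
service.subscribe("user@example.org", "lapack")
service.subscribe("user@example.org", "opt")
service.subscribe("other@example.org", "opt")
service.unsubscribe("other@example.org", "opt")
service.record_change("lapack", "dgesv.f")
service.record_change("opt", "readme")
digest = service.daily_digest()
```

Batching changes into one digest per day matches the daily notification described above and keeps a busy directory from generating a message per file.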

5 Access to Scientific Data

The Netlib Performance Database provides an on-line catalog of public-domain
computer benchmarks such as the Linpack Benchmark, Perfect Benchmarks,
and the Genesis Benchmarks [6]. A benchmark code is a program designed to
be run on an architecture so as to produce a relative measure of its execution.
Benchmarks tend to evolve from individual applications that do not necessarily
stress all features of a given architecture. Thus, benchmark numbers do not
imply general machine performance but instead describe the performance of a
machine on an algorithm or application class.

Although benchmarking has become very popular because of the diversity
and competition in the computer hardware business, there was, previous to
development of our database, no central repository for benchmark data. The
WWW interface to our Performance Database⁹ allows the user to

• view complete benchmark reports that display sorted data from various
  published benchmark reports,

• browse the performance data tree by selecting the benchmark and
  machines about which information is desired,

• search the performance database.

There are also pointers to benchmark papers and other benchmark and
performance-related literature.

Various archives of scientific data are accessible from the NHSE, for
example, NASA's Planetary Data System¹⁰ and Astrophysics Data System¹¹,
NIST's Atomic Spectroscopic Database¹², and NOAA's Environmental Data
Centers¹³. There is no uniform cataloguing method or search interface for these
databases, nor a standard way of describing the contents and services offered.
Thus, the user has no way of systematically discovering relevant databases and
must learn a different interface for each one.

⁹ Accessible at http://performance.netlib.org/performance/html/PDStop.html

¹⁰ Accessible at http://stardust.jpl.nasa.gov/pds_home.html

¹¹ Accessible at http://adswww.harvard.edu

¹² Accessible at http://aeldata.phy.nist.gov/nist_beta.html

¹³ Accessible at http://www.esdim.noaa.gov/

6 Integration with Document Digital Libraries

The Corporation for National Research Initiatives (CNRI) is working with five
major universities (CMU, Cornell, UC-Berkeley, Stanford, and MIT) on an
ARPA-sponsored project to develop concepts for digital libraries. As part of
this project, each university is placing its Computer Science Technical Reports
on-line and providing access to the distributed CSTR collection. Technologies
developed for the CSTR project include the Dienst distributed search system
[18] and a Handle Management Service for assigning, maintaining, and using
unique identifiers for digital library objects [16].

The basic architecture being developed by CNRI for distributed digital
libraries includes the following concepts [26]:

• A digital object, which consists of a sequence of bits plus a unique identifier
  known as a handle (the binding between the handle and the sequence of
  bits may change over time).

• Naming authorities, who are responsible for assigning unique identifiers
  within their portions of the handle namespace.

• Repositories, from which digital objects are available.

• Information and Reference (IR) servers that provide reference information
  about collections of digital objects.

The CNRI work is closely related to the IETF Uniform Resource Identifier
(URI) Working Group's work on Uniform Resource Names (URNs) [32] and
Uniform Resource Citations [17]. CNRI's handle is the equivalent of IETF's
URN, and CNRI's IR server serves a similar function to IETF's URC server.

The Netlib and NHSE Development Group has been engaged in a parallel
effort to implement a location-independent naming architecture [11]. We provide
for two types of location-independent names:

• a Uniform Resource Name (URN), for which the contents it refers to may
  change, e.g., the "current version of LAPACK".

• a Location Independent File Name (LIFN), for which the binding between
  the name and the byte contents of the file it refers to is fixed, once assigned.
  This type of name is needed for unambiguous references when attaching
  critical reviews or reporting scientific results obtained using a particular
  version of a piece of software. LIFNs also permit reliable and efficient
  caching and mirroring of files.

At any given time, a URN is associated with exactly one LIFN. By looking up
the LIFN associated with a URN and then retrieving the file corresponding to
that LIFN, the user is assured of retrieving the most recent copy, even if
some mirrored copies are out-of-date. Thus, we obtain consistency of
replicated copies without the overhead of a replica control protocol.
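
The two-step lookup described above can be sketched as follows. This is a
minimal in-memory illustration, not the Netlib/NHSE implementation: the
`NameServer` and `Mirror` classes, the `fetch_current` helper, the name
formats, and the sample file contents are all assumptions made for the
example.

```python
class NameServer:
    """Maps each URN to exactly one LIFN at any given time (hypothetical)."""

    def __init__(self):
        self._current = {}

    def publish(self, urn, lifn):
        self._current[urn] = lifn      # a URN may be rebound to a newer LIFN

    def resolve(self, urn):
        return self._current[urn]


class Mirror:
    """Stores files by LIFN; a LIFN's contents never change once assigned."""

    def __init__(self):
        self._files = {}

    def store(self, lifn, data):
        if self._files.get(lifn, data) != data:
            raise ValueError("a LIFN cannot be rebound to different bytes")
        self._files[lifn] = data

    def get(self, lifn):
        return self._files.get(lifn)   # None if this mirror lacks the file


def fetch_current(urn, names, mirrors):
    """Resolve URN -> LIFN, then take the file from any mirror holding it."""
    lifn = names.resolve(urn)
    for mirror in mirrors:
        data = mirror.get(lifn)
        if data is not None:
            return lifn, data
    raise LookupError(f"no mirror holds {lifn}")


names = NameServer()
stale, fresh = Mirror(), Mirror()

# Both mirrors hold the old release; only one has received the new one.
for m in (stale, fresh):
    m.store("lifn:example:lapack-a", b"old release")
fresh.store("lifn:example:lapack-b", b"new release")

names.publish("urn:example:lapack-current", "lifn:example:lapack-a")
names.publish("urn:example:lapack-current", "lifn:example:lapack-b")

lifn, data = fetch_current("urn:example:lapack-current", names, [stale, fresh])
```

In this sketch the out-of-date mirror is skipped because it has no file
under the new LIFN, so it can never hand back old contents under the
current name. No coordination between the mirrors is required.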

We are also developing a URC server that provides support for the following:

• Provision by the publisher of attribute-value pairs for a given URN in the
  form of cryptographically signed assertions.

• Retrieval and authentication of assertions by users.

• Specification of the data model used for a particular URN.

• Choice of encryption algorithm, including none.
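
To make the idea of signed assertions concrete, here is a small Python
sketch of a publisher producing an assertion and a user authenticating it.
It is only an illustration under stated assumptions: it uses a shared-key
HMAC from the Python standard library as a stand-in for a real digital
signature scheme, and the function names, algorithm labels, attribute names,
and URN are all hypothetical rather than part of the URC server design.

```python
import hashlib
import hmac
import json

# Hypothetical algorithm table; "none" means the assertion is unsigned.
ALGORITHMS = {"hmac-sha256": hashlib.sha256, "none": None}


def make_assertion(urn, attributes, key, algorithm="hmac-sha256"):
    """Publisher side: bundle attribute-value pairs for a URN and sign them."""
    payload = json.dumps({"urn": urn, "attributes": attributes}, sort_keys=True)
    digest = ALGORITHMS[algorithm]
    signature = ""
    if digest is not None:
        signature = hmac.new(key, payload.encode(), digest).hexdigest()
    return {"payload": payload, "algorithm": algorithm, "signature": signature}


def verify_assertion(assertion, key):
    """User side: return the attributes if authentic, else raise ValueError."""
    digest = ALGORITHMS[assertion["algorithm"]]
    if digest is not None:
        expected = hmac.new(key, assertion["payload"].encode(), digest).hexdigest()
        if not hmac.compare_digest(expected, assertion["signature"]):
            raise ValueError("assertion failed authentication")
    # With algorithm "none" there is nothing to check and no guarantee.
    return json.loads(assertion["payload"])["attributes"]


key = b"publisher-demo-key"          # assumed shared key, for the demo only
signed = make_assertion(
    "urn:example:lapack-current",
    {"title": "LAPACK", "data-model": "example-software-v0"},
    key,
)
attrs = verify_assertion(signed, key)
```

The "data-model" attribute stands in for the per-URN data model
specification, and the "none" entry mirrors the option of using no
algorithm at all. A deployed service would more plausibly use public-key
signatures so that users need not share a secret with the publisher.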

We propose to integrate our software repository naming architecture with
CNRI's digital library architecture in the following manner:

1. Both URNs and LIFNs will be expressible as handles, and URN and LIFN
   lookup will be merged with the Handle Management Service.

2. Our URC server will be an implementation of CNRI's IR server that may be
   used for cataloguing general Web resources, including software and data
   archives.

3. Similar to the Dienst protocol for document repositories, we will develop
   service specifications and retrieval protocols appropriate for software
   and data repositories. In addition, similar to the Digital Library
   Document Architecture that defines requirements for digital document
   structure [33], we will define requirements for software and data archive
   structures.

References

[1] INSPEC Classification. Institution of Electrical Engineers, 1992.

[2] INSPEC Thesaurus. Institution of Electrical Engineers, 1993.

[3] Standard reuse library Basic Interoperability Data Model (BIDM).
Technical Report RPS-0001, Reuse Library Interoperability Group, 1993.

[4] C. A. Addison, W. H. Enright, P. W. Gaffney, I. Gladwell, and P. M.
Hanson. Algorithm 687: A decision tree for the numerical solution of
initial value ordinary differential equations. ACM Trans. Math. Softw.,
17(1):1-11, Mar. 1991.

[5] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra,
V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the
Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM,
1994.

[6] M. W. Berry, J. J. Dongarra, B. H. Larose, and T. Letsche. PDS: A
performance database server. Scientific Computing, 3(2):147-156, 1994.

[7] R. F. Boisvert. Toward an intelligent system for mathematical software
selection. In P. W. Gaffney and E. N. Houstis, editors, Programming
Environments for High-Level Scientific Problem Solving, pages 79-92.
North-Holland, Amsterdam, 1992.

[8] R. F. Boisvert. The architecture of an intelligent virtual mathematical
software repository system. Math. & Comp. in Simul., 36:269-279, 1994.

[9] R. F. Boisvert, S. E. Howe, and D. K. Kahaner. The Guide to Available
Mathematical Software problem classification system. Comm. Stat. - Simul.
Comp., 20(4):811-842, 1991.

[10] E. J. Breton. Indexing for invention. Journal of the American Society
for Information Science, 42(3):173-177, 1991.

[11] S. Browne, J. Dongarra, S. Green, K. Moore, T. Pepin, T. Rowan, and
R. Wade. Location-independent naming for virtual distributed software
repositories. In ACM-SIGSOFT Symposium on Software Reusability, Apr. 1995.
To appear.

[12] S. Browne, J. Dongarra, S. Green, K. Moore, T. Rowan, and R. Wade.
Netlib services and resources. Technical Report UT-CS-94-222, University
of Tennessee Computer Science Department, Feb. 1994.

[13] S. Browne, J. Dongarra, S. Green, K. Moore, T. Rowan, R. Wade, G. Fox,
K. Hawick, K. Kennedy, J. Pool, and R. Stevens. The national HPCC software
exchange. IEEE Computational Science and Engineering, 1995. To appear.

[14] D. Cohen. A format for e-mailing bibliographic records. Internet RFC
1357, July 1992.

[15] P. Constantopoulos and E. Pataki. A browser for software reuse. In
Advanced Information Systems Engineering, 4th International Conference
CAiSE '92, Proceedings, pages 304-326. Springer-Verlag, Berlin, Germany,
May 12-15, 1992.

[16] Corporation for National Research Initiatives. Interfacing to the
handle management system. Accessible at
http://www.cnri.reston.va.us/home/cstr/handle-intro.html, 1994.

[17] R. Daniel Jr. and M. Mealling. URC scenarios and requirements.
Internet Draft draft-ietf-uri-urc-req-00.txt, Nov. 1994.

[18] J. R. Davis and C. Lagoze. Dienst, a protocol for a distributed
digital document library. Internet Draft accessible at
http://cs-tr.cs.cornell.edu/Info/dienst_protocol.html, July 1994.

[19] J. Dongarra, T. Rowan, and R. Wade. Software distribution using
XNETLIB. ACM Trans. Math. Softw., 21(1), 1995.

[20] J. J. Dongarra and E. Grosse. Distribution of mathematical software
via electronic mail. Commun. ACM, 30(5):403-407, May 1987.

[21] G. Fox, R. D. Williams, and P. Messina. Parallel Computing Works.
Morgan Kaufmann, 1994.

[22] W. B. Frakes and T. P. Pole. An empirical study of representation
methods for reusable software components. IEEE Trans. on Software
Engineering, 20(8):617-630, Aug. 1994.

[23] E. Grosse. A catalogue of algorithms for approximation. In J. Mason
and M. Cox, editors, Algorithms for Approximation II, pages 479-514,
London, England, 1990. Chapman and Hall.

[24] E. Grosse. Repository mirroring. ACM Trans. Math. Softw., 21(1), Mar.
1995.

[25] K. A. Hawick. High Performance Computing and Communications glossary.
Technical report, Northeast Parallel Architectures Center at Syracuse
University, July 1994.

[26] R. Kahn and R. Wilensky. Locating electronic library services and
objects: A frame of reference for the CS-TR project. Draft for discussion
purposes, Feb. 1994.

[27] M. S. Kamel, K. S. Ma, and W. H. Enright. ODEXPERT: An expert system
to select numerical solvers for initial value ODE systems. ACM Trans. Math.
Softw., 19(1):44-62, Mar. 1993.

[28] R. R. Larson. Design and development of a network-based electronic
library. In Proc. ASIS Mid-Year Meeting, pages 95-114, Portland, Oregon,
May 1994.

[29] M. Lucks and I. Gladwell. Automated selection of mathematical
software. ACM Trans. Math. Softw., 18(1):11-54, Mar. 1992.

[30] R. Pollard. A hypertext-based thesaurus as a subject browsing aid for
bibliographic databases. Information Processing and Management,
29(3):345-357, 1993.

[31] R. Prieto-Diaz. Implementing faceted classification for software
reuse. Commun. ACM, 34(5):88-97, May 1991.

[32] K. Sollins and L. Masinter. Functional requirements for uniform
resource names. Internet RFC 1737, Dec. 1994.

[33] W. Turner. The document architecture for the Cornell Digital Library.
Internet RFC 1691, Aug. 1994.