Digital Software and Data Rep ositories for
Supp ort of Scienti c Computing
Ronald Boisvert
National Institute of Standards and Technology
y
Shirley Browne
UniversityofTennessee
Jack Dongarra
UniversityofTennessee and Oak Ridge National Lab
Eric Grosse
AT&T Bell Lab oratories
Abstract
This pap er discusses the sp ecial characteristics and needs of software
rep ositories and describ es how these needs have b een met by some exist-
ing rep ositories. These rep ositories include Netlib, the National HPCC
Software Exchange, and the GAMS Virtual Rep ository.We also describ e
some systems that provide on-line access to various typ es of scienti c data.
Finally,we outline a prop osal for integrating software and data rep osito-
ries into the world of digital do cument libraries, in particular CNRI's
ARPA-sp onsored Digital Library pro ject.
The work describ ed in this pap er was sp onsored by NASA under Grant No. NAG5-
2736, by the National Science Foundation under Grant No. ASC-9103853, and byAT&T Bell
Lab oratories.
y
Author to whom corresp ondence should b e directed. 107 Ayres Hall, Computer Sci-
ence Department, UniversityofTennessee, Knoxville, TN 37996-1301, 615 974-5886,
1 Intro duction
Most work on digital libraries has fo cused on storage, retrieval, and display
of digital forms of do cuments. A numb er of on-line software rep ositories have
b een develop ed that provide access to software and software artifacts. These
do cument and software e orts have b een mostly indep endent, with little atten-
tion paid to integrating the twotyp es of libraries and to developing common
principles for organization and op eration.
This pap er discusses the sp ecial characteristics and needs of software rep os-
itories and describ es how these needs have b een met by some existing rep osi-
tories. These rep ositories include Netlib [20, 12], the National HPCC Software
Exchange [13], and the GAMS Virtual Rep ository [8]. We also describ e some
systems that provide on-line access to various typ es of scienti c data. Finally,
we outline a prop osal for integrating software and data rep ositories into the
world of digital do cument libraries, in particular CNRI's ARPA-sp onsored Dig-
ital Library pro ject [28, 26].
2 Characteristics of Some Existing Software
Rep ositories
2.1 Netlib
Netlib b egan services in 1985 to ll a need for cost-e ective, timely distribution
of high-quality mathematical software to the research community. Some of the
libraries Netlib distributes { such as EISPACK, LINPACK, FFTPACK, and
LAPACK{have long b een used as imp ortant to ols in scienti c computation and
are widely recognized to b e of high quality. The Netlib collection also includes
a large numb er of newer, less well-established co des. Most of the software is
written in Fortran, but programs in other languages, such as C and C++, are
also available.
Netlib sends, by return electronic mail, requested routines together with
subsidiary routines and any requested do cuments or test programs supplied by
the software authors [20]. Xnetlib, an interactive to ol for software and do cument
distribution [19], use an X Windowinterface and TCP/IP connections to allow
users to receive replies to their requests within a matter of seconds. The interface
provides a numb er of mo des and searching mechanisms to facilitate searching
through a large distributed collection of software and do cuments. World Wide
Web browsers such as Mosaic and Netscap e can also b e used to access Netlib
1
via HTTP and FTP .
Although the original fo cus of the Netlib rep ository was on mathematical
software, the collection has grown to include other software such as networking
1
Netlib is accessible from a WWW browser at http://www.netlib.org/ 2
to ols and to ols for visualization of multipro cessor p erformance data, technical
rep orts and pap ers, a Whitepages Database, b enchmark p erformance data, and
information ab out conferences and meetings. The numb er of Netlib servers has
grown from the original two, at Oak Ridge National Lab oratory initially at
Argonne National Lab oratory and Bell Labs, to servers in Norway, the United
Kingdom, Germany, Australia, Japan, and Taiwan. A mirroring mechanism
keeps the rep ository contents at the di erent sites consistent on a daily ba-
sis, as well as automatically picking up new material from distributed editorial
sites[24].
Netlib di ers from other publicly available software distribution systems,
suchasArchie, in that the collection is mo derated by an editorial b oard and
the software contained in it is widely recognized to b e of high quality.However,
the Netlib rep ository is not intended to replace commercial software. Commer-
cial software companies provide value-added services in the form of supp ort.
Although the Netlib collection is mo derated, its software comes with no guar-
antee of reliability or supp ort. Rather, the lack of bureaucratic, legal, and
nancial imp ediments encourages researchers to submit their co des by ensuring
that their work will b e made available quickly to a wide audience.
2.2 The National HPCC Software Exchange NHSE
The National HPCC Software Exchange NHSE is an Internet-accessible re-
source that will facilitate the exchange of software and information among
research and computational scientists involved with High Performance Com-
puting and Communications HPCC [13]. The purp ose of the NHSE is to
promote the development of discipline-oriented software and do cument rep osi-
tories and of contributions to and use of such rep ositories by Grand Challenge
teams, as well as by other memb ers of the high p erformance computing com-
munity. The target audiences for the NHSE include HPCC application and
computer scientists, users of government sup ercomputer centers, and p otential
industrial users. A prototyp e of the NHSE is accessible from a WWW browser
at http://www.netlib.org/nse/.
The scop e of the NHSE is software and software-related artifacts pro duced
by and for the HPCC Program. Software-related artifacts include algorithms,
sp eci cations, designs, and software do cumentation. The following typ es of
software are to b e made available:
Systems software and software to ols. This category includes parallel pro-
cessing to ols such as parallel compilers, message-passing communication
subsystems, and parallel monitors and debuggers.
Data analysis and visualization to ols.
Basic building blo cks for accomplishing common computational and com-
munication tasks. These building blo cks will b e of high quality and trans- 3
p ortable across platforms. Building blo cks are meant to b e used by Grand
Challenge teams and other researchers in implementing programs to solve
computational problems. Use of high-quality transp ortable comp onents
will sp eed implementation, as well as increase the reliability of computed
results.
Research co des that have b een develop ed to solve dicult computational
problems. Many of these co des will have b een develop ed to solve sp eci c
problems and thus will not b e reusable as is. Rather, they will serveas
pro ofs of concept and as mo dels for developing general-purp ose reusable
software for solving broader classes of problems. The development of this
reusable software is exp ected to b e undertaken by commercial companies,
rather than by academic researchers.
A catalog of the software currently available from the NHSE is accessible at
http://www.netlib.org/nse/sw survey.html.
Although the di erent disciplines will maintain their own software rep os-
itories, users should not need to access each of these rep ositories separately.
Rather, the NHSE will provide a uniform interface to a virtual HPCC software
rep ository which will b e built on top of the distributed set of discipline-oriented
rep ositories. The interface will assist the user in lo cating relevant resources
and in retrieving these resources. A combined browse/searchinterface will al-
low the user to explore the various HPCC areas and b ecome familiar with the
available resources. A longer term goal of the NHSE is to provide users with
domain-sp eci c exp ert help in lo cating and understanding relevant resources.
2.3 GAMS Virtual Rep ository
The Guide to Available Mathematical Software GAMS pro ject of the National
Institute of Standards and Technology NIST studies techniques to provide
scientists and engineers with improved access to reusable computer software
comp onents available to them for use in mathematical mo deling and statistical
analysis. One of the pro ducts of this work is the GAMS system, an on-line cross-
index and virtual rep ository of mathematical software [8]. GAMS p erforms the
function of an interrep ository and interpackage cross-index, collecting and main-
taining data ab out software available from external rep ositories and presenting
it as a homogeneous whole. It also provides the functions of a rep ository itself
i.e., retrieval. However, instead of maintaining the cataloged software itself,
it provides transparent on-demand access to rep ositories managed by others.
GAMS currently contains information on more than 9800 problem-solving
software mo dules from ab out 85 packages found in four physically distributed
software rep ositories three maintained at NIST and Netlib. In addition to most
of the software in the Netlib collection, GAMS cross indexes individual comp o-
nents in large multipurp ose libraries such as IMSL, NAG, PORT, STARPAC 4
and SLATEC, as well as capabilities of statistical analysis systems suchasDAT-
APLOT and SAS. Both public-domain and commercial software is cataloged,
and although source co de of proprietary software pro ducts are not available
through GAMS, related items such as do cumentation and example programs
often are.
All problem-solving software mo dules in GAMS are assigned one or more
problem classi cations from a 736-no de tree-structure d taxonomy of mathe-
matical and statistical problems develop ed as part of the pro ject [9]. Users
can browse through mo dules in any given problem class. To nd an appropri-
ate class, one can utilize the taxonomy as a decision tree, or enter keywords
which are then mapp ed to problem classes. Search lters can b e declared which
allow users to sp ecify preferences such computing precision or programming lan-
guage. In addition, users can browse through all mo dules in a given package,
or all mo dules with a given name. Each mo dule's abstract lists the retrievable
ob jects asso ciated with the mo dule, such as do cumentation, examples, test pro-
grams, source co de and dep endencies. More than 32,000 such ob jects can b e
retrieved.
At the core of the GAMS system is a relational database of information
ab out available software. This database is maintained at NIST, which provides
a classi cation service for the rep ositories it indexes. The GAMS network server
provides this information to network clients using a sp ecialized proto col over
TCP/IP connections. In addition, a gateway to the World Wide Web has b een
develop ed using the Common GatewayInterface CGI facility of NCSA's http d
2
server .
3 Indexing and Searching of Software Ob jects
Cataloguing information for software ob jects serves two purp oses { 1 to supply
material from which a searchable index may b e constructed, and 2 to supply
information needed by the user to select/reject search hits and to obtain and
use selected software. The eld names and de nitions used for cataloguing in
a particular library are describ ed in the data model used by that library.For
example, for do cument libraries, the CSTR pro ject [26] has adopted RFC1357
[14] as its data mo del.
3.1 Data Mo dels
Data mo dels used for do cument digital libraries are not in general suitable for
use by software rep ositories. Although some elds are useful in b oth settings {
e.g., author,title, abstract { software cataloguing requires a numb er of additional
elds. A eld that app ears in most ma jor catalogs is a requirements eld that
lists the hardware, op erating system, and other software needed to use the
2
Accessible at http://gams.nist.gov/ 5
catalogued item. Another imp ortant eld in the case of public domain software
rep ositories such as Netlib, where software is author-supp orted, is a contact
eld giving an electronic mail address to which questions and bug rep orts may
b e sent. Still another eld used by many software rep ositories is a certi cation
eld that tells at what level the software has b een certi ed and p ossibly includes
p ointers to certi cation artifacts such as completed checklists and testing results.
The Reuse Library Interop erability Group RIG has develop ed and ap-
proved the Basic Interop erability Data Mo del BIDM as a standard data mo del
for software rep ository interop erability, and the BIDM has b een submitted for
balloting as an IEEE standard [3]. The BIDM will b e used as a lowest com-
mon denominator for interop eration b etween software rep ositories. However,
b ecause of considerable variation in the purp ose, contents, and application do-
mains of di erent software rep ositories, no single data mo del will b e suitable
for all, and imp ortant cataloguing information will b e lost in exp orting to the
BIDM. An example where such loss o ccurs is the certi cation eld. This eld
was not included in the BIDM b ecause of the wide variety of certi cation and
evaluation metho ds in use at di erent rep ositories. The RIG also encountered
diculty in developing controlled vo cabulary lists, b ecause again di erent sets
of terms were appropriate for di erent rep ositories. The approachnow b eing
taken by the RIG is to de ne a standard for an Extensible Uniform Data Mo del
EUDM, which will b e a meta-mo del a rep ository can use to describ e the data
mo del it is using. For example, using the EUDM, a rep ository will b e able to
de ne its certi cation metho ds and the meanings of di erent certi cation levels.
As a memb er of the RIG, Netlib is participating in development and promotion
of standard data mo dels for software rep ositories.
3.2 Software Classi cation
Anumb er of studies have shown that prop er classi cation of software ob jects
contributes to e ective lo cation of the ob jects by p otential reusers [15,31, 22].
Classi cation is carried out by assigning co des and/or keywords from a classi -
cation scheme or thesaurus. Classi cations and thesauri develop ed for indexing
do cuments, such as the INSPEC classi cation [1] and INSPEC thesaurus [2],
are inadequate for indexing software ob jects. Firstly, these to ols cover a broad
range of topics and cover software-related topics only sup er cially.Thus, they
do not allow the user to discriminate nely enough among the available software
ob jects. Secondly, e ective classi cation of software ob jects requires that the
function of the ob ject b e indexed [10]. Because do cuments are not used as soft-
ware is, terms related to function have not generally b een included in thesauri
develop ed for do cument indexing.
A classi cation scheme that has b een develop ed sp eci cally for use in index-
ing mathematical software is the GAMS hierarchy[9]. The GAMS scheme has
b een widely adopted by network-accessible rep ositories such as Netlib and by
commercial mathematical software libraries suchasNAG and IMSL. A succes- 6
sor to GAMS is currently under development. The new scheme will re ne areas
needing b etter discrimination and will add new categories to encompass recent
developments in numerical algorithms. The new scheme will also b e reorganized
so as to b e less cumb ersome.
An HPCC software thesaurus is under development as part of the NHSE.
This thesaurus uses GAMS categories for mathematical software but de nes
new terms for other areas of high p erformance computing. Some of these terms
are drawn from an existing HPCC glossary [25] and from a b o ok that gives
an overview of parallel computing [21]. Terms from the INSPEC thesaurus [2]
are b eing used for application areas. The thesaurus has a faceted structure
with facets for application area, problem area, function, algorithm, and target
architecture. The thesaurus will b e made available on-line in hyp ertext form to
assist with searching the NHSE, similar to [30]. Hyp ertext links from terms to
scop e and de nition notes will also b e provided.
3.3 SearchInterfaces
3
Netlib currently provides a searchinterface that allows the user to do eld-
sp eci c keyword searching. For example, the user may searchby author, le-
name, abstract keywords, or GAMS classi cation. Search results are returned
as a hyp ertext list of catalog records from which the user may select les to
view or download.
Searching bykeywords or classi cation co des often returns a large number
of search hits, leaving the user to sort through them. Further discrimination
often cannot b e provided byanoverall classi cation scheme, but requires use of
a domain sp eci c knowledge base. Such knowledge bases have b een constructed
for sp ecialized domains, including di erential equations [29,27, 4] and approx-
imation [23]. We are exp erimenting with providing on-line hyp ertext interfaces
to such knowledge bases. For example, wehave provided a hyp ertext version of
4
a decision tree for approximation algorithms .
Wehave also develop ed a prototyp e exp ert help system to assist users in
selecting software within sp eci c domains [7, 8]. An advisory system for a
given problem class helps the user discriminate b etween problem-solving soft-
ware mo dules for that class. For a given problem class, a set of features are
partitioned into a small set of feature classes, and information is enco ded ab out
how each feature applies to each software mo dule. Prototyp e user interfaces
have b een develop ed that allow the user to interact with choice widgets for
the various features. The system provides more sp eci c and e ective help in
selecting software than a domain-indep endent searchinterface.
3
Accessible from a WWW browser at http://www.netlib.org/nse/netlib query.html
4
Accessible from a WWW browser at http://www.netlib.org/a/catalog.html 7
4 Retrieval of Software Ob jects
Once a user has lo cated relevant software ob jects, he needs to b e able to make
use of them. Mo des of use include the following:
1. Downloading, con guring, compiling, and executing a complete program
or package.
2. Downloading routines and combining them with a user-supplied main pro-
gram and other user-supplied routines b efore compiling and executing.
3. Using retrieved source co de as a starting p oint for developing software
with similar functionality.
4. Downloading templates or archetyp es that provide a framework for writing
actual co de.
5. Using a remote execution service { i.e., shipping input data over a network
to a remote execution server and then retrieving the resulting output data.
4.1 Downloading Files
A user maydownload les from Netlib by sending an email request or by click-
ing on the lenames from Xnetlib, GAMS, or WWW interfaces. Netlib les
are organized into directories. Some directories contain a single package, such
as the LAPACK directory, while others contain programs for a particular do-
main, such as the OPTimization directory. Each directory contains an index
le containing catalog records for the les in the directory. Most directories also
contain a readme le giving an overview of the directory. Some directories have
sub directories, for example for the di erentFortran precisions available and for
test and example programs. A user may initially do a keyword search to lo cate
relevant directories and then browse the index les for those directories to lo cate
relevant les whichmay then b e downloaded.
When downloading a routine from Netlib, a user may make use of a dep endency-
checking mechanism that allows retrieval of the entire dep endency tree for that
routine. The user may sp ecify that a subtree b e omitted, however, if those rou-
tines have b een retrieved previously. There is also an automatic tar facility that
builds and returns a tar le of any Netlib directory or sub directory up on user
request. Binary executable les for several Netlib packages are also available {
for example, for the Xnetlib and HeNCE packages.
Users maydownload les from the NHSE via a WWW browser. The NHSE
provides a searchable catalog of HPCC software, as well as a browseable listing
5
by category . Eachentry in the catalog includes either a URL that maybe
clicked on to retrieve the software or more information ab out it. Clicking on
5
Accessible at http://www.netlib.org/nse/sw survey.html 8
these URLs connects the user to the software provider's home site { i.e., the
NHSE provides an interface to a virtual distributed rep ository consisting of a
large numb er of indep endently maintained physical rep ositories. In the near
future, the NHSE will switch from using URLs to using lo cation indep endent
names [11]. Use of lo cation indep endent names will allow les to b e moved
without requires references to them to b e changed and will p ermit transparent
mirroring and reliable cacheing.
Care should always b e exercised when downloading and using les obtained
over a network, esp ecially tar les and executables. Although Netlib is re-
garded by most users as a trusted source, it would b e p ossible for someone to
imp ersonate a Netlib server and make dangerous tar or executable les avail-
able, purp ortedly from Netlib. Because source co de is unlikely to b e examined
closely by the user, delib erately intro duced bugs or other malicious mo di ca-
tions might also slip past in source co de form. In the near future, b oth Netlib
and the NHSE will use public key cryptography to allow users to authenticate
the source of downloaded les.
4.2 Templates and Archetyp es
A template is a description of a general algorithm rather than the executable
ob ject co de or the source co de more commonly found in a conventional software
library [5]. Templates may b e customized by the user { for example, they can
b e con gured for the sp eci c data structure of a problem or for the sp eci c
computing system on which the problem is to run. Templates are written in a
language-indep endent Algol-like structure, which is readily translatable into a
target language suchasFortran or C. A collection of templates fo cusing on iter-
6
ative metho ds for solving large sparse linear systems is available from Netlib .
For each template, the following is provided: a mathematical description of the
ow of the iteration; discussion of convergence and stopping criteria; suggestions
for applying a metho d to sp ecial matrix typ es e.g., banded systems; advice for
tuning for example, which preconditioners are applicable and which are not;
tips on parallel implementations; and hints as to when to use a metho d, and
why.
A program archetyp e is a a program design strategy appropriate for a
restricted class of problems, and b a collection of program designs with c
implementations of exemplar problems in one or more programming languages
and optimized for a collection of target machines. The program design strategy
includes archetyp e-sp eci c information ab out metho ds of deriving a program
from a sp eci cation, metho ds of parallelizing sequential programs, the program
structure, metho ds of reasoning ab out correctness and p erformance, empirical
data on p erformance measurements and tuning for di erent kinds of machines,
and suggestions for test suites. A pro ject at Caltech is exploring the question
6
Accessible at http://www.netlib.org/linalg/html templates/report.html 9
of whether a library of parallel program archetyp es b e used to reduce the e ort
7
required to pro duce correct ecient programs .
4.3 Remote Execution
If a user needs to run a program only infrequently, and if compiling and in-
stalling the program involves considerable overhead, the user may prefer to
take advantage of a remote execution service if one is available. Netlib has ex-
p erimented with making remote execution of the Fortran-to-C converter f2c
and Fortran checking ftnchek programs available. We con gured the Mosaic
WWW browser to invoke the Tcl language interpreter to execute downloaded
les of typ e application/x-safe-tcl. We then made downloaded user interfaces
for the f2c and ftnchek programs available on a Netlib server. Users could then
download the interface mo dules and use them to interact with the remote execu-
tion services. The user interfaces allowed the user to select les to b e transferred
for pro cessing and to set various options. Our work on remote execution is ex-
p erimental at this stage b ecause a safe client execution environment for Tcl has
not yet b een rigorously de ned, although researchers at Sun Microsystems are
working on it.
Software for using ORNL's GRAIL and GENQUEST remote execution ser-
vices for doing DNA sequence analysis, gene assembly, and sequence comparison
8
is available through the NHSE . These services cannot b e used from a WWW
browser, but require downloading sp ecialized X-Windows client software. In the
future, the NHSE plans to supp ort use of such services from a WWW browser
by means of a safe execution environment for downloadable Safe-Tcl co de, so
that the client mo dule may b e executed directly from the browser.
Another p ossibility for remote execution is to allow users to upload exe-
cutable co de to a Netlib server and run it there. For example, the user might
want to send an agent that would sift through and summarize computer p er-
formance data residing on Netlib. A search service such as Harvest might send
an agent that would summarize the contents of Netlib and stream the summary
back to an indexing engine. We are investigating the provision of this kind of
user-directed remote execution using the Plan 9 op erating system develop ed at
AT&T Bell Labs.
4.4 Change Noti cation
Some digital do cument libraries have a noti cation service that informs sub-
scrib ers of newly available do cuments. The noti cation service for a software
rep ository is somewhat di erent, b ecause it informs subscrib ers of changes and
bug xes to the software as much or more than of additions of new software.
7
More information is available at
http://www.etext.caltech.edu/Papers2/ArchetypeOverview/ArchPaper.html
8
More information is available at http://avalon.epm.ornl.gov/ 10
In the early days of the Netlib rep ository, when all access was by email and the
trac was mostly from professional numerical analysts, we relied on log les
to send out noti cation of imp ortant bug xes to everyone who had retrieved
a ected les. Now, b ecause access is more anonymous and a wider sp ectrum of
users are involved, the old scheme has b een replaced by explicit subscription.
People may indicate interest in sp eci c Netlib directories, using subscribe and
unsubscribe commands. Automatic noti cation is sent, on a daily basis, when
les in the directory are changed. The subscrib er lists also give the authors
and editors a way to judge what community is particularly interested in a given
Netlib collection.
5 Access to Scienti c Data
The Netlib Performance Database provides an on-line catalog of public-domain
computer b enchmarks such as the Linpack Benchmark, Perfect Benchmarks,
and the Genesis Benchmarks [6]. A b enchmark co de is a program designed to
b e run on an architecture so as to pro duce a relative measure of its execution.
Benchmarks tend to evolve from individual applications that do not necessarily
stress all features of a given architecture. Thus, b enchmark numb ers do not
imply general machine p erformance but instead describ e the p erformance of a
machine on an algorithm or application class.
Although b enchmarking has b ecome very p opular b ecause of the diversity
and comp etition in the computer hardware business, there was, previous to
development of our database, no central rep ository for b enchmark data. The
9
WWW interface to our Performance Database allows the user to
view complete b enchmark rep orts that disply sorted data from various
published b enchmark rep orts,
browse the p erformance data tree by selecting the b enchmark and ma-
chines ab out which information is desired,
search the p erformance database
There are also p ointers to b enchmark pap ers and other b enchmark and p erformance-
related literature.
Various archives of scienti c data are accessible from the NHSE { for ex-
10 11
ample, NASA's Planetary Data System and Astrophysics Data System ,
12
NIST's Atomic Sp ectroscopic Database , and NOAA's Environmental Data
13
Centers . There is no uniform cataloguing metho d or searchinterface for these
9
Accessible at http://performance.netlib.org/performance/html/PDStop.html
10
Accessible at http://stardust.jpl.nasa.gov/pds home.html
11
Accessible at http://adswww.harvard.edu
12
Accessible at http://aeldata.phy.nist.gov/nist beta.html
13
Accessible at http://www.esdim.noaa.gov/ 11
databases, nor a standard way of describing the contents and services o ered.
Thus, the user has no way of systematically discovering relevant databases and
must learn a di erentinterface for each one.
6 Integration with Do cument Digital Libraries
The Corp oration for National Research Initiatives CNRI is working with ve
ma jor universities CMU, Cornell, UC-Berkeley, Stanford, and MIT on an
ARPA-sp onsored pro ject to develop concepts for digital libraries. As part of
this pro ject, each university is placing its Computer Science Technical Rep orts
on-line and providing access to the distributed CSTR collection. Technologies
develop ed for the CSTR pro ject include the Dienst distributed search system
[18] and a Handle Management Service for assigning, maintaining, and using
unique identi ers for digital library ob jects [16].
The basic architecture b eing develop ed by CNRI for distributed digital li-
braries includes the following concepts [26]:
A digital object which consists of a sequence of bits plus a unique identi er
known as a hand le the binding b etween the handle and the sequence of
bits maychange over time.
Naming authorities who are resp onsible for assigning unique identi ers
within their p ortions of the handle namespace.
Repositories from which digital ob jects are available.
Information and Reference IR servers that provide reference information
ab out collections of digital ob jects.
The CNRI work is closely related to the IETF Uniform Resource Identi er
URI Working Group's work on Uniform Resource Names URNs [32] and
Uniform Resource Citations [17]. CNRI's handle is the equivalent of IETF's
URN, and CNRI's IR server serves a similar function to IETF's URC server.
The Netlib and NHSE Development Group has b een engaged in a parallel
e ort to implement a lo cation-indep endent naming architecture [11]. We provide
for twotyp es of lo cation-indep endent names:
a Uniform Resource Name URN, for which the contents it refers to may
change { e.g., the \currentversion of LAPACK".
a Lo cation Indep endent File Name LIFN, for which the binding b etween
the name and the byte contents of the le it refers to is xed, once assigned.
This typ e of name is needed for unambiguous references when attaching
critical reviews or rep orting scienti c results obtained using a particular
version of a piece of software. LIFNs also p ermit reliable and ecient
cacheing and mirroring of les. 12
Atany given time, a URN is asso ciated with exactly one LIFN. By lo oking
up the LIFN asso ciated with a URN and then retrieving the le corresp onding
to that LIFN, the user is assured of retrieving the most recent copy,even if
some mirrored copies are out-of-date. Thus, we obtain consistency of replicated
copies without the overhead of a replica control proto col.
We are also developing a URC server that provides supp ort for the following:
Provision by the publisher of attribute-value pairs for a given URN in the
form of cryptographically signed assertions.
Retrieval and authentication of assertions by users.
Sp eci cation of the data mo del used for a particular URN.
Choice of encryption algorithm, including none.
We prop ose to integrate our software rep ository naming architecture with
CNRI's digital library architecture in the following manner:
1. Both URNs and LIFNs will b e expressible as handles, and URN and LIFN
lo okup will b e merged with the Handle Management Service.
2. Our URC server will b e an implementation of CNRI's IR server that may
b e used for cataloguing general Web resources, including software and
data archives.
3. Similar to the Dienst proto col for do cument rep ositories, we will develop
service sp eci cations and retrieval proto cols appropriate for software and
data rep ositories. In addition, similar to the Digital Library Do cument
Architecture that de nes requirements for digital do cument structure [33],
we will de ne requirements for software and data archive structures.
References
[1] INSPEC Classi cation. Institution of Electrical Engineers, 1992.
[2] INSPEC Thesaurus. Institution of Electrical Engineers, 1993.
[3] Standard reuse library Basic Data Interop erability Mo del BIDM. Tech-
nical Rep ort RPS-0001, Reuse Library Interop erability Group, 1993.
[4] C. A. Addison, W. H. Enright, P. W. Ga ney, I. Gladwell, and P.M.
Hanson. Algorithm 687: A decision tree for the numerical solution of
initial value ordinary di erential equations. ACM Trans. Math. Softw.,
171:1{11, Mar. 1991. 13
[5] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Dunato, J. Dongarra,
V. Eijkhout, R. Pozo, C. Romine, and H. V. der Vorst. Templates for the
Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM,
1994.
[6] M. W. Berry, J. J. Dongarra, B. H. Larose, and T. Letsche. PDS: A
p erformance database server. Scienti c Computing, 32:147{156, 1994.
[7] R. F. Boisvert. Toward an intelligent system for mathematical software
selection. In P. W. Ga ney and E. N. Houstis, editors, Programming En-
vironments for High-Level Scienti c Problem Solving, pages 79{92. North-
Holland, Amsterdam, 1992.
[8] R. F. Boisvert. The architecture of an intelligent virtual mathematical
software rep ository system. Math. & Comp. in Simul., 36:269{279, 1994.
[9] R. F. Boisvert, S. E. Howe, and D. K. Kahaner. The Guide to Avail-
able Mathematical Software problem classi cation system. Comm. Stat. {
Simul. Comp., 204:811{842, 1991.
[10] E. J. Breton. Indexing for invention. Journal of the American Society for
Information Science, 423:173{177, 1991.
[11] S. Browne, J. Dongarra, S. Green, K. Mo ore, T. Pepin, T. Rowan, and
R. Wade. Lo cation-indep endent naming for virtual distributed software
rep ositories. In ACM-SIGSOFT Symposium on SoftwareReusability, Apr.
1995. to app ear.
[12] S. Browne, J. Dongarra, S. Green, K. Mo ore, T. Rowan, and R. Wade.
Netlib services and resources. Technical Rep ort UT-CS-94-222, University
of Tennessee Computer Science Department, Feb. 1994.
[13] S. Browne, J. Dongarra, S. Green, K. Mo ore, T. Rowan, R. Wade, G. Fox,
K. Hawick, K. Kennedy,J.Po ol, and R. Stevens. The national HPCC
software exchange. IEEE Computational Science and Engineering, 1995.
to app ear.
[14] D. Cohen. A format for e-mailing bibliographic records. Internet RFC
1357, July 1992.
[15] P. Constantop oulos and E. Pataki. A browser for software reuse. In
Advanced Information Systems Engineering, 4th International Conference
CAiSE '92, Porceedings, pages 304{326. Springer-Verlag, Berlin, Germany,
May 12{15 1992.
[16] Corp oration for National Research Initiatives. Interfacing to the handle
management system. Accessible at
http://www.cnri.reston.va.us/home/cstr/handle-intro.html, 1994. 14
[17] R. Daniel Jr. and M. Mealling. URC scenarios and requirements. Internet
Draft draft-ietf-uri-urc-req-00.txt, Nov. 1994.
[18] J. R. Davis and C. Lagoze. Dienst, a proto col for a distributed
digital do cument library. Internet Draft accessible at http://cs-
tr.cs.cornell.edu/Info/dienst proto col.html, July 1994.
[19] J. Dongarra, T. Rowan, and R. Wade. Software distribution using
XNETLIB. ACM Trans. Math. Softw., 211, 1995 1995.
[20] J. J. Dongarra and E. Grosse. Distribution of mathematical software via
electronic mail. Commun. ACM, 305:403{407, May 1987.
[21] G. Fox, R. D. Williams, and P. Messina. Paral lel Computing Works. Mor-
gan Kaufmann, 1994.
[22] W. B. Frakes and T. P.Pole. An empirical study of representation metho ds
for reusable software comp onents. IEEE Trans. on Software Engineering,
208:617{630, Aug. 1994.
[23] E. Grosse. A catalogue of algorithms for approximation. In J. Mason and
M. Cox, editors, Algorithms for Approximation II, pages 479{514 of 514,
London, England, 1990. Chapman and Halll.
[24] E. Grosse. Rep ository mirroring. ACM Trans. Math. Softw., 211, Mar.
1995.
[25] K. A. Hawick. High Performance Computing and Communications glos-
sary.Technical rep ort, Northeast Parallel Architectures Center at Syracuse
University, July 1994.
[26] R. Kahn and R. Wilensky. Lo cating electronic library services and ob jects:
A frame of reference for the CS-TR pro ject. Draft for discussion purp oses,
Feb. 1994.
[27] M. S. Kamel, K. S. Ma, and W. H. Enright. ODEXPERT: An exp ert
system to select numerical solvers for initial value ODE systems. ACM
Trans. Math. Softw., 191:44{62, Mar. 1993.
[28] R. R. Larson. Design and developmentofanetwork-based electronic library.
In Proc. ASIS Mid-Year Meeting, pages 95{114, Portland, Oregon, May
1994.
[29] M. Lucks and I. Gladwell. Automated selection of mathematical software.
ACM Trans. Math. Softw., 181:11{54, Mar. 1992.
[30] R. Pollard. A hyp ertext-based thesaurus as a sub ject browsing aid for bib-
liographic databases. Information Processing and Management, 293:345{
357, 1993. 15
[31] R. Prieto-Diaz. Implementing faceted classi cation for software reuse.
Commun. ACM, 345:88{97, May 1991.
[32] K. Sollins and L. Masinter. Functional requirements for uniform resource
names. Internet RFC 1737, Dec. 1994.
[33] W. Turner. The do cument architecture for the Cornell Digital Library.
Internet RFC 1691, Aug. 1994. 16