Digital Software and Data Repositories for Support of Scienti C
Total Page:16
File Type:pdf, Size:1020Kb
Digital Software and Data Rep ositories for Supp ort of Scienti c Computing Ronald Boisvert National Institute of Standards and Technology y Shirley Browne UniversityofTennessee Jack Dongarra UniversityofTennessee and Oak Ridge National Lab Eric Grosse AT&T Bell Lab oratories Abstract This pap er discusses the sp ecial characteristics and needs of software rep ositories and describ es how these needs have b een met by some exist- ing rep ositories. These rep ositories include Netlib, the National HPCC Software Exchange, and the GAMS Virtual Rep ository.We also describ e some systems that provide on-line access to various typ es of scienti c data. Finally,we outline a prop osal for integrating software and data rep osito- ries into the world of digital do cument libraries, in particular CNRI's ARPA-sp onsored Digital Library pro ject. The work describ ed in this pap er was sp onsored by NASA under Grant No. NAG5- 2736, by the National Science Foundation under Grant No. ASC-9103853, and byAT&T Bell Lab oratories. y Author to whom corresp ondence should b e directed. 107 Ayres Hall, Computer Sci- ence Department, UniversityofTennessee, Knoxville, TN 37996-1301, 615 974-5886, [email protected] 1 1 Intro duction Most work on digital libraries has fo cused on storage, retrieval, and display of digital forms of do cuments. A numb er of on-line software rep ositories have b een develop ed that provide access to software and software artifacts. These do cument and software e orts have b een mostly indep endent, with little atten- tion paid to integrating the twotyp es of libraries and to developing common principles for organization and op eration. This pap er discusses the sp ecial characteristics and needs of software rep os- itories and describ es how these needs have b een met by some existing rep osi- tories. These rep ositories include Netlib [20, 12], the National HPCC Software Exchange [13], and the GAMS Virtual Rep ository [8]. We also describ e some systems that provide on-line access to various typ es of scienti c data. Finally, we outline a prop osal for integrating software and data rep ositories into the world of digital do cument libraries, in particular CNRI's ARPA-sp onsored Dig- ital Library pro ject [28, 26]. 2 Characteristics of Some Existing Software Rep ositories 2.1 Netlib Netlib b egan services in 1985 to ll a need for cost-e ective, timely distribution of high-quality mathematical software to the research community. Some of the libraries Netlib distributes { such as EISPACK, LINPACK, FFTPACK, and LAPACK{have long b een used as imp ortant to ols in scienti c computation and are widely recognized to b e of high quality. The Netlib collection also includes a large numb er of newer, less well-established co des. Most of the software is written in Fortran, but programs in other languages, such as C and C++, are also available. Netlib sends, by return electronic mail, requested routines together with subsidiary routines and any requested do cuments or test programs supplied by the software authors [20]. Xnetlib, an interactive to ol for software and do cument distribution [19], use an X Windowinterface and TCP/IP connections to allow users to receive replies to their requests within a matter of seconds. The interface provides a numb er of mo des and searching mechanisms to facilitate searching through a large distributed collection of software and do cuments. World Wide Web browsers such as Mosaic and Netscap e can also b e used to access Netlib 1 via HTTP and FTP . Although the original fo cus of the Netlib rep ository was on mathematical software, the collection has grown to include other software such as networking 1 Netlib is accessible from a WWW browser at http://www.netlib.org/ 2 to ols and to ols for visualization of multipro cessor p erformance data, technical rep orts and pap ers, a Whitepages Database, b enchmark p erformance data, and information ab out conferences and meetings. The numb er of Netlib servers has grown from the original two, at Oak Ridge National Lab oratory initially at Argonne National Lab oratory and Bell Labs, to servers in Norway, the United Kingdom, Germany, Australia, Japan, and Taiwan. A mirroring mechanism keeps the rep ository contents at the di erent sites consistent on a daily ba- sis, as well as automatically picking up new material from distributed editorial sites[24]. Netlib di ers from other publicly available software distribution systems, suchasArchie, in that the collection is mo derated by an editorial b oard and the software contained in it is widely recognized to b e of high quality.However, the Netlib rep ository is not intended to replace commercial software. Commer- cial software companies provide value-added services in the form of supp ort. Although the Netlib collection is mo derated, its software comes with no guar- antee of reliability or supp ort. Rather, the lack of bureaucratic, legal, and nancial imp ediments encourages researchers to submit their co des by ensuring that their work will b e made available quickly to a wide audience. 2.2 The National HPCC Software Exchange NHSE The National HPCC Software Exchange NHSE is an Internet-accessible re- source that will facilitate the exchange of software and information among research and computational scientists involved with High Performance Com- puting and Communications HPCC [13]. The purp ose of the NHSE is to promote the development of discipline-oriented software and do cument rep osi- tories and of contributions to and use of such rep ositories by Grand Challenge teams, as well as by other memb ers of the high p erformance computing com- munity. The target audiences for the NHSE include HPCC application and computer scientists, users of government sup ercomputer centers, and p otential industrial users. A prototyp e of the NHSE is accessible from a WWW browser at http://www.netlib.org/nse/. The scop e of the NHSE is software and software-related artifacts pro duced by and for the HPCC Program. Software-related artifacts include algorithms, sp eci cations, designs, and software do cumentation. The following typ es of software are to b e made available: Systems software and software to ols. This category includes parallel pro- cessing to ols such as parallel compilers, message-passing communication subsystems, and parallel monitors and debuggers. Data analysis and visualization to ols. Basic building blo cks for accomplishing common computational and com- munication tasks. These building blo cks will b e of high quality and trans- 3 p ortable across platforms. Building blo cks are meant to b e used by Grand Challenge teams and other researchers in implementing programs to solve computational problems. Use of high-quality transp ortable comp onents will sp eed implementation, as well as increase the reliability of computed results. Research co des that have b een develop ed to solve dicult computational problems. Many of these co des will have b een develop ed to solve sp eci c problems and thus will not b e reusable as is. Rather, they will serveas pro ofs of concept and as mo dels for developing general-purp ose reusable software for solving broader classes of problems. The development of this reusable software is exp ected to b e undertaken by commercial companies, rather than by academic researchers. A catalog of the software currently available from the NHSE is accessible at http://www.netlib.org/nse/sw survey.html. Although the di erent disciplines will maintain their own software rep os- itories, users should not need to access each of these rep ositories separately. Rather, the NHSE will provide a uniform interface to a virtual HPCC software rep ository which will b e built on top of the distributed set of discipline-oriented rep ositories. The interface will assist the user in lo cating relevant resources and in retrieving these resources. A combined browse/searchinterface will al- low the user to explore the various HPCC areas and b ecome familiar with the available resources. A longer term goal of the NHSE is to provide users with domain-sp eci c exp ert help in lo cating and understanding relevant resources. 2.3 GAMS Virtual Rep ository The Guide to Available Mathematical Software GAMS pro ject of the National Institute of Standards and Technology NIST studies techniques to provide scientists and engineers with improved access to reusable computer software comp onents available to them for use in mathematical mo deling and statistical analysis. One of the pro ducts of this work is the GAMS system, an on-line cross- index and virtual rep ository of mathematical software [8]. GAMS p erforms the function of an interrep ository and interpackage cross-index, collecting and main- taining data ab out software available from external rep ositories and presenting it as a homogeneous whole. It also provides the functions of a rep ository itself i.e., retrieval. However, instead of maintaining the cataloged software itself, it provides transparent on-demand access to rep ositories managed by others. GAMS currently contains information on more than 9800 problem-solving software mo dules from ab out 85 packages found in four physically distributed software rep ositories three maintained at NIST and Netlib. In addition to most of the software in the Netlib collection, GAMS cross indexes individual comp o- nents in large multipurp ose libraries such as IMSL, NAG, PORT, STARPAC 4 and SLATEC, as well as capabilities of statistical analysis systems suchasDAT- APLOT and SAS. Both public-domain and commercial software is cataloged, and although source co de of proprietary software pro ducts are not available through GAMS, related items such as do cumentation and example programs often are.