1 2
NHSE
Netlib, NHSE and other Sources
National HPCC Software Exchange
NASA funded CRPC pro ject
Jack Dongarra
Center for ResearchonParallel Computation CRPC
{ Argonne National Lab oratory
Computer Science Department
{ California Institute of Technology
UniversityofTennessee
{ Rice University
{ Syracuse University
and
{ UniversityofTennessee
Mathematical Sciences Section
Uniform interface to distributed HPCC software rep os-
Oak Ridge National Lab oratory
itories
Facilitation of cross-agency and interdisciplinary soft-
ware reuse
http://w ww .ne tl ib. org /ut k/p e opl e/J ackDo ngarra.html
Material from ASTA, HPCS, and I ITA comp onents of
the HPCC program
http://www.netlib.org/nhse/
3 4
Goals:
NHSE Comp onents
Outreach and technology transition
{To the HPCC user community and industry
To facilitate an active exchange program for HPCC
software and enabling technologies via the National In-
Distribution via the WWW
formation Infrastructure.
Discipline oriented rep ositories
Uniform user interface
Software review
To promote contributions and use by Grand Challenge
teams, as well as other memb ers of the high p erfor-
{ Simple submission
mance computing community.
{ On going review
Measurement
Hyp ertext road map
Software here includes algorithms, sp eci cations, designs,
do cumentations, rep orts...
Publishing to ols
{ Rep ository-in-a-b ox
Naming & authentication architecture
Selective capitalization of emerging technologies
5 6
Physical Physical Repository Physical Repository 2 Repository Bene ts 1
n
1. Faster development of high-quality software so that sci-
entists can sp end less time writing and debugging pro-
catalog catalog h problems.
info info grams and more time on researc
ware development e ort by shar- catalog 2. Less duplication of soft
info file
ware mo dules.
request ing of soft
3. Less time and e ort sp ent in lo cating relevant software
and information through the use of appropriate index-
h mechanisms and domain-sp eci c exp ert retrieved ing and searc
file help systems.
NHSE
verload through the use of lters
Search/Browse 4. Reducing information o
Interface
and automatic search mechanisms. search results
search request
Virtual Repository Architecture
7 8
NHSE
Based on Existing Technologies
{ WWW Browser Mosaic / Netscap e / etc
Intended Audience
Distributed / Scalable
HPCC application and computer science community
URL: http://www.netlib.org/nhse/
{ Source of material for NHSE
{ Netlib
Rep ository for math software since 1985
Users of NASA, NSF, DOE and other sup ercomputer
centers
Rep ositories Currently Available
{ Go o d targets for NHSE
{ Netlib, Softlib, CITlib
{ Natural supp ort organization: sup ercomputer cen-
{ ASSET - Asset Source for SW Engineering Tech.
ter sta
{ CARDS - Comprehensive Approachto
Other users of high p erformance computers
Reusable Defense SW
{ ELSA - Electronic Library Services and Appl.
{ Current and p otential industrial users
{ GAMS Virtual Software Rep ository
{ No natural supp ort organization
{STARS - SW Technology for Adaptable,
Applicable to other domains
Reliable Systems
{ Many examples related to GC problems
Currently Available Information
{ NHSE currently p oints to 200+ mo dules
software catalog
tech rep orts and pap ers
parallel pro cessing to ols
libraries of reuseable comp onents
Grand Challenge prototyp e co des
9 10
Netlib - Network Access to
Mathematical Software and Data
Began in 1985
{ JD and Eric Grosse, AT&T Bell Labs
Motivated by the need for cost-e ective, timely dis-
Requests Made to the Netlib Repositories
y mathematical software to the
tribution of high-qualit at the Univ. of Tennessee & ORNL
community.
8,345,418 total requests to these repositories as of Dec 20, 1995
Designed to send, by return electronic mail, requested items.
4,959,282
Automatic mechanism for the disseminate of public do-
ware.
main soft 4000
{ Still in use and growing
3000
{ Mirrored at a numb er of sites
netlib2.cs.utk.edu 2000
Requests x 1,000
netlib1.epm.ornl.gov
research.att.com
1000
netlib.no
unix.hensa.ac.uk 0
1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995
ftp.zip-b erlin.de Year
HTTP Gopher FTP XNetlib EMail
Data as of - Dec 20, 1995 at 02:21:52
11 12
wing typ es of software are b eing made
The follo
vailable: Breakdown of requests to each Netlib library a
(an alphabetical listing is also available.)
Systems software and software to ols. Data as of 12/01/95 at 02:42:11 Library Name Number of accesses
lapack 469,634 { compilers
pvm3 375,582
{ message-passing communication subsystems linpack 254,992
slatec 245,724 { parallel monitors and debuggers. blas 176,606
clapack 128,081
Basic building blo cks for accomplishing common com-
linalg 126,834 unication tasks. eispack 125,404 putational and comm
slatec/src 117,374
ks are meant to b e used by Grand Chal- toms 115,825 { Building blo c
f2c 96,255 lenge teams
c++ 96,182
Research co des that have b een develop ed to solve di- benchmark 85,217 master 69,587 f2c/src 65,913 cult computational problems.
minpack 60,158
yhave b een develop ed to solve sp eci c problems fn 59,266 { Man
fftpack 58,083
{ Serve as pro ofs of concept
na-digest 50,750 eloping general-purp ose reusable soft- port 48,820 { Mo dels for dev
slatec/lin 46,232
ware hence 45,103 confdb 42,631 c++/answerbook 37,298 slatec/chk 37,203
13 14
NHSE Software Catalog
Review Pro cedure
Benchmark and example programs 4
Treat the review and submission as indep endent pro-
Data analysis and visualization 22
cesses.
Numerical libraries and routines 57
Submissions are \sub ject" to ongoing review.
{ Computational geometry 7
Review status abstract for each submission.
{ Linear algebra 18
{ Based on author comments
{ Optimization 4
{Package do cumentation
{Partial di erential equations 3
{ Our indep endent reviewer testing
{ Other 25
{ Comments from users.
Parallel pro cessing to ols
{ Communication libraries 25
{ Execution and p erformance analyzers 31
{Parallel I/O systems 5
{Parallel programming environments 12
{Parallel programming languages and compilers 26
{Parallel runtime systems 10
{ Source co de analyzers and restructurers 7
{ Miscellaneous 16
Scienti c and engineering applications 66
15 16
NHSE Software Submission
Technical and Political Issues
Goals
How will the naive user nd the right software?
Ensure xity of publication
le ngerprints, unique name
{ Answer: Via the NHSE search/browser interface
and the Road Map
Prevent imp ersonation and unauthorized changes
How will authentication, integrity, and version control
digital signatures
b e implemented?
Exercise quality control
{ Answer: By a publishing system that includes unique
review classi cation
naming & digital signatures.
Promote interop erability
Will the NHSE supp ort distribution of software that is
RIG Basic Interop erability Data Mo del
not free or cannot b e freely distributed?
{ Answer: Yes, but...
only provide classi cation, review, and access
use of encryption and separate key distribution
NHSE will not have an accounting department
Will the NHSE b e resp onsible for supp ort of software?
{ Answer: NO!
Any supp ort will be by author or appropriate
agent
17 18
User Pro les
Currently pro cessed manually
Aliations, Collab orations, and Intimacies
{ Click on Submit: User Pro le on NHSE home page
{ Fill in email address, software needs, information
needs
3
W3 Consortium aliate member W C
{ NHSE Librarian sends you search results and keeps
you p osted on future items of interest enditemiza
Reuse Library Interop erablity Group RIG Member
{Future - automate pro cessing
Active within IETF
Handle larger volume more quickly
Interaction with Corp oration for National Research Ini-
Intelligent information agents
tiatives CNRI
Semantic ltering
19 20
Road Map
Di erent disciplines will maintain their own soft-
ware rep ositories
Structured Knowledge Base on Software Comp onents
Users should not need to access each of these rep osito-
{ Similar to a hyp ertext encyclop edia
ries separately
{ Assembled with the help of panels of exp erts
NHSE will provide a uniform interface to a virtual HPCC
various-p ersp ective catalog
software rep ository which will b e built on top of the dis-
{ Glossaries tributed set of discipline-oriented rep ositories.
{ Algorithms
The interface will assist the user in lo cating relevant
{ Applications categories resources and in retrieving these resources.
{ Enabling software
A combined browse/searchinterface will allow the user
to explore the various HPCC areas and b ecome familiar { Benchmarks
with the available resources.
{ Hardware
A longer term goal of the NHSE is to provide users with { Education and training
domain-sp eci c exp ert help in lo cating and understand-
{ Links to other sites
ing relevant resources.
Prototyp e Currently Available
{Topic: available applications co des and their comp o-
nent HPCC technologies
21 22
Research Issues Needs for Browsing and Searching
Rep ository in a Box CurrentWeb interfaces are dicult and frustrating for
the user who is attempting to lo cate sp eci c informa-
{Everything required for a site to setup a rep ository
tion.
and get connected to the NHSE
Browsing by following hyp ertext links is slow and can
publishing to ols
b e disorienting.
ftp, http, gopher, email servers
Keyword searching su ers from the vo cabulary mis-
logging and statistical to ols
match problem and is unsuitable for users with impre-
integration with replication service
cise information and software needs.
guidance and supp ort
LIFN
{ assigned to a particular sequence of bytes,
{ once the assignment has b een made, the same LIFN
cannot subsequently b e used to name any other se-
quence of bytes.
Indexing and Searching
{ Currently limited to syntatic keyword matching
{ Harvest system provides a set of customizable to ols
23 24
Capitalization of Emerging Technologies
Combined Searching and Browsing Interface
Three Levels of Software Development
Augment a users mental p erception and knowledge of
Pasadena I Workshop
sp eci c domains,
{ Research prototyp e
Improve a users understanding of the problem to be
{ Advanced development prototyp e
solved, and
{ Commercial pro duct
Form successively more fo cused and b etter articulated
queries.
Fo cus:
The interface will be in the form of a thesaurus-based
{Move from researchto advanced development pro-
roadmap.
totyp e
Strategy
{ Convene outside review panel
{ Select as many pro jects as budget will p ermit
{ Imp ose requirements for inclusion at Level 2, Re-
viewed Level
25 26
HPCC Thesaurus HPCC Cataloging
The NHSE will de ne the top levels of an HPCC the- To enable searching, cataloging information must be
saurus, drawing on an existing HPCC glossary and on made available for NHSE assets.
the current NHSE contents to generate thesaurus terms.
Eachphysical rep ository will b e resp onsible for main-
Sub ject area sp ecialists will b e called up on to re ne the taining a network-accessible le containing such cata-
lower levels. loging information.
The thesaurus, along with a high-level classi cation scheme, These les will b e retrieved and indexed by an NHSE
will form the basis of a hyp ertext roadmap. indexer on a regular basis, and the resulting searchable
index will b e replicated for reliability.
The roadmap will include scop e notes and annotations
to familiarize users with various HPCC areas and will The NHSE will use the Harvest system from University
serve as a springb oard for thesaurus-assisted searches. of Colorado to do the collection, indexing, and index
replication.
27 28
Naming and Authentication
Quality Control
The main advantage of distributing the rep ository is to
Users of the NHSE need to have con dence that the
allow the software to b e maintained by those in the b est
software they obtain is high-quality and well-tested.
p osition to keep it up-to-date.
Quality control will be imp ossible to automate com-
Copies of p opular software packages may b e transpar-
pletely b ecause it requires human judgement and eval-
ently mirrored to increase availability, improve resp onse
uation.
time, and prevent b ottlenecks.
To ols and pro cedures need to b e develop ed to facilitate
Research Issues
the testing and classi cation or lab eling of software with
Assignment of unique identi ers to retrievable assets
resp ect to its quality and reliability.
Collection and merging of cataloging information, so
Research is needed to determine what quality control
that, veri cation of the authenticity, integrity of re-
information is most useful and can reasonably be ob-
trieved assets, and tracking can take place
tained, how to acquire this information, and in what
NHSE's approach to these issues
format it should b e stored for easy access.
Implement a lo cation-indep endent naming architecture
Academic journal mo del
that unambiguously asso ciates a unique name with the
byte contents of a published asset and that includes
mechanisms for authentication and integritychecking.
Publishing to ols will be made available to assist pub-
lishers with naming and cryptographically signing pub-
lished assets, and with exp orting asset descriptions to
an NHSE search service.
Authentication of assets will be handled by an asym-
metric public-private key encryption system.
29 30
Interop eration Measurement
The NHSE will catalog and provide access to software Keep statistics on downloads from NHSE pages
and software-related artifacts from all the HPCC soft-
{ Including userid of request source
ware rep ositories.
not always p ossible
Assets accessible from other existing software rep osi-
invasion of privacy
tories, such as ASSET, CARDS, DSRS, and ESLA, to
name just a few, may also b e of interest to NHSE users.
Statistical Survey of Unreviewed and Partially Reviewed
Software
The NHSE will b e participating in a small-scale inter-
op erability exp eriment with the ab ove rep ositories to
{ Identi cation of candidates for full review
help de ne requirements for further interop eration ef-
{ Email inquiries to determine usage pattern
forts.
Systematic Survey of Reviewed Software
The NHSE will also b e working with the Reuse Library
Interop erability Group RIG on establishing standards
{ Application
for unique naming, asset description and classi cation,
{ Usage pattern
and asset evaluation.
{ Industry versus academia
In the future, the NHSE will interop erate with these
other rep ositories so that software from them maybe
retrieved directly from the NHSE interface.
31 32
Software Survey
Reliable File Mirrowing
200+ software items from NHSE collection whichhave
Current mirroring systems inadequate
b een abstracted and keyword indexed
Replication service to provide:
testb ed for search strategies
{Fast selective le transfer
software review candidates
{ Indep endent of machine architecture and op erating
system dep endenices
Integrated with lo cation lo okup mechanisms
33 34
NHSE at ANL
Mo dular web rob ot
Parallel web indexing engine
DNS/geographical mapping
Autonomous agents for collecting information on the
web
35 36
Overall Strategy for the NHSE:
Naming of Distributed Resources
E ectiveness of the NHSE will dep end on discipline-
oriented groups and Grand Challenge teams having own-
ership of the discipline-oriented software rep ositories.
Need single name to refer to an ob ject
The information and software residing in these rep osi-
Mayhavemultiple lo cations
tories will b e b est maintained and kept up-to-date by
the individual disciplines, rather than by centralized
Maychange over time
administration.
2 kinds
Central administration will be used instead to handle
{ urn:netlib:Welcome Dynamic
interop eration and meet common needs.
{ lifn:netlib:Welcome_940328 Fixed
Although the various disciplines will haveownership of
Fixed reference to a published ob ject
the rep ositories, they should not b e exp ected to develop
the software and to ols for building, managing, and in-
terfacing to their rep ositories.
Much useful information retrieval IR software is cur-
rently available, b oth in the form of client and server
programs e.g., http servers and WWW browsers, as
and this software should b e incorp orated into the NHSE.
37 38
Summary Summary continued
Initial implementation built on existing technologies Measurement and evaluation
{ WWW { Statistics for unreviewed and partially reviewed
Distributed {Evaluations for reviewed
Scalable
Outreach and technology transition
{ Netlib, etc
{ Educational activities aimed at the user community
{ Rapid deployment
{Fostering technology developmentby industry
Multilevel review and classi cation scheme
Capitalization of emerging technologies
{ Unreviewed
{Foster transition from research to advanced devel-
{Partially reviewed
opment prototyp e
{ Reviewed
Working on standardization within WWW community
Road Map
{ memb er of RIG, IETF, IESG, WWW consort
39
Some Url's Related to Scienti c Computing
National HPCC Software Exchange
http://www.netlib.org/nhse/
NHSE Software Catalog
http://www.netlib.org/nhse/sw
catalog/index.html
Netlib Rep ository
http://www.netlib.org/
Computational Science Education
http://www.netlib.org/nhse/cse
edu.html
Bo oks, Course Materials, and Tutorials
http://www.netlib.org/nhse/cse
edu.htmlb o oks
CS267 - Applications of Parallel Computers
U.C. Berkeley CS267: Spring 1995 Jim Demmel
http://www.icsi.b erkeley.edu/cs267/
18.337 Parallel Scienti c Computing
MIT Spring, 1995 Alan Edelman
http://web.mit.edu/18.337/WWW/home.html
North Carolina State University
Visualizations in Materials Science
http://vims.ncsu.edu/cgi/index.acgi
Computational Science Education Pro ject
http://csep1.phy.ornl.gov/csep.html