Prosp ero Pro c. INET '93 Neuman & Augart
Prosp ero: A Base for Building Information Infrastructure
B. Cli ord Neuman Steven Seger Augart
Information Sciences Institute
University of Southern California
Abstract I I. The Four Functions
The recent introduction of new network infor-
The services of existing Internet information
mation services has brought with it the need for
retrieval to ols can b e broken into four functions:
an information architecture to integrate informa-
storage, access, search, and organization.
tion from diverse sources. This paper describes
The storage function is the maintenance of the
how Prosperoprovides a framework within which
data that may subsequently b e provided to and
such services can be interconnected. The func-
interpreted by remote applications. File systems
tions of several existing information storage and
supp ort storage, providing a rep ository where
retrieval tools are described and we show how they
data may b e stored and subsequently retrieved.
t the framework. Prospero has been used since
Do cument servers including the Wide Area In-
1991 by the archie service and work is underway
formation Service WAIS [5], menu servers in-
to develop application interfaces similar to those
cluding Gopher [7], and hyp ertext servers includ-
provided by other popular information tools.
ing World Wide Web [1] also provide the stor-
I. Intro duction
age function since they manage do cuments that
are subsequently retrieved and displayed by their
The past several years has brought the intro-
clients.
duction of a large numb er of information services
to the Internet. These services collectively pro-
Whereas storage is primarily a function of the
vide huge stores of information, if only one knows
server, access involves b oth the client and the
what to ask for, and where to lo ok. Unfortu-
server. The access function is the metho d by
nately, knowing what and where to ask is a ma-
which the client reads and p ossibly writes the
jor problem; available information is scattered
data stored on a server. The access metho d is
across many services, and di erent applications
typically de ned by network proto cols including
are needed to access the data in each.
the File Transfer Proto col FTP [11], Sun's Net-
While there have b een many attempts to cre-
work File System NFS [12], the Andrew File
ate gateways b etween these services, such gate-
System AFS [4], and the application sp eci c
ways have not b een as useful as needed. Gate-
proto cols used byWAIS, Gopher, and World
ways havetypically taken one of two forms. The
Wide Web. The client and the server must use
rst is a p ortal b etween two services: one ser-
the same proto col.
vice is used to nd an instance of a second ser-
vice, to which the user is then connected, leaving
The search function involves the iterativeor
the user to gure out how to query the second
recursive retrieval and analysis of information
service. The second form of gateway translates
ab out data meta-information, p ossibly across
queries from one service into those of another,
multiple rep ositories, with a sp eci c goal of iden-
and returns the result of the query in a form that
tifying or lo cating data that satis es the search
a user of the rst service can understand. The
criteria. A search heuristic de nes the metho d
second kind of gateway is easier for the user, but
and strategy used to conduct the search. Ex-
it is still limited since, as typically implemented,
amples of search to ols include Knowb ots [6] and
such gateways do not allow complete integration
NetFind [13]. Though searches are initiated by
of the two information spaces.
a client when the need for sp eci c data is real-
This pap er shows how the Prosp ero Directory
ized, parts of the searchmay execute remotely.
Service provides a framework for interconnecting
For example, the lo okup of an entry in a remote
information services. The pap er b egins by di-
index or directory on a WAIS, Gopher, WWW,
viding the functions of information systems into
or Prosp ero server constitutes a search. The Dy-
four categories: storage, retrieval, organization,
namic WAIS interface to NetFind [3] takes this a
and search. The role of existing services in this
step further; the server initiates and manages, in
taxonomy is describ ed, and the role of Prosp ero
real-time, a search across multiple hosts on b ehalf
in interconnecting existing information services
of the client. using this taxonomy is presented. DGC-1
Figure 1: A framework for network information services
Brute force search in a system as large as Awell de ned interface is needed that will
the Internet is not practical. To b e e ective, allow information maintained by one service to
search heuristics must b e applied. These heuris- b e used by another. This interface would ap-
tics rely on meta-information describing the data p ear b etween search and organization, providing
that is available. The more structured this meta- a common metho d for applications to query the
information, the more useful it is in directing meta-information maintained by other services.
searches. The organization function involves A similar interface would separate storage and ac-
the collection, maintenance, and structuring of cess, providing access by all applications to data
such meta-information. Examples of organiza- maintained by di erent services.
tion mechanisms include directories in Prosp ero,
Figure 1 shows the lo cation of existing services
menus in Gopher, links in hyp ertext do cuments,
in such a framework. Meta-information avail-
indices of data lo cal to a WAIS server, and the
able from archie, WAIS, World Wide Web, Pros-
le name index maintained by archie [2] for les
p ero, and Gopher menus are available through
scattered across the Internet.
a common query mechanism. Menu browsers,
The di erence b etween search and organiza-
le system to ols, hyp ertext browsers, and ded-
tion is that a search is initiated when sp eci c data
icated applications such as the command line
is needed, whereas data is organized in advance to
archie client and NetFind use this mechanism to
supp ort subsequent searches. Search mechanisms
query services. Up date of meta-informationis
play a role in the organization of data when the
also supp orted, providing a common interface b e-
results of p ossibly complex or exp ensive searches
tween hyp ertext and menu editors and rep osito-
are recorded for subsequent use by applications.
ries. Search engines can use the query mechanism
to identify ob jects of interest. They can then
I I I. A Flexible Framework
record the result of the search for subsequent use
by using the up date mechanism to add new links
Most information services presently available
to the information fabric. Section IV discusses
on the Internet provide their own mechanisms for
how Prosp ero provides this functionality.
each of the four functions, whereas their novel
The separation of storage and access is a bit features are usually con ned to their user inter-
more complicated. There are already numerous faces and/or at most one of the four functions.
access metho ds in use, and many of the les of As a result, gateways are required to allow appli-
interest in the Internet exist on servers that will cations from one service to use information main-
only supp ort a single proto col. The problem can tained for another. DGC-2
Figure 2: The Prosp ero naming network
Meta-information is represented by Prosp ero initially b e addressed by supp orting a common
in a naming network, a directed graph with la- application library that accepts a reference to an
b eled edges shown in gure 2. This naming ob ject obtained from the meta-information ser-
network provides the fabric through which appli- vices, and automatically invokes the appropri-
cations navigate. Each no de in the graph repre- ate access metho d to retrieve the ob ject. Pros-
sents an ob ject. An ob ject can b e a le, a di- p ero currently uses this metho d. While not de-
rectory, b oth, or neither if neither, it only has scrib ed in this pap er, we are also working on a
attributes. Each ob ject has asso ciated with it a common data access proto col for information ser-
set of user extensible attributes. These attributes vices. That proto col will supp ort gateway ser-
can haveintrinsic meaning, such as the length of vices to existing data access proto cols.
a le, or they can b e application sp eci c.
IV. The Mo del
If an ob ject serves as a container for data,
then it is a le and has one or more access-
The information services available on the In-
method attributes asso ciated with it. Each
ternet eachhave their advantages. In order to
access-method attribute enco des the informa-
accommo date the needs of each, a mechanism to
tion needed to retrieve the data from a data
integrate these services must b e exible. How-
rep ository using a particular access metho d.
ever, to b e practical and ecient, such a mech-
anism must not b ecome b ogged down providing
If an ob ject is a directory it contains a set of
functionality that will only b e used o ccasionally.
named links to other ob jects. The lab eled edges
The solution to these comp eting goals is to pro-
in the naming network corresp ond to these links.
vide a simple but extensible framework on which
The names on the links serve as sign p osts to
the mechanisms sp eci c to individual systems can
help the user, or the user's application, navigate
b e built.
through the naming network in search of the de-
sired information. Applications that treat the These characteristics are present in the Pros-
naming network as a lesystem use the names p ero Directory Service [8]. Wechose to concen-
of the links as comp onents of lenames, allowing trate initially on integrating to ols for organizing
les in sub directories to b e named by the concate- information, applying Prosp ero as a common pro-
nation of the names of the links that are followed to col for access to meta-information from multi-
to reach them. ple information services. DGC-3
Prosp ero Pro c. INET '93 Neuman & Augart
When searching for information, users and query. These arguments include a string that is
applications also use attributes. In addition to matched against the lenames in the index. The
ob ject attributes, attributes can b e stored with result of a query is represented as the contents of
links. Such link attributes either store additional the Prosp ero directory. Up on receipt of the di-
information ab out the link such as annotations, rectory query, the archie server extracts the argu-
or they cache information ab out the ob ject to ments, p erforms a query on the archie database,
which the link refers. Cached attributes allowan and returns the results to the client. Each le
application to retrieve attributes for the ob jects matching the query is represented as a link in the
in a directory in a single query, rather than mak- directory to the le on the remote FTP site. The
ing individual requests for the attributes of each information needed to retrieve the matching le
ob ject referenced from the directory. is returned as part of the access-method at-
tribute asso ciated with the link. Other attributes
A lter can also b e asso ciated with a link. A
are also returned, including le size and the time
lter, and a second kind of link called a union
of last mo di cation. One can apply a Prosp ero
link, allow a view of a directory to b e created that
lter to a directory query to restrict the links
is a function of another view. Filters and union
that are returned to those for les on hosts in a
links provide a mechanism for users to enco de,
particular part of the network.
on a link, metho ds to b e applied when search-
ing. Because they are applied when a directory
Gopher [7] allows users to browse menus con-
is listed, the resulting view is kept up-to-date,
gured by Gopher service providers. By select-
even when the underlying information changes.
ing menu items, users are able to run applica-
Filters and union links are describ ed in greater
tions, search lo cal databases, select and display
detail elsewhere [9].
les and sub-menus, and connect to other ser-
vices available on the Internet. A Gopher menu
V. Implementation Notes
can b e represented as a Prosp ero directory. The
individual items in the menu corresp ond to links
Prosp ero was designed to b e simple but exten-
in the directory. Submenus are links to other
sible. The proto col is simple, supp orting stateless
directories. Links to les have an asso ciated
queries for attributes and the contents of directo-
access-method attribute that provides the in-
ries. The proto col is layered on top of a reliable
formation needed to retrieve the le. Attributes
delivery mechanism implemented using UDP.In
asso ciated with links also sp ecify how the data
the normal case, a query requires a single packet
in a le should b e presented, for example how
for the request and a single packet resp onse, elim-
to display images or play back an audio record-
inating the cost asso ciated with setting up and
ing. Links to Internet services reachable by tel-
breaking down a connection. This low cost is par-
net havean access-method attribute of typ e
ticularly imp ortant for clients that contact mul-
telnet; this access-method contains the in-
tiple servers in series as is often the case when
formation needed to telnet to the service.
resolving names.
The WAIS server [5] maintains a full text
Despite the simplicity of the proto col, Pros-
index to a set of do cuments on a single sys-
p ero is extremely exible. Functions sp eci c to
tem, allows that index to b e queried remotely,
an application rely on the extensible attribute
and allows remote read access to the do cuments.
mechanism. Applications not needing that func-
The WAIS client provides a common front end
tionality ignore the additional attributes, while
for queries and access to do cuments on multiple
applications that know ab out it can take action
WAIS servers. A WAIS query can b e represented
based on the value of the attributes.
as a Prosp ero directory in much the same man-
ner as an archie query. Like archie, the result
VI. Integrating Services
of the query would b e represented by the links
in the directory. Each of these links would b e a
This section describ es how the meta-
reference to a do cument on the server itself and
information maintained by several Internet in-
the access-method asso ciated with the link
formation services can b e represented using the
would sp ecify the WAIS access metho d. Other
mo del from section IV.
attributes of the link could include the relative
con dence in the match. To retrieve a matched Archie [2] constructs an index of les from In-
do cument the application would pass the selected ternet anonymous FTP sites and makes the in-
link to the Prosp ero library function pfs open dex available to applications using the Prosp ero
whichwould automatically invoke the WAIS ac- proto col. A query to the archie database corre-
cess metho d to retrieve the do cument from the sp onds to a Prosp ero directory query for a direc-
WAIS server. tory whose name includes the arguments of the DGC-4
Prosp ero Pro c. INET '93 Neuman & Augart
World Wide Web is a hyp ertext browser that VI I I. Status
supp orts the interconnection of collections of do c-
Prosp ero has b een available since December
uments with links that originate within the do cu-
1990. Recent revisions to the proto col allow more
ments themselves. Links to non-hyp ertext do cu-
exible integration of information services. Pros-
ments are also supp orted. A hyp ertext do cument
p ero has b een used since Spring of 1991 to make
may b e represented in Prosp ero as a no de that is
available meta-information maintained by archie.
b oth a le and a directory. The do cument that is
Work is underway to develop application inter-
displayed to the user would b e retrieved accord-
faces for Prosp ero that provide the functionality
ing to the access-method attribute asso ciated
of Gopher, WAIS, and World Wide Web. To nd
with the no de. The links in the directory would
out more ab out Prosp ero, or for directions on re-
represent links to other do cuments. The name of
trieving the latest distribution, send a message to
each link would contain the text of a tag within
info-prosp [email protected].
the do cument from which the link is to originate.
O sets in the target do cument can b e represented
IX. Summary
as attributes of the link.
Ahuge amount of information is available on
Prosp ero allows users to create their own di-
the Internet. Unfortunately, this information is
rectory hierarchies from which they can make
scattered across many services and di erent ap-
links to directories and ob jects of interest. Since
plications are needed to access the data in each.
queries to other services and the ob jects stored
A framework was de ned within which such in-
by them are represented as no des in the Pros-
formation services can interop erate. By provid-
p ero naming network, users can create directo-
ing a common metho d for applications to query
ries in which the linked ob jects reside in di erent
the meta-information maintained by the services
services, but can b e accessed using a common
that organize data on the Internet, information
proto col, thus supp orting the integration of in-
maintained by one service can used by all. Pros-
formation across services.
p ero provides suchaninterface. A service that
uses Prosp ero to exp ort meta-information ab out
VI I. Security
the data it provides will b e usable by many appli-
Security is an imp ortant feature that is miss-
cations. Similarly, an application that uses Pros-
ing from most information systems on the Inter-
p ero to nd les will b e able to access information
net. Without ne grained control for access to in-
from many services.
formation, the owners of certain information will
not makeitavailable using such systems. Since
Acknowledgments
our goal is to allow the integration of information
from all sources on the Internet, securitymust
Many individuals contributed to the design
b e an imp ortant consideration. Security is esp e-
and implementation of Prosp ero. Ed Lazowska,
cially imp ortant in our mo del since we supp ort
John Zahorjan, David Notkin, Hank Levy, and
the remote mo di cation of information.
Alfred Sp ector help ed re ne the ideas that ul-
timately led to the development of Prosp ero.
In Prosp ero, access control lists ACLs may
Kwynn Buess, Steve Cli e, Alan Emtage, George
b e asso ciated with directories in the naming net-
Ferguson, Bill Griswold, Sanjay Joshi, Bren-
work, and with individual links within a direc-
dan Keho e, Dan King, and Prasad Upasani
tory. The p ermissions supp orted include read,
help ed with the implementation of Prosp ero and
mo dify, insert, delete, list, and administer. Pros-
Prosp ero-based applications. Gennady Medvin-
p ero requests are authenticated and the iden-
sky and Stuart Stubblebine commented on drafts
tity of the client is used to determine the access
of this pap er.
p ermissions that apply. Several authentication
metho ds are supp orted including authentication
References
based on the client's Internet address, the use of
passwords, and Version 5 of Kerb eros.
[1] Tim Berners-Lee, Rob ert Cailliau, Jean-
Francois Gro , and Bernd Pollermann.
Ho oks are also present to supp ort distributed
World-wide web: The information universe.
authorization and accounting mechanisms [10].
Electronic Networking: Research, Applica-
Supp ort for accounting will b ecome critical as
tions and Policy, 21, Spring 1992.
new for-hire information services are intro duced. DGC-5
Prosp ero Pro c. INET '93 Neuman & Augart
[2] Alan Emtage and Peter Deutsch. archie: An [11] Jon B. Postel and J. K. Reynolds. File trans-
electronic directory service for the Internet. fer proto col. DARPAInternet RFC 959, Oc-
In Proceedings of the Winter 1992 Usenix tob er 1985.
Conference, pages 93{110, January 1992.
[12] R. Sandb erg, D. Goldb erg, S. Kleiman,
[3] Darren R. Hardy. Scalable internet resource D. Walsh, and B. Lyon. Design and imple-
discovery among diverse information. Tech- mentation of the Sun Network File System.
nical Rep ort CU-CS-650-93, Departmentof In Proceedings of the Summer 1985 Usenix
Computer Science, University of Colorado, Conference, pages 119{130, June 1985.
Boulder, April 1993. M.S. Thesis.
[13] Michael F. Schwartz and P. G. Tsirigotis.
[4] John H. Howard, Michael L. Kazar, Exp erience with a semantically cognizant in-
Sherri G. Menees, David A. Nichols, ternet white pages directory to ol. Journal of
M. Satyanarayanan, Rob ert N. Sideb otham, Internetworking: Research and Experience,
and Michael J. West. Scale and p erformance 21:23{50, 1991.
in a distributed le system. ACM Trans-
Author Information
actions on Computer Systems, 61:51{81,
February 1988.
Cli ord Neuman is a scientist at the Informa-
tion Sciences Institute of the University of South-
[5] Brewster Kahle and Art Medlar. An in-
ern California. After receiving a Bachelor's de-
formation system for corp orate users: Wide
gree from the Massachusetts Institute of Technol-
area information systems. Technical Rep ort
ogy in 1985 he sp entayear working for Pro ject
TMC-199, Thinking Machines Corp oration,
Athena where he was one of the principal design-
April 1991.
ers of the Kerb eros authentication system. He
holds M.S. and Ph.D. degrees from the Univer-
[6] Rob ert E. Kahn and Vinton G. Cerf. The
sityofWashington, where he initially develop ed
Digital Library Pro ject; Volume 1: The
Prosp ero as part of his dissertation. His research
world of Knowb ots draft. Corp oration for
fo cuses on problems of system organization and
National Research Initiatives, 1988.
security in distributed systems.
[7] Mark McCahill. The Internet gopher:
Steven Seger Augart is memb er of the research
A distributed server information system.
sta at the Information Sciences Institute of the
ConneXions { The Interoperability Report,
University of Southern California. He received
67:10{14, July 1992.
a Bachelor's degree from Harvard Universityin
1989 and an M.S. from the University of Califor-
[8] B. Cli ord Neuman. Prosp ero: A to ol for or-
nia at Irvine in 1992, with various b outs work-
ganizing Internet resources. Electronic Net-
ing as a systems programmer along the way. His
working: Research, Applications and Policy,
work at ISI has fo cused on the developmentof
21:30{37, Spring 1992.
Prosp ero as a scalable information infrastructure
for large distributed systems.
[9] B. Cli ord Neuman. The Prosp ero File Sys-
tem: A global le system based on the Vir-
This researchwas supp orted in part by the National Sci-
ence Foundation Grant No. CCR-8619663, the Washington
tual System Mo del. Computing Systems,
Technology Centers, Digital Equipment Corp oration, and the
54:407{432, Fall 1992.
Advanced Research Pro jects Agency under NASA Co op era-
tive Agreement NCC-2-539. The views and conclusions con-
tained in this pap er are those of the authors and should not
[10] B. Cli ord Neuman. Proxy-based autho-
be interpreted as representing the ocial p olicies, either ex-
rization and accounting for distributed sys-
pressed or implied, of any of the funding agencies. Figures
and descriptions in this pap er were provided by the authors
tems. In Proceedings of the 13th Interna-
and are used with p ermission. The authors may b e reached
tional Conference on Distributed Computing
at USC/ISI, 4676 AdmiraltyWay, Marina del Rey, CA 90292-
6695, USA. Telephone +1 310 822-1511, email b [email protected],
Systems, pages 283{291, May 1993.
[email protected]. DGC-6