Prosp ero Pro c. INET '93 Neuman & Augart

Prosp ero: A Base for Building Information Infrastructure

B. Cli ord Neuman Steven Seger Augart

Information Sciences Institute

University of Southern California

Abstract I I. The Four Functions

The recent introduction of new network infor-

The services of existing information

mation services has brought with it the need for

retrieval to ols can b e broken into four functions:

an information architecture to integrate informa-

storage, access, search, and organization.

tion from diverse sources. This paper describes

The storage function is the maintenance of the

how Prosperoprovides a framework within which

data that may subsequently b e provided to and

such services can be interconnected. The func-

interpreted by remote applications. File systems

tions of several existing information storage and

supp ort storage, providing a rep ository where

retrieval tools are described and we show how they

data may b e stored and subsequently retrieved.

t the framework. Prospero has been used since

Do cument servers including the Wide Area In-

1991 by the service and work is underway

formation Service WAIS [5], menu servers in-

to develop application interfaces similar to those

cluding [7], and hyp ertext servers includ-

provided by other popular information tools.

ing [1] also provide the stor-

I. Intro duction

age function since they manage do cuments that

are subsequently retrieved and displayed by their

The past several years has brought the intro-

clients.

duction of a large numb er of information services

to the Internet. These services collectively pro-

Whereas storage is primarily a function of the

vide huge stores of information, if only one knows

server, access involves b oth the client and the

what to ask for, and where to lo ok. Unfortu-

server. The access function is the metho d by

nately, knowing what and where to ask is a ma-

which the client reads and p ossibly writes the

jor problem; available information is scattered

data stored on a server. The access metho d is

across many services, and di erent applications

typically de ned by network proto cols including

are needed to access the data in each.

the File Transfer Proto col FTP [11], Sun's Net-

While there have b een many attempts to cre-

work File System NFS [12], the Andrew File

ate gateways b etween these services, such gate-

System AFS [4], and the application sp eci c

ways have not b een as useful as needed. Gate-

proto cols used byWAIS, Gopher, and World

ways havetypically taken one of two forms. The

Wide Web. The client and the server must use

rst is a p ortal b etween two services: one ser-

the same proto col.

vice is used to nd an instance of a second ser-

vice, to which the user is then connected, leaving

The search function involves the iterativeor

the user to gure out how to query the second

recursive retrieval and analysis of information

service. The second form of gateway translates

ab out data meta-information, p ossibly across

queries from one service into those of another,

multiple rep ositories, with a sp eci c goal of iden-

and returns the result of the query in a form that

tifying or lo cating data that satis es the search

a user of the rst service can understand. The

criteria. A search heuristic de nes the metho d

second kind of gateway is easier for the user, but

and strategy used to conduct the search. Ex-

it is still limited since, as typically implemented,

amples of search to ols include Knowb ots [6] and

such gateways do not allow complete integration

NetFind [13]. Though searches are initiated by

of the two information spaces.

a client when the need for sp eci c data is real-

This pap er shows how the Prosp ero Directory

ized, parts of the searchmay execute remotely.

Service provides a framework for interconnecting

For example, the lo okup of an entry in a remote

information services. The pap er b egins by di-

index or directory on a WAIS, Gopher, WWW,

viding the functions of information systems into

or Prosp ero server constitutes a search. The Dy-

four categories: storage, retrieval, organization,

namic WAIS interface to NetFind [3] takes this a

and search. The role of existing services in this

step further; the server initiates and manages, in

taxonomy is describ ed, and the role of Prosp ero

real-time, a search across multiple hosts on b ehalf

in interconnecting existing information services

of the client. using this taxonomy is presented. DGC-1

Figure 1: A framework for network information services

Brute force search in a system as large as Awell de ned interface is needed that will

the Internet is not practical. To b e e ective, allow information maintained by one service to

search heuristics must b e applied. These heuris- b e used by another. This interface would ap-

tics rely on meta-information describing the data p ear b etween search and organization, providing

that is available. The more structured this meta- a common metho d for applications to query the

information, the more useful it is in directing meta-information maintained by other services.

searches. The organization function involves A similar interface would separate storage and ac-

the collection, maintenance, and structuring of cess, providing access by all applications to data

such meta-information. Examples of organiza- maintained by di erent services.

tion mechanisms include directories in Prosp ero,

Figure 1 shows the lo cation of existing services

menus in Gopher, links in hyp ertext do cuments,

in such a framework. Meta-information avail-

indices of data lo cal to a WAIS server, and the

able from archie, WAIS, World Wide Web, Pros-

le name index maintained by archie [2] for les

p ero, and Gopher menus are available through

scattered across the Internet.

a common query mechanism. Menu browsers,

The di erence b etween search and organiza-

le system to ols, hyp ertext browsers, and ded-

tion is that a search is initiated when sp eci c data

icated applications such as the command line

is needed, whereas data is organized in advance to

archie client and NetFind use this mechanism to

supp ort subsequent searches. Search mechanisms

query services. Up date of meta-informationis

play a role in the organization of data when the

also supp orted, providing a common interface b e-

results of p ossibly complex or exp ensive searches

tween hyp ertext and menu editors and rep osito-

are recorded for subsequent use by applications.

ries. Search engines can use the query mechanism

to identify ob jects of interest. They can then

I I I. A Flexible Framework

record the result of the search for subsequent use

by using the up date mechanism to add new links

Most information services presently available

to the information fabric. Section IV discusses

on the Internet provide their own mechanisms for

how Prosp ero provides this functionality.

each of the four functions, whereas their novel

The separation of storage and access is a bit features are usually con ned to their user inter-

more complicated. There are already numerous faces and/or at most one of the four functions.

access metho ds in use, and many of the les of As a result, gateways are required to allow appli-

interest in the Internet exist on servers that will cations from one service to use information main-

only supp ort a single proto col. The problem can tained for another. DGC-2

Figure 2: The Prosp ero naming network

Meta-information is represented by Prosp ero initially b e addressed by supp orting a common

in a naming network, a directed graph with la- application library that accepts a reference to an

b eled edges shown in gure 2. This naming ob ject obtained from the meta-information ser-

network provides the fabric through which appli- vices, and automatically invokes the appropri-

cations navigate. Each no de in the graph repre- ate access metho d to retrieve the ob ject. Pros-

sents an ob ject. An ob ject can b e a le, a di- p ero currently uses this metho d. While not de-

rectory, b oth, or neither if neither, it only has scrib ed in this pap er, we are also working on a

attributes. Each ob ject has asso ciated with it a common data access proto col for information ser-

set of user extensible attributes. These attributes vices. That proto col will supp ort gateway ser-

can haveintrinsic meaning, such as the length of vices to existing data access proto cols.

a le, or they can b e application sp eci c.

IV. The Mo del

If an ob ject serves as a container for data,

then it is a le and has one or more access-

The information services available on the In-

method attributes asso ciated with it. Each

ternet eachhave their advantages. In order to

access-method attribute enco des the informa-

accommo date the needs of each, a mechanism to

tion needed to retrieve the data from a data

integrate these services must b e exible. How-

rep ository using a particular access metho d.

ever, to b e practical and ecient, such a mech-

anism must not b ecome b ogged down providing

If an ob ject is a directory it contains a set of

functionality that will only b e used o ccasionally.

named links to other ob jects. The lab eled edges

The solution to these comp eting goals is to pro-

in the naming network corresp ond to these links.

vide a simple but extensible framework on which

The names on the links serve as sign p osts to

the mechanisms sp eci c to individual systems can

help the user, or the user's application, navigate

b e built.

through the naming network in search of the de-

sired information. Applications that treat the These characteristics are present in the Pros-

naming network as a lesystem use the names p ero Directory Service [8]. Wechose to concen-

of the links as comp onents of lenames, allowing trate initially on integrating to ols for organizing

les in sub directories to b e named by the concate- information, applying Prosp ero as a common pro-

nation of the names of the links that are followed to col for access to meta-information from multi-

to reach them. ple information services. DGC-3

Prosp ero Pro c. INET '93 Neuman & Augart

When searching for information, users and query. These arguments include a string that is

applications also use attributes. In addition to matched against the lenames in the index. The

ob ject attributes, attributes can b e stored with result of a query is represented as the contents of

links. Such link attributes either store additional the Prosp ero directory. Up on receipt of the di-

information ab out the link such as annotations, rectory query, the archie server extracts the argu-

or they cache information ab out the ob ject to ments, p erforms a query on the archie database,

which the link refers. Cached attributes allowan and returns the results to the client. Each le

application to retrieve attributes for the ob jects matching the query is represented as a link in the

in a directory in a single query, rather than mak- directory to the le on the remote FTP site. The

ing individual requests for the attributes of each information needed to retrieve the matching le

ob ject referenced from the directory. is returned as part of the access-method at-

tribute asso ciated with the link. Other attributes

A lter can also b e asso ciated with a link. A

are also returned, including le size and the time

lter, and a second kind of link called a union

of last mo di cation. One can apply a Prosp ero

link, allow a view of a directory to b e created that

lter to a directory query to restrict the links

is a function of another view. Filters and union

that are returned to those for les on hosts in a

links provide a mechanism for users to enco de,

particular part of the network.

on a link, metho ds to b e applied when search-

ing. Because they are applied when a directory

Gopher [7] allows users to browse menus con-

is listed, the resulting view is kept up-to-date,

gured by Gopher service providers. By select-

even when the underlying information changes.

ing menu items, users are able to run applica-

Filters and union links are describ ed in greater

tions, search lo cal databases, select and display

detail elsewhere [9].

les and sub-menus, and connect to other ser-

vices available on the Internet. A Gopher menu

V. Implementation Notes

can b e represented as a Prosp ero directory. The

individual items in the menu corresp ond to links

Prosp ero was designed to b e simple but exten-

in the directory. Submenus are links to other

sible. The proto col is simple, supp orting stateless

directories. Links to les have an asso ciated

queries for attributes and the contents of directo-

access-method attribute that provides the in-

ries. The proto col is layered on top of a reliable

formation needed to retrieve the le. Attributes

delivery mechanism implemented using UDP.In

asso ciated with links also sp ecify how the data

the normal case, a query requires a single packet

in a le should b e presented, for example how

for the request and a single packet resp onse, elim-

to display images or play back an audio record-

inating the cost asso ciated with setting up and

ing. Links to Internet services reachable by tel-

breaking down a connection. This low cost is par-

net havean access-method attribute of typ e

ticularly imp ortant for clients that contact mul-

telnet; this access-method contains the in-

tiple servers in series as is often the case when

formation needed to telnet to the service.

resolving names.

The WAIS server [5] maintains a full text

Despite the simplicity of the proto col, Pros-

index to a set of do cuments on a single sys-

p ero is extremely exible. Functions sp eci c to

tem, allows that index to b e queried remotely,

an application rely on the extensible attribute

and allows remote read access to the do cuments.

mechanism. Applications not needing that func-

The WAIS client provides a common front end

tionality ignore the additional attributes, while

for queries and access to do cuments on multiple

applications that know ab out it can take action

WAIS servers. A WAIS query can b e represented

based on the value of the attributes.

as a Prosp ero directory in much the same man-

ner as an archie query. Like archie, the result

VI. Integrating Services

of the query would b e represented by the links

in the directory. Each of these links would b e a

This section describ es how the meta-

reference to a do cument on the server itself and

information maintained by several Internet in-

the access-method asso ciated with the link

formation services can b e represented using the

would sp ecify the WAIS access metho d. Other

mo del from section IV.

attributes of the link could include the relative

con dence in the match. To retrieve a matched Archie [2] constructs an index of les from In-

do cument the application would pass the selected ternet anonymous FTP sites and makes the in-

link to the Prosp ero library function pfs open dex available to applications using the Prosp ero

whichwould automatically invoke the WAIS ac- proto col. A query to the archie database corre-

cess metho d to retrieve the do cument from the sp onds to a Prosp ero directory query for a direc-

WAIS server. tory whose name includes the arguments of the DGC-4

Prosp ero Pro c. INET '93 Neuman & Augart

World Wide Web is a hyp ertext browser that VI I I. Status

supp orts the interconnection of collections of do c-

Prosp ero has b een available since December

uments with links that originate within the do cu-

1990. Recent revisions to the proto col allow more

ments themselves. Links to non-hyp ertext do cu-

exible integration of information services. Pros-

ments are also supp orted. A hyp ertext do cument

p ero has b een used since Spring of 1991 to make

may b e represented in Prosp ero as a no de that is

available meta-information maintained by archie.

b oth a le and a directory. The do cument that is

Work is underway to develop application inter-

displayed to the user would b e retrieved accord-

faces for Prosp ero that provide the functionality

ing to the access-method attribute asso ciated

of Gopher, WAIS, and World Wide Web. To nd

with the no de. The links in the directory would

out more ab out Prosp ero, or for directions on re-

represent links to other do cuments. The name of

trieving the latest distribution, send a message to

each link would contain the text of a tag within

info-prosp [email protected].

the do cument from which the link is to originate.

O sets in the target do cument can b e represented

IX. Summary

as attributes of the link.

Ahuge amount of information is available on

Prosp ero allows users to create their own di-

the Internet. Unfortunately, this information is

rectory hierarchies from which they can make

scattered across many services and di erent ap-

links to directories and ob jects of interest. Since

plications are needed to access the data in each.

queries to other services and the ob jects stored

A framework was de ned within which such in-

by them are represented as no des in the Pros-

formation services can interop erate. By provid-

p ero naming network, users can create directo-

ing a common metho d for applications to query

ries in which the linked ob jects reside in di erent

the meta-information maintained by the services

services, but can b e accessed using a common

that organize data on the Internet, information

proto col, thus supp orting the integration of in-

maintained by one service can used by all. Pros-

formation across services.

p ero provides suchaninterface. A service that

uses Prosp ero to exp ort meta-information ab out

VI I. Security

the data it provides will b e usable by many appli-

Security is an imp ortant feature that is miss-

cations. Similarly, an application that uses Pros-

ing from most information systems on the Inter-

p ero to nd les will b e able to access information

net. Without ne grained control for access to in-

from many services.

formation, the owners of certain information will

not makeitavailable using such systems. Since

Acknowledgments

our goal is to allow the integration of information

from all sources on the Internet, securitymust

Many individuals contributed to the design

b e an imp ortant consideration. Security is esp e-

and implementation of Prosp ero. Ed Lazowska,

cially imp ortant in our mo del since we supp ort

John Zahorjan, David Notkin, Hank Levy, and

the remote mo di cation of information.

Alfred Sp ector help ed re ne the ideas that ul-

timately led to the development of Prosp ero.

In Prosp ero, access control lists ACLs may

Kwynn Buess, Steve Cli e, , George

b e asso ciated with directories in the naming net-

Ferguson, Bill Griswold, Sanjay Joshi, Bren-

work, and with individual links within a direc-

dan Keho e, Dan King, and Prasad Upasani

tory. The p ermissions supp orted include read,

help ed with the implementation of Prosp ero and

mo dify, insert, delete, list, and administer. Pros-

Prosp ero-based applications. Gennady Medvin-

p ero requests are authenticated and the iden-

sky and Stuart Stubblebine commented on drafts

tity of the client is used to determine the access

of this pap er.

p ermissions that apply. Several authentication

metho ds are supp orted including authentication

References

based on the client's Internet address, the use of

passwords, and Version 5 of Kerb eros.

[1] Tim Berners-Lee, Rob ert Cailliau, Jean-

Francois Gro , and Bernd Pollermann.

Ho oks are also present to supp ort distributed

World-wide web: The information universe.

authorization and accounting mechanisms [10].

Electronic Networking: Research, Applica-

Supp ort for accounting will b ecome critical as

tions and Policy, 21, Spring 1992.

new for-hire information services are intro duced. DGC-5

Prosp ero Pro c. INET '93 Neuman & Augart

[2] Alan Emtage and Peter Deutsch. archie: An [11] Jon B. Postel and J. K. Reynolds. File trans-

electronic directory service for the Internet. fer proto col. DARPAInternet RFC 959, Oc-

In Proceedings of the Winter 1992 Usenix tob er 1985.

Conference, pages 93{110, January 1992.

[12] R. Sandb erg, D. Goldb erg, S. Kleiman,

[3] Darren R. Hardy. Scalable internet resource D. Walsh, and B. Lyon. Design and imple-

discovery among diverse information. Tech- mentation of the Sun Network File System.

nical Rep ort CU-CS-650-93, Departmentof In Proceedings of the Summer 1985 Usenix

Computer Science, University of Colorado, Conference, pages 119{130, June 1985.

Boulder, April 1993. M.S. Thesis.

[13] Michael F. Schwartz and P. G. Tsirigotis.

[4] John H. Howard, Michael L. Kazar, Exp erience with a semantically cognizant in-

Sherri G. Menees, David A. Nichols, ternet white pages directory to ol. Journal of

M. Satyanarayanan, Rob ert N. Sideb otham, Internetworking: Research and Experience,

and Michael J. West. Scale and p erformance 21:23{50, 1991.

in a distributed le system. ACM Trans-

Author Information

actions on Computer Systems, 61:51{81,

February 1988.

Cli ord Neuman is a scientist at the Informa-

tion Sciences Institute of the University of South-

[5] and Art Medlar. An in-

ern California. After receiving a Bachelor's de-

formation system for corp orate users: Wide

gree from the Massachusetts Institute of Technol-

area information systems. Technical Rep ort

ogy in 1985 he sp entayear working for Pro ject

TMC-199, Thinking Machines Corp oration,

Athena where he was one of the principal design-

April 1991.

ers of the Kerb eros authentication system. He

holds M.S. and Ph.D. degrees from the Univer-

[6] Rob ert E. Kahn and Vinton G. Cerf. The

sityofWashington, where he initially develop ed

Digital Library Pro ject; Volume 1: The

Prosp ero as part of his dissertation. His research

world of Knowb ots draft. Corp oration for

fo cuses on problems of system organization and

National Research Initiatives, 1988.

security in distributed systems.

[7] Mark McCahill. The Internet gopher:

Steven Seger Augart is memb er of the research

A distributed server information system.

sta at the Information Sciences Institute of the

ConneXions { The Interoperability Report,

University of Southern California. He received

67:10{14, July 1992.

a Bachelor's degree from Harvard Universityin

1989 and an M.S. from the University of Califor-

[8] B. Cli ord Neuman. Prosp ero: A to ol for or-

nia at Irvine in 1992, with various b outs work-

ganizing Internet resources. Electronic Net-

ing as a systems programmer along the way. His

working: Research, Applications and Policy,

work at ISI has fo cused on the developmentof

21:30{37, Spring 1992.

Prosp ero as a scalable information infrastructure

for large distributed systems.

[9] B. Cli ord Neuman. The Prosp ero File Sys-

tem: A global le system based on the Vir-

This researchwas supp orted in part by the National Sci-

ence Foundation Grant No. CCR-8619663, the Washington

tual System Mo del. Computing Systems,

Technology Centers, Digital Equipment Corp oration, and the

54:407{432, Fall 1992.

Advanced Research Pro jects Agency under NASA Co op era-

tive Agreement NCC-2-539. The views and conclusions con-

tained in this pap er are those of the authors and should not

[10] B. Cli ord Neuman. Proxy-based autho-

be interpreted as representing the ocial p olicies, either ex-

rization and accounting for distributed sys-

pressed or implied, of any of the funding agencies. Figures

and descriptions in this pap er were provided by the authors

tems. In Proceedings of the 13th Interna-

and are used with p ermission. The authors may b e reached

tional Conference on Distributed Computing

at USC/ISI, 4676 AdmiraltyWay, Marina del Rey, CA 90292-

6695, USA. Telephone +1 310 822-1511, email b [email protected],

Systems, pages 283{291, May 1993.

[email protected]. DGC-6