Prospero    Proc. INET '93    Neuman & Augart

Prospero: A Base for Building Information Infrastructure

B. Clifford Neuman    Steven Seger Augart
Information Sciences Institute
University of Southern California

Abstract

The recent introduction of new network information services has brought with it the need for an information architecture to integrate information from diverse sources. This paper describes how Prospero provides a framework within which such services can be interconnected. The functions of several existing information storage and retrieval tools are described and we show how they fit the framework. Prospero has been used since 1991 by the archie service and work is underway to develop application interfaces similar to those provided by other popular information tools.

I. Introduction

The past several years has brought the introduction of a large number of information services to the Internet. These services collectively provide huge stores of information, if only one knows what to ask for, and where to look. Unfortunately, knowing what and where to ask is a major problem; available information is scattered across many services, and different applications are needed to access the data in each.

While there have been many attempts to create gateways between these services, such gateways have not been as useful as needed. Gateways have typically taken one of two forms. The first is a portal between two services: one service is used to find an instance of a second service, to which the user is then connected, leaving the user to figure out how to query the second service. The second form of gateway translates queries from one service into those of another, and returns the result of the query in a form that a user of the first service can understand. The second kind of gateway is easier for the user, but it is still limited since, as typically implemented, such gateways do not allow complete integration of the two information spaces.

This paper shows how the Prospero Directory Service provides a framework for interconnecting information services. The paper begins by dividing the functions of information systems into four categories: storage, retrieval, organization, and search. The role of existing services in this taxonomy is described, and the role of Prospero in interconnecting existing information services using this taxonomy is presented.

II. The Four Functions

The services of existing Internet information retrieval tools can be broken into four functions: storage, access, search, and organization.

The storage function is the maintenance of the data that may subsequently be provided to and interpreted by remote applications. File systems support storage, providing a repository where data may be stored and subsequently retrieved. Document servers including the Wide Area Information Service (WAIS) [5], menu servers including Gopher [7], and hypertext servers including World Wide Web [1] also provide the storage function since they manage documents that are subsequently retrieved and displayed by their clients.

Whereas storage is primarily a function of the server, access involves both the client and the server. The access function is the method by which the client reads and possibly writes the data stored on a server. The access method is typically defined by network protocols including the File Transfer Protocol (FTP) [11], Sun's Network File System (NFS) [12], the Andrew File System (AFS) [4], and the application specific protocols used by WAIS, Gopher, and World Wide Web. The client and the server must use the same protocol.

The search function involves the iterative or recursive retrieval and analysis of information about data (meta-information), possibly across multiple repositories, with a specific goal of identifying or locating data that satisfies the search criteria. A search heuristic defines the method and strategy used to conduct the search. Examples of search tools include Knowbots [6] and NetFind [13]. Though searches are initiated by a client when the need for specific data is realized, parts of the search may execute remotely. For example, the lookup of an entry in a remote index or directory on a WAIS, Gopher, WWW, or Prospero server constitutes a search. The Dynamic WAIS interface to NetFind [3] takes this a step further; the server initiates and manages, in real-time, a search across multiple hosts on behalf of the client.

Figure 1: A framework for network information services

Brute force search in a system as large as the Internet is not practical. To be effective, search heuristics must be applied. These heuristics rely on meta-information describing the data that is available. The more structured this meta-information, the more useful it is in directing searches. The organization function involves the collection, maintenance, and structuring of such meta-information. Examples of organization mechanisms include directories in Prospero, menus in Gopher, links in hypertext documents, indices of data local to a WAIS server, and the file name index maintained by archie [2] for files scattered across the Internet.

The difference between search and organization is that a search is initiated when specific data is needed, whereas data is organized in advance to support subsequent searches. Search mechanisms play a role in the organization of data when the results of possibly complex or expensive searches are recorded for subsequent use by applications.

III. A Flexible Framework

Most information services presently available on the Internet provide their own mechanisms for each of the four functions, whereas their novel features are usually confined to their user interfaces and/or at most one of the four functions. As a result, gateways are required to allow applications from one service to use information maintained for another.

A well defined interface is needed that will allow information maintained by one service to be used by another. This interface would appear between search and organization, providing a common method for applications to query the meta-information maintained by other services. A similar interface would separate storage and access, providing access by all applications to data maintained by different services.

Figure 1 shows the location of existing services in such a framework. Meta-information available from archie, WAIS, World Wide Web, Prospero, and Gopher menus are available through a common query mechanism. Menu browsers, file system tools, hypertext browsers, and dedicated applications such as the command line archie client and NetFind use this mechanism to query services. Update of meta-information is also supported, providing a common interface between hypertext and menu editors and repositories. Search engines can use the query mechanism to identify objects of interest. They can then record the result of the search for subsequent use by using the update mechanism to add new links to the information fabric. Section IV discusses how Prospero provides this functionality.

The separation of storage and access is a bit more complicated. There are already numerous access methods in use, and many of the files of interest in the Internet exist on servers that will only support a single protocol. The problem can initially be addressed by supporting a common application library that accepts a reference to an object obtained from the meta-information services, and automatically invokes the appropriate access method to retrieve the object. Prospero currently uses this method. While not described in this paper, we are also working on a common data access protocol for information services. That protocol will support gateway services to existing data access protocols.

IV. The Model

The information services available on the Internet each have their advantages. In order to accommodate the needs of each, a mechanism to integrate these services must be flexible. However, to be practical and efficient, such a mechanism must not become bogged down providing functionality that will only be used occasionally. The solution to these competing goals is to pro-

Figure 2: The Prospero naming network

Meta-information is represented by Prospero in a naming network, a directed graph with labeled edges shown in figure 2. This naming network provides the fabric through which applications navigate. Each node in the graph represents an object. An object can be a file, a directory, both, or neither (if neither, it only has attributes). Each object has associated with it a set of user extensible attributes. These attributes can have intrinsic meaning, such as the length of a file, or they can be application specific.

If an object serves as a container for data, then it is a file and has one or more access-method attributes associated with it. Each access-method attribute encodes the information needed to retrieve the data from a data repository using a particular access method.

If an object is a directory it contains a set of named links to other objects. The labeled edges in the naming network correspond to these links.
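
As an illustration only, the naming network described above might be modeled as follows. This is a minimal sketch and not the Prospero implementation or its API: the names `Node`, `AccessMethod`, `resolve`, and `retrieve`, the use of "/" to separate link names, the example hosts and paths, and the stand-in handler that returns a string instead of performing a transfer are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AccessMethod:
    """One access-method attribute: what is needed to fetch the data."""
    protocol: str  # label such as "FTP" or "NFS" (chosen for this sketch)
    host: str
    path: str


@dataclass
class Node:
    """An object in the naming network (hypothetical representation)."""
    attributes: Dict[str, str] = field(default_factory=dict)
    # A non-empty list of access methods makes the object a file.
    access_methods: List[AccessMethod] = field(default_factory=list)
    # A links table (even an empty one) makes the object a directory.
    links: Optional[Dict[str, "Node"]] = None

    @property
    def is_file(self) -> bool:
        return bool(self.access_methods)

    @property
    def is_directory(self) -> bool:
        return self.links is not None


def resolve(root: Node, name: str) -> Node:
    """Follow labeled edges from root, one '/'-separated label at a time."""
    node = root
    for label in filter(None, name.split("/")):
        if not node.is_directory or label not in node.links:
            raise KeyError(f"no link named {label!r}")
        node = node.links[label]
    return node


def retrieve(node: Node,
             handlers: Dict[str, Callable[[AccessMethod], str]]) -> str:
    """Invoke the first access method for which the client has a handler."""
    for method in node.access_methods:
        handler = handlers.get(method.protocol)
        if handler is not None:
            return handler(method)
    raise LookupError("no supported access method for this object")


# A file reachable by two access methods, with an attribute of its own.
paper = Node(
    attributes={"length": "1020"},
    access_methods=[
        AccessMethod("NFS", "fs.example.org", "/pub/p.ps"),
        AccessMethod("FTP", "ftp.example.org", "/pub/p.ps"),
    ],
)
# An object that is neither file nor directory: it only has attributes.
note = Node(attributes={"comment": "attributes only"})
root = Node(links={"papers": Node(links={"prospero": paper}), "note": note})

# Stand-in handler: a real client would perform the transfer here.
handlers = {"FTP": lambda m: f"ftp://{m.host}{m.path}"}

found = resolve(root, "papers/prospero")
print(retrieve(found, handlers))  # prints ftp://ftp.example.org/pub/p.ps
```

In this sketch the client has no NFS handler, so `retrieve` skips the first access-method attribute and uses the FTP one. That mirrors the requirement that client and server use the same protocol, while letting one object be reachable through several protocols.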