Proceedings of the FREENIX Track: 2003 USENIX Annual Technical Conference
USENIX Association

San Antonio, Texas, USA
June 9-14, 2003

THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

© 2003 by The USENIX Association. All Rights Reserved. For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

U-P2P: A Peer-to-Peer Framework for Universal Resource Sharing and Discovery

Neal Arthorne, Babak Esfandiari, Aloke Mukherjee
Department of Systems and Computer Engineering
Carleton University
Ottawa, Ontario, Canada
[email protected] [email protected] [email protected]

Abstract

We present U-P2P, an open source framework for developing, deploying and discovering file-sharing communities. We address the problem of search in peer-to-peer file sharing by allowing the end user to add metadata to shared documents. Each file-sharing community allows the sharing of a particular structured document type. Communities are themselves modeled as structured documents, thus enabling their sharing and discovery just like any other document. The creator of a particular community specifies, among other properties, the document type that it shares and the deployment model. U-P2P’s extensible architecture allows developers to create new properties or extend existing ones. For example, developers can provide new deployment models or custom privacy and authentication features. U-P2P makes use of other open source projects such as Jakarta Tomcat and eXist, an XML database system.

1 Introduction

The current success of peer-to-peer (P2P) file sharing applications has highlighted the benefits of resource distribution and redundancy. However, to truly exploit such advantages, a few roadblocks remain to be cleared. In particular, search and discovery of resources is still quite difficult. Most known approaches rely on simple schemes such as a search for the resource name or type. File sharing communities are most efficient when the name of the file carries most if not all of the needed information. As a result, most file swapping communities are restricted to swapping music and video files. Even when the exchange of non-music files is possible, the difficulty of finding such files has been a sufficient deterrent. Lack of metadata is the main problem.

Another roadblock to more general applications of P2P has been the difficulty of creating communities for specific purposes. This arises partially from the difficulty of defining custom metadata about different types of files. Another important problem is the current fractured state of peer-to-peer communities. Again, the lack of metadata about communities makes it difficult to know what there is to look for in the first place.

We propose a peer-to-peer framework called Universal Peer-to-Peer (U-P2P) that simplifies the sharing of custom metadata formats as well as the easy creation, configuration and discovery of peer-to-peer communities. In U-P2P, each community is described in part by the metadata of the files it exchanges. The format of a community’s metadata is specified using the XML Schema language, allowing new communities centered on that file type to be created in any text editor. U-P2P allows these custom resources to be created and shared as in traditional file-sharing services.

By logical extension, the description of the community itself (the metadata format of its files, the protocol used for search, etc.) is encapsulated in an XML file. In U-P2P, creating, sharing or discovering a community follows the same principles as creating, sharing or discovering a file within that community. As a result, file-swapping communities such as Napster or Kazaa can be seen as instances of U-P2P devoted to sharing a few specific file types and utilizing a given P2P protocol.

2 Related Work

Our focus is on the discovery problem in peer-to-peer systems. In Napster [1], the only files that could be shared on the network were MP3 audio files. Search was based on filenames and relied on users encoding the artist and title of each song in the MP3’s title. Although metadata such as encoding rate could be used to sort the results, there was no way to search on these metadata or define other parameters. On a Gnutella [2] network, any type of file can be shared; however, there is no explicit metadata handling. Search strings are passed around without processing between peers and their interpretation is left to the peer. Each peer must implement its own search algorithm using the search string as input. Most Gnutella implementations, like Napster, simply return filenames that contain the search string.

Gnutella does permit the design of overlay protocols to encode and decode metadata from search strings. Some Gnutella developers have proposed using this approach for richer metadata searches [3, 4]. Schemas are defined for common file types: for example, an audio file might be defined to have properties such as artist, title, bit rate, album, etc. The schema defines a structured format for searching MP3 metadata that is sent as a search string to other Gnutella nodes. Responding clients use the query to search local files annotated using the schema and return the results using the same structured format. In practice though, all members of a given community must be able to speak a common language in order to communicate.

Other P2P systems such as FastTrack [5], Opencola [6] or Bitzi [7] propose variations on that idea, but they are still limited to a number of predefined schemas. The latter two have the capability to extract metadata information from a file given the file format, but this is only possible for certain formats.

Clearly, there is a need for shared ontologies if we want to allow search for any type of file. In the Internet world, the XML Schema format [8] is the current choice to represent ontologies, replacing Document Type Definition (DTD) [9], the schema language defined in the original XML specification. XML Schema supports the creation of custom and complex data types for XML tags, which is essential to describe aggregated or composite resources. For richer semantic descriptions, for example to allow software agents to perform search instead of humans, there can be a need to describe relationships between resources. RDF [10] and more generally the Semantic Web [11] effort address that need. The Edutella project [12] is a technology for distributed learning that uses a P2P network for sharing RDF-formatted metadata. However, in Edutella, metadata is the resource, not the means to describe one.

A P2P system using a semantic layering approach is not without its drawbacks. First and foremost is getting users of the system to supply metadata for their resources. The addition of metadata requires user-friendly tools for authoring RDF and schemas and a simplified approach for users who are not familiar with XML languages.

What we propose in U-P2P starts, as in Edutella, with

3 Background: XML Schema

U-P2P relies heavily on XML Schema and the following sections include some examples of XML Schema and a brief description of the structure of XML Schema documents.

XML Schema is a Recommendation [8] by the World Wide Web Consortium for a schema language for XML. It constitutes a set of markup tags, themselves written in XML, that are used to describe both the structure of an XML document and the datatypes of individual elements and attributes. XML Schema uses the XML Namespace Recommendation that allows authors to prefix their own tags with a specific namespace. It also allows importing types from other schemas both in the current namespace and from other namespaces.

XML Schema can be used to validate XML documents using a schema processor. XML documents themselves can link to their schema using a schemaLocation attribute. However, this is only a suggested method for specifying a schema for the document, and documents may use other types of schemas such as DTDs or not require validation at all.

In XML Schema, there are four basic units that can be used to describe parts of an XML document: elements, attributes, simpleTypes and complexTypes.

Elements - Elements are the tags used in XML to label a piece of data. They have a datatype associated with them that is either defined at the same time as the element (anonymously) or is a complexType or simpleType. Elements can contain subelements and attributes, and can also be part of an XML Namespace. The structure of an element can be controlled in XML Schema using sequences, choices and the same restrictions used to derive simple and complex types. For example, an author can specify that an element called 'name' should consist of an exact sequence of the two elements 'firstname' and 'lastname'.

Attributes - Attributes are properties that belong to a parent element. They can only use simpleTypes and they cannot contain any elements or attributes. Attributes can be optional or mandatory within an element.

Simple Types - These types are either built-in data types specified in the XML Schema Recommendation or Simple Types derived from the built-in or existing types (XML Schema defines some derived types such as date and time). The Recommendation includes many useful data types, such as string and decimal, that can easily be manipulated into more specialized Simple Types.
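
The descriptions of Elements, Attributes and Simple Types can be illustrated with a small schema. The following is a minimal sketch, not taken from U-P2P: the type name shortString, its length limits and the title attribute are hypothetical and chosen only for illustration. It shows a 'name' element that must contain exactly 'firstname' followed by 'lastname', an optional attribute, and a Simple Type derived from the built-in string type by restriction.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema for illustration only; not part of U-P2P. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Simple Type derived from the built-in xs:string by restriction.
       The name and the length limits are assumptions. -->
  <xs:simpleType name="shortString">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Element whose datatype is defined anonymously, at the same time
       as the element itself. -->
  <xs:element name="name">
    <xs:complexType>
      <!-- Exact sequence: one 'firstname', then one 'lastname'. -->
      <xs:sequence>
        <xs:element name="firstname" type="shortString"/>
        <xs:element name="lastname" type="shortString"/>
      </xs:sequence>
      <!-- Attributes can only use simple types; this one is optional. -->
      <xs:attribute name="title" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under these assumptions, a schema processor would be expected to accept an instance such as the one below, and to reject one in which 'lastname' precedes 'firstname' or either subelement is missing.

```xml
<name title="Dr."><firstname>Ada</firstname><lastname>Lovelace</lastname></name>
```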