A Unified Peer-To-Peer Database Framework For

A UNIFIED PEER-TO-PEER DATABASE FRAMEWORK FOR XQUERIES OVER DYNAMIC DISTRIBUTED CONTENT AND ITS APPLICATION FOR SCALABLE SERVICE DISCOVERY

DISSERTATION

DIPL. ING. WOLFGANG HOSCHEK

SUBMITTED FOR THE PURPOSE OF OBTAINING THE ACADEMIC DEGREE OF DOCTOR OF TECHNICAL SCIENCES
AT THE TECHNISCHE UNIVERSITÄT WIEN, FAKULTÄT FÜR TECHNISCHE NATURWISSENSCHAFTEN UND INFORMATIK

CERN-THESIS-2002-050 //2002

UNDER THE SUPERVISION OF
O.UNIV.-PROF. DIPL.-ING. MAG. DR. GERTI KAPPEL
AO.UNIV.-PROF. DR. ERICH SCHIKUTA
CERN SUPERVISOR: DR. BERND PANZER-STEINDEL

GENEVA, MARCH 2002

Contents

1 Introduction
    1.1 Motivation
    1.2 Background
    1.3 Contribution and Organization of this Thesis
    1.4 Terminology
2 Service Discovery Processing Steps
    2.1 Introduction
    2.2 Description
    2.3 Presentation
    2.4 Publication
    2.5 Soft State Publication
    2.6 Request
    2.7 Discovery
    2.8 Brokering
    2.9 Execution
    2.10 Control
    2.11 Summary
3 A Data Model and Query Language for Discovery
    3.1 Introduction
    3.2 Database and Query Model
    3.3 Generic and Dynamic Data Model
    3.4 Query Examples and Types
    3.5 XQuery Language
    3.6 Related Work
    3.7 Summary
4 A Database for Discovery of Distributed Content
    4.1 Introduction
    4.2 Content Link and Content Provider
    4.3 Publication
    4.4 Query
    4.5 Caching
    4.6 Soft State
    4.7 Flexible Freshness
    4.8 Throttling
    4.9 Related Work
    4.10 Summary
5 The Web Service Discovery Architecture
    5.1 Introduction
    5.2 Interfaces
    5.3 Network Protocol Bindings
    5.4 Services
    5.5 Properties
    5.6 Comparison with Open Grid Services Architecture
    5.7 Summary
6 A Unified Peer-to-Peer Database Framework
    6.1 Introduction
    6.2 Agent P2P Model and Servent P2P Model
    6.3 Loop Detection
    6.4 Routed vs. Direct Response, Metadata Responses
    6.5 Query Processing
    6.6 Pipelining
    6.7 Static Loop Timeout and Dynamic Abort Timeout
    6.8 Query Scope
    6.9 Containers for Centralized Virtual Node Hosting
    6.10 Query Processing with Virtual Nodes
    6.11 Related Work
    6.12 Summary
7 A Unified Peer-to-Peer Database Protocol
    7.1 Introduction
    7.2 Originator and Node
    7.3 High-Level Messaging Model
    7.4 Concrete Messages
    7.5 Communication Model and Network Protocol
    7.6 Node State Table
    7.7 Related Work
    7.8 Summary
8 Conclusion
    8.1 Summary
    8.2 Directions for Future Research
9 Acknowledgements

Abstract

In a large distributed system spanning administrative domains such as a Grid, it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. The web services vision promises that programs are made more flexible and powerful by querying Internet databases (registries) at runtime in order to discover information and network-attached third-party building blocks. Services can advertise themselves and related metadata via such databases, enabling the assembly of distributed higher-level components. In support of this vision, this thesis shows how to support expressive general-purpose queries over a view that integrates autonomous dynamic database nodes from a wide range of distributed system topologies.

We motivate and justify the assertion that realistic ubiquitous service and resource discovery requires a rich general-purpose query language such as XQuery or SQL. Next, we introduce the Web Service Discovery Architecture (WSDA), which subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. WSDA specifies a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery.
These primitives cover service identification, service description retrieval, data publication as well as minimal and powerful query support. The individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies. Based on WSDA, we introduce the hyper registry, which is a centralized database node for discovery of dynamic distributed content, supporting XQueries over a tuple set from an XML data model. We address the problem of maintaining dynamic and timely information populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources.

However, in a large cross-organizational system, the set of information tuples is partitioned over many such distributed nodes, for reasons including autonomy, scalability, availability, performance and security. This suggests the use of Peer-to-Peer (P2P) query technology. Consequently, we take the first steps towards unifying the fields of database management systems and P2P computing. As a result, we propose the WSDA-based Unified Peer-to-Peer Database Framework (UPDF) and its associated Peer Database Protocol (PDP), which are unified in the sense that they allow specific applications to be expressed for a wide range of data types (typed or untyped XML, any MIME type), node topologies (e.g. ring, tree, graph), query languages (e.g. XQuery, SQL), query response modes (e.g. Routed, Direct and Referral Response), neighbor selection policies, pipelining, timeout and other scope characteristics.

The uniformity and wide applicability of our approach distinguish it from related work, which (1) addresses some but not all problems, and (2) does not propose a unified framework.

Zusammenfassung (Summary)

In a large distributed system spanning multiple organizations, such as a Grid, it is desirable to maintain and query dynamic and time-sensitive information about network services, resources and users. The web services concept promises flexible programs that use Internet databases (registries) at runtime to find information and third-party network services. Services can advertise themselves and related metadata through such databases, thereby enabling the assembly of higher-level distributed components. This dissertation supports this vision by showing how expressive general-purpose queries can be formulated over a view that integrates autonomous dynamic database nodes of arbitrary topologies.

We motivate and justify the assertion that finding resources and services requires a rich general-purpose query language such as XQuery or SQL. We introduce the Web Service Discovery Architecture (WSDA), which brings together disparate concepts, interfaces and network protocols under one quasi-transparent roof. WSDA specifies a small set of orthogonal multi-purpose functions (building blocks) for finding services. These cover service identification, service description, data publication, and minimal as well as powerful query support. Clients and servers can combine these functions to produce a wide range of behaviors and synergies. Based on WSDA, we introduce a centralized database for finding dynamic distributed data, the hyper registry. It supports XQueries over dynamic, time-sensitive data originating from unreliable, frequently changing, autonomous and heterogeneous data sources.

In a large distributed system spanning multiple organizations, however, the data tuples are distributed over many nodes, for example for reasons of autonomy, scalability, availability, efficiency and security. The use of Peer-to-Peer (P2P) query technology is therefore advisable. We thus take the first steps towards unifying database management systems and P2P computing. Based on WSDA, we propose the Unified Peer-to-Peer Database Framework (UPDF) and the corresponding Peer Database Protocol (PDP). Both are unified in the sense that, within their framework, specific applications can be formulated for a variety of data types, node topologies, query languages, response modes, and various forms of neighbor selection, pipelining and timeouts.

The uniformity, broad applicability and reusability of our approach distinguish it from related work, which (1) addresses individual but not all problems, and (2) does not introduce a unified framework.

Chapter 1
Introduction

1.1 Motivation

This thesis tackles the problems of information, resource and service discovery arising in large distributed Internet systems spanning multiple administrative domains. We show how to support expressive general-purpose queries over a view that integrates autonomous dynamic database nodes from a wide range of distributed system topologies. The work was carried out in the context of the European DataGrid project (EDG) [1, 2, 3] at CERN, the European Organization for Nuclear Research, and supported by the Austrian Ministerium für Wissenschaft, Bildung und Kultur. The international High Energy Physics (HEP) research community is facing a substantial challenge in joining a massive set of loosely coupled people and resources from multiple distributed organizations. Although this is the driving use case of the EDG project, this thesis distills and generalizes the essential properties of the discovery problem and then develops generic solutions that apply to a wide range of large distributed
