CERN-THESIS-2002-050 // 2002

A UNIFIED PEER-TO-PEER DATABASE FRAMEWORK FOR XQUERIES OVER DYNAMIC DISTRIBUTED CONTENT AND ITS APPLICATION FOR SCALABLE SERVICE DISCOVERY

Dissertation submitted to the Technische Universität Wien, Fakultät für Technische Naturwissenschaften und Informatik, for the academic degree of Doktor der technischen Wissenschaften, carried out under the guidance of O.Univ.-Prof. Dipl.-Ing. Mag. Dr. Gerti Kappel and Ao.Univ.-Prof. Dr. Erich Schikuta. Supervisor: Dr. Bernd Panzer-Steindel, CERN.

Dipl.-Ing. Wolfgang Hoschek
Geneva, March 2002

Contents

1 Introduction
  1.1 Motivation
  1.2 Background
  1.3 Contribution and Organization of this Thesis
  1.4 Terminology

2 Service Discovery Processing Steps
  2.1 Introduction
  2.2 Description
  2.3 Presentation
  2.4 Publication
  2.5 Soft State Publication
  2.6 Request
  2.7 Discovery
  2.8 Brokering
  2.9 Execution
  2.10 Control
  2.11 Summary

3 A Data Model and Query Language for Discovery
  3.1 Introduction
  3.2 Database and Query Model
  3.3 Generic and Dynamic Data Model
  3.4 Query Examples and Types
  3.5 XQuery Language
  3.6 Related Work
  3.7 Summary

4 A Database for Discovery of Distributed Content
  4.1 Introduction
  4.2 Content Link and Content Provider
  4.3 Publication
  4.4 Query
  4.5 Caching
  4.6 Soft State
  4.7 Flexible Freshness
  4.8 Throttling
  4.9 Related Work
  4.10 Summary

5 The Web Service Discovery Architecture
  5.1 Introduction
  5.2 Interfaces
  5.3 Network Protocol Bindings
  5.4 Services
  5.5 Properties
  5.6 Comparison with Open Grid Services Architecture
  5.7 Summary

6 A Unified Peer-to-Peer Database Framework
  6.1 Introduction
  6.2 Agent P2P Model and Servent P2P Model
  6.3 Loop Detection
  6.4 Routed vs. Direct Response, Metadata Responses
  6.5 Query Processing
  6.6 Pipelining
  6.7 Static Loop Timeout and Dynamic Abort Timeout
  6.8 Query Scope
  6.9 Containers for Centralized Virtual Node Hosting
  6.10 Query Processing with Virtual Nodes
  6.11 Related Work
  6.12 Summary

7 A Unified Peer-to-Peer Database Protocol
  7.1 Introduction
  7.2 Originator and Node
  7.3 High-Level Messaging Model
  7.4 Concrete Messages
  7.5 Communication Model and Network Protocol
  7.6 Node State Table
  7.7 Related Work
  7.8 Summary

8 Conclusion
  8.1 Summary
  8.2 Directions for Future Research

9 Acknowledgements

Abstract

In a large distributed system spanning administrative domains such as a Grid, it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. The web services vision promises that programs are made more flexible and powerful by querying Internet databases (registries) at runtime in order to discover information and network attached third-party building blocks. Services can advertise themselves and related metadata via such databases, enabling the assembly of distributed higher-level components. In support of this vision, this thesis shows how to support expressive general-purpose queries over a view that integrates autonomous dynamic database nodes from a wide range of distributed system topologies. We motivate and justify the assertion that realistic ubiquitous service and resource discovery requires a rich general-purpose query language such as XQuery or SQL. Next, we introduce the Web Service Discovery Architecture (WSDA), which subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. WSDA specifies a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery. These primitives cover service identification, service description retrieval, data publication as well as minimal and powerful query support. The individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies. Based on WSDA, we introduce the hyper registry, which is a centralized database node for discovery of dynamic distributed content, supporting XQueries over a tuple set from an XML data model. We address the problem of maintaining dynamic and timely information populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources.
However, in a large cross-organizational system, the set of information tuples is partitioned over many such distributed nodes, for reasons including autonomy, scalability, availability, performance and security. This suggests the use of Peer-to-Peer (P2P) query technology. Consequently, we take the first steps towards unifying the fields of database management systems and P2P computing. As a result, we propose the WSDA based Unified Peer-to-Peer Database Framework (UPDF) and its associated Peer Database Protocol (PDP), which are unified in the sense that they allow specific applications to be expressed for a wide range of data types (typed or untyped XML, any MIME type), node topologies (e.g. ring, tree, graph), query languages (e.g. XQuery, SQL), query response modes (e.g. Routed, Direct and Referral Response), neighbor selection policies, pipelining, timeout and other scope characteristics. The uniformity and wide applicability of our approach distinguish it from related work, which (1) addresses some but not all problems, and (2) does not propose a unified framework.

Zusammenfassung

In a large distributed system spanning multiple organizations, such as a Grid, it is desirable to maintain and query dynamic and time-sensitive information about network services, resources and users. The concept of web services promises flexible programs that use Internet databases (registries) at runtime in order to find information and network services of third parties. Services can advertise themselves and related metadata through such databases, thereby enabling the assembly of higher-level distributed components. This dissertation supports this vision by showing how expressive general-purpose queries can be formulated over a view that integrates autonomous dynamic database nodes of arbitrary topologies. We motivate and justify the claim that discovering resources and services requires a rich general-purpose query language such as XQuery or SQL. We introduce the so-called Web Service Discovery Architecture (WSDA), which subsumes disparate concepts, interfaces and network protocols under a quasi-transparent umbrella. WSDA specifies a small set of orthogonal multi-purpose functions (building blocks) for the discovery of services. These cover service identification, service description, data publication as well as minimal and powerful query support. Clients and servers can combine these functions such that a broad range of behaviors and synergies emerges. Based on WSDA, we introduce a centralized database for the discovery of dynamic distributed data, the hyper registry. It supports XQueries over dynamic, time-sensitive data originating from unreliable, frequently changing, autonomous and heterogeneous data sources. In a large distributed system spanning multiple organizations, however, the data tuples are partitioned over many nodes, for example for reasons of autonomy, scalability, availability, efficiency and security. The use of Peer-to-Peer (P2P) query technology therefore suggests itself. Thus we take the first steps towards unifying database management systems and P2P computing. Based on WSDA, we propose the Unified Peer-to-Peer Database Framework (UPDF) and the corresponding Peer Database Protocol (PDP). Both are unified in the sense that, within their framework, specific applications can be formulated for a variety of data types, node topologies, query languages, response modes, and various forms of neighbor selection, pipelining and timeouts. The uniformity, broad applicability and reusability of our approach distinguish it from related work, which (1) addresses individual but not all problems, and (2) introduces no unified framework.

Chapter 1

Introduction

1.1 Motivation

This thesis tackles the problems of information, resource and service discovery arising in large distributed Internet systems spanning multiple administrative domains. We show how to support expressive general-purpose queries over a view that integrates autonomous dynamic database nodes from a wide range of distributed system topologies. The work was carried out in the context of the European DataGrid project (EDG) [1, 2, 3] at CERN, the European Organization for Nuclear Research, and supported by the Austrian Ministerium für Wissenschaft, Bildung und Kultur. The international High Energy Physics (HEP) research community is facing a substantial challenge in joining a massive set of loosely coupled people and resources from multiple distributed organizations. Although this is the driving use case of the EDG project, this thesis distills and generalizes the essential properties of the discovery problem and then develops generic solutions that apply to a wide range of large distributed Internet systems. However, before we do so, it is helpful to introduce as an example the scenario of a HEP DataGrid Collaboration. The HEP community comprises thousands of researchers globally spread over hundreds of laboratories and university departments. These include CERN (International); Stanford, Berkeley, Caltech, Fermilab (USA); INFN (Italy); RAL (UK); IN2P3 (France), and more organizations from some 80 countries such as Austria, Brazil, China, Germany, Greece, India, Japan, Pakistan, Mexico, Portugal, Russia, Spain, Turkey, etc. What is unique is that these dispersed researchers jointly undertake so-called physics experiments, which are mammoth projects with a lifetime of years or sometimes decades. These experiments help to study the fundamental forces of nature, such as the origin of gravitation. To conduct an experiment, machines called particle accelerators and detectors are constructed.
An accelerator typically consists of a several kilometer-long circular underground vacuum tube surrounded by super-conducting magnets. Subatomic particles such as protons and electrons are generated, focused into beams and injected into the accelerator. Typically, two slightly offset beams concurrently travel the tube in opposite directions. Large oscillating magnets pull and push the particles, accelerating them to near light speed. The circling beams are steered into a particle detector where they are made to collide head on, causing the quantum mechanical equivalent of a car crash where opposite highway lanes are directed into each other (Figure 1.1). For a brief moment in time, very high energies are concentrated in a small space, forming the conditions under which strange things happen, including cascading creation and decay of particles and their byproducts, and the birth of exotic particles believed to have existed only shortly after the Big Bang, the creation of the universe.

Figure 1.1: Particle Detector (top left), Accelerator (top right) and Collision (bottom).

In effect, detectors are the equivalent of cameras positioned to the left and right of the highway, collecting incomplete car crash evidence from which, with good detective work and a bit of luck, one can infer just what has happened. Data acquisition systems record the traces of collision events and transform them into digital data. The collision data continually arrives at rates around 100-1000 MB/sec. It is streamed via networks into a nearby computer center, where it is pre-filtered in real time in so-called event filter farms, which are large dedicated computer clusters with high-bandwidth, low-latency interconnects. The pre-filtered data is then stored in robotic tape libraries and disk arrays for later analysis. Physics data analysis is a complex, tedious and slow iterative process, carried out in several stages, during each of which Gigabytes, Terabytes and soon Petabytes of data are filtered, transformed and distilled. Analysis continues until, eventually, and with luck, a plot visualizes a bizarre and currently unexplainable physics phenomenon, based on which a Nobel Prize may or may not be granted some 20 years later.

A massive set of computing resources is necessary for CERN's next generation Large Hadron Collider (LHC) project, including thousands of network services, tens of thousands of CPUs, WAN Gigabit networking as well as Petabytes of disk and tape storage [4]. The cost of the infrastructure necessary to support HEP experiments and their data analysis is beyond the financial capabilities of individual universities and even states. Hence, in the past, resources have been concentrated at a handful of laboratories, primarily in Switzerland, USA, Japan and Germany. CERN, the European Organization for Nuclear Research, straddling the Franco-Swiss border near Geneva, is the largest such laboratory. Some 3000 staff and 6500 scientists from 500 universities, half of the world's particle physicists, use CERN's facilities (Figure 1.2). Its annual budget of 600 million US-$ is financed by 20 member states and some 30 associate states.

Figure 1.2: CERN's Users in the World.

The LHC project requires unprecedented computing resources, and involves substantially more remote researchers and institutions than any prior project. To make their collaboration viable, it was decided to share in a global joint effort the data and locally available resources of all participating laboratories and university departments. Because the technology necessary to achieve this goal was only partially available at the time, the European DataGrid project (EDG) was born. Relevant parts of the detector data are distributed from CERN to associate laboratories, organized into a multi-tier architecture sketched in Figure 1.3.

The fundamental value proposition of computer systems has long been their potential to automate well-defined repetitive tasks. With the advent of distributed computing, and of the Internet and web technologies in particular, the focus has broadened. Increasingly, computer systems are seen as enabling tools for effective long distance communication and collaboration.
Colleagues (and programs) with shared interests can better work together, with less regard to the physical location of themselves and the required devices and machinery. The traditional departmental team is complemented by cross-organizational virtual teams, operating in an open, transparent manner. Such teams have been termed virtual organizations [5].

Figure 1.3: Multi-Tier Architecture of European DataGrid.

This opportunity to further extend knowledge appears natural to science communities, since they have a deep tradition of drawing their strength from stimulating partnerships across administrative boundaries. Grid technology attempts to support flexible, secure, coordinated information sharing among dynamic collections of individuals, institutions and resources. This includes data sharing but also access to computers, software and devices required by computation and data-rich collaborative problem solving [5]. These and other advances in distributed computing increasingly make it possible to join loosely coupled people and resources from multiple organizations. As a result, many entities can now collaborate to enable data analysis of large HEP experimental data: the HEP user community and its multitude of institutions, storage providers, as well as network, application and cycle providers. Users utilize the services of a set of remote application providers to submit jobs, which in turn are executed by the services of cycle providers, using storage and network provider services for I/O. The services necessary to execute a given task often do not reside in the same administrative domain. Collaborations may have a rather static configuration, or they may be more dynamic and fluid, with users and service providers joining and leaving frequently, and configurations as well as usage policies often changing.

1.2 Background

Component oriented software development has advanced to a state where a large fraction of the functionality required for typical applications is available through third party libraries, frameworks and tools. These components are often reliable, well documented and maintained, and designed with the intention to be reused and customized. For many software developers

the key skill is no longer hard core programming, but rather the ability to find, assess and integrate building blocks from a large variety of third parties. The software industry has steadily moved towards more software execution flexibility. For example, dynamic linking allows for easier customization and upgrade of applications than static linking. Modern programming languages such as Java use an even more flexible link model that delays linking until the last possible moment (the time of method invocation). Still, most software expects to link and run against third party functionality installed on the local computer executing the program. For example, a word processor is locally installed together with all its internal building blocks such as spell checker, translator, thesaurus and modules for import and export of various data formats. The network is not an integral part of the software execution model, whereas the local disk and operating system certainly are.

The maturing of Internet technologies has brought increased ease-of-use and abstraction through higher-level protocol stacks, improved APIs, more modular and reusable server frameworks and correspondingly powerful tools. The way is now paved for the next step towards increased software execution flexibility. In this scenario, some components are network-attached and made available in the form of network services for use by the general public, collaborators or commercial customers. Internet Service Providers (ISPs) offer to run and maintain reliable services on behalf of clients through hosting environments. Rather than invoking functions of a local library, the application now invokes functions on remote components, in the ideal case to the same effect.

Remote invocation is always necessary for some demanding applications that cannot (exclusively) be run locally on the computer of a user because they depend on a set of resources scattered over multiple remote domains.
Examples include computationally demanding gene sequencing, business forecasting, climate change simulation and astronomical sky surveying, as well as data-intensive High Energy Physics analysis sweeping over Terabytes of data. Such applications can reasonably only be run on a remote supercomputer or several large computing clusters with massive CPU, network, disk and tape capacities, as well as an appropriate software environment matching minimum standards.

The most straightforward but also most inflexible configuration approach is to hard wire the location, interface, behavior and other properties of remote services into the local application. Loosely coupled decentralized systems call for solutions that are more flexible and can seamlessly adapt to changing conditions. For example, if a user turns out to be less than happy with the perceived quality of a word processor's remote spell checker, he/she may want to plug in another spell checker. Such dynamic plug-ability may become feasible if service implementations adhere to some common interfaces and network protocols, and if it is possible to match services against an interface and network protocol specification. An interesting question then is: What infrastructure is necessary to enable a program to search the Internet for alternative but similar services and dynamically substitute these?

Consequently, the next step towards increased execution flexibility is the (still immature and hence often hyped) web services vision [6, 7] of distributed computing, in which programs are no longer configured with static information. Rather, the promise is that programs are made more flexible and powerful by querying Internet databases (registries) at runtime in order to discover information and network attached third-party building blocks. Services can advertise themselves and related metadata via such databases, enabling the assembly of

distributed higher-level components. A natural question arises: how precisely can a local application discover relevant remote services?

For example, a data-intensive High Energy Physics analysis application looks for remote services that exhibit a suitable combination of characteristics, including network load, available disk quota, security options, access rights, and perhaps Quality of Service and monetary cost. What is more, it is often necessary to use several services in combination to implement the operations of a request. For example, a request may involve the combined use of a file transfer service (to stage input and output data from remote sites), a replica catalog service (to locate an input file replica with good data locality), a request execution service (to run the analysis program) and finally again a file transfer service (to stage output data back to the user desktop). In such cases it is often helpful to consider correlations. For example, a scheduler for data-intensive requests may look for input file replica locations with a fast network path to the execution service where the request would consume the input data. If a request involves reading large amounts of input data, it may be a poor choice to use a host for execution that has poor data locality with respect to an input data source, even if it is very lightly loaded. How can one find a set of correlated services fitting a complex pattern of requirements and preferences?

If one instance of a service can be made available, a natural next step is to have more than one identical distributed instance, for example to improve availability and performance. Changing conditions in distributed systems include latency, bandwidth, availability, location, access rights, monetary cost and personal preferences. For example, adaptive users or programs may want to choose a particular instance of a content download service depending on estimated download bandwidth.
If bandwidth is degraded in the middle of a download, a user may want to switch transparently to another download service and continue where he/she left off. On what basis could one discriminate between several instances of the same service?

In a large heterogeneous distributed system spanning multiple administrative domains, it is desirable to maintain and query dynamic and timely information about the active participants such as services, resources and user communities. Examples are a (worldwide) service discovery infrastructure for a DataGrid, the Domain Name System (DNS), the email infrastructure, the World Wide Web, a monitoring infrastructure or an instant news service. However, the set of information tuples in the universe is partitioned over one or more database nodes from a wide range of system topologies, for reasons including autonomy, scalability, availability, performance and security. The goal is to exploit several independent information sources as if they were a single source. This enables queries for information, resource and service discovery and collective collaborative functionality that operate on the system as a whole, rather than on a given part of it. For example, it allows a search for descriptions of services of a file sharing system, to determine its total download capacity, the names of all participating organizations, etc.

However, in such large distributed systems it is hard to keep track of metadata describing participants such as services, resources, user communities and data sources. Predictable, timely, consistent and reliable global state maintenance is infeasible. The information to be aggregated and integrated may be outdated, inconsistent, or not available at all. Failure, misbehavior, security restrictions and continuous change are the norm rather than the

exception. Consider an instant news service that aggregates news from a large variety of autonomous remote data sources residing within multiple administrative domains. New data sources are being integrated frequently and obsolete ones are dropped. One cannot force control over multiple administrative domains. Reconfiguration or physical moving of a data source is the norm rather than the exception. The question then is: How can one keep track of and query the metadata describing the participants of large cross-organizational distributed systems undergoing frequent change?
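The contrast between static configuration and runtime discovery that runs through this section can be made concrete with a small sketch. The following Python fragment is purely illustrative: the registry contents, field names and the `discover` function are hypothetical, not part of any interface introduced in this thesis.

```python
# A hard-wired endpoint must be edited and redeployed whenever the
# service moves or degrades -- the inflexible configuration approach.
HARD_WIRED_SPELL_CHECKER = "http://host-a.example.org/spell"

# A registry instead holds metadata tuples describing currently active
# services; clients query it at runtime (all entries are hypothetical).
registry = [
    {"type": "spellChecker", "url": "http://host-a.example.org/spell", "load": 0.9},
    {"type": "spellChecker", "url": "http://host-b.example.org/spell", "load": 0.2},
    {"type": "thesaurus",    "url": "http://host-c.example.org/thes",  "load": 0.1},
]

def discover(registry, service_type, max_load):
    """Return URLs of matching service instances, least loaded first."""
    matches = [s for s in registry
               if s["type"] == service_type and s["load"] <= max_load]
    return [s["url"] for s in sorted(matches, key=lambda s: s["load"])]

# The client binds at runtime to whichever instance currently fits best,
# and can re-run the query to substitute another instance later.
best = discover(registry, "spellChecker", max_load=0.5)
```

A real discovery query would of course match on far richer metadata (interfaces, protocols, access rights, Quality of Service); the point here is only that the binding decision moves from deployment time to query time.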

1.3 Contribution and Organization of this Thesis

This thesis addresses the following open problems:

• In a distributed system, it is desirable to maintain dynamic and timely information about active participants such as services, resources and user communities. The web service vision promises to make programs more flexible and powerful by consulting Internet databases (registries) at runtime in order to discover information and network attached third-party building blocks. While important advances have recently been made in the field of web service specification [8], invocation [9] and registration [10], the problem of how to use a rich and expressive general-purpose query language to discover services that offer functionality matching a detailed specification has so far received little attention.

• In a large distributed system spanning many administrative domains, the set of information tuples in the universe is partitioned over one or more database nodes from a wide range of system topologies, for reasons including autonomy, scalability, availability, performance and security. Each node is populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources. Under such conditions, predictable, timely, consistent and reliable global state maintenance is infeasible. The problem of how to support expressive general-purpose discovery queries over a view that integrates autonomous dynamic database nodes from a wide range of distributed system topologies has so far not been addressed.

This thesis is organized into chapters as follows:

Chapter 2. The most straightforward but also most inflexible configuration approach for invocation of remote services is to hard wire the location, interface, behavior and other properties of remote services into the local application. Loosely coupled decentralized systems call for solutions that are more flexible and can seamlessly adapt to changing conditions. A key question then is:

• What distinct problem areas and processing steps can be distinguished in order to enable flexible remote invocation in the context of service discovery?

Chapter 2 – Contribution. To establish the context, we outline eight problem areas and their associated processing steps, namely description, presentation, publication, request, discovery, brokering, execution and control. We propose a simple grammar (SWSDL) for describing network services as collections of service interfaces capable of executing operations over network protocols to endpoints. The grammar is intended to be used in the high-level architecture and design phase of a software project. A service must present its current description so that clients from anywhere can retrieve it at any time. For broad acceptance, adoption and easy integration of legacy services, an HTTP hyperlink is chosen as an identifier and retrieval mechanism (service link). A registry for publication and query of service and resource presence information is outlined. Reliable, predictable and simple distributed registry state maintenance in the presence of service failure, misbehavior or change is addressed by a simple and effective soft state mechanism. The notions of request, resource and operation are clarified. We outline the discovery step, which finds services implementing the operations required by a request. The brokering step determines an invocation schedule, which is a mapping over time of unbound operations to service operation invocations using given resources. The execution step implements a schedule. It uses the supported protocols to invoke operations on remote services. We discuss how one can reliably support monitoring and controlling the lifecycle of a request in the presence of a service that cannot reliably complete a request within a short and well-known expected timeframe.
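The soft state mechanism mentioned above can be illustrated with a toy model. The class below is a hypothetical sketch, not the registry interface specified in Chapter 2: each publication is a lease that expires unless refreshed, so entries of failed or departed services disappear without any explicit deregistration protocol.

```python
class SoftStateRegistry:
    """Toy soft-state registry: publications are leases with a TTL.

    Time is an explicit parameter to keep the sketch deterministic;
    a real registry would use the wall clock.
    """

    def __init__(self):
        self._leases = {}  # service link -> lease expiration time

    def publish(self, service_link, now, ttl):
        """(Re)publish a service; republishing simply extends the lease."""
        self._leases[service_link] = now + ttl

    def expire(self, now):
        """Drop every entry whose lease has run out."""
        self._leases = {k: t for k, t in self._leases.items() if t > now}

    def query(self, now):
        """Return the service links currently considered alive."""
        self.expire(now)
        return sorted(self._leases)

reg = SoftStateRegistry()
reg.publish("http://cms.example.org/scheduler", now=0, ttl=30)
reg.publish("http://atlas.example.org/replica-catalog", now=0, ttl=30)
reg.publish("http://cms.example.org/scheduler", now=25, ttl=30)  # refresh

# At time 40 the refreshed scheduler is still alive, while the catalog,
# which never refreshed, has silently expired -- no failure detection
# or explicit removal message was needed.
alive = reg.query(now=40)
```

The design choice is that reliability comes from the passage of time rather than from coordination: a crashed publisher imposes no cleanup burden on anyone, at the cost of periodic refresh traffic from live publishers.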

Chapter 3. In a large cross-organizational distributed system, the set of information tuples in the universe is partitioned over one or more database nodes from a wide range of distributed system topologies, for reasons including autonomy, scalability, availability, performance and security. A database model is required that clarifies the relationship of the entities in a distributed system. The distribution and location of tuples should be transparent to a query. However, in practice, it is often sufficient (and much more efficient) for a query to consider only a subset of all tuples (service descriptions) from a subset of nodes. For example, a typical query may only want to search tuples (services) within the scope of the domain cern.ch and ignore the rest of the world. Both requirements need to be addressed by an appropriate query model. A data model remains to be specified. It should specify what kind of data a query takes as input and produces as output. Due to the heterogeneity of large distributed systems, the data model should be flexible in representing many different kinds of information from diverse sources, including structured and semi-structured data. The key problem then is:

• What kind of database, query and data model as well as query language can support simple and complex dynamic information discovery with as few as possible architecture and design assumptions? In particular, how can one uniformly support queries in a wide range of distributed system topologies and deployment models, while at the same time accounting for their respective characteristics?

Chapter 3 – Contribution. The chapter develops a database and query model as well as a generic and dynamic data model that address the given problem. All subsequent chapters are based on these models. Unlike in the relational model, the elements of a tuple in our

data model can hold structured or semi-structured data in the form of any well-formed XML [11] document or fragment. An individual tuple element may, but need not, have a schema (XML Schema [12]), in which case the element must be valid according to the schema. The elements of all tuples may, but need not, share a common schema. The concepts of (logical) query and (physical) query scope are cleanly separated rather than interwoven. A query is formulated against a global database view and is insensitive to link topology and deployment model. In other words, to a query the set of all tuples appears as a single homogeneous database, even though the set may be (recursively) partitioned across many nodes and databases. The query scope, on the other hand, is used to navigate and prune the link topology and filter on attributes of the deployment model. A query is evaluated against a set of tuples. The set, in turn, is specified by the scope. Conceptually, the scope is the input fed to the query. Example service discovery queries are given. Three query types are identified, namely simple, medium and complex. An appropriate query language (XQuery) is suggested. The suitability of the query language is demonstrated by formulating the example prose queries in the language. Detailed requirements for a query language supporting service and resource discovery are given. The capabilities of various query languages are compared.
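The query/scope separation can be sketched in a few lines. The structures below are hypothetical miniatures (the node contents, domain names and helper functions are invented for illustration): the scope selects which nodes contribute tuples, while the query itself sees only one homogeneous set of XML tuples, regardless of where they came from.

```python
import xml.etree.ElementTree as ET

# Each node holds tuples whose content is an arbitrary well-formed
# XML fragment; no common schema is required.
nodes = {
    "cern.ch": ['<service type="scheduler" owner="cms"/>',
                '<service type="storage" owner="atlas"/>'],
    "infn.it": ['<service type="scheduler" owner="infn"/>'],
}

def evaluate(query, scope):
    """Feed the tuples of all in-scope nodes, as one flat set, to the query."""
    tuples = [ET.fromstring(x) for domain in scope for x in nodes[domain]]
    return query(tuples)

def schedulers(tuples):
    """The logical query: find all scheduler services.  Note that it is
    written once, with no knowledge of nodes, links or deployment."""
    return [t for t in tuples if t.get("type") == "scheduler"]

# The same query runs unchanged over different physical scopes ...
everywhere = evaluate(schedulers, scope=["cern.ch", "infn.it"])
# ... e.g. pruned to the domain cern.ch only.
cern_only = evaluate(schedulers, scope=["cern.ch"])
```

In the thesis the query is an XQuery over XML tuples and the scope navigates a real link topology; the sketch only shows the division of labor: the scope determines the tuple set, the query never sees the partitioning.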

Chapter 4. In a large distributed system, a variety of information describes the state of autonomous entities from multiple administrative domains. Participants frequently join, leave and act on a best effort basis. Predictable, timely, consistent and reliable global state maintenance is infeasible. The information to be aggregated and integrated may be outdated, inconsistent, or not available at all. Failure, misbehavior, security restrictions and continuous change are the norm rather than the exception. The key problem then is:

• How should a database node maintain information populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources? In particular, how should it do so without sacrificing reliability, predictability and simplicity? How can powerful queries be expressed over time-sensitive dynamic information?

Chapter 4 - Contribution. A type of database is developed that addresses the problem. A database for XQueries over dynamic distributed content is designed and specified – the so-called hyper registry. The hyper registry has a number of key properties. An XML data model allows for structured and semi-structured data, which is important for integration of heterogeneous content. The XQuery language allows for powerful searching, which is critical for non-trivial applications. Database state maintenance is based on soft state, which enables reliable, predictable and simple content integration from a large number of autonomous distributed content providers. Content link, content cache and a hybrid pull/push communication model allow for a wide range of dynamic content freshness policies, which may be driven by all three system components: content provider, hyper registry and client. These key properties distinguish our approach from related work, which individually addresses some, but not all, of the above issues. Some work does not follow an XML data model (X.500 [13], LDAP [14], MDS [15, 16], RDBMS). Sometimes the query language is not powerful enough (UDDI [10], X.500, LDAP, MDS). Sometimes the database is not based on soft state (RDBMS, UDDI, X.500, LDAP). Sometimes content freshness is not addressed (RDBMS, UDDI, X.500, LDAP) or only partly addressed (MDS).
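The interplay of content link, cached copy and client-driven freshness can be sketched as follows. This is an illustrative Python rendering under assumed names (ContentLink, get, max_age); the real interfaces are specified in Chapter 4. The registry answers from its cache when the cached copy is fresh enough for the client, and otherwise pulls current content from the provider.

```python
import time

# Sketch of a registry-side content link with a cached copy (hypothetical
# names; not the hyper registry's actual interface).

class ContentLink:
    def __init__(self, pull):
        self.pull = pull        # callable fetching current content from the provider
        self.cache = None
        self.cached_at = None

    def get(self, max_age):
        """Client-driven freshness: reuse the cache if it is younger than
        max_age seconds; otherwise pull fresh content from the provider.
        A client demanding max_age=0 always forces a fresh pull."""
        now = time.time()
        if self.cache is None or now - self.cached_at >= max_age:
            self.cache = self.pull()
            self.cached_at = now
        return self.cache

calls = []
link = ContentLink(lambda: calls.append(1) or f"<service/> v{len(calls)}")
a = link.get(max_age=60)   # cache empty -> pull from provider
b = link.get(max_age=60)   # fresh enough -> served from cache
c = link.get(max_age=0)    # client demands freshness -> pull again
```

The same structure also accommodates provider-driven push: a provider refresh simply overwrites `cache` and `cached_at`.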

Chapter 5. Having defined all registry aspects in detail, we now proceed to the definition of a web service layer that promotes interoperability for existing and future Internet software. Such a layer views the Internet as a large set of services with an extensible set of well-defined interfaces. A web service consists of a set of interfaces with associated operations. Each operation may be bound to one or more network protocols and endpoints. The definition of interfaces, operations and bindings to network protocols and endpoints is given as a service description. A discovery architecture defines appropriate services, interfaces, operations and protocol bindings for discovery. The key problem is:

• Can we define a discovery architecture that promotes interoperability, embraces industry standards, and is open, modular, flexible, unified, non-intrusive and simple yet powerful?

Chapter 5 - Contribution. We propose and specify such a discovery architecture, the so-called Web Service Discovery Architecture (WSDA). WSDA subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. It specifies a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery. These primitives cover service identification, service description retrieval, data publication as well as minimal and powerful query support. The individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies. Finally, we compare in detail the properties of WSDA with the emerging Open Grid Service Architecture [6, 17].

Chapter 6. Because the set of information tuples in the universe is partitioned over one or more distributed nodes, it is not obvious how to enable powerful discovery query support and collective collaborative functionality that operate on the distributed system as a whole, rather than on a given part of it. Further, it is not obvious how to allow for search results that are fresh even over dynamic content. It appears that a Peer-to-Peer (P2P) database network may be well suited to support dynamic distributed database search, for example for service discovery. The key problems then are:

• What are the detailed architecture and design options for P2P database searching in the context of service discovery? What response models can be used to return matching query results? How should a P2P query processor be organized? What query types can be answered (efficiently) by a P2P network? What query types have the potential to immediately start piping in (early) results? How can a maximum of results be delivered reliably within the time frame desired by a user, even if a query type does not support pipelining? How can loops be detected reliably using timeouts? How can a query scope be used to exploit topology characteristics in answering a query? For improved efficiency, how can queries be executed in containers that concentrate distributed P2P database nodes into hosting environments with virtual nodes?

• Can we devise a unified P2P database framework for general-purpose query support in large heterogeneous distributed systems spanning many administrative domains? More precisely, can we devise a framework that is unified in the sense that it allows one to express specific applications for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options?

Chapter 6 - Contribution. We take the first steps towards unifying the fields of database management systems and P2P computing, which so far have received considerable, but separate, attention. We extend database concepts and practice to cover P2P search. Similarly, we extend P2P concepts and practice to support powerful general-purpose query languages such as XQuery [18] and SQL [19]. As a result, we propose the so-called Unified Peer-to-Peer Database Framework (UPDF) for general-purpose query support in large heterogeneous distributed systems spanning many administrative domains. UPDF is unified in the sense that it allows one to express specific applications for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options. The uniformity, wide applicability and reusability of our approach distinguish it from related work, which individually addresses some but not all problem areas. Traditional distributed systems assume a particular type of topology (e.g. hierarchical as in DNS [20], LDAP [14]). P2P systems are built for a single application and data type and do not support queries from a general-purpose query language. For example, DNS, Gnutella [21], Freenet [22], Tapestry [23], Chord [24] and Globe [25] only support lookup by key (e.g. globally unique name). Others such as SDS [26], LDAP [14] and MDS [15, 16] support simple special-purpose query languages, leading to special-purpose solutions unsuitable for multi-purpose service and resource discovery in large heterogeneous distributed systems spanning many administrative domains. [27, 28, 29] discuss selected neighbor selection techniques for a particular query type in isolation, outside the context of a framework for query support.
LDAP and MDS do not support essential features for P2P systems such as reliable loop detection, non-hierarchical topologies, dynamic abort timeout, query pipelining across nodes as well as radius scoping. None introduce a unified P2P database framework for general-purpose query support.
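Radius scoping and loop handling can be illustrated with a generic flooding sketch. This is a hedged illustration, not the UPDF design: the graph, data and function names are invented, and duplicate suppression via a visited set is one common loop-handling technique (Gnutella-style), whereas Chapter 6 treats loop detection and scoping in full.

```python
# Sketch: radius-scoped query flooding over a P2P node graph with
# duplicate suppression (illustrative names and topology).

graph = {  # neighbor lists; an arbitrary topology, cycles included
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
data = {"A": [1], "B": [2], "C": [3], "D": [4]}

def p2p_query(start, predicate, radius):
    """Forward a query up to `radius` hops from the start node; a per-query
    visited set drops queries that loop back, so evaluation terminates."""
    seen, results = set(), []
    frontier = [start]
    for _ in range(radius + 1):        # hop 0 is the start node itself
        nxt = []
        for node in frontier:
            if node in seen:
                continue               # loop/duplicate: drop the query copy
            seen.add(node)
            results.extend(x for x in data[node] if predicate(x))
            nxt.extend(graph[node])
        frontier = nxt
    return sorted(results)
```

Shrinking the radius prunes the physical scope without touching the logical query, mirroring the query/scope separation of Chapter 3.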

Chapter 7. We describe how the operations of the P2P database framework (UPDF) and registry XQuery interface from Section 5.2 are carried over (bound to) a network protocol. A messaging model and network protocol describes the interactions between framework nodes. For large distributed systems, it strongly influences system properties such as reliability, efficiency, scalability, complexity, interoperability, extensibility and, of course, limitations in applicability. The key problem then is:

• What messaging and communication model, as well as network protocol, uniformly support P2P database queries for a wide range of database architectures and response models such that the stringent demands of ubiquitous Internet discovery infrastructures in terms of scalability, efficiency, interoperability, extensibility and reliability can be met?

In particular, how can one allow for high concurrency, low latency as well as early and/or partial result set retrieval? How can one encourage resource consumption and flow control on a per query basis?

Chapter 7 - Contribution. These problems are addressed by developing a suitable messaging, communication and network protocol model, collectively termed Peer Database Protocol (PDP). PDP has a number of key properties. It is applicable to any node topology (e.g. centralized, distributed or P2P) and to multiple P2P response modes (routed response and direct response, both with and without metadata modes). To support loosely coupled autonomous Internet infrastructures, the model is connection-oriented (ordered, reliable, congestion sensitive) and message-oriented (loosely coupled, operating on structured data). For efficiency, it is stateful at the protocol level, with a transaction consisting of one or more discrete message exchanges related to the same query. It allows for low latency, pipelining, early and/or partial result set retrieval due to synchronous pull, and result set delivery in one or more variable sized batches. It is efficient, due to asynchronous push with delivery of multiple results per batch. It provides for resource consumption and flow control on a per query basis, due to the use of a distinct channel per transaction. It is scalable, due to application multiplexing, which allows for very high query concurrency and very low latency, even in the presence of secure TCP connections. To encourage interoperability and extensibility it is fully based on Internet Engineering Task Force (IETF) standards, for example in terms of asynchrony, encoding, framing, authentication, privacy and reporting. These key properties distinguish our approach from related work, which individually addresses some, but not all, of the above issues. We are not aware of related work that proposes a uniform messaging model applicable to any node topology and at the same time to multiple P2P response modes. Some related work does not apply to loosely coupled database nodes (RDBMS). Some protocols are not stateful at the protocol level (HTTP based mechanisms).
Some do not support synchronous pull (LDAP, MDS, Gnutella, Freenet) and result set delivery in one or more variable sized batches (LDAP, MDS, HTTP based mechanisms). Some do not support asynchronous push with delivery of multiple results per batch (LDAP, MDS, HTTP based mechanisms). Some do not provide for resource consumption and flow control on a per query basis (LDAP, MDS, Gnutella, Freenet, HTTP based mechanisms). Some lack application multiplexing for scalable query concurrency (some RDBMS drivers, HTTP based mechanisms). Some do not encourage interoperability and extensibility based on open IETF standards (RDBMS, Gnutella, Freenet).
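The synchronous-pull, variable-batch delivery style can be sketched in a few lines. This is an illustrative model under invented names (Transaction, pull), not the PDP wire protocol, which Chapter 7 specifies over IETF standard framing.

```python
# Sketch: a per-query transaction delivering results in variable sized
# batches, so a client can pull early/partial result sets while the
# server keeps producing lazily (illustrative names only).

def execute_query(matches):
    """Server side: lazily produce matching results as they are found."""
    yield from matches

class Transaction:
    def __init__(self, result_stream):
        self.stream = result_stream    # one distinct stream per query

    def pull(self, batch_size):
        """Synchronous pull: return up to batch_size results, possibly fewer;
        an empty batch signals that the result set is exhausted."""
        batch = []
        for item in self.stream:
            batch.append(item)
            if len(batch) == batch_size:
                break
        return batch

tx = Transaction(execute_query(range(5)))
first = tx.pull(2)    # early partial results
second = tx.pull(3)   # next batch
final = tx.pull(3)    # exhausted: empty batch
```

Because the client names the batch size on each pull, it controls both latency (small early batches) and throughput (large later batches) per query.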

Chapter 8 summarizes the work presented in this thesis. We also outline interesting directions for future research.

Due to the different scope of each chapter, we split the comparison of our work with existing research results over the chapters 3, 4, 5, 6 and 7.

1.4 Terminology

The Internet Engineering Task Force (IETF) and World Wide Web Consortium (W3C) use well defined terms to help unambiguous interpretation of standards specifications. Accordingly, in specification sections, the keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC-2119 [30]. This is particularly the case in chapters 4, 5, 6 and 7, even if these keywords are spelled in lower case.

Chapter 2

Service Discovery Processing Steps

2.1 Introduction

The most straightforward but also most inflexible configuration approach for invocation of remote services is to hard wire the location, interface, behavior and other properties of remote services into the local application. Loosely coupled decentralized systems call for solutions that are more flexible and can seamlessly adapt to changing conditions. The web service vision [6, 7] of distributed computing attempts to address the problem. Here programs are made more flexible and powerful by querying Internet databases (registries) at runtime in order to discover information and network attached third-party building blocks offering needed functionality under conditions matching a specification. For example, a data-intensive High Energy Physics analysis application looks for remote services that exhibit a suitable combination of characteristics, including network load, available disk quota, security options, access rights, and perhaps Quality of Service and monetary cost. A key question then is:

• What distinct problem areas and processing steps can be distinguished in order to enable flexible remote invocation in the context of service discovery?

This introductory chapter outlines eight problem areas and their associated processing steps, namely description, presentation, publication, request, discovery, brokering, execution and control. We propose a simple grammar (SWSDL) for describing network services as collections of service interfaces capable of executing operations over network protocols to endpoints. The grammar is intended to be used in the high-level architecture and design phase of a software project. A service must present its current description so that clients from anywhere can retrieve it at any time. For broad acceptance, adoption and easy integration of legacy services, an HTTP hyperlink is chosen as an identifier and retrieval mechanism (service link). A registry for publication and query of service and resource presence information is outlined. Reliable, predictable and simple distributed registry state maintenance in the presence of service failure or misbehavior or change is addressed by a simple and effective soft state mechanism. The notions of request, resource and operation are clarified. We outline the discovery step, which finds services implementing the operations required by a request. The brokering step determines an invocation schedule, which is a mapping over time of unbound operations to service operation invocations using given resources. The execution step implements a schedule. It uses the supported protocols to invoke operations on remote services. We discuss how one can reliably support monitoring and controlling the lifecycle of a request in the presence of a service that cannot reliably complete a request within a short and well-known expected timeframe.

2.2 Description

In a distributed system, it is desirable to describe (and maintain) the active participants such as services, resources and user communities, in order to allow for collaborative functionality. In particular, the description of services should encourage both flexibility and interoperability. As communications protocols and message formats are standardized on the Internet, it becomes increasingly possible and important to be able to describe communication mechanisms in some structured way. A service description language addresses this need by defining a grammar for describing network services as collections of service interfaces capable of executing operations over network protocols to endpoints. Service descriptions provide documentation for distributed systems and serve as a recipe for automating the details involved in application communication [8]. It is important to note that the concept of a service is a logical rather than a physical concept. The service interfaces of a service may, but need not, be deployed on the same host. They may be spread over multiple hosts across the LAN or WAN and even span administrative domains. This notion allows speaking in an abstract manner about a coherent interface bundle without regard to physical implementation or deployment decisions. We speak of a distributed (local) service if we know and want to stress that service interfaces are indeed deployed across hosts (or on the same host). Typically, a service is persistent (long lived), but it may also be transient (short lived, temporarily instantiated for the request of a given user). Examples of services are:

• A commodity FTP server.

• An FTP file server with an auxiliary registry interface for (secure) queries over access logs and statistics. The registry is deployed on a centralized high availability server that is remote to the core FTP server and shared by multiple such FTP servers of a computing cluster.

• An HTTP frontend to a database server, augmented with an auxiliary TCP buffer size tuning interface, as well as an event notification interface.

• A Java based HTTP process for job submission, augmented with an administration interface for shutdown and a PERL daemon periodically receiving and publishing current job status.

Description is the process of defining metadata for a thing that allows one to reason about the thing itself. In our context, sufficient service description metadata needs to be defined (and published) that allows a client to start communicating with the service. In support of this goal, one describes in a structured manner:

• Which interfaces a service offers

• Which operations and arguments are defined on an interface

• How operations and arguments are bound (mapped) to network protocols and endpoints

The formalism should be sufficiently general to encourage broad acceptance and smooth evolution. For example, it should not only support the needs of future services and protocols (e.g. using SOAP/HTTP [9] or SOAP/BEEP [31]), but should also be able to describe the typical operations of services using existing protocols such as FTP [32], SMTP [33], HTTP [34] and BEEP [35, 36]. Since the purpose of this section is to expose fundamental concepts rather than syntactic details, a precise formalism is not specified here. Instead, we now use examples to informally introduce a modified and strongly simplified form of the Web Service Description Language (WSDL) [8]. WSDL is a rigorous, expressive and flexible industry standard. It allows one to take advantage of existing WSDL tools for automatic generation of client and server code from service descriptions. However, WSDL trades clarity for expressiveness and flexibility. The example stated in [8] requires 66 XML lines and 7 levels of XML [11] nesting even though it merely describes a stock quote service with a trivial operation that returns the trading price of a given stock. WSDL is ill suited for compact exposition of concepts and examples. Hence, the sole reason for introducing a new formalism here is that WSDL has the distinct disadvantage of being very complex and verbose. We call the new formalism Simple WSDL (SWSDL) and stress that it is a pedagogical vehicle, not an attempt to replace the standard1. We estimate that WSDL based service descriptions are about one order of magnitude larger in size and structural complexity than corresponding SWSDL based descriptions. All features of SWSDL can be (and in practice will be) mapped to WSDL. The two languages are not mutually exclusive.
SWSDL is more useful in the high-level architecture and design phase of a software project whereas WSDL is more useful for the detailed specification and implementation phase. SWSDL describes the interfaces of a distributed service object system. For simplicity, it offers neither a class concept nor interface inheritance. In SWSDL, a service description defines a service as a set of related service interfaces. A service interface has an interface type. An interface type defines a set of operations and arguments. The interface type can be used to check whether a service interface conforms to some well-known standard. An operation is bound to one or more protocols and network endpoints via binding definitions. As an example, assume we have a simple scheduling service that offers an operation submitJob that takes a job description as argument. The function should be invoked via the HTTP protocol. A valid service description reads as follows:

<service>
  <interface type="http://gridforum.org/interface/Scheduler-1.0">
    <operation>
      <name> void submitJob(String jobdescription) </name>
      <bindhttp verb="GET" URL="https://sched.cern.ch/scheduler/submitjob"/>
      <allow> http://cms.cern.ch/everybody </allow>
    </operation>
  </interface>
</service>

1 For example, the formalism does not clarify how input and output arguments of operations can be defined in a protocol independent type system, and how they can later be bound to a protocol dependent form.

The description above states that the service interface is of a scheduler type. The precise scheduler type, including syntax and semantics of operations, is identified by the URL http://gridforum.org/interface/Scheduler-1.0. Note that the URL is purely an identifier. It is left unspecified what (if anything) happens upon HTTP requests to the URL. For example, it is not required that a detailed technical paper can be found at the interface type URL. Next, we define submitJob as being the name of the submit operation and input and output arguments of type String. The operation is bound to the HTTP protocol, and can be invoked by sending an HTTP GET request to the URL https://sched.cern.ch/scheduler/submitjob (subject to local security policy). Only members of the CMS virtual organization are allowed to invoke this operation. For security reasons, it is not specified who actually belongs to the CMS organization. The optional authorization hint is purely an identifier without explicitly specified semantics.

2.3 Presentation

Having outlined the structure of a service description, we now turn to the problem of description presentation. Clearly, clients from anywhere must be able to retrieve the current description of a service (subject to security policy). Hence, a service needs to present (make available) to clients the means to retrieve the service description. To enable clients to query in a global context, some identifier for the service is needed. Further, a description retrieval mechanism is required to be associated with each such identifier. Together these are the bootstrap key (or handle) to all capabilities of a service. In principle, identifier and retrieval mechanisms could follow any reasonable convention. In practice, however, a fundamental mechanism such as service discovery can only hope to enjoy broad acceptance, adoption and subsequent ubiquity if integration of legacy services is made easy. The introduction of service discovery as a new and additional auxiliary service capability should require as little change as possible to the large base of valuable existing legacy services, preferably no change at all. It should be possible to implement discovery-related functionality without changing the core service. Further, to help easy implementation the retrieval mechanism should have a very narrow interface and be as simple as possible. The service description concept comes to our aid. In support of these requirements, we logically separate core service functionality and presentation functionality into separate service interfaces. Further, the identifier is chosen to be a URL, and the retrieval mechanism is chosen to be HTTP(S). We define that an HTTP(S) GET request to the identifier must return the current service description (subject to local security policy). In other words, a simple hyperlink is employed. In the remainder of this thesis, we will use the term service link for such an HTTP URL identifier enabling service description retrieval.
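The service link convention can be exercised with any plain HTTP client. The sketch below is a hypothetical Python illustration: the URL, the XML payload and the injected fetcher are invented so the example stays self-contained; against a live service one would substitute e.g. a real HTTP GET.

```python
# Sketch: a service link is just an HTTP(S) URL; dereferencing it with an
# HTTP GET yields the current service description. The fetcher is injected
# here (illustrative stand-in for a real HTTP client call).

def retrieve_description(service_link, http_get):
    """Dereference a service link via HTTP(S) GET, returning the description."""
    if not service_link.startswith(("http://", "https://")):
        raise ValueError("a service link must be an HTTP(S) URL")
    return http_get(service_link)

# Stand-in for a live HTTP GET against a hypothetical service.
fake_web = {
    "https://sched.cern.ch/getServiceDescription": "<service>...</service>",
}
desc = retrieve_description("https://sched.cern.ch/getServiceDescription",
                            fake_web.__getitem__)
```

Because the bootstrap mechanism is nothing more than a GET on a URL, legacy services can add discovery support by publishing a static document behind an ordinary web server.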
Because service descriptions should describe the essentials of the service, it is recommended2 that the service link concept be an integral part of the description itself. As a result, we extend the example description developed in the previous section with a Presenter interface type. We propose that service descriptions can be retrieved via the Presenter, which defines an operation getServiceDescription for this purpose. The operation is identical to service description retrieval and is hence bound to (invoked via) an HTTP(S) GET request to a given service link. Additional protocol bindings may be defined as necessary. The new service description now reads:

2 In general, it is not mandatory for a service to implement any "standard" interface. Historical evidence suggests that the acceptance of ubiquitous Internet infrastructures and their flexible and successful evolution strongly depends on being conservative with the term MUST.

<service>
  <interface type="http://gridforum.org/interface/Presenter-1.0">
    <operation>
      <name> XML getServiceDescription() </name>
      <bindhttp verb="GET" URL="https://sched.cern.ch/getServiceDescription"/>
    </operation>
  </interface>
  <interface type="http://gridforum.org/interface/Scheduler-1.0">
    <operation>
      <name> void submitJob(String jobdescription) </name>
      <bindhttp verb="GET" URL="https://sched.cern.ch/scheduler/submitjob"/>
      <allow> http://cms.cern.ch/everybody </allow>
    </operation>
  </interface>
</service>

2.4 Publication

Publication is the process of making the presence of services, resources, user communities and other metadata known to potential clients. In this specific context, it is the process of making a service identifier and description retrieval mechanism (in practice a service link) known to potentially interested clients so that they can retrieve the service description and use it. We are concerned with the basic capability of making a service link (and hence service description) reachable for clients. To this end, service links are collected in one or more well-known registries, which are databases that can be queried by clients. Registries for organizations or communities with special interests serve a similar purpose as link list web pages and top-level organizational web pages: Many clients use well known and authoritative websites (registries) as entry points for browsing and searching because they mostly contain relevant hyperlinks (service links) for the given target community. Consequently, a particular community can discover information relevant to its interests. When a service starts up, it announces its presence by invoking a publish operation on the registry. The operation takes as argument a service link to identify the service attempting publication. The registry appends the service link if it is not already present. Conversely, when a service shuts down it announces its unavailability by invoking a depublish operation on the registry. The registry removes the service link if it is present. Clients can query the registry by invoking a query operation. The simplest query operation (getLinks) takes no arguments and returns the set of all known service links. An example result set for such a query is simply a list of service links (HTTP(S) URLs), one per published service.

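The publish/depublish/getLinks behavior described above can be sketched as follows. This is a hypothetical Python rendering for illustration; names, URLs and the in-memory representation are assumptions, not thesis code.

```python
# Sketch of the minimal registry interface: publish appends a service link
# if absent, depublish removes it if present, get_links returns all links.

class Registry:
    def __init__(self):
        self.links = []                 # ordered set of service links

    def publish(self, service_link):
        """Append the service link if it is not already present."""
        if service_link not in self.links:
            self.links.append(service_link)

    def depublish(self, service_link):
        """Remove the service link if it is present."""
        if service_link in self.links:
            self.links.remove(service_link)

    def get_links(self):
        """Simplest query: return all known service links."""
        return list(self.links)

reg = Registry()
reg.publish("https://sched.cern.ch/getServiceDescription")
reg.publish("https://repcat.cern.ch/getServiceDescription")
reg.publish("https://sched.cern.ch/getServiceDescription")   # duplicate ignored
reg.depublish("https://repcat.cern.ch/getServiceDescription")
```

Note that the registry stores only links, never descriptions; a client dereferences each returned link itself, which is exactly the SLM-over-SDM trade-off discussed below.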

To retrieve the service descriptions of a result set, a client needs to establish a network connection for each service link in the result set. In principle, this is no problem, but in practice it can lead to prohibitive latency, in particular in the presence of large result sets. This is due to the very expensive nature of secure (and even insecure) connection setup. To address this problem, we define an additional query operation (getTuples) that returns service descriptions instead of associated service links. A registry implementation can use caching to reduce the number of connection setups and/or can use keep-alive connections to minimize setup time. As a consequence, this query operation may return outdated descriptions. More importantly, it may return only a partial result set, excluding any service descriptions the registry is not authorized to retrieve (service links are returned instead). This is the case if a security sensitive publisher refuses to delegate authorization to the registry, but only allows select clients to invoke its getServiceDescription operation. An example result set with two normal services and one replica catalog service refusing trust delegation may read:

<tupleset>
  <tuple link="https://service-a.cern.ch/getServiceDescription">
    <content> service description A goes here </content>
  </tuple>
  <tuple link="https://service-b.cern.ch/getServiceDescription">
    <content> service description B goes here </content>
  </tuple>
  <tuple link="https://repcat.cern.ch/getServiceDescription"/>
</tupleset>

In this section, we deliberately describe only the simplest type of registry. This type roughly corresponds to the capabilities offered by the Universal Description, Discovery and Integration (UDDI) standard [10], with the exception that a UDDI registry offers slightly richer query capabilities and stores service descriptions rather than service links. Maintenance of service links and maintenance of service descriptions are two related but distinct concepts. Service link maintenance (SLM) is more fundamental than service description maintenance (SDM). We argue that a registry should primarily store dynamic pointers to external data, rather than the data itself, for reasons of security, confidentiality, consistency, and perhaps efficiency:

• SDM can easily be layered on top of SLM.

• SLM allows keeping access control for security sensitive service descriptions in the hands of the authoritative Presenter. SDM, on the other hand, requires trust delegation, which potentially opens the door for a malicious audience to learn detailed enough service descriptions to launch well-focused virus or denial of service attacks. System administrators typically do not encourage unauthorized publication of information that exposes existing vulnerabilities or opens new security holes.

• Service links are smaller in size than service descriptions and can be published with less bandwidth consumption, or more often (see next section below).

• SLM reduces state consistency problems, since service links change much less frequently than the associated service descriptions.

Clearly many different types of sophisticated query capabilities can be introduced. The topic is discussed later in more depth. Here we only note that advanced query support can be expressed on top of the basic capabilities introduced so far, and that such higher-level capabilities conceptually do not belong to publication. Publication is only concerned with the fundamental capability of making a service link reachable3 for clients. As an analogy, consider the related but distinct concepts of web hyper-linking and web searching: Web hyper-linking is a fundamental capability without which nothing else on the Web works. Many different kinds of web search engines using a variety of search interfaces and strategies can be and are layered on top of web linking. We will later also consider improvements over centralized registries by extending the discussion to fine-grained, fully distributed registries.

2.5 Soft State Publication

This section discusses mechanisms for reliable, predictable and simple distributed registry state maintenance. In a system composed of a very large number of services, the mean time between failures is small. Recall that the previous section proposed a model in which services explicitly publish and de-publish as appropriate. We ignored the fact that services often fail or misbehave, leaving a registry in an inconsistent state. For example, a service that crashes may not de-publish with the registry, and hence clients may unnecessarily discover and try to contact an unavailable service repeatedly. Similarly, a service may be reconfigured to change its service link (and service description), yet it may forget to update all registries with which it is already associated. Further, a registry may change its authorization policy and, as a result, an already published service may suddenly no longer be in a position to de-publish itself. All these situations leave inconsistent or stale registry state behind. It is difficult for a registry to detect such situations and to determine when and how they can be resolved. For example, one can envision a strategy in which a registry drops a service link if a client (or perhaps itself) finds a service to be unavailable. However, the unavailability may be due to an authorization policy denying access to some but not all clients, or due to

3Reachability is interpreted in the spirit of garbage collection systems: A service link is reachable for a given client if there exists a direct or indirect retrieval path from the client to the service link.

problems in a small network segment or simply due to service reboot. The service owner may be offended and claim violation of a service level agreement (SLA), because, in his opinion, there is no reason for dropping its service. Even worse, the service owner might not even notice for quite some time that he has been dropped. To summarize, so-called hard-state distributed information systems populated from many independent, autonomous and heterogeneous distributed sources typically evolve quickly into garbage dumps where valid information is hard to distinguish from trash, decreasing overall utility dramatically. Elaborate mechanisms can be designed to cope with the problems of reliable and consistent state maintenance. Such mechanisms typically face many complex and subtle problems. However, one can elegantly avoid much complexity by using a simple soft state mechanism for reliable distributed garbage collection: State established at a remote location may eventually be discarded unless refreshed by a stream of subsequent confirmation notifications [37]. In this manner, component failures and changes are tolerated in the normal mode of operation rather than addressed through a separate recovery procedure [26]. Lack of refresh indicates service failure, shutdown or change. The responsibility for state maintenance is shifted from the registry to the publishing services. Registries keep service links (and perhaps also descriptions) as soft state, that is, they are kept for a limited amount of time only. Service links are tagged with time-to-live tokens (TTLs). Service links are expired and dropped unless explicitly renewed via periodic publication, henceforth termed refresh. Services refresh by essentially saying, “I am still here”. Consequently, services can crash, stop, be added or changed without leaving stale state behind indefinitely. For example, assume service descriptions change more frequently than service links.
A job execution service publishes its current service link with a scheduler and asserts that the link is not expected to change within the next 2 hours, and that the current service description is not expected to change within the next 10 minutes. In other words, the service link or description may (but need not) be dropped if the execution service should not be able to refresh within the next 2 hours or 10 minutes, respectively. The execution service foresees temporary network failure, acts defensively and tries to confirm its presence more often, by refreshing its service link every 2 minutes. Even though the scheduler receives refreshes every 2 minutes, it need not retrieve and update the service description so often. It chooses to do so only every 10 minutes. In an effort to avoid dropping failed services that eventually manage to become available again, a (special-purpose) registry interface implementation moves expired service links from the main database into an auxiliary database, and later retries service description retrieval for two weeks, with exponentially growing delay between retries. Only after two weeks does it finally give up and completely abandon the link. Having discussed the motivation, context and scope of publication, we can now develop a detailed design and specification. Chapter 4 is dedicated to this purpose.
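The essence of the soft state mechanism can be illustrated in a few lines of Python. This is a minimal sketch, not the thesis implementation; the class and method names (SoftStateRegistry, publish, query) are illustrative assumptions:

```python
import time

class SoftStateRegistry:
    """Minimal soft-state registry sketch: links expire unless refreshed."""

    def __init__(self):
        self._entries = {}  # service link -> expiry timestamp

    def publish(self, link, ttl_seconds):
        # Publication and refresh are the same operation: (re)set the TTL.
        self._entries[link] = time.time() + ttl_seconds

    def query(self):
        # Expired links are dropped lazily before answering a query.
        now = time.time()
        self._entries = {l: t for l, t in self._entries.items() if t > now}
        return list(self._entries)

reg = SoftStateRegistry()
reg.publish("http://exec.cern.ch/service.xml", ttl_seconds=0.05)
assert "http://exec.cern.ch/service.xml" in reg.query()
time.sleep(0.1)            # no refresh arrives within the TTL...
assert reg.query() == []   # ...so the link is garbage-collected
```

Note that a crashed service never de-publishes; it simply stops refreshing, and the registry cleans itself up.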

2.6 Request

Let us clarify some terminology surrounding the formulation of requests from clients to use network-attached third-party functionality.

Clients formulate requests. Examples are: “submit job”, “compute Pi with an accuracy of 10000 decimals”, and “retrieve result of HTTP GET to a given URL”. Another example is an analysis job consisting of a sequence of operations. It first uses a file transfer service (to stage input data from remote sites), next a replica catalog service (to locate an input file replica with good data locality), then a job execution service (to run the analysis program), and finally again a file transfer service (to stage output data back to the user desktop). Resources are things that can be used for a period of time, and may or may not be renewable. They have owners, who may charge others for using resources, and they can be shared or be exclusive. Examples include disk space, network bandwidth, specialized device time, and CPU time [38]. Resources are made accessible through the operations of services. Operations are consumers of resources. Requests are hierarchical entities, and may have recursive structure; i.e., requests can be composed of sub-requests or operations, and sub-requests may themselves contain sub-requests. The leaves of this structure are operations. The simplest form of a request is one containing a single operation. The definition is derived from [38]. Sometimes complex constraints and preferences formulated in a request description language accompany requests.
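The hierarchical request structure described above (requests composed of sub-requests, with operations as leaves) can be sketched as follows. The Python types and the operation names are illustrative assumptions, not part of the thesis:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Operation:          # leaf of the hierarchy: the actual resource consumer
    name: str

@dataclass
class Request:            # container: holds sub-requests and/or operations
    parts: List[Union["Request", Operation]] = field(default_factory=list)

    def operations(self):
        """Flatten the recursive structure into its leaf operations."""
        for p in self.parts:
            if isinstance(p, Operation):
                yield p
            else:
                yield from p.operations()

# The analysis job from the text: stage-in, locate replica, execute, stage-out.
job = Request([
    Operation("file-transfer:stage-in"),
    Operation("replica-catalog:lookup"),
    Request([Operation("job-execution:run")]),   # a nested sub-request
    Operation("file-transfer:stage-out"),
])
assert [op.name for op in job.operations()] == [
    "file-transfer:stage-in", "replica-catalog:lookup",
    "job-execution:run", "file-transfer:stage-out"]
```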

2.7 Discovery

Having formulated a request, the discovery step finds services implementing the operations required by a request. More precisely, for each operation of a request of a given user, the discovery step searches one or more registries and produces candidate services, which are services (more precisely: service descriptions) that implement the operation on top of a given set of protocols. The simplest form of candidate contains a single service implementing a single operation on top of a single protocol. It is often necessary to use several services in combination to implement the operations of a request. For example, a request may involve the combined use of a file transfer service (to stage input and output data from remote sites), a replica catalog service (to locate an input file replica with good data locality), a request execution service (to run the analysis program), and finally again a file transfer service (to stage output data back to the user desktop). Hence, discovery often involves querying for several types of operations or services.
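As a toy illustration of the discovery step, the following sketch filters a registry for candidate services that implement a given operation over a given set of protocols. The dictionary layout, service links and protocol names are hypothetical:

```python
# Each service description lists the operations it implements and the
# protocols it speaks (all names are illustrative, not from the thesis).
services = [
    {"link": "http://a.cern.ch", "operations": {"submit-job"},      "protocols": {"SOAP/HTTP"}},
    {"link": "http://b.cern.ch", "operations": {"submit-job"},      "protocols": {"FTP"}},
    {"link": "http://c.cern.ch", "operations": {"lookup-replica"},  "protocols": {"SOAP/HTTP"}},
]

def discover(registry, operation, protocols):
    """Return candidate services implementing `operation` over any of `protocols`."""
    return [s for s in registry
            if operation in s["operations"] and s["protocols"] & protocols]

candidates = discover(services, "submit-job", {"SOAP/HTTP"})
assert [s["link"] for s in candidates] == ["http://a.cern.ch"]
```

A real discovery step would of course query one or more remote registries and match over structured service descriptions rather than in-memory dictionaries.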

2.8 Brokering

For each operation of a request of a given user, the previous discovery step produces a set of candidate services that implement the operation. In the following brokering step, more or less sophisticated techniques are used to refine the selection and determine an invocation schedule. Schedules (also termed execution plans) are mappings over time of unbound operations to service operation invocations using given resources. One maps operations, not requests, because requests are containers for operations, and operations are the actual resource consumers [38]. The simplest schedule contains a single service operation. The brokering step can be as simple as randomly picking a single service from the candidates, or as sophisticated as initiating a complex auction where participating services place

bids and negotiate a resolution based on economic models, Quality of Service (QoS) and/or Service Level Agreements (SLAs). Consider a less ambitious example where, in an attempt to minimize response time, the brokering step of a job scheduler compares the current CPU load of candidate job execution services. As mentioned above, it is often necessary to use several services in combination to implement the operations of a request. In such cases it is often helpful to consider correlations. For example, a scheduler for data-intensive requests may look for input file replica locations with a fast network path to the execution service where the request would consume the input data. If a request involves reading large amounts of input data, it may be a poor choice to use a host for execution that has poor data locality with respect to an input data source, even if it is very lightly loaded. An advanced job scheduler typically also matches execution services against query patterns describing job requirements such as desired operating system and computer architecture type, minimum main memory size, disk quota, availability and connectivity to third party services like database engines, etc. Requests with complex constraints and preferences, for tasks like job submission, are augmented with a structured request description language for matching and ranking. As can be seen, advanced brokering for tasks like job scheduling often requires additional information not available as part of service descriptions. Such additional information must be gained from other data sources. Such data sources for brokering may follow and respect a single globally standardized data and query model. In practice, however, non-uniform special-purpose data sources are often involved.
This is due to the heterogeneous nature of large distributed cross-organizational systems such as the Grid, the large variety of use cases, brokering strategies and query types as well as strongly varying data freshness and data aggregation requirements. For example, brokering information includes very slowly changing data, such as the type of operating system, or more frequently changing quantities, such as the number of running jobs or the current CPU utilization. In addition, the brokering process may be highly application specific and due to its complexity not expressible in any known query language. For example, it is hard to envisage that a negotiation process involved in distributed auctions can be expressed as a query (rather than an algorithm). In special-purpose areas, special matchmaking mechanisms have been developed [39].
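The two extremes of the brokering spectrum mentioned above, random selection versus load-based refinement, can be contrasted in a short sketch. The candidate data and function names are illustrative assumptions:

```python
import random

# Candidate job execution services, as produced by the discovery step,
# annotated with a (hypothetical) current CPU load metric.
candidates = [
    {"link": "http://a.cern.ch", "cpu_load": 0.9},
    {"link": "http://b.cern.ch", "cpu_load": 0.2},
    {"link": "http://c.cern.ch", "cpu_load": 0.5},
]

def broker_random(candidates):
    """Simplest possible broker: pick any candidate at random."""
    return random.choice(candidates)

def broker_min_load(candidates):
    """Refined broker: minimize expected response time via least CPU load."""
    return min(candidates, key=lambda s: s["cpu_load"])

assert broker_min_load(candidates)["link"] == "http://b.cern.ch"
assert broker_random(candidates) in candidates
```

An advanced broker would additionally weigh correlations such as data locality, as discussed above, rather than a single scalar metric.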

2.9 Execution

In the previous brokering step, more or less sophisticated techniques are used to determine an invocation schedule from candidate services. The execution step implements a schedule. The service descriptions of the operations of a schedule are parsed, and the supported protocols are used to invoke operations on remote services. In an attempt to cover a broad range of existing and future protocols, invocation is understood very broadly. For example, the operation to be invoked may be a SOAP operation carried over BEEP or HTTP(S), but it may also simply mean issuing one of the standard commands supported by the FTP or SMTP protocol (e.g. DELETE, GET). In some cases more or less advanced resource reservations on a set of services can accompany the execution step in order to help ensure consistent execution semantics for the operations of a request, or to guarantee a certain Quality of Service (QoS). For a detailed discussion of Quality of Service and advanced reservation see [40]. Note that services controlling a domain can commit resources with authority, whereas services outside a control domain cannot do so. For example, a local scheduler managing a cluster is in the position to make definitive statements about its resource usage and policy in general. A cross-organizational global scheduler, on the other hand, does not own any cluster, and hence cannot commit resources with authority. Such a global scheduler can only compute schedules based on assumptions and educated guesses.

2.10 Control

For practical handling of resource-intensive jobs, it is critical that a disconnected mode of operation is supported, as well as monitoring and tracking of request progress and control over the lifecycle of ongoing requests. Examples requiring such capabilities include a simulation requiring a week of CPU time, transfer of a large set of files via WAN to a robotic tape library, and an analysis sweeping over Gigabytes to Terabytes of data. A service can implement an operation with a synchronous and/or asynchronous model of invocation. Synchronous invocation is a call-and-wait-until-done model. The client sends a request and blocks until the service has completed the request and the client has received the response (or an error code). Due to its simplicity, this model is very popular. For example, the vast majority of programming library functions and many web transactions follow it. It is most appropriate in situations where a service can be expected to reliably complete a request within a short and well-known timeframe. There are three types of problems related to the synchronous model.

• Cannot monitor, control and maintain life cycle. If, for some reason, no response is received within an expected timeframe, a client usually starts wondering whether there is a problem, what it may be and what could possibly be done to resolve it. A client may want to ask for priority change, suspension, resumption, rescheduling or other such lifecycle maintenance. However, a client cannot refer to the original request without having a unique request identifier, and hence cannot establish monitor and control connections. Consequently, there is no way for a client to find out whether the request has been aborted incorrectly, whether it is waiting on some locked resource, or whether it is fine and just about to be completed. Consider services accepting heavy-duty requests or opaque and potentially unbounded requests (e.g. submission of arbitrary executables or database queries) from a large and diverse user community. Such services are usually fully loaded, and typically cannot give meaningful timeframes for reliable request completion.

• Cannot work in a disconnected mode. Assume that on a Friday a user submits a request that takes the weekend for completion. The user may want to log out or shut down his laptop without losing the request and come back on Monday to retrieve results. A client shutdown may also occur due to some unexpected problem such as an operating system crash. With synchronous invocation, a network connection between client and service must be kept alive for the entire request lifecycle. If the network connection dies for some reason, the service aborts the request. It would be useless for the service to continue processing the request because a client would have no way to retrieve results through subsequent control connections, as the client cannot refer to the original request without having a unique request identifier. Synchronous invocation is not designed to work in a disconnected mode.

• Early exhaustion of system resources. In practice, services allocate a constant amount of system resources per request just to be able to start handling it. Examples include a TCP connection on a port, a thread or process, and some memory. Only a finite amount of these resources is available from the operating system, and they usually cannot be reclaimed before request completion. Therefore, in practice there exists an upper bound on the number of requests a service can handle concurrently. This is the case even if most requests are waiting in a persistent queue and not running. This upper bound is often very low (e.g. 50-1000 requests), severely limiting scalability and leading to surprisingly early service denial errors.

A richer invocation model is needed to address these shortcomings, which we now discuss. Asynchronous (non-blocking) invocation implements a more powerful, but also more complex invocation model. The important aspect of this model is that it allows for monitoring and tracking of request progress as well as controlling the lifecycle of ongoing requests. It also allows for disconnected operation and low resource consumption. Bear in mind, however, that because of its complexity the asynchronous invocation model is considered overkill for many if not most services (e.g. replica catalog lookup service, time service, logging service, security credential storage service). Its usage in current systems is the exception rather than the rule. With asynchronous invocation, a client sends a request; the service accepts the request and immediately returns a unique request handle as an identifier, even though the request has not yet completed. The client continues to do other useful work while the service is busy handling the request. There is now a choice of pull or push based implementation. In the pull-based case, the active client periodically polls the passive service, checking the status of the request identified with the given handle (e.g. queued, suspended, resumed, aborted, percent completed). In order to enable a client to react without a need to poll repeatedly, a push-based variant can also be used. In such a case, an active service notifies the passive client of registered events or state transitions in the request life cycle. An asynchronous service is well-conditioned to handle large numbers of concurrent requests, because the resources required per request can be kept minimal. A request moved into a persistent wait queue requires no TCP connection, no thread and almost no memory. Although there exist variations to the pull/push theme, they do not differ enough to warrant further exposition in this context.
In any case, the main point is that a client can disconnect and is in the position to monitor the status of a request, abort it if so desired, or perhaps even ask for priority change, suspension, resumption, rescheduling or other such lifecycle maintenance.
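The pull-based asynchronous model can be illustrated with a minimal sketch: the service returns a unique request handle immediately, and the client later polls for status using that handle. The API shown (submit, status) is a hypothetical illustration, not the thesis interface:

```python
import itertools
import threading
import time

class AsyncService:
    """Pull-based asynchronous invocation sketch (hypothetical API)."""

    def __init__(self):
        self._status = {}
        self._ids = itertools.count(1)

    def submit(self, work):
        # A unique request handle is returned at once, before completion.
        handle = f"req-{next(self._ids)}"
        self._status[handle] = "queued"

        def run():
            self._status[handle] = "running"
            work()
            self._status[handle] = "completed"

        threading.Thread(target=run).start()
        return handle

    def status(self, handle):
        # The client (re)connects at any time and polls with its handle.
        return self._status[handle]

svc = AsyncService()
h = svc.submit(lambda: time.sleep(0.05))   # returns immediately
while svc.status(h) != "completed":        # client may disconnect between polls
    time.sleep(0.01)
assert svc.status(h) == "completed"
```

Because the handle, not the connection, identifies the request, the client can drop its connection between polls and still retrieve the outcome later.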

2.11 Summary

This introductory chapter outlines eight problem areas of service and resource discovery, and their associated processing steps, namely description, presentation, publication, request, discovery, brokering, execution and control.

Description. A service description language encourages flexibility and interoperability by defining a grammar for describing network services as collections of service interfaces capable of executing operations over network protocols to endpoints. Service descriptions provide documentation for distributed systems and serve as a recipe for automating the details involved in application communication. In the pedagogical SWSDL language, a service description defines a service as a set of related service interfaces. A service interface has an interface type. An interface type defines a set of operations and arguments. The interface type can be used to check whether a service interface conforms to some well-known standard. An operation is bound to one or more protocols and network endpoints via binding definitions.

Presentation. A service must present its current description so that clients from anywhere can retrieve it at any time. An identifier and a description retrieval mechanism are the key to all capabilities of a service. For broad acceptance, adoption and easy integration of legacy services, an HTTP hyperlink is chosen as an identifier and retrieval mechanism (service link). It is recommended that the service link is made an integral part of the description itself.

Publication. Publication is the process of making the presence of services, resources, user communities and other metadata reachable to potential clients. To this end, service links are collected in one or more well-known database registries. When a service starts up, it announces its presence by invoking a publish operation on the registry. Clients can query the registry by invoking a query operation.

Request. Clients formulate requests. Resources have owners, who may charge others for using resources. Examples include disk space, network bandwidth and CPU time. Resources are made accessible through the operations of services. Operations are consumers of resources. Requests are composed of operations.

Discovery. The discovery step finds services implementing the operations required by a request. More precisely, for each operation of a request of a given user, the discovery step searches one or more registries and produces candidate services, which are services that implement the operation on top of a given set of protocols.

Brokering. In the brokering step, more or less sophisticated techniques are used to refine the selection and determine an invocation schedule. Schedules (also termed execution plans) are mappings over time of unbound operations to service operation invocations using given resources. The brokering step can be as simple as randomly picking a single service from the candidates, or as sophisticated as initiating a complex auction. It is often necessary to use several services in combination to implement the operations of a request, in which case it is helpful to consider correlations. Sometimes requests are augmented with complex constraints and preferences for matching and ranking.

Execution. The execution step implements a schedule. The service descriptions of the operations of a schedule are parsed, and the supported protocols are used to invoke operations on remote services.

Control. A service can implement an operation with a synchronous and/or asynchronous model of invocation. Synchronous invocation is a call-and-wait-until-done model. The client sends a request and blocks until the service has completed the request and the client has received the response. It is most appropriate in situations where a service can be expected to reliably complete a request within a short and well-known timeframe. The model has three problems: It cannot monitor, control and maintain the life cycle of ongoing requests. It cannot work in a disconnected mode. Further, it may lead to early exhaustion of system resources. Asynchronous (non-blocking) invocation addresses these shortcomings. A client sends a request; the service accepts the request and immediately returns a unique request handle as an identifier, even though the request has not yet completed. The client continues to do other useful work while the service is busy handling the request. Then (in pull-mode), the active client periodically polls the passive service, checking the status of the request identified with the given handle. A push-based variant can also be used.

Chapter 3

A Data Model and Query Language for Discovery

3.1 Introduction

In a distributed system, it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. As in a data integration system, the goal is to exploit several independent information sources as if they were a single source. This enables information discovery and collective collaborative functionality that operates on the system as a whole, rather than on a given part of it. In a large distributed system spanning many administrative domains, the set of information tuples in the universe is partitioned over one or more nodes from a wide range of distributed system topologies, for reasons including autonomy, scalability, availability, performance and security. A database model is required that clarifies the relationship of the entities in a distributed system. The distribution and location of tuples should be transparent to a query. However, in practice, it is often sufficient (and much more efficient) for a query to consider only a subset of all tuples (service descriptions) from a subset of nodes. For example, a typical query may only want to search tuples (services) within the scope of the domain cern.ch and ignore the rest of the world. Both requirements need to be addressed by an appropriate query model. A data model remains to be specified. It should specify what kind of data a query takes as input and produces as output. Due to the heterogeneity of large distributed systems spanning many administrative domains, the data model should be flexible in representing many different kinds of information from diverse sources, including structured and semi-structured data. The key problem then is:

• In a large heterogeneous distributed system spanning many administrative domains, what kind of database, query and data model, and what query language, can support simple and complex dynamic information discovery with as few architectural and design assumptions as possible? How can one uniformly support queries in a wide range of distributed system topologies and deployment models, while at the same time accounting for their respective characteristics?

This chapter develops a database and query model as well as a generic and dynamic data model that address the given problem. A distributed database framework is used where there exist one or more nodes that are interconnected with links, each node holding a database. Unlike in the relational model the elements of a tuple in our data model can hold structured or semi-structured data in the form of any arbitrary well-formed XML [11] document or fragment. An individual tuple element may, but need not, have a schema (XML Schema [12]), in which case the element must be valid according to the schema. The elements of all tuples may, but need not, share a common schema. A tuple is a multi-purpose data container that may contain arbitrary content. The concepts of (logical) query and (physical) query scope are cleanly separated rather than interwoven. A query is formulated against a global database view and is insensitive to link topology and deployment model. In other words, to a query the set of all tuples appears as a single homogeneous database, even though the set may be (recursively) partitioned across many nodes and databases. The query scope, on the other hand, is used to navigate and prune the link topology and filter on attributes of the deployment model. A query is evaluated against a set of tuples. The set, in turn, is specified by the scope. Conceptually, the scope is the input fed to the query. Example service discovery queries are given. Three query types are identified, namely simple, medium and complex. An appropriate query language (XQuery [18]) is suggested. The suitability of the query language is demonstrated by formulating the example prose queries in the language. Detailed requirements for a query language supporting service and resource discovery are given. The capabilities of various query languages are compared.

3.2 Database and Query Model

Database Model. A distributed database framework is used where there exist one or more nodes. Each node can operate autonomously. A node holds a set of tuples in its database. A given database belongs to a single node. For flexibility, the databases of nodes may be deployed in any arbitrary way (deployment model). For example, a number of nodes may reside on the same host. A node’s database may be co-located with the node. However, the databases of all nodes may just as well be stored next to each other on a single central data server. The database tuples may be dynamically (re)computed on each query. A database may be anything that accepts queries from the query model and returns results according to the data model (see below). The set of tuples in the universe is partitioned over the nodes, for reasons including autonomy, scalability, availability, performance and security. Nodes are interconnected with links in any arbitrary way. A link enables a node to query another node. A link topology describes the link structure among nodes. The centralized model has a single node only. For example, in a service discovery system, a link topology could tie together a distributed set of administrative domains, each hosting a registry node holding descriptions of services local to the domain. Figure 3.1 depicts three example link topologies, namely ring, tree and graph. Chapter 7 outlines many more useful link topologies. Depending on the application context, all topologies have their merits and drawbacks.

Query Model. Our query model is intended for read-only search. Insert, update and delete capabilities are not required and not addressed. It is a general-purpose query model

that operates on tuples. Discussion in this chapter often uses examples where the term tuple is substituted by the more concrete term service description. In practice, it is often sufficient (and much more efficient) for a query to consider only a subset of all tuples (service descriptions) from a subset of nodes. For example, a typical query may only want to search tuples (services) within the scope of the domain cern.ch and ignore the rest of the world. However, it is a strong user requirement that queries should be as insensitive as possible to any link topology and deployment model. In other words, a user should not need to reformulate a query when the node topology or deployment model changes, as is frequently the case in large distributed systems spanning many administrative domains such as P2P networks or cross-organizational Grids. A query model should not make any assumptions on the underlying database and query processing technology. P2P query engines, distributed database systems and centralized database systems should be able to answer the same queries. As in a data integration system [42, 43, 44], the goal is to exploit several independent information sources as if they were a single source. In support of these requirements, the concepts of (logical) query and (physical) query scope are cleanly separated rather than interwoven.

Figure 3.1: Ring, Tree and Graph Topology [41].

• Query. A query is formulated against a global database view and is insensitive to link topology and deployment model. In other words, to a query the set of all tuples appears as a single homogeneous database, even though the set may be (recursively) partitioned across many nodes and databases. This means that in a relational or XML environment, at the global level, the set of all tuples appears as a single, very large, table or XML document, respectively.

• Query Scope. The query scope, on the other hand, is used to navigate and prune the link topology and filter on attributes of the deployment model. Searching is primarily guided by the query. Scope hints are used only as necessary. A query is evaluated against a set of tuples. The set, in turn, is specified by the scope. Conceptually, the scope is the input fed to the query. The query scope is a set and may contain anything from all tuples in the universe to none.

A query scope is specified either directly or indirectly. For example, one can directly enumerate the tuples (service descriptions) to be considered. However, this is usually impractical. One can also indirectly define a query scope by specifying a set of nodes (or Internet domain names or table names), implying that the query should be evaluated against the union of all tuples contained in their respective databases. This corresponds to the concept of horizontally partitioned tables extensively used in large-scale relational database systems, in particular for distributed instances [45]. One can also indirectly specify the query scope by giving a time deadline, implying that as many tuples as possible should be considered, but only until the deadline has passed. Many more ways to specify a query scope can be envisioned. Both query and scope can prune the search space, but they do so in a very different manner. More detailed discussion is deferred to Section 6.8.
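The separation of query and scope can be illustrated as follows: the scope selects a union of tuples from a set of nodes, and the topology-agnostic query then runs over that set. Node names and tuple layout are illustrative assumptions:

```python
# Nodes hold horizontally partitioned tuple sets (toy deployment model).
nodes = {
    "registry.cern.ch": [{"domain": "cern.ch", "type": "scheduler"}],
    "registry.infn.it": [{"domain": "infn.it", "type": "scheduler"},
                         {"domain": "infn.it", "type": "replica-catalog"}],
}

def scope_by_nodes(nodes, selected):
    """Indirect scope: union of the tuples of the selected nodes."""
    return [t for n in selected for t in nodes[n]]

def query(tuples):
    # The query itself is scope-agnostic: it sees one homogeneous tuple set,
    # regardless of how the tuples were partitioned across nodes.
    return [t for t in tuples if t["type"] == "scheduler"]

hits = query(scope_by_nodes(nodes, ["registry.infn.it"]))
assert hits == [{"domain": "infn.it", "type": "scheduler"}]
```

Changing the scope (e.g. to all nodes, or to a deadline-bounded subset) leaves the query untouched, which is exactly the insensitivity property required above.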

3.3 Generic and Dynamic Data Model

Generic Data Model. In a large distributed system, a registry is populated from a large variety of heterogeneous remote data sources. The input and output of a query are instances of a generic data model, which in our case is XML based and models a document as a tree of nodes. XML [11] is used because one of its strengths is its flexibility in representing many different kinds of information from diverse sources, including structured and semi-structured data. XML is, above all else, a unifying integration technology. The data model must be capable of modeling an XML document as well as a well-formed fragment of a document, a sequence of documents, or a sequence of document fragments. We note that there is no need to store tuples in XML; they just need to be presented this way, perhaps by middleware. For example, it is common to present data from relational databases, dynamic content generation systems and legacy command line tools as XML. A more sophisticated system can accept queries over an XML view and internally translate the query into SQL [46, 47, 48, 43]. The data model represents a set of tuples. A tuple has, unsurprisingly, zero or more XML attributes and zero or more XML elements. An example tuple reads as follows:

<tuple>
   <name> "Doe,John" </name>
   <address> "1, South St., Palm Beach" </address>
</tuple>

In the relational model, a tuple has a number of column values. All tuples of all nodes are homogeneous in the sense that their column values comply with a single strongly typed schema. In our model, this is not required. The elements (columns) of a tuple can hold structured or semi-structured data in the form of any arbitrary well-formed XML document or fragment. An individual tuple element may, but need not, have a schema, in which case the element must be valid according to the schema. The elements of all tuples may, but need not, share a common schema. An element (column) is typed (type XML), but obviously in a very loose manner. A tuple is a multi-purpose data container that may contain arbitrary content. Unlike in an RDBMS, a single (logical) tuple set contains all tuples. This implies that a query

need not specify a “table” or “tuple set name” to indicate the type of tuples that should be considered. Rather, predicates within the regular query language are used to select the desired tuples from the single set. Arguably, it is more appropriate to adopt XML parlance and also use the term element instead of tuple. Nevertheless, continuing to use established terminology from the relational world seems to aid clarity more than it misleads. Discussion in this chapter often uses examples where the term tuple is substituted by the more concrete term service description. The actual query is fed an XML input representation of the following form.

<tupleset>
   zero or more tuples go here
</tupleset>

The output of a predicate query (see below) is a subset of its input. The output of a constructive query (see below) is an arbitrary structure of the following form.

<tupleset>
   zero or more XML elements go here
</tupleset>

In any case, the query engine always encapsulates the query output with a tupleset root element. A user query need not generate this root element as it is implicitly added by the environment.
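The table-less selection style described above can be sketched as follows: heterogeneous tuples live in one logical set, and an ordinary predicate, rather than a table name, narrows the set. The tuple shapes are invented for illustration.

```python
# Sketch: one logical tuple set holding heterogeneous, schema-less tuples.
# A query selects by predicate rather than by naming a table.

tupleset = [
    {"type": "service", "content": {"interface": "ReplicaCatalog-1.0"}},
    {"type": "person",  "content": {"name": "Doe,John"}},   # different shape
    {"type": "service", "content": {"interface": "Scheduler-1.0"}},
]

# No "table name": the predicate alone narrows the single set.
services = [t for t in tupleset if t["type"] == "service"]
assert [t["content"]["interface"] for t in services] == \
       ["ReplicaCatalog-1.0", "Scheduler-1.0"]
```

The tuples of different shape coexist in the same set without a shared schema, which is exactly what the relational model forbids.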

Dynamic Data Model (DDM). In a large distributed system spanning many administrative domains, a registry is populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources. To address dynamic state maintenance problems, we propose a Dynamic Data Model (DDM). In DDM, a tuple is an annotated multi-purpose soft state data container that may contain a piece of arbitrary content. Examples for content include a service description, file, picture, current network load, host information, stock quotes, etc. Content of a given type is maintained for a given context (purpose) and may be associated with some metadata. A tuple and its content are valid for some time span only. At any time, the current (up-to-date) content can be retrieved from the authoritative content provider via a dynamic pointer called a content link. The pointer can be used if stale content is to be avoided. The Dynamic Data Model is an instantiation of the Generic Data Model where a tuple has as attributes a content link, a context, a type, and some time stamps. Optionally, each tuple also has a metadata element and a content extensibility element. Detailed justification is deferred to Section 4.2. Consider the following example dynamic tuple.

Link:        http://sched001.cern.ch/getServiceDescription
Context:     Parent
Type:        Service
TS1 TS2 TS3: 10  20  30
Metadata:    A
Content:     <service ... "http://cms.cern.ch"/> ... </service>

The actual query is fed as input an XML representation that has the following form (discussion of time stamps TS is deferred).

<tupleset>
   <tuple link="http://sched001.cern.ch/getServiceDescription" context="parent" type="service">
      <content>
         service description A goes here
      </content>
   </tuple>
</tupleset>
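The soft-state behavior of a dynamic tuple can be sketched as follows. This is an illustrative model, not the thesis implementation: the provider callable stands in for a remote getServiceDescription invocation over the content link, and the three time stamps are simplified to a single expiry time.

```python
# Sketch of a DDM tuple: an annotated soft-state container whose current
# content can be re-fetched from the authoritative provider via a content
# link. Field names mirror the model; the provider function is a stand-in
# for a remote invocation over the content link.

class DynamicTuple:
    def __init__(self, link, context, type_, content, expires_at, provider):
        self.link, self.context, self.type = link, context, type_
        self.content, self.expires_at = content, expires_at
        self.provider = provider  # callable playing the content link role

    def is_valid(self, now):
        # A tuple and its content are valid for some time span only.
        return now < self.expires_at

    def current_content(self, now, ttl):
        # Stale content is refreshed from the authoritative provider.
        if not self.is_valid(now):
            self.content = self.provider()
            self.expires_at = now + ttl
        return self.content

t = DynamicTuple("http://sched001.cern.ch/getServiceDescription",
                 "parent", "service", "<service .../>",
                 expires_at=20, provider=lambda: "<service v2/>")
assert t.current_content(now=10, ttl=30) == "<service .../>"  # still fresh
assert t.current_content(now=25, ttl=30) == "<service v2/>"   # refreshed
```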

3.4 Query Examples and Types

To concretize discussion and to identify query types and requirements, we now give several example queries related to service discovery. Queries are initially expressed in prose. They will later be formalized in a suitable query language. One can distinguish three types of queries: simple, medium and complex. The latter are more powerful than the former. Nevertheless, we will see that even a simple query is a powerful tool. Section 6.5 will later show that different execution strategies are required to answer different query types. In short, complex queries are not recursively partitionable and hence cannot be answered efficiently without a centralized database architecture. On the other hand, simple and medium queries are recursively partitionable, and hence can also effectively be answered in fully decentralized environments where tuples are physically partitioned among many small and independent nodes.

Simple Query. Simple queries are most often used for discovery. A simple query finds all tuples (services) matching a given predicate or pattern. The query visits each tuple (service description) in a set individually, and generates a result set by applying a function to each tuple. The function usually consists of a predicate and/or a transformation. Individual answers are added to an (initially empty) result set. An empty answer leaves the result set unchanged. A simple query has the following form:

R = {}
for each tuple in input
   R = R UNION { function(tuple) }
endfor
return R
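The pseudocode above also explains why a simple query is recursively partitionable: the per-tuple function commutes with set union, so evaluating within each partition and merging the partial results equals central evaluation over all tuples. A sketch with invented tuples:

```python
# Sketch: why a simple query is "recursively partitionable". Applying the
# per-tuple function within each partition and unioning the partial results
# equals applying it to the union of all tuples.

def simple_query(tuples, function):
    result = set()
    for t in tuples:
        answer = function(t)
        if answer is not None:      # an empty answer leaves the set unchanged
            result.add(answer)
    return result

function = lambda t: t.upper() if t.startswith("s") else None
partitions = [["s1", "x2"], ["s3"], ["x4", "s5"]]  # tuples split over nodes

merged = set().union(*(simple_query(p, function) for p in partitions))
central = simple_query([t for p in partitions for t in p], function)
assert merged == central == {"S1", "S3", "S5"}
```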

Example simple queries are:

• (QS1) Find all (available) services.

• (QS2) Find all services that implement a replica catalog service interface and that CMS members are allowed to use, and that have an HTTP binding for the replica catalog operation “XML getPFNs(String LFN)”.

• (QS4) Find all local services (all service interfaces of any given service must reside on the same host).

• (QS5) Find all services and return their service links (instead of descriptions).

• (QS6) Find all CMS replica catalogs and return their physical file names (PFNs) for a given logical file name (LFN); suppress PFNs not starting with “ftp://”.

• (QS7) Within the domain “cern.ch”, find all execution services and their CPU load where cpuLoad < 0.5 (assuming the operation cpuLoad() is defined on the execution service).

Some typical simple queries posed to Peer-to-Peer file sharing networks are:

• (Q20) Find all MP3 music files titled “Like a virgin”.

• (Q21) Find all URLs under which replicas of a given file with identifier X (e.g. a logical file name) can be downloaded.

• (Q22) Find all MP3 music files produced by artist Madonna since 1995, where download speed is above 100 KB/s.

In support of the wide variety of real-life questions anticipated, it should be possible to arbitrarily combine and nest all capabilities exposed in these examples. Note that the first four queries return service descriptions, whereas the others return additional or entirely different information (service links, physical file names or CPU load). We term the former queries predicate (or filter) queries. The structure of the result set is predetermined in the sense that query output must be a subset of query input. We term the latter queries constructive queries, because they construct answers of arbitrary structure and content. Predicate queries are a subset of constructive queries. A constructive query function that always returns "Hello World" or an empty string is legal, but not very useful. Further, note that the queries QS6, QS7, Q22 involve multiple independent data sources and match on dynamically delivered content (via remote invocation of operations getPFNs

and cpuLoad), rather than on values being part of service descriptions. We call these queries dynamic queries, as opposed to static queries. To support dynamic queries, a query language must provide means to dynamically retrieve and interpret information from diverse remote or local sources. Dynamic queries can sometimes be reformulated as static queries. For example, the LFN/PFN database information of query QS6 could be made available as part of the tuple set. In practice, this is typically infeasible for reasons including database size, consistency, information hiding, security and performance. Publishing highly volatile attributes such as CPU load as part of tuples leads to stale data problems. Clearly dynamic invocation is a more appropriate vehicle to deliver CPU load. Alternatively, custom push protocols can be used, for example as defined in the Grid Monitoring Architecture [49].
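A dynamic query in the spirit of QS7 can be sketched as follows: the predicate matches on values fetched at query time, not on values stored in the service description. The cpu_load callables are hypothetical stubs standing in for remote cpuLoad() invocations.

```python
# Sketch of a dynamic query (QS7 flavor): the predicate matches on values
# delivered at query time by each service, rather than on stale values
# published as part of the service description. The cpu_load callables are
# stubs standing in for remote cpuLoad() invocations.

services = [
    {"host": "exec01.cern.ch", "cpu_load": lambda: 0.3},
    {"host": "exec02.cern.ch", "cpu_load": lambda: 0.9},
]

# Static part filters on the description; dynamic part invokes the service.
hits = [(s["host"], s["cpu_load"]())
        for s in services
        if s["host"].endswith("cern.ch") and s["cpu_load"]() < 0.5]
assert hits == [("exec01.cern.ch", 0.3)]
```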

Medium Query. A medium query computes an answer over a set of tuples (service descriptions) as a whole. For example, it can compute aggregates like number of tuples, maximum, etc. Examples of medium queries are:

• (QM1) Find the CMS storage service with the largest network bandwidth to my host “dummy.cern.ch” (assuming there exists a service estimating bandwidth from A to B).

• (QM2) Return the number of replica catalog services.

• (QM3) Find the two CMS execution services with minimum and maximum CPU load and return their service descriptions and load.

• (QM4) Return the services owned by members of the black sheep list (assuming the tuple set not only contains service descriptions, but also a set of black sheep tuples).

• (QM5) Return a summary of all replica catalogs and schedulers residing within the domains “cern.ch”, “infn.it” and “anl.gov”, grouped in ascending order by owner, domain and service type, with aggregate group cardinalities. A sample result set should look as follows:

...

Some typical medium queries posed to Peer-to-Peer file sharing networks are:

• (Q25) Find the number of MP3 music files titled “Like a virgin” that have a size < 5 MB.

• (Q26) Find the top three MP3 music files titled “Like a virgin” with maximum bandwidth connectivity to my host “dummy.cern.ch” and return (URL, bandwidth) pairs.

The query is applied to the set as a whole. For example, QM4 is interesting in that it involves crossing tuple boundaries, which simple hierarchical query languages typically do not support. Like a simple query, a medium query can be static or dynamic. It can be a predicate query or a constructive query.
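Two medium queries can be sketched over a toy tuple set: a QM2-style aggregate, and a QM4-style query that crosses tuple boundaries by relating service tuples to black sheep tuples held in the same set. Tuple shapes are invented for illustration.

```python
# Sketch of medium queries over the set as a whole: an aggregate (QM2-style
# count) and a query crossing tuple boundaries (QM4-style), relating
# service tuples to black-sheep tuples in the same tuple set.

tupleset = [
    {"type": "service",    "owner": "alice"},
    {"type": "service",    "owner": "mallory"},
    {"type": "blacksheep", "owner": "mallory"},
]

count = sum(1 for t in tupleset if t["type"] == "service")        # aggregate
sheep = {t["owner"] for t in tupleset if t["type"] == "blacksheep"}
suspect = [t for t in tupleset
           if t["type"] == "service" and t["owner"] in sheep]      # crosses tuples

assert count == 2 and suspect == [{"type": "service", "owner": "mallory"}]
```

The second query is the kind a simple hierarchical language cannot express, because answering it for one tuple requires consulting other tuples.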

Complex Query. Complex queries are most often used for advanced discovery or brokering. Like a medium query, a complex query computes an answer over a set of tuples (service descriptions) as a whole. However, it has powerful capabilities to combine data from multiple sources. For example, it supports all database join flavors. Like any other query, a complex query can be static or dynamic. It can be a predicate query or a constructive query. Example complex queries are:

• (QC1) Find all (execution service, storage service) pairs where both services of a pair live within the same domain. (Job wants to read and write locally).

• (QC2) Find all hosts that run more than one replica catalog with CMS as owner. (Want to check for anomalies).

• (QC3) Find the top 10 owners of replica catalog services within the domains “cern.ch”, “infn.it” and “anl.gov”, and return their email, together with the number of services each of them owns, sorted by that number.
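The QC1 pairing can be sketched as a textbook hash join keyed on the domain name; service names and shapes are invented for illustration.

```python
# Sketch of the QC1 join: pair execution and storage services that live in
# the same domain, via a hash join keyed on the domain name.

executors = [{"name": "exec1", "domain": "cern.ch"},
             {"name": "exec2", "domain": "anl.gov"}]
storages  = [{"name": "sto1",  "domain": "cern.ch"},
             {"name": "sto2",  "domain": "infn.it"}]

by_domain = {}
for s in storages:                       # build phase
    by_domain.setdefault(s["domain"], []).append(s)

pairs = [(e["name"], s["name"])          # probe phase
         for e in executors
         for s in by_domain.get(e["domain"], [])]
assert pairs == [("exec1", "sto1")]
```

Such a join relates independent tuples to each other, which is precisely the capability that distinguishes complex queries from simple and medium ones.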

3.5 XQuery Language

XQuery [18, 50, 51, 52] is the standard XML [11] query language developed under the auspices of the W3C. An understanding of the language is essential to the discussion in the remainder of this thesis. However, a number of excellent introductions and compact summaries of its features have already appeared, and it is not considered valuable to generate yet another. Therefore, we repeat select portions of the XQuery specification [18] in the following subsection, and of work by Manolescu, Florescu and Kossmann [48] in subsection “Language Features and Examples”. Afterwards, the suitability of the language for service and resource discovery is demonstrated by translating example simple, medium and complex prose queries to the language.

Introduction [18]

With the emergence of XML, the distinctions among various forms of information, such as documents and databases, are quickly disappearing. XML is an extremely versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. XQuery is designed to be a small, easily implementable language in which queries are concise and easily understood. The language is derived from an XML query language called Quilt [53], which in turn borrowed features from several other languages. From XPath [54] and XQL [55] it took a path expression syntax suitable for hierarchical documents. From XML-QL [56] it took the notion of binding variables and then using the bound variables to create new structures. From SQL [19] it took the idea of a series of clauses based on keywords that provide a pattern for restructuring data (the SELECT-FROM-WHERE pattern in SQL). From OQL [57] it took the notion of a functional language composed of several different kinds of expressions that can be nested with full generality. XQuery is a functional language in which a query is represented as an expression. The principal forms of expressions are as follows: XPath expressions, element constructors, FLWR expressions, expressions involving operators and functions, conditional expressions, quantified expressions and expressions that test or modify data types. The input and output of a query are instances of a data model, which is also used by XPath 2.0 [58]. A document is modeled as a tree of nodes. The data model is capable of modeling not only an XML document but also a well-formed fragment of a document, a sequence of documents, or a sequence of document fragments.
XPath is a W3C standard notation for navigating along “paths” in an XML document, and is used in several XML-related standards including XSLT [59] and XPointer [60]. The type system follows XML Schema [12].

Language Features and Examples [48]

Path Expressions. XQuery uses XPath [54] expressions for addressing parts of an XML document. The document("http://mysite.org/med.xml") expression retrieves the root of the XML document situated at the given URL. document("med.xml")/record is the ordered list of all record children of the document root; document("med.xml")//record returns the list of record elements at any depth in the document, in document order. Within a list, an indexing operator can be used: document("med.xml")//record[1] selects the first record in the document. A path may refer to a well-defined document, as in the previous examples, or is interpreted with respect to the root of a current document that is deduced from the evaluation context; the expression //entry/@ssNo retrieves the collection of values of the ssNo attributes in all entry elements in the current document. A dereference operator is also provided: //entry/@rel_previous->entry returns all medical entries that are “pointed at” by some other entry in the current document. Path predicates, interspersed in path expressions, restrict the navigation; for example, //entry[date="1/9/90"] returns all the

entry elements having at least one date child, whose string values is "1/9/90". A path predicate can also select a specific range, e.g. document("med.xml")//entry[range 2 to 5] will return the second to fifth entry elements, in document order.

FILE med.xml
------------
<medical>
   <patient>
      <name> "Doe,John" </name> <birthDate> "1/1/1960" </birthDate>
      <address> "1, South St., Palm Beach, FL" </address>
   </patient>
   <patient>
      <name> "Ale, Mary" </name> <birthDate> "2/6/1970" </birthDate>
      <address> "2, Pine Rd., Bear Canyon, MN" </address>
   </patient>
   <record ssNo="123">
      <entry>
         <date> "1/9/90" </date>
         <symptoms> "fatigue, bad sleep" </symptoms>
         <analysis> "blood tests" </analysis>
      </entry>
      <entry>
         <date> "10/9/90" </date>
         <analysis> "low blood iron" </analysis>
         <diagnosis> "Anemy" </diagnosis>
         <prescription> "Biofer once a day" </prescription>
      </entry>
   </record>
</medical>

Operators. XQuery provides the usual set of first-order operators (arithmetic, logical and set-oriented); the comma is a list concatenation operator. For example, (//entry, //name) returns the concatenation of the list of all entries, followed by all names, in document order. Second order operators in XQuery are the logical quantifiers ANY, ALL, and SORT. For example, document("med.xml")//entry SORT BY date DESC will return all entry elements, the most recent first.

Element Constructors. These expressions provide for construction of new XML structures; for example, <nameList> document("med.xml")//name SORT BY (.) </nameList> constructs an alphabetic list of all patient names. Syntactically, the element’s tag, attribute names and attribute values can be gained from constants or variables, while the children are specified as a list of arbitrary expressions.

FLWR Expressions. FLWR expressions (pronounced “flower”) consist of three parts: a FOR-LET clause that makes variables iterate over the result of an expression or binds variables to arbitrary expressions, a WHERE clause that allows specifying restrictions on the variables, and a RETURN clause that can construct new XML elements as output of the query. For example, the following query retrieves all the medical records of people with health problems that have been related to pollution within the last ten years.

FOR $r in document("med.xml")//record,
    $e in $r/entry
WHERE $e/date > "1/1/90" and contains($e/diagnosis, "pollution")
RETURN <pollutionIncident> $r/@ssNo, $e/diagnosis </pollutionIncident>

Variables defined with a FOR are iterators: the $r variable iterates over all the record elements, and $e over the entries in the record associated with $r. For each ($r, $e) pair that satisfies the condition, a new pollutionIncident element is created, containing the patient SSno and the diagnosis. The following variant groups the interesting entries according to the patient’s SSno.

FOR $no in distinct(document("med.xml")//record/@ssNo)
LET $recs := document("med.xml")//record[@ssNo=$no]
RETURN $no,
   (FOR $e in $recs
    WHERE $e/date > "1/1/90" and contains($e/diagnosis, "pollution")
    RETURN $e/diagnosis)

The variable $recs, defined in the LET clause, is bound to the whole value of the expression assigned to it. FLWR expressions allow combining data from different documents, grouping, construction of new structure, and are natural candidates for query composition.

Functions. XQuery provides syntax for defining functions; these can be called in XQuery queries. The syntax for a function definition is illustrated in the following example: the recursive function pastEntries returns the list of all past record entries linked to a given record. This example also features a function from the standard library, empty (other functions like count, max, avg etc. are included).

FOR $e in document("med.xml")//record[@ssNo="123"]/entry
WHERE $e/date="1/1/1990"
RETURN pastEntries($e)

FUNCTION pastEntries(ELEMENT $e) RETURNS [ELEMENT] {
   IF empty($e/@rel_previous->entry)
   THEN []
   ELSE ($e/@rel_previous->entry UNION pastEntries($e/@rel_previous->entry))
}
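The same recursion can be sketched in ordinary code; the dictionary of entries and the prev field are illustrative stand-ins for the XML entry elements and their @rel_previous references.

```python
# Sketch of the pastEntries recursion: entries carry an optional link to
# their predecessor, and the function collects the full chain of past
# entries. The entry structure is an illustrative stand-in for the XML
# elements and @rel_previous references.

entries = {
    "e3": {"diagnosis": "Anemy", "prev": "e2"},
    "e2": {"diagnosis": "low blood iron", "prev": "e1"},
    "e1": {"diagnosis": "fatigue", "prev": None},
}

def past_entries(eid):
    prev = entries[eid]["prev"]
    if prev is None:          # base case: no rel_previous link
        return []
    return [prev] + past_entries(prev)

assert past_entries("e3") == ["e2", "e1"]
```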

XQuery can dynamically integrate external data sources via the document(URL) function. The document(URL) function can be used to process the XML results of remote operations invoked over HTTP. For example, given a service description with a getNetworkLoad() operation, a query can match on values dynamically produced by that operation.

Simple, Medium and Complex Queries. The suitability of the query language for service and resource discovery is now demonstrated by formulating example prose queries in the language.

Simple Query. Example simple queries are:

• (QS1) Find all (available) services.

RETURN /tupleset/tuple[@type="service"]

• (QS2) Find all services that implement a replica catalog service interface and that CMS members are allowed to use, and that have an HTTP binding for the replica catalog operation “XML getPFNs(String LFN)”.

LET $repcat := "http://gridforum.org/interface/ReplicaCatalog-1.0"
FOR $tuple in /tupleset/tuple[@type="service"]
LET $s := $tuple/content/service
WHERE SOME $op IN $s/interface[@type = $repcat]/operation SATISFIES
   ($op/name="XML getPFNs(String LFN)" AND
    $op/bindhttp/@verb="GET" AND
    contains($op/allow, "http://cms.cern.ch/everybody"))
RETURN $tuple

• (QS4) Find all local services (all service interfaces of any given service must reside on the same host).

FOR $tuple in /tupleset/tuple[@type="service"]
LET $s := $tuple/content/service
WHERE count(distinct(hostname($s/interface/operation/bindhttp/@URL))) <= 1
RETURN $tuple

• (QS5) Find all services and return their service links (instead of descriptions).

FOR $tuple in /tupleset/tuple[@type="service"]
RETURN <tuple> {$tuple/attribute::*} </tuple>

• (QS6) Find all CMS replica catalogs and return their physical file names (PFNs) for a given logical file name (LFN); suppress PFNs not starting with “ftp://”.

LET $repcat := "http://gridforum.org/interface/ReplicaCatalog-1.0"
FOR $tuple in /tupleset/tuple[@type="service"]
LET $s := $tuple/content/service
WHERE SOME $op IN $s/interface[@type = $repcat]/operation SATISFIES
   ($op/name="XML getPFNs(String LFN)" AND
    $op/bindhttp/@verb="GET" AND
    contains($op/allow, "http://cms.cern.ch/everybody"))
RETURN
   FOR $pfn IN invoke($s, $repcat, "XML getPFNs(String LFN)",
                      "http://myhost.cern.ch/myFile")/tupleset/PFN
   WHERE starts-with($pfn, "ftp://")
   RETURN $pfn

Medium Query. Example medium queries are:

• (QM1) Find the CMS storage service with the largest network bandwidth to my host “dummy.cern.ch” (assuming there exists a service estimating bandwidth from A to B).

LET $source := "dummy.cern.ch"
LET $storage := "http://gridforum.org/interface/storage-1.0"
LET $sorted := /tupleset/tuple
      [@type="service" AND
       content/service/@owner="cms.org" AND
       content/service/interface/@type=$storage]
   SORTBY (bandwidth($source, host(./@link)))
RETURN $sorted[last()]

DEFINE FUNCTION bandwidth($source, $dest) {
   document("http://netestimator.cern.ch/estimate?source=", $source, "&dest=", $dest)
}

• (QM2) Return the number of replica catalog services.

LET $repcat := "http://gridforum.org/interface/ReplicaCatalog-1.0"
RETURN count(/tupleset/tuple/content/service[interface/@type=$repcat])

• (QM3) Find the two CMS execution services with minimum and maximum CPU load and return their service descriptions and load.

LET $executor := "http://gridforum.org/interface/executor-1.0"
LET $tuples := /tupleset/tuple
      [@type="service" AND
       content/service/@owner="cms.org" AND
       content/service/interface/@type=$executor]
LET $sorted :=
   FOR $tuple IN $tuples RETURN
      <result>
         {$tuple}
         <load> {invoke($tuple/content/service, $executor, "String cpuLoad()", "")} </load>
      </result>
   SORTBY (load)
RETURN $sorted[1] $sorted[last()]

• (QM5) Return a summary of all replica catalogs and schedulers residing within the domains “cern.ch”, “infn.it” and “anl.gov”, grouped in ascending order by owner, domain and service type, with aggregate group cardinalities. A sample result set should look as follows:

...

...

A suitable XQuery reads as follows:

LET $types := ("http://gridforum.org/interface/ReplicaCatalog-1.0",
               "http://gridforum.org/interface/Scheduler-1.0")
LET $alldomains := ("cern.ch", "infn.it", "anl.gov")
LET $services := /tupleset/tuple[@type="service"]/content/service
      [contains($alldomains, @domain) AND contains($types, interface/@type)]
FOR $owner IN distinct($services/@owner) SORTBY (@owner)
LET $domains := distinct($services[@owner=$owner]/@domain) SORTBY (@domain)
RETURN
   <owner name={$owner}> {
      FOR $domain IN $domains
      LET $types := distinct($services[@owner=$owner AND @domain=$domain]/interface/@type)
            SORTBY (@type)
      RETURN
         <domain name={$domain}> {
            FOR $type in $types
            LET $srvs := $services[@owner=$owner AND @domain=$domain AND
                                   interface/@type=$type]
            RETURN <type name={$type} count={count($srvs)}/>
         } </domain>
   } </owner>

Complex Query. Example complex queries are:

• (QC1) Find all (execution service, storage service) pairs where both services of a pair live within the same domain. (Job wants to read and write locally).

LET $exeType := "http://gridforum.org/interface/executor-1.0"
LET $stoType := "http://gridforum.org/interface/storage-1.0"
FOR $executor IN /tupleset/tuple[content/service/interface/@type = $exeType],
    $storage  IN /tupleset/tuple[content/service/interface/@type = $stoType AND
                                 domainName(@link) = domainName($executor/@link)]
RETURN <pair> {$executor} {$storage} </pair>

• (QC2) Find all hosts that run more than one replica catalog with CMS as owner. (Want to check for anomalies).

LET $repcat := "http://gridforum.org/interface/ReplicaCatalog-1.0"
LET $hosts := /tupleset/tuple
      [content/service[interface/@type = $repcat AND @owner = "cms.org"]]
   /hostname(@link)
FOR $host IN distinct($hosts)
WHERE count($hosts[. = $host]) > 1
RETURN <host> {$host} </host>

3.6 Related Work

Searching deep, inter-related and/or complex data structures typically requires a powerful query language. We estimate that WSDL based service descriptions are about one order of magnitude larger in size and structural complexity than corresponding SWSDL based descriptions (see Section 2.3). Consequently, the example queries posed in this chapter are often trivial in comparison with realistic queries on WSDL data structures. Let us assess the suitability of the query capabilities of various query languages in the context of service and resource discovery.

LDAP. The Lightweight Directory Access Protocol (LDAP) [14] inherits its query and data model from X.500 [13]. The data model is not dynamic and it is not XML based. No example service discovery query except QS1 and QS5 can be expressed with the LDAP query language. This would also be the case if LDAP were defined on an XML data model. An example system using this query and data model is the Metacomputing Directory Service (MDS) [15, 16]. The data model is based on the entry, which contains data about some object (e.g. a person). An entry is composed of attributes, which have a type and one or more values. The attribute type determines what kinds of values are legal. An entry has a mandatory attribute, which is a hierarchical identifier termed distinguished name (DN). An example DN is cn=Barbara Jensen, o=University of Michigan, c=US. Because of its hierarchical nature, a DN can be seen as organizing a set of entries into a tree structure, the Directory Information Tree (DIT). Usually the tree is organized according to political, geographical, or organizational boundaries. For comparison, an HTTP URL with the usual attribute-value pairs can be considered equivalent to a DN. An XML tuple with a content link corresponds to an LDAP entry. A set of such tuples corresponds to a set of LDAP entries. Both sets can be interpreted as a tree. Operations are provided to query, add, modify, and delete entries from the tree. The LDAP query language has the following capabilities. A query returns a set of matching entries. A query can specify a base DN, scope, filter, timeout, maximum result set size and the names of attributes to return for each matching entry. The base DN decides the position in the name space tree at which the search should be started. An empty string implies starting at the root of the tree. The scope flag indicates which entries should be considered: just the base DN entry, all immediate descendant entries of the DN, or all entries at or below the DN.
The filter is applied to each entry selected by the scope. A filter is an expression that logically compares (=, <=, >=) the string value of an attribute (email) with a string constant, optionally with a substring match joker (picture*.jpg) and approximate

string equality test (∼). Filters can be combined with Boolean AND, OR and NOT operators. For example, the query o=anl.gov,c=US??persons?(&(cn=Mark*)(sn=G*)) returns every person entry whose name starts with “Mark” and whose surname starts with “G”. Clearly the expressive power of this language is insufficient for service and resource discovery use cases and most other non-trivial questions.
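Per-entry filter evaluation of this kind can be sketched as follows; the entries and the filter representation are illustrative. Each filter conjunct is a flat attribute comparison with an optional substring joker, applied to one entry at a time, which is precisely what rules out joins, nesting and constructive results.

```python
from fnmatch import fnmatch

# Sketch of LDAP-style filter evaluation: a filter is a Boolean combination
# of flat attribute comparisons with substring jokers, applied to one entry
# at a time. The entry data and filter encoding are illustrative.

def matches(entry, conjuncts):
    # conjuncts: list of (attribute, pattern) pairs, ANDed together;
    # patterns may use the '*' substring joker as in (cn=Mark*).
    return all(fnmatch(entry.get(attr, ""), pattern)
               for attr, pattern in conjuncts)

entries = [{"cn": "Mark Green", "sn": "Green"},
           {"cn": "Mary Gold",  "sn": "Gold"},
           {"cn": "Mark Hill",  "sn": "Hill"}]

# (&(cn=Mark*)(sn=G*)) as a list of ANDed conjuncts
hits = [e for e in entries if matches(e, [("cn", "Mark*"), ("sn", "G*")])]
assert hits == [{"cn": "Mark Green", "sn": "Green"}]
```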

SQL. The relational data model is well suited for static centralized systems. However, it is unsuitable for a large distributed system spanning many administrative domains, populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources. The relational data model does not allow for semi-structured data. All tuples must be homogeneously structured in the sense that their column values comply with a strongly typed schema. Tuples with different schemas belong to different tables. Hence, a query cannot operate on a single (logical) set containing all tuples. A query must have out-of-band knowledge of the relevant table names and schemas, which themselves may not be heterogeneous but must be static, globally standardized and synchronized. This seriously limits the applicability of the relational model in the context of autonomy, decentralization, unreliability and frequent change. SQL [19] is a rich and expressive general-purpose language defined over the relational data model. In addition to the above limitations, SQL lacks hierarchical navigation as a key feature and other capabilities such as dynamic data integration, expression nesting with full generality as well as regular expression matching. As a result, some example queries cannot be expressed (e.g. QS6, QM1, QM3) and most can only be expressed with extremely complex queries over a large number of auxiliary tables, as exemplified by Figure 3.2. The same holds for inserts and updates. The relational data model and SQL are, for example, used in the Relational Grid Monitoring Architecture (RGMA) system [61] and the Unified Relational GIS Project [62].

Other. None of the example discovery queries can be satisfied with a lookup by key (e.g. globally unique name). This is the type of query assumed in most P2P systems such as DNS [20], Gnutella [21], Freenet [22], Tapestry [23], Chord [24] and Globe [25], leading to highly specialized content-addressable networks centered around the theme of distributed hash table lookup. Note further that almost no queries are exact match queries (i.e. given a flat set of attribute values find all tuples that carry exactly the same attribute values), assumed in systems such as SDS [26] and Jini [63]. They are also not fuzzy keyword searches, as used in web search engines. Next, queries do not specify that at most one result should be returned.

The limited expressiveness of the above mentioned query languages allows for easy implementation and some straightforward optimizations, but it also dramatically limits their applicability and ability to cope with changing requirements, leading to a flurry of very similar but not identical special-purpose systems, each supporting yet another narrow custom query type. These systems may well serve a special purpose important for a given niche, but are unsuitable for supporting service discovery, let alone ubiquitous service and resource discovery for a wide range of applications and user communities.

The greater the number and heterogeneity of content and applications, the more important expressive general-purpose query capabilities become. Clearly realistic ubiquitous service and resource discovery stands or falls with the ability to express queries in a rich general-purpose query language. More precisely, a query language suitable for service and resource discovery should meet the requirements stated in Table 3.1 (in decreasing order of significance). As can be seen from the table, LDAP, SQL and XPath do not meet a number of essential requirements, whereas the XQuery language meets all requirements and desiderata posed.

Capability                                                  XQuery    XPath  SQL       LDAP
-------------------------------------------------------------------------------------------------
Simple, medium and complex queries over a set of tuples     yes       no     yes       no
Query over structured and semi-structured data              yes       yes    no        yes
Query over heterogeneous data                               yes       yes    no        yes
Query over XML data model                                   yes       yes    no        no
Navigation through hierarchical data structures             yes       yes    no        exact match
  (Path Expressions)                                                                   only
Joins (combine multiple data sources into a single result)  yes       no     yes       no
Dynamic data integration from multiple heterog. sources     yes       yes    no        no
  such as databases, documents and remote services
Data restructuring patterns (e.g. SELECT-FROM-WHERE         yes       no     yes       no
  in SQL)
Iteration over sets (e.g. FOR clause)                       yes       no     yes       no
General-purpose predicate expressions (WHERE clause)        yes       no     yes       no
Nesting several kinds of expressions with full generality   yes       no     no        no
Binding of variables and creating new structures from       yes       no     yes       no
  bound variables (LET clause)
Constructive queries                                        yes       no     no        no
Conditional expressions (IF ... THEN ... ELSE)              yes       no     yes       no
Arithmetic, comparison, logical and set expressions         yes, all  yes    yes, all  log. & string
Operations on data types from a type system                 yes       no     yes       no
Quantified expressions (e.g. SOME, EVERY clause)            yes       no     yes       no
Standard functions for sorting, string, math, aggregation   yes       no     yes       no
User defined functions                                      yes       no     yes       no
Regular expression matching                                 yes       yes    no        no
Concise and easy to understand queries                      yes       yes    yes       yes

Table 3.1: Capabilities of XQuery, XPath, SQL and LDAP query languages.

3.7 Summary

This chapter develops a database and query model, as well as a generic and dynamic data model, that address the problems of large heterogeneous distributed systems spanning many administrative domains. A framework is used where there exist one or more autonomous nodes, each holding a database. The databases of nodes may be deployed in any arbitrary way according to some deployment model. For example, the databases of all nodes may be stored next to each other on a single central data server. The set of tuples in the universe is partitioned over the nodes. A link topology describes the link structure among nodes, which

XQuery

FOR $x IN document("med.xml")/medical/patient,
    $y IN document("med.xml")//patientSSno,
    $z IN $x/name
WHERE $x/@ssno = $y
RETURN $z

SQL

SELECT e3.elID AS $z
FROM Document d1, URI u1, Value v1, Element e1, QName q1, Value v2,
     Child c1, Element e2, QName q2, Value v3, Attribute a1, Value v4,
     Value v5, Child c2, Element e3, QName q3, Value v6,
     TransClosure tc1, Element e4, QName q4, Value v7, Child c3, Value v8
WHERE d1.docURIID = u1.uriID AND u1.uriValID = v1.valID AND v1.value = "med.xml"
  AND d1.rootElemID = e1.elID AND e1.elQNameID = q1.qNameID
  AND q1.qnLocalID = v2.valID AND v2.value = "medical"
  AND c1.parentID = e1.elID AND c1.childID = e2.elID
  AND e2.elQNameID = q2.qNameID AND q2.qnLocalID = v3.valID AND v3.value = "patient"
  AND a1.attrElID = e2.elID AND a1.attrNameID = v4.valID AND v4.value = "SSno"
  AND a1.attrValID = v5.valID
  AND c2.parentID = e2.elID AND c2.childID = e3.elID
  AND e3.elQNameID = q3.qNameID AND q3.qnLocalID = v6.valID AND v6.value = "name"
  AND d1.rootElemID = tc1.parentID AND tc1.childID = e4.elID
  AND e4.elQNameID = q4.qNameID AND q4.qnLocalID = v7.valID AND v7.value = "patientSSno"
  AND c3.parentID = e3.elID AND c3.childValID = v8.valID
  AND v5.value = v8.value

Figure 3.2: Translation of Hierarchy Navigating XQuery to SQL [48].

may be arbitrary. For example, in a service discovery system, a link topology such as a ring, tree or graph could tie together a distributed set of administrative domains, each hosting a registry node holding descriptions of services local to the domain.

Unlike in the relational model, the elements of a tuple in our data model can hold structured or semi-structured data in the form of any arbitrary well-formed XML document or fragment. An individual tuple element may, but need not, have a schema, in which case the element must be valid according to the schema. The elements of all tuples may, but need not, share a common schema. A tuple is a multi-purpose data container that may contain arbitrary content.

The general-purpose query model is for read-only search and operates on tuples. The concepts of (logical) query and (physical) query scope are cleanly separated rather than interwoven. A query is formulated against a global database view and is insensitive to link topology and deployment model. In other words, to a query the set of all tuples appears as a single homogeneous database, even though the set may be (recursively) partitioned across many nodes and databases. This means that in a relational or XML environment, at the global level, the set of all tuples appears as a single, very large, table or XML document, respectively. Unlike in an RDBMS, a single (logical) tuple set contains all tuples. This implies that a query need not specify a "table" or "tuple set name" to indicate the type of tuples that should be considered. Rather, predicates within the regular query language are used to select the desired tuples from the single set. The query scope, on the other hand, is used to navigate and prune the link topology and filter on attributes of the deployment model. A query is evaluated against a set of tuples. The set, in turn, is specified by the scope. Conceptually, the scope is the input fed to the query.
The query scope is a set and may contain anything from all tuples in the universe to none. A scope can be specified directly or indirectly. The data model is XML based and models a document as a tree of nodes.

Example service discovery queries are given in prose. Three query types are identified, namely simple, medium and complex. A simple query finds all tuples matching a given predicate or pattern. A medium query computes an answer over a set of tuples as a whole, allowing for use of aggregation functions. A complex query also computes an answer over a set of tuples as a whole; in addition, it has powerful capabilities to combine data from multiple sources and elements, for example through joins. Complex queries are hard to answer efficiently without centralized database architectures.

An appropriate query language (XQuery) is suggested. The suitability of the query language is demonstrated by formulating the example prose queries in the language. Detailed requirements for a query language supporting service and resource discovery are given. The capabilities of various query languages are compared.

Chapter 4

A Database for Discovery of Distributed Content

4.1 Introduction

In a distributed system, it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. This enables information discovery and collective collaborative functionality that operates on the system as a whole, rather than on a given part of it. For example, it allows a search for descriptions of services of a file sharing system, to determine its total download capacity, the names of all participating organizations, and so on. Example systems include a service discovery system, an electronic market place, or an instant messaging and news service. In all these systems, a variety of information describes the state of autonomous remote participants residing within different administrative domains. Participants frequently join, leave and act on a best effort basis. Discovery attempts to determine the global state of a distributed system. In a large distributed system spanning many administrative domains, predictable, timely, consistent and reliable global state maintenance is infeasible. The information to be aggregated and integrated may be outdated, inconsistent, or not available at all. Failure, misbehavior, security restrictions and continuous change are the norm rather than the exception. The key problem then is:

• How should a database node maintain information populated from a large variety of un- reliable, frequently changing, autonomous and heterogeneous remote data sources? In particular, how should it do so without sacrificing reliability, predictability and simplic- ity? How can powerful queries be expressed over time-sensitive dynamic information?

In this chapter, a design and specification for a centralized type of database is developed that is conditioned to address the problem: the so-called hyper registry. The hyper registry is a general-purpose XQuery database with mechanisms for content caching and soft state maintenance. The hyper registry can maintain hyperlinks and cache content pointed to by these links. A content provider can publish a hyperlink, which in turn enables the hyper registry (and third parties) to pull (retrieve) the current content. Optionally, a content provider can also include a copy of the current content as part of publication. The hyper registry may support caching, for example in server pull or client push mode or both. For reliable, predictable and simple distributed state maintenance, hyper registry tuples are maintained

as soft state. To condition for overload, limit resource consumption, and satisfy minimum requirements on content freshness, mechanisms to throttle refresh and query frequency are proposed, adaptively inviting more or less traffic over time. Content link, content cache, a hybrid pull/push communication model and the expressive power of XQuery allow for a wide range of dynamic content freshness policies, which may be driven by all three system components: content provider, hyper registry and client.

Overall Model. A hyper registry has a database that holds a set of tuples. A tuple is an annotated multi-purpose soft state data container that may contain an arbitrary piece of content. Examples of content include a service description formulated in SWSDL or WSDL [8], a file, a picture, the current network load, host information, stock quotes, etc. An example content is a service description that reads:

void submitJob(String jobdescription) http://cms.cern.ch/everybody

A second example content is a list of host system information such as:

A remote client can query a hyper registry in a query language, obtaining a set of tuples as answer. In a given context, a content provider can publish content of a given type to one or more hyper registries. More precisely, a content provider can publish a dynamic pointer called a content link, which in turn enables the hyper registry and third parties to retrieve (pull) the current content from the provider at any time. Optionally, a content provider can also include a copy of the current content as part of publication (push). In any case, a hyper registry may support caching of content. Content caching is important for client efficiency. The hyper registry may not only keep links but also a copy of the current content pointed to by the link. With caching, clients no longer need to establish a network connection for each content link in a query result set in order to obtain content. This avoids prohibitive latency, in particular in the presence of large result sets. A hyper registry may (but need not) support caching, for example in server pull or client push mode or both. For reliable, predictable and simple distributed state maintenance, a hyper registry tuple is maintained as soft state. A tuple may eventually be discarded unless refreshed by a stream of timely confirmation notifications from the content provider [37]. To this end, a tuple carries timestamps. A tuple is expired and removed unless explicitly renewed via timely periodic

publication, henceforth termed refresh. In other words, a refresh allows a content provider to cause a content link and/or cached content to remain present for a further time. To condition for overload, limit resource consumption and satisfy minimum requirements on content freshness, mechanisms to throttle refresh and query frequency are proposed, adaptively inviting more or less traffic over time. The entry barrier to participation in the system should be very low. To this end, the system is designed to be as simple as possible at the edges of the network. A content provider is cleanly decoupled from the hyper registry and only requires the ubiquitous HTTP protocol for communication with a local or remote hyper registry. For example, it may be sufficient to run an Apache server [64] and a cron job to publish content [1].
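The publication flow described above can be sketched as follows. This is a minimal, illustrative sketch: the XML element and attribute names, and the registry endpoint, are assumptions for the example, not the normative wire format of the hyper registry.

```python
# Minimal content publisher sketch. A publisher POSTs a tuple set to a hyper
# registry over plain HTTP; the tuple/tupleset element names used here are
# illustrative assumptions, not a normative format.
from urllib import request

def build_tuple_xml(link, context, type_, ts1, ts2, ts3, content=None):
    """Serialize one publication tuple; content is optional (push vs. link-only)."""
    body = content if content is not None else ""
    return ('<tuple link="%s" context="%s" type="%s" '
            'TS1="%d" TS2="%d" TS3="%d">%s</tuple>'
            % (link, context, type_, ts1, ts2, ts3, body))

def publish(registry_url, tuples):
    """POST a tuple set to a hyper registry (HTTP is the only transport needed)."""
    payload = "<tupleset>%s</tupleset>" % "".join(tuples)
    req = request.Request(registry_url, data=payload.encode("utf-8"),
                          headers={"Content-Type": "text/xml"})
    with request.urlopen(req) as resp:   # network call; requires a live registry
        return resp.status

t = build_tuple_xml("http://sched001.cern.ch/getServiceDescription",
                    "parent", "service", 10, 20, 30)
```

A cron job invoking such a publisher periodically is all it takes to keep soft state alive, which is precisely why the entry barrier at the network edge stays low.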

4.2 Content Link and Content Provider

Content Link. A content link is an HTTP(S) URL pointing to the content of a content provider. An HTTP(S) GET request to the link must return the current content, subject to local security policy [2]. In other words, a simple hyperlink is employed. In the context of service discovery, we use the term service link to denote a content link that points to a service description. As in the WWW, content links can freely be chosen as long as they conform to the HTTP URL specification [65]. Hence, they may contain the usual URL-encoded attribute-value pairs. The semantics of the structure of a given content link are opaque to a client. Examples of content links are:

http://sched.cern.ch:8080/getServiceDescription.wsdl
https://cms.cern.ch/getServiceDescription?id=4712&cache=disable
http://sched.infn.it/getHostInfo
http://phone.cern.ch/lookup?query="select phone from book where phone=4711"
http://repcat.cern.ch/getPhysicalFileNames?lfn="myLogicalFileName"
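Because a content link is nothing more than an HTTP(S) URL, content retrieval needs no special machinery; a plain GET suffices. The sketch below illustrates this, including how the usual URL-encoded attribute-value pairs (whose semantics remain opaque to the client) are formed:

```python
# Content retrieval sketch: an HTTP(S) GET on a content link must return the
# current content, subject to local security policy. Illustrative only.
from urllib import request, parse

def retrieve(content_link, timeout=10):
    """Pull the current content behind a content link via HTTP GET."""
    with request.urlopen(content_link, timeout=timeout) as resp:  # network call
        return resp.read()

# A client may build links only by the rules of HTTP URL syntax, e.g. by
# URL-encoding a query parameter; what the parameter means is provider-defined:
link = "http://phone.cern.ch/lookup?" + parse.urlencode(
    {"query": "select phone from book where phone=4711"})
```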

Content Provider. A content provider offers information conforming to a homogeneous global data model. In order to do so, it typically uses some kind of internal mediator to transform information from a local or proprietary data model to the global data model. A content provider can be seen as a gateway to heterogeneous content sources. The global data model is the Dynamic Data Model (DDM) introduced in Section 3.3. Hence, content can be structured or semi-structured data in the form of any arbitrary well-formed XML [11] document or fragment. Individual content may, but need not, have a schema (XML Schema [12]), in which case content must be valid according to the schema. All content may, but need not, share a common schema. This flexibility is important for integration of heterogeneous content. A content provider is an umbrella term for two components, namely a presenter and a publisher. The presenter is a service and answers HTTP(S) GET content retrieval requests from a hyper registry or client (subject to local security policy). The publisher is a piece of

[1] cron is a standard Unix daemon that periodically executes programs or shell scripts according to flexible configuration data.
[2] HTTP 1.1 should be supported to allow for efficient persistent connections.

code that publishes a content link, and perhaps also content, to a hyper registry. The publisher need not be a service, although it uses HTTP(S) POST as the transport for its communications. The structure of a content provider and its interaction with a hyper registry and a client are depicted in Figure 4.1 (a). Note that a client can bypass a hyper registry and directly pull current content from a provider.

Figure 4.1: Content Provider and Hyper Registry. [(a) A content provider consists of a publisher and a presenter; a mediator maps the heterogeneous data model of the underlying content source to the homogeneous registry data model. The publisher (re)publishes content links, without content (pull) or with content (push), via HTTP POST; the registry and remote clients retrieve content via HTTP GET and query the registry database. (b) A registry with several clients and content providers.]

Figure 4.2: Example Content Providers. [Each provider supports publish & refresh as well as retrieve. Examples: a cron job publishing XML files served by an Apache server; a monitor or Java servlet converting RDBMS or LDAP data to XML; a cron job publishing host state from legacy tools (cat /proc/cpuinfo, uname, netstat); and replica catalog service(s) that (re)compute their service description(s).]

Just as in the dynamic WWW, which allows for a broad variety of implementations of the given protocol, it is left unspecified how a presenter computes content on retrieval. Content can be static or dynamic (generated on the fly). For example, a presenter may serve the content directly from a file or database, or from a potentially outdated cache. For increased accuracy, it may also dynamically recompute the content on each request. Consider the examples in Figure 4.2. A simple but nonetheless very useful content provider uses a commodity HTTP server such as Apache to present XML content from the file system.

A simple cron job monitors the health of the Apache server and publishes the current state to a hyper registry. Another example of a content provider is a Java servlet that makes available data kept in a relational or LDAP database system. A content provider can execute legacy command line tools to publish system state information such as network statistics, operating system and type of CPU. Yet another example of a content provider is a network service such as a replica catalog that (in addition to servicing replica lookup requests) publishes its service description and/or link so that clients may discover and subsequently invoke it. Figure 4.1 (b) illustrates a hyper registry with several content providers and clients. Providers and hyper registries can be deployed and configured arbitrarily. For example, in a strategy for scalable administration of large cluster environments, a single shared Apache web server can easily be configured to serve XML descriptions of thousands of services on hundreds of hosts. For example, via a naming convention we can assign a distinct web server directory and a corresponding service link to each host-service combination. To serve descriptions, it is sufficient to have some administrative cron job run periodically on each service host, which writes the current service description into an appropriate XML file in the appropriate directory on the web server. Presenter and publisher are not required to run in the same process or even on the same host. For efficiency, a so-called container of a virtual hosting environment may be used to run more than one provider in the same process or thread.
For example, a highly efficient and scalable container such as the Apache Tomcat servlet engine [66] not only can serve many hundreds to a thousand (light-weight) concurrent requests per second on a commodity PC, but it can also embed any number of dynamic provider types in the same process, with each provider type being capable of serving any number of provider instances. Typically, a provider is remote to the hyper registry. This is not a requirement, however. A provider may also be local to the hyper registry and connect through a loop-back connection. For further efficiency, a hyper registry may internally host any number of built-in providers. Using commodity servlet technology, any number of hyper registries can be hosted within the same container.

4.3 Publication

In a given context, a content provider can publish content of a given type to one or more hyper registries. More precisely, a content provider can publish a dynamic pointer called a content link, which in turn enables the hyper registry (and third parties) to retrieve the current content. For efficiency, the publish operation takes as input a set of zero or more tuples. Each input tuple has a content link, a type, a context, some time stamps, and (optionally) arbitrary metadata and content extensibility elements. Consider the following example tuple.

Link                                           Context  Type     TS1  TC  TS2  TS3  Metadata  Content
http://sched001.cern.ch/getServiceDescription  Parent   Service  10   10  20   30   ...       <service ... "http://cern.ch"/> </service>

A tuple is an annotated multi-purpose soft state data container that may contain a piece of arbitrary content and allows for refresh of that content at any time, as depicted in Figure 4.3.

Tuple := Link, Type, Context, Timestamps, Metadata, Content (optional)

Semantics:
    HTTP GET(tuple.link)       --> tuple.content
    type(HTTP GET(tuple.link)) --> tuple.type

Figure 4.3: Tuple Link allows for Refresh of Tuple Content at any time.

• Link. The content link is an HTTP(S) URL as introduced above. Given the link the current content of a content provider can be retrieved (pulled) at any time.

• Type. The type describes what kind of content is being published (e.g. service, application/octet-stream, image/jpeg, networkLoad, hostinfo).

• Context. The context describes why the content is being published or how it should be used (e.g. child, parent, x-ireferral, gnutella, monitoring). For example, the parent and child context will later be used for top-down query routing through node topologies with explicit hierarchical structure. Note that it is often not meaningful to embed context information in the content itself, because a given content can be published in different contexts at different hyper registries (1:N association). For example, a service can be a child of some nodes and at the same time be a parent of some other nodes. However, its service description (content) should clearly remain invariant. In addition, context and type allow a query to differentiate on crucial attributes even if content caching is not supported or not authorized [3].

• Timestamps TS1, TS2, TS3, TC. Discussion of timestamps is deferred to Section 4.6 below.

• Metadata. The metadata element further describes the content and/or its retrieval beyond what can be expressed with the previous attributes. For example, the metadata may be a secure digital signature [68] of the content. It may describe the authoritative content provider or owner of the content. Another metadata example is a Web Service Inspection Language (WSIL) document [69], or a fragment thereof, specifying additional content retrieval mechanisms beyond HTTP content link retrieval. The metadata argument may be any well-formed XML document or fragment. It is an extensibility element enabling customization and flexible evolution. It is optional.

[3] For clarity of exposition, the examples given here use short strings as values for type and context (e.g. service, parent). However, strictly speaking, this is not legal. To avoid namespace pollution and ambiguities, the value of a type must be a universally unique URI [65], a MIME content-type [67] (e.g. application/octet-stream, image/jpeg, audio/mpeg) or the empty string. For XML content observing an XML Schema [12], the type must equal the URI of the schema namespace. The value of a context must be a URI or the empty string. For example, a service description really is of type http://gridforum.org/content-type/service-1.0 whereas the parent context really has the value http://gridforum.org/content-context/parent-1.0. The idea is not new. For example, HTTP, SMTP, XML namespaces and schema types [12] employ the same approach towards identifiers.

• Content. Given the link, the current content of a content provider can be retrieved (pulled) at any time. Optionally, a content provider can also include a copy of the current content as part of publication (push). For clarity of exposition, the published content is an XML element [4].

According to the Dynamic Data Model defined in Section 3.3, an XML representation for the set of tuples is used. Content retrieval and query operations have signatures of the form:

    XML getServiceDescription()
    XML query(XQuery query)

The publish operation of a hyper registry has the following signature:

    void publish(XML tupleset)

[4] In the general case (allowing non-text based content types such as image/jpeg), the content is a MIME object. The XML based publication input set and query result set is augmented with an additional MIME multipart object [67], which is a list containing all content. The content element of the result set is interpreted as an index into the MIME multipart object. A typical hyper registry that supports caching can handle content with at least the MIME content-types text/xml and text/plain.

Within a tuple set, a tuple is uniquely identified by its tuple key, which is the pair (content link, context). If a key does not already exist on publication, a tuple is inserted into the hyper registry database. An existing tuple can be updated by publishing other values under the same tuple key. An existing tuple (key) is "owned" by the content provider that created it with the first publication. It is recommended that a content provider with another identity not be permitted to publish or update the tuple.
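The insert-or-update semantics of publication can be sketched as follows. This is an in-memory sketch under stated assumptions: the dictionary-based tuple representation, the field names and the provider identities are illustrative, and the ownership check implements the recommended (not mandated) policy.

```python
# Sketch of publish semantics: tuples are keyed by (content link, context).
# First publication inserts the tuple and fixes its owner; later publications
# under the same key update it only for the same provider identity.

class HyperRegistry:
    def __init__(self):
        self.tuples = {}   # (link, context) -> tuple dict

    def publish(self, provider_id, tupleset):
        for t in tupleset:
            key = (t["link"], t["context"])
            existing = self.tuples.get(key)
            if existing is None:
                self.tuples[key] = dict(t, owner=provider_id)   # insert
            elif existing["owner"] == provider_id:
                existing.update(t)                              # update same key
            # else: reject, per the recommended ownership policy

reg = HyperRegistry()
reg.publish("providerA", [{"link": "http://sched001.cern.ch/getServiceDescription",
                           "context": "parent", "type": "service", "TS1": 10}])
# A different identity publishing under the same key is rejected:
reg.publish("providerB", [{"link": "http://sched001.cern.ch/getServiceDescription",
                           "context": "parent", "type": "service", "TS1": 99}])
```

Note that the same content link may legitimately appear under several contexts; each (link, context) pair is an independent tuple.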

4.4 Query

Minimalist Query. As a minimum, clients can query the hyper registry by invoking minimalist query operations such as getTuples. The getTuples query operation takes no arguments and returns the full set of all published tuples "as is". That is, query output format and publication input format are the same. If supported, the output includes cached content. An example result set for a query reads (discussion of the timestamp attributes TS is again deferred to Section 4.6 below):

service description A goes here


The getLinks query operation is similar in that it also takes no arguments and returns the full set of all tuples. However, it always substitutes an empty string for cached content. In other words, the content is omitted from tuples, potentially saving substantial bandwidth. The second tuple in the example above has such a form.
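The difference between the two minimalist operations can be sketched in a few lines. The in-memory tuple representation and field names below are illustrative assumptions, not the registry's actual storage format.

```python
# getTuples returns published tuples "as is" (including cached content, if
# supported); getLinks substitutes an empty string for cached content,
# potentially saving substantial bandwidth on large result sets.

def get_tuples(db):
    return [dict(t) for t in db]

def get_links(db):
    return [dict(t, content="") for t in db]

db = [{"link": "http://sched001.cern.ch/getServiceDescription",
       "context": "parent", "content": "<service>...</service>"},
      {"link": "http://repcat.cern.ch/getStatistics",
       "context": "", "content": ""}]
```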

XQuery. Clearly many kinds of sophisticated query capabilities can be introduced. Advanced query support can be expressed on top of the basic capabilities introduced so far. Here we assume that a node has the advanced capability to execute XQueries over the set of tuples it holds in its database. The XQuery language allows for powerful searching, which is critical for non-trivial applications. Example XQueries have already been given in Section 3.5. The same rules that apply to minimalist queries also apply to XQuery support. An implementation might use a modular and simple XQuery processor such as Quip [70] for an operation such as XML query(XQuery query). Because not only the content, but also the content link, context, type, time stamps, metadata etc. are part of a tuple, a query can also select on this information.

Deployment. For flexibility, a hyper registry may be deployed in any arbitrary way (deployment model). For example, the database can be kept in an XML file, in the same format as returned by the getTuples query operation. However, tuples can certainly also be kept in a remote relational database (table), for example as follows:

Link                                                     Context  Type      TS1  TC  TS2  TS3  Metadata  Content
http://sched001.cern.ch/getServiceDescription            Parent   Service   10   15  20   30   null      A (<service ... "http://cms.cern.ch"/> </service>)
http://sched.infn.it:8080/pub/getServiceDescription      Child    Service   20   25  30   40   null      B (<service> ... </service>)
http://repcat.cern.ch/pub/getServiceDescription?id=4711  Child    Service   30   0   40   50   null      null
http://repcat.cern.ch/pub/getStatistics                  Null     RepStats  60   65  70   80   null      ...

4.5 Caching

Content caching is important for client efficiency. The hyper registry may not only keep content links but also a copy of the current content pointed to by the link. With caching, clients no longer need to establish a network connection for each content link in a query result set in order to obtain content. This avoids prohibitive latency, in particular in the presence of large result sets. A hyper registry may (but need not) support caching. A hyper registry that does not support caching ignores any content handed over from a content provider. It keeps content links only. Instead of cached content, it returns empty strings. Cache coherency issues

arise. The query operations of a caching hyper registry may return tuples with stale content, i.e. content that is out of date with respect to its master copy at the content provider. A caching hyper registry may implement a strong or weak cache coherency policy. A strong cache coherency policy is server invalidation [71]. Here a content provider notifies the hyper registry with a publication tuple whenever it has locally modified the content. We use this approach in an adapted version where a caching hyper registry can operate according to the client push pattern (push hyper registry), the server pull pattern (pull hyper registry), or a hybrid thereof. The respective interactions are as follows:

• Pull Hyper Registry. A content provider publishes a content link. The hyper registry then pulls the current content via content link retrieval into the cache. Whenever the content provider modifies the content, it notifies the hyper registry with a publication tuple carrying the time the content was last modified. The hyper registry may then decide to pull the current content again, in order to update the cache. It is up to the hyper registry to decide if and when to pull content. A hyper registry may pull content at any time. For example, it may dynamically pull fresh content for tuples affected by a query. This is important for frequently changing dynamic data such as network load.

• Push Hyper Registry. A publication tuple pushed from a content provider to the hyper registry contains not only a content link but also its current content. Whenever a content provider modifies content, it pushes the new content to the hyper registry, which may update the cache accordingly [5].

• Hybrid Hyper Registry. A hybrid hyper registry implements both pull and push interactions. If a content provider merely notifies that its content has changed, the hyper registry may choose to pull the current content into the cache. If a content provider pushes content, the cache may be updated with the pushed content. This is the type of hyper registry subsequently assumed whenever a caching hyper registry is discussed.
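The hybrid interaction above can be sketched as a single publication handler. This is an illustrative sketch: the callback-based pull, the tuple fields and the cache layout are assumptions; in particular, whether and when to pull remains a registry decision, which is modeled here by making the retrieval function an optional parameter.

```python
# Hybrid cache-update sketch: a pushed tuple with content updates the cache
# directly; a link-only notification may trigger a pull into the cache.
import time

def on_publication(cache, tuple_, pull=None):
    """cache: dict keyed by (link, context); pull: optional retrieval callback."""
    key = (tuple_["link"], tuple_["context"])
    if tuple_.get("content"):                 # push: provider included content
        cache[key] = (tuple_["content"], time.time())
    elif pull is not None:                    # pull: registry fetches on notify
        cache[key] = (pull(tuple_["link"]), time.time())
    # a non-caching registry would ignore content and keep links only

cache = {}
on_publication(cache, {"link": "http://x/getHostInfo", "context": "",
                       "content": "<hostinfo/>"})
on_publication(cache, {"link": "http://y/getLoad", "context": ""},
               pull=lambda link: "<load>0.7</load>")
```

In a real registry the pull callback would be an HTTP GET on the content link; for frequently changing data such as network load, the registry might invoke it lazily, only for tuples affected by a query.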

A non-caching hyper registry ignores content elements, if present. A publication is said to be without content if the content is not provided at all. Otherwise, it is said to be with content. Publication without content implies that no statement at all about cached content is being made (neutral). It does not imply that content should not be cached or invalidated. A client must not assume that content is cached. For example, a hyper registry may not implement caching or it may be denied authorization when attempting to retrieve a given content, or it may ignore content provided on provider push. While it may be harmless

[5] If a push hyper registry does not also implement the pull pattern for hybrid behavior, the content link may (but need not) merely serve as an identifier for the content. As far as the hyper registry is concerned, the link need not be alive and point to meaningful content, because it never pulls anyway. The example host information content given previously may be identified by a unique but non-existing content link such as http://fred.cern.ch/blabla. The hyper registry may be used as a "dumb" repository into which content providers store information that cannot or should not be retrieved directly from the provider itself. In other words, the cache may be (ab)used as a content store. It is meaningless to have a push hyper registry that does not implement "caching".

that potentially anybody can learn that some content exists (content link), stringent trust delegation policies may dictate that only a few select clients, not including the hyper registry, are allowed to retrieve the content from a provider. Consider, for example, that a detailed service description may be helpful for launching well-focused security attacks. In addition, a hyper registry may have an authorization policy. For example, depending on the identity of a client, a registry may return a subset of the full result set, or hide cached content and instead return the tuple with empty content substituted. Similarly, depending on content provider identity, pushed content may be ignored or rejected, or publication denied altogether.

4.6 Soft State

For reliable, predictable and simple distributed state maintenance, a hyper registry tuple is maintained as soft state. A tuple may eventually be discarded unless refreshed by a stream of timely confirmation notifications from a content provider. To this end, a tuple carries timestamps. A tuple is expired and removed unless explicitly renewed via timely periodic publication, henceforth termed refresh. In other words, a refresh allows a content provider to cause a content link and/or cached content to remain present for a further time. The strong cache coherency policy of server invalidation is extended. For flexibility and expressiveness, the ideas of the Grid Notification Framework [37] are adapted. The publication operation takes four absolute time stamps TS1, TS2, TS3, TC per tuple [6]. The semantics are as follows. The content provider asserts that its content was last modified at time TS1 and that its current content is expected to be valid from time TS1 until at least time TS2. It is expected that the content link is alive between time TS1 and at least time TS3. Time stamps must obey the constraint TS1 ≤ TS2 ≤ TS3. TS2 triggers expiration of cached content, whereas TS3 triggers expiration of content links. Usually, TS1 equals the time of last modification or first publication, TS2 equals TS1 plus some minutes or hours, and TS3 equals TS2 plus some hours or days. For example, TS1, TS2 and TS3 can reflect publication time, 10 minutes, and 2 hours, respectively. A tuple also carries a timestamp TC that indicates the time when the tuple's embedded content (not the provider's master copy of the content) was last modified, typically by the last intermediary in the path between client and content provider (e.g. the hyper registry). If a content provider publishes with content, then we usually have TS1=TC. TC must be zero-valued if the tuple contains no content. Hence, a hyper registry not supporting caching always has TC set to zero.
For example, a highly dynamic network load provider may publish its link without content and TS1=TS2 to suggest that it is inappropriate to cache its content. Constants are published with content and TS2=TS3=infinity, TS1=TC=currentTime. These soft state time stamp semantics are summarized in Table 4.1.

  Time Stamp   Semantics
  TS1          Time the content provider last modified the content
  TC           Time the embedded tuple content was last modified (e.g. by an intermediary)
  TS2          Expected time until which the current content at the provider is at least valid
  TS3          Expected time until which the content link at the provider is at least valid (alive)

Table 4.1: Soft State Time Stamp Semantics.

Insert, update and delete of tuples occur at the timestamp-driven state transitions summarized in Figure 4.4. Within a tuple set, a tuple is uniquely identified by its tuple key, which is the pair (content link, context). A tuple can be in one of three states: unknown, not cached, or cached. A tuple is unknown if it is not contained in the hyper registry (i.e. its key does not exist). Otherwise, it is known. When a tuple is assigned not cached state, its last internal modification time TC is (re)set to zero and the cache is deleted, if present. For a not cached tuple we have TC < TS1. When a tuple is assigned cached state, the content is updated and TC is set to the current time. For a cached tuple, we have TC ≥ TS1.

⁶To allow for meaningful comparison, sources generating time stamps and sinks processing time stamps must be synchronized, for example using the NTP network time protocol [72]. Further, they must share a common representation of time, for example as proposed in [73]. We subsequently assume a straightforward standard representation: the difference, measured in milliseconds, between the given UTC time and midnight, January 1, 1970 UTC.

[Figure 4.4 shows the soft state transitions as a state diagram over the states UNKNOWN, NOT CACHED and CACHED: publish without content moves a tuple to NOT CACHED; publish with content (push) or retrieve (pull) moves it to CACHED; currentTime > TS2 or TS1 > TC degrades CACHED to NOT CACHED; currentTime > TS3 moves any known tuple back to UNKNOWN.]

Figure 4.4: Soft State Transitions.

A tuple moves from unknown to cached or not cached state if the provider publishes with or without content, respectively. A tuple becomes unknown if its content link expires (currentTime > TS3); the tuple is then deleted. A provider can force tuple deletion by publishing with currentTime > TS3. A tuple is upgraded from not cached to cached state if a provider push publishes with content or if the hyper registry pulls the current content itself via retrieval. On content pull, a hyper registry may leave TS2 unchanged, but it may also follow a policy that extends the lifetime of the tuple (or any other policy it sees fit). A tuple is degraded from cached to not cached state if the content expires. Such expiry occurs when no refresh is received in time (currentTime > TS2), or if a refresh indicates that the provider has modified the content (TC < TS1).

The following pseudo-code illustrates publication and pull retrieval:

  publish(..., TS1’, TS2’, TS3’, content’) :=
      if unknown then setContent(null)
      if content’ != null then setContent(content’)
      TS1 = TS1’, TS2 = TS2’, TS3 = TS3’
      trigger time driven state transitions ...

  pull(content’) :=
      setContent(content’)
      if policy = "extend lifetime" then TS2 = currentTime() + 60 seconds
      trigger time driven state transitions ...

  setContent(content’) :=
      content = content’
      if content = null then state = not cached, TC = 0
      else state = cached, TC = currentTime()
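The pseudo-code above can be condensed into a small executable state machine. The sketch below is illustrative: the class and method names, and the 60-second lifetime extension, mirror the pseudo-code rather than any normative API.

```python
import time

UNKNOWN, NOT_CACHED, CACHED = "unknown", "not cached", "cached"

class SoftStateTuple:
    """Soft state tuple, uniquely keyed by (content link, context)."""
    def __init__(self):
        self.state = UNKNOWN
        self.content = None
        self.TS1 = self.TS2 = self.TS3 = 0  # provider time stamps
        self.TC = 0                         # last internal modification time

    def set_content(self, content, now=None):
        self.content = content
        if content is None:
            self.state, self.TC = NOT_CACHED, 0
        else:
            self.state, self.TC = CACHED, (time.time() if now is None else now)

    def publish(self, ts1, ts2, ts3, content=None, now=None):
        assert ts1 <= ts2 <= ts3, "must obey TS1 <= TS2 <= TS3"
        if self.state == UNKNOWN:
            self.set_content(None)
        if content is not None:
            self.set_content(content, now)
        self.TS1, self.TS2, self.TS3 = ts1, ts2, ts3
        self.expire(now)

    def pull(self, content, extend_lifetime=False, now=None):
        now = time.time() if now is None else now
        self.set_content(content, now)
        if extend_lifetime:
            self.TS2 = now + 60  # policy: extend lifetime by 60 seconds
        self.expire(now)

    def expire(self, now=None):
        """Time-driven state transitions of Figure 4.4."""
        now = time.time() if now is None else now
        if now > self.TS3:
            self.state = UNKNOWN      # content link expired: tuple is deleted
        elif self.state == CACHED and (now > self.TS2 or self.TC < self.TS1):
            self.set_content(None)    # cached content expired or modified
```

A registry implementation would call expire() periodically or lazily on each query, so that state transitions occur even without incoming publications.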

If a hyper registry receives multiple relevant publication tuples for a particular tuple key (i.e. (content link, context) pair), only the tuple with the most recent TS1 is considered (update). The other tuples referring to the key are discarded. Finally, note that strictly speaking, a hyper registry may interpret time stamps as hints rather than orders. This helps avoid abuse by providers, for example via infinite time stamps.
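The most-recent-TS1 rule can be illustrated as follows; the dictionary-based tuple layout is a hypothetical stand-in for the registry's internal representation:

```python
def dedupe(publications):
    """Keep, per (content link, context) key, only the tuple with the most recent TS1."""
    latest = {}
    for t in publications:
        key = (t["content_link"], t["context"])
        if key not in latest or t["TS1"] > latest[key]["TS1"]:
            latest[key] = t
    return latest

pubs = [
    {"content_link": "http://cms.cern.ch/cpu", "context": "child", "TS1": 100},
    {"content_link": "http://cms.cern.ch/cpu", "context": "child", "TS1": 130},  # newer refresh wins
    {"content_link": "http://cms.cern.ch/net", "context": "child", "TS1": 120},
]
latest = dedupe(pubs)
```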

4.7 Flexible Freshness

Content link, content cache, a hybrid pull/push communication model and the expressive power of XQuery allow for a wide range of dynamic content freshness policies, which may be driven by all three system components: content provider, hyper registry and client. All three components may indicate how to manage content according to their respective notions of freshness.

For example, a content provider can model the freshness of its content by pushing appropriate timestamps and content. A hyper registry can model the freshness of its content via controlled acceptance of provider publications and by actively pulling fresh content from the provider. If a result (e.g. network statistics) is up to date according to the hyper registry, but out of date according to the client, the client can pull fresh content from providers as it sees fit. However, this is inefficient for large result sets. Nevertheless, it is important for clients that query results are returned according to their notion of freshness, in particular in the presence of frequently changing dynamic content.

Recall that it is up to the hyper registry to decide to what extent its cache is stale, and if and when to pull fresh content. For example, a hyper registry may implement a policy that dynamically pulls fresh content for a tuple whenever a query touches (affects) the tuple. For example, if a query interprets the content link URL as an identifier within a hierarchical name space (e.g. as in LDAP) and selects only tuples within a sub-tree of the name space, only these tuples should be considered for refresh. The more powerful a query language, the more complex is the required logic for query parsing, analysis, decomposition and merging, etc.

Refresh-on-client-demand. So far, a hyper registry must guess what a client’s notion of freshness might be, while at the same time maintaining its decisive authority. A client still has no way to indicate (as opposed to force) its view of the matter to a hyper registry. We propose to address this problem with a simple and elegant refresh-on-client-demand strategy

under control of the hyper registry’s authority. The strategy exploits the rich expressiveness and dynamic data integration capabilities of the XQuery language. The client query may itself inspect the time stamp values of the set of tuples. It may then decide itself to what extent a tuple is considered interesting yet stale. If the query decides that a given tuple is stale (e.g. if type="networkLoad" AND TC < currentTime() - 10), it calls the XQuery document(URL contentLink) function with the corresponding content link in order to pull and be handed fresh content, which it then processes in any desired way.

This mechanism makes it unnecessary for a hyper registry to guess what a client’s notion of freshness might be. It also implies that a hyper registry does not require complex logic for query parsing, analysis, splitting, merging, etc. Moreover, the fresh results pulled by a query can be reused for subsequent queries. Since the query is executed within the hyper registry, the hyper registry may implement the document function such that it not only pulls and returns the current content, but as a side effect also updates the tuple cache in its database. A hyper registry retains its authority in the sense that it may apply an authorization policy, or perhaps ignore the query’s refresh calls altogether and return the old content instead.

The refresh-on-client-demand strategy is simple, elegant and controlled. It improves efficiency by avoiding overly eager refreshes typically incurred by a guessing hyper registry policy. This basic idea could be used in variations that are more ambitious. For example, in an attempt to improve efficiency via “batching”, a query may collect the content links of all tuples it considers stale into a set, and hand the set to a documents(URL[]) function provided by the hyper registry, which then fetches fresh content in a batched fashion.
Alternatively, a query may use this approach in a non-blocking manner, merely indicating that the hyper registry should soon refresh the given tuples (asynchronously), while the old tuples are still fine for the current query.

The theme can be developed further. In a hierarchical P2P environment with caching nodes along the query route, it may be preferable to have the documents(URL[]) function forward the refresh set as a query to the next node along the query route, instead of directly pulling from the content provider. This refreshes all node caches along the route, possibly at the expense of increased latency. As a different type of optimization, the hyper registry may reduce latency by keeping alive the TCP connections to content providers, which is often impractical to do for clients.

To summarize, a wide range of dynamic content freshness policies can be supported, which may be driven by all three system components: content provider, hyper registry and client. All three components may indicate how to manage content according to their respective notions of freshness.
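The registry-side document function with its cache-update side effect and authority hook can be sketched as follows; fetch_current stands in for an HTTP GET on the content link, and all names are illustrative rather than a normative API:

```python
class HyperRegistry:
    """Sketch: XQuery document(contentLink) with a cache-update side effect."""
    def __init__(self, fetch_current, allow_refresh=lambda link: True):
        self.fetch_current = fetch_current  # stand-in for HTTP GET on the content link
        self.allow_refresh = allow_refresh  # authorization / policy hook
        self.cache = {}                     # content link -> cached content

    def document(self, content_link):
        # Called by the client query when it deems a tuple stale.
        if not self.allow_refresh(content_link):
            # Registry retains authority: ignore the refresh, return old content.
            return self.cache.get(content_link)
        fresh = self.fetch_current(content_link)
        self.cache[content_link] = fresh    # side effect: update the tuple cache
        return fresh

    def documents(self, content_links):
        # Batched variant: refresh a whole set of stale tuples at once.
        return [self.document(link) for link in content_links]
```

In a hierarchical deployment, documents() could instead forward the refresh set to the next node along the query route, as discussed above.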

4.8 Throttling

Clearly there is a tradeoff between the resource consumption caused by refreshes and state consistency. The higher the refresh frequency, the more consistent and up-to-date the state, and the more resources are consumed. High frequency refresh can consume significant network bandwidth, due to pathological client misbehavior, denial-of-service attacks, or sheer popularity. Implementations using high frequency refresh rates can encounter serious latency limitations due to the very expensive nature of secure (and even insecure) network connection

setup for publication. Keep-alive connections should be used to minimize setup time. However, they only partly address the problem. Consider, for example, an automatically adapting search engine indexing one million services, each refreshing every minute with a message of 200 bytes. Just to stay up-to-date, the search engine must scale to 17000 refreshes/sec and its maintainer must pay for a WAN bandwidth of at least 3.4 MB/sec.

To condition for overload, limit resource consumption and satisfy minimum requirements on content freshness, mechanisms to throttle refresh frequency are proposed, adaptively inviting more or less traffic over time. For example, the search engine hyper registry may ask the content providers to wait at least 100 minutes between refreshes. Conversely, a hyper registry of a job scheduler depending on very fresh CPU load measurements from job execution services may want to invite execution service providers to refresh at least every second. For example, the service administrator can set the aggregate bandwidth available for refresh to a fraction of the total available system bandwidth. [74, 75] suggest determining the maximum refresh rate for a given source service from historic bandwidth statistics over refreshes. This avoids overload, yet helps to maintain scalability in the number of source services.

Accordingly, the publication operation returns two time stamps TS4 and TS5, which we call minimum idle time and maximum idle time, respectively. The semantics of the minimum idle time are as follows: “Publication was successful, but in the future you may be dropped and denied service if you do not wait at least until time TS4 before the next refresh”. The semantics of the maximum idle time are as follows: “Publication was successful, but in the future you may be dropped and denied service if you do not refresh before time TS5”. We have minimum idle time < maximum idle time.
A simple hyper registry always returns zero and infinity as minimum idle and maximum idle time, respectively. Content providers ignoring throttling warnings can be dropped and denied service without further notice, for example at the local, firewall or Internet Service Provider (ISP) level. Analogously, query operations return a minimum idle time (TS4) as part of the result set. The semantics are as follows: “The query was successful, and here is the result set. However, in the future you may be dropped and denied service if you do not wait at least until time TS4 before the next query”.
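The example numbers can be checked with a few lines; the text's 17000 refreshes/sec and 3.4 MB/sec follow from rounding 1,000,000/60 ≈ 16667 up to 17000 and multiplying by 200 bytes. The idle-time window check below is an illustrative reading of the TS4/TS5 semantics, not a prescribed algorithm.

```python
def refresh_load(num_services, refresh_interval_s, message_bytes):
    """Aggregate refresh rate and bandwidth a registry must sustain."""
    rate = num_services / refresh_interval_s  # refreshes per second
    bandwidth = rate * message_bytes          # bytes per second
    return rate, bandwidth

def within_idle_window(now, ts4, ts5):
    """A provider honoring throttling refreshes no earlier than TS4 (minimum
    idle time) and no later than TS5 (maximum idle time)."""
    return ts4 <= now <= ts5

# The search engine example: one million services, one refresh per minute,
# 200 bytes per message.
rate, bw = refresh_load(num_services=1_000_000, refresh_interval_s=60, message_bytes=200)
```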

4.9 Related Work

Web Proxy Caches. A weak cache coherency policy popular with web proxy caches is adaptive TTL [71]. Here the problem is handled by adjusting the time-to-live of a content based on observations of its lifetime. Adaptive TTL takes advantage of the fact that content lifetime distribution tends to be bimodal; if a given content has not been modified for a long time, it tends to stay unchanged. Thus, the time-to-live attribute of a given content is assigned to be a percentage of the content’s current “age”, which is the current time minus the last modified time of the document.

The “web server accelerator” [76] resides in front of one or more web servers to speed up user accesses. It provides an API, which allows application programs to explicitly add, delete, and update cached data. The API allows the accelerator to cache dynamic as well as static data. Invalidating and updating cached data is facilitated by the Data Update Propagation

(DUP) algorithm, which maintains data dependence information between cached data and underlying data in a graph [77].
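The adaptive TTL heuristic described above can be sketched as follows; the 20% factor and the clamping bounds are illustrative choices, not values prescribed by [71]:

```python
def adaptive_ttl(now, last_modified, factor=0.2, min_ttl=60, max_ttl=86_400):
    """Adaptive TTL: time-to-live is a percentage of the content's current age.

    Content unmodified for a long time tends to stay unchanged, so old content
    earns a long TTL; recently modified content earns a short one.
    """
    age = max(0, now - last_modified)     # age = current time - last modified time
    ttl = factor * age                    # TTL as a percentage of the age
    return min(max(ttl, min_ttl), max_ttl)
```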

RDBMS. Relational database systems provide SQL as a powerful query language. They do not support an XML data model and the XQuery language (see Section 3.6). Further, they do not provide soft state based publication, content caching and throttling. Content freshness is not addressed. Our work does not compete with an RDBMS, though. A hyper registry may well internally use an RDBMS for data management.

UDDI. UDDI (Universal Description, Discovery and Integration) [10] is an emerging industry standard that defines a business oriented access mechanism to a registry holding XML based WSDL service descriptions. It is not designed to be a registry holding arbitrary content. UDDI is not based on soft state, which implies that there is no way to dynamically manage and remove service descriptions from a large number of autonomous third parties in a reliable, predictable and simple way. It does not address the fact that services often fail or misbehave or are reconfigured, leaving a registry in an inconsistent state. Content freshness is not addressed. As such, UDDI only appears to be useful for businesses and their customers running static high availability services. Last, and perhaps most importantly, query support is rudimentary. Only key lookups with primitive qualifiers are supported, which is insufficient for realistic service discovery use cases (see Sections 3.4 and 3.5 for examples).

Jini. A Java client program begins discovery with a UDP multicast to locate instances of the Jini Lookup Service [63]. The network protocol is not language independent because it relies on the Java-specific object serialization mechanism. Publication is based on soft state. Clients and services must renew their leases periodically. Content freshness is not addressed. The query “language” allows for simple string matching on attributes, and is even less powerful than LDAP (see Section 3.6).

SDS. The Service Discovery Service (SDS) [26] is also based on multicast and soft state. Content freshness is not addressed. It supports a simple XML based exact match query type. SDS is interesting in that it mandates secure channels with authentication and traffic encryption, and privacy and authenticity of service descriptions. SDS servers can be organized in a distributed hierarchy. For efficiency, each SDS node in a hierarchy can hold an index of the content of its sub-tree. The index is a compact aggregation, custom tailored to the narrow type of query SDS can answer.

X.500. X.500 [13] is the big legacy brother of LDAP, sharing the same data model and query language (see below). It was designed at a time when Internet protocols were not yet ubiquitous. X.500 requires ISO protocols and heavyweight ASN.1 data encoding, and is much more complex than LDAP.

LDAP. The Lightweight Directory Access Protocol (LDAP) [14] defines an access mechanism in which clients send requests to and receive responses from LDAP servers. The

database is not based on soft state. Content freshness is not addressed. LDAP does not follow an XML data model. The expressive power of the LDAP query language is insufficient for service discovery use cases (see Sections 3.4 and 3.6) and most other non-trivial questions.

MDS. The Metacomputing Directory Service (MDS) [15, 16] is based on LDAP. As a result, its query language is insufficient for service discovery, and it does not follow an XML data model. MDS is based on soft state but it does not allow clients (and to some extent even content providers) to drive registry freshness policies.

The MDS consists of an unmodified OpenLDAP [78] server with value-adding backends, configured with a strong security library. OpenLDAP is an extensible open source LDAP server framework into which custom backend modules can be plugged, to which LDAP requests are dispatched. In terms of infrastructure, it could be seen as an early predecessor of Java servlet container technology.

The Grid Resource Information Service (GRIS) is an OpenLDAP backend that accepts LDAP queries from clients over its own LDAP namespace sub-tree. It is a backend into which a list of content providers can be plugged on a per attribute basis⁷. An example content provider is a shell script or program that returns the operating system version. A provider returns results in LDIF format, which is a text-based format for LDAP import and export. A provider can also be a module that is linked into the GRIS. A provider owns a namespace sub-tree of the GRIS and returns a set of LDAP entries within that namespace. Depending on the namespace specified in an LDAP query, the GRIS executes (and creates the processes for) one or more affected providers and caches the results for use in future queries. It then applies the query against the cache. A GRIS can be statically configured to cache the pulled content for a fixed amount of time (on a per provider basis). An example GRIS configuration invokes the /usr/local/bin/grid-info-cpu-linux executable and caches CPU load results for 15 seconds. Content provider invocation follows a CGI-like life cycle model.
The stateless nature, heavyweight process forking and context switches of such a model render it unsuitable for use in dynamic environments with high frequency refreshes and requests [79].

The Grid Index Information Service (GIIS) allows constructing and querying hierarchical LDAP directories by plugging together several GRIS (or GIIS) instances. A GRIS can publish a soft state description of itself in order to be added to the list of providers of a GIIS. The GIIS is nearly identical with the GRIS backend. However, it is configured to execute as content provider a function that forwards (chains) the query of the provider to other published LDAP server(s), which in most cases host a GRIS. GRIS and GIIS are identical with respect to query processing and caching. Multi-level directories are constructed by having a GIIS publish itself to another GIIS. The publication process (mapped to the LDAP protocol’s add request) is referred to as registration according to the Grid Registration Protocol (GRRP). The process of querying a server according to the LDAP protocol is referred to as “inquiry” according to the Grid Information Protocol (GRIP).

Figure 4.5 contrasts the different architectures of a GRIS and a hyper registry. A hyper

⁷A content provider is termed an information provider in MDS parlance.

registry maintains content links and cached content in its database, whereas a GRIS maintains cached content only. The control paths from client to content provider and from content provider to the hyper registry are missing in the GRIS architecture, disabling cache freshness steering. A GRIS content provider is always local to the GRIS and cannot publish to a remote GRIS or GIIS. In contrast, a hyperlink content provider is cleanly decoupled from a hyper registry and only requires the ubiquitous HTTP protocol for simple communication with a local or remote hyper registry. A GRIS requires implementing the complex LDAP protocol, including its query language, at every content provider location. In contrast, handling the powerful but complex XQuery language is only required within a hyper registry, not at the content provider. The entry barrier to participation in our system is low. For example, it may be sufficient to run an off-the-shelf Apache server and cron job to publish content. Table 4.2 summarizes some commonalities and differences related to publication and content freshness.

[Figure 4.5 contrasts four architectures: a) Content Provider and Hyperlink Registry; b) Content Provider and GRIS; c) Hierarchy of Hyperlink Registries; d) Hierarchy of GIIS and GRIS registries. In a), a remote client queries a registry whose database holds content links and a cache (homogeneous data model); the content provider (Publisher, Presenter and Mediator over a heterogeneous data model and content source) (re)publishes a content link without content or with content (push) via HTTP POST, and content retrieval (pull) occurs via HTTP GET. In b), a remote client queries a GRIS holding a cache only; content retrieval (pull) occurs via execution of a local program (Executable) at the content provider.]

Figure 4.5: Hyper Registry and Grid Resource Information Service.

Question: Is the architecture uniform?
  MDS: No. There exist two concepts: GRIS and GIIS.
  Hyper Registry: Yes. One concept is expressive enough – the hyper registry.

Question: Can a provider publish to a remote registry?
  MDS: No. A provider is local to a GRIS.
  Hyper Registry: Yes. A provider can publish to any hyper registry, no matter whether the hyper registry is deployed locally, remotely or in-process.

Question: Can a provider actively steer registry freshness?
  MDS: No. A GRIS is actively pulling content. It can be statically configured to cache the pulled content for a fixed amount of time (on a per provider basis). A content provider is passive. It cannot actively publish and refresh content and hence cannot steer the freshness of its content cached in the GRIS.
  Hyper Registry: Yes. Content provider and hyper registry are active and passive at the same time. At any time, a hyper registry can actively pull content, and a content provider can actively push with or without content. Both components can steer the freshness of content cached in the hyper registry.

Question: Can a client retrieve current content?
  MDS: No. A client cannot retrieve the current content from a content provider. It has to go through a GRIS or GIIS, which normally return stale content from their cache.
  Hyper Registry: Yes. A client can directly connect to a content provider and retrieve the current content, thereby avoiding stale content from a hyper registry.

Question: Can a client query steer result freshness?
  MDS: No. A client query cannot steer the freshness of the results it generates.
  Hyper Registry: Yes. A client query can steer the freshness of the results it generates via the refresh-on-client-demand strategy.

Question: Can a child registry publish content to a parent registry and steer its freshness?
  MDS: No. The GIIS is actively pulling content. A content provider and a GRIS are passive, cannot publish content to a remote registry (GIIS), and hence cannot actively steer the freshness of their respective content cached in a remote registry (GIIS).
  Hyper Registry: No. A parent hyper registry is actively pulling content from its child hyper registries. A child hyper registry is passive, cannot publish content to a parent hyper registry, and hence cannot actively steer the freshness of content cached in a parent hyper registry. In this respect, MDS and hyper registry are alike.

Question: Can a parent registry cache content from a child registry, thereby trading content freshness for response time?
  MDS: Yes. A cache maintains results of prior queries. If the namespace of a query is contained in the namespace of the cache, the query is answered from the cache.
  Hyper Registry: Maybe. We speculate that a hyper registry may use semantic caching [80] like the XCache system [81] to maintain prior queries and their result sets. The parts of a query overlapping with prior queries are answered from the cache. The remainder is a rewritten query that is evaluated like a normal (not cached) query.

Table 4.2: Comparison of Hyper Registry and Metacomputing Directory Service.

4.10 Summary

Comparison with Related Work. We address the problems of maintaining dynamic and timely information populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources. The hyper registry has a number of key properties. An XML data model allows for structured and semi-structured data, which is important for integration of heterogeneous content. The XQuery language allows for powerful searching, which is critical for non-trivial applications. Database state maintenance is based on soft state, which enables reliable, predictable and simple content integration from a large number of autonomous distributed content providers. Content link, content cache and a hybrid pull/push communication model allow for a wide range of dynamic content freshness policies, which may be driven by all three system components: content provider, hyper registry and client. These key properties distinguish our approach from related work, which individually addresses some, but not all of the above issues. Some work does not follow an XML data model (X.500, LDAP, MDS, RDBMS, JINI). Sometimes the query language is not powerful enough (UDDI, X.500, LDAP, MDS, JINI, SDS). Sometimes the database is not based on soft state (RDBMS, UDDI, X.500, LDAP). Sometimes content freshness is not addressed (RDBMS, UDDI, X.500, LDAP, JINI, SDS) or only partly addressed (MDS).

Summary. This chapter proposes a general-purpose XQuery database with mechanisms for content caching and soft state maintenance, the so-called hyper registry. The hyper registry can be used to maintain hyperlinks and to cache content pointed to by these links. A content provider can publish a hyperlink, which in turn enables the hyper registry (and third parties) to pull (retrieve) the current content. Optionally, a content provider can also include a copy of the current content as part of publication. Example content includes service descriptions, XML documents or arbitrary binary content. A hyper registry may support caching, for example in server pull or client push mode or both. For reliable, predictable and simple distributed state maintenance, hyper registry tuples are maintained as soft state. A tuple may eventually be discarded unless refreshed by a stream of timely confirmation notifications from the provider. To condition for overload, limit resource consumption and satisfy minimum requirements on content freshness, mechanisms to throttle refresh and query frequency are proposed, adaptively inviting more or less traffic over time. Content link, content cache, a hybrid pull/push communication model and the expressive power of XQuery allow for a wide range of dynamic content freshness policies, which may be driven by all three system components: content provider, hyper registry and client. The entry barrier to participation in the system should be very low. To this end, the system is designed to be as simple as possible at the edges of the network. A content provider is cleanly decoupled from the hyper registry and only requires the ubiquitous HTTP protocol for communication with a local or remote hyper registry.

Chapter 5

The Web Service Discovery Architecture

5.1 Introduction

Having defined all registry aspects in detail in the previous chapter, we can now proceed to the definition of a web service layer that promotes interoperability for existing and future Internet software. Such a layer views the Internet as a large set of services with an extensible set of well-defined interfaces. A web service consists of a set of interfaces with associated operations. Each operation may be bound to one or more network protocols and endpoints. The definition of interfaces, operations and bindings to network protocols and endpoints is given as a service description [8]. In contrast to popular belief, a web service is neither required to carry XML [11] messages, nor to be bound to SOAP [9] or the HTTP [34] protocol, nor to run within a .NET [82] hosting environment. A discovery architecture defines appropriate services, interfaces, operations and protocol bindings for discovery. The key problem is:

• Can we define a discovery architecture that promotes interoperability, embraces industry standards, and is open, modular, flexible, unified, non-intrusive and simple yet power- ful?

We propose and specify such a discovery architecture, the so-called Web Service Discovery Architecture (WSDA). WSDA subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. It specifies a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery. These primitives cover service identification, service description retrieval, data publication as well as minimal and powerful query support. The individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies.

We define four interfaces, namely Presenter, Consumer, MinQuery and XQuery. The Presenter interface allows clients to retrieve the current service description. The Consumer interface allows content providers to publish a tuple set to a consumer. The MinQuery interface provides the simplest possible query support (“select all”-style); it returns tuples including or excluding cached content. The XQuery interface provides powerful XQuery support. Finally, we compare in detail the properties of WSDA with the emerging Open Grid Services Architecture [6, 17].

5.2 Interfaces

Presenter. The Presenter interface allows clients to retrieve the current service description. Clearly clients from anywhere must be able to retrieve the current description of a service (subject to security policy). Hence, a service needs to present (make available) to clients the means to retrieve the service description. To enable clients to query in a global context, some identifier for the service is needed. Further, a description retrieval mechanism is required to be associated with each such identifier. Together these are the bootstrap key (or handle) to all capabilities of a service.

In principle, identifier and retrieval mechanisms could follow any reasonable convention. In practice, however, a fundamental mechanism such as service discovery can only hope to enjoy broad acceptance, adoption and subsequent ubiquity if integration of legacy services is made easy. The introduction of service discovery as a new and additional auxiliary service capability should require as little change as possible to the large base of valuable existing legacy services, preferably no change at all. It should be possible to implement discovery-related functionality without changing the core service. Further, to ease implementation, the retrieval mechanism should have a very narrow interface and be as simple as possible.

In support of these requirements, the identifier is chosen to be a URL [65], and the retrieval mechanism is chosen to be HTTP(S) [34]. We define that an HTTP(S) GET request to the identifier must return the current service description (subject to local security policy). In other words, a simple hyperlink is employed. In the remainder of this thesis, we will use the term service link for such an HTTP URL identifier enabling service description retrieval. As in the WWW, service links (and content links, see below) can freely be chosen as long as they conform to the HTTP URL specification [65].
Hence, they may contain the usual URL-encoded attribute-value pairs. The semantics of the structure of a given link are opaque to a client. Examples of legal links are:

http://sched.cern.ch:8080/getServiceDescription.wsdl
https://cms.cern.ch/getServiceDescription?id=4712&cache=disable
http://phone.cern.ch/lookup?query="select phone from phonebook where phone=0450-1234"
http://repcat.cern.ch/getPhysicalFileNames?lfn="myLogicalFileName"
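Since a service link is just an ordinary HTTP(S) URL, a client needs nothing beyond a URL parser and an HTTP GET to bootstrap discovery. The following is a minimal sketch in Python; the helper name is ours and purely illustrative, since real WSDA clients must treat the link structure as opaque:

```python
from urllib.parse import urlsplit, parse_qs

def split_service_link(link):
    """Illustrative only: split a service link into its retrieval endpoint and
    its URL-encoded attribute-value pairs. WSDA clients treat the structure
    of a link as opaque; description retrieval is a plain HTTP(S) GET."""
    parts = urlsplit(link)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return endpoint, parse_qs(parts.query)

# Retrieving the service description would then simply be an HTTP GET, e.g.:
#   urllib.request.urlopen(link).read()
```

For instance, the second example link above splits into the endpoint `https://cms.cern.ch/getServiceDescription` and the attribute-value pairs `id=4712` and `cache=disable`.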

Because service descriptions should describe the essentials of the service, it is recommended1 that the service link concept be an integral part of the description itself. As a result, service descriptions may be retrievable via the Presenter interface, which defines an operation getServiceDescription() for this purpose. The operation is identical to service description retrieval and is hence bound to (invoked via) an HTTP(S) GET request to a given service link. Additional protocol bindings may be defined as necessary.

Consumer. The Consumer interface allows content providers to publish a tuple set to a consumer. A WSDA tuple follows the Dynamic Data Model (DDM) (see Section 3.3). It has as attributes a content link, a type, a context, four soft state time stamps, and (optionally) two arbitrary-shaped extensibility elements, namely metadata and content. A content link (e.g. service link) is an HTTP(S) URL that may point to the content of a content provider. Given the link, the current content can be retrieved (pulled) at any time via an HTTP(S) GET request to the link. The type describes what kind of content is being published. The context describes why the content is being published or how it should be used. The optional metadata element may further describe the content and/or its retrieval beyond what can be expressed with the previous attributes. For example, it may describe retrieval from a UDDI [10] registry, formulated in the Web Service Inspection Language (WSIL) [69], or it may be a secure digital XML signature [68]. Given this tuple information, a content retriever module can retrieve the current content from the provider. Based on embedded soft state time stamps, a tuple may eventually be discarded unless refreshed by a stream of timely confirmation notifications. Within a tuple set, a tuple is uniquely identified by its tuple key, which is the pair (content link, context). Content and metadata can be structured or semi-structured data in the form of any arbitrary well-formed XML document or fragment2. An individual element may, but need not, have a schema (XML Schema [12]), in which case it must be valid according to the schema. All elements may, but need not, share a common schema. This flexibility is important for integration of heterogeneous content. To summarize, a WSDA tuple is an annotated multi-purpose soft state data container that may contain a piece of arbitrary content and allows for refresh of that content at any time, as depicted in Figure 5.1. WSDA offers an open dynamic data model that allows for a wide range of powerful caching policies. The publish operation has the signature (TS4, TS5) publish(XML tupleset). For detailed motivation, justification and discussion of the Dynamic Data Model and the semantics of soft state time stamps, see Chapters 3 and 4.

1In general, it is not mandatory for a service to implement any “standard” interface. Historical evidence suggests that the acceptance of ubiquitous Internet infrastructures and their flexible and successful evolution strongly depends on being conservative with the term MUST.
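The tuple structure just described can be sketched as a small data type. This is a hedged illustration only: the field names are ours, the context value in the example is invented, and the four time stamps are left abstract since their exact semantics are defined in Chapter 4.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WsdaTuple:
    """Sketch of a WSDA/DDM tuple: an annotated soft state container whose
    current content can be re-pulled at any time via HTTP GET on its link."""
    link: str                       # content link: an HTTP(S) URL
    type: str                       # what kind of content is published
    context: str                    # why the content is published / how it is used
    timestamps: tuple               # four soft state time stamps (see Chapter 4)
    metadata: Optional[str] = None  # optional arbitrary XML fragment
    content: Optional[str] = None   # optional arbitrary XML fragment (cached content)

    def key(self):
        # Within a tuple set, a tuple is uniquely identified by (content link, context).
        return (self.link, self.context)
```

Two tuples with the same content link but different contexts are thus distinct entries in a tuple set, which is exactly what the tuple key pair expresses.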

Tuple := Link | Type | Context | Timestamps | Metadata | Content (optional)

Semantics: HTTP GET(tuple.link) --> tuple.content
           type(HTTP GET(tuple.link)) --> tuple.type

Figure 5.1: Tuple Link allows for Refresh of Tuple Content at any time.

MinQuery. The MinQuery interface provides the simplest possible query support (“select all”-style). It returns tuples including or excluding cached content. As a minimum, clients can query by invoking minimalist query operations such as getTuples(). The getTuples() query operation takes no arguments and returns the full set of all tuples “as is”. That is, query output format and publication input format are the same. If supported, output includes cached content. The getLinks() query operation is similar in that it also takes no arguments and returns the full set of all tuples. However, it always substitutes an empty string for cached content. In other words, the content is omitted from tuples, potentially saving substantial bandwidth. For an extensive discussion, see Chapter 4.

Advanced query support can be expressed on top of the minimal query capabilities. Such higher-level capabilities conceptually do not belong to a consumer and minimal query interface, which are only concerned with the fundamental capability of making a content link (e.g. service link) reachable3 for clients. As an analogy, consider the related but distinct concepts of web hyper-linking and web searching: Web hyper-linking is a fundamental capability without which nothing else on the Web works. Many different kinds of web search engines using a variety of search interfaces and strategies can be and are layered on top of web linking. The kind of XQuery support we propose below is certainly not the only possible and useful one. It seems unreasonable to assume that a single global standard query mechanism can satisfy all present and future needs of a wide range of communities. Multiple such mechanisms should be able to coexist. Consequently, the consumer and query interfaces are deliberately separated and kept as minimal as possible, and an additional interface type (XQuery) for answering XQueries is introduced below.

2For clarity of exposition, the content is an XML element. In the general case (allowing non-text based content types such as image/jpeg), the content is a MIME [67] object. The XML based publication input tuple set and query result tuple set is augmented with an additional MIME multipart object, which is a list containing all content. The content element of a tuple is interpreted as an index into the MIME multipart object.
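The contrast between the two MinQuery operations, getTuples() returning tuples as is versus getLinks() blanking out cached content, can be sketched as follows (tuples modeled as plain dictionaries for illustration):

```python
def get_tuples(tupleset):
    """'Select all' query: return the full tuple set as is,
    including cached content (if supported)."""
    return list(tupleset)

def get_links(tupleset):
    """Identical to get_tuples, except that an empty string is always
    substituted for cached content, potentially saving bandwidth."""
    return [dict(t, content="") for t in tupleset]
```

A client that only needs the links can thus avoid transferring potentially large cached content, and pull fresh content later via HTTP GET on the links it is interested in.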

XQuery. The XQuery interface provides powerful XQuery support, which is important for realistic service and resource discovery use cases (see Section 3.5). For a detailed motivation and justification, including a discussion of a wide range of discovery queries and an evaluation of various query languages, see Chapter 3. XQuery [18, 50, 51, 52] is the standard XML query language developed under the auspices of the W3C. It allows for powerful searching, which is critical for non-trivial applications. Everything that can be expressed with SQL [19] can also be expressed with XQuery. However, XQuery is a more expressive language than SQL. XQuery can dynamically integrate external data sources via the document(URL) function. The document(URL) function can be used to process the XML results of remote operations invoked over HTTP. For example, given a service description with a getPhysicalFileNames(LogicalFileName) operation, a query can match on values dynamically produced by that operation. The same rules that apply to minimalist queries also apply to XQuery support. An implementation can use a modular and simple XQuery processor such as Quip [70] for the operation XML query(XQuery query). Because not only content, but also content link, context, type, time stamps, metadata etc. are part of a tuple, a query can also select on this information.
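For illustration, a discovery query of this kind might look as follows. The XQuery text is a hypothetical example (the element and attribute names are ours, not prescribed by WSDA), and the accompanying function is a pure-Python stand-in showing the key point that selection can range over tuple attributes such as type and link, not just over content:

```python
# Hypothetical XQuery a client could pass to XQuery.query(...); an XQuery
# processor such as Quip would evaluate it against the XML tuple set.
EXAMPLE_QUERY = """
for $t in /tupleset/tuple
where $t/@type = "service"
return $t/@link
"""

def select_links_by_type(tupleset, wanted_type):
    """Pure-Python stand-in for the query above: tuple attributes such as
    type, context and link are all selectable, not only the content."""
    return [t["link"] for t in tupleset if t["type"] == wanted_type]
```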

Interface Summary. The four interfaces and their respective operations are summarized in Table 5.1. Figure 5.2 depicts the interactions of a client with implementations of these interfaces.

3Reachability is interpreted in the spirit of garbage collection systems: A content link is reachable for a given client if there exists a direct or indirect retrieval path from the client to the content link.

Interface   Operations                        Responsibility
---------   ------------------------------    --------------
Presenter   XML getServiceDescription()       Allows clients to retrieve the current description of a service and hence to bootstrap all capabilities of a service. See Section 2.3.
Consumer    (TS4,TS5) publish(XML tupleset)   A content provider can publish a dynamic pointer called a content link, which in turn enables the consumer (e.g. hyper registry) to retrieve the current content. Optionally, a content provider can also include a copy of the current content as part of publication. Each input tuple has a content link, a type, a context, some time stamps, and (optionally) metadata and content. See Sections 4.2 and 2.4.
MinQuery    XML getTuples()                   Provides the simplest possible minimal query support (“select all”-style). The getTuples query operation returns the full set of all available tuples “as is”. The getLinks query operation is identical, except that it always substitutes an empty string for cached content. See Sections 4.4 and 2.4.
            XML getLinks()
XQuery      XML query(XQuery query)           Provides powerful XQuery support. Executes an XQuery over the available tuple set. Because not only content, but also content link, context, type, time stamps, metadata etc. are part of a tuple, a query can also select on this information. See Sections 4.4 and 3.5.

Table 5.1: WSDA Interfaces and their Respective Operations.

[Figure 5.2 here: a remote client invokes getSrvDesc(), publish(...), getTuples(), getLinks() and query(...) on the Presenter, Consumer, MinQuery and XQuery interfaces via remote interface invocation; tuples T1 ... Tn carry content links that are followed via HTTP GET to the presenters (Presenter 1 ... Presenter N) and contents (Content 1 ... Content N) of the providers.]

Figure 5.2: Interactions of Client with Implementations of WSDA Interfaces.

5.3 Network Protocol Bindings

The operations of the WSDA interfaces are bound to (carried over) a default transport protocol. The XQuery interface is bound to the Peer Database Protocol (PDP) proposed in Chapter 7. For all other operations and arguments we assume for simplicity HTTP(S) GET and POST as transport, and XML based parameters. Additional protocol bindings may be defined as necessary. An example service description of a registry implementing all four WSDA interfaces, formulated in SWSDL, is depicted in Figure 5.3.

[Figure 5.3 here: the SWSDL service description lists the operations of the four interfaces:
Presenter: XML getServiceDescription()
Consumer:  (Time TS4, Time TS5) publish(XML tupleset)
MinQuery:  XML getTuples(); XML getLinks()
XQuery:    XML query(XQuery query)]

Figure 5.3: SWSDL description of a registry service implementing all four WSDA interfaces.

5.4 Services

In Chapter 4 we defined two kinds of example registry services: The so-called hypermin registry must (at least) support the three interfaces Presenter, Consumer and MinQuery (excluding XQuery support). A hyper registry must (at least) support these interfaces plus the XQuery interface. Put another way, any service that happens to support, among others, the respective interfaces qualifies as a hypermin registry or hyper registry. As usual, the interfaces may have endpoints that are hosted by a single container, or they may be spread across multiple hosts or administrative domains. It is by no means a requirement that only dedicated hyper registry services and hypermin registry services may implement WSDA interfaces. Any arbitrary service may decide to offer and implement none, some or all of these four interfaces. For example, a job scheduler may decide to implement, among others, the MinQuery interface to indicate a simple means to discover metadata tuples related to the current status of job queues and the supported Quality of Service. The scheduler may not want to implement the Consumer interface because its metadata tuples are strictly read-only. Further, it may not want to implement the XQuery interface, because it is considered overkill for its purposes. Even though such a scheduler service does not qualify as a hypermin or hyper registry, it clearly offers useful added value. Other examples for services implementing a subset of WSDA interfaces are consumers such as an instant news service or a cluster monitor. These services may decide to implement the Consumer interface to invite external sources for data feeding, but they may not find it useful to offer and implement any query interface. In a more sophisticated scenario, the example job scheduler may decide to publish its local tuple set also to an (already existing) remote helper hyper registry service (i.e. with XQuery support). 
To indicate to clients how to get hold of the XQuery capability, the scheduler may simply copy the XQuery interface description of the remote helper hyper registry service and advertise it as its own interface by including it in its own service description. This kind of virtualization is not a “trick”, but a feature with significant practical value, because it allows for minimal implementation and maintenance effort on the part of the scheduler. Alternatively, the scheduler may include in its local tuple set (obtainable via the getLinks() operation) a tuple that refers to the service description of the remote helper hyper registry service. An interface referral value for the context attribute of the tuple is used for this purpose.
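The qualification rule above, that any service supporting, among others, the respective interfaces qualifies as a hypermin or hyper registry, amounts to a simple subset test. The sketch below uses the interface names from this chapter; the extra "Scheduler" interface in the usage example is invented:

```python
HYPERMIN_IFACES = {"Presenter", "Consumer", "MinQuery"}
HYPER_IFACES = HYPERMIN_IFACES | {"XQuery"}

def qualifies(supported, required):
    """A service qualifies iff it supports (at least) the required
    interfaces; any additional interfaces it offers do not matter."""
    return required <= set(supported)
```

For example, a job scheduler supporting Presenter, Consumer, MinQuery and its own Scheduler interface qualifies as a hypermin registry, but not as a hyper registry unless it also supports XQuery.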

5.5 Properties

WSDA has a number of key properties:

• Standards Integration. WSDA embraces and integrates solid and broadly accepted industry standards such as XML [11], XML Schema [12], the Simple Object Access Protocol (SOAP) [9], the Web Service Description Language (WSDL) [8] and XQuery [18]. It allows for integration of emerging standards such as the Web Service Inspection Language (WSIL) [69].

• Interoperability. WSDA promotes an interoperable web service layer on top of existing and future Internet software, because it defines appropriate services, interfaces, operations and protocol bindings. WSDA does not introduce new Internet standards. Rather, it judiciously combines existing interoperability-proven open Internet standards such as HTTP(S) [34], URI [65], MIME [67], XML [11], XML Schema [12] and BEEP [35, 36, 31].

• Modularity. WSDA is modular because it defines a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery. These primitives cover service identification, service description retrieval, publication, as well as minimal and powerful query support. The responsibility, definition and evolution of any given primitive is distinct and independent of that of all other primitives.

• Ease-of-use and Ease-of-implementation. Each communication primitive is deliberately designed to avoid any unnecessary complexity. The design principle is to “make simple and common things easy, and powerful things possible”. In other words, solutions are rejected that provision for powerful capabilities yet imply that even simple problems are complicated to solve. For example, service description retrieval is by default based on a simple HTTP(S) GET. Yet, we do not exclude, and indeed allow for, alternative identification and retrieval mechanisms such as the ones offered by UDDI (Universal Description, Discovery and Integration) [10], RDBMS or custom Java RMI registries (e.g. via tuple metadata specified in WSIL [69]). Further, tuple content is by default given in XML, but advanced usage of arbitrary MIME [67] content (e.g. binary images, files, MS-Word documents) is also possible. As another example, the minimal query interface requires virtually no implementation effort on the part of a client or server. Yet, where necessary, also powerful XQuery support may, but need not, be implemented and used.

• Openness and Flexibility. WSDA is open and flexible because each primitive can be used, implemented, customized and extended in many ways. For example, the interfaces of a service may have endpoints spread across multiple hosts or administrative domains. However, there is nothing that prevents all interfaces from being co-located on the same host or implemented by a single program. Indeed, this is often a natural deployment scenario. Further, even though default network protocol bindings are given, additional bindings may be defined as necessary. For example, an implementation of the Consumer interface may bind to (carry traffic over) HTTP(S) [34], SOAP/BEEP [31], FTP [32], or RMI [83]. The tuple set returned by a query may be maintained according to a wide variety of cache coherency policies, resulting in static to highly dynamic behavior. A consumer may take any arbitrary custom action upon publication of a tuple. For example, it may interpret a tuple from a specific schema as a command or an active message [84], triggering tuple transformation and/or forwarding to other consumers such as loggers. For flexibility, a service maintaining a WSDA tuple set may be deployed in any arbitrary

way. For example, the database can be kept in an XML file, in the same format as returned by the getTuples query operation. However, tuples can also be kept in a remote relational database.

• Expressive Power. WSDA is powerful because its individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors. Each single primitive is of limited value all by itself. The true value of simple orthogonal multi-purpose communication primitives lies in their potential to generate powerful emerging synergies. For example, combination of WSDA primitives enables building services for data management, workflow management, auditing, instrumentation, monitoring and problem determination. As another example, the consumer and query interfaces can be combined to implement a Peer-to-Peer (P2P) database network for service discovery. In a large distributed system spanning many administrative domains such as a DataGrid, it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. However, in such a database system, the set of information tuples in the universe is partitioned over one or more distributed nodes, for reasons including autonomy, scalability, availability, performance and security. Here, P2P nodes maintain a local database and implement the consumer and query interfaces. Clients and P2P nodes publish (their) service descriptions and/or other metadata to one or more P2P nodes. Publication enables distributed node topology construction (e.g. ring, tree or graph) and at the same time constructs the database to be searched. When any client wishes to search the P2P network with some query (see, for example, Section 3.5), it sends the query to a single node. The node applies the query to its local database and returns matching results; it also forwards the query to its neighbor nodes.
These neighbors return their local query results; they also forward the query to their neighbors, and so on. For an extensive discussion, see Chapter 6 and Chapter 7, where a P2P database framework and corresponding network protocol are devised, which are unified in the sense that they allow specific applications to be expressed for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options.

• Uniformity. WSDA is unified because it subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. It allows for multiple competing distributed systems concepts and implementations to coexist and to be integrated. Clients can dynamically adapt their behavior based on rich service introspection capabilities. Clearly there exists no solution that is optimal in the presence of the heterogeneity found in real-world large cross-organizational distributed systems such as Data Grids, electronic market places and instant Internet news and messaging services. Introspection and adaptation capabilities increasingly make it unnecessary to mandate a single global solution to a given problem, thereby enabling integration of collaborative systems.

• Non-Intrusiveness. WSDA is non-intrusive because it offers interfaces but does not mandate that every service in the universe must comply with a set of “standard”

interfaces.
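The P2P search behavior described under Expressive Power, apply the query locally, then forward it to neighbors and merge their results, can be sketched as follows. Nodes are modeled as plain dictionaries for illustration; a real implementation would use the network protocol of Chapter 7 with its response modes, timeouts and other scope options:

```python
def p2p_query(node, matches, visited=None):
    """Flood a query through a P2P topology: each node applies the query
    predicate to its local database, then forwards the query to its
    neighbors. The visited set prevents re-querying nodes when the
    topology is a graph with cycles."""
    if visited is None:
        visited = set()
    if id(node) in visited:
        return []
    visited.add(id(node))
    results = [t for t in node["db"] if matches(t)]
    for neighbor in node["neighbors"]:
        results.extend(p2p_query(neighbor, matches, visited))
    return results
```

In this sketch results are merged synchronously at each node; Chapters 6 and 7 discuss the routed versus direct response modes and pipelining characteristics that a production design must choose among.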

5.6 Comparison with Open Grid Services Architecture

We have recently learned about the emerging Open Grid Services Architecture (OGSA) [6, 17]. OGSA exhibits striking similarities with the Web Service Discovery Architecture (WSDA) proposed above, in spirit and partly also in design. We stress that this thesis and OGSA have so far been mutually independent work in their entirety. Future work is likely to be collaborative and convergent due to shared interest. Due to the recent circulation of early OGSA material, our understanding of it is limited and not necessarily accurate. Nevertheless, in this section we attempt a preliminary comparison of OGSA concepts with WSDA concepts. OGSA is work-in-progress, but an important first step towards enabling powerful, flexible yet also interoperable large cross-organizational Grid systems. Like WSDA, OGSA defines and standardizes a set of (mostly) orthogonal multi-purpose communication primitives that can be combined and customized by specific clients and services to yield powerful behavior. Like WSDA, OGSA embraces solid and broadly accepted industry standards such as XML [11], XML Schema [12], the Simple Object Access Protocol (SOAP) [9], the Web Service Description Language (WSDL) [8] and XQuery [18].

Service Link, Service Description and Presenter. In OGSA, a service instance is identified by a Grid Service Handle (GSH), which is an immutable, globally unique HTTP(S) URL that distinguishes a specific service instance from all other service instances that have existed, exist now, or will exist in the future. By means of an HTTP GET or the HandleMap interface, the handle can be resolved to a Grid Service Reference (GSR), which is typically a WSDL document containing descriptions of all supported service interfaces. A reference may change over time. That is, the contents of the WSDL document may change over time. A reference is based on soft state, hence expires unless periodically renewed. A GSH corresponds to a WSDA service link. However, unlike a GSH, a service link is neither required to be immutable nor globally unique. A GSR corresponds to a WSDA service description given in WSDL. Both are soft state documents. Although it is unclear from the complex presentation, it appears that a HandleMap corresponds to the WSDA Presenter interface. In WSDA, the operation Presenter.getServiceDescription() or a simple HTTP(S) GET to a content link (e.g. service link) may be used to retrieve the current content (e.g. service description). WSDA allows arbitrary-shaped content retrieval via MIME encoding (e.g. textual, binary), including mapping a service link to a service description. It appears that OGSA is restricted to mapping a GSH to a GSR. Further, not every legal HTTP(S) URL is a legal GSH. In WSDA, every legal HTTP(S) URL is a legal content link and hence also a legal service link. OGSA defines implicit and unclear URL suffix mapping and GSHomeHandleMapID semantics that we believe would better be omitted or expressed as part of the HandleMap interface. The OGSA HandleMap operation GSR FindByHandle(GSH) corresponds to the WSDA Presenter operation XML getServiceDescription(). The additional GSH argument is unnecessary in our approach.

Tuple, Tuple Set and Query. In OGSA, a grid service instance maintains so-called service data XML elements. A service data element is a multi-purpose soft state data container that may contain arbitrary content. A service data element has a name attribute and three soft state time stamp attributes (goodFrom, goodUntil, notGoodAfter) and may contain an arbitrary extensibility element as content. In contrast, a WSDA tuple follows the Dynamic Data Model (DDM). It has as attributes a content link, a type, a context, four soft state time stamps, and (optionally) two arbitrary-shaped extensibility elements, namely metadata and content. A WSDA tuple is an annotated multi-purpose soft state data container that may contain a piece of arbitrary content and allows for refresh of that content at any time (see Figure 5.1). A collection of OGSA service data elements corresponds to a WSDA tuple set. In WSDA, publication input data and query output is uniformly expressed as a tuple set4. The OGSA operation FindServiceData of the GridService interface allows querying the collection of service data elements (roughly corresponding to the WSDA MinQuery and XQuery interfaces). It attempts to support multiple query languages by accepting and returning an arbitrary XML element. The schema of the input XML element indicates the query language. If the service instance supports the desired query language, the query is executed against the collection of service data elements. Every OGSA grid service must support a simple query “language” that returns a list of all service data elements whose name equals a given name (exact match). The name “root” is reserved and must return at least a list of “standard” service data elements such as handle, reference, primary key, a list of the supported query languages, etc. The OGSA FindServiceData operation takes arbitrary XML input and returns arbitrary XML output.
It remains to be seen how useful it is to coerce distinct query capabilities into a single generic operation. Consider that modern software systems rarely coerce distinct capabilities into a single generic handler function of the form Object do(Object). Further consider that the very purpose of separate interface types and names is to allow for independence, separate evolution, flexibility, clarity, predictability, type safety and straightforward introspection. In essence, this is what web services and service descriptions are about. In this light, generic functions for distinct capabilities appear counter-intuitive to the spirit of web services. For comparison, in WSDA, trivial and powerful query support are cleanly separated. The MinQuery interface is indeed minimal and requires only the simplest possible query support (“select all”-style) via the operations getTuples() and getLinks(). The WSDA XQuery interface, on the other hand, allows extremely powerful queries. Finally, it is unclear from the material whether OGSA intends in the future to support either or both of XQuery and XPath, or neither.

Data Publication. An OGSA registry service instance maintains a collection of handles. Typically, an OGSA registry service offers a Registry interface, which supports registering and unregistering handles. The registerService operation takes as input a handle, a timeout, and optionally, an abstract and an arbitrary extensibility element. The unregisterService operation takes as input the handle to be removed. In contrast, a WSDA consumer and query can accept and return arbitrary textual and binary soft state data in the form of a tuple set, including content links, service links (handles), cached content and metadata (e.g. a WSIL [69] fragment). The OGSA registerService operation roughly corresponds to the WSDA publish operation. However, the latter operation is a unified multi-purpose operation, supporting a set of arbitrary-shaped soft state tuples. The functionality to publish information other than handles (e.g. references) to a registry or service currently appears to be missing from OGSA. The OGSA notification interface appears only useful for interaction patterns based on subscription by notification sinks. Invitation for subscription appears to be missing. The unregisterService operation is not necessary in WSDA, because a tuple is based on soft state. Explicit (immediate) unregistration can be achieved by using the WSDA publish operation with tuples carrying zero-valued soft state time stamps. An OGSA registry service supports discovery queries via the FindServiceData operation of the GridService interface. The relationship between discovery queries and the maintained collection of handles is unclear. More precisely, it is unclear whether handles are maintained as service data elements, and if they can be queried in the same way as described above. The OGSA NotificationSource and NotificationSink interfaces allow for publish-subscribe functionality based on message type and interest statements. A notification source may offer a set of topics, which notification sinks may use for subscription. A notification source may send notification messages to a subscribed notification sink. Subscription requests are soft state based and expire unless periodically renewed. The OGSA NotificationSink interface corresponds to the WSDA Consumer interface.

4The output tuple set of a constructive query may contain arbitrary content, whereas all other queries output a tuple set with tuples from the dynamic data model (see Section 3.3).
However, it appears that it is not foreseen that an OGSA notification message may carry a set of more than one service data element5. In contrast, a WSDA consumer message explicitly carries zero or more tuples in a tuple set, resulting in potentially much improved efficiency. Consider that all production quality database management systems we are aware of support batching of tuples over the network. This is because inserting a million tuples (e.g. CPU load samples) into a database should not involve the same number of network round-trips. In addition to efficiency by design, WSDA offers an open and precisely specified dynamic data model that allows for a wide range of powerful caching policies. We are working on a multi-purpose interface for persistent XQueries (i.e. server-side trigger queries), which will roughly correspond to the OGSA NotificationSource interface, albeit in a more general and powerful manner.
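The soft state publication semantics compared here, batched tuple sets, merging by tuple key, and explicit unregistration via already-expired time stamps, can be sketched as follows. This is a hedged illustration: the field names are ours, and the single "expires" value stands in for the four time stamps whose precise semantics Chapter 4 defines.

```python
def apply_publication(database, tuples, now):
    """Sketch of a consumer merging a batched publication into its database,
    keyed by (content link, context). A tuple whose expiry time stamp is
    already in the past (e.g. zero-valued) acts as an explicit, immediate
    unregistration; otherwise the tuple is inserted or refreshed."""
    for t in tuples:
        key = (t["link"], t["context"])
        if t["expires"] <= now:
            database.pop(key, None)  # soft state already expired: discard
        else:
            database[key] = t
    return database
```

Note that a single call can carry arbitrarily many tuples, which is the batching property the text argues for: publishing a million CPU load samples need not cost a million round-trips.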

Grid Service. In OGSA, a GridService interface supports discovery queries (findServiceData), setting and prolonging of shutdown time (setTerminationTime) as well as explicit (immediate) shutdown of the service (destroy). OGSA mandates that every grid service must implement the GridService interface. In contrast, WSDA does not require a service to implement any “standard” interface. A specific service (e.g. the hyper registry service) may, of course, mandate implementation of certain interfaces, but there is no global requirement that any and all services in the universe must satisfy. Historical evidence suggests that the

5Nesting service data elements inside a service data element is possible but undefined and left without semantics.

acceptance of ubiquitous Internet infrastructures and their flexible and successful evolution strongly depends on being conservative with the term MUST. A design rarely turns out right the first time. Typically, several design revisions and refactorings over time are needed. Consider that once a fundamental interface is introduced as mandatory, one is “stuck” with it forever, at least if compatibility and stability are of concern. The problem is subtle and comparable to the design of the base class of single-rooted object oriented programming languages. A uniform and well designed base class clearly offers strong advantages because everyone can safely assume certain essential features to be available everywhere. Many C++ reusability problems stem from the fact that the C++ language has no single common base class. For example, integration of third-party frameworks is problematic at best. On the other hand, a controversial or flawed base class feature such as the (potentially very useful) clone() method of the Java base class is bound to lead to dissatisfaction. Another comparable problem-in-the-large is the definition of the Java platform. Typically, Java interfaces and frameworks are not designed, specified, revised, standardized and hardened within the Java platform. Rather, they are born and live externally for some two years to allow enough time for a community process, deployment feedback from reference implementations, and for separation of wheat from chaff. Only then may some of them be considered to be merged into the core Java platform definition. It is often desirable to include features only if considered non-intrusive and absolutely essential for a wide range of communities. While certainly interesting and often useful, justification is missing as to why query, shutdown and other lifetime maintenance of services should be absolutely essential for every service.
Further, while the chosen definition of these features is certainly well designed, justification is missing why it is superior to other approaches. Consider that a large variety of server administration and monitoring products with related but not equivalent interfaces has been introduced and marketed in the past [85]. In addition, the OGSA query interface, just like our query interfaces, is certainly not the only possible and useful one. Finally, we believe that a main idea behind web services is service description introspection and dynamic adaptation. This capability increasingly makes it unnecessary to mandate a global “service base class or interface”.

Other. The OGSA Factory interface supports dynamically creating short- or long-lived remote service instances. An OGSA service instance may be associated with a primary key, which may be used to locate and shut down service instances created by a factory. It is unclear what the added value of a primary key is over a handle. The concept may perhaps be related to the WSDA tuple key, which is the pair (content link, context). A tuple key uniquely identifies a tuple within a tuple set. Table 5.2 summarizes the comparison of corresponding OGSA and WSDA concepts.
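The tuple-key concept can be illustrated with a minimal sketch. The class and method names below are invented for illustration; the thesis defines only the key itself, the pair (content link, context), not this API. The point is that publishing under an existing key updates a tuple rather than duplicating it.

```python
# Sketch of a WSDA-style tuple set keyed by (content link, context).
# TupleSet, publish and lookup are hypothetical names, not WSDA operations.

class TupleSet:
    def __init__(self):
        self._tuples = {}  # (content_link, context) -> tuple payload

    def publish(self, content_link, context, payload):
        # The pair (content link, context) uniquely identifies a tuple,
        # so publishing under an existing key replaces the old tuple.
        self._tuples[(content_link, context)] = payload

    def lookup(self, content_link, context):
        return self._tuples.get((content_link, context))

    def size(self):
        return len(self._tuples)

ts = TupleSet()
ts.publish("http://sched.example.org/wsdl", "parent", "<tuple v='1'/>")
ts.publish("http://sched.example.org/wsdl", "parent", "<tuple v='2'/>")  # update, not duplicate
ts.publish("http://sched.example.org/wsdl", "child", "<tuple v='3'/>")   # distinct context -> new tuple
```

Since the key is a pair rather than a single handle, the same content link can legitimately appear in several contexts within one tuple set.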

5.7 Summary

We propose and specify an open discovery architecture, the so-called Web Service Discovery Architecture (WSDA). WSDA views the Internet as a large set of services with an extensible set of well-defined interfaces. WSDA has a number of key properties. It promotes an interoperable web service layer on top of existing and future Internet software, because it defines appropriate services, interfaces, operations and protocol bindings. It embraces and integrates solid and broadly accepted industry standards such as XML, XML Schema, the Simple Object Access Protocol (SOAP), the Web Service Description Language (WSDL) and XQuery. It allows for integration of emerging standards such as the Web Service Inspection Language (WSIL). It is modular because it defines a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery. These primitives cover service identification, service description retrieval, data publication as well as minimal and powerful query support. Each communication primitive is deliberately designed to avoid any unnecessary complexity. WSDA is open and flexible because each primitive can be used, implemented, customized and extended in many ways. It is powerful because the individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies. It is unified because it subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. It is non-intrusive because it offers interfaces but does not mandate that every service in the universe must comply with a set of “standard” interfaces. Finally, we compare in detail the properties of WSDA with the emerging Open Grid Service Architecture.

| Concept | OGSA | WSDA |
|---|---|---|
| Interfaces | GridService, NotificationSource, NotificationSink, Registry, Factory, PrimaryKey, HandleMap | Presenter, Consumer, MinQuery, XQuery |
| Interfaces required to be implemented by every service | GridService interface; defines operations for query and life cycle maintenance | None |
| Service identifier | Grid Service Handle (GSH) | Service link (i.e. content link) |
| Service description | Grid Service Reference (GSR) (e.g. WSDL) | Service description (e.g. WSDL) |
| Service description retrieval | via HTTP(S) GET or HandleMap.findByHandle(GSH) | via HTTP(S) GET or Presenter.getServiceDescription() |
| Multi-purpose data container | Service data | Tuple |
| Set of data containers | Service data list | Tuple set |
| Query capability | GridService.FindServiceData(XML query) | MinQuery.getLinks(), MinQuery.getTuples(), XQuery.query(XQuery) |
| Data publication | Registry.RegisterService(handle), NotificationSink.deliverNotification(servicedata) | Consumer.publish(XML tupleset) |

Table 5.2: Open Grid Service Architecture vs. Web Service Discovery Architecture.

Tim Berners-Lee designed the World Wide Web as a consistent interface to a flexible and changing heterogeneous information space for use by CERN’s staff, the High Energy Physics community, and, of course, the world at large.
The WWW architecture [86] rests on four simple and orthogonal pillars: URIs as identifiers, HTTP for retrieval of content pointed

to by identifiers, MIME for flexible content encoding, and HTML as the primus-inter-pares (MIME) content type. Based on our Dynamic Data Model (DDM), we hope to proceed further towards a self-describing meta content type that retains and wraps all four WWW pillars “as is”, yet allows for flexible extensions in terms of identification, retrieval and caching of content. A judicious combination of the four WWW pillars, DDM, WSDA, the Hyper Registry, the Unified Peer-to-Peer Database Framework (UPDF) and its associated Peer Database Protocol (PDP) is used to define how to bootstrap, query and publish to a dynamic and heterogeneous information space maintained by self-describing network interfaces.

Chapter 6

A Unified Peer-to-Peer Database Framework

In a distributed system, it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. As in a data integration system, the goal is to exploit several independent information sources as if they were a single source. However, in a large distributed database system spanning many administrative domains, the set of information tuples in the universe is partitioned over one or more distributed nodes, for reasons including autonomy, scalability, availability, performance and security. It is not obvious how to enable powerful discovery query support and collective collaborative functionality that operate on the distributed system as a whole, rather than on a given part of it. Further, it is not obvious how to allow for search results that are fresh, allowing dynamic content. It appears that a Peer-to-Peer (P2P) database network may be well suited to support dynamic distributed database search, for example for service discovery. The overall P2P idea is as follows. Rather than have a centralized database, a distributed framework is used where there exist one or more autonomous database nodes, each maintaining its own data. Queries are no longer posed to a central database; instead, they are recursively propagated over the network to some or all database nodes, and results are collected and sent back to the client. The key problems then are:

• What are the detailed architecture and design options for P2P database searching in the context of service discovery? What response models can be used to return matching query results? How should a P2P query processor be organized? What query types can be answered (efficiently) by a P2P network? What query types have the potential to immediately start piping in (early) results? How can a maximum of results be delivered reliably within the time frame desired by a user, even if a query type does not support pipelining? How can loops be detected reliably using timeouts? How can a query scope be used to exploit topology characteristics in answering a query? For improved efficiency, how can queries be executed in containers that concentrate distributed P2P database nodes into hosting environments with virtual nodes?

• Can we devise a unified P2P database framework for general-purpose query support in large heterogeneous distributed systems spanning many administrative domains? More precisely, can we devise a framework that is unified in the sense that it allows specific applications to be expressed for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options?

In this chapter, we take the first steps towards unifying the fields of database management systems and P2P computing, which so far have received considerable, but separate, attention. We extend database concepts and practice to cover P2P search. Similarly, we extend P2P concepts and practice to support powerful general-purpose query languages such as XQuery [18] and SQL [19]. As a result, we answer the above questions by proposing the so-called Unified Peer-to-Peer Database Framework (UPDF). The related but orthogonal concepts of (logical) link topology and (physical) node deployment model are introduced. Definitions are proposed, clarifying the notion of node, service, fat, thin and ultra-thin P2P networks, as well as the commonality of a P2P network and a P2P network for service discovery. The agent P2P model is proposed and compared with the servent P2P model. A timeout-based mechanism to reliably detect and prevent query loops is proposed. Four techniques to return matching query results to an originator are characterized, namely Routed Response, Direct Response, Routed Metadata Response, and Direct Metadata Response. Query processing in centralized, distributed and P2P databases is unified. A theory of query processing for queries that are (or are not) recursively partitionable is proposed, which directly reflects the basis of the P2P scalability potential. The definition and properties of simple, medium and complex queries are clarified with respect to recursive partitioning. It is established to what extent simple, medium and complex queries support pipelining. To ensure that a maximum of results can be delivered reliably within the time frame desired by a user even if a query does not support pipelining, dynamic abort timeouts using exponential decay with halving as policy are proposed. The result is established that a loop timeout must be static.
The concept of query scope is used to navigate and prune the link topology and to filter on attributes of the deployment model. Indirect specification of scope based on neighbor selection, timeout and radius is detailed. For improved efficiency, the concept of containers for centralized virtual node hosting is established. Three alternative query execution strategies are proposed, namely normal query execution, collecting traversal and quick scope-violating query. The most efficient strategy relaxes the conditions imposed by the query scope, whereas the others preserve the semantics of query and query scope.
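The "exponential decay with halving" abort-timeout policy mentioned above can be sketched as follows. The idea is that each node grants its neighbors half of its own remaining time budget, so that it can still merge late results and respond before its parent's deadline expires, while the loop timeout stays identical on every hop. Function and parameter names are illustrative, not taken from UPDF.

```python
# Sketch of dynamic abort timeouts with exponential decay (halving).
# For simplicity, all deadlines are computed relative to a single start
# time t0; a real node would use its local current time at each hop.

def child_abort_deadline(now, my_deadline):
    # Forward to neighbors with half of the remaining time budget,
    # keeping the other half to merge and return results upstream.
    remaining = my_deadline - now
    return now + remaining / 2.0

def deadlines_along_path(t0, abort_timeout, hops):
    # Abort deadlines seen by successive nodes along one query path.
    deadline = t0 + abort_timeout
    out = [deadline]
    for _ in range(hops):
        deadline = child_abort_deadline(t0, deadline)
        out.append(deadline)
    return out

# With a 4-second budget, per-hop budgets decay as 4, 2, 1, 0.5 seconds:
print(deadlines_along_path(0.0, 4.0, 3))  # [4.0, 2.0, 1.0, 0.5]
```

Note that a loop timeout, unlike the abort timeout, must not be decayed: it has to remain static across hops, or different copies of the same query would expire from the loop-detection state tables at different times.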

6.1 Introduction

Gnutella. We now repeat Pandurangan’s summary of the Gnutella network as an introduction [87]: Gnutella is a public P2P network on the Internet, by which anyone can share, search for and retrieve files and content. A participant first downloads one of the available (free) implementations of the search servent. The participant may choose to make some documents (say, all his FOCS papers) available for public sharing, indexes the content of these documents and runs a search server on the index. His servent joins the network by connecting to a small number (typically 3-5) of neighbors currently connected to the network. When any servent s wishes to search the network with some query q, it sends q to its neighbors. These neighbors return any of their own documents that match the query; they also forward q to their neighbors, and so on. To control network traffic, this fanning-out typically continues to some fixed radius (in Gnutella, typically 7); matching results are fanned back into s along the paths on which q flowed outwards. Thus, every servent can initiate, forward and serve

query results; clearly it is important that the content being searched for be within the search radius of s. A servent typically stays connected for some time, and then drops out of the network – many participating hosts are personal computers on dialup connections.
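The fan-out to a fixed radius described above can be modeled as a TTL-limited flood. The sketch below is a deliberately simplified model (real Gnutella also tags messages with descriptor identifiers, covered in Section 6.3, and runs asynchronously); the topology, file names and substring match are invented for illustration.

```python
# Simplified model of Gnutella-style flooding: a query fans out from a
# start servent to its neighbors up to a fixed radius (TTL), and matching
# results are collected as they would fan back along the same paths.

def flood(topology, local_files, start, query, radius):
    # topology: node -> list of neighbor nodes
    # local_files: node -> list of file names served by that node
    results, seen = [], set()

    def visit(node, ttl):
        if node in seen or ttl < 0:
            return
        seen.add(node)
        # Each servent returns its own matching documents ...
        results.extend(f for f in local_files[node] if query in f)
        # ... and forwards the query with a decremented TTL.
        for nbr in topology[node]:
            visit(nbr, ttl - 1)

    visit(start, radius)
    return results

topo = {"s": ["a"], "a": ["s", "b"], "b": ["a"]}
files = {"s": [], "a": ["virgin-live.mp3"], "b": ["like-a-virgin.mp3"]}
print(flood(topo, files, "s", "virgin", radius=1))  # b lies outside the radius
```

With radius 1 only node a is searched; raising the radius to 2 brings node b's replica into range, which is exactly why content must lie within the search radius of s to be found.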

Topology. A servent can be seen as a node in a network. A link topology describes the link structure among nodes: which nodes are linked with which other nodes. The simplest topology model is a star, in which all nodes are connected to a single central node. Several link topology models covering the spectrum from centralized models to fine-grained fully distributed models can be envisaged, among them single node, star, ring, tree, semi-hierarchical and graph models. Real-world distributed systems often have a more complex organization than any simple topology. They often combine several topologies into a hybrid topology. Nodes typically play multiple roles in such a system. For example, a node might have a centralized interaction with one part of the system, while being part of a hierarchy in another part [41]. Figure 6.1 depicts some example topologies.

Figure 6.1: Example Link Topologies [41].

Clearly not all nodes in a topology are equal. For example, node bandwidth may vary by four orders of magnitude (50 Kbps to 1000 Mbps), latency by six orders of magnitude (10 µs to 10 s), and availability by four orders of magnitude (1% to 99.99%). We stress that it is by no means justifiable to advocate the use of graph topologies irrespective of application context and requirements. Depending on the context, all topologies have their merits and drawbacks in terms of scalability, reliability, availability, content coherence, fault tolerance, security and maintainability. However, from the structural perspective, the graph topology is the most general one, being able to express all other conceivable topologies. Since our goal is to support queries that are generally independent of the underlying topology, this thesis discusses problems arising in graph topologies. A problem solution that applies to a graph also applies to any other topology1. The results of this thesis help enable

1 Of course, a simpler or more efficient solution may exist for any particular topology.

the use of graph topologies where appropriate; they do not require or mandate their use. A link topology is purely a logical construct, as it does not describe where and how the link information is stored and accessed. This is defined by a node deployment model, which defines where and how one or more partitions of the graph are running, stored and accessed. Consider the analogy to the WWW: the hyperlink topology remains identical, no matter whether all pages of the universe are served by a single large dynamic web server or by any kind of worldwide federation of static web servers. A detailed discussion is given in Section 6.9.

Definitions - Service and Node. Let us clarify the notions of node and service. A service exposes some functionality in the form of service interfaces to remote clients. Example services are an echo service, a job scheduler, a replica catalog, a time service, a gene sequencing service and a language translation service. A node is a service that exposes at least the functionality (i.e. service interfaces) for publication and P2P queries. Examples are a hyper registry as introduced in the previous chapter, a Gnutella file sharing node and an extended job scheduler. Put another way, any service that happens to support publication and P2P query interfaces is a node. This implies that every node is a service; it does not imply that every service is a node. Only nodes are part of the P2P topology, while other services are not, because they do not support the required interfaces. Usually, most services are not nodes. However, in some networks most or all services are nodes.

• Most services are not nodes. We propose to speak of a fat P2P network. Typically, only one or a few large and powerful services are nodes, enabling publication and P2P queries. An example is a backbone network of 10 large registry nodes that ties together 10 administrative domains, each hosting a registry node to which local domain services can publish to be discovered, as depicted in Figure 6.2 (left). The services (shown small on the edges) are not part of the network. This is the more conventional scenario. Existing legacy services need not be changed at all.

• Most or all services are nodes. We propose to speak of a thin or ultra-thin P2P network. An example is a network of millions of small services, each having some proprietary core functionality (e.g. replica management optimization, gene sequencing, multi-lingual translation), actively using the network for searching (e.g. to discover replica catalogs, remote gene mappers or language dictionary services), but also actively contributing to its search capabilities, as depicted in Figure 6.2 (right). This scenario may appear to be intrusive, as it suggests that legacy services have to be rewritten. However, this is not the case. Thanks to the concept of service descriptions and service interfaces (a service may consist of a set of independent and distributed service interfaces), additional external publication and search functionality can be added to existing legacy services without touching their code base (see Section 2.2).

There is no difference between these scenarios in terms of technology. Discussion in this chapter is applicable to ultra-thin, thin and fat P2P networks. For simplicity of exposition, examples illustrate ultra-thin networks (every service is a node).

Figure 6.2: Fat (left) and Ultra-thin (right) Peer-to-Peer network.

Definitions - P2P network vs. P2P network for service discovery. In any kind of P2P network, nodes may publish themselves to other nodes, thereby forming a topology. In a P2P network for service discovery, services and other content providers may publish their service links and content links to nodes. Because nodes are services, nodes too may publish their service link (and content links) to other nodes, thereby forming a topology. In any kind of P2P network, a node has a database or some other kind of data source against which queries are applied. In a P2P network for service discovery, this database happens to be the publication database. In other words, publication enables topology construction and at the same time constructs the database to be searched. Discussion in this chapter is applicable to any kind of P2P network, while the examples illustrate service discovery.
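The dual role of publication can be made concrete with a small sketch: a single publish operation both fills the database that queries later run against and, when the publisher is itself a node, extends the P2P topology. The class and attribute names are hypothetical, not part of any protocol defined here.

```python
# Sketch of a discovery node: publication simultaneously constructs the
# searchable database and (for node publishers) the link topology.

class DiscoveryNode:
    def __init__(self, service_link):
        self.service_link = service_link
        self.database = []   # publication database of content links
        self.neighbors = []  # topology: service links of publishing nodes

    def publish(self, content_link, publisher_is_node=False):
        # Every publication constructs the database to be searched ...
        self.database.append(content_link)
        # ... and, if the publisher is itself a node, also the topology.
        if publisher_is_node:
            self.neighbors.append(content_link)

n = DiscoveryNode("http://registry.example.org")
n.publish("http://scheduler.example.org", publisher_is_node=False)  # plain service
n.publish("http://registry2.example.net", publisher_is_node=True)   # another node
```

Both links are now discoverable via queries against the database, but only the second one became a neighbor, since only nodes take part in the topology.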

6.2 Agent P2P Model and Servent P2P Model

Agent P2P Model. Queries in what we propose as the agent P2P model flow as follows. When any originator wishes to search the P2P network with some query, it sends the query to a single node. We call this entry point the agent node of the originator2. The agent applies the query to its local database and returns matching results; it also forwards the query to its neighbor nodes. These neighbors return their local query results; they also forward the query to their neighbors, and so on. For flexibility, the protocol between originator and agent is left unspecified. The agent P2P model is a hybrid of centralization and decentralization. It allows fully decentralized infrastructures, yet also allows seamless integration of centralized client-server computing into an otherwise decentralized infrastructure. An originator may embed its agent in the same process (decentralized). However, the originator may just as well choose a remote node as agent (centralized), for reasons including central control, reliability, continuous availability, maintainability, security, accounting and firewall restrictions on incoming connections for originator hosts. For example, a simple HTML GUI may be sufficient to originate queries that are sent to an organization’s agent node. Note that only nodes are part of the P2P topology, while the originator is not, because it does not possess the functionality of a node. The agent P2P model provides location and distribution transparency to originators. An

2 We stress that in our context, the term agent has nothing to do with intelligence, will-of-its-own, or mobile code. The term is not only well established in classic networking, but is also used in the distributed artificial intelligence community. We use the term in the sense of a service gateway.

originator is unaware that (and how) database tuples are partitioned among nodes. It only communicates with an agent black box. (“People want something that’s logically centralized and physically distributed. But they don’t want the weird errors that come with distribution” - Mark Stuart Day, Cisco Systems).

Servent P2P Model. In contrast, the servent P2P model (e.g. Gnutella) has no agent concept, but only the concept of a servent. Put another way, the agent is always embedded into the originator process, forming a monolithic servent. This model is fully decentralized and does not allow for any degree of centralization. This restriction appears unjustified. More importantly, it seriously limits the applicability of P2P computing. For example, the Gnutella servent model could not cope with the large connectivity spectrum of the user community, ranging from very low to very high bandwidth. As the Gnutella network grew, it became fragmented because nodes with low bandwidth connections could not keep up with traffic. The idea of requiring all functionality to exist at the very edge of the network had to be reconsidered. Eventually, the situation was patched by rendering dumb the low bandwidth servents on the (slow) edges of the network. The notion of centralized reflectors (Gnutella) and super-peers (Morpheus) was (re)invented. A reflector is a powerful high bandwidth gateway for many remote originators with low bandwidth dialup connections. It volunteers to take over the functionality of, and shield the traffic that would normally be carried via, low bandwidth servents. However, servents still keep data locally. The agent P2P model naturally covers centralized and decentralized hybrids. Here a powerful node may act as agent for many remote originators. In the remainder of this thesis, we follow the agent P2P model and do not use the term servent anymore. The terms originator, node and agent (node) are used instead.

6.3 Loop Detection

Query shipping is used to route queries through the nodes of the topology. A query remains identical during forwarding over hops (unless rewritten or split by a query optimizer). The very same query may arrive at a node multiple times, along distinct routes, perhaps in a complex pattern. Loops in query routes must be detected and prevented; otherwise, unnecessary or even endless multiplication of workloads would result. Figure 6.2 depicts topologies with the potential for a query to become trapped in infinite loops. To enable loop detection, an originator attaches a different transaction identifier to each query, which is a universally unique identifier (UUID). The transaction identifier remains identical during query forwarding over hops. A node maintains a state table of recent transaction identifiers and returns an error whenever a query is received that has already been seen. For example, this approach is used in Gnutella. In practice, it is sufficient for the UUID to be unique with exceedingly large probability, suggesting the use of a 128 bit integer computed by a cryptographic hash digest function such as MD5 [88] or SHA-1 [89] over message text, originator IP address, current time and a random number.
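The transaction-identifier mechanism can be sketched as follows. The digest inputs match the suggestion above (message text, originator address, current time, a random number); the class and method names are illustrative, and a production state table would also expire old entries after the loop timeout.

```python
import hashlib
import os
import time

# Sketch of loop detection: the originator attaches a 128-bit transaction
# identifier that stays constant across hops; each node keeps a state
# table of recent identifiers and rejects queries it has already seen.

def make_transaction_id(message, originator_ip):
    # 128-bit identifier via a cryptographic digest; unique with
    # exceedingly large probability rather than guaranteed unique.
    material = f"{message}|{originator_ip}|{time.time()}|{os.urandom(16).hex()}"
    return hashlib.md5(material.encode()).hexdigest()

class NodeState:
    def __init__(self):
        self.seen = set()  # recent transaction identifiers (no expiry here)

    def accept(self, transaction_id):
        # Returns False (i.e. an error) for a query already seen.
        if transaction_id in self.seen:
            return False
        self.seen.add(transaction_id)
        return True

node = NodeState()
tid = make_transaction_id("find services in cern.ch", "192.0.2.1")
print(node.accept(tid))  # True: first arrival is processed and forwarded
print(node.accept(tid))  # False: same query arriving via another route is dropped
```

Because the identifier never changes during forwarding, any copy of the query that loops back to a node is recognized regardless of the route it took.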

6.4 Routed vs. Direct Response, Metadata Responses

We propose to distinguish four techniques to return matching query results to an originator: Routed Response, Direct Response, Routed Metadata Response, and Direct Metadata Response, as depicted in Figure 6.3. Let us examine the main implications with a Gnutella use case. A typical Gnutella query such as “Like a virgin” is matched by some hundreds of files, most of them referring to replicas of the very same music file. Not all matching files are identical, because there exist multiple related songs (e.g. remixes, live recordings) and multiple versions of a song (e.g. with different sampling rates). A music file has a size of at least several megabytes. Many thousands of concurrent users submit queries to the Gnutella network. A large fraction of users lives on slow and unreliable dialup connections.

[Figure 6.3 depicts the response modes as numbered message flows between originator, agent node and nodes: a) Routed Response (RR); b) Direct Response without Invitation; c) Direct Response with Invitation (DR); d) Routed Response with Metadata (RRM); e) Direct Metadata Response without Invitation; f) Direct Metadata Response with Invitation (DRM).]

Figure 6.3: Peer-to-Peer Response Modes.

• Routed Response. (Figure 6.3-a). Results are propagated back to the originator along the paths on which the query flowed outwards. Each (passive) node returns to its (active) client not only its own local results but also all remote results it receives from neighbors. The response protocol is tightly coupled to the query protocol. Routing messages through a logical overlay network of P2P nodes is much less efficient than routing through a physical network of IP routers [90]. Routing back even a single Gnutella file (let alone all results) for each query through multiple nodes would consume large amounts of overall system bandwidth, most likely grinding Gnutella to a screeching halt. As the P2P network grows, it is fragmented because nodes with low bandwidth

connections cannot keep up with traffic [91]. Consequently, routed responses are not well suited for file sharing systems such as Gnutella. In general, overall economics dictate that routed responses are not well suited for systems that return many and/or large results.

• Direct Response With and Without Invitation. To better understand the underlying idea, we first introduce the simpler variant, which is Direct Response Without Invitation (Figure 6.3-b). Results are not returned by routing back through intermediary nodes. Each (active) node that has local results sends them directly to the (passive) agent, which combines them and hands them back to the originator. Response traffic does not travel through the P2P system. It is offloaded via individual point-to-point data transfers on the edges of the network. The response push protocol can be separated from the query protocol. For example, HTTP, FTP or other protocols may be used for response push. Let us examine the main implications with a use case. As already mentioned, a typical Gnutella query such as “Like a virgin” is matched by some hundreds of files, most of them referring to replicas of the very same music file. For Gnutella users it would be sufficient to receive just a small subset of matching files. Sending back all such files would unnecessarily consume large amounts of direct bandwidth, most likely restricting Gnutella to users with abundant cheap bandwidth at their disposal. Note, however, that the overall Gnutella system would be only marginally affected by a single user downloading, say, a million music files, because the largest fraction of traffic does not travel through the P2P system itself. In general, individual economics dictate that direct responses without invitation are not well suited for systems that return many equal and/or large results when a small subset would be sufficient. A variant based on invitation (Figure 6.3-c) softens the problem by inverting control flow. Nodes with matching files do not blindly push files to the agent. Instead they invite the agent to initiate downloads. The agent can then act as it sees fit. For example, it can filter and select a subset of data sources and files and reject the rest of the invitations.
Due to its inferiority, the variant without invitation is not considered any further. In the remainder of this thesis, we use the term Direct Response as a synonym for Direct Response With Invitation.

• Routed Metadata Response and Direct Metadata Response. Here interaction consists of two phases. In the first phase, routed responses (Figure 6.3-d) or direct responses (Figure 6.3-e,f)) are used. However, nodes do not return data results in response to queries, but only small metadata results. The metadata contains just enough information to enable the originator to retrieve the data results and possibly to apply filters before retrieval. In the second phase, the originator selects, based on the metadata, which data results are relevant. The (active) originator directly connects to the relevant (passive) data sources and asks for data results. Again, the largest fraction of response traffic does not travel through the P2P system. It is offloaded via individual point-to-point data transfers on the edges of the network. The retrieval protocol can be separated from the query protocol. For example, HTTP, FTP or other protocols may be used for retrieval. 6.4. ROUTED VS. DIRECT RESPONSE, METADATA RESPONSES 93

The routed metadata response approach is used by file sharing systems such as Gnutella. A Gnutella query does not return files; it just returns an annotated set of HTTP URLs. The originator connects to a subset of these URLs to download files as it sees fit. Another example is a service discovery system where the first phase returns a set of service links instead of full service descriptions; in the second phase, the originator connects to a subset of these service links to download service descriptions as it sees fit. Yet another example is a referral system where the first phase uses routed metadata response to return the service links of the set of nodes having local matching results (“Go ask these nodes for the answer”). In the second phase, the originator or agent connects directly to a subset of these nodes to query and retrieve result sets as it sees fit. This variant avoids the “invitation storm” possible under Direct Response. Referrals are also known as redirections3.
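The two-phase interaction can be sketched as follows. Phase one returns only small metadata records (here, annotated URLs, as in Gnutella); in phase two the originator filters the metadata and retrieves full results directly from a few selected sources. The node structure, URLs and match logic are invented for illustration, and `fetch` stands in for whatever retrieval protocol (e.g. HTTP) is used in the second phase.

```python
# Sketch of a metadata response mode: small metadata travels through
# the P2P system; bulk data is retrieved point-to-point on the edges.

def phase1_query(nodes, query):
    # Each node answers with metadata records, not with the data itself.
    return [meta for node in nodes
            for meta in node["index"] if query in meta["name"]]

def phase2_retrieve(metadata, fetch, max_sources=2):
    # The originator selects only a few relevant sources and contacts
    # them directly, avoiding both routed bulk traffic and an
    # "invitation storm" at the agent.
    selected = metadata[:max_sources]
    return [fetch(m["url"]) for m in selected]

nodes = [
    {"index": [{"name": "like-a-virgin.mp3", "url": "http://a.example/1"}]},
    {"index": [{"name": "virgin-remix.mp3", "url": "http://b.example/7"},
               {"name": "other.mp3", "url": "http://b.example/8"}]},
]
store = {"http://a.example/1": b"bytes-1", "http://b.example/7": b"bytes-7"}

meta = phase1_query(nodes, "virgin")          # phase 1: metadata only
data = phase2_retrieve(meta, fetch=store.get)  # phase 2: direct retrieval
```

The same two-phase shape covers service discovery (service links first, full WSDL descriptions second) and referrals (node links first, direct queries second).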

Comparison of Response Mode Properties. Let us compare the properties of the various response models. The following abbreviations are used: RR (Routed Response), RRM (Routed Response with Metadata), RRX (Routed Response with and without metadata), DR (Direct Response), DRX (Direct Response with and without metadata).

• Distribution and Location Transparency. In the response models without metadata, the originator is unaware that (and how) tuples are partitioned among nodes. In other words, these models are transparent with respect to distribution and location. Metadata responses require an originator to contact individual data providers to download full results, and hence are not transparent.

• (Efficient) Query Support. All models can answer any query. Both simple and medium queries can be answered efficiently by RRX and DRX, whereas a complex query cannot be answered efficiently. (Justification of this result is deferred to Section 6.5). Transmission of duplicate results unnecessarily wastes bandwidth. RRX can eliminate duplicates already along the query path, whereas DRX can only do so in the final stage, at the agent. Similarly, maximum result set size limiting is more efficient under RRX because superfluous results can already be discarded along the query path.

• Economics. RR results travel multiple hops rather than just a single hop. This leads to poor overall economics. The effect is more pronounced for large results, as is the case for music files. RR can also lead to unfortunate individual economics. A user that issues few or undemanding queries consumes few system resources. However, if many heavy results for queries from other parties are routed back via such a user’s node, the node can end up in a situation where it pays for large amounts of bandwidth and gives it away for free to anonymous third parties. For a given user, the costs may drastically outweigh the gains. One could perhaps devise appropriate authorization, quality of service and flow control policies. The unsatisfying economic situation is similar to that of physical IP routers on the Internet, which also forward traffic from and to third

3A metadata response mode with a radius scope of zero can be used to implement the referral behavior of the Domain Name System (DNS). For details, see Section 6.11. 94 CHAPTER 6. A UNIFIED PEER-TO-PEER DATABASE FRAMEWORK

parties. In any case, there remains the fact that results travel multiple hops rather than just one. In principle, RRM has the same poor economic properties as RR. However, if metadata is very small in size (e.g. as in Gnutella), then the incurred processing and transmission cost may be acceptable. For example, Gnutella nodes just route back an annotated set of HTTP URLs as metadata. Under DRX, result traffic does not travel through the P2P system. Retrieving results is a deal between just two parties, the provider and the consumer. Consequently, individual economics are controllable and predictable. A user is not charged much for other peoples workloads, unless he explicitly volunteers.

• Number of TCP Connections at Originator. Under RR and DR, just one (or no) TCP connection is required at the originator, whereas metadata modes require a connection per (selected) data provider. The more data providers are selected, the more heavyweight data retrieval becomes. Metadata modes can encounter serious latency limitations due to the very expensive nature of secure (and even insecure) TCP connection setup. Hence, the approach does not scale well. However, for many use cases this may not be a problem because a client always selects only a small number of data providers (e.g. 10).

• Number of TCP Connections at Agent. Usually a node has few neighbors (five to hundreds). Under RRX, one TCP connection per neighbor is required at an agent. Under DRX, a connection per data provider is additionally required. Again, the more data providers exist, the more heavyweight data retrieval becomes. DRX can encounter serious latency limitations due to the very expensive nature of secure (and even insecure) TCP connection setup. For example, a query that finds the total number of services in the domain cern.ch should use RRX; under DRX, it may generate responses from every single node in that domain. Consequently, an agent can face an invitation storm resembling a denial of service attack. On the other hand, the potential to exploit parallelism is large, as all data providers can be handled independently in parallel.

• Latency. If a query is of a type that cannot support pipelining (see Section 6.6), the latency for the first result to arrive at the originator is always poor. For a pipelined query, the latency for the first result to arrive is small under DRX, because a response travels a single hop only. Under RRX, a response travels multiple hops, and latency increases accordingly. However, the cost of TCP connection setup at originator and/or agent can invert the situation. Under RR, the cost of TCP connection setup to nodes is paid only once (at node publication time), because connections can typically be kept alive until node deregistration. This is not the case under the other response modes.

• Caching. Caching is a technique that trades content freshness for response time. RRX can potentially support caching of content from other nodes at intermediate nodes because response flow naturally concentrates and integrates results from many nodes. DRX nodes return results directly and independently, and hence cannot efficiently support caching.

• Trust Delegation to Unknown Parties. Query and result traffic are subject to security attacks. It is not sufficient to establish a secure mutually authenticated channel between any two nodes, because malicious nodes can divert routes or modify queries and results. Since a query is almost always routed through multiple hops, many of which are unknown to the agent, we believe that indirect delegation of trust to unknown parties cannot practically be avoided4. Security-sensitive applications should choose DRX because at least the retrieval of results occurs in a predictable manner, between just two parties that can engage in secure mutual authentication and authorization. RRM merely delegates trust on metadata results, but not on full results.

The properties discussed are summarized in Table 6.1.

Property                                         Routed Response         Direct Response         Any Metadata Response
----------------------------------------------   ---------------------   ---------------------   ---------------------
Query supported                                  Any                     Any                     Any
Query efficiently supported                      Simple, Medium          Simple, Medium          n.a.
Duplicate elimination                            Early                   Late                    n.a.
Maximum result set size limiting                 Early                   Late                    n.a.
Messages traveling in P2P system /               Query and large         Query / Small           Metadata instead of
  risk to pay more than earn                     results / Large                                 results / Small
First result latency assuming pipelining         N hops, no              1 hop, connection       n.a.
                                                 connection setup        setup
TCP connections at originator / scalability      One / Good              One / Good              Large / Poor
  in the number of nodes having results
TCP connections at agent / parallelism /         Small / Small / Good    Large / Large / Poor    n.a.
  scalability in the number of nodes
  having results
Caching possible                                 Yes                     No                      n.a.
Trust delegation to unknown parties              Query and Results       Query                   Query + Metadata

Table 6.1: Comparison of Peer-to-Peer Response Mode Properties

Response Mode Switches and Shifts. Although from the functional perspective all response modes are equivalent, clearly no mode is optimal under all circumstances. The question arises as to what extent a given P2P network must mandate the use of any particular response mode throughout the system. Observe that nodes are autonomous and defined by their interface only. A node does not “see” what kind of response mode (or technology in general) its neighbors use in answering a query. As long as query semantics are preserved, the node does not care. Consequently, we propose that response modes can be mixed by switches and shifts, in arbitrary permutations, as depicted in Figure 6.4.

• Routed Response ⇒ Direct Response switch (Figure 6.4-a). Starting from the agent, Routed Response is used initially. The central node (“football”) receives a query from the agent. For some reason, it decides to answer the query using Direct Response. The response flow that would have been taken under Routed Response is shown crossed out.

Figure 6.4: Response Mode Switches and Shifts. a) RR ⇒ DR Switch, b) DR ⇒ RR Switch, c) DR ⇒ DR Shift.

4 Even though trust delegation technologies exist [92], they do not scale to a significant number of autonomous parties, let alone parties that dynamically join and leave. The problem is how to enable practical establishment and administration of direct and indirect trust relationships.

• Direct Response ⇒ Routed Response switch. (Figure 6.4-b). Initially, Direct Response is used. However, the “football” decides to answer the query using Routed Response.

• Direct Response ⇒ Direct Response shift. (Figure 6.4-c). Initially, Direct Response is used. The football decides to continue using Direct Response but shifts the target of responses: to its own neighbors, the football declares itself as (a fake) agent. The responses that would have flowed to the real agent now flow back into the football, and then from the football to the real agent. Note again that this does not break semantics, because the football behaves as if the results had been obtained from its own local database. The real agent receives the same results, but solely from the football.

• Routed Response ⇒ Routed Response shift. At each hop, the response target is shifted to be the current node. Interestingly, this kind of shift is at the very heart of the definition of routed response. The classification introduced here shows that this is not the only possible approach.

A node may choose its response mode based on a local and autonomous assessment of the advantages and disadvantages involved. However, because of its context knowledge, often the client (e.g. originator) is in the best position to judge what kind of response mode would be most suitable. Therefore, it is useful to allow specifying as part of the query a hint that indicates the preferred response mode (routed or direct).

6.5 Query Processing

In a distributed database system, there exists a single local database and zero or more neighbors. A classic centralized database system is a special case where there exists a single local database and zero neighbors. From the perspective of query processing, a P2P database system has the same properties as a distributed database system, in a recursive structure. Hence, we propose to organize the P2P query engine like a general distributed query engine [45, 93].

A given query involves a number of operators (e.g. SELECT, UNION, CONCAT, SORT, JOIN, GROUP, SEND, RECEIVE, SUM, MAX, MAXSETSIZE, IDENTITY) that may or may not be exposed at the query language level. For example, the SELECT operator takes a set and returns a new set with tuples satisfying a given predicate. The UNION operator computes the union of two or more sets. The CONCAT operator concatenates the elements of two or more sets into a list of arbitrary order (without eliminating duplicates)5. The MAXSETSIZE operator limits the maximum result set size. The IDENTITY operator returns its input set unchanged. The semantics of an operator can be satisfied by several operator implementations, using a variety of algorithms, each with distinct resource consumption, latency and performance characteristics.

The query optimizer chooses an efficient query execution plan, which is a tree plugged together from operators. In an execution plan, a parent operator consumes results from child operators. Query execution is driven from the (final) root consumer in a top down fashion. For example, a request for results from the root operator may in turn lead to a request for results from child operators, which in turn request results from their own child operators, and so on. By means of an execution plan, an optimizer can move a query to data or data to a query. In other words, queries and sub-queries can be executed locally or at remote nodes. Performance tradeoffs of query shipping, data shipping and hybrid shipping are discussed in [94].

Template Query Execution Plan. Recall that Section 3.2 proposed a query model. Any query Q within our query model can be answered by an agent with the template execution plan A depicted in Figure 6.5. The plan applies a local query L against the tuple set of the local database. Each neighbor (if any) is asked to return a result set for (the same) neighbor query N. Local and neighbor result sets are unionized into a single result set by a unionizer operator U that must take the form of either UNION or CONCAT. A merge query M is applied that takes as input the result set and returns a new result set. The final result set is sent to the client, i.e. another node or an originator.
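The template plan can be sketched as follows; the `Node` class, the `execute` helper and the example tuples are hypothetical illustrations, not part of the thesis framework. The parameter names (L, M, N, U) follow the template plan A:

```python
# Template plan A: apply local query L, ask each neighbor for the
# results of neighbor query N, unionize with U, apply merge query M.

class Node:
    def __init__(self, local_tuples, neighbors=()):
        self.local_tuples = list(local_tuples)
        self.neighbors = list(neighbors)

def execute(node, L, M, U, N):
    local = L(node.local_tuples)
    neighbor_results = [N(nb) for nb in node.neighbors]
    return M(U([local] + neighbor_results))

# Simple query: select services of a given type. M = IDENTITY,
# U = UNION, and N recursively applies the same plan at every node.
def find_repcats(node):
    return execute(
        node,
        L=lambda ts: {t for t in ts if t.startswith("repcat")},  # SELECT
        M=lambda s: s,                                           # IDENTITY
        U=lambda sets: set().union(*sets),                       # UNION
        N=find_repcats)

root = Node(["repcat-a", "sched-b"],
            [Node(["sched-c"]), Node(["repcat-d"], [Node(["repcat-e"])])])
print(sorted(find_repcats(root)))  # ['repcat-a', 'repcat-d', 'repcat-e']
```

Because N is simply the plan itself, this example is also a preview of a recursively partitionable (here: simple) query.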

Centralized Execution Plan. To see that indeed any query against any kind of database system can be answered within this framework, we derive a simple centralized execution plan that always satisfies the semantics of any query Q. The plan substitutes specific subplans into the template plan A, leading to distinct plans for the agent node (Figure 6.6-a) and neighbor nodes (Figure 6.6-b). More specifically, in the case of XQuery and SQL, parameters are substituted as follows:

5 A list can be emulated by a set using distinct surrogate keys.

Figure 6.5: Template Execution Plan. (A ... Agent Plan, L ... Local Query, M ... Merge Query, N ... Neighbor Query, Q ... User Query, U ... Unionizer Operator.)

XQuery:  A: M = Q, U = UNION, L = "RETURN /", N' = N
         N: M = IDENTITY, U = UNION, L = "RETURN /", N' = N

SQL:     A: M = Q, U = UNION, L = "SELECT *", N' = N
         N: M = IDENTITY, U = UNION, L = "SELECT *", N' = N

Figure 6.6: Centralized Execution Plan. a) Agent Query Plan, b) Neighbor Query Plan.

In other words, the agent's plan A fetches all raw tuples from the local and all remote databases, unionizes the result sets, and then applies the query Q. Neighbors are handed a rewritten neighbor query N that recursively fetches all raw tuples, and returns their union. The neighbor query N is recursively partitionable (see below).

The same centralized plan works for routed and direct response, both with and without metadata. Under direct response, a node does forward the query N, but does not attempt to receive remote result sets (conceptually, empty result sets are delivered). The node does not send a result set to its predecessor, but directly back to the agent. Recall that in a distributed database system there exists a single local database and zero or more neighbors; a classic centralized database system is the special case with a single local database and zero neighbors; and from the perspective of query processing, a P2P database system has the same properties as a distributed database system, in a recursive structure. Consequently, the very same centralized execution plan applies to any kind of database system, and any query within our query model can be answered. The centralized execution plan can be inefficient because potentially large amounts of base data have to be shipped to the agent before locally applying the user's query. However, sometimes this is the only plan that satisfies the semantics of a query; this is always the case for a complex query. A more efficient execution plan can sometimes be derived (as proposed below); this is always the case for a simple and medium query.

Recursively Partitionable Query. A P2P network can be efficient in answering queries that are recursively partitionable. A query Q is recursively partitionable if, for the template plan A, there exists a merge query M and a unionizer operator U to satisfy the semantics of the query Q assuming that L and N are chosen as L = Q and N = A. In other words, a query is recursively partitionable if the very same execution plan can be recursively applied at every node in the P2P topology. The corresponding execution plan is depicted in Figure 6.7.

Figure 6.7: Execution Plan for Recursively Partitionable Query. (A ... Agent Plan, Q ... User Query, M ... Merge Query, U ... Unionizer Operator.)

The input and output of a merge query have the same form as the output of the local query L. Query processing can be parallelized and spread over all participating nodes. Potentially very large amounts of information can be searched while investing few resources, such as processing time, per individual node. The recursive parallel spread of load implied by a recursively partitionable query is the basis of the massive P2P scalability potential. However, query performance is not necessarily good, for example due to high network I/O costs. We are now in a position to clarify the definition of simple, medium and complex queries.

• Simple Query. A query is simple if it is recursively partitionable using M = IDENTITY and U = UNION (e.g. QS1 - QS7, Q20 - Q22).

• Medium Query. A query is a medium query if it is not simple, but it is recursively partitionable (e.g. QM1 - QM5 and Q25 - Q26).

• Complex Query. A query is complex if it is not recursively partitionable (e.g. QC1 - QC3).

It is an interesting open question (at least to us) whether a query processor can automatically determine if a correct merge query and unionizer exist, and if so, how to choose them. Related problems have been studied extensively in the context of distributed and parallel query processing, as well as query rewriting for heterogeneous and homogeneous relational database systems [45, 95, 93]. Distributed XQueries are an emerging field [96]. For simplicity, in the remainder of this thesis we assume that the user explicitly provides M and U along with a query Q. If M and U are not provided as part of a query to a given node, the node acts defensively by assuming that the query is not recursively partitionable. Choosing M and U is straightforward for a human being. Consider for example the following medium XQueries.

• (QM2) Return the number of replica catalog services. The merge query computes the sum of a set of numbers6. The unionizer is CONCAT.

Q = RETURN count(/tupleset/tuple/content/service[interface/@type="repcat"]) M = RETURN sum(/tupleset/tuple) U = CONCAT

• (QM1) Find the service with the largest uptime.

Q = M = RETURN (/tupleset/tuple[@type="service"] SORTBY (./@uptime)) [last()]
U = UNION

A custom merge query can be useful. For example, assume that each individual result tuple is tagged with a timestamp indicating the time when the information expires and ceases to be valid. A custom merge query can ignore all results that have already expired. Alternatively, it can ignore all results but the one with the most recent timestamp. As another example, a custom merge query can cut off all but the first 100 result tuples. Such a result set size limiting feature (maxResults) attempts to make bandwidth consumption more predictable.
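For illustration, here is a minimal sketch of a recursively partitionable count query in the spirit of QM2; the `Node` class and the topology are hypothetical. Each node applies the query locally, child results are concatenated (U = CONCAT), and the merge query sums the partial counts (M = sum):

```python
# Medium query: count matching services across the whole topology.
# Every node runs the same plan (N = A), so the work is spread
# recursively over all participating nodes.

class Node:
    def __init__(self, local_count, neighbors=()):
        self.local_count = local_count        # result of Q on the local database
        self.neighbors = list(neighbors)

def count_services(node):
    partials = [node.local_count]                              # local partial count
    partials += [count_services(nb) for nb in node.neighbors]  # U = CONCAT of child results
    return sum(partials)                                       # M = sum

topology = Node(2, [Node(1), Node(3, [Node(1)])])
print(count_services(topology))  # 7
```

Note that U must be CONCAT rather than UNION here: two nodes may legitimately report the same partial count, and a set union would wrongly collapse such duplicates.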

6 Recall from Section 3.3 that the query engine always encapsulates the query output with a tupleset root element. A query need not generate this root element as it is implicitly added by the environment.

6.6 Pipelining

The success of many applications depends on how fast they can start producing initial and relevant portions of the result set, rather than how fast the entire result set is produced [97]. This is particularly often the case in distributed systems where many nodes are involved in query processing, each of which may be unresponsive for many reasons. The situation is even more pronounced in systems with loosely coupled autonomous nodes. Often an originator would be happy to do useful work with one or a few early results, as long as they arrive quickly and reliably; results that arrive later can be handled later, or are ignored anyway. For example, in an interactive session, a typical Gnutella user is primarily interested in being able to start some music download as soon as possible. The user is quickly disappointed when not a single result for a query arrives in less than four seconds. Choosing among 1000 versions of “Like a Virgin” is interesting, but helps little if it comes at the expense of, say, one-minute idle waits. As another example, consider a user who wants to discover schedulers to submit a job. It is interesting to discover that 100 schedulers are available, but the primary requirement is to find at least three quickly and reliably.

Database theory and practice establish that query execution engines in general, and distributed query execution engines in particular, should be based on iterators [45]. An operator corresponds to an iterator class. Iterators of any kind have a uniform interface, namely the three methods open(), next() and close(). In an execution plan, a parent iterator consumes results from child iterators. Query execution is driven from the (final) root consumer in a top down fashion. For example, a call to next() may call next() on child iterators, which in turn call next() on their child iterators, and so on. For efficiency, the method next() can be asked to deliver several results at once in a so-called batch. The semantics are as follows: “Give me a batch of at least N and at most M results” (fewer than N results are delivered when the entire query result set is exhausted). For example, the SEND and RECEIVE network communication operators (iterators) typically work in batches.

The monotonic semantics of certain operators such as SELECT, UNION, CONCAT, SEND, RECEIVE, MAXSETSIZE and IDENTITY allow operator implementations to consume just one or a few child results on next(). In contrast, the non-monotonic semantics of operators such as SORT, GROUP, MAX and some JOIN methods require that operator implementations consume all child results already on open() in order to be able to deliver a result on the first call to next(). Since the output of these operators on a subset of the input is not, in general, a subset of the output on the whole input, these operators need to see all of their input before they produce the correct output. This does not break the iterator concept, but it has important latency and performance implications. Whether the root operator of an agent exhibits a short or long latency in delivering the first result to the originator depends on the query operators in use, which in turn depend on the given query. In other words, for some query types the originator can immediately start piping in results (at a moderate rate), while for other query types it must wait for a long time until the first result becomes available (the full result set then arrives almost at once, however).

A query (an operator implementation) is said to be pipelined if it can produce at least one result tuple before all input tuples have been seen. Otherwise, a query (an operator) is said to be non-pipelined. Figure 6.8 depicts examples for both modes.

Figure 6.8: Non-Pipelined (left) and Pipelined Query (right).

Simple queries do support pipelining (e.g. Gnutella queries). Medium queries may or may not support pipelining, whereas complex queries typically do not. The properties are summarized in Table 6.2.

Query Type       Supports Pipelining?
Simple Query     Yes
Medium Query     Maybe
Complex Query    Typically No

Table 6.2: Pipelining Support of Query Types.

Bear in mind that even if a query can be pipelined, the messaging model and underlying network layers in use may not support pipelining, in which case a result set has to be delivered with long latency in a single large batch. To this end, Chapter 7 proposes a Peer Database Protocol (as opposed to query operator implementations) that efficiently supports both pipelined and non-pipelined delivery, and is applicable to any centralized, distributed or P2P architecture, including routed response and direct response, both with and without metadata modes. Finally, note that non-pipelined delivery without a dynamic abort timeout feature is highly unreliable due to the so-called simultaneous abort problem (see below): if only one of the many nodes in the query path fails to be responsive for whatever reason, all other nodes in the chain wait, eventually time out at the same time, and the originator receives not even a single result.
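To make the iterator model concrete, here is a hedged sketch (class and variable names are illustrative) of a pipelined SELECT and a non-pipelined SORT sharing the uniform open()/next()/close() interface:

```python
# Minimal iterator-based operators. Select is pipelined: one call to
# next() touches only as much input as needed to produce one output
# tuple. Sort is non-pipelined: it must drain its child on open()
# before the first next() can return anything.

class Scan:
    def __init__(self, tuples): self.tuples = tuples
    def open(self): self.i = 0
    def next(self):
        if self.i >= len(self.tuples): return None
        t = self.tuples[self.i]; self.i += 1
        return t
    def close(self): pass

class Select:                      # pipelined (monotonic)
    def __init__(self, child, pred): self.child, self.pred = child, pred
    def open(self): self.child.open()
    def next(self):
        while (t := self.child.next()) is not None:
            if self.pred(t): return t
        return None
    def close(self): self.child.close()

class Sort:                        # non-pipelined (non-monotonic)
    def __init__(self, child): self.child = child
    def open(self):
        self.child.open()
        buf = []
        while (t := self.child.next()) is not None:  # consume ALL input
            buf.append(t)
        self.buf = sorted(buf); self.i = 0
    def next(self):
        if self.i >= len(self.buf): return None
        t = self.buf[self.i]; self.i += 1
        return t
    def close(self): self.child.close()

plan = Sort(Select(Scan([5, 2, 9, 4]), lambda x: x > 3))
plan.open()
out = []
while (t := plan.next()) is not None:
    out.append(t)
plan.close()
print(out)  # [4, 5, 9]
```

Swapping the Sort for another Select would let the first tuple flow to the consumer immediately, which is exactly the pipelined versus non-pipelined distinction discussed above.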

6.7 Static Loop Timeout and Dynamic Abort Timeout

Clearly there comes a time when a user is no longer interested in query results, no matter whether any more results might be available. The query roaming the network and its response traffic should fade away after some time. In addition, P2P systems are well advised to limit resource consumption by defending against runaway queries roaming forever or producing gigantic result sets, either unintended or malicious. To address these problems, an absolute abort timeout is attached to a query as it travels across hops. An abort timeout can be seen as a deadline. Together with the query, a node tells a neighbor “I will ignore (the rest of) your result set if I have not received it before 12:00:00 today.” The problem, then, is to ensure that a maximum of results can be delivered reliably within the time frame desired by a user.

The value of a static timeout remains unchanged across hops, except for defensive modification in flight triggered by runaway query detection (e.g. infinite timeout). In contrast, the value of a dynamic timeout is intended to decrease at each hop. Nodes further away from the originator may time out earlier than nodes closer to the originator.

Dynamic Abort Timeout. A static abort timeout is entirely unsuitable for non-pipelined result set delivery, because it leads to a serious reliability problem, which we propose to call simultaneous abort timeout. If just one of the many nodes in the query path fails to be responsive for whatever reason, all other nodes in the path wait, eventually time out and attempt to return at least a partial result set. However, it is impossible for any of these partial results to ever reach the originator, because all nodes time out simultaneously (and it takes some time for results to flow back). For example, the agent times out and attempts to return its local partial results to the originator. After that, all partial results flowing to the agent from neighbors, and their neighbors, etc. are discarded; it is already too late. However, even the agent cannot deliver results to the originator because the originator has already timed out (shortly) before the results arrive. Hence, the originator receives not even a single result if just one of the many nodes in the query path fails to be responsive.

To address the simultaneous abort timeout problem, we propose dynamic abort timeouts. Under dynamic abort timeout, nodes do not time out at the same time. Instead, nodes further away from the originator time out earlier than nodes closer to the originator. This provides a safety time window for the partial results of any node to flow back across multiple hops to the originator. Together with the query, a node tells a neighbor “I will ignore (the rest of) your result set if I have not received it before 12:00:00 today. Do whatever you think is appropriate to meet this deadline.” Intermediate nodes can and should adaptively decrease the timeout value as necessary, in order to leave a large enough time window for receiving and returning partial results subsequent to timeout.

Observe that the closer a node is to the originator, the more important it is (because if it cannot meet its deadline, results from a large branch are discarded). Further, the closer a node is to the originator, the larger are its response and bandwidth consumption. Thus, as a good policy to choose the safety time window, we propose exponential decay with halving. The window size is halved at each hop, leaving large safety windows for important nodes and tiny windows for nodes that contribute only marginal result sets. Taking into account network latency and the time it takes for a query to be locally processed, the timeout is updated at each hop N according to the following recurrence formula:

    timeout_N = currenttime_N + (timeout_{N-1} - currenttime_N) / 2        (6.1)

Figure 6.9: Dynamic Abort Timeout.

Consider for example Figure 6.9. At time t the originator submits a query with a dynamic abort timeout of t+4 seconds. In other words, it warns the agent to ignore results after time t+4. The agent in turn intends to safely meet the deadline and so figures that it needs to retain a safety window of 2 seconds, already starting to return its (partial) results at time t+2. The agent warns its own neighbors to ignore results after time t+2. The neighbors also intend to safely meet the deadline. From the 2 seconds available, they choose to allocate 1 second, and leave the rest to the branch remaining above. Eventually, the safety window becomes so small that a node can no longer meet a deadline on timeout. The results from the unlucky node are ignored, and its partial results are discarded. However, other nodes below and in other branches are unaffected. Their results survive and have enough time to hop all the way back to the originator before time t+4.

Instead of ignoring results that miss their deadline, a node may also close the connection. This may, but need not, be harmless. The connection is typically simply reestablished as soon as a new query is to be forwarded. However, in an attempt to educate good P2P citizens, a node may choose to stop propagating or deny service to neighbors that repeatedly miss abort deadlines. For example, a strategy may use an exponential back-off algorithm. Note that as long as a node obeys its timeout, it can independently implement any timeout policy it sees fit without regard to the policy implemented at other nodes. If a node misbehaves or maliciously increases the abort timeout, it risks not being able to meet its own deadline, and is likely soon dropped or denied service. Such healthy measures move less useful nodes to the edge of the network where they cause less harm, because their number of topology links tends to decrease.
To summarize, under non-pipelined result set delivery, dynamic abort timeouts using exponential decay with halving ensure that a maximum of results can be delivered reliably within the time frame desired by a user. We speculate that dynamic timeouts could also incorporate sophisticated cost functions involving latency and bandwidth estimation and/or economic models.
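A small sketch (hypothetical function and variable names) of the recurrence in Equation 6.1, reproducing the per-hop deadlines of the Figure 6.9 example:

```python
# Dynamic abort timeout with exponential decay: at each hop the
# remaining time window is halved, so nodes farther from the
# originator time out earlier.

def next_timeout(prev_timeout, current_time):
    """timeout_N = currenttime_N + (timeout_{N-1} - currenttime_N) / 2"""
    return current_time + (prev_timeout - current_time) / 2

t = 0.0                        # originator submits the query at time t
timeout = t + 4                # dynamic abort timeout of t+4 seconds
deadlines = []
for _ in range(4):             # successive hops, assuming instant forwarding
    timeout = next_timeout(timeout, t)
    deadlines.append(timeout)
print(deadlines)  # [2.0, 1.0, 0.5, 0.25] -> t+2, t+1, t+0.5, t+0.25
```

In a real deployment `current_time` advances as the query is forwarded and processed, which shrinks the windows even faster than the idealized sequence above.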

Static Loop Timeout. Interestingly, a static loop timeout is required in order to fully preserve query semantics. A dynamic timeout (e.g. the dynamic abort timeout) is unsuitable as loop timeout; otherwise, a problem arises that we propose to call non-simultaneous loop timeout. Recall from Section 6.3 that the same query may arrive at a node multiple times, along distinct routes, perhaps in a complex pattern. Loops in query routes must be detected and prevented; otherwise, unnecessary or even endless multiplication of workloads would result. To this end, a node maintains a state table of recent transaction identifiers and associated loop timeouts, and returns an error whenever a query is received that has already been seen (according to the state table). Before the loop timeout is reached, the same query can potentially arrive multiple times, along distinct routes. On loop timeout, a node may “forget” about a query by deleting it from the state table. To be able to reliably detect a loop, a node must not forget a transaction identifier before its loop timeout has been reached.

However, let us assume for the moment that a dynamic timeout (e.g. the dynamic abort timeout) is used as loop timeout. Consider for example Figure 6.10, which is identical to Figure 6.9 except that the agent has an additional neighbor that can potentially receive the query along more than one path. At time t the originator submits a query with a dynamic abort timeout of t+4 seconds. The agent in turn warns its own neighbors to ignore results after time t+2. Request 5 is sent and arrives, is processed, and its results (step 8) are delivered before the dynamic abort timeout of time t+1. At time t+1 the loop timeout is reached and the query is deleted from the state table. For many reasons, including temporary network segment problems and sequential neighbor processing, request 10 can be delayed. In the example it arrives after time t+1. By this time, the receiving node has already forgotten that it handled the very same query. Hence, the node cannot detect the loop and continues to process and forward (steps 11, 12) the same query again.

Figure 6.10: Loop Detection Failure with Dynamic Loop Timeout.

The non-simultaneous loop timeout problem is caused by the fact that some nodes still forward the query to other nodes when the destinations have already forgotten it. In other words, the problem is that loop timeout does not occur simultaneously everywhere. Consequently, a loop timeout must be static (it does not change across hops) to guarantee that loops can reliably be detected. Along with a query, an originator not only provides a dynamic abort timeout, but also a static loop timeout. Initially at the originator, both values must be identical (e.g. t+4). After the first hop, both values become unrelated.

To summarize, we have abort timeout ≤ loop timeout. Loop timeouts must be static whereas abort timeouts may be static or dynamic. Under non-pipelined result set delivery, dynamic abort timeouts using exponential decay with halving ensure that a maximum of results can be delivered reliably within the time frame desired by a user. A dynamic abort timeout model still requires static loop timeouts to ensure reliable loop detection, so that a node does not forward and answer the same query multiple times.

Finally, note that to allow for meaningful comparison, sources generating time stamps and sinks processing time stamps must be synchronized, for example using the Network Time Protocol (NTP) [72]. Further, nodes in the system must share a common representation of time. We subsequently assume a straightforward standard representation: the difference, measured in milliseconds, between the given UTC time and midnight, January 1, 1970 UTC.
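A minimal Python sketch of these two mechanisms, a state table keyed by static loop timeouts plus halving dynamic abort timeouts, may read as follows (all names and structure are illustrative, not the thesis implementation):

```python
import time

class LoopDetector:
    """State table of recent transaction IDs keyed by static loop timeout."""

    def __init__(self):
        self.seen = {}  # transaction id -> static loop timeout (ms since epoch)

    def accept(self, txid, loop_timeout_ms, now_ms=None):
        """Return False (loop detected) if txid was already seen and not yet expired."""
        now = int(time.time() * 1000) if now_ms is None else now_ms
        # forget entries whose static loop timeout has been reached
        self.seen = {t: exp for t, exp in self.seen.items() if exp > now}
        if txid in self.seen:
            return False          # duplicate arrival along another route
        self.seen[txid] = loop_timeout_ms
        return True

def next_abort_timeout(now_ms, abort_timeout_ms):
    """Dynamic abort timeout with exponential decay: halve the remaining
    time before forwarding to neighbors (t+4 -> t+2 -> t+1 -> ...)."""
    return now_ms + (abort_timeout_ms - now_ms) // 2
```

Note how a query that arrives again after its static loop timeout has passed is accepted anew, which is exactly the behavior the static timeout must be chosen large enough to avoid.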

6.8 Query Scope

As in a data integration system, the goal is to exploit several independent information sources as if they were a single source. This is important for distributed systems in which node topology or deployment model change frequently. For example, cross-organizational Grids and P2P networks exhibit such a character. However, in practice, it is often sufficient (and much more efficient) for a query to consider only a subset of all tuples (service descriptions) from a subset of nodes. For example, a typical query may only want to search tuples (services) within the scope of the domain cern.ch and ignore the rest of the world.

Recall that to this end, Section 3.2 cleanly separated the concepts of (logical) query and (physical) query scope. A query is formulated against a global database view and is insensitive to link topology and deployment model. In other words, to a query the set of tuples appears as a single homogeneous database, even though the set may be (recursively) partitioned across many nodes and databases. This means that in a relational or XML environment, at the global level, the set of all tuples appears as a single, very large, table or XML document, respectively. The query scope, on the other hand, is used to navigate and prune the link topology and filter on attributes of the deployment model. Conceptually, the scope is the input fed to the query. The query scope is a set and may contain anything from all tuples in the universe to none. Both query and scope can prune the search space, but they do so in a very different manner.

A query scope is specified either directly or indirectly. For example, one can directly enumerate the tuples (service descriptions) to be considered. However, this is usually impractical. One can also indirectly define a query scope by specifying a set of nodes, implying that the query should be evaluated against the union of all tuples contained in their respective databases.
One can distinguish scopes based on neighbor selection, timeout and radius. Note that for security reasons a node may choose to ignore or override a third party provided query scope, for example to guard against runaway queries with infinite scope.

Neighbor Selection. For simplicity, all our discussions so far have implicitly assumed a broadcast model (on top of TCP) in which a node forwards a query to all neighbor nodes. However, in general one can select a subset of neighbors, and forward concurrently or sequentially. Fewer query forwards lead to less overall resource consumption. The issue is critical because of the snowballing (epidemic, flooding) effect implied by broadcasting. Overall bandwidth consumption grows exponentially with the query radius, producing enormous stress on the network and drastically limiting its scalability. For details, consult [98, 90]. Clearly, selecting a neighbor subset can lead to incomplete coverage, missing important results. The best policy to adopt depends on the context of the query and the topology.

Context is required to improve on the broadcast model. For example, it makes little sense to forward a Gnutella query to non-Gnutella nodes. The scope can select only neighbors with a service description of interface type "Gnutella". In an attempt to explicitly exploit topology characteristics, a virtual organization of a Grid may deliberately organize global, intermediate and local job schedulers into a tree-like topology. Correct operation of scheduling may require reliable discovery of all or at least most relevant schedulers in the tree. In such a scenario, random selection of half of the neighbors at each node is certainly undesirable. A policy that selects all child nodes and ignores all parent nodes may be more adequate.

Further, a node may maintain statistics about its neighbors. One may only select neighbors that meet minimum requirements in terms of latency, bandwidth or historic query outcomes (maxLatency, minBandwidth, minHistoricResult). Other node properties such as hostname, domain name, owner, etc. can be exploited in query scope guidance, for example to implement security policies. Consider an example where the scheduling system only trusts nodes from a select number of security domains. Here a query should never be forwarded to nodes not matching the trust pattern.

Further, in some systems, finding a single result is sufficient. In general, a user or any given node can guard against unnecessarily large result sets, message sizes and resource consumption by specifying the maximum number of result tuples (maxResults) and bytes (maxResultsBytes) to be returned. Using sequential propagation, depending on the number of results already obtained from the local database and a subset of the selected neighbors, the query may no longer need to be forwarded to the rest of the selected neighbors.
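A minimal sketch of this maxResults guard under sequential propagation (the callable-neighbor interface and all names are illustrative assumptions, not the thesis implementation):

```python
# Illustrative sketch: sequential neighbor propagation that stops forwarding
# once maxResults tuples have been gathered, sparing the remaining neighbors.
def sequential_query(local_db, neighbors, query, max_results):
    # first answer from the local database
    results = [t for t in local_db if query(t)][:max_results]
    for neighbor in neighbors:           # neighbors already filtered by scope policy
        if len(results) >= max_results:
            break                        # no need to burden further nodes
        remaining = max_results - len(results)
        results.extend(neighbor(query, remaining)[:remaining])
    return results
```

With concurrent propagation this optimization is unavailable: all selected neighbors are contacted before any remote result count is known.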

Neighbor Selection Query. For flexibility and expressiveness, we propose to allow the user to specify the selection policy. In addition to the normal query, the user defines a neighbor selection query (XQuery) that takes the tuple set of the current node as input and returns a subset that indicates the nodes selected for forwarding. For example, a neighbor query implementing broadcasting selects all services with registry publication and P2P query capabilities, as follows:

RETURN /tupleset/tuple
  [@type="service" AND
   content/service/interface[@type="Consumer-1.0"] AND
   content/service/interface[@type="XQuery-1.0"]]

A wide range of policies can be implemented in this manner. The neighbor selection policy can draw from the rich set of information contained in the tuples published to the node. Tuple metadata such as type, context, timestamps, etc. can be used for neighbor selection decisions (see Section 4.3). Further, recall that the set of tuples in a database may not only contain service descriptions of neighbor nodes (e.g. in WSDL or SWSDL), but also other kinds of content published from any kind of content provider. For example, this may include host and network information as well as statistics a node periodically publishes to its immediate neighbors.

Broadcast and random selection can be expressed with a neighbor query. One can select nodes that support given interfaces (e.g. Gnutella, Freenet or job scheduling). In a tree topology, a policy can use the tuple context attribute to select all child nodes and to ignore all parent nodes. One can implement domain filters and security filters (e.g. allow/deny regular expressions as used in the Apache HTTP server) if the tuple set includes metadata such as hostname and node owner. Power-law policies [29] can be expressed if metadata includes the number of neighbors to the n-th radius. As usual, for security reasons, a node may choose to ignore, override or extend any scope hints it receives.

The neighbor query concept can also be used for flexible policy implementation internal to a node. In this case, a node always ignores the user-provided neighbor query and uses an internal custom neighbor selection query instead.
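As a hedged illustration, an Apache-style allow/deny hostname filter over the published tuples might look as follows in Python (the tuple layout and the "hostname" field are assumptions made for the sketch):

```python
import re

# Illustrative allow/deny domain filter for neighbor selection, in the spirit
# of Apache-style access rules. A neighbor is selected if its hostname matches
# at least one allow pattern and no deny pattern.
def select_neighbors(tuples, allow, deny):
    allowed = [re.compile(p) for p in allow]
    denied = [re.compile(p) for p in deny]
    return [t for t in tuples
            if any(p.search(t["hostname"]) for p in allowed)
            and not any(p.search(t["hostname"]) for p in denied)]
```

The same shape of predicate could equally be expressed as a neighbor selection query in XQuery; the Python form is merely the node-internal variant.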

Timeout. Clearly there comes a time when a user is no longer interested in query results, regardless of whether more results might be available. The query roaming the network and its response traffic should fade away after some time. Section 6.7 already discussed this issue and its implications on loop detection and non-pipelined result set delivery in depth. Here we just note that timeouts clearly belong to the query scope, rather than the query itself.

Radius. The radius of a query is a measure of path length. More precisely, it is the maximum number of hops a query is allowed to travel on any given path. The radius is decreased by one at each hop. The roaming query and response traffic must fade away upon reaching a radius of less than zero. A scope based on radius serves similar purposes as a timeout; nevertheless, timeout and radius are complementary scope features. The radius can be used to indirectly limit result set size. In addition, it helps to limit latency and bandwidth consumption and to guard against runaway queries with infinite lifetime. In Gnutella and Freenet, the radius is the primary means to specify a query scope7. Neither of these systems supports timeouts.

For maximum result set size limiting, a timeout and/or radius can be used in conjunction with neighbor selection, routed response, and perhaps sequential forward, to implement the expanding ring [99] strategy (the term stems from IP multicasting). Here an agent first forwards the query with a small radius/timeout. Unless enough results are found, the agent forwards the query again with increasingly large radius/timeout values to reach further into the network, at the expense of increasingly large overall resource consumption. On each expansion, radius/timeout are multiplied by some factor.

We now turn to some more subtle points. When precisely does the radius trigger the end of query life? A node rejects a query with a radius of less than zero. When a node accepts a query, the radius is decreased by one, and the query is evaluated. The query is not forwarded to neighbors if the new radius is less than zero. In other words, a new radius of less than zero forces a neighbor selection policy that yields an empty set. For example, an originator can determine the neighbors of an agent by sending it a query with a radius of zero hops.

Note that the radius is not defined to be the exact number of hops a query travels on any given path. Rather, it is more weakly defined as the maximum number of hops a query is allowed to travel on any given path. In other words, it is not guaranteed that a query takes the shortest path from the agent to any given node, thereby covering a total maximum of nodes. There are two reasons for this kind of definition. First, a node may choose to decrease the radius by any value it sees fit in order to reduce resource consumption or to prevent system exploitation. Second, loop detection and unpredictable timing in distributed systems can lead to a phenomenon we propose to call greedy radius pruning. Recall that the very same query may arrive at a node multiple times, along distinct routes, perhaps in a complex pattern. Traveling N hops decreases the radius of a query by N. If the query first arrives via a route with many (fast) hops, and later arrives again via a route with few (slow) hops, the second arrival will be detected as a loop and rejected. However, the successfully forwarded (first) query continues to travel fewer hops than theoretically possible considering the (larger) radius of the second query. If the second query had arrived first, the query would have been able to travel further and potentially collect more matching results. Propagating a query to all neighbors concurrently may somewhat increase query coverage, in particular in homogeneous LANs.

We conclude this section with a list of more speculative means to indirectly specify a query scope.

7 The radius is termed TTL (time-to-live) in these systems.
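The expanding ring strategy discussed above can be illustrated in Python (a minimal sketch; query_fn, the growth factor and the bounds are illustrative assumptions, not part of the thesis implementation):

```python
# Expanding ring sketch: the agent retries the query with a growing radius
# until enough results arrive. query_fn(radius) is an assumed callable that
# evaluates the query within the given radius and returns a result list.
def expanding_ring(query_fn, min_results, radius=1, max_radius=8, factor=2):
    results = []
    while radius <= max_radius:
        results = query_fn(radius)
        if len(results) >= min_results:
            break
        radius *= factor    # reach further, at higher overall cost
    return results
```

The same loop applies unchanged if a timeout, rather than a radius, is the scope parameter being expanded.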

Multiple Entry Nodes. Suppose an originator using a dumb front end to a central remote agent (e.g. the user of an HTML GUI) knows a set of useful nodes (besides its own agent) as entry points for answering a query. For example, it may know one node in each of the participating laboratories of a virtual organization with global spread. Along with the query, the originator may want to provide to its agent a list of entry nodes for query forwarding.

This kind of behavior seems to be at odds with the spirit of P2P computing because communicating nodes are no longer connected as neighbors. The behavior most likely cannot be emulated by temporarily including the entry nodes as neighbors of the agent, because the originator most likely has no administrative control over the central agent, in particular considering the potential for denial of service attacks. In addition, side effects may occur if multiple originators sharing the same agent modify the set of neighbors. Should the behavior reside in the originator rather than the agent? This appears to be the cleanest approach. If the originator is too limited to implement complex logic, a mediator between originator and agent should be introduced that handles query forwarding to multiple agents, and subsequent result set merging.

Path Selection. One can envision a powerful way to navigate and prune the topology via a path expression, as seen from the originator. This can be of significant practical value. For example, for security reasons one may want to forward a query solely along cern.ch node paths, ignoring the rest of the world. In other words, any given node should receive the query only if there exists a continuous path of cern.ch nodes from the node back to the agent. This includes nodes from other domains that are directly reachable from cern.ch nodes. It excludes cern.ch nodes that can only be reached via nodes from other domains. The corresponding XQuery path expression reads as follows:

LET $d := ("cern.ch", "infn.it")
RETURN //node[
    contains($d, domainname(service)) AND
    contains($d, domainname(ancestor::node/service))
  ]/service

To support such a capability a node must parse the path query, see which neighbors apply (if any) and forward a rewritten path query to them. The rewritten path query must exclude the path expression applying to the current node, essentially removing a hop from the expression. It is an open question (at least to us) how to rewrite the query correctly. Consequently, we do not consider this capability any further.

6.9 Containers for Centralized Virtual Node Hosting

A node link topology can be deliberately arranged and exploited by applications. For example, in an attempt to explicitly exploit topology characteristics, a virtual organization of a Grid may deliberately organize global, intermediate and local job schedulers into a tree-like topology. Correct and efficient operation of scheduling may involve queries with a neighbor selection policy that selects all child nodes and ignores all parent nodes. For the scheduling query, it is irrelevant where the nodes are running, and where and how nodes (tuples, service descriptions) are stored. What matters is that the query traverses a tree.

A link topology such as a star, ring, tree or graph describes the link structure among nodes, i.e. which nodes are linked with which other nodes. It is purely a logical construct. It does not describe where and how this link information is stored and accessed. This is defined by a node deployment model, which defines where and how one or more partitions of the graph are running, stored and accessed. We argue that link topology and node deployment are distinct and orthogonal concepts, and hence a node deployment model need not correspond to a link topology at all. Consider the analogy to the WWW: The WWW is a graph of HTML pages. Edges are established through embedded HTML hyperlinks. The graph topology is, by definition, insensitive to how and where HTML pages are physically stored and served (on which hosts, URL paths, and web server technologies). The topology remains identical, no matter whether all pages of the universe are served by a single large dynamic web server or any kind of worldwide federation of static web servers.

The simplest (and most common) deployment model has distinct nodes running on distinct hosts. However, we propose that nodes can also be concentrated in central places called node containers.
A node container is a transparent software hosting environment that embeds one or more nodes, as depicted in Figure 6.11. The set of all nodes in the universe is partitioned over one or more node containers. A container can be a special-purpose program that behaves as if it were a network of nodes (virtual hosting). A well-known example of virtual hosting is web serving. A web server can serve millions of static or dynamic pages from an essentially infinitely large name space of URLs (nodes). To the outside world, the server is invisible, and each URL (node) can be seen as a separate service (having a name or address, HTTP network protocol, TLS security, etc.). Of course, internally just one or a few processes are used to implement such virtual hosting. A general-purpose container, on the other hand, can run each node (or each request to a node) in an independent process or thread (physical hosting). For example, virtual and physical hosting is used in Java servlet [100] and Enterprise Java Beans technology [101]. In any case, the container is invisible to the outside world. Hosted nodes still appear and

[Figure: three linked containers; the central container embeds internal nodes B(2), C(3), D(4), E(5), G(7), H(8) and links to external nodes A(1), F(6), I(9). Internal={B,C,D,E,G,H}, External={A,F,I}, Internal(B)={C,D,B,E}, External(B)={A,F}.]
Figure 6.11: Node Containers.

behave like any other node. In our case, this means that a hosted node has a service link and description, and it supports publication, queries, etc. via the operations and network protocols advertised by the service description.

The fundamental difference to classic database architectures is that the latter offer no deployment transparency. As an extreme example of virtual hosting, one could imagine a hypothetical relational database system that exposes each individual tuple as a network service, supporting direct network connections to the tuple to answer queries against its column values. Conceptually, we can say that every node runs within a container, even if the container holds only a single node. A remote client may ask for the dynamic creation of a virtual or physical node by means of a node factory interface.

Now we are in the position to define a node deployment model as a description of the set of containers physically implementing a given link topology. Several node deployment models can be envisaged, ranging from coarse to fine grained, as well as arbitrary mixtures. For example, in a centralized deployment scenario, the entire global graph of, say, 10^8 nodes may be accessible through a single container, with all nodes (service interfaces) being handled by a single process on a single host. In a slightly less central scenario, the same graph may be partitioned among ten organizations, each with a central container as described above. Figure 6.12 depicts examples along these lines. On the fully distributed end of the spectrum, each node may run on a distinct box, storing its own tuples (including neighbor descriptions) in a local registry. In all but the first case, there neither exists a single grand monolithic database nor a single owner and provider of information.

There are two primary motivations why concentrating nodes may be useful.
First, nodes may be concentrated for reasons including central control, reliability, continuous availability, maintainability, security, accounting and firewall restrictions on incoming connections for hosts. These reasons are important, but we do not delve into them any further. Similarly, we do not consider physical hosting any further. Instead, we focus on the second motivation, which is the potential of virtual hosting for increased performance (as opposed to increased scalability).

In any kind of P2P network, a node has a database or some kind of data source against which queries are applied. In a P2P network for service discovery, this database happens to be the publication database. The discussion in this section is applicable to any kind of P2P network, while the examples illustrate service discovery.

Figure 6.12: Containers partitioning a graph.

If many container nodes reside on the same host, in the same process, and store their tuples (e.g. node service descriptions, Gnutella file indexes) in the same database, query support is potentially much more efficient. The query engine can run on "big iron". The database may fit in its entirety in a main memory buffer. Network communication between remote nodes can be replaced with local loop-back connections, inter-process communication or even direct function calls. To compute the full query result set for all container nodes, it may be sufficient to execute just one or a few batch queries against the shared database, instead of many small queries against separate databases. Intuitively, the smaller nodes are, the more performance can be gained through virtual hosting. For example, consider a network with millions of small registry nodes spread all over the world, each holding just some ten tuples. Searching might be much more efficient if the nodes and their databases were partitioned across a few, say a hundred, powerful node containers.

Consider the three example containers depicted in Figure 6.11. The central container has six internal nodes (B,C,D,E,G,H) and three external nodes (A,F,I). External nodes belong to other containers. Internal links connect nodes within the container. External links connect internal with external nodes. A hop is said to be logical if it travels along an internal or external link. A hop is said to be physical if it travels along an external link. Intuitively, traversing an internal link is much cheaper than traversing an external link. Accordingly, we propose to distinguish the separate scope parameters logical radius and physical radius.
For example, a user can specify that a query should reach very far, say a logical radius of 100 hops. To ensure that this query does not burden all nodes in the universe, the user can specify that it should touch at most three containers on any given path (physical radius).
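A sketch of how such a scope might be maintained per hop (the tuple layout and decrement rules are an illustrative reading of the proposal, not a specification):

```python
# Illustrative scope bookkeeping for the two radii: every hop consumes
# logical radius, but only hops that leave the container (external links)
# consume physical radius.
def forward_scope(scope, external_hop):
    logical, physical = scope
    logical -= 1                 # every hop is a logical hop
    if external_hop:
        physical -= 1            # only container-crossing hops count
    if logical < 0 or physical < 0:
        return None              # query must fade away
    return (logical, physical)
```

With a scope of (100, 3), a query can roam widely inside containers yet touch at most a few containers on any given path.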

6.10 Query Processing with Virtual Nodes

In this section, we propose three query execution strategies. A query to a node of a container can be efficiently answered without violating the semantics of query and scope (normal query execution, collecting traversal). Even more efficiently, it can be answered by relaxing the conditions imposed by the query scope (quick scope violating query). Let us look at these three strategies in more detail.

[Figure: template execution plan. Legend: A = Agent Plan, L = Local Query, M = Merge Query, N = Neighbor Query, Q = User Query, U = Unionizer Operator; SEND and RECEIVE 1..k are the plan's network operators.]

Figure 6.13: Template Execution Plan.

Normal Query Execution. Clearly a container can answer a query like any normal node via the execution plans proposed in Section 6.5. Recall the template execution plan, as depicted in Figure 6.13, and the specific plans for queries that are either recursively partitionable or not. As an optimization, network communication between internal nodes can be replaced with local loop-back connections, inter-process communication or direct function calls. For example, the Peer Database Protocol from Chapter 7 is built upon the BEEP framework. Since BEEP can be mapped to several underlying reliable transport layers (TCP is merely the default), a container can plug in an in-process transport mapping, yet continue to use the same messaging code base.

Collecting Traversal. To answer queries, a container can use a strategy we propose to call collecting traversal. Here the goal is to remove the need for any internal messaging and to run as few queries as possible against the database of shared nodes. To ensure that query semantics are fully preserved, we exploit the fact that queries in our query model are defined over a single virtual set of tuples (service descriptions), as discussed in Section 3.2. The query model allows generating this set of tuples in any arbitrary way.

The strategy works as follows. When a container node receives an external query, it takes over the work for the other internal nodes. In the first phase, it collects preparatory data; in the second phase, the query is executed. The first phase collects the internal and external nodes that are reachable from the start node. In other words, one traverses the container from the given start node, following the path that the query would touch. Along the way, the (keys of) internal nodes and external nodes are collected.

Consider the example from Figure 6.14. The keys of the six internal nodes are (2,3,4,5,7,8) whereas the keys of the three external nodes are (1,6,9). The originator sends a query to the start node B. The node has a database of tuples(B)={1,3,4}. The internal nodes reachable from B are internal(B)={3,4,2,5}, and the reachable external nodes are external(B)={1,6}. The tuples contained in the internally reachable nodes are tuples(internal(B)) = UNION (tuples(3), tuples(4), tuples(2), tuples(5)) = {1,

[Figure: node containers, identical to Figure 6.11; Internal={B,C,D,E,G,H}, External={A,F,I}, Internal(B)={C,D,B,E}, External(B)={A,F}.]
Figure 6.14: Node Containers.

2,3,4,5,6}. According to query type, the node chooses a central or recursively partitionable execution plan, and executes it (see Section 6.5). However, the local query L is executed against the tuples of the internally reachable nodes tuples(internal(B)) rather than against the tuples of the database tuples(B). Similarly, the plan forwards the query to the nodes external(B) rather than to the neighbors obtained from B's neighbor selection. Scope semantics are preserved by explicitly applying the relevant rules of scope parameters during node traversal (e.g. radius pruning and neighbor selection). The net effect is that the local query L and the merge query M are batched. That is, they are applied once over a large set, instead of many times over a small set.

The nodes of a container are stored in a single database (table), for example as depicted in Table 6.3 (the table is not normalized, for clarity of exposition). Collecting nodes is particularly fast if the neighbor selection policy is simple and the database fits into main memory. For example, it is certainly possible to have a data structure that allows quickly traversing the database table. Further, internal messaging overhead is eliminated altogether. To summarize, if the neighbor selection policy is applied at each node, and scope parameters such as radius are observed, one can emulate normal query execution, but in a way that is more efficient.

Node ID | Service Description | Is external? | Tuples
--------|---------------------|--------------|--------
1       | A                   | True         | null
2       | B                   | False        | 3, 4
3       | C                   | False        | 2, 5
4       | D                   | False        | 2, 5
5       | E                   | False        | 3, 4, 6
6       | F                   | True         | null

Table 6.3: Node Table of Container.
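As a hedged illustration, the breadth-first collecting phase (in the style of Figure 6.15) can be sketched in Python. The container is modeled as a dict mapping each internal node ID to the node IDs stored in its database (following the prose example, tuples(B) = {1,3,4}); external nodes map to None, and only B's component of the container is modeled:

```python
from collections import deque

def collecting_traversal(container, start, logical_radius):
    """Breadth-first collecting phase: gather the internal nodes reachable
    from `start` and the external nodes to forward to, each with the
    logical radius remaining at the moment it was reached."""
    internal, external = set(), {}
    done, todo = set(), deque([start])
    while todo and logical_radius >= 0:
        logical_radius -= 1
        for _ in range(len(todo)):          # process one depth level
            node = todo.popleft()
            done.add(node)
            internal.add(node)
            if logical_radius >= 0:         # radius left: follow links
                for n in container[node]:
                    if container.get(n) is not None:   # internal neighbor
                        if n not in todo and n not in done:
                            todo.append(n)
                    elif n not in external:            # external neighbor
                        external[n] = logical_radius   # radius on forward
    return internal, external
```

Breadth-first order ensures every external link is reached along a shortest internal route, so the radius forwarded to external nodes stays meaningful.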

Recall the problem of greedy radius pruning from Section 6.8. Traversal of internal nodes via depth-first search is inappropriate because it leads to greedy radius pruning with high probability, in particular if a container holds a large number of nodes with a complex internal topology. In practice, this means that even though a user may have specified a theoretically large enough logical radius, it is unlikely that an incoming query will ever be forwarded beyond the current container. It is in the nature of depth-first search that an external link is unlikely to be reached along a short internal route; rather, it is likely to be reached along one of the longest possible internal routes. Within a container, greedy pruning can be eliminated by traversal using breadth-first search. This ensures that the shortest path to external links is always found, despite loop detection pruning. Put another way, loop detection is conditioned to prune only paths longer than the shortest path. The net effect is that external nodes receive a meaningful logical radius scope parameter on query forward. The pseudo-code in Figure 6.15 computes the internal and external nodes of a given entry node, using breadth-first search.

Quick Scope Violating Query. Normal query execution and collecting traversal preserve query and scope semantics. If no query scope is given, or if it is acceptable to ignore or alter scope semantics, query execution can be optimized further using the strong technologies of centralized (relational) database architectures. For example, internal graph traversal can be eliminated altogether.

The strategy works as follows. According to query type, the node chooses a central or recursively partitionable execution plan, and executes it (see Section 6.5). However, the local query L is executed against the union of all tuples of the container (1-9) rather than against B's database tuples(B). Similarly, the plan forwards the query to the external nodes selected from the union of all external nodes of the container (1,6,9) rather than to the immediate neighbors of B. The net effect is that the local query L and the merge query M are batched. That is, they are applied once over a large set, instead of many times over a small set. The same holds for neighbor selection. Determining all tuples of the container requires no time at all because they are, of course, stored in the same database (table). Computing all external nodes is cheap as well.

Because scope-violating queries are answered entirely by the centralized database machinery, they are highly efficient, at the expense of ignoring or altering scope semantics. For example, nodes (7,8,9) should never be considered, as they are not directly or indirectly connected to B. Further, it is unclear what logical radius should be assigned on query forward to external nodes. Computing the correct logical radius would essentially degrade performance down to the performance of collecting traversal. It appears that the least bad choice is to decrease the logical radius by one on external forward. Note that query semantics are still preserved.
The query is just fed a larger than necessary set of tuples (service descriptions). In practice, this may be tolerable for a significant fraction of use cases.
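The batching at the heart of this strategy can be sketched as follows (a simplification under assumed names; a real plan would also handle the merge query and external forwarding):

```python
# Illustrative scope-violating evaluation: the local query L runs once over
# the union of all tuples stored by the container's internal nodes, instead
# of once per reachable node. node_table mirrors Table 6.3.
def quick_scope_violating(node_table, local_query):
    all_tuples = set().union(*(row["tuples"] for row in node_table
                               if not row["external"]))
    return sorted(t for t in all_tuples if local_query(t))
```

Note that the union may include tuples from nodes that are not reachable from the start node, which is precisely the scope violation discussed above.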

6.11 Related Work

Agent P2P Model. The underlying idea of the Agent P2P model is not new. Consider for example, the email infrastructure model [33]. Typically, a single central high availability agent 116 CHAPTER 6. A UNIFIED PEER-TO-PEER DATABASE FRAMEWORK

/**
 * do:
 *   at each depth level:
 *     for each node of level:
 *       remove node from TODO list
 *       add tuples of node to INTERNAL or EXTERNAL sets
 *       append selected neighbors to TODO if not already done (loop)
 * until all nodes visited or radius limit reached
 * return INTERNAL and EXTERNAL
 * (also compute max. logicalRadius that should be used on forward to external nodes)
 */
FUNCTION collectingTraversal(startNode, logicalRadius) {
   internal = {}, external = {}, done = {}, todo = {startNode}
   while (size = size(todo)) > 0 and logicalRadius >= 0
      logicalRadius = logicalRadius - 1
      for i := 1 to size
         node = first element of todo
         remove first element from todo
         done = done UNION {node}
         internal = internal UNION tuples(node)
         if logicalRadius >= 0 then
            for each neighbor n IN select(neighbors(node))
               if n is internal and not contained in todo and not contained in done then
                  Append n to todo
               endif
               if n is external and not (n, any radius) contained in external then
                  external = external UNION {(n, logicalRadius)}
               endif
            endfor
         endif
      endfor
   endwhile
   Return (internal, external)
}

Figure 6.15: Collecting Traversal.
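The pseudocode of Figure 6.15 can be rendered directly as runnable Python. In the sketch below, `graph`, `tuples` and `is_internal` are illustrative stand-ins for the container's data structures; the traversal logic itself follows the figure.

```python
from collections import deque

def collecting_traversal(start, logical_radius, tuples, neighbors,
                         select=lambda ns: ns, is_internal=lambda n: True):
    """Breadth-first collecting traversal: gather the tuples of reachable
    internal nodes and the external nodes to forward to, each tagged with
    the remaining logical radius for the forward."""
    internal, external, done = set(), {}, set()
    todo = deque([start])
    while todo and logical_radius >= 0:
        logical_radius -= 1
        for _ in range(len(todo)):              # one breadth-first level
            node = todo.popleft()
            done.add(node)
            internal |= tuples(node)
            if logical_radius >= 0:
                for n in select(neighbors(node)):
                    if is_internal(n):
                        if n not in todo and n not in done:
                            todo.append(n)
                    elif n not in external:     # keep first (largest) radius
                        external[n] = logical_radius
    return internal, external

# Toy container: nodes 1-4 are internal, node 9 is external.
graph = {1: [2, 3], 2: [4, 9], 3: [], 4: []}
internal, external = collecting_traversal(
    start=1, logical_radius=2,
    tuples=lambda n: {n}, neighbors=lambda n: graph.get(n, []),
    is_internal=lambda n: n != 9)
```

Breadth-first order matters here: depth-first traversal would, with high probability, consume the radius along one deep path and greedily prune nodes that are in fact within scope.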

serves outgoing and incoming mail for originators from an entire organization. However, an email system can also be fully (or partly) decentralized such that each originator runs its own agent on its own host. This transparent flexibility contributes to the widespread adoption and tremendous success of email as an Internet “killer application”. A similar example is the X.500 [13] directory architecture, which has a Directory User Agent (originator) querying a Home Directory System Agent (agent node), which is one of a collection of Directory System Agents (nodes).

Loop Detection. The X.500 protocol [13] uses a route-tracing algorithm for loop detection. This algorithm only works for queries that select on a name from a hierarchical name space that is mimicked by the link topology. In our general context, the algorithm is insufficient for loop detection because we allow, but do not assume, such a topology and namespace. The route-tracing algorithm attaches to a query the route already taken, represented by a list of node identifiers. On query forward, a node N appends its own identifier to the route. A loop is detected if the identifier of the current node N is already contained in the route. This mechanism only detects a loop if a query forwarded by a given node N eventually arrives again at the same node N. It cannot detect the more common form of loop where the same query arrives along multiple paths at a given node N, but none of the paths have so far touched N.
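The blind spot of route tracing is easy to demonstrate. The toy sketch below (function and node names are illustrative) shows that a query reaching the same node D along two disjoint paths is never flagged, even though D processes it twice.

```python
# X.500-style route tracing: each hop appends its identifier to the route;
# a loop is signalled only if the current node is already ON THIS route.

def forward(route, node):
    if node in route:
        return route, True       # loop detected: query revisits its own path
    return route + [node], False

# Path 1: A -> B -> D
r1, loop1 = forward(forward(["A"], "B")[0], "D")
# Path 2: A -> C -> D  (same query, disjoint path, same destination D)
r2, loop2 = forward(forward(["A"], "C")[0], "D")

# A genuine revisit of the same path IS caught:
_, caught = forward(["A", "B"], "A")
```

Neither `loop1` nor `loop2` fires, yet D receives the query twice; only the self-revisit is caught. This is exactly the more common form of loop the mechanism cannot detect.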

Query Processing. None of the example discovery queries from Section 3.4 can be satisfied with a lookup by key (e.g. globally unique name). This is the type of query assumed in most P2P systems such as DNS [20], Gnutella [21], Freenet [22], Tapestry [23], Chord [24] and Globe [25], leading to highly specialized content-addressable networks centered around the theme of distributed hash table lookup. Note further that almost no queries are exact match queries (i.e. given a flat set of attribute values find all tuples that carry exactly the same attribute values), assumed in systems such as SDS [26] and Jini [63]. Our approach is distinguished in that it not only supports all of the above query types, but it also supports queries from the rich and expressive general-purpose query languages XQuery [18] and SQL [19].

Pipelining. For a survey of adaptive query processing, including pipelining, see the special issue of [102]. [103] develops a general framework for producing partial results for queries involving any non-monotonic operator. The approach inserts update and delete directives into the output stream. The Tukwila [104] and Niagara [105] projects introduce data integration systems with adaptive query processing and XML query operator implementations that efficiently support pipelining. Pipelining of hash joins is discussed in [106, 107, 108]. [109] proposes a rate-based pipeline scheduling algorithm that prioritizes and schedules the flow of data between pipelined operators so that the result output rate is maximized. The algorithm is also demonstrated with pipelined hash joins. Pipelining is sometimes also termed streaming or non-blocking execution.

Neighbor Selection. Iterative deepening [27] is a technique similar to expanding ring; it suggests an optimization that avoids reevaluating the query at nodes that have already done so in previous iterations. Neighbor selection policies that are based on randomness and/or historical information about the result set size of prior queries are simulated and analyzed in [28]. An efficient neighbor selection policy is applicable to simple queries posed to networks in which the number of links of nodes exhibits a power law distribution (e.g. Freenet and Gnutella) [29]. Here most (but not all) matching results can be reached with few hops by selecting just a very small subset of neighbors (the neighbors that themselves have the most neighbors to the n-th radius). Note, however, that the policy is based on the assumption that not all results must be found and that all query results are equally relevant. Depending on the application context, this assumption may or may not be valid. These related works discuss neighbor selection techniques for a particular query type in isolation, without the context of a framework for comprehensive query support.

DNS. Distributed databases with a hierarchical name space such as the Domain Name System (DNS) [20] can efficiently answer queries of the form “Find an object by its full name”. For example, the DNS can search for the IP address (e.g. 137.138.29.51) of a given domain name (e.g. fred.cms.cern.ch). Because of the nature of the supported query type, these systems arrange the link topology, according to the hierarchical name space, as a tree topology, as depicted in Figure 6.16. Each node (DNS server) is responsible for tuples from a name space sub-tree such as ch, cern.ch or cms.cern.ch. A node may internally partition its name space and delegate responsibility for sub-trees to child nodes. A node holds a database of tuples, each of which carries a domain name and an IP address.

[Figure: a DNS name space tree rooted at ch, with child nodes cern and ethz; cern has child cms, which holds fred.cms.cern.ch. The originator fire.ethz.ch resolves IP(fred.cms.cern.ch) = 137.138.29.51 via forwarded and referred queries up and down the tree.]

Figure 6.16: Query Flow in Domain Name System (DNS).

A query searching for the IP address of a domain name traverses the tree on the shortest path from originator (e.g. fire.ethz.ch) to the node containing the domain name - first up, then down. At each node, a name resolution policy selects the neighbor “closer” to the name than the current node, according to name space metadata. In DNS, queries are not forwarded (routed) through the topology. Instead, a node returns a referral message that redirects an originator to the next closer node. The originator explicitly queries the next node, is referred to yet another closer node, and so on. Nodes cache the IP address (service link result) of recently queried nodes. A cache hit directly refers a client to the responsible node. This dramatically reduces load on nodes close to the root of the hierarchy. It also reduces the number of round trips involved for a query. To support neighbor selection in a hierarchical name space within our UPDF framework, a node could publish to its neighbors not only its service link, but also the name space it manages. A natural candidate for the hierarchical name is the content link of a tuple. The DNS referral behavior can be implemented within UPDF by using a radius scope of zero. The same holds for the LDAP referral behavior (see below).

X.500, LDAP and MDS. The hierarchical distributed X.500 directory [13] works similarly to the DNS. It also supports referrals, but in addition can forward queries through the topology (chaining in X.500 terminology). The query language is simple (see Sections 3.4 and 3.6). Route tracing is used as a loop detection algorithm. Query scope specification can support maximum result set size limiting. X.500 supports neither radius nor dynamic abort timeout, nor pipelined query execution across nodes. LDAP [14] is a simplified subset of X.500. Like DNS, it supports referrals but not query forwarding. MDS [15, 16] inherits all properties of LDAP. MDS additionally implements a simple form of query forwarding that allows for multi-level hierarchies but not for arbitrary topologies. Here neighbor selection forwards the query to LDAP servers overlapping with the query name space. The query is forwarded “as is”, without loop detection8. Further, MDS supports neither radius nor dynamic abort timeout, nor pipelined query execution across nodes, nor direct response and metadata responses.

6.12 Summary

Comparison with Related Work. We take the first steps towards unifying the fields of database management systems and P2P computing, which so far have received considerable, but separate, attention. We extend database concepts and practice to cover P2P search. Similarly, we extend P2P concepts and practice to support powerful general-purpose query languages such as XQuery and SQL. As a result, we propose the so-called Unified Peer-to-Peer Database Framework (UPDF) for general-purpose query support in large heterogeneous distributed systems spanning many administrative domains. UPDF is unified in the sense that it allows specific applications to be expressed for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options. The uniformity, wide applicability and reusability of our approach distinguish it from related work, which individually addresses some but not all problem areas. Traditional distributed systems assume a particular type of topology (e.g. hierarchical as in DNS, LDAP). Existing P2P systems are built for a single application and data type and do not support queries from a general-purpose query language. For example, Gnutella, Freenet, Tapestry, Chord, Globe and DNS only support lookup by key (e.g. globally unique name).

8The message id which is part of every LDAP message would be unsuitable for such a purpose because an LDAP message id is not required to be universally unique. It is merely required to be different from the values of any other requests outstanding in the same (local) LDAP session of which the message is a part.

Others such as SDS, LDAP and MDS support simple special-purpose query languages, leading to special-purpose solutions unsuitable for multi-purpose service and resource discovery in large heterogeneous distributed systems spanning many administrative domains. [29] discusses in isolation neighbor selection techniques for a particular query type, without the context of a framework for comprehensive query support. LDAP and MDS do not support essential features for P2P systems such as reliable loop detection, non-hierarchical topologies, dynamic abort timeout, query pipelining across nodes as well as radius scoping. None introduce a unified P2P database framework for general-purpose query support.

Concepts and Definitions. The related but orthogonal concepts of (logical) link topology and (physical) node deployment model are established. A link topology describes the link structure among nodes, but it does not describe where and how the link information is stored and accessed. This is defined by a node deployment model, which defines where and how one or more partitions of the graph are running, stored and accessed. Definitions are established, clarifying the notion of node, service, fat, thin and ultra-thin P2P networks, as well as the commonality of a P2P network and a P2P network for service discovery. A service exposes some functionality via interfaces to remote clients. A node is a service that exposes at least interfaces for publication and P2P queries. In a fat P2P network, most services are not nodes. In a thin or ultra-thin P2P network, most or all services are nodes, respectively. In any kind of P2P network, nodes may publish themselves to other nodes, thereby forming a topology. In a P2P network for service discovery, services and other content providers may publish their service link and content links to nodes. Because nodes are services, also nodes may publish their service link (and content links) to other nodes, thereby forming a topology. In any kind of P2P network, a node has a database or some kind of data source against which queries are applied. In a P2P network for service discovery, this database happens to be the publication database.

Agent P2P Model. The agent and servent P2P models are compared. The agent P2P model allows fully decentralized infrastructures, yet also allows seamless integration of centralized client-server computing into an otherwise decentralized infrastructure. The servent P2P model is fully decentralized; it does not allow for any degree of centralization.

Loop Detection. To reliably detect and prevent query loops, an originator attaches to each query a distinct transaction identifier, which is a universally unique identifier (UUID). The transaction identifier remains identical during query forwarding over hops. A node maintains a state table of recent transaction identifiers and returns an error whenever a query is received that has already been seen.
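This scheme can be sketched in a few lines. The class and method names below are illustrative; the essential points are that the UUID is minted once by the originator and that each node's state table catches duplicates regardless of the path taken.

```python
import uuid

# Sketch of UUID-based loop detection: one transaction identifier per
# query, unchanged across hops; each node keeps a state table of recent
# identifiers and rejects any query it has already seen.

class Node:
    def __init__(self):
        self.seen = set()       # state table of recent transaction ids

    def accept(self, txn_id):
        if txn_id in self.seen:
            return False        # duplicate: return an error, do not re-run
        self.seen.add(txn_id)
        return True

txn = uuid.uuid4()              # minted once by the originator
node = Node()
first = node.accept(txn)        # True: query is processed
second = node.accept(txn)       # False: same query arrives via another path
```

Unlike route tracing, this catches the common case where the same query reaches a node along multiple disjoint paths. In practice the state table would also need an expiry policy (e.g. the static loop timeout) so it does not grow without bound.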

Response Modes. Four techniques to return matching query results to an originator are characterized, namely Routed Response, Direct Response, Routed Metadata Response, and Direct Metadata Response. Under Routed Response, results are fanned back into the originator along the paths on which the query flowed outwards. Each (passive) node returns to its (active) client not only its own local results but also all remote results it receives from neighbors. Under Direct Response, results are not returned by routing through intermediary nodes. Each (active) node that has local results sends them directly to the (passive) agent, which combines and hands them back to the originator. Interaction consists of two phases under Routed Metadata Response and Direct Metadata Response. In the first phase, routed responses or direct responses are used. However, nodes return only small metadata results. In the second phase, the originator selects which data results are relevant. The originator directly connects to the relevant data sources and asks for data results. The properties of the various response models are compared with respect to distribution and location transparency, efficiency of query support, economics, number of TCP connections at originator and agent, latency, caching and trust delegation to unknown parties.

Query Processing. From the perspective of query processing, a P2P database system has the same properties as a distributed database system, in a recursive structure. In a distributed database system, there exists a single local database and zero or more neighbors. A classic centralized database system is a special case where there exists a single local database and zero neighbors. Hence, we propose to organize the P2P query engine like a general distributed query engine. Any query against any kind of database system can be answered within the proposed framework. A P2P network can be efficient in answering queries that are recursively partitionable. A query is recursively partitionable if the very same execution plan can be recursively applied at every node in the P2P topology. The recursive parallel spread of load implied by a recursively partitionable query is the basis of the much-cited massive P2P scalability potential. The definition of simple, medium and complex queries is clarified. A query is simple if it is recursively partitionable using the UNION operator for unionizing and the IDENTITY operator for merging. A query is a medium query if it is not simple, but it is recursively partitionable. A query is complex if it is not recursively partitionable.
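A recursively partitionable simple query can be sketched concretely. The example below (a selection, with `topology` and `data` as illustrative stand-ins) applies the same local plan L at every node, combines children's results with UNION, and uses the IDENTITY merge, exactly the pattern that defines a simple query.

```python
# Sketch of a "simple" recursively partitionable query: the same plan is
# applied at every node; unionizing = UNION, merge query M = IDENTITY.

def execute(node, topology, data, local_query):
    local = {t for t in data[node] if local_query(t)}        # local query L
    remote = [execute(n, topology, data, local_query)
              for n in topology.get(node, [])]               # same plan, recursively
    return local.union(*remote) if remote else local         # UNION; M = identity

topology = {"agent": ["n1", "n2"], "n1": ["n3"]}
data = {"agent": {"x1"}, "n1": {"y1", "x2"}, "n2": {"x3"}, "n3": {"y2"}}
hits = execute("agent", topology, data, lambda t: t.startswith("x"))
```

A medium query would replace IDENTITY with a real merge (e.g. re-aggregating partial counts), while a complex query admits no such recursive decomposition at all.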

Pipelining. Often an originator would be happy to already do useful work with one or a few early tuple results, as long as they arrive quickly and reliably. Results that arrive later can be handled later, or are ignored anyway. The semantics of certain operators allows them to support pipelining, while others do not. Whether an agent exhibits a short or long latency to deliver to the originator the first result from the result set depends on the query operators in use, which in turn depend on the given query. In other words, for some query types the originator has the potential to immediately start piping in results (at moderate performance rate), while for other query types it must wait for a long time until the first result becomes available (the full result set arrives almost at once, however). Simple queries do support pipelining. Medium queries may or may not support pipelining, whereas complex queries typically do not support pipelining.
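The operator-level distinction can be illustrated with generators. In the sketch below (illustrative names), a selection emits its first result after inspecting a single input tuple, whereas a sort must consume its entire input before emitting anything.

```python
# Why operator semantics determine pipelining: selection streams,
# sorting blocks.

def select_stream(tuples, pred):
    for t in tuples:            # pipelined: each match flows out immediately
        if pred(t):
            yield t

def sort_stream(tuples):
    yield from sorted(tuples)   # blocking: must see all input first

arrivals = iter([3, 1, 2])      # tuples trickling in from remote nodes
piped = select_stream(arrivals, lambda t: t > 1)
first = next(piped)             # available after one input tuple
```

A plan built only from pipelined operators (a simple query) delivers early results at a moderate rate; a plan containing a blocking operator delivers nothing for a long time and then nearly the whole result set at once.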

Static Loop Timeout and Dynamic Abort Timeout. Clearly there comes a time when a user is no longer interested in query results, no matter whether any more results might be available. The query roaming the network and its response traffic should fade away after some time. The value of a static timeout remains unchanged across hops. In contrast, it is intended that the value of a dynamic timeout be decreased at each hop. Nodes further away from the originator may time out earlier than nodes closer to the originator. Non-pipelined delivery with a static abort timeout is highly unreliable due to the so-called simultaneous abort timeout problem. To address the problem, we propose dynamic abort timeouts using as policy exponential decay with halving. This ensures that a maximum of results can be delivered reliably within the time frame desired by a user. A dynamic timeout is unsuitable to be used as loop timeout, due to the non-simultaneous loop timeout problem. A loop timeout must be static.
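The two timeout disciplines can be contrasted in a minimal sketch (function names are illustrative): the dynamic abort timeout is halved at each hop, while the static loop timeout is forwarded unchanged.

```python
# Exponential decay with halving: each hop halves the remaining abort
# budget, so nodes further from the originator time out earlier and
# leave their parents time to fan results back. The loop timeout, in
# contrast, must stay identical across hops.

def next_hop_abort_timeout(remaining_seconds):
    return remaining_seconds / 2.0      # dynamic: halved per hop

def next_hop_loop_timeout(timeout_seconds):
    return timeout_seconds              # static: unchanged per hop

budget, hops = 8.0, []
for _ in range(3):
    budget = next_hop_abort_timeout(budget)
    hops.append(budget)
# hops == [4.0, 2.0, 1.0]: the deepest node aborts first
```

Halving guarantees that a node always times out strictly before its upstream neighbor, which is precisely what the simultaneous abort timeout problem requires.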

Query Scope. A query scope is used to navigate and prune the link topology and filter on attributes of the deployment model. Conceptually, the scope is the input fed to the query. One can indirectly specify a scope based on neighbor selection, timeout, radius and result set properties. A node forwards a query to the set of nodes obtained from neighbor selection. The best neighbor selection policy to adopt depends on the context of the query and the topology. For example, a query may only select neighbors that meet minimum requirements in terms of latency and bandwidth. Using neighbor selection, explicit topology characteristics can be exploited for query guidance. For flexibility and expressiveness, we propose to allow the user to specify the selection policy. In addition to the normal query, the user defines a neighbor selection query (XQuery) that takes the tuple set of the current node as input and returns a subset that indicates the nodes selected for forwarding. A wide range of policies can be implemented in this manner. The neighbor selection policy can draw from the rich set of information contained in the tuples published to the node. The radius of a query is the maximum number of hops a query is allowed to travel on any given path. The radius can be used to indirectly limit result set size. In addition, it helps to limit latency and bandwidth consumption and to guard against runaway queries with infinite lifetime. Loop detection and unpredictable timing in distributed systems can lead to greedy radius pruning.
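A user-defined neighbor selection policy of this kind can be sketched as a filter over the published tuples. The thesis expresses the policy as an XQuery; the Python below is an illustrative equivalent, and the attribute names (`link`, `bandwidth`, `latency`) are assumptions, not from the thesis.

```python
# Sketch of a neighbor selection policy: given the tuples published to the
# current node, return the subset of neighbor links to forward to.

def select_neighbors(published_tuples, min_bandwidth, max_latency):
    return [t["link"] for t in published_tuples
            if t["bandwidth"] >= min_bandwidth and t["latency"] <= max_latency]

published = [
    {"link": "nodeA", "bandwidth": 100, "latency": 10},
    {"link": "nodeB", "bandwidth": 5,   "latency": 10},   # too slow a link
    {"link": "nodeC", "bandwidth": 100, "latency": 900},  # too far away
]
chosen = select_neighbors(published, min_bandwidth=10, max_latency=100)
```

Because the policy sees the full published tuples, it can just as easily select on name space overlap (DNS/LDAP-style), link degree, or any other advertised attribute.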

Containers for Centralized Virtual Node Hosting. Link topology and node deployment are distinct and orthogonal concepts, and hence a node deployment model need not correspond to a link topology at all. The simplest (and most common) deployment model has distinct nodes running on distinct hosts. A node container is a transparent software-hosting environment that embeds one or more nodes. The set of all nodes in the universe is partitioned over one or more node containers. A container can be a special-purpose program that behaves as if it were a network of nodes (virtual hosting). A well-known example of virtual hosting is web serving. Hosted nodes still appear and behave like any other node. In our case, this means that a hosted node has a service link and description, and it supports publication, queries, etc. via the operations and network protocols advertised by the service description. Node deployment models range from centralized to fully distributed. Virtual hosting has the potential for increased performance (as opposed to increased scalability). For example, consider a network with millions of small registry nodes spread all over the world, each holding just some ten tuples. Perhaps searching would be much more efficient if the nodes and their databases were partitioned across a few, say a hundred, powerful node containers. Internal links connect nodes within a container. External links connect internal with external nodes. The separate scope parameters logical radius and physical radius are distinguished. A query to a container node can be efficiently answered without violating the semantics of query and scope (normal query execution, collecting traversal). Even more efficiently, it can be answered by relaxing the conditions imposed by the query scope (quick scope violating query). The goal of collecting traversal is to remove the need for any internal messaging and to run as few queries as possible against the database of shared nodes. To ensure that query semantics are fully preserved, the fact is exploited that queries in our query model are defined over a single virtual set of tuples. The query model allows generating this set of tuples in any arbitrary way. The first phase collects the internal and external nodes that are reachable from the start node. The local query is executed against the tuples of the internally reachable nodes. The query is forwarded to the reachable external nodes. Scope semantics are preserved by explicitly applying the relevant rules of scope parameters during node traversal. Traversal of internal nodes via depth-first search is inappropriate because it leads to greedy radius pruning with high probability. Breadth-first search should be used instead. If no query scope is given, or if it is acceptable to ignore or alter scope semantics, a query can be answered with the quick scope violating query strategy, using the strong technologies of centralized (relational) database architectures. Internal graph traversal is eliminated altogether. The local query, the merge query and neighbor selection are applied once over a large set, instead of many times over a small set.

Chapter 7

A Unified Peer-to-Peer Database Protocol

7.1 Introduction

Recall that Chapter 4 designed the hyper registry, which is a centralized database (node, peer) for discovery of dynamic distributed content. Section 5.2 formulated the corresponding XQuery interface. Chapter 6 devised the Unified Peer-to-Peer Database Framework (UPDF), which is unified in the sense that it allows specific applications to be expressed for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options. In this chapter, we propose a messaging model and network protocol that supports the UPDF framework, the hyper registry and the XQuery interface. The design of a messaging model and network protocol for large distributed systems strongly influences system properties such as scalability, efficiency, interoperability, extensibility, reliability, and, of course, limitations in applicability. For example, if the messaging model does not support pipelining, a result set has to be delivered with long latency in a single large batch, even though the query type might allow for pipelining. Clearly this may render a system inappropriate for some interactive or quasi real-time applications. The key problem is:

• What messaging and communication model as well as network protocol uniformly supports P2P database queries for a wide range of database architectures and response models such that the stringent demands of ubiquitous Internet discovery infrastructures in terms of scalability, efficiency, interoperability, extensibility and reliability can be met? In particular, how can one allow for high concurrency, low latency as well as early and/or partial result set retrieval? How can one support resource consumption control and flow control on a per query basis?

In this chapter, these problems are addressed by developing a suitable messaging, communication and network protocol model, collectively termed Peer Database Protocol (PDP). PDP has a number of key properties. It is applicable to any node topology (e.g. centralized, distributed or P2P) and to multiple P2P response modes (routed response and direct response, both with and without metadata modes). To support loosely coupled autonomous Internet infrastructures, the model is connection-oriented (ordered, reliable, congestion sensitive) and message-oriented (loosely coupled, operating on structured data). For efficiency, it is stateful at the protocol level, with a transaction consisting of one or more discrete message exchanges related to the same query. It allows for low latency, pipelining, early and/or partial result set retrieval due to synchronous pull, and result set delivery in one or more variable sized batches. It is efficient, due to asynchronous push with delivery of multiple results per batch. It provides resource consumption and flow control on a per query basis, due to the use of a distinct channel per transaction. It is scalable, due to application multiplexing, which allows for very high query concurrency and very low latency, even in the presence of secure TCP connections. To encourage interoperability and extensibility it is fully based on Internet Engineering Task Force (IETF) standards, for example in terms of asynchrony, encoding, framing, authentication, privacy and reporting. We believe that the Peer Database Protocol is suitable to meet the stringent demands of ubiquitous Internet infrastructures. Starting from a high level, the model is developed in increasing levels of detail. Message types and their semantics are introduced and specified in detail. State transitions related to message handling are specified. Message representations, communication model and network protocol are specified. The minimum state information a node must maintain for correct operation is introduced.

7.2 Originator and Node

Recall from Section 6.2 that in the agent P2P model an originator hands a query to an agent node, which applies the query to its local database and also forwards the query to its neighbor nodes. To be able to query each other, all nodes must support the same protocol P. From the functional perspective, there is no difference between an originator querying an agent, and a node querying another node. Consequently, the protocol P can also be used by an originator to query an agent. However, the relationship between originator and agent may take any form. For example, an originator may embed its agent in the same process. The originator may just as well choose a remote node as agent, for reasons including central control, reliability, continuous availability, maintainability, security, accounting and firewall restrictions on incoming connections for originator hosts. A simple HTML GUI may be sufficient to originate queries that are sent to an organization’s agent node. For flexibility, we do not mandate any particular communication protocol between originator and agent. A node is free to implement any number of additional protocols for communication with originators and offer them via additional service interfaces. For example, it may implement a simple and straightforward HTTP based query protocol for HTML based originators. This way, originators have the option to alternatively use a simpler mechanism than the (non-trivial) protocol P. Consequently, one may, but need not, distinguish communication protocols between a) originator and agent node and b) between nodes. In the remainder of this chapter, no such distinction is made. We assume for simplicity that, if originators are remote, they also speak the protocol P - the Peer Database Protocol.

7.3 High-Level Messaging Model

The high-level messaging model employs four request messages (QUERY, RECEIVE, INVITE, CLOSE) and a response message (SEND). A transaction is a sequence of one or more message exchanges between two peers (nodes) for a given query. An example transaction is a QUERY-RECEIVE-SEND-RECEIVE-SEND-CLOSE sequence1. This non-trivial transaction model is in contrast to a simpler model where a transaction consists of a single request-response exchange (e.g. HTTP). The former model is stateful whereas the latter is stateless. A peer can concurrently handle multiple independent transactions. A transaction is identified by a transaction identifier. Every message of a given transaction carries the same transaction identifier. The messages have the following semantics:

• QUERY. A QUERY message is forwarded along hops through the topology. The message contains the query itself as well as a universally unique transaction identifier (UUID) for the query (see Section 6.3). Every other message below refers to a prior QUERY message by carrying the same transaction identifier. The QUERY message also contains scope hints. It may optionally also contain a hint indicating what response mode should be used (routed or direct). Under direct response, also the service link or description of the agent must be included, so that nodes with matches can invite the agent to retrieve the result set. Optionally, the identity of the originator may be included to allow for authorization decisions where applicable. A node accepting a QUERY message returns immediately without any results. Results are explicitly requested via a subsequent RECEIVE message.

• RECEIVE. A RECEIVE message is used by a client to request query results from another node. It requests the node to SEND a batch of at least N and at most M results from the (remainder of the) result set. This corresponds to the next() method of an iterator (operator). We have 1 ≤ N ≤ M. For example, a low latency use case can use N=1, M=10 to indicate that at least one and at most ten results should be delivered by the next batch. N=M=infinity indicates that all remaining results should be sent in a single large batch.

• SEND. When a node accepts a RECEIVE message, it responds with a SEND message, containing a batch with the requested results from the (remainder of the) result set. A client can successively issue multiple RECEIVE messages until the result set is exhausted. A client need not retrieve all results from the entire result set. For example, after the first batch of 10 results it may issue a CLOSE request.

• CLOSE. A client may issue a CLOSE request to inform a node that the remaining results (if any) are no longer needed and can safely be discarded.

• INVITE. INVITE messages only apply to direct response mode. A node forwards the query without ever waiting for remote result sets. It only applies the query to its local

1This notion is entirely unrelated to the notion used in database systems where a transaction is an atomic unit of database access which is either completely executed or not executed at all [93].

database. If the local result set is not empty, the node directly contacts the agent with an INVITE message to solicit a RECEIVE message. Interaction then proceeds with the normal RECEIVE-SEND-CLOSE pattern, either in a synchronous or asynchronous manner (see below).

• Synchronous (pull) vs. asynchronous (push) RECEIVE. A RECEIVE request contains a parameter that asks to deliver SEND messages in either synchronous (pull) or asynchronous (push) mode. In synchronous mode a single RECEIVE request must precede every single SEND response. An example sequence is RECEIVE-SEND-RECEIVE-SEND. In asynchronous mode a single RECEIVE request asks for a sequence of successive SEND responses. A client need not explicitly request more results, as they are automatically pushed in a sequence of zero or more SENDs. An example sequence is RECEIVE-SEND-SEND-SEND.
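The pull-style consumption of a result set described above can be sketched in a few lines. This is an illustrative model only, not the thesis implementation: the `Node` class and its `send` method are hypothetical stand-ins for a PDP peer answering RECEIVE(N, M) requests.

```python
# Sketch (hypothetical names): a client consuming a result set via the
# RECEIVE/SEND pattern with batch bounds 1 <= N <= M.

class Node:
    """Hypothetical peer holding the result set of one query (transaction)."""
    def __init__(self, results):
        self._results = list(results)
        self._pos = 0

    def send(self, n, m):
        """Answer a RECEIVE(n, m): deliver at most m results; fewer than n
        only because the result set is exhausted (cf. the SEND semantics)."""
        assert 1 <= n <= m
        batch = self._results[self._pos:self._pos + m]
        self._pos += len(batch)
        exhausted = self._pos >= len(self._results)
        return batch, exhausted

def consume(node, n=1, m=10):
    """Synchronous pull: issue RECEIVE(n, m) until the result set is exhausted."""
    collected = []
    while True:
        batch, exhausted = node.send(n, m)
        collected.extend(batch)
        if exhausted:
            return collected

results = consume(Node(range(25)), n=1, m=10)
print(len(results))  # all 25 results, delivered in batches of at most 10
```

Asynchronous push differs only in that the node keeps emitting batches after a single request; the batching logic is the same.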

As a practical example, assume a query that yields 100 results, and an originator that consumes a batch of N=M=50 results per SEND. The messaging model is exemplified by the corresponding message flows from client to server (“-->”) and back (“<--”) depicted in Table 7.1.

Routed Synchronous Response:
    --> QUERY
    --> RECEIVE
    <-- SEND
    --> RECEIVE
    <-- SEND
    --> CLOSE

Routed Asynchronous Response:
    --> QUERY
    --> RECEIVE
    <-- SEND
    <-- SEND
    --> CLOSE

Direct Synchronous Response:
    --> QUERY
    <-- INVITE
    --> RECEIVE
    <-- SEND
    --> RECEIVE
    <-- SEND
    --> CLOSE

Direct Asynchronous Response:
    --> QUERY
    <-- INVITE
    --> RECEIVE
    <-- SEND
    <-- SEND
    --> CLOSE

Table 7.1: High-Level Message Flow.

Concerning the number of results, the detailed semantics of SEND and INVITE remain to be specified. Further, the propagation semantics of CLOSE need to be stated. The revised specifications read as follows:

• SEND. When a node accepts a RECEIVE message, it responds with a SEND message, containing a batch with P results from the (remainder of the) result set. A client can successively issue multiple RECEIVE messages until the result set is exhausted. A client need not retrieve all results from the entire result set. For example, after the first batch of 10 results it may issue a CLOSE request. We have P ≤ M. We may, but need not, have N ≤ P. For example, less than N results may be delivered when the entire query result set is exhausted, or if the node decides to override and decrease N (e.g. for reasons including resource consumption control).

A SEND message also contains the number R of remaining results currently available for immediate non-blocking delivery with the next SENDs (nonBlockingResultsAvailable). Usually R is greater than zero. R=0 can indicate that remote nodes have not yet delivered results necessary to return more than zero results with the next SEND. R=-1 indicates that the batch contains the last results, as the result set is exhausted. No more RECEIVE messages must be issued after that point. R=-2 indicates that the number is unknown.

A SEND message also contains the current estimate Q of the remaining total result set size, irrespective of blocking (estimatedResultsAvailable). The actual number of results that can (later) be delivered may be larger. It should not be smaller, except if other nodes fail to deliver their suggested results. Usually Q is greater than or equal to zero. R=-1 implies Q=-1, indicating that the result set is definitely exhausted. Q=-2 indicates that the number is unknown.

In synchronous mode a single RECEIVE request must precede every single SEND response. In asynchronous mode a single RECEIVE request asks for a sequence of successive SEND responses, each of which contains a batch with P results from the (remainder of the) result set.

• INVITE. Under direct response, a node forwards the query without ever waiting for remote result sets. It only applies the query to its local database. If the local result set is not empty, the node directly contacts the agent with an INVITE message to solicit a RECEIVE message. Interaction then proceeds with the normal RECEIVE-SEND-CLOSE pattern, either in a synchronous or asynchronous manner. An INVITE message also contains the number R of results currently available for immediate non-blocking delivery (nonBlockingResultsAvailable). R must be greater than or equal to zero. The message also contains the current estimate Q of the remaining total result set size, irrespective of blocking (estimatedResultsAvailable). Q must be greater than zero.

• CLOSE. A client may issue a CLOSE message to inform a node that the remaining results (if any) are no longer needed and can safely be discarded. A node responds to a CLOSE immediately with an acknowledgement. At the same time, the node asynchronously forwards the CLOSE to neighbors involved in result set delivery, which in turn forward the CLOSE to their neighbors, and so on. Being informed of a CLOSE allows a node to release resources as early as possible. Strictly speaking, a client need not issue a CLOSE, and a node need not forward a CLOSE, because a query eventually times out anyway. Even though this is considered misbehavior, a node must continue to operate reliably under such conditions.

A node may periodically discover other peers and announce its presence. Node discovery uses a QUERY that selects all tuples with node service descriptions. For presence announcement, a node additionally includes its service description as optional QUERY data. Explicit PING/PONG messages as used in Gnutella [21] are unnecessary.

Clearly a RECEIVE request may cause cascading RECEIVEs through the P2P node topology, followed by cascading SEND responses backwards. In the worst case every RECEIVE cascades through a large number of node hops, incurring prohibitive latencies. This

highlights the importance of (appropriately sized) batched delivery, which greatly reduces the number of hops incurred by a single RECEIVE. Also, note that the I/O of a node need not be driven strictly by client demand. For example, in an attempt to reduce latency, a node accepting a QUERY may already prefetch query results from its neighbors even though it has not yet seen the corresponding RECEIVE request from its client.
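The sentinel values of the R (nonBlockingResultsAvailable) and Q (estimatedResultsAvailable) fields carried by SEND are easy to misread, so a decoding helper may clarify them. This is a hypothetical illustration of the semantics stated above, not part of the protocol specification.

```python
# Sketch: interpreting the R and Q fields of a SEND message, using the
# sentinel values defined above: -1 = result set exhausted, -2 = unknown,
# R = 0 = blocked on results still pending at remote nodes.

EXHAUSTED, UNKNOWN = -1, -2

def describe_send(r, q):
    if r == EXHAUSTED:
        # R = -1 implies Q = -1: no further RECEIVE may be issued.
        assert q == EXHAUSTED
        return "exhausted"
    if r == UNKNOWN:
        return "remaining count unknown"
    if r == 0:
        return "blocked on remote nodes"
    return f"{r} results non-blocking, ~{q} total remaining"

print(describe_send(-1, -1))  # exhausted
print(describe_send(0, 40))   # blocked on remote nodes
```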

State Transitions. A node maintains a state table. For each query at least the transaction identifier (abbreviated tid), abort timeout, loop timeout and an open/closed state flag are kept. An example state table reads as follows:

Tid   Abort Timeout   Loop Timeout   State
100   20              30             Closed
200   50              60             Open

A query is known to a node if the state table already holds a transaction identifier equal to the transaction identifier of the query. Otherwise, it is said to be unknown. A known query can be in two states: open or closed. Let us discuss the state transitions from unknown to open to closed and back to unknown state.

• Open. When an unknown query arrives with a QUERY message, it moves into open state. When a query moves into open state, it becomes known and is forwarded to the neighbors obtained from neighbor selection.

• Closed. A query moves from open into closed state when its abort timeout has been reached, or if the result set is exhausted by the final SEND, or if a client issues a CLOSE to indicate that it is no longer interested in the (remainder of the) result set, or if one of several errors occurs. Under direct response, a query also moves from open into closed state if the query produces no local results, or if it does produce local query results but the INVITE request is not accepted by the agent. In any case, when a query moves into closed state, a CLOSE request is asynchronously forwarded to all dependents in order to inform them as well. A node depends on a set of other nodes (dependents) that are involved in result set delivery. Under routed response, the dependents are the nodes obtained from neighbor selection. Under direct response, the dependents of an agent are the nodes from which the agent has accepted an INVITE message, whereas all other nodes have no dependents.

• Unknown. A query moves from closed state into unknown state when its loop timeout has been reached. In other words, the query is deleted from the state table.

• Message Acceptance and Rejection. A QUERY request is accepted if the query is unknown. If an already known QUERY arrives, this usually indicates loop detection. The message is rejected with an error (e.g. "transaction identifier already in use"). When a message other than QUERY arrives that has an unknown transaction identifier, it is rejected with an error (e.g. "transaction identifier unknown"). RECEIVE, SEND, CLOSE and INVITE messages are accepted for a query in open state. No message for a query in closed state is accepted; the response to a message is always an error (e.g. "transaction identifier already closed").
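The per-query lifecycle and the acceptance rules above can be sketched as a small state machine. The class below is illustrative only; a real node would additionally track timeouts, dependents and execution plans, as discussed in Section 7.6.

```python
# Sketch of the per-query state transitions (unknown -> open -> closed ->
# unknown) and the message acceptance/rejection rules. Illustrative names;
# the thesis keeps this state in a node state table keyed by transaction id.

class QueryState:
    def __init__(self):
        self.table = {}  # tid -> "open" | "closed"; absent tid = unknown

    def on_query(self, tid):
        if tid in self.table:
            # An already known QUERY usually indicates loop detection.
            raise ValueError("transaction identifier already in use")
        self.table[tid] = "open"  # ...and forward QUERY to selected neighbors

    def on_message(self, tid):
        """Accept a RECEIVE, SEND, CLOSE or INVITE only in open state."""
        if tid not in self.table:
            raise ValueError("transaction identifier unknown")
        if self.table[tid] == "closed":
            raise ValueError("transaction identifier already closed")

    def close(self, tid):
        """CLOSE received, abort timeout, exhausted result set, or errors."""
        self.on_message(tid)
        self.table[tid] = "closed"  # ...and forward CLOSE to dependents

    def on_loop_timeout(self, tid):
        del self.table[tid]  # back to unknown state

s = QueryState()
s.on_query(100)
s.close(100)
s.on_loop_timeout(100)
```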

The state transitions are summarized in Figure 7.1.

[Figure 7.1 depicts the three states UNKNOWN, OPEN and CLOSED. A QUERY moves a query from UNKNOWN to OPEN (trigger action: forward QUERY to neighbors). A query moves from OPEN to CLOSED on any of: 1. CLOSE received; 2. SEND exhausts result set; 3. INVITE not accepted (direct response, non-empty result set); 4. empty local result set (direct response); 5. various errors; 6. abort timeout (trigger action: forward CLOSE to dependents). The loop timeout moves a query from CLOSED back to UNKNOWN.]

Figure 7.1: Node State Transitions.

7.4 Concrete Messages

The high-level messaging model proposed so far omits many details. For example, any realistic messaging model must deal with acknowledgment and error messages. In a straightforward manner, the model is now mapped down to the abstract messaging model of the BEEP application-level network protocol framework [35, 36]. BEEP is an IETF standard designed for connection-oriented (ordered, reliable, congestion sensitive), message-oriented (loosely coupled, structured data), asynchronous (peer-to-peer, allowing client-server) communications. The framework defines one request message class (MSG) and four response message classes (RPY, ERR, ANS, NULL). Discrete messages belong to well-defined message exchange patterns. For example, the pattern of synchronous exchanges (one-to-one, pull) is supported as well as the pattern of asynchronous exchanges (one-to-many, push). The response to a MSG message may be an error (ERR), a reply (RPY) or a sequence of zero or more answers (ANS), followed by a null terminator message (NULL). The exchange patterns are summarized as follows:

MSG --> RPY | (ANS [0..N], NULL) | ERR

The BEEP framework explicitly expects each message class to be extended by applications as necessary. Accordingly, the messages QUERY, RECEIVE, SEND, CLOSE, INVITE are refined, yielding four request MSG types (MSG QUERY, MSG RECEIVE, MSG INVITE, MSG CLOSE), two reply message types (RPY SEND, RPY OK), one answer message type (ANS SEND), and the ERR error type. The RPY OK and ERR message types are introduced because any realistic messaging model must deal with acknowledgments and errors. The following message exchanges are permitted:

MSG QUERY   --> RPY OK | ERR
MSG RECEIVE --> RPY SEND | (ANS SEND [0:N], NULL) | ERR

MSG INVITE  --> RPY OK | ERR
MSG CLOSE   --> RPY OK | ERR
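The permitted exchanges can be captured as a simple lookup table, which a peer might use to reject an ill-formed response to one of its requests. This is an illustrative sketch; message names use underscores to form Python identifiers.

```python
# Sketch: the permitted BEEP exchange patterns of the refined model as a
# lookup table. For MSG_RECEIVE, either a single RPY_SEND (synchronous) or
# a sequence of ANS_SEND terminated by NULL (asynchronous) is permitted.

EXCHANGES = {
    "MSG_QUERY":   {"RPY_OK", "ERR"},
    "MSG_RECEIVE": {"RPY_SEND", "ANS_SEND", "NULL", "ERR"},
    "MSG_INVITE":  {"RPY_OK", "ERR"},
    "MSG_CLOSE":   {"RPY_OK", "ERR"},
}

def valid_response(request, response):
    """True if `response` is a permitted reply to `request`."""
    return response in EXCHANGES[request]

print(valid_response("MSG_RECEIVE", "ANS_SEND"))  # True
print(valid_response("MSG_QUERY", "RPY_SEND"))    # False
```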

Using the refined model, the message flows depicted in Table 7.2 can occur in the running example.

Routed Synchronous Response:
    --> MSG QUERY
    <-- RPY OK
    --> MSG RECEIVE
    <-- RPY SEND
    --> MSG RECEIVE
    <-- RPY SEND
    --> MSG CLOSE
    <-- RPY OK

Routed Asynchronous Response:
    --> MSG QUERY
    <-- RPY OK
    --> MSG RECEIVE
    <-- ANS SEND
    <-- ANS SEND
    <-- NULL
    --> MSG CLOSE
    <-- RPY OK

Direct Synchronous Response:
    --> MSG QUERY
    <-- RPY OK
    <-- MSG INVITE
    --> RPY OK
    --> MSG RECEIVE
    <-- RPY SEND
    --> MSG RECEIVE
    <-- RPY SEND
    --> MSG CLOSE
    <-- RPY OK

Direct Asynchronous Response:
    --> MSG QUERY
    <-- RPY OK
    <-- MSG INVITE
    --> RPY OK
    --> MSG RECEIVE
    <-- ANS SEND
    <-- ANS SEND
    <-- NULL
    --> MSG CLOSE
    <-- RPY OK

Table 7.2: Refined Message Flow.

Message types and their parameters can be mapped to multiple representations. For simplicity and flexibility, we use straightforward XML [11] representations. Without loss of generality, example query expressions (e.g. user query, merge query and neighbor selection query) are given in the XQuery language [18], as detailed in Chapter 6. SQL [19], LDAP [14] or other query languages could be used as well. Consider the following example messages:

[Example messages (XML markup not reproduced here): a MSG QUERY carries the user query, merge query and neighbor selection query (the first two of the form RETURN /tupleset/tuple, the neighbor selection query of the form RETURN /tupleset/tuple[@type="service" AND content/service/interface[@type="Consumer-1.0"] AND content/service/interface[@type="XQuery-1.0"]]), the response mode (direct), the service description of the agent, and the optional identity of the originator ([email protected]); a MSG RECEIVE requests synchronous mode; a RPY SEND carries a result batch (e.g. service description B); an ANS SEND has a structure identical to RPY_SEND; ERR messages carry diagnostics such as "maximum idle time exceeded" and "transaction identifier unknown".]

7.5 Communication Model and Network Protocol

The previous sections proposed a messaging model in increasing levels of detail. In this section, the communications between two peers (e.g. originator and agent node, or node and another node) are described by a communication model. The model operates on abstract entities such as sessions, channels, messages and frames. The communication model is later explicitly mapped down to a network protocol. A network protocol describes how abstract entities such as sessions and channels map to physical entities such as TCP connections. Further, a network protocol spells out how to handle asynchrony (handling independent exchanges), encoding (representing messages), framing (delimiting messages), authentication (verifying user identities), privacy (protecting against third-party interception) and reporting (conveying status information such as errors).

Communication Model. We adopt the communication model of BEEP because it fits the requirements of our (non-trivial) messaging model well. The model has the following properties.

• Session, Channel, Message and Frame. Two peers establish a session for communication. Within a session, one or more channels can be established. A channel carries zero or more messages. A message can have arbitrary length and content. A message is segmented into one or more frames of variable length. A session is established by an initiator for communication with a listener. Within a session, the peer that awaits new channels is acting in the server role, and the other peer, which establishes a channel to the server, is acting in the client role. In P2P style, both initiator and listener may (but need not) act as client and server at the same time.

• Intra-channel. Within a channel, all messages are processed in serial order. The server must generate responses in the same order as corresponding request messages are received. One or more request messages may be issued without waiting to receive the corresponding responses. That is, a channel provides pipelining. To this end, each request message carries an integer identifier that is unique within the channel. Responses to the message carry the same identifier.

• Inter-channel. Channels are isolated from each other, and therefore handle asynchrony/multiplexing. A channel cannot “see” or interfere with messages from other channels. There are no constraints on the processing order for different channels. In other words, inter-channel messages may be unordered.

• Flow control. In all likelihood, the concurrent channels of a session are carried over the same physical network cable. Consequently, flow control policy issues arise: If more than one channel offers a frame to send, which frame should be chosen? What is a good size for a frame? A peer may implement any policy it sees fit. For example, it may attempt to prevent starvation and encourage fairness. A slow channel should not be able to monopolize the session. Large messages may be segmented into multiple frames, and different channels may be served in round-robin fashion or according to priorities.

The larger the system load, the more important flow control becomes for reliable and predictable operation. Consider the analogy between P2P nodes and physical IP routers, which form nodes in a multi-hop packet switching network. Obviously it is desirable to be able to control the Quality of Service policies, I/O scheduling and queues of IP routers.
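One of the round-robin policies mentioned above can be sketched as follows. This is an illustrative scheduler, not the BEEP algorithm itself: channels with pending frames are served in turn so that no single channel monopolizes the session.

```python
# Sketch of a fair per-session flow control policy: channels with pending
# frames are served in round-robin fashion. (Illustrative only.)

from collections import deque

def round_robin_frames(channels):
    """channels: dict channel_id -> deque of frames; yields (channel, frame)."""
    order = deque(sorted(channels))
    while order:
        ch = order.popleft()
        if channels[ch]:
            yield ch, channels[ch].popleft()
            order.append(ch)  # re-queue the channel while frames remain

frames = {1: deque(["a1", "a2", "a3"]), 2: deque(["b1"])}
print(list(round_robin_frames(frames)))
# interleaves channels: [(1, 'a1'), (2, 'b1'), (1, 'a2'), (1, 'a3')]
```

A priority-based policy would replace the FIFO `order` queue with a priority queue, e.g. degrading long-running queries.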

Recall that a peer can concurrently handle multiple independent transactions. A transaction is a sequence of one or more message exchanges between two peers for a given query. An example transaction is a QUERY-RECEIVE-SEND-RECEIVE-SEND-CLOSE sequence. This non-trivial transaction model is in contrast to a simpler model where a transaction consists of a single request-response exchange.

The simple or non-trivial use of transactions determines the context in which channels are used. In the simple model, it may be appealing to share a channel for many unrelated transactions. In our non-trivial model, one channel per distinct transaction is used to isolate communication referring to different queries and to ensure that messages of a transaction are processed in serial order. This also provides for resource consumption and flow control on a per query basis.

Let us extend the running example to cover two concurrent interleaved queries Q1 and Q2. Q1 is dealt with on channel 1, whereas Q2 is dealt with on channel 2. Even though Q1 is issued before Q2, Q2 is answered earlier, for any of a variety of reasons: Q2 may be simpler than Q1; nodes involved in answering Q1 may be busy or down; the bandwidth of network paths may strongly vary; the flow control policy of a node may degrade the priority of Q1. In the example, the multiplexed message flows depicted in Table 7.3 can occur (notation is channelNumber: message). Only routed response modes are shown. Direct response modes are analogous.

Routed Synchronous Response:
    --> 1: MSG QUERY
    <-- 1: RPY OK
    --> 2: MSG QUERY
    <-- 2: RPY OK
    --> 1: MSG RECEIVE
    --> 2: MSG RECEIVE
    <-- 2: RPY SEND
    --> 2: MSG RECEIVE
    <-- 2: RPY SEND
    --> 2: MSG CLOSE
    <-- 1: RPY SEND
    <-- 2: RPY OK
    --> 1: MSG CLOSE
    <-- 1: RPY OK

Routed Asynchronous Response:
    --> 1: MSG QUERY
    <-- 1: RPY OK
    --> 2: MSG QUERY
    <-- 2: RPY OK
    --> 1: MSG RECEIVE
    --> 2: MSG RECEIVE
    <-- 2: ANS SEND
    <-- 1: ANS SEND
    <-- 2: ANS SEND
    <-- 2: NULL
    --> 2: MSG CLOSE
    <-- 1: ANS SEND
    <-- 2: RPY OK
    <-- 1: NULL
    --> 1: MSG CLOSE
    <-- 1: RPY OK

Table 7.3: Multiplexed Message Flows.

Network Protocol. A messaging and communication model does not specify how to handle asynchrony (handling independent exchanges), encoding (representing messages), framing (delimiting messages), authentication (verifying user identities), privacy (protecting against third-party interception) and reporting (conveying status information such as errors). In this section, we specify details on these issues. Further, a communication model does not operate on physical entities such as TCP connections and packets, but rather on abstract entities such as sessions, channels, messages and frames. The concepts of session and channel can be mapped to concrete physical entities in several ways:

• Session. There are two options: one session per originator, or one session per initiator (neighbor node). Under the former option, a new session between two peers is established whenever a query from a previously unknown originator is accepted. The session is shared by all queries from the given originator. This option does not scale well in the presence of many concurrent originators. In contrast, under the latter option, a new session is established whenever a new node publishes itself as neighbor (or, more lazily, when the first query exchange with a neighbor happens). Irrespective of how many originators issue queries, the session persists until a node leaves the network. Since neighbors do not join and leave very frequently, this option involves less latency because session establishment occurs less often.

• Channel. As has been noted, our model uses one channel per distinct transaction identifier in order to isolate communications referring to different queries and to ensure that messages of a transaction are processed in serial order. In other words, there exists one channel per query. There are two options to map TCP connections: one TCP connection per channel (TCP multiplexing, TM) and one TCP connection per session (application multiplexing, AM).

TM is easy to implement because multiplexing is directly supported by the TCP stack, which natively handles multiple concurrent TCP connections. An application need not bother how to implement multiplexing and flow control, but it also has few means to control it. For simplicity, almost all network protocols use TM.

Under AM, all channels of a session share a single TCP connection. Typically, each channel has an associated memory buffer. AM is much more complex because multiplexing must be supported on top of the TCP stack, at the application level. Note, however, that the complexities involved are well taken care of by existing commodity software frameworks such as beepcore [110].

TM has the distinct disadvantage of being much less efficient in the presence of high frequency channel creation. With new queries (channels) arriving at high frequency, TM encounters serious latency limitations due to the very expensive nature of secure (and even insecure) TCP connection setup. Even if TCP connections are kept alive, pooled and reused, at least N*neighbors TCP connections are needed under broadcast to handle N concurrent queries. While this solution may be perfectly adequate for small special-purpose networks, it clearly does not scale well to the stringent demands of a ubiquitous Internet infrastructure such as service discovery. This is precisely the demanding scenario AM is designed for: Channel establishment only requires a single message exchange over an already existing TCP connection. If channels are pooled and reused, channel establishment is a null operation and does not involve any network communication exchange.
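The cost argument for application multiplexing can be made concrete with a small accounting sketch. The `Session` class below is hypothetical; it merely counts the setup exchanges that channel establishment incurs, showing that a pooled channel is reused at zero network cost.

```python
# Sketch of channel pooling under application multiplexing: establishing a
# channel costs one message exchange over the existing TCP connection, while
# a pooled channel is reused without any network exchange. (Illustrative.)

class Session:
    """One session per neighbor; channels multiplexed over one TCP connection."""
    def __init__(self):
        self.exchanges = 0   # round trips spent on channel establishment
        self._pool = []
        self._next_id = 0

    def open_channel(self):
        if self._pool:
            return self._pool.pop()   # null operation: no network exchange
        self.exchanges += 1           # one exchange over the existing connection
        self._next_id += 1
        return self._next_id

    def release_channel(self, ch):
        self._pool.append(ch)         # keep the channel for reuse

s = Session()
ch = s.open_channel()    # costs one exchange
s.release_channel(ch)
s.open_channel()         # reused from the pool: costs nothing
print(s.exchanges)       # 1
```

Under TM, each `open_channel` would instead require a fresh (possibly secure) TCP connection setup, which is orders of magnitude more expensive.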

We assume that at any given moment in time, a node faces many queries, fewer originators and even fewer initiators (neighbors). In a widely deployed system, a node must expect to face hundreds to many thousands of concurrent queries, hundreds to thousands of concurrent originators, and tens to hundreds of neighbors. For example, many queries are periodically reissued with substantial frequency. Many queries are in open state but have long interactive latencies between consecutive RECEIVE-SEND exchanges. Many queries are in open state waiting for abort timeout, because client programs tend to be lazy and forget to issue a CLOSE. Many queries are in closed state waiting for loop timeout.

It appears unnecessary and inefficient to set up a new session for each originator. Hence, one session per initiator (neighbor node) is used. For simplicity and easy authorization, we actually use two sessions per initiator (neighbor node). One session is used for traffic related to incoming queries (queries posed to the node), the other for traffic related to outgoing queries (queries the node poses to another node). A node with N neighbors has N incoming and N outgoing sessions.

In a successful P2P network, queries do indeed arrive at high frequencies. Session establishment may be heavyweight, but channel establishment must be lightweight. Hence, application multiplexing is chosen. A node with N neighbors has N incoming and N outgoing TCP connections.

Any network protocol must deal with a set of common problems. The BEEP framework was introduced to avoid the need to reinvent solutions to common problems. BEEP is an IETF standard designed for connection-oriented (ordered, reliable, congestion sensitive), message-oriented (loosely coupled, structured data), asynchronous (peer-to-peer, allowing client-server) communications. We propose to adopt the framework because it integrates existing best-of-breed standards. BEEP uses channels for asynchrony (handling independent exchanges). The transport mapping to TCP uses application multiplexing (one TCP connection per session) with sliding windows [36]. More precisely, each channel has a sliding window that indicates the number of payload octets that a peer may transmit before receiving further permission to transmit. MIME [67] with a default of text/xml is used for encoding (representing messages). Octet counting with trailers is used for framing (delimiting messages). SASL [111] or TLS/SSL [112] are used for authentication (verifying user identities) and privacy (protecting against third-party interception). For Grid applications, TLS/SSL is used within the context of the Grid Security Infrastructure (GSI) [113]. 3-digit and localized textual diagnostics are used for reporting (conveying status information such as errors).

We propose to encode message types and their parameters with the straightforward XML representations given in the previous Section 7.4, using the MIME type text/xml, which is the default in BEEP.

Finally, note that it would be interesting to use SOAP [9] as a high-level tool for PDP messaging. Most commonly, HTTP 1.1 [34] is used as SOAP transport. However, SOAP is transport protocol independent. For strongly increased efficiency and low latency, SOAP should be carried over BEEP. See [31] for an IETF draft specifying a straightforward SOAP/BEEP binding.

7.6 Node State Table

Let us now discuss the minimum state information a node must maintain for correct operation, introduced by means of an example. Consider the state of the central grey node depicted in Figure 7.2. The node has three neighbors (A, B, C). Two concurrent queries Q1 and Q2 are being handled. Q1 arrives from A and is forwarded to B and C. Q2 also arrives from A, but neighbor selection dictates that it is forwarded to C only. Q1 uses three channels (1, 2, 3) whereas Q2 uses two channels (4, 5). The session (a) incoming from A has two channels (1, 4). The session (b) outgoing to B has one channel (2). The session (c) outgoing to C has two channels (3,5). The sessions outgoing to A and incoming from B and C are not shown, because they are not used by the two queries. Both queries carry a transaction identifier, or tid for short: Q1 (tid=100) and Q2 (tid=200).

[Figure 7.2 depicts the grey node connected to its neighbors A, B and C. Channels 1 and 4 belong to the session incoming from A, channel 2 to the session outgoing to B, and channels 3 and 5 to the session outgoing to C; Q1 uses channels 1, 2, 3 and Q2 uses channels 4, 5.]

Figure 7.2: Example Node State.

For the purpose of exposition, the relational data model is used to describe state. For each session, a node must maintain at a minimum the following information:

• The channels associated with each session, so that all channels of a given session can be closed (Table 7.4 - compressed for clarity).

Session (1)   Channel (N)
a             1, 4
b             2
c             3, 5

Table 7.4: Channels of Sessions.

For each known query, a node must maintain the following information:

• The channels outgoing to dependents, so that incoming messages such as MSG RE- CEIVE and MSG CLOSE can be forwarded to dependents (Table 7.5).

Further, for each known query a node must maintain at a minimum the following infor- mation (Tables 7.6 and 7.7):

• The channel incoming from the client (to know where to accept messages and return responses)

Tid (1)   Outgoing Channel to Dependent (N)
100       2
100       3
200       5

Table 7.5: Channels of Queries.

• The loop timeout and abort timeout

• The current state of the transaction (open or closed)

• The execution plan to invoke upon accepting a MSG RECEIVE request

• The results currently available for immediate non-blocking delivery with the next SEND batch, as well as their number (minAvailResults); in practice these are implicitly contained in the execution plan

• Query scope parameters such as the maximum (and already sent) result set size as well as maximum (and already sent) byte size of the result set to be delivered, as specified by the MSG QUERY message

Tid   Incoming Channel from Client   Abort Timeout   Loop Timeout   State   Execution Plan
100   1                              20              30              Open    Plan 1
200   4                              50              60              Open    Plan 2

Table 7.6: Query State.

Tid   AvailResults   MinAvailResults   MaxResults   SentResults   MaxResultsBytes   SentBytes
100   {r1,r2,r3}     3                 10           5             10000             5000
200   {r4,r5}        2                 100          50            20000             10000

Table 7.7: Query State Continued.

As noted, the relational data model is used here only for the purpose of exposition. Clearly an implementation should use efficient data structures such as hash maps for frequent lookup operations, such as finding the transaction identifier of an incoming channel.
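The hash-map representation suggested above can be sketched directly from Tables 7.4 through 7.6. Field names mirror the tables; the inverted index at the end is one way to make the channel-to-tid lookup a constant-time operation.

```python
# Sketch: the node state tables of the running example as hash maps, so that
# frequent lookups, e.g. finding the transaction identifier behind an
# incoming channel, run in constant time. (Illustrative representation.)

session_channels = {"a": [1, 4], "b": [2], "c": [3, 5]}          # Table 7.4
dependent_channels = {100: [2, 3], 200: [5]}                     # Table 7.5
query_state = {                                                  # Table 7.6
    100: {"incoming_channel": 1, "abort_timeout": 20,
          "loop_timeout": 30, "state": "open", "plan": "Plan 1"},
    200: {"incoming_channel": 4, "abort_timeout": 50,
          "loop_timeout": 60, "state": "open", "plan": "Plan 2"},
}

# Inverted index for the frequent lookup named above: incoming channel -> tid.
tid_of_channel = {q["incoming_channel"]: tid for tid, q in query_state.items()}

print(tid_of_channel[4])        # 200
print(dependent_channels[100])  # [2, 3], i.e. forward MSG CLOSE to these
```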

7.7 Related Work

RDBMS. The network protocols of Relational Database Management Systems are designed with a tight focus on a single node architecture and response model. For maximum efficiency, communication is tightly coupled, for example with low overhead Inter Process Communication (IPC) mechanisms carried over more efficient layers than TCP. Like our approach, RDBMS protocols are stateful and allow for low latency, pipelining, early and/or partial result set retrieval due to synchronous pull, and result set delivery in one or more variable sized batches. They provide very strong facilities for resource consumption and flow control on a per query and/or per user basis. Low-level RDBMS interfaces such as Oracle’s OCI [114] allow for application multiplexing. High-level access APIs such as JDBC [115] do not provide access to such facilities; they use less scalable TCP connection pooling instead. RDBMS protocols are closed and proprietary (except for open source products), and hence unsuitable for Internet-level interoperability and extensibility.

LDAP and MDS. The LDAP model allows for multi-level hierarchical topologies as well as normal and referral response modes. It does not support arbitrary topologies and direct response mode. MDS additionally supports routed response mode but otherwise has the same properties as LDAP. Like our approach, both protocols are stateful as well as connection and message-oriented. They do not support synchronous pull or result set delivery in one or more variable sized batches. Synchronous paging behavior has been proposed [74], but this is still inefficient, because each response message still contains only a single entry. They do support asynchronous push. They do not provide for resource consumption and flow control on a per query basis. They lack a concept that concentrates all messages related to a query, like a BEEP channel2. LDAP has a notion of application multiplexing that is not equivalent to ours. The fact that messages may be unordered is dictated by the LDAP network protocol. The BEEP network protocol guarantees ordered message delivery. If LDAP were used for PDP messaging, the parameters Q and R of a SEND message would be meaningless. Likewise, the server response for a CLOSE request would be allowed to “overtake” the SEND responses for prior RECEIVE requests, which violates pipelining semantics. Like BEEP, LDAP is an IETF standard.

Gnutella and Freenet. Gnutella and Freenet support queries in arbitrary graph topologies but only a single response mode. Like our approach, their protocols are stateful as well as connection and message-oriented. They do not support synchronous pull but they do support asynchronous push with one or more variable sized batches. Like LDAP and MDS, they do not provide for resource consumption and flow control on a per query basis. However, they do have a notion of application multiplexing that is equivalent to ours for the purpose of result set retrieval. Their protocol specifications are not closed and proprietary, but they are ad-hoc specifications without any relation to an open IETF standard and its implied quality in terms of interoperability and extensibility.

2The specification reads “Note that although servers are required to return responses whenever such responses are defined in the protocol, there is no requirement for synchronous behavior on the part of either client or server implementations: requests and responses for multiple operations may be exchanged by client and servers in any order, as long as clients eventually receive a response for every request that requires one” [14].

7.8 Summary

Comparison with Related Work. We describe how the operations of the Unified Peer-to-Peer Database Framework (UPDF) and registry XQuery interface from Section 5.2 are carried over (bound to) a network protocol. We develop a messaging, communication and network protocol model, collectively termed Peer Database Protocol (PDP). PDP supports P2P database queries for a wide range of database architectures and response models such that the stringent demands of ubiquitous Internet infrastructures in terms of scalability, efficiency, interoperability, extensibility and reliability can be met.

PDP has a number of key properties. It is applicable to any node topology (e.g. centralized, distributed or P2P) and to multiple P2P response modes (routed response and direct response, both with and without metadata modes). To support loosely coupled autonomous Internet infrastructures, the model is connection-oriented (ordered, reliable, congestion sensitive) and message-oriented (loosely coupled, operating on structured data). For efficiency, it is stateful at the protocol level, with a transaction consisting of one or more discrete message exchanges related to the same query. It allows for low latency, pipelining, early and/or partial result set retrieval due to synchronous pull, and result set delivery in one or more variable sized batches. It is efficient, due to asynchronous push with delivery of multiple results per batch. It provides for resource consumption and flow control on a per query basis, due to the use of a distinct channel per transaction. It is scalable, due to application multiplexing, which allows for very high query concurrency and very low latency, even in the presence of secure TCP connections. To encourage interoperability and extensibility it is fully based on Internet Engineering Task Force (IETF) standards, for example in terms of asynchrony, encoding, framing, authentication, privacy and reporting.
These key properties distinguish our approach from related work, which individually addresses some, but not all, of the above issues. We are not aware of related work that proposes a uniform messaging model that is applicable to any node topology and at the same time to multiple P2P response modes. Some related work does not apply to loosely coupled autonomous database nodes (RDBMS). Some protocols are not stateful at the protocol level (HTTP based mechanisms). Some do not support synchronous pull (LDAP, MDS, Gnutella, Freenet) and result set delivery in one or more variable sized batches (LDAP, MDS, HTTP based mechanisms). Some do not support asynchronous push with delivery of multiple results per batch (LDAP, MDS, HTTP based mechanisms). Some do not provide for resource consumption and flow control on a per query basis (LDAP, MDS, Gnutella, Freenet, HTTP based mechanisms). Some lack application multiplexing for scalable query concurrency (some RDBMS drivers, HTTP based mechanisms, LDAP, MDS). Some do not encourage interoperability and extensibility based on open IETF standards (RDBMS, Gnutella, Freenet).

Summary. The high-level messaging model employs four request messages (QUERY, RECEIVE, INVITE, CLOSE) and a response message (SEND). A transaction is a sequence of one or more message exchanges between two peers (nodes) for a given query. An example transaction is a QUERY-RECEIVE-SEND-RECEIVE-SEND-CLOSE sequence. A peer can concurrently handle multiple independent transactions. A transaction is identified by a transaction identifier. Every message of a given transaction carries the same transaction

identifier. A QUERY message is forwarded along hops through the topology. A RECEIVE message is used by a client to request query results from another node. It requests the node to respond with a SEND message, containing a batch of at least N and at most M results from the (remainder of the) result set. A client may issue a CLOSE request to inform a node that the remaining results (if any) are no longer needed and can safely be discarded. If the local result set is not empty under direct response, the node directly contacts the agent with an INVITE message to solicit a RECEIVE message.

A RECEIVE request can ask to deliver SEND messages in either synchronous (pull) or asynchronous (push) mode. In synchronous mode a single RECEIVE request must precede every single SEND response. An example sequence is RECEIVE-SEND-RECEIVE-SEND. In asynchronous mode a single RECEIVE request asks for a sequence of successive SEND responses. A client need not explicitly request more results, as they are automatically pushed in a sequence of zero or more SENDs. An example sequence is RECEIVE-SEND-SEND-SEND. Appropriately sized batched delivery greatly reduces the number of hops incurred by a single RECEIVE. To reduce latency, a node may prefetch query results.

A node maintains a state table that keeps for each query at a minimum the transaction identifier, abort timeout, loop timeout and an open/closed state flag. A query can be in three states: unknown, open or closed. The rules governing state transitions are detailed.

The model is mapped down to the abstract messaging model of the BEEP application level network protocol framework. The permitted kinds of message exchanges are detailed. Message types and their parameters are mapped to XML representations. An abstract communication model is discussed that spells out how a message is carried from one peer to another. The BEEP communication model operates on abstract entities such as sessions, channels, messages and frames.
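The transaction flow just described (QUERY opening a transaction, RECEIVE/SEND exchanges in synchronous pull mode, CLOSE discarding the remainder) can be sketched as follows. This is a toy illustration only: the message names QUERY, RECEIVE, SEND, CLOSE and the N..M batch bounds come from the model above, while all class, method and variable names are hypothetical and not part of the PDP specification.

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    tid: str                                     # transaction identifier carried by every message
    results: list = field(default_factory=list)  # (remainder of the) result set
    state: str = "open"                          # unknown -> open -> closed

class Node:
    def __init__(self):
        self.transactions = {}                   # state table, keyed by transaction identifier

    def on_query(self, tid, query):
        # A QUERY opens a transaction; the node may prefetch results to reduce latency.
        self.transactions[tid] = Transaction(tid, results=self.evaluate(query))

    def on_receive(self, tid, n, m):
        # A RECEIVE asks the node to respond with a SEND carrying a batch of
        # at least n and at most m results from the remainder of the result set.
        txn = self.transactions[tid]
        batch = txn.results[:m]
        del txn.results[:m]
        return ("SEND", tid, batch)

    def on_close(self, tid):
        # CLOSE: remaining results (if any) are no longer needed and are discarded.
        txn = self.transactions[tid]
        txn.state = "closed"
        txn.results.clear()

    def evaluate(self, query):
        return [f"tuple-{i}" for i in range(5)]  # stand-in for local query execution

# Synchronous pull mode: a single RECEIVE precedes every single SEND.
node = Node()
node.on_query("tx1", "//service")
msg1 = node.on_receive("tx1", n=1, m=2)          # ("SEND", "tx1", ["tuple-0", "tuple-1"])
msg2 = node.on_receive("tx1", n=1, m=2)          # ("SEND", "tx1", ["tuple-2", "tuple-3"])
node.on_close("tx1")                             # remaining tuple-4 is discarded
```

In asynchronous push mode the same state table would serve a sequence of SENDs after a single RECEIVE; the per-transaction bookkeeping is identical.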
Two peers establish a session for communication. Within a session, one or more channels can be established. A channel carries zero or more messages. A message can have arbitrary length and content. A message is segmented into one or more frames of variable length. Within a channel, pipelining is provided (messages are serialized). Channels are isolated from each other, and therefore handle asynchrony/multiplexing. Inter-channel messages may be unordered. In our context, one channel per distinct transaction is used to isolate communication referring to different queries and to ensure that messages of a transaction are processed in serial order. Flow control issues arising from the use of concurrent channels of a session are discussed.

There are two options to map TCP connections: one TCP connection per channel (TCP multiplexing or TM) and one TCP connection per session (application multiplexing or AM). The properties of both models are discussed. TM has the distinct disadvantage of being much less efficient in the presence of high frequency channel creation. Hence, application multiplexing is chosen. A node with N neighbors has N incoming and N outgoing TCP connections.

The BEEP framework is adopted because it is designed for application multiplexing and integrates existing best-of-breed standards. It defines how to handle asynchrony (handling independent exchanges), encoding (representing messages), framing (delimiting messages), authentication (verifying user identities), privacy (protecting against third-party interception) and reporting (conveying status information such as errors). The BEEP transport mapping

to TCP uses application multiplexing (one TCP connection per session).

Chapter 8

Conclusion

8.1 Summary

This thesis tackles the problems of information, resource and service discovery arising in large distributed Internet systems spanning multiple administrative domains. We show how to support expressive general-purpose queries over a view that integrates autonomous dynamic database nodes from a wide range of distributed system topologies. The work was carried out in the context of the European DataGrid project (EDG) at CERN, the European Organization for Nuclear Research, and supported by the Austrian Ministerium für Wissenschaft, Bildung und Kultur.

Service Discovery Processing Steps. A key question is:

• What distinct problem areas and processing steps can be distinguished in order to enable flexible remote invocation in the context of service discovery?

To establish the context, we outline eight problem areas and their associated processing steps, namely description, presentation, publication, request, discovery, brokering, execution and control. We propose a simple grammar (SWSDL) for describing network services as collections of service interfaces capable of executing operations over network protocols to endpoints. The grammar is intended to be used in the high-level architecture and design phase of a software project. A service must present its current description so that clients from anywhere can retrieve it at any time. For broad acceptance, adoption and easy integration of legacy services, an HTTP hyperlink is chosen as an identifier and retrieval mechanism (service link). A registry for publication and query of service and resource presence information is outlined. Reliable, predictable and simple distributed registry state maintenance in the presence of service failure, misbehavior or change is addressed by a simple and effective soft state mechanism.

The notions of request, resource and operation are clarified. We outline the discovery step, which finds services implementing the operations required by a request. The brokering step determines an invocation schedule, which is a mapping over time of unbound operations to service operation invocations using given resources. The execution step implements a schedule. It uses the supported protocols to invoke operations on remote services. We discuss how one can reliably support monitoring and controlling the lifecycle of a request in the presence of a service that cannot reliably complete a request within a short and well-known expected timeframe.
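The soft state idea mentioned above can be sketched in a few lines: published presence information carries a lifetime and disappears unless refreshed, so failed or departed services vanish without explicit deregistration. This is a minimal sketch under assumed names; the registry API, service link URLs and TTL values are illustrative, not part of the thesis specification.

```python
import time

class SoftStateRegistry:
    """Toy registry: published presence information expires unless refreshed."""
    def __init__(self):
        self.entries = {}                         # service link -> expiry timestamp

    def publish(self, service_link, ttl):
        # (Re)publishing a service link refreshes its lifetime.
        self.entries[service_link] = time.time() + ttl

    def expire(self, now=None):
        # Entries of failed or departed services vanish without explicit
        # deregistration, keeping distributed registry state predictable.
        now = time.time() if now is None else now
        self.entries = {link: t for link, t in self.entries.items() if t > now}

reg = SoftStateRegistry()
reg.publish("http://host/service1.wsdl", ttl=30)
reg.publish("http://host/service2.wsdl", ttl=300)
reg.expire(now=time.time() + 60)                  # service1 missed its refresh window
assert list(reg.entries) == ["http://host/service2.wsdl"]
```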

Discovery Data Model and Query Language. The key problem is:

• What kind of database, query and data model as well as query language can support simple and complex dynamic information discovery with as few architecture and design assumptions as possible? In particular, how can one uniformly support queries in a wide range of distributed system topologies and deployment models, while at the same time accounting for their respective characteristics?

We develop a database and query model as well as a generic and dynamic data model that address the given problem. All subsequent chapters are based on these models. Unlike in the relational model, the elements of a tuple in our data model can hold structured or semi-structured data in the form of any arbitrary well-formed XML document or fragment. An individual tuple element may, but need not, have a schema, in which case the element must be valid according to the schema. The elements of all tuples may, but need not, share a common schema.

The concepts of (logical) query and (physical) query scope are cleanly separated rather than interwoven. A query is formulated against a global database view and is insensitive to link topology and deployment model. In other words, to a query the set of all tuples appears as a single homogeneous database, even though the set may be (recursively) partitioned across many nodes and databases. The query scope, on the other hand, is used to navigate and prune the link topology and filter on attributes of the deployment model. A query is evaluated against a set of tuples. The set, in turn, is specified by the scope. Conceptually, the scope is the input fed to the query.

Example service discovery queries are given. Three query types are identified, namely simple, medium and complex. An appropriate query language (XQuery) is suggested. The suitability of the query language is demonstrated by formulating the example prose queries in the language. Detailed requirements for a query language supporting service and resource discovery are given. The capabilities of various query languages are compared.
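The scope/query separation can be illustrated with a toy example: the scope selects the tuple set from the topology and deployment attributes, and the query is then evaluated over that set as if it were a single homogeneous database. All names below (the tuple fields, the domain attribute, the predicate) are hypothetical illustrations, with a simple string match standing in for an XQuery over the tuples' XML content.

```python
# A tuple set as it might be (recursively) partitioned across nodes.
tuples = [
    {"node": "nodeA", "domain": "cern.ch",   "xml": "<service type='storage'/>"},
    {"node": "nodeB", "domain": "cern.ch",   "xml": "<service type='compute'/>"},
    {"node": "nodeC", "domain": "other.org", "xml": "<service type='storage'/>"},
]

def scope(tuples, domain):
    # Scope: navigate/prune the topology and filter on deployment attributes.
    return [t for t in tuples if t["domain"] == domain]

def query(tuple_set):
    # Query: topology-insensitive predicate over the scoped tuple set
    # (standing in for an XQuery over the tuples' XML content).
    return [t for t in tuple_set if "storage" in t["xml"]]

# Conceptually, the scope is the input fed to the query.
result = query(scope(tuples, domain="cern.ch"))
assert [t["node"] for t in result] == ["nodeA"]
```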

Database for Discovery of Distributed Content. The key problem is:

• How should a database node maintain information populated from a large variety of unreliable, frequently changing, autonomous and heterogeneous remote data sources? In particular, how should it do so without sacrificing reliability, predictability and simplicity? How can powerful queries be expressed over time-sensitive dynamic information?

A type of database is developed that addresses the problem. A database for XQueries over dynamic distributed content is designed and specified – the so-called hyper registry. The hyper registry has a number of key properties. An XML data model allows for structured and semi-structured data, which is important for integration of heterogeneous content. The XQuery language allows for powerful searching, which is critical for non-trivial applications. Database state maintenance is based on soft state, which enables reliable, predictable and simple content integration from a large number of autonomous distributed content providers. Content link, content cache and a hybrid pull/push communication model allow for a wide range of dynamic content freshness policies, which may be driven by all three system components: content provider, hyper registry and client.
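The hybrid pull/push freshness model can be sketched as follows: a content link points at a provider; the provider may push content into the cache, and a client request specifying its own freshness requirement is served from the cache if the copy is recent enough, otherwise the registry pulls current content through the link. The class, the `max_age` parameter and the URL are illustrative assumptions, not the hyper registry's actual interface.

```python
import time

class ContentLink:
    """Toy content link: the registry caches provider content and decides,
    per client request, whether the cached copy is fresh enough."""
    def __init__(self, url, provider):
        self.url = url
        self.provider = provider                 # callable returning current content
        self.cache = None
        self.cached_at = 0.0

    def push(self, content):
        # Provider-driven push refreshes the registry's cached copy.
        self.cache, self.cached_at = content, time.time()

    def get(self, max_age):
        # Client-driven freshness: serve the cache if younger than max_age,
        # otherwise pull current content through the content link.
        if self.cache is not None and time.time() - self.cached_at <= max_age:
            return self.cache
        self.push(self.provider())               # pull, then cache
        return self.cache

link = ContentLink("http://host/getTuples", provider=lambda: "<tuples v='2'/>")
link.push("<tuples v='1'/>")                     # provider pushed an earlier snapshot
assert link.get(max_age=60) == "<tuples v='1'/>" # fresh enough: cached copy served
assert link.get(max_age=-1) == "<tuples v='2'/>" # must be current: pulled via the link
```

Note how all three components steer freshness: the provider by pushing, the client by choosing `max_age`, and the registry by deciding between cache and pull.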

Web Service Discovery Architecture. The key problem is:

• Can we define a discovery architecture that promotes interoperability, embraces industry standards, and is open, modular, flexible, unified, non-intrusive and simple yet powerful?

We propose and specify a discovery architecture, the so-called Web Service Discovery Architecture (WSDA). WSDA views the Internet as a large set of services with an extensible set of well-defined interfaces. It promotes an interoperable web service layer on top of existing and future Internet software, because it defines appropriate services, interfaces, operations and protocol bindings. WSDA subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. It specifies a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery. These primitives cover service identification, service description retrieval, data publication as well as minimal and powerful query support. The individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies. Finally, we compare in detail the properties of WSDA with the emerging Open Grid Services Architecture.

Unified Peer-to-Peer Database Framework. The key problems are:

• What are the detailed architecture and design options for P2P database searching in the context of service discovery? What response models can be used to return matching query results? How should a P2P query processor be organized? What query types can be answered (efficiently) by a P2P network? What query types have the potential to immediately start piping in (early) results? How can a maximum of results be delivered reliably within the time frame desired by a user, even if a query type does not support pipelining? How can loops be detected reliably using timeouts? How can a query scope be used to exploit topology characteristics in answering a query? For improved efficiency, how can queries be executed in containers that concentrate distributed P2P database nodes into hosting environments with virtual nodes?

• Can we devise a unified P2P database framework for general-purpose query support in large heterogeneous distributed systems spanning many administrative domains? More precisely, can we devise a framework that is unified in the sense that it allows one to express specific applications for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options?

We take the first steps towards unifying the fields of database management systems and P2P computing, which so far have received considerable, but separate, attention. We extend database concepts and practice to cover P2P search. Similarly, we extend P2P concepts and practice to support powerful general-purpose query languages such as XQuery and SQL. As a result, we propose the so-called Unified Peer-to-Peer Database Framework (UPDF) for general-purpose query support in large heterogeneous distributed systems spanning many

administrative domains. UPDF is unified in the sense that it allows one to express specific applications for a wide range of data types, node topologies, query languages, query response modes, neighbor selection policies, pipelining characteristics, timeout and other scope options.
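The "unified" idea can be conveyed with a toy sketch: the framework itself only organizes the recursion over the node topology, while a specific application plugs in its own query execution, neighbor selection policy and merge/union step. The function and parameter names below are illustrative, not the UPDF interface; a depth bound stands in for the richer scope options (timeouts, radius, attribute filters) discussed in the thesis.

```python
def updf_query(node, query, radius, execute, select_neighbors, unionize):
    """Recursively evaluate a query over a node topology (routed response)."""
    local = execute(node, query)                 # results from the local database
    if radius == 0:                              # scope option: bound the search
        return local
    partial = [updf_query(n, query, radius - 1, execute, select_neighbors, unionize)
               for n in select_neighbors(node)]
    return unionize([local] + partial)           # merge local and neighbor results

# Example plug-ins for a tiny three-node topology:
topology = {"A": ["B", "C"], "B": [], "C": []}
data = {"A": [1], "B": [2], "C": [3]}

result = updf_query(
    "A", query=None, radius=1,
    execute=lambda node, q: data[node],          # application-specific execution
    select_neighbors=lambda node: topology[node],# neighbor selection policy
    unionize=lambda parts: sorted(x for p in parts for x in p),
)
assert result == [1, 2, 3]
```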

Unified Peer-to-Peer Database Protocol. The key problem is:

• What messaging and communication model, as well as network protocol, uniformly support P2P database queries for a wide range of database architectures and response models such that the stringent demands of ubiquitous Internet discovery infrastructures in terms of scalability, efficiency, interoperability, extensibility and reliability can be met? In particular, how can one allow for high concurrency, low latency as well as early and/or partial result set retrieval? How can one encourage resource consumption and flow control on a per query basis?

These problems are addressed by developing a suitable messaging, communication and network protocol model, collectively termed Peer Database Protocol (PDP). PDP describes how the operations of the Unified Peer-to-Peer Database Framework (UPDF) and registry XQuery interface are carried over (bound to) a network protocol.

PDP has a number of key properties. It is applicable to any node topology (e.g. centralized, distributed or P2P) and to multiple P2P response modes (routed response and direct response, both with and without metadata modes). To support loosely coupled autonomous Internet infrastructures, the model is connection-oriented (ordered, reliable, congestion sensitive) and message-oriented (loosely coupled, operating on structured data). For efficiency, it is stateful at the protocol level, with a transaction consisting of one or more discrete message exchanges related to the same query. It allows for low latency, pipelining, early and/or partial result set retrieval due to synchronous pull, and result set delivery in one or more variable sized batches. It is efficient, due to asynchronous push with delivery of multiple results per batch. It provides for resource consumption and flow control on a per query basis, due to the use of a distinct channel per transaction. It is scalable, due to application multiplexing, which allows for very high query concurrency and very low latency, even in the presence of secure TCP connections. To encourage interoperability and extensibility it is fully based on Internet Engineering Task Force (IETF) standards, for example in terms of asynchrony, encoding, framing, authentication, privacy and reporting.

8.2 Directions for Future Research

The results presented in this thesis open five interesting research directions.

First, it would be interesting to extend further the unification and extension of concepts from Database Management Systems and P2P computing. For example, one could consider the application of database techniques such as buffer cache maintenance, view materialization, placement and selection as well as query optimization for use in P2P computing. These techniques would need to be extended in the light of the complexities stemming from autonomous administrative domains, inconsistent and incomplete (soft) state, dynamic and flexible cache freshness policies and, of course, tuple updates. An important problem left

open in our work is the question if a query processor can automatically determine whether a correct merge query and unionizer exist, and if so, how to choose them. Here approaches from query rewriting for heterogeneous and homogeneous relational database systems [45, 95] should prove useful. Further, database resource management and authorization mechanisms might be worthwhile to consider for specific flow control policies per query or per user.

Second, it would be interesting to study and specify in more detail specific cache freshness interaction policies between content provider, hyper registry and client (query). Our specification allows expressing a wide range of policies, some of which we outline, but we do not evaluate in detail the merits and drawbacks of any given policy.

Third, it would be valuable to rigorously assess, review and compare the Web Service Discovery Architecture (WSDA) and the Open Grid Services Architecture (OGSA) in terms of concepts, design and specifications. A strong goal is to achieve convergence by extracting best-of-breed solutions from both proposals. Future collaborative work could further improve current solutions, for example in terms of simplicity, orthogonality and expressiveness. For practical purposes, our pedagogical service description language (SWSDL) could be mapped to WSDL, taking into account the OGSA proposal. This would allow SWSDL to be used as a tool for greatly improved clarity in high-level architecture and design discussions, while at the same time allowing for painstakingly detailed WSDL specifications addressing ambiguity and interoperability concerns.

Fourth, Tim Berners-Lee designed the World Wide Web as a consistent interface to a flexible and changing heterogeneous information space for use by CERN’s staff, the High Energy Physics community, and, of course, the world at large.
The WWW architecture [86] rests on four simple and orthogonal pillars: URIs as identifiers, HTTP for retrieval of content pointed to by identifiers, MIME for flexible content encoding, and HTML as the primus-inter-pares (MIME) content type. Based on our Dynamic Data Model (DDM), we hope to proceed further towards a self-describing meta content type that retains and wraps all four WWW pillars “as is”, yet allows for flexible extensions in terms of identification, retrieval and caching of content. A judicious combination of the four Web pillars, DDM, the Web Service Discovery Architecture (WSDA), the Hyper Registry, the Unified Peer-to-Peer Database Framework (UPDF) and its associated Peer Database Protocol (PDP) is used to define how to bootstrap, query and publish to a dynamic and heterogeneous information space maintained by self-describing network interfaces.

Fifth, we are starting to build a system prototype with the aim of reporting on experience gained from application to an existing large distributed system such as the European DataGrid.

Chapter 9

Acknowledgements

This thesis is dedicated to my family. Dietlind and Gert put me on the trajectory that eventually led to this thesis. Dietlind has a way to spread fairness and humanity. She is much engaged in social life, and draws satisfaction from interpretation of symbols from the history of arts such as the raven. Gert constantly radiates perspectives on all aspects of life. He induced the sense of creation, taught us evolution, the history of earth and concentration camps, and what science was about. He showed how to walk off track, making us see and understand rock faces, the pull of mountains and a wild will for freedom. Thanks to my brothers for everything we shared. Stefan, the mountaineer and straightforward no-nonsense physician in remote Northern India, will, with luck, soon be based in Mongolia. If you need his treatment, watch out for martial disinfections. Ulli and Renate favour handling snowboards and skateboards, cameras, extravagant body and hair designs, and, of course, their love. This piece of paper is also prominently dedicated to Ben Segal, who led the CERN team, day in day out supporting this work with great integrity, enthusiasm and, above all else, in a distinct spirit of humanity and friendship. Frankly, this work would not exist without you, Ben. Gerti Kappel, Erich Schikuta and Bernd Panzer-Steindel patiently advised and guided this thesis, suggesting what always turned out to be wise alleys. I am happy to remember Brian Tierney who shared office with me in Geneva for the last year. We had a good time hiking in the mountains of Chamonix and the Jura. With the wine and food prepared by Christine’s magic hands, we’d get ready for late evening discussions. Sorry you both had to return to Berkeley. Thanks to Annemarie for having appeared at the right moment, at the right place, and also for introducing the cembalos, flutists and counter tenors of Vienna. Thanks to Kurt Stockinger, through many CERN years the spare time friend, co-worker and Mr. 
Query Optimizer, as well as Heinz Stockinger. Testing ideas against the solid background of Dirk Düllmann, Ian Foster, Johannes Gutleber, Koen Holtman and Carl Kesselman proved an invaluable recipe in separating wheat from chaff. All of the above, as well as German Cancio, Francesco Giacomini, Leanne Guy, Peter Kunszt, Javier Jaen-Martinez, Gavin McCance, Asad Samar and many more colleagues from the CERN IT Division, DataGrid Architecture Task Force, DataGrid, Globus, RD45 and CMS collaborations engaged in countless discussions, contributing thoughtful perspectives. Wolfgang von Rüden and Les Robertson provided encouragement and essential organizational backing. Dominique Dupraz cared for ceremonial wine and cake and the rituals on the hallway of the department. Karlheinz Schindl, Werner Jank and Chris Fabjan kindly assisted in navigating the Austrian Doctoral Student Programme and the CERN fellowship.

References

[1] Ben Segal. Grid Computing: The European Data Grid Project. In IEEE Nuclear Science Symposium and Medical Imaging Conference, Lyon, France, October 2000. [2] Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger, and Kurt Stockinger. Data Management in an International Data Grid Project. In 1st IEEE/ACM Int. Workshop on Grid Com- puting (Grid’2000), Bangalore, India, December 2000. [3] Dirk D¨ullmann,Wolfgang Hoschek, Javier Jean-Martinez, Asad Samar, Ben Segal, Heinz Stockinger, and Kurt Stockinger. Models for Replica Synchronisation and Consistency in a Data Grid. In 10th IEEE Symposium on High Performance and Distributed Computing (HPDC-10), San Francisco, California, August 2001. [4] Large Hadron Collider Committee. Report of the LHC Computing Review. Technical report, CERN/LHCC/2001-004, April 2001. http://lhc-computing-review-public.web.cern.ch/lhc-computing- review-public/Public/Report final.PDF. [5] Ian Foster, Carl Kesselman, and Steve Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. Int. Journal of Supercomputer Applications, 15(3), 2001. [6] Ian Foster, Carl Kesselman, Jeffrey Nick, and Steve Tuecke. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, January 2002. [7] P. Cauldwell, R. Chawla, Vivek Chopra, Gary Damschen, Chris Dix, Tony Hong, Francis Norton, Uche Ogbuji, Glenn Olander, Mark A. Richman, Kristy Saunders, and Zoran Zaev. Professional XML Web Services. Wrox Press, 2001. [8] E. Christensen, F. Curbera, G. Meredith, and S. Weerawarana. Web Services Description Language (WSDL) 1.1. W3C Note 15, 2001. www.w3.org/TR/wsdl. [9] World Wide Web Consortium. Simple Object Access Protocol (SOAP) 1.1. W3C Note 8, 2000. [10] UDDI Consortium. UDDI: Universal Description, Discovery and Integration. www.uddi.org. [11] World Wide Web Consortium. Extensible Markup Language (XML) 1.0. W3C Recommendation, Oc- tober 2000. [12] World Wide Web Consortium. 
XML Schema Part 0: Primer. W3C Recommendation, May 2001. [13] International Telecommunications Union. Recommendation X.500, Information technology – Open Sys- tem Interconnection – The directory: Overview of concepts, models, and services. ITU-T, November 1995. [14] W. Yeong, T. Howes, and S. Kille. Lightweight Directory Access Protocol. IETF RFC 1777, March 1995. [15] Karl Czajkowski, Steven Fitzgerald, Ian Foster, and Carl Kesselman. Grid Information Services for Dis- tributed Resource Sharing. In Tenth IEEE Int. Symposium on High-Performance Distributed Computing (HPDC-10), San Francisco, California, August 2001. [16] Steven Fitzgerald, Ian Foster, Carl Kesselman, Gregor von Laszewski, Warren Smith, and Steven Tuecke. A Directory Service for Configuring High-Performance Distributed Computations. In 6th Int. Symposium on High Performance Distributed Computing (HPDC ’97), 1997.

153 154 REFERENCES

[17] Steven Tuecke, Karl Czajkowski, Ian Foster, Jeffrey Frey, Steve Graham, and Carl Kesselman. Grid Service Specification, February 2002. [18] World Wide Web Consortium. XQuery 1.0: An XML Query Language. W3C Working Draft, December 2001. [19] International Organization for Standardization (ISO). Information Technology-Database Language SQL. Standard No. ISO/IEC 9075:1999, 1999. [20] P. Mockapetris. Domain Names - Implementation and Specification. IETF RFC 1035, November 1987. [21] Gnutella Community. Gnutella Protocol Specification v0.4. dss.clip2.com/GnutellaProtocol04.pdf. [22] I. Clarke, O. Sandberg, B. Wiley, and T. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Workshop on Design Issues in Anonymity and Unobservability, 2000. [23] B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An infrastructure for fault-resilient wide-area location and routing. Technical report, U.C. Berkeley UCB//CSD-01-1141, 2001. [24] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In ACM SIGCOMM, 2001. [25] M. van Steen, P. Homburg, and A. Tanenbaum. A wide-area distributed system. IEEE Concurrency, 1999. [26] Steven E. Czerwinski, Ben Y. Zhao, Todd Hodes, Anthony D. Joseph, and Randy Katz. An Architecture for a Secure Service Discovery Service. In Fifth Annual Int. Conf. on Mobile Computing and Networks (MobiCOM ’99), Seattle, WA, August 1999. [27] Beverly Yang and Hector Garcia-Molina. Efficient Search in Peer-to-Peer Networks. In 22nd Int. Conf. on Distributed Computing Systems, Vienna, Austria, July 2002. [28] Adriana Iamnitchi and Ian Foster. On Fully Decentralized Resource Discovery in Grid Environments. In Int. IEEE Workshop on Grid Computing, Denver, Colorado, November 2001. [29] A. Puniyani B. Huberman L. Adamic, R. Lukose. Search in power-law networks. Phys. Rev, E(64), 2001. [30] S. Bradner. 
Key Words for use in RFCs to Indicate Requirement Levels. IETF RFC 2119, March 1997. [31] E. O’Tuathail and M. Rose. Using SOAP in BEEP. IETF Draft draft-etal-beep--06, January 2002. [32] J. Postel and J. Reynolds. File Transfer Protocol (FTP). IETF RFC 959, October 1985. [33] J. Postel. Using SOAP in BEEP. IETF RFC 821, August 1982. [34] R. Fielding, J. Gettys, J.C. Mogul, H. Frystyk, and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1. IETF RFC 2616. UC Irvine, Digital Equipment Corporation, MIT. [35] Marshall Rose. The Blocks Extensible Exchange Protocol Core. IETF RFC 3080, March 2001. [36] Marshall Rose. Mapping the BEEP Core onto TCP. IETF RFC 3081, March 2001. [37] S. Gullapalli, K. Czajkowski, C. Kesselman, and S. Fitzgerald. The grid notification framework. Tech- nical report, Grid Forum Working Draft GWD-GIS-019, June 2001. http://www.gridforum.org. [38] Rajkumar Buyya, Steve Chapin, and David DiNucci. Architectural Models for Resource Management in the Grid. In 1st IEEE/ACM Int. Workshop on Grid Computing (GRID 2000), Bangalore, India, December 2000. [39] R. Raman, M. Livny, and M. Solomon. Matchmaking: Distributed Resource Management for High Throughput Computing. In 7th IEEE Int. Symposium on High Performance Distributed Computing (HPDC’98), Chicago, IL, July 1998. [40] I. Foster, A. Roy, and V. Sander. A Quality of Service Architecture that Combines Resource Reservation and Application Adaptation. In 8th Int. Workshop on Quality of Service, 2000. [41] Nelson Minar. Peer-to-Peer is Not Always Decentralized. In The O’Reilly Peer-to-Peer and Web Services Conference, Washington, D.C., November 2001. REFERENCES 155

[42] J. D. Ullman. Information Integration Using Logical Views. In Int. Conf. on Database Theory (ICDT), Delphi, Greece, 1997.
[43] Daniela Florescu, Ioana Manolescu, Donald Kossmann, and Florian Xhumari. Agora: Living with XML and Relational. In Int. Conf. on Very Large Data Bases (VLDB), Cairo, Egypt, September 2000.
[44] A. Tomasic, L. Raschid, and P. Valduriez. Scaling Access to Heterogeneous Data Sources with DISCO. IEEE Transactions on Knowledge and Data Engineering, 10(5):808-823, 1998.
[45] Donald Kossmann. The State of the Art in Distributed Query Processing. ACM Computing Surveys, September 2000.
[46] Dan Suciu. On Database Theory and XML. SIGMOD Record, 30(3), 2001.
[47] Mary Fernandez, Atsuyuki Morishima, Dan Suciu, and Wang-Chiew Tan. Publishing Relational Data in XML: the SilkRoute Approach. IEEE Data Engineering Bulletin, 24(2), 2001.
[48] Daniela Florescu, Ioana Manolescu, and Donald Kossmann. Answering XML Queries over Heterogeneous Data Sources. In Int. Conf. on Very Large Data Bases (VLDB), Roma, Italy, September 2001.
[49] Brian Tierney, Ruth Aydt, Dan Gunter, Warren Smith, Valerie Taylor, Rich Wolski, and Martin Swany. A Grid Monitoring Architecture. Technical report, Grid Forum Working Draft GWD-Perf-16-2, January 2002. http://www.gridforum.org.
[50] World Wide Web Consortium. XML Query Use Cases. W3C Working Draft, December 2001.
[51] World Wide Web Consortium. XML Query Requirements. W3C Working Draft, February 2001.
[52] World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Functions and Operators. W3C Working Draft, December 2001.
[53] Don Chamberlin, Jonathan Robie, and Daniela Florescu. Quilt: An XML Query Language for Heterogeneous Data Sources. Lecture Notes in Computer Science, December 2000.
[54] World Wide Web Consortium. XML Path Language (XPath) Version 1.0. W3C Recommendation, November 1999.
[55] J. Robie, J. Lapp, and D. Schach. XML Query Language (XQL). In The Query Languages Workshop (QL'98), Boston, Massachusetts, December 1998.
[56] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A Query Language for XML. In Eighth Int. World Wide Web Conference, 1999.
[57] Rick Cattell et al. The Object Database Standard: ODMG-93, Release 1.2. Morgan Kaufmann Publishers, San Francisco, 1996.
[58] World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Data Model. W3C Working Draft, December 2001.
[59] World Wide Web Consortium. XSL Transformations (XSLT) 2.0. W3C Working Draft.
[60] World Wide Web Consortium. XML Pointer Language (XPointer). W3C Last Call Working Draft, January 2001.
[61] Steve Fisher et al. Information and Monitoring (WP3) Architecture Report. Technical report, DataGrid-03-D3.2, January 2001.
[62] W. P. Dinda and B. Plale. A Unified Relational Approach to Grid Information Services. Technical report, Grid Forum Informational Draft GWD-GIS-012-1, February 2001. http://www.gridforum.org.
[63] J. Waldo. The Jini Architecture for Network-Centric Computing. Communications of the ACM, July 1999.
[64] Apache Software Foundation. The Apache HTTP Server. http://httpd.apache.org.
[65] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax. IETF RFC 2396, August 1998.
[66] Apache Software Foundation. The Jakarta Tomcat Project. http://jakarta.apache.org/tomcat/.
[67] N. Freed and N. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. IETF RFC 2045, November 1996.

[68] World Wide Web Consortium. XML-Signature Syntax and Processing. W3C Recommendation, February 2002.
[69] P. Brittenham. An Overview of the Web Services Inspection Language, 2001. www.ibm.com/developerworks/webservices/library/ws-wsilover.
[70] Software AG. The Quip XQuery Processor. http://www.softwareag.com/developer/quip/.
[71] J. Wang. A Survey of Web Caching Schemes for the Internet. ACM Computer Communication Review, 29(5), October 1999.
[72] D. Mills. Network Time Protocol (Version 3) Specification, Implementation and Analysis. IETF RFC 1305, March 1992.
[73] Dan Gunter, Brian Tierney, and Ruth Aydt. Timestamp Model for Grid Computing. Technical report, Global Grid Forum Working Draft GWD-PERF-014-1, 2001. http://www.gridforum.org.
[74] C. Weider, A. Herron, A. Anantha, and T. Howes. LDAP Control Extension for Simple Paged Results Manipulation. IETF RFC 2696, September 1999.
[75] M. P. Maher and C. Perkins. Session Announcement Protocol: Version 2. IETF Internet Draft draft-ietf-mmusic-sap-v2-00.txt, November 1998.
[76] E. Levy-Abegnoli, A. Iyengar, J. Song, and D. Dias. Design and Performance of a Web Server Accelerator. In Proceedings of Infocom'99, 1999.
[77] J. Challenger, A. Iyengar, and P. Dantzig. A Scalable System for Consistently Caching Dynamic Web Data. In Proceedings of Infocom'99, 1999.
[78] The OpenLDAP Project. http://www.openldap.org.
[79] A. Wu, H. Wang, and D. Wilkins. Performance Comparison of Web-To-Database Applications. In Proceedings of the Southern Conference on Computing, The University of Southern Mississippi, October 2000.
[80] S. Dar, M. J. Franklin, B. Jónsson, D. Srivastava, and M. Tan. Semantic Data Caching and Replacement. In Int. Conf. on Very Large Data Bases (VLDB), Bombay, India, 1996.
[81] L. Chen, E. A. Rundensteiner, and S. Wang. XCache - A Semantic Caching System for XML Queries. In ACM SIGMOD Conf. on Management of Data, 2002. Software system demonstration paper.
[82] Oracle Corp. J2EE and Microsoft .NET. White Paper, April 2002.
[83] Madhusudhan Govindaraju, Aleksander Slominski, Venkatesh Choppella, Randall Bramley, and Dennis Gannon. Requirements for and Evaluation of RMI Protocols for Scientific Computing. In Supercomputing Conference (SC'00), Dallas, Texas, November 2000.
[84] David Culler, Kim Keeton, Lok Tim Liu, Alan Mainwaring, Rich Martin, Steve Rodrigues, Kristin Wright, and Chad Yoshikawa. The Generic Active Message Interface Specification, August 1994. Computer Science Division, University of California at Berkeley, White Paper.
[85] Maite Barroso. Grid Fabric Management Work Package Report on Current Technology. Technical report, DataGrid-04-TED-0101, May 2001.
[86] Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine, 2000.
[87] G. Pandurangan, P. Raghavan, and E. Upfal. Protocols for Building Low-Diameter Peer-to-Peer Networks. In DIMACS Workshop on Internet and WWW Measurement, Mapping and Modeling, Piscataway, NJ, February 2002.
[88] Ron Rivest. The MD5 Message-Digest Algorithm. IETF RFC 1321, April 1992.
[89] National Institute of Standards and Technology. Secure Hash Standard. Technical report, FIPS 180-1, Washington, D.C., April 1995.
[90] Matei Ripeanu. Peer-to-Peer Architecture Case Study: Gnutella Network. In Int. Conf. on Peer-to-Peer Computing (P2P2001), Linköping, Sweden, August 2001.

[91] Clip2 Report. Gnutella: To the Bandwidth Barrier and Beyond. http://www.clip2.com/gnutella.html.
[92] L. Pearlman, V. Welch, I. Foster, C. Kesselman, and S. Tuecke. A Community Authorization Service for Group Collaboration. In IEEE 3rd Int. Workshop on Policies for Distributed Systems and Networks (submitted), 2001.
[93] Stefano Ceri and Giuseppe Pelagatti. Distributed Databases - Principles and Systems. McGraw-Hill Computer Science Series, 1985.
[94] M. Franklin, B. Jonsson, and D. Kossmann. Performance Tradeoffs for Client-Server Query Processing. In ACM SIGMOD Conf. on Management of Data, Montreal, Canada, June 1996.
[95] Y. Papakonstantinou and V. Vassalos. Query Rewriting for Semistructured Data. In ACM SIGMOD Conf. on Management of Data, 1999.
[96] Dan Suciu. Distributed Query Evaluation on Semistructured Data. ACM Transactions on Database Systems, 2002.
[97] T. Urhan and M. Franklin. Dynamic Pipeline Scheduling for Improving Interactive Query Performance. The Very Large Database (VLDB) Journal, 2001.
[98] Jordan Ritter. Why Gnutella Can't Scale. No, Really. http://www.tch.org/gnutella.html.
[99] S. E. Deering. Multicast Routing in a Datagram Internetwork. PhD thesis, Stanford University, 1991.
[100] Java Community Process. Java Servlet 2.3 Specification. jcp.org/aboutJava/communityprocess/final/jsr053.
[101] Java Community Process. Enterprise Java Beans Specification. java.sun.com/products/ejb/docs.html.
[102] IEEE. Data Engineering Bulletin, 23(2), June 2000.
[103] Jayavel Shanmugasundaram, Kristin Tufte, David J. DeWitt, Jeffrey F. Naughton, and David Maier. Architecting a Network Query Engine for Producing Partial Results. In WebDB 2000 (Informal Proceedings), 2000.
[104] Zachary G. Ives, Alon Y. Halevy, and Daniel S. Weld. Integrating Network-Bound XML Data. IEEE Data Engineering Bulletin, 24(2), 2001.
[105] Jeffrey F. Naughton, David J. DeWitt, David Maier, Ashraf Aboulnaga, Jianjun Chen, Leonidas Galanis, Jaewoo Kang, Rajasekar Krishnamurthy, Qiong Luo, Naveen Prakash, Ravishankar Ramamurthy, Jayavel Shanmugasundaram, Feng Tian, Kristin Tufte, Stratis Viglas, Yuan Wang, Chun Zhang, Bruce Jackson, Anurag Gupta, and Rushan Chen. The Niagara Internet Query System. IEEE Data Engineering Bulletin, 24(2), 2001.
[106] Annita N. Wilschut and Peter M. G. Apers. Dataflow Query Execution in a Parallel Main-Memory Environment. In First Int. Conf. on Parallel and Distributed Information Systems (PDIS), December 1991.
[107] Zachary G. Ives, Daniela Florescu, Marc T. Friedman, Alon Y. Levy, and Daniel S. Weld. An Adaptive Query Execution System for Data Integration. In ACM SIGMOD Conf. on Management of Data, 1999.
[108] Tolga Urhan and Michael J. Franklin. XJoin: A Reactively-Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 23(2), June 2000.
[109] Tolga Urhan and Michael J. Franklin. Dynamic Pipeline Scheduling for Improving Interactive Query Performance. In Int. Conf. on Very Large Data Bases (VLDB), 2001.
[110] Beepcore Community. The beepcore Open Source Project for Java, C, and Tcl. http://www.beepcore.org.
[111] J. Myers. Simple Authentication and Security Layer (SASL). IETF RFC 2222, October 1997.
[112] T. Dierks and C. Allen. The TLS Protocol Version 1.0. IETF RFC 2246, January 1999.
[113] R. Butler, D. Engert, I. Foster, C. Kesselman, S. Tuecke, J. Volmer, and V. Welch. A National-Scale Authentication Infrastructure. IEEE Computer, 33(12), 2000.
[114] Oracle Corp. Oracle Call Interface Programmer's Guide, January 2002. Release 9.0.1, Part Number A89857-01.
[115] Donald Bales. Java Programming with Oracle JDBC. O'Reilly, December 2001. ISBN 0-596-00088-X.