Efficient Distributed Query Processing on Heterogeneous Xquery Engines

XRPC Efficient Distributed Query Processing on Heterogeneous XQuery Engines Zhang Ying XRPC Efficient Distributed Query Processing on Heterogeneous XQuery Engines ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit van Amsterdam, op gezag van de Rector Magnificus prof. dr. D. C. van den Boom ten overstaan van een door het college voor promoties ingestelde commissie, in het openbaar te verdedigen in de Agnietenkapel op donderdag 8 juli 2010, te 12:00 uur door Zhang Ying ( ) geboren te ChongQing, China Promotiecommissie: Promotor: Prof. dr. Martin L. Kersten CoPromotor: Dr. Peter A. Boncz Overige leden: Prof. dr. Torsten Grust Prof. dr. Arjen P. de Vries Prof. dr. Maarten de Rijke Dr. Maarten Marx Faculteit: Faculteit der Natuurwetenschappen, Wiskunde en Informatica Universiteit van Amsterdam The research reported in this thesis was finished at current position of the author at CWI, the Dutch national research laboratory for mathematics and computer science, within the theme Data Mining and Knowledge Discovery, a subdivision of the research cluster Information Sys- tems. SIKS Dissertation Series No. 2010-26. The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems. ISBN 978-90-9025263-6 Contents 1 Introduction 1 1.1 Motivation . .1 1.1.1 Interoperability . .2 1.1.2 Extending XQuery with Query Shipping . .3 1.1.3 Efficiency . .4 1.1.4 Stateless versus Stateful . .5 1.2 Research Objective . .6 1.3 Thesis Outline . .7 2 Related Work 11 2.1 P2P Data Management Systems . 11 2.1.1 Extending XQuery with Query-Shipping . 12 2.1.2 Distributed XML Querying in Structured P2P Systems . 13 2.1.3 Distributed XML Querying in Unstructured P2P Systems . 16 2.1.4 PDMSs of Relational Data . 18 2.2 Related Query Processing Techniques . 19 2.2.1 XML Document Filtering . 19 2.2.2 Query Decomposition . 20 2.3 Conclusion . 20 3 The XRPC Language Extension 23 3.1 Design Considerations . 23 3.2 XRPC Syntax . 27 3.3 SOAP XRPC Message Format . 29 3.3.1 XRPC Request Messages . 29 3.3.2 XRPC Response Messages . 30 3.3.3 XRPC Error Message . 31 3.4 XRPC Formal Semantics . 31 3.4.1 Read-Only XRPC Semantics . 32 3.4.2 XRPC Update Semantics . 36 3.5 Loop-lifted Implementation of XRPC . 38 3.5.1 Relational XQuery and Loop-Lifting . 39 3.5.2 Bulk RPC . 40 3.5.3 Performance Evaluation . 43 3.6 Conclusion . 44 i ii CONTENTS 4 Distributed XQuery With XPRC 45 4.1 Introduction . 45 4.2 Cross-System Distributed XQuery . 47 4.3 Distributed XQuery Optimisation . 49 4.4 Deterministic Distributed Updates . 51 4.4.1 Order-Correct Update Tags . 52 4.5 Distributed XRPC Transactions . 55 4.5.1 Heterogeneous Distributed 2PC . 56 4.6 MonetDB/XQuery? ............................... 58 4.6.1 Simple Scenarios . 58 4.6.2 Loose DHT Coupling . 58 4.6.3 Tight DHT Coupling . 60 4.7 Conclusion . 60 5 XQuery Decomposition 63 5.1 Motivation . 63 5.2 Semantic Differences with Pass-By-Value . 65 5.3 XQuery Core Rewrite Framework . 68 5.3.1 XCore Dependency Graph . 69 5.3.2 XRPCExpr Insertion . 71 5.4 Conservative Decomposition . 71 5.4.1 By-Value Insertion Conditions . 71 5.4.2 Interesting Decomposition Points . 72 5.4.3 Normalisation . 74 5.4.4 Distributed Code Motion . 74 5.5 By-Fragment Decomposition . 75 5.6 By-Projection Decomposition . 77 5.6.1 Extending Projected XML . 79 5.6.2 Runtime XML Projection . 81 5.7 Decomposition of XQUF Queries . 84 5.7.1 Distributing Normal XQUF Queries . 84 5.7.2 Updating XCore Queries on Remote Documents . 86 5.8 Evaluation in MonetDB/XQuery . 89 5.8.1 Read-Only Queries . 89 5.8.2 XQUF Queries . 91 5.9 Conclusion . 94 6 Correctness Proof of XQuery Decomposition 95 6.1 Preliminaries . 95 6.1.1 Equality Relationships of Sequences . 96 6.1.2 Equality Relationships of Sequences with Projection . 96 6.1.3 Equality Relationship of Read-Only Queries . 98 6.1.4 Equality Relationship of Updating Queries . 100 6.1.5 Sequence Properties . 102 6.1.6 XPath Steps and distinct-doc-order ................ 103 6.2 Static Properties Analysis . 104 CONTENTS iii 6.2.1 Literal Values . 105 6.2.2 Variables . 105 6.2.3 Sequences . 105 6.2.4 for Expressions . 105 6.2.5 let Expressions . 106 6.2.6 Conditionals . 106 6.2.7 Typeswitch . 106 6.2.8 Value and Node Comparisons . 106 6.2.9 Order Expressions . 107 6.2.10 Node Set Expressions . 107 6.2.11 Constructors . 107 6.2.12 XPath Expressions . 108 6.2.13 Built-in Function Calls . 108 6.2.14 Transform Expressions . 109 6.3 Conservative Correctness Proof . 109 6.4 By-Fragment Correctness Proof . 115 6.5 By-Projection Correctness Proof . 120 6.6 XQUF Correctness Proof . 128 6.7 Code Motion Correctness Proof . 130 7 StreetTiVo: Manage Multimedia Data Using A P2P XDBMS 131 7.1 Motivation . 131 7.2 P2P Data Management . 132 7.3 StreetTiVo Architecture . 133 7.4 Next Steps . 137 7.5 Conclusion . 139 8 Conclusion and Outlook 141 8.1 Research Summary . 141 8.2 Future Work . 144 A XML Schema Definition of the XRPC SOAP Messages 147 B XQuery Implementation Defined and Implementation Dependent Features 149 B.1 XQuery 1.0 and XPath 2.0 Data Model . 149 B.2 XQuery 1.0: An XML Query Language . 150 B.3 XQuery 1.0 and XPath 2.0 Functions and Operators . 153 B.4 XSLT 2.0 and XQuery 1.0 Serialization . 155 B.5 XQuery Update Facility 1.0 . 156 C Static Property Analysis Rules for Built-in Functions 157 C.1 General Rules . 157 C.2 Accessory . 157 C.3 The Error Function . 158 C.4 The Trace Function . 158 C.5 Constructor Functions . 158 iv CONTENTS C.6 Functions and Operators on Numerics . 158 C.7 Functions on Strings . 159 C.8 Functions on anyURI . 160 C.9 Functions and Operators on Boolean Values.

Load more