XML Transformations, Views and Updates based on XQuery Fragments

Faculty of Science, Department of Computer Science

Thesis submitted for the degree of Doctor of Science at the Universiteit Antwerpen, to be defended by Roel Vercammen

Advisor: Prof. Dr. Jan Paredaens
Co-advisor: Dr. Ir. Jan Hidders

Antwerp, 2008

Roel Vercammen
Universiteit Antwerpen, 2008
http://www.universiteitantwerpen.be

Permission to make digital or hard copies of portions of this work for personal or classroom use is granted, provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice. Copyrights for components of this work owned by parties other than the author must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists requires prior specific permission of the author.

Research funded by a Ph.D. grant of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen). Grant number: 33581. http://www.iwt.be

Typeset with LaTeX.

Acknowledgements

This thesis is the result of the contributions of many friends and colleagues to whom I would like to say "thank you". First and foremost, I want to thank my advisor Jan Paredaens, who gave me the opportunity to become a researcher and taught me how good research should be performed. I had the honor of writing several papers in collaboration with him and will always remember our discussions and his interesting views on research, politics and gastronomy. I am greatly indebted to my co-advisor, Jan Hidders, who helped me assess what good research questions are and whose expertise helped me a lot.
Moreover, I will also remember him for non-scientific reasons, such as our badminton games and his fascination with, and broad knowledge of, history. Philippe Michiels deserves a special thank you for the many vivid discussions we had and for the interesting research questions on XQuery optimization that he came up with and that we tried to tackle together. Even though the research we did together is not reflected in this thesis, it broadened my view of the field of database research, and he brought me in touch with many great researchers, such as Mary Fernández, Jérôme Siméon and Maurice van Keulen. I am also very thankful to the rest of the ADReM team for providing a pleasant and stimulating environment over the past four years. I want to acknowledge Toon Calders, Serge Demeyer, Bart Goethals and Maarten Marx, members of my PhD jury, for their feedback, which helped to improve this thesis. I also want to thank Stefania Marrara, who was in Antwerp for six months and with whom I had the honor to collaborate on parts of the research that resulted in Chapter 3. My family deserves my gratitude as well, for the continuous support and encouragement they gave me and for the pleasant environment that I am privileged to call home. Last but not least, I want to thank the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT) for making it possible for me to do the research that resulted in this thesis. Not only their financial support, but also the frequent reporting and the interesting project interviews helped me a great deal in shaping the direction of this thesis.

Abstract

During the past decade, XML has become ubiquitous for sharing data and publishing information on the Web. The amount of data available in XML format keeps increasing, and its data model differs from those studied in earlier database research. For both reasons, this new standard has drawn the attention of the database community.
Several query and update languages have been proposed, which eventually led to the W3C Recommendation for XQuery 1.0 and to update facilities for this query language. We define a formal framework called LiXQuery to investigate properties of this language and its constructs. More precisely, we compare the relative expressive power of some of the most important features of XQuery in terms of the transformations they can express. Data integration and data sharing are problems that can benefit from XML research. XML views can be defined to transform data that several parties want to share into a common schema. It is desirable that such virtual XML trees behave as much as possible like ordinary XML documents. We introduce the concept of projection views based on XPath expressions and show how XPath queries on the view documents can be translated into XPath queries on the base document. Moreover, we introduce three simple XPath-based update operations, i.e., insertion, attribute update and deletion. We show that it is decidable whether propagating these primitive update operations to the base tree introduces view side-effects.

Contents

1 Introduction
1.1 XML
1.2 Querying XML Documents
1.3 Updating XML Documents
1.4 Updating XML Views
1.4.1 XML Views
1.4.2 Views as Access Control Mechanism
1.4.3 Updating Views
1.4.4 Existing Approaches in RDBMS and OODBMS
1.4.5 Existing Work on XML View Updates
1.5 Contributions and Organization
2 A Formal Model for XQuery
2.1 Syntax and Informal Semantics
2.1.1 Non-Updating Expressions
2.1.2 Updating Expressions
2.2 Some Examples and Syntactic Sugar
2.3 Formal Framework
2.3.1 XML Store
2.3.2 Evaluation Environment
2.3.3 Auxiliary Notions
2.3.4 List of Pending Updates
2.3.5 Program Semantics
2.4 Expression Semantics
2.5 Conclusion
3 Expressing Transformations in XQuery
3.1 LiXQuery Fragments
3.2 Expressiveness Relationships between Fragments
3.3 Properties of the Fragments
3.3.1 Reachable Substores from Input/Output
3.3.2 Set Equivalence and Bag Equivalence
3.3.3 Relationships between Input and Output Values and Length
3.3.4 Depth of Transformations
3.3.5 Termination of Programs
3.4 Expressibility Results
3.4.1 Expression Simulations
3.4.2 Program Simulations
3.5 Proving the Relationships between the Fragments
3.6 Conclusion
4 Views and Updates with XPath Transformations
4.1 Introduction
4.2 Preliminaries
4.2.1 Data Model
4.2.2 View-Update Problem
4.3 Queries, Updates and Views
4.3.1 Queries
4.3.2 Properties
4.3.3 Updates
4.3.4 Views
4.3.5 Additional Notations
4.4 Simple Propagation Update Strategy
4.4.1 Adding an Escape to Path Expressions
4.4.2 View Composition
4.5 Deciding Well-Behavedness
4.5.1 Configurations
4.5.2 Configuration Trees
4.5.3 Checking Configuration Trees
4.5.4 Checking for Conflicts in Configuration Trees
4.5.5 Complexity of Deciding Well-Behavedness
4.6 Conclusion
5 Discussion
5.1 LiXQuery as a Framework
5.2 Studying the Expressive Power of LiXQuery
5.2.1 Separating Queries and Updates
5.2.2 Other Measures of Expressive Power
5.3 Updating XML Views
5.3.1 Complexity for Deciding in a Positive Fragment of P
5.3.2 Dealing with a Larger Class of Update Strategies
5.4 Updating XML Views with LiXQuery
A Dutch Summary
Publications by the Author
Bibliography

List of Figures

1.1 Example of an XML document for a virtual learning environment
1.2 Example of an XML view for the document of Figure 1.1
2.1 Syntax of LiXQuery
2.2 Simulation of the LISP list ((b c) d)
2.3 XML tree for the Example of Definition 2.1
2.4 Semantics of the Primitive Update Operations
3.1 Equivalence classes of XQuery fragments
4.1 General Setting for the XML View-Update Problem
4.2 Example Document Trees for illustrating update operations
4.3 Rewriting P} to P expressions
4.4 Node-creation independent and attribute-minimal configuration
4.5 A Configuration Tree for the Configuration of Figure 4.4
5.1 XML Document containing Members of an Organization

CHAPTER 1
Introduction

Sharing data has been one of the main reasons why the Web has become so popular over the past decade. The HyperText Markup Language (HTML) and graphical browsers were killer applications that caused a tremendous increase in Internet connections during the nineties. One of the major challenges posed by this evolution is managing all the information that has become available. In this thesis we investigate one of the many facets of integrating and managing the loosely structured data on the Web.

This introduction highlights the topics that are studied in this thesis. First, in Section 1.1 we briefly discuss the use of the eXtensible Markup Language (XML) to represent data on the Web. Then, we discuss querying and updating XML documents in Sections 1.2 and 1.3, respectively. In Section 1.4 we introduce the notion of XML views and point out some of the problems they raise with respect to performing updates.
