UNSW >013758187

UNSW >013758187 PLEASE TYPE THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet Surname or Family name: Lam First name: Franky Shung Lai Qtlier name/s: Abbreviation for degree as given in the University calendar: PhD School: CSE Faculty: Engineering Title: Optimization Techniques for XML Databases Abstract 350 words maximum: (PLEASE TYPE) In this thesis, we address several fundamental concerns of maintaining and querying huge ordered label trees. We focus on practical implementation issues of storing, updating and query optimisation of XIVIL database management system. Specifically, we address the XML order maintenance problem, efficient evaluation of structural join, intrinsic skew handling of join, succinct storage of XML data and update synchronization of mobile XML data. Declaration relating to disposition of project thesis/dissertation I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968.1 retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstragt^lnternational (this is applicable to doctoral theses only). A / •/• Sighature ^itn^ Date The University recognises that there may be exceptional circumstances requiring^estrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for^longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research. FOR OFFICE USE ONLY Date of completion of requirements for Award: COPYRIGHT STATEMENT 'I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.' C Signed 2 7/3 Date AUTHENTICITY STATEMENT 'I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.' Signed Date THE UNIVERSITY OF NEW SOUTH WALES SCHOOL OF COMPUTER SCIENCE & ENGINEERING OPTIMIZATION TECHNIQUES FOR XML DATABASES Pranky Shung Lai LAM (2288414) PhD in Computer Science and Engineering Supervisor: Dr. Raymond K. Wong 5 5 ? A C li^f 11 AUG 20G- Originality Statement 1 isBRARY I hereby declare that this submission is own work and to the best of my knowledge it contains no materials previously pubhshed or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extend that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged. TT li/ c 7 Abstract In this thesis, we address several fundamental concerns of maintaining and querying huge ordered label trees. We focus on practical implementation issues of storing, updating and query optimization of XML database management system. Specifically, we address the XML order maintenance problem, efficient evaluation of structural join, intrinsic skew handling of join, succinct storage of XML data and update synchronization of mobile XML data. Acknowledgments Gratitude is not only the greatest of virtues, but the parent of all others. — Cicero (106-43 BC) First of all, I would like to express my utmost respect and deepest gratitude toward Dr. Raymond Wong, my supervisor, for his persistent guidance, excellent advice and the inspirations I received that have made this thesis possible. Furthermore, the aspects of having an entrepreneurial mindset, working as part of a research team and as part of the research community that I learned from him are invaluable. I am extremely grateful and appreciative of the co-authors of all my publications, especially Damien Fisher and Wilham Shui, for their numerous constructive dis- cussions and collaborations during these years. Their positive contributions to this thesis are immeasurable. They contributed on the the Order Maintenance chapter. Efficient Structural Join chapter and the Maintaining Succinct XML Data Chap- ter. William's critical contribution included implementation of the related works in the experiment sections; whilst Damien's most important input included significant correction of lingustic expressions throughout the above three chapters, as well as the probablistic formulas on the Order Maintenance chapter. I would also like to pay my sincere gratitude to the reviewers that have spent their precious time to review this thesis and their insightful comments that I received. Last, but definitely not least, I would love to thank my family, Jenny, Sebastian and Chantel, for their unlimited support, patience, understanding and encouragement. n Related Publications 1. Raymond K. Wong, Pranky Lam, William M. Shui. Querying and Main- taining a Compact XML Storage. In Proceedings of International World Wide Web Conference (WWW), Banff, Alberta, Canada, May 08-May 12, 2007. (Acceptance rate: 14%) 2. Damien K. Fisher, Franky Lam, William M. Shui, Raymond K. Wong. Dy- namic Labeling Schemes for Ordered XML Based on Type Information. In Proceedings of l?^'^ Australasian Database Conference (ADC), Hobart, Tas- mania, Australia, Jan 16-Jan 19, 2006. p69~78. 3. WilHam M. Shui, Franky Lam, Damien K. Fisher, Raymond K. Wong. Querying and Maintenance Ordered XML Data Using Relational Databases. In Proceedings of IG^'^ Australasian Database Conference (ADC), Newcastle, Austraha, Jan 31-Feb 03, 2005. p85-94. 4. Wilham M. Shui, Damien K. Fisher, Franky Lam, Raymond K. Wong. Ef- fective Clustering Schemes for XML Databases. In Proceedings of Inter- national Conference of Database and Expert Systems Apphcations (DEXA), Zaragoza, Spain, Aug 30~Sep 03, 2004. p569-579. 5. Damien K. Fisher, Pranky Lam, Raymond K. Wong. Algebraic Transforma- tion and Optimization for XQuery. In Proceedings of the Asian Pacific Web Conference (APWeb), Hangzhou, China, Apr 14-17, 2004. p201-210. 6. Pranky Lam, Wilham M. Shui, Damien K. Fisher, Raymond K. Wong. Skip- ping Strategies for Efficient Structural Joins. In Proceedings of the 9^^ Inter- national Conference on Database Systems for Advanced Applications (DAS- FAA), Jeju Island, Korea, Mar 17-19, 2004. pl96-207. (Acceptance rate: 60/272 = 22%) 7. Michael Barg, Raymond K. Wong, Franky Lam. An Efficient Path Index for Querying Semi-structured Data. In Proceedings of the Asian Pacific Web Conference (APWeb), Xi'an, China, Sep 27-29, 2003. p89-94. (Acceptance rate: 39/136 = 28%) 8. Damien K. Fisher, Franky Lam, Wilham M. Shui, Raymond K. Wong. Effi- cient ordering for XML data. In Proceedings of 12^^ ACM International Con- ference on Information and Knowledge Management (CIKM), New Orleans, Louisiana, USA, Nov 2-8, 2003. p350-357. (Acceptance rate: 59/400 = 15%) 9. Franky Lam, Nicole Lam, Raymond K. Wong. Efficient synchronization for Mobile XML data. In Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), McLean, Virginia, USA, Nov 4-9, 2002. pl53-160 (Acceptance rate: 74/300 = 25%) 10. Franky Lam, Nicole Lam, Raymond K. Wong. Performance Evaluation of XSync: An Efficient Synchronizer for Mobile XML Data. In Proceedings of The IEEE International Conference on Communications Systems (ICCS), Singapore, Nov 25-28, 2002. pl08 (2P-02-07). 11. Franky Lam, Nicole Lam, Raymond K. Wong. Efficient Update Propagations for Semistructured Data in Mobile Environment. In Proceedings of Inter- national Conference on Information Technology and Apphcations (ICITA), Bathurst, NSW, Australia, Nov 25-28 2002. p98-l. 12. Franky Lam, Raymond K. Wong, Mehmet A. Orgun. Modeling and Manip- ulating Multidimensional Data in Semistructured Databases. In Proceedings of the International Conference on Database Systems for Advanced Appli- cations (DASFAA), Hong Kong, China, Apr 18-20, 2001. pl4-21. 13. Raymond K. Wong, Franky Lam, Mehmet A. Orgun. Modeling and Manipu- lating Multidimensional Data in Semistructured Databases. World Wide Web 4(1^2): 79-99 (2001) 14. Raymond K. Wong, Franky Lam, Stephen Graham, Wilham Shui. An XML Repository for Molecular Sequence Data. In Proceedings of IEEE Inter- national Symposium on Bioinformatics and Biomedical Engineering (BIBE), Arlington, Virginia, USA, November 8 10, 2000. IEEE CS. p35 -42. Related Technical Reports 1. Franky Lam, Raymond K. Wong. Rotated Library Sort. Technical Report. UNSW-CSE-TR-0506. University of New South Wales. Mar 2005 2. Franky Lam, Wilham M. Shui, Damien K. Fisher, Raymond K. Wong. Querying and Maintaining Succinct XML Data. Technical Report. UNSW- CSE-TR-0424.

UNSW >013758187

Space-Efficient Data Structures for String Searching and Retrieval

Succinct Permutation Graphs and Graph Isomorphism [5]

Advanced Rank/Select Data Structures: Succinctness, Bounds, and Applications

MTAT.03.238 Succinct Trees

Hypersuccinct Trees – New Universal Tree Source Codes for Optimal Compressed Tree Data Structures and Range Minima

Statistical Encoding of Succinct Data Structures

Lower Bound for Succinct Range Minimum Query

A Review on Succinct Dynamic Data Structure

Arxiv:0812.2775V3

Space-Efficient Data Structures for Collectionsof Textual Data

Storage and Retrieval of Individual Genomes

Succinct Data Structures and Big Data