UNSW

THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname or Family name: Lam

First name: Franky Shung Lai Other name/s:

Abbreviation for degree as given in the University calendar: PhD

School: CSE Faculty: Engineering

Title: Optimization Techniques for XML Databases

Abstract 350 words maximum: (PLEASE TYPE)

In this thesis, we address several fundamental concerns of maintaining and querying huge ordered, labelled trees. We focus on practical implementation issues of storing, updating and query optimisation of an XML database management system. Specifically, we address the XML order maintenance problem, efficient evaluation of structural joins, intrinsic skew handling of joins, succinct storage of XML data and update synchronisation of mobile XML data.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature                                        Date

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

COPYRIGHT STATEMENT

'I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed

Date

AUTHENTICITY STATEMENT

'I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.'

Signed

Date

THE UNIVERSITY OF NEW SOUTH WALES

SCHOOL OF COMPUTER SCIENCE & ENGINEERING

OPTIMIZATION TECHNIQUES FOR XML DATABASES

Franky Shung Lai LAM (2288414)

PhD in Computer Science and Engineering

Supervisor: Dr. Raymond K. Wong


Originality Statement

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Abstract

In this thesis, we address several fundamental concerns of maintaining and querying huge ordered, labelled trees. We focus on practical implementation issues of storing, updating and query optimization of an XML database management system. Specifically, we address the XML order maintenance problem, efficient evaluation of structural joins, intrinsic skew handling of joins, succinct storage of XML data and update synchronization of mobile XML data.

Acknowledgments

Gratitude is not only the greatest of virtues, but the parent of all others. — Cicero (106-43 BC)

First of all, I would like to express my utmost respect and deepest gratitude to Dr. Raymond Wong, my supervisor, for his persistent guidance, excellent advice and the inspiration that made this thesis possible. Furthermore, what I learned from him about having an entrepreneurial mindset and working as part of a research team and of the research community is invaluable.

I am extremely grateful and appreciative of the co-authors of all my publications, especially Damien Fisher and William Shui, for their numerous constructive discussions and collaborations over these years. Their positive contributions to this thesis are immeasurable. They contributed to the Order Maintenance chapter, the Efficient Structural Joins chapter and the Maintaining Succinct XML Data chapter. William's critical contribution included the implementation of the related works in the experiment sections, whilst Damien's most important input included significant correction of linguistic expressions throughout those three chapters, as well as the probabilistic formulas in the Order Maintenance chapter.

I would also like to express my sincere gratitude to the reviewers who spent their precious time reviewing this thesis, and for the insightful comments that I received.

Last, but definitely not least, I would love to thank my family, Jenny, Sebastian and Chantel, for their unlimited support, patience, understanding and encouragement.

Related Publications

1. Raymond K. Wong, Franky Lam, William M. Shui. Querying and Maintaining a Compact XML Storage. In Proceedings of the International World Wide Web Conference (WWW), Banff, Alberta, Canada, May 8-12, 2007. (Acceptance rate: 14%)

2. Damien K. Fisher, Franky Lam, William M. Shui, Raymond K. Wong. Dynamic Labeling Schemes for Ordered XML Based on Type Information. In Proceedings of the 17th Australasian Database Conference (ADC), Hobart, Tasmania, Australia, Jan 16-19, 2006. p69-78.

3. William M. Shui, Franky Lam, Damien K. Fisher, Raymond K. Wong. Querying and Maintaining Ordered XML Data Using Relational Databases. In Proceedings of the 16th Australasian Database Conference (ADC), Newcastle, Australia, Jan 31-Feb 3, 2005. p85-94.

4. William M. Shui, Damien K. Fisher, Franky Lam, Raymond K. Wong. Effective Clustering Schemes for XML Databases. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA), Zaragoza, Spain, Aug 30-Sep 3, 2004. p569-579.

5. Damien K. Fisher, Franky Lam, Raymond K. Wong. Algebraic Transformation and Optimization for XQuery. In Proceedings of the Asian Pacific Web Conference (APWeb), Hangzhou, China, Apr 14-17, 2004. p201-210.

6. Franky Lam, William M. Shui, Damien K. Fisher, Raymond K. Wong. Skipping Strategies for Efficient Structural Joins. In Proceedings of the 9th International Conference on Database Systems for Advanced Applications (DASFAA), Jeju Island, Korea, Mar 17-19, 2004. p196-207. (Acceptance rate: 60/272 = 22%)

7. Michael Barg, Raymond K. Wong, Franky Lam. An Efficient Path Index for Querying Semi-structured Data. In Proceedings of the Asian Pacific Web Conference (APWeb), Xi'an, China, Sep 27-29, 2003. p89-94. (Acceptance rate: 39/136 = 28%)

8. Damien K. Fisher, Franky Lam, William M. Shui, Raymond K. Wong. Efficient Ordering for XML Data. In Proceedings of the 12th ACM International Conference on Information and Knowledge Management (CIKM), New Orleans, Louisiana, USA, Nov 2-8, 2003. p350-357. (Acceptance rate: 59/400 = 15%)

9. Franky Lam, Nicole Lam, Raymond K. Wong. Efficient Synchronization for Mobile XML Data. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), McLean, Virginia, USA, Nov 4-9, 2002. p153-160. (Acceptance rate: 74/300 = 25%)

10. Franky Lam, Nicole Lam, Raymond K. Wong. Performance Evaluation of XSync: An Efficient Synchronizer for Mobile XML Data. In Proceedings of the IEEE International Conference on Communications Systems (ICCS), Singapore, Nov 25-28, 2002. p108 (2P-02-07).

11. Franky Lam, Nicole Lam, Raymond K. Wong. Efficient Update Propagations for Semistructured Data in Mobile Environment. In Proceedings of the International Conference on Information Technology and Applications (ICITA), Bathurst, NSW, Australia, Nov 25-28, 2002. p98-1.

12. Franky Lam, Raymond K. Wong, Mehmet A. Orgun. Modeling and Manipulating Multidimensional Data in Semistructured Databases. In Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), Hong Kong, China, Apr 18-20, 2001. p14-21.

13. Raymond K. Wong, Franky Lam, Mehmet A. Orgun. Modeling and Manipulating Multidimensional Data in Semistructured Databases. World Wide Web 4(1-2): 79-99 (2001).

14. Raymond K. Wong, Franky Lam, Stephen Graham, William Shui. An XML Repository for Molecular Sequence Data. In Proceedings of the IEEE International Symposium on Bioinformatics and Biomedical Engineering (BIBE), Arlington, Virginia, USA, November 8-10, 2000. IEEE CS. p35-42.

Related Technical Reports

1. Franky Lam, Raymond K. Wong. Rotated Library Sort. Technical Report UNSW-CSE-TR-0506. University of New South Wales. Mar 2005.

2. Franky Lam, William M. Shui, Damien K. Fisher, Raymond K. Wong. Querying and Maintaining Succinct XML Data. Technical Report UNSW-CSE-TR-0424. University of New South Wales. Jul 2004.

3. Franky Lam, William M. Shui, Damien K. Fisher, Raymond K. Wong. Skipping Strategies for Efficient Structural Joins. Technical Report UNSW-CSE-TR-0320. University of New South Wales. Jun 2003.

4. Damien K. Fisher, Franky Lam, William M. Shui, Raymond K. Wong. Fast Ordering for Changing XML Data. Technical Report UNSW-CSE-TR-0317. University of New South Wales. Jun 2003.

5. Damien K. Fisher, Franky Lam, William M. Shui, Raymond K. Wong. Efficient Ordering for XML Data. Technical Report UNSW-CSE-TR-0316. University of New South Wales. Jun 2003.

6. Damien K. Fisher, William M. Shui, Franky Lam, Raymond K. Wong. On Clustering Schemes for XML Databases. Technical Report UNSW-CSE-TR-0315. University of New South Wales. Jun 2003.

7. Franky Lam, Nicole Lam, Raymond K. Wong. Update Synchronization for Mobile XML Data. Technical Report UNSW-CSE-TR-0310. University of New South Wales. Jun 2003.

Contents

1 Introduction

2 Order Maintenance
2.1 Introduction
2.2 Related Work
2.2.1 The Order Maintenance Problem
2.2.2 Ancestor-Descendant Relationships
2.3 Formal Definitions
2.3.1 Data Model
2.3.2 Naive Sorting Algorithms
2.4 Naive Approach
2.4.1 Basic Idea
2.4.2 Comparing Document Order Between Two Nodes
2.4.3 Refactoring
2.5 Bender's Algorithm
2.6 Randomized Algorithm
2.7 Performance Evaluation
2.7.1 Bulk Insertion and Random Insertion
2.7.2 Uniform Query Distribution
2.7.3 Non-Uniform Query Distribution
2.7.4 Adversary Insertion Sequence
2.8 Applications
2.8.1 Ancestor-Descendant Relationships
2.8.2 Query Optimization
2.9 Conclusions

3 Efficient Structural Joins
3.1 Introduction
3.2 Related Work
3.2.1 Structural Joins
3.2.2 Numbering Schemes
3.3 Skip Joins
3.3.1 Skip-Join for Ancestor-Descendant Join
3.3.2 Skip-Join for Ancestor Structural Join
3.3.3 Skip-Join for Descendant Structural Join
3.3.4 Skipping Strategies
3.3.5 Skipping for Streaming Data
3.4 Experimental Results
3.4.1 Experimental Setup
3.4.2 Results and Observations
3.4.3 Summary
3.5 Conclusions

4 Intrinsic Skew Handling in Sort-Merge Join
4.1 Introduction
4.2 Formal Definitions
4.2.1 Intrinsic Skew
4.3 Sort-Merge Joins
4.3.1 Traditional Block-based Sort-Merge Join
4.3.2 Sort-Merge Join with Combined Skew Handling
4.4 Improvements
4.4.1 Localized Cartesian Product
4.4.2 Rocking-Scan Within Value Packets
4.4.3 Shifting Buffer Offset
4.4.4 Heuristic for Significant Skew
4.5 Skipping Join Candidates
4.5.1 Current Commercial Database System Approach
4.5.2 Exponential-Then-Binary Skipping
4.5.3 Check Last Tuple Before Reading Next Block
4.5.4 Aggressive and Conservative Strategy of Skipping Blocks
4.5.5 Avoid Disk Penalty on Aggressive Skipping Strategy
4.6 Merge Phase With Multiple Runs
4.7 Performance Evaluation
4.7.1 Combined Skew
4.7.2 No Skew
4.8 Conclusions

5 Maintaining Succinct XML Data
5.1 Introduction
5.2 Related Work
5.3 Data Storage
5.3.1 Representation of Topology
5.3.2 Representation of Elements and Attributes
5.3.3 Representation of Text Data
5.3.4 Navigational Operations
5.4 Handling Updates
5.4.1 Empty Space and Density Thresholds
5.4.2 Space and Time Cost
5.5 Optimizations
5.5.1 Auxiliary Structures
5.5.2 Using Auxiliary Structures
5.5.3 Space Cost
5.5.4 Theoretically Fast Navigation
5.5.5 Persistent Identifiers and Indexes
5.5.6 Querying and Indexing the Database
5.6 Performance Evaluation
5.6.1 Physical Storage Size of Data
5.6.2 Update Performance
5.6.3 Node Navigation
5.6.4 Path Evaluation
5.7 Conclusions
5.8 Acknowledgments

6 Synchronization for Mobile XML Data
6.1 Introduction
6.2 Background
6.2.1 An XML-based Mobile Database Architecture
6.2.2 XPath as an Access Language
6.3 Related Work
6.3.1 The XPath Query Containment Problem
6.3.2 The XPath Filtering Problem
6.4 Overview of Solution
6.4.1 Transactions from Other Computers
6.5 Data Structure and Algorithms
6.5.1 Merging Simple Path Expressions
6.5.2 Handling Wildcard/Descendant Operators
6.5.3 Synchronization Engine
6.6 Enhancements of Synchronization
6.6.1 Update Merging
6.7 Performance Evaluation
6.7.1 Settings
6.7.2 Experimental Results
6.8 Conclusions

7 Conclusions

Chapter 1

Introduction

A journey of a thousand miles begins with a single step. — Lao Tzu (c. 600 BC)

XML [15], which stands for eXtensible Markup Language, provides a natural way to represent hierarchical information, and has emerged as a standard for information representation and exchange on the Internet. However, finding efficient methods to manage and query large XML documents remains problematic and poses many interesting challenges to the database research community.

Relational database management systems (RDBMSs) require information to be normalized into tabular form with fixed data types. Only a fraction of real-world information fits these criteria. Recently, the popularity of the Internet has created a surging need to manage hierarchical and semistructured data that goes beyond what RDBMSs can currently offer.

The core difference that sets XML apart from relational data is that XML is ordered: two subtrees with identical data are distinguishable by their relative order, whereas in the relational model, tuples with identical values are considered equivalent. This ordering property is essential, as any storage scheme must maintain the relative order between nodes, either implicitly or explicitly, in order to reconstruct the original tree. We define a mapping reduction of the ordering problem to a well-known, elegant theoretical problem; this mapping allows efficient maintenance of ordering information for any tree-shredding storage scheme that keeps explicit order.

We also offer a randomized algorithm that shows good practical performance. The identical problem can be extended to solve the ancestor query problem that structural join relies upon.

The ancestor query problem is to efficiently determine whether one tree node is a proper ancestor of another. The structural join problem is a fundamental query operation that answers the ancestor query problem between a list of potential ancestors and a list of potential descendants. We offer an elegant solution to the structural join problem that works implicitly, without using any extra indices. Our solution also directly applies to, and improves, the relational sort-merge join operation, especially when intrinsic skew occurs between the two relations. Furthermore, we utilize the data skew itself to skip candidates from the sorted relations during the merge phase, which enables us to break the minimum O(|L| + |R|) read barrier on the merge phase.

As the size of an XML database grows, the amount of space used for storing data and auxiliary supporting data structures becomes a major factor in query and update performance. We present a new secondary storage scheme for XML data that supports all navigational operations and answers ancestor-descendant queries in near constant time. In addition to supporting efficient queries, the space requirement of our scheme is within a constant factor of the information-theoretic minimum. Insertions and deletions can also be performed in near constant time. As a result, the proposed structure features a small memory footprint that increases cache locality, whilst still supporting standard APIs, such as DOM, efficiently. As an example of the scheme's power, we further demonstrate that the structure can support efficient structural and twig joins. Both formal analysis and experimental evidence demonstrate that the proposed structure is space and time efficient.

Nowadays, many hand-held applications receive data from a primary database server and operate in an intermittently connected environment. They maintain data consistency with data sources through synchronization. In certain applications, such as sales force automation, it is highly desirable for updates on the data source to be reflected on hand-held applications immediately. We propose an efficient method to synchronize XML data on multiple mobile devices. Each device retrieves and caches a local copy of data from the database source based on a regular path expression. These local copies may overlap or be disjoint with each other. An efficient mechanism is proposed to find all the disjoint copies in order to avoid unnecessary synchronizations. Each update to the data source results in the identification of all hand-held applications affected by the update. Communication costs can be further reduced by eliminating the forwarding of unnecessary operations to groups of mobile clients.

This thesis addresses several essential and fundamental problems of querying and maintaining XML data: maintenance of document ordering, efficient evaluation of structural joins, intrinsic skew handling in sort-merge join, succinct storage and update synchronization.

• We first address the issue of maintaining document order for dynamic XML databases in Chapter 2. Several query languages that return their results in document order, such as XQuery, have been proposed; however, most recent efforts focused on query optimization have disregarded order. We present a theoretically optimal approach and a simple yet elegant randomized method to maintain document ordering for XML data in practice. Analysis of our method shows that it is indeed efficient and scalable, even for changing data.

• We answer the ancestor query problem by extending order-labeling information with a constant-time operator that determines the ancestor-descendant relationship between two nodes. In Chapter 3, we propose different structural join operations and the possible optimizations applicable to those different scenarios, without using any external, pre-built index structures.

• In Chapter 4, we further extend our work on structural join to sort-merge join operations in relational database systems. When structural join is performed on relational systems, high intrinsic skew is common. We investigate the performance penalty of sort-merge join operations under intrinsic skew and propose several improvements to sort-merge join. These improvements are not specific to structural join, and thus are beneficial to sort-merge join in general.

• In Chapter 5, the major work of this thesis, we develop techniques to minimize the space overhead due to the verbosity of XML. We propose a structure that stores XML within an asymptotically optimal space bound, while keeping theoretically fast ordering determination, path navigation and updates even under all types of adverse conditions. Besides the efficient theoretical bounds, we show through experiments that our structure also works well in practice. This efficiency is possible because the entire topological information can be held in main memory, or even within the cache, as it is orders of magnitude smaller than other existing storage schemes.

• The final chapter, Chapter 6, proposes a mechanism to minimize communication costs, while maintaining data consistency, across multiple devices, each holding a partial local copy of a primary XML database server.

Chapter 2

Order Maintenance

Order is the shape upon which beauty depends. — Pearl Buck (1892-1973)

2.1 Introduction

XML has emerged as the standard for representing and exchanging both structured and semi-structured information over the Internet. XML data are encapsulated within a hierarchical classification that is embedded in the format itself, which makes XML self-describing and capable of representing semi-structured data [3]. However, querying XML differs from querying semi-structured data [27, 39]; the most notable distinction is order dependence [42]. The standard XML query languages, such as XPath 2.0 [85] and XQuery [86], require query results to be sorted in document order by default. Researchers [27, 39] addressed the ordering issue at the data model and query language levels when adapting their work from semi-structured data to XML. However, recent work, from optimizing XML queries [67] to publishing data in XML [34], has failed to address the issue of efficiently maintaining results in document order.

[Figure 2.1: A small XML database]

Two possible approaches can preserve query results in document order. The first approach requires all query operators in each step of query processing to preserve document order. The second approach requires an efficient sort operator that sorts a set of nodes back into document order. Certain query operators preserve document order by their nature, but most indexing methods, such as hash tables and B-trees, cannot maintain document order. Using the former approach alone greatly limits the number of usable indices; with an efficient sort operator, previous research results for unordered, semistructured data query optimization, such as [67], can be applied directly. In fact, a query optimizer can and should mix both approaches to generate more query plans.

Consider the XML database shown in Figure 2.1 as an example, where the label of each internal node represents its element name and each leaf node holds textual data, both prefixed with a unique identifier oid. Imagine a user wants to find all hard disks manufactured by "ABC" that cost less than $200; such a query can be expressed with the following XPath statement:

//Harddisks/Item[Price < 200][Brand = "ABC"]

If we relax the restriction that output must be in document order, we could employ indices and optimization techniques from the Lore semistructured database system [66, 67]. If a Tindex (hash index on string values) and a Vindex (B+-tree index on numeric values) are built on top, two possible optimal query plans might include the following:

1. hash("ABC")/parent::Brand/parent::Item[parent::Harddisks][Price < 200]

2. bptree(<, 200)/parent::Price/parent::Item[parent::Harddisks][Brand = "ABC"]

B+-tree implementations achieve their search time by maintaining pointers to nodes based on the nodes' values; the pointers are sorted by the numerical order of the node values they point to. As a result, the function bptree(<, 200) in the second query plan on Figure 2.1 would return nodes based on their numeric values (i.e., (n28, n19, n31, n29)), instead of their original document order (i.e., (n19, n28, n29, n31)). The full evaluation of the second query plan may yield (n13, n12) instead of (n12, n13). Under standard XML query languages such as XPath 2.0 and XQuery 1.0, which have list semantics, the second query plan would therefore produce an incorrect result.

Using a naive ordering algorithm based on node traversal, the worst-case time complexity of comparing the document order of two nodes of an XML document is O(n), where n is the number of nodes in the document. Further optimization is possible when sorting a set of nodes into document order, but the complexity is still bounded superlinearly or higher in terms of n, rather than in terms of the size of the set. Hence, query plans that utilize indices and perform a final sort may perform worse than queries using top-down traversal, which is unacceptable.
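The traversal-based comparison can be sketched as follows. This is an illustrative reconstruction, not the thesis's implementation: the `Node` class and its `parent`/`children` fields are hypothetical, and the comparison walks root paths, which is why the cost grows with the size of the tree rather than with the number of nodes being compared.

```python
class Node:
    """A hypothetical tree node with a parent pointer and ordered children."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def path_to_root(node):
    """Return the list of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    path.reverse()
    return path

def precedes(x, y):
    """True if x strictly precedes y in document (preorder) order.
    Worst-case O(n): root paths and sibling scans can both be linear."""
    px, py = path_to_root(x), path_to_root(y)
    i = 0
    # Advance past the common prefix of the two root paths.
    while i < len(px) and i < len(py) and px[i] is py[i]:
        i += 1
    if i == len(px):          # x is an ancestor of y (or x == y)
        return i < len(py)    # an ancestor precedes its proper descendants
    if i == len(py):          # y is a proper ancestor of x
        return False
    # Otherwise compare the diverging children under the common ancestor.
    siblings = px[i - 1].children
    return siblings.index(px[i]) < siblings.index(py[i])
```

In the worst case (a degenerate, path-shaped document) `path_to_root` alone touches every node, matching the O(n) bound discussed above.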

If we improve the time complexity of the ordering comparison from linear to constant time, efficient sorting becomes possible at any intermediate step of any query plan, which greatly increases the number of efficient query plans available. An extra benefit of having a deterministic and consistent time complexity for the sort operator is that a cost-based query optimizer can estimate the sorting cost much more accurately. Therefore, we present empirical results for our algorithm in this chapter in order to facilitate such an estimation.

This chapter provides the following primary contributions:

• We define a mapping reduction of the problem of document order testing of two nodes in a dynamic XML document to a well-known theoretical problem called the order maintenance problem.

• We present a randomized algorithm that is very simple to implement; it does not guarantee worst-case performance, but experimental results show that it has excellent practical performance.

• We relate the order maintenance problem to the ancestor query problem, which is the problem of determining the ancestor-descendant relationship between two nodes. Ancestry testing is a primitive operator of structural join.

• We demonstrate that it is possible to augment other implicit or explicit labeling schemes that capture ordering information by applying either our randomized algorithm or the approach of Bender et al. [11], decreasing the expected update cost from linear to amortized polylogarithmic time.

The rest of this chapter is organized as follows:

Section 2.2 summarizes relevant work related to document ordering. Section 2.3 lays out the formal definitions used throughout this chapter and illustrates the naive approach to ordering determination. Section 2.5 describes Bender's algorithm, which has nice theoretical properties. We then present our randomized algorithm, which has good practical performance, in Section 2.6; this performance is backed by the empirical tests in Section 2.7. Section 2.8 discusses how our algorithm applies to XML query optimization. Finally, Section 2.9 concludes this chapter and illustrates how to apply the result to the ancestor query problem to enable efficient structural join.

2.2 Related Work

2.2.1 The Order Maintenance Problem

Definition The order maintenance problem [33] is the problem of maintaining a total order, subject to the following operations:

• INSERT(x, y): Insert record y after record x in the total order.

• DELETE(x): Delete record x from the total order.

• ORDER(x, y): Return true if x precedes y in the total order.
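As a concrete illustration of these three operations, the following sketch maintains the total order with integer tags, so that ORDER is a single comparison. The "relabel everything when no tag is free" policy and all names are our own deliberately naive simplification for exposition; it is not one of the algorithms analyzed later in this chapter.

```python
class OrderedList:
    """Order maintenance via integer tags with naive gap relabeling."""
    GAP = 1 << 16  # initial spacing between consecutive tags

    def __init__(self):
        self.records = []   # records in list order
        self.tag = {}       # record -> integer tag

    def insert(self, x, y):
        """INSERT(x, y): insert record y after record x (x=None -> front)."""
        i = 0 if x is None else self.records.index(x) + 1
        self.records.insert(i, y)
        lo = self.tag[self.records[i - 1]] if i > 0 else 0
        hi = (self.tag[self.records[i + 1]]
              if i + 1 < len(self.records) else lo + 2 * self.GAP)
        if hi - lo < 2:
            # No integer tag fits between the neighbours: relabel the
            # whole list with evenly spaced tags (naive policy).
            for j, r in enumerate(self.records):
                self.tag[r] = (j + 1) * self.GAP
        else:
            self.tag[y] = (lo + hi) // 2  # midpoint between neighbours

    def delete(self, x):
        """DELETE(x): remove record x from the total order."""
        self.records.remove(x)
        del self.tag[x]

    def order(self, x, y):
        """ORDER(x, y): one O(1) integer comparison."""
        return self.tag[x] < self.tag[y]
```

The online list labeling solutions discussed below refine exactly this idea: they relabel only a small neighbourhood on collision, bounding the amortized update cost instead of paying O(n) per relabel.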

It is possible to reduce the document ordering problem on XML to the order maintenance problem: we can store the nodes of an XML document as a linked list according to their order under preorder traversal. For example, to insert a node y, we perform a PREORDER-PREVIOUS access to retrieve its predecessor x and call INSERT(x, y). Therefore, by solving the order maintenance problem, we have essentially solved the document ordering problem on XML. If it is impossible to store the nodes of an XML document in that order, we can use the linked list as an external index through an extra level of indirection, maintaining persistent identifiers in the linked list.

Definition The online list labeling problem [32] is the problem of maintaining a mapping from a dynamic set of n records to the integers in the range from 1 to u, where u is the size of the universe. The file maintenance problem [89] has a similar definition, but with u = O(n).

The online list labeling problem is a special case of the order maintenance problem. It is possible to answer ORDER in O(1) by applying an online list labeling solution, because integer comparison takes only constant time. In a classic paper [31], Dietz and Sleator proved constant worst-case time bounds for the order maintenance problem, building on previous results [33]. They constructed an amortized O(1) update cost by using an online list labeling solution that guarantees an amortized O(log n) update cost. A substantially more elegant formulation of this result was recently obtained by Bender et al. [11]. The maintenance of document order for an XML document corresponds directly to the order maintenance problem, and hence this result gives the best possible theoretical bounds for our problem.

However, the O(1) constant-time algorithm presented in [31] is complicated. Moreover, in database applications, even constant worst-case performance can be unsatisfactory, due to excessive disk I/O. Additionally, there is an upper bound on the size of the database for which the results hold; [11] estimates this upper bound at approximately 430,000 elements for a particular parameter selection. Hence, this algorithm is only an incomplete answer to the question of document ordering in large databases, where the number of nodes can easily run into the millions. However, we do examine in detail the amortized-time algorithms of [11] in Section 2.5.

2.2.2 Ancestor-Descendant Relationships

XML researchers are mainly divided into two separate groups: one group (IR, information retrieval) treats XML data mainly as text documents, whilst the other group (DB, database) treats XML data as a series of discrete data items, similar to tuples in a relational database management system, and related to semistructured data research such as [3]. Query languages of the former group are order-aware and focus on semantics, whilst the latter group focuses mainly on unordered data and its optimization. We focus on fusing the two together, performing optimization whilst remaining order-aware, as ordered data is increasingly gaining importance due to the popularity of the Internet.

We consider ordered labeling schemes, which label each node of the XML document, as the closest related work to the order maintenance problem addressed in the previous section. Tatarinov et al. [82] consider three different labeling schemes: global ordering, local ordering and Dewey ordering. Global ordering assigns a monotonically increasing integer to all nodes based on their document order. Local ordering also assigns an integer i to a node, denoting that the node is the i-th child of its parent node. Dewey ordering assigns labels in a dotted decimal notation similar to the Dewey decimal system [18]: each node stores the concatenation of the local ordering numbers of its ancestors and its own. The global ordering scheme provides constant-time order determination; the local and Dewey ordering schemes have a theoretical run time proportional to the maximum tree depth for order determination, which can be linear in the worst case, but is fast in practice. In all cases, Tatarinov et al. only illustrate the worst-case, naive scenario for relabeling during updates. Our work, however, shows how our techniques can be directly applied to update all three of the above schemes and make them more efficient.
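Under Dewey ordering, for example, document-order comparison is just lexicographic comparison of the label components, and ancestry is a prefix test. The following is a minimal sketch, with labels written as Python tuples of local ordinals (our own rendering of the scheme, not code from [82]):

```python
def dewey_precedes(a, b):
    """Document-order test for two Dewey labels, e.g. (1, 2) vs (1, 2, 1).
    Python tuple comparison is lexicographic, so this runs in time
    proportional to tree depth, not to document size."""
    return a < b

def dewey_is_ancestor(a, b):
    """True if the node labelled `a` is a proper ancestor of the node
    labelled `b`: a's label must be a strict prefix of b's."""
    return len(a) < len(b) and b[: len(a)] == a
```

Note that a parent such as (1, 2) compares as preceding all of its descendants, e.g. (1, 2, 1), matching preorder document order.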

A closely related problem to order maintenance is the ancestor query problem. In essence, solving the ancestor query problem efficiently allows one to answer efficiently whether one node is an ancestor of another. One can derive a solution to the ancestor query problem given a solution to the order maintenance problem. However, the opposite does not always apply, as an ancestor labeling scheme does not need to keep sibling information. There is a large body of related research addressing the ancestor query problem in the literature [4, 23, 26, 49, 52, 56, 58, 60, 93, 96], but most of it does not address the order maintenance problem, let alone focus on update performance.

[58] solved the ancestor query problem by viewing an XML document as a complete k-ary tree, where k is the maximum number of children over all nodes of the tree. For nodes with fewer than k children, virtual child nodes are used to complete the tree. Every node, including virtual nodes, is associated with a label assigned according to its level-order traversal position. With this labeling, ancestor queries can be answered. Two immediate shortcomings can be observed. The first is that most data does not fit nicely into a complete k-ary tree: a huge portion of the tree is made up of virtual nodes, which means most of the space within the label is wasted when the label could be shorter. In the worst case, an n-node tree of depth 2 in which one node has n/2 children would require Θ(n²) virtual nodes, so the label needs lg n² space instead of just lg n. The second shortcoming is more severe, as k changes under insertion or deletion of nodes: all labels must be discarded and recalculated from scratch, thus such an approach only works for static or small documents [60].

Several related works propose using the position and depth of a tree node to index each node. For example, Zhang et al [97] associate each internal node with an integer pair: the beginning position and end position, which are analogous to the first and last visits of the node under pre-order traversal. [60] proposes a similar technique based on the containment properties, using extended preorder position and tree depth.

Instead of using integers to hold the beginning and end positions, the TIMBER database [49] (and similarly, the approach of [9]) holds the positions using real numbers. However, this approach offers no advantage over using integers if both approaches use log n bits per position: they offer the same gap size and thus the same bounds, so frequent insertions within a close locality will still exhibit poor performance.

In [56], a labeling scheme is used, such that the label of a node's ancestor is a prefix of the node's label. The idea is similar to the Dewey encoding [18] that can be used to check parent-child relationships easily. Using this method requires variable space to store identifiers. The time required to determine the ancestor-descendant relationship is no longer constant, but linear in the length of the identifier — this makes it difficult to guarantee that such an index will be practically useful on large databases.

The performance and results of these approaches based on labeling schemes are consistent with the theoretical properties of labeling dynamic XML trees presented by [23], which proved that any general tree labeling scheme which answers the ancestor-descendant question must, in the worst case, have identifiers linear in the size of the database. This theoretical limitation is not relevant to this chapter, because it assumes that once a label is assigned it is never changed, whereas our work may change the value of a label frequently.

Deschler et al [26] have recently devised a modified Dewey ordering scheme, which uses strings instead of numbers as values. The use of strings allows one to insert new nodes anywhere in the database, without having to relabel any other nodes. Unfortunately, this scheme suffers from a lower bound on the label length which is linear in the number of nodes in the database. This bound is impossible to circumvent in schemes which do not relabel other nodes, as shown by Cohen et al [23]. As our work does allow nodes to be relabeled, it is not affected by Cohen's result.

A novel work, by Wu et al [93], utilizes properties of prime numbers to provide an efficient ordered labeling scheme. In particular, they use the Chinese Remainder Theorem to find a mapping between the prime number labels of the nodes in the database, and their relative ordering. While the scheme is interesting, and only relabels nodes infrequently, the identifiers used are substantially larger in practice than for region algebra schemes. Also, their scheme involves an indirection through a potentially large array in order to answer queries, which is an expensive bottleneck for large databases. The most recent and promising work on XML order maintenance is by Silberstein et al [80], who proposed a data structure to handle ordered XML which guarantees both update and lookup costs.

Other work has been done in addressing or utilizing order information from schema or type information. [61] proposed a technique to specify and optimize queries on ordered semistructured data using automata. It uses automata to represent the queries and optimizes them using query typing and automata unnesting. On the other hand, in response to the ordering issues addressed in [27, 39], [42] extended dataguides [41] and proximity search [40] to take order into consideration.

2.3 Formal Definitions

2.3.1 Data Model

We consider an XML document as a rooted, ordered, ranked, labeled, dynamic tree data structure T with n nodes, where each internal node u has a label drawn from the alphabet Σ, with |Σ| ≤ n, and each leaf node v has an integer label drawn from an unbounded universe. Tree T may be of arbitrary degree and arbitrary shape. Siblings are ranked by left-to-right order.

We denote the document ordering by the operator <, which is the total ordering defined by the preorder traversal of the document [86].

As the document ordering between attribute nodes of an element is implementation defined, for our purposes we simply choose an arbitrary ordering amongst the attributes in our ordered tree representation. Figure 2.1 gives the tree representation of a sample XML document. As some examples of document ordering, in this figure we have n2 < n5, n28 < n22, and n10 < n6.

Throughout this chapter, we impose a specific physical data model on our XML database, which gives a set of navigation primitives which take constant time to run. We have carefully chosen this set of primitives so that any reasonable native XML database would likely need to implement these primitives in constant time. The primitives needed are summarized in Table 2.1. Of these,

PREORDER-PREVIOUS and PREORDER-NEXT can easily be implemented in terms of the others, although in the worst case in time linear in the depth of the database. In practice, however, the depth of an XML database is extremely small, and we can assume that these primitives will essentially run in constant time. In our implementation, we do not maintain these primitives explicitly, instead relying on the observed properties of real XML documents.

We assign to each node a unique identifier, the object identifier, or oid, which is typically represented by an integer of word size (32 bits on many modern machines). We stress that an ordering on the object identifiers of two nodes x and y does not necessarily correspond to the document ordering on x and y.

Accessor                   Description
PARENT(x)                  Parent of x
NEXT-SIBLING(x)            Next sibling of x
PREVIOUS-SIBLING(x)        Previous sibling of x
FIRST-CHILD(x)             First child of x
PREORDER-PREVIOUS(x)       Node before x in document order
PREORDER-NEXT(x)           Node after x in document order

Table 2.1: Constant time navigational primitives
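As an illustration of how the derived primitives can be built from the stored ones, the following is a minimal sketch (not the thesis's implementation; the Node class and its field names are our own assumptions). The first four primitives are plain pointer accesses, while PREORDER-NEXT and PREORDER-PREVIOUS walk at most one root-to-leaf path, matching the depth-linear worst case noted above.

```python
# A sketch of the navigational primitives of Table 2.1, assuming each node
# stores parent/sibling/child pointers so the stored primitives run in
# constant time.  preorder_next/preorder_previous are derived from them.

class Node:
    def __init__(self, oid):
        self.oid = oid
        self.parent = None
        self.prev_sibling = None
        self.next_sibling = None
        self.first_child = None

    def add_child(self, child):
        child.parent = self
        if self.first_child is None:
            self.first_child = child
        else:
            last = self.first_child
            while last.next_sibling:
                last = last.next_sibling
            last.next_sibling = child
            child.prev_sibling = last
        return child

def preorder_next(x):
    # First child if any; otherwise the next sibling of the nearest
    # ancestor (including x itself) that has one.
    if x.first_child:
        return x.first_child
    while x is not None:
        if x.next_sibling:
            return x.next_sibling
        x = x.parent
    return None

def preorder_previous(x):
    # Rightmost descendant of the previous sibling, else the parent.
    if x.prev_sibling is None:
        return x.parent
    y = x.prev_sibling
    while y.first_child:
        y = y.first_child
        while y.next_sibling:
            y = y.next_sibling
    return y
```

Both derived primitives cost time linear in the tree depth in the worst case, which, as noted above, is effectively constant on real XML documents.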

This chapter deals with order in dynamic XML databases. For simplicity, we assume that each insertion or deletion only adds or removes a single leaf node. The insertion or deletion of entire subtrees can be modeled as a sequence of these atomic operations.

2.3.2 Naive Sorting Algorithms

Algorithm 2.1 is the naive algorithm for determining the relative ordering of two nodes in an XML database when there is no extra ordering information. This algorithm has worst case time complexity linear in the number of nodes in the database. When comparing nodes x and y, the algorithm finds nodes a, b, and c, such that a and b are children of c, a is an ancestor of x, and b is an ancestor of y. Then, one can determine whether x < y by determining whether a comes before b in the list of children of c.

Suppose we have a set of nodes S from a database D that we wish to sort into document order. If we use a standard sorting algorithm with the comparison function given by Algorithm 2.1, we have worst case time complexity O(|S||D| log |S|). However, it is possible to generalize Algorithm 2.1 to handle n nodes at once,

Algorithm 2.1 Relative document ordering of two nodes n1 and n2, using no indices.

NAIVE-ORDER-CMP(n1, n2)
 1: if n1 = n2 then
 2:     return n1 = n2
 3: A1 ← [n1, PARENT(n1), PARENT(PARENT(n1)), ..., ROOT]
 4: if n2 ∈ A1 then
 5:     return n2 < n1
 6: A2 ← [n2, PARENT(n2), PARENT(PARENT(n2)), ..., ROOT]
 7: if n1 ∈ A2 then
 8:     return n1 < n2
 9: Find the smallest i such that A1[|A1| − i] ≠ A2[|A2| − i]
10: m1 ← A1[|A1| − i]
11: m2 ← A2[|A2| − i]
12: Determine the ordering between the siblings m1 and m2 by traversing through all their siblings
13: if m1 < m2 then
14:     return n1 < n2
15: else
16:     return n2 < n1

in which case the complexity drops to O(|S||D|). The reason for this drop in complexity is that examining the common ancestors of all nodes in S simultaneously saves operations.
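The comparison of Algorithm 2.1 can be sketched as follows (a minimal sketch under our own assumptions: each node keeps a parent pointer and an ordered child list; the function returns True iff the first argument precedes the second in document order).

```python
# A sketch of NAIVE-ORDER-CMP (Algorithm 2.1).  Worst case is linear in
# the size of the database, since ancestor paths can be that long.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def ancestors(n):
    """Path [n, PARENT(n), ..., ROOT]."""
    path = []
    while n is not None:
        path.append(n)
        n = n.parent
    return path

def naive_order_cmp(n1, n2):
    if n1 is n2:
        return False
    a1 = ancestors(n1)
    if n2 in a1:          # n2 is a proper ancestor of n1, so n2 comes first
        return False
    a2 = ancestors(n2)
    if n1 in a2:          # n1 is a proper ancestor of n2, so n1 comes first
        return True
    # Scan down from the root until the paths diverge; m1 and m2 are the
    # children of the deepest common ancestor c lying above n1 and n2.
    i = 1
    while a1[-i] is a2[-i]:
        i += 1
    m1, m2 = a1[-i], a2[-i]
    # Order the siblings m1 and m2 by traversing c's child list.
    for sibling in m1.parent.children:
        if sibling is m1:
            return True
        if sibling is m2:
            return False
```

Sorting a set S with this comparator gives exactly the O(|S||D| log |S|) behaviour discussed above, since each comparison may walk two root paths.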

2.4 Naive Approach

In this section, we define a naive strategy for handling document order. The basic idea of this approach is to label the nodes in a preorder traversal. While this is trivial on a static database, it is not immediately obvious how to extend this algorithm to handle changing data, particularly data that changes frequently. We will first describe the basic idea, and then present refinements which allow the average case to execute more quickly.

2.4.1 Basic Idea

We associate with each node a numeric identifier (the document ordering identifier, or docid). In practice, we make the size of the docid equal to the word size of the machine, although the amount of storage needed depends on both how many nodes are in the database and the quality of the document ordering index algorithm (better algorithms should handle more nodes with less storage). Given a node n, we define a function DOCID which returns its docid. For simplicity, we will ignore any disk reads necessary to fetch the docid for a given node. If this information is stored directly in the record for each node, then this assumption makes sense, as it will be loaded into memory whenever the corresponding node is.

Let us first consider the simple case of a static database D. In this case, the document ordering index is initialized by performing a preorder traversal of the database, and assigning successive docids to successive nodes. Then, to compare two nodes x and y, we merely need to compare the relative order of their docids.

This method can be easily extended to the case of a database in which all nodes being inserted are inserted at the end of the database (in document order). We assign to each new node the next docid after the docid of the last node in the database. When a node is deleted from the database, we do nothing (this results in gaps being left between docids).

However, this approach breaks down when nodes can be inserted anywhere in the database. Suppose we insert a new node n between two sorted nodes x and y. If there is a gap between the docids of x and y (due to a previously deleted node), we can reuse that docid for n. If there is no gap, then one approach (used by TIMBER [49]) is to use real numbers as tags, and take the mean of the tags of the adjacent nodes. This completely solves the problem in theory, but in practice we have only a finite number of floating point numbers and hence eventually will not be able to represent tags with sufficient precision.

Thus, if there is no gap, we instead set the docid of n to that of x. We emphasize that while this leads to nodes sharing the same docid, each node still has a unique oid, and hence the database is still consistent. Algorithm 2.2 summarizes this procedure. As discussed in Section 2.3, the worst case of PREORDER-PREVIOUS and PREORDER-NEXT is linear in the depth of the database. In the latter case, for bulk insertions we can reduce the overall cost by using only one traversal for the entire set of nodes being inserted.

Algorithm 2.2 Maintenance of the document ordering index during the insertion of a new node n.

INSERT-MAINTAIN(n)
1: x ← PREORDER-PREVIOUS(n)
2: if x is the last node in document order then
3:     DOCID(n) ← DOCID(x) + 1
4: else
5:     y ← PREORDER-NEXT(n)
6:     if DOCID(y) > DOCID(x) + 1 then
7:         DOCID(n) ← ⌊(DOCID(x) + DOCID(y))/2⌋
8:     else
9:         DOCID(n) ← DOCID(x)

Figure 2.2: An instance of the document ordering index (sorted, unsorted, and deleted nodes)

Figure 2.2 demonstrates the state of the document ordering index on the database in Figure 2.1, after several insertions and deletions have been performed. Deleted nodes are represented using dotted edges, and newly inserted nodes are represented using solid circles. We call a subtree consisting entirely of nodes with the same docid an unsorted subtree. In the figure, the subtree rooted at node n34 is an unsorted subtree.
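The insertion maintenance just described can be sketched on a flat list of docids standing in for the preorder-linked database (a minimal sketch under our own assumptions, not the thesis's implementation; the list-index interface replaces the PREORDER-PREVIOUS/PREORDER-NEXT pointers).

```python
# A sketch of docid maintenance on insertion (in the spirit of
# Algorithm 2.2).  `docids` is the list of integer tags in document
# order; `pos` is where the new node is inserted.  When no gap is free
# between the neighbours, the new node shares its predecessor's docid,
# creating an "unsorted" region.

def insert_maintain(docids, pos):
    if pos == len(docids):                    # appended at the end
        tag = (docids[-1] + 1) if docids else 0
    else:
        prev_tag = docids[pos - 1] if pos > 0 else -1
        next_tag = docids[pos]
        if next_tag > prev_tag + 1:           # a gap exists (e.g. a deletion)
            tag = (prev_tag + next_tag) // 2
        else:                                 # no gap: share neighbour's tag
            tag = prev_tag if pos > 0 else next_tag
    docids.insert(pos, tag)
    return tag
```

Note how a gap left by a deleted node is reused, while a dense neighbourhood silently grows an unsorted region, exactly the situation Figure 2.2 illustrates.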

2.4.2 Comparing document order between two nodes

As mentioned before, we require a near constant time method for comparing the document order of two nodes. Algorithm 2.3 determines the relative document ordering of two nodes. The most expensive case in this algorithm is when both nodes to be compared have the same docid, as this falls back on a slightly faster variant of the naive algorithm. In all other cases, we get the comparison almost for free. If we let the maximum depth of an unsorted subtree (that is, a subtree of nodes with the same docid) be d, and the maximum breadth of an unsorted subtree be b, then in the worst case we need to execute 2d + b operations. Thus, this algorithm performs extremely well when the size of the unsorted subtrees in the database is reasonably small; all further refinements to this algorithm focus on ensuring that this is so.

Algorithm 2.3 Find the relative document order of nodes n1 and n2 using the document ordering index. (NB: In line 6, we can improve on the call to NAIVE-ORDER-CMP. For instance, consider Figure 2.2: if we sort nodes n33 and n36, once we find that n35 is an ancestor of n33 we can terminate the search, as DOCID(n35) < DOCID(n36).)

ORDER-CMP(n1, n2)
1: if DOCID(n1) < DOCID(n2) then
2:     return n1 < n2
3: elif DOCID(n1) > DOCID(n2) then
4:     return n2 < n1
5: else
6:     return NAIVE-ORDER-CMP(n1, n2)    // see note
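The indexed comparison amounts to a three-way branch on the docids, with the naive tree walk only as a fallback. A minimal sketch (our own illustration; `docid` and `naive_order_cmp` are passed in as functions and are assumptions of this sketch):

```python
# A sketch of ORDER-CMP (Algorithm 2.3): tags decide instantly unless
# both nodes fall in the same unsorted subtree, in which case we fall
# back on the naive comparison of Algorithm 2.1.

def order_cmp(n1, n2, docid, naive_order_cmp):
    """Return True iff n1 precedes n2 in document order."""
    if docid(n1) < docid(n2):
        return True
    if docid(n2) < docid(n1):
        return False
    # Shared docid: both nodes lie in the same unsorted subtree.
    return naive_order_cmp(n1, n2)
```

The cost model above follows directly: the two cheap branches are constant time, and only the shared-docid case pays the 2d + b tree-walk cost.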

Figure 2.3: Example of refactoring: (a) before refactoring, (b) after refactoring

2.4.3 Refactoring

We present here an enhancement which can make the above algorithm practical. To reduce the chance of having large unsorted subtrees, we can refactor unsorted subtrees into several smaller unsorted subtrees by shifting the docid of neighbouring sorted nodes into the unsorted area. Figure 2.3 gives an example of how this strategy works.

Suppose we are comparing the document order of two unsorted nodes n1 and n2, such that DOCID(n1) = DOCID(n2). We scan, in document order, to the left and right of n1 and n2 in exponentially increasing ranges, until we find nodes n′1 and n′2 such that DOCID(n′1) < DOCID(n1) = DOCID(n2) < DOCID(n′2). We then relabel this range of nodes so that they are evenly distributed, i.e., if there are n nodes, we set the docid of the i-th node in the range equal to DOCID(n′1) + ⌊(i/n)(DOCID(n′2) − DOCID(n′1))⌋. We note that this does not assign a unique docid to each node, but it does minimize the number of nodes on each docid, if we restrict the docids to those in the range.
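The refactoring step can be sketched on a flat list of docids (a minimal sketch under our own assumptions; the exponential scan and even redistribution follow the description above, while the list stands in for the document-ordered node sequence):

```python
# A sketch of refactoring: starting from an unsorted node at index pos,
# scan outwards in exponentially growing steps until the window's
# endpoints carry tags different from the shared tag (or we hit the
# ends of the list), then spread the tags in the window evenly.

def refactor(docids, pos):
    shared = docids[pos]
    step, lo = 1, pos
    while lo > 0 and docids[lo] == shared:
        lo = max(0, pos - step)
        step *= 2
    step, hi = 1, pos
    while hi < len(docids) - 1 and docids[hi] == shared:
        hi = min(len(docids) - 1, pos + step)
        step *= 2
    count = hi - lo
    if count == 0:
        return
    # Relabel [lo, hi] so the tags are as evenly spread as possible;
    # duplicates may remain if the range holds fewer tags than nodes.
    base, span = docids[lo], docids[hi] - docids[lo]
    for i in range(lo, hi + 1):
        docids[i] = base + (i - lo) * span // count
```

When the surrounding gap is large enough, this splits one unsorted run into distinct tags, as in Figure 2.3.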

The advantage of this algorithm is its simplicity. It is also, in a primitive sense, dynamic, because relabeling will only happen in areas where document ordering comparisons are actually occurring. Its most significant shortcoming, however, is that it changes read operations into write operations; this means that a transaction which would otherwise be read-only may be escalated to a write transaction. This obviously could have a significant impact on overall database performance.

2.5 Bender's Algorithm

In this section, we provide a brief overview of the algorithm of Bender et al [11]. Theoretically, this algorithm has excellent time complexity, but in practice there are some limitations. The basic idea of this algorithm is to assign to each node of the tree an integral identifier, which we call its tag (or docid, in keeping with the previous section's terminology), such that the natural ordering on the tags corresponds to the document ordering on the nodes. During insertions and deletions, it is obviously necessary to relabel surrounding nodes at some point, if there is no space to assign a tag for the new node. The algorithm guarantees that such relabellings cost only constant amortized time.

Let u ∈ N be the tag universe size, which we assume to be a power of two, and consider the complete binary tree B corresponding to the binary representations of all numbers between 0 and u − 1. Thus, the depth of the tree is log u, and the root-to-leaf paths are in one-to-one correspondence with the interval I = [0, u − 1] ⊂ Z; more generally, any node of the tree corresponds to a sub-interval of I. When our database has n nodes, this tree will have n leaf nodes used, corresponding to the tags assigned to the nodes in the database. For a node n ∈ D, we write DOCID(n) ∈ I for its numeric identifier. For a node in the identifier tree, we define its density to be the proportion of its descendants (including itself) which are allocated as identifiers.

When inserting a new node n between two nodes x and y, we proceed as follows. First, if DOCID(x) + 1 ≠ DOCID(y), we set DOCID(n) = ⌊(DOCID(x) + DOCID(y))/2⌋. Otherwise, we consider the ancestors of x, starting with its immediate parent and proceeding upwards, and stop at the first ancestor a such that its density is less than T^−i, where T is a constant between 1 and 2, and i is the distance of a from x. We then relabel all the nodes which have identifiers in the sub-range corresponding to a.

Bender et al [11] prove that the above algorithm results in an O(log n) amortized time algorithm. We omit the proof, but quote the following results. Firstly, for a fixed T the number of bits used to represent a tag is log u = log n / log T. Intuitively, then, we would expect that as T decreases, the amortized cost of insertions decreases, because more bits are used to represent the tags, and hence there are larger gaps. This can be verified from the fact that the amortized cost of insertions is (2 − 2/T) log u.

Practically speaking, of course, we wish to fix log u = W, where W is the word size of the machine. In this case, there is a trade-off between the number of nodes that can be stored and the value of T. Another practical difficulty is that as more nodes are inserted into the database, the average gap size decreases. At some point, thrashing will occur due to the fact that many nodes are frequently relabeled, and the theoretical properties of the algorithm fail. To alleviate this problem, Bender et al make the following small refinement: instead of making T constant, at each point in the algorithm we can take T as the smallest possible value that causes the root node to not overflow. They show experimentally that this modification yields good results. The pseudocode for this algorithm is given in Algorithm 2.4.

From the above O(log n) amortized time algorithm, we can obtain an amortized O(1) algorithm using a standard technique (see, for instance, [31]). We partition the list of nodes into Θ(n/log n) lists of Θ(log n) nodes, and maintain ordering identifiers on both levels. When one of the sub-lists overflows, we split it into two sub-lists, and insert the new sub-list into the list of lists. It is easy to show that this removes the logarithmic factor.

While this algorithm obviously has very desirable theoretical properties, in the context of disk-bound lists there are several problems. Firstly, in order to get amortized constant time worst case bounds, we need to maintain quite a bit of extra information for the two-level list structure. At a minimum, we must maintain the top level linked list, and for each node we must store a pointer to the sub-list it belongs to.

Additionally, to perform ordering between two nodes one would have to look up the tags of their sub-lists, which is an unavoidable indirection. This can have an adverse impact on paging, and possibly incur many expensive disk reads. Our experimental results show that it is this last effect that has the most serious impact on the constant time algorithm.

Algorithm 2.4 Document ordering index during the insertion of a new node, using the algorithm of Bender et al [11].

BENDER-INSERT(n)
 1: x ← PREORDER-PREVIOUS(n), y ← PREORDER-NEXT(n)
 2: if x = nil and y = nil then
 3:     DOCID(n) ← ⌊u/2⌋
 4: elif y = nil and DOCID(x) ≠ u − 1 then
 5:     DOCID(n) ← ⌊(DOCID(x) + u)/2⌋
 6: elif x = nil and DOCID(y) ≠ 0 then
 7:     DOCID(n) ← ⌊DOCID(y)/2⌋
 8: elif DOCID(x) + 1 ≠ DOCID(y) then
 9:     DOCID(n) ← ⌊(DOCID(x) + DOCID(y))/2⌋
10: else
11:     for i ← 1, 2, ... do
12:         l ← ⌊DOCID(x)/2^i⌋ · 2^i
13:         r ← l + 2^i
14:         S ← {m ∈ D : l ≤ DOCID(m) < r} ∪ {n}
15:         d ← |S| / 2^i
16:         if d < T^−i then
17:             j ← 0
18:             for m ∈ S in ascending document order do
19:                 DOCID(m) ← l + ⌊j · 2^i / |S|⌋
20:                 j ← j + 1
21:             return
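To make the tag-range relabelling concrete, here is a minimal sketch (our own illustration, not the thesis's code): the sorted tag list stands in for the document-ordered nodes, and the universe size, the value of T, and the insert-after interface are all assumptions of the sketch. The density test follows the T^−i threshold described above, so capacity is limited for a fixed word size, as the text notes.

```python
# A sketch of Bender-style insertion: bisect the gap if one exists,
# otherwise walk up the implicit binary tree over the tag space [0, U)
# until an enclosing dyadic range is sparse enough, then relabel it
# evenly with the new node included.
import bisect

BITS = 16
U = 1 << BITS      # tag universe size u (assumed, small for illustration)
T = 1.5            # density parameter, 1 < T < 2 (assumed)

def bender_insert(tags, after):
    """Insert a new node immediately after the node whose tag is `after`
    in the sorted tag list `tags`; return the new node's tag."""
    i = tags.index(after) + 1
    prev = tags[i - 1]
    nxt = tags[i] if i < len(tags) else U
    if nxt - prev > 1:                       # a free tag exists in the gap
        new = (prev + nxt) // 2
        tags.insert(i, new)
        return new
    for level in range(1, BITS + 1):
        size = 1 << level
        lo = (prev // size) * size           # enclosing dyadic range
        a = bisect.bisect_left(tags, lo)
        b = bisect.bisect_left(tags, lo + size)
        count = (b - a) + 1                  # occupants plus the new node
        if count / size < T ** (-level):     # density below T^-level?
            pos = i - a                      # new node's rank in the range
            relabeled = [lo + j * size // count for j in range(count)]
            tags[a:b] = relabeled            # relabel, inserting the node
            return relabeled[pos]
    raise OverflowError("tag universe exhausted")
```

With BITS = 16 and T = 1.5 the capacity is only on the order of (2/T)^16 ≈ 100 nodes, which illustrates the parameter-dependent upper bound on database size mentioned earlier in the chapter.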

2.6 Randomized Algorithm

In this section we present an alternative probabilistic algorithm which performs very well in practice. To illustrate how the algorithm works, suppose we have an ordered list of objects x1, x2, ..., and to each xi we assign a tag (as in previous algorithms) to determine relative ordering. We define gi = DOCID(xi+1) − DOCID(xi) to be the gap between a node's tag and its successor's tag.

Suppose we wish to insert a new node x0 at the beginning of the list. We initialize x0's tag to ⌊DOCID(x1)/2⌋ and iterate through x1, x2, ..., adjusting the gap sizes as follows. We draw a random number g from some fixed discrete probability distribution ranging over the positive integers. If the gap we are currently considering (say gi) is smaller than g, then we set DOCID(xi+1) ← DOCID(xi) + g. We continue with this procedure on successively higher values of i until we find a gap larger than the random number we sample. This handles the case where insertions happen at the beginning of the list. Insertions in the middle are handled by two traversals, one forward through the list (as above), and one backwards through the list, in a completely symmetric fashion.

While it is clear that this algorithm will preserve the document ordering properties of the tags, it is not at all clear why this algorithm should work quickly. Suppose that upon the insertion of a new node, the algorithm relabels n nodes. Then it is easy to see that the tag of xn will be the sum of n random numbers from our probability distribution, because gi for i < n will have been drawn from this distribution. However, we cannot say anything about gn. Nevertheless, we make the assumption that, once the algorithm has run for some long length of time, it will be the case that the tag of the node xi will be the sum of i random numbers from our probability distribution.

Of course, this assumption is not valid in the general case. To alleviate this problem, we will, as described below, choose a probability distribution which favors small gap sizes. This means that after the first n nodes have been relabeled, even though gn will not have been sampled from the distribution, it will still be small and hence one of the more likely values from the probability distribution. This means that the effect of these "unsampled" gaps will have a negligible impact on the rest of this analysis. Unfortunately, this breakdown does result in linear time worst case performance for our algorithm; however, we show in Section 2.7 that it still has good practical performance.

With the above assumption in mind, we can now restate the algorithm as follows. Upon the insertion of a new node, we progressively choose increasing values of i, and for each i we draw a new gap g sampled from the probability distribution. We terminate the search if the existing gap gi is at least g. We now must show that this algorithm terminates in a reasonable amount of time.

Suppose that X and Y are independent and identically distributed random variables. Then it is clear that P(X > Y) = P(Y > X), by symmetry. Hence:

    P(X > Y) + P(X = Y) + P(Y > X) = 1
    2 P(Y > X) + P(X = Y) = 1
    P(Y ≥ X) = (1 + P(X = Y)) / 2 ≥ 1/2

Thus, at each step of the algorithm, there is at least a 50% chance that the algorithm will terminate. Hence, the probability of the algorithm not terminating after i steps is at most 2^−i. Therefore, in practice, the algorithm should terminate fairly quickly. In fact, it is easy to see that on average we would expect at most four relabellings. In practice, the number of relabellings will be higher due to the failure of our assumption; however, our experiments show that the algorithm still has good performance.

The question remains as to which probability distribution to use. We choose the exponential distribution, given by the probability density function:

    f(x) = λ e^(−λx)

Of course, this is a continuous distribution, whereas we require a discrete distribution, because gap sizes must be integral and non-negative. Hence, we actually use the distribution defined by:

    P(g = i) = ∫ from i−1 to i of f(x) dx

For our experiments, we used λ = ln 2. We chose the exponential distribution (and this value of λ) because while the above algorithm works well in theory, it assumes implicitly that there is no upper bound on the size of tags. Of course, in practice we want tag values to remain small. Hence, we do not want a probability distribution which yields large gaps with high probability. Additionally, the assumption we made in the above analysis can only be satisfied by a distribution such as the exponential distribution, which generates small values with very high probability.
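With λ = ln 2 the discretized distribution takes a particularly simple form: P(g = i) = ∫ from i−1 to i of λe^(−λx) dx = 2^−(i−1) − 2^−i = 2^−i, i.e. a geometric distribution with success probability 1/2, which can be sampled by counting fair coin flips. A minimal sketch (the name follows the GET-GAP function used by Algorithm 2.5; the implementation is our own illustration):

```python
# Sample a gap size g >= 1 with P(g = i) = 2**-i, the discretization of
# the exponential density with lambda = ln 2.
import random

def get_gap():
    g = 1
    while random.random() >= 0.5:   # each "failure" coin flip grows the gap
        g += 1
    return g
```

The expected gap is 2, so tags grow slowly, which is exactly the small-gap bias the analysis above relies on.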

Algorithm 2.5 Updating the document ordering tags using the randomized algorithm, upon inserting a node n.

RANDOM-UPDATE(n)
 1: n1 ← PREORDER-PREVIOUS(n)
 2: n2 ← PREORDER-NEXT(n)
 3: DOCID(n) ← ⌊(DOCID(n1) + DOCID(n2))/2⌋
 4: n′ ← n
 5: while n1 ≠ nil do
 6:     g ← GET-GAP()
 7:     if DOCID(n) − DOCID(n1) < g then
 8:         DOCID(n1) ← max{DOCID(n) − g, 0}
 9:     else
10:         break
11:     n ← n1
12:     n1 ← PREORDER-PREVIOUS(n1)
13: n ← n′
14: while n2 ≠ nil do
15:     g ← GET-GAP()
16:     if DOCID(n2) − DOCID(n) < g then
17:         DOCID(n2) ← min{DOCID(n) + g, |U| − 1}
18:     else
19:         break
20:     n ← n2
21:     n2 ← PREORDER-NEXT(n2)

The algorithm is given in pseudo-code in Algorithm 2.5. The function GET-GAP obtains a random sample from the gap distribution we defined above. We note one potential problem with our algorithm, which does not seem to be significant in practice. It is possible that during the relabeling process, the algorithm will hit the greatest or least possible tag value. In this case, we simply allow multiple nodes to have the same tag value, and use Algorithm 2.1 in this case to determine ordering. This case is unlikely to occur in practice, because the number of nodes present in the database would have to approach the total number of docids available. On the other hand, the fact that the algorithm makes only one pass of the range that is relabeled (as opposed to the two passes of Bender) will make a significant practical difference in a disk-bound data structure such as a database, as can be seen in our experimental results.
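The randomized update can be sketched on a flat list of tags standing in for the preorder-linked nodes (a minimal sketch under our own assumptions: the list-index interface replaces the preorder pointers, the universe size is a parameter, and the geometric gap sampler from the previous section is repeated here so the sketch is self-contained).

```python
# A sketch in the spirit of RANDOM-UPDATE (Algorithm 2.5): assign the new
# node a tag between its neighbours, then push predecessors and
# successors outwards with freshly sampled gaps, stopping at the first
# sufficiently large existing gap.  Tags clamp at the universe ends.
import random

def get_gap():
    g = 1                               # geometric(1/2), as in Section 2.6
    while random.random() >= 0.5:
        g += 1
    return g

def random_update(tags, pos, universe=1 << 16):
    """A new node was inserted at index pos of the document-ordered tag
    list `tags` (its slot holds a placeholder); give it a tag."""
    prev_tag = tags[pos - 1] if pos > 0 else 0
    next_tag = tags[pos + 1] if pos + 1 < len(tags) else universe - 1
    tags[pos] = (prev_tag + next_tag) // 2
    cur = pos                                   # backward pass
    for i in range(pos - 1, -1, -1):
        g = get_gap()
        if tags[cur] - tags[i] < g:
            tags[i] = max(tags[cur] - g, 0)
            cur = i
        else:
            break
    cur = pos                                   # forward pass (symmetric)
    for i in range(pos + 1, len(tags)):
        g = get_gap()
        if tags[i] - tags[cur] < g:
            tags[i] = min(tags[cur] + g, universe - 1)
            cur = i
        else:
            break
```

When the neighbourhood is sparse, both passes stop immediately after one sampled gap, which is the single-pass, low-write behaviour that the experiments below reward.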

2.7 Performance Evaluation

We performed our experiments using the DBLP database. All experiments were run on a dual processor 750 MHz Pentium III machine with 512 MB RAM and a 30 GB, 10000 rpm SCSI hard drive. We tested both Bender algorithms (the O(log n) and O(1) variants), the simple refactoring algorithm of Section 2.4, and the randomized algorithm of Section 2.6.

2.7.1 Bulk Insertion and Random Insertion

For each algorithm, we inserted 100, 1000, and 10000 DBLP records into a new database. The insertions were done in two stages. The first half of the insertions were appended to the end of the database, and hence simulated a bulk load. The second half of the insertions were done at random locations in the database; that is, if we consider the document as a linked list in document order, the insertions happened at random locations throughout the list. This stage simulated further updates upon a pre-initialized database. While the inserts were distributed over the database, at the physical level the database records were still inserted at the end of the database file. This resulted in a database which was not clustered in document order, which meant that traversing through the database in document order possibly incurs many disk accesses. We hypothesize that while many document-centric XML databases will be clustered in document order, data-centric XML databases will not be, as they will most likely be clustered through the use of indices such as B-trees on the values of particular elements. Hence, our tests were structured to simulate these kinds of environments, in which the document ordering problem is more difficult.

Figure 2.4: Results for database of 100 records

At the end of each set of insertions, there were n records in the database, where n ∈ {100, 1000, 10000}. Choosing three different magnitudes of database size on an exponential scale allows us to show the overhead of the different approaches at small scale, as well as their scalability. We then additionally performed 10n and 100n reads upon the database. Each read operation chose two random nodes from the database and compared their document order. The nodes were not chosen uniformly, as this does not accurately reflect real-world database access patterns. Instead, in order to emulate the effect of "hot-spots" commonly found in real-world database applications, we adopted a normal distribution with mean n/2.

Figure 2.5: Results for database of 1000 records

Figure 2.6: Results for database of 10000 records

Figures 2.4 through 2.6 show the results from our experiments. There are several interesting things to note. Firstly, the O(1) algorithm of Bender is easily slower than the O(log n) algorithm. The relative performance gap becomes more noticeable as the number of reads increases, and hence is due to the extra level of indirection imposed in the comparison function by the O(1) algorithm: the O(1) variant's maintenance of the two-level ordering structure did not match the access patterns present in the experiment. Secondly, it is clear that the refactoring algorithm is of limited use in high read scenarios, as maintenance occurs not at insertion time but during order comparison, which makes it undesirable if we need to guarantee worst case speed for order comparisons. On the other hand, it means refactoring is fast even when there is a high ratio of writes.

Finally, the performance of our randomized algorithm is clearly more encouraging, as it is ahead of all the other algorithms by a comfortable amount in all tests. The performance gap between the other algorithms and the randomized algorithm also increases as the number of records increases, which indicates that our algorithm will perform better on large databases than the others.

2.7.2 Uniform Query Distribution

In this experiment, we evaluated the performance of the algorithms under a uniform query distribution. The experiment began with an empty database, which was then gradually initialized with the DBLP database. After every insertion, on average r reads were performed, where r was a fixed parameter taken from the set {0.01, 0.10, 1.00, 10.0}. Each read operation picked two nodes at random from the underlying database, using a uniform probability distribution, and compared their document order. In all of our experiments, we measured the total time of the combined read and write operations, the number of read and write operations, and the number of relabellings. However, due to space considerations, and the fact that the other results were fairly predictable, we only include the graphs for total time.

Figure 2.7: Result for uniform query distribution

As can be seen from the results in Figure 2.7, the randomized algorithm is easily the best performer.

We note that, as the ratio of reads increases, the performance of both of Bender's algorithms degrades. We attribute this in both cases to the extra indirection involved in reading from the index. As values of r > 100 are common in practice, we expect that the behavior of the O(1) algorithm will be even worse than the O(log n) variant in many real-life situations. Also, because of the extremely heavy paging, even the small paging overhead incurred by an algorithm such as the schema-based algorithm, which only infrequently loads in an additional page due to a read from the index, has a massive effect on performance. Thus, although this experiment is slightly contrived, it does demonstrate that in some circumstances the indirection involved becomes unacceptable, given that values of r in real life will often be 100 or 1000.

Figure 2.8: Result for non-uniform query distribution

2.7.3 Non-Uniform Query Distribution

This experiment was identical to the previous experiment, except that the reads were sampled from a normal distribution with variance |D|/10, and we took r ∈ {0.01, 0.10, 1.00, 100.0}. The idea was to reduce the heavy paging of the first experiment, and instead simulate a database "hot-spot", a phenomenon which occurs in practice. As can be seen from the results of Figure 2.8, this experiment took substantially less time to complete than the first experiment because the effect of paging is reduced.

2.7.4 Adversary Insertion Sequence

The previous experiments showed that the randomized algorithm had very good performance for the special case of appending to the end of the database. We demonstrate in this experiment that, in some cases, it has very bad performance, far worse than the other algorithms. This experiment was identical to the first experiment, except that instead of inserting DBLP records at the end of the database, we inserted them at the beginning. As can be seen from the results in Figure 2.9, both of Bender's algorithms easily beat the randomized algorithm's performance, as they both guarantee worst-case performance. Hence, in situations where worst-case bounds must be guaranteed, the randomized algorithm is not a good choice. However, in practice, such an adversary only occurs during bulk loading into the middle of the document, and this should be treated as a separate case where the update of the ordering tag can be done after the bulk loading.

Figure 2.9: Result for worst case performance

The reason why the randomized algorithm generally performs better in the average case is that it redistributes identifiers locally during insertion.

2.8 Applications

2.8.1 Ancestor-Descendant Relationships

As mentioned in Section 2.1, our method can be applied to efficiently determine ancestor-descendant relationships. The key insight is due to Dietz [33], who noted that the ancestor query problem can be answered using the following fact: for two nodes x and y of a tree T, x is an ancestor of y if and only if x occurs before y in the preorder traversal of T and after y in the postorder traversal of T.

We note that although our discussion has been in terms of document order (that is, preorder traversal), our results apply equally well to postorder traversal. Thus, by maintaining two indices, one for preorder traversal and one for postorder traversal, which allow ordering queries to be executed quickly, we can determine ancestor-descendant relationships efficiently.
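Dietz's observation can be sketched directly in code: assign each node its rank in a preorder and a postorder traversal, then compare ranks. The following is a minimal illustrative sketch (the function names and the example tree are ours, not from the thesis):

```python
# Sketch of Dietz's ancestor test: x is an ancestor of y iff x precedes y
# in preorder AND follows y in postorder. Names here are illustrative.

def number_tree(tree, root):
    """Compute preorder/postorder ranks for a tree given as {node: [children]}."""
    pre, post = {}, {}
    counter = {"pre": 0, "post": 0}

    def visit(n):
        pre[n] = counter["pre"]; counter["pre"] += 1
        for c in tree.get(n, []):
            visit(c)
        post[n] = counter["post"]; counter["post"] += 1

    visit(root)
    return pre, post

def is_ancestor(x, y, pre, post):
    # x is a proper ancestor of y in document order
    return pre[x] < pre[y] and post[x] > post[y]

# Example document tree: a(b(c), d)
tree = {"a": ["b", "d"], "b": ["c"]}
pre, post = number_tree(tree, "a")
print(is_ancestor("a", "c", pre, post))  # True
print(is_ancestor("b", "d", pre, post))  # False
```

Both comparisons are constant time once the two ranks are maintained, which is exactly the role of the two order-maintenance indices described above.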

It is well-known that ancestor-descendant relationships can be used to evaluate many path expressions, using region algebras. In fact, some native XML databases, such as TIMBER [49], use this trick by storing numerical identifiers giving the start and end positions for each node in a preorder traversal. However, to the best of our knowledge, this is the first work to address the efficient maintenance of these identifiers in the context of dynamic databases. As an example of how structural and range queries can be answered efficiently, consider the XPath:

//Item[.//Price > 200]

This query can be answered by the following plan (where we adopt the definitions of Section 2.1), taking:

S1 ← hash("Item"),
S2 ← hash("Price"), and
S3 ← bptree(>, 200).

Figure 2.10: Possible query plans on //Harddisks/Item[Price<200][Brand="ABC"]. (a) Plan A; (b) Plan B.

We then find M where:

(∀n ∈ M)(M ⊆ S1)(∃x ∈ S2)(∃y ∈ S3)(n <_pre x <_pre y ∧ n >_post x >_post y)

In the above, <_pre is an ordering comparison in preorder traversal, and >_post is an ordering comparison in postorder traversal. We can then sort M into document order using the results in this chapter.
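As a concrete illustration of this predicate, here is a minimal Python sketch, assuming each node is represented by its (preorder, postorder) rank pair; the names find_m and contains, and the hand-built node sets, are illustrative, not from the thesis:

```python
# Sketch of computing M for //Item[.//Price > 200]: keep each Item node n
# for which some Price node x and qualifying value node y satisfy
# n <_pre x <_pre y and n >_post x >_post y. Nodes are (pre, post) pairs;
# s1, s2, s3 stand in for hash("Item"), hash("Price") and bptree(>, 200).

def contains(anc, desc):
    # anc is an ancestor of desc: earlier in preorder, later in postorder
    return anc[0] < desc[0] and anc[1] > desc[1]

def find_m(s1, s2, s3):
    return [n for n in s1
            if any(contains(n, x) and contains(x, y)
                   for x in s2 for y in s3)]

# Document root(Item1(Price1(t1)), Item2(Price2(t2))); only t1 > 200.
s1 = [(1, 2), (4, 5)]   # Item nodes
s2 = [(2, 1), (5, 4)]   # Price nodes
s3 = [(3, 0)]           # text nodes with value > 200
print(find_m(s1, s2, s3))  # [(1, 2)]  (only Item1 qualifies)
```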

2.8.2 Query Optimization

With an efficient order operator, it is possible to use access paths such as B+trees to index numerical data. Query operators that disrupt the natural order, such as UNION, can be freely used, because the query optimizer can ignore document order temporarily within a sub-query plan. In order to choose the plan with minimal cost, the query optimizer requires information such as selectivity estimates and actual cost calculations. Figure 2.10 shows two possible query plans for the same XPath query. Plan A ignores document order throughout the plan and sorts the final result at the end. Plan B immediately sorts the nodes returned by the B+tree back into document order and maintains that order for the rest of the plan.

If the selectivity estimator shows that the number of results returned by the query is significantly smaller than the number of nodes returned from the B+tree, plan A is the better plan. This is because, even with constant O(1) cost per order comparison, it still requires O(n log n) time to sort a set of n nodes. Otherwise, plan B is preferred.

2.9 Conclusions

In this chapter, we have presented the first analysis of practical algorithms for maintaining an index for document order in dynamically changing databases. Having such an index will prove invaluable in optimizing queries over XML databases. We have shown that the straightforward approach of refactoring scales very poorly, and that even theoretically good results can have surprisingly poor practical performance. This is best demonstrated by the relatively poor performance of the O(1) time algorithm of Bender. Taking into account practical considerations, we have developed a simple algorithm that performs better than all other known algorithms, and in particular scales in a significantly better fashion. Finally, we note that while we have couched our discussion in terms of native XML database systems, our results could be adapted to handling XML data in relational database systems.

In recent related work [35], we have also extended Bender's approach to create a parameterized family of algorithms which trade off comparison cost and update cost. We also investigated the utilization of schema information to reduce the number of nodes for which document ordering information needs to be obtained.

There are many open research topics left in this area. A more significant topic would be to extend the work of Lerner and Shasha [59] to handle ordered XML data. This is now possible because we have developed an efficient ordering operator.

Chapter 3

Efficient Structural Joins

If you could see your ancestors, all standing in a row,

there might be some among them whom you wouldn't care to know

— Mahle Baker

3.1 Introduction

In recent years XML [15] has emerged as the standard for information representation and exchange on the Internet. However, finding efficient methods of managing and querying large XML documents is still problematic and poses many interesting challenges to the database research community.

XML documents can essentially be modeled as ordered trees, where nodes in the ordered tree represent the individual elements, attributes and other components of an XML document. The preorder traversal of the tree gives the document ordering of the XML document. Recently proposed XML query languages such as XPath [84] and XQuery [86], which have been widely adopted by both research and commercial communities for querying XML documents, rely heavily upon regular path expressions for querying XML data. For example, consider the MEDLINE [68] database as an example XML document. The regular path expression //DateCreated//Month returns all Month elements that are contained by DateCreated. Without any indexes on the document, the typical approach to evaluating //DateCreated//Month would require a full scan of the entire XML database, which can be very costly if the document is large.

Recently, the structural join, which involves finding structural relationships between a list of potential ancestor nodes and a list of potential descendant nodes, has been proposed. Structural joins are now considered a core operation in processing and optimizing XML queries. Various techniques have been proposed for efficiently finding the structural relationships between a list of potential ancestors and a list of potential descendants. Most of these proposals rely on some kind of database index structure, such as a B+tree. These index structures increase resource requirements (e.g., memory consumption) and maintenance overhead (e.g., for updates). Furthermore, most of them rely on numbering schemes that incur significant relabeling costs during data updates.

Instead of improving the performance of state-of-the-art structural joins using external index structures, this chapter proposes simple yet effective ways of skipping unmatched nodes during structural join processing. The key contributions of this chapter are summarized as follows:

1. We propose an improvement to the current state-of-the-art stack-based structural joins [7]¹ based on various skipping strategies. In contrast to other work, our proposed extension does not require any external indexes such as B-trees, and hence imposes less overhead on the underlying database system.

2. Our proposed method does not employ any indexes such as B+trees, and its entire operation cost is linearly proportional to the size of the query output. Hence it can be extended to support XML stream data.

3. We present extensive experimental results on the performance of our proposed algorithms, using both real-world and synthetic XML databases.

4. We show experimentally that our approach can outperform the stack-based structural join algorithms by several orders of magnitude.

5. We discuss how updates can affect structural join processing and how our approach can reduce the negative side-effects of updates on XML databases.

6. Finally, we discuss the differences between the preorder/postorder and start/end approaches to maintaining ancestor-descendant information in XML databases.

¹The algorithm of [7] will be referred to as the STJ-D algorithm hereafter.

The rest of this chapter is organized as follows. Section 3.2 gives the problem definitions and discusses relevant related work. We present our improvements to existing structural join algorithms in Section 3.3. In Section 3.4, we give an experimental analysis of our algorithms, and compare our results with some existing schemes for structural joins. Finally, Section 3.5 concludes the chapter.

3.2 Related Work

XML data is generally modelled as a tree structure where elements, attributes and data are represented as nodes of the tree. Within this tree, parent-child and ancestor-descendant relationships represent the nesting of elements within the corresponding XML document. Querying XML data frequently involves determining the containment relationship between data nodes; for example, during the evaluation of a path expression, a structural join may be used to determine whether an element A is an ancestor node of an element B. Thus, in order for structural join algorithms to operate efficiently, the database should be represented in a way which allows the structural relationship of nodes to be determined in close to constant time. This section describes different approaches for efficiently determining the ancestor-descendant relationship between two nodes. We also review related work on structural joins that make use of these schemes.

3.2.1 Structural Joins

Recently, several new algorithms, called structural join algorithms, have been proposed for finding sets of document nodes that satisfy the ancestor-descendant relationship with another set of document nodes. Various approaches have been proposed using traditional relational database systems [37, 97] and XML query engines such as that proposed in [67].

The current state of the art in structural joins on XML data is described in [7]. It takes as input two lists of elements, both sorted with respect to document order, representing the list of ancestors (AList²) and the list of descendants (DList³). The basic idea of the algorithm is to merge the two lists to produce the output, by iterating through them in document order. While iterating through the two lists, it determines the ancestor-descendant relationship between the current top of a stack, which is maintained during the iteration, and the next node in the merge. Based on this and the manipulation of the stack, it produces the correct output. The cost of this approach is O(|AList| + |DList| + |Output|).

More recent work has extended this approach for better performance. For example, the XML Region Tree (XR-Tree) approach to indexing document structure on disk [50] uses a variant of the B+tree with different index key entries and lists to maintain the ancestor-descendant relationship between nodes. It then uses the stack-tree based join algorithm to carry out structural joins. The amortized I/O cost for inserting and deleting nodes in an XR-Tree is O(log_F N + C_dp), where N is the number of elements indexed, F is the fanout of the XR-Tree, and C_dp is the cost of one displacement of a stabbed element. Although this approach can support structural joins of XML data by using the stack-tree based join, determining the ancestor-descendant relationship between two nodes is not constant time. Therefore, for large ancestor and descendant node sets it may not be as efficient as the original STJ-D algorithm. Also, any large set of random updates requires frequent updates of large parts of the XR-Tree. Therefore, maintaining the index for a large, changing XML database can be costly.

²Ancestor node list and AList are interchangeable hereafter.
³Descendant node list and DList are interchangeable hereafter.

In [16], the stack-tree join algorithm was extended to match more general selection patterns on the XML data tree. The work done in [7, 16, 20] is closely related to this chapter. For instance, the algorithm proposed in [20] speeds up the stack-tree based join algorithm by using a combination of pre-built indexes such as B+trees and R-trees. It utilizes B+tree indexes built on the element start-tag positions. Hence it can use B+tree range queries to skip descendants that do not match during the structural join. However, these approaches are not effective in skipping ancestor nodes, as stated in [50].

3.2.2 Numbering Schemes

Apart from the index-based schemes mentioned above, much work has been done on using numbering schemes to support queries on ancestor-descendant relationships. The following discusses them in more detail.

Dietz and Sleator worked on maintaining order in a linked list [31]. They proposed algorithms which permit constant time queries on the relative order of nodes in a list, with only a constant time overhead on insertions and deletions in the list. This supported earlier work on solving the ancestor query problem by comparing the relative preorder and postorder of two nodes [33].

More recently, a more elegant approach has been proposed [11], which obtains the same performance as Dietz and Sleator's algorithm. The maintenance of both the preorder and postorder of an XML document corresponds to the order maintenance problem, and hence this result gives the best possible theoretical bounds on our problem. Additionally, there is an upper bound on the size of the database for which the results hold, and [11] estimates it to be approximately 430,000 elements for a particular parameter selection. Therefore, this algorithm is only an incomplete answer to the question of document ordering in large databases, where the number of nodes can easily run into the millions. However, that paper did not focus on the ancestor query problem. Furthermore, the order maintenance problem focuses on maintaining the order between two nodes; it is not directly related to skipping unmatched nodes in structural joins.

Extensive research has been done on inverted indices [78] for Information Retrieval (IR) systems. A recent work [97] showed that an inverted index can be used to solve containment queries on XML data nodes. The inverted index data structure maps text words of XML documents in a T-index and elements in an E-index, such that elements are mapped to inverted lists. Occurrences of a word or an element⁴ are recorded in inverted lists, with each occurrence indexed by its (DocID, Start:End, Level), where DocID is the document number, Start is the position where the word starts, End is the position where it ends, and Level is the depth of the data node. This information is sufficient to compute ancestor-descendant relationships. In practice, however, there are always frequent, randomly distributed inserts, deletes and updates of XML data. Any change to the database will require the majority of the inverted index to be re-calculated. Therefore, this approach can be costly for maintaining an inverted index for a non-static XML database.

3.3 Skip Joins

This section presents our algorithms for skipping unmatched nodes during structural joins. It also describes various strategies that can be used for skipping, which lead to different performance outcomes. In order to present our algorithms in a meaningful way, we first classify all structural joins into three classes. Each class of structural joins has different applications in query optimization, and the classes are sufficiently different that they should be optimized separately.

1. Descendant Join (D-Join): the first type of structural join filters a set of descendant nodes by selecting only those nodes that have an ancestor within the set of potential ancestors. For example, the query a//b should return the node set R_D = {d ∈ D | ∃a ∈ A such that d is a descendant of a}.

⁴Word and element will be used interchangeably hereafter.

Figure 3.1: Possible skipping strategies during a structural join. (a) Ancestor node list A and descendant node list D; (b) A and D in tree representation.

2. Ancestor Join (A-Join): this type of structural join filters a set of ancestor nodes by selecting only those nodes that have a descendant within the set of potential descendants. For example, the query a//b should return the node set R_A = {a ∈ A | ∃d ∈ D such that a is an ancestor of d}.

3. Ancestor-Descendant Join (AD-Join): the third type of structural join returns the set of ancestor-descendant node pairs. For example, the query a[.//b=.//c] may be evaluated by performing a structural join on the sub-expression .//b=.//c, which would then make use of the set R_AD = {(a, d) | a ∈ A, d ∈ D such that a is an ancestor of d}.

The algorithm proposed in [20] suggests that the state-of-the-art STJ-D algorithm proposed in [7] has the disadvantage of having to scan through the entire ancestor list for the join operation, and hence in some cases unnecessarily scans through ancestor nodes that do not contain any nodes in the descendant list. A similar phenomenon can occur during the scanning of the descendant list. Their solution was to use a B+tree, which requires a prebuilt index system for the database. In this section, we introduce some extensions to the STJ-D algorithm by introducing a skipping mechanism to skip ancestor and descendant nodes that do not match the structural pattern, and hence may be safely ignored during the structural join.

Figure 3.1(a) represents a particular instance of a structural join. Depending on the type of query, we can utilize different skipping mechanisms to optimize the join. The circled regions denote the ancestor-descendant relationships between nodes; for example, d0 is a descendant of a0. The a-skip and d-skip arrows in the figure show the nodes which are not included in the result set of an AD-Join; hence, the structural join algorithm should try to minimize traversal of those nodes if possible. For A-Joins (respectively D-Joins), in the optimal case we should further skip all the matched descendants {d1, d2, d3, d10, d11} (respectively ancestors {a7, a8}). For example, as soon as we can determine that d0 is a descendant of a0, we do not need to traverse d1 and d2, because they only match with a0. Similarly, for D-Joins, the traversal of a7 and a8 should be avoided since a6 is their common ancestor, and so descendants of a7 and a8 are also descendants of a6; thus skipping a7 and a8 will not affect the result.

Of course, in order to skip nodes we must assume that we can perform these skips in constant time. We note that this assumption is not necessary for previous work such as that of [7]. However, we believe that very frequently the node sets being joined will be stored in array-like structures in memory or on disk. This is because even for relatively large data sets such as DBLP [1], the node sets remain only a few megabytes in size, and hence are easily manipulated as arrays.

The pseudo-code for the STJ-D algorithm is shown in Algorithm 3.1; this algorithm is used later as the control for our experiments. However, we have modified the algorithm from its original presentation so that it uses a preorder and postorder labelling scheme to determine ancestor and descendant relationships between nodes. We use this labelling scheme for data update maintenance instead of the traditional (StartPos:EndPos, Level) approach.

3.3.1 Skip-Join for Ancestor-Descendant Join

Here, we propose an alternative stack-tree based structural join algorithm on two input lists AList and DList (both sorted in document order). Our approach is to assume that we can skip quickly (as discussed previously), and then to use this assumption by employing a skipping mechanism during the traversal of A and D. The basic idea is that during the structural join, whenever we advance the cursor of A we call the A-SKIP function to search for the next node a' ∈ A such that a' is either an ancestor of the current descendant node d, or follows d in document order. Similarly, whenever we advance the cursor of D we call the D-SKIP function to search for the next node d' such that d' is either a descendant of the current ancestor node a, or follows a in document order.

Algorithm 3.1 Slightly modified stack-tree based structural join proposed in [7] (STJ-D). (NB: All algorithms in this chapter are simplified for ease of presentation: boundary cases are omitted and all boolean operations return false if a required element does not exist or is out of range.)

STACK-TREE-DESC(A, D)
1: a ← 0, d ← 0, R ← ∅, Stack ← ∅
2: while (d < |D| ∧ a < |A|) ∨ |Stack| > 0 do
3:   if Following(Top(Stack), A[a]) ∧ Following(Top(Stack), D[d]) then
4:     Pop(Stack)
5:   elif Preorder(A[a]) < Preorder(D[d]) then
6:     Push(A[a], Stack)
7:     a ← a + 1
8:   else
9:     Append((s, D[d]), R), ∀s ∈ Stack
10:    d ← d + 1

Following(n, f) // Returns true if and only if f belongs to the following axis of n.
1: return Preorder(f) > Preorder(n) ∧ Postorder(f) > Postorder(n)

Ancestor(d, a) // Returns true if and only if a belongs to the ancestor axis of d.
1: return Preorder(d) > Preorder(a) ∧ Postorder(d) < Postorder(a)

Algorithm 3.2 Structural joins that return ancestor-descendant node pairs (AD-Join).

SKIP-JOIN-AD(A, D)
1: a ← 0, d ← 0, R ← ∅, Stack ← ∅
2: while (d < |D| ∧ a < |A|) ∨ |Stack| > 0 do
3:   if Following(Top(Stack), A[a]) ∧ Following(Top(Stack), D[d]) then
4:     Pop(Stack)
5:   elif Preorder(A[a]) < Preorder(D[d]) then
6:     Push(A[a], Stack)
7:     a ← A-Skip(a, D[d], A)
8:   else
9:     Append((s, D[d]), R), ∀s ∈ Stack
10:    if |Stack| > 0 then
11:      d ← d + 1
12:    else
13:      d ← D-Skip(d, A[a], D)

Figure 3.2: Skipping scenarios for an AD-Join. (a) Skipping ancestors; (b) skipping descendants.
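For concreteness, the baseline stack-tree merge (Algorithm 3.1) can be sketched in Python over (preorder, postorder) pairs. This is a simplified illustrative sketch, not the thesis's exact implementation: it omits the skipping calls and trims the boundary handling.

```python
# Sketch of the stack-based structural join (STJ-D) over nodes labelled
# with (preorder, postorder) ranks. alist and dlist must be sorted in
# document (preorder) order. Simplified: no A-Skip/D-Skip, and the loop
# stops once dlist is exhausted (remaining iterations only pop the stack).

def following(n, f):
    # f belongs to the following axis of n
    return f[0] > n[0] and f[1] > n[1]

def stack_tree_desc(alist, dlist):
    a = d = 0
    stack, result = [], []
    while d < len(dlist):
        if stack and following(stack[-1], dlist[d]) and \
                (a >= len(alist) or following(stack[-1], alist[a])):
            stack.pop()                      # top's region is finished
        elif a < len(alist) and alist[a][0] < dlist[d][0]:
            stack.append(alist[a])           # next ancestor starts first
            a += 1
        else:
            # every stacked ancestor contains dlist[d]
            result.extend((s, dlist[d]) for s in stack)
            d += 1
    return result

alist = [(0, 3), (2, 2)]          # a1, a2 (a2 nested inside a1)
dlist = [(1, 0), (3, 1), (4, 4)]  # d1 under a1; d2 under a2; d3 outside
print(stack_tree_desc(alist, dlist))
# [((0, 3), (1, 0)), ((0, 3), (3, 1)), ((2, 2), (3, 1))]
```

The skip-join variants above differ only in replacing the `a += 1` and `d += 1` cursor advances with calls to the skipping functions.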

In Figure 3.2(a), which uses the original STJ-D algorithm, all nodes under the dashed arrow need to be traversed by pushing them onto the stack and immediately popping them in the next iteration. By using an A-SKIP function, we try to minimize the number of lookups of these unnecessary nodes during list traversal, reducing the number of nodes pushed onto and popped from the stack. However, node a' may not necessarily be an ancestor of node d, as it may follow d in document order. Similarly, in Figure 3.2(b), using the original STJ-D algorithm, we again need to traverse all nodes above the dashed arrow. Hence, the function D-SKIP is used to try to minimize the traversal of the descendant list. The algorithm is listed in Algorithm 3.2.

We will also, at times, need two additional skipping functions, BA-SKIP and BD-SKIP. These functions are used when we can skip nested ancestors or descendants, a situation which occurs, as described previously, during A-Joins and D-Joins. The definitions of the functions A-SKIP, D-SKIP, BA-SKIP and BD-SKIP have not yet been given, because they can vary according to the skipping strategy chosen. We will discuss several possible strategies later in this chapter.

It should be pointed out that although it is possible to skip nodes in D even when the stack is not empty, the performance gain may not cover the penalty of the overhead. This is because, in practice, real-world XML trees are very shallow, and hence the number of skippable nodes within nested regions is generally small.

3.3.2 Skip-Join for Ancestor Structural Join

Many XML queries require the efficient filtering of ancestor nodes. For example, the query a//b[.//c] returns a set of b nodes which all have an ancestor a and a descendant c. If we use the STJ-D algorithm to process this query, we have to first join a//b, then b//c, and finally merge the two joins together. However, if we have an ancestor filtering algorithm, it can return a smaller set of b nodes that are ancestors of c. We then feed this smaller b set as the new D for joining with the a nodes. Then, we can take advantage of our previously described skip-join algorithm for descendant filtering, which performs better with smaller descendant sets.

Algorithm 3.3 Structural joins that return only matched ancestor nodes (A-Join).

SKIP-JOIN-A(A, D)
1: a ← 0, d ← 0, R ← ∅, Stack ← ∅
2: while (d < |D| ∧ a < |A|) ∨ |Stack| > 0 do
3:   if Following(Top(Stack), A[a]) ∧ Following(Top(Stack), D[d]) then
4:     Pop(Stack)
5:   elif Preorder(A[a]) < Preorder(D[d]) then
6:     Push(A[a], Stack)
7:     a ← A-Skip(a, D[d], A)
8:   else
9:     Append(s, R), ∀s ∈ Stack
10:    d ← D-Skip(d, A[a], D)
11:    if |Stack| > 0 then
12:      d ← BD-Skip(d, Top(Stack), D)
13:    else
14:      d ← d + 1

To further improve the performance of skip-joins on ancestor structural joins, we can take advantage of knowing that only ancestor nodes are wanted; hence, when the stack is not empty, we can skip all matched descendant nodes using BD-SKIP, because these nodes are not needed to increase the size of the result set. The detailed steps are described in Algorithm 3.3.

3.3.3 Skip-Join for Descendant Structural Join

For descendant structural joins, we do not need to keep the stack of ancestor nodes, as keeping only the top-most ancestor yields the same result set. As soon as we push any node onto the stack, we can immediately use BA-SKIP to skip all nodes in A until a' follows the node in the stack in document order. Algorithm 3.4 shows the pseudo-code for this approach. We expect this type of structural join to perform well regardless of the sizes of AList and DList.

Algorithm 3.4 Structural joins that return only matched descendant nodes (D-Join).

SKIP-JOIN-D(A, D)
1: a ← 0, d ← 0, R ← ∅, s ← ∅
2: while (d < |D| ∧ a < |A|) ∨ s ≠ ∅ do
3:   if Following(s, A[a]) ∧ Following(s, D[d]) then
4:     s ← ∅
5:   elif Preorder(A[a]) < Preorder(D[d]) then
6:     if s ≠ ∅ then
7:       a ← BA-Skip(a, s, A)
8:     else
9:       s ← A[a]
10:      a ← A-Skip(a, D[d], A)
11:  else
12:    Append(D[d], R)
13:    if s ≠ ∅ then
14:      d ← d + 1
15:    else
16:      d ← D-Skip(d, A[a], D)

3.3.4 Skipping Strategies

Algorithm 3.5 Properties (and hence the algorithm) for each of the skipping strategies: A-Skip, D-Skip, BA-Skip and BD-Skip.

A-Skip(a, d, A) // Skip from a to the first a' ∈ A following a such that a' is either an ancestor of d, or follows d.
1: return Min({a' ∈ A | a' > a, Ancestor(d, a') ∨ Following(d, a')})

D-Skip(d, a, D) // Skip from d to the first d' ∈ D following d such that d' is either a descendant of a, or follows a.
1: return Min({d' ∈ D | d' > d, Ancestor(d', a) ∨ Following(a, d')})

BA-Skip(a, s, A) // Return the first a' ∈ A following a which is not a descendant of s.
1: return Min({a' ∈ A | a' > a, Following(s, a')})

BD-Skip(d, s, D) // Return the first d' ∈ D following d which is not a descendant of s.
1: return Min({d' ∈ D | d' > d, ¬Ancestor(d', s)})

Algorithm 3.5 describes the semantics of each skipping function. The goal of all these functions is to skip as many of the unmatched nodes as possible. However, for each of these skipping functions, different skipping strategies can be applied, each of which will result in different performance for the overall algorithm. For instance, one may find that it is more effective to skip nodes using a binary search when using A-Skip, but better to skip nodes using an exponential technique when using BD-Skip. We will investigate this in the experimental section of this chapter. In this section, we propose and describe different skipping strategies in detail. All examples below assume we are discussing skipping strategies for the A-SKIP function, but each of them can be similarly applied to other skipping functions.
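These semantics can be stated as naive linear scans, which may help fix the intended behaviour before any search strategy is applied. The following is a reference sketch over (preorder, postorder) pairs; the function names are illustrative, and a real strategy would replace each scan with one of the searches discussed in this section.

```python
# Linear-scan reference versions of the skipping functions over node
# lists of (preorder, postorder) pairs sorted in document order. Each
# returns the index of the first qualifying node after position i, or
# len(lst) if none exists.

def ancestor(d, a):
    # a belongs to the ancestor axis of d
    return d[0] > a[0] and d[1] < a[1]

def following(n, f):
    # f belongs to the following axis of n
    return f[0] > n[0] and f[1] > n[1]

def a_skip(a, d, alist):
    # first a' after position a that is an ancestor of d or follows d
    for i in range(a + 1, len(alist)):
        if ancestor(d, alist[i]) or following(d, alist[i]):
            return i
    return len(alist)

def d_skip(d, a, dlist):
    # first d' after position d that is a descendant of a or follows a
    for i in range(d + 1, len(dlist)):
        if ancestor(dlist[i], a) or following(a, dlist[i]):
            return i
    return len(dlist)

def ba_skip(a, s, alist):
    # first a' after position a that follows the stacked node s
    for i in range(a + 1, len(alist)):
        if following(s, alist[i]):
            return i
    return len(alist)

def bd_skip(d, s, dlist):
    # first d' after position d that is not a descendant of s
    for i in range(d + 1, len(dlist)):
        if not ancestor(dlist[i], s):
            return i
    return len(dlist)
```

Each scan is trivially correct but linear; the point of the strategies below is to reach the same index in sub-linear time.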

Binary Skipping

Here we propose a skipping strategy which uses a simple binary search, as described in Algorithm 3.6. As is illustrated in Figure 3.3(a), the function hops through AList

Algorithm 3.6 Binary skipping of ancestor nodes

A-Binary-Skip(min_A, max_A, d, A)
 1: while min_A < max_A do
 2:   x ← (min_A + max_A) / 2
 3:   if POSTORDER(A[x]) > POSTORDER(d) then
 4:     if POSTORDER(A[x − 1]) < POSTORDER(d) then
 5:       return x
 6:     else
 7:       max_A ← x − 1
 8:   else
 9:     if PREORDER(A[x]) > PREORDER(d) then
10:       max_A ← x − 1
11:     else
12:       min_A ← x + 1
13: return ∅

Figure 3.3: Skipping strategies for A-Skip. (a) Binary skipping strategy for A-Skip; (b) exponential skipping strategy for A-Skip.

trying to find a node a such that a is an ancestor of both the current top node in the stack s and the current descendant node d, skipping over nodes that do not match the appropriate structural pattern. If no node is found, it returns the empty set and the join function stops scanning through the DList. We believe this approach can be efficient for processing queries such as //Month[text()="03"]; in this case, even if there exists a large set of Month elements, there may only be a small subset of them that matches the predicate. This yields a large ancestor-to-descendant node ratio and, in most cases, large sections of the ancestor node list need not be visited at all.

Exponential Skipping

Algorithm 3.7 Exponential skipping of ancestor nodes

A-Exponential-Skip(a, d, A)
 1: min_A ← a + 1, max_A ← |A| − 1, δ ← 1, x ← min_A
 2: while x < |A| do
 3:   if POSTORDER(A[x]) < POSTORDER(d) then
 4:     min_A ← x
 5:     x ← x + δ
 6:     δ ← 2δ
 7:   else
 8:     max_A ← x
 9:     break
10: return A-Binary-Skip(min_A, max_A, d, A)

Since binary skipping uses a binary search approach, in the worst case it can take log n skips to locate the next AList node that matches the structural pattern. Thus, the worst case for the binary skipping strategy arises when most nodes in the ancestor node list are matched: every call to the skipping function then requires approximately log n skips to find the next node. Here, we propose an exponential skipping strategy, which tries to avoid this worst-case scenario. The exponential skipping strategy first skips through the ancestor list using exponentially increasing gaps, for example, 1, 2, 4, 8, 16, etc. When it over-shoots the target node, we then switch to binary search, with the high and low boundaries of the search set to the current and previous hop positions. The pseudo-code is described in Algorithm 3.7.
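Detached from the join machinery, the slow-start idea is simply a galloping search over a sorted sequence. A minimal Python sketch (our own illustration, not the thesis code):

```python
import bisect

def exponential_skip(items, lo, target):
    """Return the index of the first element >= target at or after position lo.
    The gap doubles (1, 2, 4, 8, ...) until we overshoot, then a binary search
    over the last interval pins down the exact position."""
    n = len(items)
    gap, hi = 1, lo
    # Slow start: grow the gap exponentially until we pass the target.
    while hi < n and items[hi] < target:
        lo = hi
        hi += gap
        gap *= 2
    # Binary search between the last two hop positions.
    return bisect.bisect_left(items, target, lo, min(hi, n))
```

With items = [1, 3, 5, 7, 9, 11, 13, 15] and target 10, the hops visit positions 0, 1, 3 and 7, and the final binary search over the last interval returns index 5.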

Figure 3.3(b) illustrates how the exponential skipping strategy augments binary skipping by using slow start to increase the gap size exponentially until it over-skips past the next ancestor node. Since the next gap size is based on the past observed number of skipped nodes, the number of over-skipped nodes will be at most equal to the size of the gap. Therefore, no matter how many nodes we have to skip, the slow-start nature of this type of skipping strategy guarantees that in the worst case only one extra traversal is executed, and that this worst case happens when the number of nodes we must skip is either one or three.

Most Recent Gap (MRG) Skipping

Algorithm 3.8 shows another skipping strategy, which is based on the gap information generated from the previous successful skip. In the exponential skipping strategy, we increase the gap exponentially after each skip until we over-shoot the matching node. However, in cases where the gaps between nodes are large, the exponential skipping strategy may waste the first few skips before the gap is big enough to reach the next matching node. In the MRG strategy, we still use the exponential skip; however, the initial gap (instead of 1) can be different for each skip, depending on the gap queue. Lines 1 and 2 of the function Skip-Join-AD in Algorithm 3.8 define a queue of size e. The queue holds the gap information of the last e skips. Every time a new matching node is found, the new gap is appended to the queue and the entry at the top of the queue is removed. MRG then uses the gap size at the top of the queue as the initial gap for the exponential skip.
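A compact sketch of the MRG bookkeeping follows; the class and function names are our own, and the queue discipline is our reading of the description above.

```python
import bisect
from collections import deque

class MRGSkipper:
    """Most Recent Gap strategy: keep the gaps of the last e successful skips
    in a bounded queue and seed the next exponential skip with the gap at the
    head of the queue instead of always starting from 1."""
    def __init__(self, e=4):
        self.gaps = deque([1] * e, maxlen=e)

    def initial_gap(self):
        return self.gaps[0]            # gap at the top of the queue

    def record(self, gap):
        self.gaps.append(max(1, gap))  # a full deque drops its oldest entry

def mrg_skip(items, start, target, skipper):
    """Exponential skip whose first hop uses the remembered gap."""
    n = len(items)
    gap, lo, hi = skipper.initial_gap(), start, start
    while hi < n and items[hi] < target:
        lo, hi, gap = hi, hi + gap, gap * 2
    pos = bisect.bisect_left(items, target, lo, min(hi, n))
    skipper.record(pos - start)        # remember this skip's observed gap
    return pos
```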

3.3.5 Skipping For Streaming Data

Chien et al. [20] mention that it is not possible for any pre-built index to perform faster than sequential-scan-based structural join algorithms for streaming data. This is simply because they cannot send any feedback to the producer modules. As we have shown, our algorithms do not use any pre-built indexes for node skipping.

Therefore, we can easily adapt our techniques to suit the on-the-fly join strategies needed for streaming data, based on the assumption that the incoming streams are in document order. For processing streaming data, we set a fixed buffer size for the stream input, and a page size proportional to the buffer size, thus simulating the same environment we would normally have for skip-joins. In the event the current position is under the high boundary, we just load in more pages from the buffer pool, until it passes the buffer size. Then, we set the current position to the buffer size and do a sanity check on whether we have skipped past the desired ancestor or descendant node. If not, we flush the buffer and load in more data from the stream.

Name       Number of Elements   Size (MB)   Depth
DBLP       3,803,281            160         6
MEDLINE    2,768,743            130         7
XMark      2,921,323            204         12

Table 3.1: Properties of the experimental data sets
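The buffer clamping described in Section 3.3.5 above can be modelled as follows. This is a deliberately simplified, hypothetical sketch: the stream is consumed in chunks, a skip is clamped to the end of the in-memory window, and the buffer is refilled before the sanity check is retried (a linear scan stands in for the skipping functions).

```python
def skip_within_window(window, pos, target_key, refill):
    """Advance pos to the first buffered value >= target_key; whenever the
    skip would run past the end of the window, refill from the stream and
    re-check instead of jumping blindly past unseen data."""
    while True:
        # Clamp the candidate position to the data currently buffered.
        while pos < len(window) and window[pos] < target_key:
            pos += 1
        if pos < len(window):
            return window, pos        # found a value >= target in the buffer
        more = refill()               # load more pages from the stream
        if not more:
            return window, pos        # stream exhausted
        window = window + more

# Hypothetical two-chunk stream: [1, 2, 3] already buffered, [4, 5, 6] pending.
chunks = [[4, 5, 6]]
def refill():
    return chunks.pop(0) if chunks else []

window, pos = skip_within_window([1, 2, 3], 0, 5, refill)
```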

3.4 Experimental Results

In this section, we present our experimental results on the performance of structural join algorithms on both real-world and synthetic XML data sets. We compare the performance of all join algorithms proposed in this chapter with the original Stack-Tree-Desc (STJ-D) algorithm described in [7]. We will then discuss the impact of updating data on our approach in the next section.

3.4.1 Experimental Setup

The experiments were carried out on a machine with dual Intel Itanium 800MHz processors, 1 GB of RAM, and a 40 GB SCSI hard-drive. The machine ran the Debian GNU Linux operating system with the 2.4.20 SMP kernel.

The data sets for our experiments consisted of the data sets from DBLP [1] and MEDLINE [68], and a data set randomly generated by XMark [79]. The statistics of the data sets used for the experiments are detailed in Table 3.1. Table 3.2 summarizes the join algorithms to be compared in our experiments and their shorthand notations, which are referred to in this section.

We implemented the join algorithms using the XPath processor from the SODA XML database engine. For the purpose of maintaining control over the experiment, we disabled all database indexing, and we implemented our join algorithms and the STJ-D algorithm using exactly the same code base in C. For each of the experiments, we scanned the database to filter out all elements which do not fit the required

Notation     Algorithm
STJ-D-Join   Stack-Tree-Join-Desc [7]
AD-Join_e    Skip-Join-AD with exponential skipping strategy
A-Join_e     Skip-Join-A with exponential skipping strategy
D-Join_e     Skip-Join-D with exponential skipping strategy
AD-Join_b    Skip-Join-AD with binary skipping strategy
A-Join_b     Skip-Join-A with binary skipping strategy
D-Join_b     Skip-Join-D with binary skipping strategy

Table 3.2: Notations for algorithms

element name before the structural join. Both the AList and the DList, along with their ordering information, were stored in memory and no swapping to disk was performed throughout the experiment. We only measured the time spent on the structural join algorithm itself.

We defined a set of XPath expressions that capture various access patterns, which are listed in Table 3.3, along with the number of nodes that satisfy each XPath expression. Our experiments joined pairs of the result sets listed in this table together using a structural join. For example, we used the expression A1 ⋈ D1 (as defined in the table) to compute the result of the path expression //dblp//title[.="The Asilomar Report on Database Research."].

3.4.2 Results and Observations

In this section, we compare the performance of different types of structural joins using our proposed skip-join algorithms against the existing STJ-D join algorithm.

Each join query is performed on the AList and DList nodes using the STJ-D algorithm and our proposed skip-join algorithms (i.e., AD-Join, A-Join and D-Join).

The full results are presented in Table 3.4 and Table 3.5. Columns |A| and |D| give the sizes of the two lists, AList and DList, being joined. Column |R| gives the size of the output of the join operations.

Query#  Database  Query                                                     Output Size (# of nodes)
A1      DBLP      //dblp                                                    1
A2      DBLP      //article                                                 128,533
A3      DBLP      //inproceedings                                           240,685
A4      DBLP      /*/*/*                                                    3,424,646
A5      MEDLINE   //MedlineCitation                                         30,000
A6      XMark     //listitem                                                106,508
A7      XMark     //keyword                                                 122,924
A8      XMark     //bold                                                    125,958
D1      DBLP      //title[.="The Asilomar Report on Database Research."]    1
D2      DBLP      //author[.="Jeffrey D. Ullman"]                           227
D3      DBLP      //author                                                  820,037
D4      DBLP      /*/*                                                      375,225
D5      DBLP      //sup                                                     1,155
D6      DBLP      /*/*/*/*/sup                                              50
D7      MEDLINE   //Year                                                    92,624
D8      MEDLINE   //Year[.="2000"]                                          5,426
D9      XMark     //listitem                                                106,508
D10     XMark     //keyword                                                 122,924
D11     XMark     //bold                                                    125,958

Table 3.3: Documents and query expressions used for experiments

As we have mentioned earlier in the chapter, there are three different types of structural joins for XML data: AD-Join returns ancestor-descendant node pairs, A-Join returns matching ancestor nodes only and D-Join returns matching descendant nodes only. For example, in Q1, column A-Join_e gives the time it took to execute the query //article[.//title[.="The Asilomar Report on Database Research."]] using the exponential skipping strategy with an ancestor-only structural join. Column D-Join_e gives the time it took to execute //article//title[.="The Asilomar Report on Database Research."], but this time returning only descendant nodes. Column

                  Set Cardinality                  Time Taken (μs)
Q#   A    D    |A|        |D|      |R|      STJ-D      AD-Join_e  A-Join_e  D-Join_e
Q1   A2   D1   128,533    1        1        66,518     131        139       145
Q2   A3   D2   240,685    227      116      119,747    1,197      1,224     1,186
Q3   A3   D3   240,685    820,037  557,868  450,257    470,990    357,005   313,501
Q4   A1   D4   1          375,225  375,225  197,807    200,628    224       169,168
Q5   A4   D5   3,424,646  1,155    1,155    1,754,825  14,374     14,501    13,984
Q6   A4   D6   3,424,646  50       50       1,742,093  2,796      2,943     2,806
Q12  D1   A2   1          128,533  0        44,349     331        351       348
Q7   A5   D7   30,000     92,624   92,624   72,243     75,152     76,683    48,895
Q8   A5   D8   30,000     5,426    5,426    21,567     8,137      7,776     6,823
Q9   A6   D9   106,508    106,508  39,244   77,193     85,852     84,167    63,065
Q10  A8   D10  125,958    122,924  6,636    117,640    103,772    102,221   98,883
Q11  A7   D11  122,924    125,958  7,485    116,578    103,638    101,965   99,464

Table 3.4: Runtime for structural joins with exponential skipping strategy against STJ-D

AD-Join_e again gives the time of a join on the same query using the exponential skipping strategy, but this time returning ancestor-descendant node pairs.

For queries Q1, Q2, Q5, Q6 and Q8, the reason for the superior performance of the skip-join variants over the STJ-D join is their ability to skip through the ancestor node list quickly to filter out unmatched ancestor nodes. There is a close correlation between the performance speedup of the skip-joins and the ratio of unmatched ancestor nodes. Q12 is a special case where the result of the structural join is empty; in this case, the STJ-D algorithm is outperformed by our skip-join algorithms by two orders of magnitude. This is again due to the effect of skipping descendant nodes.

However, for the rest of the queries listed in Table 3.4, the performance of the skip-joins is only comparable to the STJ-D join algorithm; this is due to the fact that the number of unmatched nodes is small. In the majority of cases, however, our skip-joins still outperform the STJ-D join algorithm. This comparable behavior can be attributed to two factors:

                  Set Cardinality                  Time Taken (μs)
Q#   A    D    |A|        |D|      |R|      STJ-D      AD-Join_b  A-Join_b   D-Join_b
Q1   A2   D1   128,533    1        1        66,518     117        122        118
Q2   A3   D2   240,685    227      116      119,747    2,359      2,434      2,391
Q3   A3   D3   240,685    820,037  557,868  450,257    815,160    1,528,636  1,485,116
Q4   A1   D4   1          375,225  375,225  197,807    205,983    235        167,333
Q5   A4   D5   3,424,646  1,155    1,155    1,754,825  28,833     29,855     29,504
Q6   A4   D6   3,424,646  50       50       1,742,093  3,237      3,246      3,231
Q12  D1   A2   1          128,533  0        44,349     309        380        381
Q7   A5   D7   30,000     92,624   92,624   72,243     114,355    152,286    147,618
Q8   A5   D8   30,000     5,426    5,426    21,567     20,762     26,058     25,253
Q9   A6   D9   106,508    106,508  39,244   77,193     180,529    187,655    109,155
Q10  A8   D10  125,958    122,924  6,636    117,640    337,705    342,628    219,683
Q11  A7   D11  122,924    125,958  7,485    116,578    342,435    338,975    218,537

Table 3.5: Runtime for structural joins with binary skipping strategy against STJ-D

1. The input lists (both the ancestor and descendant lists) for these queries have approximately equal cardinality, and the number of nodes in the result set is large. This means that the number of mismatched nodes is low, and hence the chance of having a large region that can be skipped is also small.

2. When there are large numbers of common nodes between A and D, the two iterators walk in parallel rather than following the "optimal" (for the skipping strategies) zig-zag iteration pattern (i.e., one iterator is fixed as a pivot whilst the other iterator does a large skip).

In the case of Q3 and Q4, the AD skip-join is outperformed by the STJ-D join by a small percentage of approximately 4%. The Q3 query evaluates //inproceedings//author on DBLP; in DBLP, both "inproceedings" and "author" are very frequent, and within every "inproceedings" element there is always a minimum of one "author" element. Therefore, skipping through the ancestor or descendant lists is useless for this query. As a result, the skipping algorithms reduce to the behavior of STJ-D. The extra time taken is due to a number of small redundant skips. Q4 evaluates //dblp//*, which has an extremely small ancestor list of only one element, and an extremely large descendant list (the entire database). Again, in this scenario, skipping is not useful and the extra operations become an overhead that STJ-D does not have. However, as can be seen from the results, the overhead is only 4%, which is still quite acceptable given the gains on other queries. We also note that both the

A-Join and D-Join are actually faster than STJ-D, mainly because the results are only single nodes, and hence there is reduced usage of the stack (depending on the number of input ancestor nodes). Similar results hold for Q7.

The queries Q9, Q10 and Q11 are performed on random data sets created by XMark. The data set is highly nested, with the practically rare and unnatural property that two distinct element names interleave each other multiple times on a single path (e.g., //keyword//bold//keyword//bold). Note that the ratio on Q9 is two, which means that, on average, mismatched descendant nodes and matched nodes interleave each other. This pattern makes skipping difficult, and hence the algorithms yield similar performance to that of an STJ-D join. It is interesting to see that D-Join does perform slightly better on random and highly nested data, because the D-Join algorithm does not interact with the stack.

So far, all exponential skip-joins have similar performance on all queries, with the exception of Q4, where the A-Join outperforms the other two skip-join techniques by a significant margin. This is because for Q4 all DList nodes are matched under the same ancestor node, and therefore almost all descendant nodes are skipped using BD-Skip.

We can also see from our experiments that stack manipulation does add notable overheads to the stack-based structural joins. For example, in Table 3.4, in the queries where the STJ-D join outperforms both the AD-Join and the A-Join, the D-Join still outperforms all other types of skip-joins. This is due to the fact that no stack is maintained in D-Join, because only matching descendants are returned, whereas a stack has to be maintained in both AD-Join and A-Join.

Let A_r and A_s be the sets of nodes in A that will and will not be included in the result set, and similarly let D_r and D_s be the corresponding subsets of D. If we use an AD-Join as an example, on the basis that the skipping strategy of all skipping functions is perfect, i.e., there are no unnecessary skips, then the minimum bound of the runtime cost is O(|A_r| + |D_r| + |R|), which, if no skipping occurs, becomes O(|A| + |D| + |R|), as in that case A_r = A and D_r = D. In other words, the worst-case performance of our proposed skip-joins is the same as that of the STJ-D algorithm.

In Table 3.5, the last three columns show the execution times of AD-Join, A-Join and D-Join using the binary skipping strategy instead of the exponential skipping strategy. From the results, the performance for Q1, Q2, Q5, Q6, Q8 and Q12 matches exponential skipping. However, with the exception of Q1, all binary skip-joins are slower than exponential skip-joins. This is because of the log n nature of binary search; that is, even for an AList or DList with small gaps between matching nodes, it still costs log n skips to search for the next node. For Q1, however, the binary skipping strategy permits larger jumps through the AList, and hence has better performance than exponential skipping.

3.4.3 Summary

To summarise our experimental results, our proposed skip-join algorithms performed very well for Q1, Q2, Q5, Q6, Q8 and Q12, where the result node sets returned are small in size and there are large differences in size between the AList and DList. Compared to the STJ-D algorithm, we were able to achieve performance improvements of up to three orders of magnitude for these queries. In general, the times for A-Join and D-Join are always faster than for the STJ-D algorithm, and in most cases faster than AD-Join. Therefore, we recommend that the query optimizer should prefer A-Join and D-Join for structural joins. The AD-Join still performs very closely to the STJ-D join for most queries where large result sets are returned, and in the case of Q1, Q2, Q5, Q6, Q8 and Q12, an AD-Join was still able to outperform an STJ-D join by several orders of magnitude. The experimental results also show that the exponential skipping strategy outperforms the binary skipping strategy, and that therefore we should adopt the exponential skipping strategy as the default for future implementations.

3.5 Conclusions

This chapter has focused on improving the algorithms for structural join, a core operation for XML query processing. We presented a simple, yet efficient, improvement to the work of Al-Khalifa et al. [7], which skips unnecessary nodes in the ancestor and descendant lists. In contrast to [20], our method does not require any auxiliary index structure and hence is significantly easier and cheaper to maintain. It can also be implemented in non-database applications such as an XSL processor, which does not normally have a built-in B-Tree index, as well as in a streaming XML data processor.

Furthermore, informal justifications of the effect of updates on the structural join problem have been presented. Since the ordering scheme we used in this chapter is based on the use of preorder and postorder identifiers, the update cost is identical to that in the analyses and experiments performed in [11] and our other work [35]. By employing the use of gaps in a theoretically sound fashion, the amortized update cost is much lower than the update cost in other tree-based labeling schemes such as [50].

Finally, extensive experiments on both real-world and synthetic data have shown that our extension improves the performance of the state-of-the-art structural join algorithm [7] by up to several orders of magnitude.

We believe that there is still a wide range of interesting research problems in this area. In particular, we are currently investigating the extension of our work to produce a query optimization framework in the presence of ordering. Similar work in this area includes [16, 59].

Algorithm 3.8 Adaptive structural join that returns ancestor-descendant node pairs (AD-Join)

Skip-Join-AD(A, D)
 1: L_a ← gap queue of size e, l_a ← 0
 2: L_d ← gap queue of size e, l_d ← 0
 3: while a ≤ |A| and d ≤ |D| do
 4:   if FOLLOWING(TOP(Stack), A[a]) ∧ FOLLOWING(TOP(Stack), D[d]) then
 5:     POP(Stack)
 6:   elif PREORDER(A[a]) < PREORDER(D[d]) then
 7:     PUSH(A[a], Stack)
 8:     a ← A-Skip(a, D[d], A, l_a)
 9:     l_a ← l_a + 1
10:     l_a ← L_a[0] if l_a > |L_a|
11:   else
12:     APPEND((s, D[d]), R), ∀s ∈ Stack
13:     if |Stack| > 0 then
14:       d ← d + 1
15:     else
16:       d ← D-Skip(d, A[a], D, l_d)
17:       l_d ← l_d + 1
18:       l_d ← L_d[0] if l_d > |L_d|

Chapter 4

Intrinsic Skew Handling in Sort-Merge Join

Eventually, all things merge into one, and a river runs through it.

— Norman Maclean (1902-1990)

4.1 Introduction

In relational algebra, selection (σ), projection (π) and cross-product (×) are three of the most basic operators required to express retrieval requests on multiple relations. Real database management systems implement these relational operators using the additional join (⋈) operator [22], which combines all three operators using pipelining. This reduces the number of steps that generate temporary files on disk. These temporary files can take up quadratic space relative to the join relations. This means that there is a need to perform these join operations efficiently, and any improvement in join operator performance will be beneficial to all database applications.

Block nested-loop join [74] works in all join cases and provides consistent performance. However, it does not avoid the quadratic lookup over the join relations. In previous research, numerous join algorithms have been proposed; they can be categorized as single-loop join, hash join [29] and sort-merge join [14]. Hash join is in fact a subset of single-loop join, because it also uses the concept of an access path to retrieve the matching tuples. However, it deserves its own category due to the significant attention it has received.

Sort-merge join addresses the drawbacks of hash join. Firstly, sort-merge join does not rely on an access path and does not require pre-built indices, so it can be performed on any non-prime attribute. Secondly, sort-merge join demonstrates excellent performance in cases where the join attributes are already sorted. Since sort-merge join accesses data linearly during the merge phase, it does not suffer from any of the disadvantages (such as loss of cache locality) that affect hash join. Thirdly, when a query requires multiple joins, sort-merge join performs better than hash join because the results are already sorted on the join attributes after the first join.

However, data skew occurs in real data and affects all join algorithms except the simple nested-loop join. There are two types of skew: partition skew and intrinsic skew. Partition skew is implementation dependent and occurs when the data is not evenly distributed over the index; all single-loop join algorithms suffer from this. For example, when most of the data is hashed to a small subset of hash buckets in a hash join (due to adverse data in relation to the hash function), the constant time complexity of the lookup degrades to linear time in the worst case.

This also occurs with parallel hash join, which uses a data partitioning scheme to spread the join operation across multiple machines. Partition skew can leave certain machines with higher loads while other machines are idle. As partition skew affects hash join, many papers [30, 46, 87] address the issue of skew in hash joins.

Unlike partition skew, intrinsic skew is unavoidable, because it is the data itself that skews towards a smaller subset of the possible values. This only happens when the join attributes contain non-prime attributes. For example, an attribute age skews towards the range from 0 to 100 instead of the entire integer domain. Even within that subset of values, it may have an uneven distribution. Intrinsic skew is unavoidable in all hash join algorithms because identical values will be hashed to the same bucket. This results in a significant loss in performance, and such hash joins are generally avoided in the presence of significant intrinsic skew. Hence, sort-merge join is preferred.

Unfortunately, sort-merge join also suffers from intrinsic skew. However, the effects of intrinsic skew on sort-merge join have not been studied thoroughly. Most sort-merge algorithms in database textbooks ignore data skew completely. Most textbooks state that, for two relations with cardinalities |L| and |R|, the time complexity of sort-merge join is O(|L| log |L|) + O(|R| log |R|) + O(|L| + |R|). They also give similar run times for block-based sort-merge joins. In fact, the time complexity of the merge phase alone can be as bad as O(|L| × |R|). This occurs when there is significant skew in both relations, making nested-loop join look desirable. The most recent research on intrinsic skew handling in sort-merge join is by Li et al. [88], which contains several improvements; this chapter further improves on that work by adding extra techniques.

Recently, relational database systems have been used for other non-traditional data, such as temporal data and semistructured data such as XML [15]. These database systems make use of special types of joins, such as temporal join and structural join. These joins have a similar behavior to band join, where intrinsic skew is the norm. Thus, improving intrinsic skew handling for sort-merge join becomes very attractive and a significant problem to solve.

This chapter presents a study of techniques for dealing with high intrinsic skew in sort-merge join in relational database systems. Our main contributions are:

• We define all possible scenarios of intrinsic skew in joins and list the most efficient method to process each scenario using sort-merge join.

• We present several simple extensions which, without using extra indices, are able to minimize the performance impact on sort-merge join with high intrinsic skew in both relations.

• Using the notion of skipping strategy from the previous chapter, we present an extension to process sort-merge join without having to scan the entire list during the merge phase.

4.2 Formal Definitions

Join A two-way equijoin operator involves two relations, L(A) and R(B), which are called the outer relation and inner relation respectively. The relations have cardinalities |L| and |R|, with arities a and b. X ⊆ A and Y ⊆ B are the join attributes, where ∀x_i ∈ X, ∀y_i ∈ Y, dom(x_i) = dom(y_i). For simplicity of presentation and without loss of generality, we assume there are no extra projections, selections or function calls on either relation. Thus all tuples are join candidates.

The join operation is L ⋈_{X=Y} R and the arity of the result relation is a + b. We also define two memory buffers, L' and R', to hold the tuples of relations L and R.

4.2.1 Intrinsic Skew

Figure 4.1 shows the types of intrinsic skew: outer skew, inner skew and four cases of combined skew.

Value Packet: Using the same terminology as Graefe [43], a value packet is defined as a maximal run of contiguous tuples whose join attributes have identical values. A single value packet can span multiple disk blocks, and a disk block can contain multiple value packets.
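For concreteness, the value packets of a relation sorted on its join attribute could be enumerated like this (a small illustrative helper, not from the thesis):

```python
from itertools import groupby

def value_packets(sorted_tuples, key):
    """Split a relation sorted on the join attribute into value packets:
    maximal runs of contiguous tuples sharing the same join-attribute value."""
    return [list(run) for _, run in groupby(sorted_tuples, key=key)]

# Example relation sorted on its first (join) attribute; the three tuples
# with value 2 form a single value packet.
rows = [(1, "a"), (2, "b"), (2, "c"), (2, "d"), (5, "e")]
packets = value_packets(rows, key=lambda t: t[0])
```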

Outer Skew: Outer skew is the first type of intrinsic skew, where all value packets occur within the outer relation. This scenario occurs when the join attributes of the outer relation contain non-key attributes, while the join attributes of the inner

Figure 4.1: Types of intrinsic skew in sort-merge join

relation are a subset of the key attributes. For example, a user would like to express a join under the following conditions:

R(a, b, c) ⋈_{b=d, c=e} S(d, e, f)

Both join attributes d and e in relation S are key attributes. However, join attribute c in relation R is a non-key attribute. Therefore it is possible for value packets to occur in the outer relation.

Inner Skew: The second type of intrinsic skew is similar to outer skew, but the value packets occur only in the inner relation. For example:

S(d, e, f) ⋈_{d=b, e=c} R(a, b, c)

Combined Skew: A scenario where value packets occur in both relations. For example:

R(a, b, c) ⋈_{c=f} S(d, e, f)

4.3 Sort-Merge Joins

4.3.1 Traditional Block-based Sort-Merge Join

This section shows the intrinsic skew scenarios that work with a traditional sort-merge join without skew handling, and a sort-merge join with skew handling that handles all intrinsic skew scenarios but without optimizations.

Algorithm 4.1 Traditional block-based sort-merge join. Works when neither relation has skew, or when only inner skew is present.

Sort-Merge-Join(L, R)
 1: sort relation L using buffer L' on join attributes X ∈ L
 2: sort relation R using buffer R' on join attributes Y ∈ R
 3: Next-Block(l, L, L')
 4: Next-Block(r, R, R')
 5: while l ≤ |L'| and r ≤ |R'| do
 6:   if L'[l](X) < R'[r](Y) then
 7:     Next-Tuple(l, L, L')
 8:   elif L'[l](X) > R'[r](Y) then
 9:     Next-Tuple(r, R, R')
10:   else   // successful join
11:     output L'[l] ∘ R'[r]
12:     Next-Tuple(r, R, R')

Next-Tuple(b, B, B')
1: b ← b + 1
2: if b = |B'| then Next-Block(b, B, B')

Next-Block(b, B, B')
1: if B.cursor < |B| then
2:   read block B.cursor of relation B into buffer B'
3:   B.cursor ← B.cursor + 1

The simple block-based sort-merge join is presented in Algorithm 4.1. First, both relations are sorted. Next, both relations are read block by block from disk into the buffers. Within each relation, one memory pointer is maintained for scanning the buffer. The algorithm increments the pointer that points to the tuple with the smaller join attribute. If both tuples have the same join attributes, the algorithm joins the two tuples to form a result tuple. Obviously, with no skew, the merge phase only needs to scan the two relations once. This algorithm assumes no skew on either relation. However, notice that it only increments the pointer on the inner relation

(line 12) when it successfully merges two tuples. This makes the algorithm also work when the skew occurs only in the inner relation R (inner skew). For those cases, the query optimizer should pick this simple but efficient block-based sort-merge join. By changing line 12 to increment the outer relation instead, the same algorithm works when only outer skew is present. By observation, as there are no repeated reads of any tuple in either relation using Algorithm 4.1, the disk read cost is |L|/|L'| + |R|/|R'|.
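The in-memory behaviour of Algorithm 4.1 under inner skew can be sketched as follows. This is our own Python rendering, not the thesis code: advancing only the inner pointer on a match lets each outer tuple consume the whole inner value packet.

```python
def merge_join_inner_skew(L, R, lkey, rkey):
    """Sort-merge join that is correct when only the inner relation R
    contains duplicate join values (inner skew); the outer relation's
    join attribute is assumed to be unique."""
    L, R = sorted(L, key=lkey), sorted(R, key=rkey)
    out, l, r = [], 0, 0
    while l < len(L) and r < len(R):
        if lkey(L[l]) < rkey(R[r]):
            l += 1
        elif lkey(L[l]) > rkey(R[r]):
            r += 1
        else:
            out.append((L[l], R[r]))
            r += 1        # advance only the inner pointer on a match
    return out

# Outer relation with unique keys; inner relation with a duplicated key 2.
L = [(1,), (2,), (4,)]
R = [(2, "x"), (2, "y"), (4, "z")]
pairs = merge_join_inner_skew(L, R, lambda t: t[0], lambda t: t[0])
```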

4.3.2 Sort-Merge Join with Combined Skew Handling

While the simple block-based sort-merge join in Algorithm 4.1 fails when both relations have skew, Algorithm 4.2 processes the query correctly in a combined skew scenario. Algorithm 4.2 marks the block position and pointer position of the inner relation on a successful merge. It then produces a Cartesian product of the tuples of both relations with the same join attributes by first joining all the subsequent inner relation tuples with the same join attributes and then returning to the marked block position and marked pointer position. Next, it increments the pointer of the outer relation. In this way, the next tuple from the outer relation can be merged with the same tuples in the inner relation again if any skew occurs. However, the cost of the merge is no longer as simple as O(|L| + |R|); it depends on the type of combined skew.
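The mark-and-rewind behaviour can be sketched in memory as follows (a simplified Python rendering of our own; list indices stand in for the block and pointer positions of Algorithm 4.2):

```python
def merge_join_combined_skew(L, R, lkey, rkey):
    """Sort-merge join that handles value packets in BOTH inputs by marking
    the start of the matching inner run and rewinding to it for every outer
    tuple carrying the same join value (Cartesian product within packets)."""
    L, R = sorted(L, key=lkey), sorted(R, key=rkey)
    out, l, r = [], 0, 0
    while l < len(L) and r < len(R):
        if lkey(L[l]) < rkey(R[r]):
            l += 1
        elif lkey(L[l]) > rkey(R[r]):
            r += 1
        else:
            mark, v = r, lkey(L[l])   # remember the start of the inner packet
            while l < len(L) and lkey(L[l]) == v:
                r = mark              # rewind for the next outer tuple
                while r < len(R) and rkey(R[r]) == v:
                    out.append((L[l], R[r]))
                    r += 1
                l += 1
    return out

# Combined skew: the value 2 is duplicated in both inputs.
pairs = merge_join_combined_skew([1, 2, 2, 3], [2, 2, 4],
                                 lambda x: x, lambda x: x)
```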

There are four subtypes of combined skew, which are shown in Figure 4.1. In case 1 of combined skew, all value packets occur within the same block. Assuming the best case scenario, where no value packet lies across a block boundary, the merge phase of Algorithm 4.2 incurs |L|/|L'| + |R|/|R'| disk reads, meaning no extra disk reads occur. In case 2, some of the value packets on the outer relation span across multiple blocks,

Algorithm 4.2 Traditional block-based sort-merge join with skew handling; works when both relations contain skew.

Sort-Merge-Join-Skew(L, R)
 1: sort relation L using buffer L' on join attributes X ∈ L
 2: sort relation R using buffer R' on join attributes Y ∈ R
 3: Next-Block(l, L, L')
 4: Next-Block(r, R, R')
 5: while l ≤ |L'| and r ≤ |R'| do
 6:   if L'[l](X) < R'[r](Y) then
 7:     Next-Tuple(l, L, L')
 8:   elif L'[l](X) > R'[r](Y) then
 9:     Next-Tuple(r, R, R')
10:   else   // successful join: mark the inner position and merge the value packets
11:     ⋮

but no value packets on the inner relation span across multiple blocks. Interestingly, there is also no overhead in this case for Algorithm 4.2.

In case 3, the value packets span across multiple blocks on the inner relation. In case 4, the value packets span across multiple blocks in both relations. Both cases 3 and 4 degrade the performance of the merge phase significantly, because every tuple in a value packet of L is re-read for every block in the corresponding value packet of R. As a single join usually consists of a mixture of the four cases, it is impossible to quantify the average time complexity of Algorithm 4.2. The time complexity also depends on the buffer size. However, in the worst case, the time complexity approaches that of nested-loop join as more of cases 3 and 4 occur.

Figure 4.2: Examples of combined skew in sort-merge join. (a) An example of multiple value packets in both relations; (b) an example scenario where skipping can be performed.

Obviously, the degradation from linear to quadratic in the merge phase is not desirable, because the combination with the time complexity of sorting makes sort-merge join perform worse than nested-loop join.

4.4 Improvements

In this section, we show several improvements that reduce the number of disk reads under significant intrinsic skew in a combined skew scenario.

4.4.1 Localized Cartesian Product

Notice that although Algorithm 4.2 handles intrinsic skew, it re-reads the whole value packet in R for every tuple in the value packet in L. Thus, the same pair of outer and inner blocks can occur many times in one join. First, the algorithm should be extended so that it performs the Cartesian product of all value packet tuples locally within two blocks before loading the next block. Using Figure 4.2(a) as an example, a successful merge begins when the pointers l and r point to tuples (ov1,t1) and (iv1,t1). First we increment the r pointer as usual until the end of iv1. Instead of loading the next block iv2, we increment l to (ov1,t2) and merge from the beginning of iv1 again, until l reaches the end of ov1. Now all tuples in ov1 have been merged with iv1. Therefore, we can load the next block iv2 by incrementing r, setting l back to the beginning (ov1,t1), and start merging the tuples in ov1 with iv2.

The following shows the sequence of merges: (ov1,t1-2)×(iv1,t1-3), … (ov1,t1-2)×(ivn,t1-2), (ov2,t1-6)×(iv1,t1-3), … (ov2,t1-6)×(ivn,t1-2), …

Using localized block reads, we significantly decrease the number of quadratic block reads from |L'|m × |R'|n to mn.
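The block-pairing order described above can be illustrated with the following Python sketch. The list-of-lists block representation and the function name are assumptions for illustration only, not the thesis's implementation; the point is that each inner block is loaded once per outer block rather than once per outer tuple.

```python
# Sketch: localized Cartesian product of two value packets that share the
# same join key. Each value packet is a list of blocks (lists of tuples).
# Each outer block is paired with each inner block exactly once, so the
# number of inner block reads drops from (outer tuples x n) to m x n.

def localized_cartesian_product(outer_blocks, inner_blocks):
    result = []
    reads = 0
    for outer in outer_blocks:          # m outer block loads
        for inner in inner_blocks:      # n inner block loads per outer block
            reads += 1                  # one (simulated) disk read of 'inner'
            for lt in outer:            # merge all tuples of the two blocks
                for rt in inner:        # locally, in memory
                    result.append((lt, rt))
    return result, reads

ov = [["o1", "o2"], ["o3"]]             # m = 2 outer blocks
iv = [["i1", "i2"], ["i3"]]             # n = 2 inner blocks
pairs, reads = localized_cartesian_product(ov, iv)
print(len(pairs), reads)                # prints: 9 4
```

With 3 outer tuples and 3 inner tuples, all 9 pairs are produced while each inner block is read only m × n = 4 times, instead of once per outer tuple.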

4.4.2 Rocking-Scan Within Value Packets

We realize that the Cartesian product of all tuples is unavoidable, as it is the intended behaviour: those tuples are part of the join result. However, we can still further minimize the number of disk reads. Another technique that applies to block nested-loop join, and can equally be applied to the merging of two value packets, is called rocking-scan [55]. Instead of re-reading the value packets of R from the beginning (iv1,t1), we read the blocks of the value packets backwards from ivn back to iv1, saving the extra re-read of ivn. In the third scan of the value packets of R, we scan forward again and save a re-read of iv1. Basically, the scan of the inner relation proceeds in a zigzag manner and thus saves one block per scan. The following shows the sequence of merges:

(ov1,t1-2)×(iv1,t1-3), … (ov1,t1-2)×(ivn,t1-2), (ov2,t1-6)×(ivn,t1-2), … (ov2,t1-6)×(iv1,t1-3), (ov3,t1-6)×(iv1,t1-3), … (ov3,t1-6)×(ivn,t1-2), …

(ovm,t1-3)×(iv1 or n,t1-3 or 1-2), … (ovm,t1-3)×(ivn or 1,t1-2 or 1-3)

It is possible to use rocking-scan at the tuple level as well as the block level, but this only complicates the algorithm and does not offer any real improvement. Notice that if m is an even number, when the rocking-scan terminates, buffer R' will hold iv1 instead of ivn, so we must reload ivn again; this neutralizes the block saved by the last scan. In general, however, using rocking-scan we can reduce the disk reads by m − 1.
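The zigzag read order can be sketched as follows; this is an illustrative Python sketch (the function name and the block-index model are assumptions), showing that each direction reversal reuses the inner block already held in the buffer.

```python
# Sketch: the rocking-scan read order over n inner blocks for m outer blocks.
# After the first forward pass, each subsequent pass starts from the inner
# block already in the buffer, scanning alternately backward and forward and
# saving one block read per pass (m - 1 reads in total, ignoring the even-m
# reload discussed in the text).

def rocking_scan_order(m, n):
    order = []
    forward = True
    for _ in range(m):                  # one inner pass per outer block
        scan = list(range(n)) if forward else list(range(n - 1, -1, -1))
        if order and order[-1] == scan[0]:
            scan = scan[1:]             # block already buffered: no re-read
        order.extend(scan)
        forward = not forward
    return order

m, n = 3, 4
order = rocking_scan_order(m, n)
print(order)                # prints: [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
print(m * n - len(order))   # prints: 2   (blocks saved = m - 1)
```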

4.4.3 Shifting Buffer Offset

In most cases, a multi-block value packet does not fully occupy its head and tail blocks. Thus, we do not utilize the buffer efficiently, and it is sensible to treat these blocks separately. There are several issues we must consider.

• The head blocks of both value packets in the outer and inner relations are already in the buffer.

• The query engine does not have a priori knowledge of the number of blocks a value packet occupies until it does a first scan of the value packet. Thus we cannot perform rocking-scan before that happens.

• The end blocks of both value packets should be the last blocks that are read into the buffer, so that we need not read them again to continue the merge phase.

• The percentage of tuples that the value packet occupies in the head and tail blocks of both value packets.

Since the rocking-scan is done on the inner relation, there will be no extra reads for the outer relation. If we know that the combined size of the value packet in the head and tail blocks is less than one block, we can optimize by changing the block offset of the inner relation to the beginning of the value packet. We can do this either after the first scan of the inner relation in the rocking-scan, or before the first scan; both options have their pros and cons.

However, any block re-read is a duplicate read, as we can only use heuristics to predict the existence of multi-block skew. If there is no skew in the next block, or the skew spans only two blocks, the extra re-read is a penalty in itself. If there is significant skew in the outer value packet, then in general we always adjust the offset of the inner relation, because re-offsetting saves m − 1 blocks in general and requires an extra partial block read only in the worst case.

In theory, if we treat the re-offset of the first read as constant, then performing rocking-scan on the inner relation, along with re-reading, makes the number of disk reads optimal. This is because the optimal way to perform a Cartesian product is to perform a localized Cartesian product while using rocking-scan. In practice, re-offsetting is risky, as any re-read without a priori knowledge may result in extra reads.

4.4.4 Heuristic for Significant Skew

To choose a join strategy for sort-merge join with skew, we need to identify the type of intrinsic skew present. However, we cannot afford to scan the blocks to determine the size of the value packet before making a decision, so some simple heuristics are required. A reasonable assumption is that multi-block skew occurs when the value packet touches the end of a block. The occupancy of the head and tail blocks of a value packet can also be used as a decision factor for the re-offset strategy: if the occupancy is less than half, it is very likely that re-offsetting the inner relation will yield one less block in the value packet. For example, in Figure 4.2(a), instead of loading from the beginning of the block, we can re-offset to the beginning of the value packet, so that the value packet in the inner relation occupies 3 blocks instead of 4.
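The heuristic can be expressed as a pair of simple predicates. The following is an illustrative sketch only; the predicate names, inputs, and the one-half threshold drawn from the text are stated as assumptions, not the thesis's actual code.

```python
# Sketch: heuristics for detecting multi-block skew and choosing re-offset.
# A value packet that touches the end of its block is assumed to continue
# into the next block; re-offset is chosen when the value packet occupies
# less than half of the head block, so re-offsetting the inner relation is
# likely to drop one block from the value packet.

def looks_multi_block(packet_end, block_size):
    return packet_end == block_size        # packet touches the block boundary

def should_reoffset(head_occupancy):
    return head_occupancy < 0.5            # less than half the head block used

assert looks_multi_block(packet_end=100, block_size=100)
assert not looks_multi_block(packet_end=60, block_size=100)
assert should_reoffset(0.25) and not should_reoffset(0.8)
print("heuristics ok")
```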

4.5 Skipping Join Candidates

Although value packets with identical join attributes on both relations degrade the performance of the merge phase, a value packet on one relation without a matching value packet on the other relation can actually improve performance. Here we try to use those intrinsic skews to our advantage. One might assume that the merge phase has to read the entire relation, but in fact this is not essential. In some cases, it is possible to exploit skew to improve performance by skipping at both the tuple level and the block level, without any known statistics on either relation. Figure 4.2(b) shows a scenario in which we can use skipping to improve the performance of the merge phase.

4.5.1 Current Commercial Database System Approach

Most commercial DBMSs have specific code fragments to deal with advancing the tuple pointer in merge join. If the outer tuple was advanced last, it is assumed that there is a higher chance that the next outer tuple will have a smaller join attribute value, so the outer tuple is checked first. Similarly, if the inner tuple was advanced last, the inner tuple is checked before the outer tuple. For example, PostgreSQL uses this heuristic to determine which tuple to check in the next step.

4.5.2 Exponential-Then-Binary Skipping

We can take this idea further, by assuming that if we have already advanced n contiguous tuples in one relation, we can advance another n tuples. If we skip over the target tuple, we do a binary search within the last n tuples, as shown in Figure 4.3. It takes log n steps to reach the last n tuples, and another log n steps for the binary search; in theory, it takes precisely 2 log n steps to walk across the linear run of n tuples. In practice, we also need to take the buffer size into consideration. This approach is similar to that of the previous chapter, but much simpler, as there is no stack involved.
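The exponential-then-binary advance can be sketched as a galloping search over the join keys of one relation. This is an illustrative sketch (the flat key array abstracts away blocks and buffers, which the text notes must also be considered in practice):

```python
import bisect

# Sketch: exponential-then-binary ("galloping") advance of the tuple cursor.
# From position 'lo', probe at exponentially growing offsets until the join
# key at the probe is >= target, then binary search inside the last doubled
# range; crossing a run of n tuples costs about 2*log2(n) comparisons
# instead of n.

def gallop(keys, lo, target):
    step = 1
    while lo + step < len(keys) and keys[lo + step] < target:
        step *= 2                                   # exponential skipping
    hi = min(lo + step, len(keys) - 1)
    # binary search within the last (at most) doubled range
    return bisect.bisect_left(keys, target, lo + step // 2, hi + 1)

keys = [1] * 1000 + [5, 7, 9]
print(gallop(keys, 0, 5))   # prints: 1000 (first tuple with join key >= 5)
```

Here the cursor crosses a run of 1000 duplicate keys in roughly 2 log2(1000) ≈ 20 comparisons rather than 1000.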

4.5.3 Check Last Tuple Before Reading Next Block

Although we intend to save processing time, the main concern is still disk access. Therefore, at any point during skipping, before we jump to a tuple in the next block, we read the final tuple of the current block to ensure that we need not re-read the block again, as the disk penalty is undesirable.

Figure 4.3: Example of exponentially increasing then decreasing range during skipping

4.5.4 Aggressive and Conservative Strategy of Skipping Blocks

When the range of skips exceeds the size of a block, we might consider skipping one or multiple blocks entirely. This makes the best case O(log |L| + log |R|) reads. For a more conservative strategy, one can load all blocks of the relation as usual, but read the last tuple instead of the first tuple, so that we do not have to traverse the block in memory. This approach saves processing time but not disk read time. Figure 4.4 shows an example of aggressive skipping and conservative skipping.

If one chooses to skip entire blocks and overskips to a block b, we need to read the previous blocks. We might not want to do a binary search at the block level, as this increases the number of block reads. Instead, we read linearly from the last skipped point, i.e. using conservative skips for the binary search. With the aggressive strategy, in the worst case, if we assume there is no disk read gain from the exponential skipping, there will be one extra read (block b) when necessary. However, the extra read only happens when 2 log n > n, which holds only for n = 3. Therefore, choosing between conservative and aggressive skipping depends purely on whether one is willing to accept the chance of that particular penalty disk read. Aggressive skipping definitely performs better than conservative skipping under significant skew.

(a) Aggressive skipping strategy. (b) Conservative skipping strategy.

Figure 4.4: Example of skipping strategy

4.5.5 Avoid Disk Penalty on Aggressive Skipping Strategy

However, even that extra penalty can be avoided. We can use half of the buffer when we skip, instead of the full buffer. When we overskip to a block b, we keep the overskipped block b in one half of the buffer, while using the other half for the traditional merge phase until it catches up with block b. Then we can utilize the full buffer again.

4.6 Merge Phase With Multiple Runs

So far we have assumed that both the outer and inner relations are sorted in a single list of blocks. However, in a commercial DBMS that uses pipelined query processing, if a relation has to be sorted first, which is a must for a skewed relation since the join attribute is not a primary key, then the last step of the sort phase is usually combined with the merge phase to save an extra write and read. Although subsequent joins do not have this problem, as their input is already sorted, we need to tune our techniques to suit a merge phase with multiple runs so that they can be used in wider scenarios. The approach of Li et al [88] can be directly applied to our improvements, so we do not discuss this in detail.

4.7 Performance Evaluation

In this section, we test how intrinsic skew affects the original sort-merge join algorithm and our improved versions of it. In all experiments, we use only 16MB of main memory, and we generate relations of size 128MB with varying amounts of intrinsic skew.

Figure 4.5: Sort-merge join with no skew on both relations

4.7.1 No Skew

Figure 4.5 shows the performance of the different improvements when no skew occurs. All improvements have a slight overhead. However, as the absence of skew can easily be detected from the attribute type before the join, the traditional sort-merge join is still preferred in this case.

4.7.2 Combined Skew

In this experiment, a relation with 1% skew denotes that 1% of the tuples have duplicates.

Figure 4.6 shows the improvements over the traditional sort-merge join with skew handling (SMJS). The performance of SMJS degrades significantly as the amount of skew increases, whereas the extra improvements only show an advantage when a large amount of skew occurs.

Figure 4.6: Fixed relation size with different amounts of skew

4.8 Conclusions

In this chapter, we improved on existing techniques that deal with intrinsic skew in non-parallel sort-merge join. Sort-merge join is the primary operation used for structural joins in relational database systems, but the results also benefit all database applications that utilize sort-merge join. We first generalized the algorithm to handle significant skew. Secondly, we achieved better than a linear scan in certain skew cases, and we showed that the number of blocks read by the algorithm is optimal under skew. Further possible investigations might include using extra domain information, such as statistics and histograms, to further improve the skipping and the prediction of heavy intrinsic skew.

Chapter 5

Maintaining Succinct XML Data

To be brief is almost a condition of being inspired. — George Santayana (1863-1952)

5.1 Introduction

The popularity of XML as a data representation language has produced a wealth of research on efficiently storing and querying tree structured data. As the amount of XML data available increases, it is becoming vital to be able to not only query this information quickly, but also store it compactly. We thus turn to the problem of finding a succinct representation for XML: a space-efficient representation of the data structure which also maintains low access costs for all of the desired primitive operations for data processing. The flexibility of XML makes finding a scheme which satisfies both of these requirements at the same time extremely challenging. There are numerous reasons to maintain such a compact XML representation on secondary storage:

• Reducing space requirements improves cache locality: Even in the current environment of enormous secondary storage capacities, reducing the space requirements of native XML databases is an important goal. A typical approach to representing XML in such databases is to keep at least four pointers per node: to the parent, first child, and immediate siblings. This approach can also be found in many XML tools such as libxml. In the standard computational model, where a pointer takes O(lg n) bits¹, using the above approach to represent the topology of n nodes requires Θ(n lg n) space. For large XML documents, this representation becomes infeasible for many applications, particularly as the hidden constant in the space bound is relatively high. Furthermore, using more space also reduces cache locality and has an adverse impact upon query performance.

• Indirection is expensive: There has been a large amount of work on the succinct representation of trees [38, 47, 48, 71, 72, 75, 76, 77], many of which come within a factor of the optimal lower bound on space. However, to achieve these lower bounds generally requires a significant amount of address indirection. Such schemes are not suitable for secondary storage due to the expensive cost of a random disk seek, which will generally be required upon each indirection. In general, there is a trade-off between space usage and indirection, which we will optimize for secondary storage devices in this chapter.

When looking for a succinct storage scheme for XML, there are many important issues that need to be addressed:

• It must support fast navigational operations: Many XML applications, such as collaborative document editing systems, depend upon efficient tree traversal, using a standard interface such as DOM. Halverson et al [45] demonstrated that a combination of navigational and structural join operators is most effective for evaluating queries. Hence, it is imperative that the storage scheme supports fast traversal of the XML tree, in all possible directions, preferably in constant or near constant time. Previous work, such as that of Zhang et al [98], has addressed the issue of succinctly representing XML, but at the cost of linear time navigational operations, which is not acceptable for many practical applications. Our structure efficiently supports tree navigation primitives in O(lg n / lg lg n) time, and also includes support for efficient structural joins.

¹In this chapter, lg n denotes the base 2 logarithm of n.

• It must support efficient insertions and deletions: Several papers address the space issue by storing XML in compressed form [17, 62, 70, 83]. They support path expression queries or fast navigational access, but do not allow efficient updates, which can be a critical concern in many real database applications. In this chapter, we provide a scheme which allows near constant time updates in practice, with a theoretical worst case time of O(lg² n).

• It must support efficient join operations: Current query optimization techniques for XML, such as the work of Halverson et al [45], make heavy use of the structural join [7], which relies on a constant time operator to determine the ancestor-descendant relationship between two nodes. Thus, any general XML storage scheme should also support such an operator in near constant time. Our scheme supports ancestor-descendant queries in O(lg n / lg lg n) time.

• It must be practical: Many succinct tree representation schemes are elegant theoretical structures that unfortunately do not translate well into practice. Thus, while theoretical guarantees are important for any proposed structure, practical considerations should not be forgotten. In this chapter, we focus on developing a practical storage scheme, using values that fit the natural machine word size, block size and byte alignment, to allow our scheme to be used in real-world database systems.

• It must be simple: Ideally, as with B-trees, the basis of the data structure should be simple and clean enough to be used as material for an undergraduate course. Our scheme, while both extremely compact and efficient, is also amenable to simple implementation.

• It should separate the topology, schema and text of the document: All XML query languages select and filter results based on some combination of the topology, schema and text data of the document. To allow efficient scans over these parts of the document, it is natural to find a representation that partitions them into separate physical locations.

• It should permit extra indices: As different applications generally need to add specialized indices over their data, general purpose database systems should use a storage representation which is flexible enough to allow individual users and applications to create extra indices with ease. This means that the scheme must provide simple, efficient, and stable means of referencing items stored using the scheme.

This chapter presents a data structure that addresses all of the above issues. Our structure uses an amount of space near the information theoretic minimum: for a constant 1 < ε < 2 and a document with n nodes, we need 2εn + o(n) bits to represent the topology of the XML document. Updates can be handled in O(lg² n) time, and all query operations take O(lg n / lg lg n) time. In practice, the constant factor in this expression is extremely low, so that query times are virtually constant. Our structure also allows an efficient implementation of the structural join operator. Most importantly, the structure is designed to minimize indirections, and hence is secondary storage "friendly". The practical efficiency of the structure is demonstrated through a comprehensive set of experiments.

The rest of this chapter is organized as follows: Section 5.2 summarizes relevant work in the field. Section 5.3 presents the basics of our succinct representation scheme, without considering the issues of efficient navigation or updates. Efficient updates are discussed in Section 5.4, and efficient navigation in Section 5.5. The experimental results are then presented in Section 5.6, and Section 5.7 concludes the chapter.

5.2 Related Work

Since XML data can be modeled as ordered trees, storing XML succinctly is closely related to succinct tree representations. The earliest space efficient representations for static unlabeled trees were proposed by Jacobson [47, 48], who showed that the information content of a tree of n nodes is lg Cn, or Θ(n) bits. Hence, any representation of such trees must use at least a linear amount of space. The author then gave a representation which used 2n bits, plus an additional o(n) bits, which supported ordered tree operations such as finding the first child, next sibling, and parent of a node in O(lg n) time. The author also introduced two fundamental operations, rank and select, in terms of which all other operations can be implemented.

Early works on succinct representations all assume a static model, and hence are not easily generalized to support updates. Clark and Munro [21] gave a binary tree representation using 3n bits, which was used as a Patricia trie to index large, static text files whilst minimizing the number of disk accesses. However, their scheme does not support navigation to a node's parent, and hence it is not clear how to extend it to support updates. Munro and Raman [71] then developed a scheme which essentially solved the succinct representation problem for static unlabeled binary trees, as it allowed O(1) time navigational operations with asymptotically optimal space. This was achieved through the use of a balanced parentheses representation, partitioned into three tiers of blocks. However, for rooted ordered trees, finding the n-th child of a node took O(n) time. On the other hand, the scheme of Benoit et al [13] supports this operation (and all other navigational operations) in constant time. We emphasize that all these results hold only when no updates are allowed, which is clearly undesirable in a database system.

The first work giving a succinct representation for dynamic labeled trees was that of Munro et al [72], which supported binary trees with labels of constant size. However, it did not support trees of higher degree. Raman et al [75] later extended the 2n bit representation of Jacobson [48] to a special case of the updatable partial sum problem called the dynamic bit vector problem. It supported rank and select with updates in O(lg n / lg lg n) time using an extra o(n) bits of space. Alternatively, the structure supported O(1) time rank and select with updates in O(n^ε) time, allowing a trade-off between time and space. Raman et al [77] also considered the space and time overhead of the memory manager. They further improved the bound for labeled dynamic binary trees, supporting navigational operations in O(1) time with updates in O((lg lg n)^(1+ε)) time and o(n) additional space. One problem with all of the above approaches is that they do not distinguish between the labels of internal and leaf nodes; in practice, XML data has very few unique internal node labels, but many unique leaf node labels. Since these schemes use constant size labels, the large number of unique text nodes in an XML document can cause a dramatic blowup in space usage. Furthermore, Raman et al [77] took little consideration of minimizing accesses to secondary storage, which is a concern for any large data set.

Since XML documents are represented as ordered trees, there is a close relation between this problem and the order maintenance problem addressed by Dietz and Sleator [11, 31]. Most XML storage schemes, such as [44, 45, 49, 60], make use of interval and preorder/postorder labeling schemes to support constant time order lookup, but fail to address the maintenance of these labels during updates. Recently, Silberstein et al [80] proposed a data structure to handle ordered XML which guarantees both update and lookup costs. The primary difference between this chapter and Silberstein et al [80] is that we also attempt to minimize space usage (and in fact keep the space requirement near the information theoretic minimum).

The work most related to this chapter is that of Kanne and Moerkotte [51], Geary et al [38] and Zhang et al [98]. The Natix system proposed by Kanne and Moerkotte, although efficient for storage, does not address the order issue in any way, and hence incurs a worst case O(n) cost to compare the document order of two nodes. Consequently, as shown in [65], this storage scheme limits the potential choices for query optimization, as the query optimizer cannot choose a plan which disregards document ordering during intermediate query processing, since it cannot sort the final result.

Geary et al [38] used a static approach that decomposed XML into two tiers of trees.

Figure 5.1: A DBLP XML document fragment (an inproceedings element with an @mdate attribute and author, title, year and booktitle children)

Their structure supports all operations in O(1) time using an asymptotically optimal 2n + o(n) bits of space. However, they used a fixed number of bits for every label, and did not address the vastly different alphabet sizes of internal node labels (element labels) and leaf node labels (text data) found in practical XML data. More seriously, they partitioned the tree in such a way that a node can appear multiple times in the representation, which makes it non-trivial to generalize the structure to support updates.

The succinct approach proposed by Zhang et al [98] targeted secondary storage, and used a balanced parentheses encoding for each block of data. Unfortunately, their summary and partition schemes support rank and select operations in linear time only. Their approach also uses the Dewey encoding (a variable length, root-to-leaf path identifier) for node identifiers in their indices. The drawbacks of the Dewey encoding are significant: updates to the labels can require linear time, and the size of a label is linear in the size of the database in the worst case. Thus, the storage of the topology can require quadratic space in the worst case.

5.3 Data Storage

In this section, we give a general overview of our succinct storage scheme for XML data. Sections 5.4 and 5.5 will then discuss update handling and optimization in more detail.

Figure 5.2: Overview of the data structure

Our storage structure consists of three main components, as shown in Figure 5.2:

1. Topology layer: this layer stores the tree structure of the XML document, and facilitates fast navigational accesses, structural joins and updates.

2. Internal node layer: this layer stores the XML elements, attributes, and signatures of the text data for fast text queries.

3. Leaf node layer: this layer stores the text data in the document.

5.3.1 Representation of Topology

Jacobson [47] showed that the lower bound space requirement for representing a binary tree is lg Cn = lg(4^n · Θ(n^(-3/2))) = 2n + o(n) bits, where the Catalan number Cn is the number of possible binary trees over n nodes. As XML documents can be modeled as unranked ordinal trees, we can use the mapping scheme proposed by Jacobson to map XML documents to binary trees. Based on this, if we exclude tag names and text data from an XML document, the tree structure of the document can be represented using one of the many asymptotically optimal encodings described in Katajainen [53] that use exactly 2n bits.

For our storage scheme, we use the balanced parentheses encoding from Katajainen [53] to represent the topology of XML. This encoding reflects the nesting of element nodes within any XML document, and can be obtained by a preorder traversal of the tree: we output a left parenthesis when we first visit a node and a right parenthesis when we return from the traversal of its descendant nodes. Figure 5.3 shows the balanced parentheses encoding of the XML document from Figure 5.1.

000011001100110011001111
(((())(())(())(())(())))

Figure 5.3: Balanced parentheses encoding of Figure 5.1

Herein, we will interchangeably use 0 and ( to represent left parentheses, and 1 and ) to represent right parentheses. We also define:

• x(: The position of the left parenthesis of node x in the encoding. We will simply write x instead of x( when the context is clear. For example, in Figure 5.3, author( = 6.

• x): The position of the right parenthesis of node x in the encoding. For example, title) = 13 in Figure 5.3.

• excess: The excess is the difference between the number of 0s and 1s occurring in a given section of the topology. For instance, in Figure 5.3, the excess between dblp( and @mdate) is 3, and the excess between "2003") and booktitle( is −1. Note that measuring excess from the beginning of the document gives the depth of the corresponding node in the tree.

There are two benefits of this encoding:

1. Each node is encoded using a fixed number of bits, which helps to simplify the indexing mechanisms and provides a better fit with secondary storage.

2. The position of the parentheses gives an implicit region algebra representation of the XML document. This allows us to answer ancestor-descendant queries on any two nodes: x is an ancestor of y if and only if x( < y( < x).
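As a concrete illustration, the following Python sketch builds the balanced parentheses encoding by preorder traversal and answers an ancestor-descendant query by comparing parenthesis positions. The nested-tuple tree representation and function names are assumptions for illustration, not the chapter's actual storage format.

```python
# Sketch: balanced parentheses encoding via preorder traversal, plus an
# ancestor-descendant test using the region condition x( < y( < x).
# A tree is a hypothetical (label, children) tuple; positions index into
# the encoding, whose '(' and ')' the chapter also writes as 0 and 1.

def encode(tree, out=None, pos=None):
    """Return (encoding, {label: [open_pos, close_pos]})."""
    if out is None:
        out, pos = [], {}
    label, children = tree
    pos[label] = [len(out), None]
    out.append("(")                 # first visit: left parenthesis
    for child in children:
        encode(child, out, pos)
    pos[label][1] = len(out)
    out.append(")")                 # after all descendants: right parenthesis
    return "".join(out), pos

def is_ancestor(pos, x, y):
    x_open, x_close = pos[x]
    y_open, _ = pos[y]
    return x_open < y_open < x_close

tree = ("dblp", [("inproceedings", [("author", []), ("title", [])])])
enc, pos = encode(tree)
print(enc)                                  # prints: ((()()))
print(is_ancestor(pos, "dblp", "title"))    # prints: True
print(is_ancestor(pos, "author", "title"))  # prints: False
```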

5.3.2 Representation of Elements and Attributes

As our representation of the topology does not include an O(lg n) bit persistent object identifier for each node in the document, we must use an approach like that described in Munro et al [72], in which we make the element structure an exact mirror of the topology structure. This allows us to find the appropriate label for a node by simply finding the entry at the same position in the element structure. A pointer based approach would require Θ(n lg n) space, which is undesirable.

The next issue is the variable length of XML element labels. We adopt the approach taken in previous work [83, 98], and maintain a symbol table, using a hash table to map the labels into a domain of fixed size. In the worst case, this does not reduce the space usage, as every node can have its own unique label. In practice, however, XML documents tend to have a very small number of unique labels. Therefore, we can assume that the number of unique labels used in the internal nodes (E) is very small, and essentially constant. This approach allows us to have fixed size records in the internal node layer.

We handle other XML constructs, such as processing instructions and comments, in the same way using the same hash table. But as we want E to be small, we must not insert character data into this symbol table, as that would rapidly increase the space used. Thus, we map all character data to one additional label, and handle the actual character data separately.

By limiting the maximum allowed number of unique element and attribute names per XML document to E, we need an extra lg E bits of space for each label and O(E) space for the symbol table. Figure 5.4 shows an example of the storage of the element array that mirrors the parentheses array.

Figure 5.4: The relationship between the topology and element label structures

Note that each element in the XML document actually has two available entries in the array, corresponding to the opening and closing tags. We could thus make the size of each entry ½ lg E bits, and split the identifier for each element over its two entries. However, the two entries are not in general adjacent to each other, and hence splitting the identifier could slow down lookups, as we would need to find the closing tag corresponding to the opening tag, and would also decrease cache locality. Hence, we prefer to use entries of lg E bits and leave the second entry set to zero; this also provides us with some slack in the event that new element labels are used in updates.

Since text nodes are also leaf nodes, they are represented as pairs of adjacent unused entries in the internal node layer. We thus choose to make use of this "wasted" space by storing a hash value of the text node of size 2 lg E bits. This can be used in queries which test equality of text nodes, such as //*[year="2003"], by scanning the hash value before scanning the actual data, significantly reducing the lookup time.

5.3.3 Representation of Text Data

The final layer of our data structure deals with text data storage. We maintain an array of lg n bit pointers, one for each text node, pointing to its character data. The actual storage of the character data then reduces to the traditional problem of storing variable length records.

Algorithm 5.1 Unoptimized, linear time, basic topological operations
FORWARD-EXCESS(start, end, excess)
1: for each current from start to end do
2:   if tier0[current] = ( then
3:     excess ← excess − 1
4:   else
5:     excess ← excess + 1
6:   if excess = 0 then
7:     return current
8: return NOT-FOUND

BACKWARD-EXCESS(start, excess)
1: Similar to FORWARD-EXCESS but going backward

PREV(node)
1: if node > 0 then
2:   return node − 1
3: else
4:   return NOT-FOUND

NEXT(node)
1: if node < |tier0| then
2:   return node + 1
3: else
4:   return NOT-FOUND

We have two choices for indexing the array:

• The most concise representation is to pack the array tightly, so that the i-th entry corresponds to the i-th text node. However, this then means that it takes O(i) time to find the value, since we do not explicitly store the text node's position in our structure.

• A less concise representation would be to make the array's structure mirror that of the element label layer. Then, given a position in the element label array, we could find the corresponding entry quickly. However, this would waste space for the entries corresponding to non-text nodes.

In our scheme, we choose the first method. The reason for this is that the space savings can be significant, and we will see in Section 5.4 a way of substantially reducing the lookup time. In the worst case, the storage requirement of this method is (n/2) lg n bits, because potentially half of the nodes can be character data. In practice, the number of text nodes in an XML document is within a constant factor of the number of element nodes, so this layer generally uses Θ(n lg n) bits of space. However, the space requirement is much reduced by treating elements and text data separately.
We have two choices for indexing the array:

Algorithm 5.1 Unoptimized, linear time, basic topological operations

FORWARD-EXCESS(start, end, excess)
1: for each current from start to end do
2:   if tier0[current] = ( then
3:     excess ← excess − 1
4:   else
5:     excess ← excess + 1
6:   if excess = 0 then
7:     return current
8: return NOT-FOUND

BACKWARD-EXCESS(start, excess)
1: Similar to FORWARD-EXCESS but going backward

PREV(node)
1: if node > 0 then
2:   return node − 1
3: else
4:   return NOT-FOUND

NEXT(node)
1: if node < |tier0| then
2:   return node + 1
3: else
4:   return NOT-FOUND

• The most concise representation is to pack the array tightly, so that the i-th entry corresponds to the i-th text node. However, this then means that it takes O(i) time to find the value, since we do not explicitly store the text node's position in our structure.

• A less concise representation would be to make the array's structure mirror that of the element label layer. Then, given a position in the element label array, we could find the corresponding entry quickly. However, this would waste space for the entries corresponding to non-text nodes.

In our scheme, we choose the first method. The reason for this is that the space savings can be significant, and we will see in Section 5.4 a way of substantially reducing the lookup time. In the worst case, the storage requirement of this method is (n/2) lg n bits, because potentially half of the nodes can be character data. In practice, the number of text nodes in XML is within a constant factor of the number of element nodes, so this layer generally uses Θ(n lg n) bits of space. However, the space requirement is much reduced by treating elements and text data separately.
For instance, if we assume that the number of elements is c times the number of text nodes, and that S is the amount of space taken by considering element nodes and text nodes together, then our scheme would use approximately S/(c + 1), a significant space saving for large c.

5.3.4 Navigational Operations

Algorithm 5.2 Navigation operations

FIND-CLOSE(node)
1: return FORWARD-EXCESS(node, |tier0|, 0)

FIND-OPEN(node)
1: return BACKWARD-EXCESS(node, 0)

PARENT(node)
1: return BACKWARD-EXCESS(node, 2)

FIRST-CHILD(node)
1: if tier0[NEXT(node)] = ( then
2:   return NEXT(node)
3: else
4:   return NOT-FOUND

NEXT-SIBLING(node)
1: if tier0[NEXT(FIND-CLOSE(node))] = ( then
2:   return NEXT(FIND-CLOSE(node))
3: else
4:   return NOT-FOUND

We now give a brief description of how one may implement navigational operations on this storage scheme. The functions in Algorithm 5.1 are the basic access operations. If x is the position of a parenthesis in an array of balanced parentheses, then the function NEXT(x) returns the position of the next parenthesis in the array (for this simple data structure, this is a trivial function). The function PREV(x) is defined analogously. The function FORWARD-EXCESS(start, end, excess) scans forward from start along the array and returns the position of the first parenthesis satisfying the given excess from start. The function BACKWARD-EXCESS(start, excess) scans backward along the array and returns the first position end such that the excess between end and start is equal to excess.

Apart from the basic access operations in Algorithm 5.1, other essential navigational operations are shown in Algorithm 5.2. As can be seen from the definitions in Algorithm 5.2, the navigational operations are closely tied to the basic access operations.
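The navigational operations above can be sketched concretely in Python. This is a toy, in-memory version (tier 0 as a string, linear scans as in Algorithm 5.1), purely to make the excess conventions explicit; it is not the block-based implementation:

```python
NOT_FOUND = -1

def forward_excess(bits, start, end, excess):
    # '(' decrements and ')' increments the remaining excess, so a call
    # with excess = 0 starting at a '(' stops at its matching ')'.
    for cur in range(start, end):
        excess += -1 if bits[cur] == '(' else 1
        if excess == 0:
            return cur
    return NOT_FOUND

def backward_excess(bits, start, excess):
    # Same update rule, scanning towards the beginning of the array.
    for cur in range(start, -1, -1):
        excess += -1 if bits[cur] == '(' else 1
        if excess == 0:
            return cur
    return NOT_FOUND

def find_close(bits, node):
    return forward_excess(bits, node, len(bits), 0)

def find_open(bits, node):
    return backward_excess(bits, node, 0)

def parent(bits, node):
    # The enclosing '(' is the first position with backward excess 2.
    return backward_excess(bits, node, 2)

def first_child(bits, node):
    return node + 1 if bits[node + 1] == '(' else NOT_FOUND

def next_sibling(bits, node):
    after = find_close(bits, node) + 1
    return after if after < len(bits) and bits[after] == '(' else NOT_FOUND
```

On the example "(()(()))" (nodes a = 0, b = 1, c = 3, d = 4), `first_child` of a is b, and the `next_sibling` of b is c.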
Therefore, the speed of the basic access operations is the determining factor for the performance of our navigational operations. However, both forward and backward excess operations in Algorithm 5.1 take linear time, which is unsatisfactory. This is addressed in Section 5.5.

5.4 Handling Updates

So far, we have treated the balanced parentheses encoding as a contiguous array. This scheme is not suitable for frequent updates, as any insertion or deletion of data would require shifting of the entire bit array. In this section, we present a small modification to our storage scheme that changes the space usage from 2n to 2εn bits, where ε > 1, so that we can efficiently accommodate frequent updates.

5.4.1 Empty Space and Density Thresholds

It is obvious that in order to efficiently handle frequent updates, we need to have some empty space within the array to minimize the chance of shifting the entire array. In our approach, we first divide the array into blocks of |B| bits each, and store the blocks contiguously. Within each block, we keep the empty space at the rightmost portion of the block. Now, we only need to shift O(|B|) entries per insertion or deletion. We can control the cost of shifting by adjusting the block size. Section 5.5 will discuss in detail how we can use auxiliary structures to keep track of the total number of parentheses per block.

Figure 5.5: Densities of the parentheses array and the corresponding virtual balanced trie with block size |B| = 8 and height = 3
After the initial loading of an XML document, the empty space allocated to leaf nodes will eventually be used up as more data is inserted into the database. Therefore, we need to guarantee an even distribution of empty bits across the entire parentheses array, so that we can still maintain the O(|B|) bound on the number of shifts needed for each data insertion. This can be achieved by deciding exactly when to redistribute empty space among the blocks, and which blocks are to be involved in the redistribution process.

To better understand our approach, we first visualize these blocks as leaf nodes of a virtual balanced binary trie, with the position of the block in the array corresponding to the path to that block through the virtual binary trie. Figure 5.5 shows such a trie, where block 0 corresponds to the leaf node under the path 0 → 0 → 0, and similarly block 3 corresponds to the path 0 → 1 → 1. For each block, we define:

• L: the total number of left parentheses within a block.

• R: the total number of right parentheses within a block.

• DENSITY(b): the density of a block b, defined as (L + R)/|B|.

Given the above definition of density for leaf nodes, the density of a virtual node is the average density of its descendant leaf nodes. We then control the empty space within all nodes in the virtual binary trie by setting a density threshold [min, max], within which the block densities must lie. For a virtual node at depth d in a virtual trie of height h, we enforce a density threshold of [1/2 − d/(4h), 3/4 + d/(4h)]. For example, the density threshold range for virtual node v0 in Figure 5.5 is [1/2 − 2/12, 3/4 + 2/12] = [0.33, 0.92], since the depth of v0 is 2 and the height of the trie is 3.

Each insertion of a node into the XML document adds exactly two consecutive parentheses into a block (occasionally, the insertion will span two adjacent blocks).
We maintain the empty space after each insertion as follows: if the density of the leaf node exceeds its maximum threshold, then we redistribute occupied bits among a range of leaf nodes by calling the function MAINTAIN in Algorithm 5.3. This function traverses up the virtual binary trie and stops at the first ancestor node v which does not have its maximum density threshold violated. We then evenly redistribute all the occupied bits (parentheses) amongst all the descendant leaf nodes of v. It should be stressed that the trie is purely a visualization of the concept, and that in reality we are simply traversing a sequence of consecutive blocks in the bit array. Thus, each time we traverse up the binary trie, we merely double the range of blocks considered for redistribution. Deletions are handled in a similar manner.

The reader may wonder why we use the formula above for controlling the density threshold. This is due to two factors: first, in order to guarantee good space utilization, the maximum density threshold of a leaf node should be 1, and the minimum density threshold of the root node should be 1/2. Secondly, the density threshold should satisfy the following invariant: the density threshold range of an ancestor node should be tighter than the range for its descendant nodes. This ensures that, after space redistribution for an ancestor node v, the density thresholds of all its descendants are also immediately satisfied.

Algorithm 5.3 Insertion and maintenance operations

INSERT(x)
1: Right-shift tier0[x, L_b + R_b] to [x + 2, L_b + R_b + 2]
2: tier0[x, x + 1] ← {(, )}; increment L_b and R_b
3: if L_b + R_b > |B| − 2 then
4:   MAINTAIN(x)

MAINTAIN(x)
1: {depth, height, δ} ← {lg n, lg n, 1}
2: {min, max} ← {B_x, B_x + |B|}
3: while DENSITY([min, max]) > 3/4 + depth/(4 · height) do
4:   depth ← depth − 1
5:   δ ← 2δ
6:   min ← MAX(0, min − δ)
7:   max ← max + δ
8: Evenly distribute the bits in blocks [min, max] and update the corresponding tier 1 and tier 2 tuples.
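The two ingredients of this maintenance scheme, the depth-dependent threshold and the even redistribution of step 8, can be sketched in Python. This is a toy sketch using the small parameters of Figure 5.5 (|B| = 8, trie height 3), with blocks held as Python lists rather than a bit array:

```python
from math import ceil

BLOCK_BITS = 8   # |B|; deliberately tiny, matching Figure 5.5
HEIGHT = 3       # height of the virtual binary trie

def threshold(depth):
    """Density threshold [min, max] for a virtual node at this depth
    (leaves sit at depth == HEIGHT and get the loosest range)."""
    return (0.5 - depth / (4 * HEIGHT), 0.75 + depth / (4 * HEIGHT))

def redistribute(blocks, lo, hi):
    """Evenly spread the parentheses of blocks[lo:hi] across those blocks,
    as in step 8 of Algorithm 5.3."""
    bits = [b for blk in blocks[lo:hi] for b in blk]
    n, k = len(bits), hi - lo
    out, pos = [], 0
    for i in range(k):
        take = ceil((n - pos) / (k - i))   # as even a split as possible
        out.append(bits[pos:pos + take])
        pos += take
    blocks[lo:hi] = out
    return blocks
```

For instance, `threshold(2)` gives [0.33, 0.92], matching the range quoted for v0 above, and redistributing four parentheses over four empty blocks leaves one parenthesis per block.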
5.4.2 Space and Time Cost

In the worst case, we use 4 bits per node, since the root node can be only half full. Thus, on a 32-bit word machine, we can store at most 2^32/4 = 2^30 nodes. However, by adjusting the minimum root node density threshold from 1/2 to 1/ε, it is possible to store more than 2^30 nodes by choosing a smaller ε. In practice, ε should be 2, and therefore 2εn bits is in effect 4n. The factor ε should only be less than 2 when the document is relatively static.

The correctness of the above scheme, and its running time, are summarized in the following lemma (with proof omitted, see Bender et al [11]):

Lemma 5.4.1 Given an n node unlabeled higher degree ordinal tree stored in 2εn bits where ε > 1, we can support updates in amortized O(lg² n) time with block size |B| = Θ(lg n).

In practice, we try to leave approximately 20% of each block empty during insertions. Even when there are bulk insertions in the middle of the document, the lemma above still guarantees good worst-case performance. If desired, it is possible to deamortize the algorithm using the techniques of [11, 89].

5.5 Optimizations

This section optimizes the navigational operations from linear time (as presented previously) to near constant time. It also analyzes the total space cost and, finally, outlines how containment queries can be built on top of the proposed succinct storage.

5.5.1 Auxiliary Data Structure

In order to speed up navigational accesses, auxiliary data structures (tier 1 and tier 2 blocks) are added on top of the tier 0 structure we presented in Section 5.3.1. Both tier 1 and tier 2 contain contiguous arrays of tuples, with each tuple holding summary information about one block in the lower tier.

Each tier 1 block stores an array of tuples T⁰_0, T⁰_1, ..., where n is the maximum number of tuples allowed per tier 1 block.
Each T⁰_i for 0 ≤ i < n is defined as (L⁰_i, R⁰_i, m⁰_i, M⁰_i, D⁰_i), where:

• L⁰_i: the total number of left parentheses of a block.

• R⁰_i: the total number of right parentheses of a block.

• m⁰_i: the minimum excess within a single block, obtained by traversing the parentheses array from the beginning of the block.

• M⁰_i: the maximum excess within a single block, obtained by traversing the parentheses array from the beginning of the block.

• D⁰_i: the total number of character data nodes.

Figure 5.6: Example of Tiers of Topology Part

Using the summary information in the tuples, we can then easily calculate the density of each tier 0 block using the formula density = (L⁰_i + R⁰_i)/|B|.

Similar to tier 1 blocks, each tier 2 block stores an array of tuples T¹_0, T¹_1, ..., where n is the maximum number of tuples allowed per tier 2 block. Each tuple T¹_j for 0 ≤ j < n is then defined as (L¹_j, R¹_j, m¹_j, M¹_j, D¹_j), where:

• L¹_j: the sum of L⁰_i over all tier 1 tuples of the corresponding tier 1 block.

• R¹_j: the sum of R⁰_i over all tier 1 tuples of the corresponding tier 1 block.

• m¹_j: the local minimum excess across all of its tier 1 tuples.

• M¹_j: the local maximum excess across all of its tier 1 tuples.

• D¹_j: the total number of character data nodes over all tier 1 tuples.

The three tiers are illustrated in Figure 5.6, where each tier consists of contiguous fixed size blocks, which in our implementation are four kilobytes in size. Therefore, each tier 0 block can hold up to 32768 bits, and each tier 1 block can hold summaries for |B|/|T⁰| tier 0 blocks. Similarly, each tier 2 block can hold up to |B|/|T¹| tier 1 blocks, which is equivalent to (|B|/|T¹|)(|B|/|T⁰|) tier 0 blocks.
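The tuple fields above can be computed as follows. This is an illustrative Python sketch (strings instead of bit arrays); `summarize_block` builds a tier 1 tuple for one tier 0 block, and `combine` aggregates a run of tier 1 tuples into a tier 2 tuple by shifting each block's local min/max excess by the excess accumulated in earlier blocks, which is the idea formalized in Algorithm 5.4:

```python
def summarize_block(bits):
    """Tier 1 tuple (L, R, m, M) for one tier 0 block: parenthesis counts
    plus the min/max excess seen while scanning from the block start."""
    L = R = excess = m = M = 0
    for b in bits:
        if b == '(':
            L += 1
            excess += 1
        else:
            R += 1
            excess -= 1
        m, M = min(m, excess), max(M, excess)
    return (L, R, m, M)

def combine(tuples):
    """Tier 2 tuple from a run of tier 1 tuples: local minima/maxima are
    shifted by the accumulated excess (L - R) of the preceding blocks."""
    L2 = R2 = excess = 0
    m2, M2 = float("inf"), float("-inf")
    for (L, R, m, M) in tuples:
        m2 = min(m2, excess + m)
        M2 = max(M2, excess + M)
        excess += L - R
        L2 += L
        R2 += R
    return (L2, R2, m2, M2)
```

A useful sanity check is that combining the summaries of two adjacent blocks gives the same tuple as summarizing their concatenation directly.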
Even though tier 1 and tier 2 tuples look similar, the values of m¹ and M¹ are calculated in a different way from m⁰ and M⁰. The algorithm to calculate the local minimum/maximum excess in a tier 2 block is given in Algorithm 5.4.

Algorithm 5.4 Calculate local excess in a tier 2 block

TIER2-LOCAL-EXCESS(t2)
1: {t1start, t1end} ← the range of tier 1 tuples summarized by tier2[t2]
2: {tier2[t2].m, tier2[t2].M} ← {tier1[t1start].m, tier1[t1start].M}
3: excess ← tier1[t1start].L − tier1[t1start].R
4: for each t1 from t1start + 1 to t1end do
5:   if excess + tier1[t1].m < tier2[t2].m then
6:     tier2[t2].m ← excess + tier1[t1].m
7:   if excess + tier1[t1].M > tier2[t2].M then
8:     tier2[t2].M ← excess + tier1[t1].M
9:   excess ← excess + tier1[t1].L − tier1[t1].R

Updating both of the auxiliary tiers is fairly easy. During insertions and deletions in a tier 0 block, we simply update the appropriate tuples in the corresponding blocks in the higher tiers. Since the redistribution process we described in Section 5.4 can be seen as a sequence of insertions and deletions, the corresponding updates to the auxiliary tiers do not affect the worst case complexity of updates.

5.5.2 Using Auxiliary Structures

Recall that the function FORWARD-EXCESS(start, end, excess) in Algorithm 5.1 returns the position of the first parenthesis with the given excess within the range [start, end]. If we only have tier 0 available, then this scan is linear. However, we can use tier 1 to test whether this value lies within the i-th tier 0 block, by checking whether (m⁰_i + e_i) ≤ excess ≤ (M⁰_i + e_i), where e_i is the excess between start and the beginning of the i-th tier 0 block (excluding the first bit). However, as |B| = Θ(lg n), there are potentially n/|B| tier 1 tuples to scan. Hence, we use tier 2 to find the appropriate tier 1 block within which excess lies, reducing the cost to a near constant in practice. This is essentially how we implement this function, with the pseudo-code given as function FAST-FORWARD-EXCESS in Algorithm 5.5.
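The block-skipping idea behind FAST-FORWARD-EXCESS can be sketched in Python. This is an illustrative sketch only: it recomputes each block summary on the fly instead of reading stored tier 1 tuples, and it uses a single level of summaries rather than two, but the skipping test is the same (the target lies in a block exactly when the accumulated excess plus the block's local [m, M] range brackets it):

```python
def block_summary(blk):
    """(L, R, m, M) for one block, as in the tier 1 tuples."""
    L = R = e = m = M = 0
    for b in blk:
        if b == '(':
            L += 1
            e += 1
        else:
            R += 1
            e -= 1
        m, M = min(m, e), max(M, e)
    return L, R, m, M

def fast_forward_excess(bits, bsize, start, excess):
    """First position >= start where the running excess (as in
    FORWARD-EXCESS: '(' is -1, ')' is +1) reaches 0, skipping whole
    blocks whose summary rules them out."""
    # Finish the block containing `start` by direct scan.
    end_of_block = (start // bsize + 1) * bsize
    for cur in range(start, min(end_of_block, len(bits))):
        excess += -1 if bits[cur] == '(' else 1
        if excess == 0:
            return cur
    # Then consult per-block summaries before scanning any block.
    b = start // bsize + 1
    while b * bsize < len(bits):
        blk = bits[b * bsize:(b + 1) * bsize]
        L, R, m, M = block_summary(blk)
        # Inside this block the remaining excess spans [excess - M, excess - m].
        if excess - M <= 0 <= excess - m:
            for cur in range(b * bsize, b * bsize + len(blk)):
                excess += -1 if bits[cur] == '(' else 1
                if excess == 0:
                    return cur
        else:
            excess += R - L   # net effect of skipping the whole block
        b += 1
    return -1
```

In the common case the match is found in the starting block and no summary is ever consulted, mirroring the behaviour reported for the real structure below.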
Other operations, such as accessing text nodes, can be implemented in a similar fashion to FORWARD-EXCESS, and hence we omit the details.

Algorithm 5.5 Optimized basic topology operations

NEXT(node)
1: if node < B⁰_node + L_node + R_node then
2:   return node + 1
3: else if B⁰_node is the last tier 0 block then
4:   return NOT-FOUND
5: else
6:   return B⁰_node + |B|

FAST-FORWARD-EXCESS(start, excess)
1: current ← FORWARD-EXCESS(start, B⁰_start + |B| − 1, excess)
2: if current ≠ NOT-FOUND then
3:   return current
4: e ← the excess (L − R) of the portion of the starting block scanned in line 1
5: for each subsequent tier 1 tuple T⁰_i in the tier 1 block covering start do
6:   if e + m⁰_i ≤ excess ≤ e + M⁰_i then
7:     return FORWARD-EXCESS(B⁰_i, B⁰_i + |B| − 1, excess − e)
8:   e ← e + L⁰_i − R⁰_i
9: for each subsequent tier 2 tuple T¹_j do
10:   if e + m¹_j ≤ excess ≤ e + M¹_j then
11:     for each tier 1 tuple T⁰_i summarized by T¹_j do
12:       if e + m⁰_i ≤ excess ≤ e + M⁰_i then
13:         return FORWARD-EXCESS(B⁰_i, B⁰_i + |B| − 1, excess − e)
14:       e ← e + L⁰_i − R⁰_i
15:   else
16:     e ← e + L¹_j − R¹_j
17: return NOT-FOUND

Here B⁰_node denotes the starting bit position of the tier 0 block containing node, and L_node and R_node are the parenthesis counts of that block.

In practice, most matching parentheses lie within the same block, and occasionally are found in neighboring blocks. This is because the depth of an XML document is generally much less than |B| (even the depth of the highly nested Tree Bank dataset [64] is much less than 100). Therefore, when FAST-FORWARD-EXCESS is called from navigation operations, we rarely need to access additional blocks in either the auxiliary data structure or the topology bit array. In the worst case, when the matching parenthesis lies within a different block, we only need to read two tier 1 blocks and two tier 2 blocks.

5.5.3 Space Cost

As we mentioned in Section 5.4.2, using 32-bit words, we can store 2^30 nodes. In our implementation we also chose to use four kilobyte sized blocks. Based on these values, we now discuss the space cost of each component of our storage scheme. Of course, if larger documents need to be stored, we can simply increase the word size that we use in the data structure.
Tier 0: From Lemma 5.4.1 of Section 5.4.2, tier 0 can take up at most 2^32 bits of space (or 2^32/|B| = 2^17 blocks).

Tier 1: We need lg |B| = 15 bits for each variable (L⁰, R⁰, m⁰, M⁰, D⁰) within a tuple. Each T⁰ tuple thus requires a total of 5 lg |B| = 80 bits, including bit alignment, and based on this calculation each tier 1 block can store up to ⌊|B|/|T⁰|⌋ = 409 tuples. Since there are at most 2^17 tier 0 blocks, we only need 2^17 tuples to represent all tier 0 blocks, and they can be stored in ⌈2^17/409⌉ = 321 tier 1 blocks.

Tier 2: We need a total of 24 bits for each variable (L¹, R¹, m¹, M¹, D¹) within a T¹ tuple, since each such variable must count up to 409 tier 0 blocks of |B| bits each. So each T¹ tuple requires a total of 5 × 24 = 120 bits, and each tier 2 block holds up to ⌊|B|/|T¹|⌋ = 273 T¹ tuples. Thus, we will only need a total of ⌈321/273⌉ = 2 tier 2 blocks to store the 321 T¹ tuples.

Since we only need a maximum of two tier 2 blocks, we can just keep them in main memory. In fact, the entire tier 1 can also be kept in main memory, since it requires at most 321 × 4KB ≈ 1MB. In summary, the space required by the topology layer (in bits) is:

2εn + (10 lg |B| · εn)/|B| + O(1) = 2εn + o(εn)

and the space required by the internal node layer (in bits) is:

εn lg E + O(E)

We can use the above equations to estimate the space used by an XML file, using as our example a 100 MB copy of DBLP, which has roughly 5 million nodes. If we assume there are no updates after the initial loading, we can set ε = 1. According to the equations, we will use roughly 2εn ≈ 1MB for the topology layer and εn lg E + O(E) ≈ 8MB for the internal node layer, which is consistent with the storage sizes in Table 5.1 and Table 5.2.
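The tier sizing arithmetic above, and the offset calculations that Algorithm 5.6 below performs, can be checked with a few lines of Python (assuming, as in the text, 32-bit words, 4 KB blocks, 80-bit tier 1 tuples and 120-bit tier 2 tuples):

```python
BLOCK_BITS = 4 * 1024 * 8                 # |B| = 32768 bits per block
TIER0_BLOCKS = 2 ** 32 // BLOCK_BITS      # tier 0 may span 2^32 bits -> 2^17 blocks

T1_TUPLE_BITS = 80                        # five 15-bit fields, padded to 16 bits
T2_TUPLE_BITS = 120                       # five 24-bit fields

tuples_per_t1_block = BLOCK_BITS // T1_TUPLE_BITS     # 409
t1_blocks = -(-TIER0_BLOCKS // tuples_per_t1_block)   # ceiling division -> 321
tuples_per_t2_block = BLOCK_BITS // T2_TUPLE_BITS     # 273
t2_blocks = -(-t1_blocks // tuples_per_t2_block)      # 2

def locate(p):
    """For a bit position p in tier 0, return the (block, index) pair in
    each of the three tiers holding p's summary information."""
    b0, i0 = divmod(p, BLOCK_BITS)
    b1, i1 = divmod(b0, tuples_per_t1_block)
    b2, i2 = divmod(b1, tuples_per_t2_block)
    return (b0, i0), (b1, i1), (b2, i2)
```

Running this reproduces the 409 / 321 / 273 / 2 figures quoted above, and `locate` shows that every lookup needs only a constant number of divisions.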
This, of course, disregards the space needed for the text data in the document.

Algorithm 5.6 Offset calculation for the block, and the index within the block, in all tiers

Given a bit position p:
b⁰ ← ⌊p/|B|⌋, i⁰ ← p mod |B|
b¹ ← ⌊b⁰/⌊|B|/|T⁰|⌋⌋, i¹ ← b⁰ mod ⌊|B|/|T⁰|⌋
b² ← ⌊b¹/⌊|B|/|T¹|⌋⌋, i² ← b¹ mod ⌊|B|/|T¹|⌋

Based on the block size |B|, we know the exact size of the tuples and tiers in our topology layer. Therefore, given a bit position p, we can calculate which tier 0 block this bit belongs to, and which tier 1 block contains the summary information for that tier 0 block. Algorithm 5.6 lists the calculations needed to find, for a given p, its resident tier 0 to tier 2 blocks and the indices within those blocks at which the summaries reside.

5.5.4 Theoretically Fast Navigation

Our experiments will demonstrate that the above scheme has impressive speed in practice, because there are only two tier 2 blocks for a 32-bit word machine. However, in theory, there are O(n/lg² n) tier 2 blocks, and hence the worst case for navigational accesses is also O(n/lg² n), which is not much of an improvement on O(n). Fortunately, it is relatively simple to fix this limitation: instead of having 3 tiers, we generalize the above structure in a straightforward fashion to use O(lg n/lg lg n) tiers. This means that the top-most tier has Θ(1) blocks, reducing the worst case navigational access time to O(lg n/lg lg n). It might appear that this increases the update cost, since moving a node requires updating O(lg n/lg lg n) tiers. We can eliminate this overhead by updating the upper tiers once per redistribution, instead of once per node. A simple proof then demonstrates that the overall update cost is unaffected, and remains O(lg² n).

5.5.5 Persistent Identifiers and Indexes

We stress that the primary purpose of this chapter is to minimize update costs while also using theoretically optimal space.
By their very definition, an index is an alternative access path for the data, which is a form of redundancy that we are trying to avoid here. While a surprising amount can be done without persistent node identifiers, in some circumstances they are a useful feature that allows the creation of additional indices upon the data. It is possible to extend our data structure to support O(lg n) bit persistent identifiers using an additional 2 lg n + c bits of space per node, without affecting the asymptotic time and space complexity. This allows any traditional index to be built upon our structure if desired.

As this is not the main focus of the chapter, here we show one of the simplest approaches that achieves the claim in the last paragraph. We can support persistent identifiers easily by mirroring the topology layer with an additional linear array of blocks, using ½ lg n bits per parenthesis, that maintains a map from the parentheses to persistent identifiers. We create a second linear array, indexed by the persistent identifiers, the entries of which give the absolute excess of the node (including empty bits). Obviously this additional data structure does not affect the original lookup time, and requires one redirection if we use the persistent identifier. For updates, when we need to shift n parentheses, we also need to update n records in the array to reflect the new absolute excesses. This asymptotically does not affect the update cost, and hence this simple augmented structure serves our purpose of supporting persistent identifiers with only a constant factor difference in space and time.

5.5.6 Querying and Indexing the Database

Apart from the navigational operators we have described so far, our proposed structure can also support other commonly used operators, such as structural and twig joins, for processing complex XML queries.
Using our storage scheme, we only need to scan through the internal node layer once to select all of the candidate node lists. As we have mentioned in Section 5.3.1, a single scan of the internal node layer automatically provides a region encoding [45, 96, 97] of each node. If we are processing a twig join operation, then the single scan of the internal node layer produces unique sets of solution nodes corresponding to the unique twig join patterns. We can then employ any region encoding based join algorithm to perform the join process.

For our experiments in Section 5.6, we implemented such join algorithms by extending the skip join proposed in our previous chapter for structural joins and twig joins. The linear scan of the internal node layer for processing join operations on our proposed scheme may sound expensive; however, we show later in the experiments (Table 5.1 and Table 5.2) that a 500MB XML document requires only 42MB to store its entire structure (excluding text nodes), which means that, given a reasonable buffer size (e.g., 8MB), it only needs 6 block reads to scan the entire document.
5.6 Performance Evaluation

This section presents our experimental results, which demonstrate the superior performance of our succinct storage scheme in a variety of ways, such as physical storage size, update cost, and navigational and query performance.

Table 5.1: Statistical information of the physical storage of different size XML documents (Structural)

Size (MB) | Text Nodes | Markup Nodes | T⁰ (Bytes) | T¹ (Bytes) | T² (Bytes)
DBLP XML
1   | 19,950     | 27,387     | 10,752    | 6,148   | 5,028
5   | 107,402    | 144,859    | 56,832    | 6,148   | 5,028
10  | 209,967    | 312,205    | 111,104   | 6,148   | 5,028
50  | 1,038,758  | 1,406,980  | 548,352   | 18,436  | 5,028
100 | 2,065,320  | 2,832,060  | 1,089,536 | 30,724  | 5,028
500 | 10,613,430 | 14,280,334 | 5,588,992 | 135,172 | 5,028
Shakespeare XML
8   | 148,924    | 179,618    | 144,896   | 6,148   | 5,028

Table 5.2: Statistical information of the physical storage of different size XML documents (Labels)

Size (MB) | Text Nodes | Markup Nodes | H (Bytes) | E (KB) | D (KB)
DBLP XML
1   | 19,950     | 27,387     | 255 | 79     | 555
5   | 107,402    | 144,859    | 262 | 417    | 2,744
10  | 209,967    | 312,205    | 262 | 814    | 5,480
50  | 1,038,758  | 1,406,980  | 284 | 4,067  | 27,480
100 | 2,065,320  | 2,832,060  | 284 | 7,990  | 55,003
500 | 10,613,430 | 14,280,334 | 316 | 41,435 | 275,513
Shakespeare XML
8   | 148,924    | 179,618    | 206 | 2,122  | 6,771

All experiments were performed on a PC with a 1.1GHz AMD Athlon processor, 768MB of main memory, a 1GB swap partition and a 40GB 10,000 RPM SCSI hard disk. The PC was running Debian Linux 3.0, kernel build 2.4.25. For all experiments, we compared the performance of our storage scheme with the implementation presented by Zhang et al [98], since they demonstrated experimentally that their system outperformed other related systems in almost all cases.
We used several data sets covering a wide range of XML applications: the Protein Sequence Database (PSD) [10], DBLP [1] and the Tree Bank [64] database. Both PSD and DBLP are extremely regular data sets, whereas Tree Bank's deep recursive tree structure and its over 300,000 unique paths make it an interesting and challenging dataset to handle. We prepared samples of each data set of varying sizes: 5MB, 10MB, 50MB, 100MB and 500MB. The larger sized samples were created by repeatedly duplicating and merging the same dataset until it reached the desired size.

5.6.1 Physical Storage Size of Data

In our first experiment, we loaded DBLP into our data structure, and measured the sizes of various portions of the structure, which are given in Table 5.1. Columns T⁰, T¹ and T² represent the disk usage for tier 0, tier 1 and tier 2. The result shows that the size of tier 0 increases the most as document size increases. This is due to the fact that the size of tier 0 is linearly proportional to the number of elements in the document. Tier 1, on the other hand, grows more slowly, and for all practical purposes the size of tier 2 remains constant, since gigabytes of XML data would have to be loaded into our database before tier 2 would increase in size (and even then only negligibly). The columns in Table 5.2 show the size of the hash table used to hold the tag names (column H), the internal node layer (column E), and the text data layer (column D). As can be clearly seen, the majority of the space consumed is used up by the internal node label layer (E) and the text data block (D).

The last row of Table 5.1 and Table 5.2 is the result for storing the Shakespeare XML data set, which is used to compare the performance of our proposed scheme against Natix [51].
Table 5.1 and Table 5.2 show that the total size of all the structures used by our storage scheme is actually smaller than the original plain XML text file itself! To compare, using the same Shakespeare data set, Natix uses approximately twice as much space as the raw XML alone requires.

Figure 5.7: Loading time using Scheme (S) vs. Scheme (Z)

Figure 5.8: Average worst case insertion time using DBLP

Figure 5.9: Average random insertion time on DBLP

In order to compare the growth of our space consumption against previous schemes, we loaded the data sets into both our storage scheme (scheme S) and Zhang's storage scheme (scheme Z). A comparison of the total space used is given in Figure 5.10. Unfortunately, Zhang's implementation (which was memory based) [98] was unable to load the 500MB dataset due to insufficient memory, and hence we omitted the storage size for the 500MB category.
Figure 5.10 not only further confirms our expectation of the final disk usage, but it also shows that our storage scheme uses at most 20% of the disk usage of Zhang's storage scheme for all three data sets. Since Zhang's scheme was considerably more concise than competing schemes, our scheme gives a fivefold improvement over the state of the art. This gain can be partially attributed to the fact that we manage to avoid using any indices for querying or navigating data (apart from the auxiliary tiers defined earlier), whereas Zhang's storage scheme relies on the use of a B-tree to index both element tags and text data. The above shows our structure is not only sound in theory but also works in practice.

Figure 5.10: Physical storage size (storage ratio S : Z is 20% for DBLP, 19% for PSD and 17% for TreeBank)

5.6.2 Update Performance

In our second set of experiments, we tested the scalability of our structure under updates by performing frequent insertions in both a worst case manner and in a random manner. The worst case for Algorithm 5.3 is to insert nodes at the beginning of an already completely packed database, with no gaps between blocks. The random insertion scenario simply inserts a new node as a child of a randomly selected node.

For both worst case and random insertions, we pre-loaded a set of 1, 5, 10, 50 and 100MB XML documents into our databases and packed each one of them, leaving no gaps. For each experiment, we did multiple runs (resetting the database after each run). The average insertion times per node for both worst case and random insertions are shown in Figures 5.8 and 5.9.
In Figure 5.8, we see an initial spike in the execution time for the worst case insertion. This corresponds to the initial packed state of the database, in which case the very first node insertion requires the redistribution of the entire leaf node layer. Clearly, in practice this is extremely unlikely to happen, but the remainder of the graph demonstrates that even this contrived situation has little effect on the overall performance. The graph also shows that the cost of all subsequent insertions increases at a rate of approximately O(lg² n), which conforms to Lemma 5.4.1 proposed in Section 5.4. In fact, all subsequent insertions up to 100,000 took no more than 0.5 milliseconds.

Figure 5.11: Accessing first child and next sibling of random nodes using Scheme (S)

Figure 5.12: Path evaluation using Scheme (S, Q1–6) vs. Scheme (Z, Z1–6)

The average random node insertion times are plotted in Figure 5.9. It is interesting to notice how similar Figure 5.9 is to the worst case insertions of Figure 5.8. The initial jump in time for random insertion is also due to the redistribution of the whole leaf node layer, since the database was packed at the beginning. However, after the redistribution process, we have enough gaps between blocks such that any random insertion of nodes will at most require redistribution of a few blocks, not the entire leaf node layer. In fact, when a database is fully packed, the initial redistribution makes the random and worst case insertion scenarios equivalent. Eventually, when the number of gaps gets smaller, more redistribution is required.
5.6.3 Node Navigation

To test the performance and scalability of random node navigation, we pre-loaded our XML data sets, and for each database, we randomly picked a node and called NEXT-SIBLING and FIRST-CHILD multiple times. The average access times for these two operations are plotted in Figure 5.11. The graph shows that as the database size gets bigger, the running times of the FIRST-CHILD and NEXT-SIBLING functions both remained constant. This is not surprising, since in reality most nodes lie close to their siblings, and hence are likely to lie in the same block. Therefore, it generally only takes a scan of a few bits on average to access either the first child node or the next sibling node. As Figure 5.11 shows, FIRST-CHILD performed slightly faster than NEXT-SIBLING, which again is unsurprising, because the first child is always adjacent to a node, whereas its next sibling might lie some distance away.

Q1 (TreeBank): //EMPTY//NP
Q2 (PSD): //ProteinEntry//refinfo//year
Q3 (DBLP): //inproceedings//pages
Q4 (TreeBank): //EMPTY[.//NP]//VBN
Q5 (PSD): //ProteinEntry[.//feature-type/text()="modified site"][.//status/text()="predicted"][.//author/text()="Needleman, S.B."]//year
Q6 (DBLP): //inproceedings[.//i]//ee

Table 5.3: Query Categories

5.6.4 Path Evaluation

One of the most important features of any XML system is its ability to evaluate path expressions quickly. Using both our storage scheme (with the skip-join algorithm [57]) and Zhang's implementation (with their NoK algorithm), we repeated the execution of the queries listed in Table 5.3 on the DBLP, PSD and TreeBank databases three times. As can be seen, the queries selected test the performance of branch queries and ancestor-descendant queries. As we reported before, our PC ran out of memory when trying to load a 500 MB XML document using Zhang's storage scheme.
However, our storage scheme was able to process the document without any problem, so we have included the run time for path evaluation on the 500 MB data set to show the scalability of both systems.

Figure 5.12 shows the overall run time of each query on different size databases, using the existing skip-join algorithm on Scheme (S) and the NoK algorithm [98] on Scheme (Z). Lines labeled Z1–6 are the run times of the NoK algorithm and the lines labeled Q1–6 are the run times of our skip based join algorithms. The NoK implementation obtained from its author was unable to successfully evaluate Q4, and hence Z4 is omitted from the figure.

Figure 5.12 suggests path evaluation is practical on our storage scheme. In fact, the path evaluation for Q1–Q6 using skip-join algorithms and our storage scheme yields a linear performance curve. This is because the skip based join algorithms require the system to first scan through the internal nodes to select sets of candidate nodes before either the structural join or twig join can be performed. However, for most queries, we only need a maximum of one scan of the internal node layer for selecting all necessary candidate nodes. The higher run time for query Q5 compared to other queries is mainly due to the testing of text node values, since we have to fetch each text node's value. Overall, Figure 5.12 shows that our proposed skip based join algorithms are significantly more scalable when used on the proposed storage scheme.

5.7 Conclusions

A compact and efficient XML repository is critical for a wide range of applications such as mobile XML repositories running on devices with severe resource constraints. For a heavily loaded system, a compact storage scheme could be used as an index storage that can be manipulated entirely in memory and hence substantially improve the overall performance. In this chapter, we proposed an elegant succinct data structure for storing XML data.
There are several strengths to our data structure:

• Our data structure is exceptionally concise, without sacrificing update performance, a fact which we have demonstrated both theoretically and practically.

• We support all standard navigational primitives (parent, first child, and sibling navigation) in near constant time.

• Our data structure implicitly maintains document ordering information, and the relative order of two nodes can be determined extremely quickly (by simply comparing their relative positions in the structure).

• In addition to traditional navigational primitives, our data structure supports both structural and twig joins, using only a single pass of the data structure. We have demonstrated in our experiments that the conciseness of the data structure means that making a single pass to evaluate such joins is an effective evaluation technique.

There are still a few open issues that need considering. First, we intend to extend our system to cope with traditional database issues such as concurrency and transactions. We believe that the structure lends itself to scalable implementations of both of these concepts, due to its small size, and the fact that during updates most of the work is done in a few sequential scans of the structure. We also plan to improve storage utilization for text data. Our storage scheme works best on "data heavy" documents, where there are many element nodes (an informal survey indicated that most collections available on the Internet fall into this category). It is an open problem to design an efficient representation for XML document collections which primarily consist of text data. Such collections fall more into the realm of information retrieval than databases, and hence it would be expected that IR techniques would be highly applicable.

5.8 Acknowledgments

We would like to thank Zhang et al [98] for providing the implementation of their NoK system.
Chapter 6

Synchronization for Mobile XML Data

You can always tell when a man's well informed. His views are pretty much like your own.
— H. Jackson Brown, Jr.

The previous chapters presented different approaches to tackle the fundamental problems of maintaining XML: the order maintenance issue, performing structural joins, and storing XML in optimal space. Those problems have unavoidable lower bounds on time and space complexity. At a higher level, however, many difficult XML problems with high worst-case time complexity belong, in most practical situations, to a much smaller subclass, so a simple suboptimal solution is often good enough in practice. Here we present one of many such problems.

6.1 Introduction

The growing trend towards mobile computing and the increasing popularity of XML have resulted in more and more hand-held applications accepting their data in XML format. Due to this, some vendors have provided hand-held XML database management systems for integrating enterprise applications such as sales force automation systems with a mobile workforce. Others have used XML for defining synchronization protocols between the global database servers and mobile databases. For instance, SyncML is a proposed synchronization protocol which runs over different Internet and wireless transports. An updategram, used by Oracle and SQL Server, is XML generated by agents to notify the client of changes to the data on the server, and vice versa.

Consider a database environment where an XML server database system shares portions of data (e.g., legacy data exchanged in XML format, a part of a large XML document, or a subset of document collections) with a set of intermittently connected clients. The connectivity is intermittent due to an unstable or expensive connection.
Hence clients retrieve a copy of the shared data from the server and maintain it in their local database. In this chapter, the retrieval language is XPath [85] extended with update operators as proposed similarly in [81, 90]. Updates made to this local database are propagated to the server database when the client connects. The data shared between the server and some Client A may also be shared with another Client B; therefore, changes to that data at Client A should be reflected at Client B. Since the clients are only intermittently connected and cannot directly send changes to other clients, the server acts as a conduit for updates by forwarding the updates to its relevant clients. In fact, the server is responsible for tracking client updates to shared data and batching those updates for dissemination to other clients which share the data.

To solve this problem, we could adopt the current approach used in most intermittently connected relational databases. In these systems, each client is treated individually such that update files are created containing updates relevant to each particular client (on a per-client basis). That is, for each client, the server prepares a client-specific update file. This is called the client-centric approach [63] because it aggregates database changes based on the data needed by each client. Unfortunately, the processing and sending of each client-specific file is expensive in terms of server processing and network bandwidth consumption; therefore, the server processing load is on the order of the number of clients. That is, the server incurs additional cost for each and every client, so the number of clients that can be served is limited.

Mahajan et al [63] proposed exploiting the overlap of data shared between various clients to increase the scalability of the server.
This was accomplished with data-centric processing, rather than client-centric processing, by grouping data according to how it is shared between clients. In the data-centric approach, the server creates an update file for each data group. Unlike the client-centric approach which builds an update file for each client, the data-centric approach builds update files for data groupings and requires the clients to merge the correct set of update files to retrieve the needed updates. Hence, the data-centric approach reduces the complexity of update file maintenance from the order of the number of clients to the order of the number of groups, thereby increasing the scalability of server processing.

However, as XML information is semistructured and may not have a rigid schema, the techniques proposed in Mahajan et al [63] and also Yee et al [95] cannot be applied. In this chapter, we exploit this data-centric grouping idea and propose a hierarchical grouping structure based on data sharing. In particular, data sharing is determined by a client's subscription. Moreover, determination of whether an update is related to a client group becomes difficult due to the complexity of XML data and query structures.

Figure 6.1: Synchronization scenario and XML graph model. (a) Architecture of an XML based synchronization system; (b) an example XML document represented in the graph model.

6.2 Background

In this section we present the architecture of an XML-based mobile database system. We then describe how we use XPath expressions as a retrieval language to specify the subset of data to be stored in the local cache of a mobile client.
6.2.1 An XML-based Mobile Database Architecture

Figure 6.1(a) shows the general architecture of an XML-based mobile database system. The backend server, S, stores information which is shared between mobile devices (A, B, C, D, E, F). This information, which may exist in a different format, is converted to XML. Mobile devices can identify a subset of the data that is of interest by specifying a general path expression for their subscription.

6.2.2 XPath as an Access Language

XPath is a query language, similar to XSL pattern syntax. It is used to address and filter the elements and text of XML documents. An XML document can be viewed as a tree where every XML element is represented as a node and an edge represents a relation between two nodes (Figure 6.1(b)). An XPath expression consists of a set of path expressions combined using binary and set operators. A path expression contains a list of literal strings or wildcard (*) operators, delimited by either the child (/) or descendant (//) operator. Literal strings and wildcard operators are used to match against XML element names, while the child and descendant operators are used to match the relationship between those XML elements. Each literal string and wildcard operator can optionally contain predicates ([]) for filtering. A predicate contains an XPath expression with the matched element name acting as the root of the tree.

We use an XML database management system which manages computer hardware sales force automation systems as an example throughout this chapter. Different clients issue different XPath expressions to identify the subset of data they are interested in:

• Retrieve all computer systems: /Product/Computers/Item

• Retrieve all components: /Product/Components/*/Item

• Retrieve all products manufactured by XYZ: /Product//Item[Brand = "XYZ"]

• Retrieve all harddisks manufactured by ABC or XYZ:
/Product/Components/Harddisks/Item[Brand = "ABC" or Brand = "XYZ"]

In this chapter, we follow the notation of other related work by indicating the XPath fragment (the subset of XPath limited by the permitted operators). XPath consists of the following operators: /, //, [], * and |. For example, XP(//,*) denotes the XPath fragment where only the descendant and wildcard operators are allowed. We also follow Miklau and Suciu [69] in viewing an XPath fragment as a tree pattern. For instance, the pattern a/b//c[d][*/e] corresponds to the tree pattern in Figure 6.2(a).

6.3 Related Work

6.3.1 The XPath Query Containment Problem

Definition The XPath query containment problem is to determine the partial order of two XPath expressions p, q: for any node n in an XML document D, whenever n matches p, it also matches q. We denote such a partial order as p ⊑ q.

Solving the XPath query containment problem means solving the synchronization problem of multiple subscriptions of XML. Thus, it has received a lot of attention [12, 28, 69, 73, 91, 92]. However, the XPath query containment problem cannot be decided efficiently. Miklau and Suciu [69] were the first to show that XP(/,//,[],*) is coNP-complete, and Neven and Schwentick [73] proved that XP(/,//,[],*,|) remains in coNP. Deutsch and Tannen [28] considered existential semantics for variables and showed that XP(/,//,[],*,vars) is also intractable. Neven and Schwentick [73] further proved that XP(/,//,[],*,|) with a finite alphabet is in PSPACE, XP(/,//,[],*,|,DTD) is in EXPTIME, and XP(/,//,[],*,|,DTD) with node-set equality is undecidable. Olteanu et al [2] showed that each XPath expression has an equivalent XPath expression without backward axes, at the cost of exponential space. Clearly, none of the above is desirable as we need to achieve sublinear performance. As there can be a huge number of clients, we need a suboptimal approach.
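By contrast, overlap in heavily restricted fragments is easy to decide. As an illustration (not from the thesis; the function names are ours), two paths in XP(/,*) can select overlapping subtrees only if their steps are pairwise compatible up to the shorter path's length, a linear-time check:

```python
def tokens(path):
    """Split a simple XPath (child axis only) into its step tokens."""
    return [t for t in path.split("/") if t]

def steps_compatible(a, b):
    """Two steps match if they are equal or either one is a wildcard."""
    return a == b or a == "*" or b == "*"

def may_overlap(p, q):
    """For XP(/,*): p and q can select overlapping subtrees iff every
    step up to the shorter path's length is compatible (the shorter
    path then addresses an ancestor-or-self region of the other)."""
    return all(steps_compatible(a, b) for a, b in zip(tokens(p), tokens(q)))

# The containment and disjointness examples from Section 6.4:
assert may_overlap("/Product/Computers", "/Product/Computers/Item")
assert not may_overlap("/Product/Computers", "/Product/Components")
assert may_overlap("/Product/*", "/Product/Computers")
```

With wildcards this is a conservative test (it reports potential overlap), which is the safe direction for update propagation: merging two actually-disjoint subscriptions only costs bandwidth, never correctness.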
6.3.2 The XPath Filtering Problem

By using XPath as a profile language, an efficient filtering mechanism that takes structure information into account for matching each subscription against XML stream data was presented in [8]. However, the merging of similar subscriptions for further optimization was not addressed. Our work also has similarities with the recent work proposed in [19, 69], in which the containment of XPath queries was investigated in detail. In particular, a new data structure based on the string Trie index was proposed in [19]. Their proposed data structure is similar to ours in that paths are encoded in a directed acyclic graph. However, ours differs in the handling of wildcards and descendant operators. [69] focused mainly on the tractability and analysis of methods for determining the containment of tree-pattern queries, in which XPath was selected as the query language. It described how to determine the containment of XPath queries efficiently, but did not explore the merging and handling of contained queries. Furthermore, neither work addressed the problem from a mobile synchronization perspective. Hence, containment of queries was not applied for clustering clients into groups according to their subscription interests such that their updates are efficiently synchronized within each group. More importantly, selective propagation of updates (e.g., based on the containment of updates and subscriptions of different groups of clients) was not addressed.

Furthermore, our work also shares similar motivations with several other efforts including [5, 6, 24, 63, 94, 95]. However, all these efforts only considered primitive or less expressive subscription languages. For instance, [6] considered conjunctions of simple event predicates, where each event is checked against an attribute value. Although efficient index structures for selective dissemination were presented in [94], only the boolean model was considered.
In [24], efficient algorithms for merging geographic queries were proposed. However, an efficient data structure for handling merged subscriptions was not addressed.

Similar motivations can also be found in mobile database applications. In [63], scalability is enhanced by grouping mobile clients according to their interests in sharing data in relational databases. A similar concept was recently applied for efficiently maintaining replicas in an intermittently connected environment in [95]. With the exception of [24], the works above did not attempt to reduce costs by automatically merging similar queries. Finally, an extensive survey on recent research and development related to semistructured and web data, ranging from data models to query languages to database systems, was presented in [36]. Information regarding recent standards, techniques, and systems can be found at many XML portals such as xml.com and xml.org.

Other noteworthy mobile computing works include Bayou [25] and Deno [54], which focus on conflict resolution and consistency maintenance. These works use mechanisms such as compensating transactions and voting protocols to enforce constraints. Moreover, as the number of clients maintained by each server increases, clients must be serviced in groups in order to maintain scalability. Broadcast databases [5] addressed this problem in the wireless domain but are primarily aimed at reducing the response time for data requests.

Some analogous concepts of identifying and resolving conflicts have also been introduced in the field of Computer Supported Cooperative Work (CSCW). In CSCW, a direct conflict occurs when two or more users target the same object, and certain operations may need to be undone to provide document consistency. This is analogous to users targeting the same path in an XML database and the rollbacks required for XML data.
6.4 Overview of Solution

This section provides a basic idea of how the proposed query merging mechanism works. The key to our solution is an efficient mechanism to determine if two XPath expressions are overlapping. Overlapping expressions are merged so that the server can process fewer updates and the amount of information sent may be reduced (e.g., by exploiting the advantages of multicasting). However, we assume here that the client applies a post-filtering query over the received data in order to perform the update to its local data.

Two XPath expressions are considered overlapping if the XML segments retrieved from the same XML document by these two expressions are also overlapping, or one is completely contained in the other.

Consider the following example from the computer hardware sales force automation system:

/Product/Computers ⊇ /Product/Computers/Item[Brand = "XYZ"]

A computer item with brand name XYZ, represented by a path element Item, is a child element of element Computers. Their relationship is reflected in the above XPath expressions. In this case information regarding that Item should be delivered to both subscribers. However, the first subscriber is interested in the more general computer product, which may or may not be of interest to the second subscriber depending on whether the expression is about Item. Consider another example below:

/Product/Computers ∩ /Product/Components = ∅

These two subscriptions are mutually exclusive (since elements Computers and Components are two distinct children of element Product) so they cannot be merged. Similarly, even though both subscriptions below are interested in Item elements, they cannot be merged as these two Item elements are under two independent (i.e., mutually exclusive) parents (Computers and Components).
/Product/Computers/Item ∩ /Product/Components/Harddisks/Item = ∅

The wildcard (*) is more general than any literal token since it can match any literal within the specified scope. For instance, the wildcard below can match any child element of /Product, including the element Components:

/Product/* ⊇ /Product/Computers

When descendant operators are involved, we cannot determine whether two subscriptions are independent by observing the XPath expressions alone. However, we can still determine their dependency if schema information is available. For instance, we can confirm that the following two subscriptions are not independent, as Computers contains Brand:

//Brand ∩ /Product/Computers ≠ ∅

Finally, further dependency information can be obtained by observing the predicates inside the XPath filter conditions. For example, the following two subscriptions are independent (because of their exclusive price ranges) although they are both interested in the child elements under Product:

//Product//Item[Price < 10] ∩ //Product//Item[Price > 30] = ∅

All XPath functions, like the fn:count() function, are based on the full result set of element nodes, and their disjoint relations cannot be determined statically, as the following example illustrates. Therefore they can be treated as though they do not exist:

fn:max(//Item/Price) ∩ fn:min(//Item/Price) = ∅ if fn:count(//Item/Price) > 1, ≠ ∅ if fn:count(//Item/Price) ≤ 1

6.4.1 Transactions from Other Computers

The update statement in SQL plays a crucial role in making the manipulation of data stored in relational databases convenient and expressive. While the original XPath proposal did not include any update capabilities, the extended XPath [90] supports a complete set of update constructs, from create to copy and move. These constructs are implemented as functions in XPath and can be invoked like other standard XPath functions.
Constructors

New elements, attributes, or texts can be created interactively by the insert function. The function accepts a plain path (i.e., a path without filters or subqueries) as its only argument. A quoted string in the path will be treated as the value of a text or an attribute value. For example, the following will create an empty element Name under every Restaurant element:

*/Restaurant/xfn:insert(Name)

The following will create an Entree element under every Restaurant, then create a Name element under Entree, and finally create the text "Black bean soup" under Name:

*/Restaurant/xfn:insert(Entree/Name/"Black bean soup")

Finally, the example below will create an attribute Note with value "Sunday Only" for the second Entree of each Restaurant:

*/Restaurant/Entree[2]/xfn:insert(@Note/"Sunday Only")

Similarly, there are xfn:insert-before() and xfn:insert-after() constructors to insert a path as a sibling before or after the current reference node, respectively. For example, the following will insert an element Cafe on the same level as Restaurant, just before the second Restaurant:

Restaurant[1]/xfn:insert-before(Cafe)

Delete

Delete can be executed by invoking the function xfn:delete() with no arguments. It will delete all the nodes (and their descendants) from the current context. For example, the following will delete all the Names (and their descendants) from Restaurant elements. Note that the Restaurant elements will not be deleted in this case:

Restaurant/Name/xfn:delete()

Another example below will delete everything from the current context. If the root context is set to the global entry point of the whole database, it will delete all data in the database:

*/xfn:delete()

Copy

Cloning elements, attributes, or texts is possible by using the copy function.
It accepts one argument, which is the source of the copying. It will copy all the nodes (and their descendants) from the resultant context set of the evaluated argument to every node in the current context. For example, the following will copy the content of the first Entree of Restaurants in the whole repository to the second Entree of the Restaurants:

(Restaurant/Entree)[1]/xfn:copy(//(Restaurant/Entree)[0]/*)

Note that every node in the current context will get a copy of the argument path, so the following example will make a copy of the content of the first Entree to every Entree, including the first Entree itself. The argument path will be evaluated every time against every node in the current context. We define the subpath of an operation to be the argument path of the operation.

Restaurant/Entree/xfn:copy(//(Restaurant/Entree)[0]/*)

As with Create, there are xfn:copy-before() and xfn:copy-after() operations besides xfn:copy(), which will copy the source to just before or after the reference node, at the same level.

Move

The move operation will move the resultant reference nodes from the evaluated argument path to become children of the nodes in the current context. As with xfn:create() and xfn:copy(), there are xfn:move-before() and xfn:move-after() operations. For example, the following will move the Ratings of Entrees to just after the Price elements of Entrees:

*/Restaurant/Entree/Price/xfn:move-after(//Restaurant/Entree/Rating)

Note that before the actual move operation is executed, validity checking needs to be done to ensure that ancestor nodes are not being moved to become descendant nodes of the nodes in the current context. Otherwise, nodes in the current context would become invalid and no longer accessible from the root entry point(s).
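This validity check can be sketched with Python's standard ElementTree (a toy model of the check, not the thesis implementation; xfn:move semantics are simplified and the helper names are ours):

```python
import xml.etree.ElementTree as ET

def is_ancestor(anc, node):
    """True if `node` lies strictly inside the subtree rooted at `anc`."""
    if anc is node:
        return False
    return any(ch is node or is_ancestor(ch, node) for ch in anc)

def safe_move(root, source, target):
    """Re-parent `source` under `target`, refusing moves that would make
    an element a descendant of itself (the validity check described
    above for xfn:move)."""
    if source is target or is_ancestor(source, target):
        raise ValueError("cannot move a node under its own descendant")
    for parent in root.iter():      # detach source from its current parent
        if any(ch is source for ch in parent):
            parent.remove(source)
            break
    target.append(source)

doc = ET.fromstring("<a><b><c/></b><d/></a>")
b, c, d = doc.find("b"), doc.find("b/c"), doc.find("d")
safe_move(doc, d, c)                # legal: d becomes a child of c
```

Attempting `safe_move(doc, b, c)` afterwards raises, since c sits inside b: moving b under c would detach that whole subtree from the root entry point.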
Update

The value of an element or attribute can be updated by using the update function with the new value as an input argument. For example, the following will update the Name and Price of the 2nd Entree of each Restaurant to "Onion soup" and "2.04" respectively:

• */Restaurant/Entree[2]/Name/*/xfn:update("Onion soup")

• */Restaurant/Entree[2]/Price/*/xfn:update("2.04")

Update can also be used to update the tag name of an element. For example, the following will rename all the Restaurant tag names to Cafe:

*/Restaurant/xfn:update(Cafe)

Figure 6.2: Tree Pattern and DataGuide. (a) The tree pattern corresponding to a/b//c[d][*/e]; (b) DataGuide of Figure 6.1(b).

6.5 Data Structure and Algorithms

Definition A DataGuide is a dynamically generated structural summary G of a database graph D, proposed by Goldman and Widom [41]; we limit it to a tree instead of a graph and modify the definition for the purposes of this chapter. Every root-to-leaf path instance of D can be found in G, every root-to-leaf path instance of G can be found in D, and every root-to-leaf path of G must be distinct. Figure 6.2(b) is an example of a DataGuide derived from Figure 6.1(b).

The main problem with having a descendant operator in an XPath fragment is that a particular tag name can occur at different depths, whereas the root-to-node path must be unique. We can eliminate the descendant operator by maintaining the DataGuide instance at runtime, matching the XPath fragments containing descendant operators against the DataGuide, and simplifying each into multiple XPath fragments without the descendant operator.
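As a minimal illustration of both steps (building a tree DataGuide, then expanding a descendant step into child-only paths against it), consider the following sketch. The nested-dict representation and all function names are our own, not the thesis implementation:

```python
import xml.etree.ElementTree as ET

def dataguide(nodes):
    """Collapse same-tagged siblings: one guide child per distinct tag,
    built from the union of the children of all nodes carrying that tag,
    so every root-to-leaf label path occurs exactly once."""
    groups = {}
    for n in nodes:
        for child in n:
            groups.setdefault(child.tag, []).append(child)
    return {tag: dataguide(ns) for tag, ns in groups.items()}

def expand(guide, tag, prefix=""):
    """Rewrite a descendant step //tag into explicit child-only paths."""
    out = []
    for t, sub in guide.items():
        p = prefix + "/" + t
        if t == tag:
            out.append(p)
        out.extend(expand(sub, tag, p))
    return out

# A fragment shaped like Figure 6.1(b) (element names from the text):
doc = ET.fromstring(
    "<Product>"
    "<Computers><Item><Brand/><Qty/><Price/></Item>"
    "<Item><Brand/><Price/></Item></Computers>"
    "<Components><Monitors><Item><Brand/></Item></Monitors></Components>"
    "</Product>")
guide = dataguide([doc])
print(expand(guide, "Brand"))
# ['/Computers/Item/Brand', '/Components/Monitors/Item/Brand']
```

Note how the two sibling Item elements under Computers merge into a single guide node, and //Brand is rewritten into the two distinct child-only paths that actually occur in the data.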
Figure 6.3: Data structure of the Containment Index. (a) Example of sorted XPath expressions before creating the Containment Index; (b) Client Table and Containment Index; (c) a row in the Client Table.

6.5.1 Merging Simple Path Expressions

A naive approach to merging subscriptions has very poor performance. For instance, whenever a new subscription is created, it needs to be checked against all existing subscriptions or groups of subscriptions to determine if it overlaps with any of them.

To address this problem, we present the following index structure, briefly illustrated by the diagram shown in Figure 6.3b.

With this index structure, we are able to improve the performance of merging subscriptions substantially, as it captures the subscription containment relationships between clients. The Containment Index is a directed acyclic graph (in practice, it is a tree with some index nodes pointed to by more than one parent node). Each index node holds a list of client identifiers (cid). Each cid uniquely identifies a subscription client. The parent-child relationship of the index structure represents the subscription containment relationship, in which the data of interest to each cid of an index node is a superset of the data of interest to the cids of all its child index nodes. In other words, data of interest to the cids of an index node is also of interest to the cids of its parent index node. Cids held by the same index node imply equivalence, i.e., the clients share the same interest or subscription.

Each index node contains the following variables:

Cids: Client identifiers.
Note that the maximum number of client identifiers an index node can hold depends on an adjustable, predefined constant. </p><p>Next Pointer: For performance and an efficient implementation of the paging mechanism, an index node is implemented as a fixed-size block. If the number of cids exceeds the maximum allowed, another index block is created and chained to the current block using the next pointer (in a linear manner). </p><p>Running level: Every XPath expression of the cids in the same index node has the same number of path tokens. The running level is an integer representing that number of tokens. The running level of an XPath expression containing descendant operators is treated as if the expression were expanded with respect to the schema of the document (e.g., its DTD). </p><p>Note that the running level of an index node whose XPath expressions contain descendant operators changes at run-time, depending on the other path expressions in the Index. This is based on the assumption that no schema is provided. </p><p>The Containment Index requires that each path token of a given path be represented by an index node. For example, the Containment Index in Figure 6.3b has an index node which contains cid = -1. In this case, we suppose that /c/a exists in the database; hence, the index node containing cid = -1 acts as a 'dummy' node for the path token /c of cid = 2. </p><p>Tokenization </p><p>The XPath parser used in our prototype development is an event-based parser which breaks XPath expressions into path tokens via callback functions. Each wildcard operator (*), child operator (/), descendant operator (//) and literal (e.g. Stock) is considered a single path token. Although predicates ([]) need to be checked for subscription dependency, they are treated separately using a technique similar to the one presented in [6]. 
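</p><p>As an illustration of the tokenization rule above, a minimal regex-based tokenizer could look as follows; this is our own sketch, whereas the prototype uses an event-based parser with callbacks.</p><p>

```python
import re

# Hypothetical sketch of the tokenization rule: each descendant operator
# (//), child operator (/), wildcard (*) and literal is one path token.
# Predicates ([]) would be handled separately, as noted in the text.
# '//' must precede '/' in the alternation so it is matched greedily.
def tokenize(xpath):
    return re.findall(r"//|/|\*|[A-Za-z_]\w*", xpath)

tokens = tokenize("/a//*/Stock")
# -> ["/", "a", "//", "*", "/", "Stock"]
```
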
</p><p>The index structure also stores the parse tree of each XPath expression. Common subscriptions can be located in constant time using the Client Table (Figure 6.3b), which stores basic information about each client as well as its XPath expressions. </p><p>A path token node contains the following variables (see Figure 6.3c), along with other runtime variables based on the parse tree structure: </p><p>Start Position: The position of the first character of the represented path token in the XPath literal string. </p><p>End Position: The position of the last character of the represented path token in the XPath literal string. Together with the Start Position attribute, the End Position can be utilized for string comparisons. </p><p>Filters and Predicates: A list of parsed predicates from the XPath parse tree. This list allows the index engine to further refine the disjoint detection mechanism, especially for expressions with predicates. </p><p>Each tokenized XPath expression is annotated with its total number of tokens, which is the running level of the expression. To construct the index from a set of existing subscriptions, all XPath expressions first pass through the tokenizer. They are then sorted by the number of path tokens in increasing order. The second sorting criterion is the following path token order: </p><p>path op ≺ descendant op ≺ wildcard op ≺ literal </p><p>Figure 6.3a shows a sorted list of tokenized XPath expressions that respects the above ordering. The sorting enables construction of the whole Containment Index in an effective order. </p><p>Insertion </p><p>When a tokenized XPath expression is inserted into the Containment Index, we start from the root index node and keep track of the current running level (l) variable. This is necessary because the Containment Index represents the overlap between subscriptions, not the XPath expressions themselves. 
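</p><p>The two-level sort described above (token count first, then the path token order) can be sketched as follows; the numeric ranks are our own encoding of the stated order.</p><p>

```python
# Sketch of the pre-sort used before bulk construction of the index.
# Primary key: number of tokens (increasing); secondary key: token order
#   path (child) op < descendant op < wildcard op < literal.
TOKEN_RANK = {"/": 0, "//": 1, "*": 2}      # literals rank last (3)

def sort_key(tokens):
    return (len(tokens), [TOKEN_RANK.get(t, 3) for t in tokens])

exprs = [["/", "a"], ["/"], ["//", "a"], ["/", "*"]]
exprs.sort(key=sort_key)
# "/" sorts first, then "/*" before "/a", then "//a" -- as in Figure 6.3a.
```
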
The depth of the Containment Index does not directly correlate to the position of path tokens in the XPath expression that we are comparing; therefore the running level is necessary to identify the token in the XPath token list that is being compared. </p><p>Traversing the Containment Index and inserting an XPath expression without wildcard and descendant operators is simple. We first describe the insertion algorithm assuming no wildcard or descendant operators are present. The algorithm is then extended to handle wildcard and descendant operators. For clarity, we present the algorithms below using recursion, while their actual implementations use an iterative approach. </p><p>When a cid is inserted into an index node, the reverse pointer in the Client Table for that cid is also set, for quick look-up. Also, when a new index node is created, its running level is set according to the running level of the given XPath expression. </p><p>Algorithm 6.1
INDEX-NODE-CREATE(cid)
// return a new index node holding cid
1: n ← ALLOCATE-INDEX-NODE()
2: for i ← 1 to CIDMAX do n.cid[i] ← ∞
3: n.{parent, child, sibling, next} ← ∅
4: n.cid[0] ← cid
5: n.runLvl ← client[cid].PE.size() − 1
6: return n

IS-EQUIV-IN(cid, node, l)
1: while node ≠ ∅ do
2:   for i ← 0 to CIDMAX do
3:     if node.cid[i] = ∞ then return false
4:     if client[node.cid[i]].PE.token(l) = client[cid].PE.token(l) then return true
5:   node ← node.next
6: return false </p><p>6.5.2 Handling Wildcard/Descendant Operators </p><p>The pseudo-code above greatly simplifies the insertion of a client subscription to illustrate the main structure of the algorithm. This was done by disregarding all issues involving wildcard (*) and descendant (//) operators. A wildcard operator is treated as the parent of all literal operators if all their ancestors, without predicates, are equal. For example, /a/b/* is the parent of /a/b/c. 
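</p><p>The wildcard-as-parent rule can be sketched as a simple containment test over token lists; this is an illustrative helper of ours, and predicates are ignored as in the prose.</p><p>

```python
# A token list containing '*' is treated as the parent of an otherwise
# equal list with a literal in that position, e.g. /a/b/* contains /a/b/c.
def wildcard_contains(parent, child):
    return (len(parent) == len(child) and
            all(p == c or p == "*" for p, c in zip(parent, child)))

ok = wildcard_contains(["a", "b", "*"], ["a", "b", "c"])  # True
```
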
During insertion, if the current path token of the XPath expression being inserted is a wildcard operator, then instead of checking the node's last child, we need to perform the insertion on every child node. This idea is illustrated in the pseudo-code below. </p><p>Every XPath expression that contains descendant operators has to be checked against the schema of the XML documents on the server. This checking process involves retrieving all possible paths in the schema and inserting them accordingly. However, as the elements of the schema form an acyclic graph and due to the nature of the inclusion, only the first occurrence of the expression in such cycles is considered. For example, if /a/b/a exists in the schema, //a expands to /a only. </p><p>Algorithm 6.2
CLIENT-INSERT(cid)
1: T.root ← CONTAIN-INSERT(cid, T.root, 0)

// node is the root of the subtree for insertion; l is the running level
// denoting which token to check; assume all inserted PEs are pre-sorted
CONTAIN-INSERT(cid, node, l)
1: if node = ∅ then
2:   return INDEX-NODE-CREATE(cid)
3: if node.runLvl = l then
4:   if IS-EQUIV-IN(cid, node, l) then
5:     if client[cid].PE.size() − 1 = l then
6:       node.insertCid(cid)
7:       client[cid].ptr ← node
8:     else
9:       static c
10:      if c.parent() ≠ node then
11:        c ← node.firstChild()
12:      while c ≠ ∅ ∧ ¬IS-EQUIV-IN(cid, c, l) do
13:        c ← c.nextSibling()
14:      n ← CONTAIN-INSERT(cid, c, l+1)
15:      if n ≠ c then
16:        node.insertChild(n)
17:    return node
18:  else
19:    return INDEX-NODE-CREATE(cid)
20: else
21:  if IS-EQUIV-IN(cid, node, l) then
22:    n ← INDEX-NODE-CREATE(−1)
23:    n.runLvl ← l
24:    n.insertChild(node)
25:    CONTAIN-INSERT(cid, n, l+1)
26:    return n
27:  else
28:    return INDEX-NODE-CREATE(cid) </p><p>Algorithm 6.3
// insert after line 3 in CONTAIN-INSERT
1: if client[node.cid[0]].PE.token(l) = '*' then
2:   if client[cid].PE.token(l) = '*' then
3:     if client[cid].PE.size() − 1 = l then
4:       node.insertCid(cid)
5:       client[cid].ptr ← node
6:     else
7:       n ← c ∈ node.childs() s.t.
8:            c.cid[0] = '*'
9:       if n ≠ ∅ then
10:        CONTAIN-INSERT(cid, n, l+1)
11:        for each c ∈ node.childs() − n do
12:          CONTAIN-INSERT(cid, c, l)
13:      else
14:        n ← CONTAIN-INSERT(cid, node.lastChild(), l)
15:        if n ≠ node.lastChild() then
16:          node.insertChild(n)
17:    return node </p><p>6.5.3 Synchronization Engine </p><p>When a mobile client issues an update request and sends it to the server, the Integration Module communicates with the XML Database. If the transaction is successful, it passes the mobile client identifier (cid) and the query (q) to the Synchronization Engine. The Synchronization Engine locates the pointers associated with cid in the Client Table. It then uses the pointer (or pointers, if the XPath expression contains wildcard and/or descendant operators) to locate the index nodes within the Containment Index which contain the client identifier. Note that only XPath expressions with wildcards and/or descendant operators hold a list of pointers; all other XPath expressions point to a single node. </p><p>In the non-enhanced version of XSync, once the index node is found, the update is broadcast to all the client identifiers in index nodes that are ancestors or descendants of the original index node. Although this approach achieves relatively good results, it can be greatly improved. </p><p>The equivalence binary operator (=) always evaluates to true when comparing a wildcard operator to a literal string. </p><p>6.6 Enhancements of Synchronization </p><p>As all ancestor index nodes in the DataGuide represent subscriptions to data that are supersets of the subscription of the node itself (without considering predicates), it is necessary to forward all updates performed by a client to its ancestors. 
However, this is not the case for descendants. If the client subscription covers a large portion of the XML document, forwarding updates to all descendants results in a large amount of communication between the clients and the Synchronization Engine. However, by combining a mobile client's update query with its own subscription XPath expression, the Engine is able to compute a disjoint set among its descendant nodes. </p><p>For example, suppose a client with subscription /a issues the update operation: </p><p>xfn:client(15)/d/e/f/xfn:update("g") </p><p>The Engine can merge the update query with the client's XPath subscription to form a new path expression /a/d/e/f. It can then match this against the descendants of the DataGuide node containing client 15. In this example, all descendants match as disjoint, and thus all client IDs in the subtrees are excluded from the set of broadcasting clients. </p><p>Algorithm 6.4
CLIENT-SEARCH(cid, q)
1: C ← ∅
2: for each ptr ∈ client[cid].ptr do
3:   node ← ptr
4:   // include equivalent PEs
5:   C ← C ∪ {ci : ci ∈ node.cid}
6:   for each c ∈ node.parents() do
7:     C ← C ∪ CLIENT-SEARCH-UP(c)
8:   if processLoad() > bandwidthLoad() then
9:     for each c ∈ node.childs() do
10:      C ← C ∪ CLIENT-SEARCH-DOWN-ALL(c)
11:  else
12:    for each c ∈ node.childs() do
13:      C ← C ∪ CLIENT-SEARCH-DOWN(c, q, 0)
14: return C

CLIENT-SEARCH-UP(node)
1: C ← node.cid
2: for each n ∈ node.parents() do
3:   C ← C ∪ CLIENT-SEARCH-UP(n)
4: return C

CLIENT-SEARCH-DOWN-ALL(node)
1: C ← node.cid
2: for each n ∈ node.childs() do
3:   C ← C ∪ CLIENT-SEARCH-DOWN-ALL(n)
4: return C

CLIENT-SEARCH-DOWN(node, q, l)
1: C ← ∅
2: c ← node.cid[0]
3: if client[c].PE.token(l) = q.token(l) then
4:   C ← C ∪ node.cid
5:   for each c ∈ node.childs() do
6:     C ← C ∪ CLIENT-SEARCH-DOWN(c, q, l+1)
7: return C </p><p>The update statements in Section 6.4 can be classified into two categories: </p><p>Statements that do not affect other disjoint paths: These statements include xfn:insert() and xfn:delete(). 
If the statements do not contain wildcards or descendant operators, the Engine executes CLIENT-SEARCH-DOWN. Otherwise, it expands the descendant operator to determine all unique paths from the DTD. For each of these paths, the Engine searches the descendant nodes and checks the path tokens against the path of the update statement. Eventually, either the DataGuide reaches a leaf node or the path of the update statement runs out. At that point, the current DataGuide node and its descendants are treated as affected. </p><p>Statements that affect other disjoint sets: In this situation, the Engine has to perform two separate steps of overlapping expression detection, hence increasing the runtime cost. First, we check the overlap for the target path as described above; then the sub path expression has to be treated as a separate update, searching from the root DataGuide node as in a normal search. All client IDs located by the two overlapping expression detection mechanisms represent clients that are affected and have to be notified of the update. Examples of update statements in this category include xfn:move(PE), where PE is the sub path expression. </p><p>6.6.1 Update Merging </p><p>When mobile clients perform updates on their local cache of the database, they forward each update to the server so as to allow the server to forward the updates to the appropriate clients. The server determines whether an update should be forwarded to a given client based on that client's and the updating client's subscriptions. The server only forwards updates to those clients which are interested in the update. </p><p>Consider the situation where Client A has an overlapping subscription with Client B, and Client C has an overlapping subscription with Client B. When both Client A and C perform updates to their local cache, their update operations are forwarded to the server. 
A naive solution to keeping Client B up-to-date would involve broadcasting two separate update operations to Client B. Instead, XSync performs a merge between the two operations and encapsulates them into a single message to Client B, reducing the communication costs between the clients and the server. </p><p>However, merging several update operations from different clients into a single message raises issues of conflict detection and resolution. When clients perform updates on the same subset of nodes remotely, their update operations may conflict in terms of their target and/or sub paths. Hence, conflict detection is necessary to merge the updates of several mobile clients. </p><p>To analyze the problem of update merging, we first consider a specific example involving two clients. We then generalize our analysis to merging the update operations of n clients that are forwarded to the server. </p><p>Consider the simplified problem where two clients (Client A and Client B) have an overlapping subscription, where both have client IDs in the same DataGuide node, and each performs an update operation on its local database. The updates performed by Client A and Client B are forwarded to the server, and it is the responsibility of the server to detect and resolve any conflicts while forwarding these updates to the appropriate clients. </p><p>To do this, the server constructs a Containment Index structure similar to Figure 6.2(b), using the algorithm described in the previous section. In contrast to the figure, this Containment Index captures the containment relationships among the update operations performed by the clients. Hence, instead of client IDs, oids are stored in each DataGuide node. The server also maintains an Operation Table (similar to the Client Table) which contains basic information about each operation, including: </p><p>Oid: The Operation Identifier of the update operation. 
Each operation has a unique Oid; hence, an operation with a sub path has a different Oid from the operation with its target path. </p><p>Cid: The Client Identifier of the client which performed the operation. This allows the identification of conflicting operations between clients. </p><p>Operation: The type of operation that was performed on the target path (e.g. xfn:insert(PE)). </p><p>Target Path: A value which indicates whether the path being represented is the target path or the sub path. This aids in the conflict detection of overlapping paths. </p><p>Types of Conflicts </p><p>Given two edit operations, a Conflict occurs if and only if they have paths that are overlapping. We define two disjoint subclasses of Conflict: </p><p>Direct Conflict (DC): A DC is a Conflict in which the order that the operations are carried out is important. That is, if update operations x and y are in DC, then one of the operations, x or y, cannot be performed if the other operation is performed first. For example, let x be an insert operation and y be a delete operation. If y is performed first, x cannot be performed, as it deals with a node that has already been deleted. </p><p>This situation occurs when one of the operations in DC is xfn:update() or xfn:delete() and their target paths are in Conflict, or when one of the operations is a move operation and its sub path is in conflict with the path of the other operation. Note that the xfn:update() operation may participate in a DC, as the operation can update the tag name of an element. </p><p>Syntax Conflict (SC): An SC occurs when two update operations are Conflicting in terms of their target paths, or (if applicable) the sub path of one of the operations is Conflicting with a path of the other operation. 
The order in which two SC operations are performed on the database affects the resulting database, as we are dealing with the ordered model. </p><p>
           | op  | opAfter | opBefore
op         | Yes | No      | No
opAfter    | No  | Yes     | Yes
opBefore   | No  | Yes     | Yes

Table 6.1: SC between update operations in the same index node </p><p>Table 6.1 lists the update operations that are in SC, given that the operations are in the same DataGuide node; op denotes an update operation including insert, move and copy. </p><p>It is noteworthy that in Table 6.1, opBefore is in SC with opAfter. This occurs in the situation where one of the operations is position specific. For example, let x = a/b[0]/xfn:insertAfter(c) and y = a/b[0]/xfn:insertBefore(b). x and y are in SC because if x is performed first, then y, the resulting database would differ from the one obtained when y is performed before x. </p><p>Note that, regardless of the ordering of a pair of update operations that are in SC, the operations can still be applied to the database. This is not the case for operations in DC. </p><p>Conflict Detection and Resolution </p><p>Once the update operations of Client A and Client B have been processed to construct the Containment Index, we traverse the data structure to identify path conflicts. This is similar in concept to the steps carried out by the server in response to the update by client 15 at the beginning of Section 6.6. By constructing the Containment Index based on Client A's and B's update operations, we are able to detect conflicts. </p><p>In order to resolve conflicts, we have to consider each subclass of Conflict individually. </p><p>Direct Conflict (DC): The architecture of XSync implicitly orders the update operations that it receives from its clients; that is, the server receives the update operations serially. Hence, for operations that are in DC, if the operations are received in an order such that both can be performed on the database in that order, then the conflict has been resolved. 
</p><p>On the other hand, if the order in which the DC operations arrive at XSync results in one of the operations not being performable on the database, XSync provides a resolution to this conflict. On detection of such a conflict, XSync selects an operation (out of the two in DC) to undo, based on the DC resolution rules listed below: </p><p>1. A delete operation is always selected to be undone over any other operation. </p><p>2. A move operation is always selected to be undone over any other operation if Rule 1 does not apply. </p><p>3. An update operation is always selected to be undone over any other operation if Rules 1 and 2 do not apply. </p><p>The rules are listed in order of precedence. That is, Rule 1 is evaluated first and, if it does not apply, Rule 2 is evaluated, etc. </p><p>The intuition behind the resolution rules listed above is to undo the operation which has the more 'costly' effect on the database. For example, delete operations are always chosen by XSync to be undone because such an operation, if executed, would result in a significant amount of data being removed from the database. </p><p>Syntax Conflict (SC): As XSync receives update operations from its clients in a serialized manner, the order in which the SC operations are performed has already been resolved. However, given this serialized order, some operations that are in SC may still not be directly applicable to the database. </p><p>For example, given two insert operations with the same target path, one of the operations has to be modified syntactically to allow it to be performed on the database. This is because, after the first operation has been executed, the target path is no longer a leaf node and hence an insert operation cannot be performed on it (rather, the operation has to be changed to an insertAfter with a modified target path). 
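</p><p>The precedence among the three DC resolution rules can be sketched as a simple selection function; this is illustrative only, with operation kinds modelled as strings.</p><p>

```python
# Pick which of two directly-conflicting operations to undo:
# delete first, then move, then update (Rules 1-3 above).
# Any other kind (e.g. insert) is never preferred for undoing.
UNDO_PRECEDENCE = {"delete": 0, "move": 1, "update": 2}

def choose_undo(op_a, op_b):
    return min((op_a, op_b), key=lambda op: UNDO_PRECEDENCE.get(op, 3))

victim = choose_undo("insert", "delete")   # the delete is undone
```
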
</p><p>After all the conflicts have been resolved, we forward the merged sequence of update operations to all clients that have overlapping subscriptions with the initiating client(s). </p><p>Note that the responsibility for conflict detection and resolution lies with XSync. That is, clients need not provide facilities to deal with conflicts, which reduces the complexity of the clients that communicate with the server. </p><p>Multiple Clients </p><p>The above solution can be generalized to n clients. In this case, the Synchronization Engine maintains a single Containment Index for all client updates that arrive at the server. The Engine periodically broadcasts update operations to the appropriate clients based on the Containment Index for operations. Once the appropriate clients have been notified, the corresponding operations can be deleted from the Index. </p><p>The Containment Index is constructed by extracting all paths associated with each client's update operation, passing each path through the tokenizer (detailed in Section 6.5.1), sorting the tokens according to the path token order, and finally inserting each path into the Index. </p><p>The Engine first executes DETECT-CONFLICTS to detect and resolve any conflicts. It then executes UPDATE-MERGE to broadcast the update operations issued by the clients so far to all applicable clients. </p><p>Note that this solution also scales to the situation where a client issues several update operations to the server before the server broadcasts the updates to all the appropriate clients. </p><p>Algorithm for Detecting and Resolving Conflicts </p><p>The algorithm below detects conflicts between the update operations of n clients. </p><p>Algorithm 6.5 Detect conflicts between update operations of different clients, where C is the set of clients that forwarded update operations to the server. 
DETECT-CONFLICTS(C)
1: dc ← ∅
2: for each cid ∈ C do
3:   cidSet ← CLIENT-SEARCH(cid) − {cid}
4:   dc ← dc ∪ DETECT-DIRECT-CONFLICTS(ROOT(T), cid, cidSet)
5: RESOLVE-DIRECT-CONFLICTS(SORT(dc))
6: for each cid ∈ C do
7:   cidSet ← CLIENT-SEARCH(cid) − {cid}
8:   DETECT-SYNTAX-CONFLICTS(ROOT(T), cid, cidSet) </p><p>Detecting DCs involving the delete operation involves traversing the Containment Index: all oids in nodes which are descendants of a delete operation node are in DC, and similarly for the move and update operations. We then have to undo some operations (if applicable) in order to resolve the conflict. To handle the situation where a client forwards multiple update statements and one of its operations has to be undone in RESOLVE-DIRECT-CONFLICTS, we keep track of the clients (undoCids) which have operations that have to be undone. </p><p>We define a function FILTER(cid, Ops), for use in DETECT-DIRECT-CONFLICTS, which returns the subset of Ops that are operations performed by cid. </p><p>We also define =op to be a function in DETECT-SYNTAX-CONFLICTS that returns true if and only if its arguments correspond to a 'Yes' entry in Table 6.1. OP-DELETE deletes the argument oid from the Containment Index, while OP-INSERT inserts the argument into the Containment Index. The implementation of OP-INSERT is equivalent to CLIENT-INSERT. </p><p>We also define extra parameters in the Operation Table for each operation to maintain information on whether the operation was modified in order to resolve an SC and, if so, its original target/sub path. We also record the order in which the operation arrived at the server, in order to resolve operations that are in conflict. NEW-PATH generates the new path that should become the target/sub path of oid in order to resolve the SC. </p><p>Algorithm 6.6 Detect direct conflicts between update operations of different clients, where node is a DataGuide node that contains a list of clients
DETECT-DIRECT-CONFLICTS(node, cid, cidSet)
// L is a list that contains pairs of operations that are in DC.
// It is constructed such that the first of each pair is the operation that
// has to be undone later. Also, DCs with delete operations are considered
// first, followed by move operations and finally update operations.
1: C ← ∅, {D, M, U} ← []
2: N ← FILTER(cid, node.oid())
3: for each o ∈ N do
4:   if operation[o].op() = "delete" then
5:     D ← D ∪ o
6:   elif operation[o].op() = "move" ∧ ¬operation[o].targetPath then
7:     M ← M ∪ o
8:   elif operation[o].op() = "update" then
9:     U ← U ∪ o
10: L ← D + M + U
11: for each child n ∈ node do
12:   for each o ∈ FILTER(cidSet, n.oid()) do
13:     for each l ∈ L do
14:       if operation[o].order > operation[l].order ∧ (l, o) ∉ C then
15:         C ← C ∪ (o, l)
16:   C ← C ∪ DETECT-DIRECT-CONFLICTS(n, cid, cidSet)
17: return C </p><p>Algorithm 6.7 Resolve a list S of direct conflicts between different clients that have to be undone, in the order they were received
RESOLVE-DIRECT-CONFLICTS(S)
1: undoCids ← []
2: for each (undoOp, conflictOp) ∈ S do
3:   if ∃(cid, order) ∈ undoCids s.t. operation[undoOp].cid = cid then
4:     continue
5:   if ¬∃(cid, order) ∈ undoCids s.t. operation[conflictOp].cid = cid ∧ operation[conflictOp].order > order then
6:     op ← operation[undoOp]
7:     undoCids ← undoCids ∪ (op.cid, op.order)
8: for each (cid, o) ∈ undoCids do
9:   for all op ∈ operation s.t. op.cid = cid ∧ op.order > o do
10:    op.orig ← op.PE + op.op()
11:    op.op ← "undo" </p><p>We handle clients in C differently from clients whose subscriptions merely overlap with those of clients in C. This is because their operation(s) have already been carried out on their local cache database, and hence may need to be undone to maintain consistency. </p><p>Algorithm 6.8 Detect and resolve syntax conflicts
DETECT-SYNTAX-CONFLICTS(node, cid, cidSet)
1: N ← FILTER(cid, node.oid())
2: L ← ∅
3: for each o ∈ N do
4:   if operation[o].op() ∉ {"update", "delete", "undo"} then
5:     L ← L ∪ o
6: for each n ∈ node.childs() do
7:   for each o ∈ FILTER(cidSet, n.oid()) do
8:     for each l ∈ L do
9:       if operation[o].op =op operation[l].op then
10:        if operation[o].order > operation[l].order then
11:          OP-DELETE(o)
12:          operation[o].changed ← true
13:          operation[o].orig ← operation[o].PE
14:          operation[o].PE ← NEW-PATH(l)
15:          OP-INSERT(o)
16:        else
17:          OP-DELETE(l)
18:          operation[l].changed ← true
19:          operation[l].orig ← operation[l].PE
20:          operation[l].PE ← NEW-PATH(o)
21:          OP-INSERT(l)
22:   DETECT-SYNTAX-CONFLICTS(n, cid, cidSet) </p><p>Algorithm for Update Merging </p><p>On receipt of m update operations from n clients C, XSync performs the algorithm detailed above to detect and resolve conflicts. XSync then executes UPDATE-MERGE(L), where L is the list of operations returned by DETECT-CONFLICTS(). We assume that each client has facilities to undo specific operations. </p><p>6.7 Performance Evaluation </p><p>6.7.1 Settings </p><p>In this section, we present our cost model for calculating the worst and average case scenarios. Let |P| denote the total number of XPath expressions, D_avg the average number of path tokens over all expressions, calculated as (Σ_{p∈P} D_p) / |P|, and S the number of cids stored in one index node. </p><p>XPath Expression Insertion Cost </p><p>Tokenization and sorting of the original XPath expression set P costs O(|P| · log|P| · D_avg), and the insertion cost into the Containment Index is bounded by O(S · D_avg · |P|). Therefore, the cost of generating the whole index is: </p><p>O(|P| · log|P| · D_avg + S · D_avg · |P|) </p><p>Deletion is trivial, taking near-constant time. 
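</p><p>To illustrate the bound above, the following sketch evaluates O(|P| · log|P| · D_avg + S · D_avg · |P|) for assumed parameter values, treating the bound as an exact operation count (which it is not).</p><p>

```python
import math

# Illustrative: sort term |P|*log2|P|*D_avg plus insertion term S*D_avg*|P|.
# Parameter values below are assumptions for demonstration only.
def index_build_cost(p, d_avg, s):
    return p * math.log2(p) * d_avg + s * d_avg * p

# Doubling S changes only the linear insertion term, not the sort term.
c1 = index_build_cost(1024, 4.0, 5)
c2 = index_build_cost(1024, 4.0, 10)
```
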
Algorithm 6.9
UPDATE-MERGE(L)
1: message ← ∅
2: for each o ∈ L do
3:   if operation[o].op() = "undo" then
4:     op ← operation[o]
5:     message[op.cid] ← UNDO(op.orig) + message[op.cid]
6:     continue
7:   C ← CLIENT-SEARCH(o.cid, o)
8:   for each c ∈ C do
9:     if c = o.cid then
10:      if operation[o].changed then
11:        op ← operation[o]
12:        message[c] ← UNDO(op.orig) + message[c] + o
13:      else
14:        message[c].append(o)
15:    else
16:      message[c].append(o)
17: for each c ∈ message do
18:   SEND-UPDATE(message[c], c) </p><p>Search Cost </p><p>Practically, both fanin_avg() and fanout_avg() are very close to one. C = 1 when the Synchronization Engine broadcasts to all descendants. </p><p>6.7.2 Experimental Results </p><p>The descriptions of the experiment parameters are shown in Table 6.2. </p><p>
Parameter       Range      Description
|Q|             1 - 1M     # of queries
q_length        0 - 10     Length of query
u_length        0 - 10     Length of update
|I|             1 - 20     Average number of cids / index node
|DTD|           1 - 100    Size of DTD
Max(X_length)   10         Max. depth of XML element

Table 6.2: Experiment Parameters </p><p>For simplicity, we assume each mobile client (c_i ∈ C) contains only one query (q_i ∈ Q) to describe its cache. q_length is defined as the total number of path tokens in a query, and u_length is defined as the total number of path tokens in the update statement that a particular client issues. The size of the DTD is defined as the total number of unique element names in the Document Type Definition. </p><p>We developed a random DTD generator which takes |DTD| as input to limit the number of element names. The generated DTD also allows cyclic inclusion of element names, for testing the correctness of the descendant operator. We also wrote a simple random XPath expression generator which produces client queries and updates according to the q_length and u_length parameters. 
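</p><p>A toy stand-in for the random XPath expression generator might look as follows; this is our own sketch, whereas the thesis tool additionally controls the number and placement of wildcard and descendant operators.</p><p>

```python
import random

# Generate a client query of q_length literal steps over a small element
# alphabet; a fixed seed keeps runs reproducible. q_length = 0 yields "/".
def gen_query(q_length, names=("a", "b", "c"), seed=0):
    rng = random.Random(seed)
    return "/" + "/".join(rng.choice(names) for _ in range(q_length))

q = gen_query(4)
```
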
</p><p>Queries generated using q_length are used to populate the index; they do not contain extended method invocations such as xfn:move() or xfn:delete(). The XPath expression generator also accepts extra parameters to limit the number of wildcard and descendant operators and their positions within the expression. </p><p>The maximum depth of XML elements in the XML document has a direct relationship with the length of the query and the length of the update. As any mobile client can specify its subscription down to a leaf node of the XML tree, q_length cannot exceed Max(X_length). Also, the combined length of the update and query of a particular client cannot exceed Max(X_length). </p><p>For example, in Figure 6.4 and Figure 6.5, the length of an update is fixed; therefore only clients with q_length < Max(X_length) − u_length are considered. </p><p>Figure 6.4: # of Operations / Length of Update (CLIENT-SEARCH-DOWN-ALL, |Q| = 1M, |I| = 20, |DTD| = 100, Max(X_length) = 10, q_length = 0..Max(X_length) − u_length) </p><p>Figure 6.4 shows the number of operations required to search for overlapping clients when a client sends an update statement to the server. The number of queries is fixed at 1,000,000, and the method of search only utilizes client subscriptions, ignoring client updates. In this situation, the server forwards updates both to ancestors and to all descendants. </p><p>Simple PE refers to XPath expressions which contain no wildcards or descendant operators. We also specifically chose to test path expressions which included only one wildcard or descendant operator, for both the client query and its update; however, we did not limit the location of such an operator within the query or update. By doing so, we were able to analyze the behaviour of the most complex operators in the expression. 
As the length of an update is fixed at u_length, we randomly chose a client with its q_length between 0 (i.e. /) and Max(X_length) - u_length. The graph therefore shows a scalability of O(u_length). However, u_length can only approach Max(X_length), as any update that is longer will always yield no result. Hence, the worst case is the same as simple broadcasting; this applies when the client's view is the whole document and the update statement is ignored.

Figure 6.5: # of Operations / Length of Update
CLIENT-SEARCH-DOWN,
|Q| = 1M, |I| = 20, |DTD| = 100, Max(X_length) = 10, q_length = 0..Max(X_length) - u_length

In contrast to Figure 6.4, Figure 6.5 shows the number of operations required to search for overlapping clients by matching the path expression against the update statements issued by the client. Comparing this with Figure 6.4, both have similar costs for equal query lengths. By checking the update statement, however, the cost is significantly reduced as the length of the update statement increases.

Instead of fixing the length of the update statement and randomly choosing client queries, Figures 6.6 and 6.7 show the behaviour of the system when the length of the update statement is random. It is interesting to note that although the length of the query and the length of the update have an inverse relationship in these graphs, Figure 6.5 and Figure 6.7 illustrate a similar curve, unlike Figure 6.4 and Figure 6.6. This is because when the length of the update or the length of the query is short, the overall length, on average, with the chosen counterpart is short as well.
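The pruning measured in Figure 6.5 ultimately rests on a token-level overlap test between a client's query path and an update path. The sketch below is an illustrative simplification, not the index-based CLIENT-SEARCH-DOWN itself: '*' matches any single element, '//' matches any number of intermediate steps, and two paths overlap when some document node could satisfy a prefix of both.

```python
def tokens(path):
    """'/a//b/*' -> ['a', '//', 'b', '*']; an empty split part marks '//'."""
    return ["//" if p == "" else p for p in path.split("/")[1:]]

def overlaps(q, u):
    """True if some concrete path could match a prefix of both token lists."""
    if not q or not u:
        return True                      # one side exhausted: prefix overlap
    if q[0] == "//":                     # '//' may skip any number of steps
        return overlaps(q[1:], u) or overlaps(q, u[1:])
    if u[0] == "//":
        return overlaps(q, u[1:]) or overlaps(q[1:], u)
    if q[0] == "*" or u[0] == "*" or q[0] == u[0]:
        return overlaps(q[1:], u[1:])
    return False
```

A client whose query fails this test cannot be affected by the update, so the server can skip it entirely; that is the source of the cost reduction as u_length grows.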
Together with the randomness of the operator's location, it is more likely that a large subtree is included.

Figure 6.6: # of Operations / Length of Query
CLIENT-SEARCH-DOWN-ALL,
|Q| = 1M, |I| = 20, |DTD| = 100, Max(X_length) = 10, u_length = 0..Max(X_length) - q_length

Figure 6.7: # of Operations / Length of Query
CLIENT-SEARCH-DOWN,
|Q| = 1M, |I| = 20, |DTD| = 100, Max(X_length) = 10, u_length = 0..Max(X_length) - q_length

Figure 6.8: # of Operations / Number of Queries
CLIENT-SEARCH-DOWN-ALL,
|I| = 20, |DTD| = 10, Max(X_length) = 10, q_length = 3..5, u_length = 0..Max(X_length) - q_length

Figures 6.8 and 6.9 show the growth of the search time against the number of queries; q_length and u_length are chosen at random and the average number of operations is taken. The growth of the search time is observed to be logarithmic. In addition, the size of the DTD was chosen relative to the number of queries; this is the main factor behind the growth in the number of operations for complex queries (such as those containing the descendant operator).

Figures 6.10 and 6.11 illustrate that as the size of the DTD increases, the number of expressions that can be obtained by expanding an update operation with descendant operators decreases, hence reducing the cost of a search.
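The effect shown in Figures 6.10 and 6.11 can be illustrated by enumerating the concrete element paths that an update containing descendant or wildcard steps may expand to under a given DTD. This is a simplified, depth-capped model of that expansion (the cap keeps cyclic DTDs terminating), not the server's search procedure.

```python
def expand(path_tokens, dtd, max_depth):
    """Enumerate concrete paths (no '//' or '*') of up to max_depth steps
    that path_tokens could denote under the DTD. The depth cap makes the
    search terminate even when the DTD contains cycles."""
    results = []

    def children(cur):
        # At the root (cur is None) any element name may serve as the root.
        return list(dtd) if cur is None else dtd.get(cur, [])

    def go(rest, cur, acc):
        if len(acc) > max_depth:
            return
        if not rest:
            results.append("/" + "/".join(acc))
            return
        head, tail = rest[0], rest[1:]
        if head == "//":
            go(tail, cur, acc)                 # '//' skips no extra level...
            for c in children(cur):
                go(rest, c, acc + [c])         # ...or descends one level more
        elif head == "*":
            for c in children(cur):
                go(tail, c, acc + [c])
        elif head in children(cur):
            go(tail, head, acc + [head])

    go(path_tokens, None, [])
    return results
```

With a larger DTD, fewer element names are valid children of one another, so expand() yields fewer concrete paths and the search touches fewer index nodes, which matches the downward trend in Figures 6.10 and 6.11.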
Figure 6.9: # of Operations / Number of Queries
CLIENT-SEARCH-DOWN,
|I| = 20, |DTD| = 10, Max(X_length) = 10, q_length = 3..5, u_length = 0..Max(X_length) - q_length

Figure 6.10: # of Operations / Size of DTD, simple traversal
CLIENT-SEARCH-DOWN-ALL,
|Q| = 1M, |I| = 20, Max(X_length) = 10, q_length = 3..5, u_length = 0..Max(X_length) - q_length

Figure 6.11: # of Operations / Size of DTD
CLIENT-SEARCH-DOWN,
|Q| = 1M, |I| = 20, Max(X_length) = 10, q_length = 3..5, u_length = 0..Max(X_length) - q_length

Notice that from Figure 6.8 to Figure 6.11, the cost of PE with a wildcard is only slightly higher than that of simple PE, because q_length is chosen to be from 3 to 5 and the other parameters are also set to such realistic values. However, if we increase the fanout by lowering the depth of the XML document (Max(X_length)) or by reducing the size of the DTD, the cost of searching wildcard queries and updates increases. Decreasing q_length has the same effect, as shown in Figure 6.7.
6.8 Conclusions

In this chapter, we presented an efficient synchronization server for handling mobile XML data. The proposed server, XSync, consists of an Integration Module (for communication with the XML database) and a Synchronization Engine (for handling all synchronization issues). The Synchronization Engine utilizes a sophisticated index structure, which provides a significant improvement over currently available methods. We also explored several enhanced synchronization algorithms, for update merging and for disjoint predicates and ranges, to further improve the performance of the system.

Chapter 7

Conclusions

Every advantage in the past is judged in the light of the final issue.
— Demosthenes (384-322 BC)

This thesis has addressed several essential and fundamental problems of querying and maintaining XML data: maintenance of document ordering, efficient evaluation of structural joins, handling of intrinsic skew in sort-merge joins, succinct storage, and update synchronization.

Our first contribution focused on order maintenance for highly dynamic XML databases. While many proposed schemes solve the order determination problem, we focused on approaches that can also update such schemes efficiently. We presented a theoretically optimal approach that guarantees a worst-case time bound, and a practical approach that works well in practice. Solving this problem efficiently is a key differentiator between an XML database management system and a mere XML repository.

As with the order maintenance problem, theoretically optimal bounds for both structural join and sort-merge join had already been discovered. Our next contribution presented several improvements to structural joins that gain an order of magnitude in performance (or even more in practice), without using any prebuilt indices.
We also covered the subtleties of the effects of intrinsic skew on sort-merge join, and techniques that minimize the performance degradation when intrinsic skew occurs, applying an idea similar to our structural join improvements to the merge phase of sort-merge join. These improvements benefit not only native and relational XML database systems, but relational data in general.

The main contribution of this thesis showed that it is possible to deliver theoretically fast insertions, updates, structural queries, order determination and path navigations on XML data while keeping the structure itself succinct, using only asymptotically optimal space, under all adverse conditions. Asymptotically optimal space means optimal cache locality and few disk reads, which implies a substantial improvement in access time on secondary storage. By separating out the structural information, we demonstrated that the topology is small enough to be held entirely and permanently in primary storage. We established this finding from both a theoretical and an experimental point of view, and showed that redundant information, such as node identifiers and extra indices, can be built on top of the structure itself.

The final contribution of this thesis differs from those mentioned above: whereas we addressed the previous problems in an optimal manner, here we solve the practical XML synchronization problem between large numbers of remote clients. We approached the problem of client updates by simplifying the hard XPath containment problem, which is undecidable for the full language, into a subclass that is applicable in practice.

As a final note, there have been recent advances in self-indexing structures that allow the classic dictionary problem to be encoded in size close to the entropy of the text while still supporting efficient operations.
These techniques exploit the relationship between the Burrows-Wheeler Transform and the suffix array, the former well known for compression and the latter for information retrieval. A future research direction would be to apply those techniques to further improve the practical (rather than merely theoretical) space and time bounds of a succinct XML DBMS implementation.

The theme of this thesis has been the different problems that arise when dealing with highly dynamic XML data. While the problems we investigated are fundamental, these low-level problems serve as stepping stones to higher-level optimization problems in further XML database system research.