Evaluating Functional Joins Along Nested Reference Sets in Object-Relational and Object-Oriented Databases*
Total Page:16
File Type:pdf, Size:1020Kb
Evaluating Functional Joins Along Nested Reference Sets in Object-Relational and Object-Oriented Databases* Reinhard Braumandl Jens Claussen Alfons Kemper Lehrstuhlfir Informatik, Universitiit Passau,94030 Passau,Germany (lastname) @db.fmi. uni-passau.de - http://www.db.fmi.uni-passau.de/ 1 Introduction Abstract Inter-object references are one of the key concepts of object-relational and object-oriented database systems. These references allow to directly traverse from one ob- Previous work on functional joins was constrained ject to its associated (referenced) object(s). This is very in two ways: (1) all approaches we know as- efficient for navigating within a limited context-so-called sume references being implemented as physical “pointer chasing”-applications. However, in query process- object identifiers (OIDs) and (2) most approaches ing a huge number of these inter-object references has to be are, in addition, limited to single-valued refer- traversed to evaluate so-called functional joins. Therefore, ence attributes. Both are severe limitations since naive pointer chasing techniques cannot yield competitive most object-relational and all object-oriented da- performance. Consequently, several researchers have in- tabase systems do support nested reference sets vestigated more advanced pointer join techniques to opti- and many object systems do implement references mize the functional join. First of all, there are approaches as location-independent (logical) OIDs. In this to materialize (i.e., precompute) the functional joins in the work, we develop a new functional join algo- form of generalized join indices [Val87]. [BK89] was the rithm that can be used for any realization form first proposal for indexing path expressions, and the access for OIDs (physical or logical) and is particu- support relations [KM901 are another systematic approach larly geared towards supporting functional joins for materializing the functional join along arbitrarily long along nested reference sets. The algorithm can path expressions; it was later augmented to join index hier- be applied to evaluate joins along arbitrarily long archies [XH94]. Many more proposals exist by now. path expressions which may include one or more [SC901 were the first who systematically evaluated three reference sets. The new algorithm generalizes pointer join techniques (naive, sorting, and hash partition- previously proposed partition-based pointer joins ing) in comparison to a value-based join. Their work by repeatedly applying partitioning with inter- was augmented by [DLM93] to parallel database systems. leaved re-merging before evaluating the next func- [DLM93] also allowed nested sets of references, but they tional join. Consequently, the algorithm is termed did not report on how to re-establish the grouping of the P(PM)*M where P stands for partitioning and M nested sets after performing the functional join. [CSL+90] denotes merging. Our prototype implementation incorporated some of the key ideas of pointer joins in the as well as an analytical assessmentbased on a cost Starburst extensible database system. The emphasis in this model prove that this new algorithm performs su- work was on supporting hierarchical structures, i.e., one- perior in almost all database configurations. to-many relationships, with hidden pointers. [GGT96] con- centrate on finding the optimal evaluation order for chains of functional joins. Thus it complements our work: We *This work was supported in part by the German National Research Foundation DFG under contracts Ke 401/6-2 and Ke 40117-I. devise an algorithm to efficiently evaluate a functional join Permission to copy without,fee ull or part of this materiul is grunted pro- within the query engine while they are concerned with find- vided that the copies crre not made or distributed for direct commercial ing the best evaluation order. advuntuge, the VLDB copyright notice und the title qf the public&m and Unfortunately, the previous work on functional joins irs dute appear; and notice is given rhcrt copying is by permission of the Very Large Dutu Base Endowment. 7i1 copy otherwise, or to republish, was constrained in two ways: (1) all approaches we know requires u,fee an&r special permission,from the Endowment. assume references being implemented as physical object Proceedings of the 24th VLDB Conference identifiers (OIDs) and (2) most approaches are, in addition, New York, USA, 1998 limited to single-valued reference attributes. Both are se- 110 vere limitations since most object-relational and all object- The remainder of this paper is organized as follows: In oriented database systems do support nested reference sets Section 2 we give a short overview of physical and logical for modelling many-to-many and one-to-many object asso- OID realization techniques. Section 3 overviews the known ciations and many object systems do implement references algorithms and introduces our new P(PM)*M-algorithm. as location-independent (logical) OIDs. In this work, we Section 4 explains the integration of our algorithm into develop a new functional join algorithm that can be used our iterator-based query engine and gives an initial perfor- for any realization form for OIDs (physical or logical) and mance comparison. In Section 5 we present a more com- is particularly geared towards supporting functional joins prehensive performance analysis based on a cost model by along reference sets. The algorithm can be applied to evalu- comparing the algorithms for different database configura- ate joins along arbitrarily long path expressions which may tions. Section 6 concludes the paper. include one or more reference sets. The new algorithm gen- eralizes previously proposed partition-based pointer joins 2 Realization of Object Identifiers by repeatedly applying partitioning with interleaved re- Object identity is a fundamental concept to enable object merging before evaluating the next functional join. Conse- referencing in object-oriented and object-relational data- quently, the algorithm is termed P(PM)*M where P stands base systems. Each object has a unique object identifier for partitioning and M denotes a merging. (OID) that remains unchanged throughout the object’s life Before going into the technicalities let us motivate our time. There are two basic implementation concepts for work by way of a simplified order-inventory example ap- OIDs: physical OIDs and logical OIDs [KC86]. plication. In an object-oriented or object-relational schema the LineItems that model the many-to-many relationship 2.1 Physical Object Identifiers between Orders and the ordered Products would most natu- rally be modelled as a nested set within the Order objects:’ Physical OIDs contain parts of the initial permanent ad- create type Order as ( create type Product as ( dress of an object, e.g., the page identifier. Based on this OrderNumbernumber, ProductID number, information, an object can be directly accessed on a data LineItems set(tuple( Quantity number, Cost number, page. This direct access facility is advantageous as long ProductRef ref(Product))) . ); as the object is in fact stored at that address. Updates to . ..). the database may require, however, that objects are moved Here, Order refers to the ordered Products via a nested set to other pages. In this case, a place holder (forward ref- of references in attribute Lineltems. Let Orders be a rela- erence) is stored at the original location of the object that tion (or type extension) storing Order objects. Then, in an holds the new address of the object. When a moved object example query we could retrieve the Orders’ total values: is referenced, two pages are accessed: the original page select o.OrderNumber,(select sum(l.Quantity * 1.ProductRef.Cost) containing the forward and the page actually carrying the from o.LineItems I)) object. With increasing number of forwards, the perfor- from Orders o; mance of the DBMS gradually degrades, at some point Logically, the query starts at each Order o and traverses making reorganization inevitable. 02 [02T94], Object- via the nested set of references to all the ordered Prod- Store [LLOW91], and (presumably) Illustra [Sto96, p. 571 ucts to retrieve the Cost from which the total cost of the are examples of commercial systems using physical OIDs. Order is computed. Note that, unlike in a pure (flat) re- lational schema, the nested set of LineItems constitutes an 2.2 Logical Object Identifiers explicit grouping of the Lineltems belonging to one Order Logical OIDs do not contain the object address and are thus that is, in most systems, also maintained at the physical location independent. To find an object by OID, however, level. Our new algorithm exploits this physically main- an additional mapping structure is required to map the logi- tained grouping. However, we do, of course, avoid the cal OID to the physical address of the object. If an object is danger of “thrashing” that is inherent in a naive nested moved to a different address, only the entry in the mapping loops pointer chasing approach. Our prototype implemen- structure is updated. In the following, we describe three tation as well as a comprehensive analytical assessment data structures for the mapping. [EGK95] give details and based on a cost model prove that this new algorithm per- a performance comparison. forms superior in almost all configurations. In particular, our P(PM)*M-algorithm performs very well even for small 2.2.1 Mapping with a B+-Tree memory sizes. The logical OID serves as key to access the tree entry con- ‘Throughoutthe paperwe will use some(pseudo) SQL syntaxthat taining the actual object address (cf. Figure 1 (a)). In this is closeto the commercialORDBMS product that we usedfor compari- graph, a letter represents a logical OID and a number de- sonpurposes. Unfortunately, the commercialORDBMS products do not notes the physical address of the corresponding object (e.g., entirelyobey the SQL3standard. Length limitations prevent us from ad- ditionally showingthe standardizedODMG object types and OQL queries the object identified by a is stored at address 6).