Generalized Search Trees for Database Systems
Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer
hellers, [email protected], [email protected]
June, 1995
Abstract
This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility, which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility.
To illustrate the flexibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to a discussion of the nature of tree indices and how they behave for various datasets.
1 Introduction
An efficient implementation of search trees is crucial for any database system. In traditional relational systems, B+-trees [Com79] were sufficient for the sorts of queries posed on the usual set of alphanumeric data types. Today, database systems are increasingly being deployed to support new applications such as geographic information systems, multimedia systems, CAD tools, document libraries, sequence databases, fingerprint identification systems, biochemical databases, etc. To support the growing set of applications, search trees must be extended for maximum flexibility. This requirement has motivated two major research approaches in extending search tree technology:
1. Specialized Search Trees: A large variety of search trees has been developed to solve specific problems. Among the best known of these trees are spatial search trees such as R-trees [Gut84]. While some of this work has had significant impact in particular domains, the approach of developing domain-specific search trees is problematic. The effort required to implement and maintain such data structures is high. As new applications need to be supported, new tree structures have to be developed from scratch, requiring new implementations of the usual tree facilities for search, maintenance, concurrency control and recovery.
2. Search Trees For Extensible Data Types: As an alternative to developing new data structures, existing data structures such as B+-trees and R-trees can be made extensible in the data types they support [Sto86]. For example, B+-trees can be used to index any data with a linear ordering, supporting equality or linear range queries over that data. While this provides extensibility in the data that can be indexed, it does not extend the set of queries which can be supported by the tree. Regardless of the type of data stored in a B+-tree, the only queries that can benefit from the tree are those containing equality and linear range predicates. Similarly, in an R-tree the only queries that can use the tree are those containing equality, overlap and containment predicates. This inflexibility presents significant problems for new applications, since traditional queries on linear orderings and spatial location are unlikely to be pertinent for new data types.
In this paper we present a third direction for extending search tree technology. We introduce a new data structure called the Generalized Search Tree (GiST), which is easily extensible both in the data types it can index and in the queries it can support. Extensibility of queries is particularly important, since it allows new data types to be indexed in a manner that supports the queries natural to the types. In addition to providing extensibility for new data types, the GiST unifies previously disparate structures used for currently common data types. For example, both B+-trees and R-trees can be implemented as extensions of the GiST, resulting in a single code base for indexing multiple dissimilar applications.
The GiST is easy to configure: adapting the tree for different uses only requires registering six methods with the database system, which encapsulate the structure and behavior of the object class used for keys in the tree. As an illustration of this flexibility, we provide method implementations that allow the GiST to be used as a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. The GiST can be adapted to work like a variety of other known search tree structures, e.g. partial sum trees [WE80], k-D-B-trees [Rob81], Ch-trees [KKD89], Exodus large objects [CDG+90], hB-trees [LS90], V-trees [MCD94], TV-trees [LJF94], etc. Implementing a new set of methods for the GiST is a significantly easier task than implementing a new tree package from scratch: for example, the POSTGRES [Gro94] and SHORE [CDF+94] implementations of R-trees and B+-trees are on the order of 3000 lines of C or C++ code each, while our method implementations for the GiST are on the order of 500 lines of C code each.
In addition to providing a unified, highly extensible data structure, our general treatment of search trees sheds some initial light on a more fundamental question: if any dataset can be indexed with a GiST, does the resulting tree always provide efficient lookup? The answer to this question is "no", and in our discussion we illustrate some issues that can affect the efficiency of a search tree. This leads to the interesting question of how and when one can build an efficient search tree for queries over non-standard domains, a question that can now be further explored by experimenting with the GiST.
1.1 Structure of the Paper
In Section 2, we illustrate and generalize the basic nature of database search trees. Section 3 introduces the Generalized Search Tree object, with its structure, properties, and behavior. In Section 4 we provide GiST implementations of three different sorts of search trees. Section 5 presents some performance results that explore the issues involved in building an effective search tree. Section 6 examines some details that need to be considered when implementing GiSTs in a full-fledged DBMS. Section 7 concludes with a discussion of the significance of the work, and directions for further research.
1.2 Related Work
A go o d survey of search trees is provided byKnuth [Knu73], though B-trees and their variants are covered in
more detail by Comer [Com79]. There are a varietyofmultidimensional search trees, such as R-trees [Gut84]
and their variants: R*-trees [BKSS90] and R+-trees [SRF87]. Other multidimensional search trees include
quad-trees [FB74], k-D-B-trees [Rob81], and hB-trees [LS90]. Multidimensional data can also b e transformed
into unidimensional data using a space- lling curve [Jag90]; after transformation, a B+-tree can b e used to
index the resulting unidimensional data.
Extensible-key indices were introduced in POSTGRES [Sto86, Aok91], and are included in Illustra [Sto93], both of which have distinct extensible B+-tree and R-tree implementations. These extensible indices allow many types of data to be indexed, but only support a fixed set of query predicates. For example, POSTGRES B+-trees support the usual ordering predicates (<, ≤, =, ≥, >), while POSTGRES R-trees support only the predicates Left, OverLeft, Overlap, OverRight, Right, Contains, Contained and Equal [Gro94].
Extensible R-trees actually provide a sizable subset of the GiST's functionality. To our knowledge this paper represents the first demonstration that R-trees can index data that has not been mapped into a spatial domain. However, besides their limited extensibility, R-trees lack a number of other features supported by
Figure 1: Sketch of a database search tree. [The figure shows internal nodes (a directory of keys key1, key2, ...) above leaf nodes chained in a linked list.]
the GiST. R-trees provide only one sort of key predicate (Contains), they do not allow user specification of the PickSplit and Penalty algorithms described below, and they lack optimizations for data from linearly ordered domains. Despite these limitations, extensible R-trees are close enough to GiSTs to allow for the initial method implementations and performance experiments we describe in Section 5.
Classification trees such as ID3 and C4.5 [Qui93] are similar in spirit to the GiST, but have some major differences. Most significantly, classification trees are not intended as search structures, and are not suited for indexing large amounts of data. Additionally, ID3 and C4.5 are built over pre-defined, discrete classes, whereas the GiST is a dynamic structure that develops its own, unspecified data buckets (leaves), with potentially overlapping keys. Finally, classification trees are not generally extensible: they are defined over traditional tuple-like objects with attributes from numeric or finitely enumerated domains. However, extensions of the GiST may be able to leverage some of the lessons learned in the development of classification trees, particularly in implementing methods for picking a way to split nodes (the PickSplit method described below) and for choosing good keys (the Union and Compress methods described below.)
The Predicate Trees of Valduriez and Viemont [VV84] are similar to classification trees, in that they are not used to index data sets, and can be defined only on traditional alphanumeric predicates. Predicate Trees are used to generate hash keys. An even more limited notion is that of the Interval Hierarchy [WE77], which can be used to index a file of simple conjunctive ordering predicates.
Analyses of R-tree performance have appeared in [FK94] and [PSTW93]. This work is dependent on the spatial nature of typical R-tree data, and thus is not generally applicable to the GiST. However, similar ideas may prove relevant to our questions of when and how one can build efficient indices in arbitrary domains.
2 Getting The Gist of Database Search Trees
As an introduction to GiSTs, it is instructive to review search trees in a simplified manner. Most people with database experience have an intuitive notion of how search trees work, so our discussion here is purposely vague: the goal is simply to illustrate that this notion leaves many details unspecified. After highlighting the unspecified details, we can proceed to describe a structure that leaves the details open for user specification.
The canonical rough picture of a database search tree appears in Figure 1. It is a balanced tree, with high fanout. The internal nodes are used as a directory. The leaf nodes contain pointers to the actual data, and are stored as a linked list to allow for partial or complete scanning.
Within each internal node is a series of keys and pointers. To search for tuples which match a query predicate q, one starts at the root node. For each pointer on the node, if the associated key is consistent with q, i.e. the key does not rule out the possibility that data stored below the pointer may match q, then one traverses the subtree below the pointer, until all the matching data is found. As an illustration, we review the notion of consistency in some familiar tree structures. In B+-trees, queries are in the form of range predicates (e.g. "find all i such that c1 ≤ i ≤ c2"), and keys logically delineate a range in which the data below a pointer is contained. If the query range and a pointer's key range overlap, then the two are consistent and the pointer is traversed. In R-trees, queries are in the form of region predicates (e.g. "find all i such that (x1, y1, x2, y2) overlaps i"), and keys delineate the bounding box in which the data below a pointer is contained. If the query region and the pointer's key box overlap, the pointer is traversed.
Note that in the above description, the only restriction placed on a key is that it must logically match each datum stored below it, so that the consistency check does not miss any valid data. In B+-trees and R-trees, keys are essentially "containment" predicates: they describe a contiguous region in which all the data below a pointer are contained. Containment predicates are not the only possible key constructs, however. For example, the predicate "elected official(i) ∧ has criminal record(i)" is an acceptable key if every data item i stored below the associated pointer satisfies the predicate. As in R-trees, keys on a node may "overlap", i.e. two keys on the same node may hold simultaneously for some tuple.
This flexibility allows us to generalize the notion of a search key: a search key may be any arbitrary predicate that holds for each datum below the key. Given a data structure with such flexible search keys, a user is free to form a tree by organizing data into arbitrary nested sub-categories, labelling each with some characteristic predicate. This in turn lets us capture the essential nature of a database search tree: it is a hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy. Searches on arbitrary predicates may be conducted based on the categorization. In order to support searches on a predicate q, the user must provide a Boolean method to tell if q is consistent with a given search key. When this is so, the search proceeds by traversing the pointer associated with the search key. The grouping of data into categories may be controlled by a user-supplied node splitting algorithm, and the characterization of categories can be done with user-supplied search keys. Thus by exposing the key methods and the tree's split method to the user, arbitrary search trees may be constructed, supporting an extensible set of queries. These ideas form the basis of the GiST, which we proceed to describe in detail.
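The "hierarchy of categorizations" view can be made concrete in a few lines. The following toy sketch is ours, not part of the GiST definition: node shapes, names, and the interval keys are illustrative assumptions. Search descends every child whose key cannot rule out matches below it, exactly the consistency test described above.

```python
# Toy sketch (our own node shapes, not the paper's API): a node is either
# ("leaf", [data, ...]) or ("internal", [(key, child), ...]).

def search(node, consistent, match, q):
    kind, entries = node
    if kind == "leaf":
        return [d for d in entries if match(d, q)]
    out = []
    for key, child in entries:
        if consistent(key, q):   # key cannot rule out matching data below
            out.extend(search(child, consistent, match, q))
    return out

# Interval keys with a range query, as in the B+-tree illustration above.
def overlaps(key, q):
    return key[0] < q[1] and q[0] < key[1]

tree = ("internal", [
    ((0, 10), ("leaf", [1, 4, 9])),
    ((10, 20), ("leaf", [11, 15])),
])
hits = search(tree, overlaps, lambda d, q: q[0] <= d < q[1], (3, 12))
```

Replacing `overlaps` with any other Boolean key-vs-query test changes the categorization without changing the search logic, which is the point of the generalization.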
3 The Generalized Search Tree
In this section we present the abstract data type (or "object") Generalized Search Tree (GiST). We define its structure, its invariant properties, its extensible methods and its built-in algorithms. As a matter of convention, we refer to each indexed datum as a "tuple"; in an Object-Oriented or Object-Relational DBMS, each indexed datum could be an arbitrary data object.
3.1 Structure
A GiST is a balanced tree of variable fanout between kM and M, 2/M ≤ k ≤ 1/2, with the exception of the root node, which may have fanout between 2 and M. The constant k is termed the minimum fill factor of the tree. Leaf nodes contain (p, ptr) pairs, where p is a predicate that is used as a search key, and ptr is the identifier of some tuple in the database. Non-leaf nodes contain (p, ptr) pairs, where p is a predicate used as a search key and ptr is a pointer to another tree node. Predicates can contain any number of free variables, as long as any single tuple referenced by the leaves of the tree can instantiate all the variables. Note that by using "key compression", a given predicate p may take as little as zero bytes of storage. However, for purposes of exposition we will assume that entries in the tree are all of uniform size. Discussion of variable-sized entries is deferred to Section 6.3. We assume in an implementation that given an entry E = (p, ptr), one can access the node on which E currently resides. This can prove helpful in implementing the key methods described below.
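In code, the structure just described might be rendered as follows. This is a minimal sketch under our own naming assumptions; in particular, the `parent` back-pointer is just one way to satisfy the assumption that an entry's node is accessible (Section 6 notes other techniques).

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class Entry:
    p: Any          # search-key predicate (any object the key class defines)
    ptr: Any        # tuple identifier (on a leaf) or child Node (on a non-leaf)

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)
    # Back-pointer so key methods can find the node on which an entry resides
    # (one possible implementation of the assumption in the text).
    parent: Optional["Node"] = None

# A two-level example: a root whose single entry points to one leaf.
leaf = Node(is_leaf=True, entries=[Entry(p=(1, 5), ptr="tuple-id")])
root = Node(is_leaf=False, entries=[Entry(p=(1, 5), ptr=leaf)])
leaf.parent = root
```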
3.2 Properties
The following properties are invariant in a GiST:
1. Every node contains between kM and M index entries unless it is the root.
2. For each index entry (p, ptr) in a leaf node, p is true when instantiated with the values from the indicated tuple (i.e. p holds for the tuple.)
3. For each index entry (p, ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr. Note that, unlike in R-trees, for some entry (p', ptr') reachable from ptr, we do not require that p' → p, merely that p and p' both hold for all tuples reachable from ptr'.
4. The root has at least two children unless it is a leaf.
5. All leaves appear on the same level.
Property 3 is of particular interest. An R-tree would require that p' → p, since bounding boxes of an R-tree are arranged in a containment hierarchy. The R-tree approach is unnecessarily restrictive, however: the predicates in keys above a node N must hold for data below N, and therefore one need not have keys on N restate those predicates in a more refined manner. One might choose, instead, to have the keys at N characterize the sets below based on some entirely orthogonal classification. This can be an advantage in both the information content and the size of keys.
3.3 Key Methods
In principle, the keys of a GiST may be arbitrary predicates. In practice, the keys come from a user-implemented object class, which provides a particular set of methods required by the GiST. Examples of key structures include ranges of integers for data from Z (as in B+-trees), bounding boxes for regions in R^n (as in R-trees), and bounding sets for set-valued data, e.g. data from P(Z) (as in RD-trees, described in Section 4.3.) The key class is open to redefinition by the user, with the following set of six methods required by the GiST:
Consistent(E, q): given an entry E = (p, ptr), and a query predicate q, returns false if p ∧ q can be guaranteed unsatisfiable, and true otherwise. Note that an accurate test for satisfiability is not required here: Consistent may return true incorrectly without affecting the correctness of the tree algorithms. The penalty for such errors is in performance, since they may result in exploration of irrelevant subtrees during search.

Union(P): given a set P of entries (p1, ptr1), ..., (pn, ptrn), returns some predicate r that holds for all tuples stored below ptr1 through ptrn. This can be done by finding a predicate r such that (p1 ∨ ... ∨ pn) → r.

Compress(E): given an entry E = (p, ptr), returns an entry (π, ptr) where π is a compressed representation of p.

Decompress(E): given a compressed representation E = (π, ptr), where π = Compress(p), returns an entry (r, ptr) such that p → r. Note that this is a potentially "lossy" compression, since we do not require that p ↔ r.

Penalty(E1, E2): given two entries E1 = (p1, ptr1), E2 = (p2, ptr2), returns a domain-specific penalty for inserting E2 into the subtree rooted at E1. This is used to aid the Split and Insert algorithms (described below.) Typically the penalty metric is some representation of the increase of size from E1.p1 to Union(E1, E2). For example, Penalty for keys from R^2 can be defined as area(Union(E1, E2)) - area(E1.p1) [Gut84].

PickSplit(P): given a set P of M + 1 entries (p, ptr), splits P into two sets of entries P1, P2, each of size at least kM. The choice of the minimum fill factor for a tree is controlled here. Typically, it is desirable to split in such a way as to minimize some badness metric akin to a multi-way Penalty, but this is left open for the user.
The above are the only methods a GiST user needs to supply. Note that Consistent, Compress, Union and Penalty have to be able to handle any predicate in their input. In full generality this could become very difficult, especially for Consistent. But typically a limited set of predicates is used in any one tree, and this set can be constrained in the method implementation.
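Viewed as code, the six methods form an interface that the key-class designer fills in. The following sketch is our rendering of that interface, with signatures and names of our own choosing; the paper itself specifies only the behavior, not a particular API.

```python
from abc import ABC, abstractmethod

class KeyClass(ABC):
    """Hypothetical rendering of the six methods a GiST key class supplies."""

    @abstractmethod
    def consistent(self, entry, q):
        """Return False only if entry.p AND q is guaranteed unsatisfiable."""

    @abstractmethod
    def union(self, entries):
        """Return a predicate r holding for all tuples below the entries."""

    @abstractmethod
    def compress(self, entry):
        """Return a compact representation of entry.p."""

    @abstractmethod
    def decompress(self, entry):
        """Return a predicate r with p -> r (possibly lossy)."""

    @abstractmethod
    def penalty(self, e1, e2):
        """Return the domain-specific cost of inserting e2 under e1."""

    @abstractmethod
    def pick_split(self, entries):
        """Split M+1 entries into two groups, each of size at least kM."""
```

A concrete subclass for one domain (Section 4 gives three) implements all six; the tree methods of Section 3.4 call only this interface.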
There are a number of options for key compression. A simple implementation can let both Compress and Decompress be the identity function. A more complex implementation can have Compress((p, ptr)) generate a valid but more compact predicate r, p → r, and let Decompress be the identity function. This is the technique used in SHORE's R-trees, for example, which upon insertion take a polygon and compress it to its bounding box, which is itself a valid polygon. It is also used in prefix B+-trees [Com79], which truncate split keys to an initial substring. More involved implementations might use complex methods for both Compress and Decompress.
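As a one-dimensional analogue of the bounding-box technique (a toy of ours, not SHORE's actual code): compressing a set of integers to its covering interval produces a strictly weaker predicate r with p → r, so a Consistent test against the compressed key can err only toward true, never missing a member.

```python
# "Lossy" compression sketch: the predicate p = "v in s" is compressed to the
# weaker r = "min(s) <= v <= max(s)". Since p -> r, search never misses data;
# it may only visit a subtree spuriously (the performance penalty noted for
# Consistent in Section 3.3).

def compress(s):
    return (min(s), max(s))

def consistent_compressed(interval, v):
    lo, hi = interval
    return lo <= v <= hi

s = {3, 7, 12}
key = compress(s)                                   # (3, 12)
assert all(consistent_compressed(key, v) for v in s)  # never misses a member
```

Note that `consistent_compressed(key, 5)` returns true even though 5 is not in s: a spurious true, which is permitted.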
3.4 Tree Methods
The key methods in the previous section must be provided by the designer of the key class. The tree methods in this section are provided by the GiST, and may invoke the required key methods. Note that keys are Compressed when placed on a node, and Decompressed when read from a node. We consider this implicit, and will not mention it further in describing the methods.
3.4.1 Search
Search comes in two flavors. The first method, presented in this section, can be used to search any dataset with any query predicate, by traversing as much of the tree as necessary to satisfy the query. It is the most general search technique, analogous to that of R-trees. A more efficient technique for queries over linear orders is described in the next section.

Algorithm Search(R, q)
Input: GiST rooted at R, predicate q
Output: all tuples that satisfy q
Sketch: Recursively descend all paths in tree whose keys are consistent with q.
S1: [Search subtrees] If R is not a leaf, check each entry E on R to determine whether Consistent(E, q). For all entries that are Consistent, invoke Search on the subtree whose root node is referenced by E.ptr.
S2: [Search leaf node] If R is a leaf, check each entry E on R to determine whether Consistent(E, q). If E is Consistent, it is a qualifying entry. At this point E.ptr could be fetched to check q accurately, or this check could be left to the calling process.

Note that the query predicate q can be either an exact match (equality) predicate, or a predicate satisfiable by many values. The latter category includes "range" or "window" predicates, as in B+ or R-trees, and also more general predicates that are not based on contiguous areas (e.g. set-containment predicates like "all supersets of {6, 7, 68}".)
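The set-containment case can be made concrete with a toy in the spirit of the RD-tree of Section 4.3 (node shapes and names are our assumptions): each internal key is a bounding set, the union of the sets stored below it, and Consistent prunes a subtree only when the query set is not contained in the key.

```python
# Superset search: find all stored sets that contain the query set q.
# An internal key bounds its subtree, so if q is not a subset of the key,
# no set below can be a superset of q and the subtree is pruned.

def consistent(key, q):
    return q <= key

tree = ("internal", [
    (frozenset({1, 2, 6, 7, 68}),
     ("leaf", [frozenset({6, 7, 68}), frozenset({1, 2})])),
    (frozenset({3, 4, 5}),
     ("leaf", [frozenset({3, 4})])),
])

def search(node, q):
    kind, entries = node
    if kind == "leaf":
        return [s for s in entries if q <= s]
    out = []
    for key, child in entries:
        if consistent(key, q):
            out.extend(search(child, q))
    return out

hits = search(tree, frozenset({6, 7, 68}))
```

The second subtree is never visited: its bounding set {3, 4, 5} cannot contain {6, 7, 68}.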
3.4.2 Search In Linearly Ordered Domains
If the domain to be indexed has a linear ordering, and queries are typically equality or range-containment predicates, then a more efficient search method is possible using the FindMin and Next methods defined in this section. To make this option available, the user must take some extra steps when creating the tree:
1. The flag IsOrdered must be set to true. IsOrdered is a static property of the tree that is set at creation. It defaults to false.
2. An additional method Compare(E1, E2) must be registered. Given two entries E1 = (p1, ptr1) and E2 = (p2, ptr2), Compare returns a negative number if p1 precedes p2, returns a positive number if p1 follows p2, and returns 0 otherwise. Compare is used to insert entries in order on each node.
3. The PickSplit method must ensure that for any entries E1 on P1 and E2 on P2, Compare(E1, E2) < 0.
4. The methods must assure that no two keys on a node overlap, i.e. for any pair of entries E1, E2 on a node, Consistent(E1, E2.p) = false.
If these four steps are carried out, then equality and range-containment queries may be evaluated by calling FindMin and repeatedly calling Next, while other query predicates may still be evaluated with the general Search method. FindMin/Next is more efficient than traversing the tree using Search, since FindMin and Next only visit the non-leaf nodes along one root-to-leaf path. This technique is based on the typical range-lookup in B+-trees.
Algorithm FindMin(R, q)
Input: GiST rooted at R, predicate q
Output: minimum tuple in linear order that satisfies q
Sketch: descend leftmost branch of tree whose keys are Consistent with q. When a leaf node is reached, return the first key that is Consistent with q.
FM1: [Search subtrees] If R is not a leaf, find the first entry E in order such that Consistent(E, q) [1]. If such an E can be found, invoke FindMin on the subtree whose root node is referenced by E.ptr. If no such entry is found, return NULL.
FM2: [Search leaf node] If R is a leaf, find the first entry E on R such that Consistent(E, q), and return E. If no such entry exists, return NULL.

Given one element E that satisfies a predicate q, the Next method returns the next existing element that satisfies q, or NULL if there is none. Next is made sufficiently general to find the next entry on non-leaf levels of the tree, which will prove useful in Section 4. For search purposes, however, Next will only be invoked on leaf entries.

[1] The appropriate entry may be found by doing a binary search of the entries on the node. Further discussion of intra-node search optimizations appears in Section 6.1.
Algorithm Next(R, q, E)
Input: GiST rooted at R, predicate q, current entry E
Output: next entry in linear order that satisfies q
Sketch: return next entry on the same level of the tree if it satisfies q. Else return NULL.
N1: [next on node] If E is not the rightmost entry on its node, and N is the next entry to the right of E in order, and Consistent(N, q), then return N. If ¬Consistent(N, q), return NULL.
N2: [next on neighboring node] If E is the rightmost entry on its node, let P be the next node to the right of R on the same level of the tree (this can be found via tree traversal, or via sideways pointers in the tree, when available [LY81].) If P is non-existent, return NULL. Otherwise, let N be the leftmost entry on P. If Consistent(N, q), then return N, else return NULL.
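The FindMin/Next protocol can be sketched in flattened form. This is a deliberate simplification of ours: the sorted list below stands in for the linked leaf chain, the binary search plays the role of the one root-to-leaf descent (footnote 1 suggests binary search within nodes), and stepping right plays the role of Next.

```python
import bisect

# Flattened sketch of a range-containment lookup q = [lo, hi) over an ordered
# leaf level, modeled as one sorted list of keys.

def find_min(leaves, q):
    lo, hi = q
    i = bisect.bisect_left(leaves, lo)      # first entry Consistent with q
    return i if i < len(leaves) and leaves[i] < hi else None

def scan(leaves, q):
    i = find_min(leaves, q)
    out = []
    while i is not None:
        out.append(leaves[i])
        i = i + 1                           # Next: step right; stop as soon
        if i >= len(leaves) or leaves[i] >= q[1]:   # as q is not satisfied
            i = None
    return out
```

The efficiency claim in the text corresponds to the single `bisect_left` here: only one descent is paid, after which the scan touches exactly the qualifying entries plus one.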
3.4.3 Insert
The insertion routines guarantee that the GiST remains balanced. They are very similar to the insertion routines of R-trees, which generalize the simpler insertion routines for B+-trees. Insertion allows specification of the level at which to insert. This allows subsequent methods to use Insert for reinserting entries from internal nodes of the tree. We will assume that level numbers increase as one ascends the tree, with leaf nodes being at level 0. Thus new entries to the tree are inserted at level l = 0.

Algorithm Insert(R, E, l)
Input: GiST rooted at R, entry E = (p, ptr), and level l, where p is a predicate such that p holds for all tuples reachable from ptr.
Output: new GiST resulting from insert of E at level l.
Sketch: find where E should go, and add it there, splitting if necessary to make room.
I1. [invoke ChooseSubtree to find where E should go] Let L = ChooseSubtree(R, E, l).
I2. If there is room for E on L, install E on L (in order according to Compare, if IsOrdered.) Otherwise invoke Split(R, L, E).
I3. [propagate changes upward] AdjustKeys(R, L).

ChooseSubtree can be used to find the best node for insertion at any level of the tree. When the IsOrdered property holds, the Penalty method must be carefully written to assure that ChooseSubtree arrives at the correct leaf node in order. An example of how this can be done is given in Section 4.1.
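The Penalty-guided descent of ChooseSubtree can be sketched for interval keys (an illustrative toy of ours; the interval-growth Penalty used here is in the style of the B+-tree key class of Section 4.1, and the node shape is an assumption):

```python
# ChooseSubtree sketch: descend to level l, at each step picking the entry
# whose key would have to grow least to cover the new key.

def penalty(key, new):
    # Growth of interval `key` needed to also cover interval `new`.
    return max(key[0] - new[0], 0) + max(new[1] - key[1], 0)

def choose_subtree(node, new, l):
    level, entries = node            # entries: [(key, child_or_data), ...]
    if level == l:
        return node
    key, child = min(entries, key=lambda e: penalty(e[0], new))
    return choose_subtree(child, new, l)

leafA = (0, [((1, 5), "a"), ((2, 6), "b")])
leafB = (0, [((20, 30), "c")])
root = (1, [((1, 6), leafA), ((20, 30), leafB)])
target = choose_subtree(root, (4, 7), 0)
```

Inserting key (4, 7) lands on leafA: covering it grows (1, 6) by 1, versus 16 for (20, 30).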
Algorithm ChooseSubtree(R, E, l)
Input: subtree rooted at R, entry E = (p, ptr), level l
Output: node at level l best suited to hold entry with characteristic predicate E.p
Sketch: Recursively descend tree minimizing Penalty
CS1. If R is at level l, return R;
CS2. Else among all entries F = (q, ptr') on R find the one such that Penalty(F, E) is minimal. Return ChooseSubtree(F.ptr', E, l).

The Split algorithm makes use of the user-defined PickSplit method to choose how to split up the elements of a node, including the new tuple to be inserted into the tree. Once the elements are split up into two groups, Split generates a new node for one of the groups, inserts it into the tree, and updates keys above the new node.
Algorithm Split(R, N, E)
Input: GiST R with node N, and a new entry E = (p, ptr).
Output: the GiST with N split in two and E inserted.
Sketch: split keys of N along with E into two groups according to PickSplit. Put one group onto a new node, and Insert the new node into the parent of N.
SP1: Invoke PickSplit on the union of the elements of N and {E}, put one of the two partitions on node N, and put the remaining partition on a new node N'.
SP2: [Insert entry for N' in parent] Let E_N' = (q, ptr'), where q is the Union of all entries on N', and ptr' is a pointer to N'. If there is room for E_N' on Parent(N), install E_N' on Parent(N) (in order if IsOrdered.) Otherwise invoke Split(R, Parent(N), E_N') [2].
SP3: Modify the entry F which points to N, so that F.p is the Union of all entries on N.

Step SP3 of Split modifies the parent node to reflect the changes in N. These changes are propagated upwards through the rest of the tree by step I3 of the Insert algorithm, which also propagates the changes due to the insertion of N'.
The AdjustKeys algorithm ensures that keys above a set of predicates hold for the tuples below, and are appropriately specific.

[2] We intentionally do not specify what technique is used to find the Parent of a node, since this implementation interacts with issues related to concurrency control, which are discussed in Section 6. Depending on techniques used, the Parent may be found via a pointer, a stack, or via re-traversal of the tree.
Algorithm AdjustKeys(R, N)
Input: GiST rooted at R, tree node N
Output: the GiST with ancestors of N containing correct and specific keys
Sketch: ascend parents from N in the tree, making the predicates be accurate characterizations of the subtrees. Stop after root, or when a predicate is found that is already accurate.
PR1: If N is the root, or the entry which points to N has an already-accurate representation of the Union of the entries on N, then return.
PR2: Otherwise, modify the entry E which points to N so that E.p is the Union of all entries on N. Then AdjustKeys(R, Parent(N).)

Note that AdjustKeys typically performs no work when IsOrdered = true, since for such domains predicates on each node typically partition the entire domain into ranges, and thus need no modification on simple insertion or deletion. The AdjustKeys routine detects this in step PR1, which avoids calling AdjustKeys on higher nodes of the tree. For such domains, AdjustKeys may be circumvented entirely if desired.
3.4.4 Delete
The deletion algorithms must maintain the balance of the tree, and attempt to keep keys as specific as possible. The CondenseTree algorithm uses the B+-tree "borrow or coalesce" technique for underflow if there is a linear order. Otherwise it uses the R-tree reinsertion technique.

Algorithm Delete(R, E)
Input: GiST R, leaf entry E = (p, ptr)
Output: balanced GiST with E removed
Sketch: Remove E from its leaf node. If this causes underflow, adjust tree accordingly. Update predicates in ancestors to keep them as specific as possible.
D1: [Find node containing entry] Invoke Search(R, E.p) and find leaf node L containing E. Stop if E not found.
D2: [Delete entry.] Remove E from L.
D3: [Propagate changes.] Invoke CondenseTree(R, L).
D4: [Shorten tree.] If the root node has only one child after the tree has been adjusted, make the child the new root.
Algorithm CondenseTree(R, L)
Input: GiST R containing leaf node L
Output: GiST with invariant properties maintained
Sketch: If L has fewer than kM elements, either eliminate L and relocate its entries, or borrow
entries from elsewhere to put on L. Propagate node elimination upwards as neces-
sary. Adjust all predicates on the path to the root, making them more specific as
appropriate.
CT1: [Initialize.] Set N = L. Set Q, the set of eliminated nodes, to be empty.
CT2: If N is the root, go to CT6. Otherwise let P = Parent(N), and let E_N be the entry
on P that points to N.
CT3: [Handle under-full node.] If N has fewer than kM entries:
CT3.1: [If not IsOrdered, delete N.] If not IsOrdered, add the elements of N to set Q,
delete E_N from P, and invoke AdjustKeys(R, P).
CT3.2: [If IsOrdered, try to borrow entries, else coalesce with a neighbor.] Otherwise
let N' be the neighboring node in order. If the number of keys on N and N'
combined is 2kM or greater, then evenly split the entries on N and N' between
the two nodes. Otherwise, place the entries from N onto N', and delete E_N from
P. Invoke AdjustKeys(R, N'), and AdjustKeys(R, P).
CT4: [Adjust covering predicate.] If E_N was not deleted from P, then invoke AdjustKeys(R, N).
CT5: [Propagate deletes.] If E_N was deleted from P, let N = P, and go to CT2.
CT6: [Re-insert orphaned entries.] If Q is not empty, invoke Insert(R, E, level(E)) for all
elements E of Q.
In some implementations, it is considered preferable to leave a node under-full after a delete, in the
expectation that it will fill up soon thereafter [JS93]. To support such behavior, step D3 of algorithm Delete
could invoke AdjustKeys(R, L), and avoid CondenseTree.
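The borrow-or-coalesce decision of step CT3.2 can be sketched as follows. This is an illustrative fragment only, not the paper's implementation: the list-of-entries node representation and the parameter name k_m (the minimum fill kM) are our own assumptions for the example.

```python
def condense_ordered(n_entries, neighbor_entries, k_m):
    """Borrow-or-coalesce step CT3.2 for ordered domains (illustrative sketch).

    If the under-full node and its neighbor together hold at least 2*k_m
    entries, redistribute them evenly (borrow); otherwise merge all entries
    onto the neighbor (coalesce). Returns (new_node, new_neighbor, merged),
    where merged=True means the under-full node was eliminated.
    """
    combined = sorted(n_entries + neighbor_entries)
    if len(combined) >= 2 * k_m:
        # Borrow: evenly split the combined entries between the two nodes.
        half = len(combined) // 2
        return combined[:half], combined[half:], False
    # Coalesce: move everything onto the neighbor; the node is eliminated.
    return [], combined, True
```

For example, with k_m = 2, an under-full node holding [5] next to a neighbor holding [1, 2, 3] redistributes to [1, 2] and [3, 5]; if the neighbor held only [1], the two would coalesce into one node.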
4 The GiST for Three Applications
In this section we briefly describe implementations of key classes used to make the GiST behave like a
B+-tree, an R-tree, and an RD-tree, a new R-tree-like index over set-valued data.
4.1 GiSTs Over Z (B+-trees)
In this example we index integer data. Before compression, each key in this tree is a pair of integers,
representing the interval contained below the key. Specifically, a key represents the predicate
Contains([a, b), v) with variable v. The query predicates we support in this key class are Contains(interval,
v) and Equal(number, v). The interval in the Contains query may be closed or open at either end. The
boundary of any interval of integers can be trivially converted to be closed or open. So without loss of
generality, we assume below that all intervals are closed on the left and open on the right.
The implementations of the Contains and Equal query predicates are as follows:
Contains([x, y), v): If x ≤ v < y, return true. Otherwise return false.

Equal(x, v): If x = v, return true. Otherwise return false.

Now, the implementations of the GiST methods:

Consistent(E, q): Given entry E = (p, ptr) and query predicate q, we know that p = Contains([x_p, y_p), v), and either q = Contains([x_q, y_q), v) or q = Equal(x_q, v). In the first case, return true if (x_p < y_q) ∧ (x_q < y_p), and false otherwise. In the second case, return true if x_p ≤ x_q < y_p, and false otherwise.

Union(E_1 = ([x_1, y_1), ptr_1), ..., E_n = ([x_n, y_n), ptr_n)): Return [min(x_1, ..., x_n), max(y_1, ..., y_n)).

Compress(E = ([x, y), ptr)): If E is the leftmost key on a non-leaf node, return a 0-byte object. Otherwise return x.

Decompress(E = (π, ptr)): We must construct an interval [x, y). If E is the leftmost key on a non-leaf node, let x = −∞. Otherwise let x = π. If E is the rightmost key on a non-leaf node, let y = ∞. If E is any other key on a non-leaf node, let y be the value stored in the next key (as found by the Next method). If E is on a leaf node, let y = x + 1. Return ([x, y), ptr).

Penalty(E = ([x_1, y_1), ptr_1), F = ([x_2, y_2), ptr_2)): If E is the leftmost pointer on its node, return max(y_2 − y_1, 0). If E is the rightmost pointer on its node, return max(x_1 − x_2, 0). Otherwise return max(y_2 − y_1, 0) + max(x_1 − x_2, 0).

PickSplit(P): Let the first ⌊|P|/2⌋ entries in order go in the left group, and the last ⌈|P|/2⌉ entries go in the right. Note that this guarantees a minimum fill factor of M/2.

Finally, the additions for ordered keys:

IsOrdered = true

Compare(E_1 = (p_1, ptr_1), E_2 = (p_2, ptr_2)): Given p_1 = [x_1, y_1) and p_2 = [x_2, y_2), return x_1 − x_2.

There are a number of interesting features to note in this set of methods. First, the Compress and Decompress methods produce the typical "split keys" found in B+-trees, i.e. n − 1 stored keys for n pointers, with the leftmost and rightmost boundaries on a node left unspecified (i.e. −∞ and ∞).
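The interval key class above can be rendered as a small Python sketch. This is illustrative only: the function names and the tuple-based entry representation are ours, not part of any GiST implementation.

```python
def consistent(p, q):
    """p is a key interval (x_p, y_p) meaning [x_p, y_p);
    q is ('contains', x_q, y_q) or ('equal', x_q)."""
    x_p, y_p = p
    if q[0] == 'contains':
        _, x_q, y_q = q
        return x_p < y_q and x_q < y_p
    _, x_q = q
    return x_p <= x_q < y_p

def union(entries):
    """Smallest interval containing every entry's interval."""
    return (min(x for x, _ in entries), max(y for _, y in entries))

def penalty(e, f):
    """Growth of interval e needed to absorb interval f (0 if contained)."""
    (x1, y1), (x2, y2) = e, f
    return max(y2 - y1, 0) + max(x1 - x2, 0)

def pick_split(entries):
    """Split ordered entries in half, guaranteeing minimum fill M/2."""
    entries = sorted(entries)
    half = len(entries) // 2
    return entries[:half], entries[half:]
```

Note that penalty here computes the general (middle-pointer) case; as described above, the leftmost or rightmost pointer on a node would drop one of the two terms.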
Even though GiSTs use key/pointer pairs rather than split keys, this GiST uses no more space for keys than a traditional B+-tree, since it compresses the first pointer on each node to zero bytes. Second, the Penalty method allows the GiST to choose the correct insertion point. Inserting (i.e. Unioning) a new key value k into an interval [x, y) will cause the Penalty to be positive only if k is not already contained in the interval. Thus in step CS2, the ChooseSubtree method will place new data in the appropriate spot: any set of keys on a node partitions the entire domain, so in order to minimize the Penalty, ChooseSubtree will choose the one partition in which k is already contained. Finally, observe that one could fairly easily support more complex predicates, including disjunctions of intervals in query predicates, or ranked intervals in key predicates for supporting efficient sampling [WE80].

4.2 GiSTs Over Polygons in R^2 (R-trees)

In this example, our data are 2-dimensional polygons on the Cartesian plane. Before compression, the keys in this tree are 4-tuples of reals, representing the upper-left and lower-right corners of rectilinear bounding rectangles for 2-d polygons. A key (x_ul, y_ul, x_lr, y_lr) represents the predicate Contains((x_ul, y_ul, x_lr, y_lr), v), where (x_ul, y_ul) is the upper left corner of the bounding box, (x_lr, y_lr) is the lower right corner, and v is the free variable. The query predicates we support in this key class are Contains(box, v), Overlap(box, v), and Equal(box, v), where box is a 4-tuple as above.

The implementations of the query predicates are as follows:

Contains((x_ul^1, y_ul^1, x_lr^1, y_lr^1), (x_ul^2, y_ul^2, x_lr^2, y_lr^2)): Return true if

(x_lr^1 ≥ x_lr^2) ∧ (x_ul^1 ≤ x_ul^2) ∧ (y_lr^1 ≤ y_lr^2) ∧ (y_ul^1 ≥ y_ul^2).

Otherwise return false.
Overlap((x_ul^1, y_ul^1, x_lr^1, y_lr^1), (x_ul^2, y_ul^2, x_lr^2, y_lr^2)): Return true if

(x_ul^1 ≤ x_lr^2) ∧ (x_ul^2 ≤ x_lr^1) ∧ (y_lr^1 ≤ y_ul^2) ∧ (y_lr^2 ≤ y_ul^1).

Otherwise return false.

Equal((x_ul^1, y_ul^1, x_lr^1, y_lr^1), (x_ul^2, y_ul^2, x_lr^2, y_lr^2)): Return true if

(x_ul^1 = x_ul^2) ∧ (y_ul^1 = y_ul^2) ∧ (x_lr^1 = x_lr^2) ∧ (y_lr^1 = y_lr^2).

Otherwise return false.

Now, the GiST method implementations:

Consistent(E, q): Given entry E = (p, ptr), we know that p = Contains((x_ul^1, y_ul^1, x_lr^1, y_lr^1), v), and q is either Contains, Overlap or Equal on the argument (x_ul^2, y_ul^2, x_lr^2, y_lr^2). For any of these queries, return true if Overlap((x_ul^1, y_ul^1, x_lr^1, y_lr^1), (x_ul^2, y_ul^2, x_lr^2, y_lr^2)), and return false otherwise.

Union(E_1 = ((x_ul^1, y_ul^1, x_lr^1, y_lr^1), ptr_1), ..., E_n = ((x_ul^n, y_ul^n, x_lr^n, y_lr^n), ptr_n)): Return (min(x_ul^1, ..., x_ul^n), max(y_ul^1, ..., y_ul^n), max(x_lr^1, ..., x_lr^n), min(y_lr^1, ..., y_lr^n)).

Compress(E = (p, ptr)): Form the bounding box of polygon p, i.e., given a polygon stored as a set of line segments l_i = (x_1^i, y_1^i, x_2^i, y_2^i), form π = (min_i(x_1^i), max_i(y_1^i), max_i(x_1^i), min_i(y_1^i)). Return (π, ptr).

Decompress(E = ((x_ul, y_ul, x_lr, y_lr), ptr)): The identity function, i.e., return E.

Penalty(E_1, E_2): Given E_1 = (p_1, ptr_1) and E_2 = (p_2, ptr_2), compute q = Union(E_1, E_2), and return area(q) − area(E_1.p). This metric of "change in area" is the one proposed by Guttman [Gut84].

PickSplit(P): A variety of algorithms have been proposed for R-tree splitting. We thus omit this method implementation from our discussion here, and refer the interested reader to [Gut84] and [BKSS90].

The above implementations, along with the GiST algorithms described in the previous sections, give behavior identical to that of Guttman's R-tree. A series of variations on R-trees have been proposed, notably the R*-tree [BKSS90] and the R+-tree [SRF87].
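To make the bounding-box arithmetic concrete, here is a small Python sketch of the predicates and methods above. The coordinate conventions follow the text (y increases upward, so y_ul ≥ y_lr); the tuple representation and function names are our own illustration, not the paper's C++ code.

```python
def overlap(b1, b2):
    """Boxes as (x_ul, y_ul, x_lr, y_lr); true iff the boxes intersect."""
    x_ul1, y_ul1, x_lr1, y_lr1 = b1
    x_ul2, y_ul2, x_lr2, y_lr2 = b2
    return (x_ul1 <= x_lr2 and x_ul2 <= x_lr1 and
            y_lr1 <= y_ul2 and y_lr2 <= y_ul1)

def union(boxes):
    """Smallest bounding box containing all the given boxes."""
    return (min(b[0] for b in boxes), max(b[1] for b in boxes),
            max(b[2] for b in boxes), min(b[3] for b in boxes))

def area(b):
    x_ul, y_ul, x_lr, y_lr = b
    return (x_lr - x_ul) * (y_ul - y_lr)

def penalty(b1, b2):
    """Guttman's 'change in area' of b1 needed to absorb b2."""
    return area(union([b1, b2])) - area(b1)
```

For instance, absorbing the unit-offset box (1, 3, 3, 1) into the 2-by-2 box (0, 2, 2, 0) yields a 3-by-3 union, so the penalty is 9 − 4 = 5.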
The R*-tree differs from the basic R-tree in three ways: in its PickSplit algorithm, which has a variety of small changes, in its ChooseSubtree algorithm, which varies only slightly, and in its policy of reinserting a number of keys during node split. It would not be difficult to implement the R*-tree in the GiST: the R*-tree PickSplit algorithm can be implemented as the PickSplit method of the GiST, the modifications to ChooseSubtree could be introduced with a careful implementation of the Penalty method, and the reinsertion policy of the R*-tree could easily be added into the built-in GiST tree methods (see Section 7). R+-trees, on the other hand, cannot be mimicked by the GiST. This is because the R+-tree places duplicate copies of data entries in multiple leaf nodes, thus violating the GiST principle of a search tree being a hierarchy of partitions of the data.

Again, observe that one could fairly easily support more complex predicates, including n-dimensional analogs of the disjunctive queries and ranked keys mentioned for B+-trees. Other examples include arbitrary variations of the usual overlap or ordering queries, e.g. "find all polygons that overlap more than 30% of this box", or "find all polygons that overlap 12 to 1 o'clock", which for a given point p returns all polygons that are in the region bounded by two rays that exit p at angles 90° and 60° in polar coordinates. Note that this infinite region cannot be defined as a polygon made up of line segments, and hence this query cannot be expressed using typical R-tree predicates.

4.3 GiSTs Over P(Z) (RD-trees)

In the previous two sections we demonstrated that the GiST can provide the functionality of two known data structures: B+-trees and R-trees. In this section, we demonstrate that the GiST can provide support for a new search tree that indexes set-valued data.
The problem of handling set-valued data is attracting increasing attention in the Object-Oriented database community [KG94], and is fairly natural even for traditional relational database applications. For example, one might have a university database with a table of students, and for each student an attribute courses_passed of type setof(integer). One would like to efficiently support containment queries such as "find all students who have passed all the courses in the prerequisite set {101, 121, 150}." We handle this in the GiST by using sets as containment keys, much as an R-tree uses bounding boxes as containment keys. We call the resulting structure an RD-tree (or "Russian Doll" tree). The keys in an RD-tree are sets of integers, and the RD-tree derives its name from the fact that as one traverses a branch of the tree, each key contains the key below it in the branch. We proceed to give GiST method implementations for RD-trees.

Before compression, the keys in our RD-trees are sets of integers. A key S represents the predicate Contains(S, v) for set-valued variable v. The query predicates allowed on the RD-tree are Contains(set, v), Overlap(set, v), and Equal(set, v). The implementation of the query predicates is straightforward:

Contains(S, T): Return true if S ⊇ T, and false otherwise.

Overlap(S, T): Return true if S ∩ T ≠ ∅, false otherwise.

Equal(S, T): Return true if S = T, false otherwise.

Now, the GiST method implementations:

Consistent(E = (p, ptr), q): Given our keys and predicates, we know that p = Contains(S, v), and either q = Contains(T, v), q = Overlap(T, v) or q = Equal(T, v). For all of these, return true if Overlap(S, T), and false otherwise.

Union(E_1 = (S_1, ptr_1), ..., E_n = (S_n, ptr_n)): Return S_1 ∪ ... ∪ S_n.

Compress(E = (S, ptr)): A variety of compression techniques for sets are given in [HP94]. We briefly describe one of them here.
The elements of S are sorted, and then converted into a set of n disjoint ranges {[l_1, h_1], [l_2, h_2], ..., [l_n, h_n]} where l_i ≤ h_i, and h_i < l_{i+1}. The conversion is done using the following algorithm:

Initialize: consider each element a_m of S to be a range [a_m, a_m].
while (more than n ranges remain) {
    find the pair of adjacent ranges with the least interval between them;
    form a single range of the pair;
}

The resulting structure is called a rangeset. It can be shown that this algorithm produces a rangeset of n items with minimal addition of elements not in S [HP94].

Decompress(E = (rangeset, ptr)): Rangesets are easily converted back to sets by enumerating the elements in the ranges.

Penalty(E_1 = (S_1, ptr_1), E_2 = (S_2, ptr_2)): Return |E_1.S ∪ E_2.S| − |E_1.S|. Alternatively, return the change in a weighted cardinality, where each element of Z has a weight, and |S| is the sum of the weights of the elements in S.

[Figure 2: Space of Factors Affecting GiST Performance. A unit square whose axes are Data Overlap and Compression Loss, each ranging from 0 to 1; B+-trees over duplicate-free data lie at the origin.]

PickSplit(P): Guttman's quadratic algorithm for R-tree split works naturally here. The reader is referred to [Gut84] for details.

This GiST supports the usual R-tree query predicates, has containment keys, and uses a traditional R-tree algorithm for PickSplit. As a result, we were able to implement these methods in Illustra's extensible R-trees, and get behavior identical to what the GiST behavior would be. This exercise gave us a sense of the complexity of a GiST class implementation (c. 500 lines of C code), and allowed us to do the performance studies described in the next section. Using R-trees did limit our choices for predicates and for the split and penalty algorithms, which will merit further exploration when we build RD-trees using GiSTs.
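The rangeset construction above can be sketched in Python as follows. This is an illustrative rendering of the greedy merge, not the [HP94] implementation, and the function names are ours.

```python
def to_rangeset(s, n):
    """Compress a set of integers into at most n disjoint [lo, hi] ranges,
    repeatedly merging the adjacent pair with the smallest gap between them."""
    ranges = [[a, a] for a in sorted(s)]
    while len(ranges) > n:
        # Find the pair of adjacent ranges with the least interval between them.
        i = min(range(len(ranges) - 1),
                key=lambda j: ranges[j + 1][0] - ranges[j][1])
        # Form a single range of the pair.
        ranges[i][1] = ranges[i + 1][1]
        del ranges[i + 1]
    return [tuple(r) for r in ranges]

def from_rangeset(ranges):
    """Decompress: enumerate the elements in the ranges."""
    return {x for lo, hi in ranges for x in range(lo, hi + 1)}
```

For example, to_rangeset({1, 2, 3, 10, 11, 50}, 2) merges the closest neighbors first and yields [(1, 11), (50, 50)]; note that decompressing this rangeset gives a superset of the original set, which is exactly the lossy-but-conservative behavior a GiST key needs.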
5 GiST Performance Issues

In balanced trees such as B+-trees which have non-overlapping keys, the maximum number of nodes to be examined (and hence I/Os) is easy to bound: for a point query over duplicate-free data it is the height of the tree, i.e. O(log n) for a database of n tuples. This upper bound cannot be guaranteed, however, if keys on a node may overlap, as in an R-tree or GiST, since overlapping keys can cause searches in multiple paths in the tree. The performance of a GiST varies directly with the amount that keys on nodes tend to overlap.

There are two major causes of key overlap: data overlap, and information loss due to key compression. The first issue is straightforward: if many data objects overlap significantly, then keys within the tree are likely to overlap as well. For example, any dataset made up entirely of identical items will produce an inefficient index for queries that match the items. Such workloads are simply not amenable to indexing techniques, and should be processed with sequential scans instead.

Loss due to key compression causes problems in a slightly more subtle way: even though two sets of data may not overlap, the keys for these sets may overlap if the Compress/Decompress methods do not produce exact keys. Consider R-trees, for example, where the Compress method produces bounding boxes. If objects are not box-like, then the keys that represent them will be inaccurate, and may indicate overlaps when none are present. In R-trees, the problem of compression loss has been largely ignored, since most spatial data objects (geographic entities, regions of the brain, etc.) tend to be relatively box-shaped.[3] But this need not be the case. For example, consider a 3-d R-tree index over the dataset corresponding to a plate of spaghetti: although no single spaghetto intersects any other in three dimensions, their bounding boxes will likely all intersect!
[3] Better approximations than bounding boxes have been considered for doing spatial joins [BKSS94]. However, this work proposes using bounding boxes in an R*-tree, and only using the more accurate approximations in main memory during post-processing steps.

The two performance issues described above are displayed as a graph in Figure 2. At the origin of this graph are trees with no data overlap and lossless key compression, which have the optimal logarithmic performance described above. Note that B+-trees over duplicate-free data are at the origin of the graph. As one moves towards 1 along either axis, performance can be expected to degrade. In the worst case on the x axis, keys are consistent with any query, and the whole tree must be traversed for any query. In the worst case on the y axis, all the data are identical, and the whole tree must be traversed for any query consistent with the data.

In this section, we present some initial experiments we have done with RD-trees to explore the space of Figure 2. We chose RD-trees for two reasons:

1. We were able to implement the methods in Illustra R-trees.

2. Set data can be "cooked" to have almost arbitrary overlap, as opposed to polygon data which is contiguous within its boundaries, and hence harder to manipulate. For example, it is trivial to construct n distant "hot spots" shared by all sets in an RD-tree, but it is geometrically difficult to do the same for polygons in an R-tree. We thus believe that set-valued data is particularly useful for experimenting with overlap.

To validate our intuition about the performance space, we generated 30 datasets, each corresponding to a point in the space of Figure 2. Each dataset contained 10000 set-valued objects. Each object was a regularly spaced set of ranges, much like a comb laid on the number line (e.g. {[1, 10], [100001, 100010], [200001, 200010], ...}).
The "teeth" of each comb were 10 integers wide, while the spaces between teeth were 99990 integers wide, large enough to accommodate one tooth from every other object in the dataset. The 30 datasets were formed by changing two variables: numranges, the number of ranges per set, and overlap, the amount that each comb overlapped its predecessor. Varying numranges adjusted the compression loss: our Compress method only allowed for 20 ranges per rangeset, so a comb of t > 20 teeth had t − 20 of its inter-tooth spaces erroneously included into its compressed representation. The amount of overlap was controlled by the left edge of each comb: for overlap 0, the first comb was started at 1, the second at 11, the third at 21, etc., so that no two combs overlapped. For overlap 2, the first comb was started at 1, the second at 9, the third at 17, etc. The 30 datasets were generated by forming all combinations of numranges in {20, 25, 30, 35, 40} and overlap in {0, 2, 4, 6, 8, 10}.

For each of the 30 datasets, five queries were performed. Each query searched for objects overlapping a different tooth of the first comb. The query performance was measured in number of I/Os, and the five numbers averaged per dataset. A chart of the performance is shown in Appendix A. More illustrative is the 3-d plot shown in Figure 3, where the x and y axes are the same as in Figure 2, and the z axis represents the average number of I/Os. The landscape is much as we had expected: it slopes upwards as we move away from 0 on either axis.

While our general insights on data overlap and compression loss are verified by this experiment, a number of performance variables remain unexplored. Two issues of concern are hot spots and the correlation factor across hot spots. Hot spots in RD-trees are integers that appear in many sets. In general, hot spots can be thought of as very specific predicates satisfiable by many tuples in a dataset.
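The comb-shaped datasets described above can be generated with a short routine like the following. This is our own illustrative Python reconstruction of the construction from the text; the function name is hypothetical, while the parameters numranges and overlap match the experimental variables above.

```python
def make_comb(index, numranges, overlap, tooth=10, period=100000):
    """Build one comb object: `numranges` ranges of width `tooth`, spaced
    `period` integers apart. Comb i starts at 1 + i*(tooth - overlap), so
    consecutive combs share `overlap` integers per tooth (overlap=0 gives
    disjoint combs: starts 1, 11, 21, ...; overlap=2 gives 1, 9, 17, ...)."""
    start = 1 + index * (tooth - overlap)
    return [(start + k * period, start + k * period + tooth - 1)
            for k in range(numranges)]
```

For example, make_comb(0, 3, 0) reproduces the first comb from the text, {[1, 10], [100001, 100010], [200001, 200010]}.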
The correlation factor for two integers j and k in an RD-tree is the likelihood that if one of j or k appears in a set, then both appear. In general, the correlation factor for two hot spots p, q is the likelihood that if p ∨ q holds for a tuple, p ∧ q holds as well. An interesting question is how the GiST behaves as one denormalizes data sets to produce hot spots, and correlations between them. This question, along with similar issues, should prove to be a rich area of future research.

[Figure 3: Performance in the Parameter Space. A 3-d surface over the axes of Figure 2, with average number of I/Os on the z axis. This surface was generated from the data in Appendix A. Compression loss was calculated as (numranges − 20)/numranges, while data overlap was calculated as overlap/10.]

6 Implementation Issues

In previous sections we described the GiST, demonstrated its flexibility, and discussed its performance as an index for secondary storage. A full-fledged database system is more than just a secondary storage manager, however. In this section we briefly address how the GiST can be implemented, taking into account important database system issues.

6.1 In-Memory Efficiency: Streamlining Intra-node Operations

The GiST is an index for secondary storage. As a result, the algorithms described above focus on minimizing node accesses, which are the only operations that involve I/O. However, even secondary storage structures should be implemented with an eye on algorithmic efficiency, and in some situations it is beneficial to streamline in-memory operations on a single node. We facilitate such optimizations in the GiST for situations where they can be exploited, although the default behavior is general-purpose and can be relied upon for correctness in all scenarios.

For the sake of simplicity and generality, the algorithms described above compare input predicates with each entry on a node.
More efficient schemes may be used for particular domains. For example, in ordered domains the appropriate entries on a node may be found via binary search. Another example is the hB-tree, in which the entries on a node are themselves indexed via a k-d tree [Ben79], which is used for finding the appropriate entries on the node. Many alternative optimizations may exist, depending on the domain of the key predicates.

To facilitate these techniques, additional extensibility in the GiST may optionally be leveraged for high performance. The Node object in the GiST may be subclassed, and methods for searching, inserting and deleting entries on a node may be specialized. For the sake of brevity, we do not describe the Node interface in detail here, but instead give an illustrative example. The default behavior of the Node object is to store entries in a typical slotted-page format, and do intra-node search in a linear fashion. The behavior of an hB-tree may be achieved instead, by specifying a subclass of the Node object which has a k-d tree member, and overloaded member functions for intra-node insertion, deletion, and search which maintain and exploit the k-d tree. Improved Node methods such as these can streamline the intra-node search operations mentioned in steps S1, S2, FM1, FM2, and CS2. Small overheads may be incurred in steps which insert, delete or modify entries on nodes.

Another in-memory optimization is to avoid the overhead of calling user methods for each operation. This can be done by writing methods as inline code, and recompiling the system. This is discussed in more detail in Section 6.6.

6.2 Concurrency Control, Recovery and Consistency

High concurrency, recoverability, and degree-3 consistency are critical factors in a full-fledged database system. Concurrency control for B+-trees is a well-understood problem, with Lehman and Yao's B-link variant [LY81] being a typical way of implementing high-concurrency B+-trees.
Recovery for B-link trees has been explored by Lomet and Salzberg [LS92], who show that a π-tree, which generalizes the B-link tree, can be reconstructed gradually from any interim state. Until recently, no analogous results existed for R-trees. The main stumbling block was the fact that Lehman and Yao's techniques were based on linear ordering in the data: sideways pointers were introduced at each level of the tree to connect the nodes of each level into a linked list, ordered by key value. R-tree data has no such natural ordering, and imposing an artificial ordering upsets the balancing techniques for the tree.

Fortunately, recent work extends Lehman and Yao's sideways pointer techniques, along with recovery techniques, to R-trees. Ng and Kameda [NK94] do so by generating a pending update list at each node of the tree, and applying Lomet and Salzberg's results on π-trees to this context. Banks, Kornacker and Stonebraker [BKS94, KB95] have a simpler solution that marks keys and nodes with sequence numbers, and uses the sequence numbers to determine order among the nodes. Both approaches provide solutions for degree-3 consistency: Ng and Kameda via aborting transactions that read phantoms, and Banks, et al. via predicate locking.

These techniques can be used in GiSTs in the same way they are used in R-trees. The sequence number approach seems more attractive for the GiST, since it is a simple generalization of the original B-link approach. As a result, the original approach could be used in the case of linearly ordered data, and the more complex sequence number approach for unordered data. A full implementation of concurrency, recoverability, and consistency in the GiST will certainly have to revisit some issues, but the bulk of the problems have already been addressed.

6.3 Variable-Length Keys

It is often convenient to allow keys to vary in length, particularly because of the Compress method available in GiSTs.
Unfortunately, variable-length keys can cause problems for the GiST (and for R-trees), due to the following possible scenario:

1. A new entry is to be inserted in leaf node L.
2. L is full, so a new node L' is generated, and the data on L is split among L and L'.
3. During split, the new entry is assigned to L. This results in the key above L being readjusted.
4. The key above L grows in size, and no longer fits on the parent node, so...
5. the entry for the key above L must be removed from the parent node.

At this point, we are in the unpleasant situation of having both L and L' detached from our tree. The traditional split propagation algorithm only works for one detached node, not for two. Some new technique is required to handle this. For linearly ordered domains, the situation outlined above cannot occur, since key propagation is not necessary (recall the discussion at the end of Section 3.4.3). For unordered domains, the problem can indeed occur. We suggest two possible solutions:

Reinsertion: The scenario described can be handled by calling the Insert method to reinsert the two orphaned nodes at their original level. Some care must be taken to ensure that the problem described above does not re-occur on reinsert.

New Entry on New Node: An alternative is to force the new entry to be placed on L' after the split. Then we know that some sufficiently small key can be found that is appropriate for L: in particular, the old key for L is still valid after the split. The old key may be unacceptably general though, and it is advisable to search for a more specific key that is small enough to fit.

The first solution has the advantage of being flexible, and potentially improving the performance of the tree through refined data placement [BKSS90]. It may require some modifications to the concurrency control techniques described in the previous section.
The second solution, while simpler, presents the problem of finding a good key that requires sufficiently little storage. This is always possible if one can ensure that as the generality of a key decreases, the storage required for its compressed key does not increase. Although this cannot be guaranteed in general, it does seem natural for some domains, e.g. sets.

6.4 Bulk Loading

In unordered domains, it is not clear how to efficiently build an index over a large, pre-existing dataset. To do so, some ordering must be used to sort the data, and then the sorted data must be partitioned into a linked list of nodes, so that a tree may be constructed over the list. A good sort order and partitioning results in a relatively full tree with low overlap at the keys. Various sort orders have been developed for R-trees (e.g. [KF93], [Jag90], etc.), but these solutions are specific to the spatial domain and thus not generally applicable for an extensible structure like the GiST. Extending this work to new domains should prove interesting. An extensible BulkLoad method may be added to the GiST to accommodate bulk loading for various domains.

6.5 Optimizer Integration

To integrate GiSTs with a query optimizer, one must let the optimizer know which query predicates match each GiST. This can be done by registering the predicates supported by the Consistent method with the optimizer. For trees with IsOrdered = TRUE, one can additionally specify to the optimizer those predicates that can be evaluated using the FindMin/Next technique.

When planning a query, if the optimizer sees a Boolean Factor [SAC+79] based on one of the registered predicates, then it knows it should consider probing the GiST as one of its potential access paths. The question of estimating the cost of probing a GiST is more difficult, and we defer it until Section 7.
6.6 Coding Details

We propose implementing the GiST in two ways: the Extensible GiST will be designed for easy extensibility, while the Template GiST will be designed for maximal efficiency. With a little care, these two implementations can be built off of the same code base, without replication of logic.

The Extensible GiST package is an object library providing a GiST class, and a GiSTkey class. The latter is a skeleton from which users may inherit the method names described in Section 3.3, to provide their own domain-specific implementations. In C++ terminology, these methods are virtual member functions, meaning that each time a method, e.g. Compress, is invoked on a key object, the appropriate implementation of Compress is looked up and invoked. Invoking virtual member functions can be inefficient, but this architecture allows for great flexibility: new key classes may be implemented on demand, dynamically linked into a running DBMS, and used to index new data types without halting the DBMS. This is analogous to the extensible indexing scheme used in POSTGRES, and is suitable both for prototyping, and for systems where high availability is important.

The Template GiST package provides a template GiST class, which is a C++ source code library. Users develop a key class of their liking, say myGiSTkey, with the appropriate methods. Then they can declare a GiST index over myGiSTkey that behaves as desired without the inefficiencies of invoking virtual member functions. Note that myGiSTkey may implement some key methods as inline code, particularly for trivial operations such as identity-function Compress and Decompress methods. This eliminates the pathlength involved in making a function call, thus masking the complexity of unexploited extensibility features. The Template GiST is in the spirit of "extensible database toolkits" such as EXODUS, and can be used to build custom-designed, efficient systems.
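The Extensible GiST's dispatch structure can be mimicked in a short sketch. The paper proposes C++ classes with virtual member functions; the Python below is only an analogy of that architecture, and all class and method names are our own invention.

```python
class GiSTKey:
    """Skeleton key class; users override the extension methods
    (analogous to virtual member functions on the GiSTkey class)."""
    def consistent(self, key, query):
        raise NotImplementedError
    def compress(self, key):
        return key  # default: identity
    def decompress(self, key):
        return key  # default: identity

class IntervalKey(GiSTKey):
    """A user-supplied key class for half-open integer intervals [x, y),
    answering overlap queries given as intervals."""
    def consistent(self, key, query):
        (x_p, y_p), (x_q, y_q) = key, query
        return x_p < y_q and x_q < y_p

class GiST:
    """Every key operation is dispatched through the key object, so a new
    key class changes index behavior without touching the tree code."""
    def __init__(self, key_class):
        self.key = key_class()
    def matches(self, stored_key, query):
        return self.key.consistent(self.key.decompress(stored_key), query)
```

The Template GiST would correspond to resolving these calls statically (as with a C++ template parameter) rather than through dynamic dispatch, trading the ability to load new key classes at runtime for the elimination of per-call lookup overhead.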
7 Summary and Future Work

The incorporation of new data types into today's database systems requires indexes that support an extensible set of queries. To facilitate this, we isolated the essential nature of search trees, providing a clean characterization of how they are all alike. Using this insight, we developed the Generalized Search Tree, which unifies previously distinct search tree structures. The GiST is extremely extensible, allowing arbitrary data sets to be indexed and efficiently queried in new ways. This flexibility opens the question of when and how one can generate effective search trees.

Since the GiST unifies B+-trees and R-trees into a single structure, it is immediately useful for systems which require the functionality of both. In addition, the extensibility of the GiST also opens up a number of interesting research problems which we intend to pursue:

Indexability: The primary theoretical question raised by the GiST is whether one can find a general characterization of workloads that are amenable to indexing. The GiST provides a means to index arbitrary domains for arbitrary queries, but as yet we lack an "indexability theory" to describe whether or not trying to index a given data set is practical for a given set of queries.

Indexing Non-Standard Domains: As a practical matter, we are interested in building indices for unusual domains, such as sets, terms, images, sequences, graphs, video and sound clips, fingerprints, molecular structures, etc. Pursuit of such applied results should provide an interesting feedback loop with the theoretical explorations described above. Our investigation into RD-trees for set data has already begun: we have implemented RD-trees in SHORE and Illustra, using R-trees rather than the GiST. Once we shift from R-trees to the GiST, we will also be able to experiment with new PickSplit methods and new predicates for sets.
Query Optimization and Cost Estimation: Cost estimates for query optimization need to take into account the costs of searching a GiST. Currently such estimates are reasonably accurate for B+-trees, and less so for R-trees. Recently, some work on R-tree cost estimation has been done [FK94], but more work is required to bring this to bear on GiSTs in general. As an additional problem, the user-defined GiST methods may be time-consuming operations, and their CPU cost should be registered with the optimizer [HS93]. The optimizer must then correctly incorporate the CPU cost of the methods into its estimate of the cost for probing a particular GiST.

Lossy Key Compression Techniques: As new data domains are indexed, it will likely be necessary to find new lossy compression techniques that preserve the properties of a GiST.

Algorithmic Improvements: The GiST algorithms for insertion are based on those of R-trees. As noted in Section 4.2, R*-trees use somewhat modified algorithms, which seem to provide some performance gain for spatial data. In particular, the R*-tree policy of "forced reinsert" during split may be generally beneficial. An investigation of the R*-tree modifications needs to be carried out for non-spatial domains. If the techniques prove beneficial, they will be incorporated into the GiST, either as an option or as default behavior. Additional work will be required to unify the R*-tree modifications with the techniques for concurrency control and recovery.

Finally, we believe that future domain-specific search tree enhancements should take into account the generality issues raised by GiSTs. There is no good reason to develop new, distinct search tree structures if comparable performance can be obtained in a unified framework. The GiST provides such a framework, and we plan to implement it as a C++ library package, so that it can be exploited by a variety of systems.
Acknowledgements

Thanks to Praveen Seshadri, Marcel Kornacker, Mike Olson, Kurt Brown, Jim Gray, and the anonymous reviewers for their helpful input on this paper. Many debts of gratitude are due to the staff of Illustra Information Systems: thanks to Mike Stonebraker and Paula Hawthorn for providing a flexible industrial research environment, and to Mike Olson, Jeff Meredith, Kevin Brown, Michael Ubell, and Wei Hong for their help with technical matters. Thanks also to Shel Finkelstein for his insights on RD-trees. Simon Hellerstein is responsible for the acronym GiST. Ira Singer provided a hardware loan which made this paper possible. Finally, thanks to Adene Sacks, who was a crucial resource throughout the course of this work.

References

[Aok91] P. M. Aoki. Implementation of Extended Indexes in POSTGRES. SIGIR Forum, 25(1):2-9, 1991.

[Ben79] J. L. Bentley. Multidimensional Binary Search Trees in Database Applications. IEEE Transactions on Software Engineering, SE-5(4):339-353, July 1979.

[BKS94] Douglas Banks, Marcel Kornacker, and Michael Stonebraker. High Concurrency Locking in R-Trees. Technical Report Sequoia 2000 94/56, University of California, Berkeley, June 1994.

[BKSS90] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: An Efficient and Robust Access Method For Points and Rectangles. In Proc. ACM-SIGMOD International Conference on Management of Data, Atlantic City, May 1990, pages 322-331.

[BKSS94] Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. Multi-Step Processing of Spatial Joins. In Proc. ACM-SIGMOD International Conference on Management of Data, Minneapolis, May 1994, pages 197-208.

[CDF+94] Michael J. Carey, David J. DeWitt, Michael J. Franklin, Nancy E. Hall, Mark L. McAuliffe, Jeffrey F. Naughton, Daniel T. Schuh, Marvin H. Solomon, C. K. Tan, Odysseas G. Tsatalos, Seth J. White, and Michael J. Zwilling. Shoring Up Persistent Applications. In Proc. ACM-SIGMOD International Conference on Management of Data, Minneapolis, May 1994, pages 383-394.

[CDG+90] M. J. Carey, D. J. DeWitt, G. Graefe, D. M. Haight, J. E. Richardson, D. H. Schuh, E. J. Shekita, and S. L. Vandenberg. The EXODUS Extensible DBMS Project: An Overview. In Stan Zdonik and David Maier, editors, Readings In Object-Oriented Database Systems. Morgan-Kaufmann Publishers, Inc., 1990.

[Com79] Douglas Comer. The Ubiquitous B-Tree. Computing Surveys, 11(2):121-137, June 1979.

[FB74] R. A. Finkel and J. L. Bentley. Quad-Trees: A Data Structure For Retrieval On Composite Keys. ACTA Informatica, 4(1):1-9, 1974.

[FK94] Christos Faloutsos and Ibrahim Kamel. Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. In Proc. 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 4-13, Minneapolis, May 1994.

[Gro94] The POSTGRES Group. POSTGRES Reference Manual, Version 4.2. Technical Report M92/85, Electronics Research Laboratory, University of California, Berkeley, April 1994.

[Gut84] Antonin Guttman. R-Trees: A Dynamic Index Structure For Spatial Searching. In Proc. ACM-SIGMOD International Conference on Management of Data, Boston, June 1984, pages 47-57.

[HP94] Joseph M. Hellerstein and Avi Pfeffer. The RD-Tree: An Index Structure for Sets. Technical Report #1252, University of Wisconsin at Madison, October 1994.

[HS93] Joseph M. Hellerstein and Michael Stonebraker. Predicate Migration: Optimizing Queries With Expensive Predicates. In Proc. ACM-SIGMOD International Conference on Management of Data, Washington, D.C., May 1993, pages 267-276.

[Jag90] H. V. Jagadish. Linear Clustering of Objects With Multiple Attributes. In Proc. ACM-SIGMOD International Conference on Management of Data, Atlantic City, May 1990, pages 332-342.

[JS93] T. Johnson and D. Shasha. Inserts and Deletes on B-trees: Why Free-At-Empty is Better Than Merge-At-Half. Journal of Computer and System Sciences, 47(1):45-76, August 1993.

[KB95] Marcel Kornacker and Douglas Banks. High-Concurrency Locking in R-Trees. Submitted for publication, 1995.

[KF93] Ibrahim Kamel and Christos Faloutsos. On Packing R-Trees. In Second International Conference on Information and Knowledge Management (CIKM), November 1993.

[KG94] Won Kim and Jorge Garza. Requirements For a Performance Benchmark For Object-Oriented Systems. In Won Kim, editor, Modern Database Systems: The Object Model, Interoperability and Beyond. ACM Press, June 1994.

[KKD89] Won Kim, Kyung-Chang Kim, and Alfred Dale. Indexing Techniques for Object-Oriented Databases. In Won Kim and Fred Lochovsky, editors, Object-Oriented Concepts, Databases, and Applications, pages 371-394. ACM Press and Addison-Wesley Publishing Co., 1989.

[Knu73] Donald Ervin Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley Publishing Co., 1973.

[LJF94] King-Ip Lin, H. V. Jagadish, and Christos Faloutsos. The TV-Tree: An Index Structure for High-Dimensional Data. VLDB Journal, 3:517-542, October 1994.

[LS90] David B. Lomet and Betty Salzberg. The hB-Tree: A Multiattribute Indexing Method. ACM Transactions on Database Systems, 15(4), December 1990.

[LS92] David Lomet and Betty Salzberg. Access Method Concurrency with Recovery. In Proc. ACM-SIGMOD International Conference on Management of Data, pages 351-360, San Diego, June 1992.

[LY81] P. L. Lehman and S. B. Yao. Efficient Locking For Concurrent Operations on B-trees. ACM Transactions on Database Systems, 6(4):650-670, 1981.

[MCD94] Mauricio R. Mediano, Marco A. Casanova, and Marcelo Dreux. V-Trees: A Storage Method For Long Vector Data. In Proc. 20th International Conference on Very Large Data Bases, pages 321-330, Santiago, September 1994.

[NK94] Vincent Ng and Tiko Kameda. The R-Link Tree: A Recoverable Index Structure for Spatial Data. In Proc. Fifth International Conference on Database and Expert Systems Applications (DEXA '94), pages 163-172, Athens, 1994.

[PSTW93] Bernd-Uwe Pagel, Hans-Werner Six, Heinrich Toben, and Peter Widmayer. Towards an Analysis of Range Query Performance in Spatial Data Structures. In Proc. 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 214-221, Washington, D.C., May 1993.

[Qui93] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan-Kaufmann Publishers, Inc., 1993.

[Rob81] J. T. Robinson. The k-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. In Proc. ACM-SIGMOD International Conference on Management of Data, pages 10-18, Ann Arbor, April/May 1981.

[SAC+79] Patricia G. Selinger, M. Astrahan, D. Chamberlin, Raymond Lorie, and T. Price. Access Path Selection in a Relational Database Management System. In Proc. ACM-SIGMOD International Conference on Management of Data, Boston, June 1979.

[SRF87] Timos Sellis, Nick Roussopoulos, and Christos Faloutsos. The R+-Tree: A Dynamic Index For Multi-Dimensional Objects. In Proc. 13th International Conference on Very Large Data Bases, pages 507-518, Brighton, September 1987.

[Sto86] Michael Stonebraker. Inclusion of New Types in Relational Database Systems. In Proceedings of the IEEE Fourth International Conference on Data Engineering, pages 262-269, Washington, D.C., February 1986.

[Sto93] Michael Stonebraker. The Miro DBMS. In Proc. ACM-SIGMOD International Conference on Management of Data, Washington, D.C., May 1993, page 439.

[VV84] Patrick Valduriez and Yann Viemont. A Multikey Hashing Scheme Using Predicate Trees. In Proc. ACM-SIGMOD International Conference on Management of Data, Boston, June 1984, pages 107-114.

[WE77] Kai C. Wong and Murray Edelberg. Interval hierarchies and their application to predicate files. ACM Transactions on Database Systems, 2(3):223-232, September 1977.

[WE80] C. K. Wong and M. C. Easton. An Efficient Method for Weighted Sampling Without Replacement. SIAM Journal on Computing, 9(1):111-113, February 1980.

A Query Performance Over Comb Data

numranges  overlap  Query 1  Query 2  Query 3  Query 4  Query 5  Average
       20        0        7        2        2        2        2        3
       20        2        7        2        2        2        2        3
       20        4        7        2        2        2        2        3
       20        6        7        2        2        2        2        3
       20        8        8        2        2        2        2      3.2
       20       10     3254     3252     3252     3252     3252   3252.4
       25        0        7     4087        6        2        2    820.8
       25        2        7     4067        6        2        2    816.8
       25        4        7     4063        6        2        2      816
       25        6        7     4086        6        2        2    820.6
       25        8        8     4084        7        2        2    820.6
       25       10     4096     4095     4095     4095     4095   4095.2
       30        0        7     4087     4086        6        2   1637.6
       30        2        7     4067     4066        6        2   1629.6
       30        4        7     4063     4062        6        2     1628
       30        6        8     4086     4086        6        2   1637.6
       30        8        8     4084     4081        7        2   1636.4
       30       10     4096     4095     4095     4095     4095   4095.2
       35        0        7     5734     5734     5734       10   3443.8
       35        2        7     5730     5729     5729        6   3440.2
       35        4        7     5727     5726     5726        6   3438.4
       35        6        8     5753     5753     5753        7   3454.8
       35        8       10     5533     5534     5534        7     3323
       35       10     5002     5002     5002     5002     5002     5002
       40        0        8     5734     5734     5734     5734   4588.8
       40        2        8     5730     5729     5729     5729     4585
       40        4        8     5727     5726     5726     5726   4582.6
       40        6        9     5753     5753     5753     5753   4604.2
       40        8       10     5756     5754     5754     5754     4605
       40       10     5751     5749     5749     5749     5749   5749.4