Generalized Search Trees for Database Systems

Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer

fhellers, [email protected], [email protected] erkeley.edu

June, 1995

Abstract

This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on extensibility, which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility.

To illustrate the flexibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to a discussion of the nature of tree indices and how they behave for various datasets.

1 Introduction

An efficient implementation of search trees is crucial for any database system. In traditional relational systems, B+-trees [Com79] were sufficient for the sorts of queries posed on the usual set of alphanumeric data types. Today, database systems are increasingly being deployed to support new applications such as geographic information systems, multimedia systems, CAD tools, document libraries, sequence databases, fingerprint identification systems, biochemical databases, etc. To support the growing set of applications, search trees must be extended for maximum flexibility. This requirement has motivated two major research approaches in extending search tree technology:

1. Specialized Search Trees: A large variety of search trees has been developed to solve specific problems. Among the best known of these trees are spatial search trees such as R-trees [Gut84]. While some of this work has had significant impact in particular domains, the approach of developing domain-specific search trees is problematic. The effort required to implement and maintain such data structures is high. As new applications need to be supported, new tree structures have to be developed from scratch, requiring new implementations of the usual tree facilities for search, maintenance, concurrency control and recovery.

2. Search Trees For Extensible Data Types: As an alternative to developing new data structures, existing data structures such as B+-trees and R-trees can be made extensible in the data types they support [Sto86]. For example, B+-trees can be used to index any data with a linear ordering, supporting equality or linear range queries over that data. While this provides extensibility in the data that can be indexed, it does not extend the set of queries which can be supported by the tree. Regardless of the type of data stored in a B+-tree, the only queries that can benefit from the tree are those containing equality and linear range predicates. Similarly, in an R-tree the only queries that can use the tree are those containing equality, overlap and containment predicates. This inflexibility presents significant problems for new applications, since traditional queries on linear orderings and spatial location are unlikely to be pertinent for new data types.

In this paper we present a third direction for extending search tree technology. We introduce a new data structure called the Generalized Search Tree (GiST), which is easily extensible both in the data types it can index and in the queries it can support. Extensibility of queries is particularly important, since it allows new data types to be indexed in a manner that supports the queries natural to the types. In addition to providing extensibility for new data types, the GiST unifies previously disparate structures used for currently common data types. For example, both B+-trees and R-trees can be implemented as extensions of the GiST, resulting in a single code base for indexing multiple dissimilar applications.

The GiST is easy to configure: adapting the tree for different uses only requires registering six methods with the database system, which encapsulate the structure and behavior of the object class used for keys in the tree. As an illustration of this flexibility, we provide method implementations that allow the GiST to be used as a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. The GiST can be adapted to work like a variety of other known search tree structures, e.g. partial sum trees [WE80], k-D-B-trees [Rob81], Ch-trees [KKD89], Exodus large objects [CDG+90], hB-trees [LS90], V-trees [MCD94], TV-trees [LJF94], etc. Implementing a new set of methods for the GiST is a significantly easier task than implementing a new tree package from scratch: for example, the POSTGRES [Gro94] and SHORE [CDF+94] implementations of R-trees and B+-trees are on the order of 3000 lines of C or C++ code each, while our method implementations for the GiST are on the order of 500 lines of C code each.

In addition to providing a unified, highly extensible data structure, our general treatment of search trees sheds some initial light on a more fundamental question: if any dataset can be indexed with a GiST, does the resulting tree always provide efficient lookup? The answer to this question is "no", and in our discussion we illustrate some issues that can affect the efficiency of a search tree. This leads to the interesting question of how and when one can build an efficient search tree for queries over non-standard domains, a question that can now be further explored by experimenting with the GiST.

1.1 Structure of the Paper

In Section 2, we illustrate and generalize the basic nature of database search trees. Section 3 introduces the Generalized Search Tree object, with its structure, properties, and behavior. In Section 4 we provide GiST implementations of three different sorts of search trees. Section 5 presents some performance results that explore the issues involved in building an effective search tree. Section 6 examines some details that need to be considered when implementing GiSTs in a full-fledged DBMS. Section 7 concludes with a discussion of the significance of the work, and directions for further research.

1.2 Related Work

A good survey of search trees is provided by Knuth [Knu73], though B-trees and their variants are covered in more detail by Comer [Com79]. There are a variety of multidimensional search trees, such as R-trees [Gut84] and their variants: R*-trees [BKSS90] and R+-trees [SRF87]. Other multidimensional search trees include quad-trees [FB74], k-D-B-trees [Rob81], and hB-trees [LS90]. Multidimensional data can also be transformed into unidimensional data using a space-filling curve [Jag90]; after transformation, a B+-tree can be used to index the resulting unidimensional data.

Extensible-key indices were introduced in POSTGRES [Sto86, Aok91], and are included in Illustra [Sto93], both of which have distinct extensible B+-tree and R-tree implementations. These extensible indices allow many types of data to be indexed, but only support a fixed set of query predicates. For example, POSTGRES B+-trees support the usual ordering predicates (<, <=, =, >=, >), while POSTGRES R-trees support only the predicates Left, OverLeft, Overlap, OverRight, Right, Contains, Contained and Equal [Gro94].

Extensible R-trees actually provide a sizable subset of the GiST's functionality. To our knowledge this paper represents the first demonstration that R-trees can index data that has not been mapped into a spatial domain. However, besides their limited extensibility, R-trees lack a number of other features supported by

Figure 1: Sketch of a database search tree. (Internal nodes, holding entries key1, key2, ..., form a directory; the leaf nodes are chained in a linked list.)

the GiST. R-trees provide only one sort of key predicate (Contains), they do not allow user specification of the PickSplit and Penalty algorithms described below, and they lack optimizations for data from linearly ordered domains. Despite these limitations, extensible R-trees are close enough to GiSTs to allow for the initial method implementations and performance experiments we describe in Section 5.

Classification trees such as ID3 and C4.5 [Qui93] are similar in spirit to the GiST, but have some major differences. Most significantly, classification trees are not intended as search structures, and are not suited for indexing large amounts of data. Additionally, ID3 and C4.5 are built over pre-defined, discrete classes, whereas the GiST is a dynamic structure that develops its own, unspecified data buckets (leaves), with potentially overlapping keys. Finally, classification trees are not generally extensible: they are defined over traditional tuple-like objects with attributes from numeric or finitely enumerated domains. However, extensions of the GiST may be able to leverage some of the lessons learned in the development of classification trees, particularly in implementing methods for picking a way to split nodes (the PickSplit method described below) and for choosing good keys (the Union and Compress methods described below.)

The Predicate Trees of Valduriez and Viemont [VV84] are similar to classification trees, in that they are not used to index data sets, and can be defined only on traditional alphanumeric predicates. Predicate Trees are used to generate hash keys. An even more limited notion is that of the Interval Hierarchy [WE77], which can be used to index a file of simple conjunctive ordering predicates.

Analyses of R-tree performance have appeared in [FK94] and [PSTW93]. This work is dependent on the spatial nature of typical R-tree data, and thus is not generally applicable to the GiST. However, similar ideas may prove relevant to our questions of when and how one can build efficient indices in arbitrary domains.

2 Getting The Gist of Database Search Trees

As an introduction to GiSTs, it is instructive to review search trees in a simplified manner. Most people with database experience have an intuitive notion of how search trees work, so our discussion here is purposely vague: the goal is simply to illustrate that this notion leaves many details unspecified. After highlighting the unspecified details, we can proceed to describe a structure that leaves the details open for user specification.

The canonical rough picture of a database search tree appears in Figure 1. It is a balanced tree, with high fanout. The internal nodes are used as a directory. The leaf nodes contain pointers to the actual data, and are stored as a linked list to allow for partial or complete scanning.

Within each internal node is a series of keys and pointers. To search for tuples which match a query predicate q, one starts at the root node. For each pointer on the node, if the associated key is consistent with q, i.e. the key does not rule out the possibility that data stored below the pointer may match q, then one traverses the subtree below the pointer, until all the matching data is found. As an illustration, we review the notion of consistency in some familiar tree structures. In B+-trees, queries are in the form of range predicates (e.g. "find all i such that c1 <= i <= c2"), and keys logically delineate a range in which the data below a pointer is contained. If the query range and a pointer's key range overlap, then the two are consistent and the pointer is traversed. In R-trees, queries are in the form of region predicates (e.g. "find all i such that (x1, y1, x2, y2) overlaps i"), and keys delineate the bounding box in which the data below a pointer is contained. If the query region and the pointer's key box overlap, the pointer is traversed.
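The two consistency checks just described can be sketched directly. The following is an illustrative sketch (not code from the paper): intervals are (low, high) pairs and bounding boxes are (x1, y1, x2, y2) tuples, and each check returns true exactly when the key cannot rule out a match below the pointer.

```python
def interval_consistent(key, query):
    """B+-tree style: key and query are closed (low, high) ranges;
    they are consistent iff the two ranges overlap."""
    k_lo, k_hi = key
    q_lo, q_hi = query
    return k_lo <= q_hi and q_lo <= k_hi

def bbox_consistent(key, query):
    """R-tree style: key and query are (x1, y1, x2, y2) bounding boxes;
    they are consistent iff the boxes overlap in both dimensions."""
    kx1, ky1, kx2, ky2 = key
    qx1, qy1, qx2, qy2 = query
    return kx1 <= qx2 and qx1 <= kx2 and ky1 <= qy2 and qy1 <= ky2
```

Both checks are conservative: they may admit a subtree that contains no matches, but they never reject one that does.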

Note that in the above description, the only restriction placed on a key is that it must logically match each datum stored below it, so that the consistency check does not miss any valid data. In B+-trees and R-trees, keys are essentially "containment" predicates: they describe a contiguous region in which all the data below a pointer are contained. Containment predicates are not the only possible key constructs, however. For example, the predicate "elected official(i) ∧ has criminal record(i)" is an acceptable key if every data item i stored below the associated pointer satisfies the predicate. As in R-trees, keys on a node may "overlap", i.e. two keys on the same node may hold simultaneously for some tuple.

This flexibility allows us to generalize the notion of a search key: a search key may be any arbitrary predicate that holds for each datum below the key. Given a data structure with such flexible search keys, a user is free to form a tree by organizing data into arbitrary nested sub-categories, labelling each with some characteristic predicate. This in turn lets us capture the essential nature of a database search tree: it is a hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy. Searches on arbitrary predicates may be conducted based on the categorization. In order to support searches on a predicate q, the user must provide a Boolean method to tell if q is consistent with a given search key. When this is so, the search proceeds by traversing the pointer associated with the search key. The grouping of data into categories may be controlled by a user-supplied node splitting algorithm, and the characterization of categories can be done with user-supplied search keys. Thus by exposing the key methods and the tree's split method to the user, arbitrary search trees may be constructed, supporting an extensible set of queries. These ideas form the basis of the GiST, which we proceed to describe in detail.

3 The Generalized Search Tree

In this section we present the abstract data type (or "object") Generalized Search Tree (GiST). We define its structure, its invariant properties, its extensible methods and its built-in algorithms. As a matter of convention, we refer to each indexed datum as a "tuple"; in an Object-Oriented or Object-Relational DBMS, each indexed datum could be an arbitrary data object.

3.1 Structure

A GiST is a balanced tree of variable fanout between kM and M, 2/M <= k <= 1/2, with the exception of the root node, which may have fanout between 2 and M. The constant k is termed the minimum fill factor of the tree. Leaf nodes contain (p, ptr) pairs, where p is a predicate that is used as a search key, and ptr is the identifier of some tuple in the database. Non-leaf nodes contain (p, ptr) pairs, where p is a predicate used as a search key and ptr is a pointer to another tree node. Predicates can contain any number of free variables, as long as any single tuple referenced by the leaves of the tree can instantiate all the variables. Note that by using "key compression", a given predicate p may take as little as zero bytes of storage. However, for purposes of exposition we will assume that entries in the tree are all of uniform size. Discussion of variable-sized entries is deferred to Section 6.3. We assume in an implementation that given an entry E = (p, ptr), one can access the node on which E currently resides. This can prove helpful in implementing the key methods described below.

3.2 Properties

The following properties are invariant in a GiST:

1. Every node contains between kM and M index entries unless it is the root.

2. For each index entry (p, ptr) in a leaf node, p is true when instantiated with the values from the indicated tuple (i.e. p holds for the tuple.)

3. For each index entry (p, ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr. Note that, unlike in R-trees, for some entry (p', ptr') reachable from ptr, we do not require that p' → p, merely that p and p' both hold for all tuples reachable from ptr'.

4. The root has at least two children unless it is a leaf.

5. All leaves appear on the same level.

Property 3 is of particular interest. An R-tree would require that p' → p, since bounding boxes of an R-tree are arranged in a containment hierarchy. The R-tree approach is unnecessarily restrictive, however: the predicates in keys above a node N must hold for data below N, and therefore one need not have keys on N restate those predicates in a more refined manner. One might choose, instead, to have the keys at N characterize the sets below based on some entirely orthogonal classification. This can be an advantage in both the information content and the size of keys.

3.3 Key Methods

In principle, the keys of a GiST may be arbitrary predicates. In practice, the keys come from a user-implemented object class, which provides a particular set of methods required by the GiST. Examples of key structures include ranges of integers for data from Z (as in B+-trees), bounding boxes for regions in R^n (as in R-trees), and bounding sets for set-valued data, e.g. data from P(Z) (as in RD-trees, described in Section 4.3.) The key class is open to redefinition by the user, with the following set of six methods required by the GiST:

• Consistent(E, q): given an entry E = (p, ptr), and a query predicate q, returns false if p ∧ q can be guaranteed unsatisfiable, and true otherwise. Note that an accurate test for satisfiability is not required here: Consistent may return true incorrectly without affecting the correctness of the tree algorithms. The penalty for such errors is in performance, since they may result in exploration of irrelevant subtrees during search.

• Union(P): given a set P of entries (p1, ptr1), ..., (pn, ptrn), returns some predicate r that holds for all tuples stored below ptr1 through ptrn. This can be done by finding a predicate r such that (p1 ∨ ... ∨ pn) → r.

• Compress(E): given an entry E = (p, ptr), returns an entry (π, ptr) where π is a compressed representation of p.

• Decompress(E): given a compressed representation E = (π, ptr), where π = Compress(p), returns an entry (r, ptr) such that p → r. Note that this is a potentially "lossy" compression, since we do not require that p ↔ r.

• Penalty(E1, E2): given two entries E1 = (p1, ptr1), E2 = (p2, ptr2), returns a domain-specific penalty for inserting E2 into the subtree rooted at E1. This is used to aid the Split and Insert algorithms (described below.) Typically the penalty metric is some representation of the increase of size from E1.p1 to Union(E1, E2). For example, Penalty for keys from R^2 can be defined as area(Union(E1, E2)) - area(E1.p1) [Gut84].

• PickSplit(P): given a set P of M+1 entries (p, ptr), splits P into two sets of entries P1, P2, each of size at least kM. The choice of the minimum fill factor for a tree is controlled here. Typically, it is desirable to split in such a way as to minimize some badness metric akin to a multi-way Penalty, but this is left open for the user.

The above are the only methods a GiST user needs to supply. Note that Consistent, Compress, Union and Penalty have to be able to handle any predicate in their input. In full generality this could become very difficult, especially for Consistent. But typically a limited set of predicates is used in any one tree, and this set can be constrained in the method implementation.
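The six-method interface can be sketched as an abstract class, together with one illustrative instance for half-open integer intervals [lo, hi). The class and method names below are our own, not from the paper's C implementation; the interval instance anticipates the B+-tree key class of Section 4.1.

```python
from abc import ABC, abstractmethod

class GiSTKeyClass(ABC):
    """Sketch of the six key methods a GiST requires of its key class."""
    @abstractmethod
    def consistent(self, p, q): ...   # False only if p AND q is unsatisfiable
    @abstractmethod
    def union(self, ps): ...          # predicate holding for all data below ps
    def compress(self, p):            # identity by default (lossless)
        return p
    def decompress(self, p):
        return p
    @abstractmethod
    def penalty(self, p1, p2): ...    # cost of inserting p2 under p1
    @abstractmethod
    def pick_split(self, ps): ...     # divide M+1 entries into two groups

class IntRangeKey(GiSTKeyClass):
    """Keys are half-open integer intervals [lo, hi); illustrative only."""
    def consistent(self, p, q):
        return p[0] < q[1] and q[0] < p[1]       # interval overlap
    def union(self, ps):
        return (min(p[0] for p in ps), max(p[1] for p in ps))
    def penalty(self, p1, p2):
        u = self.union([p1, p2])
        return (u[1] - u[0]) - (p1[1] - p1[0])   # interval enlargement
    def pick_split(self, ps):
        ps = sorted(ps)                          # split at the median
        return ps[:len(ps) // 2], ps[len(ps) // 2:]
```

A tree configured with IntRangeKey would behave much like a B+-tree over integers, with Penalty measuring how much an interval must grow to absorb a new key.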

There are a number of options for key compression. A simple implementation can let both Compress and Decompress be the identity function. A more complex implementation can have Compress((p, ptr)) generate a valid but more compact predicate r, p → r, and let Decompress be the identity function. This is the technique used in SHORE's R-trees, for example, which upon insertion take a polygon and compress it to its bounding box, which is itself a valid polygon. It is also used in prefix B+-trees [Com79], which truncate split keys to an initial substring. More involved implementations might use complex methods for both Compress and Decompress.
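An illustrative lossy Compress in the spirit of the SHORE R-tree example above: a polygon key is replaced by its bounding box, a weaker but still valid predicate, and Decompress is the identity. (This is a sketch of the idea, not SHORE's actual code.)

```python
def compress_polygon(vertices):
    """Replace a polygon, given as (x, y) vertices, by its bounding box
    (x1, y1, x2, y2). Every point in the polygon lies in the box, so the
    box is a valid, if weaker, key predicate."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))

def decompress(box):
    return box   # identity: the bounding box is itself a usable predicate
```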

3.4 Tree Methods

The key methods in the previous section must be provided by the designer of the key class. The tree methods in this section are provided by the GiST, and may invoke the required key methods. Note that keys are Compressed when placed on a node, and Decompressed when read from a node. We consider this implicit, and will not mention it further in describing the methods.

3.4.1 Search

Search comes in two flavors. The first method, presented in this section, can be used to search any dataset with any query predicate, by traversing as much of the tree as necessary to satisfy the query. It is the most general search technique, analogous to that of R-trees. A more efficient technique for queries over linear orders is described in the next section.

Algorithm Search(R, q)

Input: GiST rooted at R, predicate q

Output: all tuples that satisfy q

Sketch: Recursively descend all paths in tree whose keys are consistent with q.

S1: [Search subtrees] If R is not a leaf, check each entry E on R to determine whether Consistent(E, q). For all entries that are Consistent, invoke Search on the subtree whose root node is referenced by E.ptr.

S2: [Search leaf node] If R is a leaf, check each entry E on R to determine whether Consistent(E, q). If E is Consistent, it is a qualifying entry. At this point E.ptr could be fetched to check q accurately, or this check could be left to the calling process.

Note that the query predicate q can be either an exact match (equality) predicate, or a predicate satisfiable by many values. The latter category includes "range" or "window" predicates, as in B+ or R-trees, and also more general predicates that are not based on contiguous areas (e.g. set-containment predicates like "all supersets of {6, 7, 68}".)
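Steps S1 and S2 can be sketched as a short recursive routine. The Node class and entry layout below are our own assumptions, not the paper's: each entry is a (key, ptr) pair, where ptr is a child Node on internal nodes and a tuple identifier on leaves, and consistent() is the user-supplied key method.

```python
class Node:
    def __init__(self, entries, is_leaf=False):
        self.entries = entries        # list of (key, ptr) pairs
        self.is_leaf = is_leaf

def search(node, q, consistent):
    """Return identifiers of all leaf entries whose keys are consistent
    with q, descending every consistent subtree (steps S1/S2)."""
    results = []
    for key, ptr in node.entries:
        if not consistent(key, q):
            continue                  # key rules out everything below ptr
        if node.is_leaf:
            results.append(ptr)       # S2: qualifying entry
        else:
            results.extend(search(ptr, q, consistent))  # S1: descend
    return results
```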

3.4.2 Search In Linearly Ordered Domains

If the domain to be indexed has a linear ordering, and queries are typically equality or range-containment predicates, then a more efficient search method is possible using the FindMin and Next methods defined in this section. To make this option available, the user must take some extra steps when creating the tree:

1. The flag IsOrdered must be set to true. IsOrdered is a static property of the tree that is set at creation. It defaults to false.

2. An additional method Compare(E1, E2) must be registered. Given two entries E1 = (p1, ptr1) and E2 = (p2, ptr2), Compare returns a negative number if p1 precedes p2, returns a positive number if p1 follows p2, and returns 0 otherwise. Compare is used to insert entries in order on each node.

3. The PickSplit method must ensure that for any entries E1 on P1 and E2 on P2, Compare(E1, E2) < 0.

4. The methods must assure that no two keys on a node overlap, i.e. for any pair of entries E1, E2 on a node, Consistent(E1, E2.p) = false.

If these four steps are carried out, then equality and range-containment queries may be evaluated by calling FindMin and repeatedly calling Next, while other query predicates may still be evaluated with the general Search method. FindMin/Next is more efficient than traversing the tree using Search, since FindMin and Next only visit the non-leaf nodes along one root-to-leaf path. This technique is based on the typical range-lookup in B+-trees.
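An illustrative Compare method (step 2 above) for the integer-interval keys of Section 4.1: entries carry a (lo, hi) predicate, and predicates are ordered lexicographically. The function name and entry layout are assumptions for this sketch.

```python
def compare(e1, e2):
    """Return negative, positive, or 0 as e1's predicate precedes,
    follows, or ties e2's. Entries are (predicate, pointer) pairs;
    predicates are (lo, hi) tuples compared lexicographically."""
    p1, p2 = e1[0], e2[0]
    if p1 < p2:
        return -1
    if p1 > p2:
        return 1
    return 0
```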

Algorithm FindMin(R, q)

Input: GiST rooted at R, predicate q

Output: minimum tuple in linear order that satisfies q

Sketch: descend leftmost branch of tree whose keys are Consistent with q. When a leaf node is reached, return the first key that is Consistent with q.

FM1: [Search subtrees] If R is not a leaf, find the first entry E in order such that Consistent(E, q).^1 If such an E can be found, invoke FindMin on the subtree whose root node is referenced by E.ptr. If no such entry is found, return NULL.

FM2: [Search leaf node] If R is a leaf, find the first entry E on R such that Consistent(E, q), and return E. If no such entry exists, return NULL.

Given one element E that satisfies a predicate q, the Next method returns the next existing element that satisfies q, or NULL if there is none. Next is made sufficiently general to find the next entry on non-leaf levels of the tree, which will prove useful in Section 4. For search purposes, however, Next will only be invoked on leaf entries.

^1 The appropriate entry may be found by doing a binary search of the entries on the node. Further discussion of intra-node search optimizations appears in Section 6.1.

Algorithm Next(R, q, E)

Input: GiST rooted at R, predicate q, current entry E

Output: next entry in linear order that satisfies q

Sketch: return next entry on the same level of the tree if it satisfies q. Else return NULL.

N1: [next on node] If E is not the rightmost entry on its node, and N is the next entry to the right of E in order, and Consistent(N, q), then return N. If ¬Consistent(N, q), return NULL.

N2: [next on neighboring node] If E is the rightmost entry on its node, let P be the next node to the right of R on the same level of the tree (this can be found via tree traversal, or via sideways pointers in the tree, when available [LY81].) If P is non-existent, return NULL. Otherwise, let N be the leftmost entry on P. If Consistent(N, q), then return N, else return NULL.
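FindMin can be sketched as follows, under the same assumed Node layout used earlier (not the paper's code): entries on each node are kept in Compare order, and the non-overlap requirement above guarantees that descending the first consistent branch finds the minimum match.

```python
class Node:
    def __init__(self, entries, is_leaf=False):
        self.entries = entries        # (key, ptr) pairs, in Compare order
        self.is_leaf = is_leaf

def find_min(node, q, consistent):
    """Descend the leftmost consistent branch (FM1); on a leaf, return
    the first consistent entry (FM2), or None for NULL."""
    for key, ptr in node.entries:
        if consistent(key, q):
            if node.is_leaf:
                return (key, ptr)                     # FM2
            return find_min(ptr, q, consistent)       # FM1
    return None                                       # no consistent entry
```

A range scan would then call find_min once and follow Next along the leaf level.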

3.4.3 Insert

The insertion routines guarantee that the GiST remains balanced. They are very similar to the insertion routines of R-trees, which generalize the simpler insertion routines for B+-trees. Insertion allows specification of the level at which to insert. This allows subsequent methods to use Insert for reinserting entries from internal nodes of the tree. We will assume that level numbers increase as one ascends the tree, with leaf nodes being at level 0. Thus new entries to the tree are inserted at level l = 0.

Algorithm Insert(R, E, l)

Input: GiST rooted at R, entry E = (p, ptr), and level l, where p is a predicate such that p holds for all tuples reachable from ptr.

Output: new GiST resulting from insert of E at level l.

Sketch: find where E should go, and add it there, splitting if necessary to make room.

I1. [invoke ChooseSubtree to find where E should go] Let L = ChooseSubtree(R, E, l).

I2. If there is room for E on L, install E on L (in order according to Compare, if IsOrdered.) Otherwise invoke Split(R, L, E).

I3. [propagate changes upward] AdjustKeys(R, L).

ChooseSubtree can be used to find the best node for insertion at any level of the tree. When the IsOrdered property holds, the Penalty method must be carefully written to assure that ChooseSubtree arrives at the correct leaf node in order. An example of how this can be done is given in Section 4.1.

Algorithm ChooseSubtree(R, E, l)

Input: subtree rooted at R, entry E = (p, ptr), level l

Output: node at level l best suited to hold entry with characteristic predicate E.p

Sketch: Recursively descend tree minimizing Penalty.

CS1. If R is at level l, return R;

CS2. Else among all entries F = (q, ptr') on R find the one such that Penalty(F, E) is minimal. Return ChooseSubtree(F.ptr', E, l).
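CS1 and CS2 can be sketched in a few lines, assuming (our assumption, not the paper's representation) that each node records its level, with leaves at level 0, and that penalty(f_key, e_key) is the user's Penalty method over key predicates.

```python
class Node:
    def __init__(self, entries, level=0):
        self.entries = entries        # (key, child_or_tuple_ptr) pairs
        self.level = level            # leaves are at level 0

def choose_subtree(node, key, level, penalty):
    """Descend toward level `level`, at each step following the entry
    whose Penalty for the new key is minimal (CS1/CS2)."""
    if node.level == level:
        return node                   # CS1: arrived at the target level
    # CS2: pick the entry with minimal Penalty and recurse into its child
    _, child = min(node.entries, key=lambda e: penalty(e[0], key))
    return choose_subtree(child, key, level, penalty)
```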

The Split algorithm makes use of the user-defined PickSplit method to choose how to split up the elements of a node, including the new tuple to be inserted into the tree. Once the elements are split up into two groups, Split generates a new node for one of the groups, inserts it into the tree, and updates keys above the new node.

Algorithm Split(R, N, E)

Input: GiST R with node N, and a new entry E = (p, ptr).

Output: the GiST with N split in two and E inserted.

Sketch: split keys of N along with E into two groups according to PickSplit. Put one group onto a new node, and Insert the new node into the parent of N.

SP1: Invoke PickSplit on the union of the elements of N and {E}, put one of the two partitions on node N, and put the remaining partition on a new node N'.

SP2: [Insert entry for N' in parent] Let E_N' = (q, ptr'), where q is the Union of all entries on N', and ptr' is a pointer to N'. If there is room for E_N' on Parent(N), install E_N' on Parent(N) (in order if IsOrdered.) Otherwise invoke Split(R, Parent(N), E_N').^2

SP3: Modify the entry F which points to N, so that F.p is the Union of all entries on N.

Step SP3 of Split modifies the parent node to reflect the changes in N. These changes are propagated upwards through the rest of the tree by step I3 of the Insert algorithm, which also propagates the changes due to the insertion of N'.

The AdjustKeys algorithm ensures that keys above a set of predicates hold for the tuples below, and are appropriately specific.

^2 We intentionally do not specify what technique is used to find the Parent of a node, since this implementation interacts with issues related to concurrency control, which are discussed in Section 6. Depending on techniques used, the Parent may be found via a pointer, a stack, or via re-traversal of the tree.

Algorithm AdjustKeys(R, N)

Input: GiST rooted at R, tree node N

Output: the GiST with ancestors of N containing correct and specific keys

Sketch: ascend parents from N in the tree, making the predicates be accurate characterizations of the subtrees. Stop after root, or when a predicate is found that is already accurate.

PR1: If N is the root, or the entry which points to N has an already-accurate representation of the Union of the entries on N, then return.

PR2: Otherwise, modify the entry E which points to N so that E.p is the Union of all entries on N. Then AdjustKeys(R, Parent(N)).

Note that AdjustKeys typically performs no work when IsOrdered = true, since for such domains predicates on each node typically partition the entire domain into ranges, and thus need no modification on simple insertion or deletion. The AdjustKeys routine detects this in step PR1, which avoids calling AdjustKeys on higher nodes of the tree. For such domains, AdjustKeys may be circumvented entirely if desired.
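The PR1/PR2 ascent can be sketched as a loop, assuming (our representation, not the paper's) that each non-root node records which parent entry points at it, and that union() is the user's Union method applied to a node's keys.

```python
class Node:
    def __init__(self, entries, parent=None):
        self.entries = entries        # (key, ptr) pairs
        self.parent = parent          # (parent_node, entry_index) or None

def adjust_keys(node, union):
    """Ascend from node toward the root, tightening each parent key to
    the Union of its child's keys; stop at the root or as soon as a
    parent key is found to be accurate already (PR1/PR2)."""
    while node.parent is not None:
        parent, i = node.parent
        old_key, ptr = parent.entries[i]
        new_key = union([k for k, _ in node.entries])
        if new_key == old_key:
            return                            # PR1: already accurate
        parent.entries[i] = (new_key, ptr)    # PR2: tighten and ascend
        node = parent
```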

3.4.4 Delete

The deletion algorithms must maintain the balance of the tree, and attempt to keep keys as specific as possible. The CondenseTree algorithm uses the B+-tree "borrow or coalesce" technique for underflow if there is a linear order. Otherwise it uses the R-tree reinsertion technique.

Algorithm Delete(R, E)

Input: GiST R, leaf entry E = (p, ptr)

Output: balanced GiST with E removed

Sketch: Remove E from its leaf node. If this causes underflow, adjust tree accordingly. Update predicates in ancestors to keep them as specific as possible.

D1: [Find node containing entry] Invoke Search(R, E.p) and find leaf node L containing E. Stop if E not found.

D2: [Delete entry.] Remove E from L.

D3: [Propagate changes.] Invoke CondenseTree(R, L).

D4: [Shorten tree.] If the root node has only one child after the tree has been adjusted, make the child the new root.

Algorithm CondenseTree(R, L)

Input: GiST R containing leaf node L

Output: GiST with invariant properties maintained

Sketch: If L has fewer than kM elements, either eliminate L and relocate its entries, or borrow entries from elsewhere to put on L. Propagate node elimination upwards as necessary. Adjust all predicates on the path to the root, making them more specific as appropriate.

CT1: [Initialize.] Set N = L. Set Q, the set of eliminated nodes, to be empty.

CT2: If N is the root, go to CT6. Otherwise let P = Parent(N), and let E_N be the entry on P that points to N.

CT3: [Handle under-full node] If N has fewer than kM entries:

CT3.1: [If not IsOrdered, delete N] If not IsOrdered, add the elements of N to set Q, delete E_N from P, and invoke AdjustKeys(R, P).

CT3.2: [If IsOrdered, try to borrow entries, else coalesce with a neighbor] Otherwise let N' be the neighboring node in order. If the number of keys on N and N' combined is 2kM or greater, then evenly split the entries on N and N' between the two nodes. Otherwise, place the entries from N onto N', and delete E_N from P. Invoke AdjustKeys(R, N'), and AdjustKeys(R, P).

CT4: [Adjust covering predicate] If E_N was not deleted from P, then AdjustKeys(R, N).

CT5: [Propagate deletes] If E_N was deleted from P, let N = P, and go to CT2.

CT6: [Re-insert orphaned entries] If Q is not empty, invoke Insert(R, E, level(E)) for all elements E of Q.

In some implementations, it is considered preferable to leave a node under-full after a delete, in the expectation that it will fill up soon thereafter [JS93]. To support such behavior, step D3 of algorithm Delete could invoke AdjustKeys(R, L), and avoid CondenseTree.
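The "borrow or coalesce" decision in step CT3.2 can be sketched in a few lines. This is a minimal illustration, not code from the paper; the function name and the representation of nodes as bare entry counts are assumptions for the example.

```cpp
#include <cstddef>

// Hypothetical sketch of the B+-tree-style "borrow or coalesce" decision
// used by CondenseTree step CT3.2 for ordered domains: if the under-full
// node N and its in-order neighbor N' together hold at least 2kM entries,
// redistribute them evenly; otherwise merge N's entries onto N' and
// delete N (and its entry E_N on the parent).
enum class UnderflowAction { Borrow, Coalesce };

UnderflowAction underflow_action(std::size_t n_entries,
                                 std::size_t neighbor_entries,
                                 std::size_t kM /* minimum fill k*M */) {
    if (n_entries + neighbor_entries >= 2 * kM)
        return UnderflowAction::Borrow;   // split entries evenly across N, N'
    return UnderflowAction::Coalesce;     // place N's entries on N', drop N
}
```

Either branch leaves both nodes at or above the minimum fill kM, which is why the threshold is exactly 2kM.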

4 The GiST for Three Applications

In this section we briefly describe implementations of key classes used to make the GiST behave like a B+-tree, an R-tree, and an RD-tree, a new R-tree-like index over set-valued data.

4.1 GiSTs Over Z (B+-trees)

In this example we index integer data. Before compression, each key in this tree is a pair of integers, representing the interval contained below the key. In particular, a key represents the predicate Contains([a, b), v) with variable v. The query predicates we support in this key class are Contains(interval, v) and Equal(number, v). The interval in the Contains query may be closed or open at either end. The boundary of any interval of integers can be trivially converted to be closed or open, so without loss of generality we assume below that all intervals are closed on the left and open on the right.

The implementations of the Contains and Equal query predicates are as follows:

• Contains([x, y), v) If x ≤ v < y return true. Otherwise return false.

• Equal(x, v) If x = v return true. Otherwise return false.

Now, the implementations of the GiST methods:

• Consistent(E, q) Given entry E = (p, ptr) and query predicate q, we know that p = Contains([x_p, y_p), v), and either q = Contains([x_q, y_q), v) or q = Equal(x_q, v). In the first case, return true if (x_p < y_q) ∧ (y_p > x_q), and false otherwise. In the second case, return true if x_p ≤ x_q < y_p, and false otherwise.

• Union(E_1 = ([x_1, y_1), ptr_1), ..., E_n = ([x_n, y_n), ptr_n)) Return [min(x_1, ..., x_n), max(y_1, ..., y_n)).

• Compress(E = ([x, y), ptr)) If E is the leftmost key on a non-leaf node, return a 0-byte object. Otherwise return x.

• Decompress(E = (π, ptr)) We must construct an interval [x, y). If E is the leftmost key on a non-leaf node, let x = −∞. Otherwise let x = π. If E is the rightmost key on a non-leaf node, let y = ∞. If E is any other key on a non-leaf node, let y be the value stored in the next key (as found by the Next method). If E is on a leaf node, let y = x + 1. Return ([x, y), ptr).

• Penalty(E = ([x_1, y_1), ptr_1), F = ([x_2, y_2), ptr_2)) If E is the leftmost pointer on its node, return max(y_2 − y_1, 0). If E is the rightmost pointer on its node, return max(x_1 − x_2, 0). Otherwise return max(y_2 − y_1, 0) + max(x_1 − x_2, 0).

• PickSplit(P) Let the first ⌊|P|/2⌋ entries in order go in the left group, and the last ⌈|P|/2⌉ entries go in the right. Note that this guarantees a minimum fill factor of M/2.

Finally, the additions for ordered keys:

• IsOrdered = true

• Compare(E_1 = (p_1, ptr_1), E_2 = (p_2, ptr_2)) Given p_1 = [x_1, y_1) and p_2 = [x_2, y_2), return x_1 − x_2.

There are a number of interesting features to note in this set of methods. First, the Compress and Decompress methods produce the typical "split keys" found in B+-trees, i.e. n − 1 stored keys for n pointers, with the leftmost and rightmost boundaries on a node left unspecified (i.e. −∞ and ∞). Even though GiSTs use key/pointer pairs rather than split keys, this GiST uses no more space for keys than a traditional B+-tree, since it compresses the first pointer on each node to zero bytes. Second, the Penalty method allows the GiST to choose the correct insertion point. Inserting (i.e. Unioning) a new key value k into an interval [x, y) will cause the Penalty to be positive only if k is not already contained in the interval. Thus in step CS2, the ChooseSubtree method will place new data in the appropriate spot: any set of keys on a node partitions the entire domain, so in order to minimize the Penalty, ChooseSubtree will choose the one partition in which k is already contained. Finally, observe that one could fairly easily support more complex predicates, including disjunctions of intervals in query predicates, or ranked intervals in key predicates for supporting efficient sampling [WE80].
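The interval logic above can be sketched concretely. This is a minimal illustration under stated assumptions: keys are half-open intervals over long integers, the leftmost/rightmost special cases of Penalty are folded into one general formula, and all function and struct names are invented for the example.

```cpp
#include <algorithm>

// Sketch of the B+-tree-like key class: a key [x, y) represents the
// predicate Contains([x, y), v).  Names are illustrative, not a real API.
struct Interval { long x, y; };

// Consistent with a Contains([x_q, y_q), v) query: the intervals intersect.
bool consistent_contains(const Interval& p, const Interval& q) {
    return p.x < q.y && p.y > q.x;
}

// Consistent with an Equal(x_q, v) query: x_q falls inside [x_p, y_p).
bool consistent_equal(const Interval& p, long xq) {
    return p.x <= xq && xq < p.y;
}

// Union: the smallest interval covering both keys.
Interval interval_union(const Interval& a, const Interval& b) {
    return { std::min(a.x, b.x), std::max(a.y, b.y) };
}

// Penalty (general case): how far the key must grow on each side to
// admit the new key F.  Zero exactly when F is already contained.
long penalty(const Interval& e, const Interval& f) {
    return std::max(f.y - e.y, 0L) + std::max(e.x - f.x, 0L);
}
```

Because keys on a node partition the domain, exactly one entry yields Penalty zero for a new point, which is how ChooseSubtree finds the right subtree in step CS2.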

4.2 GiSTs Over Polygons in R^2 (R-trees)

In this example, our data are 2-dimensional polygons on the Cartesian plane. Before compression, the keys in this tree are 4-tuples of reals, representing the upper-left and lower-right corners of rectilinear bounding rectangles for 2-d polygons. A key (x_ul, y_ul, x_lr, y_lr) represents the predicate Contains((x_ul, y_ul, x_lr, y_lr), v), where (x_ul, y_ul) is the upper left corner of the bounding box, (x_lr, y_lr) is the lower right corner, and v is the free variable. The query predicates we support in this key class are Contains(box, v), Overlap(box, v), and Equal(box, v), where box is a 4-tuple as above.

The implementations of the query predicates are as follows:

• Contains((x^1_ul, y^1_ul, x^1_lr, y^1_lr), (x^2_ul, y^2_ul, x^2_lr, y^2_lr)) Return true if

(x^1_lr ≥ x^2_lr) ∧ (x^1_ul ≤ x^2_ul) ∧ (y^1_lr ≤ y^2_lr) ∧ (y^1_ul ≥ y^2_ul).

Otherwise return false.

• Overlap((x^1_ul, y^1_ul, x^1_lr, y^1_lr), (x^2_ul, y^2_ul, x^2_lr, y^2_lr)) Return true if

(x^1_ul ≤ x^2_lr) ∧ (x^2_ul ≤ x^1_lr) ∧ (y^1_lr ≤ y^2_ul) ∧ (y^2_lr ≤ y^1_ul).

Otherwise return false.

• Equal((x^1_ul, y^1_ul, x^1_lr, y^1_lr), (x^2_ul, y^2_ul, x^2_lr, y^2_lr)) Return true if

(x^1_ul = x^2_ul) ∧ (y^1_ul = y^2_ul) ∧ (x^1_lr = x^2_lr) ∧ (y^1_lr = y^2_lr).

Otherwise return false.

Now, the GiST method implementations:

• Consistent(E, q) Given entry E = (p, ptr), we know that p = Contains((x^1_ul, y^1_ul, x^1_lr, y^1_lr), v), and q is either Contains, Overlap or Equal on the argument (x^2_ul, y^2_ul, x^2_lr, y^2_lr). For any of these queries, return true if Overlap((x^1_ul, y^1_ul, x^1_lr, y^1_lr), (x^2_ul, y^2_ul, x^2_lr, y^2_lr)), and return false otherwise.

• Union(E_1 = ((x^1_ul, y^1_ul, x^1_lr, y^1_lr), ptr_1), ..., E_n = ((x^n_ul, y^n_ul, x^n_lr, y^n_lr), ptr_n)) Return (min(x^1_ul, ..., x^n_ul), max(y^1_ul, ..., y^n_ul), max(x^1_lr, ..., x^n_lr), min(y^1_lr, ..., y^n_lr)).

• Compress(E = (p, ptr)) Form the bounding box of polygon p, i.e., given a polygon stored as a set of line segments l_i = (x^i_1, y^i_1, x^i_2, y^i_2), form π = (∀i min(x^i_1), ∀i max(y^i_1), ∀i max(x^i_2), ∀i min(y^i_2)), the corners (x_ul, y_ul, x_lr, y_lr). Return (π, ptr).

• Decompress(E = ((x_ul, y_ul, x_lr, y_lr), ptr)) The identity function, i.e., return E.

• Penalty(E_1, E_2) Given E_1 = (p_1, ptr_1) and E_2 = (p_2, ptr_2), compute q = Union(E_1, E_2), and return area(q) − area(E_1.p). This metric of "change in area" is the one proposed by Guttman [Gut84].

• PickSplit(P) A variety of algorithms have been proposed for R-tree splitting. We thus omit this method implementation from our discussion here, and refer the interested reader to [Gut84] and [BKSS90].
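The bounding-box predicates and methods above can be sketched as follows. This is an illustrative fragment under the paper's corner convention (upper-left/lower-right, so y_ul ≥ y_lr); the struct and function names are assumptions for the example.

```cpp
#include <algorithm>

// Sketch of the R-tree-like key class: a key is a bounding box given by
// its upper-left and lower-right corners.  Names are illustrative.
struct Box { double xul, yul, xlr, ylr; };  // yul >= ylr, xul <= xlr

bool box_contains(const Box& a, const Box& b) {  // does a contain b?
    return a.xlr >= b.xlr && a.xul <= b.xul && a.ylr <= b.ylr && a.yul >= b.yul;
}

bool box_overlap(const Box& a, const Box& b) {   // do a and b intersect?
    return a.xul <= b.xlr && b.xul <= a.xlr && a.ylr <= b.yul && b.ylr <= a.yul;
}

Box box_union(const Box& a, const Box& b) {      // smallest covering box
    return { std::min(a.xul, b.xul), std::max(a.yul, b.yul),
             std::max(a.xlr, b.xlr), std::min(a.ylr, b.ylr) };
}

double area(const Box& b) { return (b.xlr - b.xul) * (b.yul - b.ylr); }

// Guttman's "change in area" Penalty: growth of the key needed to admit
// the new entry's box.
double box_penalty(const Box& key, const Box& newbox) {
    return area(box_union(key, newbox)) - area(key);
}
```

Consistent for all three query predicates reduces to box_overlap on the key's box and the query's box, exactly as in the method description above.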

The above implementations, along with the GiST algorithms described in the previous chapters, give behavior identical to that of Guttman's R-tree. A series of variations on R-trees have been proposed, notably the R*-tree [BKSS90] and the R+-tree [SRF87]. The R*-tree differs from the basic R-tree in three ways: in its PickSplit algorithm, which has a variety of small changes, in its ChooseSubtree algorithm, which varies only slightly, and in its policy of reinserting a number of keys during node split. It would not be difficult to implement the R*-tree in the GiST: the R*-tree PickSplit algorithm can be implemented as the PickSplit method of the GiST, the modifications to ChooseSubtree could be introduced with a careful implementation of the Penalty method, and the reinsertion policy of the R*-tree could easily be added into the built-in GiST tree methods (see Section 7). R+-trees, on the other hand, cannot be mimicked by the GiST. This is because the R+-tree places duplicate copies of data entries in multiple leaf nodes, thus violating the GiST principle of a search tree being a hierarchy of partitions of the data.

Again, observe that one could fairly easily support more complex predicates, including n-dimensional analogs of the disjunctive queries and ranked keys mentioned for B+-trees. Other examples include arbitrary variations of the usual overlap or ordering queries, e.g. "find all polygons that overlap more than 30% of this box", or "find all polygons that overlap 12 to 1 o'clock", which for a given point p returns all polygons that are in the region bounded by two rays that exit p at angles 90° and 60° in polar coordinates. Note that this infinite region cannot be defined as a polygon made up of line segments, and hence this query cannot be expressed using typical R-tree predicates.

4.3 GiSTs Over P(Z) (RD-trees)

In the previous two sections we demonstrated that the GiST can provide the functionality of two known data structures: B+-trees and R-trees. In this section, we demonstrate that the GiST can provide support for a new search tree that indexes set-valued data.

The problem of handling set-valued data is attracting increasing attention in the Object-Oriented database community [KG94], and is fairly natural even for traditional relational database applications. For example, one might have a university database with a table of students, and for each student an attribute courses_passed of type setof(integer). One would like to efficiently support containment queries such as "find all students who have passed all the courses in the prerequisite set {101, 121, 150}."

We handle this in the GiST by using sets as containment keys, much as an R-tree uses bounding boxes as containment keys. We call the resulting structure an RD-tree (or "Russian Doll" tree). The keys in an RD-tree are sets of integers, and the RD-tree derives its name from the fact that as one traverses a branch of the tree, each key contains the key below it in the branch. We proceed to give GiST method implementations for RD-trees.

Before compression, the keys in our RD-trees are sets of integers. A key S represents the predicate Contains(S, v) for set-valued variable v. The query predicates allowed on the RD-tree are Contains(set, v), Overlap(set, v), and Equal(set, v).

The implementation of the query predicates is straightforward:

• Contains(S, T) Return true if S ⊇ T, and false otherwise.

• Overlap(S, T) Return true if S ∩ T ≠ ∅, false otherwise.

• Equal(S, T) Return true if S = T, false otherwise.
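The three set predicates above are direct to code. A minimal sketch over std::set<int>, with illustrative names (not code from the paper):

```cpp
#include <algorithm>
#include <set>

// Sketch of the RD-tree query predicates over integer sets.
using IntSet = std::set<int>;

// Contains(S, T): S is a superset of T.  std::includes works because
// std::set iterates in sorted order.
bool set_contains(const IntSet& S, const IntSet& T) {
    return std::includes(S.begin(), S.end(), T.begin(), T.end());
}

// Overlap(S, T): the intersection is non-empty.
bool set_overlap(const IntSet& S, const IntSet& T) {
    for (int x : T)
        if (S.count(x)) return true;
    return false;
}

// Equal(S, T).
bool set_equal(const IntSet& S, const IntSet& T) { return S == T; }
```

With these, the courses_passed query from the example is just set_contains(student_courses, {101, 121, 150}).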

Now, the GiST method implementations:

• Consistent(E = (p, ptr), q) Given our keys and predicates, we know that p = Contains(S, v), and either q = Contains(T, v), q = Overlap(T, v) or q = Equal(T, v). For all of these, return true if Overlap(S, T), and false otherwise.

• Union(E_1 = (S_1, ptr_1), ..., E_n = (S_n, ptr_n)) Return S_1 ∪ ... ∪ S_n.

• Compress(E = (S, ptr)) A variety of compression techniques for sets are given in [HP94]. We briefly describe one of them here. The elements of S are sorted, and then converted into a set of n disjoint ranges {[l_1, h_1], [l_2, h_2], ..., [l_n, h_n]} where l_i ≤ h_i, and h_i < l_{i+1}. The conversion is done using the following algorithm:

    Initialize: consider each element a_m ∈ S to be a range [a_m, a_m].
    while (more than n ranges remain) {
        find the pair of adjacent ranges with the least interval between them;
        form a single range of the pair;
    }

The resulting structure is called a rangeset. It can be shown that this algorithm produces a rangeset of n items with minimal addition of elements not in S [HP94].

• Decompress(E = (rangeset, ptr)) Rangesets are easily converted back to sets by enumerating the elements in the ranges.

• Penalty(E_1 = (S_1, ptr_1), E_2 = (S_2, ptr_2)) Return |E_1.S_1 ∪ E_2.S_2| − |E_1.S_1|. Alternatively, return the change in a weighted cardinality, where each element of Z has a weight, and |S| is the sum of the weights of the elements in S.

[Figure 2: Space of Factors Affecting GiST Performance. The x axis is Compression Loss (0 to 1), the y axis is Data Overlap (0 to 1); B+-trees sit at the origin (0, 0).]

• PickSplit(P) Guttman's quadratic algorithm for R-tree split works naturally here. The reader is referred to [Gut84] for details.
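The rangeset construction used by the Compress method above can be made runnable directly from its pseudocode. This sketch assumes n ≥ 1; the function and type names are invented for the example, and it is the greedy algorithm as described, not an optimized implementation.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Runnable sketch of the rangeset construction from [HP94]: greedily merge
// the pair of adjacent ranges separated by the smallest gap until at most
// n ranges remain.  Assumes n >= 1.
using Range = std::pair<int, int>;  // closed range [l, h]

std::vector<Range> to_rangeset(const std::set<int>& S, std::size_t n) {
    std::vector<Range> r;
    for (int a : S) r.push_back({a, a});  // each element starts as [a, a]
    while (r.size() > n) {
        // find the pair of adjacent ranges with the least interval between
        std::size_t best = 0;
        for (std::size_t i = 1; i + 1 < r.size(); ++i)
            if (r[i + 1].first - r[i].second <
                r[best + 1].first - r[best].second)
                best = i;
        r[best].second = r[best + 1].second;  // form a single range of the pair
        r.erase(r.begin() + best + 1);
    }
    return r;
}
```

For example, {1, 2, 3, 10, 11, 20} compressed to two ranges yields {[1, 11], [20, 20]}: the small gaps are merged first, so the fewest extra elements (4..9) are added.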

This GiST supports the usual R-tree query predicates, has containment keys, and uses a traditional R-tree algorithm for PickSplit. As a result, we were able to implement these methods in Illustra's extensible R-trees, and get behavior identical to what the GiST behavior would be. This exercise gave us a sense of the complexity of a GiST class implementation (c. 500 lines of C code), and allowed us to do the performance studies described in the next section. Using R-trees did limit our choices for predicates and for the split and penalty algorithms, which will merit further exploration when we build RD-trees using GiSTs.

5 GiST Performance Issues

In balanced trees such as B+-trees which have non-overlapping keys, the maximum number of nodes to be examined (and hence I/Os) is easy to bound: for a point query over duplicate-free data it is the height of the tree, i.e. O(log n) for a database of n tuples. This upper bound cannot be guaranteed, however, if keys on a node may overlap, as in an R-tree or GiST, since overlapping keys can cause searches in multiple paths in the tree. The performance of a GiST varies directly with the amount that keys on nodes tend to overlap.

There are two major causes of key overlap: data overlap, and information loss due to key compression. The first issue is straightforward: if many data objects overlap significantly, then keys within the tree are likely to overlap as well. For example, any dataset made up entirely of identical items will produce an inefficient index for queries that match the items. Such workloads are simply not amenable to indexing techniques, and should be processed with sequential scans instead.

Loss due to key compression causes problems in a slightly more subtle way: even though two sets of data may not overlap, the keys for these sets may overlap if the Compress/Decompress methods do not produce exact keys. Consider R-trees, for example, where the Compress method produces bounding boxes. If objects are not box-like, then the keys that represent them will be inaccurate, and may indicate overlaps when none are present. In R-trees, the problem of compression loss has been largely ignored, since most spatial data objects (geographic entities, regions of the brain, etc.) tend to be relatively box-shaped.³ But this need not be the case. For example, consider a 3-d R-tree index over the dataset corresponding to a plate of spaghetti: although no single spaghetto intersects any other in three dimensions, their bounding boxes will likely all intersect!

³ Better approximations than bounding boxes have been considered for doing spatial joins [BKSS94]. However, this work proposes using bounding boxes in an R*-tree, and only using the more accurate approximations in main memory during post-processing steps.

The two performance issues described above are displayed as a graph in Figure 2. At the origin of this graph are trees with no data overlap and lossless key compression, which have the optimal logarithmic performance described above. Note that B+-trees over duplicate-free data are at the origin of the graph. As one moves towards 1 along either axis, performance can be expected to degrade. In the worst case on the x axis, keys are consistent with any query, and the whole tree must be traversed for any query. In the worst case on the y axis, all the data are identical, and the whole tree must be traversed for any query consistent with the data.

In this section, we present some initial experiments we have done with RD-trees to explore the space of Figure 2. We chose RD-trees for two reasons:

1. We were able to implement the methods in Illustra R-trees.

2. Set data can be "cooked" to have almost arbitrary overlap, as opposed to polygon data which is contiguous within its boundaries, and hence harder to manipulate. For example, it is trivial to construct n distant "hot spots" shared by all sets in an RD-tree, but it is geometrically difficult to do the same for polygons in an R-tree. We thus believe that set-valued data is particularly useful for experimenting with overlap.

To validate our intuition about the performance space, we generated 30 datasets, each corresponding to a point in the space of Figure 2. Each dataset contained 10000 set-valued objects. Each object was a regularly spaced set of ranges, much like a comb laid on the number line (e.g. {[1, 10], [100001, 100010], [200001, 200010], ...}). The "teeth" of each comb were 10 integers wide, while the spaces between teeth were 99990 integers wide, large enough to accommodate one tooth from every other object in the dataset. The 30 datasets were formed by changing two variables: numranges, the number of ranges per set, and overlap, the amount that each comb overlapped its predecessor. Varying numranges adjusted the compression loss: our Compress method only allowed for 20 ranges per rangeset, so a comb of t > 20 teeth had t − 20 of its inter-tooth spaces erroneously included into its compressed representation. The amount of overlap was controlled by the left edge of each comb: for overlap 0, the first comb was started at 1, the second at 11, the third at 21, etc., so that no two combs overlapped. For overlap 2, the first comb was started at 1, the second at 9, the third at 17, etc. The 30 datasets were generated by forming all combinations of numranges in {20, 25, 30, 35, 40} and overlap in {0, 2, 4, 6, 8, 10}.
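The comb construction above can be sketched as a small generator. This is an illustrative reconstruction from the description, not the authors' generator; the function name and parameterization are assumptions.

```cpp
#include <utility>
#include <vector>

// Sketch of the synthetic "comb" objects: comb k has numranges teeth,
// each 10 integers wide, with teeth 100000 apart; `overlap` shifts each
// successive comb's left edge back by that amount relative to a stride
// of 10 (overlap 0 -> starts 1, 11, 21, ...; overlap 2 -> 1, 9, 17, ...).
std::vector<std::pair<long, long>> make_comb(int k, int numranges,
                                             int overlap) {
    long start = 1 + static_cast<long>(k) * (10 - overlap);
    std::vector<std::pair<long, long>> teeth;
    for (int i = 0; i < numranges; ++i)
        teeth.push_back({start + 100000L * i, start + 100000L * i + 9});
    return teeth;
}
```

With this layout, each 99990-wide gap has room for one tooth from every other object, so data overlap is controlled purely by the `overlap` shift.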

For each of the 30 datasets, five queries were performed. Each query searched for objects overlapping a different tooth of the first comb. The query performance was measured in number of I/Os, and the five numbers averaged per dataset. A chart of the performance is shown in Appendix A. More illustrative is the 3-d plot shown in Figure 3, where the x and y axes are the same as in Figure 2, and the z axis represents the average number of I/Os. The landscape is much as we had expected: it slopes upwards as we move away from 0 on either axis.

While our general insights on data overlap and compression loss are verified by this experiment, a number of performance variables remain unexplored. Two issues of concern are hot spots and the correlation factor across hot spots. Hot spots in RD-trees are integers that appear in many sets. In general, hot spots can be thought of as very specific predicates satisfiable by many tuples in a dataset. The correlation factor for two integers j and k in an RD-tree is the likelihood that if one of j or k appears in a set, then both appear. In general, the correlation factor for two hot spots p, q is the likelihood that if p ∨ q holds for a tuple, p ∧ q holds as well. An interesting question is how the GiST behaves as one denormalizes data sets to produce hot spots, and correlations between them. This question, along with similar issues, should prove to be a rich area of future research.

[Figure 3: Performance in the Parameter Space. A 3-d surface generated from the data in Appendix A, with data overlap and compression loss on the x and y axes and average number of I/Os on the z axis. Compression loss was calculated as (numranges − 20)/numranges, while data overlap was calculated as overlap/10.]

6 Implementation Issues

In previous sections we described the GiST, demonstrated its flexibility, and discussed its performance as an index for secondary storage. A full-fledged database system is more than just a secondary storage manager, however. In this section we briefly address how the GiST can be implemented, taking into account important database system issues.

6.1 In-Memory Efficiency: Streamlining Intra-node Operations

The GiST is an index for secondary storage. As a result, the algorithms described above focus on minimizing node accesses, which are the only operations that involve I/O. However, even secondary storage structures should be implemented with an eye on algorithmic efficiency, and in some situations it is beneficial to streamline in-memory operations on a single node. We facilitate such optimizations in the GiST for situations where they can be exploited, although the default behavior is general-purpose and can be relied upon for correctness in all scenarios.

For the sake of simplicity and generality, the algorithms described above compare input predicates with each entry on a node. More efficient schemes may be used for particular domains. For example, in ordered domains the appropriate entries on a node may be found via binary search. Another example is the hB-tree, in which the entries on a node are themselves indexed via a k-d tree [Ben79], which is used for finding the appropriate entries on the node. Many alternative optimizations may exist, depending on the domain of the key predicates.
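The binary-search optimization for ordered domains can be sketched briefly. This assumes the node's stored split keys are kept sorted (as the Compare method would keep them) and that the first element stands in for the leftmost, conceptually −∞, key; names are illustrative.

```cpp
#include <algorithm>
#include <vector>

// Sketch of intra-node search by binary search rather than a linear scan.
// split_keys[i] is the lower bound of entry i's interval; entry i covers
// [split_keys[i], split_keys[i+1]).  Assumes split_keys is sorted and
// split_keys[0] <= v (in a real node the leftmost key acts as -infinity).
int choose_subtree_index(const std::vector<long>& split_keys, long v) {
    // first split key strictly greater than v; the entry before it covers v
    auto it = std::upper_bound(split_keys.begin(), split_keys.end(), v);
    return static_cast<int>(it - split_keys.begin()) - 1;
}
```

This replaces the O(M) per-node scan of steps such as S1 and CS2 with O(log M) comparisons, which is exactly the kind of specialization a Node subclass could provide.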

To facilitate these techniques, additional extensibility in the GiST may optionally be leveraged for high performance. The Node object in the GiST may be subclassed, and methods for searching, inserting and deleting entries on a node may be specialized. For the sake of brevity, we do not describe the Node interface in detail here, but instead give an illustrative example. The default behavior of the Node object is to store entries in a typical slotted-page format, and do intra-node search in a linear fashion. The behavior of an hB-tree may be achieved instead, by specifying a subclass of the Node object which has a k-d tree member, and overloaded member functions for intra-node insertion, deletion, and search which maintain and exploit the k-d tree. Improved Node methods such as these can streamline the intra-node search operations mentioned in steps S1, S2, FM1, FM2, and CS2. Small overheads may be incurred in steps which insert, delete or modify entries on nodes.

Another in-memory optimization is to avoid the overhead of calling user methods for each operation. This can be done by writing methods as inline code, and recompiling the system. This is discussed in more detail in Section 6.6.

6.2 Concurrency Control, Recovery and Consistency

High concurrency, recoverability, and degree-3 consistency are critical factors in a full-fledged database system. Concurrency control for B+-trees is a well-understood problem, with Lehman and Yao's B-link variant [LY81] being a typical way of implementing high-concurrency B+-trees. Recovery for B-link trees has been explored by Lomet and Salzberg [LS92], who show that a Π-tree, which generalizes the B-link tree, can be reconstructed gradually from any interim state. Until recently, no analogous results existed for R-trees. The main stumbling block was the fact that Lehman and Yao's techniques were based on linear ordering in the data: sideways pointers were introduced at each level of the tree to connect the nodes of each level into a linked list, ordered by key value. R-tree data has no such natural ordering, and imposing an artificial ordering upsets the balancing techniques for the tree.

Fortunately, recent work extends Lehman and Yao's sideways pointer techniques, along with recovery techniques, to R-trees. Ng and Kameda [NK94] do so by generating a pending update list at each node of the tree, and applying Lomet and Salzberg's results on Π-trees to this context. Banks, Kornacker and Stonebraker [BKS94, KB95] have a simpler solution that marks keys and nodes with sequence numbers, and uses the sequence numbers to determine order among the nodes. Both approaches provide solutions for degree-3 consistency: Ng and Kameda via aborting transactions that read phantoms, and Banks, et al. via predicate locking.

These techniques can be used in GiSTs in the same way they are used in R-trees. The sequence number approach seems more attractive for the GiST, since it is a simple generalization of the original B-link approach. As a result, the original approach could be used in the case of linearly ordered data, and the more complex sequence number approach for unordered data. A full implementation of concurrency, recoverability, and consistency in the GiST will certainly have to revisit some issues, but the bulk of the problems have already been addressed.

6.3 Variable-Length Keys

It is often convenient to allow keys to vary in length, particularly because of the Compress method available in GiSTs. Unfortunately, variable-length keys can cause problems for the GiST (and for R-trees), due to the following possible scenario:

1. A new entry is to be inserted in leaf node L.

2. L is full, so a new node L' is generated, and the data on L is split among L and L'.

3. During split, the new entry is assigned to L. This results in the key above L being readjusted.

4. The key above L grows in size, and no longer fits on the parent node, so...

5. the entry for the key above L must be removed from the parent node.

At this point, we are in the unpleasant situation of having both L and L' detached from our tree. The traditional split propagation algorithm only works for one detached node, not for two. Some new technique is required to handle this.

For linearly ordered domains, the situation outlined above cannot occur, since key propagation is not necessary (recall the discussion at the end of Section 3.4.3). For unordered domains, the problem can indeed occur. We suggest two possible solutions:

• Reinsertion: The scenario described can be handled by calling the Insert method to reinsert the two orphaned nodes at their original level. Some care must be taken to ensure that the problem described above does not re-occur on reinsert.

• New Entry on New Node: An alternative is to force the new entry to be placed on L' after the split. Then we know that some sufficiently small key can be found that is appropriate for L: in particular, the old key for L is still valid after the split. The old key may be unacceptably general though, and it is advisable to search for a more specific key that is small enough to fit.

The first solution has the advantage of being flexible, and potentially improving the performance of the tree through refined data placement [BKSS90]. It may require some modifications to the concurrency control techniques described in the previous section. The second solution, while simpler, presents the problem of finding a good key that requires sufficiently little storage. This is always possible if one can ensure that as the generality of a key decreases, the storage required for its compressed key does not increase. Although this cannot be guaranteed in general, it does seem natural for some domains, e.g. sets.

6.4 Bulk Loading

In unordered domains, it is not clear how to efficiently build an index over a large, pre-existing dataset. To do so, some ordering must be used to sort the data, and then the sorted data must be partitioned into a linked list of nodes, so that a tree may be constructed over the list. A good sort order and partitioning results in a relatively full tree with low overlap at the keys. Various sort orders have been developed for R-trees (e.g. [KF93], [Jag90], etc.), but these solutions are specific to the spatial domain and thus not generally applicable for an extensible structure like the GiST. Extending this work to new domains should prove interesting. An extensible BulkLoad method may be added to the GiST to accommodate bulk loading for various domains.

6.5 Optimizer Integration

To integrate GiSTs with a query optimizer, one must let the optimizer know which query predicates match each GiST. This can be done by registering the predicates supported by the Consistent method with the optimizer. For trees with IsOrdered = TRUE, one can additionally specify to the optimizer those predicates that can be evaluated using the FindMin/Next technique.

When planning a query, if the optimizer sees a Boolean Factor [SAC+79] based on one of the registered predicates, then it knows it should consider probing the GiST as one of its potential access paths. The question of estimating the cost of probing a GiST is more difficult, and we defer it until Section 7.

6.6 Coding Details

We propose implementing the GiST in two ways: the Extensible GiST will be designed for easy extensibility, while the Template GiST will be designed for maximal efficiency. With a little care, these two implementations can be built off of the same code base, without replication of logic.

The Extensible GiST package is an object library providing a GiST class, and a GiSTkey class. The latter is a skeleton from which users may inherit the method names described in Section 3.3, to provide their own domain-specific implementations. In C++ terminology, these methods are virtual member functions, meaning that each time a method, e.g. Compress, is invoked on a key object, the appropriate implementation of Compress is looked up and invoked. Invoking virtual member functions can be inefficient, but this architecture allows for great flexibility: new key classes may be implemented on demand, dynamically linked into a running DBMS, and used to index new data types without halting the DBMS. This is analogous to the extensible indexing scheme used in POSTGRES, and is suitable both for prototyping, and for systems where high availability is important.

The Template GiST package provides a template GiST class, which is a C++ source code library. Users develop a key class of their liking, say myGiSTkey, with the appropriate methods. Then they can declare a class GiST<myGiSTkey> in their code. After compiling the code for GiST<myGiSTkey>, they have an index that behaves as desired without the inefficiencies of invoking virtual member functions. Note that myGiSTkey may implement some key methods as inline code, particularly for trivial operations such as identity-function Compress and Decompress methods. This eliminates the pathlength involved in making a function call, thus masking the complexity of unexploited extensibility features. The Template GiST is in the spirit of "extensible database toolkits" such as EXODUS, and can be used to build custom-designed, efficient systems.
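The contrast between the two dispatch styles can be sketched in a few lines. This is an illustrative fragment, not the packages' actual interfaces: GiSTKey, MyKey and the free functions are invented names, and a single Compress-like method stands in for the full method set of Section 3.3.

```cpp
#include <string>

// Extensible GiST style: the key skeleton exposes virtual member
// functions, so new key classes can be dynamically linked and dispatched
// at run time (at the cost of a virtual-call lookup per invocation).
struct GiSTKey {
    virtual ~GiSTKey() {}
    virtual std::string compress() const = 0;
};

struct MyKey : GiSTKey {
    std::string compress() const override { return "compressed"; }
};

std::string extensible_compress(const GiSTKey& k) {
    return k.compress();  // resolved through the vtable at run time
}

// Template GiST style: the key class is a template parameter, so calls
// are resolved statically and trivial methods can be inlined away.
struct MyInlineKey {
    std::string compress() const { return "compressed"; }
};

template <class Key>
std::string template_compress(const Key& k) {
    return k.compress();  // resolved (and inlinable) at compile time
}
```

Both calls compute the same result; the difference is purely in binding time, which is exactly the flexibility-versus-pathlength trade-off described above.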

7 Summary and Future Work

The incorporation of new data types into today's database systems requires indexes that support an extensible set of queries. To facilitate this, we isolated the essential nature of search trees, providing a clean characterization of how they are all alike. Using this insight, we developed the Generalized Search Tree, which unifies previously distinct search tree structures. The GiST is extremely extensible, allowing arbitrary data sets to be indexed and efficiently queried in new ways. This flexibility opens the question of when and how one can generate effective search trees.

Since the GiST unifies B+-trees and R-trees into a single structure, it is immediately useful for systems which require the functionality of both. In addition, the extensibility of the GiST opens up a number of interesting research problems which we intend to pursue:

• Indexability: The primary theoretical question raised by the GiST is whether one can find a general characterization of workloads that are amenable to indexing. The GiST provides a means to index arbitrary domains for arbitrary queries, but as yet we lack an "indexability theory" to describe whether or not trying to index a given data set is practical for a given set of queries.

• Indexing Non-Standard Domains: As a practical matter, we are interested in building indices for unusual domains, such as sets, terms, images, sequences, graphs, video and sound clips, fingerprints, molecular structures, etc. Pursuit of such applied results should provide an interesting feedback loop with the theoretical explorations described above. Our investigation into RD-trees for set data has already begun: we have implemented RD-trees in SHORE and Illustra, using R-trees rather than the GiST. Once we shift from R-trees to the GiST, we will also be able to experiment with new PickSplit methods and new predicates for sets.

• Query Optimization and Cost Estimation: Cost estimates for query optimization need to take into account the costs of searching a GiST. Currently such estimates are reasonably accurate for B+-trees, and less so for R-trees. Recently, some work on R-tree cost estimation has been done [FK94], but more work is required to bring this to bear on GiSTs in general. As an additional problem, the user-defined GiST methods may be time-consuming operations, and their CPU cost should be registered with the optimizer [HS93]. The optimizer must then correctly incorporate the CPU cost of the methods into its estimate of the cost for probing a particular GiST.

• Lossy Key Compression Techniques: As new data domains are indexed, it will likely be necessary to find new lossy compression techniques that preserve the properties of a GiST.

• Algorithmic Improvements: The GiST algorithms for insertion are based on those of R-trees. As noted in Section 4.2, R*-trees use somewhat modified algorithms, which seem to provide some performance gain for spatial data. In particular, the R*-tree policy of "forced reinsert" during split may be generally beneficial. An investigation of the R*-tree modifications needs to be carried out for non-spatial domains. If the techniques prove beneficial, they will be incorporated into the GiST, either as an option or as default behavior. Additional work will be required to unify the R*-tree modifications with the techniques for concurrency control and recovery.

Finally, we believe that future domain-specific search tree enhancements should take into account the generality issues raised by GiSTs. There is no good reason to develop new, distinct search tree structures if comparable performance can be obtained in a unified framework. The GiST provides such a framework, and we plan to implement it as a C++ library package, so that it can be exploited by a variety of systems.

Acknowledgements

Thanks to Praveen Seshadri, Marcel Kornacker, Mike Olson, Kurt Brown, Jim Gray, and the anonymous reviewers for their helpful input on this paper. Many debts of gratitude are due to the staff of Illustra Information Systems: thanks to Mike Stonebraker and Paula Hawthorn for providing a flexible industrial research environment, and to Mike Olson, Jeff Meredith, Kevin Brown, Michael Ubell, and Wei Hong for their help with technical matters. Thanks also to Shel Finkelstein for his insights on RD-trees. Simon Hellerstein is responsible for the acronym GiST. Ira Singer provided a hardware loan which made this paper possible. Finally, thanks to Adene Sacks, who was a crucial resource throughout the course of this work.

References

[Aok91] P. M. Aoki. Implementation of Extended Indexes in POSTGRES. SIGIR Forum, 25(1):2–9, 1991.

[Ben79] J. L. Bentley. Multidimensional Binary Search Trees in Database Applications. IEEE Transactions on Software Engineering, SE-5(4):339–353, July 1979.

[BKS94] Douglas Banks, Marcel Kornacker, and Michael Stonebraker. High Concurrency Locking in R-Trees. Technical Report Sequoia 2000 94/56, University of California, Berkeley, June 1994.

[BKSS90] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: An Efficient and Robust Access Method For Points and Rectangles. In Proc. ACM-SIGMOD International Conference on Management of Data, Atlantic City, May 1990, pages 322–331.

[BKSS94] Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. Multi-Step Processing of Spatial Joins. In Proc. ACM-SIGMOD International Conference on Management of Data, Minneapolis, May 1994, pages 197–208.

[CDF+94] Michael J. Carey, David J. DeWitt, Michael J. Franklin, Nancy E. Hall, Mark L. McAuliffe, Jeffrey F. Naughton, Daniel T. Schuh, Marvin H. Solomon, C. K. Tan, Odysseas G. Tsatalos, Seth J. White, and Michael J. Zwilling. Shoring Up Persistent Applications. In Proc. ACM-SIGMOD International Conference on Management of Data, Minneapolis, May 1994, pages 383–394.

[CDG+90] M. J. Carey, D. J. DeWitt, G. Graefe, D. M. Haight, J. E. Richardson, D. H. Schuh, E. J. Shekita, and S. L. Vandenberg. The EXODUS Extensible DBMS Project: An Overview. In Stan Zdonik and David Maier, editors, Readings In Object-Oriented Database Systems. Morgan-Kaufmann Publishers, Inc., 1990.

[Com79] Douglas Comer. The Ubiquitous B-Tree. Computing Surveys, 11(2):121–137, June 1979.

[FB74] R. A. Finkel and J. L. Bentley. Quad-Trees: A Data Structure For Retrieval On Composite Keys. ACTA Informatica, 4(1):1–9, 1974.

[FK94] Christos Faloutsos and Ibrahim Kamel. Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. In Proc. 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 4–13, Minneapolis, May 1994.

[Gro94] The POSTGRES Group. POSTGRES Reference Manual, Version 4.2. Technical Report M92/85, Electronics Research Laboratory, University of California, Berkeley, April 1994.

[Gut84] Antonin Guttman. R-Trees: A Dynamic Index Structure For Spatial Searching. In Proc. ACM-SIGMOD International Conference on Management of Data, Boston, June 1984, pages 47–57.

[HP94] Joseph M. Hellerstein and Avi Pfeffer. The RD-Tree: An Index Structure for Sets. Technical Report #1252, University of Wisconsin at Madison, October 1994.

[HS93] Joseph M. Hellerstein and Michael Stonebraker. Predicate Migration: Optimizing Queries With Expensive Predicates. In Proc. ACM-SIGMOD International Conference on Management of Data, Washington, D.C., May 1993, pages 267–276.

[Jag90] H. V. Jagadish. Linear Clustering of Objects With Multiple Attributes. In Proc. ACM-SIGMOD International Conference on Management of Data, Atlantic City, May 1990, pages 332–342.

[JS93] T. Johnson and D. Shasha. Inserts and Deletes on B-trees: Why Free-At-Empty is Better Than Merge-At-Half. Journal of Computer Sciences and Systems, 47(1):45–76, August 1993.

[KB95] Marcel Kornacker and Douglas Banks. High-Concurrency Locking in R-Trees. Submitted for publication, 1995.

[KF93] Ibrahim Kamel and Christos Faloutsos. On Packing R-Trees. In Second International Conference on Information and Knowledge Management (CIKM), November 1993.

[KG94] Won Kim and Jorge Garza. Requirements For a Performance Benchmark For Object-Oriented Systems. In Won Kim, editor, Modern Database Systems: The Object Model, Interoperability and Beyond. ACM Press, June 1994.

[KKD89] Won Kim, Kyung-Chang Kim, and Alfred Dale. Indexing Techniques for Object-Oriented Databases. In Won Kim and Fred Lochovsky, editors, Object-Oriented Concepts, Databases, and Applications, pages 371–394. ACM Press and Addison-Wesley Publishing Co., 1989.

[Knu73] Donald Ervin Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley Publishing Co., 1973.

[LJF94] King-Ip Lin, H. V. Jagadish, and Christos Faloutsos. The TV-Tree: An Index Structure for High-Dimensional Data. VLDB Journal, 3:517–542, October 1994.

[LS90] David B. Lomet and Betty Salzberg. The hB-Tree: A Multiattribute Indexing Method. ACM Transactions on Database Systems, 15(4), December 1990.

[LS92] David Lomet and Betty Salzberg. Access Method Concurrency with Recovery. In Proc. ACM-SIGMOD International Conference on Management of Data, pages 351–360, San Diego, June 1992.

[LY81] P. L. Lehman and S. B. Yao. Efficient Locking For Concurrent Operations on B-trees. ACM Transactions on Database Systems, 6(4):650–670, 1981.

[MCD94] Mauricio R. Mediano, Marco A. Casanova, and Marcelo Dreux. V-Trees: A Storage Method For Long Vector Data. In Proc. 20th International Conference on Very Large Data Bases, pages 321–330, Santiago, September 1994.

[NK94] Vincent Ng and Tiko Kameda. The R-Link Tree: A Recoverable Index Structure for Spatial Data. In Proc. Fifth International Conference on Database and Expert Systems Applications (DEXA '94), pages 163–172, Athens, 1994.

[PSTW93] Bernd-Uwe Pagel, Hans-Werner Six, Heinrich Toben, and Peter Widmayer. Towards an Analysis of Range Query Performance in Spatial Data Structures. In Proc. 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 214–221, Washington, D.C., May 1993.

[Qui93] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan-Kaufmann Publishers, Inc., 1993.

[Rob81] J. T. Robinson. The k-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. In Proc. ACM-SIGMOD International Conference on Management of Data, pages 10–18, Ann Arbor, April/May 1981.

[SAC+79] Patricia G. Selinger, M. Astrahan, D. Chamberlin, Raymond Lorie, and T. Price. Access Path Selection in a Relational Database Management System. In Proc. ACM-SIGMOD International Conference on Management of Data, Boston, June 1979.

[SRF87] Timos Sellis, Nick Roussopoulos, and Christos Faloutsos. The R+-Tree: A Dynamic Index For Multi-Dimensional Objects. In Proc. 13th International Conference on Very Large Data Bases, pages 507–518, Brighton, September 1987.

[Sto86] Michael Stonebraker. Inclusion of New Types in Relational Database Systems. In Proceedings of the IEEE Fourth International Conference on Data Engineering, pages 262–269, Washington, D.C., February 1986.

[Sto93] Michael Stonebraker. The Miro DBMS. In Proc. ACM-SIGMOD International Conference on Management of Data, Washington, D.C., May 1993, page 439.

[VV84] Patrick Valduriez and Yann Viemont. A Multikey Hashing Scheme Using Predicate Trees. In Proc. ACM-SIGMOD International Conference on Management of Data, Boston, June 1984, pages 107–114.

[WE77] Kai C. Wong and Murray Edelberg. Interval hierarchies and their application to predicate files. ACM Transactions on Database Systems, 2(3):223–232, September 1977.

[WE80] C. K. Wong and M. C. Easton. An Efficient Method for Weighted Sampling Without Replacement. SIAM Journal on Computing, 9(1):111–113, February 1980.

A Query Performance Over Comb Data

numranges  overlap  Query 1  Query 2  Query 3  Query 4  Query 5  Average
20         0        7        2        2        2        2        3
20         2        7        2        2        2        2        3
20         4        7        2        2        2        2        3
20         6        7        2        2        2        2        3
20         8        8        2        2        2        2        3.2
20         10       3254     3252     3252     3252     3252     3252.4
25         0        7        4087     6        2        2        820.8
25         2        7        4067     6        2        2        816.8
25         4        7        4063     6        2        2        816
25         6        7        4086     6        2        2        820.6
25         8        8        4084     7        2        2        820.6
25         10       4096     4095     4095     4095     4095     4095.2
30         0        7        4087     4086     6        2        1637.6
30         2        7        4067     4066     6        2        1629.6
30         4        7        4063     4062     6        2        1628
30         6        8        4086     4086     6        2        1637.6
30         8        8        4084     4081     7        2        1636.4
30         10       4096     4095     4095     4095     4095     4095.2
35         0        7        5734     5734     5734     10       3443.8
35         2        7        5730     5729     5729     6        3440.2
35         4        7        5727     5726     5726     6        3438.4
35         6        8        5753     5753     5753     7        3454.8
35         8        10       5533     5534     5534     7        3323
35         10       5002     5002     5002     5002     5002     5002
40         0        8        5734     5734     5734     5734     4588.8
40         2        8        5730     5729     5729     5729     4585
40         4        8        5727     5726     5726     5726     4582.6
40         6        9        5753     5753     5753     5753     4604.2
40         8        10       5756     5754     5754     5754     4605
40         10       5751     5749     5749     5749     5749     5749.4