<<

Multi-Dimensional Indexing

Torsten Grust Chapter 5 Multi-Dimensional Indexing

Indexing Points and Regions in k Dimensions B+-trees...... over composite keys Architecture and Implementation of Database Systems Point Quad Trees Point (Equality) Search Summer 2014 Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B- Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Torsten Grust Spaces with High Wilhelm-Schickard-Institut für Informatik Dimensionality Wrap-Up

Universität Tübingen 1 Multi-Dimensional More Dimensions. . . Indexing Torsten Grust

One SQL query, many range predicates

1 SELECT* 2 FROM CUSTOMERS B+-trees. . . 3 WHERE ZIPCODE BETWEEN 8880 AND 8999 . . . over composite keys 4 AND REVENUE BETWEEN 3500 AND 6000 Point Quad Trees Point (Equality) Search Inserting Data This query involves a range predicate in two dimensions. Region Queries k-d Trees Balanced Construction

K-D-B-Trees Typical use cases for multi-dimensional data include K-D-B-Tree Operations Splitting a Point Page • geographical data (GIS: Geographical Information Systems), Splitting a Region Page R-Trees • multimedia retrieval (e.g., image or video search), Searching and Inserting • OLAP (Online Analytical Processing), Splitting R-Tree Nodes UB-Trees • queries against encoded data (e.g., XML) Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

2 Multi-Dimensional . . . More Challenges. . . Indexing Torsten Grust Queries and data can be points or regions

B+-trees...... over composite keys

point queryregion query Point Quad Trees region inclusion Point (Equality) Search region data Inserting Data or intersection Region Queries

k-d Trees point data Balanced Construction

K-D-B-Trees most interesting: k-nearest-neighbor search (k-NN) K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes . . . and you can come up with many more meaningful types of UB-Trees queries over multi-dimensional data. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries Note: All equality searches can be reduced to one-dimensional queries. Spaces with High Dimensionality

Wrap-Up

3 Multi-Dimensional Points, Lines, and Regions Indexing Tarifzonen | Fare zones Torsten Grust

FEUERTHALEN

FLURLINGEN16 SCHLOSS LAUFEN A. RH.

DACHSEN

RHEINAU STAMMHEIM MARTHALEN OSSINGEN 62 OBERSTAMM- 15 61 HEIM OBERNEUNFORN WIL ZH RAFZ HÜNTWANGEN KLEINANDELFINGEN 14 ANDELFINGEN WASTERKINGEN HÜNTWANGEN- RÜDLINGEN WIL 24 60 FLAACH THALHEIM-ALTIKON KAISERSTUHL AG 13 BUCHBERG B+-trees. . . ALTIKON HENGGART ZWEIDLEN ELLIKON EGLISAU DINHARD AN DER THUR TEUFEN 63 . . . over composite keys GLATTFELDEN HETTLINGEN 18 SEUZACH RICKENBACH-ATTIKON STADEL NEFTENBACH REUTLINGEN GUNDETSWIL BEI NIEDERGLATT BÜLACH 23 WALLRÜTI Point Quad Trees BACHS WIESENDANGEN EMBRACH- PFUNGEN- OBERWINTERTHUR NEFTENBACH ELSAU HÖRI RORBAS HEGI ELGG Point (Equality) Search NIEDERWENINGENNIEDERWENINGENDORFSCHÖFFLISDORF- GRÜZE OBERWENINGEN 12 WÜLFLINGEN STEINMAUR WINTERTHUR RÄTERSCHEN SCHOTTIKON NIEDERGLATT OBEREMBRACH TÖSS SCHLEINIKON SEEN ELGG DIELSDORF Inserting Data * EIDBERG 64 REGENSBERG NIEDERHASLI BRÜTTEN 17 20 SCHLATT BEI ZÜRICH SENNHOF-KYBURG BOPPELSEN FLUGHAFEN 21 WINTERTHUR Region Queries KLOTEN BUCHS- KOLLBRUNN DÄLLIKON NÜRENSDORF KYBURG OTELFINGEN OBERGLATT RÄMISMÜHLE- RÜMLANG KEMPTTHAL ZELL HÜTTIKON TURBENTHAL OTELFINGEN 11 BALSBERG RIKON GOLFPARK GLATTBRUGG WEISSLINGEN k-d Trees REGENSDORF- BASSERSDORF OETWIL AN WATT AFFOLTERN OPFIKON DIETLIKON 70 SITZBERG DER LIMMAT 22 71 WALLISELLEN WILDBERG Balanced Construction SPREITENBACH SEEBACH WILA SHOPPING CENTER GEROLDSWIL EFFRETIKON UNTERENGSTRINGEN * ILLNAU DIETIKON 10 OERLIKON 84 SCHLIEREN ALTSTETTEN 21 VOLKETSWIL 35 STELZENACKER WIPKINGEN DÜBENDORF RUSSIKON SALAND STERNENBERG K-D-B-Trees GLANZENBERG HARDBRÜCKE STETTBACH FEHRALTORF BERGDIETIKON URDORF ZÜRICHHB 30 HITTNAU WIEDIKON SCHWERZENBACH ZH REPPISCHHOF SELNAU BAUMA K-D-B-Tree Operations STADELHOFEN NÄNIKON-GREIFENSEE UITIKON WEIHERMATT FÄLLANDEN WALDEGG PFÄFFIKON ZH GIESSHÜBEL BIRMENSDORF ZH ENGE 31 Splitting a Point Page REHALP STEG TIEFENBRUNNEN USTER 72 UETLIBERG ZOLLIKERBERG ZOLLIKON AATHAL BÄRETSWIL 54 WOLLISHOFEN 30 KEMPTEN FISCHENTHAL Splitting a Region Page ZUMIKON MAUR BONSTETTEN- LEIMBACH GOLDBACH WETTSWIL 40 73 KILCHBERG WETZIKON KÜSNACHT FORCH GOSSAU ZH RINGWIL GIBSWIL MÖNCHALTORF OBERLUNKHOFEN SOOD- ZH ERLENBACH ZH OBERLEIMBACH R-Trees WINKEL EGG 55 RÜSCHLIKON AM ZÜRICHSEE 32 HINWIL ADLISWIL 50 HERRLIBERG- 33 HEDINGEN ESSLINGEN 34 SIHLAU FELDMEILEN Searching and Inserting THALWIL GRÜNINGEN OBERDÜRNTEN OTTENBACH WILDPARK-HÖFLI BUBIKON 41 42 WALD FALTIGBERG AFFOLTERN LANGNAU-GATTIKON MEILEN OETWIL 43 OBERRIEDEN UETIKON TANN-DÜRNTEN Splitting R-Tree Nodes AM ALBIS OBERRIEDEN DORF AM SEE HORGEN HOMBRECHTIKON 33 OBFELDEN AEUGST SIHLWALD MÄNNEDORF RÜTI ZH AM ALBIS HORGEN OBERDORF SIHLBRUGG KEMPRATEN 56 AU ZH UB-Trees METTMENSTETTEN HAUSEN 51 AM ALBIS STÄFA UERIKON FELDBACH JONA 80 WAGEN MASCHWANDEN WÄDENSWIL Bit Interleaving / Z-Ordering KNONAU KAPPEL 52 RAPPERSWIL BLUMENAU AM ALBIS HURDEN HIRZEL 83 BURGHALDEN RICHTERSWILBÄCH FREIENBACH 82 B+-Trees over Z-Codes PFÄFFIKON 53 GRÜNENFELD FREIENBACH SCHÖNENBERG ZH WOLLERAU Range Queries SAMSTAGERN SZ

Bahnstrecke mit Bahnhof Railway with Station SOB HÜTTEN Regionalbus Regional bus SCHINDELLEGI- 81 FEUSISBERG Schiff Boat Spaces with High 51 Zonennummer 51 Zone number Dimensionality Die Tarifzonen 10 (Stadt Zürich) Fare zones 10 (Zurich City) and 20 10 10 * und 20 (Stadt Winterthur) werden * (Winterthur City) each count as two für die Preisberechnung doppelt zones when calculating the fare. 20* 20* gezählt. Wrap-Up Verbundfahrausweise berechtigen You can travel as often as you like on während der Gültigkeitsdauer zu your ZVV ticket within the fare zones beliebigen Fahrten in den gelösten and period shown on the ticket. Zonen. © Zürcher Verkehrsverbund/PostAuto Region Zürich, 12.2007 4 Multi-Dimensional . . . More Solutions Indexing Torsten Grust

Recent proposals for multi-dimension index Quad Tree [Finkel 1974] K-D-B-Tree [Robinson 1981] R-tree [Guttman 1984] Grid File [Nievergelt 1984] + B+-trees. . . R -tree [Sellis 1987] LSD-tree [Henrich 1989] . . . over composite keys R*-tree [Geckmann 1990] hB-tree [Lomet 1990] Point Quad Trees Point (Equality) Search Vp-tree [Chiueh 1994] TV-tree [Lin 1994] Inserting Data UB-tree [Bayer 1996] hB-Π-tree [Evangelidis 1995] Region Queries k-d Trees SS-tree [White 1996] X-tree [Berchtold 1996] Balanced Construction M-tree [Ciaccia 1996] SR-tree [Katayama 1997] K-D-B-Trees K-D-B-Tree Operations Pyramid [Berchtold 1998] Hybrid-tree [Chakrabarti 1999] Splitting a Point Page DABS-tree [Böhm 1999] IQ-tree [Böhm 2000] Splitting a Region Page Slim-tree [Faloutsos 2000] Landmark file [Böhm 2000] R-Trees Searching and Inserting P-Sphere-tree [Goldstein 2000] A-tree [Sakurai 2000] Splitting R-Tree Nodes UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes None of these provides a “fits all” solution. Range Queries Spaces with High Dimensionality

Wrap-Up

5 + Multi-Dimensional Can We Just Use a B -tree? Indexing Torsten Grust • Can two B+-trees, one over ZIPCODE and over REVENUE, adequately support a two-dimensional range query?

Querying a rectangle

B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees • Can only scan along either index at once, and both of them Searching and Inserting produce many false hits. Splitting R-Tree Nodes UB-Trees Bit Interleaving / Z-Ordering • If all you have are these two indices, you can do index B+-Trees over Z-Codes intersection: perform both scans in separation to obtain the Range Queries Spaces with High rids of candidate tuples. Then compute the intersection Dimensionality between the two rid lists (DB2: IXAND). Wrap-Up 6 Multi-Dimensional IBM DB2: Index Intersection Indexing Torsten Grust

IBM DB2 uses index Plan selected by the intersection (operator IXAND) query optimizer B+-trees. . . 1 SELECT* . . . over composite keys 2 FROM ACCEL Point Quad Trees 3 WHERE PRE BETWEEN0 AND 10000 Point (Equality) Search Inserting Data 4 AND POST BETWEEN 10000 AND 20000 Region Queries

k-d Trees Relevant indexes defined over table Balanced Construction ACCEL: K-D-B-Trees K-D-B-Tree Operations 1 Splitting a Point Page IPOST over column POST Splitting a Region Page 2 SQL0801192024500 over R-Trees Searching and Inserting column PRE (primary key) Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

7 Multi-Dimensional Can Composite Keys Help? Indexing Torsten Grust Indexes with composite keys

REVENUE REVENUE

B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees ZIPCODE ZIPCODE Balanced Construction REVENUE, ZIPCODE index ZIPCODE, REVENUE index K-D-B-Trees h i h i K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting • Almost the same thing! Splitting R-Tree Nodes Indices over composite keys are not symmetric: The major UB-Trees + Bit Interleaving / Z-Ordering attribute dominates the organization of the B -tree. B+-Trees over Z-Codes Range Queries

• But: Since the minor attribute is also stored in the index Spaces with High (part of the keys k), we may discard non-qualifying tuples Dimensionality before fetching them from the data pages. Wrap-Up 8 Multi-Dimensional Multi-Dimensional Indices Indexing Torsten Grust

• B+-trees can answer one-dimensional queries only.1

• We would like to have a multi-dimensional index B+-trees. . . that . . . over composite keys Point Quad Trees • is symmetric in its dimensions, Point (Equality) Search • clusters data in a space-aware fashion, Inserting Data Region Queries

• is dynamic with respect to updates, and k-d Trees • provides good support for useful queries. Balanced Construction K-D-B-Trees K-D-B-Tree Operations • We will start with data structures that have been designed Splitting a Point Page for in-memory use, then tweak them into disk-aware Splitting a Region Page R-Trees database indices (e.g., organize data in a page-wise fashion). Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality 1 Toward the end of this chapter, we will see UB-trees, a nifty trick that uses Wrap-Up + B -trees to support some multi-dimensional queries. 9 Multi-Dimensional “Binary” Indexing Torsten Grust In k dimensions, a “” becomes a 2k-ary tree.

Point quad tree (k = 2) • Each data point partitions the data k B+-trees. . . space into 2 disjoint . . . over composite keys regions. Point Quad Trees Point (Equality) Search • In each , a region Inserting Data Region Queries

points to another node k-d Trees (representing a refined Balanced Construction partitioning) or to a K-D-B-Trees K-D-B-Tree Operations special null value. Splitting a Point Page Splitting a Region Page

• This is a R-Trees point quad tree. Searching and Inserting Splitting R-Tree Nodes Finkel and Bentley. Quad Trees: A% Data Structure for Retrieval on UB-Trees Composite Keys. Acta Informatica, Bit Interleaving / Z-Ordering vol. 4, 1974. B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

10 ?

!

Multi-Dimensional Searching a Point Quad Tree Indexing Torsten Grust

Point quad tree (k = 2) Point quad tree search

1 Function: p_search (q, node)

2 if q matches data point in node then B+-trees. . . 3 return data point; . . . over composite keys

4 else Point Quad Trees ? 5 P partition containing q; Point (Equality) Search ← Inserting Data 6 if P points to null then Region Queries 7 return not found; k-d Trees 8 else Balanced Construction 9 node0 node pointed to by P; K-D-B-Trees ← K-D-B-Tree Operations 10 return p_search (q, node0); Splitting a Point Page Splitting a Region Page

R-Trees 1 Function: pointsearch (q) Searching and Inserting Splitting R-Tree Nodes 2 return p_search (q, root) ; UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

11 ?

!

Multi-Dimensional Searching a Point Quad Tree Indexing Torsten Grust

Point quad tree (k = 2) Point quad tree search

1 Function: p_search (q, node)

2 if q matches data point in node then B+-trees. . . 3 return data point; . . . over composite keys

4 else Point Quad Trees ? 5 P partition containing q; Point (Equality) Search ← Inserting Data 6 if P points to null then Region Queries 7 return not found; k-d Trees 8 else Balanced Construction 9 node0 node pointed to by P; K-D-B-Trees ← K-D-B-Tree Operations 10 return p_search (q, node0); Splitting a Point Page Splitting a Region Page

R-Trees 1 Function: pointsearch (q) Searching and Inserting Splitting R-Tree Nodes 2 return p_search (q, root) ; UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

11 !

Multi-Dimensional Searching a Point Quad Tree Indexing Torsten Grust

Point quad tree (k = 2) Point quad tree search

1 Function: p_search (q, node)

2 if q matches data point in node then B+-trees. . . 3 return data point; . . . over composite keys

4 else Point Quad Trees ? 5 P partition containing q; Point (Equality) Search ← Inserting Data 6 if P points to null then Region Queries 7 return not found; k-d Trees 8 else Balanced Construction 9 node0 node pointed to by P; K-D-B-Trees ← K-D-B-Tree Operations 10 return p_search (q, node0); Splitting a Point Page ? Splitting a Region Page R-Trees 1 Function: pointsearch (q) Searching and Inserting Splitting R-Tree Nodes 2 return p_search (q, root) ; UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

11 !

Multi-Dimensional Searching a Point Quad Tree Indexing Torsten Grust

Point quad tree (k = 2) Point quad tree search

1 Function: p_search (q, node)

2 if q matches data point in node then B+-trees. . . 3 return data point; . . . over composite keys

4 else Point Quad Trees ? 5 P partition containing q; Point (Equality) Search ← Inserting Data 6 if P points to null then Region Queries 7 return not found; k-d Trees 8 else Balanced Construction 9 node0 node pointed to by P; K-D-B-Trees ← K-D-B-Tree Operations 10 return p_search (q, node0); Splitting a Point Page ? Splitting a Region Page R-Trees 1 Function: pointsearch (q) Searching and Inserting Splitting R-Tree Nodes 2 return p_search (q, root) ; UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

11 Multi-Dimensional Searching a Point Quad Tree Indexing Torsten Grust

Point quad tree (k = 2) Point quad tree search

1 Function: p_search (q, node)

2 if q matches data point in node then B+-trees. . . 3 return data point; . . . over composite keys

4 else Point Quad Trees ? 5 P partition containing q; Point (Equality) Search ← Inserting Data 6 if P points to null then Region Queries 7 return not found; k-d Trees 8 else Balanced Construction 9 node0 node pointed to by P; K-D-B-Trees ← K-D-B-Tree Operations 10 return p_search (q, node0); Splitting a Point Page ? Splitting a Region Page R-Trees 1 Function: pointsearch (q) Searching and Inserting Splitting R-Tree Nodes 2 return p_search (q, root) ; UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes ! Range Queries Spaces with High Dimensionality

Wrap-Up

11 → →

→ →

Multi-Dimensional Inserting into a Point Quad Tree Indexing

Inserting a point qnew into a quad tree happens analogously to Torsten Grust an insertion into a binary tree:

1 Traverse the tree just like during a search for qnew until you encounter a partition P with a null pointer. 2 Create a new node n0 that spans the same area as P and is B+-trees. . . partitioned by qnew, with all partitions pointing to null. . . . over composite keys 3 Let P point to n0. Point Quad Trees Point (Equality) Search Inserting Data Note that this procedure does not keep the tree balanced. Region Queries

k-d Trees Example (Insertions into an empty point quad tree) Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

12 → →

→ →

Multi-Dimensional Inserting into a Point Quad Tree Indexing

Inserting a point qnew into a quad tree happens analogously to Torsten Grust an insertion into a binary tree:

1 Traverse the tree just like during a search for qnew until you encounter a partition P with a null pointer. 2 Create a new node n0 that spans the same area as P and is B+-trees. . . partitioned by qnew, with all partitions pointing to null. . . . over composite keys 3 Let P point to n0. Point Quad Trees Point (Equality) Search Inserting Data Note that this procedure does not keep the tree balanced. Region Queries

k-d Trees Example (Insertions into an empty point quad tree) Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

12 →

→ →

Multi-Dimensional Inserting into a Point Quad Tree Indexing

Inserting a point qnew into a quad tree happens analogously to Torsten Grust an insertion into a binary tree:

1 Traverse the tree just like during a search for qnew until you encounter a partition P with a null pointer. 2 Create a new node n0 that spans the same area as P and is B+-trees. . . partitioned by qnew, with all partitions pointing to null. . . . over composite keys 3 Let P point to n0. Point Quad Trees Point (Equality) Search Inserting Data Note that this procedure does not keep the tree balanced. Region Queries

k-d Trees Example (Insertions into an empty point quad tree) Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees → Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

12 → →

Multi-Dimensional Inserting into a Point Quad Tree Indexing

Inserting a point qnew into a quad tree happens analogously to Torsten Grust an insertion into a binary tree:

1 Traverse the tree just like during a search for qnew until you encounter a partition P with a null pointer. 2 Create a new node n0 that spans the same area as P and is B+-trees. . . partitioned by qnew, with all partitions pointing to null. . . . over composite keys 3 Let P point to n0. Point Quad Trees Point (Equality) Search Inserting Data Note that this procedure does not keep the tree balanced. Region Queries

k-d Trees Example (Insertions into an empty point quad tree) Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees → → Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

12 →

Multi-Dimensional Inserting into a Point Quad Tree Indexing

Inserting a point qnew into a quad tree happens analogously to Torsten Grust an insertion into a binary tree:

1 Traverse the tree just like during a search for qnew until you encounter a partition P with a null pointer. 2 Create a new node n0 that spans the same area as P and is B+-trees. . . partitioned by qnew, with all partitions pointing to null. . . . over composite keys 3 Let P point to n0. Point Quad Trees Point (Equality) Search Inserting Data Note that this procedure does not keep the tree balanced. Region Queries

k-d Trees Example (Insertions into an empty point quad tree) Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees → → Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High → Dimensionality

Wrap-Up

12 Multi-Dimensional Inserting into a Point Quad Tree Indexing

Inserting a point qnew into a quad tree happens analogously to Torsten Grust an insertion into a binary tree:

1 Traverse the tree just like during a search for qnew until you encounter a partition P with a null pointer. 2 Create a new node n0 that spans the same area as P and is B+-trees. . . partitioned by qnew, with all partitions pointing to null. . . . over composite keys 3 Let P point to n0. Point Quad Trees Point (Equality) Search Inserting Data Note that this procedure does not keep the tree balanced. Region Queries

k-d Trees Example (Insertions into an empty point quad tree) Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees → → Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High → → Dimensionality

Wrap-Up

12 Multi-Dimensional Range Queries Indexing Torsten Grust To evaluate a range query2, we may need to follow several children of a quad tree node node:

Point quad tree range search

B+-trees...... over composite keys

1 Function: r_search (Q, node) Point Quad Trees Point (Equality) Search 2 if data point in node is in Q then Inserting Data 3 append data point to result ; Region Queries k-d Trees 4 foreach partition P in node that intersects with Q do Balanced Construction 5 node0 node pointed to by P ; ← K-D-B-Trees 6 r_search (Q, node0) ; K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees 1 Function: regionsearch (Q) Searching and Inserting Splitting R-Tree Nodes 2 return r_search (Q, root) ; UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality 2 We consider rectangular regions only; other shapes may be answered by Wrap-Up querying for the bounding rectangle and post-processing the output. 13 Multi-Dimensional Point Quad Trees—Discussion Indexing Torsten Grust

Point quad trees " are symmetric with respect to all dimensions and " can answer point queries and region queries. B+-trees...... over composite keys

Point Quad Trees But Point (Equality) Search Inserting Data % the shape of a quad tree depends on the insertion order of Region Queries its content, in the worst case degenerates into a , k-d Trees % Balanced Construction null pointers are space inefficient (particularly for large k). K-D-B-Trees K-D-B-Tree Operations In addition, Splitting a Point Page Splitting a Region Page - they can only store point data. R-Trees Searching and Inserting Splitting R-Tree Nodes Also remember that quad trees were designed for main memory UB-Trees Bit Interleaving / Z-Ordering operation. B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

14 Multi-Dimensional k-d Trees Indexing Torsten Grust

Sample k-d tree (k = 2) • Index k-dimensional data, but keep the tree binary. • For each tree level l use B+-trees. . . a different discriminator . . . over composite keys Point Quad Trees dimension dl along Point (Equality) Search Inserting Data which to partition. Region Queries • Typically: round k-d Trees robin Balanced Construction K-D-B-Trees • This is a k-d tree. K-D-B-Tree Operations Splitting a Point Page Bentley. Multidimensional Splitting a Region Page Binary% Search Trees Used for R-Trees Associative Searching. Comm. ACM, Searching and Inserting vol. 18, no. 9, Sept. 1975. Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

15 Multi-Dimensional k-d Trees Indexing Torsten Grust k-d trees inherit the positive properties of the point quad trees, but improve on space efficiency. For a given point , we can also construct a balanced k-d tree:3

k-d tree construction (Bentley’s OPTIMIZE algorithm) B+-trees...... over composite keys

Point Quad Trees 1 Function: kdtree (pointset, level) Point (Equality) Search 2 if pointset is empty then Inserting Data Region Queries 3 return null ; k-d Trees Balanced Construction 4 else K-D-B-Trees 5 p median from pointset (along dlevel); ← K-D-B-Tree Operations 6 Splitting a Point Page pointsleft v pointset where vdlevel < pdlevel ; ← { ∈ } Splitting a Region Page 7 pointsright v pointset where vdlevel pdlevel p ; ← { ∈ ≥ }\{ } R-Trees 8 n new k-d tree node, with data point p ; Searching and Inserting ← Splitting R-Tree Nodes 9 n.left kdtree (pointsleft, level + 1) ; ← UB-Trees 10 n.right kdtree (pointsright, level + 1) ; ← Bit Interleaving / Z-Ordering 11 return n ; B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up 3 vi: coordinate i of point v. 16 → → →

f a e → → → → h b d g

Resulting tree shape: d e c

g a h f

b

Multi-Dimensional Balanced k-d Tree Construction Indexing Torsten Grust

Example (Step-by-step balanced k-d tree construction)

B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

17 → → →

f a e c → → → → h b d g

Resulting tree shape: d e c

g a h f

b

Multi-Dimensional Balanced k-d Tree Construction Indexing Torsten Grust

Example (Step-by-step balanced k-d tree construction)

B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

17 → →

f a e c → → → → h b d g

Resulting tree shape: d e c

g a h f

b

Multi-Dimensional Balanced k-d Tree Construction Indexing Torsten Grust

Example (Step-by-step balanced k-d tree construction)

→ B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

17 →

f a e c → → → → h b d g

Resulting tree shape: d e c

g a h f

b

Multi-Dimensional Balanced k-d Tree Construction Indexing Torsten Grust

Example (Step-by-step balanced k-d tree construction)

→ → B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

17 Resulting tree shape: d e c

g a h f

b

Multi-Dimensional Balanced k-d Tree Construction Indexing Torsten Grust

Example (Step-by-step balanced k-d tree construction)

→ → → B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data f Region Queries a e k-d Trees c → → → → h Balanced Construction b d g K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

17 Multi-Dimensional Balanced k-d Tree Construction Indexing Torsten Grust

Example (Step-by-step balanced k-d tree construction)

→ → → B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data f Region Queries a e k-d Trees c → → → → h Balanced Construction b d g K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page d Resulting tree shape: R-Trees Searching and Inserting e c Splitting R-Tree Nodes

UB-Trees g a h f Bit Interleaving / Z-Ordering B+-Trees over Z-Codes b Range Queries Spaces with High Dimensionality

Wrap-Up

17 Multi-Dimensional K-D-B-Trees Indexing Torsten Grust

k-d trees improve on some of the deficiencies of point quad trees: " We can balance a k-d tree by re-building it. (For a limited number of points and in-memory processing, this may B+-trees...... over composite keys be sufficient.) Point Quad Trees " Point (Equality) Search We’re no longer wasting big amounts of space. Inserting Data Region Queries

k-d Trees It’s time to bring k-d trees to the disk. The K-D-B-Tree Balanced Construction • uses pages as an organizational unit K-D-B-Trees K-D-B-Tree Operations (e.g., each node in the K-D-B-tree fills a page) and Splitting a Point Page Splitting a Region Page -d tree-like layout • uses a k to organize each page. R-Trees Searching and Inserting Splitting R-Tree Nodes John T. Robinsion. The K-D-B-Tree: A Search Structure for Large % UB-Trees Multidimensional Dynamic Indexes. SIGMOD 1981. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

18 Multi-Dimensional K-D-B-Tree Idea Indexing Torsten Grust Region pages: page 0 ◦ ··· ◦ • contain entries ··· ◦ ◦ ··· region, pageID h i no null pointers ◦ ◦ • B+-trees. . . • form a balanced . . . over composite keys

tree Point Quad Trees page 3 Point (Equality) Search page 1 • all regions Inserting Data ◦ Region Queries ◦ ◦ disjoint and ··· ◦ k-d Trees ··· ◦ page 2 ··· rectangular Balanced Construction ◦ ◦ ◦ ··· K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page . . Splitting a Region Page . . Point pages: . R-Trees Searching and Inserting page 4 • contain entries Splitting R-Tree Nodes page 5 point, rid UB-Trees h + i Bit Interleaving / Z-Ordering • B -tree leaf B+-Trees over Z-Codes nodes Range Queries ; Spaces with High Dimensionality

Wrap-Up

19 Multi-Dimensional K-D-B-Tree Operations Indexing Torsten Grust

• Searching a K-D-B-Tree works straightforwardly:

• Within each page determine all regions Ri that contain B+-trees. . . the query point q (intersect with the query region Q). . . . over composite keys

• For each of the Ri, consult the page it points to and Point Quad Trees recurse. Point (Equality) Search Inserting Data • On point pages, fetch and return the corresponding Region Queries record for each matching data point pi. k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations • When inserting data, we keep the K-D-B-Tree balanced, Splitting a Point Page much like we did in the B+-tree: Splitting a Region Page R-Trees • Simply insert a region, pageID ( point, rid ) entry into a Searching and Inserting region page (pointh page) if there’si h sufficienti space. Splitting R-Tree Nodes UB-Trees • Otherwise, split the page. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

20 Multi-Dimensional K-D-B-Tree: Splitting a Point Page Indexing Torsten Grust

Splitting a point page p

B+-trees. . . 1 Choose a dimension i and an i-coordinate xi along which to . . . over composite keys

split, such that the split will result in two pages that are not Point Quad Trees overfull. Point (Equality) Search Inserting Data 2 Move data points p with pi < xi and pi > xi to new pages Region Queries p and p (respectively). k-d Trees left right Balanced Construction 3 Replace region, p in the parent of p with left region, pleft K-D-B-Trees h i h i K-D-B-Tree Operations right region, pright . Splitting a Point Page h i Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes Step 3 may cause an overflow in p’s parent and, hence, lead to a UB-Trees split of a region page. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

21 Multi-Dimensional K-D-B-Tree: Splitting a Region Page Indexing Torsten Grust

• Splitting a point page and moving its data points to the resulting pages is straightforward. • In case of a region page split, by contrast, some regions may both B+-trees. . . intersect with sides of the split. . . . over composite keys

Point Quad Trees Consider, e.g., page 0 on slide 19: Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees split K-D-B-Tree Operations Splitting a Point Page region crossing the split Splitting a Region Page R-Trees Searching and Inserting Splitting R-Tree Nodes • Such regions need to be split, too. UB-Trees Bit Interleaving / Z-Ordering • This can cause a recursive splitting downward (!) the tree. B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

22 Multi-Dimensional Example: Page 0 Split in K-D-B-Tree of slide 19 Indexing Torsten Grust Split of region page 0

new root

page 6 B+-trees...... over composite keys

Point Quad Trees ◦ ◦ Point (Equality) Search page 0 Inserting Data Region Queries ◦ ◦ k-d Trees Balanced Construction

page 3 K-D-B-Trees page 7 K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

page 1 page 2 R-Trees Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes • Root page 0 pages 0 and 6 ( creation of new root). Range Queries ⇒ Spaces with High • Region page 1 pages 1 and 7; (point pages not shown). Dimensionality ⇒ Wrap-Up 23 Multi-Dimensional K-D-B-Trees: Discussion Indexing Torsten Grust

K-D-B-Trees " are symmetric with respect to all dimensions,

" cluster data in a space-aware and page-oriented fashion, B+-trees. . . " are dynamic with respect to updates, and . . . over composite keys Point Quad Trees " can answer point queries and region queries. Point (Equality) Search Inserting Data Region Queries

However, k-d Trees - we still don’t have support for region data and Balanced Construction K-D-B-Trees - K-D-B-Trees (like k-d trees) will not handle deletes K-D-B-Tree Operations Splitting a Point Page dynamically. Splitting a Region Page R-Trees This is because we always partitioned the data space such that Searching and Inserting Splitting R-Tree Nodes • every region is rectangular and UB-Trees Bit Interleaving / Z-Ordering • regions never intersect. B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

24 Multi-Dimensional R-Trees Indexing Torsten Grust

R-trees do not have the disjointness requirement: • R-tree inner or leaf nodes contain region, pageID or h i B+-trees. . . region, rid entries (respectively). . . . over composite keys h i region is the minimum bounding rectangle that spans all Point Quad Trees Point (Equality) Search data items reachable by the respective pointer. Inserting Data • Every node contains between d and 2d entries ( B+-tree).4 Region Queries k-d Trees • Insertion and deletion algorithms keep an R-tree; balanced Balanced Construction K-D-B-Trees at all times. K-D-B-Tree Operations Splitting a Point Page R-trees allow the storage of point and region data. Splitting a Region Page R-Trees Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching and Inserting % Splitting R-Tree Nodes Searching. SIGMOD 1984. UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up 4 except the root node 25 Multi-Dimensional R-Tree: Example Indexing Torsten Grust

• order: d = 2 • region data ◦ ◦ B+-trees. . . ◦ page 0 . . . over composite keys

Point Quad Trees Point (Equality) Search

inner nodes Inserting Data Region Queries ◦ ◦ k-d Trees ◦ ··· ◦ ◦ page 3 Balanced Construction ··· ◦ ··· ◦ page 1 ◦ K-D-B-Trees page 2 ··· K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Searching and Inserting Splitting R-Tree Nodes page 8 UB-Trees page 9 Bit Interleaving / Z-Ordering

leaf nodes B+-Trees over Z-Codes page 6 Range Queries page 7 Spaces with High Dimensionality

Wrap-Up

26 Multi-Dimensional R-Tree: Searching and Inserting Indexing Torsten Grust

While searching an R-tree, we may have to descend into more than one child node for point and region queries ( range search in point quad trees, slide 13). % B+-trees...... over composite keys Inserting into an R-tree (cf. B+-tree insertion) Point Quad Trees Point (Equality) Search Inserting Data 1 Choose a leaf node n to insert the new entry. Region Queries k-d Trees • Try to minimize the necessary region enlargement(s). Balanced Construction K-D-B-Trees 2 full split If n is , it (resulting in n and n0) and distribute old K-D-B-Tree Operations and new entries evenly across n and n0. Splitting a Point Page Splitting a Region Page

• Splits may propagate bottom-up and eventually reach R-Trees + the root ( B -tree). Searching and Inserting % Splitting R-Tree Nodes 3 After the insertion, some regions in the ancestor nodes of n UB-Trees Bit Interleaving / Z-Ordering may need to be adjusted to cover the new entry. B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

27 “bad” split “good” split

Multi-Dimensional Splitting an R-Tree Node Indexing Torsten Grust To split an R-tree node, we generally have more than one alternative:

B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page

R-Trees Heuristic: Minimize the totally covered area. But: Searching and Inserting • Exhaustive search for the best split is infeasible. Splitting R-Tree Nodes UB-Trees • Guttman proposes two ways to approximate the search. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes • Follow-up papers (e.g., the R*-tree) aim at improving the Range Queries Spaces with High quality of node splits. Dimensionality

Wrap-Up

28 Multi-Dimensional Splitting an R-Tree Node Indexing Torsten Grust To split an R-tree node, we generally have more than one alternative:

B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations “bad” split “good” split Splitting a Point Page Splitting a Region Page

R-Trees Heuristic: Minimize the totally covered area. But: Searching and Inserting • Exhaustive search for the best split is infeasible. Splitting R-Tree Nodes UB-Trees • Guttman proposes two ways to approximate the search. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes • Follow-up papers (e.g., the R*-tree) aim at improving the Range Queries Spaces with High quality of node splits. Dimensionality

Wrap-Up

28 Multi-Dimensional R-Tree: Deletes Indexing Torsten Grust All R-tree invariants (see 25) are maintained during deletions.

1 If an R-tree node n underflows (i.e., less than d entries are left after a deletion), the whole node is deleted.

2 Then, all entries that existed in n are re-inserted into the B+-trees. . . R-tree. . . . over composite keys Point Quad Trees Note that Step 1 may lead to a recursive deletion of n’s parent. Point (Equality) Search Inserting Data • Deletion, therefore, is a rather expensive task in an R-tree. Region Queries k-d Trees Balanced Construction

K-D-B-Trees K-D-B-Tree Operations Spatial indexing in mainstream database implementations Splitting a Point Page Splitting a Region Page • Indexing in commercial database systems is typically based R-Trees Searching and Inserting on R-trees. Splitting R-Tree Nodes UB-Trees • Yet, only few systems implement them out of the box (e.g., Bit Interleaving / Z-Ordering PostgreSQL). Most require the licensing/use of specific B+-Trees over Z-Codes Range Queries

extensions. Spaces with High Dimensionality

Wrap-Up

29 Multi-Dimensional Bit Interleaving Indexing Torsten Grust • We saw earlier that a B+-tree over concatenated fields a, b does not help our case, because of the asymmetry betweenh i the role of a and b in the index key. • What happens if we interleave the bits of a and b (hence, + make the B -tree “more symmetric”)? B+-trees...... over composite keys

a = 42 b = 17 Point Quad Trees 00101010 00010001 Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction 0010101000010001 K-D-B-Trees K-D-B-Tree Operations a, b (concatenation) Splitting a Point Page h i Splitting a Region Page a = 42 b = 17 R-Trees Searching and Inserting 00101010 00010001 Splitting R-Tree Nodes UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries 0000100110001001 Spaces with High Dimensionality a and b interleaved Wrap-Up 30 Multi-Dimensional Z-Ordering Indexing Torsten Grust

a a

B+-trees...... over composite keys

Point Quad Trees Point (Equality) Search Inserting Data Region Queries

k-d Trees Balanced Construction

K-D-B-Trees b b K-D-B-Tree Operations a, b (concatenated) a and b interleaved Splitting a Point Page h i Splitting a Region Page R-Trees • Both approaches linearize all coordinates in the value space Searching and Inserting according to some order. see also slide 8 Splitting R-Tree Nodes % UB-Trees • Bit interleaving leads to what is called the Z-order. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes • The Z-order (largely) preserves spatial clustering. Range Queries Spaces with High Dimensionality

Wrap-Up

31 + Multi-Dimensional UB-Tree: B -trees Over Z-Codes Indexing Torsten Grust • Use a B+-tree to index Z-codes of multi-dimensional data. • Each leaf in the B+-tree describes an interval in the Z-space. • Each interval in the Z-space describes a region in the multi-dimensional data space: B+-trees...... over composite keys

Point Quad Trees + UB-Tree: B -tree over Z-codes Point (Equality) Search Inserting Data Region Queries

k-d Trees x8 Balanced Construction

x10 K-D-B-Trees x7 x9 K-D-B-Tree Operations x6 Splitting a Point Page Splitting a Region Page

x2 x4 0 3 4 20 21 35 36 47 48 63 R-Trees x1 x3 x5 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes • To retrieve all data points in a query region Q, try to touch Range Queries only those leave pages that intersect with Q. Spaces with High Dimensionality

Wrap-Up

32 Multi-Dimensional UB-Tree: Range Queries Indexing Torsten Grust After each page processed, perform an index re-scan ( ) to fetch the next leaf

page that intersects with Q. http://mistral.in.tum.de/

B+-trees...... over composite keys

Point Quad Trees UB-tree range search (Function ub_range(Q)) Point (Equality) Search Inserting Data 1 cur z(Q ) ; Region Queries ← bottom,left 2 while true do k-d Trees Balanced Construction // search B+-tree page containing cur ( slide 0.0) Example by Volker Markl and Rudolf Bayer, taken from % K-D-B-Trees 3 page search (cur); K-D-B-Tree Operations ← Splitting a Point Page 4 foreach data point p on page do Splitting a Region Page 5 if p is in Q then R-Trees 6 append p to result ; Searching and Inserting Splitting R-Tree Nodes

7 if region in page reaches beyond Qtop,right then UB-Trees Bit Interleaving / Z-Ordering 8 break ; B+-Trees over Z-Codes Range Queries // compute next Z-address using Q and data on current Spaces with High page Dimensionality 9 cur get_next_z_address (Q, page); Wrap-Up ← 33 Multi-Dimensional UB-Trees: Discussion Indexing Torsten Grust

• Routine get_next_z_address() (return next Z-address B+-trees. . . lying in the query region) is non-trivial and depends on the . . . over composite keys shape of the Z-region. Point Quad Trees Point (Equality) Search • UB-trees are fully dynamic, a property inherited from the Inserting Data underlying B+-trees. Region Queries k-d Trees • The use of other space-filling curves to linearize the data Balanced Construction K-D-B-Trees space is discussed in the literature, too. E.g., Hilbert curves. K-D-B-Tree Operations Splitting a Point Page I UB-trees have been commercialized in the Transbase® Splitting a Region Page R-Trees database system. Searching and Inserting Splitting R-Tree Nodes

UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries

Spaces with High Dimensionality

Wrap-Up

34 Multi-Dimensional Spaces with High Dimensionality Indexing Torsten Grust

For large k, all techniques that have been discussed become ineffective: • For example, for k = 100, we get 2100 1030 partitions per ≈ node in a point quad tree. Even with billions of data points, B+-trees. . . almost all of these are empty. . . . over composite keys Point Quad Trees • Consider a really big search region, cube-sized covering Point (Equality) Search Inserting Data 95 % of the range along each dimension: Region Queries data space k-d Trees Balanced Construction

query region K-D-B-Trees K-D-B-Tree Operations For k = 100, the probability of a point Splitting a Point Page Splitting a Region Page

being in this region is still only R-Trees 100 0.95 0.59 %. Searching and Inserting ≈ Splitting R-Tree Nodes 95 % UB-Trees 100 % Bit Interleaving / Z-Ordering B+-Trees over Z-Codes • We experience the curse of dimensionality here. Range Queries Spaces with High Dimensionality

Wrap-Up

35 Multi-Dimensional Page Selectivty for k-NN Search Indexing Torsten Grust N=50’000, image database, k=10

140 Scan R*-Tree 120 X-Tree B+-trees. . . VA-File . . . over composite keys 100 Point Quad Trees Point (Equality) Search Inserting Data 80 Region Queries k-d Trees 60 Balanced Construction K-D-B-Trees K-D-B-Tree Operations 40 Splitting a Point Page Splitting a Region Page

R-Trees % Vector/Leaf blocks visited 20 Searching and Inserting Splitting R-Tree Nodes

0 UB-Trees 0 5 10 15 20 25 30 35 40 45 Bit Interleaving / Z-Ordering Number of dimensions in vectors B+-Trees over Z-Codes Range Queries

Data: Stephen Bloch. What’s Wrong with High-Dimensionality Search. VLDB 2008. Spaces with High Dimensionality

Wrap-Up

36 Multi-Dimensional Wrap-Up Indexing Torsten Grust Point Quad Tree k-dimensional analogy to binary trees; main memory only. k-d Tree, K-D-B-Tree k-d tree: partition space one dimension at a time B+-trees. . . (round-robin); K-D-B-Tree: B+-tree-like organization with . . . over composite keys Point Quad Trees pages as nodes, nodes use a k-d-like structure internally. Point (Equality) Search Inserting Data R-Tree Region Queries regions within a node may overlap; fully dynamic; for point k-d Trees Balanced Construction

and region data. K-D-B-Trees K-D-B-Tree Operations UB-Tree Splitting a Point Page use space-filling curve (Z-order) to linearize k-dimensional Splitting a Region Page + R-Trees data, then use B -tree. Searching and Inserting Splitting R-Tree Nodes

Curse Of Dimensionality UB-Trees most indexing structures become ineffective for large k; fall Bit Interleaving / Z-Ordering B+-Trees over Z-Codes back to sequential scanning and Range Queries Spaces with High approximation/compression. Dimensionality

Wrap-Up

37