Introduction
Total Page:16
File Type:pdf, Size:1020Kb
Multi-Dimensional Indexing Torsten Grust Chapter 5 Multi-Dimensional Indexing k Indexing Points and Regions in Dimensions B+-trees. over composite keys Architecture and Implementation of Database Systems Point Quad Trees Point (Equality) Search Summer 2016 Inserting Data Region Queries k-d Trees Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page R-Trees Searching and Inserting Splitting R-Tree Nodes UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Torsten Grust Range Queries Wilhelm-Schickard-Institut für Informatik Spaces with High Dimensionality Universität Tübingen Wrap-Up 1 Multi-Dimensional More Dimensions. Indexing Torsten Grust One SQL query, many range predicates 1 SELECT* 2 FROM CUSTOMERS 3 WHERE ZIPCODE BETWEEN 8880 AND 8999 B+-trees. over composite keys 4 AND REVENUE BETWEEN 3500 AND 6000 Point Quad Trees Point (Equality) Search range predicate two Inserting Data This query involves a in dimensions. Region Queries k-d Trees Balanced Construction Typical use cases for multi-dimensional data include K-D-B-Trees K-D-B-Tree Operations • geographical data (GIS: Geographical Information Splitting a Point Page Splitting a Region Page Systems), R-Trees multimedia retrieval e.g. Searching and Inserting • ( , image or video search), Splitting R-Tree Nodes • OLAP (Online Analytical Processing), UB-Trees Bit Interleaving / Z-Ordering • queries against encoded data (e.g., XML) B+-Trees over Z-Codes Range Queries Spaces with High Dimensionality Wrap-Up 2 Multi-Dimensional . More Challenges. Indexing Torsten Grust Queries and data can be points or regions B+-trees. over composite keys point queryregion query Point Quad Trees region inclusion Point (Equality) Search region data or intersection Inserting Data Region Queries point data k-d Trees Balanced Construction K-D-B-Trees most interesting: k-nearest-neighbor search (k-NN) K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page R-Trees Searching and Inserting . and you can come up with many more meaningful types of Splitting R-Tree Nodes UB-Trees queries over multi-dimensional data. Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Note: All equality searches can be reduced to one-dimensional queries. Range Queries Spaces with High Dimensionality Wrap-Up 3 Multi-Dimensional Points, Lines, and Regions Indexing Tarifzonen | Fare zones Torsten Grust FEUERTHALEN FLURLINGEN16 SCHLOSS LAUFEN A. RH. DACHSEN RHEINAU STAMMHEIM MARTHALEN OSSINGEN 62 OBERSTAMM- 15 61 HEIM OBERNEUNFORN WIL ZH RAFZ HÜNTWANGEN KLEINANDELFINGEN 14 ANDELFINGEN WASTERKINGEN HÜNTWANGEN- RÜDLINGEN WIL 24 60 FLAACH THALHEIM-ALTIKON KAISERSTUHL AG 13 BUCHBERG ALTIKON HENGGART B+-trees. ZWEIDLEN ELLIKON EGLISAU DINHARD AN DER THUR TEUFEN 63 GLATTFELDEN HETTLINGEN . over composite keys 18 SEUZACH RICKENBACH-ATTIKON STADEL NEFTENBACH REUTLINGEN GUNDETSWIL BEI NIEDERGLATT BÜLACH 23 WALLRÜTI Point Quad Trees BACHS WIESENDANGEN EMBRACH- PFUNGEN- OBERWINTERTHUR NEFTENBACH ELSAU HÖRI RORBAS HEGI ELGG NIEDERWENINGENNIEDERWENINGENDORFSCHÖFFLISDORF- GRÜZE OBERWENINGEN 12 WÜLFLINGEN Point (Equality) Search STEINMAUR WINTERTHUR RÄTERSCHEN SCHOTTIKON NIEDERGLATT OBEREMBRACH TÖSS SCHLEINIKON SEEN ELGG DIELSDORF Inserting Data * EIDBERG 64 REGENSBERG NIEDERHASLI BRÜTTEN 17 20 SCHLATT BEI ZÜRICH SENNHOF-KYBURG BOPPELSEN FLUGHAFEN 21 WINTERTHUR KLOTEN Region Queries BUCHS- KOLLBRUNN DÄLLIKON NÜRENSDORF KYBURG OTELFINGEN OBERGLATT RÄMISMÜHLE- RÜMLANG KEMPTTHAL ZELL HÜTTIKON TURBENTHAL OTELFINGEN 11 BALSBERG RIKON GOLFPARK GLATTBRUGG WEISSLINGEN REGENSDORF- BASSERSDORF k-d Trees OETWIL AN WATT AFFOLTERN OPFIKON DIETLIKON 70 SITZBERG DER LIMMAT 22 71 WALLISELLEN WILDBERG SPREITENBACH SEEBACH WILA SHOPPING CENTER GEROLDSWIL EFFRETIKON Balanced Construction UNTERENGSTRINGEN * ILLNAU DIETIKON 10 OERLIKON 84 SCHLIEREN ALTSTETTEN 21 VOLKETSWIL 35 STELZENACKER WIPKINGEN DÜBENDORF RUSSIKON SALAND STERNENBERG GLANZENBERG HARDBRÜCKE STETTBACH FEHRALTORF K-D-B-Trees BERGDIETIKON URDORF ZÜRICHHB 30 HITTNAU WIEDIKON SCHWERZENBACH ZH REPPISCHHOF SELNAU BAUMA STADELHOFEN NÄNIKON-GREIFENSEE UITIKON WEIHERMATT FÄLLANDEN K-D-B-Tree Operations WALDEGG PFÄFFIKON ZH GIESSHÜBEL BIRMENSDORF ZH ENGE 31 REHALP STEG TIEFENBRUNNEN USTER 72 Splitting a Point Page UETLIBERG ZOLLIKERBERG ZOLLIKON AATHAL BÄRETSWIL 54 WOLLISHOFEN 30 KEMPTEN FISCHENTHAL ZUMIKON MAUR Splitting a Region Page BONSTETTEN- LEIMBACH GOLDBACH WETTSWIL 40 73 KILCHBERG WETZIKON KÜSNACHT FORCH GOSSAU ZH RINGWIL GIBSWIL MÖNCHALTORF OBERLUNKHOFEN SOOD- ZH ERLENBACH ZH OBERLEIMBACH WINKEL EGG R-Trees 55 RÜSCHLIKON AM ZÜRICHSEE 32 HINWIL ADLISWIL 50 HERRLIBERG- 33 HEDINGEN ESSLINGEN 34 SIHLAU FELDMEILEN THALWIL GRÜNINGEN OBERDÜRNTEN OTTENBACH WILDPARK-HÖFLI BUBIKON Searching and Inserting 41 42 WALD FALTIGBERG AFFOLTERN LANGNAU-GATTIKON MEILEN OETWIL 43 OBERRIEDEN UETIKON TANN-DÜRNTEN AM ALBIS OBERRIEDEN DORF AM SEE Splitting R-Tree Nodes HORGEN HOMBRECHTIKON 33 OBFELDEN AEUGST SIHLWALD MÄNNEDORF RÜTI ZH AM ALBIS HORGEN OBERDORF SIHLBRUGG KEMPRATEN 56 AU ZH METTMENSTETTEN HAUSEN 51 AM ALBIS STÄFA UERIKON FELDBACH JONA 80 WAGEN UB-Trees MASCHWANDEN WÄDENSWIL KNONAU KAPPEL 52 RAPPERSWIL BLUMENAU AM ALBIS Bit Interleaving / HURDEN HIRZEL 83 BURGHALDEN RICHTERSWILBÄCH FREIENBACH 82 Z-Ordering PFÄFFIKON 53 GRÜNENFELD FREIENBACH SCHÖNENBERG ZH WOLLERAU SAMSTAGERN B+-Trees over Z-Codes SZ Bahnstrecke mit Bahnhof Railway with Station SOB HÜTTEN Regionalbus Regional bus SCHINDELLEGI- 81 Range Queries FEUSISBERG Schiff Boat 51 Zonennummer 51 Zone number Die Tarifzonen 10 (Stadt Zürich) Fare zones 10 (Zurich City) and 20 10 10 Spaces with High * und 20 (Stadt Winterthur) werden * (Winterthur City) each count as two für die Preisberechnung doppelt zones when calculating the fare. 20* 20* gezählt. Dimensionality Verbundfahrausweise berechtigen You can travel as often as you like on während der Gültigkeitsdauer zu your ZVV ticket within the fare zones beliebigen Fahrten in den gelösten and period shown on the ticket. Zonen. © Zürcher Verkehrsverbund/PostAuto Region Zürich, 12.2007 Wrap-Up 4 Multi-Dimensional . More Solutions Indexing Torsten Grust Recent proposals for multi-dimension index structures Quad Tree [Finkel 1974] K-D-B-Tree [Robinson 1981] R-tree [Guttman 1984] Grid File [Nievergelt 1984] + B+-trees. R -tree [Sellis 1987] LSD-tree [Henrich 1989] . over composite keys R*-tree [Geckmann 1990] hB-tree [Lomet 1990] Point Quad Trees Point (Equality) Search Vp-tree [Chiueh 1994] TV-tree [Lin 1994] Inserting Data Π UB-tree [Bayer 1996] hB- -tree [Evangelidis 1995] Region Queries k-d Trees SS-tree [White 1996] X-tree [Berchtold 1996] Balanced Construction M-tree [Ciaccia 1996] SR-tree [Katayama 1997] K-D-B-Trees K-D-B-Tree Operations Pyramid [Berchtold 1998] Hybrid-tree [Chakrabarti 1999] Splitting a Point Page DABS-tree [Böhm 1999] IQ-tree [Böhm 2000] Splitting a Region Page Slim-tree [Faloutsos 2000] Landmark file [Böhm 2000] R-Trees Searching and Inserting P-Sphere-tree [Goldstein 2000] A-tree [Sakurai 2000] Splitting R-Tree Nodes UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes None of these provides a “fits all” solution. Range Queries Spaces with High Dimensionality Wrap-Up 5 + Multi-Dimensional Can We Just Use a B -tree? Indexing Torsten Grust • Can two B+-trees, one over ZIPCODE and over REVENUE, adequately support a two-dimensional range query? Querying a rectangle B+-trees. over composite keys Point Quad Trees Point (Equality) Search Inserting Data Region Queries k-d Trees Balanced Construction K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page • Can only scan along either index at once, and both of them R-Trees Searching and Inserting produce many false hits. Splitting R-Tree Nodes UB-Trees Bit Interleaving / • If all you have are these two indices, you can do index Z-Ordering intersection: B+-Trees over Z-Codes perform both scans in separation to obtain the Range Queries rids of candidate tuples. Then compute the intersection Spaces with High between the two rid lists (DB2: IXAND). Dimensionality Wrap-Up 6 Multi-Dimensional IBM DB2: Index Intersection Indexing Torsten Grust IBM DB2 uses index Plan selected by the intersection (operator IXAND) query optimizer B+-trees. 1 SELECT* . over composite keys 2 FROM ACCEL Point Quad Trees 3 WHERE PRE BETWEEN0 AND 10000 Point (Equality) Search Inserting Data 4 AND POST BETWEEN 10000 AND 20000 Region Queries k-d Trees Relevant indexes defined over table Balanced Construction K-D-B-Trees ACCEL: K-D-B-Tree Operations Splitting a Point Page 1 IPOST over column POST Splitting a Region Page R-Trees 2 SQL0801192024500 over Searching and Inserting column PRE (primary key) Splitting R-Tree Nodes UB-Trees Bit Interleaving / Z-Ordering B+-Trees over Z-Codes Range Queries Spaces with High Dimensionality Wrap-Up 7 Multi-Dimensional Can Composite Keys Help? Indexing Indexes with composite keys Torsten Grust REVENUE REVENUE B+-trees. over composite keys Point Quad Trees Point (Equality) Search Inserting Data Region Queries ZIPCODE ZIPCODE k-d Trees Balanced Construction hREVENUE; ZIPCODEi index hZIPCODE; REVENUEi index K-D-B-Trees K-D-B-Tree Operations Splitting a Point Page Splitting a Region Page R-Trees • Almost the same thing! Searching and Inserting Splitting R-Tree Nodes Indices over composite keys are not symmetric: The major UB-Trees + Bit Interleaving / attribute dominates the organization of the B -tree. Z-Ordering B+-Trees over Z-Codes • But: Since the minor attribute is also stored in the index Range Queries (part of the keys k), we may