Chapter 5 Multi-Dimensional Indexing
Indexing Points and Regions in k Dimensions

Architecture and Implementation of Database Systems
Summer 2013

Torsten Grust
Wilhelm-Schickard-Institut für Informatik
Universität Tübingen

B+-trees ...
    ... over composite keys
Point Quad Trees
    Point (Equality) Search
    Inserting Data
    Region Queries
k-d Trees
    Balanced Construction
K-D-B-Trees
    K-D-B-Tree Operations
    Splitting a Point Page
    Splitting a Region Page
R-Trees
    Searching and Inserting
    Splitting R-Tree Nodes
UB-Trees
    Bit Interleaving / Z-Ordering
    B+-Trees over Z-Codes
    Range Queries
Spaces with High Dimensionality
Wrap-Up


More Dimensions ...

One SQL query, many range predicates

    SELECT *
    FROM   CUSTOMERS
    WHERE  ZIPCODE BETWEEN 8880 AND 8999
      AND  REVENUE BETWEEN 3500 AND 6000

This query involves a range predicate in two dimensions.

Typical use cases for multi-dimensional data include
• geographical data (GIS: Geographical Information Systems),
• multimedia retrieval (e.g., image or video search),
• OLAP (Online Analytical Processing),
• queries against encoded data (e.g., XML).


... More Challenges ...

Queries and data can be points or regions.
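To make the point/region distinction concrete, here is a minimal Python sketch of the four basic predicates, evaluated by brute force without any index. The class and function names (Point, Rect, point_query, region_query, and so on) and the sample data are hypothetical and chosen only for illustration; the query rectangle reuses the ZIPCODE and REVENUE bounds of the SQL example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with inclusive bounds."""
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float

    def contains_point(self, p: Point) -> bool:
        return self.x_lo <= p.x <= self.x_hi and self.y_lo <= p.y <= self.y_hi

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap iff their extents overlap in every dimension.
        return (self.x_lo <= other.x_hi and other.x_lo <= self.x_hi
                and self.y_lo <= other.y_hi and other.y_lo <= self.y_hi)

    def includes(self, other: "Rect") -> bool:
        # `other` lies entirely inside this rectangle.
        return (self.x_lo <= other.x_lo and other.x_hi <= self.x_hi
                and self.y_lo <= other.y_lo and other.y_hi <= self.y_hi)


def point_query(points, q):
    """Point query on point data: exact match in all dimensions."""
    return [p for p in points if p == q]


def region_query(points, region):
    """Region query on point data: all points inside the query rectangle."""
    return [p for p in points if region.contains_point(p)]


def region_intersection(regions, q):
    """Region query on region data: all regions overlapping q."""
    return [r for r in regions if q.intersects(r)]


def region_inclusion(regions, q):
    """Region query on region data: all regions fully inside q."""
    return [r for r in regions if q.includes(r)]


# Made-up (ZIPCODE, REVENUE) points; the query rectangle mirrors the SQL example.
CUSTOMERS = [Point(8900, 4000), Point(8900, 7000), Point(8000, 4000)]
QUERY = Rect(8880, 3500, 8999, 6000)

print(region_query(CUSTOMERS, QUERY))
```

Each function scans all data items, which is exactly the cost a multi-dimensional index is meant to avoid.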
[Figure: a point query and a region query (region inclusion or intersection), posed against point data and region data.]

Most interesting: k-nearest-neighbor search (k-NN)

... and you can come up with many more meaningful types of queries over multi-dimensional data.

Note: All equality searches can be reduced to one-dimensional queries.


Points, Lines, and Regions

[Figure: ZVV fare zone map ("Tarifzonen | Fare zones") of the Zurich region. It shows stations, railway, regional bus, and boat lines, and numbered fare zones. Legend: fare zones 10 (Zurich City) and 20 (Winterthur City) each count as two zones when calculating the fare. © Zürcher Verkehrsverbund/PostAuto Region Zürich, 12.2007]


... More Solutions

Recent proposals for multi-dimensional index structures

Quad Tree [Finkel 1974]           K-D-B-Tree [Robinson 1981]
R-tree [Guttman 1984]             Grid File [Nievergelt 1984]
R+-tree [Sellis 1987]             LSD-tree [Henrich 1989]
R*-tree [Beckmann 1990]           hB-tree [Lomet 1990]
Vp-tree [Chiueh 1994]             TV-tree [Lin 1994]
UB-tree [Bayer 1996]              hB-Π-tree [Evangelidis 1995]
SS-tree [White 1996]              X-tree [Berchtold 1996]
M-tree [Ciaccia 1996]             SR-tree [Katayama 1997]
Pyramid [Berchtold 1998]          Hybrid-tree [Chakrabarti 1999]
DABS-tree [Böhm 1999]             IQ-tree [Böhm 2000]
Slim-tree [Faloutsos 2000]        Landmark file [Böhm 2000]
P-Sphere-tree [Goldstein 2000]    A-tree [Sakurai 2000]

None of these provides a "fits all" solution.


Can We Just Use a B+-tree?

• Can two B+-trees, one over ZIPCODE and one over REVENUE, adequately support a two-dimensional range query?

[Figure: Querying a rectangle]

• We can only scan along one index at a time, and each scan on its own produces many false hits.
• If all you have are these two indices, you can do index intersection: perform both scans separately to obtain the rids of candidate tuples. Then compute the intersection between the two rid lists (DB2: IXAND).


IBM DB2: Index Intersection

IBM DB2 uses index intersection (operator IXAND).

[Figure: Plan selected by the query optimizer]

    SELECT *
    FROM   ACCEL
    WHERE  PRE  BETWEEN 0 AND 10000
      AND  POST BETWEEN 10000 AND 20000

Relevant indexes defined over table ACCEL:
1 IPOST over column POST
2 SQL0801192024500 over column PRE (primary key)


Can Composite Keys Help?

Indexes with composite keys

[Figure: two plots over the axes ZIPCODE and REVENUE, one for the ⟨REVENUE, ZIPCODE⟩ index and one for the ⟨ZIPCODE, REVENUE⟩ index]

• Almost the same thing!
  Indices over composite keys are not symmetric: The major attribute dominates the organization of the B+-tree.
• But: Since the minor attribute is also stored in the index (part of