15-823 Questions Advanced Topics in Database Systems Performance What is spatial data ?
How to represent spatial data?
Spatial Application What are the queries?
How to process the query? Ti ankai Tu Computer Science Department [email protected] mu.edu
Spatial Application 2
Spatial Data:Definition Spatial Data:Features
Data that pertains to the space occupied Geometric and varied by objects Naturally high dimensional Conceptually, points, lines, rectangles, Can be either discrete or continuous surfaces, volumes and etc. May associate with non-spatial attributes Physically, cities, rivers, roads, states, crop coverage, mountain ranges etc. Applications include environmental monitoring, geographic information systems, earthquake research etc.
Spatial Application 3 Spatial Application 4
Spatial Data and DBMS A Motivating Example
How: turn spatial data into tuples Pittsburgh road database Termed representative point Suppose the roads are straight lines (forget about Forbes for this example) Parameterized representation A representative point consists of two endpoints, that Non-spatial attributes stored together is, a tuple of four four items It works for simple retrieval and queries Mapping from a two-dimensional (drawing)spacetoa four-dimensional (transformed) space Why not: unnatural Queries Dimension of representative point too high What are the roads originating from Point State Park? Deducing dimensionality lose important Which road is the closest to Wean Hall? physical information such as proximity Which roads go through CMU? Clumsy for queries involving space
Spatial Application 5 Spatial Application 6
1 Spatial Data Representation Spatial Indexing
Follow the natural way: spatial occupancy All the spatial occupancy methods are Decompose the space from which the data characterized as employing spatial is drawn into buckets indexing Four principal approaches Each block/cell/rectangle only contains Minimum bounding rectangles: data-dependent information about whether or not it’s e.g. R-tree [Guttman 1984], R* -tree [Beckmann et al. 1990] Disjoint cells: data-dependent occupied by the object or part of the object + e.g. R -tree [Sellis et al. 1987], Cell tree [Günther 1988] The information is usually in the form of a Uniform size blocks: data-independent e.g. Uniform grid [Franklin 1984] pointer to a descriptor of the object Adaptive regular blocks: data-independent e.g. Quadtree-based approach [Same et al. 1985]
Spatial Application 7 Spatial Application 8
Spatial Indexing (cont.) Case Study:R-Tree
The shaded block only Goal: efficient retrieval of objects records the fact line according to their spatial location b segment c crosses it a h e The part of the line that Conventional DBMS indexing structure g passes through or i Hashing: cannot handle range query f terminates in a block is d termed q-edge B-tree: cannot handle multi-dimensional data c Each q-edge in the block is Key ideas: represented by a pointer to a record containing the end Maintain balanced hierarchical tree structure Example collection of line segments embedded in a 4x4 grid points of the line segment Represent objects by intervals in several No information about what dimensions part the line crosses it Take care of secondary memory paging
Spatial Application 9 Spatial Application 10
R-Tree:Example R-Tree:Example (cont.)
R1 R1 b R3 b R3 b R1 R2 a R2 a R6 R2 a R6 h e h e h e g g g i i i R3 R4 R5 R6 f f f d d d c R4 c R4 c R5 R5 a b d g h c i e f
Example collection of line segments Spatial extents of the bounding Spatial extents of the bounding R-tree representation of line segments embedded in a 4x4 grid rectangles for R-tree rectangles for R-tree
Spatial Application 11 Spatial Application 12
2 R-Tree:Properties R-Tree:Properties (cont.)
Height-balanced tree similar to B-tree Every node contains between m and M All leaves appear on the same level entries unless it is the root Leaf nodes contain index record entries of M : maximum number of entries that fit in one node. the form (I,tuple-identifier) m: minimum number of entries in a node (m ≤ M/2) Nodes correspond to disk pages tuple-identifier: a tuple in database I:an n-dimensional rectangle that bounds the spatial Root node has at least two children unless object indexed it is a leaf Non-leaf nodes contain entries of the form (I,child-pointers) child-pointer: the address of a lower node in the R-tree I : bounds all rectangles in the lower node’sentries
Spatial Application 13 Spatial Application 14
R-Tree:Search R-Tree:Search (cont.)
Search Algorithm Acclaimed strength R1 Eliminate irrelevant regions Given an R-tree rooted at T, find all records R3 b of the indexed space and whose rectangles overlap search rectangle S R2 a R6 h e examine only data near the g Denote the rectangle part of an index entry E by EI, search area, thus visit only a i and the tuple-identifier or child-pointers by Ep f small number of nodes d Search subtree:IfT is not a leaf, check each entry E Q R4 c Criticized drawback: to determine whether EI overlaps S; For all R5 overlapping entries, invoke Search subtree on the More than one subtree under tree rooted at Ep a node may be searched, Spatial extents of the bounding thus potential search the Search leaf node:IfT is a leaf, check all entries E to rectangles for R-tree entire spatial database determine whether EI overlaps S.Ifso,E is a qualifying record Bad example:find the line segment passing through Q
Spatial Application 15 Spatial Application 16
R-Tree:Insert R-Tree:Insert (cont.)
Similar to insertion in B-tree How to choose leaf node New index records are added to the Descend the tree leaves, nodes overflow are split, and split At each non-leaf node, choose the entry that propagate up the tree needs least enlargement to include the new index entry; follow the child-pointer link of this Key issues: entry to find next level non-leaf node Choose leaf node for the first insertion How to adjust parent nodes Adjust nodes as changes propagate up Enlarge I of the entry in the parent node still Split node if it becomes too full ( > M ) tightly encloses all entry rectangles in child node
Spatial Application 17 Spatial Application 18
3 R-Tree:Insert (cont.) R-Tree:Deletion
How to split full node Different from deletion in B-tree in how to Objective: reduce the number of node to be handle under-full ( Spatial Application 19 Spatial Application 20 R-Tree:Deletion (cont.) R-Tree:Update How to shrink rectangle Update is simple Shrink to tightly contain all the entry Delete original index from R-tree rectangles in the child node Update the index How to handle under-full node Reinsert the index into the R-tree Not merged with sibling Other search operations are simply Removed aside till the end and then entries variants of the one described arereinsertedintotheR-tree Functionally equivalent with improve disk cache hit Incrementally refine the spatial structure and prevent gradual deterioration due to permanently fixed parent-children relationship Spatial Application 21 Spatial Application 22 R-Tree:Wrap-up Conclusion Exploit strength of B-tree and geometry of Spatial application is important, esp. in science and underlying objects engineering research Dynamically update index Conventional database technique failed to capture Use heuristic optimization(linear algorithm) to bound overhead the essential nature of spatial application Many variants have been proposed to New methods and algorithms have been proposed, improve R-tree each trying to solve part of the problem Hard to model continuous geo-spatial data such as a field (personal opinion) No cure-all solution is perceived, which results in very active research in this area Spatial Application 23 Spatial Application 24 4