<<

15-823 Questions Advanced Topics in Systems Performance  What is spatial data ?

 How to represent spatial data?

Spatial Application  What are the queries?

 How to process the query? Ti ankai Tu Computer Science Department [email protected] mu.edu

Spatial Application 2

Spatial Data:Definition Spatial Data:Features

 Data that pertains to the space occupied  Geometric and varied by objects  Naturally high dimensional  Conceptually, points, lines, rectangles,  Can be either discrete or continuous surfaces, volumes and etc.  May associate with non-spatial attributes  Physically, cities, rivers, roads, states, crop coverage, mountain ranges etc.  Applications include environmental monitoring, geographic information systems, earthquake research etc.

Spatial Application 3 Spatial Application 4

Spatial Data and DBMS A Motivating Example

 How: turn spatial data into tuples  Pittsburgh road database  Termed representative point  Suppose the roads are straight lines (forget about Forbes for this example)  Parameterized representation  A representative point consists of two endpoints, that  Non-spatial attributes stored together is, a tuple of four four items  It works for simple retrieval and queries  Mapping from a two-dimensional (drawing)spacetoa four-dimensional (transformed) space  Why not: unnatural  Queries  Dimension of representative point too high  What are the roads originating from Point State Park?  Deducing dimensionality lose important  Which road is the closest to Wean Hall? physical information such as proximity  Which roads go through CMU?  Clumsy for queries involving space

Spatial Application 5 Spatial Application 6

1 Spatial Data Representation Spatial Indexing

 Follow the natural way: spatial occupancy  All the spatial occupancy methods are  Decompose the space from which the data characterized as employing spatial is drawn into buckets indexing  Four principal approaches  Each block/cell/rectangle only contains  Minimum bounding rectangles: data-dependent information about whether or not it’s  e.g. - [Guttman 1984], R* -tree [Beckmann et al. 1990]  Disjoint cells: data-dependent occupied by the object or part of the object +  e.g. R -tree [Sellis et al. 1987], Cell tree [Günther 1988]  The information is usually in the form of a  Uniform size blocks: data-independent  e.g. Uniform grid [Franklin 1984] pointer to a descriptor of the object  Adaptive regular blocks: data-independent  e.g. -based approach [Same et al. 1985]

Spatial Application 7 Spatial Application 8

Spatial Indexing (cont.) Case Study:R-Tree

 The shaded block only  Goal: efficient retrieval of objects records the fact line according to their spatial location b segment c crosses it a  h e  The part of the line that Conventional DBMS indexing structure g passes through or  i Hashing: cannot handle range query f terminates in a block is d termed q-edge  B-tree: cannot handle multi-dimensional data c  Each q-edge in the block is  Key ideas: represented by a pointer to a record containing the end  Maintain balanced hierarchical tree structure Example collection of line segments embedded in a 4x4 grid points of the line segment  Represent objects by intervals in several  No information about what dimensions part the line crosses it  Take care of secondary memory paging

Spatial Application 9 Spatial Application 10

R-Tree:Example R-Tree:Example (cont.)

R1 R1 b R3 b R3 b R1 R2 a R2 a R6 R2 a R6 h e h e h e g g g i i i R3 R4 R5 R6 f f f d d d c R4 c R4 c R5 R5 a b d g h c i e f

Example collection of line segments Spatial extents of the bounding Spatial extents of the bounding R-tree representation of line segments embedded in a 4x4 grid rectangles for R-tree rectangles for R-tree

Spatial Application 11 Spatial Application 12

2 R-Tree:Properties R-Tree:Properties (cont.)

 Height-balanced tree similar to B-tree  Every node contains between m and M  All leaves appear on the same level entries unless it is the root  Leaf nodes contain index record entries of  M : maximum number of entries that fit in one node. the form (I,tuple-identifier)  m: minimum number of entries in a node (m ≤ M/2)  Nodes correspond to disk pages  tuple-identifier: a tuple in database  I:an n-dimensional rectangle that bounds the spatial  Root node has at least two children unless object indexed it is a leaf  Non-leaf nodes contain entries of the form (I,child-pointers)  child-pointer: the address of a lower node in the R-tree  I : bounds all rectangles in the lower node’sentries

Spatial Application 13 Spatial Application 14

R-Tree:Search R-Tree:Search (cont.)

 Search Algorithm  Acclaimed strength R1  Eliminate irrelevant regions  Given an R-tree rooted at T, find all records R3 b of the indexed space and whose rectangles overlap search rectangle S R2 a R6 h e examine only data near the g  Denote the rectangle part of an index entry E by EI, search area, thus visit only a i and the tuple-identifier or child-pointers by Ep f small number of nodes d  Search subtree:IfT is not a leaf, check each entry E Q R4 c  Criticized drawback: to determine whether EI overlaps S; For all R5 overlapping entries, invoke Search subtree on the  More than one subtree under tree rooted at Ep a node may be searched, Spatial extents of the bounding thus potential search the  Search leaf node:IfT is a leaf, check all entries E to rectangles for R-tree entire determine whether EI overlaps S.Ifso,E is a qualifying record  Bad example:find the line segment passing through Q

Spatial Application 15 Spatial Application 16

R-Tree:Insert R-Tree:Insert (cont.)

 Similar to insertion in B-tree  How to choose leaf node  New index records are added to the  Descend the tree leaves, nodes overflow are split, and split  At each non-leaf node, choose the entry that propagate up the tree needs least enlargement to include the new index entry; follow the child-pointer link of this  Key issues: entry to find next level non-leaf node  Choose leaf node for the first insertion  How to adjust parent nodes  Adjust nodes as changes propagate up  Enlarge I of the entry in the parent node still  Split node if it becomes too full ( > M ) tightly encloses all entry rectangles in child node

Spatial Application 17 Spatial Application 18

3 R-Tree:Insert (cont.) R-Tree:Deletion

 How to split full node  Different from deletion in B-tree in how to  Objective: reduce the number of node to be handle under-full (

Spatial Application 19 Spatial Application 20

R-Tree:Deletion (cont.) R-Tree:Update

 How to shrink rectangle  Update is simple  Shrink to tightly contain all the entry  Delete original index from R-tree rectangles in the child node  Update the index  How to handle under-full node  Reinsert the index into the R-tree  Not merged with sibling  Other search operations are simply  Removed aside till the end and then entries variants of the one described arereinsertedintotheR-tree  Functionally equivalent with improve disk cache hit  Incrementally refine the spatial structure and prevent gradual deterioration due to permanently fixed parent-children relationship

Spatial Application 21 Spatial Application 22

R-Tree:Wrap-up Conclusion

 Exploit strength of B-tree and geometry of  Spatial application is important, esp. in science and underlying objects engineering research  Dynamically update index  Conventional database technique failed to capture  Use heuristic optimization(linear algorithm) to bound overhead the essential nature of spatial application  Many variants have been proposed to  New methods and algorithms have been proposed, improve R-tree each trying to solve part of the problem  Hard to model continuous geo-spatial data such as a field (personal opinion)  No cure-all solution is perceived, which results in very active research in this area

Spatial Application 23 Spatial Application 24

4