CSE 326: Data Structures B-Trees and B+ Trees

CSE 326: Data Structures B-Trees and B+ Trees

Announcements (4/30/08) • Midterm on Friday CSE 326: Data Structures • Special office hour: 4:30-5:30 Thursday in Jaech B-Trees and B+ Trees Gallery (6th floor of CSE building) – This is instead of my usual 11am office hour. Brian Curless • Reading for this lecture: Weiss Sec. 4.7 Spring 2008 2 Traversing very large datasets Memory considerations Suppose we had very many pieces of data (as in a What is in a tree node? In an object? database), e.g., n = 230 ≈ 109. Node: How many (worst case) hops through the tree to find a Object obj; node? Node left; Node right; •BST Node parent; Object: Key key; • AVL …data… Suppose the data is 1KB. How much space does the tree take? •Splay How much of the data can live in 1GB of RAM? 3 4 Cycles to access: CPU Minimizing random disk access Registers 1 In our example, almost all of our data structure is on L1 Cache 2 disk. L2 Cache 30 Thus, hopping through a tree amounts to random accesses to disk. Ouch! Main memory 250 How can we address this problem? Disk Random: 30,000,000 Streamed: 5000 5 6 M-ary Search Tree B-Trees Suppose, somehow, we devised a search tree with How do we make an M-ary search tree work? maximum branching factor M: • Each node has (up to) M-1 keys. •Order property: – subtree between two keys x and y 3 7 12 21 contain leaves with values v such that x < v < y Complete tree has height: # hops for find: x<3 3<x<7 7<x<12 12<x<21 21<x Runtime of find: 7 8 B-Tree Structure Properties B-Tree: Example Root (special case) B-Tree with M = 4 – has between 2 and M children (or root could be a leaf) Internal nodes 10 42 – store up to M-1 keys – have between ⎡M/2⎤ and M children 6 20 27 34 50 Leaf nodes – store between ⎡(M-1)/2⎤ and M-1 sorted keys 1 2 8 9 12 14 16 22 24 28 32 35 38 39 44 47 52 60 – all at the same depth 9 10 B+ Trees B+ Tree Structure Properties In a B+ tree, the internal nodes have no data – only the Root (special case) leaves do! – has between 2 and M children (or root could be a leaf) • Each internal node still has (up to) M-1 keys: Internal nodes •Order property: – store up to M-1 keys – subtree between two keys x and y 3 7 12 21 – have between ⎡M/2⎤ and M children contain leaves with values v Leaf nodes such that x ≤ v < y – where data is stored – Note the “≤” – all at the same depth • Leaf nodes have up to L – contain between ⎡L/2⎤ and L data items sorted keys. x<3 3≤x<7 7≤x<12 12≤x<21 21≤x 11 12 B+ Tree: Example Disk Friendliness B+ Tree with M = 4 (# pointers in internal node) and L = 5 (# data items in leaf) What makes B+ trees disk-friendly? Data objects… 12 44 1. Many keys stored in a node which I’ll ignore • All brought to memory/cache in one disk access. in slides 6 20 27 34 50 2. Internal nodes contain only keys; 1, AB.. 6 12 20 27 34 44 50 Only leaf nodes contain keys and actual data 2, GH.. 8 14 22 28 38 47 60 All leaves 4, XY.. 9 16 24 32 39 49 70 at the same • Much of tree structure can be loaded into memory 10 17 41 depth irrespective of data object size 19 • Data actually resides in disk 13 14 Definition for later: “neighbor” is the next sibling to the left or right. B+ trees vs. AVL trees Building a B+ Tree with Insertions Suppose again we have n = 230 ≈ 109 items: • Depth of AVL Tree Insert(3)Insert(18) Insert(14) • Depth of B+ Tree with M = 256, L = 256 The empty B-Tree Great, but how to we actually make a B+ tree and keep it M = 3 L = 3 balanced…? 15 16 18 18 18 3 18 3 18 3 18 Insert(32) Insert(36) 14 30 14 30 14 30 3 3 Insert(30) 14 14 18 18 18 Insert(15) 3 18 14 30 M = 3 L = 3 17 M = 3 L = 3 18 18 32 18 32 3 18 32 3 18 32 Insert(12,40,45,38) Insert(16) 18 14 30 36 14 30 36 18 15 15 15 32 15 32 40 3 15 18 32 3 15 18 32 40 18 32 14 16 30 36 12 16 30 36 45 14 38 18 32 30 36 19 20 M = 3 L = 3 M = 3 L = 3 Insertion Algorithm And Now for Deletion… 1. Insert the key in its leaf in 3. If an internal node ends up sorted order with M+1 children, overflow! 18 Delete(32) 18 2. If the leaf ends up with L+1 – Split the node into two nodes: items, overflow! • original with ⎡(M+1)/2⎤ children – Split the leaf into two nodes: • new one with ⎣(M+1)/2⎦ 15 32 40 15 40 • original with ⎡(L+1)/2⎤ items children • new one with ⎣(L+1)/2⎦ items – Add the new child to the parent 3 15 18 32 40 3 15 18 40 – Add the new child to the parent – If the parent ends up with M+1 – If the parent ends up with M+1 items, overflow! 12 16 30 36 45 12 16 30 45 children, overflow! 4. Split an overflowed root in two and hang the new nodes 14 38 14 This makes the tree deeper! under a new root 5. Propagate keys up tree. 21 M = 3 L = 3 22 18 Delete(15) 18 18 Delete(16) 18 15 36 40 36 40 14 36 40 36 40 3 15 18 36 40 3 18 36 40 3 14 18 36 40 18 36 40 12 16 30 38 45 12 16 30 38 45 12 16 30 38 45 30 38 45 14 M = 3 L = 3 23 M = 3 L = 3 24 Delete(14) 18 Delete(16) 36 36 36 40 18 40 18 40 3 18 36 40 3 3 18 36 40 3 18 36 40 12 30 38 45 12 12 30 38 45 12 30 38 45 14 14 14 M = 3 L = 3 25 M = 3 L = 3 26 Delete(18) 36 36 36 18 40 40 40 3 18 36 40 3 36 40 3 36 40 12 30 38 45 12 38 45 12 38 45 30 M = 3 L = 3 27 M = 3 L = 3 28 Deletion Algorithm Deletion Slide Two 3. If an internal node ends up with fewer than ⎡M/2⎤ children, underflow! 1. Remove the key from its leaf – Adopt from a neighbor; update the parent 2. If the leaf ends up with fewer – If adoption won’t work, merge with neighbor than ⎡L/2⎤ items, underflow! – If the parent ends up with fewer than ⎡ ⎤ – Adopt data from a neighbor; M/2 children, underflow! update the parent – If adopting won’t work, delete This reduces the node and merge with neighbor 4. If the root ends up with only one child, height of the tree! – If the parent ends up with make the child the new root of the tree fewer than ⎡M/2⎤ children, underflow! 29 5. Propagate keys up through tree. 30 Thinking about B+ Trees Tree Names You Might Encounter • B+ Tree insertion can cause (expensive) splitting FYI: and propagation – B-Trees with M = 3, L = x are called 2-3 trees • Nodes can have 2 or 3 keys • B+ Tree deletion can cause (cheap) adoption or – B-Trees with M = 4, L = x are called 2-3-4 trees (expensive) deletion, merging and propagation • Nodes can have 2, 3, or 4 keys • Propagation is rare if M and L are large (Why?) • Pick branching factor M and data items/leaf L such that each node takes one full page/block of memory/disk. 31 32.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    8 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us