Relational Database Systems 2
4. Trees & Advanced Indexes
Silke Eckstein, Benjamin Köhncke
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

3 Indexing
• Buffer management is very important
  – Holds DB blocks in primary memory
    • DB blocks are made up of several FS blocks
    • Find good strategies to have requested DB blocks available when needed
  – Each block holds some metadata and row data
• Indexes drastically speed up queries
  – Fewer blocks need to be scanned
  – Primary index
    • On the primary key attribute, usually influences row storage order
  – Secondary index
    • On any attribute, does not influence storage order

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

4 Trees & Advanced Indexes
4.1 Introduction
4.2 Binary Search Trees
4.3 Self-Balancing Binary Search Trees
4.4 B-Trees

4.1 Introduction
• Indexes need a suitable data structure
  – For efficient index look-ups, search keys need to be ordered
• Remember: all indexes should be stored in a separate database file, not together with the data
  – A suitable number of DB blocks (adjacent on disk) is reserved at index creation time
  – If the space is not sufficient, another file is created and linked to the original index file
[Figure: index entries as pairs (Search Key 1, Block Address 1), (Search Key 2, Block Address 2)]
• Search within an index
  – Bisection search is possible: ⌈log2 n⌉ steps, i.e. O(log n)
    Example key array: 4 6 7 16 18 21 24 33 39 47 68 72 89 92 99
  – But usually indexes span several DB blocks
    • If the index is in n blocks, O(n) blocks need to be read from disk
    • Example: search for 92 in the key array above
• Maintenance of the index is also difficult
  – Insert a new search key with value 5!
    Before: 4 6 7 16 18 21 24 33 39 47 68 72 89 92 99
    After:  4 5 6 7 16 18 21 24 33 39 47 68 72 89 92 99
  – In the worst case, all cells need to be shifted and all blocks need to be accessed
• A similar problem occurs when deleting a value
  – Often: do not shift values, but mark the key as deleted
• In this lecture, we discuss more efficient multi-level data structures
  – B-trees
    • Prevalent in database systems
    • Better access performance
    • Much better update performance
• To understand B-trees better, we start by examining binary search trees

4.2 Binary Trees
• Binary trees are
  – Rooted and directed trees
  – Each node has zero, one, or two children
  – Each node (except the root) has exactly one parent
• Some naming conventions
  – Nodes without children are called leaf nodes
  – The depth of a node N is the path length from the root
  – The tree height is the maximum node depth
  – If there is a path from node N1 to node N2, N1 is an ancestor of N2 and N2 is a descendant of N1
  – The size of a node N is the number of all descendants of N including itself
  – A subtree of a node N is formed of all descendant nodes including N and the respective links
[Figure: example tree with root, leaf nodes, and a subtree marked in red; tree height = 3; red node: size = 3, depth = 1]
• Properties of binary trees
• Full (or proper) binary tree
  – Each node has either zero or two children
• Perfect binary tree
  – All leaf nodes have the same depth
  – With height h, contains 2^h nodes
• Height-balanced binary tree
  – Depths of all leaf nodes differ by at most 1
  – With height h, contains between 2^(h-1) and 2^h nodes
• Degenerated binary tree
  – Each node has either zero or one child
  – Behaves like a linked list: search in O(n)
[Figure: example trees labeled "Full and Balanced", "Full and Perfect", and "Degenerated"]

4.2 Binary Search Trees
• Binary search trees are binary trees in which
  – Each node has a unique value assigned
  – There is a total order on all values
  – The left subtree of a node contains only values less than the node value
  – The right subtree of a node contains only values larger than the node value
  – Aiming for O(log n) search complexity
    • Structurally resembles bisection search
[Figure: example tree with root 57, inner nodes 33 and 85, and leaves 17, 42, 61, 99]
• Constructing and inserting into binary search trees
  – Values are inserted incrementally
  – The first value is the root
  – Additional values sink into the tree
    • Sink to the left subtree if the value is smaller
    • Sink to the right subtree if the value is larger
    • Attach to the last node as left/right child if the subtree is empty
• The insert order of values strongly influences the resulting and intermediate tree properties
• Suppose insert order 57, 33, 42, 85, 17, 61, 99
  [Figure: four panels: "Insert 57"; "Insert 33, 42 – Degenerated"; "Insert 85, 17 – Full and Balanced"; "Insert 61, 99 – Perfect and Full"]
• Suppose insert order 99, 85, 61, 57, 42, 33, 17
  [Figure: degenerated tree, a single chain 99, 85, 61, 57, 42, 33, 17]
• Insert complexity is thus
  – O(n) worst case
  – O(log n) average case
• Search for a key v
  – Start with the root
  – Recursive procedure
    • If node value = v
      – Return node
    • If node is a leaf (more generally, if the subtree to descend into is empty)
      – Value not found
    • If v < node value
      – Descend to left subtree
    • Else
      – Descend to right subtree
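The insert and search procedures above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the names `Node`, `insert`, `search`, `height`, and `build` are assumptions chosen for this example, and height is counted in edges (the maximum node depth). It builds a tree from each of the two insert orders used on the slides and compares the resulting heights.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    # Hypothetical node type for this sketch: one value, up to two children.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], value: int) -> Node:
    """Let the value sink into the tree; attach it where the subtree is empty."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    # Equal values are ignored because node values are unique.
    return root


def search(root: Optional[Node], v: int) -> Optional[Node]:
    """Return the node holding v, or None if v is not in the tree."""
    if root is None:
        return None  # reached an empty subtree: value not found
    if v == root.value:
        return root
    if v < root.value:
        return search(root.left, v)
    return search(root.right, v)


def height(root: Optional[Node]) -> int:
    """Maximum node depth in edges; an empty tree has height -1."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))


def build(values: Iterable[int]) -> Optional[Node]:
    """Insert the values one by one, in the given order."""
    root = None
    for v in values:
        root = insert(root, v)
    return root


balanced = build([57, 33, 42, 85, 17, 61, 99])
chain = build([99, 85, 61, 57, 42, 33, 17])
print(height(balanced), height(chain))  # 2 6
```

Both trees hold the same seven keys, but the descending insert order produces a chain of height 6, so a search may visit all seven nodes instead of at most three.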
Search complexity:
  – Average case: O(log n)
  – Worst case: O(n), for a degenerated tree
• Tree traversal
  – Accesses all nodes of the tree
• Pre-order
  – Visit node
  – Traverse left subtree
  – Traverse right subtree
• In-order (sorted access)
  – Traverse left subtree
  – Visit node
  – Traverse right subtree
• Post-order
  – Traverse left subtree
  – Traverse right subtree
  – Visit node
[Figure: example tree with root 57; 57 has children 33 and 85; 33 has children 17 and 42; 42 has children 35 and 49; 85 has children 61 and 99]
  Pre-order: 57-33-17-42-35-49-85-61-99
  In-order: 17-33-35-42-49-57-61-85-99
  Post-order: 17-35-49-42-33-61-99-85-57
• Deleting nodes has complexity O(n) in the worst case, O(log n) in the average case
  – Locate the node to delete by tree search
  – If the node is a leaf, just delete it
  – If the node has one child, delete the node and attach the child to the parent
  – If the node has two children
    • Replace it by either
      a) the in-order successor (the left-most node of the right subtree), or
      b) the in-order predecessor (the right-most node of the left subtree)
• Example: delete the search key with value 57
  [Figure: tree with root 57, children 27 and 83, and leaves 22 and 86; a) 57 replaced by its in-order successor 83; b) 57 replaced by its in-order predecessor 27]
• Summary
  – Very simple, dynamic data structure
  – Efficient on average
    • O(log n) for search, insert, and delete
  – Can be very inefficient for degenerated cases
    • O(n) for search, insert, and delete

4.3 Self-Balancing Binary Search Trees
• Observation:
  – Binary search trees are very efficient when perfect or balanced
• Idea:
  – Continuously optimize the tree structure to keep the tree balanced
• Popular implementations
  – AVL tree (classic example)
  – Red-black tree
  – Splay tree
  – Scapegoat tree
  – …
• Basic concepts for deletion: Global Rebuild (lazy deletion)
  – Start with a balanced tree
  – Don’t delete a node, just mark it as deleted
    • The search algorithm scans deleted nodes, but does not return them
  – If the rebuild condition is met, rebuild the whole tree without the deleted nodes
    • “Rebuild as soon as half of the nodes are marked as deleted”
    • A complete rebuild can be performed in O(n)
• Global Rebuild (cont.)
  – Search efficiency
    • Let n be the number of unmarked nodes
    • The tree is balanced and contains at most 2n nodes overall
      – The number of accesses during a search usually increases by just 1
      – O(log n)
  – Delete efficiency
    • A global rebuild is in O(n)
      – But it is only necessary after n deletions
      – The amortized additional cost per deletion is O(1)
  – Overall complexity
    • Average: O(log n)
    • Worst case: O(n), if an actual rebuild is performed
• Global Rebuild (cont.)
  – Direct deletion with rebuild
    • Similar complexity as with lazy deletion
      – Increased per-delete effort
      – Reduced per-search effort until rebuild
    • Delete nodes as in normal binary trees
      – Increment the deletion counter cd
    • Rebuild the tree as soon as cd = n, then reset cd
  [Figure: tree with root 57 and nodes 33, 85, 17, 42, 61, 99; after "Delete 57, 33, 42, 61" and "Rebuild", the tree has root 85 with children 17 and 99]
• Basic concepts for insertion and deletion: Local Balancing (subtree balancing)
  – Start with balanced