AVL-Tree (Classic Example) – Red-Black-Tree – Splay-Tree – Scapegoat-Tree – …

Relational Database Systems 2
4. Trees & Advanced Indexes
Silke Eckstein, Benjamin Köhncke
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

3 Indexing
• Buffer management is very important
  – Holds DB blocks in primary memory
    • DB blocks are made up of several FS blocks
    • Find good strategies to have requested DB blocks available when needed
  – Each block holds some metadata and row data
• Indexes drastically speed up queries
  – Fewer blocks need to be scanned
  – Primary index
    • On the primary key attribute, usually influences row storage order
  – Secondary index
    • On any attribute, does not influence storage order

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

4 Trees & Advanced Indexes
4.1 Introduction
4.2 Binary Search Trees
4.3 Self-Balancing Binary Search Trees
4.4 B-Trees

4.1 Introduction
• Indexes need a suitable data structure
  – For efficient index look-ups, search keys need to be ordered
• Remember: all indexes should be stored in a separate database file, not together with the data
  – A suitable number of DB blocks (adjacent on disk) is reserved at index creation time
  – If the space is not sufficient, another file is created and linked to the original index file
  [Figure: index entries as pairs (Search Key 1, Block Address 1), (Search Key 2, Block Address 2)]

• Search within an index
  – Bisection search possible: ⌈log₂ n⌉ comparisons; O(log n)
    4 6 7 16 18 21 24 33 39 47 68 72 89 92 99
  – But usually indexes span several DB blocks
    • If the index is in n blocks, O(n) blocks need to be read from disk
    • Example: search for 92 in the key sequence above

• Maintenance of the index is also difficult
  – Insert a new search key with value 5!
    Before: 4 6 7 16 18 21 24 33 39 47 68 72 89 92 99
    After:  4 5 6 7 16 18 21 24 33 39 47 68 72 89 92 99
  – In the worst case, all cells need to be shifted and all blocks need to be accessed
• A similar problem occurs when deleting a value
  – Often: do not shift values, but mark the key as deleted

• In this lecture, we discuss more efficient multi-level data structures
  – B-trees
    • Prevalent in database systems
    • Better access performance
    • Much better update performance
• To understand B-trees better, we start by examining binary search trees

4.2 Binary Trees
• Binary trees are
  – Rooted and directed trees
  – Each node has none, one or two children
  – Each node (except the root) has exactly one parent

• Some naming conventions
  – Nodes without children are called leaf nodes
  – The depth of node N is the path length from the root
  – The tree height is the maximum node depth
  – If there is a path from node N1 to node N2, N1 is an ancestor of N2 and N2 is a descendant of N1
  – The size of a node N is the number of all descendants of N including itself
  – A subtree of a node N is formed of all descendant nodes including N and the respective links
  [Figure: example tree with labels "root", "subtree" and "leaf nodes"; tree height = 3; red node: size = 3, depth = 1]

• Properties of binary trees
• Full binary tree (or proper)
  – Each node has either zero or two children
• Perfect binary tree
  – All leaf nodes have the same depth
  – With height h, contains 2^h nodes
• Height-balanced binary tree
  – Depths of all leaf nodes differ by at most 1
  – With height h, contains between 2^(h-1) and 2^h nodes
• Degenerated binary tree
  – Each node has either zero or one child
  – Behaves like a linked list: search in O(n)
  [Figure: three example trees labeled "Full and Balanced", "Full and Perfect" and "Degenerated"]

4.2 Binary Search Trees
• Binary search trees are binary trees with
  – Each node has a unique value assigned
  – There is a total order on all values
  – The left subtree of a node contains only values less than the node value
  – The right subtree of a node contains only values larger than the node value
  – Aiming for O(log n) search complexity
    • Structurally resembles bisection search
  [Figure: binary search tree with root 57, inner nodes 33 and 85, leaves 17, 42, 61, 99]

• Constructing and inserting into binary search trees
  – Values are inserted incrementally
  – The first value is the root
  – Additional values sink into the tree
    • Sink to the left subtree if the value is smaller
    • Sink to the right subtree if the value is larger
    • Attach to the last node as left/right child if the subtree is empty
• The insert order of values strongly influences the resulting and intermediate tree properties

• Suppose insert order 57, 33, 42, 85, 17, 61, 99
  [Figure: four stages of the tree: "Insert 57"; "Insert 33, 42 – Degenerated"; "Insert 85, 17 – Full and Balanced"; "Insert 61, 99 – Perfect and Full"]

• Suppose insert order 99, 85, 61, 57, 42, 33, 17
  [Figure: degenerated tree, a single chain from 99 down to 17]
• Insert complexity is thus
  – O(n) worst case
  – O(log n) average case

• Search for a key v
  – Start with the root
  – Recursive procedure
    • If node value = v: return node
    • If node is a leaf (or the subtree to descend into is empty): value not found
    • If v < node value: descend to left subtree
    • Else: descend to right subtree
• Complexity:
  – Average case: O(log n)
  – Worst case: O(n) – degenerated tree

• Tree traversal
  – Accesses all nodes of the tree
  [Figure: binary search tree with root 57; left child 33 with children 17 and 42, where 42 has children 35 and 49; right child 85 with children 61 and 99]
• Pre-order
  – Visit node
  – Traverse left subtree
  – Traverse right subtree
  – Result: 57-33-17-42-35-49-85-61-99
• In-order (sorted access)
  – Traverse left subtree
  – Visit node
  – Traverse right subtree
  – Result: 17-33-35-42-49-57-61-85-99
• Post-order
  – Traverse left subtree
  – Traverse right subtree
  – Visit node
  – Result: 17-35-49-42-33-61-99-85-57

• Deleting nodes has complexity O(n) worst case, O(log n) average case
  – Locate the node to delete by tree search
  – If the node is a leaf, just delete it
  – If the node has one child, delete the node and attach the child to the parent
  – If the node has two children
    • Replace it either by
      a) the in-order successor (the left-most node of the right subtree), or
      b) the in-order predecessor (the right-most node of the left subtree)
• Example: delete the search key with value 57
  [Figure: original tree with root 57, children 27 and 83, leaves 22 and 86; a) 83 becomes the new root; b) 27 becomes the new root]

• Summary
  – Very simple, dynamic data structure
  – Efficient on average
    • O(log n) for all operations
  – Can be very inefficient for degenerated cases
    • O(n) for all operations

4.3 Self-Balancing Binary Search Trees
• Observation:
  – Binary search trees are very efficient when perfect or balanced
• Idea:
  – Continuously optimize the tree structure to keep the tree balanced
• Popular implementations
  – AVL-Tree (classic example)
  – Red-Black-Tree
  – Splay-Tree
  – Scapegoat-Tree
  – …

• Basic Concepts for Deletion: Global Rebuild (Lazy Deletion)
  – Start with a balanced tree
  – Don't delete a node, just mark it as deleted
    • The search algorithm scans deleted nodes, but does not return them
  – If the rebuild condition is met, rebuild the whole tree without the deleted nodes
    • "Rebuild as soon as half of the nodes are marked as deleted"
    • A complete rebuild can be performed in O(n)

• Global Rebuild (cont.)
  – Search efficiency
    • n: number of unmarked nodes
    • Tree is balanced, contains max 2n nodes overall
      – Number of accesses during search usually just increases by 1
      – O(log n)
  – Delete efficiency
    • Global rebuild is in O(n)
      – But only necessary after n deletions
      – Amortized additional cost per deletion is O(1)
    • Overall complexity
      – Average: O(log n)
      – Worst case: O(n), if an actual rebuild is performed

• Global Rebuild (cont.)
  – Direct deletion with rebuild
    • Similar complexity as with lazy deletion
      – Increased per-delete effort
      – Reduced per-search effort until rebuild
    • Delete nodes as in normal binary trees
      – Increment deletion counter cd
    • Rebuild the tree as soon as cd = n, reset cd
  [Figure: tree with root 57, inner nodes 33 and 85, leaves 17, 42, 61, 99; after "Delete 57, 33, 42, 61" and "Rebuild": root 85 with children 17 and 99]

• Basic Concepts for Insertion and Deletion: Local Balancing (Subtree Balancing)
  – Start with balanced
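The binary search tree operations from section 4.2 (insert, search, in-order traversal, delete via in-order successor) can be sketched in Python roughly as follows. This is a minimal illustration, not code from the lecture; the names `Node`, `insert`, `search`, `in_order`, `height`, `delete` and `build` are assumptions made for this example, and `height` uses the "maximum node depth" definition.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Let the key sink left or right until an empty subtree is reached."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # equal keys are ignored, since values are unique


def search(root: Optional[Node], key: int) -> Optional[Node]:
    """Return the node holding key, or None if the key is not present."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def in_order(root: Optional[Node]) -> Iterator[int]:
    """Left subtree, node, right subtree: yields keys in sorted order."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)


def height(root: Optional[Node]) -> int:
    """Maximum node depth; an empty tree is given height -1 by convention."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))


def delete(root: Optional[Node], key: int) -> Optional[Node]:
    """Delete key; a node with two children is replaced by its in-order successor."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:       # leaf or only a right child
            return root.right
        if root.right is None:      # only a left child
            return root.left
        succ = root.right           # left-most node of the right subtree
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def build(keys: List[int]) -> Optional[Node]:
    """Insert the keys one by one, in the given order."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root
```

Building the tree with `build([57, 33, 42, 85, 17, 61, 99])` and with `build([99, 85, 61, 57, 42, 33, 17])` and comparing `height` on both shows how strongly the insert order affects the shape: the second order produces a single chain.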
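Lazy deletion with a global rebuild, as described in section 4.3, might look like the following Python sketch. It assumes the rebuild condition quoted on the slide (rebuild once half of the nodes are marked as deleted) and rebuilds by collecting the live keys in order and recursively picking the middle key as root, which takes O(n). All names (`LazyNode`, `LazyDeleteTree`, `build_balanced`, the `rebuilds` counter) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LazyNode:
    key: int
    deleted: bool = False
    left: Optional["LazyNode"] = None
    right: Optional["LazyNode"] = None


def build_balanced(keys: List[int], lo: int = 0,
                   hi: Optional[int] = None) -> Optional[LazyNode]:
    """Build a balanced tree from sorted keys[lo:hi] in O(n)."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return LazyNode(keys[mid], False,
                    build_balanced(keys, lo, mid),
                    build_balanced(keys, mid + 1, hi))


class LazyDeleteTree:
    def __init__(self, sorted_keys: List[int]) -> None:
        self.root = build_balanced(list(sorted_keys))
        self.total = len(sorted_keys)  # all nodes, marked or not
        self.marked = 0                # nodes marked as deleted
        self.rebuilds = 0              # bookkeeping for the example only

    def _find(self, key: int) -> Optional[LazyNode]:
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def contains(self, key: int) -> bool:
        # The search passes through marked nodes but never reports them.
        node = self._find(key)
        return node is not None and not node.deleted

    def delete(self, key: int) -> bool:
        node = self._find(key)
        if node is None or node.deleted:
            return False
        node.deleted = True
        self.marked += 1
        if 2 * self.marked >= self.total:  # half of the nodes are marked
            self._rebuild()
        return True

    def _live_keys(self, node: Optional[LazyNode], out: List[int]) -> None:
        # In-order walk that skips marked nodes, so `out` stays sorted.
        if node is not None:
            self._live_keys(node.left, out)
            if not node.deleted:
                out.append(node.key)
            self._live_keys(node.right, out)

    def _rebuild(self) -> None:
        keys: List[int] = []
        self._live_keys(self.root, keys)
        self.root = build_balanced(keys)
        self.total = len(keys)
        self.marked = 0
        self.rebuilds += 1
```

With the seven keys 17, 33, 42, 57, 61, 85, 99, deleting 57, 33 and 42 only marks nodes; deleting 61 as the fourth key crosses the half-marked threshold and triggers the rebuild, leaving a three-node tree over 17, 85 and 99.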
