Relational Database Systems 2 4. Trees & Advanced Indexes Silke Eckstein Benjamin Köhncke Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

3 Indexing

• Buffer management is very important
  – Holds DB blocks in primary memory
    • DB blocks are made up of several FS blocks
    • Find good strategies to have requested DB blocks available when needed
  – Each block holds some metadata and row data
• Indexes drastically speed up queries
  – Fewer blocks need to be scanned
  – Primary index
    • On the primary key attribute, usually influences row storage order
  – Secondary index
    • On any attribute, does not influence storage order

Relational Database Systems 2 – Wolf-Tilo Balke– Institut für Informationssysteme 2 4 Trees & Advanced Indexes

4.1 Introduction
4.2 Binary Search Trees
4.3 Self-Balancing Binary Search Trees
4.4 B-Trees

4.1 Introduction

• Indexes need a suitable data structure
  – For efficient index look-ups, search keys need to be ordered
• Remember: all indexes should be stored in a separate database file, not together with the data
  – A suitable number of DB blocks (adjacent on disk) is reserved at index creation time
  – If the space is not sufficient, another file is created and linked to the original index file

[Figure: index entries as pairs: Search Key 1 | Block Address 1, Search Key 2 | Block Address 2]

4.1 Introduction

• Search within an index
  – Bisection search possible: ⌈log₂ n⌉ comparisons, i.e. O(log n)

4 6 7 16 18 21 24 33 39 47 68 72 89 92 99

  – But usually indexes span several DB blocks
    • If the index is spread over n blocks, bisection search still has to read about ⌈log₂ n⌉ blocks from disk
    • Example: search for 92

4 6 7 16 18 21 24 33 39 47 68 72 89 92 99
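The search for 92 can be sketched as follows. This is a minimal illustration, assuming the 15 keys are packed three per block (the slide does not state the block size) and treating every probed block as one simulated disk read. The helper name `search_blocked_index` is hypothetical.

```python
from bisect import bisect_left


def search_blocked_index(blocks, key):
    """Bisection search over a sorted index that is split into blocks.

    Returns (found, block_reads). Every probed block counts as one
    simulated disk read; the search inside a block happens in memory.
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # simulated disk read
        reads += 1
        if key < block[0]:
            hi = mid - 1             # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1             # key can only be in a later block
        else:
            pos = bisect_left(block, key)
            return block[pos] == key, reads
    return False, reads


keys = [4, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99]
blocks = [keys[i:i + 3] for i in range(0, len(keys), 3)]  # assumed: 3 keys per block
print(search_blocked_index(blocks, 92))
```

With this assumed layout, the search for 92 probes three of the five blocks before it finds the key.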

4.1 Introduction

• Maintenance of an index is also difficult
  – Insert a new search key with value 5!

Before: 4 6 7 16 18 21 24 33 39 47 68 72 89 92 99

After: 4 5 6 7 16 18 21 24 33 39 47 68 72 89 92 99

  – In the worst case, all cells need to be shifted and all blocks need to be accessed
• A similar problem occurs when deleting a value
  – Often: do not shift values, but mark the key as deleted
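The cost of the insertion above can be made visible with a small sketch. It assumes the index is a plain sorted Python list and simply counts how many existing entries have to move; `insert_sorted` is a hypothetical helper name.

```python
from bisect import bisect_left


def insert_sorted(index, key):
    """Insert key into a sorted list and return how many entries were shifted."""
    pos = bisect_left(index, key)     # O(log n) to find the position
    shifted = len(index) - pos        # every entry behind pos moves one cell
    index.insert(pos, key)            # O(n) shift inside the list
    return shifted


index = [4, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99]
shifted = insert_sorted(index, 5)
print(shifted, index[:4])
```

Finding the position is cheap, but inserting 5 near the front moves almost every entry, which is the maintenance problem the slide describes.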


4.1 Introduction

• In this lecture, we discuss more efficient multi-level data structures
  – B-trees
    • Prevalent in database systems
    • Better access performance
    • Much better update performance

• To understand B-trees better, we start by examining binary search trees

4.2 Binary Trees

• Binary trees are – Rooted and directed trees – Each node has none, one or two children – Each node (except root) has exactly one parent

4.2 Binary Trees

• Some naming conventions
  – Nodes without children are called leaf nodes
  – The depth of a node N is the path length from the root
  – The height is the maximum node depth
  – If there is a path from node N1 to node N2, N1 is an ancestor of N2 and N2 is a descendant of N1
  – The size of a node N is the number of all descendants of N including itself
  – A subtree of a node N is formed of all descendant nodes including N and the respective links
[Figure: example tree with root, leaf nodes and a highlighted subtree; tree height = 3; red node: size = 3, depth = 1]

4.2 Binary Trees

• Properties of binary trees
• Full (or proper) binary tree
  – Each node has either zero or two children
• Perfect binary tree
  – Full, and all leaf nodes have the same depth
  – With height h (counted in levels), contains 2^h − 1 nodes
• Height-balanced binary tree
  – Depths of all leaf nodes differ by at most 1
  – With height h, contains between 2^(h−1) and 2^h − 1 nodes
• Degenerated binary tree
  – Each node has either zero or one child
  – Behaves like a linked list: search in O(n)
[Figure: example trees labeled "Full and Balanced", "Full and Perfect" and "Degenerated"]

4.2 Binary Search Trees

• Binary search trees are binary trees with
  – Each node has a unique value assigned
  – There is a total order on all values
  – Left subtree of a node contains only values less than the node value
  – Right subtree of a node contains only values larger than the node value
  – Aiming for O(log n) search complexity
    • Structurally resembles bisection search

Example tree (one level per line):
57
33 85
17 42 61 99

4.2 Binary Search Trees

• Constructing and inserting into binary search trees
  – Values are inserted incrementally
  – First value is the root
  – Additional values sink into the tree
    • Sink to left subtree if value is smaller
    • Sink to right subtree if value is larger
    • Attach to last node as left/right child, if subtree is empty
• The insert order of the values strongly influences the properties of the resulting and intermediate trees

4.2 Binary Search Trees

• Suppose insert order 57, 33, 42, 85, 17, 61, 99
  – Insert 57: the tree consists of the root 57 only
  – Insert 33, 42: degenerated (57 → 33 → 42)
  – Insert 85, 17: full and balanced (levels: 57; 33 85; 17 42)
  – Insert 61, 99: perfect and full (levels: 57; 33 85; 17 42 61 99)

4.2 Binary Search Trees

• Suppose insert order 99, 85, 61, 57, 42, 33, 17
  – Each value becomes the left child of the previous one: 99 → 85 → 61 → 57 → 42 → 33 → 17 (degenerated)
• Insert complexity is thus
  – O(n) worst case
  – O(log n) average case
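The effect of the insert order can be reproduced with a short sketch of the sink-in insertion described above. The names `Node`, `insert`, `build` and `height` are illustrative, and the height is counted in levels here.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Let value sink into the tree and attach it where the subtree is empty."""
    if root is None:
        return Node(value)            # first value becomes the root
    node = root
    while True:
        if value == node.value:
            return root               # values are unique, nothing to do
        side = "left" if value < node.value else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(value))
            return root
        node = child


def build(values):
    root = None
    for v in values:
        root = insert(root, v)
    return root


def height(node):
    """Height counted in levels (empty tree = 0)."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


print(height(build([57, 33, 42, 85, 17, 61, 99])))   # perfect tree
print(height(build([99, 85, 61, 57, 42, 33, 17])))   # degenerated tree
```

The same seven values give a tree of three levels in the first order and a chain of seven levels in the second.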

4.2 Binary Search Trees

• Search for a key v
  – Start with root
  – Recursive procedure
    • If node value = v
      – Return node
    • If node is leaf
      – Value not found
    • If v < node value
      – Descend to left subtree
    • Else
      – Descend to right subtree
  – Complexity
    • Average case: O(log n)
    • Worst case: O(n) for a degenerated tree
[Figure: example tree with levels 57; 33 85; 17 42 61 99]
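The recursive procedure can be sketched as below, using the example tree from the figure. The `Node` class and the function name `search` are illustrative; an empty subtree (`None`) plays the role of "value not found".

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def search(node, v):
    """Return the node holding v, or None if v is not in the tree."""
    if node is None:
        return None                   # ran off the tree: value not found
    if v == node.value:
        return node
    if v < node.value:
        return search(node.left, v)   # descend to left subtree
    return search(node.right, v)      # descend to right subtree


root = Node(57,
            Node(33, Node(17), Node(42)),
            Node(85, Node(61), Node(99)))
print(search(root, 42) is not None, search(root, 50) is not None)
```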

4.2 Binary Search Trees

• Tree traversal
  – Accesses all nodes of the tree
• Pre-order
  – Visit node
  – Traverse left subtree
  – Traverse right subtree
  – Pre-order: 57-33-17-42-35-49-85-61-99
• In-order (sorted access)
  – Traverse left subtree
  – Visit node
  – Traverse right subtree
  – In-order: 17-33-35-42-49-57-61-85-99
• Post-order
  – Traverse left subtree
  – Traverse right subtree
  – Visit node
  – Post-order: 17-35-49-42-33-61-99-85-57
[Figure: example tree with levels 57; 33 85; 17 42 61 99; 35 and 49 are the children of 42]
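The three traversal orders differ only in when the node itself is visited. A compact sketch on the example tree (with 35 and 49 as children of 42); the class and function names are illustrative.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def pre_order(n):
    return [] if n is None else [n.value] + pre_order(n.left) + pre_order(n.right)


def in_order(n):
    return [] if n is None else in_order(n.left) + [n.value] + in_order(n.right)


def post_order(n):
    return [] if n is None else post_order(n.left) + post_order(n.right) + [n.value]


tree = Node(57,
            Node(33, Node(17), Node(42, Node(35), Node(49))),
            Node(85, Node(61), Node(99)))
print(pre_order(tree))
print(in_order(tree))
print(post_order(tree))
```

The in-order result is the sorted key sequence, which is why it is the traversal used for sorted access.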

4.2 Binary Search Trees

• Deleting nodes has complexity O(n) worst case, O(log n) average case
  – Locate the node to delete by tree search
  – If node is leaf, just delete it
  – If node has one child, delete node and attach child to parent
  – If node has two children
    • Replace it by either
      a) the in-order successor (the left-most node of the right subtree), or
      b) the in-order predecessor (the right-most node of the left subtree)
• Example: delete search key with value 57
  – Original tree: 57; 27 83; 22 86 (22 is the left child of 27, 86 is the right child of 83)
  – a) Successor: 83; 27 86; 22
  – b) Predecessor: 27; 22 83; 86
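Variant a) of the example can be sketched as follows. This is one possible formulation that replaces a node with two children by its in-order successor; the names `Node` and `delete` are illustrative.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def delete(node, v):
    """Delete v from the subtree rooted at node and return the new subtree root."""
    if node is None:
        return None
    if v < node.value:
        node.left = delete(node.left, v)
    elif v > node.value:
        node.right = delete(node.right, v)
    else:
        # Leaf or single child: hand the (possibly empty) child to the parent.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: copy the in-order successor, then remove it below.
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.value = succ.value
        node.right = delete(node.right, succ.value)
    return node


root = Node(57, Node(27, Node(22)), Node(83, None, Node(86)))
root = delete(root, 57)
print(root.value, root.left.value, root.right.value, root.left.left.value)
```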

4.2 Binary Search Trees

• Summary
  – Very simple, dynamic data structure
  – Efficient on average
    • O(log n) for all operations
  – Can be very inefficient for degenerated cases
    • O(n) for all operations

4.3 Self-Balancing Binary Search Trees

• Observation: – Binary Search Trees are very efficient when perfect or balanced • Idea: – Continuously optimize to keep tree balanced • Popular Implementations – AVL-Tree (classic example) – Red-Black-Tree – Splay-Tree – Scapegoat-Tree – …


4.3 Self-Balancing Binary Search Trees

• Basic Concepts for Deletion: Global Rebuild (Lazy Deletion) – Start with balanced tree – Don’t delete a node, just mark it as deleted • Search algorithm scans deleted nodes, but does not return them – If Rebuild Condition is met, rebuild the whole tree without the deleted nodes • “Rebuild as soon as half of the nodes are marked as deleted” • Complete rebuild can be performed in O(n)

4.3 Self-Balancing Binary Search Trees

• Global Rebuild (cont.) – Search Efficiency • n number of unmarked nodes • Tree is balanced, contains max 2n nodes overall – Number of accesses during search usually just increases by 1 – O(log n) – Delete Efficiency • Global rebuild is in O(n) – But only necessary after n deletions – Amortized additional costs per deletion is O(1) • Overall complexity – Average: O(log n) – Worst Case: O(n), if actual rebuild is performed

4.3 Self-Balancing Binary Search Trees

• Global Rebuild (cont.)
  – Direct deletion with rebuild
    • Similar complexity as with lazy deletion
      – Increased per-delete effort
      – Reduced per-search effort until rebuild
    • Delete nodes as in normal binary trees
      – Increment deletion counter c_d
    • Rebuild tree as soon as c_d = n, reset c_d
[Figure: tree with levels 57; 33 85; 17 42 61 99 → delete 57, 33, 42, 61 and rebuild → tree with levels 85; 17 99]

4.3 Self-Balancing Binary Search Trees

• Basic Concepts for Insertion and Deletion: Local Balancing (Subtree Balancing) – Start with balanced tree – Insert/delete nodes normally • If a subtree becomes “too unbalanced”, locally balance subtree to regain global balance

4.3 Self-Balancing Binary Search Trees

  – To detect unbalanced subtrees, each node v needs to know the size |v| and the height h(v) of its subtree
• Unbalanced condition (height balancing)
  – Subtree is “too unbalanced” when |h(left(v)) − h(right(v))| > α
  – α is a constant which can be adjusted (for AVL, α = 1)
• Alternative unbalanced condition
  – Subtree is “too unbalanced” when h(v) > α · log₂|v|
  – α is a constant which can be adjusted

4.3 Self-Balancing Binary Search Trees

• Local Balancing (cont.)
  – “After inserting a node, walk back up the tree and update the stored subtree statistics h(v) and |v|. If node v is too imbalanced, balance the subtree of v”
  – Example: insert key 5 (node labels: key (h(v), |v|))
    • Before: 57 (2, 5); 33 (1, 3), 85 (0, 1); 17 (0, 1), 42 (0, 1)
    • After: 57 (3, 6); 33 (2, 4), 85 (0, 1); 17 (1, 2), 42 (0, 1); 5 (0, 1)
    • Node 57 is height-imbalanced for α = 1: |2 − 0| = 2 > 1
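The statistics in this example can be recomputed with a short sketch. It assumes, as the labels in the figure suggest, that a leaf has height 0; an empty subtree is then given height −1 (an assumption of this sketch). The function name `check` is hypothetical, and the statistics are recomputed bottom-up instead of being stored in the nodes.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def check(node, alpha, out):
    """Return (h(v), |v|) and append the keys of height-imbalanced nodes to out."""
    if node is None:
        return -1, 0                  # assumed: empty subtree has height -1
    hl, sl = check(node.left, alpha, out)
    hr, sr = check(node.right, alpha, out)
    if abs(hl - hr) > alpha:          # height-balancing condition from the slide
        out.append(node.key)
    return 1 + max(hl, hr), 1 + sl + sr


# Tree from the example after key 5 has been inserted below 17.
root = Node(57,
            Node(33, Node(17, Node(5)), Node(42)),
            Node(85))
imbalanced = []
stats = check(root, 1, imbalanced)
print(stats, imbalanced)
```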

4.3 Self-Balancing Binary Search Trees

  – Local balancing can be achieved by
    • Rebuilding the subtree
      – O(|v|) = O(n) in the worst case
      – However, O(log n) on average
      – This operation is expensive. But in the context of a DBMS, it may pay off, as it can also consolidate and optimize physical storage locations
      – Especially suited for disk-based trees
    • Rotating
      – Only pointers are moved → very efficient, O(1)
      – Does not change the physical storage of nodes
      – Especially suited for main-memory-based trees


4.3 Self-Balancing Binary Search Trees

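• Local Balancing – Rotating (notation: node(left subtree, right subtree); 1 to 4 denote subtrees)
  – Simple rotation (left, right)
    • Right rotation around the pivot: y(x(1, 2), 3) becomes x(1, y(2, 3)); the left rotation is the inverse
  – Double rotation (left-left, right-right, “rollercoaster”)
    • Two right rotations: z(y(x(1, 2), 3), 4) becomes y(x(1, 2), z(3, 4)) and then x(1, y(2, z(3, 4))); two left rotations are the inverse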

4.3 Self-Balancing Binary Search Trees

• Local Balancing – Rotating (notation: node(left subtree, right subtree); 1 to 4 denote subtrees)
  – Double rotation (left-right, “zig-zag”)
    • A left rotation followed by a right rotation: z(y(1, x(2, 3)), 4) becomes z(x(y(1, 2), 3), 4) and then x(y(1, 2), z(3, 4))
  – Double rotation (right-left, “zig-zag”)
    • Analogous to left-right
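Rotations only re-link a handful of pointers. The following sketch shows the two simple rotations and the left-right double rotation built from them; the names and the use of the numbers 1 to 4 as stand-ins for the four subtrees are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(y):
    """The left child x of y becomes the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    return x


def rotate_left(x):
    """The right child y of x becomes the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    return y


def rotate_left_right(z):
    """Zig-zag case: rotate the left child to the left, then z to the right."""
    z.left = rotate_left(z.left)
    return rotate_right(z)


def in_order(n):
    return [] if n is None else in_order(n.left) + [n.key] + in_order(n.right)


# z with left child y, whose right child is x; 1..4 stand for the subtrees.
z = Node("z", Node("y", Node(1), Node("x", Node(2), Node(3))), Node(4))
root = rotate_left_right(z)
print(root.key, root.left.key, root.right.key, in_order(root))
```

The in-order sequence is the same before and after the rotation, so the search-tree property is preserved while the height shrinks.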

4.3 Self-Balancing Binary Search Trees

• The presented concepts can be combined in different ways to implement self-balancing trees – AVL-Tree (classic example) – Red-Black-Tree – Splay-Tree – Scapegoat-Tree

4.3 Self-Balancing Binary Search Trees

• Implementation: AVL-Trees
  – Invented 1962 by Adelson-Velsky and Landis
  – Uses local rebalancing with rotations for insertion and deletion
    • Unbalanced criterion: |h(left(v)) − h(right(v))| > 1
      – “Height difference of left and right subtree of v is 2 or more”
  – Height information is stored explicitly within nodes
    • Updated while backtracking after each insert and delete
    • Storage overhead of O(n)
  – Guaranteed maximum height of about 1.44 · log₂ n
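A compact sketch of AVL insertion, combining the stored height, the backtracking update and the rotations. It is an illustration under simplifying assumptions (insertion only, unique keys, leaf height 0), not a complete AVL implementation.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0               # stored explicitly; a leaf has height 0


def h(node):
    return node.height if node else -1


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                   # key already present
    update(node)                      # backtracking: refresh the stored height
    balance = h(node.left) - h(node.right)
    if balance > 1:                   # left subtree too high
        if key > node.left.key:       # zig-zag: left-right double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                  # right subtree too high
        if key < node.right.key:      # zig-zag: right-left double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


root = None
for key in [99, 85, 61, 57, 42, 33, 17]:  # order that degenerates a plain BST
    root = insert(root, key)
print(root.key, root.height)
```

With the descending insert order that produced a chain in the plain binary search tree, the AVL variant ends up with 57 at the root and only three levels.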

4.3 Self-Balancing Binary Search Trees

• Implementation: Scapegoat-Trees
  – Invented 1993 by Galperin and Rivest
  – Uses global rebuilding for deletions
  – Uses local balancing by rebuilding the unbalanced subtree for insertions
  – Unbalanced criterion: h(v) > log_{1/α}|v| + 1; 0.5 ≤ α ≤ 1
  – Node statistics (height, size) are determined dynamically during backtracking
    • Only global statistics are stored
    • Storage overhead of O(1)


4.4 Problems with Binary Search Trees

• Are binary trees really suitable for disk based databases? – Yes and No… – Binary Trees are great data-structures for usage in internal memory – But they have a very bad performance when stored on external storage (i.e. hard disks)

4.4 Problems with Binary Search Trees

• Binary tree nodes have to be stored within hard disk blocks in linear fashion – When tree is large, nodes are scattered among the blocks – In worst case, a new block must be read from disk for every node accessed during search or traversal • Every linearization scheme for binary trees has that problem – Reading a block from disk is very expensive

4.4 Problems with Binary Search Trees

• Sample linearization
  – Search for ’42’
    • In the worst case, 3 blocks need to be fetched from disk for just 4 nodes
  – The problem is even worse for a full tree traversal

Tree (one level per line):
33
16 47
6 21 39 68
4 7 18 24 35 42 59 99

Disk/DB blocks (Block 1 to Block 4, nodes stored level by level):
33 16 47 6 21 39 68 4 7 18 24 35 42 59 99

4.4 B-Trees

• B-Trees adapt concepts and techniques learned for binary trees and optimize them for hard disk storage
• Basic ideas:
  – Searching within a DB/disk block is very efficient
    • Take advantage of the static nature within a block
    • Search can be performed in memory with bisection search
    • Treat entire blocks as tree nodes
  – Reading blocks from the disk is expensive
    • Reduce block reads
    • Most data resides in the leaf nodes
  – Thus minimize the height of the tree
    • Dramatically increase fan-out factor
    • Tree becomes “bushy”
    • Smaller search path length

4.4 Block Search Trees

• First improvement: Block Search Tree
  – Nodes are complete DB blocks
  – Each node can store up to q pointers p_i and q−1 unique and ordered key entries k_i
    • k_i < k_{i+1}
    • Pointers p_i link to subtrees (or are empty). All keys in the subtree of p_i are less than k_i and greater than k_{i−1}

[Figure: nodes consist of key values with node pointers between them]
Root: 10 20 30
Children: 5 | 15 17 | 34 38 55
Third level: 13 14 (child of 15 17)

EN 14.3.1

4.4 Block Search Trees

• Locate a key k
  – Recursive procedure: start with root node
    • Use bisection search within the current node
    • If key found
      – Return it
    • If key not found
      – If there is a p_i with k_{i−1} < k < k_i
        » Follow p_i and repeat the algorithm with the linked node
      – Else
        » Key not in tree
  – Example: locate 14

Root: 10 20 30
Children: 5 | 15 17 | 34 38 55
Third level: 13 14 (child of 15 17)
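The lookup of 14 can be sketched as below. The class `BlockNode` and the function `locate` are hypothetical names; each visited node stands for one block read, and the pointer between 20 and 30 is assumed to be empty, as no node is shown there.

```python
from bisect import bisect_left


class BlockNode:
    def __init__(self, keys, children=None):
        self.keys = keys                                  # sorted keys k_1 .. k_m
        self.children = children or [None] * (len(keys) + 1)  # pointers p_1 .. p_(m+1)


def locate(node, k):
    """Return (found, nodes_read) for key k."""
    reads = 0
    while node is not None:
        reads += 1                                        # one block read per node
        i = bisect_left(node.keys, k)                     # bisection inside the block
        if i < len(node.keys) and node.keys[i] == k:
            return True, reads
        node = node.children[i]                           # pointer between k_(i-1) and k_i
    return False, reads


leaf = BlockNode([13, 14])
root = BlockNode([10, 20, 30], [
    BlockNode([5]),
    BlockNode([15, 17], [leaf, None, None]),
    None,                                                 # assumed empty pointer
    BlockNode([34, 38, 55]),
])
print(locate(root, 14), locate(root, 25))
```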

EN 14.3.1

4.4 Block Search Trees

• Insert a key k
  – Recursive procedure: start with root node
    • Use bisection search within the current node
    • If key found
      – Key cannot be inserted twice, abort
    • If key not found
      – If there is a p_i with k_{i−1} < k < k_i
        » Follow p_i and repeat the algorithm with the linked node
      – Else
        » If there is space left in the node
          • Insert key and restore sort order
        » Else
          • Create new, empty node
          • Insert k into the new node
          • Link the new node to p_i in the current node such that k_{i−1} < k < k_i

EN 14.3.1

4.4 Block Search Trees

• Insert a key: 44

Before:
Root: 10 20 30
Children: 5 | 15 17 | 34 38 55
Third level: 13 14

After:
Root: 10 20 30
Children: 5 | 15 17 | 34 38 55
Third level: 13 14 | 44 (new node below 34 38 55)

EN 14.3.1

4.4 Block Search Trees

• Delete a key k
  – Start with root node
  – Locate k
    • If k is in a leaf node, delete k from the node and restore order
      – If the leaf node is now empty, delete the node
    • If k is in an internal node
      – If none or only one of the pointers directly adjacent to k is used
        » Delete k and restore order
      – If k is a separator between two used pointers
        » If the keys of both subnodes fit into one node
          • Union both nodes into one
          • Delete k and restore order
        » Else
          • Replace k with a new separator key
          • Either the largest key in the left node or the smallest key in the right node
  – Any completely empty node is deleted as in binary search trees

EN 14.3.1

4.4 Block Search Trees

• Delete a key: 10

Before:
Root: 10 20 30
Children: 5 | 15 17 | 34 38 55
Third level: 13 14 | 44

After (nodes 5 and 15 17 are unioned):
Root: 20 30
Children: 5 15 17 | 34 38 55
Third level: 13 14 | 44

EN 14.3.1

4.4 Block Search Trees

• Delete a key: 20

Before:
Root: 20 30
Children: 5 15 17 | 34 38 55
Third level: 13 14 | 44

After:
Root: 30
Children: 5 15 17 | 34 38 55
Third level: 13 14 | 44

EN 14.3.1

4.4 Block Search Trees

• Delete a key: 30

Before:
Root: 30
Children: 5 15 17 | 34 38 55
Third level: 13 14 | 44

After (34, the smallest key of the right node, becomes the new separator):
Root: 34
Children: 5 15 17 | 38 55
Third level: 13 14 | 44

EN 14.3.1 Relational Database Systems 2 – Wolf-Tilo Balke– Institut für Informationssysteme 43 4.4 Block Search Trees

• Block Search Trees have similar properties to Binary Search Trees – Can be perfect, balanced or degenerated • Assume height h=3; fan-out-factor q=2048; and total number of keys n – Block Search Tree • One node can store up to 2047 keys and 2048 links • Perfect : n = 8581M • Balanced : 4M ≤ n ≤ 8581M • Degenerated : n = 6141 – Binary Search Tree (Block tree with q=2) • One node can store 1 key and up to 2 links • Perfect : n = 7 • Balanced : 3 < n ≤ 7 • Degenerated : n = 3


4.4 Block Search Trees

• Assume n = 1,000,000,000; fan-out-factor q=2048; and height h – Block Search Tree • Balanced : h = 3 • Degenerated : h = 488,520 – Binary Search Tree • Balanced : h = 30 • Degenerated : h = 1,000,000,000 • During search, there is one disk access per tree height in worst case – In this example, block search trees are already 10 times more efficient when balanced, 2000 times when unbalanced
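The heights quoted above follow from simple arithmetic, sketched here. The sketch assumes that a perfect tree with h levels and fan-out q holds q^h − 1 keys and that a degenerated tree is a chain of full nodes; the function names are illustrative.

```python
def max_keys(q, h):
    """Keys in a perfect block search tree with fan-out q and h levels."""
    return q ** h - 1                 # (1 + q + ... + q^(h-1)) nodes with q-1 keys each


def balanced_height(n, q):
    """Smallest number of levels whose perfect tree can hold n keys."""
    h = 1
    while max_keys(q, h) < n:
        h += 1
    return h


def degenerated_height(n, q):
    """Chain of full nodes: every level holds q-1 keys."""
    return -(-n // (q - 1))           # ceiling division


n = 1_000_000_000
print(balanced_height(n, 2048), degenerated_height(n, 2048))
print(balanced_height(n, 2), degenerated_height(n, 2))
```

Since one node access corresponds to one disk access, the ratios 30 / 3 and 1,000,000,000 / 488,520 give the factors of roughly 10 and 2000 mentioned on the slide.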


4.4 Block Search Trees

• Summary: Block Search Trees
  – Data structure optimized for disk storage
  – Very efficient in the average case
    • O(log n) for all operations
    • Even better
      – Average number of node accesses to locate a key is log_{fan-out} n
        » Fan-out usually in the order of several thousands
        » A binary tree averages only log₂ n
      – Accessing a node is expensive on disks, so this is a huge improvement
  – Can be very inefficient for degenerated cases
    • O(n) for all operations
    • Better than binary trees, but still bad

4.4 B-Trees

• B-Trees are specialized Block Search Trees for disk-based indexing
  – Invented by Rudolf Bayer in 1971
  – Keys may be non-unique
  – Tree is self-balancing
    • No degenerated cases anymore

EN 14.3.2

4.4 B-Trees

• Basic structure of a B-tree node – Nodes contain key values and respective data (block) pointers to the actual data records – Additionally, there are node pointers for the left, resp. right interval around a key value

[Figure: tree node with key values, data pointers and node pointers]

4.4 B-Trees

• B-Trees as Primary Index

[Figure: B-tree with root keys 2 6 7 and child nodes 1 | 3 4 5 | 8 9; the data pointer of each key references its data record. Records shown include (1, Adams, $887,00), (2, Bertram, $19,99), (4, Cesar, $1866,00), (5, Miller, $179,99), (7, Ruth, $8642,78), (8, Smith, $675,99) and further records for Tarrens, Naders and Behaim with the amounts $99,00, $167,00 and $682,56]

EN 14.3.2

4.4 B-Trees

• All base operations are similar to the Block Search Tree, with small changes
  – Guaranteed fill degree
  – Self-balancing
• Each node contains between L(ower) and U(pper) links, i.e. between L−1 and U−1 keys
  – Usually 2 · L = U
  – Nodes are split during insertion as soon as they would contain more than U−1 keys
  – Nodes are unioned during deletion as soon as they contain fewer than L−1 keys
  – If a complete node is created or deleted, use local rebalancing to re-balance the tree
    • Local rebuilding for disk-based storage, rotations for memory-based storage

EN 14.3.2

4.4 B-Trees

• All insertions start at the leaf nodes
  – Search the tree to find the leaf node where the new element should be added
  – If the leaf node contains fewer than the maximum legal number of elements (|leaf node| < U−1)
    • Insert the new element in the node and restore order
  – Otherwise the leaf node is split into two nodes (node split)
    • The median is chosen from among the leaf's elements and the new element
    • Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value
    • That separation value is added to the node's parent, which may also cause it to be split
    • If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children
  – Remember: the lower bound on the size of internal nodes does not apply to the root
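The split step itself can be sketched in isolation. The helper `insert_into_leaf` is hypothetical and only handles a single leaf: it returns the separator that would have to be added to the parent, but does not propagate the split upwards.

```python
from bisect import insort


def insert_into_leaf(keys, new_key, max_keys):
    """Insert new_key into a sorted leaf holding at most max_keys elements.

    Returns (left, separator, right). Without a split, separator and right
    are None and left is the updated leaf.
    """
    merged = list(keys)
    insort(merged, new_key)           # insert and restore order
    if len(merged) <= max_keys:
        return merged, None, None
    mid = len(merged) // 2            # median of old elements plus new element
    return merged[:mid], merged[mid], merged[mid + 1:]


print(insert_into_leaf([10, 20], 15, 3))
print(insert_into_leaf([10, 20, 30], 25, 3))
```

In the second call the leaf is already full, so the median 25 moves up as separation value and the remaining elements form the new left and right nodes.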

4.4 B-Trees

• Deleting elements is problematic, because node sizes can fall below the minimum number of elements (i.e. |new node size| < L−1)
  – An element to be deleted in an internal node may be a separator for its child nodes
• Deletion from a leaf node
  – Search for the value to delete
  – If the value is in a leaf node, it can simply be deleted from the node
    • Test if the node has too few elements; in that case the tree has to be rebalanced


4.4 B-Trees

• Rebalancing after deletion – If some leaf node is under the minimum size, some elements must be redistributed from its siblings to bring all children nodes again up to the minimum (stealing) • If all siblings have only minimum size, the parent node is affected and has to hand over an element • If the parent then falls under the minimum size, the redistribution must be applied iteratively up the tree • Since the minimum element count does not apply to the root, making the root the only deficient node is not a problem

4.4 B-Trees

• The rebalancing strategy is to find a sibling of the deficient node which has more than the minimum number of elements (for stealing) – Choose a new separator • Move it to the parent node and redistribute the values in both original nodes to the new left and right children – If the sibling node to the right of the deficient node has only the minimum number of elements, examine the sibling node to the left

4.4 B-Trees

– If both siblings have only the minimum number of elements: • create a new node with all the elements from the deficient node & all the elements from one of its siblings & the separator in the parent between the two combined sibling nodes – Remove the separator from the parent, and replace the two children it separated with the combined node. – If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root may be deficient

4.4 B-Trees

• Example: Steal Keys from Siblings (l=3)

4.4 B-Trees

• Example: Join Child Nodes (l=3)

4.4 B-Trees

• Each element in an internal node acts as a separation value for two subtrees
• When such an element is deleted, there are two cases:
  – Both child nodes to the left and right of the deleted element have the minimum number of elements (L−1) and can then be joined into a single legal node with 2L−2 elements
  – One of the two child nodes contains more than the minimum number of elements. Then a new separator for those subtrees must be found. There are two possible choices:
    • The largest element in the left subtree is the largest element which is still less than the separator
    • The smallest element in the right subtree is the smallest element which is still greater than the separator

4.4 B-Trees

• Deletion from an internal node
  – If the value is in an internal node, choose a new separator, remove it from the leaf node it is in, and replace the element to be deleted with the new separator
  – This removes an element from a child node, so the deletion is passed down the tree
    • If the child is a leaf node, the leaf node deletion procedure applies

4.4 B-Trees

• Example: Build a B-Tree

4.4 B-Trees

• Summary
  – Very efficient data structure for disk storage
    • O(log n) for all operations
    • Even better
      – Guaranteed maximum number of node accesses to locate a key is $\log_{\text{fan-out}}\left(\frac{n+1}{2}\right)$
      – A balanced binary tree guarantees only $\lceil \log_2 n \rceil$
      – Accessing a node is expensive on disks → huge improvement
  – No degenerated cases
    • Self-balancing is rarely necessary, as most updates affect just one node
    • Wasted space is decreased due to the guaranteed minimal fill factor

EN 14.3.2

4.4 B*Trees

• The B*Tree is a constrained B-Tree
  – All non-root nodes need to be at least 2/3 full
  – Implemented in various file systems
    • HFS
    • Reiser4
  – Used to be quite popular, but lost its importance…

EN 14.3.3

4.4 B+Trees

• The B+Tree is an optimization of the B-Tree
  – Improved traversal performance
  – Increased search efficiency
  – Increased memory efficiency
• B+Tree uses different nodes for leaf nodes and internal nodes
  – Internal nodes: only unique keys and node links
    • No data pointers!
  – Leaf nodes: replicated keys with data pointers
    • Data pointers only here
[Figure: internal node with key values and node pointers; leaf node with key values and data pointers]

EN 14.3.3

4.4 B+Trees

• Internal nodes are used for search guidance
  – A block can contain more keys → higher fan-out
• Leaves just contain data links
  – All leaves are linked to each other in order, for increased traversal performance
[Figure: internal search nodes above a chain of linked data nodes]
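The benefit of the linked leaves shows up in range queries: one descent through the internal nodes, then a walk along the leaf chain. The sketch below assumes the convention that a separator key is replicated as the smallest key of the leaf to its right; the class and function names, and the string record identifiers, are illustrative.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys              # sorted keys
        self.pointers = pointers      # data pointers live only in the leaves
        self.next = None              # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys              # separator keys only, no data pointers
        self.children = children


def find_leaf(node, key):
    """Descend through the internal nodes to the leaf that may hold key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, lo, hi):
    """Return all (key, pointer) pairs with lo <= key <= hi."""
    result = []
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k, p in zip(leaf.keys, leaf.pointers):
            if k > hi:
                return result
            if k >= lo:
                result.append((k, p))
        leaf = leaf.next              # follow the leaf chain, no tree descent
    return result


l1 = Leaf([1, 3], ["r1", "r3"])
l2 = Leaf([5, 7], ["r5", "r7"])
l3 = Leaf([9, 11], ["r9", "r11"])
l1.next, l2.next = l2, l3
root = Internal([5, 9], [l1, l2, l3])
print(range_scan(root, 3, 9))
```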

EN 14.3.3

4.4 B+Trees

• Summary – B+ Tree is THE super index structure for disk-based databases – Improved over B-Tree • Improved traversal performance • Increased search efficiency • Increased memory efficiency

EN 14.3.3

4.5 IMDBs

• Observation
  – Loading data from the hard disk is a major bottleneck
  – Available main memory still doubles every 18 months
    • Moore’s Law
• Idea
  – Store all data in fast main memory!
• Solutions
  – Use a “traditional” DBMS with a huge buffer pool (block cache)
    • DBMS are usually optimized for sequential disk access
  – Design special In-Memory Database Systems (IMDB)
    • Or MMDB (Main Memory Database)

4.5 IMDBs

• Why do we need in-memory databases?
  – Embedded systems
    • Mobile phones
    • PDAs
    • Sensors
    • Diskless computing devices
    • …
  – Ultra-high-performance (real-time) scenarios
    • Network applications
    • Telecommunication applications
    • High-volume trading
    • …


4.5 IMDBs

• Why should IMDBs be different?
• Traditional DBMS also work in-memory, but they waste potential
  – Random access has nearly no penalty compared to sequential access
    • Optimizing for linear storage and block read/write is unnecessary

Type | Media | Size | Random Acc. Speed | Transfer Speed | Characteristics | Price | Price/GB
Pri | DDR3-RAM (Corsair 1600C7DHX) | 2 GiB | 0.004 ms | 8000 MB/sec | Vol, Dyn, RA, OL | €38 | €19
Sec | Harddrive, magnetic (Seagate ST32000641AS) | 2000 GB | < 8.5 ms | 138 MB/sec | Stat, RA, OL | €143 | €0.07

4.5 IMDBs

• Storing a DB in main memory also has problems – Main memory usually smaller and more expensive – ACID support • IMDBs support atomicity, consistency and isolation • Problem: main memory is not persistent – What happens in case of power failure? – How to ensure the durability requirement of DBs?

4.5 IMDBs - Durability

• Snapshot Files / Checkpoint Images – Record state of DB at given point in time – Done periodically or in case of controlled shutdown – Only partial durability • Transaction Logging – History of actions executed by the DBMS – File of changes done in the database stored in stable storage – If DB has not been shut down properly (respectively is in inconsistent state) the DBMS reviews logs for uncommitted transactions and rolls back changes

4.5 IMDBs - Durability

• Non-volatile random access memory (NVRAM) – Static RAM backed up with battery power (battery RAM) – Or electrically erasable programmable ROM (EEPROM)

4.5 IMDBs - Indexes

• IMDB Index Structures – B-Trees are great, but they are shallow and bushy which is unnecessary in main memory • Can save some performance there – Hash Indexes are very suitable in main memory for unsorted data • Especially bucket chained hashing is very efficient – For sorted data: Use the T-Tree instead of B-Tree • Specialized tree for main memory databases • Blend between AVL-Tree and B-Tree

4.5 T-Tree

• T-Tree design considerations
  – I/O access is cheap in main memory
  – Expensive resources are computation time and memory space
• Properties
  – T-Tree is a self-balancing binary tree (AVL algorithm)
  – T-Tree nodes contain only links
  – Each node links to m data records (d_1 … d_m)
    • Data entries are ordered, smallest left, biggest right
    • All nodes contain a maximum of c_max entries
    • Each internal node contains c_min to c_max entries (usually c_max − c_min ≤ 2)
  – Each node has a link to its parent
  – Each node has at most a left and a right subtree
    • Left subtree contains only entries smaller than the minimal node entry
    • Right subtree contains only entries bigger than the maximal node entry
[Figure: T-Tree node with parent link p, data links d_1 d_2 … d_{m−1} d_m and child links l and r]
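A search in such a tree compares the key only with the minimal and maximal entry of each node on the way down and searches inside a single node at the end. The sketch below simplifies the structure: plain keys stand in for the links to data records, and parent links and rebalancing are left out. `TNode` and `t_search` are hypothetical names.

```python
from bisect import bisect_left


class TNode:
    def __init__(self, entries, left=None, right=None):
        self.entries = entries        # sorted; stand-ins for links to data records
        self.left = left              # subtree with entries < entries[0]
        self.right = right            # subtree with entries > entries[-1]


def t_search(node, key):
    """Return True if key is stored in the T-Tree rooted at node."""
    while node is not None:
        if key < node.entries[0]:     # comparison with the minimal entry
            node = node.left
        elif key > node.entries[-1]:  # comparison with the maximal entry
            node = node.right
        else:
            # Bounding node found: bisection inside this one node only.
            i = bisect_left(node.entries, key)
            return node.entries[i] == key
    return False


root = TNode([40, 45, 50],
             TNode([10, 20, 30]),
             TNode([60, 70, 80]))
print(t_search(root, 20), t_search(root, 47), t_search(root, 90))
```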

4.5 IMDB Indexes

• How do main memory index structures compare?

4.5 IMDB Indexes

• Why not always use Chained Bucket Hashing?
  – No range queries
  – Storage overhead
  – Suboptimal if the amount of data is unknown during initialization
• Why do T-Tree and AVL-Tree perform better than B-Tree and ordered array for search?
  – Bisection search within a B-tree node/array needs to compute the position of the next comparison
    • AVL and T-Tree only need 2 comparisons in each node
• Why does T-Tree perform better than AVL for updates?
  – Due to larger nodes, many updates do not require a rebalancing
• Why does an ordered array suck for updates?
  – Reordering of all elements necessary for each update


4.5 References and Timeline

• AVL-Tree
  – G. Adelson-Velskii, E. M. Landis: “An algorithm for the organization of information”. Proceedings of the USSR Academy of Sciences 146: 263-266 (Russian), 1962
  – English translation by M. J. Ricci in Soviet Math. Doklady, 3:1259–1263, 1962
• B-Trees
  – R. Bayer, E. M. McCreight: “Organization and Maintenance of Large Ordered Indexes”. Acta Informatica 1, 173-189, 1972
• T-Trees
  – T. J. Lehman, M. J. Carey: “A Study of Index Structures for Main Memory Database Management Systems”, 12th Int. Conf. on Very Large Data Bases, Kyoto, August 1986
• Scapegoat Trees
  – I. Galperin, R. L. Rivest: “Scapegoat trees”, ACM-SIAM Symposium on Discrete Algorithms, Austin, Texas, US, 1993

4 Storage

• Tree data structures are good index structures
  – O(log n) performance on average
  – But they may degenerate → O(n)
    • Balancing necessary!
  – Block structure of hard disks must be considered
    • → Block trees
  – B-Tree
    • Self-balancing block tree with fill guarantees
  – B+-Tree
    • Special inner nodes without data pointers
    • Leaf nodes optimized for linear traversal

5 Outlook

• The Query Processor – How do DBMS actually answer queries? – Query Parsing/Translation – Query Optimization – Query Execution – Implementation of Joins
