Balanced Binary Search Trees
Pedro Ribeiro
DCC/FCUP
2019/2020
Motivation
Let S be a set of "comparable" objects/items:
- Let a and b be two objects. They are "comparable" if it is possible to say that a < b, a = b or a > b.
- Example: numbers, but we could have other data types (students with names and numbers, teams with points and goal-average, ...)
A few possible problems of interest:
- Given a set S, determine if a certain item is in S
- Given a dynamic set S (that changes with insertions and removals), determine if a certain item is in S
- Given a dynamic set S, determine the min/max item in S
- Given a dynamic set S, determine the elements in a range [a, b]
- Sort a set S
- ...
Binary Search Trees!
Binary Search Trees - Notation
An overview of notation for binary trees:
- Node A is the root and nodes D, E and F are the leaves
- Nodes {B, D, E} form a subtree
- Node A is the parent of nodes B and C
- Nodes D and E are children of node B
- Node B is a sibling of node C
- ...
Binary Search Trees - Overview
For every node of the tree, the following must hold: the node is bigger than all nodes in its left subtree and smaller than all nodes in its right subtree
Binary Search Trees - Example
The smallest element is... in the leftmost node
The biggest element is... in the rightmost node
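A minimal sketch of how the min/max observation above could be coded. The `Node` class and the names `find_min` and `find_max` are illustrative assumptions, not something defined in these slides.

```python
class Node:
    """Hypothetical BST node used only for this illustration."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def find_min(node):
    """Follow left children until there is none: that is the leftmost node."""
    while node.left is not None:
        node = node.left
    return node.value

def find_max(node):
    """Follow right children until there is none: that is the rightmost node."""
    while node.right is not None:
        node = node.right
    return node.value

# A small hand-built BST:     8
#                           /   \
#                          3     10
#                         / \      \
#                        1   6      14
root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
print(find_min(root), find_max(root))
```

Both functions assume a non-empty tree; an empty tree would need a separate check.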
Binary Search Trees - Search
Searching for values in binary search trees:
Searching in a binary search tree (true/false to check if exists)
Search(T, v):
    If Null(T) then
        return false
    Else If v < T.value then
        return Search(T.left_child, v)
    Else If v > T.value then
        return Search(T.right_child, v)
    Else
        return true
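The pseudocode above translates almost line by line into Python. This is only a sketch: the `Node` class and the field names `left`/`right` are assumptions made for the example.

```python
class Node:
    """Hypothetical BST node used only for this illustration."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def search(node, v):
    """Return True if v is stored in the subtree rooted at node."""
    if node is None:          # Null(T): we fell off the tree
        return False
    if v < node.value:        # v can only be in the left subtree
        return search(node.left, v)
    if v > node.value:        # v can only be in the right subtree
        return search(node.right, v)
    return True               # v == node.value

root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
print(search(root, 6), search(root, 7))
```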
Binary Search Trees - Insertion
Inserting values in binary search trees:
Insertion on a binary search tree
Insert(T, v):
    If Null(T) then
        return new Node(v)
    If v < T.value then
        T.left_child = Insert(T.left_child, v)
    Else If v > T.value then
        T.right_child = Insert(T.right_child, v)
    return T
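A possible Python rendering of the insertion pseudocode above. The `Node` class and the `in_order` helper are assumptions added for the example; as in the pseudocode, a value that is already present is silently ignored.

```python
class Node:
    """Hypothetical BST node used only for this illustration."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, v):
    """Insert v into the subtree rooted at node and return the subtree root."""
    if node is None:
        return Node(v)
    if v < node.value:
        node.left = insert(node.left, v)
    elif v > node.value:
        node.right = insert(node.right, v)
    return node  # v == node.value: nothing to do

def in_order(node):
    """List the values in increasing order (left, node, right)."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

root = None
for v in [8, 3, 10, 1, 6, 14, 6]:  # the second 6 is a duplicate
    root = insert(root, v)
print(in_order(root))
```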
Binary Search Trees - Removal
Removing values from binary search trees:
After finding the node we need to decide how to remove it
- 3 possible cases: the node has no children, exactly one child, or two children
[Figure: the three removal cases, before ⇓ after]
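One common way to handle the three removal cases is sketched below. It is an assumption of this example (the figure is not available here) that a node with two children is replaced by its in-order successor; using the predecessor would work equally well. `Node`, `remove` and `in_order` are illustrative names.

```python
class Node:
    """Hypothetical BST node used only for this illustration."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def remove(node, v):
    """Remove v from the subtree rooted at node and return the subtree root."""
    if node is None:
        return None                      # v is not in the tree
    if v < node.value:
        node.left = remove(node.left, v)
    elif v > node.value:
        node.right = remove(node.right, v)
    else:
        if node.left is None:            # no children, or only a right child
            return node.right
        if node.right is None:           # only a left child
            return node.left
        succ = node.right                # two children: find the successor,
        while succ.left is not None:     # the leftmost node of the right subtree
            succ = succ.left
        node.value = succ.value          # copy it here ...
        node.right = remove(node.right, succ.value)  # ... and remove the copy
    return node

def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
for v in (1, 10, 8):  # a leaf, a node with one child, a node with two children
    root = remove(root, v)
    print(v, in_order(root))
```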
Binary Search Trees - Execution Time
How to characterize the execution time of each operation?
- All operations search for a node by traversing (at most) the height of the tree
Complexity of operations in a binary search tree
Let h be the height of a binary search tree T. The complexity of finding the minimum, maximum, or searching for an element, or inserting or removing an element in T is O(h).
Binary Search Trees - Visualization
A nice visualization of search, insertion and removal can be seen at:
https://www.cs.usfca.edu/~galles/visualization/BST.html
Unbalanced Trees
The problem with the previous methods:
The height of the tree can be linear in n (where n is the number of elements), for example when the elements are inserted in sorted order
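To make the problem concrete, here is a small experiment, written as a sketch with assumed helper names (`Node`, `insert`, `height`, `build`). Heights are counted in nodes, so a single node has height 1.

```python
class Node:
    """Hypothetical BST node used only for this illustration."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, v):
    """Plain (unbalanced) BST insertion."""
    if node is None:
        return Node(v)
    if v < node.value:
        node.left = insert(node.left, v)
    elif v > node.value:
        node.right = insert(node.right, v)
    return node

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def build(values):
    root = None
    for v in values:
        root = insert(root, v)
    return root

chain = build(range(1, 101))           # sorted order: every node has only a right child
bushy = build([4, 2, 6, 1, 3, 5, 7])   # a "middle first" order
print(height(chain))  # 100
print(height(bushy))  # 3
```

With 100 elements inserted in sorted order the tree degenerates into a list of height 100, while a good insertion order for 7 elements gives height 3.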
Balancing Strategies
There are many strategies to guarantee that the complexity of the search, insertion and removal operations is better than O(n)
Balanced Trees (height O(log n)):
- AVL Trees
- Red-Black Trees
- Splay Trees
- Treaps
Other Data Structures:
- Skip Lists
- Hash Tables
- Bloom Filters
Balancing Strategies
A simple strategy: reconstruct the tree once in a while
[Figure: an unbalanced tree ⇒ the same elements rebuilt as a balanced tree]
On a "perfect" binary tree with n nodes, the height is... O(log(n))
Balancing Strategies
Given a sorted list of numbers, in which order should we insert them in a binary search tree so that it stays as balanced as possible?
Answer: "binary search": insert the element in the middle, split the remaining list in two (smaller and bigger) based on that element, and insert each half by applying the same method
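The "binary search" order can be sketched as follows. Instead of calling an insertion routine repeatedly, this example builds the nodes directly, which yields the same tree shape as inserting in that order. `Node`, `build_balanced` and `height` are assumed names for the illustration.

```python
class Node:
    """Hypothetical BST node used only for this illustration."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def build_balanced(sorted_values, lo=0, hi=None):
    """Build a BST from sorted_values[lo:hi], taking the middle element as root."""
    if hi is None:
        hi = len(sorted_values)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(sorted_values[mid])
    node.left = build_balanced(sorted_values, lo, mid)        # smaller half
    node.right = build_balanced(sorted_values, mid + 1, hi)   # bigger half
    return node

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

root = build_balanced([1, 2, 3, 4, 5, 6, 7])
print(root.value, height(root))  # 4 3
```

Rebuilding this way takes O(n) time when the sorted list is already available (for instance from an in-order traversal of the old tree).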
How frequently should we reconstruct the binary search tree so that we can guarantee efficiency?
- If we reconstruct often, we have many O(n) operations
- If we rarely reconstruct, the tree may become unbalanced
- A possible answer: after O(√n) insertions
Balancing Strategies
Simple case: how to balance the following tree (heights in parentheses):
[Figure: tree before ⇒ after the rotation]
This operation is called a right rotation
Balancing Strategies
The relevant rotation operations are the following:
- Note that we must not break the properties that make the tree a binary search tree
[Figure: Right Rotation, before ⇒ after]
[Figure: Left Rotation, before ⇒ after]
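A minimal sketch of the two rotations, assuming a plain `Node` class with `left` and `right` fields (names chosen for this example). Each function returns the new root of the rotated subtree, so the caller is expected to store it back into the parent.

```python
class Node:
    """Hypothetical BST node used only for this illustration."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def rotate_right(x):
    """The left child y of x moves up; x becomes the right child of y."""
    y = x.left
    x.left = y.right   # the subtree between y and x changes parent
    y.right = x
    return y

def rotate_left(x):
    """Mirror image: the right child y of x moves up."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

chain = Node(3, Node(2, Node(1)))      # 3 -> 2 -> 1, all left children
root = rotate_right(chain)
print(root.value, in_order(root))      # 2 [1, 2, 3]
```

The in-order sequence is the same before and after, which is exactly the binary search tree property that must not be broken.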
AVL Trees
AVL Tree: a binary search tree that guarantees that, for each node, the heights of the left and right subtrees differ by at most one unit (height invariant)
When inserting and removing nodes, we change the tree so that we keep the height invariant
AVL Trees
Inserting into an AVL tree works like inserting into any binary search tree. However, the tree might break the height invariant (and stop being "balanced").
The following cases may occur:
[Figure: two unbalanced trees, one with +2 on the left and one with +2 on the right]
Let’s see how to correct the first case with simple rotations. Correcting the second case is similar, but with mirrored rotations
AVL Trees
In the first case, we have two different possible shapes of the AVL Tree. The first:
[Figure: left is too "heavy", case 1, before ⇒ after]
We correct by making a right rotation starting at X
Note: the height of Y2 might be h + 1 or h: this correction works for both cases
AVL Trees
The second:
[Figure: left is too "heavy", case 2, before ⇒ after the left rotation ⇒ after the right rotation]
We correct by making a left rotation starting at Y, followed by a right rotation starting at X
Note: the height of Y21 or Y22 might be h or h − 1: this correction works for both cases
AVL Trees
By inserting nodes we might unbalance the tree (breaking the height invariant)
In order to correct this, we apply rotations along the path where the node was inserted
There are two analogous unbalancing types: to the left or to the right
Each type has two possible cases, which are solved by applying different rotations
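The four cases can be put together in a compact AVL insertion. This is a sketch under assumptions: each node stores its full height (counted in nodes) instead of the 2-bit balance value mentioned later in these slides, and the names `Node`, `rebalance` and `insert` are invented for the example.

```python
class Node:
    """Hypothetical AVL node that stores the height of its subtree."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 1  # a leaf has height 1

def height(node):
    return node.height if node is not None else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    update(x)   # x is now below y, so fix its height first
    update(y)
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y

def rebalance(node):
    """Restore the height invariant at node, assuming its subtrees are valid."""
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                                   # left is too heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)        # case 2: double rotation
        return rotate_right(node)                     # case 1 (or end of case 2)
    if balance < -1:                                  # right is too heavy (mirror)
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def insert(node, v):
    """BST insertion followed by rebalancing on the way back up."""
    if node is None:
        return Node(v)
    if v < node.value:
        node.left = insert(node.left, v)
    elif v > node.value:
        node.right = insert(node.right, v)
    return rebalance(node)

root = None
for v in [1, 2, 3, 4, 5, 6, 7]:   # sorted order, the worst case for a plain BST
    root = insert(root, v)
print(root.value, root.height)    # 4 3
```

Because the recursion returns through every node on the insertion path, `rebalance` is applied "along the path where the node was inserted", as described above.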
AVL Trees
Example of node insertion:
[Figure sequence: step-by-step insertion of a node; the tree is rebalanced after two rotations]
AVL Trees
To remove elements, we apply the same idea as for insertion
First, we find the node to remove
We apply one of the modifications seen for binary search trees
We apply rotations as described along the path until we reach the root
AVL Trees
For the search operation, we only traverse the tree height
For the insertion operation, we traverse the tree height and then we apply at most two rotations (why only two?), which take O(1)
For the removal operation, we traverse the tree height and then we apply at most two rotations at each node on the path back to the root
We conclude that the complexity of each operation is O(h), where h is the tree height
What is the maximum height of an AVL Tree?
AVL Trees
To calculate the worst case of the tree height, let’s do the following exercise:
- What is the smallest AVL tree (following the height invariant) with height exactly h?
- We will denote by N(h) the number of nodes of that smallest tree with height h
AVL Trees
[Figure: the smallest AVL trees of height 1, 2, 3, 4 and 5]
AVL Trees
Summarizing:
- N(1) = 1
- N(2) = 2
- N(3) = 4
- N(4) = 7
- N(5) = 12
- ...
- N(h) = N(h − 2) + N(h − 1) + 1
It has a behavior similar to the Fibonacci sequence!
Remembering your linear algebra courses:
- N(h) ≈ φ^h (up to a constant factor, where φ ≈ 1.618 is the golden ratio)
- log(N(h)) ≈ h log(φ)
- h ≈ (1 / log(φ)) log(N(h))
The height h of an AVL Tree with n nodes obeys h ≤ 1.44 log2(n), up to a small additive constant (note that 1 / log2(φ) ≈ 1.44)
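The recurrence and the Fibonacci-like growth can be checked numerically. This is only an illustrative sketch; `min_nodes` is a name chosen for the example.

```python
import math

def min_nodes(h):
    """N(h): number of nodes of the smallest AVL tree with height exactly h."""
    if h <= 0:
        return 0
    if h == 1:
        return 1
    a, b = 1, 2                      # N(1), N(2)
    for _ in range(h - 2):
        a, b = b, a + b + 1          # N(h) = N(h - 2) + N(h - 1) + 1
    return b

print([min_nodes(h) for h in range(1, 6)])   # [1, 2, 4, 7, 12]

phi = (1 + math.sqrt(5)) / 2
print(min_nodes(31) / min_nodes(30), phi)    # the ratio approaches the golden ratio

for h in (10, 20, 30):
    n = min_nodes(h)
    print(h, n, 1.44 * math.log2(n))         # h stays close to 1.44 log2(n)
```

Since N(h) is the fewest nodes a tree of height h can have, any AVL tree with n nodes has height at most the largest h with N(h) ≤ n.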
AVL Tree
Advantages of AVL Trees:
- Search, insertion and removal operations with guaranteed worst case complexity of O(log n);
- Very efficient search (when compared with other related data structures), because the height limit of 1.44 log(n) is small;
Disadvantages of AVL trees:
- Complex implementation (we can simplify removal by using lazy deletion, similar to the idea of reconstructing);
- Implementation requires two extra bits of memory per node (to store the "unbalancedness" of a node: +1, 0 or -1);
- Insertion and removal less efficient (when compared with other related data structures) because of having to guarantee a smaller maximum height;
- The rotations frequently change the tree structure (not cache or disk friendly);
AVL Trees
The name AVL comes from the authors: G. Adelson-Velsky and E. Landis. The original paper describing them is from 1962 ("An algorithm for the organization of information", Proceedings of the USSR Academy of Sciences)
You can use an AVL Tree visualization to "play" a little bit with the concept and see how insertions, removals and rotations are performed.
https://www.cs.usfca.edu/~galles/visualization/AVLtree.html
Red-Black Trees
We will now explore another type of binary search tree known as red-black trees
This type of tree appeared as an "adaptation" of 2-3-4 trees to binary trees
The original paper is from 1978 and was written by L. Guibas and R. Sedgewick ("A Dichromatic Framework for Balanced Trees")
The authors say they used the red and black colors because they looked good when printed and because those were the pen colors that they had available to draw the trees :)
Red-Black Trees
Red-Black Tree
A binary search tree where each node is either black or red and:
- (root property) The root node is black
- (leaf property) The leaves are null/empty black nodes
- (red property) The children of a red node are black
- (black property) For each node, a path to any of its descendant leaves has the same number of black nodes
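The four properties can be turned into a small checker. In this sketch the null leaves are represented by `None` and counted as black; the `Node` class and the function names are assumptions for the example, and the binary search tree ordering itself is not verified.

```python
RED, BLACK = "red", "black"

class Node:
    """Hypothetical red-black node used only for this illustration."""
    def __init__(self, value, color, left=None, right=None):
        self.value = value
        self.color = color
        self.left = left
        self.right = right

def count_black(node):
    """Black nodes on every path from node down to a null leaf (both included).

    Raises ValueError if the red property or the black property is violated.
    """
    if node is None:
        return 1  # leaf property: null leaves are black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red property violated")
    left = count_black(node.left)
    right = count_black(node.right)
    if left != right:
        raise ValueError("black property violated")
    return left + (1 if node.color == BLACK else 0)

def is_red_black(root):
    if root is not None and root.color != BLACK:
        return False  # root property
    try:
        count_black(root)
    except ValueError:
        return False
    return True

good = Node(2, BLACK, Node(1, RED), Node(3, RED))
bad = Node(2, BLACK, Node(1, BLACK), None)   # left paths have one more black node
print(is_red_black(good), is_red_black(bad))  # True False
```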
Red-Black Trees
For better visibility, the images may not contain the "null" leaves, but you may assume those nodes exist. We call the non-null nodes internal nodes.
The number of black nodes in a path from a node n to any of its leaves (not including the node itself) is known as black height and will be denoted as bh(n)
- Ex (from the figure): bh(12) = 2 and bh(21) = 1
Red-Black Trees
What type of balance do the restrictions guarantee?
If bh(n) = k, then a path from n to a leaf has:
- At least k nodes (only black nodes)
- At most 2k nodes (alternating between black and red nodes) [recall that there are never two consecutive red nodes]
The height of a branch is therefore at most double the height of a sister branch
Red-Black Trees
Theorem - Height of a Red-Black Tree
A red-black tree with n nodes has height h ≤ 2 × log2(n + 1) [that is, the height h of a red-black tree is O(log n)]
Intuition: Let’s merge the red nodes with their black parents:
This process produces a tree in which every internal node has 2, 3 or 4 children
This 2-3-4 tree has leaves at a uniform height of h’ (where h’ is the black height)
- The height of this tree is at least half of the original: h’ ≥ h/2
- A complete binary tree of height h’ has 2^(h’) − 1 internal (non-null) nodes
- The number of internal nodes of the new tree is ≥ 2^(h’) − 1 (it is a 2-3-4 tree)
- The original tree had even more nodes than the new one: n ≥ 2^(h’) − 1
- n + 1 ≥ 2^(h’)
- log2(n + 1) ≥ h’ ≥ h/2
- h ≤ 2 log2(n + 1)
Red-Black Trees
How to make an insertion?
Inserting a node in a non-empty red-black tree
- Insert as in any binary search tree
- Color the inserted node red (adding the null black nodes)
- Recolor and restructure if needed (restore the invariants)
Because the tree is non-empty, we don’t break the root property
Because the inserted node is red, we don’t break the black property
The only invariant that can be broken is the red property
- If the parent of the inserted node is black, nothing needs to be done
- If the parent is red, we now have two consecutive red nodes
Red-Black Trees
When the parent of the inserted node is black nothing needs to be done:
[Figure: example of an insertion under a black parent]
Red-Black Trees
Red-Red after insertion (red parent)
Case 1.a) The uncle is a black node and the inserted node x is the left child
Description: right rotate the grandparent, then swap the colors of the parent and the grandparent
Red-Black Trees
Red-Red after insertion (red parent)
Case 1.b) The uncle is a black node and the inserted node x is the right child
Description: left rotation of the parent, followed by the moves of 1.a
[If the parent was the right child of the grandparent, we would have similar cases, but symmetric in relation to these]
Red-Black Trees
Red-Red after insertion (red parent)
Case 2: The uncle is a red node, with x being the inserted node
Description: recolor the parent, uncle and grandparent (the parent and the uncle become black, the grandparent becomes red)
Now, if the parent of the grandparent is red, we have a new red-red situation and we can simply apply one of the cases we already know (if the grandparent is the root, we simply color it black)
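The cases 1.a, 1.b and 2 (and their mirror images) can be combined into one insertion routine. This is a sketch under assumptions: nodes keep a `parent` pointer, null leaves are `None` and treated as black, duplicates are ignored, and all names (`RedBlackTree`, `_rotate`, `_fix_red_red`) are invented for the example.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, value):
        self.value = value
        self.color = RED  # a newly inserted node starts out red
        self.left = self.right = self.parent = None

class RedBlackTree:
    def __init__(self):
        self.root = None

    def _rotate(self, x, to_left):
        """Rotate x downwards; its right (or left) child y takes its place."""
        y = x.right if to_left else x.left
        if to_left:
            x.right, y.left = y.left, x
        else:
            x.left, y.right = y.right, x
        moved = x.right if to_left else x.left   # subtree handed over from y to x
        if moved is not None:
            moved.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x.parent.left is x:
            x.parent.left = y
        else:
            x.parent.right = y
        x.parent = y

    def insert(self, value):
        parent, cur = None, self.root
        while cur is not None:                   # ordinary BST descent
            if value == cur.value:
                return                           # duplicates are ignored here
            parent, cur = cur, (cur.left if value < cur.value else cur.right)
        x = Node(value)
        x.parent = parent
        if parent is None:
            self.root = x
        elif value < parent.value:
            parent.left = x
        else:
            parent.right = x
        self._fix_red_red(x)

    def _fix_red_red(self, x):
        while x.parent is not None and x.parent.color == RED:
            parent, grand = x.parent, x.parent.parent  # a red parent is never the root
            parent_is_left = grand.left is parent
            uncle = grand.right if parent_is_left else grand.left
            if uncle is not None and uncle.color == RED:
                # Case 2: red uncle, recolor and continue from the grandparent
                parent.color = uncle.color = BLACK
                grand.color = RED
                x = grand
            else:
                if (parent.right is x) == parent_is_left:
                    # Case 1.b: x is the "inner" child, rotate the parent first
                    self._rotate(parent, to_left=parent_is_left)
                    x, parent = parent, x
                # Case 1.a: rotate the grandparent, swap parent/grandparent colors
                self._rotate(grand, to_left=not parent_is_left)
                parent.color, grand.color = BLACK, RED
        self.root.color = BLACK                  # root property

tree = RedBlackTree()
for v in [1, 2, 3, 4, 5, 6, 7, 8]:
    tree.insert(v)
print(tree.root.value, tree.root.color)
```

Note that only case 2 can repeat, moving two levels up each time; cases 1.a and 1.b finish after at most two rotations.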
Red-Black Trees
Let’s visualize some insertions (try the indicated url):
https://www.cs.usfca.edu/~galles/visualization/RedBlack.html
Red-Black Trees
The cost of an insertion is therefore O(log n)
- O(log n) to get to the insertion position
- O(1) rotations to restructure, plus recoloring that may propagate up the tree (case 2), which takes at most O(log n)
The removals are similar albeit a bit more complicated, but they also cost O(log n) (we will not detail in class, but you can try the visualizations)
Red-Black Trees
Comparison of Red-Black Trees (RB) with AVL trees
- Both are balanced binary search trees (search, insertion and removal are O(log n))
- RB are a little bit more unbalanced in the worst case, with height ∼ 2 log(n) vs AVL with height ∼ 1.44 log(n)
- RB may take a little bit more time to search (in the worst case, because of the height)
- RB are a bit faster in insertions/removals on average ("lighter" rebalancing)
- RB use less memory (RB only need 1 extra bit for color, AVL 2 bits for unbalancedness)
- RB are (probably) more used in the classical programming languages
Examples of data structures that use them:
  - C++ STL: set, multiset, map, multimap
  - Java: java.util.TreeMap, java.util.TreeSet
  - Linux kernel: scheduler, linux/rbtree.h
Use in C/C++, Java and other languages
Most mainstream programming languages offer an implementation of balanced binary search trees
The associated main data structures are:
- set: search, insert and remove elements
- multiset: a set with possibly repeated elements
- map: associative array (associates a key with a value, ex: associating strings to ints)
- multimap: a map with the possibility of repeated keys
The nodes may contain any data types as long as they are comparable
Because there is a relative order between nodes, you can use iterators to traverse the trees in order (ex: in increasing order, from min to max)