Balanced Binary Search Trees

Pedro Ribeiro

DCC/FCUP

2019/2020

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 1 / 48 Motivation

Let S be a of ”comparable” objects/items:

I Let a and b be two different objects. They are ”comparable” if it is possible to say that a < b, a = b or a > b. I Example: numbers, but we could have other data types (students with names and numbers, teams with points and goal-average, ...)

A few possible problems of interest:

I Given a set S, determine if a certain item is in S I Given a dynamic set S (that changes with insertions and removals), determine if a certain item is in S I Given a dynamic set S, determine the min/max item in S I Given a dynamic set S, determine the elements in a range [a, b] I Sort a set S I ...

Binary Search Trees!

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 2 / 48 Binary Search Trees - Notation

An overview of notation for binary trees:

Node A is the root and nodes D, E and F are the leafs Nodes {B, D, E} are a subtree Node A is the parent of nodes B and Nodes D and E are children of node B Node B is a brother of node C ... Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 3 / 48 Binary Search Trees - Overview

For all nodes of , the following must hold: the node is bigger than all nodes in the left subtree and smaller than all nodes in the right subtree

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 4 / 48 Binary Search Trees - Example

The smallest element is... in the leftmost node The biggest element is... in the rightmost node

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 5 / 48 Binary Search Trees - Search

Searching for values in binary search trees:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 6 / 48 Binary Search Trees - Search

Searching for values in binary search trees:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 7 / 48 Binary Search Trees - Search

Seaching for values in binary search trees:

Searching in a binary (true/false to check if exists) Search(T , v): If Null(T ) then return false Else If v < T .value then return Search(T .left child, v) Else If v > T .value then return Search(T .right child, v) Else return true

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 8 / 48 Binary Search Trees - Insertion

Inserting values in binary search trees:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 9 / 48 Binary Search Trees - Insertion

Inserting values in binary search trees:

Insertion on a binary search tree Insert(T , v): If Null(T ) then return new Node(v) If v < T .value then T .left child = Insert(T .left child, v) Else If v > T .value then T .right child = Insert(T .right child, v) return T

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 10 / 48 Binary Search Trees - Removal

Removing values from binary search trees:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 11 / 48 Binary Search Trees - Removal

After finding the node we need to decide how to remove

I 3 possible cases:

⇓ ⇓ ⇓

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 12 / 48 Binary Search Trees - Execution Time

How to characterize the execution time of each operation?

I All operations search for a node traversing the height of the tree

Complexity of operations in a binary search tree Let h be the height of a binary search tree T . The complexity of finding the minimum, maximum, or searching for an element, or inserting or removing an element in T is O(h).

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 13 / 48 Binary Search Trees - Visualization

A nice visualization of search, insertion and removal can be seen in:

https://www.cs.usfca.edu/˜galles/visualization/BST.html

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 14 / 48 Unbalanced Trees

The problem of the previous methods:

The height of the tree can be of the order of O(n) (where n is the number of elements)

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 15 / 48 Balancing Strategies

There are many strategies to guarantee that the complexity of the search, insertion and removal operations are better than O(n)

Balanced Trees: Other Data Structures: (height O(log n)) I Skip Lists

I AVL Trees I Hash Tables

I Red-Black Trees I Bloom Filters

I Splay Trees

I

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 16 / 48 Balancing Strategies

A simple strategy: reconstruct the tree once in a while

On a ”perfect” with n nodes, the height is... O(log(n))

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 17 / 48 Balancing Strategies

Given a sorted list of numbers, in which order should we insert them in a binary search tree so that it stays as balanced as possible?

Answer: “binary search”, insert the element in the middle, split the reamaining list in two (smaller and bigger) based on that element and insert each half applying the same method

How frequently should we reconstruct the binary search tree so that we can guarantee efficiency? If we reconstruct often we have many O(n) operations If we rarely reconstruct, the tree may become unbalanced √ A possible answer: after O( N) insertions

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 18 / 48 Balancing Strategies

Simple case: how to balance the following tree (between parenthesis is the height):

This operation is called a right rotation

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 19 / 48 Balancing Strategies

The relevant rotation operations are the following: I Note that we must not break the properties that turn the tree into a binary search tree

Right Rotation

Left Rotation

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 20 / 48 AVL Trees

AVL Tree A binary search tree that guarantees that for each node, the heights of the left and right subtrees differ by at most one unit (height invariant)

When inserting and removing nodes, we change the tree so that we keep the height invariant

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 21 / 48 AVL Trees

Inserting on a AVL tree works like inserting on any binary search tree. However, the tree might break the height invariant (and stop being ”balanced”) The following cases may occur:

+2 on the left +2 on the right

Let’s see how to correct the first case with simple rotations. Correcting the second case is similar, but with mirrored rotations

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 22 / 48 AVL Trees

In the first case, we have two different possible shapes of the AVL Tree The first:

Left is too ”heavy”, case 1 We correct by making a right rotation starting in X

Note: the height of Y2 might be h + 1 or h: this correction works for both cases

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 23 / 48 AVL Trees

The second:

⇒ ⇒

Left is too ”heavy”, case 2 We correct by making a left rotation starting in Y , followed by a right rotation starting in X

Note: the height of Y21 or Y22 might be h or h − 1: this correction works for both cases

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 24 / 48 AVL Trees

By inserting nodes we might unbalance the tree (breaking the height invariant)

In order to correct this, we apply rotations along the path where the node was inserted

There are two analogous unbalancing types: to the left or to the right

Each type has two possible cases, that are solved by applying different rotations

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 25 / 48 AVL Trees

Example of node insertion:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

Example of node insertion:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

Example of node insertion:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

Example of node insertion:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

Example of node insertion:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

Example of node insertion:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

Example of node insertion:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

Example of node insertion:

(after two rotations)

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 26 / 48 AVL Trees

To remove elements, we apply the same idea of insertion

First, we find the node to remove

We apply one of the modifications seen for binary search trees

We apply rotations as described along the path until we reach the root

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 27 / 48 AVL Trees

For the search operation, we only traverse the tree height

For the insertion operation, we traverse the tree height and the we apply at most two rotations (why only two?), that take O(1)

For the removal operation, we traverse the tree height and the we apply at most two rotations over the path until the root

We conclude that the complexity of each operation is O(h), where h is the tree height

What is the maximum height of an AVL Tree?

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 28 / 48 AVL Trees

To calculate the worst case of the tree height, let’s do the following exercise:

I What is the smallest AVL tree (following the height invariant) with height exactly h? I We will call N(h) to the number of nodes of a tree with height h

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 29 / 48 AVL Trees

Height 1 Height 2 Height 3

Height 4 Height 5

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 30 / 48 AVL Trees

Summarizing:

I N(1) = 1 I N(2) = 2 I N(3) = 4 I N(4) = 7 I N(5) = 12 I ... I N(h) = N(h − 2) + N(h − 1) + 1 It has a behavior similar to the Fibonacci sequence! Remembering your linear algebra courses: h I N(h) ≈ φ I log(N(h)) ≈ log(φ)h 1 I h ≈ log(φ) log(N(h))

The height h of an AVL Tree with n nodes obeys to h ≤ 1.44 log(n)

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 31 / 48 AVL Tree

Advantages of AVL Trees:

I Search, insertion and removal operations with guaranteed worst case complexity of O(log n); I Very efficient search (when comparing with other related data structures), because the height limit of 1.44 log(n) is small; Disadvantages of AVL trees:

I Complex implementation (we can simplify removal by using lazy delete, similar to the idea of reconstructing); I Implementation requires two extra bits of memory per node (to store the ”unbalancedness” of a node: +1, 0 or -1); I Insertion and removal less efficient (when comparing with other related data structures) because of having to guarantee a smaller maximum height; I The rotations frequently change the (not cache or disk friendly);

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 32 / 48 AVL Trees

The name AVL comes from the authors: G. Adelson-Velsky and E. Landis. The original paper describing them is from 1962 (”An algorithm for the organization of information”, Proceedings of the USSR Academy of Sciences)

You can use an AVL Tree visualization to ”play” a little bit with the concept and seeing how are insertions, removals and rotations made. https://www.cs.usfca.edu/˜galles/visualization/AVLtree.html

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 33 / 48 Red-Black Trees

We will now explore another type of binary search trees known as red-black trees This type of trees appeared as an ”adaptation” of 2-3-4 trees to binary trees

The original paper is from 1978 and was it was written by L. Guibas e R. Sedgewick (”A Dichromatic Framework for Balanced Trees”) The authors say they use the red and black colors because they looked good when printed and because those were the pen colors thay they had available to draw the trees :) Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 34 / 48 Red-Black Trees

Red-Black Tree A binary search tree where each node is either black or red and: (root property) The root node is black (leaf property) The leaves are null/empty black nodes (red property) The children of a red node are black (black property) For each node, a path to any of its descending leaves has the same number of black nodes

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 35 / 48 Red-Black Trees

For better visibility, the images may not contain the ”null” leaves, but you may assume those nodes exist. We call internal nodes to the non null nodes.

The number of black nodes in a path from a node n to any of its leaves (not including the node itself) is known as black height and will be denoted as bh(n)

I Ex: → bh(12) = 2 and bh(21) = 1

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 36 / 48 Red-Black Trees

What type of balance do the restrictions guarantee? If bh(n) = k, then a path from n to a leaf has:

I At least k nodes (only black nodes) I At most 2k nodes (alternating between black and red nodes) [recall that there are never two consecutive red nodes] The height of a branch is therefore at most double the height of a sister branch

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 37 / 48 Red-Black Trees

Theorem - Height of a Red-Black Tree

A red-black tree with n nodes has height h ≤ 2 × log2(n + 1) [that is, the height h of a red-black tree is O(log n)]

Intuition: Let’s merge the red nodes with their black parents:

This process produces a tree with 2, 3 or 4 children This 2-3-4 tree has leaves at an uniform height of h’ (where h’ is the black height)

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 38 / 48 Red-Black Trees

Theorem - Height of a Red-Black Tree

A red-black tree with n nodes has height h ≤ 2 × log2(n + 1) [that is, the height h of a red-black tree is O(log n)]

The height of this tree is at least half of the original: h0 ≥ h/2 0 A complete binary tree of height h0 has 2h − 1 internal (non null) nodes 0 The number of internal nodes of the new tree is ≥ 2h − 1 (it is a 2-3-4 tree) 0 The original tree had even more nodes than the new one: n ≥ 2h − 1 0 n + 1 ≥ 2h 0 log2(n + 1) ≥ h ≥ h/2

h ≤ 2 log2(n + 1) Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 39 / 48 Red-Black Trees

How to make an insertion?

Inserting a node in a non empty red-black tree Insert as in any binary search tree Color the inserted node as red (adding the null black nodes) Recolor and restructure if needed (restore the invariants)

Because the tree is non empty we don’t break the root property Because the inserted node is red, we don’t break the black property The only invariant than can be broken is the red property

I If the parent of the inserted node is black, nothing needs to be done I If the parent is red we now have two consecutive red nodes

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 40 / 48 Red-Black Trees

When the parent of the inserted node is black nothing needs to be done:

Example:

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 41 / 48 Red-Black Trees

Red-Red after insertion (red parent) Case 1.a) The uncle is a black node and the inserted node x is the left child

Description: right rotate the grandfather, followed by swapping the colors between the parent and the grandfather

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 42 / 48 Red-Black Trees

Red-Red after insertion (red parent) Case 1.b) The uncle is a black node and the inserted node x is the right child

Description: left rotation of parent followed by the moves of 1.a [If the parent was the right child of the grandfather, we would have similar cases, but symmetric in relation to these] Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 43 / 48 Red-Black Trees

Red-Red after insertion (red parent) Case 2: The uncle is a red node, with x being the inserted node

Description: swap colors of parent, uncle and grandfather Now, if the father of the grandfather is red, we have a new red-red situation and we can simply apply one of the cases we already know (if the gradparent is the root, we simply color it as black)

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 44 / 48 Red-Black Trees

Let’s visualize some insertions (try the indicated url):

https://www.cs.usfca.edu/˜galles/visualization/RedBlack.html

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 45 / 48 Red-Black Trees

The cost of an insertion is therefore O(log n)

I O(log n) to get to the insertion position I O(1) to eventually recolor and restructure

The removals are similar albeit a bit more complicated, but they also cost O(log n) (we will not detail in class, but you can try the visualizations)

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 46 / 48 Red-Black Trees

Comparison of Red-Black Trees (RB) with AVL trees

I Both are implemented with balanced binary search trees (search, insertion and removal are O(log n))

I RB are a little bit more unbalanced in the worst case, with height ∼ 2 log(n) vs AVL with height ∼ 1.44 log(n)

I RB may take a little bit more time to search (at the worst case, because of the height)

I RB are a bit faster in insertions/removals on average (”lighter” rebalancing)

I RB spend less memory (RB only need 1 extra bit for color, AVL 2 bits for unbalancedness)

I RB are (probably) more used in the classical programming languages Examples of data structures that use them:

F C++ STL: set, multiset, map, multiset F Java: java.util.TreeMap , java.util.TreeSet F Linux kernel: scheduler, linux/rbtree.h

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 47 / 48 Use in C/C++, Java and other languages

Any typical programming language has an implementation of balanced binary search trees The associated main data structures are:

I set: search, insert and remove elements I multiset: a set with possibly repeated elements I map: (associates a key with a value) ex: associating strings to ints) I : a map with the possibility of repeated keys The nodes may contain any data types as long as they are comparable Because there is relative order between nodes, you can use iterators to traverse the trees in order (ex: in increasing order, from min to max)

Pedro Ribeiro (DCC/FCUP) Balanced Binary Search Trees 2019/2020 48 / 48