File Processing & Organization Course No: 5901227-3

Lecture 8: Tries and Splay Trees

• A trie is also called a digital tree or prefix tree.
• It is an ordered tree that is used to store a set of strings.
• The term trie comes from retrieval.
• Properties of a trie:
  – It is a multi-way tree.
  – The root of the tree is associated with the empty string.
  – Each edge of the tree is labeled with a character.
  – Each leaf node corresponds to a stored string, which is the concatenation of the characters on the path from the root to this node.

Trie Example 1

• Set of strings: {bear, bid, bulk, bull, sun, sunday}.
• All strings end with “$”.

    (root)
    +- b
    |  +- e - a - r - $
    |  +- i - d - $
    |  +- u - l
    |         +- k - $
    |         +- l - $
    +- s - u - n
               +- $
               +- d - a - y - $

• In some tries the nodes can store the keys, that is, the characters on the path from the root to the node.

Trie Example 2 [nodes store the keys]

A trie for keys "A", "tea", "to", "ted", "ten", "i", "in", and "inn".

Trie Example 3

• Set of strings: {bear, bell, bid, bull, buy, sell, stock, stop}

    (root)
    +- b
    |  +- e
    |  |  +- a - r
    |  |  +- l - l
    |  +- i - d
    |  +- u
    |     +- l - l
    |     +- y
    +- s
       +- e - l - l
       +- t - o
              +- c - k
              +- p

• Tries are most commonly used for character strings, but they can be used for other data as well.
• A bitwise trie is keyed on the individual bits making up an integer number or memory address.

Compact/Compressed Tries

– Replace a chain of one-child nodes with an edge labeled with a string.
– Each non-leaf node (except the root) has at least two children.
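The chain-merging rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trie is stored as nested dicts, and the names `build_trie`, `compress`, and `END` are hypothetical helpers, not part of the lecture. The “$” terminator follows the convention of Trie Example 1.

```python
END = "$"  # terminator assumed from Trie Example 1: every stored string ends with "$"


def build_trie(words):
    """Standard trie as nested dicts; each key is a one-character edge label."""
    root = {}
    for word in words:
        node = root
        for ch in word + END:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a compressed copy: each chain of one-child nodes becomes one edge."""
    out = {}
    for label, child in node.items():
        # Extend the edge label while the child has exactly one outgoing edge.
        while len(child) == 1:
            ch, grandchild = next(iter(child.items()))
            label += ch
            child = grandchild
        out[label] = compress(child)
    return out


words = ["bear", "bid", "bulk", "bull", "sun", "sunday"]
compressed = compress(build_trie(words))
```

For the six strings above, the root keeps two edges, labeled "b" and "sun", and every remaining internal node has at least two children.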

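Compressed trie for {bear, bid, bulk, bull, sun, sunday}, each string ending with “$” (the standard trie for the same set is the one in Trie Example 1):

    (root)
    +- b
    |  +- ear$
    |  +- id$
    |  +- ul
    |     +- k$
    |     +- l$
    +- sun
       +- $
       +- day$

Compact Tries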

• Compact representation of a compressed trie
• Approach
  – For an array of strings S = S[0], …, S[s-1]
  – Store ranges of indices at each node
    • Instead of the substring itself
  – Represent each label X as a triplet of integers (i, j, k)
    • Such that X = S[i][j..k]
  – Example: S[0] = “abcd”, (0,1,2) = “bc”
• Properties
  – Uses O(s) space, where s = number of strings in the array
  – Serves as an auxiliary index structure

Compact Representation Example

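Array of strings (character positions start at 0):

    S[0] = see      S[4] = bull     S[7] = hear
    S[1] = bear     S[5] = buy      S[8] = bell
    S[2] = sell     S[6] = bid      S[9] = stop
    S[3] = stock

Compact trie, each node labeled with a triplet (i, j, k) = S[i][j..k]:

    (root)
    +- (1,0,0) "b"
    |  +- (1,1,1) "e"
    |  |  +- (1,2,3) "ar"
    |  |  +- (8,2,3) "ll"
    |  +- (6,1,2) "id"
    |  +- (4,1,1) "u"
    |     +- (4,2,3) "ll"
    |     +- (5,2,2) "y"
    +- (7,0,3) "hear"
    +- (0,0,0) "s"
       +- (0,1,1) "e"
       |  +- (0,2,2) "e"
       |  +- (2,2,3) "ll"
       +- (3,1,2) "to"
          +- (3,3,4) "ck"
          +- (9,3,3) "p"

Search and Insertion in Tries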

Trie-Search(t, P[k..m]) // searches string P in t
    if t is leaf then return true
    else if t.child(P[k]) = nil then return false
    else return Trie-Search(t.child(P[k]), P[k+1..m])

• The search just follows the path down the tree (starting with Trie-Search(root, P[0..m])).

Trie-Insert(t, P[k..m])
    if t is not leaf then // otherwise P is already present
        if t.child(P[k]) = nil then
            Create a new child of t and a “branch” starting with that child and storing P[k..m]
        else Trie-Insert(t.child(P[k]), P[k+1..m])
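A runnable counterpart to Trie-Search and Trie-Insert might look like the sketch below. It is an iterative variant of the same path-following idea, not a line-by-line translation of the pseudocode. The class `TrieNode` and the functions `trie_insert` and `trie_search` are names chosen here for illustration, and the code assumes the “$” terminator convention from the earlier examples, appended internally so callers pass plain strings.

```python
END = "$"  # assumed terminator: lets "sun" and "sunday" coexist in one trie


class TrieNode:
    def __init__(self):
        # Maps an edge label (one character) to the child node it leads to.
        self.children = {}


def trie_insert(root, word):
    """Walk down from the root, creating the missing branch for word + END."""
    node = root
    for ch in word + END:
        node = node.children.setdefault(ch, TrieNode())


def trie_search(root, word):
    """Follow the path for word + END; fail as soon as an edge is missing."""
    node = root
    for ch in word + END:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


root = TrieNode()
for w in ["bear", "bid", "bulk", "bull", "sun", "sunday"]:
    trie_insert(root, w)
```

Both operations visit at most one node per character, so their cost depends on the length of the string and not on how many strings are stored.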

Splay Trees

• Splay trees are binary search trees (BSTs) that:
  – Are not perfectly balanced all the time
  – Allow search and insertion operations to try to balance the tree so that future operations may run faster

• Based on the heuristic:
  – If X is accessed once, it is likely to be accessed again.
  – After node X is accessed, perform “splaying” operations to bring X up to the root of the tree.
  – Do this in a way that leaves the tree more or less balanced as a whole.

Motivating Example

    Initial tree                           After splaying with 12

          15                                      12
         /  \         After Search(12)           /  \
        6    18       ---------------->         6    15
       / \                                     / \   / \
      3   12                                  3   9 14  18
         /  \
        9    14

• Splay idea: get 12 up to the root using rotations.
• Splaying with 12 not only makes the tree balanced; subsequent accesses to 12 will also take O(1) time.
• Active (recently accessed) nodes move towards the root, and inactive nodes slowly move further from the root.

Terminology

• Let X be a non-root node, i.e., a node with at least one ancestor.
• Let P be its parent node.
• Let G be its grandparent node (if it exists).
• Consider the path from G to X:
  – Each time we go left, we say that we “zig”.
  – Each time we go right, we say that we “zag”.
• There are 6 possible cases:

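  1. zig: X is the left child of P (no G)
  2. zig-zig: P is the left child of G, and X is the left child of P
  3. zig-zag: P is the left child of G, and X is the right child of P
  4. zag-zig: P is the right child of G, and X is the left child of P
  5. zag-zag: P is the right child of G, and X is the right child of P
  6. zag: X is the right child of P (no G)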
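The six cases can be told apart mechanically by reading off the path from G (or from P, when there is no G) down to X. The sketch below assumes nodes carry a parent pointer; `Node`, `step`, and `splay_case` are illustrative names, not something defined in the lecture.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = None
        # Wire up parent pointers so a node can find its P and G.
        for child in (left, right):
            if child is not None:
                child.parent = self


def step(parent, child):
    """Going left is a "zig", going right is a "zag"."""
    return "zig" if parent.left is child else "zag"


def splay_case(x):
    """Name the case for node x, or None if x is the root."""
    p = x.parent
    if p is None:
        return None
    g = p.parent
    if g is None:
        return step(p, x)                    # single step: zig or zag
    return step(g, p) + "-" + step(p, x)     # two steps, read from G down to X


# Tree from the Motivating Example slide: 15(6(3, 12(9, 14)), 18)
n9, n14, n3, n18 = Node(9), Node(14), Node(3), Node(18)
n12 = Node(12, n9, n14)
n6 = Node(6, n3, n12)
n15 = Node(15, n6, n18)
```

In this tree, node 12 is reached from 15 by going left and then right, so `splay_case(n12)` reports the zig-zag case.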

Splay Tree Operations

• When node X is accessed, apply one of six rotation operations:
  – Single Rotations (X has a P but no G)
    • zig, zag
  – Double Rotations (X has both a P and a G)
    • zig-zig, zig-zag
    • zag-zig, zag-zag
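All six operations are built from two primitive single rotations. A minimal sketch follows, assuming nodes without parent pointers and rotation functions that return the new subtree root; `Node`, `rotate_right`, `rotate_left`, and `preorder` are names chosen here for illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(p):
    """Zig: lift p's left child x above p; returns x as the new subtree root."""
    x = p.left
    p.left = x.right   # x's right subtree holds keys between x and p
    x.right = p
    return x


def rotate_left(p):
    """Zag: lift p's right child x above p; returns x as the new subtree root."""
    x = p.right
    p.right = x.left   # x's left subtree holds keys between p and x
    x.left = p
    return x


def preorder(node):
    """Root-first key listing, used here only to inspect the tree shape."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


tree = Node(15, Node(6, Node(3), Node(12)), Node(18))
after_zig = rotate_right(tree)   # 6 becomes the root
```

The two rotations are inverses of each other: applying `rotate_left` to the result of `rotate_right` restores the original shape.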

Splay Trees: Zig Operation

• “Zig” is just a single rotation, as in an AVL tree.
• Suppose 6 was the node that was accessed (e.g., using Search).

        15                        6
       /  \      Zig-Right       / \
      6    18    -------->      3   15
     / \                           /  \
    3   12                        12   18

• “Zig-Right” moves 6 to the root.
• Can access 6 faster next time: O(1).
• Notice that this is simply a right rotation in AVL tree terminology.

Splay Trees: Zig-Zig Operation

• “Zig-Zig” consists of two single rotations of the same type.
• Suppose 3 was the node that was accessed (e.g., using Search).

            15                     6                      3
           /  \    Zig-Right     /   \     Zig-Right     / \
          6    18  -------->    3     15   -------->    1   6
         / \                   / \   /  \                  / \
        3   12                1   4 12   18               4   15
       / \                                                   /  \
      1   4                                                 12   18

• Due to “zig-zig” splaying, 3 has bubbled to the top!
• Note: Parent-Grandparent is rotated first.
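The note about rotating Parent-Grandparent first can be checked on the slide's tree. The sketch below assumes the same kind of `Node` and `rotate_right` helpers as a plain BST implementation would have (all names are illustrative). `zig_zig` follows the order stated above, while `x_over_p_first` is a deliberately different ordering included only for contrast.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(p):
    """Lift p's left child above p; returns the new subtree root."""
    x = p.left
    p.left = x.right
    x.right = p
    return x


def zig_zig(g):
    """Rotate P over G first, then X over P."""
    p = rotate_right(g)      # P is now the subtree root, X is its left child
    return rotate_right(p)   # X becomes the subtree root


def x_over_p_first(g):
    """Contrast only: rotate X over P first, then X over G."""
    g.left = rotate_right(g.left)
    return rotate_right(g)


def preorder(node):
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


def slide_tree():
    # Initial tree from the Zig-Zig slide: 15(6(3(1, 4), 12), 18)
    return Node(15, Node(6, Node(3, Node(1), Node(4)), Node(12)), Node(18))


splayed = zig_zig(slide_tree())
contrast = x_over_p_first(slide_tree())
```

Both orderings bring 3 to the root, but they produce different shapes: with the zig-zig order 6 ends up as the right child of 3, whereas rotating X over P first leaves 15 as the right child of 3 with 6 below it.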

Splay Trees: Zig-Zag Operation

• “Zig-Zag” consists of two rotations of opposite types.
• Suppose 12 was the node that was accessed (e.g., using Search).

        15                      15                       12
       /  \     Zag-Left       /  \     Zig-Right       /  \
      6    18   -------->    12    18   -------->      6    15
     / \                    /  \                      / \   / \
    3   12                 6    14                   3  10 14  18
       /  \               / \
      10   14            3   10

• Due to “zig-zag” splaying, 12 has bubbled to the top!
• Notice that this is simply an LR imbalance correction in AVL tree terminology (first a left rotation, then a right rotation).
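The zig-zag step can be reproduced on the slide's tree with two primitive rotations. As before, this is a sketch with illustrative names (`Node`, `rotate_left`, `rotate_right`, `zig_zag`), assuming rotations return the new subtree root.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(p):
    x = p.left
    p.left = x.right
    x.right = p
    return x


def rotate_left(p):
    x = p.right
    p.right = x.left
    x.left = p
    return x


def zig_zag(g):
    """X is the right child of P, and P is the left child of G."""
    g.left = rotate_left(g.left)   # Zag-Left: X moves above P
    return rotate_right(g)         # Zig-Right: X moves above G


def preorder(node):
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


# Initial tree from the Zig-Zag slide: 15(6(3, 12(10, 14)), 18)
tree = Node(15, Node(6, Node(3), Node(12, Node(10), Node(14))), Node(18))
splayed = zig_zag(tree)
```

Unlike zig-zig, both rotations here involve X itself: first X over P, then X over G.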

Splay Trees: Zag-Zig Operation

• “Zag-Zig” consists of two rotations of opposite types.
• Suppose 17 was the node that was accessed (e.g., using Search).

      15                      15                         17
     /  \     Zig-Right      /  \       Zag-Left        /  \
    6    20   -------->     6    17     -------->     15    20
        /  \                    /  \                  / \   / \
      17    30                16    20               6  16 18  30
     /  \                          /  \
    16   18                      18    30

• Due to “zag-zig” splaying, 17 has bubbled to the top!
• Notice that this is simply an RL imbalance correction in AVL tree terminology (first a right rotation, then a left rotation).

Splay Trees: Zag-Zag Operation

• “Zag-Zag” consists of two single rotations of the same type.
• Suppose 30 was the node that was accessed (e.g., using Search).

      15                        20                         30
     /  \      Zag-Left        /  \       Zag-Left        /  \
    6    20    -------->     15    30     -------->     20    40
        /  \                / \    / \                 /  \
      17    30             6  17  25  40             15    25
           /  \                                     / \
         25    40                                  6   17

• Due to “zag-zag” splaying, 30 has bubbled to the top!
• Note: Parent-Grandparent is rotated first.
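The two right-side double rotations mirror the left-side ones. The sketch below replays the Zag-Zig and Zag-Zag slides; the helper names (`Node`, `rotate_left`, `rotate_right`, `zag_zig`, `zag_zag`) are illustrative assumptions, with rotations returning the new subtree root.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(p):
    x = p.left
    p.left = x.right
    x.right = p
    return x


def rotate_left(p):
    x = p.right
    p.right = x.left
    x.left = p
    return x


def zag_zig(g):
    """X is the left child of P, and P is the right child of G."""
    g.right = rotate_right(g.right)  # Zig-Right: X moves above P
    return rotate_left(g)            # Zag-Left: X moves above G


def zag_zag(g):
    """X is the right child of P, and P is the right child of G."""
    p = rotate_left(g)               # Parent-Grandparent is rotated first
    return rotate_left(p)            # then X moves above P


def preorder(node):
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


# Zag-Zig slide: 15(6, 20(17(16, 18), 30)), accessing 17
t1 = zag_zig(Node(15, Node(6), Node(20, Node(17, Node(16), Node(18)), Node(30))))
# Zag-Zag slide: 15(6, 20(17, 30(25, 40))), accessing 30
t2 = zag_zag(Node(15, Node(6), Node(20, Node(17), Node(30, Node(25), Node(40)))))
```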

Splay Trees: Zag Operation

• “Zag” is just a single rotation, as in an AVL tree.
• Suppose 15 was the node that was accessed (e.g., using Search).

      6                         15
     / \       Zag-Left        /  \
    3   15     -------->      6    18
       /  \                  / \
      12   18               3   12

• “Zag-Left” moves 15 to the root.
• Can access 15 faster next time: O(1).
• Notice that this is simply a left rotation in AVL tree terminology.

Splaying during other operations

• Splaying can be done not just after Search, but also after other operations such as Insert/Delete.

• Insert X: After inserting X at a leaf node (as in a regular BST), splay X up to the root

• Delete X: Do a Search on X to bring X up to the root. Delete X at the root, then move the largest item in its left sub-tree, i.e., its predecessor, to the root using splaying; the former right sub-tree becomes the right child of this new root.

• Note on Search X: If X was not found, splay the last node that the Search reached to the root.
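The three operations above can be sketched together as follows. This is a hedged illustration, not the lecture's implementation. The `splay` function is a common recursive formulation that pairs zig/zag steps starting from the root, so for nodes at odd depth the resulting shape can differ from strictly bottom-up splaying, although the accessed node still ends at the root. All names (`Node`, `splay`, `insert`, `search`, `delete`, `inorder`) are chosen here for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def rotate_right(p):
    x = p.left
    p.left, x.right = x.right, p
    return x


def rotate_left(p):
    x = p.right
    p.right, x.left = x.left, p
    return x


def splay(root, key):
    """Bring key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)                # parent-grandparent first
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # zag-zag
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)                     # parent-grandparent first
    elif key < root.right.key:                       # zag-zig
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def insert(root, key):
    """Regular BST insert at a leaf position, then splay the key to the root."""
    if root is None:
        return Node(key)
    node = root
    while key != node.key:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
            node = node.right
    return splay(root, key)


def search(root, key):
    """Returns (new_root, found); the tree is splayed even when key is absent."""
    root = splay(root, key)
    return root, root is not None and root.key == key


def delete(root, key):
    root = splay(root, key)
    if root is None or root.key != key:
        return root                      # key absent: nothing to delete
    left, right = root.left, root.right
    if left is None:
        return right
    left = splay(left, key)              # key exceeds all of left, so its maximum rises
    left.right = right                   # the predecessor has no right child yet
    return left


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

A short usage pattern: start with `root = None`, call `root = insert(root, k)` for each key, and always keep the root returned by `search` and `delete`, since every operation may change it.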
