<<

Advanced Implementations of Tables: Balanced Search Trees and Hashing Balanced Search Trees

„ Binary search operations such as insert, delete, retrieve, etc. depend on the length of the path to the desired node

„ The path length can vary from log2(n+1) to O(n) depending on how balanced or unbalanced the tree is „ The shape of the tree is determined by the values of the items and the order in which they were inserted

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 2 Examples

10 40 20 20 60 30

40 10 30 50 70 50 Can you get the same tree with 60 different insertion orders?

70

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 3 2-3 Trees

„ Each internal node has two or three children „ All leaves are at the same level „ 2 children = 2-node, 3 children = 3-node

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 4 2-3 Trees (continued)

„ 2-3 trees are not binary trees (duh) „ A 2-3 tree of height h always has at least 2h-1 nodes „ i.e. greater than or equal to a binary tree of height h „ A 2-3 tree with n nodes has height less than or equal

to log2(n+1) „ i.e less than or equal to the height of a full binary tree with n nodes

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 5 Definition of 2-3 Trees

„ T is a 2-3 tree of height h if „ T is empty (height 0), OR „ T is of the form r

TL TR

„ Where r is a node that contains one item and TL and TR are 2-3 trees, each of height h-1, and the search key of r is greater than any in TL and less than any in TR, OR

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 6 Definition of 2-3 Trees (continued)

„ T is of the form r

TL TM TR

„ Where r is a node that contains two data items and

TL, TM, and TR are 2-3 trees, each of height h-1, and the smaller search key of r is greater than any in TL and less than any in TM and the larger search key in r is greater than any in TM and smaller than any in TR.

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 7 Placing Data in a 2-3 tree

1. A 2-node must contain „ a single data item whose value is „ greater than any in its left subtree, and „ smaller than any in its right subtree 2. A 3-mode must contain two data items, „ the smaller of which is „ greater than any in its left subtree, and „ smaller than any in its middle subtree, and „ the greater of which is „ greater than any in its middle subtree, and „ smaller than any in its right subtree 3. A leaf may contain either one or two data items

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 8 2-3 Node Code import searchkeys.*; public class Tree23Node { private KeyedItem smallitem; private KeyedItem largeItem; private Tree23Node leftChild; private Tree23Node middleChild; private Tree23Node rightChild; }

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 9 Traversing a 2-3 Tree

„ Just like a binary tree: preorder, inorder, postorder

50 90

20 70 120 150

10 30 40 60 80 100 110 130 140 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 10 Searching a 2-3 tree

„ Same efficiency as a balanced binary : O(logn) „ BUT, easy to keep balanced, unlike binary search trees „ This means that as you insert elements, the balance is easily maintained, and worst case performance remains O(logn)

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 11 Inserting 39 Into A 2-3 Tree

50

30 70 90

10 20 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 12 Inserting 38

50

30 70 90

10 20 39 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 13 Inserting 38 (continued)

50

30 70 90

10 20 38 39 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 14 Inserting 37

50

30 39 70 90

10 20 38 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 15 Inserting 36

50

30 39 70 90

10 20 37 38 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 16 Inserting 36 (continued)

50

30 39 70 90

10 20 36 37 38 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 17 Inserting 36 (continued)

50

30 37 39 70 90

10 20 36 38 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 18 Inserting 75

37 50

30 39 70 90

10 20 36 38 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 19 Inserting 77

37 50

30 39 70 90

10 20 36 38 40 60 75 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 20 Inserting 77 (continued)

37 50

30 39 70 90

10 20 36 38 40 60 75 77 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 21 Inserting 77 (continued)

37 50

30 39 70 77 90

10 20 36 38 40 60 75 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 22 Inserting 77 (continued)

37 50 77

30 39 70 90

10 20 36 38 40 60 75 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 23 Inserting 77 (continued)

50

37 77

30 39 70 90

10 20 36 38 40 60 75 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 24 Inserting Into A 2-3 Tree

„ Insert into the leaf node in which the search key belongs „ If the leaf has two values, stop „ If the leaf has three values, split the node into two nodes with the smallest and largest values, and „ Push the middle value into the parent node „ Continue with the parent node until either „ you push a value into a node that had only one value, or „ you create a new root node

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 25 Deleting From A 2-3 Tree

„ The inverse of inserting „ Delete the value (in a leaf), then „ Merge empty nodes „ If necessary, delete empty root

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 26 Redistribute I

P L

S L S P

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 27 Merge I

L

S S L

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 28 Redistribute II

P L

S L S P

„a „b „c „d „a „b „c „d

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 29 Merge II

L

S S L

„c „a „b „a „b „c

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 30 Delete

S L

S L

„a „b „c

„a „b „c

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 31 2-3 Trees: Results

„ Slightly more complicated than binary search trees, BUT „ 2-3 Trees are always balanced „ Every operation takes O(logn)

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 32 2-3-4 Trees

„ Slightly less complicated than 2-3 Trees „ Each node can contain 1–3 values and have 1–4 children „ Inserting „ Split 4-nodes on the way down „ Insert into leaf „ Deleting: „ Only delete from 3-node or 4-node

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 33 Red-Black Trees

„ 2-3 Trees are always balanced „ O(logn) time for all operations „ 2-3-4 Trees are always balanced „ O(logn) time for all operations, and „ Insertion and deletion can be done in a single pass from root to leaf „ But, require slightly more storage per node „ Red-Black Trees have the advantage of 2-3-4 trees, without the overhead „ Represent the 2-3-4 Tree as a binary tree with colored references (red or black)

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 34 Red-Black Representation of a 4-Node

M

S M L S L

„a „b „c „d „a „b „c „d

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 35 Red-Black Representation of a 3-Node

L

S „c S L „a „b

„a „b „c S

„a L

„b „c

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 36 2-3-4 Tree

37 50

30 35 39 70 90

10 20 32 33 34 36 38 40 60 80 160

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 37 Equivalent Red-Black Tree

37

30 50

20 35 39 90

10 33 36 38 40 70 100

32 34 60 80

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 38 Red-Black Tree Node public class RBTreeNode { public static final int RED = 0; public static final int BLACK = 1;

private KeyedItem item; private RBTreeNode leftChild; private RBTreeNode rightChild; private int leftColor; private int rightColor; }

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 39 Searching and Traversing Red-Black Trees

„ Red-Black trees are binary search trees „ Just search them the same way you would any other „ Inserting „ Split 4-nodes on the way down by changing paired red child references to black „ Insert into a leaf

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 40 Splitting a 4-node root

M M

S L S L

„a „b „c „d „a „b „c „d

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 41 Splitting a 4-node

P P

M „e M „e

S L S L

„a „b „c „d „a „b „c „d

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 42 Splitting a 4-node

P P

„a „a M M

S L S L

„b „c „d „e „b „c „d „e

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 43 Splitting a 4-node

P P

M Q M Q

„e „f S L „e „f S L

„a „b „c „d „a „b „c „d

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 44 Splitting a 4-node

Q P

P „f M Q

M „e S L „e „f S L

„a „b „c „d

„a „b „c „d

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 45 Splitting a 4-node

P M „a Q P Q

M „f „a S L „f S L

„b „c „d „e

„b „c „d „e

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 46 AVL Trees

„ Insert in the appropriate spot „ Rotate as necessary to restore balance „ Rotate your way back up the tree „ Every operation is O(logn)

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 47 Hashing

„ Trees allow for efficient table operations, but „ What if you want O(1) behavior? „ What if time is much more critical than space? „ Basic idea: Array „ Same for insert, lookup, delete

index Search Hash Number Key Function in 0..n-1

“Hash Table”

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 48 Issues

„ How to create a „ Easy or difficult, depending upon the desired properties „ How large the array should be „ Factors: how many items will be stored, how much memory you have, how fast you want the operations to be „ Static or dynamic „ Hash collisions: what if two input values produce the same hash value?

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 49 Hash Functions

„ Goals „ Fast, easy to compute „ Distributes hash values evenly in the target range „ Possible functions for an n-entry hash table „ h(k) = random number in 0..n-1 „ h(k) = the sum of the digits in k

„ h(k) = the first m digits of k, where m = log10(n) „ h(k) = k×mod(n) „ Are these good or bad? Why?

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 50 Hash Functions on Strings

„ Need to convert string to a number „ Possible solutions: „ Add the binary representations of the characters „ Concatenate the binary representations to get a very large number 4 3 2 „ Hello = H×32 + e×32 + l×32 + l×32 + o „ Horner’s rule: (((H×32 + e) × 32 + l) × 32 + l) × 32 + o „ This is a very big number, so „ (A × B) mod n = (A mod n × B mod n) mod n „ (A + B) mod n = (A mod n + B mod n) mod n „ Apply the modulo operator early and often to keep the number small

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 51 Dealing with Hash Collisions

„ Open addressing: Collision ⇒ try to place the object in other locations in a predictable sequence „ : search starting from the current location „ Issues: wrapping, slow, empty/deleted entries, clustering „ Quadratic probing: search +1,4,9,16,25modn „ : Use second hash function to determine the size of the steps:

„ h2(k) ≠ 0 „ h2 ≠ h1 „ Example: h1(k) = k mod 11, h2(k) = 7 – (k mod 7) „ Note: size of steps and size of table must be relatively prime so that all entries get visited „ Use prime size, step, etc.

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 52 Increase The Size of the Hash Table

„ Increasing the actual size is infeasible „ Everything must be rehashed and moved

„ Chain-bucket hashing „ Each entry in a hash table is a bucket into which multiple values may be added „ Each bucket can be implemented as a chain, or

„ Now the size of the hash table is variable

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 53 Efficiency of Hashing

„ Load factor

α = Number _ of _ items Size _ of _ table

„ α is a measure of how full the table is „ Small α ⇒ little chance of collision and low search time „ Large α ⇒ high chance of collision and high search time

„ Unsuccessful searches require more time than successful searches

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 54 Linear Probing

„ Successful search 1  + 1  1  2  1−α 

„ Unsuccessful search

1  + 1  1 2  2  ()1−α 

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 55 Quadratic Probing and Double Hashing

„ Successful search − ()−α loge 1 α

„ Unsuccessful search 1 1−α

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 56 Chain-Bucket Hashing

„ Successful search α 1+ 2

„ Unsuccessful search α

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 57 How well does a hash function work?

„ How fast is it to compute? „ How well does it scatter random data? „ How well does it scatter non-random data? „ This can be very important „ It is always possible to construct a worst case „ General principles: „ The hash function should involve the whole search key „ If a hash function involves a , the base should be prime

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 58 Hashing vs. Binary Trees

„ Hashing supports efficient insert and remove „ Hashing does not support efficient sorting „ Hashing does not support efficient range queries

CMPS 12B, UC Santa Cruz Binary searching & introduction to trees 59