
BST, Traversals, and Hashing

Binary Trees

● Specific type of tree

● Every node has 0, 1, or 2 children. Never any more

● Complete - every level except possibly the last is full, and the last level is filled from left to right
○ Lots of great properties - we will talk about these more when we start to talk about heaps

Introduction to Binary Search Trees

• A binary search tree (BST) is an ADT that uses trees to store an ordered dynamic set of keys

• BST combines the flexibility of updates in linked lists and the high efficiency of search (query) operations in ordered arrays

• BST is used widely in computer software because of its efficiency

9/20/2015

Definition of BST

• A BST is a binary tree in which

• Every node has a comparable key

• Both the left and right subtrees are BSTs, and they satisfy:

• The key in any node of the left subtree is less than that of the root

• The key in any node of the right subtree is larger than that of the root
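The definition above can be checked mechanically. The sketch below is one possible way to do it, not something taken from the slides: the `Node` class and the `is_bst` helper are hypothetical names, and keys are assumed to be distinct integers. It passes down the range of keys each subtree is allowed to contain, because comparing a node only with its direct children is not enough.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if every key in the subtree lies strictly between low and high."""
    if node is None:
        return True  # an empty subtree is trivially a BST
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Keys on the left must stay below node.key, keys on the right above it.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


valid = Node(2, Node(1), Node(3))
# 4 is larger than its parent 2, but it sits in the left subtree of the root 3,
# so the tree as a whole violates the definition.
invalid = Node(3, Node(2, None, Node(4)), Node(5))

print(is_bst(valid), is_bst(invalid))
```

The second tree shows why the definition talks about "any node of the left subtree" rather than just the left child.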

Operations of BST


• The BSTs used by the following algorithms are represented by a set of nodes connected by pointer fields in the node, and each node contains

• A field recording the value of key

• A size field recording the number of nodes in the subtree the node represents

• Pointers to its left and right children

• If some child does not exist, the relevant pointer is set to NULL

[Figure: a three-node BST. The root has key 2 and size 3; its children have keys 1 and 3, each with size 1 and both child pointers NULL.]

Algorithms/Search

• Using recursive strategy

• According to the property that left subtree < root < right subtree, at most one subtree is searched and the other is skipped

• At most BST.height + 1 nodes are visited, so the run time is proportional to the height of the tree

Algorithm BSTSearch(k, root)
Input: a key k, and a BST represented by root node
Output: if a node whose key is k exists, return it; else return NULL
if root == NULL then
    return NULL
else if root->key == k then
    return root
else if root->key > k then
    return BSTSearch(k, root->left)
else // root->key < k
    return BSTSearch(k, root->right)
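For readers who want to run the search, here is a minimal Python sketch of the node layout and of BSTSearch. The `Node` dataclass and the name `bst_search` are assumptions made for illustration; `None` plays the role of NULL, and keys are assumed to be integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    size: int = 1                      # number of nodes in the subtree rooted here
    left: Optional["Node"] = None      # None plays the role of NULL
    right: Optional["Node"] = None


def bst_search(k, root):
    """Return the node whose key is k, or None if no such node exists."""
    if root is None:
        return None
    if root.key == k:
        return root
    if root.key > k:
        return bst_search(k, root.left)   # k can only be in the left subtree
    return bst_search(k, root.right)      # root.key < k


# A three-node tree: root 2 with children 1 and 3.
t = Node(2, size=3, left=Node(1), right=Node(3))
found = bst_search(3, t)
missing = bst_search(7, t)
print(found.key, missing)
```

Each call descends one level, which is where the bound of BST.height + 1 visited nodes comes from.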

Example: BSTSearch(11, t) (size field is omitted in the diagram)

[Figure: BST rooted at t. Keys by level: 17; 5, 19; 2, 10, 18, 22; 1, 4, 9, 16, 20; 3, 7, 15, 21; 6, 8, 11; 13; 12, 14. Search path: 11 < 17, 11 > 5, 11 > 10, 11 < 16, 11 < 15, 11 == 11, so the node with key 11 is returned.]

Example: BSTSearch(30, t) (size field is omitted in the diagram)

[Figure: the same tree. Search path: 30 > 17, 30 > 19, 30 > 22, then the right child of 22 is NULL, so NULL is returned.]

Algorithms/Rank

• Similar to search operation, because one subtree is skipped in each invocation, the run time is limited to BST.height + 1

Algorithm BSTRank(k, root)
Input: a key k, and a BST represented by root node
Output: if a node whose key is k exists, return the number of nodes whose keys are less than k; else return -1
if root == NULL then
    return -1
else if root->key == k then
    // The node exists! The number of nodes less than k is the size of its left subtree
    if root->left == NULL then
        return 0
    else
        return root->left->size
else if root->key > k then
    // Start from the left subtree all over again
    return BSTRank(k, root->left)
else // root->key < k
    // The number of nodes less than k = sizeof(left subtree) + root + sizeof(nodes less than k in right subtree)
    rr = BSTRank(k, root->right)
    if rr == -1 then
        return -1
    else
        return root->size - root->right->size + rr

Example: BSTRank(15, t0), Result: 14

[Figure: the 22-node BST from the search examples, with size fields shown. t0 is the root (key 17, size 22), t1 is the node with key 5 (size 16), t2 the node with key 10 (size 11), t3 the node with key 16 (size 6), and t4 the node with key 15, whose left subtree has size 4.]

• BSTRank(15, t0): 17 > 15, return BSTRank(15, t1)
• BSTRank(15, t1): 5 < 15, return t1->size - t1->right->size + BSTRank(15, t2)
• BSTRank(15, t2): 10 < 15, return t2->size - t2->right->size + BSTRank(15, t3)
• BSTRank(15, t3): 16 > 15, return BSTRank(15, t4)
• BSTRank(15, t4): 15 == 15, return t4->left->size
• Unwinding: t4 and t3 return 4, t2 returns 11 - 6 + 4 = 9, t1 and t0 return 16 - 11 + 9 = 14

Example: BSTRank(16, t0), Result: -1

[Figure: a 22-node BST that does not contain 16. Keys by level: 23; 5, 25; 2, 10, 24, 30; 1, 4, 9, 20, 27; 3, 7, 15, 29; 6, 8, 11; 13; 12, 14. t0 to t4 are the nodes with keys 23, 5, 10, 20, and 15; t5 is the NULL right child of t4.]

• BSTRank(16, t0): 23 > 16, return BSTRank(16, t1)
• BSTRank(16, t1): 5 < 16, the call on t2 returns -1, so return -1
• BSTRank(16, t2): 10 < 16, the call on t3 returns -1, so return -1
• BSTRank(16, t3): 20 > 16, return BSTRank(16, t4)
• BSTRank(16, t4): 15 < 16, the call on t5 returns -1, so return -1
• BSTRank(16, t5): t5 is NULL, return -1
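The rank computation can be reproduced with a short Python sketch. Everything here is an illustrative assumption rather than the slides' own code: the `Node` dataclass, the `insert` helper that maintains `size`, and the name `bst_rank`. Keys are assumed to be distinct. The tree is rebuilt by inserting the keys of the first example level by level, which yields the same shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    size: int = 1
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, k):
    """Insert k (assumed not already present) and keep size fields up to date."""
    if root is None:
        return Node(k)
    if k < root.key:
        root.left = insert(root.left, k)
    else:
        root.right = insert(root.right, k)
    root.size += 1
    return root


def bst_rank(k, root):
    """Number of keys smaller than k if k is in the tree, else -1."""
    if root is None:
        return -1
    if root.key == k:
        return root.left.size if root.left is not None else 0
    if root.key > k:
        return bst_rank(k, root.left)
    rr = bst_rank(k, root.right)
    if rr == -1:
        return -1
    # rr != -1 means root.right is not None, so its size can be read safely
    return root.size - root.right.size + rr


# Keys of the first example tree, listed level by level.
keys = [17, 5, 19, 2, 10, 18, 22, 1, 4, 9, 16, 20,
        3, 7, 15, 21, 6, 8, 11, 13, 12, 14]
t0 = None
for key in keys:
    t0 = insert(t0, key)

print(bst_rank(15, t0))  # 14 keys are smaller than 15
print(bst_rank(30, t0))  # -1, since 30 is not in the tree
```

Note that `root.size - root.right.size` counts the root plus its whole left subtree, which is exactly the "sizeof(left subtree) + root" term in the pseudocode.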

Insertion

• Follow path, add in new node where needed

• Build tree on board...

Tree Traversals
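As a stand-in for the board example, the sketch below builds a BST by repeated insertion (follow the search path, attach the new node where the path ends) and then visits every node in preorder, postorder, and inorder. The `Node` class and the function names are assumptions made for illustration, the size field is left out for brevity, and duplicate keys are simply ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, k):
    """Follow the search path for k and attach a new node where it ends."""
    if root is None:
        return Node(k)
    if k < root.key:
        root.left = insert(root.left, k)
    elif k > root.key:
        root.right = insert(root.right, k)
    return root  # duplicates are ignored in this sketch


def preorder(n):
    """Node first, then left subtree, then right subtree."""
    return [] if n is None else [n.key] + preorder(n.left) + preorder(n.right)


def postorder(n):
    """Left subtree, then right subtree, then the node."""
    return [] if n is None else postorder(n.left) + postorder(n.right) + [n.key]


def inorder(n):
    """Left subtree, then the node, then right subtree."""
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


root = None
for k in [5, 2, 8, 1, 4, 9]:
    root = insert(root, k)

print(preorder(root))   # [5, 2, 1, 4, 8, 9]
print(postorder(root))  # [1, 4, 2, 9, 8, 5]
print(inorder(root))    # [1, 2, 4, 5, 8, 9]
```

On a BST, the inorder result comes out sorted, which is a quick way to sanity-check an insertion routine.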

● We often want to visit every single node in a tree

● While we are looking at BSTs today, these methods work for all trees

1. Preorder
2. Postorder
3. Inorder

Preorder traversal

void preorder(tree T, node n) {
    if (n == NULL) return;
    else {
        print n.value;          // visit the node before its subtrees
        preorder(T, n.left);
        preorder(T, n.right);
    }
}

Postorder traversal

void postorder(tree T, node n) {
    if (n == NULL) return;
    else {
        postorder(T, n.left);
        postorder(T, n.right);
        print n.value;          // visit the node after its subtrees
    }
}

InOrder Traversal

void inorder(tree T, node n) {
    if (n == NULL) return;
    else {
        inorder(T, n.left);
        print n.value;          // visit the node between its subtrees
        inorder(T, n.right);
    }
}

Hashing

● In its simplest form, hashing is a “Mathematical blender”

● An input is passed in and some scrambled version comes out

● We usually want it to:
○ Generate values that look uniformly random
○ Be one-way
■ Open problem in CS - Do one-way functions exist? (If they do, then P ≠ NP)
○ Be collision resistant
■ Hard to find two things that hash to the same value
○ Be quickly computable (Sometimes the opposite though!)
○ Be deterministic (Doesn’t rely on randomness)

Hash tables

● Wanting to look things up quickly is a very, very common problem

● We use hashing as a trick to make lookups very fast (on average)

● General idea:
○ When we store an item x in a table, we try to store it at index H(x), i.e. table[H(x)] = x;
○ To find something, start looking at index H(x)
○ We usually find things in O(1) time, which is great!

Collisions
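To make the table[H(x)] = x idea concrete and to show how a collision arises, here is a minimal sketch. The table size `N = 7`, the toy hash `H(x) = x mod N`, and the function names are all assumptions chosen for illustration. When the target cell is occupied, this sketch scans forward to the next open cell, which is only one of several possible strategies.

```python
N = 7                      # number of cells (an arbitrary choice for this sketch)
table = [None] * N


def H(x):
    """Toy hash for non-negative integers; real tables use a better-mixing function."""
    return x % N


def insert(x):
    """Store x at index H(x), scanning forward if that cell is taken."""
    i = H(x)
    for step in range(N):
        j = (i + step) % N
        if table[j] is None or table[j] == x:
            table[j] = x
            return j
    raise RuntimeError("table is full")


def contains(x):
    """Start at H(x) and scan until x or an empty cell is found."""
    i = H(x)
    for step in range(N):
        j = (i + step) % N
        if table[j] is None:
            return False
        if table[j] == x:
            return True
    return False


print(insert(10))   # 3, since 10 % 7 == 3
print(insert(17))   # 4, because index 3 (17 % 7) is already occupied
print(contains(17), contains(24))  # True False
```

Both 10 and 17 hash to index 3, so the second insert has to look elsewhere, and every later lookup for 17 pays for that extra step.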

● Tricky to deal with - multiple solutions to consider
○ Linear probing: If the cell is occupied, just traverse the array to find the next open one
■ To search - have to traverse until we find the item or an empty cell
○ Quadratic probing: Similar to linear, but instead of going in order, on the nth attempt you check the cell n² positions after the original one
○ Linked list - Just maintain a linked list at each cell

● Load factor = n/N = Average number of items per cell (n items stored in N cells)

● As long as the load factor is “OK” (depends on collision handling), we have O(1) expected insert and lookup time

Cryptographic Hash Functions

● Hashing is very useful for cryptographic purposes ○ These hash functions require a lot of expert analysis. Never, ever, EVER try to write them yourself! (“Don’t roll your own crypto”)

● Used to store passwords, create digital signatures, check for corruption, verify files, etc…

● You’ll spend a good amount of time working with them in a few weeks

Password Storage

● Password leaks are a problem

Offline Attacks: A Common Problem

• Password breaches at major companies have affected millions of users.

Password storage

● Uses cryptographic hash functions

● Don’t store (username, password)

● Don’t store (username, H(password))

● Store (username, salt, H(salt + password))

Picking your function is important

● Lots of old functions still floating around

● They have been broken!
○ MD5
○ SHA-1

● People will argue endlessly over what you should use.
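Whatever function is picked, the storage rule (username, salt, H(salt + password)) looks roughly like the sketch below. This is an illustration under stated assumptions, not a recommendation: SHA-256 from Python's standard `hashlib` stands in for H, the 16-byte salt length is arbitrary, and `make_record` and `check_password` are hypothetical helper names. A real system would normally use a deliberately slow or memory-hard password hashing function instead of a single fast hash.

```python
import hashlib
import hmac
import os


def make_record(username, password):
    """Build the (username, salt, H(salt + password)) record."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return (username, salt, digest)


def check_password(record, attempt):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, salt, digest = record
    candidate = hashlib.sha256(salt + attempt.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)


alice = make_record("alice", "correct horse")
bob = make_record("bob", "correct horse")

print(check_password(alice, "correct horse"))
print(alice[2] == bob[2])
```

Because each user gets a different salt, two users with the same password end up with different stored hashes, so an attacker cannot crack them both with one precomputed table.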

Memory Hard Hashing

● Very recent stuff

● Idea: Easy for adversaries to build machines to compute hashes fast

● BUT they don’t really have that advantage with memory

● Make the hash computation take up memory in addition to time

● Adversary must now deal with memory delay

Examples

● scrypt
○ Proven to be maximally memory hard

● Argon2
○ Winner of Password Hashing Competition

So how does hashing make this better?

● Instead of the passwords being leaked in plain format, all that shows is a garbage value

● Yahoo used a cryptographic hash function

● Instead hashed records are leaked

● Requires a lot more work to find the user’s password - you must find an input that hashes to the leaked value!

● (Unless you made your password “password”)

Merkle Trees - Preview of next project

● You want to verify a very large file stored remotely

● Say, 1TB

● Method 1 - Check that the hash value is right

● But how do you know remote storage isn’t lying to you?

● Absurd to have to resend the file to verify it is stored correctly

● What if we could verify it by just checking a few blocks?

● Merkle tree - Tree of hash values representing a file

● Root node can be reconstructed by requesting certain values from a server

Merkle Trees P2 Preview

● You will be implementing a Merkle tree and using it to verify files

● Some other work with hash functions as well
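As a rough preview of the idea, the sketch below builds a Merkle root over a list of blocks and checks a single block using only the sibling hashes along its path to the root. It is a minimal illustration, not the project's specification: SHA-256 as the hash, duplicating the last hash on odd-sized levels, and the function names are all assumptions, and the block list is assumed to be non-empty.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Hash each block, then hash pairs upward until one value remains."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(blocks, index):
    """Sibling hashes needed to rebuild the root from blocks[index]."""
    level = [h(b) for b in blocks]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the other node in the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(block, proof, root):
    """Recompute the root from one block plus the sibling hashes."""
    value = h(block)
    for sibling, sibling_is_left in proof:
        value = h(sibling + value) if sibling_is_left else h(value + sibling)
    return value == root


blocks = [b"block0", b"block1", b"block2", b"block3", b"block4"]
root = merkle_root(blocks)
proof = merkle_proof(blocks, 2)
print(verify(blocks[2], proof, root), len(proof))
```

The verifier only needs to keep the root. The server answers a challenge for one block with that block and a proof whose length grows with the height of the tree, not with the size of the file.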