BST, Traversals, and Hashing
Binary Tree
● Specific type of tree
● Every node has 0, 1, or 2 children. Never any more
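● Complete binary tree - every level is full except possibly the last, which is filled from left to right (a tree in which all nodes except leaves have 2 children is a full binary tree)
○ Lots of great properties - we will talk about these more when we start to talk about heaps
Introduction to Binary Search Tree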
• Binary Search Tree (BST) is a data structure that uses a tree to store an ordered dynamic set of data
• BST combines the flexibility of updates in linked lists with the high efficiency of search (query) operations in ordered arrays
• BST is used widely in computer software because of its efficiency
9/20/2015
Definition of BST
• A BST is a binary tree in which
• Every node has a comparable key
• Both the left and right subtrees are BSTs, and they satisfy:
• The key in any node of the left subtree is less than that of the root
• The key in any node of the right subtree is larger than that of the root
[Figure: a root node with its left subtree and right subtree]
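The definition above can be turned into a small checker. This is a minimal Python sketch, not code from the course: the `Node` class and the `is_bst` name are made up for illustration, and keys are assumed to be distinct integers. It passes down the open interval that every key in a subtree must fall into, which is what "any node of the left subtree is less than the root" requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type for illustration: a key and two child links.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if the subtree is a BST with all keys strictly between low and high."""
    if node is None:
        return True  # an empty subtree is trivially a BST
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Left keys must stay below node.key; right keys must stay above it.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


good = Node(2, Node(1), Node(3))
# 4 sits inside the left subtree of 3, so the tree below is not a BST,
# even though every node is ordered correctly relative to its own parent.
bad = Node(3, Node(2, None, Node(4)), Node(5))
print(is_bst(good), is_bst(bad))
```

The `bad` tree shows why checking each node only against its direct children is not enough.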
Operations of BST
Data structure
• The BSTs used by the following algorithms are represented by a set of nodes connected by pointer fields in the node, and each node contains
• A field recording the value of key
• A size field recording the number of nodes in the subtree the node represents
• Pointers to its left and right children
• If some child does not exist, the relevant pointer is set to NULL
[Figure: three-node example. Root: key 2, size 3. Left child: key 1, size 1. Right child: key 3, size 1. The left and right pointers of both children are NULL.]
Algorithms/Search
• Using recursive strategy
• According to the property that left subtree < root < right subtree, at most one subtree is searched and the other is skipped
• The run time is at most BST.height + 1

Algorithm BSTSearch(k, root)
Input: a key k, and a BST represented by root node
Output: if a node whose key is k exists, return it; else return NULL
if root == NULL then
    return NULL
else if root->key == k then
    return root
else if root->key > k then
    return BSTSearch(k, root->left)
else // root->key < k
    return BSTSearch(k, root->right)
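For readers who want to run the search, here is a minimal Python sketch of BSTSearch. The `Node` class and the `insert` helper are assumptions added only to build a tree to search (the size field is left out, as in the slides' search diagrams). The keys are the ones that appear in the example tree on the slides that follow.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Hypothetical helper: plain BST insertion, used here only to build a tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # a key that is already present is ignored


def bst_search(k, root):
    """Mirror of BSTSearch: return the node whose key is k, or None."""
    if root is None:
        return None
    if root.key == k:
        return root
    if root.key > k:
        return bst_search(k, root.left)
    return bst_search(k, root.right)  # root.key < k


t = None
for key in [17, 5, 19, 2, 10, 18, 22, 1, 4, 9, 16, 20, 3, 7, 15, 21,
            6, 8, 11, 13, 12, 14]:
    t = insert(t, key)

hit = bst_search(11, t)
miss = bst_search(30, t)
print(hit.key if hit else None, miss)
```

Each call descends one level and never looks at the skipped subtree, which is where the BST.height + 1 bound comes from.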
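Example: BSTSearch(11, t) (size field is omitted in the diagram)
[Figure: BST t. Keys level by level: 17 / 5 19 / 2 10 18 22 / 1 4 9 16 20 / 3 7 15 21 / 6 8 11 / 13 / 12 14]
• 11 < 17: go left to 5
• 11 > 5: go right to 10
• 11 > 10: go right to 16
• 11 < 16: go left to 15
• 11 < 15: go left to 11
• 11 == 11: return this node
Example: BSTSearch(30, t) (size field is omitted in the diagram)
[Figure: the same BST t]
• 30 > 17: go right to 19
• 30 > 19: go right to 22
• 30 > 22: the right child of 22 is NULL, so return NULL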
Algorithms/Rank
• Similar to the search operation: because one subtree is skipped in each invocation, the run time is limited to BST.height + 1

Algorithm BSTRank(k, root)
Input: a key k, and a BST represented by root node
Output: if a node whose key is k exists, return the number of nodes whose keys are less than k; else return -1
if root == NULL then
    return -1
else if root->key == k then
    // The node exists! The number of nodes less than k is the size of its left subtree
    if root->left == NULL then
        return 0
    else
        return root->left->size
else if root->key > k then
    // Start from the left subtree all over again
    return BSTRank(k, root->left)
else // root->key < k
    // The number of nodes less than k = sizeof(left subtree) + root + sizeof(nodes less than k in right subtree)
    rr = BSTRank(k, root->right)
    if rr == -1 then
        return -1
    else
        return root->size - root->right->size + rr

Example: BSTRank(15, t0), Result: 14
[Figure: the BST rooted at 17 with size fields shown. t0 = node 17 (size 22), t1 = node 5 (size 16), t2 = node 10 (size 11), t3 = node 16 (size 6), t4 = node 15 (size 5)]
• BSTRank(15, t0): 17 > 15, return BSTRank(15, t1) = 14
• BSTRank(15, t1): 5 < 15, return t1->size - t1->right->size + BSTRank(15, t2) = 16 - 11 + 9 = 14
• BSTRank(15, t2): 10 < 15, return t2->size - t2->right->size + BSTRank(15, t3) = 11 - 6 + 4 = 9
• BSTRank(15, t3): 16 > 15, return BSTRank(15, t4) = 4
• BSTRank(15, t4): 15 == 15, return t4->left->size = 4

Example: BSTRank(16, t0), Result: -1
[Figure: a BST rooted at 23 with size fields shown. Keys level by level: 23 / 5 25 / 2 10 24 30 / 1 4 9 20 27 / 3 7 15 29 / 6 8 11 / 13 / 12 14. t0 = node 23, t1 = node 5, t2 = node 10, t3 = node 20, t4 = node 15, t5 = NULL (right child of 15)]
• BSTRank(16, t0): 23 > 16, return BSTRank(16, t1) = -1
• BSTRank(16, t1): 5 < 16, BSTRank(16, t2) is -1, so return -1
• BSTRank(16, t2): 10 < 16, BSTRank(16, t3) is -1, so return -1
• BSTRank(16, t3): 20 > 16, return BSTRank(16, t4) = -1
• BSTRank(16, t4): 15 < 16, BSTRank(16, t5) is -1, so return -1
• BSTRank(16, t5): t5 is NULL, return -1
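The rank algorithm depends on every node carrying a correct size field. The Python sketch below is an illustration under assumptions: the `Node` class, the `size` helper, and an `insert` that recomputes sizes on the way back up are not from the slides. It builds the 22-node example tree rooted at 17 and then applies the same three cases as BSTRank.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.size = 1  # number of nodes in the subtree rooted at this node
        self.left = None
        self.right = None


def size(node):
    """Size of a possibly empty subtree."""
    return node.size if node is not None else 0


def insert(root, key):
    """Hypothetical helper: BST insertion that keeps the size fields correct."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    root.size = 1 + size(root.left) + size(root.right)
    return root


def bst_rank(k, root):
    """Mirror of BSTRank: number of keys less than k, or -1 if k is absent."""
    if root is None:
        return -1
    if root.key == k:
        return size(root.left)
    if root.key > k:
        return bst_rank(k, root.left)
    rr = bst_rank(k, root.right)  # root.key < k
    if rr == -1:
        return -1
    # rr != -1 means root.right exists, so root.right.size is safe to read.
    return root.size - root.right.size + rr


t = None
for key in [17, 5, 19, 2, 10, 18, 22, 1, 4, 9, 16, 20, 3, 7, 15, 21,
            6, 8, 11, 13, 12, 14]:
    t = insert(t, key)

print(t.size, bst_rank(15, t), bst_rank(23, t))
```

Key 23 is not in this tree, so the last call returns -1 for the same reason as the second slide example: the search for the key ends at a NULL pointer and -1 propagates back to the top.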
Insertion
• Follow the search path, add the new node where needed
• Build tree on board...
Tree Traversals
● We often want to visit every single node in a tree
● While we are looking at BSTs today, these methods work for all trees
1. Preorder
2. Postorder
3. Inorder
Preorder traversal
void preorder(tree T, node n) {
    if (n == NULL) return;
    else {
        print n.value;          // visit the node before its subtrees
        preorder(T, n.left);
        preorder(T, n.right);
    }
}
Postorder traversal
void postorder(tree T, node n) {
    if (n == NULL) return;
    else {
        postorder(T, n.left);
        postorder(T, n.right);
        print n.value;          // visit the node after its subtrees
    }
}
Inorder traversal
void inorder(tree T, node n) {
    if (n == NULL) return;
    else {
        inorder(T, n.left);
        print n.value;          // visit the node between its subtrees
        inorder(T, n.right);
    }
}
Hashing
● In its simplest form, hashing is a “mathematical blender”
● An input is passed in and some scrambled version comes out
● We usually want it to:
○ Generate values that look uniformly random
○ Be one-way
■ Open problem in CS - Do one-way functions exist? (If they do, then P ≠ NP)
○ Be collision resistant
■ Hard to find two things that hash to the same value
○ Be quickly computable (sometimes the opposite, though!)
○ Be deterministic (doesn’t rely on randomness)
Hash tables
● Wanting to look things up quickly is a very, very common problem
● We use hashing as a trick to make lookups very fast (on average)
● General idea:
○ When we store an item x in a table, we try to store it at index H(x), i.e. table[H(x)] = x;
○ To find something, start looking at index H(x)
○ We usually find things in O(1) time, which is great!
Collisions
● Tricky to deal with - multiple solutions to consider
○ Linear probing: If the cell is occupied, just traverse the array to find the next open one
■ To search, we have to traverse until we find the item or an empty cell
○ Quadratic probing: Similar to linear, but instead of going in order, on the nth probe you check the cell n² positions after the original one
○ Linked list: Just maintain a linked list at each cell
● Load factor = n/N = average number of items per cell (n items stored in N cells)
● As long as the load factor is “OK” (depends on collision handling), we have O(1) insert and lookup time on average
Cryptographic Hash Functions
● Hashing is very useful for cryptographic purposes ○ These hash functions require a lot of expert analysis. Never, ever, EVER try to write them yourself! (“Don’t roll your own crypto”)
● Used to store passwords, create digital signatures, check for corruption, verify files, etc…
● You’ll spend a good amount of time working with them in a few weeks
Password Storage
● Password leaks are a problem
Offline Attacks: A Common Problem
• Password breaches at major companies have affected millions of users.
Password storage
● Uses cryptographic hash functions
● Don’t store (username, password)
● Don’t store (username, H(password))
● Store (username, salt, H(salt + password))
Picking your function is important
● Lots of old functions still floating around
● They have been broken!
○ MD5
○ SHA-1
● People will argue endlessly over what you should use.
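The (username, salt, H(salt + password)) record from the password storage slide can be sketched in a few lines of Python. This is an illustration only: SHA-256 from the standard `hashlib` module stands in for H to mirror the slide's formula, and the names `make_record` and `check_password` are made up. It is not a recommendation of a particular function; a single fast hash like this is cheap for an attacker to compute, which is the motivation for the functions discussed on the following slides.

```python
import hashlib
import hmac
import os


def make_record(username, password):
    """Build the stored tuple (username, salt, H(salt + password))."""
    salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return (username, salt, digest)


def check_password(record, attempt):
    """Recompute H(salt + attempt) and compare it with the stored digest."""
    _, salt, digest = record
    candidate = hashlib.sha256(salt + attempt.encode("utf-8")).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, digest)


alice = make_record("alice", "hunter2")
bob = make_record("bob", "hunter2")
print(check_password(alice, "hunter2"), check_password(alice, "password"))
print(alice[2] != bob[2])  # same password, different salts, different digests
```

The last line shows what the salt buys: two users with the same password no longer share a stored value, so one precomputed table of hashes cannot be reused across accounts.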
Memory Hard Hashing
● Very recent stuff
● Idea: Easy for adversaries to build machines to compute hashes fast
● BUT they don’t really have that advantage with memory
● Make the hash function take up memory in addition to time
● Adversary must now deal with memory delay
Examples
● scrypt
○ Proven to be maximally memory hard
● Argon2
○ Winner of the Password Hashing Competition
So how does hashing make this better?
● Instead of the passwords being leaked in plain format, hashed records are leaked, so all that shows is a garbage value
● Yahoo used a cryptographic hash function called bcrypt
● Requires a lot more work to find the user’s password - the attacker must find an input that hashes to the leaked value, usually by hashing one guess after another
● (Unless you made your password “password”)
Merkle Trees - Preview of next project
● You want to verify a very large file stored remotely
● Say, 1TB
● Method 1 - Check that the hash value is right
● But how do you know remote storage isn’t lying to you?
Merkle Tree
● Absurd to have to resend file to verify it is stored correctly
● What if we could verify it by just checking a few blocks?
● Merkle tree - Tree of hash values representing a file
● Root node can be reconstructed by requesting certain values from a server
Merkle Trees P2 Preview
● You will be implementing a Merkle tree and using it to verify files
● Some other work with hash functions as well
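To make the "reconstruct the root from a few requested values" idea concrete, here is a small Python sketch. It is not the project's specification: the function names are invented, SHA-256 is an arbitrary choice of hash, and duplicating the last hash on odd-sized levels is just one common convention. The verifier keeps only the root; the server answers a query with one block plus one sibling hash per tree level.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return the tree levels, leaf hashes first and the root level last.
    Assumes at least one block."""
    level = [h(b) for b in blocks]
    levels = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: pad odd levels by repeating the last hash
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # root level: a single hash
    return levels


def make_proof(levels, index):
    """Server side: sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other child of the same parent
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return proof


def verify(block, proof, root):
    """Client side: rebuild the root from one block and its sibling hashes."""
    acc = h(block)
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root


blocks = [f"block-{i}".encode() for i in range(5)]
levels = build_levels(blocks)
root = levels[-1][0]
proof = make_proof(levels, 3)
print(len(proof), verify(blocks[3], proof, root), verify(b"tampered", proof, root))
```

For n blocks the proof holds about log2(n) hashes, so checking one block of a very large file needs only a handful of hash values rather than the whole file.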