BST, Traversals, and Hashing

Binary Tree
● Specific type of tree
● Every node has 0, 1, or 2 children. Never any more
● Complete binary tree: every level is completely filled except possibly the last, which is filled from left to right
  ○ Lots of great properties; we will talk about these more when we start to talk about heaps

Introduction to Binary Search Tree
• A Binary Search Tree (BST) is an ADT that uses trees to store an ordered, dynamic set of data
• A BST combines the flexible updates of a linked list with the efficient search (query) operations of an ordered array
• BSTs are used widely in computer software because of their efficiency

9/20/2015

Definition of BST
• A BST is a binary tree in which
  • every node has a comparable key,
  • the key in any node of the left subtree is less than the key of the root,
  • the key in any node of the right subtree is greater than the key of the root, and
  • both the left and right subtrees are themselves BSTs.
(Diagram: a root node with its left subtree and right subtree)

Operations of BST

Data structure
• The BSTs used by the following algorithms are represented by a set of nodes connected by pointer fields. Each node contains
  • a key field recording the value of the key,
  • a size field recording the number of nodes in the subtree rooted at that node,
  • pointers to its left and right children; if a child does not exist, the corresponding pointer is set to NULL.
(Diagram: a three-node BST whose root has key 2 and size 3; its children have keys 1 and 3, each with size 1, and all four of their child pointers are NULL)

Algorithms/Search
• Uses a recursive strategy
• Because of the property left subtree < root < right subtree, at most one subtree is searched and the other is skipped
• The run time is at most BST.height + 1 node visits

Algorithm BSTSearch(k, root)
Input: a key k, and a BST represented by its root node
Output: if a node whose key is k exists, return it; else return NULL
    if root == NULL then
        return NULL
    else if root->key == k then
        return root
    else if root->key > k then
        return BSTSearch(k, root->left)
    else // root->key < k
        return BSTSearch(k, root->right)
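The node layout and BSTSearch above can be sketched in Python. This is a minimal sketch under our own assumptions. `Node`, `bst_search`, `bst_insert`, `bst_rank`, `build`, and `t` are hypothetical names, `None` stands in for NULL, and duplicate keys are ignored on insertion. Insertion follows the same path as search. Rank mirrors the BSTRank pseudocode from the later slide, which gives the size field a concrete use.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    key: int
    size: int = 1                  # number of nodes in the subtree rooted here
    left: Optional[Node] = None    # None plays the role of NULL
    right: Optional[Node] = None


def _size(n: Optional[Node]) -> int:
    """Size of a possibly empty subtree."""
    return n.size if n is not None else 0


def bst_search(k: int, root: Optional[Node]) -> Optional[Node]:
    """BSTSearch: each call descends into at most one subtree."""
    if root is None:
        return None
    if root.key == k:
        return root
    if root.key > k:
        return bst_search(k, root.left)
    return bst_search(k, root.right)


def bst_insert(k: int, root: Optional[Node]) -> Node:
    """Follow the search path and attach a new node where it reaches None.

    Assumption: duplicate keys are ignored. Sizes are recomputed on the way
    back up, so they stay correct whether or not a node was added.
    """
    if root is None:
        return Node(k)
    if k < root.key:
        root.left = bst_insert(k, root.left)
    elif k > root.key:
        root.right = bst_insert(k, root.right)
    root.size = 1 + _size(root.left) + _size(root.right)
    return root


def bst_rank(k: int, root: Optional[Node]) -> int:
    """BSTRank: number of keys less than k, or -1 if k is absent."""
    if root is None:
        return -1
    if root.key == k:
        return _size(root.left)        # everything in the left subtree is smaller
    if root.key > k:
        return bst_rank(k, root.left)  # k can only be on the left
    rr = bst_rank(k, root.right)
    if rr == -1:
        return -1
    # left subtree + the root itself + smaller keys found in the right subtree
    return root.size - _size(root.right) + rr


def build(keys: Iterable[int]) -> Optional[Node]:
    """Insert keys one at a time into an initially empty tree."""
    root: Optional[Node] = None
    for k in keys:
        root = bst_insert(k, root)
    return root


# Keys of the 22-node tree used in the BSTSearch(11, t) example, listed in
# level order so that plain insertion reproduces the same shape.
t = build([17, 5, 19, 2, 10, 18, 22, 1, 4, 9, 16, 20,
           3, 7, 15, 21, 6, 8, 11, 13, 12, 14])
```

Each call descends into only one subtree, so all three functions visit at most BST.height + 1 nodes. Recomputing `size` from the children after each recursive insert keeps the field correct without a separate pass.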
Example: BSTSearch(11, t) (the size field is omitted in the diagram)
(Diagram: a 22-node BST rooted at 17; the search path is 17 → 5 → 10 → 16 → 15 → 11)
• 11 < 17, go left; 11 > 5, go right; 11 > 10, go right; 11 < 16, go left; 11 < 15, go left; 11 == 11, found

Example: BSTSearch(30, t)
• 30 > 17, go right; 30 > 19, go right; 30 > 22, go right; the right child of 22 is NULL, so the search returns NULL

Algorithms/Rank
Algorithm BSTRank(k, root)
Input: a key k, and a BST represented by its root node
Output: if a node whose key is k exists, return the number of nodes whose keys are less than k; else return -1
    if root == NULL then
        return -1
    else if root->key == k then
        // The node exists: the number of nodes less than k is the size of its left subtree
        if root->left == NULL then
            return 0
        else
            return root->left->size
    else if root->key > k then
        // Start from the left subtree all over again
        return BSTRank(k, root->left)
    else // root->key < k
        // Nodes less than k = size(left subtree) + 1 (the root) + (nodes less than k in the right subtree)
        rr = BSTRank(k, root->right)
        if rr == -1 then
            return -1
        else
            return root->size - root->right->size + rr
• As in the search operation, one subtree is skipped in each invocation, so the run time is limited to BST.height + 1
• root->size - root->right->size is the size of the left subtree plus 1 for the root

Example: BSTRank(15, t0), result 14
(Diagram: the same 22-node BST with subtree sizes: t0 = 17 (size 22), t1 = 5 (size 16), t2 = 10 (size 11), t3 = 16 (size 6), t4 = 15, whose left subtree rooted at 11 has size 4)
• BSTRank(15, t0): 17 > 15, return BSTRank(15, t1) = 14
• BSTRank(15, t1): 5 < 15, return t1->size - t1->right->size + BSTRank(15, t2) = 16 - 11 + 9 = 14
• BSTRank(15, t2): 10 < 15, return t2->size - t2->right->size + BSTRank(15, t3) = 11 - 6 + 4 = 9
• BSTRank(15, t3): 16 > 15, return BSTRank(15, t4) = 4
• BSTRank(15, t4): 15 == 15, return t4->left->size = 4

Example: BSTRank(16, t0), result -1
(Diagram: a 22-node BST rooted at 23 that does not contain 16)
• The recursion follows the search path: 23 > 16, go left; 5 < 16, go right; 10 < 16, go right; 20 > 16, go left; 15 < 16, go right
• The right child of 15 is NULL, so BSTRank(16, t5) returns -1, and every caller on the path passes -1 back up

Insertion
• Follow the search path and add the new node where the search reaches a NULL pointer; every node on that path gains one node in its subtree, so its size field increases by 1
• Build tree on board...

Tree Traversals
● We often want to visit every single node in a tree
● While we are looking at BSTs today, preorder and postorder work for any tree; inorder is defined for binary trees
1. Preorder: the node, then its left subtree, then its right subtree
2. Postorder: the left subtree, then the right subtree, then the node
3. Inorder: the left subtree, then the node, then the right subtree (on a BST this visits the keys in sorted order)

Preorder traversal
void preorder(node n) {
    if (n == NULL) return;
    print n.value;
    preorder(n.left);
    preorder(n.right);
}

Postorder traversal
void postorder(node n) {
    if (n == NULL) return;
    postorder(n.left);
    postorder(n.right);
    print n.value;
}

Inorder traversal
void inorder(node n) {
    if (n == NULL) return;
    inorder(n.left);
    print n.value;
    inorder(n.right);
}

Hashing
● In its simplest form, hashing is a "mathematical blender"
● An input is passed in and some scrambled version comes out
● We usually want it to:
  ○ Produce outputs that look uniformly random
  ○ Be one-way (hard to recover the input from the output)
    ■ Open problem in CS: do one-way functions exist? (Proving that they exist would also prove P ≠ NP)
  ○ Be collision resistant
    ■ Hard to find two inputs that hash to the same value
  ○ Be quickly computable (sometimes we want the opposite, e.g. for password hashing!)
  ○ Be deterministic (the same input always produces the same output)

Hash tables
● Wanting to look things up quickly is a very, very common problem
● We use hashing as a trick to make lookups very fast (on average)
● General idea:
  ○ When we store an item x in a table, we try to store it at index H(x), i.e. table[H(x)] = x;
  ○ To find something, start looking at index H(x)
  ○ We usually find things in O(1) expected time, which is great!
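A minimal Python sketch of the table[H(x)] idea follows. It rests on several assumptions. H is Python's built-in `hash()` reduced modulo the table size. An occupied cell is handled by linear probing: move to the next cell, wrapping around. The capacity is fixed, with no resizing or deletion. `None` marks an empty cell, so `None` itself cannot be stored. The class name `LinearProbingTable` is hypothetical.

```python
from typing import Hashable, List, Optional


class LinearProbingTable:
    """Hash set that stores x at index H(x), or at the next free cell."""

    def __init__(self, capacity: int = 8) -> None:
        self.cells: List[Optional[Hashable]] = [None] * capacity
        self.count = 0

    def _h(self, x: Hashable) -> int:
        # H(x): built-in hash reduced modulo the number of cells
        return hash(x) % len(self.cells)

    def insert(self, x: Hashable) -> None:
        i = self._h(x)
        for _ in range(len(self.cells)):
            if self.cells[i] is None:      # free cell: store x here
                self.cells[i] = x
                self.count += 1
                return
            if self.cells[i] == x:         # already stored
                return
            i = (i + 1) % len(self.cells)  # occupied: try the next cell
        raise RuntimeError("table is full (this sketch does not resize)")

    def contains(self, x: Hashable) -> bool:
        i = self._h(x)
        for _ in range(len(self.cells)):
            if self.cells[i] is None:      # an empty cell ends the search
                return False
            if self.cells[i] == x:
                return True
            i = (i + 1) % len(self.cells)
        return False

    def load_factor(self) -> float:
        # n / N: stored items divided by number of cells
        return self.count / len(self.cells)


table = LinearProbingTable(8)
for x in (3, 11, 19):   # all three map to cell 3 in an 8-cell table
    table.insert(x)
```

In CPython, small non-negative integers hash to themselves, so 3, 11, and 19 all map to cell 3 and end up in cells 3, 4, and 5. Deletion is left out on purpose: simply clearing a cell would cut the probe sequence for items stored after it.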
Collisions
● Tricky to deal with; there are multiple solutions to consider
  ○ Linear probing: if the cell is occupied, traverse the array to find the next open one
    ■ To search, we have to traverse until we find the item or an empty cell
  ○ Quadratic probing: similar to linear probing, but instead of going in order, on the n-th probe you check the cell n² positions after H(x) (wrapping around the table)
  ○ Linked list (separate chaining): maintain a linked list at each cell
● Load factor = n/N = average number of items per cell, where n is the number of stored items and N is the number of cells
● As long as the load factor is "OK" (how small depends on the collision handling; with probing it must stay well below 1), we have expected O(1) insert and lookup time

Cryptographic Hash Functions
● Hashing is very useful for cryptographic purposes
  ○ These hash functions require a lot of expert analysis. Never, ever, EVER try to write them yourself! ("Don't roll your own crypto")
● Used to store passwords, create digital signatures, check for corruption, verify files, etc.
● You'll spend a good amount of time working with them in a few weeks

Password Storage
● Password leaks are a problem

Offline Attacks: A Common Problem
• Password breaches at major companies have affected millions of users.

Password storage
● Uses cryptographic hash functions
● Don't store (username, password)
● Don't store (username, H(password))
● Store (username, salt, H(salt + password))
  ○ The salt is a random value chosen per user, so identical passwords get different hashes and a precomputed table of H(password) values is useless

Picking your function is important
● Lots of old functions are still floating around
● They have been broken!
  ○ MD5
  ○ SHA1
● People will argue endlessly over what you should use.

Memory Hard Hashing
● Very recent stuff
● Idea: it is easy for adversaries to build machines that compute hashes fast
● BUT they don't really have that advantage with memory
● Make the hash function take up memory in addition to time
● The adversary must now deal with memory delay

Examples
● scrypt
  ○ Proven to be maximally memory hard
● Argon2
  ○ Winner of the Password Hashing Competition

So how does hashing make this better?
● Instead of the passwords being leaked in plain format, all that shows is a garbage value
● Yahoo used a password hashing function called bcrypt
● Instead, hashed records are leaked
● It requires a lot more work to find the user's password: you must find an input that hashes to the leaked value, which in practice means guessing candidate passwords and hashing each one
● (Unless you made your password "password")

Merkle Trees - Preview of next project
● You want to verify a very large file stored remotely
● Say, 1TB
● Method 1: check that the hash value is right
● But how do you know the remote storage isn't lying to you?

Merkle Tree
● Absurd to have to resend the whole file to verify it is stored correctly
● What if we could verify it by just checking a few blocks?
● Merkle tree: a tree of hash values representing a file
● The root hash can be reconstructed from one block plus the sibling hashes along its path to the root, which we request from the server

Merkle Trees P2 Preview
● You will be implementing a Merkle tree and using it to verify files
● Some other work with hash functions as well.
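The Merkle tree idea can be sketched in Python with `hashlib`. This is a minimal illustration, not the project's required design, and several choices in it are our own assumptions. SHA-256 serves as H. Leaves and internal nodes get 0x00/0x01 prefixes, borrowed from the Certificate Transparency convention in RFC 6962. An unpaired node at the end of a level is carried up unchanged. All function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(block: bytes) -> bytes:
    return _h(b"\x00" + block)         # prefix separates leaves from nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """levels[0] holds the leaf hashes; levels[-1] is [root]."""
    if not blocks:
        raise ValueError("need at least one block")
    levels = [[leaf_hash(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2 == 1:
            nxt.append(cur[-1])        # unpaired node is carried up unchanged
        levels.append(nxt)
    return levels


def merkle_root(blocks: List[bytes]) -> bytes:
    return build_levels(blocks)[-1][0]


def prove(blocks: List[bytes], index: int) -> List[Tuple[bool, bytes]]:
    """Server side: sibling hashes from leaf `index` up to the root.

    Each entry is (sibling_is_left, sibling_hash).
    """
    proof = []
    for level in build_levels(blocks)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):       # no sibling means the node was carried up
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof


def verify(root: bytes, block: bytes, proof: List[Tuple[bool, bytes]]) -> bool:
    """Client side: recompute the root from one block plus the proof."""
    h = leaf_hash(block)
    for sibling_is_left, sib in proof:
        h = node_hash(sib, h) if sibling_is_left else node_hash(h, sib)
    return h == root


if __name__ == "__main__":
    blocks = [f"block {i}".encode() for i in range(5)]
    root = merkle_root(blocks)
    print(verify(root, blocks[3], prove(blocks, 3)))    # True
    print(verify(root, b"tampered", prove(blocks, 3)))  # False
```

The client only needs to keep the 32-byte root. To spot-check block i, it asks the server for the block and its proof, which holds about log2(n) hashes, then runs `verify`.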
