BST, Traversals, and Hashing

Binary Tree
● A specific type of tree
● Every node has 0, 1, or 2 children, never more
● Complete binary tree: every level is full except possibly the last, which is filled from left to right
  ○ Lots of great properties; we will talk about these more when we start to talk about heaps

Introduction to Binary Search Tree
• A Binary Search Tree (BST) is a tree-based data structure that stores an ordered dynamic set of data
• A BST combines the flexible updates of a linked list with the efficient search (query) operations of an ordered array
• BSTs are widely used in computer software because of this efficiency

9/20/2015

Definition of BST
• A BST is a binary tree in which
  • Every node has a comparable key
  • Both the left and right subtrees are themselves BSTs
  • The key in any node of the left subtree is less than the key of the root
  • The key in any node of the right subtree is larger than the key of the root
[Figure: a root node with a left subtree and a right subtree]

Operations of BST

Data structure
• The BSTs used by the following algorithms are represented by a set of nodes connected by pointer fields, and each node contains
  • A field recording the value of the key
  • A size field recording the number of nodes in the subtree rooted at that node
  • Pointers to its left and right children
  • If some child does not exist, the relevant pointer is set to NULL
[Figure: a three-node BST. The root (key 2, size 3) has a left child (key 1, size 1) and a right child (key 3, size 1); the children's left and right pointers are NULL]

Algorithms/Search
• Uses a recursive strategy
• By the property left subtree < root < right subtree, at most one subtree is searched and the other is skipped
• The search visits at most BST.height + 1 nodes

  Algorithm BSTSearch(k, root)
  Input: a key k, and a BST represented by its root node
  Output: if a node whose key is k exists, return it; else return NULL
    if root == NULL then
      return NULL
    else if root->key == k then
      return root
    else if root->key > k then
      return BSTSearch(k, root->left)
    else // root->key < k
      return BSTSearch(k, root->right)
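The recursive search above can be sketched in Python roughly as follows. The `Node` class, its field names, and the small `example` tree are assumptions made for illustration only; the slides do not prescribe a concrete implementation, and the size field is left out here because search does not use it.

```python
class Node:
    """A BST node; the class and field names are illustrative assumptions."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_search(k, root):
    """Return the node whose key is k, or None if no such node exists."""
    if root is None:
        return None                       # ran off the tree: k is absent
    if root.key == k:
        return root
    if root.key > k:
        return bst_search(k, root.left)   # k can only be in the left subtree
    return bst_search(k, root.right)      # k can only be in the right subtree


# Hypothetical five-node tree used only to exercise the function.
example = Node(17, Node(5, Node(2), Node(10)), Node(19))
```

Each call descends one level, so the number of calls is bounded by the height of the tree plus one, matching the bound stated on the slide.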
[Figure: BSTSearch(11, t) on a BST rooted at 17 (the size field is omitted in the diagram). Comparisons along the search path: 11 < 17, 11 > 5, 11 > 10, 11 < 16, 11 < 15, 11 == 11, so the node with key 11 is returned]

[Figure: BSTSearch(30, t) on the same tree (the size field is omitted in the diagram). Comparisons along the search path: 30 > 17, 30 > 19, 30 > 22, after which the search reaches NULL and returns NULL]

Algorithms/Rank
• Similar to the search operation: because one subtree is skipped in each invocation, the run time is limited to BST.height + 1

  Algorithm BSTRank(k, root)
  Input: a key k, and a BST represented by its root node
  Output: if a node whose key is k exists, return the number of nodes whose keys are less than k; else return -1
    if root == NULL then
      return -1
    else if root->key == k then
      // The node exists! The number of nodes less than k is the size of its left subtree
      if root->left == NULL then
        return 0
      else
        return root->left->size
    else if root->key > k then
      // Start from the left subtree all over again
      return BSTRank(k, root->left)
    else // root->key < k
      // The number of nodes less than k = sizeof(left subtree) + root + (number of nodes less than k in the right subtree)
      rr = BSTRank(k, root->right)
      if rr == -1 then
        return -1
      else
        return root->size - root->right->size + rr

[Figure: Example of BSTRank(15, t0) on a 22-node BST rooted at 17, with size fields shown. The calls descend through t0 (17 > 15), t1 (5 < 15), t2 (10 < 15), t3 (16 > 15) and t4 (key 15), where t4->left->size is returned. Result: 14]

[Figure: Example of BSTRank(16, t0) on a BST rooted at 23 that does not contain the key 16. The search runs off the tree at a NULL pointer (t5), so every pending call returns -1. Result: -1]
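As a rough illustration of the rank algorithm, here is a Python sketch that maintains the size field during insertion and then computes ranks. The names `Node`, `size`, `insert`, `build`, and `bst_rank` are assumptions introduced for this example, and the insertion order is simply one order that yields a BST on the keys 1 to 22 rooted at 17; it is not claimed to reproduce the slide's diagram exactly.

```python
class Node:
    """BST node with a size field (number of nodes in its subtree)."""

    def __init__(self, key):
        self.key = key
        self.size = 1
        self.left = None
        self.right = None


def size(node):
    """Size of a possibly empty subtree."""
    return 0 if node is None else node.size


def insert(root, key):
    """Insert key (duplicates are ignored) and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # Recompute the size from the children so it stays correct either way.
    root.size = 1 + size(root.left) + size(root.right)
    return root


def build(keys):
    """Build a BST by inserting the keys in the given order."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def bst_rank(k, root):
    """Number of keys smaller than k if k is present, else -1."""
    if root is None:
        return -1
    if root.key == k:
        return size(root.left)
    if root.key > k:
        return bst_rank(k, root.left)
    rr = bst_rank(k, root.right)
    if rr == -1:
        return -1
    # left subtree + the root itself + smaller keys found on the right
    return size(root.left) + 1 + rr


t = build([17, 5, 19, 2, 10, 18, 22, 1, 4, 9, 16,
           20, 3, 7, 15, 21, 6, 8, 11, 13, 12, 14])
```

Here `size(root.left) + 1` plays the role of `root->size - root->right->size` in the pseudocode; the two expressions are equal whenever the size fields are consistent.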
Insertion
• Follow the search path for the new key and add the new node where the path ends
• Build tree on board...

Tree Traversals
● We often want to visit every single node in a tree
● While we are looking at BSTs today, these methods work for all binary trees
  1. Preorder
  2. Postorder
  3. Inorder

Preorder traversal
  void preorder(tree T, node n) {
    if (n == NULL) return;
    else {
      print n.value;
      preorder(T, n.left);
      preorder(T, n.right);
    }
  }

Postorder traversal
  void postorder(tree T, node n) {
    if (n == NULL) return;
    else {
      postorder(T, n.left);
      postorder(T, n.right);
      print n.value;
    }
  }

InOrder Traversal
  void inorder(tree T, node n) {
    if (n == NULL) return;
    else {
      inorder(T, n.left);
      print n.value;
      inorder(T, n.right);
    }
  }

Hashing
● In its simplest form, hashing is a "mathematical blender"
● An input is passed in and some scrambled version comes out
● We usually want it to:
  ○ Generate values that look uniformly random
  ○ Be one-way
    ■ Open problem in CS: do one-way functions exist? (If they do, then P ≠ NP)
  ○ Be collision resistant
    ■ Hard to find two inputs that hash to the same value
  ○ Be quickly computable (sometimes the opposite, though!)
  ○ Be deterministic (the same input always gives the same output)

Hash tables
● Wanting to look things up quickly is a very, very common problem
● We use hashing as a trick to make lookups very fast (on average)
● General idea:
  ○ When we store an item x in a table, we try to store it at index H(x), i.e. table[H(x)] = x;
  ○ To find something, start looking at index H(x)
  ○ We usually find things in O(1) time, which is great!
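Returning to the insertion and traversal slides, the following Python sketch shows one way they might look in runnable form. The `Node` class and the function names are assumptions for illustration, and the traversals collect keys into a list instead of printing them so the visiting order is easy to inspect.

```python
class Node:
    """Plain BST node; names are illustrative assumptions."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Follow the search path and attach a new node where it ends."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored in this sketch


def preorder(n, out):
    """Visit the node, then its left subtree, then its right subtree."""
    if n is not None:
        out.append(n.key)
        preorder(n.left, out)
        preorder(n.right, out)
    return out


def postorder(n, out):
    """Visit the left subtree, then the right subtree, then the node."""
    if n is not None:
        postorder(n.left, out)
        postorder(n.right, out)
        out.append(n.key)
    return out


def inorder(n, out):
    """Visit the left subtree, then the node, then the right subtree."""
    if n is not None:
        inorder(n.left, out)
        out.append(n.key)
        inorder(n.right, out)
    return out


root = None
for key in [17, 5, 19, 2, 10]:
    root = insert(root, key)
```

On a BST, the inorder traversal visits the keys in sorted order, which is a convenient sanity check for an insertion routine.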
Collisions
● Tricky to deal with; there are multiple solutions to consider
  ○ Linear probing: if the cell is occupied, traverse the array to find the next open one. To search, traverse from index H(x) until we find the item or an empty cell
  ○ Quadratic probing: similar to linear probing, but instead of going in order, the n-th probe checks the cell n² positions after the original index
  ○ Linked list: just maintain a linked list at each cell
● Load factor = n/N = average number of items per cell, where n is the number of stored items and N is the number of cells
● As long as the load factor is "OK" (what counts as OK depends on the collision handling), we have O(1) expected insert and lookup time

Cryptographic Hash Functions
● Hashing is very useful for cryptographic purposes
  ○ These hash functions require a lot of expert analysis. Never, ever, EVER try to write them yourself! ("Don't roll your own crypto")
● Used to store passwords, create digital signatures, check for corruption, verify files, etc.
● You'll spend a good amount of time working with them in a few weeks

Password Storage
● Password leaks are a problem

Offline Attacks: A Common Problem
• Password breaches at major companies have affected millions of users.

Password storage
● Uses cryptographic hash functions
● Don't store (username, password)
● Don't store (username, H(password))
● Store (username, salt, H(salt + password)), where the salt is a random value chosen per user

Picking your function is important
● Lots of old functions are still floating around
● They have been broken!
  ○ MD5
  ○ SHA1
● People will argue endlessly over what you should use.

Memory Hard Hashing
● Very recent stuff
● Idea: it is easy for adversaries to build machines that compute hashes fast
● BUT they don't really have that advantage with memory
● Make the hash function take up memory in addition to time
● The adversary must now deal with memory delay

Examples
● SCRYPT
  ○ Proven to be maximally memory hard
● Argon2
  ○ Winner of the Password Hashing Competition

So how does hashing make this better?
● Instead of the passwords being leaked in plain format, all that shows is a garbage value
● Yahoo used a cryptographic hash function called BCRYPT
● Instead, hashed records are leaked
● Requires a lot more work to find the user's password: you must find an input that hashes to the leaked value
● (Unless you made your password "password")

Merkle Trees - Preview of next project
● You want to verify a very large file stored remotely
● Say, 1TB
● Method 1: check that the hash value is right
● But how do you know the remote storage isn't lying to you?

Merkle Tree
● Absurd to have to resend the file to verify it is stored correctly
● What if we could verify it by just checking a few blocks?
● Merkle tree: a tree of hash values representing a file
● The root node can be reconstructed by requesting certain values from a server

Merkle Trees P2 Preview
● You will be implementing a Merkle tree and using it to verify files
● Some other work with hash functions as well.
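To make the Merkle tree idea a little more concrete, here is a minimal Python sketch, not the project's specification. It assumes SHA-256 from the standard `hashlib` module, a non-empty list of blocks, and a simple padding rule that duplicates the last hash on odd-sized levels; the function names `build_levels`, `proof_for`, and `verify` are hypothetical, and a production design would also need details such as separating leaf hashes from internal hashes.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return all tree levels, leaf hashes first and the root level last.

    Assumes blocks is non-empty. Odd-sized levels are padded by
    duplicating their last hash (an assumption of this sketch).
    """
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
            levels[-1] = level            # store the padded level
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof_for(levels, index):
    """Sibling hashes needed to rebuild the root from block `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1               # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2                       # move to the parent position
    return proof


def verify(block, proof, root):
    """Recompute the root from one block and its proof, then compare."""
    digest = h(block)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            digest = h(sibling + digest)
        else:
            digest = h(digest + sibling)
    return digest == root


blocks = [b"block-0", b"block-1", b"block-2", b"block-3", b"block-4"]
levels = build_levels(blocks)
root = levels[-1][0]
```

The verifier only needs the trusted root plus one sibling hash per level, so checking a block takes a number of hashes proportional to the tree height and does not require the whole file.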
