Sets & Hash Tables

Sets & Hash Tables

Sets & Hash Tables What are sets? Week 13 ! A set is a collection of objects of the same type that has the following two properties: Weiss: 20 - there are no duplicates in the collection Main & Savitch: 3, 12.2-3 - the order of the objects in the collection is irrelevant. CS 5301 ! {6,9,11,-5} and {11,9,6,-5} are equivalent. Fall 2014 ! There is no first element, and no successor Jill Seaman of 9. 1 2 Set Operations Set Operations ! Set construction ! Member(element): boolean - the empty set (0 elements in the set) - True, if element is a member of the set; false, ! isEmpty() otherwise - True, if the set is empty; false, otherwise. ! Union(Set1,Set2): Set ! Insert(element) - returns a Set containing all elements of the two Sets, no duplications. - If element is already in the set, do nothing; otherwise add it to the set ! Intersection(Set1,Set2): Set ! Delete(element) - returns a Set containing all elements common to both sets. - If element is not a member of the set, do nothing; otherwise remove it from the set. 3 4 Set Operations Implementation ! Difference(Set1,Set2): Set ! Array of elements implementation - returns a Set containing all elements of the first - each element of the set will occupy an element of set except for the elements that are in common the array. with the second set. - the member (find) operation will be inefficient, must use linear search. ! Subset(Set1,Set2): boolean ! see Lab 6, exercise 1 - True, if Set2 is a subset of Set1 (if all elements of the Set2 are also elements of Set1). - represented a set of integers, but called it a “Batch” - class contained a pointer to a dynamically allocated array of ints ! Exercise: implement all of the set operations for 5 this set 6 Implementation Implementation ! Boolean array implementation ! Boolean array implementation - size of the array must be equal to number of all - need a mapping function to convert an element possible elements (the universe). of the universe to a position in the array //This array will represent a set of days of the week // (Sunday, Monday, Tuesday, . .) int map(string x) { if (x=="Sunday") return 0; bool daysOfWeek[7] = {false}; //sets all elements to false if (x=="Monday") return 1; if (x=="Tuesday") return 2; if (x=="Wednesday") return 3; - Here is the set {Monday, Wednesday, Friday}: if (x=="Thursday") return 4; if (x=="Friday") return 5; if (x=="Saturday") return 6; FALSE TRUE FALSE TRUE FALSE TRUE FALSE } Sunday Monday Tuesday Wednesday Thursday Friday Saturday 0 1 2 3 4 5 6 - if daysOfWeek[map(“Monday”)] is true, then - if daysOfWeek[1] is true, then Monday is in the Set. Monday is in the Set. 7 8 Implementation What are hash tables? ! Boolean array implementation: member ! A Hash Table is used to implement a set (or bool member(string x) { int pos = map(x); a search table), providing basic operations if (0<=pos && pos<7) return daysOfWeek[pos]; in constant time: return false; } - insert - delete (optional) ! Boolean array implementation: union - find (also called “member”) // c will be the union of a and b: void union(bool a[], bool b[], bool c[]) { for (int pos=0; pos<7; pos++) - makeEmpty (need not be constant time) // if either a or b is true for pos, make c true for pos c[pos] = (a[pos] || b[pos]); ! It uses a function that maps an object in the } set (a key) to its location in the table. - Exercise: implement all of the set operations for ! The function is called a hash function. the set implemented as a boolean array 9 10 Using a hash function Placing elements in the array values values [ 0 ] Empty HandyParts company [ 0 ] Empty Use the hash function [ 1 ] 4501 makes no more than 100 [ 1 ] 4501 [ 2 ] Empty different parts. But the [ 2 ] Empty Hash(partNum) = partNum % 100 parts all have four digit numbers. [ 3 ] 8903 [ 3 ] 8903 7803 7803 [ 4 ] [ 4 ] to place the element with Empty This hash function can be used to Empty part number 5502 in the . 8 . 8 . store and retrieve parts in an array. array. .10 10. Hash(partNum) = partNum % 100 . [ 97] [ 97] Empty Empty [ 98] [ 98] 2298 [ 99] [ 99] 2298 3699 3699 41 42 Placing elements in the array How to resolve the collision? values values [ 0 ] Empty Next place part number [ 0 ] Empty One way is by linear probing. [ 1 ] 4501 6702 in the array. [ 1 ] 4501 This uses the following function [ 2 ] 5502 [ 2 ] 5502 [ 3 ] Hash(partNum) = partNum % 100 [ 3 ] (HashValue + 1) % 100 7803 7803 [ 4 ] [ 4 ] Empty 6702 % 100 = 2 Empty repeatedly until an empty location . is found for part number 6702. But values[2] is already . occupied. [ 97] [ 97] Empty Empty [ 98] [ 98] COLLISION OCCURS [ 99] 2298 [ 99] 2298 3699 3699 43 44 Resolving the collision Collision resolved values values [ 0 ] Empty Still looking for a place for 6702 [ 0 ] Empty Part 6702 can be placed at [ 1 ] 4501 using the function [ 1 ] 4501 the location with index 4. [ 2 ] 5502 [ 2 ] 5502 [ 3 ] (HashValue + 1) % 100 [ 3 ] 7803 7803 [ 4 ] [ 4 ] Empty Empty . [ 97] [ 97] Empty Empty [ 98] [ 98] [ 99] 2298 [ 99] 2298 3699 3699 45 46 Collision resolved Hashing concepts ! values Hash Table: (usually an array) where objects are stored according to their key [ 0 ] Empty Part 6702 is placed at [ 1 ] 4501 the location with index 4. - key: attribute of an object used for searching/ sorting [ 2 ] 5502 [ 3 ] - number of valid keys usually greater than number 7803 [ 4 ] Where would the part with of slots in the table 6702 number 4598 be placed using . - number of keys in use usually much smaller than . linear probing? . table size. ! . Hash function: maps a key to a Table index [ 97] Empty ! [ 98] Collision: when two separate keys hash to the [ 99] 2298 same location 18 3699 47 Hashing concepts Implementation ! Standard array implementation ! Collision resolution: method for finding an open spot in the table for a key that has - keys are ints: collided with another key already in the table. class HashTable { private: ! struct Entry { Load Factor: the fraction of the hash table int value; int status; // 0 = open 1 = occupied 2 = deleted that is full }; Entry *array; //array of elements - may be given as a percentage: 50% int size; //size of array int hash (int key) ; // maps key to position in array - may be given as a fraction in the range from 0 to public: 1, as in: .5 HashTable (int); //initialize elements status to 0 ~HashTable(); bool find(int); //return true if int in table void insert (int); //add int to table 19 void display(); //show elements in table 20 void remove(int) //remove int from table }; Hash Function Collision Resolution: Linear Probing ! Goals: ! Insert: When there is a collision, search - computation should be fast sequentially for the next open slot - should minimize collisions (good distribution) ! Find: if the key is not at the hashed location, ! Some issues: keep searching sequentially for it. - should depend on ALL of the key - if it reaches an open slot, the key is not found (not just the last 2 digits or first 3 characters, which may ! not themselves be well distributed) Remove: if the key is not at the hashed location, keep searching sequentially for it. ! Final step of hash function is usually: temp % size - if the key is found, set the status to open - temp is some intermediate result ! - size is the hash table size Problem: Removing an element in the middle of a chain. The Find method needs to know to - ensures the value is a valid location in the table 21 keep searching to the end of the chain. 22 Remove Solution Linear Probing: Example ! Remove: if the key is not at the hashed ! Insert: 89, 18, 49, 58, 69, hash(k) = k mod 10 location, keep searching sequentially for it. Probing function (attempt i): hi(K) = (hash(K) + i) % tablesize - skip deleted items and occupied items not matching the key 49 is in 0 because - if the key is found, mark it as deleted. 9 was full ! Find: if the key is not at the hashed location, 58 is in 1 because keep searching sequentially for it. 8, 9, 0 were full - skip deleted items and occupied items not matching the key 69 is in 1 because 9, 0 were full ! Insert: Use Find, if the key is NOT in the list: Start the search again: - skip occupied items not matching the key - this way we can reuse the deleted spots 23 24 Collision Resolution: Implementation Separate chaining ! ! Use an array of linked lists for the hash table Array of linked lists implementation ! Each linked list contains all objects that hashed to that - The data structure: location class HashTable { private: - no collisions static const int SIZE = 100; struct Node { int key; node *nextNode; }; Hash function is still: Node* pTable[SIZE]; //array of pointers to Nodes h(K) = k % 10 . }; // constr should init all pointers in the array to NULL 25 26 Separate Chaining ! To insert a an object: - compute hash(k) - insert at front of list at that location (if empty, make first node) - (do not insert if already in the list) ! To find an object: - compute hash(k) - search the linked list there for the key of the object ! To delete an object: - compute hash(k) - search the linked list there for the key of the object 27 - if found, remove it.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us