Collision Resolution: Separate Chaining and Open Hashing


Collision Resolution

Collision: when two keys map to the same location in the hash table.

Two ways to resolve collisions:
1. Separate Chaining
2. Open Addressing (linear probing, quadratic probing, double hashing)

Separate Chaining

• Separate chaining: all keys that map to the same hash value are kept in a list (or "bucket"). (A minimal runnable sketch appears at the end of this part, below.)

[Figure: inserting 10, 22, 107, 12, 42 into a table of size 10; 10 goes to bucket 0, 107 to bucket 7, and 22, 12, 42 chain together in bucket 2.]

Open Hashing (Chaining)

• h(a) = h(b) and h(d) = h(g)
• Chains may be ordered or unordered. Little advantage to ordering.

[Figure: a table whose buckets hold the chains a–b, d–g, f, and c.]

Analysis of find

• Defn: The load factor, λ, of a hash table is the ratio N/M, where N = no. of elements and M = table size.
• For separate chaining, λ = average # of elements in a bucket
• Unsuccessful find cost: λ (the whole chain is scanned)
• Successful find cost: about 1 + λ/2 (half the chain on average, plus the match itself)

How big should the hash table be?

• For Separate Chaining: big enough to keep λ around 1, i.e., roughly one bucket per expected element.

Closed Hashing (Open Addressing)

• No chaining; every key fits in the hash table.
• Probe sequence:
  › h(k)
  › (h(k) + f(1)) mod HSize
  › (h(k) + f(2)) mod HSize, …
• Insertion: Find the first probe with an empty slot.
• Find: Find the first probe that equals the query or is empty. Stop after HSize probes, in any case.
• Deletion: lazy deletion is needed. That is, mark locations as deleted if a deleted key resides there.

Open Addressing: Linear Probing

f(i) = i

• Probe sequence:
  0th probe = h(k) mod TableSize
  1st probe = (h(k) + 1) mod TableSize
  2nd probe = (h(k) + 2) mod TableSize
  …
  ith probe = (h(k) + i) mod TableSize
• Linear probing: after checking spot h(k), try spot h(k)+1; if that is full, try h(k)+2, then h(k)+3, etc.

[Figure: inserting 38 into a table that already holds 19, 8, 109, and 10.]

Terminology Alert!

• "Open Hashing" equals "Separate Chaining"
• "Closed Hashing" equals "Open Addressing"
(Weiss)

Linear Probing Example

Insert 76, 93, 40, 47, 10, 55 into a table of size 7 with h(k) = k mod 7:

  key   slot   probes
  76    6      1
  93    2      1
  40    5      1
  47    0      3   (slots 5 and 6 full; wraps around to 0)
  10    3      1
  55    1      3   (slots 6 and 0 full; wraps around to 1)

Write pseudocode for find(k) for Open Addressing with linear probing

• Find(k) returns i where T(i) = k

Student Activity (one possible answer is sketched below)

Linear Probing – Clustering

[Figure, after R. Sedgewick: a collision in a small cluster costs little; a collision in a large cluster costs far more, and each one grows the cluster further.]

Load Factor in Linear Probing

• For any λ < 1, linear probing will find an empty slot
• Expected # of probes (for large table sizes)
  › successful search: (1/2)(1 + 1/(1 − λ))
  › unsuccessful search: (1/2)(1 + 1/(1 − λ)²)
• Linear probing suffers from primary clustering
• Performance quickly degrades for λ > 1/2

Quadratic Probing

f(i) = i²

• Less likely to encounter primary clustering
• Probe sequence:
  0th probe = h(k) mod TableSize
  1st probe = (h(k) + 1) mod TableSize
  2nd probe = (h(k) + 4) mod TableSize
  3rd probe = (h(k) + 9) mod TableSize
  …
  ith probe = (h(k) + i²) mod TableSize

Exercise: Quadratic Probing

Insert 89, 18, 49, 58, 79 into a table of size 10 with h(k) = k mod 10.

Quadratic Probing Example

Insert into a table of size 7 with h(k) = k mod 7:

  insert(76)   76 % 7 = 6
  insert(40)   40 % 7 = 5
  insert(48)   48 % 7 = 6
  insert(5)     5 % 7 = 5
  insert(55)   55 % 7 = 6
  insert(47)   47 % 7 = 5   But… no probe ever finds an empty slot (demonstrated in the last sketch below).
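Stepping back to separate chaining: here is a minimal runnable sketch in Python. It is an illustration only; the class name, method names, and the use of plain lists as buckets are my assumptions, not the slides'.

```python
# A minimal separate-chaining sketch: each table slot holds a list ("bucket").

class ChainedHashTable:
    def __init__(self, size=10):
        self.table = [[] for _ in range(size)]
        self.count = 0

    def _bucket(self, key):
        return self.table[hash(key) % len(self.table)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:          # chains are unordered, per the slides
            bucket.append(key)
            self.count += 1

    def find(self, key):
        return key in self._bucket(key)

    def load_factor(self):
        # lambda = N / M = average number of elements per bucket
        return self.count / len(self.table)

t = ChainedHashTable()
for k in (10, 22, 107, 12, 42):        # the inserts from the chaining figure
    t.insert(k)                        # 22, 12, 42 all chain in bucket 2
print(t.find(107), t.load_factor())    # True 0.5
```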
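For the find(k) student activity, one possible Python answer. Representing empty slots as None is an assumption the slides do not fix; any sentinel distinct from stored keys works.

```python
# find(k) for open addressing with linear probing: returns the index i
# where T[i] == k, or None if k is not in the table.

def find(T, k, h):
    size = len(T)
    for i in range(size):              # stop after HSize probes, in any case
        probe = (h(k) + i) % size      # linear probing: f(i) = i
        if T[probe] == k:
            return probe
        if T[probe] is None:           # empty slot: k cannot be in the table
            return None
    return None

T = [47, 55, 93, 10, None, 40, 76]     # the final linear-probing table above
print(find(T, 55, lambda k: k % 7))    # 1
print(find(T, 12, lambda k: k % 7))    # None (stops at the empty slot 4)
```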
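And a quick sketch showing the example above actually failing: after the five inserts the size-7 table is more than half full, and the probe sequence for insert(47) revisits only occupied slots.

```python
# Quadratic probing fails on insert(47) once lambda > 1/2 in this table.

def quad_insert(T, k):
    size = len(T)
    for i in range(size):
        probe = (k % size + i * i) % size   # ith probe = (h(k) + i^2) mod size
        if T[probe] is None:
            T[probe] = k
            return probe
    return None                             # gave up: no empty slot reached

T = [None] * 7
for k in (76, 40, 48, 5, 55):
    quad_insert(T, k)                       # fills slots 6, 5, 0, 2, 3
print(quad_insert(T, 47))                   # None, although slots 1 and 4 are empty
```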
Quadratic Probing: Success guarantee for λ < 1/2

• If size is prime and λ < 1/2, then quadratic probing will find an empty slot in size/2 probes or fewer.
  › show for all 0 ≤ i, j < size/2 and i ≠ j:
      (h(x) + i²) mod size ≠ (h(x) + j²) mod size
  › by contradiction: suppose that for some i ≠ j:
      (h(x) + i²) mod size = (h(x) + j²) mod size
      ⇒ i² mod size = j² mod size
      ⇒ (i² − j²) mod size = 0
      ⇒ [(i + j)(i − j)] mod size = 0
    BUT size is prime and does not divide (i − j) or (i + j), since both are nonzero and smaller than size. Contradiction.

Quadratic Probing may Fail if λ > 1/2

Insert 51 into a size-7 table holding 16 (slot 2), 45 (slot 3), 59 (slot 4), and 76 (slot 6):

  51 mod 7 = 2;        i = 0
  (2 + 1) mod 7 = 3;   i = 1
  (3 + 3) mod 7 = 6;   i = 2
  (6 + 5) mod 7 = 4;   i = 3
  (4 + 7) mod 7 = 4;   i = 4
  (4 + 9) mod 7 = 6;   i = 5
  (6 + 11) mod 7 = 3;  i = 6
  (3 + 13) mod 7 = 2;  i = 7
  …

Every probe lands on an occupied slot even though slots 0, 1, and 5 are empty.

Quadratic Probing: Properties

• For any λ < 1/2, quadratic probing will find an empty slot; for bigger λ, quadratic probing may not find a slot
• Quadratic probing does not suffer from primary clustering: keys hashing to the same area are not bad
• But what about keys that hash to the same spot?
  › Secondary Clustering!

Double Hashing

f(i) = i · g(k), where g is a second hash function

• Probe sequence:
  0th probe = h(k) mod TableSize
  1st probe = (h(k) + g(k)) mod TableSize
  2nd probe = (h(k) + 2·g(k)) mod TableSize
  3rd probe = (h(k) + 3·g(k)) mod TableSize
  …
  ith probe = (h(k) + i·g(k)) mod TableSize

Double Hashing Example

h(k) = k mod 7 and g(k) = 5 − (k mod 5)

Insert 76, 93, 40, 47, 10, 55 (a runnable version appears after this part):

  key   slot   probes
  76    6      1
  93    2      1
  40    5      1
  47    1      2   (slot 5 full; g(47) = 3, so try (5 + 3) mod 7 = 1)
  10    3      1
  55    4      2   (slot 6 full; g(55) = 5, so try (6 + 5) mod 7 = 4)

Resolving Collisions with Double Hashing

Hash functions (the table shown has slots 0–9, so M = 10):
  H(K) = K mod M
  H2(K) = 1 + ((K/M) mod (M − 1))

Insert these values into the hash table in this order, resolving any collisions with double hashing: 13, 28, 33, 147, 43.

Student Activity

Double Hashing is Safe for λ < 1

• Let h(k) = k mod p and g(k) = q − (k mod q), where 2 < q < p and p and q are primes. The probe sequence h(k) + i·g(k) mod p probes every entry of the hash table.

Proof: Let 0 ≤ m < p, h = h(k), and g = g(k). We show that (h + ig) mod p = m for some i. Since 0 < g < p, g and p are relatively prime, so by the extended Euclidean algorithm there are s and t such that sg + tp = 1. Choose i = (m − h)s mod p. Then

  (h + ig) mod p = (h + (m − h)sg) mod p
                 = (h + (m − h)sg + (m − h)tp) mod p
                 = (h + (m − h)(sg + tp)) mod p
                 = (h + (m − h)) mod p
                 = m mod p = m

Deletion in Hashing

• Open hashing (chaining) – no problem
• Closed hashing – must do lazy deletion. Deleted keys are marked as deleted. (Sketched below.)
  › Find: done normally
  › Insert: treat a marked slot as an empty slot and fill it

[Figure: h(k) = k mod 7 with linear probing. Slot 3 holds a deleted 23. Find 59 hashes to slot 3, skips the marked slot, and finds 59 in slot 4. Insert 30 hashes to slot 2 (occupied by 16) and fills the marked slot 3.]

Rehashing

Idea: When the table gets too full, create a bigger table (usually 2× as large) and hash all the items from the original table into the new table. (A sketch follows below.)

• When to rehash?
  › half full (λ = 0.5)
  › when an insertion fails
  › some other threshold
• Cost of rehashing?

Rehashing Example

• Open hashing – h1(x) = x mod 5 rehashes to h2(x) = x mod 11.

[Figure: keys 25, 37, 83, 52, 98 in a size-5 table (λ = 1), rehashed into a size-11 table (λ = 5/11).]

Amortized Analysis of Rehashing

• Starting with a table of size 2, double when load factor > 1.
• Cost of inserting n keys is < 3n
• For 2^k + 1 ≤ n ≤ 2^(k+1):
  › Hashes = n hashes
  › Rehashes = 2 + 2² + … + 2^k = 2^(k+1) − 2 rehashes
  › Total = n + 2^(k+1) − 2 < 3n
• Example
  › n = 33: Total = 33 + 64 − 2 = 95 < 99

Picture Rehashing

[Figure: the table doubling as keys 1 through 25 are inserted.]
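A runnable version of the double-hashing example, using the slide's functions h(k) = k mod 7 and g(k) = 5 − (k mod 5); the function name and the None-for-empty convention are mine.

```python
# Double hashing: the step between probes is g(k), which is never 0 here
# because k mod 5 is in 0..4, so g(k) is in 1..5.

def double_hash_insert(T, k):
    size = len(T)
    h, g = k % size, 5 - (k % 5)
    for i in range(size):
        probe = (h + i * g) % size     # ith probe = (h(k) + i*g(k)) mod size
        if T[probe] is None:
            T[probe] = k
            return probe
    return None

T = [None] * 7
for k in (76, 93, 40, 47, 10, 55):
    double_hash_insert(T, k)
print(T)                               # [None, 47, 93, 10, 55, 40, 76]
```

Note that 47 and 55 each need only 2 probes, versus 3 apiece under linear probing: the per-key step size breaks up the cluster.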
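A minimal lazy-deletion sketch for closed hashing with linear probing, as promised above. The DELETED sentinel and helper names are illustrative assumptions.

```python
# Lazy deletion: DELETED is a sentinel distinct from None (empty) and any key.

DELETED = object()

def find(T, k):
    size = len(T)
    for i in range(size):
        probe = (k % size + i) % size
        if T[probe] == k:              # find is done normally...
            return probe
        if T[probe] is None:           # ...and only a truly empty slot stops it
            return None
    return None

def insert(T, k):
    size = len(T)
    for i in range(size):
        probe = (k % size + i) % size
        if T[probe] is None or T[probe] is DELETED:   # reuse marked slots
            T[probe] = k
            return probe
    return None

def delete(T, k):
    i = find(T, k)
    if i is not None:
        T[i] = DELETED                 # mark, don't empty

T = [None, None, 16, 23, 59, None, 76]   # the deletion-slide table, size 7
delete(T, 23)                            # slot 3 becomes DELETED
print(find(T, 59))                       # 4: probing skips the marked slot
insert(T, 30)                            # 30 % 7 = 2 is full; reuses slot 3
print(T[3])                              # 30
```

Marking rather than emptying the slot is what keeps later finds from stopping early at a hole in the probe chain.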
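A rough rehashing sketch for chaining that mirrors the example: grow the table and hash every item again. Defaulting to doubling follows the amortized analysis; the slide's example grows from 5 to the prime 11 instead.

```python
# Rehash a chained table into a larger one; O(N) since every key is re-hashed.

def rehash(table, new_size=None):
    new_size = new_size or 2 * len(table)   # default: double the table
    new_table = [[] for _ in range(new_size)]
    for bucket in table:
        for key in bucket:                  # every key is hashed again
            new_table[key % new_size].append(key)
    return new_table

table = [[] for _ in range(5)]              # h1(x) = x mod 5
for k in (25, 37, 83, 52, 98):
    table[k % 5].append(k)                  # lambda reaches 5/5 = 1
table = rehash(table, 11)                   # h2(x) = x mod 11, lambda = 5/11
print([b for b in table if b])              # [[25], [37], [83], [52], [98]]
```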
Case Study

• Spelling dictionary – 30,000 words
• Goals
  › Fast spell checking
  › Minimal storage
• Possible solutions
  › Sorted array and binary search
  › Open hashing (chaining)
  › Closed hashing with linear probing
• Notes
  › Almost all searches are successful
  › 30,000 words averaging 8 bytes per word: 240,000 bytes
  › Pointers are 4 bytes

Storage

• Assume words are stored as strings and entries in the arrays are pointers to the strings.

[Figure: binary search uses N pointers; open hashing uses N/λ + 2N pointers; closed hashing uses N/λ pointers.]

Analysis

• Binary search
  › Storage = N pointers + words = 360,000 bytes
  › Time = log₂ N < 15 probes in the worst case
• Open hashing
  › Storage = 2N + N/λ pointers + words; λ = 1 implies 600,000 bytes
  › Time = 1 + λ/2 probes per access; λ = 1 implies 1.5 probes per access
• Closed hashing
  › Storage = N/λ pointers + words; λ = 1/2 implies 480,000 bytes
  › Time = (1/2)(1 + 1/(1 − λ)) probes; λ = 1/2 implies 1.5 probes per access

(The arithmetic is spot-checked in the sketch at the end of these notes.)

Extendible Hashing

• Extendible hashing is a technique for storing large data sets that do not fit in memory.
• An alternative to B-trees.
• A directory kept in memory is indexed by the first few bits of the hash value (3 bits here, entries 000–111); each entry points to a disk page holding the keys whose hash begins with those bits. Each page records its local depth, the number of leading bits it actually distinguishes, shown as (2) or (3).

[Figure: a 3-bit directory over five pages with local depths (2)(2)(3)(3)(2), holding keys such as 00001, 00011, 00100, 00110, 01001, 01011, 01100, 10001, 10011, 10101, 10110, 10111, 11001, 11011, 11100, 11110.]

Splitting

• Insert 00111: the page holding the keys that begin with 00 overflows and splits into two pages of local depth 3. The directory already has 3 bits, so it does not grow.

[Figure: the directory before and after the split.]

Rehashing (directory doubling)

• Insert 11000: when the overflowing page's local depth equals the directory depth, the directory itself doubles (here from four entries, 00–11, to eight, 000–111) and then the page splits.

[Figure: the 2-bit directory before, and the 3-bit directory after, the insert.]

Fingerprints

• Given a string x, we want a fingerprint x′ with the properties.
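To close, a tiny Python spot-check of the case-study arithmetic, using exactly the constants assumed above (30,000 words, 8 bytes per word, 4-byte pointers).

```python
# Storage totals for the three spelling-dictionary designs.

N, WORD_BYTES, PTR_BYTES = 30_000, 8, 4
words = N * WORD_BYTES                            # 240,000 bytes of strings

binary_search  = N * PTR_BYTES + words            # N pointers
open_hashing   = (2 * N + N) * PTR_BYTES + words  # 2N + N/lambda, lambda = 1
closed_hashing = (2 * N) * PTR_BYTES + words      # N/lambda, lambda = 1/2

print(binary_search, open_hashing, closed_hashing)
# 360000 600000 480000
```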