Hashing (Part 2)

Hashing (part 2) CSE 2011 Winter 2011 14 March 2011 1 Collision Handling Separate chaining Pbi(Probing (open a ddress ing ) Linear probing Quadratic probing Double hashing 2 1 Quadratic Probing Linear probing: Quadratic probing Insert item (k, e) A[i] is occupied i = h(k) Try A[(i+1) mod N]: used A[i] is occupied Try A[(i+22) mod N]: used Try A[(i+1) mod N]: used Try A[(i+32) mod N] Try A[(i+2) mod N] and so on and so on until an empty cell is found May not be able to find an empty cell if N is not prime, or the hash table is at least half full 3 Double Hashing Double hashing uses a Insert item (k, e) secondary hash function i = h(()k) d(k) and handles collisions A[i] is occupied by placing an item in the first available cell of the Try A[(i+d(k))mod N]: used series Try A[(i+2d(k))mod N]: used (i j d(k)) mod N Try A[(i+3d(k))mod N] for j 0, 1, … , N 1 and so on until The secondary hash an empty cell is found function d(k) cannot have zero values The table size N must be a prime to allow probing of all the cells 4 2 Example of Double Hashing kh(k ) d (k ) Probes Consider a hash 18 5 3 5 tbltable s tor ing itinteger 41 2 1 2 22 9 6 9 keys that handles 44 5 5 5 10 collision with double 59 7 4 7 32 6 3 6 hashing 315 4 590 N 13 73 8 4 8 h(k) k mod 13 d(k) 7 k mod 7 0123456789101112 Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order 31 41 18 32 59 73 22 44 0123456789101112 5 Double Hashing (2) d(k) should be chosen to minimize clustering Common choice of compression map for the secondary hash function: d(k) q k mod q where q N q is a prime The possible values for d(k) are 1, 2, … , q Note: linear probing has d(k) 1. 6 3 Comparing Collision Handling Schemes = n/N Unsuccessful: key not found Successful: key found 7 Comparing Collision Handling Schemes (2) Separate chaining: Linear probing: items are – simple implementation clustered into contiguous – faster than open runs (primary clustering). addressing in general – using more memory Quadratic probing: secondary clustering. Open addressing: – using less memory Double hashing: – slower than chaining in distributes keys more general uniformly than linear probing does. – more complex removals 8 4 Performance of Hashing In the worst case, searches, The expected running time insertions and removals on a of all the dictionary ADT hash table take O(n) time. operations in a hash table is The worst case occurs when O(1). all the keys inserted into the In practice, hashing is very map collide. fast provided the load factor The load factor nN is not close to 100%. affects the performance of a rehashing. hash table. Applications of hash tables: Assuming that the hash small databases values are like random numbers, it can be shown compilers that the expected number of browser caches probes for an insertion with converting non-integer open addressing is keys to integers. 1 (1 ) 9 Keys That Are Strings We need to convert a string to an integer before hashing. One option is to add up the ASCII values of the characters in the string. Is this a good strategy? Polynomial accumulation: We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits). x0 x1 … xn1 We evaluate the polynomial 2 n1 p(z) x0 x1 z x2 z … xn1z at a fixed value z, ignoring overflows. 10 5 Polynomial Accumulation Polynomial p(z) can be evaluated in O(n) time using Horner’s rule: The following polynomials are successively computed, each from the previous one in O(1) time p0(z) xn1 pi (z) xni1 zpi1(z) (i 1, 2, …, n 1) We have p(z) pn1(z) GdGood z values: 3337394133, 37, 39, 41. Especially suitable for strings z 33 gives at most 6 collisions on a set of 50,000 English words. 11 Summary Purpose of hash tables: If collision occurs, use to obtain O(()1) expected one of the collision query time using O(n+N) handling schemes, taking space. into account available If the keys are not memory space. integers, convert them to If the load factor nN integer keys. approaches the specified Map integer keys to the threshold, rehash. hash table entries using a compression map function. 12 6 Next time… Graphs (chapter 13) 13 7.

Hashing (Part 2)

Cmsc 132: Object-Oriented Programming Ii

Lecture 12: Hash Tables

Hash Tables & Searching Algorithms

Chapter 5 Hashing

Cuckoo Hashing

Double Hashing

Lecture 26 — Hash Tables

Linear Probing with Constant Independence

More Analysis of Double Hashing for Balanced Allocations

An Overview of Cuckoo Hashing Charles Chen

Hashing, Load Balancing and Multiple Choice

Linear Probing Revisited: Tombstones Mark the Death of Primary Clustering