Hash Function

Introduction to Algorithms: Hash Tables
CSE 680, Prof. Roger Crawfis

Motivation

  • Arrays provide an indirect way to access a set.
  • Many times we need an association between two sets, or between a set of keys and associated data.
  • Ideally we would like to access this data directly with the keys.
  • We would like a data structure that supports fast search, insertion, and deletion.
  • We do not usually care about sorting.
  • The abstract data type is usually called a Dictionary or Partial Map, for example:
    float googleStockPrice = stocks["Goog"].CurrentPrice;

Dictionaries

  • What is the best way to implement this?
      • Linked lists?
      • Doubly linked lists?
      • Queues?
      • Stacks?
      • Multiple indexed arrays (e.g., data[key[i]])?
  • To answer this, ask what the complexity of each operation is:
      • Insertion
      • Deletion
      • Search

Direct Addressing

  • Let's look at an easy case. Suppose:
      • The range of keys is 0..m-1
      • Keys are distinct
  • Possible solution: set up an array T[0..m-1] in which
      • T[i] = x if x ∈ T and key[x] = i
      • T[i] = NULL otherwise
  • This is called a direct-address table.
  • Operations take O(1) time!
  • So what's the problem?
  • Direct addressing works well when the range m of keys is relatively small.
  • But what if the keys are 32-bit integers?
      • Problem 1: the direct-address table will have 2^32 entries, more than 4 billion.
      • Problem 2: even if memory is not an issue, there is still the time to initialize the elements to NULL.
  • Solution: map keys to a smaller range 0..p-1.
  • Desire p = O(m).

Hash Table

  • Hash tables provide O(1) support for all of these operations!
  • The key idea: rather than indexing an array directly, index it through some function h(x), called a hash function:
    myArray[ h(index) ]
  • Key questions:
      • What is the set that x comes from?
      • What is h() and what is its range?
  • Consider this problem: if I know a priori the m keys from some finite set U, is it possible to develop a function h(x) that will uniquely map the m keys onto the set of numbers 0..m-1?

Hash Functions

  • In general this is a difficult problem. Try something simpler.

[Figure: the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, are mapped by h into table slots 0..m-1; h(k2) = h(k5).]

  • A collision occurs when h(x) maps two keys to the same location.

[Figure: the same mapping into slots 0..p-1, with the collision h(k2) = h(k5) marked.]

  • A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1].
  • Example: h(x) = x mod N is a hash function for integer keys.
  • The integer h(x) is called the hash value of x.
  • A hash table for a given key type consists of:
      • a hash function h
      • an array (called the table) of size N
  • The goal is to store item (k, o) at index i = h(k).

Example

  • We design a hash table storing employee records, using the social security number (SSN) as the key.
  • An SSN is a nine-digit positive integer.
  • Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.

       0   ∅
       1   025-612-0001
       2   981-101-0002
       3   ∅
       4   451-229-0004
       …
    9997   ∅
    9998   200-751-9998
    9999   ∅

Example

  • Our hash table uses an array of size N = 100.
  • We have n = 49 employees.
  • We need a method to handle collisions.
  • As long as the chance of collision is low, we can achieve this goal.
  • Setting N = 1000 and looking at the last four digits will reduce the chance of collision.

       0   ∅
       1   025-612-0001
       2   981-101-0002
       3   ∅
       4   451-229-0004
       …
    9997   ∅
    9998   200-751-9998, 176-354-9998
    9999   ∅

Collisions

  • Can collisions be avoided?
      • In general, no. See perfect hashing for the case where the set of keys is static (not covered).
  • Two primary techniques for resolving collisions:
      • Chaining: keep a collection at each key slot.
      • Open addressing: if the current slot is full, use the next open one.

Chaining

  • Chaining puts elements that hash to the same slot in a linked list.

[Figure: table T with one linked list per slot. Keys k1 and k4 share a list, as do k5, k2, k7 and k8, k6; k3 is alone in its list.]

  • How do we insert an element?
  • How do we delete an element?
      • Do we need a doubly-linked list for efficient delete?
  • How do we search for an element with a given key?

Open Addressing

  • Basic idea:
      • To insert: if the slot is full, try another slot, …, until an open slot is found (probing).
      • To search: follow the same sequence of probes as would be used when inserting the element.
          • If we reach an element with the correct key, return it.
          • If we reach a NULL pointer, the element is not in the table.
  • Good for fixed sets (adding but no deletion).
      • Example: spell checking
  • The colliding item is placed in a different cell of the table.
      • No dynamic memory.
      • Fixed table size.
  • Load factor: n/N, where n is the number of items to store and N is the size of the hash table.
      • Clearly, n ≤ N, or n/N ≤ 1.
      • To get reasonable performance, n/N < 0.5.

Probing

  • The key question is: what should the next cell to try be?
  • Random would be great, but we need to be able to repeat it.
  • Three common techniques:
      • Linear Probing (useful for discussion only)
      • Quadratic Probing
      • Double Hashing

Linear Probing

  • Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.
  • Each table cell inspected is referred to as a probe.
  • Colliding items lump together, causing future collisions to cause a longer sequence of probes.
  • Example:
      • h(x) = x mod 13
      • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

    cell:   0   1   2   3   4   5   6   7   8   9  10  11  12
    key:    ∅   ∅  41   ∅   ∅  18  44  59  32  22  31  73   ∅

Search with Linear Probing

  • Consider a hash table A that uses linear probing.
  • get(k)
      • We start at cell h(k).
      • We probe consecutive locations until one of the following occurs:
          • An item with key k is found, or
          • An empty cell is found, or
          • N cells have been unsuccessfully probed.
  • To ensure efficiency, if k is not in the table, we want to find an empty cell as soon as possible. The load factor can NOT be close to 1.

    Algorithm get(k)
        i ← h(k)
        p ← 0
        repeat
            c ← A[i]
            if c = ∅
                return null
            else if c.key() = k
                return c.element()
            else
                i ← (i + 1) mod N
                p ← p + 1
        until p = N
        return null

Linear Probing

  • Example:
      • h(x) = x mod 13
      • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, 12, 20 in this order

    cell:   0   1   2   3   4   5   6   7   8   9  10  11  12
    key:   20   ∅  41   ∅   ∅  18  44  59  32  22  31  73  12

  • Search for key = 20.
      • h(20) = 20 mod 13 = 7.
      • Go through rank 8, 9, …, 12, 0.
  • Search for key = 15.
      • h(15) = 15 mod 13 = 2.
      • Go through rank 2, 3 and return null.

Updates with Linear Probing

  • To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements.
  • remove(k)
      • We search for an entry with key k.
      • If such an entry (k, o) is found, we replace it with the special item AVAILABLE and we return element o.
      • Other methods have to be modified to skip AVAILABLE cells.
  • put(k, o)
      • We throw an exception if the table is full.
      • We start at cell h(k).
      • We probe consecutive cells until one of the following occurs:
          • A cell i is found that is either empty or stores AVAILABLE, or
          • N cells have been unsuccessfully probed.
      • We store entry (k, o) in cell i.

Quadratic Probing

  • Primary clustering occurs with linear probing because of the same linear pattern:
      • if a bin is inside a cluster, then the next bin must either:
          • also be in that cluster, or ...
  • Suppose that an element should appear in bin h:
      • if bin h is occupied, then check the following sequence of bins:
        h + 1², h + 2², h + 3², h + 4², h + 5², ...
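As a small illustration of the SSN example in the slides, the sketch below hashes the SSNs shown in the example tables with h(x) = last four digits of x and reports which slots receive more than one key. The function name `h`, the string format with dashes, and the use of a `defaultdict` to group keys are assumptions made for this sketch, not part of the slides.

```python
from collections import defaultdict

def h(ssn: str, table_size: int = 10_000) -> int:
    """h(x) = last four digits of x, computed as x mod 10,000."""
    return int(ssn.replace("-", "")) % table_size

# SSNs copied from the example tables in the slides.
ssns = ["025-612-0001", "981-101-0002", "451-229-0004",
        "200-751-9998", "176-354-9998"]

# Group keys by the slot they hash to.
slots = defaultdict(list)
for ssn in ssns:
    slots[h(ssn)].append(ssn)

# Any slot holding more than one key is a collision.
collisions = {i: keys for i, keys in slots.items() if len(keys) > 1}
print(collisions)
```

The two SSNs ending in 9998 land in the same slot, which is the collision shown in the second example table.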
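The Chaining slides ask how to insert, delete, and search. One possible answer is sketched below, assuming integer keys, the hash function h(x) = x mod N from the slides, and a Python list per slot standing in for the linked list. The class and method names are hypothetical, chosen only for this sketch.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, size=13):
        self.size = size
        # One bucket per slot; a Python list stands in for the linked list.
        self.buckets = [[] for _ in range(size)]

    def _h(self, key):
        # h(x) = x mod N, the integer hash function from the slides.
        return key % self.size

    def put(self, key, value):
        bucket = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # otherwise extend the chain

    def get(self, key):
        # Only the chain in slot h(key) has to be scanned.
        for k, v in self.buckets[self._h(key)]:
            if k == key:
                return v
        return None

    def remove(self, key):
        bucket = self.buckets[self._h(key)]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return v
        return None


table = ChainedHashTable(13)
for key in (18, 44, 31):              # all three keys hash to slot 5
    table.put(key, f"item-{key}")
```

Every operation touches only one chain, so its cost is proportional to the length of that chain rather than to the number of stored items.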
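The get, put, and remove operations described for linear probing can be sketched as follows. This is a minimal sketch assuming integer keys, h(x) = x mod N, and distinct keys (put does not check for duplicates, matching the slide description). The class name, the `_find` helper, and `keys_by_cell` are hypothetical names introduced for illustration.

```python
EMPTY = None
AVAILABLE = object()   # special marker that replaces deleted entries


class LinearProbingTable:
    def __init__(self, size=13):
        self.size = size
        self.cells = [EMPTY] * size

    def _h(self, key):
        return key % self.size

    def put(self, key, value):
        # Probe consecutive cells (circularly) until one is empty or AVAILABLE.
        i = self._h(key)
        for _ in range(self.size):
            if self.cells[i] is EMPTY or self.cells[i] is AVAILABLE:
                self.cells[i] = (key, value)
                return
            i = (i + 1) % self.size
        raise RuntimeError("hash table is full")

    def _find(self, key):
        # Follow the same probe sequence as put; skip AVAILABLE cells.
        i = self._h(key)
        for _ in range(self.size):
            cell = self.cells[i]
            if cell is EMPTY:
                return None
            if cell is not AVAILABLE and cell[0] == key:
                return i
            i = (i + 1) % self.size
        return None

    def get(self, key):
        i = self._find(key)
        return None if i is None else self.cells[i][1]

    def remove(self, key):
        i = self._find(key)
        if i is None:
            return None
        value = self.cells[i][1]
        self.cells[i] = AVAILABLE     # leave a marker so later probes continue
        return value

    def keys_by_cell(self):
        return [c[0] if isinstance(c, tuple) else None for c in self.cells]


table = LinearProbingTable(13)
for key in (18, 41, 22, 44, 59, 32, 31, 73, 12, 20):
    table.put(key, str(key))
print(table.keys_by_cell())
# [20, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, 12]
```

The printed layout matches the second linear probing example in the slides: key 20 hashes to cell 7, finds cells 7 through 12 occupied, and wraps around to cell 0.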
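To see how the quadratic sequence differs from the linear one, the sketch below lists the first few bins each strategy would check for the same home bin. Wrapping the bin index around with mod N is an assumption of this sketch (the slide text is cut off before it says how the sequence is bounded), and both function names are hypothetical.

```python
def linear_probe_sequence(h, table_size, probes):
    """Bins checked by linear probing: h, h + 1, h + 2, ... (mod table size)."""
    return [(h + j) % table_size for j in range(probes)]


def quadratic_probe_sequence(h, table_size, probes):
    """Bins checked by quadratic probing: h, h + 1^2, h + 2^2, ... (mod table size)."""
    return [(h + j * j) % table_size for j in range(probes)]


linear = linear_probe_sequence(5, 13, 6)
quadratic = quadratic_probe_sequence(5, 13, 6)
print(linear)     # [5, 6, 7, 8, 9, 10]
print(quadratic)  # [5, 6, 9, 1, 8, 4]
```

The linear sequence walks straight through a cluster, while the quadratic sequence jumps by growing steps and leaves the neighbourhood of the home bin after a couple of probes.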
