7.7 Hashing

EADS, © Ernst Mayr, Harald Räcke


Dictionary operations:

- S.insert(x): insert an element x.
- S.delete(x): delete the element pointed to by x.
- S.search(k): return a pointer to an element e with key[e] = k in S if it exists; otherwise return null.

So far we have implemented the search for a key by carefully choosing split-elements. The memory location of an object x with key k is then determined by successively comparing k to split-elements. Hashing instead tries to compute the memory location directly from the given key. The goal is to have constant search time.

Definitions:

- Universe U of keys, e.g., U ⊆ ℕ₀. U is very large.
- Set S ⊆ U of actual keys, |S| = m ≤ n.
- Array T[0, …, n − 1], the hash table.
- Hash function h : U → [0, …, n − 1].

The hash function h should fulfill:

- Fast to evaluate.
- Small storage requirement.
- Good distribution of elements over the whole table.

Ideally the hash function maps all keys to different memory locations. This special case is known as Direct Addressing. It is usually very unrealistic, as the universe of keys is typically quite large, and in particular larger than the available memory.

Suppose instead that we know the set S of actual keys (no insert/no delete). Then we may want to design a simple hash function that maps all these keys to different memory locations. Such a hash function h is called a perfect hash function for the set S.

Problem: Collisions

Usually the universe U is much larger than the table size n. Hence, there may be two elements k₁, k₂ from the set S that map to the same memory location (i.e., h(k₁) = h(k₂)). This is called a collision.

Typically, collisions do not first appear once the size of the set S of actual keys gets close to n, but already once |S| ≥ ω(√n).
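This birthday-paradox effect can be checked numerically. A small Python sketch (not part of the original slides; the function name is illustrative) compares the exact no-collision probability with the exponential bound of Lemma 21 below:

```python
import math

def p_no_collision(m, n):
    """Exact probability that m uniformly hashed keys cause no collision
    in a table of size n: prod_{j=0}^{m-1} (1 - j/n)."""
    p = 1.0
    for j in range(m):
        p *= (n - j) / n
    return p

n = 10_000
for m in (10, 100, 200, 500):
    exact = 1 - p_no_collision(m, n)            # exact collision probability
    bound = 1 - math.exp(-m * (m - 1) / (2 * n))  # lower bound from Lemma 21
    print(f"m={m:4d}  Pr[collision] = {exact:.4f}  lower bound = {bound:.4f}")
```

Already for m around √n = 100 the collision probability is substantial, well before the table is anywhere near full.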
If we do not know the keys in advance, the best we can hope for is that the hash function distributes the keys evenly across the table.

Uniform hashing: choose a hash function uniformly at random from all functions f : U → [0, …, n − 1].

Lemma 21
The probability of having a collision when hashing m elements into a table of size n under uniform hashing is at least

    1 − e^{−m(m−1)/(2n)} ≈ 1 − e^{−m²/(2n)}.

Proof. Let A_{m,n} denote the event that inserting m keys into a table of size n does not generate a collision. Then

    Pr[A_{m,n}] = ∏_{ℓ=1}^{m} (n − ℓ + 1)/n = ∏_{j=0}^{m−1} (1 − j/n)
                ≤ ∏_{j=0}^{m−1} e^{−j/n} = e^{−∑_{j=0}^{m−1} j/n} = e^{−m(m−1)/(2n)}.

Here the first equality follows since the ℓ-th element that is hashed has a probability of (n − ℓ + 1)/n of not generating a collision, under the condition that the previous elements did not induce collisions. The inequality 1 − x ≤ e^{−x} is derived by stopping the Taylor expansion of e^{−x} after the second term.

[Figure: the graphs of f(x) = e^{−x} and g(x) = 1 − x, illustrating 1 − x ≤ e^{−x}.]

Resolving Collisions

The methods for dealing with collisions can be classified into two main types:

- open addressing, a.k.a. closed hashing;
- hashing with chaining, a.k.a. closed addressing or open hashing.

Hashing with Chaining

Arrange the elements that map to the same position in a linear list.

- Access: compute h(x) and search the list for key[x].
- Insert: insert at the front of the list.
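A minimal sketch of hashing with chaining (the class and method names are illustrative, not from the slides; Python's built-in hash stands in for the hash function h):

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, n):
        self.n = n
        self.table = [[] for _ in range(n)]

    def _slot(self, key):
        return hash(key) % self.n  # stands in for h : U -> [0, ..., n-1]

    def insert(self, key, value):
        # insert at the front of the list, as on the slide
        self.table[self._slot(key)].insert(0, (key, value))

    def search(self, key):
        # cost: 1 plus the position of the key in its list
        for k, v in self.table[self._slot(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        s = self._slot(key)
        self.table[s] = [(k, v) for k, v in self.table[s] if k != key]
```

Insert is O(1); search and delete scan only the one chain the key hashes to.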
Let A denote a strategy for resolving collisions. We use the following notation:

- A⁺ denotes the average time for a successful search when using A;
- A⁻ denotes the average time for an unsuccessful search when using A;
- we parameterize the complexity results in terms of α := m/n, the so-called fill factor of the hash table.

We assume uniform hashing for the following analysis.

The time required for an unsuccessful search is 1 plus the length of the list that is examined. The average length of a list is α = m/n. Hence, if A is the collision-resolving strategy "hashing with chaining", we have

    A⁻ = 1 + α.

Note that this result does not depend on the hash function that is used.

For a successful search observe that we do not choose a list at random, but we consider a random key k in the hash table and ask for the search time for k. This is 1 plus the number of elements that lie before k in k's list.

Let k_ℓ denote the ℓ-th key inserted into the table, and for two keys k_i and k_j let X_ij denote the event that i and j hash to the same position. Clearly, Pr[X_ij = 1] = 1/n for uniform hashing. Since elements are inserted at the front of a list, the keys before k_i in its list are exactly the keys k_j with j > i that hash to the same position. The expected cost for a successful search is therefore

    E[ (1/m) ∑_{i=1}^{m} (1 + ∑_{j=i+1}^{m} X_ij) ]
      = (1/m) ∑_{i=1}^{m} (1 + ∑_{j=i+1}^{m} E[X_ij])
      = (1/m) ∑_{i=1}^{m} (1 + ∑_{j=i+1}^{m} 1/n)
      = 1 + (1/(mn)) ∑_{i=1}^{m} (m − i)
      = 1 + (1/(mn)) (m² − m(m+1)/2)
      = 1 + (m − 1)/(2n)
      = 1 + α/2 − α/(2m).

Hence, the expected cost for a successful search is A⁺ ≤ 1 + α/2.
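The bounds A⁻ = 1 + α and A⁺ ≤ 1 + α/2 can be checked empirically. A sketch (illustrative, not from the slides) that simulates uniform hashing by drawing a random slot per key:

```python
import random

def chaining_costs(m, n, trials=200):
    """Average probe counts for chaining under simulated uniform hashing."""
    succ = unsucc = 0.0
    for _ in range(trials):
        slots = [random.randrange(n) for _ in range(m)]  # h(k_1), ..., h(k_m)
        chains = [[] for _ in range(n)]
        for i, s in enumerate(slots):
            chains[s].insert(0, i)  # insert at the front of the list
        # successful search: 1 + position of a random stored key in its chain
        i = random.randrange(m)
        succ += 1 + chains[slots[i]].index(i)
        # unsuccessful search: 1 + length of the chain at a random slot
        unsucc += 1 + len(chains[random.randrange(n)])
    return succ / trials, unsucc / trials

m, n = 5000, 10000  # alpha = 0.5
s, u = chaining_costs(m, n)
print(f"successful ~ {s:.2f} (bound {1 + m / (2 * n):.2f}), "
      f"unsuccessful ~ {u:.2f} (expected {1 + m / n:.2f})")
```

For α = 0.5 the averages come out near 1.25 and 1.5, matching the analysis.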
Open Addressing

All objects are stored in the table itself. Define a function h(k, j) that determines the table position to be examined in the j-th step. The values h(k, 0), …, h(k, n − 1) must form a permutation of 0, …, n − 1.

Search(k): try position h(k, 0); if it is empty, your search fails; otherwise continue with h(k, 1), h(k, 2), ….

Insert(x): search until you find an empty slot and insert your element there. If your search reaches h(k, n − 1) and this slot is non-empty, then your table is full.

Choices for h(k, j):

- h(k, i) = (h(k) + i) mod n. Linear probing.
- h(k, i) = (h(k) + c₁i + c₂i²) mod n. Quadratic probing.
- h(k, i) = (h₁(k) + i · h₂(k)) mod n. Double hashing.

For quadratic probing and double hashing one has to ensure that the search covers all positions in the table (i.e., for double hashing h₂(k) must be relatively prime to n; for quadratic probing c₁ and c₂ have to be chosen carefully).

Linear Probing

- Advantage: cache-efficiency. The new probe position is very likely to be in the cache.
- Disadvantage: primary clustering. Long sequences of occupied table positions get longer, as they have a larger probability to be hit; furthermore, they can merge, forming even longer sequences.

Lemma 22
Let L be the method of linear probing for resolving collisions:

    L⁺ ≈ (1/2) (1 + 1/(1 − α))
    L⁻ ≈ (1/2) (1 + 1/(1 − α)²)

Quadratic Probing

- Not as cache-efficient as linear probing.
- Secondary clustering: caused by the fact that all keys mapped to the same position have the same probe sequence.

Lemma 23
Let Q be the method of quadratic probing for resolving collisions:

    Q⁺ ≈ 1 + ln(1/(1 − α)) − α/2
    Q⁻ ≈ 1/(1 − α) + ln(1/(1 − α)) − α
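The three probe functions translate directly into code. A sketch (function names are illustrative; Python's built-in hash stands in for h, h₁) with insert and search using linear probing:

```python
def linear_probe(h, i, n):
    return (h + i) % n

def quadratic_probe(h, i, n, c1=1, c2=1):
    return (h + c1 * i + c2 * i * i) % n

def double_hash_probe(h1, h2, i, n):
    return (h1 + i * h2) % n  # h2 must be relatively prime to n

def insert(table, key):
    """Insert key using linear probing; returns the slot used."""
    n = len(table)
    h = hash(key) % n
    for i in range(n):
        j = linear_probe(h, i, n)
        if table[j] is None:
            table[j] = key
            return j
    raise RuntimeError("table is full")

def search(table, key):
    """Return the slot of key, or None if an empty slot ends the probe sequence."""
    n = len(table)
    h = hash(key) % n
    for i in range(n):
        j = linear_probe(h, i, n)
        if table[j] is None:
            return None  # reached an empty slot: key not present
        if table[j] == key:
            return j
    return None
```

Note that search stops at the first empty slot it probes; this is why deletions in open addressing need special care (lazy deletion markers) in a full implementation.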
Double Hashing

- Any probe into the hash table usually creates a cache-miss.

Lemma 24
Let D be the method of double hashing for resolving collisions:

    D⁺ ≈ (1/α) ln(1/(1 − α))
    D⁻ ≈ 1/(1 − α)

Some values:

    α     | L⁺    L⁻     | Q⁺    Q⁻     | D⁺    D⁻
    ------+--------------+--------------+-----------
    0.5   | 1.5   2.5    | 1.44  2.19   | 1.39  2
    0.9   | 5.5   50.5   | 2.85  11.40  | 2.55  10
    0.95  | 10.5  200.5  | 3.52  22.05  | 3.15  20

Analysis of Idealized Open Address Hashing

Let X denote a random variable describing the number of probes in an unsuccessful search.
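The table entries follow directly from Lemmas 22–24; a sketch recomputing them (the helper name is illustrative):

```python
import math

def costs(a):
    """Approximate probe counts from Lemmas 22-24 for fill factor a."""
    L_succ = 0.5 * (1 + 1 / (1 - a))
    L_uns  = 0.5 * (1 + 1 / (1 - a) ** 2)
    Q_succ = 1 + math.log(1 / (1 - a)) - a / 2
    Q_uns  = 1 / (1 - a) + math.log(1 / (1 - a)) - a
    D_succ = (1 / a) * math.log(1 / (1 - a))
    D_uns  = 1 / (1 - a)
    return L_succ, L_uns, Q_succ, Q_uns, D_succ, D_uns

for a in (0.5, 0.9, 0.95):
    print(a, ["%.2f" % v for v in costs(a)])
```

The blow-up of linear probing near α = 1 (200.5 expected probes at α = 0.95) against double hashing's 20 is exactly the primary-clustering penalty.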