Separate Chaining Meets Compact Hashing

Total Page:16

File Type:pdf, Size:1020Kb

Separate Chaining Meets Compact Hashing Separate Chaining Meets Compact Hashing Dominik K¨oppl Department of Informatics, Kyushu University, Japan Society for Promotion of Science Abstract In fact, almost all modern replacements feature open addressing layouts. Their implementations While separate chaining is a common strategy for are highlighted with detailed benchmarks putting resolving collisions in a hash table taught in most separate chaining with unordered map as its major textbooks, compact hashing is a less common tech- representation in the backlight of interest. How- nique for saving space when hashing integers whose ever, when considering compact hashing with satel- domain is relatively small with respect to the prob- lite data, separate chaining becomes again a com- lem size. It is widely believed that hash tables petitive approach, on which we shed a light in this waste a considerable amount of memory, as they ei- article. ther leave allocated space untouched (open address- ing) or store additional pointers (separate chain- ing). For the former, Cleary introduced the com- 1.1 Related Work pact hashing technique that stores only a part of The hash table of Askitis [2] also resorts to separate a key to save space. However, as can be seen by chaining. Its buckets are represented as dynamic the line of research focusing on compact hash ta- arrays. On inserting an element into one of these bles with open addressing, there is additional in- array buckets, the size of the respective array incre- formation, called displacement, required for restor- ments by one (instead of, e.g., doubling its space). ing a key. There are several representations of this The approach differs from ours in that these arrays displacement information with different space and store a list of (key,value)-pairs while our buckets time trade-offs. In this article, we introduce a sep- separate keys from values. arate chaining hash table that applies the compact The scan of the buckets in a separate chaining hashing technique without the need for the dis- hash table can be accelerated with SIMD (single placement information. Practical evaluations re- instruction multiple data) instructions as shown by veal that insertions in this hash table are faster Ross [20] who studied the application of SIMD in- or use less space than all previously known com- structions for comparing multiple keys in parallel pact hash tables on modern computer architectures in a bucketized Cuckoo hash table. when storing sufficiently large satellite data. For reducing the memory requirement a of hash table, a sparse hash table layout was introduced 1 Introduction by members of Google2. Sparse hash tables are a arXiv:1905.00163v1 [cs.DS] 1 May 2019 lightweight alternative to standard open addressing A major layout decision for hash tables is hash tables, which are represented as plain arrays. how collisions are resolved. A well-studied Most of the sparse variants replace the plain array and easy-implementable layout is separate chain- with a bit vector of the same length marking posi- ing, which is also applied by the hash ta- + https://probablydance.com/2017/02/26/i-wrote-the- ble unordered map of the C+ standard library fastest-hashtable/, libstdc++ [4, Sect. 22.1.2.1.2]. On the downside, https://attractivechaos.wordpress.com/2018/10/01/advanced- it is often criticized for being bloated and slow1. techniques-to-implement-fast-hash-tables, https://tessil.github.io/2016/08/29/benchmark-hopscotch- 1Cf. http://www.idryman.org/blog/2017/05/03/writing- map.html, to name a few. a-damn-fast-hash-table-with-tiny-memory-footprints/, 2https://github.com/sparsehash/sparsehash 1 tions at which an element would be stored in the size. Choosing an adequate value for bmax is im- array. The array is emulated by this bit vector and portant, as it affects the resizing and the search its partitioning into buckets, which are dynamically time of our hash table. resizeable and store the actual data. Resize. When we try to insert an element into a The notion of compact hashing was coined by bucket of maximum size bmax, we create a new hash Cleary [5] who studied a hash table with bidirec- table with twice the number of buckets 2 H and j j tional linear probing. The idea of compact hashing move the elements from the old table to the new is to use an injective function mapping keys to pairs one, bucket by bucket. After a bucket of the old of integers. Using one integer, called remainder, as table becomes empty, we can free up its memory. a hash value, and the other, called quotient, as the This reduces the memory peak commonly seen in data stored in the hash table, the hash table can hash tables or dynamic vectors reserving one large restore a key by maintaining its quotient and an array, as these data structures need to reserve space information to retain its corresponding remainder. for 3m elements when resizing from m to 2m. This This information, called displacement, is crucial as technique is also common for sparse hash tables. the bidirectional linear probing displaces elements Search in Cache Lines. We can exploit mod- on a hash collision from the position correspond- ern computer architectures featuring large cache ing to its hash value, i.e., its remainder. Poyias sizes by selecting a sufficiently small bmax such that and Raman [19] gave different representations for buckets fit into a cache line. Since we are only in- the displacement in the case that the hash table terested in the keys of a bucket during a lookup, an applies linear probing. optimization is to store keys and values separately: In this paper, we show that it is not necessary In our hash table, a bucket is a composition of a to store additional data in case that we resort to key bucket and a value bucket, each of the same separate chaining as collision resolution. The main size. This gives a good locality of reference [7] for strength of our hash table is its memory-efficiency searching a key. This layout is favorable for large during the construction while being at least as fast values of bmax and (keys,value)-pairs where the key as other compact hash tables. Its main weakness is size is relatively small to the value size, since (a) the the slow lookup time for keys, as we do not strive cost for an extra pointer to maintain two buckets for small bucket sizes. instead of one becomes negligible while (b) more keys fit into a cache line when searching a key in a bucket. An overview of the proposed hash table 2 Separate Chaining with layout is given in Fig. 1. Compact Hashing 2.1 Compact Hashing Our hash table H has H buckets, where H is a Compact hashing restricts the choice of the hash j j j j power of two. Let h be the hash function of H. function h. It requires an injective transform f An element with key K is stored in the (h(K) mod that maps a key K to two integers (q; r) with H )-th bucket. To look up an element with key K, 1 r H , where r acts as the hash value h(K). j j the (h(K) mod H )-th bucket is linearly scanned. The≤ values≤ j j q and r are called quotient and re- j j A common implementation represents a bucket mainder, respectively. The quotient q can be used with a linked list, and tries to avoid collisions as to restore the original key K if we know its cor- they are a major cause for decelerating searches. responding remainder r. We translate this tech- Here, the buckets are realized as dynamic arrays, nique to our separate chaining layout by storing q similar to the array hash table of Askitis [2]. We as key in the r-th bucket on inserting a key K with further drop the idea of avoiding collisions. Instead, f(K) = (q; r). we want to maintain buckets of sufficiently large A discussion of different injective transforms is sizes to compensate the extra memory for main- given by Fischer and K¨oppl[11, Sect. 3.2]. Sup- taining (a) the pointers to the buckets and (b) their pose that all keys can be represented by k bits. We sizes. To prevent a bucket from growing too large, want to construct a bijective function f : [1::2k] k ! we introduce a threshold bmax for the maximum f([1::2 ]), where we use the last lg m bits for the 2 transform-type,value-bucket-type quotient-bucket-type bucket<transform::quotient-type> → hash map declare key-type : tranform-type::key-type item-type declare quotient-type : tranform-type::quotient-type value-bucket-type declare value-type : value-bucket-type::item-type bucket transform : transform-type buckets : uint8 t (storing lg H ) | | bucket sizes : uint8 t[2buckets] set(position : int, item-type) quotient buckets : quotient-bucket-type[2buckets] get(position : int): item-type value buckets : value-bucket-type[2buckets] insert(key-type, value-type) transform-type find(key-type): value-type user: hash map: transform: quotient buckets:[r] find(K) key-type transform f(K) (q, r) declare quotient-type declare remainder-type find i with get(i) = q v : quotient buckets[r][i] f(key-type):(quotient-type, remainder-type) 1 f − (quotient-type, remainder-type): key-type v Figure 1: Diagram of our proposed hash table. The call of find(K) returns the i-th element of the r-th bucket at which K is located. The injective transform determines the types of the key and the quotient.
Recommended publications
  • Pragmatic Quotient Types in Coq Cyril Cohen
    Pragmatic Quotient Types in Coq Cyril Cohen To cite this version: Cyril Cohen. Pragmatic Quotient Types in Coq. International Conference on Interactive Theorem Proving, Jul 2013, Rennes, France. pp.16. hal-01966714 HAL Id: hal-01966714 https://hal.inria.fr/hal-01966714 Submitted on 29 Dec 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Pragmatic Quotient Types in Coq Cyril Cohen Department of Computer Science and Engineering University of Gothenburg [email protected] Abstract. In intensional type theory, it is not always possible to form the quotient of a type by an equivalence relation. However, quotients are extremely useful when formalizing mathematics, especially in algebra. We provide a Coq library with a pragmatic approach in two complemen- tary components. First, we provide a framework to work with quotient types in an axiomatic manner. Second, we program construction mecha- nisms for some specific cases where it is possible to build a quotient type. This library was helpful in implementing the types of rational fractions, multivariate polynomials, field extensions and real algebraic numbers. Keywords: Quotient types, Formalization of mathematics, Coq Introduction In set-based mathematics, given some base set S and an equivalence ≡, one may see the quotient (S/ ≡) as the partition {π(x) | x ∈ S} of S into the sets π(x) ˆ= {y ∈ S | x ≡ y}, which are called equivalence classes.
    [Show full text]
  • Linear Probing with Constant Independence
    Linear Probing with Constant Independence Anna Pagh∗ Rasmus Pagh∗ Milan Ruˇzic´ ∗ ABSTRACT 1. INTRODUCTION Hashing with linear probing dates back to the 1950s, and Hashing with linear probing is perhaps the simplest algo- is among the most studied algorithms. In recent years it rithm for storing and accessing a set of keys that obtains has become one of the most important hash table organiza- nontrivial performance. Given a hash function h, a key x is tions since it uses the cache of modern computers very well. inserted in an array by searching for the first vacant array Unfortunately, previous analyses rely either on complicated position in the sequence h(x), h(x) + 1, h(x) + 2,... (Here, and space consuming hash functions, or on the unrealistic addition is modulo r, the size of the array.) Retrieval of a assumption of free access to a truly random hash function. key proceeds similarly, until either the key is found, or a Already Carter and Wegman, in their seminal paper on uni- vacant position is encountered, in which case the key is not versal hashing, raised the question of extending their anal- present in the data structure. Deletions can be performed ysis to linear probing. However, we show in this paper that by moving elements back in the probe sequence in a greedy linear probing using a pairwise independent family may have fashion (ensuring that no key x is moved beyond h(x)), until expected logarithmic cost per operation. On the positive a vacant array position is encountered. side, we show that 5-wise independence is enough to ensure Linear probing dates back to 1954, but was first analyzed constant expected time per operation.
    [Show full text]
  • CSC 344 – Algorithms and Complexity Why Search?
    CSC 344 – Algorithms and Complexity Lecture #5 – Searching Why Search? • Everyday life -We are always looking for something – in the yellow pages, universities, hairdressers • Computers can search for us • World wide web provides different searching mechanisms such as yahoo.com, bing.com, google.com • Spreadsheet – list of names – searching mechanism to find a name • Databases – use to search for a record • Searching thousands of records takes time the large number of comparisons slows the system Sequential Search • Best case? • Worst case? • Average case? Sequential Search int linearsearch(int x[], int n, int key) { int i; for (i = 0; i < n; i++) if (x[i] == key) return(i); return(-1); } Improved Sequential Search int linearsearch(int x[], int n, int key) { int i; //This assumes an ordered array for (i = 0; i < n && x[i] <= key; i++) if (x[i] == key) return(i); return(-1); } Binary Search (A Decrease and Conquer Algorithm) • Very efficient algorithm for searching in sorted array: – K vs A[0] . A[m] . A[n-1] • If K = A[m], stop (successful search); otherwise, continue searching by the same method in: – A[0..m-1] if K < A[m] – A[m+1..n-1] if K > A[m] Binary Search (A Decrease and Conquer Algorithm) l ← 0; r ← n-1 while l ≤ r do m ← (l+r)/2 if K = A[m] return m else if K < A[m] r ← m-1 else l ← m+1 return -1 Analysis of Binary Search • Time efficiency • Worst-case recurrence: – Cw (n) = 1 + Cw( n/2 ), Cw (1) = 1 solution: Cw(n) = log 2(n+1) 6 – This is VERY fast: e.g., Cw(10 ) = 20 • Optimal for searching a sorted array • Limitations: must be a sorted array (not linked list) binarySearch int binarySearch(int x[], int n, int key) { int low, high, mid; low = 0; high = n -1; while (low <= high) { mid = (low + high) / 2; if (x[mid] == key) return(mid); if (x[mid] > key) high = mid - 1; else low = mid + 1; } return(-1); } Searching Problem Problem: Given a (multi)set S of keys and a search key K, find an occurrence of K in S, if any.
    [Show full text]
  • Arithmetic in MIPS
    6 Arithmetic in MIPS Objectives After completing this lab you will: • know how to do integer arithmetic in MIPS • know how to do floating point arithmetic in MIPS • know about conversion from integer to floating point and from floating point to integer. Introduction The definition of the R2000 architecture includes all integer arithmetic within the actual CPU. Floating point arithmetic is done by one of the four possible coprocessors, namely coprocessor number 1. Integer arithmetic Addition and subtraction are performed on 32 bit numbers held in the general purpose registers (32 bit each). The result is a 32 bit number itself. Two kinds of instructions are included in the instruction set to do integer addition and subtraction: • instructions for signed arithmetic: the 32 bit numbers are considered to be represented in 2’s comple- ment. The execution of the instruction (addition or subtraction) may generate an overflow. • instructions for unsigned arithmetic: the 32 bit numbers are considered to be in standard binary repre- sentation. Executing the instruction will never generate an overflow error even if there is an actual overflow (the result cannot be represented with 32 bits). Multiplication and division may generate results that are larger than 32 bits. The architecture provides two special 32 bit registers that are the destination for multiplication and division instructions. These registers are called hi and lo as to indicate that they hold the higher 32 bits of the result and the lower 32 bits respectively. Special instructions are also provided to move data from these registers into the general purpose ones ($0 to $31).
    [Show full text]
  • Quotient Types: a Modular Approach
    Quotient Types: A Modular Approach Aleksey Nogin Department of Computer Science Cornell University, Ithaca, NY 14853 [email protected] Abstract. In this paper we introduce a new approach to axiomatizing quotient types in type theory. We suggest replacing the existing mono- lithic rule set by a modular set of rules for a specially chosen set of primitive operations. This modular formalization of quotient types turns out to be much easier to use and free of many limitations of the tradi- tional monolithic formalization. To illustrate the advantages of the new approach, we show how the type of collections (that is known to be very hard to formalize using traditional quotient types) can be naturally for- malized using the new primitives. We also show how modularity allows us to reuse one of the new primitives to simplify and enhance the rules for the set types. 1 Introduction NuPRL type theory differs from most other type theories used in theorem provers in its treatment of equality. In Coq’s Calculus of Constructions, for example, there is a single global equality relation which is not the desired one for many types (e.g. function types). The desired equalities have to be handled explicitly, which is quite burdensome. As in Martin-L¨of type theory [20] (of which NuPRL type theory is an extension), in NuPRL each type comes with its own equality relation (the extensional one in the case of functions), and the typing rules guar- antee that well–typed terms respect these equalities. Semantically, a quotient in NuPRL is trivial to define: it is simply a type with a new equality.
    [Show full text]
  • Mathematics for Computer Scientists (G51MCS) Lecture Notes Autumn 2005
    Mathematics for Computer Scientists (G51MCS) Lecture notes Autumn 2005 Thorsten Altenkirch October 26, 2005 Contents 1 1 Types and sets What is a set? Wikipedia 1 says Informally, a set is just a collection of objects considered as a whole. And what is a type? Wikipedia offers different definitions for different subject areas, but if we look up datatype in Computer Science we learn: In computer science, a datatype (often simply type) is a name or label for a set of values and some operations which can be performed on that set of values. Here the meaning of type refers back to the meaning of sets, but it is hard to see a substantial difference. We may conclude that what is called a set in Mathematics is called a type in Computer Science. Here, we will adopt the computer science terminology and use the term type. But remember to translate this to set when talking to a Mathematician. There is a good reason for this choice: I am going to use the type system of the programming language Haskell to illustrate many important constructions on types and to implement mathematical constructions on a computer. Please note that this course is not an introduction to Haskell, this is done in the 2nd semester in G51FUN. If you want to find out more about Haskell now, I refer to Graham Hutton’s excellent, forthcoming book, whose first seven chapters are available from his web page 2 . The book is going to be published soon and is the course text for G51FUN. Another good source for information about Haskell is the Haskell home page 3 While we are going to use Haskell to illustrate mathematical concepts, Haskell is also a very useful language for lots of other things, from designing user interfaces to controlling robots.
    [Show full text]
  • Implementing the Map ADT Outline
    Implementing the Map ADT Outline ´ The Map ADT ´ Implementation with Java Generics ´ A Hash Function ´ translation of a string key into an integer ´ Consider a few strategies for implementing a hash table ´ linear probing ´ quadratic probing ´ separate chaining hashing ´ OrderedMap using a binary search tree The Map ADT ´A Map models a searchable collection of key-value mappings ´A key is said to be “mapped” to a value ´Also known as: dictionary, associative array ´Main operations: insert, find, and delete Applications ´ Store large collections with fast operations ´ For a long time, Java only had Vector (think ArrayList), Stack, and Hashmap (now there are about 67) ´ Support certain algorithms ´ for example, probabilistic text generation in 127B ´ Store certain associations in meaningful ways ´ For example, to store connected rooms in Hunt the Wumpus in 335 The Map ADT ´A value is "mapped" to a unique key ´Need a key and a value to insert new mappings ´Only need the key to find mappings ´Only need the key to remove mappings 5 Key and Value ´With Java generics, you need to specify ´ the type of key ´ the type of value ´Here the key type is String and the value type is BankAccount Map<String, BankAccount> accounts = new HashMap<String, BankAccount>(); 6 put(key, value) get(key) ´Add new mappings (a key mapped to a value): Map<String, BankAccount> accounts = new TreeMap<String, BankAccount>(); accounts.put("M",); accounts.put("G", new BankAcnew BankAccount("Michel", 111.11)count("Georgie", 222.22)); accounts.put("R", new BankAccount("Daniel",
    [Show full text]
  • Quotient Types — a Modular Approach *,**
    Quotient Types — a Modular Approach ?;?? Aleksey Nogin Department of Computer Science Cornell University Ithaca, NY 14853 E-mail: [email protected] Web: http://nogin.org/ Abstract. In this paper we introduce a new approach to defining quo- tient types in type theory. We suggest replacing the existing monolithic rule set by a modular set of rules for a specially chosen set of primitive operations. This modular formalization of quotient types turns out to be very powerful and free of many limitations of the traditional monolithic formalization. To illustrate the advantages of the new formalization, we show how the type of collections (that is known to be very hard to formal- ize using traditional quotient types) can be naturally formalized using the new primitives. We also show how modularity allows us to reuse one of the new primitives to simplify and enhance the rules for the set types. 1 Introduction NuPRL type theory differs from most other type theories used in theorem provers in its treatment of equality. In Coq’s Calculus of Constructions, for example, there is a single global equality relation which is not the desired one for many types (e.g. function types). The desired equalities have to be handled explicitly, which is quite burdensome. As in Martin-L¨oftype theory [17] (of which NuPRL type theory is an extension), in NuPRL each type comes with its own equality relation (the extensional one in the case of functions), and the typing rules guar- antee that well–typed terms respect these equalities. Semantically, a quotient in NuPRL is trivial to define: it is simply a type with a new equality.
    [Show full text]
  • Quotient Types
    Quotient Types Peter V. Homeier U. S. Department of Defense, [email protected] http://www.cis.upenn.edu/~homeier Abstract. The quotient operation is a standard feature of set theory, where a set is divided into subsets by an equivalence relation; the re- sulting subsets of equivalent elements are called equivalence classes. We reinterpret this idea for Higher Order Logic (HOL), where types are di- vided by an equivalence relation to create new types, called quotient types. We present a tool for the Higher Order Logic theorem prover to mechanically construct quotient types as new types in the HOL logic, and also to de¯ne functions that map between the original type and the quotient type. Several properties of these mapping functions are proven automatically. Tools are also provided to create quotients of aggregate types, in particular lists, pairs, and sums. These require special atten- tion to preserve the aggregate structure across the quotient operation. We demonstrate these tools on an example from the sigma calculus. 1 Quotient Sets Quotient sets are a standard construction of set theory. They have found wide application in many areas of mathematics, including algebra and logic. The following description is abstracted from [3]. A binary relation on S is an equivalence relation if it is reflexive, symmetric, and transitive. » reflexive: x S: x x symmetric: 8x;2 y S: x » y y x transitive: 8x; y;2 z S: x » y ) y »z x z 8 2 » ^ » ) » Let be an equivalence relation. Then the equivalence class of x (modulo )is » » de¯ned as [x] def= y x y .
    [Show full text]
  • Structured Formal Development with Quotient Types in Isabelle/HOL
    Structured Formal Development with Quotient Types in Isabelle/HOL Maksym Bortin1 and Christoph Lüth2 1 Universität Bremen, Department of Mathematics and Computer Science [email protected] 2 Deutsches Forschungszentrum für Künstliche Intelligenz, Bremen [email protected] Abstract. General purpose theorem provers provide sophisticated proof methods, but lack some of the advanced structuring mechanisms found in specification languages. This paper builds on previous work extending the theorem prover Isabelle with such mechanisms. A way to build the quotient type over a given base type and an equivalence relation on it, and a generalised notion of folding over quotiented types is given as a formalised high-level step called a design tactic. The core of this paper are four axiomatic theories capturing the design tactic. The applicability is demonstrated by derivations of implementations for finite multisets and finite sets from lists in Isabelle. 1 Introduction Formal development of correct systems requires considerable design and proof effort in order to establish that an implementation meets the required specifi- cation. General purpose theorem provers provide powerful proof methods, but often lack the advanced structuring and design concepts found in specification languages, such as design tactics [20]. A design tactic represents formalised de- velopment knowledge. It is an abstract development pattern proven correct once and for all, saving proof effort when applying it and guiding the development process. If theorem provers can be extended with similar concepts without loss of consistency, the development process can be structured within the prover. This paper is a step in this direction. Building on previous work [2] where an approach to the extension of the theorem prover Isabelle [15] with theory morphisms has been described, the contributions of this paper are the representation of the well-known type quotienting construction and its extension with a generalised notion of folding over the quotiented type as a design tactic.
    [Show full text]
  • Hash Tables & Searching Algorithms
    Search Algorithms and Tables Chapter 11 Tables • A table, or dictionary, is an abstract data type whose data items are stored and retrieved according to a key value. • The items are called records. • Each record can have a number of data fields. • The data is ordered based on one of the fields, named the key field. • The record we are searching for has a key value that is called the target. • The table may be implemented using a variety of data structures: array, tree, heap, etc. Sequential Search public static int search(int[] a, int target) { int i = 0; boolean found = false; while ((i < a.length) && ! found) { if (a[i] == target) found = true; else i++; } if (found) return i; else return –1; } Sequential Search on Tables public static int search(someClass[] a, int target) { int i = 0; boolean found = false; while ((i < a.length) && !found){ if (a[i].getKey() == target) found = true; else i++; } if (found) return i; else return –1; } Sequential Search on N elements • Best Case Number of comparisons: 1 = O(1) • Average Case Number of comparisons: (1 + 2 + ... + N)/N = (N+1)/2 = O(N) • Worst Case Number of comparisons: N = O(N) Binary Search • Can be applied to any random-access data structure where the data elements are sorted. • Additional parameters: first – index of the first element to examine size – number of elements to search starting from the first element above Binary Search • Precondition: If size > 0, then the data structure must have size elements starting with the element denoted as the first element. In addition, these elements are sorted.
    [Show full text]
  • Pseudorandom Data and Universal Hashing∗
    CS369N: Beyond Worst-Case Analysis Lecture #6: Pseudorandom Data and Universal Hashing∗ Tim Roughgardeny April 14, 2014 1 Motivation: Linear Probing and Universal Hashing This lecture discusses a very neat paper of Mitzenmacher and Vadhan [8], which proposes a robust measure of “sufficiently random data" and notes interesting consequences for hashing and some related applications. We consider hash functions from N = f0; 1gn to M = f0; 1gm. Canonically, m is much smaller than n. We abuse notation and use N; M to denote both the sets and the cardinalities of the sets. Since a hash function h : N ! M is effectively compressing a larger set into a smaller one, collisions (distinct elements x; y 2 N with h(x) = h(y)) are inevitable. There are many way of resolving collisions. One that is common in practice is linear probing, where given a data element x, one starts at the slot h(x), and then proceeds to h(x) + 1, h(x) + 2, etc. until a suitable slot is found. (Either an empty slot if the goal is to insert x; or a slot that contains x if the goal is to search for x.) The linear search wraps around the table (from slot M − 1 back to 0), if needed. Linear probing interacts well with caches and prefetching, which can be a big win in some application. Recall that every fixed hash function performs badly on some data set, since by the Pigeonhole Principle there is a large data set of elements with equal hash values. Thus the analysis of hashing always involves some kind of randomization, either in the input or in the hash function.
    [Show full text]