Concise Algorithmics Or Algorithms and Data Structures — the Basic Toolbox Or

Total Pages: 16

File Type: PDF, Size: 1020 KB

Concise Algorithmics or Algorithms and Data Structures — The Basic Toolbox
Kurt Mehlhorn and Peter Sanders
Draft (Entwurf), June 11, 2005

Foreword

Buy me not [25].

Contents

1 Amuse Geule: Integer Arithmetics  3
  1.1 Addition  4
  1.2 Multiplication: The School Method  4
  1.3 A Recursive Version of the School Method  6
  1.4 Karatsuba Multiplication  8
  1.5 Implementation Notes  10
  1.6 Further Findings  11
2 Introduction  13
  2.1 Asymptotic Notation  14
  2.2 Machine Model  16
  2.3 Pseudocode  19
  2.4 Designing Correct Programs  23
  2.5 Basic Program Analysis  25
  2.6 Average Case Analysis and Randomized Algorithms  29
  2.7 Data Structures for Sets and Sequences  32
  2.8 Graphs  32
  2.9 Implementation Notes  37
  2.10 Further Findings  38
3 Representing Sequences by Arrays and Linked Lists  39
  3.1 Unbounded Arrays  40
  3.2 Linked Lists  45
  3.3 Stacks and Queues  51
  3.4 Lists versus Arrays  54
  3.5 Implementation Notes  56
  3.6 Further Findings  57
4 Hash Tables  59
  4.1 Hashing with Chaining  62
  4.2 Universal Hash Functions  63
  4.3 Hashing with Linear Probing  67
  4.4 Chaining Versus Linear Probing  70
  4.5 Implementation Notes  70
  4.6 Further Findings  72
5 Sorting and Selection  75
  5.1 Simple Sorters  78
  5.2 Mergesort — an O(n log n) Algorithm  80
  5.3 A Lower Bound  83
  5.4 Quicksort  84
  5.5 Selection  90
  5.6 Breaking the Lower Bound  92
  5.7 External Sorting  96
  5.8 Implementation Notes  98
  5.9 Further Findings  101
6 Priority Queues  105
  6.1 Binary Heaps  107
  6.2 Addressable Priority Queues  112
  6.3 Implementation Notes  120
  6.4 Further Findings  121
7 Sorted Sequences  123
  7.1 Binary Search Trees  126
  7.2 Implementation by (a, b)-Trees  128
  7.3 More Operations  134
  7.4 Augmenting Search Trees  138
  7.5 Implementation Notes  140
  7.6 Further Findings  143
8 Graph Representation  147
  8.1 Edge Sequences  148
  8.2 Adjacency Arrays — Static Graphs  149
  8.3 Adjacency Lists — Dynamic Graphs  150
  8.4 Adjacency Matrix Representation  151
  8.5 Implicit Representation  152
  8.6 Implementation Notes  153
  8.7 Further Findings  154
9 Graph Traversal  157
  9.1 Breadth First Search  158
  9.2 Depth First Search  159
  9.3 Implementation Notes  165
  9.4 Further Findings  166
10 Shortest Paths  167
  10.1 Introduction  167
  10.2 Arbitrary Edge Costs (Bellman-Ford Algorithm)  171
  10.3 Acyclic Graphs  172
  10.4 Non-Negative Edge Costs (Dijkstra's Algorithm)  173
  10.5 Monotone Integer Priority Queues  176
  10.6 All Pairs Shortest Paths and Potential Functions  181
  10.7 Implementation Notes  182
  10.8 Further Findings  183
11 Minimum Spanning Trees  185
  11.1 Selecting and Discarding MST Edges  186
  11.2 The Jarník-Prim Algorithm  187
  11.3 Kruskal's Algorithm  188
  11.4 The Union-Find Data Structure  190
  11.5 Implementation Notes  191
  11.6 Further Findings  192
12 Generic Approaches to Optimization  195
  12.1 Linear Programming — A Black Box Solver  196
  12.2 Greedy Algorithms — Never Look Back  199
  12.3 Dynamic Programming — Building it Piece by Piece  201
  12.4 Systematic Search — If in Doubt, Use Brute Force  204
  12.5 Local Search — Think Globally, Act Locally  207
  12.6 Evolutionary Algorithms  214
  12.7 Implementation Notes  217
  12.8 Further Findings  217
13 Summary: Tools and Techniques for Algorithm Design  219
  13.1 Generic Techniques  219
  13.2 Data Structures for Sets  220
A Notation  225
  A.1 General Mathematical Notation  225
  A.2 Some Probability Theory  227
  A.3 Useful Formulas  228
Bibliography  230

Chapter 1 — Amuse Geule: Integer Arithmetics

[amuse geule arithmetik. Bild von Al Chawarizmi]

We introduce our readers to the design, analysis, and implementation of algorithms by studying algorithms for basic arithmetic operations on large integers. We treat addition and multiplication in the text and leave division and square roots for the exercises. Integer arithmetic is interesting for many reasons:

• Arithmetic on long integers is needed in applications like cryptography, geometric computing, and computer algebra.
• We are familiar with the problem and know algorithms for addition and multiplication. We will see that the high school algorithm for integer multiplication is far from optimal and that much better algorithms exist.
• We will learn basic analysis techniques in a simple setting.
• We will learn basic algorithm engineering techniques in a simple setting.
• We will see the interplay between theory and experiment in a simple setting.

We assume that integers are represented as digit-strings (digits zero and one in our theoretical analysis and larger digits in our programs) and that two primitive operations are available: the addition of three digits with a two-digit result (this is sometimes called a full adder) and the multiplication of two digits with a one-digit result. We will measure the efficiency of our algorithms by the number of primitive operations executed.

We assume throughout this section that a and b are n-digit integers. We refer to the digits of a as a_{n-1} to a_0, with a_{n-1} being the most significant (also called leading) digit and a_0 the least significant digit. [consistently replaced bit (was used mixed with digit) by digit]

1.1 Addition

We all know how to add two integers a and b. We simply write them on top of each other with the least significant digits aligned and sum digit-wise, carrying a single digit from one position to the next. [picture!]

  c := 0 : Digit   // variable for the carry digit
  for i := 0 to n - 1 do add a_i, b_i, and c to form s_i and a new carry c
  s_n := c

We need one primitive operation for each position and hence a total of n primitive operations.

Lemma 1.1. Two n-digit integers can be added with n primitive operations.

1.2 Multiplication: The School Method

[picture!] We all know how to multiply two integers. In this section we review the method familiar to all of us; in later sections we will get to know a method which is significantly faster for large integers.

The school method for integer multiplication works as follows: we first form partial products p_i by multiplying a with the i-th digit b_i of b, and then sum the suitably aligned products p_i · 2^i to obtain the product of a and b.

  p := 0
  for i := 0 to n - 1 do p := a · b_i · 2^i + p

Let us analyze the number of primitive operations required by the school method. We need n primitive multiplications to multiply a by b_i and hence a total of n · n = n^2 primitive multiplications. All intermediate sums are at most 2n-digit integers and hence each iteration needs at most 2n primitive additions. Thus there are at most 2n^2 primitive additions.

[todo: proper alignment of numbers in tables] Table 1.1 shows the execution time of the school method using a C++ implementation and 32-bit digits. The time given is the average execution time over ??? many random inputs on a ??? machine.

Table 1.1: The running time of the school method for the multiplication of n-bit integers. The running time grows quadratically.

    n        time [s]
    40000      0.3
    80000      1.18
    160000     4.8
    320000    20.34

The quadratic growth of the running time is clearly visible: doubling n leads to a four-fold increase in running time. We can interpret the table in different ways:

(1) We can take the table to confirm our theoretical analysis. Our analysis predicts quadratic growth and we are measuring quadratic growth. However, we analyzed the number of primitive operations and we measured running time on a ??? computer. Our analysis concentrates on primitive operations on digits and completely ignores all bookkeeping operations and all questions of storage. The experiments show that this abstraction is a useful one. We will frequently analyze only the number of "representative operations". Of course, the choice of representative operations requires insight and knowledge. In Section 2.2 we will introduce a more realistic computer model to have a basis for abstraction. We will develop tools to analyze the running time of algorithms on this model. We will also connect our model to real machines, so that we can take our analysis as a predictor of actual performance. We will investigate the limits of our theory: under what circumstances are we going to concede that an experiment contradicts theoretical analysis?

(2) We can use the table to strengthen our theoretical analysis. Our theoretical analysis tells us that the running time grows quadratically in n. From our table we may conclude that the running time on a ??? is approximately ??? · n^2 seconds. We can use this knowledge to predict the running time of our program on other inputs. [todo: redo numbers.] Here are some sample outputs: for n = 100 000, the running time is 1.85 seconds and the ratio is 1.005; for n = 1000, the running time is 0.0005147 seconds and the ratio is 2.797; for n = 200, the running time is 3.3 · 10^-5 seconds and the ratio is 4.5; and for n = 1 000 000, the running time is 263.6 seconds and the ratio ...
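The addition and school-multiplication pseudocode above translates directly into code. Here is a minimal illustrative sketch in Java (not the book's own C++ implementation); digits are stored least-significant first, and BASE is an assumed toy radix — the book's experiments use 32-bit digits.

  public class SchoolArithmetic {
      static final int BASE = 10; // assumed illustrative radix

      // Add two n-digit numbers a and b (least-significant digit first);
      // the result has n+1 digits, the last one being the final carry.
      static int[] add(int[] a, int[] b) {
          int n = a.length;
          int[] s = new int[n + 1];
          int carry = 0;
          for (int i = 0; i < n; i++) {
              int t = a[i] + b[i] + carry;   // one primitive addition per position
              s[i] = t % BASE;
              carry = t / BASE;
          }
          s[n] = carry;
          return s;
      }

      // School method: multiply a by each digit of b and accumulate the
      // partial products, shifted by i positions.
      static int[] multiply(int[] a, int[] b) {
          int n = a.length;
          int[] p = new int[n + b.length];   // product has at most 2n digits
          for (int i = 0; i < b.length; i++) {
              int carry = 0;
              for (int j = 0; j < n; j++) {
                  int t = p[i + j] + a[j] * b[i] + carry;  // primitive mult + additions
                  p[i + j] = t % BASE;
                  carry = t / BASE;
              }
              p[i + n] = carry;              // this slot is still untouched here
          }
          return p;
      }
  }

For example, multiply(new int[]{3, 2}, new int[]{4, 1}) (i.e. 23 · 14) yields the digits {2, 2, 3, 0}, i.e. 322.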
Recommended publications
  • Linear Probing with Constant Independence
Linear Probing with Constant Independence. Anna Pagh, Rasmus Pagh, Milan Ružić.

ABSTRACT: Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation.

1. INTRODUCTION: Hashing with linear probing is perhaps the simplest algorithm for storing and accessing a set of keys that obtains nontrivial performance. Given a hash function h, a key x is inserted in an array by searching for the first vacant array position in the sequence h(x), h(x) + 1, h(x) + 2, ... (Here, addition is modulo r, the size of the array.) Retrieval of a key proceeds similarly, until either the key is found, or a vacant position is encountered, in which case the key is not present in the data structure. Deletions can be performed by moving elements back in the probe sequence in a greedy fashion (ensuring that no key x is moved beyond h(x)), until a vacant array position is encountered. Linear probing dates back to 1954, but was first analyzed
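The probe-sequence description above translates into a few lines of code. A minimal illustrative sketch in Java (not the paper's implementation; the hash function is a placeholder and there is no resizing or deletion):

  // Minimal linear-probing set for int keys; assumes the table never fills up.
  class LinearProbingSet {
      private final Integer[] slots;
      private final int r;                 // table size

      LinearProbingSet(int capacity) {
          this.r = capacity;
          this.slots = new Integer[capacity];
      }

      // Placeholder hash function; the paper's point is that its independence
      // properties determine the expected probe length.
      private int h(int x) {
          return Math.floorMod(Integer.hashCode(x), r);
      }

      void insert(int x) {
          int i = h(x);
          while (slots[i] != null && slots[i] != x) {
              i = (i + 1) % r;             // probe h(x), h(x)+1, h(x)+2, ... modulo r
          }
          slots[i] = x;
      }

      boolean contains(int x) {
          int i = h(x);
          while (slots[i] != null) {       // stop at the first vacant position
              if (slots[i] == x) return true;
              i = (i + 1) % r;
          }
          return false;
      }
  }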
  • CSC 344 – Algorithms and Complexity Why Search?
CSC 344 – Algorithms and Complexity. Lecture #5 – Searching.

Why Search?
• Everyday life – we are always looking for something: in the yellow pages, universities, hairdressers.
• Computers can search for us.
• The world wide web provides different searching mechanisms such as yahoo.com, bing.com, google.com.
• Spreadsheet – a list of names needs a searching mechanism to find a name.
• Databases – used to search for a record.
• Searching thousands of records takes time; the large number of comparisons slows the system.

Sequential Search
• Best case? Worst case? Average case?

  int linearsearch(int x[], int n, int key) {
      int i;
      for (i = 0; i < n; i++)
          if (x[i] == key)
              return(i);
      return(-1);
  }

Improved Sequential Search

  int linearsearch(int x[], int n, int key) {
      int i;
      // This assumes an ordered array
      for (i = 0; i < n && x[i] <= key; i++)
          if (x[i] == key)
              return(i);
      return(-1);
  }

Binary Search (A Decrease and Conquer Algorithm)
• Very efficient algorithm for searching in a sorted array: compare K with A[0] ... A[m] ... A[n-1].
• If K = A[m], stop (successful search); otherwise, continue searching by the same method in:
  – A[0..m-1] if K < A[m]
  – A[m+1..n-1] if K > A[m]

  l ← 0; r ← n-1
  while l ≤ r do
      m ← ⌊(l+r)/2⌋
      if K = A[m] return m
      else if K < A[m] r ← m-1
      else l ← m+1
  return -1

Analysis of Binary Search
• Time efficiency. Worst-case recurrence: Cw(n) = 1 + Cw(⌊n/2⌋), Cw(1) = 1, with solution Cw(n) = ⌈log2(n+1)⌉. This is VERY fast: e.g., Cw(10^6) = 20.
• Optimal for searching a sorted array.
• Limitations: must be a sorted array (not a linked list).

binarySearch

  int binarySearch(int x[], int n, int key) {
      int low, high, mid;
      low = 0;
      high = n - 1;
      while (low <= high) {
          mid = (low + high) / 2;
          if (x[mid] == key)
              return(mid);
          if (x[mid] > key)
              high = mid - 1;
          else
              low = mid + 1;
      }
      return(-1);
  }

Searching Problem
Problem: Given a (multi)set S of keys and a search key K, find an occurrence of K in S, if any.
  • Implementing the Map ADT Outline
Implementing the Map ADT

Outline
• The Map ADT
• Implementation with Java Generics
• A Hash Function – translation of a string key into an integer
• Consider a few strategies for implementing a hash table:
  – linear probing
  – quadratic probing
  – separate chaining hashing
• OrderedMap using a binary search tree

The Map ADT
• A Map models a searchable collection of key-value mappings.
• A key is said to be "mapped" to a value.
• Also known as: dictionary, associative array.
• Main operations: insert, find, and delete.

Applications
• Store large collections with fast operations.
• For a long time, Java only had Vector (think ArrayList), Stack, and HashMap (now there are about 67).
• Support certain algorithms – for example, probabilistic text generation in 127B.
• Store certain associations in meaningful ways – for example, to store connected rooms in Hunt the Wumpus in 335.

The Map ADT
• A value is "mapped" to a unique key.
• Need a key and a value to insert new mappings.
• Only need the key to find mappings.
• Only need the key to remove mappings.

Key and Value
• With Java generics, you need to specify the type of key and the type of value.
• Here the key type is String and the value type is BankAccount:

  Map<String, BankAccount> accounts = new HashMap<String, BankAccount>();

put(key, value), get(key)
• Add new mappings (a key mapped to a value):

  Map<String, BankAccount> accounts = new TreeMap<String, BankAccount>();
  accounts.put("M", new BankAccount("Michel", 111.11));
  accounts.put("G", new BankAccount("Georgie", 222.22));
  accounts.put("R", new BankAccount("Daniel",
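The outline above mentions translating a string key into an integer but the excerpt stops before showing it; a typical sketch of such a hash function (my illustration, not the slides' own code) looks like this:

  // Illustrative string-to-index hash: combine characters polynomially,
  // then wrap the result into the table bounds. Not taken from the slides.
  static int hashIndex(String key, int tableSize) {
      int h = 0;
      for (int i = 0; i < key.length(); i++) {
          h = 31 * h + key.charAt(i);        // same idea as String.hashCode()
      }
      return Math.floorMod(h, tableSize);    // non-negative index into the table
  }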
  • Hash Tables & Searching Algorithms
Search Algorithms and Tables (Chapter 11)

Tables
• A table, or dictionary, is an abstract data type whose data items are stored and retrieved according to a key value.
• The items are called records.
• Each record can have a number of data fields.
• The data is ordered based on one of the fields, named the key field.
• The record we are searching for has a key value that is called the target.
• The table may be implemented using a variety of data structures: array, tree, heap, etc.

Sequential Search

  public static int search(int[] a, int target) {
      int i = 0;
      boolean found = false;
      while ((i < a.length) && !found) {
          if (a[i] == target) found = true;
          else i++;
      }
      if (found) return i;
      else return -1;
  }

Sequential Search on Tables

  public static int search(someClass[] a, int target) {
      int i = 0;
      boolean found = false;
      while ((i < a.length) && !found) {
          if (a[i].getKey() == target) found = true;
          else i++;
      }
      if (found) return i;
      else return -1;
  }

Sequential Search on N elements
• Best case number of comparisons: 1 = O(1)
• Average case number of comparisons: (1 + 2 + ... + N)/N = (N+1)/2 = O(N)
• Worst case number of comparisons: N = O(N)

Binary Search
• Can be applied to any random-access data structure where the data elements are sorted.
• Additional parameters:
  – first: index of the first element to examine
  – size: number of elements to search, starting from the first element above
• Precondition: If size > 0, then the data structure must have size elements starting with the element denoted as the first element. In addition, these elements are sorted.
  • Pseudorandom Data and Universal Hashing∗
CS369N: Beyond Worst-Case Analysis. Lecture #6: Pseudorandom Data and Universal Hashing. Tim Roughgarden, April 14, 2014.

1 Motivation: Linear Probing and Universal Hashing

This lecture discusses a very neat paper of Mitzenmacher and Vadhan [8], which proposes a robust measure of "sufficiently random data" and notes interesting consequences for hashing and some related applications. We consider hash functions from N = {0, 1}^n to M = {0, 1}^m. Canonically, m is much smaller than n. We abuse notation and use N, M to denote both the sets and the cardinalities of the sets. Since a hash function h : N → M is effectively compressing a larger set into a smaller one, collisions (distinct elements x, y ∈ N with h(x) = h(y)) are inevitable. There are many ways of resolving collisions. One that is common in practice is linear probing, where given a data element x, one starts at the slot h(x), and then proceeds to h(x) + 1, h(x) + 2, etc. until a suitable slot is found. (Either an empty slot if the goal is to insert x, or a slot that contains x if the goal is to search for x.) The linear search wraps around the table (from slot M − 1 back to 0), if needed. Linear probing interacts well with caches and prefetching, which can be a big win in some applications. Recall that every fixed hash function performs badly on some data set, since by the Pigeonhole Principle there is a large data set of elements with equal hash values. Thus the analysis of hashing always involves some kind of randomization, either in the input or in the hash function.
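The notes go on to discuss universal hash families; as a concrete reference point, here is a minimal sketch of the classic Carter–Wegman multiply-add family in Java (my illustration under assumed parameters, not code from the notes; keys are assumed to be non-negative ints smaller than the prime p):

  import java.util.Random;

  // Pairwise-independent family h_{a,b}(x) = ((a*x + b) mod p) mod m,
  // with p prime, a in [1, p-1], b in [0, p-1]. Illustrative sketch only.
  class UniversalHash {
      private static final long P = 2_147_483_647L;  // the prime 2^31 - 1
      private final long a, b;
      private final int m;                           // number of buckets

      UniversalHash(int m, Random rng) {
          this.m = m;
          this.a = 1 + Math.floorMod(rng.nextLong(), P - 1);
          this.b = Math.floorMod(rng.nextLong(), P);
      }

      int hash(int x) {
          return (int) (((a * x + b) % P) % m);      // fits in a long: a, x < 2^31
      }
  }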
  • Collisions There Is Still a Problem with Our Current Hash Table
18.1 Hashing

Since our mapping from element value to preferred index now has a bit of complexity to it, we might turn it into a method that accepts the element value as a parameter and returns the right index for that value. Such a method is referred to as a hash function, and an array that uses such a function to govern insertion and deletion of its elements is called a hash table. The individual indexes in the hash table are also sometimes informally called buckets. Our hash function so far is the following:

  private int hashFunction(int value) {
      return Math.abs(value) % elementData.length;
  }

Hash Function: A method for rapidly mapping between element values and preferred array indexes at which to store those values.

Hash Table: An array that stores its elements in indexes produced by a hash function.

Collisions

There is still a problem with our current hash table. Because our hash function wraps values to fit in the array bounds, it is now possible that two values could have the same preferred index. For example, if we try to insert 45 into the hash table, it maps to index 5, conflicting with the existing value 5. This is called a collision. Our implementation is incomplete until we have a way of dealing with collisions. If the client tells the set to insert 45, the value 45 must be added to the set somewhere; it's up to us to decide where to put it.

Collision: When two or more element values in a hash table produce the same result from its hash function, indicating that they both prefer to be stored in the same index of the table.
  • Cuckoo Hashing
Cuckoo Hashing

Outline for Today
● Towards Perfect Hashing – reducing worst-case bounds.
● Cuckoo Hashing – hashing with worst-case O(1) lookups.
● The Cuckoo Graph – a framework for analyzing cuckoo hashing.
● Analysis of Cuckoo Hashing – just how fast is cuckoo hashing?

Perfect Hashing: Collision Resolution
● Last time, we mentioned three general strategies for resolving hash collisions:
  ● Closed addressing: store all colliding elements in an auxiliary data structure like a linked list or BST.
  ● Open addressing: allow elements to overflow out of their target bucket and into other spaces.
  ● Perfect hashing: choose a hash function with no collisions.
● We have not spoken on this last topic yet.

Why Perfect Hashing is Hard
● The expected cost of a lookup in a chained hash table is O(1 + α) for any load factor α.
● For any fixed load factor α, the expected cost of a lookup in linear probing is O(1), where the constant depends on α.
● However, the expected cost of a lookup in these tables is not the same as the expected worst-case cost of a lookup in these tables.

Expected Worst-Case Bounds
● Theorem: Assuming truly random hash functions, the expected worst-case cost of a lookup in a linear probing hash table is Ω(log n).
● Theorem: Assuming truly random hash functions, the expected worst-case cost of a lookup in a chained hash table is Θ(log n / log log n).
● Proofs: Exercise 11-1 and 11-2 from CLRS. ☺

Perfect Hashing
● A perfect hash table is one where lookups take worst-case time O(1).
● There's a pretty sizable gap between the expected worst-case bounds from chaining and linear probing – and that's on expected worst-case, not worst-case.
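Since the excerpt stops before the mechanics, here is a minimal sketch of cuckoo hashing's two-table lookup and displacement-based insert in Java (an illustration with placeholder hash functions, not the course's reference implementation):

  // Minimal cuckoo hash set for int keys: two tables, two hash functions.
  // Lookups probe exactly two slots, so they run in worst-case O(1).
  // Resizing/rehashing on insertion failure is omitted.
  class CuckooHashSet {
      private final Integer[] t1, t2;
      private final int size;
      private static final int MAX_DISPLACEMENTS = 32;

      CuckooHashSet(int capacity) {
          size = capacity;
          t1 = new Integer[capacity];
          t2 = new Integer[capacity];
      }

      // Placeholder hash functions; the analysis assumes (truly) random ones.
      private int h1(int x) { return Math.floorMod(x * 0x9E3779B1, size); }
      private int h2(int x) { return Math.floorMod(Integer.reverse(x) * 0x85EBCA77, size); }

      boolean contains(int x) {
          return (t1[h1(x)] != null && t1[h1(x)] == x)
              || (t2[h2(x)] != null && t2[h2(x)] == x);
      }

      boolean insert(int x) {
          if (contains(x)) return true;
          Integer cur = x;
          for (int i = 0; i < MAX_DISPLACEMENTS; i++) {
              Integer evicted = t1[h1(cur)];
              t1[h1(cur)] = cur;             // place in table 1, possibly evicting
              if (evicted == null) return true;
              cur = evicted;

              evicted = t2[h2(cur)];
              t2[h2(cur)] = cur;             // displaced key moves to table 2
              if (evicted == null) return true;
              cur = evicted;
          }
          return false;                      // too many displacements: a real
      }                                      // implementation would rehash and retry
  }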
  • Linear Probing with 5-Independent Hashing
Lecture Notes on Linear Probing with 5-Independent Hashing. Mikkel Thorup, May 12, 2017. (arXiv:1509.04549v3 [cs.DS], 11 May 2017)

Abstract: These lecture notes show that linear probing takes expected constant time if the hash function is 5-independent. This result was first proved by Pagh et al. [STOC'07, SICOMP'09]. The simple proof here is essentially taken from [Pătraşcu and Thorup, ICALP'10]. We will also consider a smaller-space version of linear probing that may have false positives, like Bloom filters. These lecture notes illustrate the use of higher moments in data structures, and could be used in a course on randomized algorithms.

1 k-independence

The concept of k-independence was introduced by Wegman and Carter [21] in FOCS'79 and has been the cornerstone of our understanding of hash functions ever since. A hash function is a random function h : [u] → [t] mapping keys to hash values. Here [s] = {0, ..., s-1}. We can also think of h as a random variable distributed over [t]^[u]. We say that h is k-independent if for any distinct keys x_0, ..., x_{k-1} ∈ [u] and (possibly non-distinct) hash values y_0, ..., y_{k-1} ∈ [t], we have Pr[h(x_0) = y_0 ∧ ... ∧ h(x_{k-1}) = y_{k-1}] = 1/t^k. Equivalently, we can define k-independence via two separate conditions; namely,

(a) for any distinct keys x_0, ..., x_{k-1} ∈ [u], the hash values h(x_0), ..., h(x_{k-1}) are independent random variables, that is, for any (possibly non-distinct) hash values y_0, ..., y_{k-1} ∈ [t] and i ∈ [k], Pr[h(x_i) = y_i] = Pr[h(x_i) = y_i | ⋀_{j ∈ [k]\{i}} h(x_j) = y_j], and

(b) for any x ∈ [u], h(x) is uniformly distributed in [t].
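A standard way to realize k-independence, going back to Wegman and Carter, is to evaluate a random polynomial of degree k-1 over a prime field. Here is an illustrative Java sketch instantiated with k = 5, the case these notes care about (my illustration, not code from the notes; keys are assumed to be non-negative ints smaller than the prime p):

  import java.util.Random;

  // k-independent hash family via a random degree-(k-1) polynomial mod a prime.
  class PolynomialHash {
      private static final long P = 2_147_483_647L;   // prime 2^31 - 1
      private final long[] coeff;                      // k random coefficients
      private final int t;                             // size of the hash range [t]

      PolynomialHash(int k, int t, Random rng) {
          this.t = t;
          this.coeff = new long[k];
          for (int i = 0; i < k; i++) {
              coeff[i] = Math.floorMod(rng.nextLong(), P);
          }
      }

      int hash(int x) {
          long h = 0;
          for (int i = coeff.length - 1; i >= 0; i--) {
              h = (h * x + coeff[i]) % P;              // Horner evaluation mod p
          }
          return (int) (h % t);                        // final reduction into [t]
      }
  }

  // e.g. new PolynomialHash(5, tableSize, new Random()) gives a 5-independent
  // family (up to the small bias introduced by the final reduction mod t).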
  • Cuckoo Hashing
Cuckoo Hashing. Rasmus Pagh (BRICS, Department of Computer Science, Aarhus University, Ny Munkegade Bldg. 540, DK-8000 Århus C, Denmark; e-mail: [email protected]) and Flemming Friche Rodler (ON-AIR A/S, Digtervejen 9, 9200 Aalborg SV, Denmark; e-mail: ff[email protected]).

We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: upper and lower bounds. SIAM J. Comput., 23(4):738-761, 1994). The space usage is similar to that of binary search trees, i.e., three words per key on average. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee.

Key Words: data structures, dictionaries, information retrieval, searching, hashing, experiments

(The first author was partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT); this work was initiated while visiting Stanford University. BRICS: Basic Research in Computer Science (www.brics.dk), funded by the Danish National Research Foundation. The second author's work was done while at Aarhus University.)

1. INTRODUCTION

The dictionary data structure is ubiquitous in computer science.
  • The Power of Simple Tabulation Hashing
The Power of Simple Tabulation Hashing. Mihai Pătraşcu and Mikkel Thorup (AT&T Labs), May 10, 2011. (arXiv:1011.5200v2 [cs.DS], 7 May 2011)

Abstract: Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Carter and Wegman (STOC'77). Keys are viewed as consisting of c characters. We initialize c tables T_1, ..., T_c mapping characters to random hash codes. A key x = (x_1, ..., x_c) is hashed to T_1[x_1] ⊕ ··· ⊕ T_c[x_c], where ⊕ denotes xor. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, e.g., Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.

Contents: 1 Introduction (2); 2 Concentration Bounds (7); 3 Linear Probing and the Concentration in Arbitrary Intervals (13); 4 Cuckoo Hashing (19); 5 Minwise Independence (27); 6 Fourth Moment Bounds (32); A Experimental Evaluation (39); B Chernoff Bounds with Fixed Means (46).

1 Introduction

An important target of the analysis of algorithms is to determine whether there exist practical schemes which enjoy mathematical guarantees on performance. Hashing and hash tables are one of the most common inner loops in real-world computation, and are even built-in "unit cost" operations in high level programming languages that offer associative arrays. Often, these inner loops dominate the overall computation time.
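The tabulation scheme described above is only a few lines of code. A minimal sketch in Java, treating a 32-bit key as c = 4 eight-bit characters (an illustration of the scheme, not the paper's benchmark code):

  import java.util.Random;

  // Simple tabulation hashing: view a 32-bit key as c = 4 bytes and
  // xor together one random table entry per byte.
  class TabulationHash {
      private static final int C = 4;            // characters per key (bytes of an int)
      private final int[][] tables = new int[C][256];

      TabulationHash(Random rng) {
          for (int i = 0; i < C; i++)
              for (int j = 0; j < 256; j++)
                  tables[i][j] = rng.nextInt();  // fully random hash code per character
      }

      int hash(int key) {
          int h = 0;
          for (int i = 0; i < C; i++) {
              int character = (key >>> (8 * i)) & 0xFF;  // extract the i-th byte
              h ^= tables[i][character];                  // T_1[x_1] ^ ... ^ T_c[x_c]
          }
          return h;
      }
  }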
  • Fundamental Data Structures Contents
Fundamental Data Structures

Contents
1 Introduction  1
  1.1 Abstract data type  1
    1.1.1 Examples  1
    1.1.2 Introduction  2
    1.1.3 Defining an abstract data type  2
    1.1.4 Advantages of abstract data typing  4
    1.1.5 Typical operations  4
    1.1.6 Examples  5
    1.1.7 Implementation  5
    1.1.8 See also  6
    1.1.9 Notes  6
    1.1.10 References  6
    1.1.11 Further  7
    1.1.12 External links  7
  1.2 Data structure  7
    1.2.1 Overview  7
    1.2.2 Examples  7
    1.2.3 Language support  8
    1.2.4 See also  8
    1.2.5 References  8
    1.2.6 Further reading  8
    1.2.7 External links  9
  1.3 Analysis of algorithms  9
    1.3.1 Cost models  9
    1.3.2 Run-time analysis
  • Efficient Usage of One-Sided RDMA for Linear Probing
Efficient Usage of One-Sided RDMA for Linear Probing. Tinggang Wang, Shuo Yang, Spyros Blanas (The Ohio State University); Hideaki Kimura, Garret Swart (Oracle Corp.). {wang.5264, yang.5229, blanas.2}@osu.edu, {hideaki.kimura, garret.swart}@oracle.com.

ABSTRACT: RDMA has been an important building block for many high-performance distributed key-value stores in recent prior work. To sustain millions of operations per second per node, many of these works use hashing schemes, such as cuckoo hashing, that guarantee that an existing key can be found in a small, constant number of RDMA operations. In this paper, we explore whether linear probing is a compelling design alternative to cuckoo-based hashing schemes. Instead of reading a fixed number of bytes per RDMA request, this paper introduces a mathematical model that picks the optimal read size by balancing the cost of performing an RDMA operation with the probability of encountering a probe sequence of a certain length.

Cuckoo hashing has been widely adopted by RDMA-aware key-value stores [3, 8, 20] due to the predictable and small number of RDMA operations to access a remote hash table. Prior work also proposed using one-sided RDMA verbs to implement linear probing by doing fixed-size lookups [20], where the client fetches a fixed number of hash slots per RDMA READ request and repeats the procedure until finding the key or the first empty slot. However, fetching a fixed number of slots per RDMA READ request may read too little or too much: in a lightly-loaded hash table, one should read very few slots to conserve network bandwidth; with high load factors, the average length of the probe sequence increases dramatically, hence one needs to read many more