Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf

Concise Algorithmics or Algorithms and Data Structures — The Basic Toolbox or. . . Kurt Mehlhorn and Peter Sanders

Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf Mehlhorn, Sanders June 11, 2005 iii

Foreword

Buy me not [25]. iv Mehlhorn, Sanders June 11, 2005 v

Contents

1 Amuse Geule: Integer Arithmetics 3 1.1 Addition ...... 4 1.2 Multiplication: The School Method ...... 4 1.3 A Recursive Version of the School Method ...... 6 1.4 Karatsuba Multiplication ...... 8 1.5 Implementation Notes ...... 10 1.6 Further Findings ...... 11

2 Introduction 13 2.1 Asymptotic Notation ...... 14 2.2 Machine Model ...... 16 2.3 Pseudocode ...... 19 2.4 Designing Correct Programs ...... 23 2.5 Basic Program Analysis ...... 25 2.6 Average Case Analysis and Randomized Algorithms ...... 29 2.7 Data Structures for Sets and Sequences ...... 32 2.8 Graphs ...... 32 2.9 Implementation Notes ...... 37 2.10 Further Findings ...... 38

3 Representing Sequences by Arrays and Linked Lists 39 3.1 Unbounded Arrays ...... 40 3.2 Linked Lists ...... 45 3.3 Stacks and Queues ...... 51 3.4 Lists versus Arrays ...... 54 3.5 Implementation Notes ...... 56 3.6 Further Findings ...... 57 vi CONTENTS CONTENTS vii

4 Hash Tables 59 9 Graph Traversal 157 4.1 Hashing with Chaining ...... 62 9.1 Breadth First Search ...... 158 4.2 Universal Hash Functions ...... 63 9.2 Depth First Search ...... 159 4.3 Hashing with Linear Probing ...... 67 9.3 Implementation Notes ...... 165 4.4 Chaining Versus Linear Probing ...... 70 9.4 Further Findings ...... 166 4.5 Implementation Notes ...... 70 4.6 Further Findings ...... 72 10 Shortest Paths 167 5 Sorting and Selection 75 10.1 Introduction ...... 167 5.1 Simple Sorters ...... 78 10.2 Arbitrary Edge Costs (Bellman-Ford Algorithm) ...... 171 5.2 Mergesort — an O(nlogn) Algorithm ...... 80 10.3 Acyclic Graphs ...... 172 5.3 A Lower Bound ...... 83 10.4 Non-Negative Edge Costs (Dijkstra’s Algorithm) ...... 173 5.4 Quicksort ...... 84 10.5 Monotone Integer Priority Queues ...... 176 5.5 Selection ...... 90 10.6 All Pairs Shortest Paths and Potential Functions ...... 181 5.6 Breaking the Lower Bound ...... 92 10.7 Implementation Notes ...... 182 5.7 External Sorting ...... 96 10.8 Further Findings ...... 183 5.8 Implementation Notes ...... 98 5.9 Further Findings ...... 101 11 Minimum Spanning Trees 185 6 Priority Queues 105 11.1 Selecting and Discarding MST Edges ...... 186 6.1 Binary Heaps ...... 107 11.2 The Jarn´ık-Prim Algorithm ...... 187 6.2 Addressable Priority Queues ...... 112 11.3 Kruskal’s Algorithm ...... 188 6.3 Implementation Notes ...... 120 11.4 The Union-Find ...... 190 6.4 Further Findings ...... 121 11.5 Implementation Notes ...... 191 7 Sorted Sequences 123 11.6 Further Findings ...... 192 7.1 Binary Search Trees ...... 126 7.2 Implementation by (a,b)-Trees ...... 128 12 Generic Approaches to Optimization 195 7.3 More Operations ...... 134 12.1 Linear Programming — A Black Box Solver ...... 196 7.4 Augmenting Search Trees ...... 138 12.2 Greedy Algorithms — Never Look Back ...... 199 7.5 Implementation Notes ...... 140 12.3 Dynamic Programming — Building it Piece by Piece ...... 201 7.6 Further Findings ...... 143 12.4 Systematic Search — If in Doubt, Use Brute Force ...... 204 12.5 Local Search — Think Globally, Act Locally ...... 207 8 Graph Representation 147 12.6 Evolutionary Algorithms ...... 214 8.1 Edge Sequences ...... 148 8.2 Adjacency Arrays — Static Graphs ...... 149 12.7 Implementation Notes ...... 217 8.3 Adjacency Lists — Dynamic Graphs ...... 150 12.8 Further Findings ...... 217 8.4 Adjacency Matrix Representation ...... 151 8.5 Implicit Representation ...... 152 13 Summary: Tools and Techniques for Algorithm Design 219 8.6 Implementation Notes ...... 153 13.1 Generic Techniques ...... 219 8.7 Further Findings ...... 154 13.2 Data Structures for Sets ...... 220 viii CONTENTS CONTENTS 1

A Notation 225 [amuse geule arithmetik. Bild von Al Chawarizmi] = A.1 General Mathematical Notation ...... 225 ⇐ A.2 Some Probability Theory ...... 227 A.3 Useful Formulas ...... 228

Bibliography 230 2 CONTENTS Mehlhorn, Sanders June 11, 2005 3

Chapter 1

Amuse Geule: Integer Arithmetics

We introduce our readers into the design, analysis, and implementation of algorithms by studying algorithms for basic arithmetic operations on large integers. We treat addition and multiplication in the text and leave division and square roots for the exercises. Integer arithmetic is interesting for many reasons:

Arithmetic on long integers is needed in applications like cryptography, geo- • metric computing, and computer algebra.

We are familiar with the problem and know algorithms for addition and multi- • plication. We will see that the high school algorithm for integer multiplication is far from optimal and that much better algorithms exist.

We will learn basic analysis techniques in a simple setting. • We will learn basic algorithm engineering techniques in a simple setting. • We will see the interplay between theory and experiment in a simple setting. • We assume that integers are represented as digit-strings (digits zero and one in our theoretical analysis and larger digits in our programs) and that two primitive opera- tions are available: the addition of three digits with a two digit result (this is sometimes called a full adder) and the multiplication of two digits with a one digit result. We will measure the efficiency of our algorithms by the number of primitive operations exe- cuted. 4 Amuse Geule: Integer Arithmetics 1.2 Multiplication: The School Method 5

We assume throughout this section that a and b are n-digit integers. We refer to n T the digits of a as an 1 to a0 with an 1 being the most significant (also called leading) 40000 0.3 − − digit and a0 being the least significant digit. [consistently replaced bit (was used 80000 1.18 = mixed with digit) by digit] 160000 4.8 ⇒ 320000 20.34 1.1 Addition Table 1.1: The running time of the school method for the multiplication of n-bit inte- We all know how to add two integers a and b. We simply write them on top of each gers. The running time grows quadratically. other with the least significant digits aligned and sum digit-wise, carrying a single bit = from one position to the next. [picture!] ⇒ = [todo: proper alignment of numbers in tables] Table 1.1 shows the execution c 0 : Digit // Variable for the carry digit = ⇒ time of the school method using a C++ implementation and 32 bit digits. The time for i := 0 to n 1 do add a , b , and c to form s and a new carry c i i i given is the average execution time over ??? many random inputs on a ??? machine. s c − n = The quadratic growth of the running time is clearly visible: Doubling n leads to a We need one primitive operation for each position and hence a total of n primitive four-fold increase in running time. We can interpret the table in different ways: operations. (1) We can take the table to confirm our theoretical analysis. Our analysis pre- dicts quadratic growth and we are measuring quadratic growth. However, we ana- Lemma 1.1 Two n-digit integers can be added with n primitive operations. lyzed the number of primitive operations and we measured running time on a ??? computer.Our analysis concentrates on primitive operations on digits and completely ignores all book keeping operations and all questions of storage. The experiments 1.2 Multiplication: The School Method show that this abstraction is a useful one. We will frequently only analyze the num- ber of “representative operations”. Of course, the choice of representative operations = [picture!] We all know how to multiply two integers. In this section we will review requires insight and knowledge. In Section 2.2 we will introduce a more realistic com- ⇒ the method familiar to all of us, in later sections we will get to know a method which puter model to have a basis for abstraction. We will develop tools to analyze running is significantly faster for large integers. time of algorithms on this model. We will also connect our model to real machines, so The school method for integer multiplication works as follows: We first form par- that we can take our analysis as a predictor of actual performance. We will investigate tial products pi by multiplying a with the i-th digit bi of b and then sum the suitably the limits of our theory. Under what circumstances are we going to concede that an i aligned products pi 2 to obtain the product of a and b. experiment contradicts theoretical analysis? · p=0 : (2) We can use the table to strengthen our theoretical analysis. Our theoretical i analysis tells us that the running time grows quadratically in n. From our table we for i := 0 to n do p:= a bi 2 + p · · may conclude that the running time on a ??? is approximately ??? n2 seconds. We Let us analyze the number of primitive operations required by the school method. can use this knowledge to predict the running time for our program· on other inputs. 2 We need n primitive multiplications to multiply a by bi and hence a total of n n = n [todo: redo numbers.] Here are three sample outputs. 
For n =100 000, the running = · primitive multiplications. All intermediate sums are at most 2n-digit integers and time is 1.85 seconds and the ratio is 1.005, for n = 1000, the running time is 0.0005147 ⇐ 2 5 hence each iterations needs at most 2n primitive additions. Thus there are at most 2n seconds and the ratio is 2.797, for n = 200, the running time is 3.3 10− seconds and primitive additions. the ratio is 4.5, and for n =1 000 000, the running time is 263.6 seconds· and the ratio is 1.433. We see that our predictions can be far off. We simply were too careless. We Lemma 1.2 The school method multiplies two n-digit integers with no more than 3n2 started with the assumption that the running is cn2 for some constant c, estimated c, primitive operations. and then predicted. Starting from the assumption that the running time is cn2 + dn + e 6 Amuse Geule: Integer Arithmetics 1.3 A Recursive Version of the School Method 7

would have lead us to different conclusions. We need to do our theory more carefully1 in order to be able to predict. Also, when we made the prediction that the running 10 2 time is approximately ??? 10− n seconds, we did not make any restrictions on Figure 1.1: todo: visuzlization of the school method and its recursive variant. n. However, my computer· has a·finite memory (albeit large) and this memory is organized into a complex hierarchy of registers, first and second level cache, main and hence memory, and disk memory. The access times to the different levels of the memory 2k k a b = a1 b1 2 + (a1 b0 + a0 b1) 2 + a0 b0 . differ widely and this will have an effect on running times. None of the experiments · · · · · · · reported so far requires our program to use disk memory. We should analyze the space This formula suggests the following algorithm for computing a b: requirement of our program in order to be able to predict for how large a value of n · the program is able “to run in core2”. In our example, we ran into both traps. The a) Split a and b into a1, a0, b1, and b0. prediction is off for small n because we ignored the linear and constant terms in the b) Compute the four products a1 b1, a1 b0, a0 b1, and a0 b0. running time and the prediction is off for large n because our prediction ignores the · · · · effects of the memory hierarchy. c) Add the suitably aligned products to obtain a b. · (3) We can use the table to conjecture quadratic growth of the running time of our algorithm. Again, we need to be careful. What are we actually conjecturing? Observe that the numbers a1, a0, b1, and b0 are n/2 -bit numbers and hence the multiplications in step (2) are simpler than the originald emultiplication if n/2 < n, That the running time grows quadratically on random numbers? After all, we ran the d e algorithm only on random numbers (and even on very few of them). That the running i.e., n > 1. The complete algorithm is now as follows: To multiply 1-bit numbers, use time grows quadratically in the worst case, i.e., that there are no instances that lead to our multiplication primitive, and to multiply n-bit numbers for n 2, use the three step approach above. [picture!] ≥ = higher than quadratic growth? That the running grows quadratically in the best case, ⇐ i.e., that there are no instances that lead to less than quadratic growth? We see that we It is clear why this approach is called divide-and-conquer. We reduce the problem need to develop more concepts and a richer language. of multiplying a b into some number of simpler problems of the same kind. A divide and conquer algorithm· always consists of three parts: In the first part, we split the [dropped checking for now. If we use it we should do more in that direction original problem into simpler problems of the same kind (our step (1)), in the second = later, e.g., pqs, flows, sorting] ⇒ part we solve the simpler problems using the same method (our step (2)), and in the third part, we obtain the solution to the original problem from the solutions to the subproblems. The following program implements the divide-and-conquer approach 1.3 A Recursive Version of the School Method to integer multiplication. What is the connection of our recursive integer multiplication to the school method? We derive a recursive version of the school method. This will be our first encounter It is really the same. Figure ?? 
shows that the products a1 b1, a1 b0, a0 b1, and a0 b0 of the divide-and-conquer paradigm, one of the fundamental paradigms in algorithm are also computed by the school method. Knowing that our· recursi· ve inte· ger multipli-· design. We will also learn in this section that constant factors can make a significant cation is just the school method in disguise tells us that the recursive algorithms uses difference. a quadratic number of primitive operations. Let us also derive this from first princi- Let a and b be our two n bit-integers which we want to multiply. Let k = n/2 . ples. This will allow us to introduce recurrence relations, a powerful concept for the b c We split a into two numbers a1 and a0; a0 consists of the k least significant bits and analysis of recursive algorithm. a1 consists of the n k most significant bits. Then − Lemma 1.3 Let T (n) be the maximal number of primitive operations required by our k k a = a1 2 + a0 and b = b1 2 + b0 recursive multiplication algorithm when applied to n-bit integers. Then · · 1Maybe this is not what you wanted to read, but it is the truth. 1 if n = 1, 2 T (n) Main memory was called core memory in ancient times (when one of the authors studied computer ≤ 4 T( n/2 ) + 3 2 n if n 2. science). ( · d e · · ≥ 8 Amuse Geule: Integer Arithmetics 1.4 Karatsuba Multiplication 9

Proof: Multiplying two 1-bit numbers requires one primitive multiplication. This compared to multiplications and hence saving a multiplication more than outweighs justifies the case n = 1. So assume n 2. Splitting a and b into the four pieces a1, three additional additions. We obtain the following algorithm for computing a b: ≥ 3 · a0, b1, and b0 requires no primitive operations . Each piece has at most n/2 bits a) Split a and b into a1, a0, b1, and b0. and hence the four recursive multiplications require at most 4 T ( n/2 )dprimitie ve · d e operations. Finally, we need three additions to assemble the final result. Each addi- b) Compute the three products p2 = a1 b1, p0 = a0 b0, and p1 = (a1 + a0) (b1 + · · · tion involves two numbers of at most 2n bits and hence requires at most 2n primitive b0). operations. This justifies the inequality for n 2. ≥ c) Add the suitably aligned products to obtain a b, i.e., compute a b according to 2k · k · the formula a b = p2 2 + (p1 p2 p0) 2 + p0. In Section 2.5 we will learn that such recurrences are easy to solve and yield the · · − − · The numbers a , a , b , b , a + a , and b + b are n/2 + 1-bit numbers and hence already conjectured quadratic execution time of the recursive algorithm. At least if n 1 0 1 0 1 0 1 0 the multiplications in step (2) are simpler as the originald multiplicatione if n/2 + 1 < is a power of two we get even the same constant factors. [exercise: induction proof n, i.e., n 4. The complete algorithm is now as follows: To multiply 3-bitd numbers,e = for n power of two?] [some explanation how this introduces the concept for the ≥ ⇒ use the school method, and to multiply n-bit numbers for n 4, use the three step = next section?] ≥ ⇒ approach above. Table ?? shows the running time of the Karatsuba method in comparison with the 1.4 Karatsuba Multiplication school method. We also show the ratio of the running times (last column) and the ratio to the running time in the preceding iteration, i.e., the ratio between the running = [check names and citations] time for 2n-bit integers and the running time on n-bit integers. We see that the ratio is ⇒ In 1962 the Soviet mathematician Karatsuba [51] discovered a faster way of multi- around three in case of the Karatsuba method (doubling the size of the numbers about plying large integers. The running time of his algorithm grows like nlog3 n1.58. The triples the running time) and is about four in case of the school method (doubling the method is surprisingly simple. Karatsuba observed that a simple algebraic≈ identity size of the numbers about quadruples the running time). The latter statement requires allows one to save one multiplication in the divide-and-conquer implementation, i.e., some phantasy. We also see that the Karatsuba-method looses on short integers but one can multiply n-bit numbers using only three(!!) multiplications of integers half wins on very large integers. the size. The lessons to remember are: The details are as follows. Let a and b be our two n bit-integers which we want to Better asymptotics ultimately wins. However, it may loose on small inputs. • multiply. Let k = n/2 . We split a into two numbers a1 and a0; a0 consists of the k b c An asymptotically slower algorithm can be faster on small inputs. You may least significant bits and a1 consists of the n k most significant bits. Then • − only be interested in small inputs. k k a = a1 2 + a0 and b = b1 2 + b0 It is time to derive the asymptotics of the Karatsuba method. 
· · and hence (the magic is in the second equality) Lemma 1.4 Let T(n) be the maximal number of primitive operations required by the Karatsuba algorithm when applied to n-bit integers. Then 2k k a b = a1 b1 2 + (a1 b0 + a0 b1) 2 + a0 b0 2 · · · 2k · · · · k 3n if n 3, = a1 b1 2 + ((a1 + a0) (b1 + b0) a1 b1 a0 b0) 2 + a0 b0 T (n) ≤ · · · − · − · · · ≤ 3 T( n/2 + 1) + 6 2 n if n 4. ( · d e · · ≥ At first sight, we have only made things more complicated. A second look shows Proof: Multiplying two n-bit numbers with the school method requires no more that the last formula can be evaluated with only three multiplications, namely, a1 b1, than 3n2 primitive operations by Lemma 1.2. This justifies the first line. So assume · a1 b0, and (a1 +a0) (b1 +b0). We also need six additions. That is three more than in n 4. Splitting a and b into the four pieces a1, a0, b1, and b0 requires no primitive · · ≥ 4 the recursive implementation of the school method. The key is that additions are cheap operations . Each piece and the sums a0 + a1 and b0 + b1 have at most n/2 + 1 d e 3It will require work, but it is work that we do not account for in our analysis. 4It will require work, but it is work that we do not account for in our analysis. 10 Amuse Geule: Integer Arithmetics 1.6 Further Findings 11

n TK TK/TK0 TS TS/TS0 TK/TS [report on switching time in LEDA] Note that this number of bits makes the Karat- = 80000 5.85 – 1.19 – 4.916 suba algorithm useful to applications in crytography where multiplying numbers up ⇐ 160000 17.51 2.993 4.73 3.975 3.702 to 2048 bits is the most time consuming operation in some approaches [?]. 320000 52.54 3.001 19.77 4.18 2.658 640000 161 3.065 99.97 5.057 1.611 1280000 494 3.068 469.6 4.698 1.052 1.6 Further Findings 2560000 1457 2.95 1907 4.061 0.7641 Is the Karatsuba method the fastest known method for integer multiplication? Much 5120000 4310 2.957 7803 4.091 0.5523 faster methods are known. The asymptotically most efficient algorithm is due to Schonhage¨ and Strassen [84]. It multiplies n-bit integers in time O(nlognloglogn). Table 1.2: The running time of the Karatsuba and the school method for integer mul- Their method is beyond the scope of this book. [more? the generalization of tiplication: T is the running time of the Karatsuba method and T is the running time Karatzuba is a bit heavy.] = K S ⇐ of the school method. TK0 and TS0 denote the running time of the preceding iteration. α α For an algorithm with running time T (n) = cn we have T/T 0 = T(2n)/T(n) = 2 . For α = log3 we have 2α = 3 and for α = 2 we have 2α = 4. The table was produced on a 300 MHz SUN ULTRA-SPARC.

bits and hence the three recursive multiplications require at most 3 T ( n/2 + 1) · d e primitive operations. Finally, we need two additions to form a0 + a1 and b0 + b1 and four additions to assemble the final result. Each addition involves two numbers of at most 2n bits and hence requires at most 2n primitive operations. This justifies the inequality for n 4. ≥ The techniques introduced in Section 2.5 allow us to conclude that T (n) ???nlog3). = [fill in constant factor and lower order terms!] ≤ ⇒ 1.5 Implementation Notes

Karatsuba integer multiplication is superior to the school method for large inputs. In our implementation the superiority only shows for integers with more than several million bits. However, a simple refinement can improve that significantly. Since the school method is superior to Karatsuba for short integers we should stop the recursion earlier and switch to the school method for numbers which have at n0 bits for some yet to be determined n0. In this way we obtain a method which is never worse than either the school method or the naive Karatsuba algorithm. What is a good choice for n0? We can answer this question analytically and exper- imentally. The experimental approach is easier here: we can simply time the revised Karatsuba algorithm for different values of n0 and then take the value which gives the = smallest running time. On a ??? the best results were obtained for n0 512.[check] ⇒ ≈ 12 Amuse Geule: Integer Arithmetics Mehlhorn, Sanders June 11, 2005 13

Chapter 2

Introduction

[Alan Turing und John von Neumann? gothische Kathedrale (Chartres)?] = ⇐ When you want to become a sculptor, you first have to learn your basic trade. Where to get the right stones, how to move them, how to handle the chisel, erecting scaffolding. . . These things alone do not make you a famous artist but even if you are a really exceptional talent, it will be very difficult to develop if you do not find a craftsman who teaches you the basic techniques from the first minute. This introductory chapter attempts to play a similar role. It introduces some very basic concepts that make it much simpler to discuss algorithms in the subsequent chapters. We begin in Section 2.1 by introducing notation and terminology that allows us to argue about the complexity of alorithms in a concise way. We then introduce a simple abstract machine model in Section 2.2 that allows us to argue about algorithms independent of the highly variable complications introduced by real hardware. Sec- tion 2.3 than introduces a high level pseudocode notation for algorithms that is much more convenient than machine code for our abstract machine. Pseudocode is also more convenient than actual programming languages since we can use high level con- cepts borrowed from mathematics without having to worry about how exactly they can be compiled to run on actual hardware. The Pseudocode notation also includes annotations of the programs with specifications of what makes a consistent state of the system. Section 2.4 explains how this can be used to make algorithms more readable and easier to prove correct. Section 2.5 introduces basic mathematical techniques for analyzing the complexity of programs, in particular, for analyzing nested loops and recursive procedure calls. Section 2.6 gives an example why concepts from proba- bility theory can help analyzing existing algorithms or designing new ones. Finally, Section 2.8 introduces graphs as a convenient language to describe many concepts in 14 Introduction 2.1 Asymptotic Notation 15 discrete algorithmics. functions that map nonnegative integers to nonnegative real numbers.

O( f (n)) = g(n) : c > 0 : n0 + : n n0 : g(n) c f (n) (2.1)

{ ∃ ∃ ∈ ∀ ≥ ≤ · } Ω( f (n)) = g(n) : c > 0 : n0 + : n n0 : g(n) c f (n) (2.2) 2.1 Asymptotic Notation { ∃ ∃ ∈ ∀ ≥ ≥ · } Θ( f (n)) = O( f (n)) Ω( f (n)) (2.3)

∩ o( f (n)) = g(n) : c > 0 : n : n n : g(n) c f (n) (2.4) The main purpose of algorithm analysis is to give performance guarantees, e.g, bounds 0 + 0 { ∀ ∃ ∈ ∀ ≥ ≤ · } ω( f (n)) = g(n) : c > 0 : n0 + : n n0 : g(n) c f (n) (2.5) on its running time, that are at the same time accurate, concise, general, and easy to { ∀ ∃ ∈ ∀ ≥ ≥ · } understand. It is difficult to meet all this criteria. For example, the most accurate way to characterize the running time T of an algorithm is to view T as a mapping from the O( f (n)) is the set of all functions that “eventually grow no faster than” f (n) except for constant factors. Similarly, Ω( f (n)) is the set of all functions that “eventually set of possible inputs I to the set of nonnegative numbers +. This representation is grow at least as fast as” f (n) except for constant factors. For example, the Karatsuba so cumbersome that we only use it to define more useable notations. We first group 1.58 many inputs into a single class of “similar” inputs. The most typical way to group algorithm for integer multiplication has worst case running time in O n whereas the school algorithm has worst case running time in Ω n2 so that we can say that the inputs is by their size. For example, for integer arithmetics, we said that we have an  Karatsuba algorithm is asymptotically faster than the school algorithm. The “little-o” input of size n if both input numbers could be coded with at most n bits each. Now  we can assign a single number to the set of inputs with identical size by looking at the notation o( f (n)) denotes the set of all functions that “grow stricly more slowly than” θ maximum, minimum, or average of the execution times. f (n). Its twin ( f (n)) is rarely used and only shown for completeness. Handling the definitions of asymtotic notation directly requires some manipula- tions but is relatively simple. Let us consider one example that [abhaken] rids us of = worst case: T (n) = max T (i) : i I ,size(i) = n ⇐ { ∈ } future manipulations for polynomial functions. best case: T(n) = min T(i) : i I ,size(i) = n { ∈ } 1 ∑ ∑k i average case: T(n) = I i I ,size(i)=n T (i) Lemma 2.1 Let p(n) = i=0 ain denote any polynomial with ak > 0. Then p(n) | | ∈ Θ nk . ∈ In this book we usually try to bound the worst case execution time since this gives Proof: It suffices to show that p(n) O nk and p(n) Ω nk . us the strongest performance guarantee. Comparing the best case and the worst case First observe that for n > 0, ∈ ∈ helps us to estimate how much the execution time can vary for different inputs in the   same set. If this discrepancy is big, the average case may give more insight into the k k i k true performance of the algorithm. Section 2.6 gives an example. p(n) ∑ ai n n ∑ ai . ≤ | | ≤ | | We have already seen in Chapter 1 that we can easily get lost in irrelevant details i=0 i=0 during algorithm analysis even if we only look at the worst case. For example, if we ∑k k k Hence, n > 0 : p(n) ( i=0 ai )n , i.e., p(n) O n . want to design a multiplication algorithm for n-bit integers, we do not care very much ∀ k 1 ≤ | | ∈ Let A = ∑ − ai . For n > 0 we get whether an algorithm takes 7n2 6n or 7n2 2n 42 elementary operations. We may i=0 | |  − − − not even care whether the leading constant is seven or six since implementation details k k 1 ak k k 1 ak ak k p(n) akn An − = n + n − ( n A) n can imply much larger differences in actual run time. 
However, it makes a fundamental ≥ − 2 2 − ≥ 2 difference whether we have an algorithm whose running time is proportional to n2 or Ω k proportional to n1.58 — for sufficiently large n, the algorithm with n1.58 will win over for n > 2A/ak, i.e., choosing c = ak/2 and n0 = 2A/ak in the definition of (n ), we k the quadratic algorithm regardless of implementation details. We would like to group see that p(n) Ω n . ∈ functions with quadratic behavior into the same category whereas functions that are  “eventually much smaller” should fall into another category. The following definitions Since asymptotic notation is used a lot in algorithm analysis, mathematical nota- help to make this concept of asymptotic behaviour precise. Let f (n) and g(n) denote tion is stretched a little bit to allow treating sets of functions (like O n2 ) similar to  16 Introduction 2.2 Machine Model 17

ordinary functions. In particular, we often write h = F when we mean h F. If h is a function, F and G are sets of functions and “ ” is an operand like +, , /,.∈. . then F G is a shorthand for f + g : f F,g G and◦h F stands for h +·F. For example,◦ when we do care about{ constant∈ factor∈ }but want◦ to get rid of{lower} order terms we write expressions of the form f (n) + o( f (n)) or, equivalently (1 + o(1)) f (n). Most of the time, we will not bother to use the definitions directly but we use a rich set of rules that allows us to manipulate expressions without much calculation.

Lemma 2.2

c f (n) = Θ( f (n)) for any constant c (2.6) f (n) + g(n) = Ω( f (n)) (2.7) Figure 2.1: John von Neumann born Dec. 28 1903 in Budapest, died Feb. 8, 1957, f (n) + g(n) = O( f (n)) if g(n) = O( f (n)) (2.8) Washington DC.

= [more rules!] ⇒ the model in mind also work well on the vastly more complex hardware of todays machines. Exercise 2.1 Prove Lemma 2.2. The variant of von Neumann’s model we consider is the RAM (random access To summarize this section we want to stress that there are at least three orthogonal machine) model. The most important features of this model are that it is sequential, choices in algorithm analysis: i.e., there is a single processing unit, and that it has a uniform memory, i.e., all memory accesses cost the same amount of time. The memory consists of cells S[0], S[1], S[2], What complexity measure is analyzed? Perhaps time is most important. But we . . . The “. . . ” means that there are potentially infinitely many cells although at any • often also consider space consumption, solution quality, (e.g., in Chapter 12). point of time only a finite number of them will be in use. Many other measures may be important. For example, energy consumption The memory cells store “small” integers. In Chapter 1 we assumed that small of a computation, the number of parallel processors used, the amount of data means one or two bits. It is more convenient to assume that “reasonable” functions transmitted over a network,. . . of the input size n can be stored in a single cell. Our default assumption will be that polynomials in n like n2 or perhaps 100n3 are still reasonable. Lifting this restriction Are we interested in worst case, best case, or average case? • could lead to absurdly overoptimistic algorithms. For example by repeated squaring, we could generate a number with 2n bits in n steps.[mehr zu den komplexitaets- Are we simplifying bounds using O( ), Ω( ), Θ( ), o( ), or ω( )? • · · · · · theoretischen Konsequenzen hier oder in further findings?] We should keep in = mind however, that that our model allows us a limited form of parallelism. We can ⇐ Exercise 2.2 Sharpen Lemma 2.1 and show that p(n) = a nk + o(nk). k perform simple operations on logn bits in constant time. In addition to the main memory, there is a small number of registers R1, . . . , Rk. 2.2 Machine Model Our RAM can execute the following machine instructions. R := S[R ] loads the content of the memory cell with index R into register R . In 1945 John von Neumann introduced a basic architecture of a computer [76]. The i j j i

design was very simple in order to make it possible to build it with the limited hard- S[R j]:= Ri stores register Ri in memory cell S[R j]. ware technology of the time. Hardware design has grown out of this in most aspects. However, the resulting programming model was so simple and powerful, that it is Ri:= R j R` is a binary register operation where ‘ ’ is a placeholder for a variety still the basis for most programming. Usually it turns out that programs written with of operations. Arithmetic operations can be the usual +, , and but also − ∗ 18 Introduction 2.3 Pseudocode 19

the bit-wise operations |, &, >> <<, and for exclusive-or. Operations div be mitigated by occasinal informal discussions of implementation tradeoffs. We will and mod stand for integer division and remainder⊕ respectively. Comparison now also introduce a very simple model for memory hierarchies that will be used in operations , <, >, encode true as 1 and false as 0. Logical operations Sections 5.7 and ??. and further≤ manipulate≥ the truth values 0 and 1. We may also assume that∧ ∨ there are operations which interpret the bits stored in a register as floating point External Memory numbers, i.e., finite precision approximations of real numbers. The external memory model is like the RAM model except that the fast memory S is Ri:= R j is a unary operation using the operators , (logical not), or ˜ (bitwise limited in size to M words. Additionally, there is an external memory with unlimited − ¬ not). size. There are special I/O operations that transfer B consecutive words between slow and fast memory. For example, the external memory could be a hard disk, M would Ri:= C assigns a constant value to Ri. then be the main memory size and B would be a block size that is a good compromise between low latency and high bandwidth. On current technology M = 1GByte and Ri randInt(C) assigns a random integer between 0 and C 1 to Ri. ∈ − B = 1MByte could be realistic values. One I/O step would then be around 10ms 7 JZ j,Ri continues execution at memory address j if register i is zero. which is 10 clock cycles of a 1GHz machine. With another setting of the parameters M and B, we could model the smaller access time differences between a hardware Each instruction takes a certain number of time steps to execute. The total execution cache and main memory. time of a program is the sum of the execution time of all the executed instructions. It is important to remember that the RAM model is an abstract model. One should not confuse it with physically existing machines. In particular, real machines have 2.3 Pseudocode finite memory and a fixed number of bits per register (e.g., 64). Our assumption of word sizes and memories scaling with the input size can be viewed as an abstraction Our RAM model is an abstraction and simplification of the machine programs ex- of the practical experience that word sizes are large enough for most purposes and that ecuted on microprocessors. But the model is still too low level for an algorithms memory capacity is more limited by your bank account than by physical limitations textbook. Our programs would get too long and hard to read. Our algorithms will of available hardware. therefore be formulated in pseudocode that is an abstraction and simplification of im- Our complexity model is also a gross oversimplification: Modern processors at- perative programming languages like C, C++, Java, Pascal. . . combined with a liberal tempt to execute many instructions in parallel. How well this works depends on fac- use of mathematical notation. We now describe the conventions used in this book and tors like data dependencies between subsequent operations. Hence, we cannot assign give a rough idea how these high level descriptions could be converted into RAM ma- a fixed cost to an operation. This effect is particularly pronounced for memory ac- chine instructions. But we do this only to the extend necessary to grasp asymptotic cesses. The worst case time for a memory access from main memory can be hundreds behavior of our programs. 
This would be the wrong place to worry about compiler of times slower than the best case time. The reason is that modern processors attempt optimization techniques since a real compiler would have to target real machines that to keep frequently used data in caches — small, fast memories close to the proces- are much more complex. The syntax of our pseudocode is similar to Pascal [47] be- sors. How well caches works depends a lot on their architecture, the program, and the cause this is typographically nicer for a book than the more widely known Syntax of particular input. C and its descendents C++ and Java. We could attempt to introduce a very accurate cost model but we would probably A variable declaration “v=x : T ” introduces a variable v of type T that is initial- miss our point. We would end up with a very complex model that is almost impossi- ized to value x. For example, “answer=42 : ”. When the type of a variable is clear ble to handle. Even a successful complexity analysis would be a monstrous formula from the context we sometimes omit the declaration. We can also extend numeric depending on many parameters that change with every new processor generation. We types by values ∞ and ∞. Similarly, we use the symbol to denote an undefined therefore go to the other extreme and eliminate any model parameters by assuming value which we −sometimes assume to be distinguishable from⊥ a “proper” element of that each instruction takes exactly one unit of time. The result is that constant factors T . in our model are quite meaningless — one more reason to stick to asymptotic anal- An declaration “a : Array [i.. j] of T ” yields an array a consisting of j 1 + 1 ysis most of the time. On the implementation site, our gross oversimplification will elements of type T stored in a[i], a[i + 1], . . . , a[ j]. Arrays are implemented−as con- 20 Introduction 2.3 Pseudocode 21

tiguous pieces of memory. To find element a[i], it suffices to know the starting address if C then of a. For example, if register Ra stores the starting address of the array a[0..k] and I elements of a have unit size, then the instruction sequence “R1:= Ra +42; R2:= S[R1]” else I0 loads a[42] into register R2. Section 3.1 gives more details on arrays in particular if stands for the instruction sequence they have variable size. C; JZ sElse, Rc; I; JZ sEnd, R0; I0 Arrays and objects referenced by pointers can be allocate d and dispose d of. For where C is a sequence of instructions evaluating the expression C and storing its result

example, p:= allocate Array [1..n] of allocates an array of p floating point numbers. in register Rc, I is a sequence of instructions implementing the statement I, I0 imple- dispose p frees this memory and makes it available for later reuse. allocate and ments I , sElse addresses[Pfeile einbauen] the first instruction in I , sEnd addresses = 0 0 ⇐ dispose cut the single memory array S into disjoint pieces that accommodate all data. the first instruction after I0, and R0 is a register storing the value zero. = These functions can be implemented to run in constant time.[more in appendix?] Note that the statements affected by the then part are shown by indenting them. ⇒ For all algorithms we present they are also space efficient in the sense that a program There is no need for the proliferation of brackets observed in programming languages execution that needs n words of memory for dynmically allocated objects will only like C that are designed as a compromise of readability for humans and computers. touch physical memory locations S[0..O(n)].1 Rather, our pseudocode is designed for readability by humans on the small pages of From mathematics we borrow the composite data structures tuples, sequences, a book. For the same reason, a line break can replace a ‘;’ for separating statements. and sets. Pairs, Triples, and, more generally, tuples are written in round brackets, The statement “if C then I” can be viewed as a shorthand for “if C then I else ;”, i.e., e.g., (3,1), (3,1,4) or (3,1,4,1,5). Since tuples only contain a constant number of an if-then-else with an empty else part. elements, operations on them can be broken into operations on their constituents in an The loop “repeat I until C is equivalent to the instruction sequence I; C; JZ sI, Rc obvious way. Sequences store elements in a specified order, e.g., where I is an instruction sequence implementing the pseudocode in I, C computes the

“s= 3,1,4,1 : Sequence of ” declares a sequence s of integers and initializes it condition C and stores its truth value in register Rc, and sI adresses the first instruction to containh thei numbers 3, 1, 4, and 1 in this exact order. Sequences are a natural in I. To get readable and concise programs, we use many other types of loops that can abstraction for many data structures like files, strings, lists, stacks, queues,. . . but our be viewed as shorthands for repeat-loops. For example:[todo: nicer alignment] = ⇐ default assumption is that a sequence s is synonymous to an array s[1.. s ]. In Chap- ter 3 we will learn many additional ways to represent sequences. Therefore,| | we later while C do I if C then repeat I; until C for i := a to b≡do I i:= a; while i b do I; i++¬ make extensive use of sequences as a mathematical abstraction with little further ref- ≡ ≤ erence to implementation details. We extend mathematical notation for sets to a no- for i := a downto b step s do I i:= a; while i b do I; i:= i s for i := a to ∞ while C do I ≡ i:= a; while≥ C do I; i++− tation for sequences in the obvious way, e.g., e s if e occurs somewhere in s or ≡ i2 : 1 i 9 = 1,4,9,16,25,36,49,64,81 . The∈ empty sequence is written as . foreach e s do I for i := 1 to s do e:= s[i]; I ≤ ≤ h i hi ∈ ≡ | | Sets play a pivotal rule in the expressive power of mathematical language and [do we need a break loop construct? (so far not, Dec28,2003)] How exactly = hence we also use them in high level pseudocode. In particular, you will see declara- ⇐

loops are translated into efficient code is a complicated part of compiler construction tions like “M= 3,1,4 : set of ” that are analogous to array or sequence declara- { } lore. For us it only matters that the execution time of a loop can be bounded by tions. Sets are usually implemented as sequences. summing the execution times of each of its iterations including the time needed for Numerical expressions can be directly translated into a constant number of ma- evaluating conditions. chine instructions. For example, the pseudocode statement a:= a + bc can be trans- Often we will also use mathematical notation for sequences or sets to express loops lated into the RAM instructions “R1:= Rb Rc; Ra:= Ra + R1” if Ra, Rb, and Rv stand ∗ implicitly. For example, assume the set A is represented as an array and s is its size. for the registers storing a, b, and c respectively. From C we borrow the shorthands Then A:= e B : C(e) would be a shorthand for = ++ , and . etc.[+= etc needed?] Assignment is also allowed for composite { ∈ } ⇒ −− “s:= 0; foreach e B if C(e) then A[++ s] = e”. Similarly, the use of the logical objects. For example, “(a,b):= (b,a)” swaps a and b. quantifiers and ∈can implicitly describe loops. For example, e s : e > 3 could be The conditional statement a shorthand∀for “f∃oreach e s do if e 3 then return 0 endfor∀retur∈ n 1”. ∈ ≤ 1Unfortunately, no memory management routines are known that are space ef®cient for all possible A subroutine with name foo is declared in the form “Procedure foo(D) I” where sequences of allocations and deallocations. It seems that there can be a factor Ω(logn) waste of space [?]. I is the body of the procedure and D is a sequence of variable declarations specifying 22 Introduction 2.4 Designing Correct Programs 23

the parameters of foo. For example, Number r:= x Procedure foo(a : ; b : Array [1..n]) b[42]:= a; a:= 0 Number i:= y Our default assumption is that parameters are passed “by value”, i.e., the program Function abs : Number return √r2 + i2 behaves as if foo gets a copy of the passed values. For example, after a call foo(a,c), Operator +(c0 : Complex) : Complex return Complex(r + c0.r,i + c0.i) variable a would still have its old values. However, complex objects like arrays are passed “by reference” to ensure that parameter passing only takes constant time. For gives a (partial) implementation of a complex number type that can use arbitrary = example after the call foo(a,c) we would have c[42] = 2.[check everywhere] These numeric types for real and imaginary parts. Very often, our class names will be- ⇒ conventions are similar to the conventions used by C. gin with capital letters.[somewhere else? drop? true?] The real and imaginary = ⇐ As for variable declarations, we sometimes omit type declarations for parameters parts are stored in the member variables r and i respectively. Now, the declaration

if they are unimportant or clear from the context. Sometimes we also declare parame- “c : Complex(2,3)of ” declares a complex number c initialized to 2 + 3i, c.i is the ters implicitly using mathematical notation. For example, the Procedure bar( a1,...,an ) imaginary part, and c.abs returns the absolute value of c. We also allow a notation that is passed an sequence of n elements of unimportant type. h i views operators as ordinary functions. The object itself [“this” needed anywhere?] = ⇐ Most procedure calls could be compiled into machine code by just substituting the plays the role of the first (left) operand and the remaining operands are passed in the procedure body for the procedure call and making appropriate assignments to copy parameter list. Our complex number type uses this feature to define the operator +. the parameter values into the local variables of the precedure. Since the compiler In general, the type after the of allows us to parameterize classes with types in subsequently has many opportunities for optimization, this is also the most efficient a similar fashion as using the more elaborate template mechanism of C++ [Java??]. = approach for small procedures and procedures that are only called from a single place. Note that in the light of this notation the previously mentioned types “Set of Element” ⇐ However, this substitution approach fails for recursive procedures that directly or in- and “Sequence of Element” are ordinary classes. The combination of a parameter list directly call themselves — substitution would never terminate. The program therefore for the class and initializations of the member variables is a simple replacement for a maintains a recursion stack r: One register Rr always points to the last valid entry constructor that ensures that objects are immediately created in a consistent state. on this stack. Before a procedure call, the calling procedure caller pushs its local Many procedures and functions within classes will only need the state of the object variables and the return address on the stack. Parameter values are put into preagreed and hence need no parameter list. Note that this simplifies the task to hide the rep- registers,2 and finally caller jumps to the first instruction of the called routine called. resentation of a class. For example if we would decide to change the representation When called executes the return statement, it finds the next instruction of caller in of complex numbers to store them in polar coordinates (absolute value and angle) the S[Rr] and jumps to this address. After popping the return address and its local variables function abs would change into a member variable whereas r and i would be imple- from the stack, caller can continue with its normal operation. Note that recursion is no mented as functions but a program using abs would not have to be changed.[drop this problem with this scheme since each incarnation of a routine will have its own stack discussion?] = area for its local variables. ⇐ Functions are similar to procedures except that they allow the return statement to Exercise 2.3 Translate the following pseudocode for finding all prime numbers up to return a value. For example, n in RAM machine code. This algorithm is known as the sieve of Eratosthenes. Function factorial(n) : if n = 1 then return 1 else return n factorial(n 1) · − a= 1,...,1 : Array [2..n] of 0,1 declares a recursive function that returns n!. [picture. 
with a factorial example forhi := 2 toi √n do { } = showing the state of the machine, after several levels of recursion.] if a[i] thenb forc j := 2i to n step i do a[ j]:= 0 ⇒ Our pseudocode also allows a simple form of object oriented programming be- for i := 2 to n do if a[i] then output “i is prime” cause this is needed to separate the interface and implementation of data structures. We will introduce our notation by an example. The definition Class Complex(x,y : Element) of Number 2.4 Designing Correct Programs

2If there are not enough registers, parameters can also be passed on the stack. [do we need old values anywhere else?] = ⇐ 24 Introduction 2.5 Basic Program Analysis 25

and initialized to establish the loop invariant. Then a main loop manipulates the state

Function power(a : ; n0 : ) : of the program. When the loop terminates, the loop invariant together with the ter- 0 assert (a = 0 n0 = 0) // It is not so clear what 0 should be mination condition of the loop imply that the correct result has been computed. The

¬ ∧ p=a : ; r=1 : ; n=n0 : loop invariant therefore plays a pivotal role in understanding why a program works while n > 0 do correctly. Once we understand the loop invariant, it suffices to check that the loop invariant pnr = an0 invariant is true initially and after each loop iteration. This is particularly easy, if if n is odd then n ; r:= r p // invariant violated between assignments only a small number of statemtents are executed between points where the invariant −− · else (n, p):= (n/2, p p) // parallel assignment maintains invariant is maintained. For example, in Figure 2.2 the parallel assignment (n, p):= (n/2, p p) · assert r = an0 // This is a consequence of the invariant and n = 0 helps to document that the invariant is only maintained after both assignments ·are return r executed.[forward references to examples?] = More complex programs encapsulate their state in objects whose consistent repre- ⇐ sentation is also governed by invariants. Such data structure invariants are declared Figure 2.2: An algorithm that computes integer powers of real numbers. together with the data type. They are true after an object is constructed and they are preconditions and postconditions of all methods of the class. For example, in order to have a unique representation of a two-element set of integers we might declare

When a program solves a problem, it massages the state of the system — the input state is gradually transformed into an output state. The program usually walks Class IntSet2 a,b : ; invariant a < b . . . [forward references to examples?] = [more on programming by contract?] ⇐= on a narrow path of consistent intermediate states that allow it to function correctly. ⇐ To understand why a program works, it is therefore crucial to characterize what is a consistent state. Pseudocode is already a big advantage over RAM machine code since it shapes the system state from a sea of machine words into a collection of variables 2.5 Basic Program Analysis with well defined types. But usually this is not enough since there are consistency Let us summarize what we have already learned about algorithm analysis. First, we properties involving the value of several variables. We explain these concepts for the abstract from the complications of a real machine to the simplfied RAM model. Now, algorithm in Figure 2.2 that computes powers. in principle, we want to count the number of instructions executed for all possible We can require certain properties of the system state using an assert-statement. inputs. We simplify further by grouping inputs by size and focussing on the worst In an actual program we would usually check the condition at run time and signal an case. Using asymptotic notation, we additionally get rid of constant factors and lower error if it is violated. For our pseudocode however, an assertion has no effect on the order terms. This coarsening of our view also allows us to look at upper bounds on the computation — it just declares that we expect the given property to hold. In particular, execution time rather than the exact worst case as long as the asymptotic result remains we will freely use assertions that would be expensive to check. As in our example, unchanged. All these simplifications allow us to analyze the pseudocode directly. Let we often need preconditions, i.e., assertions describing requirements on the input of T (I) denote the worst case execution time of a piece of program I: a function. Similarly, postconditions describe the result of a function or the effect of a procedure on the system state. One can view preconditions and postconditions as T (I;I ) = T(I) + T(I ). • 0 0 a contract between the caller and the called routine: If the caller passes consistent T (if C then I else I ) = O(T (C) + max(T (I),T (I ))). paramters, the routine produces a result with guaranteed properties. For conciseness, • 0 0 we will use assertions sparingly assuming that certain “obvious” conditions are im- ∑k T (repeat I until C) = O i=1 T (i) where k is the number of loop iterations, plicit from the textual description of the algorithm. Much more elaborate assertions • and where T (i) is the time needed in the i-th iteration of the loop. may be required for high quality programs or even formal verification.  Some particularly important consistency properties should hold at many places in 2.5.1 “Doing Sums” the program. We use two kinds of such invariants: A loop invariant holds before and after each loop iteration. Algorithm 2.2 and many of the simple algorithms ex- On the first glance, only the rule for loops seems nontrivial. To simplify the expres- plained in this book have a very simple structure: A couple of variables are declared sions generated due to loops, we will need to manipulate sums. Sums also show up 26 Introduction 2.5 Basic Program Analysis 27

when we make an average case analysis or when we are interested in the expected d=2, b=4 execution time of the randomized algorithms introduced in Section 2.6. For example, the insertion sort algorithm introduced in Section 5.1[move that = here?] has two nested loops. The outer loop counts i from 2 to n. The inner loop d=b=2 ⇒ performs at most i 1 iterations. Hence, the total number of iterations of the inner loop is at most −

∑_{i=2}^{n} ∑_{j=2}^{i} 1 = ∑_{i=2}^{n} (i − 1) = ∑_{i=1}^{n} (i − 1) = −n + ∑_{i=1}^{n} i = −n + n(n + 1)/2 = n(n − 1)/2 .   (2.9)

Since the time for one execution of the inner loop is O(1), we get a worst case execution time of Θ(n²). All nested loops with an easily predictable number of iterations can be analyzed in an analogous fashion: Work your way inside out by repeatedly finding a closed form expression for the innermost loop. Using simple manipulations like ∑_i c·a_i = c ∑_i a_i, ∑_i (a_i + b_i) = ∑_i a_i + ∑_i b_i, or ∑_{i=2}^{n} a_i = −a_1 + ∑_{i=1}^{n} a_i, one can often reduce the sums to very simple forms that can be looked up in a formulary. A small sample of such formulas can be found in Appendix A. Since we are often only interested in the asymptotic behavior, one can also get rid of constant factors or lower order terms quickly. For example, the chain of equations 2.9 could be rewritten as

∑_{i=2}^{n} (i − 1) ≤ ∑_{i=2}^{n} (n − 1) = (n − 1)² ≤ n²

or, for even n,

∑_{i=2}^{n} (i − 1) ≥ ∑_{i=n/2+1}^{n} n/2 = n²/4 .

[more tricks, like telescoping, etc? but then we need good examples here. Alternative: forward references to places where these tricks are actually needed]

2.5.2 Recurrences

In our rules for analyzing programs we have so far neglected subroutine calls. Subroutines may seem to be easy to handle since we can analyze the subroutine separately and then substitute the obtained bound into the expression for the running time of the calling routine. But this gets interesting for recursive calls. For example, for the recursive variant of the school method of multiplication and n a power of two we obtained T(1) = 1, T(n) = 6n + 4T(n/2) for the number of primitive operations. For the Karatsuba algorithm, the corresponding expression was T(1) = 1, T(n) = 12n + 3T(n/2). In general, a recurrence relation defines a function (directly or indirectly) in terms of the same function using smaller arguments. Direct definitions for small parameter values make the function well defined. Solving recurrences, i.e., giving nonrecursive, closed form expressions for them, is an interesting subject of mathematics.[sth more in appendix?] Here we focus on recurrence relations that typically emerge from divide-and-conquer algorithms. We begin with a simple case that already suffices to understand the main ideas:

Theorem 2.3 (Master Theorem (Simple Form)) For positive constants a, b, c, and d, and n = b^k for some integer k, consider the recurrence

r(n) = a                    if n = 1
r(n) = cn + d · r(n/b)      if n > 1 .

Then

r(n) = Θ(n)            if d < b
r(n) = Θ(n log n)      if d = b
r(n) = Θ(n^{log_b d})  if d > b .

[Figure 2.3: Examples for the three cases of the master theorem, with d = 2, b = 4; d = b = 2; and d = 3, b = 2. Arrows give the size of subproblems. Thin lines denote recursive calls.]

Figure 2.3 illustrates the main insight behind Theorem 2.3: We consider the amount of work done at each level of recursion. If d < b, the amount of work in a recursive call is a constant factor smaller than in the previous level. Hence, the total work decreases geometrically with the level of recursion. Informally speaking, the first level of recursion already accounts for a constant factor of the execution time. If d = b, we have the same amount of work at every level of recursion. Since there are logarithmically many levels, the total amount of work is Θ(n log n). Finally, if d > b, we have a geometrically growing amount of work in each level of recursion so that the last level accounts for a constant factor of the running time. Below we formalize this reasoning.

Proof: At level k, the subproblems have size one and hence we have to account cost a·d^k for the base case. At the i-th level of recursion, there are d^i recursive calls for subproblems of size n/b^i = b^k/b^i = b^{k−i}. The total cost for the divide-and-conquer steps is

∑_{i=0}^{k−1} d^i · c·b^{k−i} = c·b^k ∑_{i=0}^{k−1} (d/b)^i = cn ∑_{i=0}^{k−1} (d/b)^i .

Case d = b: we have cost a·d^k = a·b^k = an = Θ(n) for the base case and cnk = cn·log_b n = Θ(n log n) for the divide-and-conquer steps.

Case d < b: we have cost a·d^k < a·b^k = an = O(n) for the base case. For the cost of the divide-and-conquer steps we use Formula A.9 for a geometric series and get

cn · (1 − (d/b)^k)/(1 − d/b) < cn · 1/(1 − d/b) = O(n)   and
cn · (1 − (d/b)^k)/(1 − d/b) > cn = Ω(n) .

Case d > b: First note that

d^k = 2^{k log d} = 2^{k (log d / log b) log b} = b^{k log_b d} = (b^k)^{log_b d} = n^{log_b d} .

Hence we get time a·n^{log_b d} = Θ(n^{log_b d}) for the base case. For the divide-and-conquer steps we use the geometric series again and get

c·b^k · ((d/b)^k − 1)/(d/b − 1) = c · (d^k − b^k)/(d/b − 1) = c·d^k · (1 − (b/d)^k)/(d/b − 1) = Θ(d^k) = Θ(n^{log_b d}) .

How can one generalize Theorem 2.3 to arbitrary n? The simplest way is semantic reasoning. It is clear³ that it is more difficult to solve larger inputs than smaller inputs and hence the cost for input size n will be at most the time needed when we round up to b^{⌈log_b n⌉} — the next largest power of b. Since b is a constant, rounding can only affect the running time by a constant factor. Lemma [todo. The analysis of merge sort uses C(n) ≤ C(⌈n/2⌉) + C(⌊n/2⌋) + n − 1 ⇒ C(n) ≤ n⌈log n⌉ − 2^{⌈log n⌉} + 1 ≤ n log n] in the appendix makes this reasoning precise.

³Be aware that most errors in mathematical arguments are near occurrences of the word 'clearly'.

There are many further generalizations of the Master Theorem: We might break the recursion earlier, the cost for dividing and conquering may be nonlinear, the size of the subproblems might vary within certain bounds, the number of subproblems may depend on the input size, . . . [how much of this are we doing? Theorem here, proof in appendix] [linear recurrences? mention here, math in appendix?] [amortization already here or perhaps in section of its own?]

*Exercise 2.4 Suppose you have a divide-and-conquer algorithm whose running time is governed by the recurrence

T(1) = a,   T(n) = cn + ⌈√n⌉ · T(⌈n/⌈√n⌉⌉) .

Show that the running time of the program is O(n log log n).

Exercise 2.5 Access to data structures is often governed by the following recurrence:

T(1) = a,   T(n) = c + T(n/2) .

Show that T(n) = O(log n).

[Hier section on basic data structures and their "set theoretic" implementation? move sth from summary? Also search trees, sorting, priority queues. Then perhaps do binary search here?] [moved section Generic techniques into a summary chapter.]

2.6 Average Case Analysis and Randomized Algorithms

[more? currently only one example. Introduce more concepts that are used more than once in this book?]

Suppose you are offered to participate in a TV game show: There are 100 boxes that you can open in an order of your choice. Box i contains an initially unknown amount of money m_i. No two boxes contain the same amount of money. There are strange pictures on the boxes and the show master is supposedly giving hints. The rules demand that after opening box i, you hold on to the maximum amount seen so far, max_{j≤i} m_j, i.e., when you open a box that contains more money than you currently hold, you return what you hold and take the money in the current box. For example, if the first 10 boxes contain 3, 5, 2, 7, 1, 4, 6, 9, 2.5, 4.5 Euro respectively, you start with 3 Euro and swap them for 5 Euro in the second box. When you open the next box you see that it contains only 2 Euro and skip it. Then you swap the 5 Euro for

7 Euro in the fourth box and after opening and skipping boxes 5–7, you swap with th harmonic number. We have Hn lnn + 1, i.e., the average number of left-right the 9 Euro in the eigth box. The trick is that you are only allowed to swap money ten maxima is much smaller than the worst≤ case number. times. If you have not found the maximum amount after that, you lose everything. n 1 n 1 n 1 Otherwise you are allowed to keep the money you hold. Your Aunt, who is addicted Exercise 2.6 Show that ∑ lnn + 1. Hint: first show that ∑ dx. Z to this show, tells you that only few candidates win. Now you ask yourself whether it k=1 k ≤ k=2 k ≤ 1 x is worth participating in this game. Is there a strategy that gives you a good chance to Let us apply these results to our game show example: For n 100 we get less than win? Are the hints of the show master useful? = E X 6, i.e., with ten opportunities to change your mind, you should have a quite Let us first analyze the obvious algorithm — you always follows the show master. [ ] < good chance to win. Since most people lose, the “hints” given by the show master are The worst case is that he makes you open the boxes in order of increasing weight. You actually contraproductive. You would fare much better by ignoring him. Here is the would have to swap money 100 times to find the maximum and you lose after box 10. algorithm: Choose a random permutation of the boxes initially and stick to that order The candidates and viewers, would hate the show master and he would be fired soon. regardless what the show master says. The expected number of left-right maxima for Hence, worst case analysis is quite unrealistic in this case. The best case is that the this random choice will be below six and you have a good chance to win. You have show master immediately tells you the best box. You would be happy but there would just seen a randomized algorithm. In this simple case it was possible to permute the be no time to place advertisements so that the show master would again be fired. Best input so that an average case analysis applies. Note that not all randomized algorithms case analysis is also unrealistic here. follow this pattern. In particular, for many applications there is no way to obtain an So let us try average case analysis. Mathematically speaking, we are inspecting equivalent average case input from an arbitrary (e.g. worst case) input.[preview of a sequence m ,...,m from left to right and we look for the number X of left-right 1 n randomized algorithms in this book. define Las Vegas and Monte Carlo? do maxima, i.e.,h those elementsi m such that j < i : m > m . In Theorem 10.11 and i i j we have any Monte Carlo algs?] = Exercise 11.6 we will see algorithmic applications.∀ Randomized algorithms come in two main varieties. We have just seen a Las Vegas ⇐ For small n we could use a brute force approach: Try all n! permutations, sum up algorithm, i.e., an algorithms that always computes the correct output but where the the number of left right maxima, and divide by n! to obtain the average case number run time is a random variable. In contrast, a Monte Carlo algorithm always has the of left right maxima. For larger n we could use combinatorics to count the number same run time yet there is a nonzero probability that it gives an incorrect answer. of permutations c that lead to X = i. The average value of X is then ∑n c /n!. We i i=1 i [more?] 
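The expectation E[X] = H_n is also easy to check empirically. The following self-contained C++ program is our own illustration (not part of the book's pseudocode; all names are ours): it simulates the random-permutation strategy just described by opening the boxes in a uniformly random order and counting the left-right maxima, i.e., the number of times the candidate swaps. For n = 100 the observed average comes out close to H_100 ≈ 5.19, comfortably below the ten allowed swaps.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
  const int n = 100;          // number of boxes
  const int trials = 100000;  // number of simulated games
  std::mt19937 gen(42);
  std::vector<int> box(n);
  std::iota(box.begin(), box.end(), 1);       // distinct amounts 1..n

  long long totalMaxima = 0;
  for (int t = 0; t < trials; ++t) {
    std::shuffle(box.begin(), box.end(), gen);  // open boxes in random order
    int best = 0, maxima = 0;
    for (int x : box)
      if (x > best) { best = x; ++maxima; }     // a left-right maximum: we swap
    totalMaxima += maxima;
  }
  double avg = double(totalMaxima) / trials;
  double Hn = 0;
  for (int i = 1; i <= n; ++i) Hn += 1.0 / i;
  std::cout << "average #swaps = " << avg
            << ", H_n = " << Hn
            << ", ln n + 1 = " << std::log(n) + 1 << '\n';
}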
For example, in Exercise 5.5 we outline an algorithm for checking whether = use a slightly different approach that simplfies the task: We reformulate the problem two sequences are permutations of each other that with small probability may may ⇐ in terms of probability theory. We will assume here that you are familiar with basic errorneously answer “yes”.[primality testing in further findings?] By running a = concepts of probability theory but you can find a short review in Appendix A.2. Monte Calo algorithm several times using different random bit, the error probability ⇐ The set of all possible permutations becomes the probability space Ω. Every par- can be made aribrarily small. ticular order has a probability of 1/n! to be drawn. The number of left-right maxima X becomes a random variable and the average number of left-right maxima X is the Exercise 2.7 Suppose you have a Las Vegas algorithm with expected execution time expectation E[X]. We define indicator random variables X = 1 if m is a left-right max- i i t(n). Explain how to convert it to a Monte Carlo algorithm that gives no answer with imum and X = 0 otherwise. Hence, X = ∑n X . Using the linearity of expectation i i=1 i 1 we get probability at most p and has a deterministic execution time guarantee of O t(n)log p . n n n Hint: Use an algorithm that is easy to frustrate and starts from scratch ratherthan wait- E[X] = E[∑ Xi] = ∑ E[Xi] = ∑ prob(Xi = 1) . ing for too long. i=1 i=1 i=1 The probablity that the i-th element is larger than the elements preceeding it is simply Exercise 2.8 Suppose you have a Monte Carlo algorithm with execution time m(n) 1/i. We get that gives a correct answer with probability p and a deterministic algorithm that ver- n 1 E[X] = ∑ . ifies in time v(n) whether the Monte Carlo algorithm has given the correct answer. i=1 n Explain how to use these two algorithms to obtain a Las Vegas algorithm with ex- 1 pected execution time 1 p (m(n) + v(n)). Since this sum shows up again and again it has been given the name Hn — the n- − 32 Introduction 2.8 Graphs 33

Exercise 2.9 Can you give a situation where it might make sense to combine the s u s x K 1 t 5 two previous algorithm to transform a Las Vegas algorithm first into a Monte Carlo 1 2 algorithm and then back into a Las Vegas Algorithm? z y w v t 1 U = [forward refs to probabilistic analysis in this book] self−loop ⇒ 2 −2 1 1 1 H w w x u u K3,3 2.7 Data Structures for Sets and Sequences 1 1 1 2 v v u w v w v Building blocks for many algorithms are data structures maintaining sets and se- G undirected bidirected quences. To underline the common aspects of these data structures, we introduce the most important operations already here. We give an abstract implementation in terms of mathematical notation. Chapters 3, 4, ??, 6, and 7 explain more concrete Figure 2.4: Some graphs. implementations in terms of arrays, objects, and pointers. In the following let e denotes an Element of a set sequence. key(e) is the key of an element. For simplicity we assume that different elements have different keys. k makes sense to view objects as nodes and pointers as edges between the object storing denotes some other value of type Key; h is a Handle of an element, i.e., a reference or the pointer and the object pointed to. pointer to an element in the set or sequence. When humans think about graphs, they usually find it convenient to work with pictures showing nodes as bullets and lines and arrows as edges. For treating them Class Set of Element algorithmically, a more mathematical notation is needed: A directed graph G = (V,E) Let M denote the set stored in the object is described by a pair consisting of the node set (or vertex set) V and the edge set E Procedureinsert(e) M:= M e V V. For example, Figure 2.4 shows the graph G = ( s,t,u,v,w,x,y,z , (s,t), (t,u∈), Procedureremove(k) e :=∪ {e } M : key(e) = k ; M:= M e (u×,v),(v,w),(w,x),(x,y),(y,z),(z,s),(s,v),(z,w),(y,t){,(x,u) ). Throughout} { this book Procedureremove(h) {e:=}h; { M∈:= M e } \ { } we use the convention n = V and m = E if no other definitions} for n or m are given. ProceduredecreaseKey(h,k) assert ke\y({h)} x; key(h):= k An edge e = (u,v) E represents| | a connection| | from u to v. We say that v is adjacent Function deleteMin e:= minM; M:= M ≥ e return e to u. The special case∈ of a self-loop (v,v) is disallowed unless specifically mentioned. Function find(k) h := e : key(e) = k ; \ {retur} n h { } { } [do we need multigraphs and pseudographs?] [do we need head and tail?] = Function locate(k) h:= min e : key(e) k ; return h ⇐= { ≥ } The the outdegree of a node v is (v,u) E and its indegree is (u,v) E . ⇐ For example, node w in Figure 2.4 has|{indegree∈ tw}|o and outdegree one.|{ ∈ }| 2.8 Graphs A bidirected graph is a directed graph where for any edge (u,v) also the reverse edge (v,u) is present. An undirected graph can be viewed as a streamlined represen- [Aufmacher? Koenigsberger Brueckenproblem ist historisch nett aber didak- tation of a bidirected graph where we write a pair of edges (u,v), (v,u) as the two tisch bloed weil es ein Multigraph ist und Eulertouren auch sonst nicht unbed- element set u,v . Figure 2.4 shows a three node undirected graph and its bidirected = ingt gebraucht werden.] 
counterpart.{Most} graph theoretic terms for undirected graphs have the same definition ⇒ Graphs are perhaps the single most important concept of algorithmics because they as for their bidirected counterparts so that this sections concentrates on directed graphs are a useful tool whenever we want to model objects (nodes) and relations between and only mentions undirected graphs when there is something special about them. For them (edges). Besides obvious applications like road maps or communication net- example, the number of edges of an undirected graph is only half the number of edges works, there are many more abstract applications. For example, nodes could be tasks of its bidirected counterpart. Since indegree and outdegree of nodes of undirected to be completed when building a house like “build the walls” or “put in the windows” graphs are identical, we simple talk about their degree. Undirected graphs are impor- and edges model precedence relations like “the walls have to be build before the win- tant because directions often do not matter and because many problems are easier to dows can be put in”. We will also see many examples of data structures where it solve (or even to define) for undirected graphs than for general directed graphs. 34 Introduction 2.8 Graphs 35

undirected directed expression A graph G0 = (V 0,E0) is a subgraph of G if V 0 V and E0 E. Given G = (V,E) ⊆ ⊆ 2 r r r + and a subset V 0 V, we get the subgraph induced by V 0 as G0 = (V 0,E V 0 ). In Figure 2.4, the node⊆ set v,w in G induces the subgraph H = ( v,w , (∩v,w) ). A { } { } { } s t u s t u s t u a / subset E0 E of edges induces the subgraph G0 = (V,E0). Often⊆additional information is associated with nodes or edges. In particular, we

will often need edge weights or costs c : E mapping edges to some numeric value. v v rooted v 2 b For example, the edge (z,w) in Figure 2.4→has weight c((z,w)) = 2. Note that edge u,v of an undirected graph has a unique edge weight whereas in−a bidirected graph = {we can} have c((u,v)) = c((v,u)).[needed?] Figure 2.5: Different kinds of trees. ⇒ We have now seen6 quite a lot of definitions for one page of text. If you cannot wait how to implement this mathematics in a computer program, you can have a glimpse component. The node set u,w induces a connected subgraph but it is not maximal at Chapter 8 already now. But things also get more interesting here. { } Perhaps the most central higher level graph theoretic concept is the connection and hence not a component. of nodes by paths. A path p = v0,...,vk of length k connects nodes v0 and vk if An undirected graph is a tree if there is exactly one path between any pair of nodes. h i subsequent nodes in p are connected by edges in E, i.e, (v0,v1) E, (v1,v2) E, ∈ ∈ . . . , (vk 1,vk) E. Sometimes a path is also represented by the resulting sequence of Exercise 2.10 Prove that the following three properties of an undirected graph G are edges. −For example,∈ u,v,w = (u,v),(v,w) is a path of length two in Figure 2.4. equivalent: = If not otherwise stated[h checik] pathsh are assumedi to be simple, i.e., all nodes in p ⇒ are different. Paths that need not be simple are called walks[check]. In Figure 2.4, G is a tree. = • ⇒ z,w,x,u,v,w,x,y is such a non-simple path. h i G is connected and has exactly n 1 edges. Cycles are paths that share the first and the last node. Analogously to paths, cycles • − are by default assumed to be simple, i.e, all but the first and the last nodes on the G is connected and contains no cycles. path are different. With s,t,u,v,w,x,y,z,s , graph G in Figure 2.4 has a Hamiltonian • h i cycle, i.e., a simple cycle visiting all nodes. A simple undirected cycle has at least Similarly, an undirected graph is a forest if there is at most one path between any pair three nodes since we also do not allow edges to be used twice. of nodes. Note that each component of a forest is a tree. The concept of paths and cycles helps us to define yet higher level concepts. A When we look for the corresponding concept in directed graphs, the above equiv- graph is strongly connected, if all its nodes are connected by some path, i.e, alence breaks down. For example, a directed acyclic graph may have many more than check G is strongly connected u,v V : pathp : p connects u and v . n 1 edges. A directed graph is a tree[ ] if there is a root node r such that there = ⇔ ∀ ∈ ∃ is−exactly one path from r to any other node or if there is exactly one path from any ⇐ Graph G in Figure 2.4 is strongly connected. However, by removing edge (w,x), we other node to r. The latter version is called a rooted tree[check]. As you see in Fig- = get a graph without any directed cycles. In such a directed acyclic graph (DAG) every ure 2.5, computer scientists have the peculiar habit to draw trees with the root at the ⇐ strongly connected component is a trivial component of size one. top and all edges going downward.[do we need roots of undirected trees?] Edges = For an undirected graph the “strongly” is dropped and one simply talks about go between a unique parent and its children. Nodes with the same parent are siblings. ⇐ connectedness. A related concept are (strongly) connected components, i.e., maximal Nodes without successors are leaves. 
Nonroot, nonleaf nodes are interior nodes. Con- subsets of nodes that induce a (strongly) connected subgraph. Maximal means that the sider a path such that u is between the root and another node v. Then u is an ancestor set cannot be enlarged by adding more nodes without destroying the connectedness of v and v is a descendant of u. A node u and its descendants form a subtree rooted property. You should not mix up maximal with maximum. For example, a maximum at v. For example, in Figure 2.5 r is the root; s, t, and v are leaves; s, t, and u are connected subgraph would be the largest connected component. For example, graph siblings because they are children of the same parent r; v is an interior node; r and u U in Figure 2.4 contains the connected components u,v,w , s,t , and x . All are ancestors of v; s, t, u and v are descendants of r; v and u form a subtree rooted at these subgraphs are maximal connected subgraphs but{ only }u,{v,w} is a maximum{ } u. { } 36 Introduction 2.9 Implementation Notes 37
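Chapter 8 discusses graph representations in detail. As a small preview — our own sketch, not the book's code — the directed graph G of Figure 2.4 can be stored as adjacency lists, and the degree claims made above can be checked directly.

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main() {
  using Edge = std::pair<char, char>;
  // the edge set of graph G in Figure 2.4
  std::vector<Edge> E = {{'s','t'},{'t','u'},{'u','v'},{'v','w'},{'w','x'},
                         {'x','y'},{'y','z'},{'z','s'},{'s','v'},{'z','w'},
                         {'y','t'},{'x','u'}};
  std::map<char, std::vector<char>> adj;   // adjacency lists
  std::map<char, int> indeg, outdeg;
  for (auto [u, v] : E) {
    adj[u].push_back(v);
    ++outdeg[u];
    ++indeg[v];
  }
  // node w should have indegree 2 and outdegree 1, as stated in the text
  std::cout << "indeg(w) = " << indeg['w']
            << ", outdeg(w) = " << outdeg['w'] << '\n';
  std::cout << "adj(s):";
  for (char v : adj['s']) std::cout << ' ' << v;
  std::cout << "\nn = " << adj.size() << ", m = " << E.size() << '\n';
}

Running it prints indeg(w) = 2, outdeg(w) = 1 and n = 8, m = 12, matching the convention n = |V| and m = |E| used throughout this book.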

We finish this barrage of definitions by giving at least one nontrivial graph algo-

rithm. Suppose we want to test whether a directed graph is acyclic. We use the simple Function eval(r) : observation that a node v with outdegree zero cannot appear in any cycle. Hence, by if r is a leaf then return the number stored in r deleting v (and its incoming edges) from the graph, we obtain a new graph G0 that is else // r is an operator node acyclic if and only if G is acyclic. By iterating this transformation, we either arrive at v1:= eval(first child of r) the empty graph which is certainly acyclic, or we obtain a graph G1 where every node v2:= eval(second child of r) has outdegree at least one. In the latter case, it is easy to find a cycle: Start at any node return v1operator(r)v2 // apply the operator stored in r v and construct a path p by repeatedly choosing an arbitrary outgoing edge until you reach a node v that you have seen before. Path p will have the form (v,...,v ,...,v ), 0 0 0 Figure 2.6: Evaluate an expression tree rooted at r. i.e., the part (v0,...,v0) of p forms a cycle. For example, in Figure 2.4 graph G has no node with outdegree zero. To find a cycle, we might start at node z and follow the walk z,w,x,u,v,w until we encounter w a second time. Hence, we have identified Exercises the cycleh w,x,u,v,iw . In contrast, if edge (w,x) is removed, there is no cycle. Indeed, our algorithmh will remoi ve all nodes in the order w, v, u, t, s, z, y, x. [repeat graph Exercise 2.11 Model a part of the street network of your hometown as a directed G, with dashed (w,x), mark the walk w,x,u,v,w , and give the order in which graph. Use an area with as many one-way streets as possible. Check whether the = nodes are removed?] In Chapter 8 weh will see hoiw to represent graphs such that resulting graph is strongly connected.[in dfs chapter?] = ⇐ ⇒ this algorithm can be implemented to run in linear time. See also Exercise 8.3. Exercise 2.12 Describe ten sufficiently different applications that can be modelled using graphs (e.g. not car, bike, and pedestrian networks as different applications). At least five should not be mentioned in this book. Ordered Trees Exercise 2.13 Specify an n node DAG that has n(n 1)/2 edges. Trees play an important role in computer science even if we are not dealing with − graphs otherwise. The reason is that they are ideally suited to represent hierarchies. For example, consider the expression a + 2/b. We have learned that this expression Exercise 2.14 A planar graph is a graph that you can draw on a sheet of paper such means that a and 2/b are added. But deriving this from the sequence of characters that no two edges cross each other. Argue that a street network is not necessarily a,+,2,/,b is difficult. For example, it requires knowledge of the rule that division planar. Show that the graphs K5 and K33 in Figure 2.4 are not planar. bindsh more tightlyi than addition. Therefore computer programs isolate this syntactical knowledge in parsers and produce a more structured representation based on trees. Our example would be transformed into the expression tree given in Figure 2.5. Such 2.9 Implementation Notes trees are directed and in contrast to graph theoretic trees, they are ordered, i.e., the [sth about random number generators, hardware generators] = order of successors matters. For example, if we would swap the order of ‘2’ and ‘b’ ⇐ in our example, we would get the expression a + b/2. Converting our pseudocode into actual programs may take some time and blow up the code but should pose little fundamental problems. 
Below we note a few language To demonstrate that trees are easy to process, Figure 2.6 gives an algorithm for peculiarities. The Eiffel programming language has extensive support for assertions, evaluating expression trees whose leaves are numbers and whose interior nodes are invariants, preconditions, and postconditions. The Eiffel book [?] also contains a binary operators (say +,-,*,/). detailed discussion of the concept of programming by contract. We will see many more examples of ordered trees in this book. Chapters 6 and Our special values , ∞, and ∞ are available for floating point numbers. For 7 use them to represent fundamental data structures and Chapter 12 uses them to other data types, we ha⊥ve −to emulate these values. For example, one could use the systematically explore a space of possible solutions of a problem. smallest and largest representable integers for ∞, and ∞ respectively. Undefined = [what about tree traversal, in/pre/post order?] pointers are often represented as a null pointer. Sometimes,− we use special values for ⇒ 38 Introduction Mehlhorn, Sanders June 11, 2005 39

convenience only and a robust implementation should circumvent using them. [give = example for shortest path or the like] ⇒ C++ Our pseudocode can be viewed as a concise notation for a subset of C++. The memory management operations allocate and dispose are similar to the C++ operations new and delete. Note that none of these functions necessarily guaran- Chapter 3 tees constant time execution. For example, C++ calls the default constructor of each element of a new array, i.e., allocating an array of n objects takes time Ω(n) whereas allocating an array n of ints “may” be done in constant time. In contrast, we assume that all arrays which are not explicitly initialized contain garbage. In C++ you can Representing Sequences by obtain this effect using the C functions malloc and free. However, this is a dep- recated practice and should only be used when array initialization would be a severe performance bottleneck. If memory management of many small objects is perfor- Arrays and Linked Lists mance critical, you can customize it using the allocator class of the C++ standard library.[more refs to good implementations? Lutz fragen. re-crosscheck with = impl. notes of sequence chapter] ⇒ Our parameterization of classes using of is a special case of the C++-template Perhaps the world’s oldest data structures were tablets in cuneiform script used mechanism. The parameters added in brackets after a class name correspond to pa- more than 5000 years ago by custodians in Sumerian temples. They kept lists of goods, rameters of a C++ constructor. their quantities, owners and buyers. The left picture shows an example. Possibly this Assertions are implemented as C-macros in the include file assert.h. By de- was the first application of written language. The operations performed on such lists fault, violated assertions trigger a runtime error and print the line number and file have remained the same — adding entries, storing them for later, searching entries and where the assertion was violated. If the macro NDEBUG is defined, assertion checking changing them, going through a list to compile summaries, etc. The Peruvian quipu is disabled. you see in the right picture served a similar purpose in the Inca empire using knots in colored strings arranged sequentially on a master string. Probably it is easier to Java maintain and use data on tablets than using knotted string, but one would not want to haul stone tablets over Andean mountain trails. Obviously, it makes sense to consider = [what about genericity?] [what about assertions?] Java has no explicit memory =⇒ different representations for the same kind of logical data. ⇒ mangement in particular. Rather, a garbage collector periodically recycles pieces of memory that are no longer referenced. While this simplfies programming enormously, The abstract notion of a sequence, list, or table is very simple and is independent it can be a performance problem in certain situations. Remedies are beyond the scope of its representation in a computer. Mathematically, the only important property is of this book. that the elements of a sequence s = e0,...,en 1 are arranged in a linear order — in contrast to the trees and graphs in hChapters 7− andi 8, or the unordered hash tables discussed in Chapter 4. There are two basic ways to specify elements of a sequence. 2.10 Further Findings One is to specify the index of an element. This is the way we usually think about arrays where s[i] returns the i-th element of sequence s. 
Our pseudocode supports bounded arrays. In a bounded data structure, the memory requirements are known in advance, at the latest when the data structure is created. Section 3.1 starts with unbounded arrays that can adaptively grow and shrink as elements are inserted and removed. The analysis of unbounded arrays introduces the concept of amortized analysis.

Another way to specify elements of a sequence is relative to other elements. For lot of space if we allow b to be much larger than needed. The wasted space can be example, one could ask for the successor of an element e, for the predecessor of an kept small by shrinking b when n becomes too small. Figure 3.1 gives the complete element e0 or for the subsequence e,...,e0 of elements between e and e0. Although pseudocode for an unbounded array class. Growing and shrinking is performed using relative access can be simulated usingh arrayi indexing, we will see in Section 3.2 that the same utility procedure reallocate. sequences represented using pointers to successors and predecessors are more flexible. In particular, it becomes easier to insert or remove arbitrary pieces of a sequence. Amortized Analysis of Unbounded Arrays In many algorithms, it does not matter very much whether sequences are imple- mented using arrays or linked lists because only a very limited set of operations is Our implementation of unbounded arrays follows the algorithm design principle “make needed that can be handled efficiently using either representation. Section 3.3 intro- the common case fast”. Array access with [ ] is as fast as for bounded arrays. Intu- · duces stacks and queues, which are the most common data types of that kind. A more itively, pushBack and popBack should “usually” be fast — we just have to update n. comprehensive account of the zoo of operations available for sequences is given in However, a single insertion into a large array might incur a cost of n. Exercise 3.2 Section 3.4. asks you to give an example, where almost every call of pushBack and popBack is expensive if we make a seemingly harmless change in the algorithm. We now show that such a situation cannot happen for our implementation. Although some isolated 3.1 Unbounded Arrays procedure calls might be expensive, they are always rare, regardless of what sequence of operations we execute. Consider an that besides the indexing operation [ ], supports the following operations pushBack, popBack, and size. · Lemma 3.1 Consider an unbounded array u that is initially empty. Any sequence σ = σ1,...,σm of pushBack or popBack operations on u is executed in time O(m). h i e0,...,en .pushBack(e) = e0,...,en,e h i h i If we divide the total cost for the operations in σ by the number of operations, e0,...,en .popBack = e0,...,en 1 h i h − i we get a constant. Hence, it makes sense to attribute constant cost to each operation. size( e0,...,en 1 ) = n Such costs are called amortized costs. The usage of the term amortized is similar to h − i its general usage, but it avoids some common pitfalls. “I am going to cycle to work Why are unbounded arrays important? Often, because we do not know in advance how every day from now on and hence it is justified to buy a luxury bike. The cost per large an array should be. Here is a typical example: Suppose you want to implement ride is very small — this investment will amortize” Does this kind of reasoning sound = the Unix command sort for sorting[explain somewhere in intro?] the lines of a ⇒ familiar to you? The bike is bought, it rains, and all good intentions are gone. The file. You decide to read the file into an array of lines, sort the array internally, and bike has not amortized. finally output the sorted array. With unbounded arrays this is easy. With bounded In computer science we insist on amortization. 
We are free to assign arbitrary amortized costs to operations but they are only correct if the sum of the amortized costs over any sequence of operations is never below the actual cost. Using the notion of amortized costs, we can reformulate Lemma 3.1 more elegantly to allow direct comparisons with other data structures.

Corollary 3.2 Unbounded arrays implement the operation [·] in worst case constant time and the operations pushBack and popBack in amortized constant time.

To prove Lemma 3.1, we use the accounting method. Most of us have already used this approach because it is the basic idea behind an insurance. For example, when you rent a car, in most cases you also have to buy an insurance that covers the ruinous costs you could incur by causing an accident. Similarly, we force all calls to pushBack and

popBack to buy an insurance against a possible call of reallocate. The cost of the insurance is put on an account. If a reallocate should actually become necessary, the responsible call to pushBack or popBack does not need to pay but it is allowed to use previous deposits on the insurance account. What remains to be shown is that the account will always be large enough to cover all possible costs.

Class UArray of Element
  Constant β = 2 : ℝ₊                    // growth factor
  Constant α = β² : ℝ₊                   // worst case memory blowup
  w = 1 : ℕ                              // allocated size
  n = 0 : ℕ                              // current size; invariant n ≤ w
  b : Array [0..w−1] of Element          // b → ⟨e₀, ..., e_{n−1}⟩ followed by unused entries

  Operator [i : ℕ] : Element
    assert 0 ≤ i < n
    return b[i]

  Function size : ℕ   return n

  Procedure pushBack(e : Element)
    if n = w then
      reallocate(βn)
    // For the analysis: pay insurance here.
    b[n] := e
    n++

  Procedure popBack
    assert n > 0
    n−−
    if w ≥ αn ∧ n > 0 then               // reduce waste of space
      reallocate(βn)

  Procedure reallocate(w′ : ℕ)
    w := w′
    b′ := allocate Array [0..w′−1] of Element
    (b′[0], ..., b′[n−1]) := (b[0], ..., b[n−1])
    dispose b
    b := b′                              // pointer assignment

Figure 3.1: Unbounded arrays

Proof: Let m′ denote the total number of elements copied in calls of reallocate. The total cost incurred by the calls in the operation sequence σ is O(m + m′). Hence, it suffices to show that m′ = O(m). Our unit of cost is now the cost of one element copy.

For β = 2 and α = 4, we require an insurance of 3 units from each call of pushBack and claim that this suffices to pay for all calls of reallocate by both pushBack and popBack. (Exercise 3.4 asks you to prove that for general β and α = β² an insurance of (β + 1)/(β − 1) units is sufficient.)

We prove by induction over the calls of reallocate that immediately after the call there are at least n units left on the insurance account.

First call of reallocate: The first call grows w from 1 to 2 after at least one[two?] call of pushBack. We have n = 1 and 3 − 1 = 2 > 1 units left on the insurance account.

For the induction step we prove that 2n units are on the account immediately before the current call to reallocate. Only n elements are copied, leaving n units on the account — enough to maintain our invariant. The two cases in which reallocate may be called are analyzed separately.

pushBack grows the array: The number of elements n has doubled since the last reallocate, when at least n/2 units were left on the account by the induction hypothesis.[forgot the case where the last reallocate was a shrink.] The n/2 new elements paid 3n/2 units, giving a total of 2n units.

popBack shrinks the array: The number of elements has halved since the last reallocate, when at least 2n units were left on the account by the induction hypothesis.

Exercises

Exercise 3.1 (Amortized analysis of binary counters.) Consider a nonnegative integer c represented by an array of binary digits and a sequence of m increment and decrement operations. Initially, c = 0.

a) What is the worst case execution time of an increment or a decrement as a function of m? Assume that you can only work at one bit per step.

b) Prove that the amortized cost of increments is constant if there are no decre- Exercise 3.8 (Sparse arrays) Implement a bounded array with constant time for al- ments. locating the array and constant amortized time for operation [ ]. As in the previous exercise, a read access to an array entry that was never written should· return . Note Θ c) Give a sequence of m increment and decrement operations that has cost (mlogm). that you cannot make any assumptions on the contents of a freshly allocated⊥ array. d) Give a representation of counters such that you can achieve worst case constant Hint: Never explicitly write default values into entries with undefined value. Store time for increment and decrement. (Here you may assume that m is known in the elements that actually have a nondefault value in arbitrary order in a separate data advance.) structure. Each entry in this data structure also stores its position in the array. The array itself only stores references to this secondary data structure. The main issue is e) Consider the representation of counters where each digit di is allowed to take now how to find out whether an array element has ever been written. i values from 1,0,1 so that the counter is c = ∑i di2 . Show that in this redun- dant ternary{−number}system increments and decrements have constant amor- tized cost. 3.2 Linked Lists

Exercise 3.2 Your manager asks you whether it is not too wasteful to shrink an array In this section we exploit the approach of representing sequences by storing pointers only when already three fourths of b are unused. He proposes to shrink it already when to successor and predecessor with each list element. A good way to think of such w = n/2. Convince him that this is a bad idea by giving a sequence of m pushBack and linked lists is to imagine a chain where one element is written on each link. Once we Θ 2 popBack operations that would need time m if his proposal were implemented. get hold of one link of the chain, we can retrieve all elements by exploiting the fact that the links of the chain are forged together. Section 3.2.1 explains this idea. In Exercise 3.3 (Popping many elements.) Explain  how to implement the operation popBack k ( )Section 3.2.2 we outline how many of the operations can still be performed if we only that removes the last k elements in amortized constant time independent of k. Hint: store successors. This is more space efficient and somewhat faster. The existing analysis is very easy to generalize.

Exercise 3.4 (General space time tradeoff) Generalize the proof of Lemma 3.1 for 3.2.1 Doubly Linked Lists β α β2 β+1 general and = . Show that an insurance of β 1 units paid by calls of pushBack suffices to pay for all calls of reallocate. − Figure 3.2 shows the basic building block of a linked list. A list item (a link of a chain) stores one element and pointers to successor and predecessor. This sounds *Exercise 3.5 We have not justified the relation α = β2 in our analysis. Prove that simple enough, but pointers are so powerful that we can make a big mess if we are not any other choice of α leads to higher insurance costs for calls of pushBack. Is α = β2 careful. What makes a consistent list data structure? We make a simple and innocent still optimal if we also require an insurance from popBack? (Assume that we now looking decision and the basic design of our list data structure will follow from that: want to minimize the maximum insurance of any operation.) The successor of the predecessor of an item must be the original item, and the same holds for the predecessor of a successor. Exercise 3.6 (Worst case constant access time) Suppose for a real time application If all items fulfill this invariant, they will form a collection of cyclic chains. This you need an unbounded array data structure with worst case constant execution time may look strange, since we want to represent sequences rather than loops. Sequences for all operations. Design such a data structure. Hint: Store the elements in up to have a start and an end, wheras loops have neither. Most implementations of linked two arrays. Start moving elements to a larger array well before the small array is lists therefore go a different way, and treat the first and last item of a list differently. completely exhausted. Unfortunately, this makes the implementation of lists more complicated, more error- prone and somewhat slower. Therefore, we stick to the simple cyclic internal repre- Exercise 3.7 (Implicitly growing arrays) Implement an unbounded array where the sentation. Later we hide the representation from the user interface by providing a list operation [i] allows any positive index. When i n, the array is implicitly grown to data type that “simulates” lists with a start and an end. size n = i + 1. When n w, the array is reallocated≥ as for UArray. Initialize entries For conciseness, we implement all basic list operations in terms of the single op- that have never been written≥ with some default value . eration splice depicted in Figure 3.2. splice cuts out a sublist from one list and inserts ⊥ 46 Representing Sequences by Arrays and Linked Lists 3.2 Linked Lists 47

Type Handle = Pointer to Item Function info(a : Handle) return a e Function succ(a : Handle) return a→next Function pred(a : Handle) return a→prev → Class Item of Element // one link in a doubly linked list e : Element - - - it after some target item. The target can be either in the same list or in a different list next : Handle //    prev : Handle but it must not be inside the sublist. invariant next prev=prev next=this Since splice never changes the number of items in the system, we assume that there → → is one special list freeList that keeps a supply of unused elements. When inserting new elements into a list, we take the necessary items from freeList and when deleting // Remove a b from its current list and insert it after t ,..., elements we return the corresponding items to freeList. The function checkFreeList // a a h b b i t t a b t a b t ..., 0, ,..., , 0,..., , 0,...) (..., 0, 0,..., , ,..., , 0,...) allocates memory for new items when necessary. We defer its implementation to Procedure splice a,b,t : Handle7→ ( ) Exercise 3.11 and a short discussion in Section 3.5. assert b is not before a t a,...,b ∧ 6∈ h i It remains to decide how to simulate the start and end of a list. The class List in // Cut out a,...,b a0 a b b0 h i - - - - Figure 3.3 introduces a dummy item h that does not store any element but seperates a0 := a prev     → ··· the first element from the last element in the cycle formed by the list. By definition b := b next ··· 0 → of Item, h points to the first “proper” item as a successor and to the last item as a a0 next := b0 // - -R - → predecessor. In addition, a handle head pointing to h can be used to encode a position b prev := a //   ··· 0→ 0 Y ··· before the first element or after the last element. Note that there are n + 1 possible t a b t0 positions for inserting an element into an list with n elements so that an additional // insert a,...,b after t - R - item is hard to circumvent if we want to code handles as pointers to items. h i t := t next //  ··· With these conventions in place, a large number of useful operations can be im- 0 Y ··· → plemented as one line functions that all run in constant time. Thanks to the power of b next := t’ // - -R - → splice, we can even manipulate arbitrarily long sublists in constant time. Figure 3.3 a prev := t //   ··· → Y ··· gives typical examples. t next := a // - - - - The dummy header can also be useful for other operations. For example consider → t prev := b //   ···  the following code for finding the next occurence of x starting at item from. If x is not 0 ··· → present, head should be returned.

Figure 3.2: Low level list labor. Function findNext(x : Element; from : Handle) : Handle h.e = x // Sentinel x while from e= x do - - - from :=→from6 next  ···  ··· return from → 48 Representing Sequences by Arrays and Linked Lists 3.2 Linked Lists 49
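For readers who prefer to see the pointer gymnastics in a concrete language, here is a minimal C++ sketch of Item and splice corresponding to Figure 3.2. It is our own illustration under our own names, not the book's reference code; error handling and memory management are omitted.

#include <iostream>

template <class T>
struct Item {
  T e{};
  Item* next;
  Item* prev;
  Item() : next(this), prev(this) {}  // a fresh item is a cycle of length one
};

// Precondition: b is not before a, and t is not inside <a,...,b>.
template <class T>
void splice(Item<T>* a, Item<T>* b, Item<T>* t) {
  // cut out <a,...,b>
  Item<T>* ap = a->prev;
  Item<T>* bp = b->next;
  ap->next = bp;
  bp->prev = ap;
  // insert <a,...,b> after t
  Item<T>* tp = t->next;
  b->next = tp;
  a->prev = t;
  t->next = a;
  tp->prev = b;
}

int main() {
  // build a cyclic chain h -> 1 -> 2 -> 3, with h acting as a dummy header
  Item<int> h, a, b, c;
  a.e = 1; b.e = 2; c.e = 3;
  splice(&a, &a, &h);   // <h,1>
  splice(&b, &b, &a);   // <h,1,2>
  splice(&c, &c, &b);   // <h,1,2,3>
  splice(&a, &b, &c);   // move the sublist <1,2> behind 3: <h,3,1,2>
  for (Item<int>* it = h.next; it != &h; it = it->next) std::cout << it->e << ' ';
  std::cout << '\n';    // prints 3 1 2
}

The last splice call moves an entire sublist without copying a single element — exactly six pointers change, regardless of how long the sublist is.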

We use the header as a sentinel. A sentinel is a dummy element in a data structure that Class List of Element makes sure that some loop will terminate. By storing the key we are looking for in the // Item h is the predecessor of the first element header, we make sure that the search terminates even if x is originally not present in // Item h is the successor of the last element. the list. This trick saves us an additional test in each iteration whether the end of the - list is reached. h= this⊥ : Item // empty list with dummy item only ⊥ hi  this  // Simple access functions Maintaining the Size of a List Function head() : Handle; return address of h// Pos. before any proper element In our simple list data type it not possible to find out the number of elements in con- Function isEmpty : 0,1 ; return h.next = this // ? { } hi stant time. This can be fixed by introducing a member variable size that is updated Function first : Handle; assert isEmpty; return h.next whenever the number of elements changes. Operations that affect several lists now Function last : Handle; assert ¬isEmpty; return h.prev ¬ need to know about the lists involved even if low level functions like splice would // Moving elements around within a sequence. only need handles of the items involved. For example, consider the following code for // ( ...,a,b,c...,a ,c ,... ) ( ...,a,c...,a ,b,c ,... ) moving an element from one list to another: h 0 0 i 7→ h 0 0 i Procedure moveAfter(b,a’ : Handle) splice(b,b,a0) Procedure moveToFront b : Handle moveAfter b head // ( ...,a,b,c... , ...,a0,c0,... ) ( ...,a,c... , ... ,a0,b,c0,... ) ( ) ( , ) h i h i 7→ h i h i Procedure moveToBack(b : Handle) moveAfter(b,last) Procedure moveAfter(b, a0 : Handle; o : List) splice(b,b,a0) // Deleting and inserting elements. // ...,a,b,c,...... ,a,c,... h i 7→ h i size Procedure remove(b : Handle) moveAfter(b, freeList.head) o.size−−++ Procedure popFront remove(first) Procedure popBack remove(last) // ...,a,b,...... ,a,e,b,... Interfaces of list data types should require this information even if size is not main- Functionh insertAfteri 7→ h (x : Element;i a : Handle) : Handle tained so that the data type remains interchangable with other implementations. checkFreeList // make sure freeList is nonempty. See also Exercise 3.11 A more serious problem is that operations that move around sublists beween lists cannot be implemented in constant time any more if size is to be maintained. Exer- a0 := freeList.first cise 3.15 proposes a compromise. moveAfter(a0,a) a’ e=x → return a’ 3.2.2 Singly Linked Lists Function insertBefore(x : Element; b : Handle) : Handle return insertAfter(e, pred(b)) Procedure pushFront(x : Element) insertAfter(x, head) The many pointers used by doubly linked lists makes programming quite comfortable. Procedure pushBack(x : Element) insertAfter(x, last) Singly linked lists are the lean and mean sisters of doubly linked lists. SItems scrap the prev pointer and only store next. This makes singly linked lists more space efficient // Manipulations of entire lists and often faster than their doubly linked brothers. The price we pay is that some // ( a,...,b , c,...,d ) ( a,...,b,c,...,d , ) operations can no longer be performed in constant time. For example, we cannot h i h i 7→ h i hi Procedure concat(o : List) remove an SItem if we do not know its predecessor. Table 3.1 gives an overview splice(o.first, o.last, head) of constant time operations for different implementations of sequences. 
We will see several applications of singly linked lists, for example for hash tables in Section 4.1 and for mergesort in Section 5.2. In particular, we can use singly linked lists to implement the free lists of memory managers — even for items of doubly linked lists.

Figure 3.3: Some constant time operations on doubly linked lists. 50 Representing Sequences by Arrays and Linked Lists 3.3 Stacks and Queues 51
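To complement the pseudocode of Figure 3.3, here is a compact C++ sketch — again our own simplified illustration, not the book's implementation — of a list with a dummy header item. It shows how the header closes the cycle, how insertAfter works (allocating directly instead of using a free list), and how the header doubles as a sentinel in findNext.

#include <iostream>

struct Node { int e; Node* next; Node* prev; };

struct List {
  Node h;                               // dummy item: the list is empty iff h.next == &h
  List() { h.next = h.prev = &h; }
  bool isEmpty() const { return h.next == &h; }
  Node* head() { return &h; }
  void insertAfter(int x, Node* a) {    // simplified: allocates directly, no free list,
    Node* n = new Node{x, a->next, a};  // and items are never recycled
    a->next->prev = n;
    a->next = n;
  }
  void pushBack(int x) { insertAfter(x, h.prev); }
  // sentinel search: store x in the header so the loop needs no end-of-list test
  Node* findNext(int x, Node* from) {
    h.e = x;
    while (from->e != x) from = from->next;
    return from;                        // == head() if x does not occur
  }
};

int main() {
  List l;
  for (int x : {4, 8, 15, 16}) l.pushBack(x);
  std::cout << (l.findNext(15, l.h.next) != l.head()) << ' '   // 1: found
            << (l.findNext(23, l.h.next) == l.head()) << '\n'; // 1: not found
}

Note that findNext returns head() when x does not occur, exactly as specified in the text.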

We can adopt the implementation approach from doubly linked lists. SItems form Exercise 3.13 (Acyclic list implementation.) Give an alternative implementation of collections of cycles and an SList has a dummy SItem h that precedes the first proper List that does not need the dummy item h and encodes head as a null pointer. The element and is the successor of the last proper element. Many operations of Lists can interface and the asymptotic execution times of all operations should remain the same. still be performed if we change the interface. For example, the following implemen- Give at least one advantage and one disadvantag of this implementation compared to tation of splice needs the predecessor of the first element of the sublist to be moved. the algorithm from Figure 3.3. // ( ...,a ,a,...,b,b ... , ...,t,t ,... ) ( ...,a ,b ... , ... ,t,a,...,b,t ,... ) h 0 0 i h 0 i 7→ h 0 0 i h 0 i Exercise 3.14 findNext using sentinels is faster than an implementation that checks Procedure splice(a0,b,t : SHandle) a0 a b z b0 - - - - for the end of the list in each iteration. But how much faster? What speed difference do a0 next b next 3 ··· j you predict for many searches in a small list with 100 elements, or for a large list with t →next := a→ next // - - → 0 10 000 000 elements respectively? Why is the relative speed difference dependent on  b next   t →next  → → t t0 the size of the list? Similarly, findNext should not return the handle of the SItem with the next fit but Exercise 3.15 Design a list data type that allows sublists to be moved between lists its predecessor. This way it remains possible to remove the element found. A useful in constant time and allows constant time access to size whenever sublist operations addition to SList is a pointer to the last element because then we can support pushBack have not been used since the last access to the list size. When sublist operations have in constant time. We leave the details of an implementation of singly linked lists to been used size is only recomputed when needed. = Exercise 3.17. [move some exercises] ⇒ Exercise 3.16 Explain how the operations remove, insertAfter, and concat have to be Exercises modified to keep track of the length of a List.

Exercise 3.9 Prove formally that items of doubly linked lists fulfilling the invariant Exercise 3.17 Implement classes SHandle, SItem, and SList for singly linked lists in next prev = prev next = this form a collection of cyclic chains. → → analogy to Handle, Item, and List. Support all functions that can be implemented to run in constant time. Operations head, first, last, isEmpty, popFront, pushFront, Exercise 3.10 Implement a procudure swap similar to splice that swaps two sublists pushBack, insertAfter, concat, and makeEmpty should have the same interface as in constant time before. Operations moveAfter, moveToFront, moveToBack, remove, popFront, and ( ...,a ,a,...,b,b ,... , ...,c ,c,...,d,d ,... ) findNext need different interfaces. h 0 0 i h 0 0 i 7→ ( ...,a0,c,...,d,b0,... , ...,c0,a,...,b,d0,... ) . Canh you view splice asiahspecial case of swap?i 3.3 Stacks and Queues

Exercise 3.11 (Memory mangagement for lists) Implement the function checkFreelist Sequences are often used in a rather limited way. Let us again start with examples called by insertAfter in Figure 3.3. Since an individual call of the programming from precomputer days. Sometimes a clerk tends to work in the following way: There language primitive allocate for every single item might be too slow, your function is a stack of files on his desk that he should work on. New files are dumped on the top should allocate space for items in large batches. The worst case execution time of of the stack. When he processes the next file he also takes it from the top of the stack. checkFreeList should be independent of the batch size. Hint: In addition to freeList The easy handling of this “data structure” justifies its use as long as no time-critical use a small array of free items. jobs are forgotten. In the terminology of the preceeding sections, a stack is a sequence that only supports the operations pushBack, popBack, and last. In the following we Exercise 3.12 Give a constant time implementation for rotating a list right: a,...,b,c will use the simplified names push, pop, and top for these three operations on stacks. c,a,...,b . Generalize your algorithm to rotate sequence a,...,b,c,...,d hto c,...,di,7→a,...,b We get a different bahavior when people stand in line waiting for service at a inh constantitime. h i h posti office. New customers join the line at one end and leave it at the other end. 52 Representing Sequences by Arrays and Linked Lists 3.3 Stacks and Queues 53

stack port worst case constant access time and are very space efficient. Exercise 3.20 asks ... you to design stacks and queues that even work if the data will not fit in main memory. It goes without saying that all implementations of stacks and queues described here FIFO queue can easily be augmented to support size in constant time. ... Class BoundedFIFO(n : ) of Element n 0 t deque b : Array [0..n] of Element

h ... h=0 : // index of first element b t=0 : // index of first free entry popFront pushFront pushBack popBack Function isEmpty : 0,1 ; return h = t { } Function first : Element; assert isEmpty; return b[h] Figure 3.4: Operations on stacks, queues, and double ended queues (deques). ¬ Function size : ; return (t h + n + 1) mod (n + 1) − Such sequences are called FIFO queues (First In First Out) or simply queues. In the Procedure pushBack(x : Element) terminology of the List class, FIFO queues only use the operations first, pushBack and assert size< n popFront. b[t] := x The more general deque1, or double ended queue, that allows operations first, last, t := (t + 1) mod (n + 1) pushFront, pushBack, popFront and popBack might also be observed at a post office, Procedure popFront assert isEmpty; h := (h + 1) mod (n + 1) when some not so nice guy jumps the line, or when the clerk at the counter gives ¬ priority to a pregnant woman at the end of the queue. Figure 3.4 gives an overview of the access patterns of stacks, queues and deques. Why should we care about these specialized types of sequences if List can imple- Figure 3.5: A bounded FIFO queue using arrays. ment them all? There are at least three reasons. A program gets more readable and easier to debug if special usage patterns of data structures are made explicit. Simple FIFO queues are easy to implement with singly linked lists with a pointer to the last interfaces also allow a wider range of implementations. In particular, the simplicity element. Figure 3.5 gives an implementation of bounded FIFO queues using arrays. of stacks and queues allows for specialized implementions that are more space effi- The general idea is to view the array as a cyclic structure where entry zero is the cient than general Lists. We will elaborate this algorithmic aspect in the remainder of successor of the last entry. Now it suffices to maintain two indices delimiting the this section. In particular, we will strive for implementations based on arrays rather range of valid queue entries. These indices travel around the cycle as elements are than lists. Array implementations may also be significantly faster for large sequences queued and dequeued. The cyclic semantics of the indices can be implemented using because sequential access patterns to stacks and queues translate into good reuse of arithmetics modulo the array size.2 Our implementation always leaves one entry of the cache blocks for arrays. In contrast, for linked lists it can happen that each item access array empty because otherwise it would be difficult to distinguish a full queue from = causes a cache miss.[klar? Verweis?] an empty queue. Bounded queues can be made unbounded using similar techniques ⇒ Bounded stacks, where we know the maximal size in advance can easily be im- as for unbounded arrays in Section 3.1. plemented with bounded arrays. For unbounded stacks we can use unbounded arrays. Finally, deques cannot be implemented efficiently using singly linked lists. But Stacks based on singly linked lists are also easy once we have understood that we can the array based FIFO from Figure 3.5 is easy to generalize. Circular arrary can also use pushFront, popFront, and first to implement push, pop, and top respectively. support access using [ ]. Exercise 3.19 gives hints how to implement unbounded stacks and queues that sup- · 2On some machines one might get signi®cant speedups by choosing the array size as a power of two 1Deque is pronounced like ªdeckº. and replacing mod by bit operations. 

Circular arrays also support indexed access:

Operator [i : ℕ] : Element; return b[(h + i) mod (n + 1)]

Exercises

Exercise 3.18 (The towers of Hanoi) In the great temple of Brahma in Benares, on a brass plate under the dome that marks the center of the world, there are 64 disks of pure gold that the priests carry one at a time between three diamond needles according to Brahma's immutable law: No disk may be placed on a smaller disk. In the beginning of the world, all 64 disks formed the Tower of Brahma on one needle. Now, however, the process of transfer of the tower from one needle to another is in mid course. When the last disk is finally in place, once again forming the Tower of Brahma but on a different needle, then will come the end of the world and all will turn to dust. [42]3

Describe the problem formally for any number k of disks. Write a program that uses three stacks for the poles and produces a sequence of stack operations that transform the state (⟨k,...,1⟩, ⟨⟩, ⟨⟩) into the state (⟨⟩, ⟨⟩, ⟨k,...,1⟩).

3 In fact, this mathematical puzzle was invented by the French mathematician Edouard Lucas in 1883.

Exercise 3.19 (Lists of arrays) Here we want to develop a simple data structure for stacks, FIFO queues, and deques that combines all the advantages of lists and unbounded arrays and is more space efficient for large queues than either of them. Use a list (doubly linked for deques) where each item stores an array of K elements for some large constant K. Implement such a data structure in your favorite programming language. Compare space consumption and execution time to linked lists and unbounded arrays for large stacks and some random sequence of pushes and pops.

Exercise 3.20 (External memory stacks and queues) Design a stack data structure that needs O(1/B) I/Os per operation in the I/O model from Section ??. It suffices to keep two blocks in internal memory. What can happen in a naive implementation with only one block in memory? Adapt your data structure to implement FIFOs, again using two blocks of internal buffer memory. Implement deques using four buffer blocks.

Exercise 3.21 Explain how to implement a FIFO queue using two stacks so that each FIFO operation takes amortized constant time.

3.4 Lists versus Arrays

Table 3.1 summarizes the execution times of the most important operations discussed in this chapter.

Table 3.1: Running times of operations on sequences with n elements. Entries have an implicit O(·) around them.

  Operation     List  SList  UArray  CArray   explanation of '*'
  [·]            n     n      1       1
  |·|            1*    1*     1       1       not with inter-list splice
  first          1     1      1       1
  last           1     1      1       1
  insert         1     1*     n       n       insertAfter only
  remove         1     1*     n       n       removeAfter only
  pushBack       1     1      1*      1*      amortized
  pushFront      1     1      n       1*      amortized
  popBack        1     n      1*      1*      amortized
  popFront       1     1      n       1*      amortized
  concat         1     1      n       n
  splice         1     1      n       n
  findNext,...   n     n      n*      n*      cache efficient

Predictably, arrays are better at indexed access whereas linked lists have their strengths at sequence manipulation at arbitrary positions. However, both basic approaches can implement the special operations needed for stacks and queues roughly equally well. Where both approaches work, arrays are more cache efficient whereas linked lists provide worst case performance guarantees. This is particularly true for all kinds of operations that scan through the sequence; findNext is only one example.

Singly linked lists can compete with doubly linked lists in most but not all respects. The only advantage of cyclic arrays over unbounded arrays is that they can implement pushFront and popFront efficiently.

Space efficiency is also a nontrivial issue. Linked lists are very compact if elements are much larger than one or two pointers. For small Element types, arrays have the potential to be more compact because there is no overhead for pointers. This is certainly true if the size of the arrays is known in advance so that bounded arrays can be used. Unbounded arrays have a tradeoff between space efficiency and copying overhead during reallocation.

3.5 Implementation Notes

C++

Iterators are a central concept of the C++ standard library; they implement our abstract view of sequences independently of the particular representation.[Steven: more?]

Unbounded arrays are implemented as class vector in the standard library. Class vector⟨Element⟩ is likely to be more efficient than our simple implementation. It gives you additional control over the allocated size w. Usually you will give some initial estimate for the sequence size n when the vector is constructed. This can save you many grow operations. Often, you also know when the array will stop changing size and you can then force w = n. With these refinements, there is little reason to use the built-in C style arrays. An added benefit of vectors is that they are automatically destructed when the variable goes out of scope. Furthermore, during debugging you can easily switch to implementations with bound checking.[where?]

There are some additional performance issues that you might want to address if you need very high performance for arrays that grow or shrink a lot. During reallocation, vector has to move array elements using the copy constructor of Element. In most cases, a call to the low level byte copy operation memcpy would be much faster. Perhaps a very clever compiler could perform this optimization automatically, but we doubt that this happens in practice. Another low level optimization is to implement reallocate using the standard C function realloc:

  b = realloc(b, n * sizeof(Element));

The memory manager might then be able to avoid copying the data entirely.

A stumbling block with unbounded arrays is that pointers to array elements become invalid when the array is reallocated. You should make sure that the array does not change size while such pointers are used. If reallocations cannot be ruled out, you can use array indices rather than pointers.

The C++ standard library and LEDA offer doubly linked lists in the class list⟨Element⟩, and singly linked lists in the class slist⟨Element⟩. The implementations we have seen perform rather well. Their memory management uses free lists for all objects of (roughly) the same size, rather than only for objects of the same class. Nevertheless, you might have to implement list like data structures yourself. Usually, the reason will be that your elements are part of complex data structures, and being arranged in a list is only one possible state of affairs. In such implementations, memory management is often the main challenge. Note that the operator new can be redefined for each class. The standard library class allocator offers an interface that allows you to roll your own memory management while cooperating with the memory managers of other classes.

The standard C++ library implements classes stack⟨Element⟩ and deque⟨Element⟩ for stacks and double ended queues respectively. C++ deques also allow constant time indexed access using [·]. LEDA offers classes stack⟨Element⟩ and queue⟨Element⟩ for unbounded stacks and FIFO queues implemented via linked lists. It also offers bounded variants that are implemented as arrays.

Java

The util package of the Java 2 platform provides Vector for unbounded arrays, LinkedList for doubly linked lists, and Stack for stacks. There is a quite elaborate hierarchy of abstractions for sequence like data types.[more?]

Many Java books proudly announce that Java has no pointers, so you might wonder how to implement linked lists. The solution is that object references in Java are essentially pointers. In a sense, Java has only pointers, because members of nonsimple type are always references, and are never stored in the parent object itself.

Explicit memory management is optional in Java, since it provides garbage collection of all objects that are not needed any more.

Java does not allow the specification of container classes like lists and arrays for a particular class Element. Rather, containers always contain Objects and the application program is responsible for performing appropriate casts. Java extensions for better support of generic programming are currently a matter of intensive debate.[keep an eye on this]

3.6 Further Findings

Most of the algorithms described in this chapter are folklore, i.e., they have been around for a long time and nobody claims to be their inventor. Indeed, we have seen that many of the concepts predate computers.[more refs?]

Amortization is as old as the analysis of algorithms. The accounting method and the even more general potential method were introduced in the beginning of the 80s by R.E. Brown, S. Huddleston, K. Mehlhorn, D.D. Sleator and R.E. Tarjan [18, 43, 87, 88]. The overview article [91] popularized the term amortized analysis.

There is an array-like data structure that supports indexed access in constant time and arbitrary element insertion and deletion in amortized time O(√n).[lower bound?] The trick is relatively simple. The array is split into subarrays of size n′ = Θ(√n). Only the last subarray may contain fewer elements. The subarrays are maintained as cyclic arrays as described in Section 3.3. Element i can be found in entry i mod n′ of subarray ⌊i/n′⌋. A new element is inserted into its subarray in time O(√n). To repair the invariant that all subarrays have the same size, the last element of this subarray is inserted as the first element of the next subarray in constant time. This process of shifting the extra element is repeated O(n/n′) = O(√n) times until the last subarray is reached. Deletion works similarly. Occasionally, one has to start a new last subarray or change n′ and reallocate everything. The amortized cost of these additional operations can be kept small. With some additional modifications, all deque operations can be performed in constant time. Katajainen and Mortensen have designed more sophisticated implementations of deques and present an implementation study [53].
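A minimal C++ sketch of the indexed-access and insertion part of this tiered construction; class and member names are ours, the subarray size is fixed rather than adapted to √n, each subarray is represented by a std::deque as a stand-in for the cyclic arrays of Section 3.3, and deletion and reallocation are omitted.

#include <cstddef>
#include <deque>
#include <vector>

// Tiered array: subarrays of size sub (ideally about sqrt(n)). Indexed access is O(1);
// insertion costs O(sub) inside one subarray plus O(1) per following subarray for
// shifting the overflow element, i.e. O(sqrt(n)) when sub is about sqrt(n).
class TieredArray {
  std::size_t sub;                      // target subarray size n'
  std::vector<std::deque<int>> groups;  // all groups except the last hold exactly sub elements
  std::size_t n = 0;
public:
  explicit TieredArray(std::size_t subSize) : sub(subSize), groups(1) {}

  int& operator[](std::size_t i) {      // element i lives in group i/sub at offset i%sub
    return groups[i / sub][i % sub];
  }

  void insert(std::size_t i, int x) {   // insert x so that it becomes element i
    std::size_t g = i / sub;
    if (g == groups.size()) groups.emplace_back();
    groups[g].insert(groups[g].begin() + (std::ptrdiff_t)(i % sub), x);
    while (groups[g].size() > sub) {    // repair the invariant by shifting one element
      if (g + 1 == groups.size()) groups.emplace_back();
      groups[g + 1].push_front(groups[g].back());
      groups[g].pop_back();
      ++g;
    }
    ++n;
  }

  std::size_t size() const { return n; }
};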

Chapter 4

Hash Tables

[Cannabis leaf as cover picture?]

If you want to get a book from the central library of the University of Karlsruhe, you have to order the book an hour in advance. The library personnel take the book from the magazine. You can pick it up in a room with many shelves. You find your book in a shelf numbered with the last digits of your library card. Why the last digits and not the leading digits? Probably because this distributes the books more evenly among the shelves. For example, many first year students are likely to have the same leading digits in their card number. If they all try the system at the same time, many books would have to be crammed into a single shelf.

The subject of this chapter is the robust and efficient implementation of the above "delivery shelf data structure", known in computer science as a hash table. The definition of "to hash" that best expresses what is meant is "to bring into complete disorder". Elements of a set are intentionally stored in disorder to make them easier to find. Although this sounds paradoxical, we will see that it can make perfect sense. Hash table accesses are among the most time critical parts of many computer programs. For example, scripting languages like awk [2] or perl [95] use hash tables as their only data structure. They use them in the form of associative arrays that allow arrays to be used with any kind of index type, e.g., strings. Compilers use hash tables for their symbol table that associates identifiers with information about them. Combinatorial search programs often use hash tables to avoid looking at the same situation multiple times. For example, chess programs use them to avoid evaluating a position twice that can be reached by different sequences of moves. One of the most widely used implementations of the join operation in relational databases[ref some database book] temporarily stores one of the participating relations in a hash table. (Exercise 4.4 gives a special case example.) Hash tables can often be used to replace applications of sorting (Chapter 5) or of search trees (Chapter 7). When this is possible, one often gets a significant speedup.

Put more formally, hash tables store a set M of elements where each element e has a unique key key(e). To simplify notation, we extend operations on keys to operations on elements. For example, the comparison e = k is an abbreviation for key(e) = k. The following dictionary operations are supported: [unified definition of the operations — already in the intro? move from the summary?]

M.insert(e : Element): M := M ∪ {e}

M.remove(k : Key): M := M \ {e} where e is the unique element with e = k, i.e., we assume that key is a one-to-one function.

M.find(k : Key): If there is an e ∈ M with e = k, return e; otherwise return a special element ⊥.

In addition, we assume a mechanism that allows us to retrieve all elements in M. Since this forall operation is usually easy to implement and is not severely affected by the details of the implementation, we only discuss it in the exercises.

In the library example, keys are the library card numbers and elements are the book orders. Another pre-computer example is an English-German dictionary. The keys are English words and an element is an English word together with its German translations. There are many ways to implement dictionaries (for example using the search trees discussed in Chapter 7). Here, we will concentrate on particularly efficient implementations using hash tables.

The basic idea behind a hash table is to map keys to m array positions using a hash function h : Key → 0..m−1. In the library example, h is the function extracting the least significant digits from a card number. Ideally, we would like to store element e in a table entry t[h(e)]. If this works, we get constant execution time for all three operations insert, remove, and find.1

Unfortunately, storing e in t[h(e)] will not always work as several elements might collide, i.e., they might map to the same table entry. A simple fix of this problem in the library example allows several book orders to go to the same shelf. Then the entire shelf has to be searched to find a particular order. In a computer we can store a sequence of elements in each table entry. Because the sequences in each table entry are usually implemented using singly linked lists, this hashing scheme is known as hashing with chaining. Section 4.1 analyzes hashing with chaining using rather optimistic assumptions about the properties of the hash function. In this model, we achieve constant expected time for dictionary operations.

In Section 4.2 we drop these assumptions and explain how to construct hash functions that come with (probabilistic) performance guarantees. Note that our simple examples already show that finding good hash functions is nontrivial. For example, if we applied the least significant digit idea from the library example to an English-German dictionary, we might come up with a hash function based on the last four letters of a word. But then we would have lots of collisions for words ending in 'tion', 'able', etc.

We can simplify hash tables (but not their analysis) by returning to the original idea of storing all elements in the table itself. When a newly inserted element e finds entry t[h(e)] occupied, it scans the table until a free entry is found. In the library example, this would happen if the shelves were too small to hold more than one book. The librarians would then use the adjacent shelves to store books that map to the same delivery shelf. Section 4.3 elaborates on this idea, which is known as hashing with open addressing and linear probing.

Exercise 4.1 (Scanning a hash table.) Implement the forall operation for your favorite hash table data type. The total execution time for accessing all elements should be linear in the size of the hash table. Your implementation should hide the representation of the hash table from the application. Here is one possible interface: Implement an iterator class that offers a view of the hash table as a sequence. It suffices to support initialization of the iterator, access to the current element, and advancing the iterator to the next element.

Exercise 4.2 Assume you are given a set M of pairs of integers. M defines a binary relation R_M. Give an algorithm that checks in expected time O(|M|) whether R_M is symmetric. (A relation is symmetric if ∀(a,b) ∈ M : (b,a) ∈ M.) Space consumption should be O(|M|).

Exercise 4.3 Write a program that reads a text file and outputs the 100 most frequent words in the text. Expected execution time should be linear in the size of the file.

Exercise 4.4 (A billing system:) Companies like Akamai2 deliver millions and millions of files every day from their thousands of servers. These files are delivered for other E-commerce companies who pay for each delivery (fractions of a cent). The servers produce log files with one line for each access. Each line contains information that can be used to deduce the price of the transaction and the customer ID. Explain how to write a program that reads the log files once a day and outputs the total charges for each customer (a few hundred).[check with Harald Prokop]

1 Strictly speaking, we have to add additional terms to the execution time for moving elements and for evaluating the hash function. To simplify notation, we assume in this chapter that all this takes constant time. The execution time in the more detailed model can be recovered by adding the time for one hash function evaluation and for a constant number of element moves to insert and remove.
2 http://www.akamai.com/index_flash.html

4.1 Hashing with Chaining

Hashing with chaining maintains an array t with m entries, each of which stores a sequence of elements. Assuming a hash function h : Key → 0..m−1, we can implement the three dictionary operations as follows: [harmonize notation][small before/after pictures]

insert(e): Insert e somewhere in sequence t[h(e)].

remove(k): Scan through t[h(k)]. If an element e with e = k is encountered, remove it and return.

find(k): Scan through t[h(k)]. If an element e with e = k is encountered, return it. Otherwise, return ⊥.

Figure 4.1 gives an example.[sequences (or list with sentinel?)]

[Figure 4.1: Clever description ....]

Using hashing with chaining, insertion can be done in worst case constant time if the sequences are implemented using linked lists. Space consumption is at most O(n + m) (see Exercise 4.6 for a more accurate analysis).

The other two operations have to scan the sequence t[h(k)]. In the worst case, for example if find looks for an element that is not there, the entire list has to be scanned. If we are unlucky, all elements could be mapped to the same table entry; for n elements, we would get execution time O(n). Can we find hash functions that never fail? Unfortunately, the answer is no. Once we fix h, it is always possible to find a set of n < |Key|/m keys that all map to the same table entry.

To get a less pessimistic — and hopefully realistic — estimate, we now look at the average case efficiency in the following sense: We fix a set of n elements and analyze the execution time averaged over all possible hash functions h : Key → 0..m−1. Equivalently, we can ask what the expected time of a remove or find is if we pick a hash function uniformly at random from the set of all possible hash functions.

Theorem 4.1 If n elements are stored in a hash table with m entries using hashing with chaining, the expected execution time of remove or find is O(1 + n/m) if we assume a random hash function.

The proof is very simple once we know the probabilistic concepts of random variables, their expectation, and the linearity of expectation described in Appendix ??.

Proof: Consider the execution time of remove or find for a key k. Both need constant time plus the time for scanning the sequence t[h(k)]. Hence, even if this sequence has to be scanned completely, the expected execution time is O(1 + E[X]) where the random variable X stands for the length of the sequence t[h(k)]. Let e_i denote element i in the table. Let X_1,...,X_n denote indicator random variables where X_i = 1 if h(e_i) = h(k) and otherwise X_i = 0. We have X = ∑_{i=1}^n X_i. Using the linearity of expectation, we get

  E[X] = E[∑_{i=1}^n X_i] = ∑_{i=1}^n E[X_i] = ∑_{i=1}^n prob(X_i = 1) .

A random hash function will map e_i to all m table entries with the same probability, independent of h(k). Hence, prob(X_i = 1) = 1/m and we get E[X] = ∑_{i=1}^n 1/m = n/m, and overall an expected execution time of O(n/m + 1).

We can achieve linear space requirements and constant expected execution time of all three operations if we keep m = Θ(n) at all times. This can be achieved using adaptive reallocation analogous to the unbounded arrays described in Section 3.1.

Exercise 4.5 (Unbounded Hash Tables) Explain how to implement hashing with chaining in such a way that m = Θ(n) at all times. Assume that there is a hash function h′ : Key → ℕ and that you set h(k) = h′(k) mod m.

Exercise 4.6 (Waste of space) Waste of space in hashing with chaining is due to empty table entries. Assuming a random hash function, compute the expected number of empty table entries as a function of m and n. Hint: Define indicator random variables Y_0, ..., Y_{m−1} where Y_i = 1 if t[i] is empty.

4.2 Universal Hash Functions

The performance guarantees for hashing in Theorem 4.1 have a serious flaw: It is very expensive to get hold of a truly random hash function h : Key → 0..m−1. We would have to compute a table of random values with |Key| entries. This is prohibitive; think of Key = 0..2^32 − 1 or even strings. On the other hand, we have seen that we cannot use a single fixed hash function. Some randomness is necessary if we want to cope with all possible inputs. The golden middle way is to choose a hash function randomly from some smaller set of functions. For the purposes of this chapter we only need the following simple condition: [right-aligned small picture: H nested in {0..m−1}^Key]

Definition 4.2 A family H ⊆ {0..m−1}^Key of functions from keys to table entries is c-universal if for all x, y in Key with x ≠ y and random h ∈ H,

  prob(h(x) = h(y)) ≤ c/m .

For c-universal families of hash functions, we can easily generalize the proof of Theorem 4.1 for fully random hash functions.

Theorem 4.3 If n elements are stored in a hash table with m entries using hashing with chaining, the expected execution time of remove or find is O(1 + cn/m) if we assume a hash function taken randomly from a c-universal family.

Now it remains to find c-universal families of hash functions that are easy to compute. We explain a simple and quite practical 1-universal family in detail and give further examples in the exercises. In particular, Exercise 4.8 gives a family that is perhaps even easier to understand since it uses only simple bit operations. Exercise 4.10 introduces a simple family that is fast for small keys and gives an example where we have c-universality only for c > 1.

Assume that the table size m is a prime number. Set w = ⌊log m⌋ and write keys as bit strings. Subdivide each bit string into pieces of at most w bits each, i.e., view keys as k-tuples of integers between 0 and 2^w − 1. For a = (a_1,...,a_k) define

  h_a(x) = a·x mod m

where a·x = ∑_{i=1}^k a_i x_i denotes the scalar product.

Theorem 4.4

  H· = { h_a : a ∈ {0..m−1}^k }

is a 1-universal family of hash functions if m is prime.

In other words, we get a good hash function if we compute the scalar product between a tuple representation of the key and a random vector.[picture dotprod.eps — redraw in LaTeX?]
Proof: Consider two distinct keys x = (x_1,...,x_k) and y = (y_1,...,y_k). To determine prob(h_a(x) = h_a(y)), we count the number of choices for a such that h_a(x) = h_a(y). Fix an index j such that x_j ≠ y_j. We claim that for each choice of the a_i's with i ≠ j there is exactly one choice of a_j such that h_a(x) = h_a(y). To show this, we solve the equation for a_j:

  h_a(x) = h_a(y)
    ⇔ ∑_{1≤i≤k} a_i x_i ≡ ∑_{1≤i≤k} a_i y_i  (mod m)
    ⇔ a_j (y_j − x_j) ≡ ∑_{i≠j, 1≤i≤k} a_i (x_i − y_i)  (mod m)
    ⇔ a_j ≡ (y_j − x_j)^{−1} ∑_{i≠j, 1≤i≤k} a_i (x_i − y_i)  (mod m)

where (y_j − x_j)^{−1} denotes the multiplicative inverse of (y_j − x_j) (i.e., (y_j − x_j)·(y_j − x_j)^{−1} ≡ 1 (mod m)). This value is unique mod m and is guaranteed to exist because we have chosen j such that x_j − y_j ≢ 0 (mod m) and because the integers modulo a prime number form a field.

There are m^{k−1} ways to choose the a_i with i ≠ j. Since the total number of possible choices for a is m^k, we get

  prob(h_a(x) = h_a(y)) = m^{k−1}/m^k = 1/m .

Is it a serious restriction that we need prime table sizes? At first glance, yes. We cannot burden users with the requirement to specify prime numbers for m. When we adaptively grow or shrink an array, it is also not obvious how to get prime numbers for the new value of m. But there is a simple way out. Number theory tells us that there is a prime number just around the corner [41]. On average it suffices to add an extra O(log m) entries to the table to get a prime m. Furthermore, we can afford to look for m using a simple brute force approach: Check the desired m for primality by trying to divide it by all integers in 2..⌊√m⌋. If a divisor is found, increment m. Repeat until a prime number is found. This takes average time O(√m log m) — much less than the time needed to initialize the table with ⊥ or to move elements at a reallocation.
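For illustration, here is a small C++ sketch of the scalar-product family H· together with the brute-force prime search just described. All names are ours; the sketch assumes the table size fits into 32 bits so that the products below do not overflow 64-bit arithmetic.

#include <cstdint>
#include <random>
#include <vector>

// Brute-force search for a prime table size: try divisors up to sqrt(m),
// increment m until a prime is found (average time O(sqrt(m) log m)).
std::uint64_t nextPrime(std::uint64_t m) {
  for (;; ++m) {
    bool prime = m >= 2;
    for (std::uint64_t d = 2; d * d <= m && prime; ++d)
      if (m % d == 0) prime = false;
    if (prime) return m;
  }
}

// The family H. : split the key into w-bit pieces (w = floor(log2 m)) and hash
// with a random coefficient vector a, everything modulo the prime m.
class ScalarProductHash {
  std::uint64_t m;               // table size, assumed prime and < 2^32
  unsigned w = 0;                // piece width
  std::vector<std::uint64_t> a;  // random coefficients in 0..m-1
public:
  ScalarProductHash(std::uint64_t primeM, std::size_t pieces, std::mt19937_64& rng)
      : m(primeM), a(pieces) {
    while ((1ull << (w + 1)) <= m) ++w;                 // w = floor(log2 m)
    std::uniform_int_distribution<std::uint64_t> dist(0, m - 1);
    for (auto& ai : a) ai = dist(rng);
  }
  // Hash a key given as w-bit pieces packed into a 64-bit integer.
  std::uint64_t operator()(std::uint64_t key) const {
    std::uint64_t sum = 0;
    for (std::uint64_t ai : a) {
      std::uint64_t piece = key & ((1ull << w) - 1);    // next piece x_i
      sum = (sum + ai * piece) % m;                     // accumulate a_i * x_i mod m
      key >>= w;
    }
    return sum;                                          // value in 0..m-1
  }
};

With m = 257 and 8-bit pieces this is essentially the setting of Exercise 4.7 below, except that there the coefficient vector should be extended lazily for strings of unknown length.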

Exercise 4.7 (Strings as keys.) Implement the universal family H· for strings of 8-bit characters. You can assume that the table size is at least m = 257. The time for evaluating a hash function should be proportional to the length of the string being processed. Input strings may have arbitrary lengths not known in advance. Hint: compute the random vector a lazily, extending it only when needed.

Exercise 4.8 (Hashing using bit matrix multiplication.) [literature? ask Martin] For table size m = 2^w and Key = {0,1}^k consider the family of hash functions

  H⊕ = { h_M : M ∈ {0,1}^{w×k} }

where h_M(x) = Mx computes a matrix-vector product using arithmetic mod 2. The resulting bit vector is interpreted as an integer from 0..2^w − 1. Note that multiplication mod 2 is the logical AND operation, and that addition mod 2 is the logical exclusive-or operation ⊕.

a) Explain how h_M(x) can be evaluated using k bit-parallel exclusive-or operations.

b) Explain how h_M(x) can be evaluated using w bit-parallel AND operations and w parity operations. Many machines support a machine instruction parity(y) that is one if the number of one bits in y is odd and zero otherwise.

c) We now want to show that H⊕ is 1-universal. As a first step, show that for any two keys x ≠ y, any bit position j where x and y differ, and any choice of the columns M_i of the matrix with i ≠ j, there is exactly one choice of column M_j such that h_M(x) = h_M(y).

d) Count the number of ways to choose k − 1 columns of M.

e) Count the total number of ways to choose M.

f) Compute the probability prob(h_M(x) = h_M(y)) for x ≠ y if M is chosen randomly.

*Exercise 4.9 (More matrix multiplication.) Define a class of hash functions

  H× = { h_M : M ∈ {0..p}^{w×k} }

that generalizes the class H⊕ by using arithmetic mod p for some prime number p. Show that H× is 1-universal. Explain how H· is also a special case of H×.

Exercise 4.10 (Simple linear hash functions.) Assume Key = 0..p−1 for some prime number p. Show that the following family of hash functions is (⌈|Key|/m⌉/(|Key|/m))²-universal:

  H* = { h_(a,b) : a,b ∈ 0..p−1 }

where h_(a,b)(x) = ((ax + b) mod p) mod m.

Exercise 4.11 (A counterexample.) Consider the set of hash functions

  H^fool = { h_(a,b) : a,b ∈ 0..p−1 }

with h_(a,b)(x) = (ax + b) mod m. Show that there is a set of ⌊|Key|/m⌋ keys M such that

  ∀x,y ∈ M : ∀h_(a,b) ∈ H^fool : h_(a,b)(x) = h_(a,b)(y)

even if m is prime.

Exercise 4.12 (Table size 2^ℓ.) Show that the family of hash functions

  H = { h_a : 0 < a < 2^k ∧ a is odd }

with h_a(x) = (ax mod 2^k) ÷ 2^{k−ℓ} is 2-universal. (Due to Keller and Abolhassan.)

Exercise 4.13 (Table lookup made practical.) Let m = 2^w and view keys as (k+1)-tuples where the 0-th element is a w-bit number and the remaining elements are a-bit numbers for some small constant a. The idea is to replace the single lookup in a huge table from Section 4.1 by k lookups in smaller tables of size 2^a. Show that

  H⊕[] = { h⊕_(t_1,...,t_k) : t_i ∈ {0..m−1}^{0..2^a−1} }

where

  h⊕_(t_1,...,t_k)((x_0, x_1, ..., x_k)) = x_0 ⊕ ⊕_{i=1}^{k} t_i[x_i]

is 1-universal.
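Before moving on to open addressing, here is a tiny illustration of the shift-based family from Exercise 4.12, rendered in C++ with k = 64 key bits and table size 2^l; the struct name is ours, and the universality proof is, of course, the subject of the exercise.

#include <cstdint>
#include <random>

// h_a(x) = (a*x mod 2^64) >> (64 - l) for a random odd multiplier a.
// Unsigned overflow of the multiplication implements the "mod 2^64".
// Assumes 1 <= l <= 64.
struct ShiftHash {
  std::uint64_t a;   // random odd multiplier
  unsigned l;        // table size is 2^l
  ShiftHash(unsigned logTableSize, std::mt19937_64& rng)
      : a(rng() | 1ull), l(logTableSize) {}
  std::uint64_t operator()(std::uint64_t x) const {
    return (a * x) >> (64 - l);
  }
};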

4.3 Hashing with Linear Probing

Hashing with chaining is categorized as a closed hashing approach because each entry t[i] of the table has to cope with all elements with h(e) = i. In contrast, open hashing schemes open up other table entries to take the overflow from overloaded fellow entries. This added flexibility allows us to do away with secondary data structures like linked lists — all elements are stored directly in table entries. Many ways of organizing open hashing have been investigated. We will only explore the simplest scheme, which is attributed to G. Amdahl [57], who used it in the early 50s. Unused entries are filled with a special element ⊥. An element e is stored in entry t[h(e)] or further to the right. But we only go away from index h(e) with good reason, i.e., only if the table entries between t[h(e)] and the entry where e is actually stored are occupied by other elements. The remaining implementation basically follows from this invariant. Figure 4.2 gives pseudocode.[todo: inline picture — linear2.fig from the lecture?]

To insert an element e, we linearly scan the table starting at t[h(e)] until a free entry is found, where e is then stored. Similarly, to find an element with key k, we scan the table starting at t[h(k)] until a matching element is found. The search can be aborted when an empty table entry is encountered, because any matching element further to the right would violate the invariant. So far this sounds easy enough, but we have to deal with two complications.

What happens if we reach the end of the table during insertion? We choose a very simple fix by allocating m′ table entries to the right of the largest index produced by the hash function h. For 'benign' hash functions it should be sufficient to choose m′ much smaller than m in order to avoid table overflows. Exercise 4.14 asks you to develop a more robust, albeit slightly slower, variant where the table is treated as a cyclic array.

A more serious question is how remove should be implemented. After finding the element, we might be tempted to replace it with ⊥. But this could violate the invariant for elements further to the right. Consider the example

  t = [..., x_{h(z)}, y, z, ...]

where the subscript indicates that x sits in entry h(z). When we naively remove element y and subsequently try to find z, we would stumble over the hole left by removing y and think that z is not present. Therefore, most other variants of linear probing disallow remove or mark removed elements so that a subsequent find will not stop there. The problem with marking is that the number of nonempty cells (occupied or marked) keeps increasing, so searches eventually become very slow. This can only be mitigated by introducing the additional complication of periodic reorganizations of the table.

Here we give a different solution that is faster and needs no reorganizations [57, Algorithm R][check]. The idea is rather obvious — when an invariant may be violated, reestablish it. Let i denote the index of the deleted element. We scan entries t[j] to the right of i to check for violations of the invariant. If h(t[j]) > i, the invariant still holds even if t[i] is emptied. If h(t[j]) ≤ i, we can move t[j] to t[i] without violating the invariant for the moved element. Now we can pretend that we want to remove the duplicate copy at t[j] instead of t[i], i.e., we set i := j and continue scanning. Figure ?? depicts this case.[todo???] We can stop scanning when t[j] = ⊥, because elements to the right that violate the invariant would have violated it even before the deletion.

// Hash table with m primary entries and an overflow area of size m′.
Class BoundedLinearProbing(m, m′ : ℕ; h : Key → 0..m−1)
  t = ⟨⊥,...,⊥⟩ : Array [0..m + m′ − 1] of Element
  invariant ∀i : t[i] ≠ ⊥ ⇒ ∀j ∈ {h(t[i])..i−1} : t[j] ≠ ⊥

  Procedure insert(e : Element)
    for i := h(e) to ∞ while t[i] ≠ ⊥ do ;
    assert i < m + m′ − 1            // no overflow
    t[i] := e

  Function find(k : Key) : Element
    for i := h(k) to ∞ while t[i] ≠ ⊥ do
      if t[i] = k then return t[i]
    return ⊥                         // not found

  Procedure remove(k : Key)
    for i := h(k) to ∞ while k ≠ t[i] do
      if t[i] = ⊥ then return        // nothing to do
    // Scan a cluster of elements.
    // i is where we currently plan for a hole.
    for j := i + 1 to ∞ while t[j] ≠ ⊥ do
      // Establish invariant for t[j].
      if h(t[j]) ≤ i then
        t[i] := t[j]                 // overwrite removed element
        i := j                       // move planned hole
    t[i] := ⊥                        // erase freed entry

Figure 4.2: Hashing with linear probing.
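A compact C++ rendering of Figure 4.2, for experimentation; the int key type, the use of std::optional for ⊥, and the std::hash-based hash function are our choices, and the table is bounded exactly as in the pseudocode.

#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Bounded linear probing following Figure 4.2. An empty std::optional plays the
// role of the special element ⊥. The last slot is never filled, so every scan
// terminates at a hole.
class LinearProbingTable {
  std::size_t m;                           // number of primary entries
  std::vector<std::optional<int>> t;       // m primary entries + overflow area m'
  std::size_t h(int k) const { return std::hash<int>{}(k) % m; }
public:
  LinearProbingTable(std::size_t m_, std::size_t overflow)
      : m(m_), t(m_ + overflow) {}

  void insert(int k) {
    std::size_t i = h(k);
    while (t[i]) ++i;                      // scan for a free entry
    assert(i + 1 < t.size());              // no overflow; keep the last slot empty
    t[i] = k;
  }
  bool find(int k) const {
    for (std::size_t i = h(k); t[i]; ++i)  // stop at the first hole
      if (*t[i] == k) return true;
    return false;
  }
  void remove(int k) {
    std::size_t i = h(k);
    for (;; ++i) {
      if (!t[i]) return;                   // nothing to do
      if (*t[i] == k) break;
    }
    // Re-establish the invariant: scan the cluster and move elements into the hole.
    for (std::size_t j = i + 1; t[j]; ++j)
      if (h(*t[j]) <= i) { t[i] = t[j]; i = j; }
    t[i].reset();                          // erase the freed entry
  }
};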

Exercise 4.14 (Cyclic linear probing.) Implement a variant of linear probing where the table size is m rather than m + m′. To avoid overflow at the right end of the array, make probing wrap around.

• Adapt insert and delete by replacing increments with i := (i + 1) mod m.

• Specify a predicate between(i, j, k) that is true if and only if j lies cyclically between i and k.

• Reformulate the invariant using between.

• Adapt remove.

Exercise 4.15 (Unbounded linear probing.) Implement unbounded hash tables using linear probing and universal hash functions. Pick a new random hash function whenever the table is reallocated. Let 1 < γ < β < α denote constants we are free to choose. Keep track of the number of stored elements n. Grow the table to m = βn if n > m/γ. Shrink the table to m = βn if n < m/α. If you do not use cyclic probing as in Exercise 4.14, set m′ = δm for some δ < 1 and reallocate the table if the right end should overflow.

4.4 Chaining Versus Linear Probing

We have seen two different approaches to hash tables, chaining and linear probing. Which one is better? We do not pursue a detailed theoretical analysis, since this is complicated for the case of linear probing, and the results would remain inconclusive. Therefore, we only discuss some qualitative issues without detailed proof.

An advantage of linear probing is that, in each table access, a contiguous piece of memory is accessed. The memory subsystems of modern processors are optimized for this kind of access pattern, whereas they are quite slow at chasing pointers when the data does not fit in cache memory. A disadvantage of linear probing is that search times become very high when the number of elements approaches the table size. For chaining, the expected access time remains very small. On the other hand, chaining wastes space for pointers that could be used to support a larger table in linear probing. Hence, the outcome is not so clear.

To decide which algorithm is faster, we implemented both. The outcome is that both perform about equally well when they are given the same amount of memory. The differences are so small that details of the implementation, compiler, operating system and machine used can reverse the picture. Hence we do not report exact figures.

However, to match linear probing, chaining had to be implemented very carefully using the optimizations from Section 4.5. Chaining can be much slower than linear probing if the memory management is not done well or if the hash function maps many elements to the same table entry.[more advantages of chaining: referential integrity, theoretical guarantees.]

4.5 Implementation Notes

Although hashing is an algorithmically simple concept, a clean, efficient, and robust implementation can be surprisingly nontrivial. Less surprisingly, the most important issue is the choice of hash function. Universal hash functions could in certain cases work on the bit representation of many data types and allow a reusable dictionary data type that hides the existence of hash functions from the user. However, one usually wants at least the option of a user specified hash function. Also note that many families of hash functions have to be adapted to the size of the table, so they need some internal state that is changed when the table is resized or when access times get too large.

Most applications seem to use simple, very fast hash functions based on xor, shifting, and table lookups rather than universal hash functions3. Although these functions seem to work well in practice, we are not so sure whether universal hashing is really slower. In particular, family H⊕[] from Exercise 4.13 should be quite good for integer keys, and Exercise 4.7 formulates a good function for strings. It might be possible to implement the latter function particularly fast using the SIMD instructions of modern processors that allow the parallel execution of several small precision operations.

Implementing Hashing with Chaining

Hashing with chaining uses only very specialized operations on sequences, for which singly linked lists are ideally suited. Since these lists are extremely short, some deviations from the implementation scheme of Section 3.2 are in order. In particular, it would be wasteful to store a dummy item with each list. However, we can use a single shared dummy item to mark the end of the list. This item can then be used as a sentinel element for find and remove, as in function findNext in Section 3.2.1. This trick not only saves space, but also makes it likely that the dummy item is always in cache memory.

There are two basic ways to implement hashing with chaining. Entries of slim tables are pointers to singly linked lists of elements. Fat tables store the first list element in the table itself. Fat tables are usually faster and more space efficient. Slim tables may have advantages if the elements are very large. They also have the advantage that references to table entries remain valid when tables are reallocated. We have already observed this complication for unbounded arrays in Section 3.5.

Comparing the space consumption of hashing with chaining and linear probing is even more subtle than outlined in Section 4.4. On the one hand, the linked lists burden the memory management with many small pieces of allocated memory, in particular if memory management for list items is not implemented carefully as discussed in Section 3.2.1. On the other hand, implementations of unbounded hash tables based on chaining can avoid occupying two tables during reallocation by using the following method: First, concatenate all lists into a single list ℓ. Deallocate the old table. Only then allocate the new table. Finally, scan ℓ, moving the elements to the new table.

3 For example http://burtleburtle.net/bob/hash/evahash.html
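A minimal C++ sketch of hashing with chaining that also implements the reallocation method just described; for brevity it uses a "slim" table of raw node pointers (the text recommends fat tables), a fixed int key type, and it omits remove.

#include <cstddef>
#include <functional>
#include <vector>

class ChainingTable {
  struct Node { int key; Node* next; };
  std::vector<Node*> t;                    // slim table: one chain head per entry
  std::size_t n = 0;
  static std::size_t h(int k, std::size_t m) { return std::hash<int>{}(k) % m; }
public:
  explicit ChainingTable(std::size_t m) : t(m, nullptr) {}

  bool find(int k) const {
    for (Node* p = t[h(k, t.size())]; p; p = p->next)
      if (p->key == k) return true;
    return false;
  }

  void insert(int k) {
    std::size_t i = h(k, t.size());
    t[i] = new Node{k, t[i]};                  // insert at the front of the chain
    if (++n > t.size()) rehash(2 * t.size());  // keep m = Theta(n)
  }

  void rehash(std::size_t newM) {
    Node* all = nullptr;                       // 1. concatenate all chains into one list
    for (Node*& head : t)
      while (head) { Node* p = head; head = p->next; p->next = all; all = p; }
    std::vector<Node*>().swap(t);              // 2. deallocate the old table ...
    t.assign(newM, nullptr);                   // 3. ... before allocating the new one
    while (all) {                              // 4. move the elements to the new table
      Node* p = all; all = all->next;
      std::size_t i = h(p->key, newM);
      p->next = t[i]; t[i] = p;
    }
  }

  ~ChainingTable() {
    for (Node* head : t)
      while (head) { Node* p = head; head = p->next; delete p; }
  }
};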

C++ and Meyer auf der Heide [31][check ref] give a family of hash functions that [which = bound, outline trick.]. [m vs n dependence?] ⇐= ++ The current C standard library does not define a hash table data type but the popu- Another key property of hash functions is in what sense sets of elements are hashed ⇐= lar implementation by SGI (http://www.sgi.com/tech/stl/) offers several Key ⇐ independently. A family H 0..m 1 is k-way independent if prob(h(x1) = variants: hashset, hashmap, hashmultiset, hashmultimap. Here “set” stands for the kind k ⊆ { − } a1 h(xk) = ak) = m− for any set of k keys and k hash function values. One of interfaces used in this chapter wheras a “map” is an indexed Keys. application∧ ··· ∧ of higher independence is the reduction of the maximum occupancy be- The term “multi” stands for data types that allow multiple elements with the same cause k-wise independence implies maximum occupancy O m1/k [check bound][?]. = key. Hash functions are implemented as function objects, i.e., the class hash ⇐ Below we will see an application requiring O(logm)-wise independence. A simple () overloads the operator “ ” so that an object can be used like a function. The reason k-wise independent family of hash functions are polynomials of degree k 1 with for this charade is that it allows the hash function to store internal state like random random coefficients [fill in details][?]. − = coefficients. [strongly universal hashing] ⇐= LEDA offers several hashing based implementations of dictionaries. The class [cryptographic hash functions] ⇐= harray Key,Element implements an associative array assuming that a hash function ⇐ h i Many hash functions used in practice are not universal and one can usually con- intHash(Key&) is defined by the user and returns an integer value that is then mapped struct input where they behave very badly. But empirical tests indicate that some of to a table index by LEDA. The implementation grows adaptively using hashing with them are quite robust in practice.[hash function studies http://www.cs.amherst.edu/ ccm/challenge5/= ] = chaining.[check. Kurt, was ist der Unterschied zu map?] ⇐ ⇒ It is an interesting question whether universal families can completely replace hash functions without theoretical performance guarantees. There seem to be two main Java counterarguments. One is that not all programmers have the mathematical to back- ground to implement universal families themselves. However, software libraries can The class java util hashtable implements unbounded hash tables using the function . . hide the complications from the everyday user. Another argument is speed but it seems hashCode defined in class Object as a hash function. that for most applications there are very fast universal families. For example, if Key is not too large, the families from Exercise 4.12 and 4.13 seem to be hard to beat| in| *Exercise 4.16 (Associative arrays.) Implement a C++-class for associative arrays. terms of speed. Support operator[] for any index type that supports a hash function. Make sure that H[x]=... works as expected if x is the key of a new element. More Hash Table Algorithms Many variants of hash tables have been proposed, e.g. [57, 39]. Knuth [57] gives an 4.6 Further Findings average case analysis of many schemes. One should however be careful with interpret- ing these results. For example, there are many algorithms based on open addressing More on Hash Functions that need less probes into the table than linear probing. 
However, the execution time Carter and Wegman [20] introduced the concept of universal hashing to prove con- of such schemes is usually higher since they need more time per probe in particular stant expected access time to hash tables. However, sometimes more than that is because they cause more cache faults. needed. For example, assume m elements are mapped to m table entries and consider extensible hashing. the expected maximum occupancy maxM Key, M =m E[maxi x M : h(x) = i ]. A ⊆ | | |{ ∈ }| truly random hash function produces a maximum occupancy of O(logm/loglogm) Worst Case Constant Access Time whereas there are universal families and sets of keys where the maximum occupancy Kurt? Rasmus? = of a table entry is Θ(√m). [reference? Martin und Rasmus fragen], i.e., there Rasmus fragen ⇒ can be be some elements where access takes much longer than constant time. This is undesirable for real time applications and for parallel algorithms where many proces- perfect hashing, shifting trick sors work together and have to wait for each other to make progress. Dietzfelbinger 74 Hash Tables Mehlhorn, Sanders June 11, 2005 75

Chapter 5

Sorting and Selection

A telephone directory is alphabetically sorted by last name because this makes it very easy to find an entry even in a huge city. A naive view of this chapter could be that it tells us how to make telephone books. An early example of even more massive data processing was the statistical evaluation of census data. 1500 people needed seven years to manually process the US census in 1880. The engineer Herman Hollerith1, who participated in this evaluation as a statistician, spent much of the ten years until the next census developing counting and sorting machines (the small machine in the left picture) for mechanizing this gigantic endeavor. Although the 1890 census had to evaluate more people and more questions, the basic evaluation was finished in 1891. Hollerith's company continued to play an important role in the development of the information processing industry; since 1924 it has been known as International Business Machines (IBM). Sorting is important for census statistics because one often wants to group people by an attribute, e.g., age, and then do further processing for persons who share that attribute.

For example, a question might be how many people between 20 and 30 are living on farms. This question is relatively easy to answer if the database entries (punch cards in 1890) are sorted by age. You take out the section for ages 20 to 30 and count the number of people fulfilling the condition of living on a farm.

1 The picture to the right. Born February 29, 1860, Buffalo NY; died November 17, 1929, Washington DC.

Sorting is even more important than it might seem since it is not only used to produce sorted output for human consumption but, more importantly, as a ubiquitous algorithmic tool:

Preprocessing for fast search: Not only humans can search a sorted directory faster than an unsorted one. Although hashing might be a faster way to find elements, sorting allows additional types of operations like finding all elements which lie in a certain range. We will discuss searching in more detail in Chapter 7.

Grouping: Often we want to bring equal elements together to count them, eliminate duplicates, or otherwise process them. Again, hashing might be a faster alternative. But sorting has advantages since we will see rather fast deterministic algorithms for it that use very little space.

Spatial subdivision: Many divide-and-conquer algorithms first sort the inputs according to some criterion and then split the sorted input list.[ref to example? Graham scan (but where? perhaps as an appetizer in Further Findings? or in a special topic chapter?)]

Establishing additional invariants: Certain algorithms become very simple if the inputs are processed in sorted order. Exercise 5.3 gives an example. Other examples are Kruskal's algorithm in Section 11.3, several of the algorithms for the knapsack problem in Chapter 12, and the scheduling algorithm proposed in Exercise 12.7. You may also want to remember sorting when you solve Exercise ?? on interval graphs.

Although we probably all have an intuitive feeling what sorting means, let us look at a formal definition. To sort a sequence s = ⟨e_1,...,e_n⟩, we have to produce a sequence s′ = ⟨e′_1,...,e′_n⟩ such that s′ is a permutation of s and such that e′_1 ≤ e′_2 ≤ ··· ≤ e′_n. As in Chapter 4 we distinguish between an element e and its key key(e), but extend the comparison operations between keys to elements so that e ≤ e′ if and only if key(e) ≤ key(e′). Any key comparison relation can be used as long as it defines a strict weak order, i.e., it is reflexive and transitive, and antisymmetric with respect to some equivalence relation ≡. This all sounds a bit cryptic, but all you really need to remember is that all elements must be comparable, i.e., a partial order will not do, and that for some elements we may not care how they are ordered. For example, two elements may have the same key, or we may have decided that upper and lower case letters should not be distinguished.

Although different comparison relations for the same data type may make sense, the most frequent relations are the obvious order for numbers and the lexicographic order (see Appendix A) for tuples, strings, or sequences.

Exercise 5.1 (Sorting words with accents.) Ordering strings is not always as obvious as it looks. For example, most European languages augment the Latin alphabet with a number of accented characters. For comparisons, accents are essentially omitted. Only ties are broken so that Mull ≤ Müll etc.

a) Implement comparison operations for strings that follow these definitions and can be used with your favorite library routine for sorting.2

b) In German telephone books (but not in encyclopedias) a second way to sort words is used. The umlauts ä, ö, and ü are identified with their circumscriptions ä = ae, ö = oe, and ü = ue. Implement this comparison operation.

c) The comparison routines described above may be too slow for time critical applications. Outline how strings with accents can be sorted according to the above rules using only plain lexicographic order within the sorting routines. In particular, explain how tie breaking rules should be implemented.

2 For West European languages like French, German, or Spanish you can assume the character set ISO LATIN-1.

Sorting has a very simple problem statement and in Section 5.1 we see correspondingly simple sorting algorithms. However, it is less easy to make these simple approaches efficient. With mergesort, Section 5.2 introduces a simple divide-and-conquer sorting algorithm that runs in time O(n log n). Section 5.3 establishes that this bound is optimal for all comparison based algorithms, i.e., algorithms that treat elements as black boxes that can only be compared and moved around. The quicksort algorithm described in Section 5.4 is also based on the divide-and-conquer principle and is perhaps the most frequently used sorting algorithm. Quicksort is also a good example of a randomized algorithm. The idea behind quicksort leads to a simple algorithm for a problem related to sorting: Section 5.5 explains how the k-th smallest of n elements can be found in time O(n). Sorting can be made even faster than the lower bound from Section 5.3 by looking into the bit patterns of the keys, as explained in Section 5.6. Finally, Section 5.7 generalizes quicksort and mergesort to very good algorithms for sorting huge inputs that do not fit into internal memory.

Exercise 5.2 Define a total order for complex numbers where x ≤ y implies |x| ≤ |y|.

Exercise 5.3 (A simple scheduling problem.) Assume you are a hotel manager who has to consider n advance bookings of rooms for the next season. Your hotel has k identical rooms. Bookings contain an arrival date and a departure date. You want to find out whether there are enough rooms in your hotel to satisfy the demand. Design an algorithm that solves this problem in time O(n log n). Hint: Set up a sequence of events containing all arrival and departure dates. Process this list in sorted order.

Exercise 5.4 (Sorting with few different keys.) Design an algorithm that sorts n elements in O(k log k + n) expected time if there are only k different keys appearing in the input. Hint: use universal hashing.

*Exercise 5.5 (Checking.) It is easy to check whether a sorting routine produces sorted output. It is less easy to check whether the output is also a permutation of the input. But here is a fast and simple Monte Carlo algorithm for integers: Show that ⟨e_1,...,e_n⟩ is a permutation of ⟨e′_1,...,e′_n⟩ if and only if the polynomial identity (z − e_1)···(z − e_n) − (z − e′_1)···(z − e′_n) = 0 holds for all z. For any ε > 0, let p denote a prime such that p > max{n/ε, e_1,...,e_n, e′_1,...,e′_n}. Now the idea is to evaluate the above polynomial mod p for a random value z ∈ 0..p−1. Show that if ⟨e_1,...,e_n⟩ is not a permutation of ⟨e′_1,...,e′_n⟩, then the result of the evaluation is zero with probability at most ε. Hint: A nonzero polynomial of degree n has at most n zeroes.

5.1 Simple Sorters

Perhaps the conceptually simplest sorting algorithm is selection sort: Start with an empty output sequence. Select the smallest element from the input sequence, delete it, and add it to the end of the output sequence. Repeat this process until the input sequence is exhausted. Here is an example:

  ⟨⟩, ⟨4,7,1,1⟩ ; ⟨1⟩, ⟨4,7,1⟩ ; ⟨1,1⟩, ⟨4,7⟩ ; ⟨1,1,4⟩, ⟨7⟩ ; ⟨1,1,4,7⟩, ⟨⟩ .

Exercise 5.6 asks you to give a simple array implementation of this idea that works in-place, i.e., needs no additional storage beyond the input array and a constant amount of space for loop counters etc. In Section ??[todo] we will learn about a more sophisticated implementation where the input sequence is maintained as a priority queue that supports repeated selection of the minimum element very efficiently. This algorithm runs in time O(n log n) and is one of the more useful sorting algorithms. In particular, it is perhaps the simplest efficient algorithm that is deterministic and works in-place.

Exercise 5.6 (Simple selection sort.) Implement a simple variant of selection sort that sorts an array with n elements in time O(n²) by repeatedly scanning the input sequence. The algorithm should be in-place, i.e., both the input sequence and the output sequence should share the same array.

Selection sort maintains the invariant that the output sequence is always sorted by carefully choosing the element to be deleted from the input sequence. Another simple algorithm, insertion sort, maintains the same invariant by choosing an arbitrary element of the input sequence but taking care to insert this element at the right place in the output sequence. Here is an example:

  ⟨⟩, ⟨4,7,1,1⟩ ; ⟨4⟩, ⟨7,1,1⟩ ; ⟨4,7⟩, ⟨1,1⟩ ; ⟨1,4,7⟩, ⟨1⟩ ; ⟨1,1,4,7⟩, ⟨⟩ .

Figure 5.1 gives an in-place array implementation of insertion sort.

Procedure insertionSort(a : Array [1..n] of Element)
  for i := 2 to n do
    invariant a[1] ≤ ··· ≤ a[i−1]        // a[1..i−1] is sorted, a[i..n] is unsorted
    // Move a[i] to the right place
    e := a[i]
    if e < a[1] then                      // new minimum
      for j := i downto 2 do a[j] := a[j−1]
      a[1] := e
    else                                  // use a[1] as a sentinel
      for j := i downto −∞ while a[j−1] > e do a[j] := a[j−1]
      a[j] := e

Figure 5.1: Insertion sort

This implementation is straightforward except for a small trick that allows the inner loop to use only a single comparison. When the element e to be inserted is smaller than all previously inserted elements, it can be inserted at the beginning without further tests. Otherwise, it suffices to scan the sorted part of a from right to left while e is smaller than the current element. This process has to stop because a[1] ≤ e.

In the worst case, insertion sort is quite slow. For example, if the input is sorted in decreasing order, each input element is moved all the way to a[1], i.e., in iteration i of the outer loop, i elements have to be moved. Overall, we get

  ∑_{i=2}^{n} (i−1) = −n + ∑_{i=1}^{n} i = n(n+1)/2 − n = n(n−1)/2 = Ω(n²)

movements of elements (see also Equation (A.7)).
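A C++ rendering of Figure 5.1 with 0-based indexing, for experimentation; the element type int is our simplification.

#include <cstddef>
#include <vector>

// Insertion sort as in Figure 5.1: once e is known not to be the new minimum,
// a[0] <= e acts as a sentinel, so the inner loop needs no bounds test.
void insertionSort(std::vector<int>& a) {
  for (std::size_t i = 1; i < a.size(); ++i) {
    int e = a[i];
    if (e < a[0]) {                        // new minimum: shift everything right
      for (std::size_t j = i; j > 0; --j) a[j] = a[j - 1];
      a[0] = e;
    } else {                               // scan right to left; a[0] stops the scan
      std::size_t j = i;
      while (a[j - 1] > e) { a[j] = a[j - 1]; --j; }
      a[j] = e;
    }
  }
}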

Nevertheless, insertion sort can be useful. It is fast for very small inputs (say n ≤ 10) and hence can be used as the base case in divide-and-conquer algorithms for sorting. Furthermore, in some applications the input is already "almost" sorted and in that case insertion sort can also be quite fast.

Exercise 5.7 (Almost sorted inputs.) Prove that insertion sort executes in time O(kn) if for all elements e_i of the input, |r(e_i) − i| ≤ k, where r defines the rank of e_i (see Section 5.5 or Appendix A).

Exercise 5.8 (Average case analysis.) Assume the input to the insertion sort algorithm in Figure 5.1 is a permutation of the numbers 1,...,n. Show that the average execution time over all possible permutations is Ω(n²). Hint: Argue formally that about one third of the input elements in the right third of the array have to be moved to the left third of the array. Using a more accurate analysis you can even show that on average n²/4 − O(n) iterations of the inner loop are needed.

Exercise 5.9 (Insertion sort with few comparisons.) Modify the inner loops of the array based insertion sort algorithm from Figure 5.1 so that it needs only O(n log n) comparisons between elements. Hint: Use binary search.[ref]

Exercise 5.10 (Efficient insertion sort?) Use the data structure for sorted sequences from Chapter 7 to derive a variant of insertion sort that runs in time O(n log n).

Exercise 5.11 (Formal verification.) Insertion sort has a lot of places where one can make errors. Use your favorite verification formalism, e.g. Hoare calculus, to prove that insertion sort is correct. Do not forget to prove that the output is a permutation of the input.

5.2 Mergesort — an O(n log n) Algorithm

[Can the mergesort figure be made more compact, e.g., left to right instead of top to bottom? Also "analogous" to the quicksort figure?]

Mergesort is a straightforward application of the divide-and-conquer principle. The unsorted sequence is split into two parts of about equal size. The parts are sorted recursively and a globally sorted sequence is computed from the two sorted pieces. This approach is useful because merging two sorted sequences a and b is much easier than sorting from scratch. The globally smallest element is either the first element of a or the first element of b. So we move this element to the output, find the second smallest element using the same approach, and iterate until all elements have been moved to the output. Figure 5.2 gives pseudocode and Figure 5.3 illustrates an example execution. The merging part is elaborated in detail using the list operations introduced in Section 3.2.

Function mergeSort(⟨e_1,...,e_n⟩) : Sequence of Element
  if n = 1 then return ⟨e_1⟩                      // base case
  else return merge(mergeSort(⟨e_1,...,e_⌊n/2⌋⟩),
                    mergeSort(⟨e_⌊n/2⌋+1,...,e_n⟩))

// Merging for sequences represented as lists
Function merge(a, b : Sequence of Element) : Sequence of Element
  c := ⟨⟩
  loop
    invariant a, b, and c are sorted
    invariant ∀e ∈ c, e′ ∈ a ∪ b : e ≤ e′
    if a.isEmpty then c.concat(b); return c
    if b.isEmpty then c.concat(a); return c
    if a.first ≤ b.first then c.moveToBack(a.first)
    else                      c.moveToBack(b.first)

Figure 5.2: Mergesort

Figure 5.3: Execution of mergeSort(⟨2,7,1,8,2,8,1⟩). [The figure shows the recursive splitting down to single elements and the subsequent merge and concat steps that produce ⟨1,1,2,2,7,8,8⟩.]

Note that no allocation and deallocation of list items is needed. Each iteration of the inner loop of merge performs one element comparison and moves one element to the output. Each iteration takes constant time. Hence, merging runs in linear time.
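A sketch of Figure 5.2 in C++ using std::list; splice() moves nodes between lists, so, as in the pseudocode, no list items are allocated or deallocated during merging. The function name and the int element type are our choices.

#include <iterator>
#include <list>

std::list<int> mergeSort(std::list<int> s) {
  if (s.size() <= 1) return s;
  std::list<int> b;
  auto mid = std::next(s.begin(), s.size() / 2);
  b.splice(b.begin(), s, mid, s.end());      // split off the second half
  std::list<int> a = mergeSort(std::move(s));
  b = mergeSort(std::move(b));
  std::list<int> c;                          // merge: repeatedly move the smaller front element
  while (!a.empty() && !b.empty()) {
    if (a.front() <= b.front()) c.splice(c.end(), a, a.begin());
    else                        c.splice(c.end(), b, b.begin());
  }
  c.splice(c.end(), a);                      // append whichever remainder is nonempty
  c.splice(c.end(), b);
  return c;
}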

Note that no allocation and deallocation of list items is needed. Each iteration of the Exercise 5.15 Give an efficient array based implementation of mergesort in your fa- inner loop of merge performs one element comparison and moves one element to the vorite imperative programming language. Besides the input array, allocate one auxil- output. Each iteration takes constant time. Hence merging runs in linear time. iary array of size n at the beginning and then use these two arrays to store all interme- diate results. Can you improve running time by switching to insertion sort for small Theorem 5.1 Function merge applied to sequences of total length n executes in time inputs? If so, what is the optimal switching point in your implementation? O(n) and performs at most n 1 element comparisons. − Exercise 5.16 The way we describe merge, there are three comparisons for each loop iteration — one element comparison and two termination tests. Develop a variant For the running time of mergesort we get. using sentinels that needs only one termination test. How can you do it without ap- pending dummy elements to the sequences? Theorem 5.2 Mergesort runs in time O(nlogn) and performs no more than nlogn element comparisons. Exercise 5.17 Exercise 3.19 introduces a list-of-blocks representation for sequences. Implement merging and mergesort for this data structure. In merging, reuse emptied Proof: Let C(n) denote the number of element comparisons performed. We have input blocks for the output sequence. Compare space and time efficiency of mergesort C(0) = C(1) = 0 and C(n) C( n/2 ) +C( n/2 ) + n 1 using Theorem 5.1. By for this data structure, plain linked lists, and arrays. (Constant factors matter.) = Equation (??)[todo], this recurrence≤ b hasc the solutiond e − ⇒ logn C(n) n logn 2d e + 1 nlogn . 5.3 A Lower Bound ≤ d e − ≤ The bound for the execution time can be verified using a similar recurrence relation. It is a natural question whether we can find a sorting algorithm that is asymptotically faster than mergesort. Can we even achieve time O(n)? The answer is (almost) no. The reason can be explained using the analogy of a sports competition. Assume your Mergesort is the method of choice for sorting linked lists and is therefore fre- local tennis league decides that it wants to find a complete ranking of the players, i.e., quently used in functional and logical programming languages that have lists as their a mapping r from the n players to 1..n such that r(x) < r(y) if and only if player x is primary data structure. In Section 5.3 we will see that mergesort performs basically better than player y. We make the (unrealistic) assumption that a single match suffices the minimal number of comparisons needed so that it is also a good choice if com- to sample the ‘plays-better-than’ relation ‘<’ and that this relation represents a total parisons are expensive. When implemented using arrays, mergesort has the additional order. Now we ask ourselves what is the minimal number of matches needed to find advantage that it streams through memory in a sequential way. This makes it efficient the ranking. The analogy translates back to sorting if we restrict ourselves to algo- in memory hierarchies. Section 5.7 has more on that issue. Mergesort is still not the rithms that use comparisons only to obtain information about keys. Such algorithms usual method of choice for an efficient array based implementation since merge does are called comparison based algorithms. not work in-place. 
(But see Exercise 5.17 for a possible way out.)

Exercise 5.12 Explain how to insert k new elements into a sorted list of size n in time O(k log k + n).

The lower bound follows from two simple observations. Every comparison (tennis match) gives us only one bit of information about the ranking, and we have to distinguish between n! different possible rankings. After T comparisons we can distinguish between at most 2^T inputs. Hence we need

2^T ≥ n!, or T ≥ log n!, comparisons to distinguish between all possible inputs. The rest is arithmetic. By Stirling's approximation of the factorial (Equation (A.10)) we get

T ≥ log n! ≥ log (n/e)^n = n log n − n log e .

Exercise 5.13 Explain how to implement the high level description of routine mergeSort with the same list interface from Chapter 3 that is used for merge.

Exercise 5.14 Implement mergesort in your favorite functional programming language.

Theorem 5.3 Any comparison based sorting algorithm needs n log n − O(n) comparisons in the worst case.
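For concreteness, the chain of inequalities can be written out as follows; the numerical instance for n = 1024 is our own illustration, not part of the text.

\[
  T \;\ge\; \log n! \;\ge\; \log\Bigl(\frac{n}{e}\Bigr)^{n} \;=\; n\log n - n\log e \;\approx\; n\log n - 1.443\,n .
\]

For n = 1024 this gives log(1024!) ≈ 8769, compared with n log n = 10240, so even an optimal comparison based algorithm must perform roughly 86% of n log n comparisons.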

Using similar arguments, Theorem 5.3 can be strengthened further. The same bound (just with different constants hidden in the linear term) applies on the average, i.e., worst case sorting problems are not much more difficult than randomly permuted inputs. Furthermore, the bound even applies if we only want to solve the seemingly simpler problem of checking whether an element appears twice in a sequence.

Function quickSort(s : Sequence of Element) : Sequence of Element
  if |s| ≤ 1 then return s                                  // base case
  pick p ∈ s uniformly at random                            // pivot key
  a := ⟨e ∈ s : e < p⟩                                      // (A)
  b := ⟨e ∈ s : e = p⟩                                      // (B)
  c := ⟨e ∈ s : e > p⟩                                      // (C)
  return concatenation of quickSort(a), b, and quickSort(c)

Figure 5.4: Quicksort

Exercise 5.18 Exercise 4.3 asks you to count occurrences in a file.

a) Argue that any comparison based algorithm needs time Ω(n log n) to solve the problem on a file of length n.

b) Explain why this is no contradiction to the fact that the problem can be solved in linear time using hashing.
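As an aside, the high-level quicksort of Figure 5.4 translates almost literally into C++. The following sketch (ours, using std::vector and int elements) is meant only to illustrate the structure, not as an efficient implementation.

#include <random>
#include <vector>

std::vector<int> quickSort(const std::vector<int>& s) {
    if (s.size() <= 1) return s;                        // base case
    static std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pos(0, s.size() - 1);
    int p = s[pos(rng)];                                // random pivot key
    std::vector<int> a, b, c;                           // < p, = p, > p
    for (int e : s) {
        if (e < p)       a.push_back(e);
        else if (e == p) b.push_back(e);
        else             c.push_back(e);
    }
    a = quickSort(a);
    c = quickSort(c);
    a.insert(a.end(), b.begin(), b.end());              // concatenate a, b, c
    a.insert(a.end(), c.begin(), c.end());
    return a;
}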

Exercise 5.19 (Sorting small inputs optimally.) Give an algorithm for sorting k elements using at most ⌈log k!⌉ element comparisons.

a) For k ∈ {2,3,4}. Hint: use mergesort.

*b) For k = 5 (seven comparisons). Implement this algorithm efficiently and use it as the base case for a fast implementation of mergesort.

c) For k ∈ {6,7,8}. Hint: use the case k = 5 as a subroutine.

[ask Jyrky for sources] [nice geometric example for a lower bound based on sorting]

[Figure 5.5 shows the execution traces of quickSort and qSort on the example input; only the caption is reproduced here.]

Figure 5.5: Execution of quickSort (Figure 5.4) and qSort (Figure 5.6) on ⟨2,7,1,8,2,8,1⟩ using the first character of a subsequence as the pivot. The right block shows the first execution of the repeat loop for partitioning the input in qSort.

5.4 Quicksort

[can the pictures for mergesort and quicksort be drawn so that they look dual to each other?] Quicksort is a divide-and-conquer algorithm that is complementary to the mergesort algorithm we have seen in Section 5.2. Quicksort does all the difficult work before the recursive calls. The idea is to distribute the input elements to two or more sequences that represent disjoint ranges of key values. Then it suffices to sort the shorter sequences recursively and to concatenate the results. To make the symmetry to mergesort complete, we would like to split the input into two sequences of equal size. Since this would be rather difficult, we use the easier approach to pick a random splitter element or pivot p. Elements are classified into three sequences a, b, and c of elements that are smaller, equal to, or larger than p respectively. Figure 5.4 gives a high level realization of this idea and Figure 5.5 depicts an example. This simple algorithm is already enough to show expected execution time O(n log n) in Section 5.4.1. In Section 5.4.2 we then discuss refinements that make quicksort the most widely used sorting algorithm in practice.

5.4.1 Analysis

To analyze the running time of quicksort for an input sequence s = ⟨e1,...,en⟩ we focus on the number of element comparisons performed. Other operations contribute only constant factors and small additive terms in the execution time.

Let C(n) denote the worst case number of comparisons needed for any input sequence of size n and any choice of random pivots. The worst case performance is easily determined. Lines (A), (B), and (C) in Figure 5.4 can be implemented in such a way that all elements except for the pivot are compared with the pivot once (we allow three-way comparisons here, with possible outcomes 'smaller', 'equal', and 'larger'). This makes n − 1 comparisons. Assume there are k elements smaller than the pivot and k′ elements larger than the pivot. We get C(0) = C(1) = 0 and

C(n) = n − 1 + max{ C(k) + C(k′) : 0 ≤ k ≤ n − 1, 0 ≤ k′ < n − k } .

By induction it is easy to verify that

C(n) = n(n − 1)/2 = Θ(n²) .

The worst case occurs if all elements are different and we are always so unlucky to pick the largest or smallest element as a pivot. The expected performance is much better.

Theorem 5.4 The expected number of comparisons performed by quicksort is

C̄(n) ≤ 2n ln n ≤ 1.4 n log n .

We concentrate on the case that all elements are different. Other cases are easier because a pivot that occurs several times results in a larger middle sequence b that need not be processed any further. Let s′ = ⟨e′1,...,e′n⟩ denote the elements of the input sequence in sorted order. Elements e′i and e′j are compared at most once and only if one of them is picked as a pivot. Hence, we can count comparisons by looking at the indicator random variables X_ij, i < j, where X_ij = 1 if e′i and e′j are compared and X_ij = 0 otherwise. We get

C̄(n) = E[ ∑_{i=1}^{n} ∑_{j=i+1}^{n} X_ij ] = ∑_{i=1}^{n} ∑_{j=i+1}^{n} E[X_ij] = ∑_{i=1}^{n} ∑_{j=i+1}^{n} prob(X_ij = 1) .

The middle transformation follows from the linearity of expectation (Equation (A.2)). The last equation uses the definition of the expectation of an indicator random variable, E[X_ij] = prob(X_ij = 1). Before we can further simplify the expression for C̄(n), we need to determine this probability.

Lemma 5.5 For any i < j, prob(X_ij = 1) = 2/(j − i + 1).

Proof: Consider the j − i + 1 element set M = {e′i,...,e′j}. As long as no pivot from M is selected, e′i and e′j are not compared but all elements from M are passed to the same recursive calls. Eventually, a pivot p from M is selected. Each element in M has the same chance 1/|M| to be selected. If p = e′i or p = e′j we have X_ij = 1. The probability for this event is 2/|M| = 2/(j − i + 1). Otherwise, e′i and e′j are passed to different recursive calls so that they will never be compared.

Now we can finish the proof of Theorem 5.4 using relatively simple calculations.

C̄(n) = ∑_{i=1}^{n} ∑_{j=i+1}^{n} prob(X_ij = 1) = ∑_{i=1}^{n} ∑_{j=i+1}^{n} 2/(j − i + 1) = ∑_{i=1}^{n} ∑_{k=2}^{n−i+1} 2/k
     ≤ ∑_{i=1}^{n} ∑_{k=2}^{n} 2/k = 2n ∑_{k=2}^{n} 1/k = 2n(H_n − 1) ≤ 2n(ln n + 1 − 1) = 2n ln n .

For the last steps, recall the properties of the harmonic number H_n := ∑_{k=1}^{n} 1/k ≤ ln n + 1 (Equation (A.8)).

Note that the calculations in Section 2.6 for left-right maxima were very similar although we had a quite different problem at hand.

5.4.2 Refinements

[implement]

Figure 5.6 gives pseudocode for an array based quicksort that works in-place and uses several implementation tricks that make it faster and very space efficient.

To make a recursive algorithm compatible with the requirement of in-place sorting of an array, quicksort is called with a reference to the array and the range of array indices to be sorted. Very small subproblems with size up to n0 are sorted faster using a simple algorithm like the insertion sort from Figure 5.1.³

³ Some books propose to leave small pieces unsorted and clean up at the end using a single insertion sort that will be fast according to Exercise 5.7. Although this nice trick reduces the number of instructions executed by the processor, our solution is faster on modern machines because the subarray to be sorted will already be in cache.

The best choice for the

constant n0 depends on many details of the machine and the compiler. Usually one should expect values around 10–40. The pivot element is chosen by a function pickPivotPos that we have not specified here. The idea is to find a pivot that splits the input more accurately than just choosing a random element. A method frequently used in practice chooses the median (‘mid- dle’) of three elements. An even better method would choose the exact median of a random sample of elements. [crossref to a more detailed explanation of this concept?] = ⇐ The repeat-until loop partitions the subarray into two smaller subarrays. Elements // Sort the subarray a[`..r] equal to the pivot can end up on either side or between the two subarrays. Since Procedure qSort(a : Array of Element; `,r : ) quicksort spends most of its time in this partitioning loop, its implementation details while r ` n0 do // Use divide-and-conquer are important. Index variable i scans the input from left to right and j scans from right j :=−pickPivotP≥ os(a,l,r) to left. The key invariant is that elements left of i are no larger than the pivot whereas swap(a[`],a[ j]) // Helps to establish the invariant elements right of j are no smaller than the pivot. Loops (A) and (B) scan over elements p := a[`] that already satisfiy this invariant. When a[i] p and a[ j] p, scanning can be con- ≥ ≤ i := `; j := r tinued after swapping these two elements. Once indices i and j meet, the partitioning repeat // a: ` i j r is completed. Now, a[`.. j] represents the left partition and a[i..r] represents the right → ← invariant 1: i0 `..i 1 : a[i0] p // a: p partition. This sounds simple enough but for a correct and fast implementation, some invariant 2: ∀ j ∈ j +−1..r: a[ j ]≤ p // a: ∀ ≤ p subtleties come into play. ∀ 0 ∈ 0 ≥ ∀ ≥ invariant 3: i0 i..r : a[i0] p // a: p To ensure termination, we verify that no single piece represents all of a[`..r] even ∃ ∈ ≥ ∃ ≥ invariant 4: j `.. j : a[ j ] p // a: p if p is the smallest or largest array element. So, suppose p is the smallest element. ∃ 0 ∈ 0 ≤ ∃ ≤ while a[i] < p do i++ // Scan over elements (A) Then loop A first stops at i = `; loop B stops at the last occurence of p. Then a[i] and while a[ j] > p do j // on the correct side (B) a[ j] are swapped (even if i = j) and i is incremented. Since i is never decremented, −− if i j then swap(a[i],a[ j]); i++ ; j the right partition a[i..r] will not represent the entire subarray a[`..r]. The case that p ≤ −− until i > j // Done partitioning is the largest element can be handled using a symmetric argument. l+r if i < 2 then qSort(a,`,j); ` := j + 1 The scanning loops A and B are very fast because they make only a single test. else qSort(a,i,r) ; r := i 1 On the first glance, that looks dangerous. For example, index i could run beyond the insertionSort(a[l..r]) − // faster for small r l − right boundary r if all elements in a[i..r] were smaller than the pivot. But this cannot happen. Initially, the pivot is in a[i..r] and serves as a sentinel that can stop Scanning Figure 5.6: Refined quicksort Loop A. Later, the elements swapped to the right are large enough to play the role of a sentinel. Invariant 3 expresses this requirement that ensures termination of Scanning Loop A. Symmetric arguments apply for Invariant 4 and Scanning Loop B. Our array quicksort handles recursion in a seemingly strange way. It is something like “semi-recursive”. The smaller partition is sorted recursively, while the larger partition is sorted iteratively by adjusting ` and r. 
This measure ensures that the recursion can never go deeper than ⌈log(n/n0)⌉ levels. Hence, the space needed for the recursion stack is O(log n). Note that a completely recursive algorithm could reach a recursion depth of n − 1, so that the space needed for the recursion stack could be considerably larger than for the input array itself.
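The following C++ sketch (ours) mirrors the structure of Figure 5.6: random pivot, the two scanning loops, and the semi-recursive treatment of the partitions. The threshold n0 = 16 and the pivot choice are simplifications, not tuned values.

#include <algorithm>
#include <cstddef>
#include <random>

const std::ptrdiff_t n0 = 16;   // switch to insertion sort below this size (our choice)

void insertionSort(int* a, std::ptrdiff_t l, std::ptrdiff_t r) {
    for (std::ptrdiff_t i = l + 1; i <= r; ++i) {
        int e = a[i];
        std::ptrdiff_t j = i;
        while (j > l && a[j - 1] > e) { a[j] = a[j - 1]; --j; }
        a[j] = e;
    }
}

// Sort the subarray a[l..r], following Figure 5.6.
void qSort(int* a, std::ptrdiff_t l, std::ptrdiff_t r) {
    static std::mt19937 rng(12345);
    while (r - l >= n0) {                               // divide-and-conquer
        std::uniform_int_distribution<std::ptrdiff_t> pos(l, r);
        std::swap(a[l], a[pos(rng)]);                   // random pivot moved to the front
        int p = a[l];
        std::ptrdiff_t i = l, j = r;
        do {                                            // partition a[l..r]
            while (a[i] < p) ++i;                       // scanning loop (A)
            while (a[j] > p) --j;                       // scanning loop (B)
            if (i <= j) { std::swap(a[i], a[j]); ++i; --j; }
        } while (i <= j);
        if (i < l + (r - l) / 2) { qSort(a, l, j); l = i; }   // recurse on the smaller part,
        else                     { qSort(a, i, r); r = j; }   // iterate on the larger part
    }
    insertionSort(a, l, r);                             // faster for small r - l
}

Because only the smaller partition is sorted by a recursive call, the recursion depth stays logarithmic, as argued above.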

*Exercise 5.20 (Sorting Strings using Multikey Quicksort [12]) Explain why mkqSort(s,1) below correctly sorts a sequence s consisting of n different strings. Assume that for // Find an element with rank k

any e s, e[ e +1] is an end-marker character that is different from all “normal” char- Function select(s : Sequence of Element; k : ) : Element acters.∈What| goes| wrong if s contains equal strings? Fix this problem. Show that the assert s k | | ≥ expected execution time of mkqSort is O(N + nlogn) if N = ∑e s e . pick p s uniformly at random // pivot key ∈ | | a := e∈ s : e < p k Function mkqSort(s : Sequence of String, i : ) : Sequence of String if a h ∈k then returi n select(a,k) // a | | ≥ assert e.e0 s : e[1..i 1] = e0[1..i 1] b := e s : e = p ∀ ∈ − − k if s 1 then return s // base case if a h+∈b k theni return p // a | | ≤ b pick p s uniformly at random // pivot character c :=| | e | s| ≥: e > p ∈ h ∈ i k return concatenation of mkqSort( e s : e[i] < p[i] ,i), return select(c,k a b ) // a b c mkqSort(he ∈ s : e[i] = p[i]i ,i + 1), and − | | − | | mkqSort(he ∈ s : e[i] > p[i]i,i) h ∈ i Figure 5.7: Quickselect 5.5 Selection in a and b are too small for a rank k element so that we can restrict our attention to c. Often we want to solve problems that are related to sorting but do not need the com- We have to search for an element with rank k a + b . plete sorted sequence. We can then look for specialized algorithms that are faster. − | | | | For example, when you want to find the smallest element of a sequence, you would The table below illustrates the levels of recursion entered by select( 3,1,4,5,9,2,6,5,3,5,8 ,6) = 5 assuming that the middle element of the hardly sort the entire sequence and then take the first element. Rather, you would just h i scan the sequence once and remember the smallest element. You could quickly come current s is used as the pivot p. s k p a b c up with algorithms for finding the second smallest element etc. More generally, we could ask for the k-th smallest element or all the k smallest elments in arbitrary order. 3,1,4,5,9,2,6,5,3,5,8 6 2 1 2 3,4,5,9,6,5,3,5,8 h 3 4 5 9 6 5 3 5 8 i 4 6 3 4 5h 5i 3 4 h-i h - i In particular, in statistical problems, we are often interested in the median or n/2-th , , , , , , , , , , , , , h 3,4,5,5,3,5 i 4 5 h 3,4,3 i 5,5,5 - smallest element. Is this still simpler than sorting? h i h i h i To define the term “k-th smallest” if elements can be equal, it is useful to formally As for quicksort, the worst case execution time of quickselect is quadratic. But the introduce the notion of rank of an element. A ranking function r is a one-to-one expected execution time is only linear saving us a logarithmic factor over the execution mapping of elements of a sequence e1,...,en to the range 1..n such that r(x) < r(y) time of quicksort. if x < y. Equal elements are rankedh arbitrarilyi . A k-th smallest element is then an element that has rank k for some ranking function.[is this the common def. I could = not find this anywhere?] ⇒ Theorem 5.6 Algorithm quickselect runs in expected time O(n) for an input of size Once we know quicksort, it is remarkably easy to modify it to obtain an efficient s = n. selection algorithm from it. This algorithm is therefore called quickselect. The key | | observation is that it is always sufficient to follow at most one of the recursive calls. Figure 5.7 gives an adaption for the simple sequence based quicksort from Figure 5.4. Proof: An analysis that does not care about constant factors is remarkably easy to As before, the sequences a, b, and c are defined to contain the elements smaller than obtain. Let T(n) denote the expected execution time of quickselect. 
the pivot, equal to the pivot, and larger than the pivot respectively. If |a| ≥ k, it suffices to restrict selection to a. In the borderline case that |a| < k but |a| + |b| ≥ k, the pivot is an element with rank k and we are done. Note that this also covers the case |s| = k = 1 so that no separate base case is needed. Finally, if |a| + |b| < k the elements

Call a pivot good if neither |a| nor |b| is larger than 2n/3. Let ρ ≥ 1/3 denote the probability that p is good. We now make the conservative assumption that the problem size in the recursive call is only reduced for good pivots and that even then it is only reduced by a factor of 2/3. Since the work outside the recursive call is linear in n, there is an appropriate

constant c such that class of algorithms. Evading this class of algorithms is then a guideline for getting faster. For sorting, we devise algorithms that are not comparison based but extract 2n T (n) cn + ρT + (1 ρ)T (n) or, equivalently more information about keys in constant time. Hence, the Ω(nlogn) lower bound for ≤ 3 −   comparison based sorting does not apply. cn 2n 2n Let us start with a very simple algorithm that is fast if the keys are small integers in T (n) + T 3cn + T . ≤ ρ 3 ≤ 3 the range 0..K 1. This algorithm runs in time O(n + K). We use an array b[0..K 1]     of buckets that−are initially empty. Then we scan the input and insert an element with− = Now the master Theorem ?? [todo: need the version for arbitrary n.] for recurrence ⇒ key k into bucket b[k]. This can be done in constant time per element for example relations yields the desired linear bound for T(n). using linked lists for the buckets. Finally, we append all the nonempty buckets to obtain a sorted output. Figure 5.8 gives pseudocode. For example, if elements are = [approximate median plus filter algorithm in exercise?] pairs whose first element is a key in range 0..3 and ⇒ s = (3,a),(1,b),(2,c),(3,d),(0,e),(0, f ),(3,g),(2,h),(1,i) Exercise 5.21 Modify the algorithm in Figure 5.7 such that it returns the k smallest h i elements. we get b = [ (0,e),(0, f ) , (1,b),(1,i) , (2,c),(2,h) , (3,a),(3,d),(3,g) ] and the sortedh output (0,ie),h(0, f ),(1,b),i(1,hi),(2,c),(2,hi),(h3,a),(3,d),(3,g) i. Exercise 5.22 Give a selection algorithm that permutes an array in such a way that h i the k smallest elements are in entries a[1],. . . , a[k]. No further ordering is required except that a[k] should have rank k. Adapt the implementation tricks from array based Procedure KSort(s : Sequence of Element) s e quicksort to obtain a nonrecursive algorithm with fast inner loops. b= ,..., : Array [0..K 1] of Sequence of Element foreachhhi e hiis do b[key(e)].pushBac− k(e) // = [some nice applications] ∈ =⇒ [in PQ chapter?] s := concatenation of b[0],...,b[K 1] ⇒ − b[0] b[1] b[2] b[3] b[4] Exercise 5.23 (Streaming selection.) Figure 5.8: Sorting with keys in the range 0..K 1. − a) Develop an algorithm that finds the k-th smallest element of a sequence that is presented to you one element at a time in an order you cannot control. You have only space O(k) available. This models a situation where voluminous data Procedure LSDRadixSort(s : Sequence of Element) arrives over a network or at a sensor. for i := 0 to d 1 do digits − i b) Refine your algorithm so that it achieves running time O(nlogk). You may want redefine key(x) as (x div K ) mod K // x d−1 ... i ... 1 0 to read some of Chapter 6 first. KSort(s) key(x) invariant s is sorted with respect to digits 0..i *c) Refine the algorithm and its analysis further so that your algorithm runs in av- erage case time O(n) if k = O n/log2 n . Here, average means that all presen- tation orders of elements in the sequence are equally likely. Figure 5.9: Sorting with keys in the range 0..Kd 1 using least significant digit radix  sort. − 5.6 Breaking the Lower Bound KSort can be used as a building block for sorting larger keys. The idea behind radix sort is to view integer keys as numbers represented by digits in the range 0..K 1. Lower bounds are not so much a verdict that something cannot be improved but can be Then KSort is applied once for each digit. Figure 5.9 gives a radix sorting algorithm− viewed as an opportunity to break them. 
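A C++ rendering of the bucket-sort routine KSort of Figure 5.8 might look as follows; representing an element as a (key, value) pair and using std::vector buckets are our own simplifications.

#include <utility>
#include <vector>

// Stable bucket sort for elements whose keys lie in 0..K-1 (cf. Figure 5.8).
// Runs in time O(n + K).
void kSort(std::vector<std::pair<int, char>>& s, int K) {
    std::vector<std::vector<std::pair<int, char>>> b(K);   // K initially empty buckets
    for (const auto& e : s) b[e.first].push_back(e);        // distribute by key
    s.clear();
    for (int k = 0; k < K; ++k)                              // concatenate the buckets
        s.insert(s.end(), b[k].begin(), b[k].end());
}

Applying this routine once per digit, least significant digit first, gives the radix sort of Figure 5.9.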
Lower bounds often hold only for a restricted

for keys in the range 0..K^d − 1 that runs in time O(d(n + K)). The elements are sorted

first by their least significant digit then by the second least significant digit and so on Theorem 5.7 If keys are independent uniformly distributed random values in the until the most significant digit is used for sorting. It is not obvious why this works. range [0.1), Algorithm uniformSort from Figure 5.10 sorts n keys in expected time LSDRadixSort exploits the property of KSort that elements with the same key(e) retain O(n) and worst case time O(nlogn). their relative order. Such sorting algorithms are called stable. Since KSort is stable, the elements with the same i-th digit remain sorted with respect to digits 0..i 1 during Proof: We leave the worst case bound as an exercise and concentrate on the average the sorting process by digit i. For example, if K = 10, d = 3, and − case analysis. The total execution time T is O(n) for setting up the buckets and con- catenating the sorted buckets plus the time for sorting the buckets. Let Ti denote the s = 017,042,666,007,111,911,999 , we successively get time for sorting the i-th bucket. We get h i s = 111,911,042,666,017,007,999 , h i E[T ] = O(n) + E[∑ Ti] = O(n) + ∑ E[Ti] = nE[T0] . s = 007,111,911,017,042,666,999 , and i M2/B, using log n/M merging phases. sequence. This is easy to implement by just keeping one block of each sequence in d M/B e 98 Sorting and Selection 5.8 Implementation Notes 99

Exercise 5.26 (Balanced systems.) Study the current market prices of computers, in- ternal memory, and mass storage (currently hard disks). Also estimate the block size

needed to achieve good bandwidth for I/O. Can you find any configuration where multi-way mergesort would require more than one merging phase for sorting an input Procedure twoPassSort(M : ; a : external Array [0..n 1] of Element) b : external Array [0..n 1] of Element − // auxiliary storage filling all the disks in the system? If so, which fraction of the system cost would you − have to spend on additional internal memory to go back to a single merging phase? formRuns(M,a,b) mergeRuns(M,b,a) = [something on implementing a join?] ⇒ // Sort runs of size M from f writing sorted runs to t [reinstate sample sort? otherwise say sth in further findings or perhaps in Procedure formRuns(M : ; f ,t : external Array [0..n 1] of Element) = a big exercise] − ⇒ for i := 0 to n 1 step M do M − run := f [i..i + M 1] // M/B read steps f i=0 i=1 i=2 − 5.8 Implementation Notes sort(run) run sort internal t[i..i + M 1] := run // M/B write steps i=0 i=1 i=2 Sorting algorithms are usually available in standard libraries so that you may not have − t to implement them yourself. Many good libraries use tuned implementations of quick- // Merge n elements from f to t where f stores sorted runs of size L sort. If you want to be faster than that you may have to resort to non comparison based Procedure mergeRuns(L : ; f ,t : external Array [0..n 1] of Element) − algorithms. Even then, a careful implementation is necessary. Figure 5.14 gives an k := n/L // Number of runs d e example for an array based implementation of the algorithm from Figure 5.8. Even next : PriorityQueue of Key × this algorithm may be slower than quicksort for large inputs. The reason is that the runBuffer := Array [0..k 1][0..B 1] of Element − − distribution of elements to buckets causes a cache fault for every element. for i := 0 to k 1 do − To fix this problem one can use multi-phase algorithms similar to MSD radix sort. runBuffer[i] := f [iM..iM + B 1] f i=0 i=1 i=2 next.insert key runBuffer i 0− iL The number K of output sequences should be chosen in such a way that one block ( ( [ ][ ]), )) 0 1 2 run− from each bucket is kept in the cache 4. The distribution degree K can be larger when Buffer // k-way merging the subarray to be sorted fits in the cache. We can then switch to a variant of Algorithm B next out : Array [0..B 1] of Element uniformSort in Figure 5.10. for i := 0 to n 1−step B do out k internal Another important practical aspect concerns the type of elements to be sorted. for j := 0 to− B 1 do Sometimes, we have rather large elements that are sorted with respect to small keys. t i=0 i=1 i=2 (x, `) := next.deleteMin− For example, you might want to sort an employee database by last name. In this situ- out[ j] := runBuffer[` div L][` mod B] ation, it makes sense to first extract the keys and store them in an array together with `++ pointers to the original elements. Then only the key-pointer pairs are sorted. If the if ` mod B = 0 then // New input block orginal elements need to brought into sorted order, they can be permuted accordingly if ` mod L = 0 ` = n then in linear time using the sorted key-pointer pairs. runBuffer[`∨div L][0] := ∞ // sentinel for exhausted run else runBuffer[` div L] := f [`..` + B 1] // refill buffer C/C++ next.insert((runBuffer[` div L][0],`)) − write out to t[i..i + B 1] // One output step Sorting is one of the few algorithms that is part of the C standard library. However, this − function qsort is slower and less easy to use than the C++-function sort. 
Figure 5.12: External k-way mergesort.

⁴ If there are M/B cache blocks this does not mean that we can use k = M/B − 1. A discussion of this issue can be found in [70].

The main reason is that qsort is passed a pointer to a function responsible for element comparisons.

This function has to be called for every single element comparison. In contrast, sort uses the template mechanism of C++ to figure out at compile time how comparisons are performed, so that the code generated for comparisons is often a single machine instruction. The parameters passed to sort are an iterator pointing to the start of the sequence to be sorted and an iterator pointing after the end of the sequence. Hence, sort can be applied to lists, arrays, etc. In our experiments on an Intel Pentium III and gcc 2.95, sort on arrays runs faster than our manual implementation of quicksort. One possible reason is that compiler designers may tune their code optimizers until they find that good code for the library version of quicksort is generated.

[Figure 5.13 shows the state of twoPassSort during merging; only the caption is reproduced here.]

Figure 5.13: Execution of twoPassSort for the characters of "make things as simple as possible but no simpler". We have runs of size M = 12 and block size B = 2. Ties are broken so that the element from the leftmost run is taken. The picture shows the moment of time just before a block with the third and fourth s is output. Note that the second run has already been exhausted.

Java

[todo: Sorting in the Java library]
[4–8 way merging]
[more on streaming algorithms]

Exercises

Exercise 5.27 Give a C or C++ implementation of the quicksort in Figure 5.6 that uses only two parameters: a pointer to the (sub)array to be sorted, and its size.
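The difference in calling conventions can be seen in a minimal example of our own: qsort receives the comparison through a function pointer and calls it indirectly, whereas sort fixes the comparison at compile time.

#include <algorithm>
#include <cstddef>
#include <cstdlib>

// C interface: comparator as a function pointer, one indirect call per comparison.
int cmpInt(const void* x, const void* y) {
    int a = *static_cast<const int*>(x);
    int b = *static_cast<const int*>(y);
    return (a > b) - (a < b);
}

void sortBothWays(int* a, std::size_t n) {
    std::qsort(a, n, sizeof(int), cmpInt);   // C standard library
    // C++ interface: operator< on int is known at compile time and can be inlined.
    std::sort(a, a + n);
}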

5.9 Further Findings

Procedure KSortArray(a,b : Array [1..n] of Element)

In this book you will find several generalizations of sorting. Chapter 6 discusses

priority queues — a data structure that allows insertion of elements and deletion of c= 0,...,0 : Array [0..K 1] of // counters for each bucket h i − the smallest element. In particular, by inserting n elements and then deleting them we for i := 1 to n do c[key(a[i])]++ // Count bucket sizes get the elements in sorted order. It turns out that this approach yields some quite good C := 0 sorting algorithm. A further generalization are the search trees introduced in Section 7 for k := 0 to K 1 do (C,c[k]) := (C + c[k],C) // Store ∑i

out the resulting recurrences yields a linear time worst case execution time but the of correctly spelled words. To check a text, convert it to a sequence of words, sort it, constant factors involved make this algorithms impractical. There are quite practical scan text and dictionary simultaneously, and output the words in the text that do not ways to reduce the expected number of comparisons required by quicksort. Using the appear in the dictionary. Implement this spell checker using any unix tools in as few median of three random elements yields an algorithm with about 1.188nlogn compar- lines as possible (one longish line might be enough). isons. The median of three three-medians brings this down to 1.094nlogn [10]. A “perfect” implementation makes the number of elements considered≈ for pivot selec- [ssssort? skewed qsort? cache oblivious funnel sort? Cole’s merge sort sort 13 elements? more than logn! vergleiche.] = tion dependent on size of the subproblem. Martinez and Roura [62] show that for a d e ⇐ subproblem of size m, the median of Θ(√m) elements is a good choice for the pivot. The total number of comparisons required is then (1 + o(1))nlogn, i.e., it matches the lower bound of nlogn O(n) up to lower order terms. A deterministic variant of quicksort that might be practical− is proportion extend sort [21]. A classical sorting algorithm of some historical interest is shell sort [48, 85] — a quite simple generalization of insertion sort, that gains efficiency by also comparing nonadjacent elements. It is still open whether there might be a variant of Shellsort that achieves O(nlogn) run time on the average [48, 63]. There are some interesting tricks to improve external multiway mergesort. The snow plow heuristics [57, Section 5.4.1] forms runs of size 2M on the average using a fast memory of size M: When an element e is read from disk, make room by writing the smallest element e0 from the current run to disk. If e e0 insert e into the current run. Otherwise, remember it for the next run. Multiway mer≤ ging can be slightly sped up using a tournament tree rather than general priority queues [57].[discuss in PQ = chapter?] ⇒ To sort large data sets, we may also want to use parallelism. Multiway merge- sort and distribution sort can be adapted to D parallel disks by striping, i.e., every D consecutive blocks in a run or bucket are evenly distributed over the disks. Using randomization, this idea can be developed into almost optimal algorithms that also overlaps I/O and computation [28]. Perhaps the best sorting algorithm for large inputs on P parallel processors is a parallel implementation of the sample sort algorithm from Section ?? [14]. = [more lower bounds, e.g., selection, I/O? Or do it in detail?] ⇒ We have seen linear time algorithms for rather specialized inputs. A quite gen- eral model, where the nlogn lower bound can be broken, is the word model. If keys are integers that can be stored in a memory word, then they can be sorted in time O(nloglogn) regardless of the word size as long as we assume that simple operations = on[what exactly] words can be performed in constant time [5]. A possibly practi- ⇒ cal implementation of the distribution based algorithms from Section 5.6 that works almost in-place is flash sort [75].

Exercise 5.28 (Unix spell checking) One of the authors still finds the following spell checker most effective: Assume you have a dictionary consisting of a sorted sequence
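The exercise asks for Unix tools; as an additional illustration only, the same sort-and-scan idea could be sketched in C++ as follows. The file names, the whitespace tokenization, and the assumption that both word lists fit in memory are our own choices.

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Report words of the text that do not occur in the (sorted) dictionary:
// sort the text words, then scan both sorted sequences simultaneously.
int main() {
    std::ifstream dictFile("dictionary.txt"), textFile("text.txt");
    std::istream_iterator<std::string> eos;
    std::vector<std::string> dict{std::istream_iterator<std::string>{dictFile}, eos};
    std::vector<std::string> words{std::istream_iterator<std::string>{textFile}, eos};
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    // The dictionary is assumed to be sorted already, as in the exercise.
    std::set_difference(words.begin(), words.end(), dict.begin(), dict.end(),
                        std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}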

Chapter 6

Priority Queues

Suppose you work for a company that markets tailor-made first-rate garments. Your business model works as follows: you organize marketing, measurements etc. and get 20% of the money paid for each order. Actually executing an order is subcontracted to an independent master tailor. When your company was founded in the 19th century there were five subcontractors in the home town of your company. Now you control 15% of the world market and there are thousands of subcontractors worldwide. Your task is to assign orders to the subcontractors. The contracts demand that an order is assigned to the tailor who has so far (this year) been assigned the smallest total amount of orders. Your ancestors have used a blackboard with the current sum of orders for each tailor. But for such a large number of subcontractors it would be prohibitive to go through the entire list every time you get a new order. Can you come up with a more scalable solution where you have to look only at a small number of values to decide who will be assigned the next order?

In the following year the contracts are changed. In order to encourage timely delivery, the orders are now assigned to the tailor with the smallest amount of unfinished orders, i.e., whenever a finished order arrives, you have to deduct the value of the order from the backlog of the tailor who executed it. Is your strategy for assigning orders flexible enough to handle this efficiently?

[Are the references to the Summary and Introduction correct?]

The data structure needed for the above problem is a priority queue and shows up in many applications. But first let us look at a more formal specification. We maintain a set M of Elements with Keys. Every priority queue supports the following operations:

Procedure build({e1,...,en})   M := {e1,...,en}
Procedure insert(e)            M := M ∪ {e}
Function min                   return min M
Function deleteMin             e := min M; M := M \ {e}; return e
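For the first version of the tailor problem, these four operations are already provided by a library priority queue. The following C++ sketch (class and variable names are ours) keys each subcontractor by the total order value assigned so far.

#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Each subcontractor is represented by (total value assigned so far, id);
// the pair with the smallest total is always on top of the min-heap.
using Entry = std::pair<double, int>;

class OrderAssigner {
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
public:
    explicit OrderAssigner(int numTailors) {
        for (int id = 0; id < numTailors; ++id) pq.push({0.0, id});   // build
    }
    int assign(double orderValue) {
        Entry e = pq.top(); pq.pop();      // deleteMin: the least loaded tailor
        e.first += orderValue;             // add the new order to its total
        pq.push(e);                        // insert it again
        return e.second;                   // who gets the order
    }
};

The second version of the contracts, where finished orders are deducted again, needs updates of arbitrary elements — exactly the addressable operations introduced below, which std::priority_queue does not offer.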

= [replaced findMin by min] [brauchen wir Min und size? Dann in allen Kapiteln Another application of monotone priority queues is the best first branch-and- =⇒ durchschleifen? Im Moment habe ich size weggelassen weil trivial] This is bound approach to optimization described in Section 12.4.1. Here elements are partial ⇒ enough for the first part of our tailored example: Every year we build a new priority solutions of an optimization problem and the keys are optimistic estimates of the ob- queue containing an Element with Key zero for each contract tailor. To assign an tainable solution quality. The algorithm repeatedly removes the best looking partial order, we delete the smallest Element, add the order value to its Key, and reinsert it. solution, refines it, and inserts zero or more new partial solutions. Section 6.1 presents a simple and efficient implementation of this basic functionality. We will see two applications of addressable priority queues in the chapters on Addressable priority queues additionally support operations on arbitrary elements graph algorithms. In both applications the priority queue stores nodes of a graph. addressed by an element handle: Dijkstras algorithm for computing shortest paths in Section 10.4 uses a monotone pri- ority queue where the keys are path lenghts. The Jarn´ık-Prim algorithm for computing Function remove(h : Handle) e:= h; M:= M e ; return e \ { } minimum spanning trees in Section 11.2 uses a (nonmonotone) priority queue where Procedure decreaseKey(h : Handle,k : Key) assert key(h) k; key(h):= k the keys are edge weights connecting a node to a spanning tree. In both algorithms, Procedure merge(M ) M:= M M ≥ 0 ∪ 0 each edge can lead to a decreaseKey operation whereas there is at most one insert and deleteMin for each node. Hence, decreaseKey operations can dominate the run- = [index terms: delete: see also remove, meld: see also merge]. ning time if m n. [moved this discussion here since it fits to the application ⇒ overview]  = In our example, operation remove might be helpful when a contractor is fired be- ⇐ cause it delivers poor quality. Together with insert we can also implement the “new Exercise 6.1 Show how to implement bounded non-addressable priority queues by contract rules”: When an order is delivered, we remove the Element for the contrac- arrays. The maximal size of the queue is w and when the queue has size n the first tor who executed the order, subtract the value of the order from its Key value, and n entries of the array are used. Compare the complexity of the queue operations for reinsert the Element. DecreaseKey streamlines this process to a single operation. In two naive implementations. In the first implementation the array is unsorted and in Section 6.2 we will see that this is not just convenient but that decreasing a Key can be the second implementation the array is sorted. implemented more efficiently than arbitrary element updates. Priority queues support many important applications. For example, in Section 12.2 Exercise 6.2 Show how to implement addressable priority queues by doubly linked we will see that our tailored example can also be viewed as greedy algorithm for a very lists. Each list item represent an element in the queue and a handle is a handle of a list natural machine scheduling problem. Also, the rather naive selection sort algorithm item. Compare the complexity of the queue operations for two naive implementations. 
In the first implementation the list is unsorted and in the second implementation the list is sorted.

from Section 5.1 can be implemented efficiently now: first insert all elements to be sorted into a priority queue; then repeatedly delete the smallest element and output it. A tuned version of this idea is described in Section 6.1. The resulting heapsort algorithm is one of the most robust sorting algorithms because it is efficient for all inputs and needs no additional space.
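In library terms the insert-then-deleteMin idea is only a few lines; a sketch using std::priority_queue as a min-heap (heapsort proper, as discussed in Section 6.1, does the same thing in-place on the array itself):

#include <functional>
#include <queue>
#include <vector>

// Sort by inserting everything into a priority queue and repeatedly deleting the minimum.
void pqSort(std::vector<int>& a) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> q(a.begin(), a.end());
    for (int& x : a) { x = q.top(); q.pop(); }
}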

h: a c g r d p h w j s z q Class BinaryHeapPQ(w : ) of Element j: 1 2 3 4 5 6 7 8 910 11 12 13 1 2 3 4 5 6 7 8 910 11 12 13 h : Array [1..w] of Element // The heap h is n=0 : // initially empty and has the a a invariant j 2..n : h[ j/2 ] h[ j ] // heap property which implies that c g c g Function min∀ ∈assert n b> 0 ;cretur≤ n h[1] // the root is the minimum. deleteMin r d p h r d p h a c Figure 6.1: Class declaration for a priority queue based on binary heaps whose size is bounded by w. c b d g w j s z q w j s z q r d g h r q p h What does this mean? The key to understanding this definition is a bijection be- insert( b ) tween positive integers and the nodes of a complete binary tree as illustrated in Fig- ure 6.2. In a heap the minimum element is stored in the root (= array position 1). Thus w j s z q p w j s z min takes time O(1). Creating an empty heap with space for w elements also takes = constant time [ps: was O(n)???] as it only needs to allocate an array of size w. ⇒ Although finding the minimum of a heap is as easy as for a sorted array, the heap Figure 6.2: The top part shows a heap with n = 12 elements stored in an array h with property is much less restrictive. For example, there is only one way to sort the set w = 13 entries. The root has number one. The children of the root have numbers 2 1,2,3 but both 1,2,3 and 1,3,2 a legal representations of 1,2,3 by a heap. and 3. The children of node i have numbers 2i and 2i + 1 (if they exist). The parent { } h i h i { } of a node i, i 2, has number i/2 . The elements stored in this implicitly defined ≥ b c Exercise 6.3 Give all legal representations of 1,2,3,4 by a heap. tree fulfill the invariant that parents are no larger than their children, i.e., the tree is { } heap-ordered. The left part shows the effect of inserting b. The fat edges mark a path We will now see that this flexibility makes it possible to implement insert and deleteMin from the rightmost leaf to the root. The new element b is moved up this path until its efficiently. We choose a description which is simple and has an easy correctness proof. parent is smaller. The remaining elements on the path are moved down to make room Section 6.3 gives some hints for a more efficient implementation[ps: alternative: for b. The right part shows the effect of deleting the minimum. The fat edges mark the = give hints in exercises]. path p starting at the root and always proceding to the child with smaller Key. Element ⇒ An insert puts a new element e tentatively at the end of the heap h, i.e., e is put at a q is provisorically moved to the root and then moves down path p until its successors = leaf of the tree represented by h.[reformulated:more rendundancy, less ambiguity] are larger. The remaining elements move up to make room for q ⇒ Then e is moved to an appropriate position on the path from the leaf h[n] to the root. . Procedure insert(e : Element) assert n < w assert the heap property holds except for j = i n++ ; h[n]:= e swap(h[i],h[ i/2 ]) siftUp(n) assert the heapb propertyc holds except maybe for j = i/2 siftUp( i/2 ) b c where siftUp(s) moves the contents of node s towards the root until the heap prop- b c = erty[was heap condition] holds, cf. Figure 6.2. ⇒ Correctness follows from the stated invariants. Procedure siftUp(i : ) assert the heap property holds except maybe for j = i Exercise 6.4 Show that the running time of siftUp(n) is O(logn) and hence an insert if i = 1 h[ i/2 ] h[i] then return takes time O(logn) ∨ b c ≤ 110 Priority Queues 6.1 Binary Heaps 111
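A compact C++ sketch of the array-embedded heap may help here. It uses an unbounded std::vector instead of the bounded array h[1..w] of Figure 6.1, emulates 1-based indexing, and already includes the deleteMin and siftDown discussed next; all names and simplifications are ours.

#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class BinaryHeapPQ {
    std::vector<int> h{0};              // h[1..n] holds the heap; h[0] is unused
    std::size_t n = 0;

    void siftUp(std::size_t i) {        // move h[i] up until its parent is not larger
        while (i > 1 && h[i / 2] > h[i]) { std::swap(h[i], h[i / 2]); i /= 2; }
    }
    void siftDown(std::size_t i) {      // move h[i] down along the smaller child
        while (2 * i <= n) {
            std::size_t m = 2 * i;
            if (m + 1 <= n && h[m + 1] < h[m]) ++m;   // m is the smaller child
            if (h[i] <= h[m]) return;                 // heap property holds again
            std::swap(h[i], h[m]); i = m;
        }
    }
public:
    std::size_t size() const { return n; }
    int min() const { assert(n > 0); return h[1]; }
    void insert(int e) { h.push_back(e); ++n; siftUp(n); }
    int deleteMin() {
        assert(n > 0);
        int result = h[1];
        h[1] = h[n]; h.pop_back(); --n;
        if (n > 0) siftDown(1);
        return result;
    }
};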

A deleteMin returns the contents of the root and replaces it by the contents of node Exercise 6.5 Our current implementation of siftDown needs about 2logn element n. Since h[n] might be larger than h[1] or h[2], this manipulation may violate the heap comparisons. Show how to reduce this to logn+O(loglogn). Hint: use binary search. property at positions 1 or 2. This possible violation is repaired using siftDown. Section 6.4 has more on variants of siftDown. Function deleteMin : Element We can obviously build a heap from n elements by inserting them one after the assert n > 0 other in O(nlogn) total time. Interestingly, we can do better by estabilishing the heap result=h[1] : Element property in a bottom up fashion: siftDown allows us to establish the heap property for h[1]:= h[n] a subtree of height k + 1 provided the heap property holds for its subtrees of height k. n The following exercise asks you to work out the details of this idea: siftDown−− (1) return result Exercise 6.6 (buildHeap) Assume that we are given an arbitrary array h[1..n] and Procedure siftDown(1) moves the new contents of the root down the tree until the heap want to establish the heap property on it by permuting its entries. We will consider = property[ps:was condition] holds. More precisely, consider the path p starting at the two procedures for achieving this: ⇒ root and always proceding to a child with minimal[was smaller which is wrong Procedure buildHeapBackwards = for equal keys] Key, cf. Figure 6.2. We are allowed to move up elements along for i := n/2 downto 1 do siftDown(i) ⇒ path p because the heap property with respect to the other successors (with maximal b c

= Key) will be maintained.[new sentence] The proper place for the root on this path is Procedure buildHeapRecursive(i : ) ⇒ if 4i n then the highest position where both its successors[was: where is also fulfills the heap ≤ property. This is wrong because this would only require it to be larger than buildHeapRecursive(2i) = the parent.] fulfill the heap property. In the code below, we explore the path p by a buildHeapRecursive(2i + 1) ⇒ recursive procedure. siftDown(i) Procedure siftDown(i) repairs the heap property at the successors of heap node i [smaller itemsep for our enumerations?] = without destroying it elsewhere. In particular, if the heap property originally held for ⇐ the subtrees rooted at 2i and 2i + 1, then it now holds for the the subtree rooted at i. a) Show that both buildHeapBackwards and buildHeapRecursive(1) establish the (Let us say that the heap property holds at a subtree rooted at node i if it holds for heap property everywhere. = all descendants[check whether this is defined in intro] of i but not necessarily for ⇒ b) Show that both algorithms take total time O(n). Hint: Let k = logn . Show that i itself.)[new somewhat longwinded discussion. But some redundance might d e help and it looks like this is needed to explain what siftDown does in other the cost of siftDown(i) is O(k logi). Sum over i. [should we somewhere have this nice derivation of ∑−i2i?] = circumstances like buildHeap.] i = ⇒ ⇐ [changed the precondition and postcondition so that the correctness proof c) Implement both algorithms efficiently and compare their running time for ran- = of buildHeap works.] i ⇒ dom integers and n 10 : 2 i 8 . It will be important how efficiently you implement buildHeapRecur∈ sive≤. ≤In particular, it might make sense to unravel Procedure siftDown(i : )  assert the heap property holds for the subtrees rooted at j = 2i and j = 2i + 1 the recursion for small subtrees. if 2i n then // i is not a leaf *d) For large n the main difference between the two algorithms are memory hierar- if≤2i + 1 > n h[2i] h[2i + 1] then m:= 2i else m:= 2i + 1 ∨ ≤ chy effects. Analyze the number of I/O operations needed to implement the two assert the sibling of m does not exist or does not have a smaller priority than m algorithms in the external memory model from the end of Section 2.2. In partic- if h i h m then // the heap property is violated [ ] > [ ] ular, show that if we have block size B and a fast memory of size M = Ω(BlogB) swap(h[i],h[m]) then buildHeapRecursive needs only O(n/B) I/O operations. siftDown(m) assert heap property @ subtree rooted at i The following theorem summarizes our results on binary heaps. 112 Priority Queues 6.2 Addressable Priority Queues 113

Class AddressablePQ Theorem 6.1 [reformulated theorem such that ALL results on heaps are sum- minPtr : Handle // root that stores the minimum = marized] With the heap implementation of bounded non-addressable priority queues, minPtr roots : set of Handle // pointers to tree roots ⇒ creating an empty heap and finding min takes constant time, deleteMin and insert take roots Function min return element stored at minPtr logarithmic time O(logn) and build takes linear time. Procedure link(a,b : Handle) Heaps are the basis of heapsort. We first build a heap from the elements and then remove b from roots repeatedly perform deleteMin. Before the i-th deleteMin operation the i-th smallest make a the parent of b // b a a element is stored at the root h 1 . We swap h 1 and h n i 1 and sift the new [ ] [ ] [ + + ] Procedure combine(a,b : Handle) b root down to its appropriate position. At the end, h stores the elements sorted in assert a and b are tree roots decreasing order. Of course, we can also sort in increasing order by using a max- if a b then link(a,b) else link(b,a) priority queue, i.e., a data structure supporting the operations insert and deleting the ≤ maximum. [moved deleteMin by binary search up to the general deleteMin Procedure newTree(h : Handle) roots:= roots i since I see no reason to use it only for heapsort. Moved the bottom up part ∪ { } to further findings since new experiments indicate that it does not really help if if e < min then minPtr:= i = the top-down algorithm is properly implemented.] Procedure cut(h : Handle) ⇒ Heaps do not immediately implement the ADT addressable priority queue, since remove the subtree rooted at h from its tree // h elements are moved around in the array h during insertion and deletion. Thus the array newTree(h) h indices cannot be used as handles. Function insert(e : Element) : Handle Exercise 6.7 (Addressable binary heaps.) Extend heaps to an implementation of ad- i:= a Handle for a new Item storing e dressable priority queues. Compared to nonaddressable heaps your data structure newTree(i) = should only consume two additional pointers per element.[new requirement] return i ⇒ Function deleteMin : Element e Exercise 6.8 (Bulk insertion.) We want to design an algorithm for inserting k new e:= the Element stored in minPtr elements into an n element heap. foreach child h of the root at minPtr do newTree(h)// 2 dispose minPtr *a) Give an algorithm that runs in time O k + log n . Hint: Use a similar bottom up approach as for heap construction. perform some rebalancing // uses combine  return e **b) Can you even achieve time O(min(logn + k logk,k + log(n) loglogn))? · Procedure decreaseKey(h : Handle,k : Key) key(h):= k if h is not a root then 6.2 Addressable Priority Queues cut(h); possibly perform some rebalancing [My current view is that binomial heaps, pairing heaps, fibonacci heaps, thin Procedure remove(h : Handle) decreaseKey(h, ∞); deleteMin − heaps and perhaps other variants have enough merits to not fix on Fibonacci Procedure merge(o : AddressablePQ) heaps. Rather I tried to factor out their common properties first and then if minPtr > o.minPtr then minPtr:= o.minPtr delve into Fibonacchi heaps as one particular example. Later exercises and roots:= roots o.roots = Further findings outlines some alternatives. ] Binary heaps have a rather rigid o.roots:= 0/; ∪possibly perform some rebalancing ⇒ structure. All n elements are arranged into a single binary tree of height logn . 
In order to obtain faster implementations of the operations insert, decreaseKey, remove, and merge we now look at structures which are more flexible in two aspects: The

A single tree is replaced by a collection of trees — a forest. Each tree is still heap-ordered, i.e., no child is smaller than its parent. In other words, the sequence of keys along any root-to-leaf path is non-decreasing. Figure 6.4 shows a heap-ordered forest. But now there is no direct restriction on the height of the trees or the degrees of their nodes. The elements of the queue are stored in heap items that have a persistent location in memory. Hence, we can use pointers to address particular elements at any time. Using the terminology from Section 3, handles for priority queue elements are implemented as pointers to heap items. The tree structure is explicitly defined using pointers between items. However, we first describe the algorithms on an abstract level independent of the details of representation. In order to keep track of the current minimum, we maintain a handle to the root where it is stored.

Figure 6.4: A heap-ordered forest representing the set {0,1,3,4,5,7,8}.

Figure 6.3 gives pseudocode that expresses the common aspects of several addressable priority queues. The forest is manipulated using three simple operations: adding a new tree (and keeping minPtr up to date), combining two trees into a single one, and cutting out a subtree, making it a tree of its own.

An insert adds a new single-node tree to the forest. So a sequence of n inserts into an initially empty heap simply creates n single-node trees. The cost of an insert is clearly O(1).

A deleteMin operation removes the node indicated by minPtr. This turns all children of the removed node into roots. We then scan the set of roots (old and new) to find the new minimum, a potentially very costly process. We make the process even more expensive (by a constant factor) by doing some useful work on the side, namely combining some trees into larger trees. We will use the concept of amortized analysis to charge the cost depending on the number of trees to the operations that called newTree. Hence, to prove a low complexity for deleteMin, it suffices to make sure that no tree root has too many children.

We turn to the decreaseKey operation next. It is given a handle h and a new key k and decreases the key value of h to k. Of course, k must not be larger than the old key stored with h. Decreasing the key associated with h may destroy the heap property because h may now be smaller than its parent. In order to maintain the heap property, we cut the subtree rooted at h and turn h into a root. Cutting out subtrees causes the more subtle problem that it may leave trees that have an awkward shape. Therefore, some variants of addressable priority queues perform additional operations to keep the trees in shape.

The remaining operations are easy. We can remove an item from the queue by first decreasing its key so that it becomes the minimum item in the queue and then performing a deleteMin. To merge a queue o into another queue we compute the union of roots and o.roots. To update minPtr it suffices to compare the minima of the merged queues. If the root sets are represented by linked lists, and no additional balancing is done, merge needs only constant time.

We will now look at several particular implementations of addressable priority queues.

6.2.1 Pairing Heaps

Pairing heaps [36] are a minimalistic implementation of the above family of addressable priority queues. It seems that pairing heaps are what one should do in practice, yet a tight theoretical analysis is still open [check a bit]. When Figure 6.3 says "possibly do some rebalancing", pairing heaps do nothing. If ⟨r1,...,rk⟩ is the sequence of root nodes stored in roots, then deleteMin combines r1 with r2, r3 with r4, etc., i.e., the roots are paired. Figure 6.5 gives an example.

Figure 6.5: The deleteMin operation of pairing heaps combines pairs of root nodes.

*Exercise 6.9 (Three pointer items.) Explain how to implement pairing heaps using three pointers per heap item: one to the youngest child, one to the next older sibling (if any), and one that either goes to the next younger sibling, or, if there is no younger sibling, to the parent. Figure 6.8 gives an example.

*Exercise 6.10 (Two pointer items.) The three-pointer representation from Exercise 6.9 can be viewed as a binary tree with the invariant that the items stored in left subtrees contain no smaller elements. Here is a more compact representation of this binary tree: each item stores a pointer to its right child. In addition, a right child stores a pointer to its sibling. Left children, and right children without a sibling, store a pointer to their parent. Explain how to implement pairing heaps using this representation. Figure 6.8 gives an example.
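To make the pairing step concrete, here is a minimal C++ sketch of combining root pairs during deleteMin. The node layout (key, child, sibling pointers) and the helper names are our own illustrative assumptions, not the book's pseudocode; full pairing-heap implementations typically follow this pairing pass with a further combining pass over the surviving roots.

#include <utility>
#include <vector>

struct PHNode {
  int key;
  PHNode* child = nullptr;    // youngest child
  PHNode* sibling = nullptr;  // next older sibling
};

// Combine two heap-ordered trees: the root with the larger key
// becomes the youngest child of the other root.
PHNode* link(PHNode* a, PHNode* b) {
  if (b->key < a->key) std::swap(a, b);
  b->sibling = a->child;
  a->child = b;
  return a;
}

// One pairing pass over the current roots <r1,...,rk>:
// combine r1 with r2, r3 with r4, and so on.
std::vector<PHNode*> pairRoots(const std::vector<PHNode*>& roots) {
  std::vector<PHNode*> paired;
  for (std::size_t i = 0; i + 1 < roots.size(); i += 2)
    paired.push_back(link(roots[i], roots[i + 1]));
  if (roots.size() % 2 == 1) paired.push_back(roots.back());  // odd root left unpaired
  return paired;
}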

6.2.2 Fibonacci Heaps

Fibonacci heaps use more aggressive balancing operations than pairing heaps that make it possible to prove better performance guarantees. In particular, we will get logarithmic amortized time for remove and deleteMin and worst case constant time for all other operations.

Each item of a Fibonacci heap stores four pointers that identify its parent, one child, and two siblings (cf. Figure 6.8). The children of each node form a doubly-linked circular list using the sibling pointers. The sibling pointers of the root nodes can be used to represent roots in a similar way. Parent pointers of roots and child pointers of leaf nodes have a special value, e.g., a null pointer.
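As an illustration, a Fibonacci heap item with the four pointers just described might be declared as follows in C++. The field names, and the rank and mark fields that are introduced further below, are our own choices for this sketch, not a prescribed layout.

struct FibNode {
  int key;
  FibNode* parent = nullptr;  // null for roots
  FibNode* child  = nullptr;  // some child; null for leaves
  FibNode* left;              // siblings form a doubly-linked circular list
  FibNode* right;
  int rank = 0;               // number of children
  bool marked = false;        // set when the node has lost a child
  explicit FibNode(int k) : key(k) { left = right = this; }  // singleton circular list
};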


Figure 6.6: An example of the development of the bucket array while the deleteMin of Fibonacci heaps is combining the roots. The arrows indicate which roots have been scanned. Note that scanning d leads to a cascade of three combination operations.

Figure 6.7: The binomial trees of rank zero to five.

In addition, every heap item contains a field rank. The rank of an item is the number of its children. In Fibonacci heaps, deleteMin links roots of equal rank r. The surviving root will then get rank r+1. An efficient method to combine trees of equal rank is as follows. Let maxRank be the maximal rank of a node after the execution of deleteMin. Maintain a set of buckets, initially empty and numbered from 0 to maxRank. Then scan the list of old and new roots. When a root of rank i is considered, inspect the i-th bucket. If the i-th bucket is empty, then put the root there. If the bucket is non-empty, then combine the two trees into one. This empties the i-th bucket and creates a root of rank i+1. Try to throw the new tree into the (i+1)-st bucket. If it is occupied, combine again, and so on. When all roots have been processed in this way, we have a collection of trees whose roots have pairwise distinct ranks. Figure 6.6 gives an example.

A deleteMin can be very expensive if there are many roots. However, we now show that in an amortized sense, maxRank is more important.

Lemma 6.2 By charging a constant amount of additional work to every newTree operation, the amortized complexity of deleteMin can be made O(maxRank).

Proof: A deleteMin first calls newTree at most maxRank times and then initializes an array of size maxRank. The remaining time is proportional to the number of combine operations performed. From now on we therefore only count combines. To make things more concrete, let us say that one peanut pays for a combine. [intro here or in search tree?] We make newTree pay one peanut stored with the new tree root. With these conventions in place, a combine operation performed by deleteMin is free since the peanut stored with the item that ceases to be a root can pay for the combine.

Figure 6.8: Three ways to represent trees of nonuniform degree — four pointers per item (Fibonacci heaps), three pointers (pairing heaps, binomial heaps), and two pointers (Exercise 6.10). The binomial tree of rank three, B_3, is used as an example.

Lemma 6.2 tells us that in order to make deleteMin fast we should make sure that maxRank remains small. Let us consider a very simple situation first. Suppose that we perform a sequence of inserts followed by a single deleteMin. In this situation, we start with a certain number of single node trees and all trees formed by combining are so-called binomial trees, as shown in Figure 6.7. The binomial tree B_0 consists of a single node, and the binomial tree B_{i+1} is obtained by joining two copies of the tree B_i. This implies that the root of the tree B_i has rank i and that the tree B_i contains exactly 2^i nodes. We conclude that the rank of a binomial tree is logarithmic in its size. If we could guarantee in general that the maximal rank of any node is logarithmic in the size of the tree, then the amortized cost of the deleteMin operation would be logarithmic.

Unfortunately, decreaseKey may destroy the nice structure of binomial trees. Suppose item v is cut out. We now have to decrease the rank of its parent w. The problem is that the size of the subtrees rooted at the ancestors of w has decreased but their rank has not changed. Therefore, we have to perform some balancing to keep the trees in shape.

An old solution suggested by Vuillemin [94] is to keep all trees in the heap binomial. However, this causes logarithmic cost for a decreaseKey.

*Exercise 6.11 (Binomial heaps.) Work out the details of this idea. Hint: you have to cut v's ancestors and their younger siblings.

Fredman and Tarjan showed how to decrease the cost of decreaseKey to O(1) without increasing the cost of the other operations. Their solution is surprisingly simple and we describe it next.

When a non-root item x loses a child because decreaseKey is applied to the child, x is marked; this assumes that x has not already been marked. When a marked node x loses a child, we cut x, remove the mark from x, and attempt to mark x's parent. If x's parent is marked already then . . . . This technique is called cascading cuts. In other words, suppose that we apply decreaseKey to an item v and that the k nearest ancestors of v are marked; then we turn v and the k nearest ancestors of v into roots and mark the (k+1)-st nearest ancestor of v (if it is not a root). Also, all the nodes that were turned into roots are unmarked. Figure 6.9 gives an example.

Figure 6.9: An example for cascading cuts. Marks are drawn as crosses. Note that roots are never marked.

Lemma 6.3 The amortized complexity of decreaseKey is constant.

Proof: We generalize the proof of Lemma 6.2 to take the cost of decreaseKey operations into account. These costs are proportional to the number of cut operations performed. Since cut calls newTree, which is in turn charged for the cost of a combine, we can as well ignore all costs except the peanuts needed to pay for combines. [alternative: also account cuts, then marked items store two peanuts, etc.] We assign one peanut to every marked item. We charge two peanuts for a decreaseKey — one pays for the cut and the other for marking an ancestor that has not been marked before. The peanuts stored with unmarked ancestors pay for the additional cuts.

How do cascading cuts affect the maximal rank? We show that it stays logarithmic. In order to do so we need some notation. Let F_0 = 0, F_1 = 1, and F_i = F_{i-1} + F_{i-2} for i ≥ 2 be the sequence of Fibonacci numbers. It is well-known that F_{i+2} ≥ ((1+√5)/2)^i ≥ 1.618^i for all i ≥ 0.

Lemma 6.4 Let v be any item in a Fibonacci heap and let i be the rank of v. Then the subtree rooted at v contains at least F_{i+2} nodes. In a Fibonacci heap with n items all ranks are bounded by 1.4404 log n.

Proof: [start counting from zero here?] Consider an arbitrary item v of rank i. Order the children of v by the time at which they were made children of v. Let w_j be the j-th child, 1 ≤ j ≤ i. When w_j was made a child of v, both nodes had the same rank. Also, since at least the nodes w_1,...,w_{j-1} were children of v at that time, the rank of v was at least j-1 at the time when w_j was made a child of v. The rank of w_j has decreased by at most 1 since then because otherwise w_j would be a root. Thus the current rank of w_j is at least j-2.

We can now set up a recurrence for the minimal number S_i of nodes in a tree whose root has rank i. Clearly S_0 = 1, S_1 = 2, and S_i ≥ 2 + S_0 + S_1 + ··· + S_{i-2}. The last inequality follows from the fact that for j ≥ 2, the number of nodes in the subtree with root w_j is at least S_{j-2}, and that we can also count the nodes v and w_1. The recurrence above (with = instead of ≥) generates the sequence 1, 2, 3, 5, 8, ..., which is identical to the Fibonacci sequence (minus its first two elements).

Let us verify this by induction. Let T_0 = 1, T_1 = 2, and T_i = 2 + T_0 + ··· + T_{i-2} for i ≥ 2. Then, for i ≥ 2, T_{i+1} - T_i = 2 + T_0 + ··· + T_{i-1} - 2 - T_0 - ··· - T_{i-2} = T_{i-1}, i.e., T_{i+1} = T_i + T_{i-1}. This proves T_i = F_{i+2}.

For the second claim, we only have to observe that F_{i+2} ≤ n implies i · log((1+√5)/2) ≤ log n, which in turn implies i ≤ 1.4404 log n.
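The cut and cascading-cut machinery of decreaseKey can be sketched in a few lines of C++. The sketch below reuses the FibNode layout assumed earlier; the vector standing in for the root list, the helper names, and the omission of the minPtr update are our own simplifications, not the book's pseudocode.

#include <vector>

// Splice x out of the circular child list of its parent.
void removeFromChildList(FibNode* x) {
  FibNode* p = x->parent;
  if (x->right == x) p->child = nullptr;        // x was the only child
  else {
    x->left->right = x->right;
    x->right->left = x->left;
    if (p->child == x) p->child = x->right;
  }
  x->left = x->right = x;
}

// Cut x from its parent and make it a root (the newTree operation).
void cut(FibNode* x, std::vector<FibNode*>& roots) {
  FibNode* p = x->parent;
  removeFromChildList(x);
  p->rank--;                                     // the parent loses a child
  x->parent = nullptr;
  x->marked = false;                             // roots are never marked
  roots.push_back(x);
}

// Lower the key of h to k; cut h if the heap property is violated,
// then perform cascading cuts along the chain of marked ancestors.
// (Updating minPtr is omitted here.)
void decreaseKey(FibNode* h, int k, std::vector<FibNode*>& roots) {
  h->key = k;
  FibNode* p = h->parent;
  if (p == nullptr || p->key <= k) return;       // heap property still holds
  cut(h, roots);
  while (p->parent != nullptr) {                 // p is not a root
    if (!p->marked) { p->marked = true; break; } // mark the first unmarked ancestor
    FibNode* q = p->parent;
    cut(p, roots);                               // cascading cut of a marked ancestor
    p = q;
  }
}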

This concludes our treatment of Fibonacci heaps. We have shown:

Theorem 6.5 The following time bounds hold for Fibonacci heaps: min, insert, and merge take worst case constant time; decreaseKey takes amortized constant time; and remove and deleteMin take amortized time logarithmic in the size of the queue.

6.3 Implementation Notes

There are various places where sentinels (cf. Chapter 3) can be used to simplify or (slightly) accelerate the implementation of priority queues. Since this may require additional knowledge about key values, it could make a reusable implementation more difficult, however.

• If h[0] stores a Key no larger than any Key ever inserted into a binary heap, then siftUp need not treat the case i = 1 in a special way.

• If h[n+1] stores a Key no smaller than any Key ever inserted into a binary heap, then siftDown need not treat the case 2i+1 > n in a special way. If such large keys are even stored in h[n+1..2n+1], then the case 2i > n can also be eliminated.

• Addressable priority queues can use a special NADA item rather than a null pointer.

For simplicity we have formulated the operations siftDown and siftUp of binary heaps using recursion. It might be a bit faster to implement them iteratively instead.

Exercise 6.12 Do that.

However, good compilers might be able to do this recursion elimination automatically.

As for sequences, memory management for items of addressable priority queues can be critical for performance. Often, a particular application may be able to do that more efficiently than a general purpose library. For example, many graph algorithms use a priority queue of nodes. In this case, the item can be stored with the node. [todo: cross references to sssp and jarnikprimmst]

There are priority queues that work efficiently for integer keys. It should be noted that these queues can also be used for floating point numbers. Indeed, the IEEE floating point standard has the interesting property that for any valid non-negative floating point numbers a and b, a ≤ b if and only if bits(a) ≤ bits(b), where bits(x) denotes the reinterpretation of x as an unsigned integer; negative numbers only need an additional, order-reversing transformation of the bit pattern.
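To illustrate the reinterpretation trick, here is one common way to map IEEE floats to unsigned integers whose order agrees with the floating point order for all finite values, including negative ones. The particular mapping and function name are our own illustrative choices, not code from the book; −0.0 and +0.0 receive distinct codes, and NaNs are not handled.

#include <cstdint>
#include <cstring>

// Map a float to a 32-bit unsigned integer such that for finite x, y:
// x < y implies orderedBits(x) < orderedBits(y), and equal values map to equal codes
// (except that -0.0 sorts just below +0.0).
std::uint32_t orderedBits(float x) {
  std::uint32_t b;
  std::memcpy(&b, &x, sizeof b);        // bits(x): reinterpret, no numeric conversion
  if (b & 0x80000000u) return ~b;       // negative: flip all bits to reverse the order
  return b | 0x80000000u;               // non-negative: set the sign bit to sort above negatives
}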

C++

The STL class priority_queue offers nonaddressable priority queues implemented using binary heaps. LEDA implements a wide variety of addressable priority queues including pairing heaps and Fibonacci heaps.

Java

The class java.util.PriorityQueue supports addressable priority queues to the extent that remove is implemented. However, decreaseKey and merge are not supported. Also, it seems that the current implementation of remove needs time Θ(n)! JDSL offers an addressable priority queue jdsl.core.api.PriorityQueue, which is currently implemented as a binary heap. [where do we explain what JDSL is? in the implementation notes of the intro?]

6.4 Further Findings

There is an interesting internet survey¹ on priority queues. It seems that the most important applications are (shortest) path planning (cf. Section ??), discrete event simulation, coding and compression (probably variants of Huffman coding [cite something?]), scheduling in operating systems, computing maximum flows (probably the highest label preflow push algorithm [68, Chapter 7.10] [also cite AhuMagOrl?]), and branch-and-bound (cf. Section 12.4.1). About half of the respondents used binary heaps.

In Section 6.1 we have seen an implementation of deleteMin by top-down search that needs about 2 log n element comparisons and a variant using binary search that needs only log n + O(log log n) element comparisons. The latter is mostly of theoretical interest. Interestingly, a very simple algorithm that first sifts the element down all the way to the bottom of the heap and then sifts it up again can be even better. When used for sorting, the resulting bottom-up heapsort requires (3/2) n log n + O(n) comparisons in the worst case and n log n + O(1) in the average case [96, 33, 83]. [are there more general average case results for bottom-up deleteMin or insertion?] While bottom-up heapsort is simple and practical, our own experiments indicate that it is not faster than the usual top-down variant (for integer keys). This was unexpected for us. The explanation seems to be that the outcomes of the comparisons saved by the bottom-up variant are easy to predict. Modern hardware executes such predictable comparisons very efficiently (see [81] for more discussion).

The recursive buildHeap routine from Exercise 6.6 is an example of a cache-oblivious algorithm [37] [already introduce the model in the further findings of the intro or of a sort?] — the algorithm is efficient in the external memory model even though it does not explicitly use the block size or cache size.

¹http://www.leekillough.com/heaps/survey_results.html

Pairing heaps have amortized constant complexity for insert and merge [44] and logarithmic amortized complexity for deleteMin. There is also a logarithmic upper bound for decreaseKey, but it is not known whether this bound is tight. Fredman [35] has given operation sequences consisting of O(n) insertions and deleteMins and O(n log n) decreaseKeys that take time Ω(n log n log log n) for a family of addressable priority queues that includes all previously proposed variants of pairing heaps.

The family of addressable priority queues from Section 6.2 has many additional interesting members. Høyer describes additional balancing operations that can be used together with decreaseKey and that look similar to operations better known from search trees. One such operation yields thin heaps [50], which have performance guarantees similar to Fibonacci heaps but do not need a parent pointer or a mark bit. It is likely that thin heaps are faster in practice than Fibonacci heaps. There are also priority queues with worst case bounds asymptotically as good as the amortized bounds we have seen for Fibonacci heaps [17]. The basic idea is to tolerate violations of the heap property and to continuously invest some work in reducing the violations. Fat heaps [50] seem to be simple enough to be implementable, at the cost of allowing merge only in logarithmic time.

Many applications only need priority queues for integer keys. For this special case there are more efficient priority queues. The best theoretical bounds so far are constant time decreaseKey and insert and O(log log n) time for deleteMin [92, 71]. Using randomization the time bound can even be reduced to O(√(log log n)) [97]. These algorithms are fairly complex, but by additionally exploiting monotonicity of the queues one gets simple and practical priority queues. Section 10.5 will give examples. [move here?] The calendar queues [19] popular in the discrete event simulation community are even simpler variants of these integer priority queues. A practical monotone priority queue for integers is described in [6].

Chapter 7

Sorted Sequences

Searching a name in a telephone book is easy — if the person you look for has had the telephone number for long enough. It would be nicer to have a telephone book that is updated immediately when something changes. The "manual data structure" used for this purpose is the file card box. Some libraries still have huge collections with hundreds of thousands of cards collecting dust.

[binary search proper probably first in the intro]

People have looked for a computerized version of file cards from the first days of commercial computer use. Interestingly, there is a quite direct computer implementation of file cards that is not widely known. The starting point is our "telephone book" data structure from Chapter 5 — a sorted array a[1..n]. We can search very efficiently by binary search. To remove an element, we simply mark it as removed. Note that we should not overwrite the key because it is still needed to guide the search for other elements. The hard part is insertion. If the new element e belongs between a[i−1] and a[i], we somehow have to squeeze it in. In a file card box we make room by pushing other cards away. Suppose a[j+1] is a free entry of the array to the right of a[i]. Perhaps a[j+1] previously stored a removed element or it is some free space at the end of the array. We can make room for element e in a[i] by moving the elements in a[i..j] one position to the right. Figure 7.1 gives an example. Unfortunately, this shifting can be very time consuming.
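For comparison, this is what insertion by shifting looks like with an STL vector kept in sorted order; std::vector::insert performs exactly the element moves described above, so a single insertion costs time linear in the number of elements to its right. This is only an illustrative sketch, not the sparse table data structure discussed below.

#include <algorithm>
#include <vector>

// Insert x into the sorted vector a, shifting larger elements one position to the right.
void sortedInsert(std::vector<int>& a, int x) {
  auto pos = std::upper_bound(a.begin(), a.end(), x);  // first element > x
  a.insert(pos, x);                                    // shifts a[pos..end) to the right
}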


Figure 7.1: A representation of the sorted sequence ⟨2,3,5,7,11,13,17,19⟩ using a sparse table. Situation before and after a 10 is inserted.

For example, if we insert n elements with decreasing key value, we always have to shift all the other elements, so that a total of n(n−1)/2 elements are moved. But why does the shifting technique work well with file cards? The trick is that one leaves some empty space well dispersed over the entire file. In the array implementation, we can achieve this by leaving array cells empty. Binary search will still work as long as we make sure that empty cells carry meaningful keys, e.g., the keys of the next nonempty cell. Itai et al. [45] have developed this idea of sparse tables into a complete data structure including rebalancing operations when things get too tight somewhere. One gets constant amortized insertion time "on the average", i.e., if insertions happen everywhere with equal probability. In the worst case, the best known strategies give O(log² n) amortized insertion time into a data structure with n elements. This is much slower than searching, and this is the reason why we will follow another approach for now. But look at Exercise 7.10 for a refinement of sparse tables.

Formally, we want to maintain a sorted sequence, i.e., a sequence of Elements sorted by their Key value. The three basic operations we want to support are

Function locate(k : Key) return min{e ∈ M : e ≥ k}
Procedure insert(e : Element) M := M ∪ {e}
Procedure remove(k : Key) M := M \ {e ∈ M : key(e) = k}

where M is the set of elements stored in the sequence. It will turn out that these basic operations can be implemented to run in time O(log n). Throughout this section, n denotes the size of the sequence. It is instructive to compare this operation set with previous data structures we have seen. Sorted sequences are more flexible than sorted arrays because they efficiently support insert and remove. Sorted sequences are slower but also more powerful than hash tables since locate also works when there is no element with key k in M. Priority queues are a special case of sorted sequences because they can only locate and remove the smallest element.

Most operations we know from doubly linked lists (cf. Section 3.2.1) can also be implemented efficiently for sorted sequences. Indeed, our basic implementation will represent sorted sequences as a sorted doubly linked list with an additional structure supporting locate. Figure 7.2 illustrates this approach. Even the dummy header element we used for doubly linked lists has a useful role here: we define the result of locate(k) as the handle of the smallest list item e ≥ k, or the dummy item if k is larger than any element of the list, i.e., the dummy item is treated as if it stores an element with infinite key value. With the linked list representation we "inherit" constant time implementations for first, last, succ, and pred. We will see constant amortized time implementations for remove(h : Handle), insertBefore, and insertAfter, and logarithmic time algorithms for concatenating and splitting sorted sequences. The indexing operator [·] and finding the position of an element in the sequence also take logarithmic time.

Figure 7.2: A sorted sequence as a doubly linked list plus a navigation data structure.

Before we delve into implementation details, let us look at a few concrete applications where sorted sequences are important because all three basic operations locate, insert, and remove are needed frequently.

Best First Heuristics: [useful connection to optimization chapter?] Assume we want to pack items into a set of bins. The items arrive one at a time and have to be put into a bin immediately. Each item i has a weight w(i) and each bin has a maximum capacity. A successful heuristic solution to this problem is to put item i into the bin that fits best, i.e., whose remaining capacity is smallest among all bins with residual capacity at least as large as w(i) [23]. To implement this algorithm, we can keep the bins in a sequence s sorted by their residual capacity. To place an item, we call s.locate(w(i)), remove the bin we found, reduce its residual capacity by w(i), and reinsert it into s. See also Exercise 12.8. An analogous approach is used for scheduling jobs to machines [?] or in memory management.

Sweep-Line Algorithms: [todo: uniformly use myparagraph with smaller spacing] Assume you have a set of horizontal and vertical line segments in the plane and want to find all points where two segments intersect. A sweep line algorithm moves a vertical line over the plane from left to right and maintains the set of horizontal lines that intersect the sweep line in a sorted sequence s. When the left endpoint of a horizontal segment is reached, it is inserted into s, and when its right endpoint is reached, it is removed from s. When a vertical line segment is reached at position x that spans the vertical range [y, y′], we call s.locate(y) and scan s until we reach a key larger than y′. All the horizontal line segments discovered during this scan define an intersection. The sweeping algorithm can be generalized to arbitrary line segments [11], curved objects, and many other geometric problems [cite sth?].
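A compact way to try out the best-fit heuristic described above is to let an STL multiset play the role of the sorted sequence s, with lower_bound acting as locate. Representing a bin only by its residual capacity, and the function name below, are simplifications we made for this illustration.

#include <set>

// Place an item of weight w into the best-fitting bin (assumes w <= capacity).
// 'bins' holds the residual capacities of all open bins, 'capacity' is the bin size.
void placeItem(std::multiset<double>& bins, double w, double capacity) {
  auto it = bins.lower_bound(w);      // bin with smallest residual capacity >= w, i.e. s.locate(w)
  if (it == bins.end()) {             // no open bin fits: open a new one
    bins.insert(capacity - w);
  } else {
    double residual = *it - w;        // reduce the residual capacity by w
    bins.erase(it);                   // remove the bin we found ...
    bins.insert(residual);            // ... and reinsert it with its new residual capacity
  }
}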

Data Base Indexes: A variant of the (a,b)-tree data structure explained in Section 7.2 is perhaps the most important data structure used in data bases.

The most popular way to accelerate locate is using a search tree. We introduce search tree algorithms in three steps. As a warm-up, Section 7.1 introduces a simple variant of binary search trees that allows locate in O(log n) time under certain circumstances. Since binary search trees are somewhat difficult to maintain under insertions and removals, we switch to a generalization, (a,b)-trees, that allows search tree nodes of larger degree. Section 7.2 explains how (a,b)-trees can be used to implement all three basic operations in logarithmic worst case time. Search trees can be augmented with additional mechanisms that support further operations using techniques introduced in Section 7.3.

7.1 Binary Search Trees

[already in intro?] Navigating a search tree is a bit like asking your way around a foreign city. At every street corner you show the address to somebody. But since you do not speak the language, she can only point in the right direction. You follow that direction, and at the next corner you have to ask again.

More concretely, a search tree is a tree whose leaves store the elements of the sorted sequence.¹ To speed up locating a key k, we start at the root of the tree and traverse the unique path to the leaf we are looking for. It remains to explain how we find the correct path. To this end, the nonleaf nodes of a search tree store keys that guide the search. In a binary search tree that has n ≥ 2 leaves, a nonleaf node has exactly two children — a left child and a right child. Search is guided by one splitter key s for each nonleaf node. The elements reachable through the left subtree have keys k ≤ s. All the elements that are larger than s are reachable via the right subtree. With these definitions in place, it is clear how to ask for the way: go left if k ≤ s, otherwise go right. Figure 7.5 gives an example. The length of a path from the root to a node of the tree is called the depth of this node. The maximum depth of a leaf is the height of the tree. The height therefore gives us the maximum number of search steps needed to locate a leaf.

Exercise 7.1 Prove that a binary search tree with n ≥ 2 leaves can be arranged such that it has height ⌈log n⌉.

A search tree with height ⌈log n⌉ is perfectly balanced. Compared to the Ω(n) time needed for scanning a list, this is a dramatic improvement. The bad news is that it is expensive to keep perfect balance when elements are inserted and removed. To understand this better, let us consider the "naive" insertion routine depicted in Figure 7.3. We locate the key k of the new element e before its successor e′, insert e into the list, and then introduce a new node v with left child e and right child e′. The old parent u of e′ now points to v. In the worst case, every insertion operation will locate a leaf at maximum depth so that the height of the tree increases every time. Figure 7.4 gives an example that shows that in the worst case the tree may degenerate to a list — we are back to scanning.

Figure 7.3: Naive insertion into a binary search tree. A triangle indicates an entire subtree.

Figure 7.4: Naively inserting sorted elements leads to a degenerate tree.

An easy solution of this problem is a healthy portion of optimism — perhaps it will not come to the worst. Indeed, if we insert n elements in random order, the expected height of the search tree is ≈ 2.99 log n [29]. [Proof of average height of a key? (≈ log_Φ n)] A more robust solution, which can still be implemented to run in O(log n) time, is to actively rebalance the tree so that the height always remains logarithmic. Rebalancing is achieved by applying the rotation transformation depicted in Figure 7.5 in a clever way. For more details on rebalancing techniques we refer to the literature surveyed in Section 7.6. Instead, in the following section we will generalize from binary search trees to search trees where the nodes have variable degree.

Exercise 7.2 Explain how to implement an implicit binary search tree, i.e., the tree is stored in an array using the same mapping of tree structure to array positions as in the binary heaps discussed in Section 6.1. What are advantages and disadvantages compared to a pointer based implementation? Compare search in an implicit binary tree to binary search in a sorted array. We claim that the implicit tree might be slightly faster. Why? Try it.

¹There is also a variant of search trees where the elements are stored in all nodes of the tree.
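The descent "go left if k ≤ s, otherwise go right" is easy to code. The following C++ sketch uses a node type we made up for illustration, with interior nodes carrying splitter keys and leaves carrying element keys; it returns the leaf at which the search ends.

// Hypothetical node layout: interior nodes carry a splitter, leaves carry an element key.
struct Node {
  int splitter;                    // splitter key s (interior node) or element key (leaf)
  Node* left = nullptr;            // leaves have no children
  Node* right = nullptr;
};

// Descend from the root to the leaf that must contain key k if it is present.
const Node* locate(const Node* t, int k) {
  while (t->left != nullptr) {     // interior node: exactly two children
    t = (k <= t->splitter) ? t->left : t->right;
  }
  return t;                        // leaf reached after at most height(t) steps
}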

Figure 7.5: Left: Sequence ⟨2,3,5,7,11,13,17,19⟩ represented by a binary search tree. Right: Rotation of a binary search tree.

7.2 Implementation by (a,b)-Trees

An (a,b)-tree is a search tree where interior nodes have degree [outdegree? or define the degree of rooted trees suitably?] d between a and b. Let c[1..d] denote an array of pointers to children. The root has degree one for a trivial tree with a single leaf. Otherwise the root has degree between 2 and b. We will see that for a ≥ 2 and b ≥ 2a−1, the flexibility in node degrees allows us to efficiently maintain the invariant that all leaves have the same depth. Unless otherwise stated, we treat a and b as constants to simplify asymptotic notation, i.e., a = O(b) = O(1). Search is guided by a sorted array s[1..d−1] of d−1 splitter keys. To simplify notation, we additionally define s[0] = −∞ and s[d] = ∞. Elements e reachable through child c[i] have keys s[i−1] < key(e) ≤ s[i] for 1 ≤ i ≤ d. Figure 7.6 gives a (2,4)-tree storing the sequence ⟨2,3,5,7,11,13,17,19⟩.

Figure 7.6: Sequence ⟨2,3,5,7,11,13,17,19⟩ represented by a (2,4)-tree.

Exercise 7.3 Prove that an (a,b)-tree with n ≥ 2 leaves has height at most ⌊log_a(n/2)⌋ + 1. Also prove that this bound is tight: for every height h give an (a,b)-tree with height h = 1 + ⌊log_a(n/2)⌋.

Searching an (a,b)-tree is only slightly more complicated than searching a binary tree. Instead of performing a single comparison at a nonleaf node, we have to find the correct child among up to b choices. If the keys in the node are stored in sorted order, we can do that using binary search, i.e., we need at most ⌈log b⌉ comparisons for each node we are visiting.
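Finding the correct child with binary search over the splitter array can be done with the standard library. The sketch below treats a node simply as a sorted vector of splitters and returns the index of the child to descend into; this flat representation is our own simplification for illustration, not the node layout of Figure 7.7.

#include <algorithm>
#include <vector>

// Return the 1-based index i of the child c[i] to follow for key k,
// i.e., the smallest i with k <= s[i], or d if k is larger than all splitters.
// The vector holds the splitters s[1..d-1] in sorted order, stored 0-based.
int locateLocally(const std::vector<int>& splitters, int k) {
  // lower_bound finds the first splitter >= k using at most about log(d) comparisons
  auto it = std::lower_bound(splitters.begin(), splitters.end(), k);
  return static_cast<int>(it - splitters.begin()) + 1;   // +1 converts to the 1-based child index
}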

Class ABHandle : Pointer to ABItem or Item

Class ABItem(splitters : Sequence of Key, children : Sequence of ABHandle)
    d = |children| : 1..b                          // degree
    s = splitters : Array [1..b−1] of Key
    c = children : Array [1..b] of ABHandle

    Function locateLocally(k : Key) : ℕ
        return min{i ∈ 1..d : k ≤ s[i]}

    Function locateRec(k : Key, h : ℕ) : Handle
        i := locateLocally(k)
        if h = 1 then return c[i]
        else return c[i]→locateRec(k, h−1)

Class ABTree(a ≥ 2 : ℕ, b ≥ 2a−1 : ℕ) of Element
    ℓ = ⟨⟩ : List of Element
    r : ABItem(⟨⟩, ⟨ℓ.head()⟩)
    height = 1 : ℕ

    // Locate the smallest Item with key k′ ≥ k
    Function locate(k : Key) : Handle
        return r.locateRec(k, height)

Figure 7.7: (a,b)-trees.

[where to explain the "::" notation? Just here? in the intro? in the appendix?]

[consistently replace .. with a range macro? In particular, put some white space around it?] Figure 7.7 gives pseudocode for the (a,b)-trees and the locate operation. Recall that we use the search tree as a way to locate items of a doubly linked list and that the dummy list item is considered to have key value ∞. This dummy item is the rightmost leaf in the search tree. Hence, there is no need to treat the special case of root degree 0, and a handle of the dummy item can serve as a return value when locating a key larger than all values in the sequence.

To insert an element e, the routine in Figure 7.8 first descends the tree recursively to find the smallest sequence element e′ that is not smaller than e. If e and e′ have equal keys, e replaces e′. Otherwise, e is inserted into the sorted list ℓ before e′. If e′ was the i-th child c[i] of its parent node v, then e will become the new c[i] and key(e) becomes the corresponding splitter element s[i]. The old children c[i..d] and their corresponding splitters s[i..d−1] are shifted one position to the right.

The difficult part is what to do when a node v already had degree d = b and now would get degree b+1. Let s′ denote the splitters of this illegal node and c′ its children. The solution is to split v in the middle. Let d = ⌈(b+1)/2⌉ denote the new degree of v. A new node t is allocated that accommodates the b+1−d leftmost child pointers c′[1..b+1−d] and the corresponding keys s′[1..b−d]. The old node v keeps the d rightmost child pointers c′[b+2−d..b+1] and the corresponding splitters s′[b+2−d..b].

// Example: ⟨2,3,5⟩.insert(12)
Procedure ABTree::insert(e : Element)
    (k,t) := r.insertRec(e, height, ℓ)
    if t ≠ null then                                         // root was split
        r := new ABItem(⟨k⟩, ⟨r,t⟩)
        height++

// Insert a new element into a subtree of height h.
// If this splits the root of the subtree, return the new splitter and subtree handle
Function ABItem::insertRec(e : Element, h : ℕ, ℓ : List of Element) : Key × ABHandle
    i := locateLocally(e)
    if h = 1 then                                            // base case
        if key(c[i]→e) = key(e) then c[i]→e := e; return (⊥, null)   // just update
        else
            (k,t) := (key(e), ℓ.insertBefore(e, c[i]))
    else
        (k,t) := c[i]→insertRec(e, h−1, ℓ)
        if t = null then return (⊥, null)
    s′ := ⟨s[1],...,s[i−1], k, s[i],...,s[d−1]⟩
    c′ := ⟨c[1],...,c[i−1], t, c[i],...,c[d]⟩
    if d < b then                                            // there is still room here
        (s,c,d) := (s′, c′, d+1)
        return (⊥, null)
    else                                                     // split this node
        d := ⌈(b+1)/2⌉
        s := s′[b+2−d..b]
        c := c′[b+2−d..b+1]
        return (s′[b+1−d], new ABItem(s′[1..b−d], c′[1..b+1−d]))

Figure 7.8: Insertion into (a,b)-trees.

The "leftover" middle key k = s′[b+1−d] is an upper bound for the keys reachable from t now. Key k and the pointer to t are needed in the predecessor u of v. The situation in u is analogous to the situation in v before the insertion: if v was the i-th child of u, t is displacing it to the right. Now t becomes the i-th child and k is inserted as the i-th splitter. This may cause a split again, etc., until some ancestor of v has room to accommodate the new child or until the root is split.

In the latter case, we allocate a new root node pointing to the two fragments of the old root. This is the only situation where the height of the tree can increase. In this case, the depth of all leaves increases by one, i.e., we maintain the invariant that all leaves have the same depth. Since the height of the tree is O(log n) (cf. Exercise 7.3), we get a worst case execution time of O(log n) for insert.

Exercise 7.4 It is tempting to streamline insert by calling locate to replace the initial descent of the tree. Why does this not work with our representation of the data structure?²

Exercise 7.5 Prove that for a ≥ 2 and b ≥ 2a−1 the nodes v and t resulting from splitting a node of degree b+1 have degree between a and b.

[consistently concat → concatenate?] The basic idea behind remove is similar to insertion — it locates the element to be removed, removes it from the sorted list, and on the way back up repairs possible violations of invariants. Figure 7.9 gives pseudocode.

²Note that this approach becomes the method of choice when the more elaborate representation from Section 7.4.1 is used.
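As a cross-check of the index arithmetic, here is a small C++ sketch of splitting an overfull node with b+1 children into the two fragments described above. The flat node representation (plain vectors of splitters and child pointers, stored 0-based) is our own simplification for illustration, not the book's data layout.

#include <cassert>
#include <utility>
#include <vector>

struct ABNode {
  std::vector<int> s;        // splitters s[1..d-1], stored 0-based
  std::vector<ABNode*> c;    // children  c[1..d],   stored 0-based
};

// Split a node v that temporarily has b+1 children (and b splitters).
// Returns the new left node t and the leftover middle key; v keeps the right part.
std::pair<ABNode*, int> splitNode(ABNode& v, int b) {
  assert((int)v.c.size() == b + 1 && (int)v.s.size() == b);
  int d = (b + 2) / 2;                                     // new degree of v: ceil((b+1)/2)
  ABNode* t = new ABNode;
  t->c.assign(v.c.begin(), v.c.begin() + (b + 1 - d));     // t gets c'[1..b+1-d]
  t->s.assign(v.s.begin(), v.s.begin() + (b - d));         // t gets s'[1..b-d]
  int leftover = v.s[b - d];                               // s'[b+1-d], splitter between t and v
  v.c.erase(v.c.begin(), v.c.begin() + (b + 1 - d));       // v keeps c'[b+2-d..b+1]
  v.s.erase(v.s.begin(), v.s.begin() + (b + 1 - d));       // v keeps s'[b+2-d..b]
  return {t, leftover};
}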

the invariant. There are two cases. If the neighbor has degree larger than a, we can r 3 // Example: 2,3,5 .remove(5) balance the degrees by transferring some nodes from the neighbor. If the neighbor has h i Procedure ABTree::remove(k : Key) // degree a, balancing cannot help since both nodes together have only 2a 1 children 2 5 r 2 3 − r.removeRec(k,height,`) ... so that we cannot give a children to both of them. However, in this case we can fuse if r.d = 1 height > 1 then k them to a single node since the requirement b 2a 1 ensures that the fused node has ∧ 2 3 5 2 3 ≥ − r0:= r; r:= r0.c[1]; dispose r0 // degree at most b. Procedure ABItem::removeRec(k : Key,h : ,` : List of Element) To fuse a node c[i] with its right neighbor c[i + 1] we concatenate their children r arrays. To obtain the corresponding splitters, we need to place the splitter s[i] from i:= locateLocally(k) 3 if h = 1 then // base case the parent between the splitter arrays. The fused node replaces c[i + 1], c[i] can be if key(c[i] e) = k then // there is sth to remove i deallocated, and c[i] together with the splitter s[i] can be removed from the parent → 2 s `.remove(c[i]) c node. removeLocally(i) // 2 3 Exercise 7.6 Suppose a node v has been produced by fusing two nodes as described else above. Prove that the ordering invariant is maintained: Elements e reachable through c[i] removeRec(e,h 1,`) → − child v.c[i] have keys v.s[i 1] < key(e) v.s[i] for 1 i v.d. if c[i] d < a then // invariant needs repair − ≤ ≤ ≤ if→i = d then i // make sure i and i + 1 are valid neighbors Balancing two neighbors works like first fusing them and then splitting the result −− s0:= concatenate(c[i] s, s[i] ,c[i + 1] s)) in an analogous way to the splitting operation performed during an insert. i → h i → r c0:= concatenate(c[i] c,c[i + 1] c) s Since fusing two nodes decreases the degree of their parent, the need to fuse or → → c d0:= c0 balance might propagate up the tree. If the degree of the root drops to one this only if d | |b then // fuse makes sense if the tree has height one and hence contains only a single element. Oth- 0 ≤ s' 2 3 (c[i + 1] s,c[i + 1] c,c[i + 1] d):= (s ,c ,d ) c' erwise a root of degree one is deallocated an replaced by its sole child — the height → → → 0 0 0 dispose c[i]; removeLocally(i) // 2 3 of the tree decreases. else // balance As for insert, the execution time of remove is proportional to the height of the tree m:= d /2 and hence logarithmic in the size of data structure. We can summarize the performance d 0 e (c[i] s,c[i] c,c[i] d):= (s0[1..m 1],c0[1..m],m) of (a,b)-tree in the following theorem: (c[i +→1] s,→ c[→i + 1] c,c[i + 1−] d):= → → → Theorem 7.1 For any constants 2 a and b 2a 1, (a,b)-trees support the opera- (s0[m + 1..d0 1], c0[m + 1..d0],d0 m) ≤ ≥ − − − tions insert, remove, and locate on n element sorted sequences in time O(logn). s[i]:= s0[m] i i // Remove the i-th child from an ABItem Exercise 7.7 Give a more detailed implementation of locateLocally based on binary

s x y z x z Procedure ABItem::removeLocally(i : ) c search that needs at most logb Key comparisons. Your code should avoid both c[i..d 1]:= c[i + 1..d] explicit use of infinite key vdalueseand special case treatments for extreme cases. s[i..d −2]:= s[i + 1..d 1] // a b c d a c d − − k 1 d Exercise 7.8 Suppose a = 2 and b = 2a. Show that (1 + k )logn + 1 element com- −− parisons suffice to execute a locate operation in an (a,b)-tree. Hint, it is not quite sufficient to combine Exercise 7.3 with Exercise 7.7 since this would give you an Figure 7.9: Removal from an (a,b)-tree. additional term +k.

*Exercise 7.9 (Red-Black Trees.) A red-black tree is a binary search tree where the pseudocode. When a parent u notices that the degree of its child c[i] has dropped edges are colored either red or black. The black depth of a node v is the number of to a 1, it combines this child with one of its neighbors c[i 1] or c[i + 1] to repair black edges on the path from the root to v. The following invariants have to hold: − − 134 Sorted Sequences 7.3 More Operations 135

a) All leaves have the same black depth. search tree in Figure 7.6 will find the 5, subsequently outputs 7, 11, 13, and stops when it sees the 17. b) Edges into leaves are black.

c) No path from the root to a leaf contains two consecutive red edges. Build/Rebuild: Exercise 7.11 asks you to give an algorithm that converts a sorted list or array into an (a,b)-tree in linear time. Even if we first have to sort an unsorted Show that red-black trees and (2,4)-trees are isomorphic in the following sense: (2,4)- list, this operation is much faster than inserting the elements one by one. We also get trees can be mapped to red-black trees by replacing nodes of degree three or four by a more compact data structure this way. = two or three nodes connected by red edges respectively[bild spendieren?]. Red- ⇒ black trees can be mapped to (2,4)-trees using the inverse transformation, i.e., com- ponents induced by red edges are replaced by a single node. Now explain how to Exercise 7.11 Explain how to construct an (a,b)-tree from a sorted list in linear time. implement (2,4)-trees using a representation as a red black tree.3 Explain how ex- Give the (2,4)-tree that your routine yields for 1..17 . Finally, give the trees you get panding, shrinking, splitting, merging, and balancing nodes of the (2,4)-tree can be after subsequently removing 4, 9, and 16. h i translated into recoloring and rotation operations in the red-black tree. Colors should only be stored at the target nodes of the corresponding edges. Concatenation: Two (a,b)-trees s1 = w,...,x and s2 = y,... ,z can be concate- h i h i *Exercise 7.10 (Improved Sparse Tables.) Develop a refinement of sparse tables [45] nated in time O(logmax( s1 , s2 )) if x < y. First, we remove the dummy item from s and concatenate s and| |s| |. Now the main idea is to fuse the root of one tree that has amortized insertion time O(logn). Hint: Your sparse table should store point- 1 1.` 2.` with a node of the other in tree in such a way that the resulting tree remains sorted ers to small pages of O(logn) elements. These pages can be treated in a similar way and balanced. If s1.height s2.height, we descend s1.height s2.height levels from as the leaves of an (a,b)-tree. ≥ − the root of s1 by following pointers to the rightmost children. The node v we reach is then fused with the root of s2. The required new splitter key is the largest key in s1. If 7.3 More Operations the degree of v now exceeds b, v is split. From that point, the concatenation proceeds like an insert operation propagating splits up the tree until the invariant is fulfilled or In addition to insert, remove, and locate, search trees can support many more opera- a new root node is created. Finally, the lists s1.` and s2.` are concatenated. The case tions. We start with the operations that are directly supported by the (a,b)-trees intro- for s1.height < s2.height is a mirror image. We descend s2.height s1.height levels − duced in Section 7.2. Section 7.4 discusses further operations that require augmenting from the root of s2 by following pointers to the leftmost children. These operations the data structure. can be implemented to run in time O(1 + s1.height s2.height ) = O(logn). All in | − | all, a concat works in O(logn) time. Figure 7.10 gives an example. min/max The constant time operations first and last on a sorted list give us the smallest and largest element in the sequence in constant time. In particular, search s2 17 5:insert 5 17 trees implement double ended priority queues, i.e., sets that allow locating and re- s1 4:split moving both the smallest and the largest element in logarithmic time. 
For example, 2 3 5 11 13 19 2 3 7 11 13 19 in Figure 7.6, the header element stored in the list ` gives us access to the smallest 3:fuse element 2 and to the largest element 19 via its next and prev pointers respectively. 2 3 5 7 11 13 17 19 2 3 5 7 11 13 17 19

2:concatenate Range queries: To retrieve all elements with key in the range [x,y] we first locate x 1:delete and then traverse the sorted list until we see an element with key larger than y. This takes time O(logn + output-size). For example, the range query [4,14] applied to the Figure 7.10: Concatenating (2,4)-trees for 2,3,5,7 and 11,13,17,19 . h i h i 3This may be more space ef®cient than a direct representation, in particular if keys are large. 136 Sorted Sequences 7.3 More Operations 137
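The range query pattern just described — locate the left endpoint, then walk the sorted sequence until the key exceeds the right endpoint — can be tried out directly with an STL set standing in for the sorted sequence. This is only an illustration of the scanning idea, not the (a,b)-tree implementation.

#include <set>
#include <vector>

// Return all elements with key in [x, y]: locate x, then scan until a key exceeds y.
std::vector<int> rangeQuery(const std::set<int>& s, int x, int y) {
  std::vector<int> out;
  for (auto it = s.lower_bound(x); it != s.end() && *it <= y; ++it)
    out.push_back(*it);                     // O(log n + output size) overall
  return out;
}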

split <2.3.5.7.11.13.17.19> at 11 Exercise 7.13 Give a sequence of n operations on (2,3)-trees that requires Ω(nlogn) 3 13 split operations.

We now show that the amortized complexity is much closer to the best case except 2 3 13 19 2 5 7 11 17 19 if b has the minimum feasible value of 2a 1. In Section 7.4.1 we will see variants of insert and remove that turn out to have constant− amortized complexity in the light of 2 3 5 7 11 13 17 19 2 3 5 7 11 13 17 19 the analysis below.

Theorem 7.2 Consider an (a,b)-tree with b 2a that is initially empty. For any sequence of n insert or remove operations. the ≥total number of split or fuse operations Figure 7.11: Splitting the (2,4)-tree for 2,3,5,7,11,13,17,19 from Figure 7.6 is O(n). yields the subtrees shown on the left. Subsequentlyh concatenating thei trees surrounded by the dashed lines leads to the (2,4)-trees shown on the right side. Proof: We only give the proof for (2,4)-trees and leave the generalization to Exer- cise 7.14. In contrast to the global insurance account that we used in Section 3.1 we now use a very local way of accounting costs that can be viewed as a pebble game Splitting: A sorted sequence s = w,...,x,y,... ,z can be split into sublists s1 = h i using peanuts.[here or in priority queue section?] We pay one peanut for a split or = w,... ,x and s2 = y,...,z in time O(logn) by specifying the first element y of the ⇐ h i h i fuse. We require a remove to pay one peanut and an insert to pay two peanuts. We second list. Consider the path from the root to the leaf containing the splitting ele- claim that this suffices to feed all the split and fuse operations. We impose an addi- ment y. We split each node v on this path into two nodes. Node v` gets the children of tional peanut invariant that requires nonleaf nodes to store peanuts according to the v that are to the left of the path and vr gets the children that are to the right of the path. following table: Some of these nodes may be empty. Each of the nonempty nodes can be viewed as the root of an (a,b)-tree. Concatenating the left trees and a new dummy list element degree 1 2 3 4 5 yields the elements up to x. Concatenating y and the right trees yields the list of ele- peanuts h i ◦◦ ◦ ◦◦ ◦◦◦◦ ments starting from y. We can do these O(logn) concatenations in total time O(logn) by exploiting that the left trees have strictly decreasing height and the right trees have Note that we have included the cases of degree 1 and 5 that violate the degree invariant strictly increasing height. By concatenating starting from the trees with least height, and hence are only temporary states of a node. = the total time (the sum of the height differences) is O(logn).[more gory details?] operand balance: or =leftover ⇒ Figure 7.11 gives an example. peanut operation cost insert split: + for split + for parent Exercise 7.12 Explain how to delete a subsequence e s : α e β from an (a,b)- h ∈ ≤ ≤ i remove fuse: + for fuse + for parent tree s in time O(logn).

7.3.1 Amortized Analysis of Update Operations Figure 7.12: The effect of (a,b)-tree operations on the peanut invariant.

[somewhere mention that this is important for parallel processing where lock- Since creating the tree inserts the dummy item, we have to pay two peanuts to get = ing is expensive?] The best case time for an insertion or removal is considerably things going. After that we claim that the peanut invariant can be maintained. The ⇒ smaller than the worst case time. In the best case, we basically pay to locate the peanuts paid by an insert or remove operation suffice to maintain the peanut invariant affected element, the update of the list, and the time for updating the bottommost in- for the nonleaf node immediately above the affected list entries. A balance opera- ternal nodes. The worst case is much slower. Split or fuse operations may propagate tion can only decrease the total number of peanuts needed at the two nodes that are all the way up the tree. involved. 138 Sorted Sequences 7.4 Augmenting Search Trees 139

A split operation is only performed on nodes of (temporary) degree five and results 7.4.1 Parent Pointers in left node of degree three and a right node of degree two. The four peanuts stored Suppose we want to remove an element specified by the handle of a list item. In the on the degree five node are spent as follows: One peanut is fed to the split operation basic implementation from Section 7.2, the only thing we can do is to read the key k itself. Two peanuts are used to maintain the peanut invariant at the parent node. One of the element and call remove k . This would take logarithmic time for the search peanut is used to establish the peanut invariant of the newly created degree two node ( ) although we know from Section 7.3.1 that the amortized number of fuse operations to the left. No peanut is needed to maintain the peanut invariant of the old (right) node propagating up the tree will only be constant. This detour is not necessary if each that now has degree three. node v of the tree stores a handle indicating its parent in the tree (and perhaps an A fuse operation fuses a degree one node with a degree two node into a degree index i such that v.parent.c[i] = v). three node. The 2 + 1 = 3 peanuts available are used to feed one peanut to the fuse operation itself and to maintain the peanut invariant of the parent with one peanut.4 Exercise 7.17 Show that in an (a,b)-trees with parent pointers, the operations Figure 7.12 summarizes all peanut pushing transformations. remove(h : Item) and insertAfter(h : Item) can be implemented to run in constant amortized time. *Exercise 7.14 Generalize the proof for arbitrary b 2a. Show that n insert or ≥ *Exercise 7.18 (Avoiding Augmentation.) Outline the implementation of a class remove operations cause only O(n/(b 2a + 1)) fuse or split operations. − ABTreeIterator that represents a position in a sorted sequence in an ABTree without parent pointers. Creating an iterator I works like search and may take logarithmic time. The class should support operations remove, insertAfter, and operations for Exercise 7.15 (Weight balanced trees.) Consider the following variant of (a,b)-trees: moving backward and forward in the host sequence by one position. All these opera- The node-by-node invariant d a is relaxed to the global invariant that the tree leads tions should use constant amortized time. Hint: You may need a logarithmic amount to at least 2aheight 1 elements.≥Remove does not perform any fuse or balance oper- − of internal state. ations. Instead, the whole tree is rebuild using the routine from Exercise 7.11 when the invariant is violated. Show that remove operations execute in O(logn) amortized *Exercise 7.19 (Finger Search.) Explain how to augment a search tree such that search- = time. [check] ⇒ ing can profit from a “hint” given in the form of the handle of a finger element e0. If the sought element has rank r and the finger element e0 has rank r0, the search time should be O(log r r0 ). Hint: One solution links all nodes at each level of the search 7.4 Augmenting Search Trees tree into a doubly| link− ed| list.

By adding additional information to a search tree data structure, we can support addi- *Exercise 7.20 (Optimal Merging.) Explain how to use finger search to implement m tional operations on the sorted sequence they represent. Indeed, storing the sequence merging of two sorted sequences in time O nlog n where n is the size of the shorter elements in a doubly linked list can already be viewed as an augmentation that allows sequence and m is the size of the longer sequence.  us to accelerate first, last, and range queries. However, augmentations come at a cost. They consume space and require time for keeping them up to date. Augmentations 7.4.2 Subtree Sizes may also stand in each others way. The two augmentations presented in the following are an example for useful augmentations that do not work well together. Suppose every nonleaf node t of a search tree stores its size, i.e., t.size is the number of leaves in the subtree rooted at t. Then the k-th smallest element of the sorted sequence can be selected in time proportional to the height of the tree. For simplicity Exercise 7.16 (Avoiding Augmentation.) Explain how to support range queries in we describe this for binary search trees. Let t denote the current search tree node time O(logn + output-size) without using the next and prev pointers of the list items. which is initialized to the root. The idea is to descend the tree while maintaining the invariant that the k-th element is contained in the subtree rooted at t. We also maintain 4 There is one peanut left over. Please send it to the authors. the number i of the elements that are to the left of t. Initially, i = 0. Let i0 denote the 140 Sorted Sequences 7.5 Implementation Notes 141

select 6th element 9 of inserting a new list element and splitting a node are no longer the same from the 0+7>6 17 subtree point of view of their parent. size 7 7 i=0 For large b, locateLocally should use binary search. For small b, linear search 0+4<6 might be OK. Furthermore, we might want to have a specialized implementation for 3 4 i=4 3 5 4+2>6 13 small, fixed values of a and b that unrolls all the inner loops. Choosing b to be a power of two might simplify this task. i=4 2 2 2 2 Of course, the crucial question is how a and b should be chosen. Let us start with 2 5 11 4+1<6 19 the cost of locate. There are two kinds of operations that (indirectly) dominate the i=5 execution time of locate: element comparisons (because they may cause branch mis- 2 3 5 7 11 13 17 19 predictions[needed anywhere else? mention ssssort in sort.tex?]6) and pointer = dereferences (because they may cause cache faults). Exercise 7.8 indicates that ele- ⇐ ment comparisons are minimized by choosing a as large as possible and b 2a should Figure 7.13: Selecting the 6th-smallest element from 2,3,5,7,11,13,17,19 repre- be a power of two. Since the number of pointer dereferences is proportional≤ to the sented by a binary search tree. h i height of the tree (cf. Exercise 7.3), large values of a are also good for this measure. Taking this reasoning to the extreme, we would get best performance for a n, i.e., ≥ size of the left subtree of t. If i + i k then we set t to its left successor. Otherwise a single sorted array. This is not so astonishing. By neglecting update operations, we 0 ≥ t is set to its right successor and i is incremented by i0. When a leaf is reached, the are likely to end up with a static search data structure looking best. invariant ensures that the k-th element is reached. Figure 7.13 gives an example. Insertions and deletions have the amortized cost of one locate plus a constant number of node reorganizations (split, balance, or fuse) with cost O(b) each. We get Exercise 7.21 Generalize the above selection algorithm for (a,b)-trees. Develop two logarithmic amortized cost for update operations for b = O(logn). A more detailed analysis [64, Section 3.5.3.2] would reveal that increasing b beyond 2a makes split and variants. One that needs time O(bloga n) and stores only the subtree size. Another fuse operations less frequent and thus saves expensive calls to the memory manager variant needs only time O(loga n) and stores d 1 sums of subtree sizes in a node of degree d. − associated with them. However, this measure has a slightly negative effect on the performance of locate and it clearly increases space consumption. Hence, b should Exercise 7.22 Explain how to determine the rank of a sequence element with key k remain close to 2a. Θ in logarithmic time. Finally, let us have a closer look at the role of cache faults. It is likely that (M/b) nodes close to the root fit in the cache. Below that, every pointer dereference is asso- ciated with a cache miss, i.e., we will have about log (bn/Θ(M)) cache misses in a Exercise 7.23 A colleague suggests to support both logarithmic selection time and a cache of size M provided that a complete search tree node fits into a single cache block. constant amortized update time by combining the augmentations from Sections 7.4.1 [some experiments?] Since cache blocks of processor caches start at addresses that = and 7.4.2. What goes wrong? 
are a multiple of the block size, it makes sense to align the starting address of search ⇐ tree nodes to a cache block, i.e., to make sure that they also start at an address that is 7.5 Implementation Notes a multiple of the block size. Note that (a,b)-trees might well be more efficient than binary search for large data sets because we may save a factor loga cache misses.

Our pseudocode [uniform notation] for (a,b)-trees is quite close to an actual implementation in a language like C++ except for a few oversimplifications. The temporary arrays s′ and c′ in insertRec and removeRec can be avoided by appropriate case distinctions. In particular, a balance operation will not require calling the memory manager. A split operation of a node v might get slightly faster if v keeps the left half rather than the right half. We did not formulate it this way because then the cases of inserting a new list element and splitting a node are no longer the same from the point of view of their parent.

⁵ Unrolling a loop "for i := 1 to K do body_i" means replacing it by the straight-line program "body_1, ..., body_K". This saves the overhead for loop control and may give other opportunities for simplifications.
⁶ Modern microprocessors attempt to execute many (up to a hundred or so) instructions in parallel. This works best if they come from a linear, predictable sequence of instructions. The branches in search trees have a 50% chance of going either way by design and hence are likely to disrupt this scheme. This leads to large delays when many partially executed instructions have to be discarded.
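Continuing in this vein, a possible C++ node layout, assuming b is a compile-time power of two so that nodes can be aligned to cache blocks as suggested above (again only a sketch with invented names):

  template <class Key, int B = 16>            // B plays the role of b
  struct alignas(64) ABTreeNode {             // 64 bytes: a typical cache block size
    int d;                                    // degree, a <= d <= B
    Key splitter[B - 1];                      // the splitter keys of the pseudocode
    void* child[B];                           // children: inner nodes or list items
  };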

Very large search trees are stored on disks. Indeed, under the name B-trees [8], (a,b)-trees are the workhorse of indexing data structures in databases. In that case, internal nodes have a size of several KBytes. Furthermore, the linked list items are also replaced by entire data blocks that store between a′ and b′ elements for appropriate values of a′ and b′ (see also Exercise 3.19). These leaf blocks will then also be subject to splitting, balancing, and fusing operations. For example, assume we have a = 2^10, the internal memory is large enough (a few MBytes) to cache the root and its children, and data blocks store between 16 and 32 KBytes of data. Then two disk accesses are sufficient to locate any element in a sorted sequence that takes 16 GBytes of storage. Since putting elements into leaf blocks dramatically decreases the total space needed for the internal nodes and makes it possible to perform very fast range queries, this measure can also be useful for a cache efficient internal memory implementation. However, note that update operations may now move an element in memory and thus will invalidate element handles stored outside the data structure.

There are many more tricks for implementing (external memory) (a,b)-trees that are beyond the scope of this book. Refer to [40] and [72, Chapters 2, 14] for overviews. Even from the augmentations discussed in Section 7.4 and the implementation trade-offs discussed here you have hopefully learned that the optimal implementation of sorted sequences does not exist but depends on the hardware and the operation mix relevant for the actual application. We conjecture, however, that in internal memory (a,b)-trees with b = 2^k = 2a = O(log n), augmented with parent pointers and a doubly linked list of leaves, will yield a sorted sequence data structure that supports a wide range of operations efficiently.

Exercise 7.24 How do you have to choose a and b in order to guarantee that the number of I/O operations to perform for insert, remove, or locate is O(log_B(n/M))? How many I/Os are needed to build an n-element (a,b)-tree using the external sorting algorithm from Section 5.7 as a subroutine? Compare this with the number of I/Os needed for building the tree naively using insertions. For example, try M = 2^29 bytes, B = 2^18 bytes⁷, n = 2^32, and elements with 8 byte keys and 8 bytes of associated information.

Exercise 7.25 Explain how our implementation of (a,b)-trees can be generalized to implement multisets. Elements with identical key should be treated like a FIFO, i.e., remove(k) should remove the least recently inserted element with key k.

C++

The STL has four container classes set, map, multiset, and multimap. The prefix multi means that there may be several elements with the same key. A map offers an array-like interface. For example, someMap[k] := x would insert or update the element with key k and associated information x.

The most widespread implementation of sorted sequences in the STL uses a variant of red-black trees with parent pointers where elements are stored in all nodes rather than in the leaves. None of the STL data types supports efficient splitting or concatenation of sorted sequences.

LEDA offers a powerful interface sortseq that supports all important (and many not so important) operations on sorted sequences including finger search, concatenation, and splitting. Using an implementation parameter, there is a choice between (a,b)-trees, red-black trees, randomized search trees, weight balanced trees, and skip lists.⁸ [todo: perhaps fix that problem in LEDA? Otherwise explain how to declare this in LEDA. PS was not able to extract this info from the LEDA documentation.]

Java

The Java library java.util [check wording/typesetting in other chapters] offers the interface classes SortedMap and SortedSet which correspond to the STL classes map and set respectively. There are implementation classes, namely TreeMap and TreeSet respectively, based on red-black trees.

⁷ We are committing a slight oversimplification here since in practice one will use much smaller block sizes for organizing the tree than for sorting.
⁸ Currently, the default implementation is a remarkably inefficient variant of skip lists. In most cases you are better off choosing, e.g., (4,8)-trees [27].
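To make the C++ interface above concrete, here is a minimal usage example (standard STL calls only; the key and information types are arbitrary choices of ours):

  #include <map>
  #include <set>
  #include <string>
  #include <iostream>

  int main() {
    std::map<int, std::string> someMap;        // sorted by key
    someMap[42] = "x";                         // someMap[k] := x from the text
    auto it = someMap.lower_bound(40);         // locate: first element with key >= 40
    if (it != someMap.end())
      std::cout << it->first << " -> " << it->second << '\n';

    std::multiset<int> s{2, 3, 3, 5};          // multi: equal keys allowed
    s.erase(s.find(3));                        // removes one copy of 3, not both
  }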


7.6 Further Findings

There is an entire zoo of sorted sequence data structures. If you just want to support insert, remove, and locate in logarithmic time, just about any of them might do. Performance differences are often more dependent on implementation details than on fundamental properties of the underlying data structures. We nevertheless want to give you a glimpse of some of the more interesting species, some of which have distinctive advantages for more specialized applications.

The first sorted sequence data structure that supports insert, remove, and locate in logarithmic time were AVL trees [1]. AVL trees are binary search trees which maintain the invariant that the heights of the subtrees of a node differ by at most one. Since this is a strong balancing condition, locate is probably a bit faster than in most competitors. On the other hand, AVL trees do not support constant amortized update costs. Another small disadvantage is that storing the heights of subtrees costs additional space. In comparison, red-black trees have slightly higher cost for locate but they have faster updates, and the single color bit can often be squeezed in somewhere. For example, pointers to items will always store even addresses, so that their least significant bit could be diverted to storing color information.

Splay trees [90] and some variants of randomized search trees [86] even work without any additional information besides one key and two successor pointers. A more interesting advantage of these data structures is their adaptivity to nonuniform access frequencies. If an element e is accessed with probability p, then these search trees will over time be reshaped to allow an access to e in time O(log(1/p)). This can be shown to be asymptotically optimal for any comparison based data structure. [sth about the advantages of weight balance?]

There are so many search tree data structures for sorted sequences that these two terms are sometimes used as synonyms. However, there are equally interesting data structures for sorted sequences that are not based on search trees. In the introduction, we have already seen sorted arrays as a simple static data structure and sparse tables [45] as a simple way to make sorted arrays dynamic. Together with the observation from Exercise 7.10 [9] this yields a data structure which is asymptotically optimal in an amortized sense. Moreover, this data structure is a crucial ingredient for a sorted sequence data structure [9] which is cache oblivious [37], i.e., cache efficient on any two levels of a memory hierarchy without even knowing the size of caches and cache blocks. The other ingredient are cache oblivious static search trees [37] — perfectly balanced binary search trees stored in an array such that any search path will exhibit good cache locality in any cache. We describe the van Emde Boas layout used for this purpose for the case that there are n = 2^(2^k) leaves for some integer k: Store the top 2^(k−1) levels of the tree at the beginning of the array. After that, store the 2^(2^(k−1)) subtrees of depth 2^(k−1), allocating consecutive blocks of memory for them. Recursively allocate the resulting 1 + 2^(2^(k−1)) subtrees. At least static cache oblivious search trees are practical in the sense that they can outperform binary search in a sorted array.

Skip lists [80] are based on another very simple idea. The starting point is a sorted linked list ℓ. The tedious task of scanning ℓ during locate can be accelerated by producing a shorter list ℓ′ that only contains some of the elements in ℓ. If corresponding elements of ℓ and ℓ′ are linked, it suffices to scan ℓ′ and only descend to ℓ when approaching the searched element. This idea can be iterated by building shorter and shorter lists until only a single element remains in the highest level list. This data structure supports all important operations efficiently in an expected sense. Randomness comes in because the decision which elements to lift to a higher level list is made randomly. Skip lists are particularly well suited for supporting finger search.

Yet another family of sorted sequence data structures comes into play when we no longer consider keys as atomic objects that can only be compared. If keys are numbers in binary representation, we get faster data structures using ideas similar to the fast numeric sorting algorithms from Section 5.6. For example, sorted sequences with w-bit integer keys support all operations in time O(log w) [93, 66]. At least for 32 bit keys these ideas bring considerable speedup also in practice [27]. Not astonishingly, string keys are important. For example, suppose we want to adapt (a,b)-trees to use variable length strings as keys. If we want to keep a fixed size for node objects, we have to relax the condition on the minimal degree of a node. Two ideas can be used to avoid storing long string keys in many nodes: common prefixes of keys need to be stored only once, often in the parent nodes. Furthermore, it suffices to store the distinguishing prefixes of keys in inner nodes, i.e., just enough characters to be able to distinguish different keys in the current node. Taking these ideas to the extreme we get tries [34] [check], a search tree data structure specifically designed for string keys: tries are trees whose edges are labelled by characters or strings. The characters along a root-leaf path represent a key. Using appropriate data structures for the inner nodes, a trie can be searched in time O(s) for a string of size s. [would suffix trees and suffix arrays lead too far afield?]

We get a more radical generalization of the very idea of sorted sequences when looking at more complex objects such as intervals or points in d-dimensional space. We refer to textbooks on geometry for this wide subject [26] [cite more books?]. Another interesting extension of the idea of sorted sequences is the concept of persistence. A data structure is persistent if it allows us to update a data structure and at the same time keep arbitrary old versions around. [more? what to cite?] [what else?]
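To make the skip list idea above a little more concrete, here is a minimal C++ sketch of locate and insert; the node layout, the fixed maximum height, and all names are our own simplifications.

  #include <vector>
  #include <climits>
  #include <cstdlib>

  struct SkipNode {
    int key;
    std::vector<SkipNode*> next;                       // next[i]: successor in list i
    SkipNode(int k, int h) : key(k), next(h, nullptr) {}
  };

  struct SkipList {
    SkipNode head{INT_MIN, 32};                        // dummy head of maximal height

    SkipNode* locate(int k) {                          // first node with key >= k
      SkipNode* v = &head;
      for (int i = (int)head.next.size() - 1; i >= 0; --i)
        while (v->next[i] && v->next[i]->key < k) v = v->next[i];
      return v->next[0];
    }

    void insert(int k) {
      int h = 1;
      while (h < 32 && std::rand() % 2) ++h;           // lift with probability 1/2
      SkipNode* n = new SkipNode(k, h);
      SkipNode* v = &head;
      for (int i = (int)head.next.size() - 1; i >= 0; --i) {
        while (v->next[i] && v->next[i]->key < k) v = v->next[i];
        if (i < h) { n->next[i] = v->next[i]; v->next[i] = n; }
      }
    }
  };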

Chapter 8

Graph Representation

[Define basic notions already in the intro; further notions where they are first used; additionally, a summary of the definitions in the appendix.]

Nowadays scientific results are mostly available in the form of articles in journals, conference proceedings, and on various web resources. These articles are not self contained but cite previous articles with related content. However, when you read an article from 1975 with an interesting partial result, you often ask yourself what the current state of the art is. In particular, you would like to know which newer papers cite the old paper. Projects like citeseer¹ work on providing this functionality by building a database of articles that efficiently supports looking up articles citing a given article. We can view articles as the nodes in a directed graph where [which terms need definition] an edge (u,v) means u cites v. In the paper representation, we know the outgoing edges of a node u (the articles cited by u) but not the incoming edges (the articles citing u). We can see that even the most elementary operations on graphs can be quite costly if we do not have the right representation.

This chapter gives an introduction to the various possibilities of graph representation in a computer. We mostly focus on directed graphs and assume that an undirected graph G = (V,E) is represented in the same way as the (bi)directed graph G′ = (V, ∪_{{u,v}∈E} {(u,v),(v,u)}). Figure 8.1 gives an example. Most of the presented data structures also allow us to represent parallel edges and self-loops.

The most important question we have to ask ourselves is what kind of operations we want to support.

Accessing associated information: Given a node or an edge, we want to access information directly associated with it, e.g., the edge weight.

¹http://citeseer.nj.nec.com/cs

In many representations, nodes and edges are objects and we can directly store this information as a member of these objects. If not otherwise mentioned, we assume that V = {1,...,n} so that information associated with nodes can be stored in arrays. When all else fails, we can always store node or edge information in a hash table. Hence, elementary accesses can always be implemented to run in constant time. In the remainder of this book we abstract from these very similar options by using data types NodeArray and EdgeArray to indicate an array-like data structure that can be addressed by node or edge labels respectively.

Navigation: Given a node we want to access its outgoing edges. It turns out that this operation is at the heart of most graph algorithms. As we have seen in the scientific article example, we sometimes also want to know the incoming edges.

Edge queries: Given a pair of nodes (u,v) we may want to know whether this edge is in the graph. This can always be implemented using a hash table but we may want to have something even faster. A more specialized but important query is to find the reverse edge (v,u) of a directed edge (u,v) ∈ E if it exists. This operation can be implemented by storing additional pointers connecting edges with their reverse edge.

Construction, conversion and output: The representation most suitable for the algorithmic problem at hand is not always the representation given initially. This is not a big problem since most graph representations can be translated into each other in linear time. However, we will see examples [do we? in mst chapter?] where conversion overhead dominates the execution time.

Update: Sometimes we want to add or remove nodes or edges one at a time. Exactly what kind of updates are desired is what can make the representation of graphs a nontrivial topic.

[somewhere a more comprehensive overview of operations?]

[Figure 8.1 (drawings of the example graph and of its representations) omitted here.]
Figure 8.1: The top row shows an undirected graph, its interpretation as a bidirected graph, and representations of this bidirected graph by adjacency array and by adjacency list. The bottom part shows a direct representation of the undirected graph using linked edge objects and the adjacency matrix.

8.1 Edge Sequences

Perhaps the simplest representation of a graph is a sequence of edges. Each edge contains a pair of node indices and possibly associated information like an edge weight. Whether these node pairs represent directed or undirected edges is merely a matter of interpretation. Sequence representation is often used for input and output. It is easy to add edges or nodes in constant time. However, many other operations, in particular navigation, take time Θ(m), which is forbiddingly slow. In Chapter 11 we will see an example where one algorithm can work well with an edge sequence representation whereas the other algorithm needs a more sophisticated data structure supporting fast access to adjacent edges.

8.2 Adjacency Arrays — Static Graphs

To allow navigation in constant time per edge, we can store the edges leaving a node together in an array. In the simplest case without edge weights this array would just contain the indices of the target nodes. If the graph is static, i.e., it does not change, we can concatenate all these little arrays into a single edge array E. An additional array V stores the index of the first edge in E leaving node v in V[v]. It is convenient to add a dummy entry V[n+1] = m+1. The edges of node v are then easily accessible as E[V[v]], ..., E[V[v+1]−1]. Figure 8.1 shows an example.

The memory consumption for storing a directed graph using adjacency arrays is n + m + Θ(1) words. This is even more compact than the 2m words needed for an edge sequence representation.

Adjacency array representations can be generalized to store additional information: information associated with edges can be stored in separate arrays or within the edge array. If we also need incoming edges, additional arrays V′, E′ can store G′ = (V, {(u,v) : (v,u) ∈ E}).

Exercise 8.1 Design a linear time algorithm for converting an edge sequence representation of a directed graph into adjacency array representation. You should use only O(1) auxiliary space. Hint: View the problem as the task to sort edges by their source node and adapt the integer sorting algorithm from Figure 5.14.
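For concreteness, here is a minimal C++ sketch of the adjacency array representation just described. We use 0-based node numbers, so the edges out of v are E[V[v]], ..., E[V[v+1]−1] and V has n+1 entries; the example graph is simply a bidirected 4-cycle, not the graph of Figure 8.1.

  #include <vector>
  #include <cstdio>

  struct StaticGraph {
    std::vector<int> V;   // V[v] = index of the first edge out of v; V[n] = m
    std::vector<int> E;   // target node of each edge
  };

  int main() {
    StaticGraph g;
    g.V = {0, 2, 4, 6, 8};                        // 4 nodes, 8 directed edges
    g.E = {1, 3,  0, 2,  1, 3,  0, 2};            // bidirected cycle 0-1-2-3-0
    for (int v = 0; v + 1 < (int)g.V.size(); ++v)
      for (int i = g.V[v]; i < g.V[v + 1]; ++i)   // navigation: scan edges out of v
        std::printf("(%d,%d)\n", v, g.E[i]);
  }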

8.3 Adjacency Lists — Dynamic Graphs

The main disadvantage of adjacency arrays as presented above is that it is expensive to add or remove edges. For example, assume we want to insert a new edge (u,v) and there is enough room in the edge array E to accommodate it. We still have to move the edges associated with nodes u+1, ..., n one position to the right, which takes time O(m).

Of course, with our knowledge of representing sequences from Chapter 3 the solution is not far away: we can associate a sequence Ev of edges with each node v. Ev may in turn be represented by an unbounded array or by a (singly or doubly) linked list. We inherit the advantages and disadvantages of the respective sequence representations. Unbounded arrays may be more cache efficient. Linked lists allow worst case constant time insertion and deletion of edges at arbitrary positions. It is worth noting that adjacency lists are usually implemented without the header item introduced in Section 3.2 because adjacency lists are often very short, so that an additional item would consume too much space. In the example in Figure 8.1 we show circularly linked lists instead.

Exercise 8.2 Suppose the edges adjacent to a node u are stored in an unbounded array Eu and an edge e = (u,v) is specified by giving its position in Eu. Explain how to remove e = (u,v) in constant amortized time. Hint: you do not have to maintain the relative order of the other edges.

Exercise 8.3 Explain how to implement the algorithm for testing whether a graph is acyclic from Chapter 2.8 in linear time, i.e., design an appropriate graph representation and an algorithm using it efficiently. Hint: maintain a queue of nodes with outdegree zero.
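A minimal C++ sketch of the unbounded array variant of adjacency lists, including the swap-with-last trick that the hint of Exercise 8.2 alludes to; the interface and names are our own.

  #include <vector>

  struct DynamicGraph {
    std::vector<std::vector<int>> out;            // out[v]: targets of edges leaving v
    explicit DynamicGraph(int n) : out(n) {}

    void addEdge(int u, int v) { out[u].push_back(v); }   // amortized constant time

    // Constant time removal of the edge stored at position pos in out[u];
    // the relative order of the remaining edges is not maintained.
    void removeEdgeAt(int u, int pos) {
      out[u][pos] = out[u].back();
      out[u].pop_back();
    }
  };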
Linked Edge Objects

Sometimes it is more natural to store all the information associated with an edge at one place so that it can be updated easily. This approach can be applied to undirected graphs directly without the detour over a bidirected graph. If we want doubly linked adjacency information, this means that each edge object for edge {u,v} stores four pointers. Two are used for the adjacency information with respect to u, i.e., the edges incident to u form a doubly linked list based on these pointers. In an analogous way, the two other pointers represent the adjacency information with respect to v. The bottom part of Figure 8.1 gives an example. A node u now stores a pointer to one incident edge e = {u,w}. To find the identity of w we have to inspect the node information stored at e. To find further nodes incident to u we have to inspect the adjacency information with respect to u, which is also stored at e. Note that in order to find the right node or adjacency information we either need to compare u with the node information stored at e, or we need a bit stored with u that tells us where to look. (We can then use this bit as an index into two-element arrays stored at e.)

8.4 Adjacency Matrix Representation

An n-node graph can be represented by an n × n adjacency matrix A. A_ij is 1 if (i,j) ∈ E and 0 otherwise. Edge insertion or removal and edge queries work in constant time. It takes time O(n) to get the edges entering or leaving a node. This is only efficient for very dense graphs with m = Ω(n²). The storage requirement is n² bits, which can be a factor of up to 64 better than adjacency arrays for very dense graphs but a factor Ω(n) worse for the frequent case of very sparse graphs with m = O(n). Even for dense graphs, the advantage is small if we need additional edge information.

Exercise 8.4 Explain how to represent an undirected loopless graph with n nodes using n(n−1)/2 bits.

Perhaps more important than actually storing the adjacency matrix is the conceptual link between graphs and linear algebra introduced by the adjacency matrix. On the one hand, graph theoretic problems can be solved using methods from linear algebra. For example, if C = A^k, then C_ij counts the number of paths from i to j with exactly k edges.

Exercise 8.5 Explain how to store an n × n matrix A with m nonzero entries using storage O(m+n) such that a matrix vector multiplication Ax can be performed in time O(m+n). Describe the multiplication algorithm. Expand your representation so that products of the form x^T A can also be computed in time O(m+n).

On the other hand, graph theoretic concepts can be useful for solving problems from linear algebra. For example, suppose we want to solve the matrix equation Bx = c where B is a symmetric matrix. Now consider the corresponding adjacency matrix A where A_ij = 1 if and only if B_ij ≠ 0. If an algorithm for computing connected components finds out that the undirected graph represented by A contains two disconnected components, this information can be used to reorder the rows and columns of B such that we get an equivalent equation of the form

  [ B1  0  ] [x1]   [c1]
  [ 0   B2 ] [x2] = [c2] .

This equation can now be solved by solving B1 x1 = c1 and B2 x2 = c2 separately.² [Exercise: how to exploit strongly connected components? special case, acyclic graph?]

8.5 Implicit Representation

Many applications work with graphs of special structure. This structure can be exploited to get simpler and more efficient representations. For example, the grid graph G_{k,ℓ} with node set V = {0,...,k−1} × {0,...,ℓ−1} and edge set

  E = {((i,j),(i,j′)) ∈ V² : |j − j′| = 1} ∪ {((i,j),(i′,j)) ∈ V² : |i − i′| = 1}

is completely defined by the two parameters k and ℓ. Figure 8.2 shows G_{3,4}. Edge weights could be stored in two two-dimensional arrays, one for the vertical edges and one for the horizontal edges. [refs to examples for geometric graphs?]

[Figure 8.2 (drawings) omitted here.]
Figure 8.2: The grid graph G_{3,4} (left) and an interval graph with 5 nodes and 6 edges (right).

Exercise 8.6 Nodes of interval graphs can be represented by real intervals [v_l, v_r]. Two nodes are adjacent iff their intervals overlap. You may assume that these intervals are part of the input.

a) Devise a representation of interval graphs that needs O(n) storage and supports navigation in constant expected time. You may use preprocessing time O(n log n).

b) As in the first part, but you additionally want to support node insertion and removal in time O(log n).

c) Devise an algorithm using your data structure that decides in linear time whether the interval graph is connected.

² In practice, the situation is more complicated since we rarely get disconnected matrices. Still, more sophisticated graph theoretic concepts like cuts can be helpful for exploiting the structure of the matrix.
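As an illustration of implicit representations, here is a C++ sketch of the grid graph G_{k,ℓ} described above: no edge set is stored, and neighbours are enumerated on demand (names and interface are our own).

  #include <vector>
  #include <utility>

  struct GridGraph {                               // nodes are pairs (i,j)
    int k, l;                                      // 0 <= i < k, 0 <= j < l
    std::vector<std::pair<int,int>> neighbours(int i, int j) const {
      std::vector<std::pair<int,int>> r;
      if (i > 0)     r.push_back({i - 1, j});
      if (i + 1 < k) r.push_back({i + 1, j});
      if (j > 0)     r.push_back({i, j - 1});
      if (j + 1 < l) r.push_back({i, j + 1});
      return r;
    }
    // Edge weights could live in two k x l arrays, one for the horizontal
    // and one for the vertical edges, exactly as suggested in the text.
  };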
8.6 Implementation Notes

If we want maximum performance, we may have to fine tune our graph representation to the task. An edge sequence representation is good only in specialized situations. Adjacency matrices are good for rather dense graphs. Adjacency lists are good if the graph is changing frequently. Very often some variant of adjacency arrays is fastest. This may be true even if the graph is changing, because often there are only few changes, changes can be agglomerated into an occasional rebuilding of the graph, or changes are avoided by building several related graphs.

But there are many variants of adjacency array representations. It can even matter whether information associated with nodes and edges is stored together with these objects or in separate arrays. A rule of thumb is that information that is frequently accessed should be stored with the nodes and edges. Rarely used data should usually be kept in separate arrays because otherwise it would often be uselessly moved through the cache even if it is not used. There can be other, more complicated reasons why separate arrays are faster. For example, if both adjacency information and edge weights are read but only the weights are changed, then separate arrays may be faster because the amount of data written back to main memory is reduced.

Unfortunately, this wide variety of graph representations leads to a software engineering nightmare. We are likely to end up with a new representation for every new application. We might even have to change the representation when the requirements for one application change. This is so time consuming and error prone that it is usually not practical. It can be more economical to use a general purpose library like LEDA most of the time and only decide to use selected custom-built components for time critical or space critical parts of an application. Often it will turn out that the graph theoretic part of the application is not the bottleneck so that no changes are needed.

Another possibility is generic programming. Graph algorithms can be implemented so that they are largely independent of the actual representation used. When we add a new representation, we only have to implement a small set of interface functions used by the generic algorithms. This concept is easiest to illustrate using a simpler data structure like sequences. The standard C++ function sort can sort any sequence regardless of whether it is implemented by a list, an array or some more complicated data structure. For example, if we decide to use generic algorithms from the C++ library for our unbounded array data structure from Section 3.1, we only have to implement the data type iterator, which is basically an abstraction of a pointer into the sequence.

[sth on standard formats?]

C++

LEDA [67] has a very powerful graph data type that is space consuming but supports a large variety of operations in constant time. It also supports several more space efficient adjacency array based representations.

The Boost graph library³ emphasizes a consistent separation of representation and interface. In particular, by implementing the Boost interface, a user can run Boost graph algorithms on her own (legacy) graph representation. With adjacency_list, Boost also has its own graph representation class. A large number of parameters allow one to choose between variants of graphs (directed, undirected, multigraph), type of available navigation (in-edges, out-edges, ...) and representations of vertex and edge sequences (arrays, linked lists, sorted sequences, ...). However, it should be noted that even the array representation is not what we call adjacency array representation here because one array is used for the adjacent edges of each vertex. [some qualified criticism? Is this still easy to use?]

Java

[JGraphT looks promising but is still in version 0.sth.]

The library JDSL⁴ [todo: consistent citation of all libraries, stl, leda, boost, jdsl, java.util, ???] offers rich support for graphs in jdsl.graph. JDSL has a clear separation between interfaces, algorithms, and representation. The class jdsl.graph.ref.IncidenceListGraph implements an adjacency list representation of graphs that allows a mix of (possibly parallel) directed and undirected edges.

8.7 Further Findings

Special classes of graphs may result in additional requirements for their representation. An important example are planar graphs — graphs that can be drawn in the plane without crossing edges. Here, the ordering of the edges adjacent to a node should be in counterclockwise order with respect to a planar drawing of the graph. In addition, the graph data structure should efficiently support iterating over the edges along a face of the graph — a cycle that does not enclose any other nodes.

[move bipartite and hypergraphs into intro chapter?] Bipartite graphs are special graphs where the node set V = L ∪ R can be decomposed into two disjoint subsets L and R so that edges are only between nodes in L and R.

Hypergraphs H = (V,E) are generalizations of graphs where edges can connect more than two nodes. Often hypergraphs are represented as the bipartite graph B_H = (E ∪ V, {(e,v) : e ∈ E, v ∈ e}).

Cayley graphs⁵ are an interesting example of implicitly defined graphs. Recall that a set V is a group if it has an associative multiplication operation ∗, a neutral element, and a multiplicative inverse operation. The Cayley graph (V,E) with respect to a set S ⊆ V has the edge set {(u, u∗s) : u ∈ V, s ∈ S}. Cayley graphs are useful because graph theoretic concepts can be useful in group theory. On the other hand, group theory yields concise definitions of many graphs with interesting properties. For example, Cayley graphs have been proposed as interconnection networks for parallel computers [7].

In this book we have concentrated on convenient data structures for processing graphs. There is also a lot of work on storing graphs in a flexible, portable, and space efficient way. Significant compression is possible if we have a priori information on the graphs. For example, the edges of a triangulation of n points in the plane can be represented with about 6n bits [24, 82]. [OBDDs??? or is that too difficult] [what else?]

³ www.boost.org
⁴ www.jdsl.org
⁵ We use the definition from Wikipedia.

Chapter 9

Graph Traversal

Suppose you are working in the traffic planning department of a nice small town with a medieval old town full of nooks and crannies. An unholy coalition of the shop owners, who want more street-side parking opportunities, and the green party, which wants to discourage car traffic altogether, has decided to turn most streets into one-way streets. To avoid the worst, you want to be able to quickly find out whether the current plan is at least feasible in the sense that one can still drive from every point in the town to every other point.

Using the terminology of graph theory, the above problem asks whether the directed graph formed by the streets is strongly connected. The same question is important in many other applications. For example, if we have a communication network with unidirectional channels (e.g., radio transceivers with different ranges) we want to know who can communicate with whom. Bidirectional communication is possible within components of the graph that are strongly connected. Computing strongly connected components (SCCs) is also an important subroutine. For example, if we ask for the minimal number of new edges needed to make a graph G = (V,E) strongly connected, we can regard each SCC as a single node of the smaller graph G′ = (V′,E′) whose nodes are the SCCs of G and with the edge set

E′ = {(u′,v′) ∈ V′ × V′ : ∃(u,v) ∈ E : u ∈ u′ ∧ v ∈ v′} .

In G′ we have contracted SCCs to a single node. Since G′ cannot contain any directed cycles (otherwise we could have built larger SCCs), it is a directed acyclic graph (DAG). This might further simplify our task. Exercise 9.9 gives an example where a graph theoretic problem turns out to be easy since it is easy to solve on SCCs and easy to solve on DAGs.

[Figure 9.1 (a DFS tree with one example of each edge type) omitted here.]
Figure 9.1: Classification of graph edges into tree edges, forward edges, backward edges, and cross edges.

We present a simple and efficient algorithm for computing SCCs in Section 9.2.2. The algorithm systematically explores the graph, inspecting each edge exactly once, and thus gathers global information. Many graph problems can be solved using a small number of basic traversal strategies. We look at the two most important of them: breadth first search in Section 9.1 and depth first search in Section 9.2. Both algorithms have in common that they construct trees that provide paths from a root node s to all nodes reachable from s. These trees can be used to distinguish between four classes of edges (u,v) in the graph: the tree edges themselves, backward edges leading back to an ancestor on the tree path, forward edges leading to an indirect descendant, and cross edges that connect two different branches of the tree. Figure 9.1 illustrates the four types of edges. This classification helps us to gather global information about the graph.

9.1 Breadth First Search

A simple way to explore all nodes reachable from some node s is breadth first search (BFS), shown in Figure 9.2. This approach has the useful feature that it finds paths with a minimal number of edges. For example, you could use such paths to find railway connections that minimize the number of times you have to change trains. In some communication networks we might be interested in such paths because they minimize the number of failure-prone intermediate nodes. To encode the paths efficiently, we use a simple trick. Each node reachable from s just stores its parent node in the spanning tree. Thus, by following these parent pointers from a node v, we can easily reconstruct the path from s to v. We just have to keep in mind that we get the nodes on the path in reverse order.

Exercise 9.1 (FIFO BFS) Explain how to implement BFS using a single FIFO queue of nodes whose outgoing edges still have to be scanned.

Function bfs(s : Node) : NodeArray of Node
  parent = ⟨⊥, ..., ⊥⟩ : NodeArray of Node          // all nodes are unexplored
  parent[s] := s                                    // self-loop signals root
  q = ⟨s⟩ : Set of Node                             // current layer of BFS tree
  q′ = ⟨⟩ : Set of Node                             // next layer of BFS tree
  for d := 0 to ∞ while q ≠ ⟨⟩ do                   // explore layer by layer
    invariant q contains all nodes with distance d from s
    foreach u ∈ q do
      foreach (u,v) ∈ E do                          // scan edges out of u
        if parent(v) = ⊥ then                       // found an unexplored node
          q′ := q′ ∪ {v}                            // remember for next layer
          parent(v) := u                            // update BFS tree
    (q, q′) := (q′, ⟨⟩)                              // switch to next layer
  return parent            // the BFS tree is now {(v,w) : w ∈ V, v = parent(w)}

Figure 9.2: Find a spanning tree from a single root node s using breadth first search.

Exercise 9.2 (Graph representation for BFS) Give a more detailed description of BFS. In particular, make explicit how to implement it using the adjacency array representation from Section 8.2. Your algorithm should run in time O(n + m).

Exercise 9.3 (Connected components) Explain how to modify BFS so that it computes a spanning forest of an undirected graph in time O(m + n). In addition, your algorithm should select a representative node for each connected component of the graph and assign a value component[v] to each node that identifies this representative. Hint: Start BFS from each node s ∈ V but only reset the parent array once in the beginning. Note that isolated nodes are simply connected components of size one.

[example]
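The pseudocode of Figure 9.2 translates almost literally into C++ on the adjacency array representation of Section 8.2 (cf. Exercise 9.2). This is only a sketch: nodes are 0..n−1 and parent[v] = −1 encodes ⊥.

  #include <vector>

  std::vector<int> bfs(int s, const std::vector<int>& V, const std::vector<int>& E) {
    int n = (int)V.size() - 1;
    std::vector<int> parent(n, -1);                // all nodes are unexplored
    parent[s] = s;                                 // self-loop signals root
    std::vector<int> q = {s}, qq;                  // current and next layer
    while (!q.empty()) {
      for (int u : q)
        for (int i = V[u]; i < V[u + 1]; ++i) {    // scan edges out of u
          int v = E[i];
          if (parent[v] == -1) {                   // found an unexplored node
            parent[v] = u;                         // update BFS tree
            qq.push_back(v);                       // remember for next layer
          }
        }
      q.swap(qq);                                  // switch to next layer
      qq.clear();
    }
    return parent;                                 // encodes the BFS tree
  }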
9.2 Depth First Search

If you view breadth first search (BFS) as a careful, conservative strategy for systematic exploration that looks at known things before moving on to unexplored ground [zu neuen Ufern??], depth first search (DFS) is the exact opposite: only look back if you run out of options. Although this strategy leads to strange looking exploration trees compared to the orderly layers generated by BFS, the combination of eager exploration with the perfect memory of a computer leads to properties of DFS trees that make them very useful. Figure 9.3 therefore does not just give one algorithm but an algorithm template. By filling in the routines init, root, traverse, and backtrack, we can solve several interesting problems.

init
foreach s ∈ V do
  if s is not marked then
    mark s
    root(s)
    recursiveDFS(s,s)

Procedure recursiveDFS(u,v : Node)
  foreach (v,w) ∈ E do
    traverse(v,w)
    if w is not marked then
      mark w
      recursiveDFS(v,w)
  backtrack(u,v)                                    // finish v

Figure 9.3: A template for depth first search of a graph G = (V,E).

Exercise 9.4 Give a nonrecursive formulation of DFS. You will need to maintain a stack of unexplored nodes, and for each node on the stack you have to keep track of the edges that have already been traversed.

9.2.1 DFS Numbering, Finishing Times, and Topological Sorting

As a warmup, let us consider two useful ways of numbering the nodes based on DFS:

init:           dfsPos = 1 : 1..n; finishingTime = 1 : 1..n
root(s):        dfsNum[s] := dfsPos++
traverse(v,w):  if w is not marked then dfsNum[w] := dfsPos++
backtrack(u,v): finishTime[v] := finishingTime++

[rather use markTime, finishTime?]

dfsNum records the order in which nodes are marked and finishTime records the order in which nodes are finished. Both numberings are frequently used to define orderings of the nodes. In this chapter we will define "≺" based on dfsNum, i.e., u ≺ v ⇔ dfsNum[u] < dfsNum[v]. Later we will need the following invariant of DFS:

Lemma 9.1 The nodes on the DFS recursion stack are sorted with respect to ≺.

Finishing times have an even more useful property for directed acyclic graphs:

Lemma 9.2 If G is a DAG then ∀(v,w) ∈ E : finishTime[v] > finishTime[w].

Proof: We consider for each edge e = (v,w) the event that traverse(v,w) is called. If w is already finished, v will finish later and hence finishTime[v] > finishTime[w]. Exercise 9.5 asks you to prove that this case covers forward edges and cross edges. Similarly, if e is a tree edge, recursiveDFS(v,w) will be called immediately and w gets a smaller finishing time than v. Finally, backward edges are illegal for DAGs because together with the tree edges leading from w to v we get a cycle.

An ordering of the nodes in a DAG by decreasing finishing times is known as topological sorting. Many problems on DAGs can be solved very efficiently by iterating through the nodes in topological order. For example, in Section 10.3 we will get a very simple algorithm for computing shortest paths in acyclic graphs. [more examples?]

Exercise 9.5 (Classification of edges) Prove the following relations between edge types and possible predicates when edge (v,w) is traversed. Explain how to compute each of the predicates in constant time. (You may have to introduce additional flags for each node that are set during the execution of DFS.)

  type      w marked?  w finished?  v ≺ w  w on recursion stack?
  tree      no         no           yes    no
  forward   yes        yes          yes    no
  backward  yes        no           no     yes
  cross     yes        yes          no     no

Exercise 9.6 (Topological sorting) Design a DFS based algorithm that outputs the nodes as a sequence in topological order if G is a DAG. Otherwise it should output a cycle.

Exercise 9.7 Design a BFS based algorithm for topological sorting.

Exercise 9.8 (Nesting property of DFS numbers and finishing times) Show that

  ∄ u,v ∈ V : dfsNum[u] < dfsNum[v] < finishTime[u] < finishTime[v] .
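A C++ sketch of the DFS template instantiated with the numbering routines above, again on adjacency arrays with nodes 0..n−1. Reversing topOrder at the end yields a topological order if the graph is a DAG; cycle detection as in Exercise 9.6 is omitted, and all names are ours.

  #include <vector>
  #include <algorithm>

  struct DFS {
    const std::vector<int> &V, &E;               // adjacency arrays
    std::vector<int> dfsNum, finishTime, topOrder;
    std::vector<bool> marked;
    int dfsPos = 0, finishingTime = 0;

    DFS(const std::vector<int>& V_, const std::vector<int>& E_)
      : V(V_), E(E_), dfsNum(V_.size() - 1), finishTime(V_.size() - 1),
        marked(V_.size() - 1, false) {}

    void run() {
      for (int s = 0; s + 1 < (int)V.size(); ++s)
        if (!marked[s]) { marked[s] = true; dfsNum[s] = dfsPos++; recurse(s); }
      std::reverse(topOrder.begin(), topOrder.end());  // decreasing finishing time
    }
    void recurse(int v) {
      for (int i = V[v]; i < V[v + 1]; ++i) {          // traverse(v,w)
        int w = E[i];
        if (!marked[w]) { marked[w] = true; dfsNum[w] = dfsPos++; recurse(w); }
      }
      finishTime[v] = finishingTime++;                 // backtrack: v is finished
      topOrder.push_back(v);
    }
  };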

9.2.2 Strongly Connected Components (SCCs)

We now come back to the problem posed at the beginning of this chapter. Computing connected components of an undirected graph is easy. Exercise 9.3 outlines how to do it using BFS, and adapting this idea to DFS is equally simple. For directed graphs it is not sufficient to find a path from u to v in order to conclude that u and v are in the same SCC. Rather, we have to find a directed cycle containing both u and v.

Our approach is to find the SCCs using a single pass of DFS. At any time during DFS, we will maintain a representation of all the SCCs of the subgraph defined by marked nodes and traversed edges. We call such an SCC open if it contains any unfinished nodes and closed otherwise. We call a node open if it belongs to an open component and closed if it belongs to a closed component. Figure 9.4 gives an example for the development of open and closed SCCs during DFS. DFS is so well suited for computing SCCs because it maintains the following invariants at all times:

[Figure 9.4 (a trace of DFS on an example graph with nodes a–k, listing the calls root, traverse, and backtrack and showing, after each step, which nodes are unmarked, marked, or finished, which nodes are representatives, and which SCCs are open or closed) omitted here.]
Figure 9.4: An example for the development of open and closed SCCs during DFS.

Lemma 9.3 DFS maintains the following invariants:

1. Closed SCCs are also SCCs of the entire graph.

2. Open SCCs can be arranged into a sequence ⟨S1, ..., Sk⟩ such that the following properties hold for 1 ≤ i < k:

  (a) There is a tree edge (u,v) ∈ Si × Si+1 such that v is the first node in Si+1 that is marked, i.e., open components form a path.

  (b) ∀u ∈ Si, v ∈ Si+1 : u ≺ v.

  (c) When a node v finishes or an edge (v,w) is traversed, v is in Sk.

Proof: [each paragraph as a minipage side by side with a picture?] First consider Invariant 2(c). By Lemma 9.2, v must be the open node with highest dfsNum. Since Sk is not closed, it must contain at least one open node, and by Invariant 2(b) these nodes have larger DFS numbers than the open nodes in all other components.

We prove Invariants 1–2(b) inductively. Whenever DFS marks or finishes a node or traverses an edge, we show that the invariants are maintained. When DFS starts, the claim is vacuously true — no nodes are marked, no edges have been traversed, and hence there are neither open nor closed components yet.

Before a new root is marked, all marked nodes are finished and hence there can only be closed components. Therefore, marking a new root s produces the trivial sequence of open components ⟨{s}⟩.

If a tree edge e = (v,w) is traversed, w is marked and {w} is appended to the sequence of open components.

Now suppose a non-tree edge e = (v,w) out of Sk is traversed. If w is closed, e cannot affect the invariants because the component of w is maximal. If w is open, w ∈ Si for some i ≤ k. Now Si ∪ ... ∪ Sk form a single SCC because together with the tree edges from Invariant 2(a) we get a cycle of components Si, ..., Sk.

If the last node in Sk finishes, Sk becomes closed. Invariant 2 is maintained by removing Sk from the sequence of open components. To show that Invariant 1 is also maintained, assume it would be violated. This means that there must be a cycle C that contains nodes from both inside and outside Sk. Consider an edge e = (v,v′) on C such that v ∈ Sk and v′ ∉ Sk. [bild?] Node v′ cannot be in a closed component S′ because, by the induction hypothesis, all previously closed components are maximal whereas C would extend S′. Node v′ cannot be unmarked either since DFS would have explored this edge before finishing v. Hence, v′ must lie in some open component Si, i < k. But this is impossible because the tree edges from Invariant 2(a) together with e would merge Si, ..., Sk into a single open component.

The invariants guaranteed by Lemma 9.3 come "for free" with DFS without any additional implementation measures. All that remains to be done is to design data structures that keep track of open components and allow us to record information on closed components. The first node marked in any open or closed component is made its representative. For a node v in a closed component, we record its representative in component[v]. This will be our output. Since the sequence of open components only changes at its end, it can be managed using stacks. We maintain a stack oReps of representatives of the open components. A second stack oNodes stores all the open nodes ordered by ≺. By Invariant 2, the sequence of open components will correspond to intervals of nodes in oNodes in the same order.

Figure 9.5 gives pseudocode:

init
  component = ⟨⊥, ..., ⊥⟩ : NodeArray of Node      // SCC representatives
  oReps = ⟨⟩ : Stack of Node                        // representatives of open SCCs
  oNodes = ⟨⟩ : Stack of Node                       // all nodes in open SCCs

root(s)
  oReps.push(s)
  oNodes.push(s)

traverse(v,w)
  if (v,w) is a tree edge then
    oReps.push(w)
    oNodes.push(w)
  else if w ∈ oNodes then                           // collapse components on cycle
    while w ≺ oReps.top do                          // [picture???]
      oReps.pop

backtrack(u,v)
  if v = oReps.top then
    oReps.pop
    repeat
      w := oNodes.pop
      component[w] := v
    until w = v

Figure 9.5: An instantiation of the DFS template that computes strongly connected components of a graph G = (V,E).

When a new root is marked or a tree edge is explored, a new single-node open component is created by pushing this node on both stacks. When a cycle of open components is created, these components can be merged by popping representatives off oReps while the top representative is not left of the node w closing the cycle. Since an SCC S is represented by its node v with smallest dfsNum, S is closed when v is finished. In that case, all nodes of S are stored on top of oNodes. Operation backtrack then pops v from oReps and the nodes w ∈ S from oNodes, setting their component to the representative v.

Note that the test w ∈ oNodes in traverse can be done in constant time by keeping a flag for open nodes that is set when a node is first marked and reset when its component is closed. Furthermore, the while loop and the repeat loop can make at most n iterations during the entire execution of the algorithm since each node is pushed on the stacks exactly once. Hence, the execution time of the algorithm is O(m + n). We get the following theorem:

Theorem 9.4 The DFS based algorithm in Figure 9.5 computes strongly connected components in time O(m + n).
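A C++ transcription of the algorithm of Figure 9.5, keeping the names oReps, oNodes, and component; dfsNum doubles as the mark and as the "left of" test, and the open flag realizes the constant time test w ∈ oNodes. This is only a sketch (e.g., recursion depth is not addressed).

  #include <vector>

  struct SCC {
    const std::vector<int> &V, &E;               // adjacency arrays, nodes 0..n-1
    std::vector<int> dfsNum, component, oReps, oNodes;
    std::vector<bool> open;
    int dfsPos = 0;

    SCC(const std::vector<int>& V_, const std::vector<int>& E_)
      : V(V_), E(E_), dfsNum(V_.size() - 1, -1),
        component(V_.size() - 1, -1), open(V_.size() - 1, false) {}

    void run() {
      for (int s = 0; s + 1 < (int)V.size(); ++s)
        if (dfsNum[s] == -1) dfs(s);
    }
    void dfs(int v) {
      dfsNum[v] = dfsPos++; open[v] = true;      // root or tree edge: new open SCC {v}
      oReps.push_back(v); oNodes.push_back(v);
      for (int i = V[v]; i < V[v + 1]; ++i) {
        int w = E[i];
        if (dfsNum[w] == -1) dfs(w);             // tree edge
        else if (open[w])                        // cycle of open components: collapse
          while (dfsNum[oReps.back()] > dfsNum[w]) oReps.pop_back();
      }
      if (oReps.back() == v) {                   // backtrack: v's component is closed
        oReps.pop_back();
        int w;
        do {
          w = oNodes.back(); oNodes.pop_back();
          open[w] = false; component[w] = v;     // v is the representative
        } while (w != v);
      }
    }
  };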

Exercises

*Exercise 9.9 (transitive closure) The transitive closure G* = (V,E*) [check notation] of a graph G = (V,E) has an edge (u,v) ∈ E* whenever there is a path from u to v in E. Design an algorithm that computes E* in time O(n + |E*|). Hint: First solve the problem for the DAG of SCCs of G. Also note that S × S′ ⊆ E* if S and S′ are SCCs connected by an edge.

Exercise 9.10 (2-edge connected components) Two nodes of an undirected graph are in the same 2-edge connected component (2ECC) iff they lie on a cycle. Show that the SCC algorithm from Figure 9.5 computes 2-edge connected components. Hint: first show that DFS of an undirected graph never produces any cross edges.

Exercise 9.11 (biconnected components) Two edges of an undirected graph are in the same biconnected component (BCC) iff they lie on a simple cycle. Design an algorithm that computes biconnected components using a single pass of DFS. You can use an analogous approach as for SCCs and 2ECCs, but you need to make adaptations since BCCs are defined for edges rather than nodes. [explain why important?]

9.3 Implementation Notes

BFS is usually implemented by keeping unexplored nodes (with depths d and d+1) in a FIFO queue. We choose a formulation using two separate sets for nodes at depth d and nodes at depth d+1 mainly because it allows a simple loop invariant that makes correctness immediately evident. Moreover, an implementation of our formulation is likely to be somewhat more efficient. If q and q′ are organized as stacks, we will get fewer cache faults than with a queue, in particular if the nodes of a layer do not quite fit into the cache. Memory management becomes very simple and efficient by allocating just a single array a of n nodes for both stacks q and q′. One grows from a[1] to the right and the other grows from a[n] towards smaller indices. When switching to the next layer, the two memory areas switch their roles.

[unify marks and dfs numbers] [graph iterators]

C++

[leda boost graph iterators]

Java

[todo]

9.4 Further Findings

[triconnected components, planarity, parallel and external CC, edge contraction, ear decomposition?]

Chapter 10

Shortest Paths

The shortest or quickest or cheapest path problem is ubiquitous. You solve it all the time when traveling. [Give more examples.] Typical variants of the problem are:

• the shortest path between two given nodes s and t (single source, single sink)
• the shortest paths from a given node s to all other nodes (single source)
• the shortest paths between any pair of nodes (all pairs problem)

10.1 Introduction

Abstractly, we are given a directed graph G = (V,E) and a cost function c that maps edges to costs. For simplicity, we will assume that edge costs are real numbers, although most of the algorithms presented in this chapter will work under weaker assumptions (see Exercise 10.13); some of the algorithms require edge costs to be integers. We extend the cost function to paths in the natural way. The cost of a path is the sum of the costs of its constituent edges, i.e., if p = [e1, e2, ..., ek] then c(p) = ∑_{1≤i≤k} c(e_i). The empty path has cost zero. For two nodes v and w, we define

[todo: redraw Figure 10.1]

Figure 10.1: Single Source Shortest Path Problem: source node s = fat blue node; the yellow node has distance +∞ from s; blue nodes have finite distance from s; the square blue node has distance −1 from s (there are paths of length −1, −4, −9, ...); green nodes have distance −∞ from s.

the least cost path distance µ(v,w) from v to w as the minimal cost of any path from v to w:

  µ(v,w) = inf {c(p) : p is a path from v to w} ∈ ℝ ∪ {−∞, +∞} .

The distance is +∞ if there is no path from v to w, is −∞ if there are paths of arbitrarily small cost¹, and is a proper number otherwise, cf. Figure 10.1 for an example. A path from v to w realizing µ(v,w) is called a shortest or least cost path from v to w. The following Lemma tells us when shortest paths exist. [PS: avoid proofs early in the chapters? These properties here we could just state and give similar arguments in a less formal setting.]

Lemma 10.1 (Properties of Shortest Path Distances)

a) µ(s,v) = +∞ iff v is not reachable from s.

b) µ(s,v) = −∞ iff v is reachable from a negative cycle C which in turn is reachable from s.

c) If −∞ < µ(s,v) < +∞ then µ(s,v) is the cost of a simple path from s to v.

Proof: If v is not reachable from s, µ(s,v) = +∞, and if v is reachable from s, µ(s,v) < +∞. This proves part a). For part b) assume first that v is reachable from a negative cycle C which in turn is reachable from s. Let p be a path from s to some node u on C and let q be a path from u to v. Consider the paths p^(i) which first use p to go from s to u, then go around the cycle i times, and finally follow q from u to v. Its cost is c(p) + i·c(C) + c(q) and hence c(p^(i+1)) < c(p^(i)). Thus there are paths of arbitrarily small cost from s to v and hence µ(s,v) = −∞. This proves part b) in the direction from right to left.

For the direction from left to right, let C be the minimal cost of a simple path from s to v and assume that there is a path p from s to v of cost strictly less than C. Then p is non-simple and hence we can write p = p1 ∘ p2 ∘ p3, where p2 is a cycle and p1 ∘ p3 is a simple path. Then

  C ≤ c(p1 ∘ p3) = c(p) − c(p2) < C

and hence c(p2) < 0. Thus v is reachable from a negative cycle which in turn is reachable from s.

We turn to part c). If −∞ < µ(s,v) < +∞, v is reachable from s, but not reachable through a negative cycle, by parts a) and b). Let p be any path from s to v. We decompose p as in the preceding paragraph. Then c(p1 ∘ p3) = c(p) − c(p2) ≤ c(p) since the cost of the cycle p2 must be non-negative. Thus for every path from s to v there is a simple path from s to v of no larger cost. This proves part c).

¹ min {c(p) : p is a path from v to w} does not exist in this situation.

Exercise 10.1 Let p be a shortest path from u to v for some nodes u and v and let q be a subpath of p. Show that q is a shortest path from its source node to its target node.

Exercise 10.2 Assume that all nodes are reachable from s and that there are no negative cycles. Show that there is an n-node tree T rooted at s such that all tree paths are shortest paths. Hint: Assume first that shortest paths are unique and consider the subgraph T consisting of all shortest paths starting at s. Use the preceding exercise to prove that T is a tree. Extend to the case when shortest paths are not unique.

The natural way to learn about distances is to propagate distance information across edges. If there is a path from s to u of cost d(u) and e = (u,v) is an edge out of u, then there is a path from s to v of cost d(u) + c(e). If this cost is smaller than the best cost previously known, we remember that the currently best way to reach v is through e. Remembering the last edges of shortest paths will allow us to trace shortest paths.

More precisely, we maintain for every node v a label ⟨d(v), in(v)⟩ where d(v) is the cost of the currently best path from s to v and in(v) is the last edge of this path. [PS: In a real implementation you store a predecessor node rather than an edge. Note this somewhere?] We call d(v) the tentative distance of v. If no path from s to v is known yet, d(v) = ∞ and in(v) has the special value ⊥. If p(v) is the empty path (this is only possible for v = s), d(v) = 0 and in(v) = ⊥.

A function relax(e : Edge) is used to propagate distance information.

Procedure relax(e = (u,v) : Edge)
  if d(u) + c(e) < d(v) then set the label of v to ⟨d(u) + c(e), e⟩

At the beginning of a shortest path computation we know very little. There is a path of length zero from s to s and no other paths are known.

Initialization of Single Source Shortest Path Calculation:
  ⟨d(s), in(s)⟩ := ⟨0, ⊥⟩
  ⟨d(v), in(v)⟩ := ⟨+∞, ⊥⟩ for v ≠ s

Once the node labels are initialized, we propagate distance information.

Generic Single Source Algorithm
  initialize as described above
  relax edges until d(v) = µ(s,v) for all v
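A small C++ sketch of the node labels ⟨d(v), in(v)⟩ and of relax. Edges are stored as (source, target, cost) triples and in(v) = −1 plays the role of ⊥; layout and names are our own.

  #include <vector>
  #include <limits>

  struct Edge { int u, v; double c; };
  const double INF = std::numeric_limits<double>::infinity();

  struct Labels {
    std::vector<double> d;                       // tentative distances
    std::vector<int> in;                         // index of in-edge, -1 = undefined
    Labels(int n, int s) : d(n, INF), in(n, -1) { d[s] = 0; }   // initialization

    void relax(const std::vector<Edge>& edges, int e) {
      const Edge& ed = edges[e];
      if (d[ed.u] + ed.c < d[ed.v]) { d[ed.v] = d[ed.u] + ed.c; in[ed.v] = e; }
    }
  };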

The following Lemma gives sufficient conditions for the termination condition to hold. We will later make the conditions algorithmic.

Lemma 10.2 (Sufficient Condition for Correctness) We have d(v) = µ(s,v) if for some shortest path p = [e1, e2, ..., ek] from s to v there are times t1, ..., tk such that t1 < t2 < ... < tk and ei is relaxed at time ti.

Proof: We have µ(s,v) = ∑_{j=1}^{k} c(e_j). Let t0 = 0, let v0 = s, and let vi = target(ei). Then d(vi) ≤ ∑_{1≤j≤i} c(e_j) after time ti. This is clear for i = 0 since d(s) is initialized to zero and d-values are only decreased. After the relaxation of ei at time ti for i > 0, we have d(vi) ≤ d(vi−1) + c(ei) ≤ ∑_{j=1}^{i} c(e_j).

The Lemma above paves the way for specific shortest path algorithms which we discuss in subsequent sections. Before doing so, we discuss properties of the graph defined by the in-edges. The set {in(v) : in(v) ≠ ⊥} of in-edges forms a graph on our node set with maximal indegree one; we call it the in-graph. The in-graph changes over time. Unreached nodes and s (except if it lies on a negative cycle) have indegree zero. Thus the in-graph consists of a tree rooted at s, isolated nodes, cycles, and trees emanating from these cycles, cf. Figure 10.1. The tree rooted at s may be empty. We call the tree rooted at s the shortest path tree and use T to denote it. The next lemma justifies the name. [PS: This is getting somewhat difficult here. Pictures? Formulate in more detail? Dijkstra and Bellman-Ford without negative cycles can be had more simply, but then it becomes less elegant. ...]

Lemma 10.3 (Properties of the In-Graph)

a) If p is a path of in-edges from u to v then d(v) ≥ d(u) + c(p). [needed anywhere else outside the proof?]

b) If v lies on a cycle or is reachable from a cycle of the in-graph, µ(s,v) = −∞. [already follows from the invariant]

c) If −∞ < µ(s,v) < +∞ and d(v) = µ(s,v), the tree path from s to v is a shortest path from s to v.

Proof: For part a) let p = ⟨v1, v2, v3, ..., vk⟩ with ei = (vi−1, vi) = in(vi) for 2 ≤ i ≤ k. After ei, 2 ≤ i ≤ k, was added to the in-graph, we have d(vi) ≥ d(vi−1) + c(ei). We had equality when in(vi) was set to ei, and d(vi) has not decreased since (because otherwise in(vi) would have been set to a different edge [what if it changed back and forth?]). On the other hand, d(vi−1) may have decreased. Putting the inequalities together, we have

  d(vk) ≥ d(v1) + ∑_{i=2}^{k} c(ei) = d(v1) + c(p) .

For part b), let C = p ∘ e [where is ∘ introduced?] with e = (v,u) be a cycle of in-edges, where e is the edge in the cycle which was added to the in-graph last; p is a path from u to v. Just before e was added, we had d(v) ≥ d(u) + c(p) by part a). Since e was added, we had d(u) > d(v) + c(e) at this point of time. Combining both inequalities we obtain d(u) > d(u) + c(e) + c(p) and hence c(C) < 0.

For part c) consider any node v with d(v) = µ(s,v) < +∞. Then v has an in-edge and v is not reachable from a cycle of in-edges, and hence µ(s,s) = d(s) = 0 and v belongs to T. Let p be the tree path from s to v. Then µ(s,v) = d(v) ≥ d(s) + c(p) = c(p), i.e., c(p) ≤ µ(s,v); since p is a path from s to v we also have c(p) ≥ µ(s,v), and hence c(p) = µ(s,v). We conclude that p is a shortest path from s to v.

We are now ready for specializations of the generic algorithm.

10.2 Arbitrary Edge Costs (Bellman-Ford Algorithm)

It is easy to guarantee the sufficient condition of Lemma 10.2. We simply perform n−1 rounds and in each round we relax all edges.

Bellman-Ford Algorithm
  initialize node labels
  for i := 1 to n−1 do
    forall edges e = (u,v) ∈ E do relax(e)
  set d(x) to −∞ for all x that can be reached from an edge e = (u,v) with d(u) + c(e) < d(v)

Theorem 10.4 The Bellman-Ford algorithm solves the shortest path problem in time O(nm).

Proof: The running time is clearly O(nm) since we iterate n−1 times over all edges. We come to correctness.

If µ(s,v) > −∞ then µ(s,v) = d(v) after termination of the do-loop. This follows from Lemma 10.2 and the fact that a shortest path consists of at most n−1 edges.

If d(v) is not set to −∞, we have d(x) + c(e) ≥ d(y) for any edge e = (x,y) on any path p from s to v. Thus d(s) + c(p) ≥ d(v) for any path p from s to v and hence d(v) ≤ µ(s,v). Thus d(v) = µ(s,v).

If d(v) is set to −∞, there is an edge e = (x,y) with d(x) + c(e) < d(y) after termination of the do-loop and such that v is reachable from y. The edge allows us to decrease d(y) further and hence d(y) > µ(s,y) when the do-loop terminates. Thus µ(s,y) = −∞ by the second paragraph; the same is true for µ(s,v), since v is reachable from y.
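The Bellman-Ford pseudocode above written out in C++, reusing the Edge/Labels sketch from Section 10.1. The final pass marks with −∞ every node reachable from a still improvable edge; passing the adjacency arrays V and E in addition to the edge list is a convenience of this sketch.

  #include <vector>

  std::vector<double> bellmanFord(int n, int s, const std::vector<Edge>& edges,
                                  const std::vector<int>& V, const std::vector<int>& E) {
    Labels L(n, s);
    for (int round = 1; round <= n - 1; ++round)       // n-1 rounds
      for (int e = 0; e < (int)edges.size(); ++e) L.relax(edges, e);

    std::vector<int> stack;
    for (const Edge& ed : edges)                       // still improvable edges
      if (L.d[ed.u] + ed.c < L.d[ed.v]) stack.push_back(ed.v);
    while (!stack.empty()) {                           // propagate -infinity
      int u = stack.back(); stack.pop_back();
      if (L.d[u] == -INF) continue;
      L.d[u] = -INF;
      for (int i = V[u]; i < V[u + 1]; ++i) stack.push_back(E[i]);
    }
    return L.d;
  }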

If d(v) is set to ∞, there is an edge e = (x,y) with d(x) + c(e) < d(y) after Proof: It takes time O(n + m) to compute a topological ordering. The algorithm termination of the do-loop− and such that v is reachable from y. The edge allows us iterates over all nodes and for each node over the edges out of the node. Thus its to decrease d(y) further and hence d(y) > µ(y) when the do-loop terminates. Thus running time is O(n + m). Correctnes follows immediately from Lemma 10.2. µ(y) = ∞ by the second paragraph; the same is true for µ(s,v), since v is reachable from y.− 10.4 Non-Negative Edge Costs (Dijkstra’s Algorithm) Exercise 10.3 Consider a round and assume that no node label changes in the round. Show that tentative shortest path distance are actual shortest path distances. We assume that all edge costs are non-negative. Thus there are no negative cycles and shortest paths exist for all nodes reachable from s. We will show that if the edges are relaxed in a judicious order, every edge needs to be relaxed only once. Exercise 10.4 Call a node hot at the beginning of a round if its tentative distance label What is the right order. Along any shortest path, the shortest path distances in- was changed in the preceding round. Only s is hot at the beginning of the first round. crease (more precisely, do not decrease). This suggests to scan nodes (to scan a node Show that it suffices to relax the edges out of hot nodes. means to relax all edges out of the node) in order of increasing shortest path distance. Of course, in the algorithm we do not know shortest path distances, we only know Exercise 10.5 Design a network without negative cycles such that the refined version tentative distances. Fortunately, it can be shown that for the unscanned node with Ω of the Bellman-Ford algorithm outlined in the preceding exercise takes time (nm). minimal tentative distance, the true distance and tentative distance agree. This leads to the following algorithm. We will see much faster algorithms for acyclic graphs and graphs with non-negative edge weights in the next sections. Dijkstra’s Algorithm initalize node labels and declare all nodes unscanned 10.3 Acyclic Graphs while unscanned node with tentative distance +∞ do u :=∃ the unscanned node with minimal tentati≤ve distance relax all edges out of u and declare u scanned Let G be an acyclic graph and let v1, v2, . . . , vn be an ordering of the nodes such that (vi,v j) E implies i j. A topological order can be computed in time O(n+m) using depth-first∈ search (cf.≤Section ??). Lemma 10.2 tells us that if we first relax the edges Theorem 10.6 Dijkstra’s algorithm solves the single source shortest path for graphs with non-negative edge costs. out of v1, then the edges out of v2, . . . , then shortest path distances will be computed correctly. [PS: a picture here?] = Proof: Assume that the algorithm is incorrect and consider the first time that we scan ⇐ a node with its tentative distance larger than its shortest path distance. Say at time t we scan node v with µ(s,v) < d(v). Let p = [s = v1,v2,...,vk = v] be a shortest path Single Source Shortest Path Algorithm for Acyclic Graphs: from s to v and let i be minimal such that vi is unscanned just before time t. Then i > 0 intialize node labels since s is the first node scanned (in the first iterations s is the only node whose tentative ∞ for i := 1 to n do relax all edges out of v // in increasing order distance is less than + ) and since µ(s,s) = 0 = d(s) when s is scanned. 
Thus vi 1 i − was scanned before time t and hence d(vi 1) = µ(s,vi 1) when vi 1 was scanned by − − − [PS: another place where we can simply state the result without deterring definition of t. When vi 1 was scanned, d(vi) was set to µ(s,vi) since any prefix of − = theorems and proofs.] a shortest path is a shortest path. Thus d(vi) = µ(s,vi) µ(s,vk) < d(vk) just before ⇒ ≤ time t and hence vi is scanned instead of vk, a contradiction. Theorem 10.5 Shortest paths in acyclic graphs can be computed in time O(n + m). 174 Shortest Paths 10.4 Non-Negative Edge Costs (Dijkstra’s Algorithm) 175

Exercise 10.6 Let v1, v2, . . . be the order in which nodes are scanned. Show µ(s,v1) Proof: Every reachable node is removed from the priority queue exactly once and ≤ µ(s,v2) ..., i.e., nodes are scanned in order of increasing shortest path distances. hence we consider each edge at most once in the body of the while loop. We con- ≤ clude that the running time is O(n + m) plus the time spent on the operations on the We come to the implementation of Dijkstra’s algorithm. The key operation is to priority queue. The queue needs to be initialized. Every node is inserted into the find the unscanned node with minimum tentative distance value. Adressable prior- queue and deleted from the queue at most once and we perform one emptyness test ity queues (see Section ??) are the appropriate data structure. We store all unscanned in each iteration of the while-loop. The number of decrease priority operation is at ∞ reached (= tentative distance less than + ) nodes in an addressable priority queue PQ. most m (n 1): for every node v = s we have at most indeg(v) 1 decrease priority The entries in PQ are pairs (d(u),u) with d(u) being the priority. Every reached un- operations− and− for s we have none.6 − scanned node stores a handle to its entry in the priority queue. We obtain the following implementation of Dijkstra’s algorithm. = [todo: Dijkstra und Prim aehnlicher machen.] ⇒ Exercise 10.7 Design a graph and and a non-negative cost function such that the re- laxation of m (n 1) edges causes a decreaseKey operation. − − PQ : PriorityQueue of Node // Init In his original paper [32] Dijkstra proposed the following implementation of the intialize the node labels // this sets d(s) = 0 and d(v) = ∞ for v = s 6 priority queue. He proposed to maintain the number of reached unscanned nodes declare all nodes unscanned // s is the only reached unscanned node at this point and two arrays indexed by nodes: an array d storing the tentative distances and an PQ.insert(s,0) array storing for each node whether it is unscanned and reached. Then init is O n 0/ ( ) while PQ= do and is empty, insert and decreaseKey are O(1). A delete min takes time O(n) since select6 u PQ with d(u) minimal and remove it; declare u scanned // delete min ∈ it requires to scan the arrays in order to find the minimum tentative distance of any forall edges e = (u,v) do reached unscanned node. Thus total running time is O(m + n2). if D = d(u) + c(e) < d(v) then ∞ if d(v) == then PQ.insert(v,D) // insert Theorem 10.8 With Dijkstra’s proposed implementation of the priority queue, Dijk- else PQ.decreaseKey v D // decreaseKey ( , ) stra’s algorithm runs in time O(m + n2). set the label of v to D,e h i Much better priority queue implementations were invented since Dijkstra’s origi- nal paper, cf. the section of adressable priority queues (Section ??). Figure 10.2: Implementation of Dijkstra’s Algorithm. Theorem 10.9 With the heap implementation of priority queues, Dijkstra’s algorithm We need to say a few more words about the use of the priority queue. The insert runs in time O(mlogn + nlogn). operations create a entry for a particular node in the priority queue. A handle to this entry is stored with the node. The handle is used in the decrease priority operation Theorem 10.10 With the Fibonacci heap implementation of priority queues, Dijk- to access the entry corresponding to a node. The deleteMin operation returns the pair stra’s algorithm runs in time O(m + nlogn). (d(u),u) with d(u) minimal. The second component of the pair tells us the node. 
Asymptotically, the Fibonnacci heap implementation is superior except for sparse Theorem 10.7 Dijkstra’s algorithm solves the single source shortest path problem in graphs with m = O(n). In practice (see [22, 67]), Fibonacci heaps are usually not graphs with non-negative edge weights in time the fastest implementation because they involve larger constant factors and since the actual number of decrease priority operations tends to much smaller than what the

O(n + m + Tinit + n (TisEmpty + Tdelete min + Tinsert) + m TdecreaseKey) worst case predicts. An average case analysis [77] sheds some light on this. · · [PS: move proof as explaining text before the theorem to avoid a deterring Theorem 10.11 Let G be an arbitrary directed graph, let s be an arbitrary node of = proof?] G, and for each node v let C(v) be a set of non-negative real numbers of cardinality ⇒ 176 Shortest Paths 10.5 Monotone Integer Priority Queues 177

indeg(v). For each v the assignment of the costs in C(v) to the edges into v is made at large as the priority returned by the last deleteMin operation (at least as large as the random, i.e., our probability space consists of the ∏v indeg(v)! many possible assign- first insertion for the operations preceding the first deleteMin). Dijkstra’s algorithm ments of edge costs to edges. Then the expected number of decreaseKey operations is uses its queue in a monotone way. O(nlog(m/n)). It is not known whether monotonicity of use can be exploited in the case of gen- eral edge costs. However, for integer edge costs significant savings are possible. We = [PS: todo similar thing for analysis of Prim.] ⇒ therefore assume for this section that edges costs are integers in the range [0..C] for Proof: Consider a particular node v and let k = indeg(v). Let e1, . . . , ek be the order some integer C. C is assumed to be known at initialization time of the queue. in which the edges into v are relaxed in a particular run of Dijkstra’ algorithm and let Since a shortest path can consist of at most n 1 edges, shortest path distances are ui = source(ei). Then d(u1) d(u2) ... d(uk) since nodes are removed from U − ≤ ≤ ≤ at most (n 1)C. The range of values in the queue at any one time is even smaller. Let in increasing order of tentative distances. Edge ei causes a decreaseKey operation iff min be the−last value deleted from the queue (zero before the first deletion). Then all values in the queue are contained in [min..min+C]. This is easily shown by induction i 2 and d(ui) + c(ei) < min d(u j) + c(e j) : j < i . ≥ on the number of queue operations. It is certainly true after the first insertion. A deleteMin does not decrease min and inserts and decrease priority operations insert Thus the number of operations decreaseKey(v, ) is bounded by the number of i such priorities in [min..min +C]. that · i 2 and c(ei) < min c(e j) : j < i . ≥ 10.5.1 Bucket Queues Since the order in which the edges into v arerelaxed is independent of the costs assigned to them, the expected number of such i is simply the number of left-right A bucket queue [30] is an array B of C + 1 linear lists. A node v U with tentative ∈ maxima in a permutation of size k (minus 1 since i = 1 is not considered). Thus the distance d(v) is stored in the list B[d(v) mod C + 1]. Since the priorities in the queue expected number is Hk 1 by Theorem ?? and hence the expected number of decrease are contained in [min..min +C] at any one time, all nodes in a bucket have the same = priority operations is bounded− by [KM: CHECK first .] distance value. Every node keeps a handle to the list item representing it. Initialization ⇒ ≤ amounts to creating C + 1 empty lists, an insert(v,d(v)) inserts v into the appropriate ∑Hindeg(v) 1 ∑lnindeg(v) nln(m/n)) , list, a decreaseKey(v,d(v)) removes v from the list containing it and inserts it into the v − ≤ v ≤ appropriate bucket. Thus insert and decreaseKey take constant time. where the last inequality follows from the concavity of the ln-function (see Appendix A deleteMin first looks at bucket B[min mod C + 1]. If this bucket is empty, it ??). increments min and repeats. In this way the total cost of all deleteMin operations is O(n+nC) since min is incremented at most nC times and since at most n elements are deleted from the queue. We conclude that the expected running time is O(m + nlog(m/n)logn) with the heap implementation of priority queues. 
This is asymptotically more than O(m + Theorem 10.12 With the bucket queue implementation of priority queues, Dijkstra’s Ω nlogn) only for m = (1) and m = o(nlogn loglogn). algorithm runs in time O(m + nC). This assumes that edge costs are integers in the range [0..C]. Exercise 10.8 When is nlog(m/n)logn = O(m + nlogn)? Hint: Let m = nd. Then the question is equivalent to logd logn = O(d + logn). [PS: Exercise with Dinitz refinement (bucket width= min edge weight)?] = ⇐ 10.5.2 Radix Heaps 10.5 Monotone Integer Priority Queues Radix heaps [3] improve upon the bucket queue implementation by using buckets of Dijkstra’s algorithm does not really need a general purpose priority queue. It only different granularity. Fine grained buckets are used for tentative distances close to min requires what is known as a monotone priority queue. The usage of a priority queue and coarse grained buckets are used for tentative distances far away from min. The is monotone if any insertion or decrease priority operation inserts a priority at least as details are as follows. 178 Shortest Paths 10.5 Monotone Integer Priority Queues 179

TODO Lemma 10.13 Let i be minimal such that Bi is non-empty and assume i 1. Let min ≥ be the smallest element in Bi. Then msd(min,x) < i for all x Bi. ∈

Figure 10.3: The path represents the binary representation of min with the least signif- Proof: We distinguish the cases i < K and i = K. Let min0 be the old value of min. If icant digit on the right. A node v U is stored in bucket Bi if its binary representation i < K, the most significant distinguishing index of min’ and any x Bi is i, i.e., min ∈ ∈ 0 differs in the i-th bit when binary representations are scanned starting at the most has a zero in bit position i and all x Bi have a one in bit position i. They agree in significant bit. Distinguishing indices i with i K are lumped together. all positions with index larger than i.∈Thus the most signifant distinguishing index for ≥ min and x is smaller than i. Let us next assume that i = K and consider any x BK. Then min0 < min x Radix heaps exploit the binary representation of tentative distances. For numbers min +C. Let j = msd(min ,min) and h = msd(min,x).∈Then j K. We want to≤sho≤w α i β i 0 0 a and b with binary representations a = ∑i 0 i2 and b = ∑i 0 i2 define the most ≥ ≥ ≥ that h < K. Observe first that h = j since min has a one bit in position j and a zero bit significant distinguishing index msd(a,b) as the largest i with αi = βi and let it be l6 6 in position h. Let min0 = ∑l µl2 . 1 if a = b. If a < b then a has a zero bit in position i = msd(a,b) and b has a l l j − Assume first that h < j and let A = ∑l> j µl2 . Then min0 A+∑l< j 2 A+2 1 one bit. A radix heap consists of a sequence of buckets B 1, B0, . . . , BK where K = since the j-th bit of min is zero. On the other hand, x has a≤one bit in positions≤ j and− 1 logC . A node v U is stored in bucket B where i − min msd min d v K . 0 + i = ( ( , ( )), ) h and hence x A + 2 j + 2h. Thus 2h C and hence h logC < K. Buckbets arec organized∈as doubly linked lists and every node keeps a handle to the ≥ ≤ ≤ b c Assume next that h > j and let A = ∑ µ 2l. We will derive a contradiction. min list item representing it. Figure 10.3 illustrates this definition. We assume that most l>h l 0 has a zero bit in positions h and j and hence min A+2h 1 2 j. On the other hand, distinguishing indices can be computed2 in time O(1) and justify this assumption in 0 x has a one bit in position h and hence x A + ≤2h. Thus−x −min > 2 j 2K C, a Exercise 10.10. 0 contradiction. ≥ − ≥ ≥

Exercise 10.9 There is another way to describe the distribution of nodes over buckets. ∑ j Lemma 10.13 allows us to account for the cost of a deleteMin as follows: We Let min = j µ j2 , let i0 be the smallest index greater than K with µi0 = 0, and let j charge the time for the search for i to the operation itself (charge O(K)) and charge Mi = ∑ j>i µ j2 . B 1 contains all nodes v U with d(v) = min, for 0 i < K, Bi = O(1) to each node in Bi. In this way the cost of the operation is covered. Also, since 0/ − i ∈ i+1 ≤ if µi = 1, and Bi = v U : Mi + 2 d(x) < Mi + 2 1 if µi = 0, and BK = each node in B is moved to a lower numbered bucket and since nodes start in bucket i ∈ ≤ − i v U : Mi + 2 0 d(x) . Prove that this description is correct. ∈ 0 ≤  BK and end in bucket B 1, the total charge to any node is at most 2+K = O(logC). We conclude that the total cost− of the n deleteMin operations is O(nK + nK) = O(nlogC)  We turn to the realization of the queue operations. Initialization amounts to creat- and have thus proved: ing K +1 empty lists, an insert(v,d(v)) inserts v into the appropriate list, a decreaseKey(v,d(v)) removes v from the list containing it and inserts it into the appropriate queue. Thus insert and decreaseKey take constant time. Theorem 10.14 With the Radix heap implementation of priority queues, Dijkstra’s algorithm runs in time O(m + nlogC). This assumes that edge costs are integers in A deleteMin first finds the minimum i such that Bi is non-empty. If i = 1, an − the range [0..C]. arbitrary element in B 1 is removed and returned. If i 0, the bucket Bi is scanned and min is set to the smallest− tentative distance contained≥ in the bucket. Afterwards, Exercise 10.10 [PS: ich verstehe die Aufgabe im Moment nicht. Ich dachte all elements in Bi are moved to the appropriate new bucket. Thus a deleteMin takes Aufg. 10.9 ist eine andere Darstellung.] The purpose of this exercise is to show constant time if i = 1 and takes time O(i + Bi ) = O(K + Bi ) if i 0. The crucial = − | | | | ≥ that the assumption that the msd-function can be computed in amortized constant time ⇐ observation is now that every node in bucket Bi is moved to a bucket with smaller index. is warranted. We assume inductively, that we have the binary representation of min = [somewhere show that the other buckets need not be touched] and the description of bucket ranges given in Exercise 10.9 available to us. When we ⇒ need to move a node from bucket i to a smaller bucket we simply scan through buckets Bi 1, Bi 2 until we find the bucket where to put the node. When min is increased, we − − 2For the built-in type int it is a machine instruction on many architectures. compute the binary representation of min from the binary representation of the old 180 Shortest Paths 10.6 All Pairs Shortest Paths and Potential Functions 181

minimum min0 by adding min min0 to the old minimum. This takes amortized time Next observe that mincost(v) is the minimum over c(e) of the edges into v and hence O(K + 1) by Theorem ??. − ∑(K logmin in cost(v) + 1) n + ∑(K logc(e)) . v − ≤ e − Exercise 10.11 Radix heaps can also be based on number representations with base b K logc(e) is the number of leading zeros in the binary representation of c(e) when for any b 2. In this situation we have buckets Bi, j for i = 1,0,1,...K and 0 j b, − ≥ − ≤ ≤ written as a K-bit number. Our edge costs are uniform random numbers in [0..C] and where K = 1 + logC/logb . An unscanned reached node x is stored in bucket Bi, j i K = 1 + logC . Thus prob(k logc(e)) = i) = 2− . We conclude if msd(min,d(x))b = i and thec i-th digit of d(x) is equal to j. We also store for each i, b c − i the number of nodes contained in buckets jBi, j. Discuss the implementation of the E[∑(k logc(e))] = ∑∑ i2− = O(m) ∪ e − e i 0 priority queue operations and show that a shortest path algorithm with running time ≥ O(m + n(b + logC/logb)) results. What is the optimal choice of b? We conclude that the total expected cost of deleteMin and decreasep operations is O(n + m). The time spent outside these operations is also O(n + m). If the edge costs are random integers in the range [0..C], a small change of the al- gorithm guarantees linear running time [?, 38, ?]. For every node v let min in cost(v) the minimum cost of an incoming edge. We divide U into two parts, a part F which contains nodes whose tentative distance label is known to be equal to their exact dis- 10.6 All Pairs Shortest Paths and Potential Functions tance from s, and a part B which contains all other labeled nodes. B is organized as a radix heap. We also maintain a value min. We scan nodes as follows. The all-pairs problem is tantamount to n single source problems and hence can be When F is non-empty, an arbitrary node in F is removed and the outgoing edges solved in time O(n2m). A considerable improvement is possible. We show that is are relaxed. When F is empty, the minimum node is selected from B and min is set suffices to solve one general single source problem plus n single source problems to its distance label. When a node is selected from B, the nodes in the first non- with nonnegative edge costs. In this way, we obtain a running time of O(nm + n(m + 2 empty bucket Bi are redistributed if i 0. There is a small change in the redistribution nlogn)) = O(nm + n logn). We need the concept of a potential function. process. When a node v is to be mov≥ed, and d(v) min + min in cost(v), we move A node potential assigns a number pot(v) to each node v. For an edge e = (v,w) v to F. Observe that any future relaxation of an edge≤ into v cannot decrease d(v) and define its reduced cost as: hence d(v) is know to be exact at this point. ¯c(e) = pot(v) + c(e) pot(w) . The algorithm is correct since it is still true that d(v) = µ(s,v) when v is scanned. − For nodes removed from F this was argued in the previous paragraph and for nodes Lemma 10.16 Let p and q be paths from v to w. Then ¯c(p) = pot(v)+c(p) pot(w) removed B this follows from the fact that they have the smallest tentative distance and ¯c(p) ¯c(q) iff c(p) c(q). In particular, shortest paths with respect to−¯c are the among all unscanned reached nodes. same as with≤ respect to c. ≤ Proof: The second and the third claim follow from the first. 
For the first claim, let Theorem 10.15 Let G be an arbitrary graph and let c be a random function from E p = [e0,...,ek 1] with ei = (vi,vi+1), v = v0 and w = vk. Then to [0..C]. Then the single source shortest path problem can be solved in expected time − k 1 O(n + m). − ¯c(p) = ∑ ¯c(ei) = ∑ (pot(vi) + c(ei) pot(vi+1)) i=0 0 i = pot(v0) + ∑ c(ei) pot(vk) 0 i k − min + min in cost(v) and hence v is moved to a bucket Bi with i logmin in cost(v). < ≥ ≤ We conclude that the total charge to nodes in deleteMin and decreaseKey operations = pot(v0) + c(p) pot(vk) − is ∑(K logmin in cost(v) + 1) . v − 182 Shortest Paths 10.8 Further Findings 183

Exercise 10.12 Potential functions can be used to generate graphs with negative edge in(v) = iff d(v) = +∞, i.e., when in(v) = , we d(v) = +∞ and ignore the number costs but no negative cycles: generate a (random) graph, assign to every edge e a stored in⊥d(v). ⊥ (random) non-negative (!!!) weight c(e), assign to every node v a (random) potential [PS: More implementation notes: heuristics? store PQ items with nodes? pot(v), and set the cost of e = (u,v) to c(e) = pot(u) + c(e) pot(v). Show that this Implementations in LEDA? BOOST?] = rule does not generate negative cycles. Hint: The cost of a c−ycle with respect to c is ⇐ the same as with respect to ¯c. 10.8 Further Findings Lemma 10.17 Assume that G has no negative cycles and that all nodes can be reached Exercise 10.13 A ordered semi-group is a set S together with an associative and com- from s. Let pot(v) = µ(s,v) for v V. With this potential function reduced edge costs mutative operation +, a neutral element 0, and a linear ordering such that for all x, are non-negative. ∈ y, and z: x y implies x + z y + z. Which of the algorithms of ≤this section work for ordered semi-groups?≤ Which≤work under the additional assumption that 0 x for all Proof: Since all nodes are reachable from s and since there are no negative cycles, ≤ µ(s,v) IR for all v. Thus the reduced costs are well defined. Consider an arbitrary x? edge e∈= (v,w). We have µ(s,v) + c(e) µ(w) and hence ¯c(e) = µ(s,v) + c(e) µ(s,w) 0. ≥ − ≥

All Pair Shortest Paths in the Absence of Negative Cycles add a new node s and zero length edges (s,v) for all v // no new cycles, time O(m) compute µ(s,v) for all v with Bellman-Ford // time O(nm) set pot(v) = µ(s,v) and compute reduced costs // time O(m) forall nodes x do // time O(n(m + nlogn)) solve the single source problem with source x and reduced edge costs with Dijkstra’s algorithm translate distances back to original cost function // time O(m) µ(v,w) = ¯µ(v,w) + pot(w) pot(v) − We have thus shown

Theorem 10.18 Assume that G has no negative cycles. The all pairs shortest problem = in graphs without negative cycles can be solved in time O(nm)[PS: +n2 logn???]. ⇒ The assumption that G has no negative cycles can be removed [69]. = [PS: what about heuristics like A in geometric graphs and road graphs.?] ⇒ ∗ 10.7 Implementation Notes

Shortest path algorithms work over the set of extended reals IR +∞, ∞ . We may ignore ∞ since it is only needed in the presence of negative c∪ycles{ and−ev}en there it is only −needed for the output, see Section ??. We can also get rid of +∞ by noting that 184 Shortest Paths Mehlhorn, Sanders June 11, 2005 185

Chapter 11

Minimum Spanning Trees

??? nettes Bild The atoll Taka-Tuka-Land in the south seas asks you for help. They want to connect their islands by ferry lines. Since there is only little money available, the sum of the lengths of the connections openened should be minimal as long as it is possible to travel between any two islands even if you have to change ships several times. More generally, we want to solve the following problem: Consider a connected1

undirected graph G = (V,E) with positive edge weights c : E +. A minimum spanning tree (MST) of G is defined by a set T E of edges such→that the graph (V,T ) ⊆ is connected and c(T ) := ∑e T c(e) is minimized. It is not difficult to see that T forms a tree2 and hence contains n∈ 1 edges.[wo erklaert?] In our example, we have a = complete graph of the islands and− the edge weights are the pairwise distances between ⇐ the islands (say between their post offices). Minumum spanning trees (MSTs) are perhaps the simplest variant of an important family of problems known as network design problems. Because MSTs are such a simple concept, they also show up in many seemingly unrelated problems such as clustering, finding paths that minimize the maximum edge weight used, or finding approximations for harder problems. Section 11.6 has more on that. An equally good reason to discuss MSTs in an algorithms text book is that there are simple, elegant, and fast algorithms to find them. The algorithms we discuss are greedy algorithms based on a simple property of MST edges introduced in Section 11.1. The Jarn´ık- Prim algorithm from Section 11.2 applies this property to grow MSTs starting from some starting node. Kruskal’s algorithm from Section 11.3 grows the tree by merging

1If G is not connected, we may ask for a minimum spanning forest Ð a set of edges that de®nes an MST for each connected component of G. 2In this chapter we often identify a set of edges T with a subgraph of (V,T). 186 Minimum Spanning Trees 11.2 The Jarn´ık-Prim Algorithm 187

small subtrees. This algorithm applies a generally useful data structure explained in T 0 = e0 T e forms another tree and since c(e0) c(e), T 0 must also form an Section 11.4: Maintain a partition of a set of elements. Operations are a to find out MST {of G} ∪. \ { } ≤ whether two elements are in the same subset and to join two subsets. Using the cut property, we easily obtain a greedy algorithm for finding a minimum Exercises spanning tree: Start with an empty set of edges T . While T is not a spanning tree, add an edge fulfilling the cut property. Exercise 11.1 Develop an efficient way to find minimum spanning forests using a There are many ways to implement this generic algorithm. In particular, we have single call of a minimum spanning tree routine. Do not find connected components the choice which S we want to take. We also have to find out how to find the small- first. Hint: insert n 1 additional edges. est edge in the cut efficiently. We discuss two approaches in detail in the following − sections and outline a third approach in Section 11.6. Exercise 11.2 Explain how to find minimum spanning sets of edges when zero and negative weights are allowed. Do these edge sets necessarily form trees? Exercises

Exercise 11.3 Explain how to reduce the problem of finding maximum weight span- Exercise 11.4 Show that the MST is uniquely defined if all edge weights are different. ning trees to the minimum spanning tree problem. Show that in this case the MST does not change if each edge weight is replaced by its rank among all edge weights. 11.1 Selecting and Discarding MST Edges 11.2 The Jarn´ık-Prim Algorithm All known algorithms for computing minimum spanning trees are based on the fol- [pictures in pseudocode. Pseudocode nur in implementation notes?] = lowing two complementary properties. ⇐

Lemma 11.1 (Cut3 Property:) Consider a proper subset S of V and an edge e ∈ Function jpMST(V, E, w) : Set of Edge (s,t) : (s,t) E,s S,t V S with minimal weight. Then there is an MST T of G dist= ∞,...,∞ : Array [1..n] // dist[v] is distance of v from the tree that{ contains∈e. ∈ ∈ \ } pred :hArray ofi Edge // pred[v] is shortest edge between S and v q : PriorityQueue of Node with dist[ ] as priority = [picture: S,V T,e,e0,T,T 0] · ⇒ − q.insert(s) for any s0 V Proof: Consider any MST T of G. Since T is a tree, T contains a unique edge 0 0 0 for i := 1 to n 1 do ∈ e T connecting a node from S with a node from V S. Furthermore, T e 0 0 0 0 s := q.deleteMin− () // new node for S defines∈ a spanning trees for S and V S and hence T =\(T e ) e defines\ { }a 0 0 dist[s] := 0 spanning tree. By our assumption, c(e\) c(e ) and therefore c\({T )} ∪c({T }). Since T ≤ 0 ≤ 0 0 foreach (s,v) E do = is an MST, we have c(T ) = c(T 0) and hence T is also an MST.[Bild. eleganter?] ∈ ⇒ if c((s,v)) < dist[v] then dist[v] := c((s,v)) pred[v] := (s,v) Lemma 11.2 (Cycle Property:) Consider any cycle C E and an edge e C with if v q then q.decreaseKey(v) else q.insert(v) ⊆ ∈ ∈ maximal weight. Then any MST of G = (V,E e ) is also an MST of G. return pred[v] : v V s0 0 \ { } { ∈ \ { }} Proof: Consider any MST T of G. Since trees contain no cycles, there must be some edge e C T. If e = e then T is also an MST of G and we are done. Otherwise, Figure 11.1: The Jarn´ık-Prim MST Algorithm. 0 ∈ \ 0 0 188 Minimum Spanning Trees 11.3 Kruskal’s Algorithm 189

The Jarn´ık-Prim (JP) Algorithm for MSTs is very similar to Dijkstra’s algorithm for shortest paths.4 Starting form an (arbitrary) source node s, the JP-algorithm grows Function kruskalMST(V, E, w) : Set of Edge a minimum spanning tree by adding one node after the other. The set S from the cut T := 0/ // subforest of the MST property is the set of nodes already added to the tree. This choice of S guarantees that foreach (u,v) E in ascending order of weight do ∈ the smallest edge leaving S is not in the tree yet. The main challenge is to find this if u and v are in different subtrees of T then edge efficiently. To this end, the algorithm maintains the shortest connection between T := T (u,v) // Join two subtrees ∪ { } any node v V S to S in a priority queue q. The smallest element in q gives the return T desired edge.∈ To\add a new node to S, we have to check its incident edges whether they give improved connections to nodes in V S. Figure 11.1 gives pseudocode for Figure 11.2: Kruskal’s MST Algorithm. = the JP-algorithm. [example. (JP plus Kruskal)\ ] [harmonize with description of =⇒ Dijkstra’s algorithm] Note that by setting the distance of nodes in S to zero, edges ⇒ connecting s with a node v S will be ignored as required by the cut property. This ∈ The pseudocode given in Figure 11.2 looks almost trivial. Each edge (u,w) is small trick saves a comparison in the innermost loop. considered once and it is immediately decided whether it is an MST edge. We will The only important difference to Dijkstra’s algorithm is that the priority queue see that this decision is quite easy because edges are scanned in order of increasing stores edge weights rather than path lengths. The analysis of Dijkstra’s algorithm weight. The set T of MST edges found so far forms a subforest of the MST, i.e., transfers to the JP-algorithm, i.e., using a Fibonacci heap priority queue, O(nlogn + m) a collection of subtrees. If u and w lie in the same subtree U T then (u,w) is a execution time can be achieved. heaviest edge on some cycle C U (u,w) . Hence, the cycle⊆property tells us that (u,w) can safely be ignored. Otherwise,⊆ ∪ { u and} w lie in different subtrees U and W. Exercises Consider the cut defined by U and V U. No edge lighter than (u,w) can connect U to something else because otherwise this\ edge would have been selected before. Hence, Exercise 11.5 Dijkstra’s algorithm for shortest paths can use monotonous priority the cut property ensures that (u,w) can be used for an MST. queues that are sometimes faster than general priority queues. Give an example to The most interesting algorithmic aspect of Kruskal’s algorithm is how to imple- show that monotonous priority queues do not suffice for the JP-algorithm. ment the test whether an edge bridges two subtrees in T . In the next section we will = [subsection of its own?] see that this can be implemented very efficiently so that the main cost factor is sorting ⇒ the edges. This takes time O(mlogm) if we use an efficient comparison based sorting **Exercise 11.6 (Average case analysis of the JP-algorithm) Assume the edge weights algorithm. The constant factor involved is rather small so that for m = O(n) we can 1,. . . ,m are randoly assigned to the edges of G. Show that the expected number of hope to do better than the O(m + nlogn) JP-algorithm. decreaseKey operations performed by the JP-algorithm is then bounded by O(??)[Referenz? = Bezug zu SSSP.]. 
⇒ Exercises Exercise 11.7 Explain how Kruskal fits into the framework of the generic greedy al- 11.3 Kruskal’s Algorithm gorithm based on the cut property, i.e., explain which set S must choosen in each it- eration of the generic algorithm to find the MST edges in the same order as Kruskal’s Although the JP-algorithm may be the best general purpose MST algorithm, we will algorithm. now present an alternative algorithm. Kruskal’s algorithm [60] does not need a full fledged graph representation but already works well if it is fed a list of edges. For Exercise 11.8 (Streaming MST) Suppose the edges of a graph are presented to you sparse graphs with m = O(n) it may be the fastest available algorithm. only once (for example over a network connection) and you do not have enough mem- 4Actually Dijkstra also describes this algorithm in his seminal 1959 paper on shortest paths [32]. Since ory to store all of them. The edges do not necessarily arrive in sorted order. Prim described the same algorithm two years earlier it is usually named after him. However, the algorithm actually goes back to a 1930 paper by JarnÂõk [46]. a) Outline an algorithm that nevertheless computes an MST using space O(V). 190 Minimum Spanning Trees 11.5 Implementation Notes 191

*b) Refine your algorithm to run in time O(mlogn). Hint: Use the dynamic tree of different partitions joins these two subsets. Figure 11.3 gives an efficient implemen- data structure by Sleator and Tarjan [89]. tation of this idea. The most important part of the data structure is the array parent. Leaders are their own parents. Following parent references leads to the leaders. The parent references of a subset form a rooted tree[where else], i.e., a tree with all edges = 11.4 The Union-Find Data Structure directed towards the root.5 Additionally, each root has a self-loop. Hence, find is easy ⇐ to implement by following the parent references until a self-loop is encountered. A partition of a set M into subsets M1, . . . , Mk has the property, that the subsets are Linking two leaders i and j is also easy to implement by promoting one of the disjoint and cover M, i.e., Mi M j = 0/ for i = j and M = M1 Mk. For example, in ∪ 6 ∪···∪ leaders to overall leader and making it the parent of the other. What we have said Kruskal’s algorithm the forest T partitions V into subtrees — including trivial subsets so far yields a correct but inefficient union-find data structure. The parent references of size one for isolated nodes. Kruskal’s algorithms performs two operations on the could form long chains that are traversed again and again during find operations. partition: Testing whether two elements are in the same subset (subtree) and joining Therefore, Figure 11.3 makes two optimizations. The link operation uses the array two subsets into one (inserting an edge into T). gen to limit the depth of the parent trees. Promotion in leadership is based on the seniority principle. The older generation is always promoted. We will see that this

measure alone limits the time for find to O(logn). The second optimization is path Class UnionFind(n : ) // Maintain a partition of 1..n compression. A long chain of parent references is never traversed twice. Rather, parent= 1,2,...,n : Array [1..n] of 1..n // bild mit self loops??? find redirects all nodes it traverses directly to the leader. We will see that these two gen= 0,h...,0 : Arrayi [1..n] of 0..logn // generation of leaders h i optimizations together make the union-find data structure “breath-takingly” efficient Function find(i : 1..n) : 1..n // picture ‘before’ — the amortized cost of any operation almost constant. if parent[i] = i then return i else i0 := find(parent[i]) Analysis parent[i] := i0 // path compression return i // picture ‘after’ [todo. trauen wir uns an Raimund’s neue analyse?] = 0 ⇐ Procedure link(i, j : 1..n) // picture ‘before’ assert i and j are leaders of different subsets Exercises if gen[i] < gen[ j] then parent[i] := j // balance else Exercise 11.9 Describe a nonrecursive implementation of find. parent[ j] := i if gen[i] = gen[ j] then gen[i]++ Exercise 11.10 Give an example for an n node graph with O(n) edges where a naive Procedure union(i, j : 1..n) implementation of the union-find data structure without balancing or path compression if find(i) = find( j) then link(find(i), find( j)) would lead to quadratic execution time for Kruskal’s algorithm. 6 [a separate section for improved Kruskal with filtering?] = ⇐ Figure 11.3: An efficient Union-Find data structure maintaining a partition of the set 1,...,n . { } 11.5 Implementation Notes The union-find data structure maintains a partition of the set 1..n and supports these two operations. Initially, each element is in its own subset. Each subset is as- Good minimum spanning tree algorithms are so fast that running time is usually signed a leader element[term OK? ‘representative’ (CLR) or ‘canonical element’ dominated by the time to generate the graphs and appropriate representations. If an = are such long words. too much confusion between leader and parent?]. The 5 ⇒ Note that this tree may have very different structure compared to the corresponding subtree in Kruskal's function find(i) finds the leader of the subset containing i; link(i, j) applied to leaders algorithm. 192 Minimum Spanning Trees 11.6 Further Findings 193 adjacency array representation of undirected graphs as described in Section 8.2 is time much less than for solving the all-pairs shortest path problem. used, then the JP-algorithm works well for all m and n in particular if pairing heaps A related and even more frequently used application is clustering based on the [73, 54, ?] are used for the priority queue. It might be good to store all the informa- MST [4, Application 13.5]: By dropping k 1 edges from the MST it can be split into − tion related to a node (dist, pred, priority queue entry, reference to adjacency array) k subtrees. Nodes in a subtree T 0 are far away from the other nodes in the sense that in a single record. Kruskal’s algorithm may be faster for sparse graphs, in particular, all paths to nodes in other subtrees use edges that are at least as heavy as the edges if only a list or array of edges is available or if we know how to sort the edges very used to cut T 0 out of the MST. efficiently. Many applications of MSTs define complete graphs with n(n 1)/2 edges using a compact implicit description of the graph. 
Then it is an important− concern whether one can rule out most of the edges as too heavy without actually looking at them. For 11.6 Further Findings example, if the nodes represent points in the plane and if edge weights are Euclidean distances, one can exploit the geometrical structure of the problem. It can be proven The oldest MST algorithm is also based on the cut property. Boruvka’s [16, 74] algo- that the MST or the complete geometric graph is contained in various well know O(n) rithm goes back to 1926 and hence represents one of the oldest graph algorithms. The subgraphs that have size O(n) and can be computed in time O(nlogn) (Delaunay algorithm finds many MST edges in each phase. In the first phase, every node finds triangulation, Gabriel traph [79], Yao graph [?]). its lightest incident edge. The next phases are reduced to this simple case by contract- Delaunay-Triangulation of the point set [79]. The Delaunay-Triangulation is an ing the MST edges already found. Contracting an edge (u,v) means to remove u and O(n) subset of the edges that can be found in time O(nlogn). Hence, MSTs of 2D v from V and to replace them by a new node u0. Edge (u,v) is removed. Edges of point sets with Euclidean distance function can be found in time O(nlogn). We will the form (u,w) and (v,w) are replaced by an edge (u0,w). If this procedure generates see another example for implicitly defined complete graphs below. several parallel edges between u0 and w, only the lightest one survives. Each phase of Although we introduced MSTs as a network design problem, most network design Boruvka’s algorithm can be implemented to run in time O(m). Since a phase at least problems of practical interest ask more general questions that are very hard to solve halves the number of remaining nodes, only a single node is left after (logn) phases. exactly. For example, a Minumum Weight Steiner Tree (MWST) T E connects a set O ⊆ By keeping track of the original terminals of edges we can output the MST as a side of terminals U V. The Steiner nodes V U need not be connected but can be used to ⊆ \ product. Boruvka’s algorithm is not used often because it is somewhat complicated reduce the overall weight. In our cabling example, the Steiner nodes could represent to implement. It is nevertheless important as a basis for parallel and external memory uninhabited islands that do not need a telephone but can host a switch. MSTs can MST algorithms. be used to approximate MWSTs. Consider the complete graph G0 = (U,E0) where There is a randomized linear time MST algorithm that uses phases of Boruvka’s c((s,t)) is the shortest path distance between s and t in G = (V,E). An MST in G0 algorithm to reduce the number of nodes [52, 56]. The second ingredient of this yields a Steiner tree spanning U in G by taking the edges in E from all paths used algorithm reduces the number of edges to O(√mn): sample O(√mn) edges randomly; to connect nodes in G0. This Steiner tree hast at most twice the weight of an optimal Steiner tree. Although G can have Ω n2 edges, its MST can be found efficiently in find an MST T 0 of the sample; remove edges e E that are the heaviest edge on a cycle 0 ( ) ∈ time nlogn m [65]. in e T 0. The last step is rather difficult to implement efficiently. But at least for rather O( + ) ∪ [overview paper of other generalizations] = dense graphs this approach can yield a practical improvement [54]. ⇐ The theoretically best known deterministic MST algorithm [78] has the interesting [TSP approximation? Held-Karp lower bound?] 
= [checkers for MST?] ⇐= property that it has optimal worst case complexity although it is not exactly known ⇐ what this complexity is. Hence, if you come tomorrow with a completely different deterministic MST algorithm and prove that your algorithm runs in linear time, then we know that the algorithm by Pettie and Ramachandran [78] also runs in linear time. Minimum spanning trees define a single path between any pair of nodes s and t. Interestingly, this path is a bottleneck shortest path [4, Application 13.3], i.e., it mini- mizes the maximum edge weight for all paths from s to t in the original graph. Hence, finding an MST amounts to solving the all-pairs bottleneck shortest path problem in 194 Minimum Spanning Trees Mehlhorn, Sanders June 11, 2005 195

Chapter 12

Generic Approaches to Optimization

A smuggler in the mountainous region of Profitania has n items in his cellar. If he

sells item i across the border, he makes a profit pi . However, the smuggler’s trade union only allows him to carry knapsacks with maximum∈ additional weight M . If ∈ item i has a weight of wi , what items should he pack into the knapsack to maximize the profit in his next trip?∈ This knapsack problem has many less romantic applications like [nachschauen] = [61, 55]. In this chapter we use it as a model problem to illustrate several generic ⇐ approaches to solve optimization problems. These approaches are quite flexible and can be adapted to complicated situations that are ubiquitous in practical applications. In the previous chapters we looked at specific very efficient solutions for frequently occurring simple problems like finding shortest paths or minimum spanning trees. Now we look at generic solutions that work for a much larger range of applications but may be less efficient. More formally, an optimization problem can be described by a set U of potential

solutions, a set L of feasible solutions, and an objective function f : L . In a → maximization problem, we are looking for a feasible solution x L that maximizes ∗ ∈ f (L∗) among all feasible solutions. In a minimization problem, we look for a solu- tion minimizing f . For example, the knapsack problem is a maximization problem n ∑n ∑n with U = 0,1 , L = x = (x1,...,xn) U : i=1 xiwi M , and f (x) = i=1 xi pi. Note that the{ distinction} {between minimization∈ and maximization≤ } problems could be avoided because setting f := f converts a maximization problem into a minimiza- tion problem and vice versa. W−e will use maximization as our default simple because 196 Generic Approaches to Optimization 12.1 Linear Programming — A Black Box Solver 197

our mode problem is more naturally viewed a maximization problem.1 with f (x) = c x where c is called the cost vector and where “ ” stands for the scalar · · We present six basic approaches to optimization roughly ordered by conceptual product of two n-vectors. Constraints have the form ai x ./i bi where ./i , ,= n · ∈ {≤ ≥ } simplicity. Perhaps the easiest solution is to use an existing black box solver that and ai for i 1..m. We have can be applied to many problems so that the only remaining task is to formulate the ∈ ∈ n problem in a language understood by the black box solver. Section 12.1 introduces L = x : i 1..m : xi 0 ai x ./i bi . { ∈ ∀ ∈ ≥ ∧ · } this approach using linear programming as an example. The greedy approach that we have already seen in Section ?? is reviewed in Section 12.2. The dynamic program- Let ai j denote the j-th component of vector ai. ming approach discussed in Section 12.3 is a more flexible way to construct solutions. Figure 12.1 gives a simple example for a linear program. We have n 2 variables We can also systematically explore the entire set of potential solutions as described in = (x and y) and m 3 constraints. The cost vector is c 1 4 . The optimal solution is Section 12.4. Finally we discuss two very flexible approaches to explore only a subset = = ( , ) x y 2 6 . of the solution space. Local search discussed in Section 12.5 modifies a single solu- ( , ) = ( , ) tion until it is satisfied whereas the evolutionary algorithms explained in Section ?? simulate a population of solution candidates. y feasible solutions

Exercises y<=6 Exercise 12.1 (Optimizing versus Deciding.) Suppose you have a routine P that out- puts a feasible solution with f (x) a if such a solution exists and signals failure ≤ x+y<=8 otherwise. 2x−y<=8 a) Assume that objective function values are positive integers. Explain how to find an minimal solution x∗ using O(log f (x∗)) calls of P . better *b) Assume that objective function values are reals larger than one. Explain how to solutions find a solution that is within a factor (1+ε) from minimal using O(loglog f (x∗) + log(1/ε)) calls of P . Hint: use the geometric mean √a b. · x 12.1 Linear Programming — A Black Box Solver

The easiest way to solve an optimization problem is to write down a specification of Figure 12.1: A simple two-dimensional linear program with three constraints. the solution space L and the objective function f and then use an existing software package to find an optimal solution. The question is of course for what kinds of Here is a classical application of linear programming: A farmer wants to mix problem specifications such general solvers are available. Here we introduce a black food for his cows. There are n different kinds food on the market, say, corn, soya, box solver for a particularly large class of problems. fish meal,. . . . One kilogram of kind j costs c j Euro. There are m requirements for = [what about constraint solvers as another black box?] a healthy nutrition, e.g., the cows should get enough calories, proteins, Vitamin C, ⇒ . . . One kilogram of kind i contains ai j percent of the daily requirement of an animal Definition 12.1 A Linear Program (LP)2 with n variables and m constraints is spec- with respect to requirement i. Then a solution of the corresponding linear program ified by the following maximization problem: A linear objective function f : n gives a cost optimal and (reasonably) healthy diet. → [max flow?] = 1Be aware that most of the literature uses minimization as its default. ⇐ 2The term ªlinear programº stems from the 1940s [?] and has nothing to do with with the modern Linear programming is so important because it is one of the most general problem meaning ªcomputer programº. formulations for which efficient solution algorithms are known. In particular: 198 Generic Approaches to Optimization 12.2 Greedy Algorithms — Never Look Back 199

∑ j Theorem 12.2 A linear program can be solved in polynomial time. Find the smallest index j such that i=1 wi > M (if there is no such index, we can take all knapsack items). Now set The worst case execution time of these polynomial algorithms can still be rather high. j 1 However, most linear programs can be solved relatively quickly by several procedures. − One, the simplex algorithm, is briefly outlined in Section 12.5.1. For now, the only x1 = = x j 1 = 1,x j = M ∑ wi, and x j+1 = = xn = 0 . ··· − − ··· thing we need to know is that there are efficient packages that we can use to solve i=1 linear programs. Infact, very few people in the world know every detail of efficient LP Figure ?? gives an example. For the knapsack problem we only have to deal with solvers. a single fractional variable x j. Rounding x j to zero yields a feasible solution that is ∑ j 1 quite good if p j i=−1 pi. This fractional solution is the starting point for many good 12.1.1 Integer Linear Programming algorithms for theknapsack problem. Very often one would like to solve linear programs where the variables are only al- lowed to take integer values. Such problems are called Integer Linear Programs (ILP). Exercises (Problems where only some variables must be integer are called Mixed Integer Lin- Exercise 12.2 (More Animal Food.) How do you model the farmer’s own supplies ear Programs (MILP).) For example, an instance of the knapsack problem is the 0-1 like hay which are cheap but limited? Also explain how to additionally specify upper integer linear program bounds like “no more than 0.1mg Cadmium contamination per cow and day”. maximizep x · [min cost flow?, multi-commodity flow?] = subject to ⇐ w x M,xi 0,1 for i 1..n . · ≤ ∈ { } ∈ Exercise 12.3 Explain how to replace any ILP by an ILP that uses only 0/1 variables. In a sense, the knapsack problem is the simplest 0-1 ILP since it has only a single If U is an upper bound on the bi values, your 0/1-ILP should be at most a factor additional constraint. Unfortunately, solving ILPs and MILPs is NP-hard[where ex- O(logU) larger than the original ILP. = plained?]. ⇒ Exercise 12.4 Formulate the following set covering problem as an ILP: Given a set Nevertheless, ILPs can often be solved in practice using linear programming pack- n ages. In Section 12.4 we will outline how this is done. Even if we cannot solve an M = 1..m, n subsets Mi M for i 1..n and a cost ci for set Mi. Assume i=1 Mi = ⊆ ∈ ∑ S ILP exactly, linear programming can help us to find approximate solutions. In a linear 1..m. Select F 1..n such that i F Mi = 1..m and i F ci is minimized. ⊆ ∈ ∈ relaxation of an ILP, we simply omit the integrality constraints and obtain an ordinary S LP that can be solved efficiently. For example in the knapsack problem we would Exercise 12.5 (Linear time fractional knapsacks.) Explain how to solve the frac- tional knapsack problem in linear expected time. Hint: use a similar idea as in Sec- replace the constraint xi 0,1 by the constraint xi [0,1]. The resulting fractional solutions of the linear relaxation∈ { } often has only few∈variables set to fractional (i.e., tion 5.5. noninteger) values. By appropriate rounding of fractional variables to integer values, we can often obtain good integer feasible solutions. 12.2 Greedy Algorithms — Never Look Back For example, the linear relaxation of the knapsack problem asks for a maximum profit solution if the items can be arbitrarily cut. 
In our smuggling “application” this We have already seen greedy optimization algorithms for shortest paths in Section 10 fractional knapsack problem would even make sense if the smugglers are dealing with and for minimum spanning trees in Chapter 11. There we were “lucky” in the sense powdery substances. In Exercise ?? we ask you to show that the following O(nlogn) that we got optimal solutions by carefully choosing edges of a graph. Usually, greedy greedy algorithm finds the optimal solution: algorithms only yield suboptimal solutions. For example, inspired by the fractional Renumber (sort) the items by profit density such that solution from Section 12.1.1, we could solve the knapsack problem by greedily includ- p1 p2 pn ing items of largest profit density that do not exceed its capacity. The example from . Figure ?? shows that this does not lead to optimal solutions. Nevertheless, greedy w1 ≥ w2 ≥ ··· ≥ wn 200 Generic Approaches to Optimization 12.3 Dynamic Programming — Building it Piece by Piece 201 algorithms are often the best choice for getting a reasonable solution quickly. Let us When an approximation algorithm for a minimization problem guarantees solu- look at another typical example. tions that are at most a factor a larger than an optimal solution, we say that the al- Suppose you have m identical machines that can be used to process n weighted gorithm achieves approximation ratio a. Hence, we have shown that the shortest jobs of size t1,...,t j. We are looking for a mapping x : 1..n 1..m from jobs to queue algorithm achieves an approximation ratio of 2 1/m. This is tight because m → − machines such that the makespan L = max ∑ t : t is minimized. This for n = m(m 1) + 1, tn = m, and ti = 1 for i < n, the optimal solution has makespan max j=1 i x(i)= j i − is one of the simplest among an important class of optimization problems known as Lmax = m whereas the shortest queue algorithm produces a solution with makespan  scheduling problems. Lmax = 2m 1. Figure ?? gives an example for m = 4. − We give a simple greedy algorithm for scheduling independent weighted jobs Similarly, when an online algorithm for a minimization problem guarantees solu- on identical machines that has the advantage that we do not need to know the job tions that are at most a factor a larger than solutions produced by an algorithm that sizes in advance. We assign jobs in the order they arrive. Algorithms with this knows the entire input beforehand, we say that the algorithm has competetive ratio a, property are known as online algorithms. When job i arrives we inspect the loads i.e. the shortest queue algorithm has competitive ratio 2 1/m. − ` j = ∑ ti : i < j,x(i) = j and assign the new job to most the lightly loaded machine, { m } i.e., x(i):= min j=1 ` j. This shortest queue algorithm does not guarantee optimal solu- tions but at least we can give the following performance guarantee: Exercises

Exercise 12.6 Prove Corollary ??. Hint: distinguish the cases tÃõ ∑i ti/m and tÃõ> n ≤ 1 m 1 n ∑i ti/m. Theorem 12.3 The shortest queue algorithm ensures that Lmax ∑ ti + − maxti. i 1 ≤ m i=1 m = *Exercise 12.7 Show that the shortest queue algorithm achieves approximation ratio Proof: We focus on the job ıˆ that is the last job being assigned to the machine with 4/3 if the jobs are sorted by decreasing size. maximum load. When job ıˆ is scheduled, all m machines have load at least Lmax tÃ,õ i.e., − *Exercise 12.8 (Bin packing.) Suppose a smuggler boss has perishable goods in her ∑ti (Lmax tÃ)õ m . i=Ãõ ≥ − · cellar. She has to hire enough porters to ship all items this night. Develop a greedy 6 algorithm that tries to minimize the number of people she needs to hire assuming that Solving this for Lmax yields they can all carry maximum weight M. Try to show an approximation ratio for your

n bin packing algorithm. 1 1 m 1 1 m 1 n Lmax ∑ti +tÃõ= ∑ti + − tÃõ ∑ ti + − maxti . i=1 ≤ m i=Ãõ m i m ≤ m i=1 m 6 12.3 Dynamic Programming — Building it Piece by Piece

In particular, if there are many rather small jobs, we get a solution that is very For many optimization problems the following principle of optimality holds: An op- ∑ close to the obvious lower bound of i ti/m on Lmax. On the other hand, if there is timal solution can be viewed as constructed from optimal solutions of subproblems. ∑ a job that is larger than the average than i ti/m, this is also a lower bound and the Furthermore, for a given subproblem size it does not matter which optimal solution is shortest queue algorithm will not be too far away from it either. In Exercise 12.6 we used. ask to show that we get the following guarantee. The idea behind dynamic programming is to build an exhaustive table of optimal solutions starting with very small subproblems. Then we build new tables of optimal Corollary 12.4 The shortest queue algorithm computes a solution where Lmax is at solutions for increasingly larger problems by constructing them from the tabulated most a factor 2 1/m larger than for the optimal solution. solutions of smaller problems. − 202 Generic Approaches to Optimization 12.3 Dynamic Programming — Building it Piece by Piece 203

Table 12.1: A dynamic programming table for the knapsack problem “maximize Function dpKnapsack (10,20,15,20) x subject to (1,3,2,4) x 5”. Table entries have the form “P(i,C), 0/ · · ≤ L= : L // candidate solutions. Initially the trivial empty solution (xi)”. Bold face entries contribute to the optimal solution. forhi :=i 1 to n do invariant L is sorted by total weight w(x) = ∑ x w i C 0 1 2 3 4 5 k p then // p(x) = ∑k P(i 1,C) and P(i,C) > P(i 1,C ci)+ pi). We can distinguish two cases: − − − Algorithm dpKnapsack needs O(nM) worst case time. This is quite good if M Case xi = 0: The solution x does not use item i and hence it is also a solution of is not too large. Since the running time is polynomial in n and M, dpKnapsack is a the problem using item 1 through i 1 only. Hence, by definition of P, P(i,C) pseudopolynomial algorithm. The “Pseudo” means that this is not necessarily polyno- P(i 1,C) contradicting the assumption− P(i,C) > P(i 1,C). ≤ mial in the input size measured in bits — we can encode an exponentially large M in − − a polynomial number of bits. [say sth about average case complexity] = Case xi = 1: If we remove item i from the solution we get a feasible solution of a ⇐ knapsack problem using items 1 through i 1 and capacity at most C ci. By our − − assumption we get a profit larger than P(i 1,C ci)+ pi pi = P(i 1,C ci). This Exercises contradicts our definition of P. − − − − − Exercise 12.9 (Making Change.) Suppose you have to program a vending machine that should give exact change using a minimum number of coins. Using Lemma 12.5 we can compute optimal solutions by filling a table top-down and left to right. Table 12.1 shows an example computation. a) Develop an optimal greedy algorithm that works in the Euro zone with coins Figure ?? gives a more clever list based implementation of the same basic idea. worth 1, 2, 5, 10, 20, 50, 100, and 200 cents and in the Dollar zone with coins Instead of computing the maximum profit using items 1..i and all capacities 1..C, it worth 1, 5, 10, 25, 50, and 100 cents. 204 Generic Approaches to Optimization 12.4 Systematic Search — If in Doubt, Use Brute Force 205

b) Show that this algorithm would not be optimal if there were a 4 cent coin. 12.4.1 Branch-and-Bound c) Develop a dynamic programming algorithm that gives optimal change for any currency system. Function bbKnapsack( p1,..., pn , w1,...,wn ,M) : L h i h i assert p1/w1 p2/w2 pn/wn // assume input is sorted by profit density Exercise 12.10 (Chained Matrix Multiplication.) We want to compute the matrix ≥ ≥ ··· ≥ xˆ=heuristicKnapsack( p1,..., pn , w1,...,wn ,M) : L // best solution so far product M1M2 Mn where Mi is a ki 1 ki matrix. Assume that a pairwise matrix h i h i product is computed· in the straight forw− ×ard way using mks element multiplications x : L // current partial solution for the product of an m k matrix with an k s matrix. Exploit the associativity of recurse(1,M,0) the matrix product to minimize× the number of× element multiplications needed. Use // Find solutions assuming x1,...,xi 1 are fixed, M0 = M ∑ xiwi, P = ∑ xi pi. 3 dynamic programming to find an optimal evaluation order in time O n . − − k ∑ xˆ p then Exercise 12.11 (Minimum edit distance.) Use dynamic programming to find the min- i n i n 0 i i h i h i i=1 imum edit distance between two strings s and t. The minimum edit distance is the if i > n then xˆ := x minimum number of character deletions, insertions, and replacements applied to s else // Branch on variable xi if wi M then xi:= 1; recurse(i + 1,M wi,P + pi) that produces string t. ≤ 0 − xi:= 0; recurse(i + 1,M0,P) Exercise 12.12 Does the principle of optimality hold for minimum spanning trees? Check the following three possibilities for definitions of subproblems: Subsets of Figure 12.3: A branch-and-Bound algorithm for the knapsack problem. Func- nodes, arbitrary subsets of edges, and prefixes of the sorted sequence of edges. tion heuristicKnapsack constructs a feasible solution using some heuristic algorithm. Function upperBound computes an upper bound for the possible profit. Exercise 12.13 (Constrained shortest path.) Consider a graph with G = (V,E) where edges e E have a length `(e) and a cost c(e). We want to find a path from node s to node t that∈ minimizes the total cost of the path subject to the constraint that the total Figure 12.3 gives pseudocode for a systematic search routine for the knapsack length of the path is at most L. Show that subpaths [s ,t ] of optimal solutions are not problem. The algorithm follows a pattern known as branch-and-bound. The “branch” 0 0 is the most fundamental ingredient of systematic search routines. Branching tries all necessarily optimal paths from s0 to t0. sensible setting of a part of the result — here the values 0 and 1 for xi — and solves the Exercise 12.14 Implement a table based dynamic programming algorithm for the resulting subproblems recursively. [Bild mit Beispielsuchbaum] Algorithms based = on branching systematically explore the resulting tree of subproblems. Branching ⇐ knapsack problem that needs nM + O(M) bits of space. decisions are internal nodes of this search tree. Exercise 12.15 Implement algorithm dpKnapsack from Figure 12.2 so that all re- “Bounding” is a more specific method to prune subtrees that cannot contain promis- quired operations on solutions (w(x), p(x), ) work in constant time. ing solutions. A branch-and-bound algorithm keeps an incumbent xˆ for the best so- ∪ lution. Initially, the incumbent is found using a heuristic routine. 
In the knapsack example we could use a greedy heuristic that scans the items by decreasing profit den- 12.4 Systematic Search — If in Doubt, Use Brute Force sity and includes items as capacity permits. Later xˆ contains the best solution found at any leaf of the search tree. This lower bound on the best solutions is complemented In many optimization problems, the universe of possible solutions L is finite so that we by an upper bound that can be computed quickly. In our example the upper bound can in principle solve the optimization problem by trying all possibilities. If we apply could be the profit for the fractional knapsack problem with items i..n and capacity this idea naively, there are only few cases where we can get away with it. But often, a M ∑ j

per bound for a solution obtainable from x is no better than the lower bound already 12.4.3 Shortest Paths Reconsidered known. [mit sssp chapter abgleichen] = ⇐ [Beispiel bei dem klar wird, dass es sich um einen Graphen handeln kann?] = ⇐ 12.4.2 Integer Linear Programming Exercises Integer linear programs are usually solved by an intimate connection between a solver for linear programs and branch-and-bound. We use the case of 0-1 ILPs to explain the Exercise 12.16 (Logarithmic time upper bounds for the knapsack problem.) Explain ingredients of branch-and-bound in more detail. how to implement the function upperBound in Figure 12.3 so that it runs in time O(logn). Hint: precompute prefix sums ∑k i wi and ∑k

The optimization algorithms we have seen so far are only applicable in special in L. A polytope is the n-dimensional generalization of a polygon. The border of circumstances. Dynamic programming needs a special structure of the problem and the polytope is defined by areas where one or several of the constraints are satisfied may require a lot of space and time. Systematic search is usually too slow for large with equality. Corner points of the polytope are points in L where n constraints are inputs. Greedy algorithms are fast but often do not give very good solutions. Local satisfied with equality. A nice thing about linear programs is that there must be a search can be viewed as a generalization of greedy algorithms. We still solve the corner point which maximizes f (if there is a feasible solution at all). The simplex problem incrementally, but we are allowed to change the solution as often as we want algorithm exploits this by constraining its search to the discrete set of corner points of possibly reconsidering earlier decisions. L. Corner points are neighbors if the set of n constraints fulfilled with equality differ Figure 12.4 gives the basic framework that we will later refine. Local search main- by one constraint. Unless an optimal corner point has already been reached, there tains a current feasible solution x and the best solution xˆ seen so far. We will see in is always a neighboring corner point that decreases f . Moving between neighboring Section 12.5.2 that the restriction to work with feasible solutions only can be circum- points can be performed using relatively simple methods from linear algebra. vented. The idea behind local search is to incrementally modify the current solution x. The main application specific design choice for a local search algorithm is to define 1 sin(x)/x how a solution can be modified. The neighborhood N (x) formalizes this concept. The 0.8 second important design decision is which element from the neighborhood is chosen. 0.6 Finally, some heuristic decides when to stop searching. 0.4 12.5.1 Hill Climbing 0.2 0 = [golden ration search? exercise?] -0.2 ⇒ The most straightforward choice for a local search algorithm is to allow only new -0.4 solutions which improve on the best solution found so far. This approach is known as -50 0 50 hill climbing. In this case, the local search algorithm gets quite simple. The variables xˆ and x are the same and we stop when no improved solutions are in the neighbor- hood N . The only nontrivial aspect of hill climbing is the choice of the neighbor- Figure 12.6: A function with many local optima. 2 hood. For example, consider the case L = 0,...,k . A natural choice seems to be { } N ((x,y)) = (x + 1,y),(x 1,y),(x,y 1),(x,y + 1) . However, this neighborhood { − − } Unfortunately, there are many optimization problems with less favorable behav- fails even for the simple function f (x,y) = 2y x for x > y and f (x,y) = 2x y for ior than linear programs. The main problem is that hill climbing may get stuck in = x y. Figure 12.5[neu machen] depicts this function.− Once x has climbed the−ridge ⇒ ≤ local optima — feasible solutions where no solution in the neighborhood yields an at at x = y none of the coordinate directions leads to an increase of f . improvement. This effect can even be seen in a one-dimensional example such as the one shown in Figure 12.6. The only cure for local optima in hill climbing is giv- todo: a picture for the function 2y x for x > y, 2x y else. 
− − ing the neighborhood the right shape to eradicate local optima. However, in general this may involve impractically large neighborhoods. For example, consider the prob- Figure 12.5: An example where the orthogonal neighborhood does not find the global lem of finding he highest mountain in the world. If you do not start very near to the optimum. Mount Everest, even a viewing range of 500 km will not give you any hints about its whereabout. An interesting example of hill climbing with a clever choice of the neighborhood function are algorithms for linear programming (see Section 12.1). We outline how the 12.5.2 Simulated Annealing — Learning from Nature most widely used linear programming algorithm, the simplex algorithm, works. The set of feasible solutions L of a linear program is a convex polytope. Convex means If we want to ban the bane of local optima in local search, we must find a way to that if you connect any two points in L by a line then all points on that line are also escape from them. This means that we sometimes have to accept moves to feasible 210 Generic Approaches to Optimization 12.5 Local Search — Think Globally, Act Locally 211

solutions that make f (x) worse. This implies a dilemma. We must be able to make pick x from N (x) L uniformly at random 0 ∩ many downhill steps to escape from wide local optima and at the same time we must f (x) f (x0) with probability min(1,exp( −T ) do x := x0 be sufficiently goal directed to find a global optimum at the end of a long narrow ridge. The above theoretical considerations give some intuition why simulated annealing might work, but they do not provide an implementable algorithm. We have to get rid shock cool liquid anneal of two infinities: For every temperature, wait infinitely long to get equilibrium and do that for infinitely many temperatures. Simulated annealing algorithms therefore have to decide on a cooling schedule, i.e., how the temperature T should be varied over glass crystal time. A simple schedule chooses a starting temperature T0 that is supposed to be just large enough so that all neighbors are accepted. Furthermore, for a given problem instance there is a fixed number of iterations N used for each temperature. The idea Figure 12.7: Annealing versus Shock Cooling. is that N should be as small as possible but still allow the system to get close to equilibrium. After every N iterations, T is decreased by multiplying it with a constant α < 1. Typically, α is between 0.8 and 0.99. When T has become so small that it One approach is inspired by behavior of physical systems. For example, consider a is comparable to the smallest possible difference between two feasible solutions, T is pot of molten quartz (SiO2). If we cool it very quickly, we get glass — an amorphous finally set to 0, i.e, the annealing process concludes with a hill climbing search. substance where every molecule is in a local minimum of energy. This process of Better performance can be obtained with dynamic schedules. For example, the ini- = shock cooling is very similar to [translate min-max?]hill climbing. Every molecule tial temperature can be determined by starting with a low temperature and increasing it ⇒ simply drops into a state of locally minimal energy. But a much lower state of energy is quickly until the fraction of accepted transitions approaches one. Dynamic schedules reached by a quartz crystal where all the molecules are arranged in a regular way. This base their decision how much T should be lowered on the actually observed variation state can be reached (or approximated) by cooling the quartz very slowly. This process in f (x). If the temperature change is tiny compared to the variation, it has too little is called annealing. Figure 12.7 depicts this process. How can it be that molecules effect. If the change is too close or even larger than the variation observed, there is arrange into a perfect shape over a distance of billions of molecule diameters although a danger that the system is prematurely forced into a local optimum. The number of they feel only very local forces? steps to be made until the temperature is lowered can be made dependent on the actual Qualitatively, the explanation is that local energy minima have enough time to number of moves made. Furthermore, one can use a simplified statistical model of the dissolve in favor of globally more efficient structures. For example, assume that a process to estimate when the system approaches equilibrium. The details of dynamic cluster of a dozen molecules approaches a small perfect crystal that already consists schedules are beyond the scope of this exposition. of thousands of molecules. 
Then with enough time the cluster will dissolve and its Finally let us look at a concrete example. molecules can attach to the crystal. More formally, within a reasonable model of the system one can show that if cooling is sufficiently slow, the system reaches thermal equilibrium at every temperature. Equilibrium at temperature T means that a state Graph Coloring Using Simulated Annealing exp (Ex/T ) x of the system with energy Ex is assumed with probability − where ∑y L exp (Ey/T ) T is the temperature of the system in Kelvin multiplied by the Boltzmann∈ − constant To make a meaningful study of simulated annealing, we employ a more difficult prob- 23 kB 1.4 10− J/K. This energy distribution is called Boltzmann distribution. One lem than the knapsack problem: For an undirected graph G = (V,E) find a node col- can≈see that· as T 0 the probability of states with minimal energy approaches one. oring c : V 1..k such that no two adjacent nodes get the same color, i.e., u,v → → ∀{ } ∈ The same mathematics works for an abstract system corresponding to an opti- E : c(u) = c(v). We want to minimize k. 6 mization problem. We identify the cost function f with the energy of the system and We will present two approaches to find good approximations to the graph coloring a feasible solution with the state of the system. One can show that the system ap- approach using simulated annealing. One is straightforward but requires a generally proaches a Boltzmann distribution for a quite general class of neighborhoods and the useful technique to make it work. The other requires some problem specific insight following rules for choosing the next state: but often yields better solutions. For more details refer to [49]. 212 Generic Approaches to Optimization 12.5 Local Search — Think Globally, Act Locally 213

The Penalty Function Approach For random graphs with 1000 nodes and edge probability 0.5, Kempe chain an- nealing produced very good colorings given enough time. However, a sophisticated A generally useful idea for local search is to relax some of the constraints on feasible and expensive greedy algorithm, XRLF, produces even better solutions in less time. solutions to make the search more flexible and to be able to find a starting solution. In Penalty function annealing performs rather poorly. For very dense random graphs order to get feasible solutions in the end, the objective function is modified to penalize with p = 0.9, Kempe chain annealing overtakes XRLF. infeasible solutions. The constraints are effectively moved into the objective function. For sparser random graphs with edge probability 0.1, penalty function annealing We explain this idea using graph coloring as an example. Solutions x may now overtakes Kempe chain annealing and can sometimes compete with XRLF. denote some arbitrary coloring of the nodes even if there are adjacent nodes with the Another interesting class of random inputs are random geometric graphs: Asso- same color. An initial solution is generated by guessing a number of colors needed ciate the nodes of a graph with random uniformly distributed positions in the unit and coloring the nodes randomly. square [0,1] [0,1]. Add an edge (u,v) whenever the Euclidean distance of u and v A neighbor is generated by picking a random color j and a random node v with is less than some× range r. Such instances might be a good model for an application this color x(v) = j. Then, a random new color for node v is chosen among all the where nodes are radio transmitters, colors are frequency bands, and edges indicate colors already in use plus one new color. Let Ci = v V : x(v) = i denote the set { ∈ } possible interference between neighboring senders that use the same frequency. For of nodes with color i and let Ei = (u,v) E Ci Ci denote the set of edges whose { ∈ ∪ × } this model, Kempe chain annealing is outclassed by a third annealing strategy not incident nodes are illegally both colored with color i. The objective is to minimize described here. 2 Interestingly, the following simple greedy heuristics is quite competitive: f (x) = 2∑ Ci Ei ∑ Ci . i | | · | | − i | | Given a graph G = (V,E), we keep a subset V 0 of nodes already colored. Ini- • 0/ The first term penalizes illegal edges and the second favors large color classes. Exer- tially, V 0 = . cise 12.18 asks you to show that this cost function ensures feasible colorings at local In every step we pick a node v V V 0 that maximizes (u,v) E : u V 0 . optima. Hence, simulated annealing is guaranteed to find a feasible solution even if it • Node v is then colored with the∈smallest\ legal color. |{ ∈ ∈ }| = starts with an illegal coloring. [Bild!] ⇒ To obtain better colorings than using a single run, one simple takes the best coloring The Kempe Chain Approach produced by repeated calls of the heuristics using a random way to break ties when selecting a node to be colored. Now solutions are only legal colorings. The objective function simplifies to f (x) = ∑ 2 i Ci . To find a candidate for a new solution, randomly choose two colors i and Exercises −j and| a |node v with color x(v) = i. Consider the maximal connected component K of G containing v and nodes with colors i and j. Such a component is called a Kempe Exercise 12.18 Show that the objective function for graph coloring given in Sec- Chain. 
Now exchange colors i and j in all nodes contained in K. If we start with a tion 12.5.2 has the property that any local optimum is a correct coloring. Hint: What = legal coloring, the result will be a legal coloring again.[Bild!] happens with f (x) if one end of an illegally colored edge is recolored with a fresh ⇒ color? Prove that the cost function of the penalty function approach does not nec- essarily have its global optimum at a solution that minimizes the number of colors Experimental Results used. Johnson et al. [49] have made a detailed study of algorithms for graph coloring with particular emphasis on simulated annealing. The results depend a lot on the structure 12.5.3 More on Local Search of the graph. Many of the experiments use random graphs. The usual model for an undirected random graph picks each possible edge u,v with probability p. The edge The correspondence between principles observed in nature and optimization algo- probability p can be used to control the the expected{ number} pn(n 1)/2 of edges in rithms like simulated annealing is quite appealing. However, when we have to solve a the graph. − concrete problem, we are interested in the most effective method available even if we 214 Generic Approaches to Optimization 12.6 Evolutionary Algorithms 215 have to break an analogy. Here we want to summarize a number of refinements that solutions at once. can be used to modify or replace the approaches chosen so far. The individuals in a population produce offspring. Because there is only a limited amount of resources, only the individuals best adapted to the environment survive. For Threshold Acceptance: There seems to be nothing magic about the particular form optimization this means that feasible solutions are evaluated using the cost function f of the acceptance rule of simulated annealing. For example, a simpler yet also suc- and only solutions with lower value are likely to be kept in the population. cessful rule uses the parameter T as a threshold. New states with a value f (x) below the threshold are accepted others are not. Even in bacteria which reproduce by cell division, no offspring is identical to its parent. The reason is mutation. While a genome is copied, small errors happen. Al- Tabu Lists: Local search algorithms like simulated annealing sometimes tend to though mutations usually have an adverse effect on fitness, some also improve fitness. return to the same suboptimal solution again and again — they cycle. For example, The survival of the fittest means that those individuals with useful mutations will pro- simulated annealing might have reached the top of a steep hill. Randomization will duce more offspring. On the long run, the average fitness of the population increases. steer the search away from the optimum but the state may remain on the hill for a long An optimization algorithm based on mutation produces new feasible solutions by se- time. Tabu search steers away from local optima by keeping a tabu list of “solution lecting a solution x with large fitness f (x), copies it and applies a (more or less ran- elements” that should be “avoided” in new solutions for the time being. For example, dom) mutation operator to it. To keep the population size constant, a solution with in graph coloring a search step could change one color of a node v from c to c0 and small fitness f is removed from the population. 
Such an optimization algorithm can then store the tuple (v,c) in the tabu list to indicate that v should not be colored with be characterized as many parallel local searches. These local searches are indirectly color c again as long as (v,c) is in the tabu list. Usually, this tabu condition is not coupled by survival of the fittest. applied if the solution gets improved by coloring node v with color c. Tabu lists are In natural evolution, an even more important ingredient is mating. Offspring is so successful that they can be used as the core technique of an independent variant of produced by combining the genetic information of two individuals. The importance of local search called tabu search. mating is easy to understand if one considers how rare useful mutations are. Therefore Restarts: The typical behavior of a well tuned local search algorithm is that it moves it takes much longer to get an individual with two new and useful mutations than it to an area in L with good solutions and explores this area trying to find better and takes to combine two individuals with two different useful mutations. better local optima. However, it might be that there are other, far away areas with much We now have all the ingredients needed for an evolutionary algorithm. There are better solutions. In this situation it is a good idea to run the algorithm multiple times many ideas to brew an optimization algorithm from these ingredients. Figure 12.8 with different random starting solutions because it is likely that different starting point presents just one possible framework. The algorithm starts by creating an initial will lead to different areas of good solutions. Even if these restarts do not improve the population. Besides the population size N, it must be decided how to build the initial average performance of our algorithm it may make it more robust in the sense that it is individuals. This process should involve randomness but it might also be useful to use less likely to produce grossly suboptimal solutions. Several independent runs are also heuristics for constructing some reasonable solutions from the beginning. an easy source of parallelism. Just run the program on different workstations at once. To put selection pressure on the population, it is important to base reproduction success on the fitness on the individuals. However, usually it is not desirable to draw a hard line and only use the fittest individuals because this might lead to a too uniform 12.6 Evolutionary Algorithms population and incest. Instead, one draws reproduction partners randomly and only biases the selection by choosing a higher selection probability for fitter individuals. Perhaps the most ingenious solutions to real world problems are the adaptation of liv- An important design decision is how to fix these probabilities. One choice might be ing beings to their circumstances of living. The theory of evolution tells us that the to sort the individuals by fitness and then to define p(xi) as some decreasing function mechanisms obtaining these solutions is mutation, mating, and survival of the fittest. of the rank of xi in the sorted order. This indirect approach has the advantage that it is Let us translate this approach into our abstract framework for optimization problems. 
independent on the shape of f and that it is equally distinctive in the beginning when A genome describing an individual corresponds to the description of a feasible so- fitness differences are large as in the end when fitness differences are usually small. lution x. We can also associate infeasible solutions with dead or ill individuals. In The most critical operation is mate(xi,x j) which produces two new offspring from nature, it is important that there is sufficiently large population of living individuals. two ancestors. The “canonical” mating operation is called crossover: Individuals are So, instead of one solution as in local search, we are now working with many feasible assumed to be represented by a string of k bits in such a way that every k-bit string 216 Generic Approaches to Optimization 12.7 Implementation Notes 217

12.7 Implementation Notes Create an initial population pop = x1,...,xN while not finished do { } We have seen several generic approaches to optimization that are applicable to a wide compute probabilities p(x1),..., p(xN) based on the fitness values variety of problems. When you face a new application you are therefore likely to randomly select N/2 pairs of individuals have the choice between more approaches than you can realistically implement. In a pop0 = 0/ // new population commercial environment you may even have to home in on a single approach quickly. for each selected pair (xi,x j) do Here are a few rules of thumb that may help the decision: mutate xi with probability pmutation Look for previous approaches to related problems. mutate x j with probability pmutation • if a biased random coin throw shows ‘head’ then // with probability pmate Try black box solvers if this looks promising. pop0:= pop0 mate(xi,x j) • else ∪ If problem instances are small, systematic search or dynamic programming may • pop := pop xi,x j allow you to find optimal solutions. 0 0 ∪ pop := pop0 optionally apply hill climbing to new individuals If none of the above looks promising, implement a simple prototype solver using • a greedy approach or some other simple and fast heuristics. The prototype helps you to understand the problem and might be useful as component of a more Figure 12.8: A generic evolutionary algorithm. sophisticated algorithm. Develop a local search algorithm. Focus on a good representation of solutions • and how to incorporate application specific knowledge into the searcher. If you k have a promising idea for mating operator, you can also consider evolutionary xi x'i algorithms. Use randomization and restarts to make the results more robust. xj x'j packages like CPLEX[more]3 or free packages like SoPlex4 are used. Ernsts = Paket as successor of Abacus. Constraint programming various Prologs, ILOG. ⇐

Figure 12.9: The crossover operator. 12.8 Further Findings

History of LP and dynamic programming More information on linear programming can for example be found in textbooks 5 represents a feasible solution. Then crossover consists of choosing a position k0 and [13] or on the web. producing a new individual x from bits 0 through k 1 from xi and bits k through MST and SSSP as linear programs, flows?? i0 0 − 0 of k 1 of x j. Conversely, the other new individual is defined by bits 0 through k 1 Bixbies runtime studies − 0 − from x j and bits k through k 1 of xi. Figure 12.9 shows this procedure. The main matroids 0 − difficulty with crossover is that it often requires a very careful choice of encoding. scheduling literatur Not only must every bit string represent a feasible solution, but also the mating of two nonclairvoyant scheduling individuals with high fitness must have a good chance of producing another individual 3http://www.cplex.com/ with reasonably high fitness. Therefore, many of the more successful applications of 4http://www.zib.de/Optimization/Software/Soplex/ evolutionary algorithms choose a carefully designed representation and an application 5http://www-unix.mcs.anl.gov/otc/Guide/faq/linear-programming-faq. = specific mating operation. [Graph coloring example mating operation?] html ⇒ 218 Generic Approaches to Optimization Mehlhorn, Sanders June 11, 2005 219

better algorithms for identical machines LPT, FirstFitDecreasing, FPTAS, PTAS interior point solvers — also local search but not (quite) hill climbing A more elaborate analysis allows us to conclude that even after finitely many steps at a given temperature and using finitely many temperatures, an optimal solution is found with high probability. However, the number of iterations needed to obtain such guarantees is ridiculously high. For example, for the Traveling Salesman Problem, 2n 1 a bound of O nn − iterations can be shown. Just trying all possible tours needs Chapter 13 “only” (n 1)! nniterations. Lagrange− relaxation as a general theory how to move constraints into the objective function? ellipsoid method Summary: Tools and Techniques for Algorithm Design

When you design an algorithm, it is usually a good idea to first design it in terms of high level operations on sets and sequences. Section 13.1 reviews some basic ap- proaches that we have seen. More likely than not, you can find an appropriate imple- mentation among the many data structures that we have seen in this book. Section 13.2 gives some guidance how to choose among the many possibilities for representing sets and sequences. A data structure for maintaining partitions of sets is described in Section 11.4. To represent more application specific data structures you might still use patterns you have seen: Adjacency lists, adjacency arrays, and bit matrices for graph-like data structures (Chapter 8). Fixed degree trees can be implicityly defined in an array (Chapter 6.1). With bounded degree trees can use downward pointers (Chapter 7), and with variable degree they could use parent pointers (Chapter 11.4) or sibling pointers (Chapter ??).

13.1 Generic Techniques

Divide-and-Conquer: Perhaps the most natural way to solve a problem is to break it down into smaller problems, solve the smaller problems recursively, and then merge them to an overall solution. Quicksort (Section 5.4) and mergesort (Section 5.2) are typical examples for 220 Summary: Tools and Techniques for Algorithm Design 13.2 Data Structures for Sets 221

this divide-and-conquer approach. These two quite different solutions for the same Table 13.1: Basic operations on a Set M of Elements and their complexity (with an = problem are also a good example that many roads lead to Rome[check]. For example, implicit ) for six different implementations. Assumptions: e an Element, h is an ⇒ quicksort has an expensive divide-stragety and a trivial conquer-stragegy and it is the O( ) Element Handle· , k a Key, and n M . ‘a’ stands for an amortized bound, ‘r’ for other way round for mergesort. = a randomized algorithm. ‘ ’ means|that| the representation is not helpful for imple- − menting the operation. All data structures support forall e M. . . in time O(n) and Dynamic Programming: in constant time. ∈ |·|

Whereas divide-and-conquer procedes in a top-down fashion, dynamic programming Procedure insert(e) M:= M e (Section 12.3) works bottom up — systematically construct solutions for small prob- ∪ { } Procedure insertAfter(h,e) assert h = max e M : e > e ; M:= M e2 lem instances and assemble them to solutions for larger and larger inputs. Since dy- { 0 ∈ 0} ∪ { } Procedure build( e1,...,en ) M:= e1,...,en namic programming is less goal directed than divide-and-conquer, it can get expensive Function deleteMin{ e:= min} M; M{:= M e }; return e in time and space. We might ask “why not always use a top-down approach?” The Function remove(k) e := e M : key(e\){=}k ; M:= M e ; return e reason is that a naive top down approach might solve the same subproblems over and Function remove(h) {e:=}h; { M∈:= M e ; retur} n e \ { } = over again. [example: fibonacchi numbers?] \ { } ⇒ Procedure decreaseKey(h,k) assert key(h) x; key(h):= k Function find(k) : Handle h := e : key(e)≥= k ; return h Randomization: Function locate(k) : Handle{ h}:= min{ e : key(e) } k ; return h Procedure merge(M ) M:= M M { ≥ } Making random decisions in an algorithm helps when there are many good answers 0 0 Procedure concat(M ) assert max∪ M < minM ; M:= M M and not too many bad ones and when it would be expensive to definitively distinguish 0 0 0 Function split(h) : Set M := e M : e h ; M:= M M∪; return M good and bad. For example, in the analysis of the quick-sort like selection algorithm 0 0 0 Function findNext(h) : Handle{ r∈eturn min≤ }e M : e >\h from Section 5.5 on third of all possible splitters were “good”. Moreover, finding out { ∈ } Function select(i : ) e1,...,en := M; return ei whether a splitter is good is not much cheaper than simply using it for the divide step { } of the divide-and-conquer algorithm. Operation List HashTable sort-array PQ APQ (a,b)-tree insert 1 1r 1 1 logn Precomputation: insertAfter 1 1r − 1 1 1a build n n nlog− n n n nlogn = [preconditioning for APSP?] ⇒ deleteMin 1 logn logn 1a = [nice example from Brassard Bradlay] − − ⇒ remove(Key) 1 logn remove(Handle) −1 1 − − log−n 1a − − 13.2 Data Structures for Sets decreaseKey 1 1a logn find 1−r log−n − logn − − − = [move sth to intro?] locate logn logn ⇒ merge −1 − n − log−n n Table 13.1 gives an overview of the most important operations on sets and six dif- − − ferent representations. As a rule of thumb, the data structures compared in Table 13.1 concat 1 n logn split − 1 − − logn get more complicated but also more powerful from left to right. Hence, you get the − − − − most effeictive solution from the leftmost column that supports the required operations findNext 1 1 select − − 1 − − logn efficiently. − − − − [ps: Bei mir hat ein addrssable priority queue kein O(logn) time concat or split, weil die Datenstrukturen, die das untersttzen eigentlich Suchbaeume = sind und kein effizientes merge oder decreaseKey unterstuetzen.] The simplest ⇒ 222 Summary: Tools and Techniques for Algorithm Design 13.2 Data Structures for Sets 223

representations just accumulates inserted elements in the order they arrive using a se- [PS: more Possible appendixes: memory management, recurrences] = quence data structure like (cyclic) (unbounded) arrays or (doubly) linked lists (Chap- ⇐ ter 3). Table 13.1 contains a column for doubly linked lists but often even arrays do the job. For a more detailed comparison and additional operations for sequences you should refer to Section 3.4. Hash tables (Chapter 4) are better if you frequently need to find, remove, or change arbitrary elements of the set without knowing their position in the data structure. Hash tables are very fast but have a few restrictions. They give you only probabilistic per- formance guarantees for some operations (there is some flexibility which operations). You need a hash function for the Element type (or better a universal family of hash functions). Hash tables are not useful if the application exploits a linear ordering of elements given by a key. If you need to process elements in the order of key values or if you want to find the elements closest to a given key value, the simplest solution is a sorted ar- ray (Chapter 5). Now you can find or locate elements in logarithmic time using binary = search[where]. You can also merge sorted arrays in linear time, find elements of ⇒ given rank easily and split the array into subarrays of disjoint key ranges. The main restriction for sorted arrays is that insertion and deletion is expensive. Priority queues (Chapter 4) are a somewhat specialized yet frequently needed dy- namic data structure. They support insertions of arbitrary elements and deletion of the smallest element. You can additionally remove elements from adressable prior- ity queues (column APQ in Table lreftab:operations) and some variants allow you to merge two priority queues in logarithmic time. Fibonacchi heaps support decreasKey in constant amortized time. Search trees like (a,b)-trees (Chapter 7) support almost all conceivable element- wise operations in logarithmic time or faster. They even support the operations split and concat in logarithmic time although they affect the entire sorted sequence. Search trees can be augmented with additional information to support more information. For example, it is easy to support extraction of the k-th smallest element (Function select) in logarithmic time if each subtree knows its size. = [so far the only place where “this” is needed?] [todo: nicer alignment of =⇒ setops] ⇒ 224 Summary: Tools and Techniques for Algorithm Design Mehlhorn, Sanders June 11, 2005 225

Appendix A

Notation

A.1 General Mathematical Notation

e0,...,en 1 : Set containing the elements e0,. . . ,en 1. { − } − e : P(e) : Set of all elements the fulfill the predicate P. { } e0,...,en 1 : Sequence containing the elements e0,. . . ,en 1. h − i − e S0 : P(e) : Subsequence of all elements of sequence S0 that fulfill the predicate h ∈ P.[bei iPseudocode?] = ⇐ : An undefined value. ⊥ ( )∞: (Minus) infinity. −

: Nonegative integers, = 0,1,2,... . Includes the zero! [check] = { } ⇐

+: Positive integers, = 1,2,... . { } |, &, <<, >>, : Bit-wise ‘or’, ‘and’, right-shift, left-shift, and exclusive-or respec- tively. ⊕ ∑n ∑ i=1 ai = i 1,...,n ai: = a1 + a2 + + an ∈{ } ··· ∏n ∏ i=1 ai = i 1,...,n ai: = a1 a2 an ∈{ } · ··· ∏n n!: = i=1 i — the factorial of n. div: Integer division. c = m divn is the largest nonnegative integer such that m cn 0.[durch / ersetzen?] − ≥ = b c ⇐ 226 Notation A.2 Some Probability Theory 227

mod: Modular arithmetics, m mod n = m n(mrmdivn). = Ω: The sample space in probabilty theory or the set of functions [todo!!!]. − ⇒

a b( modm): a and b are congruent modulo m, i.e., i : a + im = b. prime numbers: n is a prime number if there are no integers a,b > 1 such that ≡ ∃ ∈ n = a b. ∈ : Some ordering relation. In Section 9.2 it denotes the order in which nodes are · ≺ marked during depth first search. Rank: A one-to-one mapping r : Element 1..n is a ranking functions for elements → of a sequence e1,...,en if r(x) < r(y) whenever x < y. i.. j: Short hand for i,i + 1,..., j . h i { } reflexive: A relation A A is reflexive if a A : (a,a) R. AB: When A and B are sets this is the set of all functions mapping B to A. ∼⊆ × ∀ ∈ ∈ relation: A set of pairs R. Often we write relations as operators, e.g., if is relation, A B: The set of pairs (a,b) with a A and b B. ∼ × ∈ ∈ a b means (a,b) . ∼ ∈∼ x The largest integer x. b c ≤ strict weak order: A relation that is like a total order except the antisymmetry only x The smallest integer x. needs to hold with respect to some equivalence relation that is not necessarily d e ≥ the identity (see also http://www.sgi.com/tech/stl/LessT≡ hanComparable. antisymmetric: A relation is antisymmetric with respect to an equivalence relation html). if for all a and b, a ∼ b and b a implies a c. The no equivalence relation = ≡is specified, is the equality∼ relation∼ =.[chec≡k. needed for sorting] symmetric relation: A relation is antisymmetric if for all a and b, a b implies ⇒ ≡ b a. ∼ ∼ equivalence relation: A transitive, reflexive, symmetric relation. ∼ total order: A reflexive, transitive, antisymmetric relation. false: a shorthand for the value 0. transitive: A relation is transitive if for all a, b, c, a b and b c imply a c. field: In algebra a set of elements that support addition, subtraction, multiplication, ∼ ∼ ∼ ∼ and division by nonzero elements. Addition and multiplication are associative, true A shorthand for the value 1. commutative, and have neutral elements analogous to zero and one for the inte- gers. A.2 Some Probability Theory ∑n Hn: = i=1 1/i the n-th harmonic number. See also Equation (A.8). The basis of any argument in probability theory is a sample space Ω. For example, iff: A shorthand for “if and only if”. if we want to describe what happens if we roll two dice, we would probably use = lexicographic order: The most common way to extend a total order[cross ref] on the 36 element sample space 1,...,6 1,...,6 . In a random experiment, any { } × { } ⇒ a set of elements to tuples, strings, or sequences of these elements. We have element of Ω is chosen with the elementary probability p = 1/ Ω . More generally, Ω | | a1,a2,...,ak < b1,b2,...,bk if and only if a1 < b1 or a1 = b1 and a2,...,ak < the probability of an event E is the sum of the probabilities of its elements, h i h i h i Ω ⊆ b2,...,bk i.e, prob(E) = E / . [conditional probability needed?] A random variable is a = h i mapping from elements| | | | of the sample space to a value we obtain when this element ⇐ logx The logarithm base two of x, log2 x. of the sample space is drawn. For example, X could give the number shown by the first dice[check: wuerfel] and Y could give the number shown by the second dice. median: An element with rank n/2 among n elements. = d e Random variables are usually denoted by capital letters to differentiate them from ⇐ 1 multiplicative inverse: If an object x is multiplied with a multiplicative inverse x− plain values. 1 of x, we get x x− = 1 — the neutral element of multiplication. 
In particular, in We can define new random variables as expressions involving other random vari- a field every elements· but the zero (the neutral element of addition) has a unique ables and ordinary values. For example, if X and Y are random variables, then (X + multiplicative inverse. Y )(ω) = X(ω) +Y(ω), (X Y)(ω) = X(ω) Y(ω), (X + 3)(ω) = X(ω) + 3. · · 228 Notation A.3 Useful Formulas 229

Events are often specified by predicates involving random variables. For example, n n e k · (A.6) we have prob(X 2) = 1/3 or prob(X +Y = 11) = prob( (5,6),(6,5) ) = 1/18. k ≤ k   Indicator random≤ variables are random variables that{ can only tak} e the values   zero and one. Indicator variables are a useful tool for the probabilistic analysis of n n(n + 1) ∑ i = (A.7) algorithms because they encode the behavior of complex algorithms into very simple i=1 2 mathematical objects. ∑n 2 The expected value of a random variable X : Ω A is [ i=1 i ] = → ⇐ n E[X] = ∑ x prob(X = x) . (A.1) 1 lnn Hn = ∑ lnn + 1 (A.8) x A · ≤ k ≤ ∈ k=1 1+2+3+4+5+6 21 Exercise 2.6 gives a hint how to prove this relation. In our example E[X] = 6 = 6 = 3.5. Note that for an indicator random variable Z we simply have E[Z] = prob(Z = 1). n 1 n − 1 q Often we are interested in the expectation of a random variable that is defined ∑ qi = − for q = 1 (A.9) in terms of other random variables. This is easy for sums due to the linearity of 1 q 6 i=0 − expectation of random variables: For any two random variables X and Y , Stirling

E[X +Y] = E[X] + E[Y] . (A.2) n n 1 n n n! = 1 + O √2πn Stirling’s equation (A.10) In contrast, E[X Y ] = E[X] E[Y] only if X and Y are independent. Random variables e ≤ n e · ·        X1, . . . , Xk are independent if and only if

x1,...,xk : prob(X1 = x1 Xk = xk) = prob(X1 = x1) prob(Xk = xk) (A.3) ∀ ∧ ··· ∧ ··· [exercise?: let A,B denote independent indicator random variables. Let X = = A B. Show that X, A, B are pairwise independent, yet not independent.] ⇒ ⊕ We will often work with a sum X = X1 + + Xn of n indicator random variables ··· X1,. . . , Xn. The sum X is easy to handle if X1,. . . , Xn are independent. In particular, there are strong tail bounds that bound the probaility of large deviations from the expectation of X. We will only use the following variant of a Chernoff bound:

ε2E[X]/2 prob(X < (1 ε)E[X]) e− (A.4) − ≤ If the indicator random variables are also identically distributed with prob(Xi = 1) = p, X is binomially distributed,

n prob(X = i) = pi(1 p)(n i) . (A.5) i − −   A.3 Useful Formulas

= [separate section for recurrences? todo: linear recurrence, master theorem] ⇒ 230 Notation Mehlhorn, Sanders June 11, 2005 231

Bibliography

[1] G. M. Adel’son-Vel’skii and E. M. Landis. An algorithm for the organization of information. Soviet Mathematics Doklady, 3:1259–1263, 1962. [2] A. V. Aho, B. W. Kernighan, and P. J. Weinberger. The AWK Programming Language. Addison-Wesley, 1988. [3] R. Ahuja, K. Mehlhorn, J. Orlin, and R. Tarjan. Faster algorithms for the shortest path problem. Journal of the ACM, 3(2):213–223, 1990. [4] R. K. Ahuja, R. L. Magnanti, and J. B. Orlin. Network Flows. Prentice Hall, 1993. [5] A. Andersson, T. Hagerup, S. Nilsson, and R. Raman. Sorting in linear time? Journal of Computer and System Sciences, pages 74–93, 1998. [6] A. Andersson and M. Thorup. A pragmatic implementation of monotone priority queues. In DIMACS’96 implementation challenge, 1996. [7] F. Annexstein, M. Baumslag, and A. Rosenberg. Group action graphs and paral- lel architectures. SIAM Journal on Computing, 19(3):544–569, 1990. [8] R. Bayer and E. M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1(3):173 – 189, 1972. [9] M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious b-trees. In IEEE, editor, 41st IEEE Symposium on Foundations of Computer Science, pages 399–409, 2000. [10] J. L. Bentley and M. D. McIlroy. Engineering a sort function. Software Practice and Experience, 23(11):1249–1265, 1993. [11] J. L. Bentley and T. A. Ottmann. Algorithms for reporting and counting geomet- ric intersections. IEEE Transactions on Computers, pages 643–647, 1979. 232 BIBLIOGRAPHY BIBLIOGRAPHY 233

[12] J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. [24] D. Cohen-Or, D. Levin, and O. Remez. rogressive compression of arbitrary In ACM, editor, Proceedings of the Eighth Annual ACM-SIAM Symposium on triangular meshes. In Proc. IEEE Visualization, pages 67–72, 1999. Discrete Algorithms, New Orleans, Louisiana, January 5–7, 1997, pages 360– 369, New York, NY 10036, USA, 1997. ACM Press. [25] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. McGraw-Hill, 1990. [13] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997. [26] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry Algorithms and Applications. Springer-Verlag, Berlin Heidelberg, 2., [14] G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. J. Smith, and rev. ed. edition, 2000. M. Zagha. A comparison of sorting algorithms for the connection machine CM- 2. In ACM Symposium on Parallel Architectures and Algorithms, pages 3–16, [27] R. Dementiev, L. Kettner, J. Mehnert, and P. Sanders. Engineering a sorted 1991. list data structure for 32 bit keys. In Workshop on Algorithm Engineering & Experiments, New Orleans, 2004. [15] M. Blum, R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. J. of Computer and System Sciences, 7(4):448, 1972. [28] R. Dementiev and P. Sanders. Asynchronous parallel disk sorting. In 15th ACM Symposium on Parallelism in Algorithms and Architectures, pages 138–148, San [16] O. Boruvka. O jistem´ problemu´ minimaln´ ´ım. Prace` , Moravske´ Prirodovedecke´ Diego, 2003. Spolecnosti, pages 1–58, 1926. [29] L. Devroye. A note on the height of binary search trees. Journal of the ACM, [17] G. S. Brodal. Worst-case efficient priority queues. In Proc. 7th Symposium on 33:289–498, 1986. Discrete Algorithms, pages 52–58, 1996.

[18] M. Brown and R. Tarjan. Design and analysis of a data structure for representing [30] R. B. Dial. Shortest-path forest with topological ordering. Commun. ACM, sorted lists. SIAM Journal of Computing, 9:594–614, 1980. 12(11):632–633, Nov. 1969.

[19] R. Brown. Calendar queues: A fast O(1) priority queue implementation for the [31] M. Dietzfelbinger and F. Meyer auf der Heide. Simple, efficient shared mem- simulation event set problem. Communications of the ACM, 31(10):1220–1227, ory simulations. In 5th ACM Symposium on Parallel Algorithms and Archi- 1988. tectures, pages 110–119, Velen, Germany, June 30–July 2, 1993. SIGACT and SIGARCH. [20] J. L. Carter and M. N. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154, Apr. 1979. [32] E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959. [21] J.-C. Chen. Proportion extend sort. SIAM Journal on Computing, 31(1):323– 330, 2001. [33] R. Fleischer. A tight lower bound for the worst case of Bottom-Up-Heapsort. Algorithmica, 11(2):104–115, Feb. 1994. [22] B. Cherkassky, A. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and experimental evaluation. In D. D. Sleator, editor, Proceedings of the 5th [34] E. Fredkin. Trie memory. CACM, 3:490–499, 1960. Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’94), pages 516– 525. ACM Press, 1994. [35] M. L. Fredman. On the efficiency of pairing heaps and related data structures. Journal of the ACM, 46(4):473–501, July 1999. [23] E. G. Coffman, M. R. G. Jr., , and D. S. Johnson. Approximation algorithms for bin packing: A survey. In D. Hochbaum, editor, Approximation Algorithms for [36] M. L. Fredman, R. Sedgewick, D. D. Sleator, and R. E. Tarjan. The pairing heap: NP-Hard Problems, pages 46–93. PWS, 1997. A new form of self adjusting heap. Algorithmica, 1(1):111–129, 1986. 234 BIBLIOGRAPHY BIBLIOGRAPHY 235

[37] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious [51] A. Karatsuba and Y. Ofman. Multiplication of multidigit numbers on automata. algorithms. In 40th Symposium on Foundations of Computer Science, pages Soviet Physics—Doklady, 7(7):595–596, Jan. 1963. 285–298, 1999. [52] D. Karger, P. N. Klein, and R. E. Tarjan. A randomized linear-time algorithm for [38] Goldberg. A simple shortest path algorithm with linear average time. In ESA: finding minimum spanning trees. J. Assoc. Comput. Mach., 42:321–329, 1995. Annual European Symposium on Algorithms, 2001. INCOMPLETE. [53] J. Katajainen and B. B. Mortensen. Experiences with the design and implemen- [39] G. H. Gonnet and R. Baeza-Yates. Handbook of Algorithms and Data Structures. tation of space-efficient deque. In Workshop on Algorithm Engineering, volume Addison-Wesley, New York, 2 edition, 1991. 2141 of LNCS, pages 39–50. Springer, 2001. [40] G. Graefe and P.-A. Larson. B-tree indexes and cpu caches. In ICDE, pages 349–358. IEEE, 2001. [54] I. Katriel, P. Sanders, and J. L. Traf¨ f. A practical minimum spanning tree algo- rithm using the cycle property. Technical Report MPI-I-2002-1-003, MPI Infor- [41] J. F. Grantham and C. Pomerance. Prime numbers. In K. H. Rosen, editor, matik, Germany, October 2002. Handbook of Discrete and Combinatorial Mathematics, chapter 4.4, pages 236– 254. CRC Press, 2000. [55] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack problems. Springer, 2004.

[42] R. D. Hofstadter. Metamagical themas. Scientific American, (2):16–22, 1983. [56] V. King. A simpler minimum spanning tree verification algorithm. Algorithmica, 18:263–270, 1997. [43] S. Huddlestone and K. Mehlhorn. A new data structure for representing sorted lists. Acta Informatica, 17:157–184, 1982. [57] D. E. Knuth. The Art of Computer Programming—Sorting and Searching, vol- [44] J. Iacono. Improved upper bounds for pairing heaps. In 7th Scandinavian Work- ume 3. Addison Wesley, 2nd edition, 1998. shop on Algorithm Theory, volume 1851 of LNCS, pages 32–45. Springer, 2000. [58] D. E. Knuth. MMIXware: A RISC Computer for the Third Millennium. Number [45] A. Itai, A. G. Konheim, and M. Rodeh. A sparse table implementation of priority 1750 in LNCS. Springer, 1999. queues. In S. Even and O. Kariv, editors, Proceedings of the 8th Colloquium on Automata, Languages and Programming, volume 115 of LNCS, pages 417–431, [59] R. E. Korf. Depth-first iterative-deepening: An optimal admissible tree search. Acre, Israel, July 1981. Springer. Artificial Intelligence, 27:97–109, 1985. [46] V. Jarn´ık. O jistem´ problemu´ minimaln´ ´ım. Praca´ Moravske´ Pr˘´ırodovedec˘ ke´ [60] J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling Spolecnosti˘ , 6:57–63, 1930. In Czech. salesman problem. Proceedings of the American Mathematical Society, 7:48– 50, 1956. [47] K. Jensen and N. Wirth. Pascal User Manual and Report. ISO Pascal Standard. Springer, 1991. [61] S. Martello and P. Toth. Knapsack Problems – Algorithms and Computer Imple- mentations. Wiley, 1990. [48] T. Jiang, M. Li, and P. Vitan´ yi. Average-case complexity of shellsort. In ICALP, number 1644 in LNCS, pages 453–462, 1999. [62] C. Mart´ınez and S. Roura. Optimal sampling strategies in Quicksort and Quick- [49] D. S. Johnson, C. R. Aragon, L. A. McGeoch, and C. Schevon. Optimization select. SIAM Journal on Computing, 31(3):683–705, June 2002. by simulated annealing: Experimental evaluation; part ii, graph coloring and number partitioning. Operations Research, 39(3):378–406, 1991. [63] C. McGeoch, P. Sanders, R. Fleischer, P. R. Cohen, and D. Precup. Using finite experiments to study asymptotic performance. In Experimental Algorithmics — [50] H. Kaplan and R. E. Tarjan. New heap data structures. Technical Report TR- From Algorithm Design to Robust and Efficient Software, volume 2547 of LNCS, 597-99, Princeton University, 1999. pages 1–23. Springer, 2002. 236 BIBLIOGRAPHY BIBLIOGRAPHY 237

[64] K. Mehlhorn. Data Structures and Algorithms, Vol. I — Sorting and Searching. EATCS Monographs on Theoretical CS. Springer-Verlag, Berlin/Heidelberg, Germany, 1984.
[65] K. Mehlhorn. A faster approximation algorithm for the Steiner problem in graphs. Information Processing Letters, 27(3):125–128, Mar. 1988.
[66] K. Mehlhorn and S. Näher. Bounded ordered dictionaries in O(log log N) time and O(n) space. Information Processing Letters, 35(4):183–189, 1990.
[67] K. Mehlhorn and S. Näher. The LEDA Platform for Combinatorial and Geometric Computing. Cambridge University Press, 1999. 1018 pages.
[68] K. Mehlhorn and S. Näher. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999.
[69] K. Mehlhorn, V. Priebe, G. Schäfer, and N. Sivadasan. All-pairs shortest-paths computation in the presence of negative cycles. IPL, to appear, www.mpi-sb.mpg.de/~mehlhorn/ftp/shortpath.ps, 2000.
[70] K. Mehlhorn and P. Sanders. Scanning multiple sequences via cache memory. Algorithmica, 35(1):75–93, 2003.
[71] R. Mendelson, R. E. Tarjan, M. Thorup, and U. Zwick. Melding priority queues. In 9th Scandinavian Workshop on Algorithm Theory, pages 223–235, 2004.
[72] U. Meyer, P. Sanders, and J. Sibeyn, editors. Algorithms for Memory Hierarchies, volume 2625 of LNCS Tutorial. Springer, 2003.
[73] B. M. E. Moret and H. D. Shapiro. An empirical analysis of algorithms for constructing a minimum spanning tree. In Workshop Algorithms and Data Structures (WADS), number 519 in LNCS, pages 400–411. Springer, Aug. 1991.
[74] J. Nešetřil, E. Milková, and H. Nešetřilová. Otakar Borůvka on minimum spanning tree problem: Translation of both the 1926 papers, comments, history. Discrete Mathematics, 233, 2001.
[75] K. S. Neubert. The flashsort1 algorithm. Dr. Dobb's Journal, pages 123–125, February 1998.
[76] J. v. Neumann. First draft of a report on the EDVAC. Technical report, University of Pennsylvania, 1945. http://www.histech.rwth-aachen.de/www/quellen/vnedvac.pdf.
[77] K. Noshita. A theorem on the expected complexity of Dijkstra's shortest path algorithm. Journal of Algorithms, 6(3):400–408, 1985.
[78] S. Pettie and V. Ramachandran. An optimal minimum spanning tree algorithm. In 27th ICALP, volume 1853 of LNCS, pages 49–60. Springer, 2000.
[79] F. P. Preparata and M. I. Shamos. Computational Geometry. Springer, 1985.
[80] W. Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, 33(6):668–676, 1990.
[81] P. Sanders and S. Winkel. Super scalar sample sort. In 12th European Symposium on Algorithms (ESA), volume 3221 of LNCS, pages 784–796. Springer, 2004.
[82] F. Santos and R. Seidel. A better upper bound on the number of triangulations of a planar point set. Journal of Combinatorial Theory Series A, 102(1):186–193, 2003.
[83] R. Schaffer and R. Sedgewick. The analysis of heapsort. Journal of Algorithms, 15:76–100, 1993. Also known as TR CS-TR-330-91, Princeton University, January 1991.
[84] A. Schönhage and V. Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7:281–292, 1971.
[85] R. Sedgewick. Analysis of shellsort and related algorithms. LNCS, 1136:1–11, 1996.
[86] R. Seidel and C. Aragon. Randomized search trees. Algorithmica, 16(4–5):464–497, 1996.
[87] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362–391, 1983.
[88] D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. Journal of the ACM, 32(3):652–686, 1985.
[89] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362–391, 1983.
[90] D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. Journal of the ACM, 32(3):652–686, 1985.
[91] R. Tarjan. Amortized computational complexity. SIAM Journal on Algebraic and Discrete Methods, 6(2):306–318, 1985.

[92] M. Thorup. Integer priority queues with decrease key in constant time and the single source shortest paths problem. In 35th ACM Symposium on Theory of Computing, pages 149–158, 2004.
[93] P. van Emde Boas. Preserving order in a forest in less than logarithmic time. Information Processing Letters, 6(3):80–82, 1977.
[94] J. Vuillemin. A data structure for manipulating priority queues. Communications of the ACM, 21:309–314, 1978.
[95] L. Wall, T. Christiansen, and J. Orwant. Programming Perl. O'Reilly, 3rd edition, 2000.
[96] I. Wegener. BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small). Theoretical Comput. Sci., 118:81–98, 1993.

[97] Y. Han and M. Thorup. Integer sorting in O(n√(log log n)) expected time and linear space. In 42nd Symposium on Foundations of Computer Science, pages 135–144, 2002.