Algorithms in a Nutshell 2E
SECOND EDITION

Algorithms in a Nutshell 2E

George T. Heineman, Gary Pollice, and Stanley Selkow

Boston

Algorithms in a Nutshell 2E, Second Edition
by George T. Heineman, Gary Pollice, and Stanley Selkow

Copyright © 2010 George Heineman, Gary Pollice and Stanley Selkow. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Mary Treseler
Indexer: FIX ME!
Production Editor: FIX ME!
Cover Designer: Karen Montgomery
Copyeditor: FIX ME!
Interior Designer: David Futato
Proofreader: FIX ME!
Illustrator: Rebecca Demarest

January -4712: Second Edition

Revision History for the Second Edition:
2015-07-27: Early release revision 1

See http://oreilly.com/catalog/errata.csp?isbn=0636920032885 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. !!FILL THIS IN!! and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 063-6-920-03288-5
[?]

Table of Contents

1. Thinking Algorithmically
  Understand the Problem
  Naive Solution
  Intelligent Approaches
  Greedy
  Divide and Conquer
  Parallel
  Approximation
  Generalization
  Summary

2. The Mathematics of Algorithms
  Size of a Problem Instance
  Rate of Growth of Functions
  Analysis in the Best, Average, and Worst Cases
  Worst Case
  Average Case
  Best Case
  Performance Families
  Constant Behavior
  Log n Behavior
  Sublinear O(n^d) Behavior for d < 1
  Linear Performance
  n log n Performance
  Quadratic Performance
  Less Obvious Performance Computations
  Exponential Performance
  Benchmark Operations
  Lower and Upper Bounds
  References

3. Algorithm Building Blocks
  Algorithm Template Format
  Name
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Pseudocode Template Format
  Empirical Evaluation Format
  Floating-Point Computation
  Performance
  Rounding Error
  Comparing Floating Point Values
  Special Quantities
  Example Algorithm
  Name and Synopsis
  Input/Output
  Context
  Solution
  Analysis
  Common Approaches
  Greedy
  Divide and Conquer
  Dynamic Programming
  References

4. Sorting Algorithms
  Overview
  Terminology
  Representation
  Comparable Elements
  Stable Sorting
  Criteria for Choosing a Sorting Algorithm
  Transposition Sorting
  Insertion Sort
  Context
  Solution
  Analysis
  Selection Sort
  Heap Sort
  Context
  Solution
  Analysis
  Variations
  Partition-based Sorting
  Context
  Solution
  Analysis
  Variations
  Sorting Without Comparisons
  Bucket Sort
  Solution
  Analysis
  Variations
  Sorting with Extra Storage
  Merge Sort
  Input/Output
  Solution
  Analysis
  Variations
  String Benchmark Results
  Analysis Techniques
  References

5. Searching
  Sequential Search
  Input/Output
  Context
  Solution
  Analysis
  Binary Search
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Hash-based Search
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Bloom Filter
  Input/Output
  Context
  Solution
  Analysis
  Binary Search Tree
  Input/Output
  Context
  Solution
  Analysis
  Variations
  References

6. Graph Algorithms
  Graphs
  Data Structure Design
  Depth-First Search
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Breadth-First Search
  Input/Output
  Context
  Solution
  Analysis
  Single-Source Shortest Path
  Input/Output
  Solution
  Analysis
  Dijkstra’s Algorithm For Dense Graphs
  Variations
  Comparing Single Source Shortest Path Options
  Benchmark data
  Dense graphs
  Sparse graphs
  All Pairs Shortest Path
  Input/Output
  Solution
  Analysis
  Minimum Spanning Tree Algorithms
  Solution
  Analysis
  Variations
  Final Thoughts on Graphs
  Storage Issues
  Graph Analysis
  References

7. Path Finding in AI
  Game Trees
  Minimax
  Input/Output
  Context
  Solution
  Analysis
  NegMax
  Solution
  Analysis
  AlphaBeta
  Solution
  Analysis
  Search Trees
  Representing State
  Calculate available moves
  Using Heuristic Information
  Maximum Expansion Depth
  Depth-First Search
  Input/Output
  Context
  Solution
  Analysis
  Breadth-First Search
  Input/Output
  Context
  Solution
  Analysis
  A*Search
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Comparing Search Tree Algorithms
  References

8. Network Flow Algorithms
  Network Flow
  Maximum Flow
  Input/Output
  Solution
  Analysis
  Optimization
  Related Algorithms
  Bipartite Matching
  Input/Output
  Solution
  Analysis
  Reflections on Augmenting Paths
  Minimum Cost Flow
  Transshipment
  Solution
  Transportation
  Solution
  Assignment
  Solution
  Linear Programming
  References

9. Computational Geometry
  Classifying Problems
  Input data
  Computation
  Nature of the task
  Assumptions
  Convex Hull
  Convex Hull Scan
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Computing Line Segment Intersections
  LineSweep
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Voronoi Diagram
  Input/Output
  Solution
  Analysis
  References

10. Spatial Tree Structures
  Nearest Neighbor queries
  Range Queries
  Intersection Queries
  Spatial Tree Structures
  KD-Tree
  Quad Tree
  R-Tree
  Nearest Neighbor
  Input/Output
  Context
  Solution
  Analysis
  Variations
  Range Query
  Input/Output
  Context
  Solution
  Analysis
  QuadTrees
  Input/Output
  Solution
  Analysis
  Variations
  R-Trees
  Input/Output
  Context
  Solution
  Analysis
  References

11. Emerging Algorithm Categories
  Variations on a Theme
  Approximation Algorithms
  Input/Output
  Context
  Solution
  Analysis
  Parallel Algorithms
  Probabilistic Algorithms
  Estimating the Size of a Set
  Estimating the Size of a Search Tree
  References

12. Epilogue
  Principle: Know Your Data
  Principle: Decompose the Problem into Smaller Problems
  Principle: Choose the Right Data Structure
  Principle: Make the Space versus Time Trade-off
  Principle: If No Solution Is Evident, Construct a Search
  Principle: If No Solution Is Evident, Reduce Your Problem to Another Problem That Has a Solution
  Principle: Writing Algorithms Is Hard—Testing Algorithms Is Harder
  Principle: Accept Approximate Solution When Possible
  Principle: Add Parallelism to Increase Performance

A. Benchmarking

CHAPTER 1
Thinking Algorithmically

Algorithms matter! Knowing which algorithm to apply under which set of circumstances can make a big difference in the software you produce. Let this book be your guide to learning about a number of important algorithm domains, such as sorting and searching. We will introduce a number of general approaches used by algorithms to solve problems, such as the Divide and Conquer or Greedy strategies. You will be able to apply this knowledge to improve the efficiency of your own software.

Data structures have been tightly tied to algorithms since the dawn of computing. In this book, you will learn the fundamental data structures used to properly represent information for efficient processing.

What do you need to do when choosing an algorithm? We’ll explore that in the following sections.

Understand the Problem

The first step in designing an algorithm is to understand the problem you want to solve. Let’s start with a sample problem from the field of computational geometry. Given a set of points, P, in a two-dimensional plane, such as shown in Figure 1-1, picture a rubber band that has been stretched around the points and released. The resulting shape is known as the convex hull, that is, the smallest convex shape that fully encloses all points in P.

Figure 1-1. Sample set of points in plane

Given a convex hull for P, any line segment drawn between any two points in P lies totally within the hull. Let’s assume that we order the points in the hull in clockwise fashion. Thus, the hull is formed by a clockwise ordering of h points L0, L1, …, Lh-1 as shown in Figure 1-2. Each sequence of three hull points Li, Li+1, Li+2 creates a right turn.

Figure 1-2. Computed convex hull for points

With just this information, you can probably draw the convex hull for any set of points, but could you come up with an algorithm (that is, a step-by-step sequence of instructions) that will efficiently compute the convex hull for any set of points?

What we find interesting about the convex hull problem is that it doesn’t seem to be easily classified into existing algorithmic domains.
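
As a small illustration of the right-turn property described above, the following Python sketch tests whether three points make a right turn and whether a proposed clockwise ordering of hull points satisfies the property at every position. It is a sketch under stated assumptions, not the book's implementation: the function names (cross, is_right_turn, is_clockwise_hull) are hypothetical, points are assumed to be (x, y) tuples with the y-axis pointing up, and the indices are assumed to wrap around so that the last hull points are checked against the first ones.

```python
from typing import List, Tuple

# Assumption: a point is an (x, y) pair, with the y-axis pointing up.
Point = Tuple[float, float]


def cross(a: Point, b: Point, c: Point) -> float:
    """Z component of the cross product of the vectors a->b and a->c."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def is_right_turn(a: Point, b: Point, c: Point) -> bool:
    """True if walking a -> b -> c turns clockwise (a right turn).

    A negative cross product means a clockwise turn when y points up;
    collinear points (cross product of zero) do not count as a turn.
    """
    return cross(a, b, c) < 0


def is_clockwise_hull(hull: List[Point]) -> bool:
    """Check that every triple L[i], L[i+1], L[i+2] makes a right turn.

    Indices wrap around (an assumption of this sketch), so the triples
    that cross the end of the list are checked too. This verifies only
    the right-turn property; it does not verify that the hull encloses
    every point of the original set P.
    """
    h = len(hull)
    if h < 3:
        return False
    return all(
        is_right_turn(hull[i], hull[(i + 1) % h], hull[(i + 2) % h])
        for i in range(h)
    )


# Usage: the corners of a unit square, listed clockwise.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(is_clockwise_hull(square))  # True
```

The check uses only multiplication and subtraction, so it needs no trigonometry and costs constant time per triple.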