Open Data Structures (In Pseudocode)
Total Page:16
File Type:pdf, Size:1020Kb
Open Data Structures (in pseudocode) Edition 0.1Gβ Pat Morin Contents Acknowledgments ix Why This Book? xi 1 Introduction 1 1.1 The Need for Efficiency ..................... 2 1.2 Interfaces ............................. 4 1.2.1 The Queue, Stack, and Deque Interfaces . 5 1.2.2 The List Interface: Linear Sequences . 6 1.2.3 The USet Interface: Unordered Sets .......... 8 1.2.4 The SSet Interface: Sorted Sets ............. 8 1.3 Mathematical Background ................... 9 1.3.1 Exponentials and Logarithms . 10 1.3.2 Factorials ......................... 11 1.3.3 Asymptotic Notation . 12 1.3.4 Randomization and Probability . 15 1.4 The Model of Computation ................... 18 1.5 Correctness, Time Complexity, and Space Complexity . 19 1.6 Code Samples .......................... 21 1.7 List of Data Structures ..................... 23 1.8 Discussion and Exercises .................... 23 2 Array-Based Lists 31 2.1 ArrayStack: Fast Stack Operations Using an Array . 32 2.1.1 The Basics ........................ 32 2.1.2 Growing and Shrinking . 35 2.1.3 Summary ......................... 37 Contents 2.2 FastArrayStack: An Optimized ArrayStack . 37 2.3 ArrayQueue: An Array-Based Queue . 38 2.3.1 Summary ......................... 41 2.4 ArrayDeque: Fast Deque Operations Using an Array . 42 2.4.1 Summary ......................... 44 2.5 DualArrayDeque: Building a Deque from Two Stacks . 44 2.5.1 Balancing ......................... 48 2.5.2 Summary ......................... 50 2.6 RootishArrayStack: A Space-Efficient Array Stack . 50 2.6.1 Analysis of Growing and Shrinking . 55 2.6.2 Space Usage ....................... 55 2.6.3 Summary ......................... 56 2.7 Discussion and Exercises .................... 57 3 Linked Lists 61 3.1 SLList: A Singly-Linked List . 61 3.1.1 Queue Operations .................... 63 3.1.2 Summary ......................... 64 3.2 DLList: A Doubly-Linked List . 64 3.2.1 Adding and Removing . 66 3.2.2 Summary ......................... 68 3.3 SEList: A Space-Efficient Linked List . 68 3.3.1 Space Requirements . 69 3.3.2 Finding Elements .................... 69 3.3.3 Adding an Element ................... 71 3.3.4 Removing an Element . 73 3.3.5 Amortized Analysis of Spreading and Gathering . 75 3.3.6 Summary ......................... 77 3.4 Discussion and Exercises .................... 78 4 Skiplists 83 4.1 The Basic Structure ....................... 83 4.2 SkiplistSSet: An Efficient SSet . 85 4.2.1 Summary ......................... 88 4.3 SkiplistList: An Efficient Random-Access List . 89 4.3.1 Summary ......................... 93 iv 4.4 Analysis of Skiplists ....................... 93 4.5 Discussion and Exercises .................... 97 5 Hash Tables 101 5.1 ChainedHashTable: Hashing with Chaining . 101 5.1.1 Multiplicative Hashing . 104 5.1.2 Summary . 108 5.2 LinearHashTable: Linear Probing . 108 5.2.1 Analysis of Linear Probing . 112 5.2.2 Summary . 115 5.2.3 Tabulation Hashing . 115 5.3 Hash Codes ............................117 5.3.1 Hash Codes for Primitive Data Types . 117 5.3.2 Hash Codes for Compound Objects . 117 5.3.3 Hash Codes for Arrays and Strings . 119 5.4 Discussion and Exercises . 122 6 Binary Trees 127 6.1 BinaryTree: A Basic Binary Tree . 129 6.1.1 Recursive Algorithms . 129 6.1.2 Traversing Binary Trees . 130 6.2 BinarySearchTree: An Unbalanced Binary Search Tree . 133 6.2.1 Searching . 133 6.2.2 Addition . 135 6.2.3 Removal . 137 6.2.4 Summary . 139 6.3 Discussion and Exercises . 140 7 Random Binary Search Trees 145 7.1 Random Binary Search Trees . 145 7.1.1 Proof of Lemma 7.1 . 148 7.1.2 Summary . 150 7.2 Treap: A Randomized Binary Search Tree . 151 7.2.1 Summary . 158 7.3 Discussion and Exercises . 160 v Contents 8 Scapegoat Trees 165 8.1 ScapegoatTree: A Binary Search Tree with Partial Rebuilding166 8.1.1 Analysis of Correctness and Running-Time . 170 8.1.2 Summary . 172 8.2 Discussion and Exercises . 173 9 Red-Black Trees 177 9.1 2-4 Trees .............................178 9.1.1 Adding a Leaf . 179 9.1.2 Removing a Leaf . 179 9.2 RedBlackTree: A Simulated 2-4 Tree . 182 9.2.1 Red-Black Trees and 2-4 Trees . 183 9.2.2 Left-Leaning Red-Black Trees . 186 9.2.3 Addition . 188 9.2.4 Removal . 191 9.3 Summary .............................196 9.4 Discussion and Exercises . 198 10 Heaps 203 10.1 BinaryHeap: An Implicit Binary Tree . 203 10.1.1 Summary . 209 10.2 MeldableHeap: A Randomized Meldable Heap . 209 10.2.1 Analysis of merge(h1,h2) . 212 10.2.2 Summary . 214 10.3 Discussion and Exercises . 214 11 Sorting Algorithms 217 11.1 Comparison-Based Sorting . 218 11.1.1 Merge-Sort . 218 11.1.2 Quicksort . 222 11.1.3 Heap-sort . 225 11.1.4 A Lower-Bound for Comparison-Based Sorting . 227 11.2 Counting Sort and Radix Sort . 231 11.2.1 Counting Sort . 231 11.2.2 Radix-Sort . 233 11.3 Discussion and Exercises . 235 vi 12 Graphs 239 12.1 AdjacencyMatrix: Representing a Graph by a Matrix . 241 12.2 AdjacencyLists: A Graph as a Collection of Lists . 244 12.3 Graph Traversal . 247 12.3.1 Breadth-First Search . 248 12.3.2 Depth-First Search . 250 12.4 Discussion and Exercises . 252 13 Data Structures for Integers 257 13.1 BinaryTrie: A digital search tree . 258 13.2 XFastTrie: Searching in Doubly-Logarithmic Time . 264 13.3 YFastTrie: A Doubly-Logarithmic Time SSet . 267 13.4 Discussion and Exercises . 272 14 External Memory Searching 275 14.1 The Block Store . 277 14.2 B-Trees ..............................277 14.2.1 Searching . 280 14.2.2 Addition . 282 14.2.3 Removal . 287 14.2.4 Amortized Analysis of B-Trees . 293 14.3 Discussion and Exercises . 296 Bibliography 301 Index 309 vii Acknowledgments I am grateful to Nima Hoda, who spent a summer tirelessly proofread- ing many of the chapters in this book; to the students in the Fall 2011 offering of COMP2402/2002, who put up with the first draft of this book and spotted many typographic, grammatical, and factual errors; and to Morgan Tunzelmann at Athabasca University Press, for patiently editing several near-final drafts. ix Why This Book? There are plenty of books that teach introductory data structures. Some of them are very good. Most of them cost money, and the vast majority of computer science undergraduate students will shell out at least some cash on a data structures book. Several free data structures books are available online. Some are very good, but most of them are getting old. The majority of these books be- came free when their authors and/or publishers decided to stop updat- ing them. Updating these books is usually not possible, for two reasons: (1) The copyright belongs to the author and/or publisher, either of whom may not allow it. (2) The source code for these books is often not avail- able. That is, the Word, WordPerfect, FrameMaker, or LATEX source for the book is not available, and even the version of the software that han- dles this source may not be available. The goal of this project is to free undergraduate computer science stu- dents from having to pay for an introductory data structures book. I have decided to implement this goal by treating this book like an Open Source software project. The LATEX source, pseudocode source, and build scripts for the book are available to download from the author’s website1 and also, more importantly, on a reliable source code management site.2 The source code available there is released under a Creative Commons Attribution license, meaning that anyone is free to share: to copy, dis- tribute and transmit the work; and to remix: to adapt the work, including the right to make commercial use of the work. The only condition on these rights is attribution: you must acknowledge that the derived work contains code and/or text from opendatastructures.org. 1http://opendatastructures.org 2https://github.com/patmorin/ods xi Why This Book? Anyone can contribute corrections/fixes using the git source-code management system. Anyone can also fork the book’s sources to develop a separate version (for example, in another programming language). My hope is that, by doing things this way, this book will continue to be a use- ful textbook long after my interest in the project, or my pulse, (whichever comes first) has waned. xii Chapter 1 Introduction Every computer science curriculum in the world includes a course on data structures and algorithms. Data structures are that important; they im- prove our quality of life and even save lives on a regular basis. Many multi-million and several multi-billion dollar companies have been built around data structures. How can this be? If we stop to think about it, we realize that we inter- act with data structures constantly. • Open a file: File system data structures are used to locate the parts of that file on disk so they can be retrieved. This isn’t easy; disks contain hundreds of millions of blocks. The contents of your file could be stored on any one of them. • Look up a contact on your phone: A data structure is used to look up a phone number in your contact list based on partial information even before you finish dialing/typing. This isn’t easy; your phone may contain information about a lot of people—everyone you have ever contacted via phone or email—and your phone doesn’t have a very fast processor or a lot of memory. • Log in to your favourite social network: The network servers use your login information to look up your account information.