
Handout Data Structures

Contents

1 Basic Data Structures
  1.1 Abstract Data Structures
  1.2 C++ programming
  1.3 Trees and their Representations

2 Traversal
  2.1 Recursion
  2.2 Euler traversal
  2.3 Using a Stack
  2.4 Using Link-Inversion
  2.5 Using Inorder Threads

3 Binary Search Trees
  3.1 Representing sets
  3.2 Augmented trees
  3.3 Comparing trees

4 Balancing Binary Trees
  4.1 Tree rotation
  4.2 AVL Trees
  4.3 Adding a Key to an AVL Tree
  4.4 Deletion in an AVL Tree
  4.5 Self-Organizing Trees
  4.6 Splay Trees

5 Priority Queues
  5.1 ADT
  5.2 Binary Heaps
  5.3 Leftist heaps
  5.4 Pairing Heaps
  5.5 Double-ended Priority Queues

6 B-Trees
  6.1 B-Trees
  6.2 Deleting Keys
  6.3 Red-Black Trees

7 Graphs
  7.1 Representation
  7.2 Graph traversal
  7.3 Disjoint Sets, ADT Union-Find
  7.4 Minimal Spanning Trees
  7.5 Shortest Paths
  7.6 Topological Sorting

8 Hash Tables
  8.1 Perfect Hash Function
  8.2 Open Addressing
  8.3 Chaining
  8.4 Choosing a hash function

9 Data Compression
  9.1 Huffman Coding
  9.2 Ziv-Lempel-Welch
  9.3 Burrows-Wheeler

10 Pattern Matching
  10.1 Knuth-Morris-Pratt
  10.2 Aho-Corasick
  10.3 Comparing texts
  10.4 Other methods

Standard Reference Works

HJH, Leiden, September 13, 2020

WORK IN PROGRESS. The ... has been explained (this topic was dropped from the Algoritmiek course material). CHECK. Graph algorithms, largely Algoritmiek; Geometrical [we do little with this]. TODO. More explanation: topological sort is an application of DFS; DFS visit times and edge types in the DFS tree. Algoritmiek, spring 2019: in deviation from the exam material of earlier years, this year we did cover topological sorting, Prim's algorithm and the implementation of Kruskal's algorithm, and did not cover the heap and ...

1 Basic Data Structures

Often the notation nil is used to indicate a null pointer. In pictures the symbol Λ might be used, but in many cases we just draw a dot without outgoing arcs. In C++ originally the value 0 or the macro NULL (inherited from C) was used as the null pointer, but modern C++11 provides an explicit constant nullptr to avoid unwanted conversions in the context of function overloading.

1.1 Abstract Data Structures

Definition 1.1. An abstract data structure (ADT) is a specification of the values stored in the data structure as well as a description (and signatures) of the operations that can be performed.

Thus an ADT is fixed by its domain, which describes the values stored, the data elements and their structure (like a tree or a linear list), and by the operations performed on the data. In present-day C++ the most common abstract data structures are implemented as template containers with specific member functions. In general, if we want to code a given ADT, the specification fixes the 'interface', i.e., the functions and their types, the representation fixes the constructs used to store the data (arrays, pointers), while the implementation means writing the actual code for the functions [picture after Stubbs and Webre 1985].

[Figure: data structure = domain (elements and structure) plus operations; the levels specification, representation and implementation.]

Stack. The stack keeps a linear sequence of objects. This sequence can be accessed from one side only, which is called the top of the stack. The behaviour is called LIFO for last-in-first-out, as the last element added is the first that can be removed. The ADT STACK has the following operations:

• INITIALIZE: void → stack. Construct an empty sequence ().
• ISEMPTY: void → Boolean. Check whether the stack is empty (i.e., contains no elements).
• SIZE: void → Integer. Return the number n of elements, the length of the sequence (x1,...,xn).
• TOP: void → T. Returns the top xn of the sequence (x1,...,xn). Undefined on the empty sequence.
• PUSH(x): T → void. Add the given element x to the top of the sequence (x1,...,xn), so afterwards the sequence is (x1,...,xn,x).
• POP: void → void. Removes the topmost element xn of the sequence (x1,...,xn), so afterwards the sequence is (x1,...,xn−1). Undefined on the empty sequence.

Stacks are implemented using an array or a (singly) linked list. In the array representation we keep the index of the topmost element, whereas in the linked list there is a pointer to the topmost element. There is no need to maintain a pointer to the bottom element.

[Figure: a stack stored in an array with a top index, and as a singly linked list from the topmost element down to the bottom, ending in Λ.]
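As an illustration (a sketch, not code from the handout, and only one of the two possible representations), a stack of integers as a singly linked list could look as follows; the names Node and Stack are chosen here for the example.

#include <cassert>

// Stack of ints as a singly linked list; the head of the list is the top.
struct Node {
    int info;
    Node* next;
};

class Stack {
public:
    Stack() : top_(nullptr), count(0) { }
    ~Stack() { while (!isEmpty()) pop(); }
    bool isEmpty() const { return top_ == nullptr; }
    int size() const { return count; }
    int top() const { assert(!isEmpty()); return top_->info; }   // undefined on empty stack
    void push(int x) { top_ = new Node{x, top_}; ++count; }
    void pop() {                                                 // undefined on empty stack
        assert(!isEmpty());
        Node* old = top_;
        top_ = top_->next;
        delete old;
        --count;
    }
private:
    Node* top_;   // pointer to the topmost element; no pointer to the bottom is needed
    int count;
};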

Specification. There are several possible notations for pushing x onto the stack S. In classical imperative programming style we write push(S,x), where the stack S is changed as a result of the operation. In object-oriented style we tend to write S.push(x). In functional style we also may write push(S,x), but here S itself is not changed; instead push(S,x) denotes the resulting stack. We have to make a distinction between the concept of the stack and its instantiation in our preferred programming language. Languages have been developed to specify the syntax and semantics of ADT's. One example is the VDM. Here is the specification for the stack.

VDM (Vienna Development Method): Stack of N.

Function signatures:
  init: → Stack
  push: N × Stack → Stack
  top: Stack → (N ∪ ERROR)
  pop: Stack → Stack
  isempty: Stack → Boolean

Semantics:
  top(init()) = ERROR
  top(push(i,s)) = i
  pop(init()) = init()
  pop(push(i,s)) = s
  isempty(init()) = true
  isempty(push(i,s)) = false

Queue. The queue keeps a linear sequence of objects. Elements can be retrieved from one end (the front), and added to the other end (the back). The behaviour is called FIFO for first-in-first-out, as the elements are removed in the same order as they have been added. The ADT QUEUE has the following operations:

• INITIALIZE: construct an empty sequence ().
• ISEMPTY: check whether the queue is empty (i.e., contains no elements).
• SIZE: return the number n of elements, the length of the sequence (x1,...,xn).
• FRONT: returns the first element x1 of the sequence (x1,...,xn). Undefined on the empty sequence.
• ENQUEUE(x): add the given element x to the end/back of the sequence (x1,...,xn), so afterwards the sequence is (x1,...,xn,x).
• DEQUEUE: removes the first element of the sequence (x1,...,xn), so afterwards the sequence is (x2,...,xn). Undefined on the empty sequence.

Queues are implemented using a circular array or a (singly) linked list. In a circular array we keep track of the first and last elements of the queue within the array. The last element of the array is considered to be followed by the first array element when used by the queue; hence circular. Be careful: with only a first and last index it is hard to distinguish a 'full' queue from an empty one.

Figure 1: Hierarchy of linear lists: the deque ('deck') generalizes both the stack ('stapel', LIFO) and the queue ('rij', FIFO).

[Figure: a queue stored in a circular array with first/last indices that wrap around, and as a singly linked list with front and back pointers, ending in Λ.]
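A sketch (not from the handout) of the circular-array idea; here a separate element count is kept, which is one common way to avoid the 'full versus empty' ambiguity mentioned above. The capacity is fixed for simplicity.

#include <cassert>

// Queue of ints in a circular array of fixed capacity.
class Queue {
public:
    explicit Queue(int capacity)
        : cap(capacity), front(0), count(0), data(new int[capacity]) { }
    ~Queue() { delete [] data; }
    bool isEmpty() const { return count == 0; }
    bool isFull()  const { return count == cap; }
    int  size()    const { return count; }
    int  getFront() const { assert(!isEmpty()); return data[front]; }
    void enqueue(int x) {                     // add at the back
        assert(!isFull());
        data[(front + count) % cap] = x;      // wrap around: the array is used circularly
        ++count;
    }
    void dequeue() {                          // remove at the front
        assert(!isEmpty());
        front = (front + 1) % cap;
        --count;
    }
private:
    int cap, front, count;
    int* data;
};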

A Deque ('deck') or double-ended queue is a linear sequence where operations are allowed on either end of the list only, thus generalizing both stack and queue. It is usually implemented as a circular array or a doubly linked list (see also below). There is no common naming convention for the operations in the various programming languages.

            insert                  remove                 inspect
            back        front       back       front       back    front
  C++       push_back   push_front  pop_back   pop_front   back    front
  Perl      push        unshift     pop        shift       [-1]    [0]
  Python    append      appendleft  pop        popleft     [-1]    [0]

List. A Linear List is an ADT that stores (linear) sequences of data elements. The operations include Initialization, EmptyTest, Retrieval, Change and Deletion of the value stored at a certain position, Addition of a new value "in between" two existing values, as well as before the first or after the last position.

[Figure: a linear list with positions from first to last; at a given position one can inspect, insert, delete or change a value.]

Hence, in order to specify the linear list we also require the notion of a position in the list, to indicate where operations should take place. We also need instructions that move the position one forward or backward if we want to be able to traverse the list for inspection. (Such a traversal is not considered necessary for stacks and queues, where we can only inspect the topmost/front element.) It is rather tedious to give a good description of a general list and its operations. We could start with the mathematical concept of a tuple and consider a list (x1,x2,...,xn), with positions 1,2,...,n. If we now add a new value at the start of the list, all other values change position; not a natural choice for a programmer who might like to see positions as pointers. The most common implementation of the linear list is a doubly linked list, with pointers to the first and last element of the list, and for each element a pointer to its predecessor and successor (if it exists). Sometimes it is more convenient to treat the first and last pointers as special elements of the list. Such a special link that marks the ends of the list is called a sentinel. Using a sentinel may avoid distinguishing special cases for the empty list.

[Figure: a doubly linked list with prv/nxt pointers and first/last pointers ending in Λ; a variant with a sentinel node marking both ends.]
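A sketch of one possible sentinel set-up (not the handout's code): a single sentinel node in a circular doubly linked list, so that insertion and deletion never have to treat the empty list, the first, or the last position as special cases. The names List and ListNode are chosen for this example.

// Circular doubly linked list of ints with one sentinel node.
struct ListNode {
    int info;
    ListNode *prv, *nxt;
};

class List {
public:
    List() : sentinel(new ListNode) {
        sentinel->prv = sentinel->nxt = sentinel;   // empty list: the sentinel points to itself
    }
    bool isEmpty() const { return sentinel->nxt == sentinel; }
    ListNode* first() const { return sentinel->nxt; }   // equals end() when the list is empty
    ListNode* end()   const { return sentinel; }        // 'past-the-end' position
    // insert a new value just before position pos (pos may be end(): append at the back)
    ListNode* insertBefore(ListNode* pos, int x) {
        ListNode* node = new ListNode{x, pos->prv, pos};
        pos->prv->nxt = node;
        pos->prv = node;
        return node;
    }
    // remove the node at position pos (pos must not be the sentinel)
    void erase(ListNode* pos) {
        pos->prv->nxt = pos->nxt;
        pos->nxt->prv = pos->prv;
        delete pos;
    }
private:
    ListNode* sentinel;
};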

Problem. Find an efficient way of generating the Hamming numbers 2^i 3^j 5^k, i, j, k ≥ 0, in sorted order: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, ...

Hybrid representations. It is a little deceptive to distinguish strictly between array and linked representations. It is sometimes useful to consider hybrid representations, where an array is used to store the links. Then the links are not pointers into memory, but indices referring to array positions.

Example 1.2. We can represent the partition of a finite domain into disjoint subsets by putting the elements of each of the partition classes of the domain into a single cyclic list, connected by references to the next element in the list. In this example the references are indices to array elements. (This is a possible representation of the UNION-FIND ADT.) The partition is {A,B,E,G,H}, {C}, {D,F,I}.

  index:   1 2 3 4 5 6 7 8 9
  element: A B C D E F G H I
  next:    2 5 3 6 7 9 8 1 4
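A sketch (an illustration of the example above, not the handout's code) of how this cyclic-list representation might be coded: the class CyclicPartition and its member names are invented here. The nice property is that merging two classes is a single swap of next-references; testing membership of the same class, on the other hand, requires walking the cycle.

#include <iostream>
#include <vector>
using namespace std;

// Partition of {0,...,n-1} into disjoint classes, each class stored as a cyclic list:
// next[i] is the index of the next element in the class of i.
class CyclicPartition {
public:
    explicit CyclicPartition(int n) : next(n) {
        for (int i = 0; i < n; ++i) next[i] = i;          // n singleton classes
    }
    // walk the cycle of a; a and b are in the same class iff we meet b on the way
    bool sameClass(int a, int b) const {
        int i = a;
        do { if (i == b) return true; i = next[i]; } while (i != a);
        return false;
    }
    // splicing the two cycles merges the two classes in constant time
    void unite(int a, int b) {
        if (sameClass(a, b)) return;                      // swapping within one cycle would split it
        swap(next[a], next[b]);
    }
private:
    vector<int> next;
};

int main() {
    CyclicPartition p(9);                                 // elements 0..8 stand for A..I
    p.unite(0, 1); p.unite(0, 4); p.unite(0, 6); p.unite(0, 7);   // {A,B,E,G,H}
    p.unite(3, 5); p.unite(3, 8);                                  // {D,F,I}
    cout << boolalpha << p.sameClass(1, 7) << " " << p.sameClass(2, 3) << endl;  // true false
    return 0;
}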

Set and Dictionary. A Set is the ADT that matches the mathematical set. So a Set represents a (finite) collection in a fixed domain (the universe in mathematical terminology). Basic operations are EmptyTest, Adding and Deleting elements, and a Membership test to check whether an object belongs to the set. A Dictionary (or Map, or Associative Array) is a Set that stores (key,value) pairs, with for each key at most one value, so the relation is functional. We may retrieve the value for a key in the Dictionary, and also delete the pair by specifying the key (and not its value). See Chapter 3.1 for the Set and its implementations.

Priority queue. See Chapter 5 for the definition of the Priority Queue and possible implementations (it is also part of the STL, see below).

1.2 C++ programming

Take some time to review the Leiden course Programmeertechnieken, covering Advanced C++ programming and Modern C++ programming. In particular we are reminded that C++ is to be considered a federation of programming languages. This means it still inherits features of good old C, but also includes (1) modern object-oriented constructs, (2) templates, and (3) the Standard Template Library (see the next section). These latter three concepts are essential for programming data structures. And, the other way around, choosing the right types in the STL assumes some knowledge of basic implementations of data structures.

Object Oriented Programming. Get acquainted with important OOP features: classes, objects = data + methods; inheritance; virtual functions, dynamic binding, polymorphism; operator overloading; friend functions. Use accessor methods ('getters') and mutator methods ('setters') to interface the data in your objects whenever possible. Distinguish them using const at appropriate places. Prefer vector and string above the C-style array with its pointer arithmetic. They are 'first-class objects', meaning that they come with copy and assignment operators, and comparing them with == will give the expected result (i.e., it checks whether the strings are equal rather than their addresses).

Templates. Rather than implementing various variants of the same data type for different data items (stack of integers, stack of strings), modern C++ introduces a form of generic programming by way of templates. A template is a type parameter T that is used to define a family of data structures: stack of T. One distinguishes templated functions and templated data types (classes). A function max that is defined for every type (as long as it allows comparison of elements by <) can be defined as in Slide 1, top. When we use max(3,7) in our program, the compiler will generate code for the function max(int,int); when we use max(3.0,7.0) we get code for the function max(float,float). When the parameters are mixed and need a type conversion, which otherwise would be applied automatically, the compiler might need some help, and we explicitly write, e.g., max<double>(3,7.0). In a similar way one defines and instantiates templated classes, see Slide 1, bottom.

Standard Template Library

The Standard Template Library (STL) for C++ gives its users some ready-to-use data structures matching the ADT's we have seen in the previous section. The technical details differ at some points though. For example, the STL set is assumed to be over an ordered universe, and a simple membership test a ∈ A is not provided. The concepts used in defining the STL include that of container, iterator, and algorithm. The container is like an ADT. It specifies the domain of the data structure, the data elements and the internal structure of the data (e.g., it may be linear, linked, ordered, etc.).

Slide 1 Templates.

• templated function

template <class T>
T max(T a, T b) { return a > b ? a : b; }

• templated class

template <class T>
class Stack {
  ...
private:
  vector<T> storage;
};

Stack<int> intStack;  Stack<string> stringStack;

Slide 2 STL containers

helper: pair
sequences:
  contiguous: array (fixed length), vector (flexible length), deque (double ended)
  linked: forward_list (single), list (double)
adaptors:
  based on one of the sequences: stack (LIFO), queue (FIFO)
  based on binary heap: priority_queue
associative: based on balanced trees: set, map, multiset, multimap
unordered: based on hash tables: unordered_set, unordered_map, unordered_multiset, unordered_multimap

Like its name promises, the STL is implemented using templates, making the library very flexible. With each container comes a collection of member functions to manipulate and access the data. Naming of the functions is consistent, so that equivalent functions on different containers have the same name. As an example, all (but one?) containers have size and empty member functions.

An iterator provides access to the separate elements in a container. Although internally the iterator may be implemented as either an index into an array or as a pointer into a linked structure, this is not visible in the definitions. This makes changing the internal implementation feasible.

The container class templates can be divided into several categories.

Sequences. The array is a fixed-size, 'contiguous' sequence (of N elements of type T), while the vector is a flexible sequence, efficiently resizing when needed. The underlying implementation is a classical array. It is possible to insert elements at an arbitrary position, but this will not be efficient since all successive elements have to be moved. The at(·) operator references an element with a bounds check and may signal an out-of-range error. The classic operator [·] still works, but does not care about bounds. The double-ended queue deque in the STL is a little bit curious. It promises to behave like a vector that also efficiently allows adding elements in front, but these elements are not guaranteed to be stored consecutively. Different from the ADT we have seen in the previous section, each element of the deque can be accessed by [·] and we may insert and delete at any position (this may be inefficient though). Linked implementations of sequences are provided by the singly linked forward_list and the doubly linked list.

Adaptors. These provide a new interface to an existing class. For instance a stack provides functions empty, size, top, push, and pop based on a deque, while hiding the remaining functionality of the underlying container. In particular, except for the top element we cannot access any of the elements. This mechanism is called encapsulation. It is possible to build the stack on top of a vector or list instead. In a similar way queue and priority_queue are provided. Note encapsulation is particularly important for the priority queue. In order to operate efficiently the data structure is implemented as a binary heap, and it is essential that the user cannot destroy that ordering.

Associative. These containers are called associative as they store data items that are retrieved by the value of that element, not by its position in the container. The set implements the mathematical set, containing unique 'keys'. Unlike the mathematical concept, it is assumed that the set is over an ordered universe.

Slide 3 STL example with pair, vector, and priority_queue of pair's, where the second component gives the priority. Also note the cool C++11 features highlighted in the comments.

STL vector of pair
#include <iostream>
#include <string>
#include <vector>
#include <queue>
using namespace std;

using paar = pair<string,int>;            // replacing typedef

int main() {
  vector<paar> club                       // 'modern' initialization
    { {"Jan", 1}, {"Piet", 6}, {"Katrien", 5}, {"Ramon", 2}, {"Mo", 4} };

  for (auto& mem : club) {                // range based for-loop
    cout << mem.first << " ";
  }
  cout << endl;
  return 0;
}

Jan Piet Katrien Ramon Mo
http://cpp.sh/5ncup

STL priority_queue
class Comp {
public:
  int operator() ( const paar& p1, const paar& p2 ) {
    return p1.second < p2.second;
  }
};

int main() {
  vector<paar> club                       // 'modern' initialization
    { {"Jan", 1}, {"Piet", 6}, {"Katrien", 5}, {"Ramon", 2}, {"Mo", 4} };

  using pqtype = priority_queue< paar, vector<paar>, Comp >;

  pqtype pq (club.begin(), club.end());   // wow! converts the vector into a priority_queue
  while ( !pq.empty() ) {
    cout << pq.top().first << " (" << pq.top().second << ") ";
    pq.pop();
  }
  return 0;
}

Piet (6) Katrien (5) Mo (4) Ramon (2) Jan (1)
http://cpp.sh/34b52

The elements are stored in ordered fashion, typically implemented as a balanced binary search tree, like the AVL tree or red-black tree. The ordering of the elements ensures that iterating over a set will visit its elements in order. A multiset allows multiple occurrences of a single key. For (key,value) pairs we have map. The operator [·] returns the value for a given key. (Curiously it also adds that key to the map when it is not present! To test whether the key is present one seems to use find or count. Strange.) Also available is multimap.

Unordered. If one does not need to access elements in an ordered fashion, implementations based on hash tables are provided, like unordered_set, unordered_multiset (hash table with buckets), unordered_map and unordered_multimap. In the unordered associative containers the user can provide a specific hash function, like in the ordered containers one may provide a key comparison operator <. Note this is the C++11 version of the STL: array, forward_list and the unordered containers were added in that edition.
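A small fragment illustrating the remark about operator [·] on a map (an added example, with made-up word frequencies): count and find inspect the map without changing it, whereas [·] silently inserts a default value for a missing key.

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string,int> freq { {"the", 15568}, {"of", 9767} };

    cout << freq.count("and") << endl;     // 0: "and" is not present, map unchanged
    cout << freq["and"] << endl;           // 0: but operator[] has now inserted ("and",0)
    cout << freq.size() << endl;           // 3

    auto it = freq.find("of");             // find does not insert
    if (it != freq.end())
        cout << it->first << " " << it->second << endl;   // of 9767
    return 0;
}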

Collections Framework. A similar approach is taken in Java, in the Collections Framework. Here is an overview table of data structures and their implementations. Quite similar.

  Interface   Hash Table   Resizable Array   Balanced Tree   Linked List   Hash Table + Linked List
  Set         HashSet                        TreeSet                       LinkedHashSet
  List                     ArrayList                         LinkedList
  Deque                    ArrayDeque                        LinkedList
  Map         HashMap                        TreeMap                       LinkedHashMap

1.3 Trees and their Representations

Trees are graphs with a hierarchical structure, used to represent data structures. In the following chapters we will see many types of trees. Some trees have structural restrictions:

• on the number of children in each node (binary trees, B-trees)
• on the number of nodes in each subtree (AVL-tree, B-tree)
• complete trees (binary heap)

In general structural restrictions work well to make access of keys fast, but updating the data structure might be complicated as we have to adhere to the restrictions. Also trees usually have restrictions on the placement of data in the nodes.

• In a binary search tree the keys in the left subtree are smaller than the key at the root of the subtree, and the keys in the right subtree are larger than the key at the root. This is a good organization to find arbitrary keys in the tree.
• In a tree with max-heap ordering [not necessarily complete] the keys at the children of a node are smaller than the key at the node. There is no left-right ordering. This is geared towards finding the maximal element, not towards finding arbitrary keys.

[Figure: left, a binary search tree with root 35; right, a tree with max-heap ordering with root 83.]

Full binary trees. A binary tree is full if each node is either a leaf (has no children) or has two children. (This is sometimes called a 2-tree.) Every binary tree can be made into a full tree in a systematic way by adding leaves below the current nodes (at every position where a child is missing). Sometimes these new leaves are depicted in a different way to show they are special "external" nodes.

It is a mathematical fact that if the tree has n nodes, its extended tree has n+1 leaves.

Lemma 1.3. Let T be a full binary tree. If T has n nodes with children (internal nodes), then T has n + 1 leaves.

Slide 4 Various trees: (1) height-balanced AVL-tree and B-tree, used to represent sets; (2) trees for text compression: Huffman and ZLW algorithms; (3) expression tree. [Figures omitted.]

Proof. By induction, either top-down or bottom-up. Top-down: the root has right and left subtrees that satisfy the property. Bottom-up: the tree has a node with two children, both of which are leaves. When replacing this node and its children by a single leaf we obtain a smaller tree that has one internal node and one leaf less than the original tree.

Complete binary trees. Binary trees can be represented in an array (of nodes) using a numbering known from genealogy¹.

[Figure: a binary tree with nodes numbered 1, 2, 3, ... level by level, and the corresponding array storing the keys at positions 1..12.]

In this way the children of node i are 2i and 2i + 1, and the parent of node i equals ⌊i/2⌋. (Yes, some people start numbering the root with 0: the changes are simple.) In this way we can store the information of each node in the corresponding position of an array, without providing explicit links between parents and children. This approach will waste a lot of space when the tree has relatively many missing children. It will work well for a complete binary tree, which is a binary tree where each level has all its nodes, except perhaps the last level, which is filled from left to right. The representation is used in the binary heap.
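In code the navigation needs no pointers at all. A minimal sketch (not from the handout), using the 1-based numbering above:

// Navigation in the array representation (1-based, root at index 1).
inline int leftChild(int i)  { return 2 * i; }
inline int rightChild(int i) { return 2 * i + 1; }
inline int parent(int i)     { return i / 2; }   // integer division, i.e. floor(i/2)

// With keys stored in t[1..n], node i exists exactly when 1 <= i <= n,
// so "i has a left child" is simply the test  2*i <= n.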

Binary tree representation. The 'common' tree in computer science is the binary tree, represented by nodes with left and right links to children.

template <class T>
class BinKnp {
public:
  // CONSTRUCTOR
  BinKnp ( const T& i, BinKnp<T> *l = nullptr,   // default arguments
           BinKnp<T> *r = nullptr ) : info(i)    // constructor, type T
  { links = l; rechts = r;
  }
private:
  // DATA
  T info;
  BinKnp<T> *links, *rechts;
};

¹Wiki knowledge: It seems the first Ahnentafel using this numbering system was published in 1590 by Michael Eytzinger in Thesaurus principum hac aetate in Europa viventium, Cologne/Köln.

Left-child right-sibling binary tree. A trick to represent trees with no bound on the number of children as binary trees. The link to the left points to the (first node of the) list of children of that node; following the links to the right from there we find the remaining siblings, i.e., the other children. Although this looks like a binary tree, the meanings of left and right have been replaced by a new interpretation. [see Algoritmiek]

[Figure: a general tree with nodes a to m and its left-child right-sibling representation.]

Trie. A trie, pronounced like tree as in retrieval, is a tree structure to store a set of strings. The edges of the tree are labelled by letters, and the words that can be read from root to leaf belong to the stored set of strings. The problem with efficiently implementing a trie is that some nodes have only a single successor, while others have many, so it is not wise to reserve a pointer for every possible child (letter). A compact trie compresses the labels of paths that do not bifurcate into a single edge; the label of that edge will be a string. Not all strings that belong to the set will end in a leaf when the set contains prefixes, like ban and banana. In that case we either label the 'end' nodes, or we attach a special symbol $ to all strings. In the area of computational molecular biology a set of algorithms was developed based on an efficient representation of the set of all suffixes of a given string, the suffix tree.
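One possible node layout (a sketch, not the handout's code): children are kept in a map from letter to child, so a node only pays for the letters that actually occur below it, and end nodes carry a boolean mark. Cleanup of allocated nodes is omitted here.

#include <cstddef>
#include <map>
#include <string>

// Trie sketch; isWord marks the 'end' nodes, needed when the set contains
// prefixes of other stored strings (like "pot" and "potato").
struct Trie {
    bool isWord = false;
    std::map<char, Trie*> child;

    void insert(const std::string& s, std::size_t i = 0) {
        if (i == s.size()) { isWord = true; return; }
        Trie*& c = child[s[i]];           // creates the map entry when missing
        if (c == nullptr) c = new Trie();
        c->insert(s, i + 1);
    }
    bool contains(const std::string& s, std::size_t i = 0) const {
        if (i == s.size()) return isWord;
        auto it = child.find(s[i]);
        return it != child.end() && it->second->contains(s, i + 1);
    }
};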

Example 1.4. For the set { pot, potato, pottery, tattoo, tempo }. To the left a trie (with the nodes that correspond to the set of strings marked) and to the right a compact trie, with $ end markers.

[Figure: the trie and the compact trie for { pot, potato, pottery, tattoo, tempo }.]

References.
See PROGRAMMEERMETHODEN, Week 12: Datastructuren: stapels, rijen en binaire bomen.
See PROGRAMMEERTECHNIEKEN, Weeks 4, 5: Advanced C++ programming; Week 6: Modern C++ programming.
See C++ REFERENCE, http://www.cplusplus.com/reference/stl/

Exercise. a) What is an abstract data structure (ADT)?

b) Describe the ADT stack and sketch two suitable implementations.

c) Describe the ADT union-find (= disjoint sets) and give a suitable implementation. In which algorithm does union-find have an important application?

Jan 2015

a) Describe the abstract data structure Queue: what is stored, what are the standard operations (and what do they do)? Sketch two well-known implementations.

b) Describe the abstract data structure Priority Queue: what is stored, what are the standard operations (and what do they do)? Explain how a binary search tree can be used to implement the Priority Queue.

March 2015

2 Traversal

A tree traversal is a systematic way to visit all nodes of a tree. In this way we

– can obtain information about the values stored in the tree,
– can verify or update the structure of the tree,

but most of all, we learn techniques that can be extended to general graphs.

2.1 Recursion

For binary trees there are three natural, recursive, orders to visit the nodes in the tree:

• preorder (NLR, node-left-right)
• inorder, or symmetric order (LNR)
• postorder (LRN)

Example 2.1. Preorder, inorder, and postorder numbers.

[Figure: the same tree with nodes a to k, numbered in each of the three orders.]

  NLR = preorder:  a b d c e g i h j k f
  LNR = inorder:   b d a g i e j h k c f
  LRN = postorder: d b i g j k h e f c a

The recursive definitions immediately translate into recursive algorithms for the three methods.

recursive traversal
traversal( node )
  if (node != nil)
  then pre-visit( node )        // first
       traversal( node.left )
       in-visit( node )         // second
       traversal( node.right )
       post-visit( node )       // third
  fi
end // traversal

2.2 Euler traversal

An iterative algorithm walks over the tree, stepping from node to node, inspecting ("visiting") the node at the proper moment. The Euler traversal is a generic tree-walk for binary trees that visits each node three times, scanning the 'boundary' of the tree. At each of the visits we move to the next node in a fixed order.

  visit   test               direction    next visit
  1       has left-child     down-left    1
          (otherwise stay)                2
  2       has right-child    down-right   1
          (otherwise stay)                3
  3       at left-child      up           2
          at right-child     up           3

(At visit 1 the node is seen in preorder, at visit 2 in inorder, at visit 3 in postorder.)

We can make this into a traversal algorithm suitable for each of the three recursively defined orders, by adding a while loop. Below is the old-fashioned flowchart for such an algorithm, which we can then reframe into control structures. If we have a tree representation where each node has three pointers, to children and parent, this already is a technique to walk over trees: just follow the pointers up and down the tree. In the standard representation of binary trees going down to either of the children is not a problem. However, we cannot directly move up to the parent of a node. The favourite solution for this is to use a stack. We show specific traversal algorithms with a stack in the next paragraph.

Euler traversal
start at root
while (node not nil)
do pre-visit (1)
   while (has left child)
   do go left                   // "push"
      pre-visit (1)
   od
   in-visit (2)
   while (not has right child)
   do repeat
         post-visit (3)
         go up                  // "pop"
         if nil then exit fi
      until (was left child)
      in-visit (2)
   od
   go right                     // "push"
od

[Flowchart omitted: the same algorithm drawn with the tests 'has left', 'has right', 'was left' and the three visits 1, 2, 3.]

2.3 Using a Stack

To be able to return to nodes in higher levels of the tree we use a stack that stores a selection of the nodes we have seen before. The choice of nodes we store depends on the traversal method.

Preorder. For a preorder walk we may treat the stack as if it is a stack of recursive function calls. Preorder means visit a node, then visit its two subtrees. This is implemented in a straightforward way with a stack. Pop a node, visit it, and push both its children. Repeat. Alternatively we may unroll the tail recursion: the last (left) node pushed is immediately popped, so it seems more efficient to move directly to the next left child. In that way we push the right children of the visited nodes onto a stack, while continuing to the left. When a leftmost branch ends we pop the last node from the stack (which will be the right child of the last node where we went left). In general we will also push nil-pointers for missing right children, but these are ignored when popped. Of course this can be "optimized" by not pushing nil-pointers onto the stack.

pre-order
iterative-preorder( root )
  S : Stack
  S.create()
  S.push( root )
  while ( not S.isEmpty() )
  do node = S.pop()
     if (node != nil)
     then visit( node )          // pre-order
          S.push( node.right )
          S.push( node.left )
     fi
  od
end // iterative-preorder

pre-order (wrapped)
iterative-preorder( root )
  S : Stack
  S.create()
  S.push( root )
  while ( not S.isEmpty() )
  do node = S.pop()
     while (node != nil)
     do visit( node )            // pre-order
        S.push( node.right )
        node = node.left
     od
  od
end // iterative-preorder

Inorder. For inorder (=symmetric) traversal we have to revisit a node when the last (rightmost) node of its left subtree has been visited. Thus we construct a stack, containing the predecessors of the currently visited node at the points where the path moves to the left.

in-order
iterative-inorder( root )
  S : Stack
  S.create()
  node = root
  // move to first node (left-most)
  while (node != nil)
  do S.push( node )
     node = node.left
  od
  while ( not S.isEmpty() )
  do node = S.pop()
     visit( node )               // inorder
     node = node.right
     while (node != nil)
     do S.push( node )
        node = node.left
     od
  od
end // iterative-inorder

[Figure: an example tree with inorder numbers 1 to 11; the stack contains the 'left parents' of the current node.]

Postorder. This seems to be the most complicated traversal when programmed iteratively. We basically follow the generic Euler traversal as described above. The stack contains the complete path to the present node. We push the new node when we go down (i.e., left or right) and we pop the node when we go up to the parent (which is always stored on top of the stack). When we go up, we need to distinguish whether we return from a left child or from a right child. This can be done by keeping track of the previous node during traversal. We can easily check whether that node is a child of the current node

(and from which side). This determines whether we have visited the current node for the second or for the third time.

post-order
iterative-postorder( root )
  S : Stack                     // contains the path from the root; pop on an empty stack yields null
  S.create()
  last = root; node = root
  while (not S.isEmpty() or node != null)
  do if (node != null)
     then if (node.right == last)        // last from right, continue up
          then visit(node)
               last = node
               node = S.pop()
          else S.push(node)
               if (node.left == last)    // last from left, continue right
               then last = node
                    node = node.right
               else last = node
                    node = node.left
               fi
          fi
     else last = node
          node = S.pop()
     fi
  od

The pseudo code shown is based on Wikipedia (both of which had a mistake until spring 2020). The book by Drozdek has a similar approach, but with the various cases slightly better combined, see Slide 5.

What's on the stack. Each of the three traversal algorithms can be better memorized by noting which nodes are on the stack during traversal. In the diagrams below we depict the path from the root to the current node *, for preorder, inorder and postorder, respectively. Nodes that have already been visited are marked by a checkmark, those that are on the stack are marked by an X. Understanding which nodes are kept on the stack is understanding the algorithm.

[Figure: for each of the three orders, the path from the root to the current node *, with visited nodes checked and stacked nodes marked X.]

2.4 Using Link-Inversion

Link-inversion is a binary tree-traversal technique where no additional stack is used. Instead, the stack is threaded through the tree by temporarily inverting the links.

Link-Inversion in linear lists. We first illustrate the technique using a linear list. During traversal the links form two paths, one back to the initial node, one forward to the last node. Using this method we can only perform one traversal at a time in the list. Note we need two pointers into the list: to keep track of the visited node curr and its predecessor prev. We can move up and down the list; the situation is symmetrical.

[Figure: a singly linked list during link-inversion traversal; the links up to prev are inverted and point back towards first, while from curr onwards they still point forward to last.]

Link-Inversion in Binary Trees. Features:
• path to the visited node is link-inverted, to function as a stack
• one bit per node (along the inverted path) to distinguish left/right children
• keep a pointer to the parent (gap)
• the structure is disturbed: only a single traversal at a time
• Euler style: can be used in pre-order, in-order or post-order fashion

We perform the Eulerian iterative traversal for binary trees where each node is visited three times, see Section 2.2. At each moment we know the visit number at the current node, and we act accordingly, moving left, right or up. Along the path from the root to the currently visited node curr the links are inverted. This means that such a pointer no longer indicates its child along the path, but rather points to its parent.

23 Slide 5 Postorder traversal from Drozdek.

postorder (Drozdek)
template <class T>
void BST<T>::iterativePostorder() {
    Stack<BSTNode<T>*> travStack;
    BSTNode<T> *p = root, *q = root;
    while (p != 0) {
        for ( ; p->left != 0; p = p->left)
            travStack.push(p);
        while (p->right == 0 || p->right == q) {
            visit(p);
            q = p;
            if (travStack.empty())
                return;
            p = travStack.pop();
        }
        travStack.push(p);
        p = p->right;
    }
}

Slide 6 Link-inversion. (1) Fragment of binary tree. (2) Three visits at node ∗. (3) After first visit: move down left. (4) After third visit: move up.

[Figure panels omitted.]

When we back up to the root we need to be able to distinguish which of the child pointers is inverted, and we use one bit of information at each node, called the tag. The tag has value 0 if the path to the current node continues with the left child, and value 1 otherwise. Tags not on the path from the root to the current node are undefined. As in the linear list case, we cannot reach the current node from the root (and vice versa), and we keep an additional pointer to the parent of the visited node. When going down-left, the next node is the left child of the current node, and we cyclically swap the curr->left, curr and parent pointers. When going up, we determine the direction using the tag field of the parent. If that tag is 0 we move up from the left, and we cyclically swap the parent->left, curr and parent pointers. See Slide 6.
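As a sketch (not the handout's code), the two cyclic swaps for the left-hand side could look as follows; the node type with its left, right and tag fields is assumed here, and the mirror cases for the right child are analogous.

struct Node {
    int info;
    Node *left, *right;
    int tag;                       // 0: inverted path continues via the left child, 1: via the right
};

// one step down to the left child, inverting the left link
void downLeft(Node*& parent, Node*& curr) {
    Node* next = curr->left;       // the node we move to
    curr->tag = 0;                 // remember that we went down to the left
    curr->left = parent;           // invert the link: it now points to the parent
    parent = curr;
    curr = next;
}

// one step back up, when the parent's tag is 0 (we came from its left child)
void upFromLeft(Node*& parent, Node*& curr) {
    Node* grand = parent->left;    // the inverted link, pointing further up
    parent->left = curr;           // restore the original left child link
    curr = parent;
    parent = grand;
}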

Problem. The algorithm uses the tag at the parent pointer to distinguish whether we return from a left or a right child. Now imagine the tree is a binary search tree, that is, the keys are ordered consistently with the inorder, see Section 3. The question is now: can we avoid the tag when doing link-inversion?

2.5 Using Inorder Threads

Features:
• threads: replace nil-pointers to indicate inorder successors
• can be used to perform stack-less traversal
• need one bit [boolean] per node to mark a thread

• Morris-variant: temporary threads, no extra bit

From basic tree combinatorics we know that a tree with n nodes has n − 1 edges. Now, a binary tree with n nodes has 2n pointers, of which n − 1 are used as edges. That means that about half of the pointers in a binary tree implementation have no direct use. We enrich the standard representation by adding threads pointing to inorder successors. So, in each node that has no right child we replace the right nil-pointer by a reference to the inorder successor of the node. Additionally we introduce a new field for nodes, a boolean, to distinguish threads from edges. For consistency the successor of the last node (in inorder) is taken to be nil. It is marked as a thread. Here we consider right (forward) threads only; there is the possibility to consider also left threads to inorder predecessors.

Below we show an example of a binary tree where right threads have been inserted at nodes without a right child. Threads are dashed (and point upwards).

[Figure: a binary search tree with keys 1 to 12 and root 9; every node without a right child has a dashed thread to its inorder successor; the thread of the last node, 12, is nil.]

Traversal. The additional information stored in threaded trees makes tree traversal possible without recursion or an external stack. To find the inorder successor of a node we consider two cases. When the node has a right child the successor of that node is the leftmost node of the right subtree; this is not necessarily a leaf, the leftmost node may itself have a right subtree. When the node has no right child the successor is directly stored as a thread. In Slide 7 one finds the traversal algorithm using (inorder) threads.

Dynamic trees. Usually trees are dynamic: nodes are added and deleted. When a tree is stored using threads we have to update the threads when the tree is changed. This does not only mean we have to add threads when a node is added, but also, e.g., that we have to think about what happens to a thread that points to a node that is deleted.

Exercises. How to locate the parent of a node in a threaded tree? Can we perform other traversals (preorder, postorder) using inorder threads? Can we add preorder threads, and do preorder traversals? Same question with postorder.

Morris traversal. Using the thread technology it is possible to traverse the tree in inorder by temporarily adding threads to the tree (different from the threaded tree representation, where the threads are marked and remain in the tree). Each time we step downwards into a left subtree we add a thread from the rightmost node in the subtree back to the current node. This thread will enable us to exit from the subtree when it has been visited. The moment the thread is followed it is removed. Of course we constantly have to check whether a right link is a tree edge or a thread, as this is not indicated at the node.

There is a very clever implementation of this idea², see Slide 9. Observe that when we reach a node from above from the parent, or from below from the successor via a thread, this will always be the first or second visit to this node. So we want to distinguish between these two, see Slide 8. (a) When the node has no left child, we cannot return to it from the left subtree, so we have consecutive first and second visits: do the inorder visit, and go right (we do not know whether that follows an edge or a thread, but we will find out). Otherwise, from the current node, look for the predecessor (the rightmost node in the left subtree, which exists as the left child is present). (b) In case this rightmost node, the predecessor, has a right nil-pointer, we have never been in the left subtree and we add a thread from predecessor to current node, and continue with the left child (in its first visit). (c) Alternatively, the predecessor has a right link to the current node, a thread. Thus we have completed that left subtree; hence we remove the thread, visit the current node, and move right from the current node.

References.
A. PERLIS & C. THORNTON. Symbol manipulation by threaded lists. Communications of the ACM 4, 195–204 (1960). doi:10.1145/367177.367202
J.M. MORRIS. Traversing binary trees simply and cheaply. Information Processing Letters 9, 197–200 (1979). doi:10.1016/0020-0190(79)90068-1
See DROZDEK, Section 6.4.3: Traversal through tree transformation. Note the book considers this traversal as 'tree transformation', which is technically correct, but the connection to threads may help understanding the algorithm.
See KNUTH, Section 2.3.1: Traversing binary trees, Algorithm S.
D. GORDON. Eliminating the flag in threaded binary search trees. Information Processing Letters 23, 209–214 (1986). ". . . allows us to distinguish efficiently between a thread and a right-son pointer during searching or insertion" [I have to check this out. HJH]. doi:10.1016/0020-0190(86)90137-7

²To be honest, I got this from YouTube.

Slide 7 ALGORITHM: binary tree traversal using threads. The inorder successor succ for node curr: (a) when curr has a right child, succ is the leftmost node in the right subtree of curr; (b) when curr has a right thread, curr itself is the rightmost node in the left subtree of succ. Note the symmetry.

inorder threads
// assuming root != nil
curr = root;
// to first node in inorder
while (curr->left != nil) do curr = curr->left; od
while (curr != nil)
do inOrderVisit( curr );
   if (curr->Isthread)           // to inorder successor
   then curr = curr->right;
   else curr = curr->right;
        while (curr->left != nil)
        do curr = curr->left; od
   fi
od

Slide 8 Morris tree traversal. After moving to node curr via a right link, we need to check whether this was via a child-link (first visit) or via a thread (second visit).

[Figure, three cases at curr:
(a) no left subtree: 1st and 2nd visit coincide; go right (by edge or by thread?).
(b) new left subtree (pred has a nil right pointer): 1st visit; construct the thread and go left.
(c) been there (pred has a thread to curr): 2nd visit; delete the thread and go right.]

Slide 9 ALGORITHM: Morris tree traversal using temporary threads.

Morris traversal
curr = root;
while (curr != nil)
do if (curr->left == nil)
   then inOrderVisit( curr )
        curr = curr->right
   else // find predecessor
        pred = curr->left
        while (pred->right != curr and pred->right != nil)
        do pred = pred->right od
        if (pred->right == nil)
        then // no thread: subtree not yet visited
             pred->right = curr
             curr = curr->left
        else // been there, remove thread
             pred->right = nil
             inOrderVisit( curr )
             curr = curr->right
        fi
   fi
od

Slide 10 Morris tree traversal. As tree transformation (top, Drozdek) and as temporary threads (bottom). [Figures omitted.]

Exercise. a) Explain which nodes of the tree are on the stack during the iterative preorder traversal that uses a stack. The same for the iterative inorder (symmetric) traversal.

b) Consider the generic Euler tree-walk algorithm in which every node is visited three times (as we worked out for link-inversion). Complete the following table, where visit counts the visit to the current node.

  visit   node-test   direction   new visit
  1       ...

(The table alone is sufficient, with some explanation.)

Jan 2017

We consider a symmetrically (= inorder) threaded tree (inorder threads) with threads to the right only.

a) Copy the binary tree below and add the appropriate threads.

[Figure: a binary tree with nodes numbered 1 to 12.]

b) Give an algorithm that performs a symmetric (inorder) traversal on a threaded tree, without using a stack or recursion.

c) Assume the tree is a binary search tree. Show how we add a value to the tree. Make sure that the threading of the tree remains correct.

Jan 2015

3 Binary Search Trees

3.1 Representing sets

A binary search tree (BST) is a binary tree (with keys in a totally ordered domain) such that if a node holds key K, then all descendants to the left hold a value less than K, and all descendants to the right a value larger than K.

[Figure: the BST condition at a node with key K (keys < K in the left subtree, keys > K in the right subtree), and example binary search trees.]

Lemma 3.1. Let T be a binary search tree. Then the inorder traversal of T visits all the nodes in increasing order.

Proof. By induction on the structure of T. If T has root K, and subtrees T1 and T2, then the inorder traversal visits T1, K, T2. By induction we may assume the inorder traversals of T1 and T2 are increasing. Since all keys in T1 are smaller than K, and those in T2 are larger, we obtain an increasing order.

Binary search trees have a simple algorithm to search for keys, or for the position to add new keys. For fun we give the diagram from TAoCP by Knuth (see the 'Standard Reference Works' on Page 144).

[Diagram: Knuth's Algorithm T: T1 Initialize; T2 Compare (SUCCESS, or go left/right); T3 Move left; T4 Move right; if LLINK/RLINK = Λ then T5 Insert into tree.]

[Figure: a binary search tree with keys 15, 8, 57, 11, 33, 61, 20, 42, 26, 34, 51; its external nodes correspond to the intervals between consecutive keys (-:8, 8:11, ..., 61:-).]
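A minimal C++ sketch of the same search and insert procedure (not the handout's code; the Node type and function names are chosen for this example):

// Search / insert in a binary search tree with int keys.
struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) { }
};

// returns the node holding k, or nullptr when k is not in the tree
Node* find(Node* root, int k) {
    while (root != nullptr && root->key != k)
        root = (k < root->key) ? root->left : root->right;
    return root;
}

// inserts k (if not yet present) and returns the (possibly new) root of the subtree
Node* insert(Node* root, int k) {
    if (root == nullptr) return new Node(k);
    if (k < root->key)      root->left  = insert(root->left,  k);
    else if (k > root->key) root->right = insert(root->right, k);
    // k == root->key: already present, nothing to do
    return root;
}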

ADT Set and Dictionary. Binary search trees are the basis for many implementations of the abstract data structure SET. Such a set represents a [finite] subset of a totally ordered domain D: for each pair u,v ∈ D we have either u < v, u = v or u > v. (This is not an obscure requirement: both numbers and strings are totally ordered. It allows us to go left/right at each vertex depending on the value of the key compared to the value stored at the vertex.) The ADT SET has the following operations:

• INITIALIZE: construct an empty set.
• ISEMPTY: check whether the set is empty (∅, contains no elements).
• SIZE: return the number of elements, the cardinality of the set.
• ISELEMENT: returns whether a given object from the domain belongs to the set, a ∈ A.
• INSERT: add an element to the set (if it is not present, A ∪ {a}).
• DELETE: removes an element from the set (if it is present, A \ {a}).

When SET is implemented using a binary search tree, traversal algorithms can be used to iterate over the elements of a set. We can define an abstract kind of iterator on the set by adding the operation SUCCESSOR, which gives the next element in the set order (of course we also need constants for the 'first' and 'last' element of the set to start and stop the traversal).

The ADT DICTIONARY (or Map, or Associative Array) stores a set of (key,value) pairs. For each key in the dictionary there is only one value stored (so it is functional). The generalization is called a MultiMap. The operation RETRIEVE returns the value for a given key in the dictionary (and a proper error when it does not exist). Likewise, the operations ISELEMENT and DELETE work as in a SET and need only the key as input.

Of course, for INSERT we need both key and value to add the (key,value) pair to the set, provided no pair with the given key is already in the dictionary. (Otherwise, depending on the particular details, we may either overwrite the existing key, or return an error.)

For other possible implementations of SET and DICTIONARY, see Chapter 6 on B-trees and Chapter 8 on Hash Tables. Of course, even a basic implementation using linear lists is possible.

Deletion. The task is to delete a key x while preserving the inorder of the remaining keys in the tree. The method depends on the number of children of x. (0) Of course a leaf can be safely removed. (1) Also, for a node with a single child the process is simple: the subtree of the child is linked to the parent f of x, replacing x. (2) Finally, if x has two children, we locate its inorder predecessor p. Now copy p into x. We can then remove the original p [marked × in the figure below, for deletion], which has at most one child, using the previous cases.

[Figure: the three deletion cases; in case (2) the inorder predecessor p, the rightmost node of the left subtree, replaces the deleted node.]
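The three cases translate into the following sketch (again not the handout's code; it reuses the Node type from the search/insert sketch above):

// Deletion in a BST, following the three cases; returns the (possibly new) root of the subtree.
Node* remove(Node* root, int k) {
    if (root == nullptr) return nullptr;              // key not present
    if (k < root->key)      root->left  = remove(root->left,  k);
    else if (k > root->key) root->right = remove(root->right, k);
    else if (root->left == nullptr || root->right == nullptr) {
        // cases (0) and (1): at most one child, which replaces the node
        Node* child = (root->left != nullptr) ? root->left : root->right;
        delete root;
        return child;
    } else {
        // case (2): copy the inorder predecessor p (rightmost node of the
        // left subtree) into this node, then delete p from the left subtree
        Node* p = root->left;
        while (p->right != nullptr) p = p->right;
        root->key = p->key;
        root->left = remove(root->left, p->key);
    }
    return root;
}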

3.2 Augmented trees

Binary search trees are a classic way to store a set of keys, using an ordering on those keys. As we have such an ordering it might be logical to ask for the kth key in the ordered set, like we can ask for the kth element in an array (k is variable here, not fixed). [Of course, storing the set in an ordered array would make this a constant-time operation; unfortunately, updates to the set would then cost linear time.] Note that in a binary tree we can efficiently search for a given key, but we do not know the inorder number of each key in the tree. An elegant solution is to augment the binary tree with additional information. With each node we store its size, the number of nodes in its subtree (including the node itself). An empty subtree has size 0. Now the kth key can be located based on this information. At the node we determine the number r, the size of its left subtree plus one.

• if k = r we have located the key,
• if k < r, locate the kth element in the left subtree, and
• if k > r, locate the (k − r)-th element in the right subtree.

This concludes the algorithm, except that we have to stop with an error message at empty trees. Updating sizes when adding a key to the tree is easy and can be done while inserting.³ At each node on the path to the leaf where the new key is inserted we add one to the rank. A new leaf gets rank 1.
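The selection step is a direct translation of the three bullets above; a sketch (not from the handout), assuming k is valid, i.e. 1 ≤ k ≤ size of the tree:

// Node augmented with its subtree size, and selection of the k-th key (1-based, inorder).
struct SNode {
    int key, size;                 // size = number of nodes in this subtree
    SNode *left, *right;
};

int sizeOf(SNode* t) { return t == nullptr ? 0 : t->size; }

int select(SNode* t, int k) {      // assumes 1 <= k <= t->size
    int r = sizeOf(t->left) + 1;   // inorder rank of the root within this subtree
    if (k == r) return t->key;
    if (k < r)  return select(t->left, k);
    return select(t->right, k - r);
}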

Example 3.2. Binary tree, nodes marked with their rank. The same tree, with updated ranks, after inserting 23. [Figure omitted.]

Range Query. Binary trees can be used to perform range queries, answering the question which keys in the tree fall in a given interval [ℓ,r]. A related task is to report only the number of these keys, not the actual keys. To find the keys in the interval (or their number) we search in the tree for both ℓ and r. The last node where the paths to ℓ and r are the same is interesting: it is the split node (which must belong to the interval). We start reporting/counting from there. All branches that point 'inside' the right and left paths are included, thus counted or reported, as well as the nodes on the paths that belong to the interval. In essence we report full subtrees, plus single nodes on the boundary paths. The implementation has to distinguish whether a key has actually been found or not. In the first case the search does not reach a leaf. For frequent counting tasks we store rank information, see above.

Example 3.3. We want to count the number of nodes in the interval [12,52]. The split node is 27. On the boundary paths we count six nodes, marked X. Apart from these we have subtrees with 1 + 2 + 4 + 1 nodes, for a total of 14 in the interval.

³Actually, we have to know in advance that the key is not already in the tree.

[Figure: the binary search tree for Example 3.3 with split node 27; the boundary-path nodes that lie in [12,52] are marked X.]

Finding the median. (extra) [Figure omitted.]

3.3 Comparing trees

An extended binary tree is a binary tree where the empty left and right subtrees of nodes missing one or more children are replaced by specially marked leaves. Such an extended tree is full (every node is either a leaf or has two children). The internal path length of a binary tree is the sum of the path lengths (number of edges from the root) to each of the nodes in the tree. The external path length is the total path length to each of the leaves in the extended binary tree.

Example 3.4. A binary tree with n = 12 nodes. Its internal path length equals I = 3+4+2+3+1+3+2+0+2+3+1+2 = 26. To the left its extended binary tree. The external path length equals E = 4+5+5+4+4+4+4+3+3+4+4+3+3 = 50.

35 Slide 11 Knuth (Section 6.2.2) constructs BST’s with 31 most common words in English. (1) Adding the words in decreasing order of frequency, average number of comparisons of 4.042. (2) A perfectly balanced search tree, 4.393 comparisons. (3) Using dynamic programming 3.437 in the optimal tree.

[Trees omitted; the top five frequencies are indicated: the 15568, of 9767, and 7638, to 5739, a 5074. Expected numbers of comparisons: 4.042, 4.393 and 3.437 respectively.]

[Figure to Example 3.4: the extended binary tree with each node labelled by its depth (0 at the root, leaves at depths 3, 4 and 5).]

Lemma 3.5. Let T be a binary tree with n nodes, with internal path length I and external path length E. Then E = I + 2n.

Proof. By induction on the structure of the tree.
Basis. Start with the empty binary tree, with 0 nodes. Its extended version has a single leaf (as root!). For this tree we have n = I = E = 0, as the distance is measured in edges, so the level of the root is 0.
Induction. Assume T is a binary tree with a root and subtrees T1, T2. We assume that for the subtrees the equations Ei = Ii + 2ni hold (with the obvious meaning of the variables). We observe that n = n1 + n2 + 1, and I = I1 + I2 + n1 + n2, as all nodes in the subtrees are one level deeper than in their original position. Similarly E = E1 + E2 + n1 + n2 + 2: recall that the (extended) binary tree with ni nodes has ni + 1 external leaves. So E = E1 + E2 + n + 1 = I1 + 2n1 + I2 + 2n2 + n + 1 = I + 2n.

The path length to a certain node in a tree is one less than the number of nodes visited, i.e., one less than the number of key comparisons to find that key in the tree. So, the average number of key comparisons for a successful search in a given binary tree equals I/n + 1, where n is the number of nodes (keys) and I the internal path length. Here we assume that every node has the same probability of being searched for. In the same way the average number of key comparisons in the case of failure (key not in the tree) equals E/(n+1), assuming each interval between keys has the same probability.

Extremal trees. The worst tree is linear. With n keys the internal path length of a linear tree equals I_n = ∑_{i=0}^{n−1} i = n(n−1)/2, with average search time I_n/n = (n−1)/2.

Lemma. ∑_{i=0}^{n} i · 2^i = (n − 1) · 2^{n+1} + 2.
Proof. By induction. Basis for n = 0: 0 · 2^0 = (−1) · 2^1 + 2.
Induction step. ∑_{i=0}^{n+1} i · 2^i = ∑_{i=0}^{n} i · 2^i + (n+1) · 2^{n+1} = ((n − 1) · 2^{n+1} + 2) + (n + 1) · 2^{n+1} = 2n · 2^{n+1} + 2 = n · 2^{n+2} + 2.

Figure 2: Proof by induction of a summation formula.

The optimal tree is a 'perfect' completely balanced tree, say with h levels, having n = 2^h − 1 nodes. Thus h = lg(n + 1). We sum the depths by level and obtain I_n = ∑_{i=0}^{h−1} i · 2^i. We may use the formula ∑_{i=0}^{n} i · 2^i = (n − 1) · 2^{n+1} + 2, see Figure 2, so I_n = (h − 2) · 2^h + 2. A faster way to obtain the path lengths is via the external path length. The extended tree has n + 1 = 2^h leaves, each at depth h, so E_n = 2^h · h = (n + 1) lg(n + 1). Then I_n = E_n − 2n = (n + 1) lg(n + 1) − 2n. As h ∼ lg n, we get I_n ∼ n lg n, with average search time (1/n) I_n + 1 ∼ lg n (for large n).

Number of binary trees. We count the number of possible binary trees with n (internal) nodes and n + 1 (external) leaves. The diagram above shows the five possible binary trees with three nodes. Let B_n be this number, so B_3 = 5. There is an elegant recursive formula for these numbers. Any tree with n + 1 nodes splits into a root and two subtrees with i and n − i nodes. Hence we have the recursion B_{n+1} = ∑_{i=0}^{n} B_i · B_{n−i}, with B_0 = 1. The sequence starts with 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, ... And indeed we can verify one step of the recursion: B_5 = ∑_{i=0}^{4} B_i · B_{4−i} = B_0·B_4 + B_1·B_3 + B_2·B_2 + B_3·B_1 + B_4·B_0 = 1·14 + 1·5 + 2·2 + 5·1 + 14·1 = 42.
This sequence of numbers is named after Eugène Catalan, a Belgian mathematician. The closed formula for the numbers is B_n = (1/(n+1)) · (2n choose n) = (2n)! / ((n+1)! n!). A very good approximation of this number is known: B_n ∼ 4^n / (n^{3/2} √π), where the symbol ∼ here indicates that the ratio of the two sides approaches 1 (as n goes to infinity).
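A quick check of the recursion (an added snippet, not part of the handout) that reproduces the start of the sequence:

#include <iostream>
#include <vector>
using namespace std;

// Number of binary trees with n nodes (Catalan numbers), via the recursion
// B_{n+1} = sum_{i=0}^{n} B_i * B_{n-i}, with B_0 = 1.
int main() {
    const int N = 12;
    vector<long long> B(N + 1, 0);
    B[0] = 1;
    for (int n = 0; n < N; ++n)
        for (int i = 0; i <= n; ++i)
            B[n + 1] += B[i] * B[n - i];
    for (int n = 0; n <= N; ++n) cout << B[n] << " ";
    cout << endl;   // 1 1 2 5 14 42 132 429 1430 4862 16796 58786 208012
    return 0;
}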

Example 3.6. The BST's resulting from adding a permutation of 1,2,3,4; the remaining trees are the mirror images of these. There are 4! = 24 permutations and 14 binary trees with 4 nodes. This illustrates (for a very small tree size only!) that the number of bad linear trees (with ipl=6) is less than the number of better ones with ipl=4.

Slide 12. Extremal trees.
Balanced (optimal): h levels, n = 2^h − 1 nodes, h = lg(n + 1); In = ∑_{i=0}^{h−1} i · 2^i, En = 2^h · h, so In = (n + 1) lg(n + 1) − 2n and avg = ((n+1)/n) lg(n + 1) − 1.
Linear (worst case): In = ∑_{i=0}^{n−1} i = n(n−1)/2, En = In + 2n = n(n+3)/2, avg = (n + 1)/2.
[Figure: the two extremal tree shapes, with the depth of each node indicated.]

The five binary trees with three internal nodes, written in preorder with N for an internal node and L for an external leaf: NNNLLLL, NNLNLLL, NNLLNLL, NLNNLLL, NLNLNLL.

[Figure: the binary search trees obtained by inserting the permutations 2134, 2143, 1324, 2314, 2413, 1234, 1243, 1342, 1423, 1432, 2341, 2431 of the keys 1, 2, 3, 4, each labelled with its internal path length (ipl = 4, 5 or 6).]

(The numbers in the nodes indicate the values entered into the tree, and follow the inorder numbering.) ♦

Average tree. Binary search trees are generally built by inserting keys in some random order. The permutation of keys determines the structure of the tree. There are more permutations than possible trees: linear trees can only be obtained with one permutation, while there are many permutations for more balanced trees, see the above example. When we consider average trees we mean the average over all possible permutations.
To investigate the searching time in the ‘average’ binary tree, we get intuition by looking at trees with 2^4 − 1 nodes. With 15 nodes there is exactly one completely balanced tree (with four levels, the minimum possible) but there are 2^14 = 16,384 ‘linear’ trees (with 15 levels, the maximum). Searching in such a linear tree takes around 7 steps, the height of the average node. If we start with 15 given keys 1, 2, ..., 15 then there is exactly one permutation that yields each given linear tree structure, if we add the keys in that order in a standard way to an initially empty binary search tree. There are however many permutations (21,964,800 to be precise: oeis.org/A076615) that yield the four level balanced tree, where each of the nodes can be found in at most four steps.
A recursion formula for Sn, the average ipl for a BST of n nodes: [todo explanation]. With this formula a precise evaluation of the average tree can be obtained. It turns out that the average tree has logarithmic search time, close to the optimum.

Sn = 1 + (1/n) ∑_{k=1}^{n} (S_{k−1} + S_{n−k})

Slide 13. Averaging trees.

In = average internal path length of a BST with n nodes. Insert a permutation of 1, ..., n into a BST ⇒ tree structure; we average over all permutations. The permutation determines the left and right subtrees; any key k can be the root (it is the first element of the permutation).
In = (n − 1) + (1/n) ∑_{k=1}^{n} (I_{k−1} + I_{n−k})
[Figure: an example BST built from a permutation of 1, ..., 7.]

Slide 14 Average tree: telescoping equations.

In = average internal path length, n nodes.

so In = (n − 1) + 2(I0 + I1 + ··· + I_{n−1})/n
also I_{n−1} = (n − 2) + 2(I0 + I1 + ··· + I_{n−2})/(n − 1)
subtract: n·In − (n − 1)·I_{n−1} = 2n − 2 + 2·I_{n−1}
thus n·In = (n + 1)·I_{n−1} + 2n − 2

In/(n + 1) = I_{n−1}/n + 2/(n + 1) − 2/(n(n + 1))
I_{n−1}/n = I_{n−2}/(n − 1) + 2/n − 2/((n − 1)n)
...
I1/2 = I0/1 + 2/2 − 2/(1 · 2)
Adding up (the left-hand terms telescope):
In/(n + 1) = I0/1 + O(ln n) − 2n/(n + 1)
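A small C++ sketch (our names, not from the text) that evaluates the recursion In = (n − 1) + 2(I0 + ··· + I_{n−1})/n numerically and compares the average number of comparisons In/n with lg n; it illustrates the logarithmic behaviour of the average tree:

#include <cmath>
#include <iostream>
#include <vector>

int main() {
    const int N = 1000;
    std::vector<double> I(N + 1, 0.0);
    double sum = 0.0;                    // running value of I_0 + ... + I_{n-1}
    for (int n = 1; n <= N; n++) {
        I[n] = (n - 1) + 2.0 * sum / n;  // average internal path length
        sum += I[n];
    }
    for (int n : {10, 100, 1000})
        std::cout << "n = " << n << "  I_n/n = " << I[n] / n
                  << "  lg n = " << std::log2(n) << "\n";
}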

4 Balancing Binary Trees

4.1 Tree rotation In order to obtain new tree structures for the same set of keys we use a simple operation which changes the structure of a tree, while keeping the inorder of the nodes intact. Below we see single rotation at p to the left, with pivot q, or (read in reverse) a single rotation at q to the right with pivot p. Note the operation changes the root of the (sub)tree, a detail which is important when implementing it: we need access to the parent in order to change the link. The structure of the trees can be described as T1 p(T2 qT3) and (T1 pT2)qT3, respectively. Thus tree rotations are a form of associativity. It is a crucial obser- vation that the inorder in both trees reads T1 pT2 qT3.

[Figure: single rotation at p to the left / at q to the right: the tree T1 p (T2 q T3) with root p becomes (T1 p T2) q T3 with root q; the subtree T2 changes parent.]
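A minimal sketch of these two rotations on a pointer-based node (the struct and function names are only illustrative); each function returns the new root of the subtree, so the caller can update the link from the parent:

struct Node {
    int key;
    Node *left, *right;
};

// Single rotation to the left at p, with pivot q = p->right:
// T1 p (T2 q T3) becomes (T1 p T2) q T3.  Returns the new subtree root q.
Node* rotateLeft(Node* p) {
    Node* q = p->right;
    p->right = q->left;     // T2 moves: it becomes the right subtree of p
    q->left = p;
    return q;
}

// Single rotation to the right at q, with pivot p = q->left (the inverse).
Node* rotateRight(Node* q) {
    Node* p = q->left;
    q->left = p->right;     // T2 moves: it becomes the left subtree of q
    p->right = q;
    return p;
}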

Rotations are “universal”, in the sense that every binary tree can be trans- formed into another binary tree (with the same number of nodes) using rotations. This is because each binary tree can be transformed into a right-linear one (which has only descendants to the right) by repeatedly rotating to the right at any node that has a left child.

Perfect trees. A perfect binary tree Tn of height n has n levels, each of which is completely filled. Such a tree has 2^n − 1 nodes; we assume here they are numbered 1, 2, ..., 2^n − 1. Rotating it into a right-linear tree can be done systematically: first rotate halfway, at the root, which has number 2^{n−1}. Then repeat at the new root and the two descendants to the right, at k · 2^{n−2}, k = 1, 2, 3; again at the new root and the six descendants to the right, at k · 2^{n−3}, k = 1, ..., 7; etcetera. Reversing the process we can make a perfectly balanced tree out of a linear one in a very systematic way, alternatingly rotating nodes on the rightmost “backbone” of the tree.

[Figure: the step-by-step transformation between a linear tree and a balanced tree, on a small example with even keys.]

References. See DROZDEK 6.7.1 The DSW Algorithm. T.J. ROLFE. One-Time Binary Search Tree Balancing: The Day/Stout/Warren (DSW) Algorithm. Inroads (ACM SIGCSE) 34 (2002) 85–88. [Special Interest Group for Computer Science Education]

4.2 AVL Trees
Features:

• Height balanced binary search tree, logarithmic height and search time. • Rebalancing after adding a key using (at most) one single/double rotation at the lowest unbalanced node on the path to the new key.

Definition 4.1. An AVL-tree is a binary search tree in which for each node the heights of both its subtrees differ by at most one. The difference in height at a node in a binary tree (right minus left) is called the balance factor of that node.

These trees are named after their inventors, Adelson-Velsky and Landis4. Enforcing the structure ensures the tree is balanced, in the sense that the height of the tree is roughly logarithmic in the number of nodes, guaranteeing logarithmic tree search, while on the other hand the structure is not too strict, and updates for adding and deleting a node can also be performed in logarithmic time.

Example 4.2. An AVL-tree; nodes numbered in inorder. With each node we give its balance factor (except for the leaves, where it is 0).

4 Адельсон-Вельский and Ландис – the transcription of the Russian into English differs (I can’t type Cyrillic, but copy-paste works fine in modern LaTeX).

[Figure: an AVL tree with keys 1–13 numbered in inorder; the balance factors +1 and −1 are shown at the non-leaf nodes.]

Minimal and maximal trees. For a given height h, the AVL tree with the maximal number of nodes has all its balance factors equal to 0; the tree is perfectly balanced, and contains 2^h − 1 nodes.
For a given height h, the AVL tree with the minimal number of nodes has all its balance factors equal to ±1 (except the leaves). A tree with this shape is called a Fibonacci tree. It can be defined recursively. The Fibonacci trees F0 and F1 consist of no nodes, and one node, respectively. Fibonacci tree F_{h+1} consists of a root, and two subtrees which are equal to F_h and F_{h−1}, respectively.
If we denote the number of nodes in F_h by f_h, then we have f_{h+1} = 1 + f_h + f_{h−1}, or (f_{h+1} + 1) = (f_h + 1) + (f_{h−1} + 1), with (f_0 + 1) = 1 and (f_1 + 1) = 2: the recursion of the Fibonacci numbers! But starting two positions later. Let Φ_n be the nth Fibonacci number (starting with Φ_0 = 0 and Φ_1 = 1). According to Binet’s formula, Φ_n = (ϕ^n − (1 − ϕ)^n)/√5, where ϕ = (1 + √5)/2 ≈ 1.62. Note that (1 − ϕ)^n ≈ (−0.62)^n → 0 as n → ∞. So, for large n, Φ_n ≈ ϕ^n/√5.
Then f_h + 1 = Φ_{h+2} ≈ (ϕ²/√5) · ϕ^h ≈ 1.17 · 1.62^h. This is the minimal number of nodes in an AVL-tree of height h. As we have seen, the maximal number of nodes is 2^h − 1, so both bounds grow exponentially with h. Conversely, this shows the height h of the AVL tree is logarithmic in the number of nodes.
In the picture below we draw a Fibonacci tree of height 5; each node marked h is the root of a Fibonacci tree of height h.

[Figure: a Fibonacci tree of height 5, built from subtrees F_{h−1} and F_{h−2}; the labels in the nodes give the heights of the corresponding Fibonacci subtrees.]

This analysis gives a rather precise value for the minimal number of nodes in an AVL tree of height h. A quick and easy observation shows that in an AVL tree of height 2h − 1 or height 2h the top h levels are completely filled, so f_{2h} ≥ 2^h − 1. Thus, we easily obtain a logarithmic bound on the length of the longest path in the tree given its number of nodes.
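A small C++ sketch (our names) tabulating the minimal number of nodes f_h, via f_{h+1} = 1 + f_h + f_{h−1}, next to the maximal number 2^h − 1; both indeed grow exponentially in h:

#include <iostream>

int main() {
    long long fPrev = 0, f = 1;                // f_0 = 0 (empty tree), f_1 = 1
    for (int h = 1; h <= 20; h++) {
        long long maxNodes = (1LL << h) - 1;   // perfectly balanced: 2^h - 1
        std::cout << "h = " << h << "  min = " << f
                  << "  max = " << maxNodes << "\n";
        long long fNext = 1 + f + fPrev;       // Fibonacci-tree recursion
        fPrev = f;
        f = fNext;
    }
}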

4.3 Adding a Key to an AVL Tree
As an AVL tree is a binary search tree, adding a key has to take place at a leaf which is in the proper position of the inorder of keys. After adding the key the AVL property may no longer hold. We then have to restructure the tree. It turns out that it suffices to perform an operation at a single node on the path from the root to the fresh leaf. We discuss here how that position can be found.

Bottom-up view. After inserting the new key we follow the path from the new leaf back to the root, and see how the new balance factors are computed; and in particular where rebalancing has to take place. For simplicity we assume that the new key has been inserted into the left subtree, and also that the balance has been decreased (i.e., the height of left subtree has increased). Note that balance factors outside the path from the root to the new leaf are not changed.

[Figure: the three cases at a node p on the path to the new leaf, with old/new balance factors: a) 0/−1, ok, go up; b) +1/0, ok, STOP; c) −1/−2, rebalance at p with pivot q, STOP.]

We distinguish three cases:

a) balance old/new 0/−1. Balance OK, the height of this tree increased: CONTINUE to the father.
b) balance old/new +1/0. Balance OK, the height of this tree did not change: STOP, the balance factors above this point do not change.
c) balance old/new −1/−2, then changed to 0. New balance out of bounds. A rotation will restore balance (explained later) and the new balance at this point will become 0. Moreover the rebalanced tree at this point has the same height as the original tree: STOP, the balance factors above this point do not change.

From this overview we learn an important fact: at most a single rebalance operation is needed, at the lowest point on the path from the root to the new leaf where the original balance was not 0. At this point the balance ±1 either improves to 0 (and we stop) or changes to inadmissible ±2, where we rebalance (and stop). The new balance factors below this point become ±1 depending on the side the path continues. We did not yet explain how to rebalance, which we will do shortly.

Example 4.3. A tree that results from adding a key 11 to an AVL-tree, with before and after balance factors given. The resulting tree is out of balance. It can be rebalanced by a single rotation at the root to the left.


Figure 3: Tree rotation: double rotation at r to the right; it is composed of two single rotations. One at p to the left, followed by one at r to the right.

[Figure: the AVL tree before and after adding key 11, with old/new balance factors; the root becomes +2 and a single rotation to the left at the root rebalances the tree.]

We give the final details of the algorithm that rebalances the tree after addition of a key, by specifying the operation performed in case c) above, at the lowest node that is out of balance. In order to do so, we use the double rotation as described in Figure 3. A double rotation is in fact a combination of a left and a right rotation.
The precise rotation chosen depends not only on the subtree where the key was added, but also on the subtree chosen a level below that. We distinguish the LL and LR cases. Note that the diagrams do not only specify the rotation chosen, but also the new balance factors. These can be “hard coded” and do not have to be recomputed. The RR and RL cases are not specified here; they are mirror images of the given two cases.

LL-case. At the lowest non-zero balance, the key moves left-left. We perform a single rotation to the right, as sketched. Balance OK, the height of this tree did not change: STOP, the balance factors above this point do not change.

[Figure: LL case; single rotation to the right at the node with balance −1/−2, with pivot its left child q with balance 0/−1; after the rotation both balance factors are 0.]

LR-cases. At the lowest non-zero balance, the key moves left-right. We perform a double rotation to the right, as sketched. If we want to know the new balance factors we have to look at the next level, see the diagrams. Balance OK, the height of this tree did not change: STOP, the balance factors above this point do not change. This LR-case includes an additional situation where q itself was inserted, see bottom diagram.

[Figure: LR cases; double rotation to the right at the node r with balance −1/−2. The new balance factors of p and r depend on the balance of the node q two levels down (0/+1 or 0/−1), or q is the newly inserted node itself (bottom diagram).]

Example 4.4. A tree that results from adding a key 5 to an AVL-tree, with before and after balance factors given. The resulting tree is out of balance. It can be rebalanced by a double rotation at the root to the left.
[Figure: the tree before and after adding key 5, and the rebalanced tree after the double rotation.]

Implementation. At each node the balance factor has to be known. It is sufficient to store the values 0, ±1 as we have shown here, but some implementations find it worthwhile to explicitly store the height of each subtree. Our sketch is close to an iterative implementation, but elegant recursive solutions are possible.
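As an illustration, a minimal sketch of a node that stores its balance factor, together with the LL case rebalancing and its “hard coded” new balance factors; the names are ours and this is not a complete AVL insertion:

struct AvlNode {
    int key;
    int balance;              // height(right) - height(left): -1, 0 or +1
    AvlNode *left, *right;
};

// LL case: p has balance -2 and its left child q has balance -1.
// A single rotation to the right restores balance; both factors become 0.
AvlNode* rebalanceLL(AvlNode* p) {
    AvlNode* q = p->left;
    p->left = q->right;
    q->right = p;
    p->balance = 0;           // new balance factors, no recomputation needed
    q->balance = 0;
    return q;                 // new root of this subtree
}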

4.4 Deletion in an AVL Tree
When deleting a node in an AVL tree one usually reduces to the case where the node has only a single child, like for general BSTs. However, apart from maintaining the inorder of keys, here we additionally need to restructure the tree when it has become unbalanced. This can be done by careful case analysis, like in the case of addition. Here however it is not enough to rebalance at a single node. Instead, after rotation the final subtree may be less tall than the original subtree. That means that also one level up the balance factors have to be adjusted. Nevertheless, as the height of the tree is logarithmic, also the number of operations here is at most logarithmic.

Some people prefer lazy deletion, where a key keeps its position in the tree, but is marked as “deleted”. The drawback of this choice is that the tree will keep growing when insertions and deletions are used intensively.

Example 4.5. Consider the first AVL tree below. (Note that, not accidentally, it is ‘minimal’, i.e., of Fibonacci proportion, see Section 4.2.) Delete key 12 from the AVL tree. At its parent, node 11, the tree becomes unbalanced. We are in an LR situation (which can be seen by the balance factors at 11 and 9), hence we do a double rotation to the right at 11. In the resulting tree (third diagram) the height at 11 has decreased by 1. As a result the balance at parent 8 changes to −2. We now have to rebalance at 8, the root. If we inspect the balance factors at 8 and 5 we see we are in an LL case: a single rotation at 8 to the right is applied.

[Figure: the four stages: the original tree with root 8, the tree after deleting 12 (unbalanced at 11), the tree after the double rotation at 11 (now unbalanced at the root 8), and the final tree after the single rotation at 8, with new root 5.]

References. G.M. ADEL'SON-VEL'SKII, E.M. LANDIS: An algorithm for organization of information, Doklady Akademii Nauk SSSR 146 (1962) 263–266. (Доклады Академии Наук, known here as: Proceedings of the USSR Academy of Sciences; also published in translation by the American Mathematical Society.)

Slide 15. Deletion in AVL-trees.
[Figure: top, the RR cases (single rotation to the left at p with pivot q, for balance +1/+2); below, the RL case (double rotation to the left), with ε = −1, 0, +1 for the possible balance factors of the middle node.]

See DROZDEK 6.7 Balancing a tree; 6.7.2 AVL Trees. See LEVITIN 6.3 Balanced Search Trees: AVL Trees. See WEISS 4.4 AVL Trees. [Recursive implementation, removal seems easy, but it is not clear that insertion performs only one rotation?]

4.5 Self-Organizing Trees
For linear lists we have heuristics for self-organization: whenever a key is located it is swapped with its predecessor or moved to the first position. The same idea can be used for self-organizing trees5. Whenever a key is located we move it upwards, either one level or completely to the root, using rotations. In this section we study the move-to-root heuristic, applied when accessing a key.
Lemma 4.6. Let i < j, while j is a descendant of i in the tree. After accessing k, key j is no longer a descendant of i iff i < k ≤ j.

Proof. First note that if k is not a descendant of i then accessing k does not change the subtree below i. If k > i then it is in the right subtree of i. While moving k to the root it meets i just before their rotation, see the figure below. Now j is either in T2 or in T3. Note that j will cease to be a descendant of i iff it belongs to subtree T3, i.e., iff it satisfies k < j, or if k = j.

[Figure: the rotation where k meets i: before, i has right child k with subtrees T2 and T3; after, k is the parent of i, so T3 (and k itself) are no longer below i, while T2 stays.]

This observation is useful if we want to argue how the subtree relations are after accessing a series of keys. We can now point at a worst-case scenario for this heuristic. When accessing the keys repeatedly in order, we obtain a linear time behaviour per access. Starting with any tree, accessing the keys 1, 2, ..., n in that order will lead to a tree of the form in the first figure below. If we then again access these keys, the tree looks like the second figure when key k has just been accessed. Clearly moving this linear row of keys

5 This means that no structural restrictions are enforced on the tree, but in general searches and insertions are implemented in such a way that the tree (hopefully) will gradually optimize its structure.

up needs a quadratic number of rotations: n + (n − 1) + ··· + 1. This is a linear number of rotations per key.

[Figure: left, the linear tree with keys n, n − 1, ..., 2, 1 from top to bottom; right, the tree during the second round, just after key k has been accessed.]

4.6 Splay Trees
Splay trees are a variant of the move-to-root technique. Again we move a key to the root, but this time not always bottom-up with single rotations: in one case we rotate at the grandparent of the node that is being moved, two levels at a time. This leads to better balance. The “zig-zag” terminology originates from the original paper. Features:
• Simple implementation, no bookkeeping; self-organizing.
• Any sequence of K operations has an amortized complexity of O(K log n).

Below let x be the node that is moved to the root. We distinguish to which subtree x belongs, as seen from its grandparent g.

Zig-Zag step. Double rotation to the right at the grandparent. This is actually the same as two single rotations at the parent and grandparent of x (first p to the left then g to the right).

[Figure: Zig-Zag step; x moves above its parent p and grandparent g, with the subtrees T1, ..., T4 redistributed.]

Zig-Zig step. Two rotations to the right: but here top-down! (So, first at g to the right, then at p to the right.) Here the order differs from the move-to-root approach.

[Figure: Zig-Zig step; after the two rotations x is on top, with p below it and g below p, and the subtrees T1, ..., T4 reattached.]

Zig step. When node x is a child of the root we perform a last single rotation. This is a standard rotation, and we do not repeat the picture here, see Section 4.1.

Example 4.7. As an example consider a seven node “linear” tree, with all descendants to the left. After searching for 1 it is splayed to the top in four steps. When we would rotate 1 to the root with ordinary rotations, the tree would still be linear (last picture) whereas with splaying the height practically halves (imagine a larger example).
[Figure: the intermediate trees while splaying 1 to the root, and, as the last picture, the linear tree that results from ordinary rotations.]

Implementation. There are many implementation choices. Most importantly, when do we splay? Possibilities are at search, unsuccessful search, insertion and deletion. In deletion splaying can be useful. To delete key x first splay x to the

root, remove x to obtain two subtrees T1 and T2. Then splay the last key y in T1 to the root (the predecessor of x). Clearly y has no successor in T1, so no right child in the root of the new tree. At that position we attach the original right subtree T2.

References. D.D. SLEATOR, R.E. TARJAN: Self-Adjusting Binary Search Trees, Journal of the ACM 32 (1985) 652–686 [ACM = Association for Computing Machinery] doi:10.1145/3828.3835
B. ALLEN, I. MUNRO: Self-organizing search trees, Journal of the ACM 25 (1978) 526–535 doi:10.1145/322092.322094
See DROZDEK 6.8 Self-Adjusting Trees. See WEISS 4.5 Splay Trees.

Exercise

This exercise is about AVL trees.

a) Why are AVL trees preferable to standard binary search trees?
We are going to add and delete keys. Consider the following tree B.
[Figure: tree B with root 71; second level 42, 98; third level 23, 56, 89, 110; fourth level 14, 35, 67.]

b) i. What is the result of adding key 7 to tree B?
   ii. What is the result of adding, one after the other, 70 and 69 to the (original) tree B?

Jan 2016

5 Priority Queues

5.1 ADT Priority Queue

In a PRIORITY QUEUE each data element is assigned a priority. The priority values are totally ordered: for each pair we have exactly one of p > q, p = q or p < q. We can add any element to the queue, but we can only retrieve a maximal one. Elements can have the same priority, a fact that is often ignored for simplicity. In practice it may be the case that we want to retrieve the element with lowest cost, so that priority goes up with lower values. We distinguish max-queues and min-queues when we want to make that explicit.
• INITIALIZE: construct an empty queue.
• ISEMPTY: check whether there are any elements in the queue.
• SIZE: returns the number of elements.
• INSERT: given a data element with its priority, it is added to the queue.
• DELETEMAX: returns a data element with maximal priority, and deletes it.
• GETMAX: returns a data element with maximal priority.

Isempty and Size are mentioned here for convenience only: they are easily implemented by keeping a counter. Less common operations are:

• INCREASEKEY: given an element with its position in the queue, it is assigned a higher priority.
• MELD, or UNION: takes two priority queues and returns a new priority queue containing the data elements from both.

The definition of INCREASEKEY is tricky. We assume its position (address of, pointer to, index in, ...) is known when the operation is applied. The reason for this is that there is no operation to locate an element in a priority queue. It is however an important part of Dijkstra’s algorithm for shortest paths to be able to update priorities when better connections in the graph are discovered. That means that we keep an inverse data structure that keeps track of the positions of all elements.
In examples involving priority queues often only the priorities of the elements are shown, not the associated data. The particular information does not help in understanding their mechanics.
Many types of priority queues have been developed, with the goal of minimizing the complexities of the operations. In this text we will discuss the binary heap and the leftist heap. Both are based on binary trees: one with an array representation, the other with linked trees. The binary heap is famous for one of its applications, heapsort.
We give a list of complexities for various implementations of the priority queue, from standard to quite involved. In fact the last type mentioned here (Brodal queue) is of theoretical interest only. Its implementation is very involved, and its complexity bounds have huge constants.

              Binary      Leftist     Pairing      Fibonacci    Brodal
GETMAX        Θ(1)        Θ(1)        Θ(1)         Θ(1)         Θ(1)
INSERT        O(log n)    Θ(log n)    Θ(1)         Θ(1)         Θ(1)
DELETEMAX     Θ(log n)    Θ(log n)    O(log n)†    O(log n)†    O(log n)
INCREASEKEY   Θ(log n)    Θ(log n)    O(log n)†    Θ(1)†        Θ(1)
MELD          Θ(n)        Θ(log n)    Θ(1)         Θ(1)         Θ(1)

† amortized complexity

Fibonacci queues have their complexities as amortized bounds, the others are worst case.

Use cases.
• sorting (heapsort)
• graph algorithms (Dijkstra shortest path, Prim’s algorithm)
• compression (Huffman)
• operating systems: task queue, print job queue
• discrete event simulation
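As a usage illustration: the C++ standard library provides a binary-heap based priority queue, std::priority_queue in the header <queue>; a minimal sketch of a max-queue of integers (it offers no INCREASEKEY or MELD):

#include <iostream>
#include <queue>

int main() {
    std::priority_queue<int> pq;          // max-queue, backed by a binary heap
    for (int x : {42, 7, 98, 13})
        pq.push(x);                       // INSERT
    while (!pq.empty()) {                 // ISEMPTY
        std::cout << pq.top() << " ";     // GETMAX
        pq.pop();                         // DELETEMAX (top and pop are separate)
    }
    std::cout << "\n";                    // prints 98 42 13 7
}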

Problem. We have seen AVL trees, see Section 4.2. These balanced trees can also be used as priority queue. Verify this, and determine the complexity of the operations.

References. G.S.BRODAL: Worst-Case Efficient Priority Queues. In Proc. 7th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), (1996) 52–58. ACM:10.1145/313852.313883 “. . . is based on heap ordered trees where [. . . ] nodes may violate heap order.”“The data structure presented is quite compli- cated.” G.S.BRODAL,G.LAGOGIANNIS,R.E.TARJAN: Strict Fibonacci heaps. Proceedings of the 44th symposium on Theory of Computing (STOC12), (2012) 1177–1184. doi:10.1145/2213977.2214082

5.2 Binary Heaps

TODO. HeapSort. In a binary heap the queue elements with their priorities are stored in a binary tree. The tree in use has two structural restrictions. First the tree is complete, second elements are stored in decreasing priority when moving from root to leaf. There is no left-right order between priorities enforced.

Definition 5.1. A binary heap is a complete binary tree with elements from a partially ordered set, such that the element at every node is greater than (or equal to) the element at its left child and the element at its right child.

As a consequence of the definition, by transitivity, all descendants of a node have a value less than the node itself. The priority queue operations are implemented using two basic internal functions:

• TRICKLEDOWN: Swap an element (given by its position in the heap) with the largest of its children, until it has a priority that is larger than that of both children.
• BUBBLEUP: Swap an element (given by its position in the heap) with its father until it has a priority that is less than that of its father (or is at the root).
These two names seem appropriate, but Wikipedia lists many more: “To add an element to a heap we must perform an up-heap operation (also known as bubble-up, percolate-up, sift-up, trickle-up, swim-up, heapify-up, or cascade-up), ...”

Example 5.2. The leaf 71 is too large for its position (in a max-heap). It moves up in the tree, using BUBBLEUP.
[Figure: the heap as a tree and as an array 98 57 55 42 24 17 3 8 33 10 19 71 13 (positions 1–13); after bubbling up, 71 has moved to position 3 and the array reads 98 57 71 42 24 55 3 8 33 10 19 17 13.]

Root 37 is too small for its position. It moves down, using TRICKLEDOWN.

[Figure: the heap 37 57 55 42 24 17 3 8 33 10 19 5 13 (positions 1–13); after trickling down, 37 has moved to position 4 and the array reads 57 42 55 37 24 17 3 8 33 10 19 5 13.]

The restriction that an element is given by its position is essential: there is no efficient way to locate an arbitrary element in the heap. The operations are now carried out as follows.
• INSERT: add the new element at the next available position of the complete tree, then BUBBLEUP.
• GETMAX: the maximal element is present at the root of the tree.
• DELETEMAX: replace the root of the tree by the last element (in level order) of the tree. That element can be moved to its proper position using TRICKLEDOWN.
• INCREASEKEY: use BUBBLEUP.

The restriction to consider only complete trees means that the most natural implementation uses an array (instead of pointers) to store the tree. If the root is at position 1 [seems natural here] then the children of node x are at 2x and 2x + 1.

[Figure: the heap positions written in binary: 1; 10, 11; 100, 101, 110, 111; 1000, 1001, 1010, 1011, 1100.]
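A minimal sketch of BUBBLEUP and TRICKLEDOWN on such an array, with the root at position 1 and index 0 left unused (the names are ours):

#include <utility>
#include <vector>

// heap[1..n] is a max-heap; the children of position x are at 2x and 2x+1.
void bubbleUp(std::vector<int>& heap, int x) {
    while (x > 1 && heap[x] > heap[x / 2]) {    // larger than its father?
        std::swap(heap[x], heap[x / 2]);
        x /= 2;
    }
}

void trickleDown(std::vector<int>& heap, int x, int n) {
    while (2 * x <= n) {
        int child = 2 * x;                       // left child
        if (child + 1 <= n && heap[child + 1] > heap[child])
            child++;                             // right child is the larger one
        if (heap[x] >= heap[child]) break;       // heap order restored
        std::swap(heap[x], heap[child]);
        x = child;
    }
}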

Example 5.3. In this heap we insert key 29. Place it after the last leaf, and apply BUBBLEUP.

60 98 98

57 55 57 55

42 24 17 3 42 24 17 29

8 33 10 19 5 13 29 8 33 10 19 5 13 3 29 985755422417 3 8 331019 5 13 98575542241729 8 331019 5 13 3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Next we delete the maximal element (98) from the original heap. Replace the root by the last element (13) and TRICKLEDOWN.
[Figure: 13 is moved to the root of the remaining heap and trickles down past 57, 42 and 33, giving the array 57 42 55 33 24 17 3 8 13 10 19 5.]

Heapsort. The binary heap is part of the in situ sorting method heapsort. This means that no extra space is used apart from the array where the data is stored. Part of the array will be sorted, while the remaining values are stored as a binary max-heap. In each step the maximal value is removed from the heap and stored at its position in the array.

• MAKEHEAP: Given an array, reorder its elements so that the array is a binary heap.

We can construct the heap by adding the elements of the array one by one to the heap. Each element will bubble up in the tree. Assuming the heap is complete with the bottom level completely filled, it contains n = 2^L − 1 elements, and has levels 0, 1, ..., L − 1, with level k containing 2^k elements. The elements at level k will move at most k levels up, so the total number of key swaps is ∑_{k=0}^{L−1} k · 2^k = (L − 2) · 2^L + 2; see the Lemma in Figure 2. Thus, expressed in n, the complexity is Θ(n lg n).
We can improve on this by adding the elements bottom-up to the heaps, rather than top-down. So we add each element to the heaps that were constructed below it. Thus the elements trickle down to their position. Now an element at level k will move down at most L − k − 1 levels; indeed half of the elements, those in the leaves, will not move down at all. The complexity now is ∑_{k=0}^{L−1} 2^k (L − k − 1). TODO
The magic is as follows. For |x| < 1, ∑_{k=0}^{∞} x^k = 1/(1 − x). Differentiate to obtain ∑_{k=0}^{∞} k · x^{k−1} = 1/(1 − x)².
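A minimal heapsort sketch using this bottom-up construction (our names; the data occupies positions 1..n of the array, position 0 is unused):

#include <utility>
#include <vector>

// a[1..n] holds the keys; trickle the element at position x down in a[1..n].
static void siftDown(std::vector<int>& a, int x, int n) {
    while (2 * x <= n) {
        int c = 2 * x;
        if (c + 1 <= n && a[c + 1] > a[c]) c++;   // larger child
        if (a[x] >= a[c]) break;
        std::swap(a[x], a[c]);
        x = c;
    }
}

void heapSort(std::vector<int>& a) {
    int n = (int)a.size() - 1;
    for (int x = n / 2; x >= 1; x--)   // MAKEHEAP, bottom-up: leaves never move
        siftDown(a, x, n);
    for (int last = n; last > 1; last--) {
        std::swap(a[1], a[last]);      // the maximum goes to its final position
        siftDown(a, 1, last - 1);
    }
}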

Example 5.4. We construct a heap structure from an unordered array, working upwards at levels 2, 1, 0 and trickling down the nodes at those levels.

[Figure: the four stages of the construction, as arrays (positions 1–13): the input 33 42 17 8 24 13 3 98 57 10 19 5 55; after level 2: 33 42 17 98 24 55 3 8 57 10 19 5 13; after level 1: 33 98 55 57 24 17 3 8 42 10 19 5 13; after level 0: 98 57 55 42 24 17 3 8 33 10 19 5 13.] ♦

References. J.W.J. WILLIAMS: Algorithm 232: Heapsort. Communications of the ACM 7 (1964) 347–348. doi:10.1145/512274.512284

5.3 Leftist heaps
A leftist heap is a binary tree implementation of the priority queue where the nodes are heap ordered like in a binary heap (children have a lower priority than their parent), but which does not maintain the complete form. Instead it tends to be out of balance: paths to the left tend to be longer than those to the right; hence the name.
To define the data structure, we first introduce some concepts. Given a binary tree, its extended binary tree is obtained by adding a special leaf at positions where a node had no child. The resulting tree is full: every node either has two children, or is a leaf. For each node x we define its nil path length, denoted npl(x), as the shortest distance to an external leaf. Recursively that can be written as npl(x) = 0, when x is an external leaf, and npl(x) = 1 + min{npl(left(x)), npl(right(x))}, when x is an internal node.

Definition 5.5. A leftist tree is an (extended) binary tree where for each internal node x, npl(left(x)) ≥ npl(right(x)). A leftist heap is a leftist tree where the priorities satisfy the heap order.

Example 5.6. A binary tree with its extended leaves. All external leaves have npl 0; the other npl values are indicated in the nodes. At two nodes the leftist-tree property is violated. We swap the two subtrees at these positions.
[Figure: the tree with its npl values, before and after swapping the subtrees at the two violating nodes.]

We recursively define an internal function ZIP on this data structure. It takes two leftist trees Ta, Tb with roots a and b, where we assume that priority(a) is at least priority(b), swapping the trees otherwise. The result is the tree with root a, where the right subtree T2 of Ta is replaced by the (recursively obtained) ZIP of T2 and Tb. Additionally the children T1 and ZIP(T2, Tb) are swapped when necessary to maintain the leftist tree property. Perhaps a diagram is helpful.

[Figure: ZIP of two leftist trees with roots a ≥ b: the result has root a, left subtree T1, and right subtree ZIP(T2, Tb).]

The trees to the left are copied from the old to the new structure, so the npl values in their nodes do not change. For the nodes on the rightmost path the npl values do change. The new values can be computed bottom-up. The tree constructed this way has the heap order, but might not be leftist. The proper structure can be obtained by swapping children on that same rightmost path when necessary. This can be done at the same time as the new npl values are computed.
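A minimal recursive sketch of ZIP on pointer-based nodes that store their npl (the names are ours; external leaves are represented by nullptr, with npl 0):

#include <utility>

struct LNode {
    int priority;
    int npl;                  // nil path length of this node
    LNode *left, *right;
};

int npl(LNode* t) { return t == nullptr ? 0 : t->npl; }

LNode* zip(LNode* a, LNode* b) {
    if (a == nullptr) return b;
    if (b == nullptr) return a;
    if (a->priority < b->priority) std::swap(a, b);   // root a has the larger priority
    a->right = zip(a->right, b);                      // zip along the rightmost path
    if (npl(a->left) < npl(a->right))                 // restore the leftist property
        std::swap(a->left, a->right);
    a->npl = 1 + npl(a->right);                       // recompute npl bottom-up
    return a;
}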

Example 5.7. We ZIP two leftist trees, data represented in the nodes, by first joining nodes along the rightmost paths, each time choosing the largest priority. Then we restore the leftist property, bottom-up, by swapping children along the path where the two npl are in the wrong order.

[Figure: zipping the leftist trees with roots 38 and 35: the rightmost paths are merged by priority; then, bottom-up, children are swapped where the two npl values are in the wrong order, and the npl values on the rightmost path are updated.]

One easily shows by induction that the number of nodes in the tree is (at least) exponential in the length of the rightmost path. As the ZIP-operation is performed along that rightmost path, the number of nodes visited is logarithmic in the number of nodes in the tree.

Lemma 5.8. Let T be a leftist tree with root v such that npl(v) = k, then (1) T contains at least 2^k − 1 (internal) nodes, and (2) the rightmost path in T has exactly k (internal) nodes.

Example 5.9. In leftist trees paths to the left cannot be bounded by a function of the npl at the root. Although paths to the left tend to be longer than those to the right, the longest path is not necessarily the one to the left starting at the root.
[Figure: a leftist tree (npl values in the nodes) in which the longest path does not run along the leftmost path from the root.]

The operations are now carried out as follows. We want logarithmic time efficiency. That means we have to be careful: although the npl is logarithmic in the number of nodes in every subtree, the (leftmost) paths do not have such a bound in general.

• INSERT: construct a single node tree and ZIP it with the original tree.
• GETMAX: the maximal element is present at the root of the tree.
• DELETEMAX: delete the node at the root, ZIP the two subtrees of the root into a new tree.
• INCREASEKEY: we cannot simply increase the priority and swap with the parent in this data structure, as this does not guarantee a logarithmic bound. Instead cut the node with its subtree at the position where we want to increase. That subtree is a leftist heap by itself, and its npl at the root is at most the logarithm of the number of its successors. The original tree now most probably no longer is leftist, and we have to recalculate (swapping left and right where necessary) npl(x) at every node upwards. Going up, at each step we either increase this value by one, or we reach the original value and can stop. So this again has a logarithmic bound. Now ZIP the two trees.
• MELD: is performed by a ZIP.

Example 5.10. We insert 27 into a leftist heap with five elements, by zipping the heap with a single element heap. Then we delete the maximal element from the same (original) five element leftist heap, by removing the root and zipping its two subtrees.
[Figure: the five element leftist heap with root 38, the result of zipping it with the single node 27, and the result of deleting the maximum 38 by zipping its two subtrees.]

66 References. C.A.CRANE: Linear lists and priority queues as balanced binary trees. Technical Report STAN-Cs-72-259, Computer Science Dept, Stanford Univ., Stanford, CA, 1972 See WEISS 6.6 Leftist Heaps

5.4 Pairing
Wishful thinking. Never covered so far.

References. M.L. FREDMAN, R. SEDGEWICK, D.D. SLEATOR, R.E. TARJAN: The pairing heap: a new form of self-adjusting heap. Algorithmica 1 (1986) 111–129. doi:10.1007/BF01840439

5.5 Double-ended Priority Queues
A double-ended priority queue is a data structure extending the priority queue, in that it can handle requests for both minimal and maximal elements of the set (instead of only one of these). Many implementations involve two binary heaps (one for min and one for max) that are cleverly balanced.

Dual Structure Method. Store keys twice, once in a min-heap and once in a max-heap. If the minimum is deleted from the min-heap, it has to be deleted from the max-heap too. Hence for each key in one heap we have to know its position in the other in order to be able to find and remove it. Similarly, when during insertion of a key some other keys change position in one heap, we have to update this information in the other heap. Not very hard to do, but a lot of book-keeping.

Example 5.11. Consider a double ended priority queue with values {3,5,9,11,14,15} stored both in a min-heap and a max-heap. In each heap we also store the position of the keys in the other heap, to locate that key when its info has to be updated.

[Figure/arrays (positions 1–6): min-heap values 3 11 5 14 15 9 with maxpos 4 5 6 2 1 3; max-heap values 15 14 9 3 11 5 with minpos 5 4 6 1 2 3. Each heap stores, with every key, the position of that key in the other heap.]

67 Delete min 3 from min-heap, move last key 9 to root, then trickle down, and swap with 5. When moving values the positions of those values in the max-heap move with them. Also update the min-heap positions of the elements stored in the max-heap.

[Figure: the two heaps after 3 has been deleted from the min-heap (min-heap values 5 11 9 14 15), while the copy of 3 is still present in the max-heap.]
As 3 has been deleted from the set of values its copy in the max-heap has to be removed too. In the original min-heap the position of 3 in the max-heap was given as 4, so we can find it. Move the last element 5 of the max-heap to that position (it fits there as it is smaller than the parent, so no bubble-up needed). Update the max-heap position of 5 in the min-heap.

[Figure: the final situation: min-heap values 5 11 9 14 15 and max-heap values 15 14 9 5 11, with the cross positions updated.] ♦

We can also divide the keys over a min-heap and a max-heap. In that case we have to balance the two heaps, and always make sure that the minimal element is indeed in the min-heap (and similar for max). We present a structure that ensures this.

Interval Heaps. An interval heap is like a binary heap except that its nodes contain two numbers instead of one. The numbers are ordered like intervals: when x′, y′ is a child of x, y, then x ≤ x′ ≤ y′ ≤ y, or [x′, y′] ⊆ [x, y]. When the set stored in the heap contains an odd number of elements, the last node contains only one element, and is considered a one element interval [x, x].
It is easy to see that the left elements of the nodes form a min-heap, while the right elements form a max-heap. These two embedded heaps form the heart of the procedures for adding or deleting keys. A key is added at the proper node at the end of the tree. When the key is smaller than the min-key of its parent, we insert the key into the min-heap. Similarly, when the key is larger than the max-key of its parent, we insert the key into the max-heap. This is done as in binary heaps: we swap an element with its parent as long as it is smaller than (larger than, for the max-heap) that parent.
Deletion of the min element likewise is similar to deletion in a binary heap. Remove the min element, and replace it with the element from the last leaf. As long as it is larger than one of its children, swap it with the smallest child. There is one complication: there is no guarantee that the left and right elements form an interval at each step. When the left element (min-heap) is larger than the right element (max-heap) we interchange the two elements, and continue (with the min-heap).

Example 5.12. An interval heap and its embedded min-heap and max-heap.
[Figure: the interval heap with root 2–92, second level 8–80 and 11–75, third level 17–69, 42–70, 44–73, 14–39, and leaves 24–33, 23–65, 55–60, 44–50, 54–57, 61; drawn below it, the min-heap of the left endpoints and the max-heap of the right endpoints.]

Inserting an element into the interval heap is done by starting at the last position and inserting as in an ordinary binary heap. We have to swap the two keys in a node when this is necessary to keep the left value smaller than the right value.

69 2-92 2-92 2-92

11-75 11-75 11-80

44-73 14-39 44-73 14-39 44-75 14-39 80 54-57 61 54-57 61-80 54-57 61-73

References. J. VAN LEEUWEN,D.WOOD: Interval Heaps. The Computer Journal 36 (1993) 209–216. doi:10.1093/comjnl/36.3.209

Exercise

a) Zip the two leftist trees below, so that the result is again a leftist tree.
[Figure: two leftist trees, with roots 40 and 35, second levels 12, 30 and 32, 20, and lower keys 05, 10, 25, 15.]

b) Describe efficient implementations of the priority queue operations Insert, DeleteMax and IncreaseKey using zipping and leftist trees.

Jan 2017

6 B-Trees

In binary search trees a node contains a single key, and the left and right subtrees have keys that are less and larger than the key in the node (respectively). This can be generalized to multi-way search trees with nodes that have a sequence of several keys, say ℓ keys K1, K2, ..., Kℓ, and ℓ + 1 subtrees T0, T1, ..., Tℓ at that node, such that Ki is larger than the keys in T_{i−1} while it is less than the keys in Ti, for 0 < i ≤ ℓ.

[Figure: a binary node with key K and subtrees T0, T1, next to a multi-way node with keys K1, ..., Kℓ and subtrees T0, ..., Tℓ.]
T0 < K1 < T1 < ··· < Kℓ < Tℓ
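A minimal sketch of searching in such a multi-way node (our names; keys[0..l−1] are the sorted keys K1, ..., Kℓ and child[i] is the subtree Ti):

struct MNode {
    int l;                       // number of keys in this node
    int keys[4];                 // K_1 ... K_l, kept sorted (room for order-5 nodes)
    MNode* child[5];             // T_0 ... T_l; nullptr in a leaf
};

// Search key k in the multi-way search tree with root t.
bool search(MNode* t, int k) {
    while (t != nullptr) {
        int i = 0;
        while (i < t->l && k > t->keys[i])    // find the first key >= k
            i++;
        if (i < t->l && k == t->keys[i])
            return true;                      // found in this node
        t = t->child[i];                      // keys in T_i lie between K_i and K_{i+1}
    }
    return false;
}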

6.1 B-Trees
Nodes with a large number of keys are preferred when the search tree is stored on disk. Obtaining new data from disk is several orders of magnitude slower than from internal memory. This means that the number of disk reads should be minimized: getting a node with a single key at each step is slowing down the process. Thus trees are considered with a larger number of keys at each node.

Definition 6.1. A B-tree of order m is a multi-way search tree such that

• every node has at most m children (contains at most m − 1 keys),
• every node (other than the root) has at least ⌈m/2⌉ children (contains at least ⌈m/2⌉ − 1 keys),
• the root contains at least one key, and
• all leaves are on the same level of the tree.

This means that there is a strict depth restriction on the B-tree, every path from root to leaf has the same length. On the other hand, the number of keys per node may vary by a factor of two, for flexibility.

Example 6.2. In a B-tree of order 3, the number of children in a node may vary between 2 and 3, and the number of keys either is 1 or 2. There is no restriction for the root. Here is an example of such a tree with eight nodes on three levels.

[Figure: a B-tree of order 3 with root [G], second level [C] and [I M], and leaves [A], [D E], [H], [J K], [N O].]
In a B-tree of order 5, the number of children in a node may vary between 3 and 5, so the number of keys is between 2 and 4, except for the root.
[Figure: a B-tree of order 5 with root [61], second level [13 29 41 49] and [69 77], and leaves containing the keys 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63, 67, 71, 75, 79, 83.]

Adding Keys. The algorithm to add a key to a B-tree can be summarized as follows.
• Always add the new key to a leaf, which is found by following the path from the root determined by the keys at the internal nodes. If the key can be fitted into the leaf without exceeding the maximal capacity we are done.
• When that leaf is already at its maximal capacity, the key is added but the leaf is split into two leaves; the middle key is moved to the parent to serve as a separator between the leaves. If the parent already is at maximal capacity we repeat and split there.
• Eventually splits can reach the root. We then obtain a new root with a single key and two children (which are formed when splitting the old root).

Example 6.3. We store the numbers 10, 20, 40, 30, 50, 25, 42, 44, 41, 32, 38, 56, 34, 58, 60, 52, 54 and 46 (in that order) in an initially empty B-tree of order 5. Each node (except the root) must hold at least 2 keys, and each node may hold at most 4 keys. After adding the first four keys 10, 20, 40, 30 the tree consists of a single root, which is now full. Adding the fifth key 50 means we have to split the five keys into two new nodes, and promote the middle key 30 one level up, to become the new root.

[Figure: the root [10 20 30 40] (max 4 keys) receives key 50; it splits into [10 20] and [40 50], with new root [30].]

The next three keys 25, 42, 44 can be added in the leaves. The next key 41 arrives in a leaf which already contains four keys; the leaf splits, and its middle key 42 moves one level up to the root.

[Figure: root [30 42] with leaves [10 20 25], [40 41], [44 50].]

We now add four keys 32, 38, 56 and 34, where the last key causes a split of the middle leaf.

[Figure: key 34 makes the middle leaf [32 34 38 40 41] exceed the maximum of 4 keys; it splits and 38 moves up, giving root [30 38 42] with leaves [10 20 25], [32 34], [40 41], [44 50 56].]

Now add two keys 58, 60, which both end up in the rightmost leaf. With five keys [44, 50, 56, 58, 60] it has to be split. The middle key 56 moves to the parent (the root).

[Figure: root [30 38 42 56] with leaves [10 20 25], [32 34], [40 41], [44 50], [58 60].]

The two keys 52, 54 are added, ending up in the same leaf [44 50 52 54]. Adding yet another key 46 destined to that leaf causes it to split. Middle key 50 moves up to the parent, which already is at maximal capacity and splits too, yielding a new root [42].

[Figure: the final tree: root [42]; second level [30 38] and [50 56]; leaves [10 20 25], [32 34], [40 41], [44 46], [52 54], [58 60].]

Variations. The B-tree has keys and (links to) data at every level of the tree. A B+-tree has only keys in its internal nodes, no data; these keys are only used to find the proper leaves. At the leaves all keys are stored together with the data. This means that some keys are present in both the internal nodes and the leaves. Additionally the leaves in the B+-tree are usually linked as a linear list so that the data can easily be accessed in sequential order. There is also the B*-tree, in which the nodes are kept 2/3 full instead of 1/2, by stronger balancing. It turns out that deletion is much more complicated that way.

6.2 Deleting Keys
When deleting a key from a B-tree we restrict ourselves to leaves. For keys in another position we use a trick also used for binary trees: locate the predecessor of the key and copy that to the key to be deleted. Now we may delete the original copy of the predecessor, which is located in a leaf.
When deleting a key from a leaf we consider several cases. If, after removing the key, the leaf still contains enough keys, we can stop. Otherwise, if the contents of the leaf drop below the minimal threshold, we first investigate whether one of the immediate siblings has a surplus key that can be moved to the leaf. If this is possible we can solve the deletion by moving the key. We must take care however: the key is first moved to the parent, in the position between the sibling and the target leaf, and then the key originally at the parent moves to the leaf.

Example 6.4. When deleting 45 we delete predecessor 42 instead, after copying it to the position of 45 (left & middle). Deletion of 25 is done by just removing the key in the leaf. Deletion of 32 (in the original tree to the left) will leave a leaf with just a single key. Its sibling to the left has three keys, one of which can be moved via their common parent (left & right).
[Figure: left, the tree with key 45 above the node [30 38] and leaves [10 20 25], [32 34], [40 41 42]; middle, after 45 has been replaced by its predecessor 42 (the rightmost leaf now reads [40 41]); right, after deleting 32 from the original tree: 25 has moved up into the parent [25 38] and 30 has moved down, giving leaves [10 20], [30 34], [40 41 42].]

Finally, if the leaf does not have enough keys left and is unable to obtain a key from a sibling, we merge the leaf with that sibling. Note we also have to move down the key in the parent between the two leaves: as the parent loses a child it also has to lose one key. Now the parent has one key less, which means it can drop below the minimum. In that case we repeat at that level: try to move a key from a sibling, otherwise merge. Eventually we may lose one level, when the only two children of the root are merged.

Example 6.5. When deleting 41 we cannot move a key from a sibling. Hence we merge the two leaves. Unfortunately the parent now is left with just a single key. We iterate one level up, where we again have to merge two nodes into a new root.

[Figure: deleting 41 from the tree with root [42] and children [30 38] and [50 56]: the underfull leaf is merged with its brother and separator 38 into [32 34 38 40]; now the parent [30] is underfull and is merged with its brother [50 56] and separator 42, giving the new root [30 42 50 56].]

References. R.BAYER,E.MCCREIGHT: Organization and Maintenance of Large Ordered Indexes, Acta Informatica 1 (1972) 173–189. doi:10.1007/bf00288683 See DROZDEK Section 7.1: The Family of B-Trees See WEISS Section 4.7: B Trees

6.3 Red-Black Trees
We focus on B-trees with m = 4. These have 1 to 3 keys in each of their nodes and 2 to 4 children, and are called 2–4-trees. In this section we study a representation of those trees as binary search trees. Additionally they will have a balanced behaviour, like the B-trees themselves.

Example 6.6. We show a 2–4-tree and its representation as a binary search tree. Nodes that originate from the same B-node are grouped by ‘horizontal’ edges. Such trees are called VH-trees. As a third figure the red-black representation of the same configuration. Red nodes are “dependent” on their black parents, and belong to the same B-node.
[Figure: the 2–4-tree with root [30 42] and leaves [20], [37 40 41], [44 50]; the corresponding VH-tree; and the red-black tree with root 42, in which 30, 37, 41 and 50 are red.]

VH-trees and red-black trees are very similar. In VH-trees there are two types of edges, in red-black trees two types of nodes.

[Figure: a B-node with keys a b c becomes a black b with red children a and c; a B-node with keys a b becomes a black node with one red child (either a red a to the left of b, or a red b to the right of a); a single key a remains a single black node.]

Figure 4: Correspondence between nodes in a 2 − 4-tree (top) and in a red-black tree (bottom).

Implementation. The property of being a red node can be coded in the node itself, but also in its parent, where we can store the colour of each of the children. This is somewhat logical, as the node may want to know whether its children belong to the same B-node. For B-nodes with 3, 2 or 1 keys we have a simple correspondence to red-black trees as in Figure 4. For a B-node with two keys we have a choice, with a red child to the left or to the right.

Definition 6.7. A red-black tree is a binary search tree such that each node is either black or red, where
• the root is black,
• no red node is the son of another red node,
• the number of black nodes on each path from the root to an extended leaf (NIL-pointer) is the same.

Example 6.8. The first two trees shown are red-black trees: all paths from root to leaf contain exactly two black nodes. The third tree is not red-black: the path that ends at the left (null-)child of node 44 has only one black node, unlike the other paths.

78 42 40 42

30 44 30 42 30 44

20 40 50 20 37 41 44 20 40 50

37 41 35 50 37 41

It can be shown that every AVL-tree, and in particular the Fibonacci trees, can be red-black coloured.
[Figure: a red-black colouring of the Fibonacci tree of height 5.]

Inserting a key into a red-black tree. Like in any binary search tree we add the new key as a leaf in the proper position of the tree. That leaf must be red, as otherwise we would destroy the property that all paths to leaves have the same number of black nodes. On the other hand, the parent of the new red node may also be red. We show how to remedy that. When a node x and its parent p are both red we first look at the uncle u, which is the brother of the parent. The uncle may very well be an extended leaf, in which case we consider it to be black.

• If the uncle is red, perform a flag-flip. Parent and uncle become black, grandparent becomes red. This will not affect the number of black nodes on paths from root to extended leaf, so it locally solves the problem. However we have to check at the grandparent what the colour of its parent is, and repeat the procedure.

• If the uncle is black, perform a rotation, locally restructuring the tree. As with AVL trees we distinguish LL, LR etc cases leading to single or double rotations, either to the left or to the right. We can stop. Observe the uncle

u is not involved in the operation, except as part of the subtrees involved in the rotation. Below, a flag-flip and a single rotation to the right.
[Figure: left, a flag-flip at grandparent g with red uncle u: p and u become black, g becomes red; right, a single rotation to the right at g with black uncle u, making p the black root of the subtree with children x and g.]

• If the root has been coloured red, make it black.

The rotations to restructure red-red conflicts (with black uncle) are just the classical binary tree rotations that preserve the search tree order. Likewise we can have single or double rotations (in both directions).
[Figure: a single rotation and a double rotation on the keys 20, 30, 40, 42 resolving a red-red conflict.]

Example 6.9. We add 35 to the red-black tree from Example 6.6, as a red leaf, left child of 37, which itself is red. Uncle 41 is also red, so we perform a flag-flip. The newly red coloured node 40 is the child of red 30. Uncle 44 is black. We restructure by a rotation at 40 (double, to the right). Done.
[Figure: the tree after inserting red leaf 35, the tree after the flag-flip at 40, and the tree after the double rotation, which has 40 as its new root.]

It is worthwhile to draw the same trees, but now as B-trees. ♦

80 Example 6.10. From the introductory comments in stl_tree.h for the GNU C++ Library. “Red-black tree class, designed for use in implementing STL associative contain- ers (set, multiset, map, and multimap). The insertion and deletion algorithms are based on those in Cormen, Leiserson, and Rivest, Introduction to Algorithms (MIT Press, 1990), except that . . . ” The Linux kernel seems to be full of red-black trees. “There are a number of red-black trees in use in the kernel. The anticipatory, deadline, and CFQ I/O schedulers all employ rbtrees to track requests; the packet CD/DVD driver does the same. The high-resolution timer code uses an rbtree to organize outstanding timer requests. The ext3 filesystem tracks directory entries in a red-black tree. Virtual memory areas (VMAs) are tracked with red-black trees, as are epoll file descriptors, cryptographic keys, and network packets in the ‘hierarchical token bucket’ scheduler.” lwn.net/Articles/184495/

Also studied are AA-trees, where a red node can only be at the right. They form an implementation for 2–3 trees (with one or two keys in a node). Such trees are more restrictive, which may lead to fewer cases to consider. Red-black trees are considered to originate from the symmetric binary B-trees proposed by Bayer, one of the inventors of the B-tree.

References. R.BAYER: Symmetric binary B-Trees: Data structure and mainte- nance algorithms, Acta Informatica (1972) 290–306. doi:10.1007/BF00289509 L.J.GUIBAS,R.SEDGEWICK: A Dichromatic Framework for Balanced Trees, Proceedings of the 19th Annual Symposium on Foundations of Computer Science (1978) 8–21. doi:10.1109/SFCS.1978.3 See CLRS Chapter 13: Red-Black Trees See DROZDEK Section 7.1.8: 2-4 Trees See WEISS Section 12.2: Red-Black Trees R.RAMAKRISHNAN,J.GEHRKE: Database Management Systems, Ch10: Tree indexes (Lecture Databases)

Exercise

a) What is the maximal and the minimal number of keys that can be stored in a B-tree of order m that is three levels deep?

Consider the following B-tree B of order 5:
[Figure: the B-tree with root [61], second level [13 29 41 49] and [69 77], and leaves with the keys 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63, 67, 71, 75, 79, 83.]

Make sure that after every insertion and deletion we still have a B-tree of order 5. Give a short explanation and (where needed) intermediate results. Unchanged parts of the tree may be indicated schematically.

b) (a) Add key 24 to B.
   (b) Add key 50 to B (i.e., to the original tree).

c) Delete key 77 from B (again the original tree).

Mar 2016

a) What are the defining properties of a red-black tree?

b) Given is the following red-black tree, where ‘red’ nodes are drawn with a (red) circle.
[Figure: the tree with root 20, second level 10 and 35, third level 5, 15, 30, 45, and leaves 25, 40, 50; the red nodes are marked with a circle in the original figure.]
Which tree results if we add
(a) first 55 and then 7?
(b) first 7 and then 55?

Name the successive operations and give the relevant intermediate results.

Jan 2017

7 Graphs

7.1 Representation
A graph G = (V, E) consists of a set V of vertices (or nodes) and a set E of edges (or arcs, lines). We distinguish between directed and undirected graphs. Standard representations are the adjacency matrix and the adjacency lists. If we want to easily query the existence of an edge (u, v) ∈ E, then the matrix is efficient. If we need to check all outgoing edges of a given vertex, then the lists work well.
If the graph contains n vertices and e edges, the size of its adjacency matrix equals O(n²) while the adjacency lists representation uses space O(n + e). Recall that e is at most n², or n(n − 1)/2 for undirected graphs.
Note that these standard representations need not always be the best to choose. In Example 7.5 we have a kind of rectangular map where each vertex has at most four neighbours. In that case each vertex can be represented implicitly by a pair of coordinates (implicit meaning that we do not directly associate objects to the vertices) and the existence of horizontal and vertical edges can be written in an array (which is different from the adjacency matrix!).

Example 7.1. A directed graph and its adjacency matrix, where in order to preserve legibility dots are used to indicate zeros.
[Figure: the directed graph on vertices 1–7.]
      1  2  3  4  5  6  7
  1   ·  1  ·  ·  ·  1  ·
  2   ·  ·  1  ·  ·  ·  ·
  3   ·  ·  ·  ·  1  1  1
  4   ·  ·  ·  ·  ·  ·  ·
  5   ·  ·  ·  1  ·  1  ·
  6   1  ·  ·  ·  ·  ·  1
  7   1  ·  ·  ·  ·  ·  ·

The corresponding adjacency lists.

84 1 2 6 2 3 3 5 6 7 4 5 4 6 6 1 7 7 1

7.2 Graph traversal
Graph traversals are techniques to visit the nodes of a graph. Depth-first search and breadth-first search generalize the tree traversals pre-order and level-order, using a stack and a queue, respectively. As graphs may contain cycles, we have to take care that the traversals do not end in an infinite loop. This is done by marking nodes as visited.

Depth-first search With depth-first search (DFS) we try to move forward along a path as much as possible, storing the unvisited neighbors on a stack. When reaching a dead-end (or already visited nodes) the algorithm back-tracks and pops a node from the stack.

Iterative DFS
// start with unmarked nodes
S.push(init)
while S is not empty
do v = S.pop()
   if v is not marked
   then mark v
        for each edge from v to w
        do if w is unmarked
           then S.push(w)
           fi
        od
   fi
od

Recursive DFS
void DFS(v)
{
   visit(v)
   mark(v)
   for each w adjacent to v
   do if w is not marked
      then DFS(w)
      fi
   od
}

The algorithm implicitly builds a DFS-tree: when we pop and visit a vertex w that was pushed on the stack at the time v was visited we actually follow edge (v,w), which is part of the tree. It can be the case that during a traversal a vertex

is “seen” at several moments from different neighbours. It is pushed to the stack each time, as it is the nature of depth-first to back-track from the last time we have seen the vertex6. There is no unique DFS tree: its structure will depend on the order in which the adjacent nodes are processed at each node.
The (directed) graph may not be strongly connected, and the DFS when started at a vertex will not visit all the vertices. Therefore, in general, we start the DFS at all vertices in succession. This can be done by pushing all vertices on the stack.
The edges in the original (directed) graph are categorized into four types based on the DFS. Tree edges belong to the DFS-tree; forward edges point to successors in the tree, while back edges point to predecessors; finally cross edges point to vertices in another subtree (which was searched earlier).

Example 7.2. We perform a DFS on the given graph, starting at vertex a. When giving the stack the leftmost vertex is at the top of the stack. Vertices have their parents (at the time they were pushed) as a superscript.
Push initial node a on the stack;
Pop a, visit and mark, push adjacent b,e,c (in reverse order)
Pop b, visit and mark, push adjacent g,e [stack gᵇ,eᵇ,eᵃ,cᵃ]
Pop g, visit and mark
Pop e, visit and mark, adjacent b is marked, so need not be pushed [stack eᵃ,cᵃ]
Pop e, marked;
Pop c, visit and mark, push adjacent f [stack fᶜ]
Pop f, visit and mark, push adjacent d [stack dᶠ]
Pop d, visit and mark, adjacent e,d are already marked
empty stack, stop.
[Figure: the graph on vertices a,...,g with visit numbers 1-7, its DFS tree, and the non-tree edges labelled forward, back and cross.]

⁶Marking vertices when they are added to the stack, and not adding marked vertices, yields a valid graph traversal algorithm. It does not have the structure of depth-first.

[a [b [g ]g [e ]e ]b [c [f [d ]d ]f ]c ]a ♦

[Figure: two drawings of a DFS on the vertices 1-7, with the non-tree edges labelled forward, back and cross.]

Articulation points. We present an application of depth-first traversal of a graph. The articulation points (or cut points) of an undirected graph are those vertices whose removal (together with their incident edges) increases the number of connected components of the graph. The articulation points can be found using the depth-first tree of the graph. We use the following characterization.

Lemma 7.3. A vertex v is an articulation point if either

• v is the root, and has two or more children, or
• v is not the root, and v has a (strict) subtree such that no node in that subtree has a back edge that reaches above v.

Example 7.4. An undirected graph and its DFS tree. The articulation points are marked in red. Cutting for example 10 from the graph will separate nodes 11,12,13 from the rest of the graph. Consider vertex 3. Its right subtree in DFS (with root 8) has a successor 10 with a back edge to 2 which is above 3. Its left subtree in DFS (with root 4) however, has no successors with a back edge jumping above 3.

[Figure: an undirected graph on the vertices 1-13 and its DFS tree; the articulation points are marked in red.]

Breadth-first search

Breadth-first search (BFS) visits all vertices in "concentric circles" around the initial vertex. Thus the vertices are selected based on their distance from the initial vertex. We can change the moment vertices are marked. Vertices that are rediscovered later do not have priority over earlier discoveries. Hence we can mark vertices as soon as they are found and put in the queue.

Iterative BFS
// Q is a queue of vertices
// start with unmarked nodes
Q.enqueue(init)
mark init
dist[init] = 0
while Q is not empty
do v = Q.dequeue()
   newdist = dist[v] + 1
   for all edges from v to w
   do if w is not marked
      then Q.enqueue(w)
           mark w
           dist[w] = newdist
      fi
   od
od

Example 7.5. Various tasks, like "flood-fill" (colouring pixels in a closed area) and robot motion planning, can be seen as the application of breadth-first search on (implicit) graphs. The graphs are not always explicitly specified; e.g., vertices may be pixels or 'tiles' in a map, and edges indicate their neighbour relation. In the picture below is a grid graph with one marked node; the other nodes are labelled with their distance to the marked node.

[Figure: a rectangular grid of tiles with some internal walls; each tile is labelled with its breadth-first distance (up to 10) from the marked tile.]

Note that this 'graph' can be represented by two 0/1 matrices H and V, marking for each pair of horizontally respectively vertically adjacent tiles whether the connecting edge is present (a 0 indicates a wall between the two tiles). ♦
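A possible C++ sketch of breadth-first distances on such an implicit grid graph; the encoding of the grid with '.' for free tiles and '#' for walls, and all names used, are our own invention (C++17):

#include <iostream>
#include <queue>
#include <string>
#include <vector>

// Breadth-first distances on an implicit grid graph: each free cell is a
// vertex, edges connect horizontally/vertically adjacent free cells.
std::vector<std::vector<int>> gridDistances(
        const std::vector<std::string>& grid, int sr, int sc) {
    int rows = grid.size(), cols = grid[0].size();
    std::vector<std::vector<int>> dist(rows, std::vector<int>(cols, -1));
    std::queue<std::pair<int,int>> q;
    dist[sr][sc] = 0;                       // mark the start when enqueued
    q.push({sr, sc});
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [r, c] = q.front(); q.pop();
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr >= 0 && nr < rows && nc >= 0 && nc < cols &&
                grid[nr][nc] == '.' && dist[nr][nc] == -1) {
                dist[nr][nc] = dist[r][c] + 1;   // distance is now known
                q.push({nr, nc});
            }
        }
    }
    return dist;
}

int main() {
    std::vector<std::string> grid = {".....", ".##..", ".#...", "....."};
    auto d = gridDistances(grid, 0, 0);
    for (auto& row : d) {
        for (int x : row) std::cout << x << ' ';
        std::cout << '\n';
    }
}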

References. J.HOPCROFT,R.TARJAN: Algorithm 447: efficient algorithms for graph manipulation. Communications of the ACM 16 (1973) 372–378, doi:10.1145/362248.362272. (describes an algorithm for biconnected components, which uses articulation points) See DROZDEK Chapter 8.2: Graph Traversals; 8.6: Connectivity See WEISS Chapter 9.6: Applications of Depth-First Search

7.3 Disjoint Sets, ADT Union-Find

The data structure for disjoint sets is equipped for keeping track of the connected components in an undirected graph while adding edges. Initially no vertices are connected, so each vertex is a component in itself. When an edge is added between two different components, the two are joined, UNION. Another operation FIND is able to retrieve the component to which a vertex belongs. In this way it can be checked whether or not two vertices belong to the same component. To make this possible we assign to each component a "name", which usually is a representative vertex in the component.
As the relative ordering of elements is irrelevant, it is customary to consider a domain D = {1,2,...,n} of vertices. The components form a partition of the

domain.
• INITIALIZE: construct the initial partition; each component consists of a singleton set {d}, with d ∈ D.
• FIND: retrieves the name of the component, i.e., FIND(u) = FIND(v) iff u and v belong to the same set in the partition.
• UNION: given two elements u and v the sets they belong to are merged. Has no effect when u and v already belong to the same set. Usually it is assumed that u,v are representatives, i.e., names of components, not arbitrary elements.

Example 7.6. We want to find a spanning tree of a graph, by repeating the following step. Start with an empty forest. Take an edge in the graph. Check whether the edge can be added to the forest without causing a cycle. When this is possible add the edge, and continue with the new forest. In the graph below, the forest consists of three components. The edge (6,10) is within one of these components and cannot be added as it would form a cycle. Edge (1,5) connects two different components and can be added.

[Figure: a graph on the vertices 1,...,10; the current forest has three components, and the candidate edges (6,10) and (1,5) are marked with question marks.]

Let D = {1,2,...,10}. Then {{1,3,4},{6,7,9,10},{2,5,8}} is a partition of D in three sets. The names (representing elements) of each set are underlined. Thus FIND(6) = 9 = FIND(10), indicating 6 and 10 belong to the same set. At the same time FIND(3) = 1 ≠ 5 = FIND(5), indicating 3 and 5 do not belong to the same set. Now UNION(1,5) will result in the partition {{1,2,3,4,5,8},{6,7,9,10}}, with some assignment of representing elements. ♦

Linked lists. In the naive approach we keep an array which lists for each vertex the set it belongs to. Obviously this gives a very efficient FIND operation. For

UNION we have to rename the representative for one of the sets into the representative of the other. Without further caution we have to loop over the array. By adding a linked list for each of the sets in the partition we do not have to look at every element in D and can perform a faster renaming, by iterating over the shorter of the two lists. Joining lists can be done in constant time (if we keep a circular list, or use pointers to both the first and last elements). The complexity of n UNION operations is O(n log(n)). Worst case builds lists of size 2, 4, 8, etc. Each FIND is performed in constant time.

Example 7.7. Start with a collection of ten objects. The initial configuration starts thus.

         1  2  3  4  5  6  7  8  9  10
  find   1  2  3  4  5  6  7  8  9  10
  next   1  2  3  4  5  6  7  8  9  10
  size   1  1  1  1  1  1  1  1  1  1

After UNION(1,3), UNION(9,10), UNION(5,8), and UNION(6,7) the arrays look as follows. Dots indicate don't cares: size is only important for representative names.

         1  2  3  4  5  6  7  8  9  10
  find   1  2  1  4  5  6  6  5  9  9
  next   3  2  1  4  8  7  6  5  10 9
  size   2  1  .  1  2  2  .  .  2  .

Then UNION(9,6) relabels all elements from the list of 6 to 9, the new name of the combined set, and joins the two lists.

         1  2  3  4  5  6  7  8  9  10
  find   1  2  1  4  5  9  9  5  9  9
  next   3  2  1  4  8  10 6  5  7  9
  size   2  1  .  1  2  .  .  .  4  .
♦

See DROZDEK Chapter 8.4.1: Union-Find Problem

Inverted trees. The operation Union can be improved, at the cost of Find, by representing the sets by upside-down trees. The root of such a tree is the name of the set. Union can be performed in constant time by adding a simple link. In order to keep Find efficient we can store the heights of the trees, and use them to choose which tree to join to the other. The trees can be represented by a simple array indicating the parent of each node. A root either points to itself or to a special value.

Example 7.8. After UNION(1,3),UNION(9,10),UNION(5,8),UNION(6,7) and UNION(6,9) the arrays look as follows.

           1  2  3  4  5  6  7  8  9  10
  parent   1  2  1  4  5  9  6  5  9  9
  height   2  1  .  1  2  .  .  .  3  .

[Figure: the corresponding inverted trees, with roots 1, 2, 9, 4 and 5; node 3 hangs below 1, nodes 6 and 10 below 9, node 7 below 6, and node 8 below 5.]

Path compression. Now Find will take several steps to locate the root of the tree. Luckily the worst case tree has a height logarithmic in the number of nodes. Assuming that the same node is located several times we can improve the complexity of Find. When we have located a root, revisit the path and set all parents along the path to the root we have just found. This will double the number of steps, which does not change the order of complexity, but in subsequent Finds for the same node we find the root in constant time. Figure 5 gives a schematic picture for Union(v,w) and for Find(v) before and after the operation. All nodes along the path benefit from the Find operation, as well as their subtrees. The height values of the trees after path compression do not always match the new trees, but this turns out not to be a real problem.
It can be shown that the complexity of any sequence of Union and Find operations is 'almost linear' in a mathematically well defined sense. The complexity is O(α(n)·n), where the extra factor α(n) is the inverse of the Ackermann function. This function grows very slowly: for all practical purposes α(n) ≤ 5.
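As an illustration, a minimal C++ sketch of this representation; it uses union by size instead of the heights mentioned above, and the struct name and example calls are our own:

#include <iostream>
#include <utility>
#include <vector>

// Disjoint sets as inverted trees, with union by size and path compression.
struct UnionFind {
    std::vector<int> parent, size;
    UnionFind(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; ++i) parent[i] = i;   // every element its own root
    }
    int find(int x) {
        if (parent[x] == x) return x;
        return parent[x] = find(parent[x]);          // path compression
    }
    void unite(int u, int v) {
        u = find(u); v = find(v);
        if (u == v) return;                          // already in the same set
        if (size[u] < size[v]) std::swap(u, v);      // attach smaller tree to larger
        parent[v] = u;
        size[u] += size[v];
    }
};

int main() {
    UnionFind uf(11);                                // domain 1..10, index 0 unused
    uf.unite(1, 3); uf.unite(9, 10); uf.unite(5, 8); uf.unite(6, 7); uf.unite(6, 9);
    std::cout << (uf.find(7) == uf.find(10)) << '\n';   // 1: same component
    std::cout << (uf.find(3) == uf.find(5)) << '\n';    // 0: different components
}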

References. B.A.GALLER,M.J.FISCHER: An improved equivalence algo- rithm. Communications of the ACM 7 (1964) 301–303, doi:10.1145/364099.364331. The paper originating disjoint-set forests. See CLRS Chapter 21: Data Structures for Disjoint Sets See WEISS Chapter 8: The Disjoint Sets Class On cstheory.stackexchange there is a poll to vote for the “Algorithms from the Book”, a collection of the most elegant algorithms. Union-Find is in first place, presently with 103 votes, just beating runner-up Knuth-Morris-Pratt with 98 votes.


Figure 5: Union-Find with path compression after Find (middle and right diagrams)

7.4 Minimal Spanning Trees

A minimal spanning tree in an undirected weighted graph is a tree connecting all the vertices in such a way that the total weight of its edges is minimal over all spanning trees. We consider two greedy algorithms for finding minimal spanning trees in graphs: Prim and Kruskal.

Example 7.9. An (undirected) graph and a minimal spanning tree. [Figure: a weighted graph on the vertices A,...,H and one of its minimal spanning trees.] ♦

Kruskal's algorithm

The algorithm of Kruskal considers the edges one by one, sorted on their weight, and adds them to the constructed spanning tree as far as they do not lead to a cycle

with the already chosen edges. The basic difficulty is checking whether the next edge added will cause a cycle. An important data structure for keeping track of the different components in a dynamically growing graph is Union-Find, which was the topic of Section 7.3 above.
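A possible C++ sketch of Kruskal's algorithm on top of such a Union-Find structure; the edge representation, names and example graph are our own choices (C++17):

#include <algorithm>
#include <iostream>
#include <tuple>
#include <vector>

// a minimal union-find, as in Section 7.3 (path compression only)
struct UF {
    std::vector<int> p;
    UF(int n) : p(n) { for (int i = 0; i < n; ++i) p[i] = i; }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;   // the edge would close a cycle
        p[a] = b; return true;
    }
};

// Kruskal: sort edges by weight, add an edge when its endpoints are still
// in different components. Returns the total weight of the spanning tree.
int kruskal(int n, std::vector<std::tuple<int,int,int>> edges) {   // (w,u,v)
    std::sort(edges.begin(), edges.end());
    UF uf(n);
    int total = 0;
    for (auto [w, u, v] : edges)
        if (uf.unite(u, v)) total += w;
    return total;
}

int main() {
    // a small undirected graph on 4 vertices
    std::vector<std::tuple<int,int,int>> edges =
        {{1,0,1}, {4,0,2}, {3,1,2}, {2,1,3}, {5,2,3}};
    std::cout << kruskal(4, edges) << '\n';   // expected MST weight: 1+2+3 = 6
}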

Prim's algorithm

Unlike the algorithm of Kruskal, the algorithm of Prim forms a tree in every stage. New edges always have one end inside the current tree, and one end outside. Note however, that if w is a vertex outside the partial spanning tree T, and both v1 and v2 are vertices inside the tree, then not both (v1,w) and (v2,w) are part of the final spanning tree: adding both would cause a cycle. This means that we can save on the effort needed to select edges. For each vertex outside of the tree we have to store at most one incident edge, the one of minimal weight. Now rather than finding the minimal edge among all edges we look for one edge among the edges associated to the remaining vertices. That saves work, as there are fewer vertices than there are edges.

Prim
cost[source] = 0        // infinite for other vertices
parent[source] = 0      // code for the root
Q = V                   // all vertices
while Q is not empty
do u is vertex in Q with minimal cost[u]
   remove u from Q
   for each edge (u,v)
   do if v ∈ Q and length(u,v) < cost[v]
      then cost[v] = length(u,v)
           parent[v] = u
      fi
   od
od

Example 7.10. Minimal spanning tree, comparing the algorithms of Kruskal (top) and Prim (bottom). See the course Algoritmiek. [Figure: three intermediate stages of each algorithm on the graph of Example 7.9.]


Correctness. The Prim algorithm constructs a sequence of successive trees T1 ⊆ T2 ⊆ ··· ⊆ Tn by adding in each step a vertex with an incident edge, where Ti contains i vertices, and n is the number of vertices in the given graph. We show that in each step there is a minimal spanning tree T such that Ti ⊆ T. This is true for T1: as that tree has no edges, we can start by choosing T to be any minimal spanning tree of the graph. Assume that in Ti we choose an edge e. If e is also in T we proceed. Otherwise, if e is not in T we construct a new 'goal' tree of the same total weight, so it also is minimal. Adding edge e to T creates a cycle. Let X be the set of vertices of Ti. As e has only one endpoint in X this cycle is partly within X and partly outside. Follow the cycle, and look at the edge e′ where the cycle re-enters X. Note e′ was not chosen by the algorithm, so weight(e′) ≥ weight(e). If we replace e′ by e in T we again obtain a tree T′, the weight of which is not larger than that of T, so it must be minimal too. Our new goal tree for Prim is T′ and we proceed.

Implementation. For implementation and the resulting complexity of the al- gorithm we refer to the section on Dijkstra below. The algorithm is structurally similar. Note that we repeatedly have to choose a minimal cost edge associated to the vertices. That can be done by a linear search among those edges, or we can keep the edges (or rather the vertices that are associated with the edges) in a min priority queue. Note that we need to be able to perform the operation DecreaseKey on a vertex in the queue: when a new incident edge is found that is less costly than the previous one, we have to update the cost.
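A possible C++ sketch of this priority-queue implementation; instead of DecreaseKey it uses the common alternative of pushing duplicate entries and skipping stale ones (names and the example graph are ours, C++17):

#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// Prim's algorithm with a min priority queue; adj[u] holds pairs (v, weight).
int prim(const std::vector<std::vector<std::pair<int,int>>>& adj) {
    int n = adj.size();
    const int INF = 1000000000;
    std::vector<int> cost(n, INF);
    std::vector<bool> inTree(n, false);
    using Entry = std::pair<int,int>;                       // (cost, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    cost[0] = 0;
    pq.push({0, 0});
    int total = 0;
    while (!pq.empty()) {
        auto [c, u] = pq.top(); pq.pop();
        if (inTree[u]) continue;                            // stale queue entry
        inTree[u] = true;
        total += c;
        for (auto [v, w] : adj[u])
            if (!inTree[v] && w < cost[v]) {                // cheaper edge to v found
                cost[v] = w;
                pq.push({w, v});
            }
    }
    return total;
}

int main() {
    // undirected graph: store each edge in both adjacency lists
    std::vector<std::vector<std::pair<int,int>>> adj(4);
    auto addEdge = [&](int u, int v, int w) {
        adj[u].push_back({v, w}); adj[v].push_back({u, w});
    };
    addEdge(0,1,1); addEdge(0,2,4); addEdge(1,2,3); addEdge(1,3,2); addEdge(2,3,5);
    std::cout << prim(adj) << '\n';   // MST weight 6
}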

Example 7.11. Both Prim and Kruskal are defined for undirected graphs. Here we will see that these algorithms cannot be extended to directed graphs in a simple way. First note that the directed problem only makes sense if we fix a node u for which spanning trees rooted at u actually exist.

The algorithms of Prim and Kruskal make choices in a greedy way. Once a choice is made, it will not be reconsidered. We show how this choice will go wrong for certain directed examples. The left picture has two spanning trees, with weight 6 + 2 and weight 6 + 4. Starting with node u, Prim will consider the two edges 4 and 6 and choose the lightest one. Now only one spanning tree remains, not the best one. The right picture has three spanning trees, with weight 4 + 2, weight 6 + 1, and weight 6 + 4. Kruskal will choose the lightest edge 1. Now only one spanning tree remains, not the best one. [Figure: two small directed graphs with root u illustrating these failing choices.] A solution for the directed spanning tree problem was given by Edmonds. ♦

References. J.EDMONDS: Optimum Branchings. J. Res. Nat. Bur. Standards 71B (1967) 233–240, doi:10.6028/jres.071b.032 J.B.KRUSKAL: On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society 7 (1956) 48–50. doi:10.1090/S0002-9939-1956-0078686-7 R.C.PRIM: Shortest connection networks and some generalizations. Bell System Technical Journal 36 (November 1957) 1389–1401, doi:10.1002/j.1538-7305.1957.tb01515 See CLRS Chapter 23.2: The Algorithms of Kruskal and Prim See DROZDEK Chapter 8.5: Spanning Trees See LEVITIN Chapter 9.1& 2: Prim’s & Kruskal’s Algorithm See WEISS Chapter 9.5: Minimum Spanning Tree

7.5 Shortest Paths

Dijkstra's algorithm

Given a start node source in a graph G with positive edge lengths we compute the distances to all other nodes in the graph, together with the tree of shortest paths starting from source. The tree of shortest paths is coded via the parents.

Dijkstra
 1  dist[source] = 0        // infinite for other vertices
 2  parent[source] = 0      // code for the root
 3  Q = V                   // all vertices
 4  while Q is not empty
 5  do u is vertex in Q with minimal dist[u]
 6     remove u from Q
 7     for each edge (u,v)
 8     do if dist[u] + length(u,v) < dist[v]
 9        then dist[v] = dist[u] + length(u,v)
10             parent[v] = u
11        fi
12     od
13  od

Complexity. Let n = |V| and e = |E| be the number of vertices and of edges of the graph, respectively. Initialization of the arrays costs O(n). There are n iterations of the main loop in which we find the minimal element (line 5). If we have to look inside an array this will cost a total of O(n²). Then we look at all outgoing edges of all vertices, which will cost O(e) assuming we use adjacency lists, and O(n²) using the adjacency matrix representation (line 7). The total complexity is O(n + n² + e) ⊆ O(n²).
We may improve on this by using a priority queue, in the form of a binary heap, see Section 5.2, which allows the operation DecreaseKey (for line 9). The number of heap operations totals at most n + e, when each edge finds a shorter distance. The complexity then equals O(e log(n)), which is better than O(n²) if the number of edges is relatively small. Clever implementations of a priority queue have O(log(n)) complexity for DeleteMin and constant time DecreaseKey, which results in an overall complexity of O(n + n log(n) + e) ⊆ O(n log(n) + e).
Dijkstra (in this implementation) does not work when the graph has negative edge-lengths. It assumes that the shortest distance to a vertex is never found via another vertex that has a larger distance.
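A possible C++ sketch of the heap-based variant; like the Prim sketch above it pushes duplicate entries instead of performing DecreaseKey, which keeps the same O(e log n) bound (names and the example graph are ours, C++17):

#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// Dijkstra with a binary heap (std::priority_queue); adj[u] holds (v, length).
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int,int>>>& adj, int source) {
    const long long INF = 1000000000000000000LL;
    std::vector<long long> dist(adj.size(), INF);
    using Entry = std::pair<long long,int>;                 // (distance, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;                          // outdated entry
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) {                    // shorter path to v found
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}

int main() {
    // small directed graph: adj[u] = list of (v, length)
    std::vector<std::vector<std::pair<int,int>>> adj = {
        {{1, 6}, {2, 9}}, {{2, 2}}, {{0, 3}}};
    for (long long d : dijkstra(adj, 0)) std::cout << d << ' ';   // 0 6 8
    std::cout << '\n';
}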

Bottleneck paths. With a little thought Dijkstra's algorithm can be changed to compute another kind of optimal paths, where we try to find the "widest path" to every vertex. Here the value of an edge indicates a restriction (maximal width, height, capacity) and the restriction along a path is the minimum over its edges. We are looking for the maximal possible value over all paths. Whereas Dijkstra computes dist[v] = min{dist[v], dist[u] + length(u,v)}, for the bottleneck variant we choose bott[v] = max{bott[v], min{bott[u], width(u,v)}} when considering a new edge (u,v).

Example 7.12. An (undirected) graph and, with initial vertex C, (a) a shortest distance tree, and (b) an optimal bottleneck tree. The values at the vertices indicate final distances and bottleneck restrictions, respectively.

[Figure: the weighted graph on the vertices A,...,H, its shortest-distance tree from C, and its bottleneck tree from C, with the computed values written at the vertices.]

References. E.W. DIJKSTRA: A note on two problems in connexion with graphs. Numerische Mathematik 1 (1959) 269–271. doi:10.1007/BF01386390 See LEVITIN Chapter 9.3: Dijkstra’s Algorithm See WEISS Chapter 9.3.2: Dijkstra’s Algorithm

All-Pairs Shortest Paths

The problem of all-pairs shortest paths asks to determine a complete matrix of distances between every pair of vertices. Often also the corresponding shortest paths are required (sometimes implicitly, i.e., the paths are not explicitly listed, but can be retrieved from the data).
The algorithm of Floyd uses the technique of dynamic programming to compute the shortest distances. Let n be the number of vertices; we assume the vertices are the numbers 1,...,n. For each k = 0,1,...,n a matrix Ak of partial solutions is computed, where Ak[i, j] denotes the distance from vertex i to vertex j using paths that do not pass any vertex with number larger than k. In particular the paths for matrix A0 do not pass any vertex; they can only consist of a single edge (i, j) (where no vertices are passed). When matrix Ak−1 has been determined we compute Ak. The shortest path from i to j without using vertices k + 1 or larger either uses vertex k or not. The distance via k equals Ak−1[i,k] + Ak−1[k, j], while the distance that does not use k has already been determined. Hence Ak[i, j] is the minimum of Ak−1[i, j] and Ak−1[i,k] + Ak−1[k, j].
Computations for Ak do not change the values at [k, j] or [i,k] in the matrix Ak−1, so when implementing this algorithm we may in fact use only a single matrix.

Floyd-Warshall
// initially dist equals the adjacency matrix
for each edge (i,j)
do prev[i,j] = i
od
for k from 1 to n
do for i from 1 to n
   do for j from 1 to n
      do if dist[i,k] + dist[k,j] < dist[i,j]
         then dist[i,j] = dist[i,k] + dist[k,j]
              prev[i,j] = prev[k,j]
         fi
      od
   od
od
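A small C++ sketch of this triple loop on the three-vertex graph used in the example below; the INF convention and the in-place single matrix are our own choices:

#include <iostream>
#include <vector>

const int INF = 1000000000;   // 'no edge'; large but safe against overflow below

// Floyd-Warshall: dist starts as the adjacency matrix (INF where no edge,
// 0 on the diagonal) and is updated in place.
void floyd(std::vector<std::vector<int>>& dist) {
    int n = dist.size();
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (dist[i][k] < INF && dist[k][j] < INF &&
                    dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}

int main() {
    // the three-vertex graph of Example 7.13
    std::vector<std::vector<int>> dist = {
        {0, 6, 9}, {INF, 0, 2}, {3, INF, 0}};
    floyd(dist);
    for (auto& row : dist) {
        for (int d : row) std::cout << d << ' ';
        std::cout << '\n';
    }
    // expected: 0 6 8 / 5 0 2 / 3 9 0
}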

Example 7.13. We start with a small directed graph on three vertices, with edges 1→2 of length 6, 1→3 of length 9, 2→3 of length 2, and 3→1 of length 3. The initial matrix A0 equals the adjacency matrix of the graph.

        ( 0  6  9 )        ( 0  6  9 )        ( 0  6  8 )        ( 0  6  8 )
  A0 =  ( ∞  0  2 )  A1 =  ( ∞  0  2 )  A2 =  ( ∞  0  2 )  A3 =  ( 5  0  2 )
        ( 3  ∞  0 )        ( 3  9  0 )        ( 3  9  0 )        ( 3  9  0 )

Here are the computations. In each step we only need to compute two new values. The values on the diagonal remain unchanged, and also those on the k'th row and column do not change when computing Ak.

  A1[2,3] = min{ A0[2,3], A0[2,1] + A0[1,3] } = min{2, ∞ + 9} = 2
  A1[3,2] = min{ A0[3,2], A0[3,1] + A0[1,2] } = min{∞, 3 + 6} = 9 < ∞
  A2[1,3] = min{ A1[1,3], A1[1,2] + A1[2,3] } = min{9, 6 + 2} = 8 < 9
  A2[3,1] = min{ A1[3,1], A1[3,2] + A1[2,1] } = min{3, 9 + ∞} = 3
  A3[1,2] = min{ A2[1,2], A2[1,3] + A2[3,2] } = min{6, 8 + 9} = 6
  A3[2,1] = min{ A2[2,1], A2[2,3] + A2[3,1] } = min{∞, 2 + 3} = 5 < ∞
♦

This algorithm is closely related to that defined by Warshall, which computes the transitive closure T of a directed graph, i.e., a Boolean matrix for which T[i, j] holds iff there is a path from i to j. We start with a Boolean matrix, an unweighted adjacency matrix, and replace the operations min and + by ∨ and ∧. The algorithm is also strongly related to Kleene's algorithm to transform a finite state automaton into a regular expression. The basic recursion there is L(i, j) = L(i, j) + L(i,k) · L(k,k)* · L(k, j).

Warshall
// initially conn equals the adjacency matrix
// with additionally 1=true on the diagonal
for k from 1 to n
do for i from 1 to n
   do for j from 1 to n
      do conn[i,j] = conn[i,j] or ( conn[i,k] and conn[k,j] )
      od
   od
od

Example 7.14. This is how Floyd’s algorithm was first presented; I tried to follow closely the original formatting.

ALGORITHM 97
SHORTEST PATH
ROBERT W. FLOYD
Armour Research Foundation, Chicago, Ill.
procedure shortest path (m,n); value n;
  integer n; array m;
comment Initially m[i, j] is the length of a direct link from point i of a network to point j. If no direct link exists, m[i, j] is initially 10^10. At completion, m[i, j] is the length of the shortest path from i to j. If none exists, m[i, j] is 10^10. Reference: WARSHALL, S. A theorem on Boolean matrices. J.ACM 9 (1962), 11-12;
begin integer i, j, k; real inf, s;
  inf := 10^10;
  for i := 1 step 1 until n do
    for j := 1 step 1 until n do
      if m[j,i] < inf then
        for k := 1 step 1 until n do
          if m[i,k] < inf then
          begin s := m[j,i] + m[i,k];
            if s < m[j,k] then m[j,k] := s;
          end
end shortest path
♦
Although conceptually very simple, it is more efficient to compute the strongly connected components of the graph to determine the connectivity between vertices. Algorithms to do this are based on depth-first traversal of the graph.

References. ROBERT W. FLOYD. Algorithm 97: Shortest Path. Communica- tions of the ACM 5 (June 1962) 345. doi:10.1145/367766.368168 STEPHEN WARSHALL. A theorem on Boolean matrices. Journal of the ACM 9 (January 1962) 11–12. doi:10.1145/321105.321107

See DROZDEK Chapter 8.3: Shortest Paths
See LEVITIN Chapter 8.4: Warshall's and Floyd's Algorithms
See WEISS Chapter 10.3.4: All-Pairs Shortest Path

7.6 Topological Sort

Definition 7.15. Let G = (V,E) be a directed graph. A topological ordering [or sort] of G is an ordering (v1,...,vn) of V such that if (vi,vj) ∈ E then i < j.
In technical terms a topological sort is a linear ordering that is compatible with a partial order.

[Figure: a directed acyclic graph on the vertices 1,...,8.]

Example 7.16. Topological sorts are usually not unique. The given graph has a topological sort (1,2,...,7,8) but also (2,1,4,6,3,5,8,7). And more. ♦

Clearly a graph that has a topological ordering cannot have cycles. The converse is also true.

Theorem 7.17. Let G = (V,E) be a directed graph that has no cycles. Then G has a topological ordering.

Proof. We start with an observation: G has a source, a node without incoming edges. Assume to the contrary that G has no source, so every node has an incoming edge. Then, following edges backwards, we obtain a path that eventually enters the same node twice. Hence we have found a cycle; a contradiction.
Using this observation we prove the theorem by induction. Take a source v of G. Then G − {v} is also an acyclic directed graph. Hence by the induction hypothesis G − {v} has a topological ordering (v1,...,vn) of V − {v}. Writing v in front we obtain a topological ordering (v,v1,...,vn) for G: any edge involving v must be outgoing.

A useful observation is that the proof above in fact gives an algorithm that performs a topological sort. Find a source in the graph, write it as first element in the topological ordering, delete its (outgoing) edges, and repeat. Wikipedia calls this Kahn’s algorithm.

An efficient implementation does not really remove the edges. It iterates over the edges (in an adjacency lists representation) and for each node counts the incoming edges. We keep a list of sources. Choose a source and, when deleting it, look at its outgoing edges and decrease the count for the targets of these edges. When a count drops to 0 add that vertex to the list of sources. This algorithm considers every edge exactly twice, once when counting incoming edges, and a second time when the edge is 'removed' together with its source: linear time.
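A compact C++ sketch of this counting implementation (function name and example graph are our own):

#include <iostream>
#include <queue>
#include <vector>

// Kahn's algorithm: count incoming edges, repeatedly remove a source.
// Returns a topological ordering (empty if the graph has a cycle).
std::vector<int> topoSort(const std::vector<std::vector<int>>& adj) {
    int n = adj.size();
    std::vector<int> indeg(n, 0);
    for (int u = 0; u < n; ++u)
        for (int v : adj[u]) ++indeg[v];
    std::queue<int> sources;                 // list of current sources
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) sources.push(u);
    std::vector<int> order;
    while (!sources.empty()) {
        int u = sources.front(); sources.pop();
        order.push_back(u);
        for (int v : adj[u])                 // 'remove' the outgoing edges of u
            if (--indeg[v] == 0) sources.push(v);
    }
    if ((int)order.size() < n) order.clear();   // a cycle was present
    return order;
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {3}, {3}, {}};
    for (int v : topoSort(adj)) std::cout << v << ' ';   // e.g. 0 1 2 3
    std::cout << '\n';
}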

Example 7.18. Count incoming edges in Example 7.16. Note the sources. Repeatedly remove a source and update the counts.

  1  2  3  4  5  6  7  8   sources   remove
  0  0  0  2  2  2  2  2   1,2,3     1
     0  0  1  1  1  2  2   2,3       2
        0  0  1  1  2  2   4,3       4
        0     1  0  1  1   6,3       6
        0     1     1  1   3         3
              0     1  0   5,8       5
                    0  0   7,8       7
                       0   8         8
♦

References. ARTHUR B.KAHN. Topological sorting of large networks. Com- munications of the ACM 5 (1962) 558–562. doi:10.1145/368996.369025 See CLRS Chapter 22.4: Topological Sort See DROZDEK Chapter 8.7: Topological Sort

Exercise
a) What is a topological ordering of the nodes of a directed graph G = (V,E)? Be precise.
b) Below is a schematic recursive function for depth-first search.

DFS (KnoopType x)
{ // assumption: node x has not been visited yet
  visit x
  mark x as visited
  for (each node w reachable from x)
  { if (w not visited)
    { DFS (w) }
  }
  // node x completely handled: push x onto stack S
}

Argue that, after DFS has been called on the successive nodes without incoming edges, the nodes on the stack S form a topological ordering of the acyclic graph.
c) Given are an adjacency-list representation of an acyclic graph G and a list containing a topological ordering for G. Describe how the longest path in G can be determined efficiently. Pseudo-code is welcome.
March 2016

The graph below has negative weights (and is only partially directed).

[Figure: a partially directed graph with edge weights 1, -2, 3, 4, 4, 3, 6, 2 and 1.]

a) Show that Dijkstra's algorithm is not applicable to graphs with negative weights, by applying it to the graph above. Choose a suitable start node, run Dijkstra and check the result.

Floyd's algorithm (all-pairs distances) is suitable for negative weights, as long as the graph contains no negative cycles.
b) Give Floyd's algorithm.

Jan 2017

8 Hash Tables

Hash tables form an implementation of the ADTs Set and Dictionary (or associative array), see Section 3.1, as an alternative to the several types of balanced trees, see Chapter 4 and Chapter 6.⁷

                  find          insert        delete
                  av     wc     av     wc     av     wc     order
  unordered list  n             1             n             no
  bin tree        log n  n      log n  n      log n  n      yes
  balanced        log n         log n         log n         yes
  hash table      1      n      1      n      1      n      no

  av = average, wc = worst case

We like to store a number of keys from a huge domain (strings, numbers) in a table where necessarily the number of available addresses is much smaller than the number of possible keys. We investigate ways of directly computing the address where each key is stored, with a simple function h(K) based on the key K alone. As the number of possible keys is large compared to the number of addresses, we have to consider the case that two keys K, K′ are destined for the same address, h(K) = h(K′). Such keys are called synonyms. When two synonyms actually have to be stored in the same table, we speak of a collision. Here we consider three cases.
8.1 The keys are fixed and known in advance, so we can avoid collisions: perfect hashing.
8.2 After a collision the key is stored elsewhere: open addressing.
8.3 The table is designed to store several keys at the same address: chained hashing.

8.1 Perfect Hash Function

A perfect hash function is a mapping from the set K of keys into the address space of the table T without collisions. This can only be attempted when the set of keys that is to be placed is known in advance. Examples are the keywords of a specific programming language, or a dictionary of the most frequent words in a natural language. With such a perfect hash function h it is possible to test membership in K in constant time (plus the time needed to compare two keys), assuming h(w) can be

7The C++ STL has set and map as implementations for Set and Dictionary where the keys are stored in order (from small to large). Typically this is done using some kind of balanced tree. It also has unordered_set and unordered_map where this order is not respected. Typically this is done using hash-tables, the topic of this chapter.

computed in constant time for any possible key w. We can use the observation that w ∈ K iff T(h(w)) contains w.
An example. Cichelli uses the set of 36 keywords of the programming language Pascal to illustrate the method. He proposes a hash function for strings of the form h(w) = |w| + v(first(w)) + v(last(w)), where v is a mapping from characters to integers, and first(w) and last(w) are the first and last letters of w. Suitable values can usually be found using clever backtracking.

Example 8.1. An example is the set of keywords of the programming language Pascal. The letters of the alphabet get their values according to the following table. Letters not mentioned get value 0.

   a  b  c  f  g  h  i  l  m  n  p  r  s  t  u  v  w  y
  11 15  1 15  3 15 13 15 15 13 15 14  6  6 14 10  6 13

Thus, for example h(case) = 4 + 1 + 0 = 5. Also h(label) = 5 + 15 + 15 = 35. We get the following table.

   2 do           11 while       20 record        29 array
   3 end          12 const       21 packed        30 if
   4 else         13 div         22 not           31 nil
   5 case         14 and         23 then          32 for
   6 downto       15 set         24 procedure     33 begin
   7 goto         16 or          25 with          34 until
   8 to           17 mod         26 repeat        35 label
   9 otherwise    18 file        27 var           36 function
  10 type         19 of          28 in            37 program

References. R.J.CICHELLI. Minimal Perfect Hash Functions Made Simple, Communications of the ACM 23 17–19 (January 1980)

8.2 Open Addressing

With this method we have an array T[M] of M possible slots, each of which can hold just a single key (or, more precisely, for Dictionaries a single key-value pair). The hash function h returns the intended address of the key. Collisions are resolved by looking for an alternative unused location in the array. The process of checking various locations for an empty slot is called probing. The probing sequence for each key K is fixed, but depends on the specific hash method and its parameters. We discuss several variants: linear, quadratic, and double hashing.
The operation ISELEMENT checks whether a key is present, and is implemented by following the probing sequence for the key. It is successful if the key is found; it is unsuccessful when we happen to hit an empty slot (we can conclude the key is not present, as it would have been stored at that position).

Both the INSERT (add a key, or key-value pair) and the RETRIEVE (for a Dictionary, find the value for a given key) operations follow the same path over the address space. INSERT is successful if an empty slot is found, and the key is stored (it needs special attention if we find the key instead, as we do not allow duplicate keys); RETRIEVE on the contrary fails if we discover an empty slot.
There are several ways to choose the probe sequence h(K,0) = h(K), h(K,1), h(K,2), ..., i.e., the sequence of addresses that are checked consecutively. Note that we theoretically wish to make this sequence a permutation of the addresses in the table, as we want to visit each address exactly once. In practice however we sincerely hope that the number of visits is bounded by a constant, to fulfill the promise of hashing.
(1) With linear probing we search for an available position (or the key itself) in the positions immediately before the original address: for key K after i ≥ 0 steps we check address h(K,i) = h(K) − i (mod M). Recall that M is the table size. The probe sequence wraps to the last element when we failed at the 0'th position, because of the modulo. Mind the negative sign.
Unfortunately linear probing is sensitive to primary clustering: segments of occupied spots fill the table. Note that larger segments have a larger probability to grow: each collision with one of the elements in the cluster means that the new key will end up in the free position just before the cluster. This means we have to be extremely careful that the input distribution of keys does not form clusters under the chosen address function. As an example: variable sequences like help1, help2, help3 are common, and they should not be placed at consecutive places in the hash table, as that would already start a cluster.
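A minimal C++ sketch of open addressing with linear probing as described above, probing downwards with wrap-around; storing plain integers with -1 for an empty slot is our own simplification:

#include <iostream>
#include <vector>

// Open addressing with linear probing, step size 1 downwards:
// probe h(K), h(K)-1, h(K)-2, ... (mod M). Empty slots hold -1.
struct LinearHashTable {
    std::vector<int> table;
    LinearHashTable(int m) : table(m, -1) {}
    int home(int key) const { return key % (int)table.size(); }
    bool insert(int key) {
        int m = table.size();
        for (int i = 0; i < m; ++i) {
            int pos = ((home(key) - i) % m + m) % m;   // wrap around, mind the sign
            if (table[pos] == key) return false;       // no duplicate keys
            if (table[pos] == -1) { table[pos] = key; return true; }
        }
        return false;                                   // table is full
    }
    bool isElement(int key) const {
        int m = table.size();
        for (int i = 0; i < m; ++i) {
            int pos = ((home(key) - i) % m + m) % m;
            if (table[pos] == key) return true;
            if (table[pos] == -1) return false;         // empty slot: not present
        }
        return false;
    }
};

int main() {
    LinearHashTable t(11);                              // as in Example 8.2
    for (int k : {60, 29, 74, 38, 19, 23, 40}) t.insert(k);
    for (int i = 0; i < 11; ++i) std::cout << i << ":" << t.table[i] << ' ';
    std::cout << '\n';
}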

Example 8.2. We use a table T[11] to store integers and hash-function h(K) = K (mod 11) using linear hashing. We add the keys 60(5), 29(7), 74(8), 38(5), 19(8), 23(1), 40(7) (in that order) to an empty table. Here the values in parentheses are the computed hash values: for example 60 = 5 · 11 + 5 ≡ 5 (mod 11). The keys 60, 29, and 74 are stored on their intended addresses, the address determined by the hash-function. Then we have a collision of key 38 at slot 5. We try one slot to the left, and place 38 at 4. Key 19 again is in collision at 8, and is placed after two more probes at slot 6. Key 40 cannot be placed at address 7, and we need several steps to find a free slot at 3.

   0   1   2   3   4   5   6   7   8   9  10
                  38  60      29  74                 (after 60, 29, 74, 38)

   0   1   2   3   4   5   6   7   8   9  10
      23          38  60  19  29  74                 (after also 19 and 23)

   0   1   2   3   4   5   6   7   8   9  10
      23      40  38  60  19  29  74                 (after also 40)

Sometimes it might be a good idea to change the step size for linear hashing from 1 to another constant c. That can be done provided that c has no divisors in common with the table size M. It might seem that a large c is profitable as it will jump over clusters. The problem is that with c ≠ 1 the clusters are just not formed in consecutive cells.

Example 8.3. Starting with h(K,i) = (K mod 10) − 3i we have the following table. One needs many steps to find 65.

   0   1   2   3   4   5   6   7   8   9
  65      32  43      55  72          19

The reason is that the table is one big cluster, visible when we show the neighbours relative to the step size 3.

   0   3   6   9   2   5   8   1   4   7
  65  43  72  19  32  55

This is actually equivalent to h′(K,i) = (7K mod 10) − i.

   0   1   2   3   4   5   6   7   8   9
  65  43  72  19  32  55

In this last version of the table, the cluster that does not appear to be present in the first version is plainly visible. ♦

(2) Ideally we would like to use a random permutation r0 = 0, r1, r2, ... of all addresses to avoid primary clustering, and visit the slots as h(K,i) = h(K) − ri (mod M). It is however hard to devise a random sequence for every table size. A solution which is close to randomness is quadratic probing. Here the squares are used to substitute random numbers: 0, ±1², ±2², ±3², ±4², .... Technically it can only be guaranteed that this forms a permutation of the address space if M is a prime number which is 3 (mod 4). This avoids primary clustering, but still all keys at a given address follow the same path, a phenomenon called secondary clustering.

Example 8.4. We use a table T[11] to store integers and hash-function h(K) = K (mod 11) using quadratic hashing. We add the keys 60(5), 29(7), 74(8), 38(5), 19(8), 23(1), 40(7) (in that order) to an empty table. Again in parentheses are the computed hash values. The steps we take, each time starting from the initial address (not the last visited address), are −1, +1, −4, +4, −9, +9, .... The keys 60, 29, and 74 are stored at free addresses immediately as we have no synonyms. Then we have a collision of key 38 at slot 5. The slot 5 − 1 = 4 is free so we store 38 there. Key 19 again is in collision at 8; we try at 8 − 1 = 7, and then 8 + 1 = 9 is free. No problems for 23. Key 40 cannot be placed at address 7, but we find a free slot at 7 − 1 = 6.

   0   1   2   3   4   5   6   7   8   9  10
                  38  60      29  74                 (after 60, 29, 74, 38)

   0   1   2   3   4   5   6   7   8   9  10
      23          38  60      29  74  19             (after also 19 and 23)

   0   1   2   3   4   5   6   7   8   9  10
      23          38  60  40  29  74  19             (after also 40)

(3) Double hashing. In this method we use a second hash function to determine the step size. This probe function p(K) must be ‘independent’ from h(K) so that keys hashed to the same address do not follow predictable paths. The step sequence equals h(K,i) = h(K) − i · p(K)(mod M). As mentioned with linear probing, the values of p must be relatively prime to the table size M.

Example 8.5. We repeat Example 8.2, now using the probe function p(K) = (K mod 4) + 1. This gives four possible step sizes. Note the table size is a prime, so the step sizes have no divisor in common with the table size. The keys 60, 29, 74, 38, 19, 23 and 40 have the following hash values h(K) and step sizes p(K):

  Key         60  29  74  38  19  23  40      K
  Address      5   7   8   5   8   1   7      h(K) = K mod 11
  Step size    1   2   3   3   4   4   1      p(K) = (K mod 4) + 1

The table is filled in a similar way as before, except that we now take different sized steps when looking for a free spot.

   0   1   2   3   4   5   6   7   8   9  10
          38          60      29  74                 (after 60, 29, 74, 38)

   0   1   2   3   4   5   6   7   8   9  10
      23  38      19  60      29  74                 (after also 19 and 23)

   0   1   2   3   4   5   6   7   8   9  10
      23  38      19  60  40  29  74                 (after also 40)

Observe that we have a cluster of five keys stored in a contiguous interval. This is however not always a cluster from the viewpoint of each key. When looking for key 30 (hash value 8, step 3) we first look at 8, then 8 − 3 = 5, 2 and finally 10. The last slot is empty and shows key 30 is not in the table. ♦

Expected number of steps. According to Knuth we have the following expected number of steps for the three hashing methods above, both for unsuccessful search (or adding) and for successful search. The variable α represents the load factor, the fraction of occupied slots in the table. It determines the expected number of probes to find or store a key in the table. We also give the function values at α = 0.5 and α = 0.8. Not too bad at 50%, it seems.

               find / successful                      add / unsuccessful
  linear       ½ (1 + 1/(1−α))          1.5 – 3       ½ (1 + 1/(1−α)²)            2.5 – 13
  secondary    1 + ln(1/(1−α)) − α/2    1.4 – 2.2     1/(1−α) − α + ln(1/(1−α))   2.2 – 5.8
  double       (1/α) ln(1/(1−α))        1.4 – 2.0     1/(1−α)                     2 – 5

[Figure: expected number of probes as a function of the load factor α, for linear, quadratic and double hashing.]

Note that for open addressing we have to choose a table size that is appropriate for the number of expected keys! If the load factor α approaches 1 the table starts to behave like an unordered list.
What to remember of this? Finding is faster than adding (because we stop early along the path). Double is faster than secondary, which is faster than linear. Avoid large load factors. (But do not memorize the exact functions given here...)

Deletion. Removing a key to make room for another one is a bad decision in open hash tables. If the key K is on the search path of another key K′, between the home address of K′ and the slot where K′ was placed, we would wrongly conclude that K′ is not present after the removal of K. Usually a wise solution is lazy deletion: mark the slot as 'deleted'. When searching for a free slot we can use it as if it were empty; when searching for a given key we pretend it is in use (by another key).

8.3 Chaining

Instead of storing keys directly in the table, here we keep a linked list of keys for each table element. That means that a collision is resolved by adding the new element to the list at its hash address. We can choose to simply add new elements at the beginning of the list, or to keep the list ordered (which is slightly faster when searching for elements not in the table).
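A minimal C++ sketch of chained hashing with sorted lists; the struct name and the integer-only simplification are our own:

#include <iostream>
#include <list>
#include <vector>

// Chained hashing: one (sorted) linked list of keys per table slot.
struct ChainedHashTable {
    std::vector<std::list<int>> table;
    ChainedHashTable(int m) : table(m) {}
    std::list<int>& bucket(int key) { return table[key % table.size()]; }
    void insert(int key) {
        auto& lst = bucket(key);
        auto it = lst.begin();
        while (it != lst.end() && *it < key) ++it;   // keep the list in ascending order
        if (it == lst.end() || *it != key) lst.insert(it, key);
    }
    bool isElement(int key) {
        for (int x : bucket(key)) {
            if (x == key) return true;
            if (x > key) return false;               // sorted list: we can stop early
        }
        return false;
    }
};

int main() {
    ChainedHashTable t(11);                          // as in Example 8.6
    for (int k : {60, 29, 74, 38, 19, 23, 40}) t.insert(k);
    std::cout << t.isElement(38) << ' ' << t.isElement(30) << '\n';   // 1 0
}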

Example 8.6. Consider the table T[11] of size 11 and the address function h(K) = K mod 11. After adding the keys 60, 29, 74, 38, 19, 23 and 40 the table looks as follows (assuming we store the keys in ascending order).

   0: Λ
   1: 23 → Λ
   2: Λ
   3: Λ
   4: Λ
   5: 38 → 60 → Λ
   6: Λ
   7: 29 → 40 → Λ
   8: 19 → 74 → Λ
   9: Λ
  10: Λ

8.4 Choosing a hash function

A good hash function h(K) should

• be fast to compute,
• evenly distribute the keys over the table, and
• depend on all 'distinctive bits' of the key.

One of the main concerns is to counteract any expected 'behaviour' of the keys. If we hash bank account numbers, several digits might be predictable in case the number includes a code for the bank. If we hash student accounts, the initial digits might represent year or discipline. If we hash variables we might expect sequences like x1, x2, ..., that should not be hashed to consecutive addresses. We mention some techniques. We assume the key is in fact a (possibly large) number. Any string can be read as a number.

Division. Return the key modulo the table size (as we have done in several ex- amples). When the table size is a power of two, then division amounts to choosing the last bits of the key, see Extraction. This seems to have good results when the table size is prime (or without small divisors).

Folding. The key is chopped into parts, which are then added/xor-ed. The result is then again returned modulo the table size.

MurmurHash
Murmur3_32(key, len, seed)
  // integer arithmetic with unsigned 32 bit integers
  c1 := 0xcc9e2d51
  c2 := 0x1b873593
  r1 := 15
  r2 := 13
  m  := 5
  n  := 0xe6546b64

  hash := seed
  for each fourByteChunk of key
      k := fourByteChunk
      k := k * c1
      k := (k << r1) OR (k >> (32-r1))
      k := k * c2
      hash := hash XOR k
      hash := (hash << r2) OR (hash >> (32-r2))
      hash := hash * m + n

  with any remainingBytesInKey
      // (also do Endian swapping on big-endian machines)
      remainingBytes := remainingBytesInKey * c1
      remainingBytes := (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
      remainingBytes := remainingBytes * c2
      hash := hash XOR remainingBytes

  hash := hash XOR len
  hash := hash XOR (hash >> 16)
  hash := hash * 0x85ebca6b
  hash := hash XOR (hash >> 13)
  hash := hash * 0xc2b2ae35
  hash := hash XOR (hash >> 16)

Figure 6: MurmurHash as presented on Wikipedia

Mid-squaring. Take the square of the key, and select the middle bits as result (assuming the table size is a power of two). In several contexts squaring behaves rather randomly. Take care when the key contains many zeros at its start or end.

Extraction. The address is computed based on selected bits of the key. These bits should be as random as possible, and preferably not be somewhat predictable like year or check-bits.

Example 8.7. Developing a good hash function is hard work. In a professional context it should be fast and very thoroughly mix the bits of a key. The resulting code matches the architecture of the processor to achieve speed. Just as an illustration we give MurmurHash, on which Google's CityHash is based, in Figure 6. ♦
For double hashing we also choose a second hash function p(K) (the probe function, which determines the step size). As discussed earlier, that function

should have no divisors in common with the table size (to ensure we visit all addresses) and it should be independent of the address function (to avoid secondary clustering).
Note that passwords are usually stored using hash functions. Examples of these are MD5 (Message Digest Algorithm 5) or SHA-1 (Secure Hash Algorithm). To check a given password it is hashed and compared with the stored value. Needless to say such hashes must be very unpredictable. Wikipedia gives the example MD5("Pa's wijze lynx bezag vroom het fikse aquaduct") = b06c0444f37249a0a8f748d3b823ef2a, while MD5("Ma's wijze lynx bezag vroom het fikse aquaduct") = de1c058b9a0d069dc93917eefd61f510.

References. See DROZDEK Chapter 10: Hashing. See LEVITIN Section 7.3: Hashing. See CLRS Chapter 11: Hash Tables.

Exercise
Consider hashing in a so-called 'open' hash table with two hash functions h and p. The (i+1)-st visited address h(K,i) is, as usual, h(K) − i · p(K) (modulo M).

a) Which two points are important in the choice of the address function h? What should we pay attention to when choosing the step function (probe function) p?

b) Which two kinds of clustering do we distinguish in hashing with open addressing? Give a short description.

c) In a table T[0..10], so M = 11, the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 are placed, in that order, with address function h(K) = K mod 11 and linear hashing (step size 1). Show the resulting table, but also indicate, in a clear manner, all the places where the keys are probed.

d) The same, now with step function p(K) = 1 + (K mod 10).

Jan 2015

9 Data Compression

Data compression can save both time (transmission of data) and space (storage). We study lossless data compression, so the coded data can be decoded back into the data that was used to generate it. We present two well-known techniques to code a text in binary: Huffman coding and Ziv-Lempel-Welch. The first method codes each single letter by a fixed binary string such that frequently occurring letters get short codes. Ziv-Lempel-Welch tries to find codes for sequences of substrings that occur more often in the input.

Morse. dit. dah.

9.1 Huffman Coding

Features:
• variable length code for single letters
      a1,...,an ∈ Σ  ↦  w1,...,wn ∈ {0,1}*
• based on character frequencies (known in advance) f1,..., fn
• optimal expected code length (for a prefix code)
      f1·|w1| + f2·|w2| + ··· + fn·|wn|
• code has to be known by decoder

In the ASCII code every 'letter' is encoded as a bit string of 8 bits. In Morse code every letter is encoded by a variable length sequence of dots and dashes: E, the most common letter in English, is represented by a single dot. Decoding ASCII is easy. Decoding Morse is only possible because besides dots and dashes one also uses 'spaces' between letters and words.
Given a set of letters Σ = {a1,...,an} and their (relative) frequencies f1,..., fn, we want to find a variable length (binary) code w1,...,wn of strings over {0,1}* that is easily decodable and has minimal expected length.

The code of a message ai1 ai2 ...aim equals wi1 wi2 ...wim, i.e., it is obtained by a string morphism. It is decodable if no two messages generate the same code, i.e., the homomorphism is injective. The expected length of a message, given the letter frequencies, equals f1·|w1| + ··· + fn·|wn|.

Example 9.1. The code a ↦ 10, b ↦ 1, c ↦ 00 is uniquely decodable. However we have to look ahead. Each sequence in 10* has to be completely read before it

can be decoded. If the number of 0's is even, then 1 0^(2n) is decoded as b cⁿ; if the number is odd, then 1 0^(2n+1) is decoded as a cⁿ. The code a ↦ 0, b ↦ 01, c ↦ 011 is uniquely decodable: each letter can be recognized by the initial 0. Still we have a lookahead: a letter is only decoded when the first 0 bit of the next letter is seen. ♦

In order to be able to decode messages without lookahead we restrict ourselves to prefix codes. This means no codeword wi is a prefix of another codeword wj. These codes can be represented as binary trees, with the letters of the input alphabet at the leaves. Internal nodes do not represent codewords, because of the prefix property. The expected length of a message equals the weighted external path length of the tree: the sum over all leaves of the depth of the leaf times its frequency. We may assume the tree is full (i.e., each internal node has two children), as otherwise we can find a shorter code by omitting the nodes with a single child.
Decoding a prefix code is trivial. Start at the root and follow the edges as indicated by the bits read, until a leaf is reached; output the letter at the leaf; return to the root, and continue.

Example 9.2. Reversing the codewords in the second example above we get a ↦ 0, b ↦ 10, c ↦ 110, which is a prefix code with the tree representation below (left). Obviously it is not optimal (the tree is not full), as the code for c can be changed into 11 without losing the prefix property (right). String 010111110001011 is decoded via 0 · 10 · 11 · 11 · 10 · 0 · 0 · 10 · 11, i.e., as abccbaabc.

[Figure: the two code trees; in the left tree c sits at depth 3 below a single-child node, in the right (full) tree c has code 11.] ♦

Huffman’s solution for constructing an optimal code tree is an elegant greedy bottom-up algorithm.

Huffman
// initialize: for each input letter a tree with that letter, and its frequency
repeat
   take two trees of minimal frequencies
   join these as children in a new tree, with combined frequency
until one tree left

The algorithm is not deterministic. We may swap left and right branches with- out changing the efficiency of the code (only the corresponding bits 0 and 1 are swapped for the code words in the tree). A simple example shows we can have two ‘structurally different’ code trees for a given alphabet (with frequencies). Consider the alphabet {a,b,c,d} with respective frequencies 10,5,5, and 5. The following two optimal trees both have expected code length 50: 2 · 10 + 2 · 5 + 2 · 5 + 2 · 5 resp. 1 · 10 + 2 · 5 + 3 · 5 + 3 · 5. Both of them can be obtained from the algorithm.

[Figure: the two optimal code trees for {a,b,c,d}; left: all four letters at depth 2, 2 · 10 + 2 · 5 + 2 · 5 + 2 · 5 = 50; right: a at depth 1 and the other letters deeper, 1 · 10 + 2 · 5 + 3 · 5 + 3 · 5 = 50.]

Example 9.3. We construct a Huffman code for the alphabet a, b, c, d, e, f, the letters of which have relative frequencies 18, 8, 12, 13, 7, and 21, respectively. The algorithm successively joins the two smallest frequencies: first 7 (e) and 8 (b) into 15, then 12 (c) and 13 (d) into 25, then 15 and 18 (a) into 33, then 21 (f) and 25 into 46, and finally 33 and 46 into the root 79. [Figure: the successive forests and the resulting code tree, with a and f at depth 2 and the other letters at depth 3.]

With the given (relative) frequencies the expected code length equals 18 · 2 + 8 · 3 + 12 · 3 + 13 · 3 + 7 · 3 + 21 · 2 = 198. ♦

Complexity. Clearly the number of steps taken by the algorithm is linear, as we lose one tree in each step. However, the complexity is slightly more than that, O(n lg n), as we have to take the two least frequencies in each step. This can be done in two ways. First, we can keep the trees in a priority queue. We then have a linear number of insertions and deletions, each of complexity O(lg n). As a second possibility we can sort the initial frequencies and keep them in a list. The trees that are joined are kept in a separate list. The second list can be kept in sorted order, as each newly created tree has a larger frequency than the previous one. In each step we choose the minimal frequency from the two (sorted) lists. This has the same complexity as sorting, which is O(n lg n).
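A possible C++ sketch of the priority-queue variant; the node layout, the names and the reuse of the frequencies of Example 9.3 are our own choices, and freeing memory is omitted (C++17):

#include <iostream>
#include <queue>
#include <string>
#include <vector>

// Huffman's algorithm with a min priority queue of tree nodes.
struct Node {
    int freq;
    char letter;          // only meaningful in leaves
    Node *left, *right;
};

Node* buildHuffman(const std::vector<std::pair<char,int>>& freqs) {
    auto cmp = [](Node* a, Node* b) { return a->freq > b->freq; };
    std::priority_queue<Node*, std::vector<Node*>, decltype(cmp)> pq(cmp);
    for (auto [c, f] : freqs) pq.push(new Node{f, c, nullptr, nullptr});
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();          // two trees of minimal frequency
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->freq + b->freq, '\0', a, b});
    }
    return pq.top();
}

// print the codewords by walking the tree
void printCodes(const Node* t, const std::string& code) {
    if (!t->left) { std::cout << t->letter << " : " << code << '\n'; return; }
    printCodes(t->left, code + "0");
    printCodes(t->right, code + "1");
}

int main() {
    // the alphabet and frequencies of Example 9.3
    std::vector<std::pair<char,int>> freqs =
        {{'a',18},{'b',8},{'c',12},{'d',13},{'e',7},{'f',21}};
    printCodes(buildHuffman(freqs), "");
}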

Frequencies. If we add the frequencies of each subtree (which are just the total frequencies of the leaves in the subtree), then we obtain the expected path length (see the left tree in the example below, 79 + 33 + 46 + 15 + 25 = 198). This is a consequence of the fact that the frequency of each leaf is counted in each of its predecessors. We can rearrange the Huffman tree, swapping left and right branches where necessary, to obtain a tree where the frequencies are ordered in breadth-first order. In fact, this is a characteristic of a Huffman tree. This observation is the basis of the dynamic Huffman algorithm, where the frequencies are not known in advance and the tree is restructured while counting. The restructuring process is rather costly, and the algorithm probably is not very practical.

[Figure: the Huffman tree of Example 9.3 with the subtree frequencies written in the internal nodes (left), and the same tree rearranged so that the frequencies appear in decreasing breadth-first order (right).]

Remarks. Despite its simplicity, Huffman's algorithm was a major breakthrough. The previous best method was Shannon–Fano coding, which basically works in a top-down manner, repeatedly dividing the frequencies into two subsets.
The solution found by Huffman's algorithm does not try to keep the codes in alphabetical order. If that is required and one is looking for optimal alphabetic binary trees, there is an algorithm by Hu–Tucker. This works in O(n lg n) time, just like Huffman.

Example 9.4. With frequencies 19,8,9,9,7 the Shannon–Fano coding gives a total expected length 2·(19+9+9)+3·(8+7) = 119, whereas Huffman yields 1·19+ 3·(9+9+8+7) = 118. Tiny difference, but enough to show that Shannon–Fano is not optimal.

[Figure: the Shannon–Fano tree for the frequencies 19, 8, 9, 9, 7 (first split 19+9 against 9+8+7) and the corresponding Huffman tree.]

References. D.HUFFMAN. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE 40 1098–1101 (1952) See DROZDEK Chapter 11.2: Huffman Coding. See LEVITIN Section 9.4: Huffman Trees.

9.2 Ziv-Lempel-Welch

Features:
• fixed length code for repeating patterns in the input
      x1,...,xn ∈ Σ*  ↦  w1,...,wn ∈ {0,1}ᵏ
• strings xi learned while reading input
• code is also learned by decoder and does not have to be transmitted

Before we start the coding and decoding algorithms, we have to agree on the input alphabet, and on the number of bits used to send each code word. We start by assigning codes to each single letter in the alphabet. The number of bits is important as the decoder has to read that number of bits to translate each code back into its original.
The coding algorithm keeps a 'dictionary' of input segments that have received a code. It repeatedly scans the input to find the longest next segment that can be coded. When that segment has been found, its code is output. At the same time a new code is entered into the dictionary: the present segment plus the next letter.

ZLW compression
w = "";
while ( not end of input )
do read character c
   if w+c exists in the dictionary
   then w = w+c;
   else add w+c to the dictionary;
        output the code for w;
        w = c;
   fi
od
output the code for w
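A possible C++ sketch of the encoder; it uses std::map as the dictionary instead of the trie mentioned below, and the names are our own:

#include <iostream>
#include <map>
#include <string>
#include <vector>

// ZLW/LZW compression: the dictionary maps strings to integer codes;
// single letters are entered first, longer segments are learned on the fly.
std::vector<int> zlwEncode(const std::string& input, const std::string& alphabet) {
    std::map<std::string,int> dict;
    int nextCode = 1;
    for (char c : alphabet) dict[std::string(1, c)] = nextCode++;
    std::vector<int> output;
    std::string w;
    for (char c : input) {
        if (dict.count(w + c)) {
            w += c;                              // extend the current segment
        } else {
            dict[w + c] = nextCode++;            // learn a new segment
            output.push_back(dict[w]);
            w = std::string(1, c);
        }
    }
    if (!w.empty()) output.push_back(dict[w]);
    return output;
}

int main() {
    // the input of Example 9.5 over the alphabet {a,b,c}
    for (int code : zlwEncode("ababcbababaaaa", "abc")) std::cout << code << ' ';
    std::cout << '\n';                           // 1 2 4 3 5 8 1 10 1
}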

In the toy example below we include an 'end-of-string' symbol (to signal the end of the message). This is a technicality, and there might be other ways to recognize the end of a sequence of bits. The dictionary can be represented as a trie, where nodes have (possible) children for each input letter.

Example 9.5. Input string ababcbababaaaa over the alphabet {a,b,c}. We do not fix the number of output bits, but use numbers to indicate output. For example 3 denotes 0011 or 00011 for 4 or 5 bits, respectively. In this example we assume to have at least five bits (to accommodate 12 code words).

  w     c    w+c in dict?   output   new code
        a        yes
  a     b        no            1      4 ↦ ab
  b     a        no            2      5 ↦ ba
  a     b        yes
  ab    c        no            4      6 ↦ abc
  c     b        no            3      7 ↦ cb
  b     a        yes
  ba    b        no            5      8 ↦ bab
  b     a        yes
  ba    b        yes
  bab   a        no            8      9 ↦ baba
  a     a        no            1      10 ↦ aa
  a     a        yes
  aa    a        no            10     11 ↦ aaa
  a     ⊥                      1

[Figure: the dictionary represented as a trie over the alphabet {a,b,c}, with the codes 1-11 at its nodes.]

If the dictionary becomes full during the algorithm, we have to agree on the proper action. There are several conventions.

1. Continue "as is", no further codes are entered into the dictionary.

2. Add one bit to code space, and continue.

3. Many of the leaves in the code tree represent codes that are no longer used as they were not extended. Reclaim this space by pruning the tree, cutting the last letters from each branch and reusing the codes.

Decoding. The reverse operation is performed by virtually the same algorithm, adding new codes to the dictionary. If binary code n is read we output its original w, and we should add a new code for wb, where b is the single letter look-ahead in the coding algorithm. However b is not known until the next binary code n′ has been processed. Thus, decoding lags behind by one step.
This can have an unfortunate effect. It may happen that the next code n′ is not yet in the dictionary, as it actually is the one that should have been added in the previous step. This can only happen if the (unknown) lookahead letter b actually is the first letter of the new code wb, i.e., the first letter of the last decoded word w. With this knowledge we can close the gap.

Example 9.6. We decode the output we found in the previous example, i.e., the bit sequence corresponding to the numbers 1, 2, 4, 3, 5, 8, 1, 10, 1. As before we start with 1 ↦ a, 2 ↦ b, and 3 ↦ c. Each step we add the new code for the string decoded in the previous step plus the first letter of the present string. The special case is marked by '!'.

  code   means   add
    1      a
    2      b     4 ↦ ab
    4      ab    5 ↦ ba
    3      c     6 ↦ abc
    5      ba    7 ↦ cb
    8 !    bab   8 ↦ bab
    1      a     9 ↦ baba
   10 !    aa    10 ↦ aa
    1      a     11 ↦ aaa

Remarks. The ZLW compression is part of the GIF file format, and was patented. In the 1990s the owners of this patent started to ask licence fees for developers that incorporated the algorithm in their tools. As a consequence the conversion to GIF was removed from many distributions, and a new file format PNG was developed. Around 2003 the patent on LZW expired.

References. T. WELCH. A Technique for High-Performance Data Compres- sion. Computer 17 8–19 (1984). doi:10.1109/MC.1984.1659158 J.ZIV,A.LEMPEL. Compression of individual sequences via variable-rate cod- ing. IEEE Transactions on Information Theory 24 530–536 (1978). doi:10.1109/TIT.1978.1055934 See DROZDEK Chapter 11.4: Ziv-Lempel Code.

9.3 Burrows-Wheeler

Start with a string, like MISSISSIPPI. For clarity of the presentation we add an end marker, here $. List all rotations of the string (moving the last letter to the front repeatedly). Then write the rotated strings in alphabetical order, where the marker comes last in the alphabet.

rotations                        sorted                     first  last
 1  MISSISSIPPI$                  8  IPPI$MISSISS           I1     S1
 2  ISSISSIPPI$M                  5  ISSIPPI$MISS           I2     S2
 3  SSISSIPPI$MI                  2  ISSISSIPPI$M           I3     M1
 4  SISSIPPI$MIS                 11  I$MISSISSIPP           I4     P1
 5  ISSIPPI$MISS                  1  MISSISSIPPI$           M1     $
 6  SSIPPI$MISSI                 10  PI$MISSISSIP           P1     P2
 7  SIPPI$MISSIS                  9  PPI$MISSISSI           P2     I1
 8  IPPI$MISSISS                  7  SIPPI$MISSIS           S1     S3
 9  PPI$MISSISSI                  4  SISSIPPI$MIS           S2     S4
10  PI$MISSISSIP                  6  SSIPPI$MISSI           S3     I2
11  I$MISSISSIPP                  3  SSISSIPPI$MI           S4     I3
12  $MISSISSIPPI                 12  $MISSISSIPPI           $      I4

Read the last column, which is a permutation of the original string. The result SSMP$PISSIII is the Burrows-Wheeler transform of the original string. It is the basis of compression techniques like bzip2, as in general there will be segments of the same letter repeated: S2MP$PIS2I3. To this we can apply run-length encoding.
It is rather surprising that we can decode the transform and obtain the original string. This is based on the observation that the copies of a single letter occur in the same order whether we look at them in the first or in the last column of the sorted array: Aw is before Au iff wA is before uA. Also note that, within one row, the letter in the last column is followed (in the original string) by the letter in the first column. Thus M1 is the first letter, as it ‘follows’ $. The next character is I3, as it follows M1. Following I3 is S4, etc.
Decoding BW is quite efficient. Given the last column we can find the first column by sorting. Here we can simply sort by counting the letters: 4 copies of I, etc. We get M1 I3 S4 S2 I2 S3 S1 I1 P2 P1 I4. Efficiently encoding BW is more difficult. It uses techniques similar to the suffix array, which is also mentioned in Section 10.4.
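
The following C++ sketch is only an illustration of the two directions (it is not from the text): the transform is computed naively by sorting all rotations, with the marker $ treated as the largest symbol as above, and the inverse uses the first/last-column correspondence just described.

#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

// Naive Burrows-Wheeler transform: sort all rotations, read the last column.
// Assumes the end marker '$' is already appended; '$' sorts as the largest symbol.
std::string bwt(const std::string& s) {
    auto rank = [](char c) { return c == '$' ? 256 : (int)(unsigned char)c; };
    int n = s.size();
    std::vector<int> rot(n);                       // rotation starting positions
    std::iota(rot.begin(), rot.end(), 0);
    std::sort(rot.begin(), rot.end(), [&](int a, int b) {
        for (int k = 0; k < n; ++k) {              // compare rotations character by character
            char ca = s[(a + k) % n], cb = s[(b + k) % n];
            if (ca != cb) return rank(ca) < rank(cb);
        }
        return false;
    });
    std::string last;
    for (int r : rot) last += s[(r + n - 1) % n];  // last column
    return last;
}

// Inverse transform: the i-th occurrence of a letter in the last column is the
// same text occurrence as its i-th occurrence in the (sorted) first column.
std::string inverseBwt(const std::string& last) {
    auto rank = [](char c) { return c == '$' ? 256 : (int)(unsigned char)c; };
    int n = last.size();
    std::vector<int> idx(n);                       // stable sort of the last column gives the first column
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return rank(last[a]) < rank(last[b]); });
    std::string result;
    int row = std::find(last.begin(), last.end(), '$') - last.begin();  // row whose last letter is $
    for (int k = 0; k < n - 1; ++k) {
        row = idx[row];                            // jump to the row starting with the next letter
        result += last[row];                       // ... which equals that letter
    }
    return result;
}

int main() {
    std::string t = bwt("MISSISSIPPI$");
    std::cout << t << '\n';                        // SSMP$PISSIII
    std::cout << inverseBwt(t) << '\n';            // MISSISSIPPI
}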

References. M. BURROWS, D.J. WHEELER: A Block-sorting Lossless Data Compression Algorithm. Digital Systems Research Center, Research Report 124 (1994). HP Labs Technical Reports. “The algorithm described here was discovered by one of the authors (Wheeler) in 1983 while he was working at AT&T Bell Laboratories, though it has not previously been published.”

Exercise a) Which problem does the algorithm of Huffman solve? If n keys are given, what is the complexity? Give some explanation.

b) Apply Huffman to the symbols a, b, ..., f with respective frequencies 10, 7, 2, 8, 11, 5.

c) There exists a dynamic version of Huffman's algorithm, in which the frequencies may change. It makes use of a special property of the trees that are constructed: the values in the nodes can be chosen breadth-first from large to small. Explain why this indeed works.

Jan 2016

Ziv-Lempel-Welch coding. The alphabet has four letters a, b, c, d. Letter a receives code 1. Indicate each time which strings are ‘learned’, and give the final code tree.

a) Encode cdbc acac daaa bc; the spaces are here only for readability.

b) Decode 2 4 4 5 1 3 7 11 1 2 14 10.

Jan 2017

10 Pattern Matching

In this section we study how to locate a string in a relatively long text. The string we try to find is called the pattern.

Naive approach. We start with the straightforward approach: align the first characters of pattern and text and compare the consecutive letters one by one. At a mismatch we shift the pattern one position to the right and start again at the first character of the pattern, again comparing the letters one by one.

Example 10.1. We try to locate pattern P = ‘ABCABABC’ in text T which starts with ‘ABABCABCABABCC. . . ’. Arrows indicate matches, crosses the first mismatch at that position.

T = ABABCABCABABCC. . .
    ABCABABC             attempt 1 (text position 1): mismatch at pattern position 3
     ABCABABC            attempt 2 (position 2): mismatch at pattern position 1
      ABCABABC           attempt 3 (position 3): mismatch at pattern position 6
       ABCABABC          attempt 4 (position 4): mismatch at pattern position 1
        ABCABABC         attempt 5 (position 5): mismatch at pattern position 1
         ABCABABC        attempt 6 (position 6): all eight letters match

This set-up might take long on special examples like P = A^m B and T = A^n B. In each iteration all letters A of the pattern P are matched, but the final letter B of the pattern is not found in the text, and the pattern is shifted by one position. In this way the method compares almost every letter of P against almost every position of T, and thus has complexity O(m · n).
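
For comparison with KMP below, a minimal C++ version of the naive method could look like this (an illustration, not from the text; positions are 0-based here).

#include <iostream>
#include <string>

// Naive pattern matching: try every text position, compare letter by letter.
// Returns the first 0-based position where the pattern occurs, or -1.
int naiveSearch(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) return 0;
    for (size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        size_t k = 0;
        while (k < pattern.size() && text[start + k] == pattern[k])
            ++k;                                      // compare one letter at a time
        if (k == pattern.size()) return (int)start;   // complete match
        // otherwise: shift the pattern one position and start over
    }
    return -1;
}

int main() {
    // Example 10.1: prints 5, the 0-based index of the match at (1-based) position 6.
    std::cout << naiveSearch("ABABCABCABABCC", "ABCABABC") << '\n';
}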

10.1 Knuth-Morris-Pratt

Features:
• linear-time preprocessing of the pattern to find the failure links;
• the search never backs up in the text.

If (during the naive implementation of pattern matching) a mismatch occurs at the k-th position of the pattern, we start again with the first position of the pattern at the next position of the text. We know however that the first k − 1 positions of the pattern match the text when we tested the pattern at the previous position. We can

use this knowledge to predict how the pattern matches at the next position, basically by matching the pattern against itself as a preprocessing step. This approach is used in the KMP algorithm: if the shifted pattern does not match the pattern, then the shifted pattern will not match the text.

Example 10.2. We consider again the pattern P = ‘ABCABABC’, and assume a mismatch has been made at position 8. We match the 7-letter prefix ABCABAB against itself. We observe that the first possible match is when the first two letters of the prefix are aligned with its last two letters.

ABCABAB
 ABCABAB      ×
ABCABAB
  ABCABAB     ×
ABCABAB
   ABCABAB    ×
ABCABAB
    ABCABAB   ×
ABCABAB
     ABCABAB  first match: the last two letters AB align with the first two

From this we conclude: if a mismatch occurs at position 8 then we can shift the pattern such that the present position of the text aligns with position 3 of the pattern. Always, regardless of the text. It is not useful to shift less. ♦

The failure link at position k for the pattern P is the maximal r < k such that P1 ...Pr−1 = Pk−r+1 ...Pk−1. Thus, when during the matching process an error occurs at position k, we can only expect a new match if we continue the match of the pattern at the position indicated by the failure link and the present position of the text. For k = 1 we set the failure link equal to 0. If an error occurs at the first position, we have to shift the pattern by one position to the right and continue at the first position of the pattern at the next position of the text. Note that the failure link does not take into account which letter was missed, so we do not verify whether or not Pk = Pr. This means that if an error occurs we may shift the pattern to a new position with the same letter that was rejected in the step before.

Example 10.3. We determine ‘by hand’ the failure links for the pattern of our running example. For every position we show the longest prefix that matches a suffix ending just before that position. (This is slow. We learn an efficient algorithm later.)

k                   2    3     4      5       6        7         8
P[1..k−1]           A    AB    ABC    ABCA    ABCAB    ABCABA    ABCABAB
prefix = suffix     –    –     –      A       AB       A         AB
FLink[k]            1    1     1      2       3        2         3

Hence we obtain the following table of failure links.

k          1 2 3 4 5 6 7 8
P[k]       A B C A B A B C
FLink[k]   0 1 1 1 2 3 2 3

[Figure: the pattern A B C A B A B C drawn as a row of positions 0–9, with ‘match’ edges advancing to the next position and ‘mismatch/fail’ edges following the failure links; position 0 means: skip to the next letter in the text.]

♦ The following pseudo-code searches the text using the KMP algorithm assuming the failure links are already obtained.

KMP using failure links
    Pos = 1              // position in pattern
    TPos = 1             // position in text
    while ((Pos <= PatLen) and (TPos <= TextLen)) do
        if (P[Pos] == Text[TPos])
        then Pos ++; TPos ++;
        else Pos = FLink[Pos]
             if (Pos == 0)           // next position in text
             then Pos = 1
                  TPos ++;
             fi
        fi
    od
    if (Pos > PatLen)                // found?
    then Pattern found at position TPos-PatLen in Text
    fi
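
The same loop in C++ might look as follows; this is only a sketch, using 1-based positions as in the pseudo-code (with an unused dummy entry FLink[0]) and the failure-link table of Example 10.3 hard-coded for the running example.

#include <iostream>
#include <string>
#include <vector>

// KMP search using precomputed failure links (1-based positions, as in the
// pseudo-code above; FLink[0] is an unused dummy entry).
// Returns the 1-based text position of the first occurrence, or -1.
int kmpSearch(const std::string& text, const std::string& P, const std::vector<int>& FLink) {
    int pos = 1, tpos = 1;                       // position in pattern and in text
    int patLen = P.size(), textLen = text.size();
    while (pos <= patLen && tpos <= textLen) {
        if (P[pos - 1] == text[tpos - 1]) {      // match: advance in both
            ++pos; ++tpos;
        } else {
            pos = FLink[pos];                    // follow the failure link
            if (pos == 0) { pos = 1; ++tpos; }   // mismatch at position 1: next text letter
        }
    }
    return (pos > patLen) ? tpos - patLen : -1;  // found?
}

int main() {
    std::string P = "ABCABABC";
    std::vector<int> FLink = {0, 0, 1, 1, 1, 2, 3, 2, 3};       // dummy, then table of Example 10.3
    std::cout << kmpSearch("ABABCABCABABCC", P, FLink) << '\n'; // prints 6
}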

Example 10.4. We locate the known pattern P = ‘ABCABABC’ in the text T = ‘ABAB CABC ABAB ABC. . . ’ using the algorithm of Knuth-Morris-Pratt and the failure links obtained above. [Alignment diagrams of the six steps omitted; the steps are described below.]
(1) Pattern at position 1. The first two positions match with the text. Mismatch at position 3 of the pattern. Failure link at 3 equals 1.
(2) Put the 1st position of the pattern at the present position 3 in the text. Positions 3–7 of the text match positions 1–5 of the pattern. Mismatch at position 6 of the pattern. Failure link at 6 equals 3.

(3) Put the 3rd position of the pattern at the present position 8 in the text. Thus the pattern now starts at position 6 in the text, but we do not reconsider the first two positions of the pattern: we know they already match the text. Continue comparing letters at position 3 of the pattern and position 8 of the text. Mismatch at position 8 of the pattern, position 13 of the text. Failure link at 8 equals 3.
(4) Continue comparing letters at position 3 of the pattern and the current position 13 of the text. Immediate mismatch at position 3. Failure link at 3 equals 1.
(5) Put the 1st position of the pattern at the present position 13 in the text. Immediate mismatch. Failure link at 1 equals 0.
(6) Move to the next position 14 in the text, and compare with the 1st position of the pattern. Three matches in a row. (But we do not know the continuation of the text, so we stop here.) ♦

Constructing the failure links. We now present the algorithm to efficiently construct the failure link table of a pattern. In words: at each position Pos we try to locate the letter P[Pos−1] of the previous position by following the failure links, starting at FLink[Pos−1], the failure link of the previous position.

computing KMP failure links
    Pos = 1              // position in pattern
    FLink[1] = 0
    for Pos = 2 to PatLen do
        Fail = FLink[Pos-1]
        while ( (Fail > 0) and (P[Fail] != P[Pos-1]) ) do
            Fail = FLink[Fail]
        od
        FLink[Pos] = Fail+1
    od
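
In C++ the construction could be sketched as below (again 1-based positions with a dummy entry at index 0; only the indexing into the 0-based std::string changes).

#include <iostream>
#include <string>
#include <vector>

// Compute the KMP failure links of a pattern, mirroring the pseudo-code above.
// FLink[k] (1-based) is the maximal r < k with P[1..r-1] = P[k-r+1..k-1].
std::vector<int> computeFailureLinks(const std::string& P) {
    int patLen = P.size();
    std::vector<int> FLink(patLen + 1, 0);       // FLink[0] unused, FLink[1] = 0
    for (int pos = 2; pos <= patLen; ++pos) {
        int fail = FLink[pos - 1];
        // Follow failure links until the letter before 'pos' extends the match.
        while (fail > 0 && P[fail - 1] != P[pos - 2])   // P is 0-based, positions are 1-based
            fail = FLink[fail];
        FLink[pos] = fail + 1;
    }
    return FLink;
}

int main() {
    std::vector<int> FLink = computeFailureLinks("ABCABABC");
    for (int k = 1; k < (int)FLink.size(); ++k)
        std::cout << FLink[k] << ' ';            // 0 1 1 1 2 3 2 3, as in Example 10.3
    std::cout << '\n';
}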


Example 10.5. In our example pattern P = ‘ABCABABC’ we construct FLink[7] at Pos = 7, based on the previous links. Fail = FLink[6] = 3. But P[3] ≠ P[6]. So we follow the FLink at that position: Fail = FLink[3] = 1. As P[1] = P[6] we stop and set FLink[7] = Fail + 1 = 2. ♦

Correctness. Consider the set of all positions t < k such that P_1 ... P_{t−1} = P_{k−t+1} ... P_{k−1}, i.e., all prefixes of P_1 ... P_{k−1} that are also a suffix of that string. Assume t_0 > t_1 > t_2 > ... is this sequence, ordered from long to short. By definition t_0 = FLink[k]. We claim that t_i = FLink[t_{i−1}]. This follows from the fact that P_1 ... P_{t_i−1} = P_{k−t_i+1} ... P_{k−1} is a prefix of P_1 ... P_{t_{i−1}−1} and also a suffix of P_{k−t_{i−1}+1} ... P_{k−1}; as these last two strings are equal, it is both a prefix and a suffix of P_1 ... P_{t_{i−1}−1}.

[Figure: the prefixes of lengths t_0 − 1 and t_1 − 1 of P_1 ... P_{k−1}, each occurring both at the front and at the end of that string.]

Note that if we delete the last letter from a string that is both a prefix and a suffix before position k+1 (as we look for in the failure links), then we get a prefix/suffix at position k. If we now want to find the failure link of P at position k+1, i.e., the longest prefix of P_1 ... P_k which is also a suffix of P_1 ... P_k, then we try to extend the possible prefix/suffixes of P_1 ... P_{k−1} by the letter P_k, i.e., we test whether P_t = P_k for the values t as above. Thus we follow the failure links for the previous position, exactly as in the algorithm.

[Figure: extending a prefix/suffix at position k by the letter P_k to obtain the failure link at position k+1.]

Improving the failure links. A failure link may point to the same character that has just been rejected, that is, P[k] = P[FLink[k]]. This happens in the example ABCABABC, where FLink[5] = 2. If a mismatch occurs at position 5 (because the text does not have a B), in the next step we will again check a B (the one at position 2). That is a consequence of the fact that we did not look at the character itself when constructing the failure links. This can be fixed in one simple sweep over the array of failure links. (It can also be done in the failure link construction itself, but to me that is slightly confusing.) Note that the inner loop is a simple if; another while is not necessary, as the failure link stored at the smaller position FLink[Pos] has already been improved and thus points to a different letter.

improving KMP failure links
    for Pos = 2 to PatLen do
        if ( P[Pos] == P[FLink[Pos]] )
        then FLink[Pos] = FLink[FLink[Pos]]
        fi
    od

Example 10.6. Improving failure links in the running example, before and after.

k           1 2 3 4 5 6 7 8
P[k]        A B C A B A B C
FLink[k]    0 1 1 1 2 3 2 3
FLink′[k]   0 1 1 0 1 3 1 1
♦

Complexity. The total complexity of determining the failure links is linear in the length of the pattern. In a single step we might have to follow a whole sequence of failure links, but this averages out over the sequence of operations. The argument follows the ideas of amortized complexity analysis. The value of Fail may go down several times during the various loops, but in total it cannot be decreased more than it is increased, which is once for every letter of the pattern, in the instruction FLink[Pos] = Fail+1 which sets the Fail for the next letter.

References. D. KNUTH, J.H. MORRIS, V. PRATT: Fast pattern matching in strings. SIAM Journal on Computing 6 (1977) 323–350. doi:10.1137/0206024
For fun (and education) read Section 7, Historical remarks: “The pattern-matching algorithm of this paper was discovered in a rather interesting way.”
See CORMEN LRS Section 32.4: The Knuth-Morris-Pratt Algorithm
See DROZDEK Section 13.1.2: The Knuth-Morris-Pratt Algorithm

10.2 Aho-Corasick

The algorithm of Aho-Corasick builds on KMP, but instead considers a finite set of strings that are searched in parallel. The set of strings is stored in a trie. Failure links from one string in the pattern can end somewhere in one of the other strings. They are computed like for KMP, but breadth-first.
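
As an illustration (not from the text), a compact C++ sketch of such an automaton is given below: a trie, failure links computed breadth-first with a queue, and output links so that patterns ending inside other patterns are still reported. The class and member names are made up for this sketch.

#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

// A minimal Aho-Corasick automaton: trie + failure links, built breadth-first.
struct AhoCorasick {
    struct Node {
        std::map<char, int> next;      // trie children
        int fail = 0;                  // failure link (node index; 0 is the root)
        std::vector<int> out;          // indices of patterns ending here
    };
    std::vector<Node> nodes;
    AhoCorasick() : nodes(1) {}        // node 0 is the root

    void addPattern(const std::string& p, int id) {
        int v = 0;
        for (char c : p) {
            if (nodes[v].next.count(c) == 0) {
                nodes.push_back(Node());                    // new trie node
                nodes[v].next[c] = (int)nodes.size() - 1;
            }
            v = nodes[v].next[c];
        }
        nodes[v].out.push_back(id);
    }

    // Compute failure links level by level (breadth-first), as for KMP.
    void build() {
        std::queue<int> q;
        for (const auto& kv : nodes[0].next) q.push(kv.second);  // depth-1 nodes keep fail = 0
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (const auto& kv : nodes[v].next) {
                char c = kv.first;
                int u = kv.second;
                int f = nodes[v].fail;                       // start from the parent's failure link
                while (f != 0 && nodes[f].next.count(c) == 0)
                    f = nodes[f].fail;
                nodes[u].fail = nodes[f].next.count(c) ? nodes[f].next.at(c) : 0;
                const std::vector<int>& inh = nodes[nodes[u].fail].out;
                nodes[u].out.insert(nodes[u].out.end(), inh.begin(), inh.end());
                q.push(u);
            }
        }
    }

    // Scan the text once; report every (pattern, end position) pair.
    void search(const std::string& text) const {
        int v = 0;
        for (int i = 0; i < (int)text.size(); ++i) {
            char c = text[i];
            while (v != 0 && nodes[v].next.count(c) == 0)
                v = nodes[v].fail;                           // fall back on a mismatch
            if (nodes[v].next.count(c))
                v = nodes[v].next.at(c);                     // follow the trie edge
            for (int id : nodes[v].out)
                std::cout << "pattern " << id << " ends at position " << i + 1 << "\n";
        }
    }
};

int main() {
    AhoCorasick ac;
    const std::vector<std::string> patterns = {"aaa", "abc", "baa", "baba", "cb"};  // the set of Figure 7
    for (int i = 0; i < (int)patterns.size(); ++i)
        ac.addPattern(patterns[i], i);
    ac.build();
    ac.search("aabababc");   // reports pattern 3 (baba) ending at 6 and pattern 1 (abc) ending at 8
}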

Example 10.7. See Figure 7. ♦

(Quote from colleague in paper: “A massive thanks [..] for pointing us to the Aho-Corasick algorithm which resulted in a speed-up of several orders of magnitude.”)

References. ALFRED V. AHO, MARGARET J. CORASICK: Efficient string matching: An aid to bibliographic search. Communications of the ACM 18 (June 1975) 333–340. doi:10.1145/360825.360855
See DROZDEK Section 13.1.6: Matching Sets of Words


Figure 7: Trie for {aaa,abc,baa,baba,cb}, with failure links. Left, in red: searching for aaba.... Right, in blue: constructing the failure link at 6. The incoming edge from 4 has label c; now follow the failure link at 4 and then search for c.

10.3 Comparing texts

Not exam material. Originally written for a course in Molecular Computational Biology.
The similarity (or rather dissimilarity) between two strings can be measured by the number of operations needed to transform one into the other. There are three basic operations we consider here: changing one character into another, inserting a character, or deleting a character. In the context of molecular biology these operations correspond to mutations (point mutations, or insertions and deletions) in the genome.
When we assume that no two operations take place at the same position (like changing a character and then removing it), the operations used to transform one string into another can be represented by an alignment of the two strings. Corresponding symbols are written in columns, marking positions where a symbol was deleted or inserted with a dash in the proper position.

Example 10.8. We have recreated an example of an alignment given on Wikipedia. It consists of the sequences AAB24882 and AAB24881, and was generated using the ClustalW2 tool at the European Bioinformatics Institute, with all settings left at their defaults. The symbol * in the bottom row indicates that the two sequences are equal at that position, whereas : and . indicate decreasing similarity of the amino acids at that position.

82  TYHMCQFHCRYVNNHSGEKLYECNERSKAFSCPSHLQCHKRRQIGEKTHEHNQCGKAFPT 60
81  ------YECNQCGKAFAQHSSLKCHYRTHIGEKPYECNQCGKAFSK 40
          ****:.***: **:** * :****.:* *******..

82  PSHLQYHERTHTGEKPYECHQCGQAFKKCSLLQRHKRTHTGEKPYE-CNQCGKAFAQ- 116
81  HSHLQCHKRTHTGEKPYECNQCGKAFSQHGLLQRHKRTHTGEKPYMNVINMVKPLHNS 98
    **** *:***********:***:**.: .*************** : *.: :
♦

Formally, an alignment of strings x and y over an alphabet Σ is a sequence of columns (x_1,y_1), (x_2,y_2), ..., (x_ℓ,y_ℓ), with x_i, y_i ∈ Σ ∪ {ε} and (x_i,y_i) ≠ (ε,ε), such that x = x_1 x_2 ... x_ℓ and y = y_1 y_2 ... y_ℓ. Note that ℓ ≤ |x| + |y|. Usually the empty string ε is represented by a dash −.
Given two sequences x and y it is an algorithmic task to determine an alignment where the number of operations involved is minimal, counting the number of positions in the alignment where the two rows have unequal content. This value is called the edit distance of x and y.
In general one considers a weighted version of this problem by adding a scoring system. This in general consists of a similarity matrix σ (or substitution matrix) specifying a value σ(a,b) for all a,b in the alphabet (representing the cost of changing a into b), and gap penalties σ(a,−) and σ(−,b) for deleting a or inserting b. Thus the score for the general alignment above is given by ∑_{i=1}^{ℓ} σ(x_i,y_i), where the empty string ε is equated with the dash −. Given a scoring system, the similarity of strings x and y is defined to be the maximal score taken over all alignments of x and y. An alignment that has this score is called an optimal alignment of x and y.
In simple examples the scores are given by just three values: one fixed value (typically positive) for matches σ(a,a), one (typically negative) for mismatches σ(a,b), a ≠ b, and one (also negative) for the ‘insdels’ (insertions and deletions) σ(a,−) and σ(−,b). The latter is sometimes referred to as the gap penalty.

Example 10.9. The scoring system on the alphabet {A,C,G,T} of nucleotides is defined here by the values +2 and −1 for match and mismatch, and −1 for gaps. For the strings TCAGACGATTG and TCGGAGCTG a possible alignment is

TCAG-ACG-ATTG
TC-GGA-GC-T-G

It consists of 7 matches and 6 insdels, so its score is 14 − 6 = 8. Similarly, the alignment

TCAGACGATTG
TCGGA-GCT-G

consists of 6 matches, 2 mismatches, and 2 insdels. Consequently the score is 12 − 2 − 2 = 8. Both alignments have the same score, and the similarity of the strings is at least 8.

♦

As stated above, one usually has σ(a,a) > 0. In some applications also σ(a,b) may be positive when a and b are different (but have some similarity). For instance, the BLOSUM62 scoring system, used for amino acids, has this feature. In general σ will be symmetric: σ(a,b) = σ(b,a). In the sequel the value for insdels is assumed to be given by a fixed gap penalty g ≤ 0, which does not depend on the symbol that is deleted or introduced.
As noted above, an alignment allows at most a single operation at each position, which seems reasonable in general. Consider the case where deleting A and C costs −5 and −2, respectively, whereas substituting A by C costs −2. Now the two operations A → C → − have total cost −4, which is better than the direct deletion of A. In such a case the algorithms of this section will compute the optimal alignment; however, this might not correspond to the set of operations with optimal score from one string to the other. Usually the scoring system avoids this kind of problem.

Global Alignment

Given a pair of strings x and y over an alphabet Σ and a scoring system for Σ, we want to compute the similarity of x and y, and an optimal alignment of the strings. We use a dynamic programming approach for this problem. The algorithm computes the similarity for each pair of prefixes of the two strings, starting with short prefixes, storing the values in a table, and reusing them for the longer prefixes. When the scores of the partial alignments have been determined, the second phase starts. The alignment itself is computed from the numbers stored, working backwards. This is called a traceback. In the context of molecular biology this method is known as the Needleman-Wunsch algorithm.
A recursive implementation of the problem is easily given. Consider the last position of an optimal alignment of strings xa and yb. There are only three possibilities for the last column: (−,b), (a,b), or (a,−). Hence the similarity of x and y, the value sim(xa,yb) of an optimal alignment, is found by recursively computing

sim(xa,yb) = max { sim(xa,y) + g,  sim(x,y) + σ(a,b),  sim(x,yb) + g }

Boundary values (when one of the sequences is empty) can be obtained from the identities sim(xa,ε) = sim(x,ε) + g, and sim(ε,ε) = 0. Let x = x1 ...xm and y = y1 ...yn be two strings that we want to align. Denote the value of the optimal alignment of the prefixes x1 ...xi and y1 ...y j by A[i, j], to

stay close to programming style. (In our program strings start at position 1; index i = 0 or j = 0 corresponds to the empty string.) The first phase of the algorithm computes the values A[i, j] as follows. The value of the optimal alignment, the similarity of x and y, can be found as A[m,n].

A[i,0] = i · g                                   0 ≤ i ≤ m
A[0,j] = j · g                                   0 ≤ j ≤ n
A[i,j] = max { A[i,j−1] + g,
               A[i−1,j−1] + σ(x_i,y_j),
               A[i−1,j] + g }                    1 ≤ i ≤ m, 1 ≤ j ≤ n
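
A sketch of this first phase in C++ (not from the text), for the simple three-value scoring used in the examples, could read as follows.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Global alignment (Needleman-Wunsch), first phase only: fill the matrix A
// and return the similarity A[m][n]. Simple scoring: match, mismatch, gap g.
int globalAlignmentScore(const std::string& x, const std::string& y,
                         int match, int mismatch, int g) {
    int m = x.size(), n = y.size();
    std::vector<std::vector<int>> A(m + 1, std::vector<int>(n + 1));
    for (int i = 0; i <= m; ++i) A[i][0] = i * g;        // boundary: align against the empty string
    for (int j = 0; j <= n; ++j) A[0][j] = j * g;
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= n; ++j) {
            int s = (x[i - 1] == y[j - 1]) ? match : mismatch;   // sigma(x_i, y_j)
            A[i][j] = std::max({A[i][j - 1] + g,         // gap in x
                                A[i - 1][j - 1] + s,     // match / mismatch
                                A[i - 1][j] + g});       // gap in y
        }
    return A[m][n];
}

int main() {
    // Example 10.10: match 5, mismatch -2, insdel -6 gives similarity 0.
    std::cout << globalAlignmentScore("TTCAT", "TGCATCGT", 5, -2, -6) << '\n';
}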

For the second phase, the traceback, we assume that for each position (i,j) in the matrix A we stored from which of the cases the value of that element was obtained: (i,j−1), (i−1,j−1), or (i−1,j) – the three cases in the maximum, representing the columns (−,y_j), (x_i,y_j), or (x_i,−). Now start at the bottom-right position (m,n), and return to the cell of the matrix that resulted in that value. This is repeated until the first cell (0,0) is reached. In many cases the maximum is obtained not for one of the arguments, but for two or even three arguments. In that case we can choose to store just one of them, or to store all of them and trace all alignments rather than a single one. Alternatively, the trace is not followed from stored values, but is recomputed from the values in the matrix at the current position A[i,j] and the three neighbouring positions A[i−1,j], A[i−1,j−1], and A[i,j−1].

Complexity. We assume the given strings have length m and n, respectively. The matrix takes space O(mn), and computing all its elements takes time O(mn). The traceback is computed in time O(m + n). If we do not need the alignment itself, but only its score, it is not necessary to store all elements of the matrix but only the last column (or last row). This reduces the space complexity to O(m), but the time complexity remains O(mn).

Example 10.10. Global alignment of TTCAT and TGCATCGT with scoring system match 5, mismatch −2, and insdel −6. A graph representation of the problem is shown below. The task is to find the optimal path from top-left to bottom-right, where the costs of traversing an edge are related to the label of the edge (negative values represent costs, while positive values can be seen as rewards). Bold diagonal edges represent matches, horizontal and vertical edges represent insdels.

[Figure: the alignment graph for TTCAT (rows) against TGCATCGT (columns), with edges labelled match, mismatch and insdel.]

The following matrix is computed by the dynamic programming algorithm. It indicates that the score of the optimal global alignment equals 0.

        −    T    G    C    A    T    C    G    T
   −    0   −6  −12  −18  −24  −30  −36  −42  −48
   T   −6    5   −1   −7  −13  −19  −25  −31  −37
   T  −12   −1    3   −3   −9   −8  −14  −20  −26
   C  −18   −7   −3    8    2   −4   −3   −9  −15
   A  −24  −13   −9    2   13    7    1   −5  −11
   T  −30  −19  −15   −4    7   18   12    6    0

The alignment itself can be traced back from the final position, following incoming edges that represent the direction over which the maximal score was obtained. These edges and a possible traceback are as follows, giving the alignment

T---TCAT
TGCATCGT
+5 -6 -6 -6 +5 +5 -2 +5

[Diagram: the cells of the matrix with their incoming optimal-score edges and one traceback path.] Two other alignments with optimal score can be read from the diagram.

TTCA---T        TTCAT---
TGCATCGT        TGCATCGT

♦

Edit Distance. The edit distance between two strings, also called the Levenshtein distance, counts the minimum number of operations needed to change one string into the other. Possible operations are inserting or deleting a character, as well as changing a character into another. This corresponds to an alignment with match score 0, while mismatch and insdel both score −1: the edit distance is the negation of the optimal alignment score.
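
A minimal C++ sketch of this special case (unit costs; the function name editDistance is made up for this illustration):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Edit (Levenshtein) distance: minimum number of insertions, deletions and
// substitutions needed to change x into y.
int editDistance(const std::string& x, const std::string& y) {
    int m = x.size(), n = y.size();
    std::vector<std::vector<int>> D(m + 1, std::vector<int>(n + 1));
    for (int i = 0; i <= m; ++i) D[i][0] = i;            // delete all of x
    for (int j = 0; j <= n; ++j) D[0][j] = j;            // insert all of y
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= n; ++j) {
            int subst = (x[i - 1] == y[j - 1]) ? 0 : 1;  // keeping a letter is free, changing costs 1
            D[i][j] = std::min({D[i][j - 1] + 1,         // insert y_j
                                D[i - 1][j - 1] + subst, // keep or change
                                D[i - 1][j] + 1});       // delete x_i
        }
    return D[m][n];
}

int main() {
    // The two strings of Example 10.9.
    std::cout << editDistance("TCAGACGATTG", "TCGGAGCTG") << '\n';
}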

LCS. A string z is a subsequence of string x if z can be obtained by deleting symbols from x. Formally, z is a subsequence of x = x_1 ... x_m if we can write z = x_{i_1} x_{i_2} ... x_{i_k} for i_1 < i_2 < ··· < i_k. A longest common subsequence of strings x and y is a string z of maximal length such that z is a subsequence of both x and y. Although z itself may not be unique, the length of a longest common subsequence of given strings is. The problem of finding a longest common subsequence can be answered by computing the alignment where a match is rewarded by +1 while the mismatch and insdel penalties are both 0.

ATTGCC-A--TT
A-T-CCAATTTT

Local Alignment

Global alignment attempts to align every character in both sequences. This is useful when the sequences are similar and of roughly equal size. In some cases one expects only parts of the strings to be similar, e.g., when both strings contain a common motif. In such cases one tries to find segments of both strings that are similar, using local alignment (known as Smith-Waterman). This uses a simple adaptation of the global approach, and is the main topic of this section. Another variant is when one wants to determine whether one string can be seen as extending the other, following a partial overlap. This is motivated by sequence reconstruction based on a set of substrings. Also this can be solved by an adaptation of the dynamic programming technique.
Given two strings x and y, and a scoring system, we want to find substrings x′ and y′ (of x and y respectively) such that the similarity of x′ and y′ is maximal. The main difference with the global version of the algorithm is that we can forget negative values. Whenever a partial alignment reaches a negative value it is reset to zero. As we want to find substrings with maximal alignment we can drop the pieces that give a negative contribution. In the same vein the value of the local alignment is not found in the bottom-right corner of the matrix, but rather it is the maximal value found in the matrix.

Figure 8: Global versus local alignment.

Indeed, extending the maximum alignment will add a negative contribution to the value obtained so far, and this part is skipped when we stop at the maximum.
This means we obtain the following algorithm for the computation of a local alignment. It differs in two aspects from global alignment. The initialization sets the borders to zero (not to multiples of the gap penalty), and the maximum in the computation of the non-border cells now includes zero, to avoid negative values.

A[i,0] = 0                                       0 ≤ i ≤ m
A[0,j] = 0                                       0 ≤ j ≤ n
A[i,j] = max { A[i,j−1] + g,
               A[i−1,j−1] + σ(x_i,y_j),
               A[i−1,j] + g,
               0 }                               1 ≤ i ≤ m, 1 ≤ j ≤ n
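
The corresponding C++ sketch (same caveats as for the global version) only changes the initialization, adds the 0 in the maximum, and returns the largest cell.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Local alignment (Smith-Waterman): as the global version, but negative
// partial scores are reset to 0 and the answer is the maximum over all cells.
int localAlignmentScore(const std::string& x, const std::string& y,
                        int match, int mismatch, int g) {
    int m = x.size(), n = y.size();
    std::vector<std::vector<int>> A(m + 1, std::vector<int>(n + 1, 0));  // borders are 0
    int best = 0;
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= n; ++j) {
            int s = (x[i - 1] == y[j - 1]) ? match : mismatch;
            A[i][j] = std::max({A[i][j - 1] + g,
                                A[i - 1][j - 1] + s,
                                A[i - 1][j] + g,
                                0});                      // never go below zero
            best = std::max(best, A[i][j]);               // track the maximal cell
        }
    return best;
}

int main() {
    // Example 10.11: match 2, mismatch -1, insdel -1 gives local score 7.
    std::cout << localAlignmentScore("ATTCAT", "TGCATCGT", 2, -1, -1) << '\n';
}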

Example 10.11. Local alignment of ATTCAT and TGCATCGT with scoring system match 2, mismatch −1, and insdel −1. The following matrix is computed by the dynamic programming algorithm. It indicates that the score of the optimal local alignment equals 7, which is the maximal value in the matrix.

       −  T  G  C  A  T  C  G  T
   −   0  0  0  0  0  0  0  0  0
   A   0  0  0  0  2  1  0  0  0
   T   0  2  1  0  1  4  3  2  2
   T   0  2  1  0  0  3  3  2  4
   C   0  1  1  3  2  2  5  4  3
   A   0  0  0  2  5  4  4  4  3
   T   0  2  1  1  4  7  6  5  6


Figure 9: Semi-global alignment: finding overlap or containment.

Tracing back the matrix from the maximal position until a zero is reached one finds one of the following alignments.

TTCAT        T-CAT
TGCAT        TGCAT
♦

Semi-Global Alignment. In some contexts we are interested in a specific overlap between the strings x and y. For instance, when we have a set of (overlapping) random segments of a long string we may reconstruct the original long string using the segments, after we have determined their order using the overlap between the strings. Hence we are interested in determining the maximal overlap consisting of a suffix of x and a prefix of y. As another example, when y is much shorter than x it is not very useful to consider the global alignment of x and y to find the possible position of y within x. See Fig. 9 for a pictorial representation of these two cases.
Both these cases share with local alignment the property that gaps at the beginning or the end of one of the strings should not be penalized. As a consequence these problems can be solved in a similar manner. Where initial gaps are free we can include this in the initialization phase of the algorithm (see local alignment), not counting the gap penalty in the first row or column of the matrix. Where final gaps are free we solve this in the final phase of the algorithm. The solution then is not found in the bottom-right cell, but rather is the maximal value on either the bottom row or the rightmost column.

References. V.I. LEVENSHTEIN: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10 (1966) 707–710.
S.B. NEEDLEMAN, C.D. WUNSCH: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48 (1970) 443–453. doi:10.1016/0022-2836(70)90057-4

T.F. SMITH, M.S. WATERMAN: Identification of Common Molecular Subsequences. Journal of Molecular Biology 147 (1981) 195–197. doi:10.1016/0022-2836(81)90087-5

10.4 Other methods

The method of Boyer-Moore also takes into account the letter seen in the text at a mismatch. Moreover it works backwards, starting from the last letter of the pattern. If the last letter of the pattern does not fit the letter in the text, and that letter does not occur in the pattern at all, then we can shift the pattern completely over its present interval.
Apart from finding the first occurrence of the pattern in a string one might ask related questions, like how many instances of the pattern can be found in the text, or what is the longest string that appears at least twice in the text. This leads to surprisingly efficient data structures that preprocess the text (not the pattern), called the Suffix Tree and the Suffix Array.

Exercise a) In the algorithm of Knuth-Morris-Pratt we compute failure links for each of the positions in the pattern we are searching for. What do we know about the pattern if it is given that the failure link of position k equals r?

b) Compute, in an efficient way, the failure links for the pattern P = abacaaba.

c) We search for P in the text T = abab caab aaab acab acaa baba (the spaces are here only for readability). Indicate precisely how the search according to the KMP algorithm proceeds. Which letters are compared with each other at each step?

d) Give an example of a type of pattern for which KMP works significantly faster than the naive algorithm. (The naive algorithm to find patterns in a text works as follows: we place the pattern at the beginning of the text and compare pattern and text letter by letter. When a difference occurs, we shift the pattern one letter further and start again.)

Mar 2016
