On the Implementation of Purely Functional Data Structures for the Linearisation Case of Dynamic Trees

On the Implementation of Purely Functional Data Structures for the Linearisation case of Dynamic Trees By: Juan Carlos Sáenz-Carrasco A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy The University of Sheffield Faculty of Engineering Department of Computer Science October 2019 Abstract Dynamic trees, originally described by Sleator and Tarjan, have been studied in detail for non persistent structures providing O(log n) time for update and lookup operations as shown in theory and practice by Werneck. However, there are two gaps in current theory. First, how the most common dynamic tree operations (link and cut) are computed over a purely functional data structure has not been studied in detail. Second, even in the imperative case, when checking whether two vertices u and v are connected (i.e. in the same component), it is taken for granted that the corresponding location indices (i.e. pointers, which are not allowed in purely functional programming) are known a priori and do not need to be computed, yet this is rarely the case in practice. In this thesis we address these omissions by formally introducing two new data structures, Full and Top, which we use to represent trees in a functionally efficient manner. Based on a primitive version of finger trees – the de facto sequence data structure for the purely lazy-evaluation programming language Haskell – they are augmented with collection (i.e. set-based) data structures in order to manage efficiently k-ary trees for the so-called linearisation case of the dynamic trees problem. Different implementations are discussed, and their performance is measured. Our results suggest that relative timings for our proposed structures per- form sublinear time per operation once the forest is generated. Furthermore, Full and Top implementations show simplicity and preserve purity under a common interface. 1 Dedication To my wife Tonita, for the unconditional support, fed with pa- tience and love (and a teaspoon of sugar). To my kids Sarah and Juan Pablo, for understanding my absence from home. And special thanks to Mum and Dad who unfortunately passed away during my studies. 2 Acknowledgements The present work could not be achievable without the support and encouragement from Mike Stannett. Equally important, I appre- ciate the advise from Georg Struth to pursue the goals of the high demands in the research community. I definitely enjoyed the PhD project thanks to my colleagues at the Verification and Validation Lab and for sharing the lunch time with my colleagues at the Algorithms Group. My studies would not have been possible without the funding from the Consejo Nacional de Ciencia y Tecnología, CONACYT (Mex- ican National Council for Science and Technology) under grant 411550, scholar 580617, CVU 214885. 3 Contents 1 Introduction 13 1.1 Problem Statement . 14 1.2 Motivation . 18 1.2.1 Applications where dynamic trees operations take place . 18 1.2.2 Dynamic trees in Functional Programming 19 1.3 Benefits from Functional Programming . 19 1.4 Source Language . 21 1.4.1 Why Haskell? . 21 1.5 Terminology . 24 1.6 Contributions . 24 1.7 Structure of this Thesis . 25 2 Related Work 27 2.1 Path Decomposition . 28 2.1.1 Purely Functional Implementation . 29 2.2 Tree Contraction . 30 2.3 Linearisation . 30 2.4 Chapter notes . 33 3 Fundamentals 34 3.1 Forest and trees nomenclature . 34 4 3.2 Input tree data structure . 36 3.3 Data.Set ...................... 37 3.4 2-3 trees . 39 3.5 Monoids . 41 3.5.1 Monoidal annotation . 42 3.6 Finger Trees . 42 3.6.1 Structure of Ft ............... 43 3.6.2 Amounts of data stored in the FT data structure . 46 3.6.3 Operations in Ft .............. 50 3.6.4 Accessing the endpoints of a Ft ...... 53 3.6.5 Inserting at the endpoints of a Ft ..... 54 3.6.6 Appending FTs................ 57 3.6.7 Searching and splitting in Ft ........ 59 4 Euler-Tour Trees Functionally, FunEtt 67 4.1 Euler-tour trees by Henzinger and King . 67 4.1.1 Representation of the input tree . 68 4.1.2 Operations on Ett-HK ........... 68 4.2 Euler-tour trees by Tarjan . 70 4.2.1 Representation of the input tree . 70 4.2.2 Operations on Ett-T ............ 70 4.3 FunEtt ....................... 71 4.3.1 Representation of the input tree . 72 4.3.2 FunEtt data structure . 72 4.3.3 Operations on FunEtt ........... 74 4.4 Chapter Notes . 84 5 Indexless data structures 85 5.1 Full dynamic trees . 85 5.1.1 Full dynamic trees data types . 86 5 5.1.2 Full dynamic trees operations . 88 5.1.3 Experimental analysis of Full dynamic trees 95 5.2 Top dynamic trees . 126 5.2.1 Top dynamic trees data types . 127 5.2.2 Top dynamic trees operations . 129 5.2.3 Top vs Full experimental results . 133 6 Conclusion 142 6.1 Further Directions . 143 6 List of Figures 1.1 Example of a forest . 15 1.2 Example of pre-link ................ 16 1.3 Example of link applied . 16 1.4 Example of pre-cut ................. 16 1.5 Example of cut applied . 16 2.1 Input tree . 31 2.2 Input tree, turning edges into directed-edges . 31 2.3 Ett is built from input tree . 31 2.4 ETT represented by a finger tree . 32 3.1 unit forest . 34 3.2 2-node forest . 35 3.3 10-node forest . 35 3.4 Input tree . 37 3.5 Balanced Search Tree or BST . 39 3.6 Leafy 2-3 tree . 39 3.7 Nodal 2-3 tree . 40 3.8 Complete 2-3 tree . 40 3.9 Finger tree definition, Ft .............. 43 3.10 Node type . 44 3.11 Digit (or affix) type . 45 3.12 Ft example . 46 3.13 Number of leaves, Empty bottom . 47 7 3.14 Number of leaves, Single bottom . 48 3.15 Number of monoidal annotations, Empty bottom . 49 3.16 Number of monoidal annotations, Single bottom . 50 3.17 search an element . 63 3.18 split operation . 64 3.19 Ordered-set, initial example . 66 3.20 Ordered-set, initial annotations . 66 3.21 Ordered-set, final example . 66 4.1 An input tree . 67 4.2 FunEtt, input tree . 73 4.3 FunEtt, initial example . 74 4.4 Input tree example for cutTree .......... 78 4.5 Trees result from cutTree ............. 80 4.6 Tree result from linkTree ............. 84 5.1 FunEtt, repeated edges . 87 5.2 FunEtt, reduced sets . 87 5.3 Input forest example . 90 5.4 Full link example . 92 5.5 Full cut example . 94 5.6 Sample 1 of plotting . 96 5.7 Sample 2 of plotting . 96 5.8 Sample 3 of plotting . 97 5.9 Plotting unit Full forest, means . 98 5.10 Plotting unit Full forest, medians . 99 5.11 Plotting 2-node Full forest, means . 100 5.12 Plotting 2-node Full forest, medians . 101 5.13 Plotting 10-node Full forest, means . 103 5.14 Plotting 10-node Full forest, medians . 103 5.15 Plotting 300-node Full forest, means . 104 8 5.16 Plotting 300-node Full forest, medians . 104 5.17 Plotting all Full forests construction . 106 5.18 Plotting connectivity 10-node forest . 108 5.19 Performance of connectedMSet over a 10-node Full forest, multiple runs. 110 5.20 Performance of connectedMSet over a 300-node Full forest. 111 5.21 Performance of connectedMSet over a 300-node Full forest, showing one of the outliers. 112 5.22 Performance of connectedMSet over a 10-node and 300-node Full forest, for different number of runs per forest. 113 5.23 Performance of link operation over a unit, 2-node, 10-node and 300-node Full forests, sampling every 10 means. 115 5.24 Performance of link operation over a unit, 2-node, 10-node and 300-node Full forests, sampling every 100 medians. 116 5.25 Performance of cut operation over a one-tree, 2- node, 10-node and 300-node Full forests, sampling every 10 means. 118 5.26 Performance of cut operation over a one-tree, 2- node, 10-node and 300-node Full forests, sampling every 100 medians. 119 5.27 Performance of link and cut operations over 10- node and 300-node Full forests, sampling every 100 medians. 121 5.28 Performance individual link-cut incrementing number of nodes . 123 9 5.29 Performance individual link-cut incrementing number of operations, 10-node forest . 124 5.30 Performance individual link-cut incrementing number of operations, 300-node forest . 125 5.31 Accumulators of a finger tree . 127 5.32 Top accumulator of a Top finger tree . 128 5.33 General view of a Top finger tree . 128 5.34 FullvsTop, forests construction, 300-node ..... 134 5.35 FullvsTop, forests construction, unit vs 2-node vs 10-node ....................... 135 5.36 FullvsTop, connectivity 10-node and 300-node forests136 5.37 FullvsTop, link in unit and 2-node forests . 137 5.38 FullvsTop, link in 10-node forests . 138 5.39 FullvsTop, link in 300-node forests . 138 5.40 FullvsTop, cut in forests . 139 5.41 FullvsTop, individual link-cut in forests where input number of nodes . 140 5.42 FullvsTop, individual link-cut in forests where input is number of operations . 141 10 List of Tables 3.1 Data.Set operations . 38 3.2 Finger tree operations . 52 3.3 Accumulation of <> example . 56 4.1 Summary Ft operations .

On the Implementation of Purely Functional Data Structures for the Linearisation Case of Dynamic Trees

Cache Oblivious Search Trees Via Binary Trees of Small Height

Lecture 04 Linear Structures Sort

Advanced Data Structures

[Type the Document Title]

Search Trees

The Random Access Zipper: Simple, Purely-Functional Sequences

Design of Data Structures for Mergeable Trees∗

Parallel and Nearest Neighbor Search for High-Dimensional Index Structure of Content- Based Data Using DVA-Tree R

Top Tree Compression of Tries∗

Bulk Updates and Cache Sensitivity in Search Trees

List of Transparencies

Funnel Heap - a Cache Oblivious Priority Queue