On Design and Applications of Practical Concurrent Data Structures
THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

On Design and Applications of Practical Concurrent Data Structures

IVAN WALULYA

Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2018

Copyright © 2018 Ivan Walulya except where otherwise stated. All rights reserved.
ISBN 978-91-7597-815-4
Doktorsavhandlingar vid Chalmers tekniska högskola, Ny serie nr 4496.
ISSN 0346-718X
Technical report 164D
Department of Computer Science and Engineering
Chalmers University of Technology
SE-412 96 Gothenburg, Sweden
Phone: +46 (0)31-772 10 00
Author e-mail: [email protected]

This thesis has been prepared using LaTeX.
Printed by Chalmers Reproservice, Gothenburg, Sweden 2018.

ABSTRACT

The proliferation of multicore processors is having an enormous impact on software design and development. In order to exploit parallelism available in multicores, there is a need to design and implement abstractions that programmers can use for general purpose applications development. A common abstraction for coordinated access to memory is a concurrent data structure. Concurrent data structures are challenging to design and implement as they are required to be correct, scalable, and practical under various application constraints. In this thesis, we contribute to the design of efficient concurrent data structures, propose new design techniques and improvements to existing implementations. Additionally, we explore the utilization of concurrent data structures in demanding application contexts such as data stream processing.

In the first part of the thesis, we focus on data structures that are difficult to parallelize due to inherent sequential bottlenecks.
We present a lock-free vector design that efficiently addresses synchronization bottlenecks by utilizing the combining technique. Typical combining techniques are blocking. Our design introduces combining without sacrificing non-blocking progress guarantees. We extend the vector to present a concurrent lock-free unbounded binary heap that implements a priority queue with mutable priorities.

In the second part of the thesis, we shift our focus to concurrent search data structures. In order to offer strong progress guarantees, typical implementations of non-blocking search data structures employ a “helping” mechanism. However, helping may result in performance degradation. We propose help-optimality, which expresses optimization in amortized step complexity of concurrent operations. To describe the concept, we revisit the lock-free designs of a linked-list and a binary search tree and present improved algorithms. We design the algorithms without using any language/platform specific constructs; we do not use bit-stealing or runtime type introspection of objects. Thus, our algorithms are portable. We further delve into multi-dimensional data and similarity search. We present the first lock-free multi-dimensional data structure and linearizable nearest neighbor search algorithm. Our algorithm for nearest neighbor search is generic and can be adapted to other data structures.

In the last part of the thesis, we explore the utilization of concurrent data structures for deterministic stream processing. We propose solutions to two challenges prevalent in data stream processing: (1) efficient processing on cloud as well as edge devices and (2) deterministic data-parallel processing at high throughput and low latency. As a first step, we present a methodology for customization of streaming aggregation on low-power multicore embedded platforms.
Then we introduce Viper, a communication module that can be integrated into stream processing engines for the coordination of threads analyzing data in parallel.

Keywords: atomicity, combining, concurrent data structures, lock-free, locking, multicore, non-blocking, synchronization, stream processing

List of Publications

Appended publications

1. Ivan Walulya and Philippas Tsigas, “Scalable lock-free vector with combining,” in the Proceedings of the 31st International Parallel and Distributed Processing Symposium, pp. 917–926, IEEE 2017.
2. Ivan Walulya, Bapi Chatterjee, Ajoy K. Datta, Rashmi Niyoliya, and Philippas Tsigas, “Concurrent lock-free unbounded priority queue with mutable priorities,” in the Proceedings of the 20th International Symposium on Stabilization, Safety, and Security of Distributed Systems, LNCS, Springer 2018.
3. Bapi Chatterjee, Ivan Walulya, and Philippas Tsigas, “Help-optimal and language-portable lock-free concurrent data structures,” in the Proceedings of the 45th International Conference on Parallel Processing, pp. 360–369, IEEE 2016.
4. Bapi Chatterjee, Ivan Walulya, and Philippas Tsigas, “Concurrent linearizable nearest neighbour search in lockfree-kd-tree,” in the Proceedings of the 19th International Conference on Distributed Computing and Networking, pp. 11:1–11:10, ACM 2018.
5. Lazaros Papadopoulos, Dimitrios Soudris, Ivan Walulya, and Philippas Tsigas, “Customization methodology for implementation of streaming aggregation in embedded systems,” Journal of Systems Architecture - Embedded Systems Design, vol. 66-67, pp. 48–60, Elsevier 2016.
6. Ivan Walulya, Dimitris Palyvos-Giannas, Yiannis Nikolakopoulos, Vincenzo Gulisano, Marina Papatriantafilou, and Philippas Tsigas, “Viper: A module for communication-layer determinism and scaling in low-latency stream processing,” Future Generation Computer Systems, vol. 88, pp. 297–308, Elsevier 2018.

Other publications

The following articles were also published during my PhD studies, but not included in this thesis.

A. Ivan Walulya, Yiannis Nikolakopoulos, Vincenzo Gulisano, Marina Papatriantafilou, and Philippas Tsigas, “Viper: Communication-layer determinism and scaling in low-latency stream processing,” in Euro-Par 2017: Parallel Processing Workshops, vol. 10659, pp. 129–140, LNCS, Springer 2018.
B. Lazaros Papadopoulos, Ivan Walulya, Philippas Tsigas, and Dimitrios Soudris, “A systematic methodology for optimization of applications utilizing concurrent data structures,” IEEE Transactions on Computers, vol. 65, no. 7, pp. 2019–2031, IEEE 2016.
C. Lazaros Papadopoulos, Ivan Walulya, Paul Renaud-Goud, Philippas Tsigas, Dimitrios Soudris, and Brendan Barry, “Performance and power consumption evaluation of concurrent queue implementations in embedded systems,” Computer Science - Research and Development, vol. 30, no. 2, pp. 165–175, Springer 2015.
D. Vincenzo Gulisano, Yiannis Nikolakopoulos, Ivan Walulya, Marina Papatriantafilou, and Philippas Tsigas, “Deterministic real-time analytics of geospatial data streams through scalegate objects,” in the Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, pp. 316–317, ACM 2015.
E. Ivan Walulya, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas, “Concurrent data structures in architectures with limited shared memory support,” in Euro-Par 2014: Parallel Processing Workshops, vol. 8805, pp. 189–200, LNCS, Springer 2014.
F. Lazaros Papadopoulos, Ivan Walulya, Philippas Tsigas, Dimitrios Soudris, and Brendan Barry, “Evaluation of message passing synchronization algorithms in embedded systems,” in the Proceedings of the 14th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 282–289, IEEE 2014.

Research Contribution

Paper 1 was authored in collaboration with Philippas Tsigas.
I contributed to the design and implementation of the presented algorithms. Additionally, I participated in the writing of the paper.

Paper 2 builds on previous work by Ajoy K. Datta and Rashmi Niyoliya, extending the work in Paper 1 to build a concurrent unbounded binary heap. In this paper, I contributed to the design of the presented algorithms, the implementations, and the authoring of the paper.

In Paper 3, I contributed to the design of the algorithms and the proof sketches, and developed the C/C++ implementations presented in the paper. Additionally, I developed the benchmark suite utilized for all results presented in the paper.

My contributions to Paper 4 include participation in the design of the Nearest Neighbor Search (NNS) algorithm on the concurrent KD-Tree, the proof of correctness of the NNS algorithm, the implementation of the designed algorithms and of the benchmarks used in the evaluation of the algorithm, and co-authoring of the paper.

Paper 5 is an extension of joint work with Lazaros Papadopoulos in Papers B, C, and F; together, the works were developed for exploiting parallelism available in low-power embedded systems. In this work, my contributions were on the design of concurrent data structures and algorithms ported onto the embedded systems. Additionally, I participated in the writing of the papers, while Papadopoulos performed the bulk of the experiments presented in the papers.

In Paper 6, I integrated ScaleGate (a novel interface for the deterministic merging of multiple data streams) into the Apache Storm stream processing engine, and extended ScaleGate to include flow control, thus making it usable in a task-based scheduler as opposed to a thread-based scheduler. Additionally, I implemented and performed benchmarks for throughput, latency, and energy measurements on general purpose processors. Benchmarks on Odroid devices were implemented in collaboration with Dimitris Palyvos-Giannas. Additionally, I was