Download the Result from the Cluster
Total Page:16
File Type:pdf, Size:1020Kb
UC Riverside UC Riverside Electronic Theses and Dissertations Title Interval Joins for Big Data Permalink https://escholarship.org/uc/item/7xb001cz Author Carman, Jr., Eldon Preston Publication Date 2020 License https://creativecommons.org/licenses/by/4.0/ 4.0 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA RIVERSIDE Interval Joins for Big Data A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Eldon Preston Carman, Jr. September 2020 Dissertation Committee: Dr. Vassilis J. Tsotras, Chairperson Dr. Michael J. Carey Dr. Ahmed Eldawy Dr. Vagelis Hristidis Dr. Eamonn Keogh Copyright by Eldon Preston Carman, Jr. 2020 The Dissertation of Eldon Preston Carman, Jr. is approved: Committee Chairperson University of California, Riverside Acknowledgments I am grateful to my advisor, Professor Vassilis Tsotras, without whose support, I would not have been able to complete this journey. Thank you for your continued encour- agement and guidance through this learning experience. I especially would like to thank my dissertation committee members, Dr. Ahmed Eldawy, Dr. Vagelis Hristidis, Dr. Eamonn Keogh, and Dr. Michael Carey, for reviewing my dissertation. I also like to thank Dr. Rajiv Gupta for his support in the beginning of this dissertation. I would like to thank everyone in AsterixDB’s team for their support. Especially Professor Michael Carey for his guidance in understanding the AsterixDB stack and managing the research in an active software product. Thank you to my lab partners, Ildar Absalyamov and Steven Jacobs, for all the lively discussions on our regular drives out to Irvine. Apache VXQuery had a strong base created by Vinayak Borkar and Till Westmann which made Chapter 7 possible and their continued feedback and guidance on how to develop Apache VXQuery. I would also like to acknowledge the other members of the AsterixDB team: Young-Seok Kim, Taewoo Kim, Jianfeng Jia, Pouria Pirzadeh, Abdullah Alamoudi, Murtadha Hubail, Yingyi Bu, Ali Alsuliman, Raman Grover, Sattam Alsubaiee, Keren-Audrey Ouaknine, and Dmitry Lychagin. Thanks to the rest of the UCR Database lab: Moloud Shahbazi, Jarod Wen, Shi- wen Cheng, Mohiuddin Qader, Christina Pavlopoulou, James Feng, and Elena Strzheletska. Thank you to Susan Gardner and James Foster for their editorial feedback. The text of this dissertation, in part, is a reprint of the material as it appears in IEEE Big Data 2015. The co-author Vassilis Tsotras listed in that publication directed and supervised the research which forms the basis for this dissertation. iv This research was partially supported by NSF grants: IIS-0910859, IIS-1447826 and CNS-1305253. Any Opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect those of the National Science Foundation. v To my wife and family for all their support. In the words of my daughter Cassia, “I love you so much!” vi ABSTRACT OF THE DISSERTATION Interval Joins for Big Data by Eldon Preston Carman, Jr. Doctor of Philosophy, Graduate Program in Computer Science University of California, Riverside, September 2020 Dr. Vassilis J. Tsotras, Chairperson The main part of this dissertation considers how to scale interval join queries. To provide scalable query processing for such joins, we adapted five recently published overlapping interval join algorithms and modified them to work in a shared-nothing big data management system (AsterixDB) under a memory budget. We developed a cost model for each algorithm to predict when an algorithm will spill to disk (run out of memory). Our experimental evaluation shows that the cost models are accurate and can be used to pick the most efficient algorithm for the given input data. The adapted interval join algorithms are shown to scale for large datasets using both synthetic and real datasets. Finally, we further adapt these algorithms to support several new types of interval joins, specifically overlap and contains, as defined by Allen’s interval algebra. We detail how to abstract the memory management from these algorithms. As a by-product we also implemented a scalable parallel processor, namely Apache VXQuery, that extends a stack consisting of Hyracks, a parallel execution engine, and Algebricks, a language-agnostic compiler toolbox. VXQuery provides an implementation of vii the XQuery specifics (data model, data-model dependent functions and optimizations, anda parser). We describe the architecture of Apache VXQuery, its integration with Hyracks and Algebricks, and the XQuery optimization rules applied to the query plan to improve path expression efficiency and to enable query parallelism. An experimental evaluation usinga real 500GB dataset with various XML selection, aggregation, and join queries shows that Apache VXQuery performs well both in terms of scale-up and speed-up. viii Contents List of Figures xi List of Tables xvii 1 Introduction 1 2 Background: The AsterixDB Software Stack 5 2.1 Hyracks . 6 2.2 Algebricks . 8 2.3 AsterixDB Runtime(s) . 11 2.4 Partitioning for Parallel Join Query Plans . 13 3 Scaling Overlapping Interval Joins 17 3.1 Introduction . 17 3.2 Extending AsterixDB for More Efficient Interval Joins . 19 3.3 Overlapping Interval Join Algorithms . 27 3.3.1 Sort-Merge Interval Join . 27 3.3.2 Time-Sweep Interval Join . 30 3.3.3 Forward-Scan Interval Join . 34 3.3.4 Overlap Interval Partition Join . 36 3.3.5 Disjoint Interval Partitioning Join . 40 3.4 Performance . 44 3.4.1 Speed-Up Non-Spilling Experiments . 48 3.4.2 Scale-Up Non-Spilling Experiments . 55 3.4.3 Scale-Up Spilling Experiments . 58 3.5 Conclusions . 62 4 Cost Models for Overlapping Interval Joins 64 4.1 Introduction . 64 4.2 Main Memory Utilization . 67 4.3 Join Size Estimation . 68 4.4 Cost Models . 71 ix 4.4.1 Sort-Merge Interval Join Model . 71 4.4.2 Time-Sweep Interval Join Model . 82 4.4.3 Forward-Scan Interval Join Model . 89 4.4.4 Overlap Interval Partition Join Model . 98 4.4.5 Disjoint Interval Partitioning Model . 107 4.4.6 Expanding the Models for Complex Datasets . 112 4.5 Model Accuracy . 116 4.5.1 Simple Dataset Self-Join . 116 4.5.2 Complex Dataset Self-Join . 119 4.6 Predicting Spilling . 121 4.7 Conclusions . 123 5 Lessons Learned From Implementing the Overlapping Interval Join Algo- rithms 125 6 Scaling Joins Involving Allen’s Interval Algebra 131 6.1 Introduction . 131 6.2 Covers and Overlaps Interval Join Algorithms . 134 6.2.1 Base Case: Sort-Merge Interval Join . 136 6.2.2 Modifying the Other Join Algorithms . 139 6.3 Performance . 140 6.3.1 Speed-Up Non-Spilling Experiments . 141 6.3.2 Scale-Up Non-Spilling Experiments . 145 6.3.3 Scale-Up Spilling Experiments . 149 6.4 Conclusions . 150 7 A Scalable Parallel XQuery Processor 152 7.1 Introduction . 152 7.2 Related Work . 154 7.3 Apache VXQuery’s Stack . 156 7.4 Rewrite Rules . 161 7.4.1 Path Expression Rewrite Rules . 161 7.4.2 Parallel Rewrite Rules . 167 7.5 Apache VXQuery Performance . 174 7.5.1 Weather Data . 175 7.5.2 Queries . 175 7.5.3 Experimental Results . 176 7.6 Conclusions . 182 8 Conclusions 184 Bibliography 187 x List of Figures 2.1 The layers of the Asterix architecture. 6 2.2 The components of the AsterixDB system architecture. 7 2.3 The components of the Algebricks architecture. 9 2.4 Global partitioning of a dataset in AsterixDB for interval join. 12 2.5 Hash partitioning applies a partition function to the data and re-partitions based on the hash results. 15 2.6 Broadcast partitioning sends each partition to all other partitions to cre- ate the full dataset in each partitions. The figure shows how the resulting partitions grows to hold the whole dataset. 15 2.7 Range partitioning creates partitions where each partition holds a specific range of join key values. 16 3.1 The initial query plan for Interval joins in AsterixDB. 20 3.2 PROJECT partitioning is the same as range partitioning with the interval start point as the key. 22 3.3 SPLIT partitioning is similar to range partitioning, but includes additional values that overlap multiple partitions. 22 3.4 REPLICATE partitioning is similar to range partitioning, but partially broad- casts each interval to all partitions with its start point and subsequent par- titions. 22 3.5 A sample join for Datasets R and S (where R and S have the same data) each with two partitions created using global SPLIT partitioning. 25 3.6 Updated query plan for AsterixDB interval join ordered partitioning and local sorting with SPLIT partitioning. 26 3.7 Sort-merge interval join actions for the in-memory mode. A) move a tuple from the stream of S to memory; B) move a tuple from the stream of R to memory; D) move a tuple from the spill file of R to memory; E) remove an active tuple in memory from S; and F) put the merged the tuples from R and S in to the result stream. 28 xi 3.8 Sort-merge interval join actions for the spilling mode. B) move a tuple from the stream of R to memory; C) copy the R tuple from memory to the spill file for R; D) move a tuple from the spill file of R to memory; E) remove an active tuple in memory from S; and F) put the merged the tuples from R and S into the result stream. 29 3.9 Time-sweep interval join actions for the in-memory mode. A) move a tuple from the stream of S to memory; B) move a tuple from the stream of R to memory; D) move a tuple from the spill file of R to memory; E) remove an active tuple in memory from S; F) remove an active tuple in memory from R; and G) put the merged the tuples from R and S into the result stream.