Index-Based Search Techniques for Visualization and Data Analysis Algorithms on Many-Core Systems
Total Page:16
File Type:pdf, Size:1020Kb
INDEX-BASED SEARCH TECHNIQUES FOR VISUALIZATION AND DATA ANALYSIS ALGORITHMS ON MANY-CORE SYSTEMS by BRENTON JOHN LESSLEY A DISSERTATION Presented to the Department of Computer and Information Science and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy June 2019 DISSERTATION APPROVAL PAGE Student: Brenton John Lessley Title: Index-Based Search Techniques for Visualization and Data Analysis Algorithms on Many-Core Systems This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Computer and Information Science by: Hank Childs Chair Boyana Norris Core Member Christopher Wilson Core Member Eric Torrence Institutional Representative and Janet Woodruff-Borden Vice Provost and Dean of the Graduate School Original approval signatures are on file with the University of Oregon Graduate School. Degree awarded June 2019 ii ⃝c 2019 Brenton John Lessley All rights reserved. iii DISSERTATION ABSTRACT Brenton John Lessley Doctor of Philosophy Department of Computer and Information Science June 2019 Title: Index-Based Search Techniques for Visualization and Data Analysis Algorithms on Many-Core Systems Sorting and hashing are canonical index-based methods to perform searching, and are often sub-routines in many visualization and analysis algorithms. With the emergence of many-core architectures, these algorithms must be rethought to exploit the increased available thread-level parallelism and data-parallelism. Data-parallel primitives (DPP) provide an efficient way to design an algorithm for scalable, platform-portable parallelism. This dissertation considers the following question: What are the best index- based search techniques for visualization and analysis algorithms on diverse many-core systems? To answer this question, we develop new DPP-based techniques, and evaluate their performance against existing techniques for data-intensive visualization and analysis algorithms across different many-core platforms. Then, we synthesize our findings into a collection of best practices and recommended usage. As a result of these efforts, we were able to conclude that our techniques demonstrate viability and leading platform- portable performance for several different search-based use cases. This dissertation is a culmination of previously-published co-authored material. iv CURRICULUM VITAE NAME OF AUTHOR: Brenton John Lessley DEGREES AWARDED: Ph.D. in Computer Science University of Oregon, 2019 M.S. in Computer Science University of Washington Tacoma, 2011 B.S. in Business Administration University of Washington Tacoma, 2009 AREAS OF SPECIAL INTEREST: Data-Parallel Algorithms Parallel Data Structures High-Performance Computing Scientific Visualization PROFESSIONAL EXPERIENCE: Graduate Research & Teaching Fellow University of Oregon, 2013 - 2019 Graduate Student Researcher Lawrence Berkeley Nat’l Lab, Summer 2017 Graduate Student Researcher Lawrence Berkeley Nat’l Lab, Summer 2016 Student Developer Google Summer of Code, Summer 2015 PUBLICATIONS: Brenton Lessley, Samuel Li, and Hank Childs (2019). HashFight: A Platform- Portable Hash Table for Multi-Core and Many-Core Architectures (in preparation). Brenton Lessley and Hank Childs (2019). Data-Parallel Hashing Techniques for GPU Architectures (in submission). v Brenton Lessley, Talita Perciano, Colleen Heinemann, David Camp, Hank Childs, and E. Wes Bethel (2018). DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives. Proceedings of the IEEE Symposium on Large Data Visualization and Analysis (LDAV), Berlin, Germany, pp. 1–11. Brenton Lessley, Kenneth Moreland, Matthew Larsen and Hank Childs (2017). Techniques for Data-Parallel Searching for Duplicate Elements. Proceedings of the IEEE Symposium on Large Data Visualization and Analysis (LDAV), Phoenix, AZ, pp. 1–5. Brenton Lessley, Talita Perciano, Manish Mathai, Hank Childs and E. Wes Bethel (2017). Maximal Clique Enumeration with Data-Parallel Primitives. Proceedings of the IEEE Symposium on Large Data Visualization and Analysis (LDAV), Phoenix, AZ, pp. 16–25. Brenton Lessley, Roba Binyahib, Robert Maynard, and Hank Childs (2016). External Facelist Calculation with Data-Parallel Primitives. Proceedings of EuroGraphics Symposium on Parallel Graphics and Visualization (EGPGV), Groningen, The Netherlands, pp. 10–20. Sergio Davalos, Altaf Merchant, Gregory Rose, Brenton Lessley, and Ankur Teredesai (2015). ’The Good Old Days’: An Examination of Nostalgia in Facebook Posts. International Journal of Human-Computer Studies, vol. 83, pp. 83–93. Daniel Lowd, Brenton Lessley, and Mino De Raj (2014). Towards Adversarial Reasoning in Statistical Relational Domains. AAAI-14 Workshop on Statistical Relational AI (Star AI 2014), Quebec City, Canada. Brenton Lessley, Matthew Alden, Daniel Bryan, and Arindam Tripathy (2012). Detection of Financial Statement Fraud Using Evolutionary Algorithms. Journal of Emerging Technologies in Accounting, vol. 9, no. 1, pp. 71–94. vi ACKNOWLEDGEMENTS The path towards completing my Ph.D. has been quite an adventure, always seeing the light at the end of the tunnel, yet always one more challenging task to complete. In short, the past six years have more or less been a mind-training game that teaches me how to stay focused on a goal and work towards it tirelessly until completion, despite countless thoughts of giving up. To reach this point took more self-motivation than I ever knew I had and even more motivation and encouragement from those around me. Throughout the Ph.D. process, my parents have provided the best support, advice, and encouragement that I could have asked for. They have been my number one cheering squad and, with great humor, make sure that I always remain optimistic, from one paper or milestone to the next. I cannot thank them enough. The same thank you goes to my very good friend, Agrima, who has patiently heard every high note and low note from me while completing this long process. I know she sacrificed a lot of fun to join in on my many work sessions or postpone a trip due to a looming milestone. I truly appreciate her patience and sense of humor, and can’t thank her enough. If it wasn’t for my advisor, Hank Childs, I likely would not have even completed the PhD degree. From the day I joined the CDUX research group and planned out a road map towards completing the PhD, Hank has helped me become a self-sufficient researcher and provided countless opportunities to hone my area of expertise. I thank him for all the guidance and support he has provided me, particularly in the past year. I am proud of all the work that we have completed and published during the four years under his advisement. vii During my time at the University of Oregon, I have made some of my best friends, who provided so many nice memories and made the hard work much more enjoyable, even if it was as simple as going to a basketball or football game after completing some goal, holding a house party, or going to Starbucks for a coding session. You know who you are—thank you, you are the best. I was fortunate to have many awesome officemates during my six years in Deschutes Hall, all of whom provided plenty of great humor, language learning, and a benchmark for working harder. I can confidently say that our whiteboards were rarely empty. To Ali, Pedram, Azaad, Soheil, Sam, Vince, Stephanie, Abhishek, Manish, Garrett, Sudhanshu, and Steve, it was a great pleasure. This thank you also extends to my fellow labmates and friends in CDUX. From the group parties and practice presentations to random office conversations, your sense of humor and friendship were awesome. While most of my time was spent at the University of Oregon, I also spent two great summers at the Lawrence Berkeley National Lab with the Data Analytics & Visualization Group. The research projects and papers I completed during this time really boosted my confidence as a researcher and led to a pipeline of exciting work. I was lucky to have Wes Bethel and Talita Perciano as my mentors, and I appreciated the discussions and collaboration with David Camp, Colleen Heinemann, Dmitriy Morozov, Sugeerth Murugesan, Dani Ushizima, Burlen Loring, and Hari Krishnan. Finally, I would like to thank Boyana Norris, Chris Wilson, and Eric Torrence for their willingness to serve on my PhD committee and provide great feedback throughout the dissertation phase. viii TABLE OF CONTENTS Chapter Page I. INTRODUCTION ................................ 1 1.1. Dissertation Outline .............................. 4 1.2. Co-Authored Material ............................. 5 I Techniques 7 II. BACKGROUND ................................. 9 2.1. Parallel Computing & Data-Parallelism .................... 9 2.2. Data Parallel Primitives ............................ 12 2.3. General-Purpose GPU Computing ...................... 15 2.3.1. SIMT Architecture .......................... 17 2.3.2. Optimal Performance Criteria .................... 19 2.4. Index-Based Searching ............................ 22 2.4.1. Searching Via Sorting ......................... 24 2.4.2. Searching Via Hashing ........................ 25 2.5. Data-Parallel Hashing Techniques ...................... 30 2.5.1. Open-addressing Probing ....................... 31 2.5.1.1. Linear Probing-based Hashing . 32 2.5.1.2. Cuckoo-based Hashing ................... 33 2.5.1.3. Double Hashing ...................... 38 2.5.1.4. Robin Hood-based Hashing .