Abstract

With the surge of social media on one hand and the ease of obtaining information due to cheap sensing devices and open source on the other hand, the amount of data that can be processed is also vastly increasing. In addition, the world of computing has recently been witnessing a growing shift towards distributed systems due to the increasing importance of transforming data into knowledge in today’s data-driven world. At the core of data analysis for all sorts of applications lies pattern matching. Therefore, pattern matching algorithms should be parallelized efficiently in order to cater to this ever-increasing abundance of data. We propose a method that automatically detects a user’s single-threaded function call to search for a pattern using Java’s standard regular expression library, and replaces it with our own data-parallel implementation using Java bytecode injection. Our approach facilitates parallel processing on different platforms consisting of shared-memory systems (using multithreading and NVIDIA GPUs) and distributed systems (using MPI and Hadoop). The major contributions of our implementation are reducing the execution time while at the same time remaining transparent to the user. In addition, and in the same spirit of facilitating high performance code parallelization, we present a tool that automatically generates Spark Java code from minimal user-supplied inputs. Spark has emerged as the tool of choice for efficient big data analysis. However, users still have to learn the complicated Spark API in order to write even a simple application. Our tool is easy to use, interactive, and offers Spark’s native Java API performance. To the best of our knowledge and at the time of this writing, such a tool has not yet been implemented.

Facilitating High Performance Code Parallelization

by Maria Abi Saad

B.S., Lebanese American University, 2006
M.S., Lebanese American University, 2008

Dissertation

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Syracuse University May 2017

Copyright © Maria Abi Saad 2017 All Rights Reserved

Acknowledgements

Firstly, I would like to express my deepest appreciation to my advisor, Professor C.Y. Roger Chen, for his continuous support and advice. I am very grateful to have been the student of such a knowledgeable, enthusiastic, and ambitious visionary. Without his guidance and encouragement over the years, this dissertation would not have been possible.

Besides my advisor, my sincere thanks go to my dissertation committee members for taking the time to review my work and for their valuable insights, comments and suggestions. They are, in alphabetical order, Prof. Ehat Ercanli, Prof. Erin Mackie, Prof. Qinru Qiu, Prof. Yanzhi Wang, and Prof. Edmond Yu.

I would also like to thank my colleagues at the Design and Modeling Lab who have accompanied me at different intervals of my research, and who have offered valuable suggestions and stimulating discussions. In particular, I am grateful to Elie for his support, his encouragement, his sarcasm and his friendship. This would have been a much harder wave to ride without him.

I am also grateful to my friends who always remind me that there is more to life than work although at times the lines get blurred. I would like to specifically thank all the wonderful people whom I have met during my time at Syracuse. You have made the snowy winters much easier to bear. Sigue bailando!

Lastly, I would like to thank my parents, my sister, and my brother who have supported me throughout this journey and have always believed in me. I am grateful for their understanding, their patience, and mostly for their love. This is dedicated to them.


Table of Contents

Chapter 1: Introduction ...... 1
1.1 Parallel Computing ...... 1
1.2 Proposal ...... 2
1.3 Objectives ...... 4
1.4 Development Process ...... 5
1.5 Outline ...... 5
Chapter 2: Automatic Parallel Matching using Java Bytecode Injection ...... 7
2.1 Introduction ...... 7
2.2 Pattern Matching ...... 8
2.3 Motivation ...... 12
2.4 Literature Survey ...... 14
2.5 Java Bytecode and ASM ...... 16
2.6 Implementation ...... 20
2.7 Experimental Results and Interpretations ...... 26
2.8 Conclusion ...... 28
Chapter 3: Parallel Pattern Matching on GPUs ...... 29
3.1 Introduction ...... 29
3.2 Introduction to GPUs ...... 30
3.3 Literature Survey ...... 35
3.4 Writing Java for GPUs ...... 38
3.5 Mapping Approach ...... 44
3.6 Performing Pattern Matching on the GPU ...... 48
3.7 Experimental Results and Interpretations ...... 51
3.8 Conclusion ...... 55
Chapter 4: Parallel Pattern Matching using MPI ...... 56
4.1 Introduction ...... 56
4.2 Introduction to the Message Passing Interface (MPI) ...... 59
4.3 Literature Survey ...... 64
4.4 Implementation ...... 67
4.5 Experimental Results and Interpretations ...... 77
4.6 Introduction to Hadoop ...... 84
4.7 Hadoop Comparison ...... 87
4.8 Conclusion ...... 90
Chapter 5: Generating Spark Java Code ...... 91
5.1 Introduction ...... 91
5.2 Related Work ...... 93
5.3 Spark ...... 95
5.4 Machine Learning ...... 99
5.5 Implementation ...... 100
5.6 Graphical User Interface ...... 114
5.7 Conclusion ...... 125
Chapter 6: Conclusion ...... 126
6.1 Summary ...... 126
6.2 Suggestions for Future Work ...... 129
Bibliography ...... 132


List of Figures

Figure 1-1: Parallel Systems ...... 3

Figure 2-1: DFA vs NFA example diagram ...... 10

Figure 2-2: Runtime graph of pattern matching backtracking example ...... 13

Figure 2-3: Pattern matching code snippet ...... 18

Figure 2-4: ASM code of Figure 2-3 ...... 19

Figure 2-5: ASM process ...... 20

Figure 2-6: Process illustration ...... 21

Figure 2-7: Partitions with no boundary overlap ...... 24

Figure 2-8: Partitions with boundary overlap ...... 25

Figure 2-9: Runtimes vs number of threads ...... 27

Figure 2-10: Percent improvement over single thread ...... 28

Figure 3-1: Grid of thread blocks, taken from NVIDIA website ...... 33

Figure 3-2: Workflow diagram ...... 44

Figure 3-3: Runtimes of CPU vs GPU ...... 54

Figure 4-1: MPI code template ...... 61

Figure 4-2: Sample exec code ...... 69

Figure 4-3: Overall execution process ...... 70

Figure 4-4: Scatter illustrative example ...... 74

Figure 4-5: Reduce illustrative example ...... 77

Figure 5-1: Loading and parsing code example for word count ...... 101

Figure 5-2: Word count code based on Spark implementation from documentation ...... 102

Figure 5-3: Alternative implementations for word count ...... 103


Figure 5-4: Program phases ...... 104

Figure 5-5: Spark tool class diagram ...... 110

Figure 5-6: Spark tool workflow...... 111

Figure 5-7: Tool snapshot - machine learning ...... 116

Figure 5-8: Tool snapshot – general purpose ...... 118

Figure 5-9: Word count example: loading and parsing ...... 119

Figure 5-10: Example of operations and corresponding feedback ...... 121

Figure 5-11: Map example code ...... 122

Figure 5-12: Word count example - reduceByKey transformation ...... 123


List of Tables

Table 2-1: Runtimes of pattern matching backtracking example ...... 12

Table 2-2: Primitive types and class representations ...... 17

Table 2-3: Method descriptors ...... 17

Table 2-4: Object locations ...... 18

Table 2-5: VM specifications...... 26

Table 2-6: Average runtimes and percent improvement ...... 27

Table 3-1: Basic differences between a CPU and a GPU ...... 31

Table 3-2: Categorization of Java GPGPU solutions ...... 42

Table 3-3: Limitations with current solutions ...... 43

Table 3-4: Supported wildcards in CUDA-grep ...... 49

Table 3-5: CUDA-grep high-level algorithm, taken from developers’ documentation ...... 49

Table 3-6: CPU and GPU specifications ...... 52

Table 3-7: Runtimes and percent improvement with GPU ...... 53

Table 3-8: Run times taking into consideration case sensitivity...... 54

Table 3-9: Percent improvements of GPU over cases 1 and 2 ...... 55

Table 4-1: VM specifications...... 77

Table 4-2: Test cases process distribution ...... 78

Table 4-3: Case 1 runtimes and percent improvement: VM1: 4 processes - VM2: 0 processes .. 79

Table 4-4: Case 2 runtimes and percent improvement: VM1: 8 processes - VM2: 0 processes .. 79

Table 4-5: Case 3 runtimes and percent improvement: VM1: 4 processes - VM2: 4 processes .. 80

Table 4-6: Case 4 runtimes and percent improvement: VM1: 8 processes - VM2: 8 processes .. 80

Table 4-7: Execution times for Case 3 for every process ...... 81


Table 4-8: Execution times (seconds) for Approach 1 ...... 82

Table 4-9: Percent improvement comparison for Approach 1...... 83

Table 4-10: Advantages and disadvantages of the three approaches ...... 83

Table 4-11: Average runtimes and percent improvement for Hadoop implementation ...... 89

Table 4-12: Runtime improvements ...... 89

Table 5-1: Currently supported operations ...... 103

Table 5-2: Transformation input and output types ...... 106

Table 5-3: User information needed ...... 108

Table 5-4: Supported classification models and their default parameters ...... 113



Chapter 1: Introduction

Traditionally, programs were written to be executed in a serial manner. A problem is broken down into a series of instructions which are executed sequentially, one after the other. Hence, only one instruction can execute at a time. These instructions execute on a single processor. Over the past twenty years, microprocessor technology has seen significant advances, such as increased clock rates, the capability of processors to execute multiple instructions in the same cycle, and an improved average number of cycles per instruction (CPI) [1]. This has led to an increase in the peak floating point execution rate (FLOPS). One of the issues that has arisen because of this improvement is the inability of the memory system to supply the processor with enough data at the required rate. Interest in parallelism for high performance computing has increased in recent years due to the physical constraints that prevent further frequency scaling. As power consumption and heat generation become more and more of an issue, researchers are focusing on parallel computing since it involves using multiple processing elements. In addition, parallel computing provides increased access to storage units, whether memory or disk, offers scalable performance, and potentially lowers costs compared to high-end processing units.

1.1 Parallel Computing

In a nutshell, parallel computing consists of the simultaneous use of several compute resources to solve a large problem by dividing it into smaller sub-problems that can be solved at the same time. The processing elements usually consist of a single computer with multiple cores, several computers connected by a network, specialized hardware for acceleration, or a combination of all three.


Parallel computing is being used extensively in a wide range of applications. Large scale applications in science and engineering rely on large configurations of parallel computers to solve and model difficult problems in several areas such as physics, genetics, geology, seismology, circuit design, weaponry… Parallel computing is also widely used in industrial and commercial applications which require processing large amounts of data. We mention a few examples: database or web servers, transaction processing, data mining, financial and economic modeling, medical imaging, image processing, applications in graphics and visualization…

Software developers therefore need to be familiar with a variety of parallel computing platforms and technologies to write efficient code capable of harnessing the parallel processing power. However, the development of parallel software is tedious, error-prone, time consuming, and non-trivial for programmers, especially if the aim is to develop an application that works well across several platforms.

We propose a system that facilitates the parallelization of high performance code.

1.2 Proposal

We propose a system that relieves the application developer of the burden of dealing with parallel programming. Our system handles both categories of main memory organization in parallel systems, shared-memory and distributed, as shown in Figure 1-1. In shared-memory systems, each processing element can access memory with equal latency and bandwidth. Distributed systems exhibit non-uniform memory access behavior since memory is logically and physically distributed. Parallel techniques that target shared-memory systems include multithreading and GPGPU computation. Parallel techniques that target distributed systems include message passing and map-reduce.


Figure 1-1: Parallel Systems

Our targets are Java developers. We have chosen Java since it is mainly the language of choice for high performance applications, and it has gained popularity since its inception due to several performance improvements to the Java Virtual Machine (JVM). Also, even though Java provides efficient yet difficult-to-program support for threads, it lacks complete and standardized support for GPUs and message passing. Hence, we saw an area where our contributions could be more significant.

Our central proposition revolves around parallelizing existing sequential Java code, without having the user incur any extra effort other than using our library, in order to obtain faster execution times. To validate our work, we have chosen the pattern matching application since pattern matching lies at the core of data analysis. However, our approach can be used with other applications, as we demonstrate in Chapter 2. We propose a method that detects a user’s function call to search for a pattern using Java’s standard regular expression library, and replaces it with our own parallel implementation using bytecode injection. Our parallel implementation consists of multithreading, using a GPU, and using MPI. We also perform feasibility testing with Hadoop.


In order to seamlessly parallelize pattern matching, we have used bytecode injection. Typically, injection can be done at the source code level or at the bytecode level. We have selected the latter since the former suffers from the following shortcomings: (i) for source code level injection to work, all the source code must be available, which is not always the case; (ii) source code level injection is harder for complex transformations; (iii) at the source code level, we would be manipulating strings, which have no particular meaning; and (iv) we would need a versatile parser. On the other hand, if injection is performed at the bytecode level, then (i) there is no need to have the actual source code, (ii) we can modify third party, closed-source classes, (iii) runtime performance is not affected, and (iv) we do not modify the original source files, which is appropriate if we need the changes to be applied only for a limited time.

In addition to what we mentioned, we present a tool that facilitates creating Spark code for the user by automatically generating Spark Java code from minimal user-supplied inputs. Spark has emerged as the tool of choice for efficient big data analysis. However, users still have to learn the complicated Spark API in order to write even a simple application. Code generation for parallel programming is a well-studied field that has been recently used to teach undergraduate students parallel programming, and is reported to have reduced programming errors [2].

1.3 Objectives

Our system achieves the following objectives:

- Correctness: Using our method does not change the output of the code.

- Transparency: The modifications are hidden from the user, meaning the user is not required to make any source code modifications or to implement or learn a new interface or API.

- Performance: Using our method yields performance gains in terms of execution time.


1.4 Development Process

Our workflow can be summarized by the following steps:

- Detect and intercept the user’s function call to search for a pattern by running a Java agent to intercept class files (a sketch of such an agent is given after this list).

- Understand bytecode representations and use ASM to instrument class methods into a parallelized version.

- Replace the original function call with our own parallel, environment-dependent function that splits the input, by injecting bytecode into the user’s code.

- Return results and control back to the user.

- Measure performance gain.
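As a point of reference, the sketch below shows the skeleton of such a Java agent. The class and package names are illustrative only, and the actual transformation logic (described in Chapter 2) is omitted; the agent simply registers a ClassFileTransformer that receives every class's bytecode before it is loaded.

// Illustrative only: a minimal Java agent skeleton, not our full implementation.
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class ParallelizingAgent {

    // Called by the JVM before main() when the application is started with
    //   java -javaagent:agent.jar -jar app.jar
    // (the agent jar's manifest must declare Premain-Class: ParallelizingAgent).
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // Skip JDK classes; only user classes are candidates for rewriting.
                if (className == null || className.startsWith("java/")) {
                    return null; // returning null leaves the class unchanged
                }
                // The ASM-based rewrite of the pattern matching call would go here.
                return null;
            }
        });
    }
}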

1.5 Outline

In this work, we present a method that facilitates the parallelization of high performance code under both shared-memory and distributed environments. Sequential legacy code written in Java is seamlessly mapped into a parallel program. Specifically, we have targeted the pattern matching application.

In addition to facilitating the transformation of existing code into a parallel application, we also implement a tool that helps developers write Spark Java code in an attempt to promote efficient and easy to implement parallel programming.

In Chapter 2, we discuss pattern matching in a broader view, and we present our motivation behind selecting it as the application to use. We also discuss Java bytecode and explain how we used ASM to implement bytecode injection. As well, we present our first approach to facilitating parallelization in a shared-memory environment by using multithreading, and we show our results.

In Chapter 3, we present another approach to shared-memory consisting of GPUs. We discuss GPGPU in general, CUDA in particular, the previous work done in this area and how it is different from what we propose, the challenges we faced, and our implementation and results.

In Chapter 4, we introduce distributed systems. We discuss MPI as the prominent communication technology of choice in the industry. We show how we applied our method using MPI and display our results. In addition to that, we introduce Hadoop as an alternative to message passing. We do not use Hadoop as a target environment for our method, for reasons mentioned in the chapter; however, we do compare its output to our multithreading results.

In Chapter 5, we present a tool that helps developers write Spark code that can run on a distributed environment. We introduce the Spark ecosystem, show our reasoning and workflow, and provide a machine learning plug-in to facilitate generating machine learning algorithms.

Finally, in Chapter 6 we summarize our work and our findings, and we suggest some directions for future work.


Chapter 2: Automatic Parallel Matching using Java Bytecode Injection

Shared memory systems describe a computer architecture in which all processors share a single view of the memory module and access the same logical memory locations irrespective of where the physical memory actually is. Communication (data sharing) in such systems is efficient and fast since all the parallel tasks can share the same resources. In this chapter, we discuss implementing pattern matching in a shared memory environment using Java threads. Our contribution is summarized as follows: detect a user’s Java call to search for a pattern, instrument a parallelized multithreaded pattern matching code, and finally return control back to the program.

2.1 Introduction

The importance of big data does not simply lie in the amount of data available. It stems from what can be done with that data, how to transform data into useful information, whether it be to reduce cost, reduce risk, make better calculated decisions, or increase efficiency. Pattern matching is the basis for data mining upon which other more complex algorithms are built.

Pattern matching consists of finding one, or several, occurrences of a pattern within a larger dataset. It has been extensively used in several applications from various domains. Some applications include citation matching, plagiarism detection, malware detection, virus scanners, content monitoring filters, intrusion detection, digital forensics tools, genetics….

Finding a way to parallelize pattern matching algorithms will result in more efficient usage.

Parallelizing pattern matching can be done in one of two ways: either rewriting the algorithm to make it inherently parallel, or using data parallelism, which consists of splitting the input and running the same serial algorithm on all parts. We have opted for the latter since it is typically what programmers would try to do.

Our implementation deals with searching for a pattern within a text using Java’s standard regular expression library (regex) and parallelizing it using multiple threads without compromising the integrity of the library. We have chosen the standard library, as opposed to other implementations, since it is widely and commonly used. Parallelizing it will not change the implementation and hence will preserve the integrity and correctness of the original code.

The rest of the chapter is divided into the following sections. Section two gives a brief overview of pattern matching. Section three presents the motivation behind our work. Section four consists of a literature review. Section five gives a brief description of Java bytecode and the ASM library. Section six discusses the parallel approach in our implementation. Section seven displays our working environment and results. Section eight concludes the chapter.

2.2 Pattern Matching

Pattern matching can be divided into two basic categories: exact pattern matching and regular expression pattern matching.

Exact pattern matching is more commonly used as string matching, since the pattern is well defined and most search applications are text-based. String matching (or string searching) algorithms constitute a crucial subclass of text processing since they are widely used in many applications pertaining to various domains, such as parsers, spam filters, word processors, search engines, digital libraries, natural language processing, and molecular biology [3]. String matching consists of finding the first occurrence of a pattern of length m in a text of length n, where typically n >> m. Both pattern and text are composed of a finite alphabet set denoted by Σ, whose size is denoted by σ. This alphabet set could be, for example, the modern English alphabet (letters a through z), the binary alphabet (0 and 1), or the DNA alphabet (A, C, G, T), depending on the application. Although nowadays data is stored in many formats, text remains the main form for exchanging information. The main algorithms for exact pattern matching are: brute-force (BF), Knuth-Morris-Pratt (KMP), Karp-Rabin, Boyer-Moore (BM), and Horspool [4].

The BF method, also called the naïve method for the simplicity of its implementation, is the only one among these that does not require a pre-processing phase. BF works by comparing the pattern to the text at every position between 0 and n-m, shifting one position to the right after each attempt. The search time complexity is O(n×m). BF is the method used in the Java standard String library implementation of the highly used method indexOf. The indexOf method returns the location of the first occurrence of the pattern. Therefore, in order to find all occurrences of a pattern without inherently changing the underlying implementation, one must repeatedly check for each occurrence using a loop.
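For illustration, the following sketch shows the loop just described: repeatedly calling indexOf to collect every occurrence of a pattern (the class and method names are ours).

import java.util.ArrayList;
import java.util.List;

public class IndexOfAllOccurrences {
    // Returns the starting index of every occurrence of pattern in text.
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> positions = new ArrayList<>();
        int from = 0;
        while (true) {
            int idx = text.indexOf(pattern, from);
            if (idx < 0) {
                break;           // no further occurrence
            }
            positions.add(idx);
            from = idx + 1;      // advance one character to catch overlapping matches
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abababa", "aba")); // prints [0, 2, 4]
    }
}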

The second category of pattern matching is regular expression (regex) pattern matching. The difference between this and the exact approach is that regex searches for occurrences of multiple patterns in a text. Some applications include natural language processing, scanning for virus signatures, accessing information from digital libraries, filtering text, validating data-entry fields, and searching for markers in the human genome [3]. Regular expression notations specify a set of strings and can be very expressive. They can be used in nearly infinite ways and include characters, quantifiers, and meta-characters. For example, one can search for words that contain two x's, or for fields representing a telephone number where the format is ddd-ddd-dddd (d being a digit), or for a part of a text not followed by another specific part of text.


We focus our research on regular expression pattern matching. According to the article in [5], there are two approaches to regular expression matching. The first is a backtracking implementation used in several standard languages, such as Perl, Python, Java, PCRE, Ruby, and several others. The second, mainly used in implementations of awk and grep, uses finite automata.

A finite automaton is another word for a finite state machine. Switching from a state to another depends on the sequence of characters of the string. There is a start state, and a matching state. If the matching state is reached, then a match has been found. If the machine ends in a state other than the matching state, then a match was not found.

There are two types of automata: deterministic (DFA) and non-deterministic (NFA). In any state of a DFA, each possible input character leads to just one new state. In an NFA, there could be multiple choices for the next state. Figure 2-1 is an example that shows the difference for the pattern a(bb)+a.

Figure 2-1: DFA vs NFA example diagram (partial DFA representation and partial NFA representation)

The circles in the diagram represent the states, and the arrows represent the state transitions based on the characters. The matching state is denoted by a double circle. The difference between DFA and NFA can be shown from the transitions in state four. In the DFA, if an a is encountered, we reach the matching state; if a b is encountered, we go back to state three. In contrast, in state three of the NFA, if a b is encountered, there are two possible solutions: either go back to state two or proceed to state four. The NFA does not know which decision is the correct one since it cannot peek ahead to see the rest of the string. The machine will guess, and hence is labeled non-deterministic. To simulate guessing, one can allow the machine to guess one option and, if that does not work, to try another option by backtracking. This is recursive, though, and might lead to a slow exponential runtime if there are many possible options. A better way would be to guess both options simultaneously. In this case, the NFA will be in multiple states at the same time, and the runtime is linear.

DFAs are more efficient to execute because they can be in only one state at a time. DFAs are also faster than NFAs since there is a state transition for every possible character (this has not been shown in the diagram above). However, this also means that they require more memory to be stored. In NFAs, the number of states is at most equal to the length of the regular expression. An NFA can be converted into an equivalent DFA where each DFA state corresponds to several NFA states.

Java contains a regular expression standard library (java.util.regex). It is mainly composed of two classes: a Pattern class and a Matcher class. The Pattern class compiles the pattern, and the Matcher class contains the methods needed to perform and analyze the search for that pattern.¹

¹ It is also worth mentioning that the String class can also perform regular expression matching. However, it can only match the entire text against the pattern, using the text.matches(regex) function.
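For reference, a minimal usage sketch of the Pattern and Matcher classes (the text and pattern here are made up for illustration):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String text = "call 555-123-4567 or 555-765-4321";
        // ddd-ddd-dddd style telephone-number pattern, as described above
        Pattern p = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");
        Matcher m = p.matcher(text);
        // find() advances to the next match each time it is called
        while (m.find()) {
            System.out.println("match '" + m.group() + "' at index " + m.start());
        }
    }
}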


2.3 Motivation

Most of the time, a standard language implementation is fairly fast; however, there are some regular expressions that make the algorithm run very slowly. Most regular expression library implementations (Java’s regex included) use a backtracking algorithm. Given a set of bad inputs, albeit simple ones, such algorithms can take exponential time to execute. The article in [5] goes on to show that for a particular regex and a 100-character string, trends show that a Thompson NFA implementation takes 200 microseconds, while a Perl implementation would take over 10^15 years. This is because standard library implementations support backtracking and back-references, which makes matching NP-hard. To illustrate, we have tested the inputs from [6] on our local machine. We used the Java regular expression library with the pattern “(a|aa)*b”, and varied the length of the string text. Results are shown in Table 2-1 and Figure 2-2, where N is the number of characters of the text.
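A minimal sketch of how such a measurement can be reproduced is shown below; the exact timings depend on the machine and the JVM, but the exponential trend is the same.

import java.util.regex.Pattern;

public class BacktrackingBlowup {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(a|aa)*b");
        // Texts of the form aaa...ac never match, forcing exhaustive backtracking.
        for (int n = 21; n <= 27; n++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n - 1; i++) {
                sb.append('a');
            }
            sb.append('c');
            String text = sb.toString();

            long start = System.nanoTime();
            boolean found = p.matcher(text).find();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("N = " + n + "  found = " + found + "  time = " + elapsedMs + " ms");
        }
    }
}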

Table 2-1: Runtimes of pattern matching backtracking example

Text                          N    Search time (ms)
aaaaaaaaaaaaaaaaaaaac         21   10
aaaaaaaaaaaaaaaaaaaaac        22   30
aaaaaaaaaaaaaaaaaaaaaac       23   30
aaaaaaaaaaaaaaaaaaaaaaac      24   50
aaaaaaaaaaaaaaaaaaaaaaaac     25   90
aaaaaaaaaaaaaaaaaaaaaaaaac    26   120
aaaaaaaaaaaaaaaaaaaaaaaaaac   27   200

Figure 2-2: Runtime graph of pattern matching backtracking example (search time in ms plotted against N)

For every character increase in N, the runtime increases exponentially. This happens because the whole pattern was not found. However, the first part was found, so the algorithm backtracks and tries to match the remaining part. When it does not find it, it goes back one more step to the last matched part and tries to match the whole pattern, and so on. Other more realistic similar cases can result from spam attacks and cause a denial of service.

Another reason for the exponential search time is the use of back-references. A back-reference matches the string matched by an earlier expression surrounded by parentheses. For example, the regex (romeo|juliet)\1 matches romeoromeo or julietjuliet but not romeojuliet or julietromeo. Back-references provide additional power, but at a potentially much higher cost.
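A quick sketch of this back-reference behavior using the Java library:

import java.util.regex.Pattern;

public class BackReferenceExample {
    public static void main(String[] args) {
        // \1 refers back to whatever the first group actually matched
        Pattern p = Pattern.compile("(romeo|juliet)\\1");
        System.out.println(p.matcher("romeoromeo").matches());   // true
        System.out.println(p.matcher("julietjuliet").matches()); // true
        System.out.println(p.matcher("romeojuliet").matches());  // false
    }
}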

We believe the reason the standard tools have not considered implementing the more efficient finite automaton approach is two-fold. First of all, finite state machines cannot handle sub-match extraction, which requires recursive backtracking. This means that they cannot distinguish the individual matches made by an expression surrounded by parentheses, for example. In the Java regex library, this can be done using matcher.group(). The other reason is back-references. Back-references, being an NP-complete problem with no current efficient implementation, are not represented in the finite state approach and are not implemented in grep, for example. However, they are part of the POSIX standard for regular expressions and are available in standard tool implementations.

Since execution time increases exponentially with N, having a smaller search space will immensely decrease the runtime. Hence, dividing the string into smaller substrings will improve performance.

2.4 Literature Survey

We survey some of the work that involves parallelizing pattern matching in both shared and distributed environments. The paper in [7] implements parallel string matching with Java multithreading on multicore processors, and performs a comparative study of the KMP, BM and BF string matching algorithms on a gene sequence data set. In [8], the authors focus on detecting malware for unstructured data stored in the Hadoop Distributed File System (HDFS) by developing a map-reduce approach to easily scan for viruses in HDFS in real time. They used ClamAV's virus signature database to perform their experiments. Their mapper extracts the file type to obtain the correct signature and produces a key-value pair, where the key is the signature and the values are the lines from the files. The reducers aggregate the lines per key (signature) and do pattern matching on the list of values. They used three different algorithms for string matching: BM, KMP and Rabin-Karp. In [9], the authors present a citation matching method and scale it up using Hadoop. In [10], the authors present a parallel implementation for string matching using MPI. They partition the string to search into a number of subtexts and have used the brute force method for searching. In [11], given a text and a pattern, the authors try to find if the pattern occurs in the text using a Hadoop cluster. If it does, they return the starting index. If not, then they find the longest prefix, suffix and substring of the pattern in the document, and find the similarity between documents and the longest common subsequence of all documents. The algorithms they used are KMP and cosine similarity. In the blog post [12], the author uses a grep script in a Hadoop cluster to search for a string in the WorldCat.org record. In [13], the author implements, tests, and tries to optimize three string matching algorithms (brute-force, Quick-Search, Horspool) using CUDA. The methods applied to the development are: finding ways to parallelize the sequential code, minimizing data transfer between host and device, coalescing global memory accesses as much as possible, and avoiding branch divergence within a CUDA warp. The paper in [14] implements KMP using a multicore processor and using a GPU. They partition the string into the number of processors that they have; however, they do several extra iterations after that to account for in-between occurrences. This could have been avoided by doing better partitioning. They implement their algorithm using four techniques: serial, multithreaded CPU, multicore CPU using OpenCL, and on a GPU using OpenCL. In [15], the authors propose to offload the processing of digital forensics tools to a GPU and compare the speedups obtained by simple threading schemes appropriate for multicore CPUs. The application they use is file carving. For the multicore machines, a thread was spawned for every carving rule (using the BM technique). On the GPU, each thread is responsible for searching approximately 160 bytes of the 10 MB block. In [16], the authors implement the BF, KMP, Boyer-Moore-Horspool, and Quick-Search string matching algorithms on a GPU and record the performance. The data set consisted of three reference sequences: the bacterial genomes of Yersinia pestis, Bacillus anthracis, and a simulated BAC built of the first 200,000 characters of NCBI build 26 of Homo sapiens’ chromosome 2. The query pattern was constructed of randomly chosen subsequences from each reference sequence. In [17], the authors propose the design, implementation, and evaluation of a pattern matching library running on the GPU. The library supports both string searching and regular expression matching using CUDA. They chose to implement a GPU-based pattern matching library for inspecting network packets in real time and returning any matches found back to the application. They have parallelized the DFA-based matching process by splitting the input data stream into different chunks. Each chunk is scanned independently by a different thread using the same automaton that is stored in device memory.

2.5 Java Bytecode and ASM

When Java code is compiled, the javac compiler generates a machine-independent intermediate format known as bytecode. The bytecode is mainly composed of the instruction set that describes the program to the Java virtual machine. The JVM processes the bytecode instructions from the class files. Primitive types and objects are expressed with the representations shown in Table 2-2. These representations are later used in constructing the method descriptors, such as the examples in Table 2-3, which show the method declarations in a Java source file and their bytecode counterparts.


Table 2-2: Primitive types and class representations

Type      Representation
void      'V'
boolean   'Z'
char      'C'
byte      'B'
short     'S'
int       'I'
float     'F'
long      'J'
double    'D'
Class     L<classname>;

Table 2-3: Method descriptors

Method declaration in source file     Method descriptor
void methodName(int i, float f)       (IF)V
int methodName(Object o)              (Ljava/lang/Object;)I
int[] methodName(int i, String s)     (ILjava/lang/String;)[I

Bytecode instructions use a stack to exchange data. The code snippet in Figure 2-3 shows a basic example of how one would typically implement the pattern matching function in Java, and the code in Figure 2-4 shows how ASM (discussed shortly) uses the bytecode instructions to store the objects and then, when needed, identify and use them based on their locations, as shown in Table 2-4.

Figure 2-3: Pattern matching code snippet

Table 2-4: Object locations

Variable   Object
7          pattern
8          text
9          p
10         matcher


Figure 2-4: ASM code of Figure 2-3

ASM [18] is a Java bytecode engineering library. It can be used to analyze and to manipulate bytecode by modifying classes. First, a Java agent is created. The agent contains a premain method that implements a class transformer. The class transformer is composed of a class reader and a class writer. The class reader reads the bytecode from the class files in order to analyze the classes or the methods using a class visitor. If a class or a method needs to be modified, the class visitor performs these modifications. Examples of such modifications include printing a method’s description, displaying the different object types in a class, and so on. These modifications are then stored using the class writer. Bytecode injection consists of manipulating the bytecode by inserting constructs that change what a method does. A method visitor scans the classes in search of the desired method. It is identified based on its description. The description consists of the method return type and the types of its parameter list.
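The sketch below illustrates how these pieces fit together with the ASM 5 API. The class name and the rewriting decision are illustrative only, not our full implementation; it simply shows a ClassReader/ClassWriter pair and a MethodVisitor that recognizes calls to Matcher.find() by their owner, name, and descriptor.

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class FindCallRewriter {

    // Reads a class file, visits every method, and spots calls to Matcher.find().
    public static byte[] rewrite(byte[] classfileBuffer) {
        ClassReader reader = new ClassReader(classfileBuffer);
        ClassWriter writer = new ClassWriter(reader, ClassWriter.COMPUTE_FRAMES);

        ClassVisitor visitor = new ClassVisitor(Opcodes.ASM5, writer) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
                return new MethodVisitor(Opcodes.ASM5, mv) {
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String mname,
                                                String mdesc, boolean itf) {
                        // Matcher.find() has descriptor ()Z (no arguments, boolean return).
                        if ("java/util/regex/Matcher".equals(owner)
                                && "find".equals(mname) && "()Z".equals(mdesc)) {
                            // Here the call could be redirected to a parallel wrapper
                            // (e.g. a static helper taking the text and the pattern).
                        }
                        super.visitMethodInsn(opcode, owner, mname, mdesc, itf);
                    }
                };
            }
        };

        reader.accept(visitor, 0);
        return writer.toByteArray();
    }
}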


The overall process is shown in Figure 2-5. In our implementation, when we encounter the particular method that searches for a pattern, we use bytecode injection to replace it with our own parallel pattern matching implementation.

Figure 2-5: ASM process

2.6 Implementation

Because of the way it is implemented, the regex pattern matching find() function from the Java standard library can have an exponential search time depending on the pattern and the text to be searched. We have devised a method that parallelizes this search without compromising the integrity of the library. Another main contribution is that this method is transparent to the user. Therefore, a coder does not need to put in any extra effort to use our method. All that is needed is for the user to run his/her code with an extra parameter that invokes our Java agent. Our method will automatically detect the function to be parallelized and will instrument the parallel version of it instead of the user’s function. Control is then returned to the user’s code. All data structures, objects, and primitives will be preserved. The advantage over other work is that users do not need to learn any new constructs, they do not need to implement any new interface or API, and they do not need to apply any changes to their code. A simple illustration of our process can be found in Figure 2-6. When we encounter the particular method that searches for a pattern, we use bytecode injection to replace it with our own parallel pattern matching implementation.

Figure 2-6: Process illustration

Multithreading is the ability of a CPU to execute several threads concurrently within the same program. It aims to increase the utilization of CPU time through parallelism. The concept is similar to multitasking, but at the level of a single application. A single application can be divided into separate operations, where each operation is assigned to a different thread. These threads can run in parallel with their own paths of execution but share resources. This sharing of computer and data resources is the main challenge of multithreading. Special attention needs to be given to threads wanting to read and write the same data simultaneously. However, the benefits of multithreading are many. Some of them include: more efficient CPU usage, better system reliability, better resource utilization, and improved performance on multiprocessor computers.

Multithreading in Java is essential but not always simple. The Java platform is designed to support concurrent implementation. It supports basic concurrency in the language and the class libraries. Each application has at least one thread, the main thread, which has the ability to create other threads. In this chapter, we use multithreading to implement our parallel approach.

Originally, we had started off by trying to parallelize the Java Collections library. We focused on detecting and implementing the sort (library implementation is a merge sort), the replaceAll, the min, and the max functions. We used this as a basis to implement the pattern matching, and to prove that our algorithm works for several cases.

2.6.1 Challenges

The Java regex library proved to be more challenging than the Collections library, and we had to change the way we approached the problem several times in order to bypass the standard library’s privacy and access restrictions.

The regex library contains two main classes: the Pattern class and the Matcher class. The Pattern class compiles the pattern, and the Matcher class contains the method find, which searches for a pattern within the text.

The first issue we faced was having insufficient variables on the stack. When statement (28) from Figure 2-3 is executed, and if we look into the bytecode shown in Figure 2-4, only the matcher is on the stack. So if we wanted to replace the serial version of find with the parallel version, what we have on the stack would not be sufficient information. We would also need to search for the ‘text’, push it on the stack and then pass it to our method. Preloading these variables was another issue we faced, since it is not feasible without using some sort of dynamic parsing or creating an abstract syntax tree (AST). The reason for that is that when the compiler generates bytecode, it loads the variables into registers without using their names. Therefore, there is no simple way to know where the variable ‘text’ is stored. We could of course look it up after it is compiled, but that would remove the generality and the automatic detection property that we claim. Another way we thought about was to try to obtain the ‘text’ from the matcher, since it is inherently used in the Matcher class. However, we found that it is private and we were unable to access it. Having all these troubles, we decided to implement the whole regex library with all its classes.² This would allow us, first, to change the private property of ‘text’ to public, and second, to directly implement our parallel functions in the new library. Again, we hit a wall. We reached a point where we had to import Sun’s security libraries, to which we had absolutely no access. Therefore, we decided to write our own wrapper method that runs the find method but takes both the text and the pattern as parameters.

² The classes are: ASCII, Matcher, MatchResult, Pattern, PatternSyntaxException, and UnicodeProp.

2.6.2 Implementation

In what follows we discuss how our implementation achieves load balancing, correctness, and synchronization.

The original user code contains a single-thread pattern matching search function that processes the entire text.³ In our multithreaded implementation, we use a static mapping approach that achieves load balancing. We divide this input text into t semi-equal subtexts, where t is the number of threads. Each thread receives the same amount of data proportional to the number of available runtime threads before execution starts.

³ Instead of using the pattern matching method ‘find’ from the standard library (java.util.regex), we had to write our own based on running the ‘find’ method within a loop. This is because ‘find’ will only return the first occurrence of the pattern in the text and exit. If we need to locate all the occurrences, we have to iterate over the whole text. The reason why such a function is not found in the standard library is that the library leaves it up to the users to determine what to do with the data they obtain, i.e. the locations of the pattern. Users can choose to store these integers in the data structure of their choice. We have chosen to store the location in a HashMap that has the pattern as the key. This is to be consistent with Hadoop’s output format discussed in Chapter 4.

Correctness is achieved in the following two ways: (i) After we distribute the data among the threads, each thread implements the same function as the original one on the reduced substring. Hence, we did not change the algorithm’s implementation and can be sure of its correctness. (ii) However, we also need to account for boundary overlap; otherwise, strings that span two partitions would be neglected and our output would be incorrect. Figure 2-7 shows a case where this might happen. Let us assume that the pattern to search for is ‘went’, which is composed of m = 4 characters. In this very simple example, the searched text is composed of n = 20 characters, and we are assuming that the system can run t = 2 threads. If we partition the text equally, then each subtext will be composed of n/t = 10 characters. The split will be where the arrow points in Figure 2-7. Now, when each thread performs the pattern matching on its subtext, the pattern will not be found, producing an incorrect output.

Figure 2-7: Partitions with no boundary overlap


Therefore, we need a more aware implementation that takes this issue into consideration. Figure 2-8 shows how this can be resolved. The modified partitioning takes the length of the pattern into consideration, where each subtext will now be composed of (n/t) + (m - 1) characters. Therefore, the partition from the previous example will now contain 13 characters and our algorithm will correctly identify the pattern. This extension excludes the last partition, since it is unnecessary there; the last partition will merely consist of the remaining characters. It is worth mentioning that this approach does not violate the load balancing property since typically pattern lengths are short, and are considered negligible especially when compared to large bodies of text.

Figure 2-8: Partitions with boundary overlap

Each thread performs pattern matching independently on its own subtext, eliminating the need to access any shared variables. After all the threads complete their jobs, the results are synchronized before being aggregated and returned to the user application.
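A simplified, standalone sketch of this partition-and-search scheme is shown below. This is not the instrumented code itself; it assumes a literal pattern of fixed length m so that the (m - 1)-character overlap is sufficient, and the class name is ours.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionedSearch {

    // Finds all occurrences of a literal pattern using t threads.
    // Every subtext except the last is extended by (m - 1) characters
    // so that matches spanning a partition boundary are not missed.
    static List<Integer> findAll(String text, String pattern, int t) throws Exception {
        int n = text.length();
        int m = pattern.length();
        int chunk = n / t;
        Pattern compiled = Pattern.compile(Pattern.quote(pattern));

        ExecutorService pool = Executors.newFixedThreadPool(t);
        List<Future<List<Integer>>> futures = new ArrayList<>();
        for (int i = 0; i < t; i++) {
            final int start = i * chunk;
            final int end = (i == t - 1) ? n : Math.min(n, start + chunk + m - 1);
            Callable<List<Integer>> task = () -> {
                List<Integer> local = new ArrayList<>();
                Matcher matcher = compiled.matcher(text.substring(start, end));
                while (matcher.find()) {
                    local.add(start + matcher.start()); // translate back to global offsets
                }
                return local;
            };
            futures.add(pool.submit(task));
        }

        List<Integer> all = new ArrayList<>();
        for (Future<List<Integer>> f : futures) {
            all.addAll(f.get()); // waiting on the futures is the synchronization point
        }
        pool.shutdown();
        return all;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findAll("the cat went home, then she went out", "went", 2));
    }
}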

2.6.3 Working Environment and Benchmark

Implementation has been done using Java (JDK 8) on the Eclipse IDE (Luna, release 4.4.0) and ASM v5. Single-threaded and multithreaded testing was performed on a virtual machine with the specifications [19] shown in Table 2-5.


Table 2-5: VM specifications

Model                 Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz
Architecture          x86_64
Number of CPU cores   8 (2 logical cores per physical)
Number of threads     16
Cache size            20480 KB

As a benchmark, a text file of 142 MB was used. The file consists of the concatenated text of the 11th edition of the Encyclopedia Britannica, downloaded from the Project Gutenberg website [20].

2.7 Experimental Results and Interpretations

For the Collections library, testing was done on our local machine. We randomly generated lists of various sizes (ranging from 500,000 to 20,000,000 items). We implemented our multithreaded version using two and four threads. The max and the min functions attained lower performance than the serial implementation when run on smaller list sizes (<10,000,000 items); however, they achieved a performance gain of up to 32% when running on 20,000,000 items. The sort and the replaceAll methods showed an improvement when parallelized over all list sizes, with the highest improvement reaching 38%.

We now present the results for the pattern matching application. Table 2-6 shows the results for the average runtimes. The columns indicate the number of threads used, while the first row gives the average runtimes in milliseconds and the second row the percent improvement over the single thread implementation. The first column corresponds to the single thread implementation. In this implementation, the whole input file is read into a string before the search is performed. The other columns, labeled as (t = number), correspond to the multithreaded implementation. We have plotted the results in Figure 2-9. Percent improvement is plotted in Figure 2-10 and is calculated as:

% improvement = ((sequential – multithreaded)/sequential)*100

Table 2-6: Average runtimes and percent improvement

                        t = 1    t = 2    t = 4    t = 8    t = 16   t = 32   t = 64
Average runtime (ms)    73,199   35,366   15,539   9,506    6,501    4,840    4,142
Percent improvement     -        51.68    78.77    87.01    91.11    93.38    94.34

Figure 2-9: Runtimes vs number of threads (runtime in ms for t = 1 to t = 64)

Figure 2-10: Percent improvement over single thread (for t = 2 to t = 64)

Since the substring lengths for the multithreaded version are shorter than the original string, the multithreaded approach yields much better results than the single thread approach. We can observe that this gives an almost logarithmic improvement: when we double the number of threads, the runtime is observed to be cut roughly in half. This holds until we reach t = 16 threads, which is the number of threads supported by the test environment. Beyond 16 threads, the improvement is negligible.

2.8 Conclusion

In this chapter, we have presented a method that automatically detects a user’s function call to search for a pattern using Java’s standard regular expression library and seamlessly replaces it with our own parallel implementation. We developed a multithreaded implementation of the pattern matching function and have used bytecode injection to run our optimized implementation instead of the original function. Our experiments show a significant reduction in execution time, and we believe better can be achieved with more powerful machines.


Chapter 3: Parallel Pattern Matching on GPUs

In this chapter, we discuss implementing pattern matching on a GPU. Even though pattern matching might not be considered a purely computational problem in the sense that the actual matching time might not overcome the overhead of I/O for very simple patterns, we believe that it can benefit from parallelization when run on a big dataset and a complex pattern. Therefore, we have attempted to implement it on a GPU. Our contribution is summarized as follows: detect a user’s Java call to search for a pattern, instrument the pattern matching code to run on a GPU, and finally return control back to the program.

3.1 Introduction

General purpose computing on graphics processing units, more commonly referred to as GPGPU, is a high performance computing approach that uses graphics processing units (GPUs) to perform extensive data operations in parallel, yielding better throughput. Programming GPUs is considered quite a challenging task to get right since, first of all, it requires a low-level language (CUDA or OpenCL) that can interface with the hardware, and second of all, it is very different from parallel programming on the CPU. For example, in Java, one might use threads and concurrency (such as thread pool executors or fork-join pools) and then split and distribute the work among the threads and/or processors. However, with GPUs, one has other, extra low-level issues to consider such as register mapping, shared memory, memory coalescing, and occupancy. So even though GPGPU programming can make computations faster, this is not always the case. If the code is not properly written taking into consideration the issues mentioned, and if the application is not suited to be run on a GPU, performance might degrade.

Example applications that usually benefit from a GPU implementation include 3D rendering, image and media applications, signal processing, and other data-parallel tasks that operate on large datasets in the fields of physics, computational finance or biology, to name a few.

Currently, there are two main programming models for GPU acceleration: OpenCL and CUDA. OpenCL is an open standard, cross-platform parallel programming model for GPGPU computing by the Khronos Group [21]. CUDA, on the other hand, was developed by NVIDIA and contains a more complete set of HPC libraries. We have chosen CUDA for two reasons: first, we have more experience writing it, and second, it is the model of choice used in the pattern matching library we have used.

The rest of the chapter is organized into the following sections. Section two is an introduction to GPUs and to the CUDA programming model. Section three surveys the literature with respect to pattern matching implementations performed on a GPU. Section four discusses the available solutions to run Java code on a GPU. Section five describes our approach. Section six details our implementation. Section seven shows our experimental results, and section eight concludes the chapter.

3.2 Introduction to GPUs

GPGPU computing has become popular and prevalent because GPUs have evolved rapidly and efficiently in order to satisfy the increasing market demand for real-time, high-definition 3D graphics. Based on NVIDIA’s definition, GPUs are “highly parallel, multi-threaded, many-core processors with tremendous computational horsepower and very high memory bandwidth” [21].


3.2.1 Difference between CPUs and GPUs

GPUs are dedicated to parallel compute-intensive applications. This is accomplished by having more transistors to process data than to deal with caching and control flow. Hence, more ALUs are present. GPUs are considered single-instruction multiple-thread (SIMT) processors. The same operation can be executed on each data element. Therefore, there is no need for complex flow control. In addition, the memory access latency can be hidden by the computation. Table 3-1, based on NVIDIA’s website, summarizes the differences between a GPU and a CPU.

Table 3-1: Basic differences between a CPU and a GPU

CPUs:
- Optimized for low-latency access to cached data sets
- Control logic for out-of-order execution, pipelining, and speculative execution

GPUs:
- Optimized for data-parallel, throughput computation
- Architecture tolerant of memory latency: latency is hidden by very fast context switching
- More transistors dedicated to computation

In spite of the apparent advantages of GPUs, they do come with certain disadvantages. First, in order to obtain speedup, algorithms need to be rewritten taking the GPU architecture into account. The challenge is to develop applications that scale parallelism to the many available cores. Not all algorithms are suitable to be run on a GPU. We will examine this further in the next section. Second, adding GPU hardware might increase power consumption, heat production, and cost.


3.2.2 CUDA Programming Model

NVIDIA introduced CUDA in 2006. CUDA is a low-level, general purpose parallel computing programming model built as an extension of the C language. Developers use a high-level language, such as C or C++, to interact with the CUDA API. Developers run their sequential code using C/C++ on the CPU, and call a CUDA kernel to run their parallel, computation-intensive code on the GPU. After running, the results are returned to the CPU. This is called heterogeneous programming.

Heterogeneous programming involves execution on both the CPU and the GPU. The CPU and its memory are referred to as the host, whereas the GPU and its memory are referred to as the device. Typically, the following summarizes a sequence of operations:

- Allocate host input memory

- Initialize host input data

- Allocate device input memory

- Copy host input data to device input

- Allocate host output memory

- Allocate device output memory

- Execute kernel

- Copy device output data to host output

- Free memory on device

3.2.3 CUDA Architecture

CUDA abstracts three elements: streaming multiprocessors (SMs), barrier synchronization, and memory hierarchy.


3.2.3.1 Streaming Multiprocessors

The CUDA architecture is built up of a number of multithreaded SMs which perform the actual computations. Each SM has cores, control units, schedulers, registers, execution units, and caches. The more multiprocessors a GPU has, the lower the execution time will be. Each SM has an underlying logical model composed of grids of blocks, blocks of threads, and threads, as shown in Figure 3-1, taken from the NVIDIA website. When a kernel is invoked, the calling function has to specify the grid dimensions: the number of blocks in a grid, and the number of threads in a block. Based on available execution capacity, these blocks are distributed among the SMs. Threads of a thread block execute concurrently on one SM. Multiple blocks can execute concurrently on one SM. These dimensions constitute the parallelism, as each kernel operation is executed N times in parallel, where

N = grid size x block size x number of threads per block

Figure 3-1: Grid of thread blocks, taken from NVIDIA website


3.2.3.2 Barrier Synchronization

Threads in a block execute concurrently in batches of 32, known as warps. Threads belonging to the same block can synchronize their execution to coordinate memory accesses.

Developers can specify synchronization points in the code by explicitly calling a function to synchronize threads. This acts as a barrier where all threads in that block must wait before proceeding with execution.

Unlike threads in a block, thread blocks are expected to execute independently. They can be executed in any order, in parallel, or in series, and can be scheduled across any number of cores.

3.2.3.3 Memory Hierarchy

Following the CUDA abstraction, threads each have their own private per-thread memories, threads in a block can also access shared memory, and threads in grids can access global memory. Access to shared memory is typically much faster than global memory. That is why good CUDA programmers tend to place data in shared memory rather than in global memory.

However, it is also much smaller. There are also two more memory modules: constant memory and texture memory. These are read-only and accessible by all threads. Each memory space is optimized for different usage.

3.2.4 Applications Suited for GPUs

Not all applications are suited for a GPU implementation. In other words, speedup while running on a GPU is not guaranteed. Certain types of problems will actually exhibit a reduction in performance if implemented on a GPU. We have mentioned a few of these applications in the introduction. Here we mention the properties that algorithms suited for a GPU implementation share: data parallelism and throughput intensity. For problems that satisfy these properties, a GPU implementation can offer significantly faster computations.

3.2.4.1 Data Parallelism

There is a distinction between problems being task parallel or data parallel. Data parallelism involves several threads performing the same operation simultaneously, but on different parts of the data. GPUs are good for data parallel problems because they have many cores that can do the same operations on different parts of the input data.

3.2.4.2 Throughput Intensity

Problems can be either memory bound or compute bound. Memory bound refers to problems where I/O operations are predominant. These problems are not suited for GPU implementation because it is expensive, in terms of memory cycles, to transfer data back and forth from CPU to

GPU. On the other hand, problems that are compute bound perform several instructions on a data element, hence offsetting the efforts of reading and writing to memory. These problems have a high order of complexity and will benefit from a GPU implementation.

3.3 Literature Survey

At the time of writing this dissertation, there is not yet any official library from NVIDIA or the OpenCL community to perform pattern matching on GPUs. We believe the reason is that such a problem, unless used with massively data parallel computation, can be considered memory bound and will perhaps not map well to parallelism. However, several attempts have been made to efficiently perform pattern matching on a GPU. We differentiate between three categories: commercial solutions whose details are not publicly available, academic papers, and open source projects.

Commercially, HP [23] and IBM [24] have published presentations and videos on building finite automata and matching regexes respectively. However, the details were not exposed. The

HP presentation simply mentions that they have achieved fast regex parsing; whereas the IBM video just reports on throughput, latency, and the effect of the pattern sizes. Their findings show that the GPU provides significant throughput compared to the CPU, the current GPU regex implementation they have is not suitable for latency sensitive applications, and that the problem is strongly workload dependent on both the data contents and the patterns.

The most prevalent work regarding pattern matching on GPUs has been purely academic.

Our findings show that research in this area has been geared mostly towards network intrusion detection systems due to the nature of the problem, wherein network packets are scanned against several known malicious attack patterns. This constitutes a problem of matching several patterns against several files of data, and hence lends itself well to a GPU implementation. The work in [25] provides a detailed evaluation study of GPU designs on practical datasets and explores advantages and limitations of several techniques. It also suggests some schemes to avoid the limitations discussed. In [26], the authors propose a hierarchical GPU based approach to accelerate regular expression matching and to resolve the problem of state explosion that can result from a DFA implementation. They target network intrusion detection systems. Keeping their application in mind, they have reduced regular expressions into two categories: the first involves the '.*' wildcard and the second involves patterns with constrained repetitions. They have extended their previous PFAC (Parallel Failureless Aho-Corasick) algorithm to implement a hierarchical machine, where the master detects the first category and, if found, the slave detects the second from the intermediate result of the master. The same authors have expanded on their work in [27]. They introduced throughput efficiency techniques that include reducing global memory transactions, reducing the latency of the transition table lookup, eliminating output table accesses, avoiding bank conflicts, and enhancing communication between the CPU and the GPU.

In [28], the authors present a ground-up parallel NFA-based regular expression engine running on a GPU called iNFAnt. The difference between their work and previous implementations is that they were among the first to implement an NFA-based state automaton instead of a DFA-based one.

Their work is general and does not only handle specific cases. Since they deal with NFAs, which do not suffer from state explosion as opposed to DFAs, they were not constrained by the size or complexity of the rule sets. The authors in [29] propose a solution that tries to overcome the limitations of [28] by implementing a conceptual NFA-DFA hybrid.

However, their implementation could only work for smaller datasets. In [30], the authors propose the design, implementation, and evaluation of a pattern matching library running on the GPU.

The library supports both string searching and regular expression matching using CUDA. They chose to implement a GPU-based pattern matching library for inspecting network packets in real time and returning any matches found back to the application. They have parallelized the DFA based matching process by splitting the input data stream into different chunks. Each chunk is scanned independently by a different thread using the same automaton that is stored in device memory. They use synthetic network traces and synthetic patterns, in order to control the impact of the network packet sizes and the size of the patterns to the overall performance. The content of the packets, as well as the patterns, are completely random, following a uniform distribution from the ASCII alphabet. Other publications by the same authors [31] detail their implementations of creating Gnort, which is the mapping of Snort to the GPU. They do so by

processing the DFA on the GPU, while implementing the NFA on the CPU. In [32], the authors implemented an XFA data structure on a GPU, and compared the performance of the XFA to a DFA. Their design is suited to a specific set of problems that can be broken down into non-overlapping sub-patterns separated by the '.*' wildcard. In [33], the authors present an implementation of the Aho-Corasick algorithm in CUDA, and discuss the various trade-offs and design decisions.

With respect to the open-source projects available that perform pattern matching on GPUs, we were able to find only one [34]. Cuda-grep is a CUDA implementation of an NFA-based automaton on a GPU. We have decided to use it in our work. We will therefore postpone discussing it to a later section in which we provide a detailed analysis of the implementation and its limitations.

3.4 Writing Java for GPUs

GPUs are usually programmed using either CUDA or OpenCL. These two frameworks support C and C++. However, Java is not directly available as a language for programming GPUs. In the following, we discuss why one might need to program GPUs in Java and how it can be done.

3.4.1 Why Java

There are several reasons why one might want to run a Java application on a GPU. Java developers would rather code in Java than in C/C++. They do not want to worry about pointers, destructors, memory models, or allocating and freeing memory [35]. In order to use a GPU, they would have to deal with the previously mentioned hassles. Java offers a level of abstraction from the low-level hardware details, and developers used to that mode of programming would typically like to maintain that level. Another reason why developers might want to write Java code running on a GPU is to take advantage of the performance benefits of GPUs in their already existing Java application. Some areas in a project that require huge computational power could be mapped to a GPU to obtain better performance. That can be done without having to rewrite the whole application in C or C++, just the relevant code segment.

3.4.2 Running Java code on a GPU

There are two main ways of running code written in Java on a GPU.

The first method consists of using one of the available open-source projects that transform the Java code into low level code that runs on the GPU. This can be done in one of two ways: either use a library that performs the bindings for the users, or use a library that hides the low level code from the users. This includes code to initialize the device, create data buffers, allocate memory on the device, send the data to the device, execute the code, and transfer the results back to the host. The advantage of this is that we will not need to write any low level code ourselves. However, we will need to learn and use a new API. We will also be restricted by the chosen solution's hardware and programming requirements. In addition, and based on our survey, such solutions are not flexible, in the sense that in some cases only primitive datatypes are supported.

The second and more basic method consists of implementing native methods in the Java code, and running the GPU kernel code from within these C/C++ native methods. The advantage of doing so is that it is more flexible with respect to the datatypes that can be used. And since all the implementation is 'home-made', we are not limited to an operating system, a programming model, a GPU device, or a processor type. However, this requires us to write more complex code consisting of JNI (Java Native Interface) functions and declarations, manually generating the object files, and correctly linking everything together.

In what follows we survey the currently available solutions, test some of them, and justify our method of choice.

3.4.3 Available Solutions

Several attempts have been made in order to allow programmers to write Java code that runs on GPUs. [21] and [36] have provided a summary of the most popular and widely used libraries.

[21] also evaluates two representative Java GPGPU projects and draws some comparisons based on performance and programming effort. We have added some newer projects to the list and have summarized the most popular ones. They can be classified into three main categories based on their implementations:

3.4.3.1 Bytecode Translation and CUDA/OpenCL Code Generation

- Aparapi [37]: Aparapi is an open-source library by AMD. It translates Java code into OpenCL, without requiring users to know or write any OpenCL code. Users override a 'kernel' method which is to be implemented on the GPU, and the bytecode of this method is loaded at runtime and translated into OpenCL. If compilation to OpenCL is not possible, the code will still be executed in parallel using a thread pool.
- Rootbeer [38]: Rootbeer is an open source library that converts Java code embedded in a 'gpuMethod' function into CUDA code. Users do not need to know CUDA.
- java-gpu [39]: Java-gpu is a library that translates annotated Java code into CUDA.


3.4.3.2 Java CUDA/OpenCL Binding Libraries

- JavaCL [40]: JavaCL is an object-oriented OpenCL library that auto-generates low level Java bindings for OpenCL.
- Jcuda [41]: Jcuda provides low level one-to-one bindings that map jcuda API function calls to the CUDA API.
- Jocl [42]: Jocl provides low level one-to-one bindings that map jocl API function calls to the OpenCL API.

3.4.3.3 Language Extensions

- Ateji PX [43]: allows parallel constructs, such as OpenMP-style parallel for loops, to be executed on the GPU using OpenCL. This project is no longer maintained.
- JCUDA [44]: JCUDA is a library that translates special Java code into CUDA C code.
- JaMP [45]: JaMP is a Java extension for OpenMP constructs with a CUDA backend.

There have been other projects such as CUDA4j and Project Sumatra. CUDA4j [46] is an

IBM product that is heavily endorsed by NVIDIA. It aims to provide a Java API to run CUDA code on NVIDIA GPUs. Impressive results have been shown, but currently not all features have been developed. Project Sumatra [47] started out as an OpenJDK initiative, but it seems that the project has been put on hold and development is currently not underway. No other information could be found.

Another categorization, similar to that in [21], can be based on whether these approaches constitute a user-friendly API or whether the user needs to know the CUDA language. They are summarized in Table 3-2.


Table 3-2: Categorization of Java GPGPU solutions

          Java bindings           User-friendly
CUDA      JCUDA, jcuda, CUDA4j    Java-gpu, Rootbeer
OpenCL    jocl                    Aparapi, jacc

3.4.4 Evaluation

After surveying and testing some of these solutions, and while we found them relatively easy to use, we have opted to implement the basic approach that underlies most of the previously mentioned techniques. It consists of a JNI implementation, coupled with dynamically linked libraries that contain both C/C++ and CUDA object files. We have chosen this approach since it provides us with more flexibility in specifying and using non-primitive object types, does not restrict us with limitations such as those mentioned in Table 3-3, and allows us to integrate our bytecode injection library.


Table 3-3: Limitations with current solutions

Most popular solutions    Limitations

Aparapi     - Works with arrays of primitive datatypes only

Rootbeer    - Requires additional build steps

java-gpu    - Code needs to be annotated
            - Only works on for-loops
            - Support has been discontinued

Jcuda       - Kernel function still needs to be written in C
            - Kernel parameters must be arrays of primitive datatypes
            - Becomes complex to program if the application is not straightforward

Jocl        - Kernel function still needs to be written in C

JCUDA       - Library not publicly available
            - Outdated, does not support all features provided by newer devices

Jacc        - Complex to write and understand API
            - Not available as an open source project

Cuda4j      - Uses Power processor and IBM Java
            - Currently available in Java 7.1 and Java 8 on POWER 8 Little Endian, running Ubuntu 14.10 and CUDA 5.5-54
            - Supported hardware is POWER 8 model 824L with one or two Tesla K40m GPUs
            - Not fully developed yet


3.5 Mapping Approach

We describe our workflow in this section. More specifically, we describe the libraries and interfaces we used to parallelize the Java code of interest on the GPU, and how the linking between the languages was performed. A diagram of our workflow can be found in Figure 3-2.

Figure 3-2: Workflow diagram

There are some scenarios where writing Java code alone will not meet the needs of an application. Developers use the JNI to implement Java native methods that take care of these situations. According to [49], some examples include: the Java standard library does not support required platform-dependent features, a library written in a different language is already available and needs to be used, and a small segment of performance critical code needs to be implemented in a lower-level language. The last case is exactly what we are dealing with. We want to implement the part of our code that does pattern matching in CUDA, a low level language. Therefore, in order to run CUDA kernels from Java, we need to use the Java Native Interface library. JNI is a standard programming framework that allows Java code running in a JVM to call and be called by native applications (written in a different language such as C, C++, or assembly) that are both operating system and hardware specific.

Our workflow starts off by detecting the call to perform pattern matching in the user code using the ASM library. Once the function is detected, bytecode injection is used to replace it with our own implementation. Our replacement, which is still written in Java, contains the JNI calls that implement the native functions that in turn call the CUDA kernel. We can divide the work into three parts: modifications to the Java code, creation of the native code which we have written in C, and creation of the CUDA code.

3.5.1 Modifications to the Java code

In order to call a native method from Java, that method has to be declared as a native method. This is done by preceding the method signature with the native keyword. We are then able to call this method like any other function. We also have to load the shared library that contains both the C and the CUDA objects. We do this by calling the System.loadLibrary method and passing the name of the library (the .dll if in Windows or the .so if in Unix). Loading the library maps the implementation of the native method to its definition. Now we are ready to compile the Java class.
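To illustrate these modifications, a minimal sketch might look as follows; the class name, library name, and method signature here are hypothetical placeholders, not the actual names used in our implementation:

public class PatternMatcher {
    static {
        // load the shared library that contains the C and CUDA object code
        // (e.g., libgpumatch.so on Unix or gpumatch.dll on Windows)
        System.loadLibrary("gpumatch");
    }

    // declared with the native keyword: the body is supplied by the shared library
    public native int matchOnGpu(String fileName, String pattern);
}

For a class like this one (in the default package), javah would generate a header declaring a C function named Java_PatternMatcher_matchOnGpu, following the naming convention described below.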

Once we have the class compiled, we need to create the header file that will be used in the shared library. To do so, we will make use of the ‘javah’ tool. It generates a header file that provides the signature for the native methods defined in that class. The name of a native language function consists of the prefix Java, the package name, the class name, and the name of the Java native method, all separated by an underscore. The method parameters include the following:


- a JNI environment pointer through which the native code accesses parameters and objects passed to it from Java
- a JNI class object which is similar to this in Java
- the method parameters and return types. However, these types will no longer be Java defined types. For example, an int type will now be mapped to a jint type, and a float[] will be of type jfloatArray. This native method signature will be used to write the C implementation.

3.5.2 Writing the C code

This is the part where we implement our native methods. We have chosen to code in C rather than in C++ since we found it easier to integrate with CUDA, but the concept is very similar for either. In addition to that, it proved easier in the linking phase. The first thing we must do is include the header file we had previously generated. Then, we implement the methods declared in the header file and make sure to preserve the signature descriptions (including the

JNI environment pointer and the JNI class object). Inside this method we will need to perform appropriate conversions and assign pointers to variables depending on the application. We will also include a call to a function in the CUDA file that implements a kernel that does pattern matching.

With respect to pattern matching, the Java function takes two parameters: the input text file to search within, and the regular expression. These two variables are of type String, which in

JNI are translated to a jstring type. As mentioned previously, we have used the Cuda-grep library. If we ignore the input flags required to run this library, the entry point is a C++ file that has as parameters the filename and the pattern, both defined as char * types. In order to map

the types correctly, we had to first convert the jstring type using the GetStringUTFChars method, which returns a const char* type. After that, we cast the const char* to char*. In addition to that, we changed parts of the Cuda-grep C++ code to be able to integrate it with our system, and to be able to return the total number of occurrences of the pattern, as opposed to just the lines in which the pattern occurs.

3.5.3 Writing the CUDA code

This is the .cu file that allocates the inputs in device memory, allocates the output in both the host and the device memories, transfers the input from host to device, performs the computation, copies the result back from device to host, and frees the memory. The computation step is done by invoking a kernel. This is done by calling the kernel method, passing the parameters, and specifying the number of threads and the number of thread blocks used to perform the computations. The kernel definition is identified by the __global__ prefix. The kernel contains the code that does the pattern matching.

We have used the .cu files provided by the Cuda-grep library, slightly adjusting them for our needs.

3.5.4 Putting It All Together

Once we have the C code and the CUDA code, we will need to create the shared library that we load in the Java program. To do so, we first compile the C code into an object file using the 'gcc' compiler, and the CUDA code into an object file using the 'nvcc' compiler. After that, we link both object files using 'gcc' into a shared library, making sure to include the necessary flags and directories. The last step is specifying the location of this shared library in the Java application. We can now run the Java program, and the native method will execute on the GPU and return control to the Java application. Moreover, when we run the Java code with the -javaagent option, we will be able to automatically detect the pattern matching function and replace it with its GPU counterpart.

3.6 Performing Pattern Matching on the GPU

As previously mentioned in the literature survey, CUDA-grep [34] was the only open-source implementation of regular expression pattern matching on the GPU that we found. It is based on

Thompson’s NFA implementation [48]. In what follows, we describe the approach the developers used, the issues they had to consider and the limitations they faced.

3.6.1 The Cuda-grep Library

The CUDA-grep library was written by two students from Carnegie Mellon. Their implementation yields the same results as running the following from UNIX: egrep -x <pattern> <files>. In fact, to validate their results, they compare the outputs of their test bench to the output of egrep. If similar, the test passes; otherwise it fails.

egrep is similar to grep -E, where E denotes the Extended Regular Expression (ERE) set as opposed to the basic set. The '-x' flag instructs grep to select only those matches that exactly match the whole line. Currently, the library does not support all the regexes of egrep. It supports the wildcards listed in Table 3-4, based on [50].


Table 3-4: Supported wildcards in CUDA-grep

Supported wildcards Significance

. Any character except a new line

+ One or more occurrences

? Zero or one time

* Zero or more occurrences

| Alternate

(…) Group

[…] Range, set of characters

\ Escape character preceding .+*?|

The library supports matching multiple patterns against a list of files. In fact, the developers claim that this scenario yields the most favorable results relative to egrep, as compared to searching for one pattern in one file.

Their algorithm can be summarized in Table 3-5, based on the developers’ documentation.

Table 3-5: Cuda-grep high level algorithm, taken from developers’ documentation

Load dataset and regular expressions

Compute NFA from regular expression

for all regular expressions

for all lines in dataset

use NFA to pattern match each line


The algorithm starts off by loading the dataset and the regular expressions into memory. The files are read, and a table containing the indices of new lines is created. In this way, the whole file can be transferred to the GPU at once, without having to perform multiple transfers back and forth between CPU and GPU and thus optimizing performance. Similarly for the regular expressions, a table is created with indices pointing to each regular expression. In addition to that, the regexes are built and parsed in order to constitute the NFA later on.

The actual lines, patterns, and both tables are copied to the device. Then the matching kernel is called. The kernel specifies a fixed number of threads: 512 thread blocks, each containing 160 threads (5 warps per thread block). The developers have found that this configuration gave them optimal results. Also, they have set the cache configuration using cudaFuncSetCacheConfig to cudaFuncCachePreferShared. This instructs the program on how to dynamically split the GPU on-chip memory between the L1 cache and the shared memory on a per-kernel basis, since both memory modules use the same statically configured on-chip memory.

The matching process works as follows: each thread block constructs an NFA from the pattern. This is accomplished by checking if the thread id is 0. If so, this is a new block, since thread ids are particular to their corresponding blocks. Constructing an NFA consists of first converting the prefix notation of the built regexes to a postfix notation in order to form the NFA states. The threads are then synced. If the thread id is not 0, which means that we have already created an NFA for that thread block, the thread uses the NFA to match a different line from the input file. If a match is found in a line, the index of that line is set in a new table. This result table is then transferred back to the host.


To be consistent with the egrep output, the algorithm displays the lines in which a match occurred. We have added to that the number of occurrences.

3.7 Experimental Results and Interpretations

3.7.1 Different Regular Expression Syntax

While most tools share common definitions for wildcards, different tools have slightly different syntax for regular expressions. Implementations can also have extensions that are not supported by other tools. The result is that a regular expression that works for Perl will need to be modified before it works for Java for example. The main difference between tools is whether or not operators such as *+?.|(){} require a backslash. A backslash is an escape character in literal Java strings and in regular expressions. Therefore, the regular expression \\ actually matches a \. To represent this as a Java string, it will become four backslashes \\\\ in order to match just one. This is different from other tools. Another example is matching a word character.

In grep for example, the regex \w is used. In Java, this becomes \\w. Another difference is the extensions that are supported beyond the basic ones.

Standard tools such as Java support unanchored matches. A substring of the input that matches the regular expression is commonly returned, instead of the whole string. For example, if the pattern is Romeo|Juliet and the text is Juliet is the sun, then the match returned is Juliet. On the other hand, in grep for example and using the same pattern and text, the whole line consisting of Juliet is the sun is returned. This is because the Unix search tools return the longest matching substring. In order to get comparable output, one must change the pattern to .*(Romeo|Juliet).*.
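To illustrate these differences, the following sketch (with a pattern and input chosen only for demonstration) shows both the double escaping of backslashes in Java string literals and the difference between an unanchored match and a whole-line pattern:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSyntaxExample {
    public static void main(String[] args) {
        String text = "Juliet is the sun";

        // the Java string literal "\\w+" denotes the regex \w+ (one or more word characters)
        Matcher word = Pattern.compile("\\w+").matcher(text);
        if (word.find()) {
            System.out.println(word.group());   // prints the substring "Juliet"
        }

        // to obtain the whole line, as the Unix tools return it, wrap the pattern in .* on both sides
        Matcher line = Pattern.compile(".*(Romeo|Juliet).*").matcher(text);
        if (line.matches()) {
            System.out.println(line.group());   // prints "Juliet is the sun"
        }
    }
}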


We mention all this because it affects our mapping between the Java regex library and Cuda-grep, which behaves like egrep -x.

3.7.2 Working Environment and Benchmark

Implementation has been done using Java as a plugin (openjdk7) for NVIDIA’s Nsight

Eclipse Edition (version 5.5.0), ASM v5 and CUDA 5.5. We also used the javac, gcc, and nvcc compilers. Testing was performed on a CPU and a GPU with the specifications shown in

Table 3-6.

Table 3-6: CPU and GPU specifications

Model
  CPU: AMD A6-5400K APU with Radeon(tm) HD Graphics
  GPU: GeForce GT 640 (Kepler, CUDA capability 3.0)
Number of cores
  CPU: 2
  GPU: 384 (2 MPs, 192 cores/MP)
Cache size
  CPU: 1024 KB
  GPU: L2: 262144 bytes
Clock rate
  CPU: 1400 MHz
  GPU: 902 MHz (memory clock: 891 MHz)
Memory
  CPU: 8 GB
  GPU: global: 1024 MBytes; shared per block: 49152 bytes; constant: 65536 bytes; registers per block: 65536
Number of threads
  CPU: 2
  GPU: maximum per MP: 2048; maximum per block: 1024


As a benchmark, a text file of 142 MB was used. The file was created by concatenating the 11th edition of the Encyclopedia Britannica, downloaded from the Project Gutenberg website [20].

3.7.3 Results and Interpretations

As mentioned in the previous section, the results returned from running Cuda-grep (and similarly egrep -x) and from running the Java regex library will be different if the same pattern is used. In the first case, the whole line is returned, and in the second, only the matching group is returned. In order to overcome this difference and to be able to obtain matching results, we have also altered our patterns to return the whole line in which a pattern is found.

We have performed a few warm-up runs and then averaged the times shown over several runs. Table 3-7 shows the execution times of running several patterns on both CPU and GPU.

The percent improvement is calculated as follows:

% improvement = ((CPU time – GPU time)/CPU time) x 100
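For example, for pattern 4 in Table 3-7: ((74472.6 - 1378.5) / 74472.6) x 100 ≈ 98.15%.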

On average, running the matching on the GPU is 98.11% faster than running it on the CPU. The data is also plotted in Figure 3-3.

Table 3-7: Runtimes and percent improvement with GPU

Patterns CPU time (ms) GPU time (ms) % improvement

1 .*romeo.* 78482 1745.6 97.77

2 .*ROMEO.* 84481.7 1387.1 98.35

3 .*Romeo.* 75728.4 1386 98.17

4 .*Juliet.* 74472.6 1378.5 98.15


Figure 3-3: Runtimes of CPU vs GPU (runtime in ms per pattern, for the CPU and GPU implementations)

Cuda-grep is case-sensitive. Therefore, we have performed an additional set of experiments to measure how including case sensitivity (and thus rendering the pattern more complex) will affect execution. We have compared the results to two Java implementations: the first uses the same pattern we used for the GPU (case 1), and the second uses the built-in flag

(Pattern.CASE_INSENSITIVE) that can be added when compiling a pattern (case 2). Our results are summarized in Table 3-8. We have used the following definitions to make the tables more compact and clear:

- Case 1: (.*romeo.*)|(.*ROMEO.*)|(.*Romeo.*)

- Case 2: .*romeo.*, Pattern.CASE_INSENSITIVE
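For reference, the two cases correspond to the following Java pattern compilations (a minimal sketch showing only how the patterns are built):

// Case 1: enumerate the case combinations explicitly in the pattern
Pattern case1 = Pattern.compile("(.*romeo.*)|(.*ROMEO.*)|(.*Romeo.*)");

// Case 2: let the regex engine handle case insensitivity through the built-in flag
Pattern case2 = Pattern.compile(".*romeo.*", Pattern.CASE_INSENSITIVE);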

Table 3-8: Run times taking into consideration case sensitivity

CPU time case 1 (ms) CPU time case 2 (ms) GPU time case 1 (ms)

232,308.2 92,996.6 2,765.3


Note that there is no GPU time for case 2, since Cuda-grep does not offer a case-insensitivity flag. We observe that using the flag in the CPU implementation is about 60% more efficient than listing the combinations. This is because in the first case expensive backtracking and grouping were performed. In spite of that, the GPU implementation was still able to outperform both, as shown in Table 3-9.

Table 3-9: Percent improvements of GPU over cases 1 and 2

GPU over case 1 GPU over case 2

% improvement 98.81 % 97.02%

3.8 Conclusion

We have presented a tool that detects a user’s function call to search for a pattern and seamlessly replaces it with our own parallel GPU-based implementation. We have used bytecode injection and JNI to run our optimized code. Our CUDA code is based on the CUDA-grep project. Experiments show a reduction in total execution time. As GPUs develop and as newer

GPUs are able to overcome the overhead imposed by necessary but expensive I/O, we believe that there could be more room for optimized performance.


Chapter 4: Parallel Pattern Matching using MPI

As detailed previously, we have chosen to parallelize the pattern matching process. In this chapter, we discuss implementing pattern matching on a distributed environment using MPI.

Parallelism in MPI is explicit. This means that the developer has to correctly identify areas for parallelism and to implement parallel algorithms using the MPI libraries. Our approach relieves the developer of this burden, since MPI programming is considered quite complex and has a steep learning curve. Our contribution is summarized as follows: detect a user's Java call to search for a pattern, instrument the pattern matching code to run on a cluster while accounting for process communication, and finally return control back to the program. In addition to that, we implement another distributed approach using Hadoop's MapReduce and compare that to the multithreading results obtained in a previous chapter.

4.1 Introduction

Distributed systems are the other form of parallel systems that we explore. As opposed to shared memory systems wherein memory is shared, a distributed system consists of several autonomous computational nodes (commonly referred to as a cluster) that each have their own local memory module and that communicate with each other using a high speed communications network to relay information typically in the form of message passing.

In order for a distributed system to be reliable, it must be fault-tolerant, highly available, recoverable, consistent, scalable, have a predictable performance, and secure [51].

The key issue in programming such systems lies in distributing the data over the individual processors' distinct memories. Distributed computing is the field that studies computational problems performed on distributed systems. In short, a large complex problem is divided into many smaller sub-problems. These sub-problems are then solved using one or more nodes which communicate with other nodes in the cluster. Computational tasks per node can only operate on the data local to each node. Hence, if remote data is needed, the task must communicate with other remote processors that contain the required data. The results from each processor are then reassembled into one solution.

The underlying network architecture that connects these nodes is out of the scope of our research. What we do discuss is the implementation from a programming perspective, i.e., the message passing interface and its implementations.

Several message passing libraries had been available prior to 1992. It was difficult for programmers to develop portable applications because these implementations varied completely.

Therefore, an effort began by a small group of researchers to establish a standard interface. The

Message Passing Interface (MPI) version 1.0 was released in 1994. Later versions include MPI-2

(released in 1996) and MPI-3 (released in 2012). Since its release, MPI has become the industry standard for message passing, replacing all previous implementations. This has allowed parallel hardware vendors to have a clear set of defined routines that can be easily implemented, and it has made applications portable across several parallel platforms. The MPI standard was formed by adopting features from systems by IBM, Intel, nCUBE, PVM, Express, P4, and PARMACS [52].

Another form of communication used in distributed systems is based on the MapReduce model which was first introduced by Google. Instead of using messages to relay data between processors, MapReduce is based on reshuffling and duplicating data. MapReduce was developed to address the shortcomings of MPI. As mentioned previously, an important factor in distributed

systems is reliability. However, MPI is not very reliable. Therefore, MapReduce, and more specifically Hadoop, was developed to account for reliability.

Without going into details of both techniques since they will be covered extensively later on, the major differences between MPI and MapReduce can be summarized as follows:

- Reliability: In MPI, if one node dies, the whole application crashes. In MapReduce, data is duplicated as a precautionary measure and if a node is down, the system can adjust and re-compute that node's task. Fault tolerance can also be done in MPI, but it requires more effort and time to get it right.
- Programming efficiency: With MapReduce, setting up applications and getting them running in a scalable fashion is faster.
- Flexibility: MPI is more flexible in the sense that a developer has more control over data and data locality. With MapReduce, developers have no control over data locality. Data is randomly distributed over the nodes.
- Performance: MPI is generally more efficient in terms of performance than MapReduce, especially if an application does not fit the map-reduce model.

One can conclude that MPI is favorable for fast, reliable networks; whereas MapReduce is favorable for slow unreliable hardware.

The rest of this chapter is organized into the following sections. Section two is an introduction to MPI and the various available MPI implementations. Section three surveys the literature with respect to pattern matching performed using MPI. Section four describes our approach and our implementation. Section five shows our experimental results and analysis. Sections six and seven discuss Hadoop as an alternative solution for distributed systems, and section eight concludes the chapter.

4.2 Introduction to the Message Passing Interface (MPI)

MPI is a standardized, language-independent specification of what a message passing library should be. It attempts to be practical; portable, being supported on virtually all HPC platforms; efficient, since vendor implementations can exploit native hardware features to optimize performance; functional, with over 430 different routines in MPI-3; and flexible, since a variety of both vendor and public domain implementations are available. It defines the syntax and the semantics of a set of functions that can be used by applications to exchange messages between processes in C, C++, and Fortran. The protocol supports both types of communication: point-to-point and collective.

Each CPU core gets assigned a process at runtime. Originally, MPI was designed for distributed memory architectures, but it has been adapted to handle hybrid distributed memory and shared memory systems as well and can run on virtually any hardware platform. Support for different interconnects and protocols has also been developed.

There are several implementations of the MPI standard, and they differ in which version and which features of the standard they support. Open MPI is perhaps the most robust and prevalent implementation available, and it is the one we have chosen to use. In what follows, we discuss the major architectural concepts and the several implementations available, as well as go into a little more detail into Open MPI and why we have selected to use it.


4.2.1 MPI Concepts

In MPI, all the processes that communicate with each other are included within an entity called a communicator. Once a communicator has been initialized, functions determining the total number of processes, the rank of the calling MPI process, the processor name, the MPI version used, the elapsed wall clock time, and its resolution can be called. Also, calls to abort and finalize are available. The rank is a unique identifier for each process and is used for inter-process communication identification.

A typical program starts by initializing the MPI environment, determining the total number of processes and the rank of the current process, performing the communications and per process computations, and ending by terminating the communicator. A template is shown in Figure 4-1.

Every process runs the same code. However, some locally defined variables will need to be broadcast or sent to the other processes that will need to handle the data. This is confusing for some programmers because, contrary to common practice, defining and initializing a variable on one process does not necessarily mean that its value is available in every process's local memory.


import mpi.*;

//class definitions
//variable declarations
final static int ROOT = 0;

//sequential code

//initialize MPI environment
MPI.Init(args);

//main functions used in almost all MPI programs
int myRank = MPI.COMM_WORLD.getRank();
int numProcs = MPI.COMM_WORLD.getSize();

//code executed on ROOT only
if(myRank == ROOT){
}

//parallel code executed on each process
//example: do pattern matching

//clean up MPI environment
MPI.Finalize();

Figure 4-1: MPI code template

Message passing is fundamentally based on sending and receiving messages among processes. A process sends a message to another process by providing the rank and a tag to identify the message. The receiver posts a receive message along with the sender’s rank and anticipated tag. Such communication that involves one sender and receiver is referred to as point-to-point communication, and can be synchronous or asynchronous. Communication can also be one-sided.
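As a rough sketch of point-to-point communication with the Open MPI Java bindings (a fragment assumed to run inside an initialized MPI environment; the buffer size, tag value, and ranks are arbitrary, and the send/recv signatures are assumed to follow the bindings described above, so they may differ slightly between versions):

int rank = MPI.COMM_WORLD.getRank();
int tag = 99;                        // arbitrary tag identifying this message
char[] subtext = new char[1024];     // buffer holding one chunk of the text

if (rank == 0) {
    // the root sends one subtext to process 1
    MPI.COMM_WORLD.send(subtext, subtext.length, MPI.CHAR, 1, tag);
} else if (rank == 1) {
    // process 1 posts a matching receive, naming the sender's rank and the anticipated tag
    MPI.COMM_WORLD.recv(subtext, subtext.length, MPI.CHAR, 0, tag);
}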


Another form of communication is collective communication. In collective communication, all the processes are required to take part in the process. Examples include broadcast, scatter, gather, and reduce operations among many others. Technically these operations could be implemented with send/receive variations (which is what typically happens underneath the surface). However, they are less cumbersome to write and they use the network in a more efficient manner. Collective calls are synchronous calls and an implicit barrier is internally implemented.

Complex parallel programs usually include a mix of point-to-point and collective communication and/or I/O where a set of MPI processes access storage subsystems.

4.2.2 MPI Implementations

As previously mentioned, MPI is an interface. It is not tied to any hardware or software environment, and it is therefore up to developers to implement it for the different architectures. Many efficient, well tested, and open-source implementations of MPI exist. They might differ in which version and features of the standard they support, and in how they are compiled and run on various platforms. MPI implementations consist of C, C++, and Fortran APIs. Most implementations also provide C#, Java, or Python bindings, or bindings for any other language that is able to interface with the original libraries.

MPICH was the initial implementation of MPI-1. IBM also provided its own implementation. Another known implementation is Open MPI which was formed by merging several other previous versions. Commercial implementations are derived from these initiatives.


4.2.2.1 Open MPI

Open MPI is used by many TOP500 supercomputers. It is an open source implementation of MPI that was developed by merging the following previous well-known implementations: FT-MPI from the University of Tennessee, LA-MPI from the Los Alamos National Laboratory, LAM/MPI from Indiana University, and PACX-MPI from the University of Stuttgart [53]. The developers intended to use the best features and technologies from these independent projects in order to create a better overall open source implementation. Open MPI fully supports the MPI-3 standard, thread safety and concurrency, and several operating systems, and provides high performance on all platforms.

4.2.2.2 Java Bindings

Even though MPI specifications require a C or Fortran interface, the language used to implement MPI can be different. Most implementations use C, C++ and assembly to target C,

C++, and Fortran programmers. However, bindings are available for many other languages such as Python and Java, among others. Java bindings were developed due to the increasing interest in using Java for HPC. MPI can benefit as well, since the widespread use of Java makes it likely to be applied beyond traditional HPC applications [53].

Several implementations have developed Java bindings. We mention a few, while noting that we have used the Open MPI implementation in our work. The first attempt, called mpiJava, is from [54]. It comprises a set of JNI wrappers to a local C MPI library. The project suffered from limited portability and was not entirely complete. However, it formed the basis for other projects to build upon, including the Open MPI binding. MPJ Express [55], MPJava [56], and F-MPJ [57] are other projects that have taken a different route. The programmers developed the bindings on a pure Java basis. This is based on Java sockets and specialized I/O interconnects, and requires significant coding effort.

One of the challenging parts of creating Java bindings stems from the language's characteristics. Java does not support explicit pointers, and it is inefficient in transferring multidimensional arrays and complex objects because of how objects are stored and/or copied in the address space. Typical workarounds include deserialization and casting.

As mentioned previously, we have chosen the Java bindings by the Open MPI project, which support the complete MPI-3.1 standard minus a few exceptions that were irrelevant to our work. The approach is based on JNI and consists of an interface that lies on top of a native C library [58]. The bindings are a 1-to-1 mapping to the MPI C bindings. We discuss further details relevant to our implementation decisions in Section 4. The authors state that the MPI Java bindings perform well compared to C and Fortran code; over the NAS Parallel Benchmarks (NPB), the results were satisfactory, with only a slight overhead.

4.3 Literature Survey

In this section, we survey three topics. First, we list some of the previous work that has been done for parallelizing pattern matching algorithms using MPI. Second, we explore previous work that has attempted to automate MPI code generation. And finally, we discuss some Java implementations in high performance computing.

In [59], the authors use both MPI and OpenMP separately to implement the Naïve, Karp and

Rabin, Zhu and Takaoka, Baeza-Yates and Regnier, and the Baker and Bird exact two-dimensional on-line pattern matching algorithms. They concluded that for the same set of algorithms and data sets, OpenMP is more efficient since it does not impose any communication

costs. However, MPI might be favorable since it is more scalable, as opposed to OpenMP, which is limited by the number of cores on a single processor. In [60], the authors propose four master-worker models to perform parallel multi-pattern matching on a heterogeneous cluster. They further validate their theoretical models by experiments using MPI. In the first model, the documents are distributed among the workstations and stored on local disks. Each worker reads its subtext from its main memory and executes the searching algorithm. The master then collects the results from each worker. This model has low communication overhead since each worker searches its own subtext and there is no communication with other workers or the master. The drawback is the possibility of load imbalance. However, this can be eliminated with the use of a better partitioning technique. The second model consists of having the entire text reside on the master's local disk. The master sends the chunks of text to the workers that each perform their own search. This is a dynamic approach that has low load imbalance but higher inter-workstation communication overhead. The third model consists of a dynamic allocation of pointers. The whole text is stored on the local disks of all the workstations, and the master maintains pointers to the subtexts and sends these pointers to the workers. This reduces inter-workstation communication overhead but requires more space to store all the files on all the workers' local storage. The fourth and most efficient model in terms of execution time and speedup consists of a hybrid implementation of the first model along with an optimal distribution method.

The tool in [61] automatically generates MPI source code from OpenMP. OpenMP parallel loops are detected and are distributed in a master-worker pattern. This is done by manipulating the AST, performing loop analysis and workload distribution. Only peer-to-peer send/receive protocols are supported. The user code to be transformed has to be originally parallel for the tool to detect pragmas. The authors also mention that in order to obtain significant performance gain

from MPI over OpenMP, many processes are needed. In [62], the authors deal with a sub-problem of MPI code generation: generating MPI type-maps from user source code. MPI derived datatypes are custom datatypes that are more complex than the primitive datatypes. Users need to manually maintain descriptions for their datatypes so that the MPI routines can properly transmit them. In [63], the authors propose a framework to generate MPI code that is type safe. The motivation behind this work is that the most common MPI programming error consists of communication mismatch between senders and receivers. Instead of verifying code correctness, the authors automate communication generation based on the user provided sequential code and on a supplied interaction protocol. Type safe code reduces lost messages, communication mismatches, and calculation errors. In [64], the authors present a directive-based tool that transforms sequential C code to a parallel message passing form. They also introduce a loop scheduling method for load balancing. Directives are marked in the C code to indicate and enclose which for-loop to parallelize, which block will be initialized for all nodes, and which parts to synchronize. The authors take the following assumptions into consideration: loops are expressed by for-statements, only outermost loops are parallelized, parallelizable loops do not contain pointers, and the boundaries of each loop are considered static and known at compile time. The process consists of two passes. The first pass scans the source program to process directives, prepare loops, and create a def-use symbol table. In the second pass, the tool examines the loops based on the def-use table and generates the MPI code. The experimental results show that handwritten optimized codes performed better than codes generated by the system, and the authors list some factors that might have led to that. The authors in [65] apply pattern matching to event traces in order to analyze an MPI program. They aim to detect groups of communication patterns that may resemble collective operations. Their system informs

the user of the potential match, and thus users can replace their point-to-point operations with the suggested collective functions in order to obtain better performance. The motivation behind their work is that inexperienced users might not know which already available collective functions would better fit their program behavior than non-optimal send/receive operations.

The authors of [57] also presented a study [66] of the state of Java for HPC for both shared and distributed memory. While their overview was current at the time their work was published, it has become incomplete at the time of writing this work. However, we mention their main observations: the interest in Java for HPC has led to several projects although still modest; Java can achieve almost similar performance to natively compiled languages and can be an alternative for HPC programming; the advances of Java communications in a shared memory environment can bridge the gap between Java and natively compiled HPC applications. In [67] the authors discuss their results in evaluating the performance of a parallel deterministic annealing clustering program using both Open MPI with Java bindings and FastMPJ as compared to native Open MPI and MPI.NET. They do so by migrating existing C# based code to Java by rewriting the algorithms.

4.4 Implementation

In this section, we discuss the reasons behind some of our implementation decisions and the approaches we took to perform pattern matching using the Open MPI Java bindings. The first part explains how we achieved function detection, how we linked the MPI code to the user's, and how we return the MPI results from the distributed environment back to the user's main code and allow him/her to proceed to seamlessly run his/her code. The second and third parts deal with the MPI implementation. Because of some MPI standard restrictions, and also because of Java language restrictions, we faced some challenges that would have otherwise been easy to deal with in a C environment. We have developed three different approaches to our problem, each complete in its own way (in that each solution solves all the anticipated problems), and we compare these approaches in terms of efficiency and code complexity.

4.4.1 Overall Process

As mentioned in the introduction, the purpose of this chapter is to use MPI to parallelize an otherwise sequential user’s function call for pattern matching without the user’s intervention. In

Chapter 2, we have achieved this for shared memory by using multiple threads. However, with distributed memory, our approach has to be slightly different in the sense that our data now resides on multiple machines, and these machines will each process part of the data. So far, this does not seem like an issue. However, recall that our purpose is to detect pattern matching

(which would typically be in the middle of the program), replace the sequential implementation by a parallel one, and then return control to the original user code that will probably use the result of the matching (which will have to be in memory) in some way or another. To integrate the MPI implementation, we will need to execute the MPI application externally from Java, read the system results, and then parse them and send them back to the Java application. The reason for this is that MPI executables cannot be launched without the mpirun command. In addition to launching multiple copies of the MPI executable, mpirun also exports special environment variables to the launched MPI processes. Therefore, calling mpirun is necessary, and in Open

MPI there is no way to embed its functionality in the program. It has to be explicitly called. To handle this, we have used the Java exec command. This command executes external applications from within the Java program. Figure 4-2 shows a code snippet.


// attach to the runtime of the current Java process
Runtime rt = Runtime.getRuntime();

// build the mpirun command from the dynamically supplied arguments
String cmd = "/usr/local/bin/mpirun -np " + np + " " + hostFile + " java MPI_Matcher "
    + textFile + " " + pattern + " |sort";

// launch mpirun as an external process
Process pr = rt.exec(cmd);

// read the standard output produced by the MPI processes
BufferedReader input = new BufferedReader(new InputStreamReader(pr.getInputStream()));
String line = null;
while((line = input.readLine()) != null) {
    //parse the output
}

// wait for mpirun to terminate; an exit value of 0 indicates success
int exitVal = pr.waitFor();
System.out.println("Exited with error code " + exitVal);

Figure 4-2: Sample exec code

First, we create a Runtime object and attach it to the system process. In this example, we are using the mpirun command. It takes the following arguments that are dynamically set based on the user’s input:

- -np: number of processes to be run
- -hostfile: contains the names of the parallel machines and how many slots are available in each
- textFile: the text file name to process
- pattern: the pattern to search for

The call to exec launches mpirun. However, we still need to collect its output. To do so, we use a buffered reader to read the input stream, which is then parsed and used (we do not show here how). Finally, the waitFor() method will make the current thread wait until mpirun finishes

and will return an exit value to the current thread. An exit value of '0' means that the external program has executed successfully. Our overall process is shown in Figure 4-3.

Figure 4-3: Overall execution process

The reason we did not use JNI, as we did when parallelizing for GPUs, is that our function call is in Java, since we are using the Open MPI Java bindings. Had we wanted to use a C Open MPI library, for example, we would have needed to use JNI instead.

4.4.2 MPI Design Decisions

Before developing the MPI code and worrying about which communication operations to use, there are two key issues to consider: text partitioning, and data distribution.

According to [68], there are two major steps in designing parallel algorithms: dividing a computation into smaller parts, and assigning these parts to different processes for parallel execution. Properly selecting methods to handle these issues reduces the overhead of parallel algorithms by decreasing execution time, in that it minimizes the time processors sit idle due to an uneven load distribution.


There are two mapping techniques available: static and dynamic mapping. Static mapping is generally used in a homogeneous environment where the processes have the same characteristics, whereas dynamic mapping is generally used in a heterogeneous environment where processes might have different characteristics. In static mapping, each process receives the same amount of data, proportional to the number of available processes, before the execution of the algorithm starts. In dynamic mapping, data is divided into more parts than there are available processes. Each process is assigned a chunk size depending on its capacity, and this is the amount of data it receives during the execution of the algorithm. This is discussed in more detail in [60]. In order to achieve a well balanced distribution among heterogeneous nodes, the amount of text distributed to a process should be proportional to its processing capability within the network. This is given in Equation (1), where Li represents the processing capacity of process i, Si represents the speed of process i, and p is the total number of processes available.

Li = Si / ∑j=1..p Sj    (1)

Thus the amount of data distributed to each process i will be Li x (n + m - 1) (footnote 4).

In our case, and since we have a homogeneous distributed environment as discussed in the next section, we use a static approach. Using a dynamic approach would result in the same load balance as a static one, since the Si's are equal and hence each Li reduces to 1/p. We defer the discussion of the exact data distribution to the next section, where we present the communication operations used, since the two are interdependent.

The other thing to consider is the data distribution. Several models exist, such as the Data-parallel model, the Task Graph model, the Work Pool model, and the Pipeline model. In [69], it was concluded that the Master-Worker model was the most appropriate for pattern matching on both distributed and shared memory systems. Therefore, we have chosen this model for our implementation. It corresponds to the typical master-worker scheme in which a master node distributes data to the other worker nodes. Each worker node then processes the data and, when finished, sends the results to the master node. The individual results are gathered by the master, and depending on the required output, a reduction might be performed to present the final overall results.

Footnote 4: Recall that (m-1) is the boundary overlap, where m is the pattern size, and n is the length of the text to partition.

4.4.3 Open MPI Java Bindings Implementation Approaches

There are several communication approaches for message passing. In pattern matching, the text needs to be partitioned so that processing can occur on the subtexts. These subtexts are what needs to be passed among the nodes. We have chosen to implement the following three approaches to pass the subtexts and compare their performance:

- point-to-point communication using send/receive

- collective communication using scatter

- minimal communication where the whole text is local to all nodes

We discuss each implementation next.

4.4.3.1 Point-to-point Communication (Approach 1)

In this scenario, one of the processes is labeled as root. The text to process resides only in the local storage of the node that contains the root. The root reads the text and processes it. It then divides the contents among the other processes, which could be on other nodes as well. Therefore, the root process performs a send operation sending individual subtexts, and all other processes perform a receive operation receiving the data upon which to perform the matching. Load balancing is achieved since the text is partitioned in equal parts. We have chosen to implement a static block scheduling approach instead of a cyclic one.

One of the issues we faced is specific to the Java language, the message passing interface, and the Java bindings. To send or receive a message, the following must be provided:

- an array (or buffer) holding the contents of the message to send or receive

- the number of elements to send or receive

- the type of the elements

- the source/destination process involved

- the tag identifier of the message

Here is an example of a simple send and a simple receive operation.

MPI.COMM_WORLD.send(subText, count, MPI.CHAR, processID, tag);

MPI.COMM_WORLD.recv(lsubText, count, MPI.CHAR, ROOT, tag);

To send a text, we will first need to convert it either into an array or into a buffer, because the Java bindings can only handle arrays or buffers. According to the documentation [58], it is better to use buffers when dealing with large messages due to the way copying and moving data is handled internally in the Java bindings implementation. We have actually tested both scenarios and when we implemented the array version, our program would hang due to the large message size. Hence, in this approach and the next one, we have used char buffers instead of arrays.

In Java, we have to initialize a buffer with its size before we can use it. This is not a problem on the send side, because the root process knows how big each partition is going to be. However, this is not true on the receiving end. Each process needs to know how big its message is going to be in order to allocate memory to receive it. One way of doing that is to actually send each partition size to the corresponding process using another send operation. However, that would incur an extra performance penalty due to increased communication. Another way would be to allocate a huge buffer size to handle all possible message sizes. However, that would require extra unused space. Therefore, we implemented an alternative dynamic approach: each process first probes the message before it is received in order to dynamically determine its size, allocates memory for it, passes the size to the receive operation, and then receives the message.
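A minimal sketch of this probe-then-receive step using the Open MPI Java bindings is shown below; the ROOT and TAG constants and the surrounding worker logic are assumptions made for illustration, not our exact implementation.

import java.nio.CharBuffer;
import mpi.MPI;
import mpi.MPIException;
import mpi.Status;

public class WorkerReceive {
    static final int ROOT = 0;
    static final int TAG = 100;

    // Probe the pending message from the root, size the buffer to fit, then receive it.
    static CharBuffer receiveSubText() throws MPIException {
        Status status = MPI.COMM_WORLD.probe(ROOT, TAG);     // block until a message is pending
        int count = status.getCount(MPI.CHAR);               // number of chars in that message
        CharBuffer lsubText = MPI.newCharBuffer(count);      // allocate exactly the required space
        MPI.COMM_WORLD.recv(lsubText, count, MPI.CHAR, ROOT, TAG);
        return lsubText;
    }
}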

4.4.3.2 Collective Communication (Approach 2)

Instead of using send and receive messages, a scatter operation can be performed. Scatter is a collective operation that involves a root process sending chunks of data to all processes in a communicator, including the root itself. Figure 4-4 shows an illustrative example. The communicator here has four processes. Process 0 is the root process. Scatter takes an array of elements and distributes the elements to all the processes in order of rank. The first element goes to process 0, the second to process 1, and so on and so forth.

Figure 4-4: Scatter illustrative example


A scatter function requires the following parameters:

- send_data: array that resides on root

- send_count: how many elements will be sent

- send_datatype: type of data elements

- recv_data: receive buffer

- recv_count: how many elements will be received

- recv_datatype: type of data elements

- root process

If for example, send_count is 5 and send_datatype is MPI_INT, then process 0 will get the first 5 elements of the array, process one will get the next five, and so on. Therefore, send_count is usually equal to the number of elements in the array divided by the number of processes.

There are three things to note for the scatter operation. First, and similarly to all collective communication routines, it does not support derived datatypes. Messages should be of a primitive datatype. Hence, we cannot send an array of Java Strings, for example. This was not a problem for us since we divide the string into characters and send those. Another issue is that scatter (and also gather) does not allow for boundary overlap. This is a restriction in the MPI standard, and failing to abide by it will result in unpredictable behavior. In pattern matching, though, we need to send (m - 1) overlap characters. The third issue arises when the number of elements in an array is not divisible by the number of processes. In that case, a scatterv function can be used that, in addition to the previously discussed parameters, holds a reference to an array that contains the number of elements to send to each process. However, we did not use this functionality. The solution we propose solves both the second and the third issue. It performs a pre-processing step on the root process: the root reads the text, appends the (m-1) overlapped characters into their corresponding places, then pads the last partition to fit the maximum partition length. What we have now is a new string that, when divided equally among the number of processes, will have the same partition length and the correct characters to search.
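The following is a minimal sketch of this scatter step using the Open MPI Java bindings; it assumes the root has already built the prepared string (overlaps appended, last partition padded) and that its total length has been communicated to all ranks.

import java.nio.CharBuffer;
import mpi.MPI;
import mpi.MPIException;

public class ScatterText {
    // 'prepared' already contains the (m-1) overlap characters and the padded last partition,
    // so totalLength divides evenly by the number of processes. 'prepared' may be null on
    // non-root ranks, since the send buffer is significant only on the root.
    static CharBuffer scatterText(String prepared, int totalLength, int root) throws MPIException {
        int p = MPI.COMM_WORLD.getSize();
        int rank = MPI.COMM_WORLD.getRank();
        int chunk = totalLength / p;                               // equal partition length
        CharBuffer sendBuf = MPI.newCharBuffer(rank == root ? totalLength : 0);
        if (rank == root) {
            sendBuf.put(prepared).rewind();                        // only the root holds the full text
        }
        CharBuffer recvBuf = MPI.newCharBuffer(chunk);
        MPI.COMM_WORLD.scatter(sendBuf, chunk, MPI.CHAR, recvBuf, chunk, MPI.CHAR, root);
        return recvBuf;                                            // local subtext incl. the (m-1) overlap
    }
}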

4.4.3.3 Local Communication (Approach 3)

In this scenario, the text in which to search for patterns resides in the local storage of all the nodes. Hence, there is no need for a root process to partition the text. Instead, each process, based on its rank in the communicator, is able to calculate which part of the text pertains to it and performs the matching.
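As a sketch, under the block-partitioning scheme used throughout this chapter, each process could derive its local range as follows; n and m denote the text and pattern lengths, and the helper class is an illustrative assumption.

import mpi.MPI;
import mpi.MPIException;

public class LocalRange {
    // Compute the [start, end) range of the locally stored text that this process searches.
    static int[] myRange(int n, int m) throws MPIException {
        int p = MPI.COMM_WORLD.getSize();
        int rank = MPI.COMM_WORLD.getRank();
        int chunk = (n + p - 1) / p;                         // base partition size (ceiling division)
        int start = rank * chunk;
        int end = Math.min(n, start + chunk + (m - 1));      // extend by the (m-1) boundary overlap
        return new int[] { start, end };
    }
}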

4.4.3.4 Gathering Results

The three approaches mentioned earlier discuss how to partition and send the data. Once that is accomplished, each process has its part of the data upon which it will search for the pattern.

The results can then be sent to the root process where they are aggregated using a reduce operation that performs similarly to the addition reduction shown in Figure 4-5. One thing to note here is that since the root will have to wait until all other processes are done with their results, it will be dependent on the slowest process.


Figure 4-5: Reduce illustrative example
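A minimal sketch of such a reduction with the Open MPI Java bindings is shown below; summing only the number of matches found by each process is an assumption made for illustration.

import mpi.MPI;
import mpi.MPIException;

public class GatherCounts {
    static final int ROOT = 0;

    // Sum the per-process match counts on the root, mirroring the addition reduction of Figure 4-5.
    static int totalMatches(int localMatches) throws MPIException {
        int[] send = { localMatches };
        int[] recv = new int[1];
        MPI.COMM_WORLD.reduce(send, recv, 1, MPI.INT, MPI.SUM, ROOT);
        return recv[0];    // only meaningful on the root process
    }
}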

4.5 Experimental Results and Interpretations

4.5.1 Working Environment and Benchmark

Implementation has been done using Java (JDK 8) on the Eclipse IDE (Luna, release 4.4.0), with ASM v5 and the Java bindings for Open MPI v2.0.1, which supports MPI-3.

Testing was performed on two connected virtual machines with the specifications shown in

Table 4-1.

Table 4-1: VM specifications

Model: Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz

Architecture: x86_64

Number of CPU cores: 8

Number of threads: 16

Cache size: 20480 KB

As a benchmark, a text file of 142 MB was used. The file consists of the concatenated text of the 11th edition of the Encyclopedia Britannica downloaded from the Project Gutenberg website [20].


4.5.2 Results and Interpretations

In this section, we will present the results from our three approaches and will perform an analysis of our findings. We have recorded four cases for each of our approaches, as shown in

Table 4-2.

Table 4-2: Test cases process distribution

                       Number of processes on VM1   Number of processes on VM2
Case 1: local                      4                            0
Case 2: local                      8                            0
Case 3: distributed                4                            4
Case 4: distributed                8                            8

VM1 is the master node, and VM2 is the slave node. In the first two cases, we run the program on the master node only in a local manner, over four processes and over eight processes.

In the last two cases, we run the program over two distributed nodes, once on four processes on each node, for a total of eight processes, and once on eight processes on each node, for a total of sixteen processes.

We have performed a few warm-up runs and then averaged the execution times shown over several runs. The percent improvement is calculated as follows:

% improvement = ((sequential time – parallel time)/sequential time) x 100

Table 4-3 shows the execution times (in seconds) of running several patterns with all three approaches (send/receive, collective, and local) for the setup of case 1. It also shows the percent improvement over the sequential program. Table 4-4, Table 4-5, and Table 4-6 show similar data for cases 2, 3, and 4, respectively.

Table 4-3: Case 1 runtimes and percent improvement: VM1: 4 processes - VM2: 0 processes

            Sequential   Approach 1   Approach 2   Approach 3   % impr. 1   % impr. 2   % impr. 3
Pattern 1       41.615       14.458       11.635       11.8          65.25       72.04       71.64
Pattern 2       33.007       11.296       10.1         10.052        65.77       69.4        69.54
Pattern 3      118.468       41.903       31.34        30.413        64.62       73.54       74.32
Average                                                              65.21 %     71.66 %     71.83 %

Table 4-4: Case 2 runtimes and percent improvement: VM1: 8 processes - VM2: 0 processes

            Sequential   Approach 1   Approach 2   Approach 3   % impr. 1   % impr. 2   % impr. 3
Pattern 1       41.615        7.297        7.203        7.385        82.46       82.69       82.25
Pattern 2       33.007        6.474        6.811        6.696        80.38       79.36       79.71
Pattern 3      118.468       18.393       17.214       17.365        84.47       85.46       85.34
Average                                                              82.43 %     82.5 %      82.43 %


Table 4-5: Case 3 runtimes and percent improvement: VM1: 4 processes - VM2: 4 processes

            Sequential   Approach 1   Approach 2   Approach 3   % impr. 1   % impr. 2   % impr. 3
Pattern 1       41.615       11.639       11.335       11.795        72.02       72.76       71.64
Pattern 2       33.007       11.076       11.553       11.086        66.44       64.99       66.41
Pattern 3      118.468       30.922       30.052       29.87         73.89       74.63       74.78
Average                                                              70.78 %     70.79 %     70.94 %

Table 4-6: Case 4 runtimes and percent improvement: VM1: 8 processes - VM2: 8 processes

            Sequential   Approach 1   Approach 2   Approach 3   % impr. 1   % impr. 2   % impr. 3
Pattern 1       41.615        7.295        7.191        9.35          82.46       82.72       77.53
Pattern 2       33.007        6.708        7.014        8.872         79.67       78.74       73.11
Pattern 3      118.468       16.117       15.77        18.964         86.39       86.68       83.99
Average                                                               82.84 %     82.71 %     78.2 %

As we can see from the previous tables, execution time decreases dramatically using MPI. It is worth noting that the three approaches exhibit similar behavior for the same pattern and the same case within a maximum of 5% difference. This is because for Approach 3 (where the file is stored locally on each node) there is no communication overhead between the nodes. It has been shown [64] that using some collective routines, such as scatter, does not necessarily improve performance. It can in fact incur an extra overhead compared to multiple send/receive operations


implemented with the same number of processes. However, the reason why our Approach 2

(using scatter) is similar to Approach 1 (using send/receive) is that in Approach 1, there is one

less process that performs the matching, since the root process is not part of computation.

We can also observe that results for cases 1 and 3, and results for cases 2 and 4 follow the

same trend. Consider cases 1 and 3. Case 1 runs on one virtual machine with four processes.

Case 3 runs on two virtual machines with four processes each, for a total of eight. However, the

execution times are similar. The reason for this is that execution time is dependent on the slowest

process. Since case 3 is a distributed environment, the communication overhead incurs extra

runtime. We have recorded the execution times for case 3 for each process in Table 4-7. As can

be seen, processes running on the local machine (VM1) are faster than those on the remote node

(VM2). This excludes the root process (process 0) which has to wait for the slowest process to

terminate before it can aggregate the results. Similar analysis applies to cases 2 and 4.

Table 4-7: Execution times for Case 3 for every process

                          VM1                                     VM2
              Proc 0    Proc 1    Proc 2    Proc 3    Proc 4    Proc 5    Proc 6    Proc 7
Approach 1
  Pattern 1   10.963    6.281     6.460     6.459     10.963    10.927    10.909    10.909
  Pattern 2   10.396    5.324     5.451     5.322     10.396    10.169    10.392    10.264
  Pattern 3   30.242    16.439    17.321    17.258    30.242    29.824    29.563    28.563
Approach 2
  Pattern 1   10.634    6.063     6.382     6.316     10.635    10.594    10.273    9.705
  Pattern 2   10.814    5.287     5.501     5.497     10.814    10.592    10.534    10.235
  Pattern 3   29.249    16.542    15.816    15.742    29.248    28.784    28.564    28.365
Approach 3
  Pattern 1   10.971    5.517     6.211     6.126     10.971    10.940    10.771    10.771
  Pattern 2   10.273    4.708     4.925     4.749     10.273    10.179    10.114    9.888
  Pattern 3   29.015    16.088    15.622    15.572    29.015    28.991    28.542    28.542


We make another observation for Approach 1. As mentioned, in this approach process 0 does not take part in the computation. Its sole responsibility is to partition the text, distribute it, and collect the results once all the other processes have terminated. Contrary to the other approaches, the root does not perform pattern matching. Hence, partitioning the text is done over one less process compared to the other approaches. Table 4-8 shows the execution times with four, five, eight, and nine processes on one node. We observe that for five and nine processes, execution times get closer to those recorded for Approach 2. In Table 4-9 we compare the percent improvement after adding the extra process. Running over five processes (only four of which take part in computation) performs 20.47% better than running over four processes. And running over nine processes (only eight of which take part in computation) performs 7.29% better than running over eight processes.

Table 4-8: Execution times (seconds) for Approach 1

            Sequential   np = 4   np = 5   np = 8   np = 9
Pattern 1       41.615   14.458   11.362    7.297    6.748
Pattern 2       33.007   11.296    9.715    6.474    6.193
Pattern 3      118.468   41.903   31.001   18.393   16.553


Table 4-9: Percent improvement comparison for Approach 1

            % improvement     % improvement     % improvement       % improvement     % improvement     % improvement
            np = 4 over seq.  np = 5 over seq.  np = 4 vs np = 5    np = 8 over seq.  np = 9 over seq.  np = 8 vs np = 9
Pattern 1        65.25             72.69              21.41              82.46             83.78               7.52
Pattern 2        65.77             70.56              13.99              80.38             81.23               4.34
Pattern 3        64.62             73.83              26.01              84.47             86.02              10.00
Average                                               20.47                                                    7.29

We can conclude that all three approaches exhibit similar results. Therefore, selecting which approach to use depends on factors other than performance, as shown in Table 4-10. If, for example, storage is not ample, then using Approach 3 is not feasible for large files.

Table 4-10: Advantages and disadvantages of the three approaches

             Advantages                                    Disadvantages
Approach 1   Easy to program                               Root process does not take part in computation
Approach 2   Simplicity, programmability, expressiveness   File needs to be preprocessed
Approach 3   No communication overhead                     File needs to reside on all nodes


4.6 Introduction to Hadoop

Hadoop is an Apache open source framework for reliable, scalable and distributed computing of large data sets [70]. It was created by Doug Cutting and Mike Cafarella in 2005. Cutting named it Hadoop after his son’s toy elephant. Written in Java, Hadoop has three main objectives:

- Distributed computing: Hadoop is designed to work in an environment of distributed

storage and computation of vast amounts of data (multi-terabyte data sets) across large

clusters of computers.

- Reliability: Hadoop is designed to handle hardware failure and data congestion in a

reliable, fault tolerant way.

- Scalability: Hadoop is designed to scale up from a single server to several thousands of servers.

The Hadoop framework is made up of the following four modules [71]:

- Hadoop Common: Java libraries and utilities required by the other modules to start

Hadoop

- Hadoop Yarn: framework for job scheduling and cluster resource management

- Hadoop Distributed File System (HDFS)

- Hadoop MapReduce: parallel processing of large data sets

Since 2012, Hadoop also refers to the set of software frameworks that can be installed along with Hadoop. These include Apache Pig, Apache Hive, Apache HBase and Apache Spark.


Hadoop’s main modules are HDFS and MapReduce. Both are similar to Google’s File

System (GFS) and MapReduce, respectively.

4.6.1 HDFS

The Hadoop Distributed File System, as the name suggests, is a distributed file system that is highly fault tolerant and is designed to be installed on low cost hardware and to run on large clusters. HDFS uses a master/slave architecture. The master consists of a single node, the

NameNode. The slave(s) consist of DataNode(s). The NameNode manages the file system metadata, and the DataNodes store the data. A file in HDFS is split into several blocks which are stored in the DataNodes, based on a mapping provided by the NameNode. Like any other file system, HDFS has a shell and a list of commands to access and manipulate the file system.

4.6.2 MapReduce

MapReduce is the programming module used for processing data in parallel. A MapReduce job is divided into two phases. The map phase splits the input data into chunks (depending on the input file format specified) and processes these chunks as parallel independent tasks. The input is converted into a set of output (key, value) pairs. The framework then sorts these pairs based on their key values and sends them as input to the reduce task. The reduce phase aggregates the pairs based on the reduction function, and produces an output consisting of a smaller set of (key, value) pairs. Both the primary input and the resulting output are stored in HDFS.

4.6.3 Procedure

The MapReduce framework consists of a single master ResourceManager, and several slave

NodeManagers (per node). An application submits a job to Hadoop, specifying the input/output

locations, the map and reduce Java classes in the form of a jar file, and other parameters and configurations that might be needed. The Hadoop job client then sends the jar and configurations to the ResourceManager. The ResourceManager performs resource management, tracks resource consumption and availability, and schedules tasks to the slaves. The slaves execute the tasks and provide task-status information periodically to the master.

4.6.4 Usage and Advantages

Hadoop is a large scale distributed processing framework that can be used on a single machine. However, to obtain the full potential power of Hadoop, it must be scaled to hundreds or even thousands of nodes, each containing multiple cores. Hadoop has been demonstrated to work on clusters of up to 4000 nodes. The data that Hadoop deals with should also be huge: Hadoop was built to process web-scale data ranging from hundreds of gigabytes to terabytes or even petabytes [72]. Hadoop is not particularly known for being runtime efficient, yet it is widely used by industries and developers across the world. The reason for that is scale. The input data set that HDFS typically holds will in no way fit in the hard drive of one computer, or in the memory of a single machine. In most businesses, there is no choice but to use Hadoop's distributed file system to store and process their data. Another reason for Hadoop's popularity is its efficiency relative to the cost it incurs. For example, suppose one owns a computer with one thousand CPUs. That would be very expensive to obtain (assuming it exists, of course), even though it would be fast and very efficient. As opposed to that, consider having one thousand single-CPU computers. Hadoop will connect these cheaper machines together to form a more cost effective, efficient, and reliable cluster. Other advantages of Hadoop include:


- It is open source and compatible with all platforms since it is Java based. And although it is implemented in Java, applications can be written in other languages; Hadoop Streaming allows developers to create and run jobs with arbitrary executables.

- It has a simplified programming model. Developers can quickly write and test distributed

systems, since Hadoop automatically distributes the data and work across the nodes.

- It detects and handles failures without relying on hardware to provide fault tolerance.

- Nodes can be added dynamically to the cluster without interrupting operation.

- The compute nodes and the storage nodes are the same: MapReduce and HDFS run on the same set of nodes, resulting in high data locality that leads to better performance. Data is mostly read from the local disk into the CPU. This reduces network bandwidth usage and prevents unnecessary transfers. Hence, computation is moved to the data, instead of moving the data to the computation.

4.7 Hadoop Comparison

We decided to investigate how our multithreading approach could compare to a distributed approach using Hadoop. Therefore, we implemented a pattern matching application in

MapReduce. We also used the standard library's regex package. The mappers process the input and search for the pattern, producing intermediate output (key, value) pairs. The key is a text object consisting of the pattern group and the value is an integer indicating the location of that group. These pairs are then passed to the reducer. The reducer aggregates the results based on the keys, and outputs another (key, value) pair, where the key is the pattern group and the value is a list of locations for that pattern.


Splits in Hadoop are of two types: physical and logical. Physical splits are how the HDFS chunks the files into blocks and distributes them among the nodes. This is based on a block size parameter that can be specified either in the conf files or when the input files are ported to

HDFS. The logical splits are how the MapReduce splits these chunks to be processed by the mappers. This is defined by the specification of the input file format. The mapper in Hadoop processes the input file one line at a time, then performs the mapping function, which in our case, searches for the pattern location. The reducer then groups the patterns found with their locations and writes the result to a file. We have kept the default installation configurations and have chosen to use one reducer by setting ‘setNumReduceTasks’ to 1 so that the aggregated result will be in one file. The output file thus consists of the different patterns along with their locations in the text file.

The Hadoop library contains a RegexMapper class. The implementation of the map function searches for the pattern. Hence, each line is processed separately, and the pattern is searched for within that line. We have based our implementation on the RegexMapper class but have tweaked it to fit our needs. We have also implemented a reducer class that aggregates the multiple patterns found and outputs the patterns with their locations to file.
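A simplified sketch of such a mapper and reducer is shown below; the class names, the hard-coded pattern, and the use of the match position within the line are illustrative assumptions rather than our exact implementation.

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (matched group, position within the current line) for every match found on that line.
class PatternMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Pattern pattern = Pattern.compile("ill.str.tive");  // illustrative pattern

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Matcher m = pattern.matcher(value.toString());
        while (m.find()) {
            context.write(new Text(m.group()), new IntWritable(m.start()));
        }
    }
}

// Aggregates all positions reported for the same pattern group into one output record.
class PatternReducer extends Reducer<Text, IntWritable, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder locations = new StringBuilder();
        for (IntWritable v : values) {
            locations.append(v.get()).append(' ');
        }
        context.write(key, new Text(locations.toString().trim()));
    }
}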

Testing on Hadoop was performed on a cluster of five virtual machines with the same specifications as those mentioned in Table 4-1. One machine served as the master and slave node concurrently, and the other four machines served as slave nodes.

Execution time using Hadoop is measured from the time the input is read up until the result is written to file. The results are recorded in Table 4-11. The first row indicates the average runtimes in milliseconds and the second row is the percent improvement over the single thread

implementation. The first column corresponds to the single thread implementation. The column in the table labeled Hadoop gives the Hadoop runtime and improvement. We obtained an 89% reduction in runtime. This is due to the fact that Hadoop reads the input one line at a time and performs its map function on each line, which might be shorter than even the substrings in the multithreaded option.

Table 4-11: Average runtimes and percent improvement for Hadoop implementation

                       t = 1 (single thread)   Hadoop   Single lines
Average runtime (ms)          73,199            7,670      20,880
Percent improvement              -               89.52       71.47

In order to better compare, we have devised another approach in which the single threaded option reads the input from file and searches for the pattern one line at a time. This is comparable to the way Hadoop reads input files. The results are in the last column of Table 4-11. This method performed worse than Hadoop because Hadoop has a more efficient file system and handles reading from and writing to files more efficiently. However, this method performed better than our single-threaded method of reading the whole input into a string.
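One way such a line-by-line variant could look is sketched below; the command line arguments (file name and pattern) are assumptions made for illustration.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleLineMatcher {
    // Read the text one line at a time and run the regex on each line, similar to how the
    // Hadoop mapper consumes its input.
    public static void main(String[] args) throws IOException {
        Pattern pattern = Pattern.compile(args[1]);
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            String line;
            long lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                Matcher m = pattern.matcher(line);
                while (m.find()) {
                    System.out.println(m.group() + " found at line " + lineNumber
                            + ", column " + m.start());
                }
                lineNumber++;
            }
        }
    }
}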

Comparing these two methods in Table 4-12, we see that Hadoop outperforms the single line implementation by 63%.

Table 4-12: Runtime improvements

                       Single lines   Hadoop
Average runtime (ms)       20,880       7,670
Percent improvement           -         63.26


However, our multithreaded approach still gave better results. We believe this is because the data that we tested is not big enough to compensate for the expensive writes to disk. This led us to explore another alternative to MapReduce. Spark is a fast general compute engine that can run on top of Yarn and HDFS. In addition to achieving Hadoop’s objectives, Spark is able to evolve past these main attributes and solve several of Hadoop’s main issues. The discussion on Spark is beyond the scope of this chapter, but will be further discussed in the next one.

4.8 Conclusion

We have presented a tool that detects a user’s function call to search for a pattern and seamlessly replaces it with our own distributed MPI-based implementation. We have used bytecode injection and the Open MPI Java bindings to run our optimized code. We have used three different approaches for message passing: point-to-point using multiple sends and receives, collective using the MPI scatter routine, and local in which the data resides on all nodes. Our experiments show a reduction in total execution time for all three approaches, and we have discussed the advantages and disadvantages of selecting one over the other. In addition to that, we have explored another form of distributed communication represented by Hadoop. Because data exchange using Hadoop is based on the filesystem, we have chosen to compare its performance to our multithreaded approach. This allowed us to learn how Hadoop works and what its limitations are.


Chapter 5: Generating Spark Java Code

With the growing shift towards massively parallel distributed systems on one hand, and the increasing importance of transforming data into knowledge in today’s data-driven world on the other, Spark has emerged as the tool of choice for efficient big data analysis. However, users still have to learn the complicated Spark API in order to write even a simple application. We present a tool that facilitates this job for the user by automatically generating Spark Java code from minimal user-supplied inputs. Our tool is easy to use, interactive and offers Spark’s native Java

API performance.

5.1 Introduction

Hadoop is an Apache open source framework for reliable, scalable and distributed computing of large data sets and has been around for over 10 years [73]. Written in Java, the

Hadoop framework is made up of the following four modules: Hadoop Common – the Java libraries and utilities required by the other modules to start Hadoop, Hadoop Yarn – the framework for job scheduling and cluster resource management, Hadoop Distributed File System

(HDFS), and Hadoop MapReduce – for parallel processing of large data sets.

Since 2012, Hadoop also refers to the set of software frameworks that can be installed along with Hadoop. These include Apache Spark. Spark is a fast general compute engine that can run on top of Yarn and HDFS. It consists of an expressive programming model that is able to support a wide range of applications, including among others, machine learning and graph computation.

In addition to achieving Hadoop’s objectives, consisting of distributed computing, reliability, and scalability, Spark is able to evolve past these main attributes and solve several of Hadoop’s main issues. The main problem with Hadoop’s MapReduce model is that it is unable to handle

iterative and multi-pass algorithms that need to reuse a working set of data across parallel operations in an efficient manner. This is because each step consists of one map phase and/or one reduce phase. First of all, this is restrictive in the sense that a program will need to be converted into such a pattern to be run. Second, if the user wanted to do something a bit complicated, he/she would have to run a series of map-reduce operations, each of which incurs high latency and cannot start until the previous job has finished execution. Third, and most importantly, the data between each step needs to be stored in HDFS before being processed by the next step. This is slow because of the replication factor in HDFS and because disk is being accessed. For iterative algorithms, such as machine learning algorithms, this incurs a high latency. In contrast to Hadoop's MapReduce model, Spark allows developers to write complex, multi-pass algorithms in an efficient manner since it supports in-memory data sharing, hence bypassing the expensive need to store to disk at every single map or reduce process. A dataset can be stored in memory, and operations run on that dataset do not need to fetch it from disk every time it is used. Spark has been shown to provide up to 10-fold faster performance than Hadoop when running iterative machine learning jobs [74].

Given the importance of Spark, we have implemented a tool that helps developers write

Spark Java code. We have not attempted to implement something similar for Hadoop’s

MapReduce for the reasons mentioned above. Spark has a Scala API, a Python API and a Java

API. The Java API is the most complex since first, Spark was written in Scala and to use the Java

API, some wrappers were implemented; and second, because Java is a strongly typed language and data types need to be explicitly stated. Our tool consists of a user-friendly, intuitive and interactive graphical user interface with error handling capabilities. It supports the creation of general purpose Spark code (using the core library) and machine learning Spark code (using the

MLlib library). Our tool requires minimal user input and will guide the user throughout the programming process. Nevertheless, some basic programming understanding is still required.

The intended users of our tool include first time Spark users who do not wish to learn the API, first time users who would like to learn the API in a simple instructional manner, one-time users who need Spark for a single specific application, non-technical users who are not interested in learning Spark but need to perform some analysis on big data, and of course professional developers who want to quickly prototype a proof of concept.

The rest of the chapter is divided into six sections. Section two consists of a literature review. Sections three and four give a brief description of Spark, of machine learning in general, and of Spark’s machine learning library. Section five discusses the work presented. Section six describes the graphical user interface proposed. Section seven concludes the chapter.

5.2 Related Work

According to [75], automatic programming consists of programming at a higher level of abstraction than the one available to the programmer. More specifically, software code generation is the process of generating source code based on a predefined model using a tool such as an IDE which could use templates or wizards and can accept several different forms of user inputs. Several examples are listed, most of which are commercial tools.

The automatic generation of programs is also discussed in [76]. The user tells the computer what to do, and the computer builds a program that does that. The author states that in some situations this is easier to do than write the program by hand, and is specifically useful in complex domains where human beings find it difficult to write programs.


Several tools have been developed that perform software code generation. These tools span various programming languages and various domains. We mention a few that deal with parallel systems. In [62], the authors introduce TypemapGenerator, a tool that automatically generates

MPI typemaps from user source code. In [77], the authors have developed STARGATES, a tool that automatically generates source code for finite-differencing simulations for the following parallel platforms: Message Passing Interface (MPI), Threading Building Blocks (TBB) and

Compute Unified Device Architecture (CUDA).

Perhaps the most related tools to our project are Weka and MemSQL. Weka [78] is an open source software that contains several machine learning algorithms for data mining tasks. Users can either incorporate Weka API into their Java code or can use the graphical user interface tool.

The tool can perform data pre-processing, classification, regression, clustering, association rules, and visualization. Recently, Weka has added support for big data by developing basic distributed wrappers and has integrated them with both Hadoop and Spark. The major disadvantage that developers complain about with Weka is that it primarily uses a specific input file format called the Attribute Relation File Format (ARFF). ARFF files contain a variety of information about a machine learning problem. Weka uses this format to create its header data. Users have to rewrite their input files to fit this format. The objective of Weka is to offer a solution for writing and designing machine learning algorithms; therefore, its applications are restricted to that domain.

MemSQL [79] is a database platform for real-time analysis. It is available in a free community version. MemSQL Streamliner uses Spark to manage real-time data pipelines. It provides a rich graphical user interface for the ETL (Extract, Transform, Load) process.

However, it is strictly database-oriented.


5.3 Spark

Spark is a parallel open source big data framework. It was originally developed at UC

Berkeley’s AMPLab in 2009, was open sourced in 2010, and was later licensed as an Apache project. Since then, Spark has gained popularity and it has been reported that as of early 2015, more than 500 companies are using Spark in production [80].

The designers of Spark realized the importance of existing applications (such as

MapReduce) in implementing large scale data intensive applications on commodity clusters.

However, the prevailing systems were unable to handle the reuse of a working set of data across parallel operations. Hence, Spark was developed. Spark can support iterative algorithms (such as machine learning algorithms) and still retain the scalability and fault tolerance of MapReduce. In addition to that, Spark has been shown to provide up to 10-fold faster performance than Hadoop (the prevailing tool at that time) when running iterative machine learning jobs [74]. What follows is a brief description of Spark.

5.3.2 Spark Ecosystem

Spark is made up of the following components/libraries:

- Spark Core API: This is the underlying execution engine upon which all other

functionalities are built. It provides distributed task dispatching, scheduling, in-memory

computing, and basic IO functionalities.

- Spark MLlib: This library is used to implement machine learning algorithms. It will be

further discussed in a following section.


- Spark Streaming: This library can be used to perform analytics of real time streaming

data by processing mini-batches of data.

- Spark SQL: This library provides support for structured and semi-structured data. It

allows running SQL-like queries on Spark data.

- Spark GraphX: GraphX is a distributed graph-processing framework.

5.3.3 Underlying Framework

Spark requires two main components to run. The first is a cluster manager. Spark can support a standalone native Spark cluster, a Hadoop Yarn management system, SIMR (Spark

Inside MapReduce), or Apache Mesos. The second component is a distributed storage system.

Spark can run with Hadoop Distributed File System (HDFS), HBase, Hive, MapR File System

(MapR-FS), Cassandra, OpenStack Swift, Amazon S3, Kudu, or even with a customized system.

In addition to a cluster, Spark can run on the local filesystem (a single machine) in a pseudo-distributed local mode. This mode is intended for development and testing.

5.3.4 Spark Building Blocks

A Spark application consists of a 'driver program' that runs a main function and executes the parallel code on a cluster. This is achievable due to an abstraction that the Spark developers introduced called a resilient distributed dataset (RDD) [81].

An RDD is a read-only parallel collection of objects that can be partitioned across a cluster, and that can be rebuilt and recovered if one of the partitions is lost. There are basically two ways of transforming data into an RDD. It can be created either by reading an input file (based on the filesystems mentioned in the previous section) and transforming its contents, or by calling a

97 parallelize function on an already existing dataset. RDDs that are to be reused (such as datasets used in iterative programming) should be persisted. This means that the RDD will be cached in memory. This allows operations working on that dataset to be faster. In addition to that, persisted

RDDs can be stored in memory only, in memory and on disk (if dataset does not fit in memory, store the remaining partition on disk), on disk only, and several other options consisting of serialization or of data replication [80].

Once an RDD is created, there are two kinds of operations that can be performed on it: transformations and actions. A transformation creates a new RDD by applying a certain operation to the source RDD. For example, a mapToPair transformation creates a new (key, value) paired dataset by passing each element of the source dataset through a function that generates a tuple (a Scala type that is also used in the Java implementation) representing the result of the operation. Transformations are 'lazy': the results are not computed when a transformation is defined. They are instead 'remembered' and are applied when they are actually needed. Unless an action is called upon a dataset, the transformation will not be executed.

An action is a function that performs a computation on an RDD and returns a value to the driver program. Actions are necessary in that they allow the transformations to actually take place. If the application does not contain an action, even a dummy one, the transformations will not execute. For example, reduce is an action that aggregates the RDD elements into a single variable based on an associative reduction function.
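The following short sketch illustrates this laziness with the Java API; the file name and the line-length computation are illustrative assumptions, not part of our tool.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LazinessExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("laziness-example").setMaster("local");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        JavaRDD<String> lines = jsc.textFile("text.txt");
        // Transformation: only recorded at this point, nothing is read or computed yet.
        JavaRDD<Integer> lengths = lines.map(
            new Function<String, Integer>() {
                @Override
                public Integer call(String s) {
                    return s.length();
                }
            });
        // Action: triggers reading the file and applying the map transformation.
        long numberOfLines = lengths.count();
        System.out.println("Lines processed: " + numberOfLines);
        jsc.stop();
    }
}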


5.3.5 Usage and Advantages

Spark is written in Scala and runs on a Java Virtual Machine (JVM). It provides built-in

APIs in Java, Scala and Python.

It encompasses the functionalities of several separate tools in one workload system, sparing the user from having to manage different systems and maintain separate tools. It does so by extending the MapReduce model to include broader types of computations, such as batch applications, iterative algorithms, interactive queries and streaming.

The main feature of Spark is fast processing. This is achievable due to the fact that storage and processing can be done in-memory. Therefore, performance can be orders of magnitude faster than other big data technologies, such as Hadoop, which opts to store all intermediate data on disk instead. Spark holds intermediate results in memory rather than writing to disk. This reduces the number of expensive read/write operations. This is very useful with iterative algorithms, where the same dataset is used multiple times. Spark will attempt to store as much data in memory as possible. When data is too large to fit in memory, it will be spilled to disk.

When this happens, performance can be degraded.

Other advantages of Spark include:

- Supporting functions other than simple Map and Reduce

- Lazy evaluation of big data. This helps with the optimization of the overall data

processing.

- Library support for SQL queries, machine learning, streaming, and graph algorithms.

- Interactive shell for Scala and Python.


5.4 Machine Learning

Recently, and mostly due to the availability of big data resulting from the increase in social media usage and the heavy reliance on web-related applications, machine learning has returned to the forefront as a means of understanding all this data and transforming it into knowledge. Other contributors to the resurgence of machine learning are the available processing power, the speed of processing, and the low cost of processing units.

Machine learning is generally divided into two categories: supervised learning and unsupervised learning. Supervised learning consists of training the program on a pre-defined set of training examples, which then enables it to predict the outcome when given new data. Some example problems are classification and regression. Unsupervised learning consists of giving the program a collection of data in which it must find patterns and relationships, and group the data accordingly. Some example problems are clustering, dimensionality reduction and association rule learning.

Machine learning algorithms are of an iterative nature and have not been widely implemented on big data frameworks (such as Hadoop) previously, since such frameworks were unable to efficiently support iterations. However, this has changed with Spark.

5.4.1 Machine Learning with Spark

In 2007, [82] published a paper on which machine learning algorithms could be applied to a MapReduce framework. Since then, Hadoop developers set forth to implement these algorithms as a Hadoop based library called Mahout. They were successful in building the library; however, they ran into performance issues.


According to Spark developer Zaharia, Spark emerged from observing the problems that users had with the MapReduce model and trying to improve it [83]. More specifically, Zaharia stated that machine learning algorithms were the main bottleneck with Hadoop. Hadoop users at

UC Berkeley were running machine learning algorithms on big data, and were unsatisfied with the slow runtimes, since their algorithms were performing several iterative scans over the data. Hence, Zaharia and his lab mates designed Spark, which started as a solution for machine learning algorithms and grew to cover other use cases beyond machine learning.

MLlib is Spark’s machine learning library that runs on top of Spark core [80]. MLlib is as much as nine times faster than the disk-based Apache Mahout (Hadoop’s machine learning library). Even Mahout now has a Spark interface [84]. MLlib consists of learning algorithms and utilities, such as classification, regression, clustering, collaborative filtering, dimensionality reduction, and optimization primitives.

5.5 Implementation

This section will describe our implementation, more specifically, our design, our workflow, some of the problems we encountered, how they were solved and our limitations.

The most widely used example for map-reduce applications, whether in Hadoop or Spark, is the word count example. The user provides a file and the program outputs the distinct words along with how many times each occurs in the file. We will use this example as a basis for our discussion.

In order to use our tool, we only require two things from the user. First, we require the user to provide us with an input file that contains the original dataset. As previously mentioned, there are two ways in Spark to provide a dataset: either loading it from a file, or applying a parallelization transformation on a previously defined dataset. We have chosen the option to load the dataset from file for the simple reason that we do not want the user to write code (by generating the dataset) and then try to incorporate that with the code that we generate. We believe that would be both cluttered and error prone. The second thing we require from the user is to provide us with the tokenization granularity for parsing. Parsing is usually considered a tedious task; therefore we have chosen to abstract it from the user. Instead, we handle it internally ourselves, and the user does not need to know which Spark-specific operations to use.

These are our only two main requirements. Other than that, the user can use our tool freely.

To demonstrate our claim, we go back to the word count example. The code in Figure 5-1 shows the loading and parsing code that we generate based on the user providing us with the name of the text file and specifying that the parsing granularity is strings separated by spaces. Therefore, the parsing (which in this case uses a flatMap transformation, but could use another transformation based on the granularity) produces a new dataset, labeled words, that contains the individual words.

JavaRDD<String> data = jsc.textFile("text.txt");

JavaRDD<String> words = data.flatMap(
    new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(String input) {
            return Arrays.asList(input.split(" "));
        }
    }).cache();

Figure 5-1: Loading and parsing code example for word count


The next part is the code that performs the word count. One way to do that, based on the Spark documentation, is shown in Figure 5-2. The first transformation, a mapToPair, creates a new dataset that contains a pair of each word and assigns a count of one to it. The second transformation is a reduceByKey. It looks for distinct keys and adds their values, hence obtaining the word count.

JavaPairRDD<String, Integer> ones = words.mapToPair(
    new PairFunction<String, String, Integer>() {
        @Override
        public Tuple2<String, Integer> call(String s) {
            return new Tuple2<String, Integer>(s, 1);
        }
    });

JavaPairRDD<String, Integer> count = ones.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer i1, Integer i2) {
            return i1 + i2;
        }
    });

Figure 5-2: Word count code based on the Spark implementation from the documentation

However, this is not the only way to implement word count. Using our tool, we have implemented it using two other methods: the first performs a countByValue action on the 'words' dataset, and the other performs a countByKey action on the 'ones' dataset, as shown in Figure 5-3. This requires minimal user input. It also serves to show that our tool has limited restrictions, and can support different implementations depending on the individual users' lines of thought.

//Alternative Method 1
Map<String, Long> count = words.countByValue();

//Alternative Method 2
Map<String, Long> count = ones.countByKey();

Figure 5-3: Alternative implementations for word count

Table 5-1 represents our currently supported operations.

Table 5-1: Currently supported operations

Supported operations

Transformations: cogroup, distinct, filter, flatMap, groupBy, groupByKey, intersection, join, leftOuterJoin, rightOuterJoin, fullOuterJoin, keys, map, mapToPair, reduceByKey, sortByKey, subtractByKey, union, values

Actions: collect, collectAsMap, count, countByKey, countByValue, first, foreach, id, isEmpty, name, reduce, take, top, saveAsTextFile, saveAsSequenceFile, saveAsObjectFile

Basically, our tool works similarly to the Weka interface: the user interface gathers information about the operations the user intends to perform, and a code generator creates the necessary wrapper representations, which make up the constructs needed to produce the Spark code. Our approach can be divided into two main phases. Phase one consists of our internal representation, which translates into phase two, the code generation, as shown in Figure 5-4.


Figure 5-4: Program phases

5.5.1 Internal Representation

There were several issues we needed to deal with in order to generate correct code that adhered to the types and dependencies. More specifically, we mention how we handled dimensions, types, constructs and parameters.

5.5.1.1 Dimensions

Most Spark transformations and actions run on datasets of any type. However, there is a set of operations that run on datasets of pairs, such as, for example, grouping by a key. These pairs are represented using the Scala Tuple2 class and the corresponding dataset is represented by the JavaPairRDD class. In our implementation, we had to categorize these operations and set the dimensions of their datasets accordingly. A distinction is made between transformations that create a pair dataset and those that operate on one. Some transformations can operate on either a pair dataset or a single-type dataset. For those operations that also happen to require two datasets, such as a union, we have to make sure that both datasets have the same dimension.


5.5.1.2 Types

We had initially used the Spark Java API documentation [85] to obtain the various modifiers and return types for the operations. However, we have added on to that, based on dataset dependencies, to obtain the correct specific types. In order to do that, we generate a list of RDDs that is populated based on the dependencies created by the transformations. For example, the documentation states that the return type of a filter transformation is an RDD of any type T. In our implementation, we take this further by declaring what T actually represents (Integer, String, Tuple, ...) based on the source dataset. So for example, if a filter transformation is called on a dataset of type Integer, it will create a new filtered dataset of type

Integer. For the same filter transformation, if the source dataset is a pair, then the new dataset will also be a pair. This relationship between the input types and the return types is different for each transformation and depends on the transformation purpose and description. A list of some transformations and their return types based on their source datasets can be found in Table 5-2. We also deduce types for actions similarly. The user does not need to know what the type of the variable is.


Table 5-2: Transformation input and output types

Transformation    Input type                          Return type
cogroup           dataset1 (K,V) & dataset2 (K,W)     (K, Tuple2<Iterable<V>, Iterable<W>>)
distinct          - input_type                        - input_type
                  - (K,V)                             - (K,V)
filter            - input_type                        - input_type
                  - (K,V)                             - (K,V)
flatMap           any input_type                      any input_type
groupBy           (K,V)                               (any input_type, Iterable)
groupByKey        (K,V)                               (K, Iterable<V>)
intersection      - input_type                        - input_type
                  - dataset1 & dataset2 (K,V)         - (K,V)
join              dataset1 (K,V) & dataset2 (K,W)     (K, Tuple2<V, W>)
leftOuterJoin     dataset1 (K,V) & dataset2 (K,W)     (K, Tuple2<V, Optional<W>>)
rightOuterJoin    dataset1 (K,V) & dataset2 (K,W)     (K, Tuple2<Optional<V>, W>)
fullOuterJoin     dataset1 (K,V) & dataset2 (K,W)     (K, Tuple2<Optional<V>, Optional<W>>)
keys              (K,V)                               K
map               any input_type                      any input_type
mapToPair         any input_type                      any (K,V) pair types
reduceByKey       (K,V)                               (K,V)
sortByKey         (K,V)                               (K,V)
subtractByKey     dataset1 & dataset2 (K,V)           (K,V)
union             - input_type                        - input_type
                  - dataset1 & dataset2 (K,V)         - (K,V)
values            (K,V)                               V

5.5.1.3 Constructs

Most transformations and a few actions require a function that defines what the operation should do. For example, to obtain a list of employees with salaries greater than $100,000, one might perform a filter transformation on an employee dataset to produce that new dataset. This description is specified in a function that is passed as a parameter to the transformation. In this case, the function is a Boolean function, and will return the entries where salary > 100,000. Several other types of functions are used in Spark, such as

Function (which takes one input and generates an output), Function2 (which takes two inputs instead of one and generates an output), PairFunction (which takes one input and generates an output with the types of the keys and values), VoidFunction (which takes one input but does not return an output)…
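As a sketch of the salary example above, the Boolean function would be wrapped as follows; the Employee class and its getSalary() accessor are hypothetical and used only for illustration.

import java.io.Serializable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

class Employee implements Serializable {          // hypothetical record used only for this example
    private final double salary;
    Employee(double salary) { this.salary = salary; }
    double getSalary() { return salary; }
}

class FilterExample {
    // Keep only the employees whose salary exceeds $100,000.
    static JavaRDD<Employee> highEarners(JavaRDD<Employee> employees) {
        return employees.filter(new Function<Employee, Boolean>() {
            @Override
            public Boolean call(Employee e) {
                return e.getSalary() > 100000;     // the Boolean function passed to filter
            }
        });
    }
}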

The creation of these functions is confusing to the user; therefore, we have tried to abstract that by inferring and detecting the types from code and variable dependencies whenever possible.

For example, in the code for the ‘ones’ dataset shown in Figure 5-2, the user does not need to know which function to use, how its parameters are described, the types of its input dataset, or the call method it overrides. All the user needs to do is supply us with the information in Table

5-3.


Table 5-3: User information needed

Out Type Operation

String s

Integer 1

Another issue we had to handle is that in some cases users might lose track of which types pertain to which datasets, and might specify wrong types for transformations or actions whose types we are able to know either beforehand or based on the source dataset. Therefore, we implemented an error handling mechanism to automatically infer these types when possible and present them to the user. For example, the filter transformation requires a Boolean function; therefore we preload that value in our code as a precautionary measure and will pass it to the function even if the user mistakenly specifies it to be, for example, an Integer. This is similarly done for all the transformations and actions where the inputs to the functions are determined by the source dataset or by the operation itself, such as, for instance, reduce and reduceByKey.

Another form of error handling that we deal with and is very important for correct program execution is making sure that an action is eventually called in the application. As previously stated, transformations are lazy and require actions for them to be executed. Thus, if a program does not at some point create an action that chains the transformations, then nothing will happen.

To catch this potential slip, we make sure to implement a ‘collect’ action (which simply fetches the RDD as an array to the driver program) in case none is specified by the user.
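For the word count example of Figure 5-2, the generated safety-net action would look like the following sketch, which assumes the 'count' pair dataset from that figure.

// Appended automatically when the user specifies no action, so the transformations execute.
List<Tuple2<String, Integer>> output = count.collect();
for (Tuple2<String, Integer> entry : output) {
    System.out.println(entry._1() + ": " + entry._2());
}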


5.5.1.4 Parameters

Transformations and some actions require input parameters for execution. We mention specifically those transformations that operate on two datasets, such as joins, unions, and intersections. These transformations differ from those that operate on only one dataset in that special care needs to be given to how their types are assigned. For example, to perform a join on two datasets, both first have to be pair datasets. In addition to that, both datasets should have the same type of key; otherwise a join cannot be performed.
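For instance, a join is only legal when both datasets are pair datasets sharing the same key type, as in the following minimal sketch (the two datasets and their employee-id keys are hypothetical; sc denotes the JavaSparkContext, and java.util.Arrays and scala.Tuple2 are assumed to be imported):

// Two pair datasets keyed by the same type (String employee ids)
JavaPairRDD<String, Double> salaries = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("e1", 120000.0), new Tuple2<>("e2", 95000.0)));
JavaPairRDD<String, String> departments = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("e1", "R&D"), new Tuple2<>("e2", "Sales")));

// Legal because the key types match; the result pairs each key with a
// Tuple2 holding the values from both datasets
JavaPairRDD<String, Tuple2<Double, String>> joined = salaries.join(departments);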

5.5.2 Code Generation

Our tool is written in Java and consists of four major classes: SparkRDD, SparkTransformation, SparkFunction, and SparkAction. The SparkRDD class creates the base RDD objects, which are the datasets. It sets their dimension and their input types. In what follows we will use the terms RDD and dataset interchangeably. A SparkTransformation creates a SparkRDD object based on a transformation and a source dataset. To achieve this, it implements a base class that extends two other template files: one that implements a parsing transformation, and one that implements a transformation that requires two input datasets. A SparkAction creates a variable that is the result of an operation on a dataset. Some transformations and actions require a function as a parameter, and that is represented by the SparkFunction class. We show a simplified class diagram in Figure 5-5.


[Figure 5-5 class diagram: the main classes are SparkRDD, SparkTransformation (with subclasses SparkParsingTransformation and SparkTwoPairTransformation), SparkFunction, SparkAction, and a SparkHandler class that keeps the lists of RDDs, transformations and actions and exposes generateLoad(), generateParsingTransformation(), generateTransformation(), generateFunctionlessTransformation(), generateTwoPairTransformation() and generateAction().]

Figure 5-5: Spark tool class diagram

Our workflow consists of a series of operations that are performed based on a user input file representing the initial data. The entry point of our tool is the user-supplied data file. A typical flow of execution starts by loading a file from storage (HDFS, for example) into the application. The data is now represented in a dataset wherein each entry is populated with a single line from the data file. The next step consists of parsing the data. Since Spark, similar to MapReduce, originally splits the data from a file based on a new line, in most cases we will need to parse the data before using it. Once the data is parsed, it can be operated on using either transformation calls or action calls. A transformation will require user input to detect dependencies, form a function or supply a parameter, and will generate a new dataset that will be added to the dataset repository. An action will require user input to detect a source dataset, form a function in some cases or supply a parameter, and will generate a variable (if the action does not have a void return type) that will be added to the variable repository. There is no limit or order (except dependency) on creating transformations and actions, though transformations are typically chained and actions are called at the end. Our workflow is summarized in Figure 5-6.

Figure 5-6: Spark tool workflow
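As an illustration of this flow, the following minimal sketch shows the shape of the code the tool produces for a hypothetical log-filtering application. The file path, the dataset names lines and errors, and the variable errorCount are placeholders of our own, and sc denotes the JavaSparkContext.

// Load: each entry of 'lines' holds one line of the input file
JavaRDD<String> lines = sc.textFile("hdfs://host/path/logs.txt");

// Transformation: keep only the lines containing the word "ERROR"
JavaRDD<String> errors = lines.filter(new Function<String, Boolean>() {
    @Override
    public Boolean call(String line) {
        return line.contains("ERROR");
    }
});

// Action: forces execution of the chained transformations and
// returns a value (here, a count) to the driver program
long errorCount = errors.count();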

5.5.3 Machine Learning Specific

Machine learning cannot be generalized like other map-reduce applications because machine learning in Spark uses a dedicated library (mLlib) with its own set of functions and constructs. In some cases it could be expressed with the general-purpose operations, but that would not be as efficient as using the library, and the documentation advises against it.

5.5.3.1 Classification

Typically, classification problems follow the procedure: train - validate - test. The training and validating phases are done on labeled data. Usually, the labeled data is split (70% training and 30% validating)5. The user thus trains a model on the training portion, then validates the model using the other part of the labeled data. This is all done before the actual testing (where labels are typically unknown). Validation also serves to justify whether the model is a good match for the given data set. After generating the model, the testing data is applied to obtain the prediction results.

The generated Spark code will produce the following (a minimal sketch of such generated code follows the list):

- Load the training set

- Parse the data into features and labels

- Calculate the number of features and labels

- Split the data into training and validation sets

- Train the model

- Predict and validate

- Calculate prediction accuracy

- Load the test data

- Parse the test data

- Predict the output

- Return result to user with a percentage accuracy
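To make the steps above concrete, the following is a minimal sketch of what such generated code could look like for a Naive Bayes classifier with the default lambda of 1.0. The file path and the assumption that each line holds a label followed by comma-separated numeric features are ours; sc denotes the JavaSparkContext, and the mLlib classes used are NaiveBayes, NaiveBayesModel, LabeledPoint and Vectors.

// Load and parse the training set into labeled points
JavaRDD<String> raw = sc.textFile("hdfs://host/path/train.txt");
JavaRDD<LabeledPoint> labeled = raw.map(new Function<String, LabeledPoint>() {
    @Override
    public LabeledPoint call(String line) {
        String[] parts = line.split(",");
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i]);
        }
        return new LabeledPoint(Double.parseDouble(parts[0]), Vectors.dense(features));
    }
});

// Split into training (70%) and validation (30%) sets
JavaRDD<LabeledPoint>[] splits = labeled.randomSplit(new double[]{0.7, 0.3});
JavaRDD<LabeledPoint> training = splits[0];
JavaRDD<LabeledPoint> validation = splits[1];

// Train the model (default lambda = 1.0) and measure validation accuracy
final NaiveBayesModel model = NaiveBayes.train(training.rdd(), 1.0);
double accuracy = validation.filter(new Function<LabeledPoint, Boolean>() {
    @Override
    public Boolean call(LabeledPoint p) {
        return model.predict(p.features()) == p.label();
    }
}).count() / (double) validation.count();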

Currently, the models in Table 5-4 along with their corresponding default values are supported6. We have provided most of the classification models that Spark implements and some of the regression models. We believe that for beginners to machine learning, this is quite a comprehensive list and should cover most of the cases.

5 We have not mentioned cross-validation since Spark’s mLlib does not support it.
6 The default values are taken from the examples in the Spark documentation.


Table 5-4: Supported classification models and their default parameters

Model | Default values
Naive Bayes | lambda = 1.0
LinearRegressionWithSGD | number of iterations = 100
SVMWithSGD | number of iterations = 100
LogisticRegressionWithLBFGS |
DecisionTree | impurity = "gini"; maximum depth = 5; maximum number of bins = 32
RandomForest | impurity = "gini"; maximum depth = 5; maximum number of bins = 32; number of trees = 3
GradientBoostedTrees | boosting strategy: number of iterations = 3; maximum tree depth = 5

5.5.3.2 Clustering

The clustering problem is slightly less complicated to define than the classification problem.

The generated Spark code will produce the following:

- Load the training set

- Parse the data

- Cluster the data into 'k' classes


- Evaluate clustering by computing within set sum of squared errors

- Return the cluster centers to the user

Currently, only K-Means is supported. It is, however, the most popular clustering algorithm supported by Spark.
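As a minimal sketch of what such generated code could look like with K-Means: the file path, the assumption of space-separated numeric features, and the values of k and the iteration count are ours; sc denotes the JavaSparkContext, and the mLlib classes used are KMeans, KMeansModel, Vector and Vectors.

// Load and parse the data into feature vectors
JavaRDD<String> raw = sc.textFile("hdfs://host/path/points.txt");
JavaRDD<Vector> points = raw.map(new Function<String, Vector>() {
    @Override
    public Vector call(String line) {
        String[] parts = line.split(" ");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return Vectors.dense(values);
    }
});

// Cluster the data into k classes
int k = 3;
int numIterations = 20;
KMeansModel clusters = KMeans.train(points.rdd(), k, numIterations);

// Evaluate the clustering by computing the Within Set Sum of Squared Errors,
// then return the cluster centers to the user
double wssse = clusters.computeCost(points.rdd());
Vector[] centers = clusters.clusterCenters();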

5.6 Graphical User Interface

In order to get the appropriate feedback from the user, and to be able to better display the steps needed to generate the Spark code, an interactive graphical interface was developed. The tool automatically generates Spark code based on minimal user-input, and can be divided into two main parts: a part that generates Spark code for machine learning algorithms, and another part that generates Spark code for general purpose applications. The two parts share a common functionality, the Spark-specific parameters. This section will first discuss the common constructs for both parts, and then delve into the specifics of generating machine learning code and generating general purpose code.

5.6.1 Spark Parameters

Some Spark parameters will need to be provided by the user. These values are needed in order to run the generated code in a Spark environment.

- Master: This is the master configuration running Spark. If run on a cluster, then master is a Spark, Mesos or YARN cluster URL. If run locally, then master can be set to local. Additionally, master can be set to local[n] where n is the number of threads.

- Partition: This is the number of partitions that Spark uses to distribute the dataset. Spark will run one task for each partition of the cluster. Typically, the user will want 2-4 partitions for each CPU in the cluster. Normally, Spark tries to set the number of partitions automatically based on the cluster. To apply this default setting in the tool, the user can set the value for partition to the default keyword. Otherwise, it can be set manually by specifying the required number of partitions.
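For example, a minimal sketch of how these two settings surface in the generated code (the application name, the local[4] master and the partition count of 8 are placeholders of our own):

// Master: run locally with 4 threads; on a cluster this would be a
// Spark, Mesos or YARN cluster URL instead
SparkConf conf = new SparkConf().setAppName("GeneratedApp").setMaster("local[4]");
JavaSparkContext sc = new JavaSparkContext(conf);

// Partition: request 8 partitions explicitly when loading the data;
// omitting the second argument lets Spark pick a default for the cluster
JavaRDD<String> lines = sc.textFile("hdfs://host/path/data.txt", 8);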

5.6.2 Spark Machine Learning Code Generation

Our tool explores the problems of classification and clustering. In addition to setting the

Spark parameters mentioned above, the user will need to specify some application-specific variables. A snapshot of the machine learning part of our tool is shown in Figure 5-7.


Figure 5-7: Tool snapshot - machine learning

5.6.2.1 Classification Variables and Models

In order for the code to be generated correctly, the following variables will need to be specified:

- file path for the training data

- percentage upon which to split the training and validation data in the input file

- model type: which classification model to use


- class index: location of the label in the file

- file path for the testing data

5.6.2.2 Clustering Variables and Models

The following variables will need to be specified:

- file path for the training data

- model type: which clustering model to use

- k: number of clusters

- number of iterations to run the algorithm

5.6.3 Spark General-Purpose Code Generation

This part of the tool generates general purpose Spark code based on user input and interaction. It assumes that the user is familiar with at least some sort of programming language and is capable of correctly writing basic code statements.

The tool can be divided into five main sections, which constitute the design flow: Spark parameters, file handlers, datasets and variables, operations consisting of transformations and actions, and feedback. A snapshot of the general-purpose part of our tool is shown in Figure 5-8.

Spark parameters will not be discussed since they are similar to those mentioned in the previous section.


Figure 5-8: Tool snapshot – general purpose

5.6.3.1 File Handlers

The first level of entry to the program is the input file. It contains the original user data that requires modification and/or analysis. The user needs to specify the file path and the name of the dataset into which to load the data. Clicking on ‘Load’ will create the code to load the data file into a newly created dataset that will be displayed to the user in the "Datasets" section.

The second part consists of parsing the dataset. The user needs to specify the data type, the separator characters upon which the data is split, and the name of the dataset to parse the data into. Additionally, the user needs to specify on which dataset to perform the parsing. Typically, it is the one loaded from file. The user can just click on its name from the ‘Datasets’ section. After clicking on ‘Parse’, the parsing code is generated and the new dataset is added to the list. The snapshot in Figure 5-9 shows how the loading and parsing of the word count example from Figure 5-1 is constructed.

Figure 5-9: Word count example: loading and parsing


5.6.3.2 Datasets and Variables

This section consists of two lists that are dynamically populated:

- Datasets: once a new dataset is created, it is added to the list. The user should click on an item from the list to specify the source dataset to a transformation or an action. Information about that dataset will also be displayed.

- Variables: once a new variable is created by an action, it is added to a list.

5.6.3.3 Operations

This section handles the creation of transformations and actions. It also includes support and help for users who are not sure about which transformation or action to use, what parameters are required and how to use a certain operation.

The section starts off asking the user what he/she wants to do. The user can select an operation (either a transformation or an action) from a drop-down menu. Once that is selected, the user is prompted to select which transformation or which action he/she will want to perform.

Some instructions as to what parameters this operation needs will be displayed in the

'Feedback' section. Now the user knows exactly what he/she will need to input. Part of the instructions displayed in the 'Feedback' section will ask the user to make sure to select a source data set from the 'Dataset' section. Doing so will provide some information about the source dataset that the user might find helpful in order to correctly create the operation. We show this in Figure 5-10.


Figure 5-10: Example of operations and corresponding feedback

In addition to that, there is also a 'Help' button. Clicking it will display the definition of a transformation or action, some helpful tips and an example of how to use it.

In the case where a transformation has been selected, a table that the user might need to fill in is dynamically generated depending on the selected transformation. In addition to that, there is a field to enter the name of the other dataset when the transformation requires two datasets. For example, if the user chooses a 'map' transformation and chooses to call the generated dataset 'map_out', and has selected the source dataset 'testSet', which is for example of type String, then a table is displayed as such:

In Type | In Name | Out Type | Out Name | Operation
String | input | | |

Now assume that the user wants to perform a map that returns the length of a string. Then he/she will need to fill out the fields of the table as such:

In Type | In Name | Out Type | Out Name | Operation
String | input | Integer | len | input.length()

This will generate the code in Figure 5-11:

JavaRDD<Integer> map_out = testSet.map(new Function<String, Integer>() {
    @Override
    public Integer call(String input) {
        Integer len = input.length();
        return len;
    }
});

Figure 5-11: Map example code

In case an action is selected, the user must select a variable name for the action; that is, what the user wants the operation to be identified as. Some actions require a parameter, and the user is instructed to enter a value for that in the 'Feedback' section. Once all inputs are specified, the user can click the 'Create' button, and the code will be generated, and the variable name will be

added to the list. Other actions (such as reduce) will require a function to be filled in the table.

Note that some actions (such as saveToFile or foreach) are void, and hence do not return a variable. Therefore, no variable name will be displayed in the 'Variable' section if any of these actions are chosen.

Once the user clicks on the 'Create' button, the code for creating a dataset or a variable is generated and the new dataset or variable is added to the corresponding list. We show how the word count reduceByKey function from Figure 5-2 is constructed in Figure 5-12.

Figure 5-12: Word count example - reduceByKey transformation


5.6.3.4 Feedback

This section serves two purposes. First, it provides instructions to the user. It urges the user to specify the correct fields for his/her chosen operation. When the user selects a transformation or an action from the drop-down menu, this section will instruct the user as to what else is needed as input.

Second, it provides feedback to the user by displaying what has been created. When a transformation is created, the dataset name and types will be displayed to the user. When an action is created, the corresponding variable and its type are displayed to the user.

5.6.3.5 Error Handling

Our tool provides ways for handling potential compile-time errors that might be generated based on bad user input. In addition to the caught pitfalls that we have previously mentioned, we account for the following cases.

One issue that might occur is that the user chooses a transformation or action that requires its source dataset to consist of pairs, but applies it to a dataset that is not a pair dataset. Syntactically, that would be correct; however, it will generate an error while compiling. Therefore, if this happens, a warning message is given to the user to rectify the error.

Other issues include the user not entering all the appropriate information, and that is handled by the interface.


5.7 Conclusion

We have presented a tool that facilitates the generation of Spark code for the core and the mLlib libraries by providing a user-friendly, intuitive and interactive graphical user interface.

While it does not support all of the transformation and action API calls, since it is a proof of concept and not a commercial tool, we believe it covers the most widely used and needed operations, especially for our intended users. We have extensively tested our tool by running the generated code under a Spark environment and have verified the correctness of our results. In the future, we hope to include more of the operations that have not yet been handled. We also hope to add more features for debugging purposes.


Chapter 6: Conclusion

6.1 Summary

Parallel computing has been used to model many difficult problems in science and engineering, in fields including computer science, mechanical engineering, circuit design, geology, seismology, molecular sciences, genetics, and physics. In addition, several commercial applications, such as data mining, web-based services, medical imaging and diagnosis, financial and economic modeling, and virtual reality, among many others, require processing of large amounts of data in a usually complex manner. The development of faster and more efficient parallel systems has stemmed from this need to keep up with the ever-growing body of data. In addition to being better suited than serial computing to model real-world systems, since it can solve larger and more complex problems, parallel computing has the following advantages: (i) it can save both money and time since tasks execute faster and concurrently, (ii) parallel systems can be built from cheap, commodity hardware, and (iii) it can take advantage of non-local resources such as the internet or a wide area network.

Parallelism, though, comes at a price. It is not trivial to write parallel code. The process is tedious, complex, and error prone. Programmers have to worry about the environment, communication, concurrency, synchronization and efficiency. It is also somewhat counterintuitive to someone who is used to serial programming.

In this dissertation, we have presented methods to facilitate the parallelization of high performance code in order to alleviate the burdens associated with parallel programming.

Making parallel computing easier is not a new endeavor. Several works have been successful in achieving this. However, it has always been left to the users to either locate potential code

fragments that could be parallelized, or specify pragmas, or learn new parallel and complicated

APIs. In contrast, our approach does not require users to perform any of the mentioned tasks.

Our method automatically detects a user’s single threaded function call to search for a pattern using Java’s standard regular expression library, and replaces it with our own data parallel implementation using Java bytecode injection. Our approach facilitates parallel processing on different platforms consisting of shared memory systems (using multithreading and NVIDIA

GPUs) and distributed systems (using MPI and Hadoop). Our implementation produces correct results, is transparent to users, and reduces execution time. In addition to that, we have presented a tool that automatically generates Spark Java code from minimal user-supplied inputs. Using our tool, Spark users do not need to learn the complicated Spark API in order to write applications. Our tool is easy to use, interactive and offers Spark’s native Java API performance.

In Chapter 1, we provided a summary on parallel computing, introduced our proposal, and stated our objectives and development process.

In Chapter 2, we further elaborated on the problem by discussing pattern matching and the motivation behind choosing it as an application to validate our work. We also discussed Java bytecode and how injection can be done using the ASM library. Then, we showed how our method works in a multithreaded environment, and reported the results we obtained, which show a somewhat logarithmic decrease in execution time with respect to the number of threads.

In Chapter 3, we discussed how our method works in a GPGPU environment. We surveyed the previous work done in this area and showed how JNI was used to offload Java programming on NVIDIA GPUs that only support CUDA. We used an external library as opposed to the Java

standard library to perform pattern matching on the GPU. Our results show a substantial decrease in execution times.

In Chapter 4, we implemented our method in a distributed environment using MPI. We used the Java bindings for the Open MPI implementation. We tested our method on several combinations of processes and virtual machines, and as predicted, our results consisted of decreased execution times. We also introduced Hadoop since it has been widely used recently in big data computing for it provides better storage and data management and computation speedups. We compared a Hadoop implementation to our multithreaded approach. However, we did not attempt a Hadoop implementation for our system because of the way data is shuffled around in HDFS.

In Chapter 5, we presented a tool that facilitates the generation of Spark Java code for the core and the mLlib libraries by providing a user-friendly, intuitive and interactive graphical user interface.

We have proposed a series of methods that can facilitate the parallelization of high performance code. We now briefly discuss how these methods compare with each other and give some advantages and disadvantages of each approach. Intuitively, multithreading might be considered the simplest and least costly approach since it does not require any extra hardware. However, to get good timing results, we need high-end CPUs that can run several threads concurrently. Another disadvantage of multithreading is the lack of scalability between the memory and the CPU(s), which might result in access time degradation and lack of data coherence. In addition to that, parallelism is limited by the number of available threads; beyond that, improvement is negligible and performance could even degrade. In contrast, distributed systems can scale endlessly. One can always add a cheap node to the existing cluster to expand the span of computation. Communication overhead is considered minimal in computation-intensive applications. However, there is always the issue of integration to be considered. GPUs, on the other hand, seem to be a middle solution. Currently, one can purchase mid-range GPUs for a minimal cost. In addition to that, their massive threading is very effective for CPU-bound applications. Their disadvantage is that they perform poorly for I/O-bound applications, and that they are complex to program.

6.2 Suggestions for Future Work

When we first started researching ways to facilitate parallelizing existing Java programs that perform pattern matching, our aim was quite ambitious. Instead of proposing to end-users several shared-memory and distributed methods as we have done, we had originally intended to provide a smart system that could probe the computing environment, detect significant parameters, and automatically replace the desired sequential Java library with the parallel implementation that provides optimal performance. Such a model is a suggestion for future work. The goal would be to build a model that is able to select the hardware architecture best suited, in terms of execution time, to run a parallelized version of a selected Java library detected in a user’s code through bytecode manipulation. Based on analyzing performance measurements, input size, number of threads and cores, shared memory, number of GPU registers, available hardware, speed and bandwidth of the network, among other potential application and system parameters, the model would be able to predict which approach will give the best optimization with respect to execution time.


We have surveyed the literature and similar techniques can be classified into two approaches: an empirical approach and an analytical approach. The empirical approach such as that used in

[86] is based on a training set. However, the application needs to be run at least once to obtain predictions. The analytical approach such as that used in [87] is based on statistical analysis.

However, it imposes some challenges due to out-of-order execution, speculation, and prefetching. We provide a quick summary of some of the works in this area. In [86], Qilin is an automatic and adaptive technique that maps computations to the available processing elements in a heterogeneous environment. It is dynamic in that it can adjust to changes in the runtime environment, and it focuses on CPU and GPU platforms. Programmers use the Qilin custom API to indicate to the compiler the section of the code to be parallelized. In [87], the authors propose the MATE-CG system, which is a map-reduce-like system that aims to accelerate map-reduce applications on a heterogeneous cluster. Their framework ports the generalized reduction models on hybrid CPU-GPU clusters to parallelize iterative data-intensive applications. They have also developed an auto-tuning strategy based on analytical models and offline training to identify the best partitioning parameter for heterogeneous execution, and to determine the best CPU/GPU chunk sizes that will optimize performance. In [88], the authors propose Merge, a framework that can automatically distribute computation across heterogeneous systems. The framework consists of a high-level parallel programming language, a predicate-based library system, and a compiler and runtime. In [89], the authors propose Dandelion, a model for heterogeneous systems where execution can be performed on CPUs, GPUs, and FPGAs. Programmers write sequential code in C# or F# (along with LINQ constructs) and the system executes it on the available parallel resources. In [90], the authors propose DryadLINQ, a system that can automatically transform parallel sections of a sequential program written with LINQ constructs into a distributed model which is then executed on a large computing cluster.


Bibliography

[1] A. Grama, Introduction to Parallel Computing, Pearson Education, 2003.
[2] C. Ferner, B. Wilkinson and B. Heath, "Toward using higher-level abstractions to teach Parallel Computing," in Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, pp. 1291-1296, 2013.
[3] Pattern matching slides from Princeton: https://www.cs.princeton.edu/~rs/AlgsDS07/21PatternMatching.pdf
[4] C. Charras and T. Lecroq, "Exact String Matching Algorithms", http://www-igm.univ-mlv.fr/~lecroq/string/index.html
[5] Russ Cox, Regular Expression Matching Can Be Simple And Fast, https://swtch.com/~rsc/regexp/regexp1.html
[6] Regular expression slides from Princeton: http://algs4.cs.princeton.edu/54regexp/
[7] C.S. Rao, K.B. Raju and S.V. Raju, "Parallel string matching with multi core processors-A comparative study for gene sequences," Global Journal of Computer Science and Technology, vol. 13, 2013.
[8] A.K. Sahoo, K.S. Sahoo and M. Tiwary, "Signature based malware detection for unstructured data in Hadoop," in Advances in Electronics, Computers and Communications (ICAECC), 2014 International Conference on, pp. 1-6, 2014.
[9] Large scale citation matching using Apache Hadoop (26 March 2013)
[10] P.D. Michailidis and K.G. Margaritis, "String matching problem on a cluster of personal computers: Experimental results," in Proc. of the 15th International Conference Systems for Automation of Engineering and Research, pp. 71-75, 2001.
[11] A. Ramya and E. Sivasankar, "Distributed pattern matching and document analysis in big data using Hadoop MapReduce model," in Parallel, Distributed and Grid Computing (PDGC), 2014 International Conference on, pp. 312-317, 2014.
[12] Adventures in Hadoop, #3: String Searching WorldCat (http://hangingtogether.org/?p=2199)
[13] R. Tay, "Demonstration of Exact String Matching Algorithms using CUDA," Unpublished Work.
[14] A. Rasool and N. Khare, "Parallelization of KMP String Matching Algorithm on Different SIMD Architectures: Multi-Core and GPGPU's," International Journal of Computer Applications, vol. 49, 2012.
[15] L. Marziale, G.G. Richard and V. Roussev, "Massive threading: Using GPUs to increase the performance of digital forensics tools," Digital Investigation, vol. 4, pp. 73-81, 2007.
[16] C.S. Kouzinopoulos and K.G. Margaritis, "String matching on a multicore GPU using CUDA," in Informatics, 2009. PCI'09. 13th Panhellenic Conference on, pp. 14-18, 2009.
[17] G. Vasiliadis, M. Polychronakis and S. Ioannidis, "Parallelization and characterization of pattern matching using GPUs," in Workload Characterization (IISWC), 2011 IEEE International Symposium on, pp. 216-225, 2011.
[18] ASM official website: http://asm.ow2.org/
[19] Intel product website: http://ark.intel.com/products/64622/Intel-Xeon-Processor-E5-4650-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI
[20] Project Gutenberg website: https://www.gutenberg.org/ebooks/search/?query=encyclopedia+britannica
[21] NVIDIA: CUDA-C programming guide PG-02829-001_v5.5, www.nvidia.com
[22] OpenCL official website: https://www.khronos.org/opencl/
[23] D. Lehavi and S. Schein, "Fast Regex Parsing on GPUs", HP Labs, 2011.
[24] A. Housfater, "GPUs and Regular Expression Matching for Big Data Analytics", IBM video presentation, Big Data Analytics, GTC 2014 - ID S4462.
[25] X. Yu and M. Becchi, "GPU acceleration of regular expression matching for large datasets: exploring the implementation space," in Proceedings of the ACM International Conference on Computing Frontiers, pp. 18, 2013.
[26] C. Lin, C. Liu and S. Chang, "Accelerating regular expression matching using hierarchical parallel machines on GPU," in Global Telecommunications Conference (GLOBECOM 2011), 2011 IEEE, pp. 1-5, 2011.
[27] C. Lin, C. Liu, L. Chien and S. Chang, "Accelerating pattern matching using a novel parallel algorithm on GPUs," IEEE Trans. Comput., vol. 62, pp. 1906-1916, 2013.
[28] N. Cascarano, P. Rolando, F. Risso and R. Sisto, "iNFAnt: NFA pattern matching on GPGPU devices," ACM SIGCOMM Computer Communication Review, vol. 40, pp. 20-26, 2010.
[29] Y. Zu, M. Yang, Z. Xu, L. Wang, X. Tian, K. Peng and Q. Dong, "GPU-based NFA implementation for memory efficient high speed regular expression matching," in ACM SIGPLAN Notices, pp. 129-140, 2012.
[30] G. Vasiliadis, M. Polychronakis and S. Ioannidis, "Parallelization and characterization of pattern matching using GPUs," in Workload Characterization (IISWC), 2011 IEEE International Symposium on, pp. 216-225, 2011.
[31] G. Vasiliadis, M. Polychronakis, S. Antonatos, E.P. Markatos and S. Ioannidis, "Regular expression matching on graphics hardware for intrusion detection," in International Workshop on Recent Advances in Intrusion Detection, pp. 265-283, 2009.
[32] R. Smith, N. Goyal, J. Ormont, K. Sankaralingam and C. Estan, "Evaluating GPUs for network packet signature matching," in Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on, pp. 175-184, 2009.
[33] A. Tumeo, O. Villa and D. Sciuto, "Efficient pattern matching on GPUs for intrusion detection systems," in Proceedings of the 7th ACM international conference on Computing frontiers, pp. 87-88, 2010.
[34] CUDA-grep on github: https://github.com/bkase/CUDA-grep
[35] Gary Frost, "I don't always write GPU code in Java but when I do I like to use Aparapi." Online article retrieved 6/15/2016. http://developer.amd.com/community/blog/2011/09/14/i-dont-always-write-gpu-code-in-java-but-when-i-do-i-like-to-use-aparapi/
[36] J. Strnad and Z. Konfršt, "Java on CUDA architecture," in Conference on Computer Graphics, Visualization and Computer Vision (WSCG), 2013. Poster Proceedings.
[37] Aparapi on github: https://github.com/aparapi/aparapi
[38] P.C. Pratt-Szeliga, J.W. Fawcett and R.D. Welch, "Rootbeer: Seamlessly Using GPUs from Java," in High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on, pp. 375-380, 2012.
[39] P. Calvert, "Parallelisation of Java for Graphics Processors," 2010.
[40] JavaCL on github: https://github.com/ochafik/JavaCL
[41] Jcuda official website: http://jcuda.org/
[42] Jocl official website: http://jocl.org/
[43] ATEJI-PX website: http://www.ateji.com/px/index.htm
[44] JCUDA official website: http://www.habanero.rice.edu/Publications.html
[45] JaMP official website: https://www2.informatik.uni-erlangen.de/EN/research/JavaOpenMP/index.html
[46] The CUDA4J application programming interface: https://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.lnx.80.doc/user/gpu_developing_cuda4j.html
[47] Project Sumatra homepage: http://openjdk.java.net/projects/sumatra/
[48] K. Thompson, "Programming techniques: Regular expression search algorithm," Commun. ACM, vol. 11, pp. 419-422, 1968.
[49] Oracle JNI docs: http://docs.oracle.com/javase/7/docs/technotes/guides/jni/
[50] Perl regular expressions: https://www.cs.tut.fi/~jkorpela/perl/regexp.html
[51] Introduction to Distributed System Design: http://www.hpcs.cs.tsukuba.ac.jp/~tatebe/lecture/h23/dsys/dsd-tutorial.html
[52] MPI on Wikipedia: https://en.wikipedia.org/wiki/Message_Passing_Interface
[53] Open MPI official website: https://www.open-mpi.org/
[54] B. Carpenter, V. Getov, G. Judd, T. Skjellum and G. Fox, "MPI for Java," 1998.
[55] A. Shafi, B. Carpenter and M. Baker, "Nested parallelism for multi-core HPC systems using Java," Journal of Parallel and Distributed Computing, vol. 69, pp. 532-545, 2009.
[56] W. Pugh and J. Spacco, "MPJava: High-performance message passing in Java using java.nio," in International Workshop on Languages and Compilers for Parallel Computing, pp. 323-339, 2003.
[57] G.L. Taboada, J. Touriño and R. Doallo, "F-MPJ: scalable Java message-passing communications on parallel systems," The Journal of Supercomputing, vol. 60, pp. 117-140, 2012.
[58] O. Vega-Gisbert, J.E. Roman and J.M. Squyres, "Design and Implementation of Java bindings in Open MPI," 2014.
[59] C. Kouzinopoulos and K. Margaritis, "Parallel implementation of exact two dimensional pattern matching algorithms using MPI and OpenMP," in 9th Hellenic European Research on Computer Mathematics and its Applications Conference, 2009.
[60] P.D. Michailidis and K.G. Margaritis, "Parallelization of Multiple String Matching on a Cluster Platform," in Proceedings of the 8th Hellenic European Conference on Computer Mathematics and its Applications (HERCMA'2007), Athens, Greece, 2007.
[61] A. Saà-Garriga, D. Castells-Rufas and J. Carrabina, "OMP2MPI: Automatic MPI code generation from OpenMP programs," ArXiv Preprint arXiv:1502.02921, 2015.
[62] A. Schäfer, D. Fey and A. Knoth, "Tool for Automated Generation of MPI Datatypes."
[63] N. Ng, J.G. de Figueiredo Coutinho and N. Yoshida, "Protocols by Default: Safe MPI Code Generation Based on Session Types," CC, vol. 15, pp. 212-232, 2015.
[64] C. Yang and K. Lai, "A directive-based MPI code generator for Linux PC clusters," The Journal of Supercomputing, vol. 50, pp. 177-207, 2009.
[65] D. Kranzlmüller, A. Knüpfer and W.E. Nagel, "Pattern Matching of Collective MPI Operations," in PDPTA, pp. 1243-1249, 2004.
[66] G.L. Taboada, S. Ramos, R.R. Expósito, J. Touriño and R. Doallo, "Java in the High Performance Computing arena: Research, practice and experience," Science of Computer Programming, vol. 78, pp. 425-444, 2013.
[67] S. Ekanayake and G. Fox, "Evaluation of Java message passing in high performance data analytics," 2014.
[68] V. Kumar, A. Grama, A. Gupta and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, Redwood City, CA, 1994.
[69] J.K. Cringean, G.A. Manson, P. Willett and G.A. Wilson, "Efficiency of text scanning in bibliographic databases using microprocessor-based, multiprocessor networks," J. Inf. Sci., vol. 14, pp. 335-345, 1988.
[70] Apache Hadoop official website: https://hadoop.apache.org/
[71] Hadoop online Tutorial: http://www.tutorialspoint.com/hadoop/hadoop_introduction.htm
[72] Hadoop online Tutorial: https://developer.yahoo.com/hadoop/tutorial/module1.html
[73] Hadoop official website: http://hadoop.apache.org/
[74] M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker and I. Stoica, "Spark: cluster computing with working sets," HotCloud, vol. 10, pp. 10-10, 2010.
[75] Wikipedia: Automatic Programming, retrieved 4/24/2016.
[76] R. Aler, Automatic Inductive Programming, Tutorial at the International Conference on Machine Learning, 2006. Carnegie Mellon, Pittsburgh, Pennsylvania, U.S.A.
[77] D.P. Playne and K.A. Hawick, "Auto-generation of parallel finite-differencing code for MPI, TBB and CUDA," in Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, pp. 1168-1175, 2011.
[78] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I.H. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, pp. 10-18, 2009.
[79] MemSQL official website: http://www.memsql.com/
[80] Apache Spark official website: http://spark.apache.org/
[81] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M.J. Franklin, S. Shenker and I. Stoica, "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing," in Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, pp. 2-2, 2012.
[82] C. Chu, S.K. Kim, Y. Lin, Y. Yu, G. Bradski, A.Y. Ng and K. Olukotun, "Map-reduce for machine learning on multicore," Advances in Neural Information Processing Systems, vol. 19, pp. 281, 2007.
[83] Harris, Derrick (28 June 2014). "4 reasons why Spark could jolt Hadoop into hyperdrive". Gigaom. https://gigaom.com/2014/06/28/4-reasons-why-spark-could-jolt-hadoop-into-hyperdrive/
[84] Mahout official website: http://mahout.apache.org/
[85] Spark Java API documentation: http://spark.apache.org/docs/latest/api/java/
[86] Chi-Keung Luk, Sunpyo Hong and Hyesoon Kim, "Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping," in Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on, pp. 45-55, 2009.
[87] Wei Jiang and G. Agrawal, "MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters," in Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pp. 644-655, 2012.
[88] M. Linderman, J. Collins, H. Wang and T. Meng, "Merge: A Programming Model for Heterogeneous Multi-Core Systems," in Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII), pp. 287-296, 2008.
[89] C. Rossbach, Y. Yu, J. Currey, J. Martin and D. Fetterly, "Dandelion: a Compiler and Runtime for Heterogeneous Systems," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13), pp. 49-68, 2013.
[90] Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. Kumar Gunda and J. Currey, "DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language," in Proceedings of the 8th USENIX conference on Operating systems design and implementation (OSDI'08), pp. 1-14, 2008.

Vita

Maria Abi Saad

Electrical Engineering and Computer Science Department

Syracuse University

Graduate and Undergraduate Schools Attended:

- Syracuse University, Syracuse, New York, USA

- Lebanese American University, Byblos, Lebanon

Degrees Awarded:

- Master of Science in Computer Engineering, 2008, Lebanese American University

- Bachelor of Engineering in Computer Engineering, 2006, Lebanese American University

Professional Experience:

- Teaching Assistant, Department of Electrical and Computer Engineering, Syracuse

University, 2008 – 2015

- Assistant Customer Application Analyst, International Turnkey Systems, Beirut,

Lebanon, 2006 – 2008

- Part time faculty, Lebanese American University, Byblos, Lebanon, 2007 - 2008

Maria Abi Saad’s dissertation was supervised by Dr. C.Y. Roger Chen.