Gossamer: A Lightweight Approach to Using Multicore Machines
Item Type: text; Electronic Dissertation
Author: Roback, Joseph Anthony
Publisher: The University of Arizona
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download Date: 06/10/2021 14:59:16
Link to Item: http://hdl.handle.net/10150/194468

GOSSAMER: A LIGHTWEIGHT APPROACH TO USING MULTICORE MACHINES

by

Joseph Anthony Roback

A Dissertation Submitted to the Faculty of the
DEPARTMENT OF COMPUTER SCIENCE
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2010

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED: Joseph Anthony Roback

ACKNOWLEDGEMENTS

I cannot adequately express my gratitude to my advisor, Dr. Greg Andrews, for his inspiration, support, encouragement, trust, and patience throughout my doctoral work. I have been very fortunate to have an advisor who helped me overcome many difficult situations and finish this dissertation.
I am also very grateful to him for his generosity, coordination, and supervision, which made it possible for me to complete this dissertation at a distance of more than 1,000 miles (even after his official retirement). I wish Greg a long and healthy retirement with his wife Mary and their family.

I would also like to thank the other members of my committee. Saumya Debray was always willing to put up with my ignorance of compiler theory and to take my random questions as I walked by his office's open door. David Lowenthal and John Hartman also provided helpful advice and feedback.

I would also like to thank the National Science Foundation. This research was supported in part by NSF Grants CNS-0410918 and CNS-0615347.

A special thanks to my fellow graduate students from the Department of Computer Science: Kevin Coogan, Somu Perianayagam, Igor Crk, Kyri Pavlou, and Varun Khare. They were always around for lunch or coffee when procrastination ranked higher than dissertation work.

Most importantly, I'd like to thank my family, who have been very supportive throughout my life and academic career. I would especially like to thank my mother for her endless support and for encouraging me to pursue my passion in computer science. I would like to thank my grandfather for his endless teachings about life, and my grandmother, Jean-Jean, who was always there to mend my torn clothes. Last but certainly not least, a big thanks to Jess, my patient and loving soul mate whom I love dearly.

DEDICATION

To Mom, Jean-Jean, and Gramp, for all their support and generosity.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER 1  INTRODUCTION
  1.1 Current Approaches
  1.2 The Gossamer Approach and Contributions
  1.3 Outline
CHAPTER 2  ANNOTATIONS
  2.1 Task and Recursive Parallelism
  2.2 Loop Parallelism
  2.3 Domain Decomposition
  2.4 Synchronization
  2.5 MapReduce
CHAPTER 3  EXAMPLES
  3.1 Quicksort
  3.2 N-Queens
  3.3 Bzip2
  3.4 Matrix Multiplication
  3.5 Sparse Matrix Multiplication
  3.6 Jacobi Iteration
  3.7 Run Length Encoding
  3.8 Word Count
  3.9 Multigrid
  3.10 Gravitational N-Body Problem
  3.11 Summary
CHAPTER 4  TRANSLATOR AND RUN-TIME SYSTEM
  4.1 Recursive and Task Parallelism
  4.2 Loop Parallelism
  4.3 Domain Decomposition
  4.4 Additional Synchronization
  4.5 MapReduce
  4.6 Static Program Analysis and Feedback
  4.7 Run-Time Program Analysis and Performance Profiling
  4.8 Combining Multiple Levels of Parallelism
CHAPTER 5  PERFORMANCE
  5.1 Testing Methodology and Hardware
  5.2 Application Performance on Intel Xeon E5405 (8-core)
  5.3 Application Performance on SGI Altix 4700 (32-core)
  5.4 Application-Independent Gossamer Overheads
  5.5 Summary
CHAPTER 6  RELATED WORK
  6.1 Thread Packages and Libraries
  6.2 Parallelizing Compilers
  6.3 Parallel Languages
  6.4 Stream and Vector Processing Packages
  6.5 Annotation Based Languages
  6.6 Performance of Gossamer versus Cilk++ and OpenMP
CHAPTER 7  CONCLUSION AND FUTURE WORK
APPENDIX A  GOSSAMER ANNOTATIONS
  A.1 Task Parallelism
  A.2 Loop Parallelism
  A.3 Domain Decomposition
  A.4 Synchronization
  A.5 MapReduce
APPENDIX B  GOSSAMER MANUAL
  B.1 Installing Gossamer
  B.2 Compiling and Running Gossamer Programs
  B.3 Debugging and Profiling Gossamer Programs
REFERENCES

LIST OF FIGURES

  3.1 Example: Quicksort
  3.2 Example: N-Queens Problem
  3.3 Example: Bzip2
  3.4 Example: Matrix Multiplication
  3.5 Example: Sparse Matrix Multiplication
  3.6 Example: Jacobi Iteration
  3.7 Example: Run Length Encoding
  3.8 Example: Word Count
  3.9 Example: Multigrid
  3.10 Example: N-Body
  4.1 Gossamer Overview
  4.2 Two Dimensional divide
  4.3 MapReduce Hash Table
  6.1 Matrix Multiplication Using Only One Filament Per Server Thread

LIST OF TABLES

  2.1 Gossamer Annotations
  3.1 Applications and the Annotations They Use
  3.2 Dwarf Computational Patterns and Applications
  5.1 Execution Times on Intel Xeon
  5.2 Application Speedups on Intel Xeon
  5.3 Execution Times on SGI Altix 4700
  5.4 Application Speedups on SGI Altix 4700
  5.5 Run-Time Initialization Overheads (CPU Cycles)
  5.6 Filament Creation Overheads (Intel Xeon workstation)
  6.1 Gossamer, Cilk++, and OpenMP: Execution Times
  6.2 Gossamer, Cilk++, and OpenMP: Speedups

ABSTRACT

The key to performance improvements in the multicore era is for software to utilize the newly available concurrency. Consequently, programmers will have to learn new programming techniques, and software systems will have to be able to manage the parallelism effectively. The challenge is to do so simply, portably, and efficiently. This dissertation presents a lightweight programming framework called Gossamer that is easy to use, enables the solution of a broad range of parallel programming problems, and produces efficient code. Gossamer supports task and recursive parallelism, iterative parallelism, domain decomposition, pipelined computations, and MapReduce computations. Gossamer contains (1) a set of high-level annotations that one adds to a sequential program to specify concurrency and synchronization, (2) a source-to-source translator that produces an optimized program, and (3) a run-time system that provides efficient threads and synchronization. The annotation-based programming model simplifies writing parallel programs by allowing the programmer to concentrate on the application and not the extensive bookkeeping involved with concurrency and synchronization; moreover, the annotations never reference any particulars of the underlying hardware.

CHAPTER