Scheduling Deterministic Parallel Programs
Daniel John Spoonhower
CMU-CS-09-126
May 18, 2009

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Guy E. Blelloch (co-chair)
Phillip B. Gibbons
Robert Harper (co-chair)
Simon L. Peyton Jones (Microsoft Research)

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2009 Daniel John Spoonhower

This research was sponsored by Microsoft, Intel, and IBM. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government, or any other entity.

Keywords: parallelism, scheduling, cost semantics

Abstract

Deterministic parallel programs yield the same results regardless of how parallel tasks are interleaved or assigned to processors. This drastically simplifies reasoning about the correctness of these programs. However, the performance of parallel programs still depends upon this assignment of tasks, as determined by a part of the language implementation called the scheduling policy.

In this thesis, I define a novel cost semantics for a parallel language that enables programmers to reason formally about different scheduling policies. This cost semantics forms a basis for a suite of prototype profiling tools. These tools allow programmers to simulate and visualize program execution under different scheduling policies and to understand how the choice of policy affects application memory use.

My cost semantics also provides a specification for implementations of the language. As an example of such an implementation, I have extended MLton, a compiler and runtime system for Standard ML, with support for parallelism and implemented several different scheduling policies. Using my cost semantics and profiler, I found a memory leak caused by a bug in one of the existing optimizations in MLton.

In addition to memory use, this cost semantics precisely captures when a parallel schedule deviates from the uni-processor execution of a given program. The number of deviations acts as an important conceptual tool in reasoning about concrete measures of performance. I use this metric to derive new bounds on the overhead of existing work-stealing implementations of parallel futures and to suggest more efficient alternatives.

This work represents an important step in understanding how to develop programs and programming languages for multi-core and multi-processor architectures. In particular, a cost semantics provides an abstraction that helps to ensure adequate performance without sacrificing the ability to reason about programs at a high level.
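As a minimal illustration of the first claim above, consider the following sketch in Standard ML. It is not taken from the thesis: fork here is a sequential stand-in for a hypothetical parallel fork-join primitive of type (unit -> 'a) * (unit -> 'b) -> 'a * 'b. A parallel implementation may evaluate the two thunks on different processors, but because the tasks do not interfere, every schedule produces the same pair, even though running time and memory use can differ across scheduling policies.

    (* Illustrative sketch only: fork is a sequential stand-in for a
       parallel fork-join primitive.  A parallel implementation may run
       the two thunks on different processors; the result is the same
       under every schedule. *)
    fun fork (f, g) = (f (), g ())

    fun fact n = if n <= 1 then 1 else n * fact (n - 1)

    (* Deterministic: result is (120, 8) under any scheduling policy,
       even though time and memory use may differ between policies. *)
    val result = fork (fn () => fact 5, fn () => fact 3 + 2)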
to Copic

Acknowledgments

When I arrived at Carnegie Mellon, I expected to find bright, motivated people working on exciting problems. I found not only that but also a vibrant community of students, especially among the students of the Principles of Programming group. Many of these students became not only my colleagues but also my friends. Their passion and enthusiasm for programming languages, logic, and computer science escapes from campus and into homes, restaurants, and bars.

In particular, I would like to thank those senior students who helped to establish this community within our group, especially Kevin Watkins, Leaf Peterson, Derek Dreyer, and Tom Murphy VII, as well as those who continue to maintain it, especially Jason Reed, Dan Licata, William Lovas, Rob Simmons, and the other members of the ConCert Reading Group. It has been extremely gratifying to have participated in this culture and to see that culture persist beyond my time at CMU.

I would like to thank the MLton developers, especially Matthew Fluet, for providing the foundation upon which I did much of my implementation and for providing technical support.

Pittsburgh has been a great place to live while pursuing my graduate studies. This is due in large part to the great people I have met here. This includes many fantastic people at CMU, including Rose Hoberman, Katrina Ligett, Adam Goode, Chris Twigg, Tiankai Tu, and those mentioned above; the D's regulars, especially Heather Hendrickson, Stephen Hancock, and Laura Marinelli; and the Alder Street crowd, especially Ryan Kelly and Jessica Nelson, who provided a place to stay and a home while I was traveling frequently.

My advisers, Bob Harper and Guy Blelloch, and Phil Gibbons, who became a third adviser, were (of course) instrumental in my success as a graduate student. I am deeply grateful for their unflinching support and confidence in me. I would also like to acknowledge the role of Dan Huttenlocher, who has been a friend and mentor since before I started graduate school and who has helped me to make many important decisions along the way.

Finally, I would also like to thank my parents, Tom and Sue, and the rest of my family for always encouraging me to do my best, and Kathy Copic for never accepting less than that.

Preface

Like all endeavors of human understanding, the goals of computer science are to generalize our experience and to build models of the world around us. In the case of computer science, these models describe computers, programs, algorithms, and other means of processing information. The purpose of these models is to enable us to understand more about existing phenomena and artifacts, including both those that are purely conceptual and those that are physical in nature. These models also enable us to discover and develop new ways of processing information. As in other areas of study, these models are justified by their capacity to make predictions about existing phenomena or to connect ideas that were previously considered unrelated or disparate.

As a study of information, computer science is also concerned with these models themselves and with the means used to justify them. In a sense, abstraction, one of the most important conceptual tools of computer science, is the practice of dividing information about a system into that which will help us to learn more about its behavior and that which impedes our understanding.

Despite the unifying goal of computer science to better understand how information is processed, work in our field is often divided into two broad but disjoint categories: theory and systems. However, these need not be competing sub-fields but, instead, complementary approaches for evaluating models about information and information processing systems. In this thesis, I develop new models about languages for parallel programming. In addition, I use approaches based on both theory and practice to evaluate these models.
My hope is that these models will serve to help future language implementers understand how to efficiently exploit parallel architectures.

Contents

1 Introduction
  1.1 Parallelism
  1.2 Scheduling
  1.3 Cost Semantics
  1.4 Overview
  1.5 Contributions
2 Cost Semantics
  2.1 Example: Matrix Multiplication
  2.2 Syntax
  2.3 Graphs
  2.4 Semantics
  2.5 Schedules and Deviations
    2.5.1 Sequential Schedules
    2.5.2 Deviations
  2.6 Memory Use
  2.7 Sequential Extensions
  2.8 Alternative Rules
  2.9 Summary
  2.10 Related Work
3 Online Scheduling
  3.1 Non-Deterministic Scheduling
    3.1.1 Confluence
    3.1.2 Soundness
    3.1.3 Completeness
  3.2 Greedy Scheduling
  3.3 Depth-First Scheduling
    3.3.1 Semantics
    3.3.2 Determinacy
    3.3.3 Progress
    3.3.4 Soundness
    3.3.5 Completeness
    3.3.6 Greedy Semantics
  3.4 Breadth-First Scheduling
    3.4.1 Semantics
    3.4.2 Soundness
  3.5 Work-Stealing Scheduling
  3.6 Summary
  3.7 Related Work
4 Semantic Profiling
  4.1 Simulation
    4.1.1 Example: Matrix Multiplication
  4.2 Visualization
    4.2.1 Distilling Graphs
    4.2.2 GraphViz
    4.2.3 Example: Matrix Multiplication
    4.2.4 Example: Quickhull
  4.3 Summary
  4.4 Related Work
5 Vector Parallelism
  5.1 Example: Matrix Multiplication
  5.2 Syntax
  5.3 Semantics
    5.3.1 Cost Semantics
    5.3.2 Implementation Semantics
  5.4 Flattening
  5.5 Labeling
  5.6 Summary
  5.7 Related Work
6 Parallel Futures
  6.1 Syntax and Semantics
  6.2 Work Stealing
    6.2.1 Cilk
    6.2.2 Bounding Steals
    6.2.3 Beyond Nested Parallelism
  6.3 Deviations
    6.3.1 Deviations Resulting From Futures
    6.3.2 Performance Bounds
  6.4 Bounds on Deviations
    6.4.1 Upper Bound
    6.4.2 Lower Bound
  6.5 Alternative Implementations
  6.6 Speculation
    6.6.1 Glasgow Parallel Haskell
    6.6.2 Cost Semantics
  6.7 Summary