TERRA: SIMPLIFYING HIGH-PERFORMANCE PROGRAMMING USING MULTI-STAGE PROGRAMMING

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Zachary DeVito December 2014

© 2014 by Zachary Lipner DeVito. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/zt513qh3032

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Pat Hanrahan, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Alex Aiken

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Oyekunle Olukotun

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

ABSTRACT

Modern high-performance computers are heterogeneous machines that contain a mixture of multi-core CPUs, GPUs, interconnects, and other custom accelerators. The composition of individual machines varies and will change over time, so it is important that applications written for these machines are portable. One way to create portable high-performance applications is to abstract the problem at a higher level and use approaches such as autotuners or domain-specific languages (DSLs) to compile the abstraction into high-performance code using domain knowledge. These approaches require that the application generate code programmatically. However, current state-of-the-art high-performance languages lack support for generating code, making it difficult to design these programs. This thesis presents a novel two-language design for generating high-performance code programmatically. We leverage a high-level language, Lua, to stage the execution of a new low-level language, Terra. Programmers can implement program analyses and transformations in Lua, but use built-in constructs to generate and run high-performance Terra code. This design reflects how programs like DSLs are often created in practice but applies principled techniques to make them easier to create. We adapt techniques from multi-stage programming to work within the two-language design. In particular, shared lexical scope makes it easy to generate code, but separate execution enables the programmer to control performance. Furthermore, we show how adding staged compilation to the typechecking process of Terra allows for the creation of high-performance types. This typechecking process combines meta-object protocols frequently used in dynamically-typed languages with staged programming to produce flexible but high-performance types. We evaluate the design presented in this thesis by implementing a number of example programs using Lua and Terra in areas where performance is critical such as linear algebra,

image processing, probabilistic programming, serialization, and dynamic assembly. We show that these applications, which previously required many different languages and technologies, can be implemented concisely using just Lua and Terra. They perform as well as or better than their equivalents written using existing technologies. In some cases, we found that the added expressiveness of using our approach made it feasible to implement more aggressive optimizations, enabling the program to perform up to an order of magnitude better than existing approaches.

ACKNOWLEDGMENTS

Creating this dissertation was only possible with the help, suggestions, and support of many people. First, I want to thank my advisor Pat Hanrahan for always encouraging me to work on the problems that I found most interesting, and giving me scarily accurate advice well before I realized that I needed it. Frequently there were times that the solution to a problem that I was working on was something Pat mentioned off-hand six months earlier. I am also grateful to Alex Aiken for his advice over the years on the many projects I worked on including the Terra and Liszt languages. As my interests turned to programming languages, Alex helped me navigate the community, always asking on-point questions that helped focus the work. I also want to thank my other Committee members — Kunle Olukotun, Jan Vitek, and Eric Darve — for their advice and for many fruitful discussions about how to design high-performance domain-specific languages. My work would not have been possible without the help of many other students in the lab at Stanford. I am grateful to James Hegarty for participating in endless discussions about design, and for using Terra even at a very early stage, and to Daniel Ritchie and Matt Fisher for their help in crafting the design of Exotypes. I started working on the Liszt programming language only a few months after I came to Stanford. The project has continued up to this day, accumulating a large number of people who have helped out over the years. I am grateful for this help, especially to Niels Joubert, who spent long hours getting Liszt to run on large clusters and GPUs, and to Crystal Lemire, Gilbert Bernstein, and Chinmayee Shah who are now continuing work on the project. I also want to thank my academic “siblings” Mike Bauer, Eric Schkufza, Sharon Lin, and Lingfeng Yang who helped me navigate the academic process, and encouraged me to keep going on problems even when I thought it was not worth it. I am also grateful to the

rest of Pat’s lab and other students I interacted with while at Stanford. Their diverse interests, enthusiasm for problem solving, and inquisitiveness meant that meetings never had a dull moment. Finally, I would like to thank my mom and dad for their support and encouragement through this process. I wouldn’t have been able to do it without you.

CONTENTS

ABSTRACT iv

ACKNOWLEDGMENTS vi

1 INTRODUCTION 1
1.1 Contributions ...... 4
1.2 Thesis outline ...... 6

2 BACKGROUND 7
2.1 Practical designs of libraries that do code generation ...... 7
2.1.1 Ad hoc templates and scripts ...... 8
2.1.2 Source-to-source autotuners ...... 9
2.1.3 Statically-compiled DSLs ...... 11
2.1.4 Dynamically-compiled DSLs ...... 14
2.1.5 Summary ...... 16
2.2 Existing techniques for code generation ...... 16
2.2.1 LISP ...... 17
2.2.2 Partial evaluation ...... 19
2.2.3 Multi-stage programming ...... 21
2.3 Conclusion ...... 23

3 GENERATING CODE 25
3.1 Writing code in Terra ...... 26
3.2 Multi-stage programming in Terra ...... 30
3.3 Writing optimizations using staging ...... 32

3.4 Generating arbitrary code using staging ...... 33
3.5 Design decisions for code generation ...... 36
3.5.1 Shared lexical environment ...... 36
3.5.2 Hygienic code generation ...... 37
3.5.3 Separate evaluation of Terra code ...... 38
3.5.4 Eager specialization with lazy typechecking ...... 39

4 FORMALIZING CODE GENERATION WITH TERRA CORE 41
4.1 Terra Core ...... 41
4.2 Design decisions as reflected in Terra Core ...... 46
4.2.1 Terra entities are first-class ...... 46
4.2.2 Hygiene and shared lexical environment ...... 47
4.2.3 Eager specialization with lazy typechecking ...... 48
4.2.4 Separate evaluation of Terra code ...... 50

5 BUILDING DSLS 51
5.1 Tuning DGEMM ...... 51
5.2 Darkroom: A stencil DSL for images ...... 55
5.3 Liszt: A DSL for physical simulation ...... 57
5.4 Quicksand: A DSL for probabilistic programming ...... 59

6 GENERATING TYPES WITH EXOTYPES 63
6.1 Meta-object protocols ...... 65
6.2 Exotypes ...... 66
6.3 Example: Array(T) ...... 70
6.4 Composability ...... 72
6.5 Related Work ...... 74

7 FORMALIZATION OF EXOTYPES 76

8 BUILDING LIBRARIES WITH EXOTYPES 83
8.1 Class systems ...... 83
8.2 Data layout ...... 85

8.3 Serialization ...... 87
8.4 Dynamic assembly ...... 89
8.5 Automatic differentiation ...... 92

9 EMBEDDING CUSTOM DSLS IN LUA 95
9.1 Current approaches to embedding languages ...... 95
9.2 Our language extension interface ...... 98
9.3 Using language extensions ...... 100
9.3.1 Darkroom: Lua-based meta-programming with custom syntax ...... 100
9.3.2 Liszt: Custom syntax with a language extension API ...... 102
9.4 Comparison to other approaches to syntax extension ...... 103

10 INTEROPERABILITY 105
10.1 With existing applications ...... 105
10.1.1 Embedding as a runtime ...... 106
10.1.2 Zero-overhead compatibility with C ...... 107
10.2 With Lua ...... 111
10.3 With DSLs ...... 115

11 IMPLEMENTATION 117

12 DISCUSSION 121
12.1 Lessons learned from building DSLs ...... 121
12.2 Software development as a library ...... 123
12.3 Statically-typed languages as heterogeneous staged programming ...... 124
12.4 A middle ground for static and dynamic typing ...... 126
12.5 Applying dynamic code generation to scientific computing ...... 128
12.6 Using high-performance staged programming to improve performance ...... 128

13 CONCLUSION 130

LIST OF FIGURES

2.1 Example Liszt code ...... 12
2.2 Generating code with LLVM ...... 15
2.3 Macro hygiene ...... 18
2.4 Multi-stage programming operators ...... 22

3.1 Phases of evaluation of a Lua-Terra program ...... 30
3.2 An example function that compiles a simple IR for image processing code into a Terra function ...... 35
3.3 A simple autotuner for the image processing language ...... 36

4.1 The rules −→L for evaluating Lua expressions ...... 44
4.2 The rules −→S for specializing Terra expressions ...... 45
4.3 The rules −→T for evaluating Terra expressions ...... 45
4.4 Typing rules for references to Terra functions ...... 45

5.1 Generator for matrix-multiply kernel ...... 52
5.2 Matrix multiply performance ...... 53
5.3 Darkroom compared to C ...... 54
5.4 Speed using Darkroom ...... 55
5.5 A higher-order Lua function that wraps a Terra function for use in MCMC inference ...... 60
5.6 Quicksand performance ...... 62

6.1 Extension of Terra evaluation for exotypes ...... 67
6.2 Example exotype properties ...... 67
6.3 Embedding Objective-C objects in Terra using exotypes ...... 70

6.4 An implementation of the array constructor in Lua ...... 71
6.5 An implementation of the array constructor using Exotypes in Terra ...... 72

7.1 Typing rules ...... 77
7.2 ExoTerra Evaluation ...... 78
7.3 The rules for evaluating ExoLua ...... 78

8.1 Example code that uses our class system interface ...... 84
8.2 Data layout performance ...... 86
8.3 A generator for serialization code ...... 88
8.4 Throughput of scene graph serialization ...... 88
8.5 Assembler throughput ...... 89
8.6 Example using our dynamic assembler ...... 92
8.7 Automatic differentiation performance ...... 94

9.1 Examples of different techniques for implementing the frontend of a DSL ...... 96
9.2 An example language extension ...... 99
9.3 Embedding Darkroom in Lua ...... 101

10.1 Embedding Lua in C ...... 106
10.2 Embedding Terra in C ...... 107
10.3 Correspondence between Terra and C types ...... 108
10.4 Calling external functions from different languages ...... 110
10.5 Converting Lua values to Terra values ...... 112
10.6 Inferring the types of Lua values ...... 113
10.7 Converting Terra values to Lua values ...... 113

11.1 Overview of Terra implementation ...... 118

CHAPTER 1 INTRODUCTION

To achieve higher performance and power efficiency, modern computers are composed of a heterogeneous mix of cores and accelerators, each specialized to achieve high efficiency for particular tasks. A modern laptop might have multiple CPU cores, whose out-of-order execution and large caches make them ideal for serial tasks where low latency is important. On the same chip, there is an integrated GPU for high-efficiency processing of throughput-oriented tasks such as real-time rendering. For high-performance graphics processing an additional discrete GPU may be included as well. Similar complexity is found at both the small and large scale. Mobile phones use systems on a chip that contain many specialized cores for tasks such as image processing, while large supercomputers often contain a mixture of CPUs, GPUs, and high-speed interconnects across disparate memory spaces. To write high-performance and high-efficiency applications, a programmer must take advantage of all these devices. While an application can be written for one configuration of hardware, the amount of variety across modern computers leads to an exponential number of potential configurations. Furthermore, the particular hardware is likely to change over time, making effort spent optimizing a single design obsolete in a few years. Some widely used applications might be worth optimizing in this way, but most programs will not be able to take advantage of the higher-efficiency hardware. We want to create applications that are portable across different hardware and programming models. One way to achieve this portability is by raising the level of abstraction at which applications are programmed. Rather than write a program as a sequence of instructions for a particular piece of hardware, code can be written at the level of the problem. Languages that support this level of programming are referred to as domain-specific languages, or DSLs.


A compiler specialized for a particular domain is used to transform high-level code into machine code for different architectures. Since problems are expressed at the level of a domain, experts in that domain can automatically apply best-practice techniques to get the problem to run on different hardware. Widely used domain-specific languages for graphics programming such as OpenGL/DirectX have made it possible to run applications across different GPUs without programming to a specific architecture. And recent work such as OptiML for machine learning [7], or Liszt [22] for physical computation on meshes, has shown that the DSL approach can produce code that takes advantage of heterogeneous hardware, while still being portable to different configurations or new machines. Expressing problems as transformations on higher-level data can also be used to create efficient and portable libraries more easily. This approach has been proven for problems such as Fourier transforms and linear algebra, with libraries such as SPIRAL [60], FFTW [30], and ATLAS [77], which internally use domain-specific transformations to optimize performance. These libraries often rely on auto-tuning — trying a lot of possible optimizations on a particular architecture — to find the best one. And although their APIs look like any other library, internally these libraries, like DSLs, need to actually generate and run code as part of their execution. They are often referred to as active libraries to distinguish them from libraries that do no code generation. Despite the promise of DSLs and active libraries, their use is limited to a few large domains because it is currently difficult to actually create an active library or DSL. One reason for this difficulty is that these designs need to generate code programmatically. This need to generate or meta-program code makes their architecture fundamentally different from typical libraries.
To get peak performance on modern hardware, current programmers use languages such as C++ or CUDA, where the level of abstraction is only slightly above the level of the hardware. These languages give the programmer a lot of control over performance, but they lack facilities for doing code generation, forcing designers of DSLs to use ad hoc approaches. The complexity of such designs limits adoption to only a few experts and makes it harder to integrate the DSL into existing applications. To make it easier to write portable high-performance applications, we need the ability to generate code in high-performance languages. But adding support for code generation to these languages is difficult because there is a fundamental tension between features

that make meta-programming easy and features that give the programmer control over performance. To make code generation and code transformations easier, it is desirable to have high-level features such as an object system for representing trees and graphs of program representations, first-class functions to apply transformations, and garbage collection to manage these data structures. We further need a way for the programmer to pass these data structures to a library that will actually compile the result. But to provide control over performance, we need to support low-level features. The language should have semantics that are close to the actual instruction set of the machine, which includes the ability to manage the layout of memory to optimize memory performance. The language also needs to be simple enough that it can easily run on smaller cores, like those found on GPUs or embedded devices. A single language that tries to balance the concerns of both code generation and low-level control over performance would struggle to combine both sets of features, and could end up being a poor fit for either problem. Instead, this thesis proposes using a two-language design that separates the concerns of code generation and control over performance. We use a high-level language to write program analyses and generate code, and a low-level language to write high-performance code. The two-language design reflects the way many DSLs and auto-tuners are designed in practice, but applies principled techniques to make creating such designs easier. For the two-language design to work, it is important that the languages interoperate well. To support code generation, the high-level language must have the ability to meta-program the low-level one. We use an existing language, Lua [39], as our high-level language since it was designed to be used as an embedded language from C.
Lua has well thought out solutions to problems that arise when a low-level language is coupled to a high-level one, such as the management of Lua object lifetimes, the representation of low-level data in Lua, and how functions are evaluated across both languages. However, existing low-level languages are not suited to working with high-level languages or being the target of code generation. To fill this gap, this thesis presents a new low-level language, Terra. Terra has semantics that are similar to C but it is designed to be meta-programmed and evaluated from Lua.

1.1 CONTRIBUTIONS

The proposed two-language design can simplify the process of writing portable high- performance applications by making it easier to develop programs that require code gen- eration. This thesis will present the contributions necessary to make this design work, and several case studies that evaluate its ability to make the creation of high-performance programs easier.

• We propose a two-language design for generating high-performance code that uses Lua, a high-level language, to meta-program Terra, a new low-level systems programming language. Meta-programming is accomplished by adapting techniques from multi-stage programming to work in the context of a two-language system. In particular we show how shared lexical scope makes it easy to generate code, but separate evaluation enables the programmer to control performance.

• We evaluate our two-language design by creating example DSLs and autotuners in the areas of linear algebra, image processing, physical simulation, and probabilistic programming. We show that these applications, which previously required the use of many languages and technologies, can be architected entirely in Lua and Terra but still achieve high performance. Our autotuner for matrix multiply performs within 20% of ATLAS [77] but uses fewer than 200 lines of Terra code. Our image processing language performs 2.3x faster than hand-written C and is comparable to the performance of Halide [61], an existing image processing language. Our implementation of a probabilistic programming language runs 5 times faster than existing implementations.

• To support the creation of robust active libraries, we also propose a new system for the runtime generation of high-performance types, called exotypes. We leverage the two-language design of Lua and Terra to blend the flexibility of types found in dynamically-typed programming languages with the speed of types from statically-typed low-level languages. To do this, we combine meta-object protocols, a technique popular in dynamically-typed languages, with traditional staged programming techniques. This process runs during Terra’s typechecking phase.

• We show how using exotypes makes it possible to create fast, concise, and composable libraries of types. We evaluate the use of exotypes in several performance-critical scenarios including serialization, dynamic assembly, automatic differentiation, and data layout. In the scenarios we evaluate, we show how we can achieve expressiveness similar to libraries written in dynamically-typed languages while matching the performance of existing implementations written in statically-typed languages. The added expressiveness makes it feasible to implement aggressive optimizations that were not attempted in existing static languages. Our serialization library is 11 times faster than Kryo (a fast Java serialization library). Our dynamic x86 assembler can assemble templates of assembly 3–20 times faster than the assembler in .

• To make the interactions between Lua and Terra precise, we provide formal semantics for their evaluation, and for the evaluation of exotypes that occurs during Terra’s typechecking. These semantics illustrate the issues and design choices that arise when using one language to meta-program another.

• We present a method of extending Lua’s syntax that allows user-defined languages to be embedded in Lua using the same approach we used to embed Terra. We show how this design allows the creation of DSLs with custom syntax, while also providing the benefits of shared lexical scoping and separate evaluation to DSLs.

• We describe the design choices used in our two-language design that make it easy to integrate it into existing applications and have multiple languages interoperate. We discuss how these decisions made some of our example applications easier to architect.

Terra is open source and available online at terralang.org. In addition to our implementation, we also provide a “getting started” guide, API reference, and example code. Our implementation of Terra is continually being updated with the intention of supporting the ongoing development of DSLs, including those presented in later chapters.

1.2 THESIS OUTLINE

Chapter 2 describes the challenges that arise when writing high-performance applications like DSLs and autotuners by examining a few designs used in practice and presents related work that proposes principled techniques for code generation. Chapters 3–5 describe Terra’s approach for generating code. Chapter 3 introduces the Terra language and shows how we use multi-stage programming to perform the tasks required to build portable high-performance applications. Chapter 4 describes our formal semantics for this system, and Chapter 5 presents example designs using this system that perform as well as or better than the previous state-of-the-art while being significantly simpler to create. This work is based on results published in [23] with contributions from J. Hegarty, P. Hanrahan, A. Aiken, and J. Vitek. Chapters 6–8 describe Terra’s approach for generating types. Chapter 6 describes our exotypes interface, which combines meta-object protocols with staged programming. Chapter 7 provides formal semantics for the property lookups that occur during typechecking exotypes. Chapter 8 shows example libraries of types built using exotypes that perform better than state-of-the-art libraries while being significantly more concise. This work is based on results published in [24] with contributions from D. Ritchie, M. Fisher, A. Aiken, and P. Hanrahan. Finally, Chapter 9 shows how we can extend the properties of our two-language design to include more languages in one ecosystem using our language extension mechanism. Chapter 10 describes the features of our two-language design that make it easy to integrate into existing applications and have multiple languages interoperate. Chapter 11 provides more details about how we implemented Terra. Chapter 12 concludes with a discussion of the contributions and by presenting some interesting pathways for future work.

CHAPTER 2 BACKGROUND

There have been two lines of work on using code generation to get higher performance: practical designs for active libraries or DSLs, and research on language designs and techniques for doing principled code generation. Here we examine some of the previous work in both of these areas. Our goal when creating the two-language design of Terra was to adapt the principled approaches from the literature to the actual patterns of DSL and library design seen in practice.

2.1 PRACTICAL DESIGNS OF LIBRARIES THAT DO CODE GENERATION

For frequently used domains, it is worth the cost of developing a DSL or active library even without code generation tools that make the task easier. We can use already existing tools in these areas to examine how developers currently structure these kinds of applications, and what challenges arise. The DSLs and autotuners presented here all share similarities in their architecture, and contain similar components:

1. A frontend that is used to specify the input problem. For DSLs, this will be a language, but for active libraries this may simply be a particular specialized version of a function call.

2. A set of transformations on the input that use domain knowledge to transform the program into high-performance code. In some systems, these transformations may be


simple, but in others these transformations might occur in many stages and use many different intermediate representations (IRs).

3. A code generator that takes the last IR produced by the transformations and emits code in an existing language. In the simplest designs, this may just write out strings of a programming language such as C.

4. A runtime that is used by the generated code. Often it is the case that the generated code will call out to already existing library code, for loading data files or other subtasks that do not benefit from program-specific optimizations.

To illustrate how these components are created, we can look at several different ap- proaches found in practice.

2.1.1 Ad hoc templates and scripts

Many libraries that do auto-tuning are built out of ad hoc scripts that do code generation by invoking offline compilers. A widely used example of this design is ATLAS, an autotuner for the linear algebra operations supported in BLAS [77]. While it includes support for all BLAS operations, it does most of its optimization on level 3 BLAS routines that are focused on matrix multiply. The architecture of ATLAS consists of a frontend that describes the desired operations (matrix-matrix multiply, matrix-vector, etc.) and the sets of data types (float, double, complex). From this high-level specification, it will apply transformations to optimize individual operations. For instance, ATLAS breaks down a matrix multiply into smaller operations where the matrices fit into L1 cache. Then, an optimized kernel for L1-sized multiplies is used for each operation. To create this kernel, ATLAS uses autotuning techniques to choose good block sizes, loop unrolling factors, and vector lengths to generate optimized code for the L1-sized kernel. After choosing a set of parameters to try, it will generate a C or assembly program using those factors, compile it, and then time it. Once the L1-sized kernel is generated, it is linked with a pre-existing library for handling the blocking and parallelization across multiple cores. While conceptually simple, it is hard to architect ATLAS using current tools. ATLAS is focused on achieving high performance, so it targets low-level languages such as C or x86

assembly. These languages provide fine-grained control of performance, making it more likely that ATLAS can find a high-performance solution. However, these languages lack support for code generation, forcing ATLAS to rely on external tools to create and build these programs. The result is a fairly complicated architecture — the frontend consists of Makefiles and scripts that assemble the different BLAS operators. These scripts run in different processes to actually do the code generation. Code generators themselves are written in a mixture of languages. Some are C programs that write text of other C programs, others are templates that use a combination of C-preprocessor macros and a custom templating language to stitch fragments of x86 assembly together. In addition to using many different technologies, there are many different processes running — the top level generator, individual template generators, C compilers, and runtime timing of generated code. Since each has a different memory space, transferring information between processes requires explicit serialization. The combination of different tools communicating across process boundaries makes these ad hoc designs difficult to maintain.
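The kernel structure that ATLAS searches over can be sketched in a few lines. The following is an illustrative Python sketch, not ATLAS's generated C: it shows the loop blocking whose parameter `nb` (the block size) is exactly the kind of value the autotuner generates code for, compiles, and times.

```python
def matmul_blocked(A, B, n, nb):
    """Multiply two n x n matrices (lists of lists), tiling all three
    loops so that each nb x nb block stays resident in L1 cache."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, nb):
        for j0 in range(0, n, nb):
            for k0 in range(0, n, nb):
                # Inner kernel on one L1-sized block; ATLAS emits a
                # specialized, unrolled version of this loop nest.
                for i in range(i0, min(i0 + nb, n)):
                    for j in range(j0, min(j0 + nb, n)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + nb, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

An autotuner in this style generates, compiles, and times the inner kernel for several candidate values of `nb` (and unrolling factors, vector widths, and so on), then keeps the fastest.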

2.1.2 Source-to-source autotuners

Autotuners like ATLAS only need to search over a few parameters such as blocking sizes and loop unrolling factors, making ad hoc techniques based on pre-processors feasible. Other autotuners search over a wider variety of fundamental algorithmic transformations. These systems do more complicated transforms and are often architected as source-to-source programs, where the resulting program is emitted as a string to a file, and a traditional offline compiler is used to compile the result. One area where this approach is popular is for discrete Fourier transforms (DFTs). The simple description of a DFT over a vector of length N has a running time of O(N^2), but can be decomposed into smaller DFTs in a way that results in a fast Fourier transform that is O(N log N). Multiple techniques exist for decomposing DFTs — Cooley-Tukey, Prime-factor, Rader, and Bluestein, among others — and the applicability of each depends upon the factorization of N. So the best approach can be very different across different values of N, and optimized FFT libraries generate different approaches for different values of N. FFTW [30] is one library that autotunes over different decomposition approaches. It uses a combined offline and online approach to getting high performance. For small sizes

it uses pre-generated FFT code that was autotuned offline, which they refer to as codelets. This code unrolls most loops and performs optimization over the resulting expressions. Larger sizes are “planned” dynamically for the particular machine they are running on using heuristics that invoke the fast pre-generated code. At this level recursive decompositions are chosen by examining the plan and invoking the right codelets. The bifurcation between online and offline optimization is reflected in the architecture of FFTW and the technologies it uses. The original version of FFTW performed codelet generation with a compiler written in OCaml, and then emitted C programs which were compiled with an external compiler and timed. The larger FFTs were then planned dynamically by a library written in C and linked into user programs. Like ATLAS, this design required moving data about the transforms through multiple technologies and processes — the OCaml compiler, C compilation and autotuning, and data structures to represent the plan in the FFTW C runtime. Further work on FFT optimization in the SPIRAL [60] framework gained better performance by decomposing the problem of FFT optimization into more stages. SPIRAL introduces an intermediate language, SPL, that describes decompositions of FFTs. It can be manipulated mathematically to optimize the code. In addition to invoking smaller FFTs, operators in SPL also need to shuffle or permute the data for correct decompositions. For the best memory performance, it is important that these shuffles are not performed in a separate pass over the data but are folded into the indexing of consumers instead. In FFTW this was achieved by unrolling the code and then performing simple straight-line optimizations to precompute most of the indexing. But this approach does not scale to larger sizes where everything is not unrolled. In the SPIRAL framework, SPL is high-level, and does not represent this indexing explicitly.
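The memory-performance concern here can be illustrated with a deliberately tiny sketch (plain Python, not SPL or FFTW's generated code; `perm` stands in for a data shuffle such as a bit-reversal, and doubling stands in for the consumer computation):

```python
def shuffle_then_consume(x, perm):
    # Two passes: materialize the permuted data, then consume it.
    # The temporary array costs an extra trip through memory.
    tmp = [x[perm[i]] for i in range(len(x))]
    return [2.0 * v for v in tmp]

def consume_fused(x, perm):
    # One pass: the shuffle is folded into the consumer's indexing,
    # so no temporary array is ever written.
    return [2.0 * x[perm[i]] for i in range(len(x))]
```

Both functions compute the same result, but the fused form avoids the intermediate array; recovering it automatically requires reasoning about index expressions rather than whole-array shuffles.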
So SPIRAL introduces a lower-level intermediate representation, Σ-SPL, to explicitly represent these indexing operators [29]. At this level it is possible to perform optimizations such as loop merging and array-index simplification that were obscured at the level of SPL, but would not typically be optimized by low-level compilers. The SPIRAL design illustrates a common feature of many of these high-performance optimization libraries, where optimizations are expressed as transformations on multiple

levels of intermediate representations — SPL, Σ-SPL, and low-level code. The final representation bears little resemblance to the original high-level description.
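All of these systems start from the same mathematical structure: the recursive decomposition of the DFT. As a point of reference, a minimal radix-2 Cooley-Tukey sketch in Python (illustrative only, for power-of-two N; nothing like FFTW's or SPIRAL's generated code) shows how the O(N^2) definition becomes O(N log N):

```python
import cmath

def fft(x):
    # Recursive radix-2 Cooley-Tukey: split into even/odd halves,
    # solve each recursively, then combine with twiddle factors.
    n = len(x)
    if n == 1:
        return x[:]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The autotuners discussed above search over which of many such decompositions (and their schedules) to apply at each recursion level, rather than fixing one as this sketch does.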

2.1.3 Statically-compiled DSLs

So far we have looked at the design of autotuners, where the input is a description of a function call that we want to optimize. Domain-specific languages are also architected using similar techniques. Liszt [22] is a DSL that we developed for doing physical simulation. Liszt is implemented with a frontend written in Scala, using cross-compilation to emit C++ code that makes calls into C++ runtimes for clusters, SMPs, or GPUs. Scala was chosen for the frontend because of its functional language features and its ability to easily declare tree-like datatypes, which made it a productive language for writing compiler analyses. However, we needed to emit C++ to achieve high performance in the runtime, and also because Liszt needed to use already existing high-performance interfaces such as MPI or CUDA, which are primarily accessed through the C ecosystem. We originally developed Liszt as a case study in DSL design, so it is worth looking at what it does in more detail, since it illustrates the issues that arise when designing a DSL using static compilation. Computation in Liszt is expressed on a mesh of topological elements such as faces and vertices. Work is normally done in parallel across each element of the mesh using information from a local stencil of neighboring elements. Physical simulation can be done on a mesh with regular topology (such as a rectilinear grid), where the dependencies between elements can be expressed as affine computations of the element coordinates. Liszt is more general: it supports meshes of arbitrary topology and internally uses graph data structures to track this topology. This model of computation has large amounts of mostly-parallel work since it can operate over all mesh elements at once. But computation on each element has dependencies on neighboring elements in the stencil. Current state-of-the-art libraries that run these computations on large clusters of machines use an approach based on domain decomposition.
Each machine is given a partition of the mesh elements that it is responsible for simulating. Along the border or halo of these partitions, the stencil of individual elements will require data from other machines. The management of what data to exchange between partitions

// Fields are maps from an element of the mesh to a value.
// The Position field stores a vector of three floats at each vertex.
val Position = FieldWithLabel[Vertex,Float3]("position")
val Temperature = FieldWithConst[Vertex,Float](0.f)
val Flux = FieldWithConst[Vertex,Float](0.f)
val JacobiStep = FieldWithConst[Vertex,Float](0.f)

// for statements operate over the set of mesh elements in parallel
// here we operate over all the vertices in the mesh to set
// initial conditions
for (v <- vertices(mesh)) {
  if (ID(v) == 1)
    Temperature(v) = 1000.0f
  else
    Temperature(v) = 0.0f
}
// Perform Jacobi iterative solve
var i = 0;
while (i < 1000) {
  for (e <- edges(mesh)) {
    // Liszt includes built-in topological operators such as 'head(e)'
    // that look up neighboring topology
    val v1 = head(e)
    val v2 = tail(e)
    // Field values of neighboring topology can be read or written:
    val dP = Position(v2) - Position(v1)
    val dT = Temperature(v2) - Temperature(v1)
    val step = 1.0f/(length(dP))
    // Commutative and associative reductions like +=
    // can be performed in parallel across the mesh:
    Flux(v1) += dT*step
    Flux(v2) -= dT*step
    JacobiStep(v1) += step
    JacobiStep(v2) += step
  }
  for (p <- vertices(mesh)) {
    Temperature(p) += 0.01f*Flux(p)/JacobiStep(p)
  }
  for (p <- vertices(mesh)) {
    Flux(p) = 0.f;
    JacobiStep(p) = 0.f;
  }
  i += 1
}

Figure 2.1: This Liszt example calculates the temperature equilibrium due to heat conduction on a grid using Jacobi iteration.

is tricky because it depends on the stencil of the particular algorithm being run and the particular topological relationships in the mesh actually being used. Most pre-existing solutions discover the halo region using a hand-written analysis of the topology of the mesh. This approach is prone to error since this analysis has to be kept up to date with the code. Since changing the stencil requires changing both the analysis and the simulation code, it is difficult to prototype simulations. It also makes it difficult to experiment with different methods of parallelization that might be more effective than the domain decomposition approach, since these would require different analyses. The approach taken in Liszt is to represent the code at a high enough level of abstraction that the stencil of a piece of Liszt code can be discovered using program analysis. An example of Liszt code is shown in Figure 2.1. To expose the parallelism in the mesh, it provides a for statement that operates in parallel over sets of mesh elements. It also builds information about the mesh into the language, using built-in operators to extract local topology such as the vertices on either side of an edge (head(e) and tail(e)). Since all data is accessed using these built-in topological operators, it is possible for Liszt to extract the data-dependencies of a piece of code. The Liszt compiler works by extracting this stencil using program analysis and then uses it to automate parallelization via domain decomposition. Since the stencil analysis is abstract, it can also be reused for other parallelization techniques. For instance, Liszt’s GPU runtime uses the results of the stencil analysis to divide mesh elements into sets (“colors”) that do not have interfering writes, allowing them to be run on the GPU in parallel. Analysis of stencils in Liszt is dependent on both static information (the code being run) and dynamic information (the mesh it is being run on).
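The coloring idea can be illustrated with a small greedy sketch in Python (a simplification for illustration only, not Liszt's actual algorithm): two edges that share a vertex would perform reductions on the same field locations, so they must receive different colors.

```python
def color_edges(edges):
    # Greedily assign each edge a color such that no two edges sharing a
    # vertex get the same color; all edges of one color can then run their
    # += reductions in parallel without interfering writes.
    color = {}
    for e in edges:
        used = {color[o] for o in color if set(o) & set(e)}
        c = 0
        while c in used:
            c += 1
        color[e] = c
    return color

mesh_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
coloring = color_edges(mesh_edges)
# edges that share a vertex always end up in different colors
assert all(coloring[a] != coloring[b]
           for a in mesh_edges for b in mesh_edges
           if a != b and set(a) & set(b))
```

The GPU runtime then launches one parallel pass per color, which is why the quality of the stencil analysis feeding this partitioning matters.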
Since Liszt is an offline compiler, the analysis of the stencil needs to be split across both phases. In the compiler, we generate new code that computes a conservative estimate of the actual stencil by executing both sides of divergent control flow, as well as relying on language restrictions that limit the stencil to a constant size relative to the code written. At runtime, this code is run across the mesh in a recording mode that discovers the stencil. The original design of Liszt has major limitations. While it makes it possible to easily prototype the compiler and achieve high performance, the need for multiple processes to run the compiler and the runtime makes it more difficult to integrate into existing systems.

Unlike autotuners, there is not a single API that can be distributed like a library. In a DSL, users are expected to provide new programs to run. DSLs themselves are never entire programs so they need to be integrated into larger applications. This means that the user of the DSL now needs to know how to run the DSL compiler, and then link the result into their application. The offline compilation of Liszt strongly couples the design of the Liszt compiler and runtime to the way that it is exposed to other applications. It complicates the stencil analysis transform, forcing us to separate it into a conservative static part and a dynamic analysis. The user of the DSL is then responsible for making sure the results of the compiler were plumbed into the runtime, which runs in a different process. Re-architecting the system to run the stencil-generating analysis dynamically might improve performance and simplify the plumbing for the application developer, but would require compilation during runtime.
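The halo computation at the heart of the domain-decomposition approach can be made concrete with a short sketch (illustrative Python, not Liszt's implementation; the function name is invented): given the set of vertices a machine owns and the mesh edges, the halo is the set of unowned neighbors whose values the stencil will read.

```python
def halo(owned_vertices, edges):
    # vertices on other machines that this partition's stencil touches,
    # and whose values must therefore be exchanged each step
    owned = set(owned_vertices)
    h = set()
    for a, b in edges:
        if a in owned and b not in owned:
            h.add(b)
        if b in owned and a not in owned:
            h.add(a)
    return h

# a square mesh 0-1-2-3; this machine owns vertices 0 and 1
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
assert halo([0, 1], square) == {2, 3}
```

The point of Liszt's stencil analysis is that this neighbor relation is derived from the program automatically, rather than maintained as a hand-written analysis that must be kept in sync with the simulation code.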

2.1.4 Dynamically-compiled DSLs

To avoid the limitations of static compilation that complicate deployment, some DSL designs choose to generate code dynamically in the same process. In fact, the most widely adopted DSLs, such as SQL for database processing and OpenGL/Direct3D for graphics processing, are designed as dynamically-compiled languages. Dynamic compilation simplifies deployment in two ways. First, since there is only one CPU process involved, it can be linked into an existing application like a library. This facet is especially important for active libraries, since it makes it possible to use an active library as a drop-in replacement for an offline one. Second, since compilation can occur at any time, it gives the library freedom to change when analyses are performed, potentially specializing the program to dynamically provided information. Structures that would more easily be handled by the compiler in a two-stage design, such as fields in Liszt, can now be handled dynamically, making the DSL more flexible to use in an existing application. Decisions about what code to generate (CPU or GPU, vectorized or scalar, AMD or Intel) can now be made at runtime rather than having to pre-compile all possible combinations. Dynamic compilation can make deployment easier, but actually compiling code dynamically is more complicated. Approaches such as source-to-source compilation are harder

Written directly in C:

float solve(float a, float b, float c) {
  return (-b + sqrt(b*b - 4*a*c)) / (2 * a);
}

Programmatically generated using LLVM:

Value* float_mul = B.CreateFMul(float_b, float_b);
Value* float_mul1 = B.CreateFMul(float_a, const_float_3);
Value* float_mul2 = B.CreateFMul(float_mul1, float_c);
Value* float_sub3 = B.CreateFSub(float_mul, float_mul2);
Value* float_call = B.CreateCall(func_sqrtf, float_sub3);
Value* float_add = B.CreateFSub(float_call, float_b);
Value* float_div = B.CreateFMul(float_add, const_float_4);
Value* float_mul4 = B.CreateFMul(float_div, float_a);
B.CreateRet(float_mul4);

Figure 2.2: Libraries to generate code such as LLVM allow arbitrary code to be created programmatically, but are often much more verbose than writing the code directly. In this example, the skeleton to set up the surrounding LLVM function that defines variables such as float_b is another 30 lines.

to use. In this case, the library must write the source, internally spawn an instance of the compiler to write a dynamic library, and then link the dynamic library back into the program. Since compilation is occurring dynamically, the overhead of using a separate process becomes a concern. Additionally, the generated functions need to be accessed using API calls to the dynamic loader. Dependencies on already loaded code need to be explicitly managed by the DSL, and expressed to the compiler. Dynamic library loading on current operating systems was not designed with frequent JIT compilation in mind, making it necessary for the DSL to deal with issues such as providing globally unique names for all generated functions. Most dynamically compiling DSLs forgo source-to-source compilation and use another means to JIT compile code. Widely used DSLs such as OpenGL normally use custom compilation frameworks designed by large teams. Others, such as the Halide DSL [61] for image processing, use library-based compilers such as LLVM [49]. These compilers help manage the details of JIT compiling and linking code. But since they are libraries, actually expressing the generated code is more verbose, as illustrated in Figure 2.2. Because of this verbosity, runtime library code that does not need to be generated dynamically is typically written directly in a low-level language. This split makes it more difficult to move code

between runtime and compiler, a task that occurs commonly when prototyping whether domain-specific optimizations are effective. Furthermore, it is still desirable to express transformations using higher-level languages. Halide, for instance, was originally written in OCaml to make prototyping easier, meaning that three languages were used to write it — OCaml for the transformations, LLVM for generated code, and C/C++ for the high-performance runtime.
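The verbosity gap illustrated in Figure 2.2 is not specific to LLVM; any builder-style API exhibits it. As a hedged illustration, the same quadratic-formula expression built with Python's ast module takes roughly one builder call per operation, compared to one line of source:

```python
import ast, math

# Written directly as source text:
src = "(-b + math.sqrt(b*b - 4*a*c)) / (2*a)"

# The same expression assembled one node at a time through a builder API,
# mirroring the verbosity of the LLVM example in Figure 2.2.
def name(n):
    return ast.Name(id=n, ctx=ast.Load())

b2   = ast.BinOp(name('b'), ast.Mult(), name('b'))
fac  = ast.BinOp(ast.BinOp(ast.Constant(4.0), ast.Mult(), name('a')),
                 ast.Mult(), name('c'))
disc = ast.BinOp(b2, ast.Sub(), fac)
sq   = ast.Call(ast.Attribute(name('math'), 'sqrt', ast.Load()), [disc], [])
num  = ast.BinOp(ast.UnaryOp(ast.USub(), name('b')), ast.Add(), sq)
den  = ast.BinOp(ast.Constant(2.0), ast.Mult(), name('a'))
tree = ast.Expression(ast.BinOp(num, ast.Div(), den))
code = compile(ast.fix_missing_locations(tree), '<ast>', 'eval')

env = dict(a=1.0, b=-3.0, c=2.0, math=math)
assert eval(code, env) == eval(src, env)  # both give the root 2.0
```

This is the trade-off the section describes: the builder manages compilation details for you, but expressing even a one-line expression becomes a dozen calls.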

2.1.5 Summary

A few key points are present across all of the designs used in practice:

1. Almost all techniques in practice target low-level high-performance code at the level of C/C++, or target the assembly directly.

2. Many use high-level languages (OCaml, Scala, Makefiles) to manage their compiler transforms.

3. Every DSL and autotuner does a sophisticated set of transformations across multiple intermediate representations. The original program, or some subset of it, is not simply transliterated into code.

4. Dynamically compiling code can reduce the complexity of deployment and data propagation, but many designs use static compilation since dynamically generating low-level code efficiently requires verbose libraries such as LLVM.

2.2 EXISTING TECHNIQUES FOR CODE GENERATION

To support how programmers design DSLs and active libraries, a tool should make it possible to (1) use a low-level language for generated code, (2) express compiler transforms in a high-level language, (3) substantially transform input code, and (4) perform code generation dynamically. Currently existing language-based approaches for code generation in the literature accomplish some but not all of these goals.

2.2.1 LISP

One of the oldest designs for generating code dynamically is found in the LISP [52] language and its derivatives. In LISP, both code and data are expressed as nested lists of elements. The syntax of the language makes these lists explicit using s-expressions based on parentheses. Since the syntax closely resembles the data structure of the code itself, it is easy to reason about code that generates code. Code is just another form of data — everything is “just lists.” A language like LISP whose syntax for code resembles its data structures is said to be homoiconic. LISP programs use homoiconicity to make code generation easier. Standard list manipulation functions can construct code, and any list can be used as code by evaluating it using the eval form. It is easy to inspect the lists that represent code, and their simple syntax means that novice users can understand how generated code may evaluate. By default a LISP interpreter runs eval on the file being executed. It may also run a macro processor beforehand. This macro processor looks for special constructs that define macros, functions that operate on unevaluated lists before evaluation. It then expands these macros by running the code. Many constructs in LISP distributions are implemented as macros rather than built-ins using this mechanism. The LISP model has been used in the past to create DSLs. In particular, Racket [75] provides an interface to the static semantics of the language using macros. Using this interface, they implement a typed variant of Racket, as well as other DSLs. The macro system is used to translate typed Racket to standard Racket with a few extensions to support unchecked access to fields. LISP’s simple model for code generation is powerful, but it is not ideal for generating high-performance code since the language itself is very high-level. In the general case, it is dynamically typed and garbage collected.
Additionally, since LISP can generate more LISP code, it is possible that the generated LISP code might decide to create and evaluate more code. Even when implemented well, these features may introduce unpredictable performance overheads, making it hard to reason about lower-level details such as cache behavior.
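The "everything is just lists" model can be imitated outside LISP. The following is a hedged Python sketch, not LISP itself: programs are nested tuples that can be built, inspected, and evaluated like any other data.

```python
import operator

# A miniature "code is data" evaluator: a program is a nested tuple
# (op, arg1, arg2), mimicking a LISP s-expression.
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def ev(expr, env):
    if isinstance(expr, tuple):
        f, a, b = expr
        return OPS[f](ev(a, env), ev(b, env))
    if isinstance(expr, str):
        return env[expr]          # variable reference
    return expr                   # literal value

# Code can be constructed with ordinary data manipulation, then evaluated:
prog = ('+', 1, ('*', 'x', 3))
assert ev(prog, {'x': 4}) == 13
```

Because the program is plain data, generating or transforming it requires no special machinery; this is exactly the convenience, and the performance cost, that the surrounding discussion describes.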

; A macro that uses an internal temporary variable 'temp':
> (defmacro addone (x)
    `(let ((temp 1))
       (+ ,x temp)))
ADDONE

; The macro appears to work fine for most expressions
> (let ((a 3)) (addone a))
4

; But it fails when the temporary name shadows variable declarations used in the input:
> (let ((temp 3)) (addone temp))
2

; We can fix this problem by generating a unique name for the temporary with gensym
> (defmacro addone (x)
    (let ((temp (gensym)))
      `(let ((,temp 1))
         (+ ,x ,temp))))

Figure 2.3: Simple meta-programming systems such as that found in LISP can lead to surprising errors when a variable introduced in the implementation of a macro shadows a variable used in an argument to the macro. In the above example, the temp variable definition in the addone macro shadows the temp variable reference in the calling code. These macro systems are called unhygienic since arbitrary naming choices in the macro’s implementation can change the semantics of the caller’s code. In these systems, correctly implemented macros use uniquely generated names (using the gensym function) to avoid accidental capture.

Some LISP derivatives [75, 65] have added escape hatches to these behaviors. For instance, Common LISP allows a programmer to forgo dynamic type checks with the right annotations. But even if one programmer writes code without these high-level features, libraries may still rely on them, making it hard to write large amounts of code without them. Furthermore, features like dynamic typing, garbage collection, and code generation require more effort to port to architectures such as GPUs and small embedded CPUs. Another issue arises with homoiconic forms. When variables are represented as simple identifiers, it is easy to generate code where the value of a variable is accidentally captured by a variable definition of the same name introduced by generated code. Figure 2.3 shows an example in LISP. Ideally, visual inspection of the code should make the relationship between defined and used variables clear. So some LISP derivatives, Scheme in particular, define macro expansion in a way that is hygienic [47]. An API for dealing with identifiers uses the special forms syntax-rules and syntax-case to match on identifiers and generate code with them in a way that avoids accidental capture.

Macro hygiene solves the variable capture problem, but to generate code, the programmer needs to know how to use an API for identifiers. This API is now one level removed from the simple homoiconic forms, reducing the benefit of having a simple data structure for code in the first place.
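The gensym discipline itself is easy to implement. A small Python sketch of the idea (the names addone_src and gensym are invented for illustration) shows how generating a fresh name for each temporary prevents the accidental capture demonstrated in Figure 2.3:

```python
import itertools

_counter = itertools.count()

def gensym(prefix='tmp'):
    # Produce a name guaranteed not to collide with any handwritten name,
    # the same trick LISP's gensym uses to keep macros hygienic.
    return '%s_%d' % (prefix, next(_counter))

def addone_src(arg_expr):
    # Generate code that binds a temporary and adds it to arg_expr;
    # because the temporary's name is fresh, it cannot shadow names
    # used inside arg_expr.
    t = gensym()
    return '%s = 1\nresult = (%s) + %s' % (t, arg_expr, t)

env = {'temp': 3}
exec(addone_src('temp'), env)      # 'temp' is NOT captured
assert env['result'] == 4
```

The cost, as the text notes, is that the generator must now thread these synthesized names through its output instead of simply writing down the code it means.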

2.2.2 Partial evaluation

Rather than write code generators explicitly, another approach to creating high-performance code is to specialize generic programs using an automated process. Approaches based on partial evaluation make some decision about which inputs to a generic function should be fixed to a particular value and then attempt to optimize the code by pre-computing any values possible ahead of time without modifying the semantics of the program [44, 43]. In this model, optimizing an operation like a matrix multiply or an FFT might work by writing a generic version of the algorithm that works over different block sizes or problem sizes, followed by an automated specialization that makes the block size or problem size constant. Partial evaluation has some advantages over direct code generation for certain tasks. Its model for directing optimization is very lightweight and high-level. The user only needs to specify what values should be specialized and what should remain variable at runtime. It may be simpler to reason about the semantics of a program if it contains no explicit code generation. If the input is type-safe, then we are guaranteed that the specialized code generated via partial evaluation is as well, something that is difficult to guarantee if generating arbitrary code directly. The partial evaluator can also exist as a separate pass from the language itself. The language being specialized does not need any support for manipulating code through data structures like lists in LISP. This allows partial specializers to work on low-level languages where writing code generators would be tedious. In fact, many partial specialization frameworks have been developed for C [18, 35]. Just like code generation in LISP, the process of partial specialization can be made to run dynamically as well. For instance, DyC [35], a partial specializer for C, allows specialization to occur at runtime and uses dynamic compilation of templates to optimize performance.
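As a tiny concrete illustration of this model, the sketch below hand-rolls what a partial evaluator would do automatically: given that the exponent of a generic power function is static, the loop can be unrolled away entirely. (Illustrative Python; real systems like DyC derive this from annotations rather than a hand-written specializer.)

```python
def power(b, e):
    # generic version: both arguments are dynamic
    r = 1.0
    for _ in range(e):
        r *= b
    return r

def specialize_power(e):
    # "Partially evaluate" power with e static: the loop disappears and
    # only a chain of multiplications over the dynamic argument remains.
    body = ' * '.join(['b'] * e) if e > 0 else '1.0'
    src = 'def power_%d(b): return %s' % (e, body)
    ns = {}
    exec(src, ns)
    return ns['power_%d' % e]

pow3 = specialize_power(3)          # behaves like lambda b: b * b * b
assert pow3(2.0) == power(2.0, 3) == 8.0
```

The appeal is that only the specialization decision (e is static) had to be stated; the difficulty, discussed below, is controlling what an automatic system actually does with such decisions.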

It may seem like some forms of code generation, such as generic compilers, would not be expressible using partial evaluation. But a common approach is to use partial evaluation to turn an interpreter into a compiler [31]. The interpreter reads an intermediate representation of the program and evaluates the behavior described in the program. Partial evaluation is then used to specialize on the particular input program, which effectively creates a compiler out of an interpreter. The disadvantage of partial evaluation is that it is not always easy to control, or even understand, what optimizations will be done. An offline partial evaluation framework includes some form of binding-time analysis, which marks which expressions are constant and which are runtime-dependent given some constant input values [17]. This analysis must be conservative. The partial evaluator will not know if some expression will terminate and typically must bound evaluation to ensure that the partial evaluator itself will terminate. The process is also frequently path-insensitive, so conservative decisions are made about what can be evaluated when different branches of control meet. The limitations on what can be evaluated are analogous to those found in traditional compiler optimization passes such as constant propagation and common subexpression elimination. Even in online partial evaluation, where all specialization is done only when the input values are known, it might still be worthwhile to speculatively execute expressions that are only evaluated under some paths if they would reduce to simpler computations. A compiler that is generated from an interpreter via partial evaluation may suffer from these limitations. Compiling an interpreter loop will inline the functions that run individual operations on each IR node. These operations in the interpreter might refer to a virtual stack, register set, or other data structure used to implement the interpreter.
In addition to simply evaluating the interpreter loop, a good partial evaluator needs to be able to remove these data-structures as well, promoting accesses from the virtual stack into actual machine registers for local values. In this case, the virtual stack is only partially known. Some of it will become the actual function-call stack for the compiled program, which is created dynamically. In these cases it can be difficult to perform accurate alias analysis to know the dynamic and static parts of a data-structure. The result is that a programmer does not know exactly what the code would look like after compilation.
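The interpreter-to-compiler path can be sketched by hand (illustrative Python; the point of a real partial evaluator is that it derives this automatically). Specializing the dispatch loop of a tiny interpreter on a fixed instruction sequence leaves only straight-line code:

```python
def interp(prog, x):
    # a tiny interpreter: each instruction dispatches dynamically
    acc = x
    for op, arg in prog:
        if op == 'add':
            acc = acc + arg
        elif op == 'mul':
            acc = acc * arg
    return acc

def compile_prog(prog):
    # Specializing interp on a known 'prog' (the first Futamura
    # projection, done by hand): the loop and the dispatch disappear,
    # leaving straight-line arithmetic over the dynamic input x.
    lines = ['def f(x):', '    acc = x']
    for op, arg in prog:
        sym = {'add': '+', 'mul': '*'}[op]
        lines.append('    acc = acc %s %r' % (sym, arg))
    lines.append('    return acc')
    ns = {}
    exec('\n'.join(lines), ns)
    return ns['f']

prog = [('mul', 3), ('add', 1)]
f = compile_prog(prog)
assert f(5) == interp(prog, 5) == 16
```

The residual function here still threads everything through a single accumulator; removing the interpreter's virtual state entirely, as the text explains, is exactly where automatic partial evaluators struggle.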

The uncertainties that result from performing partial evaluation on programs are not appropriate for high-performance examples like those we examined in the previous section, since they often rely on precise layout of data and execution order to achieve good memory performance. Furthermore, removing the need to do explicit code generation becomes less useful when programs already have to do substantial analysis and transformations of code to get high performance. The DSLs we described in Section 2.1 require major transformations before code emission. Regardless of the technique to express generated code, the majority of the system will still require the same effort in reasoning about code transformations as a normal compiler during its optimization. Once most of the system is compiler-like in its functionality, it becomes awkward to use partial evaluation just to generate the code.

2.2.3 Multi-stage programming

Part of the motivation for multi-stage programming was addressing the lack of control in partial evaluation systems [73]. Rather than allow the entire program to be partially evaluated based on a few directives about what is static and dynamic, multi-stage programming uses specific annotations that indicate when an expression should be evaluated. These annotations can be made at the level of expressions or at the level of types. MetaML [73] and MetaOCaml [72] provide expression-based annotations for the ML family of languages. A quote operator constructs an unevaluated fragment of ML code, while an escape operator introduces a hole in that fragment that will be filled in by evaluating the code in the escape at an earlier stage. Figure 2.4 shows an example of the operators in practice. In this way, multi-stage programming is similar to constructing code in LISP. In LISP, code was created by manipulating lists using standard operators. In multi-stage programming, code is constructed and manipulated using built-in operators. In multi-stage programming, staging annotations are viewed as a way to control the optimization of code. Since high-performance code is often found through experimentation, there is a desire to make staging decisions more modular so different ways of staging code can be explored. Deciding whether to unroll a loop is one example of such a choice. In expression-based staging, the choice of whether to unroll a loop is encoded in the loop expression explicitly. A different instance of the loop needs to be written for the unrolled case. To make the code more modular, there has been work on staging that is directed

(* a quote operator < > constructs a fragment of OCaml code. This form is
   analogous to creating a string literal, except we are constructing code
   rather than strings. In this case the code when run will produce an int,
   so it has type ‘‘code of int’’, or int code *)
val code = <1 + 2>;

(* an escape operator ~ splices one fragment of OCaml code into another.
   It is analogous to string interpolation, where another fragment of code
   is inserted into a template *)
val code2 = <3 + ~code>

(* These can be combined together to write functions that generate code.
   Here we show an example that generates specialized power functions,
   b^e, for a particular exponent *)
fun exp (b, e) =
  if e = 0 then <1.0>
  else <~b *. ~(exp (b, e - 1))>

(* run executes the staged code fragment, lowering it into the current
   stage. Running a function is analogous to compiling the code *)
val pow3 = run <fn b => ~(exp (<b>, 3))>

Figure 2.4: Multi-stage programming in MetaOCaml. Operators create quotations of code, and escapes allow the programmer to stick code together through interpolation.

by annotations on types rather than expressions, such as lightweight modular staging in Scala [62]. In these frameworks, the type of the collection being looped over determines whether the loop should be unrolled or not. The programmer can then expose the choice of whether or not to unroll the loop as a type parameter. In both the type-directed and expression-directed cases, staged programming focuses on accelerating existing programs by adding annotations and making it as easy as possible to change staging decisions. These approaches have been applied to improve the performance of code. For instance, Carette investigates staging of Gaussian elimination in MetaOCaml [10], while Cohen et al. investigate applying MetaOCaml to problems in high-performance computing like loop unrolling and pipelining [16]. For the purposes of creating high-performance DSLs and active libraries this focus has some limitations. The desire to switch what is staged and what is not staged encourages the staged language and the staging language to be the same (homogeneous staged programming). Attempts at low-level staged programming such as the `C language [58] can generate low-level code, but also require the generators for that code to be written at a low level as well.
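The staged power function of Figure 2.4 can be mimicked with strings standing in for quotations — a deliberately crude Python sketch of the staging idea, with none of MetaOCaml's hygiene or type guarantees. A "quotation" becomes a source string and an "escape" becomes string interpolation, exactly the analogy the figure's comments draw:

```python
# quote ~ string literal; escape ~ interpolation of one fragment into another
code = '1 + 2'
code2 = '3 + (%s)' % code

def exp(b, e):
    # staged power: returns *code* computing b**e rather than its value
    return '1.0' if e == 0 else '(%s) * (%s)' % (b, exp(b, e - 1))

# "run": compile the staged fragment into the current stage
pow3 = eval('lambda b: %s' % exp('b', 3))
assert pow3(2.0) == 8.0
assert eval(code2) == 6
```

Unlike MetaOCaml, nothing here checks that the spliced strings form well-typed (or even well-formed) code before they are run, which is precisely the property the statically-typed staging languages discussed below are designed to guarantee.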

The DSLs we examined in Section 2.1, in contrast, explicitly used different languages to write program transformations and exert control over performance. A small amount of work has looked at heterogeneous multi-stage programming to address this problem. MetaOCaml can use an approach called offshoring, where a subset of the original language can be cross-compiled to a lower-level C language. For instance, Eckhardt et al. propose implicit heterogeneous programming in OCaml with a translation into C [25]. This approach limits the low-level language to a subset of the original language, and it may be difficult to understand what is contained in that subset. For instance, in the system of Eckhardt et al. the type language is limited to primitive types and arrays. Other heterogeneous systems exist as well. For instance, MetaHaskell is an extension of Haskell for heterogeneous meta-programming that supports embedding new object languages [51]. Another limitation arises from the focus on type-correctness. Like partial evaluation, most multi-stage programming frameworks, including MetaML, MetaOCaml, and MetaHaskell, statically ensure that the program is type-correct. In the case of explicit staging annotations, this means both that normal code is type-safe and that any generated code that might possibly be created during staging is also type-safe. Since types need to be checked statically, it is more difficult for these languages to generate new types using staging. Nevertheless, we have found that generated types can make it easy to create concise active libraries.

2.3 CONCLUSION

In this chapter, we have seen that practical DSL and active library development uses high-level languages to generate high-performance low-level code. These libraries perform sophisticated transformations and analysis of intermediates, and their designs are simplified by dynamic compilation. Each of the principled approaches to code generation—LISP, partial evaluation, and multi-stage programming—supports only some of these design patterns. The LISP approach provides a simple model for generating code, but only allows the generation of high-level code, limiting performance. Furthermore, the model that all code is simply lists of symbols lacks hygiene, and adding hygiene takes away some of its simplicity. Partial evaluation approaches like DyC [35] make it possible to express

optimizations even in low-level languages such as C, but actually controlling what code gets generated can be difficult. It is hard to know what quality of code will result after partially evaluating an interpreter into a compiler. Multi-stage programming provides the control over optimization that partial evaluation lacks. It also has simple operators, and since those operators work at the level of code fragments rather than simple lists, it is able to preserve lexical scoping by default. So far, however, staged programming has primarily been done purely in high-level languages or purely in low-level languages, and for most work the focus has been type-correctness of staged code rather than raw performance of generated code. The design that we propose for Terra extends the techniques in previous work so that they follow the way programmers create current designs in practice, which leads to a different design from previous work. We explicitly use two separate languages for the compiler and the generated code. We use multi-stage programming operators to ensure the programmer has control over what code is produced. And we architect the system so that code generation can be performed dynamically, to reduce the complexity of deploying code generators.

CHAPTER 3
GENERATING CODE

As we saw in the previous chapter, practical DSLs and autotuners use high-level languages to generate high-performance low-level code. We support this approach by explicitly using a two-language design. A high-level language, Lua, is used to write transformations. The features of Lua make it easy to manage the process of code generation. Lua has automatic memory management, simplifying the process of storing intermediate representations, which often take the form of large self-referential graphs. Higher-order functions can be used to optimize intermediate representations or represent other transformations. Terra itself is a low-level language that serves as the target for code generation. The low-level nature of Terra makes it possible to achieve high performance. It includes ways to explicitly lay out memory and use hardware vector instructions to maximize compute throughput. The approach of supplementing dynamically-typed languages with a high-performance language has been used successfully before. For instance, Cython is an extension to the Python language that allows the creation of C extensions while writing in Python’s syntax [5], and Copperhead supplements Python with a vector-style language that can run on GPUs [11]. But these approaches only allow the programmer to write high-performance code directly ahead of time, rather than dynamically generating code at runtime. We support the generation of code at runtime by incorporating multi-stage programming operators in Lua that allow it to stage Terra code in a principled way. The staged programming of Terra provides interoperability between the compiler, generated code, and runtime of a DSL. We will see that many higher-level language features, such as namespaces, templating, or autotuning, simply arise from using Lua as the environment for creating Terra code.


In contrast to partial evaluation, the explicit staging operators make it clear what Terra code is being produced, giving clear control over performance. Unlike most multi-stage programming techniques, we focus on making explicit generation of next-stage code easier, rather than making it possible to have modular annotations. This is because we believe most existing applications that get high-performance will need significant transformation even before code generation, making the modularity of annotations less important. We also take a unique approach to typechecking. In LISP all code is dynamically-typed. In multi-stage programming, all code, generated and static, is statically-typed. For Terra, the top-level code is dynamically typed like LISP because Lua itself is a dynamically-typed language. This dynamic typing gives the programmer the freedom to generate whatever code and types are desired at runtime. Then, multi-stage programming operators are used to generate the low-level Terra language. Terra does contain types, which will be checked when Terra code is compiled (during Lua evaluation). This design allows the generation of low-level code, and provides type errors before that code is run. Finally, despite being different languages, we run both Lua and Terra in the same process, allowing each language to call functions in the other. We introduce Terra and Lua by example, starting with simple functions. We then show how the multi-stage programming operators work in Lua and Terra, and use this infrastructure to show how we can build and optimize a simple DSL for image processing. We then discuss design decisions that enable the concise generation of low-level code for building DSLs. A formal definition of the semantics for a simplified version of the languages is presented in the next chapter.

3.1 WRITING CODE IN TERRA

At the top level, a program evaluates as Lua code. For instance, a programmer may define a function and print the result:

   function min(a, b)
     if a < b then return a
     else return b end
   end
   print(min(1, 2)) -- prints '1'

We augment the normal set of Lua statements and expressions with special constructs that create Terra functions, types, variables, and expressions. The terra keyword introduces a new Terra function and can appear where a Lua expression appears:

   terra min(a : int, b : int) : int
     if a < b then return a
     else return b end
   end

Terra functions are lexically-scoped and statically-typed, with parameters and return types explicitly annotated. Terra functions can also be called directly from Lua. Terra entities (functions, types, variables, and expressions) are first-class Lua values. So, for example, after we define the min Terra function, the Lua variable min's value is a Terra function definition. That definition can be used like any other value in Lua. It can be passed to and returned from functions or stored in Lua data structures. Terra is also backwards-compatible with C, a feature we will use in our DSLs and which is discussed further in Chapter 10:

   std = terralib.includec("stdlib.h")
   terra main()
     std.printf("hello, world\n")
   end
   main()

The Lua function includec imports the C functions from stdlib.h. The programmer can use the fact that Terra entities are first-class to organize Terra code. The function terralib.includec is itself a good example of this behavior. It creates a Lua table, an associative map. It then fills the table with Terra functions that invoke the corresponding C functions found in stdlib.h. In Lua, the expression table.key is syntax sugar for table["key"]. In the above example, the call to std.printf will resolve to C's printf, and since std is a Lua table, Terra will resolve the lookup during the compilation process, avoiding the runtime overhead of the table lookup. To describe user-defined Terra types, we also introduce a construct struct. Here we can use it to hold a square greyscale image:

   struct GreyscaleImage {
     data : &float;
     N : int;
   }

GreyscaleImage is a Lua variable whose value is a Terra type. Terra's types are similar to C's. They include standard base types, arrays, pointers, and nominally-typed structs. Here data is a pointer to floats, while GreyscaleImage is a type that was created by the struct constructor. We might want to parameterize the image type based on the type stored at each pixel (e.g., an RGB triplet, or a greyscale value). Because all Terra entities are first-class, we can define a Lua function Image that creates the desired Terra type at runtime. This is conceptually similar to a C++ template:

   function Image(PixelType)
     struct ImageImpl {
       data : &PixelType,
       N : int
5    }
     -- method definitions for the image:
     terra ImageImpl:init(N : int) : {} -- returns nothing
       self.data =
         [&PixelType](std.malloc(N*N*sizeof(PixelType)))
10     self.N = N
     end
     terra ImageImpl:get(x : int, y : int) : PixelType
       return self.data[x*self.N + y]
     end
15   -- omitted methods for: set, save, load, free
     return ImageImpl
   end

In addition to its layout declared on lines 2–5, each struct can have a set of methods (lines 6–15). Methods are normal Terra functions stored in a Lua table associated with each type (ImageImpl.methods). The method declaration syntax is sugar for:

   ImageImpl.methods.init = terra(self : &ImageImpl, N : int) : {}
     ...
   end

Method invocations (myimage:init(128)) are also just syntactic sugar:

   ImageImpl.methods.init(myimage, 128)
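The same de-sugaring can be sketched in Python, where a method call is likewise an ordinary function call with the receiver passed as the first argument. The class and names below are hypothetical, chosen only to mirror the Terra example:

```python
class ImageImpl:
    # analogue of ImageImpl.methods.init: an ordinary function whose
    # first parameter is the receiver
    def init(self, N):
        self.N = N

img = ImageImpl()
img.init(128)             # method-call sugar...
ImageImpl.init(img, 256)  # ...de-sugars to a plain call with the receiver first
assert img.N == 256
```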

In Chapter 6, we show how we also allow user-defined behavior in this de-sugaring process to define custom method invocation semantics. In the init function, we call std.malloc to allocate memory for our image. We also define a get function to retrieve each pixel, as well as some utility functions which we omit for brevity.

Outside of the Image function, we call Image(float) to define GreyscaleImage. We use it to define a laplace function and a driver function runlaplace that will run it on an image loaded from disk to calculate the Laplacian of the image:

   GreyscaleImage = Image(float)
   terra laplace(img : &GreyscaleImage,
                 out : &GreyscaleImage) : {}
     -- shrink result, do not calculate boundaries
5    var newN = img.N - 2
     out:init(newN)
     for i = 0,newN do
       for j = 0,newN do
         var v = img:get(i+0,j+1) + img:get(i+2,j+1)
10             + img:get(i+1,j+2) + img:get(i+1,j+0)
               - 4 * img:get(i+1,j+1)
         out:set(i,j,v)
       end
     end
15  end

   terra runlaplace(input : rawstring, output : rawstring) : {}
     var i = GreyscaleImage {}
     var o = GreyscaleImage {}
20   i:load(input)
     laplace(&i,&o)
     o:save(output)
     i:free(); o:free()
   end

To actually execute this Terra function, we can call it from Lua: runlaplace("myinput.bmp","myoutput.bmp")

Invoking the function from Lua will cause the runlaplace function to be JIT compiled. A foreign function interface converts the Lua string type into the raw character array (rawstring) used in the Terra code. More generally, we treat compilation of Terra functions as an explicit operation in the language. The same Terra function can be compiled for different purposes. For offline use, we can save the Terra function to a shared library (.so file), which can be linked into another process using the built-in saveobj function:

   terralib.saveobj("runlaplace.so", { runlaplace = runlaplace })

The second argument is a Lua table that contains the Terra functions we want to compile and export. Another built-in function, cudacompile, takes Terra functions and compiles them into CUDA kernels that can run on GPUs. In the future we imagine extending the interface for explicit compilation to support additional architectures.

[Figure 3.1: Phases of evaluation of a Lua-Terra program. The figure diagrams four phases. Lua evaluation produces Terra functions and quotations; during specialization, escape operators (e.g., [lua_exp]) splice Terra values into Terra expressions, yielding specialized but untyped Terra code. When a Terra function is about to be called for the first time, Terra typechecking/compilation runs; type macros (e.g., printnum(3.5)) execute during this phase, and its output is typed, compiled Terra code. Finally, Terra evaluation runs compiled code; calls between Lua and Terra convert values across the language boundary in both directions.]

3.2 MULTI-STAGE PROGRAMMING IN TERRA

In addition to making Terra entities first class, Terra includes explicit multi-stage programming operators like those found in MetaML [73] and MetaOCaml [72]. In those languages the staged programs are typechecked statically. In contrast, Terra code is typechecked at runtime. To support this behavior, staged programming in Lua and Terra involves two phases of meta-programming, illustrated in Figure 3.1. First, untyped Terra expressions are constructed using quotations and stitched together using escapes in a process we call specialization.1 This process is similar to what occurs in other multi-stage languages. Second, type-level computation can be carried out during typechecking with user-defined type-macros, which allow the programmer to generate code based on the type of an expression. This phase does not exist in other staged programming frameworks since there the types must already be determined statically. Here we focus on generating code, but in Chapter 6, we will show how this second stage can also be used to flexibly generate high-performance types. The interaction between these phases of meta-programming and the evaluation of Lua and Terra code is summarized in Figure 3.1. During specialization, a quotation (the backtick operator `exp, or the block-structured quote ... end) used in Lua code creates an unevaluated Terra expression, and an escape

1So named because it is analogous to the specialization phase in partial evaluation frameworks.

(the bracket operator [lua_exp]) used in Terra code evaluates lua_exp and splices its result (normally a Terra quotation) into the surrounding Terra code. To understand how the phases of execution interact, we can consider how to use these operators to generate a specialized version of powf for a particular value of N:

   function genpowf(N)
     local function genexp(vr)
       local r = `1.0
       for i = 1,N do r = `([r] * [vr]) end
5      return r
     end

     local terra powfN(v : double)
       return [genexp(`v)]
10   end
     return powfN
   end
   pow2 = genpowf(2)

15 print(pow2(3)) -- '9'

We begin by evaluating Lua expressions, invoking genpowf(2), which defines genexp and then defines the Terra function powfN. When a Terra function or quotation is defined, it is specialized in the local environment. Specialization resolves the escaped Lua expressions by calling back into Lua evaluation, splicing the resulting values into the Terra code. In powfN, it evaluates the escaped call to genexp, which will generate the body of powfN. The loop on line 4 alternates between defining a Terra quotation `([r] * [vr]) and specializing it with the values of the Lua variables r and vr. Here r holds the power expression being built (`1.0*v*...), while vr is a quotation of a variable that refers to the parameter v of powfN. The result of the loop is the Terra quotation `1.0 * v * v, which will be spliced into the body of powfN, completing its specialization. When a Terra function is first called, such as pow2 on line 13, it is typechecked and compiled, producing machine code. The function is then evaluated, computing the result 9. The pow2 function does not require any meta-programming during the typechecking process. However, in some cases, it is useful for the generating code to be aware of the type of an expression. The type macro allows for user-defined behavior during typechecking. These macros can be used in Terra code like a function, but they are evaluated when the Terra code is typechecked. Unlike escapes and quotes, which are used to create and stitch

code together before typechecking, type-macros have access to their arguments' types. They can be used to generate behavior based on the types, as shown in this example:

   printnum = macro(function(num)
     local format
     if num:gettype() == float then
       format = "%f"
5    else
       format = "%d"
     end
     return `C.printf([format], [num])
   end)
10 terra printint(a : int) printnum(a) end
   printint(1)

When printint is first called on line 11, it will be typechecked. When typechecking the call printnum(a), the typechecker will invoke the printnum macro, which examines the type of the argument num to generate the appropriate formatting code for the type.
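The staging pattern of genpowf can be approximated in any language with runtime code generation. As a loose analogy (this sketch is not part of Terra, and all names in it are hypothetical), the following Python code builds the source of a specialized power function as a string where genpowf splices quotations, compiles it once, and reuses it:

```python
def genpowf(N):
    # build the body 1.0 * v * ... * v, mirroring the quotation loop in genexp
    expr = "1.0"
    for _ in range(N):
        expr = f"({expr} * v)"
    src = f"def powfN(v):\n    return {expr}\n"
    env = {}
    exec(src, env)       # roughly analogous to compiling the specialized function
    return env["powfN"]

pow2 = genpowf(2)
print(pow2(3))           # 9.0
```

As with the Terra version, the cost of generating and compiling the code is paid once; every later call runs the specialized function.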

3.3 WRITING OPTIMIZATIONS USING STAGING

We can also use multi-stage programming to implement higher-order Lua functions that perform optimizations. For instance, we may want to optimize the laplace function from Section 3.1 by blocking the loop nests to make the memory accesses more cache-friendly. We could write this optimization manually, but the sizes and numbers of levels of cache vary across machines, so maintaining a multi-level blocked loop by hand can be tedious. Instead, we can create a Lua function, blockedloop, to generate the Terra code for the loop nests with a parameterizable number of block sizes. In laplace, we can replace the loop nests (lines 7–12) with a call to blockedloop that generates Terra code for a 2-level blocking scheme with outer blocks of size 128 and inner blocks of size 64:

   [blockedloop(newN, {128,64,1}, function(i,j)
     return quote
       var v = img:get(i+0,j+1) + img:get(i+2,j+1)
             + img:get(i+1,j+2) + img:get(i+1,j+0)
             - 4 * img:get(i+1,j+1)
       out:set(i,j,v)
     end
   end)]

The escape ([]) around the expression allows the loop nest generated by blockedloop to be spliced into the Terra expression. The function blockedloop itself is a higher-order Lua function whose third argument is a Lua function that is called to create the inner body of

the loop. Its arguments (i,j) are the loop indices. The quote expression defines the loop body using the loop indices. The implementation of blockedloop walks through the list of block sizes:

   function blockedloop(N, blocksizes, bodyfn)
     local function generatelevel(n, ii, jj, bb)
       if n > #blocksizes then
         return bodyfn(ii, jj)
       end
       local blocksize = blocksizes[n]
       return quote
         for i = ii,min(ii+bb,N),blocksize do
           for j = jj,min(jj+bb,N),blocksize do
             [ generatelevel(n+1, i, j, blocksize) ]
           end
         end
       end
     end
     return generatelevel(1, 0, 0, N)
   end

It uses a quote to create a level of loop nests for each entry and recursively creates the next level using an escape. At the inner-most level, it calls bodyfn to generate the loop body. A more general version of this function is used to implement multi-level blocking for our matrix multiply example. This example highlights some important features of Terra that we have presented so far. We provide syntax sugar for common patterns in runtime code such as namespaces (std.malloc) or method invocation (out:init(newN)). Furthermore, during the generation of Terra functions, both Lua and Terra share the same lexical environment. For example, the loop nests refer to blocksize, a Lua number, while the Lua code that calls generatelevel refers to i and j, Terra variables. Values from Lua such as blocksize will be specialized in the staged code as constants, while Terra variables that appear in Lua code such as i will behave as variable references once placed in a Terra quotation.
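The recursion in generatelevel can be hard to visualize. The Python sketch below is a hypothetical, interpretive analogue: where the Lua version splices a quote for each level, this version simply executes the corresponding loops, so the iteration order it produces is the order the generated nest would run in:

```python
def blockedloop(N, blocksizes, bodyfn):
    # one level of recursion per entry in blocksizes, as in generatelevel;
    # the innermost level invokes bodyfn with the final indices
    def level(n, ii, jj, bb):
        if n == len(blocksizes):
            bodyfn(ii, jj)
            return
        bs = blocksizes[n]
        for i in range(ii, min(ii + bb, N), bs):
            for j in range(jj, min(jj + bb, N), bs):
                level(n + 1, i, j, bs)
    level(0, 0, 0, N)

visited = []
blockedloop(4, [2, 1], lambda i, j: visited.append((i, j)))
# the 2x2 block at (0,0) is finished before the block at (0,2) begins
print(visited[:4])   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Every pixel is visited exactly once, but grouped into blocks, which is what improves cache behavior in the generated Terra code.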

3.4 GENERATING ARBITRARY CODE USING STAGING

Staging operations allow the generation of arbitrary code, and we can use them to generate high-performance code for DSLs. In Section 3.1, we showed a piece of image processing code written as an explicit loop. We could instead represent this operation using a small DSL. Image processing DSLs such as Halide [61] express image processing operations

as image-wide operations that can be composed. For our example, we might express the Laplace operator similarly:

   laplace = img(x-1,y) + img(x+1,y) + img(x,y-1)
           + img(x,y+1) - 4 * img(x,y)

This way of expressing the problem is advantageous because it separates the algorithm from optimizations such as threading, blocking, or loop vectorization. It can speed up image processing code by an order of magnitude [61]. A frontend to a DSL would typically parse this code into an intermediate representation. Chapter 9 discusses Terra’s approach to DSL frontends, but for now we can imagine that the frontend has converted the text into an intermediate representation like the following, which represents the same Laplace operator as before using Lua tables:

   local laplace =
     {"-", {"+", {"+", {"load",-1,0},
                       {"load", 1,0}},
                 {"+", {"load",0,-1},
                       {"load",0, 1}}},
           {"*", {"const",4},
                 {"load",0,0}}}

For simplicity, we use a LISP-like style, where the first element in a Lua table indicates the operator (e.g., "+" or "load") and the arguments to that operator follow. To compile this IR into high-performance Terra code, we can create a Lua function, compileimage, which takes the IR as an argument and returns a Terra function that actually performs the operation given an input image. One possible implementation of this function is shown in Figure 3.2. It returns a Terra function that implements the skeleton of the image processing loop, initializing the output image and then looping over the pixels. The inner loop, however, is dependent on the IR itself, so we use an escape operator to call the function expr, which is actually responsible for generating the computation from the IR. It matches on the type of the IR node (e.g., "+" or "load"), and then produces a Terra quotation that implements the particular operator. Some IR nodes, such as "+", contain further expressions, so they call the expr function recursively. We also use a local Terra function, load, to handle loading data from an image with the correct behavior at the boundaries, returning 0 for any access outside the input image. This illustrates how easy it is for the generated code to simply call into library functions like load.

   function compileimage(ir)
     -- load image with black boundary conditions
     local terra load(img : &GreyscaleImage, i : int, j : int)
       if i < 0 or j < 0 or i >= img.N or j >= img.N then
         return 0
       else
         return img:get(i,j)
       end
     end
     local function expr(node, i, j, input)
       local op,e0,e1 = node[1],node[2],node[3]
       if "const" == op then
         return `e0
       elseif "load" == op then
         return `load(input, i + e0, j + e1)
       elseif "+" == op then
         ...
       end
     end
     return terra(input : &GreyscaleImage,
                  output : &GreyscaleImage) : {}
       output:init(input.N)
       for i = 0,output.N do
         for j = 0,output.N do
           var v = [ expr(ir,i,j,input) ]
           output:set(i,j,v)
         end
       end
     end
   end

Figure 3.2: An example function that compiles a simple IR for image processing code into a Terra function.
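To make the IR concrete, the compilation strategy of Figure 3.2 can be contrasted with direct interpretation. The Python sketch below (an illustration, not the thesis implementation; the image is a hypothetical list-of-lists) evaluates the same LISP-like Laplace IR at a single pixel, including the zero boundary behavior of load:

```python
def evalir(node, img, i, j):
    # interpret one IR node at pixel (i, j); compare with expr in Figure 3.2,
    # which generates code for each node instead of evaluating it
    op = node[0]
    if op == "const":
        return node[1]
    if op == "load":
        x, y = i + node[1], j + node[2]
        N = len(img)
        return img[x][y] if 0 <= x < N and 0 <= y < N else 0
    a = evalir(node[1], img, i, j)
    b = evalir(node[2], img, i, j)
    return {"+": a + b, "-": a - b, "*": a * b}[op]

laplace_ir = ["-", ["+", ["+", ["load", -1, 0], ["load", 1, 0]],
                         ["+", ["load", 0, -1], ["load", 0, 1]]],
                   ["*", ["const", 4], ["load", 0, 0]]]

img = [[1, 2], [3, 4]]
print(evalir(laplace_ir, img, 0, 0))  # (0+3) + (0+2) - 4*1 = 1
```

An interpreter like this pays dispatch cost at every pixel; the point of compileimage is to pay it once, at code-generation time.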

Real DSLs will have larger runtime libraries, so making it easy for the generated code to access them is important. Now that we have a simple code generator for our imaging DSL, the DSL designer might consider ways of optimizing the input. For instance, we could vectorize the inner loop by running multiple pixels at a time. We can use Terra's built-in vector types to accomplish this, changing the loop over j to step a vector at a time, and then changing the implementation of load to load a vector of pixels rather than a single one. We could also perform loop blocking using the higher-order blockedloop function presented earlier. We can automatically tune the algorithm with another simple function. It is not clear how long the vectors should be, or what the block size should be, for a particular program. So we can modify the compileimage function to take the block and vector sizes as parameters, and then try different options. An example of this approach is shown in Figure 3.3.

   local potentialvectorsizes = {2,4,8,16,32,64}
   local potentialblocksizes = {32,64,128,256}
   local function tuneimage(ir, inputimage, outputimage)
     local besttiming,bestcandidate = math.huge,nil
     for _,blocksize in ipairs(potentialblocksizes) do
       for _,vectorsize in ipairs(potentialvectorsizes) do
         local candidate = compileimage(ir,blocksize,vectorsize)
         local timing = timerun(function()
           candidate(inputimage,outputimage)
         end)
         if timing < besttiming then
           besttiming,bestcandidate = timing,candidate
         end
       end
     end
     return bestcandidate
   end

Figure 3.3: A simple autotuner for the image processing language.

Actual DSLs and auto-tuners written in Terra take the same form as these examples. They are simply compositions of built-in operators for parameterizing, generating, and running Terra code.
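The structure of Figure 3.3 is a generic pattern: compile a candidate for each point in a parameter grid, time each one, and keep the fastest. A Python sketch of that pattern follows; the toy compile_candidate stands in for compileimage, and every name in it is hypothetical:

```python
import time

def timerun(fn):
    # best-of-three wall-clock timing to damp measurement noise
    best = float("inf")
    for _ in range(3):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

def tune(compile_candidate, params, arg):
    # analogue of tuneimage: keep the fastest compiled candidate
    besttiming, bestcandidate = float("inf"), None
    for p in params:
        candidate = compile_candidate(p)
        timing = timerun(lambda: candidate(arg))
        if timing < besttiming:
            besttiming, bestcandidate = timing, candidate
    return bestcandidate

def compile_candidate(extra_work):
    # toy stand-in: every candidate computes the same result, but some
    # waste work, modeling a worse code-generation choice
    def run(xs):
        for _ in range(extra_work):
            pass
        return sum(xs)
    return run

best = tune(compile_candidate, [0, 100000], list(range(100)))
print(best(list(range(100))))  # 4950
```

Whichever candidate wins the timing race, the tuned function computes the same answer; only its speed differs.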

3.5 DESIGN DECISIONS FOR CODE GENERATION

In the previous section, we were able to use Lua and Terra to create a concise DSL using just a few operators to generate, run, and auto-tune Terra code. The ability to create these types of programs relies on a few design decisions we made to create a two-language meta-programming system.

3.5.1 Shared lexical environment

During specialization of Terra code, Lua and Terra share the same lexical environment. That is, variables that are lexically in scope in Lua context can be used directly inside of Terra code, and variables introduced in Terra code are in scope in escape operators. Other heterogeneous staged programming languages, such as MetaHaskell [51], do not have this shared lexical scope, making it necessary for quotations in one language to always escape any references to the surrounding environment. This design can make even simple programs

complex to read. Without shared lexical scope, for instance, the code generator for image processing from the previous section would need many different escapes:

   -- with shared scope by default:
   return `load(input, i + e0, j + e1)

   -- without:
   return `[load]([input], [i]+[e0], [j]+[e1])

While some heterogeneous multi-stage languages with shared lexical scope have occurred organically in the past [25, 76], these are achieved through offshoring: reinterpreting the embedding language's syntax using a different lower-level semantics, which limits what can be expressed in the lower-level language. In contrast, we support a shared lexical environment across two fully distinct languages. The shared environment allows us to remove a surprising amount of functionality from Terra itself that a standalone language would require. For instance, Terra itself has no syntax for referring to other Terra functions, or any global values at all! That functionality is handled through specialization, where identifiers in Terra code are resolved to the Lua objects that represent the functions or globals. Terra functions can be organized into tables in the Lua environment, and shared scoping allows us to refer to them directly from Terra code without explicit escape expressions. Specifically, we treat any reference to a variable x as if it were escaped by default: [x]. The details of how this is accomplished are reflected in the semantics presented in Chapter 4. We also treat table selection operators x.id1.id2...idn (where id1...idn are valid entries in nested Lua tables) as if they were escaped. This syntactic sugar allows Terra code to refer to functions organized into Lua tables (e.g., std.printf), removing the need for an explicit namespace mechanism in Terra.

3.5.2 Hygienic code generation

When performing code generation, the programmer needs to understand the relationship between variable definitions and their uses. As seen in some varieties of LISP, even simple macros can unexpectedly capture the values of user-provided variables if not written correctly.

Terra ensures that code generation using Terra variables is hygienic by default. Variable references are lexically scoped:

   local addone = macro(function(x)
     return quote
       var y = 1 -- variable 'y' covers user-provided expression 'x'
     in
       x + y
     end
   end)
   terra test()
     var y = 2 -- but this 'y' is unique from the 'y' in the macro
     return addone(y) -- 3
   end

Maintaining hygiene during staging ensures that it is always possible to determine the relationship between variables and their declarations (across both Lua and Terra) using only the local lexical scope.
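The renaming that makes this hygiene possible can be sketched with a gensym, as in hygienic LISP systems: every macro-introduced variable receives a fresh name, so it cannot collide with names in the user's expression. A hypothetical Python illustration (string-based, purely to show the renaming idea):

```python
_counter = 0
def gensym(name):
    # produce a fresh, unique variable name (cf. renaming during specialization)
    global _counter
    _counter += 1
    return f"{name}__{_counter}"

def expand_addone(expr_src, hygienic=True):
    # expand addone(expr) into source that binds a temporary y = 1 around expr
    tmp = gensym("y") if hygienic else "y"
    return f"(lambda {tmp}=1: ({expr_src}) + {tmp})()"

y = 2
print(eval(expand_addone("y")))                  # 3: fresh temp, user's y survives
print(eval(expand_addone("y", hygienic=False)))  # 2: the macro's y captures the user's
```

The unhygienic expansion silently shadows the user's variable; the gensym version preserves the meaning the user wrote, which is the guarantee Terra provides by construction.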

3.5.3 Separate evaluation of Terra code

After Terra code is compiled, it can run independently from Lua and does not interact with its environment.2 This is a departure from some other approaches to multi-stage programming. For instance, in MetaOCaml [72], ref cells share the same store across different stages, allowing mutations in staged code to be seen outside of the staged code. This alternative makes sharing state easier, but it would couple Terra and Lua's runtimes, which can limit the ability to reason about Terra code independently. The approach of separating compile-time and runtime environments has been used for LISP macro expansion [27] to make components more composable. Here we apply the same technique to ensure predictable performance. In our case, reliance on the Lua runtime, which includes high-level features such as garbage collection, would make it more difficult to reason about the performance of Terra code. Furthermore, the required runtime support would make it difficult to compile Terra functions for architectures such as GPUs, run code in multiple threads, or link code into existing C programs without including the Lua runtime. Currently these behaviors are all possible because we evaluate Terra code separately.

2We do allow for this behavior when specifically requested, but do not provide it by default.

3.5.4 Eager specialization with lazy typechecking

Statically-typed languages such as Terra are normally compiled ahead-of-time, resolving symbols, typechecking, and linking in a separate process from execution. Many staged programming languages such as MetaOCaml maintain this static typechecking even for staged code. However, since Lua is dynamically-typed and can generate arbitrary Terra code, it is not possible to typecheck a combined Lua-Terra program statically. Instead, the normal phases of Terra compilation become part of the evaluation of the Lua program, and we must decide when those phases run in relation to the Lua program. To better understand how Terra code is compiled in relation to Lua, consider where Terra can "go wrong." While specializing Terra code, we might encounter an undefined variable, resolve a Lua expression used in an escape to a value that is not a Terra expression, or resolve a Lua expression used as a Terra type to a value that is not a Terra type. While typechecking Terra code, we might encounter a type error. And, while linking Terra code, we might find that a Terra function refers to a declared but undefined function. In Terra, we perform specialization eagerly (as soon as a Terra function or quotation is defined), while we perform typechecking and linking lazily (only when a function is called, or is referred to by another function being called). Eager specialization prevents mutations in Lua code from changing the meaning of a Terra function between when it is defined and when it is used. Eager specialization requires all symbols used in a function to be defined when the function itself is defined, which can be problematic for mutually recursive functions. In order to support recursive functions with eager specialization, we allow the separate declaration and definition of Terra functions:

   terra isodd -- function declaration

   terra iseven(n : int) : bool
     if n == 0 then return true
     else return isodd(n - 1) end
   end
   terra isodd(n : int) : bool -- fill in isodd with a definition
     if n == 0 then return false
     else return iseven(n - 1) end
   end

For simple cases where all mutually recursive functions can be defined together, we allow both functions to be defined simultaneously:

   terra iseven(n : int) : bool
     if n == 0 then return true
     else return isodd(n - 1) end
   end
   and terra isodd(n : int) : bool -- 'and' causes both definitions to occur together
     if n == 0 then return false
     else return iseven(n - 1) end
   end

Performing typechecking lazily also provides several advantages. Forward declarations of functions, such as the declaration of isodd, do not need type annotations, making them easier to maintain than in languages such as C++, where symbols must be both declared (forward declarations) and given a type (traditional declarations). Furthermore, user-defined struct types do not need all their methods specified before being used in a Terra function; the methods are only required when the function is first typechecked. Compared to eager typechecking, lazy checking might seem to introduce more points at which errors are reported, since errors can surface when a function is first run. However, even with eager typechecking, we could still get linking errors when a function is first run if a function it refers to was never defined. Though typechecking is performed lazily, we still allow code generation to happen during this stage through type-macros. Because type-macros are defined separately from Terra code and called like functions, they do not have access to the lexical environment where they are used, so their evaluation can be delayed until typechecking without their behavior changing when a variable in that environment is mutated.

CHAPTER 4

FORMALIZING CODE GENERATION WITH TERRA CORE

To make the interaction between Lua and Terra precise, we formalize the essence of both languages focusing on how Terra functions are created, compiled, and called during the evaluation of a Lua program and in the presence of side-effects. This formalism will illustrate how some of the design decisions discussed in the previous chapter are implemented.

4.1 TERRA CORE

The calculus, called Terra Core, is equipped with a big-step operational semantics. Evaluation starts in Lua (→L). When a Terra term is encountered it is specialized (→S), evaluating any escapes in the term to produce concrete Terra terms. Specialized Terra functions can then be executed (→T). We distinguish between Lua expressions e, Terra expressions ė, and specialized Terra expressions ē (we use a dot to distinguish Terra terms from Lua terms, and a bar to indicate that a Terra term is specialized). For simplicity we model Lua as an imperative language with first-class functions and Terra as a purely functional language. A namespace Γ maps variables (x) to addresses a, and a store S maps addresses to Lua values v. The namespace Γ serves as the value environment of Lua (resolving variables to values, v) and the syntactic environment of Terra specialization (resolving variables to specialized Terra terms ē, which are a subset of Lua values). In contrast, Terra is executed in a separate environment (Γ̇).


The Lua (Core) syntax is given in the following table:

   e ::= b | Ṫ | x | let x = e in e | x := e | e(e) |
         fun(x){e} | tdecl | ter e(x : e) : e { ė } | `ė

   v ::= b | l | Ṫ | ⟨Γ, x, e⟩ | ē

   Ṫ ::= B | Ṫ → Ṫ

A Lua expression can be a base value (b), a Terra type expression (Ṫ), a variable (x), a scoped variable definition (let x = e in e), an assignment (x := e), a function call (e(e)), a Lua function (fun(x){e}), or a quoted Terra expression (`ė). We separate declaration and definition of Terra functions to allow for recursive functions. A Terra function declaration

(tdecl) creates a new address for a Terra function, while a Terra definition (ter e1(x : e2) : e3 { ė }) fills in the declaration at address e1. For example, the following declares and defines a Terra function, storing it in x:

let x = ter tdecl(x2 : int): int { x2 } in x

Alternatively, tdecl creates just a declaration that can be defined later:

let x = tdecl in ter x(x2 : int): int { x2 }

In real Terra code, a Terra definition will create a declaration if one does not already exist. Lua values range over base values (b), addresses of Terra functions (l), Terra types (Ṫ), Lua closures (⟨Γ, x, e⟩), and specialized Terra expressions (ē). The syntax of Terra terms is defined as follows:

   ė ::= b | x | ė(ė) | tlet x : e = ė in ė | [e]

A Terra expression is either a base value, a variable, a function application, a let statement (tlet), or a Lua escape (written [e]). The syntax of specialized terms is given next:

ē ::= b | x̄ | ē(ē) | tlet x̄ : Ṫ = ē in ē | l

In contrast to an unspecialized term, a specialized Terra term does not contain escape expressions, but can contain Terra function addresses (l). The let statement must assign a Terra type Ṫ to the bound variable, and variables are replaced with specialized Terra variables x̄. The judgment e Σ1 −→L v Σ2 describes the evaluation of a Lua expression. It operates over an environment Σ consisting of Γ, S, and a Terra function store F, which maps addresses (l) to Terra functions. Terra functions can be defined (⟨x̄, Ṫ1, Ṫ2, ē⟩) or undefined (•). Figure 4.1 defines the Lua evaluation rules. We use two notational shortcuts:

Σ1[x ← v] = Γ2, S2, F   when Σ1 = Γ1, S1, F ∧ Γ2 = Γ1[x ← a] ∧ S2 = S1[a ← v] ∧ a fresh

(Σ ← Γ1) = Γ1, S, F   when Σ = Γ2, S, F

Rule LTDECL creates a new Terra function at address l and initializes it as undefined (•).

Rule LTDEFN takes an undefined Terra function (e1) and initializes it. First, e2 and e3 are evaluated as Lua expressions to produce the type of the function, Ṫ1 → Ṫ2. The body, ė, is then specialized. During specialization, Terra variables (x) are renamed to new symbols (x̄) to ensure hygiene. Renaming has been previously applied in staged programming [73] and hygienic macro expansion [4]. In the case of LTDEFN, we generate a fresh name x̄ for the formal parameter x, and place it in the environment. Variable x will be bound to the value x̄ in the scope of any Lua code evaluated during specialization of the function. During specialization, Rule SVAR will replace uses of x in Terra code with the value of x in the environment.
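The renaming discipline can be sketched in Python (an illustrative analogue only: the tagged-tuple AST, the helper names, and the closure-based escapes are invented for this sketch, not part of Terra Core):

```python
import itertools

_fresh_ids = itertools.count()

def fresh(name):
    # Rename a Terra variable to a globally unique symbol (hygiene)
    return f"{name}_{next(_fresh_ids)}"

def specialize(term, env):
    """Specialize a tiny Terra-like AST; nodes are tagged tuples."""
    kind = term[0]
    if kind == "lit":                  # SBAS: base values specialize to themselves
        return term
    if kind == "var":                  # SVAR: look the name up in the shared env
        return env[term[1]]
    if kind == "tlet":                 # SLET: bind the variable to a fresh name
        _, x, bound, body = term
        xr = fresh(x)
        bound_s = specialize(bound, env)
        body_s = specialize(body, {**env, x: ("var", xr)})
        return ("tlet", xr, bound_s, body_s)
    if kind == "esc":                  # SESC: run Lua-side code, splice the result
        return term[1](env)
    raise ValueError(f"unknown node {kind}")
```

Because every tlet binds a fresh name, a variable spliced in through an escape can never be captured by a binder that happens to share its source-level name.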

Rule LTAPP describes how to call a Terra function from Lua. The actual parameter e2 is evaluated. The Terra function is then typechecked. Semantically, typechecking occurs every time a function is run; in practice, we cache the result of typechecking. For simplicity, Terra

(LVAL)
    v Σ −→L v Σ

(LVAR)
    Σ = Γ, S, F
    ----------------------------
    x Σ −→L S(Γ(x)) Σ

(LLET)
    e1 Σ1 −→L v1 Σ2    Σ2 = Γ, S, F    e2 Σ2[x ← v1] −→L v2 Σ3
    ----------------------------------------------------------
    let x = e1 in e2 Σ1 −→L v2 (Σ3 ← Γ)

(LASN)
    e Σ −→L v (Γ, S, F)    Γ(x) = a
    --------------------------------
    x := e Σ −→L v (Γ, S[a ← v], F)

(LFUN)
    Σ = Γ, S, F
    ------------------------------
    fun(x){e} Σ −→L ⟨Γ, x, e⟩ Σ

(LAPP)
    e1 Σ1 −→L ⟨Γ1, x, e3⟩ Σ2    e2 Σ2 −→L v1 (Γ2, S, F)    a fresh
    e3 (Γ1[x ← a], S[a ← v1], F) −→L v2 Σ3
    --------------------------------------------------------------
    e1(e2) Σ1 −→L v2 (Σ3 ← Γ2)

(LTDECL)
    l fresh    Σ = Γ, S, F
    -------------------------------
    tdecl Σ −→L l (Γ, S, F[l ← •])

(LTDEFN)
    e1 Σ1 −→L l Σ2    e2 Σ2 −→L Ṫ1 Σ3    e3 Σ3 −→L Ṫ2 Σ4    Σ4 = Γ1, S1, F1
    x̄ fresh    ė Σ4[x ← x̄] −→S ē (Γ2, S2, F2)    F2(l) = •
    ------------------------------------------------------------------------
    ter e1(x : e2) : e3 { ė } Σ1 −→L l (Γ1, S2, F2[l ← ⟨x̄, Ṫ1, Ṫ2, ē⟩])

(LTQUOTE)
    ė Σ1 −→S ē Σ2
    ----------------
    `ė Σ1 −→L ē Σ2

(LTAPP)
    e1 Σ1 −→L l Σ2    e2 Σ2 −→L b1 Σ3    Σ3 = Γ, S, F
    F(l) = ⟨x̄, Ṫ1, Ṫ2, ē⟩    b1 ∈ Ṫ1
    [x̄ : Ṫ1], [l : Ṫ1 → Ṫ2], F ⊢ ē : Ṫ2    ē ([x̄ ← b1], F) −→T b2
    --------------------------------------------------------------
    e1(e2) Σ1 −→L b2 Σ3

Figure 4.1: The rules −→L for evaluating Lua expressions.

(SBAS)
    b Σ −→S b Σ

(SAPP)
    ė1 Σ1 −→S ē1 Σ2    ė2 Σ2 −→S ē2 Σ3
    ----------------------------------
    ė1(ė2) Σ1 −→S ē1(ē2) Σ3

(SLET)
    e Σ1 −→L Ṫ Σ2    ė1 Σ2 −→S ē1 Σ3    x̄ fresh
    Σ3 = Γ, S, F    ė2 Σ3[x ← x̄] −→S ē2 Σ4
    ------------------------------------------------------------
    tlet x : e = ė1 in ė2 Σ1 −→S tlet x̄ : Ṫ = ē1 in ē2 (Σ4 ← Γ)

(SESC)
    e Σ1 −→L ē Σ2
    ----------------
    [e] Σ1 −→S ē Σ2

(SVAR)
    [x] Σ1 −→S ē Σ2
    ----------------
    x Σ1 −→S ē Σ2

Figure 4.2: The rules −→S for specializing Terra expressions.

(TBAS)
    b (Γ̇, F) −→T b

(TFUN)
    l (Γ̇, F) −→T l

(TVAR)
    x̄ (Γ̇, F) −→T Γ̇(x̄)

(TLET)
    ē1 (Γ̇, F) −→T v1    ē2 (Γ̇[x̄ ← v1], F) −→T v2
    ---------------------------------------------
    tlet x̄ : Ṫ = ē1 in ē2 (Γ̇, F) −→T v2

(TAPP)
    ē1 (Γ̇, F) −→T l    ē2 (Γ̇, F) −→T v1
    F(l) = ⟨x̄, Ṫ1, Ṫ2, ē3⟩    ē3 (Γ̇[x̄ ← v1], F) −→T v2
    ----------------------------------------------------
    ē1(ē2) (Γ̇, F) −→T v2

Figure 4.3: The rules −→T for evaluating Terra expressions.

(TYFUN1)
    F̂(l) = Ṫ
    -----------------
    Γ̂, F̂, F ⊢ l : Ṫ

(TYFUN2)
    l ∉ F̂    F(l) = ⟨x̄, Ṫ1, Ṫ2, ē⟩    [x̄ : Ṫ1], F̂[l : Ṫ1 → Ṫ2], F ⊢ ē : Ṫ2
    ------------------------------------------------------------------------
    Γ̂, F̂, F ⊢ l : Ṫ1 → Ṫ2

Figure 4.4: Typing rules for references to Terra functions.

Core only allows values b of base types to be passed to and returned from Terra functions (full Terra is less restricted). Figure 4.2 defines the judgment ė Σ1 −→S ē Σ2 for specializing Terra code, which evaluates all embedded Lua expressions in type annotations and escape expressions. Similar to LTDEFN, rule SLET generates a unique name x̄ to ensure hygiene. Rule SESC evaluates escaped Lua code; it splices the result into the Terra expression if the resulting value is in the subset of values that are Terra terms ē (e.g., a variable x̄ or a base value b). Variables in Terra can refer to variables defined in Lua and in Terra; they behave as if they were escaped, as defined by Rule SVAR. If x is a variable defined in Terra code and renamed x̄ during specialization, then rule SVAR will just produce x̄ (assuming no interleaving mutation of x). Figure 4.3 presents the judgment ē (Γ̇, F) −→T v for evaluating specialized Terra expressions. These expressions can be evaluated independently from the Lua store S, and do not modify F, but are otherwise straightforward. A Terra function is typechecked right before it is run (LTAPP) with the judgment Γ̂, F̂, F ⊢ ē : Ṫ, where Γ̂ is the typing environment for variables and F̂ is the typing environment for Terra function references (F is the Terra function store from before). The rules (omitted for brevity) are standard, except for the handling of Terra function references l. If a Terra function l1 refers to another

Terra function l2, then l2 must be typechecked when typechecking l1. The rules for handling these references in the presence of mutually recursive functions are shown in Figure 4.4. They ensure all functions that are in the connected component of a function are typechecked before the function is run.
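The connected-component check can be sketched in Python (the tuple encoding of functions is invented for illustration; real Terra typechecks full function bodies rather than a callee list):

```python
def typecheck_ref(l, F, Fhat):
    """Type a reference to the Terra function at address l (TYFUN1/TYFUN2 sketch).

    F maps addresses to (param_type, ret_type, callees), or to None for a
    declared-but-undefined function; callees are the addresses the body
    references, which is all this sketch needs to model mutual recursion.
    """
    if l in Fhat:                  # TYFUN1: already typed (or being typed)
        return Fhat[l]
    if F[l] is None:               # declared but never defined: a linking error
        raise TypeError(f"function {l} is declared but not defined")
    t_param, t_ret, callees = F[l]
    Fhat[l] = (t_param, t_ret)     # record *before* recursing to handle cycles
    for callee in callees:
        typecheck_ref(callee, F, Fhat)
    return Fhat[l]
```

Recording the type in `Fhat` before recursing is what lets mutually recursive functions typecheck without looping, mirroring how TYFUN1 consults F̂ before TYFUN2 descends into a body.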

4.2 DESIGN DECISIONS AS REFLECTED IN TERRA CORE

Terra Core formalizes some of the features that were presented informally in the previous chapter. Here we describe how these features work in the calculus.

4.2.1 Terra entities are first-class

Terra types and expressions are Lua values. This is illustrated in Terra Core: Terra syntax ē is one kind of Lua value, along with Terra types Ṫ. As an example, a Lua Core function can

manipulate Terra types. The Lua function x3 will generate a Terra identity function for any given type:

let x3 = fun(x1){ter tdecl(x2 : x1): x1 { x2 }} in

x3(int)(1)

Here we call it with int, which will result in the specialized Terra function ⟨x̄, int, int, x̄⟩. Chapters 6 and 7 will expand on this formalism, showing how we can use Lua to generate arbitrary Terra types.
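Because types are ordinary values, a generator can inspect them while building a specialized function. A hedged Python analogue of x3 (make_identity is an invented name; Python type objects stand in for Terra types):

```python
def make_identity(ty):
    """Generate a checked identity function for a given (first-class) type.

    Mimics the Lua generator x3 = fun(x1){ ter tdecl(x2 : x1) : x1 { x2 } }:
    the type is an ordinary value the generating function consults while
    producing the specialized one.
    """
    def identity(v):
        if not isinstance(v, ty):
            raise TypeError(f"expected {ty.__name__}")
        return v
    return identity
```

Here the type check happens at call time; in Terra the analogous check is static, performed when the generated function is typechecked.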

4.2.2 Hygiene and shared lexical environment

Terra Core illustrates how we provide a shared lexical environment and hygiene. The evaluation of Lua code and the specialization of Terra code share the same lexical environment Γ and store S. This environment and store always map variables x to Lua values v. This example illustrates the shared environment:

let x1 = 0 in

let x2 = `(tlet y1 : int = 1 in x1) in

let x3 = ter tdecl(y2 : int): int { x2 } in x3

The specialization of the quoted tlet expression occurs in the surrounding Lua environment, so Rule SVAR will evaluate x1 to 0. This results in the specialized expression:

tlet ȳ1 : int = 1 in 0

This Terra expression will be stored as a Lua value in x2. Since the Terra function refers to x2, specialization will result in the following Terra function:

⟨ȳ2, int, int, tlet ȳ1 : int = 1 in 0⟩

Furthermore, during specialization variables introduced by Terra functions and Terra let expressions are bound in the shared lexical environment. Consider this example:

let x1 = fun(x2){ `tlet y : int = 0 in [x2] } in

let x3 = ter tdecl(y : int): int { [x1(y)] } in x3

The variable y on line 2 is introduced by the Terra function definition. It is referenced by the Lua expression inside the escape ([x1(y)]). The variable y is then passed as an argument to Lua function x1, where it is spliced into a tlet expression. When Terra variables are introduced into the environment, they are given fresh names to ensure hygiene. For example, without renaming, x3 would specialize to the following, causing the tlet expression to unintentionally capture y:

⟨y, int, int, tlet y : int = 0 in y⟩

To avoid this, rules LTDEFN and SLET generate fresh names for variables declared in Terra expressions. In this case, LTDEFN will generate a fresh name ȳ1 for the argument y, binding it into the shared environment (Σ[y ← ȳ1]), and SLET will similarly generate the fresh name ȳ2 for the tlet expression. Since y on line 2 has the value ȳ1 during specialization, the variable x2 will get the value ȳ1, and x3 will specialize to the following, avoiding the unintentional capture:

⟨ȳ1, int, int, tlet ȳ2 : int = 0 in ȳ1⟩

4.2.3 Eager specialization with lazy typechecking

Eager specialization prevents mutations in Lua code from changing the meaning of a Terra function between when it is defined and when it is used. Terra Core helps illustrate where problems can arise. For instance, consider the following example (we use the syntax e; e as sugar for let x = e in e, with x fresh and unused):

let x1 = 0 in

let y = ter tdecl(x2 : int): int { x1 } in

x1 := 1; y(0)

Since specialization is performed eagerly, the statement y(0) will evaluate to 0. In contrast, if specialization were performed once lazily, then it would capture the value of x1 the first time y is called and keep that value for the rest of the program, which would lead to surprising results (e.g., if y were used before x1 := 1 then it would always return 0, otherwise it would always return 1). Alternatively, we could re-specialize (and hence re-compile) the function when a Lua value changes, but this behavior could lead to large compiler overheads that would be difficult to track down. Eager specialization requires all symbols used in a function to be declared before the function’s definition, which can be problematic for mutually recursive functions. In order to support recursive functions with eager specialization, we separate the declaration and definition of Terra functions:

let x2 = tdecl in

let x1 = ter tdecl(y : int): int { x2(y) } in

ter x2(y : int): int { x1(y) };

x1(0)
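The contrast between eager and lazy capture can be sketched in Python (an illustrative analogue, not Terra Core itself: eager specialization snapshots the Lua value at definition time, while a hypothetical lazy variant freezes whatever value it happens to see at its first call):

```python
x1 = 0

def define_eager():
    # Eager specialization: snapshot the Lua value at definition time
    captured = x1
    return lambda _arg: captured

def define_lazy():
    # Hypothetical lazy variant: freeze whatever value is seen at first call
    cell = []
    def f(_arg):
        if not cell:
            cell.append(x1)
        return cell[0]
    return f

y_eager = define_eager()
y_lazy = define_lazy()
x1 = 1   # mutation between definition and use
# y_eager(0) returns 0 (the value at definition); y_lazy(0) returns 1 and then
# keeps returning 1 forever, so its answer depends on when the first call ran.
```

The lazy variant is exactly the "surprising results" case described above: the function's meaning is fixed by the timing of its first use rather than by its definition.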

In contrast to specialization, typechecking is performed lazily. In Terra Core, it would be possible to perform typechecking eagerly if declarations also had types. For instance, in our previous example we could typecheck x1 when it is defined if x2 were given a type during declaration. However, even though x1 would typecheck, we would still receive a linking error if x1(0) occurred before the definition of x2. Terra Core helps illustrate that performing typechecking eagerly would not reduce the number of places an error might occur for function x1. Furthermore, unlike specialization, where the result can change arbitrarily depending on the Lua state, the result of typechecking and linking x1 can only

change monotonically from a type-error to success as the functions it references are defined (it can also stay as a type-error if the function is actually ill-typed). This property follows from the fact that Terra functions can be defined, but not re-defined by Rule LTDEFN.

4.2.4 Separate evaluation of Terra code

After Terra code is compiled, it can run independently from Lua. This behavior is captured in Terra Core by the fact that Terra expressions are evaluated independently from the environment Γ and the store S, as illustrated by this example:

let x1 = 1 in

let y = ter tdecl(x2 : int): int { x1 } in

x1 := 2; y(0)

The Terra function will specialize to ⟨x̄, int, int, 1⟩, so the function call will evaluate to the value 1, despite x1 being re-assigned to 2. An alternative design would allow Terra evaluation to directly refer to x1, which is the behavior of MetaOCaml [72]. Terra evaluation would then depend on Γ and S, coupling the runtimes of Lua and Terra together. As previously discussed, this coupling would make it difficult to reason about performance, or to run Terra on GPUs and embedded devices.

CHAPTER 5. BUILDING DSLS

To evaluate Terra, we use it to reimplement a number of multi-language applications and compare our implementations with existing approaches. We present evidence that the design decisions of Terra make the implementations simpler to engineer compared to existing work while achieving high performance.

5.1 TUNING DGEMM

BLAS routines like double-precision matrix multiply (DGEMM) are used in a wide range of applications and form a basis for many of the algorithms used in high-performance scientific computing. However, their performance is dependent on characteristics of the machine such as cache sizes, vector length, or number of floating-point machine registers. In our tests, a naïve DGEMM can run over 65 times slower than the best-tuned algorithm. The ATLAS project [77] was created to maintain high-performance BLAS routines via auto-tuning. To demonstrate Terra's usefulness in auto-tuning high-performance code, we implemented a version of matrix multiply, the building block of level-3 BLAS routines. We restrict ourselves to the case C = AB, with both A and B stored non-transposed, and base our optimizations on those of ATLAS [77]. ATLAS breaks down a matrix multiply into smaller operations where the matrices fit into L1 cache. An optimized kernel for L1-sized multiplies is used for each operation. Tuning DGEMM involves choosing good block sizes and generating optimized code for the L1-sized kernel. We found that a simple two-level blocking scheme worked well. To generate the L1-sized kernel, we use staging to implement several optimizations. We implement register-blocking of the inner-most loops, where a


   function genkernel(NB, RM, RN, V, alpha)
     local vector_type = vector(double,V)
     local vector_pointer = &vector_type
     local A,B,C = symbol("A"),symbol("B"),symbol("C")
5    local mm,nn = symbol("mm"),symbol("nn")
     local lda,ldb,ldc = symbol("lda"),symbol("ldb"),symbol("ldc")
     local a,b = symmat("a",RM), symmat("b",RN)
     local c,caddr = symmat("c",RM,RN), symmat("caddr",RM,RN)
     local k = symbol("k")
10   local loadc,storec = terralib.newlist(),terralib.newlist()
     for m = 0, RM-1 do for n = 0, RN-1 do
       loadc:insert(quote
         var [caddr[m][n]] = C + m*ldc + n*V
         var [c[m][n]] =
15         alpha * @vector_pointer([caddr[m][n]])
       end)
       storec:insert(quote
         @vector_pointer([caddr[m][n]]) = [c[m][n]]
       end)
20   end end
     local calcc = terralib.newlist()
     for n = 0, RN-1 do
       calcc:insert(quote
         var [b[n]] = @vector_pointer(&B[n*V])
25     end)
     end
     for m = 0, RM-1 do
       calcc:insert(quote
         var [a[m]] = vector_type(A[m*lda])
30     end)
     end
     for m = 0, RM-1 do for n = 0, RN-1 do
       calcc:insert(quote
         [c[m][n]] = [c[m][n]] + [a[m]] * [b[n]]
35     end)
     end end
     return terra([A] : &double, [B] : &double, [C] : &double,
                  [lda] : int64, [ldb] : int64, [ldc] : int64)
       for [mm] = 0, NB, RM do
40       for [nn] = 0, NB, RN*V do
           [loadc];
           for [k] = 0, NB do
             prefetch(B + 4*ldb,0,3,1);
             [calcc];
45           B,A = B + ldb,A + 1
           end
           [storec];
           A,B,C = A - NB,B - ldb*NB + RN*V,C + RN*V
         end
50       A,B,C = A + lda*RM, B - NB, C + RM * ldb - NB
       end
     end
   end

Figure 5.1: Parameterized Terra code that generates a matrix-multiply kernel optimized to fit in L1.

[Figure 5.2 contains two plots of GFLOPS versus matrix size (in MB): (a) DGEMM Performance and (b) SGEMM Performance, each showing curves for Peak, MKL, ATLAS (orig. and fixed), Terra, Blocked, and Naïve implementations.]

Figure 5.2: Performance of matrix multiply using different libraries as a function of matrix size. Size reported is the total footprint for both input and output matrices. All matrices are square.

block of the output matrix is stored in machine registers; we vectorize this inner-most loop using vector types; and we use prefetch intrinsics to optimize non-contiguous reads from memory. The code that implements our L1-sized kernel is shown in Figure 5.1. It is parameterized by the block size (NB), the amount of register blocking in two dimensions (RM and RN), the vector size (V), and a constant (alpha) which parameterizes the multiply operation, C = alpha*C + A*B. When generating code with a parameterizable number of variables (e.g., for register blocking) it is sometimes useful to selectively violate hygiene. Terra provides the function symbol, equivalent to LISP's gensym, which generates a globally unique identifier that can be used to define and refer to a variable that will not be renamed. We use it on lines 4–9 to generate the intermediate variables for our computation (symmat generates a matrix of symbols). On lines 10–20, we generate the code to load the values of C into registers (loadc), and the code to store them back to memory (storec). Lines 21–31 load the A and B matrices, and lines 32–36 generate the unrolled code to perform the outer product (calcc). We compose these pieces into the L1-sized matrix-multiply function (lines 37–51). The full matrix-multiply routine (not shown) calls the L1-sized kernel for each block of the multiply. In Lua, we wrote an auto-tuner that searches over reasonable values for the parameters (NB, V, RM, RN), JIT-compiles the code, runs it on a user-provided test case, and chooses the best-performing configuration. Our implementation is around 200 lines of code.
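The two-level blocking idea itself is independent of staging and can be sketched in scalar Python (an illustrative stand-in for the Terra implementation, with no register blocking, vectorization, or prefetching):

```python
def matmul_blocked(A, B, n, NB):
    """Cache-blocked C = A*B for n x n row-major matrices stored as flat lists.

    The inner triple loop only touches NB x NB tiles; in the Terra version
    the tiles are sized to fit in L1 and handled by the generated kernel.
    """
    C = [0.0] * (n * n)
    for i0 in range(0, n, NB):
        for j0 in range(0, n, NB):
            for k0 in range(0, n, NB):
                # L1-sized kernel: accumulate one tile-pair product into C
                for i in range(i0, min(i0 + NB, n)):
                    for k in range(k0, min(k0 + NB, n)):
                        a = A[i * n + k]
                        for j in range(j0, min(j0 + NB, n)):
                            C[i * n + j] += a * B[k * n + j]
    return C
```

The blocked version computes exactly the same sums as a naïve triple loop; only the traversal order (and hence cache behavior) changes.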

void diffuse(int N, int b, float* x, float* x0,
             float* tmp, float diff, float dt) {
  int i, j, k;
  float a = dt*diff*N*N;
  for (k = 0; k <= iter; k++) {
    for (j = 1; j <= N; j++)
      for (i = 1; i <= N; i++)
        tmp[IX(i,j)] = (x0[IX(i,j)] + a*(x[IX(i-1,j)]+
          x[IX(i+1,j)]+x[IX(i,j-1)]+x[IX(i,j+1)]))/(1+4*a);
    SWAP(x,tmp);
  }
}

function diffuse( x, x0, diff, dt )
  local a = dt*diff*N*N
  for k=0,iter do
    x = (x0+a*(x(-1,0)+x(1,0)+x(0,-1)+x(0,1)))/(1+4*a)
  end
  return x,x0
end

Figure 5.3: A kernel from a real-time fluid solver written in C (top) compared to Darkroom (bottom).

We evaluate the performance by comparing to ATLAS and Intel's MKL on a single core of an Intel Core i7-3720QM. ATLAS 3.10 was compiled with GCC 4.8. Figure 5.2 shows the results for both double and single precision. For DGEMM, the naïve algorithm performs poorly. While blocking the algorithm does improve its performance for large matrices, it runs at less than 7% of theoretical peak GFLOPS for this processor. In contrast, Terra performs within 20% of the ATLAS routine, over 60% of peak GFLOPS of the core, and over 65 times faster than the naïve unblocked code. The difference between Terra and ATLAS is likely caused by a register spill in Terra's generated code that is avoided in ATLAS's generated assembly. Terra is also competitive with Intel's MKL, which is considered state-of-the-art. For SGEMM, Terra outperforms the unmodified ATLAS code by a factor of 5 because ATLAS incurs a transition penalty from mixing SSE and AVX instructions. Once this performance bug is fixed, ATLAS performs similarly to Terra. ATLAS is built using Makefiles, C, and assembly programs generated with a custom preprocessor. The Makefiles orchestrate the creation and compilation of the code with different parameters. Code generation is accomplished through a combination of preprocessors and cross-compilation written in C. Auto-tuning is performed using a C harness for timing. Different stages communicate through the file system.
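In contrast to that file-based pipeline, the entire search loop fits in one process. A Python sketch of the tuning loop (make_kernel and benchmark are stand-ins for Terra's JIT compilation and the timing harness):

```python
import itertools

def autotune(make_kernel, param_space, benchmark):
    """Exhaustively search a parameter space and keep the fastest configuration.

    make_kernel(cfg) returns a runnable kernel for one configuration;
    benchmark(kernel) returns its measured time on a user-provided test case.
    """
    best_cfg, best_time = None, float("inf")
    for combo in itertools.product(*param_space.values()):
        cfg = dict(zip(param_space, combo))
        kernel = make_kernel(cfg)     # in Terra: code generation + JIT compile
        t = benchmark(kernel)         # in Terra: run and time the kernel
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time
```

A real tuner would prune the space and repeat timings for stability, but the structure — generate, compile, time, keep the best — is the same.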

Figure 5.4: Speedup from choosing different Darkroom schedules. All results on Intel Core i7-3720QM, 1024x1024 floating-point pixels.

Fluid Simulation:
  Naïve C           1x     (37.5 sec)
  Naïve Darkroom    1x     (37.7 sec)
  + Vectorization   1.9x   (19.6 sec)
  + Line buffering  2.38x  (15.7 sec)

Image Pipeline:
  Vectorized        1x     (22.9 ms)
  Inlined           3.75x  (6.1 ms)

Area Filter:
  Vectorized N²     1x     (3.9 ms)
  Separated         1.9x   (2.1 ms)
  Line Buffered     3.2x   (1.3 ms)

The design of Terra allows all of these tasks to be accomplished in one system and as a single process. Terra provides low-level features like vectors and prefetch instructions needed for high-performance. In contrast, ATLAS needed to target x86 directly, which resulted in a performance bug in SGEMM. Staging annotations made it easy to write parameterized optimizations like register unrolling without requiring a separate preprocessor. Interoperability through the FFI made it possible to generate and evaluate the kernels in the same framework. Finally, since Terra code can run without Lua, the resulting multiply routine can be written out as a library and used in other programs; or, for portable performance, it can be shipped with the Lua runtime and auto-tuning can be performed dynamically, something that is not possible with ATLAS.

5.2 DARKROOM: A STENCIL DSL FOR IMAGES

To test Terra's suitability for DSL development, we created Darkroom¹, a DSL for 2D stencil computations on images. Stencil computations are grid-based kernels in which each value in the grid is dependent on a small local neighborhood. They are used in image processing and simulation. They present a number of opportunities for optimization, but when implemented like the C code in Figure 5.3, it is difficult to exploit these performance opportunities. For example, fusing two iterations of the outer loop in diffuse may reduce memory traffic, but testing this hypothesis can require significant code changes. Figure 5.3 shows the same diffuse operation written in Darkroom. Rather than specify loop nests directly, Darkroom programs are written using image-wide operators. For instance, f(-1,0) + f(0,1) adds the

¹In our paper [23], this DSL was referred to as Orion.

image f translated by −1 in x to f translated by 1 in y. The offsets must be constants, which guarantees the function is a stencil. We base our design on Halide [61], a language for the related domain of image processing. The user guides optimization by specifying a schedule. A Darkroom expression can be materialized, inlined, or line buffered. Materialized expressions are computed once and stored to main memory. Inlined expressions are recomputed once for each output pixel. Line buffering is a compromise in which computations are interleaved and the necessary intermediates are stored in a scratchpad. Additionally, Darkroom can vectorize any schedule using Terra's vector instructions. Being able to easily change the schedule is a powerful abstraction. To demonstrate this, we implemented a pipeline of four simple memory-bound point-wise image processing kernels (blacklevel offset, brightness, clamp, and invert). In a traditional image processing library, these functions would likely be written separately so they could be composed in an arbitrary order. In Darkroom, the schedule can be changed independently of the algorithm. For example, we can choose to inline the four functions, reducing the accesses to main memory by a factor of 4 and resulting in a 3.8x speedup. To implement Darkroom, we use operator overloading on Lua tables to build Darkroom expressions. These operators build an intermediate representation (IR) suitable for optimization. The user calls darkroom.compile to compile the IR into a Terra function. We then use Terra's staging annotations to generate the code for the inner loop. To test that the code generated by Terra performs well, we implemented an area filter and a fluid simulation. We compare each to equivalents hand-written in C. The area filter is a common image processing operation that averages the pixels in a 5x5 window. Area filtering is separable, so it is normally implemented as a 1-D area filter first in Y and then in X.
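The effect of the inline schedule on a point-wise pipeline can be sketched in Python (a toy model, not Darkroom's scheduler): both schedules compute the same image, but inlining fuses the traversals into one, which is the source of the factor-of-4 reduction in memory accesses described above.

```python
from functools import reduce

def run_pipeline(image, stages, schedule):
    """Run point-wise stages over an image under a toy schedule.

    "materialize" writes a full intermediate image per stage (one memory
    pass each); "inline" fuses all stages into a single traversal. The two
    schedules differ only in the number of passes over memory.
    """
    passes = 0
    if schedule == "materialize":
        for stage in stages:
            image = [stage(p) for p in image]   # full intermediate per stage
            passes += 1
    else:  # "inline": compose all stages per pixel in one traversal
        image = [reduce(lambda v, s: s(v), stages, p) for p in image]
        passes = 1
    return image, passes
```

Real stencil stages also read neighboring pixels, which is what makes line buffering (rather than full inlining) the right compromise for non-point-wise kernels.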
We compare against a hand-written C implementation with results in Figure 5.4. Given a schedule that matches the C code, Darkroom performs similarly, running 10% faster. Enabling vectorization in Darkroom yields a 2.8x speedup over C, and then line buffering between the passes in Y and X yields a 3.4x speedup. Explicit vectors are not part of standard C, and writing line-buffering code is tedious and breaks composability, so these optimizations are not normally done when writing code by hand. We also implemented a simple real-time 2D fluid simulation based on an existing C implementation [66]. We made small modifications to the reference code to make it suitable

to a stencil language. We converted the solver from Gauss-Seidel to Gauss-Jacobi so that images are not modified in place, and we use a zero boundary condition since our implementation does not yet support more complicated boundaries. We also corrected a performance bug in the code caused by looping over images in row-major order that were stored in column-major order. We compare against the corrected version. With a matching schedule, Darkroom performs the same as the reference C. Enabling 4-wide vectorization results in a 1.9x speedup over the matching code, making each materialized operation memory bound. Finally, line buffering pairs of the iterations of the diffuse and project kernels yielded a 1.25x speedup on the vectorized code, or a 2.3x total speedup over the reference C code. A number of features of Terra facilitated the implementation of Darkroom. High-level features of Lua made it easy to express transformations on the Darkroom IR. Terra's built-in support for vector types made it easy to vectorize the compiler by simply changing scalar types into vectors. Backwards compatibility with C allowed us to link to an existing library for loading images. The FFI made it possible to use Lua to implement non-performance-critical code such as the kernel scheduler, saving development time. Furthermore, the fluid simulation that we ported included a semi-Lagrangian advection step, which is not a stencil computation. In this case, we were able to allow the user to pass a Terra function to do the necessary computation, and easily integrate this code with generated Terra code. This interoperability would have been more difficult to accomplish with a stand-alone compiler. In contrast to Darkroom, Halide requires three different languages to provide the same functionality. It uses C++ for the frontend, ML for manipulating the IR, and LLVM for code generation [61].
When running the same deblurring algorithm, Darkroom and Halide perform almost equivalently (.35 and .37 seconds respectively) [37]. From our experience implementing Darkroom, using Lua to stage Terra code accomplishes the same tasks, but results in a simpler architecture.

5.3 LISZT: A DSL FOR PHYSICAL SIMULATION

Liszt is a physical simulation language whose optimizations were covered in Section 2.1.3. The original version of Liszt [22] produced high-performance code, but it is architected out of several technologies, making it hard to maintain. It uses Scala as a frontend and for its

offline compiler implementation. The compiler uses a source-to-source translation step to convert Liszt's IR into C++ code. A runtime is written separately in C++, which provides common functionality that the generated code can use, such as allocating fields or loading meshes. We implemented a new version of Liszt. We use Lua to write the compiler transformations and manage the DSL's state, and we use Terra as the target of code generation and as the language for writing high-performance libraries for working with mesh data structures. Using Lua and Terra as a replacement for two separate technologies removed complexity from Liszt's design. The original offline compiler required careful design decisions that were easier to make in the Lua-Terra version. Frequently, we needed to decide whether code for part of the Liszt program should go in a runtime function or be part of the code generated by the compiler. Writes to Liszt's fields are a good illustration of the problem. Fields are parameterized over the type of mesh element (e.g., vertex or edge) and the type of data they store (e.g., float, int, vector). Writes might be a simple assignment, or a reduction operator such as +=. Liszt runs on both CPUs and GPUs, and the field representation on each is different, so the code to write the fields differs. One option is to generate different code for all of these options in the code generator. This design provides maximum control and efficiency, but results in lots of implementation details being embedded as strings in the source-to-source generator. Another option is to make all of these decisions in a runtime function that is provided dynamic tags describing what operation to perform. This design is easier to read and understand, but can incur runtime overhead if the offline compiler does not inline and optimize the calls into the runtime (a form of partial evaluation).
After multiple iterations on the design, the best solution for generating field writes is a mix of the two extremes. But the original architecture of Liszt made these decisions difficult to prototype. Changing where code was created required moving it from string templates in a Scala file to C functions in the runtime. The alternative in Lua-Terra is much simpler. Different fragments of code are first-class Lua objects, and can be managed by the code generator. A runtime function that made decisions about fields would exist as an object in the Lua environment. If the programmer wanted to change it to a template that specialized some decisions, that object could simply be replaced with a Terra macro that generated the appropriate code. Furthermore, since both exist in the same process, it is easy to debug the

result by introspection. It is also impossible to generate code that refers to a non-existent runtime function, something that could easily happen when generating string templates. Since the original Liszt implementation generates C++, and Terra uses the same compiler pipeline as traditional C compilers, the performance of Terra-generated code is comparable to the original version of Liszt. Using Lua-Terra instead of separate processes also makes it easier to integrate Liszt into existing applications, and to use externally created libraries of C code. Our interface for language extensions also makes it easier to embed the Liszt frontend in Lua, allowing the offloading of many language features to the Lua runtime.
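The field-write trade-off described above can be sketched as a small code generator, written here in Python (gen_field_write and field_write_generic are hypothetical names invented for illustration, not Liszt's actual API):

```python
def gen_field_write(strategy, elem, dtype, op):
    """Emit code (as a string) for a Liszt-style field write.

    "runtime" defers every decision to a generic helper driven by dynamic
    tags; "specialized" bakes the element kind, data type, and operator
    directly into the generated code, as the offline compiler option does.
    """
    if strategy == "runtime":
        return f'field_write_generic("{elem}", "{dtype}", "{op}", f, i, v)'
    # specialized: the generator inspects the configuration and emits exactly
    # the statement needed, as a first-class code fragment
    stmt = "f[i] += v" if op == "reduce_add" else "f[i] = v"
    return f"/* {elem}:{dtype} */ {stmt}"
```

In Lua-Terra the fragments would be quoted Terra expressions rather than strings, which is what makes it possible to move a decision between "runtime function" and "generated code" by swapping one Lua object for a macro.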

5.4 QUICKSAND: A DSL FOR PROBABILISTIC PROGRAMMING

Probabilistic programming languages (PPLs) are a general-purpose modeling tool for artificial intelligence, machine learning, and statistics [33, 46]. Probabilistic programs define probability distributions: the program makes random choices (such as flipping a weighted coin), and running the program produces a sample from the marginal distribution implied by those choices. By conditioning the output of the program on a predicate, programmers can pose interesting queries of their models. A universal PPL such as the probabilistic Scheme dialect Church can describe any Turing-complete, stochastic process, including recursive processes and distributions with infinite support [33]. These languages are expressive, but their implementations are slower than equivalent hand-coded models. One reason for the performance gap is algorithmic: inference algorithms (i.e., implementations of conditioning semantics) must be general-purpose and cannot easily optimize for the statistical traits of specific models. Another reason is computational: probabilistic inference is a numerically-intensive task, but existing universal PPL implementations are high-level and dynamically-typed, making it harder to control their performance. Addressing this second problem, we implemented a Church-style universal PPL, Quicksand, as a library in Terra. The query "What is the chance that a patient has lung cancer,

id = 0 function pfn(fn) return macro(function(self, ...) id = id + 1 5 local args = terralib.newlist({...}) local argIntermediates = args:map( function(a) return symbol(a:gettype()) end) return quote var [argIntermediates] = [args] 10 callsiteStack:push(id) var result = fn([argIntermediates]) callsiteStack:pop() in result end end) 15 end

Figure 5.5: A higher-order Lua function that wraps a Terra function for use in MCMC inference.

given that she has a cough?” in a simplistic medical diagnosis model is expressed in our Terra-based language as: terra() var lungcancer = flip(0.01) var cold = flip(0.2) var cough = lungcancer or cold 5 condition(cough) return lungcancer end In contrast to other DSLs, Quicksand is written in Terra functions directly using special type-macros like flip and condition. The functionality of these operators is difficult to implement directly in a low-level language such as C++ because it lacks sophisticated meta-programming. This is illustrated in a few places in Quicksand’s design. First, the Markov Chain Monte Carlo (MCMC) inference algorithm used to sample from conditioned programs requires that every random choice in the program be given an address that is uniquely determined by the choice’s structural position in program execution traces [78]. Typically, addresses are managed with a global stack of function callsites: every function call pushes a unique ID on entry and pops the stack on exit. When a random choice is invoked, the sequence of IDs on the stack plus the number of times that sequence has occurred in the current program execution uniquely identifies the random choice. We implement this behavior using type-macros in Figure 5.5. The Lua function pfn wraps a Terra function with macro code that manages the address stack. Critically, the macro is executed during typechecking, so each callsite of the function receives a distinct ID. CHAPTER 5. BUILDING DSLS 61
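The address-stack discipline in Figure 5.5 can be sketched in plain Lua, without Terra or MCMC. Everything below (pfn, choiceAddress, the address format) is illustrative only, not Quicksand's actual API: each wrapped function pushes a unique callsite ID, and a random choice is identified by the ID sequence on the stack plus an occurrence counter.

```lua
-- Plain-Lua sketch (no Terra): structural addressing for random choices
-- using a global callsite stack. All names here are illustrative.
local callsiteStack, nextId, counts = {}, 0, {}

local function pfn(fn)           -- wrap a function with a unique callsite ID
  nextId = nextId + 1
  local id = nextId
  return function(...)
    table.insert(callsiteStack, id)
    local result = fn(...)
    table.remove(callsiteStack)
    return result
  end
end

local function choiceAddress()   -- address = stack of IDs plus occurrence count
  local base = table.concat(callsiteStack, "/")
  counts[base] = (counts[base] or 0) + 1
  return base .. "#" .. counts[base]
end

local flip = pfn(function() return choiceAddress() end)
local a, b = flip(), flip()      -- same callsite, distinguished by the counter
local outer = pfn(function() return flip() end)
local c = outer()                -- nested call path yields a new address
print(a, b, c)                   -- 1#1   1#2   2/1#1
```

Two syntactically identical calls to flip receive distinct addresses, and calling flip through another wrapped function produces a different address entirely, mirroring the structural-position property the text describes.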

Managing random choice addresses is more complicated in non-staged languages. Previous implementations of universal PPLs use either a custom interpreter or a source-to-source transformation [33, 78]. Interpretation is too slow for our performance goals, and source transformation requires a complete parse of the program, which is a steep price to pay for embedding a universal PPL in most low-level languages.

We must also define the behavior of primitive random choices (e.g., flip). These constructs must sample a value from an underlying probability distribution (e.g., Bernoulli), compute the probability of their values under the distribution, and propose changes to their values for use in MCMC inference. Our implementation exposes a Lua function makeRandomChoice for this purpose:

flip = makeRandomChoice(
  -- Sampling function
  terra(p: double) return (rand() < p) end,
  -- Probability function
  terra(val: bool, p: double)
    if val then return log(p)
    else return log(1-p) end
  end,
  -- Proposal function
  terra(currval: bool, p: double) return not currval end)

Internally, makeRandomChoice defines an RCRecord type that records the value and parameters for each invocation of this random choice. The entries of RCRecord are determined programmatically by examining the argument and return types of the sampling function. makeRandomChoice returns a pfn that constructs an instance of RCRecord and stores it in a global table mapping random choice addresses to values.

Because Terra allows reflection on functions to drive the generation of new types, we can abstract the details of the random choice as a single library routine. The user only needs to provide three functions, which can be overloaded to handle random choices taking different parameter types. Without the ability to generate code with arbitrary types, the behavior of every random choice would have to be implemented independently for each set of parameter types, or would require sophisticated parameterized typing.

We evaluate the performance of our Quicksand implementation on three example programs: conditioned sampling of a length-100 sequence from a Hidden Markov Model with 9 hidden states and 10 possible observations (HMM); learning the parameters of a three-component Gaussian Mixture Model from 1000 data points (GMM); and simulating a 1000-site one-dimensional Ising model (Ising). All examples were run for 5000 iterations of MCMC. We compare performance to that of Bher (a compiled implementation of Church [78]) and a probabilistic dialect of V8 Javascript actively used in teaching. We only evaluate Church on the first example because the other two require language primitives not available in Church.

Figure 5.6: Performance of different probabilistic programming languages (Terra PPL, V8 JS PPL, Bher Church) on three example tasks (HMM, GMM, Ising; wall clock time in seconds; Bher on the HMM takes 40.97 sec). Javascript runs an order of magnitude faster than Bher, and our Terra implementation runs an order of magnitude faster than that. [Bar chart omitted.]

Results are shown in Figure 5.6. The Javascript implementation is an order of magnitude faster than Bher on the HMM example. This difference is due in part to V8’s optimization engine, and also to the overhead of Bher’s purely functional data structures (particularly the trie it uses for random choice addressing). Our Terra implementation is, on average, an additional order of magnitude faster than Javascript. Terra’s static typing allows up-front compilation of much more monomorphic code, reducing the number of virtual function calls as well as boxing/unboxing overhead.

CHAPTER 6 GENERATING TYPES WITH EXOTYPES

We have already seen that for code generators, the two-language design produces code that is fast and concise. Many of these applications, such as Darkroom or Liszt, include their own frontend syntax and focus on optimizing the composition of language operators. But others, such as ATLAS or Quicksand, expose a simpler API to the end users. One way to support this kind of simpler library is to build code generation into the way Terra’s type system works.

Object-oriented systems associate method calls with particular types, providing a way to work with data abstractly. For instance, in the method invocation student:serializeTo(buffer), the programmer can still use the serialization method even if they are not aware of the details of the implementation. This abstraction layer of types and methods allows the library writer to use as much complexity as they want when implementing the behavior. For instance, they can use reflection and code generation to compile the serialization method on demand. This complexity does not change the interface that the user of the library sees. If a library writer can use types that do code generation internally, it allows users of the types who are not necessarily aware of the techniques of staged programming to still benefit from the approach. This technique is sometimes referred to as an active library since the types in the library are internally doing dynamic compilation [20].

Unfortunately, active libraries are not widely used. One reason is that it is difficult to deploy them in current languages. Static and dynamic approaches to type checking of these libraries present different challenges. Statically-typed languages make it difficult or impossible to do arbitrary computation on types. Typically, the amount of computation on types is limited to what can be expressed in the type system itself, for example inheritance


or parameterization. Even in cases like template meta-programming in C++ where many computations are possible, the interface for doing sophisticated computation is so complex that it is not typically used. Active libraries for many statically-typed languages instead resort to pre-processors like Google’s protocol buffers [34], which provide serialization for C++ by generating a library of C++ types based on an external description.

In contrast, dynamically-typed languages often provide the flexibility necessary to do reflection and provide generic implementations of types. One reason is that many dynamic languages such as Lisp, Python, or Lua support so-called meta-object protocols, meaning there is a mechanism for the user to programmatically modify the semantics and implementation of user-defined types [3, 40, 45]. Higher-level policies such as inheritance or accessor permissions can be defined on top of these mechanisms, giving the programmer great flexibility in defining object behavior, including providing their own generated behavior.

However, when an object’s behavior and in-memory representation are defined dynamically, it is difficult to perform some optimizations, resulting in performance losses. For instance, in Section 6.3, we implement a microbenchmark of an Array object that forwards methods to each of its elements. The JIT-compiled Lua version runs 18 times slower than equivalent C++ code due to object boxing and dynamic dispatch. JIT compilers can optimize some dynamic patterns [21, 38], but it is difficult to know whether a pattern will result in high-performance code.

The two-language design of Lua and Terra is neither a fully dynamically-typed nor a fully statically-typed language. We can use this hybrid design to combine the advantages of each system, making it possible to do arbitrary computation on types while avoiding the overhead of runtime checks. This design makes it easy to create active libraries of concise, highly optimized, and composable types.
Our approach is to extend the typical typechecking process that statically-typed languages undergo to include user-defined computation on types. When Terra’s typechecker needs to know how a user-defined type should behave (e.g. what does it mean to call method serializeTo on object student?), it calls a Lua function provided by the definer of the type. That function can do arbitrary computation on the type, generating code that implements the behavior, or describing other aspects of the type such as its memory layout. The result is passed back to the typechecker. We call these types exotypes since they are defined in a Lua

API that is external to Terra itself, which differs from other statically-typed languages that include their own statements for defining types. Exotypes combine meta-object protocols with multi-stage programming to give the programmer more control over the code’s performance. Rather than define an object’s behavior as a function that is evaluated at runtime, an exotype describes the behavior using staged programming during typechecking. This design allows the programmer to optimize behavior and memory layout before any instances of the object are used.

As a concrete example, consider joining two different employee databases that both contain an “employee ID” field. To implement the join efficiently in a low-level language, the structure of both databases must be described in code beforehand. In a dynamic language, the structure can be deduced at runtime by reading a database schema, but the programmer has less control over the layout of the objects. They may be boxed, adding an extra level of indirection, and their fields may be stored in hash-tables rather than linearly in memory. With exotypes, the database structure can be read at runtime while retaining a compact object layout. The first stage of the program reads the database schema and generates exotypes with fixed, compact data layouts. With the object layout known, the second stage actually compiles and runs the join, exploiting the compact layout of the generated types to store objects unboxed and access them with simple pointer arithmetic.

We introduce the concept of an exotype and present a concrete implementation based on programmatically-defined properties queried during typechecking. We show that high-level type features such as type constructors can be created with exotypes. In the next chapter, we present formal semantics for the way that Terra’s typechecker interacts with Lua when evaluating exotype properties.
When specified in a well-behaved manner, independently-defined type constructors can be composed. In Chapter 8, we evaluate the use of exotypes in several performance-critical scenarios: class-systems, database layout, automatic differentiation, serialization, and dynamic assembly.

6.1 META-OBJECT PROTOCOLS

Modern dynamic languages allow programmatic definition of object behavior. For instance, Python provides metaclasses, which can override the default behaviors of method definition

and invocation, and CLOS allows for the dynamic specification of all behavior of objects using so-called meta-object protocols [45, 3]. The Lua language uses a meta-object protocol based on metatables to extend the normal semantics of objects [40]. Metatables are Lua tables containing functions that define new semantics for default behaviors. For instance, we can change the behavior of the table indexing operator obj.field by setting the __index field in a metatable:

local myobj = {}
setmetatable(myobj, { __index = function(self,field) return field end})
print(myobj.somefield) -- prints "somefield"

When the expression myobj.somefield is evaluated, the Lua interpreter will look for the key "somefield" in the myobj table. If the key does not exist, it will instead call the __index function of myobj’s metatable, passing the object and the missing key as arguments and returning the result as the value of the original expression. Metatables also contain other functions that similarly define other behaviors such as function application and arithmetic operators.

We use a meta-object protocol defined using Lua tables to describe the behavior of exotypes. However, the behaviors in exotypes are expressed using staged programming and queried before the code that uses the objects is compiled. While most meta-object protocols are applied dynamically, some, such as those in OpenC++, are applied statically during compilation [14]. In these systems, no new types are defined at runtime. Exotypes blend the two approaches. New types can be created and compiled as the program runs, but since exotype behavior is described with staged programming of a low-level language (Terra), the programmer retains control over low-level representation and implementation.
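The fallback behavior described above is easy to observe directly. The following self-contained Lua snippet (illustrative, not from the thesis) shows __index firing only for absent keys, plus __add hooking the + operator:

```lua
-- __index is consulted only when a key is absent from the table itself;
-- __add redefines the + operator for tables carrying this metatable.
local t = setmetatable({ present = 1 }, {
  __index = function(self, key) return "missing:" .. key end,
  __add   = function(a, b) return a.present + b end,
})
print(t.present)  -- 1 (key exists, so __index is never consulted)
print(t.absent)   -- missing:absent (key absent, so __index supplies a value)
print(t + 41)     -- 42 (__add defines the + operator)
```

The same table-of-functions shape is what exotypes reuse, except the functions run during typechecking rather than at each access.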

6.2 EXOTYPES INTERFACE

We define an exotype as a tuple of functions which will be called during typechecking to define the layout and behavior of a user-defined type:

(() → MemoryLayout) ∗ (Op0 → Quote) ∗ ... ∗ (Opn → Quote)

[Figure 6.1 diagram: phases of evaluation of a Lua-Terra program, showing Lua evaluation, Terra specialization, and Terra typechecking/compilation. During typechecking of an example terra function using Student2, the typechecker queries __getentries() (which reads the data.csv file to produce the layout name: rawstring, year: int); calls to setname and setyear miss in __getmethod and fall through to __methodmissing, which splices quote s.name = "bob" end and quote s.year = 4 end; and __getmethod("print") resolves to the already-compiled Student2.methods.print(&s).]

Figure 6.1: Phases of evaluation of a Lua-Terra program with exotypes (left), and an example of the typechecking process for the Student2 exotype (right). New interactions used to implement exotypes are highlighted in red.

PROPERTY                  OPERATION (example)

__getentries()            Defines the in-memory representation of T as a list of named fields. (obj.myfield)

__getmethod(name)         Gets the static implementation of name for type T. (obj:mymethod())

__add(lhs,rhs)            Defines the behavior of the + operator on the object (lhs or rhs has type T). (obj + 1)

__cast(from,to,exp)       Defines how to convert expression exp of type from to type to. ([int](obj))

__apply(arg1,...,argN)    Defines how to apply an instance of T to a list of arguments. (obj(1.0, true))

Figure 6.2: A selection of exotype properties and example operations that cause the type checker to invoke them.

The first function computes the in-memory layout of a type, which is specified with MemoryLayout. The remaining functions describe the semantics of the type when it appears in a primitive operation of the language such as a method invocation, binary operator, or cast. Given an instance of a primitive operation (Opi), the corresponding function returns a Quote, a concrete expression that implements the instance. These functions are evaluated by

the typechecker whenever it encounters an operation on an exotype and may reference and modify program state. We call these functions property functions and the results of these functions properties of the type. In the remainder of this section, we discuss an implementation of exotypes that uses Lua to define types in Terra, and type-macros to define the type’s behavior. In this implementation, exotypes are the only mechanism for creating user-defined Terra types. Terra’s struct syntax to define types is implemented via desugaring to exotypes. An exotype is created via an API call in Lua:

Student = terralib.types.newstruct()

Property functions are defined in the type’s metamethods table. The in-memory layout of a type is specified with __getentries. For instance, the Student type can be defined to have two fields – a name and a class year:

Student.metamethods.__getentries = function()
  return { {field = "name", type = rawstring},
           {field = "year", type = int} }
end

Since properties are defined programmatically, we are not limited to explicitly enumerated entries. A type can define its layout by querying the layout of another type or accessing external information. For example, we define Student2 by reading a comma-separated value file of student data and inferring the type of the fields from the data:

Student2.metamethods.__getentries = function()
  local file = io.open("data.csv","r") -- e.g. name,year
  local titles = split(",",file:read("*line"))
  local data = split(",",file:read("*line"))
  local entries = {}
  for i,field in ipairs(titles) do -- loop over entries in titles
    -- is the data a string or an integer?
    local type = tonumber(data[i]) and int or rawstring
    entries[i] = { field = field, type = type }
  end
  return entries
end
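Note that split is not part of the Lua standard library; the Student2 example assumes a helper along these lines (a hypothetical definition, shown only for illustration):

```lua
-- Hypothetical helper assumed by the Student2 example above: split a string
-- on a separator character and return the pieces as a list.
local function split(sep, str)
  local fields = {}
  for piece in string.gmatch(str, "([^" .. sep .. "]+)") do
    fields[#fields + 1] = piece
  end
  return fields
end

local titles = split(",", "name,year")
print(titles[1], titles[2])  -- name   year
```

Any equivalent string-splitting routine would serve; the exotype machinery does not depend on this particular definition.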

To keep code concise, we include simple default implementations. For __getentries in type T, we return T.entries, which is automatically populated by Terra’s struct statement:

struct Student3 { name : rawstring, year : int }
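A plain-Lua sketch of such a default (types shown as strings, and all names here assumed for illustration rather than taken from Terra's API) might look like:

```lua
-- Hedged sketch of a default __getentries: simply return the entries table
-- that the struct statement populated for the type.
local function default_getentries(T)
  return function() return T.entries end
end

local Student3 = { entries = { {field = "name", type = "rawstring"},
                               {field = "year", type = "int"} } }
Student3.getentries = default_getentries(Student3)
local layout = Student3.getentries()
print(layout[1].field, layout[2].field)  -- name   year
```

The point is only that the default is itself an ordinary property function, so user-defined and default layouts are queried through the same interface.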

Semantics are also specified with property functions in the metamethods table. When a method is invoked on an instance of type T (e.g., t:mymethod(arg)), the implementation of

the method is defined using the __getmethod property of type T. By default it looks up the method in the table T.methods. But if the method being invoked does not exist, it will query the __methodmissing property. Here is an example of using methods with the Student2 type, in which we use the __methodmissing property to create setter methods (e.g., setname) for each field:

Student2.methods.print = terra(self : &Student2)
  C.printf("%s in year %d\n", self.name, self.year)
end
Student2.metamethods.__methodmissing = macro(function(name,self,arg)
  local field = string.match(name,"set(.*)")
  if field then
    return quote self.[field] = arg end
  end
  error("unknown method: "..name)
end)

The typechecking process for example code that uses Student2 is illustrated in Figure 6.1 (right). When the typechecker sees an instantiation of the type, it queries __getentries to get its memory layout. For Student2 this will load the data.csv file to determine the layout. If the typechecker sees a call to a method like setname which is not in Student2.methods, then the __getmethod property will query the __methodmissing property for its behavior. Since __methodmissing is defined as a type-macro, it will be evaluated during the typechecking process to produce a Terra quotation that implements the behavior.

Other properties define the behaviors for built-in operators (e.g., __add for +), and how to convert between user-defined types (__cast). Figure 6.2 lists some common properties and the operations they define.

Defining behavior programmatically allows generic behaviors to be expressed concisely. For instance, we can interface with externally-defined class systems. Objective-C is an extension to C that adds objects similar in behavior to Smalltalk. We can embed Objective-C objects by creating an exotype ObjC wrapper as shown in Figure 6.3. When the typechecker sees a method called on an ObjC object (e.g., obj:init(1)), its __methodmissing property will insert code to call the Objective-C runtime API, e.g.:

C.objc_msgSend(obj.handle, init_name, 1)

Staging of properties gives the programmer control over the performance of the type. Since behavior can be defined in type-macros that are evaluated during typechecking, information known statically can be pre-computed. In the ObjC type, the method name sel is

C = terralib.includec("objc/message.h") -- include ObjC runtime
struct ObjC { handle : &C.objc_object } -- define wrapper for ObjC types
ObjC.metamethods.__methodmissing = macro(function(sel,obj,...)
  local arguments = {...}
5 local sel = C.sel_registerName(sanitizeSelector(sel,#arguments))
  -- generate expression to implement method call 'sel'
  return `ObjC { C.objc_msgSend([obj].handle,[sel],[arguments]) }
end)

Figure 6.3: Embedding Objective-C objects in Terra using exotypes.

known during typechecking, so we can pre-compute the Objective-C method selector (line 5), which makes the expression executed at runtime (line 7) faster.

Furthermore, __getentries provides low-level control of the memory layout of a type similar to that of C structs. Types are laid out linearly by default and can also be overlapped in unions. This control allows __getentries to describe more efficient memory layouts. We implemented Student objects individually, as if they were entries in a row-oriented database. But we can change __getentries to store students as entries in a column-oriented database, which may be more efficient.

6.3 EXAMPLE: ARRAY(T)

Exotypes allow us to express the generic behavior of objects in a way similar to meta-object protocols in dynamic languages while still achieving the performance of low-level languages. Consider the type constructor Array(·), which takes a type T and produces a new type, Array(T), that holds a collection of Ts. In addition, for each method m of T, Array(T) has its own method m that invokes the original m on each member of the array. This is a Proxy design pattern [32], where methods on one object are forwarded to methods on another. This specification is different from that in most functional languages, where this behavior is implemented as a higher-order map function, but it is widely used in array-based languages such as APL and R to concisely operate on collections.

Proxy patterns are difficult to express generically in many modern languages. Despite the fact that proxies are a common pattern in object-oriented computing, programmers using C++ or Java must duplicate the method names of the object in the proxy. Some languages have added more advanced features to support proxies and other design patterns.

local function createmethod(self,methodname)
  local impl = function(self,...)
    for i,element in ipairs(self.data) do
      element[methodname](element,...)
5   end
  end
  self[methodname] = impl -- cache result
  return impl
end
10 local function createarray()
  local arr = {data = terralib.newlist()}
  return setmetatable(arr, { __index = createmethod })
end

Figure 6.4: An implementation of the array constructor in Lua
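Figure 6.4 depends on terralib; substituting an ordinary table for terralib.newlist gives a self-contained, runnable version of the same forwarding pattern (the Counter type below is invented purely to exercise it):

```lua
-- Plain-Lua version of Figure 6.4's forwarding proxy, self-contained.
local function createmethod(self, methodname)
  local impl = function(self, ...)
    for _, element in ipairs(self.data) do
      element[methodname](element, ...)  -- forward the call to each element
    end
  end
  self[methodname] = impl                -- cache the generated method
  return impl
end

local function createarray(...)
  return setmetatable({ data = { ... } }, { __index = createmethod })
end

-- Usage: forward incr() to two counter objects.
local Counter = {}
Counter.__index = Counter
function Counter.new() return setmetatable({ n = 0 }, Counter) end
function Counter:incr() self.n = self.n + 1 end

local arr = createarray(Counter.new(), Counter.new())
arr:incr(); arr:incr()
print(arr.data[1].n, arr.data[2].n)      -- 2   2
```

The first arr:incr() misses in arr, so __index generates and caches the forwarding method; the second call uses the cached implementation directly.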

For instance, Hannemann and Kiczales show that AspectJ can automate some aspects of proxies, but not in a way that is generically reusable [36]. Expressing proxies in Scala requires the use of advanced features such as its macro facilities and Dynamic type trait that were only added in version 2.10 of the language [9].

In contrast, proxies are relatively easy to implement in dynamic languages. As an example, Array(·) can be implemented concisely using metatables in Lua as shown in Figure 6.4. When a missing method of an array is referenced, the metatable’s __index field (line 12) causes createmethod to be called, which generates an implementation for the method (line 2) that loops over the objects in the array, forwarding the method call to each.

While simple, this implementation makes it hard to control the performance of the object. Methods are looked up dynamically on each object in the array. The objects themselves are boxed, limiting memory locality. We implemented a micro-benchmark that invoked a simple counter function on each member of an array object and evaluated it using LuaJIT, a state-of-the-art tracing JIT [2]. Despite tracing and compiling the loop, it still performs 18 times slower than C++ code that implements the proxy by hand, due to the overhead of guards and unboxing of objects.

With exotypes, we can specify Array(·) almost as concisely as the Lua code, but staging of its behavior and layout allows us to remove the inefficiencies, as shown in Figure 6.5. The __methodmissing type macro serves a similar purpose to the createmethod function in the Lua example. It generates the code that loops over elements of the array and forwards

Array = memoize(function(T)
  local struct ArrayImpl {
    data : &T,
    N : int
5 }
  ArrayImpl.metamethods.__methodmissing =
    macro(function(methodname,selfexp,...)
      local args = terralib.newlist {...}
      return quote
10      var self = selfexp
        for i = 0,self.N do
          self.data[i]:[methodname]([args])
        end
15    end
    end)
  -- other implementation like :init()
  return ArrayImpl
end)

Figure 6.5: An implementation of the array constructor using Exotypes in Terra

the method call, but it does so during typechecking rather than evaluation. Furthermore, the declaration of ArrayImpl provides a concrete layout for the type before it is compiled. Since the layout of ArrayImpl is described before compilation, we can store the array’s T objects unboxed rather than as an array of pointers. In addition, the forwarded calls to methodname on line 12 are resolved at compile time and inlined. In our micro-benchmark, this staging allowed us to generically generate the same code as the hand-written C++ proxy, and run at the same speed.

6.4 COMPOSABILITY

It is possible to apply type constructors such as Array(·) to other programmatically-defined exotypes. To support composability, we allow an implementation of an exotype to query the properties of other exotypes. However, type hierarchies are often recursive, making it possible for multiple types to mutually depend on properties of each other. Consider the following Tree type:

struct Tree { data : int, children : Array(Tree) }
Tree.methods.print = terra(self : &Tree) : {}
  print(self.data)
  self.children:print()
end

The type Tree contains a type Array(Tree) that is defined programmatically with the Array(·) function using Tree itself as an argument. The in-memory layout of Tree depends on the layout of Array(Tree). Similarly, the method print of Array(Tree) depends on the method print in Tree. (The layout of Array(Tree) does not depend on Tree, since it only stores a pointer to it.) Since the dependencies are for different properties, there is no actual circular dependence between the types. The required type information can be determined from the specification in steps. First, the in-memory layout is determined for Array(Tree) and then Tree; then the two methods are defined, first print for Tree and finally print for Array(Tree).

The layout and method definitions of Array(Tree) and Tree must be interleaved to avoid causing a cyclic dependence. In recursive cases similar to this example, eagerly defining all the properties of one type before another can introduce a false dependency. However, interleaving the definition of Array(Tree) and Tree makes it impossible to define a single type constructor function Array(·) that completely defines the properties of a new type. These type-constructors are desirable because they can be provided as libraries and composed with other exotypes without requiring the caller to understand the specific implementation.

Our interface to exotypes resolves this problem by defining each property separately as a lazily evaluated function. The compiler queries an individual property of a type only when it is needed during typechecking. Evaluating properties separately and lazily allows the compiler to interleave property queries from different types while allowing them to be specified entirely for one type in a single function. Since typechecking and compilation of functions are also performed lazily, type properties are not requested earlier than needed.
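This lazy, memoized property scheme can be sketched in plain Lua (names are illustrative; the real mechanism lives inside Terra's typechecker). Querying Tree's layout forces only Array(Tree)'s layout, never its methods, so no cycle arises:

```lua
-- Types as tables of lazily evaluated, memoized property thunks.
local function lazytype(props)
  local cache = {}
  return setmetatable({}, { __index = function(_, prop)
    if cache[prop] == nil then cache[prop] = props[prop]() end
    return cache[prop]
  end })
end

local Tree, ArrayTree  -- forward declarations for the mutual reference

ArrayTree = lazytype {
  layout = function() return { data = "&Tree", N = "int" } end, -- pointer only
  print  = function() return "loop { " .. Tree.print .. " }" end,
}
Tree = lazytype {
  layout = function() return { data = "int", children = ArrayTree.layout } end,
  print  = function() return "print(self.data)" end,
}

-- Forcing Tree.layout queries ArrayTree.layout but not ArrayTree.print,
-- so the interleaved queries never form a cycle.
print(Tree.layout.children.N)  -- int
print(ArrayTree.print)         -- loop { print(self.data) }
```

Eagerly evaluating all of either type's properties first would deadlock on the mutual reference; per-property laziness is what lets the query order interleave.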
In an earlier version of Terra, types were described using eagerly built tables that specified the layout and methods, in addition to ad-hoc user-defined callbacks invoked by the typechecker. Our experience with subtle cycles while building type constructors such as Array(·) led us to replace this design with exotypes based on lazily queried properties.

Lazily queried properties also make it possible to create types that have an unbounded number of behaviors. For instance, the Objective-C wrapper object presented previously can respond to an unlimited number of messages. Despite being unbounded, these methods can

compose with other type constructors. For instance, it is possible to make an Array(ObjC) that stores OS windows and call windows:makeKeyAndOrderFront(nil) to focus them:

var windows : Array(ObjC)
var data = array(NSWindow:alloc(),NSWindow:alloc())
windows.data, windows.N = data, 2
windows:initWithContentRect_styleMask_backing_defer(
  NSRect{{0,0},{10,10}},1,2,false)
windows:makeKeyAndOrderFront(nil)

In this case, the method makeKeyAndOrderFront is requested by the typechecker via the __getmethod property of Array(ObjC), which will in turn query the __getmethod property of ObjC to generate the method call. If Array required all methods to be available up-front, these two types would not compose.
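This chained lookup can be sketched in plain Lua (all names hypothetical): an array type resolves a method by delegating to its element type, which can generate a handler for any method name on demand, so neither type needs its methods enumerated up front.

```lua
-- Sketch of chained method lookup between composed type constructors.
local function objcLike()           -- element type with unbounded methods
  return { getmethod = function(name)
    return function(obj) return "msgSend:" .. name end
  end }
end

local function arrayOf(T)           -- array type delegating to element type T
  return { getmethod = function(name)
    local elemMethod = T.getmethod(name)  -- ask the element type on demand
    return function(arr)
      local out = {}
      for i, e in ipairs(arr) do out[i] = elemMethod(e) end
      return out
    end
  end }
end

local ArrayObjC = arrayOf(objcLike())
local m = ArrayObjC.getmethod("makeKeyAndOrderFront")
print(m({ {}, {} })[1])  -- msgSend:makeKeyAndOrderFront
```

Requiring all methods up front (e.g., a fixed method table built when arrayOf runs) would break this composition, which is exactly the failure mode the text describes.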

6.5 RELATED WORK

Previous work has proposed several optimizations to improve the performance of meta-object protocols in dynamic languages. Most of this work has focused on dynamically-dispatched protocols. For instance, Self, a language based on prototypes that was a precursor to more general protocols, used polymorphic inline caching to optimize method dispatch [38]. Kiczales et al. propose a meta-object protocol for CLOS that uses memoization to speed up method dispatch [45]. These approaches normally still incur some runtime overhead (e.g., to check the cache). In contrast, exotypes are executed during staged compilation, which removes all runtime overhead that is not desired by the user.

There is some work on meta-object protocols that are evaluated statically [48, 14]. The most popular of these approaches provides a meta-object protocol for C++, OpenC++, based on source-to-source translation that allows a programmer to customize the semantics of C++ classes and functions using meta-classes that produce program fragments [14]. Since these methods are applied ahead-of-time, they do not add any runtime overhead, but they require that all information used to create the type be available during compilation. In contrast, Terra is a staged programming language, so new exotypes can be introduced during the execution of a Lua-Terra program, and used in newly generated functions.

The original work on multi-stage programming such as MetaML or MetaOCaml focused on generating new code rather than new types [73, 72]. But other systems extend this work to object-oriented languages and provide some degree of staged type computation. Metaphor

is a multi-stage language with support for type reflection on a built-in class system [55], and Ur [15] allows first-class computation of records and names based on principles from dependently-typed languages. Both try to guarantee that any code produced using staging will be type correct, which makes it difficult to generate some types whose semantics depend on dynamically provided data.

Other work on staged programming has focused on optimizations that can be applied to object representations. Rompf et al. propose a system of internal compiler passes that use staging to optimize the behavior and layout of objects in an intermediate representation, suited to parallel programming, embedded in the Scala language [63]. Type signatures are originally described using Scala's class system, but their representation and implementation can then be optimized through staging. Exotypes additionally allow the original type signatures and behavior themselves to be described programmatically.

Several industrial languages also implement systems similar to exotypes [71, 9]. F# allows type providers, which can specify types and methods based on external data [71]. Like exotypes, these providers are queried lazily when an operation on the type is requested. However, the goal of type providers is to safely type complex data representations, so providers are normally compiled ahead-of-time rather than during program execution. Furthermore, they normally describe types on top of the CLR's object system rather than have the programmer describe their own low-level implementation of types. In the Scala language, the combination of compile-time macros and syntax sugar for supporting dynamic objects allows some types to be described programmatically. Similar to F#, this approach is normally applied ahead-of-time and built on top of JVM objects rather than a low-level language.
Other solutions are tailored to specific problems, such as Google's protocol buffers, which provides a language for generating types with serialization behavior in several target languages [34].

CHAPTER 7 FORMALIZATION OF EXOTYPES

The interaction between Terra and Lua during Terra's typechecking phase adds complexity to a typical typechecking process, since it allows the execution of user-defined code. This chapter provides additional semantics for the behavior of Lua and Terra during this typechecking process. Because we want to reason about termination, we use a small-step semantics for Lua-Terra that is adapted from the semantics presented earlier. Complexity in typechecking exotypes arises from the evaluation of property functions. Exotype property functions can directly query properties of other exotypes. Properties are also queried indirectly by generated code. For instance, Array(ObjC) generates code that calls a method on ObjC, querying its __getmethod property. It is possible for an exotype to request a property resulting in truly cyclic property lookups. In this case, our system emits an error.¹ Furthermore, because properties are arbitrary programs, they are not guaranteed to terminate and may have other undesirable behaviors. In order to discuss some constraints that make properties well-behaved, we formally define the typechecking and property evaluation process for exotypes in Terra. We also examine the consequences of relaxing these constraints to make exotypes more flexible. Terra is evaluated in multiple phases. To keep our formalization of exotypes simple, we focus on the typechecking and evaluation of Terra code with exotype property queries. In particular, we omit the machinery for quotes, escapes, and specialization used to generate Terra code, and assume that an environment P that holds a mapping between exotypes and

¹Other behavior is possible. A recursively requested property could initially return the value ⊤ and iterate property lookup until a fixed point is reached.


their property functions is already constructed. This environment would normally be created in Lua by API calls that create types and set entries in their metamethods table. We model the Terra language with exotypes (ExoTerra) as the typed lambda calculus:

    ė ::= λẋ:T.ė | ẋ | ė ė | c_Int | c_t ė | unwrap ė
    T ::= Int | T → T | t
    v̇ ::= c_Int | λẋ:T.ė | c_t v̇

    Γ(ẋ) = T
    ─────────────  (TYVAR)
    P Γ ⊢ ẋ : T

    P Γ ⊢ ė : T → T′    P Γ ⊢ ė′ : T
    ─────────────────────────────────  (TYAPP)
    P Γ ⊢ ė ė′ : T′

    P Γ[ẋ ← T] ⊢ ė : T′
    ─────────────────────────  (TYFUN)
    P Γ ⊢ λẋ:T.ė : T → T′

    ─────────────────  (TYINT)
    P Γ ⊢ c_Int : Int

    P Γ ⊢ ė : T    P runprop ∅ (prop t layout) →L* P T
    ───────────────────────────────────────────────────  (TYCTOR)
    P Γ ⊢ c_t ė : t

    P Γ ⊢ ė : t    P runprop ∅ (prop t layout) →L* P T
    ───────────────────────────────────────────────────  (TYUNWRAP)
    P Γ ⊢ unwrap ė : T

    P Γ ⊢ ė : t    P Γ ⊢ ė′ : T′
    P runprop ∅ (prop t apply T′) →L* P ė″    P Γ ⊢′ ė″ : t → T′ → T″
    ──────────────────────────────────────────────────────────────────  (TYEXOAPP)
    P Γ ⊢ ė ė′ : T″

Figure 7.1: Typing rules, P Γ ⊢ ė : T, for ExoTerra expressions. ⊢′ refers to the same rules but with TYEXOAPP removed.

Types are either a base type Int, a function T → T, or an exotype represented by a type identifier t. An exotype constructor (c_t ė) takes an expression ė to initialize the internal value of the exotype, and unwrap ė retrieves the internal value. The environment P holds a mapping from an exotype t to its property lookup function, P(t) = λx.e. In real Terra, each property is a different function; in our formalism, the kind of property is instead passed as an argument, which is purely a notational convenience. The property lookup functions are written in ExoLua,

    C ::= • | C ė | v̇ C | c_t C | unwrap C

    ───────────────────────────────  (TAPP)
    P (λẋ:T.ė) v̇ →T P ė[v̇/ẋ]

    P ė →T P ė′
    ─────────────────────  (TCTX)
    P C[ė] →T P C[ė′]

    ─────────────────────────  (TUNWRAP)
    P unwrap (c_t v̇) →T P v̇

    P ∅ ⊢ v̇′ : T    P runprop ∅ (prop t apply T) →L* P ė
    ──────────────────────────────────────────────────────  (TEXOAPP)
    P (c_t v̇) v̇′ →T P ė (c_t v̇) v̇′

Figure 7.2: Rules, P ė →T P ė′, for evaluating ExoTerra expressions.

    C  ::= • | C e | v C | prop v0 ... v(i−1) C e(i+1) ... en | runprop S C
    C′ ::= • | C′ e | v C′ | prop v0 ... v(i−1) C′ e(i+1) ... en

    ──────────────────────────  (LAPP)
    P (λx.e) v →L P e[v/x]

    C ≠ •    P e →L P e′
    ──────────────────────  (LCTX)
    P C[e] →L P C[e′]

    ────────────────────────  (LERROR)
    P C[error] →L P error

    ────────────────────────  (LRUNPROP)
    P runprop S v →L P v

    (v0, ..., vn) ∈ S
    ───────────────────────────────────────────  (LPROPCYCLE)
    P runprop S C′[prop v0 ... vn] →L P error

    (t, v1, ..., vn) ∉ S    P(t) = λx.e
    ───────────────────────────────────────────────────────────────────  (LPROP)
    P runprop S C′[prop t v1 ... vn] →L
        P runprop S C′[runprop (S ∪ {(t, v1, ..., vn)}) ((λx.e) v1 ... vn)]

Figure 7.3: The rules, P e →L P e′, for evaluating ExoLua.

based on the untyped lambda calculus:

    v ::= T | λx.e | ė

    e ::= v | e e | x | error | prop e0 ... en | runprop S e

ExoTerra types (T) and expressions (ė) are values in ExoLua so that they can be returned from property queries. An exception expression, error, models the errors that occur when we find a cyclic property query. The prop form is used to query a property of a type. Its first argument e0 should evaluate to an exotype identifier t, and e1 ... en are additional arguments describing the query. Properties are evaluated in the context of querying other properties. This evaluation is formalized with the runprop S e expression, which evaluates a property lookup expression e in the context S, where S is a set of properties. Each property in S is already being queried when e is requested, and has the form (v0, ..., vn).

The rules for typechecking ExoTerra are shown in Figure 7.1. Rule TYCTOR illustrates how the typechecking process queries the property function to retrieve information about the type. In particular, TYCTOR asks the property function for the type of the concrete value that will represent the exotype, and makes sure it matches the type of the expression used to initialize it. Rule TYEXOAPP shows how a property function defines the behavior of an exotype when it is applied like a function. It queries the exotype for its behavior when applied to a value of type T′ (runprop ∅ (prop t apply T′)). This query should produce an implementation of the behavior in the form of a function (ė″). This function takes the exotype ė and the argument ė′ to compute the value of the expression. We check ė″ with modified typing rules ⊢′, which omit TYEXOAPP. This change prevents the implementation ė″ of an exotype from relying on exotype behavior itself, which could prevent typechecking from terminating if, for instance, a query about exotype application included the same application in its implementation.
This restriction does not limit what implementations can be expressed, since the property generating the implementation function can itself query exotypes for the appropriate behavior. Rules for evaluating ExoTerra are presented in Figure 7.2 and show how the results of property queries are applied to evaluate Terra code. The rules for evaluating ExoLua are shown in Figure 7.3. Exceptions abort the computation (LERROR). The rules LPROP and LPROPCYCLE perform property queries. A property

is computed as the nested application v0 ... vn, where v0 is the property lookup function for type t. We say that the tuple (t, v1, ..., vn) is the property being queried. For example, the expression P(t) apply T is a query of the (t, apply, T) property. Property expressions (prop) can only step inside of a runprop expression, which specifies the set S of active property lookups. If the same property is already being queried, it is in the set S and will evaluate to error (rule LPROPCYCLE); otherwise, the property will be evaluated in a new runprop context that records the fact that the property is currently being queried (rule LPROP). The values in property queries can be functions; for the purposes of S, we consider two lambda terms equal if they are equivalent up to alpha conversion. Given this formalization, we can define sufficient conditions to ensure that a property lookup during typechecking will terminate:

• Individual termination. A property evaluation e in a program P C[runprop S e] reduces to P C[v] or P error, assuming that all of the subsequent property evaluations that it evaluates (P C[runprop S C′[prop v0 ... vn]]) also reduce to values (P C[runprop S C′[v′]]) or P error.

• Closed universe. There exists a finite number N of unique properties of the form (v0, ..., vm) that can be queried.

THEOREM. Assuming individual termination and a closed universe, a property lookup P runprop ∅ e will terminate with a value v or error. The proof uses the fact that there is a bounded set of properties to show that a program will eventually terminate or reach a cycle.

LEMMA. Let En be a property evaluation with individual termination, P C[runprop Sn e], that does not terminate, with |Sn| = n. Then En reduces to a property lookup, En →L* En+1, with En+1 = P C[runprop Sn C′[runprop Sn+1 e′]] and |Sn+1| = n + 1.

Proof: From individual termination, there must exist a sequence of steps En →L* En+1, where En+1 = P C[runprop Sn C′[prop v0 ... vm]] and En+1 does not terminate. Furthermore, the only rule that applies to En+1 is LPROP, since LPROPCYCLE terminates with an error. Hence, En+1 →L P C[runprop Sn C′[runprop Sn+1 ((λx.e) v1 ... vm)]], where Sn+1 = Sn ∪ {(v0, ..., vm)}. From LPROP we know that the new property was not already in Sn, so |Sn+1| = n + 1.

PROOF OF THEOREM. Assume a property lookup P runprop ∅ e does not terminate. By induction using the Lemma, evaluation will step to a property lookup EN+1 with |SN+1| = N + 1 active properties. However, this contradicts the closed universe assumption, since there are at most N properties that can be queried.

Removing either of these conditions allows properties to run forever. If we remove individual termination, then it is possible that an individual property lookup function will not terminate. We expect programmers can debug issues that arise from non-termination within a single property. If we remove closed universe, it is also possible to run forever. Consider a __getmethod property that, for each method m, appends the string "foo" to m and tries to call this new method on itself. This property will not terminate because the property being requested at each depth is different from the previous properties. In our implementation, we track which properties are being queried and throw an exception when a cycle is found. However, in cases such as __getmethod, there are an unbounded number of possible method names, so some property lookups may not terminate. In practice, we cap the depth of property lookup and report a trace of property requests when the limit is reached to ensure termination.

The semantics of property lookup suggest a few design principles to ensure property functions are composable. First, though Lua is not a purely functional language, property lookup functions should be written in a functional style. The semantics show that in some cases, such as TYCTOR and TYUNWRAP, the same property will be evaluated multiple times. Furthermore, typechecking occurs when a function is first used, so the order in which type properties are evaluated is determined dynamically. Since the formal languages are functional, they will always produce the same result regardless of when they are evaluated. In our actual implementation, where side effects are possible, we memoize property queries to guarantee the same result.
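The cycle check and depth cap described above can be modeled with a short Python sketch (the function names, environment encoding, and depth limit are illustrative assumptions, not Terra's actual machinery):

```python
# Python model of the runprop/prop semantics: each property lookup runs
# with the set S of in-flight queries. Re-requesting a member of S is a
# cycle (LPROPCYCLE), and a depth cap guards the open-universe case.

MAX_DEPTH = 128  # illustrative cap, not Terra's actual limit

class PropertyError(Exception):
    pass

def runprop(P, t, args, S=frozenset()):
    key = (t,) + tuple(args)
    if key in S:                      # LPROPCYCLE: query already active
        raise PropertyError(f"cyclic property lookup: {key}")
    if len(S) >= MAX_DEPTH:           # practical cap for unbounded queries
        raise PropertyError("property lookup depth exceeded")
    lookup = P[t]                     # P(t) = the property lookup function
    # LPROP: evaluate with the context extended by the active query;
    # the lookup function receives a callback for nested queries
    return lookup(lambda t2, *a: runprop(P, t2, a, S | {key}), *args)

# a terminating lookup: a single type answering its "layout" property
P = {"Vec": lambda query, prop, *rest: 16 if prop == "layout" else None}
print(runprop(P, "Vec", ("layout",)))  # → 16
```

A pair of lookup functions that query each other for the same property reproduces the cyclic case: the second request for an active key raises the modeled error, just as LPROPCYCLE steps to error.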
Since the writer of a property does not control when it is queried, it is a good idea to write property functions so that they will produce the same result regardless of when they are evaluated. Furthermore, it is important to avoid creating cyclic property queries. It is sometimes convenient to calculate a group of properties (e.g. all methods of a type) at once during a property lookup. This approach is problematic, since querying additional properties can cause additional cycles. Consider an analogous case when typechecking the exotype

constructor c_t1 1 under P. The following property lookup functions will request an additional property, causing a cycle:

    P(t1) = λx_name. prop t2 x_name
    P(t2) = λx_name. first Int (prop t1 x_name)
    where first = λx1. λx2. x1
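As a model of the semantics only (not Terra code), these two lookup functions can be run under a small Python version of property lookup to show the cycle being detected:

```python
# Python model of the cyclic example: t1 forwards its query to t2, and
# t2 re-queries t1 before returning Int, so the (t1, layout) query
# re-enters while still active and is reported as a cycle.

def query(P, t, prop, active=frozenset()):
    if (t, prop) in active:
        raise RuntimeError(f"cycle on ({t}, {prop})")
    sub = lambda t2, p2: query(P, t2, p2, active | {(t, prop)})
    return P[t](sub, prop)

first = lambda x1, x2: x1
P = {
    "t1": lambda q, name: q("t2", name),
    "t2": lambda q, name: first("Int", q("t1", name)),
}

try:
    query(P, "t1", "layout")
except RuntimeError as e:
    print(e)  # cycle on (t1, layout)
```

Because Python evaluates the arguments of `first` eagerly, the discarded query on t1 is still issued, which is exactly why the cycle occurs even though its result is never used.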

t1 will forward its property query to t2. t2 will first query the same property on t1, discard the result, and return Int. This evaluation causes a cycle on (t1, layout) that could be avoided if t2 only queried the properties it needed. This example suggests that a property should only query other properties when they are needed to calculate the result of the original query. Querying other properties only when needed and writing properties in a functional style ensures that property queries are as composable as possible.

CHAPTER 8 BUILDING LIBRARIES WITH EXOTYPES

To evaluate the expressiveness and performance of libraries built with exotypes, we have implemented example solutions for several domains where performance is critical. In each scenario, we show how exotypes can express a solution similar to those written in dynamic languages with meta-object protocols while matching the performance of existing state-of-the-art implementations written in C++ or Java. In some cases, the added expressiveness enables more aggressive optimizations, allowing our implementations to exceed the performance of existing libraries. Evaluation was performed on an Intel Core i7-3720QM with 16GB of RAM running OS X 10.8.5. Our implementation of exotypes was built by modifying the original Terra typechecker to make user-defined property queries while tracking cyclic property lookups. Lua's protected call mechanism was used to recover from and report any errors in user-defined properties.

8.1 CLASS SYSTEMS

Using exotypes we can implement class systems as a library. This approach is frequently used in dynamically-typed languages such as Lua that do not have built-in policies for inheritance or other familiar object-oriented features. Using exotypes, we implement a single-inheritance class system with multiple interfaces, similar to Java's. We specify classes using an interface implemented in Lua. An example that uses this interface is shown in Figure 8.1. The function interface creates a new interface given a table of method names and types. The functions J.extends and


J = terralib.require("lib/javalike")
Drawable = J.interface { draw = {} -> {} }
struct Square { length : int; }
J.extends(Square,Shape)
J.implements(Square,Drawable)
terra Square:draw() : {} ... end

Figure 8.1: Example code that uses our class system interface

J.implements install property lookup functions on the Square type that implement the behavior of the class system. Our implementation is based on vtables stored inside the types themselves, and uses the subset of Stroustrup's multiple-inheritance implementation [69] that is needed to implement single inheritance with multiple interfaces. For each class, we define a __getentries property function, allowing it to compute the layout of the type. Because this property lookup function is called only when the actual layout is needed during typechecking, the calls to J.extends and J.implements will have already been performed, so all the information about the type will be known. For our class system, __getentries is responsible for calculating the concrete layout of the class, creating the class's vtable, and creating vtables for any interface that the class implements. If the user specified a parent class using J.extends, then the class and its vtables are organized such that the beginning of each object has the same layout as an object of the parent, making it safe to cast a pointer to the class to a pointer to the parent. If the user specified an interface using J.implements, then we create a vtable that implements the interface, and insert a pointer to the vtable in the layout of the class. Finally, for each method defined in the type's methods table, we create a stub method to invoke the real method through the class's vtable:

class.metamethods.__getmethod = function(self,methodname)
  local fn = self.methods[methodname]
  local fntype = fn:gettype()
  local params = fntype.parameters:map(symbol)
  local self = params[1]
  return terra([params]) : fntype.returntype
    return self.__vtable.[methodname]([params])
  end
end
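The vtable organization can be sketched in Python as a model (the classes and methods are hypothetical examples; real Terra vtables are arrays of function pointers laid out in memory, not dictionaries):

```python
# Python sketch of the vtable scheme: a child class's vtable starts from
# a copy of its parent's, with overrides shadowing parent entries, and
# every method call dispatches through the class's vtable via a stub.

def make_class(name, methods, parent=None):
    vtable = dict(parent["vtable"]) if parent else {}
    vtable.update(methods)  # overrides shadow parent entries
    return {"name": name, "vtable": vtable, "parent": parent}

def call(cls, obj, methodname, *args):
    # stub: dispatch through the class's vtable, as __getmethod would
    return cls["vtable"][methodname](obj, *args)

Shape  = make_class("Shape",  {"area": lambda self: 0})
Square = make_class("Square", {"area": lambda self: self["len"] ** 2},
                    parent=Shape)

sq = {"len": 3}
print(call(Square, sq, "area"))  # dispatches through Square's vtable → 9
```

Because the child's table is a superset laid out "on top of" the parent's, code written against the parent's dispatch scheme works unchanged on child objects, which is the property the pointer casts below rely on.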

At this point, child classes can access the methods and members of a parent class, but the Terra compiler will not allow the conversion from a child to its parent or to an interface. To enable conversions, we create a user-defined conversion that reflects the subtyping relations

of our class system (e.g., &Square <: &Shape). We implement the conversion generically by defining a __cast property function:

class.metamethods.__cast = function(from,to,exp)
  if from:ispointer() and to:ispointer() then
    if issubclass(from.type,to.type) then
      return `[to](exp) -- cast expression to 'to' type
    elseif implementsinterface(from.type,to.type) then
      local imd = interfacemetadata[to.type]
      return `&exp.[imd.name] -- extract subobject
    end
  end
  error("not a subtype")
end

Since the beginning of a child class has the same layout as its parent, we can convert a child into a parent by simply casting the object's pointer to the parent's type (`[to](exp)). Converting an object to one of its interfaces requires selecting the subobject that holds the pointer to the interface's vtable (`&exp.[imd.name]). The stubs generated for the interface restore the object's pointer to the original object before invoking the concrete method implementation. The implementation requires only 250 lines of Terra code to provide much of the functionality of Java's class system and is comparable in size to a class system wrapper that might be implemented in Javascript or Lua. We measured the overhead of function invocation in our implementation using a micro-benchmark and found it performed within 1% of analogous C++ code. Users are not limited to any particular class system or implementation. For instance, we have also implemented a system that implements interfaces using fat pointers that store both the object pointer and vtable together.

8.2 DATA LAYOUT

Exotypes can also help programmers create container objects that can easily adapt their data layout to different use cases. One common problem in high-performance computing is choosing between storing records as an array of structs (AoS, all fields of a record stored contiguously), or as a struct of arrays (SoA, individual fields stored contiguously). We implement a solution to this problem, and contrast it with existing languages. Changing the layout can substantially improve performance. We implemented two micro-benchmarks based on mesh processing. Each vertex of the mesh stores its position,

and the vector normal to the surface at that position. The first benchmark calculates the vector normal as the average normal of the faces incident to the vertex. The second simply performs a translation on the position of every vertex. Figure 8.2 shows the performance using both AoS and SoA form.

Figure 8.2: Performance of mesh transformations using different data layouts.

    Benchmark               Array-of-Structs   Struct-of-Arrays
    Calc. vertex normals    3.42 GB/s          2.20 GB/s
    Translate positions     9.90 GB/s          14.2 GB/s

Calculating vertex normals is 55% faster using AoS form. For each triangle in the mesh, the positions of its vertices are gathered, and the normals are updated. Since this access is sparse, there is little temporal locality in vertex access. AoS form performs better in this case since it exploits spatial locality of the vertex data: all elements of the vertex are accessed together. In contrast, translating vertex positions is 43% faster using SoA form. In this case, the vertices are accessed sequentially, but the normals are not needed. In AoS form these normals share the same cache lines as the positions, and memory bandwidth is wasted loading them. To facilitate the process of choosing a data layout in Terra, we implemented a function that can generate either version, but presents the same interface. A Lua function DataTable takes a Lua table specifying the fields of the record and how to store them (AoS or SoA), returning a new Terra type. For example, a fluid simulation might store several fields in a DataTable:

FluidData = DataTable({
  vx = float, vy = float,
  pressure = float, density = float
},"AoS")

The FluidData type provides methods to access a row (e.g., fd:row(i)). Each row can access its fields (e.g., r:setx(1.f), r:x()). The interface abstracts the layout of the data, so the layout can be changed just by replacing "AoS" with "SoA". This behavior can be emulated ahead-of-time in low-level languages, for example using X-Macros [53] in C or template meta-programming in C++, but unlike in Terra, the layout cannot be generated dynamically based on runtime feedback. Dynamic languages such as Javascript support this ad hoc creation of data types dynamically but do not provide the same low level of control.
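A rough Python model of this layout switch follows (the field names and accessor API are illustrative, not the actual DataTable interface):

```python
# Python model of a DataTable-style layout switch: the same accessor
# interface is generated over either an array-of-structs or a
# struct-of-arrays backing store, chosen by a string flag.

def data_table(fields, layout, n):
    """Return (get, put) accessors with identical interfaces."""
    if layout == "AoS":
        # one record per row: all fields of a row stored together
        store = [dict.fromkeys(fields, 0.0) for _ in range(n)]
        def get(i, f): return store[i][f]
        def put(i, f, v): store[i][f] = v
    else:  # "SoA": one contiguous array per field
        store = {f: [0.0] * n for f in fields}
        def get(i, f): return store[f][i]
        def put(i, f, v): store[f][i] = v
    return get, put

for layout in ("AoS", "SoA"):
    get, put = data_table(["vx", "vy", "pressure", "density"], layout, 4)
    put(2, "vx", 1.5)
    print(layout, get(2, "vx"))  # same interface, different layout
```

Client code written against `get` and `put` never observes which layout was chosen, which is the property that lets the layout be swapped by changing a single string.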

8.3 SERIALIZATION

Fast serialization is necessary for implementing high-performance distributed systems. Writing robust and efficient serialization libraries is difficult because different use cases often demand different features, impacting the set of implementations and optimizations that can be used. Design choices include binary vs. text encodings, the presence of type and versioning annotations, whether complete object graphs need to be serialized, and many other considerations. One solution is to provide robust libraries that attempt to address every possibility. Google's Protocol Buffers provide cross-language data description and versioning [34]. Java includes a built-in library for serialization that can serialize entire object graphs and ensure type safety [42]. To optimize performance, some libraries such as the Kryo library for Java generate specialized serialization code for each type ahead-of-time [70]. The robust features in these libraries often make them difficult to customize, which is unfortunate, since advanced features such as supporting object graphs can incur runtime overhead. Using exotypes, we can create custom serialization libraries that are generic (work on arbitrary types) and efficient (code to serialize each object is precompiled) with very little code. We focus on the example of serializing 3D scene data from an interactive scene editor connected to a storage server. Scenes are trees, so the serializer must recursively serialize sub-trees but need not handle generic graphs. Tree nodes contain a mixture of numeric and string data. The server and client exchange a binary representation of scene data. We created a generic exotype suited to serializing this kind of data. It responds to two polymorphic methods, write and read. A partial listing of the implementation of write is shown in Figure 8.3, focusing on serialization of structs (other exotypes). Code is generated for each type seen in the object tree.
For struct objects, it queries the layout of the struct to generate code for each of the entries. Performance for a 1.28MB scene is shown in Figure 8.4. We compare against Java's native serialization, the state-of-the-art Kryo library, and Google's C++ implementation of protocol buffers. Objects were serialized to a pre-allocated buffer in memory so that the benchmark would capture encoding time rather than buffer allocation/resizing time. For Java implementations, the JIT was warmed up by serializing the entire scene 250 times

createwrite = memoize(function(T)
  if T:isstruct() then
    local function emitPointers(self,obj)
      local stmts = {}
      local function addEntry(elemtype,elemexp)
        if elemtype:ispointer() then
          local fn = createwrite(elemtype.type)
          table.insert(stmts,quote fn(self,elemexp) end)
        elseif elemtype:isstruct() then
          local entries = elemtype:getentries()
          for _,e in ipairs(entries) do
            addEntry(e.type,`elemexp.[e.field])
          end
        end
      end
      addEntry(T,obj)
      return stmts
    end
    local terra write(self : &Serializer, obj : &T)
      self:writebytes([&uint8](obj),sizeof(T))
      [emitPointers(self,obj)]
    end
    return write
  elseif T:isarray() then
    ...
end)

Figure 8.3: Implementation of a generic serializer, focusing on code for serializing aggregate types.
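The structure of Figure 8.3 can be modeled in Python: a memoized generator produces one serializer function per type and recurses into pointed-to sub-records. The record description and byte encoding below are made up for illustration; they are not the Terra code above:

```python
import struct
from functools import lru_cache

# Python model of Figure 8.3's structure: one serializer is created per
# type the first time it is seen (memoization), and pointer fields
# recurse into the serializer for the pointed-to type.

# hypothetical type descriptions: (field name, "f" float or "ptr:T?")
TYPES = {
    "Node": [("weight", "f"), ("left", "ptr:Node?"), ("right", "ptr:Node?")],
}

@lru_cache(maxsize=None)            # plays the role of memoize(...)
def createwrite(typename):
    fields = TYPES[typename]
    def write(out, obj):
        for name, kind in fields:
            if kind.startswith("ptr:"):
                child = obj.get(name)
                out.append(b"\x01" if child else b"\x00")  # presence byte
                if child:
                    createwrite(kind[4:-1])(out, child)    # recurse per type
            else:
                out.append(struct.pack("<" + kind, obj[name]))
    return write

out = []
createwrite("Node")(out, {"weight": 1.0, "left": {"weight": 2.0}, "right": None})
print(len(b"".join(out)))
```

As in the Terra version, the cost of examining the type's layout is paid once per type when the serializer is generated, not once per object serialized.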

Figure 8.4: Throughput of scene graph serialization, comparing Built-in Java, Kryo, Protobuf (C++; baseline and pre-staged), and Terra (baseline, + write opt., and + fusion opt.). Our specialized exotype implementation performs an order of magnitude faster than existing implementations. [Bar chart; axis: Encoding Throughput (MB/s), 0 to 6000.]

before timing. For Kryo, classes were pre-registered, support for object graphs was disabled, and UnsafeMemoryOutput was used to increase performance. Our baseline implementation can serialize input data at 627 MB/s, which is comparable to Kryo's 432 MB/s. This is expected, since both libraries take the same approach, pre-generating code to serialize each object up front. The Java serializer performs substantially worse (107MB/s) since it interprets the structure of each object during serialization. Given a specific use case, we can apply more aggressive optimizations. Our baseline implementation calls a user-provided function to write data. We can turn our serialization type into a type constructor that takes the write function as an argument. This change (write opt.) allows inlining calls to the write function, increasing performance to 2.37 GB/s. Our baseline implementation also writes each element of a struct individually,

allowing for customized writers for specific objects. However, in this example, no custom writers are needed, so the library can apply aggressive fusion to the writes. Rather than copy each struct element individually, it copies the entire struct at once. Any objects pointed to by elements of the struct are then written afterward. Using vectorization, these larger copies are efficient, increasing the performance to 7.00GB/s (fusion opt.), or 11 times faster than Kryo. As an upper bound on performance, simply copying 1.28MB runs at 15GB/s. Protocol buffers use a different approach, requiring the user to translate an object into its own object hierarchy before serialization. This translation step limits performance to 124MB/s. Serializing the protocol buffer objects directly runs at 1.5GB/s (pre-staged), but doing so is not always possible since it prevents the programmer from using their own object hierarchies in the rest of the program. The combination of staged programming and meta-object protocol makes the expression of this custom serializer concise. In the full Terra implementation, the code to implement both serialization and deserialization is less than 200 lines. At this size, it is feasible for a programmer to modify it to fit different serialization cases, something that is not as feasible with much larger serialization libraries such as Kryo or protocol buffers, which each contain several thousand lines of code.

Figure 8.5: Encoding bandwidth of the assembler for the benchmarks add, mod, sum, meanb, mins, gather, enter, leave, emitfn, and index, comparing Terra ASM with Chrome ASM. Benchmarks are ordered by increasing template size. Larger template sizes increase the effect of the Terra assembler's fusion optimization. [Bar chart; axis: Encoding Bandwidth (GB/s), 0 to 15.]

8.4 DYNAMIC X86 ASSEMBLY

Dynamic assemblers form the basis of JIT compilers. Unlike normal assemblers, which run during compilation, dynamic assemblers are used at runtime to dynamically generate

code. A JIT compiler uses the assembler to translate its own intermediates into machine code. Higher-level JIT-compiled languages may emit templates of x86 assembly that correspond to a single high-level instruction. For instance, the Riposte JIT [74] for vector code in R uses the following template to emit a vector gather:

void emitGather(Assembler &a, XMMRegister RegR, XMMRegister RegA, int disp) {
  a.movq(r8, RegA);
  a.movhlps(RegR, RegA);
  a.movq(r9, RegR);
  a.movlpd(RegR,Operand(r12,r8,times_8,disp));
  a.movhpd(RegR,Operand(r12,r9,times_8,disp));
}

The gather takes two addresses in RegA and then loads the two values into RegR, requiring five instructions total. The performance of a JIT depends on compilation speed, so it is important for the Assembler type to be efficient. Riposte uses the assembler from Google Chrome [1] to get high performance. To ensure speed, each instruction is explicitly implemented as a method on the assembler object. As an example, here is the implementation of the movlpd instruction:

void Assembler::movlpd(XMMRegister dst, const Operand& src) {
  EnsureSpace ensure_space(this);
  emit(0x66);
  emit_rex_64(dst, src);
  emit(0x0F);
  emit(0x12); // load
  emit_sse_operand(dst, src);
}

This approach is able to produce code that can assemble x86 instructions at a rate of 720 MB/s of instructions. While fast, writing the implementation of each of these functions is tedious and prone to error. There are hundreds of x86 instructions, many with multiple versions. Another approach is to describe the instruction concisely in a small language, similar to how string matching can be encoded with regular expressions. LuaJIT's DynAsm library [2] takes this approach to describe the movlpd instruction in a table:

movlpd_2 = "rx/oq:660F12rM|xr/qo:n660F13Rm"

Each line describes the valid arguments ("rx"), their sizes ("oq"), and a recipe to encode the instruction (":66..."), listing multiple variants per line. When instructions share a similar form (e.g., add and sub), meta-programming is used to generate the table entries. CHAPTER 8. BUILDING LIBRARIES WITH EXOTYPES 91

While these tables are concise, interpreting the table to encode instructions incurs substantial overhead. A micro-benchmark using this table directly encodes at only 336KB/s, three orders of magnitude slower. To get high performance, DynAsm pre-compiles the table into fast code using a source-to-source translation of C code. It is designed to optimize code size rather than speed, encoding the gather code at 168MB/s but using only 30 bytes to represent it. We can use exotypes to perform this transformation directly in the object system of Terra without the need for preprocessors. Instead of optimizing for code size like DynAsm, we optimize for encoding speed. Our implementation uses the __methodmissing property to compile assembly functions on demand. For instance, a user may write:

A:movlpd(RegR,index(r12,r8,8,disp,"qword"))

This will invoke __methodmissing for the movlpd instruction. The implementation then examines the encoding table (adapted from DynAsm) to produce an implementation of the instruction equivalent to the C code shown previously from the Chrome assembler.

Generating the assembler implementations on demand provides more opportunities for optimization. Only instructions that are actually used need to be generated, reducing the total amount of code in the library. We can also aggressively specialize instructions to their use. For instance, in the invocation of movlpd above, the only dynamically determined arguments are RegR and disp, so we can generate a version of movlpd specific to this invocation.

To generate specialized assembly instructions we take the following approach, similar to DynAsm. First, in __methodmissing, we determine which arguments are constants and which are determined dynamically. We then use the constant arguments to create a template that leaves the dynamically determined information blank. Finally, we generate code that first copies the template into the code buffer and then patches it with the dynamically provided arguments. This approach keeps the code size small by separating the template from the assembler code. Furthermore, since the template is normally multiple bytes, we can benefit from vectorized copies.

We can also use this template-based approach to generate assembler code for multiple instructions in a single template. The programmer can call emit, which supports multiple instructions. The gather operator expressed using this approach is shown in Figure 8.6. This function fuses the assembly into a single template copy followed by patch-up instructions that insert RegR, RegA, and disp.

terra emitGather(RegR : O, RegA : O, disp : int)
  A:emit(op.movd,   r8,   RegA,
         op.movhlps, RegR, RegA,
         op.movd,   r9,   RegR,
         op.movlpd, RegR, index(r12,r8,8,disp,"qword"),
         op.movhpd, RegR, index(r12,r9,8,disp,"qword"))
end

Figure 8.6: Using our dynamic assembler, we can emit multiple instructions in a single template.

The resulting code runs at 5.96 GB/s, or 8.3 times faster than the hand-written Chrome version. To validate the approach, we rewrote 10 kernels taken from the Riposte JIT compiler using our dynamic assembler written in Terra and compared their performance to the original code, which uses the Chrome assembler. Figure 8.5 shows the results for each kernel. While the Chrome assembler always emits code at a rate of about 700 MB/s, the Terra assembler performs anywhere from 1.64 GB/s to 15 GB/s, depending on the amount of instruction fusion that is possible. For instance, the add kernel includes only two x86 instructions, one of which is emitted conditionally; this limits performance to 1.64 GB/s. The emitfn kernel emits 13 instructions and does little patching, enabling it to encode at 15 GB/s.

Using exotypes we were able to implement our Terra-based assembler, including the parts of DynAsm we used for instruction encoding, in only 2100 lines of Lua-Terra code, less than half the size of the Chrome assembler. Furthermore, by specializing the assembler code for each call to the assembler object, we were able to produce assembly code that ran up to 17 times faster than the reference code.
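The template-copy-and-patch scheme described above can be sketched in plain C. The struct layout, field names, and the single 1-byte displacement patch here are illustrative assumptions, not the library's actual representation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of template-copy-and-patch encoding (hypothetical layout):
   a specialized instruction sequence is stored once as a byte
   template; encoding copies the whole template in bulk and then
   patches only the dynamically determined fields (here, one
   displacement byte). */
struct tmpl {
    unsigned char bytes[8];  /* pre-encoded instruction bytes */
    size_t len;              /* template length */
    size_t disp_off;         /* offset of the displacement to patch */
};

size_t emit(const struct tmpl *t, unsigned char disp,
            unsigned char *buf) {
    memcpy(buf, t->bytes, t->len);  /* bulk copy (vectorizable) */
    buf[t->disp_off] = disp;        /* patch dynamic argument */
    return t->len;
}
```

Because the constant bytes are copied in one memcpy rather than emitted one call at a time, the per-instruction work shrinks to a few patch stores, which is where the fused-template speedups come from.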

8.5 AUTOMATIC DIFFERENTIATION

Automatic differentiation (AD) computes derivatives of programs by differentiating elementary operations (such as multiplication) and composing those derivatives using the chain rule [19]. It eliminates tedious and error-prone hand-authoring of derivative code, and it gives exact derivatives, unlike finite-difference approaches. AD is widely applied to sensitivity analysis and optimization in scientific computing and engineering. Many applications, such as optimizing an objective function, require the gradient of a single output with respect to multiple parameters, a setting well-suited to reverse-mode AD.

Reverse-mode AD first runs the program forward, recording each elementary operation and the data needed to compute its derivative on a tape. It then interprets the tape backward, accumulating the partial derivative of the output with respect to each intermediate value (the intermediate's adjoint), terminating with the partial derivatives for the input parameters. Reverse-mode AD can compute arbitrarily many partial derivatives with just two sweeps through the program (one forward and one backward), but the space overhead of the intermediate tape may be significant.

We implemented a reverse-mode AD library with exotypes in Terra, using an approach similar to that of Stan, a C++ library for high-performance statistical inference [67]. Programs written against this library replace floating-point numbers with a custom dual number type for which arithmetic functions and operators are overloaded. These overloaded functions store their inputs in an object which is placed on an in-memory tape. The reverse pass interprets the tape by calling virtual functions on those objects. Our implementation uses exotypes to programmatically generate the tape object type for each elementary operator. New operators are defined using a simple interface:

-- Defining the "__add" metamethod
addADOperator("__add",
  -- Forward function code
  terra(lhs: double, rhs: double)
    return lhs + rhs
  end,
  -- Adjoint code
  adjoint(function(T1, T2)
    return terra(v: &TapeObjBase, lhs: T1, rhs: T2)
      setadj(lhs, adj(lhs) + adj(v))
      setadj(rhs, adj(rhs) + adj(v))
    end
  end))

v is the output of the add operator stored on the tape. The inputs to the operator, lhs and rhs, may be either doubles or dual numbers. adj(x) extracts the adjoint of x, and setadj(x, v) sets the adjoint of x to v. In the example above, when lhs or rhs is a double (i.e., a program constant), the corresponding setadj line is unnecessary, since a constant has no adjoint. Our implementation detects this at compile time (adj and setadj are macros) and does not add an entry for that operand to the tape object type. In contrast, Stan uses a class hierarchy for tape objects: Add is a subclass of BinaryOp, whose subclasses all have the same layout. Our approach helps alleviate the memory overhead that is the main drawback of reverse-mode AD.

We evaluate the runtime and memory performance of Terra AD and Stan C++ AD on a standard optimization task from machine learning: maximum-likelihood training of a logistic regression classifier.
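The taped reverse-mode mechanism can be sketched in a few dozen lines of C. This is a toy, not the Terra library's actual implementation; the names (var, ad_add, backward) and the fixed two-input tape entries are illustrative assumptions:

```c
#include <assert.h>

/* Minimal taped reverse-mode AD sketch. Each tape entry records its
   input slots and the local partial derivatives of the operation;
   the reverse sweep accumulates adjoints through the chain rule. */
#define MAXV 64
static double val[MAXV], adj_[MAXV];
static struct { int in[2]; double d[2]; } tape[MAXV];
static int n;

/* An input variable: no predecessors on the tape. */
static int var(double x) {
    val[n] = x; tape[n].in[0] = tape[n].in[1] = -1; return n++;
}

static int binop(double v, int a, double da, int b, double db) {
    val[n] = v;
    tape[n].in[0] = a; tape[n].d[0] = da;  /* d(result)/d(a) */
    tape[n].in[1] = b; tape[n].d[1] = db;  /* d(result)/d(b) */
    return n++;
}
static int ad_add(int a, int b) { return binop(val[a] + val[b], a, 1.0, b, 1.0); }
static int ad_mul(int a, int b) { return binop(val[a] * val[b], a, val[b], b, val[a]); }

/* Interpret the tape backward from the output slot, accumulating
   each intermediate's adjoint into its inputs. */
static void backward(int out) {
    for (int i = 0; i < n; i++) adj_[i] = 0.0;
    adj_[out] = 1.0;
    for (int i = out; i >= 0; i--)
        for (int k = 0; k < 2; k++)
            if (tape[i].in[k] >= 0)
                adj_[tape[i].in[k]] += tape[i].d[k] * adj_[i];
}
```

For f(x, y) = x*y + x with x = 3 and y = 4, one forward pass builds four tape entries and one backward sweep yields df/dx = 5 and df/dy = 3.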

Figure 8.7: Runtime performance (wall-clock time, in seconds) and peak tape memory usage (in GB) for reverse-mode AD in Terra and Stan C++. Our Terra implementation achieves comparable speed and a 25% smaller memory footprint.

The classifier is trained on hand-written digits from the MNIST dataset [50]. For each implementation, we calculate the gradient of the data log probability with respect to the model parameters using 6000 data points, and we run 100 iterations of gradient descent. Results are shown in Figure 8.7. The Terra code achieves runtime performance comparable to C++ with 25% less peak tape memory usage.

Our Terra AD implementation takes 493 lines of code, compared to Stan's 1187 lines. This difference is due to programmatic type generation, instead of explicitly-defined class hierarchies. While the core of each library (i.e., tape management and the public interface) takes roughly the same amount of code (260 vs. 318 lines), adding new elementary operators is more concise in Terra (∼10 vs. ∼60 lines for a new binary operator).

Libraries like AD can be easily integrated into existing Terra applications that use exotypes. The dual number type used in AD, for instance, can be used in the implementation of Quicksand's tracing infrastructure to automatically compute derivatives of program probabilities with respect to random choices. We have implemented an MCMC algorithm that uses these derivatives to perform more efficient inference for programs defining mostly-continuous distributions [54].

CHAPTER 9

EMBEDDING CUSTOM DSLS IN LUA

The two-language embedding of Terra and Lua is designed to make generating low-level code easy. While aspects of this design are specific to the problem of embedding a systems programming language in a high-level language, many other languages can benefit from adopting parts of it, such as using shared lexical scope for meta-programming during specialization and having a separate execution pathway for high performance. This chapter presents our extension to the Lua language that allows programmers to embed other languages inside Lua using the same principles that we used to embed Terra.

9.1 CURRENT APPROACHES TO EMBEDDING LANGUAGES

All DSLs have a frontend that allows users to write statements in the language. Some common approaches are shown in Figure 9.1. The most direct approach is to write a custom parser that reads a stream from some data source. Both SQL and the OpenGL/Direct3D shading languages work this way: they are distributed as libraries that provide API functions which load a string representing the program. Custom parsers provide the most flexibility for DSL syntax, but make it harder for the DSL to interact with the program it is embedded in.

Another approach is to use calls to an API to construct a representation of a program incrementally. This is the approach used by OpenGL's immediate mode to describe geometry (Figure 9.1(c)). API calls mark the beginning and end of geometry such as triangles, and their points are described by more function calls. While verbose in a language such as C, in languages with the ability to define custom behavior for operators (e.g., + - / *), these operators can be defined to create


Standalone (AWK): The DSL uses a custom parser and is run as its own process. Interaction is done through inter-process communication, in this case through standard input and output.

NR > 1 && NF == 6 { print $1 $2; }

Embedded as Strings (SQL in Java): Syntax is embedded in a string value of another language and executed via an API. Interaction is done through the API. Code for the DSL is frequently generated and run in the same process.

String query = "SELECT * FROM people WHERE age > 18";
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(query);

Embedded as Function Calls (OpenGL immediate mode in C++): The frontend is expressed as calls to API functions, which internally build a representation. This approach makes meta-programming easy compared to using string formatting operators to build larger expressions. Here we show how to use OpenGL to draw 100 triangles by meta-programming the display list using a loop in C++.

for (int i = 0; i < 100; i++) {
    glBegin(GL_TRIANGLES);
        glVertex2f(0.0f, 0.0f);
        glVertex2f(1.0f, 0.0f);
        glVertex2f(1.0f, 1.0f);
    glEnd();
    glTranslatef(1.0f, 0.0f, 0.0f);
}

Embedded using Operators (Halide in C++): Some languages allow user-defined operators, which can be used to make the DSL's syntax more natural, while still using an API to build the internal representation for the DSL. Note that some syntax, such as the declaration of x and y for the blur_x expression, requires multiple statements in C++, since the syntax of C++ is not fully extensible.

Func blur_x;
Var x, y; // declare variables to be used in blur_x expression
blur_x(x, y) = (input(x-1, y) + input(x, y) + input(x+1, y)) / 3;

Figure 9.1: Examples of different techniques for implementing the frontend of a DSL.

statements that build a program in the DSL. For instance, in C++, Intel's Array Building Blocks enables runtime generation of vector-style code using a combination of operator overloading and macros to build a graph of vector-wide computations dynamically [56]. This approach is similar to the type-directed staged programming described in Section 2.2.3: the DSL is staged by the embedding language. It shares similar advantages, namely that the DSL code itself can be meta-programmed by the host language.

Meta-programming of DSLs can be useful since it can reduce the number of built-in operations that the DSL needs to implement, leaving other operations to be defined via meta-programming. It also allows other DSLs to generate DSL code, allowing the programmer to layer a high-level DSL on a lower-level DSL. This approach has been used for existing DSLs: object-relational mappings frequently meta-program SQL, and shader languages such as Spark [28] generate code for traditional shading languages.

However, building the DSL frontend using type-directed operators is limited by what operators can be redefined in the embedding language. Languages such as C++ allow limited redefinition of arithmetic and assignment operators, but do not, for instance, allow the redefinition of a conditional statement or a variable declaration. Related work in the Scala community makes almost all language syntax overloadable, creating a so-called "virtualized" language [12]. For instance, Chafi et al. use lightweight modular staging [62] to stage a subset of the Scala language. The staged code is used to implement DSLs in the Delite framework that can be translated to run on GPUs [7, 13]. If the embedding language and the DSL are both statically typed, then this approach also limits the types in the DSL to a subset of those in the embedding language. This limitation can be problematic for many designs.
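The build-an-IR-through-API-calls style of Figure 9.1(c) can be illustrated with a tiny expression-graph builder. All names here (Node, param, eval) are hypothetical, and the sketch is in C, so the graph is built with plain function calls rather than overloaded operators:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical staged-expression API: calls build an IR graph rather
   than computing immediately; eval() later interprets the graph.
   A real system would instead compile the graph to fast code. */
typedef struct Node {
    enum { CONST, PARAM, ADD, MUL } kind;
    double value;            /* CONST */
    int slot;                /* PARAM index */
    struct Node *lhs, *rhs;  /* ADD / MUL */
} Node;

static Node *mk(int kind) { Node *n = calloc(1, sizeof *n); n->kind = kind; return n; }
static Node *cst(double v)  { Node *n = mk(CONST); n->value = v; return n; }
static Node *param(int slot) { Node *n = mk(PARAM); n->slot = slot; return n; }
static Node *add(Node *a, Node *b) { Node *n = mk(ADD); n->lhs = a; n->rhs = b; return n; }
static Node *mul(Node *a, Node *b) { Node *n = mk(MUL); n->lhs = a; n->rhs = b; return n; }

/* Run the staged program against concrete arguments. */
static double eval(const Node *n, const double *args) {
    switch (n->kind) {
    case CONST: return n->value;
    case PARAM: return args[n->slot];
    case ADD:   return eval(n->lhs, args) + eval(n->rhs, args);
    default:    return eval(n->lhs, args) * eval(n->rhs, args);
    }
}
```

Because the host language runs while the graph is being built, ordinary loops and functions in the host can stamp out arbitrarily large staged expressions, which is exactly the meta-programming advantage described above.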
In our experience, most DSL type systems are relatively simple, but almost always have features that lie outside the type system of a particular general-purpose language. For instance, Liszt's field type is a parameterized map, which is supported in many languages. But it also has limits on when it can be used, becoming read-only or write-only in certain contexts. The rules for these contexts are simple, but fall outside the scope of most static type systems. In fact, this restriction complicated the first version of Liszt, which was embedded in Scala: it needed to perform a separate pass to check the correctness of field phases, weakening the benefits of static typing.

9.2 OUR LANGUAGE EXTENSION INTERFACE

Our solution is based on the idea that most DSL frontends face many of the same challenges that we faced when integrating Terra into the Lua ecosystem. We want other developers to be able to generate DSL code dynamically and interoperate with the DSL, but it is important that the DSL can exist as a separate language, with different syntax, semantics, and types.

We provide an API to embed a DSL in Lua using the same techniques that Terra itself uses. DSLs provide rules for parsing custom expressions embedded in the same file as normal Lua programs. Like Terra's custom expressions (e.g., terra and struct), DSL expressions result in first-class Lua values, and can contain semantics that perform DSL analogs of specialization and typechecking. This approach allows DSLs to define their own syntax and their own custom types. By default, the DSLs also share the same lexical scope as Lua and Terra programs, making interoperability easier.

Figure 9.2 shows an example of the interface DSLs use to define frontends. Each DSL defines its own parser extension, using an API defined in Lua. Parser extensions are loaded with a special import statement that loads a Lua package that defines the language using the API. This package defines a set of new keywords that indicate the start of a DSL expression. When a keyword is encountered at the beginning of an expression or statement, the Lua parser transfers control to the DSL's custom parser. This parser is free to define its own syntax for any expression. For simplicity, we constrain the DSL parser to use the same lexer interface as Lua. When implementing DSLs such as Darkroom or Liszt, we have found this balance of fixed lexing but custom parsing flexible enough to create custom statements while simple enough to produce concise parsers. It also encourages languages to use syntax that looks consistent, since it prevents the invention of new tokens.
DSL parsers are written in Lua, using a lexer API to read tokens from the input. Internally, these parsers can use any parsing technique, and are provided one symbol of lookahead. In practice, the Lua grammar itself is amenable to top-down precedence (Pratt) parsing [59], and DSLs designed with a similar set of constructs are often easily parsed using this technique. For Liszt and Darkroom, we found this simple technique to be sufficiently powerful.
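As an illustration of the technique (a sketch, not the parser library our DSLs use), here is a minimal top-down precedence parser in C for integer expressions with + and *. Each operator is given a binding power, and parse_expr keeps consuming operators whose power exceeds the current minimum:

```c
#include <assert.h>
#include <ctype.h>

/* Minimal Pratt parser: evaluates expressions like "1+2*3" with
   correct precedence, using one character of lookahead (*src). */
static const char *src;

static int lbp(char op) {                 /* left binding power */
    return op == '+' ? 10 : op == '*' ? 20 : 0;
}

static int parse_atom(void) {             /* "nud": only literals here */
    int v = 0;
    while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
    return v;
}

static int parse_expr(int min_bp) {
    int left = parse_atom();
    while (lbp(*src) > min_bp) {          /* "led": binary operators */
        char op = *src++;
        /* recurse with the operator's own power: higher-precedence
           operators to the right bind first */
        int right = parse_expr(lbp(op));
        left = op == '+' ? left + right : left * right;
    }
    return left;
}
```

A full DSL parser replaces the integer folding with AST construction and the character tests with calls to the lexer API, but the control structure is the same.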

This example code uses an import statement to enable the sumlanguage module, which adds a new expression to Lua that sums up variables:

import "sumlanguage"
a,b,c = 1,2,3
result = sum a b c done
print(result) -- 6

The sumlanguage module itself is loaded by Lua's package manager, and returns a table that defines the extension behavior. The table includes information about what keywords are added to the language, and a function that handles the parsing for the extension. The parsing function itself returns a closure that will be run when the statement should be evaluated.

return {
  name = "sumlanguage"; -- name for debugging
  entrypoints = {"sum"}; -- list of keywords that will start our expressions
  keywords = {"done"}; -- list of keywords specific to this language
  -- Lua parser calls this function when it sees an expression that starts with 'sum'
  -- 'lex' is a handle to the lexer object, which has an API for producing tokens of input
  expression = function(self,lex)
    local variables = terralib.newlist()
    lex:expect("sum")
    while not lex:matches("done") do
      local name = lex:expect(lex.name).value
      variables:insert(name)
      lex:ref(name) -- tell the Terra parser we will access a Lua variable, 'name'
    end
    lex:expect("done")
    -- Return the closure that will be called when this expression is actually
    -- executed in Lua
    return function(environment_function)
      -- Capture the local environment, a table from variable name to value
      local env = environment_function()
      local sum = 0
      for i,varname in ipairs(variables) do
        sum = sum + env[varname]
      end
      return sum
    end
  end;
}

Figure 9.2: An example language extension that defines a sum expression that adds variables together. This example is not an entire DSL, so its intermediate representation is just a list of variable names, but it still illustrates the different components of a DSL extension.

Like all operations in Lua and Terra, parsing runs in the same process as the other stages of execution, so a DSL can build internal data structures while parsing and use them later during execution. Typically, a DSL will build a custom AST representing the parsed code. When the custom DSL parser reaches the end of the expression, it transfers control back to the Lua parser, passing back a closure that will be run as if it were a normal Lua expression occurring in the same position as the custom syntax. When run, this closure is passed a Lua table containing the local lexical environment of the program. This environment makes it possible for custom DSLs to perform a specialization phase similar to Terra's. At this point DSLs are free to implement the rest of their compilation pipeline in whatever way is best for the DSL. For instance, some may choose to perform typechecking lazily, like Terra, while others may perform typechecking eagerly if that works better for the particular DSL.
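The parse-then-evaluate-against-an-environment flow of the sumlanguage example can be mimicked in C. This sketch uses invented types (sumexpr, binding) and is only an analogy: the "parse" step records variable names up front, and evaluation happens later against whatever environment is supplied:

```c
#include <assert.h>
#include <string.h>

/* C analog of the sumlanguage extension: parsing produces a small
   IR (a list of variable names); evaluation later resolves those
   names in an environment, like the returned Lua closure does. */
#define MAXVARS 8
struct sumexpr { char names[MAXVARS][16]; int count; };
struct binding { const char *name; double value; };

/* Parse "sum a b c done" into a list of names. */
static int parse_sum(const char *text, struct sumexpr *e) {
    char buf[128], *tok;
    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    e->count = 0;
    tok = strtok(buf, " ");
    if (!tok || strcmp(tok, "sum") != 0) return -1;
    while ((tok = strtok(NULL, " ")) && strcmp(tok, "done") != 0
           && e->count < MAXVARS)
        strncpy(e->names[e->count++], tok, 15);
    return 0;
}

/* "Run the closure": look each name up in the current environment. */
static double eval_sum(const struct sumexpr *e,
                       const struct binding *env, int nenv) {
    double s = 0;
    for (int i = 0; i < e->count; i++)
        for (int j = 0; j < nenv; j++)
            if (strcmp(e->names[i], env[j].name) == 0) {
                s += env[j].value;
                break;
            }
    return s;
}
```

Separating the two steps this way is what lets the same parsed expression be evaluated repeatedly, or against different environments, just as the Lua closure can.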

9.3 USING LANGUAGE EXTENSIONS

To evaluate the effectiveness of this interface, we implemented two DSL frontends using this system, one for the Darkroom language, and one for the updated version of Liszt written in Terra.

9.3.1 Darkroom: Lua-based meta-programming with custom syntax

The original frontend for Darkroom constructed programs in Lua using operator definitions on Lua metatables. Individual image definitions existed as first-class Lua objects and could be combined using operators such as + into new image expressions. Darkroom could be meta-programmed to generate image processing operators such as convolve (Figure 9.3(a)), which takes an arbitrarily sized kernel and creates a convolution with an image.

This embedding worked for simple expressions such as doing arithmetic on images, but introduced syntactic noise for other operators. Stencil expressions are normally written with explicit indexing variables for each dimension (e.g., foo(x+1,y) + foo(x,y) + foo(x-1,y), with x and y representing the dimensions of the image). But in Lua, x would need to be a Lua value, which could lead to confusion if x were not in scope or were shadowed. In our embedding,

Lua Operators: This example Darkroom code defines an interface for quickly generating convolutions against constant kernels. The convolve function generates the convolution of a Darkroom image with the kernel K. The * and + operators are defined on the Darkroom image type to build an IR for the image. The convolve function uses two loops written in Lua to construct the entire expression in Darkroom.

terralib.require("darkroom.operators")
-- generate a convolve function against a kernel K
-- via meta-programming
local function convolve(N,K,image)
  local R = 0
  local W = 2*N+1
  for j = -N, N do
    for i = -N, N do
      local k = K[(j + N)*W + i + N + 1]
      R = R + k*image(i,j)
    end
  end
  return R
end
local blur = {0,  1/5,0,
              1/5,1/5,1/5,
              0,  1/5,0}
local result = convolve(1,blur,darkroom.load("input.bmp"))

Custom Syntax: We can also write convolve using Darkroom's custom embedding syntax. Here we use a map-reduce expression, which is not easy to represent with operator overloading in Lua. Like expressions in Terra, Darkroom expressions share Lua's lexical scope to make meta-programming possible, illustrated by how the image expression refers to the kernel size N and the kernel data K:

-- enable darkroom's syntax extensions
import "darkroom"

-- the same convolve,
-- but using Darkroom's map-reduce operator
local function convolve(N,K,image)
  local W = 2*N+1
  -- use darkroom's im operator to create a new image
  local R = im(x,y)
        map i = -N,N j = -N,N reduce(sum)
          k = K[(j + N)*W + i + N]
          k*image(x+i,y+j)
        end
      end
  return R
end

Figure 9.3: Embedding Darkroom in Lua using both Lua's operator overloading and custom syntax via our language extension interface.

we chose to drop the x and write the expression as foo(1,0) + foo(0,0) + foo(-1,0), where the argument to the image indicates an image-wide "shift" operation instead of an index. While semantically identical, this way of expressing the problem made the embedding different from most other languages. Furthermore, as we added new features such as conditional operators (if x then y else c), it became clear that forcing everything into Lua syntax would not result in ideal syntax.

Using the parser extension mechanism, we are able to create custom syntax for all operators. The expression im(x,y) foo(x+1,y) + foo(x,y) + foo(x-1,y) end both creates a new image and concisely introduces variables for the dimensions of the image. Like Terra, Darkroom has its own specialization phase where constants from the Lua environment can be inlined into Darkroom code. In this case, it is used to resolve external references to other image functions such as foo.

Though it allows custom syntax, this design can still generate parameterized image processing code using meta-programming. Figure 9.3(b) shows the same definition of a convolution template using the custom embedding. It uses Darkroom's map-reduce operator to implement the convolution, rather than unrolling the convolution entirely as was done in the original version. Since Lua does not have a built-in map-reduce operator, it would have been awkward to represent one using an approach based purely on operator overloading. But we found that representing the map-reduce in Darkroom was important for performance, because it enables more efficient code generation and better code quality by capturing more structure about the computation.

The custom parser for Darkroom is also relatively simple. It is implemented in only 250 lines of code, using a helper library for top-down precedence parsing that is around 100 lines of code. Most expressions are parsed with straightforward code that reads an expression and builds an IR.

9.3.2 Liszt: Custom syntax with a language extension API

The first version of the Liszt language uses Scala as the embedding language for the compiler. A plugin to the Scala compiler allows Liszt to extract a typechecked version of the Liszt AST before further processing. This approach allows for easy parsing and typechecking, but runs into difficulty because it requires embedding the type system of Liszt as a subset of the embedding language's. Certain features, such as constant-length vectors, and

fields with read-only and reduce-only phases, do not fit into Scala's type system and had to be checked in a later phase anyway. Furthermore, many operations that were not in kernels, such as the declaration of fields and sets, required their own syntax and bookkeeping. This bookkeeping occurred twice: once in the compiler to parse the declarations, and again in the runtime to look up the declarations and link them to existing input data (e.g., loading that data from a file).

The Lua version of Liszt simplifies many aspects of this design. Similar to how Terra exotypes can be constructed using Lua expressions, Liszt uses Lua meta-programming to create types such as fields and then load data into them. Fields and relations between mesh elements are created using a Lua API, rather than requiring custom syntax. Since the entire program executes as a single process, bookkeeping for fields and mesh relations occurs only once, and does not have to be serialized between compiler and runtime.

The Liszt kernels themselves are a full language with variable declarations, conditionals, and control flow in addition to custom statements such as reductions, and could not be implemented using a Lua API. We implemented this part as a language extension introduced by the liszt keyword. Similar to Darkroom, kernels can be parameterized over fields using meta-programming in Lua. Previously, Liszt required all fields to be explicitly declared to make it possible to check the phases of fields without performing sophisticated alias analysis. Using Lua meta-programming, a programmer can now generate multiple versions of a function that use different fields.

The implementation of Liszt's custom parser is longer than Darkroom's, at about 450 lines, since it needs to parse an entire language and includes support for Liszt-specific features such as reductions and relational operations such as insert and delete. Still, it is less than 10% of the codebase and a relatively minor part of the project.

9.4 COMPARISON TO OTHER APPROACHES TO SYNTAX EXTENSION

Other approaches to syntax extension, such as SugarJ [26], a parser extension for Java, attempt to make the process of extending a parser more concise. SugarJ, for example, specifies new grammar rules using a declarative model based on context-free grammars

(CFGs). This model has some advantages. Our extension mechanism will detect two languages as ambiguous if they both introduce statements that start with the same keyword, while SugarJ can support CFGs that start with the same keyword as long as the entire expression is not ambiguous. It also allows a declarative, rather than programmatic, expression of the new syntax. These types of extension mechanisms are valuable when adding a few lightweight extensions to a language, such as the tuples or XML literals demonstrated in SugarJ. For larger DSLs, such as the ones we have implemented in Terra, it is actually valuable to denote the beginning of DSL code with a unique keyword so that users know what language follows. Furthermore, the parsing for both Liszt and Darkroom is less than 10% of their codebase, so we have found that efforts to make the language extensions more concise are not worth pursuing at the cost of extra complexity in the design of the extension mechanism.

CHAPTER 10

INTEROPERABILITY

High-performance code is normally written as part of a larger application. This is especially true for DSLs and active libraries, whose programming model may be limited to specific functionality such as rendering or simulation. Any solution that aims to make writing high-performance code easier using strategies like DSLs needs to address how it will interoperate with other technologies. This chapter examines the design decisions we made in creating Terra and integrating it with Lua that maximize this interoperability.

10.1 WITH EXISTING APPLICATIONS

Most DSLs and active libraries are used as part of a larger application. OpenGL, for example, is normally a small part of a game engine that might use other technologies for physical simulation and artificial intelligence. If developers can replace parts of an application piecewise, they are more likely to adopt newer technologies such as DSLs. This pathway is also an advantage for DSL authors, since it means that the DSL can be responsible for running only a few parts of an application, and can grow to accelerate more of the application over time as it gains features.

Terra supports this methodology at two levels. First, our use of Lua enables developers to embed the entire Lua-Terra ecosystem into any application that can call C functions. Second, Terra code and data structures are generated such that they are compatible with C functions at the application binary interface, making them directly usable in pre-existing code.


#include "lua.h"
#include "lualib.h"
#include "lauxlib.h"

const char * str = " function add(a,b) return a+b end"
                   " print(add(1,1))";

int main(int argc, char ** argv) {
  lua_State * L = luaL_newstate(); // create a plain lua state
  luaL_openlibs(L);                // initialize its libraries
  luaL_dostring(L, str);
  return 0;
}

Figure 10.1: Embedding Lua into a C program is relatively simple. These calls set up a Lua context lua_State and run the Lua code in a string.

10.1.1 Embedding as a runtime

We selected Lua as a high-level language because it was designed originally to be embedded in existing applications. The design uses a simple C API based on a virtual stack machine to expose the Lua environment to another application [40, 41]. The stack machine allows developers writing in C to extract primitive values from the Lua environment, as well as load and execute arbitrary code. Figure 10.1 shows example code that loads Lua as a library in a C program, and uses it to evaluate Lua code embedded in a string. These Lua APIs also make it easy to register C functions with Lua so that Lua code can call back into the surrounding application. This simple API is widely used in the game development community to couple a scripting environment into game engines. While not all applications are developed using the C runtime, almost all environments have some way of calling C functions, making this interface a least-common-denominator for interoperability. We extend Lua’s interface in our design of Terra by adding specific functions to initialize Terra and load Terra code. Figure 10.2 shows the changes necessary to an application that uses Lua to embed the Terra environment as well. Developers still use the standard Lua API to work with the Lua environment, and only need to use the Terra API when using features specific to Terra, such as evaluating a string that includes both Lua and Terra code. Note that using Terra requires no more machinery than loading any other C library. Since it runs dynamically, the developer also gets the benefits of runtime code generation, including simplicity of design and the ability to adapt to runtime information. Requiring CHAPTER 10. INTEROPERABILITY 107

#include "terra.h"
const char * str = "terra add(a : int, b : int) return a + b end "
                   "print(add(1,1))";

int main(int argc, char ** argv) {
    lua_State * L = luaL_newstate(); // create a plain lua state
    luaL_openlibs(L);                // initialize its libraries
    terra_init(L);                   // initialize the terra state in lua
    terra_dostring(L,str);
    return 0;
}

Figure 10.2: Using Terra in addition to Lua only requires one additional function call to initialize the Terra state.

the developer to pre-compile Terra code (for instance, using a Makefile in their build script) would constrain designs that rely on runtime code generation, and would increase the complexity of using Terra with an existing system by forcing the developer to write custom build logic.

However, embedding Terra as a runtime library is not always the right choice. Terra includes an entire compiler framework (LLVM), so it is a large binary. Compile times in LLVM are also fairly slow, on the order of a few milliseconds for reasonably sized functions. This makes the runtime code generation approach less desirable for slower architectures, or where storage space is limited, such as on embedded devices. To address these uses, Terra can also produce code for offline use via the terralib.saveobj function, which can output object files or shared libraries that contain pre-compiled code. In this model, the Lua environment serves as a pre-processor and compiler, producing object files that can be used apart from the Terra compilation environment. We are able to provide this functionality because Terra's semantics allow it to execute separately from the Lua environment. By offering both alternatives, Terra provides the flexibility of runtime code generation for prototyping, with a path for optimizing compile times and binary size using selective offline compilation.
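For illustration, a minimal sketch of the offline path (file names are hypothetical; this requires the Terra runtime and uses the terralib.saveobj function described above):

```lua
-- A trivial entry point compiled by Terra.
terra main(argc : int, argv : &&int8) : int
  return 0
end

-- Emit a native object file containing the compiled function. The
-- resulting prog.o can then be linked into a C program with no Lua or
-- Terra runtime present.
terralib.saveobj("prog.o", { main = main })
```

The same function can also emit executables or shared libraries, so the Lua environment acts as both the build system and the compiler.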

10.1.2 Zero-overhead compatibility with C

Embedding solves one half of the interoperability problem with existing applications – those applications can call into Lua-Terra code. It is also useful to go in the other direction: to have

C                                       Terra

Primitives (translated to built-in equivalents):
    uint32_t, uint64_t,                 uint32, uint64,
    float, bool                         float, bool

Pointers:
    T*                                  &T

Functions:
    T3 (*)(T,T2)                        {T,T2} -> T3

Vectors (C equivalents use a non-standard GCC extension):
    float __attribute__                 vector(float,4)
      ((vector_size(16)))

Opaque pointers:
    void*                               &opaque

Arrays (true Terra arrays are passed by value, so C arrays become pointers in Terra):
    float[4]                            &float

Structs*:
    struct Complex {                    struct Complex {
        float real;                         real : float
        float imag;                         imag : float
    };                                  }

Unions (in Terra, unions are a property of the layout of a struct rather than a separate type, removing the need for C's named vs. anonymous union distinction):
    typedef union {                     struct Number {
        float f;                            union {
        int i;                                  f : float
    } Number;                                   i : int
                                            }
                                        }

Other: types such as enumerations or bit-fields are not directly translated since there is no machine equivalent for them. In Terra these types can be constructed via meta-programming. Qualifiers like const are ignored for simplicity since we cannot ensure their correctness given unsafe casts anyway.

* More generally, Terra structs can use the exotype API to construct their layout programmatically using the __getentries meta-method. The result of that function is then used to construct a corresponding C struct.

Figure 10.3: Correspondence between Terra types and C types. Terra types have the same memory layout as their C equivalents to ensure zero-overhead ABI compatibility.

Terra code capable of calling existing code. Furthermore, we would like compiled Terra code and data structures to be usable from existing C code without any overhead. To accomplish both goals, Terra generates functions in such a way that they are compatible with the C application binary interface, or ABI. Once a Terra function is generated, it can be linked against C functions as if they were written in the same language. The data layout of Terra types has a one-to-one correspondence with C data types, so they can be accessed directly from C without any layer of translation. Primitive data types like uint64 map to their C equivalents (uint64_t), and Terra's exotypes map to C structs with equivalent data layouts. Figure 10.3 describes the correspondence in detail.

This design enables several powerful behaviors. A Terra function can call a C function without any overhead since, as part of the ABI, they use the same calling convention. We also provide a library call (e.g., terralib.includec("stdio.h")) that uses the Clang compiler frontend to parse C header files and exposes all of the C functions as Terra function objects in the Lua environment. Since Clang produces LLVM IR similar to Terra's, we can even inline C functions into Terra functions when their definitions are available in header files. This interface provides a zero-overhead way to use C functions in Terra. Other languages that can call externally defined functions require some form of declaration for each function used (e.g., Figure 10.4) that translates the language's types into equivalent C types. To our knowledge, Terra is currently the only language that requires no declarations while still ensuring that the types of the values passed to a C function are correct. The lack of declarations makes it feasible to write high-performance libraries in C and then use them in DSLs without making it any more difficult to write the code generator in Terra.
In all cases, developers calling C libraries from Terra can expect the same performance as if they were calling those libraries directly from C. When building Liszt in Terra, we were able to reuse most of the library code for loading and manipulating meshes, since it was written in C/C++. Terra can also reuse existing APIs for external libraries. In practice, we have used these features to render images with OpenGL, launch CUDA kernels on GPUs, and interface with the Objective-C runtime. Even simple tasks, such as dynamically allocating memory, are handled by calls into C's standard libraries.
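As a concrete sketch of that last point (the function below is hypothetical; requires the Terra runtime), Terra code can call C's allocator directly through terralib.includec:

```lua
-- Import C's standard library header; the declarations become
-- ordinary Terra functions in the Lua table C, with no wrappers.
local C = terralib.includec("stdlib.h")

terra sumOfSquares(n : int) : int
  -- C.malloc is called directly, using the C calling convention.
  var a = [&int](C.malloc(n * sizeof(int)))
  for i = 0, n do a[i] = i * i end
  var s = 0
  for i = 0, n do s = s + a[i] end
  C.free(a)  -- manual memory management, just as in C
  return s
end

print(sumOfSquares(4)) -- 0 + 1 + 4 + 9 = 14
```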

C API — Languages can access the following C API using different interface styles:

    typedef struct {
        float a;
        float b;
    } Vector;

    Vector Vector_create(float a, float b) {
        return (Vector) { a, b };
    }

    Vector Vector_add(Vector a, Vector b) {
        return (Vector) { a.a + b.a, a.b + b.b };
    }

Python — The ctypes library can load shared libraries of C functions and call them from Python. Since the shared library itself does not contain type information, it requires the user to specify the type information in Python, adding syntactic overhead:

    from ctypes import *
    lib = CDLL("libvector.so")

    class Vector(Structure):
        _fields_ = [("a", c_float),
                    ("b", c_float)]

    lib.Vector_create.argtypes = [c_float, c_float]
    lib.Vector_create.restype = Vector

    lib.Vector_add.argtypes = [Vector, Vector]
    lib.Vector_add.restype = Vector

    a = lib.Vector_create(1, 2)
    b = lib.Vector_add(a, a)

LuaJIT — The FFI works similarly to ctypes, but uses an interface where C declarations are passed in strings. However, its parser for C is not powerful enough to handle full C header files since it does not run a preprocessor:

    ffi = require("ffi")
    ffi.cdef [[
    typedef struct {
        float a;
        float b;
    } Vector;
    Vector Vector_create(float a, float b);
    Vector Vector_add(Vector a, Vector b);
    ]]
    V = ffi.load("libvector.so")
    a = V.Vector_create(1, 2)
    b = V.Vector_add(a, a)

Terra — The C code can be imported directly with no syntactic overhead. Since it loads the implementation as well, it has the opportunity to inline C functions when beneficial:

    C = terralib.includec("vector.h")
    terra main()
        var a = C.Vector_create(1, 2)
        var b = C.Vector_add(a, a)
    end

Figure 10.4: Examples of how different languages handle calling functions defined externally.

We explicitly designed Terra so that it would be compatible with C and have zero overhead. The strategy of designing a new language so that it interoperates easily with an existing one was inspired by Scala, which was explicitly designed to be compatible with Java [57]. To accomplish this goal, Terra code does not require a managed runtime with a garbage collector, and is evaluated separately from the Lua runtime. Higher-level features, such as exotypes or macros, are all evaluated during typechecking, and desugared into simple data structures and function calls. In this way, we can harness the power of existing low-level code while also providing Terra as a better interface for meta-programming.

10.2 WITH LUA

We have already discussed the interactions between Terra and Lua that make it possible for Lua to meta-program Terra code and types dynamically. We also use features existing in the implementation of LuaJIT to enable interaction between Lua and Terra's runtime. Interaction across two very different languages brings up the challenges of managing the lifetime of objects across the two languages, and of accessing data from one language in the other.

Our approach to object management builds on Lua's model, which already has good solutions for working with low-level languages. In Lua, all objects are allocated in a garbage-collected heap. But in addition to Lua data structures such as tables, it is also possible to allocate userdata on this heap, which is a special Lua type that is simply stored as an array of arbitrary bytes. This data will be collected when it is no longer reachable, and can optionally have a meta-method (__gc) associated with it that gets called when the object is freed. This datatype allows the programmer to store Terra objects in the Lua environment. It is also possible for Lua to store pointers to external data, or "light" userdata. These pointers can point into a manually-managed heap such as the one used by malloc, but require explicitly freeing the objects. LuaJIT also extends Lua with a cdata type. This type holds external bytes like userdata, but also has an associated type in C. LuaJIT has mechanisms for introspecting into this C type to read and write data to it directly from Lua. Building on LuaJIT, we use cdata objects and Terra's ABI compatibility with C to represent Terra values directly inside LuaJIT.

Lua Type                         Conversion Rule

number                           To any primitive type, using C's rules for converting a double to the primitive type.

boolean                          To bool.

nil                              To nil, a Terra null pointer.

userdata                         To &opaque, a pointer to the value in the userdata.

string                           To rawstring, a pointer to Lua's string data.

function                         To any function type; a wrapper is generated that converts arguments to Terra values, invokes the function, and converts the results back to Lua.

table, used as an array          To any array type T[N], each value of the table converted to T.
with integer keys 1 to N         To any struct type, each value in the table converted to the first N values in the layout of the struct.
                                 {3,4} can convert to Complex {3, 4}

table, used as a map             To any struct type, each value of the table converted to the corresponding field in the struct type with the same name.
with string keys                 {imag = 4, real = 3} can convert to Complex {3, 4}

cdata                            To any Terra type T whose corresponding C type is the same as that of the value in the cdata.

Figure 10.5: Converting Lua values to Terra values.

While Lua allows references to external data to exist, it does not explicitly allow external data to reference Lua objects. Instead, the recommended way to store objects used externally is as a value in a Lua table with a known key. The external code can then use the standard Lua API functions to manipulate the table to retrieve values from the object. This design focuses on simplicity: there is no need for a separate mechanism to hold handles to Lua objects. Terra's model for handling data across the two languages builds on Lua's interaction with C. Terra values can be allocated using C's malloc in a manually managed heap, or using

Lua Type                         Type Inference Rule

number                           If the number is representable exactly by an int32, then it is an int32; otherwise it is a double.

boolean                          bool

nil                              the nil type

userdata                         the &opaque type

string                           rawstring, which is &int8

function                         Used in a function call with argument Terra types T1,…,TN, the type inferred is {T1,…,TN} -> {}.

cdata                            The Terra type T whose corresponding C type is the same as that of the value in the cdata.

otherwise (table,                The type cannot be inferred. The function terralib.cast can be
function value, etc.)            used to annotate any Lua value with a Terra type if needed.

Figure 10.6: Inferring the types of Lua values. When a Lua value is used directly from Terra code through an escape, or a Terra value is created without specifying the type (e.g., terralib.constant(3)), then we attempt to infer the type of the object. If successful, then the standard conversion to that type is applied. The inference depends on the type of the Lua value. If none of the normal rules apply but the value has a metamethod __toterraexpression, this method will be called on the Lua object to generate the Terra expression before type inference occurs.

Terra Type                                   Lua Type

primitive, representable by double           number

primitive, larger than double                cdata with the C type that corresponds to the Terra type

pointer, vector (converted by value)         cdata with the C type that corresponds to the Terra type

array, struct (converted by reference)       cdata with the C type that corresponds to the Terra type. These objects are boxed in Lua, so they will be copied in Lua by reference.

Figure 10.7: Converting Terra values to Lua values. When converting Terra values back into Lua values, the following rules apply. In cases where there is no Lua type that can completely represent the value, it is kept as a Terra value stored in LuaJIT's cdata type.

the function terralib.new(T), which allocates them as cdata managed by the Lua garbage collector.

Lua values are not directly accessible from Terra. When Terra code refers to a Lua value during specialization, or a Lua value is passed to a Terra function as an argument, it is necessary to convert values from Lua into Terra. Figure 10.5 describes the rules for converting Lua values to Terra. Internally, we implement this conversion on top of LuaJIT's foreign-function interface, which makes it possible to call C functions and use C values directly from Lua. Since Terra's type system is similar to C's, we can reuse most of this infrastructure. If the resulting Terra type is not specified (e.g., when a Lua value is used as the result of an escape), then it is inferred using the rules in Figure 10.6. In general, primitive types are mapped to their equivalents, while Lua tables are converted into arrays or structs whose static lengths or field names match the dynamic keys and lengths of the input table. Almost all of these rules are adapted from LuaJIT's interface for calling C functions, and Terra internally uses this interface in its implementation. Note that data already allocated as Terra values is stored in LuaJIT cdata objects, and simply passed through unchanged. The rules for going in the opposite direction, that is, converting Terra values back to Lua, are summarized in Figure 10.7.

LuaJIT also provides functionality that allows Lua code to introspect C objects. We use a wrapper around this support to extend it to Terra objects. Lua code can read and set fields of C/Terra objects directly. This functionality is useful when writing initialization code that does not need to be fast, but does need to prepare data structures for future high-speed components. In Liszt, this functionality is used to initialize memory for fields along with other field meta-data, which is normally stored in the Lua environment.
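A small sketch tying these mechanisms together: terralib.new for GC-managed allocation, Lua-side field access, and terralib.cast for values whose type cannot be inferred (hypothetical example; requires the Terra runtime):

```lua
struct Complex { real : float, imag : float }

-- A Lua table with integer keys converts positionally to a struct
-- (Figure 10.5); the result is cdata managed by the Lua GC.
local c = terralib.new(Complex, {3, 4})
print(c.real, c.imag)  -- fields readable and writable from Lua

-- A plain Lua function has no inferable Terra type (Figure 10.6), so
-- terralib.cast annotates it; a wrapper converts values at the boundary.
local double = terralib.cast({int} -> int, function(x) return 2 * x end)

terra callit() : int
  return double(21)  -- calls back into Lua through the wrapper
end

print(callit()) -- 2 * 21 = 42
```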
It simplifies the management of fields since we can rely on the Lua garbage collector to track when objects like fields are no longer needed.

Some libraries of existing Lua code might include syntax that is not valid Lua-Terra code (e.g., if a library used the keyword terra as a variable name). To provide backwards compatibility in this case, we purposefully expose two sets of functionality for loading code: the function lua_load (load from inside Lua itself), which comes from Lua and parses plain Lua code, and terra_load (terralib.load), which is provided by Terra and parses Lua code

with Terra declarations embedded inside of it. Once loaded, this code can interact in all the normal ways with combined Lua and Terra code.

10.3 WITH DSLS

Many of the problems that Terra faces when interfacing with Lua and existing code are also faced by the DSLs themselves. The Lua-Terra design encourages DSLs to interact with Lua in similar ways. The syntax extensions presented in Chapter 9 explicitly enforce a parsing phase, where only the syntax is available, and a separate runtime phase, where the lexical environment is always passed in as an argument. The easily accessible lexical environment encourages DSL authors to write languages that use it extensively. The environment can be used to store language entities, such as functions or types, that would normally live in a custom symbol table for the DSL. It ensures that the DSL entities are first-class Lua values, encouraging some degree of meta-programming for the DSL. It also makes it easy for certain language objects to be constructed with Lua functions. For instance, fields in the Liszt language are constructed using a Lua API rather than custom language extensions.

Since DSLs share the same lexical scope with Terra as well, it is easy for the DSL author to integrate Terra code directly into a DSL where possible. We have found this useful in two cases. In Darkroom, there were some instances where code that was mostly stencil computation had a small part that read an image using a data-dependent index, such as the semi-Lagrangian update in our fluid simulation. In these cases, Darkroom could call an external Terra function to perform this step. In Liszt, scientists often have pre-constructed mathematical functions written in C. We are able to use our zero-overhead compatibility with C to import them as Terra functions, and then call the Terra functions from Liszt code using the shared lexical scope between Liszt and Terra. Without a shared scope at compile time, a language like Liszt would need custom operators to import foreign symbols, making these interactions more verbose.
In the future, we hope that a shared compilation environment will encourage interaction between DSLs. Initially, DSLs can interoperate at the level of runtimes: one DSL can invoke functionality of another by calling the generated Terra functions that each

produces. This level of interaction does not require the DSLs to be aware of each other to work. Later, DSLs that are frequently used together can be made aware of each other and make compile-time decisions about data structures cooperatively. The design of Terra encourages this interaction by enabling both DSLs to be compiled and run in the same process, making the meta-data needed for these choices easily available.

CHAPTER 11 IMPLEMENTATION

So far we have looked at the advantages of a two-language design for high-performance code generation from the perspective of the user of the language. We also chose a two-language design because it is substantially simpler to implement for practical reasons. There are many good implementations of high-level dynamically-typed languages, and good implementations of low-level modular compilers for C. Both are widely used, and their designs have improved over time. With a two-language design, we can leverage this work without much modification. The details of executing each language well can be handled by existing libraries, allowing us to focus on the details of interoperability and staging. We chose to implement our design using LuaJIT [2] as our high-level interpreter, and LLVM [49] for low-level code generation, but this same design can be recreated using combinations of other high- and low-level technologies.

Terra expressions are an extension of the Lua language, so our implementation augments a standard Lua runtime with additional functionality. To run plain Lua code, we use LuaJIT [2], an implementation of Lua that includes a trace-based JIT compiler. LuaJIT itself is implemented as a library in C, with calls to initialize the runtime, load Lua programs, and evaluate them. Terra adds additional functions to load combined Lua-Terra programs, and compiler infrastructure to specialize, typecheck, and generate code for Terra functions. The way we implement the entire process is shown in Figure 11.1. Since Terra is an embedded language, the standard compiler pipeline is split into separate stages that occur when (a) a file is loaded, (b) a Terra function is specialized, and (c) a Terra function is typechecked and then compiled. In the first stage, we load Lua-Terra programs. This process is implemented as a preprocessor that parses the combined Lua-Terra text. This design allows


(a) Parsing. When Terra code is loaded, we parse it to separate the Terra code from plain Lua code. Consider this input program:

    local a = 1
    terra addone(b : int)
        return a + b
    end
    print(addone(3))

The Terra AST is stored as a data structure in Lua's interpreter state. The Terra function is replaced with a call to terra.deffunc, which is passed a reference to the Terra AST and a Lua table with the local lexical environment:

    ast1 = {"terrafn",
      { {"b"} },
      {"return",
        {"add", "a", "b"}}}

    local a = 1
    addone = terra.deffunc(ast1, {a = a})
    print(addone(3))

The resulting Lua code is loaded using the plain Lua parser, which turns it into a thunk, a Lua function with no arguments that can be run to evaluate the entire file.

(b) Specialization. When the thunk is run, terra.deffunc will be called to create the addone function and specialize the Terra AST in the local environment. Note how the reference to the Lua variable "a" is replaced with the constant 1. The specialized AST is then saved inside the Terra function object until typechecking is needed:

    {"terrafn",
      { {"b", int} },
      {"return",
        {"add", {1, "b"}}}}

(c) Code Generation. When the addone function is called, the specialized AST will be typechecked. Note how the resulting AST is now annotated with types for each expression:

    {"terrafn", int,
      { {"b", int} },
      {"return",
        {"add", {1, int}, {"b", int}}}}

This result is then transformed into LLVM IR, which is JIT compiled into assembly code:

    define i32 @"$addone"(i32) {
    entry:
      %1 = add i32 %0, 1
      ret i32 %1
    }

    lea eax, dword ptr [rdi + 1]
    ret

Figure 11.1: Here we illustrate how a simple Terra program gets transformed during each stage of compilation. The Terra AST and internal API calls have been simplified slightly for clarity.

us to implement Terra without having to modify the LuaJIT implementation. The preprocessor parses the text, building an AST for each Terra function. The AST is stored in the Lua state. The preprocessor then replaces the Terra function text with a call (terralib.deffunc) that creates the Terra function and specializes it in the local environment. This function takes as arguments a reference to the parsed AST stored in the Lua state, as well as a table that holds the local lexical environment. After preprocessing, the plain Lua code is then loaded into the Lua interpreter. Like normal Lua programs, our implementation loads Lua-Terra code into a thunk, a Lua function with no arguments that is called to actually run the code. When this thunk is executed and the code to define a Terra function is run, it invokes Terra's function specializer, and returns the first-class Lua object that represents the specialized Terra function. Terra code is compiled when a Terra function is typechecked, the first time it is run. During typechecking, the compiler will evaluate any exotype properties needed by calling user-provided property functions.

We use LLVM [49] to compile Terra code since it can JIT-compile its intermediate representation directly to machine code. To implement backwards compatibility with C, we use Clang, a C frontend that is part of LLVM. Clang is used to compile C code into LLVM IR and to generate Terra function wrappers that will invoke the C code when called. LLVM also handles other compiler details such as emitting object files, shared libraries, and executables. Using LLVM allows us to leverage common low-level optimization passes to get good performance. In Clang and LLVM, most optimization of C code occurs after it has been translated to LLVM IR.
We have found that translating Terra directly to LLVM IR, and then running the standard optimization passes, allows us to match the speed of equivalent C programs without implementing our own set of Terra-specific optimizations. With LLVM, we can easily target architectures other than x86 as well. Our GPU backend allows Terra code to run on CUDA-enabled devices. It uses the same process to generate LLVM IR as our CPU backend and shares the same optimization passes used to produce optimized IR. We then use LLVM's NVPTX backend to translate the LLVM IR to PTX IR, which is given to the CUDA runtime and linked against pre-existing PTX libraries for functions such as CUDA's malloc and printf. Our GPU implementation also handles the

details of loading this code and generating wrapper functions for kernels that marshal the arguments and invoke the kernel. Similar to our approach to low-level programming on the CPU, our GPU backend does not attempt to abstract over the GPU's hardware or to automate the conversion of CPU code to GPU code. Instead, Terra functions used as CUDA kernels are written similarly to kernels written in CUDA C. They explicitly manage their own memory layout, and can use built-in functions to retrieve the local and block IDs while running. This approach allows users to match the performance of CUDA C, while gaining the benefits of meta-programming and runtime code generation provided by Terra. A simple micro-benchmark based on the SAXPY kernel from BLAS runs at the same speed (within 1%) when written in Terra as equivalent code written with the native CUDA compiler (nvcc). One of the reasons that Terra code can perform similarly is that nvcc itself is internally based on LLVM and uses the same backend.

Our implementation of Terra uses around 15,000 lines of code, of which around 5,000 are written in Lua (the specializer, typechecker, and terralib library code), and the rest are written in C++ (the parser, the LLVM code generator, and the interface with Clang). For comparison, vanilla Lua 5.2, a simple interpreter design, is around 20,000 lines, and LuaJIT is 60,000 lines. Terra is comparatively smaller since it does not need to handle details such as garbage collection (handled by LuaJIT), standard library maintenance (handled through backwards compatibility with C), or low-level code generation (handled by LLVM). In contrast, a single-language design for high-performance code generation could end up being significantly more difficult to create since its design would not map directly to either a high-level language or a low-level library like LLVM.
While it is hard to speculate about the complexity of such a language, other languages that are higher-level than C but use LLVM as a backend, such as Rust (110,000 lines) and Julia (35,000 lines), are substantially larger.

CHAPTER 12 DISCUSSION

The contributions shown in this thesis and our experience developing the Terra language suggest some areas of future work, as well as different perspectives on using Terra as a programming language for building DSLs.

12.1 LESSONS LEARNED FROM BUILDING DSLS

Darkroom, Quicksand, and Liszt were a collaboration between myself and others, who are developing these systems as independent research projects. Having collaborators build the DSLs as actual research projects ensured that they were solving realistic problems. It also gave us an opportunity to see how actual DSL writers would use Terra in practice. This collaboration resulted in improvements to the language to make particular kinds of DSL development easier.

Darkroom, developed by James Hegarty, was the first DSL we created using Terra. It stressed the ability to generate generic code and the interfaces for embedding another language inside Lua. It led us to make the interface for generating code more concise and predictable. Originally, Terra used a lazy form of specialization, which would occur at the same time as typechecking. This design made type information available at specialization time, but made it hard to know when values would be captured. The difficulty of explaining these semantics encouraged the switch to eager specialization and lazy typechecking. Darkroom also encouraged the addition of low-level memory operations, including unaligned or volatile loads and stores. This functionality is implemented as built-in macros for doing annotated loads and stores. As the first language to interoperate between Terra and Lua, it also


encouraged a more seamless view of Terra's objects from Lua. We added the ability to invoke methods on Terra objects directly from Lua. Internally, this mechanism uses the exotype __getmethod property, and calls the result on the object directly from Lua. Finally, we added the long form quotation syntax:

    quote stmts in exp end

The short quote (`exp) is defined via desugaring to quote in exp end. Unlike functional languages, Lua makes a distinction between statements and expressions; Terra inherits this syntax and behavior. The long form quote combines a quote operator with the semantics of a "let" expression from functional languages so that a programmer can include statements in a quotation. Making the quotation more general is simpler than the previous design, which required one quote operator for expressions and another for statements.
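A small sketch of the long quotation form (the helper below is hypothetical; requires the Terra runtime):

```lua
-- A Lua function that builds a quotation squaring an expression while
-- evaluating it only once: the statement part introduces a temporary,
-- and the part after `in` is the value of the whole quote.
local function square(e)
  return quote
    var t = [e]    -- statement part: evaluate e once
  in
    t * t          -- expression part: the quotation's value
  end
end

terra f(x : int) : int
  return [square(`x + 1)]  -- short quote `exp feeds the generator
end

print(f(3)) -- (3 + 1)^2 = 16
```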

Quicksand, developed by Daniel Ritchie, was the first library to independently use the exotype interface. Exotypes allowed for expressive classes, but did not help with the automated management of object lifetimes that is possible in languages such as C++. In C++, objects allocated on the stack can have destructors that run when the stack unwinds. While class systems were implementable with exotypes, Terra lacked a mechanism for automatically scheduling a destructor to run when a scope ends. We added a defer statement to implement this mechanism. A statement, defer foo(arg0,...,argN), schedules the function call foo to run when the surrounding scope exits. It is similar to the defer statement in Google's Go, but the deferred code runs when a C++ destructor would run, rather than at the end of the function. These defer statements can be generated by macros, making it possible for a library-based class system to create objects whose lifetimes on the stack are managed automatically.
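A minimal sketch of defer used for stack-scoped cleanup (the function is hypothetical; requires the Terra runtime):

```lua
local C = terralib.includec("stdlib.h")

terra useBuffer(n : int) : int
  var buf = [&int](C.malloc(n * sizeof(int)))
  defer C.free(buf) -- runs when this scope exits, like a C++ destructor
  for i = 0, n do buf[i] = i end
  return buf[n - 1] -- the free still runs after this value is computed
end

print(useBuffer(10)) -- 9
```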

Liszt, developed primarily by Crystal Lemire, Gilbert Bernstein, and Chinmayee Shah, tested our implementation of GPU primitives. It encouraged a more robust interface to GPU features such as CUDA-provided mathematical functions (e.g., log). It also required access to GPU-specific features including shared and constant memory. These features are expressed in the NVPTX backend as globals with special memory spaces. To use these globals generically, they need to be converted to pointers in the unified address space. Both

creating the global variable and converting it to the unified address space were possible to write in Terra, but each reference to these globals needed to be annotated with the address space conversions. To automate the process and make it invisible to the user, we added the __toterraexpression metamethod, which allows any Lua object to define how it should be converted into Terra code when used in an escape. Using this mechanism, the CUDA backend creates custom Lua objects representing shared and constant memory that automatically insert the right memory-space conversions. This allows the GPU backend to support these features without modifying the rest of the compiler.
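As an illustrative sketch of the mechanism (the wrapper object here is hypothetical and far simpler than the address-space conversions used by the CUDA backend; requires the Terra runtime):

```lua
-- Any Lua object can define how it becomes Terra code when used in an
-- escape. This wrapper expands to its stored value plus one.
local wrapped = setmetatable({ value = 41 }, {
  __toterraexpression = function(self)
    return `[self.value] + 1 -- return a Terra quotation
  end
})

terra g() : int
  return [wrapped] -- the metamethod decides what this escape produces
end

print(g()) -- 41 + 1 = 42
```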

12.2 SOFTWARE DEVELOPMENT AS A LIBRARY

When we designed applications using Lua-Terra, we found that one of the ways the system reduces complexity is by replacing separate processes with a single process orchestrated by Lua. In Liszt and matrix-multiply, this single process replaced a lot of “glue” code written in shell scripts and Makefiles that called pre-processors or compilers. These standalone tools were originally designed to be invoked by hand, and later by build tools like make that automate a repeated process, but they were not designed with the intention of being heavily meta-programmed.

Lua and Terra provide a more programmable interface to these same software development tools. The different phases of evaluation of a Terra program each roughly correspond to, and replace, a single program in the previous model. Lua evaluation replaces the top-level shell script or Makefile that manages state and dependencies. Specialization of Terra code from Lua replaces the need for source-to-source translators or preprocessors. Typechecking corresponds to invoking the compiler to build code, and Terra evaluation corresponds to running the compiled code as its own program. Making these phases programmable does not prevent the use of Lua and Terra as a traditional compiler stack, since it is still possible to write compiled code out to a file. Instead, the increased flexibility makes more complicated designs like DSLs or autotuners possible to write.

While Lua and Terra replace many existing tools, other tools that provide valuable insights into software development still exist primarily as separate processes designed to be run by hand. Tools for debugging (e.g., gdb) and performance profiling (e.g., prof or VTune) include powerful functionality. A debugger has libraries that can start and stop code at particular source locations, watch for changing expressions, and allow for runtime introspection of program state. Performance-monitoring programs know how to sample hardware performance counters and quickly associate those samples with locations in the program text. This functionality could be very powerful if used programmatically. A DSL might automatically annotate the kernels it generates with performance-monitoring code and present an easy-to-understand view of how the DSL is performing in the terminology of that language. A DSL might also want to expose a debugger that can step through its statements and introspect intermediate values. Actually performing these tasks requires sophisticated architecture-specific infrastructure, and while debuggers and profilers for low-level languages include this functionality, it is not exposed in a way that is easily accessible apart from using the standalone tools by hand. Some libraries, such as PAPI [8], provide an interface to performance-timing functionality, but their interface is low-level. One barrier to a higher-level interface is the need to expose the link between performance counters and program code as part of the library. In traditional languages, this link is difficult to provide since the code only exists as strings that are consumed by a separate process. In a system like Terra, these tools might be easier to create since code is already a first-class entity in the same process that is running it. This way of profiling and debugging code is familiar to the dynamically-typed language community and could make performance profiling and debugging easier if implemented at a low level in a system like Lua-Terra.
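For example, the traditional offline path of writing compiled code out to a file can be scripted directly from Lua. A minimal sketch using Terra's terralib.saveobj; the function body is illustrative:

```lua
terra add(a : int, b : int) : int
  return a + b
end

-- Lua plays the role of the Makefile: decide what to build, then emit
-- an object file that a traditional linker can consume.
terralib.saveobj("add.o", { add = add })
```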

12.3 STATICALLY-TYPED LANGUAGES AS HETEROGENEOUS STAGED PROGRAMMING

Terra appears different from other traditional statically-typed languages since its constructs are always created using Lua, which is dynamically typed. Nevertheless, it is possible to think of every statically-typed language as containing multiple languages. One language, at the top level, instantiates entities like types, function definitions, classes, and methods. This language “evaluates” when the compiler for that language is run, producing a second “stage” containing the actual program. That program is described with a separate language which can appear in the body of functions. This view is reflected in the fact that statically-typed languages typically make a distinction between what can appear at the top level of a file, which we might term a declaration language, and the expressions allowed in the body of functions, which we traditionally think of as writing code in that language. For instance, one can think of the top-level declaration language of C as a program that defines types and global symbols, and then checks that the programs in the bodies of the functions have the correct types.

As a programming language itself, C’s declaration language is impoverished. It can introduce “variables” for types (typedef) and functions (function declarations). It can assign a function variable a value (function definition), and it can define simple aggregate types. The fact that this language is so limited leads to a lot of abuse of the preprocessor to add additional functionality, such as using #if statements to introduce conditional compilation into declarations. Other statically-typed languages build more functionality directly into their own versions of the declaration language. C++ and Java include object-oriented features by extending the declaration language with statements that introduce classes or add methods. Most of the complexity of these object-oriented systems lies in the declaration language (the only thing added to the computation language is normally a method-invocation expression). One drawback of this design is that the object-oriented features (e.g., inheritance, traits, interfaces, etc.) are very specific and built into the language. Since C++ introduced classes, there have arguably been improvements in the way we handle features such as multiple inheritance or interfaces, such as mixins, which help solve the diamond-inheritance problem [6]. But since C++’s class system is part of the language itself, it cannot easily adopt these improvements without breaking backwards compatibility.
Similarly, early versions of Java lacked generics, leading to a lot of repetitive, unsafe code. Reacting to this and other limitations in Java, Guy Steele noted that any new programming language should have some plan for growth [68]. In addition to having very specific built-in features, most declaration languages are not really complete languages: typically there are no explicit loops, conditionals, or data structures, which precludes the powerful meta-programming behaviors that a full language allows.

Rather than continue to add specific features to this declaration language and deal with its shortcomings, our approach in Terra is to replace it entirely with Lua. Doing so removes the need for many language-specific features but provides a powerful mechanism to grow the language. Namespaces are handled by storing values in Lua tables. Templates are handled by nesting Terra definitions in a Lua function. Terra’s exotypes are also designed with growth in mind: they provide the bare mechanisms for object orientation (method-invocation syntax) and allow the user to implement any desired functionality as a library via meta-programming.

Using a full language makes the way these features are handled much more flexible. Templates in C++ and parameterized types in other languages can emit cryptic error messages. One reason this occurs is that the error is described in terms of the type system rather than by the developer of the template. Since templating in Terra is accomplished with a Lua function, the developer of the template function can write clearer error messages that are emitted when the template is used incorrectly.

Recognizing that statically-typed languages really are a composition of multiple languages helps demonstrate some of the limitations of using them, but there are some downsides to Terra’s approach of using a full language as the declaration language. Tools that want to provide introspection on code need to actually execute a full Lua program to understand what types and methods exist. Thus, it is harder to understand code just by reading it. Having a full program also limits the possibilities for performing static analysis at this level. While at first this seems problematic, it may be possible to adapt tools to this model, in which part of the program actually runs during development. Developers might print out details about relevant types or disassemble functions on demand to debug performance issues. During development the programmer might have the running program communicate directly with an IDE to inform it about what types and functions the declaration program is producing.
Some of these interactions have already been implemented using F# type providers [71] and can be incorporated into Terra as well.
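The templating idiom described in this section can be sketched as a memoized Lua function; the error check shows how the template's author, rather than the type system, produces the diagnostic (the function and its cache are illustrative):

```lua
local cache = {}
-- A "template": a Lua function from a Terra type to a Terra function.
local function makeAdd(T)
  -- The template author writes the error message in plain terms:
  if not T:isarithmetic() then
    error("makeAdd expects an arithmetic type, but got "..tostring(T))
  end
  if cache[T] == nil then
    cache[T] = terra(a : T, b : T) : T
      return a + b
    end
  end
  return cache[T]  -- memoized, so each instantiation is compiled once
end

local addInt   = makeAdd(int)
local addFloat = makeAdd(float)
```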

12.4 A MIDDLE GROUND FOR STATIC AND DYNAMIC TYPING

Terra exotypes are neither dynamically nor statically typed. Unlike statically-typed languages, typechecking can occur during program execution. But unlike dynamically-typed languages, this checking occurs a function at a time, when a function is first needed, rather than as the function evaluates, which provides an opportunity for type errors to be caught before code is run. Our API also allows the programmer to explicitly ask for typechecking and compilation of functions using the compile method. This approach to typechecking is more flexible than the traditional approach. If desired, traditional typechecking behavior can be achieved by generating all Terra code using quotations and exotypes and then calling compile to typecheck the result. The code could then be saved to an object file and used later.

Making typechecking an operation in the language itself allows us to explore a middle ground between static and dynamic typing. Languages with gradual type systems may benefit from this mixture. Gradual typing [64] provides a mechanism for annotating and statically checking some types in an otherwise dynamically-typed language. Values without annotations are checked when used as inputs to typed code, but internal to typed code these checks can be omitted, improving performance while maintaining safety. Any gradual typing system needs to provide some way of annotating types. A simple system would only be able to provide types for a small amount of code in a program. A more complicated system could type more of the program, but would add complexity to the programming language. An advantage of dynamically-typed languages is their relative simplicity, and adding a complex type system can detract from their appeal. A middle ground where functions are typechecked during evaluation could keep the type system simple while still allowing functions to be typechecked before they are run. The type system could use a simple model of types but, like Terra exotypes, allow the runtime to meta-program their behavior during typechecking. Functions would be typechecked either when they are defined or when explicitly requested.
Behavior similar to normal gradual typing could be achieved by factoring the program into a “top-level” that defines types and functions, and a “main” function that is invoked to run the program. Evaluating the top level of the program would then define and check all the gradual types. This mixture of static and dynamic typing in a single language might allow for improved performance, better safety, or reduced complexity compared to existing approaches.
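The explicit-typechecking path can be sketched as follows; this is a minimal example assuming Terra's compile method on functions, with an illustrative function body:

```lua
terra square(x : double) : double
  return x * x
end

-- Force typechecking and compilation now, before any call site runs,
-- so type errors surface here rather than at first use.
square:compile()
```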

12.5 APPLYING DYNAMIC CODE GENERATION TO SCIENTIFIC COMPUTING

The ability to do just-in-time compilation of code using Terra can be applied in areas that have traditionally avoided the approach due to its added complexity, such as scientific computing. In Liszt, for example, we could perform JIT compilation based on dynamic stencil analyses. The version of Liszt written in Terra can dynamically load code, but we have not fundamentally changed its approach to data-dependency analysis for running on large clusters. The new design of Liszt, based on dynamic compilation, makes it possible to replace the current static analysis, which runs once per function on a static mesh, with an analysis that runs dynamically. Rather than make the stencil analysis conservative, the compiler can make aggressive assumptions and insert guards that dynamically check these assumptions during execution. If the stencil trips one of these guards or the mesh topology changes, the kernels could be regenerated and the data structures rebuilt to adapt to the change. This JIT-compilation approach of making aggressive assumptions in combination with guards may prove useful in other fields as well. Terra’s support for runtime code generation makes it easier to explore such designs.

12.6 USING HIGH-PERFORMANCE STAGED PROGRAMMING TO IMPROVE COMPILER PERFORMANCE

Currently, compilers are designed either to run online (e.g., JavaScript JITs) or as offline passes (e.g., gcc or clang). The online compilers are designed to emit code quickly, and to do so they are often specialized to emit only the kind of code necessary for the language they compile. Offline compilers like LLVM [49] are written in a way that is more modular, with an abstract notion of a compiler pass and a pipeline of such passes as a way to represent compilation. This modularity comes at the cost of compilation speed. In tests when developing the Riposte VM [74], we found an order-of-magnitude difference in compilation speed between a hand-written JIT compiler and passing the same code to LLVM when emitting code for the common Black-Scholes benchmark. Part of this difference is due to LLVM performing more sophisticated transformations, but even when those optimizations are disabled, performance suffers due to the overhead of the modular abstractions and the data layout of the intermediate representations.

Terra’s approach to staging high-performance code may make it possible to create very fast compilers that are still modular. Using exotypes, we were able to create an x86 assembler object that ran an order of magnitude faster than hand-written code by applying automated optimizations such as template fusion. Other parts of traditional compilers may be amenable to the same improvements if we can represent internal compiler transformations at a higher level of abstraction. For instance, instruction selection can be driven by high-level templates that are translated into lower-level search procedures; LLVM already represents some instruction selection this way using its TableGen language. IR transformation passes can also be described at a higher level, allowing fusion of code across compiler passes to reduce the cost of traversing the IR. The data structures representing the IR itself can be optimized based on which transformations will actually be applied to it. A successful design could make JIT compilation applicable to a much wider body of problems by improving the speed of online compilation.

CHAPTER 13 CONCLUSION

This thesis set out to make portable high-performance programming easier by creating a principled way to programmatically generate high-performance code at a low level. Our solution proposes a novel two-language design that uses the high-level language Lua to meta-program a new low-level language, Terra. The two-language design reflects the way actual DSLs and autotuners are built in practice, where high-level languages are frequently used to write the compiler, while low-level languages are used at runtime.

We show how to adapt principled techniques from multi-stage programming so they work in the context of a two-language design. Shared lexical scoping between Terra and Lua makes it easy to manage Terra entities inside of Lua and provides a basis for their interaction. Variable hygiene makes it possible for the programmer to understand the relationship between a variable’s declaration and use, even when they span the two languages. We also distinguish between staging and evaluation using different semantic phases of execution. By ensuring that the evaluation phase of Terra occurs independently from Lua, we ensure that programmers can control the performance of generated Terra code.

We extend this two-language design to also support the generation of flexible high-performance types. Here, we combine the advantages of each language to get the flexibility of dynamic languages to create types based on runtime information with the performance of types from statically-typed languages. We show how to combine meta-object protocols commonly used in dynamically-typed languages with staged programming techniques that run during Terra’s typechecking phase.

To make the interaction of Lua and Terra clear, we also provide formal semantics for the phases of evaluation. We provide a language-extension mechanism so other DSLs can be embedded in Lua in the same way we embedded Terra. And we describe the design choices we made to ensure that Lua and Terra interoperate well with each other and with existing applications.

We evaluate our two-language design by creating example DSLs and libraries in a wide variety of areas including linear algebra, image processing, physical simulation, probabilistic computing, class-system design, serialization, dynamic assembly, and automatic differentiation. We show that this software can be easily created using our design, often using fewer languages and technologies than similar software built with current tools. We also show that the performance of these examples matches or exceeds that of the existing tools. A common theme across our examples is that the added simplicity and expressiveness of writing in Lua and Terra makes it feasible to implement aggressive optimizations that were not attempted in existing approaches. In the future we hope that this approach of creating concise but highly specialized libraries and languages can be used to optimize more domains, making it possible to experiment with more radical designs for software and hardware.

BIBLIOGRAPHY

[1] Google V8 Javascript engine. http://code.google.com/p/v8/.

[2] The LuaJIT project. http://luajit.org/.

[3] D. Ancona, M. Ancona, A. Cuni, and N. D. Matsakis. RPython: A step towards reconciling dynamically and statically typed OO languages. In DLS, pages 53–64, 2007.

[4] A. Bawden and J. Rees. Syntactic closures. In LFP, pages 86–95, 1988.

[5] S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn, and K. Smith. Cython: The best of both worlds. Computing in Science and Engineering, 13.2:31–39, 2011.

[6] G. Bracha and W. Cook. Mixin-based inheritance. In OOPSLA/ECOOP, pages 303–311, 1990.

[7] K. J. Brown, A. K. Sujeeth, H. J. Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun. A heterogeneous parallel framework for domain-specific languages. In PACT, pages 89–100, 2011.

[8] S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In SC, 2000.

[9] E. Burmako. Scala macros: Let our powers combine!: On how rich syntax and static types work with metaprogramming. In SCALA, pages 3:1–3:10, 2013.


[10] J. Carette. Gaussian elimination: A case study in efficient genericity with MetaOCaml. Sci. Comput. Program., 62(1):3–24, Sept. 2006.

[11] B. Catanzaro, M. Garland, and K. Keutzer. Copperhead: Compiling an embedded data parallel language. In PPoPP, pages 47–56, 2011.

[12] H. Chafi, Z. DeVito, A. Moors, T. Rompf, A. K. Sujeeth, P. Hanrahan, M. Odersky, and K. Olukotun. Language virtualization for heterogeneous parallel computing. In OOPSLA, pages 835–847, 2010.

[13] H. Chafi, A. K. Sujeeth, K. J. Brown, H. Lee, A. R. Atreya, and K. Olukotun. A domain-specific approach to heterogeneous parallelism. In PPoPP, pages 35–46, 2011.

[14] S. Chiba. A protocol for C++. In OOPSLA, pages 285–299, 1995.

[15] A. Chlipala. Ur: Statically-typed metaprogramming with type-level record computa- tion. In PLDI, pages 122–133, 2010.

[16] A. Cohen, S. Donadio, M. J. Garzarán, C. Herrmann, and D. Padua. In search of a program generator to implement generic transformations for high-performance computing. In 1st MetaOCaml Workshop (associated with GPCE), 2004.

[17] C. Consel and O. Danvy. Tutorial notes on partial evaluation. In POPL, pages 493–501, 1993.

[18] C. Consel and F. Noël. A general approach for run-time specialization and its application to C. In POPL, 1996.

[19] G. Corliss, C. Faure, A. Griewank, L. Hascoët, and U. Naumann, editors. Automatic Differentiation of Algorithms: From Simulation to Optimization, Computer and Information Science, New York, 2002. Springer.

[20] K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, and T. L. Veldhuizen. Generative programming and active libraries. In Selected Papers from the International Seminar on , pages 25–39, London, 2000. Springer-Verlag.

[21] L. P. Deutsch and A. M. Schiffman. Efficient implementation of the Smalltalk-80 system. In POPL, pages 297–302, 1984.

[22] Z. DeVito, N. Joubert, F. Palacios, S. Oakley, M. Medina, M. Barrientos, E. Elsen, F. Ham, A. Aiken, K. Duraisamy, E. Darve, J. Alonso, and P. Hanrahan. Liszt: A domain specific language for building portable mesh-based PDE solvers. In SC, pages 9:1–9:12, 2011.

[23] Z. DeVito, J. Hegarty, A. Aiken, P. Hanrahan, and J. Vitek. Terra: A multi-stage language for high-performance computing. In PLDI, pages 105–116, 2013.

[24] Z. DeVito, D. Ritchie, M. Fisher, A. Aiken, and P. Hanrahan. First-class runtime generation of high-performance types using exotypes. In PLDI, pages 77–88, 2014.

[25] J. Eckhardt, R. Kaiabachev, E. Pasalic, K. Swadi, and W. Taha. Implicitly heteroge- neous multi-stage programming. New Gen. Comput., 25(3):305–336, Jan. 2007.

[26] S. Erdweg, T. Rendel, C. Kastner,¨ and K. Ostermann. SugarJ: Library-based syntactic language extensibility. In OOPSLA, pages 391–406, 2011.

[27] M. Flatt. Composable and compilable macros: you want it when? In ICFP, pages 72–83, 2002.

[28] T. Foley and P. Hanrahan. Spark: Modular, composable shaders for graphics hardware. In SIGGRAPH, pages 107:1–107:12, 2011.

[29] F. Franchetti, Y. Voronenko, and M. Püschel. Formal loop merging for signal transforms. In PLDI, pages 315–326, 2005.

[30] M. Frigo and S. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216 –231, 2005.

[31] Y. Futamura. Partial evaluation of computation process—an approach to a compiler-compiler. Higher Order Symbol. Comput., 12(4):381–391, Dec. 1999.

[32] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, 1995.

[33] N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum. Church: A language for generative models. In Proc. of Uncertainty in Artificial Intelligence, 2008.

[34] Google. Protocol buffers.

[35] B. Grant, M. Mock, M. Philipose, C. Chambers, and S. J. Eggers. DyC: An expressive annotation-directed dynamic compiler for C. Theor. Comput. Sci., 248(1-2):147–199, Oct. 2000.

[36] J. Hannemann and G. Kiczales. Design pattern implementation in java and aspectJ. In OOPSLA, pages 161–173, 2002.

[37] J. Hegarty, J. Brunhaver, Z. DeVito, J. Ragan-Kelley, N. Cohen, S. Bell, A. Vasilyev, M. Horowitz, and P. Hanrahan. Darkroom: Compiling high-level image processing code into hardware pipelines. In SIGGRAPH, pages 144:1–144:11, 2014.

[38] U. Hölzle and D. Ungar. Optimizing dynamically-dispatched calls with run-time type feedback. In PLDI, pages 326–336, 1994.

[39] R. Ierusalimschy, L. H. de Figueiredo, and W. C. Filho. Lua — an extensible extension language. Software: Practice and Experience, 26(6), 1996.

[40] R. Ierusalimschy, L. H. de Figueiredo, and W. Celes. The evolution of Lua. In HOPL III, pages 2:1–2:26, 2007.

[41] R. Ierusalimschy, L. H. De Figueiredo, and W. Celes. Passing a language through the eye of a needle. Commun. ACM, 54(7):38–43, July 2011.

[42] Oracle Inc. Java object serialization specification. http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serialTOC.html.

[43] N. D. Jones, P. Sestoft, and H. Søndergaard. An experiment in partial evaluation: The generation of a compiler generator. SIGPLAN Not., 20(8):82–87, Aug. 1985.

[44] N. D. Jones, C. K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.

[45] G. Kiczales and J. D. Rivieres. The Art of the Metaobject Protocol. MIT Press, Cambridge, MA, USA, 1991.

[46] O. Kiselyov and C. Shan. Embedded probabilistic programming. In Domain-Specific Languages, pages 360–384, 2009.

[47] E. Kohlbecker, D. P. Friedman, M. Felleisen, and B. Duba. Hygienic macro expansion. In Proceedings of the 1986 ACM conference on LISP and functional programming, LFP, pages 151–161, 1986.

[48] J. Lamping, G. Kiczales, L. H. Rodriguez, Jr., and E. Ruf. An architecture for an open compiler. In Proc. of the IMSA’92 Workshop on Reflection and Meta-Level Architectures, pages 95–106, 1992.

[49] C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In CGO, 2004.

[50] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[51] G. Mainland. Explicitly heterogeneous metaprogramming with MetaHaskell. In ICFP, pages 311–322, 2012.

[52] J. McCarthy. History of LISP. SIGPLAN Not., 13(8):217–223, Aug. 1978.

[53] R. Meyers. X macros. C/C++ Users J., 19(5):52–56, May 2001.

[54] R. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, pages 113–162, 2011.

[55] G. Neverov and P. Roe. Metaphor: A multi-staged, object-oriented programming language. In GPCE, 2004.

[56] C. Newburn, B. So, Z. Liu, M. McCool, A. Ghuloum, S. Toit, Z. G. Wang, Z. H. Du, Y. Chen, G. Wu, P. Guo, Z. Liu, and D. Zhang. Intel’s Array Building Blocks: A retargetable, dynamic compiler and embedded language. In CGO, pages 224–235, April 2011.

[57] M. Odersky, S. Micheloud, N. Mihaylov, M. Schinz, E. Stenman, M. Zenger, and et al. An overview of the Scala programming language. Technical report, 2004.

[58] M. Poletto, W. C. Hsieh, D. R. Engler, and M. F. Kaashoek. `C and tcc: A language and compiler for dynamic code generation. ACM Trans. Program. Lang. Syst., 21(2):324–369, Mar. 1999.

[59] V. R. Pratt. Top down operator precedence. In POPL, pages 41–51, 1973.

[60] M. Püschel, J. M. F. Moura, B. Singer, J. Xiong, J. Johnson, D. Padua, M. Veloso, and R. W. Johnson. Spiral: A generator for platform-adapted libraries of signal processing algorithms. Int. J. High Perform. Comput. Appl., 18(1):21–45, Feb. 2004.

[61] J. Ragan-Kelley, A. Adams, S. Paris, M. Levoy, S. Amarasinghe, and F. Durand. Decoupling algorithms from schedules for easy optimization of image processing pipelines. In SIGGRAPH, 2012.

[62] T. Rompf and M. Odersky. Lightweight modular staging: A pragmatic approach to runtime code generation and compiled DSLs. In GPCE, pages 127–136, 2010.

[63] T. Rompf, A. K. Sujeeth, N. Amin, K. J. Brown, V. Jovanovic, H. Lee, M. Jonnalagedda, K. Olukotun, and M. Odersky. Optimizing data structures in high-level programs: New directions for extensible compilers based on staging. In POPL, pages 497–510, 2013.

[64] J. G. Siek and W. Taha. Gradual typing for functional languages. In Scheme and Functional Programming Workshop, pages 81–92, 2006.

[65] J. M. Siskind. Flow-directed lightweight closure conversion. Technical report, 1999.

[66] J. Stam. Real-time fluid dynamics for games. In Proceedings of the Game Developer Conference, March 2003.

[67] Stan Development Team. Stan: A C++ library for probability and sampling, version 1.3, 2013. URL http://mc-stan.org/.

[68] G. L. Steele, Jr. Growing a language. In OOPSLA Addendum, pages 0.01–A1, 1998.

[69] B. Stroustrup. Multiple inheritance for C++. In Proceedings of the Spring 1987 European Unix Systems User’s Group Conference, 1987.

[70] N. Sweet. Kryo. https://code.google.com/p/kryo/.

[71] D. Syme, K. Battocchi, K. Takeda, D. Malayeri, J. Fisher, J. Hu, T. Liu, B. McNamara, D. Quirk, M. Taveggia, W. Chae, U. Matsveyeu, and T. Petricek. F# 3.0 — strongly-typed language support for internet-scale information sources. Technical report, 2012. URL http://research.microsoft.com/apps/pubs/?id=173076.

[72] W. Taha. A Gentle Introduction to Multi-stage Programming. In Domain-Specific Program Generation, pages 30–50. 2004.

[73] W. Taha and T. Sheard. MetaML and multi-stage programming with explicit annotations. In Theoretical Computer Science, pages 203–217. ACM Press, 1999.

[74] J. Talbot, Z. DeVito, and P. Hanrahan. Riposte: A trace-driven compiler and parallel VM for vector code in R. In PACT, pages 43–52, 2012.

[75] S. Tobin-Hochstadt, V. St-Amour, R. Culpepper, M. Flatt, and M. Felleisen. Languages as libraries. In PLDI, pages 132–141, 2011.

[76] Z. Wan, W. Taha, and P. Hudak. Real-time FRP. In ICFP, pages 146–156, 2001.

[77] R. C. Whaley and A. Petitet. Minimizing development and maintenance costs in supporting persistently optimized BLAS. Softw. Pract. Exper., 35(2):101–121, Feb. 2005.

[78] D. Wingate, A. Stuhlmüller, and N. D. Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In Proc. of the 14th Artificial Intelligence and Statistics, 2011.