Reliably Composable Language Extensions
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by University of Minnesota Digital Conservancy Reliably composable language extensions A Dissertation SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA BY Ted Kaminski IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Eric Van Wyk May, 2017 Copyright © 2017 Ted Kaminski. Acknowledgments I am unlikely to do justice to everyone I should thank here. I should begin by thanking my advisor, Eric Van Wyk, who has been a been a better mentor than I had even expected going in, and I had pretty optimistic expectations. I would also like to thank my thesis committee members: Gopalan Nadathur, Mike Whalen, and Tim Hunter. All of them helped provide valuable feedback in polishing this thesis. I also want to thank my ex-committee member, Wayne Richter, who retired before I finished this thing off. I’d like to thank other students in this department, without whom grad school would hardly have been as excellent as it was: Nate Bird, Morten Warncke-Wang, Aaron Halfaker, Michael Ekstrand, Sarah McRoberts, Andy Exley, Hannah Miller, Sean Landman, Dane Coffey, Fedor Korsakov, and you know, tag yourselves. Sorry Reid. I’d also like to thank the colleagues I’ve had a chance to talk shop with, especially Tony Sloane and Jurgen Vinju. I’d like to thank math friends, Shelley Kandola and Maggie Ewing, who provided some welcome diversions into things that aren’t a part of this thesis, unfortunately. As for Minneapolis in general, best place on Earth, there are too many people to name, but Eli Cizewski-Robinson must appear here. And finally, thank you Mary Katherine Southern, best friend, girlfriend, and colleague, for your support through most of grad school, for helping me make sense of things, and for brightening my life. And a final thank you to my family, for their support as well. This material is based upon work partially supported by the National Science Foundation under Grant Nos. IIS 0905581, ACI 1047961, and CBET 1307089. I’d also like to acknowledge the support of Department of Education GAANN fellowships (through grants P200A060194 and P200A100195.) I was also partially supported in this work through DARPA and the US Air Force Research Lab under Contract No FA8650-10-C-7076, in conjunction with Adventium Labs. i Abstract Many programming tasks are dramatically simpler when an appropriate domain-specific language can be used to accomplish them. These languages of- fer a variety of potential advantages, including programming at a higher level of abstraction, custom analyses specific to the problem domain, and the ability to generate very efficient code. But they also suffer many disadvantages as a result of their implementation techniques. Fully separate languages (such as YACC, or SQL) are quite flexible, but these are distinct monolithic entities and thus we are unable to draw on the features of several in combination to accomplish a single task. That is, we cannot compose their domain-specific features. “Embedded” DSLs (such as parsing combinators) accomplish some- thing like a different language, but are actually implemented simply as libraries within a flexible host language. This approach allows different libraries to be imported and used together, enabling composition, but it is limited in analysis and translation capabilities by the host language they are embedded within. A promising combination of these two approaches is to allow a host language to be directly extended with new features (syntactic and semantic.) However, while there are plausible ways to attempt to compose language extensions, they can easily fail, making this approach unreliable. Previous methods of assuring reliable composition impose onerous restrictions, such as throwing out entirely the ability to introduce new analysis. This thesis introduces reliably composable language extensions as a tech- nique for the implementation of DSLs. This technique preserves most of the advantages of both separate and “embedded” DSLs. Unlike many prior ap- proaches to language extension, this technique ensures composition of multiple language extensions will succeed, and preserves strong properties about the ii behavior of the resulting composed compiler. We define an analysis on lan- guage extensions that guarantees the composition of several extensions will be well-defined, and we further define a set of testable properties that ensure the resulting compiler will behave as expected, along with a principle that assigns “blame” for bugs that may ultimately appear as a result of composition. Fi- nally, to concretely compare our approach to our original goals for reliably composable language extension, we use these techniques to develop an extensi- ble C compiler front-end, together with several example composable language extensions. iii Contents List of Figures x 1 Introduction 1 1.1 Domain-specific languages ........................ 3 1.2 Language composition .......................... 7 1.2.1 Composable language extensions ................ 8 1.3 Motivation for composable language extension . 11 1.4 Contributions ............................... 14 1.5 Outline of the thesis ........................... 17 2 Related work 20 2.1 Extensible languages, historically .................... 21 2.2 The expression problem ......................... 23 2.3 Language specification systems ..................... 25 2.3.1 Objects .............................. 26 2.3.2 Algebraic datatypes ........................ 27 2.3.3 Systems that have solved the expression problem . 28 2.3.4 Macros ............................... 30 2.3.5 Common host languages for internal DSLs . 32 iv Contents 3 Background 35 3.1 Attribute grammars ............................ 36 3.1.1 Higher-order attribute grammars . 39 3.1.2 Forwarding ............................ 42 3.1.3 Reference attribute grammars . 46 3.1.4 Aspects .............................. 48 3.1.5 Remote/collection attributes ................... 50 3.1.6 Well-definedness .......................... 51 3.2 Silver .................................... 52 3.3 Copper ................................... 53 4 Integrating AGs and functional programming 56 4.1 Language Design Goals .......................... 58 4.2 The Ag Language ............................. 59 4.2.1 Decorated and undecorated trees . 65 4.2.2 Semantics ............................. 68 4.3 The Type System ............................. 73 4.3.1 Generalized algebraic data types. 80 4.3.2 Polymorphic attribute access problem . 82 4.3.3 Putting types to work ...................... 85 4.4 Pattern Matching ............................. 87 4.4.1 Behavior .............................. 90 4.4.2 Typing pattern matching expressions . 92 4.4.3 Semantics ............................. 95 4.4.4 Towards semantics without the variable restriction . 98 v Contents 4.4.5 Differences between Ag and Silver . 100 4.5 Algebraic properties of language extension . 101 4.6 Related work ............................... 103 5 Modular well-definedness analysis 105 5.1 Modules .................................. 106 5.2 Modular analysis ............................. 109 5.3 Effective completeness and flow types . 112 5.3.1 Flow types ............................. 114 5.4 Overview of the analysis . 117 5.4.1 Restrictions imposed by the analysis . 118 5.4.2 Reasoning about the presence of inherited equations . 122 5.4.3 Integrating flow and effective completeness analysis . 124 5.5 Flow type inference ............................ 125 5.5.1 The structure of production flow graphs . 126 5.5.2 Flow dependencies of expressions . 133 5.5.3 Computing intermediate flow data . 137 5.5.4 Constructing static production flow graphs . 143 5.5.5 Computing flow types . 147 5.6 Checking for effective completeness . 150 5.6.1 Effective inherited completeness . 151 5.6.2 Implementation notes . 156 5.6.3 Extending to additional language features . 157 5.7 Self-evaluation on Silver . 159 5.8 Extending to circularity . 163 vi Contents 5.8.1 Discussion ............................. 166 5.9 Related work ............................... 168 6 Non-interference 170 6.1 The problem ................................ 173 6.2 Reasoning about attribute grammars . 175 6.2.1 Properties of languages and modular non-interference . 176 6.2.2 Induction on decorated trees . 178 6.2.3 Extending proofs to extended languages . 183 6.3 Coherence ................................. 186 6.3.1 Examples of incoherence . 189 6.3.2 Closure properties of coherence . 191 6.3.3 Restricted propositions and relations (Limitations and scope of this development) . 192 6.3.4 The problem of equality . 196 6.3.5 Non-interference: coherence over extended languages . 199 6.3.6 Coherence assures modular non-interference . 201 6.4 Showing non-interference . 204 6.4.1 An “unreasonable” approach to enforcement . 206 6.4.2 Relationship to macro systems . 211 6.5 Attribute properties: more useful non-interference . 213 6.5.1 Host language adaptation . 213 6.5.2 Weakening the strict equality condition . 215 6.5.3 Closing the world . 217 6.5.4 Summary ............................. 219 vii Contents 6.6 Property testing .............................. 220 6.7 Application to a real compiler . 223 6.8 Related work ............................... 225 6.9 Discussion ................................. 227 6.9.1 Future work ............................ 228 7 AbleC: Synthesis, Platform, and Evaluation 230 7.1 The AbleC compiler . 232 7.2 Permitted classes of language extensions . 234 7.2.1 External DSLs . 235 7.2.2 Improving