Hackett Alistairfinn.Pdf (996.6Kb)
Total Page:16
File Type:pdf, Size:1020Kb
TreeGen: a monotonically impure functional language by Alistair Finn Hackett A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Masters in Mathematics in Computer Science Waterloo, Ontario, Canada, 2020 © Alistair Finn Hackett 2020 Author's Declaration I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. ii Abstract We present TreeGen, an impure functional language designed to express, consume, and validate JSON-like documents, as well as generate text files. The language aims to provide a more reliable and flexible way to create customised Interface Definition Languages, since the current state of the art is implemented via monolithic, ad-hoc codebases, which cannot easily be modified. TreeGen's principal contribution, aside from being tailored to the domain of manip- ulating documents and generating text, is the concept of monotonic mutability: despite being an impure scripting language, its execution remains deterministic under arbitrary reordering of operations, making it robust to many common classes of programmer error possible in languages that allow unchecked mutability. We prove this by basing TreeGen's unordered constraint-based formal semantics on a partially-ordered model of TreeGen's heap, then showing that the execution of any TreeGen expression's constraint set is deter- ministic under chaotic iteration. We also give notes on our experience implementing the language. These notes include a model for execution tracing and error reporting, necessary data structures to practically implement the formal semantics, related performance issues, and comments on potential mitigations of those performance issues. iii Acknowledgements I would like to thank OndˇrejLhot´akfor giving patient criticism and suggestions on the many previous drafts of this thesis. While I am reasonably comfortable as an implementer and software architect, I am relatively new to developing and writing up formal correctness arguments, especially at this scale. This has taught me a lot, both things that I can do now that I could not before, and skills that I now know I want to work on going forward. I would like to thank Jo Atlee for her insightful advice on how to navigate some of the administrative aspects of this project, including the suggestion to go write the Mitacs Accelerate proposal that started it all. Without that push, this thesis as it is now would not exist. I would like to thank PDFTron Systems Inc. for partially funding this project as part of a Mitacs Accelerate grant, as well as for playing along with my opportunistic reply to a recruitment e-mail and providing this project's motivating design problem. I would like to thank my friends and family for listening to my ramblings as I work through things. iv Dedication To Kira, Kaddy and Martin. v Table of Contents List of Figures xi List of Listings xiv 1 Introduction1 1.1 Hello World, an illustrative example......................4 1.2 Thesis outline..................................6 2 Motivation and Related Work8 2.1 API binding generators.............................8 2.1.1 Automatic or semi-automatic tools..................8 2.1.2 Existing interface definition languages................9 2.1.3 Common language runtimes......................9 2.1.4 Language-specific tools......................... 10 2.1.5 Declarative domain-specific languages................. 10 2.2 Templating engines............................... 10 2.2.1 Plain text templating.......................... 11 2.2.2 XML querying and transformation.................. 11 2.3 Scripting languages and mutability...................... 13 2.3.1 Immutability.............................. 15 2.3.2 Monotonic mutability.......................... 16 vi 3 Language tutorial 18 3.1 Documents................................... 19 3.1.1 Essential shapes............................. 20 3.1.2 Essential relations............................ 20 3.1.3 Implicit value deduplication...................... 22 3.2 The merge operation.............................. 24 3.3 Essential syntax and semantics by example.................. 25 3.3.1 Unknown syntax............................ 25 3.3.2 Object syntax.............................. 25 3.3.3 Primitive expressions and operations................. 36 3.3.4 Operations on unknown values..................... 39 3.4 Function values................................. 42 3.4.1 Function deduplication via closed-over values............. 44 3.4.2 Calling a function............................ 45 3.4.3 Non-function values can be called................... 47 3.4.4 Passing an object expression to a function.............. 49 3.4.5 Recursion over graph loops can terminate.............. 50 3.5 Type tags and type directives......................... 52 3.6 Pattern semantics................................ 54 3.6.1 Let directives.............................. 57 3.6.2 Patterns in type directives....................... 58 3.6.3 Partial functions with patterns.................... 60 3.6.4 Match expressions........................... 61 3.7 List definitions and syntax........................... 63 3.7.1 Syntactic sugar............................. 64 3.7.2 How to write a generic list-of type................... 65 3.8 A practical code generation example..................... 66 vii 3.8.1 Simple schema for classes and methods................ 67 3.8.2 Some sample definitions........................ 68 3.8.3 Main file and out-of-order cross-references.............. 70 3.8.4 Generating Java code.......................... 71 4 Formal semantics 76 4.1 Shapes and identifiers.............................. 76 4.2 Environments.................................. 78 4.2.1 Shape relation.............................. 78 4.2.2 Field relation.............................. 79 4.2.3 Calls relation.............................. 79 4.2.4 Call fingerprints............................. 79 4.2.5 Rewrite history............................. 80 4.2.6 Environment accessor notation..................... 81 4.2.7 Shape invariants............................ 81 4.3 Environment congruence............................ 83 4.4 Environment sequencing............................ 84 4.4.1 Shape sequencing............................ 85 4.4.2 Identifier sequencing.......................... 86 4.4.3 Environment sequencing defines a partial order........... 86 4.5 Core language grammar............................ 91 4.6 Operational semantics............................. 93 4.6.1 Program state.............................. 98 4.6.2 Translation............................... 99 4.6.3 The step function family........................ 105 4.6.4 Constraints............................... 107 4.6.5 Deferrals................................. 118 4.7 File output overlay............................... 126 viii 5 Reducing TreeGen to core TreeGen 129 5.1 Full TreeGen grammar............................. 130 5.1.1 Grammatical and lexical refinements................. 131 5.2 Eliminating list syntax............................. 134 5.3 Eliminating patterns.............................. 135 5.3.1 Stage 1: scoping............................. 136 5.3.2 Pattern scoping rules.......................... 137 5.3.3 Stage 2: individual pattern semantics................. 141 5.4 Eliminating match expressions......................... 147 5.5 Eliminating non-core function features.................... 148 5.5.1 Function calls.............................. 148 5.5.2 Functions with patterns........................ 149 5.5.3 Match functions............................. 149 5.6 Eliminating non-core directives........................ 150 5.6.1 Pattern assignments.......................... 150 5.6.2 Type declarations............................ 150 5.6.3 Assert and effect directives....................... 153 5.7 Eliminating assertions............................. 153 5.8 Eliminating n-ary effect expressions...................... 154 5.9 Eliminating dollar names............................ 154 5.10 Eliminating let directives............................ 156 6 Implementation strategy 157 6.1 Substitution via union-find........................... 157 6.1.1 Union-find is not sufficient in general................. 159 6.2 Necessary substitutions and additional data structures........... 159 6.2.1 Fields.................................. 160 6.2.2 Function calls.............................. 160 ix 6.2.3 Object and function uniqueness.................... 161 6.3 Performance effects of execution ordering................... 168 6.3.1 Merging large objects.......................... 169 6.3.2 Interaction between function calls and the object trie........ 171 6.4 Execution tracing................................ 175 6.4.1 Object sub-expressions, bindings, and simple paths......... 175 6.4.2 Patterns and cases........................... 177 6.4.3 Starvation................................ 179 6.5 Implementation-related conclusions...................... 180 7 Conclusion and Future Work 182 7.1 Future work................................... 183 References 184 x List of Figures 1.1 Abstract architecture of a code generator...................2 1.2 Abstract architecture of TreeGen as a code generator............3 3.1 An illustration of the three kinds of node a document may contain..... 19 3.2 An illustration of values with the three essential kinds of shape....... 20 3.3 An illustration of the