GOOL: a Generic Object-Oriented Language (Extended Version)
Total Page:16
File Type:pdf, Size:1020Kb
GOOL: A Generic Object-Oriented Language (extended version) Jacques Carette Brooks MacLachlan W. Spencer Smith Department of Computing and Department of Computing and Department of Computing and Software Software Software McMaster University McMaster University McMaster University Hamilton, Ontario, Canada Hamilton, Ontario, Canada Hamilton, Ontario, Canada [email protected] [email protected] [email protected] Abstract it to the others. Of course, this can be made to work — one We present GOOL, a Generic Object-Oriented Language. It could engineer a multi-language compiler (such as gcc) to demonstrates that a language, with the right abstractions, de-compile its Intermediate Representation (IR) into most can capture the essence of object-oriented programs. We of its input languages. The end-results would however be show how GOOL programs can be used to generate human- wildly unidiomatic; roughly the equivalent of a novice in a readable, documented and idiomatic source code in multi- new (spoken) language “translating” word-by-word. ple languages. Moreover, in GOOL, it is possible to express What if, instead, there was a single meta-language de- common programming idioms and patterns, from simple signed to embody the common semantic concepts of a num- library-level functions, to simple tasks (command-line argu- ber of OO languages, encoded so that the necessary informa- ments, list processing, printing), to more complex patterns, tion for translation is present? This source language could such as methods with a mixture of input, output and in-out be agnostic about what eventual target language will be used parameters, and finally Design Patterns (such as Observer, – and free of the idiosyncratic details of any given language. State and Strategy). GOOL is an embedded DSL in Haskell This would be quite the boon for the translator. In fact, we that can generate code in Python, Java, C#, and C++. could go even further, and attempt to teach the translator about idiomatic patterns of each target language. Keywords Code Generation, Domain Specific Language, Trying to capture all the subtleties of each language is Haskell, Documentation hopeless — akin to capturing the rhythm, puns, metaphors, similes, and cultural allusions of a sublime poem in transla- 1 Introduction tion. But programming languages are most often used for Java or C#? At the language level, this is close to a non- much more prosaic tasks: writing programs for getting things question: the two languages are so similar that only issues done. This is closer to translating technical textbooks, mak- external to the programming language itself would be the ing sure that all of the meaningful material is preserved. deciding factor. Unlike say the question “C or Prolog?”, which Is this feasible? In some sense, this is already old hat: is almost non-sensical, as the kinds of applications where modern compilers have a single IR, used to target multiple each is well-suited are vastly different. But, given a single processors. Compilers can generate human-readable sym- paradigm, for example object-oriented (OO), would it be bolic assembly code for a large family of CPUs. But this is not possible to write a unique meta-language that captures the the same as generating human-readable, idiomatic high-level essence of writing OO programs? After all, they generally all code. contain (mutable) variables, statements, conditionals, loops, More precisely, we are interested in capturing the con- arXiv:1911.11824v1 [cs.PL] 26 Nov 2019 methods, classes, objects, and so on. ceptual meaning of OO programs, in such a way as to fully Of course, OO programs written in different languages automate the translation from the “conceptual” to human- appear, at least at the surface, to be quite different. But this readable, idiomatic code, in mainstream languages. is mostly because the syntax of different programming lan- At some level, this is not new. Domain-Specific Languages guages is different. Are they quite so different in the utter- (DSL), are high-level languages with syntax and semantics ances that one can say in them? In other words, are OO tailored to a specific domain [18]. A DSL abstracts over the de- programs akin to sentences in Romance languages (French, tails of “code”, providing notation to specify domain-specific Spanish, Portugese, etc) which, although different at a sur- knowledge in a natural manner. DSL implementations often face level, are structurally very similar? work via translation to a GPL for execution. Some generate This is what we set out to explore. One non-solution is to human-readable code [5, 12, 19, 25]. find an (existing) language and try to automatically translate This is what we do, for the domain of OO programs. We have a set of new requirements: PL’20, , 2020. 1. The generated code should be human-readable, 1 2. The generated code should be idiomatic, look at JVM bytecode, and few C++ programmers at assembly. 3. The generated code should be documented, But GOOL’s aim is different: to allow writing high-level OO 4. The generator expresses common OO patterns. code once, but have it be available in many GPLs. One use Here we demonstrate that all of these requirements can be case would be to generate libraries of utilities for a narrow met. While designing a generic OO language is a worthwhile domain. As needs evolve and language popularity changes, endeavour, we had a second motive: we needed a means to it is useful to have it immediately available in a number do exactly that as part of our Drasil project [23, 24]. The of languages. Another use, which is core to our own moti- idea of Drasil is to generate all the requirements documenta- vation as part of Drasil [23, 24], is to have extremely well tion and code from expert-provided domain knowledge. The documented code, indeed to a level that would be unrealistic generated code needs to be human readable so that experts to do by hand. But this documentation is crucial in domains can certify that it matches their requirements. We largely where certification is required. And readable is a proxy for rewrote SAGA [5] to create GOOL1. GOOL is implemented understandable, which is also quite helpful for debugging. as a DSL embedded in Haskell that can currently generate The same underlying reasons for readable also drive id- code in Python, Java, C#, and C++. Others could be added, iomatic and documented, as they contribute to the human- with the implementation effort being commensurate to the understandability of the generated code. idiomatic is impor- (semantic) distance to the languages already supported. tant as many human readers would find the code “foreign” First we expand on the high-level requirements for such an otherwise, and would not be keen on using it. Note that endeavour, in Section2. To be able to give concrete examples, documentation can span from informal comments meant for we show the syntax of GOOL in Section3. The details of the humans, to formal, structured comments useful for generat- implementations, namely the internal representation and the ing API documentation with tools like Doxygen, or with a family of pretty-printers, is in Section4. Common patterns variety of static analysis tools. Readability (and thus under- are illustrated in Section5. We close with a discussion of standability) are improved when code is pretty-printed [7]. related work in Section6, plans for future improvements in Thus taking care of layout, redundant parentheses, well- Section7, and conclusions in Section8. chosen variable names, using a common style with lines that Note that a short version of this paper [16] will be pub- are not too long, are just as valid for generated code as for lished at PEPM 2020. The text in both version differs many human-written code. GOOL does not prevent users from places (other than just in length), but do not differ in mean- writing undocumented or complex code, if they choose to ing. do so. It just makes it easy to have readable, idiomatic and documented code in multiple languages. 2 Requirements The patterns requirement is typical of DSLs: common programming idioms can be reified into a proper linguistic While we outlined some of our requirements above, here we form instead of being merely informal. Even some of the give a complete list, as well as some reasoning behind each. design patterns of [10] can become part of the language itself. mainstream Generate code in mainstream object-oriented While this does make writing some OO code even easier languages. in GOOL than in GPLs, it also helps keep GOOL language- readable The generated code should be human-readable, agnostic and facilitates generating idiomatic code. Examples idiomatic The generated code should be idiomatic, will be given in Section5. But we can indicate now how this documented The generated code should be documented, helps: Consider Python’s ability to return multiple values patterns The generator should allow one to express with a single return statement, which is uncommon in other common OO patterns. languages. Two choices might be to disallow this feature in expressivity The language should be rich enough to GOOL, or throw an error on use when generating code in lan- express a set of existing OO programs, which act as guages that do not support this feature. In the first case, this test cases for the language. would likely mean unidiomatic Python code, or increased common Language commonalities should be abstracted. complexity in the Python generator to infer that idiom. The Targetting OO languages (mainstream) is primarily be- second option is worse still: one might have to resort to writ- cause of their popularity, which implies the most potential ing language-specific GOOL, obviating the whole reason for users — in much the same way that the makers of Scala and the language! Multiple-value return statements are always Kotlin chose to target the JVM to leverage the Java ecosystem, used when a function returns multiple outputs; what we can and Typescript for Javascript.