A Compiler-Compiler for DSL Embedding
Total Page:16
File Type:pdf, Size:1020Kb
A Compiler-Compiler for DSL Embedding Amir Shaikhha Vojin Jovanovic Christoph Koch EPFL, Switzerland Oracle Labs EPFL, Switzerland {amir.shaikhha}@epfl.ch {vojin.jovanovic}@oracle.com {christoph.koch}@epfl.ch Abstract (EDSLs) [14] in the Scala programming language. DSL devel- In this paper, we present a framework to generate compil- opers define a DSL as a normal library in Scala. This plain ers for embedded domain-specific languages (EDSLs). This Scala implementation can be used for debugging purposes framework provides facilities to automatically generate the without worrying about the performance aspects (handled boilerplate code required for building DSL compilers on top separately by the DSL compiler). of extensible optimizing compilers. We evaluate the practi- Alchemy provides a customizable set of annotations for cality of our framework by demonstrating several use-cases encoding the domain knowledge in the optimizing compila- successfully built with it. tion frameworks. A DSL developer annotates the DSL library, from which Alchemy generates a DSL compiler that is built CCS Concepts • Software and its engineering → Soft- on top of an extensible optimizing compiler. As opposed to ware performance; Compilers; the existing compiler-compilers and language workbenches, Alchemy does not need a new meta-language for defining Keywords Domain-Specific Languages, Compiler-Compiler, a DSL; instead, Alchemy uses the reflection capabilities of Language Embedding Scala to treat the plain Scala code of the DSL library as the language specification. 1 Introduction A compiler expert can customize the behavior of the pre- Everything that happens once can never happen defined set of annotations based on the features provided by again. But everything that happens twice will a particular optimizing compiler. Furthermore, the compiler surely happen a third time. expert can extend the set of annotations with additional ones – Paulo Coelho, The Alchemist for encoding various domain knowledge in an optimizing Domain-specific languages (DSLs) have gained enormous compiler. success in providing productivity and performance simulta- This paper is organized as follows. In Section 2 we review neously. The former is achieved through their concise syntax, the background material and related work. Then, in Section 3 while the latter is achieved by using specialization and com- we give a high-level overview of the Alchemy framework. In pilation techniques. These two significantly improve DSL Section 4 we present the process of generating a DSL com- users’ programming experience. piler in more detail. Section 5 presents several use cases built Building DSL compilers is a time-consuming and tedious with the Alchemy framework. Finally, Section 6 concludes. task requiring much boilerplate code related to non-creative aspects of building a compiler, such as the definition of inter- 2 Background & Related Work mediate representation (IR) nodes and repetitive transforma- In this section, we present the background and related work tions [2]. There are many extensible optimizing compilers to better understand the design decisions behind Alchemy. to help DSL developers by providing the required infrastruc- ture for building compiler-based DSLs. However, the existing 2.1 Compiler-Compiler arXiv:1808.01344v1 [cs.PL] 3 Aug 2018 optimizing compilation frameworks suffer from a steep learn- A compiler-compiler (or a meta compiler) is a program that ing curve, which hinders their adoption by DSL developers generates a compiler from the specification of a program- who lack compiler expertise. In addition, if the API of the ming language. This specification is usually expressed ina underlying extensible optimizing compiler changes, the DSL declarative language, called a meta-language. developer would need to globally refactor the code base of Yacc [17] is a compiler-compiler for generating parsers the DSL compiler. specified using a declarative language. There are numerous The key contribution of this paper is to use a generative systems for defining new languages, referred to as language approach to help DSL developers with the process of build- workbenches [9, 12], such as Stratego/Spoofax [19], Sug- ing a DSL compiler. Instead of asking the DSL developer to arJ [7], Sugar* [8], KHEPERA [10], and MPS [16]. provide the boilerplate code snippets required for building a DSL compiler, we present a framework which automatically generates them. 2.2 Domain-Specific Languages More specifically, we present Alchemy, a language work- DSLs are programming languages tailored for a specific do- bench [9, 12] for generating compilers for embedded DSLs main. There are many successful examples of systems using DSLs in various domains such as SQL in database manage- DSL Developer Compiler Expert ment, Spiral [28] for generating digital signal processing Kiama LMS SC Kiama LMS SC kernels, and Halide [29] for image processing. The software Plugin Plugin Plugin Annotated Alchemy development process can also be improved by using DSLs, Shallow IR Nodes Deep EDSL Expression EDSL referred to as language-oriented programming [40]. Cade- Lifting lion [23] is a language workbench developed for language- Annotations oriented programming. Interpreter Compiler-Compiler Compiler There are two kinds of DSLs: 1) external DSLs which have a stand-alone compiler, and 2) embedded DSLs [14] (EDSLs) Figure 1. Overall design of Alchemy. which are embedded in another generic-purpose program- ming language, called a host language. Various EDSLs have been successfully implemented in dif- EDSL using a particular extensible optimizing compiler. In ferent host languages, such as Haskell [1, 14, 24] or Scala [21, other words, Alchemy converts an interpreter for a language 25, 30, 33]. The main advantage of EDSLs is reusing the ex- (a shallow EDSL) to a compiler (a deep EDSL). isting infrastructure of the host language, such as the parser, Truffle [15] provides a DSL for defining self-optimizing the type checker, and the IDEs among others. AST interpreters, using the Graal [41] optimizing compiler as There are two ways to define an EDSL. The first approach the backend. This system mainly focuses on providing just- is by defining it as a plain library in the host language, re- in-time compilation for dynamically typed languages such ferred to as shallowly embedding it in the host language. as JavaScript and R, by annotating AST nodes. In contrast, A shallow EDSL is reusing both the frontend and backend Alchemy uses annotation on the library itself and generates components of the host language compiler. However, the the AST nodes based on strategy defined by the compiler opportunities for domain-specific optimizations are left un- expert. exploited. In other words, the library-based implementation Forge [38] is an embedded DSL in Scala for specifying of the EDSL in the host language is served an interpreter. other DSLs. Forge is used by the Delite [21] and LMS [30, 31] The second approach is deeply embedding the DSL in the compilation frameworks. This approach requires DSL devel- host language. A deep EDSL is only using the frontend of the opers to learn a new specification language before imple- host language, and requires the DSL developer to implement menting DSLs. In contrast, Alchemy developers write a DSL a backend for the EDSL. This way, the DSL developer can specification using plain Scala code. Then, domain-specific leverage domain-specific opportunities for optimizations knowledge is encoded using simple Alchemy annotations. and can leverage different target backends through code Yin-Yang [18] uses Scala macros for automatically convert- generation. ing shallow EDSLs to the corresponding deep EDSLs. Thus, it completely removes the need for providing the definition 2.3 Extensible Optimizing Compilers of a deep DSL library from the DSL developer. However, con- There are many extensible optimizing compilers which pro- trary to our work, the compiler-compilation of Yin-Yang is vide facilities for defining optimizations and code genera- specific to the LMS30 [ ] compilation framework. Also, Yin- tion for new languages. Such frameworks can significantly Yang does not generate any code related to optimizations of simplify the development of the backend component of the the DSL library. We have identified the task of automatically compiler for a new programming language. generating the optimizations to be not only a crucial require- Stratego/Spoofax [19] uses strategy-based term-rewrite ment for DSL developers but also one that is significantly systems for defining domain-specific optimizations for DSLs. more complicated than the one handled by Yin-Yang. Stratego uses an approach similar to quasi-quotation [39] to hide the expression terms from the user. For the same pur- 3 Overview pose, Alchemy uses annotations for specifying a subset of Figure 1 shows the overall design of the Alchemy framework. optimizations specified by the compiler expert. One canuse Alchemy is implemented as a compiler plugin for the Scala quasi-quotes [26, 34] for implementing domain-specific opti- programming language.1 After parsing and type checking mizations in concrete syntax (rather than abstract syntax) the library-based implementation of an EDSL, Alchemy uses similar to Stratego. the type-checked Scala AST to generate an appropriate DSL compiler. The generated DSL compiler follows the API pro- 2.4 What is Alchemy? vided by an extensible optimizing compiler to implement Alchemy is a compiler-compiler, designed for EDSLs that transformations