Compiling a Higher-Order Smart Contract Language to LLVM
Total Page:16
File Type:pdf, Size:1020Kb
Compiling a Higher-Order Smart Contract Language to LLVM Vaivaswatha Nagaraj Jacob Johannsen Anton Trunov Zilliqa Research Zilliqa Research Zilliqa Research [email protected] [email protected] [email protected] George Pîrlea Amrit Kumar Ilya Sergey Zilliqa Research Zilliqa Research Yale-NUS College [email protected] [email protected] National University of Singapore [email protected] Abstract +----------------------+ Scilla is a higher-order polymorphic typed intermediate | Blockchain Smart | | Contract Module | level language for implementing smart contracts. In this talk, | in C++ (BC) | +----------------------+ we describe a Scilla compiler targeting LLVM, with a focus + state variable | + ^ on mapping Scilla types, values, and its functional language foo.scilla | | | & message | fetch| | constructs to LLVM-IR. | | |update v v | The compiled LLVM-IR, when executed with LLVM’s JIT +--------------------------------------+---------------------------------+ framework, achieves a speedup of about 10x over the refer- | | | +-------------+ +----------------+ | ence interpreter on a typical Scilla contract. This reduced | +-----------------> |JIT Driver | +--> | Scilla Run-time| | | | |in C++ (JITD)| | Library in C++ | | latency is crucial in the setting of blockchains, where smart | | +-+-------+---+ | (SRTL) | | | | | ^ +----------------+ | contracts are executed as parts of transactions, to achieve | | | | | | | foo.scilla| | | peak transactions processed per second. Experiments on the | | | foo.ll| | | | | | | Ackermann function achieved a speedup of more than 45x. | | v | | | | +--+-------+----+ | This talk is aimed at both programming language researchers | | |Scilla Compiler| | | | |in OCaml (SC) | | looking to implement an LLVM based compiler for their func- | | +---------------+ | | | | tional language, as well as at LLVM practitioners. | | | | | | | | Scilla Virtual Machine | | v in OCaml & C++ (SVM) | | +-----+-------+ | | | JIT Cache | | 1 Introduction | +-------------+ | | | Scilla (Smart Contract Intermediate Level Language) [16] +------------------------------------------------------------------------+ is a functional programming language in the ML family, designed to enable writing safe smart contracts. Scilla is the Fig. 1. Design of the Scilla Virtual Machine main smart contract language of the Zilliqa blockchain [18]. Contracts written in Scilla are deployed as-is (i.e., as source code) and are interpreted during transaction validation by vision on some desired LLVM features which would facilitate the nodes participating in the blockchain consensus (aka projects similar to our own (Sec. 5). miners). The project is under active development and is free and open-source [12–14]. 2 The Compiler Pipeline arXiv:2008.05555v1 [cs.PL] 12 Aug 2020 The Scilla Virtual Machine (SVM) (Fig. 1) aims to eventu- Scilla’s reference interpreter and the accompanying type- ally replace the Scilla interpreter with faster execution, thus checker are written in OCaml. To take advantage of the achieving higher transaction throughput. This project com- developed infrastructure, we decided to write the compiler prises of the Scilla compiler (SC) which translates Scilla also in OCaml, thus enabling re-using much of the frontend to LLVM-IR, the Scilla runtime library (SRTL) and the JIT code and the type-checker. The compiler uses LLVM’s OCaml driver which JIT compiles the LLVM-IR and executes it. Ex- bindings to generate LLVM-IR. ecution of a Scilla contract may result in accessing the Serving as a background for the next section, we now blockchain state, with such interactions facilitated by the briefly discuss our compiler pipeline. While in this proposal runtime library. The runtime library also implements Scilla we use textual description, our talk will make use of examples built-in operations. to illustrate the transformations performed in each pass. In the course of this talk, we will briefly describe the phases of our compiler pipeline (Sec. 2), elaborate on mapping the 2.1 Parsing and Type Checking Scilla AST to LLVM-IR (Sec. 3), present characteristic bench- The parser and type checker reuse code from the reference marks used to evaluate our compiler (Sec. 4), and share our interpreter and the result is an AST with each identifier 1 Vaivaswatha Nagaraj, Jacob Johannsen, Anton Trunov, George Pîrlea, Amrit Kumar, and Ilya Sergey 1 fun (p: List(Option Int32)) ) or monomorphization (C++, Rust). The former erases all 2 match p with type information in generic code after type-checking and 3 | Nil ) z generates a single copy of the code. This requires that run- 4 | Cons(Somex) xs ) x time values be boxed (i.e., values of all types represented 5 | Cons__ ) z uniformly [8]) so that they can all be handled uniformly in 6 end the generic code. On the other hand monomorphization spe- cializes the code for each possible (at runtime) ground type Fig. 2. Nested pattern in a match-expression. (i.e., types that don’t have a type parameter) [17], creating annotated with its type. The type annotations are necessary copies of the code. While this results in code replication, for code generation later on. each copy can be efficiently compiled because the runtime values no longer need to be boxed [3]. 2.2 Dead Code Elimination A third approach to parametric polymorphism is to wait Scilla follows a whole-program compilation model. This until polymorphic code is used, and specialize it then (typi- means that along with the program currently being compiled, cally using JIT compilation). This approach is used in C# [5] we also compile imported libraries. While the DCE pass is and Ada [6]. useful in the traditional sense, our primary goal for it here Scilla is based on System F [4, 15]— an expressive type is to simply eliminate unused imported library functions, to system with explicit type bindings and instantiations. We im- improve compile time. plement parametric polymorphism by specializing each poly- morphic Scilla expression with all possible ground types. 2.3 Pattern Match Flattening This uses a higher-order data-flow analysis to track flow of Scilla allows for nested patterns in a match expression. For types into polymorphic expressions [9]. As type instantiation example, the expression shown in Fig. 2 requires checking for may happen in stages for Scilla expressions with several a Some constructor inside an object created with the Cons type parameters, we choose to compile a polymorphic expres- constructor. To efficiently translate pattern match constructs sion as a dynamic dispatch table, associating ground types into an LLVM switch statement, match expressions with with the corresponding monomorphic version of the expres- nested patterns are flattened into nested match expressions sion. This provides for a simpler compilation that saves us with flat patterns [10]. from replicating code outside of the polymorphic expression, but at the cost of a runtime indirection indexing into the (pos- 2.4 Uncurrying sibly nested) dynamic dispatch tables. While this is a mix of As is common in functional programming languages, Scilla existing strategies to implement parametric polymorphism, functions and their applications (calls) follow curried seman- we are not aware of one that uses this exact combination. tics [2]. This means that a function taking n arguments is actually a function that takes in one argument and returns 2.6 Closure Conversion a function to process the remaining n − 1 arguments and As is typical of higher-order languages, Scilla allows one produce the result. The uncurrying pass transforms the AST to pass functions as arguments to other functions and re- to have regular C-like (and LLVM-IR) function call seman- turn them, as well as to define functions nested inside other tics, i.e., functions may take n ≥ 0 arguments, and their calls functions. This means a function may have not only its argu- will provide exactly n arguments. The change in semantics ments available for use inside, but also a set of “free variables” is accompanied by an optimization pass that analyzes all that are defined outside its syntactic definition. calls of a function, and when all of them provide the same n The closure conversion pass [1] computes the set of free number of arguments, the function is translated to a function variables for each function, constructs an aggregate type to taking n arguments. If it cannot prove so, then the function hold all the free variables of the function (conventionally is translated to take just one argument, and a nested function called the function’s environment) and lifts the function handling the rest. An example: If all calls of the function f definition to the top / global level, as is required for LLVM-IR : T1 -> T2 -> T3 provide two arguments (of type T1 and generation (where all functions are defined at the top level). T2), then f is transformed to have the type f : (T1,T2) -> The lifted function takes the environment as an argument. T3. When functions are used as values in the program, each At the time of writing this proposal, our uncurrying pass such variable is now a pair of (1) pointer to the function defi- changes the semantics of the AST to use uncurried semantics, nition and (2) the environment. Function applications (calls) but we do not have the optimization implemented yet. are translated into to calling the function pointer, with the paired environment being passed as an additional argument. 2.5 Monomorphize In our implementation, this pass also flattens the AST by Parametric polymorphism