Developing a Monadic Type Checker for an Object-Oriented Language: an Experience Report

Developing a Monadic Type Checker for an Object-Oriented Language: An Experience Report Elias Castegren Kiko Fernandez-Reyes Software and Computer Systems Information Technology KTH Royal Institute of Technology Uppsala University Sweden Sweden [email protected] [email protected] Abstract 1 Introduction Functional programming languages are well-suited for de- Compilers for functional languages are often written in veloping compilers, and compilers for functional languages functional languages themselves. For example, the Haskell, are often themselves written in a functional language. Func- OCaml, and F# compilers are written in their own respec- tional abstractions, such as monads, allow abstracting away tive language [1, 2, 5]. The recursive nature of functional some of the repetitive structure of a compiler, removing programming, as well as constructs such as algebraic data boilerplate code and making extensions simpler. Even so, types and pattern matching, lends itself well to traverse and functional languages are rarely used to implement compilers manipulate abstract syntax trees, which is a large part of the for languages of other paradigms. task of a compiler. Functional programming languages pro- This paper reports on the experience of a four-year long vide abstractions, such as type classes or ML style modules, project where we developed a compiler for a concurrent, which allows writing concise and extensible code. object-oriented language using the functional language Haskell. However, programmers are creatures of habit. Therefore, The focus of the paper is the implementation of the type compilers for imperative or object-oriented languages tend checker, but the design works well in static analysis tools, to be written in languages of these paradigms, e.g., the Java such as tracking uniqueness of variables to ensure data-race compiler and the Scala compiler are written in their respec- freedom. The paper starts from a simple type checker to tive languages, and the clang compiler framework for C and which we add more complex features, such as type state, C++ is written in C++ [3, 4, 6]. This habit prevents compiler with minimal changes to the overall initial design. writers from using attractive features from other languages. For example, a programmer developing a language tool in C CCS Concepts • Software and its engineering → Func- will miss out on features like pattern matching and useful tional languages; Compilers; Object oriented languages; • type system features that are not available in C.1 Theory of computation → Type theory. In this paper, we report on our experience using Haskell Keywords functional programming, object-oriented lan- to develop the compiler for the object-oriented language guages, type systems, compilers Encore [10]. The full compiler consists of ≈15,000 lines of Haskell which compiles Encore to C. To make the presenta- ACM Reference Format: tion fit in a paper, we use a subset of the language andfocus Elias Castegren and Kiko Fernandez-Reyes. 2019. Developing a on the implementation of the type checker (≈7,000 lines in Monadic Type Checker for an Object-Oriented Language: An Expe- the full compiler). We start from a simple type checker and rience Report. In Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering (SLE ’19), October gradually refactor and extend it to make it more robust and 20–22, 2019, Athens, Greece. ACM, New York, NY, USA, 13 pages. feature rich. The monadic structure of the code allows these https://doi.org/10.1145/3357766.3359545 extensions with few or no changes to the original code. The language started as a research project, but has since Permission to make digital or hard copies of all or part of this work for evolved to a full language with support for parametric poly- personal or classroom use is granted without fee provided that copies morphism, subtyping via traits [31], concurrency, and a type are not made or distributed for profit or commercial advantage and that system that prevents data-races between threads sharing copies bear this notice and the full citation on the first page. Copyrights state [12]. This paper reports on our experience creating a for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or compiler for a research language, focusing on the features republish, to post on servers or to redistribute to lists, requires prior specific and robustness of the type checker. The techniques used are permission and/or a fee. Request permissions from [email protected]. not novel on their own, but we have not seen them used in SLE ’19, October 20–22, 2019, Athens, Greece this combination for an object-oriented language. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. 1The same argument could be made in the other direction, since functional ACM ISBN 978-1-4503-6981-7/19/10...$15.00 languages for example typically do not offer control over low-level details https://doi.org/10.1145/3357766.3359545 like memory layout. SLE ’19, October 20–22, 2019, Athens, Greece Elias Castegren and Kiko Fernandez-Reyes Audience Our hope is that this paper can serve as a source Expressions are mostly standard for an object-oriented of inspiration for language engineering researchers and com- language. For simplicity we use a let expression for intro- piler writers who want to use functional languages to de- ducing names and handling scopes (in the Encore compiler, velop compilers and tools for object-oriented languages. We an earlier phase translates an imperative style of variable expect the audience to be familiar with basic Haskell nota- declarations into let form). Sequencing of statements can tion, the standard monads, and basic extensions. be emulated by binding the result of the statement to an unused variable: “s1; s2” =) “let _ = s1 in s2”; or by Contributions using a “sequence expression” containing a list of expres- • An explanation of the initial design and implementa- sions, only the last of which is used for its value (this is tion of a type checker for an object-oriented language, the approach taken in the Encore compiler). An anonymous ¹ º using common functional programming constructs function λ x : τ :e (Lambda in Figure1) takes zero or more (Sections2–3). parameters. Note the difference between method calls, which • Descriptions of how to extend the type checker to call a method on a specified target, and function calls, which handle backtraces, reporting warnings, and throwing call a value of function type. multiple errors (Sections4–6). The task of the type checker is twofold: to ensure that the • A description of the use of phantom types to do type- AST is well-formed, and to decorate each expression node level programming, which ensures that the type checker with its type. Every expression in the AST data type has a indeed type checks the entire program (Section7). field etype, which has value Nothing after parsing and which • Explanation on how to add parametric polymorphism, will be assigned a type during the type checking phase (this subtyping via traits, and tracking of uniqueness of part of the design will be improved in Section7). Additionally, variables with no changes to the monadic design (Sec- the type checker uses an environment, storing a map of all tions8– 10). the classes of a program and all variables currently in scope. Since we only allow assigning to immutable val fields if The main point of the work presented here is the ease of we are currently in a constructor method, we also store a extension from the original type checker with new compiler boolean flag to track this. functionality, and how easily one can add object-oriented features, such as subtyping via traits. data Env = Env {ctable :: Map Name ClassDef ,vartable :: Map Name Type ,constructor :: Bool} 2 A Small Object-Oriented Language and its Typechecker We define a type class [37] Typecheckable with a single We define our object-oriented language below, and the en- function typecheck which takes such an environment and an coding of the abstract syntax tree (AST) using algebraic data AST node, and returns either the decorated node, or an error types in Figure1. value (TCError) representing one of the ways type checking can fail. To avoid having to check if each computation fails or not, we use the exception monad [35]: Classes L ::= class CfQ f : τ ; Mg Qualifiers Q ::= var j val data TCError = TypeMismatchError Type Type Methods M ::= def m¹x : τ º : τ f return e g | UnknownClassError Name | NonArrowTypeError Type Binary operators B ::= + j − j ∗ j / | NonClassTypeError Type Expressions e ::= x j let x = e in e j e:m¹eº j e:f | NonArrowTypeError Type | UninferrableError Expr j e B e j e¹eº j e = e j new C¹eº | .. j if e then e else e j e : τ j v class Typecheckable a where v ::= true j false j n j λ¹x : τ º:e j null typecheck :: Env ! a ! Except TCError a j ! j int j bool j unit Types τ ::= C τ τ Figure2 shows instances of Typecheckable for programs, classes, fields, types and expressions (for brevity, we show an A program consists of a list of class definitions. A class excerpt). Type checking a program corresponds to checking definition contains a list of field and method definitions. A its classes; type checking a class corresponds to extending the field definition has a name, a type, and a mutability modifier environment to include the variable this (Line7), and check- (val or var). A method definition has a name, a list of param- ing its fields and methods (Lines8–9); type checking a type eters, a return type, and a method body.

Load more