Subtyping: Overview and Implementation CS5218: Principles of Program Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Subtyping: Overview and Implementation CS5218: Principles of Program Analysis Chen Jingwen National University of Singapore [email protected] 1 ABSTRACT of a typechecker in a toy programming language with subtyping. Type systems has been a topic of heavy research in programming Finally, we will explore the related work in the eld of subtyping. language theory for decades, with implementations in dierent lan- guages across paradigms. In object oriented languages, class based 3 SUBTYPING type systems describe a hierarchy between objects using a concept 3.1 Overview known as subtyping. Subtyping describes relations between types, allowing terms of a type to be used safely in places where terms Traditionally, type systems rigidly restrict the usage of typed terms of another type is expected. This increases the exibility of exist- in locations based on the idea of type equality. This results in ing type systems by increasing expressiveness while maintaining programs where it is not obvious why typechecking fails, even well-typedness. In this technical report, we will give an overview though it seems natural in both syntax and semantics. For example, of the motivation and formalism of subtyping systems. We will the function application also provide a Haskell implementation of record datatype subtyp- (fn (x: float) => x + 2) (2 :: int) ing in a functional toy language to demonstrate a typechecking algorithm. Lastly, we will explore related work in the eld in the seems that it should be well-typed as the + operation permits addi- decades following its conception. tions with operands of both int and float types, but since they are not equal under classical type systems, the expression is ill-typed. 2 INTRODUCTION Subtyping is the formalisation of relations between types with the goal of allowing terms of a type to be used in locations where Type systems are a central feature of programming languages and terms of other types are expected. For example, we want to allow they come in dierent manifestations, stemming from decades of operations that are dened on reals, R, to work with integers, Z, research in programming language theory. Despite implementation by forming a subtype relation between Z and R. Pierce described dierences, the one thing in common is to prevent entire classes of this to be the principle of safe substitution in his book, Types and erroneous program behaviours from happening. Programming Languages [26], where terms of one type can be safely Cardelli’s 1996 paper, Type Systems [8], provides a high level substituted in place of a term of another type without incurring overview on the subject, ranging from rst order type systems (a runtime type errors. This is also known as Liskov’s substitution la Simply Typed Lambda Calculus) to System F<:. A collection of principle, where safety is asserted on behavioural properties in type system avours is summarised in the following list: subtype relations [20]. (1) First order (System F1): Pascal, Algol68 As a core concept utilised in object-oriented (OO) languages (2) Second order (System F2): ML, Haskell, Modula-3 for the hierarchical organization of objects, subtyping is often im- (3) Object-oriented (OO), class-based: Java, C++ plemented in the form of subclassing. Classes, the templates that (4) Dependent types: Idris, Agda objects are instantiated from, are ordered hierarchically in the form (5) Dynamic: Smalltalk, Ruby of superclasses and subclasses. In “A Semantics of Multiple Inheritance”, Cardelli analysed pop- Zooming in on typed object-oriented systems, the dominating ular OO languages (Simula, Lisp & Smalltalk) in their representa- feature is the type hierarchy via subclasses, which describes a rela- tions of objects [5], and decided to focus on representing objects tion of attribute and behaviour inheritance between objects. How- as records, which is essentially the Cartesian product type with la- ever, the variety of implementations and optimizations of subclass- bels. The main advantage of using the concept of records (with ing systems makes understanding and formalising this relation a functional components) was for simplicity, because it is possible to dicult endeavour. Subtyping is a popular approach to formalise statically determine the elds of a record at compile-time, hence such relations, and it was a concept introduced by Cardelli with his removing any need for dynamical analysis at runtime for invalid seminal paper “A Semantics of Multiple Inheritance” [5]. eld accesses. He then presented the grammar and semantics of a The purpose of this report is to present a notion of typing and static, strongly typed functional language with multiple inheritance, subtyping, an exploration on the evolution of research around and record and variant subtyping. He also provided sound type subtyping in the past decades, and an implementation of a static inference and type checking algorithms on the semantics. typechecker for a language with a subtyping system. In the next In a later paper, Type Systems [8], he described subtyping as a section, we will describe what subtyping is and the benets that it form of set inclusion, where terms of a type can be understood as el- brings to a type system. Next, we will formalize the inference rules ements belonging to a set, and the subtyping relation is understood for a language with subtyping, alongside a concrete implementation as the subset relation between these sets of elements. 3.2 Implementation of FunSub 3.2.4 Records. Records are data structures in the form of nite Before diving into an overview of the fundamental subtyping con- and unordered associations from labels to values, with each pair cepts, we will rst describe a Haskell implementation of a toy lan- described as a eld. To extract a value from a eld, we use the guage, FunSub, and a static typechecker for checking well-typedness projection notation e:a where a is the label of a eld in the record in the presence of subtypes. e. The purpose of this toy language is to provide a concrete real- The purpose of having records in our toy language is for demon- isation of the formal denitions that we will explore in the later strating subtyping relations on a composite data structure, unlike sections. We will attach snippets of implementations alongside the Int and Bool. denitions where applicable. 3.2.5 Explicit types. To simplify implementation, we chose to FunSub Fun is a superset of [10], which is in turn an implementa- use an explicit typing notation (expr :: type) to assert the types tion of Church’s Simply Typed Lambda Calculus [13] that includes for functions and records, in place of a type inference algorithm. the record datatype and a modied style of explicit typing (also known as ascriptions). 3.2.6 Example. The following is a valid expression in the FunSub The source code for FunSub is available online [12]. syntax: 3.2.1 Usage. Assuming Haskell is installed and the user is in (fn x :: ({ a: Int, b: Int } -> { a: Int }) => x) the project directory, running the following command will invoke { a = 2, b = 2, c = true } :: { a: Int, b: Int, c: Bool } the typechecker: In type systems without subtyping, this function application will not typecheck because the record argument type does not match runhaskell Main.hs examples/typed_expressions.fun the parameter type, and the parameter type does not match the This produces an output similar to the following: function body type even though it is an identity function. However .. in our subtyping implementation, we are able to use concepts such [Expression]: (fn x :: (Int -> Int) => 2) as variance (§3.5), width and depth record subtyping (§3.4) to make [Typecheck][OK]: (Int -> Int) this expression well-typed. 3.2.7 Implementation: AST. The AST is a direct translation of [Expression]: (fn x :: (Int -> Int) => true) the syntax dened in §3.2.3. [Typecheck][FAIL]: Type mismatch: expected Int, got Bool .. -- Syntax.hs data Ty = IntTy 3.2.2 Architecture. The components of the language comprises | BoolTy of a REPL/reader, lexer, parser and typechecker. Some examples of | ArrowTy Ty Ty FunSub expressions are stored in the examples/ folder. | RcdTy [(String, Ty)] Main.hs is the entry point that takes in the lename for a le deriving (Show, Eq) containing FunSub expressions. The lexer (Lexer.hs) denes re- served tokens and lexemes, and the parser (Parser.hs) uses Parsec data Expr = I Int Ty on the tokens to generate an abstract syntax tree (AST) dened | B Bool Ty in Syntax.hs. Lastly, the AST is checked for well-typedness with | Var String rules dened in the typechecker (Typecheck.hs). | Fn String Expr Ty | FApp Expr Expr 3.2.3 Syntax. | Rcd [(String, Expr)] Ty c 2 Const constants | RcdProj Expr Expr a; x 2 Ident alphanumeric identiers deriving (Show, Eq) 3.2.8 Implementation: Typechecker. The two main functions of t ::= Int integers the typechecker are typecheck and isSubtype. Their type signa- j Bool booleans tures are dened as follows: -- Typecheck.hs j t1 -> t2 arrows newtype TypeEnv = TypeEnv (Map String Ty) j fa1 : t1;:::; an : tn g records isSubtype :: Ty -> Ty -> Bool e ::= c constant typecheck :: TypeEnv -> Expr -> Either String Ty j x variable isSubtype takes in two types and recursively determines if the j (fn x :: (t) => e) function abstraction rst type is a subtype of the second, using the inference rules dened in §3.3. j e e function application 1 2 typecheck takes in a type environment (a mapping of variables j fa1 = e1;:::; an = en g :: t record to types) and an AST, and recursively determines if the expression j e:a record projection is well-typed. If it is ill-typed, an error message describing the issue 2 is bubbled up and handled in Main.hs.