The L3 project: the L3 language

Advanced Compiler Construction
Michel Schinz, 2016-02-25

Project overview

What you will get (as the semester progresses):
– parts of an L3 compiler, written in Scala, and
– parts of a virtual machine, written in C.

What you will have to do:
– one non-graded exercise to warm you up,
– complete the compiler,
– complete the virtual machine.

The L3 language

L3 is a Lisp-like language. Its main characteristics are:
– it is “dynamically typed”,
– it is functional:
    – functions are first-class values, and can be nested,
    – there are few side-effects (exceptions: mutable blocks and I/O),
– it automatically frees memory,
– it has six kinds of values: unit, booleans, characters, integers, blocks and functions,
– it is simple but quite powerful.

A taste of L3

An L3 function to compute x^y for x ∈ ℤ, y ∈ ℕ:

    (defrec pow
      (fun (x y)
        (cond ((= 0 y)
               1)
              ((even? y)
               (let ((t (pow x (/ y 2))))
                 (* t t)))
              (#t
               (* x (pow x (- y 1)))))))

The three branches implement the following identities, in order:
    x^0 = 1
    x^(2z) = (x^z)^2
    x^(z+1) = x (x^z)

Literal values

"c1…cn"
    String literal (translated to a block expression, see later).
'c'
    Character literal.
… -2 -1 0 1 2 3 …
    Integer literals (also available in base 16 with the #x prefix, or in base 2 with the #b prefix).
#t #f
    Boolean literals (true and false, respectively).
#u
    Unit literal.

Top-level definitions

(def n e)
    Top-level non-recursive definition. The expression e is evaluated and its value is bound to name n in the rest of the program. The name n is not visible in expression e.
(defrec n f)
    Top-level recursive function definition. The function expression f is evaluated and its value is bound to name n in the rest of the program. The function can be recursive, i.e. the name n is visible in the function expression f.

Local definitions

(let ((n1 e1) …) b1 b2 …)
    Parallel local value definition. The expressions e1, … are evaluated in that order, and their values are then bound to names n1, … in the body b1, b2, … The value of the whole expression is the value of the last bi.
(let* ((n1 e1) …) b1 b2 …)
    Sequential local value definition. Equivalent to a nested sequence of let:
    (let ((n1 e1)) (let (…) …))
(letrec ((n1 f1) …) b1 b2 …)
    Recursive local function definition. The function expressions f1, … are evaluated and bound to names n1, … in the body b1, b2, … The functions can be mutually recursive.

Functions

(fun (n1 …) b1 b2 …)
    Anonymous function with arguments n1, … and body b1, b2, … The return value of the function is the value of the last bi.
(e e1 …)
    Function application. Expressions e, e1, … are evaluated in that order, and then the value of e, which must be a function, is applied to the values of e1, …
    Note: if e is a simple identifier, a special form of name resolution, based on arity, is used (see later).

Conditional expressions

(if e1 e2 e3)
    Two-way conditional. If e1 evaluates to a true value (i.e. anything but #f), e2 is evaluated, otherwise e3 is evaluated. The value of the whole expression is the value of the evaluated branch.
    The else branch is optional and defaults to #u (unit).
(cond (c1 b1,1 b1,2 …) (c2 b2,1 b2,2 …) …)
    N-way conditional. If c1 evaluates to a true value, evaluate b1,1, b1,2, …; else, if c2 evaluates to a true value, evaluate b2,1, b2,2, …; etc. The value of the whole expression is the value of the evaluated branch, or #u if none of the conditions is true.

Logical expressions

(and e1 e2 e3 …)
    Short-cutting conjunction. If e1 evaluates to a true value, proceed with the evaluation of e2, and so on. The value of the whole expression is that of the last evaluated ei.
(or e1 e2 e3 …)
    Short-cutting disjunction. If e1 evaluates to a true value, produce that value. Otherwise, proceed with the evaluation of e2, and so on.
(not e)
    Negation. If e evaluates to a true value, produce the value #f. Otherwise, produce the value #t.

Loops and blocks

(rec n ((n1 e1) …) b1 b2 …)
    General loop. Equivalent to:
    (letrec ((n (fun (n1 …) b1 b2 …)))
      (n e1 …))
(begin b1 b2 …)
    Sequential evaluation. First evaluate expression b1, discarding its value, then b2, etc. The value of the whole expression is the value of the last bi.

Arity-based name lookup

A special name lookup rule is used when analyzing a function application in which the function is a simple name:
    (n e1 e2 … ek)
In such a case, the name n@k (i.e. the name itself, followed by @, followed by the arity in base 10) is looked up first, and used instead of n if it exists. Otherwise, name analysis proceeds as usual.
This allows a kind of overloading based on arity (although it is not overloading per se).

Arity-based name lookup can for example be used to define several functions to create lists of different lengths:
    (def list-make@1 (fun (e1) …))
    (def list-make@2 (fun (e1 e2) …))
and so on for list-make@3, list-make@4, etc.
With these definitions, the following two function applications are both valid:
1. (list-make 1) (invokes list-make@1),
2. (list-make 1 (+ 2 3)) (invokes list-make@2).
However, the following one is not valid, unless a definition for the bare name list-make also appears in scope:
    (map list-make l)

Values

L3 integers are represented using 31 (!) bits, in two's complement, and therefore range from -2^30 to 2^30 - 1.
L3 characters represent Unicode code points, i.e. 21-bit non-negative integers in one of the following ranges:
– from 0000₁₆ to D7FF₁₆, or
– from E000₁₆ to 10FFFF₁₆.
(Notice that characters and integers are different types, and conversion between the two must be done using the primitives int->char and char->int.)

Primitives

(@ p e1 e2 …)
    Primitive application. First evaluate expressions e1, e2, … in that order, and then apply primitive p to the values of these expressions.

L3 offers the following primitives:
– integer: < <= > >= + - * / % (/ is floored integer division, e.g. (/ -5 2) ⇒ -3)
– bit vectors (integers): << >> & | ^
– polymorphic: = != id (id is the identity)
– type tests: block? int? char? bool? unit?
– character: char->int int->char
– I/O: byte-read byte-write
– tagged blocks: block-alloc-n (0 ≤ n ≤ 255) block-tag block-length block-get block-set!

Tagged blocks

L3 offers a single structured datatype: tagged blocks. They are manipulated with the following primitives:
(@ block-alloc-n s)
    Allocates an uninitialized block with tag n and length s.
(@ block-tag b)
    Returns the tag of block b (as an integer).
(@ block-length b)
    Returns the length of block b.
(@ block-get b n)
    Returns the nth element (0-based) of block b.
(@ block-set! b n v)
    Sets the nth element (0-based) of block b to v.

Using tagged blocks

Tagged blocks are a low-level data structure. They are not meant to be used directly in programs, but rather as a means to implement more sophisticated data structures like strings, arrays, lists, etc.
The valid tags range from 0 to 255, inclusive. Tags ≥ 200 are reserved by the compiler, while the others are available for general use. (For example, our L3 library uses a few tags to represent arrays, lists, etc.)

Valid primitive arguments

Primitives only work correctly when applied to certain types of arguments, otherwise their behavior is undefined.

    + - * << >> & | ^           : int × int ⇒ int
    / %                         : int × non-zero int ⇒ int
    < <= > >=                   : int × int ⇒ bool
    = !=                        : ∀α, β. α × β ⇒ bool
    id                          : ∀α. α ⇒ α
    int->char                   : int (valid Unicode code point) ⇒ char
    char->int                   : char ⇒ int
    byte-read                   : ⇒ int between 0 and 255, or -1
    byte-write                  : ∃α. int between 0 and 255 ⇒ α
    block-alloc-n               : int ⇒ block
    block-tag block-length      : block ⇒ int
    block-get                   : ∀α. block × int ⇒ α
    block-set!                  : ∀α ∃β. block × int × α ⇒ β
    block? int? char? bool? unit? : ∀α. α ⇒ bool

For byte-write and block-set!, the return value is arbitrary.

Undefined behavior

The fact that primitives have undefined behavior when applied to invalid arguments means that they can do anything in such a case.
For example, division by zero can produce an error, crash the program, or produce an arbitrary value.

Grasping the syntax

Like all Lisp-like languages, L3 “has no syntax”, in that its concrete syntax is very close to its abstract syntax.
For example, the L3 expression below is almost a direct transcription of a pre-order traversal of its AST, shown after it, in which nodes are parenthesized and tagged, while leaves are unadorned.

    (if (@ < x 0)
        (@ - 0 x)
        x)

    if
      @
        <
        ident(x)
        int(0)
      @
        -
        int(0)
        ident(x)
      ident(x)

L3 EBNF grammar (1)

    program ::= { def | defrec | expr } expr
    def     ::= (def ident expr)
    defrec  ::= (defrec ident fun)
    expr    ::= fun | let | let* | letrec | rec | begin | if | cond | and | or
              | not | app | prim | ident | num | str | chr | bool | unit
    exprs   ::= expr { expr }
    fun     ::= (fun ({ ident }) exprs)
    let     ::= (let ({ (ident expr) }) exprs)
    let*    ::= (let* ({ (ident expr) }) exprs)
    letrec  ::= (letrec ({ (ident fun) }) exprs)
    rec     ::= (rec ident ({ (ident expr) }) exprs)
    begin   ::= (begin exprs)

L3 EBNF grammar (2)

    if      ::= (if expr expr [ expr ])
    cond    ::= (cond (expr exprs) {(expr exprs)})
    and     ::= (and expr expr { expr })
    or      ::= (or expr expr { expr })
    not     ::= (not expr)
    app     ::= (expr { expr })
    prim    ::= (@ prim-name { expr })

L3 EBNF grammar (3)

    str     ::= "{any character except newline}"
    chr     ::= 'any character'
    bool    ::= #t | #f
    unit    ::= #u
    ident   ::= identstart { identstart | digit } [@ digit { digit }]
    identstart ::= a | … | z | A | … | Z | | | ! | % | & | * | + | - | .

L3 EBNF grammar (4)

    num     ::= num2 | num10 | num16
    num2    ::= #b [-] digit2 { digit2 }
    num10   ::= [-] digit10 { digit10 }
    num16   ::= #x [-] digit16 { digit16 }
    digit2  ::= 0 | 1
    digit10 ::= digit2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
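
The arity-based name lookup rule described earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course compiler's code: the scope is modeled as a plain set of names, and the function names resolve_application and resolve_reference are hypothetical.

```python
def resolve_application(name, arity, scope):
    """Resolve the function name in an application (name e1 ... ek).

    `scope` is a set of visible names (an assumption of this sketch).
    The arity-qualified name n@k is tried first, then the bare name n.
    """
    qualified = f"{name}@{arity}"
    if qualified in scope:
        return qualified
    if name in scope:
        return name
    raise NameError(f"unbound name: {name}")


def resolve_reference(name, scope):
    """Resolve a name used outside function position: no arity is known,
    so only the bare name is looked up."""
    if name in scope:
        return name
    raise NameError(f"unbound name: {name}")


scope = {"list-make@1", "list-make@2", "map"}

print(resolve_application("list-make", 1, scope))  # list-make@1
print(resolve_application("list-make", 2, scope))  # list-make@2

# (map list-make l): list-make is an argument here, not the applied
# function, so the lookup fails unless a bare list-make is in scope.
try:
    resolve_reference("list-make", scope)
except NameError as err:
    print(err)  # unbound name: list-make
```

Keeping the two lookups separate mirrors the slides: the special rule applies only when the function being applied is a simple name, which is why passing list-make as a value does not benefit from it.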

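The value ranges and the floored division mentioned in the Values and Primitives slides can be checked with a small Python sketch. The helper names (is_l3_int, is_l3_char, l3_div) are hypothetical, and raising an exception on division by zero is a choice made for this sketch only, since L3 leaves that case undefined.

```python
INT_BITS = 31  # L3 integers use 31 bits, in two's complement

# A 31-bit two's complement integer spans -2^30 .. 2^30 - 1.
INT_MIN = -(1 << (INT_BITS - 1))
INT_MAX = (1 << (INT_BITS - 1)) - 1


def is_l3_int(n):
    """True if n fits in a 31-bit two's complement integer."""
    return INT_MIN <= n <= INT_MAX


def is_l3_char(code_point):
    """True if code_point lies in one of the two valid ranges,
    0000..D7FF or E000..10FFFF (base 16)."""
    return 0x0000 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0x10FFFF


def l3_div(x, y):
    """Floored integer division, as stated for the / primitive.

    Division by zero is undefined behavior in L3; this sketch chooses
    to raise an exception, which is only one of the allowed outcomes.
    """
    if y == 0:
        raise ZeroDivisionError("undefined behavior in L3")
    return x // y  # Python's // already rounds toward negative infinity


print(INT_MIN, INT_MAX)
print(l3_div(-5, 2))  # -3, matching (/ -5 2) ⇒ -3
print(is_l3_char(0xD7FF), is_l3_char(0xD800))  # True False
```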