Abstract Syntax Tree

Code Generation and Optimisation
Haskell for Compiler Writers

Implementing compilers

There are two basic requirements of any compiler implementation:

1. To represent the source / target program in a data structure, usually referred to as an abstract syntax tree.
2. To traverse the abstract syntax tree, extracting information and transforming it from one form to another.

Why Haskell?

Haskell has two main features which make it good for writing compilers.

1. Algebraic data types allow an abstract syntax tree to be easily constructed.
2. Pattern matching makes it easy to define functions that traverse an abstract syntax tree.

This lecture

How to:

• define abstract syntax
• transform abstract syntax trees

using Haskell.

Aim: to appreciate features of Haskell that make it good for implementing compilers.

FUNCTIONS AND APPLICATIONS
Principles of Haskell

Functions

The factorial function in Haskell:

    fact :: Int -> Int                                -- type signature; "::" reads "is of type"
    fact(n) = if n == 1 then 1 else n * fact(n-1)     -- equation

Function names begin with a lower-case letter.

Application and reduction

A function applied to an input is called an application, e.g.

    fact(3)

An application reduces to the right-hand-side of the first matching equation.

    fact(3) ⇒ if 3 == 1 then 1 else 3 * fact(3-1)

The symbol ⇒ reads "reduces to".

Evaluation

Expressions are evaluated by repeatedly reducing applications.

    fact(3)
      ⇒ if 3 == 1 then 1 else 3 * fact(3-1)
      ⇒ if False then 1 else 3 * fact(3-1)
      ⇒ 3 * fact(3-1)
      ⇒ 3 * fact(2)
      ⇒ ...
      ⇒ 6

    fact(3) ⇒* 6

The symbol ⇒* reads "evaluates to".

LISTS AND TUPLES
Commonly used data types

Lists

The empty list is written []

The list with head h and tail t is written h:t (the operator : is read "cons").

    [1]        ≡ 1:[]               -- "the list containing 1"
    ['x', 'y'] ≡ 'x' : ('y':[])
    [5,6,7]    ≡ 5:6:7:[]

The type of a list

The type of a list of elements of type a is written [a].

For example, the list ['a', 'b'] is a value of type [Char].

Sum

A function to sum the elements of a list:

    sum :: [Int] -> Int
    sum([])   = 0                  -- two equations
    sum(x:xs) = x + sum(xs)

Typing note: if x:xs is of type [a], then x must have type a and xs must have type [a].

Exercise 1

Give the reduction steps to evaluate the following application.

    sum([1,2,3])

Polymorphism

A function to compute the length of a list:

    length :: [a] -> Int           -- a is a type variable
    length([])   = 0
    length(x:xs) = 1 + length(xs)

It is a polymorphic function: it can be applied to a list of values of any type.

Polymorphism

A couple more examples of polymorphic functions:

    head :: [a] -> a
    head(x:xs) = x

    tail :: [a] -> [a]
    tail(x:xs) = xs

Tuples

If x1, x2, …, xn are values of types t1, t2, …, tn respectively, then the tuple (x1, x2, …, xn) is a value of type (t1, t2, …, tn).

For example, the following values are of type (Char, [Int]).

• ('a', [])
• ('b', [9])
• ('z', [5,6,7])

Multiple inputs

Tuples can be used to pass multiple inputs to a function.

    min :: (Int, Int) -> Int
    min(x, y) = if x < y then x else y

For example:

    min(5, 10) ⇒* 5

Exercise 2

Define a function

    append :: ([a], [a]) -> [a]

that joins two lists into a single list, e.g.

    append([1,4,2], [3,4]) ⇒* [1,4,2,3,4]

Homework Exercise

Define a function

    first :: (Int, [a]) -> [a]

such that first(n, xs) returns the first n elements of the list xs, e.g.

    first(2, [9,8,3,5]) ⇒* [9,8]
    first(4, [3,5])     ⇒* [3,5]

Infix operators

Infix operators can be defined for functions of two arguments. For example, the definition

    xs ++ ys = append(xs, ys)

allows ++ to be used as follows.

    [1,2] ++ [3] ++ [4,5,6] ⇒* [1,2,3,4,5,6]

Precedence and associativity

Any Haskell operator can be given a precedence (from 0 to 9) and left, right, or non-associativity. For example, we can write:

    infixl 6 -
    infixr 7 *

So x-y-z is interpreted as (x-y)-z. And x-y*z is interpreted as x-(y*z).

Exercise 3

In Haskell, we have:

    infixr 5 ++

Why make ++ right-associative?

USER-DEFINED TYPES
Type synonyms and algebraic data types

Type synonyms

Type synonyms allow a new (more meaningful) name to be given to an existing type, e.g.

    type String = [Char]           -- String is the new name, [Char] the existing type

The new type String is entirely equivalent to [Char]. In Haskell:

    "hi!" ≡ ['h', 'i', '!']

Algebraic data types

A data definition introduces a new type, and a set of constructors that can be used to create values of that type.

    data Bool = True | False       -- Bool is the data type; True and False are data constructors

    data Colour = Red | Green | Blue

Type and constructor names begin with an upper-case letter.

Pattern matching

Examples of functions involving Bool and Colour:

    not :: Bool -> Bool
    not(False) = True
    not(True)  = False

    isRed :: Colour -> Bool
    isRed(Red) = True
    isRed(x)   = False

Shapes

A data constructor may have associated components, e.g.

    data Shape =
        Circ(Float)                -- component value (radius)
      | Rect(Float, Float)         -- component values (width & height)

Example values of type Shape:

• Circ(10.5)
• Rect(10.2, 20.9)

Area

A function to compute the area of any given shape.

    area :: Shape -> Float
    area(Rect(w, h)) = w * h
    area(Circ(r))    = pi * r * r

(Compare with C code for same task in LSA Chapter 2.)

CASE STUDY
A simplifier for arithmetic expressions.

Concrete syntax

Here is a concrete syntax for arithmetic expressions.

    v = [a-z]+
    n = [0-9]+

    e → v
      | n
      | e + e
      | e * e
      | ( e )

Example expression:

    x * y + (z * 10)

Simplification

Consider the algebraic law:

    ∀e. e * 1 = e

This law can be used to simplify expressions by using it as a rewrite rule from left to right.

Problem

1. Define an abstract syntax, in Haskell, for arithmetic.
2. Implement the simplification rule as a Haskell function over abstract syntax trees.

Example simplification:

    x * (y * 1) → x * y

Abstract syntax

    data Op = Add | Mul

    data Expr =
        Num(Int)
      | Var(String)
      | Apply(Expr, Op, Expr)

An op is an addition or a multiplication. An expression is a number, or a variable, or an application of an op to two sub-expressions.

Abstract syntax trees

The expression

    x + (y * 2)

is represented by the following Haskell expression

    Apply( Var("x")
         , Add
         , Apply( Var("y")
                , Mul
                , Num(2)))

Abstract syntax trees

We can view constructors as nodes of a tree and the constructor components as sub-trees. For example:

    App
    ├── Var 'x'
    ├── Add
    └── App
        ├── Var 'y'
        ├── Mul
        └── Num 2

Simplification

The rewrite rule

    e * 1 → e

is implemented by

    simplify :: Expr -> Expr
    simplify(Apply(e, Mul, Num(1))) = simplify(e)
    simplify(Apply(e1, op, e2))     = Apply(simplify(e1), op, simplify(e2))
    simplify(e)                     = e

Homework exercise

What is the result if the simplifier is given the following input?

    1*(1*1)

Is it correct? If not, can you fix it?

(Source code for the simplifier available on the CGO web page.)

Homework exercise

Extend the simplifier to exploit the following algebraic law.

    ∀e. e * 0 = 0

"SYNTACTIC SUGAR"
Convenient features of Haskell.

Guards

Equations may contain boolean conditions called guards.

    fib :: Int -> Int
    fib(n)
      | n == 0    = 0
      | n == 1    = 1
      | otherwise = fib(n-1) + fib(n-2)

The chosen equation is the first one whose guard succeeds. The name otherwise is not special syntax: it is simply defined to be True.

Case expressions

Using a case expression, pattern matching can be expressed on the RHS of an equation, e.g.

    isEmpty :: [a] -> Bool
    isEmpty([])   = True
    isEmpty(x:xs) = False

    ≡

    isEmpty :: [a] -> Bool
    isEmpty(xs) =
      case xs of
        []   -> True
        x:xs -> False

List enumerations

A list of values can be created using an enumeration, e.g.

    [1..5] ⇒* [1, 2, 3, 4, 5]

The start and end of the range need not be literals, e.g.

    [length([1,2])..fact(3)] ⇒* [2,3,4,5,6]

A step can be specified by giving the first two values, e.g.

    [0,2..10] ⇒* [0, 2, 4, 6, 8, 10]

List comprehensions

Recall from MCS, we can write

    { n | n ∊ {1..10} ∧ odd(n) }

to denote the set of odd numbers between 1 and 10. Similarly, in Haskell we can write

    [n | n <- [1..10], odd(n) ]

Here <- reads "drawn from" and odd(n) is a filter.

Example

The function

    inc :: [Int] -> [Int]
    inc(xs) = [x+1 | x <- xs]

increments each element of a given list. For example

    inc([1, 2, 3])

evaluates to [2,3,4].

Exercise 4

Define a function omit

    omit :: (Int, [Int]) -> [Int]

such that omit(x, ys) returns ys omitting all occurrences of x. For example,

    omit(1, [1, 2, 1, 3])

should evaluate to [2,3].

Homework Exercise

Define a function unique

    unique :: [Int] -> [Int]

such that unique(xs) returns xs omitting duplicates. For example,

    unique([1, 2, 1, 3, 1, 3])

should evaluate to [1,2,3]. (Order doesn't matter.)

TYPE CLASSES
Making types more general

Type classes

We have seen that == can compare values of type Int. But what if we want to compare values of type Char, Colour, or [Bool] using the == operator?

    'x' == 'y'
    Red == Green
    [True, False] == [True]

Type classes let us do this.

Type classes

A type class is a set of types providing a common set of functions.

Example: Eq is a type class providing functions == and /=.

Members of the Eq class include Int and Char, e.g.

    1 == 2     ⇒ False
    'x' == 'x' ⇒ True

Deriving classes

One way to make an algebraic data type, such as Colour, a member of the Eq class is to write:

    data Colour = Red | Green | Blue
      deriving Eq

This gives == the "obvious" meaning on values of type Colour. For example:

    Red == Red   ⇒ True
    Red == Green ⇒ False

Class constraints

Class constraints may appear in type signatures of polymorphic …

Other standard classes

Besides the Eq class, there are the Num, Ord and Show classes.
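
To make the deriving clause and the idea of a class constraint concrete, here is a minimal sketch written in the tuple-argument style used throughout these slides. The function contains is a hypothetical helper introduced only for illustration; it is not part of the lecture material. The constraint "Eq a =>" is one way a class constraint can appear in the type signature of a polymorphic function.

```haskell
-- Colour becomes a member of the Eq class via the deriving clause.
data Colour = Red | Green | Blue
  deriving Eq

-- Hypothetical helper (an assumption, not from the lecture).
-- The class constraint "Eq a =>" restricts the type variable a to
-- members of Eq, which is what permits the use of == below.
contains :: Eq a => (a, [a]) -> Bool
contains(x, [])   = False
contains(x, y:ys) = x == y || contains(x, ys)
```

With these definitions, contains(Red, [Green, Red]) evaluates to True and contains(Blue, [Green, Red]) evaluates to False. Because Char is also a member of Eq, the same function works on strings: contains('x', "xyz") evaluates to True.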
