Code Generation and Optimisation

Haskell for Compiler Writers

Implementing compilers

There are two basic requirements of any compiler implementation:

1. To represent the source / target program in a data structure, usually referred to as an abstract syntax tree.

2. To traverse the abstract syntax tree, extracting information and transforming it from one form to another.

Why Haskell?

Haskell has two main features which make it good for writing compilers.

1. Algebraic data types allow an abstract syntax tree to be easily constructed.

2. Pattern matching makes it easy to define functions that traverse an abstract syntax tree.

This lecture

How to:

• define abstract syntax
• transform abstract syntax trees

using Haskell.

Aim: to appreciate features of Haskell that make it good for implementing compilers.

FUNCTIONS AND APPLICATIONS

Principles of Haskell

Functions

The factorial function in Haskell:

    fact :: Int -> Int

    fact(n) = if n == 1 then 1 else n * fact(n-1)

The first line is the type signature; :: is read “is of type”. The second line is an equation.

Function names begin with a lower-case letter.
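The slide's parenthesised style, fact(n), is ordinary Haskell: it applies fact to the bracketed expression (n). A minimal sketch showing the slide's definition next to the more common juxtaposition style; the name fact' is introduced here for illustration only and is not part of the lecture.

```haskell
-- The slide's definition, unchanged.
fact :: Int -> Int
fact(n) = if n == 1 then 1 else n * fact(n-1)

-- The same function written with juxtaposition instead of brackets.
-- fact' is a hypothetical name used only for this comparison.
fact' :: Int -> Int
fact' n = if n == 1 then 1 else n * fact' (n - 1)
```

Both definitions can be loaded into GHCi and applied as fact(3) or fact' 3. Note that, as written, neither has an equation that stops for inputs below 1.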

Application and reduction

A function applied to an input is called an application, e.g. fact(3)

An application reduces to the right-hand-side of the first matching equation.

    fact(3) ⇒ if 3 == 1 then 1 else 3 * fact(3-1)

The symbol ⇒ is read “reduces to”.

Evaluation

Expressions are evaluated by repeatedly reducing applications.

    fact(3)
    ⇒ if 3 == 1 then 1 else 3 * fact(3-1)
    ⇒ if False then 1 else 3 * fact(3-1)
    ⇒ 3 * fact(3-1)
    ⇒ 3 * fact(2)
    ⇒ ...
    ⇒ 6

    fact(3) ⇒* 6

The symbol ⇒* is read “evaluates to”.

LISTS AND TUPLES

Commonly used data types

Lists

The empty list is written []

The list with head h and tail t is written h:t (the : operator is called “cons”).

    [1] ≡ 1:[]    (“the list containing 1”)
    ['x', 'y'] ≡ 'x' : ('y':[])
    [5,6,7] ≡ 5:6:7:[]
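The three list equivalences above can be checked mechanically. A small sketch, assuming only the standard == on lists; the name listNotationChecks is made up for this example.

```haskell
-- Each entry compares the bracket notation with the equivalent
-- cons (:) notation from the slide. All three should be True.
listNotationChecks :: [Bool]
listNotationChecks =
  [ [1 :: Int] == 1 : []
  , ['x', 'y'] == 'x' : ('y' : [])
  , [5, 6, 7 :: Int] == 5 : 6 : 7 : []
  ]
```

The last entry needs no brackets on the right because : associates to the right, so 5:6:7:[] is read as 5:(6:(7:[])).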

The type of a list

The type of a list of elements of type a is written [a]. For example, the list ['a', 'b'] is a value of type [Char].

If x:xs is of type [a] then x must have type a and xs must have type [a].

Sum

A function to sum the elements of a list, defined by two equations:

    sum :: [Int] -> Int

    sum([]) = 0
    sum(x:xs) = x + sum(xs)

Exercise 1

Give the reduction steps to evaluate the following application.

    sum([1,2,3])

Polymorphism

A function to compute the length of a list:

    length :: [a] -> Int

    length([]) = 0
    length(x:xs) = 1 + length(xs)

Here a is a type variable. length is a polymorphic function: it can be applied to a list of values of any type.
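One practical point when trying these definitions: sum and length are already defined in the standard Prelude, so a file that redefines them needs to hide the Prelude versions. A minimal sketch, assuming the definitions are placed in their own file:

```haskell
-- Hide the Prelude's own sum and length so the slide versions can be defined.
import Prelude hiding (sum, length)

sum :: [Int] -> Int
sum([]) = 0
sum(x:xs) = x + sum(xs)

-- The type variable a means length works on lists of any element type.
length :: [a] -> Int
length([]) = 0
length(x:xs) = 1 + length(xs)
```

With this file loaded, length can be applied to a [Char], a [Bool] or an [Int] without any change to its definition.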

Polymorphism

A couple more examples of polymorphic functions:

    head :: [a] -> a
    head(x:xs) = x

    tail :: [a] -> [a]
    tail(x:xs) = xs

Tuples

If x1, x2, …, xn are values of types t1, t2, …, tn respectively then the tuple

    (x1, x2, …, xn)

is a value of type

    (t1, t2, …, tn)

For example, the following values are of type (Char, [Int]).

• ('a', [])
• ('b', [9])
• ('z', [5,6,7])

Multiple inputs

Tuples can be used to pass multiple inputs to a function.

    min :: (Int, Int) -> Int
    min(x, y) = if x < y then x else y

For example:

    min(5, 10) ⇒* 5

Exercise 2

Define a function

    append :: ([a], [a]) -> [a]

that joins two lists into a single list, e.g.

    append([1,4,2], [3,4]) ⇒* [1,4,2,3,4]
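The tuple examples and the two-input min function from the slides above can be tried as follows. This is a sketch only: the name pairs is invented here, and the Prelude's own min is hidden so that the tuple-taking version can reuse the name.

```haskell
-- The Prelude already defines a (curried) min, so hide it.
import Prelude hiding (min)

-- The three example values of type (Char, [Int]) from the Tuples slide.
-- pairs is a hypothetical name for this illustration.
pairs :: [(Char, [Int])]
pairs = [('a', []), ('b', [9]), ('z', [5,6,7])]

-- One argument, which happens to be a pair.
min :: (Int, Int) -> Int
min(x, y) = if x < y then x else y
```

Strictly, min here takes a single input of type (Int, Int); the pattern (x, y) on the left-hand side takes the pair apart.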

Homework Exercise

Define a function

    first :: (Int, [a]) -> [a]

such that first(n, xs) returns the first n elements of the list xs, e.g.

    first(2, [9,8,3,5]) ⇒* [9,8]
    first(4, [3,5]) ⇒* [3,5]

Infix operators

Infix operators can be defined for functions of two arguments. For example, the definition

    xs ++ ys = append(xs, ys)

allows ++ to be used as follows.

    [1,2] ++ [3] ++ [4,5,6] ⇒* [1,2,3,4,5,6]

Precedence and associativity

Any Haskell operator can be given a precedence (from 0 to 9) and left, right, or non-associativity. For example, we can write:

    infixl 6 -
    infixr 7 *

So x-y-z is interpreted as (x-y)-z.

And x-y*z is interpreted as x-(y*z).

Exercise 3

In Haskell, we have:

    infixr 5 ++

Why make ++ right-associative?
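To see fixity declarations at work without clashing with the built-in operators, the sketch below defines two operators of its own. The symbols |-| and |*| and the names leftAssoc and tighter are hypothetical, chosen only for this illustration.

```haskell
-- Hypothetical operators that mirror the slide's declarations.
infixl 6 |-|
infixr 7 |*|

(|-|) :: Int -> Int -> Int
x |-| y = x - y

(|*|) :: Int -> Int -> Int
x |*| y = x * y

-- Left-associative: parsed as (10 |-| 3) |-| 2.
leftAssoc :: Int
leftAssoc = 10 |-| 3 |-| 2

-- Higher precedence binds tighter: parsed as 10 |-| (3 |*| 2).
tighter :: Int
tighter = 10 |-| 3 |*| 2
```

Changing infixl to infixr for |-| would make leftAssoc group the other way and give a different result, which is a quick way to observe the effect of associativity.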

USER-DEFINED TYPES

Type synonyms and algebraic data types

Type synonyms

Type synonyms allow a new (more meaningful) name to be given to an existing type, e.g.

    type String = [Char]

Here String is the new name and [Char] is the existing type. The new type String is entirely equivalent to [Char]. In Haskell:

    "hi!" ≡ ['h', 'i', '!']

Algebraic data types

A data definition introduces a new type, and a set of constructors that can be used to create values of that type.

    data Bool = True | False

    data Colour = Red | Green | Blue

Here Bool and Colour are data types; True, False, Red, Green and Blue are data constructors.

Type and constructor names begin with an upper-case letter.

Pattern matching

Examples of functions involving Bool and Colour:

    not :: Bool -> Bool
    not(False) = True
    not(True) = False

    isRed :: Colour -> Bool
    isRed(Red) = True
    isRed(x) = False
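The pattern-matching examples can be run as they stand, with one adjustment: Bool and not already exist in the Prelude, so the sketch below reuses the Prelude's Bool and hides only not. That choice is an assumption made to get a self-contained file, not something the slides prescribe.

```haskell
-- Reuse the Prelude's Bool, but hide its not so the slide version can be defined.
import Prelude hiding (not)

data Colour = Red | Green | Blue

not :: Bool -> Bool
not(False) = True
not(True) = False

-- Equations are tried top to bottom: Red matches the first equation,
-- every other colour falls through to the second.
isRed :: Colour -> Bool
isRed(Red) = True
isRed(x) = False
```

Swapping the two isRed equations would make the catch-all pattern x match first, so isRed(Red) would then reduce to False.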

Shapes

A data constructor may have associated components, e.g.

    data Shape =
        Circ(Float)
      | Rect(Float, Float)

Circ has one component value (the radius); Rect has two (width and height).

Example values of type Shape:

• Circ(10.5)
• Rect(10.2, 20.9)

Area

A function to compute the area of any given shape.

    area :: Shape -> Float

    area(Rect(w, h)) = w * h
    area(Circ(r)) = pi * r * r

(Compare with code for same task in LSA Chapter 2.)

CASE STUDY

A simplifier for arithmetic expressions.

Concrete syntax

Here is a concrete syntax for arithmetic expressions.

    v = [a-z]+

    n = [0-9]+

    e → v
      | n
      | e + e
      | e * e
      | ( e )

Example expression: x * y + (z * 10)

Simplification

Consider the algebraic law:

    ∀e. e * 1 = e

This law can be used to simplify expressions by using it as a rewrite rule from left to right.

Example simplification:

    x * (y * 1) → x * y

Problem

1. Define an abstract syntax, in Haskell, for arithmetic expressions.

2. Implement the simplification rule as a Haskell function over abstract syntax trees.

Abstract syntax

    data Op = Add | Mul

    data Expr =
        Num(Int)
      | Var(String)
      | Apply(Expr, Op, Expr)

An op is an addition or a multiplication. An expression is a number, or a variable, or an application of an op to two sub-expressions.

Abstract syntax trees

The expression

    x + (y * 2)

is represented by the following Haskell expression

    Apply( Var("x")
         , Add
         , Apply( Var("y")
                , Mul
                , Num(2)))
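As a first example of traversing this abstract syntax, the sketch below turns a tree back into fully bracketed concrete syntax. The functions render and symbol and the name example are hypothetical helpers written for this illustration; only Op and Expr come from the slides.

```haskell
data Op = Add | Mul

data Expr =
    Num(Int)
  | Var(String)
  | Apply(Expr, Op, Expr)

-- The tree for x + (y * 2), as on the slide.
example :: Expr
example = Apply(Var("x"), Add, Apply(Var("y"), Mul, Num(2)))

-- Hypothetical helper: one equation per constructor, recursing into sub-trees.
render :: Expr -> String
render(Num(n)) = show(n)
render(Var(v)) = v
render(Apply(e1, op, e2)) =
  "(" ++ render(e1) ++ symbol(op) ++ render(e2) ++ ")"

-- Hypothetical helper: the concrete symbol for each op.
symbol :: Op -> String
symbol(Add) = " + "
symbol(Mul) = " * "
```

Applying render to example gives "(x + (y * 2))". Bracketing every application is a deliberate simplification so that no precedence handling is needed.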

Abstract syntax trees

We can view constructors as nodes of a tree and the constructor components as sub-trees. For example:

    Apply
    ├── Var "x"
    ├── Add
    └── Apply
        ├── Var "y"
        ├── Mul
        └── Num 2

Simplification

The rewrite rule

    e * 1 → e

is implemented by

    simplify :: Expr -> Expr

    simplify(Apply(e, Mul, Num(1))) = simplify(e)
    simplify(Apply(e1, op, e2)) = Apply(simplify(e1), op, simplify(e2))
    simplify(e) = e

Homework exercise

What is the result if the simplifier is given the following input?

    1*(1*1)

Is it correct? If not, can you fix it?

Homework exercise

Extend the simplifier to exploit the following algebraic law.

    ∀e. e * 0 = 0

(Source code for the simplifier available on the CGO web page.)
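For experimenting with the simplifier, here is a self-contained sketch that runs it on the example x * (y * 1). The deriving (Eq, Show) clauses and the names input and expected are additions made here so that results can be compared and printed; the simplify equations are the ones from the slide, and this is not the source code from the CGO web page.

```haskell
data Op = Add | Mul
  deriving (Eq, Show)   -- added here so trees can be compared and printed

data Expr = Num(Int) | Var(String) | Apply(Expr, Op, Expr)
  deriving (Eq, Show)

-- The slide's simplifier, unchanged.
simplify :: Expr -> Expr
simplify(Apply(e, Mul, Num(1))) = simplify(e)
simplify(Apply(e1, op, e2)) = Apply(simplify(e1), op, simplify(e2))
simplify(e) = e

-- x * (y * 1)
input :: Expr
input = Apply(Var("x"), Mul, Apply(Var("y"), Mul, Num(1)))

-- x * y
expected :: Expr
expected = Apply(Var("x"), Mul, Var("y"))
```

In GHCi, simplify(input) == expected evaluates to True. The same setup can be used to explore the homework inputs.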

Convenient features of Haskell

Guards

Equations may contain boolean conditions called guards.

    fib :: Int -> Int
    fib(n)
      | n == 0 = 0
      | n == 1 = 1
      | otherwise = fib(n-1) + fib(n-2)

The chosen equation is the first one whose guard succeeds. The name otherwise is equivalent to True.

Case expressions

Using a case expression, pattern matching can be expressed on the RHS of an equation, e.g.

    isEmpty :: [a] -> Bool
    isEmpty([]) = True
    isEmpty(x:xs) = False

    ≡

    isEmpty :: [a] -> Bool
    isEmpty(xs) =
      case xs of
        [] -> True
        x:xs -> False

List enumerations

A list of values can be created using an enumeration, e.g.

    [1..5] ⇒* [1, 2, 3, 4, 5]

The start and end of the range need not be literals, e.g.

    [length([1,2])..fact(3)] ⇒* [2,3,4,5,6]

A step can be specified by giving the first two values, e.g.

    [0,2..10] ⇒* [0, 2, 4, 6, 8, 10]
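The three features above (guards, case expressions and enumerations) fit in one short file. A sketch under small assumptions: the wildcard pattern _:_ replaces x:xs in the case expression to avoid reusing the name xs, and evens is a name invented here.

```haskell
-- Guards are tried top to bottom; otherwise always succeeds.
fib :: Int -> Int
fib(n)
  | n == 0    = 0
  | n == 1    = 1
  | otherwise = fib(n-1) + fib(n-2)

-- Pattern matching moved to the right-hand side with a case expression.
isEmpty :: [a] -> Bool
isEmpty(xs) =
  case xs of
    []  -> True
    _:_ -> False

-- An enumeration with a step, given by its first two values.
evens :: [Int]
evens = [0,2..10]
```

For instance, fib(10) evaluates to 55 and evens is the list [0,2,4,6,8,10].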

List comprehensions

Recall from MCS, we can write

    { n | n ∊ {1..10} ∧ odd(n) }

to denote the set of odd numbers between 1 and 10. Similarly, in Haskell we can write

    [n | n <- [1..10], odd(n)]

Here <- is read “drawn from”, and odd(n) is a filter.

Example

The function

    inc :: [Int] -> [Int]
    inc(xs) = [x+1 | x <- xs]

increments each element of a given list. For example

    inc([1, 2, 3])

evaluates to [2,3,4].

Exercise 4

Define a function omit

    omit :: (Int, [Int]) -> [Int]

such that omit(x, ys) returns ys omitting all occurrences of x.

For example,

    omit(1, [1, 2, 1, 3])

should evaluate to [2,3].

Homework Exercise

Define a function unique

    unique :: [Int] -> [Int]

such that unique(xs) returns xs omitting duplicates.

For example,

    unique([1, 2, 1, 3, 1, 3])

should evaluate to [1,2,3]. (Order doesn’t matter.)
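The two comprehensions from the List comprehensions and Example slides can be run directly. In this sketch the name odds is introduced for the first comprehension; inc is as on the slide.

```haskell
-- Draw n from [1..10] and keep only the values that pass the filter odd(n).
odds :: [Int]
odds = [n | n <- [1..10], odd(n)]

-- No filter here: every x drawn from xs contributes x+1 to the result.
inc :: [Int] -> [Int]
inc(xs) = [x+1 | x <- xs]
```

Here odds is [1,3,5,7,9], and inc([1,2,3]) is [2,3,4] as stated on the slide. The same generator-plus-filter shape is a reasonable starting point for the two exercises.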

TYPE CLASSES

Making types more general

Type classes

We have seen that == can compare values of type Int. But what if we want to compare values of type Char, Colour, or [Bool] using the == operator?

    'x' == 'y'
    Red == Green
    [True, False] == [True]

Type classes let us do this.

Type classes

A type class is a set of types providing a common set of functions.

Example: Eq is a type class providing functions == and /=.

Members of the Eq class include Int and Char, e.g.

    1 == 2 ⇒ False
    'x' == 'x' ⇒ True

Deriving classes

One way to make an algebraic data type, such as Colour, a member of the Eq class is to write:

    data Colour = Red | Green | Blue
      deriving Eq

This gives == the “obvious” meaning on values of type Colour. For example:

    Red == Red ⇒ True
    Red == Green ⇒ False
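Deriving is described as one way of joining the Eq class; the other is to write the instance by hand. The sketch below spells out an instance with the same “obvious” meaning for Colour. It is an illustration of the idea, not a claim about the exact code a compiler generates.

```haskell
data Colour = Red | Green | Blue

-- Hand-written membership of the Eq class: two colours are equal
-- exactly when they are built with the same constructor.
instance Eq Colour where
  Red   == Red   = True
  Green == Green = True
  Blue  == Blue  = True
  _     == _     = False
```

Only == is defined here; /= is then available too, because the Eq class provides a default definition of /= in terms of ==.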

Class constraints

Class constraints may appear in type signatures of polymorphic functions. For example:

    member :: Eq t => (t, [t]) -> Bool
    member(x, []) = False
    member(x, y:ys) = x == y || member(x, ys)

The constraint states that type t must be in the Eq class. This is because values of type t are compared using == in the body of the function.

Other standard classes

Besides the Eq class, there are the Num, Ord and Show classes.

    Class   Functions
    Eq      ==, /=
    Num     +, -, *
    Ord     <=, <, >, >=
    Show    show

To illustrate:

    'a' <= 'b' ⇒ True
    show(Red) ⇒ "Red"
    show([1,2,3]) ⇒ "[1,2,3]"

Local definitions

Definitions local to a particular equation can be defined using a where clause. To illustrate, here is QuickSort in Haskell.

    sort :: Ord a => [a] -> [a]
    sort([]) = []
    sort(pivot:xs) = sort(smaller) ++ [pivot] ++ sort(greater)
      where
        smaller = [x | x <- xs, x <= pivot]
        greater = [x | x <- xs, x > pivot]

HIGHER-ORDER FUNCTIONS AND CURRYING

Two Haskell features NOT used in CGO

Higher-order functions

Functions can take functions as inputs and return functions as results. These are called higher-order functions.

To illustrate, each(f, xs) applies f to each element in the list xs.

    each :: (a -> b, [a]) -> [b]
    each(f, []) = []
    each(f, x:xs) = f(x) : each(f, xs)

    each(not, [False, True]) ⇒* [True, False]

Currying

A function of 2 arguments can be implemented as a function that returns a function e.g.

    add :: Int -> (Int -> Int)
    add(x) = f
      where f(y) = x+y

    add(1)(2) ⇒ 3

or more simply:

    add :: Int -> Int -> Int
    add x y = x + y

    add 1 2 ⇒ 3

The same idea works for functions of n > 2 arguments.

CGO notes, questions, and practicals will assume no knowledge of higher-order functions and currying.

But if you are comfortable with these features, feel free to use them in your answers.

FINITE MAPS

A simple but very useful data structure

Finite Map

A finite map of type

    Map a b

is a data structure that maps values of type a (domain) to values of type b (range).

It can be represented as a list of (a, b) pairs.

    type Map a b = [(a, b)]

Keys and Values

    goals :: Map String Int
    goals =
      [ ("Charlton", 49)
      , ("Owen", 40)
      , ("Lineker", 48)
      ]

In each pair, the first component is the “key” and the second is the “value”.

    key :: (a, b) -> a
    key(k, v) = k

    value :: (a, b) -> b
    value(k, v) = v

Fetch

A common operation is to fetch the value associated with a given key.

    fetch :: Eq a => (Map a b, a) -> b
    fetch(p:ps, k)
      | key(p) == k = value(p)
      | otherwise = fetch(ps, k)

For example:

    fetch(goals, "Owen") ⇒* 40

Fetch operator

We will use the symbol ! as an infix operator for fetch:

    m ! k = fetch(m, k)

For example:

    goals ! "Owen" ⇒* 40
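Putting the finite map definitions together gives a small file that can be loaded and queried. Everything below is taken from the slides except the explicit type signature for !, which is added here as an assumption about the intended type.

```haskell
type Map a b = [(a, b)]

goals :: Map String Int
goals =
  [ ("Charlton", 49)
  , ("Owen", 40)
  , ("Lineker", 48)
  ]

key :: (a, b) -> a
key(k, v) = k

value :: (a, b) -> b
value(k, v) = v

-- Walk down the list until a pair with the wanted key is found.
-- There is no equation for the empty map, so fetching a missing key
-- fails with a pattern-match error.
fetch :: Eq a => (Map a b, a) -> b
fetch(p:ps, k)
  | key(p) == k = value(p)
  | otherwise   = fetch(ps, k)

-- Infix form of fetch; the signature is an addition, not from the slides.
(!) :: Eq a => Map a b -> a -> b
m ! k = fetch(m, k)
```

With this loaded, goals ! "Owen" evaluates to 40. The Eq a constraint is needed because keys are compared with == inside fetch.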

Exercise 5

Define a function to insert a given key/value pair into a map. For example,

    insert(goals, "Owen", 41)

should return the following map.

    [ ("Charlton", 49),
      ("Owen", 41),
      ("Lineker", 48) ]

The value for "Owen" is modified to 41.

Summary

• Algebraic data types and functions defined by pattern matching are the core features.
• Other features such as
  - infix operators
  - list comprehensions
  - type classes
  are nice but not as important.
• Powerful data structures such as finite maps are easily defined.
• These ingredients will let us write concise compilers!