MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Syntactic analysis and type evaluation in Haskell

BACHELOR'S THESIS

Pavel Dvořák

Brno, Spring 2009

Declaration

I hereby declare that this thesis is my original work and that, to the best of my knowledge and belief, it contains no material previously published or written by another author except where the citation has been made in the text.

......

signature

Adviser: RNDr. Václav Brožek

Acknowledgements

I am grateful to my adviser Vašek Brožek for his time, great patience, and excellent advice, to Libor Škarvada for his worthwhile lectures, to Marek Kružliak for the beautiful Hasky logo, to Miran Lipovača for his cool Haskell tutorial,∗ to Honza Bartoš, Jirka Hýsek, Tomáš Janoušek, Martin Milata, Peter Molnár, Štěpán Němec, Tomáš Staněk, and Matúš Tejiščák for their important remarks, to the rest of the Haskell community and to everybody that I forgot to mention.

Thank you all.

∗Available online at http://learnyouahaskell.com/.

Abstract

The main goal of this thesis has been to provide a description of Haskell and basic information about the analysis of this language. That includes Haskell standards and implementations, syntactic and lexical analysis, and the Haskell type system.

Another objective was to write a library for syntactic and type analysis that can be used in further development. †

Keywords

Haskell, code analysis, syntactic analysis, type evaluation, type-checking, type system

†Source code available online at http://hasky.haskell.cz/.

Contents

1 Introduction
   1.1 Why functional programming?
   1.2 Structure of the thesis

2 The Haskell language
   2.1 Main Haskell features
      2.1.1 Lazy evaluation
      2.1.2 Purity and referential transparency
      2.1.3 Strong type system with inference
      2.1.4 Monadic input and output
   2.2 Evolution of Haskell
      2.2.1 Early development
      2.2.2 The stable Haskell 98
      2.2.3 Haskell Prime and the future of Haskell
   2.3 Haskell implementations
      2.3.1 GHC
      2.3.2 Hugs
      2.3.3 Yhc and nhc98
      2.3.4 Other implementations

3 Syntactic analysis
   3.1 Haskell code analysis tools
      3.1.1 Parsec
      3.1.2 Happy
      3.1.3 Alex
      3.1.4 Other tools
   3.2 Hasky — the base of the Haskell interpreter
      3.2.1 Language.Haskell
      3.2.2 The source code of the Haskell 98 libraries
      3.2.3 The Haskell module system

4 Type evaluation
   4.1 Haskell’s type system in more detail
      4.1.1 Monomorphism
      4.1.2 Polymorphism
      4.1.3 Type classes
      4.1.4 Algebraic data types
      4.1.5 Type inference
   4.2 Type-checking libraries
      4.2.1 GHC API
      4.2.2 Typing Haskell in Haskell
   4.3 Type evaluation in Hasky

5 Conclusions and future work

Chapter 1

Introduction

Be careful, Haskell is addictive... The first monad is free.

—SHAE ERISSON ∗

1.1 Why functional programming?

For many years, functional programming did not belong to the category of mainstream paradigms. In fact, it was considered by the general programming public to be an academic plaything. But modern functional languages, such as Haskell, Scala, F#, and Erlang, have recently gained popularity. When hardware became cheaper than the programmer’s work — thanks to Moore’s law — code readability and maintainability started being considered more important than processor time and memory consumption. And with the arrival of affordable multicore processors, parallelization has become a general concern.

There are some successful commercial projects which show that the use of functional programming can be a better way of writing programs than more established approaches (good examples are [1], [2, pages 6–8], and [3, pages 75–78]). The chief reason for this success lies in modularity, which improves productivity in such large projects, as described in [4]. Besides that, because functional programming is based on the lambda calculus, it is highly formal and thus suitable for applications in exact science. This is why functional programming matters and why we definitely need high-quality code analysis tools for these languages.

∗One of the #haskell IRC channel founders (also known as shapr).

1.2 Structure of the thesis

The following chapter gives a cursory overview of the Haskell programming language, describes its history and compares its implementations.

The third chapter concerns itself with syntactic and lexical analysis. It features a description of the current tools that are used for this purpose in Haskell. We will also introduce Hasky — our attempt to write the core of a simplified Haskell interpreter.

And finally, the fourth chapter is about Haskell’s type system and about its implementation in Hasky.

Chapter 2

The Haskell language

Haskell is faster than C++, more concise than Perl, more regular than Python, more flexible than Ruby, more typeful than C#, more robust than Java, and has absolutely nothing in common with PHP.

—AUDREY TANG ∗

The only supported paradigm in Haskell is functional programming, which belongs to the family of declarative paradigms. The declarative paradigm is fundamentally in contrast to the imperative paradigm, which includes procedural and object-oriented programming. In imperative programming, we describe the program as a sequence of commands that change its state. In functional programming, we evaluate expressions and try to reach a term in normal form (the result). The transformation from one expression to another is performed by substituting function names with their corresponding definitions. So we are concerned with applying functions to values, just like in mathematics.

Factorial computation is the classic example of a simple Haskell program. Let us define the factorial function fact as follows:

fact 0 = 1
fact n = n * fact (n - 1)

This is a common technique for defining a function in Haskell called pattern matching. The first definition says that the factorial of zero is one. The other definition is a recursive one and defines the factorial of a non-zero number as the number multiplied by the factorial of its predecessor. In other words, the factorial of a number n is defined as the product of all numbers that are less than or equal to n. This definition is not perfect — when we call the function with a negative number, the program will get stuck in an infinite loop.

∗Author of Pugs, first functioning (and functional) implementation of Perl 6.

There are better ways of defining the factorial function, but this one is convenient for illustrative purposes.
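
One possible alternative — shown here only as an illustrative sketch, with a name of our own choosing — rejects negative arguments explicitly and computes the rest with the standard product function:

fact' :: Integer -> Integer
fact' n
  | n < 0     = error "fact': undefined for negative numbers"
  | otherwise = product [1 .. n]   -- product [1 .. 0] is 1, so fact' 0 = 1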

If we want to know the factorial of a number, say 3, we call the fact function with this parameter and the expression is evaluated step by step using the definition:

fact 3
3 * fact (3 - 1)
3 * 2 * fact (2 - 1)
3 * 2 * 1 * fact (1 - 1)
3 * 2 * 1 * 1
3 * 2 * 1
3 * 2
6

As a programming language, Haskell has a lot of interesting features.

2.1 Main Haskell features

2.1.1 Lazy evaluation

The default evaluation strategy in Haskell is lazy evaluation. Along with normal evaluation, it is essentially opposite to the strict evaluation that is used in the majority of programming languages. Instead of calling a function straight by value (or in a similar way), non-strict approaches evaluate the function arguments only when it is necessary. This has several advantages — we can work with infinite structures and we do not have to evaluate unnecessary code.

The main difference between the normal and lazy evaluation strategies is that lazy evaluation reuses intermediate results, which is usually more effective but comes at the cost of added overhead. In cases where the overhead outweighs the benefits of lazy evaluation, laziness can be suppressed directly in a data definition (by the unary operator !), or in the code by the standard function seq and by the infix operator $!. For more information about adding strictness to Haskell see [5]. Generally, it should not matter which evaluation strategy we choose — each has to provide the same result — but there is no guarantee that every strategy reaches it. More eager strategies can evaluate code that the other strategies omitted and get stuck in an infinite loop or raise an error. We can show the differences between the strategies, for example, when we define the function dbl, which doubles its first argument, as dbl x y = x + x and then call it with some expressions.
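
As a small sketch of these strictness tools (our own example, not code from the thesis), the following accumulating sum forces its accumulator with seq at every step, and $! forces the argument of print:

sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []       = acc
    go acc (x : xs) = let acc' = acc + x
                      in acc' `seq` go acc' xs   -- evaluate acc' before recursing

main :: IO ()
main = print $! sumStrict [1 .. 1000000]         -- $! forces print's argument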

Example of strict evaluation

Arguments are evaluated before the function application (the innermost first).

dbl (3 * 7) (4096 * 1048576)
dbl (3 * 7) 4294967296
dbl 21 4294967296
21 + 21
42

Example of normal evaluation

Arguments are evaluated after the function application (the outermost first).

dbl (3 * 7) (4096 * 1048576)
(3 * 7) + (3 * 7)
21 + (3 * 7)
21 + 21
42

Example of lazy evaluation

Arguments are evaluated after the function application, and if the argument was already evaluated, the intermediate result is reused. (Here it is executed in one step.)

dbl (3 * 7) (4096 * 1048576)
(3 * 7) + (3 * 7)
21 + 21
42

2.1.2 Purity and referential transparency

Every Haskell function should be pure (except for input and output, see below). A pure function is any function that always returns the same result for the same argument and does not cause any side effects. And because everything in Haskell is pure, the language is referentially transparent, so every expression can be substituted with its value without changing the behaviour of the program.

For instance, if we have functions sin and cos (trigonometric functions sine and cosine), we can write something like sin (pi / 2) + cos (pi / 2) which is the same as let x = pi / 2 in sin x + cos x. Trigonometric functions are

pure, therefore they will always return the same result for the same argument (in this case 1.0 + 0.0, which equals 1.0) and we need not worry about correctness.

Referential transparency has an impact on our way of programming: for example, there are no real variables in Haskell — we can define a shortcut for a value using a nullary function, but it is immutable. It also follows that every Haskell program has deterministic behaviour and thus can be verified and parallelized more easily.

2.1.3 Strong type system with inference

The type system is an important part of every programming language. It describes which types of values may be used in the program.

Haskell has a strong static type system, which means that the code is type-checked during compilation and every expression must have a corresponding type declaration. The declaration need not be written by the programmer; it can be automatically inferred from previously defined functions. For instance, if we have the function addOne x = x + 1 that adds the number one to x, we can infer that the parameter x will be a number. This is considered safe and the programmer is not bothered with an explicit type declaration for every function.
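
For illustration, the inferred type of addOne can also be written out explicitly (the signature is optional; Num is one of the type classes discussed next):

addOne :: Num a => a -> a   -- the type a compiler would infer
addOne x = x + 1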

The type system is also polymorphic — a type definition can contain so-called type variables that represent an arbitrary type. The scope of a type variable can be restricted to one or more data types using type classes. This is useful, because many data types share similar attributes. For example, floating-point numbers are defined by the data types Float and Double (single and double precision), which both belong to the type class Floating. So we do not need to think twice about floating-point precision and we can just use a type variable which is a member of the Floating class in order to make functions more generic.
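
A brief sketch of this idea (our own example): one definition serves both precisions, because the type variable is only required to belong to the Floating class.

circleArea :: Floating a => a -> a
circleArea r = pi * r * r              -- works for Float, Double, and any other Floating type

main :: IO ()
main = do
  print (circleArea (1.5 :: Float))    -- single precision
  print (circleArea (1.5 :: Double))   -- double precision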

2.1.4 Monadic input and output

In Haskell, a monad is an abstraction that is used primarily for everything that changes state. That does not necessarily mean input and output of data files or streams; it may be things like date and time manipulation, random number generation, exception handling, or data communication with an external device — in general, any function that cannot be pure.

A monad can be visualized as a wrapper around normal data types that cleanly separates the impure parts from the pure ones. But a monadic function is not in fact a regular function; it is rather a definition containing a description of what it does. We could say that the monad describes an action. For instance,

we have the data type String, which is used for representing text, and the standard library function getLine, which reads a line of text from the command line and passes it on. The return type of this function is not a regular String, but IO String — that denotes the IO monad, and these two types are completely different. The real value of this function is assigned at evaluation time; the loaded text has to be transformed to a regular type such as String with the left arrow operator (line ← getLine), which binds the text to a name.
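
A minimal sketch of this mechanism (our own example): getLine is an action of type IO String, and the left arrow binds the resulting String to a name inside the IO monad.

main :: IO ()
main = do
  line <- getLine                          -- line :: String, extracted from IO String
  putStrLn ("You have entered: " ++ line)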

This subject is very wide and there exist many sources of information about it — for example [6, 7, 8].

2.2 Evolution of Haskell

The history of development of Haskell has been described in [9] by the researchers themselves.

2.2.1 Early development

At the beginning of 1988, a meeting about functional programming took place. The design of a new standard common functional language was discussed and a committee was formed to make decisions about it. The language was named after Haskell Brooks Curry, the famous mathematician and logician. The initial language definition was published on 1st April 1990 and others followed soon afterwards:

Haskell 1.0 (1990)
The first Haskell Language Report was implemented by the HBC compiler and later by GHC (see below). Instead of monads, stream- and continuation-based input and output were used. The syntax was similar to the Miranda language.

Haskell 1.1 (1991)
The version that introduced the let expression and partially applied operators.

Haskell 1.2 (1992)
This specification was published in a special issue of ACM SIGPLAN Notices, a prestigious programming-languages newsletter, together with the first Haskell textbook [10].

Haskell 1.3 (1996)
Another four years of research bore fruit — this version presented the first comprehensive use of monads in functional programming. Type classes and algebraic data types were also improved. And because Haskell implementations had been growing up, a description of the libraries was included in the report.

Haskell 1.4 (1997)
This report improved on the previous version and introduced monad comprehensions — an idea that was dropped in the following report. Unicode was chosen as the default character set.

2.2.2 The stable Haskell 98

After more than ten years of cutting-edge development and enhancement, Haskell became stable. The Haskell 98 Report (February 1999) fully replaced the previous definitions. By that time, the committee had been unofficially dismissed and the language was opened up — everyone could make suggestions. The growing Haskell community stopped being limited to researchers from academia and work on real projects in Haskell had begun. The report was updated and published in book form four years later [11] and is considered the present de facto standard.

2.2.3 Haskell Prime and the future of Haskell

Haskell Prime (Haskell’)† is a successor to Haskell 98 that will include various extensions to the stable version. These extensions should be backwards compatible. A draft was released at the beginning of 2007 and concrete decisions were made a year later. The date of completion is currently unknown.

The master plan to create a new common functional language turned out well. Haskell is used worldwide and became successful despite its unofficial motto “avoid success at all costs” [12]. It also influenced other languages, such as F#, Python, Perl, and Scala.

†Specification available online at http://hackage.haskell.org/trac/haskell-prime/.

Research on functional languages continues and there are new ideas developed every day.

2.3 Haskell implementations

In spite of the fact that Haskell is well defined, the specifications are not rigorous (as described in [9, page 9]), which makes the implementations vary slightly.

2.3.1 GHC

The Glorious Glasgow Haskell Compilation System is a state-of-the-art Haskell compiler and interpreter that fully supports Haskell 98 and includes more than a dozen extensions to it. It can compile Haskell programs to the C and C-- languages or to native code for various platforms. The first version of this compiler was released on 1st April 1991, but the interpreter GHCi was provided only ten years later. In the latest versions, GHC developers are working on improving performance, mainly in parallel evaluation.

• http://haskell.org/ghc/

2.3.2 Hugs

Hugs is based on Gofer, an experimental language and interpreter created by Mark P. Jones. Gofer is no longer maintained and has been fully replaced by Hugs. The current version of Hugs (called Hugs 98) implements most of the Haskell 98 language definition and also contains several optional extensions.

Hugs just interprets the code (i.e. it does not create an executable) and was the best available Haskell interpreter before GHCi was released.

• http://haskell.org/hugs/

2.3.3 Yhc and nhc98

The nhc98 compiler complies with the Haskell 98 definition and includes a few extensions. This implementation of Haskell is very stable and lightweight. Unfortunately, it has poor Unicode support and is not available for 64-bit CPU architectures, so it cannot be recommended for use in the contemporary programming world.

Yhc is a fork of the nhc98 compiler and is intensively developed, but it is not finished yet. Unlike nhc98, it has native support for 64-bit architectures. It can be run as an interpreter with Yhe (the Yhc evaluator). ‡

Yhc and nhc98 are closely connected with Hat, the Haskell Tracer. Hat is a debugging tool that traces function calls. It also supports GHC.

• http://community.haskell.org/~ndm/yhc/
• http://haskell.org/nhc98/

2.3.4 Other implementations

HBC
HBC was something like a prototype implementation of the Haskell language by Lennart Augustsson. It supports only versions prior to Haskell 98, but it is complete and working.

• http://www.cs.chalmers.se/~augustss/hbc/hbc.

Helium
Helium is an interpreter of a dialect of Haskell that was designed for teaching purposes at the University of Utrecht. The dialect of the same name includes just a part of the Haskell language — the type system is simplified in order to provide clearer error messages that do not confuse Haskell beginners. It was successfully integrated into the teaching of functional programming, as described in [13].
• http://www.cs.uu.nl/wiki/Helium

EHC and UHC
EHC/UHC is a set of experimental compilers that was also developed at the University of Utrecht. Code is passed through a sequence of transformers presented by the authors in [14]. This approach can make the whole compilation process easier, because the code is translated by many simple transformations rather than a few complicated ones.

• http://www.cs.uu.nl/wiki/bin/view/Ehc

‡A GUI version of the interpreter also exists and is called Gyhe.

Jhc and LHC
Jhc is a Haskell compiler written by John Meacham. It is focused on efficiency and optimization. LHC is a fork of the Jhc project that later switched to the GHC front end. Both are actively developed.
• http://repetae.net/computer/jhc/
• http://lhc.seize.it/

DDC
The Disciplined Disciple Compiler is an implementation of the strict Haskell dialect Disciple that may have side effects outside the world of monads. Laziness is available only on demand and every object is mutable (i.e. can be modified).
• http://code.google.com/p/disciple/

Chapter 3

Syntactic analysis

In the programming-language world, one rule of survival is simple: dance or die. It is not enough to make a beautiful language. You must also make it easy for programs written in your beautiful language to interact with programs written in other languages.

—SIMON PEYTON JONES ∗ [7, page 36]

Every programming language is unambiguously defined by its grammar. Because most programming languages — including Haskell — are defined by a non-regular grammar (they cannot be processed with ordinary regular expressions), a parser has to be used.

Syntactic analysis is a process where the parser transforms the code of a program into a structure that is appropriate for machine compilation. The structure is mostly in the form of a parse tree. Lexical analysis chops the code into small chunks of data called tokens and can be part of syntactic analysis, but because the tokenization is handled with regular expressions, it is often separated.

3.1 Haskell code analysis tools

There are several Haskell tools that can be used for generating parsers or lexers (lexical analyzers). These tools are able to create an analyzer for an arbitrary language — it does not even need to be a programming language, it can be anything that we are able to describe with a formal grammar.

∗Apparently the most important person behind Haskell. He is a co-author of the language and worked as the lead designer of GHC.

3.1.1 Parsec

As a flagship among Haskell parser generators, Parsec supports both lexical and syntactic analysis. It does the job so well that implementations for other languages were written, such as JParsec for Java or FParsec for F#. Parsing is based on higher-order combinators, which are functions without free variables. Parsers are manipulated as first-class values and can be customized and combined [15]. Parsec is powerful, easy to use, and is nowadays distributed with the major Haskell implementations as a standard library.

• http://legacy.cs.uu.nl/daan/parsec.html
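
To give a flavour of the combinator style, here is a small sketch of ours (using the classic Parsec API, not code from the thesis) that parses a comma-separated list of integers:

import Text.ParserCombinators.Parsec

number :: Parser Integer
number = do
  digits <- many1 digit          -- one or more decimal digits
  return (read digits)

numbers :: Parser [Integer]
numbers = number `sepBy` char ','

main :: IO ()
main = print (parse numbers "(input)" "1,2,3")   -- prints: Right [1,2,3]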

3.1.2 Happy

Happy is a standard generator of syntax analyzers that creates parsing functions from a grammar definition via its preprocessor. The grammar is specified in the Backus–Naur form (BNF) and we need to write our own lexer or generate it with another tool if we do not want to parse raw data. This functionality is similar to the C language tools yacc and bison. It is possible to define several parsers with multiple starting points in one file of BNF definitions.

• http://haskell.org/happy/

3.1.3 Alex

If we do not want to write our own lexer for Happy (or a similar syntactic analysis tool) manually, we need a lexer generator. In C, we would use lex or flex; in Haskell, we would probably use Alex which is a complementary tool to Happy. Alex takes a set of regular expressions and generates Haskell code that matches input against those expressions and tokenizes it. Created tokens are used for further analysis.

• http://haskell.org/alex/

3.1.4 Other tools

The standard way for code processing in Haskell is to generate a parser with Parsec or with Happy in combination with Alex. But there are some interesting experimental tools that are not as popular as the above-mentioned generators.

Frown
As we can see from the project logo,† Frown is a parser generator directly inspired by Happy. It contains plenty of features — one stackless parsing output mode and three modes with a stack, predefined schemata, debugging and tracing. The latest version is a beta.
• http://www.informatik.uni-bonn.de/~ralf/frown/

Polyparse
Polyparse is not a single library, it is a whole set of parsing libraries. Much like Parsec, they consist of combinators, and each library is based on a different algorithm, which makes them suitable for a variety of tasks.
• http://www.cs.york.ac.uk/fp/polyparse/

Frisby
Frisby is a parser generator that utilizes the parsing expression grammar (PEG), a more recent alternative to BNF. There is at least one advantage to using PEG: the grammar is defined unambiguously, therefore every input has exactly one parse tree and we do not need any external lexers. But we have to pay a price — we cannot directly generate parsers as libraries.
• http://repetae.net/computer/frisby/

Pappy
Pappy is a prototype implementation of a packrat parser generator. The packrat algorithm is based on an old, largely overlooked tabular parsing algorithm and is quite fast, as described in [16]. The generated parser handles both lexical and syntactic analysis.
• http://pdos.csail.mit.edu/~baford/packrat/thesis/

3.2 Hasky — the base of the Haskell interpreter

Our aim is to write a stepwise Haskell interpreter for teaching purposes. The word “stepwise” means the ability to evaluate each reduction separately instead of only printing out the normal form of the input expression. As a part of this thesis, the first step towards such an interpreter was developed under the name Hasky.‡ The program is able to load Haskell modules, parse them, and manipulate the parsed code (see the appendix for the feature list). The implementation was designed to be easily expanded into a full-fledged interpreter.

†Just glance at the project pages. ‡Hasky is the Czech phonetic transcription of the word husky.

Hasky is written entirely in Haskell and it uses the parser Language.Haskell and the basic set of the Haskell 98 source code libraries.

3.2.1 Language.Haskell

Language.Haskell (distributed under the name haskell-src) is a Haskell library for Haskell code manipulation created by Simon Marlow.§ It can be found inside the GHC package extralibs. The code for syntactic analysis is generated with Happy; the lexical part is written by hand. Except for a few non-standard Haskell extensions, the parser is able to process any valid Haskell code. The parser takes a Haskell module and returns a parsed data structure, or displays an error message if the parsing fails.
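
A minimal sketch of how such a library call might look (the file name is made up; parseModule and prettyPrint are the functions exported by haskell-src):

import Language.Haskell.Parser (parseModule, ParseResult(..))
import Language.Haskell.Pretty (prettyPrint)

main :: IO ()
main = do
  source <- readFile "Example.hs"        -- a hypothetical module file
  case parseModule source of
    ParseOk tree           -> putStrLn (prettyPrint tree)    -- pretty-print the parse tree
    ParseFailed loc reason -> putStrLn ("Parse error at " ++ show loc ++ ": " ++ reason)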

3.2.2 The source code of the Haskell 98 libraries

Once we have selected the parsing library, we need some code to parse. We can take the Haskell code from the package base that is distributed with most of the compilers. The package is very extensive — it contains 2 MB of code and is adapted for particular Haskell implementations. This means that it includes many preprocessor directives whose purpose is to help generate completely different library code for every compiler.

But we want the lightweight way — we want to use only the libraries that are defined in [11]. The Haskell 98 Libraries Report defines the basic set of Haskell libraries, but does not specify the implementation of each function. For example, some IO functions such as openFile have to be implemented internally because of their low-level functionality. And it is rather pointless to copy the code page by page from the Report, so we have decided to use the code from the codebase of the project A Lexer for Haskell in Haskell¶ by Thomas Hallgren. This is a completed version that substitutes a proper foreign declaration for every missing function body, and thus is nearly ready to use. The whole code takes 100 KB of disk space, which means it is more appropriate for our purposes than the base library.

We had several problems with parsing some definitions. Firstly, there was the definition infixr 5 : which tells us that the list operator “:” has priority 5 and is infix and right-associative. The operator is built-in, so this is not legal Haskell code. Secondly, there were the data primitives; it is not a good idea to define them

§One of the principal developers of GHC, also the main developer of Happy, Alex, and Haddock, the Haskell documentation tool. ¶Available online at http://ogi.altocumulus.org/~hallgren/Talks/LHiH/base/tests/ HaskellLibraries/.

by enumeration. For instance, the number of elements of the numeric data type Int would be 2^n (where n depends on the actual CPU architecture), and the enumeration of the unlimited data type Integer is impossible. The data primitives defined in Haskell 98 are: Char, Int, Integer, Float, Double, Handle, and IO a. It is better to build them in too. And finally, the tuples. A tuple of each size has its own data constructor and tuples have to be defined at least up to size 15 (see [11, chapter 6.1.4]). For example, the pair (a tuple of size two) is defined as data (a,b) = (,) a b deriving (Eq, Ord, Bounded), which is not valid Haskell code either. The parser Language.Haskell supports tuples of any size, so we do not need those definitions anyway.

3.2.3 The Haskell module system

In cases where we have an application that is even a little bit sophisticated, it is better to decompose it into individual smaller pieces. A Haskell module is such a small part of a decomposed Haskell program. The whole specification of the module system in Haskell is described in [11, chapter 5]. There is also a formal analysis of the issue [17], but we are going to use just basic Haskell code, so we want the simplest possible implementation of the module system.

Haskell supports visibility control in modules: functions and data types can be hidden from the outside of a module and it is possible to import just a part of a module. The idea is that we do not need these features right now; in case of need, we can add them easily later. We just load the whole module into memory and all the functions from the currently loaded modules are visible globally.

Our simplified implementation of the Haskell module system uses two maps (associative arrays) named MapMods and MapFuns. The first map contains the loaded modules and the second one maintains a lookup table of functions in the individual modules. We can simply find out where a function is located and grab its definition or type signature. The syntax Module.function is supported, so there should be no problems with ambiguity. After loading a module, Hasky will try to resolve the module dependencies. Each module imported by the current one is processed unless it is already loaded. As for the performance — after loading all of the standard Haskell 98 libraries, Hasky takes under 10 MB of RAM.
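
The shapes of the two maps might look roughly like the following sketch (the type and function names here are illustrative only; the real Hasky definitions may differ):

import qualified Data.Map as Map
import Language.Haskell.Syntax (HsModule)

type ModuleName   = String
type FunctionName = String

type MapMods = Map.Map ModuleName HsModule        -- the loaded, parsed modules
type MapFuns = Map.Map FunctionName [ModuleName]  -- in which modules each function is defined

-- Resolve a possibly qualified name such as "PreludeList.zip".
lookupFun :: MapFuns -> String -> [ModuleName]
lookupFun funs name = case break (== '.') name of
  (moduleName, '.' : funName) -> filter (== moduleName) (Map.findWithDefault [] funName funs)
  _                           -> Map.findWithDefault [] name funs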

Chapter 4

Type evaluation

Whereas some declarative programmers only pay lip service to equational reasoning, users of functional languages exploit them every time they run a compiler, whether they notice it or not.

—PHILIP WADLER ∗ [18, page 12]

Because Haskell is a statically typed language, to verify that the code is correct we need to check that the types of the arguments passed to functions match the relevant parts of the types of those functions. This is called type-checking. The first problem to solve here is inferring the type of a given expression. We call this type evaluation.

4.1 Haskell’s type system in more detail

We have already given the reader an outline of the type system in Haskell inside the initial description of the language (section 2.1.3). Let us take a closer look at the individual parts.

4.1.1 Monomorphism

As we said earlier, every expression in Haskell has to have its own type. The type of a particular expression is denoted by a double colon. For example, the notation 'x' :: Char signifies that the character 'x' has the type Char. In a similar way, if we apply the function toUpper from the module Char, which takes any character and returns its uppercase equivalent, to this character, the result will have the type Char again. This is because the function itself has the type toUpper :: Char → Char,

∗The originator of the idea of type classes and monad application in functional programming.

23 which means that the function takes exactly one argument of the type Char and returns a value of the type Char. So the result of the application is 'X' which corresponds to the type toUpper 'x' :: Char.

The function toUpper is monomorphic. Every function that restricts each input and output value to one concrete type is monomorphic. For instance, if we have our own monomorphic function strLen that takes a string and returns the number of characters, it will have the type strLen :: String → Int. When we try to call that function with an argument that is not of the type String, the type-checker will reject the input and report an error. We can also be sure that every value produced by such a function will be of the type Int.
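
A possible definition of such a strLen (a sketch; the function is only hypothetical in the text above):

strLen :: String -> Int        -- restricted to strings: the function is monomorphic
strLen []       = 0
strLen (_ : cs) = 1 + strLen cs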

4.1.2 Polymorphism

Monomorphism can sometimes be too restrictive, e.g. when we want to work with lists. A list is an ordered homogeneous data structure that may contain items of a certain type. If we wanted to write a monomorphic function that takes a list and returns the number of elements in that list separately for each data type, it would be a tough proposition. It is better to make the function more generic and not restrict it to a particular input type.

That is the motivation for the introduction of type variables. A type variable can represent an arbitrary type, and every function that uses at least one such variable is considered polymorphic. Our previously mentioned function for counting the elements of a list is the standard Haskell function length. The function has the type length :: [a] → Int, where a is the type variable and the square brackets denote a list. In contrast to our function strLen, it can be used with a list of any type, for example with a list of numbers or with a list of other lists, whereas strLen works only with lists of characters (type [Char]†).

4.1.3 Type classes

The term class was adopted from object-oriented programming. A class is a group of objects that have similar characteristics. This concept can be used in the context of types too. For example, elementary arithmetic functions like addition, subtraction, or multiplication should work on every number, no matter whether it is a whole, rational, or real number. It is possible to define these operations with monomorphic functions, but it would be very laborious. And polymorphism is not appropriate either — it does not make sense to multiply, for instance, the character 'x' by the character 'y'.

†The type String is just a type synonym for the type [Char].

A type class restricts a type variable to a particular set of types. We have all sorts of type classes, like Num for numbers, Integral for whole numbers, or Ord for ordered objects that can be compared. The restriction of type variables is denoted by the rightwards double arrow; for example, the modulo function has the type mod :: (Integral a) ⇒ a → a → a.
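
As a small sketch of such a restriction (our own example), the following function compares its arguments and therefore works for every type in the Ord class, but for nothing else:

maxOfThree :: Ord a => a -> a -> a -> a
maxOfThree x y z = max x (max y z)
-- maxOfThree 1 7 3        evaluates to 7
-- maxOfThree 'a' 'c' 'b'  evaluates to 'c'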

4.1.4 Algebraic data types

It often happens that the built-in types in Haskell are not enough and we need a complex data structure. That is the reason for algebraic data types, which consist of one or more data constructors, and a constructor can optionally hold several data components. We can form our own with the keyword data or newtype.

For example, the standard Boolean data type is defined with two constructors and without any components. The definition is data Bool = True | False. It looks very similar to an enumeration in other programming languages. Another example is the recursive definition of the list. The structure of a list can be defined as data List a = Cons a (List a) | Nil. In this case, the defined constructors are Cons and Nil and the components are a and List a. The second component is a recursive use of the type List a, and the value held by a constructor can be easily extracted by pattern matching. It is a very natural way of handling types.

We can also assign one or more type classes to each defined algebraic data type with the keyword deriving or instance. The type classes add some abilities to the data type, like ordering (the type class Ord) or mapping to a string (the type class Show). Furthermore, we can designate our own type name with the type keyword. For instance, the more intuitive name for a list of characters is the string that is defined as type String = [Char].
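
A short sketch of ours pulling these pieces together: the List type from above, a derived Show instance, and a function that extracts the components by pattern matching.

data List a = Cons a (List a) | Nil
  deriving (Show)

listLen :: List a -> Int
listLen Nil           = 0
listLen (Cons _ rest) = 1 + listLen rest

main :: IO ()
main = print (listLen (Cons 'a' (Cons 'b' Nil)))   -- prints 2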

4.1.5 Type inference

Every function in Haskell should have its type declared. The type of a function can be either manually written or automatically inferred. The problem with inference is that the type of a function can occasionally be ambiguous, and in those cases we need a little help from the programmer. Also, the inferred type is sometimes excessively generic and it is better to restrict it. That is the reason why we have to be cautious and check every function for type consistency. If the type signature written by the user is valid, we should prefer it to the inferred type of the function.
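
A classic sketch of such an ambiguity (our own example): read can produce any readable type and show accepts any showable one, so without an annotation the compiler cannot pick a concrete type.

-- roundTrip s = show (read s)          -- rejected: ambiguous type variable
roundTrip :: String -> String
roundTrip s = show (read s :: Integer)  -- the annotation resolves the ambiguity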

4.2 Type-checking libraries

The type-checking functionality is usually built directly into a compiler, but there are some ways to use it separately.

4.2.1 GHC API

The latest versions of GHC provide a programming interface called the GHC API. Through the interface we have access to type-checking, code manipulation, and evaluation functions. Intensive development is under way (e.g. within the Google Summer of Code project), so the API is changing rapidly. The downside is that the current documentation is very sparse, but this will hopefully be rectified soon. Also, the price for such a rich interface is the size of the output binary, which is very large (on the order of tens of megabytes), because the bundle of GHC libraries is linked into it.

The GHC API was used by the experimental Haskell interpreter hint, which is basically just a wrapper around the API that provides a control monad.

• http://haskell.org/haskellwiki/GHC/As_a_library
• http://projects.haskell.org/hint/

4.2.2 Typing Haskell in Haskell

For lack of a comprehensive description of the Haskell type system, a paper was written about it [19] with an attached type-checker. The code is a sort of demonstration that illustrates how the type system in Haskell works. A set of type-evaluated standard Haskell libraries is provided.

• http://web.cecs.pdx.edu/~mpj/thih/

4.3 Type evaluation in Hasky

However, the described type-checking libraries are not compatible with the parse tree from the Language.Haskell library that we are using in Hasky. The GHC API uses the GHC internal code representation and the Typing Haskell in Haskell library works only with the parse tree used in Hugs. Our options for doing the type evaluation were to create our own implementation or to modify the parse tree.

First, we tried to implement the type-checker on our own, but it ended in failure. In short, implementing a type-checker for such a complex language is a very time-consuming job and we did not have that much time.

So we have decided to adjust the parse tree in order to type-check our code with the Typing Haskell in Haskell library. Our code matches some special cases (like [] or just the name of a function with a declared type signature) and otherwise recodes the input into the Hugs form and calls the inference function from the library. It does not cover every possible input yet, but it is working.

The basis of the type-checker is the Hindley–Milner type inference algorithm, which is described in [20, chapter 22]. Basically, the algorithm forms a system of equations from the particular type constraints and solves them. The solution is based on substitution and unification, which are provided by custom monads. If no solution exists, the types do not match.
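
To make the idea more concrete, here is a very small sketch of the unification step at the heart of the algorithm. It is illustrative only — all names are ours, and a real type-checker (including the one in [19]) also has to handle type schemes, classes, and generalization.

data Type = TVar String | TCon String | TApp Type Type
  deriving (Eq, Show)

type Subst = [(String, Type)]            -- a substitution maps type variables to types

apply :: Subst -> Type -> Type
apply s (TVar v)   = maybe (TVar v) id (lookup v s)
apply s (TApp l r) = TApp (apply s l) (apply s r)
apply _ t          = t

compose :: Subst -> Subst -> Subst       -- apply (compose s2 s1) behaves like apply s2 . apply s1
compose s2 s1 = [ (v, apply s2 t) | (v, t) <- s1 ] ++ s2

unify :: Type -> Type -> Maybe Subst     -- Nothing means the types do not match
unify (TVar v) t = bindVar v t
unify t (TVar v) = bindVar v t
unify (TCon a) (TCon b) | a == b = Just []
unify (TApp l1 r1) (TApp l2 r2) = do
  s1 <- unify l1 l2
  s2 <- unify (apply s1 r1) (apply s1 r2)
  return (compose s2 s1)
unify _ _ = Nothing

bindVar :: String -> Type -> Maybe Subst
bindVar v t
  | t == TVar v = Just []                -- nothing to do
  | occurs v t  = Nothing                -- occurs check: no infinite types
  | otherwise   = Just [(v, t)]

occurs :: String -> Type -> Bool
occurs v (TVar w)   = v == w
occurs v (TApp l r) = occurs v l || occurs v r
occurs _ _          = False

main :: IO ()
main = print (unify (TApp (TCon "[]") (TVar "a")) (TApp (TCon "[]") (TCon "Int")))
-- prints: Just [("a",TCon "Int")]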

Chapter 5

Conclusions and future work

SQL, Lisp, and Haskell are the only programming languages that I've seen where one spends more time thinking than typing.

—PHILIP GREENSPUN ∗

We have described the functional programming language Haskell and the tools that are used for code analysis. Furthermore, we have presented the first part of Hasky, the Haskell interpreter. Hasky is able to parse Haskell modules and perform basic type evaluation of code. Our work is adequately documented with the Haddock documentation generator and we have also written a brief overview of the program.

What about future work? The most important objective is to write the part that is responsible for executing Haskell code. Without it, Hasky cannot be considered a complete application, because the overall goal was to write a stepwise interpreter for teaching purposes. It is expected that the code execution part will be covered in a separate thesis. Also, we need an improved user interface which would integrate features like auto-completion of function names and browsing of previously entered commands. This can be done using the GNU Readline library. It is also possible to extend the interpreter with a graphical or a web interface.

∗Famous developer, computer science teacher, and Internet businessman.

Bibliography

[1] Paul Graham. Beating the averages. In Franz Developer Symposium, Harvard University, March 2001. Also available at URL http://paulgraham.com/avg.html (April 2009).

[2] Joe Armstrong. Erlang — a survey of the language and its industrial applications. In The 9th Symposium and Exhibition on Industrial Applications of Prolog (INAP’96), Hino (Tokyo), October 1996. Also available at URL http://erlang.se/publications/inap96.ps (April 2009).

[3] Janis Voigtländer, editor. Haskell Communities and Activities Report. 15th edition, November 2008. Also available at URL http://haskell.org/communities/11-2008/html/report.html (April 2009).

[4] John Hughes. Why functional programming matters. Computer Journal, 32(2):98–107, 1989. Also available at URL http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html (April 2009).

[5] Amanda Clare. Making Haskell programs faster and smaller, April 2002. Available at URL http://users.aber.ac.uk/afc/stricthaskell.html (April 2009).

[6] Philip Wadler. The essence of functional programming. In 19th Symposium on Principles of Programming Languages, Albuquerque, January 1992. ACM Press. Also available at URL http://homepages.inf.ed.ac.uk/wadler/papers/essence/essence.ps (April 2009).

[7] Simon Peyton Jones. Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell. In Tony Hoare, Manfred Broy, and Ralf Steinbruggen, editors, Engineering theories of software construction, pages 47–96. IOS Press, 2001. Also available at URL http://research.microsoft.com/en-us/um/people/simonpj/Papers/marktoberdorf/ (April 2009).

[8] Jeff Newbern. All about monads, 2003. Available at URL http://haskell.org/all_about_monads/html/ (April 2009).

[9] Paul Hudak, John Hughes, Simon Peyton Jones, and Philip Wadler. A history of Haskell: Being lazy with class. In The Third ACM SIGPLAN History of Programming Languages Conference, San Diego, June 2007. Also available at URL http://research.microsoft.com/en-us/um/people/simonpj/papers/history-of-haskell/ (April 2009).

[10] Paul Hudak, John Peterson, and Joseph Fasel. A gentle introduction to Haskell. SIGPLAN Notices, 27(5):1–52, May 1992. The updated version available at URL http://haskell.org/tutorial/.

[11] Simon Peyton Jones. Haskell 98 Language and Libraries: the Revised Report. Cambridge University Press, 2003. Also available at URL http://haskell.org/onlinereport/ (April 2009).

[12] Duncan Coutts, Isaac Potoczny-Jones, and Don Stewart. Haskell: Batteries included. In ACM SIGPLAN 2008 Haskell Symposium, Victoria, Canada, September 2008. ACM Press. Also available at URL http://www.cse.unsw.edu.au/~dons/papers/CPJS08.html (April 2009).

[13] Bastiaan Heeren, Daan Leijen, and Arjan van IJzendoorn. Helium, for learning Haskell. In ACM SIGPLAN 2003 Haskell Workshop, pages 62–71, New York, August 2003. ACM Press. Also available at URL http://people.cs.uu.nl/bastiaan/papers.html#helium (April 2009).

[14] Atze Dijkstra, Jeroen Fokker, and S. Doaitse Swierstra. The structure of the essential Haskell compiler, or Coping with compiler complexity. In 19th International Symposium on Implementation and Application of Functional Languages, pages 107–122, Freiburg, Germany, September 2007. University of Kent. Also available at URL http://www.cs.uu.nl/wiki/bin/view/Ehc/TheStructureOfTheEssentialHaskellCompiler (April 2009).

[15] Daan Leijen and Erik Meijer. Parsec: Direct style monadic parser combinators for the real world. Technical Report UU-CS-2001-27, Department of Computer Science, Universiteit Utrecht, 2001. Also available at URL http://legacy.cs.uu.nl/daan/download/papers/parsec-paper. (May 2009).

[16] Bryan Ford. Packrat parsing: a practical linear-time algorithm with backtracking. Master’s thesis, Massachusetts Institute of Technology, September 2002.

[17] Iavor S. Diatchki, Mark P. Jones, and Thomas Hallgren. A formal specification for the Haskell 98 module system. In ACM SIGPLAN 2002 Haskell Workshop, Pittsburgh, October 2002. ACM Press. Also available at URL http://ogi.altocumulus.org/~hallgren/hsmod/ (May 2009).

[18] Philip Wadler. How to declare an imperative. ACM Computing Surveys, 29(3):240–263, September 1997. Also available at URL http://homepages.inf.ed.ac.uk/wadler/papers/monadsdeclare/monadsdeclare.ps (April 2009).

[19] Mark P. Jones. Typing Haskell in Haskell. In Haskell99 Workshop, Paris, October 1999. Also available at URL http://web.cecs.pdx.edu/~mpj/thih/ (April 2009).

[20] Benjamin C. Pierce. Types and Programming Languages. The MIT Press, 2002.

[21] Simon Peyton Jones, Philip Wadler, Peter Hancock, and David Turner. The Implementation of Functional Programming Languages. Prentice Hall, 1987. Also available at URL http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/ (April 2009).

[22] Simon Peyton Jones and David R. Lester. Implementing Functional Languages: a tutorial. Prentice Hall, 1992. Also available at URL http://research.microsoft.com/en-us/um/people/simonpj/papers/pj-lester-book/ (April 2009).

[23] Simon Thompson. Haskell: The Craft of Functional Programming. Pearson Education Limited, 2nd edition, 1999.

[24] Bryan O’Sullivan, John Goerzen, and Donald Stewart. Real World Haskell. O’Reilly Media, 2009. Also available at URL http://book.realworldhaskell.org/read/ (April 2009).

Appendix — an overview of the Hasky interpreter

Hasky is provided with a text user interface for entering commands. The application shows the welcome message after start and tries to load the Prelude module from the directory haskell98, together with all dependencies.

EXAMPLE
Hasky -- version 0.01 (Haskell interpreter)
Loaded modules: Array, Char, Ix, List, Maybe, Numeric, Prelude,
PreludeBuiltin, PreludeIO, PreludeList, PreludeText, Ratio.

Every command should be prefixed with the colon symbol, except for evaluating an expression, but the evaluation is not part of this work and therefore is not implemented yet. The overview is sorted alphabetically.

:f Show the number of currently loaded functions and modules. Also display a list of all function names.

EXAMPLE
> :f
Number of the loaded modules: 2
Number of the loaded functions: 77
all and any asciiTab break chr concat concatMap cycle digitToInt drop dropWhile elem filter foldl foldl1 foldr foldr1 head init intToDigit isAlpha isAlphaNum isAlphanum isAscii isControl isDigit isHexDigit isLatin1 isLower isOctDigit isPrint isSpace isUpper iterate last length lexLitChar lines lookup map maxChar maximum minChar minimum notElem null or ord product protectEsc readLitChar repeat replicate reverse scanl scanl1 scanr scanr1 showLitChar span splitAt sum tail take takeWhile toLower toUpper unlines unwords unzip unzip3 words zip zip3 zipWith zipWith3

:h or :? Show the help — list of all available commands.

EXAMPLE
> :h
Help:
     evaluate the expression (does not work yet)
:f   show info about functions and modules
:h   this help
:i   show info about the function
:l   load the module
:m   show the parsed module (use carefully)
:p   parse the expression
:q   quit
:r   reload currently loaded modules
:s   show source code of the function
:t   the type of the expression
:u   unload the module

:i Show useful information about the function. For now, it is just the name of the module in which the function is defined.

EXAMPLE
> :i zip
Function zip is defined in modules: PreludeList.

:l Load the module and its dependencies if the module file exists. Also show the currently loaded modules.

EXAMPLE
> :l List
Loaded modules: Array, Char, Ix, List, Maybe, Numeric, Prelude,
PreludeBuiltin, PreludeIO, PreludeList, PreludeText, Ratio.

:m Show the parsed content of the loaded module. This is just a testing feature and its output can be very voluminous.

EXAMPLE
> :m Ratio
HsModule (SrcLoc {srcFilename = "Ratio", srcLine = 3, srcColumn = 1})
(Module "Ratio") (Just [HsEAbs (UnQual (HsIdent "Ratio")),
HsEAbs (UnQual (HsIdent "Rational")),
--- snip ---
(UnQual (HsIdent "simplest'"))) (HsVar (UnQual (HsIdent "d'"))))
(HsVar (UnQual (HsIdent "r'")))) (HsVar (UnQual (HsIdent "d"))))
(HsVar (UnQual (HsIdent "r"))))) []]]]]]

:p Show the parse tree for the expression. You can enter one or more Haskell statements if you separate them with semicolons.

EXAMPLE
> :p "Hello world!"
HsLit (HsString "Hello world!")
> :p 1 + 10
HsInfixApp (HsLit (HsInt 1)) (HsQVarOp (UnQual (HsSymbol "+"))) (HsLit (HsInt 10))
> :p a^b
HsInfixApp HsVar (UnQual (HsIdent "a")) HsQVarOp (UnQual (HsSymbol "^")) HsVar (UnQual (HsIdent "b"))
> :p length $ "foo" ++ "bar"
HsInfixApp HsInfixApp HsVar (UnQual (HsIdent "length")) HsQVarOp (UnQual (HsSymbol "$")) HsLit (HsString "foo") HsQVarOp (UnQual (HsSymbol "++")) HsLit (HsString "bar")
> :p (\x -> x) 42; y = [x | x <- [1..]]
HsApp HsParen HsLambda [HsPVar (HsIdent "x")] HsVar (UnQual (HsIdent "x")) HsLit (HsInt 42);
HsPatBind HsPVar (HsIdent "y") HsUnGuardedRhs HsListComp HsVar (UnQual (HsIdent "x")) [HsGenerator HsPVar (HsIdent "x") HsEnumFrom HsLit (HsInt 1)] []

:q Quit the Hasky interpreter.

EXAMPLE
> :q
Woof!

:r Reload the currently loaded modules and show them all.

EXAMPLE
> :r
Reloading modules...
Loaded modules: Array, Char, Ix, List, Maybe, Numeric, Prelude,
PreludeBuiltin, PreludeIO, PreludeList, PreludeText, Ratio.

:s Show the source code of the function. The function has to be present in one of the loaded modules.

EXAMPLE
> :s uncurry
uncurry f p = f (fst p) (snd p)
> :s (!!)
xs !! n | n < 0 = error "Prelude.!!: negative index"
[] !! _ = error "Prelude.!!: index too large"
(x : _) !! n | n == 0 = x
(_ : xs) !! n = xs !! (n - 1)

:t Show the type of the expression.

EXAMPLE
> :t isNothing
isNothing :: Maybe a -> Bool
> :t (>>=)
(>>=) :: m a -> (a -> m b) -> m b
> :t [[1], [2, 3]]
[[1, 2], [3]] :: [[Int]]
> :t "Everybody likes huskies!"
"Everybody likes huskies!" :: String

:u Unload the module if it is loaded. Also show the currently loaded modules.

EXAMPLE
> :u Numeric
Loaded modules: Array, Char, Ix, List, Maybe, Prelude,
PreludeBuiltin, PreludeIO, PreludeList, PreludeText, Ratio.
