Syntactic Analysis and Type Evaluation in Haskell
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Syntactic analysis and type evaluation in Haskell

BACHELOR’S THESIS

Pavel Dvořák

Brno, Spring 2009

Declaration

I hereby declare that this thesis is my original work and that, to the best of my knowledge and belief, it contains no material previously published or written by another author except where the citation has been made in the text.

Adviser: RNDr. Václav Brožek

Acknowledgements

I am grateful to my adviser Vašek Brožek for his time, great patience, and excellent advice, to Libor Škarvada for his worthwhile lectures, to Marek Kružliak for the beautiful Hasky logo, to Miran Lipovača for his cool Haskell tutorial,∗ to Honza Bartoš, Jirka Hýsek, Tomáš Janoušek, Martin Milata, Peter Molnár, Štěpán Němec, Tomáš Staněk, and Matúš Tejiščák for their important remarks, to the rest of the Haskell community, and to everybody that I forgot to mention. Thank you all.

∗ Available online at http://learnyouahaskell.com/.

Abstract

The main goal of this thesis has been to provide a description of Haskell and basic information about source code analysis of this programming language. That includes Haskell standards, implementations, syntactic and lexical analysis, and its type system. Another objective was to write a library for syntactic and type analysis that can be used in further development.†

Keywords

Haskell, functional programming, code analysis, syntactic analysis, parsing, type evaluation, type-checking, type system

† Source code available online at http://hasky.haskell.cz/.

Contents

1 Introduction
  1.1 Why functional programming?
  1.2 Structure of the thesis
2 The Haskell language
  2.1 Main Haskell features
    2.1.1 Lazy evaluation
    2.1.2 Purity and referential transparency
    2.1.3 Strong type system with inference
    2.1.4 Monadic input and output
  2.2 Evolution of Haskell
    2.2.1 Early development
    2.2.2 The stable Haskell 98
    2.2.3 Haskell Prime and the future of Haskell
  2.3 Haskell implementations
    2.3.1 GHC
    2.3.2 Hugs
    2.3.3 Yhc and nhc98
    2.3.4 Other implementations
3 Syntactic analysis
  3.1 Haskell code analysis tools
    3.1.1 Parsec
    3.1.2 Happy
    3.1.3 Alex
    3.1.4 Other tools
  3.2 Hasky – the base of the Haskell interpreter
    3.2.1 Language.Haskell
    3.2.2 The source code of the Haskell 98 libraries
    3.2.3 The Haskell module system
4 Type evaluation
  4.1 Haskell’s type system in more detail
    4.1.1 Monomorphism
    4.1.2 Polymorphism
    4.1.3 Type classes
    4.1.4 Algebraic data types
    4.1.5 Type inference
  4.2 Type-checking libraries
    4.2.1 GHC API
    4.2.2 Typing Haskell in Haskell
  4.3 Type evaluation in Hasky
5 Conclusions and future work

Chapter 1
Introduction

Be careful, Haskell is addictive. The first monad is free.
SHAE ERISSON ∗

1.1 Why functional programming?

For many years, functional programming did not belong to the category of mainstream paradigms. In fact, the general programming public considered it an academic plaything. But modern functional languages, such as Haskell, Scala, F#, and Erlang, have recently gained popularity. When hardware became cheaper than the programmer’s work (a consequence of Moore’s law), code readability and maintainability started being considered more important than processor time and memory consumption. And with the arrival of affordable multicore processors, parallelization has become a general concern.

There are some successful commercial projects which show that the use of functional programming can be a better way of writing programs than more established approaches (good examples are [1], [2, pages 6–8], and [3, pages 75–78]).
The chief reason for this success lies in modularity, which improves productivity in such large projects, as described in [4]. Besides that, because functional programming is based on lambda calculus, it is highly formal and thus suitable for applications in exact science. This is why functional programming matters and why we definitely need high-quality code analysis tools for these languages.

∗ One of the #haskell IRC channel founders (also known as shapr).

1.2 Structure of the thesis

The following chapter gives a cursory overview of the Haskell programming language, describes its history, and compares its implementations.

The third chapter deals with syntactic and lexical analysis. It describes the current tools used for this purpose in Haskell and introduces Hasky, our attempt to write the beginnings of a simplified Haskell interpreter.

Finally, the fourth chapter covers Haskell’s type system and its implementation in Hasky.

Chapter 2
The Haskell language

Haskell is faster than C++, more concise than Perl, more regular than Python, more flexible than Ruby, more typeful than C#, more robust than Java, and has absolutely nothing in common with PHP.
AUDREY TANG ∗

The only supported paradigm in Haskell is functional programming, which belongs to the family of declarative paradigms. The declarative paradigm stands in fundamental contrast to the imperative paradigm, which includes procedural and object-oriented programming. In imperative programming, we describe the program as a sequence of commands that change its state.

In functional programming, we evaluate expressions and try to reach a term in normal form (the result). The transformation from one expression to another is carried out by substituting function names with their corresponding definitions. So we are concerned with applying functions to values, just like in mathematics.

Factorial computation is the classic example of a simple Haskell program. Let us define the factorial function fact as follows:

    fact 0 = 1
    fact n = n * fact (n - 1)

This common technique for defining a function in Haskell is called pattern matching. The first equation says that the factorial of zero is one. The second equation is recursive and defines the factorial of a non-zero number as that number multiplied by the factorial of its predecessor. In other words, the factorial of a number n is the product of all positive integers less than or equal to n. This definition is not perfect: when we call the function with a negative number, the program gets stuck in an infinite loop. There are better ways of defining the factorial function, but this one is convenient for illustrative purposes.

∗ Author of Pugs, first functioning (and functional) implementation of Perl 6.

If we want to know the factorial of a number, say 3, we call the fact function with this parameter and the expression is evaluated step by step using the definition (each line follows from the previous one by one or more evaluation steps):

    fact 3
    3 * fact (3 - 1)
    3 * 2 * fact (2 - 1)
    3 * 2 * 1 * fact (1 - 1)
    3 * 2 * 1 * 1
    3 * 2 * 1
    3 * 2
    6

As a programming language, Haskell has a lot of interesting features.

2.1 Main Haskell features

2.1.1 Lazy evaluation

The default evaluation strategy in Haskell is lazy evaluation. Together with normal evaluation, it is a non-strict strategy, essentially the opposite of the strict evaluation used in the majority of programming languages. Instead of passing arguments to a function by value (or in a similar way), non-strict strategies evaluate function arguments only when necessary. This has several advantages: we can work with infinite structures and we do not have to evaluate unnecessary code.

The main difference between normal and lazy evaluation is that lazy evaluation reuses intermediate results, so an argument is evaluated at most once. This is usually more efficient but comes at the cost of added overhead.
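
The two advantages of non-strict evaluation named above, infinite structures and skipping unnecessary code, can be illustrated with a minimal sketch. The names squares, firstSquares, and ignoredError are hypothetical and chosen only for this illustration; they do not come from Hasky.

```haskell
-- An infinite list of squares (hypothetical name). No element is
-- computed until something demands it.
squares :: [Integer]
squares = map (^ 2) [1 ..]

-- Only the first five elements of the infinite list are evaluated.
firstSquares :: [Integer]
firstSquares = take 5 squares

-- The second component of the pair is never demanded,
-- so the call to error is never evaluated.
ignoredError :: Integer
ignoredError = fst (42, error "never evaluated")

main :: IO ()
main = do
  print firstSquares
  print ignoredError
```

Under a strict strategy, building squares would never terminate and the pair in ignoredError would raise the error before fst could be applied.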
In cases where the overhead is greater than the benefits of lazy evaluation, laziness can be suppressed directly in a data definition (by the strictness flag !), or in the code by the standard function seq and by the infix operator $!. For more information about adding strictness to Haskell see [5].

In general, the choice of evaluation strategy does not affect the result: every strategy that terminates yields the same value. There is no guarantee, however, that every strategy terminates. More eager strategies can evaluate code that other strategies would skip, and so get stuck in an infinite loop or raise an error. We can show the differences between strategies, for example, by defining the function dbl, which doubles its first argument, as

    dbl x y = x + x

and then calling it with some expressions.

Example of strict evaluation

Arguments are evaluated before the function application (the innermost first).

    dbl (3 * 7) (4096 * 1048576)
    dbl (3 * 7) 4294967296
    dbl 21 4294967296
    21 + 21
    42

Example of normal evaluation

Arguments are evaluated after the function application (the outermost first).

    dbl (3 * 7) (4096 * 1048576)
    (3 * 7) + (3 * 7)
    21 + (3 * 7)
    21 + 21
    42

Example of lazy evaluation

Arguments are evaluated after the function application, and if an argument has already been evaluated, the intermediate result is reused. (Both occurrences of 3 * 7 are evaluated in a single step.)

    dbl (3 * 7) (4096 * 1048576)
    (3 * 7) + (3 * 7)
    21 + 21
    42

2.1.2 Purity and referential transparency

Every Haskell function should be pure (except for input and output, see below). A pure function always returns the same result for the same argument and has no side effects. Because everything in Haskell is pure, the language is referentially transparent: every expression can be replaced by its value without changing the meaning of the program.
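
Returning to the dbl function from section 2.1.1, the following minimal sketch shows how lazy evaluation skips the unused argument and how seq, $!, and a strictness flag add strictness. Apart from dbl, all names (failing, lazyResult, seqResult, strictResult, Sample, sampleValue) are hypothetical and serve only as illustration.

```haskell
-- dbl as defined in section 2.1.1; its second argument is never used.
dbl :: Integer -> Integer -> Integer
dbl x _ = x + x

-- Hypothetical argument whose evaluation would raise an error.
failing :: Integer
failing = error "this argument must not be evaluated"

-- Lazy evaluation never demands the second argument of dbl.
lazyResult :: Integer
lazyResult = dbl (3 * 7) failing

-- seq evaluates y to weak head normal form before the result
-- of dbl is returned.
seqResult :: Integer
seqResult = let y = 4096 * 1048576 in y `seq` dbl (3 * 7) y

-- ($!) is strict application: the argument is forced before the call.
strictResult :: Integer
strictResult = dbl (3 * 7) $! (4096 * 1048576)

-- A strictness flag (!) in a data definition: the field is forced
-- whenever the constructor application is evaluated.
data Sample = Sample !Integer

sampleValue :: Sample -> Integer
sampleValue (Sample n) = n

main :: IO ()
main = do
  print lazyResult
  print seqResult
  print strictResult
  print (sampleValue (Sample (3 * 7)))
```

Because dbl is pure, each of the three dbl calls can be replaced by its value without affecting the rest of the program. Passing failing through $! instead of the plain application in lazyResult would force the argument and raise the error, which is the behaviour of the more eager strategies described in section 2.1.1.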