Type-Inference Based Deforestation of Functional Programs

Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of the Rheinisch-Westfälische Technische Hochschule Aachen for the academic degree of Doktor der Naturwissenschaften (Doctor of Natural Sciences), submitted by

Diplom-Informatiker Olaf Chitil, from Süchteln, now Viersen

Referees: Universitätsprofessor Dr. rer. nat. Klaus Indermark and Universitätsprofessor Dr. rer. nat. Michael Hanus

Date of the oral examination: 27 October 2000

Abstract

In lazy functional programming modularity is often achieved by using intermediate data structures to combine separate parts of a program. Each intermediate data structure is produced by one part and consumed by another one. Deforestation optimises a functional program by transformation into a program which does not produce such intermediate data structures.

In this thesis we present a new method for deforestation, which combines a known method, short cut deforestation, with a new analysis that is based on type inference.

Short cut deforestation eliminates an intermediate list by a single, local transformation. In return, short cut deforestation expects both producer and consumer of the intermediate list in a certain form. Whereas the required form of the consumer is generally considered desirable in a well-structured program anyway, the required form of the producer is only a crutch to enable deforestation. Hence only the list-producing functions of the standard libraries were defined in the desired form and short cut deforestation has been confined to compositions of these functions.

Here, we present an algorithm which transforms an arbitrary producer into the required form. Starting from the observation that short cut deforestation is based on a parametricity theorem of the second-order typed λ-calculus, we show how the construction of the required form can be reduced to a type inference problem. Typability for the second-order typed λ-calculus is undecidable, but we only need to solve a partial type inference problem. For this problem we develop an algorithm based on the well-known Hindley-Milner type inference algorithm.

The transformation of a producer often requires inlining of some function definitions. Type inference even indicates which function definitions need to be inlined. However, only limited inlining across module boundaries is practically feasible. Therefore, we extend the previously developed algorithm to split a function definition into a worker definition and a wrapper definition. We only need to inline the small wrapper definition, which transfers all information required for deforestation.

The flexibility of type inference allows us to remove intermediate lists which original short cut deforestation cannot remove, even with hand-crafted producers. In contrast to most previous work on deforestation, we give a detailed proof of completeness and semantic correctness of our transformation.

Acknowledgements

My supervisor Prof. Klaus Indermark introduced me to the foundations of functional programming and taught me to precisely formulate my statements. He employed me at his chair, left me much freedom in selecting a subject for my thesis and showed trust in me all the time. Dr. Markus Mohnen introduced me to the area of optimisation through program transformation in general and deforestation in particular. The enthusiasm of Christian Tüffers for his Diplomarbeit on deforestation raised my interest in the area. Many people in and outside Aachen discussed the topic of my thesis with me. Of these Frank Huch has to be mentioned first. We talked so often about my deforestation method and he had non-enumerably many ideas for improvement. Prof. Michael Hanus was always helpful and agreed to be second examiner although my thesis has no logical elements. Prof. Simon Peyton Jones pointed out several shortcomings in the first versions of my deforestation method and answered all my questions about the internals of the Glasgow Haskell compiler. Matthias Höff, Dr. Armin Kühnemann and I compared several deforestation techniques, thus improving my understanding of these techniques. This was possible because Armin and Prof. Heiko Vogler had invited me to Dresden. Prof. Patricia Johann and Prof. Franklyn Turbak read versions of the thesis and made numerous suggestions for improvement. All the members of the Lehrstuhl für Informatik II and the Lehr- und Forschungsgebiet für Informatik II at the RWTH Aachen made working there such a pleasure, especially my office mates Dr. George Botorog and Thomas Richert. Finally, working all these years on this thesis would not have been possible without the support of my friends and my family. I thank all of you.

Contents

1 Introduction
  1.1 Intermediate Data Structures for Modularity
  1.2 Deforestation
  1.3 Short Cut Deforestation
  1.4 Type Inference Identifies List Constructors
  1.5 Structure of the Thesis

2 The Functional Language F
  2.1 Syntax
  2.2 Substitutions and Renaming of Bound Variables
    2.2.1 Substitutions
    2.2.2 Renaming of Bound Variables
  2.3 Type System
  2.4 Evaluation of Programs
  2.5 Program Transformations
  2.6 Contextual Equivalence
  2.7 Kleene Equivalence
  2.8 The Short Cut Fusion Rule
  2.9 A Cost Model
    2.9.1 A Semantics for Lazy Evaluation
    2.9.2 Improvement

3 Type-Inference Based Deforestation
  3.1 List Abstraction through Type Inference
    3.1.1 Basic List Abstraction
    3.1.2 Polymorphic List Constructors
  3.2 Inlining of Definitions
    3.2.1 Determination of Variables that Need to be Inlined
    3.2.2 Instantiation of Polymorphism
  3.3 The Worker/Wrapper Scheme
    3.3.1 Functions that Produce Several Lists
    3.3.2 List Concatenation
    3.3.3 Recursively Defined Producers
  3.4 Functions that Consume their Own Result
    3.4.1 A larger example: inits
    3.4.2 Deforestation Changes Complexity
    3.4.3 Inaccessible Recursive Arguments

4 List Abstraction And Type Inference
  4.1 The Instance Substitution
  4.2 List Replacement Phase
    4.2.1 Locally Bound Type Variables
    4.2.2 The List Replacement Algorithm
    4.2.3 Properties of the List Replacement Algorithm
  4.3 Type Inference Phase
    4.3.1 Idempotent Substitutions
    4.3.2 The Unification Algorithm U
    4.3.3 Consistency of Types of Constructor Variables
    4.3.4 Towards a Type Inference Algorithm
    4.3.5 The Type Inference Algorithm T
    4.3.6 The Consistency Closure Algorithm C
    4.3.7 Principal Consistent Typing
  4.4 Instantiation of Variables and Abstraction Phase
    4.4.1 Superfluous Type Inference Variables
    4.4.2 Instantiation of Constructor Variables
    4.4.3 Merge of Constructor Variables
    4.4.4 Abstraction
  4.5 Properties and Implementation
  4.6 Alternative List Abstraction Algorithms

5 The Deforestation Algorithm
  5.1 The Worker/Wrapper Split Algorithm
    5.1.1 Non-Recursive Definitions
    5.1.2 Recursive Definitions
    5.1.3 Polymorphic Recursion
    5.1.4 Traversal Order
  5.2 The Complete Deforestation Algorithm
    5.2.1 Avoiding Inefficient Workers
    5.2.2 Consumers in foldr form
    5.2.3 Improvement by Short Cut Deforestation
    5.2.4 No Duplication of Evaluation — Linearity is Irrelevant
  5.3 Measuring an Example: Queens

6 Extensions
  6.1 A Worker/Wrapper Scheme for Consumers
    6.1.1 Scheme A
    6.1.2 Scheme B
  6.2 Deforestation Beyond Arguments of foldr
  6.3 More Efficient Algorithms
  6.4 Other Data Types than Lists

7 Other Deforestation Methods
  7.1 Short Cut Deforestation
    7.1.1 Gill’s Worker/Wrapper Scheme
    7.1.2 Limitations of build
    7.1.3 Warm Fusion
    7.1.4 Short Cut Fusion in the Glasgow Haskell compiler
  7.2 Hylomorphism Fusion
  7.3 Unfold-Fold Transformations
  7.4 Attribute Grammar Composition
  7.5 Further Methods

8 Conclusions

A Rules

B Standard List Functions Used in the Program Queens

Chapter 1

Introduction

A primary feature of a good programming language for productive software development is that it supports the writing of highly modular programs. A small program module which is weakly coupled with other modules can be both coded and debugged quickly. Furthermore, it can be reused in several different contexts. Even a complex task should be performed by a small module which uses other small modules.

1.1 Intermediate Data Structures for Modularity

In functional programming functions are the basic modules from which programs are built. A powerful form of modularity is based on the use of intermediate data structures. Two separately defined functions can be glued together by an intermediate data structure that is produced by one function and consumed by the other. For example, the function any, which tests whether any element of a list xs satisfies a given predicate p, may be defined as follows in Haskell [PH+99]:

any :: (a -> Bool) -> [a] -> Bool

any p xs = or (map p xs)

The function map applies p to all elements of xs, yielding a list of boolean values. The function or combines these boolean values with the logical or operation (||). So we defined any by gluing together the producer map p xs and the consumer or by an intermediate list.

We survey some more examples. A function anyTree shall test whether any element of a tree satisfies a given predicate. We can define this function by composing any with a general purpose function treeToList which simply produces a list of all elements stored in a tree.

anyTree :: (a -> Bool) -> Tree a -> Bool

anyTree p xs = any p (treeToList xs)
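The thesis leaves treeToList unspecified at this point; a minimal Haskell sketch of what such a function might look like, assuming the binary tree type used later in Chapter 2 (where Leaf carries no element), is:

data Tree a = Leaf | Node a (Tree a) (Tree a)

-- collect all elements of a tree into a list, left subtree first
treeToList :: Tree a -> [a]
treeToList Leaf         = []
treeToList (Node x l r) = treeToList l ++ (x : treeToList r)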

A list is also a suitable intermediate data structure for defining the factorial function:

factorial :: Int -> Int

factorial n = prod [1..n]

The expression [1..n] produces the list of all integers from 1 to n and the function prod computes the product of the list elements.

On a larger scale, intermediate data structures may be used to glue together the phases of a compiler. For example, the lexical analysis phase produces a list of symbols which is consumed by the parser.

In his seminal paper “Why Functional Programming Matters” [Hug89] John Hughes points out that lazy functional languages make modularity through intermediate data structures practicable (see also [ASS96]). Functional programming languages can be classified as either eager or lazy, according to their order of evaluation. As in standard imperative languages, in functional languages with eager evaluation, such as Scheme [KCR98] and ML [MTHM97], the arguments to a function are evaluated before the function is called. In contrast, in lazy languages, such as Haskell and Clean [Pv98], arguments to a function are evaluated in a demand-driven fashion. The arguments are initially passed in unevaluated form and are evaluated only when (and if) the computation needs the result to continue.

Considering our definition of any we note that with eager evaluation the whole intermediate boolean list is produced before it is consumed by the function or.

any (> 2) [1,2,3,4,5,6,7,8,9,10]
or (map (> 2) [1,2,3,4,5,6,7,8,9,10])
or [False, False, True, True, True, True, True, True, True, True]
True

In languages with eager evaluation intermediate data structures often require large space. In contrast, lazy evaluation ensures that the boolean list is produced one cell at a time. Such a cell is immediately consumed by or and becomes garbage, which can be reclaimed automatically. Hence the function any can run in constant space. Furthermore, when or comes across the value True, the production of the list is aborted. Thus the termination condition is separated from the “loop body”.

any (> 2) [1,2,3,4,5,6,7,8,9,10]
or (map (> 2) [1,2,3,4,5,6,7,8,9,10])
or (False : (map (> 2) [2,3,4,5,6,7,8,9,10]))
or (map (> 2) [2,3,4,5,6,7,8,9,10])
or (False : (map (> 2) [3,4,5,6,7,8,9,10]))
or (map (> 2) [3,4,5,6,7,8,9,10])
or (True : (map (> 2) [4,5,6,7,8,9,10]))
True

We can see from the discussion why the use of intermediate data structures as glue has become popular in lazy functional programming.

1.2 Deforestation

Nonetheless this modular programming style does not come for free. Each list cell has to be allocated, filled, taken apart and finally garbage collected. The following monolithic definition of any is more efficient than the modular one, because it does not construct an intermediate list.

any p [] = False
any p (x:xs) = p x || any p xs

It is the aim of deforestation algorithms to transform automatically a modular functional program which uses intermediate data structures as glue into another one which does not produce these intermediate data structures. We say that the producer and the consumer of a data structure are fused. Besides removing the costs of an intermediate data structure, deforestation also brings together subexpressions of producer and consumer which previously were separated by the intermediate data structure. Thus new opportunities for further optimising transformations arise. In the deforested example program, the compiler might inline the definition of (||) and with a few simple transformations obtain the slightly more efficient definition:

any p [] = False
any p (x:xs) = if (p x) then True else (any p xs)

This definition directly uses the primitive if-then-else construct instead of calling the function (||).

Deforestation may also be applied to languages with eager evaluation. For these languages deforestation makes modularity through intermediate data structures considerably more attractive. However, in eager languages deforestation usually changes the space complexity and may even turn a non-terminating program into a terminating one. As an example of the latter phenomenon note that eager evaluation of the expression

any (\x -> 1/x > 0.5) [1,0]

successfully terminates for a deforested definition of any, but not for the modular definition. In the latter case a runtime error is caused by a division by zero. Hence it seems vital to explicitly mark all compositions of functions that have to be deforested. In the end this marking means that the program is no longer written in an eager language but in a language with a complex mix of eager and lazy evaluation.

The deforestation method which we present in this thesis only depends on the syntax and type system of the programming language. So it can be applied to both lazy and eager functional languages. However, we prove its correctness and discuss its effect on efficiency only for a lazy language.

Deforestation is related to the optimisation of nested queries in database query languages. Some functional deforestation techniques have been transferred successfully to this area [Feg93].

1.3 Short Cut Deforestation

Although there is an extensive literature on various deforestation methods, their implementation in real compilers proved to be difficult. Therefore, a simple deforestation method, called cheap or short cut deforestation, was developed [GLP93, GP94, Gil96].

The fundamental idea of short cut deforestation is to restrict deforestation to intermediate lists which are consumed by the function foldr. Lists are the most common intermediate data structures in functional programs. The higher-order function foldr uniformly replaces in a list all occurrences of the constructor (:) by a given function ⊕ and the empty list constructor [] by a given constant n:¹

foldr (⊕) n [x1, x2, x3, ..., xk]  =  x1 ⊕ (x2 ⊕ (x3 ⊕ ( ... (xk ⊕ n) ... )))

¹ Note that [x1, x2, x3, ..., xk] is only syntactic sugar for x1:(x2:(x3:( ... (xk:[]) ... ))).

So if foldr replaces the list constructors in a list which is produced by a term M^P at runtime, then short cut deforestation simply replaces the list constructors already at compile time. However, the naïve transformation rule

foldr M^(:) M^[] M^P  ⇝  M^P[M^(:)/(:), M^[]/[]]

which replaces all list constructors in M^P is wrong. Consider M^P = (map p [1,2]). Here the constructors in [1,2] are not to be replaced but those in the definition of map, which is not even part of M^P.

Therefore, we need the producer M^P in a form such that the constructors which construct the intermediate list are explicit and can be replaced easily. The convenient solution is to have the producer in the form (\v^(:) v^[] -> M′) (:) [], where the abstracted variables v^(:) and v^[] mark the positions of the intermediate list constructors (:) and []. Then fusion is performed by the simple rule:

foldr M^(:) M^[] ((\v^(:) v^[] -> M′) (:) [])  ⇝  (\v^(:) v^[] -> M′) M^(:) M^[]

The rule removes the intermediate list constructors. Subsequent reduction steps put the consumer components M^(:) and M^[] into the places which were before held by the constructors. We call \v^(:) v^[] -> M′ the producer skeleton, because it is equal to the producer M^P except that the constructors of the result list are abstracted.

We observe that in general the types of M^(:) and M^[] are different from the types of (:) and []. Hence, for this transformation to be type correct, the producer skeleton must be polymorphic. So we finally formulate short cut fusion as follows: Let A be a type and c be a type variable. If the expression P has the type (A -> c -> c) -> c -> c, then we may apply the transformation

foldr M^(:) M^[] (P (:) [])  ⇝  P M^(:) M^[]

Usually, the producer skeleton P has the form \v^(:) v^[] -> M′, but this is not required for the semantic correctness of the transformation. Strikingly, the polymorphic type of P already guarantees the correctness. Intuitively, P can only construct its result of type c from its two arguments, because only these have matching types. Formally, the transformation is an instance of a parametricity or free theorem [Wad89, Pit00].
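A small worked instance of the rule (our own illustration, not an example from the thesis): take A = Int, the consumer foldr (+) 0, and the producer skeleton P = \v^(:) v^[] -> v^(:) 1 (v^(:) 2 v^[]), which has the required polymorphic type (Int -> c -> c) -> c -> c. Then

foldr (+) 0 ((\v^(:) v^[] -> v^(:) 1 (v^(:) 2 v^[])) (:) [])
  ⇝  (\v^(:) v^[] -> v^(:) 1 (v^(:) 2 v^[])) (+) 0
  =  1 + (2 + 0)

The intermediate list 1:(2:[]) is never constructed; the consumer components (+) and 0 take the places of the abstracted constructors directly.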

The original short cut deforestation method requires a list producer to be defined explicitly in terms of a polymorphic producer skeleton. To easily recognise the producer skeleton and to ensure that it has the required polymorphic type, a special function build with a second-order type is introduced:

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

The local quantification of the type variable b in the type of build ensures that the argument of build is of the desired polymorphic type. Standard Haskell does not have second-order types. Hence, a compiler needs to be extended to support the function build. With build the short cut fusion rule can be written as follows:

foldr M^(:) M^[] (build P)  ⇝  P M^(:) M^[]

This transformation rule can easily be implemented in a compiler, hence the names cheap or short cut deforestation. The idea is that the compiler writer defines all list-manipulating functions in the standard libraries, which are used extensively by programmers, in terms of build and foldr.
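To make this concrete, here is a hedged Haskell sketch of our own (GHC-style notation with rank-2 types; not the thesis's or GHC's actual library code) of a producer written in build form next to a foldr consumer:

{-# LANGUAGE RankNTypes #-}

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- the list [m..n] as a build-style producer: all result-list
-- constructors are abstracted as c and nil
upto :: Int -> Int -> [Int]
upto m n = build (\c nil ->
  let go i | i > n     = nil
           | otherwise = c i (go (i + 1))
  in go m)

-- a foldr consumer: short cut fusion rewrites
--   foldr (+) 0 (build g)   to   g (+) 0,
-- so sumList (upto 1 10) can run without allocating the list
sumList :: [Int] -> Int
sumList = foldr (+) 0

In GHC this fusion step is in fact expressed as a rewrite rule on foldr and build; here we only sketch the shape of the definitions.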

1.4 Type Inference Identifies List Constructors

Originally, the second-order type of build confined deforestation to producers which are defined in terms of list-producing functions from the standard library. Today, the Glasgow Haskell compiler has an extended type system which permits the programmer to use functions such as build. However, asking the programmer to supply list producers in build form runs contrary to the aim of writing clear and concise programs. The simplicity of a list producer would be lost as the following definition of the function map demonstrates:

map :: (a -> b) -> [a] -> [b]
map f xs = build (\v(:) v[] -> foldr (v(:) . f) v[] xs)

Whereas foldr abstracts a common recursion scheme, and hence its use for defining list consumers is generally considered good modular programming style, build is only a crutch to enable deforestation.

In this thesis we present a new method for automatically transforming an arbitrary list-producing expression into the form that is required by the short cut fusion rule. The starting-point for our method is the observation that the correctness of the short cut fusion rule solely depends on the polymorphic type of the producer skeleton. So we reduce the problem of deriving the required form of the producer to a type inference problem.

We can obtain a polymorphic producer skeleton from a producer M^P by the following generate-and-test method: First, we replace in M^P some occurrences of the constructor (:) by a variable v^(:) and some occurrences of the constructor [] by a variable v^[]. We obtain an expression M′. Next we type check the expression \v^(:) v^[] -> M′. If it has the polymorphic type (A -> c -> c) -> c -> c for some type A and type variable c, then we have abstracted exactly those constructors which construct the result list and thus we have found the producer skeleton. Otherwise, we try a different replacement of (:)s and []s. If no replacement gives the desired type, then no short cut deforestation takes place.

Obviously, this method is prohibitively expensive. Fortunately, we can determine the list constructors that need to be replaced in one pass, if we use an algorithm which infers a most general, a principal typing. At the heart of our list abstraction algorithm, which transforms a producer into the required form, is a type inference algorithm, which is based on the well-known Hindley-Milner type inference algorithm.

The example producer (map p [1,2]) already demonstrated that the constructors of a result list need not occur in the producer itself: The producer may call a function, such as map, which constructs the result. So to abstract the result list constructors from a producer, inlining of some function definitions is unavoidable. Nicely, type inference can even indicate which function definitions need to be inlined. Nonetheless inlining is a problem in practice, because extensive inlining across module boundaries would defeat the idea of separate compilation. Hence we develop a worker/wrapper scheme. We split every definition of a list-producing function into a definition of a worker and a definition of a wrapper. The definition of the wrapper is small and non-recursive. Because it contains all the constructors of the result list, only the wrapper definition needs to be inlined to enable deforestation. This split of a producer definition into a worker and a wrapper definition is performed by an algorithm which is based on our type-inference based list abstraction algorithm (a sketch of such a split follows at the end of this section).

Altogether we obtain a new, type-inference based short cut deforestation algorithm which is more flexible than the original one. It can fuse foldr consumers with nearly arbitrary producers, especially user-defined producers. It can even remove some intermediate lists which the original short cut deforestation algorithm cannot remove, even with hand-crafted producers in build form.

We formally present our improved short cut deforestation for a small functional language with second-order types which we name F. On the one hand F is small enough for complete formal descriptions and proofs. On the other hand, most programs of a modern functional programming language such as Haskell can be translated into F. In fact, F is only a slightly simplified version of the intermediate language used inside the Glasgow Haskell compiler [PS98, GHC]. The Glasgow Haskell compiler is a highly optimising compiler for Haskell which itself is written in Haskell. It was designed to be easily extensible by further optimisation phases [Par94, Chi97]. Hence using F will also simplify a future implementation of our deforestation method. For a prototype implementation of our central list abstraction algorithm we reused data structures and some basic functions from the Glasgow Haskell compiler. We already presented the idea of using type-inference for deforestation in [Chi99] and the worker/wrapper scheme in [Chi00].
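As a foretaste of the worker/wrapper scheme (a hedged Haskell sketch of our own; Chapter 3 develops the actual scheme and Chapter 5 the split algorithm), a split of map might look as follows. The recursive worker abstracts over the constructors of the result list; the wrapper is small and non-recursive and contains all the result-list constructors, so only the wrapper needs to be inlined:

-- worker: the recursion, with the result-list constructors passed in
mapW :: (a -> b) -> [a] -> (b -> c -> c) -> c -> c
mapW f []     c n = n
mapW f (y:ys) c n = c (f y) (mapW f ys c n)

-- wrapper: non-recursive, reapplies the real constructors
map' :: (a -> b) -> [a] -> [b]
map' f xs = mapW f xs (:) []

After inlining the wrapper, a consumer foldr k z (map' f xs) exposes the producer skeleton and fuses to mapW f xs k z.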

1.5 Structure of the Thesis

In Chapter 2 we define the language F. We define its syntax, type system and semantics. We also define the notion of a correctness-preserving transformation rule and study the transformation rules on which our deforestation transformation is based. Finally, we present a cost model which serves as foundation for arguments about how a transformation modifies the performance of a program. In Chapter 3 we explain informally by examples how our new deforestation algorithm works, especially the type-inference based list abstraction method and the worker/wrapper scheme. Chapter 4 contains the central contributions of the thesis. We formally describe our list abstraction algorithm including the new type inference algorithm and prove its completeness and semantic correctness. In Chapter 5 we describe the complete deforestation algorithm. We both analyse and practically measure its effect on programs. Subsequently we discuss in Chapter 6 several ideas for further improving and extending our deforestation algorithm. In Chapter 7 we present the major methods which have been proposed for deforestation and relate them to our method. We conclude in Chapter 8 with some general thoughts about deforestation and the prospects of type inference as analysis for program transformations.

Chapter 2

The Functional Language F

For the definition of our deforestation method we introduce a small functional language which we name F. The language is essentially the second-order typed λ-calculus [Gir72, Rey74, Bar92], also named system F, augmented with recursion and constructor-based data types. We need second-order types to express the short cut fusion rule, recursion to make the language Turing-complete and realistic, and constructor-based data structures because we want to remove intermediate lists. Many Haskell programs can be translated directly into F programs. F is a slightly simplified version of the intermediate language on which all optimising transformations in the Glasgow Haskell compiler are performed [PS98, GHC]. To keep the language simple we do not introduce any primitive types with primitive functions. In principle, finite data types such as characters (Char) and bounded numbers (Int) can be implemented as algebraic data types with many data constructors. Furthermore, F supports neither higher-order types beyond second order nor Haskell-like type synonyms.

2.1 Syntax

F is an extension of the Church-style second-order typed λ-calculus, that is, terms contain some type information [Bar92]. Thus the type of a term is unique and can easily be determined. The abstract syntax of types is defined in Figure 2.1 and the syntax of terms is defined in Figure 2.2. The definition of types is based on an infinite set of type variables and a (usually finite) set of type constructors (e.g., Bool, Int, Tree). Similarly, the definition of terms is based on an infinite set of term variables and a (usually finite) set of data constructors (e.g., True, 42, Leaf).

Note that we often draw a line over a meta-variable to abbreviate a sequence of meta-variables. For example, we write τ̅ for τ1 . . . τk and even {τ̅} for {τ1, . . . , τk}. This informal but helpful abbreviation leaves implicit the index of the upper bound of the sequence.

Informally, the intended meaning of terms and types is as follows. λx : τ.M is a function that maps the variable x of type τ to M, which usually uses x. Its type is a function type τ → ρ. M N is the function M applied to the argument N. let {xi : τi = Mi}ᵢ₌₁ᵏ in N defines variables xi by terms Mi. These xi can be used in N and also in the Mi; that is, mutually recursive definitions are possible.

Type variables      α, β, γ, δ
Type constructors   T

Types  τ, ρ, ν ::= T τ̅           data type
                 | α             variable
                 | τ1 → τ2       function type
                 | ∀α.τ          universal type

Figure 2.1: Types of F

Term variables      x, y, z
Data constructors   c

Terms  M, N, O, P ::= x                                   variable
                    | λx : τ.M                            term abstraction
                    | M N                                 term application
                    | let {xi : τi = Mi}ᵢ₌₁ᵏ in N         definition
                    | c                                   data constructor
                    | case M of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ          data destruction
                    | λα.M                                type abstraction
                    | M τ                                 type application

Figure 2.2: Terms of F

The elements of all data types are constructed by data constructors c. Because data types can be polymorphic, a data type T τ̅ consists of a type constructor T applied to some types τ̅. case M of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ tests for the top constructor of the value of M. It yields the value of the Ni corresponding to the constructor found. Furthermore, Ni may use the arguments x̅i of the constructor.

Church-style polymorphism means that polymorphism is not only visible in the types but also in the terms of the language. The polymorphic term λα.M maps the type variable α to M. Its type is a universal type ∀α.τ. M τ is the instantiation of the polymorphic term M to the type τ.

In addition to the context-free definition of the syntax we demand that the variables defined in a let construct are distinct, the variables within a pattern c x̅ of a case construct are distinct and the constructors in a case construct are distinct.

By expression we refer to both types and terms. The meta-variable e is used to denote an expression. Similarly, we use the meta-variable a for a variable that may be either a type or a term variable.

In examples we use round brackets ( and ) to resolve ambiguities. On the other hand we often use Haskell-like indentation instead of curly brackets for writing the sets in let and case constructs. Here is an example of a term:

let not : Bool → Bool
        = λb:Bool. case b of
            True ↦ False
            False ↦ True
    id : ∀α.α → α
       = λα. λx:α. x

in id Bool (not True)
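For readers who prefer code, here is a hedged Haskell sketch of our own (not part of the thesis, which defines the syntax only mathematically) of how the abstract syntax of Figures 2.1 and 2.2 might be represented:

type TyVar = String
type TyCon = String

data Type
  = TCon TyCon [Type]      -- T τ1 .. τk
  | TVar TyVar             -- α
  | Fun Type Type          -- τ1 → τ2
  | Forall TyVar Type      -- ∀α.τ
  deriving Show

type TermVar = String
type DataCon = String

data Term
  = Var TermVar                             -- x
  | Lam TermVar Type Term                   -- λx : τ. M
  | App Term Term                           -- M N
  | Let [(TermVar, Type, Term)] Term        -- let {xi : τi = Mi} in N
  | Con DataCon                             -- c
  | Case Term [(DataCon, [TermVar], Term)]  -- case M of {ci xi ↦ Ni}
  | TyLam TyVar Term                        -- λα. M
  | TyApp Term Type                         -- M τ
  deriving Show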

To reduce the number of round brackets we use the following conventions:

M e1 e2                            abbreviates   (M e1) e2
λx : τ. M e                        abbreviates   λx : τ. (M e)
λα. M e                            abbreviates   λα. (M e)
let {xi : τi = Mi}ᵢ₌₁ᵏ in N e      abbreviates   let {xi : τi = Mi}ᵢ₌₁ᵏ in (N e)

As the preceding example demonstrates, we use the same symbols α, β, γ and δ for type variables as we do for meta-variables ranging over type variables. For type constructors, term variables and data constructors we use the same syntax as Haskell, that is, with a few exceptions type constructors are words starting with a capital letter (e.g., Bool), term variables are words starting with a lower-case letter (e.g., not, b, id) and data constructors are words starting with a capital letter (e.g., True, False).

Central to this thesis is obviously the list type. We chose to use the same notation for the list type constructor and its data constructors as in Haskell, although this unfortunately leads to an ambiguity. The expression [] τ can be interpreted both as the type of lists with elements of type τ and as the application of the empty list constructor to the type τ. We usually write [τ] for the former but never use this abbreviation for the latter. Sometimes the context is needed to resolve the ambiguity. Besides the data constructor of the empty list [] there is also the data constructor (:) that puts a list element in front of a list. In case patterns we use the list constructor (:) as an infix operator, that is, we write y:ys instead of (:) y ys. We can do this, because in case patterns even polymorphic constructors do not have type arguments. For example:

let head : ∀α.[α] → α
         = λα. λxs:[α]. case xs of {y:ys ↦ y}
in head Int ((:) Int 1 ([] Int))

A program in a functional language typically consists of some definitions together with a term to be evaluated modulo the given definitions. In F a program is just a term which usually uses the let construct for definitions. Nonetheless, we will sometimes consider a definition separately. For example, a definition of the function that maps a function on all elements of a list is:

map : ∀α.∀β.(α → β) → [α] → [β]
    = λα. λβ. λf:α → β. λxs:[α]. case xs of
        [] ↦ [] β
        y:ys ↦ (:) β (f y) (map α β f ys)

As in Haskell we use (,) both as tuple type constructor and as tuple data constructor. However, we usually write (τ1,τ2) for the tuple type (,) τ1 τ2 but never do so for the tuple data constructor:

fst : ∀α.∀β.(α,β) → α
    = λα. λβ. λp:(α,β). case p of {(x,y) ↦ x}

swap : ∀α.∀β.(α,β) → (β,α)
     = λα. λβ. λp:(α,β). case p of {(x,y) ↦ (,) β α y x}

Note that we did not define any names or symbols for the sets of type variables, type constructors, types, term variables, term constructors and terms. Instead, we fixed a set of meta-variables for elements of every set, for example M, N and O for terms. We use such conventions throughout the thesis, so that without further comment it is always clear over which set any meta-variable ranges. To have an unbounded number of meta-variables we use indices, for example M1 and N3.

2.2 Substitutions and Renaming of Bound Variables

Type variables are placeholders for types and term variables are placeholders for terms. Many operations on types and terms are based on substituting some types or terms for some variables. Our language F has binding constructs for both type variables and term variables. In types, universal quantification is a variable binding construct. In terms, term and type abstraction, definition in a let construct and destruction in a case construct are variable binding constructs. Hence we have to distinguish between bound and free variables of an expression. The set of bound variables, bound(e), and the set of free variables, free(e), of an expression e are defined in Figure 2.3. A type is closed if it has no free (type) variables and a term is closed if it has no free (term and type) variables. Otherwise an expression is called open. A closed term is an F-program.

2.2.1 Substitutions

A substitution, denoted by σ, is a mapping from variables to expressions such that a type variable is mapped to a type and a term variable is mapped to a term. Furthermore, we require that the domain of a substitution σ,

dom(σ) := { a | σ(a) ≠ a }

is finite. Hence we can write [e1/a1, ..., ek/ak] or just [e̅/a̅] for the substitution σ with dom(σ) ⊆ {a1, ..., ak} and σ(a1) = e1, ..., σ(ak) = ek.

We define the bound and free variables of a substitution and the set of variables which are modified, that is influenced, by a substitution:

bound(σ) := ⋃ { bound(σ(γ)) | γ ∈ dom(σ) }
free(σ) := ⋃ { free(σ(γ)) | γ ∈ dom(σ) }
mod(σ) := dom(σ) ∪ free(σ)

It is well known that the definition of applying a substitution to an expression where the expression contains variable binders needs care to avoid confusion between free and bound variables [Bar81, FH88]. Bound variables shall not be replaced by a substitution and free variables that are introduced by a substitution shall not be bound by a binding construct of the expression. To ensure these properties we rename bound variables appropriately. Figure 2.4 shows our adaptation of Curry’s definition of substitution application [CF58]. Here eσ1σ2 abbreviates (eσ1)σ2.

bound(α) := ∅
bound(∀α.τ) := {α} ∪ bound(τ)
bound(τ1 → τ2) := bound(τ1) ∪ bound(τ2)
bound(T τ1..τk) := bound(τ1) ∪ ... ∪ bound(τk)
bound(x) := ∅
bound(λx : τ.M) := {x} ∪ bound(τ) ∪ bound(M)
bound(M N) := bound(M) ∪ bound(N)
bound(let {xi : τi = Mi}ᵢ₌₁ᵏ in N) := ⋃ᵢ₌₁ᵏ (bound(τi) ∪ bound(Mi) ∪ {xi}) ∪ bound(N)
bound(c) := ∅
bound(case M of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ) := bound(M) ∪ ⋃ᵢ₌₁ᵏ (bound(Ni) ∪ {x̅i})
bound(λα.M) := {α} ∪ bound(M)
bound(M τ) := bound(M) ∪ bound(τ)

free(α) := {α}
free(∀α.τ) := free(τ) \ {α}
free(τ1 → τ2) := free(τ1) ∪ free(τ2)
free(T τ1..τk) := free(τ1) ∪ ... ∪ free(τk)
free(x) := {x}
free(λx : τ.M) := (free(τ) ∪ free(M)) \ {x}
free(M N) := free(M) ∪ free(N)
free(let {xi : τi = Mi}ᵢ₌₁ᵏ in N) := (⋃ᵢ₌₁ᵏ (free(τi) ∪ free(Mi)) ∪ free(N)) \ {x1, ..., xk}
free(c) := ∅
free(case M of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ) := free(M) ∪ ⋃ᵢ₌₁ᵏ (free(Ni) \ {x̅i})
free(λα.M) := free(M) \ {α}
free(M τ) := free(M) ∪ free(τ)

Figure 2.3: Bound and free variables of an expression

ασ := σ(α)
(∀α.τ)σ := ∀β.(τ[β/α]σ)    where β new with β ∉ free(∀α.τ) ∪ mod(σ)
(T τ1..τk)σ := T (τ1σ)..(τkσ)
(τ1 → τ2)σ := (τ1σ) → (τ2σ)
xσ := σ(x)
(λx : τ.M)σ := λy : (τσ).(M[y/x]σ)    where y new with y ∉ free(λx : τ.M) ∪ mod(σ)
(M N)σ := (Mσ) (Nσ)
(let {xi : τi = Mi}ᵢ₌₁ᵏ in N)σ := let {yi : τiσ = Mi[y̅/x̅]σ}ᵢ₌₁ᵏ in (N[y̅/x̅]σ)
    where for all i ∈ {1..k}, yi new and distinct with yi ∉ free(let {xi : τi = Mi}ᵢ₌₁ᵏ in N) ∪ mod(σ)
cσ := c
(case M of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ)σ := case Mσ of {ci y̅i ↦ Ni[y̅i/x̅i]σ}ᵢ₌₁ᵏ
    where for all i ∈ {1..k}, y̅i new and distinct with y̅i ∩ ((free(Ni) \ {x̅i}) ∪ mod(σ)) = ∅
(λα.M)σ := λβ.(M[β/α]σ)    where β new with β ∉ free(λα.M) ∪ mod(σ)
(M τ)σ := (Mσ) (τσ)

Figure 2.4: Application of a substitution to a term

In the definition for the first interesting case, (∀α.τ)σ, we note the rather vague requirement that β shall be a “new” type variable. Such a “new” type variable may in fact be a type variable that already appears in some expression. By “new” we mean that we do not care which type variable is chosen, as long as it fulfills the given condition. Our algorithms and proofs only rely on this partial definition. Such use of “new” variables is common in the definition of type inference algorithms (see e.g. [DM82, LY98]). To have a complete definition we could fix a “new” variable, for example by demanding that the set of all variables is totally ordered and the new variable is the least variable for which the given condition holds. Alternatively, all functions that need “new” variables could pass around an infinite set of variables that have not yet been used. An implementation in an imperative programming language would probably use a global variable supply instead of passing around a set. We do not want to commit ourselves to any of the many different options for implementing the creation of “new” variables. Furthermore, most implementations would obfuscate the definition or algorithm of interest.
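As one concrete shape for the “pass around an infinite set” option, here is a hedged Haskell sketch of our own (the thesis deliberately commits to no implementation):

-- an infinite supply of candidate variable names
type Supply = [String]

supply0 :: Supply
supply0 = [ "v" ++ show n | n <- [0 :: Integer ..] ]

-- pick a "new" name not in the avoid set; return the remaining supply
-- so that later requests yield different names
fresh :: [String] -> Supply -> (String, Supply)
fresh avoid (s:ss)
  | s `elem` avoid = fresh avoid ss
  | otherwise      = (s, ss)
fresh _ [] = error "unreachable: the supply is infinite"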

Finally, we define composition of two substitutions σ1 and σ2:

σ1σ2(a) := (σ1(a))σ2

So a(σ1σ2) = (aσ1)σ2. Chapter 2. The Functional Language F 13
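For example (a small check of our own): with σ1 = [β/α] and σ2 = [Int/β] we obtain α(σ1σ2) = (ασ1)σ2 = βσ2 = Int, so the composition σ1σ2 maps both α and β to Int.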

 e1 R e2                     e1 R e2    e2 R e3
 ──────── symmetry           ────────────────── transitivity
 e2 R e1                     e1 R e3

            τ R ρ              τ1 R ρ1  ...  τk R ρk           τ1 R ρ1    τ2 R ρ2
 ─────      ────────────       ──────────────────────────     ───────────────────
 α R α      ∀α.τ R ∀α.ρ        T τ1 ... τk R T ρ1 ... ρk      τ1 → τ2 R ρ1 → ρ2

            τ R ρ    M R N             M1 R N1    M2 R N2
 ─────      ─────────────────────      ──────────────────
 x R x      λx : τ.M R λx : ρ.N        M1 M2 R N1 N2

 ∀i ∈ {1..k}  τi R ρi    Mi R Ni    M R N
 ──────────────────────────────────────────────────────────
 let {xi : τi = Mi}ᵢ₌₁ᵏ in M  R  let {xi : ρi = Ni}ᵢ₌₁ᵏ in N

            M R N    ∀i ∈ {1..k}  Mi R Ni
 ─────      ───────────────────────────────────────────────────────
 c R c      case M of {ci x̅i ↦ Mi}ᵢ₌₁ᵏ  R  case N of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ

 M R N                 M R N    τ R ρ
 ────────────          ──────────────
 λα.M R λα.N           M τ R N ρ

Figure 2.5: Defining properties of a congruence R on expressions

2.2.2 Renaming of Bound Variables

In our definition of substitution application we were able to rename bound variables, because the choice of name of a bound variable will prove to be irrelevant in all our uses of terms and types. In the literature it is common to identify expressions that differ only in bound variables already at the syntactical level [Bar81, Pit97], that is, a term actually represents an equivalence class of terms. However, we do not take this approach. First, this approach is very hard to make rigorous. Whereas the set of free variables of an expression is invariant under renaming, the set of bound variables is not. In inductive proofs expressions are taken apart and thus bound variables are turned into free variables and vice versa. Second, the algorithms, especially the type inference algorithm, that we present in this thesis have to work on concrete expressions. Meta-languages for comfortable inductive definitions on syntax with binding constructs are still a subject of research [GP99, PG00].

In the semantics of terms, which we define in Section 2.4 and Section 2.6, terms that differ only in bound variables are equal. However, we do not define any semantics of types that would give a congruence on types with an analogous property. Nonetheless we want that, for example, any term of type ∀α.[α] → α also has type ∀β.[β] → β. Hence, for the type system, which we define in the next section, we have to make renaming of bound type variables explicit.

A relation R on expressions is a congruence if all properties expressed as rules¹ in Figure 2.5 hold. We define a congruence ≡, called α-congruence, on expressions which relates expressions that differ only in the names of bound type variables. Let ≡ be the least congruence on expressions which fulfills the following two rules:

 β ∉ free(τ)               β ∉ free(M)
 ─────────────────         ─────────────────
 ∀α.τ ≡ ∀β.τ[β/α]          λα.M ≡ λβ.M[β/α]

¹ In this thesis we often use rules to state properties and to define relations. Appendix A explains their meaning.

We note that the definition of ≡ and of substitution application fit well together, because substitution on expressions determines a well-defined substitution on ≡-equivalence classes:

Lemma 2.1 (Substitution application and ≡)

 e ≡ e′    τ ≡ τ′                e1 ≡ e1′    e2 ≡ e2′
 ───────────────────            ──────────────────────
 e[τ/α] ≡ e′[τ′/α]              e1[e2/x] ≡ e1′[e2′/x]

Proof See [CF58], Section 3.E, Theorem 2(a). □

When two universal types are equal up to renaming of bound type variables, their bodies need not be equal, but a similar property holds:

Lemma 2.2 (Equality up to renaming of universal types)

 ∀α.τ ≡ ∀β.ρ    γ ∉ free(∀α.τ) ∪ free(∀β.ρ)
 ────────────────────────────────────────────
 τ[γ/α] ≡ ρ[γ/β]

Proof By induction on the rules that define ≡, using Lemma 2.1. □

If the variables modified by a substitution and the bound variables of an expression are disjoint, then we can push the substitution freely into the expression.

Lemma 2.3 (Pushing a substitution over a variable binding)

 α ∉ mod(σ)                  α ∉ mod(σ)
 ───────────────────         ────────────────────
 (∀α.τ)σ ≡ ∀α.(τσ)           (λα.M)σ ≡ λα.(Mσ)

Proof Follows directly from the definitions of substitution application and of ≡. □

Finally, we can exchange substitutions under a certain condition.

Lemma 2.4 (Exchange of substitutions)

 α ∉ mod(σ)
 ─────────────────────
 e[ρ/α]σ ≡ eσ[ρσ/α]

Proof See [CF58], Section 3.E, Theorem 2(c). □

2.3 Type System

F is a typed language, in the sense that we only consider a term to be well formed if it can be assigned a type, given an assignment of types to the free term variables occurring in the term. A typing is a judgement of the form

Γ ⊢ M : τ

meaning that M has type τ in the typing environment Γ. A typing environment Γ is a finite partial mapping from term variables to types. With dom(Γ) we denote the domain of definition of the mapping Γ.

 Γ(x) = τ
 ─────────── var
 Γ ⊢ x : τ

 Γ + {x : τ1} ⊢ M : τ2
 ─────────────────────────── term abs
 Γ ⊢ λx : τ1.M : τ1 → τ2

 Γ ⊢ M : τ1 → τ2    Γ ⊢ N : τ1
 ─────────────────────────────── term app
 Γ ⊢ M N : τ2

 ∀i ∈ {1..k}  Γ + {xj : τj}ⱼ₌₁ᵏ ⊢ Mi : τi    Γ + {xj : τj}ⱼ₌₁ᵏ ⊢ N : τ
 ────────────────────────────────────────────────────────────────────── let
 Γ ⊢ let {xi : τi = Mi}ᵢ₌₁ᵏ in N : τ

 ∆(c) = τ
 ─────────── cons
 Γ ⊢ c : τ

 Γ ⊢ M : T ρ̅    ∀i ∈ {1..k}  ci : ∀α̅.ρ̅i → T α̅    Γ + {x̅i : ρ̅i[ρ̅/α̅]} ⊢ Ni : τ
 ───────────────────────────────────────────────────────────────────────────── case
 Γ ⊢ case M of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ : τ

 Γ ⊢ M : τ    α ∉ freeTy(Γ)
 ──────────────────────────── type abs
 Γ ⊢ λα.M : ∀α.τ

 Γ ⊢ M : ∀α.τ
 ───────────────────── type app
 Γ ⊢ M ρ : τ[ρ/α]

 Γ ⊢ M : τ    M ≡ M′                    Γ ⊢ M : τ    τ ≡ τ′
 ───────────────────── term rename      ───────────────────── type rename
 Γ ⊢ M′ : τ                             Γ ⊢ M : τ′

Figure 2.6: Type system of F

A typing environment is needed to determine the types of the free variables of M. We write it as a set of tuples x : τ. The operator + combines two typing environments by giving precedence to the second argument:

(Γ1 + Γ2)(x) :=  Γ2(x) , if x ∈ dom(Γ2)
                 Γ1(x) , otherwise

The sets of bound and free type variables of a typing environment and the application of a type substitution to a typing environment are given by

bound(Γ) := ⋃ { bound(Γ(x)) | x ∈ dom(Γ) }
free(Γ) := ⋃ { free(Γ(x)) | x ∈ dom(Γ) }
Γσ := { x : (τσ) | Γ(x) = τ }

Sometimes we are interested only in free type variables or only in free term variables. We define

freeTy(X) := { a | a ∈ free(X) and a is a type variable }
freeTerm(X) := { a | a ∈ free(X) and a is a term variable }

for all terms, substitutions and typing environments X.

The syntax of F does not contain definitions of data types such as data Tree a = Leaf | Node a (Tree a) (Tree a) as Haskell does. Instead, we assume the existence of a global data typing environment ∆ that assigns closed types of the form ∀α̅.ρ̅ → T α̅ to data constructors. Note that ∀α̅.ρ̅ → T α̅ is an abbreviation for ∀α1. ... ∀αl.ρ1 → ... → ρk → T α1..αl, that is, ∀α1. ... ∀αl.T α1..αl in the case k = 0. The data typing environment implicitly defines all data types. For example,

{ Leaf : ∀α.Tree α, Node : ∀α.α → Tree α → Tree α → Tree α }

defines a binary tree data type corresponding to the given Haskell declaration. In the examples in this thesis we assume that the data typing environment is defined as follows:

∆ := { False : Bool, True : Bool,
       0 : Int, 1 : Int, ...,
       (:) : ∀α.α → [α] → [α], [] : ∀α.[α],
       (,) : ∀α.∀β.α → β → (α,β),
       Leaf : ∀α.Tree α, Node : ∀α.α → Tree α → Tree α → Tree α }

The type system of F, that is, the set of valid typings, is defined by the typing rules given in Figure 2.6. Note that the asymmetric definition of + ensures correct term variable scoping in the term abs, let and case rules, so that for example ∅ ⊢ λx:Bool. λx:Int. x : Bool → Int → Int is valid. The condition α ∉ freeTy(Γ) in the type abs rule is necessary because of the dependency of terms on types. Without this condition we could derive that the term (λα. λx:α. λα.x) Int 1 Bool has type Bool. This typing is unsound, because the term evaluates to 1 as we will see in the next section. The condition assures that only the typing ∅ ⊢ (λα. λx:α. λα.x) Int 1 Bool : Int can be derived.

We need the type rename rule, for example, to be able to apply a term of type (∀α.[α] → Int) → Bool to a term of type ∀β.[β] → Int. We need the term rename rule, for example, to derive a type for the term λα. let x:[α] = [] α in λα.x, as the following derivation demonstrates:

 {x:[α]} ⊢ [] : ∀α.[α]                            cons, since ∆([]) = ∀α.[α]
 {x:[α]} ⊢ [] α : [α]                             type app
 {x:[α]} ⊢ x : [α]                                var
 {x:[α]} ⊢ λβ.x : ∀β.[α]                          type abs
 {x:[α]} ⊢ λα.x : ∀β.[α]                          term rename, since λβ.x ≡ λα.x
 ∅ ⊢ let x:[α] = [] α in λα.x : ∀β.[α]            let, from the two judgements above
 ∅ ⊢ λα. let x:[α] = [] α in λα.x : ∀α.∀β.[α]     type abs

The term rename and the type rename rule raise the question why there is no similar rule for renaming the bound type variables in the typing environment. The reason is that such a rule is already derivable from the given type system. Defining renaming on typing environments in the obvious way, by stating that Γ ≡ Γ′ if dom(Γ) = dom(Γ′) and for all x ∈ dom(Γ), Γ(x) ≡ Γ′(x) holds, we obtain:

Lemma 2.5 (Environment renaming)

 Γ ⊢ M : τ    Γ ≡ Γ′
 ─────────────────────
 Γ′ ⊢ M : τ

Proof By induction on the typing rules. Whenever the typing environment is accessed by the var rule, the bound variables of the type can be renamed by a subsequent application of the type rename rule. □

We write ⊢ M : τ for ∅ ⊢ M : τ, if both M and τ are closed. Note that even if τ is closed, M may nonetheless contain free type variables. We say that M has a type or that M is well-typed, if there exist Γ and τ such that Γ ⊢ M : τ holds.

With the rules we can show that the terms and definitions given in Section 2.1 are well-typed, for example:

⊢ let head : ∀α.[α] → α = ... in head Int ((:) Int 1 ([] Int)) : Int

{map : ∀α.∀β.(α → β) → [α] → [β]}
  ⊢ λα. λβ. λf:α → β. λxs:[α]. case xs of
      [] ↦ [] β
      y:ys ↦ (:) β (f y) (map α β f ys)
  : ∀α.∀β.(α → β) → [α] → [β]

The standard properties of the second-order typed λ-calculus [Bar92] also hold for F:

Lemma 2.6 (Properties of the type system)

1. Typing environment:

   Γ′ ⊇ Γ    Γ ⊢ M : τ            Γ ⊢ M : τ
   ─────────────────────          ──────────────────────
   Γ′ ⊢ M : τ                     freeTerm(M) ⊆ dom(Γ)

   Γ′ + Γ ⊢ M : τ    freeTerm(M) ⊆ dom(Γ)
   ────────────────────────────────────────
   Γ ⊢ M : τ

2. Typability of subterms: If M has a type, then every subterm of M has a type.

3. Substitution properties:

   Γ ⊢ M : τ                            Γ + {x : ρ} ⊢ M : τ    Γ ⊢ N : ρ
   ─────────────────────────────        ─────────────────────────────────
   Γ[ρ/α] ⊢ M[ρ/α] : τ[ρ/α]             Γ ⊢ M[N/x] : τ

4. Uniqueness of types:

   Γ ⊢ M : τ1    Γ ⊢ M : τ2
   ─────────────────────────
   τ1 ≡ τ2

Proof By induction on the typing rules. □

Lemma 2.7 (Free type variables in a typing)

 Γ ⊢ M : τ
 ─────────────────────────────────────
 freeTy(τ) ⊆ freeTy(Γ) ∪ freeTy(M)

Proof Suppose α ∈ freeTy(τ) \ (freeTy(Γ) ∪ freeTy(M)). Let β be a new type variable with β ∉ freeTy(τ) ∪ freeTy(Γ) ∪ freeTy(M). With Lemma 2.6 follows Γ[β/α] ⊢ M[β/α] : τ[β/α]. Hence we have Γ ⊢ M : τ[β/α] with τ[β/α] ≢ τ, which contradicts the uniqueness of types stated in Lemma 2.6. So no such α exists and the property holds. □

 ─────────────────────── term abs
 λx : τ.M ⇓ λx : τ.M

 M ⇓ λx : τ.O    O[N/x] ⇓ V
 ───────────────────────────── term abs app
 M N ⇓ V

 N[(let {xi : τi = Mi}ᵢ₌₁ᵏ in M1)/x1, ..., (let {xi : τi = Mi}ᵢ₌₁ᵏ in Mk)/xk] ⇓ V
 ───────────────────────────────────────────────────────────────────────────────── let
 let {xi : τi = Mi}ᵢ₌₁ᵏ in N ⇓ V

 ─────── cons
 c ⇓ c

 M ⇓ c e̅
 ───────────── cons app
 M e ⇓ c e̅ e

 M ⇓ cj τ̅ O̅    Nj[O̅/x̅j] ⇓ V    j ∈ {1..k}
 ─────────────────────────────────────────── case
 case M of {ci x̅i ↦ Ni}ᵢ₌₁ᵏ ⇓ V

 ───────────────── type abs
 λα.M ⇓ λα.M

 M ⇓ λα.N    N[τ/α] ⇓ V
 ───────────────────────── type abs app
 M τ ⇓ V

Figure 2.7: Evaluation of closed terms

2.4 Evaluation of Programs

We define the meaning of an F-program, that is, a closed term, by a big step, also called natural, operational semantics [Gun92]. This operational semantics is much simpler than a denotational semantics could be. To denotationally describe the impredicative polymorphism of F, rather complex domains would be required, for example partial equivalence relations of domains for an untyped λ-calculus [Gun92, Mit96]. In contrast, an operational semantics just relates terms. Furthermore, an operational semantics is more suitable for our purposes. Many program transformations are directly related to the operational semantics of a language, because they do at compile time what is otherwise done at runtime (partial evaluation).

We give a semantics to an F-program by defining to which value it evaluates. A value or weak head normal form is a closed term of the following form:

V ::= c τ̅ M̅  |  λx : τ.M  |  λα.M

The evaluation relation takes the form

M ⇓ V

meaning that the closed term M evaluates to the value V. The relation is defined by the rules given in Figure 2.7. The rules formalise the intuitive semantics given in Section 2.1. The term abs app and the cons app rule show that F has a non-strict semantics². An argument is passed unevaluated to the body of a λ-abstraction, respectively a constructor, in term application. The let rule exposes the only disadvantage of an operational semantics. In a denotational semantics we could define the recursion simply by using a fixpoint operator of the meta-language. Here we have to unroll the recursive definitions.

² We formalise the sharing aspect of lazy evaluation in Section 2.9.
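To make the shape of these rules concrete, here is a hedged, self-contained Haskell sketch of our own (not the thesis's formalisation): a big-step evaluator for an untyped fragment of F mirroring the rule structure of Figure 2.7. Type abstraction and application are omitted, and sub is a naive substitution that assumes distinct bound variables instead of the careful renaming of Section 2.2:

data T = V String                         -- variable
       | L String T                       -- λx.M
       | A T T                            -- M N
       | LetRec [(String, T)] T           -- let {xi = Mi} in N
       | C String [T]                     -- constructor applied to arguments
       | Case T [(String, [String], T)]   -- case M of {ci xsi ↦ Ni}

eval :: T -> T
eval t = case t of
  L _ _       -> t                                        -- term abs: already a value
  C _ _       -> t                                        -- cons: already a value
  A m n       -> case eval m of
    L x body -> eval (sub [(x, n)] body)                  -- term abs app: n passed unevaluated
    C c args -> C c (args ++ [n])                         -- cons app: extend the application
    _        -> error "application of a non-function"
  LetRec ds n -> eval (sub [ (x, LetRec ds m) | (x, m) <- ds ] n)  -- let: unroll definitions
  Case m alts -> case eval m of
    C c args -> case [ (xs, rhs) | (c', xs, rhs) <- alts, c' == c ] of
      (xs, rhs) : _ -> eval (sub (zip xs args) rhs)       -- case: bind constructor arguments
      []            -> error "no matching pattern"        -- the only "error" in F
    _        -> error "case of a non-constructor"
  V x         -> error ("unbound variable " ++ x)

-- naive parallel substitution (no capture avoidance)
sub :: [(String, T)] -> T -> T
sub s t = case t of
  V x          -> maybe (V x) id (lookup x s)
  L x b        -> L x (sub (del [x] s) b)
  A m n        -> A (sub s m) (sub s n)
  LetRec ds n  -> let s' = del (map fst ds) s
                  in LetRec [ (x, sub s' m) | (x, m) <- ds ] (sub s' n)
  C c args     -> C c (map (sub s) args)
  Case m alts  -> Case (sub s m) [ (c, xs, sub (del xs s) rhs) | (c, xs, rhs) <- alts ]
  where del xs = filter ((`notElem` xs) . fst)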

It may seem inconsistent that in the case rule the tested term is evaluated to a constructor that is applied to types and terms, but in a pattern a constructor occurs only with term variables as arguments. However, a case destructs values, not types, and hence only term variables are replaced during evaluation.

Note that there is no one-to-one relationship between syntactic constructs and evaluation rules. The term abs app rule can be used for a term application and the type abs app rule can be used for a type application, but the cons app rule can be used for both kinds of applications. However, all three rules demand for the evaluation of M e that M is evaluated. The value of M determines which app rule has to be used. Hence the evaluation rules provide a simple, deterministic algorithm for evaluating a closed term.

Already the derivation tree for the evaluation of a simple term exceeds the width of the page:

                                True ⇓ True    False ⇓ False
 λb:Bool. ... ⇓ λb:Bool. ...    case True of {True ↦ False, False ↦ True} ⇓ False
 ──────────────────────────────────────────────────────────────────────────────────
 (λb:Bool. case b of {True ↦ False, False ↦ True}) True ⇓ False

Hence, and to stress the sequential nature of evaluation, we use Launchbury’s [Lau93] vertical notation for derivation trees of evaluation:

Instead of

 derivation1  ...  derivationk
 ──────────────────────────────
 M ⇓ V

we write

 M
  [derivation1
  ...
  [derivationk
 V

and we abbreviate trivial derivation trees V ⇓ V by [V.

With this notation our example looks as follows:

(λb:Bool. case b of {True ↦ False, False ↦ True}) True
 [λb:Bool. case b of {True ↦ False, False ↦ True}
 case True of {True ↦ False, False ↦ True}
  [True
  [False
 False
False

Here is a longer evaluation derivation: 20 2.4. Evaluation of Programs

let head : ∀α.[α] → α
         = λα. λxs:[α]. case xs of {y:ys ↦ y}
in head Int ((:) Int 1 ([] Int))
 (let head:... = ... in λα. λxs:[α]. case xs of {y:ys ↦ y}) Int ((:) Int 1 ([] Int))
  (let head:... = ... in λα. λxs:[α]. case xs of {y:ys ↦ y}) Int
   let head:... = ... in λα. λxs:[α]. case xs of {y:ys ↦ y}
    [λα. λxs:[α]. case xs of {y:ys ↦ y}
   λα. λxs:[α]. case xs of {y:ys ↦ y}
    [λxs:[Int]. case xs of {y:ys ↦ y}
   λxs:[Int]. case xs of {y:ys ↦ y}
  case (:) Int 1 ([] Int) of {y:ys ↦ y}
   [(:) Int 1 ([] Int)
   [1
  1
 1
1

Figure 2.8 shows an even longer example, although with many details elided. The result is a value, but the constructor arguments are not.

Lemma 2.8 (Properties of Evaluation) Evaluation is deterministic and preserves typing, that is

 M ⇓ V    M ⇓ V′             M ⇓ V    ⊢ M : τ
 ─────────────────           ──────────────────
 V = V′                      ⊢ V : τ

Proof By induction on the evaluation rules. We assume that the choice of new variables in substitutions is deterministic. □

Note that the preservation of typing does not imply that well-typed programs cannot “go wrong” [Mil78]. In fact, the term

let head : ∀α.[α] → α
         = λα. λxs:[α]. case xs of {y:ys ↦ y}
in head Int ([] Int)

does not evaluate to a value, because the definition of head has no pattern that matches the empty list. We say that a term M converges, written M⇓, if there exists a value V with M ⇓ V, and M diverges, written M⇑, if there exists no such value. The other cause for divergence besides an “error” is that evaluation of a term might require an infinite derivation tree due to unbound recursion. Because safety of the type system is of no direct concern in this thesis, we neither define a notion of “error” to distinguish it from unbound recursion nor prove the absence of “errors” for well-typed terms. An incomplete pattern in a case construct is the only possible cause for an “error” in F and could syntactically be avoided.

In the following we will sometimes use the polymorphic term ⊥, which diverges for all type arguments:

⊥ : ∀α.α
  = λα. let {x : α = x} in x
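For comparison (our own aside), the Haskell counterpart of this term is the classic divergent definition:

bot :: a
bot = let x = x in x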

let map : ∀α.∀β.(α → β) → [α] → [β]
        = λα. λβ. λf:α → β. λxs:[α]. case xs of
            [] ↦ [] β
            y:ys ↦ (:) β (f y) (map α β f ys)
in map Int Int (λz:Int.z) ((:) Int 1 ([] Int))
 (let ... in λα. λβ. λf:α → β. λxs:[α]. case xs of
    {[] ↦ [] β, y:ys ↦ (:) β (f y) (map α β f ys)})
   Int Int (λz:Int.z) ((:) Int 1 ([] Int))
  (let ... in λα. λβ. λf:α → β. λxs:[α]. case xs of
     {[] ↦ [] β, y:ys ↦ (:) β (f y) (map α β f ys)})
    Int Int (λz:Int.z)
   (let ... in λα. λβ. λf:α → β. λxs:[α]. case xs of
      {[] ↦ [] β, y:ys ↦ (:) β (f y) (map α β f ys)})
     Int Int
    (let ... in λα. λβ. λf:α → β. λxs:[α]. case xs of
       {[] ↦ [] β, y:ys ↦ (:) β (f y) (map α β f ys)})
      Int
     let ... in λα. λβ. λf:α → β. λxs:[α]. case xs of
       {[] ↦ [] β, y:ys ↦ (:) β (f y) ((let ... in λα. ...) α β f ys)}
      [λα. λβ. λf:α → β. λxs:[α]. case xs of
         {[] ↦ [] β, y:ys ↦ (:) β (f y) ((let ... in λα. ...) α β f ys)}
     λα. λβ. λf:α → β. λxs:[α]. case xs of
       {[] ↦ [] β, y:ys ↦ (:) β (f y) ((let ... in λα. ...) α β f ys)}
      [...
     λβ. λf:Int → β. λxs:[Int]. case xs of
       {[] ↦ [] β, y:ys ↦ (:) β (f y) ((let ... in λα. ...) Int β f ys)}
      [...
    λf:Int → Int. λxs:[Int]. case xs of
      {[] ↦ [] Int, y:ys ↦ (:) Int (f y) ((let ... in λα. ...) Int Int f ys)}
     [...
   λxs:[Int]. case xs of
     {[] ↦ [] Int,
      y:ys ↦ (:) Int ((λz:Int.z) y) ((let ... in λα. ...) Int Int (λz:Int.z) ys)}
  case ((:) Int 1 ([] Int)) of
    {[] ↦ [] Int,
     y:ys ↦ (:) Int ((λz:Int.z) y) ((let ... in λα. ...) Int Int (λz:Int.z) ys)}
   [(:) Int 1 ([] Int)
  (:) Int ((λz:Int.z) 1) ((let ... in λα. ...) Int Int (λz:Int.z) ([] Int))
 (:) Int ((λz:Int.z) 1) ((let ... in λα. ...) Int Int (λz:Int.z) ([] Int))
(:) Int ((λz:Int.z) 1) ((let ... in λα. ...) Int Int (λz:Int.z) ([] Int))

Figure 2.8: Example evaluation derivation

2.5 Program Transformations

A program transformation is a function that for a given program yields a new program. Using a semantics-preserving program transformation as a compiler optimisation is a well-established technique [ASU86], which has been put into practice especially in the area of functional programming languages [PS98]. Most program transformations can furthermore be broken down into simpler transformations that change only small local parts of a program. We call these simple transformations transformation rules.

We are interested in transformations of F-programs that preserve typing and semantics and improve a program. In this and the following sections we define and study the transformation rules from which our deforestation transformation is built. An example of a transformation rule is the β-reduction rule

(λx : τ.M) N ⤳β M[N/x]

Basically, a transformation rule ⤳ is a binary relation on terms. Applying a transformation rule ⤳ to a program means replacing a subterm M1 of the program by M2 where M1 ⤳ M2. We want to type a transformation rule. The need is not obvious for the β-reduction rule. If a program is well-typed, then all its subterms including those of the form (λx : τ.M) N are well-typed (Lemma 2.6). If (λx : τ.M) N is well-typed, then so is M[N/x] (cf. Lemma 2.8). Under these conditions we can show that replacing (λx : τ.M) N by M[N/x] gives a well-typed program. However, already applying the inverse rule, that is, β-expansion, causes a problem. How do we obtain the τ in (λx : τ.M) N from the term M[N/x]? β-expansion does not preserve typing for an arbitrary τ. Finally, for the short cut fusion rule, which lies at the heart of short cut deforestation, the typing of its components is vital for its semantic correctness. Because the term relations we are interested in usually relate terms with free variables, the term relations must also contain a typing environment.

Definition 2.9 A (well-typed) term relation is a relation R on typing environments, terms and types with

R ⊆ {(Γ, M, N, τ) | Γ ⊢ M : τ and Γ ⊢ N : τ}

We write Γ ⊢ M R N : τ for (Γ, M, N, τ) ∈ R and Γ ⊢ M ̸R N : τ for (Γ, M, N, τ) ∉ R. As for typings, we write ⊢ M R N : τ when M, N and τ contain no free term and type variables. □

If R and R′ are term relations, then

R is reflexive    iff   Γ ⊢ M R M : τ for every typing Γ ⊢ M : τ

R is symmetric    iff   Γ ⊢ M R N : τ implies Γ ⊢ N R M : τ

R is transitive   iff   Γ ⊢ M R N : τ and Γ ⊢ N R O : τ imply Γ ⊢ M R O : τ

R implies R′      iff   R ⊆ R′, that is, Γ ⊢ M R N : τ implies Γ ⊢ M R′ N : τ

We include the type in the relation for convenience and because it stresses the connection with typings. According to Lemma 2.6 the type τ is determined uniquely up to renaming of bound variables by Γ and one of M and N. A transformation rule is nothing but a well-typed term relation. For example, the β-reduction rule is formally defined as follows:

Γ + {x : τ} ⊢ M : ρ    Γ ⊢ N : τ
─────────────────────────────────── (β)
Γ ⊢ (λx : τ.M) N ⤳β M[N/x] : ρ

We still have to define application of a transformation rule. In contrast to term rewriting systems [DJ90, Klo92], F has variable binding constructs. When a transformation rule is applied to a program, the free variables of the subterms that are to be exchanged are bound in the program, for example:

λy:Int. (λx:Int.x) y ⤳β λy:Int. y

Hence we define the contextual closure R^C of a term relation R by the rules given in Figure 2.9. R^C contains R and closes it under all term constructs. Contextual closure involves capture of free variables. Because the rules are obtained mechanically from the type system rules in Figure 2.6, the contextual closure of a well-typed term relation is also a well-typed term relation. A transformation rule determines a transformation by its contextual closure. So formally:

⊢ λy:Int. (λx:Int.x) y ⤳β^C λy:Int. y : Int → Int

In the literature it is more common to use the notion of a context to formalise exchanging of a subterm in a program. The idea is that a context is a term with an occurrence of a hole [·] at a position where the syntax of terms permits a term variable. C[M] is the term obtained from the context C by replacing the hole by the term M. In contrast to normal substitution this context substitution may involve capture of free variables. For example, C = λx.[·] is a context and C[x] = λx.x. Typing a context and assuring that substituting a term of the right type gives again a well-typed term is complicated. A typed version of Pitts's second-order representation of contexts [Pit94, San98] could be used, but contextual closure is a more tractable notion.

2.6 Contextual Equivalence

A program transformation should not change the program’s observable result. Hence we define the notion of contextual (also called observational) equivalence for F [Pit97, Pit00]. Loosely speaking, two terms M and N are contextually equivalent if any occurrence of M in a complete program can be replaced by N without affecting the observable result of executing the program. We have to specify what the observable results of an F-program are. We decide that the top constructor of a data value, to which a program of data type evaluates, is observable. Furthermore, for a program of function type it is observable if it converges. We will motivate this decision after the formal definition of contextual equivalence.

Definition 2.10 Let Γ be a typing environment, M and M′ terms and τ a type with Γ ⊢ M : τ and Γ ⊢ M′ : τ. We write

Γ ⊢ M ≅ctx M′ : τ

Γ ⊢ M R M′ : τ
────────────────────────── (include)
Γ + Γ′ ⊢ M R^C M′ : τ

Γ(x) = τ
────────────────────────── (var)
Γ ⊢ x R^C x : τ

Γ + {x : τ1} ⊢ M R^C M′ : τ2
────────────────────────────────────────── (term abs)
Γ ⊢ λx : τ1.M R^C λx : τ1.M′ : τ1 → τ2

Γ ⊢ M R^C M′ : τ1 → τ2    Γ ⊢ N R^C N′ : τ1
────────────────────────────────────────────── (term app)
Γ ⊢ M N R^C M′ N′ : τ2

∀i ∈ {1..k}  Γ + {xj : τj}j=1..k ⊢ Mi R^C Mi′ : τi    Γ + {xj : τj}j=1..k ⊢ N R^C N′ : τ
───────────────────────────────────────────────────────────────────────────────────────── (let)
Γ ⊢ let {xi : τi = Mi}i=1..k in N R^C let {xi : τi = Mi′}i=1..k in N′ : τ

∆(c) = τ
────────────────────────── (cons)
Γ ⊢ c R^C c : τ

Γ ⊢ M R^C M′ : T ρ    ∀i ∈ {1..k}  ∆(ci) = ∀α.ρi → T α    Γ + {xi : ρi[ρ/α]} ⊢ Ni R^C Ni′ : τ
───────────────────────────────────────────────────────────────────────────────────────────── (case)
Γ ⊢ case M of {ci xi ↦ Ni}i=1..k R^C case M′ of {ci xi ↦ Ni′}i=1..k : τ

Γ ⊢ M R^C M′ : τ    α ∉ freeTy(Γ)
────────────────────────────────── (type abs)
Γ ⊢ λα.M R^C λα.M′ : ∀α.τ

Γ ⊢ M R^C M′ : ∀α.τ
────────────────────────────────── (type app)
Γ ⊢ M ρ R^C M′ ρ : τ[ρ/α]

Figure 2.9: Contextual closure

iff the singleton term relation R = {(Γ, M, M′, τ)} satisfies

⊢ N R^C N′ : T ρ  implies  (N ⇓ c ρ O ⇔ N′ ⇓ c ρ O′)
and
⊢ N R^C N′ : ρ1 → ρ2  implies  (N ⇓ ⇔ N′ ⇓)

We call the term relation ≅ctx contextual equivalence. □

So two terms are contextually equivalent iff all term pairs in their contextual closure have the same observable behaviour. Our choice of observable behaviour is determined by the observable behaviour of Haskell programs. In Haskell, a program, that is, the function main, must be of type IO (), the input/output monad. However, to keep F simple, we did not add monadic input/output to it (see [Gor94b] for a formal description of how we could do so). It is important that the output operations of Haskell can only output values of a data type, that is, constructors, not type abstractions λα.M or term abstractions λx : τ.M. F-terms are only evaluated to weak head normal form. Hence we only compare the top constructors of data values, not their possibly unevaluated arguments. This comparison of top constructors suffices, because we quantify over all program pairs in the contextual closure. For example, to show that the two terms (:) Int 1 ([] Int) and (:) Int 2 ([] Int) are not contextually equivalent, we only have to find a pair of programs in the contextual closure of their singleton relation that show different observable behaviour:

case (:) Int 1 ([] Int) of {(:) x xs ↦ x}  ⇓  1
case (:) Int 2 ([] Int) of {(:) x xs ↦ x}  ⇓  2

The output of a Haskell program is also determined by its input. The necessary quantification over all input is also modelled by the quantification over all program pairs in the contextual closure. In fact, because of this quantification over all program pairs we do not even need to compare the constructors of data values:

Lemma 2.11 (Characterisation of contextual equivalence by convergence) Let Γ be a typing environment, M and M′ terms and τ a type with Γ ⊢ M : τ and Γ ⊢ M′ : τ. Then

Γ ⊢ M ≅ctx M′ : τ

iff the singleton term relation R = {(Γ, M, M′, τ)} satisfies

⊢ N R^C N′ : ρ, ρ a function or data type,  implies  (N ⇓ ⇔ N′ ⇓)  □

Proof Obviously contextual equivalence implies the alternative definition. We only need to prove that also for data types a test of convergence suffices. Let ⊢ N R^C N′ : T τ with N ⇓ c τ O and N′ ⇓ c′ τ O′ and c ≠ c′. Let

P  := case N  of {c x ↦ c τ x, c′ x ↦ ⊥ (T τ)}
P′ := case N′ of {c x ↦ c τ x, c′ x ↦ ⊥ (T τ)}

Then ⊢ P R^C P′ : T τ, but P ⇓ and P′ ⇑. So observation of constructors can be reduced to observation of convergence. □

Our decision that convergence at function type is also observable is motivated by the Haskell function

seq :: a -> b -> b

This function evaluates its first argument to a value (weak head normal form) and then returns its second argument. The function enables the programmer to enforce strictness, for example to improve the efficiency of a program. Because a term seq M True yields True if and only if evaluation of M converges and M may be of functional type, we can indirectly observe convergence at functional types in Haskell. Hence we also observe convergence at functional types in our definition of contextual equivalence. As noted in Section 5 of [Pit97], this gives the same contextual equivalence relation as if we added seq to F and just observed constructors. The convergence test at functional type implies Γ ⊢ ⊥ (τ1 → τ2) ≇ctx λx:τ1. ⊥ τ2 : τ1 → τ2. Note, however, that the Haskell type system does not permit an argument of seq to be of a universally quantified type. So, convergence at universally quantified type is not observable in Haskell and hence we have Γ ⊢ ⊥ (∀α.τ) ≅ctx λα. ⊥ τ : ∀α.τ.

Corollary 2.12 Contextual equivalence is reflexive, symmetric and transitive. □

Proof By induction on the contextual closure rules. □
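For illustration, the following Haskell fragment (the function observe is purely illustrative, not part of F or the standard libraries) uses seq to observe convergence at function type:

observe :: (a -> b) -> Bool
observe f = f `seq` True    -- True iff evaluation of f converges

-- observe (\x -> x)   yields True
-- observe undefined   diverges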

2.7 Kleene Equivalence

To establish that two terms are not contextually equivalent is usually quite easy. As demonstrated in the previous section, one just has to find a pair of programs of data or function type in the contextual closure that show different convergence behaviour. In contrast, establishing that a contextual equivalence does hold is much harder. The problem lies in the quantification over all pairs of programs in the definition of contextual equivalence. A straightforward induction on the contextual closure rules is not possible, because evaluation of a term usually involves evaluation of larger terms (cf. [Pit97]). Fortunately, a much more tractable characterisation of contextual equivalence has been found. Co-inductive techniques based on the notion of (bi)similarity have been transferred from concurrency theory to functional programming. Contextual equivalence can be characterised as a largest bisimulation. However, this theory is too extensive to be presented here. For an overview the reader is referred to [Pit97, Gor94a, Las98]. Here we only use some results of the theory.

Definition 2.13 Kleene equivalence, ≅kl, is a term relation on closed terms defined by the rule:

∀V. (M ⇓ V ⇔ M′ ⇓ V)    ⊢ M : τ    ⊢ M′ : τ
──────────────────────────────────────────────
⊢ M ≅kl M′ : τ                               □

Kleene equivalence is only defined for closed terms, because we defined evaluation only for closed terms. However, we extend Kleene equivalence to open terms of open types by comparing all closed instances.

Definition 2.14 We extend Kleene equivalence as defined by the previous definition by closing it under the following rule:

∀N. ∀ρ.   ⊢ N : τ    ⊢ M[N/x][ρ/α] ≅kl M′[N/x][ρ/α] : ν
──────────────────────────────────────────────────────────
{x : τ} ⊢ M ≅kl M′ : ν                                   □

Lemma 2.15 Kleene equivalence implies contextual equivalence, that is, ≅kl ⊆ ≅ctx. □

Proof See [Pit97], Proposition 3.9, or [Las98], Proposition 4.3.1. □

Note that Kleene equivalence does not coincide with contextual equivalence. For example, the terms (:) Int 1 ([] Int) and (:) Int ((λx:Int.x) 1) ([] Int) are contextually equivalent but obviously not Kleene equivalent.

Lemma 2.16 The following equivalences hold:

Γ + {x : ρ} ⊢ M : τ    Γ ⊢ N : ρ
──────────────────────────────────── (term application)
Γ ⊢ (λx : ρ.M) N ≅ctx M[N/x] : τ

Γ ⊢ M : τ
──────────────────────────────────── (type application)
Γ ⊢ (λα.M) ρ ≅ctx M[ρ/α] : τ[ρ/α]

Γ + {y : τj} ⊢ let {xi : τi = Mi}i=1..k in N : τ    y ∉ {xi}i=1..k
─────────────────────────────────────────────────────────────────── (inlining³)
Γ ⊢ let {xi : τi = Mi}i=1..k in N[xj/y]
    ≅ctx let {xi : τi = Mi}i=1..k in N[(let {xi : τi = Mi}i=1..k in Mj)/y] : τ

Γ ⊢ let {xi : τi = Mi}i=1..k in N : τ    {xi}i=1..k ∩ free(N) = ∅
─────────────────────────────────────────────────────────────────── (let elimination)
Γ ⊢ let {xi : τi = Mi}i=1..k in N ≅ctx N : τ

³Usually "inlining" only refers to the substitution of right hand sides of definitions for variables, not complete recursive definitions. Here we use the term in a more liberal sense which is sometimes referred to as "specialisation".

Proof For every term relation given above the closed instances of related terms evaluate to the same value. Then contextual equivalence follows with Lemma 2.15. □

We call both the use of the term application and of the type application relation from left to right β-reduction. By inlining we often refer to a combination of the actual inlining relation with subsequent β-reduction. After inlining of all occurrences of a variable let elimination is sensible.

2.8 The Short Cut Fusion Rule

For the definition of the short cut fusion rule types are vital.

Definition 2.17 The short cut fusion rule is defined as follows:

Γ ⊢ M(:) : τ1 → τ2 → τ2    Γ ⊢ M[] : τ2    Γ ⊢ P : ∀γ.(τ1 → γ → γ) → γ → γ
────────────────────────────────────────────────────────────────────────────── (fusion)
Γ ⊢ foldr τ1 τ2 M(:) M[] (P [τ1] ((:) τ1) ([] τ1)) ⤳fusion P τ2 M(:) M[]

The idea of the short cut fusion rule is that because of its polymorphic type the producer skeleton P constructs its result only with its two arguments (and these two arguments are not used for any other purpose). It is vital that we demand in the definition that P has type ∀γ.(τ1 → γ → γ) → γ → γ. This typing would not follow if we just demanded that left and right side of the rule are well-formed terms of the same type. Dropping the condition on the type of P would for example admit for τ2 = [τ1] the producer skeleton

P : ∀γ.(τ1 → [τ1] → [τ1]) → [τ1] → [τ1]
  = λγ. λc:τ1 → [τ1] → [τ1]. λn:[τ1]. [] τ1

which obviously does not use its arguments to construct its result. As it stands the short cut fusion rule is neither well-typed nor does it imply contextual equivalence. This is because foldr is an arbitrary free variable in the rule, whereas for well-typedness and our informal correctness argument we assumed it to be defined as follows:

foldr : ∀α.∀β.(α → β → β) → β → [α] → β
      = λα. λβ. λc:α → β → β. λn:β. λxs:[α]. case xs of
          { []   ↦ n
          , y:ys ↦ c y (foldr α β c n ys) }

With this definition contextual equivalence holds:

Γ ⊢ M(:) : τ1 → τ2 → τ2    Γ ⊢ M[] : τ2    Γ ⊢ P : ∀γ.(τ1 → γ → γ) → γ → γ
────────────────────────────────────────────────────────────────────────────── (∗)
Γ ⊢ (let foldr ... in foldr) τ1 τ2 M(:) M[] (P [τ1] ((:) τ1) ([] τ1))
    ≅ctx P τ2 M(:) M[] : τ2

We can apply the short cut fusion rule nonetheless, because in Haskell foldr is predefined in all programs (cf. Lemma 2.16 about inlining). We do however have to take care that the variable foldr is not a locally defined variable that happens to have the same name and hides "our" foldr. We do not give a proof here that (∗) holds but just refer to relevant work. The two sides of (∗) are not Kleene equivalent. Nor is the mentioned bisimulation characterisation useful for proving contextual equivalence of the two sides. However, as stated already in the introduction, the contextual equivalence of the two sides is a direct consequence of the so-called free theorem of the polymorphic type of the producer skeleton. Reynolds was the first to show that the second-order typed λ-calculus is parametric, which intuitively means that the semantics of different instances of a polymorphic term are closely related [Rey83, MR92]. Later Wadler noticed that parametricity implies that for every type there exists a semantic equivalence theorem that holds for every term of this type [Wad89]. For a monomorphic type this theorem is trivial, but not so for a polymorphic type. For these theorems Wadler coined the term theorems for free. Reynolds's proof is based on a simple set-theoretical model of the second-order λ-calculus that was later proved not to exist [Rey84]. Wadler uses a complex denotational model. Pitts gives a detailed proof of parametricity for contextual equivalence [Pit00]. Furthermore, he uses a second-order typed λ-calculus with recursion and a list data type, which is very similar to F. The only technical problem in applying his proofs to F is that F permits contravariant data types (see Chapter 6). Pitts's theory is only applicable to languages with covariant data types and he only gives hints on how this restriction could be overcome. However, just defining all notions necessary to state parametricity would require a chapter of its own. The central theme of this thesis is not the fusion rule but how a producer can be transformed so that the short cut fusion rule can be applied. The short cut fusion rule itself is already known from [Gil96, GLP93], where also an informal proof of its correctness based on Wadler's free theorems is given.
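To make the rule concrete, here is a small Haskell instance (the names prod and sumFused are purely illustrative): the producer skeleton builds its result exclusively from its two arguments, so the consuming foldr can be passed in directly.

-- a producer skeleton: it constructs its result only from c and n
prod :: (Int -> b -> b) -> b -> b
prod c n = c 1 (c 2 n)

-- the fusion rule replaces  foldr (+) 0 (prod (:) [])  by  prod (+) 0
sumFused :: Int
sumFused = prod (+) 0    -- 3, computed without an intermediate list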

2.9 A Cost Model

The purpose of deforestation is to reduce the costs of a program. There are various kinds of costs. A simple kind of cost is the program size. The size of a term is a good approximation of the final program code size. The effect of a transformation rule on the size of a term is easy to determine. Small program code size is seldom an important aim in practice — in fact, most program transformations trade program size for reduction of other costs — but it should not be neglected. Repeated application of transformation rules may easily lead to an exponential growth of the program, so that even the compiler itself can no longer handle the program. Deforestation methods based on unfold/fold transformations are known to suffer from the large scale inlining they perform (cf. Section 7.3). Here we are mainly interested in the time and space costs of a running program. Obviously, a complete analysis of the effect of a transformation on these costs is impossible. As a first step the relationship with all other transformations that are performed by a compiler would need to be analysed. However, even hardware details such as the size of the processor cache and the word size of the processor influence time and space costs. Despite the impossibility of a complete cost analysis we need some description of the most relevant properties of program execution as a guide for the design of a transformation. For this purpose our operational evaluation semantics is not sufficient. It defines the observable property of lazy evaluation, namely non-strictness, but it does not describe one important implementation aspect of laziness: sharing. Laziness does not only require that a function argument is only evaluated if it is needed for obtaining the result, but furthermore, that such a function argument is evaluated at most once. The following example demonstrates that our evaluation semantics duplicates unevaluated arguments and thus also duplicates their evaluation:

(λx:Bool. case x of {True ↦ x}) ((λy:Bool.y) True)
 [ λx:Bool. case x of {True ↦ x}
 [ case ((λy:Bool.y) True) of {True ↦ (λy:Bool.y) True}
    [ (λy:Bool.y) True            (first evaluation)
       [ λy:Bool.y
       [ True
      True
    [ (λy:Bool.y) True            (second evaluation)
       [ λy:Bool.y
       [ True
      True
    True
 True
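The duplication can be made visible in Haskell with Debug.Trace (a small illustrative experiment, not part of the formal development): under lazy evaluation with sharing the message is printed once, whereas the substitution-based semantics above evaluates the argument twice.

import Debug.Trace (trace)

shared :: Int
shared = let x = trace "argument evaluated" (1 + 1)
         in x + x    -- prints "argument evaluated" only once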

Abstract machines that describe the implementation of lazy functional languages also describe sharing. However, there exists a large number of different abstract machines for lazy evaluation, for example the G-machine [Joh84], the STG-machine [Pey92], the TIM [FW87] and TIGRE [KL89]. More importantly, abstract machines are already too detailed for reasoning about them.

2.9.1 A Semantics for Lazy Evaluation

John Launchbury defined a natural operational semantics for lazy evaluation [Lau93] that lies midway between our evaluation semantics and an operational semantics of an abstract machine. This semantics has been adopted by a number of researchers as the formal definition of lazy evaluation [TWM95, SP97, WP99, MS99, GS99]. Similar semantics had been independently developed for PCF [SI96] and Clean [BS98]. Furthermore, Sestoft formally derived a simple abstract machine from this lazy semantics [Ses97]. Hence we adopt this semantics of lazy evaluation here as well.

G-terms   A, B ::= x | λx.A | A x | let {xi = Ai}i=1..k in B
                 | c | case A of {ci xi ↦ Bi}i=1..k
G-values  Z    ::= λx.A | c x

Figure 2.10: Syntax of untyped terms

x* := x

(λx : τ.M)* := λx.(M*)

(M N)* := M* x                      if N = x
(M N)* := let x = N* in (M*) x      otherwise, where x does not occur free in N or M

(let {xi : τi = Mi}i=1..k in N)* := let {xi = Mi*}i=1..k in N*

c* := c

(case M of {ci xi ↦ Ni}i=1..k)* := case M* of {ci xi ↦ Ni*}i=1..k

(λα.M)* := M*

(M τ)* := M*

Figure 2.11: Translation from F-terms to G-terms

Φ : λx.A ⇓ Φ : λx.A   (abs)

Φ : A ⇓ Ψ : λy.B    Ψ : B[x/y] ⇓ Ω : Z
───────────────────────────────────────── (app)
Φ : A x ⇓ Ω : Z

Φ : Φ(x) ⇓ Ψ : Z
───────────────────────── (var)
Φ : x ⇓ Ψ[x ↦ Z] : Z

∃y.  Φ + {yi ↦ Ai[y/x]}i=1..k : B[y/x] ⇓ Ψ : Z
───────────────────────────────────────────────── (let)
Φ : let {xi = Ai}i=1..k in B ⇓ Ψ : Z

Φ : c ⇓ Φ : c   (cons)

Φ : A ⇓ Ψ : c y
───────────────────────── (cons app)
Φ : A x ⇓ Ψ : c y x

Φ : A ⇓ Ψ : cj y    Ψ : Bj[y/xj] ⇓ Ω : Z    j ∈ {1..k}
───────────────────────────────────────────────────────── (case)
Φ : case A of {ci xi ↦ Bi}i=1..k ⇓ Ω : Z

Figure 2.12: Lazy Evaluation

We define lazy evaluation for a language G, whose syntax is given in Figure 2.10. The language is untyped, because types are not needed for evaluation. Besides types, the only difference in syntax between G and F is that in G the argument in an application has to be a variable. This restriction considerably simplifies the lazy evaluation semantics, because the semantics only has to assure that let-defined variables are shared. The syntax assures that the argument in every application is let-defined. G-values directly correspond to F-values, except that no type application exists and that all arguments of a data constructor have to be variables. Figure 2.11 gives a translation from F-terms into G-terms that drops all type information and introduces let constructs where necessary.
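The following Haskell sketch conveys the idea of the translation for the core constructs (the data types, the fresh-name scheme and the omission of constructors and case are simplifications for illustration):

data F = FVar String | FLam String F | FApp F F

data G = GVar String | GLam String G | GApp G String
       | GLet [(String, G)] G

-- thread a counter to generate names for let-bound arguments
trans :: Int -> F -> (G, Int)
trans i (FVar x)          = (GVar x, i)
trans i (FLam x m)        = let (m', i') = trans i m in (GLam x m', i')
trans i (FApp m (FVar x)) = let (m', i') = trans i m in (GApp m' x, i')
trans i (FApp m n)        =
  let (m', i1) = trans i  m
      (n', i2) = trans i1 n
      x        = "_v" ++ show i2    -- assumed fresh, i.e. not free in m or n
  in (GLet [(x, n')] (GApp m' x), i2 + 1)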

To express sharing the semantics for lazy evaluation operates on graphs instead of terms. For graphs a machine-oriented representation is chosen. A graph Φ : A consists of a heap Φ and a G-term A. The term component represents the top of the graph and the heap contains possibly shared parts. A heap, denoted by Φ, Ψ or Ω, is a partial mapping from variables to G-terms. We write it as a set of tuples x ↦ A. By domain of a heap we mean the domain of the mapping. The operator + combines two heaps under the assumption that their domains are disjoint. We only consider closed graphs, that is, graphs where the free variables of the term and of all terms in the heap are a subset of the domain of the heap. So a closed graph represents a closed term with shared subterms. For example,

{z ↦ (λy:Bool.y) t, t ↦ True} : case z of {True ↦ z}

may be graphically visualised as

case z of {True ↦ z}
   └─ z ─→ (λy:Bool.y) t
              └─ t ─→ True

and represents the term

case (λy:Bool.y) True of {True ↦ (λy:Bool.y) True}

In Figure 2.12 the semantics of lazy evaluation is given. Similar to the definition of evaluation in Section 2.4 we define a lazy evaluation relation

Φ : A ⇓ Ψ : Z

meaning that the graph Φ : A evaluates to the graph Ψ : Z. We write Φ : A ⇓ if there exists a graph Ψ : Z with Φ : A ⇓ Ψ : Z. The rules abs, cons and cons app directly correspond to their F-counterparts. The rules app and case are slightly simpler, because we only substitute variables for variables. Thus the duplication of subterms that happens during the evaluation of F-terms is avoided. The central parts of lazy evaluation are the let and the var rule. The let rule just adds new variable bindings to the heap. Thus the definition bodies of the let construct are shared. Recursive definitions create cyclic graphs. Finally, the var rule assures that a term in the heap is evaluated at most once. To evaluate a variable x, its corresponding term in the heap is evaluated. Because we only consider closed graphs, the variable x must be in the domain of the heap. The resulting graph is also the result of evaluating x. However, by Ψ[x ↦ Z] we mean that the heap is updated to map x to the value Z. Hence another evaluation of x will directly yield Z without any further evaluation.

((λx:Bool. case x of {True ↦ x}) ((λy:Bool.y) True))*
  = let z = (let t = True in (λy:Bool.y) t)
    in (λx:Bool. case x of {True ↦ x}) z

For lazy evaluation we use the same vertical notation as for evaluation.

{} : let z = (let t = True in (λy:Bool.y) t)
     in (λx:Bool. case x of {True ↦ x}) z
 {z ↦ let t = True in (λy:Bool.y) t} : (λx:Bool. case x of {True ↦ x}) z
  [ " : λx:Bool. case x of {True ↦ x}
  [ " : case z of {True ↦ z}
     [ " : z
        [ " : let t = True in (λy:Bool.y) t
           {..., t ↦ True} : (λy:Bool.y) t          (evaluation)
            [ " : λy:Bool.y
            [ " : t
               [ " : True
              " : True
           " : True
       {z ↦ True, ...} : True
     [ " : z
        [ " : True          (no repeated evaluation, only look up)
       " : True
    " : True
{z ↦ True, t ↦ True} : True

Note that the computation is longer than for the unshared evaluation semantics, because the lazy semantics describes evaluation at a higher level of detail.

The two semantics agree in the following sense. The translation * from F-terms into G-terms is adequate, that is, evaluation of an F-term and lazy evaluation of its translation have the same convergence behaviour:

⊢ M : τ    τ not a universal type
──────────────────────────────────
M ⇓  ⇔  {} : M* ⇓

Because of the quantification over all F-terms this implies especially

⊢ M : T τ                                            ⊢ M : T τ
──────────────────────────────────────     and     ──────────────────────────────────────
M ⇓ c τ N ⇒ ∃Ψ, x. {} : M* ⇓ Ψ : c x                {} : M* ⇓ Ψ : c x ⇒ ∃N. M ⇓ c τ N

It would be too much here to extend the translation * to a binary relation between F-terms and graphs and prove that the relation is a bisimulation. The adequacy result indirectly follows from [Abr90], where the agreement between a natural evaluation semantics and a denotational semantics is proved, and [Lau93], where the agreement between this denotational semantics and his lazy evaluation semantics is proved. Our lazy evaluation semantics differs from Launchbury's in two respects. First, Launchbury requires all bound variables to be distinct and renames variables in the var rule. We do not require distinct bound variables and rename variables in the let rule, as suggested in [Ses97]. The introduction of new variables in the let rule is closer to an actual implementation. Second, in Launchbury's var rule the reference to x is omitted from the heap during the evaluation of Φ(x). This omission models the so-called "black-holing" in implementations of lazy languages [Pey92]. Mainly it enables the garbage collector to recognise more parts of the heap as garbage. We left this detail out for the sake of simplicity.
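To make the rules of Figure 2.12 concrete, here is a minimal Haskell interpreter sketch of the abs, app, var and let rules (data types and names are assumptions; constructors, case, renaming of let-bound variables and black-holing are omitted):

import qualified Data.Map as Map

data G = Var String | Lam String G | App G String
       | Let [(String, G)] G

type Heap = Map.Map String G

-- substitute variable x for variable y (capture is assumed away)
rename :: String -> String -> G -> G
rename x y (Var z)    = Var (if z == y then x else z)
rename x y (Lam z b)  = Lam z (if z == y then b else rename x y b)
rename x y (App a z)  = App (rename x y a) (if z == y then x else z)
rename x y (Let bs b) = Let [(z, rename x y a) | (z, a) <- bs] (rename x y b)

eval :: Heap -> G -> (Heap, G)
eval h v@(Lam _ _) = (h, v)                           -- rule abs
eval h (App a x)   =                                  -- rule app
  case eval h a of
    (h1, Lam y b) -> eval h1 (rename x y b)
    _             -> error "application of a non-function"
eval h (Var x)     =                                  -- rule var: update the heap
  let (h1, z) = eval h (h Map.! x)
  in (Map.insert x z h1, z)
eval h (Let bs b)  =                                  -- rule let: extend the heap
  eval (foldr (uncurry Map.insert) h bs) b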

2.9.2 Improvement

The lazy evaluation semantics is well suited to determine and compare costs of evaluation. The number of high-level steps to evaluate a program P, that is, the size of the derivation

tree of {} : P* ⇓ Φ : Z, is an abstract measure for the runtime of P. Because the heap is modelled by the semantics, the number of heap cells that are allocated during evaluation can be determined. Note that the size of the heap is not a measure of the space costs, because parts of the heap may be garbage, that is, they are no longer reachable from the term component of the graph. The lazy evaluation semantics can be extended to identify garbage and even perform garbage collection ([Lau93], Section 6.2). We say that a transformation rule Γ ⊢ M ⤳ M′ : τ is improving if for all programs P and P′ in its contextual closure P′ has lower evaluation costs than P. Thus improvement is described similarly to contextual equivalence, but it is more complex, because the lazy evaluation semantics is more complex. This notion of improvement has been formally defined for runtime in [MS99] and for space in [GS99]. Unfortunately, the theories developed there are still too weak to prove improvement for many of the transformation rules used in our deforestation algorithm, especially inlining and the fusion rule. Nonetheless, the lazy evaluation semantics and improvement serve as background for intuitive arguments about the effect of a transformation. Furthermore, they are a good basis to show that certain transformation rules could worsen performance. In general inlining may increase evaluation costs by duplicating evaluation. However, inlining improves runtime performance for example if the inlined definition body is a value or the variable whose definition body is inlined is used at most once [PM99].

Chapter 3

Type-Inference Based Deforestation

In this chapter we explain how our new deforestation algorithm works. We mostly use examples to illustrate the working of the algorithm and discuss certain design decisions. The central part of the deforestation algorithm, the list abstraction algorithm, will be presented fully and formally and proved to be complete and correct in the following chapter. The remaining part of the deforestation algorithm is quite simple and is a combination of several well-known transformations, such as inlining, whose correctness we have proved in the previous chapter.

3.1 List Abstraction through Type Inference

We deforest a program with the short cut fusion rule

Γ ⊢ M(:) : τ1 → τ2 → τ2    Γ ⊢ M[] : τ2    Γ ⊢ P : ∀γ.(τ1 → γ → γ) → γ → γ
────────────────────────────────────────────────────────────────────────────── (fusion)
Γ ⊢ foldr τ1 τ2 M(:) M[] (P [τ1] ((:) τ1) ([] τ1)) ⤳fusion P τ2 M(:) M[]

that we discussed in Section 2.8. Our deforestation algorithm searches for terms of the form foldr τ1 τ2 M(:) M[] M^P and then tries to transform the producer M^P into the form P [τ1] ((:) τ1) ([] τ1) with Γ ⊢ P : ∀γ.(τ1 → γ → γ) → γ → γ to subsequently apply the fusion rule. To obtain the polymorphic producer skeleton P the transformation abstracts from the producer M^P the list type [τ1] and the constructors (:) τ1 and [] τ1 that construct the result list. We call this transformation list abstraction.

3.1.1 Basic List Abstraction

The following term produces a list of type [Int]:

let mapInt : (Int → Int) → [Int] → [Int]
           = λf:Int → Int. λxs:[Int]. case xs of
               { []   ↦ []
               , y:ys ↦ (f y) : (mapInt f ys) }
in mapInt inc [1,2]

For the moment we only consider monomorphic list constructors

∆ = {(:) : Int → [Int] → [Int], [] : [Int], ...}

and a monomorphic version of map for type Int. Furthermore we assume that the definition of mapInt is part of the producer. We will lift these restrictions step by step in the following sections. We start list abstraction with the typing of the producer:

{inc : Int → Int}
⊢ let mapInt : (Int → Int) → [Int] → [Int]
             = λf:Int → Int. λxs:[Int]. case xs of
                 { []   ↦ []
                 , y:ys ↦ (:) (f y) (mapInt f ys) }
  in mapInt inc ((:) 1 ((:) 2 []))
: [Int]

We replace the list constructor (:), respectively [], at every occurrence by a new variable v(:), respectively v[] (except for patterns in case constructs, because these do not construct but destruct a list). Furthermore, the types in the term and in the typing environment have to be modified. To use the existing ones as far as possible, we only replace the list type [Int] at every occurrence by a new type variable γ. Furthermore, we add v(:) : Int → γ → γ, respectively v[] : γ, to the typing environment, where γ is a new type variable for every variable v(:), respectively v[].

{inc : Int → Int, v1(:) : Int → γ1 → γ1, v2(:) : Int → γ2 → γ2,
 v3(:) : Int → γ3 → γ3, v1[] : γ4, v2[] : γ5}
⊢ let mapInt : (Int → Int) → γ6 → γ7
             = λf:Int → Int. λxs:γ8. case xs of
                 { []   ↦ v1[]
                 , y:ys ↦ v1(:) (f y) (mapInt f ys) }
  in mapInt inc (v2(:) 1 (v3(:) 2 v2[]))

This typing environment and term with type variables do not form a valid typing for any type. They are the input to a type inference algorithm that we will present in Section 4.3.5. The type inference algorithm replaces some of the new type variables γ1, . . . , γ8 and determines a type to obtain again a valid typing. More precisely, the type inference algorithm determines a principal typing, that is, the most general instance of the input that gives a valid typing. Note that type inference cannot fail, because the typing we start with is valid. In the worst case the type inference algorithm yields the typing of the original producer. We just try to find a more general typing. For our example the type inference algorithm yields the valid typing:

{inc : Int → Int, v1(:) : Int → γ → γ, v2(:) : Int → [Int] → [Int],
 v3(:) : Int → [Int] → [Int], v1[] : γ, v2[] : [Int]}
⊢ let mapInt : (Int → Int) → [Int] → γ
             = λf:Int → Int. λxs:[Int]. case xs of
                 { []   ↦ v1[]
                 , y:ys ↦ v1(:) (f y) (mapInt f ys) }
  in mapInt inc (v2(:) 1 (v3(:) 2 v2[]))
: γ

The type of the term is a type variable γ and nothing other than the vi(:)'s and the vi[]'s has γ in its type. Hence list abstraction is possible: The typing environment tells us that v1(:) and v1[] construct values of type γ, so they construct the result of the producer. In contrast v2(:), v3(:), and v2[] have the types of normal list constructors. Hence they construct lists that are internal to the producer. So we reinstantiate v2(:), v3(:), and v2[] to normal list constructors and abstract the type variable γ and the variables v1(:) and v1[] to obtain the producer skeleton of the required type:

{inc : Int → Int}
⊢ λγ. λv1(:):Int → γ → γ. λv1[]:γ.
    let mapInt : (Int → Int) → [Int] → γ
               = λf:Int → Int. λxs:[Int]. case xs of
                   { []   ↦ v1[]
                   , y:ys ↦ v1(:) (f y) (mapInt f ys) }
    in mapInt inc ((:) 1 ((:) 2 []))
: ∀γ.(Int → γ → γ) → γ → γ

The complete producer can be written as

(λγ. λv1(:):Int → γ → γ. λv1[]:γ. let ... in ...) [Int] (:) []

In this list abstracted form it is suitable for short cut fusion with a foldr consumer.
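In Haskell terms the abstracted producer corresponds to the following sketch (producer and inc = (+1) are illustrative names, not output of the algorithm):

producer :: (Int -> g -> g) -> g -> g
producer cons nil = mapI inc [1, 2]
  where
    inc = (+ 1)    -- assumption: some function of type Int -> Int
    mapI f []     = nil
    mapI f (y:ys) = cons (f y) (mapI f ys)

-- producer (:) []              rebuilds the original list
-- foldr c n (producer (:) [])  fuses to  producer c n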

3.1.2 Polymorphic List Constructors

Until now we assumed that we only have lists over Ints. In reality, lists are polymorphic, that is, ∆((:)) = ∀α.α → [α] → [α] and ∆([]) = ∀α.[α]. So the typing we start with looks as follows:

{inc : Int → Int}
⊢ let mapInt : (Int → Int) → [Int] → [Int]
             = λf:Int → Int. λxs:[Int]. case xs of
                 { []   ↦ [] Int
                 , y:ys ↦ (:) Int (f y) (mapInt f ys) }
  in mapInt inc ((:) Int 1 ((:) Int 2 ([] Int)))
: [Int]

We want to abstract the list of type [Int]. Therefore, we replace every list constructor application (:) Int, respectively [] Int, by a different variable v(:), respectively v[]. Then we continue just as described in the previous section. After type inference we naturally have to replace again those variables v(:) and v[] that are not abstracted by list constructor applications (:) Int, respectively [] Int. We obtain a producer skeleton of the required type

{inc : Int → Int}
⊢ λγ. λv1(:):Int → γ → γ. λv1[]:γ.
    let mapInt : (Int → Int) → [Int] → γ
               = λf:Int → Int. λxs:[Int]. case xs of
                   { []   ↦ v1[]
                   , y:ys ↦ v1(:) (f y) (mapInt f ys) }
    in mapInt inc ((:) Int 1 ((:) Int 2 ([] Int)))
: ∀γ.(Int → γ → γ) → γ → γ

and the complete producer looks as follows:

(λγ. λv1(:):Int → γ → γ. λv1[]:γ. let ... in ...) [Int] ((:) Int) ([] Int)

Note that in contrast to the list constructors (:) and [] the abstracted variables v1(:) and v1[] must have a monomorphic type, because the terms M(:) and M[] of a consumer foldr τ1 τ2 M(:) M[] are monomorphic.

3.2 Inlining of Definitions

In practice the definition of mapInt will not be part of the producer. The producer will just be mapInt inc [1,2], from which it is impossible to abstract the list constructors that construct the result list, because they are not part of the term. Hence we may have to inline definitions of variables such as mapInt, that is, make them part of the producer.

3.2.1 Determination of Variables that Need to be Inlined

Rather nicely, instead of having to use some heuristics for inlining, we can use the typing environment of a principal typing to determine exactly those variables whose definitions need to be inlined to enable list abstraction. Note that it is important not just to inline the right hand side of a recursive definition but the whole recursive definition. We consider the producer mapInt inc [1,2]. So we start with its typing:

{inc : Int → Int, mapInt : (Int → Int) → [Int] → [Int]}
⊢ mapInt inc ((:) Int 1 ((:) Int 2 ([] Int))) : [Int]

Before type inference we replace the type [Int] at every occurrence by a new type variable, not only in the term but also in the types of all inlinable variables in the typing environment.

{inc : Int → Int, mapInt : (Int → Int) → γ1 → γ2,
 v1(:) : Int → γ3 → γ3, v2(:) : Int → γ4 → γ4, v1[] : γ5}
⊢ mapInt inc (v1(:) 1 (v2(:) 2 v1[]))

The type inference algorithm gives us the principal typing

{inc : Int → Int, mapInt : (Int → Int) → γ1 → γ2,
 v1(:) : Int → γ1 → γ1, v2(:) : Int → γ1 → γ1, v1[] : γ1}
⊢ mapInt inc (v1(:) 1 (v2(:) 2 v1[])) : γ2

The type of the term is a type variable γ2. However, we cannot abstract γ2, because it appears in the typing environment in the type of a variable other than the vi(:)'s and the vi[]'s. The type variable γ2 also occurs in the type of mapInt. This occurrence of γ2 signifies that the definition of mapInt needs to be inlined. We do not even have to repeat the whole substitution and type inference process for the extended producer, that is, the producer with the definition of mapInt inlined. Instead we can continue processing the right hand side of mapInt inside the extended producer. We just have to unify the resulting type of the right hand side with (Int → Int) → γ1 → γ2. Thus we can inline definitions and process them until either the type of the extended producer becomes [Int] and list abstraction fails, or it is a type variable that appears in no type of a variable of the typing environment, except those of the vi(:) and vi[] of course. Subsequently, all processed (potentially mutually recursive) definitions are put into a single let binding. For our example the output coincides modulo variable names with the output from the previous section. Note that only the definition of mapInt is inlined, not the definition of inc. To ensure that code explosion never happens, an implementation of inlining will probably abandon list constructor abstraction when the inlined code exceeds a predefined size.

3.2.2 Instantiation of Polymorphism

There is one limitation still present in our example. The producer should use the polymorphic function map instead of mapInt.

{inc : Int → Int, map : ∀α.∀β.(α → β) → [α] → [β]}
⊢ map Int Int inc ((:) Int 1 ((:) Int 2 ([] Int))) : [Int]

On this input our current list abstraction algorithm fails. In the type of map no type [Int] occurs that could be replaced by a new type variable, and thus type inference will assign the producer the type [Int] instead of a type variable. Replacing the types [α] and [β] by type variables and abstracting these type variables later is not semantically sound, because α and β are bound locally. The solution is to instantiate polymorphic list manipulating functions that are used in the producer as far as possible before applying the list abstraction algorithm as described before. So in the producer we replace map Int Int by a new variable mapIntInt. We obtain a definition for mapIntInt by dropping the abstraction from α and β in the definition of map and replacing both α and β by Int. Thus we reach the same situation as in the previous subsection and can abstract the produced list successfully. In the general case we may have instantiated polymorphic variables that we did not inline later. These instantiations have to be undone. The instantiation method may seem restrictive, but note that the translation of a Hindley-Milner typed program into our second-order typed language F yields a program where only definition bodies in let constructs are type-abstracted and term variables of polymorphic type occur only in type applications. Programs in the intermediate language of the Glasgow Haskell compiler are nearly completely in this form, because the Haskell type system is based on the Hindley-Milner type system.

More examples and information about how type-inference based inlining can be implemented efficiently are given in [Chi99]. We do not go into these details here, because in the next section we will present a better alternative to inlining.

3.3 The Worker/Wrapper Scheme

It is neat that the algorithm determines exactly the functions that need to be inlined, but nonetheless inlining causes problems in practice. Extensive inlining across module boundaries would defeat the idea of separate compilation. Furthermore, we noted in Section 2.9.2 that inlining may also decrease performance and in practice "inlining is a black art, full of delicate compromises that work together to give good performance without unnecessary code bloat" [PM99]. It is best implemented as a separate optimisation pass. Consequently, we would like to use our list abstraction algorithm without it having to perform inlining itself. To be able to abstract the result list from a producer without using inlining, all list constructors that construct the result list already have to be present in the producer. Therefore we use a so-called worker/wrapper scheme as it is used inside the Glasgow Haskell compiler for transferring strictness information [PL91] and has been proposed by Gill for short cut deforestation (see Section 7.1.1). The idea is to automatically split every definition of a function that produces a list into a definition of a worker and a definition of a wrapper. The definition of the worker is obtained from the original definition by abstracting the result list type and its list constructors. The definition of the wrapper, which calls the worker, contains all the list constructors that construct the result list. For example, we split the definition of the function mapInt

mapInt : (Int → Int) → [Int] → [Int]
       = λf:Int → Int.
           foldr Int [Int] (λu:Int. λw:[Int]. (:) Int (f u) w) ([] Int)

into definitions of a worker mapIntW and a wrapper mapInt:

mapIntW : ∀γ.(Int → γ → γ) → γ → (Int → Int) → [Int] → γ
        = λγ. λv(:):Int → γ → γ. λv[]:γ. λf:Int → Int.
            foldr Int γ (λu:Int. λw:γ. v(:) (f u) w) v[]

mapInt : (Int → Int) → [Int] → [Int]
       = mapIntW [Int] ((:) Int) ([] Int)

Just as easily we split the definition of the polymorphic function map

map : ∀α.∀β.(α → β) → [α] → [β]
    = λα. λβ. λf:α → β.
        foldr α [β] (λu:α. λw:[β]. (:) β (f u) w) ([] β)

into definitions of a worker mapW and a wrapper map:

mapW : ∀α.∀β.∀γ.(β → γ → γ) → γ → (α → β) → [α] → γ
     = λα. λβ. λγ. λv(:):β → γ → γ. λv[]:γ. λf:α → β.
         foldr α γ (λu:α. λw:γ. v(:) (f u) w) v[]

map : ∀α.∀β.(α → β) → [α] → [β]
    = λα. λβ. mapW α β [β] ((:) β) ([] β)

For deforestation we only need to inline the wrapper. Consider for example deforestation of the body of the definition of any as defined in the introduction:

  or (map τ Bool p xs)
    { inlining of or and map }
  foldr Bool Bool (||) False (mapW τ Bool [Bool] ((:) Bool) ([] Bool) p xs)
    { list abstraction of the producer }
  foldr Bool Bool (||) False
    ((λγ. λv(:):Bool → γ → γ. λv[]:γ. mapW τ Bool γ v(:) v[] p xs)
       [Bool] ((:) Bool) ([] Bool))
    { fusion and subsequent β-reduction }
  mapW τ Bool Bool (||) False p xs
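In plain Haskell the split for map corresponds to this sketch (primed names are chosen here only to avoid clashing with the Prelude):

mapW' :: (b -> g -> g) -> g -> (a -> b) -> [a] -> g
mapW' cons nil f = foldr (\u w -> cons (f u) w) nil

map' :: (a -> b) -> [a] -> [b]
map' = mapW' (:) []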

It is left to the standard inliner to determine whether mapW is inlined. Across module boundaries or if its definition is large, a worker may not be inlined. This does not influence deforestation. Note that in the definition of the worker mapW we insert the new λ-abstractions between the type abstractions and the term abstractions. We cannot insert the new term abstractions in front of the original type abstractions, because the list type [β], from which we abstract, contains the type variable β which is bound in the type of the function. To insert the new abstractions in front of the original term abstractions has a minor advantage: The wrapper can be inlined and β-reduced even at call sites where it is only partially applied, because its definition partially applies the worker. After we have discussed the list abstraction algorithm in detail in Chapter 4, we will discuss in Chapter 5 how a definition of a function that produces a list can be split fully automatically into a definition of a worker and a wrapper. In the remainder of this chapter we give examples of the expressive power of the worker/wrapper scheme and how it enables deforestation.

3.3.1 Functions that Produce Several Lists

A worker can even abstract from several lists. For example the definition of the function unzip, which produces two lists, can be split into the following worker and wrapper definitions:

unzipW : ∀α.∀β.∀γ.∀δ.(α → γ → γ) → γ → (β → δ → δ) → δ → [(α,β)] → (γ,δ)
       = λα. λβ. λγ. λδ. λv1(:):α → γ → γ. λv1[]:γ. λv2(:):β → δ → δ. λv2[]:δ.
           foldr (α,β) (γ,δ)
             (λx:(α,β). λy:(γ,δ). case x of
                { (u,w) ↦ case y of
                    { (us,ws) ↦ (,) γ δ (v1(:) u us) (v2(:) w ws) } })
             ((,) γ δ v1[] v2[])

unzip : ∀α.∀β.[(α,β)] → ([α],[β])
      = λα. λβ. unzipW α β [α] [β] ((:) α) ([] α) ((:) β) ([] β)

The subsequent transformations demonstrate how the wrapper enables deforestation without requiring inlining of the larger worker:

  foldr τ1 τ3 M(:) M[] (fst [τ1] [τ2] (unzip τ1 τ2 zs))
    { inlining of the wrapper unzip }
  foldr τ1 τ3 M(:) M[]
    (fst [τ1] [τ2] (unzipW τ1 τ2 [τ1] [τ2] ((:) τ1) ([] τ1) ((:) τ2) ([] τ2) zs))
    { list abstraction of the producer }
  foldr τ1 τ3 M(:) M[]
    ((λγ. λv(:):τ1 → γ → γ. λv[]:γ.
        fst γ [τ2] (unzipW τ1 τ2 γ [τ2] v(:) v[] ((:) τ2) ([] τ2) zs))
       [τ1] ((:) τ1) ([] τ1))
    { fusion and subsequent β-reduction }
  fst τ3 [τ2] (unzipW τ1 τ2 τ3 [τ2] M(:) M[] ((:) τ2) ([] τ2) zs)

Original short cut deforestation cannot handle producers of several lists and hence cannot fuse the producer fst [Bool] [Char] (unzip Bool Char zs) with any consumer (see Section 7.1.2).

We still cannot fuse if both result lists of unzip are consumed, but see Section 6.2 for an extension that can also handle that case.
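In plain Haskell the worker for unzip reads as follows (a sketch with illustrative primed names):

unzipW' :: (a -> g -> g) -> g -> (b -> d -> d) -> d -> [(a, b)] -> (g, d)
unzipW' c1 n1 c2 n2 =
  foldr (\(u, w) (us, ws) -> (c1 u us, c2 w ws)) (n1, n2)

unzip' :: [(a, b)] -> ([a], [b])
unzip' = unzipW' (:) [] (:) []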

3.3.2 List Concatenation

The list append function (++) is notorious for being difficult to fuse with (cf. Chapter 7), because the term (++) τ xs ys does not produce the whole result list itself. Only xs is copied but not ys. However, we can easily define a worker for (++) by abstracting not just the result list but simultaneously the type of the second argument:

appW : ∀α.∀γ.(α → γ → γ) → γ → [α] → γ → γ
     = λα. λγ. λv(:):α → γ → γ. λv[]:γ. λxs:[α]. λys:γ.
         foldr α γ v(:) ys xs

(++) : ∀α.[α] → [α] → [α]
     = λα. appW α [α] ((:) α) ([] α)

Note that the variable v[], which abstracts the empty list, is not used in the body of the worker appW. Hence we do not need it and we can define worker and wrapper simply as:

appW : ∀α.∀γ.(α → γ → γ) → [α] → γ → γ
     = λα. λγ. λv(:):α → γ → γ. λxs:[α]. λys:γ.
         foldr α γ v(:) ys xs

(++) : ∀α.[α] → [α] → [α]
     = λα. appW α [α] ((:) α)
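In Haskell the simplified worker and wrapper read (a sketch; primed names are illustrative):

appW' :: (a -> g -> g) -> [a] -> g -> g
appW' cons xs ys = foldr cons ys xs    -- only xs is traversed; ys is the nil case

append' :: [a] -> [a] -> [a]
append' = appW' (:)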

The type of appW implies that we can only abstract the result list constructors from an application of (++), if we can abstract the result list constructors from its second argument. We believe that this will seldom restrict deforestation in practice. For example the definition

concat : ∀α.[[α]] → [α]
       = λα. foldr [α] [α] ((++) α) ([] α)

can be split into a worker and a wrapper definition thanks to the worker appW:

concatW : ∀α.∀γ.(α → γ → γ) → γ → [[α]] → γ
        = λα. λγ. λv(:):α → γ → γ. λv[]:γ.
            foldr [α] γ (appW α γ v(:)) v[]

concat : ∀α.[[α]] → [α]
       = λα. concatW α [α] ((:) α) ([] α)

3.3.3 Recursively Defined Producers

In all previous examples recursion was hidden by foldr. Nonetheless recursively defined producers can be split into workers and wrappers just as easily. Consider the recursively defined function tails which returns the list of all final segments of a list xs, longest first:

tails : ∀α.[α] → [[α]]
      = λα. λxs:[α]. case xs of
          { []   ↦ (:) [α] ([] α) ([] [α])
          , y:ys ↦ (:) [α] xs (tails α ys) }

It can be split into definitions of a worker tailsW and a wrapper tails:

tailsW : ∀α.∀γ.([α] → γ → γ) → γ → [α] → γ
       = λα. λγ. λc:[α] → γ → γ. λn:γ. λxs:[α]. case xs of
           { []   ↦ c ([] α) n
           , y:ys ↦ c xs (tailsW α γ c n ys) }

tails : ∀α.[α] → [[α]]
      = λα. tailsW α [[α]] ((:) [α]) ([] [α])

Note that to abstract the list the recursive call in the definition of the worker must be to the worker itself, not to the wrapper. It is possible to avoid passing α, γ, c and n in the recursive call, as we will discuss in Section 5.1.2, but for the following examples that optimisation is undesirable.
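In Haskell, worker and wrapper for tails read (a sketch; primed names are illustrative):

tailsW' :: ([a] -> g -> g) -> g -> [a] -> g
tailsW' c n []          = c [] n
tailsW' c n xs@(_ : ys) = c xs (tailsW' c n ys)    -- recursion goes to the worker

tails' :: [a] -> [[a]]
tails' = tailsW' (:) []

-- tails' [1,2]  yields  [[1,2],[2],[]]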

3.4 Functions that Consume their Own Result

There are definitions of list functions that consume their own result. The simplest example is the definition of the function that reverses a list in quadratic time:

reverse : ∀α.[α] → [α]
        = λα. λxs:[α]. case xs of
            { []   ↦ [] α
            , y:ys ↦ (++) α (reverse α ys) ((:) α y ([] α)) }

This definition can be split into the following worker and wrapper:

reverseW : ∀α.∀γ.(α → γ → γ) → γ → [α] → γ
         = λα. λγ. λv(:):α → γ → γ. λv[]:γ. λxs:[α]. case xs of
             { []   ↦ v[]
             , y:ys ↦ appW α γ v(:)
                        (reverseW α [α] ((:) α) ([] α) ys) (v(:) y v[]) }

reverse : ∀α.[α] → [α]
        = λα. reverseW α [α] ((:) α) ([] α)

In this definition of reverseW the worker appW can be inlined and β-reduced, giving

reverseW : ∀α.∀γ.(α → γ → γ) → γ → [α] → γ
         = λα. λγ. λv(:):α → γ → γ. λv[]:γ. λxs:[α]. case xs of
             { []   ↦ v[]
             , y:ys ↦ foldr α γ v(:) (v(:) y v[])
                        (reverseW α [α] ((:) α) ([] α) ys) }

So we have a foldr consumer of the producer (reverseW α [α] ((:) α) ([] α) ys). We apply list abstraction to the producer to obtain

reverseW : ∀α.∀γ.(α → γ → γ) → γ → [α] → γ
         = λα. λγ. λv(:):α → γ → γ. λv[]:γ. λxs:[α]. case xs of
             { []   ↦ v[]
             , y:ys ↦ foldr α γ v(:) (v(:) y v[])
                        ((λδ. λv2(:):α → δ → δ. λv2[]:δ. reverseW α δ v2(:) v2[] ys)
                           [α] ((:) α) ([] α)) }

Finally short cut fusion and subsequent β-reduction yields

reverseW : ∀α.∀γ.(α → γ → γ) → γ → [α] → γ
         = λα. λγ. λv(:):α → γ → γ. λv[]:γ. λxs:[α]. case xs of
             { []   ↦ v[]
             , y:ys ↦ reverseW α γ v(:) (v(:) y v[]) ys }

The deforested version performs list reversal in linear time. The worker argument that abstracts the list constructor [] is used as an accumulator. The type inference based method of inlining definitions which we discussed in Section 3.2 cannot achieve this transformation of the quadratic version into the linear version. To abstract the intermediate list, that algorithm would need to inline the definition of reverse. Then the intermediate list would be eliminated successfully, but the inlined definition of reverse would contain a new starting point for deforestation which would lead to new inlining of reverse ... The quadratic version produces at runtime an intermediate list between each recursive call. To remove all these intermediate lists through a finite amount of transformation a worker/wrapper scheme is required. Gill noted this fact first in his thesis [Gil96], giving a similar example: a function which traverses a tree to collect all node entries in a list. A straightforward quadratic time definition can be split into a polymorphically recursive worker and a wrapper and then be deforested to obtain a linear time definition which uses an accumulating argument.
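The deforested worker above is just the familiar accumulator version of list reversal; in Haskell (a sketch with illustrative primed names):

reverseW' :: (a -> g -> g) -> g -> [a] -> g
reverseW' cons acc []       = acc                             -- the nil argument
reverseW' cons acc (y : ys) = reverseW' cons (cons y acc) ys  -- accumulates

reverse' :: [a] -> [a]
reverse' = reverseW' (:) []    -- linear time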

3.4.1 A larger example: inits

A different, fascinating example is the definition of the function inits, which determines the list of initial segments of a list with the shortest first.

inits : ∀α.[α] → [[α]]
      = λα. λxs:[α]. case xs of
          { []   ↦ (:) [α] ([] α) ([] [α])
          , y:ys ↦ (:) [α] ([] α) (map [α] [α] ((:) α y) (inits α ys)) }

The definition is split into the following worker and wrapper definitions:

initsW : ∀α.∀γ.∀δ.(α → γ → γ) → γ → (γ → δ → δ) → δ → [α] → δ
       = λα. λγ. λδ. λv1(:):α → γ → γ. λv1[]:γ. λv2(:):γ → δ → δ. λv2[]:δ.
           λxs:[α]. case xs of
             { []   ↦ v2(:) v1[] v2[]
             , y:ys ↦ v2(:) v1[] (mapW γ γ δ v2(:) v2[] (v1(:) y)
                        (initsW α γ [γ] v1(:) v1[] ((:) γ) ([] γ) ys)) }

inits : ∀α.[α] → [[α]]
      = λα. initsW α [α] [[α]] ((:) α) ([] α) ((:) [α]) ([] [α])

Note that both (nested) result lists are abstracted. In this definition of initsW the worker mapW can be inlined and β-reduced, giving

initsW : ∀α.∀γ.∀δ.(α → γ → γ) → γ → (γ → δ → δ) → δ → [α] → δ
       = λα. λγ. λδ. λv1(:):α → γ → γ. λv1[]:γ. λv2(:):γ → δ → δ. λv2[]:δ.
           λxs:[α]. case xs of
             { []   ↦ v2(:) v1[] v2[]
             , y:ys ↦ v2(:) v1[] (foldr γ δ (λu:γ. λw:δ. v2(:) (v1(:) y u) w) v2[]
                        (initsW α γ [γ] v1(:) v1[] ((:) γ) ([] γ) ys)) }

So we have a foldr consumer of the producer (initsW ...). We apply list abstraction to the producer to obtain

initsW : ∀α.∀γ.∀δ.(α → γ → γ) → γ → (γ → δ → δ) → δ → [α] → δ
       = λα. λγ. λδ. λv1(:):α → γ → γ. λv1[]:γ. λv2(:):γ → δ → δ. λv2[]:δ.
           λxs:[α]. case xs of
             { []   ↦ v2(:) v1[] v2[]
             , y:ys ↦ v2(:) v1[] (foldr γ δ (λu:γ. λw:δ. v2(:) (v1(:) y u) w) v2[]
                        ((λδ′. λv3(:):γ → δ′ → δ′. λv3[]:δ′.
                            initsW α γ δ′ v1(:) v1[] v3(:) v3[] ys)
                           [γ] ((:) γ) ([] γ))) }

Finally short cut fusion and subsequent β-reduction yields

initsW : ∀α.∀γ.∀δ.(α → γ → γ) → γ → (γ → δ → δ) → δ → [α] → δ
       = λα. λγ. λδ. λv1(:):α → γ → γ. λv1[]:γ. λv2(:):γ → δ → δ. λv2[]:δ.
           λxs:[α]. case xs of
             { []   ↦ v2(:) v1[] v2[]
             , y:ys ↦ v2(:) v1[] (initsW α γ δ v1(:) v1[]
                        (λu:γ. λw:δ. v2(:) (v1(:) y u) w) v2[] ys) }

The definition of the n-queens function which we consider in Section 5.3 is another example in the same spirit.

3.4.2 Deforestation Changes Complexity

Deforestation of the definition of reverse changes its complexity from quadratic to linear time. In case of the definition of inits, the change of complexity is more subtle. Both the original definition and the deforested definition take quadratic time to produce their complete result. However, to produce only the outer list of the result, with unevaluated list elements, the original definition still takes quadratic time whereas the deforested version only needs linear time. A recursively defined function that consumes its own result with foldr will nearly always enable deforestation that changes the asymptotic time complexity of the definition. This power is, however, a double-edged sword. A small syntactic change of a program (cf. next section) may cause deforestation to be no longer applicable, and thus change the asymptotic complexity of the program. It may hence be argued that such far-reaching modifications should be left to the programmer.

3.4.3 Inaccessible Recursive Arguments

Unfortunately, a function may consume its own result but not be defined recursively. For example, the function reverse should actually be defined in terms of foldr, to enable short cut deforestation with reverse as consumer.

reverse : ∀α.[α] → [α]
        = λα. foldr α [α] (λy:α. λr:[α]. (++) α r ((:) α y ([] α))) ([] α)

It is impossible to abstract the result list from this definition. No matter which list constructors and list types we abstract, the resulting term is never well-typed. Our list abstraction algorithm, which abstracts as many lists as possible, just returns the unchanged definition body. The cause of the problem is that the recursion argument r must be a list, because it is consumed by (++). To enable list abstraction we have to rewrite the definition, turning r into a function with a list type and its constructors as arguments:

reverse : ∀α.[α] → [α]
        = λα. foldr α [α]
            (λy:α. λr:(α → [α] → [α]) → [α] → [α].
               (++) α (r ((:) α) ([] α)) ((:) α y ([] α)))
            (λc:α → [α] → [α]. λn:[α]. n)
            ((:) α) ([] α)

From this definition our list abstraction algorithm can abstract the result list and thus split the definition into a worker and a wrapper (a Haskell analogue of this lifting is sketched below). It is, however, unclear when and how such restructuring to lift a list type to a function type can be done in general. Our deforestation algorithm only tries to abstract list types and list constructors. It does not perform any further restructuring of a definition. Hence it cannot deforest the first foldr definition of reverse.
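As an illustration of the shape such a rewriting aims at, here is a Haskell analogue (a sketch, not the thesis's definition, and already in the deforested, accumulating form): the recursion argument r is a function that receives the list constructors instead of being a list itself.

reverseF :: [a] -> [a]
reverseF xs = foldr step (\_ n -> n) xs (:) []
  where
    -- r takes the constructors c and n itself, so the result list stays abstract
    step y r = \c n -> r c (c y n)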

Chapter 4

List Abstraction And Type Inference

In the preceding chapter we informally introduced the idea of list abstraction through type inference. List abstraction prepares a list-producing term for fusion with a foldr consumer by the short cut fusion rule. Furthermore, we saw in the preceding chapter how useful it is to split the definition of a list-producing function into a definition of a worker and a definition of a wrapper. We will see in Chapter 5 how easily we obtain these two definitions from the list abstracted form of the original definition body. The only necessary extension is that for the worker/wrapper split we have to abstract not only one but as many lists as possible from a term. Here we describe this abstraction in detail. The list abstraction algorithm processes a producer in three phases. In the first phase all list types are replaced by type variables and all list data constructors are replaced by term variables, to which we will also refer as constructor variables. The result of the first phase is the input to the second phase, the type inference phase. Type inference replaces some of the type variables so that the typing is again derivable from the type inference rules, that is, the term is well-typed in the typing environment. Type inference cannot fail, because the original producer is well-typed. In the third phase some type variables are reinstantiated to list types, some constructor variables are reinstantiated to list data constructors, and the remaining type and constructor variables are abstracted.

4.1 The Instance Substitution

The central data structure of the list abstraction algorithm is a substitution η. It mem- orises for every new type variable the list type for which it was substituted in the first phase. For example, for a term λx:[Int]. case x of [] x the first phase will return { 7→ } the modified term λx:γ1. case x of [] x and the substitution η = [[Int]/γ1]. We call this substitution η instance substitution.{ 7→ } The importance of having the information that the instance substitution contains will become clear in the course of this chapter. As just defined the instance substitution is not uniquely determined by a term. For ex- ample, for the term λx:[[Int]]. case x of [] x the list replacement phase will re- turn the modified term λx:γ. case x of []{ x7→. A} fitting instance substitution would { 7→ } be η1 = [[[Int]]/γ] but also the substitution η2 = [[γ0]/γ, [Int]/γ0]. Note that apply- ing η2 to the modified term only yields λx:[γ0]. case x of [] x . However, applying { 7→ } 48 4.1. The Instance Substitution

η2 to this term then yields again the input term λx:[[Int]]. case x of [] x . For { 7→ } the list abstraction algorithm η2 is preferable to η1. The substitution η2 describes list replacement in more detail and may lead to the abstraction of more lists. So the element types of the list types of an instance substitution should not contain any list types. The previous example also demonstrates that for an instance substitution η the sets dom(η) and free(η) may overlap. However, η is always acyclic.

Definition 4.1 A substitution σ is acyclic, if there exists no set γ1, . . . , γk dom(σ) { } ⊆ with γ2 free(σ(γ1)), . . . , γk free(σ(γk 1)), γ1 free(σ(γk)). Especially, there exists no γ dom(∈σ) with γ free(σ(γ∈)). − ∈ ∈ ∈ 

Definition 4.2 A type substitution η is an instance substitution, if η(γ) = [τγ] for all γ dom(η) and some types τγ, and η is acyclic. ∈  In the remainder of this chapter the meta-variable η always denotes an instance sub- stitution. As demonstrated by the last example we have to apply η several times, that is, we have to apply its compositional closure:

Definition 4.3 The compositional closure of an acyclic substitution σ is the sub- stitution σ∞ given by σ∞(γ) := (σ(γ))σ∞ for all γ dom(σ∞) := dom(σ). ∈ 

For example, [[γ0]/γ, [Int]/γ0]∞ = [[[Int]]/γ, [Int]/γ0]. Because σ is acyclic, its compositional closure is well-defined. In fact, for every acyclic σ there exists a natural number k with σ∞ = σσ . . . σ. k | {z } The Typing Environment for Constructor Variables Besides replacing list types by type variables and constructing an instance substitution the first phase also replaces every list constructor application (:) τ and [] τ for any type τ by a new term variable, called constructor variable. It proves to be technically convenient not to explicitly add these constructor variables to the typing environment. Instead it suffices to demand that every constructor variable is associated with a different type variable. To describe this association we index constructor variables by type variables: (:) For any type variable γ the meta-variable vγ denotes a constructor variable that replaces [] an application of the constructor (:) and vγ denotes a constructor variable that replaces an application of the constructor []. The typing environment for the constructor variables is determined by an instance substitution η:

Definition 4.4 The typing environment for constructor variables determined by an instance substitution η, written tyEnv(η), is defined by

(:) [] tyEnv(η) := v :τγ γ γ, v :γ η(γ) = [τγ], γ dom(η) { γ → → γ | ∈ } 

For example,

(:) [] (:) [] tyEnv([[γ0]/γ, [Int]/γ0]) = vγ :γ0 γ γ, vγ :γ, v :Int γ0 γ0, v :γ0 { → → γ0 → → γ0 } Chapter 4. List Abstraction And Type Inference 49

4.2 List Replacement Phase

In the first phase of list abstraction every list type is replaced by a new type variable, (:) every list constructor application (:) τ is replaced by a new constructor variable vγ , [] every list constructor application [] τ is replaced by a new constructor variable vγ and an instance substitution, which records all these replacements, is constructed.

4.2.1 Locally Bound Type Variables To be precise, not every list type and list constructor is replaced. Consider the input term λα. λxs:[α]. xs. It would be incorrect of the list replacement algorithm to replace the list type [α] by a type variable γ, that is, return an instance substitution η = [[α]/γ] and a term λα. λxs:γ. xs. The problem is that we do not regain the original input when we apply η∞. We have

(λα. λxs:γ. xs)η∞ = λβ. λxs:[α]. xs for a new type variable β, because α is free in η∞(γ) = [α] and hence the λ-bound type variable α is renamed. So the cause of this problem is that the list type [α] contains a type variable that is bound within the term. Such list types and similarly list constructors of such list types cannot be abstracted and hence are not replaced by the list replacement algorithm. Note that list abstraction is not hindered by type variables that are bound outside the term that shall be transformed by list abstraction. So for the input term λxs:[α]. xs the list replacement algorithm returns the instance substitution η = [[α]/γ] and the term λxs:γ. xs.

4.2.2 The List Replacement Algorithm The list replacement algorithm is defined in Figure 4.1. Its method of working is straightforward. It recursively traversesR the whole input term including its embedded types. It introduces a new type variable for every list type that it replaces. We assume that the set of these new type variables is disjoint from the set of all (free or bound) type variables of the complete term that is list abstracted. Similarly we assume that for every (:) [] type variable γ the constructor variable vγ and the constructor variable vγ is a new term variable and that the set of these constructor variables is disjoint from the set of all term variables that occur in the complete term that is list abstracted. The purpose of the first argument of , a set of variables, is to collect all locally bound type variables during the recursive descentR of the algorithm. So the list replacement algorithm starts with an empty set as first argument. Using this set the algorithm can easily check for each list type that it comes across if it is correct to replace it; similarly the set is used in the replacement of list constructors. Note that the algorithm , when coming across a list type [τ], first recursively R transforms τ which yields a type τ 0 and then replaces the list type [τ 0], if possible. The same order is chosen in the case of list constructor applications. This order assures for example, that for a nested type [[Int]] the algorithm yields ([[Int]/γ1, [γ1]/γ2], γ2) instead of the more restrictive ([[[Int]]/γ1], γ1). Together with the fact that every list type and constructor is replaced by a new variable this assures that the replacement algorithm returns the most general instance substitution that is possible. In the three replacement cases the order in which the instance substitution is constructed is important, whereas in all other cases it is irrelevant, because the domains and sets of free type variables of the respective partial instance substitutions are disjoint. 50 4.2. List Replacement Phase

(:) (η[[τ 0]/γ], v ) , if free(τ 0) V = (V, (:) τ) :=  γ ∩ ∅ R (η, (:) τ 0) , otherwise

where (η, τ 0) := (V, τ) γ a new typeR variable

[] (η[[τ 0]/γ], v ) , if free(τ 0) V = (V, [] τ) :=  γ ∩ ∅ R (η, [] τ 0) , otherwise

where (η, τ 0) := (V, τ) γ a new typeR variable

(V, x) := ([ ], x) R (V, λx : τ.M) := (ητ ηM , λx : τ 0.M 0) R where (ητ , τ 0) := (V, τ) R (ηM ,M 0) := (V,M) R

(V,M1 M2) := (η1η2,M 0 M 0 ) where (ηi,M 0) := (V,Mi) R 1 2 i R k k (V, let xi : τi = Mi in N) := (ηη1 . . . ηkη0 . . . η0 , let xi : τ 0 = M 0 in N 0) R { }i=1 1 k { i i }i=1 where (η, N 0) := (V,N) R (ηi, τ 0) := (V, τi) i R (η0,M 0) := (V,Mi) i i R (V, c) := ([ ], c) R k k (V, case M of ci xi Ni ) := (ηη1 . . . ηk, case M 0 of ci xi N 0 ) R { → }i=1 { → i }i=1 where (η, M 0) := (V,M) R (ηi,N 0) := (V,Ni) i R (V, λα.M) := (η, λα.M 0) where (η, M 0) := (V α ,M) R R ∪ { } (V, M τ) := (ηM ητ ,M 0 τ 0) , if M = (:) and M = [] R 6 6 where (ηM ,M 0) := (V,M) R (ητ , τ 0) := (V, τ) R

(η[[τ 0]/γ], γ) , if free(τ 0) V = (V, [τ]) :=  ∩ ∅ R (η, [τ 0]) , otherwise

where (η, τ 0) := (V, τ) γ a new typeR variable

(V, T τ1 . . . τn) := (η1 . . . ηn, T τ 0 . . . τ 0 ) , if T = [] R 1 n 6 where (ηi, τ 0) := (V, τi) i R (V, τ1 τ2) := (η1η2, τ 0 τ 0 ) where (ηi, τ 0) := (V, τi) R → 1 → 2 i R (V, α.τ) := (η, α.τ 0) where (η, τ 0) := (V α , τ) R ∀ ∀ R ∪ { } (V, α) := ([ ], α) R

Figure 4.1: List replacement algorithm Chapter 4. List Abstraction And Type Inference 51

4.2.3 Properties of the List Replacement Algorithm We have to prove for the complete list abstraction algorithm that it terminates, that its result is well-typed and that its result is contextually equivalent to its input. These proofs are simple when we can refer to respective proofs for all algorithms that form the list abstraction algorithm. Naturally every component algorithm also fulfils some unique property. For the list replacement algorithm we have to show that it really constructs an instance substitution. Lemma 4.5 (N, η) = (V,M) R η is an instance substitution  Proof Obviously η substitutes lists for variables. It can be proved by induction on M that η is acyclic. The general idea in all induction steps is that the new η equals η1η2 where η1 and η2 are acyclic and dom(η2) (dom(η1) free(η1)) = . Hence η is acyclic. ∩ ∪ ∅  The following lemma states that the first argument of , the set of variables, fulfils its purpose. R

Lemma 4.6 (V,M) does not replace types that contain variables from the set V : R (η, N) = (V,M) R free(η) V = ∩ ∅  Proof Straightforward by structural induction on the term M.  Also the rather obvious property of termination has to be stated.

Lemma 4.7 (Termination) (V,M) terminates for every set of type variables V and every term M. R  Proof Trivial, because (V,M) is defined inductively over the term M. R  Reinstantiation It must be possible to use the instance substitution to regain the input of the list replace- ment algorithm . We split the reinstantiation property into two lemmas. R Lemma 4.8 (Reinstantiation of types)

(η, ρ0) = (V, ρ) R ρ0η∞ = ρ  Proof Structural induction on the type ρ analogously to the proof of Lemma 4.11.  In terms also the constructors have to be reinstantiated. The instance substitution η determines how they are reinstantiated:

Definition 4.9 The constructor instantiation substitution determined by a substi- tution η, written instCons(η), is defined by

instCons(η) := [(:) τ/v(:), [] τ/v[] η(γ) = [τ], γ dom(η)] γ γ | ∈  52 4.2. List Replacement Phase

Because terms also contain types we combine the two substitutions.

Definition 4.10 The instantiation substitution determined by a substitution η, writ- ten inst(η), is defined by inst(η) := η instCons(η) 

Lemma 4.11 (Reinstantiation of terms)

(η, N) = (V,M) R N inst(η ) = M ∞ 

Proof Structural induction on the term M. We only prove three cases here. The proofs for the other cases are similar.

Case (:) τ. Let (η, τ 0) := (V, τ) and γ be a new type variable. R

Case free(τ 0) V = . ∩ ∅ (:) vγ inst((η[[τ 0]/γ])∞) = γ / dom(η) freeTy(η), because γ is new { ∈ ∪ } (:) vγ inst(η∞[[τ 0η∞]/γ])

= η∞[[τ 0η∞]/γ](γ) = [τ 0η∞] { } (:) (τ 0η∞) = Lemma 4.8 { } (:) τ

Otherwise

((:) τ ) inst(η ) = 0 ∞ (:) (τ 0 inst(η∞)) = Lemma 4.8 { } (:) τ

Case M1 M2. Let (η1,M 0 ) := (V,M1) and (η2,M 0 ) := (V,M2). 1 R 2 R (M M ) inst(η ) = 10 20 ∞ (M inst(η )) (M inst(η )) = 10 ∞ 20 ∞ (M10 inst((η1η2)∞)) (M20 inst((η1η2)∞))

= dom(η1) and dom(η2) are new type variables { } (M10 inst(η1∞η2∞)) (M20 inst(η1∞η2∞))

= dom(η∞) and dom(η∞) are new type variables { 1 2 } (M10 inst(η1∞)) (M20 inst(η2∞)) = induction hypothesis { } M1 M2 Chapter 4. List Abstraction And Type Inference 53

Case λα. M. Let (η, M 0) := (V α ,M). R ∪ { } (λα.M 0) inst(η∞)

= α / free(η∞) = free(inst(η∞)) according to Lemma 4.6 { ∈ } λα.(M 0 inst(η∞)) = induction hypothesis { } λα.M



Well-typed Instance The result of the replacement algorithm is generally not well-typed, that is, for (η, N) := (V,M) generally exists no type ρ so that R Γ + tyEnv(η) N : ρ ` is valid. However, only instantiating the type variables, hence leaving the constructor variables unchanged, suffices to obtain a valid typing. This property is important, because it assures that the input to the type inference algorithm is typeable and type inference must hence succeed. Lemma 4.12 (Well-typedness)

(η, N) = (V,M)Γ M : ρ R ` Γ + tyEnv(η)η Nη : ρ ∞ ` ∞  Proof Structural induction on the term M. We only prove three cases here. The proofs for the other cases are similar. Case (:) τ. Let (η, τ 0) := (V, τ) and γ be a new type variable. R Case free(τ 0) V = . According∩ to the∅ typing rules Γ (:) τ : τ [τ] [τ]. Hence we know for the premise that ρ = τ [τ] [`τ]. → → → → tyEnv(η[[τ ]/γ])(η[[τ ]/γ]) (v(:)) = 0 0 ∞ γ (τ 0 γ γ)(η[[τ 0]/γ])∞ → → = γ / dom(η) freeTy(η), because γ is new { ∈ ∪ } (τ 0 γ γ)η∞[[τ 0η∞]/γ] → → = γ / dom(η) freeTy(η) { ∈ ∪ } τ 0η∞ [τ 0η∞] [τ 0η∞] → → = Lemma 4.8 { } τ [τ] [τ] = → → ρ

Otherwise According to Lemma 4.8 we have ((:) τ 0)η∞ = (:) τ. So the conclusion follows with the extensibility of typing environments of Lemma 2.6 from the premise Γ (:) τ : ρ. ` 54 4.3. Type Inference Phase

Case M1 M2. Let (η1,M 0 ) := (V,M1) and (η2,M 0 ) := (V,M2). 1 R 2 R From the premise Γ M1 M2 : ρ and the typing rules it follows that Γ M1 : ` ` ρ0 ρ and Γ M2 : ρ0 for some type ρ0. → ` According to the induction hypothesis

Γ + tyEnv(η1)η∞ M 0 η∞ : ρ0 ρ 1 ` 1 1 → Γ + tyEnv(η2)η∞ M 0 η∞ : ρ0 2 ` 2 2 are valid. Because dom(η1) (dom(η2) freeTy(η2)) = and dom(η2) (dom(η1) ∩ ∪ ∅ ∩ ∪ freeTy(η1)) = , ∅ tyEnv(η1)η1∞ + tyEnv(η2)η2∞ = tyEnv(η1η2)(η1η2)∞ holds. According to Lemma 2.6 the typing environments of the preceding typings can be extended, and with the typing rule term app it follows

Γ + tyEnv(η1η2)(η1η2)∞ (M 0 η∞)(M 0 η∞): ρ ` 1 1 2 2 Again because dom(η1) (dom(η2) freeTy(η2)) = and dom(η2) (dom(η1) ∩ ∪ ∅ ∩ ∪ freeTy(η1)) = , ∅ (M10 η1∞)(M20 η2∞) = (M10 M20 )(η1η2)∞ holds. Hence the conclusion of the lemma holds. Case λα. M. Let (η, M 0) := (V α ,M). R ∪ { } From the premise Γ λα. M : ρ and the typing rules it follows that Γ M : ρ0 ` ` with ρ = α.ρ0 and α / freeTy(Γ). ∀ ∈ According to the induction hypothesis

Γ + tyEnv(η)η∞ M 0η∞ : ρ0 ` holds. From Lemma 4.6 it follows that α / freeTy(η) = freeTy(tyEnv(η)η∞). Therefore we can construct with the typing rule∈ type abs the valid typing

Γ + tyEnv(η)η∞ λα. (M 0η∞): α.ρ0 ` ∀ Again because α / freeTy(η), we have λα. (M 0η∞) = (λα. M)η∞. Hence the con- clusion of the lemma∈ holds. 

4.3 Type Inference Phase

The second phase of list substitution consists of applying a type inference algorithm. At first, the fact that typeability for the second-order typed λ-calculus is undecidable [Wel94] seems to pose a problem. However, for our purposes a partial type inference algorithm that only instantiates some type variables and makes use of the existing type annotations of the program suffices. The idea is that the algorithm gets a term together with a typing environment for its free term variables as input and then replaces some of the type variables in the term and the typing environment to obtain a valid typing. That is, for an input Γ M it yields a type substitution σ and a type τ such that Γσ Mσ : τσ. The substitution` σ holds the information required for list abstraction whereas` the type τ will only be needed in the recursive definition of the type inference algorithm itself. Chapter 4. List Abstraction And Type Inference 55

Definition 4.13 A tuple of a type substitution and a type (σ, τ) is a typing for a typing environment and a term Γ M, if Γσ Mσ : τσ. ` `  Although the type inference algorithm yields a tuple (σ, τ) and all subsequent algo- rithms will work on the substitution component σ, we usually give Γσ Mσ : τσ as result of type inference, because it is easier to read. ` For example, for the typing environment and term

λx:Int.λxs:[Int]. (:) Int x xs ∅ ` the list replacement algorithm yields

(:) ([[Int]/γ1, [Int]/γ2], λx:Int.λxs:γ1. vγ2 x xs)

The types of the constructor variable is determined by the typing environment

(:) [] (:) [] tyEnv([[Int]/γ1, [Int]/γ2]) = v :Int γ1 γ1, v :γ1, v :Int γ2 γ2, v :γ2 { γ1 → → γ1 γ2 → → γ2 } The fact that the typing environment contains superfluous constructor variables which do (:) not occur in the term λx:Int.λxs:γ1. vγ2 x xs, is unimportant. Naturally, the type inference algorithm takes the instance substitution as input instead of the explicit typing environment for constructor variables, but for the moment we ignore this as we already did in the previous chapter. So the input of the type inference algorithm is:

(:) + tyEnv([[Int]/γ1, [γ1]/γ2]) λx:Int.λxs:γ1. v x xs ∅ ` γ2 The type inference algorithm yields the valid typing

([γ2/γ1], Int γ1 γ2) → → So

( + tyEnv([[Int]/γ1, [Int]/γ2])) [γ2/γ1] ∅ (:) (λx:Int.λxs:γ1. v x xs) [γ2/γ1] ` γ2 : (Int γ1 γ2)[γ2/γ1] → → that is,

(:) [] (:) [] vγ1 :Int γ2 γ2, vγ1 :γ2, vγ2 :Int γ2 γ2, vγ2 :γ2 { → → (:) → → } λx:Int.λxs:γ2. v x xs ` γ2 : Int γ2 γ2 → → According to the type substitution property of Lemma 2.6 we can apply any type substitution σ0 to a typing Γσ Mσ : τσ and obtain a (possibly different) typing Γσσ0 ` ` Mσσ0 : τσσ0. So for the preceding example ([[Int]/γ1, [Int]/γ2], Int [Int] [Int]) is also a valid typing. → →

Definition 4.14 A substitution σ is more general than a substitutionσ ˆ, written σ α  σˆ, if there exists a substitution σ0 with σσ0 σˆ. Alternatively we say thatσ ˆ extends σ. ≡  56 4.3. Type Inference Phase

A typing (˜σ, τ˜) for a typing environment and a term Γ M is a principal typing1 ` for Γ M, if for all typings (ˆσ, τˆ) of Γ M,σ ˜ α σˆ and there exists a type substitution σ with` τ ˜σσ˜ τˆσˆ. For reasons that we` discuss in Section 4.3.3 the type inference phase will not always≡ compute a principal typing but something closely related. We distinguish those type variables that are introduced by the list replacement algo- rithm , to which we refer as type inference variables, from all other type variables. In theR whole type inference phase we replace only type inference variables. All other type variables, which are bound somewhere in the program by type abstraction or universal quantification, are treated like constants. To avoid extension of the syntax of F we do not formally introduce type inference variables as a new category of variables. Instead, the list substitution η that is constructed by the list replacement algorithm comes in handy for the first time. Its domain, dom(η), contains exactly the type variablesR introduced by , that is, dom(η) is the set of type inference variables. R 4.3.1 Idempotent Substitutions Consider the environment and term

λx:γ1.λy:γ2. case x of [] y ∅ ` { 7→ } ([[Int]/γ1, γ1/γ2], [Int] γ1 γ1) is a typing for it. It is however pointless that γ2 is → → replaced by γ1. In general the substitution of a typing has to replace type variables by other type variables, but if a type variable is replaced, then it should disappear completely. In other words, it should be idempotent: Definition 4.15 A substitution σ is idempotent, if σσ = σ.  All parts of the type inference algorithm and also of the third phase yield only idem- potent substitutions provided the substitutions they get as input are idempotent. This invariant simplifies both algorithms and proofs. In the following we give several properties of idempotent substitutions that we will use later. Lemma 4.16 (Characterisation of idempotent substitutions) σ idempotent dom(σ) free(σ) = ⇐⇒ ∩ ∅  Proof Trivial.  Lemma 4.17 (Extension of idempotent substitution)

σ α ρ σ idempotent  σρ ρ ≡  Proof Simple, see [Ede85].  Lemma 4.18 (Preservation of idempotency) σ idempotent γ / dom(σ) γ / free(τ) dom(σ) free(τ) = ∈ ∈ ∩ ∅ σ[τ/γ] idempotent  1[Jim96] distinguishes between a principal type property and a principal typing property. Given a typing environment and a term its principal type is a type that “represents” all valid types for this typing environment and term. Given only a term, a principal typing is a typing environment and a type that “represent” all valid typings for this term. Type inference algorithms for the Hindley-Milner type system determine principal types [Mil78, DM82]. The Hindley-Milner type system does not have principal typings in any sensible way. Our definition of principal typing differs from both definitions of Jim. Here a typing environment and a term with partial type information are given. Chapter 4. List Abstraction And Type Inference 57

Proof We show that the domain and the set of free variables of σ[τ/γ] are disjoint.

dom(σ[τ/γ]) free(σ[τ/γ]) ∩ equal only if γ = τ ⊆ { 6 } (dom(σ) γ ) free(σ[τ/γ]) ∪ { } ∩ = premise: γ / dom(σ) { ∈ } (dom(σ) γ ) ((free(σ) γ ) free(τ)) = ∪ { } ∩ \{ } ∪ (dom(σ) (free(σ) γ )) ( γ (free(σ) γ )) (dom(σ) free(τ)) ( γ free(∩ τ)) \{ } ∪ { } ∩ \{ } ∪ ∩ ∪ { } ∩ = premise: γ / free(τ) { ∈ } (dom(σ) (free(σ) γ )) ( γ (free(σ) γ )) (dom(σ) free(τ)) ∩ \{ } ∪ { } ∩ \{ } ∪ ∩ = premise: dom(σ) free(τ) = { ∪ ∅} (dom(σ) (free(σ) γ )) ( γ (free(σ) γ )) = ∩ \{ } ∪ { } ∩ \{ } (dom(σ) (free(σ) γ )) ∩ \{ } = premise: σ idempotent { } ∅  Corollary 4.19 (Preservation of idempotency)

σ idempotent γ / dom(σ) γ / free(τσ) ∈ ∈ σ[τσ/γ] idempotent  Proof We know that σ is idempotent, γ / dom(σ), γ / free(τσ) and ∈ ∈ dom(σ) free(τσ) ∩ ⊆ dom(σ) free(σ) ∩ = σ idempotent { } ∅ Hence the conclusion follows with Lemma 4.18.  Lemma 4.20 (Extension of substitution)

σσˆ σˆ γσˆ τσˆ ≡ ≡ σ[τ/γ]ˆσ σˆ ≡  Proof We show that [τ/γ]ˆσ σˆ by case analysis: ≡ Case γ. γ[τ/γ]ˆσ = τσˆ γσˆ. ≡ Case α = γ. α[τ/γ6 ]ˆσ = ασˆ. So σ[τ/γ]ˆσ σσˆ σˆ. ≡ ≡  Lemma 4.21 (Equality of idempotent substitutions)

σ idempotent σ α σ0 dom(σ) = dom(σ0)  σ σ ≡ 0  58 4.3. Type Inference Phase

Proof Let γ dom(σ) = dom(σ0). ∈ γσ

free(γσ) dom(σ0) = free(γσ) dom(σ) free(σ) dom(σ) = ≡ { ∩ ∩ ⊆ ∩ ∅} γσσ0

σσ0 σ0 ≡ { ≡ } γσ0

 Lemma 4.22 (Inclusion of modified variables)

σσ0 σ0 ≡ mod(σ) mod(σ ) ⊆ 0  Proof mod(σ) = dom(σ) free(σ) ∪ σ α σ0 ⊆ {  } dom(σ0) free(σ) = ∪ dom(σ0) (free(σ) dom(σ0)) ∪ \ ⊆ dom(σ0) free(σσ0) ∪ = σσ0 σ0 { ≡ } dom(σ0) free(σ0) = ∪ mod(σ0)



4.3.2 The Unification Algorithm U At the heart of the type inference algorithm lies a unification algorithm, which combines typings of subterms. . Definition 4.23 A type substitution σ is a unifier of an equation τ = τ , that is, of the 1 2 . two types τ1 and τ2, if τ1σ τ2σ. Moreover,σ ˜ is a most general unifier for τ1 = τ2, if . ≡ . it is a unifier for τ1 = τ2 and for every unifierσ ˆ for τ1 = τ2 we haveσ ˜ α σˆ.   Many algorithms for computing a most general unifier are known [Rob65, MM82, Mit96]. Unfortunately, we need to define our own unification algorithm, because we distinguish between type inference variables and “normal” type variables and F contains type variable binders, that is, type abstraction in terms and universal quantification in types. Nonetheless, we do not have to invent a completely different algorithm or enter the realms of higher-order unification, which is undecidable [Gol81, BS94]. According to the definition of , the set of type inference variables dom(η) is disjoint from the set of all bound type variables.R Hence abstracted and universally quantified variables are not to be replaced by the unification algorithm and universal quantification itself need not be handled differently from other syntactic constructs such as function or data type. Because the input of the type inference algorithm is typeable, all inputs of the unifi- cation algorithm must be unifiable by a substitution which only replaces type inference variables. Based on this knowledge we can drop several tests in our unification algorithm, Chapter 4. List Abstraction And Type Inference 59

Structural calls: . . . . V (T τ1..τk = T ρ1..ρk, σ) := V (τ1 = ρ1, V (τ2 = ρ2, .. V (τk = ρk, σ)..)) U U U U (= σ, if k = 0) . . . V (τ1 τ2 = ρ1 ρ2, σ) := V (τ1 = ρ1, V (τ2 = ρ2, σ)) U → . → U . U V ( α.τ = β.ρ, σ) := V (τ = ρ, σ) U ∀ ∀. U V (α = β, σ) := σ , if α / V and β / V U ∈ ∈ Substitution calls: . . V (γ = τ, σ) := V (σ(γ) = τ, σ) , if γ dom(σ) U . U . ∈ V (τ = γ, σ) := V (τ = σ(γ), σ) , if γ dom(σ) U U ∈ Unification calls: . V (γ = τ, σ) := σ[τσ/γ] , if γ V dom(σ) U . ∈ \ V (τ = γ, σ) := σ[τσ/γ] , if γ V dom(σ) U ∈ \

Figure 4.2: Definition of the unification algorithm U . γ ( α.(α,[Int]) [Int] = β.(β,γ) γ, [ ]) = U{ } ∀ → . ∀ → γ ((α,[Int]) [Int] = (β,γ) γ, [ ]) = U{ } →. → . γ ((α,[Int]) = (β,γ), γ ([Int] = γ, [ ])) = U{ } . U{ } γ ((α,[Int]) = (β,γ), [[Int]/γ]) = U{ } . . γ (α = β, γ ([Int] = γ, [[Int]/γ]) = U{ } . U{ } . γ (α = β, γ ([Int] = [Int], [[Int]/γ]) = U{ } . U{ } . γ (α = β, γ (Int = Int, [[Int]/γ]) = U{ } . U{ } γ (α = β, [[Int]/γ]) = U{ } [[Int]/γ]

Figure 4.3: An example computation of U so that it is simpler than standard unification algorithms. Note that the following ar- guments assume that the input is unifiable by a substitution which only replaces type inference variables. Our algorithm yields wrong results for other inputs; it may not even terminate then. The unification algorithm is given in Figure 4.2. Its first argument V is the set of type inference variables, that is,U the type inference algorithm will call with V = dom(η). Besides two types τ and ρ, the algorithm also gets an idempotentU substitution σ as . input. The algorithm computes a unifierσ ˜ of τσ = ρσ which is not necessarily a most . general unifier but which is more general than all unifiersσ ˆ of τσ = ρσ that extend σ. . . V (τ = ρ, [ ]) computes the most general unifier of τ = ρ. U 60 4.3. Type Inference Phase

Basically the unification algorithm works by stepwise extension of the original input substitution and threading of this substitution through the whole computation. If, for example, both types are function types, then it unifies the two result types and the two argument types. The substitution obtained from the first unification call is passed as input to the second unification call. So instead of applying this substitution to the two type arguments of the second unification call, as is done by standard unification algorithms, we pass this information separately. We chose to formulate with this additional input substitution, because it makes the definition of the unificationU and of the type inference algorithm more readable. Furthermore, it already suggests an efficient implementation method (cf. Section 4.5). For data types and universal types it works similarly. In general the unification algorithm recursively traverses both type arguments in parallel. The interesting cases are those where one of the types is a type variable. If both types are type variables that are not in V , that is, are not type inference variables, we do not have to extend the substitution, because according to our assumption the input of is unifiable. However, if this is in a recursive call of , then the two type variables needU not to be equal. The reason is that in the unificationU of two universal types the algorithm completely ignores the bound type variables. Sure, according to our assumption the two universal types must be unifiable, but to make them syntactically equal it usually would be necessary to rename one of the bound variables. This renaming would also change the type variable in the body of the type. The unification algorithm does not perform this renaming and hence does not require two type variables which are not type inference variables to be equal for successful unification. To be precise, we assume the input of to be unifiable modulo renaming of some variables. However, the details of how this worksU are given in the proof of Lemma 4.24. The central point is that can handle unification up to renaming of bound type variables without needing to performU any renaming. If one of the two type arguments of is a type variable in V , that is, a type infer- ence variable, then we have to distinguishU two cases. If the variable is in the domain of the current substitution, then we apply the substitution at this point and continue unification. So we see that by passing the substitution as a separate argument we only defer the substitution. As the proof of Lemma 4.24 will show, infinite repetition of this situation, which would lead to non-termination, cannot happen , because the substitution is idempotent. If one of the two type arguments of is a type inference variable that is not in the domain of the current substitution, then theU unification algorithm extends the current substitution by substituting the other type argument for the type variable. Note that the occur check of standard unification algorithms is not necessary here, because we assume that the input of is unifiable. U The algorithm is non-deterministic in that some arguments match more than one U . case of the definition. For example, for γ1 = γ2 with γ1, γ2 V dom(σ), γ1 may be ∈ \ replaced by γ2 or γ2 by γ1. However, it follows from Lemma 4.24 that in any case the result is a unifier with the desired properties. Note that the definition of covers all possible arguments, because we assume that the input is unifiable. 
We cannotU have a mismatch of arguments, that is, for example one type is a data type and the other one a function type. Also we assume for the input of that dom(σ) V and the algorithm preserves this invariant. U ⊆ The example in Figure 4.3 illustrates how works. U Finally, the following lemma states exactly under which conditions computes a unifier and in which respect it is most general. Note that the result of U is computed independently of the unifierσ ˆ. So the result is more general than any unifierU σ ˆ which fulfils the given premises. For any sets V1 and V2 we abbreviate the proposition V1 V2 = ∩ ∅ Chapter 4. List Abstraction And Type Inference 61

by V1 # V2.

Lemma 4.24 (Most general unifier)

τσˆ ρσˆ σ idempotent σ α σˆ dom(ˆσ) V V #≡ bound(τ) V # bound(ρ) V # bound(⊆σ) . σ˜ σ˜ = V (τ = ρ, σ) τσ˜ ρσ˜ σ˜ idempotent σ α σ˜ α σˆ V # bound(˜σ) ∃ U ≡   

Proof We have to strengthen the property for an inductive proof. We prove the follow- ing:

ςl variable renaming ςr variable renaming free(ςl) # free(ˆσ) free(ςr) # free(ˆσ) V # mod(ςl) V # mod(ςr) τςlσˆ ρςrσˆ σ idempotent σ α σˆ dom(ˆσ) V V ≡# bound(τ) V # bound(ρ) V # bound(σ⊆) . σ˜ σ˜ = V (τ = ρ, σ) τςlσ˜ ρςrσ˜ σ˜ idempotent σ α σ˜ α σˆ V # bound(˜σ) ∃ U ≡   where a variable renaming is a substitution which maps variables only to variables. . For a fixed set V let the degree of τ = ρ and σ be the triple (m, n, o) where m = V dom(σ) , that is, the number of variables in V which are not replaced by σ, and n is the| \ size of τσ| and ρσ, that is, the number of occurrences of type constructors, , and variables in it, and o is given by: → ∀

2 , if τ dom(σ) and ρ dom(σ)  ∈ ∈ o := 1 , if either τ dom(σ) or ρ dom(σ)  ∈ ∈ 0 , otherwise 

We say that (m, n, o) is smaller than (m0, n0, o0) if either m < m0, or m = m0 and n < n0, or m = m0 and n = n0 and o < o0 (cf. [Mit96], Proof of Lemma 11.2.4). . We prove by induction on the degree of τ = ρ and σ:

. . . . Case V (T τ1..τk = T ρ1..ρk, σ) = V (τ1 = ρ1, V (τ2 = ρ2, .. V (τk = ρk, σ)..)). SimilarU to the next case. U U U

. . Case V (τ1 τ2 = ρ1 ρ2, σ) = V (τ1 = ρ1, V (τ2, ρ2, σ)). U → → U U . From (τ1 τ2)ςlσˆ (ρ1 ρ2)ςrσˆ it follows τ2ςlσˆ ρ2ςrσˆ. The degree of τ2 = ρ2 and → ≡ → ≡ . σ is obviously smaller than the degree of τ1 τ2 = ρ1 ρ2 and σ. → → . Hence by induction hypothesis there exists an idempotent σ0 = V (τ2 = ρ2, σ) with U τ2ςlσ0 =ς ρ2ςrσ0 and σ α σ0 α σˆ and V # bound(σ0).   From (τ1 τ2)ςlσˆ (ρ1 ρ2)ςrσˆ also it follows τ1ςlσˆ ρ1ςrσˆ. If τ1ςlσ0 τ1ςlσ → ≡ → . ≡ ≡ and ρ1ςrσ0 ρ1ςrσ, then the degree of τ1 = ρ1 and σ0 is smaller than the degree of . ≡ τ1 τ2 = ρ1 ρ2 and σ. Otherwise σ σ0 and hence according to Lemma 4.21 → → 6≡ dom(σ) dom(σ0). So in this case the first component of the degree is reduced. ⊂ . Hence by induction hypothesis there exists an idempotentσ ˜ = V (τ1 = ρ1, σ0) with U τ1ςlσ˜ ρ1ςlσ˜ and σ0 α σ˜ α σˆ and V # bound(˜σ). ≡   62 4.3. Type Inference Phase

So we have σ α σ0 α σ˜ and  

(τ1 τ2)ςlσ˜ = → τ1ςlσ˜ τ2ςlσ˜ → σ0 idempotent and σ0 α σ˜; Lemma 4.17 ≡ {  } τ1ςlσ˜ τ2ςlσ0σ˜ → τ1ςlσ˜ ρ1ςrσ˜ and τ2ςlσ0 ρ2ςrσ0 ≡ { ≡ ≡ } ρ1ςrσ˜ ρ2ςrσ0σ˜ → σ0 idempotent and σ0 α σ˜; Lemma 4.17 ≡ {  } ρ1ςrσ˜ ρ2ςrσ˜ = → (ρ1 ρ2)ςrσ˜ →

. . Case V ( α.τ = β.ρ, σ) = V (τ = ρ, σ). U ∀ ∀ U Let γ be a type variable with γ / V mod(ˆσ) mod(ςl) mod(ςr) free(τ) free(ρ). We have ∈ ∪ ∪ ∪ ∪ ∪

γ.(τ[γ/α]ςlσˆ) ∀ = γ / mod(ˆσ) mod(ςl) and Lemma 2.3 { ∈ ∪ } ( γ.τ[γ/α])ςlσˆ ∀ γ / free(τ), Definition of ≡ { ∈ ≡} ( α.τ)ςlσˆ ∀ premise ≡ { } ( β.ρ)ςrσˆ ∀ γ / free(ρ), Definition of ≡ { ∈ ≡} ( γ.ρ[γ/β])ςrσˆ ∀ = γ / mod(ˆσ) mod(ςr) and Lemma 2.3 { ∈ ∪ } γ.(ρ[γ/β]ςrσˆ) ∀

So τ[γ/α]ςlσˆ ρ[γ/β]ςrσˆ. ≡

With Lemma 4.22 it follows that mod(σ) mod(ˆσ) and hence ( α.τ)σ = γ.(τ[γ/α]σ) and ( β.ρ)σ = γ.(ρ[γ/β]σ). Because α,⊆ β, γ / dom(σ) V , τ∀[γ/α]σ and∀ τσ have the same∀ size and∀ also ρ[γ/β]σ and ρσ. Therefore∈ the size⊆ of τσ and ρσ is smaller . than the size of ( α.τ)σ and ( β.ρ)σ. So the degree of τ = ρ and σ is smaller than . the degree of α.τ∀ = β.ρ and∀σ. ∀ ∀

. Hence by induction hypothesis there exists an idempotentσ ˜ = V (τ = ρ, σ) with U τ[γ/α]ςlσ˜ ρ[γ/β]ςrσ˜ and σ α σ˜ α σˆ and V # bound(˜σ). ≡   Chapter 4. List Abstraction And Type Inference 63

With Lemma 4.22 it follows that mod(˜σ) mod(ˆσ). So ⊆ ( α.τ)ςlσ˜ ∀ γ / free(τ), Definition of ≡ { ∈ ≡} ( γ.τ[γ/α])ςlσ˜ ∀ = γ / mod(ςl) mod(˜σ) and Lemma 2.3 { ∈ ∪ } γ.(τ[γ/α]ςlσ˜) ∀ τ[γ/α]ςlσ˜ ρ[γ/β]ςrσ˜ ≡ { ≡ } γ.(ρ[γ/β]ςrσ˜) ∀ = γ / mod(ςr) mod(˜σ) and Lemma 2.3 { ∈ ∪ } ( γ.ρ[γ/β])ςrσ˜ ∀ γ / free(ρ), Definition of ≡ { ∈ ≡} ( β.ρ)ςrσ˜ ∀ . Case V (α = β, σ) = σ where α / V and β / V . U ∈ ∈ With V # mod(ςl) and V # mod(ςr) it follows αςl / V and βςr / V . Note that we have dom(σ) dom(ˆσ) V . So ∈ ∈ ⊆ ⊆ αςlσ

= αςl / dom(σ) { ∈ } αςl

= αςl / dom(ˆσ) { ∈ } αςlσˆ

premise: αςlσˆ βςrσˆ ≡ { ≡ } βςrσˆ

= βςr / dom(ˆσ) { ∈ } βςr

= βςr / dom(σ) { ∈ } βςrσ Hence all conclusions hold forσ ˜ = σ. . . Case V (γ = τ, σ) = V (σ(γ) = τ, σ) where γ dom(σ). U U ∈ σ(γ)ˆσ = γσσˆ

σ idempotent and σ α σˆ; Lemma 4.17 ≡ {  } γσˆ

= γ / dom(ςl), because γ dom(σ) V but V # mod(ςl) { ∈ ∈ ⊆ } γςlσˆ premise ≡ { } τςrσˆ

We have V # bound(σ(γ)), because V # bound(σ). The substitution σ is unchanged and the size of γσ and σ(γ)σ = γσ are equal, but the third component of the degree is smaller in the recursive call. 64 4.3. Type Inference Phase

. Hence by induction hypothesis there exists an idempotentσ ˜ = V (σ(γ) = τ, σ) with U σ(γ)˜σ τςrσ˜ and σ α σ˜ α σˆ and V # bound(˜σ). ≡   Finally, it follows

γςlσ˜

= γ / dom(ςl), because γ dom(σ) V but V # mod(ςl) { ∈ ∈ ⊆ } γσ˜

σ idempotent and σ α σ˜; Lemma 4.17 ≡ {  } γσσ˜

≡ τςrσ˜

. . Case V (τ = γ, σ) = V (τ = σ(γ), σ) where γ dom(σ). SimilarU to the previousU case. ∈ . Case V (γ = τ, σ) = σ[τσ/γ] where γ V dom(σ). U ∈ \ Becauseσ ˜ = σ[τσ/γ], σ α σ˜ holds, we have  γσˆ

= γ / dom(ςl), because γ dom(σ) V but V # mod(ςl) { ∈ ∈ ⊆ } γςlσˆ premise ≡ { } τςrσˆ

Suppose τςr = τ. Then there exists α free(τ) dom(ςr). With dom(ˆσ) V and 6 ∈ ∩ ⊆ V # mod(ςr) it follows αςr / dom(ˆσ) and hence αςr free(τςrσˆ). However, with ∈ ∈ γσˆ τςrσˆ it follows αςr free(ˆσ). This contradicts free(ςr) # free(ˆσ). So τςr = τ and≡γσˆ τσˆ. ∈ ≡ Case γ / free(τσ). According∈ to Corollary 4.19,σ ˜ is idempotent. Also V # bound(˜σ), because V # bound(σ) and V # bound(τ). Finally

γςlσ˜

= γ / dom(ςl), because γ dom(σ) V but V # mod(ςl) { ∈ ∈ ⊆ } γσ˜ = γσ[τσ/γ] = γ / dom(σ) { ∈ } γ[τσ/γ] = τσ = γ / free(τσ) { ∈ } τσ[τσ/γ] = τσ˜

= τ = τςr { } τςrσ˜ Chapter 4. List Abstraction And Type Inference 65

Case γ free(τσ). We∈ have γσˆ τσˆ τσσˆ. Considering the size of the two sides of the equation we see that together≡ ≡ with γ free(τσ) it follows that γ = τσ. ∈ Henceσ ˜ = σ[τσ/γ] = σ is obviously idempotent and we have γσ˜ = γσ[τσ/γ] = τσ = τσ˜. . Case V (γ = τ, σ) = σ[τσ/γ] where γ V dom(σ). SimilarU to the previous case. ∈ \

Otherwise None of the equations that define applies. This cannot occur because of the premise τς σˆ = ρς σˆ. U l ς r 

4.3.3 Consistency of Types of Constructor Variables In Chapter 3 we illustrated at the hand of several examples how type inference enables list abstraction. There is however a problem that we have not mentioned yet. Consider list abstracting the well-typed term

xs:[[Int]] (:) [Int] ([] Int) xs : [[Int]] { } ` The first phase of list replacement gives (ignoring for the moment that it yields an instance substitution which indirectly defines the types of the constructor variables):

(:) [] (:) [] xs:[[Int]], v :γ1 γ2 γ2, v :γ3 v v xs { γ2 → → γ3 } ` γ2 γ3 Subsequently, type inference yields the principal typing

(:) [] (:) [] xs:[[Int]], v :γ1 [[Int]] [[Int]], v :γ1 v v xs : [[Int]] { γ2 → → γ3 } ` γ2 γ3 This typing is too general for our purposes. In the third phase we should reinstantiate (:) vγ2 to (:) [Int], because γ2 was replaced by [[Int]]. We cannot do so, because the (:) type of the first argument of vγ2 is γ1 instead of the expected [Int]. This too general result of the type inference algorithm is caused by the too general input to the type inference algorithm. In the first phase we replace a constructor appli- cation (:) τ of type τ [τ] [τ] by a variable v(:) of type τ γ γ. This type does → → γ → → not relate the types τ and γ in any way. Hence, if τ also contains some type variable γ0, the type inference algorithm may instantiate γ0 independently of how it instantiates γ. This problem of too general types of constructor variables manifests itself also in another situation. Consider the well-typed term

(:) [Int] ([] Int) ((:) [Int] ([] Int) ([] [Int])) : [[Int]] ` The first phase of list replacement gives

(:) (:) [] [] [] vγ2 :γ1 γ2 γ2, vγ4 :γ3 γ4 γ4, vγ5 :γ5, vγ6 :γ6, vγ7 :γ7 { v(:) v→[] (→v(:) v[] v[]) → → } ` γ2 γ5 γ4 γ6 γ7 Subsequently, type inference yields the principal typing

(:) (:) [] [] [] v :γ1 γ2 γ2, v :γ3 γ2 γ2, v :γ1, v :γ3, v :γ2 { γ2 → → γ4 → → γ5 γ6 γ7 } v(:) v[] (v(:) v[] v[]) ` γ2 γ5 γ4 γ6 γ7 : γ2 66 4.3. Type Inference Phase

(:) (:) This typing is too general, because both vγ2 and vγ4 construct values of type γ2, but (:) (:) vγ2 requires an element of type γ1 as first argument whereas vγ4 requires an element of (:) (:) type γ2. Hence we are not able to merge vγ2 and vγ4 into a single constructor variable in the third phase of list abstraction. We would like to obtain the typing

(:) (:) [] [] [] vγ2 :γ1 γ2 γ2, vγ4 :γ1 γ2 γ2, vγ5 :γ1, vγ6 :γ1, vγ7 :γ2 { v(:) v→[] (→v(:) v[] v[]) → → } ` γ2 γ5 γ4 γ6 γ7 : γ2 In [Chi99] we did not have the described problem, because we abstracted only a single list from a producer. The list replacement algorithm needed only to replace all list types [τ] and all constructor applications (:) τ and [] τ for a single fixed type τ. Because only nested lists expose the missing constraints in the types of constructor variables, the problem did not occur. However, to perform a worker/wrapper split we have to be able to abstract from several lists. Hence the list replacement algorithm replaces all list types [τ] and all constructor applications (:) τ and [] τ for (nearly) all types τ. Then nested lists lead to the problem. From the two examples we learn that for a typing to be usable in the third phase, its typing environment should be consistent in the following sense. Definition 4.25 A typing environment Γ is consistent, if for all constructor variables (:) (:) v1 and v2

(:) Γ(v ) = τ τ 0 τ 0 τ 0 not a variable 1 → → and [τ] = τ 0

(:) (:) Γ(v ) = τ1 γ γ Γ(v ) = τ2 γ γ 1 → → 2 → → τ1 = τ2

 Our type inference algorithm does not yield a typing environment but a substitution σ (and a type). The types of the constructor variables are then determined by

tyEnv(η)σ = (:) [] vγ :τγ γ γ, vγ :γ η(γ) = [τγ], γ dom(η) σ = { (:) → → []| ∈ } v :τγσ γσ γσ, v :γσ η(γ) = [τγ], γ dom(η) { γ → → γ | ∈ } where η is the instance substitution from the list replacement phase. Hence we reformulate consistency as a property of substitutions: Definition 4.26 A substitution σ is consistent with an instance substitution η, if the following three properties hold:

σ(γ) not a variable (1) σ(γ) = η(γ)σ

σ(γ) = γ0 (2) η(γ)σ = η(γ0)σ

σ η∞ (3)  A typing (σ, τ) is consistent with an instance substitution η, if σ is consistent with η.

 Chapter 4. List Abstraction And Type Inference 67

We added property (3) to the definition of consistency, because it simplifies some proofs. We know that the input of the type inference algorithm is typeable with η∞ and want to find a more general typing. Hence we are only interested in typings with σ η∞ anyway.  Corollary 4.27

σ idempotent σ consistent with η

ση∞ = η∞ 

Proof Follows with Lemma 4.17 from property (3) of Definition 4.26. 

Lemma 4.28 For any instance substitution η its compositional closure η∞ is consistent with η. 

Proof Straightforward. 

As desired, the consistency of the substitution implies the consistency of the typing environment: Lemma 4.29 (Consistency)

σ consistent with η tyEnv(η)σ consistent 

Proof Follows directly from the definitions of tyEnv(η) and consistency. Property (1) of consistency of σ with η implies the first property of the consistency of tyEnv(η)σ and property (2) of consistency of σ with η implies the second property of the consistency of tyEnv(η)σ. 

To illustrate the relationship between consistency of substitutions with an instance substitution and the consistency of a typing environment we consider again the typing

xs:[[Int]] (:) [Int] ([] Int) xs : [[Int]] { } ` (:) [] The list replacement phase returns the term vγ2 vγ3 xs and the instance substitution

η = [[Int]/γ1, [γ1]/γ2, [Int]/γ3]

This instance substitution η determines the following typing environment for the con- structor variables:

tyEnv(η) (:) (:) (:) [] [] [] = v :Int γ1 γ1, v :γ1 γ2 γ2, v :Int γ3 γ3, v :γ1, v :γ2, v :γ3 { γ1 → → γ2 → → γ3 → → γ1 γ2 γ3 } Finally, the principal typing of

xs:[[Int]] + tyEnv(η) v(:) v[] xs { } ` γ2 γ3 that is consistent with η is

([[Int]/γ1, [[Int]]/γ2, [Int]/γ3], [[Int]]) 68 4.3. Type Inference Phase

Note that the substitution must replace γ1 by [Int] to fulfil property (1) of consistency. Applying the substitution to tyEnv(η) yields

v(:):Int [Int] [Int], v(:):[Int] [[Int]] [[Int]], { γ1 → → γ2 → → v(:):Int [Int] [Int], v[]:[Int], v[]:[[Int]], v[]:[Int] γ3 → → γ1 γ2 γ3 } which is obviously a consistent typing environment. To obtain a consistent typing for the third phase of list abstraction we post-process the possibly inconsistent result of type inference with an algorithm that we will describe in Section 4.3.6. However, we will already take advantage of the fact that we are only interested in consistent typings in our type inference algorithm, as we will explain in Section 4.3.5. We shortly discuss alternative approaches to ensuring consistency in Sec- tion 4.6.

4.3.4 Towards a Type Inference Algorithm We want to define our type inference algorithm inductively over the input term. Unfortu- nately the type system of F as defined in Figure 2.6 is non-deterministic in that a typing can be proved by several derivation trees. For example, the following two derivation trees prove the same typing:

x: α.α (x) = α.α x: α.α (x) = α.α { ∀ } ∀ var { ∀ } ∀ var x: α.α x : α.α x: α.α x : α.α { ∀ } ` ∀ term abs { ∀ } ` ∀ type rename λx: α.α.x : ( α.α) ( α.α) x: α.α x : β.β ty. ren. { ∀ } ` ∀ t. abs ` λx:∀α.α.x : (∀α.α)→(∀β.β) λx: α.α.x : ( α.α) ( β.β) ` ∀ ∀ → ∀ ` ∀ ∀ → ∀ The cause of this non-determinism are the type rename and term rename rule, which are applicable to terms of any form. However, these two rules do not change the form of terms either. Hence, although, for example, the last rule used in the derivation of a typing Γ λx : τ1.M : τ may not be the term abs rule, we can argue that there ` exists a type τ2 such that Γ + x : τ1 M : τ2 is valid and τ τ1 τ2. For every term form we can{ find} such ` a generation property,≡ which→ is basically the inverse of the appropriate typing rule except that types are only compared modulo re- naming. The only problematic case is type abstraction. A valid typing Γ λα.M : τ does not imply that there exists a typeτ ˆ such that Γ M :τ ˆ is valid and` τ α.τˆ. For example, there is a derivation tree ` ≡ ∀ x:α (x) = α { } var x:α x : α { } ` type abs x:α λβ.x : β.α { } ` ∀ term rename x:α λα.x : β.α { } ` ∀ and x:α x : α is valid, but β.α α.α. Because of the term rename rule the type{ variable} ` in the abstraction∀ (λα) need6≡ ∀ not coincide with the type variable in the universal quantification ( β). It is, however, the case that a valid typing Γ λα.M : τ implies that there exists∀ a typeτ ˆ and a type variable β such that Γ M`[β/α] :τ ˆ is valid and τ β.τˆ. The existence of such a type variable β is still not` sufficient for the definition of≡ the ∀ type inference algorithm, because the algorithm must know for which type variable β it shall infer the typing of Γ and M[β/α]. Fortunately it is easy to prove that any type variable β that is not free in the typing environment Γ or the term λα.M does the job. Chapter 4. List Abstraction And Type Inference 69

Γ x : τ ` g-var Γ(x) τ ≡

Γ λx : τ1.M : τ ` g-term abs τ2 τ τ1 τ2 Γ + x : τ1 M : τ2 ∃ ≡ → { } `

Γ M1 M2 : τ ` g-term app τ2 Γ M1 : τ2 τ Γ M2 : τ2 ∃ ` → ` Γ let x : τ = M k in N : τ i i i i=1 g-let ` {k } k i 1..k Γ + xj : τj Mi : τi Γ + xj : τj N : τ ∀ ∈ { } { }j=1 ` { }j=1 ` Γ c : τ ` g-cons ∆(c) τ ≡ k Γ case M of ci xi Ni : τ ` { 7→ }i=1 g-case Γ M : T ρ i 1..k ci : α.ρ T α Γ + xi : ρ [ρ/α] Ni : τ ` ∀ ∈ { } ` ∀ i → { i } ` Γ λα.M : τ β / freeTy(Γ) freeTy(λα.M) ` ∈ ∪ g-type abs τˆ τ β.τˆ Γ M[β/α] :τ ˆ ∃ ≡ ∀ ` Γ M ρ : τ ` g-type app α, τˆ τ τˆ[ρ/α]Γ M : α.τˆ ∃ ≡ ` ∀

Figure 4.4: Generation of typings

Lemma 4.30 (Generation of typings) The properties in Figure 4.4, which state how terms of a certain form get typed, hold.  Proof We give proofs for some of the properties. Case g-term abs. From

Γ λx : τ1.M : τ ` follows with the type inference rules the existence of τ 0 , M 0 and τ 0 such that τ 0 τ1 1 1 ≡ and M 0 M and ≡ Γ λx : τ 0 .M 0 : τ 0 ` 1 is derivable by the term abs rule. Hence exists τ2 with τ 0 = τ 0 τ2 and 1 → Γ + x : τ 0 M 0 : τ2 1 ` With the env rename rule of Lemma 2.5 and the term rename rule it follows

Γ + x : τ1 M : τ2 ` Finally τ τ 0 = τ 0 τ2 τ1 τ2. ≡ 1 → ≡ → 70 4.3. Type Inference Phase

Case g-type abs. From

Γ λα.M : τ `

follows with the type inference rules the existence of γ, M 0 and τ 0 such that γ / ∈ free(λα.M), λα.M λγ.M 0[γ/α] and ≡

Γ λγ.M 0[γ/α]: τ 0 `

is derivable by the type abs rule. Hence existsτ ˆ0 with τ 0 = γ.τˆ0 and ∀

Γ M 0[γ/α] :τ ˆ0 ` and γ / free(Γ). ∈ With the substitution property of Lemma 2.6 we obtain

Γ[β/γ] M 0[γ/α][β/γ] :τ ˆ0[β/γ] ` Using the term rename rule we obtain

Γ[β/γ] M[γ/α][β/γ] :τ ˆ0[β/γ] `

We letτ ˆ =τ ˆ0[β/γ] and with γ / free(Γ) and γ / free(λα.M) it follows ∈ ∈ Γ M[β/α] :τ ˆ `

From Γ λα.M : γ.τˆ0, β / free(Γ) and β / free(λα.M) follows according to ` ∀ ∈ ∈ Lemma 2.7, β / free( γ.τˆ0). ∈ ∀ Hence τ τ 0 = γ.τˆ0 β.τˆ0[β/γ] = β.τˆ. ≡ ∀ ≡ ∀ ∀ Case g-type app. From

Γ M ρ : τ `

follows with the type inference rules the existence of M 0, ρ0 and τ 0 such that M 0 M ≡ and ρ0 ρ and ≡

Γ M 0 ρ0 : τ 0 `

is derivable by the type app rule. Hence exist α andτ ˆ with τ 0 =τ ˆ[ρ0/α] and

Γ M 0 : α.τˆ ` ∀ With the term rename rule it follows

Γ M : α.τˆ ` ∀

and finally τ 0 =τ ˆ[ρ0/α] τˆ[ρ/α]. ≡  Chapter 4. List Abstraction And Type Inference 71

4.3.5 The Type Inference Algorithm T The starting point for the development of our type inference algorithm was Milner’s type inference algorithm for the Hindley-Milner type system [Mil78,T DM82]. We formulated the generationW lemma mainly to find out how to handle type abstraction and type application, which do not occur in the Hindley-Milner type system. In fact, algorithm can also be considered as based on the appropriate generation lemma for the Hindley-MilnerW type system. Note that in contrast to the terms that are input to our type inference algorithm contain type annotations. TheseW types and the types in the typing environment contain type inference variables which are to be replaced to obtain a valid typing. Furthermore, the type inference algorithm will only be applied to input which is typeable by a substitution which only substitutes list types. Finally, we are only interested in typings which are consistent. All these properties together permit us to define a type inference algorithm that is simpler than in many aspects. The most obvious simplification is that our type inference algorithm doesW not need to create any new type inference variables. The type inference variables that occur in the input are sufficient. Our type inference algorithm is given in Figure 4.5. Its first argument is the instance substitution η. It is needed for theT typing of constructor variables, its domain is needed for unification, and type inference of the case construct is simplified by using η. Besides a typing environment Γ and a term M, the algorithm also gets an idempotent substitution σ as input. Its purpose is the same as in the unification algorithm. The type inference algorithm works by stepwise extension of σ to obtain a typing for Γσ and Mσ. The use of this separate substitution makes the algorithm more readable and is close to our implementation. The algorithm returns a valid typing, that is, a substitutionσ ˜ and a typeτ ˜ such that (Γ + tyEnv(η))˜σ Mσ˜ :τ ˜σ˜. ` The type inference algorithm is defined inductively over the form of the input term and recursively traverses the input term. The result of the algorithm for constructor variables, term variables and data constructors is directly determined by the typing en- vironment for constructor variables tyEnv(η), the typing environment Γ and the data typing environment ∆, respectively.

To determine the typing of a term abstraction λx : τ1.M, the algorithm recursively determines the typing of the abstraction body M. Note that the typing environment has to be extended by a type for the abstracted term variable, which is directly obtained from the type annotation of the term abstraction x : τ.

To determine the typing of a term application M1 M2, the algorithm recursively de- termines the typings of its subterms M1 and M2. Because the input of the algorithm is typeable by a substitution that only substitutes list types, the typing of M1 must contain a function type τ20 τ1. Its argument type τ20 and the inferred type of the argument M2 finally have to be unified,→ because they have to agree. The type system of F has explicit type abstraction and hence no type generalisation step (type closure) at the let construct, as the Hindley-Milner type system has. So our type inference algorithm is closer to Hindley’s original type inference algorithm [Hin69] than to the Hindley-Milner type inference algorithm. For the let construct the definition of is simply a combination of its definition for term application and term abstraction. T The definition of for the case construct is best explained at the hand of an exam- ple. Consider the typingT environment and term λx:[α]. case x of y:ys y . ∅ ` { 7→ } For this input yields ([[α]/τ1], λx:τ1. case x of y:ys y ). The type inference algorithm determinesR that the variable x, which is scrutinised{ 7→ in the} case construct, has type τ1. Furthermore, the pattern y:ys conveys the information that the type of x must 72 4.3. Type Inference Phase

(:) η(Γ, v , σ) := (σ, τγ γ γ) where [τγ] = η(γ) T γ → → [] η(Γ, v , σ) := (σ, γ) T γ

η(Γ, x, σ) := (σ, Γ(x)) , if x not a constructor variable T

η(Γ, λx : τ1.M, σ) := let (˜σ, τ2) := η(Γ + x : τ1 , M, σ) T T { } in (˜σ, τ1 τ2) →

η(Γ,M1 M2, σ) := let (σ1, τ 0 τ1) := η(Γ,M1, σ) T 2 → T (σ2, τ2) := η(Γ,M2, σ1) T. in ( dom(η)(τ 0 = τ2, σ2), τ1) U 2 k k η(Γ, let xi : τi = Mi i=1 in N, σ) := let (σ0, τ) := η(Γ + xi : τi i=1, N, σ) T { } T { } k (σ1, τ10 ) := η(Γ + xi : τi i=1,M1, σ0) . T. { } . . k (σk, τk0 ) := η(Γ + xi : τi i=1,Mk, σk 1) T. { } . − in ( dom(η)(τ1 = τ 0 , .. dom(η)(τk = τ 0 , σk)..), τ) U 1 U k

η(Γ, c, σ) := (σ, ∆(c)) T k η(Γ, case M of ci xi Ni , σ) := let α.ρ C α := ∆(ci) i = 1..k T { 7→ }i=1 ∀ i → ∀ (σ0, τ 0) := η(Γ, M, σ) T (σ [γησ /γ], γησ )  0 0 0 (σ0,C τ) := , if τ 0 = γ dom(η)  ∈ (σ0, τ 0) , otherwise  (σ1, τ1) := η(Γ + x1 : ρ1[τ/α] ,N1, σ0) . T. { } . .

(σk, τk) := η(Γ + xk : ρk[τ/α] ,Nk, σk 1) T. { }. − in ( dom(η)(τ1 = τ2, .. dom(η)(τk 1 = τk, σk)..), τk) U U −

η(Γ, λα.M, σ) := let β new with T β / free(Γ) free(λα.M) mod(η) ∈ ∪ ∪ (˜σ, τ) := η(Γ,M[β/α], σ) in (˜σ, β.τ) T ∀

η(Γ, M ρ, σ) := let (˜σ, α.τ) := η(Γ, M, σ) T ∀ T in (˜σ, subsdom(η)(τ, [ρ/α]))

Figure 4.5: Type inference algorithm

be a list type. However, how shall the algorithm know the type of the list elements? We could define so that it introduces a new type variable τ2 for the list elements. Then it T would infer the principal typing ([[γ2]/γ1], [γ2] γ2) However, we are not interested in the fact that the term is polymorphic in the list→ argument type. We only want to know about polymorphism in list types to abstract them. Formally, we are only interested in Chapter 4. List Abstraction And Type Inference 73

typings whose substitutions are consistent with the instance substitution [[α]/τ1]. So, because γ1 has to be replaced by a list because of the pattern y:ys, we can replace it by τ1[[α]/τ1] = [α] and obtain the principal consistent typing ([[α]/γ1], [α] α). In general, we use the instance substitution η when we have to replace a type inference→ vari- able during type inference of a case construct, because thus we avoid the introduction of a new type inference variable and because we are only interested in consistent typings anyway. The definition of for type abstraction follows from the Generation Lemma 4.30. Note that the new variableT β is not a type inference variable, that is, β is never replaced by unification. The definition of for a type application M ρ also follows from the Generation Lemma 4.30. BecauseT the input of the algorithm is typeable by a substitution that only substitutes list types, the inferred typing of M must contain a universal type α.τ. To obtain the result type, α has to be replaced by ρ, but note that this substitution is∀ not part of the substitution of the typing. The type variable α is not a type inference variable. To use our unification algorithm we have to preserve the invariant that the set of type inference variables dom(η) is disjointU from the set of all bound type variables occurring in the processed typing environment and term. Hence we use a slightly modified version of substitution that never introduces a type inference variable as new variable:

Definition 4.31 (Application of a substitution excluding some new variables)

subsV (α, σ) := σ(α)

subsV ( α.τ, σ) := β. subsV (τ, [β/α]σ) ∀ ∀ where β new with β / V free( α.τ) mod(σ) ∈ ∪ ∀ ∪

subsV (T τ1..τk, σ) := T (subsV (τ1, σ))..(subsV (τk, σ))

subsV (τ1 τ2, σ) := (subsV (τ1, σ)) (subsV (τ2, σ)) → →

 Lemma 4.32 (Properties of subs)

V # bound(τ) V # bound(σ) subs (τ, σ) τσ V V # bound(subs (τ, σ)) ≡ V 

Proof Both by induction on τ. 

To illustrate how the type inference algorithm works, Figure 4.6 shows intermediate steps of the computation of

η( zs:[β] , (λα.λx:α.x) γ zs, [ ]) = ([[β]/γ], γ) T { } with η = [[β]/γ]. Finally, the following lemma states exactly under which conditions computes a typing and in which respect it is principal (most general). Note that theT result of is computed independently of the typing (ˆσ, τˆ). So the result is more general than anyT typing (ˆσ, τˆ) that fulfils the given premises. 74 4.3. Type Inference Phase

η( zs:[β] , (λα.λx:α.x) γ zs, [ ]) = T { } let (σ1, τ 0 τ1) := η( zs:[β] , (λα.λx:α.x) γ, [ ]) 2 → T { } (σ2, τ2) := η( zs:[β] , zs, σ1) . T { } in ( γ (τ20 = τ2, σ2), τ1) = U{ } let (σ1, τ 0 τ1) := let (˜σ, α.τ) := η( zs:[β] , λα.λx:α.x, [ ]) 2 → ∀ T { } in (˜σ, subs γ (τ, [γ/α])) { } (σ2, τ2) := η( zs:[β] , zs, σ1) . T { } in ( γ (τ20 = τ2, σ2), τ1) = U{ } let (σ1, τ 0 τ1) := let (˜σ, α.τ) := let (˜σ, τ) := let (˜σ, τ2) := η( zs:[β], x:δ , x, [ ]) 2 → ∀ T { } in (˜σ, δ τ2) in (˜σ, δ.τ) → ∀ in (˜σ, subs γ (τ, [γ/α])) { } (σ2, τ2) := η( zs:[β] , zs, σ1) . T { } in ( γ (τ20 = τ2, σ2), τ1) = U{ } let (σ1, τ20 τ1) := let (˜σ, α.τ) := let (˜σ, τ) := ([ ], δ δ) → ∀ in (˜σ, δ.τ) → ∀ in (˜σ, subs γ (τ, [γ/α])) { } (σ2, τ2) := η( zs:[β] , zs, σ1) . T { } in ( γ (τ20 = τ2, σ2), τ1) = U{ } let (σ1, τ 0 τ1) := let (˜σ, α.τ) := ([ ], δ.δ δ) 2 → ∀ ∀ → in (˜σ, subs γ (τ, [γ/α])) { } (σ2, τ2) := η( zs:[β] , zs, σ1) . T { } in ( γ (τ20 = τ2, σ2), τ1) = U{ } let (σ1, τ20 τ1) := ([ ], subs γ (δ δ, [γ/δ])) → { } → (σ2, τ2) := η( zs:[β] , zs, σ1) . T { } in ( γ (τ20 = τ2, σ2), τ1) = U{ } let (σ1, τ 0 τ1) := ([ ], γ γ) 2 → → (σ2, τ2) := η( zs:[β] , zs, σ1) . T { } in ( γ (τ20 = τ2, σ2), τ1) = U{ } let (σ2, τ2) := η( zs:[β] , zs, [ ]) . T { } in ( γ (γ = τ2, σ2), γ) = U{ } let (σ , τ ) := ([ ], [β]) 2 2 . in ( γ (γ = τ2, σ2), γ) = U{ } . ( γ (γ = [β], [ ]), γ) = U{ } ([[β]/γ], γ)

Figure 4.6: An example computation of T

Lemma 4.33 (Principal Typing)

(Γ + tyEnv(η))σ̂ ⊢ Mσ̂ : τ̂    σ ≤α σ̂    σ idempotent    σ̂ consistent with η    dom(σ̂) ⊆ V
V = dom(η)    V # bound(η)    V # bound(∆)    V # bound(Γ)    V # bound(M)    V # bound(σ)
----------
∃(σ̃, τ̃):  (σ̃, τ̃) = Tη(Γ, M, σ),  (Γ + tyEnv(η))σ̃ ⊢ Mσ̃ : τ̃σ̃,  τ̃σ̂ ≡ τ̂,
          σ ≤α σ̃ ≤α σ̂,  σ̃ idempotent,  V # bound(τ̃),  V # bound(σ̃)

Proof We prove the lemma by structural induction on the term M.

Case v(:)γ. From

(Γ + tyEnv(η))σ̂ ⊢ v(:)γσ̂ : τ̂

it follows with property g-var that ((Γ + tyEnv(η))σ̂)(v(:)γ) ≡ τ̂. So τ̂ ≡ (τγ → γ → γ)σ̂ where [τγ] = η(γ). Then the var rule assures that

(Γ + tyEnv(η))σ ⊢ v(:)γσ : (τγ → γ → γ)σ

Finally V # bound(η) implies V # bound(τγ → γ → γ). Hence all conclusions hold for (σ, τγ → γ → γ).

Case v[]γ. Similar to the previous case.

Case x. From

(Γ + tyEnv(η))σ̂ ⊢ xσ̂ : τ̂

it follows with property g-var that ((Γ + tyEnv(η))σ̂)(x) ≡ τ̂, that is, Γσ̂(x) ≡ τ̂. So x ∈ dom(Γ) and hence the var rule assures that

(Γ + tyEnv(η))σ ⊢ xσ : (Γ(x))σ

Finally V # bound(Γ) implies V # bound(Γ(x)). Hence all conclusions hold for (σ, Γ(x)).

Case λx:τ1.M. From

(Γ + tyEnv(η))σ̂ ⊢ (λx:τ1.M)σ̂ : τ̂

it follows

(Γ + tyEnv(η))σ̂ ⊢ λx:τ1σ̂.Mσ̂ : τ̂

and hence with property g-term abs the existence of τ̂2 with τ̂ ≡ τ1σ̂ → τ̂2 and

(Γ + {x:τ1} + tyEnv(η))σ̂ ⊢ Mσ̂ : τ̂2

V # bound(λx:τ1.M) implies V # bound(M) and V # bound(τ1). The latter together with V # bound(Γ) implies V # bound(Γ + {x:τ1}).

Hence by induction hypothesis there exists (σ̃, τ2) = Tη(Γ + {x:τ1}, M, σ) with

(Γ + {x:τ1} + tyEnv(η))σ̃ ⊢ Mσ̃ : τ2σ̃

and σ ≤α σ̃ ≤α σ̂ and σ̃ is idempotent and V # bound(τ2) and V # bound(σ̃). With the term abs rule it follows

(Γ + tyEnv(η))σ̃ ⊢ λx:(τ1σ̃).(Mσ̃) : τ1σ̃ → τ2σ̃

that is,

(Γ + tyEnv(η))σ̃ ⊢ (λx:τ1.M)σ̃ : (τ1 → τ2)σ̃

Furthermore, (τ1 → τ2)σ̂ = τ1σ̂ → τ2σ̂ ≡ τ1σ̂ → τ̂2 ≡ τ̂. Finally, V # bound(τ1) and V # bound(τ2) imply V # bound(τ1 → τ2).

Hence all conclusions hold for (σ̃, τ1 → τ2).

Case M1 M2. From

(Γ + tyEnv(η))σ̂ ⊢ (M1 M2)σ̂ : τ̂

it follows

(Γ + tyEnv(η))σ̂ ⊢ (M1σ̂) (M2σ̂) : τ̂

and hence with property g-term app the existence of τ̂2 with

(Γ + tyEnv(η))σ̂ ⊢ M1σ̂ : τ̂2 → τ̂   and
(Γ + tyEnv(η))σ̂ ⊢ M2σ̂ : τ̂2

From V # bound(M1 M2) it follows V # bound(M1) and V # bound(M2).

Hence by induction hypothesis there exists (σ1, τ̃1) = Tη(Γ, M1, σ) with

(Γ + tyEnv(η))σ1 ⊢ M1σ1 : τ̃1σ1

and τ̃1σ̂ ≡ τ̂2 → τ̂ and σ ≤α σ1 ≤α σ̂ and σ1 idempotent and V # bound(τ̃1) and V # bound(σ1).

Suppose τ̃1 is a variable. Because σ̂ ≤α η∞ and τ̃1σ̂ ≡ τ̂2 → τ̂, η∞(τ̃1) must be a function type. However, this contradicts the fact that η substitutes list types. So τ̃1 is not a variable and there exist τ2′ and τ1 with τ2′ → τ1 = τ̃1.

By induction hypothesis there exists (σ2, τ2) = Tη(Γ, M2, σ1) with

(Γ + tyEnv(η))σ2 ⊢ M2σ2 : τ2σ2

and τ2σ̂ ≡ τ̂2 and σ1 ≤α σ2 ≤α σ̂ and σ2 idempotent and V # bound(τ2) and V # bound(σ2).

From τ2′σ̂ → τ1σ̂ = (τ2′ → τ1)σ̂ = τ̃1σ̂ ≡ τ̂2 → τ̂ ≡ τ2σ̂ → τ̂ it follows τ2′σ̂ ≡ τ2σ̂ and τ1σ̂ ≡ τ̂. Also V # bound(τ̃1) together with τ̃1 = τ2′ → τ1 imply V # bound(τ2′). Hence according to Lemma 4.24 there exists a most general unifier, that is, there exists an idempotent σ̃ = Udom(η)(τ2′ ≐ τ2, σ2) with τ2′σ̃ = τ2σ̃ and σ2 ≤α σ̃ ≤α σ̂ and V # bound(σ̃).

With σ1 ≤α σ2 ≤α σ̃ and Lemma 2.6 it follows

(Γ + tyEnv(η))σ̃ ⊢ M1σ̃ : (τ2′ → τ1)σ̃   and
(Γ + tyEnv(η))σ̃ ⊢ M2σ̃ : τ2σ̃

Then the term app rule and τ2′σ̃ = τ2σ̃ give us

(Γ + tyEnv(η))σ̃ ⊢ (M1σ̃) (M2σ̃) : τ1σ̃

that is,

(Γ + tyEnv(η))σ̃ ⊢ (M1 M2)σ̃ : τ1σ̃

Hence all conclusions hold for (σ̃, τ1).

Case let {xi : τi = Mi}i=1..k in N. Similar to the case λx:τ1.M.

Case c. Similar to the case x. Note that V # bound(∆(c)), because V # bound(∆).

Case case M of {ci xi ↦ Ni}i=1..k. Similar to the case λx:τ1.M. To justify the use of η in the determination of the type of the scrutinised term we need that σ̂ is consistent with η.

Case λα.M. From σ̂ ≤α η∞ it follows mod(σ̂) ⊆ mod(η∞) = mod(η). Together with β ∉ free(Γ) ∪ mod(η) it follows β ∉ free((Γ + tyEnv(η))σ̂).

From

(Γ + tyEnv(η))σ̂ ⊢ (λα.M)σ̂ : τ̂

it follows with β ∉ free(λα.M) and β ∉ mod(σ̂) ⊆ mod(η)

(Γ + tyEnv(η))σ̂ ⊢ λβ.M[β/α]σ̂ : τ̂

Hence property g-type abs assures the existence of τ̂′ with τ̂ ≡ ∀β.τ̂′ and

(Γ + tyEnv(η))σ̂ ⊢ M[β/α]σ̂[β/β] : τ̂′

that is,

(Γ + tyEnv(η))σ̂ ⊢ M[β/α]σ̂ : τ̂′

From V # bound(λα.M) it follows V # bound(M[β/α]).

Hence by induction hypothesis there exists (σ̃, τ) = Tη(Γ, M[β/α], σ) with

(Γ + tyEnv(η))σ̃ ⊢ M[β/α]σ̃ : τσ̃

and τσ̂ ≡ τ̂′ and σ ≤α σ̃ ≤α σ̂ and σ̃ is idempotent and V # bound(τ) and V # bound(σ̃).

From σ̃ ≤α σ̂ ≤α η∞ it follows mod(σ̃) ⊆ mod(η∞) = mod(η). Together with β ∉ free(Γ) ∪ mod(η) it follows β ∉ free((Γ + tyEnv(η))σ̃).

Hence the type abs rule gives us

(Γ + tyEnv(η))σ̃ ⊢ λβ.M[β/α]σ̃ : ∀β.τσ̃

Using the rename rule with

λβ.M[β/α]σ̃
≡ { Lemma 2.3; β ∉ mod(σ̃) ⊆ mod(η) }
(λβ.M[β/α])σ̃
≡ { Definition of ≡; β ∉ free(λα.M) }
(λα.M)σ̃

and

∀β.τσ̃
≡ { Lemma 2.3; β ∉ mod(σ̃) ⊆ mod(η) }
(∀β.τ)σ̃

we obtain

(Γ + tyEnv(η))σ̃ ⊢ (λα.M)σ̃ : (∀β.τ)σ̃

Therefore

(∀β.τ)σ̂
≡ { Lemma 2.3; β ∉ mod(σ̂) ⊆ mod(η) }
∀β.τσ̂
≡ { τσ̂ ≡ τ̂′ }
∀β.τ̂′
≡ { Definition of τ̂′ }
τ̂

Finally, β ∉ V and V # bound(τ) gives us V # bound(∀β.τ).

Case M ρ. From

(Γ + tyEnv(η))σ̂ ⊢ (M ρ)σ̂ : τ̂

it follows

(Γ + tyEnv(η))σ̂ ⊢ (Mσ̂) (ρσ̂) : τ̂

and hence with property g-type app the existence of γ and τ̂′ with τ̂ ≡ τ̂′[ρσ̂/γ] and

(Γ + tyEnv(η))σ̂ ⊢ Mσ̂ : ∀γ.τ̂′

Hence by induction hypothesis there exists (σ̃, τ′) = Tη(Γ, M, σ) with

(Γ + tyEnv(η))σ̃ ⊢ Mσ̃ : τ′σ̃

and τ′σ̂ ≡ ∀γ.τ̂′ and σ ≤α σ̃ ≤α σ̂ and σ̃ is idempotent and V # bound(σ̃) and V # bound(τ′).

Suppose τ′ is a variable. Because σ̂ ≤α η∞ and τ′σ̂ ≡ ∀γ.τ̂′, η∞(τ′) must be a universal type. However, this contradicts the fact that η substitutes list types. So τ′ is not a variable and there exist α and τ with τ′ = ∀α.τ.

So

(Γ + tyEnv(η))σ̃ ⊢ Mσ̃ : (∀α.τ)σ̃

Let β be new with β ∉ free(∀α.τ) ∪ mod(σ̂) ∪ free(τ̂′). With

(∀α.τ)σ̃
≡ { Definition of ≡; β ∉ free(∀α.τ) }
(∀β.τ[β/α])σ̃
≡ { Lemma 2.3; β ∉ mod(σ̃) ⊆ mod(σ̂) }
∀β.τ[β/α]σ̃

and the type rename rule we obtain

(Γ + tyEnv(η))σ̃ ⊢ Mσ̃ : ∀β.τ[β/α]σ̃

Then the type app rule gives us

(Γ + tyEnv(η))σ̃ ⊢ (Mσ̃) (ρσ̃) : τ[β/α]σ̃[ρσ̃/β]

Together with

τ[β/α]σ̃[ρσ̃/β]
≡ { Lemma 2.4; β ∉ mod(σ̃) ⊆ mod(σ̂) }
τ[β/α][ρ/β]σ̃
≡ { β = α or β ∉ free(τ), because β ∉ free(∀α.τ) }
τ[ρ/α]σ̃
≡ { Lemma 4.32 }
subsdom(η)(τ, [ρ/α])σ̃

and the type rename rule we obtain

(Γ + tyEnv(η))σ̃ ⊢ (Mσ̃) (ρσ̃) : subsdom(η)(τ, [ρ/α])σ̃

that is,

(Γ + tyEnv(η))σ̃ ⊢ (M ρ)σ̃ : subsdom(η)(τ, [ρ/α])σ̃

Cη(γ, σ) := Udom(η)(γ′η ≐ γη, σ)   if there exists γ′ with γσ = γ′
            Udom(η)(γ ≐ γη, σ)     otherwise

Cη(∅, σ) := σ
Cη(V ∪̇ {γ}, σ) := Cη(V, Cη(γ, σ))

Cη(σ) := let σ0 := σ
             σi+1 := Cη(dom(η), σi)   for all natural numbers i
             j the least natural number with σj = σj+1
         in σj

Figure 4.7: Consistency closure algorithm C

Furthermore,

subsdom(η)(τ, [ρ/α])σ̂
≡ { Lemma 4.32 }
(τ[ρ/α])σ̂
≡ { β = α or β ∉ free(τ), because β ∉ free(∀α.τ) }
τ[β/α][ρ/β]σ̂
≡ { Lemma 2.4; β ∉ mod(σ̂) }
τ[β/α]σ̂[ρσ̂/β]
≡ { Lemma 2.2; for any δ ∉ free(∀α.τ): ∀δ.τ[δ/α]σ̂ ≡ (∀α.τ)σ̂ ≡ ∀γ.τ̂′ }
τ̂′[β/γ][ρσ̂/β]
≡ { β ∉ free(τ̂′) }
τ̂′[ρσ̂/γ]
≡ τ̂

Finally, V # bound(τ′) implies V # bound(τ). Also V # bound(ρ). So Lemma 4.32 assures V # bound(subsdom(η)(τ, [ρ/α])). □

4.3.6 The Consistency Closure Algorithm C

As we discussed in Section 4.3.3, we have to post-process the result of type inference to ensure that we only obtain typings that are consistent with the instance substitution η. The algorithm for ensuring consistency, the consistency closure algorithm C, is defined in Figure 4.7.

Cη(γ, σ) computes an extension of σ which is consistent with η with respect to the variable γ, assuming that σ ≤α η∞ holds. We obtain the definition of Cη(γ, σ) directly from the properties of a consistent substitution as given in Definition 4.26. In practice we may define Cη(γ, σ) := σ in case of γσ = γ to avoid unnecessary computation.

Cη(V, σ) extends σ by ensuring consistency iteratively with respect to every variable γ ∈ V. Note that Cη(γ, σ) is consistent with respect to the variable γ, but may not be consistent with respect to a variable γ′ for which σ is consistent. That is, consistency closure for one variable may destroy consistency for another. For example [γ4/γ1] is consistent with the instance substitution

η = [[γ2]/γ1, [γ3]/γ2, [Int]/γ3, [γ5]/γ4, [γ6]/γ5, [Int]/γ6]

for all type inference variables but γ1. However, [γ4/γ1, γ5/γ2] = Cη(γ1, [γ4/γ1]) is consistent with respect to γ1 but not with respect to γ2. Hence we define the consistency closure of a substitution σ, Cη(σ), by iteration until the substitution is consistent with respect to all type inference variables.

Alternatively we could obtain a total order on type inference variables from the acyclic instance substitution η, such that if a substitution is closed under consistency with respect to the type inference variables in the determined order, no iteration is necessary. However, in practice the improvement in performance is probably small, so that it is not worth the additional complexity.

The algorithm C is nondeterministic, because the variables in dom(η) can be processed in any order. However, from Lemma 4.36 it follows that the result is unique up to variable renaming.

The consistency closure algorithm obviously yields a consistent substitution.
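Before stating the correctness properties, note that the iteration in Figure 4.7 is a plain fixpoint computation. The following minimal Haskell sketch shows its skeleton, parameterised over the per-variable closure Cη(γ, ·); the names are ours and substitutions are simplified to finite maps.

import qualified Data.Map as Map

-- Fixpoint iteration of the consistency closure: extend the
-- substitution for every variable in dom(eta) until it is stable.
-- Assuming each per-variable step only ever enlarges the domain,
-- comparing domain sizes suffices as a termination test
-- (cf. the proof of Lemma 4.36).
closure :: Ord k => (k -> Map.Map k v -> Map.Map k v)  -- C_eta(gamma, .)
        -> [k]                                         -- dom(eta)
        -> Map.Map k v -> Map.Map k v
closure cEta dom = go
  where
    step s = foldr cEta s dom          -- close w.r.t. every variable once
    go s   = let s' = step s
             in if Map.size s' == Map.size s then s' else go s'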

Lemma 4.34 (Consistency)

σ = Cη(σ)    σ idempotent    σ ≤α η∞    dom(η) # bound(η)    dom(η) # bound(σ)
----------
σ consistent with η

Proof According to the definition of C we have σ = Cη(dom(η), σ) and furthermore σ = Cη(γ, σ) for all γ ∈ dom(η). The other premises assure that the premises of the unification Lemma 4.24 hold.

Case γσ = γ′. According to the definition of C we have σ = Udom(η)(γ′η ≐ γη, σ). With Lemma 4.24 it follows γ′ησ ≡ γησ.

Case γσ not a variable. According to the definition of C we have σ = Udom(η)(γ ≐ γη, σ). With Lemma 4.24 it follows γσ ≡ γησ.

For γ ∉ dom(η) the consistency constraints trivially hold. □

Because the definition of the consistency closure algorithm is based on the algorithm U, which determines a most general unifier, C yields a most general consistent extension.

Lemma 4.35 (Most general consistent extension)

σ̃ = Cη(γ, σ)    γ ∈ dom(η)    σ ≤α σ̂    σ̂ consistent with η    σ idempotent
dom(η) # bound(η)    dom(η) # bound(σ)
----------
σ̃ ≤α σ̂

Proof

Case there exists γ′ with γσ = γ′. With the definition of C it follows σ̃ = Udom(η)(γ′η ≐ γη, σ). The premises of the unification Lemma 4.24 are fulfilled. We have γσ̂ = γσσ̂ = γ′σ̂.

  Case there exists γ″ with γ′σ̂ = γ″. The consistency of σ̂ with η implies γ′ησ̂ = γ″ησ̂ = γησ̂. With Lemma 4.24 follows the existence of σ̃ with σ̃ ≤α σ̂.

  Case γ′σ̂ not a variable. Then γσ̂ is not a variable either. The consistency of σ̂ with η implies γσ̂ = γησ̂ and γ′σ̂ = γ′ησ̂. Together these equations give γησ̂ = γ′ησ̂. With Lemma 4.24 follows the existence of σ̃ with σ̃ ≤α σ̂.

Case γσ not a variable. With the definition of C it follows σ̃ = Udom(η)(γ ≐ γη, σ). The premises of the unification Lemma 4.24 are fulfilled. Because γσσ̂ = γσ̂ is not a variable and σ̂ is consistent with η, we have γσ̂ = γησ̂. Then Lemma 4.24 assures the existence of σ̃ with σ̃ ≤α σ̂. □

Lemma 4.36 (Most general consistent extension)

σ ≤α σ̂    σ̂ consistent with η    σ idempotent    dom(η) # bound(η)    dom(η) # bound(σ)
----------
∃σ̃:  σ̃ = Cη(σ),  σ̃ idempotent,  σ̃ consistent with η,  σ ≤α σ̃ ≤α σ̂,  dom(η) # bound(σ̃)

Proof Lemma 4.24 assures that all calls of the unification algorithm terminate. Furthermore, the lemma assures that σ0 ≤α σ1 ≤α σ2 ≤α …, especially dom(σ0) ⊆ dom(σ1) ⊆ dom(σ2) ⊆ … . Also dom(σi) ⊆ dom(η) holds for all natural numbers i and we know that dom(η) is finite. Hence there exists a natural number j with dom(σj) = dom(σj+1) and with Lemma 4.21 it follows σj = σj+1. So C terminates.

Lemma 4.35 assures σ̃ ≤α σ̂, Lemma 4.34 assures that σ̃ is consistent with η, and the remaining conclusions follow again from Lemma 4.24. □

4.3.7 Principal Consistent Typing

Finally, we combine the type inference algorithm T and the consistency closure algorithm C:

CTη(Γ, M) := let (σ, τ) := Tη(Γ, M, [ ]) in (Cη(σ), τ)

The algorithm CT computes a principal consistent typing:

Lemma 4.37 (Principal consistent typing)

(Γ + tyEnv(η))σ̂ ⊢ Mσ̂ : τ̂    σ̂ consistent with η    dom(σ̂) ⊆ V    V = dom(η)
V # bound(η)    V # bound(∆)    V # bound(Γ)    V # bound(M)
----------
∃(σ̃, τ̃):  (σ̃, τ̃) = CTη(Γ, M),  (Γ + tyEnv(η))σ̃ ⊢ Mσ̃ : τ̃σ̃,  τ̃σ̂ ≡ τ̂,
          σ̃ consistent with η,  σ̃ ≤α σ̂,  σ̃ idempotent

Proof Follows from Lemma 4.33 and Lemma 4.36. 

We revisit the two examples from Section 4.3.3. Consider the well-typed term

{xs:[[Int]]} ⊢ (:) [Int] ([] Int) xs : [[Int]]

The list replacement algorithm yields

R(∅, (:) [Int] ([] Int) xs) = ([[Int]/γ1, [γ1]/γ2, [Int]/γ3], v(:)γ2 v[]γ3 xs)

where the substitution component is the instance substitution η, and then we obtain the consistent principal typing

CTη({xs:[[Int]]}, v(:)γ2 v[]γ3 xs)
= let (σ, τ) := Tη({xs:[[Int]]}, v(:)γ2 v[]γ3 xs, [ ]) in (Cη(σ), τ)
= (Cη([γ1/γ3, [[Int]]/γ2]), γ2)
= ([[Int]/γ3, [Int]/γ1, [[Int]]/γ2], γ2)

because

Cη({γ1, γ2}, [γ1/γ3, [[Int]]/γ2])
= Cη({γ2}, Cη(γ1, [γ1/γ3, [[Int]]/γ2]))
= Cη({γ2}, [γ1/γ3, [[Int]]/γ2])
= U{γ1,γ2}(γ2 ≐ [γ1], [γ1/γ3, [[Int]]/γ2])
= [[Int]/γ3, [Int]/γ1, [[Int]]/γ2]

Similarly for the well-typed term

⊢ (:) [Int] ([] Int) ((:) [Int] ([] Int) ([] [Int])) : [[Int]]

the list replacement algorithm yields

R(∅, (:) [Int] ([] Int) ((:) [Int] ([] Int) ([] [Int])))
= ([[Int]/γ1, [γ1]/γ2, [Int]/γ3, [γ3]/γ4, [Int]/γ5, [Int]/γ6, [γ8]/γ7, [Int]/γ8],
   v(:)γ2 v[]γ5 (v(:)γ4 v[]γ6 v[]γ7))

where the substitution component is again the instance substitution η, and then we obtain the consistent principal typing

CTη(∅, v(:)γ2 v[]γ5 (v(:)γ4 v[]γ6 v[]γ7))
= let (σ, τ) := Tη(∅, v(:)γ2 v[]γ5 (v(:)γ4 v[]γ6 v[]γ7), [ ]) in (Cη(σ), τ)
= (Cη([γ2/γ4, γ1/γ5, γ3/γ6, γ2/γ7]), γ2)
= ([γ1/γ3, γ2/γ4, γ1/γ5, γ1/γ6, γ2/γ7, γ1/γ8], γ2)

because

Cη({γ1, …, γ8}, [γ2/γ4, γ1/γ5, γ3/γ6, γ2/γ7])
= Cη({γ1, …, γ6}, Cη(γ7, [γ2/γ4, γ1/γ5, γ3/γ6, γ2/γ7]))
= Cη({γ1, …, γ6}, Udom(η)([γ1] ≐ [γ8], [γ2/γ4, γ1/γ5, γ3/γ6, γ2/γ7]))
= Cη(γ4, [γ2/γ4, γ1/γ5, γ3/γ6, γ2/γ7, γ1/γ8])
= Udom(η)([γ1] ≐ [γ3], [γ2/γ4, γ1/γ5, γ3/γ6, γ2/γ7, γ1/γ8])
= [γ1/γ3, γ2/γ4, γ1/γ5, γ1/γ6, γ2/γ7, γ1/γ8]

The following two properties are rather straightforward but they are important for the correctness of the complete list abstraction algorithm.

Lemma 4.38 (Instantiability)

(σ̃, τ̃) = CTη(Γ, M)    (Γ + tyEnv(η))σ̂ ⊢ Mσ̂ : τ̂    σ̂ consistent with η    dom(σ̂) ⊆ V
V = dom(η)    V # bound(η)    V # bound(∆)    V # bound(Γ)    V # bound(M)
----------
Mσ̃ inst(η∞) ≡ M inst(η∞)

Proof

Mσ̃ inst(η∞)
= { Definition of inst(η∞) }
Mσ̃η∞ instCons(η∞)
≡ { σ̃ is idempotent and σ̃ ≤α η∞ according to Lemma 4.37 }
Mη∞ instCons(η∞)
= { Definition of inst(η∞) }
M inst(η∞)

□

Lemma 4.39 (Typing)

(σ̃, τ̃) = CTη(Γ, M)    (Γ + tyEnv(η))η∞ ⊢ Mη∞ : τ̂    V = dom(η)    V # free(Γ)
V # bound(η)    V # bound(∆)    V # bound(Γ)    V # bound(M)
----------
(Γ + tyEnv(η))σ̃ ⊢ Mσ̃ : τ̃σ̃    τ̃η∞ ≡ τ̂

Proof Follows from Lemma 4.37, because η∞ is consistent with η. □

4.4 Instantiation of Variables and Abstraction Phase

With the principal consistent typing we can build the list abstracted term. The typing determines which constructor variables have to be reinstantiated to list constructors and which are to be abstracted. However, first we may have to instantiate some type inference variables.

4.4.1 Superfluous Type Inference Variables

Consider the well-typed term

∅ ⊢ case ([] Int, [] Int) of (x,y) ↦ x : [Int]

List replacement gives

{v[]γ1 : γ1, v[]γ2 : γ2} ⊢ case (v[]γ1, v[]γ2) of (x,y) ↦ x

and the type inference algorithm CT does not substitute any type inference variable, so that

{v[]γ1 : γ1, v[]γ2 : γ2} ⊢ case (v[]γ1, v[]γ2) of (x,y) ↦ x : γ1

is the principal consistent typing. Then list abstraction as discussed in Chapter 3 would construct

λγ1. λγ2. λv[]γ1 : γ1. λv[]γ2 : γ2. case (v[]γ1, v[]γ2) of (x,y) ↦ x

However, this term is unnecessarily complicated and large. The type inference variable γ2 does not occur in the type γ1 of case (v[]γ1, v[]γ2) of (x,y) ↦ x. Abstracting γ2 would not improve deforestation but only increase the size of a program. The type inference variable γ2 is superfluous. Hence we instantiate it to the type [Int] and have list abstraction return the following term:

λγ1. λv[]γ1 : γ1. case (v[]γ1, [] Int) of (x,y) ↦ x

The situation is similar to the Hindley-Milner type system, where the type generalisation step at let bindings, the only source of universal types, only generalises over type variables that occur in the type [Mil78, DM82]. Unfortunately a type inference variable γ ∈ dom(η) is not necessarily superfluous with respect to a typing (σ, τ) if it does not occur in the type τσ. Consider the well-typed term

⊢ (:) [Int] ([] Int) ((:) [Int] ([] Int) ([] [Int])) : [[Int]]

As we saw in Section 4.3.7 the list replacement and the subsequent type inference phase give the principal consistent typing

{v(:)γ2 : γ1 → γ2 → γ2, v(:)γ4 : γ1 → γ2 → γ2, v[]γ5 : γ1, v[]γ6 : γ1, v[]γ7 : γ2}
  ⊢ v(:)γ2 v[]γ5 (v(:)γ4 v[]γ6 v[]γ7) : γ2

The type inference variable γ1 does not occur in the inferred type γ2. However, γ1 replaces the element type of the list type that γ2 replaces. For the construction of most general workers we want to abstract nested lists, as we did in Chapter 3. When we abstract from the type inference variable γ2 we also abstract from its constructor variable v(:)γ2, in whose type γ1 appears.

The set of superfluous type inference variables of an instance substitution with respect to a typing (σ, τ) is the largest subset of dom(η) \ free(τσ) such that the set is disjoint from free(η(γ′)) for all type inference variables γ′ ∈ dom(η) that are not in the set. Note that the last sentence implicitly defines a monotonic function of which the set of superfluous type inference variables is the largest fixpoint; a small sketch of this computation follows below.

However, only for constructing a worker are we interested in abstracting as many lists as possible. If we perform list abstraction of a producer to apply the short cut fusion rule, then we have to abstract only the produced list, to fit the form of the fusion rule. So in that case all type inference variables except for τσ, which hopefully is a type inference variable, are superfluous with respect to a typing (σ, τ).

We instantiate a superfluous type inference variable γ ∈ dom(η) \ dom(σ) by extending the substitution by [γησ/γ]. That is, instantiating the typing (σ, τ) for a superfluous variable γ yields (σ[γησ/γ], τ). Note that here again we need the instance substitution η.

By instantiating superfluous type inference variables we obviously lose the principality property of the typing. However, all other properties of the substitution are preserved, as the subsequent lemmas prove.
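Before turning to these lemmas, here is a minimal Haskell sketch of the announced greatest-fixpoint computation: start with all candidates and repeatedly remove variables that occur free in η(γ′) for some γ′ that is itself not superfluous. All names are illustrative, not from the prototype.

import qualified Data.Map as Map
import qualified Data.Set as Set

-- etaFree maps every gamma' in dom(eta) to free(eta(gamma')).
-- The initial candidate set is dom(eta) \ free(tau sigma).
superfluous :: Ord v => Map.Map v (Set.Set v) -> Set.Set v -> Set.Set v
superfluous etaFree = go
  where
    go s = let -- variables forbidden by some gamma' outside the set
               bad = Set.unions [ fv | (g, fv) <- Map.toList etaFree
                                     , g `Set.notMember` s ]
               s'  = s Set.\\ bad
           in if s' == s then s else go s'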

Lemma 4.40

σ is idempotent    γ ∉ dom(σ)    σ ≤α η∞
----------
γ ∉ free(γησ)

Proof Because σ ≤α η∞ and σ is idempotent, we have ση∞ ≡ η∞. Hence γη∞ ≡ γηη∞ ≡ (γησ)η∞. Because η is a list substitution, γησ is not a variable. So, just considering the size of the types, it is clear that the last equation can only hold if γ ∉ free(γησ). □

Lemma 4.41 (Preservation of idempotency)

σ is idempotent    σ consistent with η    γ ∉ dom(σ)
----------
σ[γησ/γ] idempotent

Proof From Lemma 4.40 and Corollary 4.19. □

Lemma 4.42 (Preservation of consistency)

σ consistent with η    γ ∉ dom(σ)
----------
σ[γησ/γ] consistent with η

Proof

Case γ′σ[γησ/γ] = γ″. Because η substitutes lists we have γ′σ = γ″. With the consistency of σ it follows γ′ησ ≡ γ″ησ and hence γ′ησ[γησ/γ] ≡ γ″ησ[γησ/γ].

Case γ′σ[γησ/γ] not a variable.

  Case γ′σ not a variable. With the consistency of σ it follows γ′σ ≡ γ′ησ and hence γ′σ[γησ/γ] ≡ γ′ησ[γησ/γ].

  Case γ′σ = γ.

  γ′σ[γησ/γ]
  = γ[γησ/γ]
  = γησ
  = { γ ∉ free(γησ) according to Lemma 4.40 }
  γησ[γησ/γ]
  ≡ { σ consistent with η and γ′σ = γ }
  γ′ησ[γησ/γ]

□

Lemma 4.43 (Preservation of instantiability)

σ idempotent    σ consistent with η
----------
σ[γησ/γ]η∞ = η∞

Proof Because σ is idempotent and σ ≤α η∞ we have ση∞ = η∞ and furthermore γη∞ = γηη∞ = γη(ση∞) = (γησ)η∞. Hence the conclusion follows with Lemma 4.20. □

With inst(η∞) = η∞ instCons(η∞) it immediately follows

σ idempotent    σ consistent with η
----------
Mσ[γησ/γ] inst(η∞) = Mσ inst(η∞)

Lemma 4.44 (Preservation of typing)

(Γ + tyEnv(η))σ ⊢ Mσ : τσ    γ ∉ free(Γ)
----------
(Γ + tyEnv(η))σ[γησ/γ] ⊢ Mσ[γησ/γ] : τσ[γησ/γ]

Proof Follows from the substitution property of Lemma 2.6. □

4.4.2 Instantiation of Constructor Variables

The list replacement algorithm replaces all list constructors by constructor variables. Now some constructor variables have to be reinstantiated to the original list constructors. Let (σ, τ) be the typing computed by the type inference phase and η be the instance substitution computed by the list replacement algorithm. The types of the constructor variables are given by the typing environment tyEnv(η)σ. If tyEnv(η)σ(v(:)γ) = τ → [τ] → [τ], then v(:)γ is reinstantiated to (:) τ, and if tyEnv(η)σ(v[]γ) = [τ], then v[]γ is reinstantiated to [] τ. Using

tyEnv(η)σ = {v(:)γ : τγσ → γσ → γσ, v[]γ : γσ | η(γ) = [τγ], γ ∈ dom(η)}

and the fact that the typing (σ, τ) is consistent, we can reformulate instantiation as follows: if σ(γ) = [τ], then v(:)γ is reinstantiated to (:) τ and v[]γ is reinstantiated to [] τ. So reinstantiation of constructor variables is performed by the substitution instCons(σ).

For the example term

case (v[]γ1, v[]γ2) of (x,y) ↦ x

from the preceding section we have, after instantiation of superfluous type inference variables, the typing ([[Int]/γ2], γ1). So instantiation of constructor variables gives:

(case (v[]γ1, v[]γ2) of (x,y) ↦ x) instCons([[Int]/γ2])
= (case (v[]γ1, v[]γ2) of (x,y) ↦ x) [(:) Int/v(:)γ2, [] Int/v[]γ2]
= case (v[]γ1, [] Int) of (x,y) ↦ x

The following two lemmas state properties that are essential for the correctness of the complete list abstraction algorithm.

Lemma 4.45 (Preservation of instantiability)

σ idempotent    σ consistent with η
----------
instCons(σ) inst(η∞) = inst(η∞)

Proof First, both sides have the same domain

dom(instCons(σ) inst(η∞)) = dom(instCons(σ)) ∪ dom(inst(η∞)) = dom(inst(η∞))

because σ ≤α η∞. Second, both sides agree on type variables γ ∈ dom(inst(η∞)):

γ instCons(σ) inst(η∞) = γη∞ = γ inst(η∞)

Finally, we show that both sides agree on term variables vcγ ∈ dom(inst(η∞)).

Case σ(γ) = [τ]. Because σ is consistent with η, we have [τ]η∞ = γση∞ = γη∞ =: [τ′]. So:

vcγ instCons(σ) inst(η∞) = c τ inst(η∞) = c (τη∞) = c τ′ = vcγ inst(η∞)

Case σ(γ) ≠ [τ]. Hence vcγ ∉ dom(instCons(σ)). Then vcγ instCons(σ) inst(η∞) = vcγ inst(η∞). □

Lemma 4.46 (Preservation of typing) Instantiation of superfluous constructor variables preserves typing.

σ consistent with η    (Γ + tyEnv(η))σ ⊢ M : τ
----------
(Γ + tyEnv(η))σ ⊢ M instCons(σ) : τ

Proof We have to verify that for every replaced variable vcγ its replacement c τ has the same type. Then the substitution preserves typing according to Lemma 2.6. A variable vcγ is replaced by c τ if σ(γ) = [τ] for some τ.

Case c = []. We have tyEnv(η)σ(v[]γ) = γσ and (Γ + tyEnv(η))σ ⊢ [] τ : [τ]. According to assumption γσ = [τ].

Case c = (:). We have tyEnv(η)σ(v(:)γ) = (τ′ → γ → γ)σ = (τ′σ) → [τ] → [τ] where [τ′] := η(γ). On the other hand (Γ + tyEnv(η))σ ⊢ (:) τ : τ → [τ] → [τ]. Because σ is consistent with η, we have [τ] = σ(γ) = η(γ)σ = [τ′]σ. Thus τ = τ′σ and both types agree.


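A minimal Haskell sketch of this reinstantiation over a toy term language may be helpful; the representation (Type, Term, Subst, ...) is ours, not the one of the prototype implementation, and only the cases relevant to instCons are spelled out.

import qualified Data.Map as Map

type Var = String

data Type = TVar Var | List Type | TInt
  deriving (Eq, Show)

data Term = ConsVar Var   -- constructor variable v(:)_gamma
          | NilVar Var    -- constructor variable v[]_gamma
          | Cons Type     -- real list constructor (:) tau
          | Nil Type      -- real list constructor [] tau
          | App Term Term
  deriving Show

type Subst = Map.Map Var Type

-- instCons(sigma): reinstantiate every constructor variable whose
-- type inference variable is mapped to a list type by sigma.
instCons :: Subst -> Term -> Term
instCons s = go
  where
    go (ConsVar g) | Just (List t) <- Map.lookup g s = Cons t
    go (NilVar  g) | Just (List t) <- Map.lookup g s = Nil t
    go (App m n)  = App (go m) (go n)
    go m          = m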

4.4.3 Merge of Constructor Variables

The list replacement algorithm replaces every occurrence of a list constructor by a different constructor variable. Now we have to identify all constructor variables which construct values of the same type and merge them, that is, replace them by the same constructor variable. Consider for example the term

v(:)γ2 v[]γ5 (v(:)γ4 v[]γ6 v[]γ7)

and the typing

([γ1/γ3, γ2/γ4, γ1/γ5, γ1/γ6, γ2/γ7, γ1/γ8], γ2)

According to the typing, v(:)γ2, v(:)γ4 and v[]γ7 construct values of the same type γ2. Hence we replace v(:)γ4 by v(:)γ2 and for simplicity also v[]γ7 by v[]γ2. Similarly we see from the typing that v[]γ5 and v[]γ6 construct values of type γ1. Hence we replace both constructor variables by v[]γ1. In general, when we have a typing (σ, τ) we merge the constructor variables of a term by applying the substitution merge(σ):

Definition 4.47

merge(σ) := [v(:)γ′/v(:)γ, v[]γ′/v[]γ | σ(γ) = γ′]

As discussed we obtain for the example:

v(:)γ2 v[]γ5 (v(:)γ4 v[]γ6 v[]γ7) merge([γ1/γ3, γ2/γ4, γ1/γ5, γ1/γ6, γ2/γ7, γ1/γ8])
= v(:)γ2 v[]γ5 (v(:)γ4 v[]γ6 v[]γ7) [v(:)γ1/v(:)γ3, v[]γ1/v[]γ3, …, v(:)γ1/v(:)γ8, v[]γ1/v[]γ8]
= v(:)γ2 v[]γ1 (v(:)γ2 v[]γ1 v[]γ2)
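In the same toy representation as the instCons sketch of the previous subsection, merge(σ) is an equally small traversal (again only an illustration; Term and Subst are reused from that sketch):

-- merge(sigma): replace v^c_gamma by v^c_gamma' whenever
-- sigma(gamma) = gamma', i.e. whenever the typing unifies the two
-- type inference variables.
merge :: Subst -> Term -> Term
merge s = go
  where
    go (ConsVar g) | Just (TVar g') <- Map.lookup g s = ConsVar g'
    go (NilVar  g) | Just (TVar g') <- Map.lookup g s = NilVar g'
    go (App m n)  = App (go m) (go n)
    go m          = m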

Finally, we have to prove again two properties for the correctness of the complete list abstraction algorithm.

Lemma 4.48 (Preservation of instantiability)

σ idempotent σ consistent with η

merge(σ) inst(η∞) = inst(η∞)

Proof First, both sides have the same domain

dom(merge(σ) inst(η∞)) = dom(merge(σ)) ∪ dom(inst(η∞)) = dom(inst(η∞))

because σ ≤α η∞. Second, because dom(merge(σ)) contains no type variables, both sides agree on type variables. Finally, both sides agree on term variables vcγ of their common domain:

Case σ(γ) = γ′. With

η∞(γ)
= γη∞
= { σ idempotent and σ ≤α η∞ }
γση∞
= { assumption: σ(γ) = γ′ }
γ′η∞
= η∞(γ′)

it follows

vcγ merge(σ) inst(η∞) = vcγ′ inst(η∞) = c (η∞(γ′)) = c (η∞(γ)) = vcγ inst(η∞)

Case σ(γ) ≠ γ′. Hence vcγ ∉ dom(merge(σ)). Then vcγ merge(σ) inst(η∞) = vcγ inst(η∞). □

Lemma 4.49 (Preservation of typing)

σ idempotent    σ consistent with η    (Γ + tyEnv(η))σ ⊢ M : τ
----------
(Γ + tyEnv(η))σ ⊢ M merge(σ) : τ

Proof We have to verify that for every replaced variable vcγ its replacement vcγ′ has the same type. Then the substitution preserves typing according to Lemma 2.6. A variable vcγ is replaced by vcγ′ if σ(γ) = γ′ for some γ′. Because σ is idempotent, we have γ′σ = γσσ = γσ.

Case c = []. We have tyEnv(η)σ(v[]γ) = γσ and (Γ + tyEnv(η))σ ⊢ v[]γ′ : γ′σ.

Case c = (:). We have tyEnv(η)σ(v(:)γ) = (τ → γ → γ)σ = (τσ) → (γσ) → (γσ) where [τ] := η(γ). On the other hand (Γ + tyEnv(η))σ ⊢ v(:)γ′ : (τ′ → γ′ → γ′)σ where [τ′] := η(γ′). Because σ is consistent with η, we have [τ]σ = η(γ)σ = η(γ′)σ = [τ′]σ. Thus τσ = τ′σ and both types agree. □

A(Γ, M) := let (M̃, η) := R(∅, M)
               (σ̃, τ) := CTη(Γ, M̃)
               σ := σ̃[γ̂1ησ̃/γ̂1] …
               {γ1, …, γk} := dom(η) \ dom(σ)
               σ′ := σ instCons(σ) merge(σ)
               F := freeTerm(M̃σ′)
               {v(:)γ1′, …, v(:)γm′} := {v(:)γ | γ ∈ dom(η)} ∩ F
               {v[]γ1″, …, v[]γn″} := {v[]γ | γ ∈ dom(η)} ∩ F
               [τi]  := γi′ησ     for all 1 ≤ i ≤ m
               [τi′] := η∞(γi′)   for all 1 ≤ i ≤ m
               [τi″] := η∞(γi″)   for all 1 ≤ i ≤ n
           in (λγ1. … λγk.
                 λv(:)γ1′ : τ1 → γ1′ → γ1′. … λv(:)γm′ : τm → γm′ → γm′.
                 λv[]γ1″ : γ1″. … λv[]γn″ : γn″. M̃σ′)
              η∞(γ1) … η∞(γk) ((:) τ1′) … ((:) τm′) ([] τ1″) … ([] τn″)

Figure 4.8: List abstraction algorithm A

4.4.4 Abstraction

Finally, the list abstracted form has to be constructed. We do not formulate this as a separate algorithm, but define the complete list abstraction algorithm A in Figure 4.8. First, the list replacement algorithm replaces all list constructors by constructor variables and list types by type inference variables. Subsequently, the type inference algorithm determines a principal consistent typing (σ̃, τ). Note that the type component τ is not used in the algorithm. It is only useful for proving the correctness of A. Next, superfluous type inference variables γ̂1, … are instantiated as discussed in Section 4.4.1, giving us a substitution σ. We do not specify which type inference variables are superfluous, because that depends on whether we perform list abstraction for the short cut fusion rule or for a worker/wrapper split. The list abstraction algorithm has to abstract those type inference variables γ1, …, γk that are not replaced by σ. We extend σ to instantiate some constructor variables by instCons(σ) and merge the others by merge(σ). We do not abstract all constructor variables for the types γ1, …, γk but only those that occur in the term M̃σ′. Note that {γ1′, …, γm′} ⊆ {γ1, …, γk} and {γ1″, …, γn″} ⊆ {γ1, …, γk}. Finally, the instance substitution η is used to provide all type arguments. Stretching our overline notation for abbreviating sequences we can say that the result is of the form

(λγ. λv(:) : τ → γ → γ. λv[] : γ. M) [τγ] ((:) τγ) ([] τγ)

For example, for the well-typed term

⊢ (:) [Int] ([] Int) ((:) [Int] ([] Int) ([] [Int])) : [[Int]]

we have seen in the previous sections how the instance substitution

[[Int]/γ1, [γ1]/γ2, [Int]/γ3, [γ3]/γ4, [Int]/γ5, [Int]/γ6, [γ8]/γ7, [Int]/γ8]

the principal consistent typing

([γ1/γ3, γ2/γ4, γ1/γ5, γ1/γ6, γ2/γ7, γ1/γ8], γ2)

and the term

v(:)γ2 v[]γ1 (v(:)γ2 v[]γ1 v[]γ2)

are constructed. So

A(∅, (:) [Int] ([] Int) ((:) [Int] ([] Int) ([] [Int])))
= (λγ1. λγ2. λv(:)γ2 : γ1 → γ2 → γ2. λv[]γ1 : γ1. λv[]γ2 : γ2. v(:)γ2 v[]γ1 (v(:)γ2 v[]γ1 v[]γ2))
  [Int] [[Int]] ((:) [Int]) ([] Int) ([] [Int])

4.5 Properties and Implementation

The correctness of the list abstraction algorithm A can easily be proved from the correctness of the various component algorithms:

Lemma 4.50 (Termination)

Γ ⊢ M : τ
----------
A(Γ, M) terminates

Proof Follows from Lemma 4.7 and Lemma 4.37. □

Lemma 4.51 (Well-typedness)

Γ ⊢ M : τ
----------
Γ ⊢ A(Γ, M) : τ

Proof Follows from Lemmas 4.12, 4.37, 4.44, 4.46 and 4.49 and the type abs and term abs rule. □

Lemma 4.52 (Correctness)

Γ ⊢ M : τ
----------
Γ ⊢ M ≅ctx A(Γ, M) : τ

Proof From Lemmas 4.11, 4.38, 4.43, 4.45 and 4.48 it follows that A(Γ, M) can be β-reduced in several steps to M. Together with Lemma 4.51, Lemma 2.16 implies that both terms are contextually equivalent. □

The list abstraction algorithm is complete in the sense that it can abstract a list from a term if this is possible at all. The list replacement algorithm replaces every occurrence of a list type and a list constructor by a different variable. The type inference algorithm determines the principal consistent typing of this term (Lemma 4.37). Thus it determines which variables have to be list types and list constructors and hence cannot be abstracted, and which variables can be free and hence be abstracted. Naturally it depends on the choice of type inference variables that are instantiated in the third phase which of the lists that can be abstracted are finally abstracted. It depends on the purpose of list abstraction, that is, whether we use it for preparing a producer for fusion or for the worker/wrapper split, which lists it is desirable to abstract.

The list abstraction algorithm is also efficient. It only consists of several traversals of the (modified) input term. The unification algorithm can work in linear time if types are represented by directed acyclic graphs [PW78]. Then the type inference algorithm and hence the whole list abstraction algorithm also work in linear time. Note that it does not affect our type inference algorithm that Hindley-Milner type inference requires exponential time in the worst case (Section 11.3.5 of [Mit96]). The Hindley-Milner type inference algorithm is only exponential in the nesting depth of let definitions, because of its type generalisation step at let definitions, but otherwise it is linear. Our algorithm does not have this type generalisation step.

We implemented a prototype of the list abstraction algorithm in Haskell. The translation of the formal description of the algorithm into Haskell was straightforward and the prototype was successfully applied to several examples. To enable a later integration, the implementation uses mostly the same data structures as are used in the Glasgow Haskell compiler. Instead of passing explicit substitutions we implemented type inference variables as updateable references (IORefs, cf. [LP94]), just as the type inference pass of the Glasgow Haskell compiler does. This choice leads to the mentioned efficient implementation of unification. Note that the list abstraction algorithm does not need the inferred substitution itself but only the result of applying it to the term and instance substitution, which are computed by list replacement.
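To indicate what this reference-based representation looks like, here is a minimal Haskell sketch of type inference variables as IORefs. It merely illustrates the technique; it does not reproduce the actual data structures of the prototype or of the Glasgow Haskell compiler, and all names are ours.

import Data.IORef

-- A type inference variable is either unbound or points to a type.
data TcType = TcVar (IORef (Maybe TcType))
            | TcList TcType
            | TcInt

-- Binding a variable during unification is a constant-time write;
-- no explicit substitution is ever built or composed.
bindVar :: IORef (Maybe TcType) -> TcType -> IO ()
bindVar ref t = writeIORef ref (Just t)

-- Follow bound variables to obtain the represented type.
zonk :: TcType -> IO TcType
zonk (TcVar ref) = do
  mt <- readIORef ref
  case mt of
    Nothing -> return (TcVar ref)
    Just t  -> zonk t
zonk (TcList t) = TcList <$> zonk t
zonk TcInt      = return TcInt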

4.6 Alternative List Abstraction Algorithms

In [Chi99] we presented a different type inference algorithm which is based on the algorithm M [LY98] for the Hindley-Milner system instead of the algorithm W. M takes not only a typing environment and a term as argument but also a type, and returns only a substitution. The idea is that it takes a typing that contains type variables and returns the most general substitution that yields a valid typing. When abstracting a list producer, a type inference algorithm based on M can notice early if abstraction from the list type is impossible. It can then abort the list abstraction algorithm and thus save time. However, when abstracting from several lists, as is necessary for the worker/wrapper split, this trick is no longer applicable. We prefer the algorithm T presented here also because it has the nice property of not requiring any new type inference variables, in contrast to the algorithm presented in [Chi99].

The main problem we had to solve in this chapter was to assure that at the beginning of the third phase the list abstraction algorithm has a consistent typing of constructor variables. We chose to post-process the possibly inconsistent result of type inference with the algorithm C. However, we can also assure consistency differently.

On the one hand we could modify the type inference algorithm, strictly speaking the unification algorithm, such that it always yields the most general consistent typing. Whenever the unification algorithm would bind a variable, it would subsequently perform a unification with the list substitution η to ensure consistency. However, this algorithm would be less modular and more difficult to prove correct. It is even doubtful that an improvement in performance would be gained.

On the other hand we could give the type inference algorithm a different input, in which the types of the constructor variables express the required relation between a list type and its element type. A list type [τ] is related to its element type τ simply by the fact that the list type [τ] = [] τ consists of the list type constructor [] being applied to the element type τ. So to preserve this relationship in the input to the type inference algorithm we can use type constructor variables that only replace list type constructors [], not whole list types. To outline the idea we reconsider the example from the beginning of the chapter:

⊢ (:) [Int] ([] Int) ((:) [Int] ([] Int) ([] [Int])) : [[Int]]

List replacement gives

{v(:)1 : ∀α. α → π1 α → π1 α,  v(:)2 : ∀α. α → π2 α → π2 α,
 v[]1 : ∀α. π3 α,  v[]2 : ∀α. π4 α,  v[]3 : ∀α. π5 α}
⊢ v(:)1 (π6 Int) (v[]1 Int) (v(:)2 (π7 Int) (v[]2 Int) (v[]3 (π8 Int)))

Here the πi are type constructor variables. Note that we replace list constructors (:) and [] by constructor variables, not applications of list constructors, that is, (:) τ and [] τ. Hence the constructor variables have polymorphic types, just as the list constructors. Subsequently, a type inference algorithm that instantiates type constructor variables instead of type variables yields the valid typing

{v(:)1 : ∀α. α → π1 α → π1 α,  v(:)2 : ∀α. α → π1 α → π1 α,
 v[]1 : ∀α. π2 α,  v[]2 : ∀α. π2 α,  v[]3 : ∀α. π1 α}
⊢ v(:)1 (π2 Int) (v[]1 Int) (v(:)2 (π2 Int) (v[]2 Int) (v[]3 (π1 Int))) : π1 (π2 Int)

So the typing correctly states that two lists can be abstracted. In the third phase constructor variables of the same type are merged. (If there were constructor variables of type ∀α. α → [α] → [α], respectively ∀α. [α], they would be reinstantiated to the list constructors (:), respectively [].)

{v(:)1 : ∀α. α → π1 α → π1 α,  v[]1 : ∀α. π2 α,  v[]3 : ∀α. π1 α}
⊢ v(:)1 (π2 Int) (v[]1 Int) (v(:)1 (π2 Int) (v[]1 Int) (v[]3 (π1 Int))) : π1 (π2 Int)

Afterwards, we still need a fourth phase which transforms type constructor variables into type variables. This phase also replaces applications of constructor variables by constructor variables.

{v(:)1 : γ2 → γ1 → γ1,  v[]1 : γ2,  v[]3 : γ1}
⊢ v(:)1 v[]1 (v(:)1 v[]1 v[]3) : γ1

Finally, we transform this typing into list abstracted form:

(λγ1. λγ2. λv(:)1 : γ2 → γ1 → γ1. λv[]1 : γ2. λv[]3 : γ1. v(:)1 v[]1 (v(:)1 v[]1 v[]3))
  [[Int]] [Int] ((:) [Int]) ([] Int) ([] [Int])

Note that for this final step we need to know which list variable γ abstracts which list type [τ]. So in the fourth phase some list substitution like η∞ must be constructed.

This list abstraction algorithm based on type constructor variables looks more elegant than the one we presented in the previous sections of this chapter. It only requires a simple type inference algorithm without any post-processing to ensure consistency. This algorithm might be more suitable for some extensions discussed in Chapter 6. However, the algorithm is also more complex. In particular, it requires the notion of a type constructor variable and a fourth phase which transforms type constructor variables into type variables.

Chapter 5

The Deforestation Algorithm

After having illustrated our deforestation method with examples in Chapter 3 and having studied type-inference based list abstraction in detail in Chapter 4, we finally present our deforestation algorithm here. We discuss both its correctness and its effect on the performance of programs. To quantify the improvement that can be gained by our deforestation method, we finally measure its effect on the well-known n-queens program.

5.1 The Worker/Wrapper Split Algorithm

For the worker/wrapper scheme each definition of a list-producing function has to be split into a worker and a wrapper definition. This is straightforward for non-recursive definitions but unfortunately not so for recursive ones.

5.1.1 Non-Recursive Definitions

For a non-recursive definition nearly the whole worker/wrapper split is performed by the list abstraction algorithm. Suppose we have the non-recursive definition of a list-producing function f

f : ∀α.τ
  = λα.M

where M is a term abstraction. Remember that we insert the new λ-abstractions between the type abstractions and the term abstractions (Section 3.3). We apply the list abstraction algorithm to the term M and obtain

f = λα.(λγ. λv(:)γ : τγ → γ → γ. λv[]γ : γ. M′) [τγ] ((:) τγ) ([] τγ)

Then we use type application expansion

f = λα.(λα. λγ. λv(:)γ : τγ → γ → γ. λv[]γ : γ. M′) α [τγ] ((:) τγ) ([] τγ)

and inverse inlining and let elimination (cf. Section 2.7) to obtain the worker and wrapper definition:

fW : ∀α. ∀γ. (τγ → γ → γ) → γ → τ′
   = λα. λγ. λv(:)γ : τγ → γ → γ. λv[]γ : γ. M′

f : ∀α.τ
  = λα. fW α [τγ] ((:) τγ) ([] τγ)

Here τ′ is the type of M′ in its typing environment. Because of the type annotations in F this type can easily be obtained from M′ with the type rules. Alternatively, we could extend the list abstraction algorithm to return τ′. If the list abstraction algorithm cannot abstract any list, then no worker/wrapper split takes place. From the correctness of the list abstraction algorithm, Lemma 4.52, and from the correctness of the small transformations that we performed, Lemma 2.16, the semantic correctness of the worker/wrapper split follows. For example, consider the well-known non-recursive definition of map:

map : ∀α. ∀β. (α → β) → [α] → [β]
    = λα. λβ. λf:α → β.
        foldr α [β] (λu:α. λw:[β]. (:) β (f u) w) ([] β)

With the list abstraction algorithm we obtain the following definition:

map : ∀α. ∀β. (α → β) → [α] → [β]
    = λα. λβ. (λγ. λv(:):β → γ → γ. λv[]:γ. λf:α → β.
        foldr α γ (λu:α. λw:γ. v(:) (f u) w) v[]) [β] ((:) β) ([] β)

Then we use type application expansion

map : ∀α. ∀β. (α → β) → [α] → [β]
    = λα. λβ. (λα. λβ. λγ. λv(:):β → γ → γ. λv[]:γ. λf:α → β.
        foldr α γ (λu:α. λw:γ. v(:) (f u) w) v[]) α β [β] ((:) β) ([] β)

and inverse inlining and let elimination to obtain the worker and wrapper definitions which we already know:

mapW : ∀α. ∀β. ∀γ. (β → γ → γ) → γ → (α → β) → [α] → γ
     = λα. λβ. λγ. λv(:):β → γ → γ. λv[]:γ. λf:α → β.
         foldr α γ (λu:α. λw:γ. v(:) (f u) w) v[]

map : ∀α. ∀β. (α → β) → [α] → [β]
    = λα. λβ. mapW α β [β] ((:) β) ([] β)
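Read as ordinary Haskell, with type abstractions left implicit, the split amounts to the following sketch; the primed names are ours, chosen to avoid clashing with the Prelude.

mapW' :: (b -> g -> g) -> g -> (a -> b) -> [a] -> g
mapW' cons nil f = foldr (\u w -> cons (f u) w) nil

map' :: (a -> b) -> [a] -> [b]
map' = mapW' (:) []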

Similarly, the worker/wrapper splits for the definitions of (++), concat and unzip of Section 3.3 are performed.

5.1.2 Recursive Definitions

Consider the recursive definition of the function enumFrom, which produces an infinite list of increasing integers.

enumFrom : Int → [Int]
         = λl:Int. (:) Int l (enumFrom (l+1))

We can rewrite it as

enumFrom : Int → [Int]
         = let go : Int → [Int]
                  = λl:Int. (:) Int l (go (l+1))
           in go

The new definition of enumFrom is not recursive and we can hence apply the worker/wrapper split algorithm that we described in the previous section. With the list abstraction algorithm we obtain the definition

enumFrom : Int → [Int]
         = (λγ. λv(:):Int → γ → γ.
              let go : Int → γ
                     = λl:Int. v(:) l (go (l+1))
              in go
           ) [Int] ((:) Int)

and subsequently we apply inverse inlining and let elimination to obtain the worker and wrapper definition

enumFromW : ∀γ.(Int → γ → γ) → Int → γ
          = λγ. λv(:):Int → γ → γ.
              let go : Int → γ
                     = λl:Int. v(:) l (go (l+1))
              in go

enumFrom : Int → [Int]
         = enumFromW [Int] ((:) Int)
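In ordinary Haskell the derived worker and wrapper read as follows (primed names ours); note how go captures the static argument cons, so each recursive call passes a single argument.

enumFromW' :: (Int -> g -> g) -> Int -> g
enumFromW' cons = go
  where
    go l = cons l (go (l + 1))   -- cons is static: not passed again

enumFrom' :: Int -> [Int]
enumFrom' = enumFromW' (:)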

The definition of the worker enumFromW is unusual. From the discussion in Chapter 3 we would expect the definition

enumFromW : ∀γ.(Int → γ → γ) → Int → γ
          = λγ. λv(:):Int → γ → γ. λl:Int. v(:) l (enumFromW γ v(:) (l+1))

However, the definition that we derived is preferable. Passing the additional argument v(:) in all recursive calls is costly. Actually, this is less so because of the additional term application reduction in every recursive call but mostly because the size of a heap cell for an unevaluated term is proportional to the number of its free variables. So a heap cell for enumFromW v(:) (l+1) requires more space than a heap cell for go (l+1). In our cost model (Section 2.9) we did not consider the size of heap cells. In Section 7.1 of [San95] this effect on costs is discussed together with the presentation of the static argument transformation. An argument of a recursive function definition is static if it is passed unchanged in all recursive calls. The static argument transformation transforms the latter definition of enumFromW into the former.

In general, let

f : ∀α.τ
  = λα.M

be a recursive definition such that f occurs in M only in the form f α. Then we apply the static argument transformation for the static type arguments α, that is, we rewrite the definition as

f : ∀α.τ
  = λα. let go : τ = M′ in go

where M′ is obtained from M by replacing all occurrences of f α by the term variable go. The new definition of f is not recursive and we can hence apply the worker/wrapper split algorithm of the previous section. We split the definition of enumFrom by this algorithm. As a second, more complex example we split the recursive definition of the function tails from Section 3.3.3.

tails : ∀α.[α] → [[α]]
      = λα. λxs:[α]. case xs of
                       []   ↦ (:) [α] ([] α) ([] [α])
                       y:ys ↦ (:) [α] xs (tails α ys)

First, we apply the static argument transformation to obtain

tails : ∀α.[α] → [[α]]
      = λα. let go : [α] → [[α]]
                   = λxs:[α]. case xs of
                                []   ↦ (:) [α] ([] α) ([] [α])
                                y:ys ↦ (:) [α] xs (go ys)
            in go

and, second, we split this non-recursive definition into a worker and a wrapper definition:

tailsW : ∀α. ∀γ.([α] → γ → γ) → γ → [α] → γ
       = λα. λγ. λv(:):[α] → γ → γ. λv[]:γ.
           let go : [α] → γ
                  = λxs:[α]. case xs of
                               []   ↦ v(:) ([] α) v[]
                               y:ys ↦ v(:) xs (go ys)
           in go

tails : ∀α.[α] → [[α]]
      = λα. tailsW α [[α]] ((:) [α]) ([] [α])

In [Chi00] we proposed a slightly different method for splitting recursive definitions such as the ones considered here. This method, which was inspired by how type inference algorithms for the Hindley-Milner type system handle the let construct, however, requires an extension of the list abstraction algorithm.

5.1.3 Polymorphic Recursion

The algorithm which we just described is only applicable to definitions which are not polymorphically recursive. A definition is polymorphically recursive if its type arguments are not static, that is, a recursive call uses type arguments which differ from the abstracted type variables. Polymorphically recursive definitions are rare in Haskell. Unfortunately, the algorithm which we just described also cannot achieve the worker/wrapper split of the definition of reverse which we described in Section 3.4. The definition of reverse is not polymorphically recursive, but the definition of its worker reverseW is (see the definition before deforestation inside the worker in Section 3.4). Obviously, functions that consume their own result need such polymorphically recursive worker definitions.

Typeability in the Hindley-Milner type system with polymorphic recursion is only semi-decidable [Hen93, KTU93], that is, there are algorithms which do infer the most general type of a term within the Hindley-Milner type system with polymorphic recursion if it is typeable. However, if the term is not typeable, then these algorithms may diverge. Fortunately, the input of the worker/wrapper split algorithm is typeable; we only try to find a more general type than we have. To derive a possibly polymorphically recursive worker definition, we build on Mycroft's extension of the Hindley-Milner type inference algorithm [Myc84]. We start with the most general worker type possible, which is obtained from the original type by replacing every list type by a new type variable and abstracting the list type and its list constructors.

reverseW0 : ∀α. ∀γ1. ∀γ2. (α → γ1 → γ1) → γ1 → (α → γ2 → γ2) → γ2 → (γ1 → γ2)

We define a well-typed wrapper for this hypothetical worker:

reverse0 = λα. reverseW0 α [α] [α] ((:) α) ([] α) ((:) α) ([] α)

Then we replace in the original definition body of reverse (without λα.) all occurrences of reverse by this wrapper:

λxs:[α]. case xs of
           []   ↦ [] α
           y:ys ↦ appW α [α] ((:) α)
                    (reverseW0 α [α] [α] ((:) α) ([] α) ((:) α) ([] α) ys)
                    ((:) α y ([] α))

So we have a term with a typing environment for all its free variables; to reverseW0 it assigns the most general worker type, which we just constructed. For this term list abstraction yields:

(λγ. λv(:) : α → γ → γ. λv[] : γ.
   λxs:[α]. case xs of
              []   ↦ v[]
              y:ys ↦ appW α γ v(:)
                       (reverseW0 α [α] [α] ((:) α) ([] α) ((:) α) ([] α) ys)
                       (v(:) y v[]))
  [α] ((:) α) ([] α)

Now we can construct the following first approximation of a worker definition:

reverseW1 : ∀α. ∀γ.(α → γ → γ) → γ → [α] → γ
          = λα. λγ. λv(:):α → γ → γ. λv[]:γ. λxs:[α].
              case xs of
                []   ↦ v[]
                y:ys ↦ appW α γ v(:)
                         (reverseW0 α [α] [α] ((:) α) ([] α) ((:) α) ([] α) ys)
                         (v(:) y v[])

Subsequently we repeat these steps, this time under the assumption that reverseW1 has the type ∀α. ∀γ.(α → γ → γ) → γ → [α] → γ. So the well-typed wrapper for the worker reverseW1 is

reverse1 = λα. reverseW1 α [α] ((:) α) ([] α)

the input to the list abstraction algorithm is

λxs:[α]. case xs of
           []   ↦ [] α
           y:ys ↦ appW α [α] ((:) α)
                    (reverseW1 α [α] ((:) α) ([] α) ys)
                    ((:) α y ([] α))

and its result

(λγ. λv(:) : α → γ → γ. λv[] : γ.
   λxs:[α]. case xs of
              []   ↦ v[]
              y:ys ↦ appW α γ v(:)
                       (reverseW1 α [α] ((:) α) ([] α) ys)
                       (v(:) y v[]))
  [α] ((:) α) ([] α)

From this we can construct the second approximation of a worker definition:

reverseW2 : ∀α. ∀γ.(α → γ → γ) → γ → [α] → γ
          = λα. λγ. λv(:):α → γ → γ. λv[]:γ. λxs:[α].
              case xs of
                []   ↦ v[]
                y:ys ↦ appW α γ v(:)
                         (reverseW1 α [α] ((:) α) ([] α) ys)
                         (v(:) y v[])

The whole process iterates until the derived worker type is stable, that is, input and output type are identical. For our example the second iteration already yields the same worker type as the first. So the definitions of reverseW2 and reverse1 are the definitions of the worker and wrapper of the definition of reverse. In general, the worker/wrapper split stops after at most n + 1 iterations, where n is the number of list types in the type of the original function. This follows analogously to the syntactic completeness in [Myc84]. Whereas intuitively the semantic correctness of the result of this extended worker/wrapper split algorithm is clear, a formal proof is hard. We believe that correctness can be proved with the techniques presented in [San96b, San97].

As we discussed in Section 3.4.2, it is debatable whether deforestation of recursively defined functions that consume their own results is desirable. Depending on our decision we may either use the worker/wrapper split algorithm described in this section or the one described in the previous section. If we use the algorithm described in this section, then it is useful to subsequently apply the static argument transformation. As we saw in the preceding section, the static argument transformation improves the definitions of workers which are not polymorphically recursive, such as enumFromW and tailsW.
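For completeness, the polymorphically recursive worker also type checks in ordinary Haskell, provided the type signature is given explicitly. The following sketch (primed names ours, with appW as the worker of (++)) is only meant to illustrate the shape of the result, not the derivation itself.

-- worker of (++): abstracts the cons constructor of the result list
appW :: (a -> g -> g) -> [a] -> g -> g
appW cons xs n = foldr cons n xs

-- polymorphically recursive: the recursive call instantiates the
-- abstracted list with the real list constructors
reverseW' :: (a -> g -> g) -> g -> [a] -> g
reverseW' cons nil xs = case xs of
  []   -> nil
  y:ys -> appW cons (reverseW' (:) [] ys) (cons y nil)

reverse' :: [a] -> [a]
reverse' = reverseW' (:) []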

5.1.4 Traversal Order

The worker/wrapper split algorithm splits each let-defined block of mutually recursive definitions separately. In the example of concat in Section 3.3.2 the split was only possible after the wrapper of (++) had been inlined. Hence the split algorithm must traverse the program in top-down order and inline wrappers in the remaining program directly after they were derived. Additionally, definitions can be nested, that is, the definition body of a let construct can contain another let construct. Here the inner definitions have to be split first. Their wrappers can then be inlined in the body of the outer definition and thus enable the abstraction of more lists from the outer definition.

5.2 The Complete Deforestation Algorithm

First, our worker/wrapper split algorithm splits all definitions of list-producing functions of a program into worker and wrapper definitions. The wrapper definitions are non-recursive and small enough to be inlined unconditionally everywhere, also across module boundaries. They transfer the information needed for list abstraction in the split algorithm and the actual deforestation algorithm.

The actual deforestation algorithm searches for occurrences of full applications of foldr, that is foldr τ1 τ2 M(:) M[] MP, abstracts the result list from the producer MP and then, if successful, directly applies the short cut fusion rule. For example, suppose it finds the following potentially fusible term:

... foldr Bool Bool (&&) True
      (fst [Bool] [Char]
        (unzipW Bool Char [Bool] [Char] ((:) Bool) ([] Bool) ((:) Char) ([] Char) zs)) ...

List abstraction of the producer is successful and gives:

... foldr Bool Bool (&&) True
      ((λγ. λv(:):Bool → γ → γ. λv[]:γ.
          (fst γ [Char]
            (unzipW Bool Char γ [Char] v(:) v[] ((:) Char) ([] Char) zs))
       ) [Bool] ((:) Bool) ([] Bool)) ...

Then the short cut fusion rule is applied:

... (λγ. λv(:):Bool → γ → γ. λv[]:γ.
       (fst γ [Char]
         (unzipW Bool Char γ [Char] v(:) v[] ((:) Char) ([] Char) zs))
    ) Bool (&&) True ...

Further optimisations may be obtained by a subsequent standard simplification pass which performs β-reduction:

... fst Bool [Char]
      (unzipW Bool Char Bool [Char] (&&) True ((:) Char) ([] Char) zs) ...

In principle, the list abstraction algorithm could directly replace the right list constructors by M(:) and M[], thereby completely avoiding the explicit construction of the λ-abstraction and subsequent β-reduction. However, the modular separation of the two tasks is desirable, because the worker/wrapper split algorithm uses the same list abstraction algorithm. An optimising compiler will be able to perform β-reduction anyway.

Similarly, we separate the worker/wrapper split algorithm from the actual deforestation algorithm. Deforestation brings together subterms of producer and consumer which previously were separated by the intermediate list. This may lead to new deforestation opportunities and hence it may be useful to repeat the actual deforestation algorithm several times, whereas the worker/wrapper split is only performed once.
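Readers familiar with the foldr/build formulation of short cut deforestation [Gil96] may find the following Haskell rendering helpful; the producer names are ours. Once a producer has been list abstracted it has exactly the shape of build's argument, and the fusion rule is foldr c n (build g) = g c n.

{-# LANGUAGE RankNTypes #-}

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- a producer in list abstracted form (illustrative example)
upto :: Int -> (forall b. (Int -> b -> b) -> b -> b)
upto n = \c z -> let go i = if i > n then z else c i (go (i + 1))
                 in go 1

sumTo :: Int -> Int
sumTo n = foldr (+) 0 (build (upto n))
-- after fusion:  sumTo n = upto n (+) 0   -- no intermediate list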

5.2.1 Avoiding Inefficient Workers

As Gill already noticed [Gil96], there is a substantial performance difference between calling a function as originally defined (for example map τ′ τ) and calling a worker with list constructors as arguments (for example mapW τ′ τ [τ] ((:) τ) ([] τ)). The latter call is considerably more expensive.

In our worker/wrapper split algorithm we demanded that in a definition

f : ∀α.τ
  = λα.M

which we want to split into a worker and a wrapper definition the term M is a term abstraction. Thus it is a value and not a term that may be evaluated once to a value that is then shared between all uses of the function f.¹ The definition body of the wrapper

f : ∀α.τ
  = λα. fW α [τγ] ((:) τγ) ([] τγ)

is not a value but an application that evaluates to the value M. So we know that evaluation of f requires only a few steps. These evaluation steps are duplicated when the wrapper f is inlined. Furthermore, workers have additional arguments for the abstracted list constructors. We already discussed in Section 5.1.2 how the static argument transformation can move these arguments out of the recursion. Only to polymorphically recursive workers, that is, workers of functions that recursively consume their own result, is this transformation not applicable. Finally, because the list constructors are abstracted, they can only be accessed in the worker by the var rule, whereas they are directly present in the original unsplit definition.

So a worker/wrapper split decreases performance. The effect can only be made good by deforestation. Hence a worker should never be called when it is not used for deforestation, that is, when it has only list types and list constructors as arguments. A simple solution is to replace after deforestation all calls to workers that still have only list types and list constructors as arguments by calls to more efficiently defined functions. This could be the original, unsplit definition of the function. Better may be a version of the worker that is specialised to the list types and list constructors. Thus this definition can profit from any optimisations, especially deforestation, that were performed inside the worker definition. Note that we only derive one specialised definition for every worker.

Alternatively, we could change the deforestation algorithm so that it only replaces the call to a list producer by a wrapper if this is necessary to enable deforestation. Basically, we could use our type-inference based method of Section 3.2 for detecting functions that need to be inlined, to inline wrappers only where necessary.

The worker/wrapper scheme increases code size through the introduction of wrapper and worker definitions. However, this increase is naturally bounded, in contrast to the code increase that is caused by our original list abstraction algorithm with inlining. Note that the definitions of workers that are not needed for deforestation can be removed by standard dead code elimination after worker specialisation has been performed.

¹ Note that in discussions of the effect on program performance we have to ignore type abstractions and applications, because they disappear during the translation into the language G of our cost model.

5.2.2 Consumers in foldr form

The Haskell prelude, the library of standard functions that are available in any Haskell program [PH+99], contains many functions that consume lists. Several of these are defined

in terms of foldr. [GLP93, GP94, Gil96] also give definitions in terms of foldr for many others, which we do not list again here. However, besides foldr the prelude also contains a few other higher-order functions that consume lists, namely foldl, foldr1 and foldl1. The function foldl uses an accumulating parameter to combine all elements of a list.

foldl : ∀α.∀β. (β → α → β) → β → [α] → β
      = λα. λβ. λc:β→α→β. λn:β. λxs:[α].
          case xs of
            []   ↦ n
            y:ys ↦ foldl α β c (c n y) ys

Many list-consuming functions are naturally defined in terms of foldl, for example the linear version of reverse:

reverse : ∀α. [α] → [α]
        = λα. foldl α [α] (λys:[α]. λy:α. (:) α y ys) ([] α)

As shown in [Gil96], foldl can be expressed in terms of foldr:

foldl : ∀α.∀β. (β → α → β) → β → [α] → β
      = λα. λβ. λc:β→α→β. λn:β. λxs:[α].
          foldr α (β → β) (λy:α. λfys:β→β. λn:β. fys (c n y)) (λn:β. n) xs n

For every element of the list xs this definition allocates a term of functional type (β → β) on the heap. All these intermediate functions are composed so that the term foldr ... xs produces a function that is finally applied to n. So the definition of foldl in terms of foldr is less efficient than the direct definition given before. Therefore the definition in terms of foldr should not be chosen as the standard definition of foldl in the Haskell prelude. Instead, the deforestation algorithm searches for applications of foldl just as it searches for applications of foldr. If list abstraction of the list argument is successful, then the definition of foldl in terms of foldr is used to fuse consumer and producer. Because this definition introduces intermediate functions but fusion removes the intermediate list, the whole transformation keeps execution costs constant. It is nonetheless useful to perform this transformation, because subterms of producer and consumer that previously were separated by the intermediate list are brought together and thus new opportunities for other optimising transformations arise.

The higher-order functions foldl1 and foldr1 are only defined for non-empty argument lists. Besides the definitions in the Haskell report [PH+99], there also exist the following equivalent definitions. Note that fold is not a standard function.

foldl1 : ∀α. (α → α → α) → [α] → α
       = λα. λc:α→α→α. λxs:[α]. case xs of y:ys ↦ foldl α α c y ys

foldr1 : ∀α. (α → α → α) → [α] → α
       = λα. λc:α→α→α. λxs:[α]. case xs of y:ys ↦ fold α c y ys

fold : ∀α. (α → α → α) → α → [α] → α
     = λα. λc:α→α→α. λn:α. λxs:[α].
         case xs of
           []   ↦ n
           y:ys ↦ c n (fold α c y ys)

We do not define foldl1 and foldr1 directly in terms of foldr. However, their definitions use foldl and fold, respectively, which can be defined in terms of foldr. The definitions of foldl1 and foldr1 then have to be removed by other transformations than fusion.

fold : ∀α. (α → α → α) → α → [α] → α
     = λα. λc:α→α→α. λn:α. λxs:[α].
         foldr α (α → α) (λy:α. λfys:α→α. λn:α. c n (fys y)) (λn:α. n) xs n

So besides foldr the deforestation algorithm knows foldl and fold. If their list arguments can be list abstracted, then their definitions in terms of foldr are used for fusion. Contextual equality of the original definitions and the definitions in terms of foldr can be proved by giving appropriate bisimulations, which we mentioned in Section 2.7.
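For readers who prefer Haskell to F, the same definitions can be rendered as follows (our rendering, not part of the thesis; the names foldlViaFoldr and foldViaFoldr are ours):

foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr c n xs = foldr (\y fys n' -> fys (c n' y)) id xs n

fold :: (a -> a -> a) -> a -> [a] -> a
fold c n []     = n
fold c n (y:ys) = c n (fold c y ys)

foldViaFoldr :: (a -> a -> a) -> a -> [a] -> a
foldViaFoldr c n xs = foldr (\y fys n' -> c n' (fys y)) id xs n

For example, foldlViaFoldr (-) 10 [1,2,3] evaluates to ((10-1)-2)-3 = 4; each list element contributes one heap-allocated function of type b -> b, as discussed above.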

5.2.3 Improvement by Short Cut Deforestation

To illustrate how short cut deforestation improves performance we compare the evaluation steps of a term with an intermediate list and of its deforested counterpart. We consider a simplified version of the definition body of the function any, which we discussed in the introduction. The following F-program combines the elements of a constant boolean list with the logical or operator.

let foldr : ∀α.∀β. (α → β → β) → β → [α] → β
          = λα. λβ. λc:α→β→β. λn:β. λxs:[α].
              case xs of
                []   ↦ n
                y:ys ↦ c y (foldr α β c n ys)
    (||) : Bool → Bool → Bool
         = λx:Bool. λy:Bool.
             case x of
               True  ↦ True
               False ↦ y
in foldr Bool Bool (||) False ((:) Bool False ((:) Bool True ([] Bool)))

Our short cut deforestation method transforms this F-program into the following F-program:

let (||) : Bool → Bool → Bool
         = λx:Bool. λy:Bool.
             case x of
               True  ↦ True
               False ↦ y
in (||) False ((||) True False)

We translate both programs by ∗ into corresponding G-programs (cf. Section 2.9.1). To avoid some trivial evaluation steps we simplify the let structure of the G-terms. Figure 5.1 shows the evaluation of the G-term of the original F-program and Figure 5.2 shows the evaluation of the G-term of the deforested F-program. We note that the evaluation of the original program causes the allocation of an intermediate list on the heap. The evaluation of the deforested program takes fewer steps, mainly because no such intermediate list is allocated.

Naturally, further optimisations are desirable. For example, the allocation of the boolean value of the result, which occurs in both cases, could be avoided by a transformation that uses the fact that (||) is strict. However, in the case of the original program such a transformation would be much more difficult, because the unwanted heap cells are allocated by the definition of foldr. The strictness transformation implemented in the Glasgow Haskell compiler [PL91] is only able to improve the deforested program.

Figure 5.1: Evaluation derivation of program with intermediate list

Figure 5.2: Evaluation derivation of the deforested program

5.2.4 No Duplication of Evaluation — Linearity is Irrelevant

Unfold/fold deforestation methods, which we will discuss in Section 7.3, have to take care not to duplicate evaluation of terms and thus increase costs. Consider the following consumer. It is defined in Haskell, because we only argue informally in this section.

mkTree :: [a] -> Tree a
mkTree [] = Leaf
mkTree (x:xs) = Node x (mkTree xs) (mkTree xs)

For example, mkTree [4,2] yields Node 4 (Node 2 Leaf Leaf) (Node 2 Leaf Leaf). The consumer traverses parts of its argument several times to produce its complete result. Our lazy evaluation semantics shows that in mkTree P the producer P, which may be expensive to evaluate, is shared and hence nonetheless only evaluated once. Naïve deforestation of mkTree P leads to a term in which the remaining part of the producer is evaluated as often as the list is traversed in the original term. So deforestation may lead to an exponential increase of costs. Hence unfold/fold deforestation algorithms usually require their input to be linear, that is, every variable may occur at most once in a definition body.

For short cut deforestation we need mkTree to be defined in terms of foldr. As an intermediate step we define

mkTree [] = Leaf
mkTree (x:xs) = let r = mkTree xs in Node x r r

and then obtain

mkTree = foldr (\x r -> Node x r r) Leaf
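Assembled into a self-contained Haskell sketch (ours; the Tree type is our assumption, spelled out as the text suggests):

data Tree a = Leaf | Node a (Tree a) (Tree a) deriving Show

mkTree :: [a] -> Tree a
mkTree = foldr (\x r -> Node x r r) Leaf

-- mkTree [4,2] == Node 4 (Node 2 Leaf Leaf) (Node 2 Leaf Leaf)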

If the list abstraction algorithm successfully processes the producer P, then the short cut fusion rule removes the intermediate list. There is no duplication of evaluation, because a foldr consumer traverses its list argument only once. This property of foldr ensures that we do not have to worry about linearity and possible duplication of evaluation. So the important point is that we have the consumer in foldr form. To stress this point, we consider the following definition with a nested recursive call:

f :: [a] -> [a] -> [a]
f [] zs = zs
f (x:xs) zs = f xs (x : f xs zs)

For example, f [1,2,3] [] yields [3,2,3,1,3,2,3]. We first transform this definition into

f [] = \zs -> zs
f (x:xs) = \zs -> let r = f xs in r (x : r zs)

to finally obtain

f = foldr (\x r -> \zs -> r (x : r zs)) (\zs -> zs)

In both examples the definition that we gave as intermediate step differs from the original definition only in that the list argument is traversed once. We performed a kind of common subexpression elimination to obtain this form. Although common subexpression elimination has the beneficial effect of avoiding repeated computation, it may unfortunately unboundedly increase space usage and thus indirectly also garbage collection time [Chi98]. This problem does not concern us, because our deforestation method expects consumers to be already in foldr form. However, deforestation methods that automatically transform a general recursive definition into foldr form, such as warm fusion (Section 7.1.3) and hylomorphism fusion (Section 7.2), have to take care that they do not increase space and runtime costs by common subexpression elimination.

5.3 Measuring an Example: Queens

To quantify the improvement that can be gained by our deforestation method we measure its effect on the well-known n-queens program, using the Glasgow Haskell compiler [GHC]. A solution to the n-queens problem is a placement of n queens on an n × n chessboard so that no two queens hold each other in check. The Haskell program in Figure 5.3 computes all solutions to the ten queens problem. It was adapted from [BW87], Section 6.5.1, and was already used in [GLP93, Gil96] to measure the effect of the original short cut deforestation algorithm. Ten instead of the traditional eight queens were chosen to increase the runtime and thus obtain reliable timings. Because printing all solutions to the ten queens problem would mean that I/O takes up most of the time, evaluation is simply forced by adding up all positions.

The original queens program uses list comprehensions. These are a powerful form of syntactic sugar. There are well-known techniques for translating list comprehensions into recursive functions such that only one list cell is constructed for each element of the result [Aug87, Wad87]. However, as discussed in [GLP93, Gil96], a good deforestation method makes these sophisticated translation techniques for list comprehensions unnecessary. In the Haskell report [PH+99] the semantics of list comprehensions is given by a straightforward translation. This translation introduces intermediate lists that will be removed by deforestation. Applying the translation to the original program of Figure 5.3 gives the program in Figure 5.4.

The desugared queens program directly or indirectly uses many standard list functions. The definitions of all standard list functions that are relevant for deforestation are listed in Appendix B. Our deforestation algorithm splits the definitions of all list-producing functions into worker and wrapper definitions and performs inlining of consumers and deforestation. Figure 5.5 gives the result of applying our deforestation algorithm to the desugared queens program. This deforested program contains only those transformed standard list functions which are used by queens and safe and which were not inlined. Appendix B additionally lists the split and deforested versions of all other standard list functions. Note that, for example, the definitions of concatMapW and enumFromToW were deforested.

First, deforestation removes all the intermediate lists that were introduced by the desugaring of list comprehensions. In the definition of queensW we note additionally that the list [1..10] is no longer constructed. Furthermore, the original definition of queens recursively consumes its own result. All intermediate lists arising from this recursive consumption are removed, similarly to the example inits which we discussed in Section 3.4.1. The worker queensW abstracts both nested result lists. However, only the abstraction of the outer list is used for deforestation; queensW is always called with (:) and [] as first

main = (print . sum . concat . queens) 10

queens :: Int -> [[Int]]
queens 0 = [[]]
queens m = [p ++ [n] | p <- queens (m-1), n <- [1..10], safe p n]

safe :: [Int] -> Int -> Bool
safe p n = and [(j /= n) && (i+j /= m+n) && (i-j /= m-n)
               | (i,j) <- zip [1..] p]
  where m = length p + 1

Figure 5.3: Original queens program

main = (print . sum . concat . queens) 10

queens :: Int -> [[Int]]
queens 0 = [[]]
queens m = concatMap (\p -> concatMap (\n -> if safe p n
                                               then [p ++ [n]]
                                               else [])
                                      (enumFromTo 1 10))
                     (queens (m-1))

safe :: [Int] -> Int -> Bool
safe p n = and (concatMap (\(i,j) -> [(j /= n) && (i+j /= m+n) && (i-j /= m-n)])
                          (zip (enumFrom 1) p))
  where m = length p + 1

Figure 5.4: Queens with list comprehension desugared according to Haskell report

arguments. It seems desirable to find a method for removing this unnecessary abstraction. In this example it hardly matters, because queensW is only called 10 times, but in general unnecessary abstraction may decrease performance considerably.

Note that the list of queen positions p is not removed by deforestation. It cannot be removed, because it is consumed twice, by safe and appW. If this were not the case, then the fact that p is λ-bound would still make list abstraction and thus fusion impossible. Similarly, p is consumed twice in the definition of safe, by zipW and length. The function zip, respectively zipW, consumes two lists. It is well known for not being suitable for short cut deforestation, see the discussion in [Gil96]. Hence it is not fused with the producer enumFrom 1. So in the definition of safe only the list consumed by and is removed by deforestation, besides the list introduced by desugaring of the list comprehension. Finally, some intermediate lists in the definition of main are removed.

In general we note that the deforested program is considerably less readable than the

main = print ((queensW (:) [] (appW (\y fys n -> fys $! (n+y))) id 10) 0)

queensW :: (Int -> c -> c) -> c -> (c -> d -> d) -> d -> Int -> d
queensW c1 n1 c2 n2 0 = c2 n1 n2
queensW c1 n1 c2 n2 m =
  queensW (:) []
          (\p ws -> enumFromToW (\l zs -> if safe p l
                                            then c2 (appW c1 p (c1 l n1)) zs
                                            else zs)
                                ws 1 10)
          n2 (m-1)

queens :: Int -> [[Int]]
queens = queensW (:) [] (:) []

safe :: [Int] -> Int -> Bool
safe p n = zipW (\(i,j) zs -> ((j /= n) && (i+j /= m+n) && (i-j /= m-n)) && zs)
                True (enumFrom 1) p
  where m = length p + 1

appW :: (a -> c -> c) -> [a] -> c -> c
appW c xs ys = foldr c ys xs

zipW :: ((a,b) -> c -> c) -> c -> [a] -> [b] -> c
zipW c n xs ys = go xs ys
  where go (x:xs) (y:ys) = c (x,y) (go xs ys)
        go _ _ = n

enumFromW :: (Int -> c -> c) -> Int -> c
enumFromW c l = go l
  where go l = l `c` (go $! (l+1))

enumFrom :: Int -> [Int]
enumFrom = enumFromW (:)

enumFromToW :: (Int -> c -> c) -> c -> Int -> Int -> c
enumFromToW c n l m = enumFromW (\x fxs -> if (<= m) x then x `c` fxs else n) l

Figure 5.5: Queens after type-inference based short cut deforestation

main = (print . sum . concat . queens) 10

queens :: Int -> [[Int]]
queens 0 = [[]]
queens m = foldr (\p ws -> foldr (\n zs -> if safe p n
                                             then (:) (p ++ [n]) zs
                                             else zs)
                                 ws (enumFromTo 1 10))
                 [] (queens (m-1))

safe :: [Int] -> Int -> Bool
safe p n = and (foldr (\(i,j) zs -> (:) ((j /= n) && (i+j /= m+n) && (i-j /= m-n)) zs)
                      [] (zip (enumFrom 1) p))
  where m = length p + 1

Figure 5.6: Queens with list comprehension desugared according to Ghc

original program. Especially the working of queensW is far from obvious.

To obtain the program of Figure 5.5 from the program of Figure 5.4 we have to apply η-expansion three times, that is, replace a term M by \x -> M x for a variable x which does not occur in M. η-expansion is applied to the three anonymous functions \p -> ..., \n -> ... and \(i,j) -> ... which are introduced by the desugaring of list comprehensions. To avoid having to use η-expansion later, the desugaring translation itself should perform it. In fact, the Glasgow Haskell compiler uses a slightly different translation scheme for list comprehensions which translates the program of Figure 5.3 into the program given in Figure 5.6. Deforestation of this program does not require η-expansion and gives the program of Figure 5.5 as result as well.

We measured² the performance of the four versions of the queens program on a Linux PC with a 500 MHz Pentium III and 256 MB RAM with version 4.06 of the Glasgow Haskell compiler and maximal optimisations (-O2). This version of the Glasgow Haskell compiler already has a form of short cut deforestation built in. Many standard list functions are defined in terms of foldr and build and the compiler applies the foldr/build fusion rule (see Section 7.1). To measure the performance of our programs without this built-in deforestation, we hide all the standard list functions³ which are defined in terms of foldr and build, and give our own definitions instead, as listed in Appendix B. Thus the special function build does not occur in our programs. Hence the compiler cannot apply the foldr/build fusion rule. When compiling the original queens program with list comprehensions it is not possible to avoid the built-in deforestation transformation completely, because the compiler internally translates list comprehensions into foldrs and builds. So for this program we do not hide the definitions of the standard list functions at all but measure the full effect of the built-in short cut deforestation of the Glasgow Haskell compiler.

² by using the option +RTS -s -RTS that any program compiled by the Glasgow Haskell compiler understands
³ by using import Prelude hiding (sum, concat, map, ... )

program version                                             megabytes allocated   runtime in
                                                            in the heap           seconds
list comprehensions desugared (Haskell report), Fig. 5.4    205.78                1.71
list comprehensions desugared (Ghc), Fig. 5.6                98.48                1.01
original with built-in short cut deforestation, Fig. 5.3     33.41                0.57
deforested by type-inference based deforestation, Fig. 5.5   22.43                0.51

Figure 5.7: Effect of deforestation on heap allocation and runtime

We measured both the runtime and the number of bytes allocated on the heap. Note that this is the total number of bytes allocated during the program run. The runtime system automatically chooses a heap size of 1 or 2 megabytes, and at any given point of time only a few kilobytes of these are live, that is, are not garbage.

Figure 5.7 gives the results of our measurements. These show clearly the costs of intermediate lists and the improvement that deforestation can achieve. The program of Figure 5.6, where list comprehensions were desugared according to the method used by the Glasgow Haskell compiler, uses fewer intermediate lists than the program of Figure 5.4 and hence it is faster and allocates fewer bytes on the heap. Built-in short cut deforestation and our deforestation method give similar results, because the program only uses standard list functions, for which the Glasgow Haskell compiler has definitions in terms of foldr and build. Only user-defined list producers cannot be deforested by the Glasgow Haskell compiler. Our deforestation method still gives better results for the queens program. This is probably due to the worker/wrapper split of the definition of queens and the removal of the intermediate lists between recursive calls, a transformation that is not performed by the Glasgow Haskell compiler (cf. Section 7.1.4).

Figure 5.8 gives the results of some further measurements which we made to evaluate our worker/wrapper scheme. To obtain the deforested program of Figure 5.5 we applied the static argument transformation to the definitions of the workers zipW and enumFromW (cf. Section 5.1.2). Unfortunately, version 4.06 of the Glasgow Haskell compiler does not perform this transformation.

program version                                            megabytes allocated   runtime in
                                                           in the heap           seconds
deforested without static argument transformation           68.53                0.79
only worker/wrapper split (Haskell report)                 210.76                1.77
desugared with non-inlinable producers (Haskell report)    190.42                1.79
deforested with non-inlinable workers (Haskell report)      72.42                0.92

Figure 5.8: Heap allocation and runtime for different workers and wrappers

To see the effect of this transformation we measured the performance of a deforested program which differs from the program of Figure 5.5 only in that the following definitions with static arguments are used:

zipW :: ((a,b) -> c -> c) -> c -> [a] -> [b] -> c
zipW c n (x:xs) (y:ys) = c (x,y) (zipW c n xs ys)
zipW c n _ _ = n

enumFromW :: (Int -> c -> c) -> Int -> c
enumFromW c l = l `c` (enumFromW c $! (l+1))

This version uses substantially more heap and also more time, thus substantiating our arguments in favour of the static argument transformation in Section 5.1.2.

We stated in Section 5.2.1 that worker definitions are less efficient than their corresponding original function definitions. So we measured the performance of the program that is obtained by applying only the worker/wrapper split algorithm, but not the actual deforestation algorithm, to the desugared program of Figure 5.4. This program is given in Appendix B. The program runs only slightly slower, with slightly more heap allocation. So this example suggests that the inefficiency introduced by the worker/wrapper split might not be problematic.

Finally we investigate the effect of inlining. Because all function definitions are in a single module, the compiler can inline any definition. The idea of the worker/wrapper scheme is to enable deforestation across module boundaries without large-scale inlining; only the small wrappers need to be inlined. So we forbid the compiler to inline⁴ the workers concatW, mapW, concatMapW, appW, zipW, enumFromW, enumFromToW and takeWhileW. Our example is artificial in the respect that the worker definitions are so small that the Glasgow Haskell compiler would normally inline them even across module boundaries. For a fair comparison we also measure the performance of a version of the desugared program of Figure 5.4 where the corresponding list-producing functions are non-inlinable. Comparing the results of the desugared program with and without inlining we see that inlining increases space usage but decreases runtime slightly. For the deforested versions inlining makes a considerable difference. The version without inlining requires considerably more space and time than the version with inlining. Nonetheless the version without inlining still performs much better than the non-deforested version. So it is sensible to use workers and wrappers to enable deforestation where otherwise no deforestation would be possible. However, a non-inlined worker does not bring together subterms of producer and consumer to enable the application of further optimising transformations. The measurements show that this second effect of deforestation can be substantial.

⁴ by using the NOINLINE pragma of the Glasgow Haskell compiler
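For illustration, the pragma mentioned in footnote 4 looks as follows when attached to one of the workers (our sketch; appW as in Figure 5.5):

{-# NOINLINE appW #-}
appW :: (a -> c -> c) -> [a] -> c -> c
appW c xs ys = foldr c ys xs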

Chapter 6

Extensions

In this chapter we present several ideas for improving and extending our deforestation algorithm. First, we discuss a worker/wrapper scheme for definitions of list-consuming functions. This scheme enables deforestation without requiring inlining of possibly large consumer definitions. Subsequently, we generalise the short cut fusion rule so that it no longer requires that the list argument of a foldr contains the constructors which construct the list. Afterwards, we make several suggestions for improving the efficiency, that is, the runtime of our deforestation algorithm. Finally, we demonstrate that our type-inference based deforestation method is not limited to lists but can easily be extended to other recursive data types. We are less formal in this chapter and hence use Haskell in all examples for better readability.

6.1 A Worker/Wrapper Scheme for Consumers

To enable deforestation without requiring large-scale inlining we developed a scheme that splits the definition of a list producer into a definition of a worker and a definition of a wrapper. The wrapper contains all the list constructors that are needed for list abstraction. Dually, the consumer must be a foldr and hence sufficient inlining must be performed in the consumer to expose the foldr. If the consumer is a function with a large definition, then this is a problem. So a worker/wrapper scheme for list consumers is desirable, a scheme which splits the definition of a consumer into a small wrapper definition, which contains the foldr, and a large worker definition, which resembles the original definition.

We have two different worker/wrapper schemes for consumers. The first is the more obvious one, whereas the second fits better into our deforestation method. In both schemes the worker definitions are less efficient than the original definitions. Hence it has to be ensured that they are only used when they enable fusion, analogously to the worker/wrapper scheme for producers.

6.1.1 Scheme A

First, the arguments of the foldr of the consumer may be so large that it is desirable to separate them from the consumer. In that case we may define new functions for the arguments, so that the foldr of the consumer only has these small function calls as arguments. For a simple example consider the definition

map :: (a -> b) -> [a] -> [b]
map f xs = foldr (\x fxs -> f x : fxs) [] xs

Suppose the term (\x fxs -> f x : fxs) is too large for inlining. Then we define a new function mapF and redefine map as follows:

mapF :: (a -> b) -> a -> [b] -> [b]
mapF f = \x fxs -> f x : fxs

map :: (a -> b) -> [a] -> [b]
map f xs = foldr (mapF f) [] xs

Second, there may be a large context around the foldr in the definition of the consumer. Then we can define a new function to abstract this context. For example, consider the definition

foo :: Bool -> (a -> b) -> [a] -> [b]
foo b f xs = if b then foldr (mapF f) [] xs else []

Suppose the context of the foldr is too large for inlining. Then we split this definition as follows:

cont :: Bool -> [b] -> [b]
cont b y = if b then y else []

foo :: Bool -> (a -> b) -> [a] -> [b]
foo b f xs = cont b (foldr (mapF f) [] xs)

The two transformations do not require any type-based analysis but can be performed directly on the syntactic structure.

6.1.2 Scheme B

The basic idea of an alternative worker/wrapper scheme for consumers is to transform a function that consumes a list with foldr so that it consumes a polymorphic list producer instead. We illustrate this again with the definition of the function map.

map :: (a -> b) -> [a] -> [b]
map f xs = foldr (\x fxs -> f x : fxs) [] xs

It is split into the definition of a worker

mapW :: (a -> b) -> (forall c. (a -> c -> c) -> c -> c) -> [b]
mapW f g = g (\x fxs -> f x : fxs) []

and the definition of a wrapper

map :: (a -> b) -> [a] -> [b]
map f xs = mapW f (\v(:) v[] -> foldr v(:) v[] xs)

We can easily fuse this consumer wrapper with a list abstracted producer:

   map (+1) (enumFrom 1)
     { inlining and β-reduction of wrappers }
   mapW (+1) (\v(:) v[] -> foldr v(:) v[] (enumFromW (:) 1))
     { list abstraction and short cut fusion }
   mapW (+1) (\v(:) v[] -> enumFromW v(:) 1)

It is still a problem that inlining and β-reduction must replace the variable xs in the definition of the wrapper map by the producer enumFromW (:) 1. A standard inliner will refuse to do so, because xs is within the scope of a λ-abstraction (\v(:) v[] -> ...) and hence in general inlining the producer could duplicate computation (cf. Section 2.9.2). The inliner would need to know that the λ-abstraction is only used once.

The worker/wrapper split could be performed by a modified version of our type-inference based list abstraction algorithm. Instead of abstracting list constructors it would abstract occurrences of foldr. This is also advantageous, because for functions that are both consumers and producers the worker/wrapper schemes for consumers and producers have to be combined.

6.2 Deforestation Beyond Arguments of foldr

Our deforestation algorithm searches for terms of the form foldr M^(:) M^[] M^P, applies list abstraction to M^P and, if successful, applies the short cut fusion rule. List abstraction cannot succeed if M^P is a variable or if the program only contains the partial application foldr M^(:) M^[]. This restriction is sometimes unfortunate, as the following examples demonstrate.

Our deforestation algorithm successfully deforests the term or (fst (unzip xs)). However, the compiler may inline the small definition of fst and simplify the term to obtain case unzip xs of (vs,ws) -> or vs. Here the argument of or is just a variable and hence or cannot be fused with its argument.

A monadic function may produce a list that is consumed by a foldr, but the monadic producer can never be the argument of a foldr. Consider the term

getLine >>= putStr which reads a list of characters from standard input and subsequently writes it to standard output. The functions putStr and getLine can be defined as follows:

putStr :: [Char] -> IO ()
putStr = foldr (\c r -> putChar c >> r) (return ())

getLine :: IO [Char]
getLine = do c <- getChar
             if c == '\n'
               then return []
               else do s <- getLine
                       return (c:s)

The definition of the producer getLine can be split into a worker and a wrapper:

getLineW :: (Char -> c -> c) -> c -> IO c
getLineW v(:) v[] = do c <- getChar
                       if c == '\n'
                         then return v[]
                         else do s <- getLineW v(:) v[]
                                 return (v(:) c s)

getLine :: IO [Char]
getLine = getLineW (:) []

Nonetheless the term getLine >>= putStr cannot be deforested, because getLine is not an argument of putStr.

The solution is to abstract the intermediate list and the foldr with its first two arguments from the term which we want to deforest. We use the following generalised short cut fusion rule (here, where we use Haskell instead of F, we informally ignore free variables and their types):

  M :: (A -> c -> c) -> c -> (c -> B) -> C    c not free in C
  -------------------------------------------------------------
  M (:) [] (foldr M^(:) M^[])  ⟹  M M^(:) M^[] id

Here A, B and C are types and c a type variable, so that the term M is polymorphic. The fusion rule is the parametricity theorem of the polymorphic type. With this generalised fusion rule we can deforest our example terms as follows:

   case unzip xs of (vs,ws) -> or vs
     { inlining }
   case unzipW (:) [] (:) [] xs of (vs,ws) -> foldr (||) False vs
     { type-inference based abstraction }
   (\v(:) v[] f -> case unzipW v(:) v[] (:) [] xs of (vs,ws) -> f vs)
     (:) [] (foldr (||) False)
     { generalised fusion rule and β-reduction }
   case unzipW (||) False (:) [] xs of (vs,ws) -> vs

Similarly:

   getLine >>= putStr
     { inlining }
   getLineW (:) [] >>= foldr (\c r -> putChar c >> r) (return ())
     { type-inference based abstraction }
   (\v(:) v[] f -> getLineW v(:) v[] >>= f)
     (:) [] (foldr (\c r -> putChar c >> r) (return ()))
     { generalised fusion rule and β-reduction }
   getLineW (\c r -> putChar c >> r) (return ()) >>= id

In the latter example getLineW constructs an I/O action instead of a list of characters. A limitation of the generalised short cut fusion rule is that we can only abstract foldr M^(:) M^[] from a term M if no free variable of M^(:) and M^[] is bound in M.
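As a type-level illustration, the two sides of the generalised rule can be rendered as Haskell combinators (our sketch, not part of the thesis; it requires the RankNTypes extension):

{-# LANGUAGE RankNTypes #-}

-- Left-hand side: the polymorphic term m builds a real list, which is
-- then consumed by foldr c n through the abstracted continuation.
lhs :: (forall c. (a -> c -> c) -> c -> (c -> b) -> r)
    -> (a -> b -> b) -> b -> r
lhs m c n = m (:) [] (foldr c n)

-- Right-hand side: the foldr arguments replace the constructors directly;
-- no intermediate list is built and the continuation becomes id.
rhs :: (forall c. (a -> c -> c) -> c -> (c -> b) -> r)
    -> (a -> b -> b) -> b -> r
rhs m c n = m c n id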

6.3 More Efficient Algorithms

First, the efficiency of the type inference algorithm can be improved. In fact, in [Chi99] we used a different type inference algorithm, based on the algorithm M [LY98] for the

Hindley-Milner type system. This type inference algorithm permits the optimisation that type inference notices early when the desired polymorphic type cannot be inferred and may then abort the whole type inference process. Unfortunately, this optimisation cannot be used for the worker/wrapper split algorithm. We did not want to have two different type inference algorithms, one for abstracting a single list, which may fail early, and one for all other occasions. It may be possible to integrate the two algorithms to get the best of both.

More importantly, the worker/wrapper split algorithm is not as efficient as it could be. The list abstraction algorithm traverses a whole definition body once. Even if we ignore polymorphic recursion, if n let definitions are nested, then the body of the innermost definition is traversed n times, leading to quadratic complexity. However, as stated in Section 4.3.5, the list abstraction algorithm uses a modified version of the Hindley-Milner type inference algorithm. The abstraction of list types corresponds to the generalisation step of the Hindley-Milner algorithm. The list abstraction algorithm just additionally abstracts list constructors and inserts both type and term abstractions into the program. The Hindley-Milner algorithm recursively traverses a program only once. So it should be possible to integrate explicit type and term abstraction at let definitions into this type inference algorithm to obtain a single-pass worker/wrapper split algorithm. Note that this further step towards the Hindley-Milner algorithm would not lead to the same worst-case exponential time complexity: the inferred type annotations can only be smaller than the type annotations of the original list-producing term. Hence the resulting algorithm would only require linear time. To deal with polymorphic recursion as well, the type inference algorithm of Emms and Leiß, which integrates semiunification into the Hindley-Milner algorithm, may provide a good basis [EL99].

6.4 Other Data Types than Lists

Our type-inference based deforestation method is not specific to lists but can be used for nearly arbitrary recursive data types. For fusion we basically only need a kind of foldr for the new data type, a so-called catamorphism. In Section 7.2 we give a formal definition of a catamorphism. The intuitive idea is that it uniformly replaces every data constructor of the data type by a given term. Consider for example the type of arbitrarily branching trees:

data RoseTree a = Node a [RoseTree a]

Its catamorphism foldRT is defined as follows:

foldRT :: (a -> [b] -> b) -> RoseTree a -> b
foldRT vNode (Node x ts) = vNode x (map (foldRT vNode) ts)

The programmer could explicitly use this catamorphism and inform the compiler about it through a specific directive. Alternatively, an algorithm like warm fusion (cf. Section 7.1.3) could transform a generally recursive definition into catamorphism form automatically.
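As a small usage sketch (ours, not from the thesis; sumRT and t are illustrative names), foldRT sums all labels of a rose tree:

sumRT :: RoseTree Int -> Int
sumRT = foldRT (\x rs -> x + sum rs)

t :: RoseTree Int
t = Node 1 [Node 2 [], Node 3 [Node 4 []]]
-- sumRT t == 10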

The short cut fusion rule for the data type RoseTree a is:

  M^P :: (A -> [c] -> c) -> c
  --------------------------------------
  foldRT M^Node (M^P Node)  ⟹  M^P M^Node

The rule is an instance of the parametricity theorem of the type (A -> [c] -> c) -> c. As an aside, note that ∀γ.(A → [γ] → γ) → γ is the canonical encoding of the data type RoseTree A in the second-order λ-calculus, just as ∀γ.(A → γ → γ) → γ → γ is the canonical encoding of the data type [A], and the function build A, which is used in the original short cut deforestation method, is the isomorphism from the former type to the latter (cf. [Pit00]).

It is straightforward to transform our list abstraction algorithm into a rose tree abstraction algorithm. Similarly our worker/wrapper scheme can be adapted to rose trees. In fact, a single worker can abstract from data constructors and types of several different data types simultaneously. We can even handle mutually recursive data types. The catamorphisms of a set of mutually recursive data types are mutually recursive.

However, all the above is limited to regular, covariant algebraic data types. An algebraic data type is called regular if all recursive calls in the body are of the form of the head of the definition. A counter-example is

data Twist a b = Nil | Cons a (Twist b a)

A data type is called contravariant if some recursive call appears in a contravariant position of the body, that is, more or less as the first argument of the function type constructor:

data Infinite a = I (Infinite a -> a)

Catamorphisms for contravariant and mixed-variant types are more complex than for covariant ones, and for non-regular types a general definition is unknown [MH95, FS96]. Nonetheless these limitations are mild, because nearly all types which appear in practice are regular and covariant. Note, however, that the validity of parametricity theorems has only been proved for languages with only regular, covariant algebraic data types [Pit00].

Chapter 7

Other Deforestation Methods

An enormous number of deforestation methods have been developed. Here we give an overview, concentrating on the methods that have received most attention. We relate them to our deforestation method and note that some of them could also be improved by a type-inference based analysis. We start with short cut deforestation.

7.1 Short Cut Deforestation

The fundamental idea of short cut deforestation is that the uniform replacement of the constructors (:) and [] by a foldr consumer can already be performed at compile time. Our own deforestation method performs short cut deforestation. However, whereas we transform the producer into the required list abstracted form by a type-inference based algorithm, the original short cut deforestation method [GLP93, GP94, Gil96] requires that the producer is already in list abstracted form. A special function build is used both to easily recognise the producer and to enforce the type requirement of the short cut fusion rule:

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

With build the short cut fusion rule can be written as follows:

  foldr M^(:) M^[] (build P)  ⟹  P M^(:) M^[]

The compiler writer defines all list-manipulating functions in the standard libraries in terms of foldr and build. Gill implemented short cut deforestation in the Glasgow Haskell compiler and measured speed-ups of 43 % for the 10 queens program and of 3 % on average for a large number of programs that were written without deforestation in mind [Gil96].
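For instance, an enumeration producer can be written with build, and the fusion rule then eliminates the list (our example, not Gill's; it needs the RankNTypes extension, and upto is an illustrative name):

{-# LANGUAGE RankNTypes #-}

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

upto :: Int -> Int -> [Int]
upto l m = build (\c n -> let go i = if i > m then n else i `c` go (i+1)
                          in go l)

-- sum (upto 1 10)
--   = foldr (+) 0 (build (\c n -> ...))       (with sum written as a foldr)
--   = (\c n -> let go i = ... in go 1) (+) 0  (by the foldr/build rule)
-- No intermediate list is allocated.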

7.1.1 Gill’s Worker/Wrapper Scheme

In the last chapter of his thesis [Gil96] Gill also proposes a worker/wrapper scheme. It enables short cut fusion with user-defined producers without requiring large-scale inlining.

The prerequisite is that these producers are defined in terms of standard list functions or producers which themselves are directly or indirectly defined only in terms of standard list functions. Our own worker/wrapper scheme is based on Gill’s scheme. Whereas we move all data constructors which construct a result list to the wrapper, so that they are visible at the deforestation site, Gill has to make build visible at the deforestation site. Note that for the foldr/build fusion rule it is only necessary that the producer is in build form; the argument of build is of no interest but is just rearranged by the transformation. So, for example, the definition of map in build form

map f xs = build (\v(:) v[] -> foldr (v(:) . f) v[] xs)

is split up as follows:

mapW :: (a -> b) -> [a] -> (b -> c -> c) -> c -> c
mapW f xs v(:) v[] = foldr (v(:) . f) v[] xs

map f xs = build (mapW f xs)

When build is inlined in these definitions, the similarity to our worker/wrapper scheme becomes obvious. Gill does not suggest applying the worker/wrapper split to the definitions of the standard list functions such as map, which are always inlined, but to user-defined list-producing functions. Gill also demonstrates how the worker/wrapper scheme enables the removal of lists that exist between recursive calls of a single function (cf. Section 3.4).

7.1.2 Limitations of build

Besides the restriction of the original short cut deforestation method to producers which are defined in terms of standard list-producing functions, the use of build has also some other disadvantages.

In Section 3.3.2 we showed how our method can split the definition of the list append function (++) into a worker and a wrapper and fuse a consumer with a producer that uses (++). However, because a term M1 ++ M2 does not produce the whole resulting list itself (only the list produced by M1 is copied, not the one produced by M2), (++) cannot be expressed in terms of build. Modifying the definition of (++) to copy as well the list which is produced by M2, as done in [GLP93] and also [LS95, TM95], runs contrary to our aim of eliminating data structures. Hence Gill defines a further second-order typed function augment to solve this problem [Gil96]:

augment :: (forall b. (a -> b -> b) -> b -> b) -> [a] -> [a]
augment g xs = g (:) xs

Similar to the foldr/build fusion rule there exists a foldr/augment fusion rule:

  foldr M^(:) M^[] (augment P M)  ⟹  P M^(:) (foldr M^(:) M^[] M)

The function (++) can be defined in terms of augment

(++) :: [a] -> [a] -> [a]
xs ++ ys = augment (\v(:) v[] -> foldr v(:) v[] xs) ys

and hence be fused with a foldr consumer. The function build can be defined in terms of augment

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = augment g []

and the foldr/build fusion rule is just an instance of the foldr/augment fusion rule. Probably all realistic list producers can be defined in terms of augment.

The function build (and similarly augment) can only wrap a producer that returns a single list. Hence, for example, the function unzip cannot be expressed in terms of build. Therefore, the original short cut deforestation algorithm cannot split its definition into a worker and a wrapper and fuse the producer fst (unzip Bool Char zs) with a consumer as we did in Section 3.3.1. Similarly, it is not possible to abstract nested result lists with build as we did in Section 3.4.1 for the function inits with our worker/wrapper scheme. Not even the definition (cf. Section 3.3.2)

concat : ∀α. [[α]] → [α]
       = λα. foldr [α] [α] (appW α [α] ((:) α)) ([] α)

can directly be expressed in terms of build. It first needs to be transformed into

concat : ∀α. [[α]] → [α]
       = λα. λxs:[[α]]. foldr [α] [α] (appW α [α] ((:) α)) ([] α) xs

Finally, because of build, in Gill’s scheme a wrapper cannot be inlined when it is only partially applied. In contrast, we inlined in Section 3.3.2 the partially applied wrapper (++) in the definition of concat to derive its worker definition.

Altogether we see that list abstraction provides the means for simpler and more flexible deforestation. It is not possible to usefully combine build and type-inference based list abstraction. For example, from the producer build (mapW f xs) no list constructors can be abstracted, because they are hidden by build. We would need to inline build to proceed with list abstraction.
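To make the foldr/augment rule concrete, here it is instantiated at the definition of (++) above (our worked example; the final equation is a standard foldr law):

   foldr c n (xs ++ ys)
     { definition of (++) in terms of augment }
   foldr c n (augment (\v(:) v[] -> foldr v(:) v[] xs) ys)
     { foldr/augment fusion rule }
   (\v(:) v[] -> foldr v(:) v[] xs) c (foldr c n ys)
     { β-reduction }
   foldr c (foldr c n ys) xs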

7.1.3 Warm Fusion

Warm fusion [LS95] transforms a generally recursive definition into build/foldr form, that is, it automatically derives a definition in terms of both build and foldr if the function is both a list producer and a list consumer. Its basic idea for transforming an arbitrary list producer M into build form is to rewrite it as build (\c n -> foldr c n M) and push the foldr into M, hoping that it cancels with builds in there. If it does, then the build form is used for fusion; otherwise the transformation is abandoned. Similarly, to transform the definition of a list-consuming function foo xs = M[xs] into foldr form, the transformation rewrites it as foo xs = M[foldr (:) [] xs] and then tries to fuse the function body M with the foldr. All this pushing and fusing of foldr is performed by term rewriting, where in the case of transforming a consumer into foldr form the rule set grows and shrinks dynamically. This latter transformation is based on the fusion theorem of catamorphisms [MFP91]. The approach is not limited to lists, but works for a large class of data structures (cf. Section 7.2 and Section 6.4). However, warm fusion is rather expensive and poses substantial problems for an implementation [NP98].

7.1.4 Short Cut Fusion in the Glasgow Haskell compiler

The current version 4.06 of the Glasgow Haskell compiler [GHC] basically performs Gill’s original short cut deforestation without any worker/wrapper scheme. The major problem with Gill’s implementation of short cut deforestation was that the foldr/build form is good for enabling short cut fusion but is itself very inefficient¹. If a foldr or a build is not used for fusion, then it must be inlined. By an elaborate set of rewriting rules and a two-phase inlining system the current implementation ensures that, if a foldr/build form of a function is not used for fusion, then it is replaced by a more efficient direct definition of the function.
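The foldr/build rule itself can be expressed with the RULES pragma of the Glasgow Haskell compiler; the following is our transcription of the rule as it appears, in essence, in the compiler's libraries:

{-# RULES
"foldr/build" forall c n (g :: forall b. (a -> b -> b) -> b -> b).
              foldr c n (build g) = g c n
  #-}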

7.2 Hylomorphism Fusion

Hylomorphisms enable the description of “regular” producers besides “regular” consumers such as foldr in a single form. In Section 6.4 we already demonstrated how the recursion pattern of foldr can be generalised to other data types. Unfortunately, fusion of the function foldRT with a producer of type RoseTree a requires a new fusion rule. Already the type of foldRT is quite different from the type of foldr. It is possible to give a single definition for foldr, foldRT and the respective functions for other data types if we use a uniform way to define recursive data types.

Category theory is required for a formal definition of hylomorphism fusion. Here we completely avoid category theory. Instead we just illustrate the idea of hylomorphism fusion with Haskell programs, similarly to [Jon95, MH95]. We only have a single recursive data type, which constructs the fixpoint of a unary type constructor:

newtype Rec f = In (f (Rec f))

The constructor In converts a term of type f (Rec f) into a term of type Rec f. We define the inverse function out as follows:

out :: Rec f -> f (Rec f)
out (In x) = x

To define a recursive data type we first define a non-recursive data type with an additional argument:²

data NatF r = Zero | Succ r
data ListF a r = Nil | Cons a r
data RoseTreeF a r = Node a [r]

From a parameterised non-recursive data type we easily obtain the desired recursive data type:

type Nat = Rec NatF
type List a = Rec (ListF a)
type RoseTree a = Rec (RoseTreeF a)

¹ Personal communication with Simon Peyton Jones
² In the definition of RoseTreeF we use the standard Haskell list type [r] to keep the presentation simple. Naturally we could use the type List r, which we are about to define, instead.

The elements of the data types are slightly unusual. For example, Nat has the elements In Zero, In (Succ (In Zero)), In (Succ (In (Succ (In Zero)))), ... . Similarly, List Bool has for example the elements In Nil and In (Cons True (In Nil)).

After we have defined recursive data types in a uniform way we now come to the definition of recursive functions over these data types. First we note that the non-recursive data types have at least one argument, the argument that is used to introduce recursion. So there is a natural definition of the function fmap for these data types:

instance Functor NatF where
  fmap f Zero = Zero
  fmap f (Succ r) = Succ (f r)

instance Functor (ListF a) where
  fmap f Nil = Nil
  fmap f (Cons x r) = Cons x (f r)

instance Functor (RoseTreeF a) where
  fmap f (Node x rs) = Node x (map f rs)

The function foldr uniformly replaces the list constructors (:) and [] by two given terms. Its generalisation, the catamorphism, replaces the constructor In by a function phi of the same type:

cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi = phi . fmap (cata phi) . out

It works by removing the constructor In by out, applying itself recursively by fmap and finally applying phi. For example:

and :: List Bool -> Bool
and = cata (\fa -> case fa of
                     Nil -> True
                     Cons x r -> x && r)

map :: (a -> b) -> (List a -> List b)
map f = cata (\fa -> case fa of
                       Nil -> In Nil
                       Cons x r -> In (Cons (f x) r))

To be precise, not the function cata itself is a catamorphism, but any function that can be defined in the form cata φ for some term φ of functional type. For catamorphisms a generalisation of the short cut fusion rule holds, the so-called acid rain rule:

  M^P :: (F c -> c) -> B -> c
  -----------------------------------
  (cata φ) . (M^P In)  ⟹  M^P φ

The acid rain rule states that instead of first producing a value of type Rec F and subsequently replacing all data constructors In by φ, the producer skeleton M^P can directly construct the result with φ. Note that the producer skeleton M^P must be polymorphic in the type variable c; φ is a term of type F A -> A, A and B are types, and F is a type constructor for which an instance of the class Functor exists.

Because the constructor In and the function out are inverses of each other, the definition of a dual to a catamorphism, called anamorphism, suggests itself:

ana :: Functor f => (a -> f a) -> a -> Rec f
ana psi = In . fmap (ana psi) . psi

Whereas a catamorphism deconstructs a value of a recursive data type in a uniform way, an anamorphism constructs a value of a recursive data type in a uniform way. For example:

enumFrom :: Int -> List Int
enumFrom = ana (\n -> Cons n (n+1))

map :: (a -> b) -> (List a -> List b)
map f = ana (\la -> case out la of
                      Nil -> Nil
                      Cons x r -> Cons (f x) r)

The function ana is a generalisation of the rarely used Haskell list function unfoldr:

unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr f b = case f b of
                Nothing -> []
                Just (a,b) -> a : unfoldr f b
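The correspondence can be made explicit (our sketch; anaList is an illustrative name):

anaList :: (b -> Maybe (a, b)) -> b -> List a
anaList f = ana (\b -> case f b of
                         Nothing      -> Nil
                         Just (a, b') -> Cons a b')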

Also for anamorphisms a dual acid rain rule exists:

  M^C :: (c -> F c) -> c -> B
  -----------------------------------
  (M^C out) . (ana ψ)  ⟹  M^C ψ

Instead of consuming a value of type Rec F that is constructed uniformly with the function ψ :: A -> F A, the consumer skeleton M^C can directly consume the value of type A with ψ.

So for deforestation, function definitions should be transformed into catamorphism or anamorphism form. However, many functions, such as map, can be defined as either catamorphisms or anamorphisms. Hence the hylomorphism is introduced, which is simply the composition of a catamorphism with an anamorphism:

hylo :: Functor f => (f b -> b) -> (a -> f a) -> (a -> b)
hylo phi psi = cata phi . ana psi

The composition of catamorphism and anamorphism can be fused by either of the two acid rain rules, giving the following direct definition of hylomorphisms:

hylo :: Functor f => (f b -> b) -> (a -> f a) -> (a -> b)
hylo phi psi = phi . fmap (hylo phi psi) . psi

From this definition it becomes obvious that catamorphisms and anamorphisms are just special cases of hylomorphisms:

cata phi = hylo phi out
ana psi = hylo In psi
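A classic illustration (our example, not from the thesis): the factorial function as a hylomorphism over ListF, where psi unfolds n into the virtual list n, n-1, ..., 1 and phi multiplies it back together, without the list ever being built:

fact :: Int -> Int
fact = hylo phi psi
  where
    psi 0 = Nil
    psi n = Cons n (n-1)
    phi Nil        = 1
    phi (Cons n r) = n * r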

The problem that a function such as map can be defined as a hylomorphism in at least two different ways persists. The solution is the hylomorphism in triplet form. A polymorphic function that could be part of either the φ of the catamorphism or the ψ of the anamorphism is separated out as a third argument:

hyloT :: (Functor f, Functor g)
      => (g b -> b) -> (forall c. f c -> g c) -> (a -> f a) -> (a -> b)
hyloT phi eta psi = phi . eta . fmap (hyloT phi eta psi) . psi

or equivalently

hyloT phi eta psi = phi . fmap (hyloT phi eta psi) . eta . psi

Note that these two definitions are only equivalent because eta is polymorphic. The representation as hylomorphism in triplet form is more flexible, because it permits phi and psi to have types with different type constructors g and f. Naturally, the introduction of eta increases the number of possibilities for representing a function as hylomorphism. However, the idea is that “as much as possible” is shifted into this middle argument. So the canonical hylomorphism representation of map is the following:

map :: (a -> b) -> (List a -> List b)
map f = hyloT In eta out
  where eta = \la -> case la of
                       Nil -> Nil
                       Cons x r -> Cons (f x) r

The acid rain rules can be formulated directly for hylomorphisms:

  M^P :: (F c -> c) -> B -> c
  ---------------------------------------------
  (hyloT φ η out) . (M^P In)  ⟹  M^P (φ . η)

  M^C :: (c -> F c) -> c -> B
  --------------------------------------------- (∗)
  (M^C out) . (hyloT In η ψ)  ⟹  M^C (η . ψ)

So the previously given definition of map is suitable both for fusion with a producer and for fusion with a consumer. Hylomorphism fusion methods assume that all functions that may be fused are in hylomorphism form. Hence the acid rain rules are specialised as follows:

  τ :: (F c -> c) -> G c -> c
  ----------------------------------------------------------------
  (hyloT φ η1 out) . (hyloT (τ In) η2 ψ)  ⟹  hyloT (τ (φ . η1)) η2 ψ

  σ :: (c -> F c) -> c -> G c
  ----------------------------------------------------------------
  (hyloT φ η1 (σ out)) . (hyloT In η2 ψ)  ⟹  hyloT φ η1 (σ (η2 . ψ))

Hylomorphisms were first introduced in [MFP91], where recursion schemes and algebraic laws that are well known for lists are generalised to a large class of recursive data types. Subsequently [TM95] developed the hylomorphism fusion method. [Jür99, Jür00] studies the category-theoretical foundations of hylomorphism fusion. A programmer cannot be expected to program in terms of hylomorphisms. Hence [HIT96] presents algorithms for transforming generally recursive definitions into hylomorphisms and restructuring them so that the acid rain rules can be applied. The authors implemented their algorithms in an experimental system called HYLO [OHIT97]. They extended hylomorphisms to handle functions such as zip which induct over multiple data structures [HIT97, IHT98]. The authors see hylomorphisms as a suitable representation for performing various other transformations [THT98], for example for automatic parallelisation

[HTC98]. The HYLO algorithms were implemented by Tüffers in the Glasgow Haskell compiler and found to be able to transform most definitions of real-world programs into hylomorphisms [Tüf98]. Recently the HYLO algorithms were implemented in the pH compiler [Sch00], a compiler for a parallel variant of Haskell [NAH+95].

Because hylomorphisms do not occur in normal programs and already the data types are not defined in terms of Rec, hylomorphism fusion methods require complex program transformations. These transformations may degrade the performance of a program. So we note that the algorithm for deriving a hylomorphism implicitly performs some common subexpression elimination. For example, in Section 4.3 of [OHIT97] the definition of a function foo with two identical recursive calls foo as is transformed into a hylomorphism in which the computation of the recursive calls is shared. Common subexpression elimination avoids repeated computation but may unboundedly increase space usage and thus indirectly also garbage collection time [Chi98].

Our idea of using type inference to identify data constructors in a producer could also be applied to hylomorphism fusion. A type inference algorithm similar to ours could be used for deriving the polymorphic functions τ and σ that are needed for the acid rain rules. This algorithm would be more general than the current syntax-based one and would permit fusion of hylomorphisms with terms that are not in hylomorphism form, using the more general version (∗) of the acid rain rules.

7.3 Unfold-Fold Transformations

A completely different family of deforestation methods is based on Burstall and Darlington’s unfold/fold transformation system [BD77]. These methods are not concerned with any specific recursion schemes. Instead they transform a term by a number of rewriting steps, most of which resemble the usual evaluation rules of the operational semantics. Central steps are the unfolding of recursive definitions and the creation of new recursive definitions, called folding. We illustrate the idea of these methods by means of an example. Let a program contain the following recursive definitions:

or :: [Bool] -> Bool
or = \us -> case us of
              []   -> True
              v:vs -> v || or vs

map :: (a -> b) -> [a] -> [b]
map = \f ws -> case ws of
                 []   -> []
                 y:ys -> f y : map f ys

Note that these definitions do not use any higher-order function such as foldr that encapsulates a recursion scheme. The term or (map p xs) constructs an intermediate list at runtime. The term is deforested by the steps given in Figure 7.1. The most difficult step is obviously the folding step at the end. If unfolding and evaluation continued with the subterm or (map p ys), then the deforestation transformation would never terminate. The transformation system has to recognise that or (map p ys) is, except for the argument ys, the same as the term it started with. Hence it can create a new recursive definition and thus terminate the deforestation process.

    or (map p xs)
{ unfolding of definition of or }
    (\us -> case us of { [] -> True; v:vs -> v || or vs }) (map p xs)
{ evaluation }
    case (map p xs) of
      []   -> True
      v:vs -> v || or vs
{ unfolding of definition of map }
    case (\f ws -> case ws of { [] -> []; y:ys -> f y : map f ys }) p xs of
      []   -> True
      v:vs -> v || or vs
{ evaluation }
    case (case xs of { [] -> []; y:ys -> p y : map p ys }) of
      []   -> True
      v:vs -> v || or vs
{ case-of-case transformation }
    case xs of
      []   -> case [] of { [] -> True; v:vs -> v || or vs }
      y:ys -> case (p y : map p ys) of { [] -> True; v:vs -> v || or vs }
{ evaluation }
    case xs of
      []   -> True
      y:ys -> p y || or (map p ys)
{ folding with a new definition g }
    g p xs
      where g = \p xs -> case xs of
                           []   -> True
                           y:ys -> p y || g p ys

Figure 7.1: Steps of unfold/fold deforestation of or (map p xs)
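For reference, the final term of Figure 7.1 can be written as an ordinary Haskell program. The following transcription is ours (the name orMap and the renamed parameters of g are our own; the base case True is kept exactly as in the definitions above):

-- Transcription (ours) of the result of Figure 7.1: the new recursive
-- definition g traverses xs directly, without building the intermediate list.
orMap :: (a -> Bool) -> [a] -> Bool
orMap p xs = g p xs
  where g q ys = case ys of
                   []   -> True
                   z:zs -> q z || g q zs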

Short cut deforestation and hylomorphism fusion are based on fusion of a data structure consumer with a data structure producer. In contrast, unfold/fold based deforestation methods always transform a whole term; consumers and producers are not identified. Unfold/fold based deforestation was introduced by Wadler [Wad88, Wad90], who also coined the term deforestation, and is based on his earlier work on listlessness [Wad84, Wad85]. Although he suggests extensions, he defines the system only for a subset of a first-order functional language. Subsequently the system was extended first to full first-order languages [HJ92, CK95] and then to higher-order languages [MW92, Mar95, Chi92, Chi94, Ham96, SS98]. The transformation steps of unfold/fold deforestation perform call-by-name evaluation and hence may duplicate terms that have to be evaluated. We already discussed this issue in Section 5.2.4 to show that it is not a problem for short cut deforestation. The main problem of unfold/fold deforestation is to ensure that the transformation terminates on any input program. This has become the main topic of research on deforestation [FW88, Sør94, Sei96, SS97, SS98]. [Ham96] relates several deforestation techniques. [SGJ94] compares deforestation with other partial evaluation transformations. [San96a] gives the first proof of semantic correctness of unfold/fold based deforestation; it is also a good introduction to this deforestation method. It has not yet been formally proved that unfold/fold deforestation always improves the performance of a program, but because of the similarity of most transformation steps to well-known evaluation rules a proof seems feasible. A prototype of Wadler’s original system was implemented in [Dav87]. Later, Marlow extended the Glasgow Haskell compiler by higher-order deforestation. The implementation was, however, not generally usable, because it could not control inlining to avoid code explosion [Mar95]. The only implementation of unfold/fold deforestation in a real compiler that we are aware of is for the strict functional-logic language Mercury [Tay98]. However, it does not perform any deforestation across module boundaries and its effect on real programs has not yet been investigated.

7.4 Attribute Grammar Composition

Attribute grammars extend context-free grammars by a set of attributes for every symbol and a set of rules for determining these attributes for every grammar production. They are mostly used in the design and implementation of compilers [ASU86]. Because of their similarity to functional languages they are sometimes considered as a functional programming paradigm [Joh87]. As for functional languages, the question arises whether two attribute grammars can be transformed into a single attribute grammar that computes the same function as the composition of the two original grammars. Such composition techniques for certain classes of attribute grammars are presented in [Gan83, Gie88]. [CDPR97] compares these composition techniques with various deforestation techniques for functional programs, and [CDPR99] discusses how functional programs can be translated into attribute grammars, how these can be composed, and how the result can finally be retranslated into a functional program. There exist various modifications and extensions of the attribute grammar formalism. Attributed tree transducers abstract from the language production aspect of a context-free grammar and just describe functions that map trees to trees instead. Macro tree transducers extend the limited recursion scheme of attribute grammars, and macro attributed tree transducers [KV94] are an integration of attributed tree transducers and macro tree transducers. The set of functions definable by any of these formalisms is not closed under composition, but with some additional restrictions various composition techniques have been developed [Rou70, Fül81, EV85, FV98]. [Küh98] transfers the composition of attributed tree transducers to functional programs. [Küh99] compares Wadler’s unfold/fold deforestation technique with composition techniques for macro tree transducers, and [Höf99] extends this comparison to nonlinear programs. The various composition methods rely on the restricted recursion schemes of the formalisms. It remains to be seen whether the methods can be generalised to arbitrary functional programs or suggest improvements to the standard methods.

7.5 Further Methods

The short cut fusion rule and its generalisations to other data types, the acid rain rules, are parametricity theorems of a polymorphic producer (or consumer, in the case of the anamorphism fusion rule). However, foldr and cata are polymorphic themselves. The catamorphism fusion rule is the parametricity theorem of catamorphisms [MFP91, MH95]:

h strict      h . φ  =  φ′ . fmap h
───────────────────────────────────
      h . (cata φ)  =  cata φ′

The application of this rule to deforestation is studied in [SF93, FSZ94], and warm fusion also uses it. [Feg96] describes an even more ambitious method of deriving a polymorphic skeleton from a recursive definition and using its parametricity theorem for fusion. The sketched algorithms are still far from an implementation. A type-inference based algorithm may be suitable for deriving the desired polymorphic recursion skeletons. Gill lists several more deforestation methods, especially for lists, in his thesis [Gil96].
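As a hedged illustration: at the list type cata is foldr, and the rule reads: if h is strict, h z = z′ and h (c x y) = c′ x (h y), then h . foldr c z = foldr c′ z′. A tiny instance (our own example, not from the thesis):

-- Instance (ours) of the catamorphism fusion rule: h = (2*) is strict,
-- 2 * 0 = 0 and 2 * (x + y) = 2*x + 2*y, so taking c' x y = 2*x + y gives
-- (2*) . foldr (+) 0  =  foldr (\x y -> 2*x + y) 0.
doubleSum :: [Int] -> Int
doubleSum = foldr (\x y -> 2 * x + y) 0   -- equal to (2*) . sum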

Generally, deforestation methods can be divided into two classes. First, there are the unfold/fold based deforestation methods. Second, there are methods such as short cut deforestation which abandon the treatment of generally recursive definitions in favour of some “regular” forms of recursion. These methods use fusion laws based on parametricity theorems. Authors take different points of view on how the “regular” forms of recursion are obtained, that is, whether they need to be provided by the programmer or are automatically inferred from arbitrary recursive programs by another transformation. The composition methods of attribute grammars and related formalisms are on the borderline between the two deforestation classes. However, because they rely on the restricted recursion schemes of the formalisms, they resemble the “regular” recursion methods more than the unfold/fold deforestation methods.

Chapter 8

Conclusions

The starting-point of this thesis was the observation that the short cut fusion rule for removing an intermediate list is a parametricity theorem. If we abstract the list constructors (:) and [] from a list producer so that the remaining producer skeleton has a given polymorphic type, then we can apply the short cut fusion rule, which can fuse any list consumer in foldr form with the producer. The producer skeleton must be polymorphic to ensure type-correctness of the transformation. Conversely, the polymorphic type of the producer skeleton already guarantees that we have abstracted exactly the constructors which construct the intermediate list and hence the transformation is semantically correct. So we use a type inference algorithm to determine the occurrences of list constructors which have to be abstracted to obtain the polymorphic producer skeleton. On the basis of the well-known Hindley-Milner type inference algorithm we developed a suitable type inference algorithm. It is at the heart of the list abstraction algorithm which yields the polymorphic producer skeleton and which we have presented in Chapter 4.

The original short cut deforestation algorithm can only apply the short cut fusion rule when the producer is already given in terms of its polymorphic producer skeleton. The special second-order typed function build is used to mark a polymorphic producer skeleton. As we argued in the introduction, no programmer should be required to use build. Therefore the short cut deforestation algorithm currently implemented in the Glasgow Haskell compiler restricts deforestation to list-producing functions from the standard libraries, which are defined in terms of build by the compiler writer. Our new type-inference based algorithm for deriving the polymorphic producer skeleton renders hand-crafted polymorphic producer skeletons unnecessary. Thus we can apply short cut fusion also to user-defined list producers.

To enable deforestation across module boundaries without large-scale inlining we adapted Gill’s worker/wrapper scheme to our buildless setting. Here the benefit of not needing build showed up: our worker/wrapper scheme is more flexible in that it can handle list-producing functions which cannot be defined in terms of build, as we discussed in Section 7.1.2. For example, we can handle producers of several lists and partial list producers such as (++). We gave algorithms to automatically split a recursive function definition into a worker definition and a wrapper definition. These algorithms are again based on the list abstraction algorithm. Altogether we have defined a new short cut deforestation algorithm based on type inference.
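As a small self-contained sketch (with our own names upto and uptoSkel, not taken from the thesis), list abstraction and the resulting short cut fusion look as follows in Haskell:

-- uptoSkel is the polymorphic producer skeleton obtained by abstracting
-- the list constructors (:) and [] from the producer upto.
upto :: Int -> Int -> [Int]
upto l m = uptoSkel l m (:) []

uptoSkel :: Int -> Int -> (Int -> c -> c) -> c -> c
uptoSkel l m c n = go l
  where go i = if i > m then n else i `c` go (i + 1)

-- Short cut fusion:  foldr k z (uptoSkel l m (:) [])  =  uptoSkel l m k z.
-- Fusing the foldr consumer sum with the producer removes the list:
sumUpto :: Int -> Int -> Int
sumUpto l m = uptoSkel l m (+) 0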

From the principal typing property of the type inference algorithm we were able to deduce in Section 4.5 the completeness of our list abstraction algorithm. Whereas all purely syntax-directed deforestation methods raise the question of when and how often they are successful, we know that, if a list can be abstracted from a producer by abstracting its list constructors, then our list abstraction algorithm will abstract it. Hence our deforestation algorithm can fuse a foldr consumer with nearly any producer. The main reason why list constructors might not be abstractable from a producer is that the list constructors do not occur in the producer expression but in the definition of a function which is called by the producer. The worker/wrapper scheme ensures that these list constructors are moved to the producer and hence list abstraction becomes possible. In Section 3.4.3 we gave an example of a producer which cannot be list abstracted, because it recursively consumes its own result but is not recursively defined. This producer would require some substantial transformation, which we do not perform, before its result list constructors could be abstracted.

The major advantage of our deforestation method compared to other methods such as warm fusion, hylomorphism fusion and unfold/fold deforestation (Chapter 7) is that it is simple, nearly as simple as the more restrictive original short cut deforestation method. Our type inference algorithm is simpler than the well-known Hindley-Milner type inference algorithm and most other parts of our deforestation algorithm are straightforward. The simplicity has several consequences:

• We were able to give a complete formal proof of the semantic correctness of our list abstraction algorithm (Chapter 4) and of most of the remaining parts of the deforestation algorithm.

• Its simplicity also makes it easier to ensure that the transformation always improves the performance of a program. We both discussed the effect of the transformation with the help of a formal cost model and measured it. In contrast, the effect of warm fusion and hylomorphism fusion on program performance has not been studied. Already our simple worker/wrapper split has a small negative effect on program performance. We detected that the hylomorphism derivation algorithm uses common subexpression elimination, which may degrade performance. Hence it seems unlikely that complex transformations such as warm fusion and hylomorphism fusion do not sometimes degrade performance. A formal proof of improvement of our algorithm could give more insights but will still require substantial research into appropriate proof methods. Even then measurements are important, if only to validate the appropriateness of the cost model. Our measurements already suggest that the size of heap cells, which our cost model abstracts from, is relevant in practice.

• Our deforestation algorithm is also transparent [Mar95], that is, the programmer can easily determine from the source program where deforestation will be applicable. Our deforestation algorithm requires the consumer of an intermediate list to be defined in terms of foldr. It can handle nearly any producer. In our experience a programmer can usually see when a producer cannot be list abstracted without having to go through the details of type inference.

• Our deforestation algorithm is efficient. We noted in Section 4.5 that the list abstraction algorithm can be implemented to work in linear time.
  We noted in Section 6.3 that the worker/wrapper split algorithm has a higher complexity. However, on the one hand we do not believe that the worst case occurs in practice, and on the other hand we suggested a path to a more efficient algorithm by integrating type inference and worker/wrapper split.

• Compared to other deforestation methods our deforestation algorithm is simple to implement. We derived the prototype implementation of the list abstraction algorithm in a straightforward way from the formal description. Because we concentrated on the theoretical foundations of the deforestation algorithm, especially its completeness and correctness, a full implementation was unfortunately outside the scope of the thesis.

The measurements of Section 5.3 suggest that the secondary effect of deforestation, that is, enabling further optimisations, is important. So workers should be inlined as far as possible. To ensure this, the ability of our type-inference based method to clearly indicate which definitions need to be inlined may prove useful; such control over inlining has long been sought in deforestation [Mar95, NP98]. In general, an optimising compiler for a high-level programming language has to undo the abstractions of a program. Hence it has to avoid unnecessary inlining but nonetheless transform a modular, concise program into efficient, longer code.

We presented our deforestation method for the second-order typed language F, because it is small and similar to the intermediate language of the Glasgow Haskell compiler. However, the full second-order type system is not necessary for our list abstraction algorithm. The parametricity theorems and hence the short cut fusion rule are valid for any subsystem. The list abstraction algorithm just requires the possibility to abstract type variables, not necessarily even explicitly in the term language. Because short cut deforestation is a local transformation, it is also desirable that all let-defined variables have an explicit type annotation. Then the typing environment of a term can be constructed without type inference for the whole program. So a Hindley-Milner typed language can be used as well. The deforestation algorithm could even be defined directly for Haskell programs; we made use of this in the queens example in Section 5.3. Only the class system and naturally the size of the language will complicate the algorithm.

In Chapter 6 we suggested several directions for future extensions of our deforestation method. In particular, we noted that our list abstraction algorithm is not specific to lists but can be used for nearly arbitrary recursive data types. Our work started with the observation that the short cut fusion rule is the parametricity theorem of the polymorphic type of the producer skeleton. There are also many other parametricity theorems which might be exploited for type-inference based program transformations. Hylomorphism fusion uses two parametricity theorems. [Feg96] suggests deriving a polymorphic skeleton from a recursive definition and using its parametricity theorem for fusion. The transformation system in [TH98] is based on the parametricity theorem of the fixpoint combinator.

From a wider perspective we can say that we used type inference as an analysis which steers the deforestation transformation. In recent years an increasing number of type systems (including type and effect systems) have been developed for various kinds of program analysis [NNH99]. These systems use type systems to describe properties by syntactic objects, relate them to program expressions by structural rules, and use inference algorithms based on unification or constraint solving for analysis. However, they are non-standard type systems, that is, their types do not coincide with the types of the programming language. Here we showed that standard polymorphic type systems can be applied to infer useful information about a program. We believe that there are many other possible applications of polymorphic type inference to program analysis. List abstraction can be seen as based on exchanging the standard representation of a list type [τ] for a representation as a function of type ∀γ. (τ → γ → γ) → γ → γ. A wrapper converts the latter into the former, as the sketch below illustrates.
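A hedged Haskell rendering of this alternative representation (the names ChurchList, toList and fromList are ours):

{-# LANGUAGE RankNTypes #-}

-- A value of type ChurchList t is a producer skeleton waiting for the list
-- constructors; the wrapper toList converts it back to an ordinary list by
-- passing the real constructors (:) and [].
type ChurchList t = forall g. (t -> g -> g) -> g -> g

toList :: ChurchList t -> [t]
toList l = l (:) []

fromList :: [t] -> ChurchList t   -- the inverse direction, via foldr
fromList xs c n = foldr c n xs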
It would be worthwhile to enquire how closely our work is related to representation analysis. In [Hal94] Hindley-Milner type inference is used to choose efficient representations of lists. In general, type inference determines some kind of data flow. Hence a strand of future research could be to examine the relationship of polymorphic type inference to data and control flow analysis.

Bibliography

[Abr90] Samson Abramsky. The lazy lambda-calculus. In Research topics in Functional Programming, pages 65–117. Addison-Wesley, 1990.

[ASS96] Harold Abelson, Gerald Jay Sussman, and Julie Sussman. Structure and Interpretation of Computer Programs. McGraw Hill, second edition, 1996.

[ASU86] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.

[Aug87] Lennart Augustsson. Compiling Lazy Functional Languages, Part II. PhD thesis, Department of Computer Science, Chalmers University, Sweden, November 1987.

[Bar81] Henk P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. North- Holland, 1981.

[Bar92] Henk P. Barendregt. Lambda calculi with types. In S. Abramsky, M. Gabbay, and T.S.E. Maibaum, editors, Handbook of Logic in Computer Science, volume 2, pages 117–309. Oxford University Press, 1992.

[BD77] Rod M. Burstall and John Darlington. A transformation system for developing recursive programs. Journal of the ACM, 24(1):44–67, January 1977.

[BS94] Franz Baader and Jörg H. Siekmann. Unification theory. In D. M. Gabbay, C. J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, pages 41–125. Oxford University Press, Oxford, UK, 1994.

[BS98] Erik Barendsen and Sjaak Smetsers. Strictness typing. In Draft Proceedings of the 10th International Workshop on Implementation of Functional Languages, pages 101–115, University College London, UK, 1998.

[BW87] Richard Bird and Philip Wadler. Introduction to Functional Programming. International Series in Computer Science. Prentice-Hall, 1987.

[Car97] Luca Cardelli. Type systems. In Allen B. Tucker, editor, The Computer Science and Engineering Handbook, chapter 103, pages 2208–2236. CRC Press, 1997. 138 Bibliography

[CDPR97] Loïc Correnson, Etienne Duris, Didier Parigot, and Gilles Roussel. Attribute grammars and functional programming deforestation. In Fourth International Static Analysis Symposium – Poster Session, LNCS 1302, September 1997.

[CDPR99] Loïc Correnson, Etienne Duris, Didier Parigot, and Gilles Roussel. Declarative program transformation: a deforestation case-study. In Principles and Practice of Declarative Programming, International Conference PPDP’99, LNCS 1702. Springer, 1999.

[CF58] Haskell Brooks Curry and Robert Feys. Combinatory Logic, Volume I. Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, 1958.

[Chi92] Wei-Ngan Chin. Safe fusion of functional expressions. In Jon L. White, editor, Proceedings of the ACM Conference on LISP and Functional Programming, pages 11–20. ACM Press, June 1992.

[Chi94] Wei-Ngan Chin. Safe fusion of functional expressions II: Further improvements. Journal of Functional Programming, 4(4):515–555, October 1994.

[Chi97] Olaf Chitil. Adding an optimisation pass to the Glasgow Haskell compiler. Available from http://www.haskell.org/ghc, November 1997.

[Chi98] Olaf Chitil. Common subexpressions are uncommon in lazy functional languages. In Proceedings of the 9th International Workshop on Implementation of Functional Languages 1997, LNCS 1467, pages 53–71. Springer, 1998.

[Chi99] Olaf Chitil. Type inference builds a short cut to deforestation. ACM SIGPLAN Notices, 34(9):249–260, September 1999. Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP ’99).

[Chi00] Olaf Chitil. Type-inference based short cut deforestation (nearly) without inlining. In Proceedings of the 11th International Workshop on Implementation of Functional Languages 1999, LNCS. Springer, 2000. To appear.

[CK95] Wei-Ngan Chin and Siau-Cheng Khoo. Better consumers for deforestation. In 7th International Symposium on Programming Languages: Implementation, Logics and Programs, LNCS 982, pages 223–240, 1995.

[Dav87] Ken Davies. Deforestation: Transformation of functional programs to eliminate intermediate trees. Master’s thesis, Programming Research Group, Oxford University, September 1987.

[DJ90] Nachum Dershowitz and Jean-Pierre Jouannaud. Rewrite systems. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, pages 243–320. Elsevier, Amsterdam, 1990.

[DM82] Luis Damas and Robin Milner. Principal type-schemes for functional programs. In Conference Record of the Ninth Annual ACM Symposium on Principles of Programming Languages, pages 207–212. ACM Press, January 1982.

[Ede85] Elmar Eder. Properties of substitutions and unifications. Journal of Symbolic Computation, 1(1):31–46, March 1985.

[EL99] Martin Emms and Hans Leiß. Extending the type checker of Standard ML by polymorphic recursion. Theoretical Computer Science, 212(1–2):157–181, February 1999.

[EV85] Joost Engelfriet and Heiko Vogler. Macro tree transducers. Journal of Computer and System Sciences, 31:71–145, 1985.

[Feg93] Leonidas Fegaras. Efficient optimization of iterative queries. In Fourth International Workshop on Database Programming Languages, pages 200–225. Springer, Workshops on Computing, 1993.

[Feg96] Leonidas Fegaras. Fusion for free! Technical Report CSE-96-001, Oregon Graduate Institute of Science and Technology, January 1996.

[FH88] Anthony J. Field and Peter G. Harrison. Functional Programming. Addison-Wesley, 1988.

[FS96] Leonidas Fegaras and Tim Sheard. Revisiting catamorphisms over datatypes with embedded functions (or, Programs from outer space). In Conference Record of POPL ’96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 284–294, 1996.

[FSZ94] Leonidas Fegaras, Tim Sheard, and Tong Zhou. Improving programs which recurse over multiple inductive structures. In ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pages 21–32, 1994.

[Fül81] Zoltán Fülöp. On attributed tree transducers. Acta Cybernetica, 5:261–279, 1981.

[FV98] Zoltán Fülöp and Heiko Vogler. Syntax-directed semantics — Formal models based on tree transducers. Monographs in Theoretical Computer Science, An EATCS Series. Springer, 1998.

[FW87] John Fairbairn and Stuart Wray. Tim: A simple, lazy abstract machine to execute supercombinators. In G. Kahn, editor, Functional Programming Languages and Computer Architecture, LNCS 274, pages 34–46. Springer, 1987.

[FW88] Alistair B. Ferguson and Philip Wadler. When will deforestation stop? In Functional Programming, Glasgow 1988. Workshops in Computing, pages 39–56, August 1988.

[Gan83] Harald Ganzinger. Increasing modularity and language-independency in automatically generated compilers. Science of Computer Programming, 3:223–278, 1983.

[GHC] The Glasgow Haskell compiler. http://www.haskell.org/ghc/.

[Gie88] Robert Giegerich. Composition and evaluation of attribute coupled grammars. Acta Informatica, 25:355–423, 1988.

[Gil96] Andrew J. Gill. Cheap Deforestation for Non-strict Functional Languages. PhD thesis, Glasgow University, 1996. 140 Bibliography

[Gir72] Jean-Yves Girard. Interprétation fonctionelle et élimination des coupures dans l’arithmétique d’ordre supérieur. PhD thesis, Université Paris VII, 1972.

[GLP93] Andrew Gill, John Launchbury, and Simon L. Peyton Jones. A Short Cut to Deforestation. In FPCA’93, Conference on Functional Programming Languages and Computer Architecture, pages 223–232. ACM Press, 1993.

[Gol81] Warren D. Goldfarb. The undecidability of the second-order unification problem. Theoretical Computer Science, 13(2):225–230, February 1981.

[Gor94a] Andrew Gordon. A tutorial on co-induction and functional programming. In Glasgow functional programming workshop, pages 78–95. Springer Work- shops in Computing, 1994.

[Gor94b] Andrew D. Gordon. Functional Programming and Input/Output. Distinguished Dissertations in Computer Science. CUP, September 1994.

[GP94] Andrew J. Gill and Simon L. Peyton Jones. Cheap deforestation in practice: An optimiser for Haskell. In Bjørn Pehrson and Imre Simon, editors, Proceedings of the IFIP 13th World Computer Congress. Volume 1: Technology and Foundations, pages 581–586. Elsevier Science Publishers, 1994.

[GP99] Murdoch J. Gabbay and Andrew M. Pitts. A new approach to abstract syntax involving binders. In 14th Annual Symposium on Logic in Computer Science, pages 214–224. IEEE Computer Society Press, Washington, 1999.

[GS99] Jörgen Gustavsson and David Sands. A foundation for space-safe transformations of call-by-need programs. In A. D. Gordon and A. M. Pitts, editors, The Third International Workshop on Higher Order Operational Techniques in Semantics, volume 26 of Electronic Notes in Theoretical Computer Science. Elsevier, 1999.

[Gun92] Carl A. Gunter. Semantics of programming languages: structures and techniques. Foundations of computing. MIT Press, 1992.

[Hal94] Cordelia V. Hall. Using Hindley Milner type inference to optimise list representation. In Conference on Lisp and Functional Programming, 1994.

[Ham96] Geoff W. Hamilton. Higher order deforestation. In 8th International Symposium PLILP’96, LNCS 1140, pages 213–227, 1996.

[Hen90] Matthew Hennessy. The Semantics of Programming Languages: An Elementary Introduction using Structural Operational Semantics. John Wiley and Sons, 1990.

[Hen93] Fritz Henglein. Type inference with polymorphic recursion. ACM Transactions on Programming Languages and Systems, 15(2):253–289, 1993.

[Hin69] J. Roger Hindley. The principal type-scheme of an object in combinatory logic. Trans. Amer. Math. Soc, 146:29–60, 1969.

[HIT96] Zhenjiang Hu, Hideya Iwasaki, and Masato Takeichi. Deriving structural hylomorphisms from recursive definitions. In Proceedings 1st ACM SIGPLAN Intl. Conf. on Functional Programming, ICFP’96, pages 73–82. ACM Press, 1996.

[HIT97] Zhenjiang Hu, Hideya Iwasaki, and Masato Takeichi. An extension of the acid rain theorem. In T. Ida, A. Ohori, and M. Takeichi, editors, Proceedings 2nd Fuji Intl. Workshop on Functional and Logic Programming, pages 91–105, Singapore, 1997. World Scientific.

[HJ92] Geoff W. Hamilton and Simon B. Jones. Extending deforestation for first order functional programs. In R. Heldal, C. K. Holst, and P. L. Wadler, editors, Functional Programming, Glasgow 1991: Proceedings of the 1991 Workshop, pages 134–145. Springer, 1992.

[Höf99] Matthias Höff. Vergleich von Verfahren zur Elimination von Zwischenergebnissen bei funktionalen Programmen. Master’s thesis, Dresden University of Technology, Department of Computer Science, September 1999.

[HTC98] Zhenjiang Hu, Masato Takeichi, and Wei-Ngan Chin. Parallelization in calculational forms. In Conference Record of POPL ’98: The 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 316–328, 1998.

[Hug89] John Hughes. Why Functional Programming Matters. Computer Journal, 32(2):98–107, 1989.

[IHT98] Hideya Iwasaki, Zhenjiang Hu, and Masato Takeichi. Towards manipulation of mutually recursive definitions. In Proceedings 3rd Fuji Intl. Workshop on Functional and Logic Programming, 1998.

[Jim96] Trevor Jim. What are principal typings and what are they good for? In Conference Record of POPL ’96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 42–53, 1996.

[Joh84] Thomas Johnsson. Efficient compilation of lazy evaluation. In Proceedings of the SIGPLAN ’84 Symposium on Compiler Construction, pages 58–69. ACM Press, 1984.

[Joh87] Thomas Johnsson. Attribute grammars as a functional programming paradigm. In Gilles Kahn, editor, Proceedings of the Conference on Functional Programming Languages and Computer Architecture, LNCS 274, pages 154–173. Springer, 1987.

[Jon95] Mark P. Jones. Functional programming with overloading and higher-order polymorphism. In First International Spring School on Advanced Functional Programming Techniques, LNCS 925, 1995.

[Jür99] Claus Jürgensen. Kategorientheoretisch fundierte Programmtransformationen für Haskell. Master’s thesis, RWTH Aachen, 1999.

[Jür00] Claus Jürgensen. A formalization of hylomorphism based deforestation with an application to an extended typed λ-calculus. Technical Report TUD-FI00-13, Technische Universität Dresden, Fakultät Informatik, November 2000.

[KCR98] Richard Kelsey, William Clinger, and Jonathan Rees. Revised5 report on the algorithmic language Scheme. ACM SIGPLAN Notices, 33(9):26–76, September 1998. With H. Abelson, N. I. Adams, IV, D. H. Bartley, G. Brooks, R. K. Dybvig, D. P. Friedman, R. Halstead, C. Hanson, C. T. Haynes, E. Kohlbecker, D. Oxley, K. M. Pitman, G. J. Rozas, G. L. Steele, Jr., G. J. Sussman, and M. Wand.

[KL89] Philip J. Koopman, Jr. and Peter Lee. A fresh look at combinator graph reduction (or, having a TIGRE by the tail). In Proceedings of the SIGPLAN ’89 Conference on Programming Language Design and Implementation, pages 110–119, July 1989.

[Klo92] J. W. Klop. Term rewriting systems. In S. Abramsky, M. Gabbay, and T.S.E. Maibaum, editors, Handbook of Logic in Computer Science, volume 2, pages 1–116. Oxford University Press, 1992.

[KTU93] Assaf J. Kfoury, Jerzy Tiuryn, and Paweł Urzyczyn. Type reconstruction in the presence of polymorphic recursion. ACM Transactions on Programming Languages and Systems, 15(2):290–311, 1993.

[Küh98] Armin Kühnemann. Benefits of tree transducers for optimizing functional programs. In V. Arvind and R. Ramanujam, editors, Foundations of Software Technology & Theoretical Computer Science – FST & TCS’98, LNCS 1530, pages 146–157. Springer, December 1998.

[Küh99] Armin Kühnemann. Comparison of deforestation techniques for functional programs and for tree transducers. In A. Middeldorp and T. Sato, editors, 4th Fuji International Symposium on Functional and Logic Programming – FLOPS’99, LNCS 1722, pages 114–130. Springer, November 1999.

[KV94] Armin Kühnemann and Heiko Vogler. Synthesized and inherited functions — a new computational model for syntax-directed semantics. Acta Informatica, 31:431–477, 1994.

[Las98] Søren B. Lassen. Relational Reasoning about Functions and Nondeterminism. PhD thesis, Department of Computer Science, University of Aarhus, February 1998.

[Lau93] John Launchbury. A natural semantics for lazy evaluation. In Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 144–154. ACM Press, 1993.

[LP94] John Launchbury and Simon L. Peyton Jones. Lazy functional state threads. ACM SIGPLAN Notices, 29(6):24–35, June 1994.

[LS95] John Launchbury and Tim Sheard. Warm fusion: Deriving build-catas from recursive definitions. In Conf. Record 7th ACM SIGPLAN/SIGARCH Intl. Conf. on Functional Programming Languages and Computer Architecture, FPCA’95, pages 314–323. ACM Press, 1995.

[LY98] Oukseh Lee and Kwangkeun Yi. Proofs about a folklore let-polymorphic type inference algorithm. ACM Transactions on Programming Languages and Systems, 20(4):707–723, July 1998.

[Mar95] Simon Marlow. Deforestation for Higher-Order Functional Programs. PhD thesis, Glasgow University, 1995.

[MFP91] Erik Meijer, Maarten Fokkinga, and Ross Paterson. Functional programming with bananas, lenses, envelopes and barbed wire. In John Hughes, editor, Functional Programming Languages and Computer Architecture, LNCS 523, pages 124–144. Springer, June 1991.

[MH95] Erik Meijer and Graham Hutton. Bananas in space: Extending fold and unfold to exponential types. In Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture (FPCA’95), pages 324–333. ACM Press, 1995.

[Mil78] Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348–375, December 1978.

[Mit96] John C. Mitchell. Foundations for Programming Languages. MIT Press, 1996.

[MM82] Alberto Martelli and Ugo Montanari. An efficient unification algorithm. ACM Transactions on Programming Languages and Systems, 4(2):258–282, April 1982.

[MR92] QingMing Ma and John C. Reynolds. Types, abstraction and parametric polymorphism, part 2. In S. Brookes, M. Main, A. Melton, M. Mislove, and D. A. Schmidt, editors, Proceedings 7th Intl. Conf. on Math. Foundations of Programming Semantics, MFPS’91, LNCS 598, pages 1–40. Springer, Berlin, 1992.

[MS99] Andrew Moran and David Sands. Improvement in a lazy context: An operational theory for call-by-need. In Proc. POPL’99, the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 43–56. ACM Press, January 1999.

[MTHM97] Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. The Definition of Standard ML (Revised). MIT Press, 1997.

[MW92] Simon Marlow and Philip Wadler. Deforestation for higher-order functions. In J. Launchbury and P. M. Sansom, editors, 1992 Glasgow Workshop on Functional Programming, Workshops in Computing, pages 154–165. Springer, 1992.

[Myc84] Alan Mycroft. Polymorphic type schemes and recursive definitions. In M. Paul and B. Robinet, editors, Proceedings of the International Symposium on Programming, LNCS 167, pages 217–228. Springer, April 1984.

[NAH+95] Rishiyur S. Nikhil, Arvind, James E. Hicks, Shail Aditya, Lennart Augustsson, Jan-Willem Maessen, and Y. Zhou. pH language reference manual, version 1.0. Technical Report CSG-Memo-369, Massachusetts Institute of Technology, Laboratory for Computer Science, January 1995.

[NNH99] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of Program Analysis. Springer, 1999.

[NP98] László Németh and Simon Peyton Jones. A design for warm fusion. In Draft Proceedings of the 10th International Workshop on Implementation of Functional Languages, pages 381–393, 1998.

[OHIT97] Y. Onue, Zhenjiang Hu, Hideya Iwasaki, and Masato Takeichi. A calculational fusion system HYLO. In R. Bird and L. Meertens, editors, Proceedings IFIP TC 2 WG 2.1 Working Conf. on Algorithmic Languages and Calculi, pages 76–106, London, 1997. Chapman & Hall.

[Par94] Will Partain. How to add an optimisation pass to the Glasgow Haskell compiler. Part of the Glasgow Haskell Compiler distribution, October 1994.

[Pey87] Simon L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall, 1987.

[Pey92] Simon L. Peyton Jones. Implementing lazy functional languages on stock hardware: The spineless tagless G-machine. Journal of Functional Programming, 2(2):127–202, April 1992.

[PG00] Andrew M. Pitts and Murdoch J. Gabbay. A metalanguage for programming with bound names modulo renaming. In R. Backhouse and J. N. Oliveira, editors, Mathematics of Program Construction, MPC2000, Proceedings, LNCS. Springer, 2000. To appear.

[PH+99] Simon L. Peyton Jones, John Hughes, et al. Haskell 98: A non-strict, purely functional language. http://www.haskell.org, February 1999.

[Pit94] Andrew M. Pitts. Some notes on inductive and co-inductive techniques in the semantics of functional programs. Notes Series BRICS-NS-94-5, BRICS, Department of Computer Science, University of Aarhus, December 1994. vi+135 pp, draft version.

[Pit97] Andrew M. Pitts. Operationally-based theories of program equivalence. In P. Dybjer and A. M. Pitts, editors, Semantics and Logics of Computation, Publications of the Newton Institute, pages 241–298. Cambridge University Press, 1997.

[Pit00] Andrew M. Pitts. Parametric polymorphism and operational equivalence. Mathematical Structures in Computer Science, 10:1–39, 2000. A preliminary version appeared in Proceedings, Second Workshop on Higher Order Operational Techniques in Semantics (HOOTS II), Electronic Notes in Theoretical Computer Science 10, 1998.

[PL91] Simon L. Peyton Jones and John Launchbury. Unboxed values as first class citizens in a non-strict functional language. In John Hughes, editor, Functional Programming Languages and Computer Architecture, LNCS 523, pages 636–666. Springer, June 1991.

[PM99] Simon L. Peyton Jones and Simon Marlow. Secrets of the Glasgow Haskell compiler inliner. IDL ’99, http://www.binnetcorp.com/wshops/IDL99.html, 1999.

[PS98] Simon L. Peyton Jones and André L. M. Santos. A transformation-based optimiser for Haskell. Science of Computer Programming, 32(1–3):3–47, September 1998.

[Pv98] Rinus Plasmeijer and Marko van Eekelen. The Concurrent Clean language report. http://www.cs.kun.nl/~clean, 1998.

[PW78] Michael S. Paterson and Mark N. Wegman. Linear unification. Journal of Computer and System Sciences, 16:158–167, 1978.

[Rey74] John C. Reynolds. Towards a theory of type structure. In Programming Symposium, Proceedings, Colloque sur la Programmation, LNCS 19, pages 408–425. Springer, 1974.

[Rey83] John C. Reynolds. Types, abstraction and parametric polymorphism. In R. E. A. Mason, editor, Information Processing 83, pages 513–523, Amsterdam, 1983. Elsevier Science Publishers B.V.

[Rey84] John C. Reynolds. Polymorphism is not set-theoretic. In G. Kahn, D. B. MacQueen, and G. Plotkin, editors, Proceedings Intl. Symp. on the Semantics of Data Types, LNCS 173, pages 145–156. 1984.

[Rob65] J. Alan Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12(1):23–41, January 1965.

[Rou70] William C. Rounds. Mappings and grammars on trees. Math. Syst. Theory, 4:257–287, 1970.

[San95] André Santos. Compilation by Transformation in Non-Strict Functional Languages. PhD thesis, Glasgow University, Department of Computing Science, 1995.

[San96a] David Sands. Proving the correctness of recursion-based automatic program transformations. Theoretical Computer Science, 167(1–2):193–233, October 1996.

[San96b] David Sands. Total correctness by local improvement in the transformation of functional programs. ACM Transactions on Programming Languages and Systems, 18(2):175–234, March 1996.

[San97] David Sands. Improvement theory and its applications. In Andrew D. Gordon and Andrew M. Pitts, editors, Higher Order Operational Techniques in Semantics. Cambridge University Press, 1997.

[San98] David Sands. Computing with contexts: A simple approach. In A. D. Gordon, A. M. Pitts, and C. L. Talcott, editors, Second Workshop on Higher-Order Operational Techniques in Semantics (HOOTS II), volume 10 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers B.V., 1998.

[Sch00] Jacob B. Schwartz. Eliminating intermediate lists in pH. Master’s thesis, MIT Department of Electrical Engineering and Computer Science, May 2000.

[Sei96] Helmut Seidl. Integer constraints to stop deforestation. In Hanne Riis Nielson, editor, Programming Languages and Systems—ESOP’96, 6th European Symposium on Programming, LNCS 1058, pages 326–340. Springer, 1996.

[Ses97] Peter Sestoft. Deriving a lazy abstract machine. Journal of Functional Pro- gramming, 7(3):231–264, May 1997.

[SF93] Tim Sheard and Leonidas Fegaras. A fold for all seasons. In Functional Programming and Computer Architecture, pages 233–242, June 1993.

[SGJ94] Morten Heine Sørensen, Robert Glück, and Neil D. Jones. Towards unifying deforestation, supercompilation, partial evaluation, and generalized partial computation. In D. Sannella, editor, European Symposium on Programming, LNCS 788, pages 485–500. Springer, 1994.

[SI96] Jill Seaman and S. Purushothaman Iyer. An operational semantics of sharing in lazy evaluation. Science of Computer Programming, 27(3):289–322, November 1996.

[Sør94] Morten Heine Sørensen. A grammar-based data-flow analysis to stop deforestation. In S. Tison, editor, Colloquium on Trees in Algebra and Programming, LNCS 787, pages 335–351. Springer, 1994.

[SP97] Patrick M. Sansom and Simon L. Peyton Jones. Formally based profiling for higher-order functional languages. ACM Transactions on Programming Languages and Systems, 19(2):334–385, March 1997.

[SS97] Helmut Seidl and Morten H. Sørensen. Constraints to stop higher-order deforestation. In Conference Record of POPL ’97: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 400–413. ACM Press, 1997.

[SS98] Helmut Seidl and Morten H. Sørensen. Constraints to stop deforestation. Science of Computer Programming, 32(1–3):73–107, September 1998.

[Tay98] Simon Taylor. Optimization of Mercury programs. Department of Computer Science, University of Melbourne, 1998. Honours report.

[TH98] Mark Tullsen and Paul Hudak. An intermediate meta-language for program transformation. Technical Report YALEU/DCS/RR-1154, Yale University, June 1998.

[THT98] Akihiko Takano, Zhenjiang Hu, and Masato Takeichi. Program transformation in calculational form. ACM Computing Surveys, 30(3es), September 1998. Article 7.

[TM95] Akihiko Takano and Erik Meijer. Shortcut deforestation in calculational form. In Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture (FPCA’95), pages 306–313. ACM Press, 1995.

[Tüf98] Christian Tüffers. Erweiterung des Glasgow-Haskell-Compilers um die Erkennung von Hylomorphismen. Master’s thesis, RWTH Aachen, 1998.

[TWM95] David N. Turner, Philip Wadler, and Christian Mossin. Once upon a type. In 7th International Conference on Functional Programming and Computer Architecture, 1995.

[Wad84] Philip Wadler. Listlessness is better than laziness: Lazy evaluation and garbage collection at compile time. In Conference Record of the 1984 ACM Sym- posium on Lisp and Functional Programming, pages 45–52. ACM Press, August 1984.

[Wad85] Philip Wadler. Listlessness is better than laziness II: composing listless functions. In H. Ganzinger and N. D. Jones, editors, Proc. Workshop on Programs as Data Objects, LNCS 217. Springer, 1985.

[Wad87] Philip Wadler. Compiling list comprehensions. Chapter 7 in [Pey87], 1987.

[Wad88] Philip Wadler. Deforestation: Transforming programs to eliminate trees. In H. Ganzinger, editor, Proceedings of the European Symposium on Programming, LNCS 300, pages 344–358. Springer, 1988.

[Wad89] Philip Wadler. Theorems for free! In 4th International Conference on Functional Programming Languages and Computer Architectures, pages 347–359. ACM Press, 1989.

[Wad90] Philip Wadler. Deforestation: Transforming programs to eliminate trees. Theoretical Computer Science, 73(2):231–248, June 1990.

[Wel94] Joe B. Wells. Typability and type-checking in the second-order λ-calculus are equivalent and undecidable. In Proceedings, Ninth Annual IEEE Symposium on Logic in Computer Science, pages 176–185. IEEE Computer Society Press, 1994.

[Win93] Glynn Winskel. The Formal Semantics of Programming Languages: An Introduction. Foundations of Computing series. MIT Press, February 1993.

[WP99] Keith Wansbrough and Simon Peyton Jones. Once upon a polymorphic type. In Conference Record of POPL ’99: The 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1999.

Appendix A

Rules

In this thesis we often use rules to state propositions and to define relations. A rule consists of a number of premise propositions Ai above a horizontal line and a single conclusion proposition B below the line:

A1  A2  ...  An
───────────────
       B

In principle it is just a more convenient form to write the proposition

∀x1, . . . , xk. A1 ∧ A2 ∧ ... ∧ An ⇒ B

where the universally quantified meta-variables x1, . . . , xk are all the meta-variables that occur free in A1, . . . , An and B. We use naming conventions for meta-variables to specify over which sets the variables are quantified. For example, l, m and n are natural numbers. The premise of a rule may be empty, thus expressing an axiom ∀x1, . . . , xk. B. Additionally, rules are understood as syntactic objects that are used to syntactically construct proofs of properties. A proof or derivation is a tree of properties with leaves at the top and a root at the bottom, where each property is obtained from the ones immediately above it by some rule. For example the rules

─────────          l ≤ m    m ≤ n
n ≤ n + 1    and   ──────────────
                        l ≤ n

can be used to construct the derivation

1 ≤ 2    2 ≤ 3
──────────────
     1 ≤ 3

We say that the root property 1 ≤ 3 is derived from the rules. Furthermore, we use rules to define sets, especially relations. A set R is (inductively) defined by a set or system of rules. These rules define R by defining all valid judgements x ∈ R. A judgement is valid if it can be derived from the rules of the system. For example, the two rules given before define the canonical order ≤ on natural numbers.
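As an aside, such a rule system corresponds directly to an inductive definition in a proof assistant; the following Lean sketch (ours, not part of the thesis) renders the two rules and the derivation of 1 ≤ 3:

-- Each rule becomes a constructor; rule induction is the generated
-- induction principle of the inductive predicate.
inductive Le : Nat → Nat → Prop
  | step  (n : Nat) : Le n (n + 1)                  -- axiom:  n ≤ n+1
  | trans {l m n : Nat} : Le l m → Le m n → Le l n  -- transitivity rule

-- the derivation of 1 ≤ 3 given in the text:
example : Le 1 3 := Le.trans (Le.step 1) (Le.step 2)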

A rule system to define a set R contains R only in judgements x ∈ R, not for example in a proposition x ∉ R. Thus it is assured that the set R is well-defined. R can be characterised as the least set for which all the rules of the system hold. We can prove by rule induction that a property holds for all elements of a set that we defined through a rule system. Rule induction means that we show for all rules that if the property holds for all elements of R mentioned in the premise judgements then it also holds for the element mentioned in the conclusion judgement. This proof method can be understood as an induction on the depth of the derivation of a judgement. We loosely speak of “induction on the rules”. More detailed descriptions of the use of rules for type systems are given in [Car97] and for operational semantics in [Hen90, Win93].

Appendix B

Standard List Functions Used in the Program Queens

The following definitions of standard list functions were used to measure the performance of the queens program without deforestation.

sum :: [Int] -> Int
sum = foldl' (+) 0

concat :: [[a]] -> [a]
concat = foldr (++) []

map :: (a -> b) -> [a] -> [b]
map f = foldr ((:) . f) []

concatMap :: (a -> [b]) -> [a] -> [b]
concatMap f = concat . map f

(++) :: [a] -> [a] -> [a]
xs ++ ys = foldr (:) ys xs

and :: [Bool] -> Bool
and = foldr (&&) True

zip :: [a] -> [b] -> [(a,b)]
zip (x:xs) (y:ys) = (x,y) : zip xs ys
zip _      _      = []

enumFrom :: Int -> [Int]
enumFrom l = l : (enumFrom $! (l+1))

enumFromTo :: Int -> Int -> [Int]
enumFromTo l m = takeWhile (<= m) (enumFrom l)

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p = foldr (\x fxs -> if p x then x:fxs else []) []

foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f a []     = a
foldl' f a (x:xs) = (foldl' f $! f a x) xs

Next the result of applying our deforestation algorithm to the definitions is shown. The definitions of all list-producing functions are split into worker and wrapper definitions. In the definitions of concatMap and enumFromTo deforestation removes an intermediate list. The static argument transformation performed in the definitions of zipW and enumFromW is discussed in Section 5.1.2.

sum :: [Int] -> Int
sum = foldl' (+) 0

concatW :: (a -> c -> c) -> c -> [[a]] -> c
concatW c n = foldr (appW c) n

concat :: [[a]] -> [a]
concat = concatW (:) []

mapW :: (b -> c -> c) -> c -> (a -> b) -> [a] -> c
mapW c n f = foldr (c . f) n

map :: (a -> b) -> [a] -> [b]
map = mapW (:) []

concatMapW :: (b -> c -> c) -> c -> (a -> [b]) -> [a] -> c
concatMapW c n f xs = mapW (appW c) n f xs

concatMap :: (a -> [b]) -> [a] -> [b]
concatMap = concatMapW (:) []

appW :: (a -> c -> c) -> [a] -> c -> c
appW c xs ys = foldr c ys xs

(++) :: [a] -> [a] -> [a]
(++) = appW (:)

and :: [Bool] -> Bool
and = foldr (&&) True

zipW :: ((a,b) -> c -> c) -> c -> [a] -> [b] -> c
zipW c n xs ys = go xs ys
  where go (x:xs) (y:ys) = c (x,y) (go xs ys)
        go _      _      = n

zip :: [a] -> [b] -> [(a,b)]
zip = zipW (:) []

enumFromW :: (Int -> c -> c) -> Int -> c
enumFromW c l = go l
  where go l = l `c` (go $! (l+1))

enumFrom :: Int -> [Int]
enumFrom = enumFromW (:)

enumFromToW :: (Int -> c -> c) -> c -> Int -> Int -> c
enumFromToW c n l m =
  enumFromW (\x fxs -> if (<= m) x then x `c` fxs else n) l

enumFromTo :: Int -> Int -> [Int]
enumFromTo = enumFromToW (:) []

takeWhileW :: (a -> c -> c) -> c -> (a -> Bool) -> [a] -> c
takeWhileW c n p = foldr (\x fxs -> if p x then x `c` fxs else n) n

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile = takeWhileW (:) []

foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f a []     = a
foldl' f a (x:xs) = (foldl' f $! f a x) xs

The following program is the result of applying only the worker/wrapper split algorithm to the program of Figure 5.4. So it represents an intermediate step between that program and the deforested program of Figure 5.5.

main = (print . sum . concatW (:) [] . queensW (:) [] (:) []) 10

queensW :: (Int -> c -> c) -> c -> (c -> d -> d) -> d -> Int -> d
queensW c1 n1 c2 n2 0 = c2 n1 n2
queensW c1 n1 c2 n2 m =
  concatMapW c2 n2
    (\p -> concatMapW (:) []
             (\l -> if safe p l then [appW c1 p (c1 l n1)] else [])
             (enumFromToW (:) [] 1 10))
    (queensW (:) [] (:) [] (m-1))

queens :: Int -> [[Int]]
queens = queensW (:) [] (:) []

safe :: [Int] -> Int -> Bool
safe p n = and (concatMapW (:) []
                  (\(i,j) -> [(j /= n) && (i+j /= m+n) && (i-j /= m-n)])
                  (zipW (:) [] (enumFromW (:) 1) p))
  where m = length p + 1

sum :: [Int] -> Int
sum = foldl' (+) 0

concatW :: (a -> c -> c) -> c -> [[a]] -> c
concatW c n = foldr (appW c) n

concat :: [[a]] -> [a]
concat = concatW (:) []

mapW :: (b -> c -> c) -> c -> (a -> b) -> [a] -> c
mapW c n f = foldr (c . f) n

map :: (a -> b) -> [a] -> [b]
map = mapW (:) []

concatMapW :: (b -> c -> c) -> c -> (a -> [b]) -> [a] -> c
concatMapW c n f = concatW c n . mapW (:) [] f

concatMap :: (a -> [b]) -> [a] -> [b]
concatMap = concatMapW (:) []

appW :: (a -> c -> c) -> [a] -> c -> c
appW c xs ys = foldr c ys xs

(++) :: [a] -> [a] -> [a]
(++) = appW (:)

and :: [Bool] -> Bool
and = foldr (&&) True

zipW :: ((a,b) -> c -> c) -> c -> [a] -> [b] -> c
zipW c n xs ys = go xs ys
  where go (x:xs) (y:ys) = c (x,y) (go xs ys)
        go _      _      = n

zip :: [a] -> [b] -> [(a,b)]
zip = zipW (:) []

enumFromW :: (Int -> c -> c) -> Int -> c
enumFromW c l = go l
  where go l = l `c` (go $! (l+1))

enumFrom :: Int -> [Int]
enumFrom = enumFromW (:)

enumFromToW :: (Int -> c -> c) -> c -> Int -> Int -> c
enumFromToW c n l m = takeWhileW c n (<= m) (enumFromW (:) l)

enumFromTo :: Int -> Int -> [Int]
enumFromTo = enumFromToW (:) []

takeWhileW :: (a -> c -> c) -> c -> (a -> Bool) -> [a] -> c
takeWhileW c n p = foldr (\x fxs -> if p x then x `c` fxs else n) n

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile = takeWhileW (:) []

foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f a []     = a
foldl' f a (x:xs) = (foldl' f $! f a x) xs

Curriculum Vitae

Name: Olaf Chitil
Born: 14 June 1969 in Süchteln, now Viersen

Aug. 1975 – Nov. 1977    Primary school in Essen-Frintrop

Nov. 1977 – Jul. 1979    Sollbrüggen primary school, Krefeld-Bockum

Aug. 1979 – Jun. 1988    Gymnasium am Stadtpark, Uerdingen; completed with the Abitur

Jul. 1988 – Sep. 1989    Military service

Oct. 1989 – Jun. 1995    Studies in computer science with subsidiary subject physics at the RWTH Aachen; degree: Diplom

Oct. 1992 – Jun. 1993    Studies in computer science at the University of Kent at Canterbury, UK; degree: Diploma in Computer Science

Aug. 1995 – Jun. 2000    Research assistant at the Lehrstuhl für Informatik II of the RWTH Aachen

since June 2000          Research Associate at the Department of Computer Science, University of York, UK