<<

Conception, Evolution, and Application of Languages

PAUL HUDAK , Department of Computer Science, New Haven, Connecticut 06520

The foundations of functional programming languages are examined from both historical and technical perspectives. Their evolution is traced through several critical periods: early work on and combinatory calculus, Lisp, Iswim, FP, ML, and modern functional languages such as Miranda’ and Haskell. The fundamental premises on which the functional programming methodology stands are critically analyzed with respect to philosophical, theoretical, and pragmatic concerns. Particular attention is paid to the main features that characterize modern functional languages: higher-order functions, , equations and , strong static typing and , and data abstraction. In addition, current research areas-such as parallelism, nondeterminism, input/output, and state-oriented computations-are examined with the goal of predicting the future development and application of functional languages.

Categories and Subject Descriptors: D.l.l [Programming Techniques]: Applicative (Functional) Programming; D.3.2 [Programming Languages]: Language Classifications-applicative languages; data-flow languages; nonprocedural languages; very high-level languages; F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic-lambda calculus and related systems; K.2 [History of Computing]: Software General Terms: Languages Additional Key Words and Phrases: Data abstraction, higher-order functions, lazy evaluation, referential transparency, types

INTRODUCTION ent kinds of machines increased, the need The earliest programming languages were arose for a common language with which to developed with one simple goal in mind: to program all of them. provide a vehicle through which one could Thus from primitive assembly lan- control the behavior of computers. Not sur- guages (which were at least a step up from prisingly, the early languages reflected the raw ) there grew a plethora of structure of the underlying machines fairly high-level programming languages, begin- well. Although at first blush that goal ning with in the 1950s. The seems eminently reasonable, the viewpoint development of these languages grew so quickly changed for two very good reasons. rapidly that by the 1980s they were best First, it became obvious that what was easy characterized by grouping them into fami- for a machine to reason about was not lies that reflected a common computation necessarily easy for a human being to rea- model or programming style. Debates over son about. Second, as the number of differ- which language or family of languages is best will undoubtedly persist for as long as 1 Miranda is a trademark of Research Software Ltd. computers need programmers.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1989 ACM 0360-0300/89/0900-0359 $01.50

ACM Computing Surveys, Vol. 21, No. 3, September 1989 360 .

CONTENTS the significant interest current researchers have in functional languages and the im- pact they have had on both the theory and pragmatics of programming languages in INTRODUCTION general. Spectrum Among the claims made by functional Referential Transparency and Equational language advocates are that programs can Reasoning Plan of Study be written quicker, are more concise, are 1. EVOLUTION OF FUNCTIONAL LANGUAGES higher level (resembling more closely tra- 1.1 Lambda Calculus ditional mathematical notation), are more 1.2 Lisp amenable to formal reasoning and analysis, 1.3 Iswim 1.4 APL and can be executed more easily on parallel 1.5 FP architectures. Of course, many of these fea- 1.6 ML tures touch on rather subjective issues, 1.7 SASL, KRC, and Miranda which is one reason why the debates can be 1.8 Dataflow Languages so lively. 1.9 Others 1.10 Haskell This paper gives the reader significant 2. DISTINGUISHING FEATURES OF MODERN insight into the very essence of functional FUNCTIONAL LANGUAGES languages and the programming method- 2.1 Higher Order Functions ology that they support. It starts with a 2.2 Nonstrict Semantics (Lazy Evaluating) 2.3 Data Abstraction discussion of the nature of functional lan- 2.4 Equations and Pattern Matching guages, followed by an historical sketch of 2.5 Formal Semantics their development, a summary of the dis- 3. ADVANCED FEATURES AND ACTIVE tinguishing characteristics of modern func- RESEARCH AREAS tional languages, and a discussion of 3.1 Overloading 3.2 Purely Functional Yet Universal I/O current research areas. Through this study 3.3 Arrays we will put into perspective both the power 3.4 Views and weaknesses of the functional program- 3.5 Parallel Functional Programming ming paradigm. 3.6 Caching and Memoization 3.7 Nondeterminism A Note to the Reader: This paper as- 3.8 Extensions to Polymorphic-Type Inference sumes a good understanding of the funda- 3.9 Combining Other Programming Language mental issues in programming language Paradigms design and use. To learn more about mod- 4. DISPELLING MYTHS ABOUT FUNCTIONAL ern functional programming techniques, PROGRAMMING 5. CONCLUSIONS including the important ideas behind rea- REFERENCES soning about functional programs, refer to Bird and Wadler [1988] or Field and Har- rison [1988]. To read about how to imple- ment functional languages, see Peyton Jones [ 19871 (additional references are The class of functional, or applicative, given in Sections 1.8 and 5). programming languages, in which compu- Finally, a comment on notation: Unless tation is carried out entirely through the otherwise stated, all examples will be writ- evaluation of expressions, is one such fam- ten in Haskell, a recently proposed func- ily of languages, and debates over its merits tional language standard [Hudak and have been quite lively in recent years. Are Wadler 19881. Explanations will be given functional languages toys? Or are they in square brackets [ ] as needed.’ tools? Are they artifacts of theoretical fan- tasy or of visionary pragmatism? Will they ‘Since the Haskell Report is relatively new, some ameliorate software woes or merely com- minor changes to the language may occur after this pound them? Whatever answers we might paper has appeared. An up-to-date copy of the Report have for these questions, we cannot ignore may be obtained from the author.

ACM Computing Surveys, Vol. 21, NO. 3, September 1989 Functional Programming Languages 361 Programming Language Spectrum in the functional language Haskell by Imperative languages are characterized as fat x 1 having an implicit state that is modified where fat n a (i.e., side effected) by constructs (i.e., com- = if n>O then fat (n-l) (a*n) mands) in the source language. As a result, else a such languages generally have a notion of in which the formal parameters n and a are sequencing (of the commands) to permit examples of carrying the state around ex- precise and deterministic control over plicitly, and the recursive structure has the state. Most, including the most pop- been arranged so as to mimic as closely as ular, languages in existence today are possible the looping behavior of the pro- imperative. gram given earlier. Note that the condi- As an example, the assignment state- tional in this program is an expression ment is a (very common) command, since rather than command; that is, it denotes a its effect is to alter the underlying implicit value (conditional on the value of the pred- store so as to a different binding for icate) rather than a sequencer of com- a particular variable. The begin . . . end mands. Indeed the value of the program is construct is the prototypical sequencer of the desired factorial, rather than it being commands, as are the well-known goto found in an implicit store. statement (unconditional transfer of con- Functional (in general, declarative) pro- trol), conditional statement (qualified se- gramming is often described as expressing quencer), and while loop (an example of a what is being computed rather than how, structured command). With these simple although this is really a matter of degree. forms, we can, for example, compute the For example, the above program may say factorial of the number X: less about how factorial is computed than the imperative program given earlier, but n:= x; is perhaps not as abstract as a := 1; while n>O do fat x begin a := a*n; where fat n n := n-l = if n==O then 1 end; else n*fac (n-l) [== is the infix operator for equality], which appears very much like the mathe- After execution of this program, the value matical definition of factorial and is indeed of a in the implicit store will contain the a valid functional program. desired result. Since most languages have expressions, In contrast, declarative languages are it is tempting to take our definitions liter- characterized as having no implicit state, ally and describe functional languages via and thus the emphasis is placed entirely on derivation from conventional programming programming with expressions (or terms). languages: Simply drop the assignment In particular, functional languages are dec- statement and any other side-effecting larative languages whose underlying model primitives. This approach, of course, is very of computation is the function (in contrast misleading. The result of such a derivation to, for example, the relation that forms the is usually far less than satisfactory, since basis for languages). the purely functional subset of most im- perative languages is hopelessly weak In a declarative language state-oriented (although there are important exceptions, computations are accomplished by carrying such as Scheme [Rees and Clinger 19861). the state around explicitly rather than im- Rather than saying what functional lan- plicitly, and looping is accomplished via guages don’t have, it is better to character- recursion rather than by sequencing. 
For ize them by the features they do have. For example, the factorial of x may be computed modern functional languages, those fea-

ACM Computing Surveys, Vol. 21, No. 3, September 1989 362 . Paul Hudak tures include higher-order functions, lazy created by the where expression, such evaluation, pattern matching, and various as in the subexpression x+x. The same kinds of data abstraction-all of these fea- cannot generally be said of an imperative tures will be described in detail in this language, where we must first be sure that paper. Functions are treated as first-class no assignment to x is made in any of the objects, are allowed to be recursive, higher statements intervening between the initial order, and polymorphic, and in general are definition of x and one of its subsequent provided with mechanisms that ease their uses.4 In general this can be quite a tricky definition and use. Syntactically, modern task, for example, in the case in which functional languages have an equational procedures are allowed to induce nonlocal look in which functions are defined using changes to lexically scoped variables. mutually recursive equations and pattern Although the notion of referential trans- matching. parency may seem like a simple idea, the This discussion suggests that what is im- clean equational reasoning that it allows is portant is the functional programming very powerful, not only for reasoning for- style, in which the above features are man- mally about programs but also informally ifest and in which side effects are strongly in writing and debugging programs. A pro- discouraged but not necessarily eliminated. gram in which side effects are minimized This is the viewpoint taken, for example, but not eliminated may still benefit from by the ML community and to some extent equational reasoning, although naturally the Scheme community. On the other hand, more care must be taken when applying there is a very large contingency of purists such reasoning. The degree of care, how- in the functional programming community ever, may be much higher than we might who believe that purely functional lan- think at first: Most languages that allow guages are not only sufficient for general minor forms of side effects do not minimize computing needs but are also better because their locality lexically-thus any call to any of their “purity”. At least a dozen purely function in any module might conceivably functional languages exist along with their introduce a side effect, in turn invalidating implementations.3 The main property that many applications of equational reasoning. is lost when side effects are introduced is The perils of side effects are appreciated referential transparency; this loss in turn by the most experienced programmers in impairs equational reasoning, as described any language, although most are loathe to below. give them up completely. It remains the goal of the functional programming com- Referential Transparency and Equational munity to demonstrate that we can do com- Reasoning pletely without side effects, without sacrificing efficiency or modularity. Of The emphasis on a pure declarative style of course, as mentioned earlier, the lack of programming is perhaps the hallmark of side effects is not all there is to the func- the functional . The tional programming paradigm. As we shall term referentially transparent is often used soon see, modern functional languages rely to describe this style of programming, in heavily on certain other features, most no- which “equals can be replaced by equals”. tably higher-order functions, lazy evalua- For example, consider the (Haskell) expres- tion, and data abstraction. 
sion . . . x+x . . . Plan of Study wherex=fa Unlike many developments in computer The function application (f a) may be sub- science, functional languages have main- stituted for any free occurrence of x in the tained the principles on which they were 3 This situation forms an interesting contrast with the logic programming community, where is often 4 In all fairness, there are logics for reasoning about described as declarative (whereas Lisp is usually not), imperative programs, such as those espoused by Floyd, and there are very few pure logic programming lan- Hoare, Dijkstra, and Wirth. None of them, however, guages (and even fewer implementations). exploits any notion of referential transparency.

ACM Computing Surveys, Vol. 21, NO. 3, September 1989 Functional Programming Languages l 363 founded to a surprising degree. Rather than [1932-1933, 19411 on the lambda calculus. changing or compromising those ideas, Indeed the lambda calculus is usually modern functional languages are best clas- regarded as the first functional language, sified as embellishments of a certain set of although it was certainly not thought of as ideals. It is a distinguishing feature of mod- programming language at the time, given ern functional languages that they have so that there were no computers on which to effectively held on to pure mathematical run the programs. In any case, modern principles in a way shared by very few other functional languages can be thought of as languages. (nontrivial) embellishments of the lambda Because of this, we can learn a great deal calculus. about functional languages simply by It is often thought that the lambda cal- studying their evolution. On the other culus also formed the foundation for Lisp, hand, such a study may fail to yield a but this in fact appears not to be the case consistent treatment of any one feature [McCarthy 19781. The impact of the that is common to most functional lan- lambda calculus on early Lisp development guages, for it will be fractured into its man- was minimal, and it has only been very ifestations in each of the languages as they recently that Lisp has begun to evolve more were historically developed. For this reason toward lambda calculus ideals. On the other I have taken a three-fold approach to our hand, Lisp had a significant impact on the study: subsequent development of functional lan- First, Section 1 provides an historical guages, as will be discussed in Section 1.2. sketch of the development of functional Church’s work was motivated by the de- languages. Starting with the lambda calcu- sire to create a calculus (informally, a syn- lus as the prototypical functional language, tax for terms and set of rewrite rules for it gradually embellishes it with ideas transforming terms) that captured one’s as they were historically developed, lead- intuition about the behavior of functions. ing eventually to a reasonable technical This approach is counter to the considera- characterization of modern functional tion of functions as, for example, sets (more languages. precisely, sets of argument/value pairs), Next, Section 2 presents a detailed since the intent was to capture the compu- discussion of four important concepts- tational aspects of functions. A calculus is higher-order functions, lazy evaluation, a formal way for doing just that. data abstraction mechanisms, and equa- Church’s lambda calculus was the first tions/pattern matching-which are critical suitable treatment of the computational components of all modern functional lan- aspects of functions. Its type-free nature guages and are best discussed as indepen- yielded a particularly small and simple cal- dent topics. culus, and it had one very interesting prop- Section 3 discusses more advanced ideas erty, capturing functions in their fullest and outlines some critical research areas. generality: Functions could be applied to Then to round out the paper, Section 4 puts themselves. In most reasonable theories of some of the limitations of functional lan- functions as sets, this is impossible, since guages into perspective by examining some it requires the notion of a set containing of the myths that have accompanied their itself, resulting in well-known paradoxes. development. 
This ability of self-application is what gives the lambda calculus its power. It allows us to gain the effect of recursion without 1. EVOLUTION OF FUNCTIONAL explicitly writing a recursive definition. LANGUAGES Despite this powerful ability, the lambda calculus is consistent as a mathematical 1.1 Lambda Calculus system-no contradictions or paradoxes The development of functional languages arise. has been influenced from time to time by Because of the relative importance of the many sources, but none is as paramount lambda calculus to the development of nor as fundamental as the work of Church functional languages, I will describe it in

ACM Computing Surveys, Vol. 21, No. 3, September 1989 364 l Paul Hudak detail in the remainder of this section, using We say that x is free in e iff x E fu(e). modern notational conventions. The substitution [el/x]e2 is then defined inductively by 1.1.7 Pure Untyped Lambda Calculus The abstract syntax of the pure untyped lambda calculus (a name chosen to distin- guish it from other versions developed later) embodies what are called lambda Wxlk2 e3) = ([edxle2)([edxl4 expressions, defined by5 x E Id Identifiers e E Exp Lambda expressions where e ::= x 1el e2 I Xx.e hXj.e2, ifi=j

Expressions of the form Xx.e are called hX,..[edxile2, if i # j and xi $ fu(eJ abstractions and of the form (el e,) are called applications. It is the former that captures the notion of a function and the hXk.[el/xi]([xklxjle2), otherwise, latter that captures the notion of applica- tion of a function. By convention, applica- where k f i, k # j, tion is assumed to be left associative, so that (ei ez e3) is the same as ((ei en) e3). and xk 4 fu (el) U fu (e2) The rewrite rules of the lambda calculus depend on the notion of substitution of an The last rule is the subtle one, since it is expression ei for all free occurrences of an where a name conflict could occur and is identifier x in an expression e2, which we resolved by making a name change. The write as [el/x]e2.6 Most systems, including following example demonstrates applica- both the lambda calculus and predicate cal- tion of all three rules: culus, that use substitution on identifiers must be careful to avoid name conflicts. [y/x]((Xy.x)(Xx.x)x) = (Xz.y)(Xx.x)y Thus, although the intuition behind substi- tution is strong, its formal definition can To complete the lambda calculus, we de- be somewhat tedious. fine three simple rewrite rules on lambda To understand substitution, we must expressions: first understand the notion of the free vari- ables of an expression e, which we write as (1) ol-conversion (renaming): . fu(e) and define by the following simple rules: Xxi.e H Xxj.[xj/xi]e, where xj 4 fu(e). b(x) = 1x1 (2) p-conversion (application): fu(el e2) = fukl) U fuk2) fu(Xx.e) = fu(e) - (x) 0-3 k2 H k2lxh.

(3) q-conversion: 6 The notation d E D means that d is a typical element of the set D, whose elements may be distinguished by subscripting. In the case of identifiers, we assume that Xx.(e x) H e, if x 4 fu(e). each xi is unique; that is, xi # zj if i # j. The notation d ::= altl ) ah2 ] . . ] altn is standard BNF syntax. These rules, together with the standard 6 In denotational semantics the notation e[u/x] is used equivalence relation rules for reflexivity, to denote the function e’ that is just like e except that symmetricity, and transitivity, induce a e’ x = U. Our notation of placing the brackets in front of the expression is to emphasize that [u/x]e is a theory of convertibility on the lambda cal- syntactic transformation on the expression e itself. culus, which can be shown to be consistent

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 365 as a mathematical system.7 The well- sion. Obviously, we would like for that known Church-Rosser theorem [Church value to be unique; and we would like to be and Rosser 19361 (actually two theorems) able to find it whenever it exists. The is what embodies the strongest form of Church-Rosser theorems give us positive consistency and has to do with a notion of results for both of these desires. reduction, which is the same as convertibil- ity but restricted so that P-conversion 1.1.2 Church-Rosser Theorems and v-conversion only happen in one direction: Church-Rosser Theorem I (1) ,&reduction: If e. & e, then there exists an e2 such that e. A e2 and el + e2.’ (2) s-reduction: In other words, if e. and el are intracon- vertible, then there exists a third term (pos- Xx.(e x) * e, if x 4 fu(e). sibly the same as e. or el) to which they can both be reduced. We write e, & e2 if e2 can be derived from zero or more /3- or v-reductions or cu-con- versions; in other words 5 is the reflex- Corollary ive, transitive closure of + including No lambda expression can be converted to a-conversions. Similarly, & is the reflexive, two distinct normal forms (ignoring differ- transitive closure of w. In summary, * ences due to a-conversion). captures the notion of reducibility, and 45 captures the notion of intraconvertibility. One consequence of this result is that how we arrive at the normal form does not matter; that is, the order of evaluation Definition is irrelevant (this has important conse- A lambda expression is in normal form if quences for parallel evaluation strategies). it cannot be further reduced using p- or The question then arises as to whether or q-reduction. not it is always possible to find the normal form (assuming it exists). We begin with Note that some lambda expressions have some definitions. no normal form, such as (Xx. b x)1 ox. (x x)), Definition where the only possible reduction leads to A normal-order reduction is a sequential an identical term, and thus the reduction reduction in which, whenever there is more is nonterminating. than one reducible expression (called a Nevertheless, the normal form appears reder), the leftmost one is chosen first. In to be an attractive canonical form for a contrast, an applicative-order reduction is a term, has a clear sense of finality in a sequential reduction in which the leftmost computational sense, and is what we intu- innermost redex is chosen first. itively think of as the value of an expres- Church-Rosser Theorem II 7 The lambda calculus as we have defined it here is If e. & el and e, is in normal form, then what Barendregt [1984] calls the XKq-calculus and is slightly more general than Church’s original XK-cal- there exists a normal-order reduction from culus (which did not include p-conversion). Further- e. to el. more, Church originally showed the consistency of the XI-calculus [Church 19411, an even smaller subset (it a Church and Rosser’s original proofs of their theorems only allowed abstraction of x from e if x was free in are rather long, and many have tried to improve on e). We will ignore the subtle differences between these them since. The shortest proof I am aware of for the calculi-our version is the one most often discussed first theorem is fairly recent and aptly due to Rosser in the literature on functional languages. 119821.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 366 l Paul Hudak

This is a very satisfying result; it says anthropomorphically as being Church- that if a normal form exists, we can always Rosser. find it; that is, just use normal-order reduc- tion. To see why applicative-order reduc- 1.1.3 Recursion, X-Definability, and tion is not always adequate, consider the Church’s following example: Another nice property of the lambda cal- Applicative-order reduction culus is embodied in the following theorem: (Xx. y)((Xx. x x) (Xx. x x)) 4 (Xx. y)((Xx. x x)(Xx. x Lx)) Fixpoint Theorem a* Every lambda expression e has a fixpoint e’ such that (e e’) & e’. Normal-order reduction Proof. Take e ’ to be (Y e), where Y, (Xx.y) ((Xxxx) (Xx.xx)) known as the Y combinator, is defined by *Y Y = Xf.(Xx.f (x x))(hx.f (x x)) We will return to the trade-offs between Then we have normal- and applicative-order reduction in Section 2.2. For now we simply note (Ye) = (Xx.e(x x))(Ax.e(x x)) that the strongest completeness and con- = e((Xx.e(x x))(Xx.e(x x))) sistency results have been achieved with = e(Y e) normal-order reduction. In actuality, one of Church’s (and oth- This surprising theorem (and equally surprising simple proof) is what has earned ers’) motivations for developing the lambda Y the name “paradoxical combinator”. The calculus in the first place was to form a theorem is quite significant-it means that foundation for all of mathematics (in the way that, for example, set theory is claimed any recursive function may be written non- recursively (and nonimperatively)., To see to provide such a foundation). Unfortu- how, consider a recursive function f defined nately, all attempts to extend the lambda calculus sufficiently to form such a foun- by dation failed to yield a consistent theory. fF . . . f . . . Church’s original extended system was shown inconsistent by the Kleene-Rosser This could be rewritten as paradox [Kleene and Rosser 19351; a sim- f = (Xf. ... f .**)f pler inconsistency proof is embodied in what is known as the Curry paradox [Ros- where the inner occurrence of f is now ser 19821. The only consistent systems that bound. This equation essentially says that have been derived from the lambda calculus f is a fixpoint of the lambda expression are much too weak to claim as a foundation (Xf. * * * f e. s). But that is exactly what Y for mathematics, and the problem remains computes for us, so we arrive at the follow- open today. ing nonrecursive definition for f: These inconsistencies, although disap- f = Y(Xf. -0. f .**) pointing in a foundational sense, did not slow down research on the lambda calculus, As a concrete example, the factorial func- which turned out to be quite a nice model tion of functions and of computation in general. The Church-Rosser theorem was an ex- fat = Xn. tremely powerful consistency result for a if (n = 0) then 1 else (n * fac(n - 1)) computation model, and in fact rewrite sys- can be written nonrecursively as tems completely different from the lambda calculus are often described as “possessing fat = Y(hfac. An. the Church-Rosser property” or even if (n = 0) then 1 else (n * fac(n - 1)))

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 367 The ability of the lambda calculus to expressed as the composition of only two simulate recursion in this way is the key to simple functions, K and S. Curry 119301 its power and accounts for its persistence proved the consistency of a pure combina- as a useful model of computation. Church tory calculus, and with Feys [Curry and recognized this power, and is perhaps best Feys 19581 elaborated the theory consider- expressed in his now famous thesis: ably. Although this work deserves as much attention from a logician’s point of view as Church’s Thesis the lambda calculus, and in fact its origins predate that of the lambda calculus, we will Effectively computable functions from posi- not pursue it here since it did not contribute tive integers to positive integers are just directly to the development of functional those definable in the lambda calculus. languages in the way that the lambda This is quite a strong claim. Although calculus did. On the other hand, the com- the notion of functions from positive inte- binatory calculus was eventually to play gers to positive integers can be formalized a surprising role in the implementation precisely, the notion of effectively comput- of functional languages, beginning with able cannot; thus no proof can be given for Turner [1979] and summarized in Peyton the thesis. It gained support, however, from Jones [1987, Chapter 161. Kleene [1936] who in 1936 showed that Another noteworthy attribute of the X-definability was precisely equivalent to lambda calculus is its restriction to func- Godel and Herbrand’s notions of recursive- tions of one argument. That it suffices to ness. Meanwhile, Turing [1936] had been consider only such functions was first sug- working on his now famous Turing ma- gested by Frege in 1893 [van Heijenoort chine, and in 1937 [Turing 19371 he showed 19671 and independently by Schonfinkel in that Turing computability was also pre- 1924. This restriction was later exploited cisely equivalent to X-definability. These by Curry and Feys [ 19581, who used the were quite satisfying results.’ notation (f x y) to denote ((f 3~)y), which The lambda calculus and the Turing previously would have been written f (x, y). machine were to have profound impacts on This notation has become known as cur- programming languages and computational rying, and f is said to be a curried function. complexity,1° respectively, and computer As we will see, the notion of currying has science in general. This influence was prob- carried over today as a distinguishing syn- ably much greater than Church or Turing tactic characteristic of modern functional could have imagined, which is perhaps not languages. surprising given that computers did not There are several variations and embel- even exist yet. lishments of the lambda calculus. They will In parallel with the development of the be mentioned in the discussion of the point lambda calculus, Schonfinkel and Curry at which functional languages exhibited were busy founding combinatory logic. It similar characteristics. In this way we can was Schonfinkel [ 19241 who discovered the clearly see the relationship between the surprising result that any function could be lambda calculus and functional languages.

1.2 Lisp ‘Much later Post [1943] and Markov [1951] pro- posed two other formal notions of effective com- A discussion of the history of functional putability; these also were shown to be equivalent to X-definability. languages would certainly be remiss if it lo Although the lambda calculus and the notion of h- did not include a discussion of Lisp, begin- definability predated the Turing machine, complexity ning with McCarthy’s seminal work in the theorists latched onto the Turing machine as their late 1950s. fundamental measure of decidability. This is probably Although lambda calculus is often con- because of the appeal of the Turing machine as a machine, giving it more credibility in the emerging sidered as the foundation of Lisp, by arena of electronic digital computers. See Trakhten- McCarthy’s [ 19781 own account the lambda brot [1988] for an interesting discussion of this issue. calculus actually played a rather small role.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 368 l Paul Hudak

Its main impact came through McCarthy’s (4) The use of S-expressions (and abstract desire to represent functions anonymously, syntax in general) to represent both and Church’s h-notation was what he program and data.” chose: A lambda abstraction written Xx.e in lambda calculus would be written (lambda All four of these features are essential in- (x) e) in Lisp. gredients of any Lisp implementation to- Beyond that, the similarity wanes. For day; the first three are essential to example, rather than use the Y combinator functional language implementations as to express recursion, McCarthy invented well. the conditional expression” with which re- A simple example of a typical Lisp defi- cursive functions could be defined explicitly nition is the following one for mapcar: (and, arguably, more intuitively). As an (define mapcar (fun 1st) example, the nonrecursive factorial func- (if (null 1st) tion given in the lambda calculus in Section nil 1.1.3 would be written recursively in Lisp (cons (fun (car 1st)) in the following way: (mapcar fun (cdr 1st))) 1) (define fat (n) (if (= n 0) This example demonstrates all of the points mentioned above. Note that the function t* n Vat (- n 1))) )) fun is passed as an argument to mapcar. This and other ideas were described in Although such higher-order programming two landmark papers in the early 1960s was very well known in lambda calculus [McCarthy 1960; 19631 that inspired work circles, it was certainly a radical departure on Lisp for many years to come. from FORTRAN and has become one of McCarthy’s original motivation for de- the most important programming tech- veloping Lisp was the desire for an alge- niques in Lisp and functional programming braic list-processing language for use in (higher order functions are discussed more artificial intelligence research. Although in Section 2.1). The primitive functions symbolic processing was a fairly radical cons, car, cdr, and null are the well-known idea at the time, his aims were quite prag- operations on lists whose names are still matic. One of the earliest attempts at de- used today. cons creates a new list cell signing such a language was suggested by without burdening the user with explicit McCarthy and resulted in FLPL (FOR- storage management; similarly, once that TRAN-compiled list processing language), cell is no longer needed a “garbage collec- implemented in 1958 on top of the FOR- tor” will come along and reclaim it, again TRAN system on the IBM 704 [Gelernter without user involvement. For example, et al. 19601. During the next few years since mapcar constructs a new list from an McCarthy designed, refined, and imple- old one, in the call mented Lisp. His chief contributions dur- (mapcar fun (cons a (cons b nil))) ing this period were the following: the list (cons a (cons b nil)) will become (1) The conditional expression and its use garbage after the call and will automatically in writing recursive functions. be reclaimed. Lists were to become the par- (2) The use of lists and higher-order oper- adigmatic data structure in Lisp and early ations over lists such as mapcar. functional languages. (3) The central idea of a cons cell and the The definition of mapcar in a modern use of garbage collection as a method functional language such as Haskell would of reclaiming unused cells. 
appear similarly, except that pattern matching would be used to destructure the I1 The conditional in FORTRAN (essentially the only other programming language in existence at the time) I2 Interestingly, McCarthy [1978] claims that it was was a statement, not an expression, and was for con- the read and print routines that influenced this trol, not value-defining, purposes. notation most.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages 369 list: in the previous section used a conditional (not to mention arithmetic operators), yet mapcar fun [ ] = [I most readers probably understood it per- mapcar fun (xxs) = fun x : mapcar fun xs fectly and did not object to the departure [ [ ] is the null list and : is the infix operator from precise lambda calculus syntax. for cons; also note that function application The effect of conditional expressions can has higher precedence than any infix in fact be achieved in the lambda calculus operator.] by encoding the true and false values as McCarthy was also interested in design- functions, as well as by defining a function ing a practical language, and thus Lisp had to emulate the conditional: many pragmatic features-in particular, true = hx.Ay.x sequencing, the assignment statement, and other primitives that induced side effects false = Xx.Ay.y on the store. Their presence undoubtedly cond = Xp.Xc.Xa.(p a) had much to do with early experience with In other words, (cond p c a) = (if p then c FORTRAN. Nevertheless, in his early pa- else a). One can then define, for example, pers, McCarthy emphasized the mathemat- the factorial function by ical elegance of Lisp, and in a much later paper his student Cartwright demonstrated fuc = An. cond (= n 0) 1 (* n( fuc (- n 1))) the ease with which one could prove where = is defined by properties about pure Lisp programs [ Cartwright 19761. (=nn) *true Despite its impurities, Lisp had a great (= n m) + false, if m # n influence on functional language develop- where m and n range over the set of integer ment, and it is encouraging to note that constants. However, I am still cheating a modern Lisps (especially Scheme) have re- bit by not explaining the nature of the turned more to the purity of the lambda objects -, *, 0, 1, and so on, in pure lambda calculus rather than the ad hocery that calculus terms. It turns out that they can plagued the Maclisp era. This return to be represented in a variety of ways, essen- purity includes the first-class treatment of tially using functions to simulate the proper functions and the lexical scoping of identi- behavior, just as for true, false, and the fiers. Furthermore, the preferred modern conditional (for the details, see Church style of Lisp programming, such as es- [1941]). In fact any conventional data or poused by Abelson et al. [ 19851, can be control structure can be simulated in the characterized as being predominantly side- lambda calculus; if this were not the case, effect free. And, finally, note that Hender- it would be difficult to believe Church’s son’s [1980] Lispkit Lisp is a purely func- thesis. tional version of Lisp that uses an infix, Even if McCarthy knew of these ways to algebraic syntax. express things in the lambda calculus (there is reason to believe that he did not), effi- ciency concerns might have rapidly led him 1.2.1 Lisp in Retrospect to consider other alternatives, especially Before continuing the historical develop- since FORTRAN was the only high-level ment it is helpful to consider some of the programming language with which anyone design decisions McCarthy made and how had any experience. In particular, FOR- they would be formalized in terms of the TRAN functions evaluated their argu- lambda calculus. 
It may seem that condi- ments before entering the body of the tional expressions, for example, are an ob- function, resulting in what is often called a vious feature to have in a language, but that strict, or call-by-value, evaluation policy, only reflects our familiarity with modern corresponding roughly to applicative-order high-level programming languages, most of reduction in the lambda calculus. With this which have them. In fact the lambda cal- strategy extended to the primitives, includ- culus version of the factorial example given ing the conditional, we cannot easily define

ACM Computing Surveys, Vol. 21, No. 3, September 1989 370 l Paul Hudak recursive functions. For example, in the for which various &rules apply, such as the above definition of factorial all three argu- following: ments to cond would be evaluated, includ- ingfac (- n l), resulting in nontermination. (= 0 0) + True Nonstrict evaluation, corresponding to (= 0 1) + False the normal-order reduction that is essential to the lambda calculus in realizing recur- sion, was not very well understood at the time-it was not at all clear how to imple- ment it efficiently on a conventional von (+ 0 0) * 0 Neumann computer-and we would have (+ 0 1) * 1 to wait another 20 years or so before such an implementation was even attempted.13 The conditional expression essentially allowed one to invoke normal-order, or nonstrict, evaluation selectively. Stated (+ 27 32) + 59 another way, McCarthy’s conditional, although an expression, was compiled into code that essentially controlled the reduc- tion process. Most imperative program- (If True el e2) * e, ming languages today that allow recursion (If False el e2) + e2 do just that, and thus even though such languages are often referred to as strict, they all rely critically on at least one nonstrict construct: the conditional. (Car (Cons el e2)) * el 1.2.2 Lambda Calculus with Constants (Cdr (Cons el e2)) * e2 The conditional expression is actually only one example of very many primitive func- tions that were included in Lisp. Rather than explain them in terms of the lambda where =, +, 0, 1, If, True, False, Cons, and calculus by a suitable encoding (i.e., com- so on, are elements of Con. pilation), it is perhaps better to extend the The above rules can be shown to be a lambda calculus formally by adding a set of conservative extension of the lambda cal- constants along with a set of what are usu- culus, a technical term that in our context ally called &rules, which state relationships essentially means that convertible terms in between constants and effectively extend the original system are still convertible the basis set of a-, p-, and a-reduction rules. in the new, and (perhaps more importantly) For example, the reduction rules for = given inconvertible terms in the original system earlier (and repeated below) are &rules. are still inconvertible in the new. In gen- This new calculus, often called the lambda eral, care must be taken when introducing calculus with constants, can be given a d-rules, since all kinds of inconsistencies precise abstract syntax: could arise. For a quick and dirty example of inconsistency, we might define a primi- x E Id Identifiers tive function over integers called broken c E Con Constants with the following d-rules: e E Exp Lambda expressions (broken 0) * 0 where e ::= x 1c 1el e2 1Xx.e (broken 0) 4 1 In On the other hand, the call-by-name evaluation which immediately implies that more than strategy invented in ALGOL had very much of a normal-order reduction flavor. See Wadsworth [1971] one normal form exists for some terms, and Wegner [1968] for early discussions of these violating the first Church-Rosser theorem issues. (see Section 1.1).

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 371 As a more subtle example, suppose we significant syntactic and semantics ideas define the logical relation Or by [Landin 19661. Iswim, according to Landin, “can be looked on as an attempt to deliver (Or True e) 4 True Lisp from its eponymous commitment to (Or e True) 4 True lists, its reputation for hand-to-mouth stor- (Or False False) + Fake age allocation, the hardware dependent fla- Although these rules form a conservative vor of its pedagogy, its heavy bracketing, extension of the lambda calculus, a com- and its compromises with tradition.” When mitment to evaluate either of the argu- all is said and done, the primary contri- ments may lead to nontermination (even if butins of Iswim, with respect to the the other argument may reduce to True). development of functional languages, are In fact, it can be shown that with the above the following: rules there does not exist a deterministic (1) Syntactic innovations sequential reduction strategy that will (4 The abandonment of prefix syntax guarantee that the normal form True will in favor of infix. be found for all terms having such a normal form, and thus the second Church-Rosser (b) The introduction of let and where property is violated. This version of Or is clauses, including a notion of si- often called the parallel or, since a parallel multaneous and mutually recursive reduction strategy is needed to implement definitions. it properly (and with which the first (c) The use of an off-side rule based Church-Rosser theorem will at least hold). on indentation rather than sepa- Alternatively, we could define a sequential rators (such as commas or semi- or by colons) to scope declarations and expressions. For example (using (Or True e) + True Haskell syntax), the program frag- (Or False e) + e ment which can be shown to satisfy both e where f x = x Church-Rosser theorems. ab=l cannot be confused with 1.3 lswim e where f x = x a Historically speaking, ’s work b=l in the mid 1960s was the next significant impetus to the functional programming and is equivalent to what might paradigm. Landin’s work was deeply influ- otherwise be written as enced by that of Curry and Church. His e where {f x = x; a b = 1) early papers discussed the relationship between lambda calculus and both ma- (2) Semantic innovations chines and high-level languages (specifi- (4 An emphasis on generality. Landin cally ALGOL 60). Landin [1964] discussed was half serious in hoping that the how one could mechanize the evaluation of Iswim family could serve as the expressions through an abstract machine “next 700 programming languages.” called the SECD machine; in Landin [ 19651 Central to his strategy was the idea he formally defined a nontrivial subset of of defining a syntactically rich lan- ALGOL 60 in terms of the lambda calculus. guage in terms of a very small but It is apparent from his work that Landin expressive core language. regarded highly the expressiveness and pu- (b) An emphasis on equational reason- rity of the lambda calculus and at the same ing (i.e., the ability to replace equals time recognized its austerity. Undoubtedly with equals). This elusive idea was as a result of this work, in 1966 Landin backed up with four sets of rules for introduced a language (actually a family of reasoning about expressions, dec- languages) called Iswim (for If You See larations, primitives, and problem- What I Mean), which included a number of oriented extensions.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 372 . Paul Hudak (c) The SECD machine as a simple Landin’s emphasis on expressing what abstract machine for executing the desired result is, as opposed to saying functional programs. how to get it, and his claim that Iswim’s declarative14 style of programming was bet- We can think of Landin’s work as ex- ter than the incremental and sequential tending the lambda calculus with constants imperative style were ideas to be echoed by defined in the last section so as to include functional programming advocates to this more primitives, each with its own set of 6- day. On the other hand, it took another 10 rules, but more importantly let and where years before interest in functional lan- clauses, for which it is convenient to intro- guages was to be substantially renewed. duce the syntactic category of declarations: One of the reasons is that there were no e E Exp Expressions decent implementations of Iswim-like lan- guages around; this reason, in fact, was to where e ::= . +. ] e where dl . . . d, persist into the 1980s. 1let dI . . . d, in e d E Decl Declarations 1.4 APL where d ::= x = e Iverson’s [1962] APL, although not a 1 xx1 --. xn=e purely functional programming language, is worth mentioning because its functional and for which we then need to add some subset is an example of how we could axioms (i.e., reduction rules) to capture the achieve functional programming without desired semantics. Landin proposed special relying on lambda expressions. In fact, constructs for defining simultaneous and Iverson’s design of APL was motivated out mutually recursive definitions, but we will of his desire to develop an algebraic pro- take a simpler and more general approach gramming language for arrays, and his orig- here: We assume that a of decla- inal work used an essentially functional rations dI -. . d, always is potentially notation. Subsequent development of APL mutually recursive-if it isn’t, our rules resulted in several imperative features, but still work: the underlying principles should not be overlooked. (let d, . . . d, in e) APL was also unique in its goal of suc- + (e where dI - - . d,) cinctness, in that it used a specially de- (xx1 a.+ x,=e) signed alphabet to represent programs- * (x = Xx1.Xx2. . . - Xx,.e) each letter corresponding to one operator. (e where x1 = el) That APL became popular is apparent in 3 (Xx,.e)( YXxl.el) the fact that many keyboards, both for (e where (x, = ei) . . . (x, = e,)) typewriters and computer terminals, car- =a (Xxl.e)(YXxl.el) ried the APL alphabet. Backus’ FP, which where x2 = (Xxl.ez)(YXrl.e,) came after APL, was certainly influenced by the APL philosophy, and its abbreviated publication form also used a specialized x, = (Xx1. e,)(YXxl.el) alphabet (see the example in Section 1.5). In fact FP has much in common with APL, These rules are semantically equivalent to the primary difference being that FP’s fun- Landin’s, but they avoid the need for a damental data structure is the sequence, tupling operator to handle mutual recur- whereas APL’s is the array. sion and they use the Y combinator (de- It is worth noting that recent work on fined in Section 1.1) instead of an iterative APL has revived some of APL’s purely unfolding step. functional foundations. The most notable We will call this resulting system the recursive lambda calculus with constants, or I4 Landin actually disliked the term “declarative,” just recursive lambda calculus. preferring instead “denotative.”

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 373 work is that of Tu [Tu 1986; Tu and Perlis in implementation and expressiveness, be- 19861, who designed a language called FAC, cause the model was not suitably history for Functional Array Calculator (presum- sensitive (more specifically, it did not han- ably a take off on Turner’s KRC, Kent dle large data structures such as Recursive Calculator). FAC is a purely very easily). With regard to implementa- functional language that adopts most of the tion, this argument is certainly understand- APL syntax and programming principles able because it was not clear how to but also has special features to allow pro- implement the notion of substitution in an gramming with infinite arrays; naturally, efficient manner nor was it clear how to lazy evaluation plays a major role in the structure data in such ways that large data semantics of the language. Another inter- structures could be implemented efficiently esting approach is that of Mullin [1988]. (both of these issues are much better under- stood today). With regard to expressive- 1.5 FP ness, that argument is still a matter of debate today. In any case, these problems Backus’ FP was one of the earliest func- were the motivation for Backus’ Applica- tional languages to receive widespread at- tive State Transition (AST) Systems, in tention. Although most of the features in which state is introduced as something on FP are not found in today’s modern func- which purely functional programs interact tional languages, Backus’ [ 19781 Turing with in a more traditional (i.e., imperative) Award lecture was one of the most influ- way. ential and now most-often cited papers Perhaps more surprising, and an aspect extolling the functional programming par- of the paper that is usually overlooked, adigm. It not only stated quite eloquently Backus had this to say about lambda- why functional programming was “good” calculus based systems: . but also quite vehemently why traditional was Ubad,?.15 An FP system is founded on the use of a fixed set Backus’ coined the term “word-at-a-time of combining forms called functional forms. . . In programming” to capture the essence of contrast, a lambda-calculus based system is founded imperative languages, showed how such on the use of the lambda expression, with an asso- ciated set of substitution rules for variables, for languages were inextricably tied to the von building new functions. The lambda expression Neumann machine, and gave a convincing (with its substitution rules) is capable of defining argument why such languages were not all possible computable functions of all possible going to meet the demands of modern soft- types and of any number of arguments. This free- ware development. That this argument was dom and power has its disadvantages as well as its being made by the person who is given the obvious advantages. It is analogous to the power of most credit for designing FORTRAN and unrestricted control statements in conventional lan- who also had significant influence on the guages: with unrestricted freedom comes chaos. If development of ALGOL led substantial one constantly invents new combining forms to suit weight to the functional thesis. The expo- the occasion, as one can in the lambda calculus, one will not become familiar with the style or useful sure given to Backus’ paper was one of the properties of the few combining forms that are best things that could have happened to the adequate for all purposes. 
field of functional programming, which at the time was certainly not considered main- Backus’ argument, of course, was in the . context of garnering support for FP, which Despite the great impetus Backus’ paper had a small set of combining forms that gave to functional programming, it is inter- were claimed to be sufficient for most pro- esting to note that in the same paper gramming applications. One of the advan- Backus also said that languages based on tages of this approach is that each of these lambda calculus would have problems, both combining forms could be named with par- ticular brevity, and thus programs become i5 Ironically, the Turing Award was given to Backus quite compact-this was exactly the ap- in a large part because of his work on FORTRAN. proach taken by Iverson in designing APL

ACM Computing Surveys, Vol. 21, No. 3, September 1989 374 l Paul Hudak

(see Section 1.4). For example, an FP pro- higher order functions and user-defined gram for inner product looks like datatypes are allowed. Its authors still emphasize the algebraic style of reasoning Def IP = (/+) 0 (ax) 0 Trans that is paramount in FP, although it is where /, 0, and LY are combining forms also given a denotational semantics that is called insert, compose, and apply-to-all, re- probably consistent with respect to the spectively. In a modern functional language algebraic semantics. such as Haskell this would be written with slightly more verbosity as 1.6 ML ip 11 12 = fold1 (+) 0 (map2 (*) 11 12) In the mid 197Os, at the same time Backus [In Haskell an infix operator such as + may was working on FP at IBM, several re- be passed as an argument by surrounding search projects were underway in the it in parentheses.] Here fold1 is the equiv- that related to functional alent of insert (/), and map2 is a two-list programming, most notably work at Edin- equivalent of apply-to-all (a), thus elimi- burgh. There Gordon et al. [ 19791 had been nating the need for Trans. These functions working on a proof-generating system are predefined in Haskell, as they are in called LCF for reasoning about recursive most modern functional languages, for the functions, in particular in the context of same reason that Backus argues-they are programming languages. The system con- commonly used. If they were not, they could sisted of a deductive calculus called PPX easily be defined by the user. For example, (polymorphic predicate calculus) together we may wish to define an infix composition with an interactive programming language operator for functions, the first of which is called ML, for metalanguage (since it binary, as follows: served as the command language for LCF). LCF is quite interesting as a proof sys- (f.0.g) XY =f(gxy) tem, but its authors soon found that ML [Note how infix operators may be defined was also interesting in its own right, and in Haskell; operators are distinguished lex- they proceded to develop it as a stand-alone ically by being nonalphabetic.] With this functional programming language [Gordon we can reclaim much of FP’s succinctness et al. 19781. That it was, and still is, called in defining ipp: a functional language is actually somewhat misleading, since it has a notion of refer- lp = fold1 (+) 0 .o. map2 (*) ences that are locations that can be stored [Recall that in Haskell, function applica- into and read from, much as variables are tion has higher precedence than infix op- assigned and read. Its I/O system also in- erator application.] It is for this reason, duces side effects and is not referentially together with the fact that FP’s specializa- transparent. Nevertheless, the style of pro- tion precluded the generality afforded by gramming that it encourages is still func- user-defined higher order functions (which tional, and that is the way it was promoted is all that combining forms are), that mod- (the same is somewhat true for Scheme, ern functional languages did not follow the although to a lesser extent). FP style. As we shall soon see, certain other More recently a standardization effort kinds of became more pop- for ML has taken place, in fact taking some ular instead (such as pattern matching, list of the good ideas of Hope [Burstall et al. comprehensions, and sections). 
19801 (such as pattern matching) along Many extensions to FP have been pro- with it, yielding a language now being called posed over the years, including the inclu- Standard ML, or SML [Milner 1984; sion of strong typing and abstract datatypes Wikstrom 19881. [Guttag et al. 19811. In much more recent ML is a fairly complete language-cer- work, Backus et al. [1986] have designed tainly the most practical functional lan- the language FL, which is strongly (al- guage at the time it appeared-and SML is though dynamically) typed and in which even more so. It has higher order functions,

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 375 a simple I/O facility, a sophisticated mod- 1.6.1 Hindley-Milner ule system, and even exceptions. But by far We can introduce types into the pure the most significant aspect of ML is its lambda calculus by first introducing a do- type system, which is manifested in several main of types, say BasTyp, as well as ways: a domain of derived types, say Typ, and then requiring that every expression be (1) It is strongly and statically typed. tagged with a member of Typ, which we do (2) It uses type inference to determine the by superscripting, as in e’. The derived type of every expression, instead of re- type r2 + 71 denotes the type of all func- lying on excplicit type declarations. tions from values of type 72 to values of (3) It allows polymorphic functions and type TV, and thus a proper application will data structures; that is, functions may have the form e;P-’ eF)T1. Modifying the take arguments of arbitrary type if in pure lambda calculus in this way, we arrive fact the function does not depend on at the pure typed lambda calculus: that type (similarly for data struc- bg Basic types tures) . TE Derived types (4) It has user-defined concrete and ab- stract datatypes (an idea actually bor- where T ::= b 171 + 72 rowed from Hope and not present in xT E Id Typed identifiers the initial design of ML). e E Exp Typed lambda expressions ML was the first language to use type in- where e ::= x7 ference as a semantically integrated com- e;2-Tle;z)T1 ponent of the language, and at the same 1 ~Xx’2.e’l)‘*-rl time its type system was richer than any previous statically typed language in for which we then provide the following that it permitted true polymorphism. It reduction rules: seemed that the best of two worlds had (1) Typed-oc-conversion: been achieved-not only is making explicit type declarations (a sometimes burdensome (xx;1 . eT) w (Xx;l . [x;‘/xi1]e7), task) not required, but in addition a pro- where xi’ 4 fu(e’). gram that successfully passes the type in- ferencer is guaranteed not to have any type (2) Typed-P-conversion: errors. Unfortunately, this idyllic picture is ((xxrz . e;l)e;l) et3 [e;2/xT2]e;‘. not completely accurate (although it is close), as we shall soon see. (3) Typed-s-converson: We shall next discuss the issue of types Xx” . (e’*x’l) -3 eT*, if x7’ @ fu(e’?). in a foundational sense, thus continuing our plan of elaborating the lambda calculus To preserve type correctness, we assume to match the language features being dis- that the typed identifiers xrl and x12, where cussed. This will lead us to a reasonable 71 # TV, are distinct identifiers. Note then picture of ML’s Hindley-Milner type sys- how every expression in our new calculus tem, the rich polymorphic type system that carries with it its proper type, and thus type was mentioned above and that was later checking is built in to the syntax. adopted by every other statically typed Unfortunately, there is a serious problem functional language, including Miranda with this result: Our new calculus has lost and Haskell. Aside from the type system, the power of X-definability. In fact, every the two most novel features in ML are its term in the pure typed lambda calculus can references and modules, which are also cov- be shown to be strongly normalizable, mean- ered in this section. 
Discussion of ML’s ing each has a normal form, and there data abstraction facilities will be postponed is an effective procedure for reducing each until Section 2.3. of them to its normal form [Fortune et al.


19851. Consequently, we can only compute given earlier, except that Y, would be used a rather small subset of functions from where 7 = Int + Int. integers to integers-namely, the extended In addition to this calculus, we can derive polynomials [Barendregt 19841 .I6 a typed recursive lambda calculus with con- The reader can gain some insight into stants in the same way that we did in the the problem by trying to write the defini- untyped case, except that instead of the tion of the Y combinator-that paradoxical unrestricted Y combinator we use the typed entity that allowed us to express recur- versions defined above. sion-as a typed term. The difficulty lies At this point our type system is about on in properly typing self-application (recall par with that of a strongly and explicitly the discussion in Section l.l), since typing typed language such as Pascal or Ada. We (ee) requires that e have both the type would, however, like to do better. As men- 72 + pi and r2, which cannot be done within tioned in Section 1.6, the designers of ML the structure we have given. It can, in fact, extended the state of the art of type systems be shown that the pure typed lambda cal- in two significant ways: culus has no fixpoint operator. l They permitted polymorphic functions. Fortunately, there is a clever way to solve this dilemma. Instead of relying on self- l They used type inference to infer types application to implement recursion, simply rather than requiring that they be de- add to the calculus a family of constant clared explicitly. fixpoint operators similar to Y, only typed. As an example of a polymorphic function, To do this, we first move into the typed consider lambda calculus with constants, in which a domain of constants Con is added as in the map :: (a + b) -3 [a] + [b] (untyped) lambda calculus with constants. mwf[l=[l mapf(x:xs) =fx:mapfxs We then include in Con a typed fixpoint operator of the form Y, for every type 7, The first line is a type signature that de- where clares the type of map; it can be read as “for all types a and b, map is a function that takes two arguments, a function from a into b and a list of elements of type a, Then for each fixpoint operator Y, we add and returns a list of elements of type b.” the &rule: [Type signatures are optional in Haskell; Typed- Y-conversion: the type system is able to infer them auto- matically. Also note that the type con- ( Y7eT+r)T * (e’“( Y,eT’T)T)T structor + is right associative, which is consistent with the left associativity of The reader may easily verify that type con- function application.] sistency is maintained by this rule. Therefore, map can be used to map By ignoring the type information, we can square down a list of integers, or head down see that the above &rule corresponds to the a list of lists, and so on. In a monomorphic conversion (Yf) w (f (Yf )) in the untyped language such as Pascal, one would have to case, and the same trick for implementing define a separate function for each of these. recursion with Y as discussed in Section The advantage of polymorphism should be 1.1 can be used here, thus regaining X- clear. definability. 
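The definition of map and its use at several different types can be sketched as follows in current Haskell; map itself is predefined, so a primed name is used here only to avoid the clash with the Prelude.

map' :: (a -> b) -> [a] -> [b]
map' f []       = []
map' f (x : xs) = f x : map' f xs

squares :: [Int]
squares = map' (\x -> x * x) [1, 2, 3]   -- map' at type (Int -> Int) -> [Int] -> [Int]

firsts :: [Int]
firsts = map' head [[1, 2], [3, 4]]      -- the same map' instantiated at a different type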
For example, a nonrecursive One way to achieve polymorphism in our definition of the typed factorial function calculus is to add a domain of type variables would be the same as the untyped version and extend the domain of derived types accordingly: I6 On the bright side, some researchers view the strong b E BasTyp Basic types normalization property as a feature, since it means that all programs are guaranteed to terminate. Indeed v E TypId Type variables this property forms the basis of much of the recent 7 E TYP Derived types work on using constructive as a foundation for programming languages. where 7 ::= b I u I ~14 72


Thus, for example, u -+ 7 is a proper type signatures present. The use of such poly- and can be read as if the type variable was morphism however, is limited to the scope universally quantified: “for all types u, the in which map was defined. For example, type u + 7.” To accommodate this we must the program change the rule for P-conversion to read silly map f g (2) Typed-P-conversion with type vari- where f :: Int -+ Int ables: g :: Char ---) Char (a) ((XxTz.ez) @ ([e;a/xrs]e;‘)‘l map :: (a + 6) + bl + PI silly m f g = (m f num-list, (b) ((xx”.e;l)ep) m g char-list) @ ([~2/ul([e~/x”le;l))‘~ [(el, e2) is a tuple] results in a type error, where substitution on type variables is de- since map is passed as an argument and fined in the obvious way. Note that this then instantiated in two different ways; rule implies the validity of expressions of that is, once as type (Int -+ Int) + [Int] + the form (ey+T1e2) TV.Similar changes are [Int] and once as type (Char + Char) + required to accommodate expressions such [Char] + [Char]. If map were instantiated as (e ;-“e;) “. in several ways within the scope in which But, alas, now that type variables have it was defined or if m were only instantiated been introduced, it is no longer clear in one way within the function silly, there whether a program is properly typed-it is would have been no problem. not built in to the static syntactic structure This example demonstrates a fundamen- of the calculus. In fact, the type-checking tal limitation to the Hindley-Milner type problem for this calculus is undecideable, system, but in practice the class of pro- being a variant of a problem known as grams that the system rejects is not large partial polymorphic type inference [Boehm and is certainly smaller than that rejected 1985; Pfenning 19881. by any existing type-checking algorithm for Rather than trying to deal with the type- conventional languages in that, if nothing checking problem directly, we might go one else, it allows polymorphism. Many other step further with our calculus and try to functional languages have since then incor- attain ML’s lack of a requirement for ex- porated what amounts to a Hindley-Milner plicit typing. We can achieve this by simply type system, including Miranda and erasing all type annotations and then trying Haskell. It is beyond the scope of this ar- to solve the seemingly harder problem of ticle to discuss the details of type inference, inferring types of completely naked terms. but the reader may find good pragmatic Surprisingly, it is not known whether an discussions in Hancock [1987] and Damas effective type inference algorithm exists for and Milner [1982] and a good theoretical this calculus, even though the problem of discussion in Milner [ 19781. partial polymorphic type inference, known As a final comment we point out that an to be undecidable, seems as if it should be alternative way to gain polymorphism is to easier. introduce types as values and give them at Fortunately, Hindley [ 19691 and Milner least some degree of first-class status (as [1978] independently discovered a re- we did earlier for functions); for example, stricted polymorphic type system that is allowing them to be passed as arguments almost as rich as that provided by our cal- and returned as results. Allowing them to culus and for which type inference is decid- be passed as arguments only (and then used able. 
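The limitation illustrated by silly can be seen concretely in the following sketch, written in current Haskell with Data.Char.toUpper standing in for the g :: Char -> Char above. A top-level or let-bound identifier such as map may be instantiated at several types, but a lambda-bound identifier such as m may not.

import Data.Char (toUpper)

-- map is let-bound (in fact top-level), so it may be used at two types here:
fine :: ([Int], String)
fine = (map (+ 1) [1, 2, 3], map toUpper "abc")

-- A lambda-bound identifier is monomorphic, so the analogue of silly,
--
--   silly m f g = (m f [1, 2, 3 :: Int], m g "abc")
--
-- is rejected: m would have to be used both at
--   (Int -> Int) -> [Int] -> [Int]   and at
--   (Char -> Char) -> [Char] -> [Char].

main :: IO ()
main = print fine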
In other words, there exist certain in type annotations in the standard way) terms in the calculus presented above that results in what is known as the polymorphic one could argue are properly typed but or second-order typed lambda calculus. would not be allowed in the Hindley- Girard [1972] and Reynolds [1974] discov- Milner system. The system still allows pol- ered and studied this type system indepen- ymorphism, such as exhibited by map de- dently, and it appears to have great fined earlier, and is able to infer the type expressive power. It turns out, however, to of functions such as map without any type be essentially equivalent to the system we


developed earlier and has the same diffi- For example, a new signature called SIG culties with respect to type inference. It is, (think of it as a new type) may be declared nevertheless, an active area of current re- by search (see Cardelli and Wegner [ 19851 and signature SIG = Reynolds [ 19851 for good summaries of this sig work). val x : int val succ : int + int 1.6.2 ML ‘s References end A reference is essentially a pointer to a cell in which the types of two identifiers have containing values of a particular type; ref- been declared, x of type int and succ of type erences are created by the (pseudo)function int + int. The following structure S (think ref. For example, ref 5 evaluates to an of it as an environment) has the implied integer reference-a pointer to a cell that signature SIG defined above: is allowed to contain only integers and in structure S = this case having initial contents 5. The struct contents of a cell can be read using the val n = 5 prefix operator !. Thus if x is bound to ref val succ x = x+1 5 then !r returns 5. end The cell pointed to by a reference may be updated via assignment using the infix If we then define the following functor F operator :=. Continuing with the above ex- (think of it as a function from structures to ample, x := 10, although an expression, has structures): the side effect of updating the cell pointed functor F( 2’: SIG) = to by x with the value 10. Subsequent eval- struct uations of !x will then return 10. Of course, valy= TX+ 1 to make the notion of subsequent well- val add2 x: = T.succ(Z’.su~~(x)) defined, it is necessary to introduce se- end quencing constructs; indeed ML even has then the new structure declaration an iterative while construct. References in ML amount to assignable structure U = F(S) variables in a conventional programming is equivalent to having written language and are only notable in that they can be included within a type structure structure U = such as Hindley-Milner’s and can be rele- struct gated to a minor role in a language that is valy=x+l val add2 x = succ(succ(~)) generally proclaimed as being functional. A val x: = 5 proper treatment of references within a val succ x = x+1 Hindley-Milner type system can be found end in Tofte [1988]. except that the signature of U does not include bindings for x and succ (i.e., they 1.6.3 Modules are hidden). Modules in ML are called structures and Although seemingly simple, the ML are essentially reified environments. The module facility has one very noteworthy type of a structure is captured in its signa- feature: Structures are (at least partially) ture and contains all of the static properties first class in that functor take them as of a module that are needed by some other arguments and return them as values. A module that might use it. The use of one more conservative design (such as adopted module by another is captured by special in Miranda and Haskell) might require all functions called functors that map struc- modules to be named, thus relegating them tures to new structures. This capability is to second-class status. Of course, this first- sometimes called a parameterized module class treatment has to be ended somewhere or generic package. if type checking is to remain effective, and

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 379 in ML that is manifested in the fact that particular, using SASL’s syntax for equa- structures can be passed to functors only tions gave programs a certain mathematical (e.g., they cannot be placed in lists), the flavor, since equations were deemed appli- signature declaration of a functor’s argu- cable through the use of guards and ment is mandatory in the functor declara- Landin’s notion of the off-side rule was tion, and functors themselves are not first revived. For example, this definition of the class. factorial function It is not clear whether this almost-first- class treatment of structures is worth the fat n = 1, n=O extra complexity, but the ML module facil- = n * fac(n-l), n>O ity is certainly the most sophisticated of looks a lot like the mathematical version those found in existing functional program- ming languages, and it is achieved with no 1 ifn=O loss of static type-checking ability, includ- fat n = { n * fac(n - 1) if n > 0 ing the fact that modules may be compiled independently and later linked via functor (In Haskell this program would be written application. with slightly different syntax as fat n(n==O= 1 1.7 SASL, KRC, and Miranda 1C-0 = n*fac(n-1) At the same time ML and FP were being developed, David Turner, first at the Uni- More on equations and pattern matching versity of St. Andrews and later at the may be found in Section 2.4.) University of Kent, was busy working on Another nice aspect of SASL’s equa- another style of functional languages re- tional style is that it facilitates the use of sulting in a series of three languages that higher order functions through currying. characterize most faithfully the modern For example, if we define school of functional programming ideas. addxy=x+y More than any other researcher, Turner [ 1981,1982] argued eloquently for the value then “add” 1 is a function that adds 1 to its of lazy evaluation, higher order functions argument. and the use of recursion equations as a KRC is an embellishment of SASL pri- syntactic sugaring for the lambda calculus. marily through the addition of ZF expres- Turner’s use of recurrence equations was sions (which were intended to resemble consistent with Landin’s argument 10 years Zemelo-Frankel set abstraction and whose earlier, as well as Burge’s [1975] excellent syntax was originally suggested by John treatise on recursive programming tech- Darlington), as well as various other short- niques and Burstall and Darlington’s hands for lists (such as [a . . b] to denote [ 19771 work on program transformation. the list of integers from a to b, and [a . .] to But the prodigious use of higher order func- denote the infinite sequence starting with tions and lazy evaluation, especially the a). For example (using Haskell syntax), latter, was something new and was to be- [ x*x 1x c [l . . loo], odd(x) ] come a hallmark of modern functional pro- gramming techniques. is the list of squares of the odd numbers In the development of SASL (St. from 1 to 100 and is similar to Andrews Static Language) [Turner 19761, KRC (Kent Recursive Calculator) [Turner (x2 1x E (1, 2, . . . , 100) A odd(x)} 19811, and Miranda17 [Turner 19851, Turner concentrated on making things eas- except that the former is a list, the latter is ier on the programmer, and thus he intro- a set. In fact Turner used the term set duced various sorts of syntactic sugar. 
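A few of the examples just mentioned can be collected into a small sketch in current Haskell syntax: the guarded factorial, the curried add, and the squares-of-odd-numbers list comprehension.

fac :: Integer -> Integer
fac n | n == 0 = 1
      | n > 0  = n * fac (n - 1)   -- undefined for negative n, as in the text

add :: Int -> Int -> Int
add x y = x + y

succ1 :: Int -> Int
succ1 = add 1                      -- "add 1" is itself a function

oddSquares :: [Integer]
oddSquares = [x * x | x <- [1 .. 100], odd x]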
In abstraction as synonymous with ZF expres- sion, but in fact both terms are some- I’ Miranda is one of the few (perhaps the only) func- what misleading since the expressions ac- tional languages to be marketed commercially. tually denote lists, not true sets. The more

ACM Computing Surveys, Vol. 21, No. 3, September 1989 380 l Paul Hudak popular current term is list comprehen- around the data dependencies in a program, sion,” which is what is used in the re- resulting in high degrees of parallelism. mainder of this paper. As an example of Since data dependencies were paramount the power of list comprehensions, here is and artificial sequentiality was objectiona- a concise and perspicuous definition of ble, the languages designed to support such quicksort: machines were essentially functional lan- guages, although historically they have 4s[ 1 =[I been called dataflow languages. In fact they qs (x:xs) = qs [y 1ytrs,y<3c ] ++ [x] ++ do have a few distinguishing features, typ- PIY I Y-wY>=xl ically reflecting the idiosyncrasies of the (just as imperative [+-t is the infix append operator.] languages reflect the von Neumann archi- Miranda is in turn an embellishment of tecture): They are typically first order (re- KRC, primarily in its treatment of types: flecting the difficulty in constructing It is strongly typed, using a Hindley-Milner closures in the dataflow model), strict (re- type system, and it allows user-defined con- flecting the data-driven mode of operation crete and abstract datatypes (both of these that was most popular and easiest to imple- ideas were presumably borrowed from ML; ment), and in certain cases do not even see Sections 1.6 and 1.6.1). One interesting allow recursion (reflecting Dennis’ original innovation in syntax in Miranda is its use static dataflow design, rather than, for ex- of sections (first suggested by Richard ample, Arvind’s dynamic tagged-token Bird), which are a convenient way to con- model [Arvind and Gostelow 1977; Arvind vert partially applied infix operators into and Kathail 19811). A good summary of functional values. For example, the expres- work on dataflow machines, at least sions (+), (x+), and (+x) correspond to the through 1981, can be found in Treleaven et functions f, g, and h, respectively, defined al. [1982]; more recently, see Vegdahl by [1989]. fxy=x+y The two most important dataflow lan- i?Y = x+y guages developed during this era were hy =y+x Dennis et al.‘s Val [Ackerman and Dennis 1979; McGraw 19821, and Arvind and In part because of the presence of sections, Gostelow’s Id [1982]. More recently, Val Miranda does not provide syntax for has evolved into SISAL [McGraw et al. lambda abstractions. (In contrast, the 19831 and Id into Id Nouveau [Nikhiel Haskell designers chose to have lambda et al. 19861. The former has retained much abstractions and thus chose not to have of the strict and first-order semantics of sections.) dataflow languages, whereas the latter has Turner was perhaps the foremost pro- many of the features that characterize ponent of both higher-order functions and modern functional languages. lazy evaluation, although the ideas origi- Keller’s FGL [Keller et al. 19801 and nated elsewhere. Discussion of both of Davis’ DDN [Davis 19781 are also notable these topics is delayed until Sections 2.1 developments that accompanied a flurry of and 2.2, respectively, where they are dis- activity on dataflow machines at the Uni- cussed in a more general context. versity of Utah in the late 70’s. Yet another interesting dataflow language is Ashcroft 1.8 Dataflow Languages and Wadge’s Lucid [Ashcroft and Wadge 1976a, 197613; Wadge and Ashcroft 19851 In the area of computer architecture, begin- [McGraw et al. 
19831 whose distinguishing ning predominantly with the work of feature is the use of identifiers to represent Dennis’ and Misuras [1974] the early streams of values (in a temporal, dataflow 1970s there arose the notion of dataflow, sense), thus allowing the expression of it- a computer architecture organized solely eration in a rather concise manner. The authors also developed an algebra for rea- ” A term popularized by . soning about Lucid programs.
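Returning briefly to the list-comprehension style of Section 1.7, the quicksort definition given there reads as follows in current Haskell syntax, where ++ is the predefined append operator.

qs :: Ord a => [a] -> [a]
qs []       = []
qs (x : xs) = qs [y | y <- xs, y < x] ++ [x] ++ qs [y | y <- xs, y >= x]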


1.9 Others tional languages was being hampered by the lack of a common language. Thus it was In the late 1970s and early 1980s a surpris- decided that a committee should be formed ing number of other modern functional lan- to design such a language, providing faster guages appeared, most in the context of communication of new ideas, a stable implementation efforts. These included foundation for real applications develop- Hope at Edinburgh University [Burstall et ment, and a vehicle through which other al. 19801, FEL at Utah [Keller 19821, Lazy people would be encouraged to learn and ML (LML) at Chalmers [Augustsson use functional languages. The result of that 19841, Alfl at Yale [Hudak 19841, Ponder committee’s effort was a purely functional at Cambridge [Fairbairn 19851, Orwell at programming language called Haskell Oxford [Wadler and Miller 19881, Daisy at [Hudak and Wadler 19881, named after Indiana [Johnson N.d.] Twentel at the Haskell B. Curry, and described in University of Twente [Kroeze 19871, and Section 1.10. Tui at Victoria University [Boutel 19881. Perhaps the most notable of these lan- guages was Hope, designed and imple- 1.10 Haskell mented by , David MacQueen, and Ron Sannella at Edinburgh University Haskell is a general-purpose, purely func- [ 19801. Their goal was “to produce a very tional programming language exhibiting simple programming language which en- many of the recent innovations in func- courages the production of clear and tional (as well as other) programming manipulable programs.” Hope is strongly language research, including higher order typed, allows polymorphism, but requires functions, lazy evaluation, static poly- explicit type declarations as part of all morphic typing, user-defined datatypes, function definitions (which also allowed a pattern matching, and list comprehensions. useful form of overloading). It has lazy lists It is also a very complete language in that but otherwise is strict. It also has a simple it has a module facility, a well-defined func- module facility. But perhaps the most sig- tional I/O system, and a rich set of primi- nificant aspect of Hope is its user-defined tive datatypes, including lists, arrays, concrete datatypes and the ability to pat- arbitrary and fixed precision integers, and tern match against them. ML, in fact, did floating-point numbers. in this sense Has- not originally have these features; they kell represents both the culmination and were borrowed from Hope in the design of solidification of many years of research on SML. functional languages-the design was in- To quote Bird and Wadler [1988], this fluenced by languages as old as Iswim and proliferation of functional languages was “a as new as Miranda. testament to the vitality of the subject,” Haskell also has several interesting new although by 1987 there existed so many features; most notably, a systematic treat- functional languages that there truly was a ment of overloading, an orthogonal ab- Tower of Babel, and something had to be stract datatype facility, a universal and done. The funny thing was, the semantic purely functional I/O system, and, by anal- underpinnings of these languages were ogy to list comprehensions, a notion of fairly consistent, and thus the researchers array comprehensions. in the field had very little trouble under- Haskell is not a small language. 
The de- standing each other’s programs, so the mo- cision to emphasize certain features such tivation within the research community to as pattern matching and user-defined da- standardize on a language was not high. tatypes and the desire for a complete and Nevertheless, in September 1987 a meet- practical language that includes such things ing was held at the FPCA Conference in as I/O and modules necessitates a some- Portland, Oregon, to discuss the problems what large design. The Haskell Report also that this proliferation of languages was cre- provides a denotational semantics for both ating. There was a strong consensus that the static and dynamic behavior of the lan- the general use of modern, nonstrict func- guage; it is considerably more complex than

ACM Computing Surveys, Vol. 21, No. 3, September 1989 382 . Paul Hudak the simple semantics defined in Section 2.5 The main philosophical argument for for the lambda calculus, but then again one higher-order functions is that functions are wouldn’t really want to program in as values just like any others, so why not give sparse a language as the lambda calculus. them the same first class status? But there Will Haskell become a standard? Will it are also compelling pragmatic reasons for succeed as a useful programming language? wanting higher-order functions. Simply Only time will tell. As with any other lan- stated, the function is the primary abstrac- guage development, it is not only the qual- tion mechanism over values; thus facilitat- ity of the design that counts but also the ing the use of functions increases the use ready availability of good implementations of that kind of abstraction. and the backing from vendors, government As an example of a higher-order function, agencies, and researchers alike. At this date consider the following: it is too early to tell what role each of these twicefx=f(fx) factors will play. I will end our historical development of which takes its first argument, a function functional languages here, without elabo- f, and applies it twice to its second argu- rating on the details of Haskell just yet. ment, X. The syntax used here is important: Those details will surface in significant twice as written is curried, meaning that ways in the next section, where the most when applied to one argument it returns a important features of modern function lan- function that then takes one more argu- guages are discussed, and in the following ment, the second argument above. For ex- section, where more advanced ideas and ample, the function add2. active research areas are discussed. add2 = twice succ where succ x = n+l 2. DISTINGUISHING FEATURES OF MODERN FUNCTIONAL LANGUAGES is a function that will add 2 to its argument. Making function application associate to Recall that I chose to delay detailed dis- the left facilitates this mechanism, since cussion of four distinguishing features of (twice succ X) is equivalent to ((twice succ) modernfunctionallanguages-higher-order x), so everything works out just fine. functions, lazy evaluation, data abstrac- In modern functional languages func- tion met hanisms, and equations/pattern tions can be created in several ways. One matching. Now that we have completed our way is to name them using equations, as study of the historical development of func- above; another way is to create them di- tional languages, we can return to those rectly as lambda abstractions, thus render- features. Most of the discussion will center ing them nameless, as in the Haskell on how the features are manifested in expression Haskell, ML, and Miranda. \x + x+1 2.1 Higher Order Functions [in lambda calculus this would be written If functions are treated as first-class values Xx.x + 11, which is the same as the successor in a language-allowing them to be stored function succ defined above. add2 can then in data structures, passed as arguments, be defined more succinctly as and returned as results-they are referred add2 = twice (Lx + x+1) to as higher-order functions. 
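The twice and add2 examples, collected into a small sketch in current Haskell:

twice :: (a -> a) -> a -> a
twice f x = f (f x)

add2 :: Int -> Int
add2 = twice succ
  where succ x = x + 1        -- a local succ, shadowing the predefined one

add2' :: Int -> Int
add2' = twice (\x -> x + 1)   -- the lambda-abstraction version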
I have not said too much about the use of higher order From a pragmatic viewpoint, we can un- functions thus far, although they exist in derstand the use of higher-order functions most of the functional languages that I have by analyzing the use of abstraction in gen- discussed, including of course the lambda eral. As known from introductory program- calculus. Their use has in fact been argued ming, a function is an abstraction of values in many circles, including ones outside of over some common behavior (an expres- functional programming, most notably the sion). Limiting the values over which the Scheme community [Abelson et al. 19851. abstraction occurs to nonfunctions seems

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages 383 unreasonable; lifting that restriction re- [An infix operator may be passed as an sults in higher-order functions. Hughes argument by enclosing it in parentheses; makes a slightly different but equally com- thus (:) is equivalent to \x y + x : y.] This pelling argument in Hughes [1984] where version of append simply replaces the [ ] at he emphasizes the importance of modular- the end of the list xs with the list ys. ity in programming and argues convinc- It is easy to verify that the new defini- ingly that higher-order functions increase tions are equivalent to the old using simple modularity by serving as a mechanism for equational reasoning and induction. It is glueing program fragments together. That also important to note that in arriving at glueing property comes not just from the the main abstraction we did nothing out of ability to compose functions but also from the ordinary-we just apply classical data the ability to abstract over functional be- abstraction principles in as unrestricted a havior as described above. way as possible, which means allowing A.s an example, suppose in the course of functions to be first-class citizens. program construction we define a function to add together the elements of a list as 2.2 Nonstrict Semantics (Lazy Evaluation) follows: 2.2.1 Fundamentals sum [] = 0 The normal-order reduction rules of the sum(x:xs) = add x (sum xs) lambda calculus are the most general in Then suppose we later define a function to that they are guaranteed to produce a multiply the elements of a list as follows: normal form if in fact one exists (see Section 1.1). In other words, they result in prod [ ] = 1 termination of the rewriting process most prod(x:xs)= mu1 x (prod xs) often, and the strategy lies at the heart of But now we notice a repeating pattern and the Church-Rosser theorem. Furthermore, anticipate that we might see it again, so we as argued earlier, normal-order reduction ask ourselves if we can possibly abstract allows recursion to be emulated with the the common behavior. In fact, this is easy Y combinator, thus giving the larnbda cal- to do: We note that add/mu1 and O/l are culus the most powerful form of effec- the variable elements in the behavior, and tive computability, captured in Church’s thus we parameterize them; that is, we Thesis. make them formal parameters, say f and Given all this, it is quite natural to con- init. Calling the new function fold, the sider using normal-order reduction as the equivalent of sum/prod will be “fold f init”, computational basis for a programming and thus we arrive at language. Unfortunately, normal-order re- duction, implemented naively, is hopelessly (fold f init) [ ] = init inefficient. To see why, consider this simple (fold f init)(x:xs) = f x ((fold f init) xs) normal-order reduction sequence: where the parentheses around “fold f init” (Xx. (+ x x))(* 5 4) are used only for emphasis, and are other- * (+ (* 5 4)(* 5 4)) wise superfluous. * (+ 20 (* 5 4)) From this we can derive new definitions * (+ 20 20) for sum and product: 4 40 sum = fold add 0 Note that the multiplication (* 5 4) is done prod = fold mu1 1 twice. 
In the general case, this could be an Now that the fold abstraction has been arbitrarily large computation, and it could made, many other useful functions can be potentially be repeated as many times as defined, even something as seemingly un- there are occurrences of the formal param- related as append: eter (for this reason an analogy is often drawn between normal-order reduction and append xs ys = fold (:) ys xs call-by-name parameter passing in Algol).
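The fold abstraction and the functions derived from it can be written out as follows; this is a sketch in current Haskell, which predefines the same abstraction as foldr, with (+) and (*) standing in for add and mul.

fold :: (a -> b -> b) -> b -> [a] -> b
fold f init []       = init
fold f init (x : xs) = f x (fold f init xs)

sum' :: Num a => [a] -> a
sum' = fold (+) 0

prod :: Num a => [a] -> a
prod = fold (*) 1

-- append: replace the final [] of xs with the list ys.
append :: [a] -> [a] -> [a]
append xs ys = fold (:) ys xs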


In practice this can happen quite often, semantics.” In addition to overcoming the reflecting the simple fact that results are efficiency problem of normal-order reduc- often shared. tion, applicative-order reduction could be One solution to this problem is to resort implemented with relative ease using the to something other than normal-order re- call-by-value technology that had duction, such as applicative-order reduc- been developed for conventional imperative tion, which for the above term yields the programming languages. following reduction sequence: Nevertheless, the appeal of normal-order reduction cannot be ignored. Returning to (Ax. (+ x x))(* 5 4) lambda calculus , we can try to get to 4 (Xx. (+ x x)) 20 the root of the efficiency problem, which =$ (+ 20 20) seems to be the following: Reduction in the * 40 lambda calculus is normally described as string reduction, which precludes any pos- Note that the argument is evaluated before sibility of sharing. If instead we were to the P-reduction is performed (similar to describe it as a graph reduction process, a call-by-value parameter-passing mecha- perhaps sharing could be achieved. This nism), and thus the normal form is reached idea was first suggested by Wadsworth in three steps instead of four, with no re- in his Ph.D. dissertation in 1971 [1971, computation. The problem with this solu- Chapter 41, in which he outlined a graph- tion is that it requires the introduction of reduction strategy that used pointers to a special reduction rule to implement re- implement sharing. Using this strategy re- cursion (such as gained through the &rule sults in the following reduction sequence for McCarthy’s conditional), and further- for the first example given earlier: more there are examples for which it does more work than normal-order reduction. For example, consider

Applicative order normal order (Xx. l)(* 5 4) (Xx. l)(* 5 4) * (Xx. 1) 20 *l *l or even worse (repeated from Section 1.1): which takes the same number of steps as Applicative order the applicative-order reduction sequence. We will call an implementation of (Xx. l)((Xx. x x)(Xx. x x)) normal-order reduction in which recompu- - (Xx. l)((Xx. x x)(Xx. x x)) tation is avoided lazy evaluation (another =+ term often used is call by need). Its key feature is that arguments in function calls are evaluated at most once. It possesses the full power of normal-order reduction while Normal order

(Xx. l)((hr. x x)(Xx. x z)) I9 Actually this is not quite true-most implementa- *l tions of these languages use an applicative-order re- duction strategy for the top-level redices only, thus which in the applicative-order case does not yielding what is known as a weak head normal form. terminate. Despite these problems, most of This strategy turns out to be easier to implement than complete applicative-order reduction and also permits the early functional languages, including certain versions of the Y combinator to be imple- pure Lisp, FP, ML, Hope, and all of the mented without special rules. See Burg [1975] for an dataflow languages used applicative-order example of this using Landin’s SECD machine.
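The "at most once" behavior can be observed directly in a lazily evaluated language. The following sketch uses Debug.Trace.trace, available in current Haskell implementations, purely as an observation device: under call by need the trace message appears once, even though x is used twice.

import Debug.Trace (trace)

main :: IO ()
main = print (x + x)                        -- the message is printed once, then 40
  where
    x = trace "evaluating (* 5 4)" (5 * 4)  -- a shared, lazily evaluated binding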

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 385 being more efficient than applicative-order know the greatest common divisor of b and reduction in that “at most once” sometimes c in some computation involving 2. In a amounts to no computation at all. modern functional language we might write Despite the appeal of lazy evaluation and this apparent solution to the efficiency fax problem, it took a good 10 years more for where a = gcd b c researchers to discover ways to implement without worrying about a being evaluated it efficiently compared to conventional pro- gramming languages. The chief problem needlessly-if in the computation of f a n centered on the difficulty in implementing the value of a is needed, it will be computed; lazy graph-reduction mechanisms on con- otherwise it will not. If we were to have ventional computers, which seem to be written this program in Scheme, for exam- better suited to call-by-value strategies. ple, we might try Simulating the unevaluated portions of (let ( (a kcd b c)) 1 the graph in a call-by-value environment (fax)) amounts to implementing closures, or “thunks” efficiently, which have some in- which will always evaluate a. Knowing that herent, nontrivial costs [Bloss et al. 19811. f doesn’t always need that value and being It is beyond the scope of this paper to concerned about efficiency, we may decide discuss the details of these implementation to rewrite this as concerns; see Peyton-Jones [1987] for an excellent summary. (let ( (a (delay &cd b cl)) 1 Rather than live with conventional com- (fax)) puters, we could alternatively build special- ized graph-reduction or dataflow hardware, which requires modifying f so as to force its but so far this has not resulted in any first argument. Alternatively, we could just practical, much less commercially avail- write (f b c x), which requires modifying f able, machines. Nevertheless, this work is so as to compute the gcd internally. Both quite promising, and good summaries of of these solutions are severe violations of work in this area can be found in articles modularity and arise out of the program- by Treleaven et al. [1982] and Vegdahl mer’s concern about evaluation order. Lazy [1984], both of which are reprinted in evaluation eliminates that concern and pre- Thakkar [ 19871. serves modularity. The second argument for lazy evaluation is perhaps the one more often heard: the 2.2.2 Expressiveness ability to compute with unbounded “infinite” Assuming that we can implement lazy eval- data structures. The idea of lazily evaluat- uation efficiently (current practice is con- ing data structures was first proposed by sidered acceptably good), we should return Vuillemin [ 19741, but similar ideas were to the question of why we want it in the developed independently by Henderson first place. Previously we argued on philo- and Morris [1976] and Friedman and Wise sophical grounds-it is the most general [1976]. In a later series of papers, Turner evaluation policy-but is lazy evaluation [1981, 19821 provided a strong argument useful to programmers in practice? The for using lazy lists, especially when com- answer is an emphatic yes, which I will bined with list comprehensions (see Sec- show via a twofold argument. tion 1.7) and higher order functions (see First, lazy evaluation frees a programmer Section 2.1). Aside from Turner’s elegant from concerns about evaluation order. 
The examples, Hughes [1984] presented an ar- fact is, programmers are generally con- gument based on facilitating modularity in cerned about the efficiency of their pro- programming where, along with higher or- grams, and thus they prefer not evaluating der functions, lazy evaluation is described things that are not absolutely necessary. As as a way to “glue” pieces of programs a simple example, suppose we may need to together.

ACM Computing Surveys, Vol. 21, No. 3, September 1939 306 . Paul Hudak The primary power of lazily evaluated and type checking in particular. Some of data structures comes from its use in sep- this work has also taken on a theoretical arating data from control. The idea is that flavor, not only in foundational mathemat- a programmer should be able to describe a ics where logicians have used types to re- specific data structure without worrying solve many famous paradoxes, but also in about how it gets evaluated. Thus, for ex- formal semantics where types aid our un- ample, we could describe the sequence of derstanding of programming languages. natural numbers by the following simple Fueling the theoretical work are two sig- program: nificant pragmatic advantages of using data abstraction and strong typing in one’s nats = 0: map succ nats programming methodology. First, data ab- straction improves the modularity, secu- or alternatively by rity, and clarity of programs. Modularity is numsfrom n = n:numsfrom (n+l) improved because we can abstract away nats = numsfrom 0 from implementation (i.e., representation) details; security is improved because inter- These are examples of “infinite lists”, or face violations are automatically prohib- streams, and in a language that did not ited, and clarity is improved because data support lazy evaluation would cause the abstraction has an almost self-documenting program to diverge. With lazy evaluation flavor. these data structures are only evaluated Second, strong static typing helps in de- as they are needed, on demand. For exam- bugging since we are guaranteed that if a ple, we could define a function that filters program compiles successfully no error can out only those elements satisfying a pro- occur at run time due to type violations. It perty p : also leads to more efficient implementa- tions, since it allows us to eliminate the filter p (X : xs) most run-time tag bits and type testing. = if (p X) then (x: rest) else rest Thus there is little performance penalty for where rest = filter p xs using data abstraction techniques. Of course, these issues are true for any in which case “filter p nats” could be writ- programming language, and for that reason ten knowing that the degree of the list’s a thorough treatment of types and data computation will be determined by its con- abstraction is outside the scope of this sur- text-that is, the consumer of the result. vey; the reader may find an excellent sum- Thus filter has no operational control mary in Cardelli and Wegner [1985]. The within it and can be combined with other basic idea behind the Hindley-Milner type functions in a modular way. For example, system was discussed in Section 1.6.1. I will we could compose it with a function to concentrate in this section on how data square each element in a list: abstraction is manifest in modern func- map (\x-+x*x) . filter p tional languages.
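The stream examples above, collected into a sketch in current Haskell; filter is predefined, so a primed name is used, and its [] case is omitted exactly as in the definition given above.

nats :: [Integer]
nats = 0 : map succ nats

numsfrom :: Integer -> [Integer]
numsfrom n = n : numsfrom (n + 1)

filter' :: (a -> Bool) -> [a] -> [a]
filter' p (x : xs) = if p x then x : rest else rest
  where rest = filter' p xs

squaresOfOdds :: [Integer]
squaresOfOdds = map (\x -> x * x) (filter' odd nats)

main :: IO ()
main = print (take 5 squaresOfOdds)   -- [1,9,25,49,81]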

[. is the infix composition operator.] This kind of modular result, in which data is 2.3.1 Concrete Datatypes separated from control, is one of the key As mentioned earlier, there is a strong ar- features of lazy evaluation. Many more ex- gument for wanting language features that amples of this kind may be found in Hughes facilitate data abstraction, whether or not [1984]. the language is functional. In fact such mechanisms were first developed in the context of imperative languages such as 2.3 Data Abstraction , Clu, and Euclid. It is only natural Independently of the development of func- that they be included in functional lan- tional languages there has been considera- guages. ML, as I mentioned, was the first ble work on data abstraction in general and functional language to do this, but many on strong typing, user-defined datatypes others soon followed suit.


In this section I will describe concrete the expressions \e1+e2, [e,], and (e1,e2) (or algebraic) datatypes as well as type have the types Tl+T2, [T,], and (Tl,T2), synonyms. I will use Haskell syntax, but respectively.] the ideas are essentially identical (at least Instances of these new types are built semantically) to those used in ML and simply by using the constructors. Thus Miranda. Empty is an empty tree, and Node 5 Empty New algebraic datatypes may be defined is a very simple tree of integers with one along with their constructors using data element. The type of the instance is in- declarations, as in the following definitions ferred via the same type inference mecha- of lists and trees: nism that infers types of polymorphic functions, as described previously. data List a Defining new concrete datatypes is fairly = Nil / Cons a (List a) data Tree b common not only in functional languages = Empty 1Node 6 (List (Tree b)) but also in imperative languages, although the polymorphism offered by modern func- The identifiers List and Tree are called type tional languages makes it all the more constructors, and the identifiers a and b are attractive. called type variables, which are implicity Type synonyms are a way of creating new universally quantified over the scope of the names for types, such as in the following: data declaration. The identifiers Nil, Cons, Empty, and Node are called data construc- type Intree = Tree Ints tors, or just constructors, with Nil and type Flattener = Intree -+ [Ints] Empty being nullary constructors. [Note Note that Intree is used in the definition of that both type constructors and data con- Flattener.Type synonyms do not introduce structors are capitalized, so that the latter new types (as data declarations do) but can be used without confusion in pattern rather are a convenient way to introduce matching (as discussed in the next section) new names (i.e., synonyms) for existing and to avoid confusion with type variables types. (such as a and b in the above example).] List and Tree are called type construc- tors since they construct types from other 2.3.2 Abstract Datatypes types. For example, Tree Ints is the type of Another idea in data abstraction originat- trees of integers. Reading from the data ing in imperative languages is the notion of declaration for Tree, we see then that a tree an abstract datatype (ADT) in which the of integers is either Empty or a Node con- details of the implementation of a datatype taining an integer and a list of more trees are hidden from the users of that type, thus of integers. enhancing modularity and security. The We can now see that the previously given traditional way to do this is exemplified type signature for map, by ML’s ADT facility and emulated in map :: (a * b) - [al ---fPI Miranda. Although the Haskell designers chose a different approach to ADTs (de- is equivalent to scribed below), the following example of a map :: (a-+b)+(Lista)-+(Listb) queue ADT is written as if Haskell had ML’s kind of ADTs, using the keyword That is, [. . .] in a type expression is just abstype: syntax for application of the type construc- tor List. Similarly, we can think of + as an abstype Queue a = Q [a] infix type constructor that creates the type where first (Q us) = last us of all functions from its first argument (a isempty (Q [ 1) = True type) to its second (also a type). isempty (Q as) = False [A useful property to note is the con- sistent syntax used in Haskell for ex- pressions and types. 
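The concrete datatype and type synonym declarations above, written as a compilable sketch in current Haskell, with the predefined Int where the text writes Ints:

data List a = Nil | Cons a (List a)

data Tree b = Empty | Node b (List (Tree b))

type Intree    = Tree Int
type Flattener = Intree -> [Int]

aTree :: Tree Int
aTree = Node 5 Nil   -- a one-element tree; its list of subtrees is Nil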
Specifically, if Ti is The main point is that the functions first, the type of expression or pattern ci, then isempty, and so on, are visible in the scope

ACM Computing Surveys, Vol. 21, No. 3, September 1989 388 l Paul Hudak of the abstype declaration, but the con- general and can be nested. Haskell uses a structor Q, including its type, is not. Thus conservative module system that relies on a user of the ADT has no idea whether this capability. queues are implemented as lists (as shown A disadvantage of the orthogonal ap- here) or some other data structure. The proach is that if the most typical ADT advantage of this, of course, is that one is scenario only requires hiding the represent- free to change the representation type with- ative type, the user will have to think out fear of breaking some other code that through the details in each case rather than uses the ADT. having the hiding done automatically by using abstype. 2.3.3 Haskell’s Orthogonal Design In Haskell a rather different approach was 2.4 Equations and Pattern Matching taken to ADTs. The observation was made One of the programming methodology at- that the main difference between a concrete tributes that is strongly encouraged in the and abstract datatype was that the latter modern school of functional programming had a certain degree of information hiding is the use of equational reasoning in the built into it. So instead of thinking of ab- design and construction of programs. The stract datatypes and information hiding as lack of side effects accounts for the primary going hand in hand, the two were made ability to apply equational reasoning, but orthogonal components of the language. there are syntactic features that can facili- More specifically, concrete datatypes were tate it as well. Using equations as part of made the only data abstraction mechanism, the syntax is the most obvious of these, but and to that an expose declaration was along with that goes pattern matching added to control information hiding, or whereby one can write several equations visibility. when defining the same function, only one For example, to get the effect of the of which is presumably applicable in a given earlier definition of a queue, we would write situation. Thus modern functional lan- expose Queue, first, isempty guages have tried to maximize the expres- from data Queue a = Q [a] siveness of pattern matching. first (Q as) = last as At first blush, equations and pattern isempty (Q [ ] = True matching seem fairly intuitive and rela- isempty (Q us) = False tively innocuous. Indeed, we have already given many examples that use pattern matching without having said much about Since Q is not explicitly listed in the expose the details of how it works. But in fact declaration, it becomes hidden from the pattern matching can have surprisingly user of the ADT. subtle effects on the semantics of a lan- The advantage of this approach to ADTs guage and thus should be carefully and is more flexibility. For example, suppose we precisely defined. also wish to hide isempty or perhaps some auxiliary function defined in the nested 2.4.1 Pattern Matching Basics scope. This is trivially done with the or- Pattern matching should actually be viewed thogonal design but is much harder with as the primitive behavior of a case expres- the ML design as described so far. Indeed, sion, which has the general form to alleviate this problem the ML designers provided an additional construct, a local case e of declaration, with which one can hide local pat1 + el declarations. 
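The expose construct described above did not survive into Haskell as it was eventually standardized; the corresponding hiding is obtained today with a module export list. The following is therefore only an analogue of the queue example, a sketch under that assumption, with empty and enqueue added so that the module offers some way to build a queue.

module Queue (Queue, empty, enqueue, first, isempty) where

newtype Queue a = Q [a]   -- the constructor Q is not exported, so it is hidden

empty :: Queue a
empty = Q []

enqueue :: a -> Queue a -> Queue a
enqueue x (Q xs) = Q (x : xs)

first :: Queue a -> a
first (Q xs) = last xs

isempty :: Queue a -> Bool
isempty (Q []) = True
isempty (Q _)  = False

Clients importing Queue can use first and isempty but cannot observe whether a list or some other representation sits underneath.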
Another advantage of the or- pat2 + e2 thogonal design is that the same mecha- nism can be used at the top level of a patn -+ en module to control visibility of the internals of the module to the external world. In Intuitively, if the structure of e matches other words, the expose mechanism is very pati, then the result of the case expression

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 389 is ei. A set of equations of the form Another possibility that seems desirable is the ability to repeat formal parameters fpatl = el on the left-hand side to denote that argu- f pat2 = e2 ments in those positions must have the same value, as in the second line of the f patn = en following definition: can then be thought of as shorthand for member x [ ] = False , x (x : xs) = True f=\x+caserof , x (y : xs) = member x xs pat1 -+ el pat2 ---, e2 [This is not legal Haskell syntax, since such repetition of identifiers is not allowed.] patn + en Care must, however, be taken with this approach since in something like Despite this translation, for convenience I alleq [x, x, x] = True will use the equational syntax in the re- , mainder of this section. Y = False The question to be asked first is just what it is not clear in what order the elements of the pattern-matching process consists of. the list are to be evaluated. For example, if For example, what exactly are we pattern they are evaluated left to right then “alleq matching against? One idea that seems rea- [I, 2, bot]“, where bot is any nonterminat- sonable is to allow one to pattern match ing computation, will return False, whereas against constants and data structures. with a right to left order the program will Thus, fat can be defined by diverge. One solution that would at least fat 0 = 1 guarantee consistency in this approach is ’ n = n*fac(n-1) to insist that all three positions are evalu- ated so that the program diverges if any of [The tick mark in the second equation is the arguments diverge. an abbreviation for fat.] But note that this In general the problem of what gets eval- relies on a top-to-bottom reading of the uated, and when, is perhaps the most subtle program, since (fat 0) actually matches aspect of reasoning about pattern matching both equations. We can remove this ambi- and suggests that the pattern-matching al- guity by adding a guard (recall the discus- gorithm be fairly simple so as not to mislead sion in Section 1.7), which in Haskell looks the user. Thus in Haskell the above repe- like the following: tition of identifiers is disallowed-equa- fat 0 = 1 tions must be linear-but some functional ’ n ] n>O = n*fac(n-1) languages allow it (e.g., Miranda and Alfl [Hudak 19841). As we have already demonstrated through A particularly subtle version of this prob- several examples, it is also reasonable to lem is captured in the following example. pattern match against lists: Consider these definitions:

length [] = 0 data Silly a = Foo a ] Other (x: xs) = 1 + length xs bar (Foo x) = 0 ’ Other = 1 and for that matter any data structure, including user-defined ones: Then a call “bar bot” will diverge, since bar data Tree2 a must be strict in order that it can distin- = Leaf a ] Branch (Tree2 a) (Tree2 a) guish between the two kinds of arguments fringe (Leaf x) = [x] that it might receive. But now consider this 9 (Branch left right) small modification: = fringe left ++ fringe right data Silly a = Foo a where ++ is the infix append operator. bar (Foo x) = 0
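For reference, the list and tree pattern-matching examples read as follows in current Haskell, together with a legal rendering of member; since identifiers may not be repeated in a pattern, the equality test is written explicitly.

length' :: [a] -> Int
length' []       = 0
length' (x : xs) = 1 + length' xs

data Tree2 a = Leaf a | Branch (Tree2 a) (Tree2 a)

fringe :: Tree2 a -> [a]
fringe (Leaf x)            = [x]
fringe (Branch left right) = fringe left ++ fringe right

member :: Eq a => a -> [a] -> Bool
member x []       = False
member x (y : ys) = x == y || member x ys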

ACM Computing Surveys, Vol. 21, No. 3, September 1989 390 l Paul Hudak Now a call bar bot seems like it should thetical keyword else (not valid in Haskell): return 0, since bar need not be strict-it can only receive one kind of argument and sameShallowStructure [a] [c] = True [a&] [c,d] = True thus does not need to examine it unless it else is needed in computing the result, which in 7 x Y = False this case it does not. In Haskell a mecha- nism is provided so that either semantics The first two equations would be combined may be specified. using a disjoint semantics; together they Two useful discussions on the subject of would then be combined with the third pattern matching can be found in Augusts- using a top-to-bottom semantics. Thus the son [1985] and Wadler [1987a]. third equation acts as an “otherwise” clause in the case that the first two fail. A design of this sort was considered for Haskell early 2.4.2 Connecting Equations on, but the complexities of disjointness, especially in the context of guards, were Let us now turn to the more global issue of considered too great and the design was how the individual equations are connected eventually rejected. together as a group. As mentioned earlier, one simple way to do this is give the equa- tions a top-to-bottom priority, so that in 2.4.3 Argument Order In the same sense that it is desirable to fat 0 = 1 have the order of equations be irrelevant, ’ n = n * fac(n-1) it is desirable to have the order of argu- ments be irrelevant. In exploring this pos- the second equation is tried only after the sibility, consider first the functions f and’g first one has failed. This is the solution defined by adopted in many functional languages, in- fll=l cluding Haskell and Miranda. fZx=Z An alternative method is to insist that gll=l the equations be disjoint, thus rendering gx2=2 the order irrelevant. One significant moti- vation for this is the desire to reason about which differ only in the order of their ar- the applicability of an equation indepen- guments. Now consider to what “f 2 bot” dently of the others, thus facilitating equa- should evaluate. Clearly the equations for f tional reasoning. The question is, HOW can are disjoint, clearly the expression matchs one guarantee disjointness? For equations only the second equation, and since we without guards, the disjointness property want a nonstrict language it seems the an- can be determined statically; that is, by just swer should clearly be 2. For a compiler to examining the patterns. Unfortunately, achieve this it must always evaluate the when unrestricted guards are allowed, the first argument to f first. problem becomes undecidable, since it Now consider the expression “g bot 2”- amounts to determining the equality of ar- by the same argument given above the re- bitrary recursive predicates. This in turn sult should also be 2, but now the compiler can be solved by resolving the guard dis- must be sure always to evaluate the second jointness at run time. On the other hand, argument to g first. Can a compiler always this solution ensures correctness only for determine the correct order in which to values actually encountered at run time, evaluate its arguments? and thus the programmer might apply To help answer that quation, first con- equational reasoning erroneously to as yet sider this intriguing example (due to Berry unencountered values. [ 19781): The two ideas could also be combined by fOlx=l providing two different syntaxes for joining flrO=Z equations. 
2.4.3 Argument Order

In the same sense that it is desirable to have the order of equations be irrelevant, it is desirable to have the order of arguments be irrelevant. In exploring this possibility, consider first the functions f and g defined by

f 1 1 = 1
f 2 x = 2

g 1 1 = 1
g x 2 = 2

which differ only in the order of their arguments. Now consider to what "f 2 bot" should evaluate. Clearly the equations for f are disjoint, clearly the expression matches only the second equation, and since we want a nonstrict language it seems the answer should clearly be 2. For a compiler to achieve this it must always evaluate the first argument to f first.

Now consider the expression "g bot 2"-by the same argument given above the result should also be 2, but now the compiler must be sure always to evaluate the second argument to g first. Can a compiler always determine the correct order in which to evaluate its arguments?

To help answer that question, first consider this intriguing example (due to Berry [1978]):

f 0 1 x = 1
f 1 x 0 = 2
f x 0 1 = 3

Clearly these equations are disjoint. So what is the value of "f 0 1 bot"? The desired answer is 1. And what is the value of "f 1 bot 0"? The desired answer is 2. And what is the value of "f bot 0 1"? The desired answer is 3. But now the most difficult question is: in what order should the arguments be evaluated? If we evaluate the third one first, then the answer to the first question cannot be 1. If we evaluate the second one first, then the answer to the second question cannot be 2. If we evaluate the first one first, then the answer to the third question cannot be 3. In fact there is no sequential order that will allow us to answer these three questions the way we would like-some kind of parallel evaluation is required.

This subtle problem is solvable in several ways, but they all require some kind of compromise: insisting on a parallel (or pseudoparallel) implementation, rejecting certain seemingly valid programs, making equations more strict than one would like, or giving up on the independence of the order of evaluation of arguments. In Haskell the last solution was chosen-perform a left-to-right evaluation of the arguments-because it presented the simplest semantics to the user, which was judged to be more important than making the order of arguments irrelevant. Another way to explain this is to think of equations as syntax for nested lambda expressions, in which case one might not expect symmetry with respect to the arguments anyway.

2.5 Formal Semantics

Simultaneously with work on functional languages, Scott, Strachey, and others were busy establishing the foundations of denotational semantics, now the most widely used tool for describing the formal semantics of programming languages. There was a close connection between this work and functional languages, primarily because the lambda calculus served as one of the simplest programming languages with enough useful properties to make its formal semantics interesting. In particular, the lambda calculus had a notion of self-application, which implies that certain domains had to contain their own function spaces. That is, it required a domain D that was a solution to the following domain equation:

D = D -> D

At first this seems impossible-surely there are more functions from D into D than there are elements in D-but Scott [1970] was able to show that indeed such domains existed, as long as one was willing to restrict the allowable functions in certain (quite reasonable) ways and by treating = as an isomorphism rather than an equality. Scott's work served as the mathematical foundation for Strachey's work [Milne and Strachey 1976] on the denotational semantics of programming languages; see [Stoy 1979] and [Schmidt 1985] for thorough treatments.

Denotational semantics and functional programming have close connections, and the functional programming community emphasizes the importance of formal semantics in general. For completeness and to show how simple the denotational semantics of a functional language can be, we give the semantics of the recursive lambda calculus with constants defined in Section 2.3.

Bas = Int + Bool + ...        Basic values
D   = Bas + (D -> D)          Denotable values
Env = Id -> D                 Environments

E : Exp -> Env -> D
K : Con -> D

E[[x]] env     = env[[x]]
E[[c]] env     = K[[c]]
E[[e1 e2]] env = (E[[e1]] env) (E[[e2]] env)
E[[\x.e]] env  = \v. E[[e]] env[v/x]
E[[e where x1 = e1; ...; xn = en]] env = E[[e]] env'
  where env' = fix (\env'. env[(E[[e1]] env')/x1, ..., (E[[en]] env')/xn])
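To make the definition above concrete, here is a minimal executable transcription of it into Haskell. This is only a sketch: the datatype names (Exp, Value), the use of Data.Map for environments, and the treatment of constants as integers are assumptions made here for illustration, and the recursive environment of the where clause is obtained by ordinary lazy knot-tying rather than an explicit fix.

  import qualified Data.Map as Map

  data Exp = Var String | Con Int | App Exp Exp | Lam String Exp
           | Where Exp [(String, Exp)]          -- e where x1 = e1; ...; xn = en

  data Value = Bas Int | Fun (Value -> Value)   -- D = Bas + (D -> D)

  type Env = Map.Map String Value               -- Env = Id -> D

  eval :: Exp -> Env -> Value
  eval (Var x)      env = env Map.! x                     -- E[[x]]env = env[[x]]
  eval (Con c)      _   = Bas c                           -- E[[c]]env = K[[c]]
  eval (App e1 e2)  env = case eval e1 env of             -- E[[e1 e2]]env
                            Fun f -> f (eval e2 env)
                            Bas _ -> error "cannot apply a basic value"
  eval (Lam x e)    env = Fun (\v -> eval e (Map.insert x v env))
  eval (Where e bs) env = eval e env'                     -- recursive environment
    where env' = foldr (\(x, rhs) m -> Map.insert x (eval rhs env') m) env bs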


This semantics is relatively simple, but in moving to a more complex language such as Miranda or Haskell the semantics can become significantly more complex due to the many syntactic features that make the languages convenient to use. In addition, one must state precisely the static semantics as well, including type checking and pattern-matching usage. This complexity is managed in the Haskell Report by first translating Haskell into a kernel which is only slightly more complex than the above.

3. ADVANCED FEATURES AND ACTIVE RESEARCH AREAS

Some of the most recent ideas in functional language design are new enough that they should be regarded as on-going research. Nevertheless, many of them are sound enough to have been included in current language designs. In this section we will explore a variety of such ideas, beginning with some of the innovative ideas in the Haskell design. Some of the topics have a theoretical flavor, such as extensions of the Hindley-Milner type system; some have a pragmatic flavor, such as expressing nondeterminism, efficient arrays, and I/O; and some involve the testing of new application areas, such as parallel and distributed computation. All in all, studying these topics should provide insight into the goals of functional programming as well as some of the problems in achieving those goals.

3.1 Overloading

The kind of polymorphism exhibited by the Hindley-Milner type system is what Strachey called parametric polymorphism to distinguish it from another kind that he called ad hoc polymorphism, or overloading. The two can be distinguished in the following way: A function with parametric polymorphism does not care what type certain of its arguments have and thus it behaves identically regardless of the type. In contrast, a function with ad hoc polymorphism does care and in fact may behave differently for different types. Stated another way, ad hoc polymorphism is really just a syntactic device for overloading a particular function name or symbol with more than one meaning.

For example, the function map defined earlier exhibits parametric polymorphism and has typing

map :: (a -> b) -> [a] -> [b]

Regardless of the kind of list given to map it behaves in the same way. In contrast, consider the function +, which we normally wish to behave differently for integer and floating point numbers and not at all (i.e., be a static error) for nonnumeric arguments. Another common example is the function == (equality), which certainly behaves differently when comparing the equality of two numbers versus, say, two lists.

Ad hoc polymorphism is normally (and I suppose appropriately) treated in an ad hoc manner. Worse, there is no accepted convention for doing this; indeed ML, Miranda, and Hope all do it differently. Recently, however, a uniform treatment for handling overloading was discovered independently by Kaes [1988] (in trying to generalize ML's ad hoc equality types) and Wadler and Blott [1989] (as part of the process of defining Haskell). Below I will describe the solution as adopted in Haskell; details may be found in Wadler and Blott [1989].

The basic idea is to introduce a notion of type classes that capture a collection of overloaded operators in a consistent way. A class declaration is used to introduce a new class and the overloaded operators that must be supported by any type that is an instance of that class. An instance declaration declares that a certain type is an instance of a certain class, and thus included in the declaration are the definitions of the overloaded operators instantiated on the named type.

For example, say that we wish to overload + and negate on types Int and Float. To do so, we introduce a new type class called Num:

class Num a where
  (+)    :: a -> a -> a
  negate :: a -> a

This declaration may be read "a type a belongs to the class Num if there are (overloaded) functions + and negate, of the appropriate types, defined on it."

We may then declare Int and Float to be instances of this class, as follows:

instance Num Int where
  x + y    = addInt x y
  negate x = negateInt x

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages 9 393 instance Num Float where 3.2 Purely Functional Yet Universal I/O x+Y = addFloat x y negate x = negateFloat n To many the notion of I/O conjures an image of state, side effects, and sequencing. [note how infix operators are defined; Is there any hope at achieving purely func- Haskell’s lexical syntax prevents ambigui- tional yet universal and of course efficient ties] where addInt, negateInt, addFloat, I/O? Suprisingly, the answer is yes. Per- and negateFloat are assumed in this case haps even more surprising is that over the to be predefined functions but in general years there have emerged not one but two could be any user-defined function. The seemingly very different solutions: first declaration above may be read “Int is an instance of the class Num as witnessed The lazy stream model, in which temporal by these definitions of + and negate.” events are modeled as lists, whose lazy Using type classes we can thus treat over- semantics mimics the demand-driven be- loading in a consistent, arguably elegant, havior of processes. way. Another nice feature is that type The continuation model in which tem- classes naturally support a notion of inher- porality is modeled via explicit contin- itance. For example, we may define a class uations. Eq by Although papers have been written ad- class Eq a where (==) voacting both solutions, and indeed they :: a + a + Boo1 are very different in style, the two solutions Given this class, we would certainly expect turn out to be exactly equivalent in terms all members of the class Num, say, to have of expressiveness; in fact, there is an almost == defined on them. Thus the class decla- trivial translation from one to the other. ration for Num could be changed to The Haskell I/O system takes advantage of this fact and provides a unified framework class Eq a * Num a where that supports both styles. The specific I/O (+I ::a+a+a negate :: a + a operations available in each style are iden- tical-what differs is the way they are ex- which can be read as “only members of the pressed-and thus programs in either style class Eq may be members of the class Num, may be combined with a well-defined se- and a type a belongs to the class Num if. . . mantics. In addition, although certain of (as before).” Given this class declaration, the primitives rely on nondeterministic be- instance declarations for Num must include havior in the , referential a definition of == as in transparency is still retained internal to a Haskell program. instance Num Int where In this section the two styles will be x+Y = addInt x y negate n = negateInt x described as they appear in Haskell, to- = eqInt x y gether with the translation of one in terms X”Y of the other. Details of the actual Haskell The Haskell Report uses this inheritance design may be found in Hudak and Wadler mechanism to define a very rich hierarchi- [1988], and a good discussion of the trade- cal numeric structure that reflects fairly offs between the styles, including examples, well a mathematician’s view of numbers. may be found in Hudak and Sundaresh The traditional Hindley-Milner type [ 19881. ML, by the way, uses an imperative, system is extended in Haskell to include referentially opaque, form of I/O (perhaps type classes. The resulting type system is not surprising given the presence of refer- able to verify that the overloaded operators ences); Miranda uses a rather restricted do have the appropriate type. 
It is however, form of the stream model; and Hope uses possible (but not likely) for ambiguous sit- the continuation model but with a strict uations to arise, which in Haskell result in (i.e., call-by-value) semantics. type error but can be reconciled explicitly To begin, we can paint an appealing by the user (see Hudak and Wadler [1988] functional view of a collection of programs for the details). executing within an operating system (OS)

ACM Computing Surveys, Vol. 21, No. 3, September 1989 394 l Paul Hudak as shown in Figure 1. With this view pro- and the corresponding response list, grams are assumed to communicate with the OS via messages-programs issue re- [ . . . , Success, Return ~2, . . . ] quests to the OS and receive responses from the OS. then sl == 52, unless there were some Ignoring for now the OS itself as well as intervening external effect. the merge and split operations, a program In contrast, the continuation model is can be seen as a function from a stream normally characterized by a set of transac- (i.e., list) of responses to a stream of re- tions. Each transaction typically takes a quests. Although the above picture is quite success continuation and a failure contin- intuitive, this latter description may seem uation as arguments. These continuations counterintuitive-how can a program re- in turn are simply functions that generate ceive a list of responses before it has gen- more transactions. For example, erated any requests? But remember that we are using a lazy (i.e., nonstrict) language, data Transaction and thus the program is not obliged to = ReadFile’ Name FailCont RetCont examine any of the responses before it is- 1WriteFile’ Name Contents sues its first request. This application of FailCont SuccCont lazy evaluation is in fact a very common type FailCont style of programming in functional lan- = ErrorMsg + Transaction guages. type RetCont = Contents + Transaction Thus a Haskell program engaged in I/O Type SuccCont Transaction is required to have type Behavior, where [In Haskell the transactions are actually type Behavior = [Response] -+ [Request] provided as functions rather than construc- tors; see below.] The special transaction (Recall from Section 2.3 that [Response] is Done represents program termination. the type consisting of lists of values of type These declarations should be compared Response.) The main operational idea is with those for the stream model given that the nth response is the reply of the earlier. operating system to the nth request. Returning to the simple example given For simplicity, we will assume that there earlier, the request and response list are no are only two kinds of requests and three longer separate entities since their effect is kinds of responses, as defined below: interwoven into the continuation structure, yielding something like this: data Request = ReadFile’ Name WriteFile’ fname sl exit 1 WriteFile’ Name Contents (ReadFile’ fname exit (\s2 + . . .) ) data Response where exit errmsg = Done = Success 1Return Contents 1Failure ErrorMsg in which case, as before, we would expect type Name = String type Contents = String sl == s2 in the absence of external effects. type ErrorMsg= String This is essentially the way I/O is handled in Hope. Although these two styles seem very dif- [This is a subset of the requests available ferent, there is a simple translation of the in Haskell.] continuation model into the stream model. As an example, given this request list, In Haskell, instead of defining the new datatype Transaction, a set of functions is 1 . . . , WriteFile fname sl, Readfile fname, defined that accomplishes the same task . . . 1 but that is really stream transformers in


Figure 1. Functional I/O. (The original diagram shows a program exchanging a stream of requests and a stream of responses with the operating system.)

the request/response style. In other words, the type Transaction should be precisely the type Behavior and should not be a new datatype at all. Thus we arrive at

readFile  :: Name -> FailCont -> RetCont  -> Behavior
writeFile :: Name -> Contents -> FailCont -> SuccCont -> Behavior
done      :: Behavior

type FailCont = ErrorMsg -> Behavior
type RetCont  = Contents -> Behavior
type SuccCont = Behavior

readFile name fail succ resps =
  (ReadFile name) : case (head resps) of
                      Return contents -> succ contents (tail resps)
                      Failure msg     -> fail msg (tail resps)

writeFile name contents fail succ resps =
  (WriteFile name contents) : case (head resps) of
                                Success     -> succ (tail resps)
                                Failure msg -> fail msg (tail resps)

done resps = [ ]

This pleasing and very efficient translation allows us to write Haskell programs in either style and mix them freely.

The complete design, of course, includes a fairly rich set of primitive requests besides ReadFile and WriteFile, including a set of requests for communicating through channels, which include things such as standard input and output. Because the merge itself is implemented in the operating system, referential transparency is retained within a Haskell program.

Another useful aspect of the Haskell design is that it includes a partial (but rigorous) specification of the behavior of the OS itself. For example, it introduces the notion of an agent that consumes data on output channels and produces data on input channels. The user is then modeled as an agent that consumes standard output and produces standard input.

This particular agent is required to be strict in the standard output, corresponding to the notion that the user reads the terminal display before typing at the keyboard. No other language design that I am aware of has gone this far in specifying the I/O system with this degree of precision; it is usually left implicit in the specification. It is particularly important in this context, however, because the proper semantics relies critically on how the OS consumes and produces the request and response lists.

To conclude this section I will show two Haskell programs that prompt the user for the name of a file, read and echo the file name, and then look up and display the contents of the file on standard output. The first version uses the stream model, the second the continuation model. [The operator !! is the list selection operator; thus xs !! n is the nth element in the list xs.]

main resps =
  [ AppendChannel "stdout" "please type a filename\CR\",
    if (resps !! 1 == Success) then (ReadChannel "stdin"),
    AppendChannel "stdout" fname,
    if (resps !! 3 == Success) then (ReadFile fname),
    AppendChannel "stdout" (case resps !! 4 of
                              Failure msg          -> "can't open"
                              Return file-contents -> file-contents) ]
  where fname = case resps !! 2 of
                  Return user-input -> get-line user-input

main = appendChannel "stdout" "please type a filename\CR\" exit
        (readChannel "stdin" exit (\user-input ->
         appendChannel "stdout" fname exit
          (readFile fname
            (\msg      -> appendChannel "stdout" "can't open" exit done)
            (\contents -> appendChannel "stdout" contents exit done))
          where fname = get-line user-input))
exit msg = done
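To see the nth-response/nth-request discipline at work without a real operating system, a Behavior can be run against a purely functional mock OS. The following sketch is not part of the Haskell I/O design: runOS, the in-memory file system, and the simplified Request and Response types are all assumptions made here. It ties the two lazy streams together with a circular definition, just as the real system does.

  import qualified Data.Map as Map

  type Name     = String
  type Contents = String

  data Request  = ReadFile Name | WriteFile Name Contents
  data Response = Success | Return Contents | Failure String

  type Behavior = [Response] -> [Request]

  -- A mock OS: answer each request in turn against an in-memory
  -- "file system", feeding the answers back to the program lazily.
  runOS :: Map.Map Name Contents -> Behavior -> [Request]
  runOS fs0 program = reqs
    where reqs  = program resps                -- the circular definition
          resps = answer fs0 reqs

          answer _  []                   = []
          answer fs (ReadFile n : rs)    =
            maybe (Failure "no such file") Return (Map.lookup n fs) : answer fs rs
          answer fs (WriteFile n c : rs) = Success : answer (Map.insert n c fs) rs

  -- A tiny program in the stream style: write a file, then read it back.
  copyBack :: Behavior
  copyBack resps = [ WriteFile "a" "hello", ReadFile "a" ]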

3.3 Arrays

As it turns out, arrays can be expressed rather nicely in a functional language, and in fact all of the arguments about mathematical elegance fall in line when using arrays. This is especially true of program development in scientific computation, where textbook matrix algebra can often be translated almost verbatim into a functional language. The main philosophy is to treat the entire array as a single entity defined declaratively rather than as a place holder of values that is updated incrementally. This, in fact, is the basis of the APL philosophy (see Section 1.4), and some researchers have concentrated on combining functional programming ideas with those from APL [Tu 1986; Tu and Perlis 1986]. The reader may find good general discussions of arrays in functional programming languages in Wadler [1986], Hudak [1986a], and Wise [1987]. In the remainder of this section I will describe Haskell's arrays, which originated from some ideas in Id Nouveau [Nikhil et al. 1986].

Haskell has a family of multidimensional nonstrict immutable arrays whose special interaction with list comprehensions provides a convenient array comprehension syntax for defining arrays monolithically. As an example, here is how to define a vector of squares of the integers from 1 to n.

a = array (1, n) [ (i, i*i) | i <- [1..n] ]

The first argument to array is a tuple of bounds, and thus this array has size n and is indexed from 1 to n.


The second argument is a list of index/value pairs and is written here as a conventional list comprehension. The ith element of an array a is written a ! i, and thus in the above case we have that a ! i = i*i.

There are several useful semantic properties of Haskell's arrays. First, they can be recursive-here is an example of defining the first n numbers in the Fibonacci sequence:

fib = array (0, n)
        ( [ (0,1), (1,1) ] ++
          [ (i, fib!(i-1) + fib!(i-2)) | i <- [2..n] ] )

This example demonstrates how we can use an array as a cache, which in this case turns an exponentially poor algorithm into an efficient linear one.

Another important property is that array comprehensions are constructed lazily, and thus the order of the elements in the list is completely irrelevant. For example, we can construct an m-by-n matrix using a wavefront recurrence, where the north and west borders are 1 and each other element is the sum of its north, northwest, and west neighbors, as follows:

a = array ((1,1), (m,n))
      ( [ ((1,1), 1) ] ++
        [ ((i,1), 1) | i <- [2..m] ] ++
        [ ((1,j), 1) | j <- [2..n] ] ++
        [ ((i,j), a!(i-1,j) + a!(i,j-1) + a!(i-1,j-1))
              | i <- [2..m], j <- [2..n] ] )

The elements in this result can be accessed in any order-the demand-driven effect of lazy evaluation will cause the necessary elements to be evaluated in an order constrained only by data dependencies. It is this property that makes array comprehensions so useful in scientific computation, where recurrence equations express the same kind of data dependencies. In implementing such recurrences in FORTRAN we must be sure that the elements are evaluated in an order consistent with the dependencies-lazy evaluation accomplishes that for us.

On the other hand, although elegant, array comprehensions may be difficult to implement efficiently. There are two main difficulties: the standard problem of overcoming the inefficiencies of lazy evaluation and the problem of avoiding the construction of the many intermediate lists that the second argument to array seems to need. A discussion of ways to overcome these problems is found in Anderson and Hudak [1989]. Alternative designs for functional arrays and their implementations may be found in Aasa et al. [1987], Holmstrom [1983], Hughes [1985a], and Wise [1987].

Another problem is that array comprehensions are not quite expressive enough to capture all behaviors. The most conspicuous example of this is the case in which an array is being used as an accumulator, say in building a histogram, and thus one actually wants an incremental update effect. Thus in Haskell a function called accumArray is provided to capture this kind of behavior in a way consistent with the monolithic nature of array comprehensions (similar ideas are found in Steele et al. [1986] and Wadler [1986]). It is not, however, clear that this is the most general solution to the problem. An alternative approach is to define an incremental update operator on arrays, but then even nastier efficiency problems arise, since (conceptually at least) the updates involve copying. Work on detecting when it is safe to implement such updates destructively has resulted in at least one efficient implementation [Bloss 1988; Bloss and Hudak N.d.; Hudak and Bloss 1985], although the analysis itself is costly.

Nevertheless, array comprehensions have the potential for being very useful, and many interesting applications have already been programmed using them (see Hudak and Anderson [1988] for some examples). It is hoped that future research will lead to solutions to the remaining problems.
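Before moving on to views, here is a small, runnable sketch of the accumulating idiom mentioned above. The histogram function and its test data are illustrative assumptions, but accumArray itself is the standard Haskell operation referred to in the text.

  import Data.Array (accumArray, Array)

  -- Build a histogram: count how many input values fall on each index
  -- in the range (lo, hi); values outside the range are filtered out.
  histogram :: (Int, Int) -> [Int] -> Array Int Int
  histogram (lo, hi) vals =
    accumArray (+) 0 (lo, hi) [ (v, 1) | v <- vals, lo <= v && v <= hi ]

  -- For example, histogram (0,4) [1,2,2,4,2] yields an array whose
  -- elements at indices 0..4 are 0, 1, 3, 0 and 1, respectively.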
3.4 Views

Pattern matching (see Section 2.4) is very useful in writing compact and readable programs. Unfortunately, knowledge of the concrete representation of an object is necessary before pattern matching can be invoked, which seems to be at odds with the notion of an abstract datatype. To reconcile this conflict Wadler [1987] introduced a notion of views.


A view declaration introduces a new algebraic datatype, just like a data declaration, but in addition establishes an isomorphism between the values of this new type and a subset of the values of an existing algebraic datatype. For example,

data Complex = Rectangular Float Float

view Complex = Polar Float Float
  where toView (Rectangular x y) = Polar (sqrt (x**2 + y**2))
                                         (arctan (y/x))
        fromView (Polar r t)     = Rectangular (r*(cos t))
                                               (r*(sin t))

[Views are not part of Haskell, but as with abstract datatypes we will use Haskell syntax, here extended with the keyword view.] Given the datatype Complex, we can read the view declaration as, "One view of Complex contains only Polar values; to translate from a Rectangular to a Polar value, use the function toView; to translate the other way, use fromView."

Having declared this view, we can now use pattern matching using either the Rectangular constructor or the Polar constructor; similarly, new objects of type Complex can be built with either the Rectangular or Polar constructors. They have precisely the same status, and the coercions are done automatically. For example,

rotate (Polar r t) angle = Polar r (t+angle)

As the example stands, objects of type Complex are concretely represented with the Rectangular constructor, but this decision could be reversed by making Polar the concrete constructor and Rectangular the view, without altering any of the functions that manipulate objects of type Complex. Whereas traditionally abstract data types are regarded as hiding the representation, with views we can reveal as many representations (zero, one, or more) as are required.

As a final example, consider this definition of Peano's view of the natural number subset of integers:

view Integer = Zero | Succ Integer
  where fromView Zero            = 0
        fromView (Succ n) | n>=0 = n+1
        toView 0                 = Zero
        toView n | n>0           = Succ (n-1)

With this view, 7 is viewed as equivalent to

Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))

Note that fromView defines a mapping of any finite element of Peano into an integer, and toView defines the inverse mapping. Given this view, we can write definitions such as

fac Zero     = 1
fac (Succ n) = (Succ n) * (fac n)

which is very useful from an equational reasoning standpoint, since it allows us to use an abstract representation of integers without incurring any performance overhead-the view declarations provide enough information to map all of the abstractions back into concrete implementations at compile time.

On the other hand, perhaps the most displeasing aspect of views is that an implicit coercion is taking place, which may be confusing to the user. For example, in

case (Foo a b) of
  Foo x y -> exp

we cannot be sure in exp that a==x and b==y. Although views were considered in an initial version of Haskell, they were eventually discarded, in a large part because of this problem.
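For comparison, the coercions that a view declaration would insert automatically can of course be written by hand in ordinary Haskell. The following sketch (the names PolarView, toPolar, and fromPolar are invented here) shows the Complex example without the view syntax, at the cost of explicit conversions and the loss of the compile-time mapping described above.

  data Complex = Rectangular Float Float

  -- An explicit second representation and its coercions.
  data PolarView = Polar Float Float

  toPolar :: Complex -> PolarView
  toPolar (Rectangular x y) = Polar (sqrt (x**2 + y**2)) (atan (y/x))

  fromPolar :: PolarView -> Complex
  fromPolar (Polar r t) = Rectangular (r * cos t) (r * sin t)

  -- rotate must now coerce explicitly, where the view version
  -- simply pattern matched on Polar.
  rotate :: Complex -> Float -> Complex
  rotate c angle = case toPolar c of
                     Polar r t -> fromPolar (Polar r (t + angle))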
3.5 Parallel Functional Programming

An often-heralded advantage of functional languages is that parallelism in a functional program is implicit; it is manifested solely through data dependencies and the semantics of primitive operators. This is in contrast to more conventional languages, where explicit constructs are typically used to invoke, synchronize, and in general coordinate the concurrent activities. In fact, as discussed earlier, many functional languages were developed simultaneously with work on highly parallel dataflow and reduction machines, and such research continues today.

In most of this work, parallelism in a functional program is detected by the system and allocated to processors automatically. Although in certain constrained

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 399 classes of functional languages the mapping form: of process to processor can be determined exp on proc optimally [Chen 1986; Delosme and Ipsen 19851, in the general case the optimal strat- [on is a hypothetical keyword, and is not egy is undecidable, so heuristics such as valid Haskell] which intuitively declares load balancing are often used instead. that exp is to be computed on the processor But what if a programmer knows a good identified by proc. The expression exp is (perhaps optimal) mapping strategy for a the body of the mapped expression and program executing on a particular machine, represents the value to which the overall but the compiler is not smart enough to expression will evaluate (and thus can determine it? And even if the compiler is be any of expression. including another smart enough, how does one reason about mapped expression). The expression proc such behavior? We could argue that the must evaluate to a processor id. Without programmer should not be concerned about loss of generality the processor ids, or pids, such details, but that is a difficult argument are assumed to be integers, and there is to make to someone whose job is precisely some predefined mapping from those inte- to invent such algorithms. gers to the physical processors they denote. To meet these needs, various researchers Returning now to the example, we can have designed extensions to functional lan- annotate the expression (el+e2) as follows: guages, resulting in what I like to call (el on 0) + (e2 on 1) parafunctional programming languages. The extensions amount to a metalanguage where 0 and 1 are processor ids. Of course, (e.g., annotations) to express the desired this static mapping is not very interesting. behavior. Examples include annotations to It would be nice, for example, if we were control evaluation order [Burton 1984; able to refer to a processor relative to the Darlington and While 1987; Sridharan currently executing one. We can do this 19851, prioritize tasks, and map processes through the use of the reserved identifier to processors [Hudak 1986c; Hudak and self, which when evaluated returns the pid Smith 1986; Keller and Lindstrom 19851. of the currently executing processor. Using Similar work has taken place in the Prolog self we can now be more creative. For ex- community [Shapiro 19841. In addition, re- ample, suppose we have a ring of n proces- search has resulted in formal operational sors that are numbered consecutively; we semantics for such extensions [Hudak may then rewrite the above expression as 198613; Hudak and Anderson 19871. In (el on left self) + (e2 on right self) the remainder of this section one kind of where left pid = mod (pid-1) rz parafunctional behavior will be demon- right pid = mod (pid+l) n strated-that of mapping program to ma- [mod x y computes x modulo y.], which chine (based on the work in Hudak [1986c] denotes the computation of the two sub- and Hudak and Smith [1986]). expressions in parallel on the two neigh- The fundamental idea behind process-to- boring processors, with the sum being processor mapping is quite simple. Con- computed on self. sider the expression el+e2. 
The strict To see that it is desirable to bind self semantics of + allows the subexpressions dynamically, consider that one may wish el and e2 to be executed in parallel-this successive invocations of a recursive call to is an example of what is meant by saying be executed on different processors-this is that the parallelism in a functional program not easily expressed with lexically bound is implicit. But suppose now that we wish annotations. For example, consider the fol- to express precisely where (i.e., on which lowing list-of-factorials program, again processor) the subexpressions are to be using a ring of processors: evaluated; we may do so quite simply by annotating the subexpressions which ap- (map fat [2,3,4]) on 0 propriate mapping information. An expres- where map f [] = [I sion annotated in this way is called a f (xxs) = f x : ((map f xs) on mapped expression, which has the following (right self))
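A mapped expression denotes the same value as its body; only the mapping information differs. One way to sanity-check a parafunctional program in an ordinary Haskell system is therefore to erase the annotations. The sketch below does exactly that; on, self, left, right, and the ring size are all stand-ins invented here, since the real on is a hypothetical keyword and self is dynamically bound by the implementation.

  type Pid = Int

  ringSize :: Int
  ringSize = 3                       -- assumed number of processors

  -- In this sketch an annotated expression simply returns its body.
  on :: a -> Pid -> a
  x `on` _ = x

  self :: Pid
  self = 0                           -- placeholder; really bound dynamically

  left, right :: Pid -> Pid
  left  pid = (pid - 1) `mod` ringSize
  right pid = (pid + 1) `mod` ringSize

  fac :: Integer -> Integer
  fac n = product [1..n]

  facs :: [Integer]
  facs = (map' fac [2,3,4]) `on` 0   -- still evaluates to [2,6,24]
    where map' f []     = []
          map' f (x:xs) = f x : ((map' f xs) `on` right self)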


Note that the recursive call to map is cache by mapped onto the processor to the right of cache fn = \n + (array (0,) the current one, and thus the elements 2, [(i&z i) 1it[O..max]]) ! n 6, and 24 in the result list are computed on processors 0, 1, and 2, respectively. where we assume max is the largest argu- Parafunctional programming languages ment to which fib will be applied. Expand- have been shown to be adequate in express- ing out the definitions and syntax yields ing a wide range of deterministic parallel fib n = (array (0,max) algorithms clearly and concisely [Hudak where k$lj i HO..maxll) ! n 1986c; Hudak and Smith 19861. It remains to be seen, however, whether the pragmatic ’ l=l concerns that motivate these kinds of lan- ’ n = fib (n-1) + fib (n-2) guage extensions persist, and if they do, which is exactly the desired result.” As a whether or not can become smart methodology this is very nice, since librar- enough to perform the optimizations auto- ies of useful caching functionals may be matically. Indeed these same questions can produced and reused to cache many differ- be asked about other language extensions, ent functions. There are limitations to the such as the memoization techniques dis- approach, as well as extensions, all of which cussed in the next section. are described in Keller and Sleep [1986]. One of the limitations of this approach 3.6 Caching and Memoization is that in general it can only be used with Consider this simple definition of the strict functions, and even then the expense Fibonacci function: of performing equality checks on, for ex- ample, list arguments can be expensive. As fibO=l a solution to this problem. Hughes intro- ’ l=l duced the notion of lazy memo-functions, ’ n = fib (n-l) + fib (n-2) in which the caching strategy uses an iden- Although simple, it is hopelessly inefficient. tity test (EQ in Lisp terminology) instead We could rewrite it in one of the classic of an equality test [Hughes 1985131.Such a ways, but then the simplicity and elegance strategy can no longer be considered as of the original definition is lost. Keller and syntactic sugar, since an identity predicate Sleep [ 19861 suggest an elegant alternative: is not something normally provided as a Provide syntax for expressing the caching primitive in functional languages because or memoization of selected functions. For it is implementation dependent. Neverthe- example, the syntax might take the form of less, if built into a language lazy memo- a declaration that precedes the function functions provide a very efficient (constant definition, as in time) caching mechanism and allow very elegant solutions to a class of problems not memo f’ib using cache solved by Keller and Sleep’s strategy: those fibO=l ’ l=l involving infinite data structures. For ex- ’ n = fib (n-l) + fib (n-2) ample, consider this definition of the infi- nite list of ones: which would be syntactic sugar for ones = 1 : ones fib = cache fib1 where fib1 0 = 1 Any reasonable implementation will rep- ’ l=l resent this as a cyclic list, thus consuming ’ n = fib (n-l) + fib (n-2) constant space. Another common idiom is the use of higher-order functions such as The point is that cache is a user-defined function that specifies a strategy for cach- ing values of fib. For example, to cache ” This is essentially the same solution as the one given values in an array, we might define in Section 3.3.
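Filled out with the standard Data.Array operations, the user-defined caching function described above can be written as follows. Here maxArg stands for the assumed largest argument (max in the text), and since the memo ... using ... syntax is only proposed, the memoized fib is obtained by applying cache explicitly.

  import Data.Array (array, (!))

  maxArg :: Int
  maxArg = 1000                  -- assumed largest argument to fib

  -- Cache a function over 0..maxArg in a lazily filled array.
  cache :: (Int -> Integer) -> (Int -> Integer)
  cache fn = \n -> tbl ! n
    where tbl = array (0, maxArg) [ (i, fn i) | i <- [0..maxArg] ]

  fib :: Int -> Integer
  fib = cache fib1
    where fib1 0 = 1
          fib1 1 = 1
          fib1 n = fib (n-1) + fib (n-2)   -- recursion goes through the cache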

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages l 401 map: minism completely destroys referential transparency, as we shall see. twos = map (\x+2*x) ones By way of introduction, McCarthy [ 19631 But now note that only the cleverest of defined a binary nondeterministic operator implementations will represent this as a called amb having the following behavior: cyclic list, since map normally generates a amb(el, I) = el new list cell on every recursive call. By lazy amb(l., e2) = e2 memoizing map, however, the cyclic list will amb(e,, e2) = either e, or e2, be recovered. To see how, note that the first chosen nondeterministically recursive call to map will be “map (\x+ 2*x) (tail ones)“-but (tail ones) is identi- The operational reading of amb(e,, e2) is cal to ones (remember that ones is cyclic), that e, and e2 are evaluated in parallel, and and thus map is called with arguments the one that completes first is returned as identical to the first call. Thus the old value the value of the expression. is returned, and the cycle is created. Many To see how referential transparency is more practical examples are discussed in lost, consider this simple example: Hughes [ 1985b]. (amb 1 2) + (amb 1 2) The interesting thing about memoization in general is that it begins to touch on some Is the answer 2 or 4? Or is it perhaps 3? of the limitations of functional languages- The possibility of the answer 3 indicates in particular, the inability to side effect that referential transparency is lost-there global objects such as caches-and solu- does not appear to be any simple syntactic tions such as lazy memo-functions repre- mechanism for ensuring that we could not sent useful compromises. It remains to be replace equals for equals in a misleading seen whether more general solutions can be way. Shortly, we will discuss possible solu- found that eliminate the need for these tions to this problem, but for now let us ’ special-purpose features. look at an example using this style of non- determinism. 3.7 Nondeterminism Using amb, we can easily define things such as merge that nondeterministically Most programmers (including the very merge two lists, or streams:21 idealistic among us) admit the need for nondeterminism, despite the semantic dif- merge as bs = amb ficulties it introduces. It seems to be an (if (as == [ 1) then bs else essential ingredient of real-time systems, (head as: merge (tail as) bs)) (if (bs == [ 1) then as else such as operating systems and device con- (head bs: merge as (tail bs))) trollers. Nondeterminism in imperative languages is typically manifested by run- which then can be used, for example, in ning in parallel several processes that combining streams of characters from dif- are side effecting some global state-the ferent computer terminals: nondeterminism is thus implicit in the process (merge term1 term2) semantics of the language. In functional languages, nondeterminism is manifested through the use of primitive operators such as amb or merge-the nondeterminism is ” Note that this version of merge: thus made explicit. Several papers have merge [ ] bs = bs been published on the use of such primi- merge as [ ] = as tives in functional programming, and it merge (a:as) (b: bs) appears quite reasonable to program con- = amb (a : merge as (b:bs)) (b : merge (a:as) bs) ventional nondeterministic applications is not correct, since merge I bs evaluates to I, whereas using them [Henderson 1982; Stoye 19851. 
we would like it to be bs ++ I, which in fact the The problem is, once introduced, nondeter- definition in the text yields.


Using this as a basis, Henderson [1982] An interesting variation on these two show how many operating system problems ideas is to combine them-centralize the can be solved in a pseudofunctional lan- nondeterminism and then use an oracle to guage. Hudak [1986a] uses nondetermin- define it in a referentially transparent way. ism of a slightly different kind to emulate Thus the disadvantages of both approaches the parallel updating of arrays. would seem to disappear. Although satisfying in the sense of being In any case, I should point out that none able to solve real-world kinds of nondeter- of these solutions makes reasoning about minism, these solutions are dissatisfying in nondeterminism any easier, they just make the way they destroy referential transpar- reasoning about programs easier. ency. One might argue that the situation is at least somewhat better than the conven- 3.8 Extensions to Polymorphic-Type tional imperative one in that the nondeter- Inference minism is at least made explicit, and thus The Hindley-Milner type system has cer- one could induce extra caution when rea- tain limitations; an example of this was soning about those sections of a program given in Section 1.6.1. Some success has exhibiting nondeterministic behavior. The been achieved in extending the type system only problem with this is that determining to include other kinds of data objects, but which sections of a program are nondeter- surprisingly little success has been achieved ministic may be difficult-it is not a lexical at removing the fundamental limitations property, but rather a dynamic one, since while still retaining the tractability of the any function may call a nondeterministic type inference problem. It is a somewhat subfunction. fragile system but is fortunately expressive At least two solutions have been pro- enough to base practical languages on it. posed to this problem. One, proposed by Nevertheless, research continues in this Burton [1988], is to provide a tree-shaped area. oracle as an argument to a program from Independently of type inference, consid- which nondeterministic choices may be se- erable research is underway on the expres- lected. By passing the tree and its subtrees siveness of type systems in general. The around explicitly, referential transparency most obvious thing to do is allow types to can be preserved. ‘The problem with this be first class, thus allowing abstraction over approach is that carrying the oracle around them in the obvious way. Through gener- explicitly is cumbersome at best. On the alizations of the type system it is possible other hand, functional programmers al- to model such things as parameterized ready carry around a greater amount of modules, inheritance, and subtyping. This state to avoid problems with side effects, so area has indeed taken on a character of its perhaps the extra burden is not too great. own; a good summary of current work may Another (at least partial) solution was be found in Cardelli and Wegner [1985] proposed by Stoye [1985] in which all of and Reynolds [ 19851. the nondeterminism in a program is forced to be in one place. Indeed to some extent 3.9 Combining Other Programming Language this was the solution adopted in Haskell, Paradigms although for somewhat different reasons. The problem with this approach is that the A time-honored tradition in programming nondeterminism is not eliminated com- language design is to come up with hybrid pletely but rather centralized. 
It allows rea- designs that combine the best features of soning equationally within the isolated several different paradigms, and functional pieces but not within the collection of programming language research has not pieces as a whole. Nevertheless, the isola- escaped that tradition. I will discuss two tion (at least in Haskell) is achieved syn- such hybrids here, although others exist. tactically, and thus it is easy to determine The first hybrid is combining logic when equational reasoning is valid. A gen- programming with functional program- eral discussion of these issues is found in ming. The “logical variable permits” two- Hudak and Sundaresh [1988]. way matching (via unification) in logic

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Functional Programming Languages e 403 programming languages, as opposed to the use of expressions to denote a result rather one-way matching (via pattern matching) than a sequence of statements to put to- in functional languages, and thus seems gether a result imperatively and in piece- like a desirable feature to have in a lan- meal. Expressions were an important guage. Indeed its declarative nature fits well feature of FORTRAN, had more of a math- with the ideals of functional programming. ematical flavor, and freed the programmer The integration is, however not easy- of low-level operational detail (this burden many proposals have been made yet none was of course transferred to the compiler). are completely satisfactory, especially in From FORTRAN expressions, to functions the context of higher order functions in Pascal, to expression oriented program- and lazy evaluation. See Degroot and ming style in Scheme-these advances are Lindstrom [1985] for a good summary of all on the same evolutionary path. Func- results. tional programming can be seen as carrying The second area reflects an attempt to this evolution to its logical conclusion- combine the state-oriented behavior of im- everything is an expression. perative languages in a semantically clean The second argument is based on an way. The same problems arise here as they analogy between functional (i.e., side- do with nondeterminism-simply adding effect-free) programming and structured the assignment statement means that equa- (i.e., goto-less) programming. The fact is, it tional reasoning must always be qualified, is hard to imagine doing without either since it is difficult to determine whether or goto’s or assignment statements, until one not a deeply nested call is made to a func- is shown what to use in their place. In the tion that induces a side effect. There are, case of goto, one uses instead structured however, some possible solutions to this, commands, in the case of assignment state- most notably work by Gifford and Lucassen ments, one uses instead lexical binding and [Gifford and Lucassen 1986; Lucassen and recursion. Gifford 19881 in which effects are captured As an example, this simple program frag- in the type system. In Gifford’s system it is ment with goto’s possible to determine from a procedure’s x := init; type whether or not it is side-effect free. It i := 0; is not currently known, however, whether loop: x := f(x, i); such a system can be made into a type i := i+l; inference system in which type declarations if id0 got0 loop; are not required. This is an active area of current research. can be rewritten in a structured style as x := init; 4. Dispelling Myths About Functional i := 0; Programming while i

ACM Computing Surveys, Vol. 21, No. 3, September 1989 404 l Paul Hudak criteria, and the final value of x is intended Thus the analogy between goto-less as the value computed by the loop. This programming and assignment-free pro- disciplined use of assignment can be cap- gramming runs deep. When Dijkstra first tured by the following assignment-free introduced , much Haskell program: of the programming community was aghast-how could one do without goto? loop init 0 But as people programmed in the new style, where loop x i = if i-40 it was realized that what was being imposed then (loop (fx i) (i+l) was a discipline for good programming not else x a police state to inhibit expressiveness. Ex- actly the same can be said of side-effect- Functions (and procedures) can in fact be free programming, and its advocates hope thought of as a disciplined use of goto and that as people become more comfortable assignment-the transfer of control to the programming in the functional style, they body of the function and the subsequent will appreciate the good sides of the disci- return capture a disciplined use of goto, pline thus imposed. and the formal-to-actual parameter binding When viewed in this way functional lan- captures a disciplined use of assignment. guages can be seen as a logical step in the By inventing a bit of syntactic sugar to evolution of imperative languages-thus, of capture the essence of tail recursion, the course, rendering them nonimperative. On above program could be rewritten as the other hand, it is exactly this purity to which some programmers object, and one let x = init i=O could argue that just as a tasteful use of in while i-40 goto here or there is acceptable, so is a begin next x = f(x, i) tasteful use of a side effect. Such small next i = i+l impurities certainly shouldn’t invalidate end the functional programming style and thus result x may be acceptable. Myth 2 is that functional programming [this syntactic sugar is not found in Has- languages are toys. The first step toward kell, although some other functional (es- dispelling this myth is to cite examples of pecially dataflow) languages have similar efficient implementations of functional features, including Id, Val, and Lucid] languages, of which there now exist several. where the form “next x = . . .” is a construct The Alfl compiler at Yale, for example, (next is a keyword) used to express what generates code that is competitive with that the value of x will be on the next iteration generated by conventional language com- of the loop. Note the similarity of this pilers [Young 19881. Other notable compi- program to the structured one given earlier. ler efforts include the LML compiler at In order to enforce a disciplined use of Chalmers University [Augustsson 19841, assignment properly, we can constrain the the Ponder compiler at Cambridge Univer- syntax so that only one next statement is sity [Fairbairn 19851, and the ML compi- allowed for each identifier (stated another lers developed at and Princeton way, this constraint means that it is a triv- [Appel and MacQueen 19871. ial matter to convert such programs into On the other hand, there are still in- the tail recursive form shown earlier). If we herent inefficiencies that cannot be ig- think of “next x” and “next i” as new nored. 
Higher-order functions and lazy identifiers (just as the formal parameters evaluation certainly increase expressive- of loop can be thought of as new identifiers ness, but in the general case the overhead for every call to loop), then referential of, for example, the dynamic storage man- transparency is preserved. Functional pro- agement necessary to support them cannot gramming advocates argue that this results be eliminated. in a better style of programming, in much The second step toward dispelling this the same way that structured programming myth amounts to pointing to real applica- advocates argue for their cause. tions development in functional languages


making a recursive call to the function with including real-world situations involving new values as arguments, or (2) in a data nondeterminism, databases, parallelism, structure that is updated nondestructively and so on. Atlhough examples of this sort so that, at least conceptually, the old value are not plentiful (primarily because of the of the data structure remains intact and youth of the field) and are hard to cite can be accessed later. Although declarative (since papers are not usually written about and referentially transparent, this treat- applications), they do exist. For example, ment of state can present problems to an implementation, but it is certainly not a (1) The dataflow groups at MIT and the problem in expressiveness. Furthermore, functional programming groups at Yale the implementation problems are not as have written numerous functional pro- bad as they first seem, and recent work has grams for scientific computation. So gone a long way toward solving them. For have two national labs: Los Alamos and example, a compiler can convert tail recur- Lawrence Livermore. sions into loops and “single-threaded” data (2) MCC has written a reasonably large structures into mutable ones. expert system (EMYCIN) in SASL. It turns out that, with respect to expres- (3) At least one company is currently mar- siveness, one can use higher-order func- keting a commercial product (a CAD tions not only to manipulate state but also package) that uses a lazy functional to make it appear implicit. To see how, language. consider the imperative program given (4) A group at IBM uses a lazy functional earlier: language for graphics and animation. x := init; (5) The LML (lazy ML) compiler at Chal- i := 0; mers was written almost entirely in loop: X := f(x, i); LML, and the new Haskell compilers i := i + 1; at both Glasgow and Yale are being if (i < 10) got0 loop; written in Haskell. We can model the implicit state in this GEC Hirst Research Lab is using a (6) program as a pair (xval, ival) and define program for designing VLSI circuits several functions that update this state that was written by some researchers in a way that mimics the assignment at Oxford using a lazy functional statements: language. x (xval, ival) xval’ = (xval’, ival) There are other examples. In particular, i (xval, ival) ival’ = (xval, ival’) there are many Scheme and Lisp programs x’(x,i)=x that are predominantly side effect free and i’ (x, i) = i could properly be thought of as functional const u s = u programs. Myth 3, that functional languages cannot We will take advantage of the fact that deal with state, is often expressed as a these functions are defined in curried form. question: How can someone program in a Note how x and i are used to update the language that does not have a notion of state, and x ’ and i ’ are used to access the state? The answer, of course, is that we state. For example, the following function, cannot, and in fact functional languages when applied to the state, will increment i: deal with state very nicely, although the \s-+is((i’s)+l) state is expressed explicitly rather than implicitly. So the issue is more a matter of For expository purposes we would like to how one expresses state and manipulations make the state as implicit as possible, and of it. thus we express the result as a composition State in a functional program is usually of higher-order functions. 
To facilitate this carried around explicitly in one of two and to make the result look as much like ways: (1) in the values of bound variables the original program as possible, we define of functions, in which case it is updated by the following higher-order infix operators

ACM Computing Surveys, Vol. 21, No. 3, September 1989 406 l Paul Hudak and functions”: double the size of this paper. The interes- ted reader can refer to by far the best f:=g =\s-+fs(gs) single reference to sequential implementa- f;g =\s-g(fs) tions, Peyton Jones’ [ 19871, as well as other got0 f = f techniques that have recently appeared f+‘g=\s-+fs+gs viable [Bloss et al. 1988; Burn et al. 1988; f<‘g=\s-+fs
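The rewritten program that uses these combinators is cut off in the text above, so the following is only a sketch of how the idea can be completed in ordinary Haskell. Because operator names beginning with a colon are reserved for constructors and ; cannot be redefined, the combinators are renamed here (=:, >>>, +!); the state selectors and updaters are the ones defined in the text, restated so the sketch is self-contained.

  -- State is the pair (xval, ival); x and i update it, x' and i' read it.
  type State = (Int, Int)

  x, i :: State -> Int -> State
  x (_,    ival) xval' = (xval', ival)
  i (xval, _   ) ival' = (xval,  ival')

  x', i' :: State -> Int
  x' (xval, _) = xval
  i' (_, ival) = ival

  const' :: a -> State -> a            -- the text's const, renamed
  const' v _ = v

  -- The combinators, written out with legal operator names.
  (=:)  :: (State -> Int -> State) -> (State -> Int) -> (State -> State)
  f =: g  = \s -> f s (g s)

  (>>>) :: (State -> State) -> (State -> a) -> (State -> a)
  f >>> g = \s -> g (f s)

  (+!)  :: (State -> Int) -> (State -> Int) -> (State -> Int)
  f +! g  = \s -> f s + g s

  -- One iteration of x := f(x,i); i := i+1, with the state kept implicit.
  step :: (Int -> Int -> Int) -> State -> State
  step f = (x =: (\s -> f (x' s) (i' s))) >>> (i =: (i' +! const' 1))

  -- Run the loop ten times from (init, 0) and return the final x.
  runLoop :: (Int -> Int -> Int) -> Int -> Int
  runLoop f x0 = x' (iterate (step f) (x0, 0) !! 10)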


Received May 1988; final revision accepted May 1989.
