

Theoretical Computer Science 375 (2007) 120–136

Flow analysis of lazy higher-order functional programs

Neil D. Jones∗, Nils Andersen

DIKU, Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark

Abstract

In recent years much interest has been shown in a class of functional languages including HASKELL, lazy ML, SASL/KRC/MIRANDA, ALFL, ORWELL, and PONDER. It has been seen that their expressive power is great, programs are compact, and program manipulation and transformation is much easier than with imperative languages or more traditional applicative ones. Common characteristics: they are purely applicative, manipulate trees as data objects, use pattern matching both to determine control flow and to decompose compound data structures, and use a "lazy" evaluation order. In this paper we describe a technique for data flow analysis of programs in this class by safely approximating the behavior of a certain class of term rewriting systems. In particular we obtain "safe" descriptions of program inputs, outputs and intermediate results by regular sets of trees. Potential applications include optimization, strictness analysis and partial evaluation. The technique improves earlier work because of its applicability to programs with higher-order functions, and with either eager or lazy evaluation. The technique addresses the call-by-name aspect of laziness, but not memoization.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Collecting semantics; Higher-order program; Program flow analysis; Lazy evaluation; Reynolds' analysis of applicative LISP programs; Term rewriting system; Tree grammar

This paper extends [19] (a chapter in the out-of-print [2]) with proofs that the algorithm terminates and yields safe (i.e., sound or correct) approximations to program behavior. Relevance to this festschrift: John Reynolds’ 1969 and 1972 papers developed a related first-order program analysis framework [27], and inspired our method for handling higher-order functions [28].

1. Introduction

It has long been known that a great deal of information about the syntax of a program's input data may be discovered by examining the program's source text. For one example, the program structure of a recursive descent compiler directly reflects the context-free grammar of its input. For another, it is often relatively straightforward to deduce by hand the syntax of both the input to and the output yielded by a given LISP program, simply by examining its code to see which portions of the input are accessed, in relation to how the program results are constructed.

∗ Corresponding address: University of Copenhagen, DIKU (Datalogisk Institut), Department of Computer Science, Universitetsparken 1, 2100 Copenhagen East, Denmark. E-mail address: [email protected] (N.D. Jones).


In this paper we will systematize these ideas, and present analysis methods which can automatically obtain finite descriptions of the input and output sets of lazy or eager functional programs that manipulate structured data. The same methods also work for higher-order functional programs, by a simple device: the closures typically used to implement functional values are themselves regarded as data structures, to be analyzed by just the same methods as used for any other data structures.

One motivation for this work is its application in the MIX system, which uses partial evaluation to generate compilers from interpreters given as input [23]. MIX's algorithms perform partial evaluation by means of abstract interpretation (i.e., data flow analysis) of an input program over various domains. Further, compiling and compiler generation are accomplished (efficiently!) by applying the partial evaluator to itself. MIX is, however, limited when compared to more traditional semantics-directed compiler generators due to its restricted specification language: essentially first-order LISP. A natural goal has thus been to develop methods for extending MIX's techniques to encompass higher-order functions.

Preview: Analysis of higher-order functions by tracing structured data values

Our approach "lambda lifts" [17,28] a given lambda expression to translate it into an equivalent term rewriting system, and then flow analyzes the result. Consider as first example the closed lambda expression: [λX.(λF.F(F 2)) (λY.X ∗ Y)] 5. First account for the free X in (λY.X ∗ Y) by adding an explicit parameter X′: [λX.(λF.F(F 2)) {(λX′λY.X′ ∗ Y) X}] 5. Now make (λX′λY.X′ ∗ Y) into a function i, make λF.F(F 2) into a function h, and [λX.(···){···}] into a function g. The given lambda expression can thereby be transformed into an equivalent term rewriting system where X, X′, Y, F are variables; @ is a defined operator representing function application; and g, h, i identify the subexpressions' functions. (Technically g, h, i will be seen to be constants.)

v → g @ 5
g @ X → h @ (i @ X)
h @ F → F @ (F @ 2)
i @ X′ @ Y → X′ ∗ Y.

The program's computation:

v ⇒ g @ 5 ⇒ h @ (i @ 5) ⇒ (i @ 5) @ (i @ 5 @ 2) ⇒ i @ 5 @ 10 ⇒ 50.

Note that every function is treated as "curried". Function application is handled by pattern matching on terms containing the "apply operator" @ (written left associatively), so a function of order n is defined by rewrite rules of the form:

f @ X1 @ ··· @ Xn → t

(whence this program notation is sometimes called "named combinators"). Curried functions applied to incomplete argument lists yield functional values, for example (i @ X′) above. The conclusion of all this: higher-order functions can be implemented using structured data values. Thus a flow analysis method able to handle structured data values in term rewriting systems can also analyze programs with higher-order functions.
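For concreteness, the lambda-lifted program transcribes directly into a functional language such as Haskell, where currying makes @ ordinary application. This is only an illustration of the "named combinator" style, under the assumption that the base values are integers (the names g, h, i, v are taken from the example above):

-- the combinators g, h, i from the lambda-lifted example;
-- the apply operator @ becomes ordinary curried application
i :: Int -> Int -> Int
i x' y = x' * y

h :: (Int -> Int) -> Int
h f = f (f 2)

g :: Int -> Int
g x = h (i x)          -- (i x) is a functional value: a partial application

v :: Int
v = g 5                -- evaluates to 50, matching the derivation above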

1.1. Outline

A simple language of the sort mentioned in the abstract is presented in Section 2, with programs containing constructors and selectors but, for simplicity, no atomic values such as integers or reals. Formally, a program is regarded as a term rewriting system of a certain restricted form.¹ The nondeterministic rewriting semantics naturally includes lazy evaluation, where function calls and data constructions are performed only as needed to continue execution. As a result, programs may be written which (at least conceptually) manipulate infinite data structures, streams, etc., and their behavior can be analyzed by the methods to be developed. The framework allows recursively defined functions in "curried" form, where higher-order ("functional") values are obtained by applying functions to incomplete argument lists.

Section 3 reviews an earlier and simpler construction of a regular tree grammar G from a given program, combining notions of Reynolds [27] and Jones and Muchnick [20]. Characteristic of G is that its nonterminals generate supersets of the computational states, function inputs and function outputs computed by the program on given data. Applications include program optimization, transformation, etc. The construction of Section 3 can be proven to give safe analyses of programs using call-by-value; however it can be unsafe when applied to lazy programs, and cannot deal with higher-order functions. In Section 4 a new algorithm is given to construct tree grammars from programs, able to safely approximate the behavior of lazy programs with higher-order functions. Proofs of correctness and termination of the new construction are given in Section 5. Section 6 contains conclusions, directions for further work and acknowledgements.

¹ Term rewriting gives a convenient framework to express higher-order functional programs' semantics and flow analysis, but we neither use nor prove nontrivial results from term rewriting.

1.2. Relation to other work

Until 1987

Broad overviews of program flow analysis may be found in Hecht [12] and Muchnick and Jones [25]. The first paper with goals similar to ours was Reynolds [27], which analyzed LISP programs to obtain "data set definitions" in the form of (equationally defined) regular sets of trees approximating the inputs to and the results of the functions in a given first-order LISP program. The mathematically equivalent tree grammars were used in [20] in order to approximate the behavior of flow chart programs manipulating LISP-like data structures.

Turchin independently developed a method using "covering context-free grammars" to analyze programs in the language REFAL (essentially Markov algorithms equipped with variables, see [33]). His techniques require some rather sophisticated machinery, in particular the use of so-called supercompilation and metasystem transition. Our results extend those of Reynolds and Turchin in two directions: we allow delayed evaluation and higher-order functions; and the method described here is conceptually simpler than Turchin's.

Burn, Hughes, Wadler and coauthors [5,15,35] develop methods for strictness analysis of programs with structured data, i.e., ways to describe access patterns to subtrees of tree-structured data, for use in efficient compilation of a lazy language. Their type-based goals and methods differ from this paper's.

Other work concerning higher-order functions

Our method is independent of the program's type structure and so can handle the type-free lambda calculus. In addition it can do analyses such as "constant propagation" which would be difficult by type-based methods due to the need for an infinitely wide approximation lattice.

A rather complicated way to flow analyze lambda expressions (and the first way, to our knowledge) was described in [18]. The present techniques extend and simplify those methods and appear to be at least as strong. This paper's theoretical basis is similar to that of the minimal function graphs in [21], but applies to higher-order programs with structured data. In a different paper by the same authors [26], a simple and general framework for abstract interpretation of lambda expressions is described, and an important distinction is drawn: between questions that concern "global" program behavior (e.g., type analysis) and those that concern "local" behavior (e.g., what is the range of values a given variable may assume at a particular point in the program?). The framework of [26] applies only to global questions and takes no account of structured data, while it will be seen that the current (quite different and more operational) techniques naturally yield local flow information, and can as well handle tree-structured data.

A standard trick for implementing functional values is to use closures, where a closure consists of a function name and the values of some of its parameters. The "lambda-lifting" transformation, i.e., the idea to make the apply function explicit, and to replace function parameters by explicit names for functions, does this at source code level. It has roots in Reynolds' defunctionalization [28].

Since 1987

Since [19] (this paper's predecessor) much research has been done on flow analysis of higher-order programs, including the following.

Untyped functional languages: Shivers' Ph.D. thesis [29] focused on control-flow analysis (CFA), and Heintze's Ph.D. thesis [13] focused on sets of data values (set-based analysis, at the start essentially a constraint-based version of regular tree grammars). Both of these have close connections with this paper's results, and each has led to several papers and much further research, e.g., Jagannathan and Weeks [16], and Stefanescu and Zhou [30]. A "minimal function graph" approach using higher-order functions at analysis time may be seen in [22].

Typed functional languages: Xi's Ph.D. thesis focused on tracing value flow via data types for termination verification of higher-order programming languages [36], and Abel has another approach to termination checking with higher-order types and recursive data structures [1].

The term rewriting community has begun to study higher-order term rewriting systems for their own sake, including flow analysis aspects. Research in this direction has been done by Giesl et al. [11], Toyama [32] and several others.

Forward or backward analyses?

This somewhat technical subsection may be skipped on first reading. The methods of [27,20,35] and this paper correspond to Cousot's "forward analyses" [6]. They begin with a description of program inputs and derive from these an approximation to the set of all program states reachable during computations on the given input data. In this context a "safe approximation" requires accounting for all reachable computational states. On the other hand, several authors including Hughes [15] and Turchin [33] have used various forms of "backwards analysis" that extract from the program alone, without input description, the form of the input data that can drive it. In this context "safety" has a different flavor: recognition of all possible patterns of data access, including "must be evaluated" and "may be evaluated". This approach can be used for strictness analysis. The precise relation between the two approaches is not yet entirely clear, though there are certainly connections with Dijkstra's strongest postconditions and weakest preconditions [8].

Forward analyses seem more relevant to this paper, for at least three reasons:

(1) We are interested in higher-order functions. These are hard to handle in a backwards analysis; for example, assuming a function-valued argument to range over all possible functions will yield far too conservative results. In particular it seems hard to formulate natural preconditions on program inputs when they include functions.

(2) An intended application is partial evaluation. In this case one starts with the precise values of some (typically not all) program inputs. It is most natural to propagate this information forwards.

(3) The result of analysis by either approach is a description of the program's data in the form of a grammar (or something similar), which closely resembles a "data type" for structured data. If one accepts the motto that well-typed programs can't go wrong, then forward analysis seems more relevant, since its safety concept stresses accounting for all reachable computational states; if the analysis cannot reveal the possibility of an error such as 3 + true then that error cannot possibly occur. Forward methods thus seem to provide a more natural framework for type checking.

2. A functional language

The syntax and operational semantics of the functional language discussed in the rest of the paper are defined using terminology from the theory of term rewriting systems. For simplicity of notation we assume only one data sort; the paper’s results extend to the many-sorted case without difficulty.

2.1. Programs: Syntax and operational semantics

A signature is a set Σ furnished with a function arity : Σ → N0; Σn = arity⁻¹(n) is the set of n-ary operators. Operators of arity 0 are called constants. We assume the signature is partitioned into Σ = Γ + ∆ (where + is disjoint union) and call the operators in Γ constructors and those in ∆ defined operators.²

² Defined operators correspond to the functions that are defined by a functional program.

Let TΣ(V) be the set of terms t over operators in Σ and a denumerable set V of variables (disjoint from ∪Σi). Thus TΣ(V) is the smallest set containing V such that if t1, …, tn ∈ TΣ(V) and n = arity(op) then op t1 ··· tn ∈ TΣ(V). Terms in TΣ (without variables) are also called ground terms, and terms in TΓ, built from constructors only, are ground constructor terms. A call is a term of the form δ t1 ··· tn with δ ∈ ∆. We denote elements of TΣ(V) by t (possibly decorated); elements of TΣ by g (for "ground term"); and elements of TΓ by c (for "constructor term"). The set of variables occurring in a term t is denoted by Vars(t).

Definition 2.1. A term rewriting system over Σ is a set of rewrite rules {pi → ti}i∈I where I is an index set (usually finite), left and right sides pi, ti ∈ TΣ(V), and each ti is a term using only variables from its left side pi. A program over Σ is a term rewriting system such that each "pattern" pi is a call in which no variable occurs more than once.

Operational semantics

Informally: a term rewriting system defines a relation ⇒ over its set of ground terms, and computation may be thought of as the transitive closure of ⇒ (note that it is not necessarily deterministic). The constraints on pi, ti in programs are computationally motivated: the constraint that pi be a call ensures that once a constructor appears outside the scope of any defined operator, it cannot be further rewritten; the variable name constraints ensure that the rewritten term can be built from ti using only the bindings got from matching with variables in pi, and that equality must be tested for explicitly.

More precisely: A context t[] is a "term with a hole", i.e., a term from TΣ(V ∪ {[]}) containing just one occurrence of [] (a 0-ary operator). An occurrence of term t′ in context t[] is a pair (t[], t′). More compactly, we write this as t[t′], and also (ambiguously) use t[t′] to denote the term obtained by replacing the [] in t[] by t′. (In the examples, square brackets are also used in list denotations; hopefully, this doesn't lead to confusion.) A substitution is a function θ : V → TΣ(V). We do not differentiate between a substitution and its natural extension to terms, θ : TΣ(V) → TΣ(V). θ_id denotes the identity mapping on terms.

Definition 2.2. The derivation relation ⇒ ⊆ TΣ × TΣ is defined by:

t[θpi] ⇒ t[θti]

for any ground context t[], rule pi → ti and ground substitution θ. The relation ⇒* is the reflexive transitive closure of ⇒.

A term in TΓ is said to be in normal form. Note that if c is in normal form there is no term c′ such that c ⇒ c′. For programming languages it is desirable that the normal forms derived from initial calls are unique when they exist. A sufficient condition for uniqueness is the Church–Rosser property: if t ⇒ t′ and t ⇒ t″ then there exists t‴ with t′ ⇒* t‴ and t″ ⇒* t‴.

A unifier of two terms t, t′ is a substitution θ such that θt = θt′. It is well known that if two terms are unifiable, then there exists a most general unifier mgu(t, t′), such that any unifier of t, t′ is a refinement of mgu(t, t′). There exists a computable function mgu : T × T → Subst ∪ {fail} (where Subst is the set of all substitutions) such that mgu(t, t′) = a most general unifier of t, t′ if they are unifiable, else fail. We use only the special case of matching, where t′ is a ground term. It is easy to see that if distinct rules in a program have nonunifiable left sides, then the Church–Rosser property holds (weaker conditions will also suffice).
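To make terms and the matching special case of unification concrete, here is a small Haskell sketch of our own devising (the names Term, Subst and match1 are not notation from the paper). It exploits the left-linearity of patterns, so no consistency check between bindings is needed:

import qualified Data.Map as Map

-- terms over a signature: an operator applied to subterms, or a variable
-- (patterns may contain variables; ground terms do not)
data Term = Op String [Term] | Var String deriving (Eq, Show)

type Subst = Map.Map String Term

-- one-sided matching: find θ with θp = g for a linear pattern p and
-- ground term g (the only case of unification used in this paper)
match1 :: Term -> Term -> Maybe Subst
match1 (Var x)   g = Just (Map.singleton x g)
match1 (Op f ps) (Op f' gs)
  | f == f' && length ps == length gs =
      fmap Map.unions (mapM (uncurry match1) (zip ps gs))
  | otherwise = Nothing
match1 _ _ = Nothing

Since each pattern is linear (no repeated variables), Map.unions never has to reconcile conflicting bindings.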

2.2. Examples and special cases

The following two list-manipulating programs use the convention of writing nil or [] for the empty list, "cons" as an infix list constructor written :, and [t1, t2, …, tn] to stand for the list t1 : (t2 : (… : (tn : nil) …)). Variable names begin with upper case letters to make it easier to distinguish them from constants or operators (an idea taken from Prolog). The function append is a well-known example; Reynolds' function ss from [27] yields the list of all subsets of {x1, …, xn} when given a list [x1, …, xn] of distinct atoms. Signature Σ:

Γ0 = {nil, 0, 1, …}, Γ1 = {}, Γ2 = {:}, ∆1 = {ss}, ∆2 = {append}, ∆3 = {aux}.

Rules:

append(nil, Xs) → Xs
append(X : Xs, Ys) → X : append(Xs, Ys)

ss(nil) → [nil]
ss(U : V) → aux(U, ss(V), ss(V))
aux(W, nil, Z) → Z
aux(W, X : Y, Z) → (W : X) : aux(W, Y, Z).
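Both programs transcribe directly into Haskell; the sketch below is for orientation only, and the rewrite rules above remain the official definitions:

append :: [a] -> [a] -> [a]
append []       ys = ys
append (x : xs) ys = x : append xs ys

-- ss yields the list of all subsets of the input's elements
ss :: [a] -> [[a]]
ss []      = [[]]
ss (u : v) = aux u (ss v) (ss v)

aux :: a -> [[a]] -> [[a]] -> [[a]]
aux _ []      z = z
aux w (x : y) z = (w : x) : aux w y z

For example, ss [1,2] evaluates to [[1,2],[1],[2],[]].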

Inside-out versus outside-in

Consider the following program. Given input n, it produces the first n elements of the infinite list [nil, [1], [1, 1], [1, 1, 1], …]. Here n is given as a list of n ones, so for example g([1, 1, 1]) = [nil, [1], [1, 1]].

g(N) → first(N, sequence(nil))
first(nil, Xs) → nil
first((1 : M), (X : Xs)) → X : first(M, Xs)
sequence(Y) → Y : sequence(1 : Y).

A computation of g([1, 1]):

g([1, 1]) ⇒ first([1, 1], sequence(nil))
⇒ first([1, 1], nil : sequence([1]))
⇒ nil : first([1], sequence([1]))
⇒ nil : first([1], [1] : sequence([1, 1]))
⇒ nil : [1] : first(nil, sequence([1, 1]))
⇒ nil : [1] : nil = [nil, [1]].

The point of this example is that programs may be written which one thinks of as using infinite data structures, streams etc., even though no such concepts are explicitly present in the term rewriting framework. Note that this program's termination requires use of an "outside-in" evaluation strategy; it will loop infinitely if calls are done "by value", i.e., inside-out. These terms are now defined precisely:

Definition 2.3. A redex is an occurrence of a term θpi in t[θpi] where pi → ti is a rule and θ is a substitution. It is called innermost if θpi contains no other redexes, and outermost if the occurrence [θpi] is contained in no other redex. The innermost and outermost derivation relations ⇒_io, ⇒_oi ⊆ TΣ × TΣ are defined by: t[θpi] ⇒_io t[θti], provided [θpi] is innermost, and t[θpi] ⇒_oi t[θti], provided [θpi] is outermost.

Outside-in evaluation is obtained by using ⇒_oi instead of ⇒, and similarly for inside-out. Clearly ⇒_io and ⇒_oi coincide with ⇒ on programs without nested calls such as g(…, f(––), …). Inside-out evaluation has been called call-by-value or applicative-order evaluation, and outside-in evaluation has been called call-by-name or normal-order evaluation.

We have declined to give a definition of "lazy evaluation" since there seem to be some disagreements as to just what it is (e.g., what is "maximally lazy"?), and it is not entirely clear how the memoization concept should be expressed in the context of term rewriting systems. One natural suggestion: equate lazy evaluation with leftmost outside-in evaluation. This yields the computation above, and has the natural "pattern-directed" character often seen (e.g., [34]). On the other hand there are objections: the scheme is still nondeterministic; it can reduce expressions that were interior to a λ in the original lambda expression and so does not yield the usual "head normal form"; and there is a potential need to scan the entire term in order to discover the next redex (although schemes to reduce such overhead have been suggested, e.g., [14]). Finally, an observation: it won't matter that we have no ironclad definition of "lazy", since the grammar to be constructed will safely approximate all reachable computational states, regardless of evaluation order. Further, an alternative will be proposed that traces just the outside-in derivations.
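As an aside, the stream program above runs as written in Haskell, whose evaluation order is outside-in (lazy). A direct transcription, with unary numbers kept as lists of ones (the names gN, firstN, seqFrom are ours):

-- gN n yields the first (length n) elements of [[], [1], [1,1], ...]
gN :: [Int] -> [[Int]]
gN n = firstN n (seqFrom [])

-- patterns mirror the rewrite rules; other inputs are deliberately undefined
firstN :: [Int] -> [[Int]] -> [[Int]]
firstN []      _        = []
firstN (1 : m) (x : xs) = x : firstN m xs

-- seqFrom builds the conceptually infinite stream y : (1:y) : (1:1:y) : ...
seqFrom :: [Int] -> [[Int]]
seqFrom y = y : seqFrom (1 : y)

Here gN [1,1] evaluates to [[],[1]], mirroring the derivation shown above; under inside-out (strict) evaluation the call seqFrom [] would loop.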

Higher-order functions

In the popular "named combinator" style for writing functional programs (HASKELL, etc.), every function f is thought of as "curried", and a definition might be an equation of the form f x1 … xn = expression. An incomplete function application (such as f e1 … em with m < n) thus has a mathematical function as its denotation. This program style is easily expressed in a term rewriting context by modelling a standard implementation technique. Traditionally, a functional value is represented by a closure of the form (λx.exp, v1, …, vn) where v1, …, vn is a list of the values of the free variables in λx.exp. Lambda-lifting [17] converts λx.exp to a function (say f) of those free variables, so a closure becomes an incompletely applied function: (…((f v1) v2) … vn). From this viewpoint, a function name such as f is regarded not as a defined operator but as a constant, and closures will be formed using a binary closure-forming operator @.

Following is a traditional example from LISP with defined operator ∆2 = {@} and constructors Γ0 = {map, double, cons, f, nil, 0, 1, …} and Γ2 = {:}. The operator @ (left associating infix) is used for function application. Function map as usual applies a functional argument to a list, so map @ g @ [x1, …, xn] evaluates to [g(x1), …, g(xn)]. Function f below thus transforms [x1, …, xn] into:

[(x1 : x1), …, (xn : xn)] : [(5 : x1), …, (5 : xn)].

Rules:

f @ X → (map @ double @ X) : (map @ (cons @ 5) @ X)

double @ X → X : X
cons @ X @ Y → X : Y

map @ U @ nil → nil
map @ U @ (X : Xs) → (U @ X) : (map @ U @ Xs).

If we write t1 t2 … tn as shorthand for (…((t1 @ t2) @ t3) …), then the equations above can be written without @, giving a more familiar appearance. The effect of higher-order functions is thus achieved without any change to the syntax or semantics of term rewriting systems. Further, it may be argued that such a representation is entirely relevant for the purpose of analyzing program behavior, since closures are the standard tool for implementing higher-order functions on the computer.
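The closure representation can itself be programmed first-order, which is the sense in which higher-order functions "are" structured data. Below is a Haskell sketch of our own devising (not the paper's formalism): each functional value is a constructor carrying the arguments collected so far, and a single apply function interprets @.

-- values: nil, cons pairs, numbers, and closures (operator + collected args)
data Val = Nil | Cons Val Val | Num Int
         | MapC [Val] | ConsC [Val] | DoubleC
         deriving Show

-- apply interprets the binary operator @ by pattern matching on closures
apply :: Val -> Val -> Val
apply DoubleC     x           = Cons x x         -- double @ X → X : X
apply (ConsC [])  x           = ConsC [x]        -- partial application
apply (ConsC [x]) y           = Cons x y         -- cons @ X @ Y → X : Y
apply (MapC [])   u           = MapC [u]         -- partial application
apply (MapC [_])  Nil         = Nil
apply (MapC [u])  (Cons x xs) = Cons (apply u x) (apply (MapC [u]) xs)
apply _           _           = error "apply: ill-formed application"

-- f @ X → (map @ double @ X) : (map @ (cons @ 5) @ X)
f :: Val -> Val
f x = Cons (apply (apply (MapC []) DoubleC) x)
           (apply (apply (MapC []) (apply (ConsC []) (Num 5))) x)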

2.3. Tree grammars

The technique to be presented constructs from a given program a tree grammar that safely approximates the collecting semantics of the next section and so can be used reliably to answer questions about program behavior. In contrast to programs intended for computational use, a tree grammar is nondeterministic and has no variables.

Definition 2.4. A tree grammar is a program, G, in which all defined operators are 0-ary. A set A ⊆ TΓ of ground constructor terms is regular iff there is a tree grammar G and a defined operator S such that

A = {ground constructor term g | S ⇒* g}.

In terms of traditional formal language theory, the 0-ary defined operators (i.e., constants) correspond exactly to nonterminal symbols, and will be so called in the following development. Further, constructors correspond to terminal symbols.

Tree grammars are chosen partly because they are able directly to generate the terms that are of computational interest, and partly because of their similarity to the well-understood regular string grammars, with all their desirable mathematical properties. Numerous theorems, constructions, decision procedures and alternate characterizations of regular sets of terms are described in the literature, e.g., by Brainerd [4], Gécseg and Steinby [10], and by Thatcher [31].

Notation. We will follow the grammatical convention of writing

A → term1 | term2 | … | termn

to stand for the several rules A → term1, …, A → termn.
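As a deliberately naive illustration of Definition 2.4, a tree grammar with 0-ary nonterminals can be represented and queried in Haskell as follows; the depth bound d is our addition, so generated only enumerates an initial fragment of the (possibly infinite) regular set:

-- grammar terms: a terminal operator applied to subterms, or a nonterminal
data GT = Con String [GT] | NT String deriving (Eq, Show)

type Grammar = [(String, GT)]          -- rules  A -> t, one pair per rule

-- all terms derivable from t using at most d rounds of nonterminal expansion
derive :: Grammar -> Int -> GT -> [GT]
derive _ 0 t           = [t]
derive g d (NT a)      = [t' | (a', t) <- g, a' == a, t' <- derive g (d - 1) t]
derive g d (Con op ts) = map (Con op) (mapM (derive g d) ts)

-- ground constructor terms (no nonterminals left) reachable within d rounds
generated :: Grammar -> Int -> String -> [GT]
generated g d s = filter ground (derive g d (NT s))
  where ground (Con _ ts) = all ground ts
        ground (NT _)     = False

For instance, a grammar X → nil | Atom : X with Atom → 0 is written [("X", Con "nil" []), ("X", Con ":" [NT "Atom", NT "X"]), ("Atom", Con "0" [])], and generated applied to it lists short lists of zeros.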

2.4. A collecting semantics

A classical approach to program analysis begins with an exact collecting semantics [7] that (for flow charts) accumulates with each “program point” the set of program states attainable when control reaches that point during all possible program executions on given initial input data. The next step is to find a finite over-approximation to the collecting semantics. For programs in the form of term rewriting systems, the natural program points are the rewrite rules. By analogy with flow charts, a natural candidate for a “program state” at a rule pi → ti is a substitution θ : V → TΣ , namely the variable bindings got by matching with pi at some time during a computation. On the other hand we also need to account for program results, namely terms derived from ti during computations on given input. Our solution is to associate with rule i a set of pairs (θ, g), where θ is a substitution resulting from a successful match with pi , and g is a term derivable from θti . (This is analogous to the “minimal function graphs” of [21], extended to allow nondeterministic programs.)

Definition 2.5. Let Input ⊆ TΣ be a set of input terms (typically calls), and let P = {pi → ti}i∈{1,…,n}. The collecting semantics is the tuple

Colsem(Input) = (Z0, Z1, …, Zn)

where for i = 0, 1, …, n, the set Zi ⊆ Subst × TΣ is given by:

Z0 = {(θ_id, g) | ∃g0 ∈ Input (g0 ⇒* g)}
Zi = {(θ, g) | ∃g0 ∈ Input ∃g1[] (g0 ⇒* g1[θpi] and θti ⇒* g)}.

Note: terms g in (θ, g) ∈ Zi include all terms derivable from θti, and not just those in normal form.

Condition (1) in the following lemma corresponds to Z0, and conditions (2) and (3) are closure properties relating the various Zk. Condition (2) says that any rewriteable Zj context may be rewritten and placed in Zi. Condition (3) says that any result derived in Zi may be placed back into the Zj context that gave rise to it.

Lemma 2.6. The collecting semantics is the smallest tuple of sets of pairs (ordered component-wise) that satisfies the following conditions.

(1) If g0 ∈ Input then (θ_id, g0) ∈ Z0.
(2) If 1 ≤ i ≤ n and 0 ≤ j ≤ n satisfy (θ, g[θ′pi]) ∈ Zj, then (θ′, θ′ti) ∈ Zi.
(3) If 1 ≤ i ≤ n and 0 ≤ j ≤ n satisfy (θ, g[θ′pi]) ∈ Zj and (θ′, g″) ∈ Zi, then (θ, g[g″]) ∈ Zj.

Proof of the lemma is straightforward.

2.5. Approximating the collecting semantics by regular tree grammars

The flow analysis methods of Sections 3 and 4 will build regular tree grammars G that approximate the collecting semantics. Grammar G will have a nonterminal X for each variable X, and some "result nonterminals" Ri. The tree grammars will safely approximate P's behavior in the following sense: for any (θ, g) ∈ Zi, grammar G has derivations X ⇒* θX and Ri ⇒* g.³ Informally, X generates all possible ground terms bound to X in computations on the given input, and Ri generates all possible results derivable from pi during the computations. In case the same variable X is used in several rules, nonterminal X generates the union of all the values bound to variable X.

³ Note the two different meanings of X in the rule X ⇒* θX: on the left side, X is the new nonterminal; on the right side, X is the term rewriting system variable to which substitution θ can be applied.

3. Program flow analysis by tree grammars: Earlier results

In 1968 Reynolds developed a method for analysis of applicative LISP programs, using a system of set equations derived from the program when applied to a prespecified set of input data [27]. The variables in the least fixpoint solution of these equations represent the sets of values that can be assumed by program variables or returned by program functions (note the analogy with our collecting semantics). The method works by first deriving an equation system containing variables, sets of atoms and the operations: set union, car*, cdr* and cons* (the extensions of LISP's car, cdr, cons to sets of lists, e.g., car*(X) = {car(x) | x ∈ X}) and a conditional expressed in terms of sets. This equation system is then simplified and transformed into an equivalent system without car*, cdr*, etc. by applying various set identities. The final result is a definition of a collection of regular sets of LISP lists.

In [20] the analogous problem for flow charts with LISP-like data is solved by a method similar to Reynolds', using extended tree grammars instead of set equations. The effect of Reynolds' operations car*, cdr*, etc. is achieved by use of production rules with selectors; for example an extended production rule may take the form A → B.hd, interpreted: if B derives a term value1 : value2, then A derives value1, and similarly for .tl (hd = head, tl = tail). It is then shown that extended rules involving selectors may be eliminated, yielding an ordinary tree grammar without selectors.

3.1. Inside-out program analysis by tree grammars

In the following we briefly illustrate a hybrid of the methods of [20,27], adapted to programs in the form of term rewriting systems. While sound for call-by-value (inside-out evaluation), the approach will be seen to be unsound for outside-in evaluation. Further, it is inadequate for higher-order programs. Some readers may wish to skip ahead to Section 4, where a new and more powerful technique will be presented for deriving tree grammars.

Let P = {pi → ti}i∈{1,2,…,n} be a fixed finite program over signature Σ = ∆ + Γ and let Input ⊆ TΣ. The idea is to construct from program P a tree grammar G that approximates the collecting semantics on given initial input data. Tree grammar G has signature Σ′ = ∆′ + Γ. Constructors in Γ are used the same way in G and P, and every left side variable and every defined operator in P is a nonterminal of G (0-ary!). Special nonterminals Ω (perhaps together with others) will generate possible input terms, and R0 will generate possible program results. (R1, …, Rn are not used here, but will be seen in Section 4.) The defined operator (nonterminal) set ∆′ thus satisfies:

∆′ ⊇ {R0, Ω} ∪ ∆ ∪ {X | X ∈ V occurs in some pi}.

In order to avoid confusion between P's and G's rewrite rules, we will write the i-th rule of the former as pi →_P ti and rules of the latter as A →_G t. The derivation relations for P and G are written ⇒*_P and ⇒*_G respectively.

An example from [27]. Consider the subset program given before:

ss(nil) →_P [nil]
ss(U : V) →_P aux(U, ss(V), ss(V))
aux(W, nil, Z) →_P Z
aux(W, X : Y, Z) →_P (W : X) : aux(W, Y, Z).

We begin by building an extended tree grammar G with nonterminal symbols ss, aux, and all program variables, and rules generating (from R0) initial calls to ss. G is then transformed to remove rules with selectors. The grammar "safely" approximates inside-out derivations in the following sense.

(1) Nonterminal ss generates a superset of all result values (ground constructor terms) computed by calls to the defined function ss on the given input; and similarly for nonterminal aux and function aux.

(2) W generates a superset of all the values assumed by aux's first argument on the given data. Similarly Z approximates aux's third argument, U approximates the set of heads of values assumed by ss's argument, and similarly for V, X, Y.

Step 1: Build a tree grammar (extended with selectors)

The initial call is ss(Ω), where Ω generates all possible lists of (LISP) atoms:

R0 →_G ss
Ω →_G nil | Atom : Ω
Atom →_G 0 | 1 | ….

Rules generating sets of results are obtained by copying P's rules but without the defined functions' arguments:

ss →_G [nil] | aux
aux →_G Z | (W : X) : aux.

Rules generating sets of argument values are obtained from the calls ss(V) by applying selectors .hd and .tl as needed for argument subterms U, V:

U →_G V.hd
V →_G V.tl.

The same is done for all calls to ss and aux, including the initial call ss(Ω):

U →_G Ω.hd
V →_G Ω.tl
W →_G U | W
X →_G ss.hd | Y.hd
Y →_G ss.tl | Y.tl
Z →_G ss | Z.

Step 2: Eliminate rules containing selectors

Consider a selector rule such as U →_G Ω.hd. This can be replaced by the rule U →_G Atom. The reasoning (described in [20]) is that the grammar has a production Ω →_G Atom : Ω that generates a value pair Atom : Ω. This implies that Ω.hd can generate the first component of the pair, so production U →_G Ω.hd can be "short circuited", replacing it by the selector-free rule U →_G Atom.

The same reasoning can be used to eliminate selectors for V, X, Y, yielding a classical selector-free grammar.

Step 3: Simplify the remaining rules

Note that ss and aux generate the same terms, since each yields all terms generated by the other. Further, [nil] = nil : nil, so the rules can be simplified a bit further, to yield a reasonably informative description of the program's argument and result values:

R0 →_G ss
ss →_G [nil] | (Atom : X) : ss
X →_G nil | Atom : X.
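Step 2 above can be prototyped mechanically. A rough Haskell sketch, reusing the GT type from the sketch in Section 2.3 (the ERule type and the names step, eliminate are ours; chasing selectors through nonterminal chains is a simplification of the elimination described in [20]):

import Data.List (nub)

-- extended rules: ordinary  A -> t,  or selector rules  A -> B.hd / A -> B.tl
data ERule = Plain String GT | Sel String String Bool   -- True = .hd, False = .tl
  deriving (Eq, Show)

-- one pass: short-circuit  A -> B.hd  through every rule for B
step :: [ERule] -> [ERule]
step rs = nub (rs ++ concat [ resolve a hd t | Sel a b hd <- rs
                                             , Plain b' t <- rs, b' == b ])
  where resolve a True  (Con ":" [t1, _]) = [Plain a t1]   -- take the head
        resolve a False (Con ":" [_, t2]) = [Plain a t2]   -- take the tail
        resolve a hd    (NT c)            = [Sel a c hd]   -- chase a nonterminal
        resolve _ _     _                 = []

-- iterate to a fixpoint, then drop the leftover selector rules
eliminate :: [ERule] -> [ERule]
eliminate rs | rs' == rs = [r | r@(Plain _ _) <- rs]
             | otherwise = eliminate rs'
  where rs' = step rs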

3.2. Limitations of the method

While sound for call-by-value (inside-out) evaluation, the method above gives unsafe approximations for the outside-in behavior of some programs. Consider the example with infinite sequences given earlier.

g(N) →_P first(N, sequence(nil))
first(nil, Xs) →_P nil
first((1 : M), (Y : Ys)) →_P Y : first(M, Ys)
sequence(Z) →_P Z : sequence(1 : Z).

Assuming initial call g(Ω), the resulting tree grammar contains result rules:

Start →_G g
g →_G first
first →_G nil | Y : first
sequence →_G Z : sequence

and argument rules:

Ω →_G nil | Atom : Ω
N →_G Ω
Xs →_G sequence | Ys
M →_G N.tl | M.tl
Y →_G sequence.hd | Ys.hd
Ys →_G sequence.tl | Ys.tl
Z →_G nil | 1 : Z.

The only rule rewriting sequence is sequence →_G Z : sequence, so clearly sequence and Xs, Y, Ys all generate the empty set. Thus the tree grammar does not account for the program's possible computations. The underlying reason is that the method just given describes each program part by the set of ground constructor terms it can evaluate to, and sequence(nil) yields no such terms at all as values. The method above also seems hard to extend to higher-order functions (a "potential extension" in [27]).

4. Program flow analysis by tree grammars: A new algorithm

In this section an alternate tree grammar construction is given that yields safe approximations for all programs, even with outside-in semantics. Correctness and termination will be proven in Section 5. The main difference is a more detailed tracking of the results of applying the rewrite rules. Further, the tricky "selector elimination" phase of Section 3 will be circumvented by better exploiting the pattern-matching nature of rewrite rules.

The idea is to approximate the collecting semantics by means of a tree grammar G constructed from program P = {pi →_P ti}i∈I, where I = {1, 2, …, n}. As in Section 3, we regard P's program variables as 0-ary defined operators (nonterminals) in G. The set ∆′ of nonterminals of G will satisfy

∆′ ⊇ {X | X ∈ Vars(pi) and 1 ≤ i ≤ n} ∪ {R0, R1, …, Rn}.

Following the structure of Lemma 2.6, Ri (read "the result of rule pi") is intended to generate a superset of all the terms derived from left side pi in computations on the given input. As before, R0 generates those terms derived from initial calls to P (perhaps with the aid of other nonterminals in ∆′). If P has signature Σ then G has signature Σ′ = Σ + ∆′, where defined operators in P are regarded as constructors in G;⁴ this is needed so that G can generate the "result sets" Ri of the collecting semantics.

4.1. Safety, and a preview of the algorithm

G’s grammar rules may be divided into three groups:

Input rules of form R0 →_G … and auxiliary rules (such as Ω →_G …) generate a set of initial calls to P.

Variable rules of form X →_G … generate values bound to P-variable X on the given inputs. A value may now be any ground term, not just ground constructor terms as for the inside-out derivations of Section 3.

⁴ Actually, in the remaining part of this paper, the distinction between constructors and defined operators is not needed. For the construction to work, a program could be any set P ⊆ TΣ(V) × TΣ(V) of rules p → t such that Vars(p) ⊇ Vars(t).

Result rules of form Ri →_G … yield possible values derived from pi during P's computations on the given initial calls.

Safety with respect to a given set of input terms is defined using the collecting semantics Colsem(Input) = (Z0, …, Zn) as follows.

Definition 4.1. G is safe with respect to P and Input ⊆ TΣ if for all i = 0,..., n

(1) Ri ⇒*_G g for all (θ, g) ∈ Zi;
(2) X ⇒*_G θX for all (θ, g) ∈ Zi.

As a consequence R0 generates all program results, including intermediate terms. Condition (2) implies a rather gross approximation, since G is an "independent attribute" method: it neglects all coordination among the variable bindings, and merges all bindings of the same variable. (Nonetheless it seems to work well in practice.)

G will be constructed iteratively. We begin with G = G0 containing only initial rules R0 →_G … etc. We then progressively add rules to G until it becomes safe with respect to P. Concisely and formally, the construction is defined by the fixpoint process in Section 4.3. For intuition's sake a more algorithmic presentation is given first, with remarks. The reader may wish to review Lemma 2.6, since the algorithm closely models that construction, representing the sets Ri by grammar rules as in Definition 4.1 of safety.

4.2. Informal version: Construction of G from P

Start with G = G0;
while G can be enlarged do begin
    choose a G rule A →_G g[g′] where g′ is a call, and a P rule pi →_P ti;
    loop
        unify g′ against pi;
        if there is an operator clash between g′ and pi then fail and escape the loop;
        if unification succeeds then escape the loop;
        if some nonterminal N in call g′ matches an operator in left side pi
            then rewrite that N in g′ by some G-rule N →_G g″ and repeat the loop
    end loop;
    if unification succeeded with substitution θ : Vars(pi) → TΣ′ then
        add to G the new rules:
            A →_G g[Ri]
            Ri →_G ti
            X →_G θ(X) for all X ∈ Vars(ti)
end

Remarks

(1) The first idea is to determine, for each right side P-call g′ in a G-rule A →_G g[g′], the set of P's rules with which g′ may be matched.⁵ The results of all possible calls are then used to construct G rules that model the corresponding bindings of P's variables to subterms of g′.

(2) Call applicability is determined by unifying P's left sides with g′. Note that g′ is a term in TΣ′, so that any X occurring in g′ is a 0-ary defined operator. For unification purposes, g′ is thus variable-free, unification is a one-sided "matching", and the outcome of unification will be either "fail" or a substitution θ : Vars(pi) → TΣ′.

⁵ And not all rules for the corresponding operator, as assumed in the method of Section 3.

(3) If unification succeeds, the bindings given by θ will be added to G as new rewrite rules for the variables⁶ in the term rewriting rule pi →_P ti. Further, the new rules A →_G g[Ri] and Ri →_G ti will be added to G to update the result sets it generates (note the close analogy with Lemma 2.6).

(4) Suppose g′ cannot be matched with a P rule's left side due to the presence of an X or Ri. Then the X or Ri will be rewritten by applying G's rules, and the result will be checked for unifiability as above. This is appropriate by the intended interpretation of X and Ri (according to the definition of safety).

(5) Rewriting according to G is done just enough to allow unification. (This turns out to be essential to ensure termination, see Section 5.) The resulting rules are added to G (if new), and the process is repeated until G stabilizes.
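The heart of both versions is matching a left side against a term that may contain nonterminals, expanding them only on demand. A rough Haskell rendering, reusing GT and Grammar from the sketch in Section 2.3 (the pattern type Pat and the explicit expansion budget d are our simplifications; the budget crudely stands in for the minimality condition, which instead stops at the first definite match):

-- linear patterns: variables, or operators applied to subpatterns
data Pat = PV String | PC String [Pat]

-- all substitutions θ (as association lists) such that the grammar can
-- rewrite t, within d nonterminal expansions, into an instance θp;
-- an operator clash simply yields no substitutions
matchG :: Grammar -> Int -> Pat -> GT -> [[(String, GT)]]
matchG _ _ (PV x) t = [[(x, t)]]     -- bind; t may still contain nonterminals
matchG g d p (NT a)                  -- expand a nonterminal on demand
  | d == 0    = []
  | otherwise = [s | (a', t) <- g, a' == a, s <- matchG g (d - 1) p t]
matchG g d (PC op ps) (Con op' ts)
  | op == op' && length ps == length ts =
      map concat (sequence (zipWith (matchG g d) ps ts))
  | otherwise = []                   -- operator clash: fail

Left-linearity again means the per-argument substitutions can simply be concatenated.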

4.3. Formal version: Construction of G from P

G = the smallest set of rules such that G ⊇ G0 and

G ⊇ ExtP(A →_G g[g′]) for every rule A →_G g[g′] ∈ G with g′ a call,

where

G0 = {R0 →_G … (input rules of Section 4.1)}

and

ExtP(A →_G g[g′]) = { A →_G g[Ri], Ri →_G ti, and X →_G θX for all X ∈ Vars(ti)
                      | pi →_P ti, g′ ⇒ⁿ_G θpi, and ¬∃θ′ (g′ ⇒ⁿ⁻¹_G θ′pi ⇒_G θpi) }.

Remarks

(1) As in the informal algorithm, rules are added to G for each call A →_G g[g′] already in G, and this process is iterated until G stabilizes.

(2) The definition of ExtP(A →_G g[g′]) formalizes the idea that g′ is rewritten until a definite match with a left side of a P rule is obtained (the condition g′ ⇒ⁿ_G θpi). The condition ¬∃θ′ (g′ ⇒ⁿ⁻¹_G θ′pi ⇒_G θpi) expresses minimality: rewriting is done only just enough for a definite match to occur.

4.4. Example: Lazy evaluation

The given program is:

g(N) →_P first(N, sequence(nil))
first(nil, Xs) →_P nil
first((1 : M), (X : Xs)) →_P X : first(M, Xs)
sequence(Y) →_P Y : sequence(1 : Y).

The initial rules in G are:

R0 →_G g(Ω)             (initial call)
Ω →_G nil | Atom : Ω    (initial data)
Atom →_G 0 | 1 | ….

Now apply ExtP to R0 →_G [g(Ω)] (where [...] indicates an empty context around ... and not a list). This yields (since g(Ω) can be matched with p1 →_P t1):

⁶ Since Vars(pi) ⊇ Vars(ti) it is actually sufficient to add rules corresponding to the variables in ti.

R0 →_G R1
R1 →_G first(N, sequence(nil))
N →_G Ω

ExtP applied to R1 →_G first(N, [sequence(nil)]) adds rules:

R1 →_G first(N, R4)
R4 →_G Y : sequence(1 : Y)
Y →_G nil

and then ExtP applied to R4 →_G Y : [sequence(1 : Y)] adds:

R4 →_G Y : R4
Y →_G 1 : Y

ExtP(R1 →_G [first(N, R4)]) now adds (after rewriting N to nil or 1 : Ω, and R4 to Y : R4 or Y : sequence(1 : Y)) the following new rules:

R1 →_G R2 | R3
R2 →_G nil
R3 →_G X : first(M, Xs)
M →_G Ω
X →_G Y
Xs →_G R4 | sequence(1 : Y)

and finally ExtP(R3 →_G X : [first(M, Xs)]) adds rules:

R3 →_G X : R2 | X : R3

G has now stabilized. If we now restrict G to those rules that yield terms containing only ground constructor terms of P, the grammar can be simplified considerably:

R0 →_G nil | R3
R3 →_G Y : nil | Y : R3
Y →_G nil | 1 : Y.

5. Termination and correctness

Let ∂P denote our extension operation: for a term rewriting system P and a tree grammar G, define

∂P(G) = ⋃_{A →_G g[g′] ∈ G} ExtP(A →_G g[g′]).

For a tree grammar G0 (generating the input) we construct the new tree grammar G as the least set H of rules satisfying

H ⊇ G0 ∪ ∂P(H).

5.1. Termination

Formally, this construction amounts to a denumerably infinite process: for k = 0, 1, … define Gk+1 = Gk ∪ ∂P(Gk). Then G0 ⊆ G1 ⊆ ⋯, and G = ⋃_{k=0}^∞ Gk, where the limit satisfies the defining formula with equality: G = G0 ∪ ∂P(G).
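The iteration Gk+1 = Gk ∪ ∂P(Gk) is ordinary Kleene iteration and can be coded directly. A sketch in Haskell, assuming some rule type with an Ord instance and a function dP implementing ∂P (both left abstract here):

import qualified Data.Set as Set

-- iterate G_{k+1} = G_k ∪ ∂P(G_k) until it stabilizes; by Theorem 5.1
-- below this terminates whenever G0 and P are finite
construct :: Ord rule
          => (Set.Set rule -> Set.Set rule)   -- dP, implementing ∂P
          -> Set.Set rule                     -- G0, the input rules
          -> Set.Set rule                     -- the limit G = G0 ∪ ∂P(G)
construct dP g0 = go g0
  where
    go g | g' == g   = g
         | otherwise = go g'
      where g' = Set.union g (dP g)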

Fortunately, this process is actually finite:

Theorem 5.1. If G0 and P are finite, then so is G.

Proof. Let a variant of a term be the result of replacing zero or more of its subterms by nonterminals. We show that every G rule must be a variant of a subterm of the right side of a G0 rule or a P rule.

The ExtP-construction forms new rules of three kinds: A →_G g[Ri], Ri →_G ti and X →_G θX. Obviously, the right hand side g[Ri] of the first kind of rule is a variant (formed by replacing exactly one subterm by a nonterminal) of the right hand side of the generating rule A →_G g[g′].

The second kind of rule copies its right hand side from one of the rules of the original term rewriting system.

For the third kind of rule, the minimality requirement on the derivation g′ ⇒*_G θpi ensures that the new right hand side θX will actually be a subterm of the right hand side of an existing G-rule. This can be seen as follows. Assume g′ ⇒ⁿ_G θpi and X ∈ Vars(ti) ⊆ Vars(pi). This means that θpi may be written g1[θX] for some context g1[]. Since θX is part of a term generated by A, if θX is not itself contained in the right hand side of some rule A′ →_G w then it must contain such a right hand side. In other words, θX would have the form g2[w] for a nonempty context g2[], and the derivation g′ ⇒ⁿ_G θpi could be seen as g′ ⇒ⁿ⁻¹_G g1[g2[A′]] ⇒_G g1[g2[w]] = θpi. Since X only occurs once in pi, we could define θ′ as the modification of θ mapping X to g2[A′], so that g′ ⇒ⁿ⁻¹_G g1[g2[A′]] = θ′pi, thus violating the last requirement in the definition of ExtP.

Note that the set of variants of subterms is closed under the formation of variants and the extraction of subterms. During the construction of G, the right hand sides of the added rules will therefore always be variants of subterms of right hand sides of G0 and P rules. If G0 and P are finite, the sets of nonterminals and of right hand sides are also finite, so that G becomes finite.

5.2. Safety

Lemma 5.2. For a tree grammar H subjected to the ∂P-construction, if A ⇒*_H g[θpi], then A ⇒*_{H∪∂P(H)} g[Ri], Ri →_{∂P(H)} ti, and X ⇒*_{H∪∂P(H)} θX for all X ∈ Vars(ti).

Proof. The derivation A ⇒*_H g[θpi] must use a rule A′ →_H h′[h″] where h′[] ⇒*_H g′[], h″ ⇒*_H θpi and g[] = h[g′[]]. In other words, the derivation may be reordered as

A ⇒*_H h[A′] ⇒_H h[h′[h″]] ⇒*_H h[g′[h″]] = g[h″] ⇒*_H g[θpi].

For the derivation h″ ⇒*_H θpi there must be some n and θ′ such that h″ ⇒ⁿ_H θ′pi ⇒*_H θpi, but ¬∃θ″ (h″ ⇒ⁿ⁻¹_H θ″pi ⇒_H θ′pi). ExtP(A′ →_H h′[h″]) will therefore add

A′ →_{∂P(H)} h′[Ri],
Ri →_{∂P(H)} ti, and
X →_{∂P(H)} θ′X for all X ∈ Vars(ti)

and we obtain a derivation

A ⇒*_H h[A′] ⇒_{∂P(H)} h[h′[Ri]] ⇒*_H h[g′[Ri]] = g[Ri]

as desired. The fact that θ′pi ⇒*_H θpi means that there must exist derivations θ′X ⇒*_H θX for all X ∈ Vars(ti), so that X →_{∂P(H)} θ′X ⇒*_H θX for all X ∈ Vars(ti) ⊆ Vars(pi).

Lemma 5.3. If A ⇒*_{G0} w and w ⇒*_P g[θpi], then A ⇒*_G g[Ri], Ri →_G ti, and X ⇒*_G θX for all X ∈ Vars(ti).

Proof. Let w ⇒ⁿ_P g[θpi]; the proof is by induction on n. Recall that G = G0 ∪ ∂P(G).

For n = 0, employ Lemma 5.2.

For n > 0, assume the claim for n − 1. By induction, from A ⇒*_{G0} w ⇒ⁿ⁻¹_P g′[θ′pj] ⇒_P g′[θ′tj] = g[θpi] we therefore get A ⇒*_G g′[Rj], Rj →_G tj and X ⇒*_G θ′X for all X ∈ Vars(tj), so that the derivation may be continued g′[Rj] ⇒_G g′[tj] ⇒*_G g′[θ′tj] = g[θpi]. Applying Lemma 5.2 now proves the lemma.

Theorem 5.4. If A ⇒*_{G0} w and w ⇒*_P g, then A ⇒*_G g.

Proof. Assume w ⇒ⁿ_P g. For n = 0 the claim is obvious, using G ⊇ G0.

Otherwise the last step in the derivation is g′[θpi] ⇒_P g′[θti] for some context g′[], index i and substitution θ. The derivation A ⇒*_G g′[Ri] obtained from Lemma 5.3, which also ensures Ri →_G ti and X ⇒*_G θX for all X ∈ Vars(ti), may therefore be continued g′[Ri] ⇒_G g′[ti] ⇒*_G g′[θti] = g, proving the theorem.

6. Conclusions and directions for future development

An algorithm for the flow analysis of applicative programs, using term rewriting systems as a program formalism, has been presented, and a proof was given that the method terminates with a finite grammar, and that it yields safe approximations to program behavior. It has been seen that the method is more powerful than earlier methods in that it gives safe approximations to lazy programs (outside-in evaluation order) and to programs using higher-order functions. Several extensions are natural and desirable for practical applications (such as optimizing compilers for applicative languages or the MIX system), including the following:

(1) Outside-in and inside-out rewriting: Languages as implemented mostly use either a strictly inside-out rewriting order (call-by-value, as in LISP), or strictly outside-in (lazy evaluation), while G as constructed above models all possible reduction sequences. This may be seen in the algorithm, which adds new variable bindings for all possible calls A →_G g[g′]. An outside-in rewriting order is easily modeled by changing the algorithm to call ExtP(A →_G g[g′]) only when [g′] is outermost, and the analogous trick models inside-out rewriting: ExtP(A →_G g[g′]) is called just in case [g′] is innermost. The resulting grammars would then give more precise safe descriptions, with smaller sets of computational configurations.

(2) Atomic values: Grammar G in essence traces the control flow of the program being analyzed. It assumes that all data values are terms, and does not account for programs manipulating atomic values such as numbers or booleans. On the other hand, traditional flow analysis as used in optimizing compilers (e.g., constant propagation, see [12]) is mainly concerned with atomic values, and makes rather conservative assumptions about flow of control. This type of flow analysis uses a lattice or cpo of abstract values to describe sets of atomic values; for example pos, neg and num could describe nonempty sets containing only positive values, only negative values, and arbitrary values, respectively. It seems clear that the analysis could be extended to handle atomic values as follows. Suppose that certain arguments of the defined operators or constructors are designated as being atomic, for example that the first argument to cons(x, y) will be numeric. Then abstraction may be performed when adding rules to G, so that if G already contains a rule A →_G cons(pos, B), and a new rule A →_G cons(neg, B) is to be added by ExtP, then the original rule is replaced by the least common abstraction of both: A →_G cons(num, B). (A minimal sketch of such an abstract-value lattice appears after this list.)

(3) More general result symbols: Turchin's "basic configurations" resemble a more general version of the patterns used in Section 4.3, and are used to perform program transformation in [33]. The MIX system uses an essentially similar idea to classify function arguments for the purpose of partial evaluation. It would be desirable to put these various ideas in a more general framework.

(4) Strictness analysis: It would be interesting to see whether our algorithm could be modified to yield a grammar describing the way the program accesses the substructures of its data, and to determine those substructures which are required to run the program. For example, "append" requires all tails of its first argument but no heads, and does not examine its second argument. This topic has been addressed in [15], [33] and [35] for the case of first-order programs.
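For illustration, the least-common-abstraction step of point (2) needs nothing more than a join operation on a small lattice; a minimal Haskell sketch (our own illustration, not a component of the paper's algorithm):

-- abstract descriptions of sets of numbers, with pos, neg ⊑ num
data AbsNum = Pos | Neg | Num deriving (Eq, Show)

-- least upper bound: when a new rule A -> cons(neg, B) meets an existing
-- A -> cons(pos, B), both are replaced using lub, giving A -> cons(num, B)
lub :: AbsNum -> AbsNum -> AbsNum
lub a b | a == b    = a
        | otherwise = Num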

Acknowledgements

Many people read early drafts of this paper, and their comments helped considerably in clarifying its ideas and tightening it up: Olivier Danvy, Carsten Kehler-Holst, Thomas Johnsson, Torben Mogensen, Alberto Pettorossi, Jakob Grue Simonsen, Harald Søndergaard and Valentin Turchin. Special thanks are due to Klaus Grue, Phil Wadler and the editors of [2] where an earlier version of this paper appeared. We would also like to thank the anonymous referees of this Festschrift for valuable comments.

References

[1] A. Abel, Termination checking with types, in: Fixed Points in Computer Science (FICS'03), RAIRO — Theoretical Informatics and Applications 38 (4) (2004) 277–319 (special issue).
[2] S. Abramsky, C. Hankin, Abstract Interpretation of Declarative Languages, Ellis Horwood, 1987.
[3] A. Aho, Currents in the Theory of Computing, Prentice-Hall, 1973.
[4] W.S. Brainerd, Tree generating regular systems, Information and Control 13 (1969) 217–231.
[5] G.L. Burn, C.L. Hankin, S. Abramsky, The theory of strictness analysis for higher-order functions, in: [9], 1986, pp. 42–62.
[6] P. Cousot, Semantic foundations of program analysis, in: [25], 1981, pp. 303–342.
[7] P. Cousot, R. Cousot, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, in: POPL '77: 4th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1977, pp. 238–252.
[8] E. Dijkstra, A Discipline of Programming, Prentice-Hall, 1976.
[9] H. Ganzinger, N. Jones (Eds.), Programs as Data Objects, in: Lecture Notes in Computer Science, vol. 217, Springer-Verlag, 1986.
[10] F. Gécseg, M. Steinby, Tree Automata, Akadémiai Kiadó, Budapest, 1984.
[11] J. Giesl, R. Thiemann, P. Schneider-Kamp, Proving and disproving termination of higher-order functions, Technical report, RWTH Aachen, 2005.
[12] M. Hecht, Flow Analysis of Computer Programs, North-Holland, 1977.
[13] N. Heintze, Set based program analysis, Ph.D. Thesis, Carnegie-Mellon Univ., Pittsburgh, PA, 1992.
[14] G. Huet, J.-J. Lévy, Call by need computations in nonambiguous linear term rewriting systems, Technical report, INRIA, France, 1979.
[15] J. Hughes, Strictness detection in non-flat domains, in: [9], 1986, pp. 112–135.
[16] S. Jagannathan, S. Weeks, A unified treatment of flow analysis in higher-order languages, in: POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1995, pp. 393–407.
[17] T. Johnsson, Lambda lifting: Transforming programs to recursive equations, in: Proceedings IFIP Symposium on Functional Programming Languages and Computer Architecture, in: Lecture Notes in Computer Science, vol. 201, 1985.
[18] N.D. Jones, Flow analysis of lambda expressions, in: Proceedings of ICALP 1981, in: Lecture Notes in Computer Science, vol. 115, 1981.
[19] N.D. Jones, Flow analysis of lazy higher-order functional programs, in: [2], 1987, pp. 103–122.
[20] N.D. Jones, S.S. Muchnick, Flow analysis and optimisation of LISP-like structures, in: POPL '79: 6th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1979, pp. 244–256.
[21] N.D. Jones, A. Mycroft, Data flow analysis of applicative programs using minimal function graphs, in: POPL '86: 13th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1986, pp. 296–306.
[22] N.D. Jones, M. Rosendahl, Higher-order minimal function graphs, Journal of Functional and Logic Programming 2 (1997).
[23] N.D. Jones, P. Sestoft, H. Søndergaard, An experiment in partial evaluation: The generation of a compiler generator, in: J.-P. Jouannaud (Ed.), Rewriting Techniques and Applications, in: Lecture Notes in Computer Science, vol. 202, 1985.
[24] S. Peyton Jones, Haskell 98 Language and Libraries, Cambridge University Press, 2003.
[25] S.S. Muchnick, N.D. Jones (Eds.), Program Flow Analysis: Theory and Applications, Prentice-Hall, 1981.
[26] A. Mycroft, N.D. Jones, A relational framework for abstract interpretation, in: [9], 1986, pp. 156–171.
[27] J. Reynolds, Automatic computation of data set definitions, Information Processing 68 (1969) 456–461.
[28] J. Reynolds, Definitional interpreters for higher-order programming languages, in: Proceedings of the ACM Annual Conference, ACM Press, 1972, pp. 717–740.
[29] O. Shivers, Control-flow analysis of higher-order languages, Ph.D. Thesis, Carnegie-Mellon Univ., Pittsburgh, PA, USA, 1991.
[30] D. Stefanescu, Y. Zhou, An equational framework for the flow analysis of higher order functional programs, in: ACM Symposium on Lisp and Functional Programming, 1994, pp. 318–327.
[31] J. Thatcher, Tree automata: An informal survey, in: [3], 1973.
[32] Y. Toyama, Termination of S-expression rewriting systems: Lexicographic path ordering for higher-order terms, in: Proceedings of the 15th International Conference on Rewriting Techniques and Applications (RTA 2004), in: Lecture Notes in Computer Science, vol. 3091, 2004, pp. 40–54.
[33] V. Turchin, The language REFAL, the theory of compilation, and metasystem analysis, Technical report, Courant Institute, New York, 1980.
[34] P. Wadler, An introduction to Orwell, Technical report, Programming Research Group, Oxford University, 1985.
[35] P. Wadler, Strictness analysis on non-flat domains by abstract interpretation over finite domains, in: [2], 1987, pp. 266–275 (Chapter 12).
[36] H. Xi, Dependent types in practical programming, Ph.D. Thesis, Carnegie-Mellon Univ., Pittsburgh, PA, USA, 1998.