

Theoretical Computer Science 375 (2007) 120–136

Flow analysis of lazy higher-order functional programs

Neil D. Jones∗, Nils Andersen

DIKU, Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark

Abstract

In recent years much interest has been shown in a class of functional languages including HASKELL, lazy ML, SASL/KRC/MIRANDA, ALFL, ORWELL, and PONDER. It has been seen that their expressive power is great, programs are compact, and program manipulation and transformation is much easier than with imperative languages or more traditional applicative ones. Common characteristics: they are purely applicative, manipulate trees as data objects, use pattern matching both to determine control flow and to decompose compound data structures, and use a "lazy" evaluation order. In this paper we describe a technique for data flow analysis of programs in this class by safely approximating the behavior of a certain class of term rewriting systems. In particular we obtain "safe" descriptions of program inputs, outputs and intermediate results by regular sets of trees. Potential applications include optimization, strictness analysis and partial evaluation. The technique improves earlier work because of its applicability to programs with higher-order functions, and with either eager or lazy evaluation. The technique addresses the call-by-name aspect of laziness, but not memoization.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Collecting semantics; Higher-order program; Program flow analysis; Lazy evaluation; Reynolds' analysis of applicative LISP programs; Term rewriting system; Tree grammar

This paper extends [19] (a chapter in the out-of-print [2]) with proofs that the algorithm terminates and yields safe (i.e., sound or correct) approximations to program behavior. Relevance to this festschrift: John Reynolds’ 1969 and 1972 papers developed a related first-order program analysis framework [27], and inspired our method for handling higher-order functions [28].

1. Introduction

It has long been known that a great deal of information about the syntax of a program's input data may be discovered by examining the program's source text. For one example, the program structure of a recursive descent compiler directly reflects the context-free grammar of its input. For another, it is often relatively straightforward to deduce by hand the syntax of both the input to and the output yielded by a given LISP program, simply by examining its code to see which portions of the input are accessed, in relation to how the program results are constructed.

∗ Corresponding address: University of Copenhagen, DIKU (Datalogisk Institut), Department of Computer Science, Universitetsparken 1, 2100 Copenhagen East, Denmark. E-mail address: [email protected] (N.D. Jones).


In this paper we will systematize these ideas, and present analysis methods which can automatically obtain finite descriptions of the input and output sets of lazy or eager functional programs that manipulate structured data. The same methods also work for higher-order functional programs, by a simple device: the closures typically used to implement functional values are themselves regarded as data structures, to be analyzed by just the same methods as used for any other data structures.

One motivation for this work is its application in the MIX system, which uses partial evaluation to generate compilers from interpreters given as input [23]. MIX's algorithms perform partial evaluation by means of abstract interpretation (i.e., data flow analysis) of an input program over various domains. Further, compiling and compiler generation are accomplished (efficiently!) by applying the partial evaluator to itself. MIX is, however, limited when compared to more traditional semantics-directed compiler generators due to its restricted specification language: essentially first-order LISP. A natural goal has thus been to develop methods for extending MIX's techniques to encompass higher-order functions.

Preview: Analysis of higher-order functions by tracing structured data values

Our approach "lambda lifts" [17,28] a given lambda expression to translate it into an equivalent term rewriting system, and then flow analyzes the result. Consider as first example the closed lambda expression: [λX.(λF.F(F 2)) (λY.X ∗ Y)] 5. First account for the free X in (λY.X ∗ Y) by adding an explicit parameter X′: [λX.(λF.F(F 2)) {(λX′λY.X′ ∗ Y) X}] 5. Now make (λX′λY.X′ ∗ Y) into a function i, make λF.F(F 2) into a function h, and [λX.(···){···}] into a function g. The given lambda expression can thereby be transformed into an equivalent term rewriting system where X, X′, Y, F are variables; @ is a defined operator representing function application; and g, h, i identify the subexpressions' functions. (Technically g, h, i will be seen to be constants.)

v → g @ 5
g @ X → h @ (i @ X)
h @ F → F @ (F @ 2)
i @ X′ @ Y → X′ ∗ Y.

The program's computation:

v ⇒ g @ 5 ⇒ h @ (i @ 5) ⇒ (i @ 5) @ (i @ 5 @ 2) ⇒ i @ 5 @ 10 ⇒ 50.

Note that every function is treated as "curried". Function application is handled by pattern matching on terms containing the "apply operator" @ (written left associatively), so a function of order n is defined by rewrite rules of the form:

f @ X1 @ ··· @ Xn → t

(whence this program notation is sometimes called "named combinators"). Curried functions applied to incomplete argument lists yield functional values, for example (i @ X′) above. The conclusion of all this: higher-order functions can be implemented using structured data values. Thus a flow analysis method able to handle structured data values in term rewriting systems can also analyze programs with higher-order functions.
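For concreteness, the lambda-lifted program transcribes directly into a functional language such as Haskell, where currying makes @ ordinary application. This is only an illustration of the "named combinator" style, under the assumption that the base values are integers (the names g, h, i, v are taken from the example above):

-- the combinators g, h, i from the lambda-lifted example;
-- the apply operator @ becomes ordinary curried application
i :: Int -> Int -> Int
i x' y = x' * y

h :: (Int -> Int) -> Int
h f = f (f 2)

g :: Int -> Int
g x = h (i x)          -- (i x) is a functional value: a partial application

v :: Int
v = g 5                -- evaluates to 50, matching the derivation above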

1.1. Outline

A simple language of the sort mentioned in the abstract is presented in Section 2, with programs containing constructors and selectors but, for simplicity, no atomic values such as integers or reals. Formally, a program is regarded as a term rewriting system of a certain restricted form.¹ The nondeterministic rewriting semantics naturally includes lazy evaluation, where function calls and data constructions are performed only as needed to continue execution. As a result, programs may be written which (at least conceptually) manipulate infinite data structures, streams, etc., and their behavior can be analyzed by the methods to be developed. The framework allows recursively defined functions in "curried" form, where higher-order ("functional") values are obtained by applying functions to incomplete argument lists.

Section 3 reviews an earlier and simpler construction of a regular tree grammar G from a given program, combining notions of Reynolds [27] and Jones and Muchnick [20]. Characteristic of G is that its nonterminals generate supersets of the computational states, function inputs and function outputs computed by the program on given data. Applications include program optimization, transformation, etc. The construction of Section 3 can be proven to give safe analyses of programs using call-by-value; however it can be unsafe when applied to lazy programs, and cannot deal with higher-order functions. In Section 4 a new algorithm is given to construct tree grammars from programs, able to safely approximate the behavior of lazy programs with higher-order functions. Proofs of correctness and termination of the new construction are given in Section 5. Section 6 contains conclusions, directions for further work and acknowledgements.

¹ Term rewriting gives a convenient framework to express higher-order functional programs' semantics and flow analysis, but we neither use nor prove nontrivial results from term rewriting.

1.2. Relation to other work

Until 1987

Broad overviews of program flow analysis may be found in Hecht [12] and Muchnick and Jones [25]. The first paper with goals similar to ours was Reynolds [27], which analyzed LISP programs to obtain "data set definitions" in the form of (equationally defined) regular sets of trees approximating the inputs to and the results of the functions in a given first-order LISP program. The mathematically equivalent tree grammars were used in [20] in order to approximate the behavior of flow chart programs manipulating LISP-like data structures.

Turchin independently developed a method using "covering context-free grammars" to analyze programs in the language REFAL (essentially Markov algorithms equipped with variables, see [33]). His techniques require some rather sophisticated machinery, in particular the use of so-called supercompilation and metasystem transition. Our results extend those of Reynolds and Turchin in two directions: we allow delayed evaluation and higher-order functions; and the method described here is conceptually simpler than Turchin's.

Burn, Hughes, Wadler and coauthors [5,15,35] develop methods for strictness analysis of programs with structured data, i.e., ways to describe access patterns to subtrees of tree-structured data, for use in efficient compilation of a lazy language. Their type-based goals and methods differ from this paper's.

Other work concerning higher-order functions

Our method is independent of the program's type structure and so can handle the type-free lambda calculus. In addition it can do analyses such as "constant propagation" which would be difficult by type-based methods due to the need for an infinitely wide approximation lattice.

A rather complicated way to flow analyze lambda expressions (and the first way, to our knowledge) was described in [18]. The present techniques extend and simplify those methods and appear to be at least as strong. This paper's theoretical basis is similar to that of the minimal function graphs in [21], but applies to higher-order programs with structured data. In a different paper by the same authors [26], a simple and general framework for abstract interpretation of lambda expressions is described, and an important distinction is drawn: between questions that concern "global" program behavior (e.g., type analysis) and those that concern "local" behavior (e.g., what is the range of values a given variable may assume at a particular point in the program?). The framework of [26] applies only to global questions and takes no account of structured data, while it will be seen that the current (quite different and more operational) techniques naturally yield local flow information, and can as well handle tree-structured data.

A standard trick for implementing functional values is to use closures, where a closure consists of a function name and the values of some of its parameters. The "lambda-lifting" transformation, i.e., the idea to make the apply function explicit, and to replace function parameters by explicit names for functions, does this at source code level. It has roots in Reynolds' defunctionalization [28].

Since 1987

Since [19] (this paper's predecessor) much research has been done on flow analysis of higher-order programs, including the following.

Untyped functional languages: Shivers' Ph.D. thesis [29] focused on control-flow analysis (CFA), and Heintze's Ph.D. thesis [13] focused on sets of data values (set-based analysis, at the start essentially a constraint-based version of regular tree grammars). Both of these have close connections with this paper's results, and each has led to several papers and much further research, e.g., Jagannathan and Weeks [16], and Stefanescu and Zhou [30]. A "minimal function graph" approach using higher-order functions at analysis time may be seen in [22].

Typed functional languages: Xi's Ph.D. thesis focused on tracing value flow via data types for termination verification of higher-order programming languages [36], and Abel has another approach to termination checking with higher-order types and recursive data structures [1].

The term rewriting community has begun to study higher-order term rewriting systems for their own sake, including flow analysis aspects. Research in this direction has been done by Giesl et al. [11], Toyama [32] and several others.

Forward or backward analyses?

This somewhat technical subsection may be skipped on first reading. The methods of [27,20,35] and this paper correspond to Cousot's "forward analyses" [6]. They begin with a description of program inputs and derive from these an approximation to the set of all program states reachable during computations on the given input data. In this context a "safe approximation" requires accounting for all reachable computational states. On the other hand, several authors including Hughes [15] and Turchin [33] have used various forms of "backwards analysis" that extract from the program alone, without input description, the form of the input data that can drive it. In this context "safety" has a different flavor: recognition of all possible patterns of data access, including "must be evaluated" and "may be evaluated". This approach can be used for strictness analysis. The precise relation between the two approaches is not yet entirely clear, though there are certainly connections with Dijkstra's strongest postconditions and weakest preconditions [8].

Forward analyses seem more relevant to this paper, for at least three reasons:

(1) We are interested in higher-order functions. These are hard to handle in a backwards analysis; for example, assuming a function-valued argument to range over all possible functions will yield far too conservative results. In particular it seems hard to formulate natural preconditions on program inputs when they include functions.

(2) An intended application is partial evaluation. In this case one starts with the precise values of some (typically not all) program inputs. It is most natural to propagate this information forwards.

(3) The result of analysis by either approach is a description of the program's data in the form of a grammar (or something similar), which closely resembles a "data type" for structured data. If one accepts the motto that well-typed programs can't go wrong, then forward analysis seems more relevant, since its safety concept stresses accounting for all reachable computational states; if the analysis cannot reveal the possibility of an error such as 3 + true then that error cannot possibly occur. Forward methods thus seem to provide a more natural framework for type checking.

2. A functional language

The syntax and operational semantics of the functional language discussed in the rest of the paper are defined using terminology from the theory of term rewriting systems. For simplicity of notation we assume only one data sort; the paper’s results extend to the many-sorted case without difficulty.

2.1. Programs: Syntax and operational semantics

A signature is a set Σ furnished with a function arity : Σ → N0; Σn = arity⁻¹(n) is the set of n-ary operators. Operators of arity 0 are called constants. We assume the signature is partitioned into Σ = Γ + ∆ (where + is disjoint union) and call the operators in Γ constructors and those in ∆ defined operators.²

² Defined operators correspond to the functions that are defined by a functional program.

Let TΣ(V) be the set of terms t over operators in Σ and a denumerable set V of variables (disjoint from ∪Σi). Thus TΣ(V) is the smallest set containing V such that if t1, …, tn ∈ TΣ(V) and n = arity(op) then op t1 ··· tn ∈ TΣ(V). Terms in TΣ (without variables) are also called ground terms, and terms in TΓ, built from constructors only, are ground constructor terms. A call is a term of the form δ t1 ··· tn with δ ∈ ∆. We denote elements of TΣ(V) by t (possibly decorated); elements of TΣ by g (for "ground term"); and elements of TΓ by c (for "constructor term"). The set of variables occurring in a term t is denoted by Vars(t).

Definition 2.1. A term rewriting system over Σ is a set of rewrite rules {pi → ti}i∈I where I is an index set (usually finite), left and right sides pi, ti ∈ TΣ(V), and each ti is a term using only variables from its left side pi. A program over Σ is a term rewriting system such that each "pattern" pi is a call in which no variable occurs more than once.

Operational semantics

Informally: a term rewriting system defines a relation ⇒ over its set of ground terms, and computation may be thought of as the transitive closure of ⇒ (note that it is not necessarily deterministic). The constraints on pi, ti in programs are computationally motivated: the constraint that pi be a call ensures that once a constructor appears outside the scope of any defined operator, it cannot be further rewritten; the variable name constraints ensure that the rewritten term can be built from ti using only the bindings got from matching with variables in pi, and that equality must be tested for explicitly.

More precisely: A context t[] is a "term with a hole", i.e., a term from TΣ(V ∪ {[]}) containing just one occurrence of [] (a 0-ary operator). An occurrence of term t′ in context t[] is a pair (t[], t′). More compactly, we write this as t[t′], and also (ambiguously) use t[t′] to denote the term obtained by replacing the [] in t[] by t′. (In the examples, square brackets are also used in list denotations; hopefully, this doesn't lead to confusion.) A substitution is a function θ : V → TΣ(V). We do not differentiate between a substitution and its natural extension to terms, θ : TΣ(V) → TΣ(V). θ_id denotes the identity mapping on terms.

Definition 2.2. The derivation relation ⇒ ⊆ TΣ × TΣ is defined by:

t[θpi] ⇒ t[θti]

for any ground context t[], rule pi → ti and ground substitution θ. The relation ⇒* is the reflexive transitive closure of ⇒.

A term in TΓ is said to be in normal form. Note that if c is in normal form there is no term c′ such that c ⇒ c′. For programming languages it is desirable that the normal forms derived from initial calls are unique when they exist. A sufficient condition for uniqueness is the Church–Rosser property: if t ⇒ t′ and t ⇒ t″ then there exists t‴ with t′ ⇒* t‴ and t″ ⇒* t‴.

A unifier of two terms t, t′ is a substitution θ such that θt = θt′. It is well known that if two terms are unifiable, then there exists a most general unifier mgu(t, t′), such that any unifier of t, t′ is a refinement of mgu(t, t′). There exists a computable function mgu : T × T → Subst ∪ {fail} (where Subst is the set of all substitutions) such that mgu(t, t′) = a most general unifier of t, t′ if they are unifiable, else fail. We use only the special case of matching, where t′ is a ground term. It is easy to see that if distinct rules in a program have nonunifiable left sides, then the Church–Rosser property holds (weaker conditions will also suffice).
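To make terms and the matching special case of unification concrete, here is a small Haskell sketch of our own devising (the names Term, Subst and match1 are not notation from the paper). It exploits the left-linearity of patterns, so no consistency check between bindings is needed:

import qualified Data.Map as Map

-- terms over a signature: an operator applied to subterms, or a variable
-- (patterns may contain variables; ground terms do not)
data Term = Op String [Term] | Var String deriving (Eq, Show)

type Subst = Map.Map String Term

-- one-sided matching: find θ with θp = g for a linear pattern p and
-- ground term g (the only case of unification used in this paper)
match1 :: Term -> Term -> Maybe Subst
match1 (Var x)   g = Just (Map.singleton x g)
match1 (Op f ps) (Op f' gs)
  | f == f' && length ps == length gs =
      fmap Map.unions (mapM (uncurry match1) (zip ps gs))
  | otherwise = Nothing
match1 _ _ = Nothing

Since each pattern is linear (no repeated variables), Map.unions never has to reconcile conflicting bindings.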

2.2. Examples and special cases

The following two list-manipulating programs use the convention of writing nil or [] for the empty list, "cons" as an infix list constructor written :, and [t1, t2, …, tn] to stand for the list t1 : (t2 : (… : (tn : nil) …)). Variable names begin with upper case letters to make it easier to distinguish them from constants or operators (an idea taken from Prolog). The function append is a well-known example; Reynolds' function ss from [27] yields the list of all subsets of {x1, …, xn} when given a list [x1, …, xn] of distinct atoms. Signature Σ:

Γ0 = {nil, 0, 1, …}, Γ1 = {}, Γ2 = {:}, ∆1 = {ss}, ∆2 = {append}, ∆3 = {aux}.

Rules:

append(nil, Xs) → Xs
append(X : Xs, Ys) → X : append(Xs, Ys)

ss(nil) → [nil]
ss(U : V) → aux(U, ss(V), ss(V))
aux(W, nil, Z) → Z
aux(W, X : Y, Z) → (W : X) : aux(W, Y, Z).
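Both programs transcribe directly into Haskell; the sketch below is for orientation only, and the rewrite rules above remain the official definitions:

append :: [a] -> [a] -> [a]
append []       ys = ys
append (x : xs) ys = x : append xs ys

-- ss yields the list of all subsets of the input's elements
ss :: [a] -> [[a]]
ss []      = [[]]
ss (u : v) = aux u (ss v) (ss v)

aux :: a -> [[a]] -> [[a]] -> [[a]]
aux _ []      z = z
aux w (x : y) z = (w : x) : aux w y z

For example, ss [1,2] evaluates to [[1,2],[1],[2],[]].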

Inside-out versus outside-in

Consider the following program. Given input n, it produces the first n elements of the infinite list [nil, [1], [1, 1], [1, 1, 1], …]. Here n is given as a list of n ones, so for example g([1, 1, 1]) = [nil, [1], [1, 1]].

g(N) → first(N, sequence(nil))
first(nil, Xs) → nil
first((1 : M), (X : Xs)) → X : first(M, Xs)
sequence(Y) → Y : sequence(1 : Y).

A computation of g([1, 1]):

g([1, 1]) ⇒ first([1, 1], sequence(nil))
⇒ first([1, 1], nil : sequence([1]))
⇒ nil : first([1], sequence([1]))
⇒ nil : first([1], [1] : sequence([1, 1]))
⇒ nil : [1] : first(nil, sequence([1, 1]))
⇒ nil : [1] : nil = [nil, [1]].

The point of this example is that programs may be written which one thinks of as using infinite data structures, streams etc., even though no such concepts are explicitly present in the term rewriting framework. Note that this program's termination requires use of an "outside-in" evaluation strategy; it will loop infinitely if calls are done "by value", i.e., inside-out. These terms are now defined precisely:

Definition 2.3. A redex is an occurrence of a term θpi in t[θpi] where pi → ti is a rule and θ is a substitution. It is called innermost if θpi contains no other redexes, and outermost if the occurrence [θpi] is contained in no other redex. The innermost and outermost derivation relations ⇒_io, ⇒_oi ⊆ TΣ × TΣ are defined by: t[θpi] ⇒_io t[θti], provided [θpi] is innermost, and t[θpi] ⇒_oi t[θti], provided [θpi] is outermost.

Outside-in evaluation is obtained by using ⇒_oi instead of ⇒, and similarly for inside-out. Clearly ⇒_io and ⇒_oi coincide with ⇒ on programs without nested calls such as g(…, f(––), …). Inside-out evaluation has been called call-by-value or applicative-order evaluation, and outside-in evaluation has been called call-by-name or normal-order evaluation.

We have declined to give a definition of "lazy evaluation" since there seem to be some disagreements as to just what it is (e.g., what is "maximally lazy"?), and it is not entirely clear how the memoization concept should be expressed in the context of term rewriting systems. One natural suggestion: equate lazy evaluation with leftmost outside-in evaluation. This yields the computation above, and has the natural "pattern-directed" character often seen (e.g., [34]). On the other hand there are objections: the scheme is still nondeterministic; it can reduce expressions that were interior to a λ in the original lambda expression and so does not yield the usual "head normal form"; and there is a potential need to scan the entire term in order to discover the next redex (although schemes to reduce such overhead have been suggested, e.g., [14]). Finally, an observation: it won't matter that we have no ironclad definition of "lazy", since the grammar to be constructed will safely approximate all reachable computational states, regardless of evaluation order. Further, an alternative will be proposed that traces just the outside-in derivations.
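As an aside, the stream program above runs as written in Haskell, whose evaluation order is outside-in (lazy). A direct transcription, with unary numbers kept as lists of ones (the names gN, firstN, seqFrom are ours):

-- gN n yields the first (length n) elements of [[], [1], [1,1], ...]
gN :: [Int] -> [[Int]]
gN n = firstN n (seqFrom [])

-- patterns mirror the rewrite rules; other inputs are deliberately undefined
firstN :: [Int] -> [[Int]] -> [[Int]]
firstN []      _        = []
firstN (1 : m) (x : xs) = x : firstN m xs

-- seqFrom builds the conceptually infinite stream y : (1:y) : (1:1:y) : ...
seqFrom :: [Int] -> [[Int]]
seqFrom y = y : seqFrom (1 : y)

Here gN [1,1] evaluates to [[],[1]], mirroring the derivation shown above; under inside-out (strict) evaluation the call seqFrom [] would loop.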

Higher-order functions

In the popular "named combinator" style for writing functional programs (HASKELL, etc.), every function f is thought of as "curried", and a definition might be an equation of the form f x1 … xn = expression. An incomplete function application (such as f e1 … em with m < n) thus has a mathematical function as its denotation. This program style is easily expressed in a term rewriting context by modelling a standard implementation technique. Traditionally, a functional value is represented by a closure of the form (λx.exp, v1, …, vn) where v1, …, vn is a list of the values of the free variables in λx.exp. Lambda-lifting [17] converts λx.exp to a function (say f) of those free variables, so a closure becomes an incompletely applied function: (…((f v1) v2) … vn). From this viewpoint, a function name such as f is regarded not as a defined operator but as a constant, and closures will be formed using a binary closure-forming operator @.

Following is a traditional example from LISP with defined operator ∆2 = {@} and constructors Γ0 = {map, double, cons, f, nil, 0, 1, …} and Γ2 = {:}. The operator @ (left associating infix) is used for function application. Function map as usual applies a functional argument to a list, so map @ g @ [x1, …, xn] evaluates to [g(x1), …, g(xn)]. Function f below thus transforms [x1, …, xn] into:

[(x1 : x1), …, (xn : xn)] : [(5 : x1), …, (5 : xn)].

Rules:

f @ X → (map @ double @ X) : (map @ (cons @ 5) @ X)

double @ X → X : X
cons @ X @ Y → X : Y

map @ U @ nil → nil
map @ U @ (X : Xs) → (U @ X) : (map @ U @ Xs).

If we write t1 t2 … tn as shorthand for (…((t1 @ t2) @ t3) …), then the equations above can be written without @, giving a more familiar appearance. The effect of higher-order functions is thus achieved without any change to the syntax or semantics of term rewriting systems. Further, it may be argued that such a representation is entirely relevant for the purpose of analyzing program behavior, since closures are the standard tool for implementing higher-order functions on the computer.
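The closure representation can itself be programmed first-order, which is the sense in which higher-order functions "are" structured data. Below is a Haskell sketch of our own devising (not the paper's formalism): each functional value is a constructor carrying the arguments collected so far, and a single apply function interprets @.

-- values: nil, cons pairs, numbers, and closures (operator + collected args)
data Val = Nil | Cons Val Val | Num Int
         | MapC [Val] | ConsC [Val] | DoubleC
         deriving Show

-- apply interprets the binary operator @ by pattern matching on closures
apply :: Val -> Val -> Val
apply DoubleC     x           = Cons x x         -- double @ X → X : X
apply (ConsC [])  x           = ConsC [x]        -- partial application
apply (ConsC [x]) y           = Cons x y         -- cons @ X @ Y → X : Y
apply (MapC [])   u           = MapC [u]         -- partial application
apply (MapC [_])  Nil         = Nil
apply (MapC [u])  (Cons x xs) = Cons (apply u x) (apply (MapC [u]) xs)
apply _           _           = error "apply: ill-formed application"

-- f @ X → (map @ double @ X) : (map @ (cons @ 5) @ X)
f :: Val -> Val
f x = Cons (apply (apply (MapC []) DoubleC) x)
           (apply (apply (MapC []) (apply (ConsC []) (Num 5))) x)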

2.3. Tree grammars

The technique to be presented constructs from a given program a tree grammar that safely approximates the collecting semantics of the next section and so can be used reliably to answer questions about program behavior. In contrast to programs intended for computational use, a tree grammar is nondeterministic and has no variables.

Definition 2.4. A tree grammar is a program, G, in which all defined operators are 0-ary. A set A ⊆ TΓ of ground constructor terms is regular iff there is a tree grammar G and a defined operator S such that

A = {ground constructor term g | S ⇒* g}.

In terms of traditional formal language theory, the 0-ary defined operators (i.e., constants) correspond exactly to nonterminal symbols, and will be so called in the following development. Further, constructors correspond to terminal symbols.

Tree grammars are chosen partly because they are able directly to generate the terms that are of computational interest, and partly because of their similarity to the well-understood regular string grammars, with all their desirable mathematical properties. Numerous theorems, constructions, decision procedures and alternate characterizations of regular sets of terms are described in the literature, e.g., by Brainerd [4], Gécseg and Steinby [10], and by Thatcher [31].

Notation. We will follow the grammatical convention of writing

A → term1 | term2 | … | termn

to stand for the several rules A → term1, …, A → termn.
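As a deliberately naive illustration of Definition 2.4, a tree grammar with 0-ary nonterminals can be represented and queried in Haskell as follows; the depth bound d is our addition, so generated only enumerates an initial fragment of the (possibly infinite) regular set:

-- grammar terms: a terminal operator applied to subterms, or a nonterminal
data GT = Con String [GT] | NT String deriving (Eq, Show)

type Grammar = [(String, GT)]          -- rules  A -> t, one pair per rule

-- all terms derivable from t using at most d rounds of nonterminal expansion
derive :: Grammar -> Int -> GT -> [GT]
derive _ 0 t           = [t]
derive g d (NT a)      = [t' | (a', t) <- g, a' == a, t' <- derive g (d - 1) t]
derive g d (Con op ts) = map (Con op) (mapM (derive g d) ts)

-- ground constructor terms (no nonterminals left) reachable within d rounds
generated :: Grammar -> Int -> String -> [GT]
generated g d s = filter ground (derive g d (NT s))
  where ground (Con _ ts) = all ground ts
        ground (NT _)     = False

For instance, a grammar X → nil | Atom : X with Atom → 0 is written [("X", Con "nil" []), ("X", Con ":" [NT "Atom", NT "X"]), ("Atom", Con "0" [])], and generated applied to it lists short lists of zeros.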

2.4. A collecting semantics

A classical approach to program analysis begins with an exact collecting semantics [7] that (for flow charts) accumulates with each “program point” the set of program states attainable when control reaches that point during all possible program executions on given initial input data. The next step is to find a finite over-approximation to the collecting semantics. For programs in the form of term rewriting systems, the natural program points are the rewrite rules. By analogy with flow charts, a natural candidate for a “program state” at a rule pi → ti is a substitution θ : V → TΣ , namely the variable bindings got by matching with pi at some time during a computation. On the other hand we also need to account for program results, namely terms derived from ti during computations on given input. Our solution is to associate with rule i a set of pairs (θ, g), where θ is a substitution resulting from a successful match with pi , and g is a term derivable from θti . (This is analogous to the “minimal function graphs” of [21], extended to allow nondeterministic programs.)

Definition 2.5. Let Input ⊆ TΣ be a set of input terms (typically calls), and let P = {pi → ti}i∈{1,…,n}. The collecting semantics is the tuple

Colsem(Input) = (Z0, Z1, …, Zn)

where for i = 0, 1, …, n, the set Zi ⊆ Subst × TΣ is given by:

Z0 = {(θ_id, g) | ∃g0 ∈ Input (g0 ⇒* g)}
Zi = {(θ, g) | ∃g0 ∈ Input ∃g1[] (g0 ⇒* g1[θpi] and θti ⇒* g)}.

Note: terms g in (θ, g) ∈ Zi include all terms derivable from θti, and not just those in normal form.

Condition (1) in the following lemma corresponds to Z0, and conditions (2) and (3) are closure properties relating the various Zk. Condition (2) says that any rewriteable Zj context may be rewritten and placed in Zi. Condition (3) says that any result derived in Zi may be placed back into the Zj context that gave rise to it.

Lemma 2.6. The collecting semantics is the smallest tuple of sets of pairs (ordered component-wise) that satisfies the following conditions.

(1) If g0 ∈ Input then (θ_id, g0) ∈ Z0.
(2) If 1 ≤ i ≤ n and 0 ≤ j ≤ n satisfy (θ, g[θ′pi]) ∈ Zj, then (θ′, θ′ti) ∈ Zi.
(3) If 1 ≤ i ≤ n and 0 ≤ j ≤ n satisfy (θ, g[θ′pi]) ∈ Zj and (θ′, g″) ∈ Zi, then (θ, g[g″]) ∈ Zj.

Proof of the lemma is straightforward.

2.5. Approximating the collecting semantics by regular tree grammars

The flow analysis methods of Sections 3 and 4 will build regular tree grammars G that approximate the collecting semantics. Grammar G will have a nonterminal X for each variable X, and some "result nonterminals" Ri. The tree grammars will safely approximate P's behavior in the following sense: for any (θ, g) ∈ Zi, grammar G has derivations X ⇒* θX and Ri ⇒* g.³ Informally, X generates all possible ground terms bound to X in computations on the given input, and Ri generates all possible results derivable from pi during the computations. In case the same variable X is used in several rules, nonterminal X generates the union of all the values bound to variable X.

³ Note the two different meanings of X in the rule X ⇒* θX: on the left side, X is the new nonterminal; on the right side, X is the term rewriting system variable to which substitution θ can be applied.

3. Program flow analysis by tree grammars: Earlier results

In 1968 Reynolds developed a method for analysis of applicative LISP programs, using a system of set equations derived from the program when applied to a prespecified set of input data [27]. The variables in the least fixpoint solution of these equations represent the sets of values that can be assumed by program variables or returned by program functions (note the analogy with our collecting semantics). The method works by first deriving an equation system containing variables, sets of atoms and the operations: set union, car*, cdr* and cons* (the extensions of LISP's car, cdr, cons to sets of lists, e.g., car*(X) = {car(x) | x ∈ X}) and a conditional expressed in terms of sets. This equation system is then simplified and transformed into an equivalent system without car*, cdr*, etc. by applying various set identities. The final result is a definition of a collection of regular sets of LISP lists.

In [20] the analogous problem for flow charts with LISP-like data is solved by a method similar to Reynolds', using extended tree grammars instead of set equations. The effect of Reynolds' operations car*, cdr*, etc. is achieved by use of production rules with selectors; for example an extended production rule may take the form A → B.hd, interpreted: if B derives a term value1 : value2, then A derives value1, and similarly for .tl (hd = head, tl = tail). It is then shown that extended rules involving selectors may be eliminated, yielding an ordinary tree grammar without selectors.

3.1. Inside-out program analysis by tree grammars

In the following we briefly illustrate a hybrid of the methods of [20,27], adapted to programs in the form of term rewriting systems. While sound for call-by-value (inside-out evaluation), the approach will be seen to be unsound for outside-in evaluation. Further, it is inadequate for higher-order programs. Some readers may wish to skip ahead to Section 4, where a new and more powerful technique will be presented for deriving tree grammars.

Let P = {pi → ti}i∈{1,2,…,n} be a fixed finite program over signature Σ = ∆ + Γ and let Input ⊆ TΣ. The idea is to construct from program P a tree grammar G that approximates the collecting semantics on given initial input data. Tree grammar G has signature Σ′ = ∆′ + Γ. Constructors in Γ are used the same way in G and P, and every left side variable and every defined operator in P is a nonterminal of G (0-ary!). Special nonterminals Ω (perhaps together with others) will generate possible input terms, and R0 will generate possible program results. (R1, …, Rn are not used here, but will be seen in Section 4.) The defined operator (nonterminal) set ∆′ thus satisfies:

∆′ ⊇ {R0, Ω} ∪ ∆ ∪ {X | X ∈ V occurs in some pi}.

In order to avoid confusion between P's and G's rewrite rules, we will write the i-th rule of the former as pi →_P ti and rules of the latter as A →_G t. The derivation relations for P and G are written ⇒*_P and ⇒*_G respectively.

An example from [27]. Consider the subset program given before:

ss(nil) →_P [nil]
ss(U : V) →_P aux(U, ss(V), ss(V))
aux(W, nil, Z) →_P Z
aux(W, X : Y, Z) →_P (W : X) : aux(W, Y, Z).

We begin by building an extended tree grammar G with nonterminal symbols ss, aux, and all program variables, and rules generating (from R0) initial calls to ss. G is then transformed to remove rules with selectors. The grammar "safely" approximates inside-out derivations in the following sense.

(1) Nonterminal ss generates a superset of all result values (ground constructor terms) computed by calls to the defined function ss on the given input; and similarly for nonterminal aux and function aux.

(2) W generates a superset of all the values assumed by aux's first argument on the given data. Similarly Z approximates aux's third argument, U approximates the set of heads of values assumed by ss's argument, and similarly for V, X, Y.

Step 1: Build a tree grammar (extended with selectors)

The initial call is ss(Ω), where Ω generates all possible lists of (LISP) atoms:

R0 →_G ss
Ω →_G nil | Atom : Ω
Atom →_G 0 | 1 | ….

Rules generating sets of results are obtained by copying P's rules but without the defined functions' arguments:

ss →_G [nil] | aux
aux →_G Z | (W : X) : aux.

Rules generating sets of argument values are obtained from the calls ss(V) by applying selectors .hd and .tl as needed for argument subterms U, V:

U →_G V.hd
V →_G V.tl.

The same is done for all calls to ss and aux, including the initial call ss(Ω):

U →_G Ω.hd
V →_G Ω.tl
W →_G U | W
X →_G ss.hd | Y.hd
Y →_G ss.tl | Y.tl
Z →_G ss | Z.

Step 2: Eliminate rules containing selectors

Consider a selector rule such as U →_G Ω.hd. This can be replaced by the rule U →_G Atom. The reasoning (described in [20]) is that the grammar has a production Ω →_G Atom : Ω that generates a value pair Atom : Ω. This implies that Ω.hd can generate the first component of the pair, so production U →_G Ω.hd can be "short circuited", replacing it by the selector-free rule U →_G Atom.

The same reasoning can be used to eliminate selectors for V, X, Y, yielding a classical selector-free grammar.

Step 3: Simplify the remaining rules

Note that ss and aux generate the same terms, since each yields all terms generated by the other. Further, [nil] = nil : nil, so the rules can be simplified a bit further, to yield a reasonably informative description of the program's argument and result values:

R0 →_G ss
ss →_G [nil] | (Atom : X) : ss
X →_G nil | Atom : X.
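Step 2 above can be prototyped mechanically. A rough Haskell sketch, reusing the GT type from the sketch in Section 2.3 (the ERule type and the names step, eliminate are ours; chasing selectors through nonterminal chains is a simplification of the elimination described in [20]):

import Data.List (nub)

-- extended rules: ordinary  A -> t,  or selector rules  A -> B.hd / A -> B.tl
data ERule = Plain String GT | Sel String String Bool   -- True = .hd, False = .tl
  deriving (Eq, Show)

-- one pass: short-circuit  A -> B.hd  through every rule for B
step :: [ERule] -> [ERule]
step rs = nub (rs ++ concat [ resolve a hd t | Sel a b hd <- rs
                                             , Plain b' t <- rs, b' == b ])
  where resolve a True  (Con ":" [t1, _]) = [Plain a t1]   -- take the head
        resolve a False (Con ":" [_, t2]) = [Plain a t2]   -- take the tail
        resolve a hd    (NT c)            = [Sel a c hd]   -- chase a nonterminal
        resolve _ _     _                 = []

-- iterate to a fixpoint, then drop the leftover selector rules
eliminate :: [ERule] -> [ERule]
eliminate rs | rs' == rs = [r | r@(Plain _ _) <- rs]
             | otherwise = eliminate rs'
  where rs' = step rs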

3.2. Limitations of the method

While sound for call-by-value (inside-out) evaluation, the method above gives unsafe approximations for the outside-in behavior of some programs. Consider the example with infinite sequences given earlier.

g(N) →_P first(N, sequence(nil))
first(nil, Xs) →_P nil
first((1 : M), (Y : Ys)) →_P Y : first(M, Ys)
sequence(Z) →_P Z : sequence(1 : Z).

Assuming initial call g(Ω), the resulting tree grammar contains result rules:

Start →_G g
g →_G first
first →_G nil | Y : first
sequence →_G Z : sequence

and argument rules:

Ω →_G nil | Atom : Ω
N →_G Ω
Xs →_G sequence | Ys
M →_G N.tl | M.tl
Y →_G sequence.hd | Ys.hd
Ys →_G sequence.tl | Ys.tl
Z →_G nil | 1 : Z.

The only rule rewriting sequence is sequence →_G Z : sequence, so clearly sequence and Xs, Y, Ys all generate the empty set. Thus the tree grammar does not account for the program's possible computations. The underlying reason is that the method just given describes each program part by the set of ground constructor terms it can evaluate to, and sequence(nil) yields no such terms at all as values. The method above also seems hard to extend to higher-order functions (a "potential extension" in [27]).

4. Program flow analysis by tree grammars: A new algorithm

In this section an alternate tree grammar construction is given that yields safe approximations for all programs, even with outside-in semantics. Correctness and termination will be proven in Section 5. The main difference is a more detailed tracking of the results of applying the rewrite rules. Further, the tricky "selector elimination" phase of Section 3 will be circumvented by better exploiting the pattern-matching nature of rewrite rules.

The idea is to approximate the collecting semantics by means of a tree grammar G constructed from program P = {pi →_P ti}i∈I, where I = {1, 2, …, n}. As in Section 3, we regard P's program variables as 0-ary defined operators (nonterminals) in G. The set ∆′ of nonterminals of G will satisfy

∆′ ⊇ {X | X ∈ Vars(pi) and 1 ≤ i ≤ n} ∪ {R0, R1, …, Rn}.

Following the structure of Lemma 2.6, Ri (read "the result of rule pi") is intended to generate a superset of all the terms derived from left side pi in computations on the given input. As before, R0 generates those terms derived from initial calls to P (perhaps with the aid of other nonterminals in ∆′). If P has signature Σ then G has signature Σ′ = Σ + ∆′, where defined operators in P are regarded as constructors in G;⁴ this is needed so that G can generate the "result sets" Ri of the collecting semantics.

4.1. Safety, and a preview of the algorithm

G’s grammar rules may be divided into three groups:

Input rules of form R0 →_G … and auxiliary rules (such as Ω →_G …) generate a set of initial calls to P.

Variable rules of form X →_G … generate values bound to P-variable X on the given inputs. A value may now be any ground term, not just ground constructor terms as for the inside-out derivations of Section 3.

⁴ Actually, in the remaining part of this paper, the distinction between constructors and defined operators is not needed. For the construction to work, a program could be any set P ⊆ TΣ(V) × TΣ(V) of rules p → t such that Vars(p) ⊇ Vars(t).

Result rules of form Ri →_G … yield possible values derived from pi during P's computations on the given initial calls.

Safety with respect to a given set of input terms is defined using the collecting semantics Colsem(Input) = (Z0, …, Zn) as follows.

Definition 4.1. G is safe with respect to P and Input ⊆ TΣ if for all i = 0,..., n

(1) Ri ⇒*_G g for all (θ, g) ∈ Zi;
(2) X ⇒*_G θX for all (θ, g) ∈ Zi.

As a consequence R0 generates all program results, including intermediate terms. Condition (2) implies a rather gross approximation, since G is an "independent attribute" method: it neglects all coordination among the variable bindings, and merges all bindings of the same variable. (Nonetheless it seems to work well in practice.)

G will be constructed iteratively. We begin with G = G0 containing only initial rules R0 →_G … etc. We then progressively add rules to G until it becomes safe with respect to P. Concisely and formally, the construction is defined by the fixpoint process in Section 4.3. For intuition's sake a more algorithmic presentation is given first, with remarks. The reader may wish to review Lemma 2.6, since the algorithm closely models that construction, representing the sets Ri by grammar rules as in Definition 4.1 of safety.

4.2. Informal version: Construction of G from P

Start with G = G0;
while G can be enlarged do begin
    choose a G rule A →_G g[g′] where g′ is a call, and a P rule pi →_P ti;
    loop
        unify g′ against pi;
        if there is an operator clash between g′ and pi then fail and escape the loop;
        if unification succeeds then escape the loop;
        if some nonterminal N in call g′ matches an operator in left side pi
            then rewrite that N in g′ by some G-rule N →_G g″ and repeat the loop
    end loop;
    if unification succeeded with substitution θ : Vars(pi) → TΣ′ then
        add to G the new rules:
            A →_G g[Ri]
            Ri →_G ti
            X →_G θ(X) for all X ∈ Vars(ti)
end

Remarks

(1) The first idea is to determine, for each right side P-call g′ in a G-rule A →_G g[g′], the set of P's rules with which g′ may be matched.⁵ The results of all possible calls are then used to construct G rules that model the corresponding bindings of P's variables to subterms of g′.

(2) Call applicability is determined by unifying P's left sides with g′. Note that g′ is a term in TΣ′, so that any X occurring in g′ is a 0-ary defined operator. For unification purposes, g′ is thus variable-free, unification is a one-sided "matching", and the outcome of unification will be either "fail" or a substitution θ : Vars(pi) → TΣ′.

⁵ And not all rules for the corresponding operator, as assumed in the method of Section 3.

(3) If unification succeeds, the bindings given by θ will be added to G as new rewrite rules for the variables⁶ in the term rewriting rule pi →_P ti. Further, the new rules A →_G g[Ri] and Ri →_G ti will be added to G to update the result sets it generates (note the close analogy with Lemma 2.6).

(4) Suppose g′ cannot be matched with a P rule's left side due to the presence of an X or Ri. Then the X or Ri will be rewritten by applying G's rules, and the result will be checked for unifiability as above. This is appropriate by the intended interpretation of X and Ri (according to the definition of safety).

(5) Rewriting according to G is done just enough to allow unification. (This turns out to be essential to ensure termination, see Section 5.) The resulting rules are added to G (if new), and the process is repeated until G stabilizes.
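The heart of both versions is matching a left side against a term that may contain nonterminals, expanding them only on demand. A rough Haskell rendering, reusing GT and Grammar from the sketch in Section 2.3 (the pattern type Pat and the explicit expansion budget d are our simplifications; the budget crudely stands in for the minimality condition, which instead stops at the first definite match):

-- linear patterns: variables, or operators applied to subpatterns
data Pat = PV String | PC String [Pat]

-- all substitutions θ (as association lists) such that the grammar can
-- rewrite t, within d nonterminal expansions, into an instance θp;
-- an operator clash simply yields no substitutions
matchG :: Grammar -> Int -> Pat -> GT -> [[(String, GT)]]
matchG _ _ (PV x) t = [[(x, t)]]     -- bind; t may still contain nonterminals
matchG g d p (NT a)                  -- expand a nonterminal on demand
  | d == 0    = []
  | otherwise = [s | (a', t) <- g, a' == a, s <- matchG g (d - 1) p t]
matchG g d (PC op ps) (Con op' ts)
  | op == op' && length ps == length ts =
      map concat (sequence (zipWith (matchG g d) ps ts))
  | otherwise = []                   -- operator clash: fail

Left-linearity again means the per-argument substitutions can simply be concatenated.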

4.3. Formal version: Construction of G from P

G = the smallest set of rules such that G ⊇ G0 and

G ⊇ ExtP(A →_G g[g′]) for every rule A →_G g[g′] ∈ G with g′ a call,

where

G0 = {R0 →_G … (input rules of Section 4.1)}

and

ExtP(A →_G g[g′]) = { A →_G g[Ri], Ri →_G ti, and X →_G θX for all X ∈ Vars(ti)
                      | pi →_P ti, g′ ⇒ⁿ_G θpi, and ¬∃θ′ (g′ ⇒ⁿ⁻¹_G θ′pi ⇒_G θpi) }.

Remarks

(1) As in the informal algorithm, rules are added to G for each call A →_G g[g′] already in G, and this process is iterated until G stabilizes.

(2) The definition of ExtP(A →_G g[g′]) formalizes the idea that g′ is rewritten until a definite match with a left side of a P rule is obtained (the condition g′ ⇒ⁿ_G θpi). The condition ¬∃θ′ (g′ ⇒ⁿ⁻¹_G θ′pi ⇒_G θpi) expresses minimality: rewriting is done only just enough for a definite match to occur.

4.4. Example: Lazy evaluation

The given program is:

g(N) →_P first(N, sequence(nil))
first(nil, Xs) →_P nil
first((1 : M), (X : Xs)) →_P X : first(M, Xs)
sequence(Y) →_P Y : sequence(1 : Y).

The initial rules in G are:

R0 →_G g(Ω)             (initial call)
Ω →_G nil | Atom : Ω    (initial data)
Atom →_G 0 | 1 | ….

Now apply ExtP to R0 →_G [g(Ω)] (where [...] indicates an empty context around ... and not a list). This yields (since g(Ω) can be matched with p1 →_P t1):

⁶ Since Vars(pi) ⊇ Vars(ti) it is actually sufficient to add rules corresponding to the variables in ti.

R0 →_G R1
R1 →_G first(N, sequence(nil))
N →_G Ω

ExtP applied to R1 →_G first(N, [sequence(nil)]) adds rules:

R1 →_G first(N, R4)
R4 →_G Y : sequence(1 : Y)
Y →_G nil

and then ExtP applied to R4 →_G Y : [sequence(1 : Y)] adds:

R4 →_G Y : R4
Y →_G 1 : Y

ExtP(R1 →_G [first(N, R4)]) now adds (after rewriting N to nil or 1 : Ω, and R4 to Y : R4 or Y : sequence(1 : Y)) the following new rules:

R1 →_G R2 | R3
R2 →_G nil
R3 →_G X : first(M, Xs)
M →_G Ω
X →_G Y
Xs →_G R4 | sequence(1 : Y)

and finally ExtP(R3 →_G X : [first(M, Xs)]) adds rules:

R3 →_G X : R2 | X : R3

G has now stabilized. If we now restrict G to those rules that yield terms containing only ground constructor terms of P, the grammar can be simplified considerably:

R0 →_G nil | R3
R3 →_G Y : nil | Y : R3
Y →_G nil | 1 : Y.

5. Termination and correctness

Let ∂P denote our extension operation: for a term rewriting system P and a tree grammar G, define

∂P(G) = ⋃_{A →_G g[g′] ∈ G} ExtP(A →_G g[g′]).

For a tree grammar G0 (generating the input) we construct the new tree grammar G as the least set H of rules satisfying

H ⊇ G0 ∪ ∂P(H).

5.1. Termination

Formally, this construction amounts to a denumerably infinite process: for k = 0, 1, … define Gk+1 = Gk ∪ ∂P(Gk). Then G0 ⊆ G1 ⊆ ⋯, and G = ⋃_{k=0}^∞ Gk, where the limit satisfies the defining formula with equality: G = G0 ∪ ∂P(G).
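The iteration Gk+1 = Gk ∪ ∂P(Gk) is ordinary Kleene iteration and can be coded directly. A sketch in Haskell, assuming some rule type with an Ord instance and a function dP implementing ∂P (both left abstract here):

import qualified Data.Set as Set

-- iterate G_{k+1} = G_k ∪ ∂P(G_k) until it stabilizes; by Theorem 5.1
-- below this terminates whenever G0 and P are finite
construct :: Ord rule
          => (Set.Set rule -> Set.Set rule)   -- dP, implementing ∂P
          -> Set.Set rule                     -- G0, the input rules
          -> Set.Set rule                     -- the limit G = G0 ∪ ∂P(G)
construct dP g0 = go g0
  where
    go g | g' == g   = g
         | otherwise = go g'
      where g' = Set.union g (dP g)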

Fortunately, this process is actually finite:

Theorem 5.1. If G0 and P are finite, then so is G.

Proof. Let a variant of a term be the result of replacing zero or more of its subterms by nonterminals. We show that every G rule must be a variant of a subterm of the right side of a G0 rule or a P rule.

The ExtP-construction forms new rules of three kinds: A →_G g[Ri], Ri →_G ti and X →_G θX. Obviously, the right hand side g[Ri] of the first kind of rule is a variant (formed by replacing exactly one subterm by a nonterminal) of the right hand side of the generating rule A →_G g[g′].

The second kind of rule copies its right hand side from one of the rules of the original term rewriting system.

For the third kind of rule, the minimality requirement on the derivation g′ ⇒*_G θpi ensures that the new right hand side θX will actually be a subterm of the right hand side of an existing G-rule. This can be seen as follows. Assume g′ ⇒ⁿ_G θpi and X ∈ Vars(ti) ⊆ Vars(pi). This means that θpi may be written g1[θX] for some context g1[]. Since θX is part of a term generated by A, if θX is not itself contained in the right hand side of some rule A′ →_G w then it must contain such a right hand side. In other words, θX would have the form g2[w] for a nonempty context g2[], and the derivation g′ ⇒ⁿ_G θpi could be seen as g′ ⇒ⁿ⁻¹_G g1[g2[A′]] ⇒_G g1[g2[w]] = θpi. Since X only occurs once in pi, we could define θ′ as the modification of θ mapping X to g2[A′], so that g′ ⇒ⁿ⁻¹_G g1[g2[A′]] = θ′pi, thus violating the last requirement in the definition of ExtP.

Note that the set of variants of subterms is closed under the formation of variants and the extraction of subterms. During the construction of G, the right hand sides of the added rules will therefore always be variants of subterms of right hand sides of G0 and P rules. If G0 and P are finite, the sets of nonterminals and of right hand sides are also finite, so that G becomes finite.

5.2. Safety

Lemma 5.2. For a tree grammar H subjected to the ∂P-construction, if A ⇒*_H g[θpi], then A ⇒*_{H∪∂P(H)} g[Ri], Ri →_{∂P(H)} ti, and X ⇒*_{H∪∂P(H)} θX for all X ∈ Vars(ti).

Proof. The derivation A ⇒*_H g[θpi] must use a rule A′ →_H h′[h″] where h′[] ⇒*_H g′[], h″ ⇒*_H θpi and g[] = h[g′[]]. In other words, the derivation may be reordered as

A ⇒*_H h[A′] ⇒_H h[h′[h″]] ⇒*_H h[g′[h″]] = g[h″] ⇒*_H g[θpi].

For the derivation h″ ⇒*_H θpi there must be some n and θ′ such that h″ ⇒ⁿ_H θ′pi ⇒*_H θpi, but ¬∃θ″ (h″ ⇒ⁿ⁻¹_H θ″pi ⇒_H θ′pi). ExtP(A′ →_H h′[h″]) will therefore add

A′ →_{∂P(H)} h′[Ri],
Ri →_{∂P(H)} ti, and
X →_{∂P(H)} θ′X for all X ∈ Vars(ti)

and we obtain a derivation

A ⇒*_H h[A′] ⇒_{∂P(H)} h[h′[Ri]] ⇒*_H h[g′[Ri]] = g[Ri]

as desired. The fact that θ′pi ⇒*_H θpi means that there must exist derivations θ′X ⇒*_H θX for all X ∈ Vars(ti), so that X →_{∂P(H)} θ′X ⇒*_H θX for all X ∈ Vars(ti) ⊆ Vars(pi).

Lemma 5.3. If A ⇒*_{G0} w and w ⇒*_P g[θpi], then A ⇒*_G g[Ri], Ri →_G ti, and X ⇒*_G θX for all X ∈ Vars(ti).

Proof. Let w ⇒ⁿ_P g[θpi]; the proof is by induction on n. Recall that G = G0 ∪ ∂P(G).

For n = 0, employ Lemma 5.2.

For n > 0, assume the claim for n − 1. By induction, from A ⇒*_{G0} w ⇒ⁿ⁻¹_P g′[θ′pj] ⇒_P g′[θ′tj] = g[θpi] we therefore get A ⇒*_G g′[Rj], Rj →_G tj and X ⇒*_G θ′X for all X ∈ Vars(tj), so that the derivation may be continued g′[Rj] ⇒_G g′[tj] ⇒*_G g′[θ′tj] = g[θpi]. Applying Lemma 5.2 now proves the lemma.

Theorem 5.4. If A ⇒*_{G0} w and w ⇒*_P g, then A ⇒*_G g.

Proof. Assume w ⇒ⁿ_P g. For n = 0 the claim is obvious, using G ⊇ G0.

Otherwise the last step in the derivation is g′[θpi] ⇒_P g′[θti] for some context g′[], index i and substitution θ. The derivation A ⇒*_G g′[Ri] obtained from Lemma 5.3, which also ensures Ri →_G ti and X ⇒*_G θX for all X ∈ Vars(ti), may therefore be continued g′[Ri] ⇒_G g′[ti] ⇒*_G g′[θti] = g, proving the theorem.

6. Conclusions and directions for future development

An algorithm for the flow analysis of applicative programs, using term rewriting systems as a program formalism, has been presented, and a proof was given that the method terminates with a finite grammar, and that it yields safe approximations to program behavior. It has been seen that the method is more powerful than earlier methods in that it gives safe approximations to lazy programs (outside-in evaluation order) and to programs using higher-order functions. Several extensions are natural and desirable for practical applications (such as optimizing compilers for applicative languages or the MIX system), including the following:

(1) Outside-in and inside-out rewriting: Languages as implemented mostly use either a strictly inside-out rewriting order (call-by-value, as in LISP), or strictly outside-in (lazy evaluation), while G as constructed above models all possible reduction sequences. This may be seen in the algorithm, which adds new variable bindings for all possible calls A →_G g[g′]. An outside-in rewriting order is easily modeled by changing the algorithm to call ExtP(A →_G g[g′]) only when [g′] is outermost, and the analogous trick models inside-out rewriting: ExtP(A →_G g[g′]) is called just in case [g′] is innermost. The resulting grammars would then give more precise safe descriptions, with smaller sets of computational configurations.

(2) Atomic values: Grammar G in essence traces the control flow of the program being analyzed. It assumes that all data values are terms, and does not account for programs manipulating atomic values such as numbers or booleans. On the other hand, traditional flow analysis as used in optimizing compilers (e.g., constant propagation, see [12]) is mainly concerned with atomic values, and makes rather conservative assumptions about flow of control. This type of flow analysis uses a lattice or cpo of abstract values to describe sets of atomic values; for example pos, neg and num could describe nonempty sets containing only positive values, only negative values, and arbitrary values, respectively. It seems clear that the analysis could be extended to handle atomic values as follows. Suppose that certain arguments of the defined operators or constructors are designated as being atomic, for example that the first argument to cons(x, y) will be numeric. Then abstraction may be performed when adding rules to G, so that if G already contains a rule A →_G cons(pos, B), and a new rule A →_G cons(neg, B) is to be added by ExtP, then the original rule is replaced by the least common abstraction of both: A →_G cons(num, B). (A minimal sketch of such an abstract-value lattice appears after this list.)

(3) More general result symbols: Turchin's "basic configurations" resemble a more general version of the patterns used in Section 4.3, and are used to perform program transformation in [33]. The MIX system uses an essentially similar idea to classify function arguments for the purpose of partial evaluation. It would be desirable to put these various ideas in a more general framework.

(4) Strictness analysis: It would be interesting to see whether our algorithm could be modified to yield a grammar describing the way the program accesses the substructures of its data, and to determine those substructures which are required to run the program. For example, "append" requires all tails of its first argument but no heads, and does not examine its second argument. This topic has been addressed in [15], [33] and [35] for the case of first-order programs.
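For illustration, the least-common-abstraction step of point (2) needs nothing more than a join operation on a small lattice; a minimal Haskell sketch (our own illustration, not a component of the paper's algorithm):

-- abstract descriptions of sets of numbers, with pos, neg ⊑ num
data AbsNum = Pos | Neg | Num deriving (Eq, Show)

-- least upper bound: when a new rule A -> cons(neg, B) meets an existing
-- A -> cons(pos, B), both are replaced using lub, giving A -> cons(num, B)
lub :: AbsNum -> AbsNum -> AbsNum
lub a b | a == b    = a
        | otherwise = Num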

Acknowledgements

Many people read early drafts of this paper, and their comments helped considerably in clarifying its ideas and tightening it up: Olivier Danvy, Carsten Kehler-Holst, Thomas Johnsson, Torben Mogensen, Alberto Pettorossi, Jakob Grue Simonsen, Harald Søndergaard and Valentin Turchin. Special thanks are due to Klaus Grue, Phil Wadler and the editors of [2] where an earlier version of this paper appeared. We would also like to thank the anonymous referees of this Festschrift for valuable comments.

References

[1] A. Abel, Termination checking with types, in: Fixed Points in Computer Science (FICS'03), RAIRO — Theoretical Informatics and Applications 38 (4) (2004) 277–319 (special issue).
[2] S. Abramsky, C. Hankin, Abstract Interpretation of Declarative Languages, Ellis Horwood, 1987.
[3] A. Aho, Currents in the Theory of Computing, Prentice-Hall, 1973.
[4] W.S. Brainerd, Tree generating regular systems, Information and Control 13 (1969) 217–231.
[5] G.L. Burn, C.L. Hankin, S. Abramsky, The theory of strictness analysis for higher-order functions, in: [9], 1986, pp. 42–62.
[6] P. Cousot, Semantic foundations of program analysis, in: [25], 1981, pp. 303–342.
[7] P. Cousot, R. Cousot, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, in: POPL '77: 4th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1977, pp. 238–252.
[8] E. Dijkstra, A Discipline of Programming, Prentice-Hall, 1976.
[9] H. Ganzinger, N. Jones (Eds.), Programs as Data Objects, in: Lecture Notes in Computer Science, vol. 217, Springer-Verlag, 1986.
[10] F. Gécseg, M. Steinby, Tree Automata, Akadémiai Kiadó, Budapest, 1984.
[11] J. Giesl, R. Thiemann, P. Schneider-Kamp, Proving and disproving termination of higher-order functions, Technical report, RWTH Aachen, 2005.
[12] M. Hecht, Flow Analysis of Computer Programs, North-Holland, 1977.
[13] N. Heintze, Set based program analysis, Ph.D. Thesis, Carnegie-Mellon Univ., Pittsburgh, PA, 1992.
[14] G. Huet, J.-J. Lévy, Call by need computations in nonambiguous linear term rewriting systems, Technical report, INRIA, France, 1979.
[15] J. Hughes, Strictness detection in non-flat domains, in: [9], 1986, pp. 112–135.
[16] S. Jagannathan, S. Weeks, A unified treatment of flow analysis in higher-order languages, in: POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1995, pp. 393–407.
[17] T. Johnsson, Lambda lifting: Transforming programs to recursive equations, in: Proceedings IFIP Symposium on Functional Programming Languages and Computer Architecture, in: Lecture Notes in Computer Science, vol. 201, 1985.
[18] N.D. Jones, Flow analysis of lambda expressions, in: Proceedings of ICALP 1981, in: Lecture Notes in Computer Science, vol. 115, 1981.
[19] N.D. Jones, Flow analysis of lazy higher-order functional programs, in: [2], 1987, pp. 103–122.
[20] N.D. Jones, S.S. Muchnick, Flow analysis and optimisation of LISP-like structures, in: POPL '79: 6th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1979, pp. 244–256.
[21] N.D. Jones, A. Mycroft, Data flow analysis of applicative programs using minimal function graphs, in: POPL '86: 13th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1986, pp. 296–306.
[22] N.D. Jones, M. Rosendahl, Higher-order minimal function graphs, Journal of Functional and Logic Programming 2 (1997).
[23] N.D. Jones, P. Sestoft, H. Søndergaard, An experiment in partial evaluation: The generation of a compiler generator, in: J.-P. Jouannaud (Ed.), Rewriting Techniques and Applications, in: Lecture Notes in Computer Science, vol. 202, 1985.
[24] S. Peyton Jones, Haskell 98 Language and Libraries, Cambridge University Press, 2003.
[25] S.S. Muchnick, N.D. Jones (Eds.), Program Flow Analysis: Theory and Applications, Prentice-Hall, 1981.
[26] A. Mycroft, N.D. Jones, A relational framework for abstract interpretation, in: [9], 1986, pp. 156–171.
[27] J. Reynolds, Automatic computation of data set definitions, Information Processing 68 (1969) 456–461.
[28] J. Reynolds, Definitional interpreters for higher-order programming languages, in: Proceedings of the ACM Annual Conference, ACM Press, 1972, pp. 717–740.
[29] O. Shivers, Control-flow analysis of higher-order languages, Ph.D. Thesis, Carnegie-Mellon Univ., Pittsburgh, PA, USA, 1991.
[30] D. Stefanescu, Y. Zhou, An equational framework for the flow analysis of higher order functional programs, in: ACM Symposium on Lisp and Functional Programming, 1994, pp. 318–327.
[31] J. Thatcher, Tree automata: An informal survey, in: [3], 1973.
[32] Y. Toyama, Termination of S-expression rewriting systems: Lexicographic path ordering for higher-order terms, in: Proceedings of the 15th International Conference on Rewriting Techniques and Applications (RTA 2004), in: Lecture Notes in Computer Science, vol. 3091, 2004, pp. 40–54.
[33] V. Turchin, The language REFAL, the theory of compilation, and metasystem analysis, Technical report, Courant Institute, New York, 1980.
[34] P. Wadler, An introduction to Orwell, Technical report, Programming Research Group, Oxford University, 1985.
[35] P. Wadler, Strictness analysis on non-flat domains by abstract interpretation over finite domains, in: [2], 1987, pp. 266–275 (Chapter 12).
[36] H. Xi, Dependent types in practical programming, Ph.D. Thesis, Carnegie-Mellon Univ., Pittsburgh, PA, USA, 1998.