Control Flow Analysis in Scheme

Olin Shivers Carnegie Mellon University [email protected]

Abstract

Traditional flow analysis techniques, such as the ones typically employed by optimising Fortran compilers, do not work for Scheme-like languages. This paper presents a flow analysis technique, control flow analysis, which is applicable to Scheme-like languages. As a demonstration application, the information gathered by control flow analysis is used to perform a traditional flow analysis problem, induction variable elimination. Extensions and limitations are discussed.

The techniques presented in this paper are backed up by working code. They are applicable not only to Scheme, but also to related languages, such as Common Lisp and ML.

To appear at the ACM SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, Georgia, June 22-24, 1988.

1 The Task

Flow analysis is a traditional optimising compiler technique for determining useful information about a program at compile time. Flow analysis determines path invariant facts about points in a program. A flow analysis problem is a question of the form:

    "What is true at a given point p in my program, independent of the execution path taken to p from the start of the program?"

Example domains of interest might be the following:

- Range analysis: What is the range of values that a given reference to an integer variable is constrained to lie within? Range analysis can be used, for instance, to do array bounds checking at compile time.

- Loop invariant detection: Do all possible prior assignments to a given variable reference lie outside its containing loop?

Over the last thirty years, standard techniques have been developed to answer these questions for the standard imperative, Algol-like languages (e.g., Pascal, C, Ada, Bliss, and chiefly Fortran). Representative texts describing these techniques are [Dragon], and in more detail, [Hecht]. Flow analysis is perhaps the chief tool in the optimising compiler writer's bag of tricks; an incomplete list of the problems that can be addressed with flow analysis includes global common subexpression elimination, loop invariant detection, redundant assignment detection, constant propagation, range analysis, code hoisting, induction variable elimination, copy propagation, live variable analysis, and loop jamming.

However, these traditional flow analysis techniques have never successfully been applied to the Lisp family of computer languages. This is a curious omission. The Lisp community has had sufficient time to consider the problem. Flow analysis dates back at least to 1960 ([Dragon], pp. 516), and Lisp is one of the oldest computer programming languages currently in use, rivalled only by Fortran and COBOL.

Indeed, the Lisp community has long been concerned with the execution speed of their programs. Typical Lisp programs, such as large AI systems, are both interactive and cycle-intensive. AI researchers often find their research efforts frustrated by the necessity of waiting several hours for their enormous Lisp-based production system or back-propagation network to produce a single run. Since Lisp users are willing to pay premium prices for special computer architectures specially designed to execute Lisp rapidly, we can safely assume they are even more willing to consider compiler techniques that apply generally to target machines of any nature.

Finally, the problems addressed by flow analysis are relevant to Lisp. None of the flow analysis problems listed above are restricted to arithmetic computation; they apply just as well to symbolic processing. Furthermore, Lisp opens up new domains of interest. For example, Lisp permits weak typing and run time type checking. Type information is not statically apparent, as it is in the Algol family of languages. Type information is nonetheless extremely important to the efficient execution of Lisp programs, both to remove run time safety checks of function arguments, and to open code functions that operate on a variety of argument types. Thus flow analysis offers a tempting opportunity to perform type inference from the occurrence of calls to type predicates in Lisp programs.

So it is clear that flow analysis has much potential with respect to compiling Lisp programs. Unfortunately, this potential has not been realised because Lisp is sufficiently different from the Algol family of languages that the traditional techniques



developed for them are not applicable.

Dialects of Lisp can be contrasted with the Algol family in the following ways:

- Binding versus assignment: Both classes of language have the same two mechanisms for associating values with variables: parameter binding and variable assignment. However, there are differences in frequency of usage. Algol-family languages tend to encourage the use of assignment statements; Lisp languages tend to encourage binding.

- Functions as first class citizens: Functions in modern Lisps are data that can be passed as arguments to procedures, returned as values from function calls, stored into arrays, etc.

Since traditional flow analysis techniques tend to concentrate on tracking assignment statements, it's clear that the Lisp emphasis on variable binding changes the complexion of the problem. Generally, however, variable binding is a simpler, weaker operation than the extremely powerful operation of side effecting assignments. Analysis of binding is a more tractable problem than analysis of assignments, because the semantics of binding are simpler, and there are more invariants for program-analysis tools to invoke. In particular, the invariants of the λ-calculus are usually applicable to modern Lisps.

On the other hand, the higher-order, first-class nature of Lisp functions can hinder efforts even to derive basic flow analysis information from Lisp programs. I claim it is this aspect of Lisp which is chiefly responsible for the mysterious absence of flow analysis from Lisp compilers to date. A brief discussion of traditional flow analytic techniques will show why this is so.

2 The Problem

Consider the following piece of Pascal code:

    FOR i := 0 TO 30 DO BEGIN
        s := a[i];
        IF s < 0 THEN
            a[i] := (s+4)^2
        ELSE
            a[i] := cos(s+4);
        b[i] := s+4;
    END

Flow analysis requires construction of a control flow graph for the code fragment (fig. 1).

[Figure 1: Control flow graph. Basic blocks: START; i:=0; the loop test i<=30?, exiting to STOP; s:=a[i]; the test s<0?; the two arms a[i]:=(s+4)^2 and a[i]:=cos(s+4); and the dashed block b[i]:=s+4; i:=i+1, which loops back to the test.]

Every vertex in the graph represents a basic block of code: a sequence of instructions such that the only branches into the block are branches to the beginning of the block, and the only branches from the block occur at the end of the block. The edges in the graph represent possible transfers of control between basic blocks. Having constructed the control flow graph, we can use graph algorithms to determine path invariant facts about the vertices.

In this example, for instance, we can determine that on all control paths from START to the dashed block (b[i]:=s+4; i:=i+1), the expression s+4 is evaluated with no subsequent assignments to s. Hence, by caching the result of s+4 in a temporary, we can eliminate the redundant addition in b[i]:=s+4. This information is arrived at through consideration of the paths through the control flow graph.

The problem with Lisp is that there is no static control flow graph at compile time. Consider the following fragment of Scheme code:

    (let ((f (foo 7 g k))
          (h (aref a7 i j)))
      (if (< i j) (h 30) (f h)))

Consider the control flow of the if expression. After evaluating the conditional's predicate, control can transfer either to the function that is the value of h, or to the function that is the value of f. But what's the value of f? What's the value of h? Unhappily, they are computed at run time.

If we knew all the functions that h and f could possibly be bound to, independent of program execution, we could build a control flow graph for the code fragment. So, if we wish a control flow graph for a piece of Scheme code, we need to answer the following question: for every function call in the program, what are the possible lambda expressions that call could be a jump to? But this is a flow analysis question! So with regard to flow analysis in Lisp, we are faced with the following unfortunate situation:

- In order to do flow analysis, we need a control flow graph.

- In order to determine control flow graphs, we need to do flow analysis.

Oops.

3 CPS: The Hedgehog's Representation

    "The fox knows many things, but the hedgehog knows one great thing."
        -- Archilochus

The first step towards finding a solution to this conundrum is to develop a representation for our Lisp programs suitably adapted to the task at hand. In this section, we will develop an intermediate representation language, CPS Lisp, which is well suited for doing flow analysis and representing Lisp programs. We will handle the full syntax of Lisp, but at one remove:

"standard" Lisp is mapped into a much simpler, restricted subset of Lisp, which has the effect of greatly simplifying the analysis.

In Lisp, we must represent and deal with transfers of control caused by function calls. This is most important in the Scheme dialects [R3-Report], where lambda expressions occur with extreme frequency. In the interests of simplicity, then, we adopt a representation where all transfers of control (sequencing, looping, function call/return, conditional branching) are represented with the same mechanism: the tail recursive function call. This representation is called CPS, or Continuation Passing Style, and is treated at length in [Declarative].

CPS stands in contrast to the intermediate representation languages commonly chosen for traditional optimising compilers. These languages are conventionally some form of slightly cleaned-up assembly language: quads, three-address code, or triples. The disadvantage of such representations is their ad hoc, machine-specific, and low-level semantics. The alternative of CPS was first proposed by Steele in [Rabbit], and further explored by Kranz, et al. in [ORBIT]. The advantages of CPS lie in its appeal to the formal semantics of the λ-calculus, and its representational simplicity.

CPS conversion can be referred to as the "hedgehog" approach, after the quotation from Archilochus. All control and environment structures are represented in CPS by lambda expressions and their application. After CPS conversion, the compiler need only know "one great thing": how to compile lambdas very well. This approach has an interesting effect on the pragmatics of Scheme compilers. Basing the compiler on lambda expressions makes lambda expressions very cheap, and encourages the programmer to use them explicitly in his code. Since lambda is a very powerful construct, this is a considerable boon for the programmer.

In CPS, the only expressions that can appear as arguments to a function call are constants, variables, and lambda expressions. Standard non-CPS Lisp can be easily transformed into an equivalent CPS program, so this representation carries no loss of generality. Once committed to CPS, we make further simplifications:

- No special syntactic forms for conditional branch. The semantics of primitive conditional branch is captured by the primitive function %if, which takes one boolean argument, and two continuation arguments. If the boolean is true, the first continuation is called; otherwise, the second is called.

- No special labels or letrec syntax for mutual recursion. Mutual recursion is captured by the primitive function Y, which is the "paradoxical combinator" of the λ-calculus.

- Side effects to variables are not allowed. We allow side effects to data structures only. Hence side effects are handled by primitive functions, and special syntax is not required.

Lisp code violating any of these restrictions is easily mapped into equivalent Lisp code preserving them, so they carry no loss of generality. In point of fact, the front end of the ORBIT compiler [ORBIT] performs the transformation of standard Scheme code into the above representation as the first phase of compilation.

These restrictions leave us with a fairly simple language to deal with. There are only five syntactic entities: lambdas, variables, primops, constants and calls (fig. 2), and two basic semantic operations: abstraction and application.

In CPS, function calls are tail recursive. That is, they do

not return; they are like GOTOs. If we are interested in the value computed by a function f of two values, we must pass f a third argument, the continuation. The continuation is a function; after f computes its value v, instead of "returning" the value v, it calls the continuation on v. Thus the continuation represents the control point which will be transferred to after the execution of f.

For example, if we wish to print the value computed by (x + y) * (z + w), we do not write:

    (print (* (+ x y) (+ z w)))

Instead, we write:

    (+ x y (lambda (xy)
      (+ z w (lambda (zw)
        (* xy zw print)))))

Here, the primitive operations +, *, and print are all redefined to take a continuation as an extra argument. The first + function calls its third argument, (lambda (xy) ...), on the sum of its first two arguments, x and y. Likewise, the second + function calls its third argument, (lambda (zw) ...), on the sum of its first two arguments, z and w. And the * function calls its third argument, the print function, on the product of its first two arguments, x + y and z + w. Continuation passing style has the following invariant: control never returns to the point of a call; it always moves forward, into a continuation.

    Figure 2: CPS language grammar

    - Lambda expressions: (lambda var-set call)
    - Variable references: foo, bar, ...
    - Constants: 3, "doghen", '(a 3 elt list), ...
    - Primitive Operations: +, %if, Y, ...
    - Function calls: (fun arg1 ... argn), where fun is a lambda, var, or primop, and the args are lambdas, vars, or constants.

In detail:

- Lambda Expressions: A lambda expression has the syntax (lambda var-set call), where var-set is a list of variables, e.g. (n m a), and call is a function call. A lambda expression denotes a function.

- Function Calls: A function call has the syntax (fun arg1 ... argn) for n >= 0. fun must be a function, and the argi must be args.
  - Function: A function is a lambda expression, a variable reference, or a primop.
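The arithmetic example above can be transcribed into Python closures to make the CPS discipline concrete. This is a sketch, not the paper's code; add, mul, and pr are stand-in names for the CPS +, *, and print primops:

```python
# CPS '+': compute the sum, then hand it to the continuation k.
def add(x, y, k):
    k(x + y)

# CPS '*': compute the product, then hand it to the continuation k.
def mul(x, y, k):
    k(x * y)

results = []

def pr(v):                       # stand-in for the 'print' continuation
    results.append(v)

# (+ x y (lambda (xy) (+ z w (lambda (zw) (* xy zw print)))))
def example(x, y, z, w):
    add(x, y, lambda xy:
        add(z, w, lambda zw:
            mul(xy, zw, pr)))

example(1, 2, 3, 4)              # (1 + 2) * (3 + 4)
print(results[0])                # -> 21
```

Note that no call here returns a useful value; every intermediate result travels forward into a continuation, exactly as in the CPS fragment above.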

  - Args: An argument is a lambda expression, a variable reference, or a constant.

- Primitive Operations: A primop denotes some primitive function, and is given by a predefined identifier, e.g., +, %if.

- Variables: Variables have the standard identifier syntax.

- Constants: Constants have no functional interpretation, and so have syntax and semantics of little relevance to our analysis. I use integers, represented by base 10 numerals, in my examples.

Note that this syntax specification relegates primops to second class status: they may only be called, not used as data, or passed around as arguments. This leads to no loss of generality, since everywhere one would like to use the + primop as data, for instance, one can instead use an equivalent lambda expression: (lambda (a b c) (+ a b c)).

The reason behind splitting out primops as second class is fairly operational: we view primops as being small, open-codeable types of objects. Calls to primops are where computation actually gets done. There are other formulations possible with first-class primops, with corresponding flow analytic solutions. Relegating primops to second class status simplifies the presentation of the analysis technique presented in this paper.

Note also that the definition of CPS Lisp implies that the only possible body a lambda expression can have is a function call. This is directly reflected in our syntax specification.

A restatement of the syntactic invariants in our representation:

- A lambda's body consists of a single function call.

- A function call's arguments may only be lambdas, variables, or constants. That is, nested calls of the form (f (g a) (h b)) are not allowed.

Some Primops:

%if, test: As discussed above, %if is a primop taking three arguments: a boolean, and two continuations. If the boolean is true, the first continuation is called, otherwise the second continuation is called. %if can have specialised test forms that take a non-boolean first argument, and perform some test on it, e.g. (test-zero? x f g) calls f if x is zero, otherwise g. (test-nil? y h k) calls h if y is nil, otherwise k.

Y: Y is the CPS version of the λ-calculus "paradoxical combinator" fixpoint function. The CPS definition is tricky, and is best arrived at in stages. Consider the following use of the non-CPS fixpoint operator:

    (Y (lambda (f)
         (lambda (n)
           (if (zero? n) 1
               (* n (f (- n 1)))))))

Y returns the fixpoint of its argument, i.e. that function f' such that (lambda (f) ...) applied to f' yields f'. However, if we convert (lambda (f) ...) to CPS notation, we get:

    (lambda (f k)
      (k (lambda (n c)
           (test-zero? n
             (lambda () (c 1))
             (lambda ()
               (- n 1 (lambda (n1)
                        (f n1 (lambda (a)
                                (* a n c))))))))))

Note that our functional doesn't return a value (since no CPS function returns a value). Instead, it calls its continuation k on its computed value. So the "fixpoint" of our new, CPS functional is that function f' such that (lambda (f k) ...) applied to f' and some continuation c, calls c on f'. That is, calling the continuation on f' is equivalent to returning f'. We can generalise our notion of "fixpoint" to include groups of mutually recursive functions by allowing our functional to take more than one non-continuation argument, i.e. a "fixpoint" of some CPS function (lambda (f g h c) ...) is a collection of three functions, f', g', and h', such that if (lambda (f g h c) ...) is applied to f', g', h', and some continuation k, then f', g', and h' will be passed to the continuation k.

So our CPS version of the Y combinator looks like:

    (Y (lambda (f g h k) (k f-definition
                            g-definition
                            h-definition))
       c)

The result of this example is to call continuation c on three values, the fixpoints f', g', and h' of (lambda (f g h k) ...).

Figure 3 shows two complete examples using the CPS Y operator.

4 Control Flow Analysis: A Technique

Our intermediate representation defined, we can now proceed to develop a technique for deriving the control flow information present in a Scheme program. The solution to the dilemma of section 2 is to use a flow technique which will "bootstrap" the control flow graph into being. As is typical in flow analysis problems, our solution may err, as long as it errs conservatively. That is, it may introduce spurious edges into the control flow graph, but may not leave out edges representing transfers of control that could actually occur during execution of the program.

The intuition is that once we determine the control flow graph for the Lisp function, we can then use it to do other, standard data flow analyses. We refer to the problem of determining the control flow graph for the purposes of subsequent data flow analysis as control flow analysis. (This intuition will only be partially borne out. The limitations of control flow analysis will be discussed in section 8.)

(a) Recursive factorial:

    (lambda (n k)                   ; Call K on N!
      ;; Y calls continuation (LAMBDA (G) ...) on the
      ;; fixpoint of (LAMBDA (F C) ...).
      (y (lambda (f c)
           (c (lambda (m k1)        ; (LAMBDA (M K1) ...) is the factorial code.
                (test-zero? m
                  (lambda () (k1 1))
                  (lambda ()
                    (- m 1 (lambda (m1)
                             (f m1 (lambda (a)
                                     (* a m k1))))))))))
         (lambda (g) (g n k))))

(b) Iterative factorial:

    (lambda (n k)                   ; Call K on N!
      (y (lambda (f c)
           (c (lambda (m a k1)
                (test-zero? m
                  (lambda () (k1 a))
                  (lambda ()
                    (* a m (lambda (a1)
                             (- m 1 (lambda (m1)
                                      (f m1 a1 k1))))))))))
         (lambda (g) (g n 1 k))))

Figure 3: Two factorial functions using the CPS Y operator

As an initial approximation, let's discuss control flow analysis for a Scheme that does not allow side effects, even side effects to data structures. This allows us to avoid worrying about functions that "escape" into data structures, to emerge who-knows-where. Later in the paper, we will patch our side-effectless solution up to include side effects.

The point of using CPS Lisp for an intermediate representation is that all transfers of control are represented by function call. Thus the control flow problem is defined by the following question:

    For each call site c in program P, what is the set L(c) of lambda expressions that c could be a call to? I.e., if there is a possible execution of P such that lambda l is called from call site c, then l must be an element of L(c).

The definition of the problem does not define a unique function L. The trivial solution is the function L(c) = AllLambdas, i.e. the conclusion that all lambdas can be reached from any call site. What we want is the tightest possible L. It is possible, in fact, to construct algorithms that compute the smallest L function for a given program. The catch lies in determining when to halt the algorithm. In general, then, we must be content with an approximation to the optimal solution.

Let ARGS be the set of arguments to calls in program P, and LAMBDAS be the set of lambda expressions in P. That is, ARGS is the set of variables and lambda expressions in P (we assume variable names are unique here, i.e. that variables have been alpha-converted to unique names). We define a function Defs : ARGS -> P(LAMBDAS). Defs(a) gives all the lambda expressions that a could possibly evaluate to. That is,

    Defs[(lambda ...)] = {(lambda ...)}
    Defs[primop]       = {primop}
    Defs[v]            = {l | v bound to lambda l}

L is trivially determined from Defs. In a call c:(f a1 ... an), L(c) is just Defs(f).

4.1 Handling Lambda

Defs can be given with a recursive definition: in a call c:(f ... ai ...), for all l ∈ Defs(f) of the form (lambda (... vi ...) ...),

    Defs(ai) ⊆ Defs(vi)    [LAM]

I.e. if lambda l can be called from call site c, then l's ith variable can evaluate to any of the lambdas that c's ith argument can evaluate to.

What we are doing here is willfully confusing closures with lambda expressions. A lambda expression is not a function; it must be paired with an environment to produce a function. When we speak of "calling a lambda expression l," we really mean: "calling some function f which is a closure of lambda expression l." An alternate view is that we are reducing a potentially infinite set of functions to a finite set by "folding" together the environments of all functions constructed from the same lambda expression. This issue will be dealt with in detail in a later section.

4.2 Handling Primops

It can be seen from the definition of Defs that we are flowing information about which lambdas are called from which call sites. But this is not the whole story. Not all function calls happen at call sites. Consider the fragment (+ a b (lambda (s) (f a s))). Where is (lambda (s) (f a s)) called from? It is called from the innards of the + primop; there is no corresponding call site in the program syntax to mark this. We need to endow primops with special internal call sites to mark calls to functions that happen internal to the primop. Different primops have different calling behavior with respect to their arguments: + calls its third argument only. %if calls its second and third arguments. Y has even more complex behavior. So we model each primop specially.

For each call to each primop c:(primop arg1 ...), we associate a related set of internal call sites ic^1, ..., ic^n for the primop's use. Most normal primops, e.g. +, have a single internal call site, which marks the call to the primop's continuation. %if has two internal call sites, one for the consequent continuation, and one for the alternate continuation. Let ic^j_{p,c} be the jth internal call site for the primop p called at call site c.

An ordinary primop (e.g. +, cons, aref) takes its single continuation as its last argument. Any lambda which that last argument could evaluate to, then, can be called from the internal call site of the primop. That is, in call c:(primop arg1 ... cont),

    Defs(cont) ⊆ L(ic^1_{primop,c})    [PRIM]

%if is a special primop, in that it takes two continuations as arguments, either of which can be branched to from inside the %if. %if has two internal call sites, the first for the consequent continuations, and the second for the alternate continuations. So there are two propagation conditions for an occurrence of %if, c:(%if pred cons alt). Any lambda the consequent continuation can evaluate to can be called from the first internal call site:

    Defs(cons) ⊆ L(ic^1_{%if,c})    [IF-1]

Any lambda the alternate continuation can evaluate to can be called from the second internal call site:

    Defs(alt) ⊆ L(ic^2_{%if,c})    [IF-2]

The propagation condition for the Y operator is more complicated. The Y primop is not available to the user; it is only introduced during CPS conversion of labels expressions. The CPS transformation for labels expressions turns

    (labels ((f f-lambda)
             (g g-lambda))
      body)

with continuation kont into:

    (Y (lambda (huaskt f g c)
         (c (lambda (k) body)
            f-lambda
            g-lambda))
       (lambda (bodyfun hunoz hukairz)
         (bodyfun kont)))

where hunoz, hukairz, and huaskt are unreferenced, ignored dummy variables.

Since Y is only introduced into the program text in a controlled fashion, we can make particular assumptions about the syntax of its arguments. In particular, we can assume that f-lambda and g-lambda are lambda expressions, and not variables. This restriction can be relaxed with the addition of a small amount of extra machinery.

We call the functional which is Y's second argument the fixpointer. Y's propagation condition has three parts. For an occurrence of the Y primop:

    c:(Y (lambda (v1 ... vn k) (k f1 ... fn)) cont)

1. Y calls the fixpointer from its internal continuation call site:

    (lambda (v1 ... vn k) ...) ∈ L(ic^1_{Y,c})    [Y-1]

2. Y's continuation is passed to the fixpointer as its continuation argument:

    Defs(cont) ⊆ Defs(k)    [Y-2]

3. Extract the args to the call to k, i.e. the definitions fi of the label functions vi, and for each lambda fi, pass fi in as the fixpointer's ith argument:

    fi ∈ Defs(vi)    [Y-3]

4.3 Handling External References

Besides call sites, lambda expressions and primops, there are two other special syntactic values we must represent in our analysis. We need a way to represent unknown functions that are passed into our program from the outside world at run time. In addition, we need a call site to correspond to calls to functions that happen external to the program text. For example, suppose we have the following fragment in our program:

    (foo (lambda (k) (k 7)))

If foo is free over the program, then in general we have no idea at compile time what functions foo could be bound to at run time. The call to foo is essentially an escape from the program to the outside world, and we must record this fact. We do this by including XLAMBDA, the external lambda, in Defs[foo]. Further, since (lambda (k) (k 7)) is passed out of the program to external routines, it can be called from outside the program. This is represented with a special call site, XCALL, the external call. We record that (lambda (k) (k 7)) is in L(XCALL).

Functions such as (lambda (k) (k 7)) in the above example that are passed to the external lambda have escaped to program text unavailable at compile time, and so can be called in arbitrary, unobservable ways. They have escaped close scrutiny, and in the face of limited information, we are forced simply to assume the worst. We maintain a set ESCAPED of escaped functions, which initially contains XLAMBDA and the top level lambda of the program. The rules for the external call, the external lambda and escaped functions are simple:

1. Any escaped function can be called from the external call:

    ESCAPED ⊆ L(XCALL)    [X-1]

2. Any escaped function may be applied to any escaped function: ∀l ∈ ESCAPED of the form (lambda (... vi ...) ...),

    ESCAPED ⊆ Defs(vi)    [X-2]

This provides very weak information, but for external functions whose properties are completely unknown at compile time, it is the best we can do. (On the other hand, many external functions, e.g. print, or length, have well known properties. For instance, both print and length only call their continuations. Thus, their continuations do not properly escape. Nor are their continuations passed escaped functions as arguments. It is straightforward to extend the analysis presented here to utilise this stronger information.)
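The Defs/L propagation of this section can be sketched as a fixpoint iteration. The code below is my own minimal illustration, not the paper's implementation: it handles ordinary calls and rule [LAM] only, omitting primops, Y, and the escape rules, over a tuple encoding of CPS terms that I have made up for the example:

```python
# Terms: ('lam', label, params, body-call) | ('ref', var) | ('call', fun, args)

def cfa(top_call):
    Defs = {}      # variable -> set of lambda labels it may evaluate to
    lams = {}      # lambda label -> lambda term
    L = {}         # call site (by object id) -> set of callable labels
    calls = []

    def index(term):
        if term[0] == 'lam':
            lams[term[1]] = term
            index(term[3])
        elif term[0] == 'call':
            calls.append(term)
            index(term[1])
            for a in term[2]:
                index(a)

    def adefs(term):               # Defs[] on atomic args and call heads
        if term[0] == 'lam':
            return {term[1]}
        return Defs.get(term[1], set())

    index(top_call)
    changed = True
    while changed:                 # iterate the rules to a fixpoint
        changed = False
        for c in calls:
            fset = adefs(c[1])
            site = L.setdefault(id(c), set())
            if not fset <= site:
                site |= fset
                changed = True
            for lab in fset:       # rule [LAM]: Defs(ai) subset Defs(vi)
                for v, a in zip(lams[lab][2], c[2]):
                    d = Defs.setdefault(v, set())
                    new = adefs(a)
                    if not new <= d:
                        d |= new
                        changed = True
    return Defs, L

# ((lambda (g) (g (lambda (x) (halt)))) (lambda (y) (halt)))
halt = ('call', ('ref', 'halt'), ())
prog = ('call',
        ('lam', 'l1', ('g',),
         ('call', ('ref', 'g'), (('lam', 'l3', ('x',), halt),))),
        (('lam', 'l2', ('y',), halt),))
Defs, L = cfa(prog)
print(Defs['g'], Defs['y'])        # -> {'l2'} {'l3'}
```

The analysis discovers that g names l2, so the inner call site can only jump to l2, which in turn binds y to l3.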

5 Two Examples analysing Ð inner loops Ð use side effects primarily for updat- ing iteration variables, which are rarely functional values. The Consider the program control ¯ow of inner loops is usually explicitly written out, and (lambda () (if 5 (+ 3 4) (- 1 2))) tends not to use functions as ®rst class citizens. which evaluates to 7. In CPS form, this is: To further editorialise, I believe that the updating of loop (lambda (k1) (%if 5 (lambda () (+ 3 4 k1)) variables is a task best left to loop packages such as Waters' (lambda () (- 1 2 k1)))) LetS [LetS] or the Yale Loop [YLoop], where the actual up- Labeling all the lambda expressions and call sites for reference, dating technique can be left to the macro writer to implement we get: ef®ciently, and ignored by the application programmer. Even l1:(lambda (k1) the standard Common Lisp and Scheme looping construct, do, c1:(%if 5 l2:(lambda () c2:(+ 3 4 k1)) permits a binding interpretation of its iteration variable update

l3:(lambda () c3:(- 1 2 k1)))) semantics.

Y g ESCAPED is initially fXLAMBDA l1 . Our escaped func- For these reasons, we can afford to settle for a solution that

tion rules tell us that l1 can be called from XCALL, or that is merely correct, without being very informative. Therefore,

l1 ∈ L(XCALL), and that {l1, XLAMBDA} ⊆ Defs(k1). The propagation condition for %if gives us that %if's first internal call site, ic1%if, calls l1, i.e. l1 ∈ L(ic1%if), and %if's second internal call site, ic2%if, calls l2, i.e. l2 ∈ L(ic2%if). The propagation condition for + gives us that L(ic+) contains Defs(k1) = {l1, XLAMBDA}, or that +'s internal continuation call can transfer control to l1 or the XLAMBDA. Similarly, we derive for - that L(ic-) contains {l1, XLAMBDA}. This completes the control flow analysis for the program.

Consider the infinite loop:

    (labels ((f (lambda (n) (f n))))
      (f 5))

With top-level continuation *k*, this is CPSised into:

    (lambda (*k*)
      (Y (lambda (ignore-b f k1)
           (k1 (lambda (k2) (f 5 k2))
               (lambda (n k3) (f n k3))))
         (lambda (b ignore-f) (b *k*))))

Labelling all the call sites for reference:

    (lambda (*k*)
      c1:(Y (lambda (ignore-b f k1)
              c2:(k1 (lambda (k2) c3:(f 5 k2))
                     (lambda (n k3) c4:(f n k3))))
            (lambda (b ignore-f) c5:(b *k*))))

Y's internal call site is labelled icY. After performing the propagations, we get:

    L(XCALL) = {(lambda (*k*) ...), XLAMBDA}
    L(c1)    = {Y}
    L(icY)   = {(lambda (ignore-b f k1) ...)}
    L(c2)    = {(lambda (b ignore-f) ...)}
    L(c3)    = {(lambda (n k3) (f n k3))}
    L(c4)    = {(lambda (n k3) (f n k3))}
    L(c5)    = {(lambda (k2) (f 5 k2))}

6 Side Effects

For the purposes of doing control flow analysis of Lisp, tracking side effects is not of primary importance. Since Lisp makes variable binding convenient and cheap, side effects occur relatively infrequently. (This is especially true of well-written Lisp.) The pieces of Lisp we are most interested in flow analysing are thus largely side-effect free, and the control flow analysis presented here uses only a very weak technique to deal with side effects. The CPS transformation discussed in section 3 removes all side effects to variables. Only side effects to data structures are allowed.

All side effects and accesses to data structures are performed by primops. Among these primops are cons, rplaca, rplacd, car, cdr, vref, and vset. We divide these into two sets: data stashers and data fetchers. Stashers, such as cons, rplaca, rplacd, and vset, take functions and tuck them away into some data structure. Fetchers (car, cdr, vref) fetch functions from some data structure.

The analysis takes the simple view that once a function is stashed into a data structure, it has escaped from the analyser's ability to track it. It could potentially be the value of any data structure access, or escape outside the program to the ESCAPED set. Thus we have two rules for side effects:

1. Any lambda passed to a stasher primop is included in the ESCAPED set:

       (cons a d kont)            ⟹  Defs(a), Defs(d) ⊆ ESCAPED    [S-1]
       (vset vec index val kont)  ⟹  Defs(val) ⊆ ESCAPED           [S-2]
       etc.

2. A fetcher primop can pass any ESCAPED function to its continuation:

       (fetcher ... cont)  ⟹  ∀ (lambda (x) ...) ∈ Defs(cont):
                                  ESCAPED ⊆ Defs(x)                [F-1]

In a sense, stashers and fetchers act as black and white holes: values disappear into stasher primops and pop out at fetcher primops.
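The two rules can be sketched as one step of a fixpoint computation over the analysis sets. The encoding below is a hypothetical illustration, not the paper's implementation: Defs is a map from variables to sets of lambda labels, call sites are tuples of a primop, its value arguments, and its continuation variable, and cont_param maps a continuation lambda's label to the variable it binds.

```python
# A sketch of side-effect rules [S-1]/[S-2] and [F-1], assuming the
# toy encoding described above.

STASHERS = {"cons", "rplaca", "rplacd", "vset"}
FETCHERS = {"car", "cdr", "vref"}

def side_effect_step(call_sites, defs, escaped, cont_param):
    """One pass of the side-effect rules; iterate to a fixpoint."""
    for primop, value_args, cont in call_sites:
        if primop in STASHERS:
            # [S-1]/[S-2]: any lambda stashed into a data structure escapes.
            for a in value_args:
                escaped |= defs.get(a, set())
        elif primop in FETCHERS:
            # [F-1]: for each (lambda (x) ...) in Defs(cont),
            # add ESCAPED to Defs(x).
            for k in defs.get(cont, set()):
                x = cont_param[k]
                defs.setdefault(x, set()).update(escaped)
    return defs, escaped

# Lambda l0 is consed into a pair; a later car may therefore deliver
# l0 to the variable x bound by continuation k1.
sites = [("cons", ("a", "d"), "kont"),
         ("car", ("p",), "c")]
defs = {"a": {"l0"}, "kont": {"k0"}, "c": {"k1"}}
cont_param = {"k0": "ignore", "k1": "x"}
escaped = set()
for _ in range(2):  # two passes reach the fixpoint here
    defs, escaped = side_effect_step(sites, defs, escaped, cont_param)
```

After the fixpoint, l0 is in ESCAPED and, conservatively, in Defs(x): the analysis cannot tell which stashed function the fetch retrieves, so it assumes any of them.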

7 Induction Variable Elimination: An Application

Once we've performed control flow analysis on a program, we can use the information gained to do other data flow analyses. Control flow analysis by itself does not provide enough information to do all the data flow problems we might desire (the limitations of control flow analysis are discussed in section 8), but we can still solve some useful problems.

As an example, consider induction variable elimination. Induction variable elimination is a technique used to transform array references inside loops into pointer incrementing operations. For example, suppose we have the following C routine:

    int a[50][30];

    example() {
        int r, c;
        for(r=0; r<50; r++)
            for(c=0; c<30; c++)
                a[r][c] = 4;
    }

The example function assigns 4 to all the elements of array a. The array reference a[r][c] on a byte-addressed machine is equivalent to the address computation a+4*(c+30*r). This calculation requires two multiplications and two additions. By replacing the array indices with a pointer variable, we can remove the entire address calculation:

    example() {
        int *ptr;
        for(ptr = &a[0][0]; ptr < &a[0][0] + 50*30; ptr++)
            *ptr = 4;
    }

In our CPS representation there are no loops or variable assignments as such, so induction variables must be defined in terms of the call structure. A basic induction variable (BIV) family is a set of variables B = {v1, ..., vn} obeying the following constraints:

1. Every call site that calls one of the vi's lambda expressions may only call one lambda expression. In terms of the Defs and L functions given earlier, this is equivalent to stating that for all call sites c such that (lambda (... vi ...) ...) ∈ L(c) for some vi, L(c) must be a singleton set:

       ∀ call sites c, vi ∈ B:
           (lambda (... vi ...) ...) ∈ L(c)  ⟹  |L(c)| = 1

2. Each vi may only have definitions of the form b (constant b), and a + vj (vj ∈ B, constant a). This corresponds to stating that vi's lambda must be one of:

   - Called from a call site that simply binds vi to a constant:
     ((lambda (x vi z) ...) foo 3 bar).

   - The continuation of a + call, with vj and a constant for addends:
     (+ vj a (lambda (vi) ...)).

   - The continuation of a - call, with vj and a constant for subtrahend and minuend:
     (- vj a (lambda (vi) ...)).

   - Called from a call site that simply binds vi to some vj, e.g.:
     ((lambda (x vi z) ...) foo vj bar).
     (This is a special case of a = 0.)

Given the control flow information, it is fairly simple to compute sets of variables satisfying these constraints.

A dependent induction variable (DIV) family is a set of variables B' = {wi} together with a function f(n) = a + b*n (a, b constant) and a BIV family B = {vi} such that:

   - For each wi, every definition of wi is wi = f(vj) for some vj ∈ B.

We call the value f(vi) the dependent value.

We can introduce three sets of new variables, {zi}, {xi}, and {yi}. The zi and xi track the dependent value, and the yi are introduced as temporaries to help update the basic variable. In the following description, we use for illustration the following piece of Lisp code:

    (+ v1 4 (lambda (v2)
              ...
              (* 3 v2 (lambda (w1) ...))))

Here v1 and v2 are BIVs; w1 is a DIV, defined to be 3*v2, with associated function f(n) = 3*n.

   - Introduce code to compute zj = f(vj) from zi = f(vi) when vj is computed from vi.

     All call sites whose continuations bind one of the vi (i.e. call sites that step the BIV family) are modified to update the corresponding zi value. Temporary variables xi and yi are used to serially compute the new values. Call site (+ v1 4 k) becomes:

         (+ z1 12 (lambda (x2)
           (+ v1 4 (lambda (y2)
             (k y2 x2)))))

   - Replace computation of DIV wi from vj with a simple binding.

     Since zj = f(vj), we can remove the computation of wj from vj, and simply bind wj to zj. Dependent variable computation (* v2 3 (lambda (w1) ...)) becomes:

         ((lambda (w1) ...) z2)

     and z2 can be β-substituted for w1.

Notice that {wi} ∪ {zi} ∪ {xi} now form a new BIV family, and may trigger off new applications.

For example, consider the loop of figure 4(a). It is converted to the partial CPS of (b). Now, {n} is a BIV family, and {m} is a DIV family dependent on {n}, with f(n) = 3n. A wave of IVE transformation gives us the optimised result of (c). Analysis of (c) reveals that {3n, 3n', 3n%} is a BIV family, and {p} is a DIV family, with f(n) = 4 + n. Another wave of IVE gives us (d). If we examine (d), we notice that 3n, 3n', and 3n% are never used, except to compute values for each other. Another analysis using control flow information, Useless Variable Elimination, spots these useless variables; they can be removed, leaving the optimised result of (e).
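The address arithmetic behind the C example above can be checked concretely: under row-major layout, the index expression c + 30*r enumerates every element offset exactly once and in increasing order, which is what licenses replacing the subscripts with a single stepping pointer. (Python is used here purely for the arithmetic check.)

```python
# For the 50x30 array, a[r][c] lives at byte offset 4*(c + 30*r).
# The flattened word offsets are exactly 0..1499 in order, i.e. the
# sequence a stepping pointer visits.
offsets = [c + 30 * r for r in range(50) for c in range(30)]
assert offsets == list(range(50 * 30))
```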

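The two BIV constraints can likewise be sketched as a small checker over the analysis output. The encoding below is hypothetical, chosen only for illustration: L maps call sites to the sets of lambdas they may call, binder_sites maps each candidate variable to the call sites binding it, and each definition is a tagged tuple.

```python
# A sketch of testing whether a set of variables is a BIV family,
# assuming the toy encoding described above.

def is_biv_family(vars_, L, binder_sites, defn_form):
    # Constraint 1: every call site calling a lambda binding some vi
    # must call exactly one lambda, |L(c)| = 1.
    for v in vars_:
        for c in binder_sites[v]:
            if len(L[c]) != 1:
                return False
    # Constraint 2: every definition of vi is a constant, vi = vj + a,
    # vi = vj - a, or a plain copy of some vj in the family.
    for v in vars_:
        for form in defn_form[v]:
            kind = form[0]
            if kind == "const":
                continue
            if kind in ("+", "-", "copy") and form[1] in vars_:
                continue
            return False
    return True

# Drawn from figure 4(b): n is initialised to the constant 0 and
# stepped by (+ n 1 f) and (+ n 2 f), so {n} passes both constraints.
L = {"c_init": {"f"}, "c_step1": {"f"}, "c_step2": {"f"}}
binder_sites = {"n": ["c_init", "c_step1", "c_step2"]}
defn_form = {"n": [("const", 0), ("+", "n", 1), ("+", "n", 2)]}
assert is_biv_family({"n"}, L, binder_sites, defn_form)
```

A variable defined by a multiplication, such as m = 3*n in figure 4(b), fails constraint 2 and so is a candidate DIV rather than a BIV.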
    (labels ((f (lambda (n)
                  (if (< n 50)
                      (if (= (aref a n) n)
                          (f (+ n 1))
                          (block
                            (print (+ 4 (* 3 n)))
                            (f (+ n 2))))))))
      (f 0))

    (a) Before

    (labels ((f (lambda (n)
                  (if (< n 50)
                      (if (= (aref a n) n)
                          (+ n 1 f)
                          (* 3 n (lambda (m)
                                   (+ 4 m (lambda (p)
                                            (print p)
                                            (+ n 2 f))))))))))
      (f 0))

    (b) Partial CPS conversion

    (labels ((f (lambda (n 3n)
                  (if (< n 50)
                      (if (= (aref a n) n)
                          (+ 3n 3 (lambda (3n')
                                    (+ n 1 (lambda (n')
                                             (f n' 3n')))))
                          (+ 4 3n (lambda (p)
                                    (print p)
                                    (+ 3n 6 (lambda (3n%)
                                              (+ n 2 (lambda (n%)
                                                       (f n% 3n%))))))))))))
      (f 0 0))

    (c) IVE wave 1

    (labels ((f (lambda (n 3n 3n+4)
                  (if (< n 50)
                      (if (= (aref a n) n)
                          (+ 3n+4 3 (lambda (3n+4')
                                      (+ 3n 3 (lambda (3n')
                                                (+ n 1 (lambda (n')
                                                         (f n' 3n' 3n+4')))))))
                          (block
                            (print 3n+4)
                            (+ 3n+4 6 (lambda (3n+4%)
                                        (+ 3n 6 (lambda (3n%)
                                                  (+ n 2 (lambda (n%)
                                                           (f n% 3n% 3n+4%))))))))))))))
      (f 0 0 4))

    (d) IVE wave 2

    (labels ((f (lambda (n 3n+4)
                  (if (< n 50)
                      (if (= (aref a n) n)
                          (+ 3n+4 3 (lambda (3n+4')
                                      (+ n 1 (lambda (n')
                                               (f n' 3n+4')))))
                          (block
                            (print 3n+4)
                            (+ 3n+4 6 (lambda (3n+4%)
                                        (+ n 2 (lambda (n%)
                                                 (f n% 3n+4%)))))))))))
      (f 0 4))

    (e) After

Figure 4: Example IVE application
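Since figures 4(a) and 4(e) are ordinary programs, the claim that the two IVE waves plus Useless Variable Elimination preserve behaviour can be spot-checked by direct simulation. The Python below is a hypothetical transliteration of the two figures; the array contents are arbitrary test data.

```python
# before() transliterates figure 4(a); after() transliterates the
# optimised figure 4(e), which carries 3n+4 in a variable (here t)
# instead of recomputing 4 + 3*n.

def before(a):
    out, n = [], 0
    while n < 50:
        if a[n] == n:
            n = n + 1
        else:
            out.append(4 + 3 * n)   # (print (+ 4 (* 3 n)))
            n = n + 2
    return out

def after(a):
    out, n, t = [], 0, 4            # t plays the role of 3n+4
    while n < 50:
        if a[n] == n:
            t, n = t + 3, n + 1
        else:
            out.append(t)           # (print 3n+4)
            t, n = t + 6, n + 2
    return out

a = [i if i % 3 else -1 for i in range(50)]   # arbitrary test data
assert before(a) == after(a)
```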

8 Limitations and Extensions

This paper is the first in a series intended to develop general flow analysis optimisations for Scheme-like languages. The basic control flow analysis technique of section 4 can be developed in many directions. In this section, I will sketch several of these directions, which are beyond the scope of this paper, and will be published in future reports.

8.1 First Order Closures: An Improvement

Control flow analysis attempts to deal with infinite structures by approximation. We would like to flow functions around. But the set of functions that a given program can produce is infinite. Consider the following infinite loop:

    (labels ((f (lambda (g)
                  (f (lambda ()
                       (+ (g) (g)))))))
      (f (lambda () 1)))

g is bound to the infinite set of functions {f(x) = 2^n | n ≥ 0}. So, in general, there is no way to perfectly answer the question "what are all the functions called from a given call site?" The analysis technique presented in section 4 uses the approximation that identifies all functions that have the same "code," i.e. the functions that are closed over the same lambda expression.

We can use finer grained approximations, expending more work to gain more information. To borrow a technique from [Hudak1], we can track multiple closures over the same lambda. Since it's clear that in the real lambda semantics a finite program can give rise to unbounded numbers of closures, we must identify some real closures together. In the zeroth order control flow analysis (0CFA) presented in section 4, we identified all closures over a lambda together. In the first order case (1CFA), the contour created by calling a lambda from call point c1 is distinguished from the contour created by calling that lambda from call point c2. Closures over these two contours are distinct.

A full treatment of 1CFA will be found in the forthcoming CMU tech report[1] based on this conference paper.

[1] Due to space limits, the description of Useless Variable Elimination and the algorithm for finding BIV families are also deferred to the tech report.

8.2 Environment Flow Analysis

Although control flow analysis provides enough information to perform some traditional flow analysis optimisations on Scheme code, e.g. induction variable elimination, it is not sufficient to solve the full range of flow analysis problems. Why is this?

Traditional flow analysis techniques assume a single, flat environment over the program text. New computed values are communicated from one portion of the program to another by side-effecting and referencing global variables. Thus, any two references to the same variable refer to the same binding of that variable. This is, for instance, the model provided by assembly code.

Our technique, however, uses a different intermediate representation, the CPS applicative-order λ-calculus. Here, the situation is not so simple. The indefinite extent of variable bindings implies that we can have simultaneous multiple bindings of the same variable. For instance, consider the following perverse example:

    (let ((f (lambda (h x)
               (if (zero? x) (h)
                   (lambda () x)))))
      (f (f nil 3) 0))

The first call to f, (f nil 3), returns a closure over (lambda () x) in an environment [x/3]. This function is then used as an argument to the second, outer call to f, where it is bound to the variable h. Suppose we naively picked up information from the conditional (zero? x) test, and flowed x = 0 and x ≠ 0 down the THEN arm and the ELSE arm of the if, respectively. This would work in the single flat environment model of assembler. In our example, however, we are undone by the power of the lambda. After ensuring that x = 0 in the THEN arm of the (if (zero? x) ...) conditional, we jump off to h, and evaluate x, which inconveniently persists in evaluating to 3, not 0. The one variable has multiple simultaneously extant values. Confusing two bindings of the same variable can lead us to draw incorrect conclusions.

This problem prevents us from using plain control flow analysis to perform an important flow analysis optimisation for Lisp: type inference. We would like to perform deductions of the form:

    ;; references to x in f are int refs
    (if (integer? x) (f) (g))

Unfortunately, as demonstrated above, we can't safely make such inferences.

What is missing is environment information. We need to know which references to a variable occur in the same environment. For example, consider the lambda expression:

    (lambda (0:x)
      (cond ((zero? 1:x)
             (print 2:x)
             (* 3:x 4))
            (t (+ 4:x 7))))

There is no execution path of a program containing the above lambda such that reference 2:x occurs in the same environment as reference 4:x. On the other hand, in all execution paths, reference 2:x occurs in the same environment as reference 3:x.

We cannot do general data flow analysis unless we can untangle the multiple environments that variables can be bound in. This is not surprising. As is pointed out in [Declarative], lambda serves a dual semantic role, providing both control and environment structure. Control flow analysis has provided information about only one of these structures. We must also gather information about the relationships among the binding environments established during program execution.

The exact nature of the environment information we need is succinctly captured in the lastref function:

    For a reference r to variable v, what is LR(r), the set of possible
    immediately prior references to the same binding of v?
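The perverse example of section 8.2 can be run directly. The Python below is a transliteration for illustration only: the closure returned by the inner call closes over x = 3, so even though the outer call takes the (zero? x) arm, calling h still yields 3.

```python
# Transliteration of the perverse example: two simultaneously live
# bindings of x defeat naive flow of the (zero? x) test result.
def f(h, x):
    return h() if x == 0 else (lambda: x)

# (f (f nil 3) 0): the inner call returns a closure over x = 3; the
# outer call reaches the x = 0 arm, yet h() evaluates x to 3, not 0.
result = f(f(None, 3), 0)
```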

Lastref information provides just enough information to disentangle the multiple environments that can be created over a given variable. If we had lastref information, we could, for example, correctly perform type inference. Deriving lastref information is referred to as environment flow analysis.

A complete treatment of the environment problem, and techniques for computing the lastref function, are beyond the scope of this paper. The next paper in this flow analysis series, Environment Flow Analysis in Scheme, will treat the environment problem, showing its solution, and the application of the lastref function to performing general flow analysis, including type inference as a demonstration example.

8.3 Mathematics

The mathematics of Scheme control, environment, and data flow analysis is captured by abstract semantic interpretations [Cousot] [Hudak2]. This will be the topic of another paper.

9 Acknowledgements

I would like to thank my advisor, Peter Lee, for carefully reviewing several drafts of this paper.

Notes

Note: Difficulties with Binding

For all its pleasant properties, Lisp binding does introduce a problem that surfaces when we attempt to find applications for flow analysis-derived information, namely that lexical scoping coupled with "upward" functions gives the programmer very tight control over environments. This can frustrate optimising techniques involving code motion, where we might decide we would like to move a computation to a given control point p, only to find that the variables appearing in the computation are not available in the environment obtaining at p.

This problem does not arise with Algol-like languages, since their internal representations are typically close to assembly language. The environment is a single, large flat space, hence a given variable is visible over the entire body of code. Some further consequences of this difference are discussed in subsection 8.2.

Note: CPS and Triples

CPS looks a lot like triples: generally, the code is broken down into primitive operations, which take the variables or constants they are applied to, and a continuation that specifies the variable to receive the computed value. CPS differs from triples in the following ways:

1. The continuation actually serves two roles. (1) It specifies the variable to receive the value computed by the primitive operation. (2) It specifies where the control point will transfer to after executing the primitive operation. In triples this is split out.

2. A triple side-effects its target. A continuation binds its variable.

3. Continuations have runtime definitions; triples can be completely determined at compile time.

References

[Dragon] Aho, Ullman. Principles of Compiler Design. Addison-Wesley (1977).

[Hecht] Hecht, Matthew S. Data Flow Analysis of Computer Programs. American Elsevier (New York, 1977).

[R3-Report] J. Rees & W. Clinger, Eds. "The Revised³ Report on the Algorithmic Language Scheme." SIGPLAN Notices 21(12) (Dec. 1986), pp. 37-79.

[Declarative] Steele, Guy L. Lambda: The Ultimate Declarative. AI Memo 379. MIT AI Lab (Cambridge, November 1976).

[Rabbit] Steele, Guy L. Rabbit: A Compiler for Scheme. AI-TR-474. MIT AI Lab (Cambridge, May 1978).

[ORBIT] Kranz, David, et al. "Orbit: An Optimizing Compiler for Scheme." Proceedings of the SIGPLAN '86 Symposium on Compiler Construction (June 1986), pp. 219-233.

[LetS] Waters, Richard C. LETS: an Expressional Loop Notation. AI Memo 680. MIT AI Lab (Cambridge, October 1982).

[YLoop] Online documentation for the T3 implementation of the Yloop package is distributed by its current maintainer: Prof. Chris Riesbeck, Yale CS Dept. ([email protected]).

[Hudak1] Hudak, Paul. "A Semantic Model of Reference Counting and its Abstraction." Proceedings of the 1986 ACM Conference on Lisp and Functional Programming (August 1986).

[Hudak2] Hudak, Paul. Collecting Interpretations of Expressions (Preliminary Version). Research Report YALEU/DCS/RR-497. Yale University (August 1986).

[Cousot] P. Cousot and R. Cousot. "Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints." 4th ACM Symposium on Principles of Programming Languages (1977), pp. 238-252.