Iterative-Free Program Analysis

Mizuhito Ogawa†∗ Zhenjiang Hu‡∗ Isao Sasano† [email protected] [email protected] [email protected] †Japan Advanced Institute of Science and Technology, ‡The University of Tokyo, and ∗Japan Science and Technology Corporation, PRESTO

Abstract 1 Introduction Program analysis is the heart of modern . Most control Program analysis is the heart of modern compilers. Most control flow analyses are reduced to the problem of finding a fixed point flow analyses are reduced to the problem of finding a fixed point in a in a certain transition system, and such fixed point is commonly certain transition system. Ordinary method to compute a fixed point computed through an iterative procedure that repeats tracing until is an iterative procedure that repeats tracing until convergence. convergence. Our starting observation is that most programs (without spaghetti This paper proposes a new method to analyze programs through re- GOTO) have quite well-structured control flow graphs. This fact cursive graph traversals instead of iterative procedures, based on is formally characterized in terms of tree width of a graph [25]. the fact that most programs (without spaghetti GOTO) have well- Thorup showed that control flow graphs of GOTO-free C programs structured control flow graphs, graphs with bounded tree width. have tree width at most 6 [33], and recent empirical study shows Our main techniques are; an algebraic construction of a control that control flow graphs of most Java programs have tree width at flow graph, called SP Term, which enables control flow analysis most 3 (though in general it can be arbitrary large) [16]. to be defined in a natural recursive form, and the Optimization The- Once a graph has bounded tree width, we can construct a graph in orem, which enables us to compute optimal solution by dynamic an algebraic way [3, 4]. This suggests that finding a fixed point programming. would be computed by recursive traversals on the algebraic struc- We illustrate our method with two examples; dead code detection ture, and the optimal solution would be obtained with a dynamic and register allocation. Different from the traditional standard it- programming. erative solution, our dead code detection is described as a simple Unfortunately, the existing results are not sufficient for our purpose. combination of bottom-up and top-down traversals on SP Term. For instance, the algebraic construction of graphs with bounded tree Register allocation is more interesting, as it further requires opti- width treats only undirected graphs [3]. This problem can be eas- mality of the result. We show how the Optimization Theorem on ily coped with, but a more serious problem is that it has too many SP Terms works to find an optimal register allocation as a certain recursive constructors, k(k + 1)(k + 2)/6 for tree width k, which . makes it hard to write recursive definitions over it. This paper proposes a new algebraic construction SP Term of graphs Categories and Subject Descriptors with bounded tree width, and a new method to analyze programs D.1.1 [Applicative (Functional) Programming]: Functional Pro- through recursive graph traversals instead of iterative procedures, gramming; D.1.2 [Automatic Programming]: Program Genera- based on the fact that most programs (without spaghetti GOTO) tion; D.3.3 [Language Constructs and Features]: Programming have well-structured control flow graphs. with Graphs Our main theoretical result (Theorem 2) is that a (directed) graph G can be represented by an SP Term in SPk if and only if G has tree General Terms width at most k (and has at least k nodes). 
Note that SP Term con- struction reduces the number of recursive constructors to 2 (regard- Algorithms, Language less of the size of tree width k), at the cost of increase of k2 − k + 1 constants. These constants express either diedges from the i-th spe- Keywords cial node (called terminal) to the j-th, or a graph with no edges, and Program Analysis, Control Flow Graph, Register Allocation, Tree they can be treated in a uniform way. This makes writing recursive Width, SP Term, Dynamic Programming, Catamorphism. definitions on SP terms feasible. We illustrate our methodology with two examples: dead code detec- tion and register allocation. Different from the traditional standard iterative solution, our dead code detection is described as a simple combination of bottom-up and top-down traversals on an SP Term. Register allocation is more interesting, as it further requires opti- mality of the result. We solve it as an instance of maximum marking Permission to make digital or hard copies of all or part of this work for personal or problems [26, 27, 5]; mark the nodes of a control flow graph under classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation a certain condition such that the sum of weight of marked nodes is on the first page. To copy otherwise, to republish, to post on servers or to redistribute maximum (or, minimum). We make use of Optimization Theorem to lists, requires prior specific permission and/or a fee. from our previous work [26, 27], and show how it works to find an ICFP’03, August 25–29, 2003, Uppsala, Sweden. optimal register allocation as a certain dynamic programming on Copyright 2003 ACM 1-58113-756-7/03/0008 ...$5.00 1 1 1 SP Terms. l1 l1 l1 ( +,( , )) ( −,( , )) 6 The rest of the paper is organized as follows. We start by an Constants e l1 l2 ? e l1 l2 2 l2 l2 l2 overview of our basic idea in Section 2, through an example of 2 2 2 dead code detection on a simple flowchart program without GOTO. Its control flow graph has tree width at most 2, i.e., the class of q ) q ) series-parallel graphs. 6 6 S 6 6 Series composition  2 1  l l ~ 2 ~ 2 Section 3 presents an optimal register allocation with the fixed num- 1 2 1  1  l1 l2 l1 l2 ber of registers for a flowchart program. The core of our technique G1 G2 S2 G1 G2 is Optimization Theorem [26, 27], which automatically gives an ef- ficient solution for maximum marking problems by certain generic   dynamic programming. The advantage and the problem of our l 1 l 1 match(l1,l ) 1 1 P 1 method are also briefly discussed. Parallel composition  I  I N  ?  2  ( ,  ) Section 4 introduces the general definition of SP Term, and demon- l2 2 l2 match l2 l2 strates how to extend dead code detection to a program that has a G1 G2 P2 G1 G2 control flow graph with larger tree width. We show that once the Figure 1. Interpretation of e+, e−,2,S, and P. reachability description is given, the description of dead code de- tection will be uniformly extended to larger tree width. Section 5 discusses related work, and Section 6 concludes the pa- ( , ) 1 per. Throughout the paper, we consider only intra-procedural con- tuple l1 l2 of labels, defined as the following. trol flow analyses (0-CFA), and describe algorithms in Haskell-like + SP2 :=(e , (l1,l2)) notations. 
− | (e , (l1,l2)) | (2, (l1,l2)) | (SSP2 SP2, (l1,l2)) | (PSPSP , (l ,l )) 2 Dead Code Detection without Iteration 2 2 1 2 In this section, we explain our idea through a simple case study, An SP Term is interpreted as a pair of a 2-terminal series-parallel di- dead code detection of flowchart programs. This class of graphs graph and a tuple of 2-labels; a 2-terminal digraph is a digraph with corresponds to the control flow graphs of structured (in strict sense) a tuple of two nodes, called terminals. We can regard the first ter- programs, i.e., programs that consist of single-entry and single-exit minal as the single-entry, and the second terminal as the next node blocks. of the single-exit. Labels (l1,l2) are the identifiers of terminals. Let ( , ) The syntax of flowchart programs is described below. At the end of match l l be the function that returns the whole program, the end statement is assumed to be added.   l if l = l or l = ∗ l if l = ∗  ⊥ Prog := x := e assignment otherwise | input x input statement | (i.e., accept the special label ∗ as a wild card during matching). output x output statement + | The constant (e , (l1,l2)) is interpreted as a diedge from the first Prog; Prog sequence − | if e then Prog else Prog fi conditional statement terminal to the second terminal, (e , (l1,l2)) as a diedge from | while e do Prog od while loop the second to the first terminal, and 2 as two isolated terminals. ( ,( , )) ( ,(  ,  )) The series composition S t1 l1 l2 t2 l1 l2 fuses the sec- ( ,  ) =⊥ ond terminal in t1 and the first terminal of t2 if match l2 l1 , Our key to the dead code detection without an iterative procedure is and regard the first terminal in t1 as the first and the sec- ond terminal in t2 as the second. The parallel composition the algebraic construction of control flow graphs, called SP Term. ( ,( , )) ( ,(  ,  )) P t1 l1 l2 t2 l1 l2 fuses each first and second terminals in After translation from a flowchart program to an SP Term, we show ( ,  ), ( ,  ) =⊥ ( ,  ) t1 and t2 if match l1 l1 match l2 l2 , and label match l1 l1 how to compute the sets of used and newly defined variables in on the first terminal and match(l ,l ) on the second terminal. each program fragment by a single bottom-up traversal over an SP 2 2 Term, and explain how to compute the set of live variables at each The interpretation of each function symbol and constant is de- scribed in Fig. 1; a terminal is presented as a double circle, and terminal in each (sub) SP Term by a single top-down traversal over , an SP Term. labels l1 l2 are associated to terminals. We prepare the function chT that exchanges the order of the two terminals of a graph.

chT :: SP2 → SP2 + − 2.1 SP Terms for Control Flow Graphs of chT (e , (l1,l2)) = (e , (l2,l1)) − + Flowchart Programs chT (e , (l1,l2)) = (e , (l2,l1)) Algebraic Construction of Series-Parallel Graphs chT (2, (l1,l2)) = (2, (l2,l1)) ( , ( , )) = ( ( )( ), ( , )) Control flow graphs of flowchart programs are graphs with tree chT Sxy l1 l2 S chT y chT x l2 l1 ( , ( , ))=(( )( ), ( , )) width at most 2, which are known as series-parallel directed graphs chT Pxy l1 l2 P chT x chT y l2 l1 (digraphs) [32]. Such graphs can be specified in terms of SP Term. 1 + − Note that the following definition is somewhat simplified compared In Section 4.1, e , e , P2 are denoted by e2(1,2), e2(2,1), and to that in Section 4.1 for general cases. P2 respectively. For readability, we set St1 t2 = S2 (chT t2) t1, DEFINITION 1. An SP Term is a pair of a ground term t and a where Sk is uniformly defined in Section 4.1 for k ≥ 2. 1 1 l 1 l + 1 1 l 1 ∗ 1 l 1 l + l chT P S 2 Pe ? ? j j j 2 ∗ 2 ∗ 2 ∗ 2 l + 1 2 l + 1 l + 1 l + 1 ? + + ∗ ∗ e trans p e chT (trans p) 2 2

Figure 2. Translation of while statement to an SP Term.

Translation from Programs to SP Terms Leaf nodes in an SP Term are either e+, e−, and 2. Each edge in + − We add labels to each statements in Prog to identify each node in a control flow graph uniquely corresponds to either e or e , and a control flow graph. We denote the set of such labeled programs each while loop uniquely corresponds to 2. Thus, the number of by LProg. The implementation trans of the transformation from a leaves in an SP Term is equal to the sum of the number of edges labeled program to an SP Term is given below. and while loops, which is proportional to the size of a program.2 This concludes that transformation from a program to an SP Term trans :: LProg → SP2 has (at most) linear growth in size. trans (l : x := e)=(e+,(l,∗)) Fig. 3 describes the control flow graph of the example program (in trans (l : input x)=(e+,(l,∗)) Section 1), which computes the sum of 1,2,···,n for an input n, trans (l : out put x)=(e+,(l,∗)) and its transformation to an SP Term by trans. In Fig. 3, a tuple trans (p1; p2)=(S (trans p1)(trans p2), (l,∗)) associated to each subtree is a tuple of terminals at the interpretation where l is the starting line of p1 of the subtree. trans (l : if e then p1 else p2 fi) Note that the description of a control flow graph by an SP Term + =(P (S (e ,(l,∗)) (trans p1, (l,∗))) is not unique. For instance, gives an alternative description of the + (S (e ,(l,∗)) (trans p2, (l,∗))), (l,∗)) same program in Fig. 3 (Transformation trans is already nondeter- trans (l : while e do p od) ministic for p1; p2. Fig. 3 is obtained by the left most decomposi- =(P (e+,(l,∗)) tion, and Fig. 4 is by the righter most decomposition) (S (P (e+,(l,∗)) (chT (trans p)(∗,l + 1)), (l,l + 1)) ( ,(∗,∗)), ( ,∗)), ( , ) 2 l S 1 10 (l,∗)) (1,5) S P (5,10) (4,5) (1,4) S e+ e+ S (5,10) (5,10) Program CFG SP-term (1,3) + (6,10) S e (5,6) P 2 (3,4) e+ e+ e+ S (5,6) 1 ( , ) ( , ) ( , ) ( , ) S 1 10 1 2 2 3 5 6 − (9,6) ? (1,2) e S 1: input n; + ( , ) ( , ) 2 e S 2 10 5 9 − (8,6) 2: i := 0; e S ? + ( , ) (9,8) 3: S := 0; (2,3) e S 3 10 e− e− = 3 + ( , ) (8,7) (7,6) 4: c : True; ? (3,4) e S 4 10 5: while c + ( , ) e P (5,10) do 4 4 5 Figure 4. Another equivalent SP Term description ? + ( , ) 6: i := i + 1;  (5,10) e S 5 10 = 5 ( , ) 7: c : False; (5,6) P 2 6 10 2.2 Dead Code Detection of Flowchart Pro- 8: S := S + i; ? ( , ) + (5,6) 9: c := i <= n; 6 5 6 e S grams − od; ? (5,7) S e (7,6) Our target is dead code detection, i.e., whether defined variables are 8 used before redefined. We use the following functions to extract 10 : out put S ( , ) S e− (8,7) ? 5 8 information from a node labeled l. − − (5,9) e e (9,8) 9 = { } = R defv l x if the node is an assignment x : e or 10 a input statement input x. = if the node is just an expression. usev l = FV(e) if the node is either an assignment Figure 3. An example of control flow graph and its transforma- x := e or an expression e. tion to SP Term = {x} if the node is an output statement out put x. For instance, the translation of while-statement proceeds as in Detecting Used and Defined Variables in a Fragment Fig. 2. Intuition behind the wild character label “∗” is; for each We first prepare the functions use1 g, use2 g, def1→2 g, and fragment of a program, the first label denotes the entry of the frag- def2→1 g that detect which variables are used and/or defined in a ment, and the second label, which is always “∗” during transforma- sub SP Term of g. use1 g returns the set of variables that are used tion, denotes the next control point. 
Note that each program frag- ment has the unique node labeled with “∗”. At the end, ∗ is replaced 2Assuming that tree width is at most k, |E|≤k|V| where V,E with the label for the end statement, i.e., the end of the program. are the set of nodes and edges, respectively [23]. S ({n},φ,{i,S,c},Var) S (φ,φ) (φ,φ,{i},Var) 1 : input n; {n} + + (φ,φ) e S ({n,i},φ,{S,c},Var) bottom-up ({n},{n,i}) e S top-down 2 : i := 0; {n,i} + + 3 S = 0 { , , } (φ,φ,{S},Var) e S ({n,i,S},φ,{c},Var) ({n,i},{n,i,S}) e S (φ,φ) : : ; n i S 4 : c := True; {n,i,S,c} (φ,φ,{ }, ) + ({n,i,S,c},φ,φ,Var) ({ , , },{ , , , }) + (φ,φ) c Var e S n i S n i S c e S 5 : while c {n,i,S} + + (φ,φ) ({c},φ,φ,Var) e P ({n,i,S},φ,φ,Var) ({n,i,S,c},{n,i,S}) e P do + + ({ },φ) 6 i = i + 1 { , , } ({S},φ,φ,Var) e S ({n,i,S},φ,Var,Var) ({n,i,S},φ) e S S : : ; n i S 7 : c := False; {n,i,S} ({ , , },φ,{ },{ , }) P 2 (φ,φ, , ) ({ },φ) P 2 n i S i S c Var Var S ({n,i,S},{n,i,S}) 8 : S := S + i; {n,i,S} + (φ,{ , , }, ,{ , }) + ({i},φ,{i},Var) e S n i S Var S c ({n,i,S},{n,i,S}) e S ({n,i,S},φ) 9 : c := i <= n; {n,i,S,c} − − od; (φ,{n,i,S},Var,{S,c}) S e (φ,φ,Var,{,c}) ({n,i,S},φ) S e ({n,i,S},{n,i,S}) 10 : output S φ (φ,{n,i},Var,{c}) S e−(φ,{ , }, ,{ }) ({ , , },φ) S e− S c Var S n i S ({n,i,S},φ) − − − − (φ,{c},Var,φ) e e (φ,{n,i},Var,{c}) e e ({n,i,S},{n,i,S,c}) ({n,i,S,c},{n,S,c}) ( , , , ) ( , ) use1 use2 def1→2 def2→1 vs1 vs2 Detected live variables Figure 5. Examples of use1, use2, def1→2, def1→2, and addLive

before being redefined in some path in g starting from terminal 1.3 1: input n; ⇐ defv(l1)={n}⊆{n} def1→2 g returns the set of variables that are newly defined in all 2: i := 0; ⇐ defv(l2)={i}⊆{n,i} paths in g from terminal 1 to terminal 2 (if terminal 2 is reach- 3: S := 0; ⇐ defv(l )={S}⊆{n,i,S} able from terminal 1); and returns Var (the set of all variables)4, 3 4: c := True; ⇐ defv(l4)={c}⊆{n,i,S,c} otherwise. We omit the complementary definitions for use2 and 5: while c def2→1 g. do 6: i := i + 1; ⇐ defv(l6)={i}⊆{n,i,S} + = ⇐ ( )={ } ⊆ { , , } use1 (e , (l1,l2)) = usev l2 7: c : False; defv l7 c n i S − = + ⇐ ( )={ }⊆{ , , } use1 (e , )=φ 8: S : S i; defv l8 S n i S = <= ⇐ ( )={ }⊆{ , , , } use1 (2, )=φ 9: c : i n; defv l9 c n i S c use1 (Sxy, )=use1 x ∪ (use1 y \def1→2 x) od; use1 (Pxy, )=use1 x ∪ (use2 y \def1→2 x) ∪ 10 : out put S use y ∪ (use x \def → y) 1 2 1 2 Figure 6. Dead code detection of a flow chart program + def1→2 (e , (l1,l2)) = defv l2 − def1→2 (e , )=Var ( , ( , )) def1→2 (2, )=Var addLive Sxy l1 l2 vs1 vs2 =( ( ( ∪ ( \ ))) def1→2 (Sxy, )=def1→2 x ∪ def1→2 y S addLive x vs1 use1 y vs2 def1→2y ( ( ∪ ( \ )) ), def1→2 (Pxy, )=def1→2 x ∩ def1→2 y addLive y use2 x vs1 def2→1x vs2 (l1, l2, vs1, vs2)) Note that by tupling use , use , def → , and def → , we can com- 1 2 1 2 2 1 addLive (Pxy, (l1,l2)) vs1 vs2 pute the sets of live variables at terminal 1 and 2 in each sub SP =(P (addLive x Term of g by a single bottom-up traversal on g [18]. (vs1 ∪ use1 y ∪ ((vs2 ∪ use2 x) \def1→2 y)) Live Variable Detection without Iteration (vs2 ∪ use2 y ∪ ((vs1 ∪ use1 x) \def2→1 y))) ( After the computation of use , use , def → , and def → , we as- addLive y 1 2 1 2 2 1 ( ∪ ∪ (( ∪ ) \ )) sume that each sub SP Term in g has additional information of the vs1 use1 x vs2 use2 y def1→2 x ( ∪ ∪ (( ∪ ) \ ))), results of these functions. vs2 use2 x vs1 use1 y def2→1 x (l , l , vs , vs )) Next we give a function addLive that associates the information 1 2 1 2 of live variables to each terminal in each sub SP Term in g. The With the assumption that use1, use2, def1→2, and def2→1 are com- function addLlive takes an SP Term g and two sets of variables puted and their results are stored, addLive is done in a single top- vs1 and vs2 (both with the initial value of φ), where vs1 denotes down traversal on g. live variables outgoing from g at terminal 1 and vs denotes live 2 Fig. 5 shows computation of (use1,use2,def1→2,def1→2) and variables outgoing from g at terminal 2. It returns a pair of an SP addLive on a control flow graph in Fig. 3. At the terminals in a Term and a tuple of the sets of variables that are alive at the terminal leaf in an SP Term, the detected set of live variables at each node is 1 and 2. obtained. + addLive (e , (l1,l2)) vs1 vs2 + =(e , (l1, l2, vs1 ∪ usev l2 ∪ (vs2 \defv l2), vs2)) − Dead Code Detection of Flowchart Programs addLive (e , (l1,l2)) vs1 vs2 − Now that the set of live variables at each node in a control flow =(e , (l1, l2, vs1, vs2 ∪ usev l1 ∪ (vs1 \defv l1))) ( , ( , )) =(, ( , , , )) graph has been computed, dead code detection is straightforward. addLive 2 l1 l2 vs1 vs2 2 l1 l2 vs1 vs2 A variable is dead if it is not live. Dead code is an assignment that 3 use1 g omits the used variables at terminal 1. assigns a value to a dead variable. Thus, in the example in Fig. 5, 4Var satisfies X ∩Var = X, X ∪Var = Var, and X \Var = φ for the assignment Z:=X+2 at line 8 is a dead code, since Z is dead as each set X of variables. in shown in Fig. 6. 
3 Register Allocation for Flowchart Program instruction register live variables In this section, we show how to find an optimal register allocation 1: input n; ( , , ) {n} as an instance of a maximum marking problem. Our strategy is, first 2: i := 1; (n, , ) {n,i} write down the finite mutumorphic specification checking whether 3: S := 0; (n,i, ) {n,i,S} marking represents correct register allocation, and the weight w that STORE S (n,i,S) counts the number of required LOAD/STORE instructions. Second, 4: c := True; (n,i, ) {n,i,S,c} transform checking to the form with foldSP by tupling transfor- 5: while c (n,i,c) {n,i,S} mation [18]. Then, if w is homomorphic, Optimization Theorem do (Theorem 1 [27, 26]) automatically gives how to detect an opti- 6: i := i + 1; (n,i,c) {n,i,S} mal register allocation with certain generic dynamic programming 7: c := False; (n,i,c) {n,i,S} (i.e., a single traversal on an SP Term), assuming the live variables LOAD S (n,i,c) are pre-computed. Note that we restrict ourselves to control flow 8: S := S + i; (n,i,S) {n,i,S} graphs with bounded tree width, and do not intend P = NP, where STORE S (n,i,S) the conventional optimal register allocation based on graph color- 9: c := i <= n; (n,i, ) {n,i,S,c} ing [10] is NP-complete. od; For simplicity, we consider a flowchart programs (without GOTO) LOAD S (n,i,c) as in Section 2. We assume that functions defv l, usev l, and live 10 : out put S (n,i,S) {} variables at terminal labeled l are pre-computed (as in Section 2). 3.1 Register Allocation Figure 7. An example of optimal register allocation In a real computer, an instruction is executed with values on limited number of registers. If needed inputs are not on registers, then they must be loaded from memory; and if there are no room for them, MMP includes many interesting problems, such as knapsack prob- some values on registers must be stored. These LOAD/STORE in- lems, and optimized range problems in data mining [28]. Of course, structions are usually expensive, and register allocation is an opti- it is not expected that every MMP problem can be solved efficiently. mization that under fixed number of registers, find an optimal regis- In fact, MMP includes NP-hard problems, such as the knapsack ter usage, i.e., a program execution with the minimum use of LOAD problem. However, for instance, the knapsack problem restricted to (from memory to register) and STORE (from register to memory). integer weight can be computed in linear time. Basic operations on registers are either LOAD, STORE, move,or Let us consider more formally. The specification of MMP is de- execution of an instruction. When the number of registers is 4, for scribed as follows, where constraints are expressed by a boolean- instance, we have valued function p and a weight function w. → LOAD x r2 y z y x z mmp w p = selectmax w ◦ f ilter p◦gen → STORE x r2 y x z y z The function gen generates all possible marking on elements: = → r4 : r1 (move) y x z x z y gen : D →{D∗} = + → r4 : r1 r2 y x z y x z u D∗ is the data structure derived from D where each node is attached where u = y + x with a mark. The function f ilter p takes a set of marked data and selects ones that satisfy the property p. The function selectmax w For simplicity, we only concentrate on the number of takes a set of marked data and select one that has the maximum LOAD/STORE instructions, and do not care on the number value with respect to the weight function w. Then, we can derive of move. 
a linear time algorithm mechanically if the property p is defined Fig. 7 shows the optimal register allocation for a simple program by finite mutumorphisms, and the weight function w is homomor- (appeared in Section 1), which computes the sum of 1 to n with 3 phic [26]. 5 registers. Here, each tuple of three variables represents a regis- Mutumorphism is a set of mutually recursive functions, among ter allocation just before each instruction is executed. The special which no nested function calls occur and each argument of recur- symbol means that the register is either empty or permitted to sive call is a sub-structure of the input [15]. Note that by tupling overwritten. Note that between line 7 and 8, STORE c needs not transformation, mutumorphism is transformed to a single catamor- to be inserted; instead we just overwrite S on c. This is correct, phism [18]. Although mutumorphism is defined on more general because c is dead at line 7. data structures, from now on, we will consider SP Terms only. 3.2 Maximum Marking Problem DEFINITION 2(FINITE MUTUMORPHIC PROPERTY [27]). Maximum marking problem (MMP for short) can be specified as A property p is finite mutumorphic if it is defined by follows: Given a data structure x, the task is to find a way to mark p : SP∗ → Bool elements in x such that the marked data satisfies a certain property + p (e , a)=φe+ a p and has the maximum (or, equivalently, minimum) value with re- − ( , )=φ − spect to certain weight function w. This means that no other mark- p e a e a ing of x satisfying p can produce a larger value with respect to w. p (2, a)=φ2 a p (Sx1 x2, a)=φS (hx1)(hx2) a 5Strictly speaking, LOAD/STORE instruction must be inserted p (Px1 x2, a)=φP (hx1)(hx2) a at the level, but for simplicity we just insert LOAD/STORE instructions into a flowchart program. where h x =(px, f1 x, f2 x,..., fm x), which may use auxiliary  functions f1,..., fm each of which has finite range of Ci. has an O(|C |·n) algorithm described as ∗ fi : SP → Ci + opt ψ + ψ − ψ ψS ψP fstp + p − p pS pP fi (e , a)=φie+ a e e 2 e e 2 − fi (e , a)=φie− a ( , )=φ  fi 2 a i2 a where C = C1 ×···×Cm and n is the size of an input. fi (Sx1 x2, a)=φiS (hx1)(hx2) a f (Px x , a)=φ (hx )(hx ) a The core of Optimization Theorem is a generic dynamic program- i 1 2 iP 1 2 ming. The idea is; during data traversal, compute intermediate max- If p is finite mutumorphic, tupling transformation [18] will yield a ima for all possible states that may contribute to the finial maxi- ,..., catamorphism for h. Therefore a finite mutumorphic property p can mum. Finite mutumorphisms f1 fm describe state transition, be described in the form of and finiteness of their ranges guarantee that such states are finite. For the definition of opt and detail, refer to [27, 26]. = ◦ + − p fst foldSP pe pe p2 pS pP It follows from this theorem that we can detect all dead codes with the property md and the weight function nd by the following pro- where fst is the function that takes the first element in a tuple and gram. the fold (catamorphism) operation foldSP on SP terms is defined below. opt ψ1 ψ1 ψ1 ψ2 ψ2 id ϕ1 ϕ1 ϕ1 ϕ2 ϕ2

foldSP ϕe+ ϕe− ϕ2 ϕS ϕP = ϕ + We will see a more interesting application of the theorem in the next where ϕ (e , a)=ϕe+ a Section. − ϕ (e , a)=ϕe− a ϕ (2, a)=ϕ2 a ϕ ( , )=ϕ (ϕ )(ϕ ) Sx1 x2 a S x1 x2 a 3.3 Optimal Register Allocation as MMP ϕ ( , )=ϕ (ϕ )(ϕ ) Px1 x2 a P x1 x2 a As an application of Optimization Theorem, we demonstrate the register allocation problem. To be concrete, recall the dead code detection in Section 2.2, where we have reached the point that each node is added with a set of live variables. Assume that some nodes in the graph are marked (which Check Whether Each Terminal Has a Correct Mark can be checked by isM.) Now we may define the property md by foldSP that all marked nodes in the graph are dead. Let Var be the set of variables that appears in a program, and let be the special symbol that means a register is either empty or ready md = foldSP ϕ1 ϕ1 ϕ1 ϕ2 ϕ2 to overwrite. The set Reg of register allocations (we consider the where size of registers is three) is defined as: ϕ1 (l1,l2,vs1,vs2)=valid (l1,vs1) ∧ valid (l2,vs2) ϕ2 p1 p2 a = p1 ∧ p2 ∧ ϕ1 a Reg = {(v1,···,vn) | vi ∈ Var ∪{ }, = ∨ = = = }. Here valid is to determine whether a marked terminal node is dead, vi v j vi v j if i j i.e., valid (l,vs)=if isM l then defv l ⊆ vs else True. DEFINITION 3(HOMOMORPHIC WEIGHT FUNCTION [26]). An element in Reg is labeled to each terminal in an SP Term as a A weight function w is homomorphic if w is defined as a fold mark, which represents the register allocation state just before the instruction at the terminal being executed. Below, we will describe w : SP∗ → Weight • checking, which checks whether each terminal has a correct w = foldSP ψ + ψ − ψ ψ ψ e e 2 S P mark, and ψ ψ where S and P is described as a summation in a form like • w, which counts the number of required LOAD/STORE in- structions under a certain marking of the program. ψS r1 r2 a = r1 + r2 + vS a ψ = + + A mark in Reg is a tuple, and we use the following operations (as P r1 r2 a r1 r2 vP a =( ,···, ),  =(  ,···,  ) ∈ analogy to set operations). Let r v1 vn r v1 vn for some functions vS and vP. Reg. Continuing with the dead code detection problem, we may define a 6 \\  =( ,···, )  = =  weight function nd to count the number of the marked dead nodes r r v1 vn where vi if vi vi then else vi RV r = {v1,···,vn}\{ } nd = foldSP ψ1 ψ1 ψ1 ψ2 ψ2 where ψ1 (l1,l2,vs1,vs2)=cl1 + cl2 The function checking g takes a marked SP-term associated with ψ2 p1 p2 a = p1 + p2 the line numbers in a program (denoted by l1,l2), the sets of live variables (denoted by vs1 and vs2), and the marking that represents Here clreturns 1 if the node l is marked, and 0 otherwise. (pre-execution) register status (denoted by m1 and m2) at terminal THEOREM 1(OPTIMIZATION THEOREM [26, 27]). 1 and 2, and returns a Boolean value. If the property p is finite mutumorphic and the weight function w is homomorphic, MMP specified by + checking (e , ((l1,l2,vs1,vs2), (m1,m2))) ∗ → = ch ((l1,l2, vs1,vs2), (m1,m2)) spec : SP SP − spec = mmp w p checking (e , ((l1,l2,vs1,vs2), (m1,m2))) = ch ((l ,l , vs ,vs ), (m ,m )) 6 1 2 1 2 1 2 This is not exactly true. In fact, all marked dead nodes except checking (2, ((l1,l2, vs1,vs2), (m1,m2))) for the two terminal nodes of the whole graph are counted twice. 
= ch ((l1,l2, vs1,vs2), (m1,m2))  checking (Sxy, ((l1,l2, vs1,vs2), (m1,m2))) count l vs m m = (  ,  )= = =( ∩ ) ∪ let m1 m2 getMarks x let V RV m vs defv l ( , )= ⊆  m1 m2 getMarks y in i f V RV m in checking x ∧ checking y then max|(V \RV m) ∩ vs| + |RV m \V |      ∧ m1 = m ∧ m2 = m ∧ m = m else i f |V | < n then |RV m \V | 1 2 2 1  checking (Pxy, ((l1,l2,vs1,vs2), (m1,m2))) else i f RV (m \\ m)=φ then 0 = (  ,  )= let m1 m2 getMarks x else 2 ( , )= m1 m2 getMarks y ∧ The function w counts the number of required LOAD/STORE at in checking x checking y + − ∧ =  ∧ =  ∧ =  ∧ =  each edge (uniquely represented by e or e ), and just sums up for m1 m1 m1 m1 m2 m2 m2 m2 ch ((l1,l2, vs1,vs2), (m1,m2)) recursive constructors S and P. = usev l1 ⊆ RV m1 ⊆ (vs1 ∪ usev l1) ∧ The intuition for count is: V is the set of variables that are placed on usev l ⊆ RV m ⊆ (vs ∪ usev l ) ∧ the registers after an instruction is executed. If V is not included in 2 2 2 2  |defv l1| + |RV m1 ∩ vs1 \defv l1|≤n ∧ the next register status m , then their difference must be stored and |defv l2| + |RV m2 ∩ vs2 \defv l2|≤n loaded. Between STORE and LOAD operations, we can reorder the getMarks (t,a)=snd a positions of variables by move operations. Assume V is included in the next register status m.If|V| < n, this means there exists ⊆ The judgment usev l1 RV m1 corresponds to the pre-condition of a register with , and we can reorder variables in V. Otherwise, the instruction l1 at terminal 1 in g, i.e., each variable used in the V = RV m, and if RV (m \\ m) = φ we need to make room by a | | + |( ∩ instruction must be in some register in m1, and defv l1 m1 pair of STORE and LOAD operations for reordering. vs ) \ defv l |≤n corresponds to the post-condition, i.e., m has 1 1 1 The above definition of the weight function w is homomorphic, and a room to write defined variables (defv l ); otherwise, some live 1 w is defined by foldSP as follows. variables in m1 except for those defined at l1 will be overwritten before stored. Notice the obvious optimizing conditions ψe+ a = let ((l1,l2,vs1,vs2), (m1,m2)) = a in count l1 vs1 m1 m2 RV m ⊆ vs ∪ usev l 1 1 1 ψe− a RV m ⊆ vs ∪ usev l 2 2 2 = let ((l1,l2,vs1,vs2), (m1,m2)) = a in count l2 vs2 m2 m1 ψ2 a = 0 in ch ((l ,l ,vs ,vs ),(m ,m )), which mean that live variables in 1 2 1 2 1 2 ψS xya = x + y registers are as many as possible. ψP xya = x + y The checking property is defined as finite mutumorphisms with the function getMarks. By tupling transformation, we get the following form: Applying Optimization Theorem, and Discussion With all the above, Theorem 1 automatically derives the solution = ◦ + − checking fst foldSP pe pe p2 pS pP for optimal register allocation. where At last, two points are worth remarking: 1. In real compilers, there are often practical requirements of p + (x, a)=(ch a, snd a) e hardware, such as, some instruction must use some specific p − (x, a)=(ch a, snd a) e registers, some register must be used together with some spe- p (x, a)=(ch a, snd a) 2 cific registers, or the result of some instruction must be writ- pS xya     ten in a different register. These requirements are hard for the = let (m ,m )=snd x (m ,m )=snd y (m1,m2)=snd a 1 2 1 2    conventional method [10], but our method is in ( fstx∧ fsty ∧ m = m ∧ m = m ∧ m = m , 1 1 2 2 2 1 easy to handle them by modifying the function checking. (m1,m2)) pP xya 2. We obtain an optimal register allocation without iteration. 
The = (  ,  )= ( , )= ( , )= let m1 m2 snd x m1 m2 snd y m1 m2 snd a core of the technique is dynamic programming on SP Terms. ( ∧ ∧ ( =  ∧ = ) in fstx fsty m1 m1 m1 m1 The cost to pay is huge marking space, which grows expo- ∧ ( =  ∧ = ), m2 m2 m2 m2 nentially to the number of registers. However, since checking (m1,m2)) can be judged locally (like forall in Haskell), most of marking is avoided by default. We expect demand-driven computa- Here we include obvious optimizing conditions tion.helps the situation.

RV m1 ⊆ vs1 ∪ usev l1 ⊆ ∪ 4 Analyzing Control Flow Graphs with RV m2 vs2 usev l2 Larger Tree Width In this section, we discuss how our method can be extended to wider to ch (l1,l2,vs1,vs2,m1,m2), which mean that live variables in reg- isters are as many as possible. class of programs. The example is again dead code detection; but for a program that has a control flow graph with tree width larger Weight Counts the Number of Required LOAD/STORE than 2. For simplicity, we mostly consider control flow graphs with The weight function w is defined as follows. tree width at most 3. Our construction is uniform and extension for larger tree width is straight forward, if we assume the description + on reachability among terminals. w (e , ((l1,l2,vs1,vs2), (m1,m2))) = count l1 vs1 m1 m2 − w (e , ((l1,l2,vs1,vs2), (m1,m2))) = count l2 vs2 m2 m1 Due to lack of space, we do not explain tree decomposition, which w (2, ((l1,l2,vs1,vs2), (m1,m2))) = 0 gives the original definition of tree width [25]. Instead, we treat a w (Sxy, ((l1,l2,vs1,vs2), (m1,m2))) = wx+ wy (di)graph with tree width at most k as a (di)graph denoted by an SP w (Pxy, ((l1,l2,vs1,vs2), (m1,m2))) = wx+ wy Term in SPk. 4.1 SP Terms G3 In Section 2.1, we show how an SP Term of a control flow graph 1 + ?3 of a flowchart program (i.e., a series-parallel graph) is computed in 2 4 G G linear time. In this section, we give definition of general SP Term 3 > } 2 G G 1 1 4 1 1I 2 (for graphs with larger tree width) and show that translation will be q ) ? ©- done in linear time. 6 2 6 6 3 6 2 ~ 1 2 3 } 2 2 4 4 3 DEFINITION 4. An SP Term is a pair of a ground term t and a 1 3 2 G2 G1 3 tuple (l(1),···,l(k)), defined as the following. >-?- 12 1 4 3 G I2  SPk :=(ek(i, j), (l(1),···,l(k))) (i = j) 1 ? | (k, (l(1),···,l(k))) G1 | ( ... , ( ( ),···, ( ))) Sk SPk SPk l 1 l k shift k G3 | (Pk SPk SPk, (l(1),···,l(k))) 1 + ?4 ( ),···, ( ) 2 5 Here, l 1 l k are labels, Pk is the parallel composition, and G3 G2 > } G4 G2 Sk is the series composition. 1 1 1 1I q ) ? ©- - SP Terms (ek(i, j),(l(1),···,l(k))) and (k,(l(1),···,l(k))) 6 6 6 4 6   3 ~3 4 } 2 5 5 4 are interpreted as k-terminal digraphs G, G with terminals 1 2 2 3 3 3 ( ( ),···, ( ))) G2 G1 4 l 1 l k and >-?- 23 2 5 4 ( )={ ( ),···, ( )}, ( )={( ( ), ( )}, I  V G l 1 l k E G l i l j G1 3 ? ( )={ ( ),···, ( )}, ( )=φ. V G l 1 l k E G G1 fuse i.e., k-nodes l(1),···,l(k) with one diedge from l(i) to l( j), and i.e., k isolated nodes l(1),···,l(k), respectively (See Fig. 8). 1 1 > } I q ) ©+- ? 6 6 ? 2 4 1 3 ~ 6 4 6 I5  i 2 k 12 >-?}- ? 23 3 3 remove N 4 j 5 1 1 > } I e (i, j) k q ) ©+-? k 6 6 ? 2 4 $ ~ 6 $ 6 $ ( , ) > } I ? Figure 8. Interpretation of constants ek i j and k 12 -?-  23 3 ( ··· , ( ( ),···, ( ))) Series composition Sk t1 tk l 1 l k is interpreted in S2 G1 G2 S3 G1 G2 G3 S4 G1 G2 G3 G4 3 steps. See Fig. 9 (In Fig. 9 and 10, a double circle expresses a  =( , ( ),···, ( )) Figure 9. Interpretation of series composition S2, S3, S4 terminal). Let ti ti li 1 li k . 1. Shift the numbering of terminals in ti, i.e., the j-th terminal to the j + 1-th terminal for each j with i ≤ j ≤ k. 1  I 2. Fuse each terminal of the same numbering and put a label N 1 K 2  match (l (i − 1),···,l − (i − 1),l + (i),···,l (i))  I R 1 i 1 i 1 k N © 1 K 2  3  to the i-th terminal if  I R N  © q match(l1(i − 1),···,li−1(i − 1),li+1(i),···,lk(i)) =⊥ . 2 3 4 G1 G2 G1 G2 G1 G2 3. Remove the last terminal (labeled $ in Fig. 9). 
P2 P3 P4 where match (l(1),···,l(k)) is an abbreviation of match (l(1),match (l(2),···,match (l(k − 1),l(k))···). 1 1 1  I  I  I ? ? ? Parallel composition (Pk t1 t2, (l(1),···,l(k))) is interpreted similar 2 K2 K2 R R to P in Section 2; fuse each terminal of the same numbering in t1 = (  , ( ),···, ( )) =( , ( ),···, ( )) P2 G1 G2 © © t1 l1 1 l1 k and t2 t2 l2 1 l2 k , and put a label 3 3  match (l (i),l (i)) to the i-th terminal if match (l (i),l (i)) =⊥. 1 2 1 2 P G G See Fig. 10. 3 1 2 q 4 Intuition behind is; like that the parallel composition pk constructs P4 G1 G2 any subgraph in a complete graph Kk, the series composition Sk combines such components and produces a clique of the size k + 1 Figure 10. Interpretation of parallel composition P2, P3, P4 (i.e., an embedding of Kk+1). ( ( , ), ( , )) EXAMPLE 1. In Fig. 9, the digraph s (G ,G ,G ) has tree width addLive e2 1 2 l1 l2 vs1 vs2 3 1 2 3 =( ( , ), ( , , ∪ ∪ ( \ ), )) 3, and G ,G .G have tree width 2. The SP Terms of G ,G ,G e2 1 2 l1 l2 vs1 usev l2 vs2 defv l2 vs2 1 2 3 1 2 3 ( ( , ), ( , )) are described as addLive e2 2 1 l1 l2 vs1 vs2 =(e2(2,1), (l1, l2, vs1, vs2 ∪ usev l1 ∪ (vs1 \defv l1))) addLive (2, (l ,l )) vs vs =(2, (l ,l , vs ,vs )) G1 = S3(P3(e3(2,3),e3(3,1)),P3(e3(1,2),e3(1,3)),3) 1 2 1 2 1 2 1 2 addLive (S xy, (l ,l )) vs vs G2 = S3(P3(e3(1,2),e3(1,3)),e3(3,1),3) 2 1 2 1 2 =(S (addLive y vs (use x ∪ (vs \def → x))) G3 = S3(3,e3(1,2), 2 1 2 2 2 1 (addLive x vs (use y ∪ (vs \def → y))), S3(P3(e3(1,2),e3(1,3)),P3(e3(1,2),e3(3,1)),3)) 2 2 1 2 1 (l1, l2, vs1, vs2)) addLive (P2 xy, (l1,l2)) vs1 vs2 REMARK 1. SP (series parallel graphs) allows several choices 2 =(P2 (addLive x for definition of the series composition. For instance, definition of (vs1 ∪ use1 y ∪ ((vs2 ∪ use2 x) \def1→2 y)) S in Section 2 is different from S here; they can be related as 2 (vs2 ∪ use2 y ∪ ((vs1 ∪ use1 x) \def2→1 y))) (addLive y = ( ( )). Sxy chT S2 x chT y (vs1 ∪ use1 x ∪ ((vs2 ∪ use2 y) \def1→2 x)) (vs2 ∪ use2 x ∪ ((vs1 ∪ use1 y) \def2→1 x))), S2 is the part of uniform way to define the series composition for (l1, l2, vs1, vs2)) each SPk; however, for readability, we used the simplified version S The alternative definition below contains some redundant computa- in Section 2. tion. This is because generality of the definition, if one considers to ∩ ∪ REMARK 2. The definition of series composition Sk and parallel extend to general k. Note that distributivity of wrt and inclusion \ ⊆ composition Pk are given by Arnborg, et.al. in a different aspect of like use2 g def1→2 g use1 g can absorb the differences. an algebraic construction of graphs with bounded tree width [3]. use (e (1,2), (l ,l )) = usev l Note that if once an SP Term t is given, tree decomposition [25] 1 2 1 2 2 use (e (2,1), (l ,l )) = φ of a corresponding graph is straightforward: a backborn tree T as 1 2 1 2 use (2, (l ,l )) = φ V(T)={s | s ⊆ t} and a covering X for s ∈ V (T) as the set of 1 1 2 s use g@(S xy, (l ,l )) terminals in s. 1 2 1 2 = use1 x ∪ (use1 y \def1→2 g) ∪ ((use2 x ∪ use2 y) \def1→2 x) ( ) ≤ THEOREM 2. Let G be a digraph with twd G k and use1 g@(P2 xy, (l1,l2)) | ( )|≥ ≥ V G k for k 2. Then, an SP Term is computed in linear =(use1 x ∪ use1 y) ∪ ((use2 x ∪ use2 y) \def1→2 g) time (wrt |V(G)|) such that its interpretation is a pair of k-terminal ˜ = ˜ digraph G and a tuple of k-terminals with G G by neglecting ter- addLive (e2(1,2), (l1,l2)) vs1 vs2 minals. 
=(e2(1,2), (l1, l2, vs1 ∪ usev l2 ∪ (vs2 \defv l2), vs2)) ( ( , ), ( , )) In general, deciding the tree width of a graph is NP-complete; how- addLive e2 2 1 l1 l2 vs1 vs2 =( ( , ), ( , , , ∪ ∪ ( \ ))) ever, for fixed k, whether a graph has tree width at most k is decid- e2 2 1 l1 l2 vs1 vs2 usev l1 vs1 defv l1 ( , ( , )) =(, ( , , , )) able in linear time [7, 24]. Fortunately, we already know the upper addLive 2 l1 l2 vs1 vs2 2 l1 l2 vs1 vs2 ( , ( , )) bound of the tree width of control flow graphs of some specific pro- addLive g@ S2 xy l1 l2 vs1 vs2 =( ( ∪ (( ∪ ) \ )) gramming languages. S2 addLive y vs1 use2 x vs2 use1 x def2→1x (addLive x vs use y ∪ ((vs ∪ use y) \def → y)), This shows the general method to compute an SP Term from a con- 2 2 1 1 2 1 (l , l , vs ,vs )) trol flow graph via tree decomposition. This is done in linear-time, 1 2 1 2 addLive g@(P2 xy, (l1,l2)) vs1 vs2 but not so efficient linear time. However, a direct translation (such  = let vs = vs1 ∪ ((vs2 ∪ use2 x ∪ use2 y) \def1→2 g) as trans in Section 2.1) from a program will be much more effi- 1 vs = vs2 ∪ ((vs1 ∪ use1 x ∪ use1 y) \def2→1 g) cient, because a control flow graph loses the parsing information of 2   in (P (addLive x (use y ∪ vs )(use y ∪ vs )) an original program. For a simple imperative language with GOTO, 2 1 1 2 2 (addLive y (use x ∪ vs )(use x ∪ vs )), such a translation is shown in Appendix A. 1 1 2 2 (l1, l2, vs1, vs2)) 4.2 Alternative Definition of Dead Code De- 4.3 Dead Code Detection for Larger Tree tection on SP2 Width Before the extension, we give the alternative definitions of the func- tions use1 g, use2 g, addLive g vs1 vs2 for SP2 in Section 2). Recall Used/Defined Variables in a Fragment in SP3 that the original definition is as follows (taking account into the modification of S instead of S2): usev l if i = 1 use (e (i, j), (l ,l ,l )) = j 1 3 1 2 3 φ otherwise use (e (1,2), (l ,l )) = usev l 1 2 1 2 2 use1 (3, (l1,l2,l3)) = φ use (e (2,1), )=φ 1 2 use1 g@(S3 xyz, (l1,l2,l3)) use (2, )=φ 1 = use1 y ∪ use1 z ∪ use (S xy, )=use y ∪ (use x \def → y) 1 2 1 2 1 2 ((use1 x ∪ use2 z) \def1→2 g) ∪ use (P xy, )=use x ∪ (use y \def → x) ∪ 1 2 1 2 1 2 ((use2 x ∪ use2 y) \def1→3 g) ∪ use y ∪ (use x \def → y) 1 2 1 2 ((use3 x ∪ use3 y ∪ use3 z) \def1→$ g) use1 g@(P3 xy, (l1,l2,l3)) → ( ( , ), ( , )) = def1 2 e2 1 2 l1 l2 defv l2 =(use1 x ∪ use1 y) ∪ → ( ( , ), )= def1 2 e2 2 1 Var ((use2 x ∪ use2 y) \def1→2 g) ∪ → ( , )= def1 2 2 Var ((use3 x ∪ use3 y) \def1→3 g) def1→2 (S2 xy, )=def1→2 y ∪ def2→1 x def1→2 (P2 xy, )=def1→2 x ∩ def1→2 y The basic idea is the same as that in use1 g for SP2 except ( , ( , , )) for the modification ((use x ∪ use y ∪ use z) \ def → g) in addLive g@ P3 xy l1 l2 l3 vs1 vs2 vs3 3 3 3 1 $ =  =( ∪ ∪ ) ∪ use g@(Sxyz, (l ,l ,l )). $ is the newly introduce symbol that let vs1 vs1 use1 x use1 y 1 1 2 3 (( ∪ ∪ ) \ ) ∪ represents the removed terminal in Sxyz, i.e., terminal 3 in x,y,z. vs2 use2 x use2 y def1→2 g (( ∪ ∪ ) \ ) For SP , def → (Sxy) coincides with def → x, because paths vs3 use3 x use3 y def1→3 g 2 1 $ 1 2  =(( ∪ ∪ ) \ ) ∪ from terminal 1 to terminal 2 in x are only paths from terminal 1 to vs2 vs1 use1 x use1 y def2→1 g (vs ∪ use x ∪ use y) ∪ $ing without loops. Thus, for SP2, the need for $ is hidden. 
2 2 2 ((vs3 ∪ use3 x ∪ use3 y) \def2→3 g)  vs =((vs1 ∪ use1 x ∪ use1 y) \def3→1 g) ∪ ( ( , ), ( , , )) 3 def1→2 e3 i j l1 l2 l3 ((vs2 ∪ use2 x ∪ use2 y) \def3→2 g) ∪ = = ( ∪ ∪ ) = defv l2 if i 1, j 2 vs3 use3 x use3 y ( (    )(    ), Var otherwise in P addLive x vs1 vs2 vs3 addLive y vs1 vs2 vs3 ( , , , , , )) def1→2 (3, (l1,l2,l3)) = Var l1 l2 l3 vs1 vs2 vs3 def1→2 (S3 xyz, (l1,l2,l3)) = def1→2 z ∩ Discussion (def1→2 y ∪ def2→1 x) ∩ (def1→3 z ∪ def3→1 x) ∩ Here, we present our study only for SP3, i.e., tree width at most 3. (def1→3 y ∪ def3→2 z) ∩ (def1→3 y ∪ def3→1 x) ∩ It is worth mentioning the analogy between defi→ j and reachi→ j, (def1→3 z ∪ def3→2 y ∪ def2→1 x) ∩ and the difficulty to extend to graphs with larger tree width is fo- (def1→2 y ∪ def2→3 x ∪ def3→2 z) cused on the reachability description among terminals. Current our def1→2 (P3 xy, (l1,l2,l3)) description is not parametric wrt tree width, i.e., we must describe, = def1→2 x ∩ def1→2 y ∩ say, dead code detection for each SPk. However, we have a basic (def1→3 y ∪ def3→2 x) ∩ (def1→3 x ∪ def3→2 y) feel that there would be some generic skeleton-like structure regard- def1→$ (Sxyz, (l1,l2,l3)) ing reachability. That is, with the description of reachability for SPk = def1→3 y ∩ def1→3 z ∩ and the description of an analysis for SP2, we can generate the de- (def1→2 y ∪ def2→3 x) ∩ (def1→3 z ∪ def3→3 x) ∩ scription of an analysis for general SPk. (def1→2 y ∪ def2→1 x ∪ def2→3 z) ∩ Of course, with the increase of tree width, the number of functions (def1→2 z ∪ def1→2 x ∪ def2→3 y) rapidly grows. But, recall that most JAVA programs (and possi- bly other imperative programs) have a control flow graph with tree The definition of def1→2 in SP3 is quite complex especially for ∪,∩ ∧,∨ width at most 3 [16]. Thus, even for relatively small tree width, our Sxyz. The intuition can be obtained by replacing with , method would cover quite large portion of real programs. respectively. Then, by setting values for base cases as True if (i, j)=(1,2) 5 Related Work reach → (e (i, j), (l ,l ,l )) = 1 2 3 1 2 3 False otherwise Many researches have been devoted to the declarative approaches reach1→2 (3, (l1,l2,l3)) = False to program analyses. Steffen and Schmidt [31, 30] showed that tem- poral logic is well suited to describe data dependencies and other the same definition gives us the judgment of reachability from ter- properties exploited in classical optimization. Lacey, minal 1 to terminal 2 in g. et.al. [21] formalized program optimization as rewriting systems with temporal logic side conditions (described in CTL-FV) and Live Variables Detection for SP3 shows that CTL-FV plays a crucial role in the proofs of correct- Now, we give definition of addLive to detect live variables for SP3. ness of classical optimizations. Instead of temporal logic, de Moor, et.al. [12] proposed another functional approach to control flow ( ( , ), ( , , )) addLive g@ e3 1 2 l1 l2 l3 vs1 vs2 vs3 analyses. Their specification language is the regular path condition, =( ( , ), e3 1 2 but the efficiency of derived programs is not discusses. (l , l , l , vs ∪ usev l ∪ (vs \defv l ), vs , vs )) 1 2 3 1 2 2 2 2 3 There are several functional approaches for computation on graphs. addLive (3, (l ,l ,l )) vs vs vs 1 2 3 1 2 3 For instance, Fegaras and Sheard [14] treat graphs with embed- =(3, (l , l , l , vs , vs , vs )) 1 2 3 1 2 3 ded functions, i.e., graphs are treated as functions that generates addLive g@(S3 xyz, (l1,l2,l3)) vs1 vs2 vs3  all paths in a graph. 
Erwig introduces the active pattern matching, = let vs =(vs ∪ use y ∪ use z) ∪ 1 1 1 1 which is a conditional pattern matching mechanism [13]. Their ap- ((vs ∪ use x ∪ use z) \def → g) ∪ 2 1 2 1 2 proaches are interesting in description, but the existence of strong ((vs ∪ use x ∪ use y) \def → g) ∪ 3 2 2 1 3 side conditions limits the chance to optimize. Instead, we restrict ((use3 x ∪ use3 y ∪ use3 z) \def → g)  1 $ ourselves to graphs with bounded tree width, in which many NP- vs =((vs ∪ use y ∪ use z) \def → g) ∪ 2 1 1 1 2 1 hard graph problems are solved in linear-time [11, 9]. (vs2 ∪ use1 x ∪ use2 z) ∪ The concept of a graph with bounded tree width [25] independently ((vs3 ∪ use2 x ∪ use2 y) \def2→3 g) ∪ appeared from early 80’s; partial k-tree in terms of cliques, some ((use3 x ∪ use3 y ∪ use3 z) \def → g)  2 $ vs =((vs ∪ use y ∪ use z) \def → g) ∪ algebraic construction of k-terminal graphs [4, 3], and in terms of 3 1 1 1 3 1 separators, and they are all equivalent. The class of graphs with ((vs2 ∪ use1 x ∪ use2 z) \def3→2 g) ∪ (vs ∪ use x ∪ use y) ∪ bounded tree width is quite restrictive; but the significant tread- 3 2 2 off is: The class of graphs with bounded tree width frequently ((use3 x ∪ use3 y ∪ use3 z) \def → g)  3 $ vs =((vs ∪ use y ∪ use z) \def → g) ∪ has a linear time algorithm for graph problems. For graphs with 1 1 1 $ 1 bounded tree width, there have been lots of work on automatic gen- ((vs2 ∪ use1 x ∪ use2 z) \def$→2 g) ∪ ((vs ∪ use x ∪ use y) \def → g) ∪ eration of linear-time algorithms from specification in monadic sec- 3 2 2 $ 3 ond order formulae, which are frequently NP-complete for general ((use3 x ∪ use3 y ∪ use3 z) ( (   )(   ) graphs [11, 9]. in S addLive x vs2 vs3 vs addLive y vs1 vs3 vs (   ), Our starting observation is that most programs (without spaghetti addLive z vs1 vs2 vs (l1, l2, l3, vs1, vs2, vs3)) GOTO) have control flow graphs with bounded tree width, and many control flow analyses can be specified in temporal logic (such only if the left-hand-side figure in Fig. 11 is contained [17]. How- as CTL-FV). By combining them, it seems easy to obtain (almost) ever, that graph is easy to treat from tree width point of view; it is linear-time algorithm for control flow analyses. This is true in the- described in SP2 as ory, but not in practice; each existence of quantifiers in a formula ( , )( ( ( ( , ) ( , )) ( , )) ( , )) causes the exponential explosion of the constant factor. Our ap- S2 e2 1 2 P2 S2 P2 e2 1 2 e2 2 1 e2 1 2 e2 2 1 proach is, directly write functional specification on the simple data (with terminal 1 at the top and terminal 2 at the rightmost node). structure, SP Term. This approach drastically reduces the constant In contrast, a complete directed acyclic graph (DAG) with m-nodes factor [27]. Further, an SP Term is more approachable especially (as in the right-hand-side figure in Fig. 11) is described in SPm, from programming point of view, and it does not refuse to capture proportional to the size m. However, any DAG is reducible. better algorithmic ideas. For an algebraic construction of graphs, one of the early work for flowchart scheme is found in [29]. Bauderon and Courcelle [4] are ? ? also pioneers, and our SP Term is greatly in debt to the work by ¬ reducible reducible ∈ SP2  -^ ∈ SP2  -^N Arnborg, et.al. [3]. However, their constructions do not fit to our k purpose; for instance, the construction by Arnborg, et.al. [3] re- Figure 11. 
Difference between reducible flow graph and SP quires the recursive constructors li ,r ,s , p with 1 ≤ i ≤ j ≤ k for j j j j Term graphs with tree width at most k. Thus, the number of their re- cursive constructors becomes k(k + 1)(k + 2)/6, and this makes us difficult to write recursive definitions. We proposed another con- 6 Conclusion and Future Work struction, SP Term, which has only 2 recursive constructors Sk,Pk In this paper, we proposed an iterative-free approach to program regardless of the size of k. The number of constants ek(i, j) has analysis, based on the fact that control flow graphs of most prac- square growth, but they are interpreted as diedges from the i-th to tical programs are well structured. Our main contributions can be the j-th terminal. For these constants, writing functional specifica- summarized as follows. tion (base cases) is easy; even in uniform way. • We defined a simple but powerful algebraic construction of Thorup [33] showed that a structured imperative program have a digraphs called SP Terms, on which program analyses can be control flow graph with relatively small tree width. He also inves- naturally described as catamorphisms (or mutumorphism). As tigated on finding near optimal register allocation by the conven- catamorphism enjoys many nice algebraic rules such as fusion tional graph coloring on an intersection graph. It is well known and tupling for algorithmic optimization [6], this catamorphic that register allocation is equivalently reduced to the graph coloring formalization of program analyses makes it possible to sys- problem [10], which is known to be NP-complete. For precise solu- tematically derive efficient analysis algorithms, which has not tion, it seems pessimistic; Kannan and Proesbsting showed that the been really recognized so far. number of minimal coloring (thus deciding the minimum number of registers that can be allocated without spilling) is NP-complete • We identified that many program analyses can be considered even for SP2 (series parallel graphs) [19]. as the maximum marking problems. By making use of the However, if we further assume that the number of registers is fixed, optimization theorem for them, we are able to obtain efficient we can obtain an efficient solution. Bodlaender, et.al. showed a analysis algorithms. linear-time algorithm to decide whether a program can be executed • As demonstrated by two examples, our method is quite power- without spilling for a fixed number of registers [8]. This is elegant ful. In fact, many program analysis examples in the compiler in theory; however, their estimation includes the blow up of tree textbook can be cast into this framework. width of an intersection graph of a family of subgraphs. Thus, their This research is just at the beginning, and there are lots of subjects constant factor explodes. to conquer. Our method based on Optimization Theorem could also have a huge • As pointed in Section 3, the table for dynamic programming constant factor, which can grow to the power of the number of live technique can easily explode. Although our method drasti- variables. However, there is possibility to tame it. For instance, we cally improves constant factor compared to starting from for- could expect the number of live variables at each program point are mulae [27], still this is quite true. However, we only need the not so large, and most of markings would be avoided immediately. 
computation that can reach to the result satisfying given con- These observation suggest that, in practice, there seems a room to straints, and, from our experience, computation in most part improve the constant factor drastically by demand-driven computa- of the table does not contribute to obtain such results. There- tion and other program transformation techniques, which are avail- fore, we hope demand-driven computation will improve the able for functional programs. For instance, Ohori proposed another situation, and would like to confirm it by experiments. register allocation by proof transformation on typed assembly lan- guage, which reduces the number of candidates of optimal register • Currently, our description of analyses is not parametric wrt allocations [22]. The combination with such methods would be tree width k. However, as Section 4 suggests, the reachability worth exploring. description would work as a generic skeleton-like structure. We should mention another classical efficient solution under cer- We have a strong feel about it, but it must be more concrete. tain restriction of control flow graphs: reducible flow graph [2, 1]. • The set of SP Terms is an initial algebra; however, a k-terminal If a program has a reducible control flow graph, one can con- graph may have multiple representations by SP Terms. This struct n (log n) algorithms for program analyses, such as com- means whether the user defined functional specification is mon subexpression detection. Knuth also showed that most FOR- consistent with the interpretation of SP Terms to k-terminal TRAN programs have reducible control flow graphs by an empirical graphs is up to the user’s responsibility. For instance, in Fig. 5, study [20]. different occurrences in the SP Term of a node in the control SP Term is independent to the concept of a reducible flow graph; flow graph have the same set of live variables. This is guar- for instance, Hecht and Ullman showed a graph is reducible if and anteed by user’s semantic consideration. From its own the- oretical interest and possible better support, we hope to give the complete axiomatization of SP Terms under this interpre- [16] J. Gustedt, O.A. Mæhle, and A. Telle. The of Java tation. programs. In Proc. 4th Workshop on Algorithm Engineering and Experiments, ALENEX 2002, pages 86–97, 2002. Lecture Notes in Computer Science, Vol. 2409, Springer-Verlag. Acknowledgments The authors thank Oege de Moor and Jeremy Gibbons for stimu- [17] M.S. Hecht and J.D. Ullman. Characterizations of reducible lating discussions during their visit at the University of Tokyo, and flow graphs. Journal of the ACM, 21(3):367–375, 1974. thank Aki Takano for his helpful suggestions. We also thank anony- [18] Z. Hu, H. Iwasaki, M. Takeichi, and A. Takano. Tupling cal- mous referees and Fritz Henglein for their valuable comments and culation eliminates multiple data transversals. In Proc. 2nd suggestions. Last but not least, we thank Masato Takeichi for his ACM SIGPLAN International Conference on Functional Pro- continuous support. gramming, pages 9–11. ACM Press, 1997. [19] S. Kannan and T. Proebsting. Register allocation in structured 7 References programs. Journal of Algorithms, 29:223–237, 1998. [1] A. Aho, R. Sethi, and J.D. Ullman. Compilers Ð Principles, [20] D.E. Knuth. An empirical study of FORTRAN programs. Techniques, and Tools. Addison-Wesley, 1986. Software Practice and Experience, 1(2):105–134, 1971. [2] F.E. Allen. Control flow analysis. ACM SIGPLAN Notices, [21] David Lacey, Neil D. 
This research is just at its beginning, and there are many subjects still to conquer.

• As pointed out in Section 3, the table used by the dynamic programming technique can easily explode. Although our method drastically improves the constant factor compared with starting from plain formulae [27], the problem remains. However, we only need the computations that can reach a result satisfying the given constraints, and, in our experience, most of the computation in the table does not contribute to such a result. We therefore expect demand-driven computation to improve the situation, and would like to confirm this by experiments.

• Currently, our descriptions of analyses are not parametric with respect to the tree width k. However, as Section 4 suggests, the reachability description could work as a generic, skeleton-like structure. We strongly believe this, but it must be made concrete.

• The set of SP Terms is an initial algebra; however, a k-terminal graph may have multiple representations as SP Terms. This means that it is the user's responsibility to ensure that a user-defined functional specification is consistent with the interpretation of SP Terms as k-terminal graphs. For instance, in Fig. 5, different occurrences in the SP Term of a single node of the control flow graph must have the same set of live variables; this is guaranteed only by the user's semantic considerations. Both for its theoretical interest and for the better support it could provide, we hope to give a complete axiomatization of SP Terms under this interpretation.

Acknowledgments

The authors thank Oege de Moor and Jeremy Gibbons for stimulating discussions during their visit to the University of Tokyo, and thank Aki Takano for his helpful suggestions. We also thank the anonymous referees and Fritz Henglein for their valuable comments and suggestions. Last but not least, we thank Masato Takeichi for his continuous support.

7 References

[1] A. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.

[2] F.E. Allen. Control flow analysis. ACM SIGPLAN Notices, 5(7):1–19, 1970.

[3] S. Arnborg, B. Courcelle, A. Proskurowski, and D. Seese. An algebraic theory of graph reduction. Journal of the Association for Computing Machinery, 40(5):1134–1164, 1993.

[4] M. Bauderon and B. Courcelle. Graph expressions and graph rewritings. Mathematical Systems Theory, 20:83–127, 1987.

[5] R. Bird. Maximum marking problems. Journal of Functional Programming, 11(4):411–424, 2001.

[6] R. Bird and O. de Moor. Algebra of Programming. Prentice Hall, 1996.

[7] H.L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing, 25(6):1305–1317, 1996.

[8] H.L. Bodlaender, J. Gustedt, and J.A. Telle. Linear-time register allocation for a fixed number of registers. In Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, SODA 1998, pages 574–583. ACM Press, 1998.

[9] R.B. Borie, R.G. Parker, and C.A. Tovey. Automatic generation of linear-time algorithms from predicate calculus descriptions of problems on recursively constructed graph families. Algorithmica, 7:555–581, 1992.

[10] G.J. Chaitin. Register allocation & spilling via graph coloring. In Proc. ACM Symposium on Compiler Construction, pages 98–105. ACM Press, 1982.

[11] B. Courcelle. Graph rewriting: An algebraic and logic approach. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 5, pages 194–242. Elsevier Science Publishers, 1990.

[12] O. de Moor, D. Lacey, and E. Van Wyk. Universal regular path queries. To appear in Higher-Order and Symbolic Computation, 2002.

[13] M. Erwig. Functional programming with graphs. In Proc. 1997 ACM SIGPLAN International Conference on Functional Programming, pages 52–65. ACM Press, 1997. SIGPLAN Notices 32(8).

[14] L. Fegaras and T. Sheard. Revisiting catamorphisms over datatypes with embedded functions (or, programs from outer space). In Proc. 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 284–294. ACM Press, 1996.

[15] M. Fokkinga. Tupling and mutumorphisms. The Squiggolist, 1(4), 1989.

[16] J. Gustedt, O.A. Mæhle, and J.A. Telle. The treewidth of Java programs. In Proc. 4th Workshop on Algorithm Engineering and Experiments, ALENEX 2002, pages 86–97, 2002. Lecture Notes in Computer Science, Vol. 2409, Springer-Verlag.

[17] M.S. Hecht and J.D. Ullman. Characterizations of reducible flow graphs. Journal of the ACM, 21(3):367–375, 1974.

[18] Z. Hu, H. Iwasaki, M. Takeichi, and A. Takano. Tupling calculation eliminates multiple data traversals. In Proc. 2nd ACM SIGPLAN International Conference on Functional Programming, pages 164–175. ACM Press, 1997.

[19] S. Kannan and T. Proebsting. Register allocation in structured programs. Journal of Algorithms, 29:223–237, 1998.

[20] D.E. Knuth. An empirical study of FORTRAN programs. Software Practice and Experience, 1(2):105–134, 1971.

[21] D. Lacey, N.D. Jones, E. Van Wyk, and C.C. Frederiksen. Proving correctness of compiler optimizations by temporal logic. In Proc. 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 283–294. ACM Press, 2002.

[22] A. Ohori. Register allocation by program transformation. In Programming Languages and Systems, 12th European Symposium on Programming, ESOP 2003, pages 399–413, 2003. Lecture Notes in Computer Science, Vol. 2618, Springer-Verlag.

[23] L. Perković and B. Reed. An improved algorithm for finding tree decompositions of small width. In P. Widmayer et al., editors, Proc. WG'99, pages 148–154, 1999. Lecture Notes in Computer Science, Vol. 1665, Springer-Verlag.

[24] L. Perković and B. Reed. An improved algorithm for finding tree decompositions of small width. International Journal of Foundations of Computer Science, 11(3):365–371, 2000.

[25] N. Robertson and P.D. Seymour. Graph minors. II. Algorithmic aspects of tree-width. Journal of Algorithms, 7:309–322, 1986.

[26] I. Sasano, Z. Hu, and M. Takeichi. Generation of efficient programs for solving maximum multi-marking problems. In Proc. 2nd International Workshop on Semantics, Applications, and Implementation of Program Generation, pages 72–91, 2001. Lecture Notes in Computer Science, Vol. 2196, Springer-Verlag.

[27] I. Sasano, Z. Hu, M. Takeichi, and M. Ogawa. Make it practical: A generic linear-time algorithm for solving maximum-weightsum problems. In Proc. 5th ACM SIGPLAN International Conference on Functional Programming, pages 137–149. ACM Press, 2000.

[28] I. Sasano, Z. Hu, M. Takeichi, and M. Ogawa. Derivation of linear algorithm for mining optimized gain association rules. Computer Software, 19(4):39–44, 2002.

[29] H. Schmeck. Algebraic characterization of reducible flowcharts. Journal of Computer and System Sciences, 27(2):165–199, 1983.
[30] D.A. Schmidt. Data flow analysis is model checking of abstract interpretations. In Proc. 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 38–48. ACM Press, 1998.

[31] B. Steffen. Data flow analysis as model checking. In Theoretical Aspects of Computer Software, volume 526 of Lecture Notes in Computer Science, pages 346–364. Springer-Verlag, 1991.

[32] K. Takamizawa, T. Nishizeki, and N. Saito. Linear-time computability of combinatorial problems on series-parallel graphs. Journal of the Association for Computing Machinery, 29:623–641, 1982.

[33] M. Thorup. All structured programs have small tree width and good register allocation. Information and Computation, 142:159–181, 1998.

A Computing SP Terms directly from Imperative Programs

To show the direct translation from an imperative program with GOTO to an SP Term, we define a simple imperative language. The definition, given in Figure 12, is similar to that in [21], except for the additional while construct. For simplicity, this language has no exceptions or procedures.

    P ::= I; P                      program
    I ::= l : C                     instruction
    C ::= x := e                    assignment
        | input x                   input statement
        | output x                  output statement
        | if e then P else P fi     conditional statement
        | while e do P od           while loop
        | goto l                    goto statement
        | break                     break statement
    l : label

    Figure 12. A Simple Imperative Language
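The grammar of Figure 12 maps directly onto an algebraic datatype. The following Haskell rendering is our own illustration; the constructor names and the minimal expression language are assumptions, not definitions from the paper.

    type Label = String
    type Var   = String

    -- A minimal expression language, just enough to be concrete.
    data Expr = EVar Var | ELit Int
      deriving Show

    -- One constructor per production of C in Figure 12.
    data Cmd
      = Assign Var Expr      -- x := e
      | Input  Var           -- input x
      | Output Var           -- output x
      | If Expr Prog Prog    -- if e then P else P fi
      | While Expr Prog      -- while e do P od
      | Goto Label           -- goto l
      | Break                -- break
      deriving Show

    type Instr = (Label, Cmd)  -- l : C
    type Prog  = [Instr]       -- I; P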
Let us consider a program in this language with at most n GOTOs. The translation transG to an SP Term is given below. The basic idea is this: first construct an SP Term while ignoring the GOTOs, memorizing their source nodes as additional terminals; then scan the SP Term again and add, for each goto, an edge by parallel composition at some subterm in which the destination node eventually becomes a terminal. (Note that each node of a control flow graph becomes a terminal of some subterm of an SP Term.) Note also that if each block has at most m gotos then, with m in place of n (the total number of gotos), we can similarly transform a control flow graph into an SP Term in SPm+2.

Let prog be a program written in the language of Fig. 12. As in Section 2.1, we first preprocess prog to lprog by labeling each line of prog. Let ((so1, des1), ···, (son, desn)) be the tuple of the n pairs of source and destination nodes of the gotos in lprog (we assume soi ≠ desi for each i). Let lprog′ be the program obtained from lprog by replacing each goto with the null command skip. Then lprog′ can be regarded as a flowchart program as in Section 2.1, and the desired SP Term is

    addG (nlift (trans lprog′))

where the functions addG, nlift, and trans are defined below. The function nlift inserts the labels of the goto statements as n new terminals between the first and the second (original) terminals of an SP Term; permT cyclically permutes the terminals, except for the last one, to adapt to series composition; and addG then adds an edge between the source and destination nodes of each goto statement.

    permT :: SPn+2 → SPn+2
    permT (Sn+2 x1 ··· xn+2, (l1, ···, ln+2))
      = (Sn+2 (permT xn+1) (permT x1) ··· (permT xn) (permT xn+2),
         (ln+1, l1, ···, ln, ln+2))
    permT (Pn+2 x y, (l1, ···, ln+2))
      = (Pn+2 (permT x) (permT y), (ln+1, l1, ···, ln, ln+2))
    permT (en+2(i, j), (l1, ···, ln+2))
      = (en+2(perm i, perm j), (ln+1, l1, ···, ln, ln+2))
    permT (n+2, (l1, ···, ln+2))
      = (n+2, (ln+1, l1, ···, ln, ln+2))

    perm :: Nat → Nat
    perm m = if m == n+2 then m
             else if m == n+1 then 1
             else m+1

    addG :: SPn+2 → SPn+2
    addG (Sn+2 x1 ··· xn+2, (l1, ···, ln+2))
      = addE (Sn+2 (addG x1) ··· (addG xn+2), (l1, ···, ln+2))
    addG (Pn+2 x y, (l1, ···, ln+2))
      = addE (Pn+2 (addG x) (addG y), (l1, ···, ln+2))
    addG (en+2(i, j), (l1, ···, ln+2))
      = addE (en+2(i, j), (l1, ···, ln+2))
    addG (n+2, (l1, ···, ln+2))
      = addE (n+2, (l1, ···, ln+2))

    addE :: SPn+2 → SPn+2
    addE x@(t, (l, so1, ···, son, l′))
      = if (l == desj) || (l′ == desj)
        then (Pn+2 x (en+2(soj, desj), (l, so1, ···, son, l′)),
              (l, so1, ···, son, l′))
        else x

(In addE, j ranges over the goto statements; the edge for goto j is attached at a subterm whose first or last terminal is the destination desj.)
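The definitions above are schematic in that n occurs free. As a small aid, here is a directly runnable rendering of perm (our adaptation, not the paper's), with n passed explicitly:

    -- perm rotates the terminal indices 1 .. n+1 cyclically and keeps
    -- the last terminal n+2 fixed; n is passed explicitly here.
    perm :: Int -> Int -> Int
    perm n m
      | m == n + 2 = m      -- the last terminal is left in place
      | m == n + 1 = 1      -- the (n+1)-th terminal wraps around to 1
      | otherwise  = m + 1  -- all other terminals shift up by one

For n = 2, it maps the terminal indices (1, 2, 3, 4) to (2, 3, 1, 4), rotating all but the last.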

    nlift :: SP2 → SPn+2
    nlift (e+, (l1, l2)) = (en+2(1, n+2), (l1, so1, ···, son, l2))
    nlift (e−, (l1, l2)) = (en+2(n+2, 1), (l1, so1, ···, son, l2))
    nlift (2, (l1, l2))  = (n+2, (l1, so1, ···, son, l2))
    nlift (S2 x y, (l1, l2))
      = (Sn+2 (permT (nlift x))
              (n+2, (l1, so2, so3, ···, son, l2, $))
              (n+2, (l1, so1, so3, ···, son, l2, $))
              ···
              (n+2, (l1, so1, so2, ···, son−1, l2, $))
              (nlift y),
         (l1, so1, ···, son, l2))
    nlift (P2 x y, (l1, l2))
      = (Pn+2 (nlift x) (nlift y), (l1, so1, ···, son, l2))
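Finally, the essential step of the second phase, attaching a goto edge by a single parallel composition at a subterm where the destination is a terminal, can be seen in miniature on the illustrative SP datatype from earlier; the indices below are schematic choices of ours.

    -- Miniature of addG/addE on the illustrative SP type: if, in the
    -- subterm t, the goto's source is terminal 2 and its destination
    -- is terminal 3, the goto edge is attached by parallel composition.
    attachGoto :: SP -> SP
    attachGoto t = P t (E 2 3)

All the bookkeeping in nlift, permT, and addE above serves to ensure that such a composition happens exactly at a subterm whose terminal tuple contains the destination label.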