A Loop Optimization Technique Based on Quasi- Invariance

Litong Song1, Yoshihiko Futamura1, Robert Glück1, Zhenjiang Hu2 1 Dept. of Information and Computer Science, Graduate School of Science and Engineering, Waseda University, Okubo 3-4-1, Shinjuku-ku, Tokyo 169-0072, Japan, E-mail: {slt, futamura}@futamura.info.waseda.ac.jp, [email protected] 2 Dept. of Information Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan, E-mail: [email protected]

Techniques for code motion are basically limited Abstract to loop-invariant computations, this can be a problem Loop optimization plays an important role in in automatically produced programs. Consider the optimization and program transformation. digital circuit in Fig. 1, which is simulated by the Many sophisticated techniques such as loop- program in Fig. 2, where the simulation is carried out invariance code motion, loop restructuring and loop for T steps. We use f, g and gi to denote functions fusion have been developed. This paper introduces a simulating blocks. novel technique called loop quasi-invariance code y motion. It is a generalization of standard loop- invariance code motion, but based on loop quasi- f invariance analysis. Loop quasi-invariance is similar to standard loop-invariance but allows for a finite g number of iterations before computations in a loop become invariant. In this paper we define the notion x1 x2 x3 of loop quasi-invariance, present an algorithm for statically computing the optimal unfolding length in g1 g2 g3 While-programs and give a transformation method. Our method can increase the accuracy of program analyses and improve the efficiency of programs by c1 c2 c3 making loops smaller and faster. Our technique is Fig. 1. A sequential digital circuit. well-suited as supporting transformation in , partial evaluators, and other program transformers. while t

1 if t

x3:=g3(x2,c3); unfolding length for a loop, x2:=g2(x1,c2); • a static quasi-invariance analysis for computing the y:=f(g(x1,x2,x3),y); optimal unfolding length for any given loop in a t:=t+1; program which allows to remove all quasi-invariant if t

2 There are two assignments to variable x inside the endwhile. loop. After one iteration variable x in assignment Fig. 7 An original loop. “x:=1” becomes invariant, but we can not move the assignment out of the loop because the value of x in Using SSA technique, the loop is transformed into the first use after the assignment would become equal SSA form (Fig. 8). Note that we assume any loop in to the value of x defined in assignment “x:=i+2” SSA form is in the form of while [ss1] e do ss2 which is not correct. endwhile, where ss1 is a series of inserted The purpose of the SSA form is to represent data assignments which should be executed before testing flow properties of a program in a normalized form. the entry condition and ss2 is the body of the loop Many compilers use SSA or intermediate after SSA transformation. representation where a program is transformed into φ SSA form, optimized, and then transformed back to while [ i1:= (i0,i2); φ the original syntax. We follow the same approach. x1:= (x0,x2); Let us summarize the SSA form. First, the variable φ y1:= (y0,y12); on the left-hand side of each assignment is given a φ u1:= (u0,u2); unique name, and all of its uses are renamed φ z1:= (z0,z2); correspondingly (Fig.5). Second, if the use of a φ w1:= (w0,w2); variable x can be reached by two or more definitions, ≤ ] i1 0 do which maybe the case after a conditional, a special w:=w +103×x 3+102×x 2+10×x +1; form of assignment called a φ-function, is added at 2 1 1 1 1 x2:=u1+1; the join point. The operands of the φ-function if even(i1) then y2:=z1+1; else y3:=v; endif; indicate which assignments to x reach the join point. φ y4:= (y2,y3); Subsequent uses of x become uses of φ-function y5:=x2; value. We call the values assigned by φ as -function if z1>0 then y6:=1; virtual variables. This is illustrated in Fig. 6. if z1>v then y7:=z1+1; endif; φ y8:= (y6,y7); x:= 1; … := x+2; x1:= 1; … := x1+2; endif; x:= 3; … := x+4; x2:= 3; … := x2+4; φ y9:= (y5,y8); Fig.5. Straight-line code and its SSA form. if z1>1000 then y10:=y9+1; else y11:=i1; endif; φ y12:= (y10,y11); if e then x:= 1; if e then x1:= 1; u2:=z1–1; else x:= 2; endif else x2:=2; endif; z:=v+1; φ 2 x3:= (x1, x2); i2:=i1+1; Fig. 6 An if-statement and its SSA form. endwhile. Fig. 8 The SSA form of the loop in Fig. 7. An efficient algorithm that converts a program into SSA form and works essential linear in the size of the 3. Variable Dependency Graph original program has been proposed in [6]. The The loop transformation we want to perform depends following example (Fig. 7) briefly explains the on the loop quasi-invariant variables and the unfolding transformation. We are not interested in what the length of loops. Before we define these notions, we program does, only in illustrating the SSA form. introduce two variable dependency relations and variable dependency graph to perform our quasi- while i≤0 do 3 3 2 2 invariance analysis. First, we assume a loop contains w:=w+10 ×x +10 ×x +10×x+1; only assignment, then we extend our discussion to x:=u+1; conditionals and nested loops. Here and in the if even(i) then y:=z+1; else y:=v; endif; remainder of this paper, we assume that all source y:=x; programs are represented in SSA form. if z>0 then y:=1; if z>v then y:=z+1; endif endif; 3.1 Variable Dependency Graph if z>1000 then y:=y+1; else y:=i; endif For any assignment v:=e, we assume that the value u:=z–1; of v directly depends on all variables used in e. We z:=v+1; can define a variable dependency relation as follows. i:=i+1; Definition 1 (∠ relation). Let o be a loop, x, y be two

3 variables assigned to inside o. If the value of x that x will be turned into loop-invariant variable after depends on that of y, then the dependency relation IL(x,o) iterations of o. between x and y, is denoted by x∠y (called ∠ Proposition 1. Let o be a loop and x be LQIV of o. If relation). x(n) is used to represent the value of x after n Within relation ∠, we can distinguish another iterations of o, then ∀n≥IL(x,o).(x(n+1)=x(n)). variable dependency relation. The proposition is proven by induction over n; proof omitted. Based on the invariant lengths of the LQIV Definition 2 (! relation). Let o be a loop, x, y be two variables defined inside o, and x∠y. If y is assigned variables inside a loop, we can directly infer how many to inside o after being used by x inside o, then the times the loop should be unfolded. dependency relation between x and y, is denoted by Definition 6 (Unfolding length). Let o be a loop. If there exists LQIV variable inside o, then the x!y (called ! relation). To formalize loop quasi-invariance, we introduce a unfolding length of o UL(o)=def max{IL(x,o) | x is directed graph called variable dependency graph. LQIV and not a virtual variable of o}. Definition 3 (Variable dependency graph). Let o be a For Example, in the program of Fig. 2, we have loop. The variable dependency graph (VDG) of o is LQIV variables x1, x2, x3 and IL(x1,o)=1, IL(x2,o)=2, a directed graph where Node(o)={x | variable x is IL(x3,o)=3, UL(o)=3. Virtual variables are newly inserted variables and defined in o} and Edge(o)={xy | (y∠x)∧¬(y x)}∪ ! they will be removed at unfolding time. Therefore, ➙ {x y | y!x}. the invariant lengths of virtual variables are ignored ∠ For example, for the loop in Fig. 2, we have the in the definition of the unfolding length. ∠ ∠ ∠ ∠ ∠ ∠ ∠ relations: {x2 x1, x3 x2, y x1, y x2, y x3, y y, t t}, According to Definition 4, there exists no circular the ! relations: {x2!x1, x3!x2, y!y, t!t}, and the VDG path ending in a LQIV node x. Therefore, the number ∠ in Fig. 9, where relations and ! are indicated by thin of edges in all paths ending in x is finite, and the and thick edges, respectively. Note that there is only one invariant length of x is finite. This ensures the assignment to one variable in Fig. 2, for concision we termination of the analysis and loop unfolding. After ignore the SSA form of Fig. 2 but only use the original a loop o is unfolded UL(o) times, all LQIV variables form of Fig. 2. in the residual loop of o become invariant and can be removed from the residual loop.

x1 4. Conditionals in Loops x2 x3 The variable dependency relation defined above is induced by assignment statements. As in other static t y analysis techniques, conditionals induce new variable dependencies. New dependencies are induced between the variables used in the test of a conditional Fig. 9 The variable dependency graph of Fig. 2. and the variables defined in the branches of the conditional. We distinguish two cases:

3.2 Loop Quasi-Invariant Variable CASE 1: if op(y1,…,yn) then … xi:=…; … Based on VDG, we now give a formal definition of else … xj:=…; … endif; loop quasi-invariant (LQIV) variable. φ xk:= (xi, xj); Definition 4 (LQIV). Let o be a loop. For any node x Whether the assignment to xi and xj can be on the VDG of o, variable x is loop quasi-invariant removed from the conditional depends on the (LQIV) variable of loop o, if among all the paths variables used in expression op(y1,…,yn). If one of the ending in x, there is no path which contains a node variables y1,…,yn is not LQIV, which means that the that is a node on a circular path. value of op(y1,…,yn) may change at run time, none of Furthermore, we can give a formal definition of the assignments to xi and xj can be removed even if invariant length of LQIV variable. they would otherwise be LQIV variables. This kind Definition 5 (Invariant length). Let o be a loop. For of dependency relation between xi, xj and y1,…,yn can any LQIV node x (x is a LQIV variable of o) on the be expressed by adding dependency relations VDG of o, the invariant length of x IL(x,o)=def ∠ ∠ between xi, xj and y1,…,yn: xj y1,…,xj yn; 1+max{n | n=the number of ➙ edges on a path ending ∠ ∠ xi y1,…,xi yn. We say, these relations are induced in node x}. by the conditional. For any LQIV variable x of a loop o, IL(x,o) means CASE 2: xj:=…;

4 ∠ ∠ if op(y1,…,yn) then … xi:=…; … x1 i, x1 j; ∠ ∠ ∠ ∠ else … endif; x2 i, x2 j, x2 k, x2 x1 φ ∠ ∠ ∠ ∠ xk:= (xi, xj); x3 i, x3 j, x3 k, x3 x1; This is similar to CASE 1, except that the assignment ∠ ∠ ∠ x4 i, x4 j, x4 x1 to xj is laid out of the conditional but not in one branch. We now present the VDG (Fig. 10) of the program For the same reason as in CASE 1, we need to establish in Fig. 8, where white nodes indicate variant the dependency relations between xi, xj and y1,…,yn: variables, grey nodes indicate LQIV variables, and ∠ ∠ ∠ ∠ xi y1, …, xi yn; xj y1, …, xj yn; names of virtual variables are written italic. Relations Before x becomes loop-invariant, assignment ∠ j and ! are shown as thin and thick edges, xi:=… can not be removed. (otherwise xk will always respectively. According to Fig. 10, LQIV variables be equal to xj, which is not correct.) Therefore we add and their invariant lengths can be easily derived. dependency relation x ∠x . (we also say, this relation i j LQIV variables: {z1, z2, x1, x2, y5, y6, y7, y8, y9, y10, u1, is induced by the conditional.) u2}; IL(z1)=2, IL(z2)=1, IL(x1)=4, IL(x2)=3, IL(y5)=3, After we get the variable dependency relations IL(y6)=3, IL(y7)=3, IL(y8)=3, IL(y9)=3, IL(y10)=3, induced by the conditionals in a loop, we must transfer IL(u1)=3, IL(u2)=2. Because x1 is a virtual variable, the same relations from virtual variables to their unfolding length UL(o)=3. φ operands via the corresponding -function. This is For any loop in the form of while [ss1] e do ss2 necessary because virtual variables will be removed endwhile, only the variables defined in ss1 depend on afterwards, (as discussed in Section 6.) and the the variables that will be statically assigned dependency relations (from virtual variables to other afterwards. For example, in Fig. 8, y1 depends on y0 variables) induced by conditionals are actually the ones and y12, where y12 will be assigned afterwards. between their operands and other variables. Moreover, if the operands of virtual variables are also virtual variables, the dependency relations have to be y10 y12 y1 transferred further. We now define how the dependency relations induced by conditionals in a loop o are transferred via virtual variables. For any a dependency y9 y8 y11 relation x∠y directly induced by the conditionals like

CASE 1 and CASE 2, we use a function called CS to y5 y7 evaluate the above-mentioned transitive closure of relation x∠y. y6 { } if x and y are identical ∪ CSo(x1, y) CSo(x2, y) w2 x2 z1 z2 y4 y3

CSo(x, y)=def if x is a virtual variable of o φ defined by x:= (x1,x2) ∠ w x u u2 y2 i i2 {x y } otherwise 1 1 1 1 Let Cond_Rel(o) be the set of variable dependencies directly induced by the conditionals in loop o, (as Fig. 10 The variable dependency graph of Fig. 8. defined in CASE 1 and CASE 2.) and Vars(o) be the set of all variables defined in o. Then we define VS(o) (the 5. An Algorithm for LQIV Analysis set of all the relations derived from Cond_Rel(o), LQIV analysis consists of two phases. The first including Cond_Rel(o)) as follow: phase detects the dependency relations between the   variables defined in a loop, the second phase finds   all LQIV variables, computes their invariant lengths VS(o)=def  CS o(x, y) x∈∈Vars(o)"" y∈Vars(o)∧x∠y Cond_ Re l(o)  and the loop’s unfolding length. For example, let a loop contain the conditional: The second phase is the heart of the analysis. The x:=1; algorithm for the second phase is given below. It is 1 based on the classical algorithms by Warshall [22] if i>j then if k>5 then x2:=2; else x3:=3; endif; x:=φ(x ,x ); endif; and Floyd [8]. The time complexities of Warshall 4 2 3 algorithm and Floyd algorithm are O(n3) in the worst x:=φ(x ,x ); 5 1 4 case where n is the number of variables in the given We can derive all the dependency relations induced loop. The first phase is not shown here; it can be by the conditional as follows: done while parsing a program.

5 We assume that there are n variables (denoted 6. Loop Transformation ∠ with v1,…,vn) in a loop o, the relations and ! The purpose of loop transformation is to remove all between v1,…,vn have been stored in a Boolean n×n LQIV variables from a given loop. The unfolding ∠ length computed by the algorithm in the previous matrix R and an integer n×n matrix R!, respectively, ∠ section tells us how many iterations are needed where for any two variables vi and vj, if vi vj then before all LQIV variables become invariant. The R∠[i,j]=1 else R∠[i,j]=0 and if v v then R [i,j]=1 i! j ! loop transformation is based on decomposing the else R [i,j]=0. ! loop into two loops where the first loop iterates Algorithm (for LQIV analysis to singular loop): UL(o) times to compute the values of the LQIV Input: R∠, R . ! variables and the second loop is the remained loop Output: the set (Lqivs) of all the LQIV variables, the after code motion. In the process of code motion, invariant lengths of these LQIV variables, and variable renaming is necessary. the unfolding length (UL) of o. Begin 6.1 A Note on Variable Renaming { Based on Warshall algorithm, computing all the When generating code for a loop represented in SSA LQIV variables assigned to inside loop o. } form, we have to remove all assignments to virtual R1∠:=R∠; variables and to rename their uses correspondingly. for i=1 to n do Below we define how to rename variables. for j=1 to n do For any virtual variable x, (assume it is defined as if R1∠[j,i]=1 φ then for k=1 to n do x:= (y,z).) all the uses of x are actually the uses of y or z. If x:=φ(y,z) is removed then all the uses of x will R1∠[j,k]:=R1∠[j,k]∨R1∠[i,k]; endfor become undefined. Therefore, we must use same endif name for x, y and z so that all uses of x will naturally endfor endfor become the uses of y and z. We are going to discuss ∀ ∀ → ≠ the problem in the following 2 cases. Lqivs:={ i | i(1≤i≤n) j(1≤j≤n).(R1∠[i,j]=1 R1∠[j,j] 1)}; { Based on Floyd algorithm, computing the invariant CASE 1: y or z is also virtual variable. lengths of all LQIV variables and the unfolding length of When y or z is a virtual variable, we have the same o. Remark: all the variables in Lqivs and all dependency situation as with x and its operands should also be relations among these variables can be viewed as a renamed using the same name. The process continues directed graph, where any node represents a LQIV recursively until no new virtual variables are met. variable and any edge between two nodes represents ∠ CASE 2: x is an operand of another virtual variable. or relation between the two variables. Therefore, the When x is an operand of another virtual variable ! φ invariant length of any LQIV variable is actually (e.g., w:= (v,x)), by the same reason, w, v and x equivalent to the longest path ending in the variable, if should also be renamed using the same name. The the length of any ∠ edge is defined as 0 and that of any process continues recursively until no new virtual edge is defined as 1. } variables are met. ! We now define which variables should be renamed R1 :=R ; ! ! using the same name in a loop o. For this purpose, we for any i∈Lqivs do ∈ define function RE which determines the set of for any j Lqivs do variables that should be renamed using the same name. for any k∈Lqivs do {x} if x is not a virtual variable if R1∠[j,i]=1∧R1∠[i,k]=1∧R1∠[j,k]=1 RE (x)= {x}∪RE (x )∪RE (x )∪RE’ (x) then if R1 [j,i]+R1 [i,k]>R1 [j,k] o def o 1 o 2 o ! ! ! if x is a virtual variable defined then R1 [j,k]:=R1 [j,i]+R1 [i,k]; endif; φ ! ! ! by x:= (x1,x2) endif; endfor endfor endfor { } if x is not an operand of another ∈ for any i Lqivs do IL[i]:=0; endfor; virtual variable ∈ for any i Lqivs do RE’ (x)= {y}∪RE (z)∪RE’ (y) ∈ o def o o IL[i]:=1+max{ R1![j,i] | j Lqivs }; if x is an operand of another endfor; virtual variable y defined by UL:= max{ IL[i]] | i∈Lqivs∧i∉virtual variables }; y:=φ(x,z) or y:=φ(z,x) End. For example, in Fig. 8 there are five φ-function φ φ φ assignments to y: y1:= (y0, y12), y4:= (y2, y3), y8:= (y6,

6 φ φ 3 3 2 2 y7), y9:= (y5, y8), y12:= (y10, y11). The variables in set w:=w+10×x +10 ×x +10×x+1; REo(y1)=REo(y12)={y1, y0, y12, y10, y11}, REo(y4)= {y4, y2, x:=u+1; y3}, REo(y8)=REo(y9)={y8, y6, y7, y9, y5} should be if even(i) then Y2:=z+1; else Y2:=v; endif; renamed using the same name respectively (e.g., y, Y2, Y3:=x; Y3). Similarly, the variables in each of the sets {x1, x0, if z>0 then Y3:=1; x2}, {i1, i0, i2}, {u1, u0, u2}, {z1, z0, z2}, and {w1, w0, if z>v then Y3:=z+1; endif; w2} should be renamed using the same names (e.g., x, endif; i, u, z, w). if z>1000 then y:=Y3+1; else y:=i; endif; u:=z–1; 6.2 Loop Transformation z:=v+1; In transforming a loop, there are two methods. On the i:=i+1; one hand, we can unfold a loop by adding unfolding counter:=counter+1; length copies (strictly, not full copies) of the loop to endwhile residual loop. On the other hand, these copies can while i≤0 do also be organized into a loop, and thus the residual w:=w+103×x3+102×x2+10×x+1; code never contains more than two loops. We discuss if even(i) then Y2:=z+1; else Y2:=v; endif; here only loop decomposition in the following. if z>1000 then skip; else y:=i; endif; i:=i+1; Code Motion by Loop Decomposition endwhile Concretely, if there is a loop o in Fig. 11(a), then it Fig. 12 The residual code of the loop in Fig. 8. can be transformed into two loops in Fig. 11(b). Remark: as a special case, if UL(o)=1 then a Compared with the original loop, in the residual conditional can be used instead of the first loop. code, some new variables are introduced and thus the other parts of the program containing the original while e do ss; endwhile; loop must be modified for consistence. For any (a). An original loop. variable (e.g., x) defined in an original loop, there φ must be an assignment like x1:= (x0,xn) in the corresponding SSA form and only x1 is visible to the counter=0; outside of the loop, where, x0 refers to the x defined while e∧(counter

ss1; the loop. According to variable renaming, x1, x0, xn counter:=counter+1; will be renamed with same name. If we use x for the endwhile; name, then the variable consistence problem above

while e do ss2; endwhile; can be easily solved. However, the other new (b). The transformed residual code. variables (e.g., Y2 and Y3) have to be declared. Fig. 11 The transformation based on LQIV code motion. To give an indication of the speedup, Table 2 shows

Note that ss1 indicates the full copy (except variable the runtime of the original loop and the residual code, 8 3 renaming) of ss, and ss2 indicates the ss after LQIV where i=–10 and x=y=z=u=v=w=10 . code motion. Table 2. The comparison of the runtime of the original Obviously, the space taken up by the residual code loop in Fig. 8 and the residual code in Fig. 12. will be less than twice the space taken up by the Gateway E5250 Sun ULTRA 5 original loop, and thus has a modest impact on the Pentium II xeon×2 SunOS 5.6 overall code size. This is useful when the optimal System Speed: 450MHZ Speed: 270MHZ unfolding length is large. Memory: 1GB Memory: 192MB Visual C++ 6.0 Gcc 2.7 For example, we can transform the loop in Fig. 8 into the residual code in Fig. 12 where expression Runtime Speed up runtime speed up 103×x3+102×x2+10×x+1 can be moved outside the Source 9sec 1 52sec 1 residual loop. A formalization for transformation is Result 3sec 3 30sec 1.73 given in Appendix 2.

counter:=0; 7. About Nested Loops while i≤0∧counter<3 do In the previous section we considered the treatment

7 of conditionals inside loops. Nested loops can be means any static analysis can in principle benefit from handled in a similar fashion. Here we give just a the loop optimization. short description because the methods introduced so To illustrate the effect on a concrete program far can also handle the analysis of nested loops. analysis, consider offline partial evaluation, a well- From the viewpoint of an outer loop o, only linear known program transformation method based on dependencies between the variables of an inner loop constant propagation which is controlled by a q are interesting. A variable which is variant in q separate binding-time analysis (BTA). (We assume may well be quasi-invariant in o (e.g., an inductive the reader is familiar with the principles of partial variable of q is variant in q, but can be quasi- evaluation; for more details see the book [13].) invariant with respect to o). This means we are not Let us consider a pointwise BTA where the interested in cyclic dependencies local to inner loops information computed by the analysis (“static”, when analysing o. Conceptually, this can be done by “dynamic”) is merged at each program point. In building a separate VDG for each loop o from a addition, let the BTA dynamize static values under program where all inner loops of o are viewed as dynamic control as done in the TEMPO specializer conditionals with empty else-branch (CASE 2 in for C [12]. In fact, the same technique is used when Section 4.), thereby avoiding the representation of inserting explicators in online partial evaluation (cf, cycles local to the inner loops. In practice, more [15]). efficient construction methods may exist. To compare the effect of loop quasi-invariant code motion on the analysis, we specialize the 8. Discussion of Applications programs in Fig. 2 and Fig. 3. For simplicity we A program analysis has to compute a safe shall treat functions f, g, g1, g2, g3 as primitive approximation of information about the dynamic operators at specialization time. behavior of a program. To compute such Consider specializing the programs with respect information an analysis must consider all possible to the static values c1=c2=c3=true. The residual computation paths and determine a safe programs are shown in Fig. 14(a) and (b) approximation of the collected information at each respectively. The difference should be obvious. In point where control flow meets (usually done by a particular, observe that static value of g=true is least upper bound/greatest lower bound operation). inlined in the residual loop Fig. 14(b) (possibly This situation is illustrated in Fig. 13(a) for the triggering further static computations in a loop). program in Fig. 2 where information A and B is This is possible because all computations of LQIV merged to form a safe approximation C at the join variables have been moved out of the loop. Since all point. As long as information is propagated along LQIV variables are invariant in the loop, their static code sequences without join points, no such values be treated as constants inside the loop even approximation occurs. though the BTA strategy dynamizes static values under dynamic control.

A while t

8 then y:=f(true,y); conditionals and computes the optimal unfolding t:=t+1; length (e.g., to make maximal use of known while t

9 1994. 4): 213-232, 1993. [3] Bacon D. F., Graham S. L., Compiler [15] Meyer U., Techniques for partial evaluation of transformations for high-performance imperative language, Symposium on Partial computing, ACM Computing Surveys, Vol.26, Evaluation and Semantics-Based Program No.4, 345-420, December 1994. Manipulation, (Sigplan Notices, vol. 26, no.9, [4] Bodik R., Gupta R., Soffa M. L., Complete September 1991), 94-105, ACM Press,1991. removal of redundant expressions, Proceeding [16] Nielson F., Nielson H. R., Hankin C., of the ACM Conference on Programming Principles of Program Analysis. Springer- Language Design and Implementation, 1-14, Verlag 1999. ACM Press 1998. [17] Rosen B. K., Wegman M. N., Zadeck F. K., [5] Bulyonkov M. A., Kochetov D. V., Practical Global value numbers and redundant aspects of specialization of Algol-like computations. Conference Record of the 15th programs, Danvy O., Glück R., Thiemann P. ACM Symposium on Principles of (eds.), Partial Evaluation. Proceedings. LNCS, Programming Languages, 12-27, ACM Press, Vol. 1110, 17-32, Springer-Verlag 1996. 1988. [6] Cytron R., Ferrante J., Efficiently computing [18] Song L. T., Futamura Y., Quasi-invariant static single assignment form and the control variable and loop unfolding, The 15th National dependence graph, ACM TOPLAS, Vol. 13, Conference of Software Society of Japan, B6-2, No. 4, 451-490, October, 1991. 221-224, Sept. 1998. [7] Cytron R., Lowry A., Zadeck F. K., Code [19] Steffen B., Property oriented expansion, motion of control structures in high-level Symposium on Static Analysis, LNCS 1145, 22- languages, Conference Record of the 13th ACM 41, Springer-Verlag 1996. Symposium on Principle of Programming [20] Steffen B., Knoop J., Rüthing O., The value flow Languages, 70-85, ACM Press 1986. graph: a program representation for optimal [8] Floyd R. W., Algorithm 97: shortest path, program transformations, Jones N. D. (ed.) Comm. ACM 5:6, 345, 1962. ESOP’90, LNCS 432, 389-405, Springer-Verlag [9] Glenstrup A. J., Jones N. D., BTA algorithms to 1990. ensure termination of off-line partial evaluation, [21] Warshall S., A theorem on Boolean matrices, Bjørner D., Broy M., Pottosin I. V. (eds.), Journal of the ACM, 9(1): 11-12, January 1962. Perspectives of System Informatics. [22] Wegman M. N., Zadeck F. K., Constant propag- Proceedings. LNCS, Vol. 1181, 273-284, ation with conditional branches, ACM TOPLAS, Springer-Verlag 1996. Vol. 13, No. 2, 181-210, April, 1991. [10] Glenstrup A., Makholm H., Secher J. P., C- [23] Zima H., Chapman B., Supercompiler for Parallel Mix: specialization of C programs, Hatcliff J., and Vector Computers, Frontier, Series, ACM Mogensen T., Thiemann P. (eds.), Partial Press, 1990. Evaluation: Practice and Theory, LNCS, Vol.

1706, 108-153, Springer-Verlag 1999. Appendix 1 (The definitions of g1, g2, g3, g, f, t and T)

[11] Gupta R., Berson D. A., Fang J. Z., Path profile g1(c1)=c1, ¬ ∧ ∨ ∧ guided partial redundancy elimination using g2(x1,c2)=( x1 c2) (x1 c2), ¬ ∧ ∧ ∨ ∧¬ ∧ ∨ ∧ ∧ speculaton, IEEE International Conference on g3(x1,x2,c3)=( x1 x2 c3) (x1 x2 c3) (x1 x2 c3), ¬ ∨ ∨ ∧ ∨¬ ∨ ∧ ∨ ∨¬ Computer Languages, 230-239, IEEE Society g(x1,x2,x3)=( x1 x2 x3) (x1 x2 x3) (x1 x2 x3), Press 1998. f(x,y)=x∧y, [12] Hornof L., Noyé J., Accurate binding-time t=1, analysis for imperative languages: flow, context, T=108. and return sensitivity, Symposium on Partial Evaluation and Semantics-Based Program Appendix 2 (The algorithm for transformation)

Manipulation, 63-73, ACM Press 1997. Let us assume: there is a loop of the form while [ss1] [13] Jones N. D., Gomard C. K., Sestoft P., Partial e do ss2 endwhile, R indicates the store memorizing Evaluation and Automatic Program Generation, the new names of the variables defined in the loop, L Prentice Hall, 1993. indicates the set of all the LQIV variables, U [14] Metzger R., Stroud S., Interprocedural constant indicates the unfolding length of the loop. Then the propagation: an empirical study, ACM Letters algorithm of loop transformation is given by the on Programming Languages and Systems, 2(1- rules as follow:

10

: − ⇒ Loop (R)| E : constant constant (R)|− :e ⇒ e' − E ⇒ (R)| SS1 :ss2 ss2' (L,R)|− : ss ⇒ ss '' SS 2 2 2 − ⇒ − ⇒ (R) | E : v R(v) (U)| Loop : while [ss1 ] e do ss2 endwhile counter: = 0; − ⇒ while e' ∧ counter < UL do (R)| E :ei ei' = + ss2'; count: counter 1 ; (R)|− : p(e ,...,e ) ⇒ p(e ',...,e ') endwhile E 1 n 1 n while e' do ss2'' endwhile;

Statements: (R)|− :s ⇒ s' −S1 ⇒ (R)| SS1 :ss ss' − ⇒ (R)| SS1 : s; ss s';ss'

(L,R)|− :s ⇒ s' −S 2 ⇒ (L,R)| SS2 :ss ss' − ⇒ (L,R)| SS2 : s; ss s';ss'

− = φ ⇒ (R)| S1 : v: (...) skip

− ⇒ (R)| E :e e' − = ⇒ = (R)| S1 : v: e R(v): e'

− = φ ⇒ (L,R)| S 2 : v: (...) skip

∈ v L − = ⇒ (L,R)| S 2 : v: e skip

∉ − ⇒ v L (R)| E :e e' − = ⇒ = (L,R)| S 2 : v: e R(v): e'

Conditional: − ⇒ (R)| E :e e' (R)|− :ss ⇒ ss ' −SS1 1 ⇒ 1 (R)| SS1 :ss2 ss2' − ⇒ (R)| S1 : if e then ss1 else ss2 endif if e' then ss1'else ss2' endif

− ⇒ (R)| E :e e' (L,R)|− :ss ⇒ ss ' −SS1 1 ⇒ 1 (L,R)| SS 2 :ss2 ss2' − ⇒ (L,R)| S1 : if e then ss1 else ss2 endif if e' then ss1'else ss2' endif

Expression:

11