Iterative-Free Program Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Iterative-Free Program Analysis Mizuhito Ogawa†∗ Zhenjiang Hu‡∗ Isao Sasano† [email protected] [email protected] [email protected] †Japan Advanced Institute of Science and Technology, ‡The University of Tokyo, and ∗Japan Science and Technology Corporation, PRESTO Abstract 1 Introduction Program analysis is the heart of modern compilers. Most control Program analysis is the heart of modern compilers. Most control flow analyses are reduced to the problem of finding a fixed point flow analyses are reduced to the problem of finding a fixed point in a in a certain transition system, and such fixed point is commonly certain transition system. Ordinary method to compute a fixed point computed through an iterative procedure that repeats tracing until is an iterative procedure that repeats tracing until convergence. convergence. Our starting observation is that most programs (without spaghetti This paper proposes a new method to analyze programs through re- GOTO) have quite well-structured control flow graphs. This fact cursive graph traversals instead of iterative procedures, based on is formally characterized in terms of tree width of a graph [25]. the fact that most programs (without spaghetti GOTO) have well- Thorup showed that control flow graphs of GOTO-free C programs structured control flow graphs, graphs with bounded tree width. have tree width at most 6 [33], and recent empirical study shows Our main techniques are; an algebraic construction of a control that control flow graphs of most Java programs have tree width at flow graph, called SP Term, which enables control flow analysis most 3 (though in general it can be arbitrary large) [16]. to be defined in a natural recursive form, and the Optimization The- Once a graph has bounded tree width, we can construct a graph in orem, which enables us to compute optimal solution by dynamic an algebraic way [3, 4]. This suggests that finding a fixed point programming. would be computed by recursive traversals on the algebraic struc- We illustrate our method with two examples; dead code detection ture, and the optimal solution would be obtained with a dynamic and register allocation. Different from the traditional standard it- programming. erative solution, our dead code detection is described as a simple Unfortunately, the existing results are not sufficient for our purpose. combination of bottom-up and top-down traversals on SP Term. For instance, the algebraic construction of graphs with bounded tree Register allocation is more interesting, as it further requires opti- width treats only undirected graphs [3]. This problem can be eas- mality of the result. We show how the Optimization Theorem on ily coped with, but a more serious problem is that it has too many SP Terms works to find an optimal register allocation as a certain recursive constructors, k(k + 1)(k + 2)/6 for tree width k, which dynamic programming. makes it hard to write recursive definitions over it. This paper proposes a new algebraic construction SP Term of graphs Categories and Subject Descriptors with bounded tree width, and a new method to analyze programs D.1.1 [Applicative (Functional) Programming]: Functional Pro- through recursive graph traversals instead of iterative procedures, gramming; D.1.2 [Automatic Programming]: Program Genera- based on the fact that most programs (without spaghetti GOTO) tion; D.3.3 [Language Constructs and Features]: Programming have well-structured control flow graphs. with Graphs Our main theoretical result (Theorem 2) is that a (directed) graph G can be represented by an SP Term in SPk if and only if G has tree General Terms width at most k (and has at least k nodes). Note that SP Term con- struction reduces the number of recursive constructors to 2 (regard- Algorithms, Language less of the size of tree width k), at the cost of increase of k2 − k + 1 constants. These constants express either diedges from the i-th spe- Keywords cial node (called terminal) to the j-th, or a graph with no edges, and Program Analysis, Control Flow Graph, Register Allocation, Tree they can be treated in a uniform way. This makes writing recursive Width, SP Term, Dynamic Programming, Catamorphism. definitions on SP terms feasible. We illustrate our methodology with two examples: dead code detec- tion and register allocation. Different from the traditional standard iterative solution, our dead code detection is described as a simple combination of bottom-up and top-down traversals on an SP Term. Register allocation is more interesting, as it further requires opti- mality of the result. We solve it as an instance of maximum marking Permission to make digital or hard copies of all or part of this work for personal or problems [26, 27, 5]; mark the nodes of a control flow graph under classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation a certain condition such that the sum of weight of marked nodes is on the first page. To copy otherwise, to republish, to post on servers or to redistribute maximum (or, minimum). We make use of Optimization Theorem to lists, requires prior specific permission and/or a fee. from our previous work [26, 27], and show how it works to find an ICFP’03, August 25–29, 2003, Uppsala, Sweden. optimal register allocation as a certain dynamic programming on Copyright 2003 ACM 1-58113-756-7/03/0008 ...$5.00 1 1 1 SP Terms. l1 l1 l1 ( +,( , )) ( −,( , )) 6 The rest of the paper is organized as follows. We start by an Constants e l1 l2 ? e l1 l2 2 l2 l2 l2 overview of our basic idea in Section 2, through an example of 2 2 2 dead code detection on a simple flowchart program without GOTO. Its control flow graph has tree width at most 2, i.e., the class of q ) q ) series-parallel graphs. 6 6 S 6 6 Series composition 2 1 l l ~ 2 ~ 2 Section 3 presents an optimal register allocation with the fixed num- 1 2 1 1 l1 l2 l1 l2 ber of registers for a flowchart program. The core of our technique G1 G2 S2 G1 G2 is Optimization Theorem [26, 27], which automatically gives an ef- ficient solution for maximum marking problems by certain generic dynamic programming. The advantage and the problem of our l 1 l 1 match(l1,l ) 1 1 P 1 method are also briefly discussed. Parallel composition I I N ? 2 ( , ) Section 4 introduces the general definition of SP Term, and demon- l2 2 l2 match l2 l2 strates how to extend dead code detection to a program that has a G1 G2 P2 G1 G2 control flow graph with larger tree width. We show that once the Figure 1. Interpretation of e+, e−,2,S, and P. reachability description is given, the description of dead code de- tection will be uniformly extended to larger tree width. Section 5 discusses related work, and Section 6 concludes the pa- ( , ) 1 per. Throughout the paper, we consider only intra-procedural con- tuple l1 l2 of labels, defined as the following. trol flow analyses (0-CFA), and describe algorithms in Haskell-like + SP2 :=(e , (l1,l2)) notations. − | (e , (l1,l2)) | (2, (l1,l2)) | (SSP2 SP2, (l1,l2)) | (PSPSP , (l ,l )) 2 Dead Code Detection without Iteration 2 2 1 2 In this section, we explain our idea through a simple case study, An SP Term is interpreted as a pair of a 2-terminal series-parallel di- dead code detection of flowchart programs. This class of graphs graph and a tuple of 2-labels; a 2-terminal digraph is a digraph with corresponds to the control flow graphs of structured (in strict sense) a tuple of two nodes, called terminals. We can regard the first ter- programs, i.e., programs that consist of single-entry and single-exit minal as the single-entry, and the second terminal as the next node blocks. of the single-exit. Labels (l1,l2) are the identifiers of terminals. Let ( , ) The syntax of flowchart programs is described below. At the end of match l l be the function that returns the whole program, the end statement is assumed to be added. l if l = l or l = ∗ l if l = ∗ ⊥ Prog := x := e assignment otherwise | input x input statement | (i.e., accept the special label ∗ as a wild card during matching). output x output statement + | The constant (e , (l1,l2)) is interpreted as a diedge from the first Prog; Prog sequence − | if e then Prog else Prog fi conditional statement terminal to the second terminal, (e , (l1,l2)) as a diedge from | while e do Prog od while loop the second to the first terminal, and 2 as two isolated terminals. ( ,( , )) ( ,( , )) The series composition S t1 l1 l2 t2 l1 l2 fuses the sec- ( , ) =⊥ ond terminal in t1 and the first terminal of t2 if match l2 l1 , Our key to the dead code detection without an iterative procedure is and regard the first terminal in t1 as the first and the sec- ond terminal in t2 as the second. The parallel composition the algebraic construction of control flow graphs, called SP Term. ( ,( , )) ( ,( , )) P t1 l1 l2 t2 l1 l2 fuses each first and second terminals in After translation from a flowchart program to an SP Term, we show ( , ), ( , ) =⊥ ( , ) t1 and t2 if match l1 l1 match l2 l2 , and label match l1 l1 how to compute the sets of used and newly defined variables in on the first terminal and match(l ,l ) on the second terminal.