Compiler Optimization Verification and Maintenance
Total Page:16
File Type:pdf, Size:1020Kb
Compiler Optimization Verification and Maintenance Abstract 1. checking whether each optimization phase produces a legal intermediate representation. Thus it will not af- Due to the complexity of a compiler, it is difficult for com- fect to other phases and, it will produce the correct out- piler developers to prove the correctness of output of in- put, termediate representations or profiling information manu- 2. checking whether annotations between phases are cor- ally after applying optimizations. Moreover, it is important rect and produce the correct profiling information for to choose good test cases to test the optimization code’s developers, and changes, however, it is not simple to build separated test cases to cover every possible optimizations. This proposal 3. generating a small but effective test case from real ap- proposes automatic techniques to prove the correctness of plications to test optimization that developers have just an IR, the correctness of its annotations for correct pro- applied. filing information, and to generate test cases according to the kinds of optimizations. Moreover, the test case size can 2. Background be reduced by using bug isolation techniques. We focus our techniques on loop nest optimizations in the GNU C com- This section explains the IR, the optimization phase, and piler with SPEC-CPU2006 benchmark test suites. the bug isolation technique targeted by this proposal. 2.1. Intermediate Representation 1. Introduction An IR(Intermediate Representation) is an internal form to represent the code being analyzed and translated by a Applying the proper compiler optimizations to improve compiler [13]. In most cases, optimizing a program means application’s performance is as important as writing good that we optimize and transform the IR of a code instead of code. For compiler users, the most important reason to use the source code itself. Thus, it is important to check the cor- compiler optimizations is to improve performance under rectness of the IR during such optimizations. Different com- certain metrics. Thus, a compiler writer’s key concern is pilers use different formats of IR such as graphical IR, lin- to write efficient and reliable optimizations that produce a ear IR, or hybrid IR using both graphical and textual infor- more efficient program without producing incorrect results. mation. Particularly, compiler writers should insure that each phase The SSA(Static Single Assignment) form is one of the which has new optimizations that produce legal intermedi- hybrid IRs [13]. In this form, we can have only one single ate representations and it does not affect other phases or the definition for the each variable uses. This simplifies analy- correctness of the output of compilation. However, a com- sis for the compiler and eventually it helps compiler writ- piler is one of the most complex software frameworks, thus ers to apply optimizations. Nowadays, several open source it is neither easy nor simple to prove the correctness of the compilers are using SSA form as an IR, such as, the SGI compiler manually. Alongside the intermediate representa- compiler [3], the JikesRVM [2], the GCC compilers version tions between two phases of the compiler, there are some 4.0 and above [1], and so on. annotations which are visible to each phase, but invisible to compiler writers. Moreover, it is not easy to choose proper 2.2. Loop Nest Optimization Phase test cases reflecting the characteristics of each optimization to help compiler developers to insure the correctness of their Loop Nest Optimization (LNO) performs transforma- changes effectively and accurately. tions on a loop nest for optimizations [5]. This phase does The major goal of this proposal is to build an optimiza- not build any control flow graph and optimizations are tion development maintenance tool that helps compiler de- driven by data dependency analysis. LNO includes loop un- velopers by: rolling, loop fusion, loop fision, loop interchange, and so on. Compiler Verification for(i=0; i<5; i++) for(j=0; j<10; j++) for(j=0; j<10; j++) for(i=0; i<5; i++) a[i][j] = i+j; Loop Interchange a[i][j] = i+j; Dynamic Approaches Static Approaches Translation Compiler Translation Validation Instrumentation Validation Figure 1. Example of Loop Interchange Safety Correctness Correctness & Safety Correctness for (i=0; i<100; i++) for (i=0; i<100; i+=2) Soundness { { Test Case Symbolic stmt1; stmt1; stmt2; Runtime Safety Generation Debugging stmt2; Loop unrolling stmt1; stmt2; Testing } with factor = 2 } Test Case Symbolic Generation Simulation Execution Optimization Simulation Optimization Runtime Runtime Analysis Analysis Testing Testing Figure 2. Example of Loop Unrolling with Fac- tor 2 Figure 3. Diagram of the State of the Art in For example, it is good to exchange the outer loop and the Compiler Verification and Validation inner loop like Figure 1 by using loop interchange for bet- ter locality. Another example is to use unroll loops to make loop bodies bigger to improve the performance like Figure Translation validation is one of methods that is used to 2. This proposal only focuses on this phase to prove the cor- test the correctness of a compiler both in static and dynamic rectness of the output IR and annotations. methods, but more frequently in dynamic approaches. It does not require any compiler modification. Without com- 2.3. Automated Bug Finder piler instrumentation, independent tools are used to analyze the compiler’s results and check the correctness of results. This technique is introduced in LLVM (Low Level Vir- There are two projects related to translation validation - the tual Machine) used to reduce test case size by throwing out verifix projects and the TVOC projects. The Verifix projects portions of a program which are checked to be correct. This have proposed techniques and formalisms for compiler re- will locate the portions of the program that might cause the sult checkers, decomposition of compilers and notions of crash and simplifies the debugging process[4]. In a similar semantical equivalence of source and target programs [10]. way, since we only focus on loop optimization in this pro- The TVOC projects are built on translation validation. It posal, we find the proper loop body affected by such opti- proves the correctness of compiler components instead of mizations and generate test cases. the entire compiler. The input of their tool is an intermedi- ate representation in which several optimizations have been 3. Related Work applied and the output shows whether applied optimizations are valid or not. Figure 3 shows the diagram of the state of the art in com- piler verification and validation. There are two approaches - 3.1. Static Approaches static and dynamic approaches. Static approaches do not in- volve any runtime information. In the beginning of compiler Static approaches have been used to prove the safety, verification research, many people used static approaches, correctness, and both. Xu et al. [49] determines statically only focused on specific features of the code and proved the whether it is safe for untrusted machine code to be loaded correctness manually. For example, McCarthy et al. [37] is into a host system. Denney et al. [16] demonstrated a frame- considered to be the first work in compiler verification and work to prove the safety property and the correctness of pro- they used static approaches that only focused on arithmetic grams statically. expression. There are several approaches to prove correctness of pro- Dynamic approaches involve runtime information from grams statically using translation validation. Benton [8] in- either the compiler or external tools. These approaches have terpreted program properties as relations, rather than pred- been developed by modifying a compiler infrastructure or icates. They used static analyses for imperative programs using translation validation. and they expressed and proved the correctness of optimiz- ing transformation using elementary logical and denotation execution for the GNU C compiler. They manu- techniques. Cousot et al. [14] describe how they prove the ally compared the intermediate representation of correctness by using source-to-source transformation with programs between each pass and prove the cor- abstract interpretation. rectness of semantics. This work is done manually, Some people used symbolic debugging information to thus it has the limitation to apply to bigger applica- prove the correctness of a program statically. McNerney tions. We also focus on the correctness of intermediate [38] implements a specialized program equivalence prover representation, but our approach uses a different ap- using abstract interpretation within functional blocks in- proach to prove the correctness. Instead of focusing stead of tree comparison to validate the register alloca- on rules, we focus on real instance of intermedi- tion of compiler. Hennessy [23] checked the correctness ate representation and prove the correctness of them of optimization by using symbolic debugging. Coutant et automatically. Our approach also proves the correct- al. [15] developed DOC, a prototype solution for Debug- ness of annotations for the correct profiling informa- ging Optimized Code. DOC is a modified C compiler and tion which helps compiler developers in the debugging source level symbolic debugger. Brooks et al. [11] intro- process. duced new compiler-debugger interfaces to provide visual There are several approaches to prove com- feedback and debugging of optimized code. piler correctness by using semantic equivalence [24][50], different approaches in translation valida- 3.2. Dynamic Approaches tion [19][20][22], prove carrying code techniques [40], mechanical verification [28][17], stage compi- Most research for compiler validation and verification lation frameworks [43], and certificate proof or cer- are dynamic approaches. We can classify these into two tifying compilation [21][33][10]. Some approaches main approaches - one is using compiler instrumentation, used formal proof of the correctness of specific com- another is using translation validation without any compiler piler optimizations or issue specific compiler prob- modification.