Program Analysis for Compiler Validation ∗

Anna Zaks
New York University
251 Mercer Street
New York, New York
[email protected]

Amir Pnueli
New York University
251 Mercer Street
New York, New York
[email protected]

∗This research has been supported in part by a grant from the Microsoft Phoenix Academic Program and the NSF CSR–EHS grant CNS-0720581.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PASTE'08, Atlanta, Georgia USA
Copyright 2008 ACM 978-1-60558-382-2/08/11 ...$5.00.

ABSTRACT

Translation Validation is an approach to ensuring compilation correctness in which each compiler run is followed by a validation pass that proves that the target code produced by the compiler is a correct translation (implementation) of the source code. It has been previously shown that the problem of translation validation can be reduced to checking whether a single system, the cross-product of the source and target, satisfies a specific property. In this paper, we show how to adapt existing program analysis techniques to the setting of translation validation. In addition, we present a novel invariant generation algorithm which strengthens our analysis when the input programs contain dynamically allocated data structures. Finally, we report on the prototype tool that applies the developed methodology to verification of the LLVM compiler. The tool handles many of the classical intraprocedural compiler optimizations such as constant folding, reassociation, common subexpression elimination, code motion, dead code elimination, and others.

1. INTRODUCTION

Optimizing compilers are quite large applications and are bound to have bugs, some of which may alter the behavior of the programs being compiled. In safety-critical and high-assurance software, where the effort of program correctness verification is extensive, it is highly advisable to ensure that the transformations performed by a compiler preserve the semantics of a program. That is precisely the goal of Translation Validation (TV) [10]. In essence, instead of attempting verification of a given compiler, each compiler run is followed by a validation pass that automatically checks whether the target code produced by the compiler is semantically equivalent to the source code.

The Compiler Verification by Program Analysis of the Cross-Product (CoVaC) framework is a two-step solution to the program equivalence problem. First, one has to construct a comparison system that represents simultaneous execution of the source and target programs. Second, one has to check whether the comparison system satisfies a given specification. The general framework has been described in [15], along with the algorithm for the comparison system construction. Unlike the other translation validation frameworks [16, 9, 12], CoVaC does not rely on any compiler input (such as the compiler debugging information). In order to make a validator for non-cooperative compilers feasible and effective, the set of optimizations under consideration is limited to intraprocedural optimizations in which each branch (or loop) in the target program corresponds to a branch (or loop) in the source program. Many of the classical compiler optimizations, such as constant folding, reassociation, induction variable optimizations, common subexpression elimination, code motion, register allocation, instruction scheduling, and others, fall into this category.

However, as described in [15], the completeness of the CoVaC framework, as well as its effectiveness, depends on the methods for generating the comparison system invariants. One benefit of following the CoVaC approach is that we can choose any existing invariant generation technique developed for a single system and plug it into the compiler verification framework. In this paper, we describe which existing methods we found to be effective and how they can be used in the CoVaC setting. We also present a novel technique for generating the invariants required for checking equivalence of dynamically allocated data structures, as there was no suitable existing method. Finally, we report on the experimental results obtained by applying the CoVaC tool to verification of optimizing transformations performed by LLVM 1.9 [7, 2], a very aggressive open-source compiler.

The rest of the paper is organized as follows. Section 2 gives an overview of the CoVaC framework. Section 3 focuses on the main contributions of this paper. It shows how the existing program analysis techniques can be applied in the CoVaC framework. It also presents the novel approach to generating the invariants required to support optimizations that involve dynamically allocated data structures. Finally, Section 4 presents the experimental results. We discuss the related work in Section 5.

2. THE COVAC FRAMEWORK

Here we briefly describe the CoVaC equivalence checking framework, which is formally presented in [15].

2.1 Transition Graphs

Our model is similar to that presented in [11] for verification of procedural programs. A program (application) $A$ consists of $m + 1$ procedures: MAIN, $P_1, \ldots, P_m$, where MAIN represents the main procedure, and $P_1, \ldots, P_m$ are procedures which may be called from MAIN or from other procedures. We use $P_i(\vec{x}; \&\vec{z})$ to denote the signature of a procedure. Here, the call-by-value parameter passing method is used for $\vec{x}$, and call-by-reference is used for $\vec{z}$. A procedure may return a result by means of the $\vec{z}$ variables. We use $\vec{y}$ to denote the typed variables of a module: $\vec{y} = (\vec{x}; \vec{z}; \vec{w})$, i.e. the variables in $\vec{y}$ are partitioned into $\vec{x}$, $\vec{z}$, and $\vec{w}$, where $\vec{x}$ and $\vec{z}$ are the input parameters and $\vec{w}$ denotes the local variables of the module.

Each procedure is presented as a transition graph. Its nodes are connected by directed edges labeled by instructions. There are four types of instructions: guarded assignments, procedure calls, reads, and writes. Consider a procedure $P_i(\vec{x}; \&\vec{z})$ with $\vec{y} = (\vec{x}; \vec{z}; \vec{w})$. Let $\vec{u}$ consist of variables from $\vec{y}$, and let $E(\vec{y})$ be a list of expressions over $\vec{y}$.

• A guarded assignment is an instruction of the form $c \rightarrow [\vec{u} := E(\vec{y})]$, where the guard $c$ is a boolean expression. When the assignment part is empty, we abbreviate the label to a pure condition $c?$.

• Read and write instructions are denoted by $\mathit{read}(\vec{u})$ and $\mathit{write}(\vec{u})$. They are used to express the interaction of the procedure with the outside world, e.g. I/O instructions.

• A procedure call instruction $P_k(E(\vec{y}); \vec{u})$ denotes a call to the procedure $P_k(\vec{x}_f; \&\vec{z}_f)$, passing input parameters $E(\vec{y})$ by value and $\vec{u}$ by reference.

Transition graphs can be used to model programs in procedural languages. In order to construct a formal model of a program, we first choose a set of program cut points $\Upsilon$ such that at least one location in each branch (or loop) belongs to $\Upsilon$ and the locations right before and after each read/write and call instruction belong to $\Upsilon$. Each procedure (or function) whose implementation is given is represented by a transition graph. We choose the set $\Upsilon$ of a procedure $P_i$ to be the set of nodes of the corresponding transition graph. For every pair of locations $n, m$ in $\Upsilon$, if there exists [...]

The edge labels should be either exactly the same as the corresponding labels of the input systems or, alternatively, an assignment in one of the systems may be coupled with an $\epsilon$-transition (a skip) in the other. The latter signifies the lack of progress in one of the systems. In addition to the structural requirements, no computation of $C$ may contain an infinite sequence of source (or target) $\epsilon$-transitions; thus, every computation of the comparison graph has corresponding computations in both source and target. In the other direction, each source and target computation must be represented in $C$.

An example of a comparison graph is presented in Fig. 1. We use capital letters to denote the variables of the source and their lower case counterparts for the target. First, the source procedure increments $Y$ by 25. Second, both the source and target read a number from an I/O device. Third, the target catches up with the source by incrementing $y$ by 25. Finally, both systems print out the products $Y \ast X$ and $y \ast x$.

Edges of the graph in Fig. 1 (source label; target label):
(0,0) → (1,0): $Y := Y + 12 + 13$; $\epsilon$
(1,0) → (2,1): $\mathit{read}(X)$; $\mathit{read}(x)$
(2,1) → (2,2): $\epsilon$; $y := y + 25$
(2,2) → (3,3): $\mathit{write}(Y \ast X)$; $\mathit{write}(y \ast x)$

Figure 1: A comparison transition graph for $C(\&(Y; y)) = S(\&Y) \times T(\&y)$.

A comparison transition graph $C$ is called a witness of correct translation if there exists a set of program invariants $\{\varphi_l : l \in \text{nodes of } C\}$ such that the following holds.

• For every edge $e$ from node $n$ to node $m$ labeled by $(\mathit{write}(\vec{u}^S); \mathit{write}(\vec{u}^T))$, $\varphi_n \rightarrow (\vec{u}^S = \vec{u}^T)$.

• If $n$ is the exit node of the comparison transition graph $S(\mathit{in}: \vec{x}^S; \&\vec{z}^S) \times T(\mathit{in}: \vec{x}^T; \&\vec{z}^T)$, we check if the values of the variables passed by reference are equal: $\varphi_n \rightarrow (\vec{z}^S = \vec{z}^T)$.

It has been shown in [15] that in order to check if $T$ is a correct translation of $S$ it is sufficient to:

1. construct a comparison graph $C = S \times T$;

2.
