Verification of High-Level Transformations with Inductive Refinement Types

Ahmad Salim Al-Sibahi, IT University of Copenhagen, University of Copenhagen, and Skanned.com, Denmark, ahmad@{di.ku.dk,skanned.com}
Thomas P. Jensen, Inria Rennes, France
Aleksandar S. Dimovski, IT University of Copenhagen, Denmark, and Mother Teresa University, Skopje, Macedonia
Andrzej Wąsowski, IT University of Copenhagen, Denmark

Abstract
High-level transformation languages like Rascal include expressive features for manipulating large abstract syntax trees: first-class traversals, expressive pattern matching, backtracking and generalized iterators. We present the design and implementation of an abstract interpretation tool, Rabit, for verifying inductive type and shape properties of transformations written in such languages. We describe how to perform abstract interpretation based on operational semantics, focusing specifically on the challenges that arise when analyzing expressive traversals and pattern matching. Finally, we evaluate Rabit on a series of transformations (normalization, desugaring, refactoring, code generators, type inference, etc.), showing that we can effectively verify the stated properties.

CCS Concepts • Theory of computation → Program verification; Program analysis; Abstraction; Functional constructs; Program schemes; Operational semantics; Control primitives; • Software and its engineering → Translator writing systems and compiler generators; Semantics;

Keywords transformation languages, abstract interpretation, static analysis

ACM Reference Format:
Ahmad Salim Al-Sibahi, Thomas P. Jensen, Aleksandar S. Dimovski, and Andrzej Wąsowski. 2018. Verification of High-Level Transformations with Inductive Refinement Types. In Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE ’18), November 5–6, 2018, Boston, MA, USA. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3278122.3278125

GPCE ’18, November 5–6, 2018, Boston, MA, USA
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE ’18), November 5–6, 2018, Boston, MA, USA, https://doi.org/10.1145/3278122.3278125.

arXiv:1809.06336v1 [cs.PL] 17 Sep 2018

1 Introduction
Transformations play a central role in software development. They are used, amongst others, for desugaring, model transformations, refactoring, and code generation. The artifacts involved in transformations (e.g., structured data, domain-specific models, and code) often have a large abstract syntax, spanning hundreds of syntactic elements, and a correspondingly rich semantics. Writing transformations is therefore a tedious and error-prone process. Specialized languages and frameworks with high-level features have been developed to address the challenge of writing and maintaining transformations. These languages include Rascal [31], Stratego/XT [11], TXL [15], Uniplate [34] for Haskell, and Kiama [46] for Scala. For example, Rascal combines a functional core language supporting state and exceptions with constructs for processing large structures.

Figure 1 shows an example Rascal transformation program taken from a PHP analyzer.¹

    public Script flattenBlocks(Script s) {
      solve(s) {
        s = bottom-up visit(s) {
          case stmtList: [*xs, block(ys), *zs] =>
            xs + ys + zs
        }
      }
      return s;
    }

Figure 1. Transformation in Rascal that flattens all nested blocks in a statement
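To make the interplay of traversal, list matching and fixed-point iteration concrete, here is a minimal Python model of Figure 1. It is a sketch under stated assumptions, not Rascal's implementation. The classes Echo and Block and the functions rewrite_list, bottom_up_visit and flatten_blocks are hypothetical stand-ins for the PHP analyzer's statement types. The list pattern is emulated by splicing in the first block found, and solve is emulated by re-running the visit until the value stops changing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Echo:
    """Hypothetical leaf statement standing in for any non-block PHP statement."""
    text: str


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for the block(ys) constructor: a nested statement list."""
    body: tuple


def rewrite_list(stmts: tuple) -> tuple:
    """Emulates `case [*xs, block(ys), *zs] => xs + ys + zs` on one statement list.

    The match is described as non-deterministic; this sketch simply takes the
    first block it finds and splices that block's body into the surrounding list.
    """
    for i, s in enumerate(stmts):
        if isinstance(s, Block):
            return stmts[:i] + s.body + stmts[i + 1:]
    return stmts  # no block: the case does not match, so the list is unchanged


def bottom_up_visit(node):
    """One bottom-up traversal: children are rewritten before their parents."""
    if isinstance(node, tuple):  # a statement list
        return rewrite_list(tuple(bottom_up_visit(s) for s in node))
    if isinstance(node, Block):
        return Block(bottom_up_visit(node.body))
    return node  # leaf statements are left as they are


def flatten_blocks(script: tuple) -> tuple:
    """Emulates `solve(s) { s = bottom-up visit(s) {...} }`: iterate to a fixed point."""
    while True:
        rewritten = bottom_up_visit(script)
        if rewritten == script:  # the value of s stopped changing
            return rewritten
        script = rewritten


if __name__ == "__main__":
    prog = (Echo("a"), Block((Echo("b"), Block((Echo("c"),)))), Block(()), Echo("d"))
    print(bottom_up_visit(prog))  # a single pass still contains an empty Block
    print(flatten_blocks(prog))   # the fixed point contains only Echo statements
```

In this model, a single pass over the example program leaves the empty Block(()) in place, because each statement list is rewritten at most once per traversal. The fixed-point loop removes it on the next iteration, which illustrates the role that the solve loop plays in Figure 1.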
The program in Figure 1 recursively flattens all blocks in a list of statements. It uses the following core Rascal features:
• A visitor (visit) traverses the tree and rewrites every statement list containing a block into a flat list of statements. Visitors support various strategies, such as the bottom-up strategy, which traverses the abstract syntax tree from the leaves toward the root.
• An expressive pattern matching language is used to non-deterministically find blocks inside a list of statements. The starred variable patterns *xs and *zs match an arbitrary number of elements in the list, respectively before and after the block(ys) element. Rascal supports non-linear matching, negative matching, and patterns that match deeply nested values.
• The solve-loop (solve) repeats the rewrite until a fixed point is reached, i.e., until the value of s stops changing.

To rule out errors in transformations, we propose a static analysis for enforcing type and shape properties, so that target transformations produce output adhering to particular shape constraints. For our PHP example, these would include:
• The transformation preserves the constructors used in the input: it neither adds new types of PHP statements nor removes existing ones.
• The transformation produces flat statement lists, i.e., lists that do not recursively contain any block.

To ensure such properties, a verification technique must reason about the shapes of inductive data, also inside collections such as sets and maps, while still maintaining soundness and precision. It must also track other important aspects, like the cardinality of collections, which interact with target-language operations including pattern matching and iteration.

In this paper, we address the problem of verifying type and shape properties for high-level transformations written in Rascal and similar languages. We show how to design and implement a static analysis based on abstract interpretation. Concretely, our contributions are:
1. An abstract-interpretation-based static analyzer, the Rascal ABstract Interpretation Tool (Rabit), that supports inferring types and inductive shapes for a large subset of Rascal.
2. An evaluation of Rabit on several program transformations: refactoring, desugaring, a normalization algorithm, a code generator, and a language implementation of an expression language.
3. A modular design for abstract shape domains that allows extending and replacing abstractions for concrete element types, e.g., extending the abstraction for lists to include length in addition to the shape of their contents.
4. A Schmidt-style abstract operational semantics [43] for a significant subset of Rascal, adapting the idea of trace memoization to support arbitrary recursive calls with input from infinite domains.

Together, these contributions show the feasibility of applying abstract interpretation to construct analyses for expressive transformation languages and properties.

We proceed by presenting a running example in Sect. 2. We introduce the key constructs of Rascal in Sect. 3. Section 4 describes the modular construction of abstract domains. Sections 5 to 8 describe the abstract semantics. We evaluate the analyzer on realistic transformations, reporting results in Sect. 9. Sections 10 and 11 discuss related work and conclude.

¹ https://github.com/cwi-swat/php-analysis

2 Motivation and Overview
Verifying type and shape properties such as the ones stated for the program of Fig. 1 poses the following key challenges:
• The programs use heterogeneous inductive data types and contain collections such as lists, maps and sets, as well as basic data such as integers and strings. This complicates the construction of the abstract domains, since one must model the interaction between these different types while maintaining precision.
• The traversal of syntax trees depends heavily on the type and shape of the input and on a complex program state, and it involves unbounded recursion. This makes it challenging to infer approximate invariants in a procedure that both terminates and provides useful results.
• Backtracking and exceptions in large programs introduce the possibility of state-dependent non-local jumps. This makes it difficult to statically calculate the control flow of target programs and to give them a compositional denotational semantics, so an operational semantics is used instead.

Figure 2 presents a small pedagogical example using visitors. The program performs expression simplification by traversing a syntax tree bottom-up and reducing multiplications by constant zero. We now survey the analysis techniques contributed in this paper, explaining them using this example.

    data Nat = zero() | suc(Nat pred);
    data Expr = var(str nm) | cst(Nat vl)
              | mult(Expr el, Expr er);

    Expr simplify(Expr expr) =
      bottom-up visit (expr) {
        case mult(cst(zero()), y) => cst(zero())
        case mult(x, cst(zero())) => cst(zero())
      };

Figure 2. The running example: eliminating multiplications by zero from expressions

[Figure, caption not recovered: abstract evaluation tree for mult(cst(Nat), cst(Nat)), with steps i–vi labelled recurse, partition (zero / suc(Nat)), success and fail]

Inductive refinement types Rabit works by inferring an inductive refinement type that represents the shape of the possible output of a transformation, given the shape of its input. It does this by interpreting the simplification program abstractly, considering all possible paths the program can take for values satisfying the input shape (any expression of type Expr in this case). The result of running Rabit on this case is:

success cst(Nat) ≀ var(str) ≀ mult(Expr′, Expr′)
fail cst(Nat) ≀ var(str) ≀ mult