Verification of High-Level Transformations with Inductive Refinement Types Ahmad Salim Al-Sibahi, Aleksandar Dimovski, Thomas Jensen, Andrzej Wasowski

To cite this version:

Ahmad Salim Al-Sibahi, Aleksandar Dimovski, Thomas Jensen, Andrzej Wasowski. Verification of High-Level Transformations with Inductive Refinement Types. GPCE 2018 - 17th International Con- ference on Generative Programming: Concepts & Experience, Nov 2018, Boston, United States. pp.147-160, ￿10.1145/3278122.3278125￿. ￿hal-01898058￿

HAL Id: hal-01898058 https://hal.inria.fr/hal-01898058 Submitted on 19 Oct 2018

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Verification of High-Level Transformations with Inductive Refinement Types

Ahmad Salim Al-Sibahi Thomas P. Jensen DIKU/Skanned.com/ITU INRIA Rennes Denmark France Aleksandar S. Dimovski Andrzej Wąsowski ITU/Mother Teresa University, Skopje ITU Denmark/Macedonia Denmark Abstract 1 Introduction High-level transformation languages like Rascal include ex- Transformations play a central role in software development. pressive features for manipulating large abstract syntax trees: They are used, amongst others, for desugaring, model trans- first-class traversals, expressive pattern matching, backtrack- formations, refactoring, and code generation. The artifacts ing and generalized iterators. We present the design and involved in transformations—e.g., structured data, domain- implementation of an abstract interpretation tool, Rabit, for specific models, and code—often have large abstract syn- verifying inductive type and shape properties for transfor- tax, spanning hundreds of syntactic elements, and a corre- mations written in such languages. We describe how to per- spondingly rich semantics. Thus, writing transformations form abstract interpretation based on operational semantics, is a tedious and error-prone process. Specialized languages specifically focusing on the challenges arising when analyz- and frameworks with high-level features have been devel- ing the expressive traversals and pattern matching. Finally, oped to address this challenge of writing and maintain- we evaluate Rabit on a series of transformations (normaliza- ing transformations. These languages include Rascal [29], tion, desugaring, refactoring, code generators, type inference, Stratego/XT [12], TXL [16], Uniplate [32] for Haskell, and etc.) showing that we can effectively verify stated properties. Kiama [44] for Scala. For example, Rascal combines a func- tional core language supporting state and exceptions, with CCS Concepts • Theory of computation → Program constructs for processing of large structures. verification; Program analysis; Abstraction; Functional constructs; Program schemes; Operational semantics; Control primitives; • Software and its engineering → Translator 1 public Script flattenBlocks(Script s) { writing systems and compiler generators; Semantics; 2 solve(s) { 3 s = bottom-up visit(s) { Keywords transformation languages, abstract interpreta- 4 case stmtList: [ xs,block(ys), zs] => tion, static analysis * * 5 xs + ys + zs ACM Reference Format: 6 } Ahmad Salim Al-Sibahi, Thomas P. Jensen, Aleksandar S. Dimovski, 7 } and Andrzej Wąsowski. 2018. Verification of High-Level Trans- 8 return s; Proceedings of formations with Inductive Refinement Types. In 9 } the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE ’18), November 5– 6, 2018, Boston, MA, USA. ACM, New York, NY, USA, 14 pages. Figure 1. Transformation in Rascal that flattens all nested https://doi.org/10.1145/3278122.3278125 blocks in a statement

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies Figure1 shows an example Rascal transformation program 1 are not made or distributed for profit or commercial advantage and that taken from a PHP analyzer. This transformation program copies bear this notice and the full citation on the first page. Copyrights recursively flattens all blocks in a list of statements. The for components of this work owned by others than the author(s) must program uses the following core Rascal features: be honored. Abstracting with credit is permitted. To copy otherwise, or • republish, to post on servers or to redistribute to lists, requires prior specific A visitor (visit) to traverse and rewrite all statement permission and/or a fee. Request permissions from [email protected]. lists containing a block to a flat list of statements. Vis- GPCE ’18, November 5–6, 2018, Boston, MA, USA itors support various strategies, like the bottom-up © 2018 Copyright held by the owner/author(s). Publication rights licensed strategy that traverses the abstract syntax tree starting to ACM. from leaves toward the root. ACM ISBN 978-1-4503-6045-6/18/11...$15.00 https://doi.org/10.1145/3278122.3278125 1https://github.com/cwi-swat/php-analysis

147 GPCE ’18, November 5–6, 2018, Boston, MA, USA A. S. Al-Sibahi, T. P. Jensen, A. S. Dimovski and A. Wąsowski

• An expressive pattern matching language is used to non- 1 data Nat = zero() | suc(Nat pred); deterministically find blocks inside a list of statements. 2 data Expr = var(str nm) | cst(Nat vl) The starred variable patterns *xs and *zs match arbi- 3 | mult(Expr el, Expr er); trary number of elements in the list, respectively be- 4 fore and after the block(ys) element. Rascal supports 5 Expr simplify(Expr expr) = non-linear matching, negative matching and specify- 6 bottom-up visit (expr) { ing patterns that match deeply nested values. 7 case mult(cst(zero()), y) => cst(zero()) • The solve-loop (solve) performing the rewrite until a 8 case mult(x, cst(zero())) => cst(zero()) fixed point is reached (the value of s stops changing). 9 };

To rule out errors in transformations, we propose a static Figure 2. The running example: eliminating multiplications analysis for enforcing type and shape properties, so that by zero from expressions target transformations produce output adhering to particular shape constraints. For our PHP example, this would include: We proceed by presenting a running example in Sect.2. • The transformation preserves the constructors used in We introduce the key constructs of Rascal in Sect.3. Sec- the input: does not add or remove new types of PHP tion4 describes the modular construction of abstract do- statements. mains. Sections 5 to8 describe abstract semantics. We evalu- • The transformation produces flat statement lists, i.e., ate the analyzer on realistic transformations, reporting re- lists that do not recursively contain any block. sults in Sect.9. Sections 10 and 11 discuss related papers and conclude. To ensure such properties, a verification technique must rea- son about shapes of inductive data—also inside collections 2 Motivation and Overview such as sets and maps—while still maintaining soundness Verifying types and state properties such as the ones stated and precision. It must also track other important aspects, for the program of Fig.1 poses the following key challenges: like cardinality of collections, which interact with target lan- • guage operations including pattern matching and iteration. The programs use heterogeneous inductive data types, In this paper, we address the problem of verifying type and contain collections such as lists, maps and sets, and and shape properties for high-level transformations written basic data such as integers and strings. This compli- in Rascal and similar languages. We show how to design and cates construction of the abstract domains, since one implement a static analysis based on abstract interpretation. shall model interaction between these different types Concretely, our contributions are: while maintaining precision. • The traversal of syntax trees depends heavily on the type and shape of input, on a complex program state, 1. An abstract interpretation-based static analyzer—Rascal and involves unbounded recursion. This challenges the ABstract Interpretation Tool (Rabit)—that supports in- inference of approximate invariants in a procedure ferring types and inductive shapes for a large subset that both terminates and provides useful results. of Rascal. • Backtracking and exceptions in large programs intro- 2. An evaluation of Rabit on several program transfor- duce the possibility of state-dependent non-local jumps. mations: refactoring, desugaring, normalization - This makes it difficult to statically calculate the con- rithm, code generator, and language implementation trol flow of target programs and have a compositional of an expression language. denotational semantics, instead of an operational one. 3. A modular design for abstract shape domains, that al- lows extending and replacing abstractions for concrete Figure2 presents a pedagogical example using visitors. element types, e.g. extending the abstraction for lists The program performs expression simplification by travers- to include length in addition to shape of contents. ing a syntax tree bottom-up and reducing multiplications by 4. Schmidt-style abstract operational semantics [41] for a constant zero. We now survey the analysis techniques con- significant subset of Rascal adapting the idea of trace tributed in this paper, explaining them using this example. memoization to support arbitrary recursive calls with Inductive Refinement Types. Rabit works by inferring an input from infinite domains. inductive refinement type representing the shape of possible output of a transformation given the shape of its input. It does Together, these contributions show feasibility of applying ab- this by interpreting the simplification program abstractly, stract interpretation for constructing analyses for expressive considering all possible paths the program can take for values transformation languages and properties. satisfying the input shape (any expression of type Expr in

148 Verification of High-Level Transformations ... GPCE ’18, November 5–6, 2018, Boston, MA, USA

mult (cst (Nat) , cst (Nat)) this case). The result of running Rabit on this case is: recursei recurse success cst (Nat) ≀ var (str) ≀ mult (Expr′, Expr′) cst (Nat) ··· ′ ′ ii fail cst (Nat) ≀ var (str) ≀ mult (Expr , Expr ) recurse ′ ′ ′ Nat where Expr = cst (suc (Nat)) ≀ var (str) ≀ mult (Expr , Expr ). fail zero iii partition We briefly interpret how to read this type. The bar ≀ de- partition zero suc (Nat) notes a choice between alternative constructors. If the input iv v was rewritten during traversal (success, the first line) then recurse Nat the resulting syntax tree contains no multiplications by zero. vi partition partition All multiplications may only involve Expr′, which disallows . . the zero constant at the top level. Observe how the last al- . . ternative mult (Expr′, Expr′) contains only expressions of ′ type Expr , which in turn only allows multiplications by Figure 3. Naively abstractly interpreting the sim- constants constructed using suc (Nat) (that is ≥ 1). If the tra- plification example from Fig.2 with initial input versal failed to match (fail, the second line), then the input mult (cst (Nat) , cst (Nat)). The procedure does not did not contain any multiplication by zero to begin with and terminate because of infinite recursion on Nat. so does not the output, which has not been rewritten. The success and failure happen to be the same for our example, but this is not necessarily always the case. Keeping input, merging the new recursive node with the similar pre- separate result values allows retaining precision throughout vious one, thus creating a loop in the execution tree [39, 41]. the traversal, better reflecting concrete execution paths. We This loop is then resolved by a fixed-point iteration. now proceed discussing how Rabit can infer this shape using In Rabit, we propose partition-driven trace memoization, abstract interpretation. which works with potentially unbounded input like the in- Abstractly Interpreting Traversals The core idea of ab- ductive type refinements that are supported by our abstrac- stractly executing a traversal is similar to concrete execu- tion. We detect cycles by maintaining a memoization map tion: we recursively traverse the input structure and rewrite which for each type—used for partitioning—stores the last the values that match target patterns. However, because traversed value (input) and the last result produced for this of abstraction we must make sure to take into account all value (output). This memoization map is initialized to map applicable paths. Figure3 shows the execution tree of the all types to the bottom element (⊥) for both input and output. traversal on the simplification example (Fig.2) when it starts The evaluation is modified to use the memoization map, so with shape mult (cst (Nat) , cst (Nat)). Since there is only one it checks on each iteration the input i against the map: constructor, it will initially recurse down to traverse the con- • If the last processed refinement type representing the tained values (children) creating a new recursion node (yel- input i ′ is greater than the current input (i ′ ⊒ i), then low, light shaded) in the figure (ii) containing the left child it uses the corresponding output; i.e., we found a hit cst (Nat), and then recurse again to create a node (iii) con- in the memoization map. taining Nat. Observe here that Nat is an abstract type with • Otherwise, it will merge the last processed and cur- two possible constructors (zero, suc (·)), and it is unknown at rent input refinement types to a new value i ′′ = i ′∇i, time of abstract interpretation, which of these constructors update the memoization map and continue execution we have. When Rabit hits a type or a choice between alter- with i ′′. The operation ∇ is called a widening; it en- native constructors, it explores each alternative separately sures that the result is an upper bound of its inputs, creating new partition nodes (blue, darker). In our example i.e., i ′ ⊑ i ′′ ⊒ i and that the merging will eventually we partition the Nat type into its constructors zero (node terminate for the increasing chain of values. The mem- iv) and suc (Nat) (node v). The zero case now represents the oization map is updated to map the general type of i ′′ first case without children and we can run the visitor oper- (not refined, for instance Nat) to map to a pair (i ′′,o), ations on it. Since no pattern matches zero it will return a where the first component denotes the new input i ′′ fail zero result indicating that it has not been rewritten. For refinement type and the second component denotes the suc (Nat) case it will try to recurse down to Nat (node vi) the corresponding output o refinement type; initially, o which is equal to (node iii). Here, we observe a problem: if we is set to ⊥ and then changed to the result of executing continue our traversal algorithm as is, we will not terminate input i ′′ repeatedly until a fixed-point is reached. and get a result. To provide a terminating algorithm we will We demonstrate the trace memoization and fixed-point itera- resort to using trace memoization. tion procedures on Nat in Fig.4, beginning with the leftmost Partition-driven Trace Memoization The idea is to de- tree. The expected result is fail Nat, meaning that no pat- tect the paths where execution recursively meets similar tern has matched, no rewrite has happened, and a value of

149 GPCE ’18, November 5–6, 2018, Boston, MA, USA A. S. Al-Sibahi, T. P. Jensen, A. S. Dimovski and A. Wąsowski type Nat is returned, since the simplification program only this with lines 6-7 in Fig.2, to see that running the abstract introduces changes to values of type Expr. value against this pattern might actually produce success. We show the memoization map inside a framed orange In order to obtain precise result shapes, we refine the input box. The result of the widening is presented below the mem- values when they fail to match a pattern. Our abstract in- oization map. In all cases the widening in Fig.4 is trivial, as terpreter produces a refinement of the type, by running it it happens against ⊥. The final line in node 1 stores the value through the pattern matching, giving: o produced by the previous iteration of the traversal, to prev success cst (Nat) establish whether a fixed point has been reached (⊥ initially). fail mult (cst (suc (Nat)) , cst (suc (Nat))) Trace Partitioning We partition [37] the abstract value The result means, that if the pattern match succeeds then it Nat along its constructors: zero and suc (·) (Fig.4). This par- produces an expression of type cst (Nat). More interestingly, titioning is key to maintain precision during the abstract if the matching failed neither the left nor the right argument interpretation. As in Fig.3, the left branch fails immediately, of mult (·, ·) could have contained the constant zero—the since no pattern in Fig.2 matches zero. The right branch interpreter captured some aspect of the semantics of the pro- descends into a new recursion over Nat, with an updated gram by refining the input type. Naturally, from this point on memoization table. This run terminates, due to a hit in the the recursion and iteration continues, but we shall abandon memoization map, returning ⊥. After returning, the value of the example, and move on to formal developments. suc (Nat) should be reconstructed with the result of travers- ⊥ ing the child Nat, but since the result is there is no value 3 Formal Language to reconstruct with, so ⊥ is just propagated upwards. At the return to the last widening node, the values are joined, and The presented technique is meant to be general and applica- widen the previous iteration result oprev (the dotted arrow on ble to many high-level transformation languages. However, top). This process repeats in the second and third iterations, to keep the presentation concise, we focus on few key con- but now the reconstruction in node 3 succeeds: the child Nat structs from Rascal [29], relying on the concrete semantics is replaced by zero and fail suc (zero) is returned (dashed from Rascal Light [2]. arrow from 3 to 1). In the third iteration, we join and widen We consider algebraic data types (at) and finite sets (set⟨t⟩) t at the following components (cf. oprev and the dashed arrows of elements of type . Each algebraic data type has a set of incoming into node 1 in the rightmost column): unique constructors. Each constructor k(t) has a fixed set of typed parameters. The language includes sub-typing, with [zero ≀ suc (zero) ∇ (zero ⊔ suc (zero≀suc (zero)))] = Nat void and value as bottom and top types respectively. Here, the used widening operator [18] accelerates the con- t ∈ Type F void | set⟨t⟩ | at | value vergence by increasing the value to represent the entire type We consider the following subset of Rascal expressions: From Nat. It is easy to convince yourself, by following the same left to right we have: variable access, assignments, sequenc- recursion steps as in the figure, that the next iteration, us- ing, constructor expressions, set literal expressions, matching ing o = Nat will produce Nat again, arriving at a fixed prev failure expression, and bottom-up visitors: point. Observe, how consulting the memoization map, and widening the current value accordingly, allowed us to avoid e F x ∈ Var | x = e | e; e | k(e) | {e} | fail | visit e cs infinite recursion over unfoldings of Nat. cs F case p ⇒ e Nesting Fixed Point Iterations. When inductive shapes Visitors are a key construct in Rascal. A visitor visit e cs (e.g., Expr) refer to other inductive shapes (e.g., Nat), it is nec- traverses recursively the value obtained by evaluating e (any essary to run nested fixed-point iterations to solve recursion combination of simple values, data type values and collec- at each level. Figure5 returns to the more high-level fragment tions). During the traversal, case expression cs are applied of the traversal of Expr starting with mult (cst (Nat) , cst (Nat)) to the nodes, and the values matching target patterns are as in Fig.3. We follow the recursion tree along nodes 5, 6, 7, 8, rewritten. We will discuss a concrete subset of patterns p 9, 10, 9, 6 with the same rules as in Fig.4. In node 10 we run further in Sect.6. For brevity, we only discuss bottom-up a nested fixed point iteration on Nat, already discussed in visitors, but Rabit (Sect.9) supports all strategies of Rascal. Fig.4, so we just include the final result. Notation We write (x,y) ∈ f to denote the pair (x,y) such Type Refinement. The output of the first iteration in node 6 that x ∈ dom f and y = f (x). Abstract semantic components, is fail cst (Nat), which becomes the new oprev, and the second sets, and operations are marked with a hat: ab. A sequence of iteration begins (to the right). After the widening the input is e1,..., en is contracted using an underlining e. The empty partitioned into e (node 7) and cst (Nat)(node elided). When sequence is written by ε, and concatenation of sequences e1 the second iteration returns to node 7 we have the follow- and e2 is written e1, e2. Notation is lifted to sequences in an ing reconstructed value: mult (cst (Nat) , cst (Nat)). Contrast intuitive manner: for example given a sequence v, the value

150 Verification of High-Level Transformations ... GPCE ’18, November 5–6, 2018, Boston, MA, USA

output: output: fail ⊥∇(zero⊔⊥) = fail zero fail zero∇(zero⊔suc (zero)) = fail zero ≀ suc (zero) recurse ··· fail Nat input: Nat input: Nat input: Nat

Nat 7→⊥, ⊥ Nat 7→⊥, ⊥ ) Nat 7→⊥, ⊥ ⊥ ))

widen: ⊥∇Nat = Nat widen: ⊥∇Nat = Nat zero widen: ⊥∇Nat = Nat ( zero oprev = ⊥ oprev = fail zero oprev = fail zero ≀ suc (zero) ( 1 partition 1 1 suc

part. part. ≀ reconstruction: fail suc fail zero fail zero fail zero

part. part. zero

part. ( zero zero zero 2 suc (Nat) 2 suc (Nat) 2 suc (Nat) 3 no reconstruction: 3 hit:fail zero 3 recurse recurse ≀ suc (zero) recurse reconstruction: fail suc hit: ⊥ hit: fail zero input: Nat input: Nat input: Nat Nat 7→Nat, ⊥ Nat 7→Nat, fail zero Nat 7→Nat, fail zero≀suc (zero) 4 4 4

Figure 4. Three iterations of a fixed point computation for input Nat. Iterations are separated by dotted arrows on top

input: e = mult (cst (Nat) , cst (Nat)) Expr 7→⊥, ⊥ output: Nat 7→⊥, ⊥ fail ⊥∇(⊥ ⊔ cst (Nat)) = fail cst (Nat) widen: ⊥∇e = e oprev = ⊥ recurse 5 input: cst (Nat) Expr 7→e, ⊥ recurse Nat 7→⊥, ⊥ ··· input: cst (Nat) widen: e∇cst (Nat) = e ≀ cst (Nat) Expr 7→e, ⊥ oprev = fail cst (Nat) type Nat 7→⊥, ⊥ part. 6 refinement

no reconstr.: widen: e∇cst (Nat) = e ≀ cst (Nat) ··· part. ⊥ reconstruction: oprev = ) mult (cst (Nat) , cst (Nat)) partition 6 e Nat ( ) 7 part. Nat ⊥ hit: ( hit: fail cst reconstruction: fail cst e cst (Nat) fail cst recurse 7 9 recurse ⊥ (Nat fail Nat ) hit: recurse input: cst (Nat) input: cst (Nat) input: cst (Nat) input: 7→ ≀ ( ) ⊥ 7→ ≀ ( ) ( ) 7→ ≀ ( ) ( ) Expr e cst Nat , Nat Expr e cst Nat , fail cst Nat Expr e cst Nat , fail cst Nat Nat 7→⊥, ⊥ 10 Nat 7→⊥, ⊥ Nat 7→⊥, ⊥ 8 8 11

Figure 5. A prefix of the abstract interpreter run for e = mult (cst (Nat) , cst (Nat)). Fragments of two iterations involving node 6 are shown, separated by a dotted arrow.

vi denotes the ith element in the sequence, and v :t denotes recursively defined domains. The modular domain design the sequence v1 :t1,...,vn :tn. generalizes parameterized domains [17] to follow a design inspired by the modular construction of types and domains 4 Abstract Domains [8, 15, 42]. The idea is to define domains parametrically—i.e. Our abstract domains are designed for modular composition. in the form bF(bE)—so that abstract domains for subcompo- Modularity is key for transformation languages, which ma- nents are taken as parameters, and explicit recursion is han- nipulate a large variety of values. The design allows easily dled separately. We use standard domain combinators [50] to replacing abstract domains for particular types of values, combine individual domains into the abstract value domain. as well as adding support for new value types. We want to Set Shape Domain. Let Set(E) denote the domain of sets construct an abstract value domain vsb ∈ ValueShapeœ which consisting of elements taken from E. We define abstract finite captures inductive refinement types of form: sets using abstract elements {be}[l;u] from a parameterized r domain SetShapeœ (bE). The component from the parameter at = k1(vsb1) ≀ · · · ≀ kn(vsbn) domain (be ∈ bE) represents the abstraction of the shape of r [ ] ∈ where each value vsbi can possibly recursively refer to at . elements, and a non-negative interval component l;u Below, we define abstract domains for sets, data types and Intervalœ+ is used to abstract over the cardinality (so l,u ∈ R+

151 GPCE ’18, November 5–6, 2018, Boston, MA, USA A. S. Al-Sibahi, T. P. Jensen, A. S. Dimovski and A. Wąsowski and l ≤ u). The abstract set element acts as a reduced product Recursive shapes We extend our abstract domains to cover between be and [l;u] and the lattice operations follow directly. recursive structures such as lists and trees. Given a type Given a concretization function for the abstract content expression F(X) with a variable X, we construct the ab- domain γ ∈ E → ℘ (E), we can define a concretization stract domain as the solution to the recursive equation X = bE b function for the abstract set shape domain to possible finite F(X) [42, 45, 50], obtained by iterating the induced map F sets of concrete elements γ ∈ SetShape(E) → ℘ (Set (E)): over the empty domain 0 and adjoining a new top element SSc œ b to the limit domain. The concretization function of the recur-  γ ({e}[ ]) = es es ⊆ γ (e) ∧ |es| ∈ γ ([l;u]) sive domain follows directly from the concretization function SSc b l;u bE b bI of the underlying functor domain. Example 4.1. Let Intervalœ be a domain of intervals of inte- Example 4.3. We can concretize abstract elements of the gers (a standard abstraction over integers). We can concretize refinement type from our running example: abstract elements from SetShapeœ (Intervalœ ) to a set of possi- 2 ble sets of integers from ℘ (Set (Z)) as follows:   z }| {  γ (Expre ) = cst(suc(suc(zero))), mult(2, 2), ({[ ]} ) {{ } { } { }} DSc γSS 42; 43 [1;2] = 42 , 43 , 42, 43   c  mult(mult(2, 2), 2),...    Data Shape Domain. Inductive refinement types are de- where Expre = cst(suc(suc(zero))) ≀ mult(Expre , Expre ) In fined as a generalization of refinement24 types[ , 40, 52] that particular, our abstract element represents the set of all mul- inductively constrain the possible constructors and the con- tiplications of the constant 2. tent in a data structure. We use a parameterized abstraction of data types DataShapeœ (bE), whose parameter bE abstracts Value Domains. We presented the required components over the shape of constructor arguments: for abstracting individual types, and now all that is left is putting everything together. We construct our value shape d∈DataShape(E) = {⊥ }∪{k (e )≀ ... ≀k (e ) | e ∈E}∪{⊤ } domain using choice and recursive domain equations: b œ b DSc 1 1 n n i b DSc We have the least element ⊥ and top element ⊤ elements— ValueShapeœ = DSc DSc respectively representing no data types value and all data SetShapeœ (ValueShapeœ ) ⊕ DataShapeœ (ValueShapeœ ) type values—and otherwise a non-empty choice between unique (all different) constructors of the same algebraic data Similarly, we have the corresponding concrete shape domain: type k1(e1) ≀ · · · ≀ kn(en) (shortened k(e)). We can treat the Value = Set (Value) ⊎ Data(Value) constructor choice as a finite map [k1 7→ e1,...,kn 7→ en], We then have a concretization functionγ ∈ ValueShape → and then directly define our lattice operations point-wise. VSc œ ℘ ( ) Given a concretization function for the concrete content Value , which follows directly from the previously defined domain γ ∈ E → ℘ (E), we can create a concretization concretization functions. bE b function for the data shape domain 4.1 Abstract State Domains ∈ ( ) → ( ( )) We now explain how to construct abstractions of states and γDS DataShapeœ bE ℘ Data E c results when executing Rascal programs. where Data(E) = k(v) a type at. k(v) ∈ at ∧ v ∈ E . Abstract Store Domain. Tracking assignments of variables The concretization is defined∃ as follows: J K is important since matching variable patterns depends on γ (⊥ ) = ∅ γ (⊤ ) = Data(E) the value being assigned in the store: DSc DSc DSc DSc n o σ ∈ Store = Var → {ff, tt} × ValueShape γ (k (e ) ≀ · · · ≀ k (e )) = k (v) i ∈ [1, n] ∧ v ∈ γ (e ) b š œ DSc 1 1 n n i bE i For a variable x we get bσ(x) = (b, vsb) where b is true if x Example 4.2. We can concretize abstract data elements might be unassigned, and false otherwise (when x is defi- DataShapeœ (Intervalœ ) to a set of possible concrete data values nitely assigned). The second component, vsb is a shape ap- ℘ (Data(Z)). Consider values from the algebraic data type: proximating a possible value of x. We lift the orderings and lattice operations point-wise data errorloc = repl() | linecol(int, int) from the value shape domain to abstract stores. We define the concretization function γ ∈ Storeš → ℘ (Store) as: We can concretize abstracting elements as follows: Storeš x,b, vs. σ(x) = (b, vs) ⇒  b b b   ∀  γ (repl() ≀ linecol([1; 1], [3; 4])) = γ (σ) = σ (¬b ⇒ x ∈ dom σ) DSc Storeš b   {repl(), linecol(1, 3), linecol(1, 4)}  ∧ (x ∈ dom σ ⇒ σ(x) ∈ γ (vs))  bV b 

152 Verification of High-Level Transformations ... GPCE ’18, November 5–6, 2018, Boston, MA, USA

′ value resvd and abstract result store bσ , i.e.: same syntax abstracts over sets of result values and stores Resc = [rest1 7→ (resvš1, bσ1),..., restn 7→ (resvšn, bσn)] e ; σ ===⇒ rest resv ; σ ′ e ; σ ====⇒ There is an important difference in how the concrete and expr b a-expr Resc abstract semantic rules are used. In a concrete operational abstracts input store semantics a language construct is usually evaluated as soon as the premises of a rule are satisfied. When evaluating ab- Figure 6. Relating concrete semantics (left) to abstract se- stractly, we must consider all applicable rules, to soundly mantics (right). over-approximate the possible concrete executions. To this end, we introduce a special notation to collect all derivations Abstract Result Domain. Traditionally, abstract control with the same input i into a single derivation with output O flow is handled using a collecting denotational semantics equal to the join of the individual outputs: with continuations, or by explicitly constructing a control Ä {|i ⇒ O|} ≜ O = {o|i ⇒ o} flow graph. These methods are non-trivial to apply forarich language like Rascal, especially considering backtracking, Let’s use the operational rules for variable accesses to illus- exceptions and data-dependent control flow introduced by trate the steps in Schmidt-style translation of operational visitors. A nice side-effect of Schmidt-style abstract interpre- rules. The concrete semantics contains two rules for vari- tation is that it allows abstracting control flow directly. able accesses, E-V-S for successful lookup, and E-V-Er for We model different type of results—successes, pattern producing errors when accessing unassigned variables: match failures, errors directly in a ResSet domain which x ∈ dom σ › E-V-S keeps track of possible results with each its own separate x; σ ===⇒ success σ(x); σ store. Keeping separate stores is important to maintain pre- expr cision around different paths: x < dom σ E-V-Er x; σ ===⇒ error; σ expr rest ∈ ResTypeœ F success | exres We follow three steps, to translate the concrete rules to ab- exres F fail | error resvd ∈ ResVal› F · | vsb stract operational rules: Resc ∈ ResSet› = ResTypeœ ⇀ ResVal› × Storeš 1. For each concrete rule, create an abstract rule that uses a judgment for evaluation of a syntactic form, The lattice operations are lifted directly from the target value e.g., AE-V-S and AE-V-Er for variables. domains and store domains. We define the concretization 2. Replace the concrete conditions and semantic oper- function γ ∈ ResultSet → ℘ (Result × Store): RSc œ ations with the equivalent abstract conditions and ( ) semantic operations for target abstract values, e.g. (rest, (resvd, bσ)) ∈ Resc ∧ γ (Resc ) = (rest resv, σ) x ∈ dom σ with σ(x) = (b,vs) and a check on b. RSc resv ∈ γ (resv) ∧ σ ∈ γ (σ) b b RVc d Storeš b We obtain two execution rules:

5 Abstract Semantics σ(x) = (b, vs) AE-V-S b b A distinguishing feature of Schmidt-style abstract interpreta- x; bσ =====⇒ [success 7→ (vsb, bσ)] tion is that the derivation of abstract operational rules from a-expr-v a given concrete operational semantics is systematic and to a σ(x) = (tt, vs) AE-V-ER b b large extent mechanisable [10, 41]. The creative work is thus x; bσ =====⇒ [error 7→ (·, bσ)] reduced to providing abstract definitions for conditions and a-expr-v semantic operations such as pattern matching, and defining Observe when b is true, both a success and failure may trace memoization strategies for non-structurally recursive occur, and we need rules to cover both cases. operational rules, to produce a terminating static analysis 3. Create a rule that collects all possible evaluations of the that approximates an infinite number of concrete traces. syntax-specific judgment rules, e.g. AE-V for variables: Figure6 relates the concrete evaluation judgment (left) to ′ {|x; σ =====⇒ Resc |} the abstract evaluation judgment (right) for Rascal expres- b a-expr-v AE-V sions. Both judgements evaluate the same expression e. The ′ x; σ ====⇒ Resc abstract evaluation judgment abstracts the initial concrete b a-expr store σ with an abstract store bσ. The result of the abstract The possible shapes of the result value depend on the pair evaluation is a finite result set Resc , abstracting over possibly assigned to x in the abstract store. If the value shape of x is ⊥, ′ infinitely many concrete result values rest resv and stores σ . we drop the success result from the result set. The following Resc maps each result type rest to a pair of abstract result examples illustrate the possible outcome result shapes:

153 GPCE ’18, November 5–6, 2018, Boston, MA, USA A. S. Al-Sibahi, T. P. Jensen, A. S. Dimovski and A. Wąsowski

Assigned Value Result Set Rules A value v matches a pattern p (v |= p) iff there exists a ρ σ(x) = (ff, ⊥ ) [] AE-V-S binding environment that maps the variables in the pattern b VSc to values in dom ρ = vars(p) so that v is accepted by the σ(x) = (ff, [1; 3]) [success 7→ ([1; 3], σ)] AE-V-S b b satisfiability semantics v |=ρ p as defined in Fig. 7a. σ(x) = (tt, ⊥ )[error 7→ (·, σ)] AE-V-S, AE-V-Er k(p) b VSc b Constructor patterns accept any well-typed value [success 7→ ([1; 3], bσ), k(v) of the same constructor whose subcomponents v match bσ(x) = (tt, [1; 3]) AE-V-S, AE-V-Er error 7→ (·, bσ)] the sub-patterns p consistently in the same binding environ- It is possible to translate the operational semantics rules for ment ρ. A variable x matches exactly the value it is bound other basic expressions using the presented steps [4, Appen- to in the binding environment ρ. A set pattern {⋆p} accepts dix B]). The core changes are the ones moving from checks any set of values {v} such that an associative-commutative of definiteness to checks of possibility. For example: arrangement of the sub-values v matches the sequence of • Checking that evaluation of e has succeeded, re- sub-patterns ⋆p under ρ. | ⋆ quires that the abstract semantics uses e; bσ ====⇒ Resc A value sequence v matches a pattern sequence ⋆p (v = ′ a-expr and (success, (vs, σ )) ∈ Resc , as compared to ⋆p) if there exists a binding environment ρ such that dom ρ = b b ′ e; σ ===⇒ success v; σ in the concrete semantics. vars(⋆p) and v |=⋆ ⋆p. An empty sequence of patterns ε ac- expr ρ • Typing is now done using abstract judgments cepts an empty sequence of values ε. A sequence starting ′ vs : t and t

154 Verification of High-Level Transformations ... GPCE ’18, November 5–6, 2018, Boston, MA, USA

k(v) |=ρ k(p) iff t are parameter types of k ′ ′ k(vsb) |b=ρ k(p) iff t are parameter types of k and v : t and t <: t b ′ ′ ⋆ ⋆ and vs : t and t <: t and vs |b=ρ p and v |=ρ p bb b b b v |= x iff ρ(x) = v vs |= x iff ρ(x) ⊑ vs ρ b bρb b b ⋆ ⋆ {v} |=ρ {⋆p} iff v |=ρ ⋆p {vs}[ ] |= {⋆p} iff vs, [l;u] |= ⋆p b l;u bρb b bρb ⋆ ⋆ ε |=ρ ε always vs, [0;u] |b=ρ ε always ′ ⋆ ′ ′ ⋆ ′ b b v,v |= p,⋆p iff v |=ρ p and v |= ⋆p ⋆ ρ ρ vs, [l;u] |= p,⋆p′ iff u > 0 and vs |= p ′ ′ ′ ′ b bρb b bρ | ⋆ ( ) { } | ⋆ b ⋆ ′ v,v =ρ ⋆x,⋆p iff ρ x = v and v =ρ ⋆p and vs, [l − 1;u − 1] |= p,⋆p b bρb (a) Concrete (v |= p reads: v matches p with ρ) ⋆ ′ ′ ρ vs, [l;u] |= ⋆ x,⋆p iff ρ(x) = {vs } ′ ′ b bρb b b [l ;u ] and l ′ ≤ l and u′ ≤ u and vs′ ⊑ vs ⋆ b b and vs, [l − u′;u − l ′] |= ⋆p′ b bρb (b) Abstract (vs |= p reads: vs may match p with ρ) b bρb b b b b Figure 7. Satisfiability semantics for pattern matching by shape–lengths pairs, which needs to be taken into ac- unary constructors suc (Nat) ≀ zero \ zero = suc (Nat) or at count by sequence matching rules. This is most visible in the the end points of a lattice [1; 10]\[1; 2] = [3; 10]. very last rule, with a star pattern ⋆x, where we accept any Similarly, for matching against a constructor pattern k(p), assignment to a set abstraction vsb which has a more precise the core idea is that we should be able to partition our value shape and a smaller length. space into two: the abstract values that match the constructor and those that do not. For those values that possibly match 6.2 Computing Pattern Matches k(p), we produce a refined value with k as the only choice, The declarative satisfiability semantics of patterns, albeit making sure that the sub-values in the result are refined by clean, is not directly computable. In Rabit, we rely on an the sub-patterns p. abstract operational semantics [4, Appendix A] using the Otherwise, we exclude k from the refined value. For a data technique presented in Sect.5. The interesting ideas are in type abstraction exclusion removes the pattern constructor the refining semantic operators that we now discuss. from the possible choices

Semantic Operators with Refinement. Since Rascal sup- excludeœ(k(vsb)≀k1(vsb1)≀ · · · ≀kn(vsbn),k) = k1(vsb1)≀ ... ≀kn(vsbn) ports non-linear matching, it becomes necessary to merge and does not change the input shape otherwise. environments computed when matching sub-patterns to check whether a match succeeds or not. In abstract inter- 7 Traversals pretation, we can refine the abstract environments when First-class traversals are a key feature of high-level transfor- merging for each possibility. Consider when merging two mation languages, since they enable effectively transforming abstract environments, where some variable x is assigned ′ ′ large abstract syntax trees. We will focus on the challenges to vs in one, and vs in the other. If vs is possibly equal b b b for bottom-up traversals, but they are shared amongst all to vs, we refine both values using this equality assumption b ′ strategies supported in Rascal. The core idea of a bottom-up vs = vs . Here, we have that abstract equality is defined b b b traversal of an abstract value vs, is to first traverse children of as the greatest lower bound if the value is non-bottom, i.e. b ′ ′′ ′′ ′ the value childrenœ (vs) possibly rewriting them, then recon- vs = vs ≜ {vs |vs = vs ⊓ vs , ⊥}. Similarly, we can also b b b b b b b b ′ struct a new value using the rewritten children and finally refine both values if they are possibly non-equal vs , vs . b b b traversing the reconstructed value. The main challenge is Here, abstract inequality is defined using relative comple- handling traversal of children, whose representation and ments: thus execution rules depend on the particular abstract value. ( ′′ ′)| ′′ \( ⊓ ′) ⊥ ∪ ′ vsb , vsb vsb = vsb vsb vsb , Concretely, the childrenœ (vs) function returns a set of pairs vs vs ′ b ′ b b, b ≜  ′′ ′′ ′ ′ (vs , cvs) where the first component vs is a refinement of vs (vsb, vsb )|vsb = vsb \(vsb ⊓ vsb ) , ⊥ b c b b that matches the shape of children cvsc in the second compo- In our abstract domains, the relative complement (\) is lim- nent. For data type values the representation of children is ited. We heuristically define it for interesting cases, and oth- a heterogeneous sequence of abstract values vs′′, while for b ′′ erwise it degrades to identity in the first argument (no refine- set values the representation of children is a pair (vsb , [l;u]) ment). There are however useful cases, e.g., for excluding with the first component representing the shape of elements

155 GPCE ’18, November 5–6, 2018, Boston, MA, USA A. S. Al-Sibahi, T. P. Jensen, A. S. Dimovski and A. Wąsowski and the second representing their count. For example, domain using a finite domain of elements, and on recur- sion degrade input values using previously met input values childrenœ (mult (Expr, Expr) ≀ cst (suc (Nat))) = from the same partition. We assume that all our domains (mult (Expr, Expr) , (Expr, Expr)), are lattices with a widening operator. Consider a recursive (cst (suc (Nat)) , suc (Nat)) operational semantics judgment i =⇒ o, with i being an in- put from domain Inputš , and o being the output from domain and childrenœ ({Expr}[1;10]) = {({Expr}[1;10], (Expr, [1; 10]))}. Outputœ. For this judgment, we associate a memoization map Note how the childrenœ function maintains precision by parti- Mb ∈ PInput› → Inputš ×Outputœ where PInput› is a finite parti- tioning the alternatives for data-types, when traversing each tioning domain that has a Galois connection with our actual γPI corresponding sequence of value shapes for the children. input, i.e. Input ←−−−−−c PInput. The memoization map keeps š −−−−−→α › PIc Traversing Children. The shape of execution rules depend track of the previously seen input and corresponding output on the representation of children; this is consistent with the for values in the partition domain. For example, for input requirements imposed by Schmidt [41]. For heterogeneous from our value domain Valueš we can use the correspond- sequences of value shapes vs, the execution rules iterate ing type from the domain Type as input to the memoization b 2 through the sequence recursively traversing each element. map. So for values 1 and [2; 3] we would use int, while for Due to over-approximation we may re-traverse the same or mult(Expr, Expr) we would use the defining data type Expr. a more precise value on recursion, and so we need to use We perform a fixed-point calculation over the evaluation of trace memoization (Sect.8) to terminate. For example the input i. Initially, the memoization map Mb is λpi.(⊥, ⊥), and children of an expression Expr refer to itself: during evaluation we check whether there was already a value from the same partition as i, i.e., α (i) ∈ dom Mb. At (mult (Expr, Expr) , (Expr, Expr)), PIc childrenœ (Expr) = each iteration, there are then two possibilities: (cst (Nat) , Nat), (var (str) , str) Hit The corresponding input partition key is in the mem- Traversing children represented by a shape-length pair, is oization map and a less precise input is stored, so M(α (i)) = (i ′,o′) where i ⊑ i ′. Here, the out- directed by the length interval [l;u]. If 0 is a possible value b PIc Inputš of the length interval, then traversal can finish, refining the put value o that is stored in the memoization map is input shape to be empty. Otherwise, we perform another tra- returned as result. versal recursively on the shape of elements and recursively Widen The corresponding input partition key is in the mem- on a new shape-length pair which decreases the length, fi- oization map, but an unrelated or more precise input nally combining their values. Note, that if the length is un- is stored, i.e., M(α (i)) = (i ′′,o′′) where i i ′′. In b PIc @Inputš bounded, e.g. [0; ∞], then the value can be decreased forever this case we continue evaluation but with a widened and trace memoization is also needed here for termination. input i ′ = i ′′∇ (i ′′ ⊔ i) and an updated map M ′ = Inputš b This means that trace memoization must here be nested [α (i) 7→ (i ′,o )]. Here, o is the output of the PIc prev prev breadth-wise (when recursing on an unbounded sequence last iteration for the fixed-point calculation for input of children), in addition to depth-wise (when recursing on i ′, and is assigned ⊥ on the initial iteration. children); this can be computationally expensive, and we Intuitively, the technique is terminating because the par- will discuss in Sect.9 how our implementation handles that. titioning is finite, and widening ensures that we reach an upper bound of possible inputs in a finite number of steps, 8 Trace Memoization eventually getting a hit. The fixed-point iteration also uses Abstract interpretation and static program analysis in gen- widening to calculate an upper bound, which similarly fin- eral perform fixed-point calculation for analysing unbounded ishes in a number of steps. The technique is sound because loops and recursion. In Schmidt-style abstract interpretation, we only use output for previous input that is less precise; the main technique to handle recursion is trace memoiza- therefore our function is continuous and a fixed-point exists. tion [39, 41]. The core idea of trace memoization is to detect non-structural re-evaluation of the same program element, 9 Experimental Evaluation i.e., when the evaluation of a program element is recursively We demonstrate the ability of Rabit to verify inductive shape dependent on itself, like a while-loop or traversal. properties, using five transformation programs, where three The main challenge when recursing over inputs from in- are classical and two are extracted from open source projects. finite domains, is to determine when to merge recursive paths together to correctly over-approximate concrete ex- Negation Normal Form (NNF) transformation [26, Sec- ecutions. We present an extension that is still terminating, tion 2.5] is a classical rewrite of a propositional formula to sound and, additionally, allows calculating results with good precision. The core idea is to partition the infinite input 2Provided that we bound the depth of type parameters of collections.

156 Verification of High-Level Transformations ... GPCE ’18, November 5–6, 2018, Boston, MA, USA combination of conjunctions and disjunctions of literals, so Threats to Validity. The programs are not selected ran- negations appear only next to atoms. An implementation of domly, thus it can be hard to generalize the results. We mit- this transformation should guarantee the following: igated this by selecting transformations that are realistic P1 Implication is not used as a connective in the result and vary in authors and purpose. While translating the pro- P2 All negations in the result are in front of atoms grams to Rascal Light, we strived to minimize the amount of changes, but bias cannot be ruled out entirely. Rename Struct Field (RSF) refactoring changes the name of a field in a struct, and that all corresponding field access Implementation. We have implemented the abstract inter- 4 expressions are renamed correctly as well: preter in a prototype tool, Rabit , for all of Rascal Light fol- P3 Structure should not define a field with the old field name lowing the process described in sections 5 to8. This required P4 No field access expression to the old field handling additional aspects, not discussed in the paper: 1. Possibly undefined values Desugar Oberon-0 (DSO) transformation [7, 51], translates 2. Extended result state with more Control flow con- for-loops and switch-statements to while-loops and nested structs, backtracking, exceptions, loop control, and if-statements, respectively. 3. Fine-tuning memoization strategies to the different P5 for should be correctly desugared to while looping constructs and recursive calls P6 switch should be correctly desugared to if By default, we use the top element ⊤ specified as input, but P7 No auxiliary data in output the user can specify the initial data-type refinements, store Code Generation for Glagol (G2P) a DSL for REST-like and parameters, to get more precise results. The output of web development, translated to PHP for execution.3 We are the tool is the abstract result value set of abstractly interpret- interested in the part of the generator that translates Glagol ing target function, the resulting store state and the set of expressions to PHP, and the following properties: relevant inferred data-type refinements. The implementation extends standard regular tree gram- P8 Output only simple PHP expressions for simple Glagol mar operations [1, 18], to handle the recursive equations expression inputs for the expressive abstract domains, including base values, P9 No unary PHP expressions if no sign marks or negations collections and heterogeneous data types. We use a more in Glagol input precise partitioning strategy for trace memoization when Mini Calculational Language (MCL) a programming needed, which also takes the set of available constructors language text-book [43] implementation of a small expres- into account for data types. sion language, with arithmetic and logical expressions, vari- Results. We ran the experiments using Scala 2.12.2 on a 2012 ables, if-expressions, and let-bindings. The implementation Core i5 MacBook Pro. Table1 summarizes the size of the contains an expression simplifier (larger version of running programs, the runtime of Rabit, and whether the properties example), type inference, an interpreter and a compiler. have been verified. Since we verify the results on the abstract P10 Simplification procedure produces an expression with shapes, the programs are shown to be correct for all possible no additions with 0, multiplications with 1 or 0, sub- 4 tractions with 0, logical expressions with constant https://github.com/itu-square/Rascal-Light operands, and if-expressions with constant conditions. P11 Arithmetic expressions with no variables have type int Table 1. Time and success rate for analyzing programs and and no type errors properties presented earlier this section. P12 Interpreting expressions with no integer constants and let’s gives only Boolean values Transformation LOC Runtime [s] Property Verified P13 Compiling expressions with no if’s produces no goto’sand P1 ✓ NNF 15 7.3 if instructions P2 ✓ P3 ✗ P14 Compiling expressions with no if’s produces no labels RSF 35 6.0 and does not change label counter P4 ✓ P5 ✓ All these transformations satisfy the following criteria: DSO 125 25.0 P6 ✓ 1. They are formulated by an independent source, P7 ✗ 2. They can be translated in relatively straightforward 1.6 P8 ✓ G2P 350 manner to our subset of Rascal, and 3.5 P9 ✓ 3. They exercise important constructs, including visitors 1.6 P10 ✓ and the expressive pattern matching 0.7 P11 ✓ MCL 0.6 P12 ✓ We have ported all these programs to Rascal Light. 298 P13 ✓ 0.9 3https://github.com/BulgariaPHP/glagol-dsl P14 ✓

157 GPCE ’18, November 5–6, 2018, Boston, MA, USA A. S. Al-Sibahi, T. P. Jensen, A. S. Dimovski and A. Wąsowski

syntactic program points using rich domains, while folding Figure 8. Initial and inferred refinement types for NNF happens dynamically on the semantic execution graph. Definitional interpreters have been suggested as a tech- 1 data FIn = and(FIn, FIn) | atom(str) | neg(FIn) nique for building compositional abstract interpreters [21]. 2 | imp(FIn, FIn) | or(FIn, FIn) 3 They rely on a caching algorithm to ensure termination, sim- ilarly to ordinary finite input trace memoization [39]. Simi- 4 data FOut = and(FOut, FOut) | atom(str) larly, Van Horn and Might [27] present a systematic frame- 5 | neg(atom(str)) | or(FOut, FOut) work to abstract higher-order functional languages. They rely on store-allocated continuations to handle recursion, which is kept finite during abstraction to ensure a terminat- concrete inputs satisfying the given properties. We remark ing analysis. We focused on providing a more precise widen- that all programs use the high-level expressive features of ing based on the abstract input value, which was necessary Rascal and are succinct compared to general purpose code. for verifying the required properties in our evaluation. The runtime, varying from single seconds to less than a Modern SMT solvers supports reasoning with inductive minute, is reasonable. All, but two, properties were success- functions defined over algebraic data-types [36]. The proper- fully verified. The reason that our tool runs slower onthe ties they can verify are very expressive, but they do not scale DSO transformation is that it contains function calls and we to large programs like transformations. Possible constructor rely on inlining for interprocedural analysis. analysis [6] has been used to calculate the actual dependen- FIn Lines 1–2 in Fig.8 show the input refinement type for cies of a predicate and make flow-sensitive analyses more the normalization procedure. The inferred inductive output precise. This analysis works with complex data-types and FOut type (lines 4–5) specifies that the implication is not arrays, but only captures the prefix of the target structures. present in the output (P1), and negation only allows atoms as Techniques for model transformation verification on static subformulae (P2). In fact, Rabit inferred a precise formulation analysis [20] have been suggested, but are on verification of negation normal form as an inductive data type. of types and undefinedness properties. Symbolic execution has previously been suggested [3] as a way to validate high- 10 Related Work level transformation programs, but it targets test generation We start with discussing techniques that could make Rabit rather than verification of properties. Semantic typing [9, 13] verify properties like P3 and P7. To verify P3, we need to be has been used to infer recursive type and shape properties able to relate field names to their corresponding definitions for language with high-level constructs for querying and it- in the class. Relational abstract interpreteration [33] allows eration. However, they only consider small calculi compared specifying constraints that relate values across different vari- to Rascal Light, and our evaluation is more extensive. ables, even inside and across substructures [14, 25, 31]. For a concrete input of P7, we know that the number of auxiliary 11 Conclusion data elements decreases on each iteration, but this informa- Our goal was to use abstract interpretation to give a solid tion is lost in our abstraction. A possible solution could be to semantic foundation for analyzing programs in modern high- allow abstract attributes that extract additional information level transformation languages. We designed and imple- about the abstracted structures [11, 35, 47]. A generalization mented a Schmidt-style abstract interpreter, Rabit, including of the multiset abstraction [34], could be useful to track e.g., partition-driven trace memoization that supports infinite in- the auxiliary statement count, and show that they decrease put domains. This worked well for Rascal, and can be adapted using multiset-ordering [23]. Other techniques [5, 14, 49] sup- for similar languages with complex control flow. The modu- port inferring inductive relational properties for general data- lar construction of abstract domains was vital for handling a types—e.g, binary tree property—but require a pre-specified language of this scale and complexity. We evaluated Rabit structure to indicate where refinement can happen. on classical and open source transformations, by verifying a Cousot and Cousot [19] present a general framework for series of sophisticated shape properties for them. modularly constructing program analyses, but it requires lan- guages with compositional control flow. Toubhans, Rival and Chang [38, 48] develop a modular domain design for pointer- Acknowledgments manipulating programs, whereas our domain construction We thank Paul Klint, Tijs van der Storm, Jurgen Vinju and focuses on abstracting pure heterogeneous data-structures. Davy Landman for discussions on Rascal and its semantics. There are similarities between our work and verification We thank Rasmus Møgelberg and Jan Midtgaard for discus- techniques based on program transformation[22, 30]. Our sions on correctness of the recursive shape abstractions. This systematic exploration of execution rules for abstraction material is based upon work supported by the Danish Coun- is similar to unfolding, and widening is similar to folding. cil for Independent Research under Grant No. 0602-02327B The main difference is that abstract interpretation widens at and Innovation Fund Denmark under Grant No. 7039-00072B.

158 Verification of High-Level Transformations ... GPCE ’18, November 5–6, 2018, Boston, MA, USA

References [20] Jesús Sánchez Cuadrado, Esther Guerra, and Juan de Lara. 2017. Static [1] Alexander Aiken and Brian R. Murphy. 1991. Implementing Regular Analysis of Model Transformations. IEEE Trans. Software Eng. 43, 9 Tree Expressions. In FPLCA 1991. 427–447. https://doi.org/10.1007/ (2017), 868–897. https://doi.org/10.1109/TSE.2016.2635137 3540543961_21 [21] David Darais, Nicholas Labich, Phuc C. Nguyen, and David Van Horn. [2] Ahmad Salim Al-Sibahi. 2017. The Formal Semantics of Rascal Light. 2017. Abstracting definitional interpreters (functional pearl). PACMPL CoRR abs/1703.02312 (2017). http://arxiv.org/abs/1703.02312 1, ICFP (2017), 12:1–12:25. https://doi.org/10.1145/3110256 [3] Ahmad Salim Al-Sibahi, Aleksandar S. Dimovski, and Andrzej Wa- [22] Emanuele De Angelis, Fabio Fioravanti, Alberto Pettorossi, and Maur- sowski. 2016. Symbolic execution of high-level transformations. In izio Proietti. 2014. Program verification via iterated specialization. Sci. SLE 2016. 207–220. http://dl.acm.org/citation.cfm?id=2997382 Comput. Program. 95 (2014), 149–175. https://doi.org/10.1016/j.scico. [4] Ahmad Salim Al-Sibahi, Thomas P. Jensen, Aleksandar S. Dimovski, 2014.05.017 and Andrzej Wąsowski. 2018. Verification of High-Level Transforma- [23] Nachum Dershowitz and Zohar Manna. 1979. Proving Termination tions with Inductive Refinement Types. ArXiv e-prints (Sept. 2018). with Multiset Orderings. Commun. ACM 22, 8 (1979), 465–476. https: arXiv:cs.PL/1809.06336 https://arxiv.org/abs/1809.06336 //doi.org/10.1145/359138.359142 [5] Aws Albarghouthi, Josh Berdine, Byron Cook, and Zachary Kincaid. [24] Timothy S. Freeman and Frank Pfenning. 1991. Refinement Types for 2015. Spatial Interpolants. In ESOP 2015. 634–660. https://doi.org/10. ML. In PLDI 1991. 268–277. http://doi.acm.org/10.1145/113445.113468 1007/978-3-662-46669-8_26 [25] Nicolas Halbwachs and Mathias Péron. 2008. Discovering properties [6] Oana Fabiana Andreescu, Thomas Jensen, and Stéphane Lescuyer. 2015. about arrays in simple programs. In PLDI 2008. 339–348. https://doi. Dependency Analysis of Functional Specifications with Algebraic Data org/10.1145/1375581.1375623 Structures. In ICFEM 2015. 116–133. https://doi.org/10.1007/978-3- [26] John Harrison. 2009. Handbook of Practical Logic and Automated 319-25423-4_8 Reasoning. Cambridge University Press. [7] Bas Basten, Jeroen van den Bos, Mark Hills, Paul Klint, Arnold [27] David Van Horn and Matthew Might. 2010. Abstracting abstract ma- Lankamp, Bert Lisser, Atze van der Ploeg, Tijs van der Storm, and chines. In Proceeding of the 15th ACM SIGPLAN international conference Jurgen J. Vinju. 2015. Modular language implementation in Ras- on Functional programming, ICFP 2010, Baltimore, Maryland, USA, Sep- cal - Experience Report. Sci. Comput. Program. 114 (2015), 7–19. tember 27-29, 2010, Paul Hudak and Stephanie Weirich (Eds.). ACM, http://dx.doi.org/10.1016/j.scico.2015.11.003 51–62. https://doi.org/10.1145/1863543.1863553 [8] Marcin Benke, Peter Dybjer, and Patrik Jansson. 2003. Universes for [28] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. 1993. Partial Generic Programs and Proofs in Theory. 10, 4 (2003), evaluation and automatic program generation. Prentice Hall. 265–289. [29] Paul Klint, Tijs van der Storm, and Jurgen Vinju. 2011. EASY Meta- [9] Véronique Benzaken, Giuseppe Castagna, Kim Nguyen, and Jérôme programming with Rascal. In GTTSE III, JoãoM. Fernandes, Ralf Läm- Siméon. 2013. Static and dynamic semantics of NoSQL languages. In mel, Joost Visser, and João Saraiva (Eds.). 222–289. https://doi.org/10. POPL 2013. 101–114. https://doi.org/10.1145/2429069.2429083 1007/978-3-642-18023-1_6 [10] Martin Bodin, Thomas Jensen, and Alan Schmitt. 2015. Certified [30] Alexei P. Lisitsa and Andrei P. Nemytykh. 2015. Finite Countermodel Abstract Interpretation with Pretty-Big-Step Semantics. In CPP 2015. Based Verification for Program Transformation (A Case Study). In 29–40. https://doi.org/10.1145/2676724.2693174 Proceedings of the Third International Workshop on Verification and [11] Ahmed Bouajjani, Cezara Dragoi, Constantin Enea, and Mihaela Program Transformation, VPT@ETAPS 2015, London, United Kingdom, Sighireanu. 2012. Abstract Domains for Automated Reasoning about 11th April 2015. (EPTCS), Alexei Lisitsa, Andrei P. Nemytykh, and List-Manipulating Programs with Infinite Data. In VMCAI 2012. 1–22. Alberto Pettorossi (Eds.), Vol. 199. 15–32. https://doi.org/10.4204/ https://doi.org/10.1007/978-3-642-27940-9_1 EPTCS.199.2 [12] Martin Bravenboer, Karl Trygve Kalleberg, Rob Vermaas, and Eelco [31] Jiangchao Liu and Xavier Rival. 2017. An array content static analysis Visser. 2008. Stratego/XT 0.17. A language and toolset for program based on non-contiguous partitions. Computer Languages, Systems & transformation. Sci. Comput. Program. 72, 1-2 (2008), 52–70. https: Structures 47 (2017), 104–129. https://doi.org/10.1016/j.cl.2016.01.005 //doi.org/10.1016/j.scico.2007.11.003 [32] Neil Mitchell and Colin Runciman. 2007. Uniform boilerplate and [13] Giuseppe Castagna and Kim Nguyen. 2008. Typed iterators for XML. list processing. In Haskell 2007, Freiburg, Germany. 49–60. https: In ICFP 2008. 15–26. https://doi.org/10.1145/1411204.1411210 //doi.org/10.1145/1291201.1291208 [14] Bor-Yuh Evan Chang and Xavier Rival. 2008. Relational Inductive [33] Alan Mycroft and Neil D. Jones. 1985. A relational framework for Shape Analysis. In POPL 2008. 247–260. https://doi.org/10.1145/ abstract interpretation. In Programs as Data Objects. 156–171. https: 1328438.1328469 //doi.org/10.1007/3-540-16446-4_9 [15] James Chapman, Pierre-Évariste Dagand, Conor McBride, and Peter [34] Valentin Perrelle and Nicolas Halbwachs. 2010. An Analysis of Permu- Morris. 2010. The gentle art of levitation. In ICFP 2010. 3–14. https: tations in Arrays. In VMCAI 2010. 279–294. https://doi.org/10.1007/978- //doi.org/10.1145/1863543.1863547 3-642-11319-2_21 [16] James R. Cordy. 2006. The TXL source transformation language. Sci. [35] Tuan-Hung Pham and Michael W. Whalen. 2013. An Improved Comput. Program. 61, 3 (2006), 190–210. https://doi.org/10.1016/j.scico. Unrolling-Based Decision Procedure for Algebraic Data Types. In 2006.04.002 VSTTE 2013. 129–148. https://doi.org/10.1007/978-3-642-54108-7_7 [17] Patrick Cousot. 2003. Verification by Abstract Interpretation. In Verifi- [36] Andrew Reynolds and Viktor Kuncak. 2015. Induction for SMT Solvers. cation: Theory and Practice, Essays Dedicated to Zohar Manna on the In VMCAI 2015. 80–98. https://doi.org/10.1007/978-3-662-46081-8_5 Occasion of His 64th Birthday. 243–268. https://doi.org/10.1007/978-3- [37] Xavier Rival and Laurent Mauborgne. 2007. The trace partitioning 540-39910-0_11 abstract domain. ACM Trans. Program. Lang. Syst. 29, 5 (2007), 26. [18] Patrick Cousot and Radhia Cousot. 1995. Formal Language, Grammar https://doi.org/10.1145/1275497.1275501 and Set-Constraint-Based Program Analysis by Abstract Interpretation. [38] Xavier Rival, Antoine Toubhans, and Bor-Yuh Evan Chang. 2014. Con- In FPCA 1995. 170–181. http://doi.acm.org/10.1145/224164.224199 struction of Abstract Domains for Heterogeneous Properties. In ISoLA [19] Patrick Cousot and Radhia Cousot. 2002. Modular Static Program 2014. 489–492. https://doi.org/10.1007/978-3-662-45231-8_40 Analysis. In CC 2002. 159–178. https://doi.org/10.1007/3-540-45937- [39] Mads Rosendahl. 2013. Abstract Interpretation as a Programming 5_13 Language. In Semantics, Abstract Interpretation, and Reasoning about

159 GPCE ’18, November 5–6, 2018, Boston, MA, USA A. S. Al-Sibahi, T. P. Jensen, A. S. Dimovski and A. Wąsowski

Programs: Essays Dedicated to David A. Schmidt on the Occasion of his [46] Morten Heine Sørensen, Robert Glück, and Neil D. Jones. 1996. A Sixtieth Birthday. 84–104. https://doi.org/10.4204/EPTCS.129.7 Positive Supercompiler. J. Funct. Program. 6, 6 (1996), 811–838. https: [40] John M. Rushby, Sam Owre, and Natarajan Shankar. 1998. Subtypes //doi.org/10.1017/S0956796800002008 for Specifications: Predicate Subtyping in PVS. IEEE Trans. Software [47] Philippe Suter, Mirco Dotta, and Viktor Kuncak. 2010. Decision pro- Eng. 24, 9 (1998), 709–720. https://doi.org/10.1109/32.713327 cedures for algebraic data types with abstractions. In POPL 2010, [41] David A. Schmidt. 1998. Trace-Based Abstract Interpretation of Opera- Manuel V. Hermenegildo and Jens Palsberg (Eds.). ACM, 199–210. tional Semantics. Lisp and Symbolic Computation 10, 3 (1998), 237–271. https://doi.org/10.1145/1706299.1706325 [42] Dana S. Scott. 1976. Data Types as Lattices. SIAM J. Comput. 5, 3 [48] Antoine Toubhans, Bor-Yuh Evan Chang, and Xavier Rival. 2013. Re- (1976), 522–587. http://dx.doi.org/10.1137/0205037 duced Product Combination of Abstract Domains for Shapes. In VMCAI [43] Peter Sestoft and Niels Hallenberg. 2017. con- 2013. 375–395. https://doi.org/10.1007/978-3-642-35873-9_23 cepts. Springer. [49] Niki Vazou, Patrick Maxim Rondon, and Ranjit Jhala. 2013. Abstract [44] AnthonyM. Sloane. 2011. Lightweight Language Processing in Kiama. Refinement Types. In ESOP 2013. 209–228. https://doi.org/10.1007/978- In GTTSE III, JoãoM. Fernandes, Ralf Lämmel, Joost Visser, and João 3-642-37036-6_13 Saraiva (Eds.). Lecture Notes in Computer Science, Vol. 6491. Springer [50] Glynn Winskel. 1993. Information Systems. MIT Press, Chapter 12. Berlin Heidelberg, 408–425. https://doi.org/10.1007/978-3-642-18023- [51] . 1996. Compiler Construction. Addison-Wesley. 1_12 [52] Hongwei Xi and Frank Pfenning. 1998. Eliminating Array Bound [45] Michael B. Smyth and Gordon D. Plotkin. 1982. The Category- Checking Through Dependent Types. In PLDI 1998. 249–257. https: Theoretic Solution of Recursive Domain Equations. SIAM J. Comput. //doi.org/10.1145/277650.277732 11, 4 (1982), 761–783. http://dx.doi.org/10.1137/0211062

160