arXiv:1809.06336v1 [cs.PL] 17 Sep 2018

Verification of High-Level Transformations with Inductive Refinement Types

Ahmad Salim Al-Sibahi
University of Copenhagen, Skanned.com, and IT University of Copenhagen, Denmark
ahmad@{di.ku.dk,skanned.com}

Thomas P. Jensen
Inria Rennes, France

Aleksandar S. Dimovski
IT University of Copenhagen, Denmark, and Mother Teresa University, Skopje, Macedonia

Andrzej Wąsowski
IT University of Copenhagen, Denmark

Abstract
High-level transformation languages like Rascal include expressive features for manipulating large abstract syntax trees: first-class traversals, expressive pattern matching, backtracking and generalized iterators. We present the design and implementation of an abstract interpretation tool, Rabit, for verifying inductive type and shape properties for transformations written in such languages. We describe how to perform abstract interpretation based on operational semantics, specifically focusing on the challenges arising when analyzing the expressive traversals and pattern matching. Finally, we evaluate Rabit on a series of transformations (normalization, desugaring, refactoring, code generators, type inference, etc.) showing that we can effectively verify stated properties.

CCS Concepts • Software and its engineering → Translator writing systems and compiler generators; Semantics; • Theory of computation → Program analysis; Abstraction; Functional constructs; Program schemes; Operational semantics; Control primitives; Program verification;

Keywords transformation languages, abstract interpretation, static analysis

ACM Reference Format:
Ahmad Salim Al-Sibahi, Thomas P. Jensen, Aleksandar S. Dimovski, and Andrzej Wąsowski. 2018. Verification of High-Level Transformations with Inductive Refinement Types. In Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE '18), November 5–6, 2018, Boston, MA, USA. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3278122.3278125

This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE '18), November 5–6, 2018, Boston, MA, USA, https://doi.org/10.1145/3278122.3278125.

GPCE '18, November 5–6, 2018, Boston, MA, USA
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.

1 Introduction
Transformations play a central role in software development. They are used, amongst others, for desugaring, model transformations, refactoring, and code generation. The artifacts involved in transformations (e.g., structured data, domain-specific models, and code) often have large abstract syntax, spanning hundreds of syntactic elements, and a correspondingly rich semantics. Thus, writing transformations is a tedious and error-prone process. Specialized languages and frameworks with high-level features have been developed to address this challenge of writing and maintaining transformations. These languages include Rascal [31], Stratego/XT [11], TXL [15], Uniplate [34] for Haskell, and Kiama [46] for Scala. For example, Rascal combines a functional core language supporting state and exceptions, with constructs for processing of large structures.

    public Script flattenBlocks(Script s) {
      solve (s) {
        s = bottom-up visit (s) {
          case stmtList: [*xs, block(ys), *zs] =>
            xs + ys + zs
        }
      }
      return s;
    }

Figure 1. Transformation in Rascal that flattens all nested blocks in a statement

Figure 1 shows an example Rascal transformation program [1] taken from a PHP analyzer. This transformation program recursively flattens all blocks in a list of statements. The program uses the following core Rascal features:

• A visitor (visit) to traverse and rewrite all statement lists containing a block to a flat list of statements. Visitors support various strategies, like the bottom-up strategy that traverses the abstract syntax tree starting from leaves toward the root.
• An expressive pattern matching language is used to non-deterministically find blocks inside a list of statements. The starred variable patterns *xs and *zs match an arbitrary number of elements in the list, respectively before and after the block(ys) element. Rascal supports non-linear matching, negative matching and specifying patterns that match deeply nested values.
• The solve-loop (solve) performing the rewrite until a fixed point is reached (the value of s stops changing).

[1] https://github.com/cwi-swat/php-analysis

To rule out errors in transformations, we propose a static analysis for enforcing type and shape properties, so that target transformations produce output adhering to particular shape constraints. For our PHP example, this would include:

• The transformation preserves the constructors used in the input: does not add or remove new types of PHP statements.
• The transformation produces flat statement lists, i.e., lists that do not recursively contain any block.

To ensure such properties, a verification technique must reason about shapes of inductive data (also inside collections such as sets and maps) while still maintaining soundness and precision. It must also track other important aspects, like cardinality of collections, which interact with target language operations including pattern matching and iteration.

In this paper, we address the problem of verifying type and shape properties for high-level transformations written in Rascal and similar languages. We show how to design and implement a static analysis based on abstract interpretation. Concretely, our contributions are:

1. An abstract interpretation-based static analyzer, Rascal ABstract Interpretation Tool (Rabit), that supports inferring types and inductive shapes for a large subset of Rascal.
2. An evaluation of Rabit on several program transformations: refactoring, desugaring, normalization algorithm, code generator, and language implementation of an expression language.
3. A modular design for abstract shape domains, that allows extending and replacing abstractions for concrete element types, e.g. extending the abstraction for lists to include length in addition to shape of contents.
4. Schmidt-style abstract operational semantics [43] for a significant subset of Rascal adapting the idea of trace memoization to support arbitrary recursive calls with input from infinite domains.

Together, these contributions show feasibility of applying abstract interpretation for constructing analyses for expressive transformation languages and properties.

We proceed by presenting a running example in Sect. 2. We introduce the key constructs of Rascal in Sect. 3. Section 4 describes the modular construction of abstract domains. Sections 5 to 8 describe abstract semantics. We evaluate the analyzer on realistic transformations, reporting results in Sect. 9. Sections 10 and 11 discuss related papers and conclude.

2 Motivation and Overview
Verifying types and state properties such as the ones stated for the program of Fig. 1 poses the following key challenges:

• The programs use heterogeneous inductive data types, and contain collections such as lists, maps and sets, and basic data such as integers and strings. This complicates construction of the abstract domains, since one shall model interaction between these different types while maintaining precision.
• The traversal of syntax trees depends heavily on the type and shape of input, on a complex program state, and involves unbounded recursion. This challenges the inference of approximate invariants in a procedure that both terminates and provides useful results.
• Backtracking and exceptions in large programs introduce the possibility of state-dependent non-local jumps. This makes it difficult to statically calculate the control flow of target programs and have a compositional denotational semantics, instead of an operational one.

    1 data Nat = zero() | suc(Nat pred);
    2 data Expr = var(str nm) | cst(Nat vl)
    3           | mult(Expr el, Expr er);
    4
    5 Expr simplify(Expr expr) =
    6   bottom-up visit (expr) {
    7     case mult(cst(zero()), y) => cst(zero())
    8     case mult(x, cst(zero())) => cst(zero())
    9   };

Figure 2. The running example: eliminating multiplications by zero from expressions

Figure 2 presents a pedagogical example using visitors. The program performs expression simplification by traversing a syntax tree bottom-up and reducing multiplications by constant zero. We now survey the analysis techniques contributed in this paper, explaining them using this example.

Inductive refinement types. Rabit works by inferring an inductive refinement type representing the shape of possible output of a transformation given the shape of its input. It does this by interpreting the simplification program abstractly, considering all possible paths the program can take for values satisfying the input shape (any expression of type Expr in this case). The result of running Rabit on this case is:

    success  cst(Nat) ≀ var(str) ≀ mult(Expr′, Expr′)
    fail     cst(Nat) ≀ var(str) ≀ mult(Expr′, Expr′)

where Expr′ = cst(suc(Nat)) ≀ var(str) ≀ mult(Expr′, Expr′).

We briefly interpret how to read this type. The bar ≀ denotes a choice between alternative constructors. If the input was rewritten during traversal (success, the first line) then the resulting syntax tree contains no multiplications by zero. All multiplications may only involve Expr′, which disallows the zero constant at the top level. Observe this in the last alternative mult(Expr′, Expr′) that contains only expressions of type Expr′, which in turn only allows multiplications by constants constructed using suc(Nat) (that is ≥ 1). If the traversal failed to match (fail, the second line), then the input did not contain any multiplication by zero to begin with, and neither does the output, which has not been rewritten.

The success and failure happen to be the same for our example, but this is not necessarily always the case. Keeping separate result values allows retaining precision throughout the traversal, better reflecting concrete execution paths. We now proceed to discuss how Rabit can infer this shape using abstract interpretation.

Abstractly interpreting traversals. The core idea of abstractly executing a traversal is similar to concrete execution: we recursively traverse the input structure and rewrite the values that match target patterns. However, because of abstraction we must make sure to take into account all applicable paths. Figure 3 shows the execution tree of the traversal on the simplification example (Fig. 2) when it starts with shape mult(cst(Nat), cst(Nat)). Since there is only one constructor, it will initially recurse down to traverse the contained values (children) creating a new recursion node (yellow, light shaded) in the figure (ii) containing the left child cst(Nat), and then recurse again to create a node (iii) containing Nat. Observe here that Nat is an abstract type with two possible constructors (zero, suc(·)), and it is unknown at the time of abstract interpretation which of these constructors we have. When Rabit hits a type or a choice between alternative constructors, it explores each alternative separately creating new partition nodes (blue, darker). In our example we partition the Nat type into its constructors zero (node iv) and suc(Nat) (node v). The zero case now represents the first case without children and we can run the visitor operations on it. Since no pattern matches zero it will return a fail zero result indicating that it has not been rewritten. For the suc(Nat) case it will try to recurse down to Nat (node vi) which is equal to (node iii). Here, we observe a problem: if we continue our traversal algorithm as is, we will not terminate and get a result. To provide a terminating algorithm we will resort to using trace memoization.

[Figure 3 diagram not reproduced: execution tree rooted at mult(cst(Nat), cst(Nat)) (node i), with recursion nodes cst(Nat) (ii) and Nat (iii, vi), and partition nodes zero (iv, result fail zero) and suc(Nat) (v).]

Figure 3. Naively abstractly interpreting the simplification example from Fig. 2 with initial input mult(cst(Nat), cst(Nat)). The procedure does not terminate because of infinite recursion on Nat.

Partition-driven trace memoization. The idea is to detect the paths where execution recursively meets similar input, merging the new recursive node with the similar previous one, thus creating a loop in the execution tree [41, 43]. This loop is then resolved by a fixed-point iteration.

In Rabit, we propose partition-driven trace memoization, which works with potentially unbounded input like the inductive type refinements that are supported by our abstraction. We detect cycles by maintaining a memoization map which for each type (used for partitioning) stores the last traversed value (input) and the last result produced for this value (output). This memoization map is initialized to map all types to the bottom element (⊥) for both input and output. The evaluation is modified to use the memoization map, so it checks on each iteration the input i against the map:

• If the last processed refinement type representing the input i′ is greater than the current input (i′ ⊒ i), then it uses the corresponding output; i.e., we found a hit in the memoization map.
• Otherwise, it will merge the last processed and current input refinement types to a new value i′′ = i′∇i, update the memoization map and continue execution with i′′. The operation ∇ is called a widening; it ensures that the result is an upper bound of its inputs, i.e., i′ ⊑ i′′ ⊒ i and that the merging will eventually terminate for the increasing chain of values. The memoization map is updated to map the general type of i′′ (not refined, for instance Nat) to a pair (i′′, o), where the first component denotes the new input i′′ refinement type and the second component denotes

  the corresponding output o refinement type; initially, o is set to ⊥ and then changed to the result of executing input i′′ repeatedly until a fixed-point is reached.

We demonstrate the trace memoization and fixed-point iteration procedures on Nat in Fig. 4, beginning with the leftmost tree. The expected result is fail Nat, meaning that no pattern has matched, no rewrite has happened, and a value of type Nat is returned, since the simplification program only introduces changes to values of type Expr.

We show the memoization map inside a framed orange box. The result of the widening is presented below the memoization map. In all cases the widening in Fig. 4 is trivial, as it happens against ⊥. The final line in node 1 stores the value oprev produced by the previous iteration of the traversal, to establish whether a fixed point has been reached (⊥ initially).

Trace partitioning. We partition [39] the abstract value Nat along its constructors: zero and suc(·) (Fig. 4). This partitioning is key to maintaining precision during the abstract interpretation. As in Fig. 3, the left branch fails immediately, since no pattern in Fig. 2 matches zero. The right branch descends into a new recursion over Nat, with an updated memoization table. This run terminates, due to a hit in the memoization map, returning ⊥. After returning, the value of suc(Nat) should be reconstructed with the result of traversing the child Nat, but since the result is ⊥ there is no value to reconstruct with, so ⊥ is just propagated upwards. At the return to the last widening node, the values are joined, and widened with the previous iteration result oprev (the dotted arrow on top). This process repeats in the second and third iterations, but now the reconstruction in node 3 succeeds: the child Nat is replaced by zero and fail suc(zero) is returned (dashed arrow from 3 to 1). In the third iteration, we join and widen the following components (cf. oprev and the dashed arrows incoming into node 1 in the rightmost column):

    [zero ≀ suc(zero) ∇ (zero ⊔ suc(zero ≀ suc(zero)))] = Nat

Here, the widening operator used [17] accelerates the convergence by increasing the value to represent the entire type Nat. It is easy to convince yourself, by following the same recursion steps as in the figure, that the next iteration, using oprev = Nat, will produce Nat again, arriving at a fixed point. Observe how consulting the memoization map, and widening the current value accordingly, allowed us to avoid infinite recursion over unfoldings of Nat.

Nesting fixed point iterations. When inductive shapes (e.g., Expr) refer to other inductive shapes (e.g., Nat), it is necessary to run nested fixed-point iterations to solve recursion at each level. Figure 5 returns to the more high-level fragment of the traversal of Expr starting with mult(cst(Nat), cst(Nat)) as in Fig. 3. We follow the recursion tree along nodes 5, 6, 7, 8, 9, 10, 9, 6 with the same rules as in Fig. 4. In node 10 we run a nested fixed point iteration on Nat, already discussed in Fig. 4, so we just include the final result.

Type refinement. The output of the first iteration in node 6 is fail cst(Nat), which becomes the new oprev, and the second iteration begins (to the right). After the widening the input is partitioned into e (node 7) and cst(Nat) (node elided). When the second iteration returns to node 7 we have the following reconstructed value: mult(cst(Nat), cst(Nat)). Contrast this with lines 7-8 in Fig. 2, to see that running the abstract value against this pattern might actually produce success. In order to obtain precise result shapes, we refine the input values when they fail to match a pattern. Our abstract interpreter produces a refinement of the type, by running it through the pattern matching, giving:

    success  cst(Nat)
    fail     mult(cst(suc(Nat)), cst(suc(Nat)))

The result means that if the pattern match succeeds then it produces an expression of type cst(Nat). More interestingly, if the matching failed, neither the left nor the right argument of mult(·, ·) could have contained the constant zero: the interpreter captured some aspect of the semantics of the program by refining the input type. Naturally, from this point on the recursion and iteration continues, but we shall abandon the example, and move on to formal developments.

3 Formal Language
The presented technique is meant to be general and applicable to many high-level transformation languages. However, to keep the presentation concise, we focus on a few key constructs from Rascal [31], relying on the concrete semantics from Rascal Light [2].

We consider algebraic data types ($at$) and finite sets ($\mathsf{set}\langle t \rangle$) of elements of type $t$. Each algebraic data type $at$ has a set of unique constructors. Each constructor $k(\underline{t})$ has a fixed set of typed parameters. The language includes sub-typing, with void and value as bottom and top types respectively.

$$t \in \mathit{Type} ::= \mathsf{void} \mid \mathsf{set}\langle t \rangle \mid at \mid \mathsf{value}$$

We consider the following subset of Rascal expressions. From left to right we have: variable access, assignments, sequencing, constructor expressions, set literal expressions, matching failure expression, and bottom-up visitors:

$$e ::= x \in \mathit{Var} \mid x = e \mid e; e \mid k(\underline{e}) \mid \{\underline{e}\} \mid \mathsf{fail} \mid \mathsf{visit}\ e\ \underline{cs} \qquad cs ::= \mathsf{case}\ p \Rightarrow e$$

Visitors are a key construct in Rascal. A visitor $\mathsf{visit}\ e\ \underline{cs}$ traverses recursively the value obtained by evaluating $e$ (any combination of simple values, data type values and collections). During the traversal, case expressions $\underline{cs}$ are applied to the nodes, and the values matching target patterns are rewritten. We will discuss a concrete subset of patterns $p$

[Figure 4 diagram not reproduced. Each of the three iterations starts at node 1 with input Nat, memoization map Nat ↦ ⊥, ⊥ and widening ⊥∇Nat = Nat, partitions into zero (node 2) and suc(Nat) (node 3), and recurses into node 4. Iteration 1: oprev = ⊥, hit ⊥ in node 4, no reconstruction, output fail ⊥∇(zero ⊔ ⊥) = fail zero. Iteration 2: oprev = fail zero, hit fail zero, reconstruction fail suc(zero), output fail zero∇(zero ⊔ suc(zero)) = fail zero ≀ suc(zero). Iteration 3: oprev = fail zero ≀ suc(zero), hit fail zero ≀ suc(zero), output fail Nat.]

Figure 4. Three iterations of a fixed point computation for input Nat. Iterations are separated by dotted arrows on top
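
The fixed-point computation of Fig. 4 can be replayed with a small Python sketch. Refinements of Nat are encoded as finite sets of numerals, the string "Nat" stands for the unrefined type, and the widening jumps to Nat once a numeral deeper than `MAX_DEPTH` appears. This depth-bounded widening and all names below are assumptions chosen to reproduce the three iterations of the figure; the text only cites [17] for the widening operator that Rabit actually uses.

```python
NAT = "Nat"          # the unrefined type, top of the Nat refinements
BOT = frozenset()    # bottom: no value at all
MAX_DEPTH = 2        # hypothetical bound used by this sketch's widening


def join(a, b):
    """Least upper bound of two Nat refinements."""
    return NAT if NAT in (a, b) else a | b


def widen(old, new):
    """Join, but give up and return Nat when a numeral nests too deeply."""
    joined = join(old, new)
    if joined == NAT:
        return NAT
    # The numeral n is built from n + 1 constructors (n times suc, then zero).
    if any(n + 1 > MAX_DEPTH for n in joined):
        return NAT
    return joined


def traverse_nat(memo):
    """One abstract traversal of input Nat, partitioned by constructor."""
    _, child_output = memo[NAT]       # the recursive call on the child hits the map
    zero_branch = frozenset({0})      # no pattern matches zero: fail zero
    if child_output == NAT:
        return NAT                    # zero and suc(Nat) together cover the whole type
    # Reconstruct suc(child) from the memoized child output; bottom stays bottom.
    suc_branch = frozenset(n + 1 for n in child_output)
    return join(zero_branch, suc_branch)


def analyse_nat():
    """Iterate until the output stops changing; return the result and all steps."""
    o_prev, trace = BOT, [BOT]
    while True:
        memo = {NAT: (NAT, o_prev)}   # input Nat mapped to the last known output
        o_next = widen(o_prev, traverse_nat(memo))
        if o_next == o_prev:
            return o_prev, trace
        o_prev = o_next
        trace.append(o_prev)


result, trace = analyse_nat()
```

Here `trace` passes through ⊥, zero, zero ≀ suc(zero) and finally Nat, matching the sequence of oprev values in the figure. Without the widening the loop would keep adding deeper numerals and never stabilize.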

[Figure 5 diagram not reproduced: recursion tree over nodes 5 to 11 for input e = mult(cst(Nat), cst(Nat)). Node 5 starts with memoization map Expr ↦ ⊥, ⊥ and Nat ↦ ⊥, ⊥, widening ⊥∇e = e and oprev = ⊥. Node 6 receives input cst(Nat) with Expr ↦ e, ⊥ and widening e∇cst(Nat) = e ≀ cst(Nat); its first-iteration output is fail ⊥∇(⊥ ⊔ cst(Nat)) = fail cst(Nat), which is oprev in the second iteration. Node 10 runs the nested iteration on Nat and returns fail Nat.]

Figure 5. A prefix of the abstract interpreter run for e = mult(cst(Nat), cst(Nat)). Fragments of two iterations involving node 6 are shown, separated by a dotted arrow.

further in Sect. 6. For brevity, we only discuss the bottom-up visitors in the paper. However, Rabit (Sect. 9) supports all visitor strategies of Rascal.

Notation. We write $(x, y) \in f$ to denote the pair $(x, y)$ such that $x \in \mathrm{dom}\, f$ and $y = f(x)$. Abstract semantic components, sets, and operations are marked with a hat: $\widehat{a}$. A sequence of $e_1, \ldots, e_n$ is contracted using an underlining $\underline{e}$. The empty sequence is written by $\varepsilon$, and concatenation of sequences $\underline{e_1}$ and $\underline{e_2}$ is written $\underline{e_1}, \underline{e_2}$. Notation is lifted to sequences in an intuitive manner: for example given a sequence $\underline{v}$, the value $v_i$ denotes the $i$th element in the sequence, and $\underline{v : t}$ denotes the sequence $v_1 : t_1, \ldots, v_n : t_n$.

4 Abstract Domains
Our abstract domains are designed to allow modular composition. Modularity is key for transformation languages, which manipulate a large variety of kinds of values. The design allows easily replacing abstract domains for particular types of values, as well as adding support for new value types. We want to construct an abstract value domain $\widehat{vs} \in \widehat{\mathit{ValueShape}}$ which captures inductive refinement types of form:

$$\widehat{at}^{r} = k_1(\widehat{vs}_1) \wr \cdots \wr k_n(\widehat{vs}_n)$$

where each value $\widehat{vs}_i$ can possibly recursively refer to $\widehat{at}^{r}$. Below, we define abstract domains for sets, data types and recursively defined domains.

The modular domain design generalizes parameterized domains [16] to follow a design inspired by the modular construction of types and domains [7, 14, 44]. The idea is to define domains parametrically, i.e. in the form $\widehat{F}(\widehat{E})$, so that abstract domains for subcomponents are taken as parameters, and explicit recursion is handled separately. We use standard domain combinators [52] to combine the various domains into our target abstract value domain.

Set shape domain. Let $\mathit{Set}(E)$ denote the domain of sets consisting of elements taken from $E$. We define abstract finite sets using abstract elements $\{\widehat{e}\}_{[l;u]}$ from a parameterized domain $\widehat{\mathit{SetShape}}(\widehat{E})$. The component from the parameter domain ($\widehat{e} \in \widehat{E}$) represents the abstraction of the shape of elements, and a non-negative interval component $[l;u] \in \widehat{\mathit{Interval}}^{+}$ is used to abstract over the cardinality (so $l, u \in \mathbb{R}^{+}$ and $l \le u$). The abstract set element acts as a reduced product between $\widehat{e}$ and $[l;u]$ and thus the lattice operations follow directly.

Given a concretization function for the abstract content domain $\gamma_{E} \in \widehat{E} \to \wp(E)$, we can define a concretization function for the abstract set shape domain to possible finite sets of concrete elements $\gamma_{SS} \in \widehat{\mathit{SetShape}}(\widehat{E}) \to \wp(\mathit{Set}(E))$:

$$\gamma_{SS}(\{\widehat{e}\}_{[l;u]}) = \{\, es \mid es \subseteq \gamma_{E}(\widehat{e}) \wedge |es| \in \gamma_{I}([l;u]) \,\}$$

Example 4.1. Let $\widehat{\mathit{Interval}}$ be a domain of intervals of integers (a standard abstraction over integers). We can concretize abstract elements from $\widehat{\mathit{SetShape}}(\widehat{\mathit{Interval}})$ to a set of possible sets of integers from $\wp(\mathit{Set}(\mathbb{Z}))$ as follows:

$$\gamma_{SS}(\{[42;43]\}_{[1;2]}) = \{\{42\}, \{43\}, \{42, 43\}\}$$

Data shape domain. Inductive refinement types are defined as a generalization of refinement types [23, 42, 54] that inductively constrain the possible constructors and the content in a data structure. We use a parameterized abstraction of data types $\widehat{\mathit{DataShape}}(\widehat{E})$, whose parameter $\widehat{E}$ abstracts over the shape of constructor arguments:

$$\widehat{d} \in \widehat{\mathit{DataShape}}(\widehat{E}) = \{\bot_{DS}\} \cup \{\, k_1(\widehat{e}_1) \wr \ldots \wr k_n(\widehat{e}_n) \mid \widehat{e}_i \in \widehat{E} \,\} \cup \{\top_{DS}\}$$

We have the least element $\bot_{DS}$ and top element $\top_{DS}$ (respectively representing no data type value and all data type values) and otherwise a non-empty choice between unique (all different) constructors of the same algebraic data type $k_1(\widehat{e}_1) \wr \cdots \wr k_n(\widehat{e}_n)$ (shortened $k(\widehat{e})$). We can treat the constructor choice as a finite map $[k_1 \mapsto \widehat{e}_1, \ldots, k_n \mapsto \widehat{e}_n]$, and then directly define our lattice operations point-wise.

Given a concretization function for the abstract content domain $\gamma_{E} \in \widehat{E} \to \wp(E)$, we can create a concretization function for the data shape domain

$$\gamma_{DS} \in \widehat{\mathit{DataShape}}(\widehat{E}) \to \wp(\mathit{Data}(E))$$

where $\mathit{Data}(E) = \{\, k(v) \mid \exists \text{ a type } at.\ k(v) \in [\![at]\!] \wedge v \in E \,\}$. The concretization is defined as follows:

$$\gamma_{DS}(\bot_{DS}) = \emptyset \qquad \gamma_{DS}(\top_{DS}) = \mathit{Data}(E)$$
$$\gamma_{DS}(k_1(\widehat{e}_1) \wr \cdots \wr k_n(\widehat{e}_n)) = \{\, k_i(v) \mid i \in [1, n] \wedge v \in \gamma_{E}(\widehat{e}_i) \,\}$$

Example 4.2. We can concretize abstract data elements $\widehat{\mathit{DataShape}}(\widehat{\mathit{Interval}})$ to a set of possible concrete data values $\wp(\mathit{Data}(\mathbb{Z}))$. Consider values from the algebraic data type:

    data errorloc = repl() | linecol(int, int)

We can concretize abstract elements as follows:

$$\gamma_{DS}(\mathit{repl}() \wr \mathit{linecol}([1;1], [3;4])) = \{\mathit{repl}(), \mathit{linecol}(1, 3), \mathit{linecol}(1, 4)\}$$

Recursive shapes. We extend our abstract domains to cover recursive structures such as lists and trees. Given a type expression $F(X)$ with a variable $X$, we construct the abstract domain as the solution to the recursive equation $X = F(X)$ [44, 47, 52], obtained by iterating the induced map $F$ over the empty domain $0$ and adjoining a new top element to the limit domain. The concretization function of the recursive domain follows directly from the concretization function of the underlying functor domain.

Example 4.3. We can concretize abstract elements of the refinement type from our running example:

$$\gamma_{DS}(\mathit{Expr}^{e}) = \{\, \overbrace{\mathit{cst}(\mathit{suc}(\mathit{suc}(\mathit{zero})))}^{2},\ \mathit{mult}(2, 2),\ \mathit{mult}(\mathit{mult}(2, 2), 2),\ \ldots \,\}$$

where $\mathit{Expr}^{e} = \mathit{cst}(\mathit{suc}(\mathit{suc}(\mathit{zero}))) \wr \mathit{mult}(\mathit{Expr}^{e}, \mathit{Expr}^{e})$. In particular, our abstract element represents the set of all multiplications of the constant 2.

Value domains. We presented the required components for abstracting individual types, and now all that is left is putting everything together. We construct our value shape domain using choice and recursive domain equations:

$$\widehat{\mathit{ValueShape}} = \widehat{\mathit{SetShape}}(\widehat{\mathit{ValueShape}}) \oplus \widehat{\mathit{DataShape}}(\widehat{\mathit{ValueShape}})$$

Similarly, we have the corresponding concrete shape domain:

$$\mathit{Value} = \mathit{Set}(\mathit{Value}) \uplus \mathit{Data}(\mathit{Value})$$

We then have a concretization function $\gamma_{VS} \in \widehat{\mathit{ValueShape}} \to \wp(\mathit{Value})$, which follows directly from the previously defined concretization functions.

Abstract state domains
We now explain how to construct abstractions of states and results when executing Rascal programs.
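
Before turning to stores and results, the concretizations of Examples 4.1 and 4.2 can be checked mechanically for small finite cases. The sketch below is a Python illustration under stated assumptions: intervals are finite pairs `(lo, hi)`, a data shape is a dictionary from constructor name to a tuple of interval arguments, and the function names `gamma_interval`, `gamma_set_shape` and `gamma_data_shape` are hypothetical, not part of Rabit.

```python
from itertools import combinations, product


def gamma_interval(interval):
    """Concretize a finite integer interval [lo; hi] to the set of its members."""
    lo, hi = interval
    return set(range(lo, hi + 1))


def gamma_set_shape(elem, card):
    """All sets drawn from gamma(elem) whose size lies in the cardinality interval."""
    universe = sorted(gamma_interval(elem))
    return {
        frozenset(chosen)
        for size in sorted(gamma_interval(card))
        for chosen in combinations(universe, size)
    }


def gamma_data_shape(shape):
    """All constructor applications whose arguments lie in the abstract arguments."""
    return {
        (name, *args)
        for name, abstract_args in shape.items()
        for args in product(*(sorted(gamma_interval(a)) for a in abstract_args))
    }


# Example 4.1: the abstract set {[42;43]} with cardinality [1;2].
sets = gamma_set_shape((42, 43), (1, 2))

# Example 4.2: repl() or linecol([1;1], [3;4]).
errorlocs = gamma_data_shape({"repl": (), "linecol": ((1, 1), (3, 4))})
```

Both functions take the concretization of the parameter domain as given (here fixed to intervals), which mirrors the parametric form of the domains: swapping `gamma_interval` for another element concretization would leave the set and data shape code unchanged.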


Abstract store domain. Tracking assignments of variables is important since matching variable patterns depends on the value being assigned in the store:

$$\widehat{\sigma} \in \widehat{\mathit{Store}} = \mathit{Var} \to \{\mathsf{ff}, \mathsf{tt}\} \times \widehat{\mathit{ValueShape}}$$

For a variable $x$ we get $\widehat{\sigma}(x) = (b, \widehat{vs})$ where $b$ is true if $x$ might be unassigned, and false otherwise (when $x$ is definitely assigned). The second component, $\widehat{vs}$, is a shape approximating a possible value of $x$.

We lift the orderings and lattice operations point-wise from the value shape domain to abstract stores. We define the concretization function $\gamma_{\mathit{Store}} \in \widehat{\mathit{Store}} \to \wp(\mathit{Store})$ as:

$$\gamma_{\mathit{Store}}(\widehat{\sigma}) = \left\{\, \sigma \;\middle|\; \forall x, b, \widehat{vs}.\ \widehat{\sigma}(x) = (b, \widehat{vs}) \Rightarrow (\neg b \Rightarrow x \in \mathrm{dom}\, \sigma) \wedge (x \in \mathrm{dom}\, \sigma \Rightarrow \sigma(x) \in \gamma_{V}(\widehat{vs})) \,\right\}$$

Abstract result domain. Traditionally, abstract control flow is handled using a collecting denotational semantics with continuations, or by explicitly constructing a control flow graph. These methods are non-trivial to apply for a rich language like Rascal, especially considering backtracking, exceptions and data-dependent control flow introduced by visitors. A nice side-effect of Schmidt-style abstract interpretation is that it allows handling abstraction of control flow directly.

We model different types of results (successes, pattern match failures, errors) directly in a $\widehat{\mathit{ResSet}}$ domain which keeps track of possible results, each with its own separate store. Keeping separate stores is important to maintain precision around different paths:

$$\mathit{rest} \in \mathit{ResType} ::= \mathsf{success} \mid \mathit{exres} \qquad \mathit{exres} ::= \mathsf{fail} \mid \mathsf{error}$$
$$\widehat{\mathit{resv}} \in \widehat{\mathit{ResVal}} ::= \cdot \mid \widehat{vs}$$
$$\widehat{\mathit{Res}} \in \widehat{\mathit{ResSet}} = \mathit{ResType} \rightharpoonup \widehat{\mathit{ResVal}} \times \widehat{\mathit{Store}}$$

The lattice operations are lifted directly from the target value domains and store domains. We define the concretization function $\gamma_{RS} \in \widehat{\mathit{ResSet}} \to \wp(\mathit{Result} \times \mathit{Store})$:

$$\gamma_{RS}(\widehat{\mathit{Res}}) = \left\{\, (\mathit{rest}\ \mathit{resv}, \sigma) \;\middle|\; (\mathit{rest}, (\widehat{\mathit{resv}}, \widehat{\sigma})) \in \widehat{\mathit{Res}} \wedge \mathit{resv} \in \gamma_{RV}(\widehat{\mathit{resv}}) \wedge \sigma \in \gamma_{\mathit{Store}}(\widehat{\sigma}) \,\right\}$$

5 Abstract Semantics
A distinguishing feature of Schmidt-style abstract interpretation is that the derivation of abstract operational rules from a given concrete operational semantics is systematic and to a large extent mechanisable [9, 43]. The creative work is therefore reduced to providing abstract definitions for conditions and semantic operations such as pattern matching, and defining trace memoization strategies for non-structurally recursive operational rules, to finitely approximate an infinite number of concrete traces and produce a terminating static analysis.

[Figure 6 diagram not reproduced: the concrete judgment $e; \sigma \underset{\text{expr}}{\Longrightarrow} \mathit{rest}\ \mathit{resv}; \sigma'$ (left) next to the abstract judgment $e; \widehat{\sigma} \underset{\text{a-expr}}{\Longrightarrow} \widehat{\mathit{Res}}$ (right), annotated "same syntax" on $e$, "abstracts input store" on $\widehat{\sigma}$, and "abstracts over sets of result values and stores" on $\widehat{\mathit{Res}}$.]

Figure 6. Relating concrete semantics (left) to abstract semantics (right).

Figure 6 relates the concrete evaluation judgment (left) to the abstract evaluation judgment (right) for Rascal expressions. Both judgements evaluate the same expression $e$. The abstract evaluation judgment abstracts the initial concrete store $\sigma$ with an abstract store $\widehat{\sigma}$. The result of the abstract evaluation is a finite result set $\widehat{\mathit{Res}}$, abstracting over possibly infinitely many concrete result values $\mathit{rest}\ \mathit{resv}$ and stores $\sigma'$. $\widehat{\mathit{Res}}$ maps each result type $\mathit{rest}$ to a pair of abstract result value $\widehat{\mathit{resv}}$ and abstract result store $\widehat{\sigma}'$, i.e.:

$$\widehat{\mathit{Res}} = [\mathit{rest}_1 \mapsto (\widehat{\mathit{resv}}_1, \widehat{\sigma}_1), \ldots, \mathit{rest}_n \mapsto (\widehat{\mathit{resv}}_n, \widehat{\sigma}_n)]$$

There is an important difference in how the concrete and abstract semantic rules are used. In a concrete operational semantics a language construct is usually evaluated as soon as the premises of a rule are satisfied. When evaluating abstractly, we must consider all applicable rules, to soundly over-approximate the possible concrete executions. To this end, we introduce a special notation to collect all derivations with the same input $i$ into a single derivation with output $O$ equal to the join of the individual outputs:

$$\{\!|\, i \Rightarrow O \,|\!\} \triangleq O = \bigsqcup \{\, o \mid i \Rightarrow o \,\}$$

Let's use the operational rules for variable accesses to illustrate the steps in Schmidt-style translation of operational rules. The concrete semantics contains two rules for variable accesses, E-V-S for successful lookup, and E-V-Er for producing errors when accessing unassigned variables:

$$\text{E-V-S}\ \ \frac{x \in \mathrm{dom}\, \sigma}{x; \sigma \underset{\text{expr}}{\Longrightarrow} \mathsf{success}\ \sigma(x); \sigma} \qquad \text{E-V-Er}\ \ \frac{x \notin \mathrm{dom}\, \sigma}{x; \sigma \underset{\text{expr}}{\Longrightarrow} \mathsf{error}; \sigma}$$

We follow three steps to translate the concrete rules to abstract operational rules:

1. For each concrete rule, create an abstract rule that uses a judgment for evaluation of a syntactic form, e.g., AE-V-S and AE-V-Er for variables.
2. Replace the concrete conditions and semantic operations with the equivalent abstract conditions and semantic operations for target abstract values, e.g. $x \in \mathrm{dom}\, \sigma$ with $\widehat{\sigma}(x) = (b, \widehat{vs})$ and a check on $b$. We obtain two execution rules:

matching. For brevity, we discuss a subset, including vari- σ(x) = (b, vs) ables x, constructor patterns k(p), and set patterns {⋆p}: AE-V-S x; σ =====⇒ [success 7→ (vs, σ)] p F x | k(p) | {⋆p} ⋆p F p | ⋆x a-expr-vb b σ(x) = (, vs) Rascal allows non-linear matching where the same variable AE-V-ER b b b x; σ =====⇒ [error 7→ (·, σ)] x can be mentioned more than once: all values matched a-expr-vb b against x must have equal values for the match to succeed. Each set pattern contains a sequence of sub-patterns ⋆p; Observe whenbbis true, both a success andb failure may occur, and we need rules to cover both cases. each sub-pattern in the sequence is either an ordinary pat- 3. Create a rule that collects all possible evaluations of tern p matched against a single set element, or a star pattern the syntax-specific judgment rules, e.g. AE-V for vari- ⋆x to be matched against a subset of elements. Star patterns ables: can backtrack when pattern matching fails because of non- ′ linear variable references, or when explicitly triggered by {|x; σ =====⇒ Res |} a-expr-v the fail expression. AE-V ′ This expressiveness poses challenges for developing an x; σ ====⇒ Resc b a-expr abstract interpreter that is not only sound, but is also suf- ficiently precise to prove interesting properties. The key as- The possible shapes of theb result valuec depend on the pair pects of Rabit in handling pattern matching is how we main- assigned to x in the abstract store. If the value shape of x is ⊥, tain precision by refining input values on pattern matching we drop the success result from the result set. The following successes and failures. examples illustrate the possible outcome result shapes: Assigned Value Result Set Rules 6.1 Satisfiability semantics for patterns σ(x) = (ff, ⊥ ) [] AE-V-S We begin by defining what it means that a (concrete/ab- VS stract) value matches a pattern. Figure 7a shows the con- σ(x) = (ff, [1;3]) [success 7→ ([1;3], σ)] AE-V-S b c crete semantics for patterns. 
In the figure, ρ is a binding ( ) = ( , ⊥ ) [ 7→ (·, )] AE-V-S AE-V-Er σ x  VS error σ , environment: b b = c [success 7→ ([1;3], σ), ρ ∈ BindingEnv Var ⇀ Value σb(x) = (, [1;3]) b AE-V-S, AE-V-Er error 7→ (·,σ)] A value v matches a pattern p (v |= p) iff there exists a bind- b Itb is possible to translate the operational semantics rules ing environment ρ that maps the variables in the pattern b for other basic expressions using the presented steps (see to values in dom ρ = vars(p) so that v is accepted by the = Appendix B). The core changes are the ones moving from satisfiability semantics v | ρ p as defined in Fig. 7a. checks of definiteness to checks of possibility. For example: Constructor patterns k(p) accept any well-typed value k(v) • Checking that evaluation of e has succeeded, re- of the same constructor whose subcomponents v match the quires that the abstract semantics uses e; σ ====⇒ Res sub-patterns p consistently in the same binding environment a-expr and (success, (vs, σ ′)) ∈ Res, as compared to ρ. A variable x matches exactly the value it is bound to in ′ the binding environment ρ. A set pattern {⋆p} accepts any e; σ ===⇒ success v; σ in the concrete semantics.b c expr set of values {v} such that an associative-commutative ar- • Typing is nowb b done usingc abstract judgments vs : t and t <: t ′. In particular, type t is an abstract sub- rangement of the sub-values v matches the sequence of sub- type of type t ′ (t <: t ′) if there is a subtype t ′′ of t patterns ⋆p under ρ. ′′ ′ ′′ ′ =⋆ (bt b<: t) thatb is also a subtype of t (t <: t ). This A value sequence v matches a pattern sequence ⋆p (v | ′ ′ implies that t <: t band t ≮:t are non-exclusive. ⋆p) if there exists a binding environment ρ such that dom ρ = • To check whether a particular constructor is possible, =⋆ vars(⋆p) and v | ρ ⋆p. An empty sequence of patterns ε ac- we use the abstractb auxiliaryb function unfold(vs, t) cepts an empty sequence of values ε. 
A sequence starting which produces a refined value of type t if possible— p,⋆p′ with an ordinary pattern p matches any non-empty splitting alternative constructors for› datab type sequence of values v,v ′ where v matches p and v ′ matches values—and additionally produces error if the value ⋆p′ consistently under the same binding environment ρ.A is possibly not an element of t. sequence ⋆x,⋆p′ works analogously but it splits the value ′ 6 Pattern Matching sequence in two v and v , such that x is assigned to v in ρ and v ′ matches ⋆p′ consistently in ρ. Expressive pattern matching is key feature of high-level trans- formation languages. Rabit handles the full Rascal pattern Example 6.1. We revisit the running example to under- language including type-based matching and deep pattern stand how the data type values are matched. We consider Verification of High-Level Transformations ... GPCE ’18, November 5–6, 2018, Boston, MA, USA

(a) Concrete ($v \models_\rho p$ reads: $v$ matches $p$ with $\rho$)

$$\begin{array}{lcl}
k(\underline{v}) \models_\rho k(\underline{p}) & \text{iff} & \underline{t} \text{ are parameter types of } k \text{ and } \underline{v} : \underline{t'} \text{ and } \underline{t'} <: \underline{t} \text{ and } \underline{v} \models^\star_\rho \underline{p} \\
v \models_\rho x & \text{iff} & \rho(x) = v \\
\{\underline{v}\} \models_\rho \{\underline{\star p}\} & \text{iff} & \underline{v} \models^\star_\rho \underline{\star p} \\
\varepsilon \models^\star_\rho \varepsilon & & \text{always} \\
v, \underline{v'} \models^\star_\rho p, \underline{\star p'} & \text{iff} & v \models_\rho p \text{ and } \underline{v'} \models^\star_\rho \underline{\star p'} \\
\underline{v}, \underline{v'} \models^\star_\rho \star x, \underline{\star p'} & \text{iff} & \rho(x) = \{\underline{v}\} \text{ and } \underline{v'} \models^\star_\rho \underline{\star p'}
\end{array}$$

(b) Abstract ($\widehat{vs} \models_{\widehat{\rho}} p$ reads: $\widehat{vs}$ may match $p$ with $\widehat{\rho}$)

$$\begin{array}{lcl}
k(\underline{\widehat{vs}}) \models_{\widehat{\rho}} k(\underline{p}) & \text{iff} & \underline{t} \text{ are parameter types of } k \text{ and } \underline{\widehat{vs}}\ \widehat{:}\ \underline{t'} \text{ and } \underline{t'}\ \widehat{<:}\ \underline{t} \text{ and } \underline{\widehat{vs}} \models^\star_{\widehat{\rho}} \underline{p} \\
\widehat{vs} \models_{\widehat{\rho}} x & \text{iff} & \widehat{\rho}(x) \sqsubseteq \widehat{vs} \\
\{\widehat{vs}\}_{[l;u]} \models_{\widehat{\rho}} \{\underline{\star p}\} & \text{iff} & \widehat{vs}, [l;u] \models^\star_{\widehat{\rho}} \underline{\star p} \\
\widehat{vs}, [0;u] \models^\star_{\widehat{\rho}} \varepsilon & & \text{always} \\
\widehat{vs}, [l;u] \models^\star_{\widehat{\rho}} p, \underline{\star p'} & \text{iff} & u > 0 \text{ and } \widehat{vs} \models_{\widehat{\rho}} p \text{ and } \widehat{vs}, [l-1;u-1] \models^\star_{\widehat{\rho}} \underline{\star p'} \\
\widehat{vs}, [l;u] \models^\star_{\widehat{\rho}} \star x, \underline{\star p'} & \text{iff} & \widehat{\rho}(x) = \{\widehat{vs}'\}_{[l';u']} \text{ and } l' \le l \text{ and } u' \le u \text{ and } \widehat{vs}' \sqsubseteq \widehat{vs} \\
 & & \text{and } \widehat{vs}, [l-u';u-l'] \models^\star_{\widehat{\rho}} \underline{\star p'}
\end{array}$$

Figure 7. Satisfiability semantics for pattern matching

Example 6.1. We revisit the running example to understand how the data type values are matched. We consider matching the following set of expression values:

$$\{\underbrace{mult(cst(zero), cst(suc(zero))),\ cst(zero)}_{\underline{v}}\}$$

against the pattern $p = \{mult(x, y), \star w, x\}$ in the environment $\rho = [x \mapsto cst(zero), y \mapsto cst(suc(zero)), w \mapsto \{\}]$. The matching argument is as follows:

$$\begin{array}{lcl}
\{\underline{v}\} \models_\rho p & \text{iff} & \underline{v} \models^\star_\rho mult(x, y), \star w, x \\
 & \text{iff} & mult(cst(zero), cst(suc(zero))) \models_\rho mult(x, y) \text{ and } cst(zero) \models^\star_\rho \star w, x
\end{array}$$

We see that the first conjunct matches as follows:

$$\begin{array}{lcl}
mult(cst(zero), cst(suc(zero))) \models_\rho mult(x, y) & \text{iff} & cst(zero), cst(suc(zero)) \models^\star_\rho x, y \\
 & \text{iff} & \rho(x) = cst(zero) \text{ and } \rho(y) = cst(suc(zero))
\end{array}$$

Similarly, the second matches as follows:

$$cst(zero) \models^\star_\rho \star w, x \quad \text{iff} \quad \rho(w) = \{\} \text{ and } \rho(x) = cst(zero)$$

The abstract pattern matching semantics (Fig. 7b) is analogous, but with a few noticeable differences. First, an abstract value $\widehat{vs}$ matches a pattern $p$ ($\widehat{vs} \models p$) if there exists a more precise value $\widehat{vs}'$ (so $\widehat{vs}' \sqsubseteq \widehat{vs}$) and an abstract binding environment $\widehat{\rho}$ with $dom\ \widehat{\rho} = vars(p)$ so that $\widehat{vs}' \models_{\widehat{\rho}} p$. The reason for using a more precise shape is the potential loss of information during over-approximation: a more precise value might have matched the pattern, even if the relaxed value does not necessarily. Second, sequences are abstracted by shape-length pairs, which needs to be taken into account by sequence matching rules. This is most visible in the very last rule, with a star pattern $\star x$, where we accept any assignment to a set abstraction $\widehat{vs}'$ which has a more precise shape and a smaller length.

6.2 Computing pattern matches

The declarative satisfiability semantics of patterns, albeit quite clean, is unfortunately not directly computable. In Rabit, we rely on an abstract operational semantics (see appendix A), translated from the concrete operational pattern matching semantics [2], using similar technique to the one presented in Sect. 5. The interesting ideas are in the refining semantic operators used, which we will discuss further.

Semantic operators with refinement. Since Rascal supports non-linear matching, it becomes necessary to merge environments computed when matching sub-patterns to check whether a match succeeds or not. In abstract interpretation, we can refine the abstract environments when merging for each possibility. Consider when merging two abstract environments, where some variable $x$ is assigned to $\widehat{vs}$ in one, and $\widehat{vs}'$ in the other. If $\widehat{vs}'$ is possibly equal to $\widehat{vs}$, we refine both values using this equality assumption $\widehat{vs}\ \widehat{=}\ \widehat{vs}'$. Here, we have that abstract equality is defined as the greatest lower bound if the value is non-bottom, i.e.

$$\widehat{vs}\ \widehat{=}\ \widehat{vs}' \triangleq \{ \widehat{vs}'' \mid \widehat{vs}'' = \widehat{vs} \sqcap \widehat{vs}' \neq \bot \}$$

Similarly, we can also refine both values if they are possibly non-equal $\widehat{vs}\ \widehat{\neq}\ \widehat{vs}'$. Here, abstract inequality is defined using relative complements:

$$\widehat{vs}\ \widehat{\neq}\ \widehat{vs}' \triangleq \{ (\widehat{vs}'', \widehat{vs}') \mid \widehat{vs}'' = \widehat{vs} \setminus (\widehat{vs} \sqcap \widehat{vs}') \neq \bot \} \cup \{ (\widehat{vs}, \widehat{vs}'') \mid \widehat{vs}'' = \widehat{vs}' \setminus (\widehat{vs} \sqcap \widehat{vs}') \neq \bot \}$$

In our abstract domains, the relative complement ($\setminus$) is limited. We heuristically define it for interesting cases, and otherwise it degrades to identity in the first argument (no refinement). There are however useful cases, e.g., for excluding unary constructors $suc(Nat) \wr zero \setminus zero = suc(Nat)$ or at the end points of a lattice $[1;10] \setminus [1;2] = [3;10]$.

Similarly, for matching against a constructor pattern $k(\underline{p})$, the core idea is that we should be able to partition our value space into two: the abstract values that match the constructor and those that do not. For those values that possibly match $k(\underline{p})$, we produce a refined value with $k$ as the only choice, making sure that the sub-values in the result are refined by the sub-patterns $\underline{p}$. Otherwise, we exclude $k$ from the refined value. For a data type abstraction, exclusion removes the pattern constructor from the possible choices

$$\widehat{exclude}(k(\underline{\widehat{vs}}) \wr k_1(\underline{\widehat{vs}_1}) \wr \cdots \wr k_n(\underline{\widehat{vs}_n}),\ k) = k_1(\underline{\widehat{vs}_1}) \wr \cdots \wr k_n(\underline{\widehat{vs}_n})$$

and does not change the input shape otherwise.

7 Traversals

First-class traversals are a key feature of high-level transformation languages, since they enable effectively transforming large abstract syntax trees. We will focus on the challenges for bottom-up traversals, but they are shared amongst all strategies supported in Rascal. The core idea of a bottom-up traversal of an abstract value $\widehat{vs}$ is to first traverse children of the value $\widehat{children}(\widehat{vs})$ possibly rewriting them, then reconstruct a new value using the rewritten children and finally traversing the reconstructed value. The main challenge is handling traversal of children, whose representation and thus execution rules depend on the particular abstract value.

Concretely, the $\widehat{children}(\widehat{vs})$ function returns a set of pairs $(\widehat{vs}', \widehat{cvs})$ where the first component $\widehat{vs}'$ is a refinement of $\widehat{vs}$ that matches the shape of children $\widehat{cvs}$ in the second component. For data type values the representation of children is a heterogeneous sequence of abstract values $\underline{\widehat{vs}''}$, while for set values (and the top element) the representation of children is a pair $(\widehat{vs}'', [l;u])$ with the first component representing the shape of elements and the second representing their count. For example,

$$\widehat{children}(mult(Expr, Expr) \wr cst(suc(Nat))) = \{ (mult(Expr, Expr), (Expr, Expr)),\ (cst(suc(Nat)), suc(Nat)) \}$$

and $\widehat{children}(\{Expr\}_{[1;10]}) = \{(\{Expr\}_{[1;10]}, (Expr, [1;10]))\}$. Note how the $\widehat{children}$ function maintains precision by partitioning the alternatives for data-types, when traversing each corresponding sequence of value shapes for the children.

Traversing children. The shape of execution rules depend on the representation of children; this is consistent with the requirements imposed by Schmidt [43]. For heterogeneous sequences of value shapes $\underline{\widehat{vs}}$, the execution rules iterate through the sequence recursively traversing each element. Due to over-approximation we may re-traverse the same or a more precise value on recursion, and so we need to use trace memoization (Sect. 8) to terminate. For example the children of an expression Expr refer to itself:

$$\widehat{children}(Expr) = \{ (mult(Expr, Expr), (Expr, Expr)),\ (cst(Nat), Nat),\ (var(str), str) \}$$

Traversing children represented by a shape-length pair is directed by the length interval $[l;u]$. If 0 is a possible value of the length interval, then traversal can finish, refining the input shape to be empty. Otherwise, we perform another traversal recursively on the shape of elements and recursively on a new shape-length pair which decreases the length, finally combining their values. Note that if the length is unbounded, e.g. $[0;\infty]$, then the value can be decreased forever and trace memoization is also needed here for termination. This means that trace memoization must here be nested breadth-wise (when recursing on an unbounded sequence of children), in addition to depth-wise (when recursing on children); this can be computationally expensive, and we will discuss in Sect. 9 how our implementation handles that.

8 Trace Memoization

Abstract interpretation and static program analysis in general perform fixed-point calculation for analysing unbounded loops and recursion. In Schmidt-style abstract interpretation, the main technique to handle recursion is trace memoization [41, 43]. The core idea of trace memoization is to detect non-structural re-evaluation of the same program element, i.e., when the evaluation of a program element is recursively dependent on itself, like a while-loop or traversal.

The main challenge when recursing over inputs from infinite domains is to determine when to merge recursive paths together to correctly over-approximate concrete executions. We present an extension that is still terminating, sound and, additionally, allows calculating results with good precision. The core idea is to partition the infinite input domain using a finite domain of elements, and on recursion degrade input values using previously met input values from the same partition. We assume that all our domains are lattices with a widening operator. Consider a recursive operational semantics judgment $i \Longrightarrow o$, with $i$ being an input from domain $\widehat{Input}$, and $o$ being the output from domain $\widehat{Output}$. For this judgment, we associate a memoization map $\widehat{M} \in \widehat{PInput} \to \widehat{Input} \times \widehat{Output}$ where $\widehat{PInput}$ is a finite partitioning domain that has a Galois connection with our actual input, i.e.

$$\widehat{Input} \underset{\alpha_{\widehat{PI}}}{\overset{\gamma_{\widehat{PI}}}{\leftrightarrows}} \widehat{PInput}$$

The memoization map keeps track of the previously seen input and corresponding output for values in the partition domain. For example, for input from our value domain $\widehat{Value}$ we can use the corresponding type from the domain Type as input to the memoization map.^2 So for values 1 and [2;3] we would use int, while for mult(Expr, Expr) we would use the defining data type Expr.

^2 Provided that we bound the depth of type parameters of collections.

We perform a fixed-point calculation over the evaluation of input $i$. Initially, the memoization map $\widehat{M}$ is $\lambda pi.(\bot, \bot)$, and during evaluation we check whether there was already a value from the same partition as $i$, i.e., $\alpha_{\widehat{PI}}(i) \in dom\ \widehat{M}$. At each iteration, there are then two possibilities:

Hit: The corresponding input partition key is in the memoization map and a less precise input is stored, so $\widehat{M}(\alpha_{\widehat{PI}}(i)) = (i', o')$ where $i \sqsubseteq_{\widehat{Input}} i'$. Here, the output value $o'$ that is stored in the memoization map is returned as result.

Widen: The corresponding input partition key is in the memoization map, but an unrelated or more precise input is stored, i.e., $\widehat{M}(\alpha_{\widehat{PI}}(i)) = (i'', o'')$ where $i \not\sqsubseteq_{\widehat{Input}} i''$. In this case we continue evaluation but with a widened input $i' = i'' \nabla_{\widehat{Input}} (i'' \sqcup i)$ and an updated map $\widehat{M}' = [\alpha_{\widehat{PI}}(i) \mapsto (i', o_{prev})]$. Here, $o_{prev}$ is the output of the last iteration for the fixed-point calculation for input $i'$, and is assigned $\bot$ on the initial iteration.

Intuitively, the technique is terminating because the partitioning is finite, and widening ensures that we reach an upper bound of possible inputs in a finite number of steps, eventually getting a hit. The fixed-point iteration also uses widening to calculate an upper bound, which similarly finishes in a number of steps. The technique is sound because we only use output for previous input that is less precise; therefore our function is continuous and a fixed-point exists.

9 Experimental Evaluation

We demonstrate the ability of Rabit to verify type and inductive shape properties, using five transformation programs across various applications. Three programs are classic examples, and two are extracted from open source projects.

Negation Normal Form (NNF) transformation [27, Section 2.5] is a classical rewrite of a propositional formula to combination of conjunctions and disjunctions of literals, so negations appear only next to atoms. An implementation of this transformation should guarantee the following:
P1 Implication is not used as a connective in the result
P2 All negations in the result are in front of atoms

Rename Struct Field (RSF) refactoring changes the name of a field in a struct, and that all corresponding field access expressions are renamed correctly as well:
P3 Structure should not define a field with the old field name
P4 No field access expression to the old field

Desugar Oberon-0 (DSO) transformation [6, 53], translates for-loops and switch-statements to while-loops and nested if-statements, respectively.
P5 for should be correctly desugared to while
P6 switch should be correctly desugared to if
P7 No auxiliary data in output

Code Generation for Glagol (G2P) a DSL for REST-like web development, translated to PHP for execution.^3 We are interested in the part of the generator that translates Glagol expressions to PHP, and the following properties:
P8 Output only simple PHP expressions for simple Glagol expression inputs
P9 No unary PHP expressions if no sign marks or negations in Glagol input

Mini Calculational Language (MCL) a programming language text-book [45] implementation of a small expression language, with arithmetic and logical expressions, variables, if-expressions, and let-bindings. The implementation contains an expression simplifier (larger version of running example in Fig. 2), a type inference procedure, an interpreter and a compiler.
P10 Simplification procedure produces a simplified expression with no additions with 0, multiplications with 1 or 0, subtractions with 0, logical expressions with constant Boolean operands, and if-expressions with constant Boolean conditions.
P11 Arithmetic expressions with no variables have type int and no type errors
P12 Interpreting expressions with no integer constants and let's gives only Boolean values
P13 Compiling expressions with no if's produces no goto's and if instructions
P14 Compiling expressions with no if's produces no labels and does not change label counter

All these transformations satisfy the following criteria:
1. They are formulated by an independent source,
2. They can be translated in relatively straightforward manner to our subset of Rascal, and
3. They exercise important constructs, including visitors and the expressive pattern matching

We have ported all these programs to Rascal Light.

Threats to validity. The programs are not selected randomly, thus it is hard to generalize the results for other transformations. We mitigated this by selecting transformations that are realistic and vary in authors, programming style and purpose. While translating the programs to Rascal Light, we strived to minimize the amount of changes, but generally bias cannot be ruled out entirely.

^3 https://github.com/BulgariaPHP/glagol-dsl

Implementation. We have implemented the abstract interpreter in a prototype tool, Rabit, for all of Rascal Light following the process described in sections 5 to 8. This required handling additional aspects, not discussed in the paper:
1. Possibly undefined values
2. Extended result state with more control flow constructs, backtracking, exceptions, loop control, and
3. Fine-tuning memoization strategies to the different looping constructs and recursive calls
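To make the NNF properties P1 and P2 from the evaluation concrete, the following Python sketch gives one possible negation-normal-form rewrite together with a checker for the target shape. It is only an illustration: formulas are encoded as nested tuples, and the names `nnf` and `in_nnf_shape` are hypothetical, not part of Rabit or of the evaluated Rascal program. Note that the checker tests a single output dynamically, whereas Rabit establishes the property statically for all inputs of the given shape.

```python
def nnf(f):
    """Rewrite a propositional formula (nested tuples) to negation normal form.

    Formulas: ("atom", name), ("neg", f), ("and", f, g), ("or", f, g), ("imp", f, g).
    """
    tag = f[0]
    if tag == "atom":
        return f
    if tag in ("and", "or"):
        return (tag, nnf(f[1]), nnf(f[2]))
    if tag == "imp":                      # a => b  is  (not a) or b
        return ("or", nnf(("neg", f[1])), nnf(f[2]))
    # tag == "neg": push the negation inwards
    g = f[1]
    if g[0] == "atom":
        return f
    if g[0] == "neg":                     # double negation
        return nnf(g[1])
    if g[0] == "and":                     # De Morgan
        return ("or", nnf(("neg", g[1])), nnf(("neg", g[2])))
    if g[0] == "or":                      # De Morgan
        return ("and", nnf(("neg", g[1])), nnf(("neg", g[2])))
    # g[0] == "imp": not (a => b)  is  a and (not b)
    return ("and", nnf(g[1]), nnf(("neg", g[2])))

def in_nnf_shape(f):
    """Check P1 (no implication) and P2 (negation only in front of atoms)."""
    tag = f[0]
    if tag == "atom":
        return True
    if tag == "neg":
        return f[1][0] == "atom"
    if tag in ("and", "or"):
        return in_nnf_shape(f[1]) and in_nnf_shape(f[2])
    return False                          # "imp" is not allowed in the output

# Usage: not (p => (q and not r))
p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
formula = ("neg", ("imp", p, ("and", q, ("neg", r))))
result = nnf(formula)
```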

By default, we use the top element $\top$ for the types specified as input. The user can specify the initial data-type refinements, store and parameters, to get a more precise result for the target function to be abstractly interpreted. The output of the tool is the abstract result value set of abstractly interpreting the target function, the resulting store state and the set of relevant inferred data-type refinements.

The implementation extends standard regular tree grammar operations [1, 17], to handle the recursive equations for the expressive abstract domains, including base values, collections and heterogeneous data types. We use a more precise partitioning strategy for trace memoization when needed, which also takes the set of available constructors into account for data types. The source code of our implementation, including subject transformations, is freely available.^4

^4 https://github.com/itu-square/Rascal-Light

Results. We ran the experiments using Scala 2.12.2 on a 2012 Core i5 MacBook Pro. Table 1 summarizes the size of the programs, the runtime of the abstract interpreter, and whether the properties have been verified. Since we verify the results on the abstract shapes, the programs then are shown to be correct for all possible concrete inputs satisfying the given properties. We remark that all programs use the high-level expressive features of Rascal and are thus significantly more succinct than comparable code in general purpose languages.

Table 1. Time and success rate for analyzing programs and properties presented earlier this section. Time is the median of five runs. If the same time is reported for multiple properties, then they could be verified on the same input

Transformation   LOC   Runtime [s]   Property   Verified
NNF               15       7.3       P1         ✓
NNF               15       7.3       P2         ✓
RSF               35       6.0       P3         ✗
RSF               35       6.0       P4         ✓
DSO              125      25.0       P5         ✓
DSO              125      25.0       P6         ✓
DSO              125      25.0       P7         ✗
G2P              350       1.6       P8         ✓
G2P              350       3.5       P9         ✓
MCL              298       1.6       P10        ✓
MCL              298       0.7       P11        ✓
MCL              298       0.6       P12        ✓
MCL              298       0.9       P13        ✓
MCL              298       0.9       P14        ✓

The runtime, varying from single seconds to less than a minute, is reasonable. All but two properties were successfully verified. The reason that our tool runs slower on the DSO transformation than those with more lines of code (G2P and MCL) is that it contains many nested traversals expressed as function calls: our analysis is interprocedural but handles function calls by inlining, which can lead to some overhead during analysis.

    1  data FIn = and(FIn, FIn) | atom(str) | neg(FIn)
    2           | imp(FIn, FIn) | or(FIn, FIn)
    3
    4  data FOut = and(FOut, FOut) | atom(str)
    5            | neg(atom(str)) | or(FOut, FOut)

Figure 8. Initial and inferred refinement types for NNF

Lines 1–2 in Fig. 8 show the input refinement type FIn for the normalization procedure. The inferred inductive output type FOut (lines 4–5) specifies that the implication is not present in the output (P1), and negation only allows atoms as subformulae (P2). In fact, Rabit inferred a precise characterization of negation normal form as an inductive data type.

10 Related Work

We start with discussing techniques that could be used to make Rabit infer more precise shapes and verify properties like P3 and P7. To verify P3, we need to be able to relate field names to their corresponding definitions in the field definition map of a class, which is not possible using the presented non-relational abstract domains. Relational abstract interpretation [35] allows specifying such constraints that relate values across different variables, and even inside and across substructures [13, 26, 33]. For a concrete input of P7, we know that the number of auxiliary data elements decreases on each iteration, but this information is lost in our abstraction of data structures. A possible solution could be to allow abstract attributes that extract additional information about the abstracted structures [10, 37, 49]. For P7, a generalization of the multiset abstraction [36] for data types could be useful to track e.g., the auxiliary statement count, and show that they decrease using multiset-ordering [22] like in term rewriting. Other techniques [4, 13, 51] support inferring inductive relational properties for general data-types (e.g., binary tree property) but require a pre-specified structure to indicate the places where refinement can happen.

Cousot and Cousot [18] present a general framework for modularly constructing program analyses, but it requires a language with a compositional control flow which Rascal does not have. Toubhans, Rival and Chang [40, 50] develop a modular domain design for pointer-manipulating programs supporting a rich set of fixed data abstractions, whereas our domain construction focuses on providing automated inference of inductive refinement types based on pure heterogeneous data-structures.

There are similarities between our work and verification transformation directly would not finish verifying even sim- techniques based on program transformation (e.g., [21, 32]) ple properties within reasonable time. Possible constructor like partial evaluation [29] and supercompilation [48]. Our analysis [5] has been used to calculate the actual dependen- systematic exploration of execution rules for abstraction is cies of a predicate and make flow-sensitive analyses more similar to unfolding, and our use of widening is similar to precise. This is a type of shape analysis that works with com- folding. The main difference between the two techniques plex data-types and arrays, but only captures the prefix of is that abstract interpretation mainly focuses on capturing the target structures. rich domains and performing widening at syntactic program Techniques for model transformation verification based points, whereas program transformation based techniques on static analysis [19] have been suggested, but are currently often rely on symbolic inputs and perform folding dynami- focused on verification of rule errors based on types and un- cally on the semantic execution graph during specialization. definedness. Symbolic execution has previously been sug- We believe that there could benefits for the communities, to gested [3] as a way to validate high-level transformation explore combinations of these two approaches in the future. programs. However, that work targets test generation rather Definitional interpreters have been suggested as a tech- than verification of properties. Semantic typing [8, 12] has nique for building compositional abstract interpreters [20]. been used to infer recursive type and shape properties for The idea is to rely on a monad transformer stack to share the language with high-level constructs for querying and iter- implementation of the concrete and abstract interpreters. ation. 
The languages considered are however small calculi We believe that our interpreter would benefit by being writ- compared to the supported subset of Rascal we consider, and ten in such style5, which complements our modular domain our evaluation is significantly more extensive. construction well. To ensure termination they rely on a caching algorithm, similar to ordinary finite input trace memoiza- 11 Conclusion tion [41]. Similarly, Van Horn and Might [28] present a sys- Our goal was to use abstract interpretation to give a solid se- tematic framework to abstract higher-order functional lan- mantic foundation for analyzing programs in modern high- guages with effects and complex control flow. They rely on level transformation languages. To this end we have designed store-allocated continuations within abstract machines to and formalized a Schmidt-style abstract interpreter, includ- handle recursion, which is then kept finite during abstrac- ing partition-driven trace memoization which works with in- tion to ensure a terminating analysis. Our technique focused finite input domains. This worked well for a language like on providing a more precise widening based on the abstract Rascal with complex control flow, and can be adapted work input value, which was necessary for verifying the required for similar languages that have an operational semantics. properties in our evaluation. We believe that it could be The proposed modular construction of abstract domains was useful to look into abstract machine-based abstractions in vital for handling a language of this scale and complexity. the future, in the case that higher-order transformation lan- We implemented the interpreter as a tool, Rabit, which guages need to be handled. 
supports a non-trivial subset of Rascal, containing key fea- Garrigue [24, 25] presents algorithms for typing pattern tures: several traversal strategies, expressive pattern match- matching on polymorphic variant types in OCaml, where ing, backtracking, exceptions and control operators, and gen- the set of constructors for a data type is not fixed in advance. eralized looping constructs. We evaluated Rabit on classical The theory is useful since it supports inferring simple recur- transformations and on examples selected from open source sive shapes of programs, but it has its limitations: inference projects, showing it allows verification of a series of sophis- is syntactic and exact, and it is unclear how to generalize it ticated type and shape properties for these transformations. to work with the rich pattern matching constructs and het- erogeneous visitors. Haskell supports analysing coverage A Operational Pattern Matching of its pattern matching language, that includes generalized algebraic data types (GADTs) and Boolean constraints [30]. Computing Pa ern Matching The judgements are pre- While generalHaskell functioncallscanoccurin the Boolean sented in Fig. 9 for both the concrete and abstract rules. Con- constraints, the analysis treats them shallowly as function sider the concrete (top-left) judgement: a value v matches a symbols; some covering pattern matches that depend on pattern p, given a store σ, producing a sequence of bind- particular semantics of calledfunctions, will be markedfalsely ing environments ρ. The binding environments form a se- as non-exhaustive. Modern SMT solvers supports reasoning quence, since multiple concrete environments, say ρ1 and = = with inductive functions defined over algebraic data-types [38]. ρ2, can make v match against p, i.e., v | ρ1 p and v | ρ2 p. The properties they can verify are very expressive, and in- Backtracking using the fail-expression, allows the program- clude inductive semantic properties. 
The exact techniques mer to explore a different assignment from the sequence of employed are not very scalable, and encoding a complex environments, until no possible assignment is left. For an ordinary pattern p (top) the abstraction relation 5We only learned about this related work at a late stage is direct: an abstract store σ abstracts a concrete store σ

and a value shape $\widehat{vs}$ abstracts a concrete value $v$. The notable change is that the abstract semantics uses a set of abstract binding environments $\hat{\varrho} \subseteq \widehat{Store} \times \widehat{ValueShape} \times \widehat{BindingEnv}_{\bot}$ that not only abstracts over the sequence of concrete binding environments $\rho$, but also, for each abstract binding environment, stores the corresponding refinement of the input abstract store $\hat{\sigma}$ and the corresponding refinement of the matched value shape $\widehat{vs}$ according to the matched pattern.

For sequences of set sub-patterns $\star p$, the sequence of concrete values $v$ is abstracted by two components: the shape of values $\widehat{vs}$ and an interval $[l;u]$ approximating the length of the value sequence. Both of these values are refined as a result of the matching, which is captured by the abstract binding environment $\hat{\varrho}$ (of the same type as for the simple patterns), since we treat the refined value as the abstract set containing the values of the given shape and of the given cardinality. The concrete semantics of set sub-patterns also contains a backtracking state $V$, which is not used in the abstract semantics: because the abstraction of set elements is coarse, we abstractly consider all possible subset assignments at the same time (joining instead of backtracking).

Operational Rules We will show how refinement is calculated by the abstract operational semantics by presenting some of the key rules for abstract pattern matching. Rascal also allows non-linear pattern matching against assigned store variables, and it is possible to use this information for refining the input store and abstract value. In the AP-V-U rule we match the variable to the value shape and restrict the shape abstraction for the variable value to match the pattern. The binding environment does not change as the name is already bound in the store. In the AP-V-F rule, the matching fails ($\bot$), and then we learn that the value shape in the store should be refined to something that does not match.

\[
\text{AP-V-U}\quad
\frac{\begin{array}{c}
\hat{\sigma}(x) = (b, \widehat{vs}') \qquad \widehat{vs}' \neq \bot_{\widehat{VS}} \\
\widehat{vs}'' \in (\widehat{vs} \mathrel{\widehat{=}} \widehat{vs}') \qquad \hat{\sigma}' = \hat{\sigma}[x \mapsto (\mathit{ff}, \widehat{vs}'')]
\end{array}}
{\hat{\sigma} \vdash x \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match-v}]{} (\hat{\sigma}', \widehat{vs}'', [\,])}
\]

\[
\text{AP-V-F}\quad
\frac{\begin{array}{c}
\hat{\sigma}(x) = (b, \widehat{vs}') \qquad \widehat{vs}' \neq \bot_{\widehat{VS}} \\
(\widehat{vs}'', \widehat{vs}''') \in (\widehat{vs} \mathrel{\widehat{\neq}} \widehat{vs}') \qquad \hat{\sigma}' = \hat{\sigma}[x \mapsto (\mathit{ff}, \widehat{vs}''')]
\end{array}}
{\hat{\sigma} \vdash x \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match-v}]{} (\hat{\sigma}', \widehat{vs}'', \bot)}
\]

We also show the AP-V-B (abstract pattern-variable-bind) rule, which simply binds the variable in the binding environment, assuming that it is possibly not assigned in the store (a free name).

\[
\text{AP-V-B}\quad
\frac{\hat{\sigma}(x) = (\mathit{tt}, \widehat{vs}')}
{\hat{\sigma} \vdash x \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match-v}]{} (\hat{\sigma}[x \mapsto (\mathit{tt}, \bot_{\widehat{VS}})], \widehat{vs}, [x \mapsto \widehat{vs}])}
\]

If our matched abstract value possibly contains the pattern constructor $k$ (AP-C-S rule: abstract pattern-constructor-success), we produce an abstract value with $k$ containing the sub-values refined against the constructor sub-patterns:

\[
\text{AP-C-S}\quad
\frac{\begin{array}{c}
\textbf{data}\ at = \cdots \mid k(t) \mid \ldots \qquad (\text{success}\ k(\widehat{vs}')) \in \widehat{unfold}(\widehat{vs}, at) \\
\hat{\sigma} \vdash p_1 \stackrel{?}{:=} \widehat{vs}'_1 \xRightarrow[\text{a-match}]{} \hat{\varrho}_1 \quad \ldots \quad \hat{\sigma} \vdash p_n \stackrel{?}{:=} \widehat{vs}'_n \xRightarrow[\text{a-match}]{} \hat{\varrho}_n \\
(\hat{\sigma}'_1, \widehat{vs}''_1, \hat{\rho}^?_1) \in \hat{\varrho}_1 \quad \ldots \quad (\hat{\sigma}'_n, \widehat{vs}''_n, \hat{\rho}^?_n) \in \hat{\varrho}_n
\end{array}}
{\hat{\sigma} \vdash k(p) \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match-cons}]{} \left(\textstyle\bigsqcap_i \hat{\sigma}'_i,\; k(\widehat{vs}''),\; \widehat{merge}(\hat{\rho}^?)\right)}
\]

The total function $\widehat{merge}$ unifies assignments from two binding environments point-wise by names, taking the greatest lower bound of shapes to combine bindings for a name. It yields bottom for the entire result if at least one of the point-wise meets yields bottom (that is, the shapes for at least one name are not reconcilable).

Otherwise, we try to refine the matched value to exclude the pattern constructor in the AP-C-F rules:

\[
\text{AP-C-F1}\quad
\frac{\textbf{data}\ at = \cdots \mid k(t) \mid \ldots \qquad (\text{success}\ k'(\widehat{vs}')) \in \widehat{unfold}(\widehat{vs}, at) \qquad k' \neq k}
{\hat{\sigma} \vdash k(p) \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match-cons}]{} (\hat{\sigma}, \widehat{exclude}(\widehat{vs}, k), \bot)}
\]

\[
\text{AP-C-F2}\quad
\frac{\textbf{data}\ at = \cdots \mid k(t) \mid \ldots \qquad \text{error} \in \widehat{unfold}(\widehat{vs}, at)}
{\hat{\sigma} \vdash k(p) \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match-cons}]{} (\hat{\sigma}, \widehat{exclude}(\widehat{vs}, k), \bot)}
\]

For set patterns, the refinement happens by pattern matching set sub-patterns.

\[
\text{AP-S-S}\quad
\frac{\text{success}\ \{\widehat{vs}'\}_{[l;u]} \in \widehat{unfold}(\widehat{vs}, \text{set}\langle\text{value}\rangle) \qquad \hat{\sigma} \vdash \star p \stackrel{?}{:=} \widehat{vs}', [l;u] \xRightarrow[\text{a-match}\star]{} \hat{\varrho}}
{\hat{\sigma} \vdash \{\star p\} \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match-set}]{} \hat{\varrho}}
\]

For example, when it is possible that the abstracted value sequence $(\widehat{vs}, [l;u])$ is empty ($l = 0$) and it is pattern matched against an empty set sub-pattern sequence, we can refine the result to be the empty abstract set $\{\bot\}_0$ (rule APL-E-B).

\[
\text{APL-E-B}\quad
\frac{l \leq u \qquad l = 0}
{\hat{\sigma} \vdash \varepsilon \stackrel{?}{:=} \widehat{vs}, [l;u] \xRightarrow[\text{a-match}\star\text{-1}]{} (\hat{\sigma}, \{\bot_{\widehat{VS}}\}_0, \{[\,]\})}
\]

A more complex example is the one where we try to pattern match a potentially non-empty value sequence against a set sub-pattern sequence $p, \star p'$ starting with an ordinary pattern (APL-M-P). Here we pattern match against $p$ and against the rest of the sequence $\star p'$, and combine the refined results of these matches. This produces a refinement of the containing set value by combining the refined shapes and increasing by one the length refinement obtained from the rest of the set sub-pattern sequence.

\[
\text{APL-M-P}\quad
\frac{\begin{array}{c}
l \leq u \qquad u \neq 0 \qquad \hat{\sigma} \vdash p \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match}]{} \hat{\varrho}'_R \\
\hat{\sigma} \vdash \star p' \stackrel{?}{:=} \widehat{vs}, [l-1; u-1] \xRightarrow[\text{a-match}\star]{} \hat{\varrho}''_R \\
(\hat{\sigma}', \widehat{vs}', \hat{\varrho}') \in \hat{\varrho}'_R \qquad (\hat{\sigma}'', \{\widehat{vs}''\}_{[l'';u'']}, \hat{\varrho}'') \in \hat{\varrho}''_R \\
\hat{\varrho}'''_R = \left(\hat{\sigma}' \sqcap \hat{\sigma}'',\; \{\widehat{vs}' \sqcup \widehat{vs}''\}_{[l''+1, u''+1]},\; \widehat{merge}(\hat{\varrho}', \hat{\varrho}'')\right)
\end{array}}
{\hat{\sigma} \vdash p, \star p' \stackrel{?}{:=} \widehat{vs}, [l;u] \xRightarrow[\text{a-match}\star\text{-1}]{} \hat{\varrho}'''_R}
\]

B Abstract Semantic Rules

Figures 11 and 12 show the formal rules for executing the bottom-up visit-expression; we have omitted the collecting rules and some error handling rules to avoid presenting unnecessary details. We will further discuss the ideas behind the rules in a high-level fashion.

Executing visitors The evaluation rule for the visit-expression itself is mainly concerned with evaluating the target expression $e$ to be traversed to a value, and then using a separate traversal relation to rewrite the value recursively with the sequence of cases $cs$. The main item to notice is how it uses the value refined by the case patterns in case of failure (AE-Vt-F), turning the result into a successful execution (like in our running example in Sect. 2).

Evaluating Cases During traversal, the target value will be rewritten with a sequence of cases. The evaluation of a case sequence is straightforward: it iterates through the possible cases, pattern matching against each pattern and executing the corresponding expression when applicable. The main idea is that, when the abstract value fails to match a pattern, the refined value is used to match against the rest of the cases (ACS-M-F). This ensures that the order of patterns influences the refinement, leading to a more precise abstract shape that better matches the set of concrete shapes during execution.

Acknowledgments

We would like to thank Paul Klint, Tijs van der Storm, Jurgen Vinju and Davy Landman for discussions on Rascal and its semantics. We would further like to thank Rasmus Møgelberg and Jan Midtgaard for discussions on the correctness of our recursive shape abstractions. We would like to thank the anonymous reviewers for their comments, especially the one who presented us with the link between our style of abstract interpretation and verification techniques in program transformation.

This material is based upon work supported by the Danish Council for Independent Research under Grant No. 0602-02327B and Innovation Fund Denmark under Grant No. 7039-00072B. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the funding agencies.

References

[1] Alexander Aiken and Brian R. Murphy. 1991. Implementing Regular Tree Expressions. In FPLCA 1991. 427–447. https://doi.org/10.1007/3540543961_21
[2] Ahmad Salim Al-Sibahi. 2017. The Formal Semantics of Rascal Light. CoRR abs/1703.02312 (2017). http://arxiv.org/abs/1703.02312
[3] Ahmad Salim Al-Sibahi, Aleksandar S. Dimovski, and Andrzej Wąsowski. 2016. Symbolic execution of high-level transformations. In SLE 2016. 207–220. http://dl.acm.org/citation.cfm?id=2997382
[4] Aws Albarghouthi, Josh Berdine, Byron Cook, and Zachary Kincaid. 2015. Spatial Interpolants. In ESOP 2015. 634–660. https://doi.org/10.1007/978-3-662-46669-8_26
[5] Oana Fabiana Andreescu, Thomas Jensen, and Stéphane Lescuyer. 2015. Dependency Analysis of Functional Specifications with Algebraic Data Structures. In ICFEM 2015. 116–133. https://doi.org/10.1007/978-3-319-25423-4_8
[6] Bas Basten, Jeroen van den Bos, Mark Hills, Paul Klint, Arnold Lankamp, Bert Lisser, Atze van der Ploeg, Tijs van der Storm, and Jurgen J. Vinju. 2015. Modular language implementation in Rascal - Experience Report. Sci. Comput. Program. 114 (2015), 7–19. http://dx.doi.org/10.1016/j.scico.2015.11.003
[7] Marcin Benke, Peter Dybjer, and Patrik Jansson. 2003. Universes for Generic Programs and Proofs in Dependent Type Theory. Nord. J. Comput. 10, 4 (2003), 265–289.
[8] Véronique Benzaken, Giuseppe Castagna, Kim Nguyen, and Jérôme Siméon. 2013. Static and dynamic semantics of NoSQL languages. In POPL 2013. 101–114. https://doi.org/10.1145/2429069.2429083
[9] Martin Bodin, Thomas Jensen, and Alan Schmitt. 2015. Certified Abstract Interpretation with Pretty-Big-Step Semantics. In CPP 2015. 29–40. https://doi.org/10.1145/2676724.2693174
[10] Ahmed Bouajjani, Cezara Dragoi, Constantin Enea, and Mihaela Sighireanu. 2012. Abstract Domains for Automated Reasoning about List-Manipulating Programs with Infinite Data. In VMCAI 2012. 1–22. https://doi.org/10.1007/978-3-642-27940-9_1
[11] Martin Bravenboer, Karl Trygve Kalleberg, Rob Vermaas, and Eelco Visser. 2008. Stratego/XT 0.17. A language and toolset for program transformation. Sci. Comput. Program. 72, 1-2 (2008), 52–70. https://doi.org/10.1016/j.scico.2007.11.003
[12] Giuseppe Castagna and Kim Nguyen. 2008. Typed iterators for XML. In ICFP 2008. 15–26. https://doi.org/10.1145/1411204.1411210
[13] Bor-Yuh Evan Chang and Xavier Rival. 2008. Relational Inductive Shape Analysis. In POPL 2008. 247–260. https://doi.org/10.1145/1328438.1328469
[14] James Chapman, Pierre-Évariste Dagand, Conor McBride, and Peter Morris. 2010. The gentle art of levitation. In ICFP 2010. 3–14. https://doi.org/10.1145/1863543.1863547
[15] James R. Cordy. 2006. The TXL source transformation language. Sci. Comput. Program. 61, 3 (2006), 190–210. https://doi.org/10.1016/j.scico.2006.04.002
[16] Patrick Cousot. 2003. Verification by Abstract Interpretation. In Verification: Theory and Practice, Essays Dedicated to Zohar Manna on the Occasion of His 64th Birthday. 243–268. https://doi.org/10.1007/978-3-540-39910-0_11

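Ordinary patterns (top):

\[
\sigma \vdash p \stackrel{?}{:=} v \xRightarrow[\text{match}]{} \rho
\qquad\qquad
\hat{\sigma} \vdash p \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match}]{} \hat{\varrho}
\]

Annotations: both judgments use the same pattern $p$; $\hat{\sigma}$ abstracts the store; $\widehat{vs}$ abstracts the input value; $\hat{\varrho}$ abstracts the binding environment sequence, refines the abstract value, and refines the abstract store.

Set sub-patterns (bottom):

\[
\sigma \vdash \star p \stackrel{?}{:=} v \mid V \xRightarrow[\text{match}\star]{} \rho
\qquad\qquad
\hat{\sigma} \vdash \star p \stackrel{?}{:=} \widehat{vs}, [l;u] \xRightarrow[\text{a-match}\star]{} \hat{\varrho}
\]

Annotations: both judgments use the same pattern $\star p$; $\hat{\sigma}$ abstracts the store; $\widehat{vs}$ abstracts the shape of the input value sequence; $[l;u]$ abstracts the length of the input value sequence; $\hat{\varrho}$ abstracts the binding environment sequence, refines the abstract store, refines the shape, and refines the length.

Figure 9. Relating abstract operational semantics (left) to the concrete operational semantics (right).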
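The point-wise merge of binding environments used by the constructor and set sub-pattern rules can be illustrated with a small sketch. The shape domain here is an assumption made only for illustration: a shape is a finite set of constructor names, the meet is set intersection, and the empty set plays the role of bottom. The names `Shape`, `Env`, `meet`, `merge2` and `merge` are hypothetical and are not taken from Rabit.

```python
from functools import reduce
from typing import Dict, FrozenSet, Iterable, Optional

# Toy shape domain (an assumption): a shape is the set of constructor
# names a value may have; the empty set stands for the bottom shape.
Shape = FrozenSet[str]
Env = Dict[str, Shape]          # binding environment: name -> shape
BOTTOM: Shape = frozenset()


def meet(a: Shape, b: Shape) -> Shape:
    """Greatest lower bound in the toy domain: set intersection."""
    return a & b


def merge2(e1: Optional[Env], e2: Optional[Env]) -> Optional[Env]:
    """Merge two environments point-wise by name; None stands for bottom."""
    if e1 is None or e2 is None:
        return None
    out = dict(e1)
    for name, shape in e2.items():
        combined = meet(out[name], shape) if name in out else shape
        if combined == BOTTOM:
            # Shapes for this name are not reconcilable: whole result is bottom.
            return None
        out[name] = combined
    return out


def merge(envs: Iterable[Optional[Env]]) -> Optional[Env]:
    """Merge a sequence of environments, starting from the empty one."""
    return reduce(merge2, envs, {})
```

For instance, merging `{"x": {"a", "b"}}` with `{"x": {"b", "c"}, "y": {"d"}}` keeps only `"b"` for `x`, whereas merging `{"x": {"a"}}` with `{"x": {"b"}}` yields bottom (`None`) because the two shapes for `x` have an empty meet.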

[17] Patrick Cousot and Radhia Cousot. 1995. Formal Language, Grammar and Set-Constraint-Based Program Analysis by Abstract Interpretation. In FPCA 1995. 170–181. http://doi.acm.org/10.1145/224164.224199
[18] Patrick Cousot and Radhia Cousot. 2002. Modular Static Program Analysis. In CC 2002. 159–178. https://doi.org/10.1007/3-540-45937-5_13
[19] Jesús Sánchez Cuadrado, Esther Guerra, and Juan de Lara. 2017. Static Analysis of Model Transformations. IEEE Trans. Software Eng. 43, 9 (2017), 868–897. https://doi.org/10.1109/TSE.2016.2635137
[20] David Darais, Nicholas Labich, Phuc C. Nguyen, and David Van Horn. 2017. Abstracting definitional interpreters (functional pearl). PACMPL 1, ICFP (2017), 12:1–12:25. https://doi.org/10.1145/3110256
[21] Emanuele De Angelis, Fabio Fioravanti, Alberto Pettorossi, and Maurizio Proietti. 2014. Program verification via iterated specialization. Sci. Comput. Program. 95 (2014), 149–175. https://doi.org/10.1016/j.scico.2014.05.017
[22] Nachum Dershowitz and Zohar Manna. 1979. Proving Termination with Multiset Orderings. Commun. ACM 22, 8 (1979), 465–476. https://doi.org/10.1145/359138.359142
[23] Timothy S. Freeman and Frank Pfenning. 1991. Refinement Types for ML. In PLDI 1991. 268–277. http://doi.acm.org/10.1145/113445.113468
[24] Jacques Garrigue. 1998. Programming with polymorphic variants. In ML Workshop, Vol. 13.
[25] Jacques Garrigue. 2004. Typing deep pattern-matching in presence of polymorphic variants. In JSSST Workshop on Programming and Programming Languages.
[26] Nicolas Halbwachs and Mathias Péron. 2008. Discovering properties about arrays in simple programs. In PLDI 2008. 339–348. https://doi.org/10.1145/1375581.1375623
[27] John Harrison. 2009. Handbook of Practical Logic and Automated Reasoning. Cambridge University Press.
[28] David Van Horn and Matthew Might. 2010. Abstracting abstract machines. In Proceeding of the 15th ACM SIGPLAN international conference on Functional programming, ICFP 2010, Baltimore, Maryland, USA, September 27-29, 2010, Paul Hudak and Stephanie Weirich (Eds.). ACM, 51–62. https://doi.org/10.1145/1863543.1863553
[29] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. 1993. Partial evaluation and automatic program generation. Prentice Hall.
[30] Georgios Karachalias, Tom Schrijvers, Dimitrios Vytiniotis, and Simon L. Peyton Jones. 2015. GADTs meet their match: pattern-matching warnings that account for GADTs, guards, and laziness. In ICFP 2015, Kathleen Fisher and John H. Reppy (Eds.). ACM, 424–436. https://doi.org/10.1145/2784731.2784748
[31] Paul Klint, Tijs van der Storm, and Jurgen Vinju. 2011. EASY Meta-programming with Rascal. In GTTSE III, João M. Fernandes, Ralf Lämmel, Joost Visser, and João Saraiva (Eds.). 222–289. https://doi.org/10.1007/978-3-642-18023-1_6
[32] Alexei P. Lisitsa and Andrei P. Nemytykh. 2015. Finite Countermodel Based Verification for Program Transformation (A Case Study). In Proceedings of the Third International Workshop on Verification and Program Transformation, VPT@ETAPS 2015, London, United Kingdom, 11th April 2015. (EPTCS), Alexei Lisitsa, Andrei P. Nemytykh, and Alberto Pettorossi (Eds.), Vol. 199. 15–32. https://doi.org/10.4204/EPTCS.199.2
[33] Jiangchao Liu and Xavier Rival. 2017. An array content static analysis based on non-contiguous partitions. Computer Languages, Systems & Structures 47 (2017), 104–129. https://doi.org/10.1016/j.cl.2016.01.005
[34] Neil Mitchell and Colin Runciman. 2007. Uniform boilerplate and list processing. In Haskell 2007, Freiburg, Germany. 49–60. https://doi.org/10.1145/1291201.1291208
[35] Alan Mycroft and Neil D. Jones. 1985. A relational framework for abstract interpretation. In Programs as Data Objects. 156–171. https://doi.org/10.1007/3-540-16446-4_9
[36] Valentin Perrelle and Nicolas Halbwachs. 2010. An Analysis of Permutations in Arrays. In VMCAI 2010. 279–294. https://doi.org/10.1007/978-3-642-11319-2_21
[37] Tuan-Hung Pham and Michael W. Whalen. 2013. An Improved Unrolling-Based Decision Procedure for Algebraic Data Types. In VSTTE 2013. 129–148. https://doi.org/10.1007/978-3-642-54108-7_7
[38] Andrew Reynolds and Viktor Kuncak. 2015. Induction for SMT Solvers. In VMCAI 2015. 80–98. https://doi.org/10.1007/978-3-662-46081-8_5
[39] Xavier Rival and Laurent Mauborgne. 2007. The trace partitioning abstract domain. ACM Trans. Program. Lang. Syst. 29, 5 (2007), 26. https://doi.org/10.1145/1275497.1275501
[40] Xavier Rival, Antoine Toubhans, and Bor-Yuh Evan Chang. 2014. Construction of Abstract Domains for Heterogeneous Properties. In ISoLA 2014. 489–492. https://doi.org/10.1007/978-3-662-45231-8_40
[41] Mads Rosendahl. 2013. Abstract Interpretation as a Programming Language. In Semantics, Abstract Interpretation, and Reasoning about Programs: Essays Dedicated to David A. Schmidt on the Occasion of his Sixtieth Birthday. 84–104. https://doi.org/10.4204/EPTCS.129.7

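Expressions (General)

\[
\text{AE-A}\ \frac{\{\!|\, x = e; \hat{\sigma} \xRightarrow[\text{a-expr-asgn}]{} \widehat{Res} \,|\!\}}{x = e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res}}
\qquad
\text{AE-Sq}\ \frac{\{\!|\, e_1; e_2; \hat{\sigma} \xRightarrow[\text{a-expr-seq}]{} \widehat{Res} \,|\!\}}{e_1; e_2; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res}}
\qquad
\text{AE-C}\ \frac{\{\!|\, k(e); \hat{\sigma} \xRightarrow[\text{a-expr-cons}]{} \widehat{Res} \,|\!\}}{k(e); \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res}}
\]

\[
\text{AE-St}\ \frac{\{\!|\, \{e\}; \hat{\sigma} \xRightarrow[\text{a-expr-set}]{} \widehat{Res} \,|\!\}}{\{e\}; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res}}
\qquad
\text{AE-Fl}\ \frac{}{\textbf{fail}; \hat{\sigma} \xRightarrow[\text{a-expr}]{} [\text{fail} \mapsto (\cdot, \hat{\sigma})]}
\qquad
\text{AES}\ \frac{\{\!|\, e; \hat{\sigma} \xRightarrow[\text{a-expr}\star\text{-1}]{} \widehat{Res} \,|\!\}}{e; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res}}
\]

Assignment Expression

\[
\text{AE-A-S}\quad
\frac{\begin{array}{c}
\textbf{local}\ t\ x \lor \textbf{global}\ t\ x \qquad e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \\
(\text{success}, (\widehat{vs}, \hat{\sigma}')) \in \widehat{Res} \qquad \widehat{vs} : t' \qquad t' <: t
\end{array}}
{x = e; \hat{\sigma} \xRightarrow[\text{a-expr-asgn}]{} [\text{success} \mapsto (\widehat{vs}, \hat{\sigma}'[x \mapsto (\mathit{ff}, \widehat{vs})])]}
\]

\[
\text{AE-A-Er}\quad
\frac{\begin{array}{c}
\textbf{local}\ t\ x \lor \textbf{global}\ t\ x \qquad e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \\
(\text{success}, (\widehat{vs}, \hat{\sigma}')) \in \widehat{Res} \qquad \widehat{vs} : t' \qquad t' \not<: t
\end{array}}
{x = e; \hat{\sigma} \xRightarrow[\text{a-expr-asgn}]{} [\text{error} \mapsto (\cdot, \hat{\sigma}')]}
\qquad
\text{AE-A-Ex}\quad
\frac{e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (exres, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res}}
{x = e; \hat{\sigma} \xRightarrow[\text{a-expr-asgn}]{} [exres \mapsto (\widehat{resv}, \hat{\sigma}')]}
\]

Sequencing Expression

\[
\text{AE-Sq-S}\quad
\frac{\begin{array}{c}
e_1, e_2; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star} \\
(\text{success}, ((\widehat{vs}_1, \widehat{vs}_2), \hat{\sigma}')) \in \widehat{Res\star}
\end{array}}
{e_1; e_2; \hat{\sigma} \xRightarrow[\text{a-expr-seq}]{} [\text{success} \mapsto (\widehat{vs}_2, \hat{\sigma}')]}
\qquad
\text{AE-Sq-Ex}\quad
\frac{\begin{array}{c}
e_1, e_2; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star} \\
(exres, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res\star}
\end{array}}
{e_1; e_2; \hat{\sigma} \xRightarrow[\text{a-expr-seq}]{} [exres \mapsto (\widehat{resv}, \hat{\sigma}')]}
\]

Constructor Expression

\[
\text{AE-C-S}\quad
\frac{\begin{array}{c}
\textbf{data}\ at = \ldots \mid k(t) \mid \ldots \qquad e; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star} \\
(\text{success}, (\widehat{vs}, \hat{\sigma}')) \in \widehat{Res\star} \qquad \widehat{vs} : t' \qquad t' <: t
\end{array}}
{k(e); \hat{\sigma} \xRightarrow[\text{a-expr}]{} [\text{success} \mapsto (k(\widehat{vs}), \hat{\sigma}')]}
\]

\[
\text{AE-C-Er}\quad
\frac{\begin{array}{c}
\textbf{data}\ at = \ldots \mid k(t) \mid \ldots \qquad e; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star} \\
(\text{success}, (\widehat{vs}, \hat{\sigma}')) \in \widehat{Res\star} \qquad \widehat{vs} : t' \qquad \exists i.\ t'_i \not<: t_i
\end{array}}
{k(e); \hat{\sigma} \xRightarrow[\text{a-expr}]{} [\text{error} \mapsto (\cdot, \hat{\sigma}')]}
\qquad
\text{AE-C-Ex}\quad
\frac{e; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star} \qquad (exres, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res\star}}
{k(e); \hat{\sigma} \xRightarrow[\text{a-expr}]{} [exres \mapsto (\widehat{resv}, \hat{\sigma}')]}
\]

Set Expression

\[
\text{AE-St-S}\quad
\frac{e; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star} \qquad (\text{success}, (\widehat{vs}, \hat{\sigma}')) \in \widehat{Res\star}}
{\{e\}; \hat{\sigma} \xRightarrow[\text{a-expr-set}]{} [\text{success} \mapsto (\{\textstyle\bigsqcup_i \widehat{vs}_i\}_{[0;|\widehat{vs}|]}, \hat{\sigma}')]}
\qquad
\text{AE-St-Ex}\quad
\frac{e; \hat{\sigma} \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star} \qquad (exres, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res\star}}
{\{e\}; \hat{\sigma} \xRightarrow[\text{a-expr-set}]{} [exres \mapsto (\widehat{resv}, \hat{\sigma}')]}
\]

Expression Sequences

\[
\text{AES-Em}\quad
\frac{}{\varepsilon; \hat{\sigma} \xRightarrow[\text{a-expr}\star\text{-1}]{} [\text{success} \mapsto (\varepsilon, \hat{\sigma})]}
\qquad
\text{AES-Mr}\quad
\frac{\begin{array}{c}
e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (\text{success}, (\widehat{vs}, \hat{\sigma}'')) \in \widehat{Res} \\
e'; \hat{\sigma}'' \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star}' \qquad (\text{success}, (\widehat{vs}', \hat{\sigma}')) \in \widehat{Res\star}'
\end{array}}
{e, e'; \hat{\sigma} \xRightarrow[\text{a-expr}\star\text{-1}]{} [\text{success} \mapsto ((\widehat{vs}, \widehat{vs}'), \hat{\sigma}')]}
\]

\[
\text{AES-Ex}\quad
\frac{e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (exres, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res}}
{e, e'; \hat{\sigma} \xRightarrow[\text{a-expr}\star\text{-1}]{} [exres \mapsto (\widehat{resv}, \hat{\sigma}')]}
\qquad
\text{AES-Ex}\quad
\frac{\begin{array}{c}
e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (\text{success}, (\widehat{vs}, \hat{\sigma}'')) \in \widehat{Res} \\
e'; \hat{\sigma}'' \xRightarrow[\text{a-expr}\star]{} \widehat{Res\star}' \qquad (exres, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res\star}'
\end{array}}
{e, e'; \hat{\sigma} \xRightarrow[\text{a-expr}\star\text{-1}]{} [exres \mapsto (\widehat{resv}, \hat{\sigma}')]}
\]

Figure 10. Abstract Operational Semantics Rules for Basic Expressions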

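Visit Expression

\[
\text{AE-Vt-S}\quad
\frac{\begin{array}{c}
e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (\text{success}, (\widehat{vs}, \hat{\sigma}'')) \in \widehat{Res} \\
cs; \widehat{vs}; \hat{\sigma}'' \xRightarrow[\text{a-bu-visit}]{} \widehat{Res}' \qquad (\text{success}, (\widehat{vs}', \hat{\sigma}')) \in \widehat{Res}'
\end{array}}
{\textbf{visit}\ e\ cs; \hat{\sigma} \xRightarrow[\text{a-expr-visit}]{} [\text{success} \mapsto (\widehat{vs}', \hat{\sigma}')]}
\]

\[
\text{AE-Vt-F}\quad
\frac{\begin{array}{c}
e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (\text{success}, (\widehat{vs}, \hat{\sigma}'')) \in \widehat{Res} \\
cs; \widehat{vs}; \hat{\sigma}'' \xRightarrow[\text{a-bu-visit}]{} \widehat{Res}' \qquad (\text{fail}, (\widehat{vs}', \hat{\sigma}')) \in \widehat{Res}'
\end{array}}
{\textbf{visit}\ e\ cs; \hat{\sigma} \xRightarrow[\text{a-expr-visit}]{} [\text{success} \mapsto (\widehat{vs}', \hat{\sigma}')]}
\]

\[
\text{AE-Vt-Ex1}\quad
\frac{e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (exres, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res}}
{\textbf{visit}\ e\ cs; \hat{\sigma} \xRightarrow[\text{a-expr-visit}]{} [exres \mapsto (\widehat{resv}, \hat{\sigma}')]}
\]

\[
\text{AE-Vt-Ex2}\quad
\frac{\begin{array}{c}
e; \hat{\sigma} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (\text{success}, (\widehat{vs}, \hat{\sigma}'')) \in \widehat{Res} \\
cs; \widehat{vs}; \hat{\sigma}'' \xRightarrow[\text{a-bu-visit}]{} \widehat{Res}' \qquad (\text{error}, (\widehat{resv}, \hat{\sigma}')) \in \widehat{Res}'
\end{array}}
{\textbf{visit}\ e\ cs; \hat{\sigma} \xRightarrow[\text{a-expr-visit}]{} [\text{error} \mapsto (\widehat{resv}, \hat{\sigma}')]}
\]

Bottom-up Traversal of Single Value

\[
\text{ABU-S}\quad
\frac{\begin{array}{c}
(\widehat{vs}'', \widehat{cvs}) \in \widehat{children}(\widehat{vs}) \qquad cs; \widehat{cvs}; \hat{\sigma} \xRightarrow[\text{a-bu-visit}\star]{} \widehat{Res\star} \qquad (\text{success}, (\widehat{cvs}', \hat{\sigma}')) \in \widehat{Res\star} \\
\text{recons}\ \widehat{vs}''\ \text{using}\ \widehat{cvs}'\ \text{to}\ \widehat{RCRes} \qquad (\text{success}, \widehat{vs}') \in \widehat{RCRes} \qquad cs; \widehat{vs}'; \hat{\sigma}' \xRightarrow[\text{a-cases}]{} \widehat{Res}'
\end{array}}
{cs; \widehat{vs}; \hat{\sigma} \xRightarrow[\text{a-bu-visit-go}]{} \widehat{Res}'}
\]

\[
\text{ABU-F}\quad
\frac{\begin{array}{c}
(\widehat{vs}'', \widehat{cvs}) \in \widehat{children}(\widehat{vs}) \qquad cs; \widehat{cvs}; \hat{\sigma} \xRightarrow[\text{a-bu-visit}\star]{} \widehat{Res\star} \qquad (\text{fail}, (\widehat{cvs}', \hat{\sigma}')) \in \widehat{Res\star} \\
\text{recons}\ \widehat{vs}''\ \text{using}\ \widehat{cvs}'\ \text{to}\ [\text{success} \mapsto \widehat{vs}'] \qquad cs; \widehat{vs}'; \hat{\sigma}' \xRightarrow[\text{a-cases}]{} \widehat{Res}'
\end{array}}
{cs; \widehat{vs}; \hat{\sigma} \xRightarrow[\text{a-bu-visit-go}]{} \widehat{Res}'}
\]

Bottom-up Traversal of Children

\[
\text{ABUC-E}\quad
\frac{}{cs; \varepsilon; \hat{\sigma} \xRightarrow[\text{a-bu-visit}\star\text{-go}]{} [\text{fail} \mapsto (\varepsilon, \hat{\sigma})]}
\]

\[
\text{ABUC-M}\quad
\frac{\begin{array}{c}
cs; \widehat{vs}; \hat{\sigma} \xRightarrow[\text{a-bu-visit}]{} \widehat{Res} \qquad (vfres, (\widehat{vs}'', \hat{\sigma}'')) \in \widehat{Res} \\
cs; \widehat{vs}'; \hat{\sigma}'' \xRightarrow[\text{a-bu-visit}\star]{} \widehat{Res\star}' \qquad (vfres', (\widehat{vs}''', \hat{\sigma}')) \in \widehat{Res\star}' \\
\widehat{Res\star}'' = \widehat{vcombine}(vfres, \widehat{vs}'', vfres', \widehat{vs}''', \hat{\sigma}')
\end{array}}
{cs; \widehat{vs}, \widehat{vs}'; \hat{\sigma} \xRightarrow[\text{a-bu-visit}\star\text{-go}]{} \widehat{Res\star}''}
\]

\[
\text{ABUS-E}\quad
\frac{}{cs; (\widehat{vs}, [0;u]); \hat{\sigma} \xRightarrow[\text{a-bu-visit}\star\text{-go}]{} [\text{fail} \mapsto ((\bot, 0), \hat{\sigma})]}
\]

\[
\text{ABUS-M}\quad
\frac{\begin{array}{c}
u > 0 \qquad cs; \widehat{vs}; \hat{\sigma} \xRightarrow[\text{a-bu-visit}]{} \widehat{Res} \qquad (vfres, (\widehat{vs}'', \hat{\sigma}'')) \in \widehat{Res} \\
cs; (\widehat{vs}, [l-1; u-1]); \hat{\sigma}'' \xRightarrow[\text{a-bu-visit}\star]{} \widehat{Res\star}' \qquad (vfres', ((\widehat{vs}''', [l';u']), \hat{\sigma}')) \in \widehat{Res\star}' \\
\widehat{Res\star}'' = \widehat{vcombine}(vfres, \widehat{vs}'', vfres', (\widehat{vs}''', [l';u']), \hat{\sigma}')
\end{array}}
{cs; (\widehat{vs}, [l;u]); \hat{\sigma} \xRightarrow[\text{a-bu-visit}\star\text{-go}]{} \widehat{Res\star}''}
\]

Figure 11. Selected Abstract Operational Semantics Rules for Traversal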
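The overall structure of the bottom-up traversal (visit the children first, reconstruct the value from the rewritten children, then apply the cases to the reconstructed value) can be sketched on concrete trees. This is a concrete and heavily simplified analogue, not the abstract semantics itself: there is no store, and the way the outcomes of children and parent are combined (success if a case matched anywhere) is an assumption, since the figure does not spell out the definition of vcombine. The names `Node`, `visit_children` and `bottom_up` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """A concrete constructor value k(children...)."""
    name: str
    kids: Tuple["Node", ...] = ()


# A case sequence is modelled as a function returning the rewritten node,
# or None when no case matches (the "fail" outcome).
Cases = Callable[[Node], Optional[Node]]


def visit_children(cases: Cases, kids: Tuple[Node, ...]) -> Tuple[bool, Tuple[Node, ...]]:
    """Traverse a child sequence left to right; the empty sequence fails."""
    if not kids:
        return False, ()
    ok_head, head = bottom_up(cases, kids[0])
    ok_tail, tail = visit_children(cases, kids[1:])
    # Assumed combination: success if any child was rewritten.
    return ok_head or ok_tail, (head,) + tail


def bottom_up(cases: Cases, node: Node) -> Tuple[bool, Node]:
    """Rewrite children first, rebuild the node, then try the cases on it."""
    ok_kids, kids = visit_children(cases, node.kids)
    rebuilt = Node(node.name, kids)
    result = cases(rebuilt)
    if result is None:
        return ok_kids, rebuilt
    return True, result
```

With a single case that rewrites the leaf `a` to `b`, traversing `f(a, c)` yields `(True, f(b, c))`, whereas traversing `c` alone yields `(False, c)`.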

[42] John M. Rushby, Sam Owre, and Natarajan Shankar. 1998. Subtypes for Specifications: Predicate Subtyping in PVS. IEEE Trans. Software Eng. 24, 9 (1998), 709–720. https://doi.org/10.1109/32.713327
[43] David A. Schmidt. 1998. Trace-Based Abstract Interpretation of Operational Semantics. Lisp and Symbolic Computation 10, 3 (1998), 237–271.
[44] Dana S. Scott. 1976. Data Types as Lattices. SIAM J. Comput. 5, 3 (1976), 522–587. http://dx.doi.org/10.1137/0205037
[45] Peter Sestoft and Niels Hallenberg. 2017. Programming language concepts. Springer.
[46] Anthony M. Sloane. 2011. Lightweight Language Processing in Kiama. In GTTSE III, João M. Fernandes, Ralf Lämmel, Joost Visser, and João Saraiva (Eds.). Lecture Notes in Computer Science, Vol. 6491. Springer Berlin Heidelberg, 408–425. https://doi.org/10.1007/978-3-642-18023-1_12
[47] Michael B. Smyth and Gordon D. Plotkin. 1982. The Category-Theoretic Solution of Recursive Domain Equations. SIAM J. Comput. 11, 4 (1982), 761–783. http://dx.doi.org/10.1137/0211062
[48] Morten Heine Sørensen, Robert Glück, and Neil D. Jones. 1996. A Positive Supercompiler. J. Funct. Program. 6, 6 (1996), 811–838. https://doi.org/10.1017/S0956796800002008
[49] Philippe Suter, Mirco Dotta, and Viktor Kuncak. 2010. Decision procedures for algebraic data types with abstractions. In POPL 2010, Manuel V. Hermenegildo and Jens Palsberg (Eds.). ACM, 199–210. https://doi.org/10.1145/1706299.1706325
[50] Antoine Toubhans, Bor-Yuh Evan Chang, and Xavier Rival. 2013. Reduced Product Combination of Abstract Domains for Shapes. In VMCAI 2013. 375–395. https://doi.org/10.1007/978-3-642-35873-9_23

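Case Sequence

\[
\text{ACS-E}\quad
\frac{}{\varepsilon; \widehat{vs}; \hat{\sigma} \xRightarrow[\text{a-cases-go}]{} [\text{fail} \mapsto (\widehat{vs}, \hat{\sigma})]}
\]

\[
\text{ACS-M-O}\quad
\frac{\begin{array}{c}
\hat{\sigma} \vdash p \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match}]{} \hat{\varrho} \qquad (\widehat{vs}', \hat{\sigma}', \hat{\rho}^?) \in \hat{\varrho} \\
\hat{\rho}^?; e; \hat{\sigma}' \xRightarrow[\text{a-case}]{} \widehat{Res} \qquad (rest, (\widehat{resv}, \hat{\sigma}'')) \in \widehat{Res} \qquad rest \neq \text{fail}
\end{array}}
{\textbf{case}\ p \Rightarrow e, cs; \widehat{vs}; \hat{\sigma} \xRightarrow[\text{a-cases-go}]{} [rest \mapsto (\widehat{resv}, \hat{\sigma}'')]}
\]

\[
\text{ACS-M-F}\quad
\frac{\begin{array}{c}
\hat{\sigma} \vdash p \stackrel{?}{:=} \widehat{vs} \xRightarrow[\text{a-match}]{} \hat{\varrho} \qquad (\widehat{vs}', \hat{\sigma}', \hat{\rho}^?) \in \hat{\varrho} \\
\hat{\rho}^?; e; \hat{\sigma}' \xRightarrow[\text{a-case}]{} \widehat{Res} \qquad (\text{fail}, (\widehat{resv}, \hat{\sigma}'')) \in \widehat{Res} \qquad cs; \widehat{vs}'; \hat{\sigma}' \xRightarrow[\text{a-cases-go}]{} \widehat{Res}'
\end{array}}
{\textbf{case}\ p \Rightarrow e, cs; \widehat{vs}; \hat{\sigma} \xRightarrow[\text{a-cases-go}]{} \widehat{Res}'}
\]

Case

\[
\text{AC-E}\quad
\frac{}{\bot; e; \hat{\sigma} \xRightarrow[\text{a-case-go}]{} [\text{fail} \mapsto (\cdot, \hat{\sigma})]}
\]

\[
\text{AC-M-O}\quad
\frac{e; \hat{\sigma}\hat{\rho} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (rest, (\widehat{resv}, \hat{\sigma}'')) \in \widehat{Res} \qquad rest \neq \text{fail}}
{\hat{\rho}; e; \hat{\sigma} \xRightarrow[\text{a-case-go}]{} [rest \mapsto (\widehat{resv}, \hat{\sigma}'')]}
\]

\[
\text{AC-M-F}\quad
\frac{e; \hat{\sigma}\hat{\rho} \xRightarrow[\text{a-expr}]{} \widehat{Res} \qquad (\text{fail}, (\widehat{resv}, \hat{\sigma}'')) \in \widehat{Res}}
{\hat{\rho}; e; \hat{\sigma} \xRightarrow[\text{a-case-go}]{} [\text{fail} \mapsto (\widehat{resv}, \hat{\sigma})]}
\]

Figure 12. Selected Abstract Operational Semantic Rules for Traversal (Cont.)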
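The effect of passing the refined value on to the remaining cases (ACS-M-F) can be seen in a small sketch. It is a simplification under stated assumptions: a shape is a finite set of constructor names, every pattern is a bare constructor name, case bodies never fail, and there is no store. The names `match_constructor` and `run_cases` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

# Toy shape domain (an assumption): set of possible constructor names.
Shape = FrozenSet[str]
# A case pairs a constructor pattern with a body applied to the refined shape.
Case = Tuple[str, Callable[[Shape], str]]


def match_constructor(k: str, shape: Shape) -> Tuple[Shape, Shape]:
    """Return (shape refined to constructor k, shape with k excluded)."""
    return shape & {k}, shape - {k}


def run_cases(cases: List[Case], shape: Shape) -> Tuple[List[Tuple[str, str]], Shape]:
    """Try cases in order; each later case only sees what earlier ones excluded."""
    results: List[Tuple[str, str]] = []
    remaining = shape
    for k, body in cases:
        matched, remaining = match_constructor(k, remaining)
        if matched:
            results.append((k, body(matched)))
        if not remaining:
            break  # nothing is left that could fail all cases
    return results, remaining
```

Running the cases `zero`, `suc`, `zero` on the shape `{"zero", "suc", "pred"}` executes the first two bodies only. The second `zero` case is never applied because `zero` has already been excluded by the first case, and the residual shape `{"pred"}` is what reaches the end of the case sequence as a failure.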

[51] Niki Vazou, Patrick Maxim Rondon, and Ranjit Jhala. 2013. Abstract Refinement Types. In ESOP 2013. 209–228. https://doi.org/10.1007/978-3-642-37036-6_13
[52] Glynn Winskel. 1993. Information Systems. MIT Press, Chapter 12.
[53] Niklaus Wirth. 1996. Compiler Construction. Addison-Wesley.
[54] Hongwei Xi and Frank Pfenning. 1998. Eliminating Array Bound Checking Through Dependent Types. In PLDI 1998. 249–257. https://doi.org/10.1145/277650.277732