Processing an Intermediate Representation Written in Lisp

Ricardo Peña, Santiago Saavedra, Jaime Sánchez-Hernández Complutense University of Madrid, Spain∗ [email protected],[email protected],[email protected]

In the context of designing the verification platform CAVI-ART, we faced the need to choose a textual format for our intermediate representation of programs. After considering several options, we decided to use S-expressions for that textual representation, and to process them in order to obtain the verification conditions. In this paper, we discuss the benefits of this decision. S-expressions are homoiconic, i.e. they can be considered both as data and as code. We exploit this duality and make extensive use of the facilities of the Common Lisp environment to process these textual representations in different ways. In particular, using a common compilation scheme, we show that program execution and verification condition generation can be seen as two instantiations of the same generic process.

1 Introduction

Building a verification platform such as CAVI-ART [7] is a challenging endeavour from many points of view. Some of the tasks require intensive research which can advance the state of the art, as is the case, for instance, of automatic invariant synthesis. Others are not so challenging, but require a lot of pondering and discussion before arriving at a decision, in order to ensure that all the requirements have been taken into account. A bad or hasty decision may have a profound influence on the rest of the activities, making them more difficult, longer, or less efficient. This is the case of the problem presented in this paper: deciding on a textual representation for an otherwise internal representation of the platform.

The CAVI-ART platform accepts programs written in a variety of languages, both imperative and functional, and it could potentially be extended to more. A carefully chosen intermediate representation common to all of them ensures that most of the platform activities can be performed in a language-independent way. The CAVI-ART Intermediate Representation (in what follows, IR), introduced in [7], is an internal representation into which all the source languages supported by CAVI-ART are transformed. The language-independent activities of the platform include termination analysis, invariant synthesis, verification condition generation, verification condition proving, test case generation, and some others. As we will show, the IR is the confluence point of all the tools, which produce, consume, or transform the IR in different ways. As the tools are implemented in a variety of languages, we decided to use files for communication between the tools. This makes the tools more independent of each other, and also gives visibility to the intermediate results. So, we arrived at the point of deciding the format of the files containing IR-transformed programs.
As we will see, this has not been an easy decision, given the somewhat contradictory requirements we have for these files. The final decision has been to represent the IR programs as Common Lisp S-expressions [5]. This satisfies all our requirements and also gives us a new possibility for free: that of being able to execute the IR. Additionally, the Common Lisp environment provides a set of facilities, such as macro-expansion, evaluation while compiling, and some others, that can make the processing of the IR files a very systematic and uniform task, independently of whether we want to execute it or to process it in any other way.

The plan of the paper is as follows. In Sec. 2 we explain the different uses of the IR and all the requirements that these uses impose on the IR-files. In Sec. 3 we justify the decision of using Lisp S-expressions. In Sec. 4 we introduce the facilities of the Common Lisp environment and present a common scheme for all the kinds of processing one may wish to do with IR files; as two instantiations of this scheme, we show how to execute an IR file and how to extract the verification conditions for proving the program correct. A running example is used throughout the paper, and in Sec. 5 we show the resulting goals of the second instantiation of the scheme. Finally, Sec. 6 concludes.

∗Work partially funded by the Spanish Ministry of Economy and Competitiveness, under the grant TIN2013-44742-C4-3-

Contribution to: PROLE 2016 © R. Peña, S. Saavedra, J. Sánchez-Hernández

a   ::= c                                                   { constant }
      | x                                                   { variable }
be  ::= a                                                   { atomic expression }
      | f \overline{a_i}                                    { function/primitive operator application }
      | ⟨\overline{a_i}⟩                                    { tuple construction }
      | C \overline{a_i}                                    { constructor application }
e   ::= be                                                  { binding expression }
      | let ⟨\overline{x_i :: τ_i}⟩ = be in e               { sequential let. Left part of the binding can be a tuple }
      | letfun \overline{def_i} in e                        { recursive let for function definitions }
      | case a of \overline{alt_i} [; _ → e]                { case distinction with optional default branch }
def ::= f (\overline{x_i :: τ_i}) :: \overline{y_i :: τ_i} = e   { function definition. Output results are named }
alt ::= C \overline{x_i :: τ_i} → e                         { case branch }
τ   ::= α                                                   { type variable }
      | T \overline{τ_i}                                    { type constructor application }

Figure 1: CAVI-ART IR syntax

2 Contents and Uses of the Intermediate Representation

The IR abstract syntax is reproduced in Fig. 1, where the overlined vector notation \overline{z_i} is an abbreviation for z_1, ..., z_n. One of its aims was minimality, as it was clear to us that the larger the number of different IR constructions, the longer, more difficult, and more error-prone the tasks cited above would be. In this sense, iteration and recursion are unified into a single construction, letfun, which allows sets of mutually recursive functions to be defined. Imperative assignments are translated into sequential let expressions, and an SSA¹ transformation ensures that no variable is assigned more than once. All conditional statements, of both imperative and functional languages, are transformed into case expressions. The rest of the IR consists of function and constructor applications, tuples and atoms.

The IR is a functional language, so there is nothing like destructive update, output function arguments, or arguments passed by reference. When a function returns more than one result, it returns a tuple. An atom is either a literal constant or a variable, and atoms are mandatory in several places of the IR, for instance in the actual arguments of an application. These restrictions ease the activities related to verification condition generation and proving. For a broader motivation of the decisions leading to the IR, see [7].

1 Static Single Assignment. See for instance [1, Chap. 19].
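The translation of imperative assignments into sequential lets can be illustrated with a minimal Python sketch. It is not the platform's transformer: the function names `to_ssa` and `to_let`, the token-list encoding of expressions, and the numbering scheme for fresh names are all assumptions made for the illustration, and the sketch only handles straight-line code (it also assumes that no source variable already ends in a digit).

```python
from collections import defaultdict

def to_ssa(assignments):
    """Rename a straight-line list of (target, expression-tokens) assignments
    so that every variable is assigned exactly once."""
    version = defaultdict(int)   # number of assignments seen per source variable
    current = {}                 # source name -> latest SSA name
    result = []
    for target, tokens in assignments:
        # Uses on the right-hand side refer to the latest version so far.
        renamed = [current.get(tok, tok) for tok in tokens]
        version[target] += 1
        fresh = f"{target}{version[target]}"
        current[target] = fresh
        result.append((fresh, renamed))
    return result

def to_let(ssa, body):
    """Fold SSA assignments into nested sequential lets, rendered as text."""
    text = body
    for name, tokens in reversed(ssa):
        text = f"let {name} = {' '.join(tokens)} in {text}"
    return text

# p = p - 1; p = p + 2; q = p * p
program = [("p", ["p", "-", "1"]), ("p", ["p", "+", "2"]), ("q", ["p", "*", "p"])]
ssa = to_ssa(program)
print(to_let(ssa, ssa[-1][0]))
```

Each assignment introduces a fresh name, so the two assignments to `p` become the distinct let-bound variables `p1` and `p2`, in the same spirit as `p1` and `p2` in the running example.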

(a) C++ source (fragment)

void qsort(int v[], int i, int j) {
  // Pre: 0 <= i <= j < length(v)
  int p;
  if (i [...]

(b) IR (fragment)

quicksort (v :: array int, n :: int) :: (vres :: array int) =
  {Q : n = length(v)}
  {R : sorted(vres) ∧ permut_all(v, vres)}
  letfun
    qsort (v :: array int, i :: int, j :: int) :: (vres :: array int) =
      {Q1 : 0 ≤ i ≤ j < length(v)}
      {R1 : sorted_sub(vres, i, j + 1) ∧ permut_sub(v, vres, i, j + 1)}
      [...]

Figure 2: The quickSort algorithm we will use as running example

We will use the quicksort algorithm as a running example in this paper. Fig. 2a shows a source version in C++. Once transformed into the IR, it takes the form shown in Fig. 2b. Preconditions and postconditions are logic formulas which may use atomic predicates introduced in externally axiomatized theories. In the example, we use the predicates sorted, permut for checking the corresponding properties on the full length of vectors, and sorted_sub, permut_sub for reasoning about subvectors. These predicates are part of Why3 [3], which is the underlying platform that CAVI-ART uses for discharging formulas.

The IR is strongly typed in a polymorphic Hindley–Milner-like type system. After the transformation from the source language (either by translating the user-given types, or by inferring them), the IR is assumed to be annotated with types. In Fig. 1 it can be seen that types are included at all the defining occurrences of variables, i.e. in the formal arguments and results of function definitions, in the let-bound variables, and in the case-bound variables introduced by pattern matching. The syntax of types is also shown in Fig. 1. The type constructors are language-dependent, and new ones can be introduced by datatype declarations, not shown. A monomorphic type like int is considered to be an argument-less type constructor. In the example, we assume the type constructors int, bool, and array to be defined. Arrays have integer indices and are polymorphic in the elements.

The IR also contains assertions. These may be either given by the user or synthesised by the platform. In this paper, we will assume that the IR contains enough assertions to automatically extract the verification conditions (in what follows, VC) that, if discharged, prove the correctness of the program. These assertions include at least the precondition and postcondition of the top-level function to be proved, and the invariant assertion of each loop. If the original program contains a recursive function (as is the case in our running example), then the precondition and postcondition of this function are also required. Since the transformation may introduce new functions not corresponding to any source function or method (in the example, this is the case of f1), it cannot be expected that every function of the IR has an associated precondition and postcondition. The VC generator (in what follows, VCGEN) must take care of this lack of orthogonality, as we will show in Sec. 4.

The syntax of assertions is very similar to that of Why3 formulas, because in the end we send the VCs generated by VCGEN to this platform in order to prove them. The Why3 platform, in turn, uses a number of SMT solvers, such as Z3 [8] or CVC4 [2], in order to prove these formulas. Assertions are also typed.

We call verification unit (in what follows, VU) the minimal unit that a user may wish to verify. This typically includes a method, or a set of related methods. We will call them the top-level functions of the VU. These methods may call external methods belonging to other VUs. For these external methods, we assume that we have their respective preconditions and postconditions, independently of whether they have already been verified or not. After the transformation, the IR of a VU will consist of:

1. A header naming the imported VUs and the imported Why3 theories needed to resolve the names of the atomic predicates occurring in assertions.
2. A collection of unspecified metadata, such as source language, version, VU status, etc.
3. For each top-level function, its precondition and its postcondition.
4. For each top-level function, a body consisting of a single letfun expression, defining a collection of mutually recursive auxiliary functions representing its basic blocks.
5. If one of the auxiliary functions is called in a non-tail recursive position (this is detected because it is called in the binding expression of a let), then we also require a precondition and a postcondition for it.

Under the above requirements, the VCGEN will be able to automatically extract the VCs that, if proved, will ensure the correctness of the VU. This processing is described in detail in Sec. 4. But VC extraction is only one of the possible uses of the IR by the platform. There are other uses that impose different requirements on it. For instance, the IR is both the input and the output of the different static analysers. The most important ones are:

Termination analyser. This tries to prove the termination of user programs in their IR-transformed form. To this aim, it synthesises ranking functions decreasing at each transition of the function call graph induced by the program IR. Should it succeed, an IR annotated with ranking functions is returned.

Invariant synthesiser. Given an IR with the precondition and the postcondition of a top-level function and with those of the external methods, this tool tries to infer all the intermediate assertions of the method. These include the invariant assertions of loops and the preconditions and postconditions of recursive functions. In a companion paper [6] we present some preliminary but promising results, based on the liquid types approach [10]. As a result, an IR annotated with the inferred assertions is returned.

Another use is the generation of test cases. As the IR is annotated with at least the preconditions and the postconditions, black-box test cases can be generated from these specifications. White-box test cases may also be generated, covering, with different coverage criteria, the paths of the program in its IR-transformed form. Since they are based on the IR, these tools, currently under development, will be language independent.

A last use of the IR is that we require it to be executable. This is somewhat surprising because, from the beginning, the IR was conceived as an internal representation meant only to ease the verification tasks. No emphasis was placed on making it efficient, or even on giving it an operational semantics. For instance, since the IR is functional, the operation modifying an array conceptually creates a new array, because this is good for verification purposes. The reasons for deciding to execute the IR are the following:

• As feedback to check whether the transformation undergone by the source language program is free of bugs.
• To extend executability to the assertions as well, and thus to have language-independent test case execution. This will lead us to develop language-independent testing tools similar to QuickCheck [4]. Then, test case generation, test execution, and test result checking could all be done automatically by just pressing a button.
• Also, to make it possible to develop language-independent debuggers.
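The idea of executable assertions driving QuickCheck-style testing can be sketched in a few lines of Python. This is only an illustration of the principle, not one of the CAVI-ART tools: `precondition` and `postcondition` transcribe the top-level specification of the running example by hand, `quicksort` is a stand-in for the function under test, and `run_tests` with its parameters is a hypothetical driver.

```python
import random
from collections import Counter

def precondition(v, n):
    # Q: n = length(v)
    return n == len(v)

def postcondition(v, vres):
    # R: sorted(vres) and vres is a permutation of v
    is_sorted = all(vres[k] <= vres[k + 1] for k in range(len(vres) - 1))
    is_permutation = Counter(v) == Counter(vres)
    return is_sorted and is_permutation

def quicksort(v, n):
    # Stand-in for the executable form of the function under test.
    return sorted(v)

def run_tests(function, trials=200, seed=0):
    """Generate random inputs satisfying the precondition, run the function,
    and return the first counterexample to the postcondition (or None)."""
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        n = len(v)
        if not precondition(v, n):
            continue
        vres = function(list(v), n)
        if not postcondition(v, vres):
            return (v, vres)
    return None

print(run_tests(quicksort))
```

Because only the specification is inspected, the same driver applies to any implementation with the same signature, which is the sense in which such black-box testing is language independent.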

3 Choosing a Textual Representation for the IR

The IR, as defined in Fig. 1, is nothing more than an abstract syntax. The IR-processing tools deal with IR-transformed programs as Abstract Syntax Trees (AST). For instance, if a tool is written in Java, this AST is a collection of objects belonging to a hierarchy of classes and subclasses. If the tool is written in Haskell, then the AST is a term belonging to a set of mutually recursive algebraic datatypes. But, as said in the Introduction, we have decided that the tools communicate through IR-files, and so we need to decide on a format for them. Since one of the purposes of storing the IR in files is to give visibility to the intermediate results, the format should be a textual one. A textual representation of the IR has two other important benefits:

• Files can be edited, which is good for experimenting with variants, repairing errors on the fly, or temporarily removing undesired parts.
• If necessary, example programs can be written directly in the IR, without having to wait for a transformer from the source language to be completed.

Textual files imply the need for parsing, in order to convert them into the different ASTs, and also for pretty-printing, in order to convert the ASTs into textual files. We considered some standard alternatives, including widely used descriptive formats such as JSON or XML, which provide parsers and pretty-printers for many platforms and have a simple syntax. We also considered the possibility of embedding the IR in a programming language, such as Java or Haskell, which would also facilitate the execution of the IR. The first two alternatives were discarded because of their verbosity, which makes writing them by hand tedious. Moreover, JSON dictionaries do not prescribe an order on objects, which is a drawback in our context, since for example the arguments of a function must follow an order. The last two alternatives were considered to be too restrictive. Finally, we chose Common Lisp S-expressions as the textual representation of the IR. We have found that they meet all the enumerated requirements, and provide additional benefits in our context:

• They support a minimalist and clean syntax that reduces the impedance of the overall process. Their parsing and pretty-printing are almost trivial.
• Despite their minimality, they are still reasonably legible.
• A well-known feature of S-expressions is that they can be considered data as well as source code for language processors such as Lisp. In our context, they can be seen as executable ASTs, which means that IR programs can be executed with minimal effort.

(define quicksort ((v (array int)) (n int)) ((vres (array int)))
  (declare
    (assertion
      (precd (and (@ <= 0 (@ length v)) (@ = n (@ length v))))
      (postcd (and (@ sorted vres) (@ permut_all v vres)))))
  (letfun
    ((qsort ((v (array int)) (i int) (j int)) ((vsort (array int)))
       (declare
         (assertion
           (precd (and (@ <= 0 i j) (@ < j (@ length v))))
           (postcd (and (@ sorted_sub vsort i (@ + 1 j))
                        (@ permut_sub v vsort i (@ + 1 j))))))
       (let ((b bool)) (@ < i j)
         (case b
           ((the bool true) (@ f1 v i j))
           ((the bool false) v))))
     (f1 ((v (array int)) (i int) (j int)) ((result (array int)))
       (let ((v1 (array int)) (p int)) (@ partition v i j)
         (let ((p1 int)) (@ - p (the int 1))
           (let ((v2 (array int))) (@ qsort v1 i p1)
             (let ((p2 int)) (@ + p (the int 1))
               (@ qsort v2 p2 j)))))))
    (let ((n1 int)) (@ - n (the int 1))
      (@ qsort v (the int 0) n1))))

Figure 3: CL-IR S-expression corresponding to the quicksort IR

In fact, writing code for the IR is essentially writing Lisp code, which is facilitated by user-friendly editing environments such as Emacs.

We introduce below the concrete syntax of the IR representation by using S-expressions. An S-expression is either an atom, also called a symbol, or a list of S-expressions enclosed in parentheses. Formally:

⟨sexp⟩ ::= symbol | '(' ⟨sexp⟩+ ')'

Applications are always written in prefix form. The representation of an AST consists of an untyped, arbitrarily nested list. In what follows we will call CL-IR (Common Lisp IR) the textual representation of the IR. Fig. 3 shows the CL-IR corresponding to the quicksort algorithm of Fig. 2b. Some remarks about the notation: the reserved symbol define is used to define the top-level function quicksort; preconditions and postconditions are written (declare (assertion (precd ...) (postcd ...))); the symbol @ is used to denote function applications, always in prefix notation. Constructor applications are not needed in this example, but the application of a constructor C to arguments a1,...,an will be expressed as (@@ C a1 ... an).
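To illustrate why parsing and pretty-printing S-expressions are almost trivial, here is a minimal Python sketch of a reader and printer for the grammar above. It is an illustration for tools written outside Lisp, not the platform's code: the function names are assumptions, atoms are kept as plain strings, and, unlike the grammar, the sketch also accepts empty lists and does not handle string literals or comments.

```python
def tokenize(text):
    # Parentheses are the only delimiters; everything else is a symbol.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one S-expression from a token list, consuming the tokens it uses."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing parenthesis
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token

def read_sexp(text):
    """Read exactly one S-expression from a string."""
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise SyntaxError("trailing input after the expression")
    return tree

def show(tree):
    """Print a nested list back as an S-expression."""
    if isinstance(tree, list):
        return "(" + " ".join(show(item) for item in tree) + ")"
    return tree

source = "(let ((p1 int)) (@ - p (the int 1)) (@ qsort v1 i p1))"
tree = read_sexp(source)
print(tree[0], tree[1])
print(show(tree))
```

The resulting AST is exactly the untyped, arbitrarily nested list described above, so a round trip through `read_sexp` and `show` reproduces the input up to whitespace.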

4 Processing the Intermediate Representation

The idea of representing programs as symbolic expressions comes from the Lisp language family, one of the first families of high-level programming languages [5]. In this section, we explain how to take advantage of the Lisp features for processing S-expressions in order to generate different compiled code for different ways of processing the same IR-file. We illustrate the scheme with two instantiations of the method: one for executing the IR-file, and another one for generating its formal verification conditions.

[Diagram, inside the Common Lisp environment and runtime: text file.clir → (reading / parsing) → AST → (macroexpansion + compilation) → Object code → (execution) → Execution result]

Figure 4: The different evaluation steps that happen in Common Lisp

4.1 Why to use Lisp to process the IR

Lisp was the first language to use symbolic expressions for representing code. Analogously to Lisp, the S-expressions of the CAVI-ART CL-IR are program representations which can be evaluated according to a given semantics. We found that we could implement different semantics for our IR in a Lisp environment, in such a way that we could generate different programs that lead to different results when executed. In this way, instead of compiling the IR with a fixed goal, we explored the route of giving different operational semantics when compiling the IR programs, a route that leads to the different results we want.

After considering some dialects of the Lisp family as candidates for our implementation, we chose Common Lisp, mostly due to its package system, its non-hygienic macro system, and its editing facilities supported by advanced editors such as Emacs. The Common Lisp Standard was last modified by ANSI in 1994 [9], and thanks to the standardisation effort, it has a very precise semantics.

Every Common Lisp implementation has an interactive prompt, called the REPL (Read-Eval-Print Loop), which is similar to the prompt of most interpreted languages. However, files are compiled and loaded inside this prompt, that is, there is no separate utility outside the runtime to compile Common Lisp programs. In other words, the compiler is embedded in the runtime, and thus it can be invoked at any time. As a consequence, the compiler has access to the runtime features of the language.

When a file is requested from the REPL, it is first parsed and then loaded into the runtime environment, then its definitions come into effect, then it is compiled, and then it is executed. Since all this happens as a single process, it is important to distinguish, from the point of view of the Common Lisp implementation, three different phases: read-time, compile-time (or load-time, if the file was compiled and just needs to be loaded into the environment), and run-time. A graphical explanation of the different phases can be seen in Fig. 4.

As a fundamentally interactive environment, the Common Lisp environment can be altered at run-time by providing definitions to the REPL. However, it is also possible to alter the semantics of the compile-time behaviour, by rewriting the source expressions into other expressions after they have been read, but before they are compiled. Common Lisp provides this possibility during the compilation step by supporting macros. As the language is homoiconic (i.e. code and data share the same S-expression representation), macros in Common Lisp are not lexical text substitutions (as C preprocessor macros are), but syntactic ones. A macro in Common Lisp is, essentially, a compile-time function which receives the uncompiled inner S-expressions of the form where it was invoked, and returns the resulting expression, which will finally be compiled. This process, called macroexpansion, also happens inside the Common Lisp runtime. Thus, any previously defined function will be available during the macro execution. This means that Lisp macros can execute arbitrary code in order to generate their result, including calls to other functions or other macros.

After a file is compiled, it is evaluated. The evaluation of a Common Lisp function may produce side effects, and also return one or more values. Due to the homoiconicity of the language², these compilation and evaluation facilities are easily accessed at runtime. The macroexpansion of a built-in macro is shown in Fig. 5a, and full compilation and evaluation is shown in Fig. 5b. According to the Common Lisp specification, eval performs both steps in a compiler-based implementation³, so in fact eval calls macroexpand before actually running the code.

(a) Macroexpansion

> (macroexpand-1 '(when var some-code))
(IF var (PROGN some-code) NIL)

(b) Compilation and evaluation

> (eval '(when t "Hello, world"))
"Hello, world"

Figure 5: Sample macroexpansion, compilation and evaluation in Common Lisp

All the symbols used in a Common Lisp program must belong to a package, which is the feature Common Lisp has for dividing units into namespaces to avoid name clashes. Thus, although there exists a standard Common Lisp length function, the user can define a custom length function in another package, provided the custom package is instructed not to import the homonymous symbol from the common-lisp package. By defining symbols with the same name, but holding the definitions in different isolated packages, we will give our CL-IR different semantics in the following two sections, depending on the current package into which the IR-file is loaded.
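The mechanism can be mimicked outside Lisp to make it concrete. The following Python sketch treats S-expressions as nested lists and macros as ordinary functions from forms to forms; a table of macros plays the role of a package. Everything in it is an assumption made for illustration (the names `macroexpand_1`, `macroexpand_all`, `EXECUTE`, `TRACE`, and the `call`/`quote` target forms), and, unlike a real Common Lisp code walker, it expands blindly inside every sub-form.

```python
def expand_when(form):
    # (when test body...) -> (if test (progn body...) nil)
    _, test, *body = form
    return ["if", test, ["progn", *body], "nil"]

MACROS = {"when": expand_when}

def macroexpand_1(form, macros=MACROS):
    """Expand the head of the form once; return the same object if no macro applies."""
    if isinstance(form, list) and form and isinstance(form[0], str) and form[0] in macros:
        return macros[form[0]](form)
    return form

def macroexpand_all(form, macros=MACROS):
    """Expand at the head until no macro applies, then recurse into sub-forms."""
    while True:
        expanded = macroexpand_1(form, macros)
        if expanded is form:
            break
        form = expanded
    if isinstance(form, list):
        return [macroexpand_all(sub, macros) for sub in form]
    return form

# Two "packages": the same symbol @ receives a different meaning in each table.
EXECUTE = {**MACROS, "@": lambda form: form[1:]}
TRACE = {**MACROS, "@": lambda form: ["call", ["quote", form[1]], *form[2:]]}

source = ["when", "b", ["@", "+", ["@", "length", "v"], "1"]]
print(macroexpand_1(["when", "var", "some-code"]))
print(macroexpand_all(source, EXECUTE))
print(macroexpand_all(source, TRACE))
```

The same source form yields two different target programs depending only on which table is active, which is the effect obtained in Common Lisp by loading the IR-file into one package or another.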

4.2 Executing the IR

Our IR is an S-expression, so we can use the macroexpansion and evaluation facilities that come with any Common Lisp implementation to convert our definitions into valid Common Lisp programs as part of the compilation process. We use a custom package to define our CL-IR syntactic symbols as Common Lisp macros, thus giving the CL-IR an operational semantics inside our host language without clashing with any of the built-in Common Lisp names. For example, our toplevel function definitions are introduced by a toplevel list whose first symbol is define. This is transformed into a defun declaration in the host language. An example of a small program expansion can be seen in Fig. 7. The executable semantics of our macros is summarized in Table 1.

Our IR-files are read and loaded into an environment where our custom package is active, so that the host language's symbols are not in scope and our names cannot clash with those of Common Lisp (e.g., both languages have let or the, with different meanings). Thus, by leveraging the Common Lisp read and eval constructs, we are able to read our IR source file, parse it into an AST, expand the IR AST into Common Lisp code, compile it, and then load the resulting code into the environment, so that it can be called at the REPL by the user with appropriate arguments. This process mimics the one given in Fig. 4, and is shown in Fig. 6, which can be seen as an instantiation of that prior figure.

For the expert reader, the macro definition for define is shown in Fig. 8. We call macroexpand-1 in the body. This is not strictly necessary, as the compiler would walk through the code and expand the remaining parts for us, but by expanding the macros with our own macroexpansion we can generate the whole final representation in just one step, without having to walk the AST, because we know where the possible macros will occur.

2 In Common Lisp, code can be represented as literal lists, and lists can be evaluated as code by using the eval function.
3 Although code in Common Lisp can be interpreted as well as compiled, we deliberately omitted interpretation from this explanation because, although the same idea holds, it would unnecessarily complicate the rules of eval and the account of when macroexpansion and evaluation happen.

Figure 6: The evaluation steps in Common Lisp to evaluate the CAVI-ART IR

> (macroexpand-1
    '(define factorial ((n int)) ((result int))
       (declare
         (assertion (precd (@ >= n 0))
                    (postcd (@ = result (@ fact n)))))
       (case‡ n
         ((the int 0) (the int 1))
         (default
           (let‡ ((n1 int)) (@ - n (the int 1))
             (let‡ ((f1 int)) (@ factorial n1)
               (@ * n f1)))))))

(defun factorial (n)
  (declare (type int n))
  (declare
    (assertion (precd (@ >= n 0))
               (postcd (@ = result (@ fact n)))))
  (the int
    (case n
      (0 1)
      (t (let ((n1 (the int (- n 1))))
           (let ((f1 (the int (factorial n1))))
             (* n f1)))))))

‡This symbol is our case/let operator, not Common Lisp's.

Figure 7: Macroexpansion of define
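For readers less familiar with Lisp, the operational reading of the CL-IR constructs can also be sketched as a small direct interpreter in Python. This deliberately swaps the technique: the paper compiles the IR through macros, whereas the sketch below walks the nested-list AST at run time. It covers only define, @, the, single-function let and case on literals (no letfun, constructors, or assertion checking); the names `evaluate`, `define` and `PRIMITIVES` are assumptions made for the illustration.

```python
import operator

# Primitive operators assumed for the sketch.
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul,
              "<": operator.lt, ">=": operator.ge, "=": operator.eq}

def evaluate(form, env, functions):
    if isinstance(form, str):                       # variable usage
        return env[form]
    head = form[0]
    if head == "the":                               # (the type literal)
        _, type_name, literal = form
        if type_name == "int":
            return int(literal)
        if type_name == "bool":
            return literal == "true"
        raise ValueError(f"unsupported literal type: {type_name}")
    if head == "@":                                 # (@ f a1 ... an)
        name = form[1]
        args = [evaluate(arg, env, functions) for arg in form[2:]]
        if name in functions:
            params, body = functions[name]
            return evaluate(body, dict(zip(params, args)), functions)
        return PRIMITIVES[name](*args)
    if head == "let":                               # (let ((x type) ...) bound body)
        _, bindings, bound, body = form
        value = evaluate(bound, env, functions)
        names = [binding[0] for binding in bindings]
        values = value if len(names) > 1 else [value]   # tuples bind several names
        return evaluate(body, {**env, **dict(zip(names, values))}, functions)
    if head == "case":                              # (case a (pattern e) ... (default e))
        scrutinee = evaluate(form[1], env, functions)
        for pattern, branch in form[2:]:
            if pattern == "default" or evaluate(pattern, env, functions) == scrutinee:
                return evaluate(branch, env, functions)
        raise ValueError("no matching case branch")
    raise ValueError(f"unsupported form: {head}")

def define(form, functions):
    # (define name ((x type) ...) ((result type) ...) (declare ...) body)
    _, name, params, _results, _declaration, body = form
    functions[name] = ([param[0] for param in params], body)

FACTORIAL = ["define", "factorial", [["n", "int"]], [["result", "int"]],
    ["declare", ["assertion", ["precd", ["@", ">=", "n", "0"]],
                 ["postcd", ["@", "=", "result", ["@", "fact", "n"]]]]],
    ["case", "n",
        [["the", "int", "0"], ["the", "int", "1"]],
        ["default",
            ["let", [["n1", "int"]], ["@", "-", "n", ["the", "int", "1"]],
                ["let", [["f1", "int"]], ["@", "factorial", "n1"],
                    ["@", "*", "n", "f1"]]]]]]

functions = {}
define(FACTORIAL, functions)
print(evaluate(["@", "factorial", ["the", "int", "5"]], {}, functions))
```

The interpreter needs an explicit case analysis on every construct, which is precisely the code-walking that the macro-based scheme obtains from the host compiler.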

(defmacro ir.rt.core:define (function-name typed-lambda-list result-lambda-list
                             declaration &body full-body)
  (let ((function-lambda-list (mapcar #'car typed-lambda-list)))
    `(defun ,function-name ,function-lambda-list
       (declare ,@(lambda-list-type-decls typed-lambda-list))
       ,declaration
       ,(enclose-in-typed-return-type
          result-lambda-list
          (macroexpand-1 (car full-body))))))

Figure 8: Definition of the define macro in terms of defun

IR construct     Lisp construct            Explanation
define           defun                     defines a toplevel function in the host environment
letfun           labels                    defines mutually recursive functions in the lexical scope
case             case/destructuring-bind   our case construction is a superset of Common Lisp's, so we may need to destructure tuples and constructor applications
let              let/destructuring-bind    our let is a one-binding let which may also destructure tuples
the              the                       we use this construct from CL, but only for typing constants
@                function application      CL can apply functions with (fname args): we just drop the @
variable usage   variable usage            our variable usages are the same as Common Lisp's

Table 1: Executable operational semantics of the IR

4.3 Collecting the VCs from an IR-file

The main purpose of our internal representation is to be verifiable. The usual technique for generating the verification conditions (VCs) of a program is to code-walk the source, or its AST, generating the needed conditions as a byproduct of the navigation. As our program is represented by an S-expression, we can instead use the parsing features of our Lisp host. By also using Common Lisp macros, we can concentrate on a more declarative algorithm for constructing the VCGEN. A good side effect of using macros for expanding the program into a VCGEN is that we no longer need to explicitly program the code-walking. We get it for free instead, as can be seen in the execution example below.

Our VCGEN will consist of the same macro-expanded grammar used in Sec. 4.2, but with a different operational semantics. The new macros will generate a program which, at runtime, will compute the VCs by a process that is explained later in this section. In this way, we build both implementations under the same scheme. The only difference for the host implementation is the current package within which our IR file is read. If read in the executable package, the IR program is transformed into a Common Lisp program which has the same operational semantics as the IR. If read in the verification conditions package, the IR program is transformed into a specialised Common Lisp program which, when executed, outputs the VCs, which we will call goals.

The transformation of the CL-IR program into goals may not appear as straightforward as we say, but it is conceptually very similar to the transformation performed on the CL-IR for generating an executable version. This can be confirmed in Fig. 9, by comparing it with Figs. 4 and 6.

Although our semantics for VC generation is implemented as a collection of Common Lisp macros, one for each CL-IR symbol, we consider that the transformation is better understood as a mathematical function, which we will call collect. The VC generation process should end up with a list of logic formulas, or goals, that, if discharged, prove the program correct. Each goal corresponds to a path in the program between two assertions, and consists of an implication, i.e. a list of premises and a consequence, which is the goal proper. The algorithm must account for non-tail recursion, and deal with the possible lack of preconditions and postconditions in some of the functions that were artificially generated by the source-to-IR transformation.

We deal with the lack of preconditions and postconditions by generating dummy symbolic ones: if a function does not have a precondition or postcondition when it is requested by the algorithm, then an uninterpreted symbolic one is generated. This is the case of Q_f1 and R_f1 in our running example. In the examples, we will use Q for preconditions, R for postconditions, and boldface fonts when referring to symbolic predicates. The goal collection is then a two-step process: in the first step, we collect a list of protogoals, possibly containing these uninterpreted dummy symbols, and in the second one we assemble the protogoals by replacing the dummies by their definitions, and then output the resulting goals.
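The fallback to dummy symbolic assertions can be pictured with a minimal Python sketch. It only illustrates the lookup rule stated above and is not the VCGEN's code: the name `assertion_for`, the `declared` table (transcribed by hand from Fig. 3), and the `["symbolic", name]` encoding of an uninterpreted predicate are all assumptions.

```python
# Assertions declared in the running example (qsort has them, f1 does not).
declared = {
    "qsort": {
        "precd": ["and", ["@", "<=", "0", "i", "j"], ["@", "<", "j", ["@", "length", "v"]]],
        "postcd": ["and", ["@", "sorted_sub", "vsort", "i", ["@", "+", "1", "j"]],
                          ["@", "permut_sub", "v", "vsort", "i", ["@", "+", "1", "j"]]],
    },
}

def assertion_for(function_name, kind, declared):
    """Return the declared assertion of a function, or an uninterpreted
    symbolic one (Q_f for preconditions, R_f for postconditions)."""
    formula = declared.get(function_name, {}).get(kind)
    if formula is not None:
        return formula
    prefix = {"precd": "Q", "postcd": "R"}[kind]
    return ["symbolic", f"{prefix}_{function_name}"]

print(assertion_for("f1", "precd", declared))
print(assertion_for("f1", "postcd", declared))
```

Protogoals built with this rule may therefore mention symbols such as `Q_f1` and `R_f1`, which is why the second, assembling step is needed before the goals are output.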

Figure 9: The evaluation steps in Common Lisp to generate verification conditions in the CAVI-ART IR

The collect function requires a Context carrying the functions currently in scope at the AST node being visited, and the current function inside which the expression occurs. A context is thus a pair (scope, currentFunction). The function also needs a Prefix, which contains the assumptions p1 → ··· → pi → holding along the traversed code path, and the remaining expression to be evaluated. The collect function then traverses the AST depth-first and returns a set of protogoals. A protogoal is a VC that must hold along a path starting and ending in an assertion; either or both of these assertions may be symbolic. We then give collect the following type:

collect :: Context → Prefix → Expression → Set Protogoal

Its pseudocode is depicted in Figure 10. There, we make use of several auxiliary functions, whose names should be descriptive enough.

Collecting the protogoals of a letfun consists of collecting those of each defined function and adding them to those of the main expression. The prefix of the main expression is the current one, while the prefix of each defined function starts with the assumption of its respective precondition.

If the remaining expression is an atom, a tuple, or a constructor application, we are in a base case, since the postcondition of the current function has been reached. A protogoal is then built in which the current prefix implies that postcondition, after substituting the actual results for the formal ones.

In a let, we distinguish two cases:

• The auxiliary expression is a function application with a precondition and a postcondition. In that case, one protogoal ends in the precondition, and a second one continues with the main expression, after adding the postcondition to the prefix.

• The auxiliary expression is an atom, a tuple, a constructor application, or a primitive operator application (see Fig. 1). Then, the equality is added to the current prefix, and the protogoal continues with the main expression.

The protogoal collection of a case consists, in essence, of the protogoal collection of all its branches. The prefix of each branch incorporates its respective branching condition.

Finally, when the remaining expression is a function application, the treatment is very similar to the one explained for the first case of a let. The difference here is that there is no main expression; instead, the postcondition of the current function has been reached.

After collect has been run, the resulting protogoal set needs to be assembled. We can distinguish three different protogoal cases:

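The following definitions are always in scope:

\[
\begin{aligned}
R &= \mathit{currentPostcondition}(c), \text{ or } \mathbf{R}_f \text{ for the current function } f\\
[\mathit{result}_i] &= \mathit{currentResultNames}(c) && \text{when the result is expected to be a sequence}\\
[\mathit{result}] &= \mathit{currentResultName}(c) && \text{when it is expected to be only one value}
\end{aligned}
\]

\[
\mathit{collect}(c, p, (\mathtt{letfun}\ f_i, x_{ij}, y_{ij}, Q_i, R_i = e_i\ \mathtt{in}\ e)) = \mathit{collect}(c', p, e) \cup \bigcup_i \mathit{collect}(c_i, [Q_i], e_i)
\]
\[
\begin{aligned}
\text{where}\quad (\mathit{scope}, f) &= c\\
\mathit{scope}' &= \mathit{scope} \cup \bigcup_i \{(f_i, x_{ij}, y_{ij}, Q'_i, R'_i)\}\\
c' &= (\mathit{scope}', f) \qquad c_i = (\mathit{scope}', f_i)\\
Q'_i &= Q_i \text{ if } Q_i \text{ is defined; } \mathbf{Q}_{f_i} \text{ otherwise}\\
R'_i &= R_i \text{ if } R_i \text{ is defined; } \mathbf{R}_{f_i} \text{ otherwise}
\end{aligned}
\]

\[
\mathit{collect}(c, p, (a :: \mathit{atom})) = \{p \to R[a/\mathit{result}]\}
\]

\[
\mathit{collect}(c, p, (\mathtt{tuple}(a_i))) = \{p \to R[a_i/\mathit{result}_i]\}
\]

\[
\mathit{collect}(c, p, (@@\ C\ a_i)) = \{p \to R[C\ a_i/\mathit{result}]\}
\]

\[
\mathit{collect}(c, p, (\mathtt{let}\ x_i = (@\ f\ a_i)\ \mathtt{in}\ e)) = g_0 \cup \mathit{collect}(c, p \to r, e)
\]
\[
\begin{aligned}
\text{where}\quad q &= \mathit{precond}(f, c)[a_i/\mathit{args}(f, c)] \quad (f \text{ must have a precondition in a let rhs})\\
g_0 &= \begin{cases} \{p \to q\} & \text{if } q \neq \mathit{true}\\ \emptyset & \text{if } q = \mathit{true} \end{cases}\\
r &= \mathit{postcond}(f, c)[a_i/\mathit{args}(f, c),\ x_i/\mathit{result}(f, c)]
\end{aligned}
\]

\[
\mathit{collect}(c, p, (\mathtt{let}\ x_i = e_1\ \mathtt{in}\ e_2)) = \mathit{collect}(c, p \to (x_i = e_1), e_2)
\]

\[
\mathit{collect}(c, p, (\mathtt{case}\ a\ \mathtt{of}\ p_i\ e_i,\ \mathtt{default}\ e_{\mathit{def}})) = \mathit{collect}(c, p_{\mathit{def}}, e_{\mathit{def}}) \cup \bigcup_i \mathit{collect}(c, p_i, e_i)
\]
\[
\text{where}\quad p_i = p \to (a = \mathit{pat}_i); \qquad p_{\mathit{def}} = p \to \Big(\bigwedge_i a \neq \mathit{pat}_i\Big)
\]

\[
\mathit{collect}(c, p, (@\ f\ a_i)) = \{p \to q\} \cup \bigcup_i \{p \to q \to r[a_i/x_i,\ \mathit{result}_i/\mathit{rf}_i] \to R\}
\]
\[
\text{where}\quad q = \mathit{precond}(f, c)[a_i/x_i]; \quad x_i = \mathit{args}(f, c); \quad r = \mathit{postcond}(f, c); \quad [\mathit{rf}_i] = \mathit{results}(f, c)
\]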

Figure 10: collect function for VC generation
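To make the rules of Figure 10 concrete, here is a small Python sketch of collect for atoms, let, case and tail function applications. It is an illustration under simplifying assumptions, not the macro-based implementation of the paper: formulas are plain strings, substitution is a whole-word textual replacement, each function has a single result name, letfun is omitted, and symbolic assertions are rendered as the opaque names `Q_f` and `R_f`. The `Fun` record and the example scope are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Fun(NamedTuple):
    params: tuple          # formal argument names
    result: str            # formal result name
    pre: Optional[str]     # None means "no precondition given"
    post: Optional[str]    # None means "no postcondition given"


def subst(formula, mapping):
    """Simultaneous whole-word substitution of names inside a formula string."""
    if not mapping:
        return formula
    pattern = r"\b(" + "|".join(re.escape(k) for k in mapping) + r")\b"
    return re.sub(pattern, lambda m: mapping[m.group(1)], formula)


def precond(f, scope):
    return scope[f].pre if scope[f].pre is not None else f"Q_{f}"


def postcond(f, scope):
    return scope[f].post if scope[f].post is not None else f"R_{f}"


def collect(ctx, prefix, expr):
    """Return a set of protogoals (premises, conclusion); premises is a tuple."""
    scope, current = ctx
    R, res = postcond(current, scope), scope[current].result
    if isinstance(expr, str):                       # atom: base case
        return {(prefix, subst(R, {res: expr}))}
    tag = expr[0]
    if tag == "let":
        _, x, rhs, body = expr
        if isinstance(rhs, tuple) and rhs[0] == "app":
            _, f, args = rhs
            actuals = dict(zip(scope[f].params, args))
            q = subst(precond(f, scope), actuals)
            r = subst(postcond(f, scope), {**actuals, scope[f].result: x})
            g0 = set() if q == "true" else {(prefix, q)}
            return g0 | collect(ctx, prefix + (r,), body)
        return collect(ctx, prefix + (f"{x} = {rhs}",), body)
    if tag == "case":
        _, a, branches, default = expr
        negated = " and ".join(f"{a} != {pat}" for pat, _ in branches)
        goals = collect(ctx, prefix + (negated,), default)
        for pat, branch in branches:
            goals |= collect(ctx, prefix + (f"{a} = {pat}",), branch)
        return goals
    if tag == "app":                                # call in tail position
        _, f, args = expr
        actuals = dict(zip(scope[f].params, args))
        q = subst(precond(f, scope), actuals)
        r = subst(postcond(f, scope), {**actuals, scope[f].result: res})
        return {(prefix, q), (prefix + (q, r), R)}
    raise ValueError(f"unsupported construct: {tag}")


# Hypothetical scope: `inc` and `g` carry real assertions, `f1` has none.
scope = {
    "inc": Fun(("k",), "r", "k >= 0", "r = k + 1"),
    "g": Fun(("n",), "res", "n >= 0", "res > 0"),
    "f1": Fun(("z",), "r", None, None),
}
body = ("let", "m", ("app", "inc", ("n",)), "m")
goals = collect((scope, "g"), ("n >= 0",), body)
for premises, conclusion in sorted(goals):
    print(" -> ".join(premises + (conclusion,)))
```

For the body `let m = (@ inc n) in m` of `g`, the sketch yields one protogoal ending in the precondition of `inc` and one ending in the postcondition of `g`, as the first let rule prescribes. A tail call to `f1` produces protogoals mentioning the symbolic names `Q_f1` and `R_f1`.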

1. Proper goals. These are formulas with no placeholders, i.e. they start and end with real assertions, and have no symbolic assertions anywhere. These can be directly output without further processing.

2. Protogoals starting or ending, or both, in a symbolic assertion, i.e. they have the form $\mathbf{Q}_f \to \varphi \to A$, or $A \to \varphi \to \mathbf{Q}_f$, or $\mathbf{Q}_f \to \varphi \to \mathbf{Q}_g$, where A is a real assertion.

3. Protogoals having ellipsis. These are protogoals with missing parts, i.e. those containing a symbolic postcondition at some point, except at the beginning or at the end of the logic formula.
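The three cases above can be told apart mechanically. The Python sketch below is only an illustration: it assumes that a protogoal is a list of assertion and formula strings with at least two elements, that the sets of symbolic names are known, and that an interior symbolic postcondition takes priority over symbolic end points. The labels returned by the hypothetical function `classify` are not terminology from the paper.

```python
def classify(protogoal, sym_pre, sym_post):
    """Classify a protogoal given the sets of symbolic pre/postcondition names."""
    first, *middle, last = protogoal
    if any(part in sym_post for part in middle):
        return "ellipsis"          # case 3: symbolic postcondition in the interior
    if {first, last} & (sym_pre | sym_post):
        return "symbolic ends"     # case 2: starts and/or ends in a symbolic assertion
    return "proper"                # case 1: can be output directly


sym_pre, sym_post = {"Q_f1"}, {"R_f1"}
print(classify(["Q_qsort", "b = (i <= j)", "R_qsort"], sym_pre, sym_post))
print(classify(["Q_f1", "p1 = p - 1", "Q_qsort"], sym_pre, sym_post))
print(classify(["Q_qsort", "R_f1", "R_qsort"], sym_pre, sym_post))
```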

The assemble function receives the protogoal set and two sets, of symbolic preconditions and symbolic postconditions respectively, and produces the set of assembled goals. Its type is the following:

assemble :: Set Protogoal → Set SymPre → Set SymPost → Set Goal    (1)

(a) Executable

    > (load "runtime/init.lisp")
    > (in-package :ir.rt.user)
    #
    > (load-eval-file "test/qsort.clir")
    (QUICKSORT::QUICKSORT)
    > (quicksort::quicksort #(5 98 8 3 2933))
    #(3 5 8 98 2933)

(b) VCGEN

    > (load "vc-gen/init.lisp")
    > (in-package :ir.vc.user)
    #
    > (load-eval-file "test/qsort.clir")
    (QUICKSORT::QUICKSORT)
    > (quicksort::quicksort)
    #

Figure 11: Evaluation of the IR with two different semantics

The algorithm of assemble is not shown, but it can be succinctly explained as a graph traversal problem. The real assertions at the beginning of some protogoals are the source vertices of the graph, those at the end are the sink vertices, and the symbolic assertions are intermediate vertices; the protogoals themselves are the graph edges. The protogoals having ellipsis are dealt with in a special way: each symbolic postcondition $\mathbf{R}_f$ for a function f is replaced by the protogoal starting at $\mathbf{Q}_f$ and ending at $\mathbf{R}_f$. After building the graph in this way, the goals are all the paths from a source to a sink.
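Since the algorithm itself is not shown, the following Python sketch is only one possible reading of the graph description, not the authors' implementation. It assumes that each protogoal is an edge `(start, middle, end)`, where `middle` is a tuple of intermediate premises, that symbolic assertions are given as a set of names, and that ellipsis protogoals have already been expanded. A visited set guards against cycles; the function name `assemble` is reused for readability, but its signature here is hypothetical.

```python
def assemble(protogoals, symbolic):
    """Enumerate all paths from a real start assertion to a real end assertion."""
    outgoing = {}
    for start, middle, end in protogoals:
        outgoing.setdefault(start, []).append((middle, end))

    goals = []

    def extend(vertex, premises, visited):
        for middle, end in outgoing.get(vertex, []):
            path = premises + middle
            if end not in symbolic:
                goals.append((path, end))            # reached a sink: a goal
            elif end not in visited:
                extend(end, path, visited | {end})   # pass through a symbolic vertex

    for start in list(outgoing):
        if start not in symbolic:                    # real assertions are sources
            extend(start, (start,), frozenset())
    return goals


protogoals = [
    ("A", ("x = 1",), "Q_f"),      # real assertion to symbolic precondition
    ("Q_f", ("y = x",), "B"),      # symbolic precondition to real assertion
    ("C", (), "D"),                # proper goal, no symbolic vertex involved
]
for premises, conclusion in assemble(protogoals, {"Q_f"}):
    print(" -> ".join(premises + (conclusion,)))
```

In this reading, the two protogoals that meet at `Q_f` are joined into one goal in which the symbolic assertion no longer appears, while the proper goal is output unchanged.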

5 An Example

The quickSort algorithm is a good example for testing our transformation because it presents most of the problems explained in the previous section. Namely, quickSort has a case construct for determining the base case, and two mutually recursive functions, one of them called in a non-tail position and so endowed with a real precondition and postcondition. The other one, f1, has symbolic ones. The algorithm also uses an axiomatized external function, partition, whose code is not included in the verification unit.

Our IR executable semantics gives rise to the dialog reproduced in Fig. 11a: the quicksort CL-IR is loaded, translated into Lisp code, and compiled; then our top-level function is called within the Common Lisp environment, passing it an appropriate array argument. On the other hand, our VCGEN semantics allows the same CL-IR program to generate a goal set, i.e. the verification conditions, as can be seen in Fig. 11b. From the user's perspective, the only difference is a change in the current package, after which the same code performs a different operation.

In Fig. 12 we show some of the protogoals produced by the collect phase of VCGEN. The goal set produced by the VCGEN semantics is written in the syntax of the IR assertions, so translating it to Why3's syntax requires little more than printing the operators in infix notation and taking care of operator precedence. An example of the goals produced by the VCGEN semantics can be seen in Fig. 13.
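The step from prefix assertions to infix output can be sketched as a small precedence-aware printer. The Python code below is an illustration only: nested tuples stand in for S-expressions, only binary operators are handled, and the operator table `PREC` and its precedence levels are assumptions rather than the authors' table or the exact precedences of Why3. Unlike the fully parenthesised output of Fig. 13, this sketch inserts parentheses only where precedence requires them.

```python
# Assumed precedence levels (higher binds tighter); not taken from the paper.
PREC = {"->": 1, "/\\": 2, "=": 3, "<=": 3, "<": 3, "+": 4, "-": 4, "*": 5}
RIGHT_ASSOC = {"->"}


def infix(term, min_prec=0):
    """Print a prefix term (op, left, right) in infix notation.

    `min_prec` is the lowest precedence allowed to appear without parentheses.
    """
    if not isinstance(term, tuple):
        return str(term)
    op, left, right = term
    prec = PREC[op]
    if op in RIGHT_ASSOC:
        left_text, right_text = infix(left, prec + 1), infix(right, prec)
    else:
        left_text, right_text = infix(left, prec), infix(right, prec + 1)
    text = f"{left_text} {op} {right_text}"
    return f"({text})" if prec < min_prec else text


assertion = ("->", ("<=", 0, "i"),
             ("/\\", ("<", "i", "n"), ("=", "p1", ("-", "p", 1))))
print(infix(assertion))
print(infix(("*", ("+", 1, "j"), 2)))
```

Treating `->` as right-associative means that a chain of premises prints without parentheses, which matches the shape of the goals shown in Fig. 13.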

6 Conclusions

We have presented the abstract syntax of the CAVI-ART IR (Intermediate Representation) and have chosen a textual representation for it, called CL-IR, as a Common Lisp S-expression. We have shown how the same CL-IR file can be processed in different ways by the Common Lisp environment by just changing the current package in which the CL-IR file is read. This causes the name space to be resolved differently, and consequently a different collection of macros to be expanded. We have illustrated this compilation scheme with two different kinds of processing: executing the CL-IR file, and extracting the verification conditions that the provers must discharge for the program to be correct.

We note that all the manipulations of the CL-IR described in this paper are done automatically. The CL-IR files are generated, transformed, and consumed by the platform. We do not claim that directly using the CL-IR as a programming language has any advantage over conventional languages.

The proposed scheme is general enough to accommodate new kinds of processing. With it, a new kind of processing of the CL-IR file would essentially need a new current package with a different collection of macros. The navigation of the AST associated with the CL-IR file is performed for free by the compilation environment.

\[
\begin{aligned}
Q_{\mathit{quicksort}} &= 0 < \mathit{length}(v) \land n = \mathit{length}(v)\\
R_{\mathit{quicksort}} &= \mathit{sorted}(v) \land \mathit{permut\_all}(v, \mathit{vres})\\
Q_{\mathit{qsort}} &= 0 \le i \le j < \mathit{length}(v)\\
R_{\mathit{qsort}} &= \mathit{sorted\_sub}(\mathit{vsort}, i, j + 1) \land \mathit{permut\_sub}(v, \mathit{vsort}, i, j + 1)
\end{aligned}
\]

\[
\begin{aligned}
\mathbf{Q}_{f1} &= \mathit{true}\\
\mathbf{R}_{f1} &= \forall v :: \mathit{array\ int},\ i :: \mathit{int},\ j :: \mathit{int}.\ \mathbf{Q}_{f1} \to\\
&\quad \forall \mathit{v1} :: \mathit{array\ int},\ p :: \mathit{int}.\ Q_{\mathit{partition}} \to R_{\mathit{partition}}[\mathit{result}_i/(\mathit{v1}, p)] \to\\
&\quad \forall \mathit{p1} :: \mathit{int}.\ \mathit{p1} = p - 1 \to\\
&\quad \forall \mathit{v2} :: \mathit{array\ int}.\ 0 \le i \le \mathit{p1} + 1 \le \mathit{length}(\mathit{v1}) \to\\
&\quad \mathit{sorted\_sub}(\mathit{v2}, i, \mathit{p1} + 1) \land \mathit{permut\_sub}(\mathit{v1}, \mathit{v2}, i, \mathit{p1} + 1) \to\\
&\quad \forall \mathit{p2} :: \mathit{int}.\ \mathit{p2} = p + 1 \to 0 \le \mathit{p2} \le j + 1 \le \mathit{length}(\mathit{v2}) \to\\
&\quad \forall \mathit{result} :: \mathit{array\ int}.\ \mathit{sorted\_sub}(\mathit{result}, \mathit{p2}, j + 1) \land \mathit{permut\_sub}(\mathit{v2}, \mathit{result}, \mathit{p2}, j + 1)
\end{aligned}
\]

Figure 12: Extracted pre/postconditions for quickSort

    goal QSORT__4:
      (**) forall v:(array int),i:int,j:int.
      (*QSORT_*) ((0) <= (i) <= (((1) + (j))) <= ((length v))) ->
      (**) forall b:bool.
      (* inlet*) (b) = (((i) <= (j))) ->
      (* case_0*) ( true ) = (b) ->
      (**) forall v:(array int),i:int,j:int.
      (* letpre*) ((0) <= (i) <= (j) /\ (j) < ((length v))) ->
      (**) forall v:(array int),i:int,j:int.
      (**) forall v1:(array int),p:int.
      (* inlet*) ((0) <= (i) <= (j) /\ (j) < ((length v))) ->
      (**) (permut_sub v v1 i ((1) + (j)) /\ (i) <= (p) <= (j) /\ forall j1:int. (((i) <= (j1) /\ (j1) < (p))) -> (((v1[j1])) <= ((v1[p]))) /\ forall j2:int. (((p) < (j2) /\ (j2) <= (j))) -> (((v1[j2])) >= ((v1[p])))) ->
      (**) forall p1:int.
      (* inlet*) (p1) = (((p) - (1))) ->
      (* letpre*) ((0) <= (i) <= (((1) + (p1))) <= ((length v1))) ->
      (**) forall v:(array int),i:int,j:int.
      (**) forall v1:(array int),p:int.
      (* inlet*) ((0) <= (i) <= (j) /\ (j) < ((length v))) ->
      (**) (permut_sub v v1 i ((1) + (j)) /\ (i) <= (p) <= (j) /\ forall j1:int. (((i) <= (j1) /\ (j1) < (p))) -> (((v1[j1])) <= ((v1[p]))) /\ forall j2:int. (((p) < (j2) /\ (j2) <= (j))) -> (((v1[j2])) >= ((v1[p])))) ->
      (**) forall p1:int.
      (* inlet*) (p1) = (((p) - (1))) ->
      (**) forall v2:(array int).
      (* inlet*) ((0) <= (i) <= (((1) + (p1))) <= ((length v1))) ->
      (**) (sorted_sub v2 i ((1) + (p1)) /\ permut_sub v1 v2 i ((1) + (p1))) ->
      (**) forall p2:int.
      (* inlet*) (p2) = (((p) + (1))) ->
      (* pre*) ((0) <= (p2) <= (((1) + (j))) <= ((length v2)))

Figure 13: One of the generated goals for quickSort

References

[1] Andrew W. Appel (1998): Modern Compiler Implementation in ML. Cambridge University Press.

[2] Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanovic, Tim King, Andrew Reynolds & Cesare Tinelli (2011): CVC4. In Ganesh Gopalakrishnan & Shaz Qadeer, editors: Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings, Lecture Notes in Computer Science 6806, Springer, pp. 171–177.

[3] Jean-Christophe Filliâtre & Andrei Paskevich (2013): Why3 - Where Programs Meet Provers. In Matthias Felleisen & Philippa Gardner, editors: ESOP, Lecture Notes in Computer Science 7792, Springer, pp. 125–128.

[4] John Hughes (2009): Software Testing with QuickCheck. In Zoltán Horváth, Rinus Plasmeijer & Viktória Zsók, editors: Central European Functional Programming School - Third Summer School, CEFP 2009, Budapest, Hungary, May 21-23, 2009 and Komárno, Slovakia, May 25-30, 2009, Revised Selected Lectures, Lecture Notes in Computer Science 6299, Springer, pp. 183–223.

[5] John McCarthy (1960): Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I. Commun. ACM 3(4), pp. 184–195.

[6] Manuel Montenegro, Susana Nieva, Ricardo Peña & Clara Segura (2016): Extending Liquid Types to Arrays. In: Submitted to PROLE 2016, Salamanca, Spain, pp. 1–15.

[7] Manuel Montenegro, Ricardo Peña & Jaime Sánchez-Hernández (2015): A Generic Intermediate Representation for Verification Condition Generation. In Moreno Falaschi, editor: Logic-Based Program Synthesis and Transformation - 25th International Symposium, LOPSTR 2015, Siena, Italy, July 13-15, 2015. Revised Selected Papers, Lecture Notes in Computer Science 9527, Springer, pp. 227–243.

[8] Leonardo Mendonça de Moura & Nikolaj Bjørner (2008): Z3: An Efficient SMT Solver. In C. R. Ramakrishnan & Jakob Rehof, editors: Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, Lecture Notes in Computer Science 4963, Springer, pp. 337–340.

[9] Kent Pitman et al. (1994): ANSI INCITS 226-1994 (R2008) (formerly X3.226-1994 (R1999)), r2008 edition. ANSI.

[10] Patrick Maxim Rondon, Ming Kawaguchi & Ranjit Jhala (2008): Liquid types. In Rajiv Gupta & Saman P. Amarasinghe, editors: PLDI, ACM, pp. 159–169.