Support for Algebraic Data Types in Viper

Bachelor of Science Thesis Project Description

David Rohr

Supervised by Arshavir Ter-Gabrielyan and Prof. Dr. Peter Müller

Chair of Programming Methodology, Department of Computer Science, ETH Zürich

September 30, 2016

1 Introduction

Viper [1] is a suite of verification tools developed at ETH Zurich. A central part of Viper is its intermediate verification language, which can be used to describe, for instance, object-oriented programs and their properties (in the form of preconditions, postconditions and assertions) as well as complex data structures like trees and other types of graphs. These data structures can be specified in Viper using recursive predicates [1, pp. 9f.], quantified permissions [1, pp. 12f.] or, potentially, algebraic data types (sometimes referred to as ADTs, but not to be confused with abstract data types), which are commonly used in several functional and multi-paradigm programming languages like Haskell [2], F# [3], Scala [4] or Rust [5]. Although it is not possible to directly reference or dereference their instances (because they are value types), their simplicity and the fact that operations on algebraic data types do not cause any side effects (in other words, they are pure) make them a useful tool for the specification of data structures.

However, Viper currently does not support the direct (native) declaration of algebraic data types. Instead, they need to be encoded in Viper using custom domains [1, pp. 16f.], which is, depending on the encoded data type, considerably more complicated and error-prone than, for example, a native ADT declaration in Haskell. The main goal of this project is to design and implement a language extension that facilitates the definition and usage of algebraic data types in Viper.

2 Language Features

This section contains an overview of the language features that will (core features) or might (extensions) be added to Viper within the framework of this project.

2.1 Algebraic data types (core)

In the scope of this project, algebraic data types are the main feature added to Viper as part of the developed language extension. According to [2] and [6, p. 454], there are two major categories of algebraic data types, both of which should be supported:

• Product types or record types, whose instances are tuples containing values or references of possibly (but not necessarily) several different types.

• Sum types or variant types, whose instances can hold an instance or a reference of one of typically several different types or options. Enumerated types (enums) can also be seen as a form of sum type; as a side note, sum types are in fact called enums in Rust [5].

Of course, sum types and product types can be combined. For example, the List data type defined in listing 1 is a sum type whose second option (Node) is a product of e and (List e).

Listing 1: Recursive declaration of an ADT (linked list) in Haskell

    data List e = Empty | Node e (List e)
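As a small illustration of how the two categories combine, the following Haskell sketch declares a product type, an enum-like sum type and a function over the List type from listing 1. The names Point, Colour and listLength are hypothetical and were chosen only for this example.

```haskell
-- Product type: every instance holds exactly one Int and one Bool.
data Point = Point Int Bool
  deriving (Show, Eq)

-- Sum type whose options carry no payload (comparable to an enum).
data Colour = Red | Green | Blue
  deriving (Show, Eq)

-- The list type from listing 1: a sum type whose second option
-- is a product of e and (List e).
data List e = Empty | Node e (List e)
  deriving (Show, Eq)

-- Counts the elements of a list by structural recursion.
listLength :: List e -> Int
listLength Empty       = 0
listLength (Node _ xs) = 1 + listLength xs
```

Because listLength is pure and defined by one equation per option of the sum type, it is the kind of operation that the planned extension is meant to make easy to express.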

2.2 Pattern matching (core)

The concept of pattern matching is closely related to algebraic data types as it provides an easy way to decompose their instances. As a result, pattern matching mechanisms are often provided by programming languages that support algebraic data types, and they should also be part of the language extension developed over the course of this project. The language extension should support at least the following kinds of patterns:

• A wildcard pattern that matches all possible objects. Example: div _ 0 = undefined (the wildcard is denoted by _)

• Patterns that match only a specific instance of a certain type. Example: mul 2 x = x + x (2 is a pattern that only matches the integer 2)

• Patterns that bind an instance of a certain type to a free variable of this type. Example: x ++ [] = x (the value of the first argument is bound to the free variable x)

• Patterns for ADTs, lists and other composite data types. Example: listSum (x:xs) = x + listSum xs (the list is decomposed into a head and a tail)

The following example demonstrates how these kinds of patterns can be used in Haskell:

    -- An ADT representing a binary tree:
    data Tree2 e = Leaf | Node (Tree2 e) e (Tree2 e)

    -- Checks whether a tree contains a specific element
    containsElement :: Eq e => Tree2 e -> e -> Bool

    -- Leaf: Pattern that matches only the "Leaf" instance
    -- _: Wildcard pattern
    containsElement Leaf _ = False

    -- (Node t1 val t2): ADT pattern that matches "Node" instances and binds their content to free variables
    -- x: Pattern that binds the second argument (element) to a free variable x
    containsElement (Node t1 val t2) x = val == x || containsElement t1 x || containsElement t2 x

    -- Output: "True"
    main = putStrLn (show (containsElement (Node Leaf 1 (Node (Node Leaf 2 Leaf) 5 Leaf)) 2))

2.2.1 Proof of exhaustiveness (core)

In addition to defining possible patterns, the project should also describe ways to determine whether a set of patterns is exhaustive (i.e., whether every instance of the corresponding type is matched by at least one pattern in the set). This proof of exhaustiveness could either be done during the context-sensitive analysis phase of the verifier extension or be integrated into the generated Viper code.

Listing 2: A set of Haskell patterns that is not exhaustive (B True is missing)

    data IntBool = I Int | B Bool

    toInt :: IntBool -> Int
    toInt (I x) = x
    toInt (B False) = 0

    -- Output: "Non-exhaustive patterns in function toInt"
    main = putStrLn (show (toInt (B True)))

Listing 3: Pseudocode that demonstrates how a Viper assertion could be used to check whether the patterns of a match statement are exhaustive

    if (pattern 1 matches value) {
        ...
    } elseif (pattern 2 matches value) {
        ...
    }
    ...
    else {
        assert false;
    }
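The alternative approach, a check during context-sensitive analysis, can be sketched in Haskell for a deliberately restricted case. The sketch assumes flat patterns only, that is, a pattern is either a wildcard or a constructor whose arguments are all wildcards or variables; nested patterns such as (B False) from listing 2 are not covered. The names Pattern, missingConstructors and isExhaustive are hypothetical.

```haskell
-- A flat pattern: a wildcard or a constructor name whose arguments
-- are all wildcards or variables (an assumption of this sketch).
data Pattern = Wildcard | Constructor String
  deriving (Show, Eq)

-- Returns the constructors that are not covered by any pattern.
-- An empty result means that the set of patterns is exhaustive.
missingConstructors :: [String] -> [Pattern] -> [String]
missingConstructors constructors patterns
  | Wildcard `elem` patterns = []
  | otherwise = [c | c <- constructors, Constructor c `notElem` patterns]

-- True if every constructor of the type is matched by some pattern.
isExhaustive :: [String] -> [Pattern] -> Bool
isExhaustive constructors patterns =
  null (missingConstructors constructors patterns)
```

For a type with the constructors I and B, the pattern set containing only Constructor "I" is reported as non-exhaustive with B missing, which is information that could be used to produce a meaningful error message.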

2.3 Termination analysis (core)

Another major part of the project is focused on the termination of programs that are encoded in Viper. Some of the questions that need to be answered in this part are:

• What are the conditions that need to be fulfilled by a method, function, predicate, statement or expression evaluation to be considered terminating?

• Which language features could be introduced to facilitate the creation of termination proofs?

• How can these new features (e.g., termination metrics) be implemented using basic context sensitive analyses and existing features of the Viper intermediate language?

Some central language elements of the termination analysis are loops and recursive functions, methods and predicates, which can possibly cause a program to not terminate.

2.3.1 Termination metrics (core)

Another aspect of the project considers features that simplify the declaration of loop variants and recursion bounds in order to prove the termination of a loop, function, method or program. For example, the programming language Dafny [7] provides a decreases clause to define termination metrics for loops and recursive methods, which could, in a similar way, also be introduced into Viper. The programming language ATS [8] offers a comparable feature for functions. As in Dafny or ATS, it should also be possible to provide tuples of integer values/expressions as termination metrics. As explained in more detail in [9], the standard well-founded lexicographical ordering on natural numbers can then be used to order these tuples and to determine whether the tuple is decreased in each iteration.

The following two examples demonstrate the usage of the decreases clause in Dafny. In both programs, the variables d1 and d0 represent the most and least significant digit of a two-digit decimal integer. The integer is printed and reduced until both digits reach the value 0.

Listing 4: A simple Dafny program containing a decreases clause

    method Main() {
        var d1 := 4;
        var d0 := 5;
        while (d1 > 0 || d0 > 0)
            decreases d1, d0
        {
            print d1;
            print d0;
            print "\n";
            if (d0 == 0) {
                d0 := 9;
                d1 := d1 - 1;
            } else {
                d0 := d0 - 1;
            }
        }
    }

Listing 5: An implementation of the same algorithm using a recursive method

    method Main() {
        out(4, 5);
    }

    method out(d1: int, d0: int)
        requires d1 >= 0
        requires d0 >= 0
        decreases d1, d0
    {
        if (d1 > 0 || d0 > 0) {
            print d1;
            print d0;
            print "\n";
            if (d0 == 0) {
                out(d1 - 1, 9);
            } else {
                out(d1, d0 - 1);
            }
        }
    }
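To make the lexicographic ordering concrete, the following Haskell sketch replays the loop from listing 4 on pairs (d1, d0) and checks that each iteration yields a strictly smaller pair of natural numbers. It relies on the fact that Haskell compares tuples lexicographically; the names decreasesTo, step and loopStates are hypothetical, and the sketch only tests one concrete run rather than proving termination.

```haskell
-- Checks that `next` is strictly smaller than `current` in the
-- lexicographic ordering and that all components are natural numbers.
decreasesTo :: (Integer, Integer) -> (Integer, Integer) -> Bool
decreasesTo current next =
  all (>= 0) [fst current, snd current, fst next, snd next]
    && next < current

-- One iteration of the loop body from listing 4 (without printing).
step :: (Integer, Integer) -> (Integer, Integer)
step (d1, d0)
  | d0 == 0   = (d1 - 1, 9)
  | otherwise = (d1, d0 - 1)

-- All states in which the loop condition holds, starting from the
-- given state.
loopStates :: (Integer, Integer) -> [(Integer, Integer)]
loopStates = takeWhile (\(d1, d0) -> d1 > 0 || d0 > 0) . iterate step
```

Evaluating all (\s -> decreasesTo s (step s)) (loopStates (4, 5)) checks the metric for every iteration of this particular run. A verifier has to establish the same property for all possible states, which is what the decreases clause asks it to do.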

2.3.2 Using heap footprints as a termination measure (core)

A further goal of this project is to provide the possibility of using the heap footprints of data structures as a termination measure. This feature can be particularly useful if we want to prove the termination of algorithms that need to analyze complex data structures (e.g., graphs) or if, more generally, the data structure's footprint is the only available termination measure.

Listing 6: Using the size of an ADT (list) as a decreases measure in Dafny

    datatype List<T> = Empty | Node(element: T, tail: List<T>)

    method Main() {
        // Output: "123"
        printIntegerList(Node(1, Node(2, Node(3, Empty))));
    }

    method printIntegerList(list: List<int>) {
        var l := list;
        while (l.Node?)
            decreases l
        {
            print l.element;
            l := l.tail;
        }
    }

A problem we face when reasoning about heap data structures in Viper is that the heap is not exposed in Viper, which makes it harder to compare the heap footprints of two data structures.
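For value types such as the list in listing 6, the idea behind the measure can be illustrated with an explicit size function. The Haskell sketch below is only an analogy for the ADT case and does not address the heap footprint problem described above; the names size and measures are hypothetical.

```haskell
data List e = Empty | Node e (List e)

-- Number of Node constructors in the list; used as the measure.
size :: List e -> Integer
size Empty       = 0
size (Node _ xs) = 1 + size xs

-- Collects the measure at the start of every iteration of a loop
-- that walks the list from head to tail, as in listing 6.
measures :: List e -> [Integer]
measures Empty         = []
measures l@(Node _ xs) = size l : measures xs
```

For the three-element list from listing 6 the collected measures form a strictly decreasing sequence of natural numbers, which is the property a termination proof based on the size of the structure needs.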

2.3.3 Declaring termination without providing any termination metrics (core)

In some cases, we may know or at least assume that a loop, function or method terminates even though we are not able to provide an adequate termination measure. In such cases, it should be possible to state that we want to prove the total correctness of the program under the assumption that this loop, function or method terminates. Furthermore, a program may actually be supposed to run forever, in which case it should not be considered incorrect if it does not terminate. The language extension should also provide a feature to handle these kinds of situations.

2.3.4 Default termination metrics (extensions)

Sometimes, the termination metric of a loop or recursive method can be determined automatically. For example, if the condition of a while loop is (a < b), one can reasonably assume that the loop variant is (b - a) (although it is of course not guaranteed that this value is actually a loop variant). In such cases, it might be convenient if the developer does not need to declare the termination metrics explicitly.
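Such a heuristic could be sketched in Haskell as a function from loop conditions to candidate variants. The expression types Expr and Cond and the function guessVariant are hypothetical and far simpler than a real Viper AST; the candidate is only a guess and still has to be checked by the verifier.

```haskell
-- A tiny expression language (hypothetical, for illustration only).
data Expr = Var String | Lit Integer | Sub Expr Expr
  deriving (Show, Eq)

-- Loop conditions; only Lt and Gt are understood by the heuristic.
data Cond = Lt Expr Expr | Gt Expr Expr | Other
  deriving (Show, Eq)

-- Proposes a candidate loop variant for a loop condition.
-- For (a < b) the candidate is (b - a), for (a > b) it is (a - b).
guessVariant :: Cond -> Maybe Expr
guessVariant (Lt a b) = Just (Sub b a)
guessVariant (Gt a b) = Just (Sub a b)
guessVariant Other    = Nothing
```

Returning Nothing for unrecognized conditions keeps the heuristic conservative: in that case the developer would have to declare the termination metric explicitly.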

2.3.5 Syntactic patterns (extensions)

This project extension explores the possibilities of using syntactic patterns as a means of proving termination. For instance, we could perform a relatively simple syntactic analysis to prove that, in the following example, the predicate Person, which is unfolded inside the subsequently defined function personAge, terminates:

    field age: Int;
    field name: Seq[Int];

    predicate Person(this: Ref) {
        acc(this.age) && acc(this.name)
    }

    function personAge(person: Ref): Int
        requires acc(Person(person))
    {
        unfolding Person(person) in person.age
    }
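One conceivable syntactic check, sketched below in Haskell, is whether a predicate can reach itself through the predicates mentioned in its body. A predicate like Person, whose body mentions no other predicates, trivially cannot. The representation PredicateGraph and the functions reachable and isNonRecursive are assumptions made for this sketch and are not part of Viper.

```haskell
-- Maps each predicate name to the predicates mentioned in its body
-- (a hypothetical, heavily simplified view of a Viper program).
type PredicateGraph = [(String, [String])]

-- Predicates reachable from the given predicate by following the
-- references in predicate bodies.
reachable :: PredicateGraph -> String -> [String]
reachable graph start = go [] (successors start)
  where
    successors p = maybe [] id (lookup p graph)
    go visited [] = visited
    go visited (p : ps)
      | p `elem` visited = go visited ps
      | otherwise        = go (p : visited) (successors p ++ ps)

-- A predicate is non-recursive if it cannot reach itself.
isNonRecursive :: PredicateGraph -> String -> Bool
isNonRecursive graph p = p `notElem` reachable graph p
```

With the graph [("Person", [])], isNonRecursive reports Person as non-recursive, whereas a predicate that mentions itself directly or through other predicates is rejected and would need a different termination argument.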

2.3.6 Termination of function calls in postconditions (extensions)

This project extension will be concerned with function calls that appear inside the postconditions of (possibly different) functions. Consider the following code example written in Viper, which is successfully verified by both Viper backends:

    function a(x: Bool): Bool
        ensures result == true
    {
        (x == false || x == true) ? a(x) : true
    }

    function b(x: Bool): Bool
        ensures result == a(x)
    {
        true
    }

When looking at the postcondition of b, one might be tempted to conclude that the functions a and b are equivalent. This is, however, not the case, as a in fact never terminates. Still, the postconditions of a and b are considered correct by the verifier because the only possible result of a is true (although it is actually never returned by a). The goal of this part of the project is the introduction of methods to prove that a postcondition does not contain any non-terminating function calls (and to detect postconditions that might).

3 Workflow (Tasks)

This section contains a description of the tasks that need to be carried out for each of the language features that have been introduced in section 2.

3.1 Searching for examples and encoding them into Viper

The first major step includes a search for code examples that, to a large extent, cover the possible applications of the new language features. These examples are then manually encoded into Viper to demonstrate that the new language features can later be specified using only the original (core) features of the Viper language, and how this can be done. Besides self-created code, official language documentation and open-source projects could also serve as possible sources for examples.

3.2 Performance tuning

As a next step, the Viper specifications of our examples can be verified and iteratively improved. In particular, the time that is needed to verify the examples should be evaluated and reduced. Over the course of these evaluations, we can also identify bottlenecks, determine how strongly certain factors (e.g., the number of patterns in a match statement) influence the verification performance of different specifications, and decide which specification is the most suitable in a certain situation. The results of this step can later be used to optimize the code that is (internally) generated by the translator developed in the fourth step.

3.3 Specification of the language extension

Based on the examples gathered in step 1 and verified in step 2, we can now start to create a basic semantic and syntactic specification of the new language elements and concepts. This specification should not only contain a general description of a feature's semantic properties and the corresponding syntax rules, but also discuss possible restrictions and explain how the usage of these features might affect the correctness of a program (for example, a program is not correct if it contains non-exhaustive patterns). Because the main purposes of this specification are the documentation of the extension and the facilitation of the development of the extension in the next step, it should suffice to describe the semantic properties of the new features using a natural language (English) and basic mathematics. When designing the syntax of the new features, it must be ensured that there are no conflicts between the core part of the Viper language and the language extension.

3.4 Integration

The goal of the final major step of this project is the integration of the new language features into the existing Viper architecture. The software that is implemented in this part of the project will be based on the existing Viper verifier and include extensions to the parser and intermediate representation(s) of the Viper intermediate language as well as a code generator that maps the new language elements onto elements of the Viper core.

Ideally, the created software is not only able to verify programs written in the extended verification language but also provides meaningful error messages to the user. In order to do this, the program needs to be able to either interpret the output that is generated by the core verifier or to specify custom error messages for violated assertions.

3.5 Native implementation of ADTs in Z3 (extensions)

As described in [10], the theorem prover Z3 offers the possibility to natively define algebraic data types. In a possible extension of the project, we could investigate whether and how we could use this feature to implement ADTs in the two Viper backends Carbon and Silicon and whether there are any significant performance differences between our current implementation (created in step 4) and an implementation that exploits the ADT features of Z3.

4 Schedule

Start date:
Project deadline:
Final presentation:

Task                                                               Time
Examples (3.1), Performance tuning (3.2)                           8 weeks
Specification of the language extension (3.3), Integration (3.4)   8 weeks
Extensions (2.3.4, 2.3.5, 2.3.6 and 3.5)                           4 weeks
Report and Finalization                                            4 weeks
Final presentation                                                 1 week

Total time: 25 weeks.

References

[1] P. Müller, M. Schwerhoff, and A. J. Summers. Viper: A Verification Infrastructure for Permission-Based Reasoning. In: Verification, Model Checking, and Abstract Interpretation (VMCAI). Ed. by B. Jobstmann and K. R. M. Leino. Vol. 9583. LNCS. Springer-Verlag, 2016, pp. 41-62.

[2] HaskellWiki. Algebraic data type. url: https://wiki.haskell.org/Algebraic_data_type (visited on 09/07/2016).

[3] Microsoft Corporation. Discriminated Unions. url: https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/discriminated-unions (visited on 09/07/2016).

[4] EPFL. Case Classes. url: http://docs.scala-lang.org/tutorials/tour/case-classes.html (visited on 09/07/2016).

[5] The Rust Project Developers. Enums. url: https://doc.rust-lang.org/book/enums.html (visited on 09/07/2016).

[6] Benjamin C. Pierce. Advanced Topics in Types and Programming Languages. The MIT Press, 2004. isbn: 0262162288.

[7] K. Rustan M. Leino. Dafny: An Automatic Program Verifier for Functional Correctness. In: Proceedings of the 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning. LPAR'10. Dakar, Senegal: Springer-Verlag, 2010, pp. 348-370. isbn: 3-642-17510-4, 978-3-642-17510-7. url: http://dl.acm.org/citation.cfm?id=1939141.1939161.

[8] Chiyan Chen and Hongwei Xi. Combining Programming with Theorem Proving. In: Proceedings of the Tenth ACM SIGPLAN International Conference on Functional Programming. ICFP '05. Tallinn, Estonia: ACM, 2005, pp. 66-77. isbn: 1-59593-064-7. doi: 10.1145/1086365.1086375. url: http://doi.acm.org/10.1145/1086365.1086375.

[9] Hongwei Xi. Termination Metrics. url: https://www.cs.bu.edu/~hwxi/ATS/TUTORIAL/contents/termination-metrics.html (visited on 09/22/2016).

[10] Microsoft Corporation. Getting Started with Z3: A Guide (section 8 - Datatypes). url: http://rise4fun.com/z3/tutorialcontent/guide#h27 (visited on 09/28/2016).
