Support for Algebraic Data Types in Viper Bachelor of Science Thesis Project Description
David Rohr

Supervised by
Arshavir Ter-Gabrielyan
Prof. Dr. Peter Müller

Chair of Programming Methodology
Department of Computer Science
ETH Zürich

September 30, 2016

1 Introduction

Viper [1] is a suite of verification tools developed at ETH Zurich. A central part of Viper is its intermediate verification language, which can be used to describe, for instance, object-oriented programs and their properties (in the form of preconditions, postconditions and assertions) as well as complex data structures such as trees and other kinds of graphs. These data structures can be specified in Viper using recursive predicates [1, pp. 9f.], quantified permissions [1, pp. 12f.] or, potentially, algebraic data types (sometimes abbreviated as ADTs, not to be confused with abstract data types), which are common in functional and multi-paradigm programming languages such as Haskell [2], F# [3], Scala [4] and Rust [5].

Instances of algebraic data types are value types, so they cannot be directly referenced or dereferenced. However, their simplicity and the fact that operations on them cause no side effects (that is, the operations are pure) make them a useful tool for specifying data structures.

Viper does not currently support the direct (native) declaration of algebraic data types. Instead, they need to be encoded using custom domains [1, pp. 16f.], which is, depending on the encoded data type, considerably more complicated and error-prone than, for example, a native ADT declaration in Haskell. The main goal of this project is to design and implement a language extension that facilitates the definition and usage of algebraic data types in Viper.

2 Language Features

This section gives an overview of the language features that will be added to Viper (core features) or might be added (extensions) as part of this project.

2.1 Algebraic data types (core)

Algebraic data types are the main feature that the language extension developed in this project adds to Viper. According to [2] and [6, p. 454], there are two major categories of algebraic data types, both of which should be supported:

• Product types or record types, whose instances are tuples containing values or references of possibly (but not necessarily) several different types.
• Sum types or variant types, whose instances hold an instance or a reference of one of typically several different types or options. Enumerated types (enums) can also be seen as a form of sum type; as a side note, sum types are in fact called enums in Rust [5].

Sum types and product types can be combined. For example, the List data type defined in Listing 1 is a sum type whose second option (Node) is a product of e and (List e).

Listing 1: Recursive declaration of an ADT (linked list) in Haskell

    data List e = Empty | Node e (List e)

2.2 Pattern matching (core)

Pattern matching is closely related to algebraic data types because it provides an easy way to decompose their instances. Programming languages that support algebraic data types therefore often provide a pattern matching mechanism, and the language extension developed in this project should provide one as well. The extension should support at least the following kinds of patterns:

• A wildcard pattern that matches all possible objects.
  Example: div _ 0 = undefined (the wildcard is denoted by _)
• Patterns that match only a specific instance of a certain type.
  Example: mul 2 x = x + x (2 is a pattern that only matches the integer 2)
• Patterns that bind an instance of a certain type to a free variable of this type.
  Example: x ++ [] = x (the value of the first argument is bound to the free variable x)
• Patterns for ADTs, lists and other composite data types.
  Example: listSum (x:xs) = x + listSum xs (the list is decomposed into a head and a tail)

The following example demonstrates how these kinds of patterns can be used in Haskell:

    -- An ADT representing a binary tree:
    data Tree2 e = Leaf | Node (Tree2 e) e (Tree2 e)

    -- Checks whether a tree contains a specific element
    containsElement :: Eq e => Tree2 e -> e -> Bool

    -- Leaf: Pattern that matches only the "Leaf" instance
    -- _: Wildcard pattern
    containsElement Leaf _ = False

    -- (Node t1 val t2): ADT pattern that matches "Node" instances and
    -- binds their content to free variables
    -- x: Pattern that binds the second argument (element) to a free variable x
    containsElement (Node t1 val t2) x =
      val == x || containsElement t1 x || containsElement t2 x

    -- Output: "True"
    main = putStrLn (show (containsElement (Node Leaf 1 (Node (Node Leaf 2 Leaf) 5 Leaf)) 2))

2.2.1 Proof of exhaustiveness (core)

In addition to defining the possible patterns, the project should describe ways to determine whether a set of patterns is exhaustive (i.e., whether every instance of the corresponding type is matched by at least one pattern in the set). This exhaustiveness check could be performed either during the context-sensitive analysis phase of the verifier extension or as part of the generated Viper code.

Listing 2: A set of Haskell patterns that is not exhaustive (B True is missing)

    data IntBool = I Int | B Bool

    toInt :: IntBool -> Int
    toInt (I x) = x
    toInt (B False) = 0

    -- Output: "Non-exhaustive patterns in function toInt"
    main = putStrLn (show (toInt (B True)))

Listing 3: Pseudocode that demonstrates how a Viper assertion could be used to check whether the patterns of a match statement are exhaustive

    if (pattern 1 matches value) {
      ...
    } elseif (pattern 2 matches value) {
      ...
    }
    ...
    else {
      assert false;
    }

2.3 Termination analysis (core)

Another major part of the project focuses on the termination of programs that are encoded in Viper.
Some of the questions that need to be answered in this part are:

• Which conditions must a method, function, predicate, statement or expression evaluation fulfill to be considered terminating?
• Which language features could be introduced to facilitate the creation of termination proofs?
• How can these new features (e.g., termination metrics) be implemented using basic context-sensitive analyses and existing features of the Viper intermediate language?

The language elements that are central to the termination analysis are loops and recursive functions, methods and predicates, since these are the constructs that can cause a program not to terminate.

2.3.1 Termination metrics (core)

Another aspect of the project concerns features that simplify the declaration of loop variants and recursion bounds in order to prove the termination of a loop, function, method or program. For example, the programming language Dafny [7] provides a decreases clause to define termination metrics for loops and recursive methods, and a similar clause could be introduced into Viper. The programming language ATS [8] offers a comparable feature for functions.

As in Dafny or ATS, it should also be possible to provide tuples of integer values or expressions as termination metrics. As explained in more detail in [9], the standard well-founded lexicographic ordering on tuples of natural numbers can then be used to order these tuples and to determine whether the tuple decreases in each iteration.

The following two examples demonstrate the usage of the decreases clause in Dafny. In both programs, the variables d1 and d0 represent the most and least significant digit of a two-digit decimal integer. The integer is printed and decremented until both digits reach the value 0.
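
As a side sketch before the Dafny listings, the lexicographic check on the pair (d1, d0) can be written out in Haskell. The names step, decreases, states and allDecrease are hypothetical helpers introduced only for illustration. The sketch assumes the countdown continues while either digit is positive, and it makes no claim about how Dafny or a Viper extension would implement the check.

```haskell
-- Hypothetical illustration; none of these names come from Viper or Dafny.
type Metric = (Int, Int)

-- One step of the two-digit countdown; Nothing once both digits are 0.
step :: Metric -> Maybe Metric
step (d1, d0)
  | d1 <= 0 && d0 <= 0 = Nothing
  | d0 == 0            = Just (d1 - 1, 9)
  | otherwise          = Just (d1, d0 - 1)

-- True if the new tuple is bounded below by (0, 0) and strictly smaller
-- than the old tuple in lexicographic order.
decreases :: Metric -> Metric -> Bool
decreases (o1, o0) (n1, n0) =
  n1 >= 0 && n0 >= 0 && (n1 < o1 || (n1 == o1 && n0 < o0))

-- All states visited by the countdown, starting with the initial one.
states :: Metric -> [Metric]
states m = m : maybe [] states (step m)

-- Checks that every transition between consecutive states decreases the metric.
allDecrease :: [Metric] -> Bool
allDecrease ms = and (zipWith decreases ms (drop 1 ms))

main :: IO ()
main = print (allDecrease (states (4, 5)))  -- prints True
```

The explicit comparison in decreases coincides with Haskell's built-in ordering on pairs; it is spelled out here only to make the two cases (first component smaller, or first equal and second smaller) visible.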

Listing 4: A simple Dafny program containing a decreases clause

    method Main() {
      var d1 := 4;
      var d0 := 5;
      while (d1 > 0 || d0 > 0)
        decreases d1, d0
      {
        print d1;
        print d0;
        print "\n";
        if (d0 == 0) {
          d0 := 9;
          d1 := d1 - 1;
        } else {
          d0 := d0 - 1;
        }
      }
    }

Listing 5: An implementation of the same algorithm using a recursive method

    method Main() {
      out(4, 5);
    }

    method out(d1: int, d0: int)
      requires d1 >= 0
      requires d0 >= 0
      decreases d1, d0
    {
      if (d1 > 0 || d0 > 0) {
        print d1;
        print d0;
        print "\n";
        if (d0 == 0) {
          out(d1 - 1, 9);
        } else {
          out(d1, d0 - 1);
        }
      }
    }

2.3.2 Using heap footprints as a termination measure (core)

A further goal of this project is to make it possible to use the heap footprints of data structures as a termination measure. This feature is particularly useful for proving the termination of algorithms that analyze complex data structures (e.g., graphs) or, more generally, whenever the footprint of a data structure is the only available termination measure.

Listing 6: Using the size of an ADT (list) as a decreases measure in Dafny

    datatype List<T> = Empty | Node(element: T, tail: List<T>)

    method Main() {
      // Output: "123"
      printIntegerList(Node(1, Node(2, Node(3, Empty))));
    }

    method printIntegerList(list: List<int>) {
      var l := list;
      while (l.Node?)
        decreases l
      {
        print l.element;
        l := l.tail;
      }
    }

A problem we face when reasoning about heap data structures in Viper is that the heap is not exposed in Viper, which makes it harder to compare the heap footprints of two data structures.
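
For the pure ADT case in Listing 6, where no heap is involved, one candidate measure is the structural size of the value. The following Haskell sketch makes that measure explicit for the list traversal; size, measures and strictlyDecreasing are hypothetical names, and the sketch does not address the heap footprint comparison itself.

```haskell
-- Hypothetical illustration of a structural size measure for a list ADT.
data List e = Empty | Node e (List e)

-- Number of Node constructors in the list.
size :: List e -> Int
size Empty      = 0
size (Node _ t) = 1 + size t

-- Measure observed before each loop iteration of the traversal,
-- followed by the measure of the final Empty value.
measures :: List e -> [Int]
measures Empty        = [0]
measures l@(Node _ t) = size l : measures t

-- True if each value is strictly smaller than its predecessor.
strictlyDecreasing :: [Int] -> Bool
strictlyDecreasing xs = and (zipWith (>) xs (drop 1 xs))

main :: IO ()
main = do
  let list = Node (1 :: Int) (Node 2 (Node 3 Empty))
  print (measures list)                       -- prints [3,2,1,0]
  print (strictlyDecreasing (measures list))  -- prints True
```

Because size is a natural number that shrinks by one with every step to the tail, the traversal cannot continue indefinitely.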