DECISION PROCEDURES FOR PROGRAM SYNTHESIS AND VERIFICATION

Ruzica Piskac

Thèse n. 5220 (2011)
présentée le 17 octobre 2011
à la Faculté Informatique et Communications
Laboratoire d'analyse et de raisonnement automatisés (LARA)
programme doctoral en informatique, communications et information
École Polytechnique Fédérale de Lausanne
pour l'obtention du grade de Docteur ès Sciences
par
Ruzica Piskac

acceptée sur proposition du jury:
Prof. Arjen Lenstra, président du jury
Prof. Viktor Kuncak, directeur de thèse
Dr Nikolaj Bjørner, rapporteur
Prof. Rupak Majumdar, rapporteur
Prof. Martin Odersky, rapporteur

Lausanne, EPFL, 2011

Za mamu i tatu. . . (To Mom and Dad. . .)

Acknowledgments

First and foremost, I would like to thank my advisor Viktor Kuncak for his support, without which this dissertation would not have been possible. I very much enjoyed our discussions, the challenges he put in front of me, even our little arguments (I still prefer "where" over "let"). Viktor is a person of great knowledge and a brilliant mind; he cares a lot about his students, and I consider myself lucky and privileged to have had the chance to work with him.

I am grateful to Arjen Lenstra, Nikolaj Bjørner, Rupak Majumdar, and Martin Odersky for agreeing to serve on my thesis committee. I am aware of their busy schedules, and I thank them for their time and for their feedback on my thesis. I would like to additionally thank Nikolaj Bjørner for being my mentor during a summer internship in 2008 at Microsoft Research, Redmond. Working with Nikolaj in those three summer months helped me to significantly increase my knowledge of the SMT world.

Special thanks go to Tim King. Tim agreed to proofread this thesis and found a number of places where further clarifications were needed. I am thankful to Thomas Wies and Philippe Suter for the discussions on the structure and some of the technical subtleties of my thesis. I am further indebted to Utkarsh Upadhyay for his help with submitting and printing this thesis on time.

During the past four years I had the pleasure to work with a number of amazing people around the world. In particular, I want to thank all my colleagues with whom I co-authored a paper: Nikolaj Bjørner, Tihomir Gvero, Viktor Kuncak, Mikaël Mayer, Leonardo de Moura, Philippe Suter, Thomas Wies, and Kuat Yessenov.

One of the reasons why my PhD studies were so enjoyable is the great atmosphere in the LARA group. I thank my colleagues Eva Darulová, Tihomir Gvero, Hossein Hojjat, Swen Jacobs, Giuliano Losa, Andrej Spielmann, and Philippe Suter for creating such a motivating working environment. I thank Danielle Chamberlain, Yvette Gallay, and Fabien Salvi for their help with bureaucratic and technical tasks.

My road towards a PhD degree was long and winding. On this path I had the honor of working in various groups, on various topics. I would therefore like to thank all the great people with whom I worked: I believe that the experience I gained in each of those groups helped me in completing this thesis. In particular, I am thankful to Dieter Fensel and his group at the University of Innsbruck, the late Harald Ganzinger and his group at the Max-Planck Institute for Informatics in Saarbrücken, Tome Anticic and his group at the Institute Rudjer Boskovic in Zagreb, and Robert Manger and his group at the University of Zagreb.

At the beginning of my studies at EPFL, I was a co-organizer of a workshop at the International Semantic Web Conference in Busan, Korea. I would like to thank the other co-organizer, Frank van Harmelen, for this unforgettable experience. I am also grateful to Manfred Jeusfeld for introducing me to the world of on-line publishing.

I am grateful for the inspiring discussions with the outstanding people that I met during my PhD studies. I thank Natarajan Shankar for being the kindest host. My first scientific visit was to SRI International, and Shankar also made it one of the most memorable. His understanding and kind attitude encouraged me to speak freely about my research. I met Rupak Majumdar at EPFL, just when I was starting my PhD. I thank him for introducing me to the world of semilinear sets and, without knowing it, for helping me to solve some of the research questions that I was working on. During my PhD studies I considered Sabine Süsstrunk a mentor and role model. I am grateful to her for finding time for me, for her wise advice, and for her support.

Finally, I want to thank my family and my friends for their support and their love through the years. I thank my parents, Bernarda and Josip, and my brother, Tomislav, for being there for me, even when we were physically apart. A very special thanks goes to Thomas Wies. This final gratitude also extends to the families Piskac, Slunjski, Puklavec, Wies and Wagner.

Lausanne, November 23, 2011 Ruzica Piskac

Preface

Recent progress in verification technology has started to improve our ability to automatically check the correctness of software systems. Yet an important challenge remains ahead of us: can we verify deeper program properties, beyond the absence of low-level errors? Such deep analysis requires reasoning that is specific to the software application being verified, so analysis techniques that target particular classes of properties stop being sufficient. To introduce application-specific reasoning into verification, it is promising to consider methods that integrate software verification with software development. Among the key methodologies supporting this direction is compositional verification based on pre-conditions and post-conditions. To support such compositional techniques, the ability to reason about logical formulas is essential, because formulas become part of the program itself. Dealing with formulas also arises when modeling program semantics, both in approaches based on verification-condition generation and in more automated approaches, such as counterexample-guided predicate abstraction. Modeling programs with formulas allows us to achieve precision (for example, path-sensitivity), scalability (because we can use efficient solvers to explore exponentially many program paths), and tool reuse (because we can develop tools largely independently from the programming language semantics).

Due to the need to reason about formulas, theorem proving technology becomes an indispensable component of these verification approaches. The leading automated theorem provers for these tasks are satisfiability modulo theories (SMT) solvers. They are among the most remarkable reasoning tools developed, combining great expressive power with the ability to handle megabyte-sized formulas. The key to this power is specialized reasoning based on decision procedures and a principled technique to combine them, preserving soundness and completeness. Interestingly, despite the great progress in the tools developed, the foundations of decision procedures and their combination techniques have been evolving relatively slowly since the introduction of the approach by Nelson and Oppen in the 1970s. As a result, verification tools must model many constructs using quantifiers, often resulting in unpredictable reasoning and the inability to generate counterexamples.

This thesis introduces new theorem proving algorithms, qualitatively extending the reach of existing technology. Many of these results are formulated in terms of decision procedures for classes of constraints of interest, and immediately lead to more predictable verification. Moreover, the thesis shows that, in many cases, these algorithms can also be used to directly construct software fragments through a notion of synthesis procedure. In several cases, the class of logical constraints has not been identified as decidable before, whereas in others the class was known to be decidable but the known algorithms were exponentially worse than the optimal ones introduced in this thesis.

The thesis contributes a number of results in decision procedures. Chapter 2 presents decision procedures for multiset constraints, whereas Chapter 3 presents initial implementation results. A number of extensions of this logic are the focus of Chapter 4. This includes relations, resulting in a proper extension of an important description logic. Another direction is collections with fractional membership, which open up the possibility of applications in dealing with uncertainty. A parametrized family of decision procedures for program termination is the subject of Chapter 5. A new, more widely applicable method for composing decision procedures of individual logical theories into a decision procedure for the combined (union) theory is the subject of Chapter 6.

We can broadly classify these results into two categories: 1) new decidable logical fragments, and 2) new methods to combine decidable fragments. Both kinds of results are a crucial starting point for the development of SMT solvers. An example of contributions of the first kind are the results on decision procedures for multiset constraints with the cardinality operator. This is a very natural fragment, because multisets describe data structures while preserving element multiplicity. They are needed to precisely describe, e.g., a sorting algorithm, or the precise external behavior of a data structure such as a red-black tree. The cardinality operator on multisets is a natural measure to describe many data structure invariants. Yet, the very basic questions about the logics of multisets with cardinality constraints were unanswered before this work. The work shows that quantified multiset constraints with cardinalities are undecidable (in contrast to quantified set constraints with cardinalities). The most technically involved results are those on the complexity of the quantifier-free constraints. These constraints previously had a decision procedure giving a NEXPTIME upper bound on the decision problem. Despite the exponential lower bound on the sizes of explicit models, the thesis shows the decision problem to be in NP (using, among other techniques, a method for proving the existence of sparse solutions of compactly represented integer linear programming problems with bounded coefficients, and bounds on the sizes of vectors in semilinear set representations).

An example of contributions of the second kind (complete combination methods) is the result on combining theories that share set operations, presented in Chapter 6. The state-of-the-art method implemented in modern SMT solvers goes back to Nelson and Oppen, and can be explained as a (demand-driven) reduction of theories to the pure theory of equality. This method works only if equality is the sole shared symbol between the combined theories. Chapter 6 shows how to reduce theories not to equality but to a richer logic, set algebra with linear arithmetic (BAPA). Among the remarkable observations is that such a reduction is possible in many cases, including such important decidable logics as monadic second-order logic of trees, as well as the two-variable logic with counting.
As a consequence of this reduction approach, we obtain decidability of a fairly rich specification language, supporting quantifier-free combinations of two-variable logic and WSkS, as well as many other useful logics that can express constraints on sets.

In addition to developing the algorithms and the foundations for constraint solving and synthesis, the thesis describes the implementation of the underlying automated reasoning and synthesis tools. Chapter 3 shows the design and the development of the multiset reasoner MUNCH; this is the only theorem proving tool capable of effectively handling multisets in the presence of cardinality constraints.

Furthermore, the thesis makes substantial contributions to software synthesis. First explored long ago by some of the greatest pioneers of computer science, software synthesis has received increased attention in recent years thanks to new algorithms, new applications, and a better understanding of the boundary between tractable and intractable synthesis problems. For synthesis of software it is particularly important to support specifications over domains such as integers and collections of objects, because software implementations almost invariably rely on such unbounded data types, in contrast to finite-state reactive systems. Manna and Waldinger had already identified theorem proving technology as the main bottleneck for future progress of software synthesis. Since then, software verification and advanced type systems such as refinement types have experienced a revolution. This is in part thanks to increasingly efficient SMT solvers. The idea behind the complete functional synthesis approach, described in Chapter 7, is to extend the use of this successful SMT technology to software synthesis. For this to happen, we must generalize decision procedures (which give yes/no answers for formula satisfiability) into algorithms that produce actual satisfying assignments. Moreover, if we wish our synthesized code to be efficient, we need procedures that accept parametrized input and produce an entire family of parametrized solutions, in the form of an efficiently computable function. Chapter 7 presents such a procedure for the logic that combines integer linear arithmetic and the set algebra with cardinality operators (we call this logic Boolean Algebra with Presburger Arithmetic, or BAPA for short). The resulting system rewrites the given specification of program fragments into a solved form that, given the inputs, computes the outputs that are guaranteed to satisfy the specification.

In addition to a sequence of results centered around decision procedures, the thesis presents a glimpse of a fresh research direction: synthesis of code that combines method calls from existing libraries. This synthesis approach, described in Chapter 8, is driven by type constraints of an expressive type system, including generic types. Types are an abstraction of code that is essential for the synthesized code to compile. The presented results suggest that synthesis based on generic types, even though undecidable in general, has a practical solution that can be deployed in the context of integrated development environments. A crucial part of this solution is an approach to guide the search process using weights derived from a corpus of code. The resulting approach promises development environments that deliver qualitatively more than what we can expect today.
Like the previous chapters, it presents algorithmic advances with the potential to greatly improve programmer productivity in developing reliable software systems.

Lausanne, November 2011
Viktor Kuncak
Assistant Professor, EPFL
PhD, MIT, 2007


Abstract

Decision procedures are widely used in software development and verification. The goal of this dissertation is to increase the scope of properties that can be verified using decision procedures. To achieve this goal, we identify three improvements over the state of the art in decision procedures, and their use in software reliability tools.

First, we observe that developing new decision procedures increases the range of properties and programs that are amenable to automated verification. In this thesis, we are particularly interested in the verification of container data structures. Existing verification tools use set abstractions to reason about the contents of data structures. However, a set abstraction loses any information about duplicate occurrences of elements in a container. We therefore propose a new logic for reasoning about multisets with cardinality constraints. This logic subsumes reasoning about sets and enables reasoning about duplicate elements in containers. Cardinality constraints are useful for reasoning about the number of elements stored in a data structure. Based on an extension of linear arithmetic (which we call LIA∗), we describe a decision procedure for the logic of multisets with cardinalities. By investigating properties of LIA∗, we prove that the satisfiability of multisets with cardinality constraints is an NP-complete problem.

Second, we notice that verification conditions expressing properties of data structures can often be decomposed into several well-understood logics. If the signatures of the component theories are not disjoint (i.e., they share more than equality), then it is often unclear whether such a reduction is possible, even if individual decision procedures for all component theories are known to exist. We investigate how to combine non-disjoint theories that share set symbols and operators. We state and prove a new combination theorem for such theories. Our theorem states that the combination is possible if each component theory can be reduced to the common theory, the theory of sets with cardinality constraints. We prove that many theories satisfy this property. The resulting combined logic enables reasoning about complex properties of data structure implementations that could not be expressed in any previously known decidable logic.

Finally, we identify new applications of decision procedures in software reliability tools. We describe how a model-producing decision procedure can be generalized into a predictable and complete synthesis procedure. Given a specification, a synthesis procedure is an algorithm that outputs code that meets this specification. We demonstrate this approach in detail for the concrete case of linear integer arithmetic. We further develop an orthogonal approach to using decision procedures for program synthesis: we show how to reconstruct code snippets that satisfy given type constraints from a proof of unsatisfiability computed by a theorem prover. The programmer then interactively selects the desired code snippet from a choice of code snippets generated by the synthesis engine.

Together, our results provide the foundations of sound and predictable verification and synthesis tools for integer arithmetic and container data structures.

Keywords: decision procedures, program verification, software synthesis, combination procedures, automated reasoning for sets and multisets, linear integer arithmetic, data structures

Zusammenfassung

Entscheidungsverfahren haben vielfältige Anwendungen in der Software-Entwicklung und Verifikation. Das Ziel dieser Dissertation ist es, die Bandbreite der Eigenschaften zu erhöhen, die mit Hilfe von Entscheidungsverfahren verifiziert werden können. Um dieses Ziel zu erreichen, entwickeln wir drei Neuerungen in der Erforschung von Entscheidungsverfahren und deren Anwendung in Werkzeugen, die die Zuverlässigkeit von Software sicherstellen.

Als erstes stellen wir fest, dass die Entwicklung neuer Entscheidungsverfahren die Bandbreite der Eigenschaften und Programme erweitert, die der automatischen Verifikation zugänglich sind. In dieser Dissertation befassen wir uns speziell mit der Verifikation von Container-Datenstrukturen. Existierende Verifikationswerkzeuge verwenden Mengenabstraktionen, um über den Inhalt von Datenstrukturen logische Schlußfolgerungen ziehen zu können. Jedoch verlieren Mengenabstraktionen jegliche Information über Mehrfachvorkommen von Elementen in einem Container. Daher schlagen wir eine neue Logik für die automatische Deduktion von Aussagen über Multimengen mit Kardinalitätsprädikaten vor. Diese Logik subsumiert logisches Schlußfolgern über Mengen, aber ermöglicht darüber hinaus die präzise Behandlung von mehrfach vorkommenden Elementen in einem Container. Kardinalitätsprädikate dienen dazu, Aussagen über die Anzahl der Elemente beweisen zu können, die in einer Datenstruktur gespeichert sind. Basierend auf einer Erweiterung der linearen Arithmetik (die wir LIA∗ nennen), beschreiben wir ein Entscheidungsverfahren für die Logik der Multimengen mit Kardinalitätsprädikaten. Durch eine genaue Untersuchung der Eigenschaften von LIA∗ gelingt es uns zu beweisen, dass Erfüllbarkeit von Multimengen mit Kardinalitätsprädikaten ein NP-vollständiges Problem ist.

Zweitens beobachten wir, dass Verifikationsbedingungen, die Eigenschaften von Datenstrukturen ausdrücken, sich oft in Teileigenschaften aufspalten lassen, die in wohlverstandene Logiken fallen. Wenn die Signaturen dieser Komponententheorien nicht disjunkt sind (d.h. sie teilen mehr als nur das Gleichheitssymbol), dann ist es häufig unklar, ob sich eine solche Reduktion ausnutzen läßt, um die automatische Deduktion von Aussagen in der kombinierten Theorie zu ermöglichen, selbst dann, wenn die individuellen Komponententheorien alle entscheidbar sind. Wir untersuchen den Fall der nicht disjunkten Kombination von Theorien, die Mengen und Mengenoperationen teilen. Wir formulieren und beweisen ein neues Kombinationstheorem für solche Theorien. Unser Theorem besagt, dass die Kombination der Theorien möglich ist, wenn sich jede Komponententheorie auf eine gemeinsame Theorie reduzieren läßt, nämlich die Theorie der Mengen mit Kardinalitätsprädikaten. Wir zeigen, dass viele Theorien diese Eigenschaft erfüllen. Die resultierende kombinierte Logik ermöglicht es, komplexe Aussagen über Datenstrukturen automatisch zu beweisen, die sich in keiner vorher bekannten entscheidbaren Logik ausdrücken ließen.

Schließlich identifizieren wir neue Anwendungsfelder von Entscheidungsverfahren in Software-Verifikationswerkzeugen. Wir beschreiben, wie sich modellerzeugende Entscheidungsverfahren zu vollständigen Syntheseverfahren generalisieren lassen. Der Programmierer stellt eine formale Spezifikation zur Verfügung und unser Synthesewerkzeug berechnet den Code, der diese Spezifikation erfüllt. Wir demonstrieren diesen Ansatz im Detail für den konkreten Fall der linearen, ganzzahligen Arithmetik. Des Weiteren entwickeln wir einen orthogonalen Ansatz zur Verwendung von Entscheidungsverfahren in der Programmsynthese: wir zeigen, wie sich Code-Schnipsel, die bestimmte Typvorgaben erfüllen, aus einem Unerfüllbarkeitsbeweis rekonstruieren lassen, der von einem Theorembeweiser erbracht wurde. Der Programmierer kann dann interaktiv den gewünschten Code-Schnipsel aus einer vom Synthesewerkzeug generierten Auswahl von Code-Schnipseln wählen.

Zusammengenommen bilden unsere Resultate die Grundlage für fehlerfreie und berechenbare Verifikations- und Synthesewerkzeuge für ganzzahlige Arithmetik und Container-Datenstrukturen.

Schlagworte: Entscheidungsverfahren, Programmverifikation, Software-Synthese, Kombinationsverfahren, automatische Beweiser für Mengen und Multimengen, sowie lineare ganzzahlige Arithmetik, Datenstrukturen

Résumé

Les procédures de décision sont couramment utilisées dans le cadre du développement et de la vérification logicielle. La présente thèse s’intéresse à la question de l’extension du domaine d’application des procédures de décision, à la vérification de propriétés complexes ainsi qu’à la synthèse de programmes. Nous présentons trois améliorations par rapport à l’état de l’art en matière de procédures de décision et faisons la démonstration de leur applicabilité en tant qu’outils pour améliorer la fiabilité logicielle.

Premièrement, nous observons que le développement de nouvelles procédures de décision entraîne une augmentation du nombre de programmes et de propriétés qui peuvent être traités par des techniques de vérification automatisée. Dans cette thèse, nous nous intéressons plus particulièrement à la vérification de structures de données représentant des collections. Les outils existants utilisent typiquement des ensembles comme représentation abstraite des éléments d’une collection. Cependant, une abstraction basée sur des ensembles ne permet pas de tenir compte d’éléments dupliqués. Pour pallier ce problème, nous proposons une nouvelle logique pour raisonner sur les multiensembles et leur cardinalité. Cette logique est suffisamment expressive pour raisonner sur les ensembles, mais permet en plus d’encoder correctement la multiplicité des éléments d’une collection. Les contraintes sur la cardinalité des multiensembles sont utiles pour représenter le nombre d’éléments stockés dans les structures de données. En nous basant sur une extension de l’arithmétique linéaire – que nous appelons LIA∗ – nous présentons une procédure de décision pour notre logique des multiensembles avec l’opérateur de cardinalité. Une étude des propriétés de LIA∗ nous permet de prouver que le problème de la satisfiabilité d’une formule dans cette logique est NP-complet.

Deuxièmement, nous remarquons que les formules, souvent complexes, exprimant des conditions de vérification pour des propriétés de structures de données peuvent souvent être décomposées en plusieurs formules dans diverses théories logiques pour lesquelles une procédure de décision est disponible. Si les signatures de ces théories ne sont pas disjointes (c’est-à-dire, si elles partagent d’autres symboles que l’égalité), alors la question de la satisfiabilité de la combinaison des théories est souvent ouverte. Dans cette thèse, nous étudions la combinaison de théories non-disjointes qui partagent des symboles et des opérateurs se rapportant aux ensembles. Nous formulons et prouvons un nouveau théorème pour la combinaison de telles théories. Notre théorème montre que la combinaison est possible si chacune des théories peut individuellement être réduite à une théorie commune, dans notre cas, une logique d’ensembles avec l’opérateur de cardinalité. Nous prouvons également que de nombreuses théories connues remplissent cette condition. La logique résultant de la combinaison de ces théories nous permet de raisonner au sujet de propriétés complexes sur les structures de données qui ne pouvaient être exprimées auparavant dans aucune logique décidable connue.

Finalement, nous identifions de nouvelles applications des procédures de décision pour la fiabilité logicielle. Nous montrons comment une procédure de décision capable de produire des modèles peut être transformée en une procédure de synthèse, prévisible et complète. Une procédure de synthèse est un algorithme qui, à partir d’une formule exprimant une relation entre des variables d’entrée et de sortie, produit du code qui calcule les variables de sortie en fonction de celles d’entrée de telle sorte que la relation soit satisfaite. Nous présentons cette approche en détails pour le cas de l’arithmétique linéaire des nombres entiers. Nous présentons également une approche orthogonale de l’utilisation des procédures de décision dans le cadre de la synthèse de programmes : nous montrons comment construire des fragments de code qui satisfont certaines contraintes de typage en partant d’une preuve d’insatisfiabilité produite par un prouveur de théorèmes automatisé. Le programmeur peut ensuite choisir interactivement parmi une liste de fragments de code produits par notre outil de synthèse.

Pris ensemble, nos résultats posent les fondations pour le développement d’outils robustes et fiables pour la vérification et la synthèse, pour l’arithmétique des nombres entiers et les structures de données.

Mots-clés : procédures de décision, vérification logicielle, synthèse de programmes, procédures de combinaison, raisonnement automatisé pour ensembles et multiensembles, arithmétique linéaire

Contents

Acknowledgments
Preface
Abstract (English/Deutsch/Français)
List of Figures

1 Introduction
  1.1 Contributions
    1.1.1 Reasoning about Collections
    1.1.2 Combining Non-disjoint Theories
    1.1.3 Software Synthesis
  1.2 Outline of the Dissertation

2 Decision Procedures for Multisets with Cardinality Constraints
  2.1 Motivation
  2.2 Definition and Applications of Multisets
  2.3 Introduction to Logic through an Example
  2.4 Multiset Constraints
    2.4.1 Reducing Multiset Operations to Sums
  2.5 Linear Integer Arithmetic with Stars
    2.5.1 From Multisets to LIA* Constraints
  2.6 Deciding Linear Arithmetic with Sum Constraints
    2.6.1 Formula Solutions as Semilinear Sets
    2.6.2 Computing Semilinear Sets and their Bounds
    2.6.3 LIA Formulas Representing LIA∗ Formulas
  2.7 Complexity of Linear Arithmetic with Stars
    2.7.1 Estimating Coefficient Bounds of Disjunctive Form
    2.7.2 Size of the Solution Set Generators
    2.7.3 Selecting Polynomially Many Generators
    2.7.4 Grouping Generators into Solutions
    2.7.5 Multiplication by Bounded Bit Vectors
    2.7.6 Estimating the Solution Size Bounds
    2.7.7 An NP-Algorithm for LIA∗ Satisfiability
  2.8 Complexity of Multiset Constraints
  2.9 Undecidability of Quantified Constraints

3 Implementation: Automated Reasoner for Sets and Multisets
  3.1 Motivation
  3.2 MUNCH Implementation
    3.2.1 System Overview
    3.2.2 Efficient Computation of Semilinear Sets
  3.3 Examples and Benchmarks

4 Decision Procedures for Fractional Collections and Collection Images
  4.1 Motivation for Fractional Collections
  4.2 Examples
  4.3 From Collections to Stars
  4.4 Separating Mixed Constraints
    4.4.1 Example
  4.5 Eliminating the Star Operator from Formulas
    4.5.1 Satisfiability Checking for Collection Formulas
    4.5.2 Satisfiability Checking for Generalized Multisets Formulas
  4.6 Decision Procedures for Collection Images
    4.6.1 Motivating Examples for Collection Images
    4.6.2 Logic of Multiset Images of Functions

5 Decision Procedures for Automating Termination Proofs
  5.1 Motivation
  5.2 Examples
  5.3 Decision Procedure through an Example
  5.4 Basic Definitions
  5.5 POSSUM: Multiset Constraints over Preordered Sets
    5.5.1 Finite Multisets over Preordered Sets
    5.5.2 Syntax and Semantics of POSSUM Formulas
  5.6 Decidability of POSSUM
  5.7 Complexity of POSSUM
  5.8 Further Related Work

6 Combining Theories with Shared Set Operations
  6.1 Motivation
  6.2 Example: Verifying a Code Fragment
    6.2.1 Boolean Algebra with Presburger Arithmetic
  6.3 Combination by Reduction to BAPA
  6.4 BAPA Reductions
    6.4.1 Monadic Second-Order Logic of Finite Trees
    6.4.2 BAPA Reduction for Monadic Second-Order Logic of Finite Trees
    6.4.3 Two-Variable Logic with Counting
    6.4.4 BAPA Reduction for Two-Variable Logic with Counting
    6.4.5 Bernays-Schönfinkel-Ramsey Fragment of First-Order Logic
    6.4.6 BAPA Reduction for Bernays-Schönfinkel-Ramsey Fragment
    6.4.7 Quantifier-free Multisets with Cardinality Constraints
    6.4.8 BAPA Reduction for Quantifier-free Multiset Constraints
  6.5 Further Related Work
  6.6 Conclusions

7 Complete Functional Synthesis
  7.1 Motivation
  7.2 Example
  7.3 From Decision to Synthesis Procedures
  7.4 Selected Generic Techniques
    7.4.1 Synthesis for Multiple Variables
    7.4.2 One-Point Rule Synthesis
    7.4.3 Output-Independent Preconditions
    7.4.4 Propositional Connectives in First-Order Theories
    7.4.5 Synthesis for Propositional Logic
  7.5 Synthesis for Linear Rational Arithmetic
    7.5.1 Solving Conjunctions of Literals
    7.5.2 Disjunctions for Linear Rational Arithmetic
  7.6 Synthesis for Linear Integer Arithmetic
    7.6.1 Solving Equality Constraints for Synthesis
    7.6.2 Solving Inequality Constraints for Synthesis
    7.6.3 Disjunctions in Presburger Arithmetic
    7.6.4 Optimizations used in the Implementation
  7.7 Synthesis Algorithm for Parametrized Presburger Arithmetic
  7.8 Synthesis for Sets with Size Constraints
  7.9 Implementation and Experience
  7.10 Further Related Work

8 Interactive Synthesis of Code Snippets
  8.1 Motivation
  8.2 Examples
  8.3 From Scala to Types
  8.4 Type Inhabitation in the Ground Applicative Calculus
    8.4.1 Type Inhabitation in the Ground Applicative Calculus
  8.5 Quantitative Applicative Ground Inhabitation
    8.5.1 Finding the Best Type Inhabitant
  8.6 Quantitative Inhabitation for Generics
  8.7 Subtyping using Coercions
  8.8 InSynth Implementation and Evaluation

9 Conclusions
  9.1 Future Work
    9.1.1 Complete Reasoner for Sets and Multisets
    9.1.2 Software Synthesis by Combining Subroutines
    9.1.3 Additional Theories for Complete Synthesis

A Appendix A

Bibliography

Curriculum Vitae

List of Figures

2.1 Linear Integer Arithmetic
2.2 Java code that removes an element from a list
2.3 Quantifier-Free Multiset Constraints with Cardinality Operator
2.4 Algorithm for reducing multiset formulas to sum normal form
2.5 Quantifier-free Presburger Arithmetic and an extension with the Star Operator

3.1 Phases in checking formula satisfiability. MUNCH translates the input formula through several intermediate forms, preserving satisfiability in each step
3.2 Example run of MUNCH on a multiset formula
3.3 Running times for checking verification conditions that arise in proving correctness of container data structures
3.4 Description of the verification conditions proved using MUNCH

4.1 Example constraints in our class
4.2 Quantifier-Free Formulas about Collections with Cardinality Operator
4.3 Syntax of Mixed Integer-Rational Linear Arithmetic with Star
4.4 Verification condition for verifying that by inserting an element into a list, the size of the list does not decrease. The variables occurring in the formula have the following types: nodes, alloc, tmp, e, content, content1 :: Set⟨E⟩, data :: E → E
4.5 Verification condition for verifying that by inserting an element into a list, the size of a list increases by one. The variables occurring in the formula have the following types: nodes, alloc, tmp :: Set⟨E⟩, content, content1, e :: Multiset⟨E⟩, data :: E → E
4.6 Logic of multisets, cardinality operator, and multiset images of sets
4.7 Algorithm for eliminating function symbols

5.1 Program COUNTLEAVES: counting the leaves in a binary tree
5.2 Multiset abstraction of program COUNTLEAVES
5.3 Termination condition for program ABSCOUNTLEAVES
5.4 Rewrite system for computing negation normal form
5.5 Syntax for Multiset Constraints over Preordered Sets (POSSUM)
5.6 An element and its m-witnesses (with respect to ≺)
5.7 An illustration for the rules described by formulas (5.4) and (5.5)
5.8 An example of two redundant chains in α0

6.1 Fragment of insertion into a tree
6.2 Verification condition for Fig. 6.1
6.3 Negation of Fig. 6.2, and consequences on shared sets
6.4 Boolean Algebra with Presburger Arithmetic (BAPA)
6.5 Monadic Second-Order Logic of Finite Trees (FT)
6.6 Two-Variable Logic with Counting (C²)
6.7 Bernays-Schönfinkel-Ramsey Fragment of First-Order Logic

7.1 Successive Elimination of Variables for Synthesis
7.2 Algorithm for Synthesis Based on Integer Equations
7.3 Algorithm for Computing one Solution of the Equation
7.4 A Logic of Sets and Size Constraints (BAPA)
7.5 Algorithm for synthesizing a function Ψ such that F[~x := Ψ(~a)] holds, where F has the syntax of Figure 7.4
7.6 Interaction of Comfusy with scalac, the Scala compiler. Comfusy takes as an input the abstract syntax tree of a Scala program and rewrites calls to choose to syntax trees representing the synthesized function
7.7 Measurement of compile times: without applying synthesis (scalac), with synthesis but with no call to Z3 (w/ plugin), and with both synthesis and compile-time checks activated (w/ checks). All times are in seconds

8.1 InSynth displays suitable code fragments
8.2 Polymorphic behavior of InSynth
8.3 Calculus for the Ground Types
8.4 Rules for Generic Types used by Our Algorithm
8.5 The Search Algorithm for Quantitative Inhabitation for Generic Types
8.6 Basic algorithm for synthesizing code snippets

1 Introduction

Software correctness research has a long history. In the last decade we have witnessed significant progress in verification technologies, leading to tools that are applied to large software applications of industrial relevance. The following are examples of tools that are successfully used for verification and finding bugs in software:

• ARMC [Podelski and Rybalchenko(2007a)] is a model checker based on abstraction refinement and Constraint Logic Programming, and is used for reachability and termination properties. It was applied to verify hardware designs, as well as to model check real-time properties of the European train control system.

• BLAST [Beyer et al.(2007)Beyer, Henzinger, Jhala, and Majumdar] is a software model checker for C programs, based on lazy abstraction. Using BLAST, memory-safety properties of various benchmarks of C programs were proved. In addition, it was also used as a testing framework: tests were derived from counterexamples.

• CBMC [Clarke et al.(2004)Clarke, Kroening, and Lerda] is a bounded model checker for C and C++ programs. CBMC was used in various applications to increase software reliability: by applying CBMC, it is possible to detect the cause of errors and the worst-case number of loop iterations. It was also used to verify Linux device drivers, as well as to detect security-relevant bugs in WIN32 binaries.

• HAVOC [Lahiri et al.(2009)Lahiri, Qadeer, Galeotti, Voung, and Wies] is a tool for specifying and verifying properties of programs written in C. It is based on a logic that allows reasoning about lists and arrays. HAVOC was used to check properties in the Microsoft Windows operating system on more than 300 thousand lines of code and 1500 procedures.

• Jahob [Zee et al.(2008)Zee, Kuncak, and Rinard] is a verification system for programs written in a subset of Java. Jahob is primarily used for verification of container data structures, since it relies on decision procedures that can reason automatically about collections [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard, Kuncak et al.(2005)Kuncak, Nguyen, and Rinard]. Jahob relies on the tool Bohne [Podelski and Wies(2010)] to automatically infer loop invariants for heap-manipulating programs.

• SLAM [Ball et al.(2004)Ball, Cook, Levin, and Rajamani] is a tool for static verification of device drivers. The paper on verification of C programs using the SLAM toolkit [Ball et al.(2001)Ball, Majumdar, Millstein, and Rajamani] recently won the Most Influential PLDI Paper award. SLAM is based on predicate abstraction and is integrated into the Static Driver Verifier Research Platform, which is shipped with the Windows Driver Kit.

• SLAyer [Berdine et al.(2011)Berdine, Cook, and Ishtiaq] is a tool for proving memory-safety properties about linked data structures, based on separation logic [Reynolds(2002)]. It has been applied to industrial software components of up to 100,000 lines of code.

• Spec# [Barnett et al.(2004b)Barnett, Leino, and Schulte] is an extension of the C# programming language. It is integrated into the Microsoft Visual Studio development environment. It checks method contracts in the form of pre- and postconditions at run-time and emits warnings. Spec# is also a static program verifier: it generates verification conditions from a program and then invokes a solver to verify them.

A common aspect of these tools is that they encode the verification task into the problem of reasoning about logical formulas. They translate both the desired properties and the program semantics into a formula F, using techniques such as verification condition generation, symbolic execution, or predicate abstraction. Once a formula F is obtained, there are two questions in which we are usually interested. The first question is whether the formula F is satisfiable. That means that we are asking whether there is a model in which F evaluates to true. For example, the formula x ≤ y is clearly satisfiable, by letting x be 1 and y be 2. In contrast, the formula x ≤ y ∧ x + z > y + z is never satisfiable; we call it unsatisfiable. The other question that we are interested in is validity. A formula is valid if it evaluates to true in every model. The formula x ≤ y is not valid, but the formula x ≤ y ⇒ x + z ≤ y + z is a valid formula. Validity and satisfiability are related: a formula is valid iff its negation is unsatisfiable.
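To make this duality concrete, the following short derivation (an added worked example using the formulas above, written in LaTeX notation) checks the validity of the implication by showing that its negation has no model:

\begin{align*}
F &\;=\; (x \le y \;\Rightarrow\; x + z \le y + z) \\
\neg F &\;\equiv\; x \le y \;\wedge\; x + z > y + z
        \;\equiv\; x \le y \;\wedge\; x > y
        \;\equiv\; \mathit{false}
\end{align*}

The middle step cancels z on both sides of the strict inequality. Since ¬F is unsatisfiable, F evaluates to true in every model, that is, F is valid.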

We believe that the mentioned verification tools are successful in part because they use automated reasoners to automatically answer questions about satisfiability and validity. We call such reasoners provers, or solvers. If a tool takes a formula in a certain logic and answers the satisfiability question, we usually call such a tool a solver. The current state-of-the-art satisfiability modulo theories (SMT) solvers are specialized for the satisfiability problem of particular logics (often defined by first-order theories) [de Moura and Bjørner(2008a), Barrett and Tinelli(2007), Bruttomesso et al.(2010)Bruttomesso, Pek, Sharygina, and Tsitovich, Bruttomesso et al.(2008)Bruttomesso, Cimatti, Franzén, Griggio, and Sebastiani]. In addition, these solvers are also efficient in combining theories, assuming that the requirements of the Nelson-Oppen combination procedure are met. However, solvers face challenges when they need to reason about quantified formulas. The other type of tools are the so-called provers. They are mostly optimized to find a proof of unsatisfiability fast. The current provers [Riazanov and Voronkov(2002), Weidenbach et al.(2009)Weidenbach, Dimova, Fietzke, Kumar, Suda, and Wischnewski, Schulz(2002), Korovin(2009)] are mostly based on resolution [Robinson(1965)], and they are general-purpose tools, i.e., not theory-specific. To reason about a certain theory, one needs to add the theory axioms. Provers can also efficiently handle quantifiers.

Solvers are based on decision procedures. A decision procedure is an algorithm that takes a formula in a certain logic and then checks whether the formula is satisfiable. Decision procedures are a rich field of study [Bradley and Manna(2007), Kroening and Strichman(2008)]. There are several logics of particular interest for proving software correctness. Standard propositional logic is a logic that does not contain any functions or quantifiers. The formulas in propositional logic are formed from boolean variables and boolean connectives. If we allow quantifiers and functions, we obtain a more expressive logic. However, this logic is undecidable, even if we restrict quantification to range only over first-order variables. In that case the logic is called first-order logic. There are various fragments of first-order logic that are decidable, such as, for example, linear integer arithmetic. It is a logic of the natural numbers with addition, which was proved to be decidable already in [Presburger(1929)]. In his honor this logic is sometimes also called Presburger arithmetic. The satisfiability problem in many other logics can be reduced to reasoning in Presburger arithmetic. For example, for every formula expressing properties about sets in the presence of the cardinality operator there exists an equisatisfiable Presburger arithmetic formula. Extending linear integer arithmetic with unrestricted multiplication results in an undecidable logic, even for quantifier-free formulas [Matiyasevich(1970)].
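As a small illustration of such a reduction (an added sketch of the standard Venn-region encoding; the sets A, B and the counters k10, k01, k11 are illustrative names rather than notation used later in this thesis), consider the set formula |A ∪ B| = |A| + |B| ∧ |A| > 0. Introducing one natural-number variable per Venn region, k10 = |A \ B|, k01 = |B \ A|, and k11 = |A ∩ B|, yields an equisatisfiable Presburger arithmetic formula:

\begin{align*}
&(k_{10} + k_{01} + k_{11}) = (k_{10} + k_{11}) + (k_{01} + k_{11}) \;\wedge\; k_{10} + k_{11} > 0\\
&\quad\equiv\;\; k_{11} = 0 \;\wedge\; k_{10} + k_{11} > 0,
\end{align*}

which is satisfiable (for example, k10 = 1, k01 = 0, k11 = 0), matching the intuition that the original formula forces A and B to be disjoint while A is non-empty.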

Despite the success of the above-mentioned tools, there are still certain restrictions that limit their applicability:

1. All these tools reason about an abstraction of the system and therefore about simplified properties. There is a trend in more recent tools, like Boogie [Barnett et al.(2005)Barnett, Chang, DeLine, Jacobs, and Leino] or Jahob [Zee et al.(2008)Zee, Kuncak, and Rinard], to reason about more complex properties. However, reasoning about more complex properties requires more complex theories, which are either undecidable, of a very high complexity, or of unknown decidability. To tackle that problem, tools like Boogie use an incomplete axiomatization to reason about data structures. Because of this, there is clearly a need for new decision procedures. We believe that having a decision procedure for a problem helps to better understand the problem and gives a better insight into its structure, even if the decision procedure is of a very high complexity. As shown in Chapter 3, by analyzing the decision procedure and the structure of the input problems, we can develop a more efficient, even if incomplete, technique. In our current experience, this technique scales better than the complete algorithm based on a decision procedure.

2. When combining several theories, most tools use the Nelson-Oppen combination procedure [Nelson and Oppen(1980)], which has strong restrictions. It requires that the theories are stably infinite: if a formula is satisfiable, then it must also have an infinite model. In addition, the theories must be disjoint, i.e., their signatures can share only the equality symbol. In the Nelson-Oppen procedure we can therefore combine neither theories that admit only finite models, nor theories that share more than equality. Recent work has shown how to relax the stable-infiniteness requirement [Tinelli and Zarba(2003), Fontaine(2009), Jovanovic and Barrett(2010)]. However, the signatures still need to be disjoint. This is a problem, for instance, when proving properties about container data structures. As we demonstrate in Chapter 6, for many verification tasks we generate formulas that, after purification, still share the set operators, so we cannot apply the standard procedure. The general problem of combining non-disjoint theories was also studied [Tinelli and Ringeissen(2003)]. However, there is still less research in this direction than on tools based on the Nelson-Oppen approach.

3. In many cases, new decision procedures are motivated by program verification problems. When proving code correctness, one usually identifies a theory that is the most suitable for describing the needed properties. If the decidability of this theory is not known, or the existing tools do not scale, then the focus of research moves to developing or improving a decision procedure for the given logic. The resulting algorithm is usually non-trivial. We believe that it is therefore worthwhile to consider additional areas where we can leverage these algorithms. In Chapters 7 and 8 we show how a decision procedure can be modified not only to prove formulas, but also to output code that computes values satisfying given constraints, or code that has an expected type. This results in an approach for synthesis of code based on solvers and provers.
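As a preview of this last direction (Chapters 7 and 8), the following Scala sketch shows the style of specification that such a modified decision procedure can turn into executable code. The choose stub and the decomposition of seconds into hours, minutes, and seconds are illustrative, in the spirit of the Comfusy tool described in Section 1.1.3 and Chapter 7, rather than a verbatim excerpt from it:

object SynthesisPreview {
  // Stub standing in for the synthesis construct; the compiler plug-in
  // replaces such a call with code that computes the outputs from the inputs.
  def choose(spec: (Int, Int, Int) => Boolean): (Int, Int, Int) = ???

  // Specification: decompose a number of seconds into hours, minutes, and seconds.
  def toHMS(totalSeconds: Int): (Int, Int, Int) =
    choose { (h: Int, m: Int, s: Int) =>
      h * 3600 + m * 60 + s == totalSeconds &&
      0 <= h && 0 <= m && m < 60 && 0 <= s && s < 60
    }

  // Code that a synthesis procedure for linear integer arithmetic can emit,
  // valid under the precondition totalSeconds >= 0 that the procedure also reports.
  def toHMSSynthesized(totalSeconds: Int): (Int, Int, Int) =
    (totalSeconds / 3600, (totalSeconds % 3600) / 60, totalSeconds % 60)
}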

1.1 Contributions

We next summarize the contributions of this dissertation in automated reasoning about collections, combining non-disjoint theories, and software synthesis.

1.1.1 Reasoning about Collections

When reasoning about container data structures that can hold duplicate elements, multisets are the obvious choice of abstraction. In this approach, the need for cardinality constraints naturally arises in order to reason about the number of elements in the data structure. However, before the work on this dissertation started, the decidability and the complexity of multiset constraints with cardinalities were not known. The contributions of our work on reasoning about collections are the following:

1. We defined a highly expressive language that allows reasoning about multisets and cardinalities; an example constraint in this language is shown after this list. Because sets are a special case of multisets, this language subsumes languages for reasoning about sets with cardinality constraints.

2. We showed the decidability of this language in [Piskac and Kuncak(2008a)]. We described a reduction to an extension of linear integer arithmetic.


3. We defined and further studied this new extension of linear integer arithmetic (the so-called LIA∗ logic). In [Piskac and Kuncak(2008c)] we developed an algorithm for checking the satisfiability of formulas that belong to LIA∗. We proved that the satisfiability question for LIA∗ is an NP-complete problem.

4. We also answered the open problem stated by Lugiez [Lugiez(2005)] of whether the logic containing quantified multiset constraints with cardinalities is decidable. We proved that adding quantifiers yields undecidability.

5. We developed a tool for reasoning about multisets with cardinality constraints called MUNCH [Piskac and Kuncak(2010)] and tested it on formulas derived from a software analysis tool. We were able to write simpler and more precise specifications using multisets instead of sets.

6. Common to all previously described results is that we did not place any restrictions on the domain of the multisets. However, in practice the domain set (i.e., the set used for populating collections) is often known and fixed. We showed that reasoning about collections defined over totally ordered sets (specifically, the integers) still remains in the class of NP-complete problems [Kuncak et al.(2010c)Kuncak, Piskac, and Suter].

7. To support reasoning about further data structures, we introduced an extension of the logic of sets and multisets with the ability to compute direct and inverse relations and function images. We established decidability and complexity bounds for these extended logics in [Yessenov et al.(2010)Yessenov, Piskac, and Kuncak].

8. We developed a new decision procedure for reasoning about multiset orderings [Piskac and Wies(2011)], which are among the most powerful orderings used for proving termination of programs. We considered multiset orderings defined over an arbitrary preordered set. The decidability of this logic was not previously known.
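As an illustration of the constraints covered by these results (the example announced in item 1; the variable names are illustrative, in the spirit of the data structure specifications treated in Chapters 2 and 3), consider the verification condition stating that inserting an element into a container abstracted by a multiset increases its size by one:

\[
\mathit{content}_1 = \mathit{content} \uplus \{e\} \;\Rightarrow\; |\mathit{content}_1| = |\mathit{content}| + 1
\]

This formula is valid over multisets, whereas the corresponding set formula with ∪ in place of ⊎ is not, because the cardinality stays unchanged when e already occurs in content.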

1.1.2 Combining Non-disjoint Theories

It is often the case that a single theory is not expressive enough on its own to accurately express the required verification conditions. For example, to describe the insertion of an element into an imperative linked list data structure, we need transitive closure, unconstrained functions defined on sets, and the cardinality operator. As we argued before, the Nelson-Oppen combination procedure is too restrictive if the theories share more than equality as a common operator. In this particular example, after purification, we obtain a conjunction of formulas belonging to the logics WS1S, C², and BAPA. In addition to equality, these logics also share the set operators. In [Wies et al.(2009)Wies, Piskac, and Kuncak] we have presented a new combination technique for theories that share sets. The combination procedure reduces them to a common shared theory, the BAPA logic [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard]. BAPA is a logic of sets with cardinality constraints. We call such theories BAPA-reducible. We showed that the logics


1. Boolean Algebra with Presburger Arithmetic [Kuncak and Rinard(2007)],

2. weak monadic second-order logic of two successors WS2S [Thatcher and Wright(1968)],

3. two-variable logic with counting C² [Pratt-Hartmann(2005)],

4. Bernays-Schönfinkel-Ramsey class [Börger et al.(1997)Börger, Grädel, and Gurevich], and

5. quantifier-free multisets with cardinality constraints [Piskac and Kuncak(2008c), Piskac and Kuncak(2008a)]

all meet the conditions of our combination technique. Consequently, we obtain the decidability of the quantifier-free combination of formulas in these logics.
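To give a concrete flavor of such shared-set combinations (a schematic example; the predicate F_WS2S and the particular conjuncts are illustrative rather than taken verbatim from Chapter 6), a verification condition about inserting a leaf into a tree may purify into one conjunct per component logic:

\[
F_{\mathrm{WS2S}}(r, L) \;\wedge\; x \notin L \;\wedge\; L' = L \cup \{x\} \;\wedge\; |L'| = |L| + 1
\]

Here F_WS2S(r, L) is a monadic second-order formula expressing that L is the set of leaves of the tree rooted at r, and the remaining conjuncts form a BAPA formula. The two parts share the set variables L and L', not just equality, so the classical Nelson-Oppen procedure does not apply directly; the combination technique of Chapter 6 instead reduces each conjunct to a BAPA formula over the shared sets and checks the satisfiability of their conjunction.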

1.1.3 Software Synthesis

Software synthesis aims to generate software satisfying a given specification. Instead of writing the software directly, the programmer provides a specification, from which a synthesis tool then automatically generates code. Consequently, this code is correct by construction and there is no need to verify it. Moreover, this way programmers can also be more productive. The downside of software synthesis is that it is difficult to write complete specifications, maybe even harder than to write the code itself. Therefore, code synthesis should be used as an aid that helps programmers write code more efficiently, rather than as a stand-alone tool.

The use of formal techniques for software synthesis was suggested early on [Manna and Waldinger(1980)]. However, only recently could the idea be implemented efficiently. In the last decade we have witnessed a breakthrough in the research on decision procedures and automated reasoning. Applying our insights from decision procedures, we have developed a Scala plug-in called Comfusy [Kuncak et al.(2010b)Kuncak, Mayer, Piskac, and Suter, Kuncak et al.(2010a)Kuncak, Mayer, Piskac, and Suter]. Given a specification for a code fragment, Comfusy generates the code satisfying it, together with the preconditions required for the existence of the solution.

We have also developed a tool called InSynth [Gvero et al.(2011)Gvero, Kuncak, and Piskac], which generates code fragments based on type constraints. While Comfusy constructs code based on a model for the specification, InSynth derives a proof of unsatisfiability of the constraints. Based on that proof, InSynth generates a code snippet and repeats the process. InSynth is an interactive tool in the sense that it outputs several snippets and the user can choose the desired one.

Altogether, our contributions to software synthesis are the following:

1. We describe an approach for deploying algorithms for synthesis within programming languages. Given a specification and a separation of variables into output variables and parameters, our procedure constructs

(a) a program that computes the values of outputs given the values of inputs, and
(b) the weakest among the conditions on inputs that guarantees the existence of outputs (the domain of the given relation between inputs and outputs).

2. We describe a methodology to convert decision procedures for a class of formulas into synthesis procedures that can rewrite the corresponding class of expressions into efficient executable code. Most existing procedures based on quantifier elimination are directly amenable to our approach.

3. We describe synthesis procedures for propositional logic, rational arithmetic and linear integer arithmetic. We developed an algorithm that efficiently handles equalities in linear integer arithmetic.

4. We show that the synthesis for integer arithmetic can be extended to the non-linear case where coefficients multiplying output variables are expressions over parameters that are known only at run-time.

5. We also described and implemented a synthesis procedure for Boolean Algebra with Presburger Arithmetic (BAPA), a logic of constraints on sets and their sizes; a sketch of such a specification appears after this list.

6. We developed a tool called InSynth, which is an interactive synthesis tool based on parametrized types, test cases, and weights indicating user preferences. Its algorithmic foundation is a variation of ordered resolution and intuitionistic calculus. We have found InSynth to be fast enough for interactive use and helpful in synthesizing meaningful code fragments.
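As announced in item 5, the following Scala sketch shows the style of set-with-size specification that the BAPA synthesis procedure accepts. Splitting a set into two parts of almost equal size is an illustrative task, and the choose stub stands in for the construct that the synthesizer rewrites at compile time:

object BapaSynthesisPreview {
  // Stub for the synthesis construct; the actual system replaces the call with
  // code that computes a1 and a2 from a (for example, by distributing the
  // elements of a alternately between the two parts).
  def choose[A](spec: (Set[A], Set[A]) => Boolean): (Set[A], Set[A]) = ???

  // Split a set into two disjoint parts whose sizes differ by at most one.
  def split[A](a: Set[A]): (Set[A], Set[A]) =
    choose { (a1: Set[A], a2: Set[A]) =>
      (a1 union a2) == a &&
      (a1 intersect a2).isEmpty &&
      a1.size - a2.size <= 1 &&
      a2.size - a1.size <= 1
    }
}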

1.2 Outline of the Dissertation

The rest of this thesis is organized as follows:

Chapter 2 describes a decision procedure for reasoning about multisets with cardinality constraints. It is based on the papers [Piskac and Kuncak(2008a)] and [Piskac and Kuncak(2008c)]. This chapter merges the papers, provides a uniform notation, and expands all the main proofs. In addition, it also introduces Hilbert bases and describes a connection between the computation of semilinear sets and a Hilbert basis. In Appendix A we also prove the background theorems about the number of generators of an integer cone.

Chapter 3 describes an implementation of a reasoner for sets and multisets with cardinality constraints. This chapter is based on the tool description presented in [Piskac and Kuncak(2010)].

Chapter 4 describes extensions of the logic defined in Chapter 2. First we describe an extension that leads to a generalized framework, as introduced in [Piskac and Kuncak(2008b)]. In this new framework we can also reason about fractional collections. In this chapter we provide additional examples. The other extension was introduced in [Yessenov et al.(2010)Yessenov, Piskac, and Kuncak]. It extends the logic of Chapter 2 with function symbols. Here we also provide an extended proof for the complexity result.

Chapter 5 is based on [Piskac and Wies(2011)]. It defines multiset orderings, which are used in proving termination of programs. We consider multiset orderings defined over an arbitrary preordered set. This way we can prove properties, for instance, of multiset orderings defined over a multiset of trees and the subtree relation; this ordering is not total. This chapter contains additional explanations and more detailed proofs of all main theorems from [Piskac and Wies(2011)].

Chapter 6 contains an extended version of [Wies et al.(2009)Wies, Piskac, and Kuncak]. We describe a new combination procedure for non-disjoint theories. The combination procedure is based on a reduction to the theory of sets with cardinality constraints (BAPA). This chapter contains additional examples and the extended proofs for most of the reduction procedures.

Chapter 7 introduces complete functional synthesis. We describe how to convert a decision procedure into a synthesis procedure. Based on a given specification, the described synthesis procedure always finds corresponding code. In addition, it also outputs the preconditions needed for a solution to exist. This chapter combines [Kuncak et al.(2010b)Kuncak, Mayer, Piskac, and Suter] and [Kuncak et al.(2010a)Kuncak, Mayer, Piskac, and Suter].

Chapter 8 introduces interactive synthesis of code snippets. The chapter is based on [Gvero et al.(2011)Gvero, Kuncak, and Piskac] as well as recent work under submission. Given an incomplete program and a program point at which we invoke InSynth, a specification is derived based on type constraints, and our tool outputs possible code snippets suitable for that program point. The user then interactively selects the desired snippet. This chapter, in addition to [Gvero et al.(2011)Gvero, Kuncak, and Piskac], contains the algorithmic foundations of the calculus used in InSynth.

Chapter 9 concludes the dissertation and highlights selected future work directions.

2 Decision Procedures for Multisets with Cardinality Constraints

This chapter introduces a language for reasoning about collections with cardinality constraints. Motivated by applications in software verification, we consider the standard operators on collections, such as union, intersection, and difference. In addition, we also consider some operators that are specific to reasoning about multisets, for instance the disjoint union (⊎) operator. We present the formal syntax and semantics (Sec. 2.4). In Section 2.6, we show that the satisfiability problem in this logic is decidable, and we construct an algorithm for answering the satisfiability question. In the process, we generalize integer linear arithmetic by adding a star (integer cone) operator to the language. By analyzing the satisfiability problem for integer linear arithmetic with a star operator, Section 2.7 demonstrates that the problem is in NP, and presents the second algorithm for satisfiability of multiset constraints.

2.1 Motivation

Collections of objects are fundamental and ubiquitous concepts in computer science and mathematics. It is therefore not surprising that they often arise in software analysis and verification, as well as in interactive theorem proving. Moreover, such constraints often contain cardinality bounds on collections. There is extensive work on decision procedures for reasoning about sets of objects, in which cardinality constraints may also appear [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard, Kuncak and Rinard(2007)]. In that work, the authors characterized the complexity of both quantified and quantifier-free constraints.

In many applications [Bouajjani et al.(2011)Bouajjani, Drăgoi, Enea, and Sighireanu], it is more appropriate to use multisets (bags) rather than sets as a way of representing collections of objects. The content of a data structure is abstracted as a multiset. Cardinality constraints in such abstractions may arise if there is a need to count the number of elements in the data structure. It is therefore natural to consider constraints on multisets along with cardinality bounds. There is a range of useful operations and relations on multisets, beyond the traditional disjoint union and difference. These operations are all definable using quantifier-free Presburger arithmetic (QFPA) formulas on the number of occurrences of each element in the multiset. This chapter describes such a language that admits reasoning about integers, sets and multisets, supports standard set and multiset operations as well as any QFPA-definable operation on multisets (including the conversion of a multiset into a set), and supports a cardinality operator that counts the total number of elements.

Previously, Zarba [Zarba(2002a)] considered decision procedures for quantifier-free multisets but without the cardinality operator, showing that satisfiability reduces to quantifier-free pointwise reasoning. However, the cardinality operator makes such a reduction impossible.

Lugiez studied multiset constraints in the context of a more general result on multitree automata [Lugiez(2005)] and proved the decidability of quantified constraints with a weaker form of cardinality operator that counts only distinct elements in a multiset. He also established decidability results for certain quantifier-free expressible constraints with the cardinality operator. Regarding quantified constraints with the general cardinality operator, [Lugiez(2005), Section 3.4] states “the status of the complete logic is still an open problem”. In this chapter, we resolve this open problem, showing that the quantified constraints with cardinality are undecidable (Section 2.9). The decidable quantified constraints in [Lugiez(2005)] allow quantifier elimination and the resulting formulas are quantifier-free constraints, which can then be expressed using the decidable constraints in this chapter.

2.2 Definition and Applications of Multisets

Definition of multisets. Multisets are collections of objects where an element can occur several times. They can be seen as “sets with counting”. For example, on the set level {a,a} = {a}. However, in the multiset interpretation, these are two different multisets. We represent multisets (bags) as well as sets by their characteristic functions. A multiset m is a function E → N, where E is the universe and N is the set of non-negative integers. The value m(e) is the multiplicity (the number of occurrences) of an element e in a multiset m. We assume that the domain E is fixed and finite but of unknown size. We represent sets within our formulas as special multisets m for which m(e) = 0 ∨ m(e) = 1 for all elements e ∈ E.

Applications of set and multiset constraints. Sets and multisets directly arise in verification conditions for proving properties of programs in languages and paradigms such as SETL [Schwartz(1973)] and Gamma [Banâtre and Métayer(1993), Page 103]. In programming languages such as Java, data abstraction can be used to show that data structures satisfy set specifications, and then techniques based on sets become applicable for verifying data structure clients [Kuncak(2007), Nguyen et al.(2007)Nguyen, David, Qin, and Chin]. To validate properties of programs with lists and data, a common approach is to abstract the content of a data structure as a multiset [Bouajjani et al.(2011)Bouajjani, Drăgoi, Enea, and Sighireanu].

Isabelle [Nipkow et al.(2005)Nipkow, Wenzel, Paulson, and Voelker], an interactive theorem prover, as well as KIV [Balser et al.(2000)Balser, Reif, Schellhorn, Stenzel, and Thums] (Karlsruhe Interactive Verifier) and Why [Filliâtre and Marché(2007)], all contain libraries for sets and multisets. The formulas appearing there may also contain cardinality constraints. A decision procedure for reasoning about multisets with cardinality constraints can increase the automation within such systems.

Linear integer arithmetic. Linear integer arithmetic plays a vital role in reasoning about multisets. After several transformation steps, a multiset formula is reduced to an equisatisfiable linear integer arithmetic formula. We sometimes also use the name Presburger arithmetic. A grammar for linear integer arithmetic is given in Figure 2.1.

F ::= A | F ∧ F | ¬F
A ::= T ≤ T | T = T
T ::= x | c | T + T | c · T | ite(F, T, T)
x - integer variable; c - integer constant

Figure 2.1: Linear Integer Arithmetic

Linear integer arithmetic admits the addition of variables and the multiplication of a variable by a constant, but does not allow the multiplication of two variables. The satisfiability question in linear integer arithmetic is decidable. Quantified linear integer arithmetic also admits quantifier elimination [Cooper(1972)].
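As a concrete illustration, the following small sketch uses the Python bindings of the Z3 SMT solver (the z3-solver package, assumed to be installed; the formula itself is invented for this example) to check the satisfiability of a linear integer arithmetic formula containing an ite term:

from z3 import Ints, Solver, If, And

x, y, z = Ints('x y z')
# z = ite(x <= y, x, y)  /\  3*z + 2 <= y   -- a formula in the language of Figure 2.1
s = Solver()
s.add(And(z == If(x <= y, x, y), 3 * z + 2 <= y))
print(s.check())   # sat
print(s.model())   # one satisfying assignment, e.g. x = 0, y = 2, z = 0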

2.3 Introduction to Logic through an Example

In software analysis and verification it is often desirable to abstract the content of mutable and immutable data structures into collections, to raise the level of abstraction when reasoning about programs. Abstracting linked structures as sets and relations enables high-level reasoning in verification systems, such as Jahob [Kuncak(2007)]. For collections that may contain duplicates, abstraction using multisets is more precise than abstraction using sets. Our goal is to investigate the decidability of a logic allowing reasoning about multisets with cardinality constraints. Moreover, if it is decidable, we wish to describe decision procedures that would enable reasoning about such precise abstractions, analogously to the way current decision procedures enable reasoning with set abstraction.

Figure 2.2 contains Java code that removes an element from a list. To illustrate the role of the cardinality operator, we assume the following declarations:

public class Node {
    public Object data;
    public Node next;
}
class SinglyLinkedList {
    private Node first;
    int size;

    public void remove(Object d0) {
        Node f = first;
        if (f.data == d0) {
            Node second = f.next;
            f.next = null;
            first = second;
        } else {
            Node prev = first;
            Node current = prev.next;
            while (current.data != d0) {
                prev = current;
                current = current.next;
            }
            Node nxt = current.next;
            prev.next = nxt;
            current.next = null;
        }
        size = size - 1;
    }
}

Figure 2.2: Java code that removes an element from a list

Data structure implementations often contain integer size fields. With s we denote a data structure size field and with L an abstract multiset field denoting the data structure content. All data structure operations need to preserve the size invariant s = |L|. When verifying an insertion of an element into a container, we therefore obtain verification conditions such as |L| = s ∧ |e| = 1 → |L ⊎ e| = s + 1. To show that a deletion also preserves the size invariant, we need to additionally annotate the code in Figure 2.2, by adding preconditions and postconditions:

requires d0 ≠ null ∧ d0 ∈ L
ensures L = old L \ {d0} ∧ d0 ∈ old L

Using those annotations, we generate verification conditions such as

D ⊆ L ∧ |D| = 1 → |L \ D| = |L| − 1     (2.1)

Here D denotes a multiset containing only the element d0. In this chapter, we will describe how one can prove such verification conditions. We will use formula (2.1) as an example for illustrating how our decision procedure works.

To capture operations on data structures, it is useful to have not only operators such as disjoint union (⊎) and set difference, but also an operator that, given multisets m1 and m2, produces a multiset m0 which is the result of removing from m1 all occurrences of elements that occur in m2. We can specify this requirement by saying that every element that appears in m2 cannot appear in m0, while all other elements of m1 stay unchanged in m0.


This can be expressed by the formula ∀e. (m2(e) = 0 → m0(e) = m1(e)) ∧ (m2(e) > 0 → m0(e) = 0). We call this operator multiset difference, denoted by m0 = m1 \\ m2. Our goal is to define a language that will support any such operation definable pointwise by quantifier-free Presburger arithmetic (QFPA) formulas.
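To make the pointwise definitions concrete, the following sketch represents multisets as Python dictionaries mapping elements to multiplicities and implements a few of the operations used in this chapter exactly as defined pointwise; the code and names are purely illustrative and are not part of any tool described in this thesis. It also checks verification condition (2.1) on one concrete instance.

# Illustrative sketch: multisets as dictionaries element -> multiplicity.
def mplus(m1, m2):       # disjoint union: m0(e) = m1(e) + m2(e)
    return {e: m1.get(e, 0) + m2.get(e, 0) for e in set(m1) | set(m2)}

def mminus(m1, m2):      # set difference: m0(e) = ite(m1(e) <= m2(e), 0, m1(e) - m2(e))
    return {e: max(m1.get(e, 0) - m2.get(e, 0), 0) for e in set(m1) | set(m2)}

def mdiff(m1, m2):       # multiset difference \\: m0(e) = ite(m2(e) = 0, m1(e), 0)
    return {e: (m1.get(e, 0) if m2.get(e, 0) == 0 else 0) for e in set(m1) | set(m2)}

def card(m):             # |m| = sum over all e of m(e)
    return sum(m.values())

def subseteq(m1, m2):    # m1 is included in m2 iff m1(e) <= m2(e) for every e
    return all(m1.get(e, 0) <= m2.get(e, 0) for e in set(m1) | set(m2))

# One concrete instance of verification condition (2.1):
L = {'a': 2, 'b': 1}     # list content, abstracted as a multiset
D = {'a': 1}             # the multiset containing only the removed element
assert subseteq(D, L) and card(D) == 1
assert card(mminus(L, D)) == card(L) - 1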

We next outline the main ideas of the first decision procedure for a logic that allows such constraints [Piskac and Kuncak(2008a)]. The running example will be formula (2.1). Here we demonstrate only the basic ideas of the algorithm; Sections 2.4.1 and 2.6 give the detailed description.

To prove validity of (2.1), we show that its negation,

D ⊆ L ∧ |D| = 1 ∧ |L \ D| ≠ |L| − 1     (2.2)

is unsatisfiable.

The main idea of the algorithm is to reduce a given formula into an equisatisfiable Presburger arithmetic formula. The reduction steps rely on the pointwise definitions of multiset operators.

First, by introducing a fresh multiset variable Y for L\D, we obtain the formula

Y = L \ D ∧ D ⊆ L ∧ |D| = 1 ∧ |Y| ≠ |L| − 1

We next perform a similar step and introduce fresh integer variables k1 and k2:

k1 = |Y| ∧ k2 = |L| ∧ Y = L \ D ∧ D ⊆ L ∧ 1 = |D| ∧ k1 ≠ k2 − 1

The number of occurrences of an element e in a multiset M is denoted by M(e). The cardinality of a multiset M is the number of all elements that occur in M: |M| = Σ_{e∈E} M(e), where E is some base set used for populating multisets. Applying this definition results in:

k1 ≠ k2 − 1 ∧ k1 = Σ_{e∈E} Y(e) ∧ k2 = Σ_{e∈E} L(e) ∧ 1 = Σ_{e∈E} D(e) ∧ Y = L \ D ∧ D ⊆ L

All the sums range over the set E, so we make this formula more compact by using vectors:

k1 ≠ k2 − 1 ∧ (k1, k2, 1) = Σ_{e∈E} (Y(e), L(e), D(e)) ∧ Y = L \ D ∧ D ⊆ L     (2.3)

We next apply the pointwise definition of multiset inclusion: D ⊆ L iff for every element e appearing in D there are at least as many elements in L:

D ⊆ L ⇔ ∀e. D(e) ≤ L(e)

Finally, we apply the definition of the set difference operator:

Y = L \ D ⇔ ∀e. Y(e) = ite(L(e) ≥ D(e), L(e) − D(e), 0)

Given a term t = ite(F, t1, t2), if the formula F evaluates to true, then t has the value t1, otherwise t2. We combine those definitions and obtain the formula:

k1 ≠ k2 − 1 ∧ (k1, k2, 1) = Σ_{e∈E} (Y(e), L(e), D(e)) ∧ ∀e. Y(e) = ite(L(e) ≥ D(e), L(e) − D(e), 0) ∧ D(e) ≤ L(e)     (2.4)

Formula (2.4) is in a normal form that we call sum normal form. It is a conjunction of three parts: a part containing a Presburger arithmetic formula, a part containing a sum expression, and a part containing universally quantified Presburger arithmetic formulas.

We can eliminate the multiset variables altogether and reduce the formula to an equisatisfiable formula in an extension of Presburger arithmetic over the integers. In Theorem 2.4 we formally prove the correctness of the reduction; here we just apply the final result. In sum normal form, multisets occur only in expressions of the form M(e). These expressions denote integer values. Therefore, for every expression M(e) we introduce a fresh integer variable. In the case of formula (2.4), for L(e) we introduce the integer variable l, while for Y(e) and D(e) we use y and d.

The summation in formula (2.4) is unbounded. Let us assume that the set E has N elements and, for every multiset expression M, let mi = M(ei). Using this new integer notation, the sum can be rewritten as ∃N ≥ 0. (k1, k2, 1) = Σ_{i=0}^{N} (yi, li, di). In addition, all integer variables yi, li and di have to satisfy the universally quantified Presburger arithmetic formula in (2.4). Applying these new variables, formula (2.4) reduces to

k1 ≠ k2 − 1 ∧ (k1, k2, 1) = Σ_{i=0}^{N} (yi, li, di) ∧ ∀i. yi = ite(li ≥ di, li − di, 0) ∧ di ≤ li     (2.5)

The variables yi, li and di all represent the number of occurrences of an element in a multiset, thus they are all implicitly assumed to be non-negative. Using this assumption, the universally quantified part can be further simplified and the formula becomes:

k1 ≠ k2 − 1 ∧ (k1, k2, 1) = Σ_{i=0}^{N} (yi, li, di) ∧ ∀i. li = yi + di

We next eliminate the variables li entirely from the formula. In this particular example this is easy to do. In fact, it can always be done, independently of how complex the formula under the universal quantifier is. This is possible because the set of non-negative solutions of a Presburger arithmetic formula has a special form: it is a so-called semilinear set [Ginsburg and Spanier(1966)]. Semilinear sets provide a finite description of the set of solutions of a formula. The semilinear sets are then further used to eliminate the sums, so the final result is again a Presburger arithmetic formula. Returning to our example, after eliminating li we obtain:

k1 ≠ k2 − 1 ∧ (k1, k2, 1) = Σ_{i=0}^{N} (yi, yi + di, di)


Because all variables are non-negative integers and the di sum to 1, all but one of the di must be zero. Without loss of generality let the non-zero one be the last:

k1 ≠ k2 − 1 ∧ (k1, k2, 1) = Σ_{i=0}^{N−1} (yi, yi, 0) + (yN, yN + 1, 1)

Let c denote c = Σ_{i=0}^{N} yi. It is an integer value and the only value in the formula that contains a link to the unbounded sum (N). Using c we can remove the sum from the formula:

k1 ≠ k2 − 1 ∧ ∃c. k1 = c ∧ k2 = c + 1

After eliminating c we obtain the formula

k1 ≠ k2 − 1 ∧ k2 = k1 + 1     (2.6)

Because (2.2) and (2.6) are equisatisfiable and (2.6) is unsatisfiable, we conclude that (2.1) is a valid formula.
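The final step can also be checked mechanically. The following sketch uses the Python bindings of the Z3 SMT solver (the z3-solver package, assumed available; purely illustrative) to confirm that formula (2.6) is unsatisfiable, which establishes the validity of (2.1):

from z3 import Ints, Solver, And, unsat

k1, k2 = Ints('k1 k2')
# Formula (2.6):  k1 != k2 - 1  /\  k2 = k1 + 1
s = Solver()
s.add(And(k1 != k2 - 1, k2 == k1 + 1))
assert s.check() == unsat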

2.4 Multiset Constraints

Figure 2.3 defines the constraints whose satisfiability we study. Our constraints combine multiset expressions and two kinds of QFPA formulas: outer linear arithmetic formulas, denoting relationships between top-level integer values in the constraint, and inner linear arithmetic formulas, denoting constraints specific to a given index element e ∈ E. Note that the syntax is not minimal; we subsequently show how many of the constructs are reducible to others.

Formulas (F) are propositional combinations of atomic formulas (A). Atomic formulas can be multiset equality and subset, a pointwise linear arithmetic constraint ∀e.F^in, or an atomic outer linear arithmetic formula (A^out). Outer linear arithmetic formulas are equalities and inequalities between outer linear arithmetic terms (t^out), as well as summation constraints of the form (u1,...,un) = Σ_F (t1,...,tn), which compute the sum of the vector expression (t1,...,tn) over all indices e ∈ E that satisfy the formula F. Outer linear arithmetic terms (t^out) are built using standard linear arithmetic operations starting from integer variables (k), cardinality expressions applied to multisets (|M|), and integer constants (C). The ite(F, t1, t2) expression is the standard if-then-else construct, whose value is t1 when F is true and t2 otherwise. Inner linear arithmetic formulas are linear arithmetic formulas built starting from non-negative integer constants (P) and the values m(e) of multiset variables at the current index e. This way inner terms t^in are always non-negative. In Chapter 4 we remove this requirement, so that inner terms can be both positive and negative, and we show that the resulting logic is still decidable and of the same complexity (Section 4.5.2). In this chapter, for ease of presentation, we investigate only the non-negative inner terms. This requirement does not restrict the construction of the inner formulas: they can contain arbitrary linear arithmetic expressions. As an illustration,


top-level formulas:
F ::= A | F ∧ F | ¬F
A ::= M = M | M ⊆ M | ∀e.F^in | A^out

outer linear arithmetic formulas:
F^out ::= A^out | F^out ∧ F^out | ¬F^out
A^out ::= t^out ≤ t^out | t^out = t^out | (t^out,...,t^out) = Σ_{F^in} (t^in,...,t^in)
t^out ::= k | |M| | C | t^out + t^out | C · t^out | ite(F^out, t^out, t^out)

inner linear arithmetic formulas:
F^in ::= A^in | F^in ∧ F^in | ¬F^in
A^in ::= t^in ≤ t^in | t^in = t^in
t^in ::= m(e) | P | t^in + t^in | P · t^in | ite(F^in, t^in, t^in)

multiset expressions:
M ::= m | ∅ | M ∩ M | M ∪ M | M ⊎ M | M \ M | M \\ M | set(M)

terminals:
m - multiset variable; e - index variable (fixed)
k - integer variable; C - integer constant; P - non-negative integer constant

Figure 2.3: Quantifier-Free Multiset Constraints with Cardinality Operator

the difference can be expressed using the sum: x = y − z is formulated as x + z = y. Multiset constraints contain some common multiset operations such as disjoint union, intersection, and difference, as well as the set operation that computes the largest set contained in a given multiset. These operations are provided for the sake of illustration; using the constraints ∀e.F^in it is possible to specify any multiset operation defined pointwise using a QFPA formula. Note also that it is easy to reason about individual elements of sets at the top level by representing them as multisets s such that |s| = 1. To express that s ∈ M we consider s as a singleton: s ⊆ M ∧ |s| = 1. If s is such a multiset representing an element and m is a multiset, we can count the number of occurrences of s in m with, for example, the expression Σ ite(s(e) = 0, 0, m(e)). We can also state that a multiset s is a set. This is done by requiring that every element appears at most once: ∀e. s(e) ≤ 1.

2.4.1 Reducing Multiset Operations to Sums

We next show that all operations and relations on multisets as a whole can be eliminated from the language of Figure 2.3. To treat operations as relations, we flatten formulas by introducing fresh variables for subterms and using the equality operator. Figure 2.4 summarizes this process.

Definition 2.1 (Sum normal form) A multiset formula is in sum normal form iff it is of the form

P ∧ (u1,...,un) = Σ_{e∈E} (t1,...,tn) ∧ ∀e.F

where P is a quantifier-free Presburger arithmetic formula without any multiset variables, and the multiset variables in t1,...,tn and F occur only within expressions of the form m(e), where m is a multiset variable and e is the fixed index variable.

INPUT: multiset formula in the syntax of Figure 2.3
OUTPUT: formula in sum normal form (Definition 2.1)

1. Flatten expressions that we wish to eliminate:
   C[e] ⇝ (x = e ∧ C[x])
   where e is one of the expressions ∅, m1 ∩ m2, m1 ∪ m2, m1 ⊎ m2, m1 \ m2, set(m1), |m1|, and where the occurrence of e is not already in a top-level conjunct of the form x = e or e = x for some variable x.

2. Reduce multiset relations to pointwise linear arithmetic conditions:
   C[m0 = ∅]         ⇝ C[∀e. m0(e) = 0]
   C[m0 = m1 ∩ m2]   ⇝ C[∀e. m0(e) = ite(m1(e) ≤ m2(e), m1(e), m2(e))]
   C[m0 = m1 ∪ m2]   ⇝ C[∀e. m0(e) = ite(m1(e) ≤ m2(e), m2(e), m1(e))]
   C[m0 = m1 ⊎ m2]   ⇝ C[∀e. m0(e) = m1(e) + m2(e)]
   C[m0 = m1 \ m2]   ⇝ C[∀e. m0(e) = ite(m1(e) ≤ m2(e), 0, m1(e) − m2(e))]
   C[m0 = m1 \\ m2]  ⇝ C[∀e. m0(e) = ite(m2(e) = 0, m1(e), 0)]
   C[m0 = set(m1)]   ⇝ C[∀e. m0(e) = ite(1 ≤ m1(e), 1, 0)]
   C[m1 ⊆ m2]        ⇝ C[∀e. (m1(e) ≤ m2(e))]
   C[m1 = m2]        ⇝ C[∀e. (m1(e) = m2(e))]

3. Group all top-level universally quantified conjuncts into one formula:
   F ∧ ∀e.F1 ∧ ... ∧ ∀e.Fq ⇝ F ∧ ∀e.(F1 ∧ ... ∧ Fq)

4. Let the current formula be F ∧ ∀e.Fu. The remaining steps operate only on F:

   (a) Express each pointwise constraint using a sum:
       C[∀e.F] ⇝ C[Σ_{e∈E} ite(F, 0, 1) = 0]

   (b) Express each cardinality operator using a sum:
       C[|m|] ⇝ C[Σ_{e∈E} m(e)]

   (c) Flatten any sums that are not already top-level conjuncts:
       C[(u1,...,un) = Σ_F (t1,...,tn)] ⇝ (w1,...,wn) = Σ_F (t1,...,tn) ∧ C[⋀_{i=1}^{n} ui = wi]

   (d) Eliminate conditions from sums:
       C[Σ_F (t1,...,tn)] ⇝ C[Σ_{e∈E} (ite(F,t1,0),...,ite(F,tn,0))]

   (e) Group all sums into one:
       ⋀_{i=1}^{q} (u^i_1,...,u^i_{n_i}) = Σ_{e∈E} (t^i_1,...,t^i_{n_i}) ⇝
       (u^1_1,...,u^1_{n_1},...,u^q_1,...,u^q_{n_q}) = Σ_{e∈E} (t^1_1,...,t^1_{n_1},...,t^q_1,...,t^q_{n_q})

5. Return P ∧ (u1,...,un) = Σ_{e∈E} (t1,...,tn) ∧ ∀e.Fu

Figure 2.4: Algorithm for reducing multiset formulas to sum normal form

Theorem 2.2 (Reduction to sum normal form) The algorithm in Figure 2.4 reduces, in polynomial time, any formula in the language of Figure 2.3 to a formula in sum normal form. The derived formula in sum normal form is at most linear in the size of the original formula.
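As an illustration of step 2 of the algorithm in Figure 2.4, the following sketch (plain Python over a toy term representation invented only for this example) rewrites a top-level multiset atom into its pointwise ∀e form:

# Toy illustration of step 2 of Figure 2.4: each multiset atom is replaced by a
# universally quantified pointwise QFPA constraint over the multiplicities m(e).
def at(m):
    return f"{m}(e)"

def pointwise(atom):
    kind = atom[0]
    if kind == "eq_empty":        # m0 = emptyset
        m0, = atom[1:]
        return f"ALL e. {at(m0)} = 0"
    if kind == "eq_plus":         # m0 = m1 (+) m2   (disjoint union)
        m0, m1, m2 = atom[1:]
        return f"ALL e. {at(m0)} = {at(m1)} + {at(m2)}"
    if kind == "eq_setminus":     # m0 = m1 \ m2     (set difference)
        m0, m1, m2 = atom[1:]
        return f"ALL e. {at(m0)} = ite({at(m1)} <= {at(m2)}, 0, {at(m1)} - {at(m2)})"
    if kind == "subseteq":        # m1 is included in m2
        m1, m2 = atom[1:]
        return f"ALL e. {at(m1)} <= {at(m2)}"
    raise ValueError(kind)

print(pointwise(("eq_setminus", "Y", "L", "D")))
# ALL e. Y(e) = ite(L(e) <= D(e), 0, L(e) - D(e))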

2.5 Linear Integer Arithmetic with Stars

In this section, we define an extension of linear integer arithmetic. It contains an additional atom that checks whether a vector can be represented as a sum of a finite, but unbounded, number of solution vectors of a given formula F. We call this logic LIA* (linear integer arithmetic with the star operator). We first define the star operator.

Definition 2.3 (Star operator) Let S ⊆ N^k be a set of vectors of non-negative integers. The star operator is the additive closure of the set S:

S* = {x1 + ... + xn | x1,...,xn ∈ S}

It can easily be seen that for a finite set S = {x1,...,xn}, S* = {λ1x1 + ... + λnxn | ∀i. λi ≥ 0}. In the operations research literature, S* is also known under the name of the integer conic hull generated by the set S.
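For small finite sets the star operator can be evaluated directly. The sketch below (plain Python, exhaustive search; the coefficient bound used is a crude ad hoc one, so this is only an illustration) checks whether a vector u belongs to S* for a finite set S by enumerating coefficient vectors:

from itertools import product

def in_star(u, S, max_coeff=None):
    """Naive membership test u in S* for a finite set S of non-negative vectors."""
    if max_coeff is None:
        max_coeff = max(u) if u else 0   # no useful coefficient can exceed max(u)
    for lams in product(range(max_coeff + 1), repeat=len(S)):
        combo = [sum(l * v[k] for l, v in zip(lams, S)) for k in range(len(u))]
        if combo == list(u):
            return True
    return False

S = [(1, 2), (2, 1)]
print(in_star((4, 5), S))   # True:  (4,5) = 2*(1,2) + 1*(2,1)
print(in_star((1, 0), S))   # False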

Given a formula F and an integer vector u, the expression u ∈ {x | F(x)}* is an atom stating that u can be represented as a finite, but unbounded, sum of the non-negative solution vectors of F. This atom is the addition to standard linear integer arithmetic that we consider in LIA*. Figure 2.5 defines the language of LIA* formulas.

2.5.1 From Multisets to LIA* Constraints

We next argue that for every formula in sum normal form (Definition 2.1) there is an equisatisfiable LIA* formula (Figure 2.5).

Theorem 2.4 (Multiset elimination) Consider a sum normal form formula G of the form

P ∧ (u1,...,un) = Σ_{e∈E} (t1,...,tn) ∧ ∀e.F

where the free variables of t1,...,tn and F are the multiset variables m1,...,mq. Let k1,...,kq be fresh integer variables. Then G is equisatisfiable with the formula

P ∧ (u1,...,un) ∈ {(t′1,...,t′n) | F′ ∧ k1 ≥ 0 ∧ ... ∧ kq ≥ 0}*     (2.7)

where t′i = ti[m1(e) := k1,...,mq(e) := kq] (t′i results from ti by replacing the multiset variables with the fresh integer variables) and F′ = F[m1(e) := k1,...,mq(e) := kq] (similarly replacing the multiset variables with the freshly introduced integer variables).

LIA* formulas:
F0 ∧ (u1,...,un) ∈ {(x1,...,xn) | F}*     (free variables of F are among x1,...,xn)

LIA formulas:
F ::= A | F1 ∧ F2 | F1 ∨ F2 | ¬F1
A ::= T1 ≤ T2 | T1 = T2
T ::= k | C | T1 + T2 | C · T1 | ite(F, T1, T2)

terminals:
k - integer variable; C - integer constant

Figure 2.5: Quantifier-free Presburger Arithmetic and an extension with the Star Operator

Proof. Assume that the LIA* formula is satisfiable and let M be its model. Satisfiability also means that (u1,...,un) is a sum of N vectors, each of them satisfying the formula F′:

(u1,...,un) = Σ_{i=1}^{N} (t′1^i,...,t′n^i) ∧ ∀i. F′(t′^i)

We will show that the original multiset formula is satisfiable as well, by constructing a model for it. We denote this model by M_M. First, M and M_M agree on the common integer variables. Next, we need to define the interpretation of the multiset variables occurring in the multiset formula. To do that, we construct a base set E containing N distinct elements. Each summand in Σ_{i=1}^{N} (t′1^i,...,t′n^i) corresponds to exactly one element of E. Let e ∈ E be an element of E and let (t′1,...,t′n) be its corresponding summand. Let M be a multiset occurring in the multiset formula and let q_M be the corresponding integer variable. Then M(e) for this fixed e is the value that q_M has in the corresponding summand. This completes the definition of M_M. Since the set E contains N elements, it is easy to see that M_M is a model for

P ∧ (u1,...,un) = Σ_{e∈E} (t1,...,tn) ∧ ∀e.F

To prove the other direction, that satisfiability of the multiset formula also guarantees the satisfiability of the LIA∗ formula, we apply analogous reasoning.


Let us consider an atom (u1,...,un) ∈ {(x1,...,xn) | F}*. If any of the terms ti in the vector is a complex expression and not a variable, we use the following transformation:

(u1,...,u_{n+m}) ∈ {(x1,...,xn, t1,...,tm) | F}* ⇝
(u1,...,u_{n+m}) ∈ {(x1,...,xn, v1,...,vm) | F ∧ ⋀_{i=1}^{m} vi = ti}*

We treat complex expressions appearing in the vector u the same way:

P ∧ (u1,...,un, t1,...,tm) ∈ {(x1,...,x_{m+n}) | F}* ⇝
P ∧ ⋀_{i=1}^{m} wi = ti ∧ (u1,...,un, w1,...,wm) ∈ {(x1,...,x_{m+n}) | F}*

From now on we consider a formula (u1,...,un) ∈ {(x1,...,xn) | F}*, where all xi and uj are variables. If there are free variables in F that do not appear in {x1,...,xn}, we add them to the variables xi and in addition we extend the vector (u1,...,un) with fresh integer variables:

(u1,...,un) ∈ {(x1,...,xn) | F}* ∧ x ∈ FV(F) ∧ x ∉ {x1,...,xn} ⇝
(u1,...,un,uf) ∈ {(x1,...,xn,x) | F}*

To illustrate these transformations, consider the following example:

Example 2.5 Let (u1, u2) ∈ {(3x + 2y, x + 3y) | 5x + 2y ≤ 7}* be a LIA* atom. First we introduce fresh variables v1 and v2 and the problem transforms to (u1, u2) ∈ {(v1, v2) | 5x + 2y ≤ 7 ∧ v1 = 3x + 2y ∧ v2 = x + 3y}*. With F we denote the formula 5x + 2y ≤ 7 ∧ v1 = 3x + 2y ∧ v2 = x + 3y. The free variables of F are x, y, v1 and v2. However, x and y do not appear in the variable vector. We extend the variable vector with x and y and the original problem is reduced to

(u1, u2, ux, uy) ∈ {(v1, v2, x, y) | F}*

To summarize, given a multiset formula G belonging to the language defined in Figure 2.3, we translate G to a formula of the form P ∧ (u1,...,un) ∈ {(x1,...,xn) | F}*. Because the inner terms in the original formula are always non-negative, the variables xi represent non-negative solutions, so it is clear that the resulting formula belongs to LIA*.

Combining the result of Theorem 2.4 and the above transformations, we obtain the following corollary:

Corollary 2.6 For a formula in a sum normal form (Definition 2.1) there is a linear time reduction to an equisatisfiable LIA∗ formula (Figure 2.5). The derived LIA∗ formula is linear in the size of the original formula.


Because of Corollary 2.6, we reduce reasoning about multisets with cardinality constraints to reasoning about an extension of linear integer arithmetic. The reduction is satisfiability preserving and from now on we focus on the satisfiability question in this new logic.

2.6 Deciding Linear Arithmetic with Sum Constraints

In this section, we describe an algorithm that reduces reasoning about LIA* formulas to reasoning about linear integer arithmetic. The algorithm and techniques described here establish decidability. The algorithm focuses on the conjunctive fragment, which makes it much simpler than the algorithm given in Section 2.7.

To address the satisfiability problem of LIA∗ formulas, we first need to find a characterization of the set of non-negative solutions for a linear arithmetic formula. To answer this problem, we use semilinear sets.

Definition 2.7 (Minkowski addition and semilinear sets) Let C1, C2 ⊆ N^k be sets of vectors of non-negative integers. The Minkowski sum of C1 and C2 is the set of vectors in which every element is the result of adding an element of C1 to an element of C2, i.e. the set

C1 + C2 = {x1 + x2 | x1 ∈ C1 ∧ x2 ∈ C2}

A linear set is a set of the form {x} + C*, where x ∈ N^n and C ⊆ N^n is a finite set of non-negative vectors. The vector x ∈ N^n is called the base vector, while the elements of C ⊆ N^n are called the step vectors.

A semilinear set is a union of a finite number of linear sets.

2.6.1 Formula Solutions as Semilinear Sets

We first review some relevant results from [Ginsburg and Spanier(1966)].

Theorem 2.8 (Theorem 1.3 in [Ginsburg and Spanier(1966)]) For a given linear arithmetic formula F let P_F denote the set of all non-negative solutions of F, i.e. P_F = {x | F(x) ∧ x ∈ N^n}. For every F, the set P_F is a semilinear set and it is effectively computable from F. The converse also holds: for every semilinear set S there exists a formula F such that S = P_F.

Ginsburg and Spanier provided a constructive proof for building a semilinear set from a given formula in [Ginsburg and Spanier(1964)]. However, this proof is mainly of theoretical interest. In a more recent paper, Pottier [Pottier(1991)] describes several algorithms for computing semilinear sets and also establishes bounds on the size of the generating sets.


2.6.2 Computing Semilinear Sets and their Bounds

To establish bounds on the number and the size of the base and the step vectors generating a semilinear set, we apply the results of [Pottier(1991)] and [Giles and Pulleyblank(1979)] on the size of the minimal solutions of linear integer arithmetic formulas.

Definition 2.9 We use the following norms to establish the complexity bounds:

1. for an integer vector x = (x1,...,xn), define ||x||_1 = Σ_{i=1}^{n} |xi|

2. for an integer vector x = (x1,...,xn), define ||x||_∞ = max{|x1|,...,|xn|}

3. for a matrix A = [a_ij], define ||A||_{1,∞} = sup_i (Σ_j |a_ij|)

Every linear arithmetic formula can be represented as a finite disjunction of systems of linear inequalities. Since it trivially follows that a union of two semilinear sets is again a semilinear set, it is enough to solve the problem of finding a semilinear set corresponding to a system of linear inequalities. Finding the solution sets for a system of inequalities can be easily reduced to finding the solution set for a system of equalities and vice versa.

Given a system of equations Ax = 0, where A ∈ Z^{m,n} is an integer matrix with m rows and n columns, the set of all non-negative integer solutions forms a monoid M (a sub-monoid of N^n). On M we define a partial order by (x1,...,xn) ≼ (y1,...,yn) ⇔ ∀i. xi ≤ yi. The monoid M is generated by the non-zero minimal elements of (M, ≼). We call this set the Hilbert basis of M and denote it by H(M). The set H(M) is finite and [Pottier(1991)] describes several algorithms for its construction.

Theorem 2.10 (Theorem 1 in [Pottier(1991)]) Given a matrix A ∈ Z^{m,n}, let r = rank(A) be the rank of A. Then the following bound on the size of the elements of H(M) holds:

∀x ∈ H(M). ||x||_1 ≤ (1 + ||A||_{1,∞})^r
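The bound of Theorem 2.10 already yields a very naive procedure for small systems: enumerate all non-negative vectors whose ||·||_1 norm respects the bound, keep the solutions of Ax = 0, and retain the ≼-minimal non-zero ones. A sketch in Python (for illustration only; the algorithms in [Pottier(1991)] are far more efficient):

from itertools import product

def hilbert_basis(A, bound):
    """Non-zero, pointwise-minimal non-negative solutions of A x = 0 with ||x||_1 <= bound."""
    n = len(A[0])
    sols = [x for x in product(range(bound + 1), repeat=n)
            if sum(x) <= bound and any(x)
            and all(sum(row[j] * x[j] for j in range(n)) == 0 for row in A)]
    def minimal(x):
        return not any(y != x and all(yj <= xj for yj, xj in zip(y, x)) for y in sols)
    return [x for x in sols if minimal(x)]

# Example: the single equation x1 + x2 - 2*x3 = 0; here ||A||_{1,oo} = 4 and rank(A) = 1,
# so Theorem 2.10 gives the bound (1 + 4)^1 = 5.
A = [[1, 1, -2]]
print(hilbert_basis(A, bound=5))   # [(0, 2, 1), (1, 1, 1), (2, 0, 1)]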

Theorem 2.11 (Corollary 1 in [Pottier(1991)], applied to non-negative integers) Consider a system of inequalities Ax ≤ b where A ∈ Z^{m,n} and b ∈ Z^m. Then there exist two finite sets C1, C2 ⊆ N^n such that

1. for all x ∈ N^n, Ax ≤ b iff x ∈ C1 + C2*, and

2. ∀h ∈ C1 ∪ C2. ||h||_1 ≤ (2 + ||A||_{1,∞} + ||b||_∞)^m.

If we consider the system of equations Ax = b, the set of all non-negative solutions can again be represented as a Minkowski sum of two sets C1 and C2 with the same properties, but the bound is weakened to (1 + ||A||_{1,∞} + ||b||_∞)^{rank(A)+1}.

Proof. With y ≥ 0 we denote a vector y = (y1,...,ym) such that ∀i. yi ≥ 0. We use the vector y to express Ax ≤ b ⇔ ∃y ≥ 0. Ax + y − b = 0. To formulate this equation as the problem of finding a Hilbert basis, we construct a new matrix A′ = [A | I_m | −b]. Here I_m represents the identity matrix of dimension m. The matrix A′ has m rows and n + m + 1 columns. We construct a variable vector t = (x, y, z) of dimension n + m + 1. To access the first n coordinates we use the projection π_x(t) = x. To access the last coordinate we use the projection π_z(t) = z. To find the set of all non-negative solutions of Ax ≤ b we use the following equivalence:

Ax ≤ b ∧ x ∈ N^n ⇔ A′t = 0 ∧ t ∈ N^{n+m+1} ∧ π_x(t) = x ∧ π_z(t) = 1

Let H_{A′} be the Hilbert basis of the monoid of non-negative integer solutions of the system A′t = 0. We define C1 and C2 as follows:

C1 = {x | ∃t. t ∈ H_{A′} ∧ π_x(t) = x ∧ π_z(t) = 1}
C2 = {x | ∃t. t ∈ H_{A′} ∧ π_x(t) = x ∧ π_z(t) = 0}

We next show that every non-negative solution of Ax ≤ b can be written as a sum of a vector from C1 and a finite number of vectors from C2:

Ax ≤ b ∧ x ∈ N^n
⇔ A′t = 0 ∧ t ∈ N^{n+m+1} ∧ π_x(t) = x ∧ π_z(t) = 1
  (t ≠ 0 and can be written as a linear combination of the vectors from H_{A′})
⇔ t = (x^0, y^0, 1) + Σ_{k=1}^{l} (x^k, y^k, 0) ∧ ∀k. (x^k, y^k, z) ∈ H_{A′}
⇔ x = c1 + Σ_{k=1}^{l} c2^k ∧ c1 ∈ C1 ∧ ∀k. c2^k ∈ C2
⇔ x ∈ C1 + C2*

To prove the upper bounds, we first observe that rank(A′) = m, because of I_m. We also derive an additional bound on ||A′||_{1,∞}: ||A′||_{1,∞} ≤ ||A||_{1,∞} + 1 + ||b||_∞. Combining those bounds with Theorem 2.10 shows that the ||t||_1 norm of every t ∈ H_{A′} is bounded by (2 + ||A||_{1,∞} + ||b||_∞)^m. The last observation is that for every vector h ∈ C1 ∪ C2 it holds that ||h||_1 ≤ ||t||_1, which results in

∀h ∈ C1 ∪ C2. ||h||_1 ≤ (2 + ||A||_{1,∞} + ||b||_∞)^m

In the case when we consider the system of equations Ax = b, we apply analogous reasoning and define sets C1 and C2, as well as the bounds on the size of the vectors in C1 ∪ C2.


2.6.3 LIA Formulas Representing LIA∗ Formulas

Every linear arithmetic formula F can be converted into a disjunction of systems of equations and inequations. The number of such systems is singly exponential in the number of atomic formulas in F. Once F is converted into a disjunction of systems of the form Ax = b or Ax ≤ b, we invoke Theorem 2.11 and construct the sets C1 and C2. If C1, C2 ⊆ N^n are finite sets, then C1 + C2* is a particular kind of semilinear set. Moreover, the values occurring in A and b in the resulting systems are polynomially bounded by the coefficients and constants in the original LIA formula.

Consequently, for a formula F let B = max_i (2 + ||A_i||_{1,∞} + ||b_i||_∞)^m where A_i x ≤ b_i are the systems resulting from the decomposition of F. If s is the size of the original formula F, then B is at most singly exponential in s. We denote this bound by 2^{p(s)}, where p is some polynomial that follows from the details of the algorithm for generating all the systems of equations and inequations whose disjunction is equivalent to F. We thus obtain the following theorem.

Theorem 2.12 Let F be a quantifier-free linear integer arithmetic formula of size s. Then there exist a number n and finite sets A_i, B_i for 1 ≤ i ≤ n such that the set of satisfying assignments for F is given as

{x | F(x)} = ⋃_{i=1}^{n} (A_i + B_i*)

In addition, there is a polynomial p such that ||h||_1 ≤ 2^{p(s)} for each h ∈ ⋃_{i=1}^{n} (A_i ∪ B_i).

If A = {a1,...,aq} and B = {b1,...,br} for ai, bj ∈ N^n, then the condition u ∈ A + B* is given by the formula ⋁_{i=1}^{q} (u = ai + Σ_{j=1}^{r} λj bj) where λ1,...,λr are existentially quantified variables ranging over N. This view leads to the following formulation, Theorem 2.13.

Theorem 2.13 (Semilinear normal form for linear arithmetic) Let F be a quantifier-free linear integer arithmetic formula of size s. Then there exist a number n, numbers q1,...,qn and vectors ai and bij, 1 ≤ i ≤ n, 1 ≤ j ≤ qi, with ||ai||_1, ||bij||_1 ≤ 2^{p(s)} such that

F(x) ⇔ ∃λ_{11},...,λ_{n q_n}. ⋁_{i=1}^{n} (x = ai + Σ_{j=1}^{qi} λij bij)     (2.8)

Semilinear sets thus provide a characterization of the set of all solutions of a formula F. To answer our starting problem of expressing (u1,...,un) ∈ {(t1,...,tn) | F}* as a QFPA formula, we first represent the set of solutions of F as a semilinear set. We next apply the star operator to the semilinear set, which results in a linear arithmetic formula. A similar result was also obtained in [Lugiez and Zilio(2002), Section 3.2].


Theorem 2.14 (Elimination of the star operator) Given a formula F, let a semilinear set representing all non-negative solutions of F be given by the vectors ai and bij:

F(x) ⇔ x ∈ ⋃_{i=1}^{d} ({ai} + {bij}*)

The expression u ∈ {t | F}*, where u and t are integer vectors, is equisatisfiable with

∃µi, λij. u = Σ_{i=1}^{d} (µi ai + Σ_{j=1}^{qi} λij bij) ∧ ⋀_{i=1}^{d} (µi = 0 ⟹ Σ_{j=1}^{qi} λij = 0)     (2.9)

The existentially quantified variables µi , λi j become free variables in the satisfiability problem. The last conjunct in the formula (2.9) states that if a base vector did not appear in the sum, then also its corresponding step vectors cannot contribute to the sum.

To summarize, the starting problem was the satisfiability question for multiset formulas with cardinality constraints. We reduced this problem to reasoning about formulas of the form P ∧ (u1,...,un) ∈ {(t1,...,tn) | F′}*. Theorem 2.14 shows how the star operator can be entirely eliminated; the resulting formula is the conjunction of P and formula (2.9). The formula that we obtain in the end is a QFPA formula. The satisfiability problem for quantifier-free linear integer arithmetic is decidable; it is an NP-complete problem [Papadimitriou(1981)].
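To illustrate how formula (2.9) is used, the following sketch (Python with the Z3 bindings; the semilinear decomposition is written down by hand for a toy formula rather than computed) encodes membership in the star of the solution set of F(x1, x2) ≡ x1 = x2 + 1 ∧ x2 ≥ 0, whose non-negative solutions form the linear set {(1,0)} + {(1,1)}*:

from z3 import Ints, Solver, Implies, sat

# By (2.9):  u in {x | F(x)}*   iff
#   exists mu, lam >= 0.  u = mu*(1,0) + lam*(1,1)  and  (mu = 0 -> lam = 0)
def in_star(u1_val, u2_val):
    u1, u2, mu, lam = Ints('u1 u2 mu lam')
    s = Solver()
    s.add(u1 == u1_val, u2 == u2_val, mu >= 0, lam >= 0,
          u1 == mu + lam,          # first coordinate:  mu*1 + lam*1
          u2 == lam,               # second coordinate: mu*0 + lam*1
          Implies(mu == 0, lam == 0))
    return s.check() == sat

print(in_star(3, 2))   # True:  (3,2) = 1*(1,0) + 2*(1,1)
print(in_star(2, 3))   # False: would require a negative coefficient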

Theorem 2.15 Consider a formula F belonging to the language defined in Figure 2.3. Checking whether F is satisfiable is a decidable problem.

2.7 Complexity of Linear Arithmetic with Stars

While reducing a LIA* formula P ∧ u ∈ {x | F(x)}* to a linear arithmetic formula, we used semilinear sets. The set of all non-negative integer solutions of F can be described with finitely many generating vectors ai (base vectors) and bij (step vectors). However, the number of generating vectors can be exponential [Pottier(1991)], so we avoid explicitly constructing them. We instead apply several relevant results from operations research to construct a polynomially large equisatisfiable formula. In this section we describe the construction of such a polynomial-sized formula.

2.7.1 Estimating Coefficient Bounds of Disjunctive Form

The results on which we rely are usually stated for integer linear programming problems. They mostly express properties of systems of the form Ax ≤ b or Ax = b. Those properties are expressed in terms of m, n and a, where m is the number of rows of the matrix A, n is the number of columns and a is the maximal absolute value of the coefficients occurring in A. In this section we describe how one can estimate those values without actually converting the formula into a disjunction of systems of the form Ax = b.

Let F be a QFPA formula. F can be converted into an equivalent disjunction of integer linear programming problems ⋁_{i=1}^{l} A_i x = b_i. Let m_i be the number of rows of A_i, let n_i be the number of columns of A_i and let a_i be the maximal absolute value of the coefficients occurring in A_i and b_i. For a given F, define m_F = max_{i=1}^{l} m_i, n_F = max_{i=1}^{l} n_i and a_F = max_{i=1}^{l} a_i.

Lemma 2.16 (Values of m_F, n_F and a_F) Let F be a QFPA formula. If a subformula does not occur within any ite expression, we say that it has positive polarity if it occurs under an even number of negations, and negative polarity if it occurs under an odd number of negations. If a subformula occurs within an ite expression, we say that it has no polarity. Let g be the number of atomic formula occurrences of the form t1 = t2 that have positive polarity in F, and let h be the number of the remaining atomic formulas. Let v be the number of variables in F and a the maximum of the absolute values of the integer constants. Then m_F ≤ g + h, n_F ≤ v + h, and a_F ≤ a + 1.

Proof. We can transform F[ite(C, t1, t2)] into a disjunction of C ∧ F[t1] and ¬C ∧ F[t2]. Repeating this transformation we eliminate all ite expressions and obtain disjuncts whose size is polynomial in the size of F. Let D be one of the disjuncts after such ite elimination. The polarity of all g atomic formulas t1 = t2 that occur positively in F remains positive in each D. Each of the remaining h atomic formulas becomes of the form t1 ≤ t2, t1 = t2 or a disjunction t1 ≤ t2 ∨ t′1 ≤ t′2. In the disjunctive normal form of D, each of the h atomic formulas t1 ≤ t2 may require the addition of at most one fresh variable to be converted into an equality t1 + x = t2. The resulting number of variables is therefore bounded by v + h, whereas the total number of atomic formulas is bounded by g + h. When transforming t1 < t2 into t1 + 1 ≤ t2 there is a possibility that the constant part of t1 or t2 is increased by one, so a_F ≤ a + 1.

Example 2.17 As an illustration consider the formula F: z = ite(x ≤ y, x, y) ∧ z + 2 ≠ y. The number of atomic formulas of the form t1 = t2 with positive polarity is one, z = ite(x ≤ y, x, y), which means that g = 1. There are two other remaining atomic formulas: x ≤ y and z + 2 ≠ y, i.e. h = 2. Finally, v = 3 and a = 2. From those values we can easily compute the values m_F, n_F and a_F: m_F ≤ 3, n_F ≤ 5, and a_F ≤ 3. In this particular example, those bounds are also tight. One of the disjuncts will contain the formula z = x ∧ x ≤ y ∧ z + 2 < y. When translated into equalities, the formula becomes z = x ∧ x + l1 = y ∧ z + 3 + l2 = y. Written in matrix form over the variable vector (x, y, z, l1, l2), it becomes:

[ 1   0  −1   0   0 ]   [ x  ]   [ 0 ]
[ 1  −1   0   1   0 ] · [ y  ] = [ 0 ]
[ 0   1  −1   0  −1 ]   [ z  ]   [ 3 ]
                        [ l1 ]
                        [ l2 ]


2.7.2 Size of the Solution Set Generators

This section combines the results of the previous section (Section 2.7.1) with the bounds on the size of the generators of a semilinear set. It expresses Theorem 2.13 in terms of Lemma 2.16.

Lemma 2.18 For every QFPA formula F, there exist q base vectors ai, 1 ≤ i ≤ q, and for each i the corresponding qi step vectors bij, 1 ≤ j ≤ qi, such that

F(x) ⇔ ∃λij. ⋁_{i=1}^{q} (x = ai + Σ_{j=1}^{qi} λij bij)     (2.10)

The ||·||_1 norm of all vectors ai and bij is bounded by (1 + (n_F + 1)a_F)^{m_F + 1}, where n_F, m_F, a_F are defined as in Lemma 2.16.

Proof. The existence of the generating vectors was already proved in Theorem 2.13. The only remaining thing to argue is the bound on the size of the generating vectors. Let F be a formula and ⋁_{i=1}^{d} A_i x = b_i its decomposition into a disjunction of systems of equations. Theorem 2.11 gives a bound for the case of a system of equations Ax = b: every generating vector is bounded by (1 + ||A||_{1,∞} + ||b||_∞)^{rank(A)+1}. A matrix A has a maximal ||A||_{1,∞} if there is a row containing the value a or −a. That implies that for every A_i: ||A_i||_{1,∞} ≤ n_F a_F. Similarly, we prove that for every b_i: ||b_i||_∞ ≤ a_F. Finally, the rank of A_i is bounded by both n_F and m_F. We choose m_F to keep consistency with the other formulas.

2.7.3 Selecting Polynomially Many Generators

We have already established the bounds on the size of the generating vectors. However, we did not establish a bound on their number. This section addresses that issue. We consider the formula

x = Σ_{i=1}^{q} (µi ai + Σ_{j=1}^{qi} λij bij) ∧ ⋀_{i=1}^{q} (µi = 0 → Σ_{j=1}^{qi} λij = 0)     (2.11)

Our goal is to show that if x is a linear combination of the generators, then it is also a linear combination of a polynomial subset of the generators that form a smaller semilinear set. We prove this fact using a theorem about sparse solutions of integer linear programming problems [Eisenbrand and Shmonin(2006)].

Given a set of vectors X and a vector b ∈ X*, the following theorem bounds the number of vectors sufficient for representing b as a linear combination of vectors from X.

Theorem 2.19 (Theorem 1 (ii) in [Eisenbrand and Shmonin(2006)]) Let X ⊆ Z^d be a finite set of integer vectors and let b ∈ X*. Then there exists a subset X̃ ⊆ X such that b ∈ X̃* and |X̃| ≤ 2d log(4dM), where M = max_{x∈X} ||x||_∞.

Proof. The proof can be found in Appendix A.

Theorem 2.19 has been applied in [Kuncak and Rinard(2007)] to show that the satisfiability of constraints on sets with cardinality operators is in NP. In the case of multisets and LIA* we need to generalize this idea because of the dependencies between the base vectors and the corresponding step vectors.

Theorem 2.20 Let F be a QFPA formula and let ai, bij, x, q, qi be the values and vectors from (2.9). Then there exist sets I0, I1 ⊆ {1,...,q} and J ⊆ ⋃_{i=1}^{q} {(i,1),...,(i,qi)} such that

x = Σ_{i∈I0} (ai + Σ_{(i,j)∈J} λ′ij bij) + Σ_{i∈I1} µ′i ai     (2.12)

and |I0| ≤ |J| ≤ B, and |I1| ≤ B, where B = 2n_F (log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F)).

Proof. By assumption, x = Σ_{i=1}^{q} (µi ai + Σ_{j=1}^{qi} λij bij) and ⋀_{i=1}^{q} (µi = 0 → Σ_{j=1}^{qi} λij = 0). All zero indices can be removed and from now on we assume that all µi and λij are strictly positive. We define vectors a and b as a = Σ_i µi ai and b = Σ_{ij} λij bij, so x = a + b. Because of b = Σ_{ij} λij bij, the vector b can be seen as an element of the integer cone generated by the vectors bij: b ∈ {bij}*. We apply Theorem 2.19 and conclude that there exist a set J of indices (i, j) and coefficients λ′ij such that b = Σ_{(i,j)∈J} λ′ij bij and |J| ≤ 2n_F log(4n_F M), where M is the bound on the size of the generators. We denote this number by B: B = 2n_F log(4n_F M). To satisfy the dependencies between the step vectors bij and a base vector ai, let I0 = {i | ∃j. (i,j) ∈ J}. Note that |I0| ≤ |J|. Let ab = Σ_{i∈I0} ai. The vector ab + b is generated by vectors whose indices are in I0 and J. However, the vector ab + b still does not cover the whole vector x, and we define the vector ar = a − ab. The vector ar can again be seen as an element of an integer cone: ar = Σ_{i∈I0} (µi − 1) ai + Σ_{i∈{1,...,q}\I0} µi ai. Applying Theorem 2.19 once again, we conclude that there exists I1 ⊆ {1,...,q} with |I1| ≤ B such that ar = Σ_{i∈I1} µ′i ai.

We still need to compute the size of the value B. Let H be the set containing all generating vectors ai and bij. To compute the value M from Theorem 2.19, we note that since ||v||_∞ ≤ ||v||_1 for all vectors v, we have M ≤ max_{h∈H} ||h||_1. Using the bound (1 + (n_F + 1)a_F)^{m_F + 1} from Lemma 2.18, we obtain B = 2n_F log(4n_F (1 + (n_F + 1)a_F)^{m_F + 1}) = 2n_F (log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F)).

Theorem 2.20 shows that only polynomially many generators of a semilinear set are needed to represent a solution of a linear arithmetic formula. The number of generators is polynomial in the size of the given formula.

2.7.4 Grouping Generators into Solutions

Having shown that if u ∈ {x | F(x)}*, then u is a particular linear combination of polynomially many generating vectors ai and bij, and using the fact that those generating vectors are also polynomially bounded, this suggests the idea of guessing polynomially many bounded vectors, checking whether they are generators, and then checking whether u is their linear combination. However, it is not clear how to check whether a guessed vector is one of the ai's or bij's without computing them. In this section we show that we can avoid the problem of checking whether a vector is a generator and reduce the problem to checking whether a vector is a solution of F. We show that it is enough to guess polynomially many vectors xi such that F(xi) holds and check whether u = Σ_{i=1}^{k} λi xi.

Lemma 2.21 Let F be a QFPA formula and u ∈ {x | F(x)}*. Then there exist k vectors x1,...,xk for k ≤ 4n_F (log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F)) such that for some non-negative integers λi

u = Σ_{i=1}^{k} λi xi ∧ ⋀_{i=1}^{k} F(xi)

Proof. First observe that in Theorem 2.20 the vectors ai + Σ_{(i,j)∈J} λ′ij bij are solutions of F and that their number is bounded by B. Similarly, the vectors ai are also solutions of F and their number is bounded by B as well. The total number of solutions needed is thus bounded by 2B, where B is from Theorem 2.20.

2.7.5 Multiplication by Bounded Bit Vectors

We can now outline a new algorithm for checking satisfiability of a LIA* formula F0 ∧ u ∈ {v | F(v)}*. First, using Lemma 2.16 we calculate the values m_F, n_F and a_F. Using those values and Lemma 2.21 we estimate an upper bound k = 4n_F (log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F)) on the number of solution vectors xi. We construct an equisatisfiable formula

F0 ∧ u = λ1 x1 + ... + λk xk ∧ ⋀_{i=1}^{k} F(xi)     (2.13)

Formula (2.13) is polynomial in the size of F0 ∧ u ∈ {v | F(v)}*, but it is not a QFPA formula because it contains multiplication of variables in λi · xi. We address this problem by showing that the values of λi in the smallest solutions have a polynomial number of bits, which allows us to express the multiplication using bitwise expansion.

To express the terms λi xi from Lemma 2.21 as QFPA terms, we show that the smallest solution u, if one exists, is bounded [Papadimitriou(1981)].

Theorem 2.22 (Theorem, p. 767 in [Papadimitriou(1981)]) Let A be an m × n integer matrix and b an m-vector, both with entries from [−a..a]. Then the system Ax = b has a solution in N^n if and only if it has a solution in [0..M]^n where M = n(ma)^{2m+1}.

Theorem 2.22 states that it is enough to find a solution x such that ||x||_∞ ≤ M. Here is a simple algorithm to find a non-negative solution of Ax = b: check all vectors of non-negative integers whose ||·||_∞ norm is at most M until a solution is found. If no solution is found in that range, then Ax = b has no solution in non-negative integers.

Applying Theorem 2.22 and using the fact that the whole formula can be represented as a QFPA formula, we know that there is a bound on the solution vector u. Let rB be a bound on the vector u of the formula F0 ∧ u ∈ {v | F(v)}*: ||u||_∞ ≤ rB. Because every λi in formula (2.13) must be a non-negative integer, λi ≤ ||u||_∞ ≤ rB, so each λi is also bounded by rB. This means that every λi can be represented as a bit-vector of size r for r = ⌈log rB⌉. Let λi = λir ... λi1 λi0 = Σ_{j=0}^{r} λij 2^j. Then

λi xi = (Σ_{j=0}^{r} λij 2^j) xi = Σ_{j=0}^{r} 2^j (λij xi) = Σ_{j=0}^{r} 2^j ite(λij, xi, 0)
      = ite(λi0, xi, 0) + 2(ite(λi1, xi, 0) + 2(ite(λi2, xi, 0) + ...))     (2.14)
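A small numeric check (plain Python; the helper function is purely illustrative) confirms that the nested ite form of (2.14) computes λi · xi from the bits λi0,...,λir:

def ite(b, t, e):
    return t if b else e

def times_via_bits(bits, x):
    """Evaluate lam * x using the nested ite form of (2.14); bits = [lam_0, lam_1, ...]."""
    acc = 0
    for b in reversed(bits):          # Horner-style evaluation, most significant bit first
        acc = ite(b, x, 0) + 2 * acc
    return acc

lam, x = 13, 7                        # lam = 1101 in binary
bits = [(lam >> j) & 1 for j in range(4)]
assert times_via_bits(bits, x) == lam * x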

It still remains to show how to establish and compute the value rB.

2.7.6 Estimating the Solution Size Bounds

Theorem 2.23 Let F0 be a QFPA formula. Let u = (u1,...,ud) denote a d-dimensional vector of variables ranging over non-negative integers. Let F be a QFPA formula which does not share any variables with F0 and u. If a formula F0 ∧ u ∈ {x | F(x)}* is satisfiable, then there exists a non-negative solution vector uS for the variables u such that ||uS||_∞ ≤ rB = n(ma)^{2m+1}, where n, m and a are defined by

1. m := n_F + m_{F0}

2. n := n_{F0} + n_F (1 + 6(log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F)))

3. a := max{a_{F0}, (1 + (n_F + 1)a_F)^{m_F + 1}}

The values m_{F0}, n_{F0}, a_{F0}, m_F, n_F and a_F are computed as in Lemma 2.16.

Proof. We establish a bound on the size of the solution vector by applying two facts. First, uS is a solution vector for u ∈ {v | F(v)}*. As shown in Theorem 2.14, uS is a linear combination of the generators of a semilinear set and uS can be expressed as

uS = Σ_{i=1}^{q} (µi ai + Σ_{j=1}^{qi} λij bij)

Assuming that all the generator vectors ai and bij are known (they can be computed), we need to solve the equation

u = Σ_{i=1}^{q} (µi ai + Σ_{j=1}^{qi} λij bij)

for the vector u and the variables µi and λij. If we represent the above condition in the form Ax = b, the matrix A consists of the generators of the semilinear set and the negative identity matrix −1_n, while the vector x consists of the parameters µi and λij followed by the vector u:

[ generators | −1_n ] · (µi, λij, u1, ..., un)^T = 0

We denote this system by A_g x = 0. The matrix A_g can have at most n_F rows and at most n_F + n_G columns, where n_G is the number of generators. The number of rows is bounded by n_F (and not by d) because the length of the vector u can increase (for example, when converting inequalities into equalities). Theorem 2.20 established the bound on the number of generators. Let B = 2n_F (log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F)) be the bound introduced in the theorem. By reconstructing the proof, we count the needed generating vectors: there are B step vectors bij and B base vectors ai associated with those step vectors. However, B further base vectors may still appear. Altogether, there are at most 3B generating vectors, i.e. n_G ≤ 6n_F (log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F)). In addition, this leads to the fact that A_g has at most n_F (1 + 6(log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F))) columns.

The second fact that we use is the observation that uS is a component of the solution vector of F0. This implies that there is a matrix A0 with dimensions at most m_{F0} and n_{F0} and a vector b0 such that A0 w = b0. In the vector w we assume that the vector u comes first, followed by the remaining variables, which we denote by the vector v: w = (u, v).

We combine those two facts into one by constructing a new system that contains both matrices A_g and A0. We denote this new system by A_f x = b_f and it has the following form:

[ A_g   0  ]   [ µi  ]   [ 0  ]
[  0   A0  ] · [ λij ] = [ b0 ]
               [ u   ]
               [ v   ]

The matrix A_f has at most n_{F0} + n_F (1 + 6(log 4n_F + (m_F + 1) log(1 + (n_F + 1)a_F))) columns and at most n_F + m_{F0} rows.

To establish an upper bound on the maximum of absolute values in A f and b f , we apply Lemma 2.18 in which there is an upper bound on the for all the generating vectors. Since ||·||1 for every vector z, z z 1, we conclude that the bound given in Lemma 2.18 is also an || ||∞ ≤ || || upper bound for the max{ a a A }. The maximal absolute value appearing in A and | i j | | i j ∈ g 0 b0 is bounded by aF0 by the definition. Therefore, the maximal absolute value in the matrix

31 Chapter 2. Decision Procedures for Multisets with Cardinality Constraints

A f and vector b f is the bigger one of those two values.

We obtain the final result by applying Theorem 2.22 to A x b . Since the vector u is embed- f = f ded into the vector x, it holds u x and this gives us the required upper bound. || ||∞ ≤ || ||∞

2.7.7 An NP-Algorithm for LIA∗ Satisfiability

In this section, we summarize the results of all previous sections and present an algorithm of the optimal complexity for checking satisfiability of LIA∗ formulas. Let F u {x F (x)}∗ be 0 ∧ ∈ | a LIA∗ formula. The algorithm constructs an equisatisfiable QFPA formula through several steps:

1. Apply Lemma 2.16 and compute values mF0 ,nF0 ,aF0 ,mF ,nF and aF . To compute those values, it is enough to analyze the formulas F and F0 without converting them to a disjunctive form. The required values are computed in a linear time of the size of the input formula and they are small (smaller than the size of the input formula).

2. Apply Theorem 2.23 and using mF0 ,nF0 ,aF0 ,mF ,nF and aF compute the values n, m and a. Compute the value r logn (2m 1)log(ma) . Note that a might be of the = d + + e exponential value, but it does not matter, because to compute the value r , the log function is applied to a. This implies the final value r is polynomial in the size of the input formula.

Pr j 3. With ti we denote the term ti j 0 2 ite(λi j ,xi ,0) ite(λi0,xi ,0) 2(ite(λi1,xi ,0) = = = + + 2(ite(λ ,x ,0) ...)). This is a term from the formula (2.14). The values λ are boolean i2 i + i j variables and the value xi is an integer vector (also a variable). The size of the term ti is polynomial in the size of the input formula, since the term ti consists of r summands.

4. Apply Lemma 2.21 and compute an upper bound for the number of ti terms, i.e. com- pute the value k: k 4n (log4n (m 1)log(1 (n 1)a )). = F F + F + + F + F

5. Construct the formula F1: k k X ^ F0 u ti F (xi ) ∧ = i 1 ∧ i 1 = = The term ti contains the vector xi and variables λi j . The size of this newly constructed formula is polynomial in the size of the original input formula.

6. Use a solver for linear integer arithmetic and check satisfiability of F1

Putting everything together, the algorithm constructs an equisatisfiable QFPA formula, which is polynomial in the size of the the original formula. Since checking satisfiability of QFPA formulas is an NP-complete problem, the satisfiability question for LIA∗ formulas is also an NP-complete problem.

Theorem 2.24 Checking satisfiability of LIA∗ formulas is an NP-complete problem.

32 2.8. Complexity of Multiset Constraints

2.8 Complexity of Multiset Constraints

In this section, we return to the original problem of checking satisfiability of formulas belong- ing to the language of multiset constraints, defined in Figure 2.3. It was shown in [Kuncak and Rinard(2007)] that satisfiability checking for the set constraints with cardinality operator is an NP-complete problem. This problem is subsumed by constraints defined in Figure 2.3, which means that checking multiset constraints is at least NP-hard. Our goal is to show that checking satisfiability of formulas defined in Figure 2.3 is also an NP-complete problem.

Since the language supports arbitrary propositional operators, the satisfiability problem for this language is clearly NP-hard. The non-trivial part of NP-completeness is therefore establishing the membership in NP. The standard way of proving that a problem belongs to the NP class is by showing that a candidate solution for the problem can be verified in polynomial time. Let Fm be a multiset formula. We can reduce it to an equisatisfiable LIA∗ formula in a polynomial time. We denote this formula with F , F LIA∗. Theorem 2.24 proves l l ∈ that checking satisfiability of Fl is an NP-complete problem, which means that its candidate solution can be verified in polynomial time, which in addition also means that verifying the solution for FM can be done in polynomial time.

This way we proved that checking satisfiability of multiset formulas with the cardinality constraints belongs to the same class of problems as checking satisfiability of set formulas with the cardinality constraint.

Theorem 2.25 Checking satisfiability of formulas belonging to the language of multiset con- straints, defined in Figure 2.3 is an NP-complete problem.

2.9 Undecidability of Quantified Constraints

Until now, we have discussed quantifier-free multiset formulas. We next show that adding quantifiers to the language of Figure 2.3 results in undecidable constraints.

We already pointed out that the language defined in Figure 2.3 can be seen as a generalization of quantifier-free Boolean algebra with Presburger arithmetic (QFBAPA) defined in [Kuncak and Rinard(2007)]. QFBAPA is a language for reasoning about sets with cardinality con- straints. Given that QFBAPA admits quantifier elimination [Feferman and Vaught(1959),Kun- cak et al.(2006)Kuncak, Nguyen, and Rinard], it is an interesting question whether multiset quantifiers can be eliminated from constraints. The decision procedure that we described before demonstrated that the multiset formulas without the cardinality operator can be viewed as a product of Presburger arithmetic structures. Therefore, Feferman-Vaught theorem [Fefer- man and Vaught(1959)] (a summary can be found in [Kuncak and Rinard(2003)] in Section 3.3) gives a way to decide the first-order theory of multiset operations extended with the ability to state cardinality of sets of the form {e F (e)} . This corresponds to multiset theory with | | | #D counting distinct elements of multisets, which is denoted by FOM in [Lugiez(2005)].

33 Chapter 2. Decision Procedures for Multisets with Cardinality Constraints

#D However, the language FOM is strictly less expressive than a quantified extension of the lan- P guage in Figure 2.3. The language in Figure 2.3 contains the summation expressions F (e) t(e) # and that corresponds to the language FOM defined in [Lugiez(2005)]. The decidability of # FOM was left open in [Lugiez(2005)]. We next prove that this language is undecidable.

The undecidability follows by reduction from Hilbert’s 10th problem [Matiyasevich(1970)], because quantified multiset constraints can define addition and multiplication. To define addition, we use disjoint union : ] x y z M . M . M x M y M M z + = ⇐⇒ ∃ 1 ∃ 2 | 1| = ∧ | 2| = ∧ | 1 ] 2| = To define x y z, we introduce a new multiset P that contains x distinct elements, each of · = which occurs y times. The following formula encodes this property:

x y z P. z P x set(P) · = ⇐⇒ ∃ = | | ∧ = | | ∧ ( M. M z set(M) 1 set(M) P M P y) ∀ | | = ∧ | | = ∧ ⊆ =⇒ | ∩ | = Because we can define addition and multiplication using the quantified multiset constraints, we can express in this logic also the satisfiability question of Diophantine equations (i.e. Hilbert’s tenth problem). In [Matiyasevich(1970)], Matiyasevich proved that there cannot exist an algorithm to check whether a Diophantine equation has any solutions in integers. Applying this result, we conclude that satisfiability of multiset constraints with quantifiers and cardinality is undecidable. Similarly, we obtain undecidable constraints if in the quantified expressions e.F we admit the use of outer integer variables as parameters. This justifies the ∀ current “stratified” syntax that distinguishes inner and outer integer variables.

A natural question is whether the presence of the built-in set operator is needed for unde- cidability of quantified constraints. Nevertheless, the set operator is itself definable using quantifiers. Here is an example how to encode the set operator: S set(M) iff S is the smallest = multiset that behaves the same as M with respect to a simple set membership. Behaving the same with respect to a simple set membership is given by

memSame(M ,M ) ( E. E 1 (E M E M )) 1 2 ⇐⇒ ∀ | | = =⇒ ⊆ 1 ⇐⇒ ⊆ 2

Using the memSame(M1,M2) predicate we encode the set operator as:

S set(M) (memSame(S,M) ( S . memSame(S ,M) S S )) = ⇐⇒ ∧ ∀ 1 1 =⇒ ⊆ 1

Moreover, note that, as in any lattice, and are inter-expressible using quantifiers. Therefore, ∩ ⊆ adding quantifiers to a multiset language that contains and cardinality constructs already ⊆ #D gives undecidable constraints. This answers negatively the question on decidability of FOM posed in [Lugiez(2005), Section 3.4].

34 3 Implementation: Automated Rea- soner for Sets and Multisets

In this chapter, we provide an overview of the MUNCH reasoner for sets and multisets. MUNCH takes as the input a formula in a logic that supports expressions about sets, multisets, and integers. Constraints over collections and integers are connected using the cardinality operator, as defined in Chapter2. MUNCH is the first fully automated reasoner for this logic. MUNCH reduces input formulas to equisatisfiable linear integer arithmetic formulas, and then uses an SMT solver Z3 [de Moura and Bjørner(2008b)] to check the satisfiability of the derived formula.

3.1 Motivation

Interactive theorem provers such as Isabelle [Nipkow et al.(2005)Nipkow, Wenzel, Paulson, and Voelker], Why [Filliâtre and Marché(2007)] or KIV [Balser et al.(2000)Balser, Reif, Schellhorn, Stenzel, and Thums] specify theories of multisets with cardinality constraints. They prove a number of theorems about multisets to enable their use in interactive verification. However, all those tools require a certain level of interaction. Our tool is the first automated theorem prover for multisets with cardinality constraints, which can check satisfiability of formulas belonging to a logic defined in Figure 2.3 entirely automatically.

We have evaluated our implementation on the verification conditions for the correctness of mutable data structure implementations. To prove that a formula F is valid, we check unsatisfiability of the negation of F . If F is satisfiable, that means there is a model in which ¬ the original formula evaluates to false. In that case MUNCH generates a model, which can be used to construct a counterexample trace of the checked program.

3.2 MUNCH Implementation

In Chapter2, we have showed that checking satisfiability of the formulas defined in Figure 2.3 is an NP-complete problem (Theorem 2.25). Our first implementation was based on the

35 Chapter 3. Implementation: Automated Reasoner for Sets and Multisets

Figure 3.1: Phases in checking formula satisfiability. MUNCH translates the input formula through several intermediate forms, preserving satisfiability in each step. algorithm in Section 2.7.7, the algorithm used to establish that the decision problem is in NP.But we found that the running times were impractical due to large constants. MUNCH therefore currently uses the conceptually simpler algorithm that reduces a multiset formula to a LIA∗ formula. The algorithm then, through the computation of semilinear sets, further reduces the LIA∗ formula to a QFPA formula. Despite its worst-case complexity can be at least NEXPTIME, we have found that the algorithm that uses the semilinear set characterization, when combined with additional simplifications, results in a tool that exhibits acceptable performance. Our implementation often avoids the worst-case complexity of the most critical task, the computation of semilinear sets, by leveraging the special structure of formulas that we need to process (see Section 3.2.2).

3.2.1 System Overview

The MUNCH reasoner is implemented in the Scala [Typesafe(2011)] programming language and currently uses the SMT solver Z3 [de Moura and Bjørner(2008b)] to solve the generated integer linear arithmetic constraints.

Figure 3.1 provides a high-level overview of the reasoner.

Given an input formula F , MUNCH converts it into the sum normal form and then translates it into a LIA∗ formula. However, having a LIA∗ formula P u {x F (x)}∗, we should compute ∧ ∈ | the semilinear set representation of F . The problem with this approach is that computing semilinear sets is expensive. The best know algorithms still run in the exponential time and are fairly complex [Pottier(1991)].

36 3.2. MUNCH Implementation

For complexity reasons, we are delaying the computation of semilinear sets. Still, the expo- nential running time is unavoidable in this approach. Therefore, instead of developing an algorithm which computes semilinear sets for an arbitrary Presburger arithmetic formula, we split a formula into simpler parts for which we can easier compute semilinear sets. Namely, we convert formula F into a disjunctive normal form:

F (x) A (x) b ... A (x) b ≡ 1 = 1 ∨ ∨ m = m Vm This way checking whether u {x F (x)}∗ becomes k1 ... km j 1 k j {x A j (x)}∗. The ∈ | = + + ∧ = ∈ | next task is to eliminate the operator for the formula k {x A (x) b }∗, where A is a ∗ j ∈ | j = j j conjunction of linear arithmetic atoms. A j can also be rewritten as a conjunction of equalities by introducing fresh non-negative variables.

3.2.2 Efficient Computation of Semilinear Sets

The MUNCH reasoner is not complete for the full logic described in Figure 2.3. Although the decision procedure is complete, we are avoiding computation of semilinear sets using the Hilbert basis. An implementation, which would preserve completeness and therefore include the computation of semilinear sets for any formula, would significantly increase the running times of our tool. Instead, we noticed that we are mostly using MUNCH to check validity of verification condition generated when proving the correctness of container data structures. The formula that are we generate have very simple structure: they are either about the addition (originating from the disjoint union) or the min and max operator (result of the union and the intersection operators). The min and max operator reduce to the assignment. Another typical formula that often occurs is expressing that a collection is a set. This is represented with e : s(e) 1 s(e) 0 which at the end again becomes a simple assignment. ∀ = ∨ = In most of the cases, computing a semilinear set is actually computing a linear set which can be done effectively, for example, using the Omega-test [Pugh(1992)]. Since A j is a conjunction of equations, we use simple rewriting rules. The problem of inequalities expressing that a term is non-negative in most cases is resolved by implicitly using them as non-negative coefficients.

As an illustration, consider formula (u ,u ,u ) {(x, y,z) x y z x y}∗. To compute a 1 2 3 ∈ | < ∧ = + semilinear set representing all the non-negative solutions of x y z x y, we first rewrite < ∧ = + x y as x l 1 y, where l is a non-negative variable. We further rewrite (x, y,z) as: < + + =

            x x x 0 1 0 y  y   x l 1  1 x 1 l 1   =   =  + +  =   +   +   z x y x x l 1 1 2 1 + + + +

The semilinear set describing the non-negative solutions of x y z x y is a linear set < ∧ = + (0,1,1) {(1,1,2),(0,1,1)}∗. This approach of using equalities and rewriting is highly efficient + and works in most of the cases. We also support a simple version of the Omega test.

37 Chapter 3. Implementation: Automated Reasoner for Sets and Multisets

However, as mentioned earlier, our implementation is not complete for the full logic described in Figure 2.3. There are cases when one cannot avoid the computation of a semilinear set. As an example, it does not work for a formula u {(x, y,z) 5x 7y 6z}∗. As a rule, a formula F , ∈ | + < where none of the variables have coefficient 1, cannot be rewritten in the above way. Notice also that our tool is always complete for sets, so it can also be used as a complete reasoner for sets with cardinality constraints (with a doubly exponential worst-case bound on running time).

In our experimental results, while processing formulas derived in verification, we did not encounter such a problem. Notice also that our tool is always complete for sets, so it can also be used as a complete reasoner for sets with cardinality constraints (with a doubly exponential worst-case bound on running time).

Assuming that we managed to compute the semilinear sets representing the non-negative solution of Ax b, we construct a Presburger arithmetic formula. As we know, this way = the initial multiset constraints problem reduces to satisfiability of quantifier-free Presburger arithmetic. To check satisfiability of such a formula, we invoke the SMT solver Z3 [de Moura and Bjørner(2008a)] with the option "-m". This option ensures that Z3 returns a model in case that the input formula is satisfiable. Since all our transformations are satisfiability preserving, we either return unsat or reconstruct a model for the initial multiset formula from the model returned by Z3.

3.3 Examples and Benchmarks

First we illustrate how the MUNCH reasoner works on a simple example, and then we show some benchmarks that we did.

Consider a simple multiset formula x y x y . Its validity is proved by showing that | ] | = | | + | | x y x y is unsatisfiable. We chose such a simple formula so that we can easily present | ] | 6= | |+| | and analyze the tool’s output. Figure 3.2 displays the output of MUNCH on this formula. The intermediate formulas in the output correspond to the result of the individual reduction step described in Chapter2.

Using MUNCH in software verification. We evaluated MUNCH on the verification conditions that we encountered, while proving properties about the container data structures. Those verification conditions are expressible as constraints on sets and multisets. The running times are given in Table 3.3.

The precise description of those verification conditions are given in Figure 3.4.

The main problem we are facing for a more comprehensive evaluation of the MUNCH reasoner is the lack of similar tools and benchmarks. Most benchmarks we were using are originally derived for reasoning about sets. Sometimes those formulas contain conditions that we do not

38 3.3. Examples and Benchmarks

Formula f3: NOT (|y PLUS x| = |y| + |x|) Normalized formula f3: NOT (k0 = k1 + k2) AND FOR ALL e IN E. (m0(e) = y(e) + x(e)) AND (k0, k1, k2) = SUM {e in E, TRUE } (m0(e), y(e), x(e)) Translated formula f3: NOT (k0 = k1 + k2) AND (k0, k1, k2) IN {(m0, y, x) | m0 = y + x }* No more disjunctions: NOT (k0 = k1 + k2) AND k0 = u0 AND k1 = u1 AND k2 = u2 AND (u0, u1, u2) IN {(m0, y, x) | m0 = y + x }* Semilinear set computation : ( m0, y, x ) | m0 = y + x, semilinear set describing it is: List(0, 0, 0), List( List(1, 1, 0), List(1, 0, 1)) No more stars: NOT (k0 = k1 + k2) AND k0 = u0 AND k1 = u1 AND k2 = u2 AND u2 = 0 + 1*nu1 + 0 AND u1 = 0 + 0 + 1*nu0 AND u0 = 0 + 1*nu1 + 1*nu0 AND ( NOT (mu0 = 0) OR (nu1 = 0 AND nu0 = 0) ) ------This formula is unsat ------

Figure 3.2: Example run of MUNCH on a multiset formula.

Property #set vars #multiset vars time (s) Efficient emptiness check using sets 2 0 0.40 Efficient emptiness check using multisets 0 2 0.40 Size invariant after insertion (sets) 2 1 0.46 Size invariant after insertion (msets) 0 2 0.40 Size invariant after deleting (msets) 0 2 0.35 Allocation and insertion of 3 (sets) 5 0 3.23 Allocation and insertion of 3 (msets) 0 5 0.40 Allocation and insertion of 4 (sets) 6 0 8.35 Allocation and insertion of 4 (msets) 6 0 0.40

Figure 3.3: Running times for checking verification conditions that arise in proving correctness of container data structures.

39 Chapter 3. Implementation: Automated Reasoner for Sets and Multisets

VC# verification condition property being checked 1 x content size card content using invariant on size ∉ ∧ = −→ (size 0 content ) to prove correctness of = ↔ = ; an efficient emptiness check 2 x content size card content maintaining correct size ∉ ∧ = −→ size 1 card({x} content) when inserting fresh + = ∪ element 3 size card content maintaining size after = ∧ size1 card({x} content) inserting any element = ∪ −→ size1 size 1 ≤ + 4 content alloc allocating and inserting ⊆ ∧ x alloc three objects into a 1 ∉ ∧ x alloc {x } container data structure 2 ∉ ∪ 1 ∧ x alloc {x } {x } using sets 3 ∉ ∪ 1 ∪ 2 −→ card(content {x } {x } {x }) ∪ 1 ∪ 2 ∪ 3 = cardcontent 3 + 5 content alloc allocating and inserting ⊆ ∧ card(content {x } {x } {x }) three objects into a ∪ 1 ∪ 2 ∪ 3 = cardcontent 3 container data structure + using multisets

Figure 3.4: Description of the verification conditions proved using MUNCH need to consider when reasoning about multisets. This can especially be seen in Figure 3.3, when checking allocation and insertion of three elements into a data structure. Proving validity of a multiset formula requires 0.4 seconds. Checking the same property for a data structure implementing a set requires 3.23 seconds. It is due to the fact that the formula is more complicated since we need to also make sure that we are inserting all different elements. Also, to express that we are working with the sets adds the additional disjunctive formula ( e.s(e) 1 s(e) 0). This contributes to the exponential blow-up. One can see how, in the ∀ = ∨ = case of sets, the running times drastically increase.

We could also not compare the MUNCH tool with interactive theorem provers since our tool is completely automated and does not require any interaction.

We plan to continue with the further development of MUNCH. In addition to make it complete, we also plan to incorporate it into software verification systems. This will also enable us to obtain further sets of benchmarks.

40 4 Decision Procedures for Fractional Collections and Collection Images

In this chapter, we present two extensions of the logic about multisets with cardinality con- straints, introduced in Chapter2. The first extension is a generalization towards a logic that involves collections such as sets, multisets, and fuzzy sets. Element membership in these collections is given by characteristic functions from a finite universe (of unknown size) to a user-defined subset of rational numbers. The logic supports standard operators such as union, intersection, difference, or any operation defined pointwise using mixed linear integer-rational arithmetic. Moreover, it supports the notion of cardinality of the collection, defined as the sum of occurrences of all elements. We describe a decision procedure for the satisfiability problem in this new logic. The decision procedure reduces a formula to a formula in an extension of mixed linear integer-rational arithmetic, with a “star” operator.

The other extension is a logic of uninterpreted functions over sets. A function takes a set as an input and returns a multiset. We add this new construct to the language introduced in Chapter2. This logic was motivated by examples from verification of data structures. Having a linked data structure, the content of the data structure can be seen as a result of application the function c to the set of nodes, where c is a function that takes a node in a linked data structure and returns the content store in it. We describe a decision procedure for this new logic and show that the satisfiability question is an NEXPTIME-complete problem.

4.1 Motivation for Fractional Collections

A collection of elements can be defined through their characteristic function f : E R. In- → spired by applications in software verification [Kuncak and Rinard(2007)], we assume that the domain E is a finite set but of unknown size. The range R depends on the kind of the collection: for sets, R {0,1}; for multisets, R {0,1,2,...,}; for fuzzy sets, R is the interval [0,1] of rational = = numbers, denoted Q[0,1]. With this representation, operations and relations on collections such as union, difference, and subset are all expressed using operations of linear arithmetic. For example, the condition A B C becomes e E. max(A(e),B(e)) C(e), a definition ∪ = ∀ ∈ = that applies independently of whether A,B are sets, multisets, or fuzzy sets. A distinguishing

41 Chapter 4. Decision Procedures for Fractional Collections and Collection Images feature of our constraints, compared to many other approaches for reasoning about functions E R, e.g. [Bradley and Manna(2007), Chapter 11], is the presence of the cardinality operator, → P defined by A e E A(e). The resulting language freely combines the use of linear arithmetic | | = ∈ at two levels: the level of individual elements, as in the subformula max(A(e),B(e)) C(e), and = the level of sizes of collections, as in the formula A B A B A B . The language sub- | ∪ |+| ∩ | = | |+| | sumes constraints such as quantifier-free Boolean Algebra with Presburger Arithmetic [Kuncak and Rinard(2007)] and therefore contains both set algebra and integer linear arithmetic. It also subsumes decidable constraints on multisets with cardinality bounds from Chapter2, since in this new language we can express the condition ( e.int(A(e)) A(e) 0), i.e. that the number ∀ ∧ ≥ of occurrences A(e) for each element e is a non-negative integer number. Moreover, these new constraints can express the condition e.(0 A(e) 1), which makes them appropriate for ∀ ≤ ≤ modeling fuzzy sets.

Analogously to the algorithm described in Chapter2, a decision procedure for the new logic is based on a translation of a formula with collections and cardinality constraints into a conjunction of a mixed linear integer-rational arithmetic (MLIRA) formula and a new form of condition, denoted ~u {~v F (~v)}∗. Here the star operator has the same semantics as in ∈ | Chapter2. Therefore, {~v F (~v)}∗ denotes the closure under vector addition of the set of | solution vectors ~v of the MLIRA formula F . Formally,

K K X ^ ~u {~v F (~v)}∗ K {0,1,2,...}. ~v1,...,~vK . ~u ~vi F (~vi ) ∈ | ↔ ∃ ∈ ∃ = i 1 ∧ i 1 = =

In contrast to Chapter2, the formula F in this paper is not restricted to integers, but can be arbitrary MLIRA formula. Consequently, we are faced with the problem of solving an extension of satisfiability of MLIRA formulas with the conditions ~u {~v F (~v)}∗ where F ∈ | is an arbitrary MLIRA formula. To solve this problem, we describe a finite and effectively computable representation of the solution set S {~v F (~v)}. We use this representation to = | express the condition ~u S∗ as a new MLIRA formula. This gives a “star elimination” algorithm. ∈ As one consequence, we obtain a unified decision procedure for sets, multisets, and fuzzy sets in the presence of the cardinality operator.

4.2 Examples

Figure 4.1 shows small example formulas over sets, multisets, and fuzzy sets that are expressible in our logic. The examples for sets and multisets are based on verification conditions from software verification [Kuncak and Rinard(2007)]. The remaining examples illustrate basic differences in valid formulas over multisets and fuzzy sets.

We illustrate our technique on one of the examples shown in Figure 4.1: we show that formula e.U(e) 1 A B A B A U is valid where U, A, and B are fuzzy sets. To prove ∀ = → | ∩ | + | ∪ | ≤ | | + | | formula validity, we prove unsatisfiability of its negation, conjoined with the constraints

42 4.2. Examples

Examples of constraints on sets. For each set variable s we assume the constraint e.(s(e) ∀ = 0 s(e) 1). ∨ = formula informal description x content size card content using invariant on size to ∉ ∧ = −→ (size 0 content ) prove correctness of an = ↔ = ; efficient emptiness check x content size card content maintaining correct size ∉ ∧ = −→ size 1 card({x} content) when inserting fresh + = ∪ element into set size card content maintaining size after = ∧ size1 card({x} content) inserting an element into = ∪ −→ size1 size 1 set ≤ + content alloc allocating and inserting ⊆ ∧ x alloc three objects into a 1 ∉ ∧ x alloc {x } container data structure 2 ∉ ∪ 1 ∧ x alloc {x } {x } 3 ∉ ∪ 1 ∪ 2 −→ card(content {x } {x } {x }) ∪ 1 ∪ 2 ∪ 3 = cardcontent 3 + content alloc0 x alloc0 allocating and inserting ⊆ ∧ 1 ∉ ∧ alloc0 {x } alloc1 x alloc1 at least three objects into ∪ 1 ⊆ ∧ 2 ∉ ∧ alloc1 {x } alloc2 x alloc2 a container data structure ∪ 2 ⊆ ∧ 3 ∉ −→ card(content {x } {x } {x }) ∪ 1 ∪ 2 ∪ 3 = card content 3 + x C C (C \{x}) bound on the number of ∈ ∧ 1 = ∧ card(alloc1 \ alloc0) 1 allocated objects in a ≤ ∧ card(alloc2 \ alloc1) cardC recursive function that ≤ 1 −→ card(alloc2 \ alloc0) cardC incorporates container C ≤ into another container Examples of constraints on multisets. For each multiset variable m we assume the con- straint e.int(m(e)) A(e) 0. ∀ ∧ ≥ size card content maintaining size after inserting = ∧ size1 card({x} content) an element into multiset = ] −→ size1 size 1 = + Examples of constraints on fuzzy sets. For each fuzzy set variable f we assume the con- straint e.0 f (e) 1. ∀ ≤ ≤ 2 A 2 B 1 example formula valid over | | 6= | | + multisets but invalid over fuzzy sets ( e.U(e) 1) A B A B A U example formula valid over ∀ = → | ∩ | + | ∪ | ≤ | | + | | fuzzy sets but invalid over multisets ( e.C(e) λA(e) (1 λ)B(e)) basic property of convex ∀ = + − → A B C A B combination of fuzzy ∩ ⊆ ⊆ ∪ sets [Zadeh(1965)], for any fixed constant λ [0,1] ∈ 43

Figure 4.1: Example constraints in our class. Chapter 4. Decision Procedures for Fractional Collections and Collection Images ensuring that the collections are fuzzy sets:

e.U(e) 1 A U A B A B ∀ = ∧ | | + | | < | ∩ | + | ∪ | ∧ e.0 A(e) 1 e.0 B(e) 1 e.0 U(e) 1 ∀ ≤ ≤ ∧ ∀ ≤ ≤ ∧ ∀ ≤ ≤ As previously, we eliminate the collections and reduce the formula to a formula in the extended mixed linear integer-rational arithmetic. The reduction follows the algorithm we have seen in Chapter2. We first reduce the formula to the normal form. We flatten the formula by introducing fresh variables ni for each cardinality operator. The formula reduces to:

n n n n n U n A n A B n A B 1 + 2 < 3 + 4 ∧ 1 = | | ∧ 2 = | | ∧ 3 = | ∩ | ∧ 4 = | ∪ | ∧ e.U(e) 1 e.0 A(e) 1 e.0 B(e) 1 e.0 U(e) 1 ∀ = ∧ ∀ ≤ ≤ ∧ ∀ ≤ ≤ ∧ ∀ ≤ ≤ P We next apply the definition of the cardinality operator, C e E C(e): | | = ∈ P P n1 n2 n3 n4 n1 e E U(e) n2 e E A(e) + P < + ∧ = ∈P ∧ = ∈ ∧ n3 e E(A B)(e) n4 e E(A B)(e) = ∈ ∩ ∧ = ∈ ∪ ∧ e.U(e) 1 e.0 A(e) 1 e.0 B(e) 1 e.0 U(e) 1 ∀ = ∧ ∀ ≤ ≤ ∧ ∀ ≤ ≤ ∧ ∀ ≤ ≤ Operators and are defined pointwise using ite operator: ∪ ∩ (C C )(e) max{C (e),C (e)} ite(C (e) C (e),C (e),C (e)) 1 ∪ 2 = 1 2 = 1 ≤ 2 2 1 (C C )(e) min{C (e),C (e)} ite(C (e) C (e),C (e),C (e)), where ite(A,B,C) is 1 ∩ 2 = 1 2 = 1 ≤ 2 1 2 the standard if-then-else operator, denoting B when A is true and C otherwise. Using these definitions, the example formula becomes:

X X n1 n2 n3 n4 n1 U(e) n2 A(e) + < + ∧ = e E ∧ = e E ∧ X ∈ ∈X n3 ite(A(e) B(e), A(e),B(e)) n4 ite(A(e) B(e),B(e), A(e)) = e E ≤ ∧ = e E ≤ ∧ ∈ ∈ e.U(e) 1 e.0 A(e) 1 e.0 B(e) 1 e.0 U(e) 1 ∀ = ∧ ∀ ≤ ≤ ∧ ∀ ≤ ≤ ∧ ∀ ≤ ≤ Using vectors of integers, we then group all the sums into one, and also group all universally quantified constraints:

n n n n ¡n ,n ,n ,n ¢ 1 + 2 < 3 + 4 ∧ 1 2 3 4 = X ¡U(e), A(e), ite(A(e) B(e), A(e),B(e)), ite(A(e) B(e),B(e), A(e))¢ e E ≤ ≤ ∈ e. ¡U(e) 1 0 A(e) 1 0 B(e) 1 0 U(e) 1¢ ∧ ∀ = ∧ ≤ ≤ ∧ ≤ ≤ ∧ ≤ ≤ As stated in Theorem 4.2 (a counterpart of Theorem 2.4), the last formula is equisatisfiable with

n n n n (n ,n ,n ,n ) 1 + 2 < 3 + 4 ∧ 1 2 3 4 ∈ ¡ ¢ { u,a,ite(a b,a,b),ite(a b,b,a) u 1 0 a 1 0 b 1}∗ ≤ ≤ | = ∧ ≤ ≤ ∧ ≤ ≤

44 4.2. Examples

In this chapter we will explore general techniques for solving such satisfiability problems that contain a MLIRA formula and a star operator applied to another MLIRA formula. We next illustrate some of the ideas of the general technique, taking several shortcuts to keep the exposition brief.

Because the value of the variable u is determined (u 1), we can simplify the last formula to: =

n n n n (n ,n ,n ,n ) S∗ 1 + 2 < 3 + 4 ∧ 1 2 3 4 ∈ where S {¡1,a,ite(a b,a,b),ite(a b,b,a)¢ 0 a 1 0 b 1}. By case analysis on a b, = ≤ ≤ | ≤ ≤ ∧ ≤ ≤ ≤ we conclude S S S for = 1 ∪ 2 S {(1,a,a,b) 0 a 1 0 b 1 a b} 1 = | ≤ ≤ ∧ ≤ ≤ ∧ ≤ S {(1,a,b,a) 0 a 1 0 b 1 b a} 2 = | ≤ ≤ ∧ ≤ ≤ ∧ < This eliminates the ite expressions and we have:

n n n n (n ,n ,n ,n ) (S S )∗ 1 + 2 < 3 + 4 ∧ 1 2 3 4 ∈ 1 ∪ 2 By definition of star operator, the last condition is equivalent to

n n n n (n ,n ,n ,n ) (n1,n1,n1,n1) (n2,n2,n2,n2) 1 + 2 < 3 + 4 ∧ 1 2 3 4 = 1 2 3 4 + 1 2 3 4 ∧ 1 1 1 1 2 2 2 2 (n ,n ,n ,n ) S∗ (n ,n ,n ,n ) S∗ 1 2 3 4 ∈ 1 ∧ 1 2 3 4 ∈ 2 1 1 1 1 Let us characterize the condition (n1,n2,n3,n4) S1∗. Let K1 denote the number of vectors 1 1 1 1 ∈ 1 1 in S1 whose sum is (n ,n ,n ,n ). By definition of the star operator, there are a ,...,a and 1 2 3 4 1 K1 b1,...,b1 such that 0 a1 b1 1 and 1 K1 ≤ i ≤ i ≤

K1 1 1 1 1 X 1 1 1 (n1,n2,n3,n4) (1,ai ,ai ,bi ) = i 1 = We obtain that n1 K , n1 n1 PK1 a1 : A , n PK1 b1 : B . The other case for S is 1 = 1 2 = 3 = i 1 i = 1 4 = i 1 i = 1 2 analogous and we derive (n2,n2,n2,n=2) (K , A ,B , A ).= 1 2 3 4 = 2 2 2 2

This way the star operator is “hidden” in the variables A1 and B1 and the example formula reduces to:

n n n n (n ,n ,n ,n ) (K , A , A ,B ) (K , A ,B , A ) 1 + 2 < 3 + 4 ∧ 1 2 3 4 = 1 1 1 1 + 2 2 2 2

After eliminating ni variables, the formula becomes K1 K2 B1 B2. Apply the definitions of j + < + Bi and stating the bounding properties of bi , we obtain the following formula:

K1 K2 K1 K2 X 1 X 2 ^ 1 ^ 2 K1 K2 bi bi bi 1 bi 1 + < i 1 + i 1 ∧ i 1 ≤ ∧ i 1 ≤ = = = =

45 Chapter 4. Decision Procedures for Fractional Collections and Collection Images

In this case, it is easy to see that the resulting formula is contradictory, since K K is an 1 + 2 integer. The unsatisfiable formula proves that the initial formula is valid over fuzzy sets. In general, such formulas are equivalent to existentially quantified MLIRA formulas, despite the fact that their initial formulation involves sums with parameters such as K1 and K2. This is possible thanks to the special structure of the sets of solutions of MLIRA formulas.

Having seen the way to prove validity, we illustrate how to produce counterexamples by showing that the original formula is invalid over multisets. Restricting the range of each collection to integers and using the same reduction, we derive formula

n1 n2 n3 n4 + < + ¡∧ ¢ (n ,n ,n ,n ) { 1,a,ite(a b,a,b),ite(a b,b,a) a,b N}∗ 1 2 3 4 ∈ ≤ ≤ | ∈ j Applying again a similar case analysis, we deduce K K PK1 b1 PK2 b2 where all b ’s 1 + 2 < i 1 i + i 1 i i are non-negative integers. This formula is satisfiable, for example,= with a= satisfying variable assignment K 1,b1 2 and K 0. Applying the proof of Theorem 4.2, we construct a 1 = 1 = 2 = multiset counterexample. Because K 0, no vector from S contributes to sum, and we 2 = 2 consider only S1. Variable K1 denotes the number of elements of a domain set E, so we consider the domain set E {e }. Multisets A,B and U are defined by A(e ) 1, B(e ) 2, and = 1 1 = 1 = U(e ) 1. It can easily be verified that this is a counterexample for validity of the formula over 1 = multisets.

4.3 From Collections to Stars

This section describes the translation from constraints on collections to constraints that use the star operator. We first present the syntax of our constraints and clarify the semantics of selected constructs (the semantics of the remaining constructs can be derived from their translation into simpler ones).

We model each collection c as a function whose domain is a finite set E of unknown size and whose range is the set of rational numbers. When the constraints imply that the range of c is {0,1}, then c models sets, when the range of c are non-negative integers, then c denotes standard multisets (bags), in which an element can occur multiple times. We call the number of occurrences of an element e, denoted c(e), the multiplicity of an element. When the range of c is restricted to be in interval [0,1], then c describes a fuzzy set [Zadeh(1965)].

In addition to standard operations on collections (such as plus, union, intersection, difference), P we also allow the cardinality operator, defined as c e E c(e). This is the desired definition | | = ∈ for sets and multisets, and we believe it is a natural notion for fuzzy sets over a finite universe E. Figure 4.2 shows a context-free grammar of our formulas involving collections.

Semantics of some less commonly known operators is defined as follows. The set(C) operator takes as an argument collection C and returns the set of all elements for which C(e) is positive.

46 4.3. From Collections to Stars

top-level formulas: F :: A F F F = | ∧ | ¬ in out A :: C C C C e.F A = = | ⊆ | ∀ | outer linear arithmetic formulas: out out out out out F :: A F F F out = out | out ∧out |out ¬ out out in in A :: t t t t (t ,...,t ) P(t ,...,t ) = ≤ | = | = Fin out out out out out out out out t :: k |C| K t t K t t ite(F ,t ,t ) = | | | + | · | b c | inner linear arithmetic formulas: in in in in in F :: A F F F in = in | in ∧in |in ¬ A :: t t t t in = ≤ | in= in in in in in in t :: c(e) K t t K t t ite(F ,t ,t ) = | | + | · | b c | expressions about collections: C :: c C C C C C C C \C C \\C set(C) = | ; | ∩ | ∪ | ] | | | terminals: c - collection variable; e - index variable (fixed) k - rational variable; K - rational constant

Figure 4.2: Quantifier-Free Formulas about Collection with Cardinality Operator

To constrain a variable s to denote a set, use formula e.s(e) 0 s(e) 1. To constraint ∀ = ∨ = a variable m to denote a multiset, use formula ( e.int(m(e)) m(e) 0). Here int(x) is a ∀ ∧ ≥ shorthand for x x where x is the largest integer smaller than or equal to x. b c = b c The novelty of constraints in Figure 4.2 compared to the language in Figure 2.3 is the presence of the floor operator x and not only integer but also rational constants. All variables in our b c current language are interpreted over rationals, but any of them can be restricted to be integer using the constraint int(x). In addition, there are no restrictions that the inner terms should be only non-negative. This way the language in Figure 4.2 is a generalization of the language in Figure 2.3.

To reduce reasoning about collections to reasoning in linear arithmetic with stars, we follow the idea from Chapter2 and convert a formula to the sum normal form.

Definition 4.1 A formula is in sum normal form iff it is of the form

X P (u1,...,un) (t1,...,tn) e.F ∧ = e E ∧ ∀ ∈ where P is a quantifier-free linear arithmetic formula with no collection variables, and where variables in t1,...,tn and F occur only as expressions of the form c(e) for a collection variable c and e the fixed index variable. Formula F can also contain terms of the form t . b c

To transform a formula in the logic of Figure 4.2 into its sum normal form, we use the algorithm

47 Chapter 4. Decision Procedures for Fractional Collections and Collection Images given in Figure 2.4. Analogously, the translation results with the corresponding theorem:

P Theorem 4.2 A formula (u1,...,un) e E(t1,...,tn) e.F is equisatisfiable with the formula = ∈ ∧ ∀ (u ,...,u ) {(t 0 ,...,t 0 ) x Q F 0}∗ where t 0 and F 0 are t and F respectively in which each 1 n ∈ 1 n | i ∈ ∧ j j ci (e) is replaced by a fresh variable xi .

Proof. Identical to the proof of Theorem 2.4.

Thanks to Theorem 4.2, in the rest of the paper we investigate the satisfiability problem for such formulas, whose syntax is given in Figure 4.3. These formulas are sufficient to check satisfiability for formulas in Figure 4.2. We extend the semantics of the atom ~u {~x F }∗ – ~u is ∈ | a finite sum of the solution vectors of formula F . They do not need to be non-negative.

MLIRA∗ formulas: F (u ,...,u ) {(x ,...,x ) F }∗ (free variables of F are among x) 0 ∧ 1 n ∈ 1 n | MLIRA formulas: F :: A F F F F F = | 1 ∧ 2 | 1 ∨ 2 | ¬ 1 A :: T T T T = 1 ≤ 2 | 1 = 2 T :: k C T T C T T ite(F,T ,T ) = | | 1 + 2 | · 1 | b c | 1 2 terminals: k - rational variable; C - rational constant

Figure 4.3: Syntax of Mixed Integer-Rational Linear Arithmetic with Star

4.4 Separating Mixed Constraints

As justified in previous sections, we consider the satisfiability problem for G(~r ,w~ ) w~ {~x ∧ ∈ | F (~x)}∗ where F and G are quantifier-free, mixed linear integer-rational arithmetic (MLIRA) formulas.

Our goal is to give an algorithm for constructing another MLIRA formula F 0 such that w~ ∈ {~x F (~x)}∗ is equivalent to w~ 0.F 0(w~ 0,w~ ). This will reduce the satisfiability problem to the | ∃ satisfiability of G(~r ,w~ ) F 0(w~ 0,w~ ). ∧ As a first step towards this goal, this section shows how to represent the set {~x F (~x)} using | solutions of pure integer constraints and solutions of pure rational constraints. We proceed in several steps.

Step 1. Eliminate the floor functions from F using integer and real variables, applying from left to right the equivalence

C( t ) y Q. y Z. t y y y y 1 C(y ) b c ↔ ∃ Q ∈ ∃ Z ∈ = Q ∧ Z ≤ Q < Z + ∧ Z 48 4.4. Separating Mixed Constraints

The result is an equivalent formula without the floor operators, where some of the variables are restricted to be integer.

Step 2. Transform F into linear programming problems, as follows. First, eliminate if- then-else expressions by introducing fresh variables and using disjunction (as in Chapter2). Then transform formula to negation normal form. Eliminate t t by transforming it into 1 = 2 t t t t . Eliminate t t by transforming it into t t t t . Following [Dutertre 1 ≤ 2 ∧ 2 ≤ 1 1 6= 2 1 < 2 ∨ 2 < 1 and de Moura(2006b), Section 3.3], replace each t t with t δ t where δ is a special 1 < 2 1 + ≤ 2 variable (the same for all strict inequalities), for which we require 0 δ 1. We obtain for < ≤ some d matrices A for 1 i d such that i ≤ ≤ d Z dZ Q dQ _ Z Q F (~x) ~y Z . ~y Q . δ Q(0,1]. Ai (~x,~y ,~y ,δ) ~bi ↔ ∃ ∈ ∃ ∈ ∃ ∈ i 1 · ≤ = where A (~x,~y Z ,~yQ ,δ) denotes multiplication of matrix A by the vector (~x,~y Z ,~yQ ,δ) obtained i · i by stacking vectors ~x, ~y Z , ~yQ and the value δ.

Step 3. Represent the rational variables ~x, ~yQ as a sum of its integer part and its rational part from Q[0,1), obtaining

d ³ Z Z d R R Q0 F (~x) (~x ,~y ) Z Z0 . (~x ,~y ) Q . δ Q . ↔ ∃ ∈ ∃ ∈ [0,1) ∃ ∈ (0,1] d ´ Z R W Z Z R R ~ 0 ~x ~x ~x Ai0 (~x ,~y ,~x ,~y ,δ) bi = + ∧ i 1 · ≤ =

Note that w~ {~x ~y.H(~x,~y)}∗ is equivalent to ∈ | ∃

w~ 0.(w~ ,w~ 0) {(~x,~y) H(~x,~y)}∗ ∃ ∈ | In other words, we can push existential quantifiers to the top-level of the formula. Therefore, the original problem (after renaming) becomes

G(~r ,w~ ) ~z.(~uZ ,~uQ ,∆) ∧ ∃ d ∈ Z R W Z R ~ Z dZ R dR {(~x ,~x ,δ) Ai (~x ,~x ,δ) bi , ~x Z ,~x Q[0,1),δ Q(0,1]}∗ | i 1 · ≤ ∈ ∈ ∈ = where the vector ~z contains a subset of variables ~uZ ,~uQ ,∆.

Step 4. Separate integer and rational parts, as follows. Consider one of the disjuncts A · (~x Z ,~yR ,δ) ~b. For A [A A c] this linear condition can be written as A ~x Z A ~xR ~cδ ~b, ≤ = Z R Z + R + ≤ that is

A ~xR ~cδ ~b A ~x Z (4.1) R + ≤ − Z R Because the right-hand side is integer, for ~a denoting AR~x ~cδ (left-hand side rounded d + e d up), the equation becomes A ~xR ~cδ ~a ~b A ~x Z . Because ~xR Q Q ,δ Q , vector ~a R + ≤ ≤ − Z ∈ [0,1] ∈ (0,1]

49 Chapter 4. Decision Procedures for Fractional Collections and Collection Images

is bounded by the norm M1 of the matrix [AR ~c]. Formula (4.1) is therefore equivalent to the finite disjunction

_ Z R AZ~x ~b ~a AR~x ~cδ ~a (4.2) ~a Zd , ~a M ≤ − ∧ + ≤ ∈ || ||≤ 1 Note that each disjunct is a conjunction of a purely integer constraint and a purely rational constraint.

Step 5. Propagate star through disjunction, using the property

n n n _ X ^ w~ {~x Hi (~x)}∗ w~ 1,...,w~ n. w~ w~ i w~ i {~x Hi (~x)}∗ ∈ | i 1 ↔ ∃ = i 1 ∧ i 1 ∈ | = = = The final result is an equivalent conjunction of a MLIRA formula and an existentially quantified conjunction of formulas of the form

(~uZ ,~uQ ,∆) {(~x Z ,~xR ,δ) A ~x Z ~b , A (~xR ,δ) ~b , (4.3) ∈ | Z ≤ Z R · ≤ R Z d R dR ~x Z Z ,~x Q ,δ Q }∗ ∈ ∈ [0,1) ∈ (0,1]

4.4.1 Example

In Section 4.2 we briefly outlined how to checked the validity of formulas about fractional collections. The reasoning used in that example was fairly simple and tailor-made. In this section, we will show how can any given formula be reduced to a conjunction of MLIRA∗ formulas. To do that, we apply the algorithm which was just introduced in the previous subsection. We will apply it blindly and without using further simplifications or any additional reasoning on the same formula as in the Section 4.2.

We first need to reduce the formula that reasons about fuzzy sets to the formula of the form

G(~r ,w~ ) w~ {~x F (~x)}∗. The translation of the original formula to the required form is ∧ ∈ | described in details in Figure 2.4 and after the translation we obtain the formula:

n n n n (n ,n ,n ,n ) 1 + 2 < 3 + 4 ∧ 1 2 3 4 ∈ ©¡ ¢ ª u,a,ite(a b,a,b),ite(a b,b,a) u 1 0 a 1 0 b 1 ∗ ≤ ≤ | = ∧ ≤ ≤ ∧ ≤ ≤ which reduces to

n n n n (n ,n ,n ,n ) {(u,a,t ,t ) 1 + 2 < 3 + 4 ∧ 1 2 3 4 ∈ 1 2 | u 1 0 a 1 0 b 1 t ite(a b,a,b) t ite(a b,b,a)}∗ = ∧ ≤ ≤ ∧ ≤ ≤ ∧ 1 = ≤ ∧ 2 = ≤

This formula has the desired form so we can apply the algorithm from the previous section that will separate it into reasoning about integers and small rationals. First we need to transform

50 4.4. Separating Mixed Constraints the formula F (u,a,t ,t ) u 1 0 a 1 0 b 1 t ite(a b,a,b) t ite(a b,b,a). 1 2 ≡ = ∧ ≤ ≤ ∧ ≤ ≤ ∧ 1 = ≤ ∧ 2 = ≤ As the floor function does not occur in F , Step 1 is not applied. In Step 2 we transform F into a linear programming problem. After eliminating if-then-else expression, F becomes F (u,a,t ,t ) ((u 1 0 a 1 0 t 1 a t t a) (u 1 0 a 1 0 t 1 2 ≡ = ∧ ≤ ≤ ∧ ≤ 2 ≤ ∧ ≤ 2 ∧ 1 = ∨ = ∧ ≤ ≤ ∧ ≤ 1 ≤ 1 t a t a)). Next we transform equalities and strict inequalities into equalities, and ∧ 1 < ∧ 2 = finally we obtain the following formula:

F (u,a,t ,t ) 1 2 ≡ ((u 1 1 u 0 a 1 0 t 1 a t t a a t ) ≤ ∧ ≤ ∧ ≤ ≤ ∧ ≤ 2 ≤ ∧ ≤ 2 ∧ 1 ≤ ∧ ≤ 1 (u 1 1 u 0 a 1 0 t 1 t δ a a t t a)) ∨ ≤ ∧ ≤ ∧ ≤ ≤ ∧ ≤ 1 ≤ ∧ 1 + ≤ ∧ ≤ 2 ∧ 2 ≤

This way we obtain the following representation of F :

2 _ F (u,a,t1,t2) δ Q(0,1]. Ai (u,a,t1,t2,δ) ~bi ≡ ∃ ∈ i 1 · ≤ =

h i where A AZ AQ Aδ and those matrices has the following values: i = i i i         1 0 0 0 0 1          1  0 0 0  0  1 −      −   0   1 0 0  0  0    −               0   1 0 0  0  1  Z   Q   δ     A  0 , A  0 0 1,A 0, ~b1  0 , 1 =   1 =  −  1 =   =            0   0 0 1  0  1           0   1 0 1 0  0     −       0   1 1 0  0  0    −      0 1 1 0 0 0         1 0 0 0 0 1          1  0 0 0  0  1 −      −   0   1 0 0  0  0    −               0   1 0 0  0  1  Z   Q   δ     A  0 , A  0 1 0 , A 0, ~b2  0 . 2 =   2 =  −  2 =   =            0   0 1 0  0  1           0   1 1 0  1  0    −       0   1 0 1 0  0     −      0 1 0 1 0 0 −

A variable u is an integer variable, while a, t1 and t2 are rational variables. In Step 3 we

51 Chapter 4. Decision Procedures for Fractional Collections and Collection Images represent each rational variable as a sum of its integer and rational part.

F (u,a,t ,t ) ~v Z Z3. ~vR Q3 . δ Q . 1 2 ≡ ∃ ∈ ∃ ∈ [0,1) ∃ ∈ (0,1] 2 _ ~ (a,t1,t2) ~vZ ~vR Ai0 (u,~vZ ,~vQ ,δ) bi0 ) = + ∧ i 1 · ≤ =

We think about the vector ~v as an integer part of a,t and t variables: ~v (aZ ,t Z ,t Z ), while Z 1 2 Z = 1 2 ~vR represents their rational part. Matrices A10 and A20 are constructed using previously derived h Z Q Q δi matrices: A0 A A A A , while ~b0 ~b . i = i i i i i = i Finally, we push the existential quantifiers that appear in formula F to the top level formula and we obtain the following:

Z Z Z 3 ~ Q Q Q 3 n1 n2 n3 n4 (n2 ,n3 ,n4 ) Z . (n2 ,n3 ,n4 ), Q 0. ∆ Q 0. + < + ∧ ∃ ∈ ∃ ∈ ≥ ∃ ∈ > 2 Z Z Z Q Q Q ¡ ¢ _ ~ (n1,n2 ,n3 ,n4 ,n2 ,n3 ,n4 ,∆) { u,~vZ ,~vQ ,δ) Ai0 (u,~vZ ,~vQ ,δ) bi0 }∗ ∈ | i 1 · ≤ =

In order to make it more readable, we introduce the following shorthands: ~uZ (n ,nZ ,nZ ,nZ ) = 1 2 3 4 and ~uQ ~(nQ ,nQ ,nQ ). Similarly, ~x Z (u,~v ) and~xR ~v . At the end, we also rename the vari- = 2 3 4 = Z = Q ~ ables for matrices Ai0 and vectors bi0 and finally we obtain a formula that closely corresponds to the formula described in the algorithm:

Z Z Z 3 ~ Q Q Q 3 n1 n2 n3 n4 (n2 ,n3 ,n4 ) Z . (n2 ,n3 ,n4 ), Q 0. ∆ Q 0. + < + ∧ ∃ ∈ ∃ ∈ ≥ ∃ ∈ > 2 Z Q ¡ Z R ¢ _ Z R (~u ,~u ,∆) { ~x ,~x ,δ) Ai (~x ,~x ,δ) ~bi }∗ ∈ | i 1 · ≤ =

Step 4 separates integer and rational parts. We demonstrate the technique on only one disjunct (the second one as it contains the non-zero δ-part). All other disjuncts can be transformed similarly. In the matrix A we identify three parts: AZ , AR and Aδ that corresponds to the parts which are multiplying integer, rational and δ variables. Applying this on A , we obtain, h i 2 A A A A , where 2 = Z R δ

52 4.4. Separating Mixed Constraints

      1 0 0 0 0 0 0 0        1 0 0 0   0 0 0  0 −       0 1 0 0   1 0 0  0  −  −           0 1 0 0   1 0 0  0       AZ  0 0 1 0 , AR  0 1 0 , A 0 =  −  =  −  δ =          0 0 1 0   0 1 0  0        0 1 1 0   1 1 0  1  −  −     0 1 0 1  1 0 1 0  −   −    0 1 0 1 1 0 1 0 − −

Next, we estimate the values that the vector A ~xR A δ can reach. Our goal is to construct an R + δ integer vector ~a such that A ~xR A δ ~a ~b A ~x Z . The set of values that ~a can have, we R + δ ≤ ≤ − Z denote with A .

    0 0      0   0      ( 1,0]  0   −         [0,1)  x1 R     AR~x A δ ( 1,0] A { 0  xi {0,1}} + δ ∈  −  =⇒ =   | ∈      [0,1)  x2     ( 1,1] x   −   3 ( 1,1) x   −   4 ( 1,1) x − 5 Once we have constructed A , the original problem A ~x Z A ~xR A δ ~b becomes equiva- Z + R + δ ≤ lent to the finite disjunction

_ Z ~ R AZ~x b ~a AR~x Aδδ ~a ~a A ≤ − ∧ + ≤ ∈ and integer and rational part are fully separated. This way the original formula becomes:

Z Z Z 3 ~ Q Q Q 3 n1 n2 n3 n4 (n2 ,n3 ,n4 ) Z . (n2 ,n3 ,n4 ), Q 0. ∆ Q 0. + < + ∧ ∃ ∈ ∃ ∈ ≥ ∃ ∈ > 2 Z Q ¡ Z R ¢ _ _ Z ~ R ~ (~u ,~u ,∆) { ~x ,~x ,δ) (AZ i~x bi ~a AR i~x Aδi δ ~a)}∗ ∈ | i 1~a A ≤ − ∧ + ≤ = ∈ i

The last step is to propagate the star operator though disjunctions. The number of disjuncts is finite and thus (~uZ ,~uQ ,∆) can be written as a finite summation. Each its summand is an elements of a set described by a MLIRA∗ formula. For the simplicity of notation, let us assume that all disjunctions are numbered with natural numbers 1,2,...,64 and corresponding matrices A and vectors b are indexed with those numbers.

53 Chapter 4. Decision Procedures for Fractional Collections and Collection Images

Z Z Z 3 ~ Q Q Q 3 n1 n2 n3 n4 (n2 ,n3 ,n4 ) Z . (n2 ,n3 ,n4 ), Q 0. ∆ Q 0. + < + ∧ ∃ ∈ ∃ ∈ ≥ ∃ ∈ > 64 Z Q X ~Z ~Q (~u ,~u ,∆) (u i ,u i ,∆i ) = i 1 ∧ = 64 ^ ~Z ~Q ¡ Z R ¢ Z ~ R ~ ~ (u i ,u i ,∆i ) { ~x ,~x ,δ) AZ i~x bZ i AR i~x Aδi δ bR i }∗ i 1 ∈ | ≤ ∧ + ≤ =

In the rest of the chapter we show how to describe vector (~uZ ,~uQ ,∆) without the star operator.

4.5 Eliminating the Star Operator from Formulas

The previous section sets the stage for the following star-elimination theorem, Theorem 4.4, which is the core result for the decidability problem of the logic defined in Figure 4.3. To prove Theorem 4.4 we need Theorem 4.3, which can be seen as a generalization of the fact that non-negative solutions of integer linear arithmetic formulas are semilinear sets.

Theorem 4.3 (Corollary 1 in [Pottier(1991)]) Consider a system of inequations A~x ~b where ≤ A Zm,n and b Zm. Let C be a set of all solutions of A~x ~b Then there exist two finite sets ∈ ∈ ≤ C ,C Zn such that 1 2 ⊆ ~x C ~x ~x ~x ... ~x , with ~x C and ~x ,...,~x C ∈ ⇔ = 0 + 1 + + k 0 ∈ 1 1 k ∈ 2

Theorem 4.4 Let F be a quantifier-free MLIRA formula. Then there exist effectively computable integer vectors ~ai and ~bi j and effectively computable rational vectors ~c1,...,~cn with coordinates in Q[0,1] such that formula (4.3) is equivalent to a formula of the form

(~uZ ,~uQ ,∆) ~0 K N. µ ,...,µ ,ν ,...,ν N. β ,...,β Q. = ∨ ∃ ∈ ∃ 1 q 11 qqq ∈ ∃ 1 n ∈ q qi q qi q Z X X ^ X X ~u (µi ~ai νi j~bi j ) (µi 0 νi j 0) µi K = i 1 + j 1 ∧ i 1 = → j 1 = ∧ i 1 ======n n n Q X ^ X K 1 ∆ 0 (~u ,∆) βi~ci βi 0 βi K (4.4) ∧ ≥ ∧ > ∧ = i 1 ∧ i 1 ≥ ∧ i 1 = = = =

Proof. For a set of vectors S and an integer variable K , we define KS {v ... v v ,...,v = 1+ + K | 1 K ∈ S}. Formula (4.3) is satisfiable iff there exists non-negative integer K N such that both ∈ ~uZ K {~x Z A ~x Z ~bZ } (4.5) ∈ | Z ≤ and

(~uQ ,∆) K {(~xR ,δ) A (~xR ,δ) ~bR ,~xR QdR ,δ Q } (4.6) ∈ | R · ≤ ∈ [0,1) ∈ (0,1] 54 4.5. Eliminating the Star Operator from Formulas hold. We show how to describe (4.5) and (4.6) as existentially quantified MLIRA formulas that share the variable K .

If K 0 then (~uZ ,~uQ ,∆) must be a zero vector. This is a trivial case as both formulas are = satisfiable. In the rest of the proof we consider a non-trivial case when (~uZ ,~uQ ,∆) is a non- zero vector. Then K must be K 1 and ∆ 0. ≥ >

Following Theorem 2.14 and Theorem 4.3, ~u S∗ can be expressed as a Presburger arithmetic ∈ formula. In particular, formula (4.5) is equivalent to

N Z Pq Pqi ~ µ1,...,µq ,ν11,...,νqqq . ~u i 1(µi ~ai j 1 νi j bi j ) (4.7) ∃ ∈ q = = + = q∧ V Pqi P (µi 0 j 1 νi j 0) ( µi K ) i 1 = → = = ∧ i 1 = = = Z where vectors ~ai ’s and ~bi j can be computed effectively from AZ and b .

We next characterize condition (4.6). Renaming variables and incorporating the boundedness of ~x,δ into the linear inequations, we can write such condition in the form

(~uQ ,∆) K {(~x,δ) A (~x,δ) ~b,δ 0} (4.8) ∈ | · ≤ > Here A is a new matrix such that A (~x,δ) b subsumes the conditions~0 ~x ~1 and 0 δ 1. · ≤ ≤ ≤ ≤ ≤ The fact that ~x Q was converted to ~x Q using δ as before. From the theory of linear ∈ [0,1) ∈ [0,1] programming [Schrijver(1998)] it follows that the set {(~x,δ) A (~x,δ) ~b} is a polyhedron, and | · ≤ because the solution set is bounded, it is in fact a polytope. Therefore, there exist finitely many vertices ~c ,...,~c Qd for some d such that A (~x,δ) ~b is equivalent to 1 n ∈ [0,1] · ≤ n n X X λ1,...,λn Q[0,1]. λi 1 (~x,δ) λi~ci ∃ ∈ i 1 = ∧ = i 1 = = Consequently, (4.8) is equivalent to

K Q X ~u1,...,~uK . δ1,...,δK .(~u ,∆) (~u j ,δj ) λ11,...,λK n. ∃ ∃ = j 1 ∧ ∃ K n n = n (4.9) ^ ³ ^ X X ´ λi j 0 λi j 1 (~u j ,δj ) λi j~ci δj 0 j 1 i 1 ≥ ∧ i 1 = ∧ = i 1 ∧ > = = = = It remains to show that the above condition is equivalent to

n n n Q X ^ X β1,...,βn.(~u ,∆) βi~ci βi 0 βi K (4.10) ∃ = i 1 ∧ i 1 ≥ ∧ i 1 = = = = PK Consider a solution of (4.9). Letting βi j 1 λi j we obtain a solution of (4.10). Conversely, = = consider a solution of (4.10). Letting α β /K , ~u ~u/K , δ ∆/K we obtain a solution i j = i j = j = of (4.9). This shows the equivalence of (4.9) and (4.10).

55 Chapter 4. Decision Procedures for Fractional Collections and Collection Images

Conjoining formulas (4.10) and (4.7) we complete the proof of Theorem 4.4.

4.5.1 Satisfiability Checking for Collection Formulas

Because star elimination (as well as the preparatory steps in Section 4.4) introduce only existential quantifiers, and the satisfiability of MLIRA formulas is decidable (see e.g. [Dutertre and de Moura(2006b), Dutertre and de Moura(2006a)]), we obtain the decidability of the initial formula G(~r ,w~ ) w~ {~x F (~x)}∗. Thanks to transformation to sum normal form and ∧ ∈ | Theorem 4.2, we obtain the decidability of formulas involving sets, multisets and fuzzy sets.

Techniques for deciding satisfiability of MLIRA formulas are part of implementations of modern satisfiability modulo theory theorem provers [Dutertre and de Moura(2006b),Dutertre and de Moura(2006a), Berezin et al.(2003)Berezin, Ganesh, and Dill] and typically use SAT solving techniques along with techniques from mixed integer-linear programming.

4.5.2 Satisfiability Checking for Generalized Multisets Formulas

Consider the logic defined in Figure 2.3, but without the requirement that inner terms need to be non-negative. We also apply the same semantics for the operator $*$ as in this section: $\vec{u} \in \{\vec{x} \mid F\}^*$ indicates that the vector $\vec{u}$ is a finite sum of the solution vectors of $F$. We show that the satisfiability of this new logic remains an NP-complete problem.

To establish NP-completeness, we follow the steps of the original NP-completeness proof, as presented in Section 2.7, with a few small modifications. First, instead of semilinear sets, we use the characterization given in Theorem 4.3: the vector $\vec{u}$ is a sum of integer vectors, not necessarily non-negative ones. The bound on their size is exactly the same as the bound given in Theorem 2.11 (cf. Corollary 1 in [Pottier(1991)]). Next, there are only polynomially many vectors that will participate in the sum: Theorem 2.19 is defined for all integers, not just natural numbers. The number of generating vectors depends on their maximal size and the dimension of the problem. Using those facts we can construct formula (2.13).

The last step is to establish a bound on the size of the $\lambda_i$. Although Theorem 2.22 talks only about non-negative integers, we can still use exactly the same bound as defined in Theorem 2.23. By analyzing the proof of Theorem 2.23, we can see that the only major difference is that the matrix $A_g$ in the case of semilinear sets contains non-negative integer generators, while in the new proof it can also contain negative values. Nevertheless, we can still apply Theorem 2.22 because it establishes a bound on the size of non-negative solutions of an integer system of equalities. From this bound we can derive a bound on the $\lambda_i$.

Having a bound on the $\lambda_i$, we are able to construct a linear integer arithmetic formula equisatisfiable with the given formula. The new formula is polynomial in the size of the original formula. Since satisfiability checking of linear integer arithmetic formulas is an NP-complete problem, checking satisfiability of the extended multiset formulas is also an NP-complete problem.


4.6 Decision Procedures for Collection Images

Logics that involve collections (sets, multisets), and cardinality constraints are useful for reasoning about unbounded data structures and concurrent processes. To make such logics more useful in verification, we extend them in this section with the ability to compute function images. We establish decidability and complexity bounds for the extended logics.

4.6.1 Motivating Examples for Collection Images

We start by listing several examples from verification of data structures that have motivated us to consider extending the logic of sets with cardinalities (BAPA) with functions.

nodes ⊆ alloc ∧ card tmp = 1 ∧ tmp ∩ alloc = ∅ ∧ data[tmp] = e ∧ content = data[nodes] ∧
nodes1 = nodes ∪ tmp ∧ content1 = data[nodes1] → card content1 ≤ card content + 1

Figure 4.4: Verification condition for verifying that by inserting an element into a list, the size of the list does not decrease. The variables occurring in the formula have the following types: nodes, alloc, tmp, e, content, content1 :: Set⟨E⟩, data :: E → E.

nodes ⊆ alloc ∧ card tmp = 1 ∧ tmp ∩ alloc = ∅ ∧ data[tmp] = e ∧ content = data[nodes] ∧
nodes1 = nodes ∪ tmp ∧ content1 = data[nodes1] → card content1 = card content + 1

Figure 4.5: Verification condition for verifying that by inserting an element into a list, the size of a list increases by one. The variables occurring in the formula have the following types: nodes, alloc, tmp :: Set⟨E⟩, content, content1, e :: Multiset⟨E⟩, data :: E → E.

We start with a dynamically allocated data structure (such as a list or a tree) that manipulates a set of linked nodes denoted by the variable nodes. The useful content in the data structure is stored in the data fields of the elements of nodes. The nodes set can be either explicitly manipulated through a library data type or built-in type [Dewar(1979)], or it can be verified to correspond to a set of reachable objects using techniques such as [Wies et al.(2006)Wies, Kuncak, Lam, Podelski, and Rinard]. The content of the list, stored in the content specification variable, is then an image of nodes under the function data. We consider two cases of specification in our example: 1) content is a set, that is, multiple occurrences of elements are ignored, and 2) content is a multiset, preserving the counts of occurrences of each element in the data structure.

The verification condition generated for the case when the image is a set is given in Figure 4.4. A more precise abstraction is obtained if content is viewed as a multiset. Figure 4.5 shows the verification condition for this case. In the next section, we describe a decision procedure that can reason about such a logic, where functions can also return a multiset, not only a set. The approach also rewrites sets as a disjoint union of Venn regions. It then constrains the cardinality of the multiset obtained through the image to be equal to the cardinality of the original set. This final formula is a formula in the NP-complete logic for reasoning about multisets and cardinality constraints defined in Chapter 2.

4.6.2 Logic of Multiset Images of Functions

In this section, we extend the logic of multisets with cardinalities to also include a function image operator that maps a set into a multiset.

We define the function image of a set $A$ to be a multiset $f[A] : E \rightarrow \mathbb{N}$:
$$(f[A])(e) = |\{x \mid x \in A \wedge f(x) = e\}|$$
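As a small illustration (with hypothetical elements, not taken from the examples above), if $A = \{n_1, n_2, n_3\}$ and $f(n_1) = f(n_2) = a$, $f(n_3) = b$, then $f[A]$ is the multiset in which $(f[A])(a) = 2$ and $(f[A])(b) = 1$, whereas the set abstraction $\mathsf{set}(f[A])$ is just $\{a, b\}$.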

Figure 4.6 shows the logic that embeds the logic of multisets (defined in Figure 2.3), and extends it with the multiset image operator. The logic distinguishes the sorts of sets and multisets, but also includes a casting function mset(B) which treats a set as a multiset, and an abstraction function set(M) which extracts the set of distinct elements that occur in a multiset.

F ::= A | F ∨ F | ¬F
A ::= B ⊆ B | M ⊆ M | T ≤ T | K dvd T
B ::= x | ∅ | U | B ∪ B | B ∩ B | Bᶜ | set(M)
M ::= m | ∅ | M ∩ M | M ∪ M | M ⊎ M | M \ M | M \\ M | mset(B) | f[B]
T ::= k | K | MAXC | T₁ + T₂ | K · T | |B| | |M|
K ::= ··· | −2 | −1 | 0 | 1 | 2 | ···

Figure 4.6: Logic of multisets, cardinality operator, and multiset images of sets

Given a formula F in the language described in Figure 4.6, a decision procedure for F works as follows:

1. Apply the algorithm in Figure 4.7 to translate F into an equisatisfiable multiset formula F′ in the syntax of the multiset logic defined in Figure 2.3. In this step we eliminate function symbols. The new formula F′ has size singly exponential in the size of F.

2. Invoke on the formula F′ the decision procedure described in Chapter 2. The decision procedure runs in NP time.

The correctness of the reduction is stated in the following theorem.

Theorem 4.5 Given a formula F as an input to the algorithm described in Figure 4.7, let the formula F′ be its output. Then the formulas F and F′ are equisatisfiable and their satisfying assignments have the same projections on the set and multiset variables occurring in F.

Proof. Let α be a model for F. From α we construct a model for F′ by interpreting $M_i$ as $\alpha(f[s_i])$.


INPUT: formula in the syntax of Figure 4.6
OUTPUT: multiset formula in the syntax of Figure 2.3

1. Flatten expressions containing the operator set:
   $C[\ldots \mathsf{set}(M) \ldots] \leadsto (B_F = \mathsf{set}(M) \wedge C[\ldots B_F \ldots])$
   where the occurrence of set(M) is not already in a top-level conjunct of the form B = set(M) for some set variable B, and $B_F$ is a fresh unused set variable

2. Let S be the set of set variables occurring in the formula. Define the set $S_N = \{s_1,\dots,s_Q\}$ of Venn regions over the elements of S

3. Rewrite each set expression as a disjoint union of the Venn regions from $S_N$

4. Eliminate function symbols:
   $C[\ldots f[s_{i_1} \cup \dots \cup s_{i_k}] \ldots] \leadsto C[\ldots (M_{i_1} \uplus \dots \uplus M_{i_k}) \ldots]$
   where each $M_{i_j}$ is a fresh multiset variable denoting $f[s_{i_j}]$

5. Add the conjuncts which state a necessary condition for $M_{i_j} = f[s_{i_j}]$:
   $F \leadsto F \wedge \bigwedge_{i=1}^{Q} |s_i| = |M_i|$

6. Add the conjuncts which state that the $s_{i_j}$ are disjoint sets:
   $F \leadsto F \wedge \forall e.\ \bigwedge_{i=1}^{Q} (s_i(e) = 0 \vee s_i(e) = 1) \wedge \bigwedge_{i \neq j} s_i \cap s_j = \emptyset$

Figure 4.7: Algorithm for eliminating function symbols
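To make steps 2 and 3 of Figure 4.7 concrete, the following Scala sketch enumerates the Venn regions over a family of named sets and checks that each original set is recovered as the disjoint union of its regions. This is purely illustrative code written for this presentation (object and method names such as VennRegions and signVectors are invented here), not the implementation accompanying the thesis.

object VennRegions {
  // Enumerate all 2^n polarity assignments (sign vectors) over the given set variables.
  def signVectors(vars: List[String]): List[Map[String, Boolean]] = vars match {
    case Nil => List(Map.empty[String, Boolean])
    case v :: rest =>
      for (m <- signVectors(rest); b <- List(true, false)) yield m + (v -> b)
  }

  // Given concrete finite sets, compute the elements that fall into each Venn region.
  def regions[E](sets: Map[String, Set[E]]): Map[Map[String, Boolean], Set[E]] = {
    val universe = sets.values.flatten.toSet
    signVectors(sets.keys.toList).map { sign =>
      sign -> universe.filter(e => sign.forall { case (v, inside) => sets(v).contains(e) == inside })
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val s = Map("nodes" -> Set(1, 2, 3), "tmp" -> Set(4))
    val rs = regions(s)
    // Step 3: each original set is the disjoint union of the regions whose sign vector includes it.
    val nodesFromRegions = rs.collect { case (sign, elems) if sign("nodes") => elems }.flatten.toSet
    println(rs.filter(_._2.nonEmpty))       // the non-empty Venn regions
    println(nodesFromRegions == s("nodes")) // true
  }
}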

Conversely, let α′ be a model for F′; we need to define α, a model for F. We only need to interpret the function symbols, and for each function symbol f we interpret it on each disjoint set $s_i$ independently. Because $|\alpha'(s_i)| = |\alpha'(M_i)|$, we can enumerate both $s_i$ and $M_i$ into sequences $a_1,\dots,a_K$ and $b_1,\dots,b_K$ of the same length, i.e. $s_i = \{a_1,\dots,a_K\}$ and $M_i = \{b_1,\dots,b_K\}$. This enumeration defines an interpretation of the function assigning $a_j$ to $b_j$ for $1 \le j \le K$ such that $f_\alpha[\alpha(s_i)] = \alpha(M_i)$.

Finally, we prove that our logic is NEXPTIME-complete. To do that we use the following facts from [Givan et al.(2002)Givan, McAllester, Witty, and Kozen].

Definition 4.6 (Lewis clause) Let a be a constant and f be a unary function symbol. A Lewis clause (over a and f ) is a first-order clause C that has one of the following forms:

• C is an atom P(a) for a monadic predicate symbol P

• C is a clause involving a single variable x where every literal is an application of a monadic predicate to either x or f(x)

• C is a clause involving exactly two variables x and y where every literal is an application of a monadic predicate to either x or y


Theorem 4.7 (p. 21 in [Givan et al.(2002)Givan, McAllester, Witty, and Kozen]) Checking satisfiability for a set of Lewis clauses is an NEXPTIME-complete problem.

The proof of Theorem 4.7 relies on the result [Lewis(1980)] that acceptance of nondeterministic exponential-time bounded Turing machines can be reduced to satisfiability of formulas of the form $\exists z. F_1 \wedge \forall y \exists x. F_2 \wedge \forall y_1 \forall y_2. F_3$, where $F_1$, $F_2$, and $F_3$ have no quantifiers and are monadic (have only unary predicates). This class is in fact a fragment of quantified BAPA, where quantification occurs only over singleton elements. Note that this class of formulas has a finite model property, which is preserved by Skolemization. Therefore, the theorem continues to hold if we consider finite satisfiability, which is the version that we need. We adapt the proof of Theorem 4.7 to establish NEXPTIME-hardness of our constraints.

Theorem 4.8 Checking satisfiability of formulas belonging to the logic defined in Figure 4.6 is an NEXPTIME-complete problem.

Proof. Let F be a formula from Figure 4.6. The algorithm in Figure 4.7 reduces F to an equisatisfiable formula defined in the syntax of the multiset logic from Figure 2.3. This reduction produces a formula of singly exponential size by introducing set variables for the Venn regions over the set variables in the original formula for each function. The resulting formula belongs to the logic defined in Figure 2.3, which we have proved to be NP-complete (Theorem 2.25). From those two facts, we conclude that checking satisfiability of formulas belonging to the logic defined in Figure 4.6 is in NEXPTIME.

To prove NEXPTIME-hardness, we construct a reduction from checking satisfiability of a set of Lewis clauses to checking satisfiability of a set of formulas belonging to the logic defined in Figure 4.6. Given a set of Lewis clauses, we identify monadic predicate symbols with set variables, using the same symbols for both. We encode the set of Lewis clauses into our logic as follows:

• Let $P_1(a),\dots,P_n(a)$ be all clauses in the set of the form $P(a)$. We encode them with the formula $P_1 \cap \dots \cap P_n \neq \emptyset$.

• For every clause of the form
$$\forall x.\ P_1(x) \vee P_2(x) \vee \dots \vee P_m(x) \vee Q_1(f(x)) \vee Q_2(f(x)) \vee \dots \vee Q_n(f(x))$$
we generate a constraint
$$f\big[P_1^c \cap P_2^c \cap \dots \cap P_m^c\big] \subseteq Q_1 \cup Q_2 \cup \dots \cup Q_n$$

To illustrate why we generate such a constraint, here is a sequence of equivalences that hold in set theory. For readability, we restrict ourselves to one $P$ predicate and one $Q$ predicate:
$$\begin{array}{rcl}
f(P^c) \subseteq Q & \Leftrightarrow & \forall e.\ e \in f(P^c) \Rightarrow e \in Q \\
& \Leftrightarrow & \forall e.\ (\exists e_1.\ e_1 \in P^c \wedge e = f(e_1)) \Rightarrow e \in Q \\
& \Leftrightarrow & \forall e.\ \forall e_1.\ e_1 \in P \vee e \neq f(e_1) \vee e \in Q \\
& \Leftrightarrow & \neg\exists e.\ \exists e_1.\ e_1 \in P^c \wedge e = f(e_1) \wedge e \in Q^c \\
& \Leftrightarrow & \neg\exists e_1.\ e_1 \in P^c \wedge f(e_1) \in Q^c \\
& \Leftrightarrow & \forall x.\ x \in P \vee f(x) \in Q
\end{array}$$

• A clause of the form
$$\forall x \forall y.\ P_1(x) \vee P_2(x) \vee \dots \vee P_m(x) \vee Q_1(y) \vee Q_2(y) \vee \dots \vee Q_n(y)$$
is translated into the formula
$$(P_1 \cup P_2 \cup \dots \cup P_m = U) \ \vee\ (Q_1 \cup Q_2 \cup \dots \cup Q_n = U)$$

If we denote the given set of Lewis clauses by L and the translated set of formulas by T[L], it can easily be shown that L and T[L] are equisatisfiable. Let α be a model of L. We extend α to a model for T[L] as follows. To construct a model for the set variables in T[L] we check the non-emptiness of every Venn region, i.e. we check whether the corresponding clause describing a Venn region evaluates to true in α. Once we have an interpretation for the sets, we construct an interpretation for the function symbols similarly as in the proof of Theorem 4.5.

The proof of the other direction is analogous. Using the results of Theorem 4.7, we conclude that the satisfiability problem for formulas belonging to the logic defined in Figure 4.6 is an NEXPTIME-complete problem.


5 Decision Procedures for Automating Termination Proofs

In this chapter we introduce a new logic, called POSSUM, for expressing ordering constraints on finite multisets. The main motivation for POSSUM is proving program termination. There was no decision procedure that would enable automated reasoning in this logic until recently [Piskac and Wies(2011)]. In this chapter we describe the decision procedure for POSSUM and prove its correctness. The logic is parametrized by the theory of the base set, which can be an arbitrary theory equipped with a preorder (not necessarily well-founded). We show that, if the base theory is decidable, then so is its extension to a multiset ordering. Moreover, if the base theory is decidable in NP, then the satisfiability problem for its POSSUM extension is also in NP. Our decision procedure reduces a given input formula to an equisatisfiable formula in the base theory. The decision procedure can be implemented using off-the-shelf SMT solvers. We therefore believe that it can be a useful component of future automated termination provers.

5.1 Motivation

The standard technique for proving program termination is to construct a ranking function [Turing(1949), Floyd(1967)]. A ranking function maps the states of the program into some well-founded domain, i.e., a set equipped with a well-founded ordering. The mapping is such that, with each transition taken by the program, the value of the ranking function decreases in the ordering. The canonical well-founded ordering for constructing ranking functions is the strict order on the natural numbers. However, constructing global ranking functions for this ordering (i.e., functions that decrease with every transition of the program) requires a lot of ingenuity.
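As a textbook illustration (not one of the programs analyzed later in this chapter), consider the loop while (x > 0) x := x − 1 over the integers. The function $f(x) = x$ is a ranking function into $(\mathbb{N}, <)$: every loop transition satisfies $x > 0 \wedge x' = x - 1$, hence $f(x') = x - 1 < x = f(x)$, so the loop terminates.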

Despite the general result of undecidability of the halting problem, recent advances in program analysis have brought forth tools that can automatically prove termination of real-world programs [Berdine et al.(2006)Berdine, Cook, Distefano, and O'Hearn, Cook et al.(2006)Cook, Podelski, and Rybalchenko]. The success of these tools is due to the development of new proof techniques for termination [Lee et al.(2001)Lee, Jones, and Ben-Amram, Podelski and Rybalchenko(2004b)]. These techniques avoid the construction of a global termination argument and, instead, decompose the program into simpler ones. Each of these simpler programs is then proved terminating independently, by constructing a simpler ranking function. The automation of these proof techniques relies on decision procedures for reasoning about constraints on well-founded domains. The existing tools use known decidable logics such as linear arithmetic to express these constraints [Podelski and Rybalchenko(2004a), Colón and Sipma(2001)], which effectively restricts the range of ranking functions that can be constructed automatically. We believe that by providing decision procedures for more sophisticated well-founded domains, one can significantly increase the class of programs that are amenable to automated termination proofs.

Among the most powerful well-founded domains for proving program termination are multiset orderings [Dershowitz and Manna(1979)]. In this chapter, we present a decision procedure for automated reasoning about such orderings. A (strict) ordering $\prec$ on the base set $S$ can be lifted to an ordering $\prec_m$ on (finite) multisets over $S$ as follows. For two multisets $X$ and $Y$, $X \prec_m Y$ holds iff $X$ and $Y$ are different, and for every element $x \in S$ which occurs more times in $X$ than in $Y$, there exists an element $y \in S$ which occurs more times in $Y$ than in $X$ and $x \prec y$. For instance, $\{1,1,1,2,2\} \prec_m \{1,3\}$ since $1 < 3$ and $2 < 3$. Multiset orderings are interesting because they inherit important properties of the ordering on the base set. In particular, the multiset ordering $\prec_m$ is well-founded iff the ordering $\prec$ on the base set is well-founded [Dershowitz and Manna(1979)]. Multiset orderings have been traditionally used for manual termination proofs in program verification [Dershowitz and Manna(1979), Deng and Sangiorgi(2006)], term rewriting systems [Dershowitz(1979), Baader and Nipkow(1998)], and theorem proving [Martín-Mateos et al.(2005)Martín-Mateos, Ruiz-Reina, Alonso, and Hidalgo, Bachmair and Ganzinger(2001b)]. The question whether reasoning about multiset orderings can be effectively automated was open.
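For finite multisets, this lifted ordering can be computed directly from the definition. The following Scala sketch (illustrative code with invented names such as MultisetOrder and lessM, not part of the decision procedure developed in this chapter) checks $X \prec_m Y$ for multisets represented as maps from elements to multiplicities, given a strict base order lt:

object MultisetOrder {
  type MSet[E] = Map[E, Int]

  private def count[E](m: MSet[E], e: E): Int = m.getOrElse(e, 0)

  // X ≺m Y iff X ≠ Y and every element occurring more often in X than in Y
  // is dominated (w.r.t. lt) by some element occurring more often in Y than in X.
  def lessM[E](x: MSet[E], y: MSet[E])(lt: (E, E) => Boolean): Boolean = {
    val support = x.keySet ++ y.keySet
    val differ = support.exists(e => count(x, e) != count(y, e))
    differ && support.forall { e =>
      count(x, e) <= count(y, e) ||
        support.exists(d => count(y, d) > count(x, d) && lt(e, d))
    }
  }

  def main(args: Array[String]): Unit = {
    val x = Map(1 -> 3, 2 -> 2) // the multiset {1,1,1,2,2}
    val y = Map(1 -> 1, 3 -> 1) // the multiset {1,3}
    println(lessM(x, y)((a, b) => a < b)) // true: 1 < 3 and 2 < 3
  }
}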

5.2 Examples

We motivate the usefulness of our decision procedure for proving termination through two examples.

Example: counting leaves in a tree. Our first example is a program taken from [Dershowitz and Manna(1979)] and shown in Figure 5.1. The termination behavior of this program is representative for many programs that traverse algebraic data types.

The program COUNTLEAVES counts the number of leaves in a binary tree. For this purpose, it maintains a stack S that contains all subtrees of the input tree root that still need to be traversed. In each iteration, the first element y is removed from S. If y is a leaf then the count is increased. Otherwise, the subtrees of y are pushed on the stack. Then the computation continues with the updated stack.

In order to prove termination of program COUNTLEAVES, we need to find a well-founded ordering on the states of the program that decreases with every iteration of the loop. This

64 5.2. Examples

prog CountLeaves(root : Tree) : int =
  var S : Stack[Tree] = root
  var c : int = 0
  do
    y := head(S)
    if leaf(y) then
      S := tail(S)
      c := c + 1
    else
      S := left(y) · right(y) · tail(S)
  until S = ε
  return c

Figure 5.1: Program COUNTLEAVES: counting the leaves in a binary tree

prog AbsCountLeaves(root : Tree) =
  var X_S : multiset[Tree] = {root}
  do
    y := choose(X_S)
    if leaf(y) then
      X_S := X_S \ {y}
    else
      X_S := (X_S \ {y}) ⊎ {left(y)} ⊎ {right(y)}
  until X_S = ∅

Figure 5.2: Multiset abstraction of program COUNTLEAVES

well-founded ordering needs to capture the fact that in each loop iteration either some tree is removed from the stack, or some tree on the stack is replaced by finitely many smaller trees. This can be naturally expressed in terms of a multiset ordering. We therefore abstract the program COUNTLEAVES by a program over multisets. The result of this abstraction is shown in Figure 5.2. The program ABSCOUNTLEAVES is obtained from program COUNTLEAVES by mapping the stack S to a multiset $X_S$, i.e. in program ABSCOUNTLEAVES we abstract from the order of the elements in S. In program ABSCOUNTLEAVES, the stack operations are replaced by operations on multisets. For instance, the operation head(S) is abstracted by the operation choose($X_S$) that non-deterministically chooses an element from the multiset $X_S$. The computation of such multiset abstractions of programs could be automated by combining techniques developed in [Suter et al.(2010)Suter, Dotta, and Kuncak] and [Podelski and Rybalchenko(2007b), Cook et al.(2005)Cook, Podelski, and Rybalchenko]. In this chapter, we focus on automating the termination proofs for the resulting multiset program.
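The abstraction is directly executable. The sketch below is illustrative Scala written for this presentation (the Tree/Leaf/Node datatype and the Map-based multiset are choices made here, not taken from the thesis); it runs ABSCOUNTLEAVES on a concrete tree, with the multiset $X_S$ represented as a map from trees to multiplicities:

object AbsCountLeaves {
  sealed trait Tree
  case object Leaf extends Tree
  case class Node(left: Tree, right: Tree) extends Tree

  type MSet = Map[Tree, Int]
  // Add k occurrences of t (k may be negative); drop entries whose count reaches 0.
  private def add(m: MSet, t: Tree, k: Int): MSet = {
    val n = m.getOrElse(t, 0) + k
    if (n <= 0) m - t else m.updated(t, n)
  }

  def run(root: Tree): Unit = {
    var xs: MSet = Map(root -> 1)
    while (xs.nonEmpty) {
      val y = xs.keys.head                                        // choose(X_S): any element
      y match {
        case Leaf       => xs = add(xs, y, -1)                    // X_S := X_S \ {y}
        case Node(l, r) => xs = add(add(add(xs, y, -1), l, 1), r, 1) // remove y, add its subtrees
      }
    }
  }

  def main(args: Array[String]): Unit = {
    run(Node(Node(Leaf, Leaf), Leaf))
    println("terminated") // X_S decreases in the multiset ordering at every iteration
  }
}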

We prove termination of program ABSCOUNTLEAVES by proving that for every iteration of the loop, the variable $X_S$ decreases in the ordering $\prec_m$. The ordering $\prec_m$ is the multiset extension of the subtree ordering $\prec$ on the trees stored in the multiset. The subtree ordering $\prec$ is well-founded; consequently, so is its multiset extension. Termination of program ABSCOUNTLEAVES is therefore implied by the validity of the termination condition given in Figure 5.3. The decision procedure presented in this chapter decides the validity of such termination conditions


X_S ≠ ∅ ∧ X_S(y) > 0 ∧ (X_S′ = X_S \ {y} ∨ X_S′ = (X_S \ {y}) ⊎ {left(y)} ⊎ {right(y)}) → X_S′ ≺m X_S

Figure 5.3: Termination condition for program ABSCOUNTLEAVES

¬¬F ⇝ F
¬(F ∧ G) ⇝ ¬F ∨ ¬G
¬(F ∨ G) ⇝ ¬F ∧ ¬G

Figure 5.4: Rewrite system for computing negation normal form

(respectively, unsatisfiability of their negation). In Section 5.3 we show how the decision procedure works on a formula similar to the one shown in Figure 5.3.

Example: computing negation normal form. Our second example is a rewrite system that computes the negation normal form of a propositional formula. It consists of the three rewrite rules shown in Figure 5.4. The three rules are applied non-deterministically to any matching subformula.

In order to prove termination of this rewrite system, Dershowitz [Dershowitz(1979)] suggested the following mapping from a propositional formula F to a multiset of natural numbers $X_F$. Let $[G]$ denote the number of operators other than $\neg$ that occur in G; then define
$$X_F = \{[G] \mid \neg G \text{ is a subformula of } F\}$$

We can then prove that, for each rewrite rule applied to a formula F, $X_F$ decreases in the multiset extension $<_m$ of the ordering $<$ on natural numbers. This amounts to checking validity of the following two implications:
$$X_F = X_{F'} \uplus \{x,x\} \ \rightarrow\ X_{F'} <_m X_F$$
$$X_F = Y \uplus \{x+y+1\} \ \wedge\ x > 0 \ \wedge\ X_{F'} = Y \uplus \{x,y\} \ \rightarrow\ X_{F'} <_m X_F$$
Again, these checks can be automated using our decision procedure.
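The measure itself is easy to compute. The following Scala sketch (the Formula datatype and function names are illustrative assumptions made here, not taken from the thesis) computes $[G]$ and $X_F$ and shows that one application of a rule of Figure 5.4 strictly decreases the measure in the multiset extension of $<$:

object NnfMeasure {
  sealed trait Formula
  case class Atom(name: String) extends Formula
  case class Not(f: Formula) extends Formula
  case class And(l: Formula, r: Formula) extends Formula
  case class Or(l: Formula, r: Formula) extends Formula

  // [G]: number of operators other than ¬ occurring in G
  def ops(f: Formula): Int = f match {
    case Atom(_)   => 0
    case Not(g)    => ops(g)
    case And(l, r) => 1 + ops(l) + ops(r)
    case Or(l, r)  => 1 + ops(l) + ops(r)
  }

  // X_F as a multiset of natural numbers, one entry per negated subformula
  def measure(f: Formula): List[Int] = f match {
    case Atom(_)   => Nil
    case Not(g)    => ops(g) :: measure(g)
    case And(l, r) => measure(l) ++ measure(r)
    case Or(l, r)  => measure(l) ++ measure(r)
  }

  // One application of the rewrite rules of Figure 5.4 at the root
  def step(f: Formula): Formula = f match {
    case Not(Not(g))    => g
    case Not(And(l, r)) => Or(Not(l), Not(r))
    case Not(Or(l, r))  => And(Not(l), Not(r))
    case other          => other
  }

  def main(args: Array[String]): Unit = {
    val f = Not(And(Atom("p"), Or(Atom("q"), Atom("r"))))
    println(measure(f).sorted)       // List(2): one negated subformula with two operators
    println(measure(step(f)).sorted) // List(0, 1): strictly smaller in the multiset extension of <
  }
}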

5.3 Decision Procedure through an Example

We now explain our decision procedure through an example. The decision procedure is parameterized by the theory of the base elements comprising the multisets. For instance, in the first example given in Section 5.2, the base theory is the theory of trees with the subtree ordering. This theory is decidable in NP [Venkataraman(1987)]. In general, the base theory can be any decidable theory equipped with a preorder. Our decision procedure reduces the formula with ordering constraints over multisets to a formula containing ordering constraints on the base elements. Satisfiability of the reduced formula is then checked using the decision procedure of the base theory.

To demonstrate how our decision procedure works, we apply it to the following formula, which is a slightly generalized version of the negated termination condition given in Figure 5.3:

$$Y \subseteq X \ \wedge\ X' = (X \setminus Y) \uplus Z \ \wedge\ Z \prec_m Y \ \wedge\ \neg(X' \prec_m X) \qquad (5.1)$$
This formula is unsatisfiable in the theory of preordered multisets (where the base theory is the theory of all preordered sets). The reduction of the formula works as follows. First, we purify and flatten the input formula by introducing fresh variables for multisets and base elements to separate the multiset constraints from constraints in the base theory. In our example, there are no base theory constraints. Purification and flattening of formula (5.1) results in the formula:

$$Y \subseteq X \ \wedge\ X' = X_1 \uplus Z \ \wedge\ X_1 = X \setminus Y \ \wedge\ Z \prec_m Y \ \wedge\ \neg(X' \prec_m X)$$
The next step is to replace all multiset atoms by their point-wise definitions on the base elements. This gives the following formula:

$$\begin{array}{l}
(\forall x.\ Y(x) \le X(x)) \ \wedge \\
(\forall x.\ X'(x) = X_1(x) + Z(x)) \ \wedge \\
(\forall x.\ X_1(x) = \max\{X(x) - Y(x), 0\}) \ \wedge \\
(\exists y.\ Z(y) \neq Y(y)) \ \wedge\ (\forall z.\ Y(z) < Z(z) \rightarrow \exists y.\ Z(y) < Y(y) \wedge z \prec y) \ \wedge \\
\big((\forall x.\ X'(x) = X(x)) \ \vee\ \exists x_0.\ X(x_0) < X'(x_0) \wedge \forall x.\ X'(x) < X(x) \rightarrow \neg(x_0 \prec x)\big)
\end{array}$$
Next, we skolemize all existentially quantified variables. In our example this introduces two Skolem constants $e_1, e_2$ and one Skolem function $w$. Skolemization and Skolem constants are defined in Section 5.4. They are used to eliminate the existential quantifiers from a first-order formula and produce an equisatisfiable formula. The resulting formula is:

$$\begin{array}{l}
(\forall x.\ Y(x) \le X(x)) \ \wedge \\
(\forall x.\ X'(x) = X_1(x) + Z(x)) \ \wedge \\
(\forall x.\ X_1(x) = \max\{X(x) - Y(x), 0\}) \ \wedge \\
Z(e_1) \neq Y(e_1) \ \wedge\ (\forall y'.\ Y(y') < Z(y') \rightarrow Z(w(y')) < Y(w(y')) \wedge y' \prec w(y')) \ \wedge \\
\big((\forall x.\ X'(x) = X(x)) \ \vee\ X(e_2) < X'(e_2) \wedge \forall x.\ X'(x) < X(x) \rightarrow \neg(e_2 \prec x)\big)
\end{array}$$
The idea is now to replace each remaining universal quantifier with a finite conjunction by instantiating each quantifier with finitely many ground terms generated from the constants appearing in the formula and the introduced Skolem functions. The problem is that finite instantiation is in general incomplete, because the Skolem functions coming from the ordering constraints generate an infinite Herbrand universe. The Herbrand universe, defined in Section 5.4, gives rise to a model that exists for every satisfiable first-order formula. Before instantiation we therefore first conjoin the skolemized formula with additional axioms that

further constrain the Skolem functions. In our example, we add the two axioms:
$$\forall x\, y.\ Y(x) < Z(x) \wedge w(x) \prec y \rightarrow Y(y) \le Z(y), \qquad \forall x.\ Z(x) = Y(x) \rightarrow w(x) = x$$
We will show in Section 5.6 that this step is sound and ensures that instantiation of the strengthened formula with the terms $e_1, e_2, w(e_1)$ and $w(e_2)$ is sufficient for proving unsatisfiability of the original constraint. The instantiated formula is a quantifier-free formula over symbols of the base theory (such as the preorder $\prec$), the theory of linear arithmetic, and the theory of free function symbols (the multisets and the Skolem functions). The satisfiability of such formulas can be decided using a Nelson-Oppen combination of the decision procedures for the corresponding component theories. In our example, the instantiated formula implies the following disjunction:
$$\begin{array}{l}
Z(e_1) \neq Y(e_1) \ \wedge\ X'(e_1) = X(e_1) - Y(e_1) + Z(e_1) \ \wedge\ X'(e_1) = X(e_1) \ \ \vee \\
X'(e_2) = X(e_2) - Y(e_2) + Z(e_2) \ \wedge\ X(e_2) < X'(e_2) \ \wedge\ Y(e_2) \ge Z(e_2) \ \ \vee \\
X'(w(e_2)) = X(w(e_2)) - Y(w(e_2)) + Z(w(e_2)) \ \wedge\ Z(w(e_2)) < Y(w(e_2)) \ \wedge\ X'(w(e_2)) \ge X(w(e_2)) \ \ \vee \\
e_2 \prec w(e_2) \ \wedge\ \neg(e_2 \prec w(e_2))
\end{array}$$
Observe that each of the disjuncts is unsatisfiable and, hence, so is the original formula (5.1).

5.4 Basic Definitions

Before we describe the logic and decision procedure for multiset orderings, we recall some definitions and facts that are widely used in theorem proving.

Sorted logic. A signature $\Sigma$ is a tuple $(S,\Omega)$, where $S$ is a countable set of sorts and $\Omega$ is a countable set of function symbols $f$. Every $f \in \Omega$ is associated with an arity $n \ge 0$ and a sort $s_1 \times \dots \times s_n \rightarrow s_0$ with $s_i \in S$ for all $i \le n$. Function symbols of arity 0 are called constant symbols. For the description of our problem we will consider three sorts: $S = \{\mathsf{int}, \mathsf{bool}, \mathsf{elem}\}$. We treat predicates of sort $s_1 \times \dots \times s_n$ as function symbols of sort $s_1 \times \dots \times s_n \rightarrow \mathsf{bool}$. We say that a signature $\Sigma_1$ extends a signature $\Sigma_2$ if $\Sigma_1$ contains at least the sorts and function symbols of $\Sigma_2$. Let $V$ be a countably infinite set of sorted variables, disjoint from $\Omega$. Terms are built as usual from the function symbols in $\Omega$ and variables taken from $V$. We denote by $t : s$ that term $t$ has sort $s$. A term $t$ is ground if no variable appears in $t$. We denote by $\mathsf{Terms}(\Sigma)$ the set of all ground $\Sigma$-terms. An atom is either constructed from the equality symbol $t_1 = t_2$ applied to terms $t_1$ and $t_2$ of the same sort, or by applying a predicate symbol to terms of the respective sorts. Formulas are built from atoms as usual, using boolean connectives and quantifiers. A formula $F$ is called closed, or a sentence, if no variable appears free in $F$.

Structures. Given a signature $\Sigma = (S,\Omega)$, a $\Sigma$-structure $\alpha$ is a function that maps each sort $s \in S$ to a non-empty set $\alpha(s)$ and each function symbol $f \in \Omega$ of sort $s_1 \times \dots \times s_n \rightarrow s_0$ to a function $\alpha(f) : \alpha(s_1) \times \dots \times \alpha(s_n) \rightarrow \alpha(s_0)$. The set $\alpha(s)$ is also called the $\alpha$-domain of the sort $s$. We assume that all structures interpret the sort bool by the set of Booleans $\{\mathsf{true},\mathsf{false}\}$, and the sort int by the set of all integers $\mathbb{Z}$. The sort elem will serve as our base set for defining multisets. We speak of $\alpha(\mathsf{elem})$ simply as the domain of $\alpha$ and often identify the two.

For a $\Sigma$-structure $\alpha$ and a variable assignment $\beta : V \rightarrow \alpha(S)$, the evaluation of a term (respectively a formula) in $\alpha,\beta$ is defined as usual:
$$\alpha,\beta(x) = \beta(x), \ \text{for } x \in V$$
$$\alpha,\beta(f(t_1,\dots,t_n)) = \alpha(f)(\alpha,\beta(t_1),\dots,\alpha,\beta(t_n))$$
We use the standard interpretations for the equality symbol and propositional connectives. A quantified variable of sort $s$ ranges over all elements of $\alpha(s)$. For ground terms $t$, we skip the variable assignment and simply write $\alpha(t)$ for its evaluation in $\alpha$. The notions of satisfiability, validity, and entailment of formulas are also defined as usual. We write $\alpha,\beta \models F$ if $\alpha$ satisfies $F$ under $\beta$. Similarly, we write $\alpha \models F$ if $\alpha$ satisfies $F$ for all variable assignments $\beta$. In this case, we also call $\alpha$ a model of $F$.

Herbrand model and Herbrand's theorem. A Herbrand structure is a $\Sigma$-structure where every sort is interpreted by the set of its ground terms. A function symbol $f \in \Omega$ of sort $s_1 \times \dots \times s_n \rightarrow s_0$ is interpreted in the natural way: given $t_1 : s_1, \dots, t_n : s_n$, $\alpha(f)(t_1,\dots,t_n) = f(t_1,\dots,t_n) : s_0$. Herbrand's theorem states that if a set of closed formulas is satisfiable, then it also has a Herbrand model. In first-order theorem proving, it is thus enough to consider only Herbrand structures when constructing a model.

Theories. A $\Sigma$-theory $T$ for a signature $\Sigma$ is simply a set of $\Sigma$-structures. Sometimes we identify a theory by a set of $\Sigma$-sentences $K$, meaning the set of all $\Sigma$-models of $K$. We then call $K$ the axioms of the theory. The satisfiability problem for a $\Sigma$-theory $T$ and a set of $\Sigma$-formulas $\mathcal{F}$ is to decide whether a given $F \in \mathcal{F}$ is satisfiable in some structure of $T$. If the set of formulas $\mathcal{F}$ is clear from the context, we simply speak of the satisfiability problem of the theory $T$. A $\Sigma_2$-theory $T_2$ is an extension of a $\Sigma_1$-theory $T_1$ if $\Sigma_2$ is an extension of $\Sigma_1$ and for every $\alpha \in T_2$, the restriction $\alpha|_{\Sigma_1}$ of $\alpha$ to the sorts and symbols of $\Sigma_1$ is a structure in $T_1$. A $\Sigma$-theory $T$ is called stably infinite with respect to a set of formulas $\mathcal{F}$ if for every formula $F \in \mathcal{F}$ which is satisfiable in $T$, there exists a model $\alpha$ of $F$ in $T$ such that the domain of $\alpha$ has infinite cardinality.

Skolemization. Skolemization is a method for removing existential quantifiers from a first-order formula. It is among the first steps performed by first-order theorem provers, so that the resulting formula contains only universally quantified variables. The intuition behind skolemization is to replace every $\exists y$ by a concrete choice function computing $y$ from all the arguments $y$ depends on. Computation of the Skolem form of a formula is described by the following translation step:
$$\forall x_1,\dots,x_n \exists y\, F \ \Rightarrow_S\ \forall x_1,\dots,x_n\, F[f(x_1,\dots,x_n)/y]$$
Here $f$ is a new function symbol (called a Skolem function or, when $n = 0$, a Skolem constant). This reduction has to be applied to the outermost existential quantifier and repeated as long as there are existential quantifiers in the formula. This transformation is satisfiability preserving. As an illustration, the formula $\exists x. \forall y. \forall z. \exists u.\ p(x,y) \vee q(z,u)$ is skolemized to the formula $\forall y. \forall z.\ p(s_0,y) \vee q(z, s_1(y,z))$. We introduced the Skolem constant $s_0$ and the Skolem function $s_1$.

5.5 POSSUM : Multiset Constraints over Preordered Sets

In this section, we formally define the constraints whose satisfiability we study in this chapter. We first explain the definition of preordered sets and afterwards we describe multiset constraints over preordered sets. We consider the following signature $\Sigma = (\{\mathsf{bool},\mathsf{int},\mathsf{elem}\},\Omega)$, where $\Omega$ contains the symbols for the boolean connectives and arithmetic operators. In addition, it also contains the symbol $\preceq$ with the sort $\mathsf{elem} \times \mathsf{elem} \rightarrow \mathsf{bool}$.

5.5.1 Finite Multisets over Preordered Sets

We assume that $\Sigma_{elem}$ is a signature containing at least the binary predicate symbol $\preceq$ over the sort elem. Let $F_{elem}$ be the set of all quantifier-free ground formulas over the signature $\Sigma_{elem}$. We will use the formula $t_1 \prec t_2$ as a syntactic shorthand for the formula $t_1 \neq t_2 \wedge t_1 \preceq t_2$. A binary relation $R$ defined on a set $E$, such that $R$ is reflexive and transitive, is called a preorder, and the pair $(E,R)$ is called a preordered set. A theory of preordered sets $T_{elem}$ is a $\Sigma_{elem}$-theory such that for all structures $\alpha \in T_{elem}$, $(\alpha(\mathsf{elem}), \alpha(\preceq))$ is a preordered set, i.e., every structure $\alpha \in T_{elem}$ satisfies the following two axioms:

$$\forall x : \mathsf{elem}.\ x \preceq x \qquad \text{(refl)}$$
$$\forall x,y,z : \mathsf{elem}.\ x \preceq y \wedge y \preceq z \rightarrow x \preceq z \qquad \text{(trans)}$$

For the rest of this chapter, we fix such a theory $T_{elem}$. We require that the satisfiability problem for $F_{elem}$ and $T_{elem}$ is decidable. We further require that $T_{elem}$ is stably infinite with respect to the formulas $F_{elem}$. We call $T_{elem}$ the base theory.
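For a finite relation, the two axioms (refl) and (trans) can be checked directly. The following Scala sketch is illustrative only (the names PreorderCheck and isPreorder are invented here and are unrelated to any particular base theory $T_{elem}$):

object PreorderCheck {
  // Check reflexivity and transitivity of a relation over a finite carrier set.
  def isPreorder[E](elems: Set[E], leq: (E, E) => Boolean): Boolean = {
    val refl  = elems.forall(x => leq(x, x))
    val trans = elems.forall(x => elems.forall(y => elems.forall(z =>
      !(leq(x, y) && leq(y, z)) || leq(x, z))))
    refl && trans
  }

  def main(args: Array[String]): Unit = {
    // divisibility on {1,2,3,4,6} is reflexive and transitive, hence a preorder
    val es = Set(1, 2, 3, 4, 6)
    println(isPreorder(es, (a: Int, b: Int) => b % a == 0)) // true
  }
}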

Let $\Omega_{la}$ be the function and constant symbols of linear integer arithmetic,
$$\Omega_{la} = \{+, -, \max, \min, \dots, -2, -1, 0, 1, 2, \dots, -2\cdot, -1\cdot, 0\cdot, 1\cdot, 2\cdot, \dots\}$$
with their appropriate sorts (the function symbol $C\cdot$ denotes multiplication by the integer constant $C$). We assume that these symbols are disjoint from the symbols in $\Sigma_{elem}$. We represent multisets as function symbols of sort $\mathsf{elem} \rightarrow \mathsf{int}$. Let $\mathcal{M}$ be a countably infinite set of function symbols of this sort, disjoint from the symbols in $\Sigma_{elem}$ and $\Omega_{la}$. Further, let $\Sigma_{mset}$


be the signature $\Sigma_{elem}$ extended with the symbols $\mathcal{M}$ and $\Omega_{la}$. We then define the theory $T_{mset}$ of finite preordered multisets over $T_{elem}$ as follows. The theory $T_{mset}$ is the set of all structures $\alpha$ such that $\alpha$ is an extension of a structure in $T_{elem}$ to a $\Sigma_{mset}$-structure and $\alpha$ satisfies the following conditions:

• α gives the standard interpretation to the arithmetic symbols, and

• $\alpha$ interprets each $X \in \mathcal{M}$ as a finite multiset, i.e.,
  1. for all $e \in \alpha(\mathsf{elem})$, $\alpha(X)(e) \ge 0$
  2. there are only finitely many $e \in \alpha(\mathsf{elem})$ such that $\alpha(X)(e) > 0$

5.5.2 Syntax and Semantics of POSSUM Formulas

Syntax. Figure 5.5 defines the POSSUM formulas. A POSSUM formula is an arbitrary propositional combination of atomic formulas. The atomic formulas are relations between multiset expressions, relations between arithmetic expressions, atoms over the base signature

$\Sigma_{elem}$, and restricted quantified formulas $F_\forall$. An example of a base signature atom is the formula $e_1 \preceq e_2$, where $e_1$ and $e_2$ are two constants of sort elem. The formulas $F_\forall$ express universal quantification over variables of sort elem. The formulas below the quantifiers can express arithmetic relations between multiplicities $X(x)$ of the quantified variables or ordering constraints between these variables. Using these quantified formulas we can express that some constant $e$ is maximal in a multiset $X$: $\forall x.\ X(x) > 0 \rightarrow x \preceq e$. The important restriction for the formulas below the universal quantifiers is that the quantified variables $x$ are not allowed to appear below function symbols of the base signature $\Sigma_{elem}$. This is enforced by allowing only ground $\Sigma_{elem}$-terms $t$ below the quantifiers. Note also that there are no POSSUM formulas with $F_\forall$ atoms that have an alternating quantifier prefix. We call a subset $\mathcal{F}$ of POSSUM formulas quantifier-bounded if the number of quantified variables appearing in $F_\forall$ subformulas of formulas in $\mathcal{F}$ is bounded.

Semantics. POSSUM formulas are interpreted in the structures of the theory $T_{mset}$. The semantics of POSSUM formulas extends the semantics of first-order formulas defined in Section 5.4. Note that, with the exception of atomic formulas that express relations on multisets, all atomic formulas are first-order formulas. Thus, we only need to define the semantics of formulas of the form $M_1 = M_2$, $M_1 \subseteq M_2$, and $M_1 \preceq_m M_2$. Let $\alpha$ be a structure in $T_{mset}$. First, we extend the interpretation $\alpha(X)$ of multisets $X \in \mathcal{M}$ in $\alpha$ to multiset expressions. The interpretation is defined point-wise for all $e \in \alpha$ and recursively on the structure of the expression:

$$\alpha(\emptyset)(e) = 0$$
$$\alpha(\{t \mapsto K\})(e) = \text{if } \alpha(t) = e \text{ then } \alpha(K) \text{ else } 0$$
$$\alpha(M_1 \cup M_2)(e) = \max\{\alpha(M_1)(e), \alpha(M_2)(e)\}$$

top-level formulas:
F ::= A | F ∧ F | ¬F
A ::= M = M | M ⊆ M | K = K | K ≤ K | M ⪯m M | A_elem | F∀
M ::= X | ∅ | {t ↦ K} | M ∩ M | M ∪ M | M ⊎ M | M \ M | set(M)
K ::= k | C | K + K | C · K

restricted quantified formulas:
F∀ ::= ∀x : elem. F∀ | ∀x : elem. F_in
F_in ::= A_in | F_in ∧ F_in | ¬F_in
A_in ::= t_in ≤ t_in | t_in = t_in | E_in ⪯ E_in | E_in = E_in
t_in ::= X(E_in) | C | t_in + t_in | C · t_in
E_in ::= x | t

terminals:
A_elem - ground Σ_elem-atom; X - multiset variable; k - integer variable; C - integer constant;
t - ground Σ_elem-term of sort elem; x - variable of sort elem

Figure 5.5: Syntax for Multiset Constraints over Preordered Sets (POSSUM)

$$\alpha(M_1 \cap M_2)(e) = \min\{\alpha(M_1)(e), \alpha(M_2)(e)\}$$
$$\alpha(M_1 \uplus M_2)(e) = \alpha(M_1)(e) + \alpha(M_2)(e)$$
$$\alpha(M_1 \setminus M_2)(e) = \max\{\alpha(M_1)(e) - \alpha(M_2)(e), 0\}$$
$$\alpha(\mathsf{set}(M))(e) = \min\{\alpha(M)(e), 1\}$$
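These point-wise clauses translate directly into an evaluator for multiset expressions over finite multisets. The Scala sketch below is illustrative only (the MExpr datatype, constructor names, and Map representation are assumptions made here, not part of the thesis):

object MsetEval {
  sealed trait MExpr[E]
  case class Var[E](m: Map[E, Int]) extends MExpr[E]
  case class Union[E](l: MExpr[E], r: MExpr[E]) extends MExpr[E]  // ∪ : max
  case class Inter[E](l: MExpr[E], r: MExpr[E]) extends MExpr[E]  // ∩ : min
  case class Plus[E](l: MExpr[E], r: MExpr[E]) extends MExpr[E]   // ⊎ : +
  case class Minus[E](l: MExpr[E], r: MExpr[E]) extends MExpr[E]  // \ : truncated difference
  case class SetOf[E](m: MExpr[E]) extends MExpr[E]               // set(M): cap at 1

  // Point-wise interpretation of a multiset expression at element e.
  def eval[E](m: MExpr[E])(e: E): Int = m match {
    case Var(f)      => f.getOrElse(e, 0)
    case Union(l, r) => math.max(eval(l)(e), eval(r)(e))
    case Inter(l, r) => math.min(eval(l)(e), eval(r)(e))
    case Plus(l, r)  => eval(l)(e) + eval(r)(e)
    case Minus(l, r) => math.max(eval(l)(e) - eval(r)(e), 0)
    case SetOf(x)    => math.min(eval(x)(e), 1)
  }

  def main(args: Array[String]): Unit = {
    val x = Var(Map("a" -> 2, "b" -> 1))
    val y = Var(Map("a" -> 1))
    println(eval(Plus(x, y))("a"))        // 3
    println(eval(Minus(x, y))("a"))       // 1
    println(eval(SetOf(Plus(x, y)))("a")) // 1
  }
}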

For defining the interpretations of the predicate symbols $=$, $\subseteq$, and $\preceq_m$ on multisets, we define corresponding relations $=_m$, $\subseteq_m$, and $\preceq_m$ at the meta-level. Let $m_1, m_2$ be functions $\alpha(\mathsf{elem}) \rightarrow \mathbb{N}$. The relations $=_m$ and $\subseteq_m$ are defined point-wise as expected:

$$m_1 =_m m_2 \ \Leftrightarrow\ \forall e \in \alpha(\mathsf{elem}).\ m_1(e) = m_2(e)$$
$$m_1 \subseteq_m m_2 \ \Leftrightarrow\ \forall e \in \alpha(\mathsf{elem}).\ m_1(e) \le m_2(e)$$

For defining the multiset ordering we identify $\prec$ with the irreflexive reduct of the relation $\alpha(\preceq)$. The relation $\preceq_m$ is then defined as follows:

$$m_1 \preceq_m m_2 \ \Leftrightarrow\ \forall e_1 \in \alpha.\ m_1(e_1) > m_2(e_1) \Rightarrow \exists e_2 \in \alpha.\ m_2(e_2) > m_1(e_2) \wedge e_1 \prec e_2 \qquad (5.2)$$

Note that this is not the standard definition of the multiset ordering that was originally used in [Dershowitz and Manna(1979)]. However, in order to reduce the number of multiset variables we use the simpler definition (5.2). For finite multisets, definition (5.2) is equivalent to the standard one (for a proof see [Baader and Nipkow(1998), Lemma 2.5.6, p. 24]).


5.6 Decidability of POSSUM

We now describe the decision procedure for POSSUM . The idea of the decision procedure is to reduce satisfiability of a POSSUM formula to satisfiability of a formula in a particular first-order theory, namely, the disjoint combination of the base theory Telem, the theory of linear integer arithmetic, and the theory of uninterpreted function symbols.

Reduction to a first-order theory. In the following, we show how to decide conjunctions of POSSUM literals. The extension of the decision procedure to arbitrary Boolean combinations of literals is straightforward. Thus, let F be a fixed POSSUM conjunction. The first step of our decision procedure is to rewrite F into a quantified first-order formula by expanding all multiset constraints to their point-wise definitions.

For two multiset variables $X$ and $Y$ we denote by $L_{X,Y}$ the multiset $X \setminus Y$ and by $U_{X,Y}$ the multiset $Y \setminus X$. Similarly, for a given element $x$ we use $L_{X,Y}(x)$ as a shorthand for the expression $X(x) - Y(x)$ and $U_{X,Y}(x)$ for $Y(x) - X(x)$. The algorithm for rewriting $F$ is then as follows:

1. Purify and flatten all multiset constraints in $F$: $C[M] \leadsto X_f = M \wedge C[X_f]$, where $X_f \in \mathcal{M}$ is a fresh multiset variable and $M$ is

(a) either of the form $M_1 \cup M_2$, $M_1 \cap M_2$, $M_1 \uplus M_2$, $M_1 \setminus M_2$, and at least one $M_i$ is not a multiset variable $X \in \mathcal{M}$,

(b) or of the form $\emptyset$, $\{t \mapsto k_1\}$, $\mathsf{set}(M_1)$, and it is not in a conjunct of the form $X = M$ or $M = X$ for some multiset variable $X \in \mathcal{M}$.

2. Replace all multiset atoms by their point-wise definitions:
$$\begin{array}{lcl}
C[X = \emptyset] & \leadsto & C[\forall x.\ X(x) = 0] \\
C[X = \{e \mapsto k\}] & \leadsto & C[X(e) = k \wedge \forall x.\ x \neq e \rightarrow X(x) = 0] \\
C[X = Y \cup Z] & \leadsto & C[\forall x.\ X(x) = \max\{Y(x), Z(x)\}] \\
C[X = Y \cap Z] & \leadsto & C[\forall x.\ X(x) = \min\{Y(x), Z(x)\}] \\
C[X = Y \uplus Z] & \leadsto & C[\forall x.\ X(x) = Y(x) + Z(x)] \\
C[X = Y \setminus Z] & \leadsto & C[\forall x.\ X(x) = \max\{Y(x) - Z(x), 0\}] \\
C[X = Y] & \leadsto & C[\forall x.\ X(x) = Y(x)] \\
C[X \preceq_m Y] & \leadsto & C[\forall x.\ L_{X,Y}(x) > 0 \rightarrow \exists y.\ U_{X,Y}(y) > 0 \wedge x \prec y]
\end{array}$$

3. Compute negation normal form, i.e., push all negations down to the atoms

4. Skolemize all existentially quantified variables

5. For every multiset $X$ occurring in the formula, add the formula $\forall x.\ X(x) \ge 0$ as an additional conjunct


After rewriting, the resulting formula is of the form $K \wedge G$, where $G$ is a ground formula and $K$ is a conjunction of universally quantified formulas. Clearly, each step of the rewriting transforms the input formula into an equisatisfiable formula.

Lemma 5.1 The formulas $F$ and $K \wedge G$ are equisatisfiable in the theory of preordered multisets.

Quantifier instantiation.

If the original formula $F$ does not contain any atom of the form $X \preceq_m Y$ (or if there is one but it appears under negation), then checking satisfiability of $K \wedge G$ is relatively simple: it is enough to construct the Herbrand universe, which is in this particular case finite. We would instantiate all universally quantified variables with the elements of the Herbrand domain. This way the whole formula becomes ground and checking its satisfiability can be done using any SMT solver. However, $X \preceq_m Y$ introduces a Skolem function of arity 1. In that case the Herbrand domain becomes infinite and the previous solution cannot be applied.

We will now show that there exists a finite and computable set of ground terms $T_{K,G}$ of sort elem such that $K \wedge G$ is equisatisfiable with the formula $K[T_{K,G}] \wedge G$, where $K[T_{K,G}]$ is a ground formula obtained by instantiating all quantified variables appearing in $K$ with the terms in $T_{K,G}$.

Throughout the rest of this section, we denote by $E$ the set of all ground terms of sort elem appearing in $K \wedge G$. The set $E$ contains the ground terms appearing in the initial formula $F$ and the Skolem constants that have been introduced for top-level existentially quantified variables in Step 4 of the rewriting algorithm. Zarba showed in [Zarba(2002a)] that for formulas $F$ without ordering constraints on multisets and formulas $F_\forall$, the theory $K$ is (what is now known as) a stably local theory extension [Sofronie-Stokkermans(2005)]. This means that if $F$ does not contain ordering constraints then $F$ is equisatisfiable with the formula $K[E] \wedge G$. The reason for the locality of $K$ in this case is simply that instantiation of the quantifiers in $K$ with terms of sort elem will not create new terms of the same sort. Unfortunately, in the presence of ordering constraints this is no longer true, i.e., instantiation of $K$ with the terms in $E$ alone is not sufficient.

For an illustration of this behavior, reconsider the defining formula (5.2) for the ordering constraint $X \preceq_m Y$. This formula contains $\forall\exists$ quantification over variables of sort elem. Skolemization of this formula thus gives
$$\forall x.\ L_{X,Y}(x) > 0 \rightarrow U_{X,Y}(w_{X,Y}(x)) > 0 \wedge x \prec w_{X,Y}(x) \qquad (5.3)$$
where $w_{X,Y}$ is a fresh Skolem function. We call these Skolem functions $\prec_m$-witness functions and terms constructed from these functions $\prec_m$-witnesses. Figure 5.6 represents a $\prec_m$-witness of an element $e$. Instantiation of formula (5.3) with a term $e \in E$ generates a new $\prec_m$-witness $w_{X,Y}(e)$ of sort elem, which is not already contained in $E$. For completeness, we have to instantiate $K$ recursively with these $\prec_m$-witnesses.

Figure 5.6: An element and its $\prec_m$-witnesses

We now show that we can put additional constraints on the $\prec_m$-witness functions such that we only need to consider finitely many $\prec_m$-witnesses for the instantiation of $K$. These additional constraints are as follows. First, we enforce that the $\prec_m$-witness function $w_{X,Y}$ only chooses maximal elements in the multiset $U_{X,Y}$ and, second, we require that each element outside $L_{X,Y}$ is mapped to itself. Figure 5.7 depicts these additional rules. Formally, these constraints are expressed by the following two axioms:

$$\forall x\, y.\ L_{X,Y}(x) > 0 \wedge w_{X,Y}(x) \prec y \rightarrow U_{X,Y}(y) = 0 \qquad (5.4)$$
$$\forall x.\ L_{X,Y}(x) = 0 \rightarrow w_{X,Y}(x) = x \qquad (5.5)$$

Figure 5.7: An illustration for the rules described by formulas (5.4) and (5.5)

The existence of such constrained witness functions is guaranteed by the fact that we restrict ourselves to finite multisets. In particular, given a $\prec_m$-witness function $w_{X,Y}$ satisfying axiom (5.3), we can define a new witness function that maps every $e$ in $L_{X,Y}$ to the maximal element of some ascending chain starting from $w_{X,Y}(e)$ in $U_{X,Y}$. Finiteness of the multiset $U_{X,Y}$ guarantees the existence of such a maximal element.

For the rest of this section let $W$ be the set of all $\prec_m$-witness functions occurring in $K$ and let $K_W$ be the conjunction of axioms (5.4) and (5.5) for all $w_{X,Y} \in W$.

Lemma 5.2 The formulas $K \wedge G$ and $K \wedge K_W \wedge G$ are equisatisfiable in the theory of preordered multisets.

Let TW,E be the smallest set of ground terms that satisfies the following two conditions:


1. $E \subseteq T_{W,E}$
2. if $t \in T_{W,E}$ and $w_{X,Y} \in W$ then $w_{X,Y}(t) \in T_{W,E}$

For a term $t \in T_{W,E}$ of the form $t = w_n \dots w_1(e)$ where $e \in E$, we define $t_0 = e$ and denote by $t_i$, for $1 \le i \le n$, the subterm $w_i \dots w_1(e)$ of $t$. We call $t \in T_{W,E}$ a strict chain in a structure $\alpha$ iff $\alpha$ satisfies $t_i \prec t_{i+1}$ for all $i$ with $0 \le i < n$. We say that a strict chain $t \in T_{W,E}$ in a structure $\alpha$ is maximal if $t$ is not a proper subterm of any other strict chain $t' \in T_{W,E}$ in $\alpha$. For a structure $\alpha$ and a set of ground terms $T$, we denote by $\alpha(T)$ the set $\alpha(T) = \{\alpha(t) \mid t \in T\}$.

Now, define $T_{K,G}$ as the set of all terms $t \in T_{W,E}$ such that each function $w_{X,Y}$ occurs at most once in $t$. Clearly, the set $T_{K,G}$ is finite, since $W$ is finite. We can now show that in models of $K \wedge K_W$, the terms $T_{W,E}$ are partitioned into finitely many equivalence classes, each of which is represented by some term in $T_{K,G}$.

Lemma 5.3 For all models $\alpha$ of $K \wedge K_W$, $\alpha(T_{W,E}) = \alpha(T_{K,G})$.

Proof. Let $\alpha$ be a model of $K \wedge K_W$. Note that from strictness of $\prec$ and axioms (5.3) and (5.5) it follows that for all terms $t$ of sort elem and $w_{X,Y} \in W$, either $\alpha \models w_{X,Y}(t) = t$ or $\alpha \models t \prec w_{X,Y}(t)$ holds.

The proof goes by contradiction. Thus, assume there exists $t \in T_{W,E}$ such that $\alpha(t) \notin \alpha(T_{K,G})$. Then remove all function applications $w_i$ from $t$ for which $\alpha \models w_i(t_{i-1}) = t_{i-1}$, obtaining a term $t' \in T_{W,E}$. Then $t'$ is a strict chain and $\alpha \models t = t'$. From this we conclude that $\alpha(t') \notin \alpha(T_{K,G})$ and therefore $t' \notin T_{K,G}$. Hence, there exist $i, j$ with $1 \le i < j < k$ and multiset variables $X, Y$ such that $w'_i = w'_j = w_{X,Y} \in W$. We then have $\alpha \models t'_{i-1} \prec w_{X,Y}(t'_{i-1})$. Based on strictness of $\prec$, axiom (5.5) and the axiom $\forall x.\ L_{X,Y}(x) \ge 0$ we conclude $\alpha \models L_{X,Y}(t'_{i-1}) > 0$. Similarly, we conclude $\alpha \models L_{X,Y}(t'_{j-1}) > 0$. By transitivity of $\prec$ and construction of $t'$, we further have that $\alpha$ satisfies $w_{X,Y}(t'_{i-1}) \prec w_{X,Y}(t'_{j-1})$. From axiom (5.4) we then conclude $\alpha \models U_{X,Y}(w_{X,Y}(t'_{j-1})) = 0$. However, axiom (5.3) implies $\alpha \models U_{X,Y}(w_{X,Y}(t'_{j-1})) > 0$, which gives us a contradiction.

From Lemma 5.3 it follows that we only need to instantiate the axioms $K \wedge K_W$ with the terms in $T_{K,G}$.

Lemma 5.4 The formulas $K \wedge K_W \wedge G$ and $K[T_{K,G}] \wedge K_W[T_{K,G}] \wedge G$ are equisatisfiable in the theory of preordered multisets.

The formula $K[T_{K,G}] \wedge K_W[T_{K,G}] \wedge G$ can now be purified, obtaining an equisatisfiable formula $G_{elem} \wedge G_{la} \wedge G_{euf}$ such that the three conjuncts $G_{elem}$, $G_{la}$, and $G_{euf}$ only share constant symbols and:

• Gelem is a constraint over symbols in the theory Telem


• Gla is a linear integer arithmetic constraint, and

• Geuf is a constraint built from uninterpreted function symbols and equality

We can thus check satisfiability of $F$ by checking satisfiability of $G_{elem} \wedge G_{la} \wedge G_{euf}$ in the disjoint combination of the theory $T_{elem}$, the theory of linear integer arithmetic, and the theory of uninterpreted function symbols with equality. By our assumptions on the theory $T_{elem}$, this combined theory can be decided using standard Nelson-Oppen combination techniques [Nelson and Oppen(1979)].

Theorem 5.5 The satisfiability problem for POSSUM formulas is decidable.

5.7 Complexity of POSSUM

We will now establish that the satisfiability problem for the quantifier-bounded fragments of POSSUM is in NP, provided the base theory $T_{elem}$ is also decidable in NP. Since POSSUM formulas subsume propositional logic, this bound is tight.

We have seen in the previous section that we can reduce a POSSUM conjunction $F$ to a ground formula $K[T_{K,G}] \wedge K_W[T_{K,G}] \wedge G$ whose satisfiability can be decided using the decision procedure of the base theory. However, the size of the resulting formula can be exponential in the size of the input formula $F$, because the size of the set $T_{K,G}$ used for the instantiation is exponential in the number of $\prec_m$-witness functions in $W$. The following theorem implies that this exponential blowup can be avoided.

Theorem 5.6 If the formula $K \wedge K_W \wedge G$ is satisfiable then it has a model $\alpha$ such that $|\alpha(T_{K,G})| \in O(|W|^2 \cdot |E|)$, where $W$ is the set of all $\prec_m$-witness functions occurring in $K$ and $E$ is the set of all ground terms of sort elem appearing in $K \wedge G$.

Proof.

Assume $K \wedge K_W \wedge G$ is satisfiable and let $\alpha_0$ be one of its models. Further, let $n = |W|$ and $m = |E|$. From $\alpha_0$ we construct a model $\alpha$ with $|\alpha(T_{K,G})| \in O(n^2 m)$ by collapsing redundant strict chains in $\alpha_0$.

For this purpose, we choose a set $T$ of strict chains in $\alpha_0$ such that for every term $e \in E$ and witness function $w_{X,Y} \in W$, there is at most one chain $t \in T$ that starts in $L_{X,Y}$ with $e$, i.e., $t$ contains $w_{X,Y}(e)$ as a subterm. Figure 5.8 shows an example of two redundant chains. Since we consider an arbitrary preorder, not necessarily a total one, the figure shows that it is possible, for two elements $e_1$ and $e_2$ such that $e_1 \in [e_2]$, to find witnesses $w_{X,Y}(e_1)$ and $w_{X,Y}(e_2)$ that are incomparable. The witness of $e_1$ is clearly also a witness for $e_2$ and we do not need any


Figure 5.8: An example of two redundant chains in $\alpha_0$

of the chains that contain $w_{X,Y}(e_2)$. Therefore we demand that the set $T$ contain only one chain passing through $[e_2]$ and containing a term $w_{X,Y}(t')$, for some $t'$ such that $\alpha_0(t') \in [e_2]$. Formally, let $E_=$ be the quotient of $E$ with respect to the interpretation of the equality predicate in $\alpha_0$ and denote by $[e] \in E_=$ the equivalence class of $e \in E$. Let $T$ be a maximal subset of $T_{K,G}$ such that

1. each $t \in T$ is a maximal strict chain in $\alpha_0$
2. for each $w \in W$, if there is some $t \in T$ which contains $w$ and starts in $e \in E$, then there is no other $t' \in T$ which contains $w(e')$ as a subterm, for any $e' \in [e]$

Clearly such a set $T$ exists. A simple algorithm to construct $T$ runs as follows. First, create a set $T_0$ that contains all the maximal strict chains in $\alpha_0$. Then, for every $w \in W$ take a strict chain $t \in T_0$ such that $w$ is contained in $t$. Let $e$ be the starting element of $t$. Delete all $t'$ from $T_0$ such that $t'$ starts with $w(e')$, where $e' \in [e]$. The set $T_0$ is finite, so this algorithm terminates and at the end we obtain the set $T$. Let $T^*$ be the set of all subterms $t_0,\dots,t_k$ of the chains $t \in T$, where $k$ is the length of the chain $t$.

We now construct $\alpha$ from $\alpha_0$ by collapsing all strict chains in $\alpha_0$ to the chains in $T$. First, we let $\alpha$ agree with $\alpha_0$ on the interpretation of all sorts and all symbols that are not witness functions. For each witness function $w \in W$ and $v \in \alpha(\mathsf{elem})$, we then define
$$\alpha(w)(v) = \begin{cases} \alpha_0(t) & \text{if } v = \alpha_0(e) \text{ for some } e \in E,\ \alpha_0(w(e)) \neq \alpha_0(e), \text{ and there is some}\\ & \text{term } t \in T^* \text{ with } t = w(t') \text{ for some } t' \text{ containing } e' \in [e],\\ \alpha_0(w)(v) & \text{otherwise} \end{cases}$$

Note that from the definition of $T$ and $\alpha$ it follows that for all $t \in T^*$, $\alpha(t) = \alpha_0(t)$. Thus all terms in $T^*$ are still strict chains in $\alpha$.

We first prove that $\alpha$ is still a model of $K \wedge K_W \wedge G$. Since $\alpha_0$ is a model of $K \wedge K_W \wedge G$ and $\alpha$ agrees with $\alpha_0$ on all symbols that are not witness functions, we immediately conclude that $\alpha$ is also a model of $G$ and of all axioms of $K$ that do not mention the witness functions. The fact that $\alpha$ still satisfies the remaining axioms (5.3)-(5.5) for all $w \in W$ also easily follows from the definition of $\alpha$.

As an illustration, we will show that $\alpha$ satisfies axiom (5.3). Let $\beta$ be a valuation and let $\alpha,\beta \models L_{X,Y}(x) > 0$. Since $\alpha$ and $\alpha_0$ agree on all the symbols but the witness functions, this clearly means $\alpha,\beta \models U_{X,Y}(w_{X,Y}(x)) > 0$. We need to show that $\alpha,\beta \models x \prec w_{X,Y}(x)$. Let $v = \beta(x)$. If $\alpha(w_{X,Y})(v) = \alpha_0(w_{X,Y})(v)$ then clearly $\alpha,\beta \models x \prec w_{X,Y}(x)$, so let us assume that $\alpha(w_{X,Y})(v) \neq \alpha_0(w_{X,Y})(v)$. Then, by definition, $v = \alpha_0(e)$ for some $e \in E$ and $\alpha(w_{X,Y})(v) = \alpha_0(t)$ for some term $t \in T^*$ such that $t = w_{X,Y}(t')$ and $t'$ contains $e' \in [e]$. The fact that $t$ is a strict chain in $\alpha_0$ also means that $t$ is a strict chain in $\alpha$. By the transitivity of $\prec$ we conclude $\alpha \models e' \prec t$. Since $e' \in [e]$ we further have $\alpha \models e = e'$ and hence $\alpha \models e \prec w_{X,Y}(e)$. The element $e \in E$ is a ground term and therefore $\alpha(e) = \alpha_0(e) = v$. Since $v = \beta(x)$, we have proved that $\alpha,\beta \models x \prec w_{X,Y}(x)$. The proofs for the other two axioms are similar.

We next observe that $\alpha(T^*) = \alpha(T_{K,G})$. Since $T^* \subseteq T_{K,G}$, it is obvious that $\alpha(T^*) \subseteq \alpha(T_{K,G})$. To show that $\alpha(T_{K,G}) \subseteq \alpha(T^*)$, let $t \in T_{K,G}$ be a term. If $t = e$, where $e$ is a ground term, clearly $e \in T^*$. If $t = w(t')$ for some $t'$, either there is only one chain containing $w$ going through $[\alpha_0(t')]$ or there are several of them. In the first case $t \in T^*$. In the second case $t$ need not be contained in $T^*$, but by construction of $\alpha$ we have $\alpha(t) \in \alpha(T^*)$.

For proving that $|\alpha(T_{K,G})| \in O(n^2 m)$ it is therefore enough to count the number of elements in $T^*$. For this purpose, fix $e \in E$ and let $k$ be the maximal length of the chains $t$ in $T$ that start from some $e' \in [e]$. From strictness of the chains and Lemma 5.3 it follows that $k \le n$. Let $t$ be a chain of the maximal length. There are $k$ witness functions occurring in $t$ and thus, by the second condition of the definition of $T$, none of those $k$ witness functions can start a new chain. Using the second condition of the definition of $T$ again, we conclude that there can be at most $n - k$ additional maximal chains, different from $t$, that start from some $e' \in [e]$. That means that, if $k$ is the maximal length of the chains starting from some $e' \in [e]$, then there are at most $n - k + 1$ maximal chains in $T$ that start from some $e' \in [e]$. Each of these chains has length at most $k$ by assumption and thus at most $k + 1$ subterms. Using this observation we derive that $T^*$ contains at most $(n - k + 1)(k + 1)$ terms with some $e' \in [e]$ as a subterm. From $\max_{1 \le k \le n}\{(n - k + 1)(k + 1)\} \in O(n^2)$ we then conclude $|T^*| \in O(n^2 m)$.

Theorem 5.6 implies that we can guess a polynomial subset $T$ of the terms $T_{K,G}$ and then use this subset to instantiate the axioms in $K \wedge K_W$. The size of the resulting formula $K[T] \wedge K_W[T] \wedge G$ is then polynomial in the size of the input formula, provided we bound the number of quantified variables in $F_\forall$ subformulas of the input.

Theorem 5.7 If the base theory $T_{elem}$ is decidable in NP then for the quantifier-bounded fragments of its POSSUM extension, the satisfiability problem is NP-complete.

Practical Considerations. Our decision procedure is amenable to practical implementation using off-the-shelf SMT solvers. In particular, using techniques developed for local theory extensions [Jacobs(2009)], we can postpone the exponential decomposition phase of guessing the terms used for instantiation, by generating these terms lazily from models produced by the SMT solver. Also note that in practical applications, such as checking validity of constraints generated from termination proofs, all multiset ordering constraints $X \preceq_m Y$ will typically have negative polarity. Since only positive occurrences of such constraints generate $\prec_m$-witness functions, the set of terms $T_{K,G}$ will, in most practical cases, already be polynomial in the size of the input constraint.

5.8 Further Related Work

The logic POSSUM extends the logic of multisets with integers, which was shown to be NP- complete by Zarba [Zarba(2002a)]. This extension is non-trivial. In particular, Zarba only considers a disjoint combination of a base theory with the theory of multisets and does not support ordering constraints on multisets. Such constraints generate axioms with ∀∃ quantification, which require a more intricate argument to establish completeness of local instantiation. The logic of multisets with cardinality constraints [Piskac and Kuncak(2008a)] also subsumes Zarba’s logic and was shown to be NP-complete [Piskac and Kuncak(2008c)]. It is incomparable to our logic because it also does not support ordering constraints. On the other hand, POSSUM can only express very restricted cardinality constraints. In [Kuncak et al.(2010c)Kuncak, Piskac, and Suter] the theory of sets with cardinality constraints over totally ordered base sets was shown to be decidable in NP.This result can be generalized to multisets. Decidability of multisets over partially ordered base sets and with general cardinality constraints is open.

Local theory extensions [Sofronie-Stokkermans(2005)] formalize the general category of the- ories for which local quantifier instantiation techniques are complete. Some local theory extension of orders have been studied in [Sofronie-Stokkermans and Ihlemann(2007)]. Our extension of preorders to multiset orderings is an instance of the so called Ψ-local theory extensions, which have been introduced in [Ihlemann et al.(2008)Ihlemann, Jacobs, and Sofronie-Stokkermans].

Simplification orderings are a common tool to prove termination of term rewrite systems [Der- showitz(1979),Baader and Nipkow(1998)]. Among the most widely used simplification order- ings are recursive path orderings [Dershowitz(1979)] (which have originally been defined in terms of multiset orderings), lexicographic path orderings [Baader and Nipkow(1998)], and Knuth-Bendix orderings [Dick et al.(1990)Dick, Kalmus, and Martin]. Constraint solving has been shown to be decidable in NP for each of these orderings [Korovin and Voronkov(2001), Narendran et al.(1998)Narendran, Rusinowitch, and Verma,Zhang et al.(2005)Zhang, Sipma, and Manna,Nieuwenhuis(1993)]. Unlike simplification orderings, we do not require that the underlying order is total. Thus, one can use our decision procedure to prove termination even in cases where there are no natural total orderings, such as Example 1 in Section 5.2.

80 6 Combining Theories with Shared Set Operations

Motivated by applications in software verification, we explore automated reasoning about the non-disjoint combination of theories of infinitely many finite structures, where the theories share set variables and set operations. We prove a combination theorem and apply it to show the decidability of the satisfiability problem for a class of formulas obtained by applying propo- sitional connectives to formulas belonging to: 1) Boolean Algebra with Presburger Arithmetic (with quantifiers over sets and integers), 2) weak monadic second-order logic over trees (with monadic second-order quantifiers), 3) two-variable logic with counting quantifiers (ranging over elements), 4) the Bernays-Schönfinkel-Ramsey class of first-order logic with equality

(with ∗ ∗ quantifier prefix), and 5) the quantifier-free logic of multisets with cardinality ∃ ∀ constraints.

6.1 Motivation

Constraint solvers based on satisfiability modulo theories (SMT) [de Moura and Bjørner(2008a), Barrett and Tinelli(2007),Ge et al.(2007)Ge, Barrett, and Tinelli] are a key enabling technique in software and hardware verification systems [Ball et al.(2002)Ball, Podelski, and Rajamani, Bar- nett et al.(2004a)Barnett, DeLine, Fähndrich, Leino, and Schulte]. The range of problems amenable to such approaches depends on the expressive power of the logics supported by the SMT solvers. Current SMT solvers implement the combination of quantifier-free stably infinite theories with disjoint signatures, in essence following the approach pioneered by Nelson and Oppen [Nelson and Oppen(1979)]. Such solvers serve as decision procedures for quantifier- free formulas, typically containing uninterpreted function symbols, linear arithmetic, and bit vectors. The limited expressiveness of SMT prover logics translates into a limited class of properties that automated verification tools can handle.

To support a broader set of applications, this chapter considers decision procedures for the combination of possibly quantified formulas in non-disjoint theories. The idea of combining rich theories within an expressive language has been explored in interactive provers [Owre et al.(1992)Owre, Rushby, and Shankar, Boyer and Moore(1988), Basin and Friedrich(2000),

81 Chapter 6. Combining Theories with Shared Set Operations

McLaughlin et al.(2006)McLaughlin, Barrett, and Ge]. Such integration efforts are very useful, but do not result in complete decision procedures for the combined logics. The study of completeness for non-disjoint combination is relatively recent [Zarba(2002b), Tinelli and Ringeissen(2003)] and provides foundations for the general problem. Under certain condi- tions, such as local finiteness, decidability results have been obtained even for non-disjoint theories [Ghilardi(2005)]. We consider a case of combination of non-disjoint theories sharing operations on sets of uninterpreted elements, a case that was not considered before. The theories that we consider have the property that the tuples of cardinalities of Venn regions over shared set variables in the models of a formula are a semi-linear set (i.e., expressible in Presburger arithmetic).

Reasoning about combinations of decidable logics. The idea of deciding a combination of logics is to check the satisfiability of a conjunction of formulas A B by using one decision ∧ procedure, D A, for A, and another decision procedure, DB , for B. To obtain a complete decision procedure, D A and DB must communicate to ensure that a model found by D A and a model found by D can be merged into a model for A B. B ∧

Reduction-based decision procedure. We follow a reduction approach to decision proce- dures. The first decision procedure, D A, computes a projection, S A, of A onto shared set variables, which are free in both A and B. This projection is semantically equivalent to existen- tially quantifying over predicates and variables that are free in A but not in B; it is the strongest consequence of A expressible only using the shared set variables. DB similarly computes the projection S of B. This reduces the satisfiability of A B to satisfiability of the formula B ∧ S S , which contains only set variables. A ∧ B

A logic for shared constraints on sets. A key parameter of our combination approach is the logic of sets used to express the projections S A and SB . A suitable logic depends on the logics of formulas A and B. Inspired by verification of linked data structures, we consider as the logics for A,B the following: weak monadic second-order logic of two successors WS2S [Thatcher and Wright(1968)], two-variable logic with counting C 2 [Grädel et al.(1997)Grädel, Otto, and Rosen, Pacholski et al.(2000)Pacholski, Szwast, and Tendera, Pratt-Hartmann(2005)], the Bernays-Schönfinkel-Ramsey class of first-order logic [Börger et al.(1997)Börger, Grädel, and Gurevich], BAPA [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard], and quantifier-free logics of multisets [Piskac and Kuncak(2008c),Piskac and Kuncak(2008a)]. Remarkably, the smallest logic needed to express the projection formulas in these logics has the expressive power of Boolean Algebra with Presburger Arithmetic (BAPA), described in [Kuncak and Rinard(2007)] and in Fig. 6.4. We show that the decision procedures for these four logics can be naturally extended to a reduction to BAPA that captures precisely the constraints on set variables. The existence of these reductions, along with quantifier elimination [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard] and NP membership of the quantifier-free fragment [Kuncak and Rinard(2007)], make BAPA an appealing reduction target for expressive logics.

82 6.2. Example: Verifying a Code Fragment class Node {Node left,right ; Object data;} class Tree { private static Node root; private static int size ; /*: private static specvar nodes :: objset ;

vardefs "nodes=={x.(root,x) {(x,y). leftx=y rightx=y} ∗}"; private static specvar content∈ :: objset ; ∨ vardefs "content=={x. n.n null n nodes datan=x}" */ ∃ 6= ∧ ∈ ∧ private void insertAt (Node p, Object e) /*: requires "tree[ left, right] nodes Object.alloc size= card content e content e null∧ p ⊆nodes p ∧null leftp= null" ∧ modies nodes,content,∉ ∧ left6= , right∧ , data,∈ size ∧ 6= ∧ ensures "size= card content" */ { Node tmp = new Node(); tmp.data = e; p.left = tmp; size = size + 1; } } Figure 6.1: Fragment of insertion into a tree

An earlier version of some of these results is available in [Kuncak and Wies(2009)].

6.2 Example: Verifying a Code Fragment

Our example shows a verification condition formula generated when verifying an unbounded linked data structure. The formula belongs to our new decidable class obtained by combining several decidable logics.

Specification and verification in Jahob. Fig. 6.1 shows a fragment of Java code for insertion into a binary search tree, factored out into a separate insertAt method. The search tree has fields (left, right) that form a tree, and field data, which is not necessarily an injective function (an element may be stored multiple times in the tree). The insertAt method is meant to be invoked when the insertion procedure has found a node p that has no left child. It inserts the given object e into a fresh node tmp that becomes the new left child of p. In addition to Java statements, the example in Fig. 6.1 contains preconditions and postconditions, written in the notation of the Jahob verification system [Kuncak(2007),Zee et al.(2008)Zee, Kuncak, and Rinard,Wies(2009)]. The vardefs notation introduces two sets: 1) the set of auxiliary objects nodes, denoting the Node objects stored in the binary tree, and 2) the set content denoting the useful content of the tree. To verify such examples in the previously reported approach [Zee et al.(2008)Zee, Kuncak, and Rinard], the user of the system had to manually provide the definitions of auxiliary sets, and to manually introduce certain lemmas describing changes to these sets. Our decidability result means that there is no need to manually introduce these lemmas.

83 Chapter 6. Combining Theories with Shared Set Operations tree [ left , right ] left p = null p nodes nodes={x. (root,x)∧ {(x,y). left∧ x =∈ y|right∧ x = y}^*} content={x. n. n ∈ null n nodes data n = x} ∧ e content ∃ nodes6= alloc∧ ∈ ∧ ∧ tmp∉ alloc ∧ left tmp⊆ = null∧ right tmp = null data∉ tmp =∧ null ( y. data∧ y tmp) ∧ nodes1={x. (root,x)∧ ∀ {(x,y). (left6= (p:=tmp))∧ x = y) | right x = y} content1={x. n. n ∈ null n nodes1 (data(tmp:=e)) n = x} ∧ card∃ content16= =∧ card∈ content +∧ 1 →

Figure 6.2: Verification condition for Fig. 6.1

SHARED SETS: nodes, nodes1, content, content1, {e}, {tmp} WS2S FRAGMENT: tree [ left , right ] left p = null p nodes left tmp = null right tmp = null nodes={x. (root,x)∧ {(x,y). left∧ x =∈ y|right∧ x = y}^*} ∧ ∧ nodes1={x. (root,x)∈ {(x,y). (left (p:=tmp)) x = y) | right∧ x = y} CONSEQUENCE: nodes1=nodes∈ {tmp} ∪ C2 FRAGMENT: data tmp = null ( y. data y tmp) tmp alloc nodes alloc content={x. n.∧ n ∀null n 6=nodes ∧data n∉ = x} ∧ ⊆ ∧ content1={x.∃ n. n6= null∧ n∈ nodes1∧ (data(tmp:=e))∧ n = x} CONSEQUENCE:∃ nodes16= nodes∧ ∈ {tmp}∧ content1 = content {e} 6= ∪ ∨ ∪ BAPA FRAGMENT: e content card content1 card content + 1 CONSEQUENCE: e content∉ card∧ content1 card6= content + 1 ∉ ∧ 6= Figure 6.3: Negation of Fig. 6.2, and consequences on shared sets

Decidability of the verification condition. Fig. 6.2 shows the verification condition formula for a method (insertAt) that inserts a node into a linked list. The validity of this formula implies that invoking a method in a state satisfying the precondition results in a state that satisfies the postcondition of insertAt. The formula contains the transitive closure operator, quantifiers, set comprehensions, and the cardinality operator. Nevertheless, there is a (syntac- tically defined) decidable class of formulas that contains the verification condition in Fig. 6.2. This decidable class is a set-sharing combination of three decidable logics, and can be decided using the method we present in this chapter.

To understand the method for proving the formula in Fig. 6.2, consider the problem of showing the unsatisfiability of the negation of the formula. Fig. 6.3 shows the conjuncts of the negation, grouped according to three decidable logics to which the conjuncts belong: 1) weak monadic second-order logic of two successors WS2S [Thatcher and Wright(1968)], 2) two-variable logic with counting C 2 [Pratt-Hartmann(2005)], and 3) Boolean Algebra with Presburger Arithmetic (BAPA) [Feferman and Vaught(1959), Kuncak et al.(2006)Kuncak, Nguyen, and Rinard, Kuncak and Rinard(2007)]. For the formula in each of the fragments, Fig. 6.3 also shows a consequence formula that contains only shared sets and statements about their cardinalities. (We represent elements as singleton sets, so we admit formulas sharing elements as well. )

84 6.2. Example: Verifying a Code Fragment

A decision procedure. Note that the conjunction of the consequences of three formula fragments is an unsatisfiable formula. This shows that the original verification condition is valid. In general, our decidability result shows that the decision procedures of logics such as WS2S and C 2 can be naturally extended to compute strongest consequences of formulas involving given shared sets. These consequences are all expressed in BAPA, which is decidable. In summary, the following is a decision procedure for satisfiability of combined formulas: 1) split the formula into fragments (belonging to WS2S, C 2, or BAPA); 2) for each fragment compute its strongest BAPA consequence; 3) check the satisfiability of the conjunction of consequences.

Higer-order logic. We present our problem in a fragment of classical higher-order logic [Andrews(2002), Chapter 5] with a particular set of types, which we call sorts. We assume that formulas are well-formed according to sorts of variables and logical symbols. Each variable and each logical symbol have an associated sort. The primitive sorts we consider are 1) bool, interpreted as the two-element set {true,false} of booleans; 2) int, interpreted as the set of integers Z; and 3) obj, interpreted as a non-empty set of elements. The only sort constructor is the binary function space constructor ‘ ’. We represent a function mapping elements of sorts → s ,...,s into an element of sort s as a term of sort s ... s s where s s ... s s 1 n 0 1 × × n → 0 1 × 2 × × n → 0 is a shorthand for s (s ...(s s )). When s ,...,s are all the same sort s, we abbreviate 1 → 2 → n → 0 1 n s ... s s as sn s . We represent a relation between elements of sorts s ,...,s as a 1 × × n → 0 → 0 1 n function s ... s bool. We use set as an abbreviation for the sort obj bool. We call 1 × × n → → variables of sort set set variables. The equality symbol applies only to terms of the same sort. We assume to have a distinct equality symbol for each sort of interest, but we use the same notation to denote all of them. Propositional operations connect terms of sort bool. We write x:s.F to denote a universally quantified formula where the quantified variable has sort s ∀ (analogously for x:s.F and x:sK .F for counting quantifiers of Section 6.4.3). We denote by ∃ ∃ FV(F ) the set of all free variables that occur free in F . We write FVs(F ) for the free variables of sort s. Note that the variables can be higher-order (we will see, however, that the shared variables are of sort set). A theory is simply a set of formulas, possibly with free variables.

Structures. A structure α specifies a finite set, which is also the meaning of obj, and we denote it α(obj). We focus on the case of finite α(obj) primarily for simplicity; we believe the extension to the case where domains are either finite or countable is possible and can be done using results from [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard, Section 8.1], [Pratt- Hartmann(2005), Section 5], [Thatcher and Wright(1968)]. When α is understood we use X to denote α(X ), where X denotes a sort, a term, a formula, or a set of formulas. If S is a ‚ ƒ set of formulas then α(S) true means α(F ) true for each F X . In every structure we let = = ∈ bool {false,true}. Instead of α(F ) true we often write simply α(F ). We interpret terms of ‚ ƒ = = the sort s ... s s as total functions s ... s s . For a set A, we identify a 1 × × n → 0 ‚ 1ƒ × × ‚ nƒ → ‚ 0ƒ function f : A {false,true} with the subset {x A f (x) true}. We thus interpret variables of → ∈ | = the sort objn bool as subsets of obj n. If s is a sort then α(s) depends only on α(obj) and we → ‚ ƒ denote it also by s . We interpret propositional operations , , as usual in classical logic. ‚ ƒ ∧ ∨ ¬

85 Chapter 6. Combining Theories with Shared Set Operations

F :: A F F F F F x:s.F x:s.F = | 1 ∧ 2 | 1 ∨ 2 | ¬ | ∀ | ∃ s :: int obj set = | | A :: B B B B T T T T K dvdT = 1 = 2 | 1 ⊆ 2 | 1 = 2 | 1 < 2 | B :: x Univ {x} B B B B B c = | ; | | | 1 ∪ 2 | 1 ∩ 2 | T :: x K CardUniv T T K T cardB = | | | 1 + 2 | · | K :: ... 2 1 0 1 2... = − | − | | |

Figure 6.4: Boolean Algebra with Presburger Arithmetic (BAPA)

A quantified variable of sort s ranges over all elements of s . (Thus, as in standard model of ‚ ƒ HOL [Andrews(2002), Section 54], quantification over variables of sort s s is quantification 1 → 2 over all total functions s s .) ‚ 1ƒ → ‚ 2ƒ

6.2.1 Boolean Algebra with Presburger Arithmetic

It will be convenient to enrich the language of our formulas with operations on integers, sets, and cardinality operations. These operations could be given by a theory or defined in HOL, but we choose to simply treat them as built-in logical symbols, whose meaning must be respected by all structures α we consider. Fig. 6.4 shows the syntax of Boolean Algebra with Presburger Arithmetic (BAPA) [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard,Feferman and Vaught(1959)]. The following are the sorts of symbols appearing in BAPA formulas: : set2 ⊆ → bool, : int2 bool, dvd : int bool for each integer constant K (with dvd (t) denoted by < → K → K dvdK t), ,Univ : set, singleton : obj set (with singleton(x) denoted as {x}), , : set2 set, ; → ∩ ∪ → complement : set set (with complement(A) denoted by Ac ), K : int for each integer constant → K , CardUniv : int, : int2 int, mul : int int for each integer constant K (with mul (t) + → K → K denoted by K t), and card : set int. · → We sketch the meaning of the less common among the symbols in Fig. 6.4. Univ denotes the universal set, that is, Univ obj . card A denotes the cardinality of the set A. CardUniv is ‚ ƒ = ‚ ƒ interpreted as cardUniv. The formula dvdK t denotes that the integer constant K divides the integer t. We note that the condition x A can be written in this language as {x} A. Note ∈ ⊆ that BAPA properly extends the first-order theory of Boolean Algebras over finite structures, which in turn subsumes the first-order logic with unary predicates and no function symbols, because e.g. x:obj.F (x) can be written as X :set.card X 1 F 0(X ) where in F 0 e.g. P(x) is ∃ ∃ = ∧ replaced by X P. ⊆

BAPA-definable relations between sets. We recall the definitions from Chapter2. A semilin- ear set is a finite union of linear sets. A linear set is a set of the form {~a k ~b ... k ~b + 1 1 + + n n | k ,...,k N} where ~a,~b ,...,~b NM . We represent a linear set by its generating vectors 1 n ∈ 1 n ∈ 86 6.3. Combination by Reduction to BAPA

~a,~b1,...,~bn, and a semilinear set by the finite set of representations of its linear sets. It was shown in [Ginsburg and Spanier(1966)] that a set of integer vectors S NM is a set of non- ⊆ negative solutions of a Presburger arithmetic formula P i.e. S {(v ,...,v ).P} iff S is a semilin- = 1 n ear set. We then have the following characterization of relationships between sets expressible in BAPA, which follows from [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard].

Lemma 6.1 (BAPA-expressible means Venn-cardinality-semilinear) Given a finite setU and a relation ρ (2U )p the following are equivalent: ⊆

1. there exists a BAPA formula F whose free variables are A1,..., Ap , and have the sort set, such that ρ {(s ,...,s ) {A s ,..., A s }(F )}; = 1 p | 1 7→ 1 p 7→ p 2. the following subset of ZM for M 2p is semilinear: = {( sc sc ... sc , s sc ... sc ,..., s s ... s ) (s ,...,s ) ρ}. | 1 ∩ 2 ∩ ∩ p | | 1 ∩ 2 ∩ ∩ p | | 1 ∩ 2 ∩ ∩ p | | 1 p ∈

Structures of interest in this chapter. In the rest of this chapter we consider structures that interpret the BAPA symbols as defined above. Because the meaning of BAPA-specific symbols is fixed, a structure α that interprets a set of formulas is determined by a finite set α(obj) as well as the values α(x) for each variable x free in the set of formulas. Let {obj u,x v ,...,x v } 7→ 1 7→ 1 n 7→ n denote the structure α with domain u that interprets each variable xi as vi .

6.3 Combination by Reduction to BAPA

The Satisfiability Problem. We are interested in an algorithm to determine whether there exists a structure α M in which the following formula is true ∈

B(F1,...,Fn) (6.1) where

1. F ,...,F are formulas with FV(F ) {A ,..., A ,x ,...,x }. 1 n i ⊆ 1 p 1 q 2. V {A ,..., A } are variables of sort set, whereas x ,...,x are the remaining variables.1 S = 1 p 1 q

3. Each formula Fi belongs to a given class of formulas, Fi . For each Fi , we assume that there is a corresponding theory T F . i ⊆ i

4. B(F1,...,Fn) denotes a formula built from F1,...,Fn using the propositional operations , . The absence of negation is usually not a loss of generality because most F are ∧ ∨ i closed under negation so B is the negation-normal form of a quantifier-free combina- tion.

1For notational simplicity we do not consider variables of sort obj because they can be represented as singleton sets, of sort set.

87 Chapter 6. Combining Theories with Shared Set Operations

5. As the set of structures M , we consider all structures α of interest (with finite obj , n ‚ ƒ interpreting BAPA symbols in the standard way) for which α( i 1Ti ). ∪ =

6. (Set Sharing Condition) If i j, then FV({F } T ) FV({F } T ) V . 6= i ∪ i ∩ j ∪ j ⊆ S

Note that, as a special case, if we embed a class of first-order formulas into our framework, we obtain a framework that supports sharing unary predicates, but not e.g. binary predicates.

Combination Theorem. The formula B in (6.1) is satisfiable iff one of the disjuncts in its disjunctive normal form is satisfiable. Consider a disjunct F ... F for m n. By definition 1 ∧ ∧ m ≤ of the satisfiability problem (6.1), F ... F is satisfiable iff there exists a structure α such 1 ∧ ∧ m that for each 1 i m, for each G {F } T , we have α(G) true. Let each variable x have ≤ ≤ ∈ i ∪ i = i some sort s (such as obj2 bool). Then the satisfiability of F ... F is equivalent to the i → 1 ∧ ∧ m following condition:

u u Vm finite set u. a1,...,ap u. v1 s1 .... vq sq . i 1 ∃ ∃ ⊆ ∃ ∈ ‚ ƒ ∃ ∈ ‚ ƒ = (6.2) {obj u, A a ,..., A a ,x v ,...,x v }({F } T ) → 1 7→ 1 p 7→ p 1 7→ 1 q 7→ q i ∪ i

By the set sharing condition, each of the variables x1,...,xq appears only in one conjunct and can be moved inwards from the top level to this conjunct. Using xi j to denote the j -th variable in the i-th conjunct we obtain the condition

Vm finite set u. a1,...,ap u. i 1 Ci (u,a1,...,ap ) (6.3) ∃ ∃ ⊆ = where Ci (u,a1,...,ap ) is

v .... v . ∃ i1 ∃ i wi {obj u, A a ,..., A a ,x v ,...,x v }({F } T ) → 1 7→ 1 p 7→ p i1 7→ i1 i wi 7→ i wi i ∪ i

The idea of our combination method is to simplify each condition Ci (u,a1,...,ap ) into the truth value of a BAPA formula. If this is possible, we say that there exists a BAPA reduction.

Definition 6.2 (BAPA Reduction) If F is a set of formulas and T F a theory, we call a i i ⊆ i function ρ : F FBAPA a BAPA reduction for (F ,T ) iff for every formula F F and for all i → i i i ∈ i finite u and a ,...,a u, the condition 1 p ⊆ v ... v . ∃ i1 ∃ i wi {obj u, A a ,..., A a ,x v ,...,x v }({F } T ) → 1 7→ 1 p 7→ p i1 7→ i1 i wi 7→ i wi i ∪ i is equivalent to the condition {obj u, A a ,..., A a }(ρ(F )). → 1 7→ 1 p 7→ p i

A computable BAPA reduction is a BAPA reduction which is computable as a function on formula syntax trees.

88 6.4. BAPA Reductions

F :: P F F F F F x:s.F x:s.F = | 1 ∧ 2 | 1 ∨ 2 | ¬ | ∀ | ∃ s :: obj set = | P :: B B B B r (x, y) = 1 = 2 | 1 ⊆ 2 | r :: succ succ = L | R B :: x ² Univ {x} B B B B B c = | | ; | | | 1 ∪ 2 | 1 ∩ 2 |

Figure 6.5: Monadic Second-Order Logic of Finite Trees (FT)

Theorem 6.3 Suppose that for every 1 i n for (F ,T ) there exists a computable BAPA ≤ ≤ i i reduction ρi . Then the problem (6.1) in Section 6.3 is decidable.

Specifically, to check satisfiability of B(F1,...,Fn), compute B(ρ1(F1),...,ρn(Fn)) and then check its satisfiability using a BAPA decision procedure [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard,Kuncak and Rinard(2007)].

6.4 BAPA Reductions

6.4.1 Monadic Second-Order Logic of Finite Trees

Figure 6.5 shows the syntax of (our presentation of) monadic second-order logic of finite trees (FT), a variant of weak monadic second-order logic of two successors (WS2S) [Thatcher and Wright(1968),Klarlund and Møller(2001)]. The following are the sorts of variables specific to FT formulas: succ ,succ : obj2 bool. L R → We interpret the sort obj over finite, prefix-closed sets of binary strings. More precisely, we use

{1,2} as the binary alphabet, and we let obj {1,2}∗ such that ‚ ƒ ⊂

w {1,2}∗.(w1 obj w2 obj ) w obj ∀ ∈ ∈ ‚ ƒ ∨ ∈ ‚ ƒ → ∈ ‚ ƒ Note that the operator here refers to the Kleene star. In each model, set is the set of all ∗ ‚ ƒ subsets of obj . We let ² be the empty string which we also denote by ². We define ‚ ƒ ‚ ƒ succ {(w,w1) w1 obj } and succ {(w,w2) w2 obj } ‚ Lƒ = | ∈ ‚ ƒ ‚ R ƒ = | ∈ ‚ ƒ The remaining constants and operations on sets are interpreted as in BAPA.

Let FFT be the set of all formulas in Figure 6.5. Let MFT be the set of all (finite) structures described above. We define TFT as the set of all formulas F FFT such that F is true in all ∈ structures from MFT.

The models of the theory TFT correspond up to isomorphism with the interpretations in MFT.

89 Chapter 6. Combining Theories with Shared Set Operations

Lemma 6.4 If α is a structure such that α(TFT), then α is isomorphic to some structure in MFT.

Note that any FT formula F (x) with a free variable x of sort obj can be transformed into the equisatisfiable formula x : obj.y {x} F (x) where y is a fresh variable of sort set. For ∃ = ∧ conciseness of presentation, in the rest of this section we only consider FT formulas F with FVobj(F ) . = ;

Finite tree automata. In the following, we recall the connection between FT formulas and finite tree automata. Let Σ be a finite ranked alphabet. We call symbols of rank 0 constant symbols and a symbol of rank k 0 a k-ary function symbol. We denote by Terms(Σ) the set of > all terms over Σ. We associate a position p {1,...,r }∗ with each subterm in a term t where ∈ max rmax is the maximal rank of all symbols in Σ. We denote by t[p] the topmost symbol of the subterm at position p. For instance, consider the term t f (g (a,b,c),a) then we have t[²] f = = and t[13] c. =

A finite (deterministic bottom-up) tree automaton A for alphabet Σ is a tuple (Q,Q f ,ι) where Q is a finite set of states, Q Q is a set of final states, and ι is a function that associates with each f ⊆ constant symbol c Σ a state ι(c) Q and with each k-ary function symbol f Σ a function ∈ ∈ ∈ ι(f ): Qk Q. We homomorphically extend ι from symbols in Σ to Σ-terms. We say that A → accepts a term t Terms(Σ) if ι(t) Q . The language L (A) accepted by A is the set of all ∈ ∈ f Σ-terms accepted by A.

Let F be an FT formula and let SV(F ) be the set SV(F ) FV(F ) {Univ}. We denote by Σ the = ∪ F alphabet consisting of the constant symbol and all binary function symbols f where ν is ⊥ ν a function ν : SV(F ) {0,1}. We inductively associate a Σ -term t with every structure → F α,w α MFT and string w {1,2}∗ as follows: ∈ ∈  fνα,w (tα,w1,tα,w2) if w α(obj) tα,w ∈ =  otherwise ⊥ such that for all x SV(F ), ν (x) 1 iff w α(x). The language L (F ) Terms(Σ ) of F is ∈ α,w = ∈ ⊆ F then defined by L (F ) {t α MFT α(F )}. = α,² | ∈ ∧ The following theorem states the connection between the structures satisfying FT formulas and the languages accepted by finite tree automata2.

Theorem 6.5 (Thatcher and Wright [Thatcher and Wright(1968)]) For every FT formula F there exists a finite tree automaton A over alphabet Σ such that L (F ) L (A ) and A F F = F F can be effectively constructed from F .

2The theorem was originally stated for WS2S where the universe of all structures is fixed to the infinite binary tree {1,2}∗ and where all set variables range over finite subsets of {1,2}∗. It carries over to finite trees in a straightforward manner.

90 6.4. BAPA Reductions

Parikh image. We recall Parikh’s commutative image [Parikh(1966)]. The Parikh image for an alphabet Σ is the function Parikh : Σ∗ Σ N such that for any word w Σ∗ and symbol → → ∈ σ Σ, Parikh(w)(σ) is the number of occurrences of σ in w. The Parikh image is extended ∈ pointwise from words to sets of words: Parikh(W ) {Parikh(w) w W }. In the following, we = | ∈ implicitly identify Parikh(W ) with the set of integer vectors {(χ(σ ),...,χ(σ )) χ Parikh(W )} 1 n | ∈ where we assume some fixed order on the symbols σ1,...,σn in Σ.

Theorem 6.6 (Parikh [Parikh(1966)]) Let G be a context-free grammar and L (G) the lan- guage generated from G then the Parikh image of L (G) is a semilinear set and its finite repre- sentation is effectively computable from G.

We generalize the Parikh image from words to terms as expected: the Parikh image for a ranked alphabet Σ is the function Parikh : Terms(Σ) Σ N such that for all t Terms(Σ) and → → ∈ σ Σ, Parikh(t)(σ) is the number of positions p in t such that t[p] σ. Again, we extend this ∈ = function pointwise from terms to sets of terms.

Lemma 6.7 Let A be a finite tree automaton over alphabet Σ. Then the Parikh image of L (A) is a semilinear set and its finite representation is effectively computable from A.

6.4.2 BAPA Reduction for Monadic Second-Order Logic of Finite Trees

In the following, we prove the existence of a computable BAPA reduction for the theory of monadic second-order logic of finite trees.

def Let F be an FT formula and let Σ2 be the set of all binary function symbols in Σ , i.e., Σ2 F F F = Σ \{ }. We associate with each σ Σ2 the Venn region vr(σ ), which is given by a set- F ⊥ ν ∈ F ν algebraic expression over SV(F ): let SV(F ) {x ,...,x } then = 1 n def vr(σ ) xν(x1) xν(xn ) . ν = 1 ∩ ··· ∩ n 0 c 1 Hereby x denotes x and x denotes x . Let α MFT be a model of F . Then the term t i i i i ∈ α,² encodes for each w α(obj) the Venn region to which w belongs in α, namely vr(t [w]). ∈ α,² Thus, the Parikh image Parikh(tα,²) encodes the cardinality of each Venn region over SV(F ) in α.

Lemma 6.8 Let F be an FT formula then

2 Parikh(L (F )) Σ2 {S(α,ΣF ) α MFT α(F )}, | F = | ∈ ∧ where S(α,Σ2 ) {σ α(vr(σ)) σ Σ2 }. F = 7→ | | | ∈ F 91 Chapter 6. Combining Theories with Shared Set Operations

F :: P F F F F F K x:obj.F = | 1 ∧ 2 | 1 ∨ 2 | ¬ | ∃ P :: x x {x} A r (x ,x ) = 1 = 2 | ⊆ | 1 2

Figure 6.6: Two-Variable Logic with Counting (C 2)

According to Theorem 6.5 we can construct a finite tree automaton AF over ΣF such that L (F ) L (A ). From Lemma 6.7 follows that Parikh(L (F )) is a semilinear set whose finite = F representation in terms of base and step vectors is effectively computable from AF . From this finite representation, we can construct a Presburger arithmetic formula φF over free integer variables {x σ Σ } whose set of solutions is the Parikh image of L (F ), i.e. σ | ∈ F

Parikh(L (F )) {{σ k σ Σ } {x k σ Σ}(φ )} (6.4) = 7→ σ | ∈ F | σ 7→ σ | ∈ F

Using the above construction of the Presburger arithmetic formula φF for a given FT formula F , we define the function ρFT : FFT FBAPA as follows: → def ^ ρFT(F ) x~σ.φF cardvr(σ) xσ = ∃ ∧ σ Σ2 = ∈ F where ~xσ are the free integer variables of φF .

Theorem 6.9 The function ρFT is a BAPA reduction for (FFT,TFT).

6.4.3 Two-Variable Logic with Counting

Figure 6.6 shows the syntax of (our presentation of) two-variable logic with counting (denoted C 2)[Pratt-Hartmann(2004)]. As usual in C 2, we require that every sub-formula of a formula has at most two free variables. In the atomic formula r (x1,x2), variables x1,x2 are of sort obj and r is a relation variable of sort obj2 bool. The formula {x} A replaces A(x) in predicate- → ⊆ logic notation, and has the expected meaning, with the variable x is of sort obj and A of sort set. The interpretation of the counting quantifier K x:obj.F for a positive constant K is that ∃ there exist at least K distinct elements x for which the formula F holds.

Let FC2 be the set of all formulas in Figure 6.6. Let MC2 be the set of structures that interpret formulas in F . We define T as the set of all formulas F F such that F is true in all C2 C2 ∈ C2 structures from MC2. Modulo our minor variation in syntax and terminology (using relation and set variables instead of predicate symbols), TC2 corresponds to the standard set of valid C 2 formulas over finite structures [Pratt-Hartmann(2004)].

92 6.4. BAPA Reductions

6.4.4 BAPA Reduction for Two-Variable Logic with Counting

We next build on the results in [Pratt-Hartmann(2005)] to define a BAPA reduction for C 2.

We fix set variables A1,..., Ap and relation variables r1,...,rq . Throughout this section, let Σ {A ,..., A }, Σ {r ,...,r }, and Σ Σ Σ . We call Σ ,Σ ,Σ signatures because A = 1 p R = 1 q 0 = A ∪ R A R 0 they correspond to the notion of signature in the traditional first-order logic formulation of C 2.

Model theoretic types. Define the model-theoretic notion of n-type πΣ(x1,...,xn) in the signature Σ as the maximal consistent set of non-equality literals in Σ whose obj-sort variables 3 are included in {x1,...,xn}. Given a structure α such that α(x1),...,α(xn) are all distinct, α induces an n-type

itypα,Σ(x ,...,x ) {L α(L) FV(L) {x ,...,x }, L is Σ-literal without ’=’} 1 n = | ∧ ⊆ 1 n We also define the set of n-tuples for which a type π holds in a structure α:

Sα(π(x ,...,x )) {(e ,...,e ) α(obj)n α(x : e ,...,x : e )(π)} 1 n = 1 n ∈ | 1 = 1 n = n

If Σ Σ0 and π0 is an n-type in signature Σ0, by π0 Σ we denote the subset of π containing ⊆ | α precisely those literals from π whose sets and relations belong to Σ. The family of sets {S (π0) α | π0 π} is a partition of S (π0). We will be particularly interested in 1-types. We identify a |Σ = 1-type π(x) in the signature ΣA with the corresponding Venn region

\ \ {A ({x} A ) π(x)} {Ac ( ({x} A )) π(x)}. i | ⊆ i ∈ ∩ i | ¬ ⊆ i ∈ If π ,...,π is the sequence of all 1-types in the signature Σ and α is a structure, let I α(Σ) 1 m = ( Sα(π ) ,..., Sα(π ) ). If M is a set of structures let I M (Σ) {I α(Σ) α M }. | 1 | | m | = | ∈

Observation 6.10 If π is a 1-type in Σ and π0 a 1-type in Σ0 for Σ Σ0, then ⊆ α X α I (π) I (π0) | | = π π | | 0|Σ=

Making structures differentiated, chromatic, sparse preserves 1-types. Let φ be a C 2 for- mula with signature Σ0 of relation symbols. By Scott normal form transformation [Pratt- Hartmann(2005), Lemma 1], it is possible to introduce fresh set variables and compute another 2 C formula φ∗ in an extended signature Σ∗ Σ , and compute a constant C such that, for ⊇ 0 φ all sets u with u C : 1) if α is a Σ interpretation with domain u such that α (φ), then | | ≥ φ 0 0 0 there exists its Σ∗ extension α∗ α such that α∗(φ∗), and 2) if α∗ is a Σ∗ interpretation with ⊇ 0 domain u such that α∗(φ∗), then for its restriction α α∗ we have α (φ). By introducing 0 = |Σ 0 3 For example, if Σ has one relation variable r , and two set variables A1, A2, then each 2-type with free variables x, y contains, for each of the atomic formulas with variables x, y (i.e. {x} A1, {y} A1, {x} A2, {y} A2, r (x,x), ⊆ ⊆ ⊆ ⊆ r (y, y), r (x, y), r (y,x)), either the formula or its negation.

93 Chapter 6. Combining Theories with Shared Set Operations further fresh set- and relation- symbols, [Pratt-Hartmann(2005), lemmas 2 and 3] shows that we can extend the signature from Σ∗ to Σ such that each model α∗ in Σ∗ extends to a model α in Σ, where α satisfies some further conditions of interest: α is chromatic and differenti- ated. [Pratt-Hartmann(2005), Lemma 10] then shows that it is possible to transform a model of a formula into a so-called X -sparse model for an appropriately computed integer constant X . What is important for us is the following.

Observation 6.11 The transformations that start from α0 with α0(φ), and that produce a chromatic, differentiated, X -sparse structure α with α(φ), have the property that, for structures of size Cφ or more,

1. the domain remains the same: α (obj) α(obj), 0 =

2. the induced 1-types in the signature Σ0 remain the same: for each 1-type π in signature Σ ,Sα0 (π) Sα(π). 0 =

Star types. [Pratt-Hartmann(2005), Definition 9] introduces a star-type (π,~v) (denoted by letter σ) as a description of a local neighborhood of a domain element, containing its induced 1-type π as well as an integer vector ~v ZN that counts 2-types in which the element participates, ⊆ where N is a function of the signature Σ. A star type thus gives a more precise description of the properties of a domain element than a 1-type. Without repeating the definition of star type [Pratt-Hartmann(2005), Definition 9], we note that we can similarly define the set Sα((π,~v)) of elements that realize a given star type (π,~v). Moreover, for a given 1-type π, the family of the non-empty among the sets Sα((π,~v)) partitions the set Sα(π).

Frames. The notion of Y -bounded chromatic frame [Pratt-Hartmann(2005), Definition 11] can be thought of as a representation of a disjunct in a normal form for the formula φ∗. It summarizes the properties of elements in the structure and specifies (among others), the list of possible star types σ1,...,σN whose integer vectors ~v are bounded by Y . For a given φ∗, it is possible to effectively compute the set of C -bounded frames F such that F φ∗ holds. The φ |= ‘ ’ in F φ∗ is a certain syntactic relation defined in [Pratt-Hartmann(2005), Definition 13]. |= |=

For each frame F with star-types σ1,...,σN ,[Pratt-Hartmann(2005), Definition 14] introduces an effectively computable Presburger arithmetic formula PF with N free variables. We write PF (w1,...,wN ) if PF is true when these variables take the values w1,...,wN . The following statement is similar to the main [Pratt-Hartmann(2005), Theorem 1], and can be directly recovered from its proof and the proofs of the underlying [Pratt-Hartmann(2005), lemmas 12,13,14].

Theorem 6.12 Given a formula φ∗, and the corresponding integer constant Cφ, there exists a computable constant X such that if N X , if σ ,...,σ is a sequence of star types in Σ ≤ 1 N whose integer vectors are bounded by Cφ, and w1,...,wN are integers, then the following are equivalent:

94 6.4. BAPA Reductions

α 1. There exists a chromatic differentiated structure α such that α(φ∗), wi S (σi ) for SN α =| | 1 i N, and α(obj) i 1 S (σi ). ≤ ≤ = =

2. There exists a chromatic frame F with star types σ ,...,σ , such that F φ∗ and 1 N |= PF (w1,...,wN ).

We are now ready to describe our BAPA reduction. Fix V1,...,VM to be the list of all 1-types in signature ΣA; let s1,...,sM be variables corresponding to their counts. By the transformation of models into chromatic, differentiated, X -sparse ones, the observations 6.11, 6.10, and Theorem 6.12, we obtain

M Corollary 6.13 If M {α α(φ∗)}, then there is a computable constant X such that I (Σ ) = | A = {(s ,...,s ) F (s ,...,s )} where F (s ,...,s ) is the following Presburger arithmetic for- 1 M | φ∗ 1 M φ∗ 1 M mula

M _ ^ X w1,...,wN . PF (w1,...,wN ) s j {wi Vj (πi ΣA )} N,σ ,...,σ ,F ∃ ∧ j 1 = | = | 1 N = where N ranges over {0,1,..., X }, σ1,...,σN range over sequences of Cφ-bounded star types, and where F ranges over the C -bounded frames with star types σ ,...,σ such that F φ∗. φ 1 N |=

By adjusting for the small structures to take into account Scott normal form transformation, we further obtain

Corollary 6.14 If M {α α(φ)}, then = | I M (Σ ) {(s ,...,s ) G (s ,...,s )} A = 1 M | φ 1 M where Gφ(s1,...,sM ) is the Presburger arithmetic formula

M P W si Cφ Fφ∗ (s1,...,sM ) i 1 ≥ ∧ = M W V α { si di α. |α(obj)| Cφ (d1,...,dM ) I (ΣA)} i 1 = | ∃ < ∧ ∈ =

2 Theorem 6.15 The following is a BAPA reduction for C over finite models to variables ΣA: given a two-variable logic formula φ, compute the BAPA formula s1,...,sM . Gφ(s1,...,sM ) VM ∃ ∧ i 1 cardVi si . = =

6.4.5 Bernays-Schönfinkel-Ramsey Fragment of First-Order Logic

Figure 6.7 shows the syntax of (our presentation of) the Bernays-Schönfinkel-Ramsey fragment of first-order logic with equality [Börger et al.(1997)Börger, Grädel, and Gurevich], often called

95 Chapter 6. Combining Theories with Shared Set Operations

F :: z :obj.... z :obj. y :obj.... y :obj.B = ∃ 1 ∃ n ∀ 1 ∀ m B :: P B B B B B = | 1 ∧ 2 | 1 ∨ 2 | ¬ P :: x x {x} A r (x ,...,x ) = 1 = 2 | ⊆ | 1 k

Figure 6.7: Bernays-Schönfinkel-Ramsey Fragment of First-Order Logic effectively propositional logic (EPR). The interpretation of atomic formulas is analogous as for C 2 in previous section. Quantification is restricted to variables of sort obj and must obey the usual restriction of ∗ ∗-prenex form that characterizes the Bernays-Schönfinkel-Ramsey ∃ ∀ class.

6.4.6 BAPA Reduction for Bernays-Schönfinkel-Ramsey Fragment

Our BAPA reduction for the Bernays-Schönfinkel-Ramsey fragment (EPR) is in fact a reduction from EPR formulas to unary EPR formulas, in which all free variables have the sort set. To convert a unary EPR formula into BAPA, treat first-order variables as singleton sets and apply quantifier elimination for BAPA [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard].

Theorem 6.16 (BAPA Reduction for EPR) Let φ be a quantifier-free formula whose free vari- ables are: 1) A ,..., A , of sort set, 2) r ,...,r , each r of sorts objK (i) bool for some K (i) 2, 1 p 1 q i → ≥ 3) z1,...,zn, y1,..., ym, of sort obj. Then

r ,...r . z ,...,z . y ,..., y . φ ∃ 1 q ∃ 1 n ∀ 1 m is equivalent to an effectively computable BAPA formula.

The proof of Theorem 6.16 builds on and generalizes, for finite models, the results on the spectra of EPR formulas [Fontaine(2007), Fontaine(2009), Ramsey(1930)]. We here provide some intuition. The key insight [Ramsey(1930)] is that, when a domain of a model of an EPR formula has sufficiently many elements, then the model contains an induced submodel S of m nodes such that for every 0 k m elements e ,...,e outside S the m-type induced by ≤ < 1 k e ,...,e and any m k elements in S is the same. Then an element of S can be replicated to 1 k − create a model with more elements, without changing the set of all m-types in the model and thus without changing the truth value of the formula. Moreover, every sufficiently large model of the EPR formula that has a submodel S with more than m such symmetric elements can be shrunk to a model by whose expansion it can be generated. This allows us to enumerate a finite (even if very large) number of characteristic models whose expansion generates all models. The expansion of a characteristic model increases by one the number of elements of some existing 1-type, so the cardinalities of Venn regions of models are a semilinear set

96 6.4. BAPA Reductions whose base vectors are given by characteristic models and whose step vectors are given by the 1-types being replicated.

6.4.7 Quantifier-free Mutlisets with Cardinality Constraints

Figure 2.3 in Chapter2 defines the syntax of quantifier-free multiset constraints with cardi- nality operators. We first give their interpretation in our new settings. Multiset expressions have sort obj int, which we abbreviate by mset in the following. Formulas are built from → set expressions over set variables of sort set, multiset expressions over multiset variables of sort mset and inner and outer linear arithmetic formulas. Formally, we have distinct variables for operations on sets and multisets, e.g., we have a variable : set2 set and a variable ∪s → : mset2 mset, but we use the same symbol for both of them. ∪m → ∪ We restrict ourself to structures α that interpret multiset variables as functions from obj to ‚ ƒ the nonnegative integers. Set and arithmetic operations are interpreted as in BAPA. Multiset operations are interpreted as expected, in particular, ’ ’ denotes additive union, ’\’ denotes ] multiset difference, and ’\\’ denotes set difference. The variables multisetof and set are in- terpreted as functions that convert between multisets and sets, e.g., set maps a multiset ‚ ƒ M : obj N to the set {e M(e) 0}. The variable ite is interpreted as the conditional choice ‚ ƒ → | > function, i.e., ite(F,t ,t ) denotes t if F true and t otherwise. Finally, the atom ‚ 1 2 ƒ ‚ 1ƒ ‚ ƒ = ‚ 2ƒ (u ,...,u ) P(t ,...,t ) denotes true iff for all i [1,n] 1 n = 1 n ∈ X α(ui ) α[e : o](ti ) = o ob j = ∈‚ ƒ

Let FMS be the set of all formulas defined in Figure 2.3 and let MMS be the set of all structures interpreting formulas in FMS as described above. We define the theory of quantifier-free multisets with cardinality constraints TMS as the set of all formulas F FMS such that α(F ) is ∈ true for all structures α in MMS.

6.4.8 BAPA Reduction for Quantifier-free Multiset Constraints

The satisfiability of the quantifier-free fragment of multisets with cardinality operators is decidable (Theorem 2.25). There is, in fact, also a BAPA reduction from a quantifier-free multiset formula over multiset and set variables to a BAPA formula ranging only over the set variables.

Let F FMS be a multiset constraint containing set variables A ,..., A and multiset vari- ∈ 1 p ables M1,...,Mq . To obtain a BAPA reduction, we apply the decision procedure described in Chapter2 to the formula F1

w ^ F1 F cardVi ki ≡ ∧ i 1 = =

97 Chapter 6. Combining Theories with Shared Set Operations

where k1,...,kw are fresh integer variables and V1,...,Vw are the Venn regions over the set variables A1,..., Ap . Before applying the decision procedure we convert F1 to a formula F2 that only ranges over multiset variables. This is done by replacing every set operation in F by the corresponding multiset operation, replacing every set variable Ai by a fresh multiset variable MAi , and conjoining the formula

p ^ e : obj. (MAi (e) 0 MAi (e) 1) . ∀ i 1 = ∨ = = The decision procedure constructs a Presburger arithmetic formula P with {k ,...,k } FV(P). 1 w ⊆ From the proofs of Theorem 2.2, Theorem 2.4 and Theorem 2.14 follows that for every structure

α in which formula F2 evaluates to true, there exists a structure α0 in which formula P evaluates to true and for every variable k , α(k ) α0(k ). The converse holds as well, i.e. for every model i i = i of P there is a model of F2 with the previous property. We conclude that the following holds:

{α α(F )} {α α(P)} |{k1,...,kw } | 2 = |{k1,...,kw } |

If x1,...,xn are the variables in P other than k1,...,kw then the result of the BAPA reduction is the formula

w def ^ PF k1 : int.... kw : int.( cardVi ki ) ( x1:int.... xn:int. P) (6.5) = ∃ ∃ i 1 = ∧ ∃ ∃ =

Theorem 6.17 The function mapping a formula F FMS to the BAPA formula P is a BAPA ∈ F reduction for (FMS,TMS).

The proof of Theorem 6.17 uses Equation 6.5 following a similar argument than in the proof of Theorem 6.9.

6.5 Further Related Work

There are combination results for the disjoint combinations of non-stably infinite theories [Tinelli and Zarba(2005),Krstic et al.(2007)Krstic, Goel, Grundy, and Tinelli,Fontaine(2007), Fontaine(2009)]. These results are based on the observation that such combinations are possible whenever one can decide for each component theory whether a model of a specific cardinality exists. Our combination result takes into account not only the cardinality of the models, i.e. the interpretation of the universal set, but cardinalities of Venn regions over the interpretations of arbitrary shared set variables. It is a natural generalization of the disjoint case restricted to theories that share the theory of finite sets, thus, leading to a non-disjoint combination of non-stably infinite theories.

Ghilardi [Ghilardi(2005)] proposes a model-theoretic condition for decidability of the non- disjoint combination of theories based on quantifier elimination and local finiteness of the

98 6.6. Conclusions shared theory. Note that BAPA is not locally finite and that, in general, we need the full expressive power of BAPA to compute the projections on the shared set variables. For instance, consider the C 2 formula

1 1 ( x. = y.r (x, y)) ( x. = y.r (y,x)) ( y. y B ( x.x A r (x, y))) ∀ ∃ ∧ ∀ ∃ ∧ ∀ ∈ ↔ ∃ ∈ ∧ where r is a binary relation variable establishing the bijection between A and B. This constraint expresses A B without imposing any additional constraint on A and B. Similar examples | | = | | can be given for weak monadic second-order logic of finite trees.

Another example, belonging to WS1S logic, further demonstrates why Venn regions are impor- tant in constructing a reduction. Consider a formula ((A B)(B A))∗( B A)∗. Every ∧ ¬ ∧ ¬ ¬ ∧ ¬ Venn region denotes a letter of the alphabet Σ of the accepting automaton for the formula: Σ (A B, A B, A B, A B). The Parikh image of the all models of the formula is the = ∧ ∧ ¬ ¬ ∧ ¬ ∧ ¬ set {(0,p,p,q) q,p 0}. The elements of this semilinear set are all the solutions of the formula | ≥ A B c Ac B A B 0, which is exactly the projection of the original formula. | ∩ | = | ∩ | ∧ | ∩ | = The reduction approach to combination of decision procedures has previously been applied in the simpler scenario of reduction to propositional logic [Lahiri and Seshia(2004)]. Like propo- sitional logic, quantifier-Free BAPA is NP-complete, so it presents an appealing alternative for combination of theories that share sets.

Gabbay and Ohlbach [Gabbay and Ohlbach(1992)] present a procedure, called SCAN, for second-order quantifier elimination. However, [Gabbay and Ohlbach(1992)] gives no char- acterization of when SCAN terminates. We were therefore unable to use SCAN to derive any BAPA reductions.

The general combination of weak monadic second-order logics with linear cardinality con- straints has been proven undecidable by Klaedtke and Rueß [Klaedtke and Rueß(2002), Klaedtke and Rueß(2003)]. They introduce the notion of Parikh automata to identify de- cidable fragments of this logic which inspired our BAPA reduction of MSOL of finite trees. Our combined logic is incomparable to the decidable fragments identified by Klaedtke and Rueß because it supports non-tree structures as well. However, by applying projection to C 2 and the Bernays-Schönfinkel-Ramsey class, we can combine our logic with [Klaedtke and Rueß(2002),Klaedtke and Rueß(2003)], obtaining an even more expressive decidable logic.

6.6 Conclusions

Many verification techniques rely on decision procedures to achieve a high degree of automa- tion. The class of properties that such techniques are able to verify is therefore limited by the expressive power of the logics supported by the underlying decision procedures. We have presented a combination result for logics that share operations on sets. This result yields an expressive decidable logic that is useful for software verification. We therefore believe that we

99 Chapter 6. Combining Theories with Shared Set Operations made an important step in increasing the class of properties that are amenable to automated verification.

100 7 Complete Functional Synthesis

In this chapter we discuss complete functional synthesis. Synthesis of program fragments from specifications can make programs easier to write and easier to reason about. To integrate synthesis into programming languages, synthesis algorithms should behave in a predictable way - they should always find a code for a well-defined class of specifications. To guarantee correctness and applicability to software (and not just hardware), these algorithms should also support unbounded data types, such as numbers and data structures.

This chapter describes how to generalize decision procedures into predictable and complete synthesis procedures. Such procedures are guaranteed to find code that satisfies the specifica- tion if such code exists. Moreover, we identify conditions under which synthesis will statically decide whether the solution is guaranteed to exist, and whether it is unique. We demonstrate our approach by starting from a quantifier elimination decision procedure for Boolean Algebra of set with Presburger Arithmetic (BAPA) and transforming it into a synthesis procedure.

7.1 Motivation

Synthesis of software from specifications, discussed already in [Manna and Waldinger(1971), Manna and Waldinger(1980), Green(1969)], promises to make programmers more produc- tive. Despite substantial recent progress [Solar-Lezama et al.(2006)Solar-Lezama, Tancau, Bodík, Seshia, and Saraswat,Solar-Lezama et al.(2008)Solar-Lezama, Jones, and Bodík,Vechev et al.(2009)Vechev, Yahav, and Yorsh,Srivastava et al.(2010)Srivastava, Gulwani, and Foster], synthesis is limited to small pieces of code. We expect that this will continue to be the case for some time in the future, for two reasons: 1) synthesis is algorithmically a difficult problem, and 2) synthesis requires detailed specifications, which for large programs become difficult to write.

We therefore expect that practical applications of synthesis lie in its integration into the compilers of general-purpose programming languages. To make this integration feasible, we aim to identify well-defined classes of expressions and synthesis algorithms guaranteed

101 Chapter 7. Complete Functional Synthesis to succeed for these classes of expressions, just like a compilation attempt succeeds for any well-formed program. Our starting point for such synthesis algorithms are decision procedures.

A decision procedure for satisfiability of a class of formulas accepts a formula in its class and checks whether the formula has a solution. On top of this basic functionality, many decision procedure implementations provide the additional feature of generating a satisfying assignment (a model) whenever the given formula is satisfiable. Such a model-generation functionality has many uses, including better error reporting in verification [Moskal(2009)] and test-case generation [Anand et al.(2008)Anand, Godefroid, and Tillmann]. An important insight is that model generation facility of decision procedures could also be used as an advanced computation mechanism. Given a set of values for some of the variables, a constraint solver can at run-time find the values of the remaining variables such that a given constraint holds. Two recent examples of integrating such a mechanism into a programming language are the quotations of the F # language [Syme et al.(2007)Syme, Granicz, and Cisternino] and a Scala library [Köksal et al.(2011)Köksal, Kuncak, and Suter], both interfacing to the Z3 satisfiability modulo theories (SMT) solver [de Moura and Bjørner(2008a)]. Such mechanisms promise to bring the algorithmic improvements of SMT solvers to declarative paradigms such as Constraint Logic Programming [Jaffar and Maher(1994)]. However, they involve a possibly unpredictable search at run-time, and require the deployment of the entire decision procedure as a component of the run-time system.

Our goal is to provide the benefits of the declarative approach in a more controlled way: we aim to run a decision procedure at compile time and use it to generate code. The generated code then computes the desired values of variables at run-time. Such code is thus specific to the desired constraint, and can be more efficient. It does not require the decision procedure to be present at run-time, and gives the developer static feedback by checking the conditions under which the generated solution will exist and be unique. We use the term synthesis for our approach because it starts from an implicit specification, and involves compile-time precomputation. Because it computes a function that satisfies a given input/output relation, we call our synthesis functional, in contrast to reactive synthesis approaches [Pnueli and Rosner(1989)] (another term for the general direction of our approach is AE-paradigm or Skolem paradigm). Finally, we call our approach complete because it is guaranteed to work for all specification expressions from a well-specified class.

We demonstrate our approach by describing synthesis algorithms for the domains of linear arithmetic and collections of objects. We have implemented these synthesis algorithms and deployed them as a compiler extension of the Scala programming language [Odersky et al.(2008)Odersky, Spoon, and Venners]. We have found that using such constraints we were able to express a number of program fragments in a more natural way, stating the invariants that the program should satisfy as opposed to the computation details of establishing these invariants.

In the area of integer arithmetic, we obtain a language extension that can implicitly define

integer variables to satisfy given constraints. The applications of integer arithmetic synthesis include conversions of quantities expressed in terms of multiple units of measure, coordinate transformations, as well as a substantially more general notion of pattern matching on integers, going well beyond matching on constants or (n+k)-patterns of the Haskell programming language [Jones and group of authors(2010)].

In the area of data structures, we describe a synthesis procedure that can compute sets of elements subject to constraints expressed in terms of basic set operations (union, intersection, set difference, subset, equality) as well as linear constraints on sizes of sets. We have found these constraints to be useful for manipulating sets of objects in high-level descriptions of algorithms, ranging from simple operations such as choosing an element from a set or a fresh element, to splitting sets subject to size constraints. Such constructs arise in pseudo code notations, and they provide a useful addition to the transformations previously developed for the SETL programming language [Dewar(1979), Sharir(1982)]. Regarding data structures, in our synthesis procedure we focus on sets, but the approach applies to other constraints for which decision procedures are available [Kuncak et al.(2010d)Kuncak, Piskac, Suter, and Wies], including multisets (Chapters 2 and 4) and algebraic data types [Suter et al.(2010)Suter, Dotta, and Kuncak].

We implemented our approach to program synthesis as a plugin for the Scala compiler. Our implementation can be used as a starting point for the development of further synthesis approaches.

7.2 Example

We first illustrate the use of a synthesis procedure for integer linear arithmetic. Consider the following example to break down a given number of seconds (stored in the variable totsec) into hours, minutes, and leftover seconds.

val (hrs, mns, scs) = choose((h: Int, m: Int, s: Int) ⇒
  h * 3600 + m * 60 + s == totsec &&
  0 ≤ m && m ≤ 60 &&
  0 ≤ s && s ≤ 60)

Our synthesizer succeeds, because the constraint is in integer linear arithmetic. However, the synthesizer emits the following warning:

Synthesis predicate has multiple solutions for variable assignment: totsec = 0
  Solution 1: h = 0, m = 0, s = 0
  Solution 2: h = -1, m = 59, s = 60

The reason for this warning is that the bounds on m and s are not strict. After correcting the

error in the specification, replacing m ≤ 60 with m < 60 and s ≤ 60 with s < 60, the synthesizer emits no warnings and generates code corresponding to the following:

val (hrs, mns, scs) = {
  val loc1 = totsec div 3600
  val num2 = totsec + ((-3600) * loc1)
  val loc2 = min(num2 div 60, 59)
  val loc3 = totsec + ((-3600) * loc1) + ((-60) * loc2)
  (loc1, loc2, loc3)
}

The absence of warnings guarantees that the solution always exists and that it is unique. By writing the code in this style, the developer directly ensures that the condition h * 3600 + m * 60 + s == totsec will be satisfied, which makes program understanding easier. Note that, if the developer imposes the constraint

val (hrs, mns, scs) = choose((h: Int, m: Int, s: Int) ⇒
  h * 3600 + m * 60 + s == totsec &&
  0 ≤ h && h < 24 &&
  0 ≤ m && m < 60 &&
  0 ≤ s && s < 60)

our system emits the following warning:

Synthesis predicate is not satisfiable for variable assignment: totsec = 86400

pointing to the fact that the constraint has no solutions when the totsec parameter is too large.

Our approach and implementation also work for parametrized integer arithmetic formulas, which become linear only once the parameters are known. For example, our synthesizer accepts the following specification that decomposes an offset of a linear representation of a three-dimensional array with statically unknown dimensions into indices for each coordinate:

val (x1, y1, z1) = choose((x: Int, y: Int, z: Int) ⇒
  offset == x + dimX * y + dimX * dimY * z &&
  0 ≤ x && x < dimX &&
  0 ≤ y && y < dimY &&
  0 ≤ z && z < dimZ)

Here dimX, dimY, dimZ are variables whose value is unknown until runtime. Note that the satisfiability of constraints that contain multiplications of variables is in general undecidable.


In such a parametrized case, our synthesizer is complete in the sense that it generates code that 1) always terminates, 2) detects at run-time whether a solution exists for the current parameter values, and 3) computes one solution whenever a solution exists.

In addition to integer arithmetic, other theories are amenable to synthesis and provide similar benefits. Consider the problem of splitting a set collection in a balanced way. The following code attempts to do that:

val (a1, a2) = choose((a1: Set[O], a2: Set[O]) ⇒
  a1 union a2 == s &&
  a1 intersect a2 == empty &&
  a1.size == a2.size)

It turns out that for the above code our synthesizer emits a warning indicating that there are cases where the constraint has no solutions. Indeed, there are no solutions when the set s is of odd size. If we weaken the specification to

val (a1, a2) = choose((a1: Set[O], a2: Set[O]) ⇒
  a1 union a2 == s &&
  a1 intersect a2 == empty &&
  a1.size - a2.size ≤ 1 &&
  a2.size - a1.size ≤ 1)

then our synthesizer can prove that the code has a solution for all possible input sets s. The synthesizer emits code that, for each input, computes one such solution. The nature of constraints on sets is that if there is one solution, then there are many solutions. Our synthesizer resolves these choices at compile time, which means that the generated code is deterministic.

7.3 From Decision to Synthesis Procedures

We next define precisely the notion of a synthesis procedure and describe a methodology for deriving synthesis procedures from decision procedures.

Preliminaries. Each of our algorithms works with a set of formulas, Formulas, built from terms, whose set we denote with Terms. Formulas denote truth values, whereas terms and variables denote values from the domain (e.g. integers). We denote the set of variables by Vars. FV(q) denotes the set of free variables in a formula or a term q. If ~x = (x1,...,xn) then ~x_s denotes the set of variables {x1,...,xn}. If q is a term or formula, ~x = (x1,...,xn) is a vector of variables and ~t = (t1,...,tn) is a vector of terms, then q[~x := ~t] denotes the result of substituting in q the free variables x1,...,xn with the terms t1,...,tn, respectively. Given a substitution σ : FV(F) → Terms, we write Fσ for the result of substituting each x ∈ FV(F) with σ(x). Formulas are interpreted over elements of a first-order structure 𝒟 with a countable domain D. We assume that for each e ∈ D there exists a ground term c_e whose interpretation in 𝒟 is e; let

C = {c_e | e ∈ D}. We further assume that if F ∈ Formulas then also F[x := c_e] ∈ Formulas (the class of formulas is closed under partial grounding with constants).

The choose programming language construct. We integrate into a programming language a construct of the form

~r = choose(~x ⇒ F)    (7.1)

Here F is a formula (typically represented as a boolean-valued programming language expression) and ~x ⇒ F denotes an anonymous function from ~x to the value of F (that is, λ~x.F). Two kinds of variables can appear within F: output variables ~x and parameters ~a. The parameters ~a are program variables that are in scope at the point where choose occurs; their values will be known when the statement is executed. Output variables ~x denote values that need to be computed so that F becomes true, and they will be assigned to ~r as a result of the invocation of choose.

We can translate the above choose construct into the following sequence of commands in a guarded command language [Dijkstra(1976)]:

assert(∃~x. F);
havoc(~r);
assume(F[~x := ~r]);

The simplicity of the above translation indicates that it is natural to represent choose within existing verification systems (e.g. [Flanagan et al.(2002)Flanagan, Leino, Lilibridge, Nelson, Saxe, and Stata, Zee et al.(2008)Zee, Kuncak, and Rinard]). The use of choose can help verification because the desired property F is explicitly assumed and can aid in proving the subsequent program assertions.

Model-generating decision procedures. As a starting point for our synthesis algorithms for choose invocations we consider a model-generating decision procedure. Given F ∈ Formulas, we expect this decision procedure to produce either

a) a substitution σ : FV(F) → C such that Fσ is true, or

b) a special value unsat indicating that the formula is unsatisfiable.

We assume that the decision procedure is deterministic and behaves as a function. We write Z(F) = σ or Z(F) = unsat to denote the result of applying the decision procedure to F.

Baseline: invoking a decision procedure at run-time. Just like an interpreter can be considered as a baseline implementation for a compiler, deploying a decision procedure at run-time

can be considered as a baseline for our approach. In this scenario, we replace the statement (7.1) with the code

F = makeFormulaTree(makeVars(~x), makeGroundTerms(~a));
~r = (Z(F) match {
  case σ ⇒ (σ(x1),...,σ(xn))
  case unsat ⇒ throw new Exception("No solution exists")
})

Such a dynamic invocation approach is flexible and useful. However, there are important performance and predictability advantages to an alternative compilation approach.

Synthesis based on decision procedures. Our goal is therefore to explore a compilation approach where a modified decision procedure is invoked at compile time, converting the formula into a solved form.

Definition 7.1 (Synthesis Procedure) We denote an invocation of a synthesis procedure by ⟦~x, F⟧ = (pre, Ψ~). A synthesis procedure takes as input a formula F and a vector of variables ~x and outputs a pair of

1. a precondition formula pre with FV(pre) ⊆ FV(F) \ ~x_s

2. a tuple of terms Ψ~ with FV(Ψ~) ⊆ FV(F) \ ~x_s

such that the following two implications are valid:

(∃~x. F) → pre
pre → F[~x := Ψ~]

Observation 7.2 Because another implication always holds:

F[~x := Ψ~] → ∃~x. F

the above definition implies that the three formulas (∃~x. F), pre, and F[~x := Ψ~] are all equivalent. Consequently, if we can define a function witn where for witn(~x, F) = Ψ~ we have FV(Ψ~) ⊆ FV(F) \ ~x_s and ∃~x. F implies F[~x := witn(~x, F)], then we can define a synthesis procedure by

⟦~x, F⟧ = (F[~x := witn(~x, F)], witn(~x, F))

The reason we use the translation that computes pre in addition to witn(~x, F) is that the synthesizer performs simplifications when generating pre, which can produce a formula that is faster to evaluate than F[~x := witn(~x, F)].

The synthesizer emits the terms Ψ~ in compiler intermediate representation; the standard compiler then processes them along with the rest of the code. We identify the syntax tree of Ψ~ with its meaning as a function from the parameters ~a to the output variables ~x. The overall compile-time processing of the choose statement (7.1) involves the following:

1. emit a non-feasibility warning if the formula ¬pre is satisfiable, reporting the counterexample for which the synthesis problem has no solutions;

2. emit a non-uniqueness warning if the formula

F ∧ F[~x := ~y] ∧ ~x ≠ ~y

is satisfiable, reporting the values of all free variables as a counterexample showing that there are at least two solutions;

3. as the compiled code, emit the code that behaves as

assert(pre);
~r = Ψ~

The existence of a model-generating decision procedure implies the existence of a 'trivial' synthesis procedure, which satisfies Definition 7.1 but simply invokes the decision procedure at run-time. (In the realm of conventional programming languages, this would be analogous to 'compiling' the code by shipping its source code bundled with an interpreter.) The usefulness of the notion of a synthesis procedure therefore comes from the fact that we can often create compiled code that avoids this trivial solution. Among the potential advantages of the compilation approach are:

• improved run-time efficiency, because part of the reasoning is done at compile-time;

• improved error reporting: the existence and uniqueness of solutions can be checked at compile time;

• simpler deployment: the emitted code can be compiled to any of the targets of the compiler, and requires no additional run-time support.

We therefore decided to pursue the compilation approach. As for the processing of more traditional programming language constructs, we do believe that there is space in the future for mixed approaches, such as 'just-in-time synthesis' and 'profiling-guided synthesis'.

Efficiency of synthesis. We introduce the following measures to quantify the behavior of synthesis procedures as a function of the specification expression F :

• time to synthesize the code, as a function of F ;


• size of the synthesized code, as a function of F ;

• running time of the synthesized code as a function of F and a measure of the run-time values for the parameters ~a.

When using F as the argument of the above measures, we often consider not only the size of F as a syntactic object, but also the dimension of the variable vector ~x and the parameter vector ~a of F .

From quantifier elimination to synthesis. The precondition pre can be viewed as a result of applying quantifier elimination (see e.g. [Hodges(1993), Page 67], [Nipkow(2008)]) to remove ~x from F , with the following differences.

1. Synthesis procedures strengthen quantifier elimination procedures by not only computing pre but also emitting the code Ψ~ that efficiently computes a witness for ~x.

2. Quantifier elimination is typically applied to arbitrary quantified formulas of first-order logic and aims to successively eliminate all variables. To enable the recursive application of variable elimination, pre must be in the same language of formulas as F. This condition is not required in the final step of a synthesis procedure, because no further elimination is applied to the final precondition. Therefore, if the final precondition becomes a run-time check, it can contain arbitrary executable code. If the final precondition becomes a compile-time satisfiability check for the totality of the relation, then it suffices for it to be in any decidable logic.

3. Worst-case bounds on quantifier elimination algorithms measure the size of the gen- erated formula and the time needed to generate it, but not the size of Ψ~ or the time to evaluate Ψ~ . For some domains, it can be computationally more difficult to compute (or even ’print’) the solution than to simply check the existence of a solution.

Despite the differences, we have found that we can naturally extend existing quantifier elimi- nation procedures with explicit computation of witnesses that constitute the program Ψ~ .

7.4 Selected Generic Techniques

We next describe some basic observations and techniques for synthesis that are independent of a particular theory.

7.4.1 Synthesis for Multiple Variables

Suppose that we have a function witn(x, F) that corresponds to a constructive quantifier elimination step for one variable and produces a term Ψ such that F[x := Ψ] holds iff ∃x. F holds. We

⟦_,_⟧ : ⋃_n (Vars^n × Formulas) → (Formulas × Terms^n)

⟦(), F⟧ = (F, ())
⟦(x1,...,xn), F⟧ = (pre, (Ψ1,...,Ψn−1, Ψ′n)),
  where Ψn = witn(xn, F)
        F′ = simplify(F[xn := Ψn])
        (pre, (Ψ1,...,Ψn−1)) = ⟦(x1,...,xn−1), F′⟧
        Ψ′n = Ψn[x1 := Ψ1, ..., xn−1 := Ψn−1]

Figure 7.1: Successive Elimination of Variables for Synthesis

can then lift witn(x,F ) to synthesis for any number of variables, using the (non-tail recursive) translation scheme in Figure 7.1. This translation includes the base case in which there are no variables to eliminate, so F becomes the precondition, and the recursive case that applies the witn function.

In implementation, we can use local variable definitions instead of substitutions. Given (7.1), we generate as Ψ~ a Scala code block

val x1 = Ψ1
...
val xn−1 = Ψn−1
val xn = Ψn
~x

where the variables in Ψn directly refer to the variables computed in Ψ1,...,Ψn−1 and where FV(Ψi) ⊆ FV(F) \ {xi,...,xn}. A consequence of this recursive translation pattern is that the synthesized code computes values in the reverse order compared to the steps of a quantifier elimination procedure. This observation can be helpful in understanding the output of our synthesis procedures.
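To make the reverse order concrete, consider a small hypothetical instance (ours, for illustration only): for ⟦(x, y), y = a + 1 ∧ x = 2·y⟧, eliminating y first (for example with the one-point rule of the next subsection) gives Ψ_y = a + 1 and the simplified formula x = 2·(a + 1), from which the recursive call yields Ψ_x = 2·(a + 1) and pre = true. Although y was eliminated before x, the emitted block computes x first:

val x = 2 * (a + 1)
val y = a + 1
(x, y)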

7.4.2 One-Point Rule Synthesis

If x ∉ FV(t) we can define

witn(x, x = t ∧ F) = t

If the formula does not have the form x = t ∧ F, we can often rewrite it into this form using theory-specific transformations.
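For instance (a hypothetical constraint used only for illustration), x + a = 2·b ∧ 0 ≤ x does not literally have the required form, but rewriting its first conjunct as x = 2·b − a lets us take witn(x, ·) = 2·b − a, leaving 0 ≤ 2·b − a as the remaining formula.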


7.4.3 Output-Independent Preconditions

Whenever FV(F1) ∩ ~x_s = ∅, we can apply the following synthesis rule:

⟦~x, F1 ∧ F2⟧ = (pre ∧ F1, Ψ~), where (pre, Ψ~) = ⟦~x, F2⟧

which moves a 'constant' conjunct of the specification into the precondition. We assume that this rule is applied whenever possible and do not explicitly mention it in the rest of the chapter.

7.4.4 Propositional Connectives in First-Order Theories

Consider a quantifier-free formula in some first-order theory. Consider the tasks of checking formula satisfiability or applying the elimination of a variable. For both tasks, we can first rewrite the formula into disjunctive normal form and then process each disjunct independently. This allows us to focus on handling conjunctions of literals as opposed to arbitrary propositional combinations.

We next show that we can similarly use disjunctive normal form in synthesis. Consider a formula D1 ∨ ... ∨ Dn in disjunctive normal form. We can apply synthesis to each Di, yielding a precondition pre_i and the solved form Ψ~i. We can then synthesize code with conditionals that select the first Ψ~i that applies:

⟦~x, D1 ∨ ... ∨ Dn⟧ = ( ⋁_{i=1}^n pre_i ,
    if (pre1) Ψ~1
    else if (pre2) Ψ~2
    ...
    else if (pren) Ψ~n
    else throw new Exception("No solution") )

where (pre1, Ψ~1) = ⟦~x, D1⟧, ..., (pren, Ψ~n) = ⟦~x, Dn⟧

Although the disjunctive normal form can be exponentially larger than the original formula, the transformation to disjunctive normal form is used in practice [Pugh(1992)] and has advantages in terms of the quality of synthesized code generated for individual disjuncts. What further justifies this approach is that we expect a small number of disjuncts in our specifications, and may need different synthesized values for variables in different disjuncts.

Other methods can have better worst-case quantifier elimination complexity [Cooper(1972),


Ferrante and Rackoff(1979), Weispfenning(1997), Nipkow(2008)] than disjunctive normal form approaches. We discuss these alternative approaches in the sequel as well, but it is the above disjunctive normal form approach that we currently use in our implementation.

7.4.5 Synthesis for Propositional Logic

We are focused on synthesis for formulas over unbounded domains. Nonetheless, to illustrate the potential asymptotic gain of precomputation in synthesis, we briefly consider synthesis for the case when F is a propositional formula (see e.g. [Kukula and Shiple(2000)] for a more sophisticated approach to this problem). Suppose that ~x are the output variables and ~a are the remaining propositional variables (parameters) in F.

To synthesize a function from ~a to ~x, build an ordered binary decision diagram (OBDD) [Bryant(1986)] for F , treating both ~a and ~x as variables for OBDD construction, and using a variable ordering that puts all parameters ~a before all output variables ~x. Then split the OBDD graph at the point where all the decisions on ~a have been made. That is, consider the set of nodes that terminate on some paths on which all decisions on ~a have been made and no decisions on ~x have been made. For each of these OBDD nodes, we precompute whether this node reaches the true sink node. As the result of synthesis, we emit the code that consists of nested if-then-else tests encoding the decisions on ~a, followed by the code that assigns values to ~x that will lead to the true sink node.
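As a small hypothetical illustration (not an output of our implementation), take parameters a, b and a single output variable x with F = (x → a) ∧ (¬x → b). Code with the shape produced by this construction first branches on the parameters and then directly assigns x:

def chooseX(a: Boolean, b: Boolean): Boolean =
  if (a) true        // the branch a = true reaches the true sink with x = true
  else if (b) false  // a = false forces x = false, which in turn requires b
  else throw new Exception("No solution exists")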

Consider the code generated using the method above. Note that, although the size of the code is bounded by a single exponential, the code executes in time close to linear in the total number of variables ~a and ~x. This is in contrast to NP-hardness of finding a satisfying assignment for a propositional formula F , which would occur in the baseline approach of invoking a SAT solver at run-time. In summary, for propositional logic synthesis (and, more generally, for NP-hard constraints over bounded domains) we can precompute solutions and generate code that computes the desired values in deterministic polynomial time in the size of inputs and outputs.

In the next several sections, we describe synthesis procedures for several useful decidable logics over infinite domains (numbers and data structures) and discuss the efficiency improve- ments due to synthesis.

7.5 Synthesis for Linear Rational Arithmetic

We next consider synthesis for quantifier-free formulas of linear arithmetic over rationals. In this theory, variables range over rational numbers, terms are linear expressions c0 + c1·x1 + ... + cn·xn, and the relations in the language are < and =. Synthesis for this theory can be used to synthesize exact fractional arithmetic computations (or floating-point computations if we are willing to ignore the rounding errors). It also serves as an introduction to the more complex

problem of integer arithmetic synthesis that we describe in the following sections.

Given a quantifier-free formula, we can efficiently transform it to negation-normal form. Furthermore, we observe that ¬(t1 < t2) is equivalent to (t2 < t1) ∨ (t1 = t2) and that ¬(t1 = t2) is equivalent to (t1 < t2) ∨ (t2 < t1). Therefore, there is no need to consider negations in the formula. We can also normalize the equalities to the form t = 0 and the inequalities to the form 0 < t.

7.5.1 Solving Conjunctions of Literals

Given the observations in Section 7.4.4, we consider conjunctions of literals. The method follows Fourier-Motzkin elimination [Schrijver(1998)]. Consider the elimination of a variable x.

Equalities. If x occurs in an equality constraint t = 0, then solve the constraint for x and rewrite it as x = t′, where t′ does not contain x. Then simply apply the one-point rule synthesis (Section 7.4.2). This step amounts to Gaussian elimination. We follow this step whenever possible, so we first eliminate those variables that occur in some equalities and only then proceed to the inequalities.

Inequalities. Next, suppose that x occurs only in strict inequalities 0 < t. Depending on the sign of x in t, we can rewrite these inequalities into a_p < x or x < b_q for some terms a_p, b_q. Consider the more general case when there is both at least one lower bound a_p and at least one upper bound b_q. We can then define:

witn(x, F) = (max_p {a_p} + min_q {b_q}) / 2

As one would expect from quantifier elimination, the pre corresponding to this case results from F by replacing the conjunction of all inequalities containing x with the conjunction

⋀_{p,q} a_p < b_q

In case there are no lower bounds a_p, we define witn(x, F) = min_q {b_q} − 1; if there are no upper bounds b_q, we define witn(x, F) = max_p {a_p} + 1.
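The following sketch shows the corresponding run-time computation once the lower-bound terms a_p and upper-bound terms b_q have been evaluated; it is an illustration that uses Double in place of exact rational arithmetic, and the helper name is ours rather than generated by the compiler:

def rationalWitness(lowers: Seq[Double], uppers: Seq[Double]): Double =
  if (lowers.nonEmpty && uppers.nonEmpty)
    (lowers.max + uppers.min) / 2            // midpoint lies strictly between the bounds whenever pre holds
  else if (uppers.nonEmpty) uppers.min - 1   // no lower bounds: anything below all upper bounds works
  else if (lowers.nonEmpty) lowers.max + 1   // no upper bounds
  else 0.0                                   // x is unconstrained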

Complexity of synthesis for conjunctions. We next examine the size of the generated code for linear rational arithmetic. The elimination of output variables using equalities is a polynomial-time transformation. Suppose that after this elimination we are left with N inequalities and V remaining output variables. The above inequality elimination step for one variable replaces N inequalities with (N/2)² inequalities in the worst case. After eliminating all output variables, an upper bound on the formula size is (N/2)^(2^V). Therefore, the generated formula can

be in the worst case doubly exponential in the number of output variables V. However, for a fixed V, the generated code size is a (possibly high-degree) polynomial in the size of the input formula. Also, if there are 4 or fewer inequalities in the original formula, the final size is polynomial, regardless of V. Finally, note that the synthesis time and the execution time of the synthesized code are polynomial in the size of the generated formula.

7.5.2 Disjunctions for Linear Rational Arithmetic

We next consider linear arithmetic constraints with disjunctions, which are constraints for which the satisfiability is NP-complete. One way to lift synthesis for rational arithmetic from conjunctions of literals to arbitrary propositional combinations is to apply the disjunctive normal form method of Section 7.4.4. We then obtain a complexity that is one exponential higher in formula size than the complexity of synthesis for conjunctions.

In the rest of this section, we consider an alternative to disjunctive normal form. This alter- native synthesizes code that can execute exponentially faster (even though it is not smaller) compared to the disjunctive normal form approach of Section 7.4.4.

The starting point of this method is the family of quantifier elimination techniques that avoid the disjunctive normal form transformation, e.g. [Ferrante and Rackoff(1979)], [Nipkow(2008)], [Bradley and Manna(2007), Section 7.3]. To remove a variable from a formula in negation normal form, this method finds relevant lower bounds a_p and upper bounds b_q in the formula, then computes the values m_pq = (a_p + b_q)/2 and replaces the variable x_i with the values from the set {m_pq}_{p,q} extended with "sufficiently small" and "sufficiently large" values [Nipkow(2008)]. This quantifier elimination method gives us a way to compute pre.

We next present how to extend this quantifier elimination method to synthesis, namely to the computation of witn(x, F). Consider a substitution in a quantifier elimination step that replaces the variable x_i with the term m. We then extend this step to also attach to each literal a special substitution syntactic form (x_i ↦ m). When using this process to eliminate one variable, the size of the formula can increase quadratically. After eliminating all output variables, we obtain a formula pre with additional annotations; the size of this formula is bounded by n^(2^O(V)), where n is the original formula size. (Again, although it is doubly exponential in V, it is not exponential in n.)

We can therefore build a decision tree that evaluates the values of all n^(2^O(V)) literals in pre. On each complete path of this tree, we can, at synthesis time, determine whether the truth values of the literals imply that pre is true. Indeed, such a computation reduces to evaluating the truth value of a propositional formula in a given assignment to all variables. In the cases when the literals imply that pre holds, we use the attached substitutions (x_i ↦ m) in the true literals to recover the synthesized values of the variables x_i. Such a decision tree has depth n^(2^O(V)), because it tests the values of all literals in the result of quantifier elimination. For a constant number of variables V, this tree represents a synthesized program whose running time is polynomial in

n. Thus, we have shown that using basic methods of quantifier elimination (without relying on detailed geometric facts about the theory of linear rational arithmetic) we can synthesize for each specification formula a polynomial-time function that maps the parameters to the desired values of output variables.

7.6 Synthesis for Linear Integer Arithmetic

We next describe our main algorithm, which performs synthesis for quantifier-free formulas of Presburger arithmetic (integer linear arithmetic). In this theory variables range over integers. Terms are linear expressions of the form c0 + c1·x1 + ... + cn·xn, where n ≥ 0, ci is an integer constant and xi is an integer variable. Atoms are built using the relations ≥, = and |. The atom c | t is interpreted as true iff the integer constant c divides the term t. We use a < b as a shorthand for a ≤ b ∧ ¬(a = b). We describe a synthesis algorithm that works for conjunctions of literals.

Pre-processing. We first apply the following pre-processing steps to eliminate negations and divisibility constraints. We remove negations by transforming a formula into its negation-normal form and translating negative literals into equivalent positive ones: ¬(t1 ≥ t2) is equivalent to t2 ≥ t1 + 1 and ¬(t1 = t2) is equivalent to (t1 ≥ t2 + 1) ∨ (t2 ≥ t1 + 1). We also normalize equalities into the form t = 0 and inequalities into the form t ≥ 0.

We transform divisibility constraints of the form c | t into equalities by adding a fresh variable q. The value obtained for the fresh variable q is ignored in the final synthesized program:

⟦~x, (c | t) ∧ F⟧ = (pre, Ψ~), where (pre, (Ψ~, Ψ_{n+1})) = ⟦(~x, q), t = c·q ∧ F⟧

The negation of divisibility ¬(c | t) can be handled in a similar way by introducing two fresh variables q and r:

⟦~x, ¬(c | t) ∧ F⟧ = (pre, Ψ~), where
  F′ ≡ t + r = c·q ∧ 1 ≤ r ≤ c − 1 ∧ F
  (pre, (Ψ~, Ψ_{n+1}, Ψ_{n+2})) = ⟦(~x, q, r), F′⟧
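For example, a conjunct 4 | (a + x) in a specification with output variable x is replaced by the equality a + x = 4·q for a fresh output variable q, and the component of the synthesized tuple corresponding to q is simply dropped; similarly, ¬4 | (a + x) becomes a + x + r = 4·q ∧ 1 ≤ r ≤ 3 with two fresh variables q and r.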

In the rest of this section we assume the input formula F to have no negation or divisibility constraints (these constructs can, however, appear in the generated code and precondition).

7.6.1 Solving Equality Constraints for Synthesis

Because equality constraints are suitable for deterministic elimination of output variables, our procedure groups all equalities from a conjunction and solves them first, one by one. Let E be one such equation, so the entire formula is of the form E ∧ F. Let ~y be the output variables that appear in E.

⟦_,_⟧ : ⋃_n (Vars^n × Formulas) → (Formulas × Terms^n)

⟦(~y, ~x), E ∧ F⟧ = (pre_Y ∧ pre, (Ψ~_Y′, Ψ~_X)),
  where (pre_Y, Ψ~_Y, ~λ) = eqSyn(~y, E)
        F′ = simplify(F[~y := Ψ~_Y])
        (pre, (Ψ~_λ, Ψ~_X)) = ⟦(~λ, ~x), F′⟧
        Ψ~_Y′ = Ψ~_Y[~λ := Ψ~_λ]

eqSyn : ⋃_n (Vars^n × Formulas) → (Formulas × Terms^n × Vars^{n−1})

eqSyn(y1, Σ_{i=1}^m βi·bi + γ1·y1 = 0) = ( γ1 | (Σ_{i=1}^m βi·bi), −(Σ_{i=1}^m βi·bi)/γ1, () )

eqSyn(y1,...,yn, Σ_{i=1}^m βi·bi + Σ_{j=1}^n γj·yj = 0) =
    eqSyn(y1,...,yn, t/d + Σ_{j=1}^n (γj/d)·yj = 0),
  where t = Σ_{i=1}^m βi·bi
        d = gcd(β1,...,βm,γ1,...,γn)
        d > 1

eqSyn(y1,...,yn, Σ_{i=1}^m βi·bi + Σ_{j=1}^n γj·yj = 0) = (pre, Ψ~, ~λ),
  where (~s1,...,~s_{n−1}) = linearSet(γ1,...,γn)
        (w1,...,wn) = particularSol(Σ_{i=1}^m βi·bi, γ1,...,γn)
        pre ≡ gcd(γ1,...,γn) | (Σ_{i=1}^m βi·bi)
        λ1,...,λ_{n−1} − fresh variable names
        Ψ~ = (w1,...,wn) + λ1·~s1 + ... + λ_{n−1}·~s_{n−1}

Figure 7.2: Algorithm for Synthesis Based on Integer Equations

Given an output variable y1 and E of the form c1·y1 + t = 0 for c1 ≠ 0, a simple way to solve it would be to impose the precondition c1 | t, use the witness y1 = −t/c1 in the synthesized code, and substitute −t/c1 for y1 in the remaining formula. However, to keep the equations within linear integer arithmetic, this would require multiplying the remaining equations and disequations in F by c1, potentially increasing the sizes of coefficients substantially.

We instead perform synthesis based on one of the improved algorithms for solving integer equations. This algorithm avoids the multiplication of the remaining constraints by simultaneously replacing all n output variables ~y in E with n − 1 fresh output variables ~λ. Using this algorithm we obtain the synthesis procedure in Figure 7.2. An invocation of eqSyn(~y, F) is similar to ⟦~y, F⟧ but returns a triple (pre, Ψ~, ~λ), which in addition to the precondition pre and the witness term tuple Ψ~ also has the fresh variables ~λ.

The eqSyn Synthesis Algorithm

Consider the application of eqSyn in Figure 7.2 to the equation Σ_{i=1}^m βi·bi + Σ_{j=1}^n γj·yj = 0. If there is only one output variable, y1, we directly eliminate it from the equation. Assume therefore n > 1. Let d = gcd(β1,...,βm,γ1,...,γn). If d > 1 we can divide all coefficients by d, so assume d = 1.

Our goal is to derive an alternative definition of the set K = {~y | Σ_{i=1}^m βi·bi + Σ_{j=1}^n γj·yj = 0} which will allow a simple and effective computation of elements of K. Note that the set K describes the set of all solutions of a Presburger arithmetic formula.

Recall that a semilinear set (cf. Chapter 2) describes the set of all non-negative solutions of a Presburger arithmetic formula. However, besides the fact that we search for all solutions, both non-negative and negative, we cannot apply this result because the values of the parameter variables are not known until run-time. Instead, we proceed in the following steps, as described in Figure 7.2:

1. obtain a linear set representation of the set

   S_H = {~y | Σ_{j=1}^n γj·yj = 0}

   of solutions of the homogeneous part, using the function linearSet (defined in Theorem 7.3) to compute ~s1,...,~s_{n−1} such that

   S_H = {~y | ∃λ1,...,λ_{n−1} ∈ Z. ~y = Σ_{i=1}^{n−1} λi·~si}

2. find one particular solution, that is, use the function particularSol (defined in Figure 7.3) to find a vector of terms w~ (containing the parameters bi) such that t + Σ_{j=1}^n γj·wj = 0 for all values of the parameters bi

3. return as the solution w~ + Σ_{i=1}^{n−1} λi·~si

To see that the algorithm is correct, fix the values of the parameters and let ~γ = (γ1,...,γn). From linearity we have t + ~γ·(w~ + Σ_j λj·~sj) = t − t + 0 = 0, which means that each w~ + Σ_j λj·~sj is a solution. Conversely, if ~y is a solution of the equation then ~γ·(~y − w~) = 0, so ~y − w~ ∈ S_H, which means ~y − w~ = Σ_{i=1}^{n−1} λi·~si for some λi. Therefore, the set of all solutions of t + Σ_{j=1}^n γj·yj = 0 is the set {w~ + Σ_{i=1}^{n−1} λi·~si | λi ∈ Z}. It remains to define linearSet to find the ~si and particularSol to find w~.


Computing a Linear Set for a Homogeneous Equation

This section describes our version of the algorithm linearSet(γ1,...,γn) that computes the set of solutions of an equation Σ_{i=1}^n γi·yi = 0. A related algorithm is a component of the Omega test [Pugh(1992)].

Theorem 7.3 Let γ1,...,γn ∈ Z be integer coefficients. The set of all solutions of Σ_{i=1}^n γi·yi = 0 is described with:

linearSet(γ1,...,γn) = (~s1,...,~s_{n−1})

where ~sj = (K_{1j},...,K_{nj}) and the integers K_{ij} are computed as follows:

• if i < j, then K_{ij} = 0 (the matrix K is lower triangular)

• K_{jj} = gcd((γk)_{k≥j+1}) / gcd((γk)_{k≥j})

• for each index j, 1 ≤ j ≤ n−1, we compute the remaining K_{ij} as follows. Consider the equation

  γj·K_{jj} + Σ_{i=j+1}^n γi·u_{ij} = 0

  and find any solution. That is, compute

  (K_{(j+1)j},...,K_{nj}) = particularSol(γj·K_{jj}, γ_{j+1},...,γn)

  where particularSol is given in Figure 7.3.

Proof. Let S_H = {~y | Σ_{i=1}^n γi·yi = 0} and let

S_L = {λ1·~s1 + ... + λ_{n−1}·~s_{n−1} | λ1,...,λ_{n−1} ∈ Z}
    = {λ1·(K_{11},...,K_{n1}) + ... + λ_{n−1}·(K_{1(n−1)},...,K_{n(n−1)}) | λi ∈ Z}

We claim S_H = S_L.

First we show that each vector ~sj belongs to S_H. Indeed, by definition of K_{ij} we have γj·K_{jj} + Σ_{i=j+1}^n γi·K_{ij} = 0. This means precisely that ~sj ∈ S_H, by definition of ~sj and S_H. Next, observe that S_H is closed under linear combinations. Because S_L is the set of linear combinations of the vectors ~sj, we have S_L ⊆ S_H.

To prove that the converse also holds, let ~y ∈ S_H. We will show that the triangular system of equations Σ_{i=1}^{n−1} λi·~si = ~y has some solution λ1,...,λ_{n−1}. We start by showing that we can find λ1. Let G1 = gcd((γk)_{k≥1}). From ~y ∈ S_H we have Σ_{i=1}^n γi·yi = 0, that is, G1·(Σ_{i=1}^n βi·yi) = 0

for βi = γi/G1. This implies β1·y1 + Σ_{i=2}^n βi·yi = 0 and gcd((βk)_{k≥1}) = 1. Let G2 = gcd((βk)_{k≥2}). From β1·y1 + Σ_{i=2}^n βi·yi = 0 we then obtain β1·y1 + G2·(Σ_{i=2}^n β′i·yi) = 0 for β′i = βi/G2. Therefore y1 = −G2·(Σ_{i=2}^n β′i·yi)/β1. Because gcd(β1, G2) = 1 we have β1 | Σ_{i=2}^n β′i·yi, so we can define the integer λ1 = −Σ_{i=2}^n β′i·yi/β1 and we have y1 = λ1·G2. Moreover, note that

G2 = gcd((βk)_{k≥2}) = gcd((γk)_{k≥2})/G1 = K_{11}

Therefore, y1 = λ1·K_{11}, which ensures that the first equation is satisfied.

Consider now a new vector ~z = ~y − λ1·~s1. Because ~y ∈ S_H and ~s1 ∈ S_H, also ~z ∈ S_H. Moreover, note that the first component of ~z is 0. We repeat the described procedure on ~z and ~s2. This way we derive the value of an integer λ2 and a new vector that has 0 as the first two components.

We continue with the described procedure until we obtain a vector ~u ∈ S_H that has all components set to 0 except for the last two. From ~u ∈ S_H we have γ_{n−1}·u_{n−1} + γn·un = 0. Letting β_{n−1} = γ_{n−1}/gcd(γ_{n−1},γn) and βn = γn/gcd(γ_{n−1},γn), we conclude that β_{n−1}·u_{n−1} + βn·un = 0, so u_{n−1}/βn is an integer and we let λ_{n−1} = u_{n−1}/βn. By the definitions of βi it follows that λ_{n−1} = u_{n−1}·gcd(γ_{n−1},γn)/γn. Next, observe the special form of the vector ~s_{n−1}: it has the form (0,...,0, γn/gcd(γ_{n−1},γn), −γ_{n−1}/gcd(γ_{n−1},γn)). It is then easy to verify that ~u = λ_{n−1}·~s_{n−1}.

This procedure shows that every element of S_H can be represented as a linear combination of the vectors ~sj, which shows S_H ⊆ S_L and concludes the proof.

Finding a Particular Solution of an Equation

We finally describe the particularSol function, which finds a solution (as a vector of terms) for an equation t + Σ_{i=1}^n γi·ui = 0. We use the Extended Euclidean algorithm (for a detailed description see, for example, [Cormen et al.(2001)Cormen, Leiserson, Rivest, and Stein, Figure 31.1]). Given the integers a1 and a2, the Extended Euclidean algorithm finds their greatest common divisor d and two integers w1 and w2 such that a1·w1 + a2·w2 = d. Our algorithm generalizes the Extended Euclidean algorithm to an arbitrary number of variables and uses it to find a solution of an equation with parameters. We chose the algorithm presented here because of its simplicity. Other algorithms for finding a solution of an equation t + Σ_{i=1}^n γi·ui = 0 can be found in [Banerjee(1988), Ford and Havas(1996)]. They also run in polynomial time. [Banerjee(1988)] additionally allows bounded inequality constraints, whereas [Ford and Havas(1996)] guarantees that the returned numbers are no larger than the largest of the input coefficients divided by 2.

The equation t + Σ_{i=1}^n γi·ui = 0 has a solution iff gcd((γk)_{k≥1}) | t, and the result of particularSol is guaranteed to be correct under this condition. Our synthesis procedure ensures that when the results of this algorithm are used, the condition gcd((γk)_{k≥1}) | t is satisfied.

We start with the base case where there are only two variables, t + γ1·u1 + γ2·u2 = 0. By the Extended Euclidean algorithm let v1 and v2 be integers such that γ1·v1 + γ2·v2 = gcd(γ1,γ2).


With d we denote d = gcd(γ1,γ2) and let r = t/d. Then one solution is the pair of terms (−v1·r, −v2·r):

particularSol2(t, γ1, γ2) = (−v1·r, −v2·r),
  where (d, v1, v2) = ExtendedEuclid(γ1, γ2)
        r = t/d

If there are more than two variables, we observe that Σ_{i=2}^n γi·ui is a multiple of gcd((γk)_{k≥2}). We introduce a new variable u′ and find a solution of the equation t + γ1·u1 + gcd((γk)_{k≥2})·u′ = 0 as described above. This way we obtain terms (w1, w′) for (u1, u′). To derive the values of u2,...,un we solve the equation Σ_{i=2}^n γi·ui − gcd((γk)_{k≥2})·w′ = 0. Given that the initial equation was assumed to have a solution, the new equation can also be shown to have a solution. Moreover, it has one variable less, so we can solve it recursively:

particularSol(t, γ1,...,γn) = (w1,...,wn),
  where (w1, w′) = particularSol2(t, γ1, gcd((γk)_{k≥2}))
        (w2,...,wn) = particularSol(−gcd((γk)_{k≥2})·w′, γ2,...,γn)

Figure 7.3: Algorithm for Computing one Solution of the Equation
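The following Scala sketch instantiates Figure 7.3 for the special case where t and the coefficients are concrete integers; in the synthesizer itself t is a term over the parameters, so the same steps are emitted as code rather than executed, and all helper names here are ours:

def gcd(a: Long, b: Long): Long = if (b == 0) math.abs(a) else gcd(b, a % b)

// returns (d, v1, v2) with a*v1 + b*v2 == d
def extendedEuclid(a: Long, b: Long): (Long, Long, Long) =
  if (b == 0) (a, 1L, 0L)
  else { val (d, x, y) = extendedEuclid(b, a % b); (d, y, x - (a / b) * y) }

// one solution (u1, u2) of t + g1*u1 + g2*u2 == 0, assuming gcd(g1, g2) divides t
def particularSol2(t: Long, g1: Long, g2: Long): (Long, Long) = {
  val (d, v1, v2) = extendedEuclid(g1, g2)
  val r = t / d
  (-v1 * r, -v2 * r)
}

// one solution of t + g1*u1 + ... + gn*un == 0, assuming gcd(g1,...,gn) divides t
def particularSol(t: Long, gs: List[Long]): List[Long] = gs match {
  case Nil       => Nil
  case g :: Nil  => List(-t / g)
  case g :: rest =>
    val gRest = rest.reduce(gcd)                  // gcd of the remaining coefficients
    val (w1, wPrime) = particularSol2(t, g, gRest)
    w1 :: particularSol(-gRest * wPrime, rest)
}

For instance, particularSol(12, List(4, 8)) evaluates to List(-3, 0), which corresponds to the last two components of the vector ~s1 = (4, −3, 0) obtained in the example below.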

Example. We demonstrate the process of eliminating equations on an example. Consider the following synthesis problem

⟦(x, y, z), 2a − b + 3x + 4y + 8z = 0 ∧ 5x + 4z ≤ 2y − b⟧

To eliminate an equation from the formula and to reduce the number of output variables, we first invoke eqSyn((x, y, z), 2a − b + 3x + 4y + 8z = 0), which works in two phases. In the first phase, we compute the linear set describing the set of solutions of the homogeneous equality 3x + 4y + 8z = 0. Applying Theorem 7.3, the resulting set S_L is:

S_L = { λ1·(4, −3, 0) + λ2·(0, 2, −1) | λ1, λ2 ∈ Z }

The second phase computes a witness vector w~ and a precondition formula. Applying the procedure described in Section 7.6.1 results in the vector w~ = (2a − b, b − 2a, 0) and the formula 1 | 2a − b. Finally, we compute the output of eqSyn applied to 2a − b + 3x + 4y + 8z = 0: it is a triple consisting of

1. a precondition 1 | 2a − b

2. a list of terms denoting witnesses for (x, y,z):

Ψ1 = 2a − b + 4λ1
Ψ2 = b − 2a − 3λ1 + 2λ2
Ψ3 = −λ2

3. a list of fresh variables (λ1,λ2).

We then replace each occurrence of x, y and z by the corresponding terms in the rest of the formula. This results in a new formula 7a − 3b + 13λ1 ≤ 4λ2, which has the same input variables, but the output variables are now λ1 and λ2. To find a solution for the initial problem, we let

(pre_X, (Φ1, Φ2)) = ⟦(λ1, λ2), 7a − 3b + 13λ1 ≤ 4λ2⟧

Since 1 | 2a − b is a valid formula, we do not add it to the final precondition. Therefore, the final result has the form

(pre_X, (2a − b + 4Φ1, b − 2a − 3Φ1 + 2Φ2, −Φ2))

7.6.2 Solving Inequality Constraints for Synthesis

In the following, we assume that all equalities are already processed and that a formula is a conjunction of inequalities. Dealing with inequalities in the integer case is similar to the case of rational arithmetic: we process variables one by one and proceed further with the resulting formula.

Let x be an output variable that we are processing. Every conjunct can be rewritten in one of the two following forms:

[Lower Bound]  A_i ≤ α_i·x
[Upper Bound]  β_j·x ≤ B_j

As for rational arithmetic, x should be a value which is greater than all lower bounds and smaller than all upper bounds. However, this time we also need to enforce that x is an integer. Let a = max_i ⌈A_i/α_i⌉ and b = min_j ⌊B_j/β_j⌋. If b is defined (i.e. at least one upper bound exists), we use b as the witness for x, otherwise we use a.
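A sketch of this computation over concrete bounds follows; the helper names are ours, and the coefficients α_i and β_j are assumed positive, as in the rewriting above. Whether the chosen witness actually satisfies the constraints is governed separately by formula (7.2) below.

def ceilDiv(a: Int, b: Int): Int  = Math.floorDiv(a + b - 1, b)   // ⌈a/b⌉ for b > 0
def floorDiv(a: Int, b: Int): Int = Math.floorDiv(a, b)           // ⌊a/b⌋ for b > 0

// lower: pairs (A_i, alpha_i); upper: pairs (B_j, beta_j)
def integerWitness(lower: Seq[(Int, Int)], upper: Seq[(Int, Int)]): Int = {
  val b = upper.map { case (bj, betaj) => floorDiv(bj, betaj) }.reduceOption(_ min _)
  val a = lower.map { case (ai, alphai) => ceilDiv(ai, alphai) }.reduceOption(_ max _)
  b.orElse(a).getOrElse(0)   // use b when an upper bound exists, otherwise a; 0 if x is unconstrained
}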

The corresponding formula with which we proceed is a conjunction stating that each lower bound is smaller than every upper bound:

⋀_{i,j} ⌈A_i/α_i⌉ ≤ ⌊B_j/β_j⌋    (7.2)

Because of the division, floor, and ceiling operators, the above formula is not in integer linear arithmetic. However, in the absence of output variables, it can be evaluated using standard


programming language constructs. On the other hand, if the terms A_i and B_j contain output variables, we convert the formula into an equivalent linear integer arithmetic formula as follows.

With lcm we denote the least common multiple. Let L = lcm_{i,j}(α_i, β_j). We introduce new integer linear arithmetic terms A′_i = (L/α_i)·A_i and B′_j = (L/β_j)·B_j. Using these terms we derive an equivalent integer linear arithmetic formula:

⌈A_i/α_i⌉ ≤ ⌊B_j/β_j⌋  ⇔  ⌈A′_i/L⌉ ≤ ⌊B′_j/L⌋  ⇔  ⌈A′_i/L⌉ ≤ (B′_j − B′_j mod L)/L
  ⇔  B′_j mod L ≤ B′_j − A′_i  ⇔  B′_j = L·l_j + k_j ∧ k_j ≤ B′_j − A′_i

Formula (7.2) is then equivalent to

⋀_j ( B′_j = L·l_j + k_j ∧ ⋀_i (k_j ≤ B′_j − A′_i) )

Although this formula belongs to linear integer arithmetic, we still cannot simply apply the synthesizer to it. Let {1,...,J} be the range of the j indices. The newly derived formula contains J equalities and 2·J new variables. The process of eliminating equalities as described in Section 7.6.1 will in the end result in a new formula which contains J new output variables, and this way we cannot ensure termination. Therefore, this is not a suitable approach.

However, we observe that the value of k_j is always bounded: k_j ∈ {0,...,L−1}. Thus, if the value of k_j were known, we would have a formula with only J new variables and J additional equations. The equation elimination procedure described before would then result in a formula that has one variable less than the original starting formula, and that would guarantee termination of the approach.

Since the value of each k_j variable is always bounded, there are finitely many (J · L) possible instantiations of the k_j variables. Therefore, we need to check for each instantiation of the k_j variables whether it leads to a solution. As soon as a solution is found, the generated code stops and proceeds with the obtained values of the output variables. If no solution is found, we raise an exception, because the original formula has no integer solution. This leads to a translation schema that contains J · L conditional expressions. In our implementation we generate this code as a loop with constant bounds.

We finish the description of the synthesizer with an example that illustrates the above algorithm.

Example. Consider the formula 2y − b ≤ 3x + a ∧ 2x − a ≤ 4y + b, where x and y are output variables and a and b are input variables. If the resulting formula ⌈(2y − b − a)/3⌉ ≤ ⌊(4y + a + b)/2⌋ has a solution, then the synthesizer emits the value of x to be ⌊(4y + a + b)/2⌋. This newly derived formula has only one output variable y, but it is not an integer linear arithmetic formula.


It is converted to an equivalent integer linear arithmetic formula (4y + a + b)·3 = 6l + k ∧ k ≤ 8y + 5a + 5b, which has three variables: y, k and l. The value of k is bounded: 0 ≤ k ≤ 5, so we treat it as a parameter. We start with the elimination of the equality: it results in the precondition 6 | 3a + 3b − k, the list of terms l = (3a + 3b − k)/6 + 2α, y = α, and a new variable α. Using this, the inequality becomes k − 5a − 5b ≤ 8α. Because α is the only output variable, we can compute it as ⌈(k − 5a − 5b)/8⌉. The synthesizer finally outputs the following code, which computes the values of the initial output variables x and y:

val kFound = false
for k = 0 to 5 do {
  val v1 = 3 * a + 3 * b - k
  if (v1 mod 6 == 0) {
    val alpha = ((k - 5 * a - 5 * b)/8).ceiling
    val l = (v1 / 6) + 2 * alpha
    val y = alpha
    val kFound = true
    break
  }
}
if (kFound) val x = ((4 * y + a + b)/2).floor
else throw new Exception("No solution exists")

The precondition formula is ∃k. 0 ≤ k ≤ 5 ∧ 6 | 3a + 3b − k, which our synthesizer emits as a loop that checks 6 | 3a + 3b − k for k ∈ {0,...,5} and throws an exception if the precondition is false.

7.6.3 Disjunctions in Presburger Arithmetic

We can again lift synthesis for conjunctions to synthesis for arbitrary propositional combinations by applying the method of Section 7.4.4. We also obtain a complexity that is one exponential higher than the complexity of synthesis from the previous section. Approaches that avoid disjunctive normal form can be used in this case as well [Nipkow(2008), Ferrante and Rackoff(1979), Weispfenning(1997)].

7.6.4 Optimizations used in the Implementation

In this section, we describe some optimizations and heuristics that we use in our implementation. Using some of them, we obtained a speedup of several orders of magnitude.

Merging inequalities. Whenever two inequalities t1 ≤ t2 and t2 ≤ t1 appear in a conjunction, we substitute them with the equality t1 = t2. This makes the process of variable elimination more efficient.

Heuristic for choosing the right equality for elimination. When there are several equalities in a formula, we choose to eliminate an equality for which the least common multiple of all the coefficients is the smallest. We observed that this reduces the number of integers to iterate over.

Some optimizations on modulo operations. When processing inequalities, as described in Section 7.6.2, as soon as we introduce the modulo operator, we face a potentially longer processing time. This is because finding the suitable value of the remainder in the constraint B′_j mod L ≤ B′_j − A′_i requires invoking a loop. While searching for a witness, we might need to test all possible L values. Therefore, we try not to introduce the modulo operator in the first place. This is possible in several cases. One of them is when either α_i = 1 or β_j = 1. In that case, if for example α_i = 1, an equivalent integer arithmetic formula is easily derived:

⌈A_i/α_i⌉ ≤ ⌊B_j/β_j⌋  ⇔  A_i ≤ ⌊B_j/β_j⌋  ⇔  β_j·A_i ≤ B_j

Another example where we do not introduce the modulo operator is when B′_j − A′_i evaluates to a number N such that N > L. In that case, it is clear that B′_j mod L ≤ B′_j − A′_i is a valid formula and thus the returned formula is true.

Finally, we describe an optimization that leads to a reduction in the number of loop executions.

This is possible when there exists an integer N such that B′_j = N·T_j and L = N·L_1. (Unless L = β_j, this is almost always the case.) In the case where N exists, k_j also has to be a multiple of N. Putting this together, an equivalent formula of B′_j mod L ≤ B′_j − A′_i is the formula T_j mod L_1 = k_j ∧ N·k_j ≤ B′_j − A′_i. This reduces the number of loop iterations by at least a factor of N.

7.7 Synthesis Algorithm for Parametrized Presburger Arithmetic

In addition to handling the case when the specification formula is an integer linear arithmetic formula of both parameters and output variables, we have generalized our synthesizer to the case when the coefficients of the output variables are not only integers, but can be any arithmetic expression over the input variables. This extension allows us to write e.g. the offset decomposition program from Section 7.2 with statically unknown dimensions dimX, dimY, dimZ. As a slightly simpler example, consider the following invocation:

val (valueX, valueY) = choose((x: Int, y: Int) ⇒
  offset == x + dim * y && 0 ≤ x && x < dim)

Here offset and dim are input variables, whereas x and y are output variables. Note that dim * y is not a linear term. However, at run-time we know the exact value of dim, so the term will become linear. Our synthesizer can handle such cases as well through a generalization of the algorithm in Section 7.6.

Given the problem above, we first eliminate the equality offset = x + dim * y and we obtain a new problem consisting of two inequalities: dim * t ≤ offset ∧ offset − dim + 1 ≤ dim * t. The variable t is a freshly introduced integer variable and it is also the only output variable. At this point, the synthesizer needs to divide a term by the variable dim. In general it thus needs to generate code that distinguishes the cases when dim is positive, negative, or zero. In this particular example, due to the constraint 0 ≤ x < dim, only one case applies. The synthesizer returns the following precondition:

pre ≡ ⌈(offset − dim + 1)/dim⌉ ≤ ⌊offset/dim⌋

It can easily be verified that this is a valid formula for all positive values of dim. The synthesizer also returns the code that computes the values for x and y:

val t = (offset / dim).floor
val valueY = t
val valueX = offset - dim * t

Our general algorithm for handling parametrized Presburger arithmetic follows the algorithm described in Section 7.6. The main difference is that instead of manipulating known integer coefficients, it manipulates arbitrary arithmetic expressions as coefficients. It therefore needs to postpone to run-time certain decisions that involve coefficients. The key observation that makes this algorithm possible is that many compile-time decisions depend not on the particular values of the coefficients, but only on their sign (positive, negative, or zero). In the presence of a coefficient that depends on a parameter, the synthesizer therefore generates code with multiple branches that cover the different cases of the sign.

As an illustration, consider using synthesis to compute, when it exists, the positive integer ratio x between two integers a and b:

val a: Int = ...
val b: Int = ...
val x = choose((x: Int) ⇒ a * x == b && x ≥ 0)

In this example, the synthesizer needs to distinguish between the cases where a, which is used as a coefficient, is zero, negative and positive: when a is zero, it computes as a precondition

pre_0 ≡ b = 0

pre_⊖ ≡ −b ≥ 0 ∧ a | b

and similarly, when a is positive

pre_⊕ ≡ b ≥ 0 ∧ a | b

In fact, when the positive and negative cases differ only by a sign, our synthesizer factors this out by using the expression a/|a| for the sign of a (note that since the case where a is zero is treated before, there is no risk of a division by zero). The generated code for computing x is:

if (a == 0 && b == 0) {
  0
} else if ((a/Math.abs(a)) * b ≥ 0 && b % a == 0) {
  b / a
} else {
  throw new Exception("No solution exists")
}

(Note that when both a and b are zero, any value for x is valid, 0 is just the option picked by the synthesizer.)

The coefficients of the invocation of the Extended Euclidean algorithm generally also become known only at run-time, so the generated code invokes this algorithm as a library function. The situation is analogous for the gcd function. The following example illustrates this situation:

choose((x: Int, y: Int) ⇒ 6*x + a*y == b)

On this example, our synthesizer produces the following code:

if (b % gcd(6,a) == 0) {
  val t1 = gcd(6,a)
  val t2 = -b / t1
  val (t3, t4) = coeffs(1, 6/t1, a/t1)
  (t2 * t3, t2 * t4)
} else {
  throw new Exception("No solution exists")
}

In this code, gcd computes the greatest common divisor, and (a,b) = coeffs(1,c,d) computes a and b such that a*c + b*d + 1 == 0 holds. Note that there are no tests on the signs of a and b, because the precondition and the code are the same in all cases (we define gcd(x,0) to be x).

Finally, note that the running time of the programs in this case is not uniform with respect to the values of all parameters. In particular, the upper bounds of the generated for loops in Section 7.6.2 can now be a function of parameters. Nevertheless, for each value of the


F ::= A | F1 ∧ F2 | F1 ∨ F2 | ¬F
A ::= B1 = B2 | B1 ⊆ B2 | T1 = T2 | T1 < T2 | (K | T)
B ::= x | ∅ | U | B1 ∪ B2 | B1 ∩ B2 | B^c
T ::= k | K | T1 + T2 | K·T | |B|
K ::= ... −2 | −1 | 0 | 1 | 2 ...

Figure 7.4: A Logic of Sets and Size Constraints (BAPA)

parameter, the generated code terminates.

7.8 Synthesis for Sets with Size Constraints

In this section, we recall the definition of a logic of sets with cardinality constraints (introduced in Chapter 6) and describe a synthesis procedure for it. The logic we consider is BAPA (Boolean Algebra with Presburger Arithmetic). It supports the standard operators union, intersection, complement, subset, and equality. In addition, it supports the size operator on sets, as well as integer linear arithmetic constraints over these sizes. Its syntax is shown in Figure 7.4. Decision procedures for BAPA were considered in a number of scenarios [Feferman and Vaught(1959), Zarba(2004), Zarba(2005), Kuncak et al.(2006)Kuncak, Nguyen, and Rinard, Kuncak and Rinard(2007)]. As in the previous sections, we consider the problem (7.1)

~r = choose(~x ⇒ F(~x, ~a))

where the components of the vectors ~a, ~x, ~r are either set or integer variables and F is a BAPA formula.

Figure 7.5 describes our BAPA synthesis procedure that returns a precondition predicate pre(~a) and a solved form Ψ~ . The procedure is based on the quantifier elimination algorithm presented in [Kuncak et al.(2006)Kuncak, Nguyen, and Rinard], which reduces a BAPA formula to an equisatisfiable integer linear arithmetic formula. The algorithm eliminates set variables in two phases. In the first phase all set expressions are rewritten as unions of disjoint Venn regions. The second phase introduces a fresh integer variable for the cardinality of each Venn region. It thus reduces the entire formula to an integer linear arithmetic formula F1. The input variables in F1 are the integer input variables from the original formula, as well as fresh integer variables denoting cardinalities of Venn regions of the input set variables. Note that all values of those input variables are known from the program. The output variables are the original integer output variables and freshly introduced integer variables denoting cardinalities of Venn regions that are contained in the output set variables.

We can therefore build a synthesizer for BAPA on top of the synthesizer for integer linear arithmetic described in Section 7.6.


INPUT: a formula F(~X, ~k) in the logic defined in Figure 7.4, with input variables X1, ..., Xn, k1, ..., km and output variables Y1, ..., Ys, l1, ..., lt, where Xi and Yj are set variables and ki and lj are integer variables
OUTPUT: code that computes values for the output variables from the input variables

1. Apply the first steps towards a Presburger arithmetic formula:

(a) Replace each atom S1 = S2 with S1 ⊆ S2 ∧ S2 ⊆ S1
(b) Replace each atom S1 ⊆ S2 with |S1 ∩ S2^c| = 0

2. Introduce the Venn regions of the sets Xi's and Yj's: let u be a binary word of length n + m. The set variable Ru represents a Venn region where each '1' stands for a set and '0' stands for its complement. To illustrate, if n = 2, m = 1 and u = 001, then R001 = X1^c ∩ X2^c ∩ Y1. Rewrite each set expression as a disjoint union of the corresponding Venn regions.

3. Create a Presburger arithmetic formula: an integer variable hu denotes the cardinality of the Venn region Ru. Use the fact that |S1 ∪ S2| = |S1| + |S2| iff S1 and S2 are disjoint to rewrite the whole formula as a Presburger arithmetic formula. We denote the resulting formula by F1(~hu, ~l).

4. Create a Presburger arithmetic formula that corresponds to quantifier elimination: let v be a binary word of length n. A set variable Pv denotes a Venn region of the input set variables, which means that |Pv| is a known value. Create a formula that expresses each |Pv| as a sum of the corresponding hu's. Define the formula F2(~hu, |~Pv|) as the conjunction of all those formulas.

5. Create code that computes values of the output vectors. First invoke the linear arithmetic synthesizer described in Section 7.6 to generate the code corresponding to:
   val (~hun, ~ln) = choose((~hu, ~l) ⇒ F1(~hu, ~l) ∧ F2(~hu, |~Pv|))
   Invoking the synthesizer returns code that computes expressions for the integer output variables ~ln and for the variables ~hun. For each set output variable Yi, do the following: let Si be a set containing already known or defined set variables, and let Tj be a Venn region of Si ∪ {Yi} that is contained in Yi. Each Tj region is contained in the bigger Venn region Uj, which is a Venn region of the sets in Si. For each Tj do: take all Ru that belong to Tj and let dj be the sum of all corresponding hun. Based on the value of dj, output the following code:

• if Tj ⊆ ∩_{S ∈ Si} S^c and dj > 0, output the assignment Kj = fresh(dj)
• if dj = 0, output the assignment Kj = ∅
• if dj = |Uj|, output the assignment Kj = Uj
• otherwise, output the assignment Kj = take(dj, Uj)

Finally, construct Yi as the union of all the Kj sets: Yi = ∪j Kj

Figure 7.5: Algorithm for synthesizing a function Ψ such that F[~x := Ψ(~a)] holds, where F has the syntax of Figure 7.4

The integer arithmetic synthesizer outputs the precondition predicate pre and emits the code for computing values of the new output variables. The generated code can use the returned integer values to reconstruct a model for the original formula. Notice that the precondition predicate pre will be a Presburger arithmetic formula with terms built using the original integer input variables and the cardinalities of Venn regions of the original input set variables. As an example, if i is an integer input variable and a and b are set input variables, then the precondition predicate might be the following formula: pre(i,a,b) = |a ∩ b| < i ∧ |a| ≤ |b|.

In the last step of the BAPA synthesis algorithm, when outputting code, we use the functions fresh and take. The function take takes as arguments an integer k and a set S, and returns a subset of S of size k. The function fresh(k) is invoked when k fresh elements need to be generated. These functions are used only in the code that computes output values of set variables (the linear integer arithmetic synthesizer already produces the code to compute the values of integer output variables).

The set-valued output variables are computed one by one. Given an output set variable Yi, the code that effectively computes the value of Yi is emitted in several steps. With Si we denote a set containing the set variables occurring in the original formula whose values are already known. Initially, Si contains only the input set variables. Our goal is to describe the construction of Yi in terms of sets that are already in Si. We start by computing the Venn regions for Yi and all the sets in Si in order to define Yi as a union of those Venn regions. Therefore we are interested only in those Venn regions that are subsets of Yi. Let Tj be one such Venn region. It can be represented as Tj = Yi ∩ Uj, where Uj has the form Uj = ∩_{S ∈ Si} S^(c) and S^(c) denotes either S or S^c. On the other hand, Tj can also be represented as a disjoint union of the original Ru Venn regions. Those Ru are the Venn regions that were constructed at the beginning of the algorithm for all input and output set variables.

As the linear integer arithmetic synthesizer outputs the code that computes the values hu, where hu = |Ru|, we can effectively compute the size of each Tj. If Tj = Ru1 ∪ ... ∪ Ruk, then the size of Tj is |Tj| = dj = Σ_{l=1..k} hul. Note that dj is easily computed from the output of the linear integer arithmetic synthesizer, and based on the value of dj we define a set Kj as Kj = take(dj, Uj). Finally, we emit the code that defines Yi as a finite union of the Kj's: Yi = ∪j Kj.

Based on the values of dj, we can introduce further simplifications. If dj = 0, none of the elements of Uj contributes to Yi and thus Kj = ∅. On the other hand, if dj = |Uj|, applying the simple rule S = take(|S|, S) results in Kj = Uj. A special case is when Uj = ∩_{S ∈ Si} S^c. If in this case it also holds that dj > 0, we need to take dj elements that are not contained in any of the already known sets, i.e. we need to generate dj fresh elements. For this purpose we invoke the command fresh.
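As an illustration only, take and fresh could be implemented along the following lines. The Scala object name SetRuntime, the element type O, and the way fresh elements are produced are assumptions of this sketch, not part of the Comfusy implementation.

object SetRuntime {
  type O = AnyRef                      // element type assumed for this sketch
  private var counter = 0L

  // take(k, s): an arbitrary subset of s with exactly k elements
  def take(k: Int, s: Set[O]): Set[O] = {
    require(0 <= k && k <= s.size, "take: not enough elements")
    s.iterator.take(k).toSet
  }

  // fresh(k): k elements distinct from all previously produced ones
  def fresh(k: Int): Set[O] =
    (1 to k).map { _ =>
      counter += 1
      new Object { override def toString: String = "fresh" + counter }: O
    }.toSet
}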

Partitioning a Set. We illustrate the BAPA synthesis algorithm through an example. Consider the following invocation of the choose function that generalizes the example in Section 7.2.
val (setA, setB) = choose((a: Set[O], b: Set[O]) ⇒


  (-maxDiff <= a.size - b.size && a.size - b.size <= maxDiff
   && a union b == bigSet && a intersect b == empty))

This example combines integer and set variables. Given a set bigSet, the goal is to divide it into two partitions. The previously defined integer variable maxDiff specifies the maximum amount by which the sizes of the two partitions may differ. We apply the algorithm from Figure 7.5 step-by-step to illustrate how it works. After completing Step 3, we obtain the formula

F1(~hu) ≡ h100 = h110 = h010 = h001 = h111 = 0
        ∧ -maxDiff ≤ h101 − h011 ∧ h101 − h011 ≤ maxDiff

We simplify the formula obtained in Step 4 by applying the constraints from Step 3 and obtain the formula

F2(~hu) ≡ |bigSet| = h101 + h011 ∧ |bigSet^c| = h000

Next we call the linear arithmetic synthesizer on the formula F1(~hu) ∧ F2(~hu). The only two variables whose values we need to find are h101 and h011. The synthesizer first eliminates the equation |bigSet| = h101 + h011: a fresh integer variable k is introduced such that h101 = k and h011 = |bigSet| − k. This way there is only one output variable: k. Variable k has to be a solution of the following two inequalities: |bigSet| − maxDiff ≤ 2k ∧ 2k ≤ |bigSet| + maxDiff. This results in the precondition

pre ≡ ⌈(|bigSet| − maxDiff) / 2⌉ ≤ ⌊(|bigSet| + maxDiff) / 2⌋

Note that pre is defined entirely in terms of the input variables and can be easily checked at run-time. The synthesizer outputs the following code, which computes values for the output variables:
val k = ((bigSet.size + maxDiff)/2).floor
val h101 = k
val h011 = bigSet.size - k
val setA = take(h101, bigSet)
val setB = take(h011, bigSet -- setA)

In the code above, '--' denotes the set difference operator. The synthesized code first computes the size k of one of the partitions, as approximately one half of the size of bigSet. It then selects k elements from bigSet to form setA, and selects bigSet.size - k of the remaining elements for setB.


[Diagram: the scalac compilation phases (parsing, name analysis, type-checking, optimization, code generation) with the Comfusy plugin inserted as an additional phase; input .scala files, output .class files.]

Figure 7.6: Interaction of Comfusy with scalac, the Scala compiler. Comfusy takes as an input the abstract syntax tree of a Scala program and rewrites calls to choose to syntax trees representing the synthesized function.

7.9 Implementation and Experience

Comfusy tool. We have implemented our synthesis procedures as a Scala compiler extension, which we call Comfusy. We chose Scala because it supports higher-order functions that make the concept of a choose function natural, and extensible pattern matching in the form of extractors [Emir et al.(2007)Emir, Odersky, and Williams]. Moreover, the compiler supports plugins that work as additional compilation phases, so our extension is seamlessly integrated into the compilation process (see Figure 7.6). We used an off-the-shelf decision procedure [de Moura and Bjørner(2008a)] to handle the compile-time checks (we could, in principle, also use our synthesis procedure for compile-time checks because synthesis subsumes satisfiability checking).

Our plugin supports the synthesis of integer values through the choose function constrained by linear arithmetic predicates (including predicates in parametrized linear arithmetic), as well as the synthesis of set values constrained by predicates of the logic described in Section 7.8. Additionally, it can synthesize code for pattern-matching expressions on integers such as the ones presented in Section 7.2.

Compilation times. Figure 7.7 shows the compile times for a set of benchmarks, with and without our plugin. Without the plugin, the code is of no use (the choose function, when not rewritten, just throws an exception), but the difference between the timings indicates how much time is spent generating the synthesized code. We also measure how much time is used for the compile-time checks for satisfiability and uniqueness. The examples SecondsToTime, FastExponentiation, SplitBalanced and Coordinates were presented in Section 7.2. ScaleWeights computes solutions to a puzzle, PrimeHeuristic contains a long pattern-matching expression where every pattern is checked for reachability, and SetConstraints is a variant of SplitBalanced. There is no measurement for Coordinates with compile-time checks, because the formulas to check are in an undecidable fragment, as the original formula is in parametrized linear arithmetic. We also measured the times with all benchmarks placed in a single file, as an attempt to balance out the time taken by the Scala compiler to start up.

Our numbers show that the additional time required for the code synthesis is minimal.


                     scalac   w/ plugin   w/ checks
SecondsToTime         3.05       3.2        3.25
FastExponentiation    3.1        3.15       3.25
ScaleWeights          3.1        3.4        3.5
PrimeHeuristic        3.1        3.1        3.1
SetConstraints        3.3        3.5        3.5
SplitBalanced         3.3        3.9        4.0
Coordinates           3.2        4.2        --
All                   5.75       6.35       6.75

Figure 7.7: Measurement of compile times: without applying synthesis (scalac), with synthesis but with no call to Z3 (w/ plugin), and with both synthesis and compile-time checks activated (w/ checks). All times are in seconds.

Moreover, note that the code we tested contained almost exclusively calls to the synthesizer. The increase in compilation time in practice would thus be lower for code that mixes standard Scala with selected choose construct invocations.

Execution times of generated code. In our experience, the execution time of the synthesized code is similar to equivalent hand-written code. Our experience so far was restricted to small examples, not because of performance problems, but rather because this is the intended way of using the tool: to synthesize code blocks as opposed to entire procedures or algorithms.

Code size. An older version of Comfusy generated if-then-else statements that correspond to large disjunctions that appear in quantifier elimination algorithms. In certain cases, this led to formulas of large size. We have improved this by generating code that executes about as fast but uses a “for” loop instead of disjunctions. This eliminated the problems with code size, and enabled synthesis for parametric coefficients, discussed above.

7.10 Further Related Work

Early work on synthesis [Manna and Waldinger(1971), Manna and Waldinger(1980)] focused on synthesis using expressive and undecidable logics, such as first-order logic and logic containing the induction principle. Consequently, while it can synthesize interesting programs containing recursion, it cannot provide the completeness and termination guarantees that synthesis based on decision procedures can.

Recent work on synthesis [Srivastava et al.(2010)Srivastava, Gulwani, and Foster] resolves some of these difficulties by decoupling the problem of inferring program control structure and the problem of synthesizing the computation along the control edges. Furthermore,

the work leverages verification techniques that use both approximation and lattice theoretic search along with decision procedures. As such, it is more ambitious and aims to synthesize entire algorithms. By nature, it cannot be both terminating and complete over the space of all programs that satisfy an input/output specification (thus the approach of specifying program resource bounds). In contrast, we focus on synthesis of program fragments with very specific control structure dictated by the nature of the decidable logical fragment.

Our work further differs from past approaches in 1) using decision procedures to guarantee the computation of synthesized functions whenever a synthesized function exists, 2) providing bounds on the running time of the synthesis algorithm and on the size and running time of the synthesized code, and 3) deploying synthesis in well-delimited pieces of code of a general-purpose programming language.

Program sketching has demonstrated the practicality of program synthesis by focusing its use on particular domains [Solar-Lezama et al.(2006)Solar-Lezama, Tancau, Bodík, Seshia, and Saraswat, Solar-Lezama et al.(2007)Solar-Lezama, Arnold, Tancau, Bodík, Saraswat, and Seshia, Solar-Lezama et al.(2008)Solar-Lezama, Jones, and Bodík]. The algorithms employed in sketching are typically focused on appropriately guided search over the syntax tree of the synthesized program. Search techniques have also been applied to automatically derive concurrent garbage collection algorithms [Vechev et al.(2007)Vechev, Yahav, Bacon, and Rinetzky]. In contrast, our synthesis uses the mathematical structure of a decidable theory to explore the space of all functions that satisfy the specification. This enables our approach to achieve completeness without putting any a priori bound on the syntax tree size. Indeed, some of the algorithms we describe can generate fairly large yet efficient programs. We expect that our techniques could be fruitfully integrated into search-based frameworks.

Synthesis of reactive systems generates programs that run forever and interact with the environment. However, known complete algorithms for reactive synthesis work with finite-state systems [Pnueli and Rosner(1989)] or timed systems [Asarin et al.(1995)Asarin, Maler, and Pnueli]. Such techniques have applications to control the behavior of hardware and embedded systems or concurrent programs [Vechev et al.(2009)Vechev, Yahav, and Yorsh]. These techniques usually take specifications in temporal logic [Piterman et al.(2006)Piterman, Pnueli, and Sa'ar] and have resulted in tools that can synthesize useful hardware components [Jobstmann et al.(2007)Jobstmann, Galler, Weiglhofer, and Bloem, Jobstmann and Bloem(2006)]. Our work examines non-reactive programs, but supports infinite data without any approximation, and incorporates the algorithms into a compiler for a general-purpose programming language.

Computing optimal bounds on the size and running time of the synthesized code for Presburger Arithmetic is beyond the scope of the current state of our research. Relevant results in the area of decision procedures are automata-based decision procedures [Boigelot et al.(2005)Boigelot, Jodogne, and Wolper, Klaedtke(2003)], the bounds on quantifier elimination [Weispfenning(1997)] and results on integer programming in fixed dimensions [Eisenbrand and Shmonin(2008)].


Automata-based decision procedures, such as those implemented in the MONA tool [Klarlund and Møller(2001)], could be used to synthesize efficient (even if large) code from expressive specifications. The work on graph types [Klarlund and Schwartzbach(1993)] proposes to synthesize fields given by definitions in monadic second-order logic. Automata have also been applied to the synthesis of efficient code for pattern-matching expressions [Sekar et al.(1995)Sekar, Ramesh, and Ramakrishnan].

Synthesis of constraints for rational arithmetic has been previously applied to automatically construct abstract transfer functions in abstract interpretation of linear constraints over rationals [Monniaux(2009)]. Our results apply this technique to integer linear arithmetic and constraints on sets. More generally, we observe that such synthesis is useful as a general- purpose programming construct.

Our approach also shares some of the goals of partial evaluation [Jones et al.(1993)Jones, Gomard, and Sestoft]. However, we do not need to employ general-purpose partial evaluation techniques (which typically provide linear speedup), because we have the knowledge of a particular decision procedure. We use this knowledge to devise a synthesis algorithm that, given a formula F, generates the code corresponding to the invocation of this particular decision procedure. This synthesis process checks the uniqueness and the existence of the solutions, emitting appropriate warnings. Moreover, the synthesized code can have reduced complexity compared to invoking the decision procedure at run time, especially when the number of variables to synthesize is bounded.

8 Interactive Synthesis of Code Snippets

In this chapter we describe a synthesis tool called InSynth that applies theorem proving technology to synthesize code fragments. InSynth interactively displays a ranked list of suggested code fragments that are appropriate for the current program point (see Figure 8.1). To determine candidate code fragments, our tool takes into account polymorphic type constraints coming from the library functions, as well as test cases. In our experiments, InSynth was useful for synthesizing code fragments for common programming tasks.

8.1 Motivation

Algorithmic software synthesis from specifications is a difficult problem. Yet software developers perform a form of synthesis on a daily basis, by transforming their intentions into concrete programming language expressions. The goal of our tool, InSynth, is to explore the relationship and synergy between algorithmic synthesis and developers' activities by deploying synthesis for code fragments in interactive settings. To make the problem more tractable, InSynth aims to synthesize small fragments, as opposed to entire algorithms. InSynth builds code fragments containing functions drawn from large and complex libraries. The goal of InSynth is to save developers the effort of searching for appropriate methods and their compositions. InSynth is deployed within an integrated development environment. When invoked, it suggests multiple meaningful expressions at a given program point, using type information and test cases.

InSynth primarily relies on type information to perform its synthesis task. When the developer needs a piece of code that computes a value of a given type, they declare the type of this value, using the usual syntax of the Scala programming language [Odersky et al.(2008)Odersky, Spoon, and Venners]. They then invoke InSynth to find suggested code fragments of this type. It uses Ensime [Cannon(2011)], an incremental Scala compiler integrated into the editor, to gather the available values, fields, and functions. The use of type information is inspired by Prospector [Mandelin et al.(2005)Mandelin, Xu, Bodík, and Kimelman], but InSynth has an important additional dimension: it handles generic (parametric) types [Damas and Milner(1982)]. Generic types are a mainstream mechanism to write safe and reusable code in, e.g., Java, ML, and Scala. They are particularly frequent in libraries.


Figure 8.1: InSynth displays suitable code fragments


The support for generic types is a fundamental generalization compared to previous tools, which handled only ground types. With generic types, a finite set of declarations will generate an infinite set of possible values, and the synthesis of a value of a given type becomes undecidable. InSynth therefore encodes the synthesis problem in first-order logic. This encoding has the property that a value of the desired type can be built from functions of given types iff there exists a proof for the corresponding theorem in first-order logic. It is therefore related to known connections between proof theory and type theory. In type-theoretic terms, InSynth attempts to check whether there exists a term of a given type in a given polymorphic type environment. This is known as the type inhabitation problem. If such terms exist, the goal of InSynth is to produce a finite subset of them, ranked according to some criterion.

InSynth implements a prover, which finds multiple proofs representing candidate code fragments. Our implementation was inspired by first-order resolution. The use of resolution is related to the traditional deductive program synthesis [Manna and Waldinger(1980)], but our approach attempts to derive code fragments by using type information instead of the code itself.

As a post-processing step, InSynth filters out the candidate code fragments that crash the program, or that violate assertions or postconditions. This functionality incorporates input/output behavior [Jha et al.(2010)Jha, Gulwani, Seshia, and Tiwari], but uses it mostly to improve the precision of the primary mechanism, the type-driven synthesis.

In the software development process an accurate specification is often not available. A synthesis tool should thus be equipped to deal with under-specified problems, and be prepared to generate multiple alternative solutions when asked to do so. Our algorithm fulfills this requirement: it generates multiple solutions and ranks them using a system of weights. The current weight computation takes into account the proximity of values to the point in which the values are used, as well as user-specified hints, if any. A database of code samples is additionally used to derive weights, providing effects similar to some of the previous systems [Sahavechaphan and Claypool(2006), Mandelin et al.(2005)Mandelin, Xu, Bodík, and Kimelman]. Given a weight function, InSynth directs its search using a technique related to ordered resolution [Bachmair and Ganzinger(2001a)].

8.2 Examples

Consider the problem of retrieving data stored in a file. Suppose that we have the following definitions:
def fopen(name: String): File = { ... }
def fread(f: File, p: Int): Data = { ... }
var currentPos: Int = 0
var fname: String = ""
def getData(): Data =

The developer is about to define, at the position left open above, the body of the function getData that computes a value of type Data. When the developer invokes InSynth, the result is a list of valid expressions (snippets) for the given program point, composed from the values in scope. Assuming that among the definitions we have the functions fopen and fread, with the types shown above, InSynth will return as one of the suggestions fread(fopen(fname), currentPos), which is a simple way to retrieve data from the file given the available operations. In our experience, InSynth often returns snippets in a matter of milliseconds. Such snippets may be difficult to find manually for complex and unknown APIs, so InSynth can also be seen as a sophisticated extension of search and code completion functionality.

Parametric polymorphism. We next illustrate the support for parametric polymorphism in InSynth. Consider the standard higher-order function map that applies a given function to each element of a list. Assume that the map function is in scope. Further assume that we wish to define a method that takes as arguments a function from strings to integers and a string, and returns an array of integers.
def map[A,B](fun: A ⇒ B, array: Array[A]): Array[B] = { ... }
def createStringArray(name: String): Array[String] = { ... }
def createIntArray(fun: String ⇒ Int, name: String): Array[Int] =


As seen in Figure 8.2, InSynth returns map[String,Int](fun, createStringArray(name)) as a result, instantiating the polymorphic definition of map and composing it with createStringArray. InSynth efficiently handles polymorphic types through resolution and unification.

Figure 8.2: Polymorphic behavior of InSynth

Using code behavior. The next example shows how InSynth applies testing to discard those snippets that would make the code inconsistent. Define the class FileManager containing methods for opening files either for reading or for writing.
class Mode(mode: String)
class File(name: String, val state: Mode)
object FileManager {
  private final val WRITE: Mode = new Mode("write")
  private final val READ: Mode = new Mode("read")
  def openForReading(name: String): File = { ... } ensuring { result => result.state == READ }
}
object Tests { FileManager.openForReading("book.txt") }

If it were based only on types, InSynth would return both snippets new File(name,WRITE) and new File(name,READ). However, InSynth also checks run-time method contracts (pre- and

post-conditions) and verifies whether each of the returned snippets passes the test cases with them. Because of the postcondition requiring that the file is open for reading, InSynth discards the snippet new File(name,WRITE) and returns only new File(name,READ).

Applying user preferences. The last example demonstrates one way in which a developer can influence the ranking of the returned solutions. We consider the following functionality for managing calendar events.
private val events: List[Event] = List.empty[Event]
def reserve(user: User, date: Date): Event = { ... }
def getEvent(user: User, date: Date): Event = { ... }
def remove(user: User, date: Date): Event =

Assume that a user wishes to obtain a code snippet for remove. In general, InSynth ranks the results based on the weight function. We have found that the default computation of the weight is often adequate. Running the above example returns reserve(user,date) and getEvent(user,date), in this order. If this order is not the preferred one, the developer can modify it using elements of text search. To do so, the developer supplies a list of suggested strings indicating the names of some of the methods expected to appear in the code snippet. For example, if the developer invokes InSynth with “getEvent” as a suggestion, the ranking of returned snippets changes, and getEvent(user,date) appears first in the list.

8.3 From Scala to Types

For the main question of finding a code snippet for the given type, the corresponding problem in type theory is the type inhabitation problem. In this section we review basic definitions and facts and establish a connection between the type inhabitation problem and the problem of finding code snippets.

Let T be a set of types and let E be a set of expressions. A type environment Γ is a finite set {e1 : τ1, ..., en : τn} containing pairs of the form ei : τi, where ei is an expression and τi is a type. The pair ei : τi is called a type binding.

The judgment Γ ⊢ e : τ denotes that from an environment Γ we can derive a type binding e : τ by applying the rules of some calculus. The type inhabitation problem for the given calculus is stated as follows: given a type τ and a type environment Γ, does there exist an expression e such that Γ ⊢ e : τ?

The first step is to construct the type environment from a Scala program. For every type declaration that appears in the given program, we create a type binding and add it to the context. The bindings are constructed in the following way:

1. Primitive types such as Int, Bool, String are represented with the constants of the same name:


val x : Int   ⇝   x : Int

2. To encode complex types, such as Map[Int, String], lists, sets and similar, we use the same formalism as for the primitive types:
   val x : List[String]       ⇝   x : List(String)
   val x : Map[Int, String]   ⇝   x : Map(Int, String)

3. To capture type constraints on functions and methods, we use the Hindley-Milner type description [Damas and Milner(1982)] and the → notation. A function f that returns a value of type R and takes n arguments as input, with the i-th argument being of type Ti, is declared as f : T1 → ... → Tn → R:
   def f(a: Int, b: String): Bool   ⇝   f : Int → String → Bool
   Additionally, for instance methods we add the receiver type. In a standard implementation, an instance method is passed a hidden reference to the object to which it belongs. We model this by adding an arrow from the receiver type to the method. This will also help us in term reconstruction. Consider the following instance method m:
   class C { def m(a: T1, b: T2): R }

It is encoded as m : C → T1 → T2 → R.

4. In Scala it is possible to pass a function as a method's argument. For example,
   def m(f: String => Int, a: String): Int = f(a) + 2

is a higher-order function that takes the function f as an argument. Both f and m are encoded using the → symbol:
   m : (String → Int) → String → Int
   f : String → Int

5. InSynth supports polymorphic functions as well. This is done using universal quantifiers. Consider as an illustration a generic method that takes a value of any type and creates a list:
   def elem2list[A](x: A): List[A] = { List(x) }

The type binding derived from this method is
   elem2list : ∀α. α → List(α)
By adding quantifiers we go beyond the expressivity of ground types and propositional logic.

6. Finally, to encode the query, which is to answer whether there is a value of a type τ, we add the following type binding:


goal : τ → ⊥
Both symbols goal and ⊥ have a special meaning and cannot be used for any other encoding. We solve the type inhabitation problem by finding an inhabitant of type ⊥. Once there is an inhabitant of type ⊥, we can easily derive an inhabitant of type τ.

8.4 Type Inhabitation in the Ground Applicative Calculus

Before going to the general framework that includes polymorphic types, we first describe reasoning about the ground types.

Definition 8.1 (Ground Types) Let C be a fixed finite set. For every c ∈ C, we denote by c/n the arity of the element. The elements of arity 0 are called constants. The set of all ground types

Tg is defined by the grammar:

Tg ::= C(Tg, ..., Tg) | Tg → Tg

To establish a connection between Tg and the Scala types, one could consider the set C as a set containing the Scala primitive types (such as Int or String) and type constructors (such as List/1, Map/2).

Let S be a set containing function symbols. The set of all ground terms Eg is formed inductively from S as follows: all constants of S are ground terms, and if t1, ..., tn are ground terms and f/n ∈ S, then f(t1, ..., tn) is a ground term.

Figure 8.3 lists the rules of a calculus for the ground types. We call this calculus the ground applicative calculus. It supports the application of a function to a term, and function composition. These two rules have a natural interpretation in a programming language. Through application we construct a snippet in which a method is applied to its argument, while composition represents a combination of several methods.

AXIOM:
    x : τ ∈ Γ
    ─────────
    Γ ⊢ x : τ

APP:
    Γ ⊢ f : τ1 → τ2      Γ ⊢ x : τ1
    ───────────────────────────────
    Γ ⊢ f(x) : τ2

COMPOSE:
    Γ ⊢ f1 : τ0 → τ1    ...    Γ ⊢ fn : τn−1 → τn
    ─────────────────────────────────────────────
    Γ ⊢ fn ∘ ... ∘ f1 : τ0 → τn

Figure 8.3: Calculus for the Ground Types


8.4.1 Type Inhabitation in the Ground Applicative Calculus

The problem of type inhabitation is widely studied for various calculi. However, very often this problem is undecidable. The ground applicative calculus can be seen as a sub-calculus of the simply typed lambda calculus, which additionally contains lambda abstraction. In the simply typed lambda calculus the type inhabitation problem is decidable, but very hard: by a reduction from the canonical quantified Boolean formula (QBF) problem, it was shown in [Statman(1979)] that the problem is PSPACE-complete. In this section we show that if lambda abstraction is disabled, the type inhabitation problem can be solved much faster.

Theorem 8.2 The type inhabitation problem in the ground applicative calculus can be solved in polynomial time.

Proof. Let Γ be a type environment Γ = {e1 : τ1, ..., en : τn}, with ei ∈ Eg and τi ∈ Tg. Let τ0 be a type for which we ask whether there is an expression e0 such that Γ ⊢ e0 : τ0. We encode the query as the type binding goal : τ0 → ⊥, where ⊥ is a designated, previously unused symbol. The goal of the algorithm is to derive a type binding e : ⊥. The expression e can only be of the form goal(x), and the term x has the desired type τ0.

Let TParts(Γ) denote the set of all types appearing in Γ, together with τ0. In addition, if a type is not a constant, then TParts(Γ) also contains all its components: if f(t1, ..., tn) ∈ TParts(Γ), then also t1, ..., tn ∈ TParts(Γ). The cardinality of the set TParts(Γ) is clearly finite and polynomial in the size of Γ.

We consider a sequence of type bindings that starts with an enumeration of Γ and continues with the application of inference rules until reaching a type judgment of the form e0 : τ0. We show that if there is a term e0 such that Γ ⊢ e0 : τ0, then it can be derived in polynomial time. For this purpose we can assume that each step produces a term that is non-redundant, that is, one that is subsequently used in the derivation (otherwise we could eliminate it).

We first assume that there is no COMPOSE rule, so we only apply the APP rule. In that case each derived term has a type from TParts(Γ). The set of derived types does not change if we always adopt the following principle: never use in premises an element t : τ of the sequence if there is a term t′ : τ with the same type appearing earlier in the sequence. If we adopt this policy, the number of newly introduced elements is bounded by |TParts(Γ)|². Therefore, the process terminates. The resulting sequence also gives a representation of the (possibly infinite) set of terms that have a given type. The infinite sets of solutions appear precisely from derivations that use a term of some type to derive a new term of the same type. However, the policy described ensures that such loops are detected and not followed.

We next assume that we can also use the composition rule. This problem does not directly reduce to the case of application, because viewing ∘ as a higher-order function would require assigning it a polymorphic type. Nonetheless, we show that we also obtain a polynomial bound.


First we observe that, if the COMPOSE rule is used to obtain a term of the form (f1 ∘ ... ∘ fn)(x), then it was not necessary for producing a new type: we could have instead directly used

APP alone to construct f1(... fn(x)...). We next use the fact that the COMPOSE rule already accounts for any number of function symbols, and that composition is associative and produces the same type for different composition orders, to conclude that it is not necessary to use the result of a compose rule multiple times in a sequence. From those observations we conclude that COMPOSE always produces either an argument of APP or the required type τ0. In the second case, by using the APP rule we derive an inhabitant of the type ⊥, i.e. COMPOSE again produced an argument of APP.

We can therefore replace the COMPOSE rule with the following APPCOMPOSE rule, in a process similar to completion in term rewriting. This results in the following system, which again has the crucial property that its result is always an element of TParts(Γ):

APP:
    Γ ⊢ f : τ1 → τ2      Γ ⊢ x : τ1
    ───────────────────────────────
    Γ ⊢ f(x) : τ2

APPCOMPOSE:
    Γ ⊢ c : (τ → τn) → σ      Γ ⊢ f1 : τ → τ1    ...    Γ ⊢ fn : τn−1 → τn
    ──────────────────────────────────────────────────────────────────────
    Γ ⊢ c(fn ∘ ... ∘ f1) : σ

Therefore, the application of such rules also finishes in at most |TParts(Γ)|² steps. This completes the proof that the type inhabitation problem, where we restrict terms to be obtained from application and function composition, is polynomial.
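To make the saturation argument concrete, the following Scala sketch implements the APP-only case from the proof: it keeps at most one witness term per type, which is exactly the policy that bounds the search. All names here (Type, Base, Arrow, Term, Const, App, inhabit) are hypothetical and introduced only for this illustration; the sketch ignores the COMPOSE rule and weights.

sealed trait Type
case class Base(name: String) extends Type
case class Arrow(from: Type, to: Type) extends Type

sealed trait Term
case class Const(name: String) extends Term
case class App(fun: Term, arg: Term) extends Term

def inhabit(env: Map[Term, Type], goal: Type): Option[Term] = {
  // one witness term per type, mirroring the policy used in the proof
  var witness: Map[Type, Term] = env.map { case (t, ty) => ty -> t }
  var changed = true
  while (changed && !witness.contains(goal)) {
    changed = false
    for ((ty, f) <- witness) ty match {
      case Arrow(a, b) if !witness.contains(b) =>
        witness.get(a).foreach { x =>
          witness += (b -> App(f, x))  // APP: from f : a -> b and x : a derive f(x) : b
          changed = true
        }
      case _ => ()
    }
  }
  witness.get(goal)
}

For instance, with env = Map(Const("fname") -> Base("String"), Const("fopen") -> Arrow(Base("String"), Base("File"))), the call inhabit(env, Base("File")) returns Some(App(Const("fopen"), Const("fname"))).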

8.5 Quantitative Applicative Ground Inhabitation

There might be many terms belonging to a given type, and the question of finding the best term naturally arises. We address this problem by assigning a weight to every expression. Similar to resolution-based theorem proving, a lower weight indicates the higher relevance of the term.

The ranking of the snippets and the entire algorithm strongly rely on a system of weights. The system considers snippets of a smaller weight as preferable to those of a larger weight. The weights of terms extend to the weights of clauses, as in the multiset ordering of clauses in first-order resolution [Bachmair and Ganzinger(2001a)].

To begin with, we define an ordering on the symbols and assign a weight to each symbol. The user-preferred symbols have the smallest weight (highest preference). They are followed by the local symbols occurring in the current method. The remaining symbols of the corresponding class have a larger weight than the local symbols. Finally, the symbols outside the current class have the largest weight. This includes symbols from the imported libraries and APIs.

Once the ordering and the weights of the symbols are fixed, we compute the weights of types

and expressions. The weight of a type or of an expression is computed as the sum of the weights of all the symbols occurring in the type or in the expression.
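As a rough illustration of this scheme, the following Scala sketch computes symbol and expression weights. The symbol categories mirror the ordering described above, but the numeric values and function names are assumptions made for this sketch, not InSynth's actual constants.

// Hypothetical weight assignment: smaller weight means higher preference.
def symbolWeight(sym: String, preferred: Set[String], local: Set[String],
                 sameClass: Set[String]): Double =
  if (preferred(sym)) 1.0        // user-suggested symbols: smallest weight
  else if (local(sym)) 2.0       // symbols local to the current method
  else if (sameClass(sym)) 4.0   // remaining symbols of the enclosing class
  else 8.0                       // symbols from imported libraries and APIs

// The weight of an expression (or type) is the sum of the weights of the
// symbols occurring in it.
def expressionWeight(symbols: Seq[String], preferred: Set[String],
                     local: Set[String], sameClass: Set[String]): Double =
  symbols.map(symbolWeight(_, preferred, local, sameClass)).sum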

8.5.1 Finding the Best Type Inhabitant

In this section we further extend the type inhabitation problem with the additional requirement to find an expression of the minimal weight.

Theorem 8.3 Let w be a weight function defined on the type and expression symbols. We extend w to type bindings as described above. For a type environment Γ and a type τ0 in the ground applicative calculus it is possible to find in polynomial time an expression e of the type τ0, such that the weight of e is smaller than the weight of all other expressions of the type τ0.

Proof. The proof extends the proof of Theorem 8.2. It builds a sequence of type bindings that can be derived from Γ. To every element τ of TParts(Γ) we assign a pair (n, t), where n is the minimum weight of all terms of type τ that are currently in the sequence, and t is an expression such that w(t) = n. Initially, to all the elements of TParts(Γ) we assign (∞, −). As before, we construct a sequence of type bindings. With every type binding e : τ added to the sequence, we recalculate the annotation of τ. If its current minimum weight is strictly greater than w(e : τ), then the new annotation becomes (w(e), e). In the sequence we also replace every occurrence of the expression e′ of type τ by e. We can do such a replacement safely, since e′ does not appear in the derivation of e (otherwise w(e′) > w(e) would not hold). We continue with the enumeration of the derived type bindings as in the proof of Theorem 8.2, using the same restrictive principle about the type bindings that can participate in the derivations. Applying the same arguments, we prove that it is possible to find a term of the minimum weight in polynomial time.

8.6 Quantitative Inhabitation for Generics

This section presents our algorithm for type inhabitation in the presence of generic (parametric) types as in the Hindley-Milner type system, without nested type quantifiers. We represent type variables implicitly, as in resolution proof systems for first-order logic.

Definition 8.4 (Generic Types) Let C be a fixed finite set. For every c ∈ C, we denote by c/n the arity of the element. Let V be a set of type variables. The set of all generic types T is defined by the grammar:

Tb ::= V | C(Tb, ..., Tb) | Tb → Tb
T  ::= Tb | ∀V. T


APP:
    Γ ⊢ f : τ1 → τ2      Γ ⊢ x : τ1′      σ = mgu(τ1, τ1′)
    ──────────────────────────────────────────────────────
    Γ ⊢ f(x) : τ2′                   where τ2′ = σ(τ2)

COMPOSE:
    Γ ⊢ f : τ1 → τ2      Γ ⊢ g : τ0 → τ1′      σ = mgu(τ1, τ1′)
    ────────────────────────────────────────────────────────────
    Γ ⊢ (f ∘ g) : τ0′ → τ2′      where τ0′ = σ(τ0), τ2′ = σ(τ2)

Figure 8.4: Rules for Generic Types used by Our Algorithm

Those types are also known under the names ML-style types or Hindley-Milner types. Figure 8.4 shows the rules for application, as well as the rule for composition (which we introduce to improve performance).

Before we execute an algorithm for type inhabitation, we add a complete set of combinators to the initial environment, with their polymorphic types. We denote this set by ΓComb. For example, we can use

{ K : α → β → α,  S : (α → β → γ) → (α → β) → α → γ }

Description of the algorithm. Figure 8.5 shows the algorithm that systematically applies rules in Figure 8.4, while avoiding cycles due to repeated types whose terms have non-minimal weights.

The algorithm maintains two sets of bindings (pairs of expressions and their types): Γ, which holds all initial and derived bindings, and q, which is a work list containing the bindings that still need to be processed. Initially, Γ contains program declarations, as well as the combinators and the goal encoded as (G:τ ) where τ is the type for which the user G → ⊥f resh G wishes to generate expressions. The work list initially contains all these declarations as well. The algorithm accumulates the expressions of the desired type in the set res. The main loop of the algorithm runs until the timeout is reached or the work list q becomes empty.

The body of the main loop of the algorithm selects a minimal (given by best(_)) binding (e1:τ1) from the work list q and attempts to combine it with all other bindings in Γ for which the types

τ1 and τ2 can be unified to participate in one of the inference rules (we denote this condition using the cmpt(τ1,τ2) relation). Note, however, that there is no point to combine (e1:τ1) with a

(e2:τ2) if there is another (e20 :τ2), with the same τ2 but with a strictly smaller w(e20 ). Therefore,

145 Chapter 8. Interactive Synthesis of Code Snippets

INPUT: Γ0 - environment at the program point
INPUT: τG - desired type
OUTPUT: res - set of resulting expressions e with Γ0 ⊢ e : τG

Definitions:
  w(e:τ) = w(e) + w(τ)
  bestT(τ,Γ) = {(e:τ) ∈ Γ | ∀(e′:τ) ∈ Γ. w(e) ≤ w(e′)}
  bestT(e′:τ′,Γ) = bestT(τ′,Γ)
  w(bestT(b,Γ)) = w(b′), if ∃ b′ ∈ bestT(b,Γ); +∞ otherwise
  best(q) = {b ∈ q | ∀ b′ ∈ q. w(b) ≤ w(b′)}
  cmpt(τ1,τ2) := an mgu in APP or COMPOSE of Figure 8.4 exists

Code:
  Γ = Γ0 ∪ ΓComb ∪ {(G : τG → ⊥fresh)}
  q = Γ
  res = ∅
  while ¬timeout ∧ q ≠ ∅ do
    let (e1:τ1) ∈ best(q)
    q = q \ {(e1:τ1)}
    for all (e2:τ2) ∈ {(e2:τ2) ∈ bestT(τ2,Γ) | cmpt(τ1,τ2)} do
      derived = App(e1:τ1, e2:τ2) ∪ Comp(e1:τ1, e2:τ2)
      res = res ∪ {e′ | (e : ⊥fresh) ∈ derived, e[G := I] ⇝* e′ using the rule I(t) → t}
      q = q ∪ {b ∈ derived | w(b) < w(bestT(b,Γ))}
      Γ = Γ ∪ derived
    end for
  end while

Figure 8.5: The Search Algorithm for Quantitative Inhabitation for Generic Types

the algorithm restricts the choice of (e2:τ2) to those where w(e2) is minimal for a given τ2. We formalize this using the function bestT(τ2,Γ) that finds the set of such bindings with minimal e2. We also extend the function to accept a candidate e2′ (which is ignored in looking up the minimal e2). Moreover, we define w(bestT(τ2,Γ)) to denote the value of this minimum (if it exists).

The sets App(e1:τ1, e2:τ2) and Comp(e1:τ1, e2:τ2) are the results of applying the rules from Figure 8.4. If no rule can be applied, the result is the empty set. We use derived to denote the set of results of applying the inference rules to the selected bindings. These results may need to be processed further and therefore the algorithm may need to insert them into q. However, it avoids inserting them if the derived binding has a type that already exists in Γ and the newly derived expression does not have a strictly smaller weight. This reduces the amount of search that the algorithm needs to perform.

Because of the declaration (G : τG → ⊥fresh), the algorithm detects expressions of type τG using the expressions e of fresh type ⊥fresh. To obtain the expression of the desired type, we replace in e every occurrence of G with the identity combinator I. This is justified because ⊥fresh is a fresh constant, so replacing it with τG in a derivation of Γ ∪ {(G : τG → ⊥fresh)} ⊢ (e : ⊥fresh) yields a derivation of Γ ∪ {(G : τG → τG)} ⊢ (e : τG), in which we can use I instead of G. The algorithm also simplifies the accumulated expressions by reducing I where possible. In the presence of higher-order functions I may still remain in the expressions, which is not a problem because it is deducible from any complete set of combinators.

Finally, under the assumption that a linear weight function is given and the weight of each expression symbol is strictly positive, it is straightforward to see that the algorithm finds the derivations for all types that can be obtained using the rules from Figure 8.4. Indeed, the weight of an expression strictly increases during a derivation, so the algorithm, if it runs long enough, reaches an arbitrarily large value as the minimum of the work list. This shows that the algorithm is complete.

8.7 Subtyping using Coercions

A powerful method to model subtyping is to use coercion functions [Luo(2008), Reynolds(1980), Breazu-Tannen et al.(1991)Breazu-Tannen, Coquand, Gunter, and Scedrov]. This approach raises non-trivial issues when we perform type checking or type inference, but it becomes simple and natural if the types are given and we search for the terms.

Simple conversions. In the absence of variant constructors and type bounds, we can model the subtyping relation A <: B by the existence of a coercion expression c : A → B. For example, if a class A[~T] with type parameters ~T extends or mixes in another class B[~τ(~T)], we introduce into the environment a conversion function c : A[~T] → B[~τ(~T)]. Note that the composition of coercion functions immediately accounts for the transitivity of the subtyping relation.

We demonstrate the use of coercions on the following example:

Example. Consider the following code:
class ArrayList[T] extends AbstractList[T] with List[T]
    with RandomAccess with Cloneable with Serializable { ... }
abstract class AbstractList[E] extends AbstractCollection[E] with List[E] {
  ...
  def iterator(): Iterator[E] = {...}
}

Because ArrayList[T] extends AbstractList[T], and AbstractList[E] extends AbstractCollection[E], we generate two coercion functions:
c1 : ∀α. ArrayList[α] → AbstractList[α]
c2 : ∀β. AbstractList[β] → AbstractCollection[β]

There is a declared member of AbstractList [ E], which is encoded as a type binding:

iterator : ∀γ. AbstractList[γ] → Iterator[γ]

Let us assume that there is a local variable declaration in the main method of the example that yields the binding
al : ArrayList[String]

Finally, let us assume that the goal in the example is to find an expression of type Iterator[String], i.e., τG = Iterator[String]. This results in the type binding
G : Iterator[String] → ⊥fresh

Using the rules in Figure 8.4, the algorithm in Figure 8.5 unifies the type variables with the ground type String and derives in Γ the type binding
G(iterator(c1(al))) : ⊥fresh

This produces I(iterator(c1(al))) in the res variable of the algorithm in Figure 8.5. We further simplify this expression to iterator(c1(al)). Finally, we erase all conversion functions and obtain iterator(al), which is displayed to the user as the Scala code al.iterator().

8.8 InSynth Implementation and Evaluation

Program        # Loaded Declarations   # Methods in Synthesized Snippets   Time [s]
FileReader              6                          4                       < 0.001
Map                     4                          4                       < 0.001
FileManager             3                          3                       < 0.001
Calendar                7                          3                       < 0.001
FileWriter            320                          6                         0.093
SwingBorder           161                          2                         0.016
TcpService             89                          2                       < 0.001

Figure 8.6: Basic algorithm for synthesizing code snippets

InSynth is implemented in Scala and built on top of the Ensime plugin [Cannon(2011)]. It can therefore directly use program information computed by the Scala compiler, including abstract syntax trees and the inferred types. Furthermore, it can generate an appropriate pop-up window with suggested synthesized snippets and allow the user to interactively select the desired fragment.

Figure 8.6 gives an idea of the performance of the system. We ran all examples on an Intel(R) Core(TM) i7 CPU at 2.67 GHz with 4 GB RAM. The running times to find the first solution are usually below two milliseconds. Our experience suggests that the algorithm scales well. As an illustration, we were able to synthesize a snippet containing six methods in 0.093 seconds from a set of 320 declarations. Times to encode declarations into FOL formulas range from


0.015 (Calendar) to 0.046 (FileWriter) seconds. If the synthesized snippets need to use more methods from imported libraries, the synthesis typically takes longer, but is still fast enough to be useful.


9 Conclusions

In this dissertation we have explored the use of decision procedures for increasing software re- liability. We particularly focused on decision procedures for the verification of data structures, for software synthesis, and for proving program termination. In addition, we also presented a new combination procedure for non-disjoint theories, obtaining a logic in which we can express and verify complex properties of data structures.

Motivated by applications in verification, we introduced an expressive class of constraints on multisets. Our constraints support arbitrary multiset operations, as well as the cardinality operator. We presented a decision procedure for the satisfiability of these constraints, showing that they efficiently reduce to an extension of quantifier-free Presburger arithmetic. For the latter problem we presented a decision procedure based on a semilinear set representation of the set of solutions of a Presburger arithmetic formula. We proved that the satisfiability problem for our constraints is an NP-complete problem.

This thesis further contains two extensions of the logic for reasoning about multisets with cardinality constraints. First, we extended the logic to one that allows reasoning about sets, multisets, and fuzzy sets, as well as their cardinality bounds. We developed a decision procedure similar to the original one, which translates a formula into an extension of mixed-integer linear arithmetic (MLIRA). We proved that the star operator can also be eliminated when applied to a MLIRA formula. The second extension that we considered was adding to the above logic multisets that result from the application of an uninterpreted function to a set. On top of the previous decision procedure, we built a decision procedure for this extended logic. We showed that the satisfiability problem for this new extended logic is NEXPTIME-complete.

Next, we presented POSSUM, a new logic and decision procedure for reasoning about multiset orderings. POSSUM can express constraints over complex well-founded orderings, which makes it a useful tool for proving termination. The logic subsumes linear integer arithmetic, which has been traditionally used to express ranking functions in automated termination proofs. We established that the satisfiability problem for POSSUM is NP-complete, provided the base theory is in NP. Thus, POSSUM has the same complexity as quantifier-free linear

integer arithmetic. Furthermore, our decision procedure is amenable to a practical implementation. We thus believe that POSSUM provides a valuable tool for extending the scope of existing termination provers.

As a unifying framework for our individual decision procedures, we developed a combination procedure for non-disjoint theories, which share set symbols and operators. The combination is possible if the component theories can be reduced to a common theory, namely to the logic of sets with cardinality constraints. We have shown that the following theories can be combined: 1) Boolean Algebra with Presburger Arithmetic (with quantifiers over sets and integers), 2) weak monadic second-order logic over trees (with monadic second-order quantifiers), 3) two-variable logic with counting quantifiers (ranging over elements), 4) the

Bernays-Schönfinkel-Ramsey class of first-order logic with equality (with ∃*∀* quantifier prefix), and 5) the quantifier-free logic of multisets with cardinality constraints.

As for software synthesis, we presented the general idea of turning decision procedures into synthesis procedures. We have explored in greater detail how to do this transformation for theories admitting quantifier elimination, in particular linear arithmetic. We have further transformed a BAPA decision procedure into a synthesis procedure, illustrating, in the process, how to layer multiple synthesis procedures on top of each other. We have pointed out that synthesis can be viewed as a powerful programming language extension. Such an extension can be seamlessly introduced into popular programming languages in the form of a new programming construct.

Finally, we have presented an algorithm for synthesizing snippets of Scala code. The algo- rithm is based on automated reasoning and implements an intuitionistic calculus based on first-order resolution. The synthesized snippets can combine all declared values, fields, and methods that are in the scope at the current program point, so the problem is closely related to the problem of type inhabitation for type systems. Our system supports parametric polymorphism and uses a theorem prover to find proofs that correspond to code snippets.

To conclude, we believe that the decision procedures presented in this thesis have increased the range of properties of various programming constructs that can be automatically verified. Additionally, synthesis procedures generate code that is correct by construction, thereby rendering obsolete any further need to verify this code. As indicated, we have implemented most of the decision and synthesis procedures presented in this dissertation and explored their practical potential.

9.1 Future Work

We conclude this dissertation by giving an outline of possible directions for future research.


9.1.1 Complete Reasoner for Sets and Multisets

MUNCH is currently implemented as an incomplete tool. We plan to further develop MUNCH so that it will become a complete reasoner for sets and multisets. One way to make it complete is by applying the Decomposition Theorem for Polyhedra and using the ideas from Chapter 4. We would compute a semilinear set by relaxing an integer formula with rational constraints. We should then check for the integer vectors that describe the integer solution set. This way we would avoid computing Gröbner bases, which are currently used for the computation of semilinear sets. With such an implementation, MUNCH will become the first tool that can compute a semilinear set. Semilinear sets as non-negative solutions of an arithmetic formula F have many intriguing applications that we would like to explore. In order to make this approach scalable, we plan to investigate how semilinear sets can be optimally represented.

9.1.2 Software Synthesis by Combining Subroutines

The technique of complete functional synthesis described in Chapter 7 generates programs using only built-in programming language constructs. Our tool InSynth, on the other hand, derives code snippets that also take into account methods and functions defined in APIs or by the programmer, in addition to the built-in constructs. We are interested in improving algorithms for complete functional synthesis so that they can also make calls to custom-made functions and procedures. We believe that the experience we gained by developing InSynth will help us develop a decision procedure that can reason about such subroutines. The most natural way to specify the behavior of a subroutine is with pre- and postconditions. Since these specifications are given in a formal logic, we should be able to reduce reasoning about subroutines to existing decision procedures.
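
For example, in plain Scala a subroutine can already carry such a contract through the standard require and ensuring helpers; the particular function below is a hypothetical example of the kind of specification a synthesis procedure could consult instead of the body:

// A hypothetical library routine whose contract (precondition via require,
// postcondition via ensuring) is the information a synthesis procedure
// would reason about, rather than the implementation itself.
def insertSorted(xs: List[Int], x: Int): List[Int] = {
  require(xs == xs.sorted)                       // precondition
  val (smaller, larger) = xs.partition(_ <= x)
  smaller ::: x :: larger
} ensuring (res => res == res.sorted && res.length == xs.length + 1)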

9.1.3 Additional Theories for Complete Synthesis

We plan to extend the range of theories supported by our tool Comfusy. For instance, reasoning about multisets with cardinality constraints reduces to reasoning in linear integer arithmetic. A synthesis procedure for multisets will thus rely on a synthesis procedure for linear integer arithmetic. Since we already provided such a synthesis procedure in this thesis, multisets with cardinalities are an ideal candidate for inclusion into Comfusy. However, unlike sets with cardinality constraints, multisets with cardinalities do not admit quantifier elimination. We therefore need to develop new complete synthesis techniques that go beyond the techniques presented in this thesis.
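
As a small concrete instance of the constraints involved (standard multiset identities rather than formulas quoted from earlier chapters), a procedure for this theory must handle facts such as

|A \uplus B| = |A| + |B| \qquad\text{and}\qquad |A \cup B| + |A \cap B| = |A| + |B|,

where ⊎ adds multiplicities, ∪ takes their pointwise maximum and ∩ their pointwise minimum; both identities reduce, element by element, to linear integer arithmetic constraints over the multiplicity functions.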

In general, every decision procedure that outputs a model is amenable to transformation into a synthesis procedure. However, applying this transformation naively, without investigating the structure of the models and developing optimizations, can result in code that is too complex to be of practical use. We therefore plan to develop new algorithms that can give complexity guarantees for the synthesized code.


Appendix A

To make our results on NP-completeness of logics of multisets more self-contained, this appendix gives the proof of Theorem 2.19.

Theorem 2.19 (denoted Theorem 1 (ii) in [Eisenbrand and Shmonin(2006)]). Let X ⊆ Z^d be a finite set of integer vectors and let b ∈ X*. Then there exists a subset X̃ ⊆ X such that b ∈ X̃* and |X̃| ≤ 2d log(4dM), where M = max_{x ∈ X} ||x||_∞.

For a set of integer vectors X ⊆ Z^d and a vector b ∈ X*, the question is how many vectors from X are needed to generate b. If these were vectors with real coefficients, Carathéodory's theorem states that b can be generated with at most d vectors [Schrijver(1998)]. In the integer case, however, things are more complicated, and the answer to this question was not known until relatively recently. In [Eisenbrand and Shmonin(2006)], Eisenbrand and Shmonin showed that in the integer case the bound depends not only on d, but also on the size of the vectors in the set X.
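
As a quick sanity check on the bound, here is a worked instance (added for illustration; the logarithm is taken to base 2, as in the lemmas below):

d = 2,\quad M = 4:\qquad |\tilde X| \;\le\; 2d\,\log_2(4dM) \;=\; 4\,\log_2 32 \;=\; 20.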

We recall the definition of the infinity norm: for an integer vector x = (x_1, ..., x_n), ||x||_∞ = max{|x_1|, ..., |x_n|}. In order to avoid possible confusion, for a set of vectors S we define M_S = max_{x ∈ S} ||x||_∞.

The first step towards a proof of Theorem 2.19 is Lemma A.1. Given a set of integer vectors X and a vector b, Lemma A.1 establishes a bound on the size of the set: if the set is bigger than this bound, there is a smaller subset of X which also generates b. The proof of the lemma relies only on combinatorial arguments.

Lemma A.1. Let X ⊆ Z^d be a set of integer vectors and let b ∈ X*. If |X| > d log_2(2|X| M_X + 1), then there exists a proper subset X̃ ⊂ X such that b ∈ X̃*.

Proof. The fact that b ∈ X* means that b = Σ_{x ∈ X} λ_x x with λ_x ≥ 0. Without loss of generality we consider only those λ_x that are non-zero. This way we represent b as b = Σ_{x ∈ X} λ_x x with λ_x > 0.

Let S be a subset of X and consider the vector s = Σ_{x ∈ S} x. The vector s is bounded: ||s||_∞ ≤ |X| M_X, and its coordinates are in the range {−|X| M_X, ..., |X| M_X}. This implies that the number of different vectors that are representable as the sum of the vectors of some S ⊆ X is bounded by (2|X| M_X + 1)^d.

The lemma assumption implies that 2^{|X|} > (2|X| M_X + 1)^d. As 2^{|X|} is the number of subsets of X, there are two different subsets, A and B, such that Σ_{x ∈ A} x = Σ_{x ∈ B} x. If A and B are not disjoint, we define new sets A′ = A \ (A ∩ B) and B′ = B \ (A ∩ B); note that Σ_{x ∈ A′} x = Σ_{x ∈ B′} x still holds.

Up to this point, using combinatorial arguments, we showed that there are two disjoint subsets A, B ⊆ X such that Σ_{x ∈ A} x = Σ_{x ∈ B} x. Having A and B and the assumption b = Σ_{x ∈ X} λ_x x with λ_x > 0, we define the value λ = min{λ_x | x ∈ A}. We can now rewrite b using λ:

b = Σ_{x ∈ X} λ_x x
  = Σ_{x ∈ X\A} λ_x x + Σ_{x ∈ A} λ_x x
  = Σ_{x ∈ X\A} λ_x x + Σ_{x ∈ A} (λ_x − λ) x + λ Σ_{x ∈ A} x
  = Σ_{x ∈ X\A} λ_x x + Σ_{x ∈ A} (λ_x − λ) x + λ Σ_{x ∈ B} x

We define the new coefficients μ_x as follows:

μ_x = λ_x        if x ∈ X \ (A ∪ B),
μ_x = λ_x − λ    if x ∈ A,
μ_x = λ_x + λ    if x ∈ B.

Note that at least one of the μ_x is zero and all of them are non-negative. We can further rewrite b as:

b = Σ_{x ∈ X\(A∪B)} λ_x x + Σ_{x ∈ B} λ_x x + Σ_{x ∈ A} (λ_x − λ) x + λ Σ_{x ∈ B} x
  = Σ_{x ∈ X\(A∪B)} λ_x x + Σ_{x ∈ A} (λ_x − λ) x + Σ_{x ∈ B} (λ_x + λ) x
  = Σ_{x ∈ X} μ_x x

In this way we have shown that b is a linear combination of vectors in X in which at least one vector does not appear. We define the set X̃ = {x ∈ X | μ_x > 0}. Note that X̃ ⊂ X and b ∈ X̃*, which proves the lemma.

This result is crucial for the proof of Theorem 2.19:

Proof (Theorem 2.19). Let X̃ be a minimal subset such that b ∈ X̃* and let us assume that |X̃| > 2d log_2(4dM). In the following Lemma A.2 we show that |X̃| > 2d log_2(4dM) implies |X̃| > d log_2(2|X̃| M + 1); since M_X̃ ≤ M, the hypothesis of Lemma A.1 then holds for X̃. We apply Lemma A.1 and conclude that there exists X_1, a proper subset of X̃, such that b ∈ X_1*. This contradicts the minimality of X̃, so we conclude that |X̃| ≤ 2d log_2(4dM).

The last missing piece in the proof is to show that, if

|X| > 2d log_2(4dM_X),

then |X| > d log_2(2|X| M_X + 1). We prove this in the following lemma.

Lemma A.2. Let X ⊆ Z^d be a set of integer vectors and let M_X = max_{x ∈ X} ||x||_∞. Suppose that |X| > 2d log_2(4dM_X). Then |X| > d log_2(2|X| M_X + 1).

Proof. From the fact that |X| > 2d log_2(4dM_X), simple rewriting yields

M_X < 2^{|X|/(2d)} / (4d).

We multiply both sides by 2|X| and then add 1, obtaining 2|X| M_X + 1 < (|X|/(2d)) · 2^{|X|/(2d)} + 1. Since 1 ≤ 2^{|X|/(2d)}, we obtain:

2|X| M_X + 1 < 2^{|X|/(2d)} (|X|/(2d) + 1)

We first apply the log_2 function to both sides, and then multiply by d. This results in d log_2(2|X| M_X + 1) < |X|/2 + d log_2(|X|/(2d) + 1). For every number y ≥ 1, log_2(y + 1) ≤ y. From the fact that |X| > 2d log_2(4dM_X), we conclude that |X| ≥ 2d, i.e. |X|/(2d) ≥ 1. Combining all these facts together, we obtain:

d log_2(2|X| M_X + 1) < |X|/2 + d · |X|/(2d) = |X|

This completes the proof of Theorem 2.19.


Bibliography

[Anand et al.(2008)Anand, Godefroid, and Tillmann] Saswat Anand, Patrice Godefroid, and Nikolai Tillmann. Demand-driven compositional symbolic execution. In Tools and Algorithms for the Construction and Analysis of Systems, 2008.

[Andrews(2002)] Peter B. Andrews. An Introduction to Mathematical Logic and Type Theory: To Truth Through Proof. Springer (Kluwer), 2nd edition, 2002. ISBN 1402007639.

[Asarin et al.(1995)Asarin, Maler, and Pnueli] Eugene Asarin, Oded Maler, and Amir Pnueli. Symbolic controller synthesis for discrete and timed systems. In Hybrid Systems II, pages 1–20, 1995.

[Baader and Nipkow(1998)] Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.

[Bachmair and Ganzinger(2001a)] Leo Bachmair and Harald Ganzinger. Resolution theorem proving. In Handbook of Automated Reasoning, pages 19–99. Elsevier, 2001a.

[Bachmair and Ganzinger(2001b)] Leo Bachmair and Harald Ganzinger. Resolution theorem proving. In Handbook of Automated Reasoning, pages 19–99. MIT Press, 2001b.

[Ball et al.(2001)Ball, Majumdar, Millstein, and Rajamani] Thomas Ball, Rupak Majumdar, Todd Millstein, and Sriram K. Rajamani. Automatic predicate abstraction of C programs. In Proc. ACM PLDI, 2001.

[Ball et al.(2002)Ball, Podelski, and Rajamani] Thomas Ball, Andreas Podelski, and Sriram K. Rajamani. Relative completeness of abstraction refinement for software model checking. In TACAS’02, volume 2280 of LNCS, page 158, 2002.

[Ball et al.(2004)Ball, Cook, Levin, and Rajamani] Thomas Ball, Byron Cook, Vladimir Levin, and Sriram K. Rajamani. Slam and static driver verifier: Technology transfer of formal methods inside microsoft. Technical report, MSR-TR-2004-08, 2004.

[Balser et al.(2000)Balser, Reif, Schellhorn, Stenzel, and Thums] M. Balser, W. Reif, G. Schellhorn, K. Stenzel, and A. Thums. Formal system development with KIV. In FASE, number 1783 in LNCS, 2000.


[Banâtre and Métayer(1993)] Jean-Pierre Banâtre and Daniel Le Métayer. Programming by multiset transformation. Commun. ACM, 36(1):98–111, 1993. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/151233.151242.

[Banerjee(1988)] Utpal K. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Norwell, MA, USA, 1988. ISBN 0898382890.

[Barnett et al.(2004a)Barnett, DeLine, Fähndrich, Leino, and Schulte] Mike Barnett, Robert DeLine, Manuel Fähndrich, K. Rustan M. Leino, and Wolfram Schulte. Verification of object-oriented programs with invariants. Journal of Object Technology, 3(6):27–56, 2004a.

[Barnett et al.(2004b)Barnett, Leino, and Schulte] Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The Spec# programming system: An overview. In CASSIS: Int. Workshop on Construction and Analysis of Safe, Secure and Interoperable Smart devices, 2004b.

[Barnett et al.(2005)Barnett, Chang, DeLine, Jacobs, and Leino] Mike Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K. Rustan M. Leino. Boogie: A modular reusable verifier for object-oriented programs. In FMCO, 2005.

[Barrett and Tinelli(2007)] Clark Barrett and Cesare Tinelli. CVC3. In CAV, volume 4590 of LNCS, 2007.

[Basin and Friedrich(2000)] David Basin and Stefan Friedrich. Combining WS1S and HOL. In D.M. Gabbay and M. de Rijke, editors, Frontiers of Combining Systems 2, volume 7 of Studies in Logic and Computation, pages 39–56. Research Studies Press/Wiley, Baldock, Herts, UK, February 2000.

[Berdine et al.(2006)Berdine, Cook, Distefano, and O’Hearn] Josh Berdine, Byron Cook, Dino Distefano, and Peter W. O’Hearn. Automatic termination proofs for programs with shape-shifting heaps. In CAV, pages 386–400, 2006.

[Berdine et al.(2011)Berdine, Cook, and Ishtiaq] Josh Berdine, Byron Cook, and Samin Ishtiaq. Slayer: Memory safety for systems-level code. In CAV, pages 178–183, 2011.

[Berezin et al.(2003)Berezin, Ganesh, and Dill] Sergey Berezin, Vijay Ganesh, and David L. Dill. An online proof-producing decision procedure for mixed-integer linear arithmetic. In TACAS, 2003.

[Beyer et al.(2007)Beyer, Henzinger, Jhala, and Majumdar] Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. The software model checker blast. STTT, 9(5-6): 505–525, 2007.

[Boigelot et al.(2005)Boigelot, Jodogne, and Wolper] Bernard Boigelot, Sébastien Jodogne, and Pierre Wolper. An effective decision procedure for linear arithmetic over the integers and reals. ACM Trans. Comput. Logic, 6(3):614–633, 2005. ISSN 1529-3785. doi: http://doi.acm.org/10.1145/1071596.1071601.


[Börger et al.(1997)Börger, Grädel, and Gurevich] Egon Börger, Erich Grädel, and Yuri Gurevich. The Classical Decision Problem. Springer-Verlag, 1997.

[Bouajjani et al.(2011)Bouajjani, Drăgoi, Enea, and Sighireanu] Ahmed Bouajjani, Cezara Drăgoi, Constantin Enea, and Mihaela Sighireanu. On inter-procedural analysis of programs with lists and data. In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, PLDI '11, pages 578–589, 2011.

[Boyer and Moore(1988)] R. S. Boyer and J S. Moore. Integrating decision procedures into heuristic theorem provers: A case study of linear arithmetic. In Machine Intelligence, volume 11, pages 83–124. Oxford University Press, 1988.

[Bradley and Manna(2007)] Aaron R. Bradley and Zohar Manna. The Calculus of Computation. Springer, 2007.

[Breazu-Tannen et al.(1991)Breazu-Tannen, Coquand, Gunter, and Scedrov] Val Breazu-Tannen, Thierry Coquand, Carl A. Gunter, and Andre Scedrov. Inheritance as implicit coercion. Inf. Comput., 93:172–221, July 1991. ISSN 0890-5401. doi: 10.1016/0890-5401(91)90055-7.

[Bruttomesso et al.(2008)Bruttomesso, Cimatti, Franzén, Griggio, and Sebastiani] Roberto Bruttomesso, Alessandro Cimatti, Anders Franzén, Alberto Griggio, and Roberto Sebastiani. The MathSAT 4 SMT solver. In CAV, pages 299–303, 2008.

[Bruttomesso et al.(2010)Bruttomesso, Pek, Sharygina, and Tsitovich] Roberto Bruttomesso, Edgar Pek, Natasha Sharygina, and Aliaksei Tsitovich. The OpenSMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 6015, pages 150–153, Paphos, Cyprus, 2010. Springer. URL http://dx.doi.org/10.1007/978-3-642-12002-2_12.

[Bryant(1986)] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, August 1986.

[Cannon(2011)] Aemon Cannon. Ensime. https://github.com/aemoncannon/ensime/, 2011. Last checked August 2011.

[Clarke et al.(2004)Clarke, Kroening, and Lerda] Edmund M. Clarke, Daniel Kroening, and Flavio Lerda. A tool for checking ansi-c programs. In TACAS, pages 168–176, 2004.

[Colón and Sipma(2001)] Michael Colón and Henny Sipma. Synthesis of linear ranking functions. In TACAS, pages 67–81, 2001.

[Cook et al.(2005)Cook, Podelski, and Rybalchenko] Byron Cook, Andreas Podelski, and Andrey Rybalchenko. Abstraction refinement for termination. In SAS, pages 87–101, 2005.

[Cook et al.(2006)Cook, Podelski, and Rybalchenko] Byron Cook, Andreas Podelski, and Andrey Rybalchenko. Terminator: Beyond safety. In CAV, pages 415–418, 2006.


[Cooper(1972)] D. C. Cooper. Theorem proving in arithmetic without multiplication. In B. Meltzer and D. Michie, editors, Machine Intelligence, volume 7, pages 91–100. Edinburgh University Press, 1972.

[Cormen et al.(2001)Cormen, Leiserson, Rivest, and Stein] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Cliff Stein. Introduction to Algorithms (Second Edition). MIT Press and McGraw-Hill, 2001.

[Damas and Milner(1982)] Luis Damas and Robin Milner. Principal type-schemes for func- tional programs. In POPL, 1982.

[de Moura and Bjørner(2008a)] Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337–340, 2008a. URL http://dx.doi.org/10.1007/978-3-540-78800-3_24.

[de Moura and Bjørner(2008b)] Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient smt solver. In TACAS, pages 337–340, 2008b.

[Deng and Sangiorgi(2006)] Yuxin Deng and Davide Sangiorgi. Ensuring termination by typability. Inf. Comput., 204(7):1045–1082, 2006.

[Dershowitz(1979)] Nachum Dershowitz. Orderings for term-rewriting systems. In Symposium on Foundations of Computer Science (SFCS), pages 123–131, 1979.

[Dershowitz and Manna(1979)] Nachum Dershowitz and Zohar Manna. Proving termination with multiset orderings. Commun. ACM, 22(8):465–476, 1979. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/359138.359142.

[Dewar(1979)] Robert K. Dewar. Programming by refinement, as exemplified by the SETL representation sublanguage. ACM TOPLAS, July 1979.

[Dick et al.(1990)Dick, Kalmus, and Martin] Jeremy Dick, John Kalmus, and Ursula Martin. Automating the Knuth Bendix Ordering. Acta Inf., 28(2):95–119, 1990.

[Dijkstra(1976)] Edsger W. Dijkstra. A Discipline of Programming. Prentice-Hall, Inc., 1976.

[Dutertre and de Moura(2006a)] Bruno Dutertre and Leonardo de Moura. A Fast Linear- Arithmetic Solver for DPLL(T). In CAV, volume 4144 of LNCS, 2006a.

[Dutertre and de Moura(2006b)] Bruno Dutertre and Leonardo de Moura. Integrating Sim- plex with DPLL(T). Technical Report SRI-CSL-06-01, SRI International, 2006b.

[Eisenbrand and Shmonin(2008)] Friedrich Eisenbrand and Gennady Shmonin. Parametric integer programming in fixed dimension. Math. Oper. Res., 33(4):839–850, 2008.

[Eisenbrand and Shmonin(2006)] Friedrich Eisenbrand and Gennady Shmonin. Carathéodory bounds for integer cones. Operations Research Letters, 34(5):564–568, September 2006. http://dx.doi.org/10.1016/j.orl.2005.09.008.


[Emir et al.(2007)Emir, Odersky, and Williams] Burak Emir, Martin Odersky, and John Williams. Matching objects with patterns. In ECOOP, 2007.

[Feferman and Vaught(1959)] S. Feferman and R. L. Vaught. The first order properties of products of algebraic systems. Fundamenta Mathematicae, 47:57–103, 1959.

[Ferrante and Rackoff(1979)] Jeanne Ferrante and Charles W. Rackoff. The Computational Complexity of Logical Theories, volume 718 of Lecture Notes in Mathematics. Springer- Verlag, 1979.

[Filliâtre and Marché(2007)] Jean-Christophe Filliâtre and Claude Marché. The Why/Krakatoa/Caduceus platform for deductive program verification. In CAV, pages 173–177, 2007. URL http://www.lri.fr/~filliatr/ftp/publis/cav07.pdf.

[Flanagan et al.(2002)Flanagan, Leino, Lilibridge, Nelson, Saxe, and Stata] Cormac Flanagan, K. Rustan M. Leino, Mark Lilibridge, Greg Nelson, James B. Saxe, and Raymie Stata. Extended Static Checking for Java. In ACM Conf. Programming Language Design and Implementation (PLDI), 2002.

[Floyd(1967)] Robert W. Floyd. Assigning meanings to programs. In Proc. Amer. Math. Soc. Symposia in Applied Mathematics, volume 19, pages 19–31, 1967.

[Fontaine(2007)] Pascal Fontaine. Combinations of theories and the Bernays-Schönfinkel-Ramsey class. In VERIFY, 2007.

[Fontaine(2009)] Pascal Fontaine. Combinations of decidable fragments of first-order logic. In FroCoS, 2009.

[Ford and Havas(1996)] David Ford and George Havas. A new algorithm and refined bounds for extended gcd computation. In ANTS, pages 145–150, 1996.

[Gabbay and Ohlbach(1992)] Dov M. Gabbay and Hans Jürgen Ohlbach. Quantifier elimination in second-order predicate logic. In Bernhard Nebel, Charles Rich, and William Swartout, editors, Principles of Knowledge Representation and Reasoning (KR92), pages 425–435. Morgan Kaufmann Publishers, Inc., 1992.

[Ge et al.(2007)Ge, Barrett, and Tinelli] Yeting Ge, Clark Barrett, and Cesare Tinelli. Solving quantified verification conditions using satisfiability modulo theories. In CADE, 2007.

[Ghilardi(2005)] Silvio Ghilardi. Model theoretic methods in combined constraint satisfiability. Journal of Automated Reasoning, 33(3-4):221–249, 2005.

[Giles and Pulleyblank(1979)] F. R. Giles and W. R. Pulleyblank. Total dual integrality and integer polyhedra. Linear Algebra and its Applications, 25:191 – 196, 1979. ISSN 0024- 3795.

[Ginsburg and Spanier(1964)] S. Ginsburg and E. Spanier. Bounded algol-like languages. Transactions of the American Mathematical Society, 113(2):333–368, 1964.


[Ginsburg and Spanier(1966)] S. Ginsburg and E. Spanier. Semigroups, Presburger formulas and languages. Pacific Journal of Mathematics, 16(2):285–296, 1966.

[Givan et al.(2002)Givan, McAllester, Witty, and Kozen] Robert Givan, David McAllester, Carl Witty, and Dexter Kozen. Tarskian set constraints. Inf. Comput., 174(2):105–131, 2002. ISSN 0890-5401. doi: http://dx.doi.org/10.1006/inco.2001.2973.

[Grädel et al.(1997)Grädel, Otto, and Rosen] Erich Grädel, Martin Otto, and Eric Rosen. Two-variable logic with counting is decidable. In LICS, 1997. URL http://www-mgi.informatik.rwth-aachen.de/Publications/pub/graedel/gorc2.ps.

[Green(1969)] Cordell Green. Application of theorem proving to problem solving. In Proc. Int’l. Joint Conf. Artificial Intelligence, pages 219–239. Morgan Kaufmann, 1969.

[Gvero et al.(2011)Gvero, Kuncak, and Piskac] Tihomir Gvero, Viktor Kuncak, and Ruzica Piskac. Interactive synthesis of code snippets. In CAV, pages 418–423, 2011.

[Hodges(1993)] Wilfrid Hodges. Model Theory, volume 42 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1993.

[Ihlemann et al.(2008)Ihlemann, Jacobs, and Sofronie-Stokkermans] Carsten Ihlemann, Swen Jacobs, and Viorica Sofronie-Stokkermans. On local reasoning in verification. In TACAS, pages 265–281, 2008.

[Jacobs(2009)] Swen Jacobs. Incremental instance generation in local reasoning. In CAV, pages 368–382, 2009.

[Jaffar and Maher(1994)] Joxan Jaffar and Michael J. Maher. Constraint logic programming: A survey. J. Log. Program., 19/20:503–581, 1994.

[Jha et al.(2010)Jha, Gulwani, Seshia, and Tiwari] Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, and Ashish Tiwari. Oracle-guided component-based program synthesis. In ICSE (1), 2010.

[Jobstmann and Bloem(2006)] Barbara Jobstmann and Roderick Bloem. Optimizations for LTL synthesis. In FMCAD, 2006.

[Jobstmann et al.(2007)Jobstmann, Galler, Weiglhofer, and Bloem] Barbara Jobstmann, Ste- fan Galler, Martin Weiglhofer, and Roderick Bloem. Anzu: A tool for property synthesis. In CAV, 2007.

[Jones et al.(1993)Jones, Gomard, and Sestoft] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial Evaluation and Automatic Program Generation. http://www.dina.kvl.dk/~sestoft/pebook/pebook.html, 1993.

[Jones and group of authors(2010)] Simon Peyton Jones and group of authors. Haskell 98 language and libraries: The revised report, November 2010. URL http://haskell.org/onlinereport.


[Jovanovic and Barrett(2010)] Dejan Jovanovic and Clark Barrett. Polite theories revisited. In LPAR (Yogyakarta), pages 402–416, 2010.

[Klaedtke(2003)] Felix Klaedtke. On the automata size for presburger arithmetic. Technical Report 186, Institute of Computer Science at Freiburg University, 2003.

[Klaedtke and Rueß(2003)] Felix Klaedtke and Harald Rueß. Monadic second-order logics with cardinalities. In ICALP, volume 2719 of LNCS, 2003.

[Klaedtke and Rueß(2002)] Felix Klaedtke and Harald Rueß. Parikh automata and monadic second-order logics with linear cardinality constraints. Technical Report 177, Institute of Computer Science at Freiburg University, 2002.

[Klarlund and Møller(2001)] Nils Klarlund and Anders Møller. MONA Version 1.4 User Manual. BRICS Notes Series NS-01-1, Department of Computer Science, University of Aarhus, January 2001.

[Klarlund and Schwartzbach(1993)] Nils Klarlund and Michael I. Schwartzbach. Graph types. In Proc. 20th ACM POPL, Charleston, SC, 1993.

[Köksal et al.(2011)Köksal, Kuncak, and Suter] Ali Köksal, Viktor Kuncak, and Philippe Suter. Scala to the power of z3: Integrating smt and programming. In Nikolaj Bjørner and Viorica Sofronie-Stokkermans, editors, Automated Deduction – CADE-23, volume 6803 of Lecture Notes in Computer Science, pages 400–406. Springer Berlin / Heidelberg, 2011.

[Korovin(2009)] Konstantin Korovin. Instantiation-based automated reasoning: From theory to practice. In CADE, pages 163–166, 2009.

[Korovin and Voronkov(2001)] Konstantin Korovin and Andrei Voronkov. Knuth-Bendix constraint solving is NP-complete. In ICALP, pages 979–992, 2001.

[Kroening and Strichman(2008)] Daniel Kroening and Ofer Strichman. Decision Procedures – an Algorithmic Point of View. EATCS. Springer, 2008.

[Krstic et al.(2007)Krstic, Goel, Grundy, and Tinelli] Sava Krstic, Amit Goel, Jim Grundy, and Cesare Tinelli. Combined satisfiability modulo parametric theories. In TACAS, volume 4424 of LNCS, pages 602–617, 2007.

[Kukula and Shiple(2000)] James H. Kukula and Thomas R. Shiple. Building circuits from relations. In CAV, 2000.

[Kuncak(2007)] Viktor Kuncak. Modular Data Structure Verification. PhD thesis, EECS Department, Massachusetts Institute of Technology, February 2007.

[Kuncak and Rinard(2003)] Viktor Kuncak and Martin Rinard. On the theory of structural subtyping. Technical Report 879, LCS, Massachusetts Institute of Technology, 2003.


[Kuncak and Rinard(2007)] Viktor Kuncak and Martin Rinard. Towards efficient satisfiability checking for Boolean Algebra with Presburger Arithmetic. In CADE-21, 2007.

[Kuncak and Wies(2009)] Viktor Kuncak and Thomas Wies. On set-driven combination of logics and verifiers. Technical Report LARA-REPORT-2009-001, EPFL, February 2009.

[Kuncak et al.(2005)Kuncak, Nguyen, and Rinard] Viktor Kuncak, Hai Huu Nguyen, and Martin Rinard. An algorithm for deciding BAPA: Boolean Algebra with Presburger Arithmetic. In 20th International Conference on Automated Deduction, CADE-20, Tallinn, Estonia, July 2005.

[Kuncak et al.(2006)Kuncak, Nguyen, and Rinard] Viktor Kuncak, Hai Huu Nguyen, and Martin Rinard. Deciding Boolean Algebra with Presburger Arithmetic. J. of Automated Reasoning, 2006. http://dx.doi.org/10.1007/s10817-006-9042-1.

[Kuncak et al.(2010a)Kuncak, Mayer, Piskac, and Suter] Viktor Kuncak, Mikaël Mayer, Ruzica Piskac, and Philippe Suter. Comfusy: A tool for complete functional synthesis. In CAV, pages 430–433, 2010a.

[Kuncak et al.(2010b)Kuncak, Mayer, Piskac, and Suter] Viktor Kuncak, Mikaël Mayer, Ruzica Piskac, and Philippe Suter. Complete functional synthesis. In PLDI, 2010b.

[Kuncak et al.(2010c)Kuncak, Piskac, and Suter] Viktor Kuncak, Ruzica Piskac, and Philippe Suter. Ordered sets in the calculus of data structures. In CSL, pages 34–48, 2010c.

[Kuncak et al.(2010d)Kuncak, Piskac, Suter, and Wies] Viktor Kuncak, Ruzica Piskac, Philippe Suter, and Thomas Wies. Building a calculus of data structures. In Verification, Model Checking, and Abstract Interpretation (VMCAI), 2010d.

[Lahiri and Seshia(2004)] Shuvendu K. Lahiri and Sanjit A. Seshia. The UCLID decision pro- cedure. In CAV’04, 2004.

[Lahiri et al.(2009)Lahiri, Qadeer, Galeotti, Voung, and Wies] Shuvendu K. Lahiri, Shaz Qadeer, Juan P. Galeotti, Jan W. Voung, and Thomas Wies. Intra-module inference. In CAV, pages 493–508, 2009.

[Lee et al.(2001)Lee, Jones, and Ben-Amram] Chin Soon Lee, Neil D. Jones, and Amir M. Ben-Amram. The size-change principle for program termination. In POPL, pages 81–92, 2001.

[Lewis(1980)] Harry R. Lewis. Complexity results for classes of quantificational formulas. J. Comput. Syst. Sci., 21(3):317–353, 1980.

[Lugiez(2005)] D. Lugiez. Multitree automata that count. Theor. Comput. Sci., 333(1-2):225–263, 2005. ISSN 0304-3975. doi: http://dx.doi.org/10.1016/j.tcs.2004.10.023.


[Lugiez and Zilio(2002)] Denis Lugiez and Silvano Dal Zilio. Multitrees Automata, Presburger’s Constraints and Tree Logics. Research report 08-2002, LIF, Marseille, France, June 2002. http://www.lif-sud.univ-mrs.fr/Rapports/08-2002.html.

[Luo(2008)] Zhaohui Luo. Coercions in a polymorphic type system. Mathematical Structures in Computer Science, 18(4):729–751, 2008.

[Mandelin et al.(2005)Mandelin, Xu, Bodík, and Kimelman] David Mandelin, Lin Xu, Rastislav Bodík, and Doug Kimelman. Jungloid mining: helping to navigate the api jungle. In PLDI, 2005.

[Manna and Waldinger(1980)] Zohar Manna and Richard Waldinger. A deductive approach to program synthesis. ACM Trans. Program. Lang. Syst., 2(1):90–121, 1980. ISSN 0164-0925. doi: http://doi.acm.org/10.1145/357084.357090.

[Manna and Waldinger(1971)] Zohar Manna and Richard J. Waldinger. Toward automatic program synthesis. Commun. ACM, 14(3):151–165, 1971.

[Martín-Mateos et al.(2005)Martín-Mateos, Ruiz-Reina, Alonso, and Hidalgo] Francisco- Jesús Martín-Mateos, José-Luis Ruiz-Reina, José-Antonio Alonso, and María-José Hidalgo. Proof Pearl: A Formal Proof of Higman’s Lemma in ACL2. In TPHOLs, pages 358–372, 2005.

[Matiyasevich(1970)] Yuri V. Matiyasevich. Enumerable sets are Diophantine. Soviet Math. Doklady, 11(2):354–357, 1970.

[McLaughlin et al.(2006)McLaughlin, Barrett, and Ge] Sean McLaughlin, Clark Barrett, and Yeting Ge. Cooperating theorem provers: A case study combining HOL-Light and CVC Lite. In PDPAR, volume 144(2) of ENTCS, 2006.

[Monniaux(2009)] David P. Monniaux. Automatic modular abstractions for linear constraints. In Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 140–151, 2009.

[Moskal(2009)] Michał Moskal. Satisfiability Modulo Software. PhD thesis, University of Wrocław, 2009.

[Narendran et al.(1998)Narendran, Rusinowitch, and Verma] Paliath Narendran, Michaël Rusinowitch, and Rakesh M. Verma. RPO Constraint Solving is in NP. In CSL, pages 385–398, 1998.

[Nelson and Oppen(1979)] Greg Nelson and Derek C. Oppen. Simplification by cooperating decision procedures. ACM TOPLAS, 1(2):245–257, 1979. ISSN 0164-0925. doi: http://doi.acm.org/10.1145/357073.357079.

[Nelson and Oppen(1980)] Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. Journal of the ACM (JACM), 27(2):356–364, 1980. ISSN 0004-5411. doi: http://doi.acm.org/10.1145/322186.322198.


[Nguyen et al.(2007)Nguyen, David, Qin, and Chin] Huu Hai Nguyen, Cristina David, Shengchao Qin, and Wei-Ngan Chin. Automated verification of shape, size and bag properties via separation logic. In VMCAI, 2007.

[Nieuwenhuis(1993)] Robert Nieuwenhuis. Simple LPO constraint solving methods. Inf. Process. Lett., 47(2):65–69, 1993. ISSN 0020-0190. doi: http://dx.doi.org/10.1016/0020-0190(93)90226-Y.

[Nipkow(2008)] Tobias Nipkow. Linear quantifier elimination. In IJCAR, 2008.

[Nipkow et al.(2005)Nipkow, Wenzel, Paulson, and Voelker] Tobias Nipkow, Markus Wenzel, Lawrence C Paulson, and Norbert Voelker. Multiset theory version 1.30 (Isabelle distribution). http://isabelle.in.tum.de/dist/library/HOL/Library/Multiset.html, 2005.

[Odersky et al.(2008)Odersky, Spoon, and Venners] Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala: a comprehensive step-by-step guide. Artima Press, 2008.

[Owre et al.(1992)Owre, Rushby, and Shankar] S. Owre, J. M. Rushby, and N. Shankar. PVS: A prototype verification system. In Deepak Kapur, editor, 11th CADE, volume 607 of LNAI, pages 748–752, jun 1992.

[Pacholski et al.(2000)Pacholski, Szwast, and Tendera] Leszek Pacholski, Wieslaw Szwast, and Lidia Tendera. Complexity results for first-order two-variable logic with counting. SIAM J. on Computing, 29(4):1083–1117, 2000.

[Papadimitriou(1981)] Christos H. Papadimitriou. On the complexity of integer programming. J. ACM, 28(4):765–768, 1981. ISSN 0004-5411. doi: http://doi.acm.org/10.1145/322276.322287.

[Parikh(1966)] Rohit J. Parikh. On context-free languages. J. ACM, 13(4):570–581, 1966. ISSN 0004-5411. doi: http://doi.acm.org/10.1145/321356.321364.

[Piskac and Kuncak(2008a)] Ruzica Piskac and Viktor Kuncak. Decision procedures for multi- sets with cardinality constraints. In VMCAI, number 4905 in LNCS, 2008a.

[Piskac and Kuncak(2008b)] Ruzica Piskac and Viktor Kuncak. Fractional collections with cardinality bounds. In Computer Science Logic (CSL), 2008b.

[Piskac and Kuncak(2008c)] Ruzica Piskac and Viktor Kuncak. Linear arithmetic with stars. In CAV, 2008c.

[Piskac and Kuncak(2010)] Ruzica Piskac and Viktor Kuncak. MUNCH - automated reasoner for sets and multisets. In IJCAR, number 6173 in LNCS, pages 149–155, 2010.

[Piskac and Wies(2011)] Ruzica Piskac and Thomas Wies. Decision procedures for automat- ing termination proofs. In VMCAI, pages 371–386, 2011.


[Piterman et al.(2006)Piterman, Pnueli, and Sa’ar] Nir Piterman, Amir Pnueli, and Yaniv Sa’ar. Synthesis of reactive(1) designs. In VMCAI, 2006.

[Pnueli and Rosner(1989)] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In POPL '89: Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 179–190, New York, NY, USA, 1989. ACM. ISBN 0-89791-294-2. doi: http://doi.acm.org/10.1145/75277.75293.

[Podelski and Rybalchenko(2004a)] Andreas Podelski and Andrey Rybalchenko. A complete method for synthesis of linear ranking functions. In VMCAI’04, 2004a.

[Podelski and Rybalchenko(2004b)] Andreas Podelski and Andrey Rybalchenko. Transition invariants. In LICS’04, 2004b.

[Podelski and Rybalchenko(2007a)] Andreas Podelski and Andrey Rybalchenko. Armc: The logical choice for software model checking with abstraction refinement. In PADL, pages 245–259, 2007a.

[Podelski and Rybalchenko(2007b)] Andreas Podelski and Andrey Rybalchenko. Transition predicate abstraction and fair termination. ACM TOPLAS, 29(3):15, 2007b.

[Podelski and Wies(2010)] Andreas Podelski and Thomas Wies. Counterexample-guided focus. In POPL, pages 249–260, 2010.

[Pottier(1991)] Loïc Pottier. Minimal solutions of linear diophantine systems: Bounds and algorithms. In RTA, volume 488 of LNCS, 1991.

[Pratt-Hartmann(2005)] Ian Pratt-Hartmann. Complexity of the two-variable fragment with counting quantifiers. Journal of Logic, Language and Information, 14(3):369–395, 2005.

[Pratt-Hartmann(2004)] Ian Pratt-Hartmann. Complexity of the two-variable fragment with (binary-coded) counting quantifiers. CoRR, cs.LO/0411031, 2004.

[Presburger(1929)] Mojzesz Presburger. Über die vollständigkeit eines gewissen systems der aritmethik ganzer zahlen, in welchem die addition als einzige operation hervortritt. In Comptes Rendus du premier Congrès des Mathématiciens des Pays slaves, Warsawa, pages 92–101, 1929.

[Pugh(1992)] William Pugh. A practical algorithm for exact array dependence analysis. Commun. ACM, 35(8):102–114, 1992. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/135226.135233.

[Ramsey(1930)] F. P. Ramsey. On a problem of formal logic. Proc. London Math. Soc., s2-30:264–286, 1930. doi:10.1112/plms/s2-30.1.264.

[Rehof and Urzyczyn(2011)] Jakob Rehof and Pawel Urzyczyn. Finite combinatory logic with intersection types. In TLCA, pages 169–183, 2011.


[Reynolds(2002)] John C. Reynolds. Separation logic: a logic for shared mutable data structures. In 17th LICS, pages 55–74, 2002.

[Reynolds(1980)] John C. Reynolds. Using category theory to design implicit conversions and generic operators. In Semantics-Directed Compiler Generation, pages 211–258, 1980.

[Riazanov and Voronkov(2002)] Alexandre Riazanov and Andrei Voronkov. The design and implementation of vampire. AI Commun., 15(2-3):91–110, 2002.

[Robinson(1965)] J. A. Robinson. A machine-oriented logic based on the resolution principle. J. ACM, 12, January 1965.

[Sahavechaphan and Claypool(2006)] Naiyana Sahavechaphan and Kajal Claypool. Xsnippet: mining for sample code. In OOPSLA, 2006. ISBN 1-59593-348-4.

[Schrijver(1998)] Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1998.

[Schulz(2002)] Stephan Schulz. E – A Brainiac Theorem Prover. Journal of AI Communications, 15(2/3):111–126, 2002.

[Schwartz(1973)] J. T. Schwartz. On programming: An interim report on the SETL project. Technical report, Courant Institute, New York, 1973.

[Sekar et al.(1995)Sekar, Ramesh, and Ramakrishnan] R.C. Sekar, R. Ramesh, and I.V. Ramakrishnan. Adaptive pattern matching. SIAM Journal on Computing, 24:1207–1234, December 1995.

[Sharir(1982)] Micha Sharir. Some observations concerning formal differentiation of set theoretic expressions. Transactions on Programming Languages and Systems, 4(2), April 1982.

[Sofronie-Stokkermans(2005)] Viorica Sofronie-Stokkermans. Hierarchic reasoning in local theory extensions. In CADE, pages 219–234, 2005.

[Sofronie-Stokkermans and Ihlemann(2007)] Viorica Sofronie-Stokkermans and Carsten Ihlemann. Automated reasoning in some local extensions of ordered structures. In ISMVL, 2007.

[Solar-Lezama et al.(2006)Solar-Lezama, Tancau, Bodík, Seshia, and Saraswat] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodík, Sanjit A. Seshia, and Vijay A. Saraswat. Combinatorial sketching for finite programs. In ASPLOS, 2006.

[Solar-Lezama et al.(2007)Solar-Lezama, Arnold, Tancau, Bodík, Saraswat, and Seshia] Armando Solar-Lezama, Gilad Arnold, Liviu Tancau, Rastislav Bodík, Vijay A. Saraswat, and Sanjit A. Seshia. Sketching stencils. In PLDI, 2007.


[Solar-Lezama et al.(2008)Solar-Lezama, Jones, and Bodík] Armando Solar-Lezama, Christopher Grant Jones, and Rastislav Bodík. Sketching concurrent data structures. In PLDI, 2008.

[Srivastava et al.(2010)Srivastava, Gulwani, and Foster] Saurabh Srivastava, Sumit Gulwani, and Jeff Foster. From program verification to program synthesis. In POPL, 2010.

[Statman(1979)] Richard Statman. Intuitionistic propositional logic is polynomial-space complete. Theoretical Computer Science, 9(1):67 – 72, 1979.

[Suter et al.(2010)Suter, Dotta, and Kuncak] Philippe Suter, Mirco Dotta, and Viktor Kuncak. Decision procedures for algebraic data types with abstractions. In 37th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), 2010.

[Syme et al.(2007)Syme, Granicz, and Cisternino] Don Syme, Adam Granicz, and Antonio Cisternino. Expert F#. Apress, 2007.

[Thatcher and Wright(1968)] J. W. Thatcher and J. B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory, 2(1):57–81, August 1968.

[Tinelli and Ringeissen(2003)] Cesare Tinelli and Christophe Ringeissen. Unions of non-disjoint theories and combinations of satisfiability procedures. Theoretical Computer Science, 290(1):291–353, January 2003. URL ftp://ftp.cs.uiowa.edu/pub/tinelli/papers/TinRin-TCS-03.ps.gz.

[Tinelli and Zarba(2003)] Cesare Tinelli and Calogero Zarba. Combining non-stably infinite theories. In I. Dahn and L. Vigneron, editors, Proceedings of the 4th International Workshop on First Order Theorem Proving, FTP’03 (Valencia, Spain), volume 86.1 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 2003.

[Tinelli and Zarba(2005)] Cesare Tinelli and Calogero Zarba. Combining nonstably infinite theories. Journal of Automated Reasoning, 34(3), 2005.

[Turing(1949)] Alan M. Turing. Checking a large routine. In Report on a Conference on High Speed Automatic Computation, June 1949, pages 67–69. University Mathematical Laboratory, Cambridge University, 1949.

[Typesafe(2011)] Typesafe. The Scala programing language. http://www.scala-lang.org, 2011. Last accessed August 2011.

[Vechev et al.(2007)Vechev, Yahav, Bacon, and Rinetzky] Martin T. Vechev, Eran Yahav, David F. Bacon, and Noam Rinetzky. CGCExplorer: a semi-automated search procedure for provably correct concurrent collectors. In PLDI, pages 456–467, 2007. ISBN 978-1-59593-633-2. doi: http://doi.acm.org/10.1145/1250734.1250787.

[Vechev et al.(2009)Vechev, Yahav, and Yorsh] Martin T. Vechev, Eran Yahav, and Greta Yorsh. Inferring synchronization under limited observability. In TACAS, 2009.


[Venkataraman(1987)] K. N. Venkataraman. Decidability of the purely existential fragment of the theory of term algebras. Journal of the ACM (JACM), 34(2):492–510, 1987. ISSN 0004-5411. doi: http://doi.acm.org/10.1145/23005.24037.

[Weidenbach et al.(2009)Weidenbach, Dimova, Fietzke, Kumar, Suda, and Wischnewski] Christoph Weidenbach, Dilyana Dimova, Arnaud Fietzke, Rohit Kumar, Martin Suda, and Patrick Wischnewski. Spass version 3.5. In Renate Schmidt, editor, Automated Deduction – CADE-22, volume 5663 of Lecture Notes in Computer Science, pages 140–145. Springer Berlin / Heidelberg, 2009.

[Weispfenning(1997)] Volker Weispfenning. Complexity and uniformity of elimination in presburger arithmetic. In Proceedings of the 1997 international symposium on Symbolic and algebraic computation, pages 48–53. ACM Press, 1997. ISBN 0-89791-875-4. doi: http://doi.acm.org/10.1145/258726.258746.

[Wies(2009)] Thomas Wies. Symbolic Shape Analysis. PhD thesis, University of Freiburg, 2009.

[Wies et al.(2006)Wies, Kuncak, Lam, Podelski, and Rinard] Thomas Wies, Viktor Kuncak, Patrick Lam, Andreas Podelski, and Martin Rinard. Field constraint analysis. In Proc. Int. Conf. Verification, Model Checking, and Abstract Interpretation, 2006.

[Wies et al.(2009)Wies, Piskac, and Kuncak] Thomas Wies, Ruzica Piskac, and Viktor Kuncak. Combining theories with shared set operations. In FroCoS: Frontiers in Combining Systems, 2009.

[Yessenov et al.(2010)Yessenov, Piskac, and Kuncak] Kuat Yessenov, Ruzica Piskac, and Viktor Kuncak. Collections, cardinalities, and relations. In Verification, Model Checking, and Abstract Interpretation (VMCAI), 2010.

[Zadeh(1965)] L. A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.

[Zarba(2002a)] Calogero G. Zarba. Combining multisets with integers. In CADE-18, 2002a.

[Zarba(2002b)] Calogero G. Zarba. A tableau calculus for combining non-disjoint theories. In TABLEAUX ’02: Proc. Int. Conf. Automated Reasoning with Analytic Tableaux and Related Methods, pages 315–329, 2002b.

[Zarba(2004)] Calogero G. Zarba. A quantifier elimination algorithm for a fragment of set theory involving the cardinality operator. In 18th International Workshop on Unification, 2004.

[Zarba(2005)] Calogero G. Zarba. Combining sets with cardinals. J. of Automated Reasoning, 34(1), 2005.

[Zee et al.(2008)Zee, Kuncak, and Rinard] Karen Zee, Viktor Kuncak, and Martin Rinard. Full functional verification of linked data structures. In ACM Conf. Programming Language Design and Implementation (PLDI), 2008.


[Zhang et al.(2005)Zhang, Sipma, and Manna] Ting Zhang, Henny B. Sipma, and Zohar Manna. The decidability of the first-order theory of Knuth-Bendix order. In CADE, pages 131–148, 2005.


Curriculum Vitae Ruzica Piskac

address: EPFL, School of Computer & Communications Sciences, Station 14, CH-1015 Lausanne, Switzerland
phone: +41 21 69 31221
fax: +41 21 69 36660
email: [email protected]
web: http://icwww.epfl.ch/~piskac/

Research Interests

Formal methods, decision procedures for software verification and code synthesis, automated reasoning, theorem proving, semantic web

Education

[09/2007 – 12/2011] Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
– PhD student at EPFL, Laboratory for Automated Reasoning and Analysis (LARA) group
– thesis title: "Decision Procedures for Program Synthesis and Verification"
– thesis advisor: Viktor Kuncak

[10/2002 – 08/2006] Max-Planck-Institut für Informatik, Saarbrücken, Germany
– first a master student and afterwards a PhD student at MPII, Programming Logics Group; advisors: Harald Ganzinger†, Andreas Podelski, Hans de Nivelle
– Master's thesis: "Formal Correctness of Result Checking for Priority Queues"
– finished the Master program with the best grades (1.0/1.0)

[09/1995 – 12/2000] University of Zagreb, Croatia
– student of mathematics at the University of Zagreb, Croatia; advisor: Robert Manger
– Diploma thesis: "Parallel Algorithms for Sorting and Merging" (in Croatian)
– Dipl.Ing. degree with Honors, graduated as the first in a class of 81 students

Work Experience

[June – September 2008] Microsoft Research, Redmond, WA, USA
– summer internship, mentor: Nikolaj Bjørner

[08/2006 – 08/2007] Digital Enterprise Research Institute (DERI), Innsbruck, Austria
– junior researcher at DERI, Intelligent Reasoning for Integrated Systems (IRIS) cluster
– research topic: developing a reasoner for a language that uniformly supports ontologies and rules
– working in the following EU- or Austrian-funded projects: SEKT (EU IST IP 2003-506826), RW2 and Knowledge Web FP6-507482

[12/2000 – 09/2002] University of Zagreb, Croatia
– junior researcher and teaching assistant at the University of Zagreb
– collaboration with CERN: participating in the ALICE project (A Large Ion Collider Experiment)
– managing a cluster for scientific computing

[06/2000 – 09/2000] Tourist Office Dubrovnik, Croatia
– certified tourist guide

Honors and Awards

– The Google Anita Borg Memorial Scholarship 2010 winner
– Grace Hopper Conference attendance full scholarship recipient, 2008 and 2007
– [2002 – 2006] IMPRS Fellowship for both Master and PhD studies in Computer Science, Max-Planck-Institut für Informatik, Saarbrücken, Germany
– Won the most prestigious student award of Croatia, the Chancellor's Award ("Rektorova nagrada"), 2000, for work in the area of decision making theory

Publications

1. T. Gvero, V. Kuncak, R. Piskac. Interactive Synthesis of Code Snippets. Proceedings of the 23rd International Conference on Computer Aided Verification (CAV 2011), Springer, LNCS Volume 6806, p. 418-423.
2. R. Piskac, T. Wies. Decision Procedures for Automating Termination Proofs. Proceedings of the 12th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI 2011), Springer, LNCS Volume 6538, p. 371-386.
3. V. Kuncak, R. Piskac, P. Suter. Ordered Sets in the Calculus of Data Structures. Proceedings of the 19th Annual Computer Science Logic (CSL 2010), Springer, LNCS Volume 6247, p. 34-48.
4. R. Piskac, V. Kuncak. MUNCH - Automated Reasoner for Sets and Multisets. Proceedings of the 5th International Joint Conference on Automated Reasoning (IJCAR 2010), Springer, LNAI 6173, p. 149-155.
5. V. Kuncak, M. Mayer, R. Piskac, P. Suter. Comfusy: A Tool for Complete Functional Synthesis. Proceedings of the 22nd International Conference on Computer Aided Verification (CAV 2010), Springer, LNCS Volume 6174, p. 430-433.
6. V. Kuncak, M. Mayer, R. Piskac, P. Suter. Complete Functional Synthesis. Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI), p. 316-329.
7. R. Piskac, L. de Moura, N. Bjørner. Deciding Effectively Propositional Logic Using DPLL and Substitution Sets, Journal of Automated Reasoning, DOI: 10.1007/s10817-009-9161-6, Volume 44, Number 4, p. 401-424, April 2010.
8. V. Kuncak, R. Piskac, P. Suter, T. Wies. Building a Calculus of Data Structures. Proceedings of the 11th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI 2010), Springer, LNCS Volume 5944, p. 26-44.
9. K. Yessenov, R. Piskac, V. Kuncak. Collections, Cardinalities, and Relations. Proceedings of the 11th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI 2010), Springer, LNCS Volume 5944, p. 380-395.
10. T. Hillenbrandt, R. Piskac, U. Waldmann, C. Weidenbach. From Search to Computation: Redundancy Criteria and Simplification at Work. Accepted for Harald Ganzinger Memorial volume. To appear in 2010.
11. T. Wies, R. Piskac, V. Kuncak. Combining Theories with Shared Set Operations. Proceedings of the 7th International Symposium on Frontiers of Combining Systems (FroCoS 2009), Springer, LNCS Volume 5749, p. 366-382.
12. R. Piskac, L. de Moura, N. Bjørner. Deciding Effectively Propositional Logic with Equality, Technical Report, MSR-TR-2008-181, December 2008.
13. R. Piskac, V. Kuncak. Fractional Collections with Cardinality Bounds. Proceedings of the 17th Annual Computer Science Logic (CSL 2008), Springer, LNCS Volume 5213, p. 124-138.
14. R. Piskac, V. Kuncak. Linear Arithmetic with Stars. Proceedings of the 20th International Conference on Computer Aided Verification (CAV 2008), Springer, LNCS 5123, p. 268-280.
15. R. Piskac, V. Kuncak. Decision Procedures for Multisets with Cardinality Constraints. Proceedings of the Ninth International Conference on Verification, Model Checking and Abstract Interpretation - VMCAI 2008, Springer, LNCS Volume 4905, p. 218-232.
16. H. de Nivelle, R. Piskac. Verification of an Off-Line Checker for Priority Queues. Proceedings of the Third IEEE International Conference on Software Engineering and Formal Methods (SEFM), Koblenz, IEEE computer society press, Washington, 2005, 210-219.
17. P. Saiz, L. Aphecetche, P. Buncic, R. Piskac, J.-E. Revsbech, V. Sego. 2003. AliEn-ALICE environment on the GRID. Nuclear Instruments and Methods in Physics Research Section A, Volume 502, Issue 2-3, p. 437-440.
18. L. Caklovic, R. Piskac, V. Sego. 2001. Improvement of AHP method. Mathematical Communications - Supplement No.1 (2001), 13-21.

Proceedings Editor

1. R. Piskac, F. van Harmelen, N. Zhong. Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference Workshop on New forms of reasoning for the Semantic Web: scalable, tolerant and dynamic, Busan, Korea, November 2007.
2. F. van Harmelen, A. Herzig, P. Hitzler, Z. Lin, R. Piskac, G. Qi. Proceedings of the 5th European Semantic Web Conference Workshop on Advancing Reasoning on the Web: Scalability and Commonsense (ARea-2008), Tenerife, Spain, June 2008.
3. R. Piskac, E. Simperl. Proceedings of the Doctoral Consortium of the 3rd Future Internet Symposium 2010, Berlin, Germany, September 23-24, 2010.

Professional Activities

Organizer and PC co-Chair
1. New forms of reasoning for the Semantic Web: scalable, tolerant and dynamic, workshop co-located with the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, Busan, Korea, November 2007. Co-organizer with F. van Harmelen and N. Zhong
2. Advancing Reasoning on the Web: Scalability and Commonsense, workshop co-located with the 5th European Semantic Web Conference, Tenerife, Spain, June 2008. Co-organizer with F. van Harmelen, A. Herzig, P. Hitzler, Z. Lin and G. Qi
3. Doctoral Consortium of the 3rd Future Internet Symposium 2010, Berlin, Germany, September 2010. Education co-Chair together with E. Simperl

Program Committees
1. The 18th International Conference on Logic for Programming Artificial Intelligence and Reasoning (LPAR-18) - Program Committee Member
2. STI International Symposium 2010 - Organizing Committee Member
3. The 4th Asian Semantic Web Conference (ASWC 2009) - Program Committee Member
4. The 3rd Asian Semantic Web Conference (ASWC 2008) - Program Committee Member
5. The Second International Workshop on New forms of reasoning for the Semantic Web: scalable, tolerant and dynamic (NeForS08) - Program Committee Member

Reviewer for
– Journal of Symbolic Computation, STTT Journal, International Conference on Automated Deduction (CADE 2011), European Symposium on Programming (ESOP2011), Symposium on Principles of Programming Languages (POPL 2011), International Conference on Computer Aided Verification (CAV 2010, CAV 2011), Computer Science Symposium in Russia (CSR 2010), Static Analysis Symposium (SAS 2009, SAS 2011), Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI 2009), Asian Semantic Web Conference (ASWC 2009, ASWC 2008), Conference on Logic for Programming Artificial Intelligence and Reasoning (LPAR-08, LPAR-07), International Joint Conference on Automated Reasoning (IJCAR-08), Workshop on Web Semantics (WebS2007), International World Wide Web Conference (WWW2007), Conference on Information Integration and Web-based Applications & Services (iiWAS2006)

[September 2008 - present] Managing editor of the CEUR-WS.org proceedings series (http://CEUR-WS.org/)
– over 800 published conference and workshop proceedings

Presentations

Conference Presentations
– CAV 2011, SMT 2011, POPL 2011 (student session), IJCAR 2010, CSL 2008, CAV 2008, VMCAI 2008

Invited Talks
– gave talks about research results at the Rich models toolkit meeting, Turin (2011), Workshop on Synthesis, Verification, and Analysis of Rich Models - SVARM, Saarbrücken (2011), New York University (2011), Caltech (2011), Microsoft Research Cambridge (2011), Max-Planck Institute for Software Systems (2011), Seminar on "Deduction at Scale 2011" at the Ringberg Castle (2011), RiSE Seminar, TU Vienna (2011), Student Seminar, University of Freiburg (2011), LogicBlox, Atlanta (2011), Alpine Verification Meeting, Lugano (2010), Dagstuhl seminar 10161 (2010), Université Libre de Bruxelles (2010), Research seminar of the STI International Symposium (2010), Workshop on Formal and Automated Theorem Proving and Applications, Belgrade (2010), The IMDEA Software Institute, Madrid (2010), Meeting of the Action IC0701, Eindhoven (2009), Colorado University, Boulder (2008), Microsoft Research, Redmond (2008), SRI International (2008), Dagstuhl seminar 07401 (2007), University of Zagreb (2007), University of Freiburg (2006), University of Innsbruck (2006), Deduktionstreffen, Koblenz (2005), Dagstuhl seminar 05431 (2005), Deduktionstreffen, Saarbrücken (2004), Logic Seminar at Schloss Ringberg (2003)

Poster Presentations
– presented posters at EPFL Research Day (2010 and 2008), Microsoft Research Summer School, Cambridge (2010), The Technical Poster Competition of The Grace Hopper Celebration (2008) - one of four posters qualified for the finals, out of more than 100 submissions; "Femmes de Sciences" at EPFL (2008)

Teaching and Supervising Experience

Teaching
– lecturer at the First International SAT/SMT Solver Summer School 2011, MIT, June 2011
– Vertiefungsseminar, on the topic of Linked Open Data, University of Innsbruck, Spring 2011 - scientific organizer 3)
– Seminar on Automated Reasoning, EPFL, Fall 2010 - head teaching assistant 1). Additional duties included preparing course materials, teaching and student supervision
– Synthesis, Analysis, and Verification, EPFL, Spring 2010, 2009, 2008 - head teaching assistant 1) (Spring 2008: the course received the highest grade among master courses, student evaluation: 5.4/6.0)
– Vertiefungsseminar, University of Innsbruck, Spring 2010 - scientific organizer 3)
– Compiler Construction, EPFL, Fall 2008/09 - head teaching assistant 1)
– Seminar "Master Seminar", University of Innsbruck, Fall 2007/08 - scientific organizer 3)
– Seminar "Semantic Systems", University of Innsbruck, Spring 2007 - scientific organizer 3)
– Seminar "The Role of Computer Science in Science", University of Innsbruck, Fall 2006/07 - scientific organizer 3)
– Introduction to Proof Theory, University of Saarbrücken, Fall 2005/06 - head teaching assistant 1)
– Verification, University of Saarbrücken, Fall 2004/05 - head teaching assistant 1)
– Automated Reasoning, University of Saarbrücken, Spring 2004 - teaching assistant 2). The course was awarded the Teaching Award of the Computer Science Students Association for the summer semester 2004. Duties included grading and leading weekly exercises

– Explanations:
1) duties of a head teaching assistant include designing and leading weekly exercise sessions, grading
2) duties of a teaching assistant include leading weekly exercise sessions, grading
3) duties of a scientific organizer include topic assignment, student supervision, grading written reports, event organization

Supervised Students
– Mikaël Mayer: "Complete Program Synthesis for Linear Arithmetics", Master Thesis, EPFL, 2010

Language Skills

Croatian, English, German, Russian

References

References available upon request
