UNIVERSITY OF CALIFORNIA
Los Angeles

Data-Driven Learning of Invariants and Specifications

A dissertation submitted in partial satisfaction of the
requirements for the degree Doctor of Philosophy
in Computer Science

by

Saswat Padhi

2020

© Copyright by Saswat Padhi 2020

ABSTRACT OF THE DISSERTATION

Data-Driven Learning of Invariants and Specifications

by

Saswat Padhi
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2020
Professor Todd D. Millstein, Chair

Although the program verification community has developed several techniques for analyzing software and formally proving its correctness, these techniques are too sophisticated for end users and require a significant investment of time and effort. In this dissertation, I present techniques that help programmers easily formalize the initial requirements for verifying their programs: specifications and inductive invariants. The proposed techniques leverage ideas from program synthesis and statistical learning to automatically generate these formal requirements from readily available program-related data, such as test cases and execution traces. I detail three of these data-driven learning techniques: FlashProfile and PIE for specification learning, and LoopInvGen for invariant learning. I conclude with some principles for building robust synthesis engines, which I learned while refining the aforementioned techniques.

Since program synthesis is a form of function learning, it is perhaps unsurprising that some of the fundamental issues in program synthesis have also been explored in the machine learning community. I study one particular phenomenon: overfitting. I present a formalization of overfitting in program synthesis, and discuss two mitigation strategies inspired by existing techniques.

The dissertation of Saswat Padhi is approved.

Adnan Darwiche
Sumit Gulwani
Miryung Kim
Jens Palsberg
Todd D. Millstein, Committee Chair

University of California, Los Angeles
2020

To my parents ... for everything
To the hackers who taught me so much, and made our software secure
To those upholding truth, honesty, and integrity across our institutions
To the heroes who keep up the fight for privacy, liberty, and sovereignty

TABLE OF CONTENTS

1 Introduction
  1.1 Post-hoc Validation
    1.1.1 Formal Verification
    1.1.2 Software Testing
  1.2 Correctness by Construction
    1.2.1 Formal Synthesis
    1.2.2 Programming by Examples
  1.3 Thesis Statement and Contributions

2 Learning Program Invariants
  2.1 Overview
    2.1.1 Data-Driven Precondition Inference
    2.1.2 Feature Learning via Program Synthesis
    2.1.3 Feature Learning for Loop Invariant Inference
  2.2 Algorithms
    2.2.1 Precondition Inference
    2.2.2 Loop Invariant Inference
  2.3 Evaluation
    2.3.1 Precondition Inference
    2.3.2 Loop Invariants for C++ Code
  2.4 Related Work
  2.5 Applications and Extensions
3 Learning Input Specifications
  3.1 Overview
    3.1.1 Pattern-Specific Clustering
    3.1.2 Pattern Learning via Program Synthesis
  3.2 Hierarchical Clustering
    3.2.1 Syntactic Dissimilarity
    3.2.2 Adaptive Sampling of Patterns
    3.2.3 Dissimilarity Approximation
    3.2.4 Hierarchy Construction and Splitting
    3.2.5 Profiling Large Datasets
  3.3 Pattern Synthesis
    3.3.1 The Pattern Language L_FP
    3.3.2 Synthesis of L_FP Patterns
    3.3.3 Cost of Patterns in L_FP
  3.4 Evaluation
    3.4.1 Syntactic Similarity
    3.4.2 Profiling Accuracy
    3.4.3 Performance
    3.4.4 Comparison of Learned Profiles
  3.5 Applications in PBE Systems
  3.6 Related Work

4 Overfitting in Program Synthesis
  4.1 Motivation
    4.1.1 Grammar Sensitivity of SyGuS Tools
    4.1.2 Evidence for Overfitting
  4.2 SyGuS Overfitting in Theory
    4.2.1 Preliminaries
    4.2.2 Learnability and No Free Lunch
    4.2.3 Overfitting
  4.3 Mitigating Overfitting
    4.3.1 Parallel SyGuS on Multiple Grammars
    4.3.2 Hybrid Enumeration
  4.4 Experimental Evaluation
    4.4.1 Robustness of PLearn
    4.4.2 Performance of Hybrid Enumeration
    4.4.3 Competition Performance
  4.5 Related Work

5 Conclusion

References

LIST OF FIGURES

1.1 Approaches for developing reliable software, and my research focus relative to them.
2.1 An example of data-driven precondition inference.
2.2 A C++ implementation of sub.
2.3 Algorithm for precondition generation.
2.4 The core PIE algorithm.
2.5 The feature learning algorithm.
2.6 The boolean function learning algorithm.
2.7 Verified precondition generation.
2.8 Loop invariant inference using PIE.
2.9 Comparison of PIE configurations. The left plot shows the effect of different numbers of tests. The right plot shows the effect of different conflict group sizes.
3.1 Custom atoms, and refinement of profiles with FlashProfile.
3.2 FlashProfile's interaction model: thick edges denote input and output to the system, dashed edges denote internal communication, and thin edges denote optional parameters.
3.3 The main profiling algorithm.
3.4 Default atoms in FlashProfile, with their corresponding regexes.
3.5 A hierarchy with suggested and refined clusters: leaf nodes represent strings, and internal nodes are labelled with patterns describing the strings below them. Atoms are concatenated using “ ”. A dashed edge denotes the absence of a pattern that describes the strings together.
3.6 Learning the best pattern for a dataset.
3.7 Algorithms for pattern-similarity-based hierarchical clustering of string datasets.
3.8 Adaptively sampling a small set of patterns.
3.9 Approximating a complete dissimilarity matrix.
3.10 Profiling large datasets.
3.11 Limiting the number of patterns in a profile.
3.12 Formal syntax and semantics of our DSL L_FP for defining syntactic patterns over strings.
3.13 Computing the maximal set of compatible atoms.
3.14 Number and length of strings across datasets.
3.15 Similarity prediction accuracy of FlashProfile (FP) vs. a character-based measure (JarW), and random forests (RF_1 ... RF_3) trained on different distributions.
3.16 FlashProfile's partitioning accuracy with different ⟨µ, θ⟩ configurations.
3.17 Quality of descriptions at ⟨µ = 4.0, θ = 1.25⟩.
3.18 Quality of descriptions from current state-of-the-art tools.
3.19 Impact of sampling on performance (using the same colors and markers as Figure 3.16).
3.20 Performance of FlashProfile over real-life datasets.
3.21 Ordering partitions by mutual dissimilarity.
3.22 Examples needed with and without FlashProfile.
4.1 Grammars of quantifier-free predicates over integers.
4.2 For each grammar and each tool, the ordinate shows the number of benchmarks that fail with that grammar but are solvable with a less expressive grammar.
4.3 The fib_19 benchmark [GJ07].
4.4 The PLearn framework for SyGuS tools.
4.5 Hybrid enumeration to combat overfitting in SyGuS.
4.6 An algorithm to divide a given size budget among subexpressions.
4.7 The number of failures on increasing grammar expressiveness, for state-of-the-art SyGuS tools, with and without the PLearn framework (Figure 4.4).
4.8 L - LoopInvGen, H - HE+LoopInvGen, P - PLearn(LoopInvGen). H is not only significantly robust against increasing grammar expressiveness, but it also has a smaller