Adaptive Eager Boolean Encoding for Arithmetic Reasoning in Verification
Sanjit A. Seshia
May 2005
CMU-CS-05-134

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Thesis Committee:
Prof. Randal E. Bryant, Chair
Prof. Edmund M. Clarke
Prof. Jeannette M. Wing
Prof. David L. Dill, Stanford University

Copyright © 2005 Sanjit A. Seshia

This research was sponsored in part by a National Defense Science and Engineering Graduate Fellowship, the National Science Foundation under grant CCR-9805366, and the U.S. Army under ARO grant DAAD19-01-1-0485. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. Government, or any other entity.

Keywords: Decision procedures, automated theorem proving, model checking, Boolean satisfiability, integer linear programming, quantified Boolean formulas, first-order logic, timed automata, difference constraints, timed circuits, infinite-state systems, software security, machine learning, verification, reliability, security, UCLID, TMV.

Dedicated to Appa, Amma, Ashwin, and Sunny

Abstract

Decision procedures for first-order logics are widely applicable in design verification and static program analysis. However, existing procedures rarely scale to large systems, especially for verifying properties that depend on data or timing, in addition to control.

This thesis presents a new approach for building efficient, automated decision procedures for first-order logics involving arithmetic. In this approach, decision problems involving arithmetic are transformed to problems in the Boolean domain, such as Boolean satisfiability solving, thereby leveraging recent advances in that area.
The transformation automatically detects and exploits problem structure based on new theoretical results and machine learning. The results of experimental evaluations show that our decision procedures can outperform other state-of-the-art procedures by several orders of magnitude.

The decision procedures form the computational engines for two verification systems, UCLID and TMV. These systems have been applied to problems in computer security, electronic design automation, and software engineering that require efficient and precise analysis of system functionality and timing. This thesis describes two such applications: finding format-string exploits in software, and verifying circuits that operate under timing assumptions.

Contents

1 Introduction
  1.1 Boolean Encoding Techniques
  1.2 Thesis Contributions
  1.3 Thesis Overview
2 Preliminaries
  2.1 Notation
  2.2 Variable Classes
  2.3 Fourier-Motzkin Elimination
3 Difference Logic
  3.1 Constraint Graph
  3.2 Small-Domain Encoding
  3.3 Direct Encoding
  3.4 Related Work
  3.5 Discussion

I SAT-Based Decision Procedures

4 Generalized 2SAT Constraints
  4.1 Previous Work
  4.2 Background
  4.3 Theoretical Results
    4.3.1 Minimal Face Solutions of G2SAT Polyhedra
    4.3.2 Rounding and Semi-Rounding
    4.3.3 Main Theorems
    4.3.4 Approximation Results for Optimization
  4.4 Experimental Evaluation
    4.4.1 Implementation
    4.4.2 Setup
    4.4.3 Comparison
  4.5 Summary
5 Quantifier-Free Presburger Arithmetic
  5.1 Related Work
  5.2 Background
    5.2.1 Preliminaries
    5.2.2 Previous Results
  5.3 Main Theoretical Results
    5.3.1 Bounds for a System of Difference Constraints
    5.3.2 Bounds for a Sparse System of Mainly Difference Constraints
    5.3.3 Bounds for Arbitrary Quantifier-Free Presburger Formulas
  5.4 Improvements
    5.4.1 Variable Classes
    5.4.2 Large Coefficients and Widths
    5.4.3 Large Constant Terms
  5.5 Experimental Evaluation
    5.5.1 Implementation
    5.5.2 Experimental Results
  5.6 Discussion
6 Automated Selection of Boolean Encoding
  6.1 The Need for Algorithm Selection
    6.1.1 Comparing the SD and DIRECT Methods
    6.1.2 Automated Algorithm Selection
  6.2 Learning-Based Approach
    6.2.1 Complexity of Counting Transitivity Constraints
    6.2.2 Feature Selection
    6.2.3 Machine Learning Technique
    6.2.4 Hybrid Encoding Algorithm
  6.3 Experimental Evaluation
  6.4 Discussion
7 Extended Logic and Applications
  7.1 Extended Logic
    7.1.1 Uninterpreted Function Symbols
    7.1.2 Lambda Expressions
  7.2 Decision Procedure Extensions
    7.2.1 Elimination of Lambda Expressions
    7.2.2 Elimination of Function and Predicate Applications
    7.2.3 Summary
  7.3 Verification Techniques in UCLID
  7.4 Case Study: Finding Format-String Exploits
    7.4.1 Background
    7.4.2 Formal Specification
    7.4.3 Results
  7.5 Summary

II Model Checking Timed Systems

8 Quantified Difference Logic
  8.1 Quantifier Elimination Using Boolean Methods
  8.2 Satisfiability Checking of DL Formulas over ¡
  8.3 Representation and Manipulation of DL Formulas
  8.4 Optimizations
    8.4.1 Determining if Bounds are Conjoined
    8.4.2 Quantifier Elimination by Eliminating Upper Bounds on ¢¤£
    8.4.3 Eliminating Infeasible Paths in BDDs
  8.5 Summary
9 Model Checking and Timed Circuits
  9.1 Related Work
  9.2 Background
    9.2.1 Timed Automata
    9.2.2 Timed μ Calculus and TCTL
  9.3 Fully Symbolic Model Checking
    9.3.1 Implementation and Results
  9.4 Verification of Timed Circuits
    9.4.1 Previous Work
    9.4.2 Modeling Timed Circuits
    9.4.3 From Circuits to Timed Automata
    9.4.4 Case Studies
  9.5 Summary
10 Conclusion
  10.1 Summary of Contributions
  10.2 Open Problems
  10.3 Looking Ahead
A UCLID
  A.1 The UCLID Specification Language
    A.1.1 Format
    A.1.2 Language Overview
    A.1.3 Keywords and Lexical Conventions
    A.1.4 Data Types and Type Declarations
    A.1.5 Constants
    A.1.6 Input Variables
    A.1.7 State Variables
    A.1.8 Macro Definitions ..