Ph.D. Thesis: WYSINWYX: What You See Is Not What You Execute

WYSINWYX: WHAT YOU SEE IS NOT WHAT YOU EXECUTE

by

Gogul Balakrishnan

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Sciences Department) at the UNIVERSITY OF WISCONSIN–MADISON

2007

© Copyright by Gogul Balakrishnan 2007
All Rights Reserved

To my beloved grandma Rajakrishnammal...

ACKNOWLEDGMENTS

First of all, I would like to thank my parents Mr. Balakrishnan and Mrs. Manickam for making my education a priority even in the most trying circumstances. Without their love and constant support, it would not have been possible for me to obtain my Ph.D.

Next, I would like to thank my advisor, Prof. Thomas Reps, who, without an iota of exaggeration, was like a father to me in the US. He taught me the importance of working from the basics and the need for clarity in thinking. I have learnt a lot of things from him outside work: writing, sailing, safety on the road, and so on; the list is endless. He was a source of constant support and inspiration. He usually goes out of his way to help his students; he was with me the whole night when we submitted our first paper! I don’t think my accomplishments would have been possible without his constant support. Most of what I am as a researcher is due to him. Thank you, Tom.

I would also like to thank GrammaTech, Inc. for providing the basic infrastructure for CodeSurfer/x86. I am really grateful to Prof. Tim Teitelbaum for allocating the funds and the time at GrammaTech to support our group at the University of Wisconsin. Special thanks go to Radu Gruian and Suan Yong, who provided high-quality software and support. I have always enjoyed the discussions and the interactions I had with them.

Furthermore, I would like to thank Prof. Nigel Boston, Prof. Susan Horwitz, Prof. Ben Liblit, and Prof. Mike Swift for being on my committee and for the stimulating discussions during my defense. I would like to specially thank Prof. Susan Horwitz, Prof. Tom Reps, and Prof. Mike Swift for their insightful comments on a draft of my dissertation. Their comments have definitely improved the readability of this dissertation.

I would also like to thank Prof. V. Uma Maheswari, who introduced me to compilers at Anna University. She also taught me to take life as it is.

I would like to specially thank Junghee Lim for being a great office mate and also for taking over the implementation of some parts of the analysis algorithms in CodeSurfer/x86. In addition, I would like to thank the other students in PL research: Evan Driscoll, Denis Gopan, Nick Kidd, Raghavan Komondoor, Akash Lal, Alexey Loginov, Dave Melski, Anne Mulhern, and Cindy Rubio González. I have always had interesting discussions with them, and their feedback on my presentations has always been helpful.

I would also like to thank the members of the Wisconsin Safety Analyzer (WiSA) group: Prof. Somesh Jha, Prof. Bart Miller, Mihai Christodorescu, Vinod Ganapathy, Jon Giffin, Shai Rubin, and Hao Wang. I have always enjoyed being part of the WiSA group, and the bi-yearly trips for the project reviews were always fun.

My dissertation research was supported by a number of sources, including the Office of Naval Research, under grant N00014-01-1-0708, the Homeland Security Advanced Research Projects Agency (HSARPA) under AFRL contract FA8750-05-C-0179, and the Disruptive Technology Office (DTO) under AFRL contract FA8750-06-C-0249. I am thankful to our ONR program managers Dr. Ralph Wachter and Gary Toth for their support.

Finally, I would like to thank my sister Arini, and my friends, Anto, George, Jith, Muthian, Piramanayagam, Prabu, Pranay, Sanjay, Senthil, Veeve, Vicky, and Vinoth, for making my six years in Madison less stressful. I will always remember the Tamil, Telugu, Hindi, and English movies we watched late in the night on a BIG screen at Oaktree. Moreover, I cannot forget our Idlebrain Cricket Club.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1 Introduction
  1.1 Advantages of Analyzing Executables
  1.2 Challenges in Analyzing Executables
    1.2.1 No Debugging/Symbol-Table Information
    1.2.2 Lack Of Variable-like Entities
    1.2.3 Information About Memory-Access Expressions
  1.3 CodeSurfer/x86: A Tool for Analyzing Executables
  1.4 The Scope of Our Work
  1.5 Contributions and Organization of the Dissertation

2 An Abstract Memory Model
  2.1 Memory-Regions
  2.2 Abstract Locations (A-Locs)

3 Value-Set Analysis (VSA)
  3.1 Value-Set
  3.2 Abstract Environment (AbsEnv)
  3.3 Representing Abstract Stores Efficiently
  3.4 Intraprocedural Analysis
    3.4.1 Idioms
    3.4.2 Predicates for Conditional Branch Instructions
  3.5 Interprocedural Analysis
    3.5.1 Abstract Transformer for call→enter Edge
    3.5.2 Abstract Transformer for exit→end-call Edge
    3.5.3 Interprocedural VSA algorithm
  3.6 Indirect Jumps and Indirect Calls
  3.7 Context-Sensitive VSA
    3.7.1 Call-Strings
    3.7.2 Context-Sensitive VSA Algorithm
    3.7.3 Memory-Region Status Map
  3.8 Soundness of VSA

4 Value-Set Arithmetic
  4.1 Notational Conventions
  4.2 Strided-Interval Arithmetic
    4.2.1 Addition (+si)
    4.2.2 Unary Minus (−u si)
    4.2.3 Subtraction (−si), Increment (++si), and Decrement (−−si)
    4.2.4 Bitwise Or (|si)
    4.2.5 Bitwise not (∼si), And (&si), and Xor (^si)
    4.2.6 Strided-Interval Arithmetic for Different Radices
  4.3 Value-Set Arithmetic
    4.3.1 Addition (+vs)
    4.3.2 Subtraction (−vs)
    4.3.3 Bitwise And (&vs), Or (|vs), and Xor (^vs)
    4.3.4 Value-Set Arithmetic for Different Radices

5 Improving the A-loc Abstraction
  5.1 Overview of our Approach
    5.1.1 The Problem of Indirect Memory Accesses
    5.1.2 The Problem of Granularity and Expressiveness
  5.2 Background
    5.2.1 Aggregate Structure Identification (ASI)
  5.3 Recovering A-locs via Iteration
  5.4 Generating Data-Access Constraints
  5.5 Interpreting Indirect Memory-References
  5.6 Hierarchical A-locs
  5.7 Convergence
  5.8 Pragmatics
  5.9 Experiments
    5.9.1 Comparison of A-locs with Program Variables
    5.9.2 Usefulness of the A-locs for Static Analysis

6 Recency-Abstraction for Heap-Allocated Storage
  6.1 Problems in Using the Allocation-Site Abstraction in VSA
    6.1.1 Contents of the Fields of Heap-Allocated Memory-Blocks
    6.1.2 Resolving Virtual-Function Calls in Executables
  6.2 An Abstraction for Heap-Allocated Storage
  6.3 Formalizing The Recency-Abstraction
  6.4 Experiments

7 Other Improvements to VSA
  7.1 Widening
  7.2 Affine-Relation Analysis (ARA)
  7.3 Limited Widening
  7.4 Priority-based Iteration
    7.4.1 Experiments
  7.5 GMOD-based Merge Function
    7.5.1 Experiments

8 Case Study: Analyzing Device Drivers
  8.1 Background
  8.2 The Need For Path-Sensitivity In Device-Driver Analysis
  8.3 Path-Sensitive VSA
  8.4 Experiments

9 Related Work
  9.1 Information About Memory Accesses in Executables
  9.2 Identification of Structures
  9.3 Recency-Abstraction For Heap-Allocated Storage

10 Conclusions And Future Directions

LIST OF REFERENCES
