Information Flow Control with System Dependence Graphs Improving Modularity, Scalability and Precision for Object Oriented Languages
Total Page:16
File Type:pdf, Size:1020Kb
Information Flow Control with System Dependence Graphs Improving Modularity, Scalability and Precision for Object Oriented Languages zur Erlangung des akademischen Grads eines Doktors der Ingenieurswissenschaften der Fakultät für Informatik des Karlsruher Instituts für Technologie (KIT) genehmigte Dissertation von Jürgen Graf aus Bad Kötzting Tag der mündlichen Prüfung: 14.11.2016 Erster Gutachter: Prof. Dr.-Ing. Gregor Snelting Zweiter Gutachter: Prof. Dr. Walter F. Tichy This document is licensed under the Creative Commons Attribution – Share Alike 3.0 DE License (CC BY-SA 3.0 DE): http://creativecommons.org/licenses/by-sa/3.0/de/ Contents 1 Introduction1 1.1 Information flow control...................2 1.1.1 Example: A program with illegal information flow2 1.1.2 Attacker model....................3 1.1.3 Noninterference and low-deterministic security.3 1.1.4 Observational determinism for multithreaded programs.......................4 1.1.5 Declassification....................5 1.1.6 Ideal functionality..................5 1.2 Program slicing and information flow control.......6 1.2.1 Program slicing with dependence graphs.....6 1.2.2 Dependence graphs and information flow control6 1.2.3 Slicing object-oriented languages..........8 1.2.4 Slicing multithreaded programs..........9 1.3 Contributions.........................9 1.4 Related work.......................... 12 2 Information Flow Control with System Dependence Graphs for Object-oriented Languages 15 2.1 Analyzing object-oriented languages: Challenges and opportunities................ 15 2.1.1 Static variables.................... 16 2.1.2 Runtime libraries................... 17 2.1.3 Static initialization.................. 18 2.1.4 Reflection....................... 19 2.1.5 Types and object-fields................ 20 2.1.6 Aliasing........................ 21 2.1.7 Dynamic dispatch.................. 23 2.1.8 Exceptions....................... 24 2.1.9 Threads........................ 26 2.2 Precision in static analyses.................. 28 2.2.1 Context-sensitive................... 28 CONTENTS 2.2.2 Object-sensitive.................... 29 2.2.3 Field-sensitive..................... 30 2.2.4 Flow-sensitive..................... 31 2.2.5 Precision options for multithreaded programs.. 32 2.3 Intraprocedural analysis................... 34 2.3.1 Overview....................... 34 2.3.2 Intermediate representation............. 35 2.3.3 Control flow graph.................. 38 2.3.4 Control dependence graph............. 39 2.3.5 Data dependencies.................. 40 2.3.6 Procedure dependence graph............ 42 2.4 Enhancing the intraprocedural analysis.......... 44 2.4.1 Control-flow optimizations for exceptions.... 44 2.4.2 Termination-sensitive control dependencies... 49 2.4.3 Fine-grained field access............... 52 2.5 Interprocedural analysis................... 60 2.5.1 Call graph....................... 60 2.5.2 Points-to analysis................... 63 2.5.3 System dependence graph.............. 81 2.5.4 Summary edges and HRB slicing.......... 85 2.6 Parameter-model....................... 89 2.6.1 Unstructured model................. 91 2.6.2 Object-tree model................... 93 2.6.3 Object-graph model................. 96 2.6.4 Computation..................... 98 2.6.5 Evaluation....................... 112 2.6.6 Conclusion...................... 125 3 A Modular Approach to Information Flow with SDGs 129 3.1 Overview — Goals and limitations............. 132 3.2 Information flow in unknown context........... 136 3.2.1 Enumerating all contexts.............. 136 3.2.2 Context configurations: Order and monotonicity. 139 3.2.3 Inferring relevant context conditions........ 152 3.3 FlowLess: A language for information flow annotations. 170 3.3.1 Overview and syntax................. 172 3.3.2 Building context stubs from annotation...... 175 3.3.3 Example........................ 183 iv CONTENTS 3.4 Modular SDG......................... 185 3.4.1 Overview....................... 187 3.4.2 Conditional data dependencies........... 188 3.4.3 Access paths...................... 189 3.4.4 Precomputation of summary information..... 217 3.4.5 Evaluation....................... 221 4 Applications of Information Flow Control 225 4.1 Information flow control in practice with Joana...... 225 4.2 Proving cryptographic indistinguishabiliy with Joana.. 230 4.2.1 Parameterized initialization and IFC........ 234 4.2.2 Example specific analysis code........... 240 4.3 Combining analysis tools for hybrid verification – An example............................ 243 4.4 Verifying a simple example with a novel hybrid approach 248 5 Conclusion 257 A Sourcecode of Client-Server Example 261 B Sourcecode of the Simple e-Voting Example 271 List of Figures 293 List of Tables 297 List of Algorithms 299 Bibliography 301 v Acknowledgements First of all, I wish to thank my advisor Gregor Snelting for giving me to opportunity to do this work. I thank my parents for their support during all that time from my early education to enabling my studies in computer science. A special thanks to my partner Nicole that went with me through all ups and downs during the work on this thesis. I thank all my current and former coworkers Martin Mohr, Martin Hecker, Christian Hammer and Dennis Giffhorn for the interesting dis- cussions and their ideas and work that resulted in many improvements. My other coworkers working in a different field also helped through numerous dicussions and provided new insights with their different point of view. A “thank you” goes to: Matthias Braun, Joachim Breitner, Sebastian Buchwald, Andreas Lochbihler, Denis Lohner, Manuel Mohr, Daniel Wasserrab and Andreas Zwinkau. I also thank Brigitte Sehan for providing support in the background and the numerous students that worked on parts of Joana. Finally I want to thank my former teacher Werner Steidl for sparking my interest in computer science with the voluntary computer science classes he taught in his spare time. Abstract This work is concerned with the field of static program analysis —in particular with analyses aimed to guarantee certain security properties of programs, like confidentiality and integrity. Our approach uses so- called dependence graphs to capture the program behavior as well as the information flow between the individual program points. Using this technique, we can guarantee for example that a program does not reveal any information about a secret password. In particular we focus on techniques that improve the dependence graph computation —the basis for many advanced security analyses. We incorporated the presented algorithms and improvements into our analysis tool Joana and published its source code as open source. Several collaborations with other researchers and publications using Joana demonstrate the relevance of these improvements for practical research. This work consists essentially of three parts. Part 1 deals with improve- ments in the computation of the dependence graph, Part 2 introduces a new approach to the analysis of incomplete programs and Part 3 shows current use cases of Joana on concrete examples. In the first part we describe the algorithms used to compute a de- pendence graph, with special attention to the problems and challenges that arise when analyzing object-oriented languages such as Java. For example we present an analysis that improves the precision of detected control flow by incorporating the effects of exceptions. The main im- provement concerns the way side effects —caused by communication over methods boundaries— are modelled. Dependence graphs capture side effects —memory locations read or changed by a method— in the form of additional nodes called parameter nodes. We show that the structure and computation of these nodes have a huge impact on both the precision and scalability of the entire analysis. The so-called parameter model describes the algorithms used to compute these nodes. We explain the weakness of the old parameter model based on object-trees and present our improvements in form of a new model using object-graphs. The new graph structure merges redundant information of multiple nodes into a single node and thus reduces the number of overall parameter nodes ABSTRACT significantly — which in turn speeds up the analysis without sacrificing the precision of the resulting dependence graph. Theses changes are already visible when analyzing smaller programs with a few thousand lines of code: We achieve on average a 8 times faster runtime while the precision of the result remains intact and is usually even enhanced. The differences are even more pronounced for larger programs. Some of our test cases and all tested programs larger then 20,000 lines of code could only be analyzed with the object-graph parameter model. Due to these enhancements Joana is now able to analyze much larger programs and also profits from enhanced precision with smaller programs. In the second part we tackle the problem that security analyses based on dependence graphs previously required a whole program in order to compute. For example it was impossible to preprocess or analyze program parts like library code without knowledge of the application code using it. We discovered a monotonicity property in the current analysis that allows us to reuse analysis results from a program part at a given usage point to conservatively approximate the expected results at another point without the need to reanalyze the program part. Due to monotonicity we are able to make valid statements about the security properties of a program part in general, without knowledge