An Executable Formal Semantics of PHP with Applications to Program Analysis
Imperial College London
Department of Computing

Daniele Filaretti

Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy in Computing of the Imperial College London and the Diploma of the Imperial College, 2015

Abstract

Nowadays, many important activities in our lives involve the web. However, the software and protocols on which web applications are based were not designed with the appropriate level of security in mind. Many web applications have reached a level of complexity for which testing, code reviews and human inspection are no longer sufficient quality-assurance guarantees. Tools that employ static analysis techniques are needed in order to explore all possible execution paths through an application and guarantee the absence of undesirable behaviours.

To make sure that an analysis captures the properties of interest, and to navigate the trade-offs between efficiency and precision, it is necessary to base the design and the development of static analysis tools on a firm understanding of the language to be analysed. When this underlying knowledge is missing or erroneous, tools cannot be trusted, no matter what advanced techniques they use to perform their task.

In this thesis, we introduce KPHP, the first executable formal semantics of PHP, one of the most popular languages for server-side web programming. Then, we demonstrate its practical relevance by developing two verification tools, of increasing complexity, on top of it: a simple verifier based on symbolic execution and LTL model checking, and a general purpose, fully configurable and extensible static analyser based on Abstract Interpretation. Our LTL-based tool leverages the existing symbolic execution and model checking support offered by K, our semantics framework of choice, and constitutes a first proof-of-concept of the usefulness of our semantics.
Our abstract interpreter, on the other hand, represents a more significant and novel contribution to the field of static analysis of dynamic scripting languages (PHP in particular). Although our tool is still a prototype and therefore not well suited for handling large real-world codebases, we demonstrate how our semantics-based, principled approach to the development of verification tools has led to the design of static analyses that outperform existing tools and approaches in terms of supported language features, precision, and breadth of possible applications.

© The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.

Statement of Originality

The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due references are made.

Acknowledgements

I would like to thank my supervisor Sergio Maffeis for offering me this great opportunity and for guiding and supporting me during all the stages of my PhD; my examiners Sophia Drossopoulou and Marco Carbone for their time, feedback and help in improving my work. I also would like to thank my parents for their patience, understanding and support, and the friends I met here in London for all the fun and the good times. Finally, I would like to thank EPSRC for their financial support.
‘For if one’s starting point is something unknown, and one’s conclusion and intermediate steps are made up of unknowns also, how can the resulting consistency ever by any manner of means become knowledge?’
Plato, The Republic, Book VII

Contents

Abstract
Acknowledgements

I Introduction
  1 Problem description and motivations
    1.1 The (in)security of web applications
    1.2 The role of formal methods
    1.3 Contributions and thesis outline

II Background
  2 Semantics engineering
    2.1 Introduction to the formal semantics of programming languages
    2.2 Tools for semantics engineering
      2.2.1 Overview
      2.2.2 The K Framework
    2.3 Mechanised specifications of real-world programming languages
      2.3.1 λJS
      2.3.2 λπ
      2.3.3 The semantics of C in K
      2.3.4 JSCert
      2.3.5 K Java
      2.3.6 K JS
      2.3.7 Conclusion
  3 Web Applications
    3.1 Basics
      3.1.1 The HTTP protocol
      3.1.2 Client and server-side technologies
      3.1.3 State: cookies and sessions
    3.2 PHP
      3.2.1 Hello World Wide Web
      3.2.2 A Closer Look
      3.2.3 Facebook’s PHP specification
    3.3 Web security
      3.3.1 Overview
      3.3.2 Taint style vulnerabilities
  4 Static analysis and security
    4.1 The nature of static analysis
    4.2 Design tradeoffs
      4.2.1 Soundness vs completeness
      4.2.2 Precision vs scalability
    4.3 Approximating all possible runtime behaviours
    4.4 Static analysis vocabulary
      4.4.1 Exploring all paths
      4.4.2 Abstraction
      4.4.3 Increasing precision with sensitivities
    4.5 A primer on Abstract Interpretation
      4.5.1 A bird’s eye view
      4.5.2 Case study: abstract interpretation of arithmetic expressions
      4.5.3 Fixpoint abstraction
    4.6 Related techniques: symbolic and concolic execution
    4.7 Static analysis tools for PHP

III Contributions
  5 KPHP: an executable formal semantics of PHP
    5.1 Syntax
    5.2 Semantics
      5.2.1 Memory Layout
      5.2.2 Configuration
      5.2.3 Copy-on-write in the Zend engine
      5.2.4 Semantic rules
    5.3 Putting it all together: KPHP
    5.4 Testing and Validation
      5.4.1 Test-driven semantics development
      5.4.2 Comparison with Facebook’s PHP specification
      5.4.3 Validation
      5.4.4 Coverage
    5.5 Limitations, Perspective and Future Work
  6 Verification via LTL model checking
    6.1 Model checking and symbolic execution in K
    6.2 Temporal verification of PHP programs
      6.2.1 Symbolic values in PHP programs
      6.2.2 State transitions
      6.2.3 LTL
      6.2.4 A temporal logic for PHP
      6.2.5 Example
    6.3 Case studies
      6.3.1 Input Validation
      6.3.2 Cryptographic Key Generation
    6.4 Discussion and Limitations
  7 A general purpose static analyser for PHP
    7.1 Methodology and overview
    7.2 Parametrising the domain of computation
      7.2.1 Abstract domains
      7.2.2 Abstraction
      7.2.3 Putting the domain into action
      7.2.4 Partitioning the semantics by isolating domain operations
      7.2.5 Concrete domain
      7.2.6 Example
    7.3 Lifting the semantics
      7.3.1 Conditionals
      7.3.2 While loop
      7.3.3 Functions and recursion
    7.4 Merging configurations
      7.4.1 Challenges
      7.4.2 Updates to the memory model
      7.4.3 Overview of the merging procedure
      7.4.4 Merging return values
      7.4.5 Merging status codes
      7.4.6 Merging arrays and memories
    7.5 Array and object access, revisited
      7.5.1 Strong vs Weak updates
      7.5.2 Overlapping keys