Dissertation Presented in Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Total Page:16
File Type:pdf, Size:1020Kb
Design and Implementation of an Ahead-of-Time Compiler for PHP by Paul Biggar Dissertation Presented in fulfillment of the requirements for the Degree of Doctor of Philosophy TRINITY COLLEGE DUBLIN APRIL, 2010 Declaration I, the undersigned, declare that this work has not previously been submitted as an exercise for a degree at this, or any other University, and that unless otherwise stated, is my own work. Paul Biggar 14th April, 2010 ii Permission to Lend and/or Copy I, the undersigned, agree that Trinity College Library may lend or copy this disserta- tion upon request. Paul Biggar 14th April, 2010 iii Abstract In recent years the importance of dynamic scripting languages — such as PHP, Python, Ruby and Javascript — has grown as they are used for an increasing amount of software development. Scripting languages provide high-level language features, a fast compile- modify-test environment for rapid prototyping, strong integration with database and web development systems, and extensive standard libraries. PHP powers many of the most popular web applications such as Facebook, Wikipedia and Yahoo. In general, there is a trend towards writing an increasing amount of an application in a scripting language rather than in a traditional programming language, not least to avoid the complexity of crossing between languages. Despite their increasing popularity, most scripting language implementations remain interpreted. Typically, these implementations are slow, between one and two orders of magnitude slower than C. Improving the performance of scripting language imple- mentations would be of significant benefit, however many of the features of scripting languages make them difficult to compile ahead of time. In this dissertation we argue that ahead-of-time compilation of scripting languages is both possible and valuable. We present phc, an ahead-of-time compiler for the PHP language. We describe the design and implementation of the compiler, and identify specific challenges in the design of a compiler for a dynamic scripting language. We determine that there are three important features of scripting languages that are difficult to compile or reimplement. Since scripting languages are defined primarily through the semantics of their original implementations, they often change semantics between releases. They provide C APIs, used both for foreign-function interfaces and to write third-party extensions. These APIs typically have tight integration with the original implementation, and are used to provide large standard libraries, which are difficult to re-use, and costly to reimplement. Finally, they support run-time code generation. These features make the important goal of correctness difficult to achieve for compilers and reimplementations. We present a technique to support these features in our ahead-of-time compiler for PHP. Our technique uses the original PHP implementation through the provided PHP C API, both in our compiler, and in our generated code. We support all of v vi these important scripting language features, particularly focusing on the correctness of compiled programs. Additionally, our approach allows us to automatically support limited future language changes. We present a discussion and performance evaluation of this technique, which we show increases the execution speed of PHP programs by 1.55x in our benchmarks. In order to improve this performance, static analysis and optimization are required. However, static analysis of scripting languages such as PHP is difficult due to the features found in these languages. These features include dynamic typing with implicit type conversions, dynamic aliasing, implicit object and array creation, and overloading of simple operators. We find that as a result, simple analysis techniques such as SSA and def-use chains are not straightforward to use, and that a single unconstrained variable can ruin our analysis. We describe the challenging semantics of PHP, and a static analyser to model them. Our analysis combines alias analysis, type-inference and constant-propagation for PHP, computing results that are essential for other analyses and optimizations. Our empirical results show that our analysis is capable of determining that almost all program variables are unaliased, and that dynamic types for over 60% of variables can be statically determined by our analysis. Acknowledgements More than anyone, I’d like to thank David Gregg for his supervision, help and advice throughout the last six and a half years. He guided me through my undergraduate research, through my PhD applications, and throughout my PhD. I think it is fair to say that nearly everything I know about research comes from David. Many supervisors are hands-off, David is certainly not. He has read everything I produced, he met me frequently to guide my research and his door is always open. His expertise on all manner of topics related to compilers and research has been invaluable. It is the rare problem that David could not help me on. David made it his business to understand everything I did. I’m not sure he knew anything about scripting languages when I met him, but he spent a considerable amount of time and effort to understand my research thoroughly. From the moment I started this PhD, David has been there for me. Any time there was a problem, he had time to help with it. In particular, at the start of 2007 I went through a particularly bad time. My then-current research wasn’t going anywhere, and a change was needed. At the time, neither of us was really sure I’d make it to finish my PhD. During that time in particular, his help and guidance was very much appreciated. Many of my colleagues merit thanks, in particular our research group: Kevin Williams, Nicholas Nash, Robert (Bobb) Crosbie, Jason McCandless and (Raymond) Scott Manley. They listened to my ideas and critiqued my work. Kevin also worked on scripting languages, and I learned a lot from chatting to him. Nick did an amazing job in our first paper, and I’m very proud of the result. John Gilbert and Edsko de Vries were collaborators on phc. I sat in their office for the first year of my research, and I think I learned more about compilers from them than from any other source. Of course, they also started phc, and invited me to work on it, to which I owe them a great debt. I don’t think the compiler I would have otherwise worked on would have been anything as successful. I want to thank Edsko in particular. Working with Edsko is a lesson in how to code vii viii well, and I learned a great deal from working on phc with him. He read a lot of my code, my commit logs, and all my papers. Arguing with Edsko helped me flesh out a great many ideas; no unclear idea gets past him. He also took on the herculean effort of reading my whole PhD in one night. I also want to thank Kevin, Nick, Bobb, Jason, Scott, John and Edsko for their friendship. It would have been a long four years without it. Jimmy Cleary interned with me during the summer of 2009. He worked on many aspects of phc, writing features, fixing bugs, and running a massive amount of exper- iments. His output far outstripped the time it took to train him in, and I honestly believe I would not have this PhD finished without his help. Many people took the time to read my papers and dissertation, and offer comments, including David, Kevin, Nick, Bobb, Jason, Scott, Edsko, John and Jimmy. Thanks also to Nuno Lopes, Graham Kelly, Christopher Jones, Alex Gaynor, James Stanier, Cornelius Riemenschneider, and Etienne Kneuss, who all provided valuable feedback. Peter Hayes taught me the trick of reading my papers out loud. I believe this made a considerable difference to the quality of my writing. I travelled considerably throughout my PhD. I learned a great deal from travelling to PLDI, ASPLOS and HOPL, from speaking to too many people to list here. Seminars and summer schools also taught me a great deal. Thanks to Markus Schordan, Sebas- tian Hack and Fabrice Rastello, and Koen de Bosschere for accepting me to Dagstuhl, the SSA seminar and the ACACES summer school, respectively. I learned a great deal, and met a lot of interesting people. Lectures from Mike Hind, Kathryn McKinley and Barbara Ryder were particularly interesting and inspiring. I had many opportunities thanks to the kind help of others. I want to thank Dan Quinlan for his invitation to Livermore, to Ian Taylor for arranging the Google Tech Talk, to Brian Shire for having me to Facebook, and to Daniel Berlin for mentoring me on gcc during the Google Summer of Code. These seemingly small opportunities have benefited me much more than you might know. My work was funded by the Irish Research Council for Science, Engineering and Technology funded by the National Development Plan. I would have been completely unable to work on a PhD without this funding. The Dun Laoghaire/Rathdown County Council paid my fees, and gave me a small stipend. I want to thank them for their help. ix My family and friends have been very supportive throughout this PhD. No amount of thanks can express how much I value their friendship and love. I could never have gotten to where I am without my fiancée, Orla McHenry. The last two years have been a lot of work and stress, and without her to welcome me home, I would have struggled to make it through. She brought me cookies when I worked late, cheered me up when I was down, and told me it would be fine when I worried. She was always there when I needed her. Knowing how proud she is of me always makes me smile.