Dissertation Presented in Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Total Page:16

File Type:pdf, Size:1020Kb

Dissertation Presented in Fulfillment of the Requirements for the Degree of Doctor of Philosophy Design and Implementation of an Ahead-of-Time Compiler for PHP by Paul Biggar Dissertation Presented in fulfillment of the requirements for the Degree of Doctor of Philosophy TRINITY COLLEGE DUBLIN APRIL, 2010 Declaration I, the undersigned, declare that this work has not previously been submitted as an exercise for a degree at this, or any other University, and that unless otherwise stated, is my own work. Paul Biggar 14th April, 2010 ii Permission to Lend and/or Copy I, the undersigned, agree that Trinity College Library may lend or copy this disserta- tion upon request. Paul Biggar 14th April, 2010 iii Abstract In recent years the importance of dynamic scripting languages — such as PHP, Python, Ruby and Javascript — has grown as they are used for an increasing amount of software development. Scripting languages provide high-level language features, a fast compile- modify-test environment for rapid prototyping, strong integration with database and web development systems, and extensive standard libraries. PHP powers many of the most popular web applications such as Facebook, Wikipedia and Yahoo. In general, there is a trend towards writing an increasing amount of an application in a scripting language rather than in a traditional programming language, not least to avoid the complexity of crossing between languages. Despite their increasing popularity, most scripting language implementations remain interpreted. Typically, these implementations are slow, between one and two orders of magnitude slower than C. Improving the performance of scripting language imple- mentations would be of significant benefit, however many of the features of scripting languages make them difficult to compile ahead of time. In this dissertation we argue that ahead-of-time compilation of scripting languages is both possible and valuable. We present phc, an ahead-of-time compiler for the PHP language. We describe the design and implementation of the compiler, and identify specific challenges in the design of a compiler for a dynamic scripting language. We determine that there are three important features of scripting languages that are difficult to compile or reimplement. Since scripting languages are defined primarily through the semantics of their original implementations, they often change semantics between releases. They provide C APIs, used both for foreign-function interfaces and to write third-party extensions. These APIs typically have tight integration with the original implementation, and are used to provide large standard libraries, which are difficult to re-use, and costly to reimplement. Finally, they support run-time code generation. These features make the important goal of correctness difficult to achieve for compilers and reimplementations. We present a technique to support these features in our ahead-of-time compiler for PHP. Our technique uses the original PHP implementation through the provided PHP C API, both in our compiler, and in our generated code. We support all of v vi these important scripting language features, particularly focusing on the correctness of compiled programs. Additionally, our approach allows us to automatically support limited future language changes. We present a discussion and performance evaluation of this technique, which we show increases the execution speed of PHP programs by 1.55x in our benchmarks. In order to improve this performance, static analysis and optimization are required. However, static analysis of scripting languages such as PHP is difficult due to the features found in these languages. These features include dynamic typing with implicit type conversions, dynamic aliasing, implicit object and array creation, and overloading of simple operators. We find that as a result, simple analysis techniques such as SSA and def-use chains are not straightforward to use, and that a single unconstrained variable can ruin our analysis. We describe the challenging semantics of PHP, and a static analyser to model them. Our analysis combines alias analysis, type-inference and constant-propagation for PHP, computing results that are essential for other analyses and optimizations. Our empirical results show that our analysis is capable of determining that almost all program variables are unaliased, and that dynamic types for over 60% of variables can be statically determined by our analysis. Acknowledgements More than anyone, I’d like to thank David Gregg for his supervision, help and advice throughout the last six and a half years. He guided me through my undergraduate research, through my PhD applications, and throughout my PhD. I think it is fair to say that nearly everything I know about research comes from David. Many supervisors are hands-off, David is certainly not. He has read everything I produced, he met me frequently to guide my research and his door is always open. His expertise on all manner of topics related to compilers and research has been invaluable. It is the rare problem that David could not help me on. David made it his business to understand everything I did. I’m not sure he knew anything about scripting languages when I met him, but he spent a considerable amount of time and effort to understand my research thoroughly. From the moment I started this PhD, David has been there for me. Any time there was a problem, he had time to help with it. In particular, at the start of 2007 I went through a particularly bad time. My then-current research wasn’t going anywhere, and a change was needed. At the time, neither of us was really sure I’d make it to finish my PhD. During that time in particular, his help and guidance was very much appreciated. Many of my colleagues merit thanks, in particular our research group: Kevin Williams, Nicholas Nash, Robert (Bobb) Crosbie, Jason McCandless and (Raymond) Scott Manley. They listened to my ideas and critiqued my work. Kevin also worked on scripting languages, and I learned a lot from chatting to him. Nick did an amazing job in our first paper, and I’m very proud of the result. John Gilbert and Edsko de Vries were collaborators on phc. I sat in their office for the first year of my research, and I think I learned more about compilers from them than from any other source. Of course, they also started phc, and invited me to work on it, to which I owe them a great debt. I don’t think the compiler I would have otherwise worked on would have been anything as successful. I want to thank Edsko in particular. Working with Edsko is a lesson in how to code vii viii well, and I learned a great deal from working on phc with him. He read a lot of my code, my commit logs, and all my papers. Arguing with Edsko helped me flesh out a great many ideas; no unclear idea gets past him. He also took on the herculean effort of reading my whole PhD in one night. I also want to thank Kevin, Nick, Bobb, Jason, Scott, John and Edsko for their friendship. It would have been a long four years without it. Jimmy Cleary interned with me during the summer of 2009. He worked on many aspects of phc, writing features, fixing bugs, and running a massive amount of exper- iments. His output far outstripped the time it took to train him in, and I honestly believe I would not have this PhD finished without his help. Many people took the time to read my papers and dissertation, and offer comments, including David, Kevin, Nick, Bobb, Jason, Scott, Edsko, John and Jimmy. Thanks also to Nuno Lopes, Graham Kelly, Christopher Jones, Alex Gaynor, James Stanier, Cornelius Riemenschneider, and Etienne Kneuss, who all provided valuable feedback. Peter Hayes taught me the trick of reading my papers out loud. I believe this made a considerable difference to the quality of my writing. I travelled considerably throughout my PhD. I learned a great deal from travelling to PLDI, ASPLOS and HOPL, from speaking to too many people to list here. Seminars and summer schools also taught me a great deal. Thanks to Markus Schordan, Sebas- tian Hack and Fabrice Rastello, and Koen de Bosschere for accepting me to Dagstuhl, the SSA seminar and the ACACES summer school, respectively. I learned a great deal, and met a lot of interesting people. Lectures from Mike Hind, Kathryn McKinley and Barbara Ryder were particularly interesting and inspiring. I had many opportunities thanks to the kind help of others. I want to thank Dan Quinlan for his invitation to Livermore, to Ian Taylor for arranging the Google Tech Talk, to Brian Shire for having me to Facebook, and to Daniel Berlin for mentoring me on gcc during the Google Summer of Code. These seemingly small opportunities have benefited me much more than you might know. My work was funded by the Irish Research Council for Science, Engineering and Technology funded by the National Development Plan. I would have been completely unable to work on a PhD without this funding. The Dun Laoghaire/Rathdown County Council paid my fees, and gave me a small stipend. I want to thank them for their help. ix My family and friends have been very supportive throughout this PhD. No amount of thanks can express how much I value their friendship and love. I could never have gotten to where I am without my fiancée, Orla McHenry. The last two years have been a lot of work and stress, and without her to welcome me home, I would have struggled to make it through. She brought me cookies when I worked late, cheered me up when I was down, and told me it would be fine when I worried. She was always there when I needed her. Knowing how proud she is of me always makes me smile.
Recommended publications
  • Types and Programming Languages by Benjamin C
    < Free Open Study > . .Types and Programming Languages by Benjamin C. Pierce ISBN:0262162091 The MIT Press © 2002 (623 pages) This thorough type-systems reference examines theory, pragmatics, implementation, and more Table of Contents Types and Programming Languages Preface Chapter 1 - Introduction Chapter 2 - Mathematical Preliminaries Part I - Untyped Systems Chapter 3 - Untyped Arithmetic Expressions Chapter 4 - An ML Implementation of Arithmetic Expressions Chapter 5 - The Untyped Lambda-Calculus Chapter 6 - Nameless Representation of Terms Chapter 7 - An ML Implementation of the Lambda-Calculus Part II - Simple Types Chapter 8 - Typed Arithmetic Expressions Chapter 9 - Simply Typed Lambda-Calculus Chapter 10 - An ML Implementation of Simple Types Chapter 11 - Simple Extensions Chapter 12 - Normalization Chapter 13 - References Chapter 14 - Exceptions Part III - Subtyping Chapter 15 - Subtyping Chapter 16 - Metatheory of Subtyping Chapter 17 - An ML Implementation of Subtyping Chapter 18 - Case Study: Imperative Objects Chapter 19 - Case Study: Featherweight Java Part IV - Recursive Types Chapter 20 - Recursive Types Chapter 21 - Metatheory of Recursive Types Part V - Polymorphism Chapter 22 - Type Reconstruction Chapter 23 - Universal Types Chapter 24 - Existential Types Chapter 25 - An ML Implementation of System F Chapter 26 - Bounded Quantification Chapter 27 - Case Study: Imperative Objects, Redux Chapter 28 - Metatheory of Bounded Quantification Part VI - Higher-Order Systems Chapter 29 - Type Operators and Kinding Chapter 30 - Higher-Order Polymorphism Chapter 31 - Higher-Order Subtyping Chapter 32 - Case Study: Purely Functional Objects Part VII - Appendices Appendix A - Solutions to Selected Exercises Appendix B - Notational Conventions References Index List of Figures < Free Open Study > < Free Open Study > Back Cover A type system is a syntactic method for automatically checking the absence of certain erroneous behaviors by classifying program phrases according to the kinds of values they compute.
    [Show full text]
  • Typed Open Programming
    Typed Open Programming A higher-order, typed approach to dynamic modularity and distribution Preliminary Version Andreas Rossberg Dissertation zur Erlangung des Grades des Doktors der Ingenieurwissenschaften der Naturwissenschaftlich-Technischen Fakult¨aten der Universit¨at des Saarlandes Saarbr¨ucken, 5. Januar 2007 Dekan: Prof. Dr. Andreas Sch¨utze Erstgutachter: Prof. Dr. Gert Smolka Zweitgutachter: Prof. Dr. Andreas Zeller ii Abstract In this dissertation we develop an approach for reconciling open programming – the development of programs that support dynamic exchange of higher-order values with other processes – with strong static typing in programming languages. We present the design of a concrete programming language, Alice ML, that consists of a conventional functional language extended with a set of orthogonal features like higher-order modules, dynamic type checking, higher-order serialisation, and concurrency. On top of these a flexible system of dynamic components and a simple but expressive notion of distribution is realised. The central concept in this design is the package, a first-class value embedding a module along with its interface type, which is dynamically checked whenever the module is extracted. Furthermore, we develop a formal model for abstract types that is not invalidated by the presence of primitives for dynamic type inspection, as is the case for the standard model based on existential quantification. For that purpose, we present an idealised language in form of an extended λ-calculus, which can express dynamic generation of types. This calculus is the first to combine and explore the interference of sealing and type inspection with higher-order singleton kinds, a feature for expressing sharing constraints on abstract types.
    [Show full text]
  • Definition of a Type System for Generic and Reflective Graph
    Definition of a Type System for Generic and Reflective Graph Transformations Vom Fachbereich Elektrotechnik und Informationstechnik der Technischen Universitat¨ Darmstadt zur Erlangung des akademischen Grades eines Doktor-Ingenieurs (Dr.-Ing.) genehmigte Dissertation von Dipl.-Ing. Elodie Legros Geboren am 09.01.1982 in Vitry-le-Franc¸ois (Frankreich) Referent: Prof. Dr. rer. nat. Andy Schurr¨ Korreferent: Prof. Dr. Bernhard Westfechtel Tag der Einreichung: 17.12.2013 Tag der mundlichen¨ Prufung:¨ 30.06.2014 D17 Darmstadt 2014 Schriftliche Erklarung¨ Gemaߨ x9 der Promotionsordnung zur Erlangung des akademischen Grades eines Doktors der Ingenieurwissenschaften (Dr.-Ing.) der Technischen Uni- versitat¨ Darmstadt Ich versichere hiermit, dass ich die vorliegende Dissertation allein und nur unter Verwendung der angegebenen Literatur verfasst habe. Die Arbeit hat bisher noch nicht zu Prufungszwecken¨ gedient. Frederiksberg (Danemark),¨ den 17.12.2013 ..................................... Elodie Legros Acknowledgment My very special thanks go to my advisor, Prof. Dr. rer. nat. Andy Schurr,¨ who has supported me during all phases of this thesis. His many ideas and advices have been a great help in this work. He has always been available every time I had questions or wanted to discuss some points of my work. His patience in reviewing and proof-reading this thesis, especially after I left the TU Darmstadt and moved to Denmark, deserves my gratitude. I am fully convinced that I would not have been able to achieve this thesis without his unswerving support. For all this: thank you! Another person I am grateful to for his help is Prof. Dr. Bernhard Westfechtel. Reviewing a thesis is no easy task, and I want to thank him for having assumed this role and helped me in correcting details I missed in my work.
    [Show full text]
  • Programming Language 1 Programming Language
    Programming language 1 Programming language A programming language is an artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human communication. The earliest programming languages predate the invention of the computer, and were used to direct the behavior of machines such as Jacquard looms and player pianos. Thousands of different programming languages have been created, mainly in the computer field, with many more being created every year. Most programming languages describe computation in an imperative style, i.e., as a sequence of commands, although some languages, such as those that support functional programming or logic programming, use alternative forms of description. A programming language is usually split into the two components of syntax (form) and semantics (meaning) and many programming languages have some kind of written specification of their syntax and/or semantics. Some languages are defined by a specification document, for example, the C programming language is specified by an ISO Standard, while other languages, such as Perl, have a dominant implementation that is used as a reference. Definitions A programming language is a notation for writing programs, which are specifications of a computation or algorithm.[1] Some, but not all, authors restrict the term "programming language" to those languages that can express all possible algorithms.[1] [2] Traits often considered important for what constitutes a programming language include: • Function and target: A computer programming language is a language[3] used to write computer programs, which involve a computer performing some kind of computation[4] or algorithm and possibly control external devices such as printers, disk drives, robots,[5] and so on.
    [Show full text]
  • Proceedings of the 2014 Scheme and Functional Programming Workshop
    PROCEEDINGS OF THE 2014 SCHEME AND FUNCTIONAL PROGRAMMING WORKSHOP Washington, DC, 19 November 2014 Edited by Jason Hemann Indiana University John Clements California Polytechnic State University 2015 Preface This volume contains the papers presented at Scheme '14, the 2014 Scheme and Functional Programming Workshop held on November 19, 2014 in Washington, DC. This year's workshop had more than 60 registered participants from insti- tutions throughout the United States and across the globe. For the second time this year's workshop was co-located with Clojure/conj. There were 13 submissions. Each submission was reviewed by at least 3 pro- gram committee members. The committee decided to accept 8 papers. Not in- cluded in these proceedings are the update on R7RS Working Group 2's progress by John Cowan, and Andy Wingo's keynote talk, What Scheme Can Learn from Javascript. Papers are listed by their order of presentation at the workshop. We would like to acknowledge the hard work of everyone who helped make Scheme '14 possible, including the administrative staff at Indiana University, the events management group at the Grand Hyatt Washington, the organizers of Clojure/conj, and the rest of our program committee. We would also like to thank Beckman Coulter, Cisco, and the Computer Science department at Indiana University for their sponsorship. September 11, 2015 Jason Hemann Bloomington, Indiana John Clements v Table of Contents Implementing R7RS on R6RS Scheme system :::::::::::::::::::::::: 1 Takashi Kato Code Versioning and Extremely Lazy
    [Show full text]
  • Please Note That Some Names Will Be Duplicated in Capitalized Form
    Index Please note that some names will be duplicated in capitalized form. Following Java style, the capitalized names refer to Java classes, while lowercase names refer to a general concept. @SuppressWarnings · 1060 @Target · 1061 ! @Test · 1060 @Test, for @Unit · 1084 ! · 105 @TestObjectCleanup, @Unit tag · 1092 != · 103 @TestObjectCreate, for @Unit · 1089 @throws · 87 @Unit · 1084; using · 1084 & @version · 85 & · 111 && · 105 [ &= · 111 [ ], indexing operator · 193 . ^ .NET · 57 .new syntax · 350 ^ · 111 .this syntax · 350 ^= · 111 @ | @ symbol, for annotations · 1059 | · 111 @author · 86 || · 105 @Deprecated, annotation · 1060 |= · 111 @deprecated, Javadoc tag · 87 @docRoot · 85 @inheritDoc · 85 @interface, and extends keyword · 1070 + @link · 85 @Override · 1059 + · 101; String conversion with operator + · @param · 86 95, 118, 504 @Retention · 1061 @return · 86 @see · 85 @since · 86 1463 addListener · 1321 < Adler32 · 975 agent-based programming · 1299 < · 103 aggregate array initialization · 193 << · 112 aggregation · 32 <<= · 112 aliasing · 97; and String · 504; arrays · 194 <= · 103 Allison, Chuck · 4, 18, 1449, 1460 allocate( ) · 948 allocateDirect( ) · 948 = alphabetic sorting · 418 alphabetic vs. lexicographic sorting · 783 == · 103 AND: bitwise · 120; logical (&&) · 105 annotation · 1059; apt processing tool · 1074; default element values · 1062, 1063, 1065; default value · 1069; > elements · 1061; elements, allowed types for · 1065; marker annotation · 1061; > · 103 processor · 1064; processor based on >= · 103 reflection
    [Show full text]
  • Before We Begin, All Sources Were Accessed on 29 April 2016
    Types Before we begin, all sources were accessed on 29 April 2016. Type - to “write (something) on a typewriter or computer by pressing the keys.” — Google “define type”. Nah, just kidding, this is not the definition we are talking about. “a category of people or things having common characteristics.” — Google “define type” “a label given to a group sharing common characteristics” — my refined definition, building from above. More specifically, for our purposes, “[a] set of values from which a variable, constant, function, or other expression may take its value.” — Free On-Line Dictionary Of Computing “Assigning a data type—typing—gives meaning to a sequences of bits such as a value in memory or some object such as a variable. The hardware of a general purpose computer is unable to discriminate between for example a memory address and an instruction code, or between a character, an integer, or a floating-point number, because it makes no intrinsic distinction between any of the possible values that a sequence of bits might mean.” — Wikipedia, Type system#Fundamentals Pop quiz Static vs. Dynamic Strong vs. Weak Where does Ruby fall? Java? JavaScript? Python? Haskell? C? Strong Weak Haskell Static C Java Ruby Dynamic JavaScript Python Static typing – “[e]nforcement of type rules at compile time” — Free On-Line Dictionary Of Computing Dynamic typing – “[e]nforcement of type rules at run time” — Free On-Line Dictionary Of Computing Latent typing – “types are associated with values and not variables. […] This typically requires run-time type checking and so is commonly used synonymously with dynamic typing.” — Wikipedia, Latent typing Strong typing – no commonly agreed upon definition — C2 Wiki, Strongly Typed Some attempted definitions I find useful: “[a] language is strongly typed if there are checks for type constraint violations.
    [Show full text]
  • Essentials of Programming Languages , Second Edition
    Essentials of Programming Languages second edition This page intentionally left blank. Essentials of Programming Languages second edition Daniel P. FriedmanMitchell WandChristopher T. Haynes © 2001 Massachusetts Institute of TechnologyAll rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. Typeset by the authors using .Printed and bound in the United States of America. Library of Congress Cataloging-in- Publication Information DataFriedman, Daniel P. Essentials of programming languages / Daniel P. Friedman, Mitchell Wand, Christopher T. Haynes —2nd ed. p. cm. Includes bibliographical references and index. ISBN 0-262-06217- 8 (hc. : alk. paper) 1. Programming Languages (Elecronic computers). I. Wand, Mitchell. II. Haynes, Christopher Thomas. III. Title.QA76.7. F73 2001005.13—dc21 00-135246 Contents Foreword vii Preface xi Acknowledgments xvii 1 Inductive Sets of Data 1 1.1 Recursively Specified Data 1 1.2 Recursively Specified Programs 9 1.3 Scoping and Binding of Variables 28 2 Data Abstraction 39 2.1 Specifying Data via Interfaces 39 2.2 An Abstraction for Inductive Data Types 42 2.3 Representation Strategies for Data Types 55 2.4 A Queue Abstraction 66 3 Environment-Passing Interpreters 69 3.1 A Simple Interpreter 71 3.2 The Front End 75 3.3 Conditional Evaluation 80 3.4 Local Binding 81 3.5 Procedures 84 3.6 Recursion 92 3.7 Variable Assignment 98 3.8 Parameter-Passing
    [Show full text]
  • A Management Information System (MIS) Is a System Or Process That Provides Information Needed to Manage Organizations Effectively [1]
    A management information system (MIS) is a system or process that provides information needed to manage organizations effectively [1]. Management information systems are regarded to be a subset of the overall internal controls procedures in a business, which cover the application of people, documents, technologies, and procedures used by management accountants to solve business problems such as costing a product, service or a business-wide strategy. Definition: Management Information Systems (MIS) is the term given to the discipline focused on the integration of computer systems with the aims and objectives on an organisation. The development and management of information technology tools assists executives and the general workforce in performing any tasks related to the processing of information. MIS and business systems are especially useful in the collation of business data and the production of reports to be used as tools for decision making. Applications of MIS With computers being as ubiquitous as they are today, there's hardly any large business that does not rely extensively on their IT systems. However, there are several specific fields in which MIS has become invaluable. * Strategy Support While computers cannot create business strategies by themselves they can assist management in understanding the effects of their strategies, and help enable effective decision-making. MIS systems can be used to transform data into information useful for decision making. Computers can provide financial statements and performance reports to assist in the planning, monitoring and implementation of strategy. MIS systems provide a valuable function in that they can collate into coherent reports unmanageable volumes of data that would otherwise be broadly useless to decision makers.
    [Show full text]