Optimizing PHP Bytecode Using Type-Inferred SSA Form
Total Page:16
File Type:pdf, Size:1020Kb
Bachelor Thesis Optimizing PHP Bytecode using Type-Inferred SSA Form Nikita Popov Matriculation Number: 347863 Technische Universität Berlin Faculty IV · Electrical Engineering and Computer Science Department of Computer Engineering and Microelectronics Embedded Systems Architectures (AES) Einsteinufer 17 · D-10587 Berlin A thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science according to the examination regulations at the Technische Uni- versität Berlin for the Bachelor in Computer Science of May 28, 2014. Department of Computer Engineering and Microelectronics Embedded Systems Architectures (AES) Technische Universität Berlin Berlin Author Nikita Popov Thesis period February 16, 2016 to July 5, 2016 Referees Prof. Dr. B. Juurlink, Embedded Systems Architectures (AES) Prof. Dr. S. Glesner, Software Engineering for Embedded Sys- tems Supervisor Dr. B. Cosenza, Embedded Systems Architectures (AES) Declaration According to §46(8) of the Examination Regulations I hereby confirm to have written the following thesis on my own, not having used any other sources or resources than those listed. Berlin, July 4, 2016 Nikita Popov Abstract PHP is a dynamically typed programming language, which is commonly used for the server-side implementation of web applications. As such its performance is often critical to the response time, throughput and resource utilization of such applications. This thesis aims to reduce runtime overhead by applying classical data-flow optimiza- tions to the PHP bytecode in static single assignment form. Type inference is used both to enable the use of type-specialized instructions and to ensure the correctness of other optimizations, which are commonly only applicable to certain types. Next to type- specialization, we also implement flow-sensitive constant propagation, dead code elimina- tion and copy propagation. Additionally, inlining is used to increase the applicability of other optimizations. The main challenge is to reconcile classical compiler optimizations, that have been developed in the context of statically typed and compiled languages, with a programming language that is not only dynamically and weakly typed, but also supports a plethora of other dynamic language features. This requires a careful analysis of language semantics and modification of standard algorithms to support them. Our approach results in significant performance gains for numerically intensive and tightly looped code, as is typically found in benchmark scripts. We achieve a mean speedup of 1.42× on PHP’s own benchmark suit. However, when considering real appli- cations we have found the speedup to be limited to 1-2%. v Zusammenfassung PHP ist eine dynamisch typisierte Programmiersprache, welche häufig für die Server- seitige Implementierung von Web-Applikationen genutzt wird. Aus diesem Grund ist die Effizienz der PHP-Implementierung kritisch für die Ausführungszeit, den Durchsatz und den Ressourcen-Verbrauch derartiger Applikationen. Ziel dieser Bachelorarbeit ist es die Performanz der PHP-Implementierung zu ver- bessern, indem klassische Datenfluss-Optimierungsmethoden auf den PHP Bytecode in Single-Static-Assignment Form angewendet werden. Typ-Inferenz wird genutzt, sowohl um die Nutzung von Typ-spezialisierten Instruktionen zu ermöglichen, als auch um die Korrektheit anderer Optimierungen sicherzustellen, welche oftmals nicht auf alle Typen anwendbar sind. Neben Typ-Spezialisierung wurde auch Kontrollfluss-sensitive Propaga- tion von Konstanten, Elimination von totem Code, sowie Propagation von Kopien umge- setzt. Zusätzlich wird durch Inline-Ersetzung von Funktionen die Anwendbarkeit anderer Optimierungen erhöht. Hierbei ist die hauptsächliche Herausforderung, dass klassische Compiler-Optimierungs- methoden im Kontext von statisch typisierten und kompilierten Sprachen entwickeln wur- den, während PHP nicht nur dynamisch und schwach typisiert ist, sondern auch eine Vielzahl anderer dynamischer Sprachelemente unterstützt. Dies erfordert eine sorgfälti- ge Analyse der Sprachsemantik und Anpassung von Standard-Algorithmen, um diese zu unterstützen. Unsere Herangehensweise führt zu signifikanten Verbesserungen in der Ausführungszeit von numerisch intensivem Code, wie er typischerweise in Benchmarks gefunden wird. In PHP’s eigener Benchmark-Suite verringert sich die Ausführungszeit im Durchschnitt um einen Faktor von 1,42. Für realistische Applikationen begrenzt sich die Verbesserung jedoch auf 1-2%. vi Contents List of Figures ix List of Listings x 1. Introduction 1 1.1. Historical context . .2 1.2. Prior work . .2 1.3. Attribution . .5 1.4. Outline . .5 2. PHP Language Semantics 7 2.1. Dynamic and weak typing . .7 2.2. References . .8 2.3. The use-def nature of assignments . .9 2.4. Dynamic scope introspection and modification . 10 2.5. Error handling . 11 2.6. Global variables and the pseudo-main scope . 12 2.7. $this binding in methods . 13 2.8. Type annotations . 14 3. Prerequisites 16 3.1. Compilation and execution pipeline . 16 3.2. Instruction format . 17 3.3. Control flow graph . 18 3.4. Dominance, dominator trees and dominance frontiers . 19 3.5. Live-variable analysis . 21 3.6. Static single assignment form . 21 3.6.1. Motivation . 21 3.6.2. SSA properties: Minimal, pruned, strict . 23 3.6.3. Construction of SSA form . 24 3.6.4. Specifics of SSA form in PHP . 25 3.6.5. Extended SSA form: Pi nodes . 27 3.6.6. Phi placement after pi placement . 29 4. Analysis and Optimization 31 4.1. Sparse conditional propagation of data-flow properties . 31 4.1.1. Requirements . 31 vii Contents 4.1.2. Algorithm . 32 4.1.3. Properties . 34 4.2. Type inference . 35 4.2.1. Type lattice . 36 4.2.2. Join operator and transfer function . 37 4.2.3. Flow-sensitivity: Feasible successors . 39 4.2.4. Flow-sensitivity: Pi type constraints . 40 4.2.5. Type narrowing . 41 4.3. Constant propagation . 44 4.3.1. Constant propagation lattice . 45 4.3.2. Transfer function and feasible successors . 45 4.3.3. Specifics of constant propagation in PHP . 46 4.3.4. Combining type inference and constant propagation . 49 4.4. Dead code elimination . 50 4.4.1. Algorithm . 50 4.4.2. PHP specific considerations . 51 4.5. Type specialization . 52 4.6. SSA liveness checks . 53 4.7. Copy propagation on conventional SSA form . 56 4.8. Function inlining . 58 4.9. Propagating information along the dominator tree . 59 4.10. Testing and verification . 60 5. Results 61 5.1. Microbenchmarks . 61 5.2. Real applications . 63 6. Conclusion and Outlook 65 A. Source Code 67 Acronyms 68 Bibliography 69 viii List of Figures 3.1. PHP compilation and execution pipeline . 17 3.2. Control flow graph orderings . 19 3.3. Dominator trees . 20 3.4. Motivation for SSA form . 22 3.5. Strict SSA form . 23 3.6. SSA form with assignments treated as uses . 26 3.7. Motivational example for extended SSA form . 27 3.8. φ-placement after π-placement . 29 4.1. Motivating example for type narrowing . 42 4.2. Constant propagation lattice . 45 4.3. Handling of references during constant propagation . 47 4.4. Related variable interference after copy propagation . 56 5.1. Normalized execution times for microbenchmarks . 62 5.2. Effect of indiviual optimizations on microbenchmarks . 63 ix List of Listings 1. Main loop of propagation framework . 33 2. Handling of individual instructions in the propagation framework . 34 3. Marking edges as feasible in propagation framework . 35 4. Main component of type narrowing algorithm . 43 5. Dead code elimination algorithm . 50 6. Instruction granularity live-in oracle using Boissinot’s algorithm . 56 x 1. Introduction Dynamic scripting languages are commonly chosen over classic statically typed languages, because their use of dynamic typing and lack of an explicit compilation step enables a higher degree of productivity. Unfortunately, the same features that make these languages productive, also make them hard to implement efficiently: In order to support their dynamic features, scripting languages are traditionally implemented using interpreters. An increasingly common avenue used to improve the performance of such languages, is the employment of a just-in-time (JIT) compiler, which generates native machine code at runtime. However, the implementation of JIT compilers is not only a major feat of engineering, but also carries a significant increase in implementation complexity. This thesis pursues a different approach, namely the use of classical data-flow optimization techniques to improve the quality of the interpreted bytecode. We implement our performance optimizations for the PHP programming language, which is commonly used for the server-side implementation of web applications. PHP powers both some of the largest websites such as Facebook1, Wikipedia and Yahoo, but also countless small websites, like personal blogs, discussion forums, etc. End-to-end web application performance depends on more factors than only the per- formance of the server-side programming language, in particular it also includes net- work transmissions, processing of database queries and client-side rendering. Nonetheless PHP’s performance plays an important role in determining the response time, throughput and resource utilization of web applications. Our approach to optimization is to reduce runtime overhead by applying classical data- flow optimizations, such as constant propagation, dead code elimination and copy prop- agation, to the PHP bytecode in static single assignment (SSA) form. Type inference is used both to enable the use of type-specialized