Static Optimization in PHP 7
Nikita Popov, Biagio Cosenza, Ben Juurlink
Technische Universität Berlin, Germany

Dmitry Stogov
Zend Technologies, Russia

[email protected], fcosenza, [email protected] [email protected]

Abstract

PHP is a dynamically typed programming language commonly used for the server-side implementation of web applications. Approachability and ease of deployment have made PHP one of the most widely used scripting languages for the web, powering important web applications such as WordPress, Wikipedia, and Facebook. PHP's highly dynamic nature, while providing useful language features, also makes it hard to optimize statically.

This paper reports on the implementation of purely static bytecode optimizations for PHP 7, the last major version of PHP. We discuss the challenge of integrating classical compiler optimizations, which have been developed in the context of statically-typed languages, into a programming language that is dynamically and weakly typed, and supports a plethora of dynamic language features.

Based on a careful analysis of language semantics, we adapt static single assignment (SSA) form for use in PHP. Combined with type inference, this allows type-based specialization of instructions, as well as the application of various classical SSA-enabled compiler optimizations such as constant propagation or dead code elimination.

We evaluate the impact of the proposed static optimizations on a wide collection of programs, including micro-benchmarks, libraries and web frameworks. Despite the dynamic nature of PHP, our approach achieves an average speedup of 50% on micro-benchmarks, 13% on computationally intensive libraries, as well as 1.1% (MediaWiki) and 3.5% (WordPress) on web applications.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors - Compilers; D.3.4 [Programming Languages]: Processors - Optimization

Keywords: PHP, static optimization, SSA form

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CC'17, February 5–6, 2017, Austin, TX, USA
© 2017 ACM. 978-1-4503-5233-8/17/02...$15.00
http://dx.doi.org/10.1145/3033019.3033026

1. Introduction

In order to keep pace with the rapidly increasing growth of the Web, web application development predominantly favors the use of scripting languages, whose increased productivity due to dynamic typing and an interactive development workflow is valued over the better performance of compiled languages.

PHP is one of the most popular [1] scripting languages used for the server-side implementation of web applications. It powers some of the largest websites such as Facebook, Wikipedia and Yahoo, but also countless small websites like personal blogs.

In order to support its more dynamic features, PHP, like many other scripting languages, has traditionally been implemented using an interpreter. While this provides a relatively simple and portable implementation, interpretation is notoriously slower than the execution of native code. For this reason, an increasingly common avenue to improving the performance of dynamic languages is the implementation of just-in-time (JIT) compilers [2], such as the HHVM compiler for PHP [3]. On the other hand, JIT compilers carry a large cost in terms of implementation complexity.

In this work, we pursue a different approach: purely static, transparent, bytecode-level optimization. By this we mean that a) runtime feedback is not used in any form, b) no modification to the virtual machine or other runtime components is required and c) optimizations occur on the bytecode of the reference PHP implementation. The latter point implies that, unlike many alternative PHP implementations, we must support the full scope of the language, including little-used and hard-to-optimize features.

This static approach is motivated by the PHP execution model, which uses multiple processes to serve short-running requests based on a common shared memory bytecode cache. As this makes runtime bytecode updates problematic, many dynamic optimization methods become inapplicable or less efficient. We pursue interpretative optimizations partly due to the success of PHP 7, whose optimized interpreter implementation performs within 20% of the HHVM JIT compiler for many typical web applications [4].

Our optimization infrastructure is based on static single assignment (SSA) form [5] and makes use of type inference, both to enable type-based instruction specialization and to support a range of classical SSA-based optimizations. Because PHP is dynamically typed and supports many dynamic language features such as scope introspection, the application of classical data-flow optimizations, which have been developed in the context of statically typed languages, is challenging. This requires a careful analysis of problematic language semantics and some adaptations to SSA form and the used optimization algorithms.

Parts of the described optimization infrastructure will be part of PHP 7.1. Our main contributions are:

1. A new approach to introducing SSA form into the PHP language, including adaptation for special assignment semantics and enhancement of type inference using π-nodes.
2. The implementation and analysis of a wide range of SSA-enabled optimizations for a dynamic language.
3. An experimental evaluation on a collection of micro-benchmarks, libraries and applications, including WordPress and MediaWiki.

The remainder of the paper is structured as follows: section 2 describes related work on dynamic language optimization. Section 3 presents relevant PHP language semantics and section 4 discusses the use of SSA form in PHP. SSA-enabled static optimizations investigated in this work are described in section 5. An experimental evaluation on micro-benchmarks, libraries and applications is presented in section 6. The paper closes with a discussion and conclusion in sections 7 and 8.

2. Related Work

SSA. Static single assignment form [5] has become the preferred intermediate representation for program analysis and optimizing code transformations, and is used by many modern optimizing compilers [6–8]. Data-flow algorithms are often simpler to implement, more precise and more performant when implemented on SSA form. Typical examples include sparse conditional constant propagation [9] and global value numbering [10]. More recently, SSA form has become of interest for compiler backends as well, because the chordality of the SSA interference graph simplifies register allocation [11].

Specific applications often require or benefit from extensions of the basic SSA paradigm. Array SSA form [12] modifies SSA to capture precise element-level data-flow information for arrays for use in parallelization. Hashed SSA form [13] extends SSA to handle aliasing, by introducing additional µ (may-use) and χ (may-define) nodes. The ABCD algorithm [14] introduces π-nodes to improve the accuracy of value range inference. In this work, we further extend this idea for use in type inference.

A focus of recent research has been on the formal verification of SSA-based optimizations [15–17], as well as SSA construction [18] and destruction [19].

Dynamic language optimization. Many different approaches to improving the performance of traditionally interpreted dynamic languages have been investigated. The most successful in terms of raw performance are JIT compilers [2].

Another avenue is the translation of code to a lower-level language. For example, the Starkiller project [20] translates Python code to C++, using an augmented Cartesian product algorithm [21] for type inference. However, this approach is often not able to support all language semantics.

[...] representation is used, instead all operations are performed on the AST level. HPHPc does not support some of PHP's dynamic language features and requires all code to be known in advance.

The phc compiler [30] also translates PHP to C. A large focus of the phc implementation is on accurately modeling the aliasing behavior of references. To achieve this, flow- and context-sensitive alias analysis, type inference and constant propagation are performed simultaneously and prior to construction of Hashed SSA form. In our work we will largely ignore this aspect, because accurate handling of references has become much less important after PHP 5.4 removed support for call-time pass-by-reference. Additionally, issues that will be discussed in section 3.5 effectively prevent this kind of analysis if PHP's error handling model is fully supported.

A number of alternative PHP implementations leverage existing JIT implementations. Phalanger [31] and its successor Peachpie [32] target the .NET CLR, while Quercus [33] and JPHP [34] target the JVM. HippyVM [35] uses the RPython toolchain. While many of these projects report improvements over PHP 5, they cannot achieve the same level of performance as a special-purpose JIT compiler such as HHVM.

3. Optimization Constraints

PHP supports a number of language features that complicate static analysis. In the following, we discuss how they affect optimization and also justify why we consider certain optimization approaches to be presently impractical. Some of the mentioned issues apply to many scripting languages (dynamic typing), while others are PHP specific (references). As we operate on the bytecode of the reference PHP implementation, a few implementation-specific constraints are also covered.

While the following discussion primarily deals with features that inhibit optimization, there are also two properties of the PHP language that make it more amenable to static optimization than many other scripting languages: First, PHP has a strictly separated