The Hiphop Compiler for PHP
Total Page:16
File Type:pdf, Size:1020Kb
The HipHop Compiler for PHP Haiping Zhao Iain Proctor Minghui Yang Xin Qi Mark Williams Qi Gao Guilherme Ottoni ∗ Andrew Paroski Scott MacVicar Jason Evans Stephen Tu † Facebook, Inc. Abstract 1. Introduction Scripting languages are widely used to quickly accomplish General-purpose scripting languages like Perl, Python, PHP, a variety of tasks because of the high productivity they en- Ruby, and Lua have been widely used to quickly accom- able. Among other reasons, this increased productivity re- plish a broad variety of tasks. Although there are several sults from a combination of extensive libraries, fast devel- reasons for the widespread adoption of these languages, the opment cycle, dynamic typing, and polymorphism. The dy- key common factor across all of them is the high program- namic features of scripting languages are traditionally asso- mers’ productivity that they enable [14]. This high produc- ciated with interpreters, which is the approach used to imple- tivity is the result of a number of factors. First, these lan- ment most scripting languages. Although easy to implement, guages typically provide a rich set of libraries, which re- interpreters are generally slow, which makes scripting lan- duce the amount of code that programmers need to write, guages prohibitive for implementing large, CPU-intensive test, and debug. Second, the dynamically-typed nature of applications. This efficiency problem is particularly impor- scripting languages provides flexibility and high degrees of tant for PHP given that it is the most commonly used lan- dynamic polymorphism. Last, and most important, dynamic guage for server-side web development. languages are in general purely interpreted, thus providing This paper presents the design, implementation, and an a fast cycle to modify and test source changes. By “purely evaluation of the HipHop compiler for PHP. HipHop goes interpreted” we mean that there is no compilation step nec- against the standard practice and implements a very dy- essary to run a program after writing or modifying it. This namic language through static compilation. After describ- advantage is particularly relevant in the development of very ing the most challenging PHP features to support through large systems, which typically take substantial time to stati- static compilation, this paper presents HipHop’s design and cally compile and link. techniques that support almost all PHP features. We then On the down side, the major issue with scripting lan- present a thorough evaluation of HipHop running both stan- guages in general is performance. Because of their dynamic dard benchmarks and the Facebook web site. Overall, our features, these languages are typically implemented via in- experiments demonstrate that HipHop is about 5.5× faster terpreters. Experiments have shown that programs tend to than standard, interpreted PHP engines. As a result, HipHop take an order of magnitude more CPU time to run when im- has reduced the number of servers needed to run Facebook plemented in interpreted languages compared to correspond- and other web sites by a factor between 4 and 6, thus drasti- ing implementations in compiled languages [22]. cally cutting operating costs. Although performance is often not important for small Categories and Subject Descriptors D.3.4 [Programming scripts written to accomplish simple tasks, it does become a Languages]: Processors – compilers, optimization, code major problem when developing large CPU-intensive appli- generation cations. This has spurred a number of efforts to try to im- General Terms Languages, Design, Performance prove the performance of scripting languages [3, 8, 10, 19, Keywords PHP, dynamic languages, compilation, C++ 21, 24]. Among all general-purpose scripting languages, this ∗ Contact author. E-mail address: [email protected] problem is probably most pronounced for PHP, since it is the † Currently affiliated with MIT. most commonly used language for server-side web develop- ment [23]. Given the scale of users and data that Internet Permission to make digital or hard copies of all or part of this work for personal or companies have, improving server efficiency is a key factor classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation to reduce their operating costs incurred by running huge data on the first page. To copy otherwise, to republish, to post on servers or to redistribute centers. to lists, requires prior specific permission and/or a fee. OOPSLA’12, October 19–26, 2012, Tucson, Arizona, USA. Copyright c 2012 ACM 978-1-4503-1561-6/12/10. .. $10.00 This paper describes the approach that we took at Face- 01) <?php book to address the performance problem of PHP. Our main 02) insight to improve performance was to implement PHP via 03) define("confName", "OOPSLA"); a static compiler, called HipHop. Although static compila- 04) define("firstYear", 1986); 05) tion requires all the code to be known in advance, this is a 06) function year($edition) { minor imposition for server-side applications, which are the 07) return firstYear - 1 + $edition; core usage of PHP. Going against the standard practice of 08) } implementing PHP via interpreters, HipHop has proved to 09) be a high risk, high reward project. As this paper describes, 10) echo "Hello ". confName . "’" . year(27); there are a number of challenges to enable static compila- tion of a dynamic language like PHP. Despite these chal- Figure 1. PHP code example lenges, HipHop supports a very large subset of PHP, and it only lacks a few, less useful language features. HipHop is a source-to-source compiler, and it transforms PHP programs 2. PHP Background into semantically equivalent C++ programs. This design en- ables leveraging high-quality optimizing C++ compilers to PHP is an imperative scripting language, originally con- produce machine code, and we currently use GCC [9]. To ceived in 1995 for creating dynamic web pages. After its ini- obtain high performance, in addition to leveraging GCC op- tial definition, PHP has evolved significantly. Since version timizations, HipHop also implements a number of optimiza- 3, PHP is an object-oriented programming language and to- tions. As our experiments show, this approach has enabled day, at version 5.4, it is a general-purpose language used be- HipHop to run both benchmark and real-world applications yond web development. Like Java, PHP combines interfaces 5.5× faster on average than the standard PHP implementa- and single inheritance. However, since version 5.4, PHP also tion [25]. Finally, although HipHop only supports PHP, we supports traits [5]. In the absence of a formal specification, believe that many of its techniques can be used to improve the language is defined by its standard implementation [25]. the performance of other scripting languages as well. Although PHP syntactically resembles C++, PHP is in- HipHop has been released by Facebook as an open-source herently much more dynamic. Figure 1 illustrates a simple project [6], and it is currently deployed not only within PHP example, which prints “Hello OOPSLA’2012”. The Facebook but also in other companies. In all these cases, defines in lines 3 and 4 are calls to a built-in function that HipHop has reduced the number of servers needed to run declares a constant named after its first argument and with web sites by a factor between 4 and 6, thus drastically cutting value given by its second argument. In line 10, echo prints operating costs. to the output and ‘.’ is string concatenation. Syntactically, This paper describes our experience in the design and some key differences from C++ are: (1) names of variables implementation of the HipHop compiler. More specifically, start with ‘$’, and (2) variables are not declared. this paper makes the following main contributions: Given that C++ well represents the class of static, high- performing languages, while PHP is representative of the much worse performing scripting languages1, it is worth 1. It describes the set of techniques that enabled implement- pointing out the main differences between the two lan- ing a general-purpose, dynamic scripting language with a guages. After describing these differences in Section 2.1, static compiler. Section 2.2 briefly introduces the approach used in PHP’s 2. It describes the set of compilation techniques that enable standard implementation. This brief overview provides key HipHop to achieve outstanding performance for PHP. insights into the performance gap between PHP and C++. 3. It presents a thorough experimental evaluation of HipHop 2.1 Comparison between PHP and C++ running both benchmarks and real-world applications Table 1 summarizes the key language features that are more from Facebook. This evaluation includes comparisons general in PHP than in C++. In general, these PHP features with other PHP execution engines. incur significant runtime costs, and their more restricted forms in C++ enable more efficient execution. Below, we The rest of this paper is organized as follows. Section 2 briefly discuss and illustrate each of these features. This gives a brief introduction to PHP, its key dynamic features, discussion not only gives insights into the performance gap and its standard implementation. Section 3 then presents the between the two languages, but it also demonstrates the design of the HipHop compiler and its main techniques. Fol- biggest challenges that HipHop faces to translate PHP code lowing that, Section 4 presents a detailed experimental eval- into C++. uation of HipHop and compares it to other PHP execution engines. Finally, Section 5 discusses related work and Sec- 1 The performance gap is 36.8× for measured benchmarks using Zend tion 6 concludes the paper. PHP [22]. Feature PHP C++ to oo. In the last statement, this example creates an object of types dynamic static class oo. function and class name binding dynamic static The following example illustrates dynamic properties: dynamic name resolution yes no class C{ dynamic symbol inspection yes no public $declProp = 1; reflection yes no } dynamic code evaluation (eval) yes no $obj = new C; Table 1.