
Compiling Dynamic Languages

Anders Schlichtkrull, Rasmus T. Tjalk-Bøggild

Kongens Lyngby 2013
Compute-B.Sc.-2013-15

Technical University of Denmark
Department of Applied Mathematics and Computer Science
Matematiktorvet, Building 303 B, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253031
[email protected]
http://www.compute.dtu.dk/

Abstract

The purpose of this thesis is to examine if ahead-of-time compilation is a viable solution for executing programs written in Dynamic Languages when compared to interpretation and just-in-time compilation, and to find out how such a solution can be made.

We achieve this by first describing and classifying Dynamic Languages in terms of their type systems and programming concepts. Specifically we will show how the Dynamic Languages differ from Statically Typed Languages.

We then build an ahead-of-time compiler, called the project compiler, for a subset of JavaScript. The subset is large enough to constitute a Dynamic Language and large enough to run an industry standard benchmark. The ahead-of-time compiler can also do optimizations. Furthermore, we benchmark this compiler implementation and compare it to the benchmark results for two contemporary JavaScript implementations: Rhino, which is a JavaScript interpreter and compiler, and V8, which is a JavaScript just-in-time compiler. We also profile the implementation to measure the effect of the implemented optimizations.

Based on these results we find that the performance of the project compiler is better than that of the interpreter, but the performance is worse than that of the two other compilers. The performance of the project compiler is influenced negatively by its implementation of JavaScript values in memory. We also find that the implemented optimizations give significant performance benefits for simple examples, but do not improve the performance significantly when running more complex benchmarks.

We conclude that ahead-of-time compilation is a viable alternative to interpretation of dynamic languages, but based on the collected results we are unable to make any generalizations with regards to ahead-of-time compilation compared to just-in-time compilation.

Contribution of work

We have both contributed equal amounts of work to the project.

Acknowledgements

We would like to thank our supervisors Christian W. Probst and Sven Karlsson for the chance to work on this project as well as for their valuable input and feedback.

We would also like to thank our families for their understanding and support throughout the project period.

Contents

Abstract
Acknowledgements

1 Introduction

2 Preliminaries and theory background
  2.1 Dictionary
  2.2 Overview of Dynamic Languages
    2.2.1 Included languages
    2.2.2 JavaScript
    2.2.3 Python
    2.2.4 Ruby
    2.2.5 Perl
    2.2.6 PHP
  2.3 Common characteristics for Dynamic Languages
  2.4 A typical Compiler Design
  2.5 Special requirements for Dynamic Languages
    2.5.1 Delayed type checking
    2.5.2 Type coercion
    2.5.3 Closures
    2.5.4 Built-in data structures
    2.5.5 Garbage Collection
    2.5.6 Context
  2.6 Related works

3 Analysis of the problem
  3.1 Choice of output language
  3.2 Representing nested functions
  3.3 Representing values and variables
  3.4 Representing objects
  3.5 Meta programming issues
  3.6 Garbage Collection
  3.7 Possible optimizations

4 Implementation work
  4.1 Compiler structure
    4.1.1 Phases included in the project compiler
    4.1.2 Lexing and parsing
    4.1.3 Context handling
    4.1.4 Intermediate code generation
    4.1.5 Translating TAC to IR
    4.1.6 Output code generation
    4.1.7 Build
    4.1.8 Environments and values
    4.1.9 Operators
    4.1.10 Garbage Collector
  4.2 Implementation of optimizations
  4.3 Limitations of our implementation
    4.3.1 Lex and Parse level
    4.3.2 IR generation level
  4.4 Verification of the compiler
  4.5 Description of the tool

5 Testing and performance analysis
  5.1 Benchmark Design
    5.1.1 Comparison to contemporary implementations
    5.1.2 Measure the effect of optimizations
  5.2 Benchmark setup
    5.2.1 Chosen Benchmarks
    5.2.2 Benchmarking against contemporary implementations
    5.2.3 Measurements
    5.2.4 Profiling the optimizations we have made
    5.2.5 Machine
    5.2.6 Limitations of the benchmark setup
  5.3 Benchmark results
    5.3.1 Profiling the project compiler
    5.3.2 Results of optimizations
    5.3.3 Performance comparison between the project compiler and NodeJs
    5.3.4 Performance comparison between the project compiler and RingoJs
    5.3.5 Compile timings
  5.4 The project compiler performance bottlenecks
    5.4.1 Object properties
    5.4.2 Garbage collection
    5.4.3 Run-time support

6 Discussion
  6.1 Compiler Implementation
    6.1.1 Compiler design
  6.2 Just-in-time compilation vs Ahead-of-time compilation in Dynamic Languages
    6.2.1 Improving the project implementation
    6.2.2 Testing just-in-time vs ahead-of-time compilation models

7 Conclusions
  7.1 Future work
    7.1.1 Performance improvements
    7.1.2 Compile timings

A Appendix A - Benefits of Dynamic Languages
  A.1 Primary arguments for the benefits of Dynamic Languages
  A.2 Examples of problems suitable for Dynamic Languages
  A.3 Context

B Appendix B - Run-time Data

C Appendix C - Memory usage Data

D Appendix D - data

E Appendix E - Profiling data

F Appendix F - NodeJs data
  F.1 NodeJs optimization flag scores
  F.2 NodeJs Garbage Collection data for DeltaBlue
  F.3 NodeJs profiling data for the DeltaBlue benchmark
  F.4 NodeJs output

G Appendix G - RingoJs compiler output

Bibliography

Chapter 1

Introduction

Dynamic languages, and in particular JavaScript, are among the most widely used programming languages [Wel13] [Sof13] [Git13]. The dynamic languages have many features considered beneficial to the programmer such as a type system that does not impose restrictions on the types at compile time, easy access to data structures and concepts such as meta programming, higher-order functions and object oriented programming. Furthermore, the languages are used in many different contexts including web services, game scripting and rapid prototyping [OZPSG10].

Classically, dynamic languages have been interpreted, that is, evaluated at run-time by an interpreter program. In recent years, dynamic languages have also been compiled, that is, translated to another language such as native code before being executed. In particular, the compilation model for translating the program to native machine code, known as just-in-time compilation, has been used. In this model the compilation for a section of the program is performed just before it is used. This allows for faster start-up times than compiling the entire program ahead of time when a compiled version of the program is not available. These approaches especially make sense for executing JavaScript when used for client side web-programming, where it is important that the execution can start immediately.

However, Dynamic languages, including JavaScript, are also used on servers, where the same code is run many times. For this purpose it would make sense to do the translation to native code only once, so that the run-time is only used on the actual execution of the program code. This technique is called ahead-of-time compilation.

The purpose of the project is to build an ahead-of-time compiler for the dynamic language JavaScript and measure the performance of this compiler against contemporary implementations. Specifically, the project will examine if ahead-of-time compilation is a viable solution when compared to both interpretation and just-in-time compilation.

To do this, an overview of the major contemporary dynamic languages is given first. The languages are introduced with a specific focus on how they differ from statically typed languages as well as the special requirements that the dynamic languages have during compilation and run-time. JavaScript is then placed in this framework to describe how it fits among the dynamic languages.

With JavaScript as the base, the major moving parts of a compiler are then analysed with a focus on how specific dynamic aspects of the language are translated.

The actual implementation of these aspects in the project compiler is then described, as well as how the optimization of the code is performed.

Finally, the project compiler is benchmarked. Firstly, to measure the performance against contemporary implementations (a JavaScript interpreter, a JavaScript ahead-of-time compiler and a JavaScript just-in-time compiler). Secondly, to measure the effect of the optimizations implemented.

Based on these benchmarks we will evaluate the viability of the compiler when compared to contemporary JavaScript implementations.

Chapter 2

Preliminaries and theory background

This chapter will provide the theoretical background needed to understand the report. The central terms of the report will be defined in a dictionary. Furthermore it will give an overview of the subject of Dynamic Languages and compiling.

2.1 Dictionary

The purpose of the dictionary is to establish unambiguous definitions of the central terms in this paper, to avoid the confusion sometimes associated with some of the terms.

Compiler A compiler is a computer program that takes a program written in a programming language called the source language and translates it to another language called the target language. The translation should not change the meaning of the program.

Interpreter An interpreter is a computer program that takes a program written in a programming language, and executes the instructions of the program one by one.

Many well known interpreters also include a form of compiling, because they start by translating the source language to an intermediate language, and then do the interpretation on that language [ALSU86, p. 2].

Type A type can be seen as a set of values. An example is the integer type defined in many programming languages. The integer type usually represents the integers in the interval [−2^31; 2^31 − 1].

Run-time Anything that happens during the execution of the program is said to happen at run-time.

Compile-time Anything that happens before the execution of the program is said to happen at compile time.

Statically typed language A statically typed language is a language in which the types of expressions and variables are known at compile-time [JGB12, p. 41]. That a variable has a type means that only values of that type can be assigned to the variable. That an expression has a type means that the expression will evaluate to a value of that type.

Dynamically typed language A language in which the types of expressions and variables are not, in general, known at compile time.

Note that there is a distinction between dynamically typed languages and dy- namic languages.

Strongly typed language A language where the types of the operands limit the operators that can legally be applied to them. In Java, for instance, the input to a "+" operator is limited to strings and numbers, and the produced result depends on the types [JGB12, p. 41].

Weakly typed language A language where the types of the operands do not limit the allowed operators.

An example is the Perl language, where the input to an operator or function is implicitly converted if needed. The "." operator takes two strings and concatenates them; the language will implicitly convert any operand to a string before applying the operator. Conversely, the "+" operator will convert its operands to numbers. So $foo = "10" + "20"; will yield the value 30, whereas $foo = 1 . 2; will yield the value "12" [per13c, p. 5]. The conversion is called type coercion.

Implicitly typed language A language where the programmer does not have to annotate variables or functions with types.

Explicitly typed language A language in which the variables and functions must be annotated with their type.

Boxed / Unboxed variables A variable is considered to be boxed if its value is stored at run-time along with type information. An unboxed variable stores only the value itself.

Meta-programming Meta-programming is the querying and manipulation of a program by the program itself [Tra09]. Often the definition is extended to include the creation of other programs, but for the purpose of this project, the definition is limited to programs that query and manipulate themselves.

Scope The scope at a point in the program is the set of variables that can be accessed at that point.

Closure The closure of a function consists of the body of the function and its environment. The environment is a mapping of the variables from their identifiers to the memory location of their values.

Classical inheritance The object oriented inheritance model used in languages such as Java or C++. In this inheritance model, object classes inherit all visible methods and fields from their super class [JGB12]. A class is allowed to override a method of a super class and thereby provide a new implementation for it.

Prototype inheritance In the prototype inheritance model, each object may have another object assigned as its prototype. This object may again have a prototype assigned - this is referred to as the prototype chain. An object may define its own methods and fields, but will also inherit any methods or fields defined in its prototype chain. To the outside user there is no difference between the object's own methods and fields and the methods and fields in the prototype chain. An object may provide its own definition for methods or fields defined higher in the prototype chain [Per13e].

2.2 Overview of Dynamic Languages

The purpose of this overview is to establish the common characteristics of the Dynamic Languages. We will do this by first examining a subset of the Dynamic Languages in detail and then, from this, extracting the common characteristics. These common characteristics, rather than a formal definition, will form our understanding and use of the term Dynamic Language in this paper.

2.2.1 Included languages

For the languages in this overview we wanted to include the dynamic languages most widely used today. We furthermore wanted to focus on contemporary languages, and avoid including domain specific languages like SQL or Matlab.

To measure the use of the different programming languages, we used published lists of programming language usage [Wel13], [Pau07]. The languages included in the overview are the dynamic languages among the 10 most widely used languages in the lists.

2.2.2 JavaScript

JavaScript is a scripting language designed primarily to be used in a Web Browser context, but the language has seen some use in server side applications in recent years [nod13] [ecm11, p. 2].

JavaScript is a Dynamically typed, Weakly Typed and Implicitly typed language [ecm11, p. 42]. The language defines a small set of built-in types (Boolean, Number, String, Object, Null and Undefined), but has a Prototype-Inheritance system which allows the user to define subtypes of the object type [ecm11, pp. 2-3].

The language supports a special syntax for initializing arrays and objects [ecm11, pp. 63-65]. Objects in JavaScript also function as dictionaries, a data structure that maps strings to other values. The arrays in JavaScript are resizable.

In JavaScript, functions are objects [ecm11, p. 7]. This means that anywhere where an object may be legally used in an expression or as an argument, a function may be used instead. That functions are proper objects means that the language supports Higher-Order Functions which are functions that take other functions as input and/or return a function as output. Functions are also used to create scope and functions form closures over the context in which they were created [ecm11, p. 42].

The language also supports Meta Programming, since all objects can be queried, that is, inspected for which properties they contain and what type they have. Furthermore, object properties and functions can be overwritten, removed and added at run-time - with the exception of a few native functions [ecm11, pp. 3-4].

JavaScript is specified in the ECMAScript Language Specification [ecm11].

2.2.3 Python

Python is a scripting language that has seen heavy use in the industry, with Google being one of its most prominent users. Among the uses are web programming, motion picture visual effects, game development and education [Fou13].

Python is a Dynamically typed, Strongly Typed and Implicitly typed language [pyt13a, 3.1]. Every type in the language is an object, including booleans, null objects, numbers, strings, functions and the classes themselves [pyt13a, 3.1]. This means that the language supports Higher-Order Functions. The language allows the user to extend the types by defining new classes [pyt13a, 3.1].

Python also supports a special syntax for initializing resizable arrays (lists) and dictionaries [pyt13a, 3.2].

Python, like JavaScript, supports Meta Programming, since all objects can be queried at run-time, and properties and functions of classes can be overwritten, removed and added at run-time [pyt13b].

2.2.4 Ruby

Ruby is a scripting language used primarily for server side web applications. It is used extensively with the Ruby-On-Rails framework [abo13c].

Ruby is a Dynamically typed, Strongly Typed and Implicitly typed language [rub10, pp. 12, 17]. Like Python, the language treats all types as objects, including booleans, null objects, numbers, strings and the classes themselves [rub10, pp. 14, 150-156]. Functions are not objects, but Higher-Order Functions are allowed using the Block structure [rub10, p. 13]. The language allows the user to extend the types by creating new classes.

Ruby has a simple built-in syntax for initializing arrays and dictionaries [rub10, p. 104]. The arrays in Ruby are resizable [rub10, p. 194].

Ruby also supports Meta Programming in the form of querying and updating object properties and functions.

2.2.5 Perl

Perl is a scripting language that has a history of being widely used for text processing and server side web programming [abo13a].

Perl is a Dynamically typed, Weakly Typed and Implicitly typed language [per13a, p. 2]. The language requires the user only to define if a variable is a scalar, an array of scalars (list) or a hash of scalars (dictionary). A scalar can be of the type number, string or reference, where a reference can be anything from an object reference to a file handle or database connection [per13a, p. 2]. The language supports a special syntax for initializing resizable arrays and dictionaries [per13a, pp. 7-9].

Originally the language did not allow the user to extend the type system, but recent versions of perl allow the user to define classes of objects [per13b]. The language also supports Higher-Order Functions [per13d].

Perl also supports Meta Programming, but does this through the use of the eval function [eva13] in the standard library, which can evaluate any string as part of the program's code. This qualifies as Meta Programming since the program can inspect and manipulate the string.

2.2.6 PHP

PHP is a scripting language originally designed for server-side web development and is also primarily used for this. The language can be embedded directly in HTML code and contains many built-in functions for web development [MAso13, Chapter: Preface], [MAso13, Chapter: Features].

PHP is a Dynamically typed, Weakly Typed and Implicitly typed language. The built-in types are booleans, integers, floating point numbers, strings and resizable arrays. Actually, the array type is a dictionary from integer or string values to values of any type [MAso13, Chapter: Language Reference.Types]. The language also supports Higher-Order Functions [MAso13, Chapter: Functions.Anonymous functions].

PHP has a simple built-in syntax for initializing arrays [MAso13, Chapter: Language Reference.Types].

In addition to these types, the user of the language can make new types by making classes. These classes can then be instantiated to objects of that type. Furthermore the user can make a class structure with inheritance, abstract classes and object interfaces [MAso13, Chapter: Language Reference.Classes and Objects].

PHP also supports Meta Programming by allowing classes to be queried at run- time, and properties and functions of them to be modified at run-time [GMM].

2.3 Common characteristics for Dynamic Languages

Based on the classification of the 5 Dynamic Languages reviewed above, we want to extract common characteristics for Dynamic Languages. Based on the type systems we see that all the reviewed languages are Dynamically typed and Implicitly typed. Dynamically typed means that the types of expressions and variables are not known at compile time. Therefore it makes sense to make the language Implicitly typed, because it means the programmer is not required to annotate variables and functions with types that cannot be guaranteed at compile-time anyway.

However, as Python and Ruby show, Dynamic Languages can be Strongly Typed, which means that though the types of the operands provided to an operator are not known at compile-time, the program can still produce type errors when values of the wrong type are given to an operator. The error is just thrown at run-time instead of compile-time. Even in the Weakly typed languages, the values will have types, but the interpreter or compiler will ensure that they are automatically coerced to a valid type according to the operations performed on them.

Another defining characteristic for the Dynamic Languages reviewed here is that they all support Higher-Order Functions and Meta Programming. The exact mechanism for Meta Programming varies among the languages, but they all provide easy access to some way of querying and manipulating the program itself.

The reviewed languages also all support some special syntax for initializing common data structures like dynamic arrays and dictionaries. These data structures are a part of the language instead of being libraries. This gives the programmer easy and quick access to these data structures and in some cases, like JavaScript, forms the basis for a concise data transfer format (JSON) [ecm11, p. 203].

2.4 A typical Compiler Design

A compiler traditionally has the following phases: Lexing, Parsing, Context Handling, Intermediate Representation Generation, Intermediate Representation Optimization and Target Code Generation. A compiler can be either broad or narrow. A broad compiler reads the entire source program into memory and works on this, whereas a narrow compiler only reads a section of the program at a time and works on that. Broad compilers require more memory than the narrow compilers, but are usually faster and simpler to write. The broad compilers usually require only a single pass over each phase, whereas the narrow compilers require at least one pass over all phases for each section read.

Lexing and parsing The goal of lexing and parsing is to take the source code, which is a string of characters, and translate it first to a stream of tokens (lexing) and then to an abstract syntax tree (parsing). The meaning of a program is defined from its syntactic structure. Therefore, it is this structure that the compiler needs to know to translate the program to the target language. The abstract syntax tree (AST) is a concise tree representation of this structure [GBJL00, p. 52].

The AST for an expression such as (2-3)*x could for instance have a multiplication node at the root, with the subtraction expression (a node with the literals 2 and 3 as children) as its left child and the identifier x as its right child.
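As a minimal sketch of how such a tree could be represented (the node kinds and field names below are illustrative and are not taken from the project compiler, which builds its AST as Java classes):

/* One node kind per construct needed for the example expression. */
typedef enum { N_NUMBER, N_IDENTIFIER, N_BINARY_OP } NodeKind;

typedef struct AstNode {
    NodeKind kind;
    double number;           /* used when kind == N_NUMBER */
    const char *name;        /* identifier name or operator symbol */
    struct AstNode *left;    /* operands, used when kind == N_BINARY_OP */
    struct AstNode *right;
} AstNode;

/* (2-3)*x: a "*" node whose left child is a "-" node over the literals
   2 and 3, and whose right child is the identifier x. */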

Context handling Even if a program follows the syntax of the programming language, it might still be incorrect. For instance, many languages have restrictions on the names of the identifiers or want to ensure that the programmer does not use variables or functions that are not declared. Restrictions such as these are called context conditions [GBJL00, p. 194].

Context handling is also used for annotating the nodes of the syntax tree with semantical information, such as whether the expressions contain function calls. This information is used to help the intermediate representation generation, intermediate representation optimization and target code generation [GBJL00, p. 9].

Intermediate representation generation The goal of the intermediate representation generation is to take the abstract syntax tree and translate it to the intermediate representation. The intermediate representation lies between the source and target code, and a good intermediate representation will often be one that shares some properties with the source language and some properties with the target language.

Many compilers have multiple intermediate representations, to make the translations simpler, and because some optimizations are easier to do on a representation that is close to the source language, and others are easier to do on a representation that is close to the target language.

Intermediate representation optimization The intermediate representation optimization changes the intermediate representation in such a way that it will produce the same results as before, while improving some parameter such as run-time or memory usage.

Although it is called optimization, the resulting code is usually not optimal, but rather improved along the optimization metric.

Target code generation The target code generation will produce the target code from the intermediate representation. The closer the intermediate representation is to the target language, the simpler this part of the program will be.

2.5 Special requirements for Dynamic Languages

The purpose of this section is to introduce the special requirements that Dynamic Languages have with regards to their parse/compile and run-time phases. This section will focus on the differences between Dynamic Languages and Statically typed languages.

2.5.1 Delayed type checking

The major defining feature of the Dynamic Languages is the delayed type checking. Where a Statically Typed Language will perform the type checks at compile-time, the Dynamic Languages will have to perform these checks at run-time. This has the immediate effect that type information has to be integrated into the program - where purely Statically Typed Languages do not have to keep track of this information [GBJL00].

2.5.2 Type coercion

Although only relevant to Weakly Typed Languages, the type coercion in Dynamic Languages requires special attention. Where type coercions in Statically Typed Languages are solved at compile-time by inserting the relevant casts into the outputted code [GBJL00, pp. 455-456], the Dynamic Language must resolve this at run-time, because the types of the operands may change during the course of the program.

2.5.3 Closures

Since the examined Dynamic Languages support Higher-order functions, closures need to be addressed. The presence of closures in a language complicates the function call logic, because functions are not only defined by their parameters and body, but also the environment in which they were declared. This means, for instance, that local variables cannot necessarily be discarded when returning from a function because there might exist a closure that keeps a reference to them.

2.5.4 Built-in data structures

The dynamic languages studied in this project all support dictionaries and resizable arrays. The built-in data structures are therefore more sophisticated than those of languages that only have arrays with a fixed size, and where libraries implement dictionaries and resizable arrays. When compiling dynamic languages, these built-in data structures need to be addressed.

2.5.5 Garbage Collection

Popular interpreters and compilers for the dynamic languages we have studied have Garbage Collection [Tea13c] [Lip03] [pyt13a, 2.1] [abo13b] [per13b, pp. 7-9] [MAso13, Chapter: Features.Garbage Collection]. This is implemented so that the user of the language does not have to worry about removing objects from memory. The deallocation of memory is therefore implicit in the sense that the languages do not expose any mechanism for forcing a value to be removed from memory. To fully support a dynamic language, a compiler will have to implement a Garbage Collector of some sort.

2.5.6 Context

Some of the Dynamic Languages, like JavaScript and PHP, are tightly integrated into a special context. In PHP, the source code is allowed to contain regular HTML outside of the code blocks, which will be printed into the final output whenever it is encountered. If the HTML is in a loop or a function, it may actually be printed several times [MAso13, Basic Syntax - Escaping from HTML]. In JavaScript the context is typically provided by the browser and host web page (the global scope is actually the "window" object, which represents the host window in the browser). No output is possible in JavaScript without the browser or some similar context [ecm11, p. 2].

2.6 Related works

Trace-based Just-in-Time Type Specialization for Dynamic Languages Trace-based optimizations are based on the idea that just-in-time compilers do not have to compile code that supports all combinations of types, but rather just have to compile code to support the types observed [GES+09]. The approach in this article is to start a JavaScript interpreter first (keeping start-up times low) and compile "hot" functions (functions that are expensive, run-time wise) using the type information collected by the interpreter. This allows the system to have access to very fast native functions while at the same time maintaining full support of the JavaScript language.

The authors of this article later extended the work into the Mozilla JägerMonkey JavaScript trace-based compiler, used among others by the Firefox Browser [Man13].

Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches In this 1991 paper Hölzle et al. introduce the notion of Polymorphic Inline Caches [HCU91]. A polymorphic inline cache is essentially an optimized lookup cache. A lookup cache can be used in dynamic languages to cache the location of a method that is called on an object. Since the object type is not known ahead-of-time in a dynamic language, the object type and method name have to be resolved to a location at run-time. The lookup cache can speed this up by storing the result of that computation.

An inline cache moves the location of the cache from a global position to the call site. In essence, the code that would look up the location of the method is replaced, at run-time, with code that calls the destination method directly. Since this might fail - the object might be of a different type the next time the method is called - a guard is added to the beginning of the method. If the guard finds that the type is incorrect, the lookup routine is called and the inline cache updated.

Since this might lead to a lot of updates back and forth if the cache misses a lot, the article proposes yet another improvement: the polymorphic cache. In essence this inserts a table of locations after the first cache miss. This means that the call site now calls a special stub method that contains checks for all cached types. If this cache misses, the table is extended and the program continues as before.
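The following C fragment sketches the idea of such a call-site cache (the types and names are illustrative, and real implementations patch the generated machine code at the call site rather than consulting a data structure like this):

/* Each object carries a type id; a method is just a function pointer. */
typedef void (*Method)(void *self);

typedef struct {
    int type_id;
    Method *vtable;    /* per-type method table, indexed by method id */
} Object;

/* One entry per object type seen at this particular call site. */
typedef struct {
    int    n;          /* number of cached types */
    int    type_ids[4];
    Method targets[4];
} CallSiteCache;

/* Full lookup, used only on a cache miss. In a dynamic language this
   would be a dictionary lookup by method name. */
static Method lookup(Object *o, int method_id) {
    return o->vtable[method_id];
}

static void call_method(CallSiteCache *cache, Object *o, int method_id) {
    for (int i = 0; i < cache->n; i++) {
        if (cache->type_ids[i] == o->type_id) {   /* cache hit */
            cache->targets[i](o);
            return;
        }
    }
    /* cache miss: do the expensive lookup and extend the cache.
       When the cache is full we simply stop caching new types. */
    Method m = lookup(o, method_id);
    if (cache->n < 4) {
        cache->type_ids[cache->n] = o->type_id;
        cache->targets[cache->n]  = m;
        cache->n++;
    }
    m(o);
}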

The result is that rather than having one global cache that contains all combinations of object types and methods, each call site has a local cache that only contains the cache for the object types actually used at that call site. This keeps the caches as small as possible while still maintaining the benefit of the caches. The authors find significant improvements when implementing the Polymorphic Inline Caches in a context.

In relation to JavaScript, Polymorphic Inline Caches have been implemented in the Google V8 JavaScript just-in-time compiler.

Google V8 JavaScript Just-In-Time compiler V8 is a production-quality, widely used implementation of a JavaScript Just-In-Time compiler that uses the above concepts (along with several of its own improvements) [tea13b]. The V8 engine is used in the NodeJs server-side JavaScript context and in the Google Chrome browser [nod13].

Mozilla Rhino compiler and interpreter Rhino is a production-quality ahead-of-time compiler that can compile JavaScript to Java classes. The project also includes a JavaScript interpreter. The Rhino engine is used in the RingoJs server-side JavaScript context, and in other Java projects [Tea13f] [Dev13b].

Adding dynamically-typed language support to a statically-typed language compiler: performance evaluation, analysis, and tradeoffs This paper approaches the problem of compiling dynamic languages from a different angle: rather than building a completely new compiler, they extend an existing compiler for statically typed languages to support dynamic types [IOC+12].

The Java VM just-in-time compiler is extended to support Python bytecode. The paper finds that significant speed-ups can be achieved with their modified just-in-time compiler compared to the Python standard interpreter (CPython). These speedups are only achieved after introducing a set of new optimizations, though.

Chapter 3

Analysis of the problem

This chapter will describe the different choices we have to make when writing a compiler for JavaScript. It will suggest solutions and show examples that illustrate some of the challenges that have to be overcome when compiling the language to native code.

3.1 Choice of output language

When making a compiler, an important task is to choose the output language. Since we are exploring the options of compiling the program to native machine instructions, an obvious option is to choose machine instructions as the output language. This has the advantage that we will be able to control exactly which instructions the machine has to execute to run the program. Another advantage is that the language has no notion of types, because these concepts are too high level for a processor to handle. This means that we will not have to work around a static type system to implement the dynamic features of the source language. It is, however, also a disadvantage that we are close to the hardware, because it makes it more difficult to implement features like the objects of the language. An obvious weakness to this approach is also that we would have to know many details of the instructions of the machine we are compiling for.

Another approach would be to compile the program to an assembler language. An assembler language is a language that is very close to actual machine code but leaves out details such as which numbers represent the instructions, and a jump to a line number can be written as a jump to a label in the code. To produce the native machine code from the assembler code we can just use one of the many assemblers that have been made. The advantage of compiling to an assembler language is that it is so close to machine code that we would gain the same advantages; however, this also means that it has most of the same disadvantages.

We could also use a higher level language such as C. The advantage here is that the high level features of C could make it easier to implement the high level features of JavaScript. The disadvantage is, of course, that we do not have as much control over what the produced native machine instructions will be. However, even though C is a high level language, it is not as high level as languages such as Java or C#. Furthermore there are many good C compilers that can produce efficient machine code. Another advantage is that we have experience with C programming from a course on embedded systems, so we could spend our time on making a good implementation instead of on learning a new language.

We could also choose to use a programming language on an even higher level, such as the D programming language, which can, like C, be compiled to native code. An advantage over C is that it has direct support for many of the features such as nested functions and garbage collection. This is however also a disadvantage, since we perhaps would not find out how these features are compiled to a native setting, because we would just be using the features of that language. Another disadvantage is that we have no prior experience with the language.

We chose to use C as target language because we wanted to use a language that we were familiar with, and where it was relatively easy to implement the high level features such as objects.

3.2 Representing nested functions

In JavaScript, functions are objects, and can be declared as an expression anywhere in the code where an object can be used. This has the consequence that functions can be nested, that is, a function can be declared inside another function declaration. It also means that functions can take functions as arguments and return functions as values.

For instance:

function twice(f){
    function out(){
        f();
        f();
    }
    return out;
}

This means that we need to find a way to represent functions as values in C. The challenge is also to flatten the structure. This cannot simply be done as:

function twice(f){
    return out;
}
function out(){
    f();
    f();
}

because then we have the problem that the f in out has no connection to the f provided to twice as an argument. In other words, the challenge is to provide the function objects with the environments that map variable identifiers to values in memory.

This problem can also be illustrated by the following example:

function makeMultiplier(c){
    return function multiplier(x){ return x*c; };
}
var preserver = makeMultiplier(1);
var doubler = makeMultiplier(2);

In this example, when preserver is called, the c in multiplier should be bound to 1, but when doubler is called, the c in multiplier should be bound to 2.

Therefore preserver(10) will give 10, while doubler(10) will give 20. While the bodies of both functions are the same, their environments, that is, the mappings of the variables to locations in memory, are different.
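One common way to achieve this in a C target, sketched here with illustrative names (how the project compiler actually represents environments and values is described in chapter 4), is to pair a function pointer with a heap-allocated environment and pass the environment to the body as an extra argument:

#include <stdlib.h>

/* The environment of multiplier: it captures only c in this example. */
typedef struct { double c; } Env;

/* A function value: code pointer plus the environment it closed over. */
typedef struct {
    double (*code)(Env *env, double arg);
    Env *env;
} Closure;

/* Body of multiplier(x){ return x*c; }: c is read from the environment. */
static double multiplier_body(Env *env, double x) {
    return x * env->c;
}

/* makeMultiplier(c): allocate an environment on the heap (it must outlive
   the call) and build a closure over it. */
static Closure makeMultiplier(double c) {
    Env *env = malloc(sizeof(Env));
    env->c = c;
    Closure f = { multiplier_body, env };
    return f;
}

/* Calling a closure: preserver.code(preserver.env, 10) gives 10, while
   doubler.code(doubler.env, 10) gives 20. */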

3.3 Representing values and variables

Since JavaScript is dynamically typed, a variable can contain a value of any type. This means that a variable can contain values of different types during the execution of the program. Therefore, a variable cannot be said to have a distinct type.

var y = true;
for(var x = 0; x < 10; x++){
    if(x % 2){
        y = 42;
    }
    else{
        y = "blah";
    }
}

In the above code we see that the value of y will be true, 42 and "blah", respectively, during the execution, corresponding to the types boolean, numeric and string. This is a challenge when compiling JavaScript to C, because C restricts a variable to only contain values of one kind. Furthermore, in the JavaScript code

var x,y;
x = 1;
y = 2;
console.log(x + y);
x = "a";
console.log(x + y);

the program will first write "3" on the console, and then "a2". This shows that the result from the addition depends on the types of the values of x and y. Depending on the types, the addition operator will either do addition or concatenation. Therefore it is also necessary to know the types of the values at run-time.
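A common solution, sketched here in C with illustrative names (this is not the project compiler's actual run-time representation, which is described in chapter 4), is to box every value together with a type tag and let each operator inspect the tags at run-time:

#include <stdio.h>

typedef enum { T_NUMBER, T_STRING } Tag;   /* other JavaScript types omitted */

/* A boxed JavaScript value: the type tag plus the raw data. */
typedef struct {
    Tag tag;
    union {
        double number;
        const char *string;   /* a real run-time would manage this memory */
    } as;
} Value;

/* The '+' operator must choose between addition and concatenation based
   on the run-time tags of its operands (a fixed static buffer keeps the
   sketch short). */
Value js_add(Value a, Value b) {
    Value r;
    if (a.tag == T_NUMBER && b.tag == T_NUMBER) {
        r.tag = T_NUMBER;
        r.as.number = a.as.number + b.as.number;
    } else {
        /* at least one operand is a string: coerce both and concatenate */
        static char buf[64];
        char sa[32], sb[32];
        if (a.tag == T_NUMBER) snprintf(sa, sizeof sa, "%g", a.as.number);
        else                   snprintf(sa, sizeof sa, "%s", a.as.string);
        if (b.tag == T_NUMBER) snprintf(sb, sizeof sb, "%g", b.as.number);
        else                   snprintf(sb, sizeof sb, "%s", b.as.string);
        snprintf(buf, sizeof buf, "%s%s", sa, sb);
        r.tag = T_STRING;
        r.as.string = buf;
    }
    return r;
}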

3.4 Representing objects

A JavaScript object is a storage unit for a set of properties that each map from a string to a JavaScript value. Exactly which properties an object contains can change over the duration of a program as well as the value type contained by each property.

var a = {};          //a has no properties
a.b = true;          //a has property b
if(x){
    a.cat = "cat";   //a has property b and cat
}
else{
    a.dog = "dog";   //a has property b and dog which is a string
    a.dog = true;    //a has property b and dog which is a boolean
}

Some properties further have a special meaning: If an object is created from a function with the "new" operator, it is the "prototype" property of the function that will be the prototype of the created object.

All property names in JavaScript are strings (even array indices): this means that a[0], a[0.0] and a["0"] all access the same property, namely the property with the name "0". So, in order to represent a JavaScript object in memory, some sort of map from strings to JavaScript values is needed.

One obvious way to do this is to simply model each JavaScript object as a hash-map from strings to JavaScript values. This is simple to model, but has the potential disadvantage that two objects with the same fields contain the mapping twice.

Another way to model this is to make the mapping global and to represent each object instance with two pieces of information: 1) a link to the correct global mapping for the current properties, and 2) an array/list of JavaScript values that contains the actual values at the indices specified in the global map.

For this to work properly, an effective way to determine the correct global map is needed: Every time a new property is inserted or deleted, the object must point to a new global hash-map.

Combinations of these two extremes are also possible: A solution could allow each object to have a small number of properties in a map of its own, but when too many changes are made, a new global map is created.
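The following C fragment sketches the shared-map idea (the names are illustrative, and this is not necessarily how the project compiler represents objects):

#include <string.h>

struct Value;               /* a boxed JavaScript value, as in section 3.3 */

/* A shared "shape": the property names, in slot order. All objects with
   the same set of properties can point to the same shape. */
typedef struct {
    int nprops;
    const char **names;     /* names[i] is the property stored in slot i */
} Shape;

typedef struct {
    const Shape *shape;
    struct Value *slots;    /* slots[i] holds the value of names[i] */
} Object;

/* Property lookup: find the slot index in the shared shape, then read the
   per-object slot. A real implementation would use a hash map here, and
   adding or deleting a property would switch the object to another shape
   (not shown). */
static int slot_of(const Shape *shape, const char *name) {
    for (int i = 0; i < shape->nprops; i++)
        if (strcmp(shape->names[i], name) == 0)
            return i;
    return -1;              /* property not present on this shape */
}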

Also, if one models objects in this way, special attention must be given to the representation of arrays, since these are, in principle, just objects with a special implementation of the "length" field. In JavaScript, they are often used as list structures, however, so it would probably not be beneficial to have a global map for each array currently in the program - at least it would be necessary to remove unused global maps to avoid filling the memory with global maps.

3.5 Meta programming issues

In addition to the issues outlined above with JavaScript values and objects that can change during the execution of the program, JavaScript has a number of meta-programming features that need to be addressed by a compiler:

• Functions can be redefined at run time

var f = function(x){ return x+1; };
console.log(f(1)); //writes 2 on console
f = function(x){ return x+2; };
console.log(f(1)); //writes 3 on console

• Functions can be used as constructors and as normal functions inter- changeably: the value of the "this" object will change to reflect whether a new object is being created or not

• Every property (even of most native objects) can be replaced at run time

console.log("output") //writes output in the console console.log = function(){}; console.log("output") //does not write in the console 3.6 Garbage Collection 23

• Inheritance: Prototype can be replaced: two object instances created with the same constructor function might not inherit the same properties via the prototype chain

var car = {wheels: 4, type: "car"};
var Suzuki = function(){ this.brand = "Suzuki"; };
Suzuki.prototype = car;
var myCar = new Suzuki(); //the proto of myCar is car
var motorCycle = {wheels: 2, type: "motorCycle"};
Suzuki.prototype = motorCycle;
var myMotorCycle = new Suzuki();
//The proto of myMotorCycle is motorCycle, though it was made from the Suzuki
//constructor. myCar was also made from Suzuki, but has car as proto.

These issues restrict the ways that JavaScript values can be modelled: Functions need to be modelled as pointers, the "this" variable needs to be looked up at run-time, most native objects will have to be made accessible as any other JavaScript object in the program and properties need to be looked up in the prototype chain at run-time.

3.6 Garbage Collection

In JavaScript, memory is implicitly managed: The language does not contain any constructs for explicitly allocating or de-allocating memory. The language contains constructs for creating new variables and objects, that will need to be stored in memory, but how this is done is not specified. In fact, the specification does not mention memory management or garbage collection at all [ecm11].

Since the specification does not mention this at all, a conforming implementation might simply ignore the issue and de-allocate only when the program ends. Since most JavaScript programs are executed either in the browser (running as long as the web page is open) or on the server (as a long running task), an implementation must provide some sort of automatic memory management to be useful.

Garbage Collection is the most common solution to the implicit memory management issue: memory is explicitly allocated to hold the variables needed, and at regular intervals the garbage collector determines if any memory is no longer in use and releases it. Garbage Collectors differ in the way that they determine if memory is unused, but they all solve the problem of automatic memory management [Tea13c].
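As an illustration only (this generic mark-and-sweep scheme is a textbook example, not the collector implemented in the project compiler), a minimal collector over a C heap could look like this:

#include <stdlib.h>

/* Every allocation is linked into a global list; a collection marks
   everything reachable from the roots and frees the rest. */
typedef struct Object {
    int marked;
    struct Object *ref;     /* at most one outgoing reference, for brevity */
    struct Object *next;    /* intrusive list of all allocations */
} Object;

static Object *all_objects = NULL;

static void mark(Object *o) {
    while (o != NULL && !o->marked) {   /* follow the reference chain */
        o->marked = 1;
        o = o->ref;
    }
}

static void sweep(void) {
    Object **link = &all_objects;
    while (*link != NULL) {
        Object *o = *link;
        if (o->marked) {                /* reachable: keep it, clear the mark */
            o->marked = 0;
            link = &o->next;
        } else {                        /* unreachable: unlink and release */
            *link = o->next;
            free(o);
        }
    }
}

/* A collection cycle: mark from every root variable, then sweep. */
void collect(Object **roots, int nroots) {
    for (int i = 0; i < nroots; i++)
        mark(roots[i]);
    sweep();
}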

3.7 Possible optimizations

Many of the optimizations available to compilers for statically typed languages are also available when compiling dynamic languages: Copy Propagation, Dead Code Elimination and Code Motion [ALSU86].

These optimizations provide a good trade-off between run-time in the compiler and the benefit obtained in the output code. There are many more optimizations described in the literature, but we will only consider these three "classic" optimizations to limit the scope of the project [ALSU86] [Cor13b].

When compiling dynamic languages, though, some additional optimizations might be beneficial: in particular calculating the type of different operands and expressions. We call this optimization "type inference": statically analysing the JavaScript code to determine the type associated with each variable and expression. Since a variable can be used in such a way that it might hold different types at different times for the same expression, "any" is allowed as a type for an operand or expression to allow the optimization to be sound.
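A small sketch of the idea in C (the names are illustrative and not taken from the project compiler): each variable and expression is assigned one of the JavaScript types or "any", and the types inferred on different control-flow paths are joined conservatively.

#include <stdio.h>

typedef enum { T_UNDEFINED, T_NULL, T_BOOLEAN,
               T_NUMBER, T_STRING, T_OBJECT, T_ANY } JsType;

/* Join the types inferred on two paths (e.g. the two branches of an if):
   if they agree we keep the precise type, otherwise we fall back to ANY,
   which keeps the analysis sound. */
static JsType join(JsType a, JsType b) {
    return (a == b) ? a : T_ANY;
}

int main(void) {
    /* y = 42 on one branch and y = "blah" on the other, as in section 3.3 */
    JsType y = join(T_NUMBER, T_STRING);
    printf("y is ANY: %d\n", y == T_ANY);   /* prints 1 */
    return 0;
}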

Chapter 4

Implementation work

This chapter will describe how we have implemented the different phases of the compiler. It will not go into detail with every aspect, but highlight the most interesting choices we had to make, and challenges we had to overcome.

4.1 Compiler structure

The overall compiler structure is a single-pass, multi-phase, broad compiler [GBJL00]. The project compiler completes one phase fully before handing control over to the next phase. This means that the compiler only needs a single pass to translate the program, but it also means that the compiler must keep all information for a phase in memory.

Rather than having a hierarchy of the phases, the Compiler class is the controller that calls each phase. Each phase is called with the data it needs (source file, AST, TAC tree, etc.) and returns the output in the input format for the next phase. The lexer and parser, for instance, receives a handler object containing the source file and produces the Abstract Syntax Tree, while the optimizer both receives and produces a Three Address Code tree.

Some of the phases were built using external tools: most of the lexer and parser was auto-generated by a tool called ANTLR and the final build (from C code to a native binary) is handled by the GCC compiler.

4.1.1 Phases included in the project compiler

The project compiler mostly follows the structure described in chapter 2. The exact structure is:

• Lexing and parsing the JavaScript to an AST
• Context handling on the AST
• Translation from AST to the TAC intermediate representation
• Optimization of the TAC
• Translation from TAC to the IR intermediate representation
• Translation from IR to C code
• Translation of the C code to a native binary, by the GCC compiler

Details of the choices we made when designing the different phases of the compiler will be described, as well as details of the implementation we have done.

4.1.2 Lexing and parsing

When we had to make the lexer and parser for the project, we considered handwriting them ourselves, using a tool to create them, or taking a JavaScript lexer and parser from another project. The advantage of handwriting the lexer and parser would be that we would know the code better and we could implement the features we think are important, whether it be performance, good error messages or solving the problem of automatic semicolon insertion in an elegant way.

The advantage of using a tool to generate them is that we estimated that it would take much less time, because they are written in a syntax that is very close to the Backus-Naur Form grammar notation that is also used in the ECMAScript Language Specification. There are many well tested tools for doing this, and many of them have built-in error reporting.

It could also be a good idea to use the parser and lexer from another project, such as RingoJs which is also written in Java. The advantage is that if the code is easy to understand and use for another project it would require the minimal amount of work.

Since a compiler needs to implement the language specification correctly to be useful, we estimated that the best approach would be to make the lexer and parser with a tool directly from the Backus-Naur Form of the standard, so this is the approach we went with. We also thought that this was better than using the parser of RingoJs, because it would give us a good overview of the constructs of the language and not force us to use a pre-defined AST.

Because we had experience with the ANTLR lexer and parser generator from a course in computer science modelling, we chose to use that tool.

4.1.2.1 Building the AST

When we were done writing the grammar, and had made the lexer and parser with ANTLR, we had to write the Java code that would build the abstract syntax tree. The authors of ANTLR suggest the approach of letting ANTLR build a parse tree, and then making the abstract syntax tree builder a Java class that extends the visitor that the tool provides [Par12, 2.2].

The visitor pattern The visitor pattern is a design pattern that is used for adding behavior to tree traversals [GHJV95, pp. 330-344]. The pattern consists of an abstract class that contains a method for each of the classes of nodes that the tree can have. So, if the tree can contain nodes of the type X, the visitor will have a method called visitX that takes an object of the type X as input. Each node type will have a method that takes the visitor as an argument and calls the correct method on the visitor.

Tree traversal can then be implemented by simply creating a concrete implementation of the visitor class and running the visitor on the root node. The idea is then that the function visitX does the computations for the nodes of the X class. If the computation depends on the subtrees that an X object can have, visitNode can be called recursively on these subtrees. visitNode will then call the correct visit function for the subtrees according to their type.

The abstract visitor class contains some default implementations for the methods. The default implementation for the visitX functions is that they should visit the child nodes recursively, and then aggregate the result, that is, calculate a result from these values. The aggregation is made by the method aggregateResult, which takes two results and gives an aggregated result.

The defaultValue method gives the element that the aggregation should start from.

The default implementation of aggregateResult is to simply return its right argument, and the default implementation of defaultValue is to return null.

AST A parse-tree shows the syntactical structure of a program. The goal is to build the abstract syntax tree, which is very similar to the parse tree, but does not include syntactic details that are not important for the meaning of the program.

The parse tree of the simple JavaScript expression "true || false" is a deep chain of expression nodes.

The reason that the parse tree is so deep is that the grammar takes into account the precedence of the different operators. The parser first tries to parse the whole expression as an assignment, because assignments should be performed after all other operations, then as a conditional expression and lastly as a logical OR expression, which is what it is. The same is repeated for each of the literals. While it is practical that the parsing takes care of the operator precedence, the parse tree is obviously much too deep, and most of the nodes in the tree provide no semantical information. To make the tree more shallow, we have made the AST builder such that, when it visits an Expression node, for instance a logicalANDExpression, it checks if the node has one child or two children, and if it only has one child, the builder will simply return the result of visiting that child node. Therefore the AST for the above example is much shallower.

In the AST we also have a single class, OperatorExpression, that represents all types of binary operators by containing a string that represents the operator.

4.1.3 Context handling

JavaScript has several context conditions that must be respected for a JavaScript program to be correct.

4.1.3.1 Return statements

In JavaScript, all return statements must be inside functions [ecm11, p. 93]. Therefore the following JavaScript program is not correct:

function double(x){
    return x*2;
}
return;

Checking this can be done as a tree-search for the return statement starting from the root node. If a node is reached that represents a function, the search should not go into the subtree starting at that node, because all return statements in that subtree are correct. The JavaScript program is correct with respect to the return statements if the return statement is not found by the search. We have implemented the tree search as a visitor.

4.1.3.2 Continues and breaks

The context conditions for continue and break statements are as follows [ecm11, pp. 92-93].

Unlabeled A continue or break statement should be inside a loop or switch, and there must not be a function scope between the statement and the loop or switch.

Figure 4.1

This means that the AST in figure 4.1 will be illegal because, while the break statement is in the subtree of a for node, there is a FunctionNode between the break and the loop. However, the AST in figure 4.2 will be legal.

Figure 4.2

Labeled For labeled continue or break statements, the rule is that they should be inside a loop or a block (code in a set of curly brackets) that has the same label as the continue or break statement, or the statement itself should have the label. It is still the case that there must not be a function scope between the loop or block and the break/continue.

Algorithm Informally, the idea behind the algorithm for checking compliance with these conditions is that each node asks all its subtrees if they comply. The node itself will comply if all the subtrees comply. For the subtrees to check for compliance, the node will tell them what labels are declared by and above the node, and if the subtrees are inside a loop or switch.

A switch or loop structure will therefore, in addition to the labels it knows from its parent, tell its subtrees that they are inside a switch or loop. If the structure has a label, it will also report that label to the subtrees. Functions hide all enclosing switches and loops, and will therefore tell their subtrees that they are not in a switch or loop and that there are no labels.

If an unlabeled continue or break is asked, it will answer yes if it is told that it is inside a loop or switch, and no otherwise. If a labeled continue or break is asked, it will answer yes if it is told the label was declared, and no otherwise.

The algorithm is implemented as a visitor.
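A sketch of the unlabeled case in C (the project's check is implemented as a Java visitor; the node kinds and names here are illustrative):

typedef enum { N_BREAK, N_CONTINUE, N_LOOP, N_SWITCH,
               N_FUNCTION, N_OTHER } NodeKind;

typedef struct Node {
    NodeKind kind;
    int nchildren;
    struct Node **children;
} Node;

/* Returns 1 if every unlabeled break/continue below n complies with the
   context condition. in_loop_or_switch says whether an enclosing loop or
   switch exists with no function boundary in between. */
static int complies(const Node *n, int in_loop_or_switch) {
    switch (n->kind) {
    case N_BREAK:
    case N_CONTINUE:
        return in_loop_or_switch;      /* only legal inside a loop/switch */
    case N_LOOP:
    case N_SWITCH:
        in_loop_or_switch = 1;         /* subtrees are inside a loop/switch */
        break;
    case N_FUNCTION:
        in_loop_or_switch = 0;         /* a function hides enclosing loops */
        break;
    default:
        break;
    }
    for (int i = 0; i < n->nchildren; i++)
        if (!complies(n->children[i], in_loop_or_switch))
            return 0;
    return 1;
}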

Alternative strategy As an alternative to this approach we could have ignored the condition and just translated the code to C. In C, you get a compile error when you try to jump to a label that is not declared, so the condition would still be enforced. The advantage of our approach is, however, that we do not rely on the error handling of the C compiler, and therefore make it more explicit how the conditions are derived from the ECMAScript Language Specification. It also means that we can provide more meaningful error messages than the C compiler, which, of course, does not know about JavaScript.

4.1.3.3 Getters and setters

Even though getters and setters are not supported by the compiler, we also check the context condition that object literals may only define each getter and setter once. This is implemented as a visitor.

4.1.3.4 Annotations

In the project compiler, the context handling is also used to annotate the Abstract Syntax Tree with information that will be practical to have in the later phases of the compilation.

4.1.3.5 Declarations and assignments

In JavaScript, there is a difference between function declarations and function expressions.

function f(){
    console.log(g(5)); //g is bound to the function x*5, so 10 is written
    console.log(x);    //x is bound to undefined, so undefined is written

    function g(x){
        return x*5;
    }

    var x = 5;
    console.log(x); //x is bound to 5, so 5 is written
}
f();

The code shows that function declarations bind the function value to the variable at the beginning of the outer function, while variable declarations bind the variable to undefined at the beginning and then, when the definition is reached, to the specified value.

To make the translation to C easy, we annotate the function nodes with the set of variables that are declared and the set of functions that are declared. We also annotate the root node with all the variables that were assigned to but not declared, since they belong to the global scope, as seen in the following example:

function f(){
    a = 5;
}
function g(){
    console.log(a);
}
if(x){
    f();
}
g();

If x is true, f will be run, which has the consequence that 5 is assigned to a in the global scope. When g is run it can then print a. If, on the other hand, x is false, a will not have any declaration or assignment, and g will produce a runtime error. This example shows that we also need to know which variables can belong to the global scope.

The annotation of this information is implemented as a visitor.

4.1.3.6 Alternative strategy

An alternative strategy to solving this problem with annotations would be to make a translation from the AST to an intermediate language that was very similar, but with the restriction that all variable and function declarations must be at the beginning of each function body.

We could then, for instance, take the AST representation equivalent of

function foo(){
    console.log(f(a));
    function f(x){ return x*2; }
    var a = 5;
}

and turn it into an intermediate representation that would be equivalent to

function foo(){
    var a = undefined;
    var f = function f(x){ return x*2; };
    console.log(f(a));
    a = 5;
}

This solution is completely equivalent to what we do, but one could argue that it shows the meaning of what is done more explicitly than adding annotations does. Our solution, however, has the advantage that we do not need to build a new tree, which would be more difficult than simply adding annotations, and that we do not keep an extra tree in memory.

4.1.3.7 Function IDs

Because there are no nested functions in C, all the function definitions have to be in the same outer scope. Therefore we have a problem if several functions in different scopes of JavaScript share the same name. To solve this problem, we give every function a unique ID by visiting the tree and annotating all function nodes with a number.
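As an illustration, two JavaScript functions that share the name f in different scopes could be emitted as two distinct top-level C functions once each function node has its unique ID; the naming scheme shown here is purely hypothetical:

/* JavaScript input: two nested functions in different scopes share the name f.
 *
 *   function outer(){ function f(){ return 1; } }
 *   function other(){ function f(){ return 2; } }
 *
 * Because C has no nested functions, both are emitted at the top level.
 * The unique IDs keep the names from colliding; the naming scheme
 * (function_<id>_<name>) is only an assumption for illustration. */
static double function_1_f(void) { return 1.0; }
static double function_2_f(void) { return 2.0; }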

4.1.4 Intermediate code generation

We have first made an intermediate representation that we call IR, which is very close to the syntax of the C programming language, albeit with some simplifications. We have chosen to have such a representation because it makes it easy to generate the C code. When we want to translate from the AST to the IR, there are two approaches:

Doing a direct translation, or adding one (or more) additional intermediate languages.

The direct translation can be done relatively easily, because many of the constructs of JavaScript are directly available in C, such as if-then-else, for loops and while loops. This also has the advantage that the outputted code will be very similar in structure to the JavaScript code. Function calls can also be translated more or less directly once closures have been taken into account. This approach has the problem, however, that the evaluation order of the arguments is defined to be from left to right in JavaScript, whereas in the C language the evaluation order of the arguments is not defined [Int11, p. 4] and is often right to left. If any of the arguments are the results of functions that have side effects, the compiled program might be incorrect due to this difference.
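The difference can be demonstrated with a small C program of our own (not part of the project compiler): the order of the two calls to next() is up to the C compiler, whereas JavaScript guarantees left-to-right evaluation of arguments.

#include <stdio.h>

static int counter = 0;

/* A function with a side effect: each call returns the next number. */
static int next(void) { return ++counter; }

static void show(int a, int b) { printf("a=%d b=%d\n", a, b); }

int main(void) {
    /* C does not specify whether next() is called first for a or for b,
     * so this may print "a=1 b=2" or "a=2 b=1" depending on the compiler.
     * JavaScript would always evaluate the arguments left to right. */
    show(next(), next());
    return 0;
}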

The operators in C, of course, do not work directly on the JavaScript values. This can be solved by implementing them as functions, but then they might also have the wrong evaluation order.

Another approach would be to add another intermediate language. Specifically, we considered using an intermediate language that we called TAC, because it uses a three address code structure. The three address code structure means that the statements of the language are all in one of the following three forms:

• a1 = a2 op a3

• a1 = op a2

• a1 = a2

Here op is an operator, and the a's are addresses. The addresses that can be used in a TAC statement represent variables, constants, properties and function calls.

This means that the equivalent to the JavaScript a=2+2; is simply a = 2 + 2, while a=2+4*5 can be translated as

t1 = 2
t2 = 4 * 5
a = t1 + t2

With this structure it is easy to make sure that the evaluation order is correct, and the order is also very clear from the representation. To be able to do loops and conditionals, the TAC also has a jump statement, a conditional jump statement, and labels.

The TAC approach also has advantages over the direct translation when we want to do optimizations. Because there are so few types of statements, and they are all very simple, there are only a few cases to consider when doing the optimizations. This makes it easier to argue that a transformation performed by the optimizer does not change the produced result. The fact that the evaluation order is expressed explicitly in the representation also makes this much easier.

Furthermore, the classical dataflow analyses use control flow graphs to do their computations, and control flow graphs are easy to construct from three address code compared to the IR.

During the development, we tried using both approaches. In the end we chose to use the TAC, however, because we wanted to get the evaluation order right, and because we wanted to do optimizations.

4.1.4.1 Our choice of TAC

We had to choose how close the TAC should be to the source and target languages. Since we already had a language, IR, that was close to the target language, we chose to make the TAC closer to the source language. We estimated that this would be a good idea, because it would mean that we could, for instance, infer JavaScript type information on variables before translating the code to the IR.

Our TAC has statements of the following types:

• Assignments:
  – a1 = a2
  – a1 = op a2
  – a1 = a2 op a3

• Jumps:
  – goto l1
  – if a1 goto l1
  – ifFalse a1 goto l1

• Labels:
  – l1

• Returns:
  – return a1

The meaning of the operators is the same as in JavaScript, that is, they do the necessary type coercion to produce the result. For the conditional jumps, the conversion of the condition address to a boolean is also left implicit. That is, if a1 goto l1 will only jump to the l1 label if the value represented by a1 can be coerced to true, and ifFalse a1 goto l1 will only jump if it can be coerced to false. The coercion is done in the same way as in JavaScript. The addresses that can be used in the statements are:

• Variables

• Temporary Values

• Property Addresses (e.g. ao[ap])

• Constants

• Function calls (af(a1...an) for a call with n parameters)

• Constructor calls (new af(a1...an) for a call with n parameters)

Constants are used for two purposes: constants that represent the literals of JavaScript, and object literals / function expressions. The function expressions and object literals are not really constants, as they represent objects, but they are modelled as such to simplify the translation. The function constant consists of its name, its parameters and its body as a list of TAC statements. Furthermore it contains all the annotations from the context handling.

The variables have the same meaning as in JavaScript, including the closure semantics, that is, they are available to be used in function declarations on the same or a deeper level.

The temporary values are used for storing the temporary results that are used when evaluating an expression. These values are therefore not part of the closures.

The PropertyAddresses are used for writing to and reading from properties. The semantics are the same as when reading from and writing to properties in JavaScript. The conversion of the index to a string is also left implicit in the TAC code.

The values that the variables and temporary values represent have the same meaning as in JavaScript.

4.1.4.2 Translation to TAC

When a node from the AST is translated in the compiler, the translation is returned as an object of the type TacResult. The TacResult consists of a field, "statements", which contains the statements that the node is translated to, and a field, "result", which contains the TAC address of the result that the statements compute.

We will not explain how every single statement and expression of JavaScript can be translated to TAC, but will show some of the most interesting transformations.

In the following, si will represent the statements of the TAC translation of the AST node i, and ai will represent the address of the result.

Binary operation When we translate a binary expression such as e1 + e2, where e1 and e2 are expressions themselves, we do it as follows:

First we translate e1 and e2.

The TAC statements for the entire operation are then:

s1
s2
t = a1 + a2

and the result is the address t, which is a fresh temporary value, that is, a temporary value that has not yet been used in the translation of the program.

From this it follows clearly that the left operand is evaluated before the right, because the statements of the left operand are emitted before those of the right. This will also be the case when we have an expression with several operators, and perhaps parentheses, because the left operand is always written first as the tree is traversed depth first. Since the statements of the operands are emitted before the statement of the operation itself, the emission corresponds to a left-to-right, post-order depth first traversal of the expression tree.

If then else-statement The if then else structure can be written relatively simply in the three address code. An if then else statement consists of a condition, an if-branch and an else-branch.

We translate the structure as follows:

First we translate the condition, c, the if-branch and else-branch.

The TAC statements for the entire operation are then:

sc
ifFalse ac goto lelse
sif
goto lafter
lelse
selse
lafter

where the labels are fresh. For the result address we give null, because JavaScript statements do not give values back.

Short circuited logic The ECMAScript Language Specification specifies [ecm11][11.11] that when the result of the logical or operator is evaluated, the right operand should only be evaluated if the left operand evaluates to false. The effect can be illustrated by:

function getTrue(){console.log("true"); return true;}
function getFalse(){console.log("false"); return false;}
getFalse() || getTrue();

Here the result should be that "false" and "true" are written in the console, whereas getTrue() || getFalse(); would only write "true" in the console. The semantics for the logical and operator and for the ?: operator are similar.

In the TAC this is easy to translate, because the evaluation order can be controlled. It is therefore translated as follows:

sleft
tresult = aleft
if tresult goto lshortcircuit
sright
tresult = aright
lshortcircuit

where tresult is the result address.

Constants and functions The literals of the JavaScript language can be translated to the TAC constants described above. For the object literals, we also add property assignments to add the properties.

Function call We have added an address, FunctionCall, for doing function calls. The address contains the address of the function that is called and the addresses of the parameters. When we translate a JavaScript function call, we first write out the translations of the parameters, in order from left to right. Afterwards, we make the FunctionCall address and set its parameters to the result addresses from the translated JavaScript parameters. Lastly we assign the FunctionCall address to a fresh temporary variable. The variable is returned as the result address of the function call.

s1
...
sn
t = af(a1...an)

The reason that we do not simply return the function call as the result address is that a JavaScript expression like f(g(), h()) would then be translated as f(g(), h()) in the TAC, and then the evaluation order is not explicit. The reason is also that the call of the function has to be represented by a statement in the TAC if it is called as a statement in JavaScript. This is often done with functions that do not return a value, but change the state of the program. In TAC this can only be done as an assignment, because that is the only type of statement that does not make a jump.

4.1.5 Translating TAC to IR

The greatest challenges when translating TAC to IR were implementing values, environments and operators. These challenges will be discussed further in later sections. Other than those difficulties, most of the statements in the three address code can be translated more or less directly to C, such as assignments to the temporary values, labels and jumps.

4.1.6 Output code generation

When outputting the IR as C code we did not have any challenges, because the IR was so close to C code that it could almost be considered a syntax tree for the language.

4.1.7 Build native executable

The last phase of the compiler builds the native executable. This is done by letting the GCC compiler do the compilation from C code to native code. To make our program work on both Unix and Windows systems, we use Cygwin to compile the executable on Windows.

4.1.8 Environments and values

As we saw in the analysis, one of the reasons that JavaScript variables cannot be translated directly to C variables is that they can contain values of any type. A practical way to solve this is to make a C type that can represent all the JavaScript values. We also saw in the analysis that it is necessary to associate a type with the value. We have solved this by making the C type of values a "struct" that has a field containing the type and a field, content, containing the value itself. The type of the content field is a "union" type of all the different JavaScript types (booleans, numerics, undefined and so on).

The "union" type reserves enough space in memory to hold any of the enclosed types, but it can only hold one of the types at any given time. This type allows us to easily chance the type of a variable at run-time without having to reserve memory for every possible .

Content The primitive types of JavaScript are boolean, numeric, string, null and undefined. In JavaScript, the numbers are defined as IEEE doubles [ecm11, p. 29], and this is exactly what the doubles of C are defined as [Int11, p. 507]. The boolean values are true and false, and they have equivalents in the stdbool.h library. A typical representation of strings in C is an array of chars. This is a little problematic because JavaScript strings can contain Unicode characters. We chose this representation anyway because it was simple. In the limitations section we will consider solutions to this problem. The types null and undefined only contain one value each, namely null and undefined. Therefore we do not need to worry about having a representation of these values, because they are already represented by the type field of the value struct.

Objects We chose to implement objects as hash maps. This implementation is good because it is general, and simpler to implement than the other possibilities presented in the analysis. We chose to use a library implementation of hash maps instead of writing one ourselves, so that we could use our time on the other challenges of making a compiler. The specific implementation we chose was UTHash, which had the advantage of being easy to integrate into our program, and which is used by many other projects [Han13].

The struct that implements the objects therefore contains a hash map. Since functions are also objects, the struct can also contain a reference to a closure. The closure is a structure consisting of a field containing a pointer to a C function that represents its function body, and a field containing a pointer to a data structure that represents its environment.
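A sketch of what the object struct could look like with UTHash, assuming the Value type sketched above; the field names and the closure signature are assumptions:

#include "uthash.h"

struct Environment;  /* forward declaration; sketched in a later section */

/* One property of an object: a string key and a Value, plus the
 * handle that UTHash uses to link the entry into the hash map. */
typedef struct Property {
    char *key;
    Value value;
    UT_hash_handle hh;
} Property;

/* The closure part of a function object: a pointer to the generated
 * C function and the environment it was created in. */
typedef struct Closure {
    Value (*body)(struct Environment *env, int argc, Value *argv);
    struct Environment *env;
} Closure;

typedef struct Object {
    Property *properties;     /* UTHash hash map of properties */
    struct Object *proto;     /* hidden prototype link */
    Closure *closure;         /* non-NULL only for function objects */
} Object;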

Prototypes The object struct also contains a field, proto, representing the hidden prototype property that all objects have. When an object is created using "new", we simply set the new object's proto to the value of the constructor's "prototype" field (which the program gets from the constructor's hash map). To get the prototype chains to work, the lookup for properties should look in the object first. If the key is not found, it should do the lookup in the proto of the object, and so on. If nothing is found, undefined is returned.
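The prototype chain lookup could then be sketched as follows, building on the structs above; undefined_value is an assumed helper from the run-time support library:

/* Look a property up on the object itself first; if it is not found,
 * follow the proto links upwards. If the chain is exhausted the result
 * is undefined, as required by JavaScript. */
Value get_property(Object *obj, const char *name) {
    while (obj != NULL) {
        Property *entry = NULL;
        HASH_FIND_STR(obj->properties, name, entry);
        if (entry != NULL)
            return entry->value;
        obj = obj->proto;            /* continue in the prototype */
    }
    return undefined_value();        /* assumed helper returning undefined */
}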

4.1.8.1 Representing the environment itself

In the analysis we found out that we have to make an implementation of an environment, that is, a mapping from variables to the location of their values in the memory.

We have considered representing the maps by hash maps, tree maps or arrays. Hash maps and tree maps have the advantage that they can map the identifier string directly to the location. If an array is used, we first need to represent the identifiers as integers in the range from 0 to the number of identifiers. Arrays and hash maps have the advantage that lookups and changes of values can be done in constant time [Har08, 69] and amortized constant time respectively [Har08, 88]. Copying them, however, takes linear time in the case of arrays and at least as much for hash maps. Immutable implementations of tree maps, on the other hand, can be "copied" in constant time, but looking up or changing values takes logarithmic time [Har08, 88]. The array data type also has the advantage of being built in to C, so it is fast to implement.

We estimated that variables will be accessed more often than environments will be created, so we chose the array representation, because this representation gives fast and easy access to the memory locations.

var i = "Irma";
function first(){
    var j = "Johanne";
    function second(){
        var k = "Kamilla";
    }
}

From the code we see that the outer scope is {i}¹, the scope of first is {i, j}, and the scope of second is {i, j, k}. The environment of the outer scope is therefore [i ↦ location_of_i], the environment of first is [i ↦ location_of_i, j ↦ location_of_j] and the environment of second is [i ↦ location_of_i, j ↦ location_of_j, k ↦ location_of_k]. We see that each environment is an extension of the environment outside it, and that we know the size and keys (the identifiers) of the map that represents the environment. We can therefore represent the environments as fixed size arrays, and then represent the identifiers by the index in the environment at which the memory location of their value is placed.

¹For simplicity of the explanation we will ignore the function variables; that is, the scope is actually {i, first}. In the actual implementation this is, of course, done correctly.

Implementation details

function first(){
    var j = "Johanne";
    function second(){ //1
        var k = "Kamilla";
        var j = "Josefine"; //2
    } //3
}

Here we see that the j in first will point to a location containing the string "Johanne". Second, however, declares its own j, which points to a location containing "Josefine". In the environment of second, j should therefore point to another location than in the environment of first. A solution to this could be to simply rename the inner j to j2. We have, however, solved the problem in the following way. When, at runtime, the environment of a function is made, it is first made as a copy of the environment in which the function was created. Afterwards, all the declared variables are made to point to new, unique locations containing undefined values. This way, these variables will point to new locations in the inner scope, but will not change in the outer scope. The variables that are not declared in the inner scope will point to the same locations in both scopes. In our example, at 1, k and j will therefore be undefined, and at 2 they will be "Kamilla" and "Josefine" respectively. At 3, j will still be "Johanne", because j was changed in the environment array of second, but not in the environment array of first. The solution is good because it makes environments smaller and hides locations that are out of scope.

Furthermore, the example from the analysis:

function makeMultiplier(c){
    return function multiplier(x){return x*c;};
}
var preserver = makeMultiplier(1);
var doubler = makeMultiplier(2);

shows that each time a function is called, a new environment that puts the values in new locations must be created, such that the variables can be bound to different values in different contexts. This is also the way we have implemented our environments.
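A sketch of this scheme in C, assuming the Value type sketched earlier and representing an environment as an array of pointers to heap-allocated value locations; the names and details are illustrative:

#include <stdlib.h>

/* An environment maps identifier indices to value locations. */
typedef struct Environment {
    Value **locations;    /* locations[i] is where variable i lives */
    int size;
} Environment;

/* Create the environment for a function call: start from a copy of the
 * environment the function was created in, then give each variable that
 * the function itself declares a fresh location containing undefined.
 * Variables declared further out keep pointing to the outer locations. */
Environment *make_environment(const Environment *parent, int size,
                              const int *declared, int declared_count) {
    Environment *env = malloc(sizeof(Environment));
    env->locations = calloc(size, sizeof(Value *));
    env->size = size;
    for (int i = 0; i < parent->size; i++)
        env->locations[i] = parent->locations[i];
    for (int i = 0; i < declared_count; i++) {
        Value *fresh = malloc(sizeof(Value));
        fresh->type = TYPE_UNDEFINED;
        env->locations[declared[i]] = fresh;   /* hides any outer binding */
    }
    return env;
}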

Reading and writing to the variables We can now read from and write to the variables in the closures by looking their location up in the environments.

4.1.9 Operators

In JavaScript one of the effects of the lack of explicit type information is that the operator specifications become quite complex. The "==" (non-strict equality operator) uses what is known as the "Abstract Equality Comparison Algorithm" [ecm11][11.9.3] that defines the equality operation for the related operators ("==", "!="). This algorithm is recursively defined with each recursive call "lowering" the type of one of the operands using either "ToNumber" or "ToPrimitive".

The effect of this is that, in the absence of any type information, any case described in the algorithm must be assumed to occur and therefore the code to handle it must be present.

If every operator call was inlined, this could lead to a very large output file. Instead, the project compiler uses a solution where all operators are implemented in a run-time support library. So the above "==" operator translates into C code similar to:

Value t_3 = equality_comparison_op(t_1, t_2);

This solution helps keep the output code file down, but comes at the price of a larger run-time overhead.

Logical operators There is a notable exception to the operators being defined as methods in the run-time support code. The logical operators are handled in the conversion to the TAC form. This is due to the short-circuit requirements of these operators: that is, these operators must be lazily evaluated [ecm11][11.11].

Type information As long as nothing is known about the operands, the general version must be included. If we are able to reason about the operands, it is possible to reduce the complexity of the operators - sometimes to the point where they can be replaced with the standard C operator.

Example: var c = a + b;

If nothing is known of a and b, the "addition_op" run-time support method has to be used to determine if any operand needs to be coerced and if the output is a string or a numeric value.

If it is known that, say, a is a string and still nothing is known of b, the operator can still be simplified. In this case the operation will always be a string concatenation and the unknown variable will always be coerced to a string. The operator can therefore be calculated with the following steps:

1. Convert b to a string using "To_String"

2. Concatenate the strings of a and b

3. Store the result in a JavaScript value at the location of c

This looks more complex, but is simpler, because the type information of the variables does not have to be queried before choosing the correct action. This also saves the overhead of the run-time call.

If it is known that a and b are both numeric, the operator can just be modelled as the C "+" operator - though in the implementation we have made, the operands still have to be unboxed, and the result has to be boxed before storage.
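As a sketch, the three situations for "var c = a + b" could be compiled along the following lines, where addition_op, concat_strings, to_string_value, number_value and as_number are assumed names for run-time support helpers, and Value is the boxed value type sketched earlier:

/* 1. Nothing known about a or b: use the general run-time operator,
 *    which performs the coercions described by the specification. */
Value add_unknown(Value a, Value b) {
    return addition_op(a, b);
}

/* 2. a is known to be a string: the operation is always a string
 *    concatenation, so b only ever needs to be coerced to a string. */
Value add_string_lhs(Value a, Value b) {
    return concat_strings(a, to_string_value(b));
}

/* 3. Both operands known numeric: unbox, use the C "+", box the result. */
Value add_numeric(Value a, Value b) {
    return number_value(as_number(a) + as_number(b));
}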

In principle it is possible to have completely unboxed values, if it is known that they never change types. The current project compiler infrastructure does not support this though.

4.1.10 Garbage Collector

While it is possible to have an implementation of the full JavaScript specification without including a Garbage Collector, this solution would be very limited in the real world due to the long running nature of many applications of JavaScript (in the browser and on the server side). For the purpose of the project compiler, a Garbage Collection system was needed to ensure a higher degree of fairness in the comparison to the contemporary implementations, which all have Garbage Collection systems.

At a high level, there are two solutions possible for including a Garbage Collec- tion system in the project compiler:

• Write a dedicated Garbage Collection system for the implemented JavaScript memory model.

• Use an existing Garbage Collection system for the C language.

There are many ways to actually implement a Garbage Collection system: reference counting is one of the simpler ways. It has the problem, however, that if two objects reference each other, but are otherwise unreachable, it will not free their memory. Many common JavaScript patterns will create such cycles of memory. Therefore, Mark and Sweep algorithms are the de-facto standard among browser implementations, since they do not have this weakness [Tea13c]. However, Garbage Collection systems are research subjects on their own and it would be outside the scope of the project to implement a dedicated Garbage Collection system.

Fully functional Garbage Collection systems for C are somewhat rare. One of the better known systems is the Boehm-Demers-Weiser conservative Garbage Collector [Boe13]. This GC is used, among other places, for the project [Pro13].

The use of a ready-made Garbage Collection system makes it easy to enable Garbage Collection in the project compiler, but limits the options for tailoring the Garbage Collection system to the project compiler's JavaScript memory model.

4.2 Implementation of optimizations

The project compiler implements the three "classic" optimizations described in the analysis ("Constant propagation", "Dead code elimination" and "Function inlining") along with the "Type inference" optimization specific to the dynamic language.

The optimizations implemented in the project compiler all use a data structure known as a control flow graph. The control flow graph is a directed graph, with nodes representing basic blocks and edges representing the change of control from one basic block to another (a jump, or simply the flow from one line to the next if the basic blocks are next to each other in the code) [ALSU86][9.2].

A basic block is a list of three-address code statements that contains no jump labels (with the possible exception of the first statement) and no jump statements (with the possible exception of the last statement). This effectively means that there is no way for program execution to start or end in the middle of a basic block.

As an example of the translation to three address code, consider the following fragment:

var x = 3;
if(y == 0){
    y = x * x + x;
}
else{
    y -= x * x;
}

This could be translated in to the following TAC statements:

x = 3
t0 = y == 0
ifFalse t0 goto L1
t1 = x * x
y = t1 + x
goto L2
L1:
t2 = x * x
y = y - t2
L2:

These statements contain four basic blocks. The control flow for this fragment is the following:

[Figure: control flow graph of the fragment, with one node per basic block]

The optimizations that are implemented in the project compiler are all global optimizations in the sense that they work on an entire control flow graph for a single function.

Program points The global optimizations considered share an overall algorithm known as a data-flow algorithm [ALSU86][9.2]. Members of this family of algorithms share the property that they examine one statement at a time in relation to two positions for that statement: the "program point before" and the "program point after".

The "program point before" is used to reason about the state of the variables just before the statement is executed and the "program point after" is used to reason about the state of the variables just after the statement is executed.

Let I(s, x, in) denote the information about the variable x, at the program point before the statement s, and let I(s,x,out) denote the information at the program point after s.

In addition to these positions, a set of predecessors and successors is maintained for each position. If the statement is in the middle of a basic block it will have only one predecessor and only one successor, but statements at the beginning or end of basic blocks can have more. For instance, in the illustration we see that the statement y = t1 + x has only one successor and predecessor, while ifFalse t0 goto L1 has two successors.

The data flow algorithms work by updating the information at these positions iteratively until no more changes can be performed.

After the algorithm has finished, the compiler can use the information to do the actual optimization of improving the program.

Constant propagation As a concrete example of a data-flow algorithm, the Constant Propagation optimization is considered.

In constant propagation we want to replace the variable "x" with the constant "k" in a statement in the program, if we can prove that x holds the constant value "k" on all paths in the program that lead to the statement.

Consider constant propagation for one variable. At any position in the control flow graph one of the following three information values is assigned to it:

• Bottom: meaning that the algorithm has not yet inferred the value for the variable.

• c: meaning a specific constant, c

• Top: meaning, the variable can hold more than one value at this point in the program

At the beginning of the algorithm, the value "Top" is assigned to the program point before the entry and the value "Bottom" is assigned to all other program points.

For Constant Propagation, the information is then updated according to the following rules [ALSU86][9.3] [Aik13b] [Aik13a] .

Rule 1: I(s, x, in) = lub{I(p, x, out) | p is a predecessor}

Where lub is the least upper bound, defined in the following way (where c1 and c2 are specific, distinct constants):

lub(Bottom, c1) = c1

lub(c1, c1) = c1

lub(c1, c2) = Top

lub(Top, ...) = Top

This means that if we infer from the rule that I(s, x, in) = c, it is because on all paths going in to s, x is either "c" or we do not yet know what x can hold - that is, x is Bottom. On the other hand, if we infer that I(s, x, in) = Top, it is because there are two or more paths leading in to s where x can hold different values, or because there is a path where x can hold multiple values - that is, a path where x is Top.

Rule 2 I(s, x, out) = Bottom, if I(s, x, in) = Bottom

Rule 2 shows that if we have not yet inferred what x can hold at the program point before s, we should not yet try to infer what it can hold at the program point after.

Rule 3 I(x := k, x, out) = k, if k is a constant

If x is assigned a constant, we know that at the program point after the assignment x holds that constant as its value. Note, however, that rule 2 has higher priority, so this is ignored if I(x := k, x, in) = Bottom.

Rule 4 I(x := f(...), x, out) = Top, where f(...) is some expression such as an operation or a property lookup.

This means that if the right hand side is not a constant, then we assume that we do not know the value of x after this statement.

For JavaScript, this rule is interpreted a little bit differently if the right hand side is a function call. Since a function call can potentially change any variable in this scope, we, in this case, instead use the rule:

For all variables y: I(x := f(...), y, out) = Top, where f(...) is a function call or a constructor call.

Rule 5 I(y := ..., x, out) = I(y := ..., x, in)

This means that updates to another variable do not affect our information about x. The only exception is rule 4, which has higher priority.
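To make the lattice concrete, the following is a minimal sketch in C of the information values and the least upper bound used by rule 1; the representation is an assumption, and the project compiler's own implementation may differ:

/* Information about one variable at one program point. */
typedef enum { INFO_BOTTOM, INFO_CONSTANT, INFO_TOP } InfoKind;

typedef struct {
    InfoKind kind;
    double constant;      /* only meaningful when kind == INFO_CONSTANT */
} Info;

/* Least upper bound, following the rules above:
 * lub(Bottom, x) = x, lub(c, c) = c, lub(c1, c2) = Top for c1 != c2,
 * and lub(Top, ...) = Top. */
Info lub(Info a, Info b) {
    if (a.kind == INFO_BOTTOM) return a.kind == INFO_BOTTOM ? b : a;
    if (b.kind == INFO_BOTTOM) return a;
    if (a.kind == INFO_TOP || b.kind == INFO_TOP) {
        Info top = { INFO_TOP, 0.0 };
        return top;
    }
    if (a.constant == b.constant)
        return a;                     /* the same specific constant */
    Info top = { INFO_TOP, 0.0 };     /* two different constants */
    return top;
}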

Algorithm termination The algorithm continues until no updates can be performed according to the rules.

There will be a last update because there is a limit to how many times each rule can update the information of a statement:

• Rule 1: This rule can update the value at the program point before at most 2 times: once assigning a constant, and once assigning "Top".

• Rule 2: This rule never updates any values - it ensures that Rule 3 and 4 do not assign values prematurely

• Rule 3: Will update the program point after the statement at most once

• Rule 4: Will update the program point after the statement at most once

• Rule 5: Will update the program point after the statement at most once

Since there is no rule that can update the information indefinitely, the algorithm will eventually terminate.

Update statements Once the algorithm terminates, x is replaced with c in any statement in the code where x was found to be that constant. When running the algorithm, it might be necessary to run the whole process several times. Consider the following code fragment:

var x = 12;
var y = x;
var z = y;

After the first run of the algorithm, we could get the following:

var x = 12;
var y = 12;
var z = y;

And after the second run:

var x = 12;
var y = 12;
var z = 12;

In the project compiler this means that an optimization will be run until it can no longer perform any updates on the statements. The project compiler runs the optimizations in a specific order: Constant Propagation, Inlining, Dead Code Elimination and Type Inference. This is viable as no updates that a later optimization performs would allow new updates for an earlier one. The only exception is Inlining, but due to the limitations of the current implementation this is not actually an issue. If the project compiler included more optimizations (Constant Folding for instance) it might be beneficial to alternate between optimizations.

Type inference Type inference can be performed using the same algorithm, with the same set of rules, although this time the information values that can be assigned to the program points are:

• Bottom: meaning that the algorithm has not yet inferred the type for the variable

• type: where type is one of:
  – undefined
  – numeric
  – string
  – boolean
  – null
  – object

• Top: meaning that the variable can hold different types depending on the flow of control.

The least upper bound is calculated as before, with "type" replacing the constant.

The actual type values are inferred by a modification of rule 4, where assignments to constant values and from operators produce known output types. For instance:

var x = a - b;

Even if the types of a and b are not known, the type of x is always numeric due to the way that the subtraction operator is specified. Many of the JavaScript operators have a limited type space, but not all.

The type information is used in the translation from TAC to IR to make it possible to implement the operations directly with C operators.

Dead code elimination: This optimization removes unused code - defined as assignments to variables, where the assigned value is never read from the variable.

It works in much the same way as the other optimizations, but this time reads of a variable are measured. The possible information values are then:

• Bottom: meaning that the algorithm has not determined if the variable is dead or alive

• alive

– true: meaning the variable is alive

  – false: meaning the variable is not alive, or in other words, that it is dead

• Top: meaning that the variable can be dead or alive depending on the flow of control.

The data in this data-flow algorithm flows "backwards" as seen in the following rules:

Rule 1: I(s, x, out) = lub{I(p, x, in)|p is a successor}

This rule works like the first rule of the other optimizations except that the data flows "backwards" (from the program points before the successors to the program point after for the current statement). The least upper bound is defined as before.

If we infer that x is alive, it is because it is alive or Bottom on all successors. Likewise if it is dead. If we infer that x is Top, it is because two successors say that it is dead and alive respectively, or because a successor says that it is Top.

Rule 2: I(.. := f(x), x, in) = true

This rule states that if x is used at all on the right hand side, then it is alive.

Rule 3: I(x := e, x, in) = false, if e does not refer to x

This rule states that x is dead before the statement where we assign to it, or more informally that the value of x is unimportant just before we assign x to something. This ensures that two successive writes without an intermediary read will mark the first write as dead.

Rule 4: I(s, x, in) = I(s, x, out), if s does not refer to x at all

This rule simply states that if x is not used, the liveness of x is copied.

After running this algorithm, any statement found to have alive = false, can safely be removed, with the following caveats for JavaScript:

• The scope of the variables needs to be taken into account: even if a write is dead in the local scope it might be alive in a broader scope. We therefore model the last write (before a return statement) to every variable as being alive, to ensure that we do not accidentally remove a statement that was alive.

• Properties can be accessed in many ways: it is very difficult to prove that a property is not read at a later stage, since an object can be stored in multiple variables - the safe approach is to leave all property writes untouched.

Inlining: The function inlining runs after the Constant Propagation has finished. It uses the information from that run to find any function call statements that call a constant function node (meaning that the function is not altered between calls to it). If it finds such a node, it checks if the function node is a candidate for inlining. The project compiler is currently very restrictive in the rules for inlining:

• The function must not use the variable named arguments, which has a special meaning in JavaScript.

• All variables used by the function must be declared in its own scope. This restriction also applies to all inner functions.

• The function itself must not call any functions.

In practice this is so restrictive that the inlining only works for specifically constructed test examples, but it is included as the basis for more elaborate inlining functionality.

There is a further restriction in the way that the calculations are performed: to mark a function as constant, it must not have any chance of being altered before it is called. Since Constant Propagation currently assumes that a function call changes all variables in scope, this has the undesirable effect that the project compiler can never inline a function that is called inside a loop but defined outside it, since the call to the function itself marks it as being modified.

This could be changed by allowing the Constant Propagation to examine the function before deciding if it alters variables or not.

Algorithm implementation The order in which the statements are visited will influence the running time of the algorithm - however, the most efficient order depends on whether the data flows "forward" or "backwards" [Win11]. For the project compiler, a naive solution was implemented that simply iterates over all statements to find the next statement where data can be updated. As with the rest of the project compiler implementation, the decision was to use a simple solution and not to improve the implementation until a bottleneck was revealed.

The different optimizations in the project compiler that use the data-flow algorithm all share the same implementation of the basic algorithm, since they only differ in the rules applied to the different positions. The optimizations provide these rules to the data-flow algorithm runner using a shared interface.
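The structure of such a shared interface can be sketched in C; the names, the single-variable restriction and the straight-line treatment of statements are all simplifying assumptions, so this only illustrates the idea of supplying the rules as functions to a common runner:

#include <stdbool.h>

struct Statement;   /* a TAC statement; treated as opaque in this sketch */

typedef struct {
    int kind;        /* Bottom / constant / Top, analysis specific */
    double payload;
} Info;

/* The rules a concrete optimization provides to the shared runner. */
typedef struct {
    Info (*lub)(Info a, Info b);
    /* Computes the "out" information of statement s for one variable,
     * given its "in" information. */
    Info (*transfer)(const struct Statement *s, Info in);
} DataFlowRules;

/* Naive forward runner for a straight-line list of statements and one
 * variable: keeps applying the transfer function until a whole pass
 * changes nothing. A real control flow graph also needs lub over all
 * predecessors of each statement, not just the previous one. */
void run_dataflow(const struct Statement **stmts, int count,
                  Info *in, Info *out, const DataFlowRules *rules) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < count; i++) {
            if (i > 0)
                in[i] = rules->lub(in[i], out[i - 1]);
            Info new_out = rules->transfer(stmts[i], in[i]);
            if (new_out.kind != out[i].kind || new_out.payload != out[i].payload) {
                out[i] = new_out;
                changed = true;
            }
        }
    }
}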

4.3 Limitations of our implementation

The project compiler does not implement the full JavaScript specification. The following is a detailed list of the features not included in the project compiler along with an outline of how they could have been included. The items in this list were excluded to limit the scope of the project compiler in a way that would allow a high level of comparability to the existing, fully compliant JavaScript implementations.

4.3.1 Lex and Parse level

JavaScripts "Automatic semi-colon insertion" feature is not supported in the project compiler. This feature of the language inserts tokens for ";" that are not found in the source file in certain circumstances when this would make the program parse. The exact rules for when to insert the token are given in the specification [ecm11][Section 7.9]. These rules are specified in terms of parse errors encountered that could have been avoided had there been an extra semi- colon.

This hints at an obvious solution: hook into the error handling of the parser and check if the error was due to a missing semi-colon. If it is, backtrack, insert the semi-colon in the token stream and re-run the parser from the last known good position.

Another solution is to extend the grammar to include rules both with and without the semi-colon for all the places that could legally exclude a needed semi-colon. This means that we need to add a phase to the compiler where we visit the parse tree to determine the meaning of the nodes where a semi-colon could be inserted. This could be done before, or as part of, the AST generation. Also, this solution greatly increases the size of the grammar.

4.3.1.1 Regular expression literals

The JavaScript grammar actually contains two separate grammars: one for contexts where leading division operators are permitted ("/" and "/=") and one for contexts where they are not. The reason for this separation is to allow the correct parsing of the following two statements:

var a = foo( b / 2, c /i);
var a = /ab/i;

The first statement simply performs a division in both arguments for the function foo on the right hand side, whereas the second statement has a regular expression on the right hand side.

If only one grammar is used, one of the two statements will not be parsed correctly.

For the project compiler only one grammar is used, since Regular Expressions are not supported.

4.3.2 IR generation level

For the generation of intermediate representation there are some language constructs that have been excluded from the project compiler to limit the scope of the project. In addition, the compiler does not implement the entire native library specified in the standard. Specifically the following constructs and libraries are not supported:

4.3.2.1 Switch statements

The switch statement appears to be easy to model when translating to C, simply by using the C "switch" statement. There is an added challenge, though, in the fact that in JavaScript you can switch on any type of value and use any expression as a case, like for instance:

switch(4){
    case 2 + 2:
        console.log("true");
        break;
    case false:
        console.log("false");
}

will print "true".

The switch statement in C only supports integer values as the expression to switch on (that includes a single char, but not strings for instance) [Cor13a].

An implementation strategy in TAC could be the following:

• Associate each case block with a label.

• Determine the destination label by evaluating the expressions associated with each case block one at a time. The first time there is a match, perform a jump.

• Break statements jump to the end of the switch statement - otherwise we just fall through.
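As an illustration of this strategy, the earlier switch example could be lowered to C-level code along the following lines, assuming the Value type and run-time helpers with illustrative names (strict_equals_op, to_boolean, console_log and so on); JavaScript case clauses compare with strict equality:

void lowered_switch(Value scrutinee) {
    /* Evaluate the case expressions one at a time and jump on the
     * first match; if none match and there is no default, skip all
     * case blocks. */
    Value case1 = addition_op(number_value(2), number_value(2)); /* 2 + 2 */
    if (to_boolean(strict_equals_op(scrutinee, case1)))
        goto case_1;
    if (to_boolean(strict_equals_op(scrutinee, boolean_value(false))))
        goto case_2;
    goto after;                          /* no default clause */
case_1:
    console_log(string_value("true"));
    goto after;                          /* break */
case_2:
    console_log(string_value("false")); /* no break: falls through to the end */
after:
    return;
}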

4.3.2.2 With statements

The JavaScript "with" statement changes way that variables are looked up by creating an object environment with the given object expression for all state- 58 Implementation work ments wrapped by the "with" block. This allows code, like the following: var a = "bar"; with(console){ log("foo"); log(a); } which prints "foo" and then "bar" (if we assume that console is not modified).

The challenge with this statement is that it is not actually possible to know what "log(a)" will produce - assuming that console contains a function named "log" and assuming that it doesn't contain a property named "a", it will produce the above output, but due to the dynamic nature of JavaScript this might change during run-time. This means that, at compile-time, it is not possible to know if an identifier inside the with statement is a property lookup or a variable lookup.

An implementation strategy might be the following: for every variable lookup inside the "with" statement, try to look it up in the object property list first - otherwise look it up as a normal variable.

For the TAC, a solution could be to have a new type of address that represents such a lookup. In C code, a solution might look like the following for the identifier "val_name":

Value val;
if(has_property(obj, "val_name")){
    val = get_property(obj, "val_name");
}
else{
    val = value_at(jsvar_val_name);
}

Where get_property looks up the provided property in the object, and where value_at looks up the value of the JavaScript variable jsvar_val_name in the environment.

The performance drawbacks of the with statement are obvious, but since it is not possible to determine if an identifier is a variable or a property, there is no general way around the extra lookup for all variables used inside the with block.

4.3.2.3 Exceptions

Exceptions in JavaScript are defined using the C++ / Java-like statements "try", "catch", "finally" and "throw". The JavaScript exceptions are simpler than the exceptions in Java or C++, though, because the catch statement does not define a type. When an exception is thrown, the first try-catch block up the call stack receives the exception no matter what was actually in the thrown variable.

This simplification makes it easy to translate the JavaScript structures into non-local jumps. In C, these can be achieved using the "longjmp" and "setjmp" functions [cpl13]. The "setjmp" function both sets the long jump pointer and acts as the destination for the long jump. This means that when the program returns from setjmp it is either because it has just set up the jump target or because a long jump was performed to it. setjmp returns an integer value to let the program know whether the jump label was just set or a long jump was performed.

With access to dynamic, non-local jump pointers, the implementation strategy might be the following:

• Whenever a "try" statement is encountered, we add a long jump pointer to a stack. The pointer is associated with a jump to the "catch" section, and otherwise the execution just continues in the try block. This could be written as if(setjump(top_of_stack_ptr)) goto CATCH; - meaning that if a long jump was performed we perform a local jump to the catch section.

• If we encounter a "throw" statement, we follow the top-most long jump pointer to the appropriate catch section.

• If we reach the end of a "try" block - we pop the top-most long jump pointer

Since we can throw outside of a try-catch structure, we need to add a default pointer to the stack to a global "catch" function that will print the exception and terminate the program.

Once this structure is in place, we can add throw statements to the native library at the appropriate places. To do this we could implement statements for pushing and popping to the stack of long jump labels in the TAC code, as well as a statement for throwing an object.
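A minimal, self-contained sketch of this strategy in C, where the thrown value is simplified to an int and all names are illustrative:

#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Stack of long jump targets; the top-most entry is the nearest
 * enclosing try block. */
static jmp_buf handler_stack[64];
static int handler_top = 0;     /* number of handlers currently on the stack */
static int thrown_value;        /* the thrown value, simplified to an int here */

/* throw: unwind to the nearest enclosing try block, or terminate the
 * program if there is none (an uncaught exception). */
static void js_throw(int value) {
    thrown_value = value;
    if (handler_top == 0) {
        fprintf(stderr, "uncaught exception: %d\n", value);
        exit(1);
    }
    longjmp(handler_stack[handler_top - 1], 1);
}

int main(void) {
    /* Equivalent of: try { throw 42; } catch(e) { print e; } */
    handler_top++;                                   /* push a handler */
    if (setjmp(handler_stack[handler_top - 1]) == 0) {
        js_throw(42);                                /* body of the try block */
        handler_top--;                               /* end of try: pop */
    } else {
        handler_top--;                               /* arrived via longjmp: pop */
        printf("caught: %d\n", thrown_value);        /* body of the catch block */
    }
    return 0;
}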

4.3.2.4 Getters and Setters in objects

Object getter and setter functions are not supported. Getters and setters allow functions to be attached to a property to handle the updating and loading of the property. Compared to a regular property, the presence of getters and setters adds the complication that every object property can then either be a normal property or contain a getter and/or setter function.

This means that there are potentially two functions associated with one prop- erty.

One way to solve this is to have a special JavaScript value for the case with getters and setters, which can contain up to two functions. If this value is found when reading a property, the getter is executed and the result returned. If it is found when writing a property and a setter is present, the setter is executed with the assigned value as a parameter.

4.3.2.5 For-in loops

The JavaScript for-in loop iterates over the property names of the object (or, in the case of arrays, over array indices previously set).

This means that the for-in loop requires an iterator for object properties, but once this is available, the for(var foo in bar) can be thought of as a normal for loop.

So the following JavaScript loop

for(var foo in bar){
    ...
}

would become this pseudo-JavaScript loop

var itr = bar.getIterator();
for(var foo = itr.first(); itr.hasNext(); foo = itr.next()){
    ...
}

that would then be translated as a normal for loop.
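With the hash map representation of objects sketched earlier, the property-name iterator could be based on UTHash's HASH_ITER macro; the callback interface below is an assumption used only for illustration:

/* Call the given callback once for every own property name of obj.
 * HASH_ITER is the standard UTHash iteration macro; prototype
 * properties and the bookkeeping needed for array indices are ignored
 * in this sketch. Object and Property are the structs sketched earlier. */
void for_each_own_property(Object *obj,
                           void (*callback)(const char *name, void *data),
                           void *data) {
    Property *entry, *tmp;
    HASH_ITER(hh, obj->properties, entry, tmp) {
        callback(entry->key, data);
    }
}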

Obviously we need a naming convention to avoid clashes with JavaScript names and to allow nested for-in loops. The iterator also needs to be tagged in a way that signals to the compiler that the iterator is a native function and not a JavaScript function that can be overwritten.

4.3.2.6 Debugger statements

The debugger statement produces a break point, if a debugger is attached. Since we are not implementing a debugger, we are free to ignore this statement.

4.3.2.7 Regular expressions

Efficient regular expression evaluation is a research topic in its own right, and thus outside the scope of this project [Cox07].

To avoid re-implementing everything, a way to include regular expressions in the project compiler might simply be to use a regular expression library for C - however, special attention to the exact syntax used is required. Most likely a transformation of some sort will be required to ensure that the regular expressions are executed correctly.

4.3.2.8 instanceof operator

The JavaScript "instanceof" operator returns a boolean value indicating if the left hand side is an instance of the function on the right hand side that is if it has this function in its prototype chain.

The internal [[HasInstance]] [ecm11][8.6.2] function on Object is not implemented in the project compiler - when this is implemented, the instanceof operator is straightforward to implement in the same manner as the rest of the operators were.

4.3.2.9 void operator

The "void" operator in JavaScript is used to evaluate an expression only for its side effects. The void operator always returns "undefined" no matter what the expression given returns. It is straight forward to implement with the rest of the operators.

4.3.2.10 in operator

The JavaScript "in" operator returns a boolean value to indicate whether the left hand side is a name for a property of the right hand side expression.

It could be implemented like the other operators using the internal "has_property" function, but was excluded to limit the scope.

4.3.2.11 Unicode strings and identifiers

The project compiler initially translates JavaScript to C - the identifiers are written in the output code to allow for easier debugging of the code. This means that the project compiler limits the accepted identifiers in the JavaScript code to the identifiers that C supports. This has the effect that the project compiler is limited to ASCII identifiers.

This is easy to change: simply give all variables a new unique name and don’t append names to functions. This will make debugging the output code harder though.

To implement the Unicode strings, we could change the representation of strings from arrays of chars to an implementation from for instance the International Components for Unicode libraries of IBM [IBM13].

4.3.2.12 Native library functions

The project compiler implements a subset of the native library described in the JavaScript specification. The subset implemented is the following:

• Object: "toString" is supported, and a partial object constructor that does not support all cases defined in the ECMAScript Language Specification is supported.

• Function: "call" and "apply" are supported

• Array: Array constructor (both as function and constructor), "push", "pop", "shift", "unshift" and "concat" are supported

• Math: All the properties of the Math object are supported.

• Date: Date constructor, "getTime" and "valueOf" are supported

• console: "console.log" is supported eval This specifically excludes "eval" and its aliases (the Function constructor among others).

The usage of eval in the wild is claimed to be quite widespread [RHBV11]. The usage measured includes aliases like "document.write(’
