
Compiling dynamic languages via typed functional languages

Raj Bandyopadhyay, Rice University, [email protected]
Walid Taha, Rice University, [email protected]

2007/10/19

Abstract

Dynamic languages enable rapid prototyping, while statically typed languages offer early error detection and efficient execution. As a result, the usual practice in software development is to build a prototype in a dynamic language and then rewrite the application in C or Fortran for high performance. Our thesis is that this costly rewriting step can be avoided if we have good native code compilers for dynamic languages.

To overcome the difficulties in building good native code compilers from scratch, we propose that dynamic languages can be compiled into high-performance native code executables by translating to typed functional languages and reusing existing functional language compilers. We demonstrate this approach by compiling a popular dynamic language, Python, by translating it to OCaml, a strongly typed functional language. On performing a comparative evaluation against several available Python implementations on both Windows and Linux platforms, we obtain highly encouraging results.

We demonstrate the effectiveness of our approach by developing a compiler for a popular dynamic language, Python, using a strongly typed functional language, OCaml, as an intermediate. Python is a highly dynamic object-oriented language. The standard Python implementation, CPython, is a bytecode interpreter written in C. We present a comparative evaluation of our implementation against several available Python implementations such as IronPython, PyPy and Jython on Windows and Linux platforms.

1. Introduction

Dynamic scripting languages have become ubiquitous in a wide range of programming domains, including scientific computing. Widely used examples include Python, Perl, Matlab and R. These languages provide increased flexibility and improve productivity by providing domain-specific features and abstracting many of the details associated with lower-level programming languages such as C and Fortran.

On the other hand, statically typed languages offer a slew of advantages including higher safety, early error detection and higher execution efficiency. As a result, the standard practice is to use dynamic languages for prototyping and use a lower-level language to 'harden' the application for performance. Our thesis is that we can avoid this costly rewriting step if we have good native code compilers for dynamic languages.

Writing a good native code compiler from scratch for each dynamic language is hard. In addition to language and platform-specific idiosyncrasies, dynamic languages present several challenges that make them harder to compile compared to statically typed languages. These include automatic memory management and the lack of static typing information. In addition, many common scripting languages such as Python, Perl and R have evolving semantics, or the language semantics is not well-defined enough for formal reasoning, making the compiler developer's task even harder.

To overcome these difficulties, we propose to develop good native code compilers for dynamic languages by translating them to a strongly typed functional language as an intermediate. Using typed functional languages allows us to utilize the large body of existing infrastructure developed by the functional programming community, both in theoretical research and practical tools. This includes efficient native code compilers and memory management runtimes.

In this paper, we use Python as proof-of-concept to demonstrate that our approach delivers efficient native code compilers for dynamic languages. We describe how source dynamic language objects and constructs can be expressed in terms of target typed functional language data types. Finally, we present a comparative performance analysis against different Python implementations such as CPython, IronPython, PyPy and Jython to illustrate the effectiveness of our approach.

The technical contributions of this paper are:

• A methodology for building native code compilers for dynamic languages using statically typed functional languages
• A formal representation of the source dynamic language (Python), target functional language (OCaml), and the translation semantics
• A representation of source dynamic language objects in terms of target functional language data types
• A native code compiler for Python via translation to OCaml as proof-of-concept
• A detailed comparative performance evaluation of our implementation versus several available Python implementations on multiple platforms

In this paper, we first describe some of the challenges in compiling dynamic languages in general, and Python in particular. We then describe an abstract syntax for our source and target languages. The following section describes the translation process, including features implemented in the runtime environment, the architecture of our compiler and the translation semantics for specific language constructs. We then perform a comparative analysis of our implementation against other Python implementations and discuss the effectiveness of our approach.

2. Challenges in compiling dynamic languages

Dynamic languages such as Python, Perl, Matlab and R present some common challenges for language developers.

• Memory Management: Most scripting languages abstract the intricacies of memory allocation and deallocation away from the user. This improves runtime safety by automating the responsibility of freeing allocated memory chunks as needed. However, this necessitates a larger amount of work performed by the runtime environment.
• Lack of compile-time type information: Having type information at compile time enables the compiler to optimize memory allocation and reduce the overhead of dynamic method lookup. Dynamic languages usually have no type declarations, and types can be coerced to one another at runtime, making precise type inference hard.
• Imprecise definition of semantics: Scripting languages such as Python and Perl have been developed cooperatively over a period of years, incorporating features and code developed by many users. As a result, many features of the language and their behavior in corner cases are not always well documented. In some cases, behaviors are left undefined or implementation-dependent.

2.1 Benefits of using typed functional languages

The most important benefit of using a typed functional language as an intermediate is that these languages provide highly efficient automatic memory management and garbage collection. Any language that is effectively translated to a functional language can immediately take advantage of these facilities.

The type systems of functional languages such as ML and Haskell are expressive enough to model both untyped and fully dynamic computation as well as highly precise typed values. This is done using a combination of primitive types, structures such as tuples and records, and tagged unions with pattern-matching facilities over tags.

Functional languages have an extremely well-defined and well-understood semantics. Translating to a functional language gives us a formal semantics for the source language in the form of the translator.

In this research, we

2.2 Garbage Collection

Automatic memory management is implemented in most scripting languages for higher productivity. The CPython implementation uses a strategy called reference counting. Every object contains a reference count, which is incremented or decremented based on whether a reference to the object is added or removed in the program. When the reference count becomes zero, the object's memory can be freed.

The main advantages of reference counting are that it is fairly easy to implement and highly portable. It only requires that the functions malloc() and free() be available, which is guaranteed by the C standard. Therefore, it is used in scripting languages such as Python and Perl, which are implemented in C. The chief drawback of reference counting is that it cannot reclaim cyclic structures, in which an object refers to itself (usually indirectly). Python offers an additional optional gc module to manage memory in the presence of cycles. In addition, reference counts take up extra memory and need to be updated correctly.

Automatic garbage collection algorithms not only manage allocation and deallocation correctly, but are also more efficient in many cases due to their use of heap compaction and better cache performance. The most common algorithms are known as generational collectors, which are implemented both in modern JVMs and in functional languages such as OCaml. Most generational collectors split the heap into a young and an old generation (minor and major heaps in OCaml). Objects are first placed in the minor heap and, if they survive a fixed number of collection cycles, are moved into the major heap. In addition, OCaml's garbage collector is incremental, i.e. it interleaves collection with computation, thus maintaining performance.

2.3 Challenges in Python

The following Python script illustrates many of the syntactic features of Python as well as the challenges it presents in compilation. The program creates a new class MyInt by inheriting from the built-in type int and overrides the default + operator to compute the sum modulo 2.

    glob = 2 # a global variable
    class MyInt(int):
        # Constructor function
        def __init__(self, x = 0):
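The reference-counting behaviour discussed in Section 2.2 can be observed directly in CPython. The following is a minimal sketch and is not part of the compiler described in this paper. It assumes a CPython interpreter, since other implementations such as PyPy or Jython manage memory differently. The class Node and the functions refcount_demo and cycle_demo are hypothetical names introduced only for this illustration.

```python
import gc
import sys
import weakref


class Node:
    """Hypothetical helper class, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.other = None


def refcount_demo():
    """Return how much the reference count grows when one alias is added."""
    obj = Node("a")
    before = sys.getrefcount(obj)
    alias = obj  # adding a reference increments the count
    after = sys.getrefcount(obj)
    return after - before


def cycle_demo():
    """Show that a reference cycle survives until the cycle collector runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting alone for the moment
    try:
        a = Node("a")
        b = Node("b")
        a.other = b
        b.other = a  # a and b now refer to each other
        probe = weakref.ref(a)  # lets us observe a without keeping it alive
        del a, b  # no external references remain, but counts stay above zero
        alive_after_del = probe() is not None
        gc.collect()  # the optional gc module detects and frees the cycle
        alive_after_gc = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_gc


if __name__ == "__main__":
    print(refcount_demo())
    print(cycle_demo())
```

Under CPython, refcount_demo() is expected to return 1, and cycle_demo() is expected to return (True, False): the cyclic pair is still alive after its last external reference is deleted and is reclaimed only once gc.collect() runs. A generational collector of the kind used by OCaml needs no such separate mechanism, because it reclaims unreachable cycles as part of ordinary collection.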