Bypassing Portability Pitfalls of High-Level Low-Level Programming

Yi Lin, Stephen M. Blackburn
Australian National University
[email protected], [email protected]

Bypassing Portability Pitfalls of High-level Low-level Programming, VMIL 2012, 2012/12/3

Abstract

Program portability is an important software engineering consideration. However, when high-level languages are extended to effectively implement system projects for software engineering gain and safety, portability is compromised: high-level code for low-level programming cannot execute on a stock runtime, and, conversely, a runtime with special support implemented will not be portable across different platforms.

We explore the portability pitfall of high-level low-level programming in the context of virtual machine implementation tasks. Our approach is to design a restricted high-level language called RJava, with a flexible restriction model and effective low-level extensions, which is suitable for different scopes of virtual machine implementation, and also suitable for a low-level language bypass for improved portability. Apart from designing such a language, another major outcome of this work is clearing up and sharpening the philosophy around language restriction in virtual machine design. In combination, our approach to solving portability pitfalls with RJava favors virtual machine design and implementation in terms of portability and robustness.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors - Run-time environments

General Terms Design, Languages

Keywords Virtual machine, Restricted language, Portability, High-level low-level programming

1. Introduction

Current hardware trends are increasingly exposing software developers to hardware complexity. Novel techniques such as multicore and heterogeneous architectures increase hardware capacity, but also leave programmers a list of challenges if they wish to fulfill the hardware's potential. Dealing with complex hardware increases the difficulty of systems programming. In the meantime, the complexity of system software grows in pace with hardware evolution. With the increasing software complexity, it is even harder to retain correctness, security and productivity.

Modern high-level languages are widely used for application programming for the assurance of correctness and security as well as boosting productivity. High-level languages provide type-safety, memory-safety, encapsulation, and strong abstraction over hardware [12], which are desirable goals for system programming as well. Thus, high-level languages are potential candidates for system programming.

Prior research has focused on the feasibility and performance of applying high-level languages to system programming [1, 7, 10, 15, 16, 21, 22, 26–28]. The results showed that, with proper extension and restriction, high-level languages are able to undertake the task of low-level programming, while preserving type-safety, memory-safety, encapsulation and abstraction. Notwithstanding the cost of dynamic compilation and garbage collection, the performance of high-level languages when used to implement a virtual machine is still competitive with using a low-level language [2].

Using high-level languages to architect large systems is beneficial because of their merits in software engineering and safety. However, high-level languages are not a perfect fit for system programming. In order to effectively undertake system programming tasks, extensions and restrictions are two essentials of high-level low-level programming, and both leave unsolved challenges.

Extensions cause portability pitfalls. Portability pitfalls of high-level low-level programming include poor portability of low-level code across platforms and poor portability of high-level extensions across runtimes. High-level languages (HLLs) are designed to abstract over low-level details, and most of them do not provide the necessary semantics for low-level operations, which is a key requirement in system programming projects. In order to undertake a low-level programming task, high-level languages need to be extended and require special support from the runtime for those extensions. As a result, VM components written in the extended HLL cannot execute on a stock runtime, and, conversely, a runtime with special support implemented will not be portable across different platforms. Both break portability.

These portability pitfalls limit code reusability of high-level low-level programming. Efficient implementation of modern language runtimes requires experts from different areas, such as memory management, concurrency, scheduling and JIT compilation. This leads to a trade-off between the high cost of hiring a group of specialists and the risk of failure for lack of expertise. One possible solution to this tension is to encourage reusability. However, when the implementing language cannot execute with a proper hosting runtime on the target platform, reusability is difficult to achieve: any given runtime that wishes to host high-level code for low-level programming needs to be modified to support the new semantics, otherwise the newly introduced low-level semantics need to be carefully dealt with in other ways [13, 14]. Both involve non-trivial work for each port.

Low-level languages (LL languages) do not have such issues. Low-level languages such as C/C++ usually have compilers available across most target platforms. Thus, a bypass approach of translating a HLL into a LL language could be one possible way to solve this pitfall. However, a LL language bypass is generally not easy to achieve: a HLL's precise exceptions, reflection and dependence on its standard libraries are hard to map into a LL language, and efficient dynamic dispatch requires non-trivial work when targeting a non-OO LL language such as C.

Restrictions lack definitions. System programming with a HLL also relies on restrictions for performance-critical operations or avoidance of possible program failure. We observed that some VM components are written by following strict restrictions. Restrictions include omitting high-level language features that are unnecessary or problematic for certain contexts of low-level programming. These restrictions bring the HLL closer to a LL language. This justifies the possibility of translating from the restricted HLL to a lower-level language to solve portability pitfalls. Currently such restrictions do not have clear definitions in the program, and are achieved by careful ad hoc hand coding. Explicit definitions of the restrictions and automatic checking are more principled and more robust.

Thus, this paper focuses on two topics: a) high-level language restriction in system programming, and b) translation from a restricted high-level language to a LL language. These two topics are independent but closely related in our context: the natural existence of language restriction leads to the possibility of our HLL-to-LL-language bypass, which further provides a solution to the portability pitfalls of high-level low-level programming.

In this paper, we first discuss the important concerns in our design of RJava and the proper position of restriction within a whole system programming project, based on some observations we made on an existing high-level low-level programming project, Jikes RVM [1]. Then we propose an explicitly-restricted language called RJava with proper low-level extensions, and use MMTk [5] as an example to show how the elements of RJava are used in prac- [...]

[...] restrictions exist for more principled reasons: correctness and performance.

Correctness is one challenge for engineering complex system projects. The use of a HLL introduces much better software engineering, such as abstraction, a strong type system and automatic memory management, which promotes correctness to a more manageable level. However, the correctness of a VM implemented using a HLL still needs careful consideration, especially in metacircular cases. When implementing a language in the same language, one pitfall threatening correctness is infinite regress. The code to support language feature X needs to avoid using the feature X itself, otherwise that code would recursively invoke itself and be unable to finish. For example, the scheduler code generally needs to avoid any language feature regarding threading and scheduling, and instead use more basic primitives such as locking to fulfill its function. Another example is the memory manager, which provided the impetus for RJava. The memory manager has to avoid triggering object allocation during allocation code, or triggering another garbage collection during garbage collection, so primitives to operate on raw memory have to be added to the HLL and used to implement the memory manager. The lesson here is that a HLL for VM implementation has different restrictions when applied to different scopes. Our initial focus for RJava was the scope of implementing a portable memory manager; however, we generalize our approach to be as flexible as possible so that RJava can be adapted and used in other scopes with trivial effort.

Performance is critical in systems programming. This is probably the most powerful argument as to why people stick with lower-level languages for such tasks. However, using a HLL to implement a VM needs proper restrictions so as to achieve competitive performance. For example, dynamic dispatch
