Tail Calling Between Code Generated by C and Native Backends

Tail Calling Between Code Generated by C and Native Backends Laurent Huberdeau Marc Feeley DIRO DIRO Universitéde Montréal Universitéde Montréal Canada Canada [email protected] [email protected] Abstract The Gambit system uses the Gambit Virtual Machine Compiler backends are typically incompatible; it is not (GVM) [4] as the IR of a compiler that generates code possible to call the code generated by one backend from for several target languages, including C, JavaScript, code generated by another. When writing a new back- PHP, Python, Ruby, and Java. In a nutshell, each GVM end, not being able to use the runtime library of an instruction is translated into a sequence of instructions existing backend makes the development process longer of equivalent effect in the target language. and more complex than it should be as everything has Gambit's C backend is the most mature. It handles to be rewritten from scratch. Having interoperability of all of the Gambit Scheme features and is in a sense the code generated by different backends allows code shar- Gambit Scheme reference implementation. However, the ing for non-performance critical parts and simplifies the C language imposes code generation constraints that development of new backends. This paper presents an affect performance, notably in regard to the handling of approach used in the Gambit Scheme compiler to seam- tail calls and continuations. This has prompted an effort lessly mix the execution of code generated by C and to develop a family of backends that target the x86-32, native backends, by making tail calls and non-tail calls x86-64 and ARM architectures at the machine code level. between the two worlds possible. Our approach uses a They are collectively called the native backend, as these dual purpose control point representation and bridges backends share a large part of their logic. which are executed when control jumps from one world Because the GVM is a relatively high-level abstrac- to the other. The technique is presented in the context tion that, for example, does not specify how objects are of the ongoing development of Gambit's native backend represented and how garbage collection is performed, which targets the x86-32, x86-64 and ARM architectures. implementing the native backend could involve redesign- ing and rewriting the whole runtime system. Not only CCS Concepts • Software and its engineering → would this approach easily require a few man-years of Retargetable compilers; Source code generation; Runtime effort, but the support for the various Gambit Scheme environments; features would progress incrementally over the course of Keywords Compiler, Backend, Interoperability, Tail the project, hindering large-scale experimentation and Call, Scheme adoption by users until all features are supported. We have instead chosen to design the native backend so that 1 Introduction the code generated interoperates transparently with the code generated by the C backend. In other words, the A popular implementation approach for programming native backend implements the same object represen- languages consists in designing an abstract instruction tation as the C backend (allowing data to be shared set tailored to the source language and implementing transparently) and it is possible to jump back and forth this abstraction as a virtual machine (VM). This usually between the existing runtime system (including library simplifies the compiler, it offers portability when theVM procedures and garbage collector) compiled using the is based on an interpreter, and the abstract instruction C backend and code generated by the native backend. set can be used as the intermediate representation (IR) This way the availability of the Gambit Scheme features of a compiler with multiple target languages. comes early in the development process because their This approach is particularly interesting for Scheme implementation using the C backend can be used as a because the language has constructs that have no direct fallback until the moment they are handled directly, and equivalent in other languages; tail calls and continuations hopefully more efficiently, by the native backend. among others. The Bit [1], Picobit [8], Guile [7], Racket Having early access to the C backend's runtime system [6] and Gambit [3] systems are all based on this approach. simplifies the development process of the native backend. SFPW'18, Sept. 28, 2018, Saint Louis, MO, USA 2018. 1 SFPW'18, Sept. 28, 2018, Saint Louis, MO, USA Laurent Huberdeau and Marc Feeley Allocation procedures, the garbage collector, operating- the set of inlined primitives and the procedure calling system interface procedures, complex library procedures convention. In the existing backends, only 5 registers including display and pretty-print, ressource and are used, 2 of which have a special purpose. The first run time measurement procedures, exception handlers register, r0, contains the return address of the current (stack/heap overflow, wrong number of arguments, etc), procedure and the last, r4, which we call the self register, first-class continuations, can all be used in test programs. is used when calling closures. One of the general purpose These features are particularly useful in debugging the registers, r1, is also used to contain the return value of machine code generation and to assess the performance procedures. The frontend generates a control flow graph of various code generation approaches (choice of instruc- of basic blocks that contain code for the GVM. tions, registers, etc). The GVM defines seven instructions: label, jump, ifjump, This paper explains the design and implementation of switch, copy, close and apply. The label instruction is the mechanism to jump between code generated by the the first instruction of each basic block and gives the C backend and code generated by the native backend block's identifier, which is a small number prefixed by # (cross jumps). Three important design objectives that (e.g. #2 ), in addition to its kind. There are two kinds are achieved by our design are: of basic blocks: first-class blocks and local blocks. First- 1. Cross jumps are transparent. A backend does not class blocks can be used like any other Scheme object know statically if a jump is a cross jump or not. It reference and are for control flow between procedures. is only at run time that the destination is known. There are two types of first-class blocks: entry points A library procedure compiled using the C backend and return points, which correspond to procedure values could be called with return addresses that cause and return addresses respectively. When called, entry it to return to code generated by the C backend points verify the number of arguments before proceeding sometimes and the native backend other times. with the execution of the basic block. Local basic blocks Moreover, cross jumps that are tail calls must not are not first-class objects as they are only used for con- consume space, on the stack or elsewhere. trol flow inside the procedure. The label instruction also 2. Non-cross jumps are performed as efficiently as if indicates the current size of the stack frame. This is used the other backend did not exist. For the native to access locations on the stack with the correct offset backend, this means that it can implement pro- from the top of the stack. cedure values and return addresses as pointers to Basic blocks always end with a branch instruction machine code, and a plain branch to address ma- indicating the frame size at the end of the basic block. chine instruction can be used for procedure calls This allows the backend to adjust the stack pointer only and returns. once as explained below. There are two different types 3. Cross jumps have a cost that is similar to non-cross of branches: conditional and unconditional. Conditional jumps within code generated by the C backend, branches are generated for if, case and cond Scheme which uses trampolines for implementing jumps. expressions and correspond to the ifjump and switch GVM instructions. Unconditional branches correspond The paper is organized as follows. Section 2 presents to the GVM's jump instruction. Conditional and uncon- the GVM and how the C and native backends implement ditional branches have an important difference in regard tail calls and continuations. The technique and its imple- to their destinations. Conditional branches may only mentation are explained in detail in Section 3. Section jump to local blocks and unconditional branches may 4 evaluates the performance overhead of the technique jump to local blocks as well as first-class blocks. We call and its applicability to the development of the native jump instructions to first-class blocks external jumps backend. as the control goes to a label that may be in another 2 Background procedure. For external jumps, the jump's operand may be a reference to a closure, an entry point or a return 2.1 Gambit Virtual Machine point. The Gambit Virtual Machine is the abstract machine tar- The jump instruction optionally specifies a number of geted by the compiler's frontend. The machine uses a set arguments. This is used by the callee to verify that the of locations that can store any Scheme object reference. number of arguments passed to the procedure is correct There are general purpose registers numbered starting and to support rest and optional parameters. When the at 0 (e.g. r1 ), frames that are stored on a stack (e.g. value is given, the destination of the external jump must frame[1] is the first slot of the current frame) and global be a procedure (closure or entry point), otherwise, it variables (e.g. global[a] to access the global variable a). must be a return point. This is used to generate efficient The backends can parameterize the generation of code for code for the jump.

Tail Calling Between Code Generated by C and Native Backends

Bringing GNU Emacs to Native Code

Hop Client-Side Compilation

SI 413, Unit 3: Advanced Scheme

Supporting Return-Value Optimisation in Coroutines

A History of Clojure

Tail Recursion • Natural Recursion with Immutable Data Can Be Space- Inefficient Compared to Loop Iteration with Mutable Data

Interprocedural Control Flow Analysis of First-Order Programs with Tail Call Optimization 1 Introduction

Principles of Programming Languages from Recursion to Iteration Through CPS

A Modern Reversible Programming Language April 10, 2015

Proper Tail Recursion and Space Efficiency

CS 137 Part 9 Fibonacci, More on Tail Recursion, Map and Filter Fibonacci Numbers

CS 321 Programming Languages Forward Recursion Functions Calls