Implementing a Portable Foreign Function Interface through Dynamic Code Generation

Abstract

Interpreters for various languages usually need to interface to arbitrary functions that have a C interface. Unfortunately, C does not directly allow constructing calls to functions with arbitrary parameter lists at run-time. Various libraries (in particular libffi and the ffcall libraries) have been developed to close this gap, but they have two disadvantages: each foreign function call is slow, and these libraries are not available on all platforms. In this paper we present a method for performing such foreign function calls by creating and compiling wrapper functions in C at run time, and then dynamically linking them. The technique is portable to all platforms that support dynamic linking. Moreover, the wrapper functions are faster than foreign function calls through earlier libraries, resulting in a speedup by a factor of 1.28 over libffi on the DaCapo Jython benchmark. One disadvantage is the higher startup cost from invoking the C compiler at run time. We present and evaluate several methods to alleviate this problem: caching, batching, and seeding.

1 Introduction

Many languages support calling libraries written in other programming languages; this facility is known as a foreign function interface. More concretely, many of the libraries of interest provide C interfaces, so the foreign language of interest is usually C. Foreign function interfaces are particularly popular in scripting languages such as Python [Bea98] and Ruby, but are also present in other languages, e.g., Java [Lia99], Common Lisp, Lua-ML [Ram03], Haskell [FLMJ98], and various Forth implementations.

Interpreters written in C are a popular implementation technique for programming languages, in particular scripting languages. They combine a number of advantages:

• simplicity
• portability across architectures
• high startup speed

One might think that calling C functions from an interpreter written in C should be trivial, but unfortunately in the general case this is not true: one can call specific C functions easily; one can also call C functions with a specific sequence of argument and return types indirectly; but calling a function with a statically unknown sequence of argument and return types is not supported by C. The ffcall libraries^1 and libffi^2 have been developed to fill that gap, but they introduce a significant call overhead and they are not available on all platforms.

In this paper we present and evaluate a foreign function interface based on generating wrapper functions in C at run-time, and then compiling and dynamically linking these functions. Some may consider this idea obvious, but few people have used it (and even fewer have published about it), people have invested significant effort in other approaches (e.g., libffi and ffcall), and there have been no empirical evaluations of this approach; the present paper fills this hole.

Our main contributions in this paper are:

• We present an implementation of this idea in a JVM interpreter (Section 3.2) and compare the calling performance with libffi and ffcall (Section 5.2).

• We also discuss the run-time cost of invoking the C compiler (Section 4) and present and evaluate several ways of alleviating this problem:

  - Using a faster compiler (Section 4.1, Section 5.4)
  - Caching of compiled wrapper functions between runs (Section 4.2, Section 5.3)
  - Compiling a batch of wrapper functions at once (Section 4.3, Section 5.4)
  - Seeding the cache with wrapper functions for popular foreign functions (Section 4.4)

We discuss the problem and its existing solutions in more depth in Section 2, and finally compare our approach with related work (Section 6).

^1 http://www.haible.de/bruno/packages-ffcall.html
^2 http://gcc.gnu.org/viewcvs/trunk/libffi/

2 Previous work

In this section we describe using the ffcall libraries and libffi for calling foreign functions, and the disadvantages of this approach.

You can use the ffcall interface for calling a foreign function (avcall) as follows: For each parameter, you call a type-specific macro (e.g., av_int), and pass the actual parameter. These macros construct an av_list data structure that you eventually use in the av_call macro to call the function.

The overhead of using this interface for a foreign function call in an interpreter consists of several components:

• The interpreter has to interpret a description of the foreign function, and use that to call the appropriate ffcall macros in the appropriate order.

• The macros construct a data structure.

• Finally, av_call has to interpret that data structure, move the parameters to the right locations according to the calling convention, perform the call, and finally store the result in the right location.

It is easy to see that this is quite a bit more expensive than a regular function call. However, we were still surprised when we saw significant speedups from our new method not just on microbenchmarks, but also on application benchmarks.

The libffi interface is similar, but divides the work into two stages: In the first stage you pass an array of parameter types to the macro ffi_prep_cif, and libffi creates a call interface object from that. In the second stage you create an array of pointers to the parameter values, and pass that to the calling macro ffi_call along with the call interface object. In the Cacao interpreter, we execute the first stage only once per foreign function, and the second stage on every call. In theory, this two-stage arrangement can be more efficient than the ffcall interface, but in practice the ffcall interface is faster on the platforms where we measured both.^3 But even if the libffi interface were implemented optimally, there would still be at least the overhead of constructing the array of parameter pointers.

Another issue with the approach taken by these libraries is that they require additional effort for every architecture and calling convention. While the maintainers of these libraries have done an admirable job at supporting a wide variety of architectures and calling conventions, they are not complete: e.g., at the time of writing the Win64 calling convention for the x86-64 (aka x64) architecture is supported by neither library. So, an approach that does not require extra work for every architecture and calling convention would be preferable for completeness as well as for reducing the total amount of effort.

^3 In our experience, libffi still has one significant practical advantage over ffcall for interpreter developers: gdb backtracing works across libffi calls, but not across ffcall calls.

3 Dynamically Generated C Wrapper Functions

In this section we describe the basic idea and two implementations of our approach. We present improvements and optimizations beyond the basic approach in Section 4, and timing results in Section 5.

3.1 Basic Idea

The C compiler already knows the architecture and the calling convention of the platform, so we should

make use of this knowledge instead of reimplementing it from scratch (as is done in ffcall and libffi). Unfortunately, the C language does not give us a way to use this knowledge directly, e.g., through a run-time code generation interface. But we can perform run-time code generation indirectly: generate C code at run-time, then invoke the C compiler at run-time, and finally link the resulting object file dynamically.

What is the C code that we need to generate dynamically if we want to call arbitrary functions? It needs to be a C function, and we will call this the wrapper function. It must be callable from ordinary C code, so it needs pre-determined parameter and return types, typically the same for all wrapper functions. The wrapper function needs to call the foreign function, and pass the parameters to it. It has to read these parameters from the locations where the interpreter has put them; in a stack-based virtual machine (VM), it will typically read the parameters from the VM stack. And the wrapper function has to store the result of the foreign function in a place where the interpreter expects it; again, this will usually be the stack in a stack-based VM. The wrapper may also update the VM state (in particular, the VM stack pointer). Figure 1 shows and Section 3.2 explains an example of such a wrapper function.

After dynamic linking, the wrapper function can be called arbitrarily often with an ordinary indirect call in C.

3.2 Implementation in a JVM

We have implemented this idea in the Cacao interpreter [EGKP02, ETK06], a JVM implementation. It is used for calling Java Native Interface (JNI) functions. The wrapper functions are generated, compiled, linked, and run in a just-in-time fashion, i.e., when the JNI function is first called.

Figure 1 shows an example of a wrapper function for calling the JNI function Java_java_io_VMFile_list. All wrapper functions take the JVM stack pointer as parameter and return the updated stack pointer as return value. The JNI function has three parameters:

1. As for all JNI functions, the first parameter is the JNIEnv pointer (declared as a void pointer to avoid having to include JNI header files).

2. The function is for a static class method, so the second parameter has to be the class pointer. We pass this pointer through a variable, and assign the class pointer to the variable when we dynamically link the wrapper function (and the variable). We have one such variable per static class function; sharing the variables for the same class would be possible but more complex. In an earlier version we passed the class pointer as a literal number, but we changed it to the current scheme in order to implement caching (Section 4.2), because different runs may require different literal addresses for the same class (e.g., due to address-space randomization).

3. The rest of the parameters are the parameters passed to the method at the JVM level. In this example the called function has one pointer parameter. The parameters reside on the JVM stack and are accessed through the stack pointer. The offsets from the stack pointer (0 in this case) are hard-coded into the wrapper function. The type of the parameter is also hard-coded, in the form of a cast of the pointer type before the access; in the example, the type of the parameter is a pointer, so there is a cast to void ** before the access. The types and offsets of the parameters are computed from the descriptor of the method.

The wrapper function stores the result of the JNI function back to the stack, again using a hard-coded type and offset derived from the method descriptor. Finally, the wrapper function returns an updated stack pointer, again with a hard-coded increment.

4 Improvements

This section discusses improvements over the basic approach that address the following problems: The main disadvantage of our approach is that invoking the C compiler is an expensive operation, and invoking it a lot increases the startup time of the programs significantly. Another disadvantage in some settings (in particular, embedded systems) is that our approach requires a C compiler and toolchain to be available at run-time.

    extern void *_Jv_env;
    void *ccjni_class_Java_java_io_VMFile_list;

    long *ccjni_Java_java_io_VMFile_list(long *sp)
    {
      extern void *Java_java_io_VMFile_list();
      *(void **)(sp+0) =
        Java_java_io_VMFile_list(_Jv_env,
                                 ccjni_class_Java_java_io_VMFile_list,
                                 *(void **)(sp+0));
      return sp+0;
    }

Figure 1: A wrapper function for a JNI function

4.1 Faster compiler

A straightforward improvement with respect to startup time is to use a faster compiler; a very fast one, possibly the fastest, is tcc^4. However, compilers often pay for faster compilation with slower execution. Moreover, even a very fast C compiler still requires a significant startup time: E.g., compiling 488 files, each with one wrapper function, with 488 invocations of tcc 0.9.22 requires 1.1s on a 2.26GHz Pentium 4. Finally, fast C compilers are not generally available: tcc supports only a few platforms, and is often not even installed on those platforms where it is available. Therefore additional ways to improve the compilation time would be useful.

^4 http://fabrice.bellard.free.fr/tcc/

4.2 Caching

Instead of generating and compiling the same wrapper function every time we invoke a given foreign function, we can just keep the shared object file for the wrapper function around and reuse it in the next run of any program that uses that foreign function.

We have implemented this approach in the Cacao interpreter as follows: the file name of the shared object is derived from the name of the JNI function, so we just need to check whether there is an appropriately named file; only if not do we generate and compile a wrapper function file. If the file already exists, the startup cost for this JNI function is reduced to almost nothing.

There are a number of issues that have to be dealt with when implementing caching:

Race conditions. If several instances of the interpreter execute at the same time, they have to avoid writing and compiling wrapper function files at the same time. This problem can be avoided with appropriate locking.

Security. An interpreter run by user Alice must not dynamically link shared object files belonging to a different (untrusted) user Bob, because (with a suitably constructed shared object file) that would give Bob all the privileges of Alice. This can be avoided by keeping per-user caches, which would typically reside in a subdirectory of the user's home directory. Also, the foreign function interface should check that that subdirectory and the cache files belong to the user.

Proper architecture. The shared object files must be compiled for the architecture and ABI that the interpreter is running on. The user's home directory can reside on a network file system and be mounted on machines with different architectures, so a particular shared object file might have been created by the interpreter on a different architecture. This problem can be avoided by having per-architecture directories for the shared-object files.

Caching works very well. However, to reduce the startup overhead on the first execution, one can use the following techniques.

4.3 Batching

We can collect several wrapper functions into one file, and compile (and dynamically link) all of them

in one batch. That eliminates much of the repeated startup overhead of the C compiler; other (minor) benefits of batching are that there are fewer shared object files that have to be linked dynamically, resulting in less load on the virtual memory system, and reduced memory consumption from internal fragmentation in the pages.

We have not implemented batching in the Cacao interpreter, because that would require substantial changes in the way Cacao deals with JNI functions; currently Cacao only deals with JNI functions in a JIT way, i.e., at the execution of the first call to the function; at that point the wrapper function has to be compiled immediately, leaving no time for accumulating more wrapper functions.

One way to achieve batching in a JVM would be to generate a file of wrapper functions when loading a class containing JNI functions. One might even accumulate a file containing wrapper functions for several classes, and only compile all of them when one of the functions is actually called.

Combining batching and caching complicates things a little: We can no longer use the file name to determine whether a wrapper function is cached, but need an index file or database that indicates which wrapper function is in which file.

4.4 Seeding

Wrapper functions for popular foreign functions can be generated and compiled when the interpreter is built, and later the system can use the resulting shared object files without having to generate them itself, reducing the first-execution slowdown that the user experiences. Another benefit of this approach is that these shared objects would not have to be replicated for each user.

This approach can also be useful to deal with platforms where no C compiler is available at run-time (e.g., embedded systems): the system is shipped with shared objects for all wrapper functions that the application needs, and it will never invoke the C compiler. While this approach will not work for every application, many applications on embedded systems are so well-characterized that all the needed foreign functions are known in advance.

4.5 Call Templates

We have considered, and others have suggested, solving the foreign function call problem as follows: Instead of having calls to concrete functions with their parameter and return types, provide a set of indirect calls to C functions with various combinations of parameter and return types; when you want to call a given C function, execute the appropriate indirect call with the function pointer as parameter.

The problem with this approach for foreign function calls is that it cannot be complete, because one cannot provide enough call templates to cover all possible parameter and return types. E.g., the ffcall library avcall supports 14 different parameter types (plus void for return types). With only three parameters or less, 44325 call templates would be needed. But these templates are not enough: one wants to be able to call functions with more parameters, and in some foreign function interfaces one also wants to deal with more types^5, resulting in an even larger number of call templates. That's why we did not implement this approach, and we are not aware of anyone who has used it for implementing a foreign function interface.

However, call templates could be used for reducing the amount of compilation needed, similar to seeding: Provide call templates for the type signatures of the expected foreign functions. Then most of the other foreign functions will also be callable by these templates, and only a few foreign functions need a newly-compiled wrapper function (or, alternatively, a newly-compiled call template).

If the call templates are implemented as wrapper functions, using them would be slightly slower than using the wrappers our approach generates: The call to the foreign function would be indirect, and the called function has to be passed as a parameter to the wrapper function.

Alternatively, for a limited number of call templates, one could implement the call directly in the interpreter, resulting in a small speedup compared to our wrapper function approach: one level of call overhead would be eliminated.

^5 E.g., the mapping of the C type off_t to the basic types provided by ffcall is platform-dependent, so for a more portable foreign function interface one may want to provide an off_t type.

We have not implemented either variant of call templates, but we expect the calling-performance difference from our direct-calling wrapper functions to be small (probably in the measurement noise), and the startup-time advantage to be similar to seeding or having a warm cache.

The downside of this approach is that it increases the complexity of the implementation: Generating wrapper functions is still needed to support the general case, but in addition one has to map the called functions to the call templates, and to support another way of calling foreign functions.

[Figure 2 (bar chart): time per call in CPU cycles (0–2000) for a static JNI call with 0, 1, 2, 4, and 8 arguments, using libffi, ffcall, and C wrappers.]

Figure 2: Calling cost of a static JNI call using different foreign function call implementations.

5 Empirical Results

5.1 Platform and Benchmarks

Unless otherwise noted, our empirical results were gathered on an Intel Xeon UP 3070 2.67GHz machine with 8 GB of main memory running 64-bit Debian GNU/Linux 4.0 with Linux kernel version 2.6.18, libffi version 4.1.1-21, ffcall version 1.10+2.41-3, and Cacao based on version 0.97 built with gcc version 4.1.1-21. The benchmarks we used are SPECjvm98 [SPEC99], the DaCapo benchmark suite [MMB+04], and some micro-benchmarks.

5.2 Calling speed

To compare the raw call performance of the different approaches, we wrote a microbenchmark that invokes an empty static JNI method with n arguments (n+2 arguments for the C function) a million times. Figure 2 shows the time for a single call in CPU cycles. While libffi with its two-stage interface could be faster than ffcall, it is quite a lot slower on our test platform. Using C wrapper functions beats both of them, with a speedup of 1.18–1.47 over ffcall and 2.68–7.17 over libffi.

However, these are best-case results: they do not take into account that the functions or the surrounding Java program take time for doing something useful, nor do they include the compilation or dynamic-linking overhead. So, how well do the different approaches work on application benchmarks?

5.3 Application benchmarks, caching and seeding

In our application benchmarks we show two results for our C wrapper approach:

no caching  This is the performance of C wrappers without caching, or with a cold cache (i.e., on the first execution of the benchmark after clearing the cache^6). Each wrapper function is generated, compiled, and dynamically linked separately; there is one wrapper function per JNI function.

warm cache  This is the performance of C wrappers with caching, when all the wrapper functions have already been generated, i.e., typically on the second and subsequent runs of the benchmark. There is no generation or compilation of the wrapper functions, only the dynamic-linking cost, once per JNI function. This is also the performance that we can expect on first execution from seeding if all JNI functions used in the application are seeded.

Figure 3 compares the performance of the foreign function interface implementations on the SPECjvm98 benchmarks, and Figure 4 on the DaCapo benchmarks.

^6 If the cache has not been cleared, an application typically benefits quite a lot from wrappers for JNI functions called by other applications.

[Figure 3 (bar chart): performance (%), 0–100, of libffi, ffcall, no caching, and warm cache on jess, javac, mtrt, compress, db, mpegaudio, and jack.]

Figure 3: SPECjvm98 results (128MB Java heap)

[Figure 4 (bar chart): performance (%), 0–100, of libffi, ffcall, no caching, and warm cache on bloat, eclipse, hsqldb, luindex, antlr, chart, fop, jython, and pmd.]

Figure 4: DaCapo benchmark results (512MB Java heap)

                   JNI      dynamic JNI
                   methods  invocations
    201 compress     66          4,863
    202 jess         79      2,708,609
    209 db           67        127,294
    213 javac        81      4,296,649
    222 mpegaudio    75      1,253,804
    227 mtrt         78        251,370
    228 jack         68      2,825,405
    antlr            91      4,311,262
    bloat           119     18,941,773
    chart           149     13,026,159
    eclipse         321     29,320,684
    fop              99      1,871,976
    hsqldb          116        449,259
    jython          136     49,891,342
    luindex         102      7,037,744
    pmd             105     12,908,311

Figure 5: Number of JNI methods and calls for the JVM98 and DaCapo benchmarks

                libffi  ffcall  warm cache  static seeding
    runtime     10ms    9ms     43ms        1ms

Figure 6: Dynamic linking overhead: CPU time for calling 500 static JNI methods

                               gcc-3.3.5  tcc-0.9.22
    1 compile/function           9108ms      1077ms
    1 compile/488 functions      1065ms        35ms
    functions for break-even          9          33

Figure 7: CPU (user+system) time for compiling 488 wrapper functions on a 2.26GHz Pentium 4 (32-bit)

The results exceeded our expectations by far. Not only did we see measurable speedups of warm cache over both libffi and ffcall, in several cases even no caching was faster than libffi. The maximum speedup we saw was a factor of 1.28 for warm cache over libffi on jython. These results are surprising because we did not expect our interpreter to spend that much time in JNI functions, much less in the function call overhead for these functions.

One contributing factor to these results is the different cost of the different calling mechanisms, as discussed in Section 5.2. The other contributing factor is the large number of JNI calls, shown in Fig. 5. E.g., for Jython on average each libffi call costs 220ns more than each C wrapper call, resulting in an overall execution time difference of 11s, exceeding the time to compile the 136 wrapper functions in the no caching case.

There are also a few unexpected results: e.g., in Jython ffcall is just as fast as warm cache, and in db libffi is faster than ffcall. We believe that these differences are due to factors like cache effects (e.g., the difference in allocations between the variants can lead to different conflict misses).

Figure 6 gives an idea of the dynamic linking cost for warm caches: It measures the CPU (user+system) time for calling 500 different static JNI functions with no arguments, calling each function just once. Here warm cache is quite a bit slower than libffi and ffcall, because it needs to dynamically load the 500 additional shared libraries and dynamically link the 500 wrapper functions (in addition to the 500 JNI functions, which are also linked by libffi and ffcall); however, the absolute cost of this dynamic linking overhead is small.

Figure 6 also contains a column static seeding where the wrapper functions have already been linked statically. This eliminates the dynamic linking overhead, resulting in a very low run-time for this micro-benchmark.

5.4 Batching

We cannot provide application benchmark results with batching, because we have not implemented batching in the Cacao interpreter.

However, Fig. 7 shows the potential of batching: gcc -O shows more than a factor of 8 speedup from compiling all the wrapper functions in one batch; the tcc speedup is a factor of 30, and the resulting compile time is not human-perceptible at 35ms.

The third line of Fig. 7 shows how many functions need to be compiled in one invocation to make the actual compilation time at least as large as the startup overhead of the compiler; this gives an idea of a kind of break-even point for the compiler.

For the runs with one compiler invocation per

function, the system time consumed a large part of the CPU (for tcc, the majority), probably because the startup overhead involves a lot of virtual memory work, in particular page table setup and zeroing or copying of pages on writing. The operating system for this experiment was Linux-2.6.13.

6 Related Work

The ffcall libraries and libffi achieve a similar goal as our approach does, but the approach is different: We use the C compiler's knowledge of the calling convention to achieve portability; these libraries contain hand-written code for each architecture and calling convention they support. A description of how they are used and their drawbacks is given in Section 2.

Generating wrapper functions to implement a foreign function interface was already introduced in an earlier paper [Ert07]. However, that paper differs from the present one in all other respects. In particular, it does not present any performance results, so it is unclear how practical this approach is; it does not cover optimisations such as caching, batching, or seeding. It also does not discuss Java or JNI at all. Instead, it deals with Forth and focusses on the design of the foreign function interface as seen by the Forth programmer. Using wrapper functions compiled by the C compiler provides additional advantages in this context, because one can make good use of the C compiler's knowledge about the type signature of the called function.

There is a lot of other work on wrapper and interface generation. The main difference from our work is that wrapper and interface generation is not done at run time of the program.

SWIG [Bea96, Bea03] is a tool for generating foreign function interfaces for scripting languages. It is highly configurable and can be used to generate C wrapper files similar to the ones we use; however, the typical use is quite far from what we have presented in this paper (but closer to our work on the Gforth foreign function interface [Ert07]); in the present paper we have hardly looked at dealing with the mismatch between two languages, because this has been defined away with JNI (for the JVM) and for Gforth has been covered in another paper [Ert07]. SWIG is normally not invoked at run-time, but at build time, whereas we focus on C code generation at run-time. SWIG uses C and C++ as interface definition language (IDL) and is one of the most portable systems. TIDE is another foreign function interface generator, for interfacing Tcl with full C++ [Win97].

Other tools include FIG (foreign interface generator), which takes C header files and a declarative script as input [RS06]. This allows the user to tailor the resulting interface to the application. The scripting language contains composable typemaps which describe the mapping between high-level and low-level types. The ml-nlffigen tool generates ML glue code from C declarations to enable the manipulation of C data structures from the ML application. Chasm is a toolkit which provides seamless language interoperability between Fortran 95 and C++ [RSSM06]. Chasm uses the intermediate representation generated by a compiler front-end for each supported language as its source of interface information instead of an IDL. In this regard it is similar to our work, as we also use the internal representation of the virtual machine for the generation of the wrapper code, but in our case dynamically, in the case of Chasm statically.

There is also other work on optimizing foreign function calls. Schärli and Achermann use partial evaluation of inter-language wrappers to optimize foreign function calls [SA01].

Persistent code caching has been used in a number of systems before, and is described in depth in the context of the Pin dynamic instrumentation system [RCCS07]. Fortunately, our code caching problem is much easier, and persistence is very easy to achieve in the absence of batching by using appropriate file names; batching makes persistent caching only a little harder.

7 Conclusion

In this work we present the implementation of a portable foreign function interface that uses dynamically-generated C wrapper files. We also present and evaluate a number of ways to reduce the startup overhead of this approach, and compare it to existing less-portable solutions like libffi and ffcall. Surprisingly, not only does our approach increase portability, it can also increase the performance: We saw speedups by up to a factor of 1.28 on application benchmarks.

References

[Bea96] David M. Beazley. SWIG: An easy to use tool for integrating scripting languages with C and C++. In Proceedings of the 4th USENIX Tcl/Tk Workshop, pages 129–139, 1996.

[Bea98] David M. Beazley. Interfacing C/C++ and Python with SWIG. In 7th International Python Conference, SWIG Tutorial, 1998.

[Bea03] David M. Beazley. Automated scientific software scripting with SWIG. Future Generation Computer Systems, 19(5):599–609, July 2003.

[EGKP02] M. Anton Ertl, David Gregg, Andreas Krall, and Bernd Paysan. vmgen - a generator of efficient virtual machine interpreters. Software: Practice and Experience, 32(3):265–294, 2002.

[Ert07] M. Anton Ertl. Gforth's libcc C function call interface. In 23rd EuroForth Conference, pages 7–11, 2007.

[ETK06] M. Anton Ertl, Christian Thalinger, and Andreas Krall. Superinstructions and replication in the Cacao JVM interpreter. Journal of .NET Technologies, 4:25–32, 2006. Journal papers from the .NET Technologies 2006 conference.

[FLMJ98] Sigbjorn Finne, Daan Leijen, Erik Meijer, and Simon Peyton Jones. H/Direct: a binary foreign language interface for Haskell. In ICFP '98: Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming, pages 153–162, New York, NY, USA, 1998. ACM Press.

[Lia99] Sheng Liang. Java Native Interface: Programmer's Guide and Reference. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[MMB+04] J. E. B. Moss, K. S. McKinley, S. M. Blackburn, E. D. Berger, A. Diwan, A. Hosking, D. Stefanovic, and C. Weems. The DaCapo Project. http://ali-www.cs.umass.edu/DaCapo/, 2004.

[Ram03] Norman Ramsey. Embedding an interpreted language using higher-order functions and types. In IVME '03: Proceedings of the 2003 Workshop on Interpreters, Virtual Machines and Emulators, pages 6–14, New York, NY, USA, 2003. ACM Press.

[RCCS07] Vijay Janapa Reddi, Dan Connors, Robert Cohn, and Michael D. Smith. Persistent code caching: Exploiting code reuse across executions and applications. In Code Generation and Optimization (CGO '07), pages 74–88, 2007.

[RS06] John Reppy and Chunyan Song. Application-specific foreign-interface generation. In GPCE '06: Proceedings of the 5th International Conference on Generative Programming and Component Engineering, pages 49–58, New York, NY, USA, 2006. ACM Press.

[RSSM06] Craig Edward Rasmussen, Matthew J. Sottile, Sameer Shende, and Allen D. Malony. Bridging the language gap in scientific computing: the Chasm approach. Concurrency and Computation: Practice and Experience, 18(2):151–162, February 2006.

[SA01] Nathanael Schärli and Franz Achermann. Partial evaluation of inter-language wrappers. In Workshop on Composition Languages, WCL '01, September 2001.

[SPEC99] SPEC Standard Performance Evaluation Corporation. SPECjvm98 Documentation, Release 1.03 edition, 1999.

[Win97] H. Winroth. A scripting language interface to C++ libraries. In Technology of Object-Oriented Languages and Systems, TOOLS (23), pages 247–259. IEEE Computer Society, 1997.
