Custom Calling Conventions in a Portable Assembly Language

Submitted to Proceedings of the ACM SIGPLAN '03 Conference on Programming Language Design and Implementation Custom Calling Conventions in a Portable Assembly Language Norman Ramsey Christian Lindig Division of Engineering and Applied Sciences Harvard University [email protected] [email protected] Abstract that space. We might form a cons cell containing a and d using this C-- code: Calling conventions are so di±cult to implement and main- if (hp + 12 > hplim) { // if no space is left tain that people rarely experiment with alternatives. The gc(); // garbage collect primary sources of di±culty appear to be parameter passing } and stack-frame layout. To implement parameter passing, bits32[hp] = TAG_FOR_CONS_CELL; we use the automata developed by Bailey and Davidson, bits32[hp+4] = a; // store the car but we have developed a new speci¯cation language that bits32[hp+8] = d; // store the cdr is implementable directly in the compiler. To implement p = hp + 4; // p is the new object stack-frame layout, we have developed an applicative, com- hp = hp + 12; // set hp to next free word posable abstraction that cleanly decouples the layout of the The code checks to see if there is enough space, and if not, it stack frame from the order in which compiler phases exe- calls the garbage collector. Allocation and initialization are cute. We hope these abstractions will so simplify the im- done inline; the bits32[. ] notation is C-- for a reference plementation problem that compiler writers will routinely to memory. customize calling conventions to improve performance. What sort of calling convention should be used to call gc()? Statically, call sites to gc() are likely to be common 1 Introduction (there is at least one for every extended basic block that allocates), but actual dynamic calls are likely to be rare The C calling convention on your favorite platform may (almost every allocation succeeds without garbage collec- have been carefully crafted to make e®ective use of regis- tion). We therefore want gc to have as many callee-saves ters while also supporting procedures with a variable num- registers as possible|this convention would minimize code ber of arguments. Such a convention can be hard to im- size at each call site and thus minimize total code size. If we plement, and even a mature compiler can harbor lingering were to save some registers unnecessarily at a garbage col- bugs (Bailey and Davidson 1996). If a compiler supports lection, the dynamic cost would be dominated by the cost multiple targets, tail calls, and calls to foreign functions, of the collection. To maximize the number of callee-saves the e®ort is compounded, because one calling convention is registers, we can't simply tune an existing convention to not enough. Such a compiler is required for C--. change the number of registers saved across calls|we must C-- is a portable assembly language that is intended also be sure that no register is reserved to hold a parameter to be generated by a language-dependent front end and or result, because an unused parameter register is typically compiled to e±cient machine code (Peyton Jones, Oliva, treated as caller-saves (Chow 1988). The mechanisms de- and Nordin 1997; Peyton Jones, Ramsey, and Reig 1999). scribed in this paper can be used to de¯ne an appropriate Our hope is to make C-- as easy to generate as C, while convention, which can be called using something like making the resulting code perform almost as well as code call "max-callee-save" gc(); from a custom code generator. To this end, C-- must be in place of the call above. expressive enough so that the author of a front end can The contribution of this paper is a set of abstractions control cost tradeo®s. A previous paper discusses mecha- that a compiler can use to describe calling conventions. nisms an author can use to control tradeo®s in exception ² We have developed new abstractions for specifying pa- dispatch (Ramsey and Peyton Jones 2000); this paper de- rameter passing (Section 3). Our speci¯cations decouple scribes mechanisms used to control cost tradeo®s in calling parameter passing from other parts of the calling conven- conventions. tion, are no more complex than those used in previous One of the cost tradeo®s in a calling convention con- work (Bailey and Davidson 1995), and eliminate the need cerns use of registers. For example, how many registers for a program generator apart from the compiler. are used as callee-saves, caller-saves, and parameter registers can have a signi¯cant e®ect on performance (Davidson ² We have developed a declarative technique for specifying and Whalley 1991; Appel and Shao 1992). As an example the layout of a procedure's activation record or frame where use of registers matters, we present code that might (Section 5). This technique makes it easy to decouple be used to create a cons cell in a functional language. In frame layout from the order in which di®erent phases this example, hp is a \heap pointer" that points to space of the compiler execute, and it makes a frame-layout that can be used for allocation, and hplim marks the end of speci¯cation correspond closely to the sort of picture of 1 frame layout one ¯nds in a typical architecture manual we provide a new speci¯cation language and a new way of or calling-convention manual. implementing frame layout. What if you don't care about customizing calling conventions? Our abstractions are simple enough that they C-- and Quick C-- should be useful even for implementing one calling convention on one platform, and they can describe existing, stan- C-- is intended as a target for a high-level{language com- dard calling conventions cleanly. Although we have not yet piler: the front end. C-- encapsulates a code generator| performed experiments with custom calling conventions, we technology that is well understood but di±cult to imple- have used our abstractions to implement standard C calling ment. The primary goal of C-- is to enable a code gener- conventions on the Alpha, Intel x86, and MIPS platforms. ator to be reused. E®ective reuse requires that the author Implementations range from 90{130 lines of code each, di- of a front end be able to control cost tradeo®s. vided about evenly among parameter passing, frame lay- C-- comprises a language and a run-time interface; the out, and basic infrastructure such as importing modules language is used by the front end to request generated code, and naming registers. We have tested these implementa- and the run-time interface is used by the front end's run- tions using Bailey and Davidson's (1996) techniques, which time system to support such language features as exception our abstractions support. We have also used the same tech- dispatch and garbage collection. Only the language is rele- niques to con¯rm that our compiler interoperates with the vant to this paper. native C compiler, at least for procedure calls that can be The C-- language is designed around a register-transfer expressed in the type systems of both compilers. model of computation. This model is a re¯nement (Ramsey and Davidson 1998) of a model ¯rst used for compilation by Davidson and Fraser (1980); a similar model is used in 2 Background the popular compiler gcc. C-- also provides a form of procedure, which may accept multiple parameters and return Before presenting automata and frame layout, we provide multiple results. Although there is no limit to the number some background on calling conventions and on C--. of parameters or results any one procedure may accept or return, this number is ¯xed at compile time; C-- does not 1 Calling conventions support varargs as it is understood in C. C-- provides control flow between procedures using a construct called cut A calling convention is a contract among four parties: a call- to, which is a bit like C's longjmp but can pass multiple ing procedure (the caller), a called procedure (the callee), parameters (Ramsey and Peyton Jones 2000). a run-time system, and an operating system. All four par- C-- has just enough of a type system to help a compiler ties must agree on how space will be allocated from the put values in machine registers: the type of a value is its stack and how that space will be used: a procedure needs width in bits. Some machines have more than one kind of stack space for saved registers and private data, an operat- register, and a good C-- compiler should put a value in the ing system needs stack space to hold machine state (\signal kind of register that best ¯ts the way the value is used. For context") when an interrupt occurs, and a run-time system example, an intermediate result in a floating-point compu- needs to walk a stack to inspect and modify the state of a tation should be put in a floating-point register. suspended computation. In addition to sharing the stack When a value is passed to a separately compiled proce- with the other two parties, a caller and callee must agree on dure, the C-- compiler can't see how it is used, so the com- how to pass parameters and results. This paper focuses on piler needs help deciding what kind of register should hold caller and callee; we consider run-time system or operating it. We provide such help by using a hint on every actual and system only when they impose constraints on a procedure.

Custom Calling Conventions in a Portable Assembly Language

Strict Protection for Virtual Function Calls in COTS C++ Binaries

Tricore Architecture Manual for a Detailed Discussion of Instruction Set Encoding and Semantics

IAR C/C++ Compiler Reference Guide for V850

Lecture 21: Calling Conventions Menu Calling Convention Calling Convention X86 C Calling Convention

Specification of Compiler Abstraction

EECS 373 Design of Microprocessor-Based Systems

“Option Summary” in Using the GNU Compiler Collection (GCC)

System Calls

System V Application Binary Interface AMD64 Architecture Processor Supplement Draft Version 0.99.4

Introduction to Intel X86-64 Assembly, Architecture, Applications, & Alliteration

Calling Conventions for Different C++ Compilers and Operating Systems

RISC-V Instructions and How to Implement Functions