`C and tcc: A Language and Compiler for Dynamic Code Generation

Massimiliano Poletto
Laboratory for Computer Science, Massachusetts Institute of Technology
and
Wilson C. Hsieh
Department of Computer Science, University of Utah
and
Dawson R. Engler
Laboratory for Computer Science, Massachusetts Institute of Technology
and
M. Frans Kaashoek
Laboratory for Computer Science, Massachusetts Institute of Technology

Dynamic code generation allows programmers to use run-time information in order to achieve performance and expressiveness superior to those of static code. The `C ("Tick C") language is a superset of ANSI C that supports efficient and high-level use of dynamic code generation. `C provides dynamic code generation at the level of C expressions and statements, and supports the composition of dynamic code at run time. These features enable programmers to add dynamic code generation to existing C code incrementally, and to write important applications (such as "just-in-time" compilers) easily. The paper presents many examples of how `C can be used to solve practical problems.

tcc is an efficient, portable, and freely available implementation of `C. tcc allows programmers to trade dynamic compilation speed for dynamic code quality: in some applications, it is most important to generate code quickly, while in others code quality matters more than compilation speed. The overhead of dynamic compilation is on the order of 100 to 600 cycles per generated instruction, depending on the level of dynamic optimization. Measurements show that the use of dynamic code generation can improve performance by almost an order of magnitude; two- to four-fold speedups are common. In most cases, the overhead of dynamic compilation is recovered in under 100 uses of the dynamic code; sometimes it can be recovered within one use.

Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications--specialized application languages; D.3.3 [Programming Languages]: Language Constructs and Features; D.3.4 [Programming Languages]: Processors--compilers; code generation; run-time environments

General Terms: Algorithms, Languages, Performance

Additional Key Words and Phrases: Dynamic code generation, dynamic code optimization, ANSI C

Email:[email protected] du, [email protected] h.e du, eng [email protected] , kaasho e [email protected]. L ab o-

rato ry for Co mp ute r Sc ien ce, Massachuset ts Inst it ute o f Techno logy, Cambrid ge, MA 0 213 9. The

sec ond aut hor can b e reached at: University of Ut ah, Comput er Sc ie nce, 5 0 S Centra l Ca mpus

Drive , Ro om 3190 , Salt Lake City, UT 8 4112 -9 205.

This resea rchwas supp o rte d in part bytheAdvanc ed Resea rch Pro ject s Age ncy und er co ntrac ts

N000 14-94-1-098 5 a nd N6 600 1-96-C-85 22, a nd by a NSF Nation al Youn g I nve stigat or Award.

2 · Poletto, Hsieh, Engler, Kaashoek

1. INTRODUCTION

Dynamic code generation (the generation of code at run time) enables the use of run-time information to improve code quality. Information about run-time invariants provides new opportunities for classical optimizations such as strength reduction, dead code elimination, and inlining. In addition, dynamic code generation is the key technology behind just-in-time compilers, compiling interpreters, and other components of modern mobile code and other adaptive systems.

`C is a superset of ANSI C that supports the high-level and efficient use of dynamic code generation. It extends ANSI C with a small number of constructs that allow the programmer to express dynamic code at the level of C expressions and statements, and to compose arbitrary dynamic code at run time. These features enable programmers to write complex imperative code manipulation programs in a style similar to Lisp [Steele Jr. 1990], and make it relatively easy to write powerful and portable dynamic code. Furthermore, since `C is a superset of ANSI C, it is not difficult to improve the performance of code incrementally by adding dynamic code generation to existing C programs.

`C's extensions to C (two type constructors, three unary operators, and a few special forms) allow dynamic code to be type-checked statically. Much of the overhead of dynamic code generation can therefore be incurred statically, which improves the efficiency of dynamic compilation. While these constructs were designed for ANSI C, it should be straightforward to add analogous constructs to other statically typed languages.

tcc is an efficient and freely available implementation of `C, consisting of a front end, back ends that compile to C and to MIPS and SPARC assembly, and two runtime systems. tcc allows the user to trade dynamic code quality for dynamic code generation speed. If compilation speed must be maximized, dynamic code generation and register allocation can be performed in one pass; if code quality is most important, the system can construct and optimize an intermediate representation prior to code generation. The overhead of dynamic code generation is approximately 100 cycles per generated instruction when tcc only performs simple dynamic code optimization, and approximately 600 cycles per generated instruction when all of tcc's dynamic optimizations are turned on.
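As a rough back-of-the-envelope sketch of when these overheads pay off (all numbers other than the 100 to 600 cycles per generated instruction quoted above are hypothetical), the break-even point can be computed as follows:

```c
#include <assert.h>

/* Hypothetical cost model: number of uses needed before the one-time
   generation overhead is amortized by the per-use cycle savings. */
static long break_even_uses(long cycles_per_emitted_instr,
                            long emitted_instrs,
                            long cycles_saved_per_use) {
    long overhead = cycles_per_emitted_instr * emitted_instrs;
    /* round up: a partial use does not recoup the remainder */
    return (overhead + cycles_saved_per_use - 1) / cycles_saved_per_use;
}
```

For example, a 50-instruction dynamic function generated at 100 cycles/instruction, saving 200 cycles per call, breaks even after 25 calls; at 600 cycles/instruction and 300 cycles saved per call it breaks even after 100 calls, consistent with the measurements summarized above.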

This paper makes the following contributions:

- It describes the `C language, and motivates the design of the language.
- It describes tcc, with special emphasis on its two runtime systems, one tuned for code quality and the other for fast dynamic code generation.
- It presents an extensive set of `C examples, which illustrate the utility of dynamic code generation and the ease of use of `C in a variety of contexts.
- It analyzes the performance of tcc and tcc-generated dynamic code on several benchmarks. Measurements show that use of dynamic compilation can improve performance by almost an order of magnitude in some cases, and generally results in two- to four-fold speedups. The overhead of dynamic compilation is usually recovered in under 100 uses of the dynamic code; sometimes it can be recovered within one use.

The rest of this paper is organized as follows. Section 2 describes `C, and Section 3 describes tcc. Section 4 illustrates several sample applications of `C. Section 5 presents performance measurements. Finally, we discuss related work in Section 6, and summarize our conclusions in Section 7. Appendix A describes the `C extensions to the ANSI C grammar.

2. THE `C LANGUAGE

The `C language was designed to support easy-to-use dynamic code generation in a systems and applications programming environment. This requirement motivated some of the key features of the language:

- `C is a small extension of ANSI C: it adds very few constructs (two type constructors, three unary operators, and a few special forms) and leaves the rest of the language intact. As a result, it is possible to convert existing C code to `C incrementally.
- Dynamic code in `C is statically typed. This is consistent with C, and improves the performance of dynamic compilation by eliminating the need for dynamic type-checking. The same constructs used to extend C with dynamic code generation should be applicable to other statically typed languages.
- The dynamic compilation process is imperative: the `C programmer directs the creation and composition of dynamic code. This approach distinguishes `C from several recent declarative dynamic compilation systems [Auslander et al. 1996; Grant et al. 1997; Consel and Noel 1996]. We believe that the imperative approach is better suited to a systems environment, where the programmer wants tight control over dynamically generated code.

In `C, a programmer creates code specifications, which are static descriptions of dynamic code. Code specifications can capture the values of run-time constants, and they may be composed at run time to build larger specifications. They are compiled at run time to produce executable code. The process works as follows:

(1) At static compile time, a `C compiler processes the written program. For each code specification, the compiler generates code to capture its run-time environment, as well as code to generate dynamic code.

(2) At run time, code specifications are evaluated. The specification captures its run-time environment at this time, which we call environment binding time.

(3) At run time, code specifications are passed to the `C compile special form. compile invokes the specification's code generator, and returns a function pointer. We call this point in time dynamic code generation time.

For example, the following code fragment implements "Hello World" in `C:

  void make_hello(void) {
    void (*f)() = compile(`{ printf("hello, world\n"); }, void);
    (*f)();
  }

The code within the backquote and braces is a code specification for a call to printf that should be generated at run time. The code specification is evaluated at environment binding time, and the resulting object is then passed to compile, which generates executable code for the printf and returns a function pointer at dynamic code generation time. The function pointer can then be invoked directly.

The rest of this section describes in detail the dynamic code generation extensions introduced in `C. Section 2.1 describes the ` operator. Section 2.2 describes the type constructors in `C. Section 2.3 describes the unquoting operators, @ and $. Section 2.4 describes the `C special forms.

2.1 The ` Operator

The ` (backquote, or "tick") operator is used to create dynamic code specifications in `C. ` can be applied to an expression or compound statement, and indicates that code corresponding to that expression or statement should be generated at run time. `C disallows the dynamic generation of code-generating code, so ` does not nest. In other words, a code specification cannot contain a nested code specification. Some simple usages of backquote are as follows:

  /* Specification of dynamic code for the expression "4": results
     in dynamic generation of code that produces 4 as a value */
  `4

  /* Specification of dynamic code for a call to printf;
     j must be declared in an enclosing scope */
  `printf("%d", j)

  /* Specification of dynamic code for a compound statement */
  `{ int i; for (i = 0; i < 10; i++) printf("%d\n", i); }

Dynamic code is lexically scoped: variables in static code can be referenced in dynamic code. Lexical scoping and static typing allow type-checking and some instruction selection to occur at static compile time, decreasing dynamic code generation overhead.

The value of a variable after its scope has been exited is undefined, just as in ANSI C. In contrast to ANSI C, however, not all uses of variables outside their scope can be detected statically. For example, one may use a local variable declared in static code from within a backquote expression, and then return the value of the code specification. When the code specification is compiled, the resulting code references a memory location that no longer contains the local variable, because the original function's activation record has gone away. The compiler could perform a data-flow analysis to conservatively warn the user of a potential error; however, in our experience this situation arises very rarely, and is easy to avoid.

The use of several C constructs is restricted within backquote expressions. In particular, a break, continue, case, or goto statement cannot be used to transfer control outside the enclosing backquote expression. For instance, the destination label of a goto statement must be contained in the same backquote expression that contains the goto. `C provides other means for transferring control between backquote expressions; we discuss these methods in Section 2.4. The limitation on goto and other control-transfer statements enables a `C compiler to statically determine whether a control-flow change is legal. The use of return is not restricted, because dynamic code is always implicitly inside a function.

A backquote expression can be dynamically compiled using the compile special form, which is described in Section 2.4. compile returns a function pointer, which can then be invoked like any other function pointer.

2.2 Type Constructors

`C introduces two new type constructors, cspec and vspec. cspecs are static types for dynamic code; their presence allows dynamic code to be type-checked statically. vspecs are static types for dynamic lvalues (expressions that may be used on the left-hand side of an assignment); their presence allows dynamic code to allocate lvalues as needed.

A cspec or vspec has an associated evaluation type, which is the type of the dynamic value of the specification. The evaluation type is analogous to the type to which a pointer points.

2.2.1 cspec Types. cspec (short for code specification) is the type of a dynamic code specification; the evaluation type of the cspec is the type of the dynamic value of the code. For example, the type of the expression `4 is int cspec. The type void cspec is the type of a generic cspec (analogous to the use of void * as a generic pointer).

Applying ` to a statement or compound statement yields an expression of type void cspec. In particular, if the dynamic statement or compound statement contains a return statement, the type of the return value does not affect the type of the backquote expression. Since all type-checking is performed statically, it is possible to compose backquote expressions to create a function at run time with multiple (possibly incompatible) return types. This deficiency in the type system is a design choice: such errors are rare in practice, and checking for them would involve more overhead at dynamic compile time or additional linguistic extensions.

The code generated by ` may include implicit casts used to reconcile the result type of ` with its use; the standard conversion rules of ANSI C apply.

Some simple uses of cspec follow:

  int cspec expr1 = `4;           /* Code specification for expression "4" */
  float x;
  float cspec expr2 = `(x + 4.);  /* Capture free variable x: its value will be
                                     bound when the dynamic code is executed */

  /* All dynamic compound statements have type void cspec, regardless of
     whether the resulting code will return a value */
  void cspec stmt = `{ printf("hello, world\n"); return 0; };

2.2.2 vspec Types. vspec (variable specification) is the type of a dynamically generated lvalue, a variable whose storage class (whether it should reside in a register or on the stack, and in what location exactly) is determined dynamically. The evaluation type of the vspec is the type of the lvalue. void vspec is used as a generic vspec type. Objects of type vspec may be created by invoking the special forms param and local. param is used to create a parameter for the function currently under construction; local is used to reserve space in its activation record or allocate a register, if possible. See Section 2.4 for more details on these special forms.

In general, an object of type vspec is automatically treated as a variable of the vspec's evaluation type when it appears inside a cspec. A vspec inside a backquote expression can thus be used like a traditional C variable, both as an lvalue and an rvalue. For example, the following function creates a cspec that takes a single integer argument, adds one to it, and returns the result:

  void cspec plus1(void) {
    /* param takes the type and position of the argument to be generated */
    int vspec i = param(int, 0);
    return `{ return i + 1; };
  }

vspecs allow us to construct functions that take a run-time-determined number of arguments; this functionality is necessary in applications such as the compiling interpreter described in Section 4.4.2.
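To make the effect concrete, the dynamic code that compile would produce from plus1's cspec behaves like the ordinary C function below (a hand-written sketch of the generated code's behavior, not actual tcc output):

```c
#include <assert.h>

/* Hand-written equivalent of the code built by plus1: one int parameter
   (the one created by param(int, 0)), returning that argument plus one. */
static int plus1_compiled(int i) {
    return i + 1;
}
```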

2.2.3 Discussion. Within a quoted expression, vspecs and cspecs can be passed to functions that expect their evaluation types. The following code is legal:

  int f(int j);
  void main() {
    int vspec v;
    /* ... initialize v to a dynamic lvalue using local or param ... */
    void cspec c1 = `{ f(v); };
  }

Within the quoted expression, f expects an integer. Since this function call is evaluated during the execution of the dynamic code, the integer lvalue to which v refers will already have been created. As a result, v can be passed to f like any other integer.

2.3 Unquoting Operators

The ` operator allows a programmer to create code at run time. In this section we describe two operators, @ and $, that are used within backquote expressions. @ is used to compose cspecs dynamically. $ is used to instantiate values as run-time constants in dynamic code. These two operators "unquote" their operands: their operands are evaluated at environment binding time.

2.3.1 The @ Operator. The @ operator allows code specifications to be composed into larger specifications. @ can only be applied inside a backquote expression, and its operand must be a cspec or vspec. @ "dereferences" its operand at environment binding time: it returns an object whose type is the evaluation type of @'s operand. The returned object is incorporated into the cspec in which the @ occurs. For example, in the following fragment, c is the additive composition of two cspecs:

  /* Compose c1 and c2. Evaluation of c yields "9". */
  int cspec c1 = `4, cspec c2 = `5;
  int cspec c = `(@c1 + @c2);

Statements can be composed through concatenation:

  /* Concatenate two null statements. */
  void cspec s1 = `{}, cspec s2 = `{};
  void cspec s = `{ @s1; @s2; };
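The semantics of @ can be approximated in plain C by treating each cspec as a thunk and splicing a call where the @ occurs. This is a sketch of the composition semantics only, not of tcc's implementation:

```c
#include <assert.h>

typedef int (*int_thunk)(void);

static int four(void) { return 4; }   /* plays the role of c1 = `4 */
static int five(void) { return 5; }   /* plays the role of c2 = `5 */

/* Analogue of c = `(@c1 + @c2): evaluating the composed code
   evaluates both spliced-in pieces and adds their results. */
static int composed(void) {
    int_thunk c1 = four, c2 = five;
    return c1() + c2();
}
```

As with the `C fragment above, evaluating the composition yields 9.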

Applying @ inside a backquote expression to a function that returns a cspec or a vspec causes the function to be called at environment binding time. Its result is incorporated into the backquote expression.

In order to improve the readability of code composition, `C provides some implicit coercions of vspecs and cspecs, so that the @ operator may be omitted in several situations. An expression of type vspec or cspec that appears inside a quoted expression is coerced (with an implicit @) to an object of its corresponding evaluation type under the following conditions:

(1) the expression is not inside an unquoted expression.

(2) the expression is not being used as a statement.

The first restriction also includes implicitly unquoted expressions: that is, expressions that occur within an implicitly coerced expression are not implicitly coerced. For example, the arguments in a call to a function returning type cspec or vspec are not coerced, because the function call itself already is.

These coercions do not limit the expressiveness of `C, because `C supports only one "level" of dynamic code: it does not support dynamic code that generates more dynamic code. Therefore, the ability to manipulate vspecs and cspecs in dynamic code is not useful.

These implicit coercions simplify the syntax of cspec composition. Consider the following example:

  int cspec a = `4; int cspec b = `5;
  int cspec sum = `(a + b);
  int cspec sum_of_sum = `(sum + sum);

This code is equivalent to the following code, due to the implicit coercion of a, b, and sum.

  int cspec a = `4; int cspec b = `5;
  int cspec sum = `(@a + @b);
  int cspec sum_of_sum = `(@sum + @sum);

Compiling sum_of_sum results in dynamic code equivalent to 4+5+4+5.

Statements and compound statements are considered to have type void; an object of type void cspec inside a backquote expression cannot be used inside an expression, but can be composed as an expression statement:

  void cspec hello = `{ printf("hello "); };
  void cspec world = `{ printf("world\n"); };
  void cspec greeting = `{ @hello; @world; };

  void cspec mkscale(int **m, int n, int s) {
    return `{
      int i, j;
      for (i = 0; i < $n; i++) {      /* Loop can be dynamically unrolled */
        int *v = ($m)[i];
        for (j = 0; j < $n; j++)
          v[j] = v[j] * $s;           /* Multiplication can be strength-reduced */
      }
    };
  }

  Fig. 1. `C code to specialize multiplication of a matrix by an integer.

2.3.2 The $ Operator. The $ operator allows run-time values to be incorporated as run-time constants in dynamic code. $ evaluates its operand at environment binding time; the resulting value is used as a run-time constant in the containing cspec. $ may only appear inside a backquote expression, and it may not be unquoted. It may be applied to any object not of type cspec or vspec. The use of $ is illustrated in the code fragment below.

  int x = 1;
  void cspec c = `{ printf("$x = %d, x = %d\n", $x, x); };
  x = 14;
  compile(c, void)();  /* Compile and run: will print "$x = 1, x = 14". */
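The printed output can be mimicked in plain C by separating the two binding times explicitly: $x copies the value of x when the specification is evaluated, while a plain x is read when the generated code finally runs. The struct and helper functions below are hypothetical stand-ins for tcc's environment-capture machinery:

```c
#include <assert.h>

static int x = 1;

/* Snapshot made at environment binding time, as $x requires. */
struct env { int dollar_x; };

static struct env bind_env(void) {   /* like evaluating the cspec */
    struct env e = { x };
    return e;
}

static void set_x(int v) { x = v; }

static int read_dollar_x(struct env e) { return e.dollar_x; }  /* like $x */
static int read_x(void) { return x; }                          /* like x  */
```

Binding the environment while x is 1, then setting x to 14 before "running" the code, reproduces the "$x = 1, x = 14" behavior shown above.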

Use of $ enables specialization of code based on run-time constants. An example of this is the program in Figure 1, which specializes multiplication of a matrix by an integer. The pointer to the matrix, the size of the matrix, and the scale factor are all run-time constants, which enables optimizations such as dynamic loop unrolling and strength reduction of multiplication.
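For instance, with n = 2 and s = 10 bound at environment binding time, the dynamic code produced from Figure 1 behaves like the hand-specialized routine below (a sketch of the residual code; tcc may additionally unroll the loops and strength-reduce the multiply):

```c
#include <assert.h>

/* Hand-written equivalent of mkscale's output for $n == 2, $s == 10:
   the loop bounds and the scale factor are hardwired run-time constants. */
static void scale_2x2_by_10(int **m) {
    int i, j;
    for (i = 0; i < 2; i++) {
        int *v = m[i];
        for (j = 0; j < 2; j++)
            v[j] = v[j] * 10;
    }
}
```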

2.3.3 Discussion. Within an unquoted expression, vspecs and cspecs cannot be passed to functions that expect their evaluation types. The following code is illegal:

  void cspec f(int j);
  int g(int j);
  void main() {
    int vspec v;
    void cspec c1 = `{ @f(v); };  /* error: v is the wrong type */
    void cspec c2 = `{ $g(v); };  /* error: v is the wrong type */
  }

The storage class of a variable declared within the scope of dynamic code is determined dynamically. Therefore, a variable of type T that is local to a backquote expression must be treated as type T vspec when used in an unquoted expression. The example below illustrates this behavior.

  void cspec f(int vspec j);  /* f must take an int vspec... */
  void main() {
    void cspec c = `{ int v; @f(v); };  /* ...because v is a dynamic local */
  }

  Category                     Name       Synopsis
  Management of dynamic code   compile    T *compile(void cspec code, T)
                               free_code  void free_code(T)
  Dynamic variables            local      T vspec local(T)
  Dynamic function arguments   param      T vspec param(T, int param-num)
                               push_init  void cspec push_init(void)
                               push       void push(void cspec args, T cspec next-arg)
  Dynamic control flow         label      void cspec label()
                               jump       void cspec jump(void cspec target)
                               self       T self(T, other-args...)

  Table I. The `C special forms. T denotes a type.

2.4 Special Forms

`C extends ANSI C with several special forms. Most of these special forms take types as arguments, and their result types sometimes depend on their input types. The special forms can be broken into four categories, as shown in Table I.

2.4.1 Management of Dynamic Code. The compile and free_code special forms are used to create executable code from code specifications and to deallocate the storage associated with a dynamic function, respectively. compile generates executable code from code, and returns a pointer to a function returning type T. It also automatically reclaims the storage for all existing vspecs and cspecs: as described in Section 3, vspecs and cspecs are objects that track necessary pieces of program state at environment binding and dynamic code generation time, so they are no longer needed and can be reclaimed after the dynamic code has been generated. Some other dynamic compilation systems [Leone and Lee 1996; Consel and Noel 1996] memoize dynamic code fragments: in the case of `C, the low overhead required to create code specifications, their generally small size, and their susceptibility to changes in the run-time environment make memoization of cspecs and vspecs unattractive.

free_code takes as argument a function pointer to a dynamic function previously created by compile, and reclaims the memory for that function. In this way, it is possible to reclaim the memory consumed by dynamic code when the code is no longer necessary. The programmer can use compile and free_code to explicitly manage the memory used for dynamic code, similarly to how one normally uses malloc and free.
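The pairing behaves like an allocator for code memory. The counter-based sketch below is hypothetical (it models only the ownership discipline, not code generation itself):

```c
#include <assert.h>
#include <stdlib.h>

static int live_functions = 0;  /* dynamic functions not yet freed */

/* Stand-in for compile(): allocates space for a generated function. */
static void *fake_compile(size_t code_bytes) {
    live_functions++;
    return malloc(code_bytes);
}

/* Stand-in for free_code(): releases that space. */
static void fake_free_code(void *code) {
    live_functions--;
    free(code);
}
```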

  /* Construct cspec to sum n integer arguments. */
  void cspec construct_sum(int n) {
    int i, cspec c = `0;
    for (i = 0; i < n; i++) {
      int vspec v = param(int, i);  /* Create a parameter */
      c = `(c + v);                 /* Add param 'v' to current sum */
    }
    return `{ return c; };
  }

  int cspec construct_call(int nargs, int *arg_vec) {
    int (*sum)() = compile(construct_sum(5), int);
    void cspec args = push_init();  /* Initialize argument list */
    int i;
    for (i = 0; i < nargs; i++)     /* For each arg in arg_vec... */
      push(args, `$arg_vec[i]);     /* push it onto the args stack */
    return `sum(args);
  }

  Fig. 2. `C allows programmers to construct functions with dynamic numbers of arguments. construct_sum creates a function that takes n arguments and adds them. construct_call creates a cspec that invokes a dynamic function: it initializes an argument stack by invoking push_init, and dynamically adds arguments to this list by calling push. `C allows an argument list (an object of type void cspec) to be used as a single argument in a call: `sum(args) calls sum using the argument list denoted by args.
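Compiling construct_sum(5) yields code equivalent to a fixed five-argument adder; the sketch below shows that residual function together with the argument-vector view of the args list built by push_init and push (names are hypothetical):

```c
#include <assert.h>

/* Residual code equivalent to compile(construct_sum(5), int). */
static int sum5(int a0, int a1, int a2, int a3, int a4) {
    return a0 + a1 + a2 + a3 + a4;
}

/* The args list acts like an argument vector spread into the call. */
static int call_sum5(int *arg_vec) {
    return sum5(arg_vec[0], arg_vec[1], arg_vec[2], arg_vec[3], arg_vec[4]);
}
```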

2.4.2 Dynamic Variables. The local special form is a mechanism for creating local variables in dynamic code. The objects it creates are analogous to local variables declared in the body of a backquote expression, but they can be used across backquote expressions, rather than being restricted to the scope of one expression. In addition, local enables dynamic code to have an arbitrary number of local variables. local returns an object of type T vspec that denotes a dynamic local variable of type T in the current dynamic function. In `C, the type T may include one of two C storage class specifiers, auto and register: the former indicates that the variable should be allocated on the stack, while the latter is a hint to the compiler that the variable should be placed in a register, if possible.

2.4.3 Dynamic Function Arguments. The param special form is used to create parameters of dynamic functions. param returns an object of type T vspec that denotes a formal parameter of the current dynamic function. param-num is the parameter's position in the function's parameter list, whereas T denotes its evaluation type. As illustrated in Figure 2, param can be used to create a function that has the number of its parameters determined at run time. Figure 3 shows how param can be used to curry functions.

Whereas param serves to create the formal parameters of a dynamic function, push_init and push are used together to dynamically build argument lists for function calls. push_init returns a cspec that corresponds to a new, initially empty dynamic argument list. push adds the code specification for the next argument, next-arg, to the dynamically generated list of arguments, args. T, the evaluation type of next-arg, may not be void. These two special forms allow the programmer to create function calls which pass a dynamically determined number of arguments to the invoked function. Figure 2 illustrates their use.

  typedef int (*write_ptr)(char *, int);

  /* Create a function that calls "write" with "tcb" hardwired as its first argument. */
  write_ptr mkwrite(struct tcb *tcb) {
    char * vspec msg = param(char *, 0);
    int vspec nbytes = param(int, 1);
    return compile(`{ return write($tcb, msg, nbytes); }, int);
  }

  Fig. 3. `C can be used to curry functions by creating function parameters dynamically. In this example, this functionality allows a network connection control block to be hidden from clients, but still enables operations on the connection (write, in this case) to be parameterized with per-connection data.
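Without dynamic code generation, the same currying must be done by carrying the hardwired argument in a closure record and re-reading it on every call; `C instead burns $tcb directly into the generated instructions. A plain-C sketch of the closure alternative (struct tcb's layout and fake_write are hypothetical stand-ins):

```c
#include <assert.h>

struct tcb { int fd; };  /* hypothetical connection control block */

/* Hypothetical stand-in for write(): returns fd + nbytes so a test
   can observe which control block was used. */
static int fake_write(int fd, const char *msg, int nbytes) {
    (void)msg;
    return fd + nbytes;
}

/* Closure-based currying: every call pays to re-read the captured tcb. */
static int curried_write(struct tcb *t, const char *msg, int nbytes) {
    return fake_write(t->fd, msg, nbytes);
}
```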

2.4.4 Dynamic Control Flow. For error-checking purposes, `C forbids goto statements from transferring control outside the enclosing backquote expression. Two special forms, label and jump, are used for inter-cspec control flow: jump returns the cspec of a jump to its argument, target. Target may be any object of type void cspec. label simply returns a void cspec that may be used as the destination of a jump. Syntactic sugar allows jump(target) to be written as jump target. Section 4.1.5 presents example `C code that uses label and jump to implement specialized finite-state machines.
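A hardwired automaton of the kind meant here can be sketched in plain C with one code label per state; label and jump let `C programmers wire up exactly this control structure across separately built cspecs. The recognizer below is a hypothetical example, not taken from Section 4.1.5:

```c
#include <assert.h>

/* One-state machine over a '0'/'1' string: count the '1' characters,
   stopping at the terminator. States become labels; jumps become gotos. */
static int count_ones(const char *s) {
    int n = 0;
state: switch (*s++) {
        case '1': n++; goto state;
        case '0': goto state;
        default:  return n;  /* '\0' or any other byte halts the machine */
    }
}
```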

Lastly, self allows recursive calls in dynamic code without incurring the overhead of dereferencing a function pointer. T denotes the return type of the function that is being dynamically generated. Invoking self results in a call to the function that contains the invocation, with other-args passed as the arguments. self is just like any other function call, except that the return type of the dynamic function is unknown at environment binding time, so it must be provided as the first argument.
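The saving can be seen in plain C: a recursive call through a function pointer costs an indirect jump on every level, while self lets the generated code call itself directly, as a statically bound call would. Factorial is used here purely as a hypothetical illustration:

```c
#include <assert.h>

/* Direct recursion, as self produces in the generated code:
   "fact(n - 1)" plays the role of "self(int, n - 1)". */
static int fact(int n) {
    return n <= 1 ? 1 : n * fact(n - 1);
}

/* The alternative self avoids: recursion through a function pointer. */
static int (*fact_ptr)(int);
static int fact_indirect(int n) {
    return n <= 1 ? 1 : n * fact_ptr(n - 1);
}
```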

3. THE tcc COMPILER

The implementation of tcc was driven by two goals: high-quality dynamic code, and low dynamic compilation overhead. `C allows the user to compose arbitrary pieces of code dynamically, which reduces the effectiveness of static analysis. As a result, many optimizations on dynamic code in `C can only be performed at run time: improvements in code quality require more dynamic code generation time. The rest of this section discusses how tcc handles this tradeoff. Section 3.1 describes the structure of tcc, Section 3.2 gives an overview of the dynamic compilation process, and Section 3.3 discusses in detail some of the machinery that tcc uses to generate code at run time.

3.1 Architecture

The tcc compiler is based on lcc [Fraser and Hanson 1995; 1990], a portable compiler for ANSI C. lcc performs common subexpression elimination within extended basic blocks, and uses lburg [Fraser et al. 1992] to find the lowest-cost implementation of a given IR-level construct. Otherwise, it performs few optimizations.

Figure 4 illustrates the interaction of static and dynamic compilation in tcc. All parsing and semantic checking of dynamic expressions occurs at static compile time.

[Figure 4: a dataflow diagram. At static compile time, the tcc front end splits `C source into static code and code specifications; the static and dynamic back ends then produce executable static code, code to bind environments, and code to create dynamic code. At run time, evaluating code specifications builds closures for them (environment binding), and invoking compile() produces executable dynamic code, which computes the answer.]

Fig. 4. Overview of the tcc compilation process.

Semantic checks are performed at the level of dynamically generated expressions. For each cspec, tcc performs internal type checking. It also tracks goto statements and labels to ensure that a goto does not transfer control outside the body of the containing cspec.

Unlike traditional static compilers, tcc uses two types of back ends to generate code. One is the static back end, which compiles the non-dynamic parts of `C programs, and emits either native assembly code or C code suitable for compilation by an optimizing compiler. The other, referred to as the dynamic back end, emits C code to generate dynamic code. Once produced by the dynamic back end, this C code is in turn compiled by the static back end.

tcc provides two dynamic code generation runtime systems so as to trade off code generation speed for dynamic code quality. The first of these runtime systems is vcode [Engler 1996]. vcode provides an interface resembling that of an idealized load/store RISC architecture; each instruction in this interface is a C macro which emits the corresponding instruction or series of instructions for the target architecture. vcode's key feature is that it generates code with low run-time overhead: as few as ten instructions per generated instruction in the best case. While vcode generates code quickly, it only has access to local information about backquote expressions: the quality of its code could often be improved. The second runtime system, icode, makes a different tradeoff, and produces better code at the expense of additional dynamic compilation overhead. Rather than emit code in one pass, it builds and optimizes an intermediate representation prior to code generation.

lcc is not an optimizing compiler. The assembly code emitted by its traditional static back ends is usually significantly slower (even three or more times slower) than that emitted by optimizing compilers such as gcc or vendor C compilers. To improve the quality of static code emitted by tcc, we have implemented a static back end that generates ANSI C from `C source; this code can then be compiled by any optimizing compiler. lcc's traditional back ends can thus be used when static compilation must be fast (i.e., during development), and the C back end can be used when the performance of the code is critical.

3.2 The Dynamic Compilation Process

As described in Section 2, the creation of dynamic code can be divided into three phases: static compilation, environment binding, and dynamic code generation. This section describes how tcc implements these three phases.

3.2.1 Static Compile Time. During static compilation, tcc compiles the static parts of a `C program just like a traditional C compiler. It compiles each dynamic part (each backquote expression) to a code-generating function (CGF), which is invoked at run time to generate code for dynamic expressions.

In order to minimize the overhead of dynamic compilation, tcc performs as much work as possible statically. When using vcode, both instruction selection (based on operand types) and cspec-local register allocation are done statically. Additionally, the intermediate representation of each backquote expression is processed by the common subexpression elimination and other local optimizations performed by the lcc front end. tcc also uses copt [Fraser 1980] to perform static peephole optimizations on the code-generating macros used by CGFs.

Not all register allocation and instruction selection can occur statically when using vcode. For instance, it is not possible to determine statically what vspecs or cspecs will be incorporated into other cspecs when the program is executed. Hence, allocation of dynamic lvalues (vspecs) and of results of composed cspecs must be performed dynamically. The same is true of variables or temporaries that live across references to other cspecs. Each read or write to one of these dynamically determined lvalues is enclosed in a conditional in the CGF: different code is emitted at run time, depending on whether the object is dynamically allocated to a register or to memory. Since the process of instruction selection is encoded in the body of the code-generating function, it is inexpensive.

When using icode, tcc does not precompute as much information about dynamic code generation. Rather than emitting code directly, the icode macros first build up a simple intermediate representation; the icode runtime system then analyzes this representation to allocate registers and perform other optimizations before emitting code.

State for dynamic code generation is maintained in CGFs and in dynamically allocated closures. Closures are data structures that store five kinds of necessary information about the run-time environment of a backquote expression: (1) a function pointer to the corresponding statically generated CGF; (2) information about inter-cspec control flow (i.e., whether the backquote expression is the destination of a jump); (3) the values of run-time constants bound via the $ operator; (4) the addresses of free variables; (5) pointers to the run-time representations of the cspecs and vspecs used inside the backquote expression. Closures are necessary to reason about composition and out-of-order specification of dynamic code.

cspec_t i = (closure0 = (closure0_t *)alloc_closure(4),
             closure0->cgf = cgf0,            /* code gen func */
             (cspec_t)closure0);

cspec_t c = (closure1 = (closure1_t *)alloc_closure(16),
             closure1->cgf = cgf1,            /* code gen func */
             closure1->cs_i = i,              /* nested cspec */
             closure1->rc_j = j,              /* runtime const */
             closure1->fv_k = &k,             /* free variable */
             (cspec_t)closure1);

Fig. 5. Sample closure assignments.

For each backquote expression, tcc statically generates both its code-generating function and the code to allocate and initialize closures. A new closure is initialized each time a backquote expression is evaluated. Cspecs are represented by pointers to closures.

For example, consider the following code:

int j, k;
int cspec i = `5;
void cspec c = `{ return i + $j * k; };

tcc implements the assignments to these cspecs by assignments to pointers to closures, as illustrated in Figure 5. i's closure contains only a pointer to its code-generating function. c has more dependencies on its environment, so its closure also stores other information.

Simplified code-generating functions for these cspecs appear in Figure 6. cgf0 allocates a temporary storage location, generates code to store the value 5 into it, and returns the location. cgf1 must do a little more work: the code that it generates loads the value stored at the address of free variable k into a register, multiplies it by the value of the run-time constant j, adds this to the dynamic value of i, and returns the result. Since i is a cspec, the code for "the dynamic value of i" is generated by calling i's code-generating function.

3.2.2 Run Time. At run time, the code that initializes closures and the code-generating functions run to create dynamic code. As illustrated in Figure 4, this process consists of two parts: environment binding and dynamic code generation.

3.2.2.1 Environment Binding. During environment binding, code such as that in Figure 5 builds a closure that captures the environment of the corresponding backquote expression. Closures are heap-allocated, but their allocation cost is greatly reduced (down to a pointer increment, in the normal case) by using arenas [Forsythe 1977].


unsigned int cgf0(closure0_t *c) {
    vspec_t itmp0 = tc_local(INT);        /* int temporary */
    seti(itmp0, 5);                       /* set it to 5 */
    return itmp0;                         /* return the location */
}

void cgf1(closure1_t *c) {
    vspec_t itmp0 = tc_local(INT);        /* some temporaries */
    vspec_t itmp1 = tc_local(INT);
    ldii(itmp1, zero, c->fv_k);           /* addr of k */
    mulii(itmp1, itmp1, c->rc_j);         /* runtime const j */
    /* now apply i's CGF to i's closure: cspec composition! */
    itmp0 = c->cs_i->cgf(c->cs_i);
    addi(itmp1, itmp0, itmp1);
    reti(itmp1);                          /* emit a return (not return a value) */
}

Fig. 6. Sample code generating functions.

3.2.2.2 Dynamic Code Generation. During dynamic code generation, the `C runtime processes the code-generating functions. The CGFs use the information in the closures to generate code, and they perform various dynamic optimizations.

Dynamic code generation begins when the compile special form is invoked on a cspec. compile calls the code-generating function for the cspec on the cspec's closure, and the CGF performs most of the actual code generation. In terms of our running example, the code int (*f)() = compile(c, int); causes the run-time system to invoke closure1->cgf(closure1).

When the CGF returns, compile links the resulting code, resets the information regarding dynamically generated locals and parameters, and returns a pointer to the generated code. We attempt to minimize poor cache behavior by laying out the code in memory at a random offset modulo the i-cache size. It would be possible to track the placement of different dynamic functions to improve cache performance, but we do not do so currently.

Cspec composition (the inlining of code corresponding to one cspec, b, into that corresponding to another cspec, a, as described in Section 2.3.1) occurs during dynamic code generation. This composition is implemented simply by invoking b's CGF from within a's CGF. If b returns a value, the value's location is returned by its CGF, and can then be used by operations within a's CGF.

The special forms for inter-cspec control flow, jump and label, are implemented efficiently. Each closure, including that of the empty void cspec `{}, contains a field that marks whether the corresponding cspec is the destination of a jump. The code-generating function checks this field, and if necessary, invokes a vcode or icode macro to generate a label, which is eventually resolved when the runtime system links the code. As a result, label can simply return an empty cspec. jump marks the closure of the destination cspec appropriately, and then returns a closure that contains a pointer to the destination cspec and to a CGF that contains an icode or vcode unconditional branch macro.

Generating efficient code from composed cspecs requires optimization analogous to function inlining and inter-procedural optimization. Performing some optimizations on the dynamic code after the order of composition of cspecs has been determined can significantly improve code quality. tcc's icode runtime system builds up an intermediate representation and performs some analyses before it generates executable code. The vcode runtime system, by contrast, optimizes for code generation speed: it generates code in just one pass, but can make poor spill decisions when there is register pressure.

Some dynamic optimizations performed by tcc do not depend on the runtime system employed, but are encoded directly in the code-generating functions. These optimizations do not require global analysis or other expensive computation, and they can considerably improve the quality of dynamic code.

First, tcc does constant folding on run-time constants. The code-generating functions contain code to evaluate any parts of an expression that consist of static and run-time constants. The dynamically emitted instructions can then encode these values as immediates. Similarly, tcc performs simple local strength reduction based on run-time knowledge. For example, the code-generating functions can replace multiplication by a run-time constant integer with a series of shifts and adds, as described in [Briggs and Harvey 1994].
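The shift-and-add decomposition is easy to see in miniature. The sketch below is only an illustration of the idea: tcc's CGFs emit the shift and add instructions into the dynamic code, whereas this function simply performs them, one shifted add per set bit of the multiplier.

```c
#include <assert.h>

/* Multiply x by the run-time constant c using only shifts and adds:
   one shifted add per set bit of c. A CGF would emit these shift and
   add instructions; here we just execute them to show the decomposition. */
unsigned mul_by_const(unsigned x, unsigned c) {
    unsigned result = 0;
    int shift = 0;
    while (c) {
        if (c & 1)
            result += x << shift;   /* add of a shifted operand */
        c >>= 1;
        shift++;
    }
    return result;
}
```

For a constant like 10 (binary 1010), this yields two shifted adds in place of a multiply, which is the payoff the CGFs capture in the emitted instruction stream.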

In addition, the code-generating functions automatically perform some dynamic loop unrolling and dead code elimination based on run-time constants. If the test of a loop or conditional is invariant at run time, or if a loop is bounded by run-time constants, then control flow can be determined at dynamic code generation time. In addition, run-time constant information propagates down loop nesting levels: for example, if a loop induction variable is bounded by run-time constants, and it is in turn used to bound a nested loop, then the induction variable of the nested loop is considered run-time constant too, within each unrolled iteration of the nested loop.
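The unrolling decision can be sketched with text instead of machine code. This illustrative generator (an assumption for exposition; tcc emits instructions, not source) shows how a loop whose bounds are known at generation time disappears from the generated code, leaving only straight-line statements:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: when lo and hi are run-time constants, the loop
   runs at generation time, and the generated "code" (here, source text)
   contains only the unrolled straight-line statements. */
int emit_unrolled(char *buf, size_t size, int lo, int hi) {
    int k, off = 0;
    buf[0] = '\0';
    for (k = lo; k < hi; k++)   /* control flow resolved at generation time */
        off += snprintf(buf + off, size - off, "sum += a[%d]*b[%d];\n", k, k);
    return off;
}
```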

This style of optimization, which is hard-coded at static compile time and performed dynamically, produces good code without high dynamic compilation overhead. The code transformations are encoded in the CGF, and do not depend on run-time data structures. Furthermore, dynamic code that becomes unreachable at run time does not need to be generated, which can lead to faster code generation.

3.3 Runtime Systems

tcc provides two runtime systems for generating code. vcode emits code locally, with no global analysis or optimization. icode builds up an intermediate representation in order to support more optimizations: in particular, better register allocation.

These two runtime systems allow programmers to choose the appropriate level of run-time optimization. The choice is application-specific: it depends on the number of times the code will be used and on the code's size and structure. Programmers can select which runtime system to use when they compile a `C program.

3.3.1 vcode. When code generation speed is more important, the user can have tcc generate CGFs that use vcode macros, which emit code in one pass. Register allocation with vcode is fast and simple. vcode provides getreg and putreg operations: the former allocates a machine register, the latter frees it. If there are no unallocated registers when getreg is invoked, it returns a spilled location designated by a negative number; vcode macros recognize this number as a stack offset, and emit the necessary loads and stores. Clients that find these per-instruction if statements too expensive can disable them: getreg is then guaranteed to return only physical register names and, if it cannot satisfy a request, it terminates the program with a run-time error. This methodology is quite workable in situations where register usage is not data-dependent, and the improvement in code generation speed (roughly a factor of two) can make it worthwhile.
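A minimal sketch of such an interface follows. The names getreg and putreg come from the text above; the bitmask bookkeeping and the particular spill-slot numbering are assumptions made for illustration, not vcode's implementation.

```c
#include <assert.h>

/* Sketch of vcode-style register management. getreg hands out one of
   NREGS physical registers; when none are free it returns a negative
   number, which the emitting macros treat as a stack offset and wrap
   with the necessary loads and stores. */
#define NREGS 8

static int freemask = (1 << NREGS) - 1;   /* bit r set => register r free */
static int nextspill = 1;

int getreg(void) {
    int r;
    for (r = 0; r < NREGS; r++)
        if (freemask & (1 << r)) {
            freemask &= ~(1 << r);
            return r;                     /* physical register */
        }
    return -nextspill++;                  /* spilled location */
}

void putreg(int r) {
    if (r >= 0)                           /* spill slots are not recycled here */
        freemask |= (1 << r);
}
```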

tcc statically emits getreg and putreg operations together with other vcode macros in the code-generating functions: this ensures that the register assignments of one cspec do not conflict with those of another cspec dynamically composed with it. However, efficient inter-cspec register allocation is hard, and the placement of these register management operations can greatly affect code quality. For example, if a register is reserved (getreg'd) across a cspec composition point, it becomes unavailable for allocation in the nested cspec and in all cspecs nested within it. As a result, vcode could run out of registers and resort to spills after only a few levels of cspec nesting. To help improve code quality, tcc follows some simple heuristics. First, expression trees are rearranged so that cspec operands of instructions are evaluated before non-cspec operands. This minimizes the number of temporaries that span cspec references, and hence the number of registers allocated by the CGF of one cspec during the execution of the code-generating function of a nested cspec. Secondly, no registers are allocated for the return value of non-void cspecs: the code-generating function for a cspec allocates the register for storing its result, and simply returns this register name to the CGF of the enclosing cspec.

To further reduce the overhead of vcode register allocation, tcc reserves a limited number of physical registers. These registers are not allocated by getreg, but instead are managed at static compile time by tcc's dynamic back end. They can only be used for values whose live ranges do not span composition with a cspec, and are typically employed for expression temporaries.

As a result of these optimizations, vcode register allocation is quite fast. However, if the dynamic code contains large blocks with high register pressure, or if cspecs are dynamically combined in a way that forces many spills, code quality suffers.

3.3.2 icode. When code quality is more important, the user can have tcc generate CGFs that use icode macros, which generate an intermediate representation on which optimizations can be performed. For example, icode can perform global register allocation on dynamic code more effectively than vcode in the presence of cspec composition.

icode provides an interface similar to that of vcode, with two main extensions: (1) an infinite number of registers, and (2) primitives to express changes in estimated usage frequency of code. The first extension allows icode clients to emit code that assumes no spills, leaving the work of global, inter-cspec register allocation to icode. The second allows icode to obtain estimates of code execution frequency at low cost. For instance, prior to invoking icode macros that correspond to a loop body, the icode client could invoke refmul(10): this tells icode that all variable references occurring in the subsequent macros should be weighted as occurring 10 times (an estimated average number of loop iterations) more than the surrounding code. After emitting the loop body, the icode client should invoke a corresponding refdiv(10) macro to correctly weight code outside of the loop. The estimates obtained in this way are useful for several optimizations; they currently provide approximate variable usage counts that help to guide register allocation.

icode's intermediate representation is designed to be compact (two 4-byte machine words per icode instruction) and easy to parse, in order to reduce the overhead of subsequent passes. When compile is invoked in icode mode, icode builds a flow graph, performs register allocation, and finally generates executable code. We have attempted to minimize the cost of each of these operations. We briefly discuss each of them in turn.

3.3.2.1 Flow Graph Construction. icode builds a control-flow graph in one pass after all CGFs have been invoked. The flow graph is a single array that uses pointers for indexing. In order to allocate all required memory in a single allocation, icode computes an upper bound on the number of basic blocks by summing the numbers of labels and jumps emitted by icode macros. After allocating space for an array of this size, it traverses the buffer of icode instructions and adds basic blocks to the array in the same order in which they exist in the list of instructions. Forward references are initially stored in an array of pairs of basic block addresses; when all the basic blocks are built, the forward references are resolved by traversing this array and linking the pairs of blocks listed in it. As it builds the flow graph, icode also collects a minimal amount of local data-flow information (def and use sets for each basic block). All memory management occurs through arenas [Forsythe 1977], which ensures low amortized cost for memory allocation and essentially free deallocation.
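A minimal arena sketch (an assumed interface, not tcc's actual allocator) shows why arena allocation reduces to a pointer increment and why deallocation is essentially free:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal arena allocator: objects are carved out of one large block by
   bumping a pointer, and the whole arena is freed at once. This sketch
   does not grow the arena when it fills. */
typedef struct arena {
    char *base, *next, *limit;
} arena;

arena *arena_new(size_t size) {
    arena *a = malloc(sizeof *a);
    a->base = a->next = malloc(size);
    a->limit = a->base + size;
    return a;
}

void *arena_alloc(arena *a, size_t n) {
    void *p;
    n = (n + 7) & ~(size_t)7;        /* keep 8-byte alignment */
    if (a->next + n > a->limit)
        return NULL;                 /* a real arena would chain a new block */
    p = a->next;
    a->next += n;                    /* the entire allocation cost */
    return p;
}

void arena_free(arena *a) {          /* essentially free deallocation */
    free(a->base);
    free(a);
}
```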

3.3.2.2 Register Allocation. Good register allocation is the main benefit that icode provides over vcode. icode currently implements several different register allocation algorithms, which differ in overhead and in the quality of the code that they produce: graph coloring, linear scan, and a simple scheme based on estimated usage counts.

The graph coloring allocator implements a simplified version of Chaitin's algorithm [Chaitin et al. 1981]: it does not do coalescing, but employs estimates of usage counts to guide spilling. The live variable information used by this allocator is obtained by an iterative data-flow pass over the icode flow graph. Both the liveness analysis and the register allocation pass were carefully implemented for speed, but their actual performance is inherently limited because the algorithms were developed for static compilers, and prioritize code quality over compilation speed. The graph coloring allocator therefore serves as a benchmark: it produces the best code, but is relatively slow.

At the opposite end of the spectrum is icode's simple "usage count" allocator: it makes no attempt to produce particularly good code, but is fast. This allocator ignores liveness information altogether: it simply sorts all variables in order of decreasing estimated usage counts, allocates the n available registers to the n variables with the highest usage counts, and places all other variables on the stack. Most dynamic code created using `C is relatively small: as a result, despite its simplicity, this allocation algorithm often performs just as well as graph coloring.
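The whole scheme fits in a few lines. This sketch (with an assumed Var record; icode's data structures differ) captures it:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the usage-count allocator: ignore liveness entirely, sort
   variables by decreasing estimated usage count, and give the R available
   registers to the R most frequently used variables. reg == -1 => stack. */
typedef struct { int id, count, reg; } Var;

static int by_count_desc(const void *a, const void *b) {
    return ((const Var *)b)->count - ((const Var *)a)->count;
}

void usage_count_alloc(Var *v, int n, int R) {
    int i;
    qsort(v, n, sizeof *v, by_count_desc);
    for (i = 0; i < n; i++)
        v[i].reg = (i < R) ? i : -1;   /* top R get registers, rest go to stack */
}
```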

Lastly, icode implements linear scan register allocation [Poletto and Sarkar 1998]. This algorithm improves performance relative to graph coloring: it does not build and color a graph, but rather assigns registers to variables in one pass over a sorted list of live intervals. Given an ordering (for example, linear layout order, or depth-first order) of the instructions in a flow graph, a live interval of a variable v is the interval [m, n] such that v is not live prior to instruction m or after instruction n. Once the list of live intervals is computed, allocating R available registers so as to minimize the number of spilled intervals requires removing the smallest number of live intervals so that no more than R live intervals overlap. The algorithm skips forward through the sorted list of live intervals from start point to start point, keeping track of the set of overlapping intervals. When more than R intervals overlap, it heuristically spills the interval that ends furthest away, and moves on to the next start point. The algorithm appears in Figure 7, and is discussed in detail in [Poletto and Sarkar 1998].

LinearScanRegisterAllocation
    active <- {}
    foreach live interval i, in order of increasing start point
        ExpireOldIntervals(i)
        if length(active) = R then
            SpillAtInterval(i)
        else
            register[i] <- a register removed from pool of free registers
            add i to active, sorted by increasing end point

ExpireOldIntervals(i)
    foreach interval j in active, in order of increasing end point
        if endpoint[j] >= startpoint[i] then
            return
        remove j from active
        add register[j] to pool of free registers

SpillAtInterval(i)
    spill <- last interval in active
    if endpoint[spill] > endpoint[i] then
        register[i] <- register[spill]
        location[spill] <- new stack location
        remove spill from active
        add i to active, sorted by increasing end point
    else
        location[i] <- new stack location

Fig. 7. Linear scan register allocation.
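As a concrete (if simplified) rendering of Figure 7, the following C sketch allocates R registers over a list of intervals sorted by increasing start point. The Interval type and the fixed-size tables are illustrative assumptions, not icode's actual data structures.

```c
#include <assert.h>
#include <string.h>

/* Simplified C rendering of the linear scan algorithm of Figure 7.
   Intervals must be sorted by increasing start point; reg == -1 means
   "spilled to stack slot loc". */
#define MAXI 64

typedef struct { int start, end, reg, loc; } Interval;

static int active[MAXI], nactive;   /* live interval indices, by end point */
static int freeregs[MAXI], nfree;   /* pool of free registers */

static void expire_old_intervals(Interval *v, int i) {
    int j;
    for (j = 0; j < nactive; j++) {
        if (v[active[j]].end >= v[i].start)
            break;                               /* still overlaps i */
        freeregs[nfree++] = v[active[j]].reg;    /* register back to pool */
    }
    memmove(active, active + j, (nactive - j) * sizeof active[0]);
    nactive -= j;
}

static void add_active(Interval *v, int i) {     /* keep active sorted by end */
    int j = nactive++;
    while (j > 0 && v[active[j - 1]].end > v[i].end) {
        active[j] = active[j - 1];
        j--;
    }
    active[j] = i;
}

void linear_scan(Interval *v, int n, int R) {
    int i, nextloc = 0;
    nactive = nfree = 0;
    for (i = 0; i < R; i++)
        freeregs[nfree++] = i;
    for (i = 0; i < n; i++) {
        expire_old_intervals(v, i);
        if (nactive == R) {                      /* spill at interval */
            int s = active[nactive - 1];         /* active interval ending last */
            if (v[s].end > v[i].end) {           /* spill s, give i its register */
                v[i].reg = v[s].reg;
                v[s].reg = -1; v[s].loc = nextloc++;
                nactive--;
                add_active(v, i);
            } else {                             /* spill i itself */
                v[i].reg = -1; v[i].loc = nextloc++;
            }
        } else {
            v[i].reg = freeregs[--nfree];
            add_active(v, i);
        }
    }
}
```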

icode can obtain live interval information in two different ways. The first method is simply to compute live variable information by iterative analysis, as for graph coloring, and to then coarsen this information to one live interval per variable. This technique produces live intervals that are as accurate as possible, but is not fast. The second method is considerably faster, but produces slightly more conservative intervals. The algorithm finds and topologically sorts the strongly connected components (SCCs) of the flow graph. If a variable is defined or used in an SCC, it is assumed live throughout the whole SCC. The live interval of a variable therefore stretches from the topologically first SCC where it appears to the last. Like the linear scan algorithm, this technique is analyzed in [Poletto and Sarkar 1998].

These different algorithms allow icode to provide a variety of tradeoffs of compile-time overhead versus quality of code. Graph coloring is most expensive and usually produces the best code, linear scan is considerably faster but sometimes produces worse code, and the usage count allocator is faster than linear scan but can produce considerably worse code. However, given the relatively small size of most `C dynamic code, the algorithms perform similarly on the benchmarks presented in this paper. As a result, Section 5 presents measurements only for a representative case: the linear scan allocator using live intervals derived from full live variable information.

3.3.2.3 Code Generation. The final phase of code generation with icode is the translation of the intermediate representation into executable code. The code emitter makes one pass through the icode intermediate representation: it invokes the vcode macro that corresponds to each icode instruction, and prepends and appends spill code as necessary.

icode has several hundred instructions (the Cartesian product of operation kinds and operand types), so a code generator for the entire instruction set is quite large. Most `C programs, however, use only a small subset of all icode instructions. tcc therefore keeps track of the icode instructions used by an application. It encodes this usage information for a given `C source file in dummy symbol names in the corresponding object file. A pre-linking pass then scans all the files about to be linked and emits an additional object file containing an icode-to-binary translator tailored specifically to the icode macros present in the executable. This simple trick significantly reduces the size of the icode code generator; for example, for the benchmarks presented in this paper it usually shrank the code generators by a factor of 5 or 6.

3.3.2.4 Other Features. icode is designed to be a generic framework for dynamic code optimization: it is possible to extend it with additional optimization passes, such as copy propagation, common subexpression elimination, etc. However, preliminary measurements indicate that much dynamic optimization beyond register allocation is probably not practical: the increase in dynamic compile time is not justified by sufficient improvements in the speed of the resulting code.

4. APPLICATIONS

`C is valuable in a number of practical settings. The language can be employed to increase performance through the use of dynamic code generation, as well as to simplify the creation of programs that cannot easily be written in ANSI C. For example, `C can be used to build efficient searching and sorting routines, implement dynamic integrated layer processing for high-performance network subsystems, and create compiling interpreters and "just-in-time" compilers. This section presents several ways in which `C and dynamic code generation can help to solve practical problems. We have divided the examples into four broad categories: specialization, dynamic function call construction, dynamic inlining, and compilation. Many of the applications described below are also used for the performance evaluation in Section 5.


struct hte {                  /* Hash table entry structure */
    int val;                  /* Key that entry is associated with */
    struct hte *next;         /* Pointer to next entry */
    /* ... */
};

struct ht {                   /* Hash table structure */
    int scatter;              /* Value used to scatter keys */
    int norm;                 /* Value used to normalize */
    struct hte **hte;         /* Vector of pointers to hash table entries */
};

/* Hash returns a pointer to the hash table entry, if any, that matches val. */
struct hte *hash(struct ht *ht, int val) {
    struct hte *hte = ht->hte[(val * ht->scatter) / ht->norm];
    while (hte && hte->val != val) hte = hte->next;
    return hte;
}

Fig. 8. A hash function written in C.

4.1 Specialization

`C provides programmers with a general set of mechanisms to build code at run time. Dynamic code generation can be used to hard-wire run-time values into the instruction stream, which can enable code optimizations such as strength reduction and dead code elimination. In addition, `C enables more unusual and complicated operations, such as specializing a piece of code to a particular input (for example, a given array) or to some class of data structures (for example, all arrays with elements of a given length).

4.1.1 Hashing. A simple example of `C is the optimization of a generic hash function, where the table size is determined at run time, and where the function uses a run-time value to help its hash. Consider the C code in Figure 8. The C function has three values that can be treated as run-time constants: ht->hte, ht->scatter, and ht->norm. As illustrated in Figure 9, using `C to specialize the function for these values requires only a few changes. The resulting code can be considerably faster than the equivalent C version, because tcc hard-codes the run-time constants hte, scatter, and norm in the instruction stream, and reduces the multiplication and division operations in strength. The cost of using the resulting dynamic function is an indirect jump on a function pointer.

4.1.2 Vector Dot Product. Matrix and vector manipulations such as dot product provide many opportunities for dynamic code generation. They often involve a large number of operations on values which change relatively infrequently. Matrices may have run-time characteristics (i.e., large numbers of zeros and small integers) that can improve performance of matrix operations, but cannot be exploited by static compilation techniques. In addition, sparse matrix techniques are only efficient for matrices with a high degree of sparseness.

In the context of matrix multiplication, dynamic code generation can remove multiplications by zero, and strength-reduce multiplications by small integers. Because code for each row is created once and then used once for each column, the costs of code generation can be recovered easily. Consider the C code to compute dot product in Figure 10. At run time several optimizations can be employed. For example, the programmer can directly eliminate multiplication by zero. The corresponding `C code appears in Figure 11.

/* Type of the function generated by mkhash: takes a value as input
   and produces a (possibly null) pointer to a hash table entry */
typedef struct hte *(*hptr)(int val);

/* Construct a hash function with the size, scatter, and hash table pointer hard-coded. */
hptr mkhash(struct ht *ht) {
    int vspec val = param(int, 0);
    void cspec code = `{
        struct hte *hte = $ht->hte[(val * $ht->scatter) / $ht->norm];
        while (hte && hte->val != val) hte = hte->next;
        return hte;
    };
    return compile(code, struct hte *);   /* Compile and return the result */
}

Fig. 9. Specialized hash function written in `C.

int dot(int *a, int *b, int n) {
    int sum, k;
    for (sum = k = 0; k < n; k++) sum += a[k] * b[k];
    return sum;
}

Fig. 10. A dot-product routine written in C.

The dot product written in `C can perform substantially better than its static C counterpart. The `C code does not emit code for multiplications by zero. In addition, the `C compiler can encode values as immediates in arithmetic instructions, and can reduce multiplications by the run-time constant $row[k] in strength.
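The zero-skipping part of this specialization can be approximated in static C by precomputing the nonzero structure of row once. The sketch below is only an analogue of the idea: the `C version goes further, folding the surviving indices and values into the instruction stream itself.

```c
#include <assert.h>

/* Static analogue of Figure 11: record row's nonzero entries once, so
   every subsequent dot product with a column skips the zero terms. */
#define MAXN 128

typedef struct { int idx[MAXN], val[MAXN], nnz; } SparseRow;

void mkrow(const int *row, int n, SparseRow *s) {
    int k;
    s->nnz = 0;
    for (k = 0; k < n; k++)
        if (row[k]) {                  /* keep only nonzero terms */
            s->idx[s->nnz] = k;
            s->val[s->nnz] = row[k];
            s->nnz++;
        }
}

int dot_spec(const SparseRow *s, const int *col) {
    int i, sum = 0;
    for (i = 0; i < s->nnz; i++)
        sum += s->val[i] * col[s->idx[i]];
    return sum;
}
```

As with the dynamic version, the setup cost of mkrow is recovered when the same row is used against many columns.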

4.1.3 Binary Search. Figure 12 illustrates a recursive implementation of binary search. We present a recursive version here for clarity: the static code used for measurements in Section 5 is a more efficient, iterative version. Nonetheless, even the iterative implementation incurs some overhead due to looping and because it needs to reference into the input array.

When the input array will be searched several times, however, one can use `C to write code like that in Figure 13. The structure of this code is very similar to that of the recursive binary search. By adding a few dynamic code generation primitives to the original algorithm, we have created a function that returns a cspec for binary search that is tailored to a given input set. We can create a C function pointer from this cspec:

typedef int (*ip)(int key);

ip mksearch(int n, int *x) {

`C and tcc: A Language and Compiler for Dynamic Code Generation · 23

void cspec mkdot(int row[], int n) {
  int k;
  int *vspec col = param(int *, 0);   /* Input vector for dynamic function */
  int cspec sum = `0;                 /* Spec for sum of products; initially 0 */
  for (k = 0; k < n; k++)             /* Only generate code for nonzero multiplications */
    if (row[k])                       /* Specialize on index of col[k] and value of row[k] */
      sum = `(sum + col[$k] * $row[k]);
  return `{ return sum; };
}

Fig. 11. `C code to build a specialized dot-product routine.

int bin(int *x, int key, int l, int u, int r) {
  int p;
  if (l > u) return -1;
  p = u - r;
  if (x[p] == key) return p;
  else if (x[p] < key) return bin(x, key, p+1, u, r/2);
  else return bin(x, key, l, p-1, r/2);
}

Fig. 12. A tail-recursive implementation of binary search.

void cspec gen(int *x, int vspec key, int l, int u, int r) {
  int p;
  if (l > u) return `{ return -1; };
  p = u - r;
  return `{
    if ($x[p] == key) return $p;
    else if ($x[p] < key) @gen(x, key, p+1, u, r/2);
    else @gen(x, key, l, p-1, r/2);
  };
}

Fig. 13. `C code to create a "self-searching" executable array.


typedef double (*dptr)(double);

dptr mkpow(int exp) {
  double vspec base = param(double, 0);           /* Argument: the base */
  double vspec result = local(register double);   /* Local: running product */
  void cspec squares;
  int bit = 2;

  /* Initialize the running product */
  if (1 & exp) squares = `{ result = base; };
  else squares = `{ result = 1.; };

  /* Multiply some more, if necessary */
  while (bit <= exp) {
    squares = `{ @squares; base *= base; };
    if (bit & exp) squares = `{ @squares; result *= base; };
    bit = bit << 1;
  }

  /* Compile a function which returns the result */
  return compile(`{ @squares; return result; }, double);
}

Fig. 14. Code to create a specialized exponentiation function.

  int vspec key = param(int, 0);   /* One argument: the key to search for */
  return (ip)compile(gen(x, key, 0, n-1, n/2), int);
}

In the resulting code, the values from the input array are hard-wired into the instruction stream, and the loop is unrolled into a binary tree of nested if statements that compare the value to be found to constants. As a result, the search involves neither loads from memory nor looping overhead, so the dynamically constructed code is considerably more efficient than its static counterpart. For small input vectors (on the order of 30 elements), this results in lookup performance superior even to that of a hash table.
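As an illustration (with a hypothetical input set, not from the paper), here is the shape of the code gen() builds for the sorted array {10, 20, 30}: a binary tree of nested ifs in which the array elements appear only as immediates.

```c
#include <assert.h>

/* Illustrative sketch: hand-specialized binary search over {10, 20, 30}.
   There are no memory loads and no loop; the tree mirrors the recursion
   of gen() in Figure 13.  Returns the index of the key, or -1. */
static int search3(int key) {
    if (key == 20) return 1;        /* root of the tree: index 1 */
    if (key < 20) {
        if (key == 10) return 0;    /* left subtree: index 0 */
        return -1;
    }
    if (key == 30) return 2;        /* right subtree: index 2 */
    return -1;
}
```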

4.1.4 Exponentiation. Another example of tailoring code to an input set comes from computer graphics [Draves 1995], where it is sometimes necessary to apply an exponentiation function to a large data set. Traditionally, exponentiation is computed in a loop which performs repeated multiplication and squaring. Given a fixed exponent, we can unroll this loop and obtain straight-line code that contains the minimum number of multiplications. The `C code to perform this optimization appears in Figure 14.
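For a concrete exponent, the unrolling is easy to see. The following C sketch (ours) is the straight-line code mkpow(5) would produce: since 5 is 101 in binary, square-and-multiply needs only two squarings and one extra multiply, with no loop or exponent tests left at run time.

```c
#include <assert.h>

/* Illustrative sketch: hand-unrolled analogue of the code mkpow(5)
   generates.  Exponent 5 = 101b. */
static double pow5(double base) {
    double result = base;    /* bit 0 of the exponent is set */
    base = base * base;      /* base^2 */
    base = base * base;      /* base^4 */
    result = result * base;  /* bit 2 of the exponent is set */
    return result;
}
```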

4.1.5 Finite State Machines. It is possible to use `C to specialize code for more complex data than just arrays and primitive values. For example, tcc can compile a DFA description into specialized code, as shown in Figure 15. The function mk_dfa accepts a data structure that describes a DFA with a unique start state and some number of accept states: at each state, the DFA transitions to the next state and produces one character of output based on the next character in its input. mk_dfa uses `C's inter-cspec control flow primitives, jump and label (Section 2.4),


typedef struct {
  int n;          /* State number (start state has n = 0) */
  int acceptp;    /* Nonzero if this is an accept state */
  char *in;       /* I/O and next-state info: on input in[k], */
  char *out;      /*   produce output out[k] and go to state */
  int *next;      /*   number next[k] */
} state_t;

typedef struct {
  int size;          /* Number of states */
  state_t *states;   /* Description of each state */
} dfa_t;

int (*mk_dfa(dfa_t *dfa))(char *in, char *out) {
  char *vspec in = param(char *, 0);    /* Input to dfa */
  char *vspec out = param(char *, 1);   /* Output buffer */
  char vspec t = local(char);
  void cspec *labels = (void cspec *)malloc(dfa->size * sizeof(void cspec));
  void cspec code = `{};                /* Initially dynamic code is empty */
  int i;

  for (i = 0; i < dfa->size; i++)
    labels[i] = label();                /* Create labels to mark each state */
  for (i = 0; i < dfa->size; i++) {     /* For each state ... */
    state_t *cur = &dfa->states[i];
    int j = 0;
    code = `{ @code;                    /* ... prepend the code so far */
              @labels[i];               /* ... add the label to mark this state */
              t = *in; };               /* ... read current input */
    while (cur->in[j]) {                /* ... add code to do the right thing if */
      code = `{ @code;                  /*     this is an input we expect */
                if (t == $cur->in[j]) {
                  in++; *out++ = $cur->out[j];
                  jump labels[cur->next[j]];
                }};
      j++;
    }
    code = `{ @code;                    /* ... add code to return 0 if we're at end */
              if (t == 0) {             /*     of input in an accept state, or */
                if ($cur->acceptp) return 0;
                else return 2;          /*     2 if we're in another state, */
              }                         /*     or 1 if no transition and not end of input */
              else return 1; };
  }
  return compile(code, int);
}

Fig. 15. Code to create a hard-coded finite state machine.

to create code that directly implements the given DFA: each state is implemented by a separate piece of dynamic code, and state transitions are simply conditional branches. The dynamic code contains no references into the original data structure that describes the DFA.
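The flavor of the generated code can be seen in the following hand-written C sketch for a hypothetical two-state machine (state 0, the accept state, consumes 'a' and emits 'x'; state 1 consumes 'b' and emits 'y'). Each state is a label and each transition a goto; the DFA description itself has disappeared.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch of mk_dfa-style output.  Return values follow the
   conventions of Figure 15: 0 on accept, 1 on an unexpected input, 2 if
   the input ends in a non-accept state. */
static int run_dfa(const char *in, char *out) {
s0: if (*in == 'a') { in++; *out++ = 'x'; goto s1; }
    if (*in == 0) return 0;
    return 1;
s1: if (*in == 'b') { in++; *out++ = 'y'; goto s0; }
    if (*in == 0) return 2;
    return 1;
}
```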

4.1.6 Swap. The examples so far have illustrated specialization to specific values. Value specialization can improve performance, but it may be impractical if


typedef void (*fp)(void *, void *);

fp mkswap(int size) {
  long *vspec src = param(long *, 0);   /* Arg 0: source */
  long *vspec dst = param(long *, 1);   /* Arg 1: destination */
  long vspec tmp = local(long);         /* Temporary for swaps */
  void cspec s = `{};                   /* Code to be built up, initially empty */
  int i;
  for (i = 0; i < size / sizeof(long); i++)   /* Build swap code */
    s = `{ @s; tmp = src[$i]; src[$i] = dst[$i]; dst[$i] = tmp; };
  return (fp)compile(s, void);
}

Fig. 16. `C code to generate a specialized swap routine.

the values change too frequently. A related approach that does not suffer from this drawback is specialization based on properties, such as size, of data types. For instance, it is often necessary to swap the contents of two regions of memory: in-place sorting algorithms are one such example. As long as the data being manipulated is no larger than a machine word, this process is quite efficient. However, when manipulating larger regions (for example, C structs), the code is often inefficient. One way to copy the regions is to invoke the C memory copy routine, memcpy, repeatedly. Using memcpy incurs function call overhead, as well as overhead within memcpy itself. Another way is to iteratively swap one word at a time, but this method incurs loop overhead.

`C allows us to create a swap routine that is specialized to the size of the region being swapped. The code in Figure 16 is an example, simplified to handle only the case where the size of the region is a multiple of sizeof(long). This routine returns a pointer to a function that contains only assignments, and swaps the region of the given size without resorting to looping or multiple calls to memcpy. The size of the generated code will usually be rather small, which makes this a profitable optimization. Section 5 evaluates a `C heapsort implementation in which the swap routine is customized to the size of the objects being sorted and dynamically inlined into the main sorting code.
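Concretely, on a machine with 8-byte longs (an assumption of this sketch), mkswap(16) would generate the equivalent of the following straight-line C function: two word swaps, no loop, no memcpy.

```c
#include <assert.h>

/* Illustrative sketch: hand-written analogue of mkswap(16) output,
   assuming sizeof(long) == 8, so 16 bytes = 2 words. */
static void swap16(long *src, long *dst) {
    long tmp;
    tmp = src[0]; src[0] = dst[0]; dst[0] = tmp;
    tmp = src[1]; src[1] = dst[1]; dst[1] = tmp;
}
```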

4.1.7 Copy. Copying a memory region of arbitrary size is another common operation. An important application of this is computer graphics [Pike et al. 1985]. Similar to the previous code for swapping regions of memory, the example code in Figure 17 returns a function customized for copying a memory region of a given size.

The procedure mk_copy takes two arguments: the size of the regions to be copied, and the number of times that the inner copying loop should be unrolled. It creates a cspec for a function that takes two arguments, pointers to source and destination regions. It then creates prologue code to copy regions which would not be copied by the unrolled loop (if n mod unrollx ≠ 0), and generates the body of the unrolled loop. Finally, it composes these two cspecs, invokes compile, and returns a pointer to the resulting customized copy routine.
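For instance (our sketch, with hypothetical parameters), mk_copy(10, 4) would build the equivalent of this static C function: the remainder (10 mod 4 = 2 words) is copied by straight-line prologue code, and the main loop then copies four words per iteration.

```c
#include <assert.h>

/* Illustrative sketch: hand-written analogue of mk_copy(10, 4) output. */
static void copy10(unsigned *dst, unsigned *src) {
    int k;
    dst[0] = src[0];                 /* prologue: remainder copy, unrolled */
    dst[1] = src[1];
    for (k = 2; k < 10; k += 4) {    /* main loop, unrolled 4x */
        dst[k]     = src[k];
        dst[k + 1] = src[k + 1];
        dst[k + 2] = src[k + 2];
        dst[k + 3] = src[k + 3];
    }
}
```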


typedef void (*fp)(void *, void *);

fp mk_copy(int n, int unrollx) {
  int i, j;
  unsigned *vspec src = param(unsigned *, 0);   /* Arg 0: source */
  unsigned *vspec dst = param(unsigned *, 1);   /* Arg 1: destination */
  int vspec k = local(int);                     /* Local: loop counter */
  void cspec copy = `{}, unrollbody = `{};      /* Code to build, initially empty */

  for (i = 0; i < n % unrollx; i++)    /* Unroll the remainder copy */
    copy = `{ @copy; dst[$i] = src[$i]; };
  if (n >= unrollx)                    /* Unroll copy loop unrollx times */
    for (j = 0; j < unrollx; j++)
      unrollbody = `{ @unrollbody; dst[k + $j] = src[k + $j]; };
  copy = `{   /* Compose remainder copy with the main unrolled loop */
    @copy;
    for (k = $i; k < $n; k += $unrollx) @unrollbody;
  };

  /* Compile and return a function pointer */
  return (fp)compile(copy, void);
}

Fig. 17. Generating specialized copy code in `C.

4.2 Dynamic Function Call Construction

`C allows programmers to generate functions and calls to functions that have arguments whose number and types are not known at compile time. This functionality distinguishes `C: neither ANSI C nor any of the dynamic compilation systems discussed in Section 6 provides mechanisms for constructing function calls dynamically.

A useful application of dynamic function call construction is the generation of code to marshal and unmarshal arguments stored in a byte vector. These operations are frequently performed to support remote procedure call [Birrell and Nelson 1984]. Generating specialized code for the most active functions results in substantial performance benefits [Thekkath and Levy 1993].

We present two functions, marshal and unmarshal, that dynamically construct marshaling and unmarshaling code, respectively, given a "format vector," types, that specifies the types of arguments. The sample code in Figure 18 generates a marshaling function for arguments with a particular set of types (in this example, int, void *, and double). First, it specifies code to allocate storage for a byte vector large enough to hold the arguments described by the type format vector. Then, for every type in the type vector, it creates a vspec that refers to the corresponding parameter, and constructs code to store the parameter's value into the byte vector at a distinct run-time constant offset. Finally, it specifies code that returns a pointer to the byte vector of marshaled arguments. After dynamic code generation, the function that has been constructed will store all of its parameters at fixed, non-overlapping offsets in the result vector. Since all type and offset computations


typedef union { int i; double d; void *p; } type;
typedef enum { INTEGER, DOUBLE, POINTER } type_t;   /* Types we expect to marshal */
extern void *alloc();

void cspec mk_marshal(type_t *types, int nargs) {
  int i;
  type *vspec m = local(type *);   /* Spec of pointer to result vector */
  void cspec s = `{ m = (type *)alloc(nargs * sizeof(type)); };
  for (i = 0; i < nargs; i++) {    /* Add code to marshal each param */
    switch (types[i]) {
    case INTEGER: s = `{ @s; m[$i].i = param($i, int); }; break;
    case DOUBLE:  s = `{ @s; m[$i].d = param($i, double); }; break;
    case POINTER: s = `{ @s; m[$i].p = param($i, void *); }; break;
    }
  }
  /* Return code spec to marshal parameters and return result vector */
  return `{ @s; return m; };
}

Fig. 18. Sample marshaling code in `C.

typedef int (*fptr)();   /* Type of the function we will be calling */

void cspec mk_unmarshal(type_t *types, int nargs) {
  int i;
  fptr vspec fp = param(fptr, 0);     /* Arg 0: the function to invoke */
  type *vspec m = param(type *, 1);   /* Arg 1: the vector to unmarshal */
  void cspec args = push_init();      /* Initialize the dynamic argument list */
  for (i = 0; i < nargs; i++) {       /* Build up the dynamic argument list */
    switch (types[i]) {
    case INTEGER: push(args, `m[$i].i); break;
    case DOUBLE:  push(args, `m[$i].d); break;
    case POINTER: push(args, `m[$i].p); break;
    }
  }
  /* Return code spec to call the given function with unmarshaled args */
  return `{ fp(args); };
}

Fig. 19. Unmarshaling code in `C.

have been done during environment binding, the generated code will be efficient. Further performance gains could be achieved if the code were to manage details such as alignment.

Dynamic generation of unmarshaling code is equally useful. The process relies on `C's mechanism for constructing calls to arbitrary functions at run time. It not only improves efficiency, but also provides valuable functionality. For example, in Tcl [Ousterhout 1994] the runtime system can make upcalls into an application. However, because Tcl cannot dynamically create code to call an arbitrary function, it marshals all of the upcall arguments into a single byte vector, and forces applications to explicitly unmarshal them. If systems such as Tcl used `C to construct upcalls, clients would be able to write their code as normal C routines, which would increase the ease of expression and decrease the chance for errors.

The code in Figure 19 generates an unmarshaling function that works with the marshaling code in Figure 18. The generated code takes a function pointer as its first argument and a byte vector of marshaled arguments as its second. It unmarshals the values in the byte vector into their appropriate parameter positions, and then invokes the function pointer. mk_unmarshal works as follows: it creates the specifications for the generated function's two incoming arguments, and initializes the argument list. Then, for every type in the type vector, it creates a cspec to index into the byte vector at a fixed offset and pushes this cspec into its correct parameter position. Finally, it creates the call to the function pointed to by the first dynamic argument.
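A static C sketch (ours) shows the layout the marshaling code of Figure 18 produces for the fixed type vector (INTEGER, DOUBLE, POINTER): each parameter lands in its own slot of the result vector. The `C version builds such a function for any type vector at run time.

```c
#include <assert.h>

/* Illustrative sketch: static analogue of generated marshaling code for
   the signature (int, double, void *).  The union mirrors `type' in
   Figure 18. */
typedef union { int i; double d; void *p; } type;

static type *marshal3(type *m, int a0, double a1, void *a2) {
    m[0].i = a0;   /* each argument at a fixed, non-overlapping offset */
    m[1].d = a1;
    m[2].p = a2;
    return m;
}
```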

4.3 Dynamic Inlining

`C makes it easy to inline and compose functions dynamically. This feature is analogous to dynamic inlining through indirect function calls. It improves performance by eliminating function call overhead and by creating the opportunity for optimization across function boundaries.

4.3.1 Parameterized Library Functions. Dynamic function composition is useful when writing and using library functions that would normally be parameterized with function pointers, such as many mathematical and standard C library routines. The `C code for Newton's method [Press et al. 1992] in Figure 20 illustrates its use. The function newton takes as arguments the maximum allowed number of iterations, a tolerance, an initial estimate, and two pointers to functions that return cspecs to evaluate a function and its derivative. In the calls f(p0) and fprime(p0), p0 is passed as a vspec argument. The cspecs returned by these functions are incorporated directly into the dynamically generated code. As a result, there is no function call overhead, and inter-cspec optimization can occur during dynamic code generation.
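The effect of this composition can be sketched in static C (ours, not tcc output): with f(x) = (x+1)^2 and f'(x) = 2(x+1) inlined by hand, the iteration contains no calls through function pointers.

```c
#include <assert.h>
#include <math.h>

/* Illustrative sketch: Newton's method with f and fprime inlined, as the
   dynamic composition in Figure 20 would produce. */
static double newton_inlined(int n, double tol, double p0) {
    int i;
    for (i = 0; i < n; i++) {
        double p = p0 - ((p0 + 1.0) * (p0 + 1.0)) / (2.0 * (p0 + 1.0));
        if (fabs(p - p0) < tol) return p;
        p0 = p;
    }
    return p0;   /* the paper's version reports an error here instead */
}
```

Starting from the paper's parameters (100 iterations, tolerance 1e-6, initial estimate 10), the iteration converges to the root -1.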

4.3.2 Network Protocol Layers. Another important application of dynamic inlining is the optimization of networking code. The modular composition of different protocol layers has long been a goal in the networking community [Clark and Tennenhouse 1990]. Each protocol layer frequently involves data manipulation operations, such as checksumming and byte-swapping. Since performing multiple data manipulation passes is expensive, it is desirable to compose the layers so that all the data handling occurs in one phase [Clark and Tennenhouse 1990].

`C can be used to construct a network subsystem that dynamically integrates protocol data operations into a single pass over memory (e.g., by incorporating encryption and compression into a single copy operation). A simple design for such a system divides each data-manipulation stage into pipes that each consume a single input and produce a single output. These pipes can then be composed and incorporated into a data-copying loop. The design includes the ability to specify prologue and epilogue code that is executed before and after the data-copying loop, respectively. As a result, pipes can manipulate state and make end-to-end checks, such as ensuring that a checksum is valid after it has been computed.

The pipe in Figure 21 can be used to do byte-swapping. Since a byte swapper


typedef double cspec (*dptr)(double vspec);

/* Dynamically create a Newton-Raphson routine. n: max number of iterations;
   tol: maximum tolerance; p0: initial estimate; f: function to solve;
   fprime: derivative of f. */
double newton(int n, double tol, double usr_p0, dptr f, dptr fprime) {
  void cspec cs = `{
    int i; double p, p0 = $usr_p0;
    for (i = 0; i < $n; i++) {
      p = p0 - f(p0) / fprime(p0);       /* Compose cspecs returned by f and fprime */
      if (abs(p - p0) < $tol) return p;  /* Return result if we've converged enough */
      p0 = p;                            /* Seed the next iteration */
    }
    error("method failed after %d iterations\n", i);
  };
  return compile(cs, double)();   /* Compile, call, and return the result. */
}

/* Function that constructs a cspec to compute f(x) = (x+1)^2 */
double cspec f(double vspec x) { return `((x + 1.0) * (x + 1.0)); }

/* Function that constructs a cspec to calculate f'(x) = 2(x+1) */
double cspec fprime(double vspec x) { return `(2.0 * (x + 1.0)); }

/* Call newton to solve an equation */
void use_newton(void) { printf("Root is %f\n", newton(100, .000001, 10., f, fprime)); }

Fig. 20. `C code to create and use routines for Newton's method for solving polynomials.
This example computes the root of the function f(x) = (x + 1)^2.

unsigned cspec byteswap(unsigned vspec input) {
  return `((input << 24) | ((input & 0xff00) << 8) |
           ((input >> 8) & 0xff00) | ((input >> 24) & 0xff));
}

/* "Byteswap" maintains no state and so needs no initial or final code */
void cspec byteswap_initial(void) { return `{}; }
void cspec byteswap_final(void) { return `{}; }

Fig. 21. A sample pipe: byteswap returns a cspec for code that byte-swaps input.

does not need to maintain any state, there is no need to specify initial and final code. The byte swapper simply consists of the "consumer" routine that manipulates the data.

To construct the integrated data-copying routine, the initial, consumer, and final cspecs of each pipe are composed with the corresponding cspecs of the pipe's neighbors. The composed initial code is placed at the beginning of the resulting routine; the consumer code is inserted in a loop, and composed with code which provides it with input and stores its output; and the final code is placed at the end of the routine. A simplified version would look like the code fragment in Figure 22. In a mature implementation of this code, we could further improve performance by unrolling the data-copying loop. Additionally, pipes would take inputs and outputs of different sizes that the composition function would reconcile.
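The result of such a composition can be sketched in static C. The following (our sketch, for a hypothetical two-stage pipe of byte-swapping followed by checksumming) shows the single-pass structure compose() builds: one pipe's output feeds the next, and the checksum's prologue and epilogue surround the copy loop, so the buffer is traversed exactly once.

```c
#include <assert.h>
#include <stdint.h>

/* The byte-swapping expression of Figure 21, as a static 32-bit helper. */
static uint32_t byteswap32(uint32_t x) {
    return (x << 24) | ((x & 0xff00) << 8) |
           ((x >> 8) & 0xff00) | ((x >> 24) & 0xff);
}

/* Illustrative sketch of a composed two-stage pipe: swap, then sum. */
static uint32_t copy_swap_sum(uint32_t *out, const uint32_t *in, int nwords) {
    uint32_t sum = 0;                    /* checksum pipe: initial code */
    int i;
    for (i = 0; i < nwords; i++) {
        uint32_t w = byteswap32(in[i]);  /* stage 1 */
        sum += w;                        /* stage 2 */
        out[i] = w;
    }
    return sum;                          /* checksum pipe: final code */
}
```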


typedef void cspec (*vptr)();
typedef unsigned cspec (*uptr)(unsigned cspec);

/* Pipe structure: contains pointers to functions that return cspecs
   for the initialization, pipe, and finalization code of each pipe */
struct pipe {
  vptr initial, final;   /* initial and final code */
  uptr pipe;             /* pipe */
};

/* Return cspec that results from composing the given vector of pipes */
void cspec compose(struct pipe *plist, int n) {
  struct pipe *p;
  int vspec nwords = param(int, 0);               /* Arg 0: input size */
  unsigned *vspec input = param(unsigned *, 1);   /* Arg 1: pipe input */
  unsigned *vspec output = param(unsigned *, 2);  /* Arg 2: pipe output */
  void cspec initial = `{}, cspec final = `{};    /* Prologue and epilogue code */
  unsigned cspec pipes = `input[i];               /* Base pipe input */

  for (p = &plist[0]; p < &plist[n]; p++) {   /* Compose all stages together */
    initial = `{ @initial; @p->initial(); };  /* Compose initial statements */
    pipes = `p->pipe(pipes);   /* Compose pipes: one pipe's output is the next one's input */
    final = `{ @final; @p->final(); };        /* Compose final statements */
  }

  /* Create a function with initial statements first, consumer statements
     second, and final statements last. */
  return `{ int i;
    @initial;
    for (i = 0; i < nwords; i++) output[i] = pipes;
    @final;
  };
}

Fig. 22. `C code for composing data pipes.

4.4 Compilation

`C's imperative approach to dynamic code generation makes it well-suited for writing compilers and compiling interpreters. `C helps to make such programs both efficient and easy to write: the programmer can focus on parsing, and leave the task of code generation to `C.

4.4.1 Domain-Specific Languages. Small, domain-specific languages can benefit from dynamic compilation. The small query languages used to search databases are one class of such languages [Keppel et al. 1993]. Since databases are large, dynamically compiled queries will usually be applied many times, which can easily pay for the cost of dynamic code generation.

We provide a toy example in Figure 23. The function mk_query takes a vector of queries. Each query contains the following elements: a record field (i.e., CHILDREN or INCOME); a value to compare this field to; and the operation to use in the comparison (i.e., <, >, etc.). Given a query vector, mk_query dynamically creates a query function which takes a database record as an argument and checks whether that record satisfies all of the constraints in the query vector. This check


typedef enum { INCOME, CHILDREN /* ... */ } query_t;   /* Query type */
typedef enum { LT, LE, GT, GE, NE, EQ } bool_op;       /* Comparison operation */

struct query {
  query_t record_field;   /* Field to use */
  unsigned val;           /* Value to compare to */
  bool_op bool_op;        /* Comparison operation */
};

struct record { int income; int children; /* ... */ };   /* Simple database record */

/* Function that takes a pointer to a database record and returns 0 or 1,
   depending on whether the record matches the query */
typedef int (*iptr)(struct record *r);

iptr mk_query(struct query *q, int n) {
  int i, cspec field, cspec expr = `1;   /* Initialize the boolean expression */
  struct record *vspec r = param(struct record *, 0);   /* Record to examine */
  for (i = 0; i < n; i++) {              /* Build the rest of the boolean expression */
    switch (q[i].record_field) {         /* Load the appropriate field value */
    case INCOME:   field = `r->income; break;
    case CHILDREN: field = `r->children; break;
    /* ... */
    }
    switch (q[i].bool_op) {   /* Compare the field value to run-time constant q[i].val */
    case LT: expr = `(expr && field < $q[i].val); break;
    case EQ: expr = `(expr && field == $q[i].val); break;
    case LE: expr = `(expr && field <= $q[i].val); break;
    /* ... */
    }
  }
  return (iptr)compile(`{ return expr; }, int);
}

Fig. 23. Compilation of a small query language.


enum { WHILE, IF, ELSE, ID, CONST, LE, GE, NE, EQ };   /* Multicharacter tokens */

int expect();            /* Consume the given token from the input stream, or fail if not found */
int cspec expr();        /* Parse unary expressions */
int gettok();            /* Consume a token from the input stream */
int look();              /* Peek at the next token without consuming it */
int cur_tok;             /* Current token */
int vspec lookup_sym();  /* Given a token, return corresponding vspec */

void cspec stmt() {
  int cspec e = `0; void cspec s = `{}, s1 = `{}, s2 = `{};
  switch (gettok()) {
  case WHILE:                      /* 'while' '(' expr ')' stmt */
    expect('('); e = expr();
    expect(')'); s = stmt();
    return `{ while (e) @s; };
  case IF:                         /* 'if' '(' expr ')' stmt { 'else' stmt } */
    expect('('); e = expr();
    expect(')'); s1 = stmt();
    if (look(ELSE)) {
      gettok(); s2 = stmt();
      return `{ if (e) @s1; else @s2; };
    } else return `{ if (e) @s1; };
  case '{':                        /* '{' stmt* '}' */
    while (!look('}')) s = `{ @s; @stmt(); };
    return s;
  case ';': return `{};
  case ID: {                       /* ID '=' expr ';' */
    int vspec lvalue = lookup_sym(cur_tok);
    expect('='); e = expr(); expect(';');
    return `{ lvalue = e; };
  }
  default: parse_err("expecting statement");
  }
}

Fig. 24. A sample statement parser from a compiling interpreter written in `C.

is implemented simply as an expression which computes the conjunction of the given constraints. The query function never references the query vector, since all the values and comparison operations in the vector have been hard-coded into the dynamic code's instruction stream.

The dynamically generated code expects one incoming argument, the database record to be compared. It then "seeds" the boolean expression: since we are building a conjunction, the initial value is 1. The loop then traverses the query vector, and builds up the dynamic code for the conjunction according to the fields, values, and comparison operations described in the vector. When the cspec for the boolean expression is constructed, mk_query compiles it and returns a function pointer. That optimized function can be applied to database entries to determine whether they match the given constraints.
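As an illustration (with a hypothetical query, not from the paper), the predicate mk_query would generate for "income < 50000 && children == 2" is equivalent to this static C function: the fields, operators, and constants are hard-coded into one boolean expression, and the query vector is never consulted at run time.

```c
#include <assert.h>

/* Illustrative sketch of a compiled query predicate.  The leading 1 is
   the seed of the conjunction, as in the generated code. */
struct record { int income; int children; };

static int match(struct record *r) {
    return 1 && r->income < 50000 && r->children == 2;
}
```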


4.4.2 Compiling Interpreters. Compiling interpreters (also known as JIT compilers) are important pieces of technology: they combine the flexibility of an interpreted programming environment with the performance of compiled code. For a given piece of code, the user of a compiling interpreter pays a one-time cost of compilation, which can be roughly comparable to that of interpretation. Every subsequent use of that code employs the compiled version, which can be much faster than the interpreted version. Compiling interpreters are also useful in systems in which "just-in-time" compilers are commonly used.

Figure 24 contains a fragment of code for a simple compiling interpreter written in `C. This interpreter translates a simple subset of C: as it parses the program, it builds up a cspec that represents it.

5. EVALUATION

`C and tcc support efficient dynamic code generation. In particular, the measurements in this section demonstrate the following results:

- By using `C and tcc, we can achieve good speedups relative to static C. Speedups by a factor of two to four are common for the programs that we have described.

- `C and tcc do not impose excessive overhead on performance. The cost of dynamic compilation is usually recovered in under 100 runs of a benchmark; sometimes, this cost can be recovered in one run of a benchmark.

- The tradeoff between dynamic code quality and dynamic code generation speed must be made on a per-application basis. For some applications, it is better to generate code faster; for others, it is better to generate better code.

- Dynamic code generation can result in large speedups when it enables large-scale optimization: when interpretation can be eliminated, or when dynamic inlining enables further optimization. It provides smaller speedups if only local optimizations, such as strength reduction, are performed dynamically. In such cases, the cost of dynamic code generation may outweigh its benefits.

5.1 Experimental Methodology

The benchmarks that we measure have been described in previous sections. Table II briefly summarizes each benchmark, and lists the section in which it appears.

The performance improvements of dynamic code generation hinge on customizing code to data. As a result, the performance of all of the benchmarks in this section is data-dependent to some degree. In particular, the amount of code generated by the benchmarks is in some cases dependent on the input data. For example, since dp generates code to compute the dot product of an input vector with a run-time constant vector, the size of the dynamic code (and hence its i-cache performance) is dependent on the size of the run-time constant vector. Its performance relative to static code also depends on the density of 0s in the run-time constant vector, since those elements are optimized out when generating the dynamic code. Similarly, binary and dfa generate more code for larger inputs, which generally improves their performance relative to equivalent static code until negative i-cache effects come into play.

Some other benchmarks (ntn, ilp, and query) involve dynamic function inlining that is affected by input data. For example, the code inlined in ntn depends on the


Benchmark  Description                                                Section  Page
ms         Scale a 100x100 matrix by the integers in [10, 100]        2.3.2    8
hash       Hash table, constant table size, scatter value, and        4.1.1    22
           hash table pointer; one hit and one miss
dp         Dot product with a run-time constant vector:               4.1.2    23
           length 40, one-third zeroes
binary     Binary search on a 16-element constant array;              4.1.3    23
           one hit and one miss
pow        Exponentiation of 2 by the integers in [10, 40]            4.1.4    24
dfa        Finite state machine computation:                          4.1.5    25
           6 states, 13 transitions, input length 16
heap       Heapsort, parameterized with a specialized swap:           4.1.6    26
           500-entry array of 12-byte structures
mshl       Marshal five arguments into a byte vector                  4.2      28
unmshl     Unmarshal a byte vector, and call a function of            4.2      28
           five arguments
ntn        Root of f(x) = (x+1)^2 to a tolerance of 10^-9             4.3.1    30
ilp        Integrated copy, checksum, byteswap of a 16KB buffer       4.3.2    31
query      Query 2000 records with seven binary comparisons           4.4.1    32

Table II. Descriptions of benchmarks.

fun ction tobecom puted, that in ilp on the nature of the proto col stack, and that

in query on t he typ e of query submit ted by the user. The advantage of dynamic

co de over static co de i ncreases with the opp ort unity for inlining and cross-f unction

optimization. For example, an ilp proto colstack comp osed from many small passes

will p erform relatively b etter in dynamic co de that one comp osed from a few larger

passes.

Lastly, a few benchmarks are relatively data-independent. pow, heap, mshl, and unmshl generate varying amounts of code depending, respectively, on the exponent used, or the type and size of the objects being sorted, marshaled, or unmarshaled, but the differences are small for most reasonable inputs. ms obtains performance improvements by hard-wiring loop bounds and strength-reducing multiplication by the scale factor. hash makes similar optimizations when computing a hash function. The values of run-time constants may affect performance to some degree (for example, excessively large constants are not useful for this sort of optimization), but such effects are much smaller than those of more large-scale dynamic optimizations.
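The effect of this kind of specialization is easiest to see for pow. The sketch below models it in Python rather than `C (exec stands in for tcc's run-time code generation, and the function names are ours, not the paper's): for a fixed exponent, the square-and-multiply loop is unrolled into straight-line multiplies, so the specialized function contains neither the loop nor the exponent.

```python
def specialize_pow(n):
    # Build the source of a function specialized to exponent n: each bit
    # of n becomes straight-line square/multiply code, with no loop left.
    lines = ["def f(x):", "    r = 1"]
    for bit in bin(n)[2:]:
        lines.append("    r = r * r")
        if bit == "1":
            lines.append("    r = r * x")
    lines.append("    return r")
    env = {}
    exec("\n".join(lines), env)   # stand-in for dynamic code generation
    return env["f"]

pow10 = specialize_pow(10)
print(pow10(2))  # 1024
```

The one-time cost of building pow10 corresponds to tcc's dynamic compilation overhead; each subsequent call runs the specialized straight-line code.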

Each benchmark was written both in `C and in static C. The `C programs were compiled with both the vcode- and the icode-based tcc back ends. When measuring the performance of the icode runtime system, we always employed linear scan register allocation with live intervals derived from live variable information. The static C programs were compiled both with the lcc compiler and with the GNU C compiler, gcc. The code-generating functions used for dynamic code generation are created from the lcc intermediate representation, using that compiler's code generation strategies. As a result, the performance of lcc-generated code should be used as the baseline to measure the impact of dynamic code generation. Measurements collected using gcc serve to compare tcc to an optimizing compiler of reasonable quality.

The machine used for measurements is a Sun Ultra 2 Model 2170 workstation with 384MB of main memory and two 168MHz UltraSPARC-I CPUs. The UltraSPARC-I can issue up to 2 integer and 2 floating point instructions per cycle, and has a write-through, non-allocating, direct-mapped, on-chip 16KB cache. It implements the SPARC version 9 architecture [SPARC International 1994]. tcc also generates code for the MIPS family of processors; we report only SPARC measurements for clarity, since results on the two architectures are similar.

Times were obtained by measuring a large number of trials (enough to provide several seconds of granularity, with negligible standard deviations) using the getrusage system call. The number of trials varied from 100 to 100,000, depending on the benchmark. The resulting times were then divided by the number of iterations to obtain the average overhead of a single run. This form of measurement ignores the effects of cache refill misses, but is representative of how these applications would likely be used (for example, in tight inner loops).
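The measurement loop itself is simple. A minimal sketch of the methodology (in Python, with perf_counter standing in for the paper's getrusage-based timing; the function name is ours):

```python
import time

def per_run_overhead(fn, trials):
    # Run fn many times and report the average cost of one run,
    # as in the text: total elapsed time divided by the trial count.
    start = time.perf_counter()
    for _ in range(trials):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / trials
```

Choosing trials large enough that the total elapsed time spans several seconds keeps timer granularity and per-call noise negligible, as described above.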

Section 5.2 discusses the performance effects of using dynamic code generation: specifically, the speedup of dynamic code relative to static code, and the overhead of dynamic code generation relative to speedup. Section 5.3 presents breakdowns of the dynamic compilation overhead of both vcode and icode in units of processor cycles per generated instruction.

5.2 Performance

This section shows that tcc provides low-overhead dynamic code generation, and that it can be used to speed up a number of benchmarks. We describe results for the benchmarks in Table II and for xv, a freely available image manipulation package.

We compute the speedup due to dynamic code generation by dividing the time required to run the static code by the time required to run the corresponding dynamic code. We measure overhead by calculating each benchmark's "cross-over" point, if one exists. This point is the number of times that dynamic code must be used so that the overhead of dynamic code generation equals the time gained by running the dynamic code.
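These two quantities reduce to a small calculation. The sketch below (Python; the cycle counts in the example are invented for illustration, not measurements from the paper) computes the speedup and the cross-over point from a one-time compilation cost and the per-run times of the static and dynamic code:

```python
def speedup(t_static, t_dynamic):
    # Speedup as defined in the text: static run time / dynamic run time.
    return t_static / t_dynamic

def cross_over(t_compile, t_static, t_dynamic):
    # Smallest number of uses n at which n*t_static equals
    # t_compile + n*t_dynamic, i.e., when the dynamic code has
    # "paid for itself". No cross-over exists if dynamic code
    # is not faster than static code.
    if t_dynamic >= t_static:
        return None
    return t_compile / (t_static - t_dynamic)

# Hypothetical numbers: 10000-cycle compile, 30 vs. 10 cycles per run.
print(speedup(30, 10))            # 3.0
print(cross_over(10000, 30, 10))  # 500.0
```

A cross-over point below one, as reported later for some benchmarks, simply means a single use of the dynamic code already recovers the compilation cost.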

The performance of dynamic code is up to an order of magnitude better than that of unoptimized static code. In many cases, the performance improvement of using dynamic code generation can be amortized over fewer than ten runs of the dynamic code. The benchmarks that achieve the highest speedups are those in which dynamic information allows the most effective restructuring of code relative to the static version. The main classes of such benchmarks are numerical code in which particular values allow large amounts of work to be optimized away (for example, dp), code in which an expensive layer of data structure interpretation can be removed at run time (for example, query), and code in which inlining can be performed dynamically but not statically (for example, ilp).

vcode generates code approximately three to eight times more quickly than icode. Nevertheless, the code generated by icode can be considerably faster than that generated by vcode. A programmer can choose between the two systems to trade code quality for code generation speed, depending on the needs of the application.


Fig. 25. Speedup of dynamic code over static code (bar chart; one group of bars per benchmark: ms, hash, dp, binary, pow, dfa, heap, mshl, unmshl, ntn, ilp, query; legend: icode-lcc, vcode-lcc, icode-gcc, vcode-gcc; vertical axis: speedup, static time / dynamic time).

5.2.1 Speedup. Figure 25 shows that using `C and tcc improves the performance of almost all of our benchmarks. Both in this figure and in Figure 26, the legend indicates which static and dynamic compilers are being compared. icode-lcc compares dynamic code created with icode to static code compiled with lcc; vcode-lcc compares dynamic code created with vcode to static code compiled with lcc. Similarly, icode-gcc compares icode to static code compiled with gcc, and vcode-gcc compares vcode to static code compiled with gcc.

In general, dynamic code is significantly faster than static code: speedups by a factor of two relative to the best code emitted by gcc are common. Unsurprisingly, the code produced by icode is faster than that produced by vcode, by up to 50% in some cases. Also, the GNU compiler generates better code than lcc, so the speedups relative to gcc are almost always smaller than those relative to lcc. As mentioned earlier, however, the basis for comparison should be lcc, since the code-generating functions are generated by an lcc-style back end, which does not perform static optimizations.

Dynamic code generation does not pay off in only one benchmark, unmshl. In this benchmark, `C provides functionality that does not exist in C. The static code used for comparison implements a special case of the general functionality provided by the `C code, and it is very well tuned.

5.2.2 Cross-over. Figure 26 indicates that the cost of dynamic code generation in tcc is reasonably low. The cross-over point on the vertical axis is the number of times that the dynamic code must be used in order for the total overhead of its compilation and uses to be equal to the overhead of the same number of uses of static code. This number is a measure of how quickly dynamic code "pays for itself."

Fig. 26. Cross-over points, in number of runs (log scale; one group of bars per benchmark; legend: icode-lcc, vcode-lcc, icode-gcc, vcode-gcc). If the cross-over point does not exist, the bar is omitted.

For all benchmarks except query, one use of dynamic code corresponds to one run of the dynamically created function. In query, however, the dynamic code is used as a small part of the overall algorithm: it is the test function used to determine whether a record in the database matches a particular query. As a result, in that case we define one use of the dynamic code to be one run of the search algorithm, which corresponds to many invocations (one per database entry) of the dynamic code. This methodology realistically measures how specialization is used in these cases.

In the case of unmshl, the dynamic code is slower than the static one, so the cross-over point never occurs. Usually, however, the performance benefit of dynamic code generation occurs after a few hundred or fewer runs. In some cases (ms, heap, ilp, and query), the dynamic code pays for itself after only one run of the benchmark. In ms and heap, this occurs because a reasonable problem size is large relative to the overhead of dynamic compilation, so even small improvements in run time (from strength reduction, loop unrolling, and hard-wiring pointers) outweigh the code generation overhead. In addition, ilp and query exemplify the types of applications in which dynamic code generation can be most useful: ilp benefits from extensive dynamic function inlining that cannot be performed statically, and query dynamically removes a layer of interpretation inherent in a database query language.

Figures 25 and 26 show how dynamic compilation speed can be exchanged for dynamic code quality. vcode can be used to perform fast, one-pass dynamic code generation when the dynamic code will not be used very much. However, the code generated by icode is often considerably faster than that generated by vcode: hence, icode is useful when the dynamic code is run more times, so that the code's performance is more important than the cost of generating it.

Convolution mask (pixels)   Time (seconds)                 DCG overhead (seconds)
                            lcc      gcc     tcc (icode)
3 x 3                       5.79     2.44    1.91          2.5 x 10^-3
7 x 7                       17.57    6.86    5.78          3.5 x 10^-3

Table III. Performance of convolution on an 1152x900 image in xv.

5.2.3 xv. To test the performance of tcc on a relatively large application, we modified xv to use `C. xv is a popular image manipulation package that consists of approximately 60,000 lines of code. We picked one of its image processing algorithms and changed it to make use of dynamic code generation. One algorithm is sufficient, since most of the algorithms are implemented similarly. The algorithm, Blur, applies a convolution matrix of user-defined size that consists of all 1's to the source image. The original algorithm was implemented efficiently: the values in the convolution matrix are known statically to be all 1's, so convolution at a point is simply the average of the image values of neighboring points. Nonetheless, the inner loop contains image-boundary checks based on run-time constants, and is bounded by a run-time constant, the size of the convolution matrix.
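The structure of that inner loop can be sketched as follows (in Python, not xv's actual C source; the names and the integer-average convention are ours): the mask size k and the image bounds are exactly the run-time constants that tcc hard-wires into the dynamic version, eliminating the boundary tests and fixing the loop trip counts.

```python
def blur_generic(img, w, h, k):
    # Average over a k x k neighborhood. In the static code, k and the
    # bounds w, h are run-time values, so every inner iteration pays
    # for the boundary checks below; a dynamic compiler can specialize
    # them away.
    out = [[0] * w for _ in range(h)]
    r = k // 2
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        total += img[y + dy][x + dx]
                        count += 1
            out[y][x] = total // count
    return out
```

With k fixed, the two inner loops can also be fully unrolled, which is the kind of restructuring the measurements in Table III reflect.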

Results from this experiment appear in Table III. For both a 3 x 3 and a 7 x 7 convolution mask, the dynamic code obtained using tcc and icode is approximately 3 times as fast as the static code created by lcc, and approximately 20% faster than the static code generated by gcc with all optimizations turned on. Importantly, the overhead of dynamic code generation is almost 3 orders of magnitude less than the performance benefit it provides.

xv is an example of the usefulness of dynamic code generation in the context of a well-known application program. Two factors make this result significant. First, the original static code was quite well-tuned. In addition, the tcc code generator that emits the code-generating code is derived from an lcc code generator: as a result, the default dynamic code, barring any dynamic optimizations, is considerably less well-tuned than equivalent code generated by the GNU compiler. Despite all this, the dynamic code is faster than even aggressively optimized static code, and the cost of dynamic code generation is insignificant compared to the benefit obtained.

5.3 Analysis

This section analyzes the code generation overhead of vcode and icode. vcode generates code at a cost of approximately 100 cycles per generated instruction. Most of this time is taken up by register management; just laying out the instructions in memory requires much less overhead. icode generates code roughly three to eight times more slowly than vcode. Again, much of this overhead is due to register allocation: the choice of register allocator can significantly influence icode performance.


Fig. 27. Dynamic code generation overhead using vcode (cycles per generated instruction, per benchmark; legend: code generation, environment binding).

5.3.1 vcode Overhead. Figure 27 breaks down the code generation overhead of vcode for each of the benchmarks. The vcode back end generates code at approximately 100 cycles per generated instruction: the geometric mean of the overheads for the benchmarks in this paper is 119 cycles per instruction. The cost of environment binding is small; almost all the time is spent in code generation.

The code generation overhead has several components. The breakdown is difficult to measure precisely and varies slightly from benchmark to benchmark, but there are some broad patterns:

- Laying instructions out in memory (bitwise operations to construct instructions, and stores to write them to memory) accounts for roughly 15% of the overhead.
- Dynamically allocating memory for the code, linking, delay slot optimizations, and prologue and epilogue code add approximately another 25%.
- Register management (vcode's putreg/getreg operations) accounts for about 50% of the overhead.
- Approximately 10% of the overhead is due to other artifacts, such as checks on the storage class of dynamic variables, the overhead of calling code-generating functions, etc.

These results indicate that dynamic register allocation, even in the minimal vcode implementation, is a major source of overhead. This cost is unavoidable in `C's dynamic code composition model; systems that can statically allocate registers for dynamic code should therefore have a considerable advantage over `C in terms of dynamic compile-time performance.

5.3.2 icode Overhead. Figure 28 breaks down the code generation overhead of icode for each of the benchmarks. For each benchmark we report two costs. The columns labeled L represent the overhead of using icode with linear scan register allocation based on precise live variable information. The columns labeled U represent the overhead of using icode with the simple allocator that places the variables with the highest usage counts in registers. Both algorithms are described in Section 3.3.2.2. For each benchmark and type of register allocation, we report the overhead due to environment binding, laying out the icode IR, creating the flow graph and doing some setup (allocating memory for the code, initializing vcode, etc.), performing various phases of register allocation, and finally generating code.

Fig. 28. Dynamic code generation overhead using icode (cycles per generated instruction, per benchmark; legend: code generation, live interval construction, setup/flow graph construction, IR layout, environment binding). Columns labeled L denote icode with linear scan register allocation; those labeled U denote icode with a simple allocator based on usage counts.

icode's code generation speed ranges from about 200 to 800 cycles per instruction, depending on the benchmark and the type of register allocation. The geometric mean of the overheads for the benchmarks in this paper, when using linear scan register allocation, is 615 cycles per instruction.

The allocator based on usage counts is considerably faster than linear scan, because it does not have to compute live variables and live intervals, and does not need to build a complete flow graph. By contrast, the traditional graph coloring register allocator (not shown in the figure) is generally over twice as slow as the linear scan allocator. Graph coloring is a useful reference algorithm, but it is not practical for dynamic code generation: linear scan is faster and produces code that is usually just as good. At the other extreme, the usage count allocator is faster than linear scan and often makes good allocation decisions on small benchmarks; however, it sometimes produces very poor code (for example, dfa and heap). As a result, linear scan is the default allocator for icode, and the one for which we show performance results in Figures 25 and 26.
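The usage-count policy is simple enough to state in a few lines. A sketch (Python; the variable names, counts, and register names are invented for illustration): variables are ranked by static use count and the top few get the available registers, with no liveness analysis or flow graph construction at all, which is why this allocator is so much cheaper at dynamic compile time.

```python
def usage_count_allocate(use_counts, registers):
    # Simple allocator described in the text: the variables with the
    # highest usage counts get the available registers; the rest stay
    # in memory. No live intervals or flow graph are needed.
    ranked = sorted(use_counts, key=use_counts.get, reverse=True)
    assignment = {}
    for var, reg in zip(ranked, registers):
        assignment[var] = reg
    return assignment

# Hypothetical counts for a small dynamic function, two registers free.
counts = {"i": 12, "sum": 9, "tmp": 2}
print(usage_count_allocate(counts, ["r1", "r2"]))  # {'i': 'r1', 'sum': 'r2'}
```

The weakness noted above follows directly from the sketch: a heavily used variable is kept in a register for its entire lifetime, even across regions where it is dead, which is what hurts benchmarks such as dfa and heap.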


In addition to register allocation and related liveness analyses, the main sources of overhead are flow graph construction and code generation. The latter roughly corresponds to the vcode code generation overhead. Environment binding and laying out the icode intermediate representation are relatively inexpensive operations.

6. RELATED WORK

Dynamic code generation has a long history [Keppel et al. 1991]. It has been used to increase the performance of operating systems [Bershad et al. 1995; Engler et al. 1995; Pu et al. 1995; Pu et al. 1988], windowing operations [Pike et al. 1985], dynamically typed languages [Chambers and Ungar 1989; Deutsch and Schiffman 1984; Hölzle and Ungar 1994], and simulators [Witchel and Rosenblum 1996; Veenstra and Fowler 1994]. Research on `C and tcc grew out of work on DCG [Engler and Proebsting 1994], a low-level dynamic code generation system. Earlier descriptions of the `C language and tcc have been published elsewhere [Engler et al. 1995; Poletto et al. 1997].

Other languages also provide the ability to create code at run time. For example, most Lisp dialects [Kelsey et al. 1998; Steele Jr. 1990], Tcl [Ousterhout 1994], and Perl [Wall et al. 1996] provide an "eval" operation that allows code to be generated dynamically. This approach is extremely flexible but, unfortunately, comes at a high price: since these languages are dynamically typed, little code generation cost can be pushed to compile time.

Keppel addressed some of the issues in dynamic code generation [Keppel 1991]. He developed a portable system for modifying instruction spaces on a variety of machines. His system dealt with the difficulties presented by caches and operating system restrictions, but it did not address how to select and emit actual binary instructions. Keppel, Eggers, and Henry [Keppel et al. 1993] demonstrated that dynamic code generation can be effective for several different applications.

There has been much recent work on specialization and run-time compilation in C. Unlike `C, which takes an imperative approach to expressing dynamic code generation, and requires the programmer to explicitly manipulate dynamic code objects, most of these systems adopt a declarative approach. In this model, the programmer annotates the C code with directives that identify run-time constants and possibly specify various code generation policies, such as the aggressiveness of specialization and the extent to which dynamic code is cached and reused. Dynamic code generation happens automatically in such systems.

One such system has been developed at the University of Washington [Auslander et al. 1996; Grant et al. 1997]. The first UW compiler [Auslander et al. 1996] provided a limited set of annotations and exhibited relatively poor performance. That system performs data-flow analysis to discover all derived run-time constants, given the run-time constants specified by the programmer. The second system, DyC [Grant et al. 1997], provides a more expressive annotation language and support for several features, including polyvariant division (which allows the same program point to be analyzed for different combinations of run-time invariants), polyvariant specialization (which allows the same program point to be dynamically compiled multiple times, each specialized to different values of a set of run-time invariants), lazy specialization, and interprocedural specialization. These features allow the system to achieve levels of functionality similar to `C, but in a completely different style of programming. DyC does not provide mechanisms for creating functions and function calls with dynamically determined numbers of arguments. For simple forms of specialization, DyC sometimes generates code more quickly than tcc using vcode. For more complex forms of specialization (such as generating a compiling interpreter from an interpreter), DyC is approximately as fast as tcc using icode.

Another automatic dynamic code generation system driven by user annotations is Tempo [Consel and Noël 1996]. Tempo is a template-based dynamic compiler derived from GNU CC. It is similar to DyC, but provides support for only function-level polyvariant division and specialization, and does not provide means of setting policies for division, specialization, caching, and speculative specialization. In addition, it does not support specialization across separate source files. Unlike DyC, however, it performs conservative alias and side-effect analysis to identify partially static data structures. The performance data indicates that Tempo's cross-over points tend to be slightly worse than DyC's, but the speedups are comparable, which indicates that Tempo generates code of comparable quality, but more slowly. Since Tempo does not support complex specialization mechanisms, though, its expressiveness is weaker than that of DyC and `C. The Tempo project has targeted `C as a back end for its run-time specializer.

Fabius [Leone and Lee 1996] is a dynamic compilation system based on partial evaluation that was developed in the context of a purely functional subset of ML. It uses a syntactic form of currying to allow the programmer to express run-time invariants. Given the hints regarding run-time invariants, Fabius performs dynamic compilation and optimization automatically. Fabius achieves fast code generation speeds, but `C is more flexible than Fabius. In Fabius, the user cannot directly manipulate dynamic code, and unlike in Tempo and DyC, the user has no recourse to additional annotations for controlling the code generation process. In essence, Fabius uses dynamic compilation solely for its performance advantages, extending to run time the applicability of traditional optimizations such as copy propagation and dead code elimination.

The Dynamo project [Leone and Dybvig 1997] is a successor to Fabius. Leone and Dybvig are designing a staged compiler architecture that supports different levels of dynamic optimization: emitting a high-level intermediate representation enables "heavyweight" optimizations to be performed at run time, whereas emitting a low-level intermediate representation enables only "lightweight" optimizations. The eventual goal of the Dynamo project is to build a system that will automatically perform dynamic optimization.

From a linguistic perspective, the declarative systems have the advantage that most annotations preserve the semantics of the original code, so it is possible to compile and debug a program without them. Knowing exactly where to insert the annotations, however, can still be a challenge. Also, importantly, only DyC seems to provide dynamic code generation flexibility comparable to that of `C. Furthermore, even with DyC, many common dynamic code programming tasks, such as the various lightweight compiling interpreters presented in this paper, involve writing interpreter functions no less complicated than those one would write for `C. In the end, the choice of system is probably a matter of individual taste.

From a performance perspective, declarative systems can often allow better static optimization than `C, because the control flow within dynamic code can be determined statically. Nonetheless, complicated control flow, such as loops containing conditionals, can limit this advantage. For example, in DyC, the full extent of dynamic code cannot in general be determined statically unless one performs full multi-way loop unrolling, which can cause prohibitive code growth. Finally, only Leone and Lee [Leone and Lee 1996] consistently generate code significantly more quickly than tcc; as we described above, their system provides less functionality and flexibility than `C.

7. CONCLUSION

This paper has described the design and implementation of `C, a high-level language for dynamic code generation. `C is a superset of ANSI C that provides dynamic code generation to programmers at the level of C expressions and statements. Not unlike Lisp, `C allows programmers to create and compose pieces of code at run time. It enables programmers to add dynamic code generation to existing C programs in a simple, portable, and incremental manner. Finally, the mechanisms that it provides for dynamic code generation can be mapped onto statically typed languages other than ANSI C.

tcc is a portable and freely available implementation of `C. Implementing `C demonstrated that there is an important trade-off between the speed of dynamic code generation and the quality of the generated code. As a result, tcc supports two runtime systems for dynamic code generation. The first of these, vcode, emits code in one pass and only performs local optimizations. The second, icode, builds an intermediate representation at run time and performs other optimizations, such as global register allocation, before it emits code.

We have presented several example programs that demonstrate the utility and expressiveness of `C in different contexts. `C can be used to improve the performance of database query languages, network data manipulation routines, math libraries, and many other applications. It is also well-suited for writing compiling interpreters and "just-in-time" compilers. For some applications, dynamic code generation can improve performance by almost an order of magnitude over traditional C code; speedups by a factor of two to four are not uncommon.

Dynamic code generation with tcc is quite fast. vcode dynamically generates one instruction in approximately 100 cycles; icode dynamically generates one instruction in approximately 600 cycles. In most of our examples, the overhead of dynamic code generation is recovered in under 100 uses of the dynamic code; sometimes it can be recovered within one run.

`C and tcc are practical tools for using dynamic code generation in day-to-day programming. They also provide a framework for exploring the trade-offs in the use and implementation of dynamic compilation. A release of tcc, which currently runs on MIPS and SPARC processors, is available at http://pdos.lcs.mit.edu/tickc.

ACKNOWLEDGMENTS

Vivek Sarkar was instrumental in the development of the linear scan register allocation algorithm, which grew out of his work on spill-code minimization within a single basic block. Eddie Kohler provided valuable feedback on the language and on vcode register allocation. Jonathan Litt patiently rewrote parts of xv in `C while tcc was still immature, and thus helped us find several bugs.

A. `C GRAMMAR

The grammar for `C consists of the C grammar given in Harbison and Steele's C reference manual [Harbison and Steele Jr. 1991] with the additions listed below, and the following restrictions:

- An unquoted-expression can only appear inside a backquote-expression, and cannot appear within another unquoted-expression.
- A backquote-expression cannot appear within another backquote-expression.
- cspecs and vspecs cannot be declared within a backquote-expression.

unary-expression: backquote-expression | unquoted-expression
unquoted-expression: at-expression | dollar-expression
backquote-expression: ` unary-expression | ` compound-statement
at-expression: @ unary-expression
dollar-expression: $ unary-expression
pointer: cspec type-qualifier-list_opt | vspec type-qualifier-list_opt
       | cspec type-qualifier-list_opt pointer | vspec type-qualifier-list_opt pointer

REFE RENCES

Auslander, J. , Philipos e, M., Chambers , C., Eggers , S., and B ershad, B . 1 996. Fast , e ec-

tive dyna mic co mpilation . In Proceedings of the SIG PLAN '9 6 Co nferenceonProgram ming

Langua geDesign and I mplem enta tio n. Philad elphia, PA, 14 9{1 59.

Bers had , B. N., Savage, S., Pardyak, P. , Sirer, E . G. , Fiuczynski, M. , B ecker, D ., Eggers ,

S., and Chambers, C. 199 5. Ext ensibility, safety and performance in the SPIN op erat ing

system. In Proceedings of the Fifteenth ACM Symposium o n s Principles.

Copp e r Mou nta in, CO, 267{ 284 .

Birrell, A. D. and Nelson, B . J. 198 4. Implementin g remote pro ce dure ca lls. ACM Tra nsac-

tio ns o n Co mp uter Systems 2, 1Fe b., 39 {59 .

Briggs, P. and Harvey, T. 199 4. Multiplicat ion by integ er c onsta nts.

http :// soft lib .ric e.e du/M SCP .

Chaitin, G.J., Auslander, M. A., Chandra, A. K., Coc ke, J. , Hop kins, M.E., and Mark -

stein, P . W. 19 81. Reg iste r allo cat io n via co loring. Com puter La nguages 6 , 47 {57.

Chambers, C. andUngar, D. 1 989. Custo miza tion: Op timizing c ompile r te chno logy fo r SELF,

a dyna mica lly-typ ed ob jec t-oriente d programming lang uage . I n Proceedings o f PLDI '89 .Port-

lan d, OR, 146 {16 0.

Clark, D. D. and Tennenhouse,D.L.19 90. Architec tural con side rat ions for a new g enerat ion

of p rot o co ls. I n ACM Com municatio n Arch itectures, Protocol s, and Appl ications SIGCOMM

199 0.Philade lphia, PA.

Cons el, C. and N oel, F . 1 996. A gene ra l approachfor run-timesp ecializ ation an d its applicat ion

to C. I n Proceed ings o f the 23th Annu al Symposium o n P rincip les o f P rogra mming La nguages.

St. Pet ersbu rg, FL, 145 {15 6.

Deuts ch, P . and Schiffman, A. 19 84. Ecientimplementat ion of the -80 syste m. In

Proceedings of th e 11th Annua l Symposium on Principles o f Progra mming La nguages. S alt Lake

City, UT, 29 7{30 2.

Draves, S. 199 5. L ightweight lang uage s for inte rac tivegra phics. Technica l Re p ort CMU-CS-95-

148, Carn egie Mellon University.May.

Engler, D . and Proebsting, T. 199 4. DCG : An ecient , ret arget able dyn amic c o de g enera-

tion syst em. Proceedings of the Sixth I nterna tio nal Co nferenceonArch itectura l Sup port for

Progra mming Langua ges and Operating Systems , 263{ 272 .

46  Poletto , Hsieh, Eng ler, K aas hoek

Engler, D. R. 1996. vcode: a retargetable, extensible, very fast dynamic code generation system. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation. Philadelphia, PA, 160–170. http://www.pdos.lcs.mit.edu/~engler/vcode.html.

Engler, D. R., Hsieh, W. C., and Kaashoek, M. F. 1995. `C: A language for high-level, efficient, and machine-independent dynamic code generation. In Proceedings of the 23rd Annual Symposium on Principles of Programming Languages. St. Petersburg, FL, 131–144.

Engler, D. R., Kaashoek, M. F., and O'Toole Jr., J. 1995. Exokernel: an operating system architecture for application-specific resource management. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles. Copper Mountain Resort, Colorado, 251–266.

Forsythe, G. E. 1977. Computer Methods for Mathematical Computations. Prentice-Hall, Englewood Cliffs, NJ.

Fraser, C. 1980. copt. ftp://ftp.cs.princeton.edu/pub/lcc/contrib/copt.shar.

Fraser, C. W. and Hanson, D. R. 1990. A code generation interface for ANSI C. Technical Report CS-TR-270-90, Department of Computer Science, Princeton University.

Fraser, C. W. and Hanson, D. R. 1995. A retargetable C compiler: design and implementation. Benjamin/Cummings Publishing Co., Redwood City, CA.

Fraser, C. W., Henry, R. R., and Proebsting, T. A. 1992. BURG: fast optimal instruction selection and tree parsing. SIGPLAN Notices 27, 4 (April), 68–76.

Grant, B., Mock, M., Philipose, M., Chambers, C., and Eggers, S. 1997. Annotation-directed run-time specialization in C. In Symposium on Partial Evaluation and Semantics-Based Program Manipulation. Amsterdam, The Netherlands.

Harbison, S. and Steele Jr., G. 1991. C, A Reference Manual, Third ed. Prentice Hall, Englewood Cliffs, NJ.



Hölzle, U. and Ungar, D. 1994. Optimizing dynamically-dispatched calls with run-time type feedback. In Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation. Orlando, Florida, 326–335.

Kelsey, R., Clinger, W., Rees, J., Eds., et al. 1998. Revised^5 Report on the Algorithmic Language Scheme. http://www-swiss.ai.mit.edu/~jaffer/r5rs_toc.html.

Keppel, D. 1991. A portable interface for on-the-fly instruction space modification. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. Santa Clara, CA, 86–95.

Keppel, D., Eggers, S., and Henry, R. 1991. A case for runtime code generation. TR 91-11-04, University of Washington.

Keppel, D., Eggers, S., and Henry, R. 1993. Evaluating runtime-compiled value-specific optimizations. TR 93-11-02, Department of Computer Science and Engineering, University of Washington.

Leone, M. and Dybvig, R. K. 1997. Dynamo: A staged compiler architecture for dynamic program optimization. Tech. Rep. 490, Indiana University Computer Science Department. Sept.

Leone, M. and Lee, P. 1996. Optimizing ML with run-time code generation. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation. Philadelphia, PA, 137–148.

Ousterhout, J. 1994. Tcl and the Tk Toolkit. Addison-Wesley Professional Computing Series. Addison-Wesley, Reading, MA.

Pike, R., Locanthi, B., and Reiser, J. 1985. Hardware/software trade-offs for bitmap graphics on the Blit. Software: Practice and Experience 15, 2 (Feb.), 131–151.

Poletto, M., Engler, D. R., and Kaashoek, M. F. 1997. tcc: A system for fast, flexible, and high-level dynamic code generation. In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation. Las Vegas, NV, 109–121.

Poletto, M. and Sarkar, V. 1998. Linear scan register allocation. ACM Transactions on Programming Languages and Systems. To appear.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. 1992. Numerical Recipes in C, Second ed. Cambridge University Press, Cambridge, UK.


Pu, C., Autry, T., Black, A., Consel, C., Cowan, C., Inouye, J., Kethana, L., Walpole, J., and Zhang, K. 1995. Optimistic incremental specialization: streamlining a commercial operating system. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles. Copper Mountain, CO.

Pu, C., Massalin, H., and Ioannidis, J. 1988. The Synthesis kernel. Computing Systems 1, 1, 11–32.

SPARC International. 1994. SPARC Architecture Manual Version 9. SPARC International, Englewood Cliffs, New Jersey.

Steele Jr., G. 1990. Common Lisp, Second ed. Digital Press, Burlington, MA.

Thekkath, C. A. and Levy, H. M. 1993. Limits to low-latency communication on high-speed networks. ACM Transactions on Computer Systems 11, 2 (May), 179–203.

Veenstra, J. and Fowler, R. 1994. MINT: a front end for efficient simulation of shared-memory multiprocessors. In Modeling and Simulation of Computers and Telecommunications Systems. Durham, NC.

Wall, L., Christiansen, T., and Schwartz, R. 1996. Programming Perl. O'Reilly & Associates, Sebastopol, CA.

Witchel, E. and Rosenblum, M. 1996. Embra: Fast and flexible machine simulation. In Proceedings of ACM SIGMETRICS '96 Conference on Measurement and Modeling of Computer Systems. Philadelphia, PA, 68–79.

Received October 1997; revised May 1998; accepted June 1998.