`C and tcc: A Language and Compiler for Dynamic
Code Generation

Massimiliano Poletto
Laboratory for Computer Science, Massachusetts Institute of Technology
and
Wilson C. Hsieh
Department of Computer Science, University of Utah
and
Dawson R. Engler
Laboratory for Computer Science, Massachusetts Institute of Technology
and
M. Frans Kaashoek
Laboratory for Computer Science, Massachusetts Institute of Technology
Dynamic code generation allows programmers to use run-time information in order to achieve
performance and expressiveness superior to those of static code. The `C ("Tick C") language is
a superset of ANSI C that supports efficient and high-level use of dynamic code generation. `C
provides dynamic code generation at the level of C expressions and statements, and supports the
composition of dynamic code at run time. These features enable programmers to add dynamic
code generation to existing C code incrementally, and to write important applications such as
"just-in-time" compilers easily. The paper presents many examples of how `C can be used to
solve practical problems.

The tcc compiler is an efficient, portable, and freely available implementation of `C. tcc allows
programmers to trade dynamic compilation speed for dynamic code quality: in some applications,
it is most important to generate code quickly, while in others code quality matters more than
compilation speed. The overhead of dynamic compilation is on the order of 100 to 600 cycles per
generated instruction, depending on the level of dynamic optimization. Measurements show that
the use of dynamic code generation can improve performance by almost an order of magnitude;
two- to four-fold speedups are common. In most cases, the overhead of dynamic compilation is
recovered in under 100 uses of the dynamic code; sometimes it can be recovered within one use.
Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications - specialized application languages; D.3.3 [Programming Languages]: Language Constructs and Features; D.3.4 [Programming Languages]: Processors - compilers; code generation; run-time environments

General Terms: Algorithms, Languages, Performance

Additional Key Words and Phrases: compilers, dynamic code generation, dynamic code optimization, ANSI C
Email: [email protected], [email protected], [email protected], [email protected]. Laboratory
for Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139. The
second author can be reached at: University of Utah, Computer Science, 50 S Central Campus
Drive, Room 3190, Salt Lake City, UT 84112-9205.
This research was supported in part by the Advanced Research Projects Agency under contracts
N00014-94-1-0985 and N66001-96-C-8522, and by a NSF National Young Investigator Award.
2 · Poletto, Hsieh, Engler, Kaashoek
1. INTRODUCTION
Dynamic code generation, the generation of executable code at run time, enables
the use of run-time information to improve code quality. Information about
run-time invariants provides new opportunities for classical optimizations such as
strength reduction, dead-code elimination, and inlining. In addition, dynamic code
generation is the key technology behind just-in-time compilers, compiling inter-
preters, and other components of modern mobile code and other adaptive systems.
`C is a superset of ANSI C that supports the high-level and efficient use of
dynamic code generation. It extends ANSI C with a small number of constructs
that allow the programmer to express dynamic code at the level of C expressions
and statements, and to compose arbitrary dynamic code at run time. These features
enable programmers to write complex imperative code-manipulation programs in a
style similar to Lisp [Steele Jr. 1990], and make it relatively easy to write powerful
and portable dynamic code. Furthermore, since `C is a superset of ANSI C, it is
not difficult to improve the performance of code incrementally by adding dynamic
code generation to existing C programs.
`C's extensions to C (two type constructors, three unary operators, and a few
special forms) allow dynamic code to be type-checked statically. Much of the
overhead of dynamic compilation can therefore be incurred statically, which im-
proves the efficiency of dynamic compilation. While these constructs were designed
for ANSI C, it should be straightforward to add analogous constructs to other
statically typed languages.
tcc is an efficient and freely available implementation of `C, consisting of a front
end, back ends that compile to C and to MIPS and SPARC assembly, and two
runtime systems. tcc allows the user to trade dynamic code quality for dynamic
code generation speed. If compilation speed must be maximized, dynamic code
generation and register allocation can be performed in one pass; if code quality
is most important, the system can construct and optimize an intermediate repre-
sentation prior to code generation. The overhead of dynamic code generation is
approximately 100 cycles per generated instruction when tcc only performs simple
dynamic code optimization, and approximately 600 cycles per generated instruction
when all of tcc's dynamic optimizations are turned on.
This paper makes the following contributions:

- It describes the `C language, and motivates the design of the language.
- It describes tcc, with special emphasis on its two runtime systems, one tuned for
  code quality and the other for fast dynamic code generation.
- It presents an extensive set of `C examples, which illustrate the utility of dynamic
  code generation and the ease of use of `C in a variety of contexts.
- It analyzes the performance of tcc and of tcc-generated dynamic code on several
  benchmarks. Measurements show that use of dynamic compilation can improve
  performance by almost an order of magnitude in some cases, and generally results
  in two- to four-fold speedups. The overhead of dynamic compilation is usually
  recovered in under 100 uses of the dynamic code; sometimes it can be recovered
  within one use.
The rest of this paper is organized as follows. Section 2 describes `C, and Sec-
tion 3 describes tcc. Section 4 illustrates several sample applications of `C. Section 5
presents performance measurements. Finally, we discuss related work in Section 6,
and summarize our conclusions in Section 7. Appendix A describes the `C exten-
sions to the ANSI C grammar.
2. THE `C LANGUAGE

The `C language was designed to support easy-to-use dynamic code generation in a
systems and applications programming environment. This requirement motivated
some of the key features of the language:

- `C is a small extension of ANSI C: it adds very few constructs (two type
  constructors, three unary operators, and a few special forms) and leaves the rest
  of the language intact. As a result, it is possible to convert existing C code to `C
  incrementally.
- Dynamic code in `C is statically typed. This is consistent with C, and improves
  the performance of dynamic compilation by eliminating the need for dynamic
  type-checking. The same constructs used to extend C with dynamic code gener-
  ation should be applicable to other statically typed languages.
- The dynamic compilation process is imperative: the `C programmer directs the
  creation and composition of dynamic code. This approach distinguishes `C from
  several recent declarative dynamic compilation systems [Auslander et al. 1996;
  Grant et al. 1997; Consel and Noël 1996]. We believe that the imperative ap-
  proach is better suited to a systems environment, where the programmer wants
  tight control over dynamically generated code.
In `C, a programmer creates code specifications, which are static descriptions of
dynamic code. Code specifications can capture the values of run-time constants,
and they may be composed at run time to build larger specifications. They are
compiled at run time to produce executable code. The process works as follows:

(1) At static compile time, a `C compiler processes the written program. For each
    code specification, the compiler generates code to capture its run-time environ-
    ment, as well as code to generate dynamic code.
(2) At run time, code specifications are evaluated. The specification captures its
    run-time environment at this time, which we call environment binding time.
(3) At run time, code specifications are passed to the `C compile special form. com-
    pile invokes the specification's code generator, and returns a function pointer.
    We call this point in time dynamic code generation time.
For example, the following code fragment implements "Hello World" in `C:

void make_hello(void) {
  void (*f)() = compile(`{ printf("hello, world\n"); }, void);
  f();
}
The code within the backquote and braces is a code specification for a call to
printf that should be generated at run time. The code specification is evaluated at
environment binding time, and the resulting object is then passed to compile, which
generates executable code for the printf and returns a function pointer at dynamic
code generation time. The function pointer can then be invoked directly.

The rest of this section describes in detail the dynamic code generation extensions
introduced in `C. Section 2.1 describes the ` operator. Section 2.2 describes the
type constructors in `C. Section 2.3 describes the unquoting operators, @ and $.
Section 2.4 describes the `C special forms.
2.1 The ` Operator

The ` (backquote, or "tick") operator is used to create dynamic code specifications
in `C. ` can be applied to an expression or compound statement, and indicates that
code corresponding to that expression or statement should be generated at run time.
`C disallows the dynamic generation of code-generating code, so ` does not nest. In
other words, a code specification cannot contain a nested code specification. Some
simple usages of backquote are as follows:
/* Specification of dynamic code for the expression "4": results
   in dynamic generation of code that produces 4 as a value */
`4

/* Specification of dynamic code for a call to printf;
   j must be declared in an enclosing scope */
`printf("%d", j)

/* Specification of dynamic code for a compound statement */
`{ int i; for (i = 0; i < 10; i++) printf("%d\n", i); }
Dynamic code is lexically scoped: variables in static code can be referenced in
dynamic code. Lexical scoping and static typing allow type-checking and some
instruction selection to occur at static compile time, decreasing dynamic code gen-
eration overhead.
The value of a variable after its scope has been exited is undefined, just as in
ANSI C. In contrast to ANSI C, however, not all uses of variables outside their scope
can be detected statically. For example, one may use a local variable declared in
static code from within a backquote expression, and then return the value of the
code specification. When the code specification is compiled, the resulting code
references a memory location that no longer contains the local variable, because the
original function's activation record has gone away. The compiler could perform a
data-flow analysis to conservatively warn the user of a potential error; however, in
our experience this situation arises very rarely, and is easy to avoid.
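For concreteness, the hazard can be sketched as follows (a hypothetical `C fragment of our own, not an example from the paper):

```
int cspec make_code(void) {
  int i = 5;      /* local variable in static code */
  return `i;      /* the cspec captures the stack location of i... */
}                 /* ...but i's activation record disappears here */

/* compile(make_code(), int) produces code that reads a stale stack slot. */
```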
The use of several C constructs is restricted within backquote expressions. In
particular, a break, continue, case, or goto statement cannot be used to transfer
control outside the enclosing backquote expression. For instance, the destination
label of a goto statement must be contained in the same backquote expression
that contains the goto. `C provides other means for transferring control between
backquote expressions; we discuss these methods in Section 2.4. The limitation
on goto and other control-transfer statements enables a `C compiler to statically
determine whether a control-flow change is legal. The use of return is not restricted,
because dynamic code is always implicitly inside a function.
A backquote expression can be dynamically compiled using the compile special
form, which is described in Section 2.4. compile returns a function pointer, which
can then be invoked like any other function pointer.
2.2 Type Constructors

`C introduces two new type constructors, cspec and vspec. cspecs are static types
for dynamic code; their presence allows dynamic code to be type-checked statically.
vspecs are static types for dynamic lvalues (expressions that may be used on the
left-hand side of an assignment); their presence allows dynamic code to allocate
lvalues as needed.

A cspec or vspec has an associated evaluation type, which is the type of the
dynamic value of the specification. The evaluation type is analogous to the type to
which a pointer points.
2.2.1 cspec Types. cspec (short for code specification) is the type of a dynamic
code specification; the evaluation type of the cspec is the type of the dynamic value
of the code. For example, the type of the expression `4 is int cspec. The type void
cspec is the type of a generic cspec, analogous to the use of void * as a generic
pointer.

Applying ` to a statement or compound statement yields an expression of type
void cspec. In particular, if the dynamic statement or compound statement contains
a return statement, the type of the return value does not affect the type of the
backquote expression. Since all type-checking is performed statically, it is possible
to compose backquote expressions to create a function at run time with multiple,
possibly incompatible, return types. This deficiency in the type system is a design
choice: such errors are rare in practice, and checking for them would involve more
overhead at dynamic compile time or additional linguistic extensions.

The code generated by ` may include implicit casts used to reconcile the result
type of ` with its use; the standard conversion rules of ANSI C apply.

Some simple uses of cspec follow:
Some simp le uses of csp ec foll ow:
intcsp ec expr1 = `4; / Code spec i catio n f or expressio n ` `4 '' /
oat x;
oat csp ec expr2=`x+4.; / C apturefree vari abl e x: its value wi l l be
bound wh en th e dyna mic cod e is executed /
/ Al l dyna mic compound statements h ave type voi d cspec, regard less of
wh ether the resulting code w il l return a val ue /
void cspec stmt=`f printf \hello,worldnn"; ret urn 0; g;
2.2.2 vspec Types. vspec (variable specification) is the type of a dynamically gen-
erated lvalue: a variable whose storage class (whether it should reside in a register
or on the stack, and in what location exactly) is determined dynamically. The
evaluation type of the vspec is the type of the lvalue. void vspec is used as a generic
vspec type. Objects of type vspec may be created by invoking the special forms
param and local. param is used to create a parameter for the function currently un-
der construction; local is used to reserve space in its activation record or allocate
a register, if possible. See Section 2.4 for more details on these special forms.
In general, an object of type vspec is automatically treated as a variable of the
vspec's evaluation type when it appears inside a cspec. A vspec inside a backquote
expression can thus be used like a traditional C variable, both as an lvalue and
an rvalue. For example, the following function creates a cspec that takes a single
integer argument, adds one to it, and returns the result:
void cspec plus1(void) {
  /* param takes the type and position of the argument to be generated */
  int vspec i = param(int, 0);
  return `{ return i + 1; };
}
vspecs allow us to construct functions that take a run-time-determined number
of arguments; this functionality is necessary in applications such as the compiling
interpreter described in Section 4.4.2.
2.2.3 Discussion. Within a quoted expression, vspecs and cspecs can be passed
to functions that expect their evaluation types. The following code is legal:
int f(int j);

void main() {
  int vspec v;
  /* ... initialize v to a dynamic lvalue using local or param ... */
  void cspec c1 = `{ f(v); };
}
Within the quoted expression, f expects an integer. Since this function call is
evaluated during the execution of the dynamic code, the integer lvalue to which v
refers will already have been created. As a result, v can be passed to f like any
other integer.
2.3 Unquoting Operators

The ` operator allows a programmer to create code at run time. In this section we
describe two operators, @ and $, that are used within backquote expressions. @
is used to compose cspecs dynamically. $ is used to instantiate values as run-time
constants in dynamic code. These two operators "unquote" their operands: their
operands are evaluated at environment binding time.
2.3.1 The @ Operator. The @ operator allows code specifications to be composed
into larger specifications. @ can only be applied inside a backquote expression, and
its operand must be a cspec or vspec. @ "dereferences" its operand at environment
binding time: it returns an object whose type is the evaluation type of @'s operand.
The returned object is incorporated into the cspec in which the @ occurs. For
example, in the following fragment, c is the additive composition of two cspecs:
/* Compose c1 and c2. Evaluation of c yields "9". */
int cspec c1 = `4, cspec c2 = `5;
int cspec c = `(@c1 + @c2);
Statements can be composed through concatenation:

/* Concatenate two null statements. */
void cspec s1 = `{}, cspec s2 = `{};
void cspec s = `{ @s1; @s2; };
Applying @ (inside a backquote expression) to a function that must return a
cspec or a vspec causes the function to be called at environment binding time. Its
result is incorporated into the backquote expression.
In order to improve the readability of code composition, `C provides some implicit
coercions of vspecs and cspecs, so that the @ operator may be omitted in several
situations. An expression of type vspec or cspec that appears inside a quoted ex-
pression is coerced (with an implicit @) to an object of its corresponding evaluation
type under the following conditions:

(1) the expression is not inside an unquoted expression.
(2) the expression is not being used as a statement.

The first restriction also includes implicitly unquoted expressions: that is, expres-
sions that occur within an implicitly coerced expression are not implicitly coerced.
For example, the arguments in a call to a function returning type cspec or vspec
are not coerced, because the function call itself already is.
These coercions do not limit the expressiveness of `C, because `C supports only
one "level" of dynamic code: it does not support dynamic code that generates more
dynamic code. Therefore, the ability to manipulate vspecs and cspecs in dynamic
code is not useful.
These implicit coercions simplify the syntax of cspec composition. Consider the
following example:

int cspec a = `4; int cspec b = `5;
int cspec sum = `(a + b);
int cspec sum_of_sum = `(sum + sum);

This code is equivalent to the following code, due to the implicit coercion of a,
b, and sum.

int cspec a = `4; int cspec b = `5;
int cspec sum = `(@a + @b);
int cspec sum_of_sum = `(@sum + @sum);

Compiling sum_of_sum results in dynamic code equivalent to 4 + 5 + 4 + 5.
Statements and compound statements are considered to have type void; an
object of type void cspec inside a backquote expression cannot be used inside an
expression, but can be composed as an expression statement:

void cspec hello = `{ printf("hello "); };
void cspec world = `{ printf("world\n"); };
void cspec greeting = `{ @hello; @world; };

void cspec mkscale(int **m, int n, int s) {
  return `{
    int i, j;
    for (i = 0; i < $n; i++) {        /* Loop can be dynamically unrolled */
      int *v = ($m)[i];
      for (j = 0; j < $n; j++)
        v[j] = v[j] * $s;             /* Multiplication can be strength-reduced */
    }
  };
}

Fig. 1. `C code to specialize multiplication of a matrix by an integer.
2.3.2 The $ Operator. The $ operator allows run-time values to be incorporated
as run-time constants in dynamic code. $ evaluates its operand at environment
binding time; the resulting value is used as a run-time constant in the containing
cspec. $ may only appear inside a backquote expression, and it may not be un-
quoted. It may be applied to any object not of type cspec or vspec. The use of $ is
illustrated in the code fragment below.
int x = 1;
void cspec c = `{ printf("$x = %d, x = %d\n", $x, x); };
x = 14;
compile(c, void)();   /* Compile and run: will print "$x = 1, x = 14". */
Use of $ enables specialization of code based on run-time constants. An example of
this is the program in Figure 1, which specializes multiplication of a matrix by an
integer. The pointer to the matrix, the size of the matrix, and the scale factor are
all run-time constants, which enables optimizations such as dynamic loop unrolling
and strength reduction of multiplication.
2.3.3 Discussion. Within an unquoted expression, vspecs and cspecs cannot be
passed to functions that expect their evaluation types. The following code is illegal:
void cspec f(int j);
int g(int j);

void main() {
  int vspec v;
  void cspec c1 = `{ @f(v); };  /* error: v is the wrong type */
  void cspec c2 = `{ $g(v); };  /* error: v is the wrong type */
}
The storage class of a variable declared within the scope of dynamic code is de-
termined dynamically. Therefore, a variable of type T that is local to a backquote
expression must be treated as type T vspec when used in an unquoted expression.

Category                     Name        Synopsis
Management of dynamic code   compile     T (*compile(void cspec code, T))()
                             free_code   void free_code(T)
Dynamic variables            local       T vspec local(T)
Dynamic function arguments   param       T vspec param(T, int param-num)
                             push_init   void cspec push_init(void)
                             push        void push(void cspec args, T cspec next-arg)
Dynamic control flow         label       void cspec label()
                             jump        void cspec jump(void cspec target)
                             self        T self(T, other-args...)

Table I. The `C special forms. T denotes a type.

The example below illustrates this behavior.
void cspec f(int vspec j);  /* f must take an int vspec... */

void main() {
  void cspec c = `{ int v; @f(v); };  /* ...because v is a dynamic local */
}
2.4 Special Forms

`C extends ANSI C with several special forms. Most of these special forms take
types as arguments, and their result types sometimes depend on their input types.
The special forms can be broken into four categories, as shown in Table I.
2.4.1 Management of Dynamic Code. The compile and free_code special forms
are used to create executable code from code specifications and to deallocate the
storage associated with a dynamic function, respectively. compile generates machine
code from code, and returns a pointer to a function returning type T. It also
automatically reclaims the storage for all existing vspecs and cspecs: as described
in Section 3, vspecs and cspecs are objects that track necessary pieces of program
state at environment binding and dynamic code generation time, so they are no
longer needed and can be reclaimed after the dynamic code has been generated.
Some other dynamic compilation systems [Leone and Lee 1996; Consel and Noël
1996] memoize dynamic code fragments; in the case of `C, the low overhead required
to create code specifications, their generally small size, and their susceptibility
to changes in the run-time environment make memoization of cspecs and vspecs
unattractive.
free_code takes as argument a function pointer to a dynamic function previously
created by compile, and reclaims the memory for that function. In this way, it
is possible to reclaim the memory consumed by dynamic code when the code is
no longer necessary. The programmer can use compile and free_code to explicitly
manage memory used for dynamic code, similarly to how one normally uses malloc
and free.
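A minimal sketch of this lifecycle (`C code requiring tcc; the function body is our own illustration):

```
int (*f)() = compile(`{ return 42; }, int);  /* dynamic code generation */
int r = f();                                 /* use the dynamic function */
free_code(f);                                /* reclaim it, as with free() after malloc() */
```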
/* Construct cspec to sum n integer arguments. */
void cspec construct_sum(int n) {
  int i, cspec c = `0;
  for (i = 0; i < n; i++) {
    int vspec v = param(int, i);   /* Create a parameter */
    c = `(c + v);                  /* Add param 'v' to current sum */
  }
  return `{ return c; };
}

int cspec construct_call(int nargs, int *arg_vec) {
  int (*sum)() = compile(construct_sum(5), int);
  void cspec args = push_init();   /* Initialize argument list */
  int i;
  for (i = 0; i < nargs; i++)      /* For each arg in arg_vec... */
    push(args, `$arg_vec[i]);      /* push it onto the args stack */
  return `sum(args);
}

Fig. 2. `C allows programmers to construct functions with dynamic numbers of arguments. con-
struct_sum creates a function that takes n arguments and adds them. construct_call creates a cspec
that invokes a dynamic function: it initializes an argument stack by invoking push_init, and dy-
namically adds arguments to this list by calling push. `C allows an argument list (an object of
type void cspec) to be used as a single argument in a call: `sum(args) calls sum using the argument
list denoted by args.
2.4.2 Dynamic Variables. The local special form is a mechanism for creating local
variables in dynamic code. The objects it creates are analogous to local variables
declared in the body of a backquote expression, but they can be used across back-
quote expressions, rather than being restricted to the scope of one expression. In
addition, local enables dynamic code to have an arbitrary number of local variables.
local returns an object of type T vspec that denotes a dynamic local variable of type
T in the current dynamic function. In `C, the type T may include one of two C
storage-class specifiers, auto and register: the former indicates that the variable
should be allocated on the stack, while the latter is a hint to the compiler that the
variable should be placed in a register, if possible.
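As an illustrative sketch (a hypothetical `C fragment of our own), a local created with a register hint can be shared across separately constructed cspecs, unlike a variable declared inside a single backquote expression:

```
int vspec t = local(register int);       /* dynamic local; register hint */
void cspec zero  = `{ t = 0; };
void cspec twice = `{ t = t + 1; t = t + 1; };
void cspec body  = `{ @zero; @twice; };  /* t holds 2 after body runs */
```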
2.4.3 Dynamic Function Arguments. The param special form is used to create
parameters of dynamic functions. param returns an object of type T vspec that
denotes a formal parameter of the current dynamic function. param-num is the pa-
rameter's position in the function's parameter list, whereas T denotes its evaluation
type. As illustrated in Figure 2, param can be used to create a function that has
the number of its parameters determined at run time. Figure 3 shows how param
can be used to curry functions.
Whereas param serves to create the formal parameters of a dynamic function,
push_init and push are used together to dynamically build argument lists for function
calls. push_init returns a cspec that corresponds to a new, initially empty, dynamic
argument list. push adds the code specification for the next argument, next-arg, to
the dynamically generated list of arguments, args. T, the evaluation type of next-
arg, may not be void. These two special forms allow the programmer to create
function calls that pass a dynamically determined number of arguments to the
invoked function. Figure 2 illustrates their use.

typedef int (*write_ptr)(char *, int);

/* Create a function that calls "write" with "tcb" hardwired as its first argument. */
write_ptr mkwrite(struct tcb *tcb) {
  char *vspec msg = param(char *, 0);
  int vspec nbytes = param(int, 1);
  return compile(`{ return write($tcb, msg, nbytes); }, int);
}

Fig. 3. `C can be used to curry functions by creating function parameters dynamically. In this
example, this functionality allows a network connection control block to be hidden from clients,
but still enables operations on the connection (write, in this case) to be parameterized with
per-connection data.
2.4.4 Dynamic Control Flow. For error-checking purposes, `C forbids goto state-
ments from transferring control outside the enclosing backquote expression. Two
special forms, label and jump, are used for inter-cspec control flow: jump returns
the cspec of a jump to its argument, target. target may be any object of type void
cspec. label simply returns a void cspec that may be used as the destination of a
jump. Syntactic sugar allows jump(target) to be written as jump target. Section 4.1.5
presents example `C code that uses label and jump to implement specialized finite-
state machines.
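As a rough sketch (our own hypothetical fragment, not the paper's Section 4.1.5 code), a label can be placed in a cspec by composing it with @, and jump then creates the back edge:

```
int vspec i = local(int);
void cspec head = label();   /* destination for the back edge */
void cspec loop = `{
  @head;                     /* place the label in the code stream */
  i = i + 1;
  if (i < 10) jump head;     /* cspec for a jump, composed as a statement */
};
```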
Lastly, self allows recursive calls in dynamic code without incurring the overhead
of dereferencing a function pointer. T denotes the return type of the function
that is being dynamically generated. Invoking self results in a call to the function
that contains the invocation, with other-args passed as the arguments. self is just
like any other function call, except that the return type of the dynamic function is
unknown at environment binding time, so it must be provided as the first argument.
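For instance (a hypothetical sketch of our own, not an example from the paper), a dynamic factorial function can recurse through self instead of through a function pointer:

```
int (*mkfact(void))() {
  int vspec n = param(int, 0);
  return compile(`{
    if (n <= 1) return 1;
    return n * self(int, n - 1);   /* call the function under construction */
  }, int);
}
```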
3. THE tcc COMPILER

The implementation of tcc was driven by two goals: high-quality dynamic code and
low dynamic compilation overhead. `C allows the user to compose arbitrary pieces
of code dynamically, which reduces the effectiveness of static analysis. As a result,
many optimizations on dynamic code in `C can only be performed at run time:
improvements in code quality require more dynamic code generation time. The
rest of this section discusses how tcc handles this tradeoff. Section 3.1 describes the
structure of tcc, Section 3.2 gives an overview of the dynamic compilation process,
and Section 3.3 discusses in detail some of the machinery that tcc uses to generate
code at run time.
3.1 Architecture

The tcc compiler is based on lcc [Fraser and Hanson 1995; 1990], a portable compiler
for ANSI C. lcc performs common subexpression elimination within extended basic
blocks, and uses lburg [Fraser et al. 1992] to find the lowest-cost implementation of
a given IR-level construct. Otherwise, it performs few optimizations.

Figure 4 illustrates the interaction of static and dynamic compilation in tcc. All
parsing and semantic checking of dynamic expressions occurs at static compile time.
[Figure 4 flowchart: at static compile time, the tcc front end splits `C source into
static code and code specifications; the static and dynamic back ends compile these
into executable static code, executable code to bind environments, and executable
code to create dynamic code. At run time, code specifications are evaluated into
closures (environment binding), compile() is invoked (dynamic code generation),
and the resulting executable dynamic code produces the answer.]

Fig. 4. Overview of the tcc compilation process.
Semantic checks are performed at the level of dynamically generated expressions.
For each cspec, tcc performs internal type checking. It also tracks goto statements
and labels to ensure that a goto does not transfer control outside the body of the
containing cspec.
Unlike traditional static compilers, tcc uses two types of back ends to generate
code. One is the static back end, which compiles the non-dynamic parts of `C
programs, and emits either native assembly code or C code suitable for compilation
by an optimizing compiler. The other, referred to as the dynamic back end, emits
C code to generate dynamic code. Once produced by the dynamic back end, this
C code is in turn compiled by the static back end.
tcc provides two dynamic code generation runtime systems so as to trade off code
generation speed for dynamic code quality. The first of these runtime systems is
vcode [Engler 1996]. vcode provides an interface resembling that of an idealized
load/store RISC architecture; each instruction in this interface is a C macro which
emits the corresponding instruction or series of instructions for the target architecture.
vcode's key feature is that it generates code with low run-time overhead:
as few as ten instructions per generated instruction in the best case. While vcode
generates code quickly, it only has access to local information about backquote
expressions: the quality of its code could often be improved. The second runtime
system, icode, makes a different tradeoff, and produces better code at the expense
of additional dynamic compilation overhead. Rather than emit code in one pass, it
builds and optimizes an intermediate representation prior to code generation.
lcc is not an optimizing compiler. The assembly code emitted by its traditional
static back ends is usually significantly slower (even three or more times slower)
than that emitted by optimizing compilers such as gcc or vendor C compilers. To
improve the quality of static code emitted by tcc, we have implemented a static
back end that generates ANSI C from `C source; this code can then be compiled by
any optimizing compiler. lcc's traditional back ends can thus be used when static
compilation must be fast (i.e., during development), and the C back end can be
used when the performance of the code is critical.
3.2 The Dynamic Compilation Process
As described in Section 2, the creation of dynamic code can be divided into three
phases: static compilation, environment binding, and dynamic code generation.
This section describes how tcc implements these three phases.
3.2.1 Static Compile Time. During static compilation, tcc compiles the static
parts of a `C program just like a traditional C compiler. It compiles each dynamic
part (each backquote expression) to a code-generating function (CGF), which is
invoked at run time to generate code for dynamic expressions.

In order to minimize the overhead of dynamic compilation, tcc performs as much
instruction selection as possible statically. When using vcode, both instruction
selection based on operand types and cspec-local register allocation are done
statically. Additionally, the intermediate representation of each backquote expression is
processed by the common subexpression elimination and other local optimizations
performed by the lcc front end. tcc also uses copt [Fraser 1980] to perform static
peephole optimizations on the code-generating macros used by CGFs.
Not all register allocation and instruction selection can occur statically when
using vcode. For instance, it is not possible to determine statically what vspecs or
cspecs will be incorporated into other cspecs when the program is executed. Hence,
allocation of dynamic lvalues (vspecs) and of results of composed cspecs must be
performed dynamically. The same is true of variables or temporaries that live
across references to other cspecs. Each read or write to one of these dynamically
determined lvalues is enclosed in a conditional in the CGF: different code is emitted
at run time, depending on whether the object is dynamically allocated to a register
or to memory. Since the process of instruction selection is encoded in the body of
the code-generating function, it is inexpensive.
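Concretely, the per-access conditional in a CGF can be sketched as follows. This is an illustrative model, not tcc's actual code: the struct dynloc type and the emit-style helper are hypothetical names, and "emitting" is modeled here as formatting assembly text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: a CGF tests, at dynamic compile time, whether a
 * dynamically determined lvalue landed in a register or in memory, and
 * emits different code accordingly. */
struct dynloc {
    int in_register;   /* nonzero if the lvalue was allocated a register */
    int reg;           /* register number, if in_register */
    int stack_offset;  /* stack slot offset, otherwise */
};

/* "Emit" (here, format as text) a load of the lvalue into register dst. */
static void emit_load(const struct dynloc *v, int dst, char *buf, size_t len)
{
    if (v->in_register)
        snprintf(buf, len, "mov r%d, r%d", dst, v->reg);
    else
        snprintf(buf, len, "ld r%d, [sp+%d]", dst, v->stack_offset);
}
```

Because this test runs once per emitted access, at dynamic compile time, its cost is small compared to a table-driven instruction selector.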
When using icode, tcc does not precompute as much information about dynamic
code generation. Rather than emitting code directly, the icode macros first build
up a simple intermediate representation; the icode runtime system then analyzes
this representation to allocate registers and perform other optimizations before
emitting code.
State for dynamic code generation is maintained in CGFs and in dynamically
allocated closures. Closures are data structures that store five kinds of necessary
information about the run-time environment of a backquote expression: (1) a function
pointer to the corresponding statically generated CGF; (2) information about
inter-cspec control flow (i.e., whether the backquote expression is the destination
of a jump); (3) the values of run-time constants bound via the $ operator; (4) the
addresses of free variables; (5) pointers to the run-time representations of the cspecs
and vspecs used inside the backquote expression. Closures are necessary to reason
about composition and out-of-order specification of dynamic code.

    cspec_t i = (closure0 = (closure0_t) alloc_closure(4),
                 closure0->cgf = cgf0,        /* code gen func */
                 (cspec_t) closure0);
    cspec_t c = (closure1 = (closure1_t) alloc_closure(16),
                 closure1->cgf = cgf1,        /* code gen func */
                 closure1->cs_i = i,          /* nested cspec */
                 closure1->rc_j = j,          /* run-time const */
                 closure1->fv_k = &k,         /* free variable */
                 (cspec_t) closure1);

Fig. 5. Sample closure assignments.
For each backquote expression, tcc statically generates both its code-generating
function and the code to allocate and initialize closures. A new closure is initialized
each time a backquote expression is evaluated. Cspecs are represented by pointers
to closures.

For example, consider the following code:

    int j, k;
    int cspec i = `5;
    void cspec c = `{ return i + $j * k; };

tcc implements the assignments to these cspecs by assignments to pointers to
closures, as illustrated in Figure 5. i's closure contains only a pointer to its code-generating
function. c has more dependencies on its environment, so its closure
also stores other information.
Simplified code-generating functions for these cspecs appear in Figure 6. cgf0
allocates a temporary storage location, generates code to store the value 5 into
it, and returns the location. cgf1 must do a little more work: the code that it
generates loads the value stored at the address of free variable k into a register,
multiplies it by the value of the run-time constant j, adds this to the dynamic value
of i, and returns the result. Since i is a cspec, the code for "the dynamic value of i"
is generated by calling i's code-generating function.
3.2.2 Run Time. At run time, the code that initializes closures and the code-generating
functions run to create dynamic code. As illustrated in Figure 4, this
process consists of two parts: environment binding and dynamic code generation.

3.2.2.1 Environment Binding. During environment binding, code such as that in
Figure 5 builds a closure that captures the environment of the corresponding backquote
expression. Closures are heap-allocated, but their allocation cost is greatly
reduced (down to a pointer increment, in the normal case) by using arenas [Forsythe
1977].
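The arena discipline can be sketched in a few lines of C. This is a generic illustration under our own naming, not tcc's implementation:

```c
#include <stdlib.h>

/* Generic arena allocator sketch (not tcc's code): the common case is a
 * pointer increment; freeing releases every chunk at once. */
#define ARENA_CHUNK 4096

struct chunk {
    struct chunk *prev;
    char data[ARENA_CHUNK];
};

struct arena {
    char *next;            /* next free byte in the current chunk */
    char *limit;           /* one past the end of the current chunk */
    struct chunk *chunks;  /* chunk list, for bulk deallocation */
};

static void *arena_alloc(struct arena *a, size_t n)
{
    void *p;
    n = (n + 7) & ~(size_t)7;                   /* 8-byte alignment */
    if (a->next == NULL || a->next + n > a->limit) {
        struct chunk *c = malloc(sizeof *c);    /* slow path: new chunk */
        if (!c) return NULL;
        c->prev = a->chunks;
        a->chunks = c;
        a->next = c->data;
        a->limit = c->data + ARENA_CHUNK;
    }
    p = a->next;                                /* fast path: bump pointer */
    a->next += n;
    return p;
}

static void arena_free_all(struct arena *a)     /* essentially free deallocation */
{
    struct chunk *c = a->chunks;
    while (c) { struct chunk *prev = c->prev; free(c); c = prev; }
    a->chunks = NULL;
    a->next = a->limit = NULL;
}
```

The fast path is two instructions (a compare and a pointer bump), which is what makes per-closure heap allocation affordable.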
    unsigned int cgf0 (closure0_t c) {
        vspec_t itmp0 = tc_local(INT);     /* int temporary */
        seti(itmp0, 5);                    /* set it to 5 */
        return itmp0;                      /* return the location */
    }

    void cgf1 (closure1_t c) {
        vspec_t itmp0 = tc_local(INT);     /* some temporaries */
        vspec_t itmp1 = tc_local(INT);
        ldii(itmp1, zero, c->fv_k);        /* addr of k */
        mulii(itmp1, itmp1, c->rc_j);      /* run-time const j */
        /* now apply i's CGF to i's closure: cspec composition! */
        itmp0 = c->cs_i->cgf(c->cs_i);
        addi(itmp1, itmp0, itmp1);
        reti(itmp1);                       /* emit a return (not return a value) */
    }

Fig. 6. Sample code-generating functions.
3.2.2.2 Dynamic Code Generation. During dynamic code generation, the `C runtime
processes the code-generating functions. The CGFs use the information in the
closures to generate code, and they perform various dynamic optimizations.

Dynamic code generation begins when the compile special form is invoked on
a cspec. compile calls the code-generating function for the cspec on the cspec's
closure, and the CGF performs most of the actual code generation. In terms of our
running example, the code int (*f)() = compile(c, int); causes the run-time system
to invoke closure1->cgf(closure1).

When the CGF returns, compile links the resulting code, resets the information
regarding dynamically generated locals and parameters, and returns a pointer to
the generated code. We attempt to minimize poor cache behavior by laying out the
code in memory at a random offset modulo the i-cache size. It would be possible to
track the placement of different dynamic functions to improve cache performance,
but we do not currently do so.
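The layout policy amounts to a one-line computation; the following sketch uses assumed constants and names, not tcc's actual values:

```c
#include <stdlib.h>

/* Illustrative sketch of i-cache-conscious placement: start each dynamic
 * function at a random, aligned offset modulo an assumed i-cache size. */
#define ICACHE_SIZE (16 * 1024)  /* assumed i-cache size, in bytes */
#define CODE_ALIGN  16           /* assumed instruction alignment */

static char *place_code(char *region)
{
    size_t off = (size_t)(rand() % (ICACHE_SIZE / CODE_ALIGN)) * CODE_ALIGN;
    return region + off;         /* random offset in [0, ICACHE_SIZE) */
}
```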
Cspec composition (the inlining of code corresponding to one cspec, b, into that
corresponding to another cspec, a, as described in Section 2.3.1) occurs during
dynamic code generation. This composition is implemented simply by invoking b's
CGF from within a's CGF. If b returns a value, the value's location is returned by
its CGF, and can then be used by operations within a's CGF.

The special forms for inter-cspec control flow, jump and label, are implemented
efficiently. Each closure, including that of the empty void cspec `{}, contains a field
that marks whether the corresponding cspec is the destination of a jump. The code-generating
function checks this field, and if necessary, invokes a vcode or icode
macro to generate a label, which is eventually resolved when the runtime system
links the code. As a result, label can simply return an empty cspec. jump marks
the closure of the destination cspec appropriately, and then returns a closure that
contains a pointer to the destination cspec and to a CGF that contains an icode
or vcode unconditional branch macro.
Generating efficient code from composed cspecs requires optimization analogous
to function inlining and inter-procedural optimization. Performing some optimizations
on the dynamic code after the order of composition of cspecs has been determined
can significantly improve code quality. tcc's icode runtime system builds
up an intermediate representation and performs some analyses before it generates
executable code. The vcode runtime system, by contrast, optimizes for code generation
speed: it generates code in just one pass, but can make poor spill decisions
when there is register pressure.
Some dynamic optimizations performed by tcc do not depend on the runtime
system employed, but are encoded directly in the code-generating functions. These
optimizations do not require global analysis or other expensive computation, and
they can considerably improve the quality of dynamic code.

First, tcc does constant folding on run-time constants. The code-generating functions
contain code to evaluate any parts of an expression that consist of static and
run-time constants. The dynamically emitted instructions can then encode these
values as immediates. Similarly, tcc performs simple local strength reduction based
on run-time knowledge. For example, the code-generating functions can replace
multiplication by a run-time constant integer with a series of shifts and adds, as
described in [Briggs and Harvey 1994].
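To illustrate the transformation (though not the Briggs-Harvey algorithm itself, which produces better sequences), multiplication by a run-time constant can be decomposed over the constant's binary expansion, using only shifts and adds:

```c
/* Sketch of run-time strength reduction: compute x * c for a constant c
 * using only shifts and adds over c's binary expansion. A CGF would emit
 * this sequence of instructions rather than execute it, but the recurrence
 * is the same. Unsigned arithmetic keeps the shifts well defined. */
static int mul_by_const(int x, unsigned c)
{
    unsigned acc = 0;
    unsigned ux = (unsigned)x;   /* x << i at iteration i */
    while (c) {
        if (c & 1)               /* bit i set: add x << i */
            acc += ux;
        ux <<= 1;
        c >>= 1;
    }
    return (int)acc;             /* wraps correctly on two's complement */
}
```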
In addition, the code-generating functions automatically perform some dynamic
loop unrolling and dead code elimination based on run-time constants. If the test
of a loop or conditional is invariant at run time, or if a loop is bounded by run-time
constants, then control flow can be determined at dynamic code generation time.
In addition, run-time constant information propagates down loop nesting levels: for
example, if a loop induction variable is bounded by run-time constants, and it is in
turn used to bound a nested loop, then the induction variable of the nested loop is
considered run-time constant too, within each unrolled iteration of the nested loop.

This style of optimization, which is hard-coded at static compile time and performed
dynamically, produces good code without high dynamic compilation overhead.
The code transformations are encoded in the CGF, and do not depend on
run-time data structures. Furthermore, dynamic code that becomes unreachable at
run time does not need to be generated, which can lead to faster code generation.
3.3 Runtime Systems
tcc provides two runtime systems for generating code. vcode emits code locally,
with no global analysis or optimization. icode builds up an intermediate representation
in order to support more optimizations: in particular, better register
allocation.

These two runtime systems allow programmers to choose the appropriate level of
run-time optimization. The choice is application-specific: it depends on the number
of times the code will be used and on the code's size and structure. Programmers
can select which runtime system to use when they compile a `C program.
3.3.1 vcode. When code generation speed is more important, the user can have
tcc generate CGFs that use vcode macros, which emit code in one pass. Register
allocation with vcode is fast and simple. vcode provides getreg and putreg operations:
the former allocates a machine register, the latter frees it. If there are
no unallocated registers when getreg is invoked, it returns a spilled location designated
by a negative number; vcode macros recognize this number as a stack offset,
and emit the necessary loads and stores. Clients that find these per-instruction if
statements too expensive can disable them: getreg is then guaranteed to return only
physical register names and, if it cannot satisfy a request, it terminates the program
with a run-time error. This methodology is quite workable in situations where register
usage is not data-dependent, and the improvement in code generation speed
(roughly a factor of two) can make it worthwhile.
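A minimal model of this interface, with invented data structures (vcode's real implementation differs), looks like this:

```c
/* Sketch of a vcode-style register allocator: getreg hands out machine
 * registers from a free bitmask; when none remain it returns a negative
 * number designating a spill (stack) slot. putreg frees a register.
 * The register count is illustrative, not vcode's actual set. */
#define NREGS 8

static unsigned free_regs = (1u << NREGS) - 1;  /* bit i set => reg i free */
static int next_spill = 0;                      /* next stack slot to hand out */

static int getreg(void)
{
    int i;
    for (i = 0; i < NREGS; i++)
        if (free_regs & (1u << i)) {
            free_regs &= ~(1u << i);
            return i;                /* physical register number */
        }
    return -(++next_spill);          /* negative => stack offset designator */
}

static void putreg(int r)
{
    if (r >= 0)
        free_regs |= 1u << r;        /* spill slots are not recycled here */
}
```

The per-instruction check that the text describes is simply "if (r < 0) emit loads/stores around the operation"; disabling it replaces the negative return with a fatal error.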
tcc statically emits getreg and putreg operations together with other vcode
macros in the code-generating functions: this ensures that the register assignments
of one cspec do not conflict with those of another cspec dynamically composed with
it. However, efficient inter-cspec register allocation is hard, and the placement of
these register management operations can greatly affect code quality. For example,
if a register is reserved (getreg'd) across a cspec composition point, it becomes unavailable
for allocation in the nested cspec and in all cspecs nested within it. As a
result, vcode could run out of registers and resort to spills after only a few levels
of cspec nesting. To help improve code quality, tcc follows some simple heuristics.
First, expression trees are rearranged so that cspec operands of instructions are
evaluated before non-cspec operands. This minimizes the number of temporaries
that span cspec references, and hence the number of registers allocated by the CGF
of one cspec during the execution of the code-generating function of a nested cspec.
Secondly, no registers are allocated for the return value of non-void cspecs: the
code-generating function for a cspec allocates the register for storing its result, and
simply returns this register name to the CGF of the enclosing cspec.
To further reduce the overhead of vcode register allocation, tcc reserves a limited
number of physical registers. These registers are not allocated by getreg, but instead
are managed at static compile time by tcc's dynamic back end. They can only be
used for values whose live ranges do not span composition with a cspec, and are
typically employed for expression temporaries.

As a result of these optimizations, vcode register allocation is quite fast. However,
if the dynamic code contains large basic blocks with high register pressure, or
if cspecs are dynamically combined in a way that forces many spills, code quality
suffers.
3.3.2 icode. When code quality is more important, the user can have tcc generate
CGFs that use icode macros, which generate an intermediate representation
on which optimizations can be performed. For example, icode can perform global
register allocation on dynamic code more effectively than vcode in the presence of
cspec composition.

icode provides an interface similar to that of vcode, with two main extensions:
(1) an infinite number of registers, and (2) primitives to express changes in estimated
usage frequency of code. The first extension allows icode clients to emit
code that assumes no spills, leaving the work of global, inter-cspec register allocation
to icode. The second allows icode to obtain estimates of code execution
frequency at low cost. For instance, prior to invoking icode macros that correspond
to a loop body, the icode client could invoke refmul(10): this tells icode
that all variable references occurring in the subsequent macros should be weighted
as occurring 10 times (an estimated average number of loop iterations) more than
the surrounding code. After emitting the loop body, the icode client should invoke
a corresponding refdiv(10) macro to correctly weight code outside of the loop. The
estimates obtained in this way are useful for several optimizations; they currently
provide approximate variable usage counts that help to guide register allocation.
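The weighting scheme reduces to a running multiplier; the sketch below mirrors the refmul/refdiv names from the text, but the data structures are our own illustration, not icode's:

```c
/* Sketch of icode-style usage estimation: refmul/refdiv scale a current
 * weight, and each variable reference adds that weight to the variable's
 * usage count. Illustrative data structures, not icode's actual ones. */
#define MAXVARS 16

static unsigned weight = 1;            /* current estimated execution frequency */
static unsigned usage[MAXVARS];        /* per-variable weighted reference counts */

static void refmul(unsigned n) { weight *= n; }
static void refdiv(unsigned n) { weight /= n; }

/* Called for each emitted macro that references variable v. */
static void note_ref(int v) { usage[v] += weight; }
```

A client emitting a loop would bracket the body with refmul(10) and refdiv(10), so references inside it count ten times as much toward register allocation priority.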
icode's intermediate representation is designed to be compact (two 4-byte machine
words per icode instruction) and easy to parse in order to reduce the overhead
of subsequent passes. When compile is invoked in icode mode, icode builds a flow
graph, performs register allocation, and finally generates executable code. We have
attempted to minimize the cost of each of these operations. We briefly discuss each
of them in turn.
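An encoding of that density might look as follows; the field widths here are our own guesses, chosen only so that one instruction occupies two 4-byte words, and do not reflect icode's actual layout:

```c
#include <stdint.h>

/* Sketch of a compact IR encoding in the spirit of the text: one
 * instruction in two 32-bit words, with bit-fields for the opcode,
 * operand type, and virtual-register operands. Field widths are
 * illustrative assumptions. */
struct ir_insn {
    uint32_t op   : 8;    /* operation kind */
    uint32_t type : 4;    /* operand type */
    uint32_t dst  : 20;   /* destination virtual register */
    uint32_t src1 : 16;   /* first source */
    uint32_t src2 : 16;   /* second source */
};
```

A fixed-width array of such records is trivially cheap to append to and to scan, which is what keeps the later flow-graph and register-allocation passes fast.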
3.3.2.1 Flow Graph Construction. icode builds a control-flow graph in one pass
after all CGFs have been invoked. The flow graph is a single array that uses pointers
for indexing. In order to allocate all required memory in a single allocation, icode
computes an upper bound on the number of basic blocks by summing the numbers
of labels and jumps emitted by icode macros. After allocating space for an array
of this size, it traverses the buffer of icode instructions and adds basic blocks to
the array in the same order in which they exist in the list of instructions. Forward
references are initially stored in an array of pairs of basic block addresses; when
all the basic blocks are built, the forward references are resolved by traversing this
array and linking the pairs of blocks listed in it. As it builds the flow graph, icode
also collects a minimal amount of local data-flow information (def and use sets
for each basic block). All memory management occurs through arenas [Forsythe
1977], which ensures low amortized cost for memory allocation and essentially free
deallocation.
3.3.2.2 Register Allocation. Good register allocation is the main benefit that
icode provides over vcode. icode currently implements three different register
allocation algorithms, which differ in overhead and in the quality of the code that
they produce: graph coloring, linear scan, and a simple scheme based on estimated
usage counts.

The graph coloring allocator implements a simplified version of Chaitin's algorithm
[Chaitin et al. 1981]: it does not do coalescing, but employs estimates of
usage counts to guide spilling. The live variable information used by this allocator
is obtained by an iterative data-flow pass over the icode flow graph. Both the
liveness analysis and the register allocation pass were carefully implemented for
speed, but their actual performance is inherently limited because the algorithms
were developed for static compilers, and prioritize code quality over compilation
speed. The graph coloring allocator therefore serves as a benchmark: it produces
the best code, but is relatively slow.
At the opposite end of the spectrum is icode's simple "usage count" allocator:
it makes no attempt to produce particularly good code, but is fast. This allocator
ignores liveness information altogether: it simply sorts all variables in order
of decreasing estimated usage counts, allocates the n available registers to the n
variables with the highest usage counts, and places all other variables on the stack.
Most dynamic code created using `C is relatively small: as a result, despite its
simplicity, this allocation algorithm often performs just as well as graph coloring.
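The usage-count policy fits in a dozen lines; the following is an illustrative sketch with our own data layout, not icode's:

```c
#include <stdlib.h>

/* Sketch of the usage-count allocator: sort variables by decreasing
 * estimated usage count, give the first nregs of them the available
 * registers, and put the rest on the stack. Illustrative only. */
struct var {
    unsigned usage;   /* estimated (weighted) usage count */
    int loc;          /* >= 0: register number; < 0: stack slot */
};

static int by_usage_desc(const void *a, const void *b)
{
    unsigned ua = ((const struct var *)a)->usage;
    unsigned ub = ((const struct var *)b)->usage;
    return ua < ub ? 1 : ua > ub ? -1 : 0;
}

static void alloc_by_usage(struct var *v, int nvars, int nregs)
{
    int i, slot = 0;
    qsort(v, nvars, sizeof *v, by_usage_desc);
    for (i = 0; i < nvars; i++)
        v[i].loc = i < nregs ? i : -(++slot);   /* registers, then stack */
}
```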
Lastly, icode implements linear scan register allocation [Poletto and Sarkar 1998].
This algorithm improves performance relative to graph coloring: it does not build
and color a graph, but rather assigns registers to variables in one pass over a sorted
list of live intervals. Given an ordering (for example, linear layout order, or depth-first
order) of the instructions in a flow graph, a live interval of a variable v is the
interval [m, n] such that v is not live prior to instruction m or after instruction n.
Once the list of live intervals is computed, allocating R available registers so as to
minimize the number of spilled intervals requires removing the smallest number of
live intervals so that no more than R live intervals overlap. The algorithm skips forward
through the sorted list of live intervals from start point to start point, keeping
track of the set of overlapping intervals. When more than R intervals overlap, it
heuristically spills the interval that ends furthest away, and moves on to the next
start point. The algorithm appears in Figure 7, and is discussed in detail in [Poletto
and Sarkar 1998].

    LinearScanRegisterAllocation
        active ← {}
        foreach live interval i, in order of increasing start point
            ExpireOldIntervals(i)
            if length(active) = R then
                SpillAtInterval(i)
            else
                register[i] ← a register removed from pool of free registers
                add i to active, sorted by increasing end point

    ExpireOldIntervals(i)
        foreach interval j in active, in order of increasing end point
            if endpoint[j] ≥ startpoint[i] then
                return
            remove j from active
            add register[j] to pool of free registers

    SpillAtInterval(i)
        spill ← last interval in active
        if endpoint[spill] > endpoint[i] then
            register[i] ← register[spill]
            location[spill] ← new stack location
            remove spill from active
            add i to active, sorted by increasing end point
        else
            location[i] ← new stack location

Fig. 7. Linear scan register allocation.
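The pseudocode of Figure 7 translates almost directly into C. The sketch below assumes the intervals are already sorted by increasing start point, and its data structures are illustrative, not icode's:

```c
/* A compact C rendering of the linear scan algorithm of Figure 7.
 * Intervals must arrive sorted by increasing start point. */
#define MAXI 64

struct interval { int start, end, reg, stackloc; };

static struct interval *active[MAXI];   /* live now, sorted by end point */
static int nactive;
static int free_regs[MAXI], nfree;      /* pool of free registers */
static int next_stack;                  /* next stack location to hand out */

static void add_active(struct interval *i)        /* insert sorted by end */
{
    int k = nactive++;
    while (k > 0 && active[k-1]->end > i->end) { active[k] = active[k-1]; k--; }
    active[k] = i;
}

static void expire_old(struct interval *i)        /* ExpireOldIntervals */
{
    int j, k;
    for (j = 0; j < nactive && active[j]->end < i->start; j++)
        free_regs[nfree++] = active[j]->reg;      /* return reg to pool */
    for (k = j; k < nactive; k++) active[k-j] = active[k];
    nactive -= j;
}

static void spill_at(struct interval *i)          /* SpillAtInterval */
{
    struct interval *last = active[nactive-1];    /* ends furthest away */
    if (last->end > i->end) {
        i->reg = last->reg;                       /* steal its register */
        last->stackloc = next_stack++;
        nactive--;
        add_active(i);
    } else {
        i->stackloc = next_stack++;               /* spill i itself */
    }
}

static void linear_scan(struct interval *v, int n, int R)
{
    int k;
    nactive = 0; nfree = 0; next_stack = 0;
    for (k = 0; k < R; k++) free_regs[nfree++] = k;
    for (k = 0; k < n; k++) {                     /* sorted by start point */
        v[k].reg = v[k].stackloc = -1;
        expire_old(&v[k]);
        if (nactive == R) spill_at(&v[k]);
        else { v[k].reg = free_regs[--nfree]; add_active(&v[k]); }
    }
}
```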
icode can obtain live interval information in two different ways. The first method
is simply to compute live variable information by iterative analysis, as for graph
coloring, and to then coarsen this information to one live interval per variable. This
technique produces live intervals that are as accurate as possible, but is not fast.
The second method is considerably faster, but produces slightly more conservative
intervals. The algorithm finds and topologically sorts the strongly connected components
(SCCs) of the flow graph. If a variable is defined or used in an SCC, it is
assumed live throughout the whole SCC. The live interval of a variable therefore
stretches from the topologically first SCC where it appears to the last. Like the
linear scan algorithm, this technique is analyzed in [Poletto and Sarkar 1998].
These different algorithms allow icode to provide a variety of tradeoffs of compile-time
overhead versus quality of code. Graph coloring is most expensive and usually
produces the best code; linear scan is considerably faster but sometimes produces
worse code; and the usage count allocator is faster than linear scan but can produce
considerably worse code. However, given the relatively small size of most `C
dynamic code, the algorithms perform similarly on the benchmarks presented in
this paper. As a result, Section 5 presents measurements only for a representative
case: the linear scan allocator using live intervals derived from full live variable
information.
3.3.2.3 Code Generation. The final phase of code generation with icode is the
translation of the intermediate representation into executable code. The code emitter
makes one pass through the icode intermediate representation: it invokes the
vcode macro that corresponds to each icode instruction, and prepends and appends
spill code as necessary.

icode has several hundred instructions (the Cartesian product of operation kinds
and operand types), so a translator for the entire instruction set is quite large.
Most `C programs, however, use only a small subset of all icode instructions. tcc
therefore keeps track of the icode instructions used by an application. It encodes
this usage information for a given `C source file in dummy symbol names in the
corresponding object file. A pre-linking pass then scans all the files about to be
linked and emits an additional object file containing an icode-to-binary translator
tailored specifically to the icode macros present in the executable. This simple
trick significantly reduces the size of the icode code generator; for example, for
the benchmarks presented in this paper it usually shrank the code generators by a
factor of 5 or 6.
3.3.2.4 Other Features. icode is designed to be a generic framework for dynamic
code optimization: it is possible to extend it with additional optimization passes,
such as copy propagation, common subexpression elimination, etc. However, preliminary
measurements indicate that much dynamic optimization beyond register
allocation is probably not practical: the increase in dynamic compile time is not
justified by sufficient improvements in the speed of the resulting code.
4. APPLICATIONS
`C is valuable in a number of practical settings. The language can be employed
to increase performance through the use of dynamic code generation, as well as
to simplify the creation of programs that cannot easily be written in ANSI C. For
example, `C can be used to build efficient searching and sorting routines, implement
dynamic integrated layer processing for high-performance network subsystems, and
create compiling interpreters and "just-in-time" compilers. This section presents
several ways in which `C and dynamic code generation can help to solve practical
problems. We have divided the examples into four broad categories: specialization,
dynamic function call construction, dynamic inlining, and compilation. Many of
the applications described below are also used for the performance evaluation in
Section 5.
    struct hte {                  /* Hash table entry structure */
        int val;                  /* Key that entry is associated with */
        struct hte *next;         /* Pointer to next entry */
        /* ... */
    };

    struct ht {                   /* Hash table structure */
        int scatter;              /* Value used to scatter keys */
        int norm;                 /* Value used to normalize */
        struct hte **hte;         /* Vector of pointers to hash table entries */
    };

    /* Hash returns a pointer to the hash table entry, if any, that matches val. */
    struct hte *hash (struct ht *ht, int val) {
        struct hte *hte = ht->hte[(val * ht->scatter) / ht->norm];
        while (hte && hte->val != val) hte = hte->next;
        return hte;
    }

Fig. 8. A hash function written in C.
4.1 Specialization
`C provides programmers with a general set of mechanisms to build code at run
time. Dynamic code generation can be used to hard-wire run-time values into the
instruction stream, which can enable code optimizations such as strength reduction
and dead code elimination. In addition, `C enables more unusual and complicated
operations, such as specializing a piece of code to a particular input data structure
(for example, a given array) or some class of data structures (for example, all arrays
with elements of a given length).
4.1.1 Hashing. A simple example of `C is the optimization of a generic hash
function, where the table size is determined at run time, and where the function
uses a run-time value to help its hash. Consider the C code in Figure 8. The
C function has three values that can be treated as run-time constants: ht->hte,
ht->scatter, and ht->norm. As illustrated in Figure 9, using `C to specialize the
function for these values requires only a few changes. The resulting code can be
considerably faster than the equivalent C version, because tcc hard-codes the run-time
constants hte, scatter, and norm in the instruction stream, and reduces the
multiplication and division operations in strength. The cost of using the resulting
dynamic function is an indirect jump on a function pointer.
    /* Type of the function generated by mkhash: takes a value as input
       and produces a (possibly null) pointer to a hash table entry */
    typedef struct hte *(*hptr) (int val);

    /* Construct a hash function with the size, scatter, and hash table pointer hard-coded. */
    hptr mkhash (struct ht *ht) {
        int vspec val = param(int, 0);
        void cspec code = `{
            struct hte *hte = ($ht->hte)[(val * $ht->scatter) / $ht->norm];
            while (hte && hte->val != val) hte = hte->next;
            return hte;
        };
        return compile(code, struct hte *);    /* Compile and return the result */
    }

Fig. 9. Specialized hash function written in `C.

4.1.2 Vector Dot Product. Matrix and vector manipulations such as dot product
provide many opportunities for dynamic code generation. They often involve a large
number of operations on values which change relatively infrequently. Matrices may
have run-time characteristics (i.e., large numbers of zeros and small integers) that
can improve performance of matrix operations, but cannot be exploited by static
compilation techniques. In addition, sparse matrix techniques are only efficient for
matrices with a high degree of sparseness.

In the context of matrix multiplication, dynamic code generation can remove
multiplication by zeros, and strength-reduce multiplication by small integers. Because
code for each row is created once and then used once for each column, the
costs of code generation can be recovered easily. Consider the C code to compute
dot product in Figure 10. At run time several optimizations can be employed.
For example, the programmer can directly eliminate multiplication by zero. The
corresponding `C code appears in Figure 11.

    int dot (int *a, int *b, int n) {
        int sum, k;
        for (sum = k = 0; k < n; k++) sum += a[k] * b[k];
        return sum;
    }

Fig. 10. A dot-product routine written in C.
The dot product written in `C can perform substantially better than its static C
counterpart. The `C code does not emit code for multiplications by zero. In addition,
the `C compiler can encode values as immediates in arithmetic instructions,
and can reduce multiplications by the run-time constant $row[k] in strength.
4.1.3 B inary Search . Figure 12 illustratesarecursive implementation of binary
search. We present a recursive version here for clarity: the static code used for
measurements i n Secti on 5 is a more e cient , iterativeversion. Nonet heless, even
the it erative implementation incurssome overhead due to lo oping an d b ecauseit
needs t o ref erence into the input array.
When the input array will be searched several times, however, one can use `C to write code like that in Figure 13. The structure of this code is very similar to that of the recursive binary search. By adding a few dynamic code generation primitives to the original algorithm, we have created a function that returns a cspec for binary search that is tailored to a given input set. We can create a C function pointer from this cspec:
typedef int (*ip)(int key);
ip mksearch(int n, int *x) {
`C and tcc: A Language and Compiler for Dynamic Code Generation    23
void cspec mkdot(int row[], int n) {
    int k;
    int vspec col = param(int *, 0);  /* Input vector for dynamic function */
    int cspec sum = `0;               /* Spec for sum of products; initially 0 */
    for (k = 0; k < n; k++)           /* Only generate code for nonzero multiplications */
        if (row[k])                   /* Specialize on index of col[k] and value of row[k] */
            sum = `(sum + col[$k] * $row[k]);
    return `{ return sum; };
}

Fig. 11. `C code to build a specialized dot-product routine.
int bin(int *x, int key, int l, int u, int r) {
    int p;
    if (l > u) return -1;
    p = u - r;
    if (x[p] == key) return p;
    else if (x[p] < key) return bin(x, key, p+1, u, r/2);
    else return bin(x, key, l, p-1, r/2);
}

Fig. 12. A tail-recursive implementation of binary search.
void cspec gen(int *x, int vspec key, int l, int u, int r) {
    int p;
    if (l > u) return `{ return -1; };
    p = u - r;
    return `{
        if ($x[p] == key) return $p;
        else if ($x[p] < key) @gen(x, key, p+1, u, r/2);
        else @gen(x, key, l, p-1, r/2);
    };
}

Fig. 13. `C code to create a "self-searching" executable array.
24    Poletto, Hsieh, Engler, Kaashoek
typedef double (*dptr)();
dptr mkpow(int exp) {
    double vspec base = param(double, 0);          /* Argument: the base */
    double vspec result = local(register double);  /* Local: running product */
    void cspec squares;
    int bit = 2;
    /* Initialize the running product */
    if (1 & exp) squares = `{ result = base; };
    else squares = `{ result = 1.; };
    /* Multiply some more, if necessary */
    while (bit <= exp) {
        squares = `{ @squares; base *= base; };
        if (bit & exp) squares = `{ @squares; result *= base; };
        bit = bit << 1;
    }
    /* Compile a function which returns the result */
    return compile(`{ @squares; return result; }, double);
}

Fig. 14. Code to create a specialized exponentiation function.
    int vspec key = param(int, 0);  /* One argument: the key to search for */
    return (ip)compile(gen(x, key, 0, n-1, n/2), int);
}
In the resulting code, the values from the input array are hard-wired into the instruction stream, and the loop is unrolled into a binary tree of nested if statements that compare the value to be found to constants. As a result, the search involves neither loads from memory nor looping overhead, so the dynamically constructed code is considerably more efficient than its static counterpart. For small input vectors (on the order of 30 elements), this results in lookup performance superior even to that of a hash table.
4.1.4 Exponentiation. Another example of tailoring code to an input set comes from computer graphics [Draves 1995], where it is sometimes necessary to apply an exponentiation function to a large data set. Traditionally, exponentiation is computed in a loop which performs repeated multiplication and squaring. Given a fixed exponent, we can unroll this loop and obtain straight-line code that contains the minimum number of multiplications. The `C code to perform this optimization appears in Figure 14.
4.1.5 Finite State Machines. It is possible to use `C to specialize code for more complex data than just arrays and primitive values. For example, tcc can compile a DFA description into specialized code, as shown in Figure 15. The function mk_dfa accepts a data structure that describes a DFA with a unique start state and some number of accept states: at each state, the DFA transitions to the next state and produces one character of output based on the next character in its input. mk_dfa uses `C's inter-cspec control flow primitives, jump and label (Section 2.4),
typedef struct {
    int n;           /* State number (start state has n=0) */
    int acceptp;     /* Nonzero if this is an accept state */
    char *in;        /* I/O and next-state info: on input in[k], */
    char *out;       /* produce output out[k] and go to state */
    int *next;       /* number next[k] */
} state_t;

typedef struct {
    int size;        /* Number of states */
    state_t *states; /* Description of each state */
} dfa_t;

int (*mk_dfa(dfa_t *dfa))(char *in, char *out) {
    char * vspec in = param(char *, 0);   /* Input to dfa */
    char * vspec out = param(char *, 1);  /* Output buffer */
    char vspec t = local(char);
    void cspec *labels = (void cspec *)malloc(dfa->size * sizeof(void cspec));
    void cspec code = `{};                /* Initially dynamic code is empty */
    int i;
    for (i = 0; i < dfa->size; i++)
        labels[i] = label();              /* Create labels to mark each state */
    for (i = 0; i < dfa->size; i++) {     /* For each state ... */
        state_t *cur = &dfa->states[i];
        int j = 0;
        code = `{ @code;                  /* ... prepend the code so far */
                  @labels[i];             /* ... add the label to mark this state */
                  t = *in; };             /* ... read current input */
        while (cur->in[j]) {              /* ... add code to do the right thing if */
            code = `{ @code;              /* this is an input we expect */
                      if (t == $cur->in[j]) {
                          in++; *out++ = $cur->out[j];
                          jump labels[cur->next[j]];
                      } };
            j++;
        }
        code = `{ @code;                  /* ... add code to return 0 if we're at end */
                  if (t == 0) {           /* of input in an accept state, or */
                      if ($cur->acceptp) return 0;
                      else return 2;      /* 2 if we're in another state, */
                  }                       /* or 1 if no transition and not end of input */
                  else return 1; };
    }
    return compile(code, int);
}

Fig. 15. Code to create a hard-coded finite state machine.
to create code that directly implements the given DFA: each state is implemented by a separate piece of dynamic code, and state transitions are simply conditional branches. The dynamic code contains no references into the original data structure that describes the DFA.
4.1.6 Swap. The examples so far have illustrated specialization to specific values. Value specialization can improve performance, but it may be impractical if
typedef void (*fp)(void *, void *);
fp mk_swap(int size) {
    long * vspec src = param(long *, 0);  /* Arg 0: source */
    long * vspec dst = param(long *, 1);  /* Arg 1: destination */
    long vspec tmp = local(long);         /* Temporary for swaps */
    void cspec s = `{};                   /* Code to be built up, initially empty */
    int i;
    for (i = 0; i < size/sizeof(long); i++)  /* Build swap code */
        s = `{ @s; tmp = src[$i]; src[$i] = dst[$i]; dst[$i] = tmp; };
    return (fp)compile(s, void);
}

Fig. 16. `C code to generate a specialized swap routine.
the values change too frequently. A related approach that does not suffer from this drawback is specialization based on properties, such as size, of data types. For instance, it is often necessary to swap the contents of two regions of memory: in-place sorting algorithms are one such example. As long as the data being manipulated is no larger than a machine word, this process is quite efficient. However, when manipulating larger regions (for example, C structs), the code is often inefficient. One way to copy the regions is to invoke the C library memory copy routine, memcpy, repeatedly. Using memcpy incurs function call overhead, as well as overhead within memcpy itself. Another way is to iteratively swap one word at a time, but this method incurs loop overhead.
`C allows us to create a swap routine that is specialized to the size of the region being swapped. The code in Figure 16 is an example, simplified to handle only the case where the size of the region is a multiple of sizeof(long). This routine returns a pointer to a function that contains only assignments, and swaps the region of the given size without resorting to looping or multiple calls to memcpy. The size of the generated code will usually be rather small, which makes this a profitable optimization. Section 5 evaluates a `C heapsort implementation in which the swap routine is customized to the size of the objects being sorted and dynamically inlined into the main sorting code.
4.1.7 Copy. Copying a memory region of arbitrary size is another common operation. An important application of this is computer graphics [Pike et al. 1985]. Similar to the previous code for swapping regions of memory, the example code in Figure 17 returns a function customized for copying a memory region of a given size.
The procedure mk_copy takes two arguments: the size of the regions to be copied, and the number of times that the inner copying loop should be unrolled. It creates a cspec for a function that takes two arguments, pointers to source and destination regions. It then creates prologue code to copy regions which would not be copied by the unrolled loop (if n mod unrollx != 0), and generates the body of the unrolled loop. Finally, it composes these two cspecs, invokes compile, and returns a pointer to the resulting customized copy routine.
typedef void (*fp)(void *, void *);
fp mk_copy(int n, int unrollx) {
    int i, j;
    unsigned * vspec src = param(unsigned *, 0);  /* Arg 0: source */
    unsigned * vspec dst = param(unsigned *, 1);  /* Arg 1: destination */
    int vspec k = local(int);                     /* Local: loop counter */
    void cspec copy = `{}, unrollbody = `{};      /* Code to build, initially empty */
    for (i = 0; i < n % unrollx; i++)             /* Unroll the remainder copy */
        copy = `{ @copy; dst[$i] = src[$i]; };
    if (n >= unrollx)                             /* Unroll copy loop unrollx times */
        for (j = 0; j < unrollx; j++)
            unrollbody = `{ @unrollbody; dst[k+$j] = src[k+$j]; };
    copy = `{   /* Compose remainder copy with the main unrolled loop */
        @copy;
        for (k = $i; k < $n; k += $unrollx) @unrollbody;
    };
    /* Compile and return a function pointer */
    return (fp)compile(copy, void);
}

Fig. 17. Generating specialized copy code in `C.
4.2 Dynamic Function Call Construction
`C allows programmers to generate functions and calls to functions that have arguments whose number and types are not known at compile time. This functionality distinguishes `C: neither ANSI C nor any of the dynamic compilation systems discussed in Section 6 provides mechanisms for constructing function calls dynamically.
A useful application of dynamic function call construction is the generation of code to marshal and unmarshal arguments stored in a byte vector. These operations are frequently performed to support remote procedure call [Birrell and Nelson 1984]. Generating specialized code for the most active functions results in substantial performance benefits [Thekkath and Levy 1993].
We present two functions, marshal and unmarshal, that dynamically construct marshaling and unmarshaling code, respectively, given a "format vector," types, that specifies the types of arguments. The sample code in Figure 18 generates a marshaling function for arguments with a particular set of types (in this example, int, void *, and double). First, it specifies code to allocate storage for a byte vector large enough to hold the arguments described by the type format vector. Then, for every type in the type vector, it creates a vspec that refers to the corresponding parameter, and constructs code to store the parameter's value into the byte vector at a distinct run-time constant offset. Finally, it specifies code that returns a pointer to the byte vector of marshaled arguments. After dynamic code generation, the function that has been constructed will store all of its parameters at fixed, non-overlapping offsets in the result vector. Since all type and offset computations
typedef union { int i; double d; void *p; } type;
typedef enum { INTEGER, DOUBLE, POINTER } type_t;  /* Types we expect to marshal */
extern void *alloc();

void cspec mk_marshal(type_t *types, int nargs) {
    int i;
    type * vspec m = local(type *);  /* Spec of pointer to result vector */
    void cspec s = `{ m = (type *)alloc(nargs * sizeof(type)); };
    for (i = 0; i < nargs; i++) {    /* Add code to marshal each param */
        switch (types[i]) {
        case INTEGER: s = `{ @s; m[$i].i = param($i, int); }; break;
        case DOUBLE:  s = `{ @s; m[$i].d = param($i, double); }; break;
        case POINTER: s = `{ @s; m[$i].p = param($i, void *); }; break;
        }
    }
    /* Return code spec to marshal parameters and return result vector */
    return `{ @s; return m; };
}

Fig. 18. Sample marshaling code in `C.
typedef int (*fptr)();  /* Type of the function we will be calling */
void cspec mk_unmarshal(type_t *types, int nargs) {
    int i;
    fptr vspec fp = param(fptr, 0);     /* Arg 0: the function to invoke */
    type * vspec m = param(type *, 1);  /* Arg 1: the vector to unmarshal */
    void cspec args = push_init();      /* Initialize the dynamic argument list */
    for (i = 0; i < nargs; i++) {       /* Build up the dynamic argument list */
        switch (types[i]) {
        case INTEGER: push(args, `m[$i].i); break;
        case DOUBLE:  push(args, `m[$i].d); break;
        case POINTER: push(args, `m[$i].p); break;
        }
    }
    /* Return code spec to call the given function with unmarshaled args */
    return `{ fp(args); };
}

Fig. 19. Unmarshaling code in `C.
have been done during environment binding, the generated code will be efficient. Further performance gains could be achieved if the code were to manage details such as endianness and alignment.
Dynamic generation of unmarshaling code is equally useful. The process relies on `C's mechanism for constructing calls to arbitrary functions at run time. It not only improves efficiency, but also provides valuable functionality. For example, in Tcl [Ousterhout 1994] the runtime system can make upcalls into an application. However, because Tcl cannot dynamically create code to call an arbitrary function, it marshals all of the upcall arguments into a single byte vector, and forces applications to explicitly unmarshal them. If systems such as Tcl used `C to construct
upcalls, clients would be able to write their code as normal C routines, which would increase the ease of expression and decrease the chance for errors.
The code in Figure 19 generates an unmarshaling function that works with the marshaling code in Figure 18. The generated code takes a function pointer as its first argument and a byte vector of marshaled arguments as its second. It unmarshals the values in the byte vector into their appropriate parameter positions, and then invokes the function pointer. mk_unmarshal works as follows: it creates the specifications for the generated function's two incoming arguments, and initializes the argument list. Then, for every type in the type vector, it creates a cspec to index into the byte vector at a fixed offset and pushes this cspec into its correct parameter position. Finally, it creates the call to the function pointed to by the first dynamic argument.
4.3 Dynamic Inlining
`C makes it easy to inline and compose functions dynamically. This feature is analogous to dynamic inlining through indirect function calls. It improves performance by eliminating function call overhead and by creating the opportunity for optimization across function boundaries.
4.3.1 Parameterized Library Functions. Dynamic function composition is useful when writing and using library functions that would normally be parameterized with function pointers, such as many mathematical and standard C library routines. The `C code for Newton's method [Press et al. 1992] in Figure 20 illustrates its use. The function newton takes as arguments the maximum allowed number of iterations, a tolerance, an initial estimate, and two pointers to functions that return cspecs to evaluate a function and its derivative. In the calls f(p0) and fprime(p0), p0 is passed as a vspec argument. The cspecs returned by these functions are incorporated directly into the dynamically generated code. As a result, there is no function call overhead, and inter-cspec optimization can occur during dynamic code generation.
4.3.2 Network Protocol Layers. Another important application of dynamic inlining is the optimization of networking code. The modular composition of different protocol layers has long been a goal in the networking community [Clark and Tennenhouse 1990]. Each protocol layer frequently involves data manipulation operations, such as checksumming and byte-swapping. Since performing multiple data manipulation passes is expensive, it is desirable to compose the layers so that all the data handling occurs in one phase [Clark and Tennenhouse 1990].
`C can be used to construct a network subsystem that dynamically integrates protocol data operations into a single pass over memory (e.g., by incorporating encryption and compression into a single copy operation). A simple design for such a system divides each data-manipulation stage into pipes that each consume a single input and produce a single output. These pipes can then be composed and incorporated into a data-copying loop. The design includes the ability to specify prologue and epilogue code that is executed before and after the data-copying loop, respectively. As a result, pipes can manipulate state and make end-to-end checks, such as ensuring that a checksum is valid after it has been computed.
The pipe in Figure 21 can be used to do byte-swapping. Since a byte swapper
typedef double cspec (*dptr)(double vspec);
/* Dynamically create a Newton-Raphson routine. n: max number of iterations;
   tol: maximum tolerance; p0: initial estimate; f: function to solve; fprime: derivative of f. */
double newton(int n, double tol, double usr_p0, dptr f, dptr fprime) {
    void cspec cs = `{
        int i; double p, p0 = $usr_p0;
        for (i = 0; i < $n; i++) {
            p = p0 - f(p0) / fprime(p0);       /* Compose cspecs returned by f and fprime */
            if (abs(p - p0) < $tol) return p;  /* Return result if we've converged enough */
            p0 = p;                            /* Seed the next iteration */
        }
        error("method failed after %d iterations\n", i);
    };
    return compile(cs, double)();  /* Compile, call, and return the result. */
}
/* Function that constructs a cspec to compute f(x) = (x+1)^2 */
double cspec f(double vspec x) { return `((x + 1.0) * (x + 1.0)); }
/* Function that constructs a cspec to calculate f'(x) = 2(x+1) */
double cspec fprime(double vspec x) { return `(2.0 * (x + 1.0)); }
/* Call newton to solve an equation */
void use_newton(void) { printf("Root is %f\n", newton(100, .000001, 10., f, fprime)); }

Fig. 20. `C code to create and use routines that use Newton's method for solving polynomials. This example computes the root of the function f(x) = (x+1)^2.
unsigned cspec byteswap(unsigned vspec input) {
    return `((input << 24) | ((input & 0xff00) << 8) |
             ((input >> 8) & 0xff00) | ((input >> 24) & 0xff));
}
/* "Byteswap" maintains no state and so needs no initial or final code */
void cspec byteswap_initial(void) { return `{}; }
void cspec byteswap_final(void) { return `{}; }

Fig. 21. A sample pipe: byteswap returns a cspec for code that byte-swaps input.
does not need to maintain any state, there is no need to specify initial and final code. The byte swapper simply consists of the "consumer" routine that manipulates the data.
To construct the integrated data-copying routine, the initial, consumer, and final cspecs of each pipe are composed with the corresponding cspecs of the pipe's neighbors. The composed initial code is placed at the beginning of the resulting routine; the consumer code is inserted in a loop, and composed with code which provides it with input and stores its output; and the final code is placed at the end of the routine. A simplified version would look like the code fragment in Figure 22. In a mature implementation of this code, we could further improve performance by unrolling the data-copying loop. Additionally, pipes would take inputs and outputs of different sizes that the composition function would reconcile.
typedef void cspec (*vptr)();
typedef unsigned cspec (*uptr)(unsigned cspec);
/* Pipe structure: contains pointers to functions that return cspecs
   for the initialization, pipe, and finalization code of each pipe */
struct pipe {
    vptr initial, final;  /* initial and final code */
    uptr pipe;            /* pipe */
};
/* Return cspec that results from composing the given vector of pipes */
void cspec compose(struct pipe *plist, int n) {
    struct pipe *p;
    int vspec nwords = param(int, 0);               /* Arg 0: input size */
    unsigned * vspec input = param(unsigned *, 1);  /* Arg 1: pipe input */
    unsigned * vspec output = param(unsigned *, 2); /* Arg 2: pipe output */
    void cspec initial = `{}, cspec final = `{};    /* Prologue and epilogue code */
    unsigned cspec pipes = `input[i];               /* Base pipe input */
    for (p = &plist[0]; p < &plist[n]; p++) {       /* Compose all stages together */
        initial = `{ @initial; @p->initial(); };    /* Compose initial statements */
        pipes = `p->pipe(pipes);  /* Compose pipes: one pipe's output is the next one's input */
        final = `{ @final; @p->final(); };          /* Compose final statements */
    }
    /* Create a function with initial statements first, consumer statements
       second, and final statements last. */
    return `{ int i;
        @initial;
        for (i = 0; i < nwords; i++) output[i] = pipes;
        @final;
    };
}

Fig. 22. `C code for composing data pipes.
4.4 Compilation
`C's imperative approach to dynamic code generation makes it well-suited for writing compilers and compiling interpreters. `C helps to make such programs both efficient and easy to write: the programmer can focus on parsing, and leave the task of code generation to `C.
4.4.1 Domain-Specific Languages. Small, domain-specific languages can benefit from dynamic compilation. The small query languages used to search databases are one class of such languages [Keppel et al. 1993]. Since databases are large, dynamically compiled queries will usually be applied many times, which can easily pay for the cost of dynamic code generation.
We provide a toy example in Figure 23. The function mk_query takes a vector of queries. Each query contains the following elements: a database record field (i.e., CHILDREN or INCOME); a value to compare this field to; and the operation to use in the comparison (i.e., <, >, etc.). Given a query vector, mk_query dynamically creates a query function which takes a database record as an argument and checks whether that record satisfies all of the constraints in the query vector. This check
typedef enum { INCOME, CHILDREN /* ... */ } query_t;  /* Query type */
typedef enum { LT, LE, GT, GE, NE, EQ } bool_op;      /* Comparison operation */
struct query {
    query_t record_field;  /* Field to use */
    unsigned val;          /* Value to compare to */
    bool_op bool_op;       /* Comparison operation */
};
struct record { int income; int children; /* ... */ };  /* Simple database record */
/* Function that takes a pointer to a database record and returns 0 or 1,
   depending on whether the record matches the query */
typedef int (*iptr)(struct record *r);

iptr mk_query(struct query *q, int n) {
    int i, cspec field, cspec expr = `1;  /* Initialize the boolean expression */
    struct record * vspec r = param(struct record *, 0);  /* Record to examine */
    for (i = 0; i < n; i++) {             /* Build the rest of the boolean expression */
        switch (q[i].record_field) {      /* Load the appropriate field value */
        case INCOME: field = `r->income; break;
        case CHILDREN: field = `r->children; break;
        /* ... */
        }
        switch (q[i].bool_op) {  /* Compare the field value to runtime constant q[i] */
        case LT: expr = `(expr && field < $q[i].val); break;
        case EQ: expr = `(expr && field == $q[i].val); break;
        case LE: expr = `(expr && field <= $q[i].val); break;
        /* ... */
        }
    }
    return (iptr)compile(`{ return expr; }, int);
}

Fig. 23. Compilation of a small query language.
enum { WHILE, IF, ELSE, ID, CONST, LE, GE, NE, EQ };  /* Multi-character tokens */
int expect();        /* Consume the given token from the input stream, or fail if not found */
int cspec expr();    /* Parse unary expressions */
int gettok();        /* Consume a token from the input stream */
int look();          /* Peek at the next token without consuming it */
int cur_tok;         /* Current token */
int vspec lookup_sym();  /* Given a token, return corresponding vspec */

void cspec stmt(void) {
    int cspec e = `0; void cspec s = `{}, s1 = `{}, s2 = `{};
    switch (gettok()) {
    case WHILE:  /* 'while' '(' expr ')' stmt */
        expect('('); e = expr();
        expect(')'); s = stmt();
        return `{ while (e) @s; };
    case IF:  /* 'if' '(' expr ')' stmt { 'else' stmt } */
        expect('('); e = expr();
        expect(')'); s1 = stmt();
        if (look(ELSE)) {
            gettok(); s2 = stmt();
            return `{ if (e) @s1; else @s2; };
        } else return `{ if (e) @s1; };
    case '{':  /* '{' stmt* '}' */
        while (!look('}')) s = `{ @s; @stmt(); };
        return s;
    case ';': return `{};
    case ID: {  /* ID '=' expr ';' */
        int vspec lvalue = lookup_sym(cur_tok);
        expect('='); e = expr(); expect(';');
        return `{ lvalue = e; };
    }
    default: parse_err("expecting statement");
    }
}

Fig. 24. A sample statement parser from a compiling interpreter written in `C.
is implemented simply as an expression which computes the conjunction of the given constraints. The query function never references the query vector, since all the values and comparison operations in the vector have been hard-coded into the dynamic code's instruction stream.
The dynamically generated code expects one incoming argument, the database record to be compared. It then "seeds" the boolean expression: since we are building a conjunction, the initial value is 1. The loop then traverses the query vector, and builds up the dynamic code for the conjunction according to the fields, values, and comparison operations described in the vector. When the cspec for the boolean expression is constructed, mk_query compiles it and returns a function pointer. That optimized function can be applied to database entries to determine whether they match the given constraints.
4.4.2 Compiling Interpreters. Compiling interpreters (also known as JIT compilers) are important pieces of technology: they combine the flexibility of an interpreted programming environment with the performance of compiled code. For a given piece of code, the user of a compiling interpreter pays a one-time cost of compilation, which can be roughly comparable to that of interpretation. Every subsequent use of that code employs the compiled version, which can be much faster than the interpreted version. Compiling interpreters are also useful in systems such as Java, in which "just-in-time" compilers are commonly used.
Figure 24 contains a fragment of code for a simple compiling interpreter written in `C. This interpreter translates a simple subset of C: as it parses the program, it builds up a cspec that represents it.
5. EVALUATION
`C and tcc support efficient dynamic code generation. In particular, the measurements in this section demonstrate the following results:
- By using `C and tcc, we can achieve good speedups relative to static C. Speedups by a factor of two to four are common for the programs that we have described.
- `C and tcc do not impose excessive overhead on performance. The cost of dynamic compilation is usually recovered in under 100 runs of a benchmark; sometimes, this cost can be recovered in one run of a benchmark.
- The tradeoff between dynamic code quality and dynamic code generation speed must be made on a per-application basis. For some applications, it is better to generate code faster; for others, it is better to generate better code.
- Dynamic code generation can result in large speedups when it enables large-scale optimization: when interpretation can be eliminated, or when dynamic inlining enables further optimization. It provides smaller speedups if only local optimizations, such as strength reduction, are performed dynamically. In such cases, the cost of dynamic code generation may outweigh its benefits.
5.1 Experimental Methodology
The benchmarks that we measure have been described in previous sections. Table II briefly summarizes each benchmark, and lists the section in which it appears.
The performance improvements of dynamic code generation hinge on customizing code to data. As a result, the performance of all of the benchmarks in this section is data-dependent to some degree. In particular, the amount of code generated by the benchmarks is in some cases dependent on the input data. For example, since dp generates code to compute the dot product of an input vector with a run-time constant vector, the size of the dynamic code (and hence its i-cache performance) is dependent on the size of the run-time constant vector. Its performance relative to static code also depends on the density of 0s in the run-time constant vector, since those elements are optimized out when generating the dynamic code. Similarly, binary and dfa generate more code for larger inputs, which generally improves their performance relative to equivalent static code until negative i-cache effects come into play.
Some other benchmarks (ntn, ilp, and query) involve dynamic function inlining that is affected by input data. For example, the code inlined in ntn depends on the
Benchmark  Description                                              Section  Page
ms         Scale a 100x100 matrix by the integers in [10, 100]      2.3.2    8
hash       Hash table, constant table size, scatter value, and      4.1.1    22
           hash table pointer: one hit and one miss
dp         Dot product with a run-time constant vector:             4.1.2    23
           length 40, one-third zeroes
binary     Binary search on a 16-element constant array;            4.1.3    23
           one hit and one miss
pow        Exponentiation of 2 by the integers in [10, 40]          4.1.4    24
dfa        Finite state machine computation:                        4.1.5    25
           6 states, 13 transitions, input length 16
heap       Heapsort, parameterized with a specialized swap:         4.1.6    26
           500-entry array of 12-byte structures
mshl       Marshal five arguments into a byte vector                4.2      28
unmshl     Unmarshal a byte vector, and call a function of          4.2      28
           five arguments
ntn        Root of f(x) = (x+1)^2 to a tolerance of 10^-9           4.3.1    30
ilp        Integrated copy, checksum, byteswap of a 16KB buffer     4.3.2    31
query      Query 2000 records with seven binary comparisons         4.4.1    32

Table II. Descriptions of benchmarks.
function to be computed, that in ilp on the nature of the protocol stack, and that in query on the type of query submitted by the user. The advantage of dynamic code over static code increases with the opportunity for inlining and cross-function optimization. For example, an ilp protocol stack composed from many small passes will perform relatively better in dynamic code than one composed from a few larger passes.
Lastly, a few benchmarks are relatively data-independent. pow, heap, mshl, and unmshl generate varying amounts of code depending, respectively, on the exponent used, or the type and size of the objects being sorted, marshaled, or unmarshaled, but the differences are small for most reasonable inputs. ms obtains performance improvements by hard-wiring loop bounds and strength-reducing multiplication by the scale factor. hash makes similar optimizations when computing a hash function. The values of run-time constants may affect performance to some degree (for example, excessively large constants are not useful for this sort of optimization), but such effects are much smaller than those of more large-scale dynamic optimizations.
Each benchmark was written both in `C and in static C. The `C programs were compiled with both the vcode- and the icode-based tcc back ends. When measuring the performance of the icode runtime system, we always employed linear scan register allocation with live intervals derived from live variable information. The static C programs were compiled both with the lcc compiler and with the GNU C compiler, gcc. The code-generating functions used for dynamic code generation are created from the lcc intermediate representation, using that compiler's code generation strategies. As a result, the performance of lcc-generated code should be used as the baseline to measure the impact of dynamic code generation. Measurements collected using gcc serve to compare tcc to an optimizing compiler of reasonable quality.
The machine used for measurements is a Sun Ultra 2 Model 2170 workstation with
384 MB of main memory and two 168 MHz UltraSPARC-I CPUs. The UltraSPARC-I
can issue up to 2 integer and 2 floating point instructions per cycle, and has a
write-through, non-allocating, direct-mapped, on-chip 16 KB cache. It implements
the SPARC version 9 architecture [SPARC International 1994]. tcc also generates
code for the MIPS family of processors; we report only SPARC measurements for
clarity, since results on the two architectures are similar.
Times were obtained by measuring a large number of trials (enough to provide
several seconds of granularity, with negligible standard deviations) using the Unix
getrusage system call. The number of trials varied from 100 to 100,000, depending on
the benchmark. The resulting times were then divided by the number of iterations
to obtain the average overhead of a single run. This form of measurement ignores
the effects of cache refill misses, but is representative of how these applications
would likely be used (for example, in tight inner loops).
Section 5.2 discusses the performance effects of using dynamic code generation:
specifically, the speedup of dynamic code relative to static code, and the overhead of
dynamic code generation relative to speedup. Section 5.3 presents breakdowns of
the dynamic compilation overhead of both vcode and icode in units of processor
cycles per generated instruction.
5.2 Performance
This section shows that tcc provides low-overhead dynamic code generation, and
that it can be used to speed up a number of benchmarks. We describe results
for the benchmarks in Table II and for xv, a freely available image manipulation
package.
We compute the speedup due to dynamic code generation by dividing the time
required to run the static code by the time required to run the corresponding
dynamic code. We measure overhead by calculating each benchmark's "cross-over"
point, if one exists. This point is the number of times that dynamic code must be
used so that the overhead of dynamic code generation equals the time gained by
running the dynamic code.
The performance of dynamic code is up to an order of magnitude better than
that of unoptimized static code. In many cases, the performance improvement of
using dynamic code generation can be amortized over fewer than ten runs of the
dynamic code. The benchmarks that achieve the highest speedups are those in
which dynamic information allows the most effective restructuring of code relative
to the static version. The main classes of such benchmarks are numerical code
in which particular values allow large amounts of work to be optimized away (for
example, dp), code in which an expensive layer of data structure interpretation can
be removed at run time (for example, query), and code in which inlining can be
performed dynamically but not statically (for example, ilp).
vcode generates code approximately three to eight times more quickly than
icode. Nevertheless, the code generated by icode can be considerably faster than
that generated by vcode. A programmer can choose between the two systems
to trade code quality for code generation speed, depending on the needs of the
application.
[Figure 25: bar chart comparing icode-lcc, vcode-lcc, icode-gcc, and vcode-gcc. Y-axis: speedup (static time / dynamic time), 0 to 10. X-axis benchmarks: ms, hash, dp, binary, pow, dfa, heap, mshl, unmshl, ntn, ilp, query.]
Fig. 25. Speedup of dynamic code over static code.
5.2.1 Speedup. Figure 25 shows that using `C and tcc improves the performance
of almost all of our benchmarks. Both in this figure and in Figure 26, the legend
indicates which static and dynamic compilers are being compared. icode-lcc compares
dynamic code created with icode to static code compiled with lcc; vcode-lcc
compares dynamic code created with vcode to static code compiled with lcc. Similarly,
icode-gcc compares icode to static code compiled with gcc, and vcode-gcc
compares vcode to static code compiled with gcc.
In general, dynamic code is significantly faster than static code: speedups by a
factor of two relative to the best code emitted by gcc are common. Unsurprisingly,
the code produced by icode is faster than that produced by vcode, by up to
50% in some cases. Also, the GNU compiler generates better code than lcc, so
the speedups relative to gcc are almost always smaller than those relative to lcc.
As mentioned earlier, however, the basis for comparison should be lcc, since the
code-generating functions are generated by an lcc-style back end, which does not
perform static optimizations.
Dynamic code generation does not pay off in only one benchmark, unmshl. In
this benchmark, `C provides functionality that does not exist in C. The static code
used for comparison implements a special case of the general functionality provided
by the `C code, and it is very well tuned.
5.2.2 Cross-over. Figure 26 indicates that the cost of dynamic code generation
in tcc is reasonably low. The cross-over point on the vertical axis is the number
of times that the dynamic code must be used in order for the total overhead of its
compilation and uses to be equal to the overhead of the same number of uses of
static code. This number is a measure of how quickly dynamic code "pays for itself."

[Figure 26: bar chart comparing icode-lcc, vcode-lcc, icode-gcc, and vcode-gcc. Y-axis: cross-over point (number of runs, log scale, 0.001 to 10000). X-axis benchmarks: ms, hash, dp, binary, pow, dfa, heap, mshl, unmshl, ntn, ilp, query.]
Fig. 26. Cross-over points, in number of runs. If the cross-over point does not exist, the bar is
omitted.

For all benchmarks except query, one use of dynamic code corresponds to one run
of the dynamically created function. In query, however, the dynamic code is used
as a small part of the overall algorithm: it is the test function used to determine
whether a record in the database matches a particular query. As a result, in that
case we define one use of the dynamic code to be one run of the search algorithm,
which corresponds to many invocations (one per database entry) of the dynamic
code. This methodology realistically measures how specialization is used in these
cases.
In the case of unmshl, the dynamic code is slower than the static one, so the cross-over
point never occurs. Usually, however, the performance benefit of dynamic
code generation occurs after a few hundred or fewer runs. In some cases (ms,
heap, ilp, and query), the dynamic code pays for itself after only one run of the
benchmark. In ms and heap, this occurs because a reasonable problem size is large
relative to the overhead of dynamic compilation, so even small improvements in run
time from strength reduction, loop unrolling, and hard-wiring pointers outweigh
the code generation overhead. In addition, ilp and query exemplify the types of
applications in which dynamic code generation can be most useful: ilp benefits
from extensive dynamic function inlining that cannot be performed statically, and
query dynamically removes a layer of interpretation inherent in a database query
language.
Figures 25 and 26 show how dynamic compilation speed can be exchanged for
Convolution mask (pixels)    Times (seconds)
lcc | gcc | tcc icode | DCG overhead