
Profile-Guided Automatic Inline Expansion for C Programs

Pohua P. Chang, Scott A. Mahlke, William Y. Chen and Wen-mei W. Hwu

Center for Reliable and High-Performance Computing

Coordinated Science Laboratory

University of Illinois, Urbana-Champaign

To appear in Software: Practice and Experience, John Wiley

SUMMARY

This paper describes critical implementation issues that must be addressed to develop a fully
automatic inliner. These issues are: integration into a compiler, program representation, hazard
prevention, expansion sequence control, and program modification. An automatic inter-file inliner
that uses profile information has been implemented and integrated into an optimizing C compiler.
The experimental results show that this inliner achieves significant speedups for production C
programs.

Keywords: Inline expansion, C compiler, Code optimization, Profile information

INTRODUCTION

Large computing tasks are often divided into many smaller subtasks which can be more easily
developed and understood. Function definition and invocation in high level languages provide a
natural means to define and coordinate subtasks to perform the original task. Structured
programming techniques therefore encourage the use of functions. Unfortunately, function invocation
disrupts compile-time code optimizations such as register allocation, code compaction, common
subexpression elimination, constant propagation, copy propagation, and dead code removal.

Emer and Clark reported that, for a composite VAX workload, 4.5% of all dynamic instructions are
function calls and returns [1]. If we assume equal numbers of call and return instructions, the above
number indicates that there is a function call instruction for every 44 instructions executed.
Eickemeyer and Patel reported a dynamic call frequency of one out of every 27 to 130 VAX instructions
[2]. Gross, et al., cited a dynamic call frequency of one out of every 25 to 50 MIPS instructions [3].
Patterson and Sequin reported that function call is the most costly source language statement [4].
All these previous results argue for an effective approach to reducing function call costs.

Inline function expansion (or simply inlining) replaces a function call with the function body.
Inline function expansion removes the function call/return costs and provides enlarged and
specialized functions to the code optimizers. In a recent study, Allen and Johnson identified inline
expansion as an essential part of a vectorizing C compiler [5]. Scheifler implemented an inliner
that takes advantage of profile information in making inlining decisions for the CLU programming
language. Experimental results, including function invocation reduction, execution time reduction,
and code size expansion, were reported based on four programs written in CLU [6].

Several code improving techniques may be applicable after inline expansion. These include
register allocation, code scheduling, common subexpression elimination, constant propagation, and
dead code elimination. Richardson and Ganapathi have discussed the effect of inline expansion and
code optimization across functions [7].

Many optimizing compilers can perform inline expansion. For example, the IBM PL.8 compiler
does inline expansion of low-level intrinsic functions [8]. In the GNU C compiler, the programmers
can use the keyword inline as a hint to the compiler for inline expanding function calls [9]. In
the Stanford MIPS C compiler, the compiler examines the code structure (e.g. loops) to choose
the function calls for inline expansion [10]. Parafrase has an inline expander based on program
structure analysis to increase the exposed program parallelism [11]. It should be noted that the
careful use of macro expansion and language preprocessing utilities has the same effect as inline
expansion, where inline expansion decisions are made entirely by the programmers.

Davidson and Holler have developed an automatic source-to-source inliner for C [12] [13].
Because their inliner works on the C source program level, many existing C programs for various
computer systems can be optimized by their inliner. The effectiveness of their inliner has been
confirmed by strong experimental data collected for several machine architectures.

In the process of developing an optimizing C compiler, we decided to allocate about 6 man-months
to construct a profile-guided automatic inliner [14]. We expect that an inliner can enlarge
the scope of code optimization and code scheduling, and eliminate a large percentage of function
calls. In this paper, we describe the major implementation issues regarding a fully automatic
inliner for C, and our design decisions. We have implemented the inliner and integrated it into
our prototype C compiler. The inliner consists of approximately 5200 lines of commented C code,
not including the profiler that is used to collect profile data. The inliner is a part of a portable C
compiler front-end that has been ported to Sun3, Sun4 and DEC-3100 workstations running UNIX
operating systems.

CRITICAL IMPLEMENTATION ISSUES

The basic idea of inlining is simple. Most of the difficulties are due to hazards, missing
information, and reducing the compilation time. We have identified the following critical issues of inline
expansion:

1. Where should inline expansion be performed in the compilation process?

2. What data structure should be employed to represent programs?

3. How can hazards be avoided?

4. How should the sequence of inlining be controlled to reduce compilation cost?

5. What program modifications are made for inlining a function call?

A static function call site (or simply call site) refers to a function invocation specified by the
static program. A function call is the activity of invoking a particular function from a particular
call site. A dynamic function call is an executed function call. If a call site can potentially invoke
more than one function, the call site has more than one function call associated with it. This is
usually due to the use of the call-through-pointer feature provided in some programming languages.
The caller of a function call is the function which contains the call site of that function call. The
callee of a function call is the function invoked by the function call.

Figure 1: Separate compilation paradigm. (Diagram: each source file a.c, b.c, c.c, d.c is parsed, optimized, and code-generated into its object file a.o, b.o, c.o, d.o; inter-object file references are then resolved to produce the executable image.)

Integration into the compilation process: The first issue regarding inline function expansion
is where inlining should be performed in the translation process. In most traditional program
development environments, the source files of a program are separately compiled into their corresponding
object files before being linked into an executable file (see Figure 1). The compile time is defined
as the period of time when the source files are independently translated into object files. The link
time is defined as the duration when the object files are combined into an executable file. Most of
the optimizations are performed at compile time, whereas only a minimal amount of work to link
the object files together is performed at link time. This simple two-stage translation paradigm is
frequently referred to as the separate compilation paradigm.

A major advantage of the separate compilation paradigm is that when one of the source files
is modified, only the corresponding object file needs to be regenerated before linking the object
files into the new executable file, leaving all the other object files intact. Because most of the
translation work is performed at compile time, separate compilation greatly reduces the cost of
program recompilation when only a small number of source files are modified. Therefore, the two-stage
separate compilation paradigm is the most attractive for program development environments
where programs are frequently recompiled and usually a small number of source files are modified
between each recompilation. There are programming tools, such as the UNIX make program, to
exploit this advantage.

Our extension to the separate compilation paradigm to allow inlining at compile time is illustrated
in Figure 2. Performing inline function expansion before the code optimization steps ensures
that these code optimization steps benefit from inlining. For example, functions are often created
as generic modules to be invoked for a variety of purposes. Inlining a function call places the
body of the corresponding function into a specific invocation, which eliminates the need to cover
the service required by the other callers. Therefore, optimizations such as constant propagation,
copy propagation¹, and dead code removal can be expected to be more effective with inlining.

Performing inline function expansion at compile time requires the callee function source or

¹ Note that it is possible to do link-time inline expansion and still perform global optimization. In the MIPS
compiler, for instance, a link phase with function inlining can optionally occur before the optimizer and subsequent
compilation phases [15]. The same effect is achieved: inline expansion is performed before the optimization steps.

Figure 2: Inlining at compile time. (Diagram: each source file a.c, b.c, c.c, d.c is parsed into intermediate code a.i, b.i, c.i, d.i; inline expansion produces a'.i, b'.i, c'.i, d'.i, which are then optimized and code-generated into object files a'.o, b'.o, c'.o, d'.o; the linker resolves inter-object file references to produce the executable image.)

intermediate code to be available when the caller is compiled. Note that the callee functions can
reside in different source files than the callers. As a result, the caller and callee source files can
no longer be compiled independently. Also, whenever a callee function is modified, both the callee
and caller source files must be recompiled. This coupling between the caller and callee source files²
reduces the advantage of the two-step translation process.

In practice, some library functions are written in assembly languages; they are available only
in the form of object files to be integrated with the user object files at link time. These library
functions are not available for inline function expansion at compile time. Dynamically linked
libraries represent a step further in the direction of separating the library functions from the user
programs invoking them. The dynamically linked library functions are not available for inline
function expansion at all.

Program representation: The second issue regarding inline function expansion is what data
structure should be employed to represent the program. In order to support efficient inlining,
the data structure should have two characteristics. First, the data structure should conveniently
capture the dynamic and static function calling behavior of the represented programs. Second,
efficient algorithms should be available to construct and manipulate the data structure during the
whole process of inline function expansion. Weighted call graphs, as described below, exhibit both
desirable characteristics.

A weighted call graph captures the static and dynamic function call behavior of a program. A
weighted call graph, a directed multigraph G = (N, E, main), is characterized by three major
components: N is a set of nodes, E is a set of arcs, and main is the first node of the call graph.
Each node in N is a function in the program and has associated with it a weight, which is the
number of invocations of the function by all callers. Each arc in E is a static function call in the
program and has associated with it a weight, which is the execution count of the call. Finally, main
is the first function executed in this program. The node weights and arc weights are determined
by profiling.

² To support program development, the inliner can generate a makefile that correctly recompiles the program when
a source file is modified. The makefile specifies all the source files that an object file depends on after inlining. When
a source file is modified, all the files that received function bodies from the modified source file will be recompiled
by invoking the makefile. The problem of program development and maintenance with inlining is beyond the scope of
this paper and is currently being investigated by the authors. Currently, the inliner serves as a tool to enhance the
performance of a program before its production use.

An example of a weighted call graph is shown in Figure 3. There are eight functions in this
example: main, A, B, C, D, E, F, and G. The weights of these functions are indicated beside the
names of the functions. For example, the weights of functions A and E are 5 and 7 respectively.
Each arc in the call graph represents a static function call whose weight gives its expected dynamic
execution count in a run. For example, the main function calls G from two different static locations;
one is expected to execute one time and the other is expected to execute two times in a typical run.
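A weighted call graph of this kind can be sketched as a pair of C structures; the field names and the fixed array bound below are illustrative, not taken from the authors' implementation:

```c
#define MAX_ARCS 16

struct arc;                      /* a static call site in the program */

struct node {                    /* a function in the program */
    const char *name;
    long weight;                 /* profiled number of invocations */
    struct arc *calls[MAX_ARCS]; /* outgoing arcs: this function's call sites */
    int ncalls;
};

struct arc {                     /* a static function call */
    struct node *caller;
    struct node *callee;
    long weight;                 /* profiled execution count of this call site */
};

/* Record that caller invokes callee from one call site, weight times. */
static void add_arc(struct arc *a, struct node *caller,
                    struct node *callee, long weight)
{
    a->caller = caller;
    a->callee = callee;
    a->weight = weight;
    caller->calls[caller->ncalls++] = a;
}
```

With this representation, the two static calls from main to G in Figure 3 would simply be two distinct arcs with weights 1 and 2 sharing the same caller and callee nodes.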

Inlining a function call is equivalent to duplicating the callee node, absorbing the duplicated
node into the caller node, eliminating the arc from the caller to the callee, and possibly creating
some new arcs in the weighted call graph. For example, inlining B into D in Figure 3 involves
duplicating B, absorbing the duplicated B into D, eliminating the arc going from D to B, and
creating a new system call arc. The resulting call graph is shown in Figure 4.

Detecting recursion is equivalent to detecting cycles in the weighted call graph. For example, a
recursion involving functions A and E in Figure 3 can be identified by detecting the cycle involving
nodes A and E in the weighted call graph. Identifying functions which can never be reached during
execution is equivalent to finding unreachable nodes from the main node. For example, Function
B is no longer reachable from the main function after it is inline expanded into Function D (see

Figure 3: A weighted call graph. (Diagram: nodes main; A,5; B,70; C,70; D,70; E,7; F,7; and G,3, connected by weighted arcs; two arcs lead from main to G with weights 1 and 2, and B has an arc to a system call node.)

Figure 4: An inlining example. (Diagram: the call graph of Figure 3 after B is inlined into D; D,70 has absorbed a copy of B,70 and gained its own system call arc, while the original B,70 and C,70 remain.)

Figure 4). This can be determined by identifying all the unreachable nodes from the main node in
the weighted call graph. Efficient graph algorithms for these operations are widely available [16].
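Unreachable-function detection of this kind can be sketched with a simple depth-first search from main; the adjacency-matrix representation and fixed bound below are simplifications for illustration, not the authors' data structure:

```c
#include <string.h>

#define MAXN 32

/* adj[i][j] != 0 means function i contains a call site of function j. */
static void dfs(int n, int adj[MAXN][MAXN], int from, int seen[MAXN])
{
    seen[from] = 1;
    for (int to = 0; to < n; to++)
        if (adj[from][to] && !seen[to])
            dfs(n, adj, to, seen);
}

/* Number of functions unreachable from main (node 0); these function
   bodies can be eliminated after inlining. */
static int count_unreachable(int n, int adj[MAXN][MAXN])
{
    int seen[MAXN];
    int dead = 0;

    memset(seen, 0, sizeof seen);
    dfs(n, adj, 0, seen);
    for (int i = 0; i < n; i++)
        if (!seen[i])
            dead++;
    return dead;
}
```

Cycle detection for recursions can be built on the same traversal by tracking nodes on the current search path.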

When the inline expander fails to positively determine the internal function calling characteristics
of some functions, there is missing information in the call graph construction. The two major
causes of the missing information are calling external functions and calling through pointers.
Calling external functions occurs when a program invokes a function whose source file is unavailable
to the inline expander. Examples include privileged system service functions and library functions
distributed without source files. Because these functions can perform function calls themselves, the
call graphs thus constructed are incomplete. Practically, because some privileged system services
and library functions can invoke user functions, a call to an external function may have to be
assumed to indirectly reach all nodes whose function addresses have been used in the computation
in order to detect all recursions and all functions reachable from main.

A special node EXTERN is created to represent all the external functions. A function which
calls external functions requires only one outgoing arc to the EXTERN node. In turn, the
EXTERN node has many outgoing arcs, one to each function whose address has been used in
the computation, to reflect the fact that these external functions can potentially invoke every such
function in the call graph.

Calling through pointers is a language feature which allows the callee of a function call to be
determined at run time. Theoretically, the set of potential callees for a call through pointer
can be identified using program analysis. A special node PTR is used to represent all the functions
which may be called through pointers. Calls through pointers are not considered for inlining in
our implementation. Rather than assigning a node to represent the potential callee of each call
through pointer, PTR is shared among all calls through pointers. In fact, PTR is assumed to reach
all functions whose addresses have been used in the computation. This again ensures that all the
potential recursions and all the functions reachable from main can be safely detected.

Hazard detection and prevention: The third issue regarding inline function expansion is how
the hazardous function calls should be excluded from inlining. Four hazards have been identified
in inline expansion: unavailable callee function bodies, multiple potential callees for a call site,
activation stack explosion, and variable number of arguments. A practical inline expander has to
address all these hazards. All the hazardous function calls are excluded from the weighted call
graph and are not considered for inlining by the sequence controller.

The bodies of external functions are unavailable to the compiler. External functions include
privileged system calls and library functions that are written in an assembly language. In the case
of privileged system calls, the function body is usually not available regardless of whether the inline
expansion is performed at compile time or link time.

Multiple potential callees for a call site occur due to calling through pointers. Because the
callees of calls through pointers depend on the run-time data, there is, in general, more than one
potential callee for each call site. Note that each inline expansion is equivalent to replacing a call
site with a callee function body. If there is more than one potential callee, replacing the call site
with only one of the potential callee function bodies eliminates all the calls to the other callees by
mistake. Therefore, function calls originating from a call site with multiple potential callees should
not be considered for inline expansion. If a call through pointer is executed with extremely high
frequency, one can insert IF statements to selectively inline the most frequent callees. This may
be useful for programs with a lot of dispatching during run time, such as logic simulators.

Parameter passing, register saving, local variable declarations, and returned value passing
associated with a function can all contribute to the activation stack usage. A summarized activation
stack usage can be computed for each function. A recursion may cause activation stack overflow
if a call site with a large activation record is inlined into one of the functions in the recursion. For
example, a recursive function m(x) and another function n(x) are defined as follows.

m(x) { if (x > 0) return m(x-1); else return n(x); }
n(x) { int y[100000]; ..... }

For the above example, two activation stacks are shown in Figure 5, one with inline expansion
and one without. Note that inlining n(x) into the recursion significantly increases the activation
stack usage. If m(x) tends to be called with a large x value, expanding n(x) will cause an explosion
of activation stack usage. Programs which run correctly without inline expansion may not run after
inline expansion. To prevent activation stack explosion, a limit on the control stack usage can be
imposed for inline expanding a call into a recursion.
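Such a limit can be sketched as a simple predicate consulted before inlining into a function on a recursive cycle; the byte budget and function name below are hypothetical, not the compiler's actual interface:

```c
/* Illustrative per-frame budget; a real compiler would take this from
   a command-line option rather than a hard-coded constant. */
#define STACK_INLINE_LIMIT 4096L

/* Returns 1 if inlining the callee into a function that belongs to a
   recursion keeps the enlarged activation record within the budget. */
static int may_inline_into_recursion(long caller_frame_bytes,
                                     long callee_frame_bytes)
{
    /* Inlining adds the callee's locals and temporaries to every
       activation of the recursive caller, so the combined frame must
       stay small. */
    return caller_frame_bytes + callee_frame_bytes <= STACK_INLINE_LIMIT;
}
```

For the m/n example above, n's 100000-element array makes its summarized frame far exceed any reasonable budget, so the predicate would reject the expansion.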

In C, a function can expect a variable number of parameters. Moreover, the parameter data
types may vary from call to call (e.g., printf). In the current implementation of our compiler, these
calls are excluded from being inlined. This is done by writing the names of this type of functions³
in a file, and specifying this file as a compiler option.

Sequence control: The fourth issue regarding inline function expansion is how the sequence of
inlining should be controlled to minimize unnecessary computation and code expansion. In this
step, we do not consider the hazardous function calls. The sequence control in inline expansion
determines the order in which the arcs in the weighted call graph, i.e., the static function
calls in the program, are inlined. Different sequence control policies result in different numbers of

³ We are currently developing the program analysis required to efficiently handle varargs functions, which is
important for programs such as graphics and windowing packages.

Figure 5: Activation stack explosion. (Diagram: activation stack growth for the recursive call graph with and without expansion; without expansion each recursive activation uses M with a single final frame of N, while with expansion every activation uses M+N.)

expansions, different code size expansion, and different reduction in dynamic function calls. All
these considerations affect the cost-effectiveness of inline expansion, and some of them conflict with
one another.

The sequence control of inline expansion can be naturally divided into two steps: selecting
the function calls for expansion and actually expanding these functions. The goal of selecting the
function calls is to minimize the number of dynamic function calls subject to a limit on code size
increase. The goal of expansion sequence control is to minimize the computation cost incurred by
the expansion of these selected function calls. Both steps will be discussed in this section.

Figure 6: An example of restricted inlining. (Diagram: A calls F 990 times, B calls F 10 times, and F calls L 100 times.)

In this section, we will limit the discussion to a class of inline expansion with the following
restriction. If a function F has a callee L and L is to be inlined into F, then all functions absorbing
F will also absorb L. Note that this restriction can cause some extra code expansion, as illustrated
in the following example. Function F calls L 100 times and is called by A 990 times and B 10
times (see Figure 6). In this call graph, there is not enough information to separate the number
of times F calls L when it is being invoked by A and by B. Assume F is to be absorbed into both
A and B. If F calls L 99 times when it is invoked by A and 1 time when by B, then L should be
absorbed into A but not B (see Figure 7). With our restriction, however, L will be absorbed into
both A and B (see Figure 7). Obviously absorbing L into B is not cost-effective in this case.

Figure 7: Lost opportunity. (Diagram: without the restriction, only A absorbs F and L while B's copy of F still calls L; with the restriction, both A and B absorb F together with L.)

The problem is, however, that there is not enough information in the call graph to attribute the
F→L weight to A and B separately. Therefore, the decision to absorb L only into A would be based
on nonexistent information. Also, to accurately break down the weights, one needs to duplicate
each arc as many times as the number of possible paths via which the arc can be reached from the
main function. This will cause an exponential explosion of the number of arcs in the weighted call
graph.

After detecting all the hazards due to recursion, the call graph can be simplified by breaking all
the cycles. The cycles in the call graph can be broken by excluding the least important arc from
each cycle in the call graph. If the least important arc is excluded from inlining to break a cycle
involving N functions, one can lose the opportunity to eliminate up to 1/N of the dynamic calls
involved in the recursion. This is usually acceptable for N greater than 1.

Figure 8: Handling single-function recursions. (Diagram: a self-recursive call of weight W is inlined I times before the cycle is broken, leaving a residual recursive call of weight W/I.)

If N is equal to 1, breaking the cycle will eliminate all the opportunity of reducing the dynamic
calls in the recursion. If the recursion happens to be the dominating cause of dynamic function
calls in the entire program, one would lose most of the call reduction opportunity by breaking the
cycle. There is, however, a simple solution to this problem (see Figure 8). One can inline the
recursive function call I times before breaking the cycle. In this case, one loses only 1/I of the call
reduction opportunity by breaking the cycle. The weighted call graph becomes a directed acyclic
graph after all the cycles are broken. All the following discussions assume this property.

It is desirable to expand as many frequently executed function calls, i.e. heavily weighted arcs in
the call graph, as possible. However, unlimited inline expansion may cause code size explosion. In
order to expand a function call, the body of the callee must be duplicated and the new copy of the
callee must be absorbed by the caller. Obviously, this code duplication process in general increases
program code size. Therefore, it is necessary to set an upper bound on the code size expansion.
This limit may be specified as a fixed number and/or as a function of the original program size.
The problem with using a fixed limit is that the size of the programs handled varies so much that it
is very difficult to find a single limit to suit all the programs. Setting the upper limit as a function
of the original program size tends to perform better for virtual memory and favor large programs.
We chose to set the limit as a percentage of the original program size through a compiler option.

Code size expansion increases the memory required to store the program and affects instruction
memory hierarchy performance. Precise costs cannot be obtained during inline expansion because
the code size depends on the optimizations to be performed after inline expansion. The combination
of copy propagation, constant propagation, and unreachable code removal will reduce the increase
in code size. Some original function bodies may become unreachable from the main function and
can be eliminated after inlining. Also, a detailed evaluation has shown that code expansion due to
inlining does not necessarily reduce the instruction cache performance [17]. Therefore, the cost of
inlining is estimated based on the intermediate code size increase rather than the accurate effect
on the instruction memory system performance.

Accurate benefits of inline expansion are equally difficult to obtain during inline expansion.
Inline expansion improves the effectiveness of register allocation and algebraic optimizations, which
reduces the computation steps and the memory accesses required to execute the program. Because
these optimizations are performed after inline expansion, the precise improvement of their
effectiveness due to inline expansion cannot be known during inline expansion. Therefore, the benefit
of inline expansion will be estimated only by the reduction in dynamic function calls.

The problem of selecting functions for inline expansion can be formulated as an optimization
problem that attempts to minimize dynamic calls given a limited code expansion allowance. In
terms of call graphs, the problem can be formulated as collecting a set of arcs whose total weight
is maximized while the code expansion limit is satisfied. It appears that the problem is equivalent
to a knapsack problem defined as follows: There is a pile of valuable items each of which has a
value and a weight. One is given a knapsack which can only hold up to a certain weight. The
problem is to select a set of the items whose total weight fits in the knapsack and the total value is
maximized. The knapsack problem has been shown to be NP-complete [18]. However, this
straightforward formulation is unfortunately incorrect for inlining. The code size of each function changes
during the inlining process. The code size increase due to inlining each function call depends on
the decision made about each function call. The decision made about each function call, in turn,
depends on the code size increase. This dilemma is illustrated in Figure 9.

Figure 9: Inter-dependence between code size increase and sequencing. (Diagram: inlining both F and L into A versus inlining only F into A while L remains a separate function called from the copy of F.)

If L is to be inlined into F, the code expansion due to inlining F into A is the total size of F
and L. Otherwise, the code expansion is just the size of F. The problem is that the code increase
and the expansion decision depend on each other. Therefore, inline expansion sequencing is even
more difficult than the knapsack problem. Nevertheless, we will show that a selection algorithm
based on the call reduction achieves good results in practice.

Di erent inline expansion sequences can b e used to expand the same set of selected functions.

For example, in Figure 10, Function D is invoked by b oth E and G. Assume that the selection

step decides to absorb D, B, and C into b oth E and G. There are at least two sequences which

can achieve the same goal. One sequence is illustrated in Figure 10, where E!D and G!D are

eliminated rst. Note that by absorbing D into b oth E and G and therefore eliminating E!D and

G!Dintwo expansion steps, four new arcs are created: E!B, E!C, G!B, and G!C. It takes

four more steps to further absorb B and C into b oth E and G to eliminate all these four new arcs.

Therefore, it takes a total of 6 expansion steps to achieve the original goal.

A second sequence is illustrated in Figure 11, where B and C are first absorbed into D,
eliminating D→B and D→C. Function D, after absorbing B and C, is then absorbed into E and G. This
further eliminates E→D and G→D. Note that it only takes a total of 4 expansion steps to achieve
the original goal. The general observation is that if a function is to be absorbed by more than one
caller, inlining this function into its caller before absorbing its callees can increase the total steps
of expansion.

For the class of inlining algorithms considered in this paper, the rule for minimizing the expansion
steps can be stated as follows: If a function F is absorbed into more than one caller, all the

Figure 10: Inlining a function before absorbing its callees. (Diagram: D,70 is absorbed into both E,7 and G,3 before B,70 and C,70 are absorbed, so B and C must later be absorbed into both E and G.)

Figure 11: Inlining a function after absorbing its callees. (Diagram: B,70 and C,70 are first absorbed into D,70, and the enlarged D is then absorbed into E,7 and G,3.)

callees to be inlined into F must be already inlined. It is clear that any violation of this rule
will increase the number of expansions. It is also clear that an algorithm conforming to this rule
will perform N expansion steps, where N is the number of function calls to be inlined. Therefore,
an algorithm conforming to the rule is an optimal one as far as the number of expansion steps is
concerned.

In a directed acyclic call graph, the optimal rule can be realized by an algorithm that manipulates
a queue of terminal nodes. The terminal nodes in the call graph are inlined into their callers, if
desired, and eliminated from the call graph. This produces a new group of terminal nodes, which
are inserted into the queue. The algorithm terminates when all the nodes have been eliminated from
the call graph. The complexity of this algorithm is O(N), where N is the number of function calls
in the program eligible for inlining.
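The queue-based algorithm can be sketched as follows. This is a minimal illustration, not the paper's implementation: the call-graph representation (fixed-size arrays, per-function callee counts and caller lists) is hypothetical, and inlining is modelled simply by counting expansion steps.

```c
#define MAXN 64

static int callee_count[MAXN];     /* outstanding callees per function    */
static int caller_of[MAXN][MAXN];  /* caller_of[f][i]: i-th caller of f   */
static int ncallers[MAXN];

/* Process functions in terminal-first order; returns the number of
   expansion steps, which equals the number of call arcs inlined. */
int expand_in_order(int nfuncs)
{
    int queue[MAXN], head = 0, tail = 0, steps = 0;

    for (int f = 0; f < nfuncs; f++)
        if (callee_count[f] == 0)
            queue[tail++] = f;          /* initial terminal nodes */

    while (head < tail) {
        int f = queue[head++];
        for (int i = 0; i < ncallers[f]; i++) {
            int c = caller_of[f][i];
            steps++;                    /* inline f into caller c */
            if (--callee_count[c] == 0)
                queue[tail++] = c;      /* c has become terminal */
        }
    }
    return steps;
}

/* Example: the subgraph of Figure 10 in which E and G call D,
   and D calls B and C (indices: 0=E, 1=G, 2=D, 3=B, 4=C). */
int figure_example(void)
{
    callee_count[0] = 1; ncallers[0] = 0;                     /* E */
    callee_count[1] = 1; ncallers[1] = 0;                     /* G */
    callee_count[2] = 2; ncallers[2] = 2;                     /* D */
    caller_of[2][0] = 0; caller_of[2][1] = 1;
    callee_count[3] = 0; ncallers[3] = 1; caller_of[3][0] = 2; /* B */
    callee_count[4] = 0; ncallers[4] = 1; caller_of[4][0] = 2; /* C */
    return expand_in_order(5);  /* four steps, matching Figure 11 */
}
```

Running the example absorbs B and C into D first, then D into E and G, reproducing the optimal four-step sequence of Figure 11 rather than the six-step sequence of Figure 10.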

We implemented a simpler sequence control method that approximates the optimal queue-based
algorithm. Inline expansion is constrained to follow a linear order. The functions are first sorted
into a linear list according to their weights, with the most frequently executed function at the head
of the list. A function X can be inlined into another function Y if and only if X appears before Y in
the linear list. Therefore, all inline expansions pertaining to function X must already have been
performed before function Y is processed. The rationale is that frequently executed functions are
usually called by less frequently executed functions.
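The linear-order approximation above can be sketched in a few lines. The struct layout, function names, and weight values are illustrative assumptions, not the paper's data structures.

```c
#include <stdlib.h>
#include <string.h>

struct func { const char *name; long weight; };

/* Comparator for sorting by profile weight, most frequent first. */
int by_weight_desc(const void *a, const void *b)
{
    long wa = ((const struct func *)a)->weight;
    long wb = ((const struct func *)b)->weight;
    return (wb > wa) - (wb < wa);
}

/* Position of a function in the sorted order, or -1 if absent. */
int position(const struct func *order, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(order[i].name, name) == 0)
            return i;
    return -1;
}

/* May callee x be inlined into caller y under the linear order? */
int may_inline(const struct func *order, int n,
               const char *x, const char *y)
{
    int px = position(order, n, x), py = position(order, n, y);
    return px >= 0 && py >= 0 && px < py;
}
```

With the Figure 10 weights, sorting {E,7}, {G,3}, {D,70} with `qsort(fs, 3, sizeof fs[0], by_weight_desc)` places D first, so D may be inlined into E but not vice versa.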

Program modification: The fifth issue regarding function inline expansion is what the essential
operations for inlining a function call are. This task consists of the following parts: (1) callee
duplication, (2) variable renaming, (3) parameter handling, and (4) elimination of unreachable functions.

To avoid conflicts between the local, parameter, and static variables of the caller and those of
the callee, our C compiler creates a global name space for the entire program in the intermediate
representation. This is achieved by a renaming mechanism invoked before inlining.

The inliner handles formal parameters by assigning the actual parameter values to them. The
return value has to be assigned to a new local temporary variable so that it can be used by the
caller. These assignments are often eliminated later by constant propagation, copy propagation,
and dead code elimination.
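A source-level picture of this transformation may help. The functions and variable names below are hypothetical; the compiler performs the equivalent rewriting on its intermediate representation, not on C source.

```c
/* A callee with one formal parameter and a return value. */
int square(int x) { return x * x; }

/* Original call site. */
int caller_before(int a)
{
    return square(a + 1);
}

/* After inlining: the formal parameter becomes an assignment from
   the actual, and the return value is captured in a fresh temporary.
   Both assignments are candidates for later copy propagation and
   dead code elimination. */
int caller_after(int a)
{
    int x_1 = a + 1;         /* renamed formal, bound to the actual */
    int tmp_ret = x_1 * x_1; /* return value into a new temporary   */
    return tmp_ret;
}
```

Both versions compute the same result; the inlined form simply exposes the callee's body to the caller's optimizer.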

Because programs always start from the main function, any function that is not reachable from
the main function will never be used and can be removed. A function is reachable from the main
function if there is a directed path in the call graph from the main function to the function, or if
the function may serve as an exception handler or be activated by some external function. In the
C language, the latter case can be detected by identifying all functions whose addresses are used in
computations.
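Unreachable-function elimination amounts to a graph reachability pass seeded with main and every address-taken function. The sketch below uses a hypothetical adjacency-matrix call graph; the paper does not specify the actual representation.

```c
#define MAXF 32

static int calls[MAXF][MAXF]; /* calls[f][g] != 0: f contains a call to g */
static int addr_taken[MAXF];  /* f's address appears in a computation     */
static int reachable[MAXF];

/* Depth-first marking of everything reachable from f. */
static void mark(int f, int nfuncs)
{
    if (reachable[f]) return;
    reachable[f] = 1;
    for (int g = 0; g < nfuncs; g++)
        if (calls[f][g]) mark(g, nfuncs);
}

/* Returns the number of removable (unreachable) functions.
   By convention here, function 0 is main. */
int find_removable(int nfuncs)
{
    mark(0, nfuncs);                 /* seed: main           */
    for (int f = 1; f < nfuncs; f++)
        if (addr_taken[f]) mark(f, nfuncs); /* seed: handlers etc. */

    int removable = 0;
    for (int f = 0; f < nfuncs; f++)
        if (!reachable[f]) removable++;
    return removable;
}
```

Treating address-taken functions as roots is the conservative choice: a function whose address escapes may be invoked through a pointer or registered as a handler, so it must never be deleted.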

EXPERIMENTS

Table 1 shows the set of classic local and global code optimizations that we have implemented
in our prototype C compiler. These code optimizations are common in commercial C compilers.
We have also implemented a priority-based global register allocator that uses profile information
to allocate important variables to processor registers. This register allocator assigns variables to
caller-save and callee-save registers intelligently to remove part of the function-call overhead.

Table 2 shows a set of eight C application programs that we have chosen as benchmarks. The
size column indicates the size of each benchmark program in number of lines of C code. The
description column briefly describes each benchmark program.

local                               global
-----                               ------
constant propagation                constant propagation
copy propagation                    copy propagation
common subexpression elimination    common subexpression elimination
redundant load elimination          redundant load elimination
redundant store elimination         redundant store elimination
constant folding                    loop invariant code removal
strength reduction                  loop induction strength reduction
constant combining                  loop induction elimination
operation folding                   global variable migration
dead code removal                   dead code removal
code reordering                     loop unrolling

Table 1: Code optimizations.

name      size  description
cccp      4787  GNU C preprocessor
compress  1514  file compression
eqn       2569  typeset mathematical formulas for troff
espresso  6722  boolean minimization
lex       3316  lexical analysis program generator
tbl       2817  format tables for troff
xlisp     7747  lisp interpreter
yacc      2303  parsing program generator

Table 2: Benchmarks.

name      runs  description
cccp      20    C source files (100-5000 lines)
compress  20    C source files (100-5000 lines)
eqn       20    ditroff files (100-4000 lines)
espresso  20    boolean minimizations (original espresso benchmarks)
lex       5     lexers for C, Lisp, Pascal, awk, and pic
tbl       20    ditroff files (100-4000 lines)
xlisp     5     Gabriel benchmarks
yacc      10    grammars for C, Pascal, pic, eqn, awk, etc.

Table 3: Characteristics of profile input data.

Table 3 describes the input data that we used for profiling. The runs column lists the
number of profiling inputs for each benchmark program. The description column briefly describes
the nature of these input data. Executing a benchmark program with one input produces a
profile data file. For each benchmark program, its profile data files are summarized into one profile
data file, which is used to guide the automatic inline expander. To evaluate performance, we use
input data different from those used for profiling to measure the execution time of the compiled
program.

name      external  pointer  intra-file  inter-file  inlined
cccp      143       1        191         4           23
compress  104       0        27          0           1
eqn       192       0        81          144         17
espresso  289       11       167         982         19
lex       203       0        110         234         6
tbl       310       0        91          364         46
xlisp     91        4        331         834         28
yacc      218       0        118         81          14

Table 4: Static characteristics of function calls.

name      external  pointer  intra-file  inter-file  inlined
cccp      1015      140      1414        3           1183
compress  25        0        4283        0           4276
eqn       5010      0        6959        33534       37440
espresso  728       60965    55696       925710      689454
lex       13375     0        63240       4675        56991
tbl       12625     0        9616        37809       35504
xlisp     4486885   479473   10308201    8453735     14861487
yacc      31751     0        34146       3323        33417

Table 5: Dynamic characteristics of function calls.

Table 4 describes the static (compile-time) characteristics of function calls; we report only call
sites that are visible to the compiler. The external

column shows the number of static call sites that call functions whose source code is not available
to the compiler. The pointer column shows the number of static call sites that call through pointers.
The intra-file column shows the number of static call sites that call functions in the same source
file. The inter-file column shows the number of static call sites that call functions in a different
source file. The inlined column shows the number of static call sites that are inline expanded.
Table 5 describes the dynamic (execution-time) characteristics of function calls.

Note that several benchmark programs, such as cccp, xlisp, and yacc, make a large fraction of
their calls to external functions. Currently, we do not have access to the source code of the C library
functions. Including these C library functions in inline expansion would reduce the relative importance
of external function calls. Our inliner can inline the call sites counted in the inter-file and intra-file
columns. Tables 4 and 5 show that inlining a small percentage of the static call sites removes a large
percentage of the dynamic calls. Since the number of static call sites inlined directly corresponds to
the compile time spent on inlining, this result clearly shows that profile information allows the compiler
to spend only a small portion of the compile-time budget to eliminate most of the dynamic calls.

name      global  global+inline  ratio
cccp      172564  215420         1.25
compress  72300   73228          1.00
eqn       130376  157528         1.21
espresso  311544  338508         1.09
lex       156148  165468         1.06
tbl       181064  214036         1.18
xlisp     267268  354092         1.32
yacc      141268  164584         1.17

Table 6: Code expansion (DEC-3100).

Table 6 indicates the code expansion ratios of the benchmark programs. The global column
shows the program sizes in bytes before inline expansion. The global+inline column shows the

name      global  global+inline
cccp      1.00    1.06
compress  1.00    1.05
eqn       1.00    1.12
espresso  1.00    1.07
lex       1.00    1.02
tbl       1.00    1.04
xlisp     1.00    1.46
yacc      1.00    1.03
average   1.00    1.11

Table 7: Speedups (DEC-3100).

program sizes in bytes after inline expansion. The ratio column shows the code expansion ratios.
The average code expansion ratio for the benchmark programs is about 1.16.

Table 7 shows the speedups of the benchmark programs. The speedup is calculated from the
real machine execution time on a DEC-3100 workstation. The global+inline column is computed
by dividing the execution time of the non-inlined code by the execution time of the inlined code. Note
that intelligent assignment of variables to caller-save and callee-save registers has already removed
part of the function-call overhead in the non-inlined code. The average speedup for the benchmark
programs is about 1.11.

Table 8 compares the speed of the code produced by MIPS CC (release 2.1, -O4) and GNU CC
(release 1.37.1, -O) with that of our final inlined code. The code produced by MIPS CC and GNU CC
is slightly slower than our inlined code. The speedup is calculated from the real machine execution
time on a DEC-3100 workstation. Note that MIPS CC performs link-time inline expansion and
GNU CC performs intra-file inline expansion, both before code optimization. The purpose of Table 8
is to calibrate our final inlined code against the code generated by other optimizing compilers.
Because the compilers

name      IMPACT global+inline  MIPS -O4  GNU -O
cccp      1.00                  0.93      0.92
compress  1.00                  0.98      0.94
eqn       1.00                  0.92      0.91
espresso  1.00                  0.98      0.87
lex       1.00                  0.99      0.96
tbl       1.00                  0.98      0.93
xlisp     1.00                  0.88      0.76
yacc      1.00                  1.00      0.90

Table 8: Speed comparison with other compilers (DEC-3100).

have different optimization capabilities, this speedup comparison should not be used to compare
the inlining capabilities of these compilers.

CONCLUSION

An automatic inliner has been implemented and integrated into an optimizing C compiler. The
inliner consists of 5200 lines of commented C statements and accounts for about 3% of the source
code of our compiler. In the process of designing and implementing this inliner, we have identified
several critical implementation issues: integration into a compiler, program representation, hazard
prevention, expansion sequence control, and program modification. In this paper, we have described
our implementation decisions. We have shown that this inliner eliminates a large percentage of
function calls and achieves significant speedups for a set of production C programs.

ACKNOWLEDGEMENT

The authors would liketoacknowledge Nancy Warter and all memb ers of the IMPACT research

group for their supp ort, comment, and suggestions. Sp ecial thanks to the anonymous referees

whose comments and suggestions help ed to improve the quality of this pap er signi cantly. This

research has been supported by the National Science Foundation (NSF) under Grant MIP-8809478,
Dr. Lee Hoevel at NCR, the AMD 29K Advanced Processor Development Division, Matsushita
Electric Corporation, and the National Aeronautics and Space Administration (NASA) under
Contract NASA NAG 1-613, in cooperation with the Illinois Computer Laboratory for Aerospace
Systems and Software (ICLASS).

References

[1] J. Emer and D. Clark, "A Characterization of Processor Performance in the VAX-11/780",
Proceedings of the 11th Annual International Symposium on Computer Architecture, June 1984.

[2] R. J. Eickemeyer and J. H. Patel, "Performance Evaluation of Multiple Register Sets",
Proceedings of the 14th Annual International Symposium on Computer Architecture, pp. 264-271,
Pittsburgh, Pennsylvania, June 2-5, 1987.

[3] T. R. Gross, J. L. Hennessy, S. A. Przybylski, and C. Rowen, "Measurement and Evaluation
of the MIPS Architecture and Processor", ACM Transactions on Computer Systems, pp. 229-257,
August 1988.

[4] D. A. Patterson and C. H. Sequin, "A VLSI RISC", IEEE Computer, pp. 8-21, September 1982.

[5] R. Allen and S. Johnson, "Compiling C for Vectorization, Parallelism, and Inline Expansion",
Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation,
June 1988.

[6] R. W. Scheifler, "An Analysis of Inline Substitution for a Structured Programming Language",
Communications of the ACM, vol. 20, no. 9, September 1977.

[7] S. Richardson and M. Ganapathi, "Code Optimization Across Procedures", IEEE Computer,
February 1989.

[8] M. Auslander and M. Hopkins, "An Overview of the PL.8 Compiler", Proceedings of the
SIGPLAN Symposium on Compiler Construction, June 1982.

[9] R. M. Stallman, Internals of GNU CC, 1988.

[10] F. Chow and J. Hennessy, "Register Allocation by Priority-based Coloring", Proceedings of
the ACM SIGPLAN Symposium on Compiler Construction, June 1984.

[11] C. A. Huson, "An In-line Expander for Parafrase", University of Illinois, Urbana-Champaign, 1982.

[12] J. W. Davidson and A. M. Holler, "A Study of a C Function Inliner", Software-Practice and
Experience, vol. 18, no. 8, pp. 775-790, August 1988.

[13] J. W. Davidson and A. M. Holler, "A Model of Subprogram Inlining", Computer Science
Technical Report TR-89-04, Department of Computer Science, University of Virginia, July 1989.

[14] W. W. Hwu and P. P. Chang, "Inline Function Expansion for Compiling Realistic C Programs",
Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and
Implementation, June 1989.

[15] MIPS Inc., MIPS C Compiler Reference Manual.

[16] R. E. Tarjan, Data Structures and Network Algorithms, SIAM, Philadelphia, PA, 1983.

[17] W. Y. Chen, P. P. Chang, T. M. Conte, and W. W. Hwu, "The Effect of Code Expanding
Optimizations on Instruction Cache Design", Technical Report CRHC-91-17, Center for Reliable
and High-Performance Computing, University of Illinois, Urbana-Champaign, May 1991.

[18] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of
NP-Completeness, W. H. Freeman and Company, New York, 1979.