Inline Function Expansion for Compiling C Programs

Inline Function Expansion for Compiling C Programs Wen-meiW. Hwu and Pohua P. Chang Coordinated Science Laboratory University of Illinois 1101 W. Springfield Ave. Urbana, IL 61801 (217) 244-8270 [email protected] Abstract 1. Introduction Inline function expansion replaces a function call Large computing tasks are often divided into many with the function body. With automatic inline function smaller subtasks which can be more easily developed, expansion, programs can be constructed with many small understood, and reused. Function definition and invoca- functions to handle complexity and then rely on the com- tion in high level languages provide a natural means to pilation to eliminate most of the function calls. Therefore, define and coordinate subtasks to perform to original task. inline expansion serves a tool for satisfying two Structured programming techniques therefore encourage conflicting goals: minizing the complexity of the program the use of functions. Unfortunately, function invocation development and minimizing the function call overhead of disrupts compile-time code optimization such as register program execution. A simple inline expansion procedure allocation, code compaction, common subexpression is presented which uses profile information to address elimination, and constant propagation. The decreased three critical issues: code expansion, stack expansion, and effectiveness of these optimization techniques increases unavailable function bodies. Experiments show that a memory accesses, decreases pipeline efficiency, and large percentage of function calls/returns (about 59%) can increases redundant computation, be eliminated with a modest code expansion cost (about Emer and Clark reported, for a composite VAX 17%) for twelve UNIX* programs. workload, 4.5% of all dynamic instructions are procedure calls and retumsl. If we assume equal numbers of call and return instructions, the above number indicates that * UNIX is a trademark of the AT&T Bell Laboratories. there is a function call instruction for every 44 instructions executed. Berkeley RISC researchers have reported that procedure call is the most costly source language state- ment2. 1.1. Existing Remedies Some recent processors provide hardware support for minimizing the extra memory accessesdue to function calls. For example, the Berkeley RISC processors provide Permission to copy without fee all or part of this material is granted provided that overlapping register windows to reduce the number of the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is memory accessesrequired to save/restore registers and to given that copying is by permission of the Association for Computing Machinery. pass parameters2. Another example is the CRISP proces- To copy other+& or to republish, requires a fee and/or specific permission. 0 I989 ACM O-8979 I -306-X/89/0006/0246 $1.50 sor that uses stack buffers to capture the memory accesses 246 to local variables so that register allocation crossing func- piler, the programmers can use the keyword inline as a tion calls can be simulated in hatdware3. The problems hint to the compiler for inline expanding function calls16 . with these hardware approaches are that they tend to con- In the MIPS C compiler, the compiler examines the code sume a signitlcant amount of hardware, stretch the proces- structure (e.g. loops) to choose the function calls for sor cycle time, and provide little assistance for enlarging inline expansion15. It should be noticed that the careful the scope of compiler code optimization. use of the macro expansion and language preprocessing In the software realm, inter-procedural register allo- utilities has the same effect as inline expansion, when cation schemes have been shown to reduce the register inline expansion decisions are made entirely by the pro- save/restore cost across function call boundaries4. Callers grammers. and callees can also communicate parameters and results The IMPACT-I (Illinois Microarchitecture Project through a small number of registers’. Wall has shown using Advanced Compiler Technology - stage I) C com- that combining profiling and inline expansion, link-time piler expands function calls to increase the effectiveness register allocation is comparable in performance to of compiler code optimization*7-19. As far as the hardware register window scheme@. Inter-procedural hardware is concerned, the goal of inline expansion is to analysis have been shown to be effective in reducing the reduce the function calls so that mechanisms such as negative effects of function calls on the code scheduling register windows and stack buffers become unnecessary. and other code optimization techniques. These software For compiler code optimization, the inline expansion remedies assume that frequent function calls can not be serves to enlarge the scope of register allocation, code avoided. If most of the function calls can be eliminated, scheduling, and other optimizations. The IMPACT-I these complicated remedies would be unnecessary. Profiler to C Compiler interface allows the profile information to be automatically used by the IMPACT-I C 1.2, Inline Function Expansion Compiler. The inline expansion is based on execution Inline expansion replaces a function call with the profile information to ensure that only the important func- function body. Inline expansion removes the function tion calls are expanded. It is critical that the inputs used calls/returns costs and provides larger and specialized for executing the equivalent C program are representative. execution plans to the code optimizers. With automatic Therefore, this approach is more suitable for characteriz- inline function expansion, the advantages of using func- ing realistic programs for which representative inputs can tions remain and the costs are reduced. In a recent study, be easily collex3ed. Allen and Johnson identified inline expansion as an essential part of an optimizing C compiler7-lt. They gave a 1.3. Organization of Paper few critical reasons for implementing inline expansion. Section two discusses major implementation issues First, the variable abasing problem becomes less onerous and potential hazards. Section three describes a simple after inline expansion. Second, the code optimizer can inline expansion procedure. Section four shows the work on the real effects of the callee after inlining. Third, experimental results. Finally, we give some concluding inlining function calls contained in loops may increase the remarks in section five. opportunities for vectorization. Several code improving techniques may be applica- 2. Implementation Issues ble after inline expansion. These include register alloca- The core inline expansion algorithm is fairly sim- tion, code scheduling, common subexpression elimination, ple. Most of the implementation difficulties are due to constant propagation, and dead code elimination. hazards, missing information, and minimizing the compi- Richardson and Ganapathi have discussed the effect of lation time. The implementation issues are listed as fol- inline expansion and code optimization across pro- lows. ceduresJ2. - Determine when inline expansion should be applied. Many optimizing compilers can perform inline - Propose an appropriate program representation. expansion1 l* 13-16. IBM PL.8 compiler does inline expan- - Select call sites for inline expansion, sion of all leaf-level procedures 13 . In the (3Jl-J C corn- -Avoid hazards. 247 - Prevent code eqhsion. main is the first function executed in this program. - Prevent control stack overflow. Each node contains three major pieces of informa- - Evaluate cost. tion: 1) the body of the function, 2) the node weight, and - Inline a funCrion. 3) a set of outgoing arcs which points to the callees. Each - Code duplication. arc contains four essential pieces of information: 1) a -Variable retuaming. unique identifier, 2) the caller, 3) the callee, and 4) the arc - Changes to variable de&rations. weight. It is necessary to assign each arc a unique - Resolve missing information. identifier because there may be several arcs between the - Ehinate unreachablejimctions. same pair of caller and callee. Later, we will also associ- - Reduce the number of expanviom. ate each arc with a status attribute which tells us whether - Order the expansion sequence. this arc should be considered for inline expansion, rejected for inline expansion, or inline expanded. 2.1. When to Perform Mine Function Expansion The node weights and arc weights may be deter- Inline expansion should be performed before other mined either by program structure analysis or by profiling. code optimizations, such as constant folding and dead Since a node may be entered from any one of its incoming code elimination. Therefore, tt is natural to perform inline arcs, it is necessary to know the weights of all outgoing expansion at compile-time. Another advantage of per- arcs associated with a particular incoming arc. Therefore, forming Mine expansion at compile-time is that the pro- after inline expansion the arc weights remain accurate. It gram structures, such as loop and if-then-else. are visible is assumed in the remaining part of this section that we and can be used to make inline expansion decisions. Yet have complete and accurate weights for all nodes and another advantage is that if inline expansion can be arcs. applied to a system-independent three-address code, inline The most

Load more