GCC - An Architectural Overview, Current Status and Future Directions

Diego Novillo
Red Hat Canada
[email protected]

Abstract

The GNU Compiler Collection (GCC) is one of the most popular compilers available and it is the de facto system compiler for Linux systems. Despite its popularity, the internal workings of GCC are relatively unknown outside of the immediate developer community.

This paper provides a high-level architectural overview of GCC, its internal modules and how it manages to support a wide variety of hardware architectures and languages. Special emphasis is placed on high-level descriptions of the different modules to provide a roadmap to GCC.

Finally, the paper also describes recent technological improvements that have been added to GCC and discusses some of the major features that the developer community is considering for future versions of the compiler.

1 Introduction

The GNU Compiler Collection (GCC) has evolved from a relatively modest C compiler to a multi-language compiler that can generate code for more than 30 architectures. This diversity of languages and architectures has made GCC one of the most popular compilers in use today. It serves as the system compiler for every Linux distribution and it is also fairly popular in academic circles, where it is used for compiler research. Despite this popularity, GCC has traditionally proven difficult to maintain and enhance, to the extreme that some features were almost impossible to implement. As a result, GCC was starting to lag behind the competition.

Part of the problem is the size of its code base. While GCC is not huge by industry standards, it is still a fairly large project, with a core of about 1.3 MLOC. Including the runtime libraries needed for all the language support, GCC comes to about 2.2 MLOC.¹

Size is not the only hurdle presented by GCC. Compilers are inherently complex and very demanding in terms of the theoretical knowledge required, particularly in the areas of optimization and analysis. Additionally, compilers are dull, dreary and rarely provide immediate gratification, as interesting features often take weeks or months to implement.

Over the last few releases, GCC's internal infrastructure has been overhauled to address these problems. This has facilitated the implementation of a new SSA-based global optimizer, sophisticated data dependency analyses, a multi-platform vectorizer, a memory bounds checker (mudflap) and several other new features.

This paper describes the major components in GCC and their internal organization. Note that this is not intended to be a replacement for GCC's internal documentation. Many modules are overlooked or described only briefly. The intent of this document is to serve as introductory material for anyone interested in extending or maintaining GCC.

¹ Data generated using David A. Wheeler's SLOCCount.

2 Overview of GCC

GCC is essentially a big pipeline that converts one program representation into another. There are three main components: front end (FE), middle end (ME)² and back end (BE). Source code enters the front end and flows through the pipeline, being converted at each stage into successively lower-level representation forms until final code generation in the form of assembly code that is then fed into the assembler.

Figure 1 shows a bird's eye view of the compiler. Notice that the different phases are sequenced by the Call Graph and Pass managers. The call graph manager builds a call graph for the compilation unit and decides in which order to process each function. It also drives the inter-procedural optimizations (IPO) such as inlining. The pass manager is responsible for sequencing the individual transformations and handling pre and post cleanup actions as needed by each pass.

² Consistency in naming conventions led to this unfortunate term.

The source code is organized in three major groups: core, runtime and support. In what follows, all directory names are assumed to be relative to the root directory where GCC sources live.

2.1 Core

The gcc directory contains the C front end, middle end, target-independent back end components and a host of other modules needed by various parts of the compiler. This includes diagnostic and error machinery, the driver program, option handling and data structures such as bitmaps, sets, etc.

The other front ends are contained in their own subdirectories: gcc/ada, gcc/cp (C++), gcc/fortran (Fortran 95), gcc/java, gcc/objc (Objective-C), gcc/objcp (Objective-C++) and gcc/treelang, a small toy language used as an example of how to implement front ends.

Directories inside gcc/config contain all the target-dependent back end components. This includes the machine description (MD) files that describe code generation patterns and support functions used by the target-independent back end functions.

2.2 Runtime

Most languages and some GCC features require a runtime component, which can be found at the top of the directory tree:

The Java runtime is in boehm-gc (garbage collection), libffi (foreign function interface), libjava and zlib.

The Ada, C++, Fortran 95 and Objective-C runtimes are in libada, libstdc++-v3, libgfortran and libobjc respectively.

[Figure 1 shows a block diagram of the compilation pipeline: the C, C++, Java and Fortran parsers in the front end emit GENERIC; the middle end lowers it to GIMPLE and runs the interprocedural and SSA optimizers (constant propagation, redundancy elimination, loop optimizations, vectorization, etc.); the back end lowers GIMPLE to RTL, runs the RTL optimizers (CSE, instruction combination, scheduling, peephole optimizations, etc.) and emits assembly. The Call Graph and Pass managers sequence the whole pipeline.]

Figure 1: An Overview of GCC

The preprocessor is implemented as a separate library in libcpp.

A decimal arithmetic library is included in libdecnumber.

The OpenMP [14] runtime is in libgomp.

Mudflap [6], the pointer and memory check facility, has its runtime component in libmudflap.

The library functions for SSP (Stack Smashing Protection) are in libssp.

2.3 Support

Various utility functions and generic data structures, such as bitmaps, sets, queues, etc., are implemented in libiberty. The configuration and build machinery live in build, and various scripts useful for developers are stored in contrib.

2.4 Development Model

All the major decisions in GCC are taken by the GCC Steering Committee. This usually includes determining maintainership rights for contributors, interfacing with the FSF, approving the inclusion of major features and other administrative and political decisions regarding the project. All these decisions are guided by GCC's mission statement (http://gcc.gnu.org/gccmission.html).

GCC goes through three distinct development stages, which are coordinated by GCC's release manager and its maintainers. Each stage usually lasts between 3 and 5 months. During Stage 1, big and disruptive changes are allowed. This is where all the major features are incorporated into the compiler. Stage 2 is the stabilization phase; only minor features, and bug fixes that maintainers consider safe to include, are allowed. Stage 3 marks the preparation for release. During this phase only bug and documentation fixes are allowed. In particular, these bug fixes are usually required to have a corresponding entry in GCC's bug tracking database (http://gcc.gnu.org/bugzilla).

At the end of Stage 3, the release manager cuts a release branch. Stabilization work continues on the release branch and release criteria are agreed by consensus between the release manager and the maintainers. Release blocking bugs are identified in the bugzilla database and the release is done once all the critical bugs have been fixed.³ Once the release branch is created, Stage 1 for the next release begins.

Using this system, GCC averages about two releases a year. Once version X.Y is released, subsequent releases in the X.Y series continue from its release branch. Another release manager takes over the X.Y series, which accepts no new features, just bug fixes.

Major development that spans multiple releases is done in branches. Anyone with write access to the GCC repository may create a development branch and develop the new feature on the branch. When that feature is ready, they can propose including it at the next Stage 1. Vendors usually create their own branches from FSF release branches.

All contributors must sign an FSF copyright release to be able to contribute to GCC. If the work is done as part of their employment, their employer must also sign a copyright release form to the FSF.

³ It may also happen that some of these bugs are simply moved over to the next release, if they are not deemed to be as critical as initially thought.

3 GENERIC Representation

Every language front end is responsible for all the syntactic and semantic processing for the corresponding input language. The main interface between an FE and the rest of the compiler is the GENERIC representation [12]. Every front end is free to use its own internal data structures for parsing and validation. Once the compilation unit is parsed and validated, the FE converts its parse trees into GENERIC, a high-level tree representation where all the language-specific features are explicitly represented (e.g., exception handling, vtable lookups).

For historic reasons, most FEs use the tree data structure for representing their parse trees. However, the Fortran 95 FE uses its own data structures. This is a desirable property because it shields the FE from the rest of the compiler, providing a clean hand-off interface to the middle end via GENERIC.

While GENERIC provides a mechanism for a language front end to represent entire functions in a language-independent way, there are some features that are not representable in GENERIC. For instance, during alias analysis it is often necessary to determine whether two symbols of different types may occupy the same memory location. Each language has its own rules regarding type conflicts, so the compiler provides a call-back mechanism to query the front end. This mechanism is known as a language hook, or langhook, and it is used whenever the compiler needs to involve the front end in some transformation or analysis.

All the language semantics must be explicitly represented in GENERIC, but there are no restrictions on how expressions are combined and/or nested. If necessary, a front end can use language-dependent trees in its GENERIC representation, so long as it provides a hook for converting them to GIMPLE. In particular, a front end need not emit GENERIC at all. For instance, in the current implementation, the C and C++ parsers do not actually emit GENERIC during parsing.

In practical terms, GENERIC is a C-like language. A front end that wants to integrate with GCC can emit any of the tree codes defined in tree.def and implement the language hooks in langhooks.h. Figure 2 shows a code fragment in GENERIC.

    f (int a, int b, int c)
    {
      if (g (a + b, c))
        c = b++ / a
      return c
    }

Figure 2: A program in GENERIC form.

4 GIMPLE Representation

GIMPLE is a subset of GENERIC used for optimization. Both its name and the basic grammar are based on the SIMPLE IR used by the McCAT compiler at McGill University [8]. Essentially, GIMPLE is a 3-address language with no high-level control flow structures:

1. Each GIMPLE statement contains no more than 3 operands (except function calls) and has no implicit side effects. Temporaries are used to hold intermediate values as necessary.

2. There are no lexical scopes.

3. Control structures are lowered to conditional gotos.

4. Variables that need to live in memory are never used in expressions. They are first loaded into a temporary and the temporary is used in the expression.
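As a small illustration of rules 1 and 4, consider the hypothetical statement a = b + c * d (not taken from the paper; written in the same C-like notation as the figures). Gimplification introduces a temporary so that no statement has more than 3 operands:

    a = b + c * d       /* GENERIC */

    t1 = c * d          /* GIMPLE */
    a = b + t1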

There are two slightly different versions of GIMPLE used in GCC, namely High GIMPLE and Low GIMPLE. The main difference is that in High GIMPLE binding scopes, like the body of an if-then-else construct, are nested within the parent construct, while in Low GIMPLE binding scopes are completely linearized using labels and jumps. Figure 3(a) shows the High GIMPLE form for the code in Figure 2. The Low GIMPLE version is shown in Figure 3(b). The differences between Low and High GIMPLE are more noticeable when lowering other constructs like exception handling and OpenMP directives.

    (a) High GIMPLE:

    f (int a, int b, int c)
    {
      t1 = a + b
      t2 = g (t1, c)
      if (t2 != 0)
        {
          c = b / a
          b = b + 1
        }
      else
        {
        }
      t3 = c
      return t3
    }

    (b) Low GIMPLE:

    f (int a, int b, int c)
    {
      t1 = a + b
      t2 = g (t1, c)
      if (t2 != 0) goto <L0>; else goto <L1>;
      <L0>:
      c = b / a
      b = b + 1
      <L1>:
      t3 = c
      return t3
    }

Figure 3: High and Low GIMPLE versions of the code in Figure 2.

The process of lowering GENERIC into GIMPLE is known as gimplification. It recursively replaces complex statements with sequences of statements in GIMPLE form. The gimplifier lives in gimplify.c and tree-gimple.c. The lowering between High and Low GIMPLE is in gimple-low.c.

5 Call Graph

In order to perform interprocedural analyses and optimizations, GCC builds a call graph for the whole compilation unit.⁴ This static call graph is used in two ways: to drive the sequence in which functions are optimized and to perform interprocedural optimizations. Each node in the call graph represents a function or procedure in the compilation unit and edges represent call operations. Data and attributes are stored both on nodes and edges.

⁴ Only at -O2 and higher.

After the front end is done parsing a function and producing the GENERIC form for it, the middle end is invoked to create the High GIMPLE form with a call to gimplify_function_tree. The gimplified function is then added to the call graph. Once all the functions have been parsed and added to the call graph, the middle end invokes cgraph_optimize, which will:

1. Perform interprocedural optimizations with a call to ipa_passes.

2. Optimize every reachable function in the call graph with a call to cgraph_expand_function, which in turn calls tree_rest_of_compilation to execute all the intraprocedural transformations by calling the pass manager (execute_pass_list).

The implementation of the call graph lives in cgraph.c and cgraphunit.c. All passes executed by the pass manager are defined and registered by init_optimization_passes in passes.c.
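As a rough sketch of the control flow just described, here is a toy model in C written for this overview. The struct, the pass bodies and the processing order are invented and share nothing with GCC's actual implementation beyond the general shape: the pass manager is a loop that pushes every function handed over by the call graph manager through a fixed list of passes.

    #include <stdio.h>

    struct function { const char *name; };

    typedef void (*pass_fn) (struct function *);

    static void pass_build_cfg (struct function *fn)
    { printf ("  build CFG for %s\n", fn->name); }

    static void pass_into_ssa (struct function *fn)
    { printf ("  rewrite %s into SSA form\n", fn->name); }

    static void pass_ccp (struct function *fn)
    { printf ("  constant propagation on %s\n", fn->name); }

    /* Toy pass list, playing the role of the list registered by
       init_optimization_passes in passes.c.  */
    static pass_fn pass_list[] = { pass_build_cfg, pass_into_ssa, pass_ccp };

    static void execute_passes (struct function *fn)
    {
      size_t i;
      for (i = 0; i < sizeof pass_list / sizeof pass_list[0]; i++)
        pass_list[i] (fn);
    }

    int main (void)
    {
      /* Pretend the call graph manager decided to optimize the
         callee before its caller.  */
      struct function fns[] = { { "callee" }, { "caller" } };
      size_t i;

      for (i = 0; i < sizeof fns / sizeof fns[0]; i++)
        {
          printf ("optimizing %s:\n", fns[i].name);
          execute_passes (&fns[i]);
        }
      return 0;
    }

The real pass managers additionally handle dump files, timing, and the pre and post cleanup actions mentioned in Section 2.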

6 SSA Form

After a function is in Low GIMPLE form, the pass manager builds the Control Flow Graph (CFG) and rewrites the function in Static Single Assignment (SSA) form [3], which is the main data structure used for analysis and optimization in the middle end.

The SSA form is a representation that exposes data flow by linking read and write operations using a static versioning scheme. When a variable V is the target of a write operation, a new version number is created for V and labeled Vi. Subsequent read operations, without an intervening assignment to V, are modified to use Vi. For example,

    foo (int a, int b, int c)
    {
      a1 = 3;
      b2 = 5;
      c3 = a1 + b2;
      a4 = c3;
      return a4;
    }

When analyzing the statement c3 = a1 + b2, the optimizer can easily determine that a1 can be replaced with 3 and b2 with 5, because both SSA names are guaranteed to be assigned exactly once. But, in many cases, conditional control flow makes it impossible to statically determine the most recent version of a variable. For instance,

    foo (int a, int b, int c)
    {
      if (c3 > 10)
        c4 = a1 + b2;
      else
        c5 = a1 - b2;
      c6 = PHI <c4, c5>
      return c6;
    }

Since it is not possible to statically determine which branch will be taken at runtime, we do not know which of c4 or c5 to use at the return statement. So, the SSA renamer creates a new version, c6, which is assigned the result of "merging" c4 and c5. This PHI function tells the compiler that at runtime c6 will be either c4 or c5.

GCC uses two different SSA variants, a rewriting and a non-rewriting form. In the rewriting form, symbols are replaced with their corresponding SSA names. Each SSA name is considered a distinct object and, as such, code motion transformations are allowed to cause two or more SSA versions of the same symbol to be simultaneously live. When the function is taken out of SSA form, the SSA names are converted into regular symbols and new artificial symbols are created as needed to satisfy live range requirements [13]. In the non-rewriting form, SSA names are only used to connect variable uses to their definition sites; distinct SSA names for the same symbol are not allowed to have overlapping live ranges, so when taking the function out of SSA form the SSA names are simply discarded.

The rewriting form is applied to local scalar variables that may not be modified in ways unknown to the compiler. That is, they may not be modified as a side-effect of a function call, their address has not been taken by any pointer, and every read/write operation references the whole object. GCC labels these variables as GIMPLE registers.⁵ Operands that use GIMPLE registers are referred to as real operands, and so the rewriting SSA form in GCC is also known as real SSA form. The previous two code fragments are examples of GCC's real SSA form.

⁵ This does not imply that the object will actually be allocated a physical register.

In contrast, the non-rewriting SSA form is used on memory symbols. These are variables that may be modified in ways unknown to the compiler. That is, they may be clobbered by a function call, their address may have been taken, or references to them may be partial references to the object (e.g., symbols of aggregate types like structures and arrays). Since statements may have implicit references to memory symbols, GCC represents them with special virtual operators attached to the statement. There are two such operators: V_MAY_DEF to indicate a partial and/or potential store to the object, and VUSE to indicate a partial and/or potential load from the object. This non-rewriting form is known in GCC as virtual SSA form.

For example, in the following code fragment variable A is a global variable that may be clobbered by function foo, so when calling foo, GCC indicates that A may be modified by it with a V_MAY_DEF operator, resulting in the following virtual SSA form for A:

    int A;

    bar ()
    {
      # A2 = V_MAY_DEF <A1>
      A = 9
      # A3 = V_MAY_DEF <A2>
      foo ()
      # VUSE <A3>
      x4 = A
      return x4
    }

Notice that the virtual SSA form not only links uses to definitions (use-def chains). It also links definitions to other definitions (def-def chains). This is necessary because V_MAY_DEF operators represent partial/potential stores. In the previous code fragment, the store A = 9 may or may not reach the x4 = A statement, so it is necessary to link A3 and A2, or transformations like dead-code elimination would eliminate the statement A = 9.

The CFG, SSA form and related facilities (such as incremental SSA updates) are implemented in tree-cfg.c, tree-into-ssa.c, tree-ssa.c and tree-outof-ssa.c.
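For straight-line code, the versioning scheme is simple enough to sketch in a few lines of C. The toy renamer below is purely illustrative (GCC's real renamer lives in tree-into-ssa.c and must also handle control flow, PHI insertion and the virtual operands described above), but it shows the two rules at work: a definition creates a new version and a use picks up the latest one.

    #include <stdio.h>

    /* One three-address statement: lhs = rhs1 op rhs2, with
       single-letter variable names.  */
    struct stmt { char lhs, rhs1, op, rhs2; };

    int main (void)
    {
      /* a = b + c;  b = a + a;  a = b + c;  */
      struct stmt code[] = {
        { 'a', 'b', '+', 'c' },
        { 'b', 'a', '+', 'a' },
        { 'a', 'b', '+', 'c' },
      };
      int version[26] = { 0 };
      size_t i;

      for (i = 0; i < sizeof code / sizeof code[0]; i++)
        {
          /* Uses refer to the current version of each variable...  */
          int u1 = version[code[i].rhs1 - 'a'];
          int u2 = version[code[i].rhs2 - 'a'];
          /* ...and each definition creates a brand new version.  */
          int d = ++version[code[i].lhs - 'a'];
          printf ("%c_%d = %c_%d %c %c_%d\n", code[i].lhs, d,
                  code[i].rhs1, u1, code[i].op, code[i].rhs2, u2);
        }
      return 0;
    }

Run on the statements a = b + c; b = a + a; a = b + c, it prints a_1 = b_0 + c_0, b_1 = a_1 + a_1 and a_2 = b_1 + c_0, where the _0 versions stand for the incoming values of b and c.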

6.1 Aliasing

GCC uses two mechanisms for representing aliasing: a query (or oracle) based mechanism, used during RTL optimization, and an explicit representation, used during GIMPLE optimization.

The query based mechanism provides functions that, given two memory references, determine whether they may overlap. Currently, the analysis done by this mechanism is mostly type and structure based. If the two memory references are to objects whose types may not alias according to the rules of the language, then they may not overlap. The basic mechanism uses the notion of alias set numbers, which are associated with the type of the memory location and organized in a hierarchical structure according to the rules of the input language. Given two memory references, the function alias_sets_conflict_p determines whether they may occupy the same memory slot based on their alias set numbers. The file alias.c contains the implementation of this mechanism.

The explicit representation is used during GIMPLE optimization and it takes advantage of the virtual SSA representation. After the code is put into SSA form, an alias analysis pass (pass_may_alias) computes points-to information for all referenced pointers. This is used to build flow-sensitive and flow-insensitive alias sets that are then associated with special symbols called memory tags, which represent pointer dereferences. When a pointer is dereferenced, the compiler looks up its associated memory tag, determines what symbols belong to the alias set and inserts virtual operators for every symbol in the alias set.

Each mechanism has its strengths and weaknesses. The advantage of an explicit representation is that optimizers need not concern themselves with possible aliasing problems when doing a transformation. On the other hand, an explicit representation takes up memory and compile time (unbearably so in some extreme cases), so it may become an unnecessary burden. For example, consider the function in Figure 4.⁶ To illustrate the difference between the explicit representation and the query based mechanism, consider the problem of reordering the store operations at lines 6 and 7. With a query based mechanism, the transformation pass would call alias_sets_conflict_p on the memory references given by *p2 and *q3.

    1  foo (int i, float *q)
    2  {
    3    int a, b, *p;
    4
    5    p2 = (i1 > 10) ? &a : &b
    6    *p2 = 42
    7    *q = 0.42
    8    return *p2
    9  }

Figure 4: Query-based alias analysis requires no additional operators in the IL.

⁶ SSA form redacted for simplicity.

But that relation is made explicit when the program is in virtual SSA form (Figure 5). Since p2 points to one of a or b, line 6 contains one virtual operator for each variable. On the other hand, pointer q3 does not really point to any variable in function foo, so dereferencing q3 is represented with a virtual operator to its memory tag (MT). Given this, the stores at lines 6 and 7 do not conflict and the transformation can proceed safely.

    1  foo (int i, float *q)
    2  {
    3    int a, b, *p;
    4
    5    p2 = (i1 > 10) ? &a : &b
         # a5 = V_MAY_DEF <a4>
         # b7 = V_MAY_DEF <b6>
    6    *p2 = 42
         # MT9 = V_MAY_DEF <MT8>
    7    *q3 = 0.42
         # VUSE <a5>
         # VUSE <b7>
    8    return *p2
    9  }

Figure 5: Explicit representation of aliasing using virtual SSA form.

While the explicit representation has several advantages over the query mechanism, all the virtual operators required and their PHI nodes add more bulk to the IR, resulting in increased compile time and memory consumption (we are currently working on a new representation to alleviate this problem).
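To give a feel for the query mechanism described at the beginning of this section, here is a toy model of alias sets in C. It is invented for this overview and is far simpler than the real implementation in alias.c: sets are numbered, each set optionally nests inside a parent (as a field's set nests inside its structure's set), set 0 conflicts with everything, and two references may conflict only when one set contains the other.

    #include <stdio.h>

    #define MAX_SETS 16

    /* parent[s] is the enclosing alias set of set s, or -1 for a root.  */
    static int parent[MAX_SETS];

    /* Does set a appear on the parent chain of set b (or equal it)?  */
    static int contains (int a, int b)
    {
      for (; b != -1; b = parent[b])
        if (a == b)
          return 1;
      return 0;
    }

    /* Toy version of the conflict test described in the text.  */
    static int alias_sets_conflict_p (int s1, int s2)
    {
      if (s1 == 0 || s2 == 0)   /* Set 0 may alias anything.  */
        return 1;
      return contains (s1, s2) || contains (s2, s1);
    }

    int main (void)
    {
      /* Hypothetical sets: 1 = int, 2 = float, 3 = a structure
         containing an int field, so set 1 nests inside set 3.  */
      parent[1] = 3;
      parent[2] = -1;
      parent[3] = -1;

      printf ("int vs float:  %d\n", alias_sets_conflict_p (1, 2)); /* 0 */
      printf ("int vs struct: %d\n", alias_sets_conflict_p (1, 3)); /* 1 */
      return 0;
    }

In Figure 4, *p2 (a store through an int pointer) and *q3 (a store through a float pointer) would land in non-conflicting sets, so the query tells the optimizer that the two stores may be reordered.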

6.2 SSA Optimizers

In general, transformations at the GIMPLE level are target-independent, because the IL does not expose attributes such as word size, address arithmetic, registers, calling conventions, etc. However, there are transformations that need to span multiple IL abstractions to work properly. One of the prime examples is vectorization. The analysis required to determine whether some code may be vectorized requires high-level dataflow information that is available in GIMPLE. However, the actual transformation depends on hardware capabilities, such as the size of vector registers and the available vector operations. This communication is done via special IL codes and call-backs between the middle end and back end.

A variety of SSA-based analyses and optimizations have been implemented on the GIMPLE representation (Figure 1). Together with other cleanup passes, and given that some of them are executed more than once, the middle end pipeline runs to about 100 stages. Some of the more notable transformations include:

Vectorization [15] supporting multiple architectures.

Loop optimizations based on chains of recurrences to recognize scalar evolutions and track induction variables [2].

Traditional scalar optimizations, including constant/copy propagation, dead code elimination, full and partial redundancy elimination, value range propagation, scalar replacement of aggregates, jump threading, forward propagation and dead store elimination.

Flow sensitive and flow insensitive alias analysis, including field-sensitive points-to analysis for aggregates [1].

Automatic instrumentation for pointer checking for C and C++ [6].

7 RTL

Register Transfer Language (RTL) is the original intermediate representation used by GCC. It was developed at the University of Arizona as part of its research in re-targetable compilers in the early 80s [4, 5]. It is also used by VPCC (Very Portable C Compiler). Basically, RTL is an assembler language for an abstract machine with an infinite number of registers. As opposed to the C-like representation used by GENERIC and GIMPLE, RTL resembles Lisp (although it is possible to obtain an assembly-like rendering for debugging purposes). The code fragment in Figure 6(a) shows a GIMPLE code fragment and its conversion to RTL (Figure 6(b)). The exact RTL will contain more detailed information and may vary from one processor to another.

RTL is designed to abstract hardware features such as register classes, memory addressing modes, word sizes and types (machine modes), compare-and-branch instructions, calling conventions, bitfield operations, type and sign conversions, and generic instruction patterns.

    (a) GIMPLE version:

    if (a > 10) goto <L0>; else goto <L1>;

    <L0>:
    b = a - 1;

    <L1>:

    (b) RTL version:

    (insn 20 18 21 3 (set (reg:CCGC 17 flags)
            (compare:CCGC (reg/v:SI 60 [ a ])
                (const_int 10 [0xa]))))

    (jump_insn 21 20 22 3 (set (pc)
            (if_then_else (le (reg:CCGC 17 flags)
                    (const_int 0 [0x0]))
                (label_ref 26)
                (pc))))

    (code_label 22 21 23 4 3 "" [0 uses])

    (insn 25 23 26 4 (parallel [
            (set (reg/v:SI 59 [ b ])
                (plus:SI (reg/v:SI 60 [ a ])
                    (const_int -1 [0xffffffff])))
            (clobber (reg:CC 17 flags))
        ]))

    (code_label 26 25 27 5 2 "" [1 uses])

Figure 6: GIMPLE and RTL variants of a conditional branch.

These abstractions are defined and controlled by an elaborate pattern matching mechanism defined in a machine description (MD) file, which defines all the necessary code generation mappings between the back end and the target processor.

Machine description files, together with other files needed to generate target code, are commonly referred to as ports. They are stored in sub-directories under gcc/config/. Currently, GCC contains more than 30 such ports, and while the implementation of a new port is not a trivial task, it is perhaps one of the aspects of GCC with the most extensive available documentation (http://gcc.gnu.org/onlinedocs/). There are two main components that make up a port:

Instruction templates define the mappings between generic RTL and the target machine. For instance, the define_insn pattern in Figure 7 describes a typical 32-bit addition operation for a RISC processor. The top portion defines the pattern to be matched. In this case, it is looking for x = y + z where all the operands are word-sized (SI) general purpose registers ("register_operand" "r"). It also indicates that the x operand is modified by the instruction ("=r"). The bottom portion indicates the final assembly code that should be emitted when this pattern is matched (add x, y, z).

Target description macros describe hardware capabilities such as register classes, calling conventions, data types and sizes, predicates that validate moves between memory and registers, etc.

    (define_insn "addsi3"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (plus:SI (match_operand:SI 1 "register_operand" "r")
                     (match_operand:SI 2 "register_operand" "r")))]
      ""
      "add %0, %1, %2")

Figure 7: An RTL code generation pattern for 32-bit addition.
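To see the template in action, suppose that a statement x = y + z reaches the back end with x, y and z living in hypothetical hard registers r4, r5 and r6 (an invented example; actual register names depend on the target). The RTL addition matches the pattern in Figure 7, operand 0 binds to r4, operands 1 and 2 bind to r5 and r6, and substituting them into the output template "add %0, %1, %2" yields the final assembly:

    add r4, r5, r6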

7.1 RTL passes

Historically, all the optimization work was done in RTL, but the current trend is to move most of the heavy lifting from RTL into the GIMPLE optimizers (currently there are around 60 RTL passes and more than 100 GIMPLE passes). The final goal is to implement analyses and transformations at the right level of abstraction. RTL is ideally suited for low-level transformations such as register allocation and scheduling, but most of the generic transformations are more efficient to implement in GIMPLE.

RTL and GIMPLE also share common infrastructure code such as the pass and call graph managers, the flowgraph, dominance information and type-based aliasing. Some of the main transformations done in RTL include:

Register allocation. Arguably one of the more complex passes in GCC. It is organized as a multi-pass allocator: a local pass (local-alloc.c) allocates registers within a basic block, and a global pass (global.c) works across basic block boundaries. The actual code modification is done by a third pass known as reload (reload.c and reload1.c). Most of the complexity in register allocation lies in the multitude of targets supported by GCC. Every target has its own set of register classes and rules for moving values between registers and memory. Several efforts exist to re-implement this pass, but it is generally considered to be a fairly difficult problem [9, 10, 11, 16]. (A toy sketch of the core allocation problem follows at the end of this section.)

Scheduling. This pass tries to take advantage of the implicit parallelism provided by the multiple functional units in modern processors that allow the simultaneous execution of multiple instructions. The scheduler rearranges the instructions according to data dependency restrictions in an effort to increase instruction parallelism. The scheduler is implemented in haifa-sched.c and sched-rgn.c.

Software pipelining. Implemented using Swing Modulo Scheduling (SMS) [7], this pass complements instruction scheduling by improving parallelism inside loops, overlapping the execution of instructions from different loop iterations.

Other optimizations include common subexpression elimination, instruction re-combination, mode switching reduction, peephole optimizations and machine specific reorganizations.
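As a sketch of the core problem a register allocator solves (a deliberately tiny toy written for this overview, unrelated to GCC's local/global/reload organization), the following C program assigns virtual registers, sorted by first use, to two hypothetical hard registers and spills when both are live:

    #include <stdio.h>

    #define NUM_HARD 2   /* A hypothetical machine with two registers.  */
    #define NUM_VIRT 4

    int main (void)
    {
      /* Statement indices of the first and last use of each virtual
         register; in a real allocator this comes from life analysis.  */
      int first_use[NUM_VIRT] = { 0, 0, 1, 2 };
      int last_use[NUM_VIRT]  = { 1, 2, 3, 3 };
      int busy_until[NUM_HARD] = { -1, -1 };
      int v, h;

      for (v = 0; v < NUM_VIRT; v++)
        {
          int assigned = -1;
          for (h = 0; h < NUM_HARD; h++)
            if (busy_until[h] < first_use[v])
              {
                assigned = h;
                busy_until[h] = last_use[v];
                break;
              }
          if (assigned < 0)
            printf ("v%d spilled to memory\n", v);  /* reload's job in GCC */
          else
            printf ("v%d -> r%d\n", v, assigned);
        }
      return 0;
    }

Real allocators must additionally respect register classes, instructions that demand specific registers, and the cost of the spill code itself, which is where most of the complexity described above comes from.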

8 Current Status and Future Work

The open development model used by GCC has all the usual advantages of other FOSS projects. It attracts a wide variety of developers, and since GCC is the system compiler of every Linux distribution, it is a fairly stable and robust compiler. Furthermore, the wide variety of supported languages and platforms and its flexible architecture make it a compelling option for both industry and academic compiler projects.

GCC has changed quite significantly in the last 3-4 years. The addition of GENERIC, GIMPLE and the SSA framework allowed the development of features that had traditionally been considered difficult or impossible to implement, including vectorization, OpenMP and advanced loop transformations.

8.1 New languages

The introduction of the GENERIC representation has further simplified the task of introducing a new language front end to GCC. The increased separation between the front end and the rest of the compiler provides a lot of independence to language designers. While we do not claim GENERIC to be the perfect target for all languages, it has proven to be sufficiently flexible for the variety of languages currently supported by GCC, including C, C++, Java, Ada, Objective-C/C++ and Fortran 95.

The addition of new languages may require extending and/or adapting GENERIC. In terms of language-specific analyses and transformations, GCC's strategy is to implement them, as much as possible, in GIMPLE, where all the data and control-flow information is gathered, and to use langhooks to communicate with the front end when necessary. Most transformations in this area are expected to be in the area of abstraction removal, such as method devirtualization in OO languages. There is also interest in more sophisticated analyses for languages like Java.

8.2 Internal modularity

The basic compiler infrastructure is encapsulated as much as C allows. While this remains one of the weakest points in the implementation, we try to draw strict API boundaries to abstract the major conceptual modules, such as the call graph, control flow graph, intermediate representation, fundamental data types (bitmaps, hash tables), SSA form, etc.

GCC would probably benefit from switching to an implementation language with more capabilities, such as C++. In fact, the topic comes up every now and again on the development lists. The consensus seems to be in favor of switching, but 1+ million lines of code represent a lot of inertia to overcome, and implementation discipline can go a long way. Not every module of the compiler is implemented in C, however. Front ends, for instance, are free to use different implementation languages (Ada being the prime example).

8.3 High performance computing

With the advent of multi-core processors, optimizations and languages that take advantage of task/data parallelism will become increasingly important. GCC includes a multi-platform vectorizer that is unique in its class and, starting with version 4.2, it will include support for OpenMP (http://www.openmp.org), which provides compiler directives for specifying parallelism in C, C++ and Fortran. A minimal example of the directive style appears below.

Other important features for future releases include an automatic parallelization option built on top of the OpenMP framework, memory locality optimizations, more advanced loop optimizations and additional vectorization improvements.
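As an illustration of the OpenMP directive style (a minimal example written for this overview, compiled with gcc -fopenmp on a GCC version that supports OpenMP), the iterations of the loop below are divided among the threads of the team created at the parallel region:

    #include <stdio.h>

    int main (void)
    {
      double a[1000];
      int i;

      /* The iterations of this loop are split among the threads;
         the loop variable is private to each thread.  */
      #pragma omp parallel for
      for (i = 0; i < 1000; i++)
        a[i] = i * 0.5;

      printf ("a[999] = %g\n", a[999]);
      return 0;
    }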

8.4 Static analysis tools

This is another area that is starting to gain widespread interest. Compilers are in a unique position to provide this facility because they already have a synthetic representation of the input program. However, the set of interesting analyses to perform may vary widely: some people will want to check for security problems, others may want to enforce coding guidelines, others may want to check for buffer overflows, etc.

GCC already includes some of the more commonly requested features, such as pointer checking and buffer overflow prevention. But there are other types of checks specific enough that it usually does not make sense to include them in the compiler. At the same time, people interested in them may have neither the interest nor the time to invest in delving inside the compiler to implement their analyses.

There are plans to provide some form of extensibility mechanism so that external developers would be able to connect ad-hoc analysis code by interfacing with GCC.

8.5 Dynamic Compilation

Static compilation techniques are generally believed to have reached a saturation point. Compilers do not have a sufficiently global view of the program to perform more aggressive optimizations. Modern software is usually spread over many files and it likely uses many services from shared libraries, all of which is hidden from the compiler at compile time. And since most libraries use dynamic linking, the code may even be hidden from the compiler at link time.

To compound this problem, languages like Java and C# have fairly powerful dynamic properties, such as class loading. Therefore, static compilers may only see a small portion of the whole program. All this provides a big incentive to move parts of the compilation process into the runtime system.

This is generally known as Just-In-Time (JIT) compilation. These systems work on top of a bytecode language and a virtual machine which converts the bytecodes into native form at runtime. While this provides a lot of flexibility in terms of portability and dynamic features, the runtime overhead can be significant, so hiding compile-time latencies becomes fundamentally important in these environments.

We are currently planning to extend GCC to support such dynamic compilation schemes. At the time of this writing, we still do not have any concrete plans, but it is certainly an area in which we plan to take GCC in the medium to long term.

References

[1] D. Berlin. Structure Aliasing in GCC. In Proceedings of the 2005 GCC Summit, Ottawa, Canada, June 2005.

[2] D. Berlin, D. Edelsohn, and S. Pop. High-Level Loop Optimizations for GCC. In Proceedings of the 2004 GCC Summit, Ottawa, Canada, June 2004.

[3] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490, October 1991.

[4] J. W. Davidson and C. W. Fraser. The Design and Application of a Retargetable Peephole Optimizer. ACM Transactions on Programming Languages and Systems, 2(2):191–202, April 1980.

[5] J. W. Davidson and D. B. Whalley. Quick Compilers Using Peephole Optimization. Software: Practice and Experience, 19(1):79–97, 1989.

[6] F. Ch. Eigler. Mudflap: Pointer Use Checking for C/C++. In Proceedings of the 2003 GCC Summit, Ottawa, Canada, May 2003.

[7] M. Hagog and A. Zaks. Swing Modulo Scheduling for GCC. In Proceedings of the 2004 GCC Summit, Ottawa, Canada, June 2004.

[8] L. Hendren, C. Donawa, M. Emami, G. Gao, Justiani, and B. Sridharan. Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations. In Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, pages 406–420. Lecture Notes in Computer Science, no. 457, Springer-Verlag, August 1992.

[9] V. Makarov. Fighting Register Pressure in GCC. In Proceedings of the 2004 GCC Summit, Ottawa, Canada, June 2004.

[10] V. Makarov. Yet Another GCC Register Allocator. In Proceedings of the 2005 GCC Summit, Ottawa, Canada, June 2005.

[11] M. Matz. Design and Implementation of the Graph Coloring Register Allocator for GCC. In Proceedings of the 2003 GCC Summit, Ottawa, Canada, June 2003.

[12] J. Merrill. GENERIC and GIMPLE: A New Tree Representation for Entire Functions. In Proceedings of the 2003 GCC Summit, Ottawa, Canada, May 2003.

[13] D. Novillo. Design and Implementation of Tree SSA. In Proceedings of the 2004 GCC Summit, Ottawa, Canada, June 2004.

[14] D. Novillo. OpenMP and automatic parallelization in GCC. In Proceedings of the 2006 GCC Summit, Ottawa, Canada, June 2006.

[15] D. Nuzman and R. Henderson. Multi-platform auto-vectorization. In The 4th Annual International Symposium on Code Generation and Optimization (CGO), March 2006.

[16] M. Punjani. Register Rematerialization in GCC. In Proceedings of the 2004 GCC Summit, Ottawa, Canada, June 2004.