GCC: Yesterday, Today and Tomorrow
Diego Novillo <[email protected]>
Red Hat Canada
2nd HiPEAC GCC Tutorial, Ghent, Belgium, January 28, 2007

Yesterday

Brief History
● GCC 1 (1987)
  – Inspired by the Pastel compiler (Lawrence Livermore Labs)
  – C only
  – Translation done one statement at a time
● GCC 2 (1992)
  – Added C++
  – Added RISC architecture support
  – Closed development model challenged
  – New features difficult to add
Brief History

● EGCS (1997)
  – Fork from GCC 2.x
  – Many new features: Java, Chill, numerous embedded ports, new scheduler, new optimizations, integrated libstdc++
● GCC 2.95 (1999)
  – EGCS and GCC 2 merge back into GCC
  – Type-based alias analysis
  – Chill front end
  – ISO C99 support
Brief History

● GCC 3 (2001)
  – Integrated libjava
  – Experimental SSA form on RTL
  – Functions as trees (crucial feature)
● GCC 4 (2005)
  – Internal architecture overhaul (Tree SSA)
  – Fortran 95
  – Automatic vectorization
GCC Growth¹

[Chart: total lines of code per release, broken down into compiler, ports, front ends (C, C++, Java, Ada, Fortran 95, Objective C++), runtimes (libstdc++, libjava), and Tree SSA, growing to roughly 2,200,000 lines by 4.1. Releases shown: 1.21 (1988), 1.38 (1990), 2.0 (1992), 2.8.1 (1998), EGCS (1998), 2.95 (1999), 3.0 (2001), 3.1 (2002), 4.0 (2005), 4.1 (2006).]

¹ Generated using David A. Wheeler's 'SLOCCount'.
Core Compiler Growth¹

[Chart: lines of code in the core compiler (C front end, compiler proper, ports, Tree SSA) across the same releases, 1.21 (1988) through 4.1 (2006), reaching roughly 700,000 lines by 4.1.]

¹ Generated using David A. Wheeler's 'SLOCCount'.
Growing pains

● Monolithic architecture
● Front ends too close to back ends
● Little or no internal interfaces
● RTL inadequate for high-level optimizations

[Diagram: the C, C++, Java and Fortran front ends feed directly into RTL, which the back end translates to assembly.]
Tree SSA Design

● Goals
  – Separate FEs from BEs
  – Evolution vs. revolution
● Needed IL layers
  – Two ILs to choose from: Tree and RTL
  – Moving down the abstraction ladder seemed simpler
● Started with FUD chains over C Trees
  – Every FE had its own variant of Trees
  – Complex grammar, side effects, language dependencies
Tree SSA Design

● SIMPLE (McGill University)
  – Used the same tree data structure
  – Simplified and restrictive grammar
  – Started with C and followed with C++
  – Later renamed to GIMPLE
  – Still not enough
● GENERIC
  – Target IL for every FE
  – No grammar restrictions
  – Only required to remove language dependencies
Tree SSA Design

● FUD-chains unpopular
  – No overlapping live ranges (OLR)
  – Limits some transformations (e.g., copy propagation)
● Replaced FUD-chains with rewriting form
  – Kept FUD-chains for memory expressions (Virtual SSA)
  – Needed out-of-SSA pass to cope with OLR
● Several APIs
  – CFG, statement, operand manipulation
  – Pass manager
  – Call graph manager
Tree SSA Timeline

[Timeline of development activity from Oct/00 to the mainline merge: private development of SSA on C trees on the ast-optimizer-branch; the tree-ssa-20020619-branch created 19/Jun/02, with SIMPLE for C and a pretty printer by Sep/02; SIMPLE for C++ and PRE; GIMPLE and GENERIC, mudflap profiling, DSE, NRV by Jan/03; EH lowering, SRA, statement lists, pass manager, operands API, rewriting SSA, statement iterators, Fortran 95, out of SSA, GIMPLE for Java, complex lowering, DOM, flow-sensitive PTA, escape analysis, PHI opt, CCP/DCE, forward prop, points-to analysis, unnesting by Nov/03. Merged into mainline on 13/May/2004.]
Today

Compiler pipeline

[Diagram: the C, C++, Java and Fortran front ends emit GENERIC; the middle end lowers it to GIMPLE and runs the interprocedural and SSA optimizers; the back end lowers GIMPLE to RTL, runs the RTL optimizer and final code generation, and produces assembly. A call graph manager and a pass manager drive the pipeline.]
Major Features

● De facto system compiler for Linux
● No central planning (controlled chaos)
● Supports
  – All major languages
  – An insane number of platforms
● SSA-based high-level global optimizer
● Automatic vectorization
● OpenMP support
● Pointer checking instrumentation for C/C++
Major Issues

● Popularity
  – Caters to a wide variety of user communities
  – Moving in different directions at once
● No central planning (controlled chaos)
● Not enough engineering cycles
● Risk of "featuritis"
  – Too many flags
  – Too many passes
● Warnings depending on optimization levels
Longer Release Cycles

[Chart: days per release cycle, split into Stage 3 and Release phases, for 3.1 (2002) through 4.2 (2007, estimated as of Feb/07). The release cycle has almost doubled since 3.1.]
A Few Current Projects

● RTL cleanups
● OpenMP
● Interprocedural analysis framework
● GIMPLE tuples
● Link-time optimization
● Memory SSA
● Sharing alias information across ILs
● Register allocation
● Scheduling
RTL Cleanups

● Removal of duplicate functionality
  – Mostly done
  – Goal is for RTL to handle only low-level issues
● Dataflow analysis
  – Replace ad hoc flow analysis with a generic DF solving framework
  – Increased accuracy allows more aggressive transformations
  – Support for incremental dataflow information
  – Back ends need taming
OpenMP

● Pragma-based annotations to specify parallelism
● New hand-written recursive-descent parser eased pragma recognition
● GIMPLE extended to support concurrency
● Supported on most platforms with thread support
● Available now in Fedora Core 6's compiler
● Official release will be in GCC 4.2
OpenMP

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
    #pragma omp parallel
      printf("[%d] Hello\n", omp_get_thread_num());
      return 0;
    }

With OpenMP enabled:                Without -fopenmp (pragma ignored):
$ gcc -fopenmp -o hello hello.c     $ gcc -o hello hello.c
$ ./hello                           $ ./hello
[2] Hello                           [0] Hello
[3] Hello
[0] Hello   ← Master thread
[1] Hello
Interprocedural Analysis

● Keep the whole call graph in SSA form
● Increased precision for inter-procedural analyses and transformations
● Unify internal APIs to support inter/intra-procedural analysis
  – No special-casing in the pass manager
  – Improve call graph facilities
● Challenges
  – Memory consumption
  – Privatization of global attributes
GIMPLE Tuples

● GIMPLE shares the tree data structure with the FEs
● Too much redundant information
● A separate tuple-like data structure provides
  – Increased separation from the FEs
  – Memory savings
  – Potential compile-time improvements
● Challenges
  – Shared infrastructure (e.g., fold())
  – Compile-time increase due to conversion
Link-Time Optimization

● Ability to stream the IL to support IPO across
  – Multiple compilation units
  – Multiple languages
● Streamed IL representation treated like any other language
● Challenges
  – Original language must be represented explicitly
  – Complete compiler state must be preserved
  – Conflicting flags used in different modules
Alias Analysis

Overview

● GIMPLE represents alias information explicitly
● Alias analysis is just another pass
  – Artificial symbols represent memory expressions (virtual operands)
  – FUD-chains computed on virtual operands → Virtual SSA
● Transformations may prove a symbol non-addressable
  – Promoted to GIMPLE register
  – Requires another aliasing pass
Symbolic Representation of Memory

● Pointer P is associated with memory tag MT
  – MT represents the set of variables pointed to by P
● So *P is a reference to MT

    if (...)
      p = &a
    else
      p = &b
    *p = 5

p points-to { a, b }
p has memory tag MT
*p = 5 is interpreted as MT = 5
Associating Memory with Symbols

● Alias analysis
  – Builds points-to sets and memory tags
● Structural analysis
  – Builds field tags (sub-variables)
● Operand scanner
  – Scans memory expressions to extract tags
  – Prunes alias sets based on expression structure
Alias Analysis

● Points-to alias analysis (PTAA)
  – Based on constraint graphs
  – Field- and flow-sensitive, context-insensitive
  – Intra-procedural (inter-procedural in 4.2)
  – Fairly precise
● Type-based analysis (TBAA)
  – Based on input language rules
  – Field-sensitive, flow-insensitive
  – Very imprecise
Alias Analysis

● Two kinds of pointers are considered
  – Symbols: points-to is flow-insensitive
    ● Associated with Symbol Memory Tags (SMT)
  – SSA names: points-to is flow-sensitive
    ● Associated with Name Memory Tags (NMT)
● Given a pointer dereference *ptr_42
  – If ptr_42 has a NMT, use it
  – If not, fall back to the SMT associated with ptr
Structural Analysis

● Separate structure fields are assigned distinct symbols

    struct A {
      int x;
      int y;
      int z;
    };
    struct A a;

● Variable a will have 3 sub-variables: { SFT.1, SFT.2, SFT.3 }
● References to each field are mapped to the corresponding sub-variable
IL Representation

    foo (i, a, b, *p)
    {
      p = (i > 10) ? &a : &b
      *p = 3
      return a + b
    }

With virtual operands made explicit:

    foo (i, a, b, *p)
    {
      p = (i > 10) ? &a : &b
      # a = VDEF <a>
      # b = VDEF <b>
      *p = 3
      # VUSE <a>
      t1 = a
      # VUSE <b>
      t2 = b
      t3 = t1 + t2
      return t3
    }
Virtual SSA Form

    foo (i, a, b, *p)
    {
      p_2 = (i_1 > 10) ? &a : &b
      # a_4 = VDEF <a_3>
      # b_6 = VDEF <b_5>
      *p_2 = 3
      ...
    }

● VDEF operands needed to maintain DEF-DEF links
Virtual SSA – Problems

● Big alias sets → many virtual operators
  – Unnecessarily detailed tracking
  – Memory
  – Compile time
  – SSA name explosion
● Need a new representation that can
  – Degrade gracefully as the detail level grows
  – Or, better, not degrade at all with highly detailed information
Memory SSA

● New representation of aliasing information
● Big alias sets → many virtual operators
  – Unnecessarily detailed tracking
  – Memory, compile time, SSA name explosion
● Main idea
  – Stores to many locations create a single name
  – Factored name becomes the reaching definition for all symbols involved in the store
Memory SSA

    p_3 points-to { a, b, c }
    q_4 points-to { n, o, p }

    # .MEM_10 = VDEF <.MEM_0>
    *p_3 = ...

    # .MEM_11 = VDEF <.MEM_0>
    *q_4 = ...

    # b_12 = VDEF <.MEM_10>
    b = ...

    # .MEM_13 = VDEF <.MEM_10, b_12>
    *p_3 = ...

    # VUSE <.MEM_13>
    t_14 = b

    # VUSE <.MEM_11>
    t_15 = o

● At most one VDEF and one VUSE per statement
● Virtual operators may refer to more than one operand
● Factored stores create "sinks" that group multiple incoming names
Memory SSA

● Challenges
  – Overlapping live ranges create a problem for some passes
  – Requires more detailed tracking in SSA rewriting
  – Dynamic association between Φ nodes and symbols
● Memory partitions
  – Have more than one .MEM object
  – Create static associations between symbols
  – Association is independent of alias relations
  – Heuristics control partitioning
Tomorrow

Future Work

● Wide variety of projects
● A small sample
  – Plug-in support
  – Scheduling
  – Register pressure reduction
  – Register allocation
  – Incremental compilation
  – Dynamic compilation
  – Dynamic optimization pipeline
Plug-in Support

● Extensibility mechanism to allow 3rd-party tools
● Wrap some internal APIs for external use
● Allow loading of external shared modules
  – Loaded module becomes another pass
  – Compiler flag determines location
● Versioning scheme prevents mismatches
● Useful for
  – Static analysis
  – Experimenting with new transformations
Scheduling

● Several concurrent efforts targeting 4.3 and 4.4
  – Schedule over larger regions for increased parallelism
  – Most target IA64, but benefit all architectures
● Enhanced selective scheduling
● Treegion scheduling
● Superblock scheduling
● Improvements to swing modulo scheduling
Register Allocation

● Several efforts over the years
● Complex problem
  – Many different targets to handle
  – Interactions with reload and scheduling
● YARA (Yet Another Register Allocator)
  – Experimented with several algorithms
● IRA (Integrated Register Allocator)
  – Priority coloring, Chaitin-Briggs and region-based
  – Expected in 4.4
  – Currently works on x86, x86-64, ppc, IA64, sparc, s390
Register pressure reduction

● SSA may cause excessive register pressure
  – Pathological cases → ~800 live registers
  – RA battle lost before it begins
● Short-term project to cope with RA deficiencies
● Implement register pressure reduction in GIMPLE before going to RTL
  – Pre-spilling combined with live-range splitting
  – Load rematerialization
  – Tie RTL generation into out-of-SSA to allow better instruction selection for spills and rematerialization
Dynamic compilation

● Delay compilation until runtime (JIT)
  – Emit bytecodes
  – Implement a virtual machine with optimizing transformations
● Leverage existing infrastructure (LLVM, LTO)
● Not appropriate for every case
● Challenges
  – Still an active research area
  – Different models/costs for static and dynamic compilers
Incremental Compilation

● Speed up the edit-compile-debug cycle
● Speeds up ordinary compiles by compiling a given header file "once"
● Incremental changes fed to a compiler daemon
● Incremental linking as well
● Side effects
  – Refactoring
  – Cross-referencing
  – Compile-while-you-type (e.g., Eclipse)
Dynamic Optimization Pipeline

● Phase ordering not optimal for every case
● Current static ordering difficult to change
● Allow external reordering
  – Ultimate control
  – Allows experimenting with different orderings
  – Define -On based on common orderings
● Problems
  – Probability of finding bugs increases
  – Enormous search space