Automatic Program Optimization

Automatic Program Optimization SIGPLAN '93 PLDI Tutorial Notes Ron Cytron Washington University St. Louis, Missouri Printing: August 5, 1994 c 1993 by Ron K. Cytron Compiler organizations Program Front end: Operator and storage ab- stractions, alias mechanisms. Middle end: Parser Front Dead code elimination End Code motion Semantics Reduction in strength Constant propagation Common subexpression elimina- Middle tion Optimizer End Fission Fusion Strip mining Code Back Jamming Generation End Splitting Collapsing Executable Back end: Finite resource issues and Text code generation. Copyright c 1993 Ron K. Cytron. All rights reserved ±1± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Some thoughts Misconceptions All too often::: Optimization optimizes your program. Optimization is disabled by default. De- There's probably a better algorithm or sequence of bugging optimized code can be treacherous [71, 42]. program transformations. While optimization hopefully Optimization is often the primary suspect of program improves your program, the result is usually not optimal. misbehaviorÐsometimes deservedly so. ªNo, not the Optimization requires (much) more third switch!º compilation time. For example, dead code Optimization is slow. Transformations are often elimination can reduce the size of program text such applied to too much of a program. Optimizations are that overall compile time is also reduced. often textbook recipes, applied without proper thought. A clever programmer is a good sub- Optimization produces incorrect code. stitute for an optimizing compiler. Although recent work is encouraging [67], optimizations While ef®cient coding of an algorithm is essential, pro- are usually developed ad hoc. grams should not be obfuscated by ªtricksº that are Programmers are trained by their com- architecture- (and sometimes compiler-) speci®c. pilers. A style is inevitably developed that is con- ducive to optimization. Optimization is like sex: Everybody claims to get good results using exotic techniques; Nobody is willing to provide the details. Copyright c 1993 Ron K. Cytron. All rights reserved ±2± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Multilingual systems C FTN. ADA C FTN. ADA IL IBM SUN IBM SUN . CRAY . CRAY RS/6000 SPARC RS/6000 SPARC Architecting an intermediate language reduces the incremental cost of accom- modating new source languages or target architectures [7]. Moreover, many optimizations can be performed directly on the intermediate language text, so that source- and machine-independent optimizations can be performed by a common middle-end. Copyright c 1993 Ron K. Cytron. All rights reserved ±3± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Intermediate languages It's very easy to devote much time and effort toward choosing the ªrightº IL. Below are some guidelines for choosing or developing a useful intermediate language: The IL should be a bona ®de language, and not just an aggregation of data structures. The semantics of the IL should be cleanly de®ned and readily apparent. The IL's representation should not be overly verbose: ± Although some expansion is inevitable, the IL-to-source token ratio should be as low as possible. ± It's desirable for the IL to have a verbose, human-readable form. The IL should be easily and cleanly extensible. The IL should be suf®ciently general to represent the important aspects of multiple front-end languages. The IL should be suf®ciently general to support ef®cient code generation for multiple back-end targets. A sampling of dif®cult issues: Ideally, an IL has fractal characteristics: optimization can proceed at a given How should a string operation be level; the IL can be ªloweredº; opti- represented (intact or as a ªloopº)? mization is then applied to the freshly How much detail of a procedure's exposed description. behavior is relevant? Copyright c 1993 Ron K. Cytron. All rights reserved ±4± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Intermediate languages (cont'd) Components of an IL: SymbolTable: declares the name space of a procedure. AliasRelations: declares sets of aliased names. Semantics: gives a formal speci®cation of the procedure's behavior. Example program: Procedure foox; y The procedure randomly assigns the ad- declare dress of a or b to p, and then stores x + y into the location dereferenced by p. x; y integer a; b integer This example is intended to illustrate ?p integer may-alias behavior: ?p may be an alias for a or b, and static analysis certainly p r and ?&a :&b can't decide which alias will hold at run- ?p x + y time. end We'll now look at a possible intermediate representation of this procedure, using a Lisp-like notation that is easy to parse and to extend. The verbose keywords are easily coded and stored ef®ciently as terminals of the language's grammar. Copyright c 1993 Ron K. Cytron. All rights reserved ±5± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Symbol table (SymbolTable (Symbol (NumSymbols 5) (SymbolName a) (Symbol (SymbolID 4) (SymbolName x) ) (SymbolID 1) (Symbol ) (SymbolName b) (Symbol (SymbolID 5) (SymbolName y) ) (SymbolID 2) ) ) (Symbol (SymbolName p) (SymbolID 3) ) Symbol attributes (SymbolType, SymbolSize, SymbolVolatile, etc.) can be added as needed. Copyright c 1993 Ron K. Cytron. All rights reserved ±6± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Alias relations In programs with pointers or reference (AliasRelations parameters, use of a lexical name could (NumAliasRelations 2) refer to one of multiple, distinct storage (AliasRelation names. We therefore specify sets of (AliasID 1) names that might be treated in this man- (MayAliases 2 a b) ner. Subsequently, a given reference to ) a lexical name can be associated with (AliasRelation an alias relation entry, to describe the (AliasID 2) potential effects of that reference. (MustAliases 1 y) (MayAliases 1 x) In this example, we have an alias re- ) lation with may-aliases for a and b and (AliasRelation no must-aliases. This relation might be (AliasID 3) appropriate for the assignment through (MustAliases 1 x) ?p. (MayAliases 1 y) Also shown are two relations in which x ) and y are alternately must- and may- ) aliased. If parameters are passed ªby- addressº, then a use of x may be a use of y if each parameter is supplied the same address. Copyright c 1993 Ron K. Cytron. All rights reserved ±7± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Procedure semantics (ProcSemantics (NodeSemantics (NumNodes 5) (NodeID 2) (Def (NodeSemantics (DefID 2) (NodeID 1) (SymbID ?) (Def (AliasWith 1) (DefID 1) (DefValue (SymbID p) (+ (DefValue (Use (Choose (UseID 1) ((= rand() 0) (SymbID x) (Addr a) ) (Addr b) (Use ) (UseID 2) ) (SymbID y) ) ) ) ) (Jump 2) ) ) ) ) ) Copyright c 1993 Ron K. Cytron. All rights reserved ±8± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL What happens in the middle end? Essentially, the program is transformed Control Flow Graph into an observably equivalent while less resource-consumptive program. Such Depth-First Numbering transformation is often based on: Spanning Tree Assertions provided by the program Dominators Intervals author or benefactor. Program Dominance Semantics Profiling The program dependence Frontiers graph [50, 35, 10]. Sparse Evaluation Static Single Static single assignment (SSA) Graph Assignment Control form [26, 5, 69, 28]. Form Dependence Data Flow Static information gathered by solv- Edges ing data ¯ow problems [44, 51, 52, Problems 53, 41, 54, 55, 48]. Data Dependence Run-time information collected by Edges pro®ling [61]. Program Program Dependence Graph Transformation Let's take a look at an example that bene®ts greatly from optimization::: Copyright c 1993 Ron K. Cytron. All rights reserved ±9± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Unoptimized matrix multiply for i = 1 to N do for j = 1 to N do A[i; j ] 0 for k = 1 to N do A[i; j ] A[i; j ]+B [i; k ] C [k; j ] od od od Note that A[i; j ] is really Addr A+i 1 K1 +j 1 K2 which takes 6 integer operations. The innermost loop of this ªtextbookº program takes 24 integer ops 3loads 1 ¯oating add 1 ¯oating mpy 1store 30 instructions Copyright c 1993 Ron K. Cytron. All rights reserved ±10± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL Optimizing matrix multiply for i = 1 to N do The expression A[i; j ] is loop-invariant j = N for 1 to do with respect to the k loop. Thus, code a &A[i; j ] motion can move the address arith- A[i; j ] for k = 1 to N do metic for out of the innermost loop. ?a ?a + B [i; k ] C [k; j ] The resulting innermost loop contains od only 12 integer operations. od od for i = 1 to N do As loop k iterates, addressing arithmetic b &B [i; 1] for B changes from B [i; k ] to B [i; k + 1]. for j = 1 to N do Induction variable analysis detects the a &A[i; j ] constant difference between these ex- for k = 1 to N do pressions. ?a ?a + ?b C [k; j ] The resulting innermost loop contains b b + K B only 7 integer operations. od od od Similar analysis for C yields only 2 integer operations in the innermost loop, for a speedup of nearly 5. We can do better, especially for large arrays. Copyright c 1993 Ron K. Cytron. All rights reserved ±11± SIGPLAN '93 PROGRAM OPTIMIZATION TUTORIAL If optimization is::: so great becase: then: A good compiler can sell (even a slow) Why does it take so long? Compilation machine. Optimizing compilers easily provide a time is usually 2±5 times slower, and programs with large factor of two in performance. Moreover, the analysis procedures often take longer. Often this is the result of performed during program optimization can be incor- poor engineering: better data structures or algorithms porated into the ªprogramming environmentº [50, 25, can help in the optimizer. 68]. Why does the resulting program some- New languages and architectures mo- times exhibit unexpected behavior? tivate new program optimizations. Al- Sometimes the source program is at fault, and a bug though some optimizations are almost universally bene®- is uncovered when the optimized code is executed; cial, the advent of functional and parallel programming sometimes the optimizing compiler is itself to blame.

Load more