Intermediate Code & Local Optimizations

Lecture Outline
• Intermediate code
• Local optimizations

Code Generation Summary

• We have so far discussed
  – Runtime organization
  – Simple stack machine code generation
  – Improvements to stack machine code generation
• Our compiler goes directly from the abstract syntax tree (AST) to assembly language...
  – ... and does not perform optimizations
  – (optimization is the last compiler phase, which is by far the largest and most complex these days)
• Most real compilers use intermediate languages

Why Intermediate Languages?

ISSUE: When to perform optimizations
• On abstract syntax trees
  – Pro: Machine independent
  – Con: Too high level
• On assembly language
  – Pro: Exposes most optimization opportunities
  – Con: Machine dependent
  – Con: Must re-implement optimizations when re-targeting
• On an intermediate language
  – Pro: Exposes optimization opportunities
  – Pro: Machine independent

Why Intermediate Languages?

• Have many front-ends into a single back-end
  – gcc can handle C, C++, Java, Fortran, Ada, ...
  – each front-end translates source to the same generic language (called GENERIC)
• Have many back-ends from a single front-end
  – Do most optimization on the intermediate representation before emitting code targeted at a single machine

Kinds of Intermediate Languages

High-level intermediate representations:
• closer to the source language; e.g., syntax trees
• easy to generate from the input program
• code optimizations may not be straightforward
Low-level intermediate representations:
• closer to the target machine; e.g., P-Code, U-Code (used in PA-RISC and MIPS), GCC's RTL, 3-address code
• easy to generate code from
• generation from the input program may require effort
"Mid"-level intermediate representations:
• Java bytecode, Microsoft CIL, LLVM IR, ...

Intermediate Code Languages: Design Issues

• Designing a good ICode language is not trivial
• The set of operators in ICode must be rich enough to allow the implementation of source language operations
• ICode operations that are closely tied to a particular machine or architecture make retargeting harder
• A small set of operations
  – may lead to long instruction sequences for some source language constructs,
  – but on the other hand makes retargeting easier

Intermediate Languages

• Each compiler uses its own intermediate language
  – IL design is still an active area of research
• Nowadays, an intermediate language is usually a high-level assembly language
  – Uses register names, but has an unlimited number of them
  – Uses control structures like assembly language
  – Uses opcodes, but some are higher level
    • E.g., push translates to several assembly instructions
    • Most opcodes correspond directly to assembly opcodes

Architecture of gcc

[Figure: architecture of gcc — not reproduced in the text.]

Three-Address Intermediate Code

• Each instruction is of the form x := y op z
  – y and z can only be registers or constants
  – Just like assembly
• Common form of intermediate code
• The expression x + y * z gets translated as

    t1 := y * z
    t2 := x + t1

  – temporary names are made up for internal nodes
  – each sub-expression has a "home"
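The translation just described — a fresh temporary as the "home" of every internal node — fits in a few lines. The following is a hypothetical Python sketch; the tuple-based AST encoding and the name `to_tac` are assumptions, not anything from the lecture.

```python
# Hypothetical sketch: flattening an expression tree into three-address code.
# An expression is either a string (a variable or constant) or a tuple
# (op, left, right) for an internal node.

def to_tac(expr, code, counter=None):
    """Append instructions to `code`; return the 'home' holding expr's value."""
    if counter is None:
        counter = [0]
    if isinstance(expr, str):              # leaf: variable or constant
        return expr
    op, left, right = expr
    l = to_tac(left, code, counter)
    r = to_tac(right, code, counter)
    counter[0] += 1
    t = "t%d" % counter[0]                 # fresh temporary for this internal node
    code.append("%s := %s %s %s" % (t, l, op, r))
    return t

code = []
home = to_tac(("+", "x", ("*", "y", "z")), code)   # the slide's x + y * z
```

Running this on `x + y * z` reproduces the two instructions above, with `t2` as the expression's home.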

Generating Intermediate Code

• Similar to assembly code generation
• Major difference
  – Use any number of IL registers to hold intermediate results

Example: if (x + 2 > 3 * (y - 1) + 42) then z := 0;

    t1 := x + 2
    t2 := y - 1
    t3 := 3 * t2
    t4 := t3 + 42
    if t1 <= t4 goto L
    z := 0
    L:

Generating Intermediate Code (Cont.)

• igen(e, t) is a function that generates code to compute the value of e in register t
• Example:

    igen(e1 + e2, t) =
      igen(e1, t1)    (t1 is a fresh register)
      igen(e2, t2)    (t2 is a fresh register)
      t := t1 + t2

• Unlimited number of registers ⇒ simple code generation
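The igen scheme transcribes almost directly into code. A hypothetical Python sketch follows (the fresh-register generator and the list-of-strings output format are assumptions):

```python
# Hypothetical sketch of igen(e, t): emit code leaving the value of e in t.
# The unlimited supply of fresh registers is what keeps the scheme simple.

import itertools

_fresh = ("t%d" % i for i in itertools.count(1))

def igen(e, t, code):
    if isinstance(e, (int, str)):              # base case: constant or variable
        code.append("%s := %s" % (t, e))
    else:
        op, e1, e2 = e                         # e.g. ("+", e1, e2)
        t1, t2 = next(_fresh), next(_fresh)    # t1, t2 are fresh registers
        igen(e1, t1, code)
        igen(e2, t2, code)
        code.append("%s := %s %s %s" % (t, t1, op, t2))

code = []
igen(("+", "x", 2), "t", code)                 # compute x + 2 into t
```

Note how every sub-expression gets its own register, exactly as in the slide; a later register allocator is expected to clean this up.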

An Intermediate Language

    P → S P | ε
    S → id := id op id
      | id := op id
      | id := id
      | push id
      | id := pop
      | if id relop id goto L
      | L:
      | goto L

• id's are register names
• Constants can replace id's
• Typical operators: +, -, *
• Typical relops: =, >, >=

From 3-address Code to Machine Code

This is almost a macro expansion process.

    3-address code        MIPS assembly code
    x := A[i]             load i into r1
                          la r2, A
                          add r2, r2, r1
                          lw r2, (r2)
                          sw r2, x
    x := y + z            load y into r1
                          load z into r2
                          add r3, r1, r2
                          sw r3, x
    if x >= y goto L      load x into r1
                          load y into r2
                          bge r1, r2, L
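The macro-expansion flavor of this table can be mimicked in a few lines. A hypothetical Python sketch (the templates, register choices, and mnemonic table are assumptions; a real expander would also handle constants, array indexing, and the other statement forms):

```python
# Hypothetical sketch: each three-address instruction expands to a fixed
# MIPS-style template. Register allocation is ignored: every operand is
# loaded from memory and every result is stored back.

def expand(instr):
    parts = instr.split()
    if parts[0] == "if":                           # if x relop y goto L
        _, x, relop, y, _, label = parts
        branch = {">=": "bge", ">": "bgt", "=": "beq"}[relop]
        return ["lw $t1, %s" % x,
                "lw $t2, %s" % y,
                "%s $t1, $t2, %s" % (branch, label)]
    x, _, y, op, z = parts                         # x := y op z
    mnemonic = {"+": "add", "-": "sub", "*": "mul"}[op]
    return ["lw $t1, %s" % y,
            "lw $t2, %s" % z,
            "%s $t3, $t1, $t2" % mnemonic,
            "sw $t3, %s" % x]

mips = expand("x := y + z")
```

This is deliberately naive, matching the slide: the point is that the translation is per-instruction and local.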

Basic Blocks

• A basic block is a maximal sequence of instructions with:
  – no labels (except at the first instruction), and
  – no jumps (except in the last instruction)
• Idea:
  – Cannot jump into a basic block (except at the beginning)
  – Cannot jump out of a basic block (except at the end)
  – Each instruction in a basic block is executed after all the preceding instructions have been executed

Basic Block Example

Consider the basic block

    L:                  (1)
    t := 2 * x          (2)
    w := t + x          (3)
    if w > 0 goto L'    (4)

• There is no way for (3) to be executed without (2) having been executed right before
  – We can change (3) to w := 3 * x
  – Can we then eliminate (2) as well?

Identifying Basic Blocks

• Determine the set of leaders, i.e., the first instruction of each basic block:
  – The first instruction of a function is a leader
  – Any instruction that is the target of a branch is a leader
  – Any instruction immediately following a (conditional or unconditional) branch is a leader
• For each leader, its basic block consists of the leader itself and all instructions up to, but not including, the next leader (or the end of the function)

Control-Flow Graphs

A control-flow graph is a directed graph with
• Basic blocks as nodes
• An edge from block A to block B if the execution can flow from the last instruction in A to the first instruction in B
  – E.g., the last instruction in A is goto LB
  – E.g., the execution can fall through from block A to block B

Frequently abbreviated as CFGs
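The three leader rules translate almost directly into code. A hypothetical Python sketch over the string-based TAC used in these slides (the `L:` label syntax and the `goto` substring test are assumptions):

```python
# Hypothetical sketch of basic-block partitioning via leaders.
# Instructions are strings; "L:" is a label, branches contain "goto".

def basic_blocks(instrs):
    leaders = {0}                                   # rule 1: first instruction
    targets = set()
    for i, ins in enumerate(instrs):
        if "goto" in ins:
            targets.add(ins.split()[-1])            # label this branch jumps to
            if i + 1 < len(instrs):
                leaders.add(i + 1)                  # rule 3: instruction after a branch
    for i, ins in enumerate(instrs):
        if ins.endswith(":") and ins.rstrip(":") in targets:
            leaders.add(i)                          # rule 2: branch target
    bounds = sorted(leaders) + [len(instrs)]
    return [instrs[a:b] for a, b in zip(bounds, bounds[1:])]
```

On the loop example from the next slide this yields two blocks: the two initializing assignments, and the labeled loop body ending in the conditional branch.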

Control-Flow Graphs: Example

    x := 1
    i := 1
    L:
    x := x * x
    i := i + 1
    if i < 42 goto L

• The body of a function (or procedure) can be represented as a control-flow graph
• There is one initial node
• All "return" nodes are terminal

Constructing the Control Flow Graph

• Identify the basic blocks of the function
• There is a directed edge from block B1 to block B2 if
  – there is a (conditional or unconditional) jump from the last instruction of B1 to the first instruction of B2, or
  – B2 immediately follows B1 in the textual order of the program, and B1 does not end in an unconditional jump.
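Given the basic blocks, both kinds of edges are easy to compute. A hypothetical Python sketch (blocks as lists of TAC strings, with a leading `L:` line for labeled blocks — the representation is an assumption):

```python
# Hypothetical sketch: CFG edges = explicit jumps + textual fall-through.

def cfg_edges(blocks):
    label_of = {b[0].rstrip(":"): i                 # block index for each label
                for i, b in enumerate(blocks) if b[0].endswith(":")}
    edges = set()
    for i, b in enumerate(blocks):
        last = b[-1]
        if "goto" in last:
            edges.add((i, label_of[last.split()[-1]]))   # jump edge
        if i + 1 < len(blocks) and not last.startswith("goto"):
            edges.add((i, i + 1))                        # fall-through edge
    return edges

blocks = [["x := 1", "i := 1"],
          ["L:", "x := x * x", "i := i + 1", "if i < 42 goto L"]]
```

For the loop example this produces a fall-through edge into the loop body and a back edge from the conditional branch to the body's own label.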

Optimization Overview

• Optimization seeks to improve a program's utilization of some resource
  – Execution time (most often)
  – Code size
  – Network messages sent
  – (Battery) power used, etc.
• Optimization should not alter what the program computes
  – The answer must still be the same
  – Observable behavior must be the same
    • This typically also includes termination behavior

A Classification of Optimizations

For languages like C there are three granularities of optimizations:
(1) Local optimizations
  • Apply to a basic block in isolation
(2) Global optimizations
  • Apply to a control-flow graph (function body) in isolation
(3) Inter-procedural optimizations
  • Apply across method boundaries

Most compilers do (1), many do (2), and very few do (3)

Cost of Optimizations

• In practice, a conscious decision is made not to implement the fanciest optimization known
• Why?
  – Some optimizations are hard to implement
  – Some optimizations are costly in terms of compilation time
  – Some optimizations have low benefit
  – Many fancy optimizations are all three of the above!
• Goal: maximum benefit for minimum cost

Local Optimizations

• The simplest form of optimization
• No need to analyze the whole procedure body
  – Just the basic block in question
• Example: algebraic simplification

Algebraic Simplification

• Some statements can be deleted
    x := x + 0
    x := x * 1
• Some statements can be simplified
    x := x * 0   ⇒  x := 0
    y := y ** 2  ⇒  y := y * y
    x := x * 8   ⇒  x := x << 3
    x := x * 15  ⇒  t := x << 4; x := t - x
  (on some machines << is faster than *; but not on all!)

Constant Folding

• Operations on constants can be computed at compile time
• In general, if there is a statement
    x := y op z
  – and y and z are constants
  – then y op z can be computed at compile time
• Example: x := 2 + 2 ⇒ x := 4
• Example: if 2 < 0 goto L can be deleted
• When might constant folding be dangerous?
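Both of these are single-instruction rewrites, so one small function covers them. A hypothetical Python sketch handling constant folding plus a couple of the algebraic identities above (the string TAC format and the tiny operator table are assumptions):

```python
# Hypothetical sketch: constant folding plus two algebraic identities.
# Caveat from the slide's question: a real folder must use the *target's*
# arithmetic (overflow, rounding, division by zero) -- that is exactly
# when constant folding gets dangerous.

def simplify(instr):
    x, _, *rhs = instr.split()
    if len(rhs) == 3:
        y, op, z = rhs
        if y.isdigit() and z.isdigit():                        # constant folding
            val = {"+": int(y) + int(z), "*": int(y) * int(z)}[op]
            return "%s := %d" % (x, val)
        if (op == "+" and z == "0") or (op == "*" and z == "1"):
            return "%s := %s" % (x, y)                         # x + 0, x * 1
        if op == "*" and z == "0":
            return "%s := 0" % x                               # x * 0
    return instr
```

Anything the function does not recognize is returned unchanged, which is the safe default for a peepy rewrite like this.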

Flow of Control Optimizations

• Eliminating unreachable code:
  – Code that is unreachable in the control-flow graph
  – Basic blocks that are not the target of any jump or "fall through" from a conditional
  – Such basic blocks can be eliminated
• Why would such basic blocks occur?
• Removing unreachable code makes the program smaller
  – And sometimes also faster
    • Due to memory cache effects (increased spatial locality)

Single Assignment Form

• Some optimizations are simplified if each register occurs only once on the left-hand side of an assignment
• Intermediate code can be rewritten to be in single assignment form

    x := z + y        b := z + y
    a := x       ⇒    a := b
    x := 2 * x        x := 2 * b

  (b is a fresh temporary)
• More complicated in general, due to control flow (e.g., loops)
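For straight-line code the single-assignment rewrite is one renaming pass. A hypothetical Python sketch (fresh names b1, b2, ... are an assumption; the last definition of each variable keeps its original name so values visible after the block are unchanged):

```python
# Hypothetical sketch: local single-assignment renaming for straight-line code.
# As the slide notes, general control flow (loops) makes this much harder.

import itertools

def to_single_assignment(block):
    fresh = ("b%d" % i for i in itertools.count(1))
    current, out = {}, []              # variable -> name currently holding it
    defs = [instr.split()[0] for instr in block]
    for i, instr in enumerate(block):
        x, _, *rhs = instr.split()
        rhs = [current.get(r, r) for r in rhs]   # uses refer to the latest name
        if x in defs[i + 1:]:          # redefined later: use a fresh temporary
            current[x] = x = next(fresh)
        else:
            current.pop(x, None)       # last definition keeps the real name
        out.append("%s := %s" % (x, " ".join(rhs)))
    return out
```

Running it on the slide's example reproduces the rewrite, with `b1` playing the role of the fresh temporary `b`.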

Common Subexpression Elimination

• Assume
  – a basic block is in single assignment form
  – a definition x := ... is the first use of x in the block
• Then all assignments with the same RHS compute the same value
• Example:

    x := y + z        x := y + z
    …            ⇒    …
    w := y + z        w := x

  (assuming the values of x, y, and z do not change in the … code)

Copy Propagation

• If w := x appears in a block, all subsequent uses of w can be replaced with uses of x
• Example:

    b := z + y        b := z + y
    a := b       ⇒    a := b
    x := 2 * a        x := 2 * b

• This does not make the program smaller or faster, but it might enable other optimizations
  – Constant folding
  – Dead code elimination
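Both transformations fit in one left-to-right pass when the block is in single assignment form: remember which name holds each copied value, and which name holds each already-computed expression. A hypothetical Python sketch (the string TAC format and the dictionary names are assumptions):

```python
# Hypothetical sketch: copy propagation + common subexpression elimination
# in a single pass over a block in single assignment form.

def cse_copy_prop(block):
    copies, exprs, out = {}, {}, []
    for instr in block:
        x, _, *rhs = instr.split()
        rhs = [copies.get(r, r) for r in rhs]    # copy propagation on the uses
        key = tuple(rhs)
        if len(rhs) == 1:
            copies[x] = rhs[0]                   # record the copy x := y
        elif key in exprs:
            copies[x] = exprs[key]               # CSE: reuse earlier computation
            rhs = [exprs[key]]
        else:
            exprs[key] = x                       # x is the "home" of this RHS
        out.append("%s := %s" % (x, " ".join(rhs)))
    return out
```

It reproduces both slide examples: `w := y + z` becomes `w := x`, and after the copy `a := b` the use of `a` is replaced by `b`.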

Copy Propagation and Constant Folding

• Example:

    a := 5            a := 5
    x := 2 * a   ⇒    x := 10
    y := x + 6        y := 16
    t := x * y        t := x << 4

Copy Propagation and Dead Code Elimination

If
  w := RHS appears in a basic block, and
  w does not appear anywhere else in the program
Then
  the statement w := RHS is dead and can be eliminated
  – Dead = does not contribute to the program's result

Example: (a is not used anywhere else)

    x := z + y        b := z + y        b := z + y
    a := x       ⇒    a := b       ⇒    x := 2 * b
    x := 2 * x        x := 2 * b
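A backward pass makes "does not contribute to the result" concrete: start from the variables that are live after the block and keep only the assignments that feed them. A hypothetical Python sketch (the `live` parameter is an assumption standing in for "used elsewhere in the program"; single assignment form is assumed):

```python
# Hypothetical sketch: dead code elimination by a backward liveness scan.

def dead_code_elim(block, live):
    needed, out = set(live), []
    for instr in reversed(block):                 # scan backwards
        x, _, *rhs = instr.split()
        if x in needed:
            out.append(instr)                     # result is used: keep it
            needed |= {r for r in rhs if r.isidentifier()}
        # otherwise the assignment is dead: drop it
    return out[::-1]
```

Because the scan is backward, dropping one dead assignment can expose others feeding only it, so whole dead chains disappear in one pass.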

Applying Local Optimizations

• Each local optimization does very little by itself
• Typically optimizations interact
  – Performing one optimization enables another
• Optimizing compilers repeatedly perform optimizations until no improvement is possible
  – The optimizer can also be stopped at any time to limit the compilation time

An Example

Initial code:

    a := x ** 2
    b := 3
    c := x
    d := c * c
    e := b * 2
    f := a + d
    g := e * f

(assume that only f and g are used in the rest of the program)

An Example: Algebraic Simplification

    a := x ** 2        a := x * x
    b := 3             b := 3
    c := x             c := x
    d := c * c    ⇒    d := c * c
    e := b * 2         e := b << 1
    f := a + d         f := a + d
    g := e * f         g := e * f

An Example: Copy and Constant Propagation

    a := x * x         a := x * x
    b := 3             b := 3
    c := x             c := x
    d := c * c    ⇒    d := x * x
    e := b << 1        e := 3 << 1
    f := a + d         f := a + d
    g := e * f         g := e * f

An Example: Constant Folding

    a := x * x         a := x * x
    b := 3             b := 3
    c := x             c := x
    d := x * x    ⇒    d := x * x
    e := 3 << 1        e := 6
    f := a + d         f := a + d
    g := e * f         g := e * f

An Example: Common Subexpression Elimination

    a := x * x         a := x * x
    b := 3             b := 3
    c := x             c := x
    d := x * x    ⇒    d := a
    e := 6             e := 6
    f := a + d         f := a + d
    g := e * f         g := e * f

An Example: Copy and Constant Propagation

    a := x * x         a := x * x
    b := 3             b := 3
    c := x             c := x
    d := a        ⇒    d := a
    e := 6             e := 6
    f := a + d         f := a + a
    g := e * f         g := 6 * f

An Example: Dead Code Elimination

    a := x * x
    b := 3             a := x * x
    c := x        ⇒    f := a + a
    d := a             g := 6 * f
    e := 6
    f := a + a
    g := 6 * f

This is the final form.


Peephole Optimizations on Assembly Code

• The optimizations presented before work on intermediate code
  – They are target independent
  – But they can also be applied to assembly language
• Peephole optimization is an effective technique for improving assembly code
  – The "peephole" is a short sequence of (usually contiguous) instructions
  – The optimizer replaces the sequence with another, equivalent one (but faster)

Implementing Peephole Optimizations

• Write peephole optimizations as replacement rules

    i1, …, in → j1, …, jm

  where the RHS is the improved version of the LHS
• Example:

    move $a $b, move $b $a → move $a $b

  – Works if move $b $a is not the target of a jump
• Another example:

    addiu $a $a i, addiu $a $a j → addiu $a $a i+j

Peephole Optimizations

• Redundant instruction elimination, e.g.:

    ...            ...
    goto L    ⇒    L: ...
    L: ...

• Flow of control optimizations, e.g.:

    ...                ...
    goto L1            goto L2
    ...           ⇒    ...
    L1: goto L2        L1: goto L2
    ...                ...

Peephole Optimizations (Cont.)

• Many (but not all) of the basic block optimizations can be cast as peephole optimizations
  – Example: addiu $a $b 0 → move $a $b
  – Example: move $a $a → (delete the instruction)
  – These two together eliminate addiu $a $a 0
• Just like local optimizations, peephole optimizations need to be applied repeatedly for maximum effect
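A small rule-driven rewriter covers these examples and shows why repeated application matters. A hypothetical Python sketch ($-prefixed tokens in a pattern are metavariables; the rule set, matcher, and fixpoint loop are all assumptions — rules needing arithmetic on operands, like the addiu-combining rule, would need a richer rule language):

```python
# Hypothetical sketch: peephole rewriting with pattern -> replacement rules.
# A $-token in a pattern matches any operand and must match consistently.

RULES = [
    (["addiu $a $b 0"], ["move $a $b"]),          # add of 0 is a move
    (["move $a $a"], []),                         # self-move is a no-op
    (["move $a $b", "move $b $a"], ["move $a $b"]),
]

def match(pattern, window, env):
    for p, w in zip(pattern, window):
        pt, wt = p.split(), w.split()
        if len(pt) != len(wt):
            return False
        for a, b in zip(pt, wt):
            if a.startswith("$"):
                if env.setdefault(a, b) != b:     # consistent metavariable binding
                    return False
            elif a != b:
                return False
    return True

def peephole(instrs):
    changed = True
    while changed:                                # repeat until no rule fires
        changed = False
        for i in range(len(instrs)):
            for pat, rep in RULES:
                env = {}
                window = instrs[i:i + len(pat)]
                if len(window) == len(pat) and match(pat, window, env):
                    sub = [" ".join(env.get(t, t) for t in r.split()) for r in rep]
                    instrs = instrs[:i] + sub + instrs[i + len(pat):]
                    changed = True
                    break
            if changed:
                break
    return instrs
```

As on the slide, the first two rules compose: `addiu $t0 $t0 0` first becomes `move $t0 $t0`, which a second round then deletes — which is why the driver loops to a fixpoint.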

Local Optimizations: Concluding Remarks

• Intermediate code is helpful for many optimizations

• Many simple optimizations can still be applied on assembly language

• "Optimization" is grossly misnamed
  – Code produced by "optimizers" is not optimal in any reasonable sense
  – "Program improvement" is a more appropriate term

• Next time: global optimizations