Code Generation and Local Optimization

Generating assembly
• How do we convert from three-address code to assembly?
• Seems easy! But easy solutions may not be the best option
• What we will cover:
  • Instruction selection
  • Peephole optimizations
  • “Local” common subexpression elimination
  • “Local” register allocation

Wednesday, October 22, 14

Naïve approach
• “Macro-expansion”
  • Treat each 3AC instruction separately, generate code in isolation

    Three address code     Generated code
    ADD A, B, C            LD  A, R1
                           LD  B, R2
                           ADD R1, R2, R3
                           ST  R3, C
    MUL A, 4, B            LD  A, R1
                           MOV 4, R2
                           MUL R1, R2, R3
                           ST  R3, B

Why is this bad? (I)

    Three address code     Generated code
    MUL A, 4, B            LD  A, R1
                           MOV 4, R2
                           MUL R1, R2, R3
                           ST  R3, B

• Too many instructions
• Should use a different instruction type

    Three address code     Generated code
    MUL A, 4, B            LD   A, R1
                           MULI R1, 4, R3
                           ST   R3, B

Why is this bad? (II)

    Three address code     Generated code
    ADD A, B, C            LD  A, R1
                           LD  B, R2
                           ADD R1, R2, R3
                           ST  R3, C
    ADD C, A, E            LD  C, R4
                           LD  A, R5
                           ADD R4, R5, R6
                           ST  R6, E

• Redundant load of C
• Redundant load of A
• Uses a lot of registers

Why is this bad? (III)

    Three address code     Generated code
    ADD A, B, C            LD  A, R1
                           LD  B, R2
                           ADD R1, R2, R3
                           ST  R3, C
    ADD A, B, D            LD  A, R4
                           LD  B, R5
                           ADD R4, R5, R6
                           ST  R6, D

• Wasting instructions recomputing A + B

How do we address this?
• Several techniques to improve performance of generated code
  • Instruction selection to choose better instructions
  • Peephole optimizations to remove redundant instructions
  • Common subexpression elimination to remove redundant computation
  • Register allocation to reduce number of registers used

Instruction selection
• Even a simple instruction may have a large set of possible address modes and combinations

    + A B C

  • Each source operand can be literal, register, memory address, indexed, etc.
  • The destination can be indirect, register, memory address, indexed, etc.
• Dozens of potential combinations!

More choices for instructions
• Auto increment/decrement (especially common in embedded processors as in DSPs)
  • e.g., load from this address and increment it
  • Why is this useful?
• Three-address instructions
• Specialized registers (condition registers, floating point registers, etc.)
• “Free” addition in indexed mode

    MOV (R1)offset R2

  • Why is this useful?
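The contrast between macro-expansion and choosing a better instruction type can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`naive_expand`, `select`) are hypothetical, the fixed registers R1 to R3 mirror the listings above, and the rule "append I to the opcode for an immediate form" is assumed from the single MULI example on the slide rather than taken from any real instruction set.

```python
def is_literal(operand: str) -> bool:
    """True for integer literals such as '4' or '-1'."""
    return operand.lstrip("-").isdigit()


def naive_expand(op: str, src1: str, src2: str, dst: str) -> list[str]:
    """Macro-expand one 3AC instruction in isolation (the naive approach)."""
    code = []
    for operand, reg in ((src1, "R1"), (src2, "R2")):
        if is_literal(operand):
            code.append(f"MOV {operand}, {reg}")   # literal moved into a register
        else:
            code.append(f"LD {operand}, {reg}")    # variable loaded from memory
    code.append(f"{op} R1, R2, R3")
    code.append(f"ST R3, {dst}")
    return code


def select(op: str, src1: str, src2: str, dst: str) -> list[str]:
    """Prefer an immediate-form instruction when the second operand is a literal.

    Assumption: every opcode has an immediate variant named op + 'I'
    (only MULI appears in the slides).
    """
    if is_literal(src2) and not is_literal(src1):
        return [f"LD {src1}, R1", f"{op}I R1, {src2}, R3", f"ST R3, {dst}"]
    return naive_expand(op, src1, src2, dst)


print(naive_expand("MUL", "A", "4", "B"))  # four instructions
print(select("MUL", "A", "4", "B"))        # three instructions
```

A real instruction selector would match against many more addressing modes than this single special case; the point is only that looking at operand kinds before emitting code already saves an instruction and a register.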
Peephole optimizations
• Simple optimizations that can be performed by pattern matching
  • Intuitively, look through a “peephole” at a small segment of code and replace it with something better
• Example: if code generator sees ST R X; LD X R, eliminate load
• Can recognize sequences of instructions that can be performed by single instructions

    LDI R1 R2; ADD R1 4 R1   replaced by   LDINC R1 R2 4   //load from address in R1 then inc by 4

Peephole optimizations
• Constant folding

    ADD lit1, lit2, Rx       →   MOV lit1 + lit2, Rx
    MOV lit1, Rx
    ADD lit2, Rx, Ry         →   MOV lit1 + lit2, Ry

• Strength reduction

    MUL operand, 2, Rx       →   SHIFTL operand, 1, Rx
    DIV operand, 4, Rx       →   SHIFTR operand, 2, Rx

• Null sequences

    MUL operand, 1, Rx       →   MOV operand, Rx
    ADD operand, 0, Rx       →   MOV operand, Rx

Peephole optimizations
• Combine operations

    JEQ L1
    JMP L2                   →   JNE L2
    L1: ...                      L1: ...

• Simplifying

    SUB 0, Rx, Rx            →   NEG Rx

• Special cases (taking advantage of ++/--)

    ADD 1, Rx, Rx            →   INC Rx
    SUB Rx, 1, Rx            →   DEC Rx

• Address mode operations

    MOV A R1
    ADD 0(R1) R2 R3          →   ADD @A R2 R3

Superoptimization
• Peephole optimization/instruction selection writ large
• Given a sequence of instructions, find a different sequence of instructions that performs the same computation in less time
• Huge body of research, pulling in ideas from all across computer science
  • Theorem proving
  • Machine learning

Common subexpression elimination
• Goal: remove redundant computation, don’t calculate the same expression multiple times

    1: A = B * C
    2: E = B * C

  Keep the result of statement 1 in a temporary and reuse for statement 2
• Two varieties of common subexpression elimination (CSE): local and global
• Local CSE: within a single basic block
  • Easier problem to solve (why?)
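A few of the peephole patterns listed in the peephole slides (constant folding, strength reduction, null sequences, and the store-then-load case) can be sketched as a pattern matcher over a list of instructions. The tuple encoding and the names `rewrite_one` and `peephole` are hypothetical choices for this illustration. The strength-reduction rule generalizes the slide's MUL-by-2 and DIV-by-4 examples to any power of two, and the DIV case assumes unsigned (or non-negative) operands.

```python
def is_lit(x: str) -> bool:
    """True for integer literals such as '7' or '-3'."""
    return x.lstrip("-").isdigit()


def rewrite_one(ins: tuple) -> tuple:
    """Apply single-instruction patterns to (opcode, src1, src2, dst) tuples."""
    if len(ins) != 4:
        return ins
    op, a, b, dst = ins
    # Constant folding: ADD lit1, lit2, Rx -> MOV lit1+lit2, Rx
    if op == "ADD" and is_lit(a) and is_lit(b):
        return ("MOV", str(int(a) + int(b)), dst)
    # Null sequences: MUL x, 1 and ADD x, 0 are plain moves
    if (op == "MUL" and b == "1") or (op == "ADD" and b == "0"):
        return ("MOV", a, dst)
    # Strength reduction: multiply/divide by a power of two becomes a shift
    # (DIV -> SHIFTR assumes an unsigned or non-negative operand)
    if op in ("MUL", "DIV") and is_lit(b):
        n = int(b)
        if n > 1 and n & (n - 1) == 0:
            shift = "SHIFTL" if op == "MUL" else "SHIFTR"
            return (shift, a, str(n.bit_length() - 1), dst)
    return ins


def peephole(code: list) -> list:
    """One pass over the code, looking at a window of at most two instructions."""
    out = []
    for ins in code:
        ins = rewrite_one(ins)
        # ("ST", R, X) followed by ("LD", X, R): R already holds X, drop the load
        if out and ins[0] == "LD" and out[-1] == ("ST", ins[2], ins[1]):
            continue
        out.append(ins)
    return out


program = [
    ("ADD", "3", "4", "R1"),
    ("MUL", "R2", "2", "R3"),
    ("MUL", "R4", "1", "R5"),
    ("ST", "R5", "X"),
    ("LD", "X", "R5"),
]
for ins in peephole(program):
    print(*ins)
```

Production peephole optimizers usually iterate until no pattern applies, since one rewrite can expose another; a single pass keeps the sketch short.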
Common subexpression elimination
• Global CSE: within a single procedure or across the whole program
  • Intra- vs. inter-procedural
  • More powerful, but harder (why?)
  • Will come back to these sorts of “global” optimizations later
• Difficulty: how do we know when the same expression will produce the same result?

    1: A = B * C
    2: B = <new value>
    3: E = B * C

  B is “killed.” Any expression using B is no longer “available,” so we cannot reuse the result of statement 1 for statement 3
• This becomes harder with pointers (how do we know when B is killed?)

CSE in practice
• Idea: keep track of which expressions are “available” during the execution of a basic block
  • Which expressions have we already computed?
• Issue: determining when an expression is no longer available
  • This happens when one of its components is assigned to, or “killed.”
• Idea: when we see an expression that is already available, rather than generating code, copy the temporary
• Issue: determining when two expressions are the same

Maintaining available expressions
• For each 3AC operation in a basic block
  • Create name for expression (based on lexical representation)
  • If name not in available expression set, generate code, add it to set
    • Track the register that holds the result, and any variables used to compute the expression
  • If name in available expression set, generate move instruction
  • If operation assigns to a variable, kill all dependent expressions

Example

    Three address code     Generated code
    + A B T1               ADD A B R1
    + T1 C T2
    + A B T3
    + T1 T2 C
    + T1 C T4
    + T3 T2 D

    Available expressions: “A+B”
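The bookkeeping described under "Maintaining available expressions" can be sketched as follows. This is a minimal illustration, not the course's reference implementation: the function name `local_cse` and the tuple encoding are hypothetical, temporary Tn is assumed to live in register Rn, each temporary is assumed to be assigned only once, and results destined for a named variable are assumed to get a fresh register numbered after the highest temporary, followed by a store. Those assumptions were chosen so the output lines up with the register names used in the slides' example.

```python
import re

OPCODES = {"+": "ADD", "*": "MUL"}


def is_temp(name: str) -> bool:
    return re.fullmatch(r"T\d+", name) is not None


def operand(name: str) -> str:
    """Temporaries live in the same-numbered register; variables stay in memory."""
    return "R" + name[1:] if is_temp(name) else name


def local_cse(block):
    """block: list of (op, src1, src2, dst) tuples forming one basic block."""
    next_free = 1 + max((int(d[1:]) for _, _, _, d in block if is_temp(d)), default=0)
    available = {}  # lexical name -> (register holding result, operands used)
    code = []
    for op, a, b, dst in block:
        name = f"{a}{op}{b}"          # name based on lexical representation
        reused = name in available
        if reused:
            held_in = available[name][0]
            # Already computed: copy (or store) instead of recomputing
            if is_temp(dst):
                code.append(f"MOV {held_in} {operand(dst)}")
            else:
                code.append(f"ST {held_in} {dst}")
        else:
            if is_temp(dst):
                reg = operand(dst)
            else:
                reg = f"R{next_free}"
                next_free += 1
            code.append(f"{OPCODES[op]} {operand(a)} {operand(b)} {reg}")
            if not is_temp(dst):
                code.append(f"ST {reg} {dst}")
        # Assigning to dst kills every available expression that uses dst
        available = {n: v for n, v in available.items() if dst not in v[1]}
        if not reused and dst not in (a, b):
            available[name] = (reg, (a, b))
    return code, list(available)


block = [
    ("+", "A", "B", "T1"), ("+", "T1", "C", "T2"), ("+", "A", "B", "T3"),
    ("+", "T1", "T2", "C"), ("+", "T1", "C", "T4"), ("+", "T3", "T2", "D"),
]
code, avail = local_cse(block)
print("\n".join(code))
print(avail)
```

Run on the six-instruction block from the example, the sketch reuses R1 for the second A+B, kills "T1+C" when C is assigned, and therefore recomputes T1+C for T4.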
Example

    Three address code     Generated code            Available expressions afterwards
    + A B T1               ADD A B R1                “A+B”
    + T1 C T2              ADD R1 C R2               “A+B” “T1+C”
    + A B T3               MOV R1 R3                 “A+B” “T1+C”
    + T1 T2 C              ADD R1 R2 R5; ST R5 C     “A+B” “T1+T2” (“T1+C” is killed by the assignment to C)
    + T1 C T4              ADD R1 C R4               “A+B” “T1+T2” “T1+C”
    + T3 T2 D              ADD R3 R2 R6; ST R6 D     “A+B” “T1+T2” “T1+C” “T3+T2”

Downsides
• What are some downsides to this approach? Consider the two highlighted operations (marked * here)

    Three address code     Generated code
    + A B T1               ADD A B R1
    + T1 C T2              ADD R1 C R2
    + A B T3               MOV R1 R3
  * + T1 T2 C              ADD R1 R2 R5; ST R5 C
    + T1 C T4              ADD R1 C R4
  * + T3 T2 D              ADD R3 R2 R6; ST R6 D
Downsides
• Consider the two highlighted operations, + T1 T2 C and + T3 T2 D: the second could reuse the result of the first

    Three address code     Generated code
    + A B T1               ADD A B R1
    + T1 C T2              ADD R1 C R2
    + A B T3               MOV R1 R3
    + T1 T2 C              ADD R1 R2 R5; ST R5 C
    + T1 C T4              ADD R1 C R4
    + T3 T2 D              ST R5 D

• T3 is a copy of T1, so T3 + T2 has the same value as T1 + T2, which is already in R5; names based on the lexical representation miss this
• This can be handled by an optimization called value numbering, which we will not cover now (although we may get to it later)

Aliasing
• One of the biggest problems in compiler analysis is to recognize aliases: different names for the same location in memory
• Aliases can occur for many reasons
  • Pointers referring to same location, arrays referencing the same element, function calls passing the same reference in two arguments, explicit storage overlapping (unions)
• Upshot: when talking about “live” and “killed” values in optimizations like CSE, we’re talking about particular variable names
• In the presence of aliasing, we may not know which variables get killed when a location is written to

Memory disambiguation

Register allocation
• Simple code generation: use a register for each temporary, load from a variable on each read, store
