The Design of an Optimizing Compiler

Wulf, Johnsson, Weinstock, Hobbs, and Geschke

and

The Production-Quality Compiler-Compiler Project

Leverett, Cattell, Hobbs, Newcomer, Reiner, Schatz, Wulf

Bliss/11

• an outstanding optimizing compiler for the PDP-11
• a fairly simple systems programming language
• relied heavily on special-case knowledge
• strong emphasis on code selection & limited resources

Comp 12, Spring 1996, Lecture 5

Structure of the Bliss/11 Compiler

lex -> syn -> flo -> delay -> tla -> rank -> pack -> code -> final

The phases

lex-syn-flo — scanning, parsing, data-flow analysis
delay — determines shape & evaluation order, estimates costs
tla, rank, pack — register allocation (memory-memory ops)
code — tries to generate optimal code for each expression
final — produce object code and perform peephole optimizations

lex-syn-flo

lex-syn

• hand-constructed DFA finds lexemes & builds tables
• top-down, recursive descent parser on simplified language (operator language w/o left recursion)
• uses an explicit, auxiliary stack to construct tree
• local constant folding in constructors
• retains extra information to improve error recovery
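The second and third bullets can be combined into a small sketch. This is not Bliss/11's code: it is a minimal stack-driven tree constructor for a fully parenthesized operator language, with constant folding done in the node constructor, as the slide describes.

```python
# Sketch (assumed, not Bliss/11's actual routines): build an expression
# tree with an explicit auxiliary stack, folding constant subtrees in
# the constructor.
import operator

OPS = {'+': operator.add, '*': operator.mul}

def make_node(op, left, right):
    # local constant folding in the constructor
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)

def parse(tokens):
    """Parse a fully parenthesized expression, e.g. ['(', 2, '+', 3, ')']."""
    stack = []                          # explicit auxiliary stack
    for tok in tokens:
        if tok == ')':
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(make_node(op, left, right))
        elif tok != '(':
            stack.append(tok)
    return stack.pop()

print(parse(['(', 2, '+', 3, ')']))                       # folds to 5
print(parse(['(', 'x', '*', '(', 2, '+', 3, ')', ')']))   # ('*', 'x', 5)
```

The constant subtree (2 + 3) never reaches the tree; it is folded at construction time.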

flo

• flow analysis to identify optimization opportunities
• finds common subexpressions and threads them together
• finds possible code motion (hoisting, licm) & threads them
• works inside out & incrementally (action routines)

Output is set of tables, trees, and threaded opportunities
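One way to picture the threading of common subexpressions (a hedged sketch, not flo's actual tables): visit each tree bottom-up and chain every occurrence of the same subexpression onto a shared list, so later phases can reach all occurrences at once.

```python
# Illustrative sketch: find common subexpressions across a list of
# expression trees and thread their occurrences together by path.
def thread_cses(exprs):
    seen = {}          # subexpression -> list of occurrence paths
    def visit(e, path):
        if isinstance(e, tuple):            # interior node: (op, left, right)
            visit(e[1], path + (1,))
            visit(e[2], path + (2,))
            seen.setdefault(e, []).append(path)
    for i, e in enumerate(exprs):
        visit(e, (i,))
    # keep only the genuinely common ones
    return {e: paths for e, paths in seen.items() if len(paths) > 1}

prog = [('+', ('*', 'a', 'b'), 'c'),
        ('-', ('*', 'a', 'b'), 'd')]
print(thread_cses(prog))   # ('*', 'a', 'b') occurs at paths (0, 1) and (1, 1)
```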

delay

Goals

1. determine the “general shape” of the object code
2. estimate cost of each “program segment”
3. determine an evaluation order for each program segment

Engineering

• single treewalk using recursive action routines
  – makes preliminary decisions and passes them down tree
  – on return, uses returned context to make final decisions
• decides on profitability of potential optimizations
• uses Sethi-Ullman modified to track cse’s created & deleted
• delays unary complements; chooses destroyed operands
• converts multiplies into shifts and adds
• distributes multiply over adds
• more constant folding
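For reference, the unmodified Sethi-Ullman labeling that delay builds on (Bliss/11's variant additionally tracks cse's created and deleted, which is omitted here): a node's register need is the larger child's need if the children differ, and one more if they tie, and the costlier subtree is evaluated first.

```python
# Classic Sethi-Ullman labeling: minimum registers to evaluate a binary
# expression tree without spilling (the basis of delay's cost estimates).
def su_label(e):
    if not isinstance(e, tuple):        # leaf: a variable or constant
        return 1
    _, left, right = e
    l, r = su_label(left), su_label(right)
    # equal needs: one child's result must be held while the other runs
    return max(l, r) if l != r else l + 1

print(su_label(('+', ('*', 'a', 'b'), ('*', 'c', 'd'))))  # 3
print(su_label(('+', ('*', 'a', 'b'), 'c')))              # 2
```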

tla (1st part of TNBIND)

Structure

• mutually recursive action routines pass around TNs
• performs targeting, cost estimation, lifetime characterization, runtime stack optimization, and label assignment

Engineering

• targeting — parents suggest names for child’s return value
• costs — sum over refs of code size, weighted by nesting depth
• stack — use “dynamic temporaries” from parameter “pops” (an artifact of PDP-11 idiosyncrasies)
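Targeting can be sketched as follows (an illustrative policy, not tla's actual rules): during the walk, the parent passes down its own result TN as a suggestion for the child, so the child's value lands where the parent wants it and no move is needed.

```python
# Hedged sketch of targeting with temporary names (TNs).
class TnPool:
    """Generates fresh temporary names."""
    def __init__(self):
        self.n = 0
    def fresh(self):
        self.n += 1
        return f'tn{self.n}'

def assign_tns(e, target, pool, out):
    """Walk tree e; 'target' is the TN the parent suggests for our result."""
    if not isinstance(e, tuple):
        return e                        # leaf: a variable names its own home
    op, left, right = e
    tn = target if target is not None else pool.fresh()
    # suggest our own TN to the left child so its result needs no move
    l = assign_tns(left, tn, pool, out)
    r = assign_tns(right, None, pool, out)
    out.append(f'{tn} := {op}({l}, {r})')
    return tn

code = []
assign_tns(('+', ('*', 'a', 'b'), 'c'), 'result', TnPool(), code)
print(code)   # ['result := *(a, b)', 'result := +(result, c)']
```

The multiply computes directly into `result`, so the add needs no extra copy.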

rank (2nd part of TNBIND)

Goals

• assign priorities (ranks) to each TN for allocation
• costs by number of uses, with arithmetic preference to dense uses

Engineering

Divide references into four categories based on binding requirements:

1. must be bound to a specific hardware register
2. must be bound to a register
3. must be bound to a memory location
4. may be bound to memory or a register

Bliss/11 allows programmer to specify 1–3 by declaration
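A hedged sketch of the ranking idea: each TN's priority grows with its number of uses, weighted by loop-nesting depth (the weight 8**depth is an illustrative choice, not the papers' formula).

```python
# Illustrative ranking: TNs used more often, and in deeper nests,
# get registers first.
def rank_tns(refs):
    """refs: list of (tn_name, nesting_depth) reference records."""
    priority = {}
    for tn, depth in refs:
        priority[tn] = priority.get(tn, 0) + 8 ** depth
    # highest-priority TNs are allocated first
    return sorted(priority, key=priority.get, reverse=True)

refs = [('t1', 0), ('t1', 0),      # two uses at top level
        ('t2', 2),                 # one use, doubly nested
        ('t3', 1), ('t3', 1)]      # two uses, singly nested
print(rank_tns(refs))              # ['t2', 't3', 't1']
```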

pack (3rd part of TNBIND)

Goals

• make good use of the machine’s register set

• assign TNs to the same register when possible & profitable

Engineering

• build an interference graph
• build a preference graph
• interference graph drives allocation
• preference graph drives assignment
• uses bin packing to find an allocation (equivalent to graph coloring, different terminology)

pack tries to minimize memory ops, moves, and initialization
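The bin-packing view can be sketched as follows (an illustrative reconstruction, not pack's algorithm): registers are bins, a TN may share a bin only with TNs it does not interfere with, and a preference edge pulls two TNs toward the same bin, which kills the move between them.

```python
# Hedged sketch of bin-packing register allocation driven by an
# interference graph (legality) and a preference graph (assignment).
def pack(tns, interferes, prefers, nregs):
    """tns: names in priority order; interferes/prefers: sets of TN pairs."""
    assign = {}
    def conflicts(tn, reg):
        return any(frozenset((tn, other)) in interferes
                   for other, r in assign.items() if r == reg)
    for tn in tns:
        regs = list(range(nregs))
        # try a preferred partner's register first (preference drives assignment)
        for other, r in assign.items():
            if frozenset((tn, other)) in prefers and r in regs:
                regs.remove(r)
                regs.insert(0, r)
        for reg in regs:
            if not conflicts(tn, reg):      # interference drives allocation
                assign[tn] = reg
                break
        else:
            assign[tn] = 'memory'           # no bin fits: spill to memory
    return assign

interferes = {frozenset(p) for p in [('t1', 't2'), ('t2', 't3')]}
prefers = {frozenset(('t1', 't3'))}         # a move t1 -> t3 we want to kill
print(pack(['t1', 't2', 't3'], interferes, prefers, nregs=2))
# {'t1': 0, 't2': 1, 't3': 0}
```

Because t1 and t3 do not interfere, the preference edge lets them share register 0 and the move disappears.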

TNBIND lives on (in spirit) in DEC

code

Goals

• capitalize on all the earlier work • perform target-specific case analysis • generate outstanding code

Engineering

• myriad small details
• original compiler used massive case analysis

• PQCC substituted automatic pattern matching techniques

• PQCC generated patterns from machine descriptions

Strong arguments for automating this kind of detailed case analysis
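The flavor of pattern-driven selection can be sketched as a table walk (these PDP-11-flavored patterns are illustrative; PQCC derived its patterns from machine descriptions, not by hand):

```python
# Illustrative sketch: instruction selection by matching tree patterns
# against an ordered table, most specific pattern first.
PATTERNS = [
    # '_' matches any operand subtree
    (('+', '_', 1),   lambda d, l, r: [f'MOV {l},{d}', f'INC {d}']),
    (('+', '_', '_'), lambda d, l, r: [f'MOV {l},{d}', f'ADD {r},{d}']),
]

def match(pat, e):
    if pat == '_':
        return True
    if isinstance(pat, tuple):
        return (isinstance(e, tuple) and len(pat) == len(e)
                and all(match(p, s) for p, s in zip(pat, e)))
    return pat == e

def select(e, dest):
    for pat, emit in PATTERNS:          # first match wins
        if match(pat, e):
            return emit(dest, e[1], e[2])
    raise ValueError('no pattern matches')

print(select(('+', 'x', 1), 'R0'))     # ['MOV x,R0', 'INC R0']
print(select(('+', 'x', 'y'), 'R0'))   # ['MOV x,R0', 'ADD y,R0']
```

Adding a machine-specific special case means adding a table entry, not another arm of a hand-written case analysis.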

“In the final analysis, the quality of the local code has a greater impact on both the size and speed of the final program than any other optimization.” (Wulf et al., p. 89)

final

Goals

• produce the final “listing” and object code • perform some final optimizations on results of code

Engineering

• phase 1 examines adjacent instructions (lfp-style)
  – cross-jumping
  – removing code after unconditional branch
  – simplify algebraic identities involving literals
  – simplify tests and compares
  – address mode manipulations (PDP-11 specific)
  – redundant store elimination
• phase 2 looks at each instruction for simplifications
• phase 3 tries to use short branches whenever possible
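Two of the phase-1 rewrites can be sketched as a pass over an adjacent-instruction window (a hedged sketch with an assumed tuple encoding of instructions, not final's representation):

```python
# Illustrative peephole pass: drop a reload that immediately follows a
# store to the same location, and delete unreachable code after an
# unconditional branch (up to the next label).
def peephole(code):
    out, i = [], 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if nxt and ins[0] == 'MOV' and nxt == ('MOV', ins[2], ins[1]):
            out.append(ins)             # keep the store, drop the reload
            i += 2
        elif ins[0] == 'BR':
            out.append(ins)
            i += 1
            while i < len(code) and code[i][0] != 'LABEL':
                i += 1                  # skip unreachable instructions
        else:
            out.append(ins)
            i += 1
    return out

prog = [('MOV', 'R0', 'x'), ('MOV', 'x', 'R0'),
        ('BR', 'L1'), ('ADD', 'R1', 'R0'), ('LABEL', 'L1')]
print(peephole(prog))
# [('MOV', 'R0', 'x'), ('BR', 'L1'), ('LABEL', 'L1')]
```

A real pass would iterate to a fixed point, since one rewrite can expose another.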

“It is difficult to determine to what extent the final optimizations would be needed if more complete algorithms, rather than heuristics, existed in earlier phases of the compiler. . . . We suspect that there will always be a role for a module similar to final in compilers with optimization aspirations similar to those of Bliss/11.” (Wulf et al., p. 125)
