Optimising Compilers — Computer Science Tripos Part II, Lent 2007 — Tom Stuart

Lecture 1: Introduction

A non-optimising compiler:

character stream → (lexing) → token stream → (parsing) → parse tree → (translation) → intermediate code → (code generation) → target code

An optimising compiler follows the same pipeline, but additionally applies optimisation to the parse tree, to the intermediate code and to the target code.

Optimisation (really “amelioration”!)

Good humans write simple, maintainable, general code. Compilers should then remove unused generality, and hence hopefully make the code:
• Smaller
• Faster
• Cheaper (e.g. lower power consumption)

Optimisation = Analysis + Transformation

Analysis + Transformation

• Transformation does something dangerous.
• Analysis determines whether it’s safe.

• An analysis shows that your program has some property...
• ...and the transformation is designed to be safe for all programs with that property...
• ...so it’s safe to do the transformation.

Analysis + Transformation

int main(void)
{
  return 42;
}

int f(int x)
{
  return x * 2;
}
✓

Analysis + Transformation

int main(void)
{
  return f(21);
}

int f(int x)
{
  return x * 2;
}
✗

Analysis + Transformation

while (i <= k*2) {          int t = k * 2;
  j = j * i;                while (i <= t) {
  i = i + 1;                  j = j * i;
}                             i = i + 1;
                            }
✓

Analysis + Transformation

while (i <= k*2) {          int t = k * 2;
  k = k - i;                while (i <= t) {
  i = i + 1;                  k = k - i;
}                             i = i + 1;
                            }
✗

Stack-oriented code         3-address code

iload 0                     MOV t32,arg1
iload 1                     MOV t33,arg2
iadd                        ADD t34,t32,t33
iload 2                     MOV t35,arg3
iload 3                     MOV t36,arg4
iadd                        ADD t37,t35,t36
imul                        MUL res1,t34,t37
ireturn                     EXIT

C into 3-address code

int fact (int n) {                  ENTRY fact
  if (n == 0) {                     MOV t32,arg1
    return 1;                       CMPEQ t32,#0,lab1
  } else {                          SUB arg1,t32,#1
    return n * fact(n-1);           CALL fact
  }                                 MUL res1,t32,res1
}                                   EXIT
                              lab1: MOV res1,#1
                                    EXIT

Part A: Classical ‘Dataflow’ Optimisations

1 Introduction

Recall the structure of a simple non-optimising compiler (e.g. from CST Part Ib):

character stream → (lex) → token stream → (syn) → parse tree → (trn) → intermediate code → (gen) → target code

In such a compiler “intermediate code” is typically a stack-oriented abstract machine code (e.g. OCODE in the BCPL compiler or JVM for Java). Note that stages ‘lex’, ‘syn’ and ‘trn’ are in principle source language-dependent, but not target architecture-dependent, whereas stage ‘gen’ is target-dependent but not language-dependent.

To ease optimisation (really ‘amelioration’!) we need an intermediate code which makes inter-instruction dependencies explicit to ease moving computations around. Typically we use 3-address code (sometimes called ‘quadruples’). This is also near to modern RISC architectures and so facilitates the target-dependent stage ‘gen’. This intermediate code is stored in a flowgraph G — a graph whose nodes are labelled with 3-address instructions (or later ‘basic blocks’). We write

pred(n) = { n′ | (n′, n) ∈ edges(G) }
succ(n) = { n′ | (n, n′) ∈ edges(G) }

for the sets of predecessor and successor nodes of a given node; we assume common graph-theory notions like path and cycle.

Forms of 3-address instructions (a, b, c are operands, f is a procedure name, and lab is a label):

• ENTRY f: no predecessors;
• EXIT: no successors;
• ALU a, b, c: one successor (ADD, MUL, ...);
• CMP⟨cond⟩ a, b, lab: two successors (CMPNE, CMPEQ, ...) — in straight-line code these instructions take a label argument (and fall through to the next instruction if the branch doesn’t occur), whereas in a flowgraph they have two successor edges.

Multi-way branches (e.g. case) can be considered for this course as a cascade of CMP instructions. Procedure calls (CALL f) and indirect calls (CALLI a) are treated as atomic instructions like ALU a, b, c. Similarly one distinguishes MOV a, b instructions (a special case of ALU ignoring one operand) from indirect memory reference instructions (LDI a, b and STI a, b) used to represent pointer dereference, including accessing array elements. Indirect branches (used for local goto ⟨exp⟩) terminate a basic block (see later); their successors must include all the possible branch targets (see the description of Fortran ASSIGNED GOTO).

Flowgraphs

• A graph representation of a program
• Each node stores 3-address instruction(s)
• Each edge represents (potential) control flow

(Flowgraph for fact: ENTRY fact → MOV t32,arg1 → CMPEQ t32,#0 → either SUB arg1,t32,#1 → CALL fact → MUL res1,t32,res1 → EXIT, or MOV res1,#1 → EXIT.)

Basic blocks

A maximal sequence of instructions n1, ..., nk which have
• exactly one predecessor (except possibly for n1)
• exactly one successor (except possibly for nk)

Basic blocks

(The fact flowgraph grouped into basic blocks: {ENTRY fact; MOV t32,arg1; CMPEQ t32,#0}, {SUB arg1,t32,#1; CALL fact; MUL res1,t32,res1}, {MOV res1,#1}, {EXIT}.)

Basic blocks

A basic block doesn’t contain any interesting control flow.

Reduce time and space requirements for analysis algorithms by calculating and storing data-flow information once per block (and recomputing within a block if required) instead of once per instruction.

Basic blocks

MOV t32,arg1
MOV t33,arg2
ADD t34,t32,t33
MOV t35,arg3
MOV t36,arg4
ADD t37,t35,t36
MUL res1,t34,t37

Types of analysis (and hence optimisation)

Scope:
• Within basic blocks (“local” / “peephole”)
• Between basic blocks (“global” / “intra-procedural”)
  • e.g. available expressions
• Whole program (“inter-procedural”)
  • e.g. unreachable-procedure elimination

Type of information:
• Control flow
  • Discovering control structure (basic blocks, loops, calls between procedures)
• Data flow
  • Discovering data flow structure (variable uses, expression evaluation)

Peephole optimisation

replace   MOV x,y     with   MOV x,y
matches   MOV y,x

ADD t32,arg1,#1              ADD t32,arg1,#1
MOV r0,r1                    MOV r0,r1
MOV r1,r0             →      MUL t33,r0,t32
MUL t33,r0,t32
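To make the pattern-matching idea concrete, here is a minimal C sketch of just this one peephole rule — deleting the redundant second move in a “MOV x,y ; MOV y,x” pair. The struct Mov record (register numbers plus a deleted flag) is a hypothetical simplification for illustration, not the course’s actual instruction representation.

    #include <stdbool.h>

    /* Hypothetical register-to-register move: dst and src are register numbers. */
    struct Mov { int dst, src; bool deleted; };

    /* Peephole rule: "MOV x,y ; MOV y,x" -- the second MOV is redundant,
       because the value it would copy is already in place. */
    void peephole_redundant_mov(struct Mov *code, int n)
    {
        for (int i = 0; i + 1 < n; i++) {
            if (!code[i].deleted &&
                code[i + 1].dst == code[i].src &&
                code[i + 1].src == code[i].dst)
                code[i + 1].deleted = true;
        }
    }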

Finding basic blocks

1. Find all the instructions which are leaders:
• the first instruction is a leader;
• the target of any branch is a leader; and
• any instruction immediately following a branch is a leader.

2. For each leader, its basic block consists of itself and all instructions up to the next leader. (A short C sketch of this partitioning appears after the summary below.)

      ENTRY fact
      MOV t32,arg1
      CMPEQ t32,#0,lab1
      SUB arg1,t32,#1
      CALL fact
      MUL res1,t32,res1
      EXIT
lab1: MOV res1,#1
      EXIT

Summary
• Structure of an optimising compiler
• Why optimise?
• Optimisation = Analysis + Transformation
• 3-address code
• Flowgraphs
• Basic blocks
• Types of analysis
• Locating basic blocks
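As promised, a minimal C sketch of the leader-based partitioning. The struct Instr representation (an is_branch flag plus a branch-target index) is an assumed simplification; real 3-address code would carry more information.

    #include <stdbool.h>

    /* Hypothetical instruction record: a branch identifies its target by the
       target instruction's index in the instruction array. */
    struct Instr {
        bool is_branch;      /* CMPxx, JMP, ...                 */
        int  branch_target;  /* index of the target instruction */
    };

    /* leader[i] is set when instruction i starts a new basic block. */
    void find_leaders(const struct Instr *code, int n, bool *leader)
    {
        for (int i = 0; i < n; i++) leader[i] = false;
        if (n > 0) leader[0] = true;                   /* 1. first instruction   */
        for (int i = 0; i < n; i++) {
            if (code[i].is_branch) {
                leader[code[i].branch_target] = true;  /* 2. any branch target   */
                if (i + 1 < n) leader[i + 1] = true;   /* 3. instruction after a */
            }                                          /*    branch              */
        }
        /* Each basic block now runs from one leader up to (but excluding)
           the next leader. */
    }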

Lecture 2: Unreachable-code & -procedure elimination

Control-flow analysis: discovering information about how control (e.g. the program counter) may move through a program.

Intra-procedural analysis

Dead vs. unreachable code

An intra-procedural analysis collects information about the code inside a single procedure. We may repeat it many times (i.e. once per procedure), but information is only propagated within the boundaries of each procedure, not between procedures.

One example of an intra-procedural control-flow optimisation (an analysis and an accompanying transformation) is unreachable-code elimination.

int f(int x, int y) {
  int z = x * y;     DEAD
  return x + y;
}

Dead code computes unused values. (Waste of time.)

Dead vs. unreachable code

int f(int x, int y) {
  return x + y;
  int z = x * y;     UNREACHABLE
}

Unreachable code cannot possibly be executed. (Waste of space.)

Deadness is a data-flow property: “May this data ever arrive anywhere?”

Unreachability is a control-flow property: “May control ever arrive here?”

Safety of analysis

int f(int x, int y) {
  if (g(x)) {
    int z = x * y;   UNREACHABLE?
  }
  return x + y;
}

bool g(int x) {
  return false;
}
✓

int f(int x, int y) {
  if (g(x)) {
    int z = x * y;   UNREACHABLE?
  }
  return x + y;
}

bool g(int x) {
  return ...x...;
}

In general, this is undecidable. (Arithmetic is undecidable; cf. halting problem.)

Safety of analysis

• Many interesting properties of programs are undecidable and cannot be computed precisely...
• ...so they must be approximated.
• A broken program is much worse than an inefficient one...
• ...so we must err on the side of safety.

• If we decide that code is unreachable then we may do something dangerous (e.g. remove it!)...
• ...so the safe strategy is to overestimate reachability.
• If we can’t easily tell whether code is reachable, we just assume that it is. (This is conservative.)
• For example, we assume
  • that both branches of a conditional are reachable
  • and that loops always terminate.

Safety of analysis

Naïvely,

if (false) {
  int z = x * y;     this instruction is reachable,
}

while (true) {
  ...
}
int z = x * y;       and so is this one.

Another source of uncertainty is encountered when constructing the original flowgraph: the presence of indirect branches (also known as “computed jumps”).

With a direct branch the target is known:

      MOV t32,r1
      JMP lab1
      ⋯
lab1: ADD r0,r1,r2
      ⋯

With an indirect branch it is not:

      MOV t33,#&lab1
      MOV t34,#&lab2
      MOV t35,#&lab3
      ⋯
      MOV t32,r1
      JMPI t32
      ⋯
lab1: ADD r0,r1,r2
      ⋯
lab2: MUL r3,r4,r5
      ⋯
lab3: MOV r0,r1
      ⋯

In the worst-case scenario in which branch-address computations are completely unrestricted (i.e. the target of a jump could be absolutely anywhere), the presence of an indirect branch forces us to assume that all instructions are potentially reachable in order to guarantee safety. Again, this is a conservative overestimation of reachability.

Safety of analysis

(Diagram: within the set of all program instructions, the instructions marked “reachable” include all those that are sometimes executed, plus — the imprecision — some that are never executed.)

Safety of analysis

“Reachable”: safe but imprecise.

Unreachable code

This naïve reachability analysis is simplistic, but has the advantage of corresponding to a very straightforward operation on the flowgraph of a procedure:

1. mark the procedure’s entry node as reachable;
2. mark every successor of a marked node as reachable and repeat until no further marking is required.
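A minimal C sketch of this marking pass using a worklist; the successor-adjacency-list flowgraph representation and MAX_NODES bound are assumptions made for illustration only.

    #include <stdbool.h>

    #define MAX_NODES 1024

    /* succ[n] lists the successor node indices of node n; nsucc[n] says how many. */
    void mark_reachable(int entry, int num_nodes,
                        int succ[][MAX_NODES], const int nsucc[],
                        bool reachable[])
    {
        int worklist[MAX_NODES];
        int top = 0;

        for (int i = 0; i < num_nodes; i++) reachable[i] = false;

        reachable[entry] = true;          /* 1. mark the entry node          */
        worklist[top++] = entry;

        while (top > 0) {                 /* 2. mark successors of marked    */
            int n = worklist[--top];      /*    nodes until nothing changes  */
            for (int i = 0; i < nsucc[n]; i++) {
                int s = succ[n][i];
                if (!reachable[s]) {
                    reachable[s] = true;
                    worklist[top++] = s;
                }
            }
        }
        /* Nodes left unmarked may safely be deleted. */
    }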

Unreachable code

(Flowgraph diagrams: starting from ENTRY f, every node reachable from the entry is marked; any nodes left unmarked — and their edges — can be deleted, leaving a smaller flowgraph that still ends at EXIT.)

Unreachable code

Programmers rarely write code which is completely unreachable in this naïve sense. Why bother with this analysis?

• Naïvely unreachable code may be introduced as a result of other optimising transformations.
• With a little more effort, we can do a better job.

Obviously, if the conditional expression in an if statement is literally the constant “false”, it’s safe to assume that the statements within are unreachable.

if (false) {
  int z = x * y;     UNREACHABLE
}

But programmers never write code like that either.

Unreachable code

However, other optimisations might produce such code. For example, copy propagation:

bool debug = false;          bool debug = false;
⋯                            ⋯
if (debug) {                 if (false) {
  int z = x * y;               int z = x * y;     UNREACHABLE
}                            }

Unreachable code

We can try to spot (slightly) more subtle things too.

• if (!true) {... }
• if (false && ...) {... }
• if (x != x) {... }
• while (true) {... } ...
• ...

Note, however, that the reachability analysis no longer consists simply of checking whether any paths to an instruction exist in the flowgraph, but whether any of the paths to an instruction are actually executable. With more effort we may get arbitrarily clever at spotting non-executable paths in particular cases, but in general the undecidability of arithmetic means that we cannot always spot them all.

Unreachable code

Although unreachable-code elimination can only make a program smaller, it may enable other optimisations which make the program faster.

For example, straightening is an optimisation which can eliminate jumps between basic blocks by coalescing them:

(Flowgraph diagrams: once unreachable blocks have been removed, a block whose single successor has that block as its single predecessor can be coalesced with it. Straightening has removed a branch instruction, so the new program will execute faster.)

Inter-procedural analysis

Unreachable procedures

An inter-procedural analysis collects information about an entire program.

Information is collected from the instructions of each procedure and then propagated between procedures.

One example of an inter-procedural control-flow optimisation (an analysis and an accompanying transformation) is unreachable-procedure elimination.

Unreachable-procedure elimination is very similar in spirit to unreachable-code elimination, but relies on a different data structure known as a call graph.

Call graphs

(Call-graph diagram: node main with edges to the procedures it calls, e.g. f and g, which in turn call further procedures such as h, i and j.)

Again, the precision of the graph is compromised in the presence of indirect calls. And as before, this is a safe overestimation of reachability.

Call graphs Unreachable procedures

In general, we assume that a procedure containing an indirect call has all address-taken procedures as successors in the call graph — i.e., it could call any of them. This is obviously safe; it is also obviously imprecise. As before, it might be possible to do better by application of more careful methods (e.g. tracking data-flow of procedure variables).

The reachability analysis is virtually identical to that used in unreachable-code elimination, but this time operates on the call graph of the entire program (vs. the flowgraph of a single procedure):

1. mark procedure main as callable;
2. mark every successor of a marked node as callable and repeat until no further marking is required.

Unreachable procedures

(Call-graph diagrams: procedures reachable from main — e.g. f, g and h — are marked callable; unmarked procedures such as i and j can be deleted.)

Safety of transformations

• All instructions/procedures to which control may flow at execution time will definitely be marked by the reachability analyses...
• ...but not vice versa, since some marked nodes might never be executed.
• Both transformations will definitely not delete any instructions/procedures which are needed to execute the program...
• ...but they might also leave some unnecessary ones in.

If simplification

Empty then in if-then:

if (f(x)) {
}

The whole statement can be removed. (Assuming that f has no side effects.)

If simplification

Empty else in if-then-else:

if (f(x)) {
  z = x * y;
} else {
}

Empty then in if-then-else:

if (f(x)) {
} else {
  z = x * y;
}

If simplification

Empty then in if-then-else — the condition can be negated and the arms swapped:

if (!f(x)) {
  z = x * y;
}

Empty then and else in if-then-else:

if (f(x)) {
} else {
}

If simplification

Nested if with common subexpression:

if (x > 3 && t) {
  ⋯
  if (x > 3) {
    z = x * y;
  } else {
    z = y - x;
  }
}

(Here the inner condition x > 3 must be true, so the inner else-arm is unreachable.)

Constant condition:

if (true) {
  z = x * y;
}

Loop simplification

int x = 0;                 int x = 0;
int i = 0;                 int i = 0;
while (i < 4) {            i = i + 1;
  i = i + 1;               x = x + i;
  x = x + i;               i = i + 1;
}                          x = x + i;
                           i = i + 1;
                           x = x + i;
                           i = i + 1;
                           x = x + i;

Loop simplification

The unrolled loop can then be folded down to:

int x = 10;
int i = 4;

Summary

• Control-flow analysis operates on the control structure of a program (flowgraphs and call graphs)
• Unreachable-code elimination is an intra-procedural optimisation which reduces code size
• Unreachable-procedure elimination is a similar, inter-procedural optimisation making use of the program’s call graph
• Analyses for both optimisations must be imprecise in order to guarantee safety

Data-flow analysis

Lecture 3: Live variable analysis

MOV t32,arg1
MOV t33,arg2
ADD t34,t32,t33
MOV t35,arg3
MOV t36,arg4
ADD t37,t35,t36
MUL res1,t34,t37

Discovering information about how data (i.e. variables and their values) may move through a program.

Motivation Liveness

Programs may contain
• code which gets executed but which has no useful effect on the program’s overall result;
• occurrences of variables being used before they are defined; and
• many variables which need to be allocated registers and/or memory locations for compilation.

The concept of variable liveness is useful in dealing with all three of these situations.

Liveness is a data-flow property of variables: “Is the value of this variable needed?” (cf. dead code)

int f(int x, int y) {
  int z = x * y;
  ⋯

Liveness

At each instruction, each variable in the program is either live or dead.

We therefore usually consider liveness from an instruction’s perspective: each instruction (or node of the flowgraph) has an associated set of live variables.

⋯
n: int z = x * y;        live(n) = { s, t, x, y }
   return s + t;

There are two kinds of variable liveness:
• Semantic liveness
• Syntactic liveness

Semantic vs. syntactic

A variable x is semantically live at a node n if there is some execution sequence starting at n whose (externally observable) behaviour can be affected by changing the value of x.

int x = y * z;     x LIVE      int x = y * z;     x DEAD
⋯                              ⋯
return x;                      x = a + b;
                               ⋯
                               return x;

Semantic vs. syntactic

Semantic liveness is concerned with the execution behaviour of the program. This is undecidable in general. (e.g. Control flow may depend upon arithmetic.)

A variable is syntactically live at a node if there is a path to the exit of the flowgraph along which its value may be used before it is redefined. Syntactic liveness is concerned with properties of the syntactic structure of the program. Of course, this is decidable.

So what’s the difference?

Semantic vs. syntactic

int t = x * y;                 t DEAD
if ((x+1)*(x+1) == y) {
  t = 1;
}
if (x*x + 2*x + 1 != y) {
  t = 2;
}
return t;

Semantically: one of the conditions will be true, so on every execution path t is redefined before it is returned. The value assigned by the first instruction is never used.

      MUL t,x,y                t LIVE
      ADD t32,x,#1
      MUL t33,t32,t32
      CMPNE t33,y,lab1
      MOV t,#1
lab1: MUL t34,x,x
      MUL t35,x,#2
      ADD t36,t34,t35
      ADD t37,t36,#1
      CMPEQ t37,y,lab2
      MOV t,#2
lab2: MOV res1,t

On the path through the flowgraph which takes both branches (skipping MOV t,#1 and MOV t,#2), t is not redefined before it’s used, so t is syntactically live at the first instruction. Note that this path never actually occurs during execution.

So, as we’ve seen before, syntactic liveness is a computable approximation of semantic liveness.

2 Live Variable Analysis—LVA

A variable x is semantically live at node n if there is some execution sequence starting at n whose I/O behaviour can be affected by changing the value of x. A variable x is syntactically live at node n if there is a path in the flowgraph to a node n′ at which the current value of x may be used (i.e. a path from n to n′ which contains no definition of x and with n′ containing a reference to x). Note that such a path may not actually occur during any execution, e.g.

l1: ; /* is ’t’ live here? */
    if ((x+1)*(x+1) == y) t = 1;
    if (x*x+2*x+1 != y) t = 2;
l2: print t;

Because of the optimisations we will later base on the results of LVA, safety consists of over-estimating liveness, i.e.

sem-live(n) ⊆ syn-live(n)

where live(n) is the set of variables live at n. Logicians might note the connection of semantic liveness and ⊨, and also syntactic liveness and ⊢.

From the non-algorithmic definition of syntactic liveness we can obtain dataflow equations:

live(n) = ( ( ∪_{s ∈ succ(n)} live(s) ) ∖ def(n) ) ∪ ref(n)

You might prefer to derive these in two stages, writing in-live(n) for variables live on entry to node n and out-live(n) for those live on exit. This gives

in-live(n) = (out-live(n) ∖ def(n)) ∪ ref(n)
out-live(n) = ∪_{s ∈ succ(n)} in-live(s)

Here def(n) is the set of variables defined at node n, i.e. { x } in the instruction x = x+y, and ref(n) the set of variables referenced at node n, i.e. { x, y }.

Notes:
• These are ‘backwards’ flow equations: liveness depends on the future whereas normal execution flow depends on the past;
• Any solution of these dataflow equations is safe (w.r.t. semantic liveness).
• Problems with address-taken variables—consider:

  int x,y,z,t,*p;
  x = 1, y = 2, z = 3;
  p = &y;
  if (...) p = &y;
  *p = 7;
  if (...) p = &x;
  t = *p;
  print z+t;

(Diagram: within the set of program variables, the set syntactically live at n contains the set semantically live at n; the variables syntactically live but semantically dead at n are the imprecision.)

Semantic vs. syntactic

Using syntactic methods, we safely overestimate liveness.

Live variable analysis

LVA is a backwards data-flow analysis: usage information from future instructions must be propagated backwards through the program to discover which variables are live. Variable liveness flows (backwards) through the program in a continuous stream, and each instruction has an effect on the liveness information as it flows past.

int f(int x, int y) {
  int z = x * y;
  ⋯
  print z;
  int a = z*2;
  if (z > 5) {
  ⋯
}

Live variable analysis

An instruction makes a variable live when it references (uses) it:

{ b, c, e, f }
a = b * c;       REFERENCE b, c
{ e, f }
d = e + 1;       REFERENCE e
{ f }
print f;         REFERENCE f
{ }

An instruction makes a variable dead when it defines (assigns to) it:

{ }
a = 7;           DEFINE a
{ a }
b = 11;          DEFINE b
{ a, b }
c = 13;          DEFINE c
{ a, b, c }

We can devise functions ref(n) and def(n) which give the sets of variables referenced and defined by the instruction at node n:

ref( x = 3 ) = { }              def( x = 3 ) = { x }
ref( print x ) = { x }          def( print x ) = { }
ref( x = x + y ) = { x, y }     def( x = x + y ) = { x }

Live variable analysis

As liveness flows backwards past an instruction, we want to modify the liveness information by adding any variables which it references (they become live) and removing any which it defines (they become dead). If an instruction both references and defines variables, we must remove the defined variables before adding the referenced ones.

{ x, y }                               { z }
print x    ref( print x ) = { x }      x = 3    def( x = 3 ) = { x }
{ y }                                  { x, z }

Live variable analysis

So, if we consider in-live(n) and out-live(n), the sets of variables which are live immediately before and immediately after a node, the following equation must hold:

in-live(n) = (out-live(n) ∖ def(n)) ∪ ref(n)

For example, with n: x = x + y, def(n) = { x }, ref(n) = { x, y } and out-live(n) = { x, z }:

in-live(n) = (out-live(n) ∖ def(n)) ∪ ref(n)
           = ({ x, z } ∖ { x }) ∪ { x, y }
           = { z } ∪ { x, y }
           = { x, y, z }

Live variable analysis

So we know how to calculate in-live(n) from the values of def(n), ref(n) and out-live(n). But how do we calculate out-live(n)? In straight-line code each node has a unique successor, and the variables live at the exit of a ∪ in-live(n) = (out-live(n) ∖ def(n)) ref(n) node are exactly those variables live at the entry of its successor. n: x = x + y

out-live(n) = ?

Live variable analysis Live variable analysis l: out-live(l) = { s, t, x, y } in-live(m) = { s, t, x, y } m: z = x * y; In general, however, each node has an arbitrary out-live(m) = { s, t, z } number of successors, and the variables live at in-live(n) = { s, t, z } the exit of a node are exactly those variables n: print s + t; live at the entry of any of its successors. out-live(n) = { z } in-live(o) = { z } o: Live variable analysis Live variable analysis m: x = 17; { x, z } { x, z } n: y = 19; { x, z } ∪ { y, z } So the following equation must also hold: { x, y, z } = { x, y, z } out-live(n) = ! in-live(s) { x, z } { y, z } s∈succ(n) o: s = x * 2; p: t = y + 1; { s, z } { t, z } { s, z } { t, z } Data-flow equations Data-flow equations

These are the data-flow equations for live variable analysis, and together they tell us everything we need to know about how to propagate liveness information through a program. Each is expressed in terms of the other, so we can combine them to create one overall liveness equation.

in-live(n) = (out-live(n) ∖ def(n)) ∪ ref(n)
out-live(n) = ∪_{s ∈ succ(n)} in-live(s)

live(n) = ( ( ∪_{s ∈ succ(n)} live(s) ) ∖ def(n) ) ∪ ref(n)

Algorithm

We now have a formal description of liveness, but we need an actual algorithm in order to do the analysis. “Doing the analysis” consists of computing a value live(n) for each node n in a flowgraph such that the liveness data-flow equations are satisfied. A simple way to solve the data-flow equations is to adopt an iterative strategy.

Algorithm

(Worked example: iterating the liveness equation over a small flowgraph with nodes “def x, y”, “def z”, “ref x”, “ref y” and “ref z”; the first pass (✗) does not yet satisfy the equations at every node, but further iteration reaches a fixed point (✓).)

Algorithm

Here we are unsure whether the assignment *p = 7; assigns to x or y. Similarly we are uncertain whether the reference t = *p; references x or y (but we are certain that both reference p). These are ambiguous definitions and references. For safety we treat (for LVA) an ambiguous reference as referencing any address-taken variable (cf. label variables and procedure variables—an indirect reference is just a ‘variable’ variable). Similarly an ambiguous definition is just ignored. Hence in the above, for *p = 7; we have ref = { p } and def = { }, whereas t = *p; has ref = { p, x, y } and def = { t }.

Algorithm (implement live as an array live[]):

for i=1 to N do live[i] := {}
while (live[] changes) do
  for i=1 to N do
    live[i] := ( ( ∪_{s ∈ succ(i)} live[s] ) ∖ def(i) ) ∪ ref(i)

Clearly if the algorithm terminates then it results in a solution of the dataflow equations. Actually the theory of complete partial orders (cpo’s) means that it always terminates with the least solution, the one with as few variables as possible live consistent with safety. (The powerset of the set of variables used in the program is a finite lattice and the map from old-liveness to new-liveness in the loop is continuous.)

This algorithm is guaranteed to terminate since there are a finite number of variables in each program and the effect of one iteration is monotonic. Furthermore, although any solution to the data-flow equations is safe, this algorithm is guaranteed to give the smallest (and therefore most precise) solution. (See the Knaster–Tarski theorem if you’re interested.)

Implementation notes:
• we can implement the live[] array as a bit vector, using bit k being set to represent that variable xk (according to a given numbering scheme) is live; if the program has n variables, each element of live[] is then an n-bit value.
• we can speed execution and reduce store consumption by storing liveness information only once per basic block and re-computing within a basic block if needed (typically only during the use of LVA to validate a transformation). In this case the dataflow equations become

live(n) = ( ( ∪_{s ∈ succ(n)} live(s) ) ∖ def(ik) ∪ ref(ik) · · · ) ∖ def(i1) ∪ ref(i1)

where (i1, . . . , ik) are the instructions in basic block n.

Safety of analysis
• Syntactic liveness safely overapproximates semantic liveness.
• The usual problem occurs in the presence of address-taken variables (cf. labels, procedures): ambiguous definitions and references. For safety we must
  • overestimate ambiguous references (assume all address-taken variables are referenced) and
  • underestimate ambiguous definitions (assume no variables are defined); this increases the size of the smallest solution.
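To make the live[] iteration above concrete, here is a C sketch under the simplifying assumptions that a variable set fits in one 64-bit word (so at most 64 variables) and that def, ref and the successor lists have been precomputed per node; the names and bounds are illustrative, not the course’s actual data structures.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_NODES 1024

    /* live[i], def[i], ref[i] are bit vectors: bit k set means variable k
       is in the set.  succ[i]/nsucc[i] give node i's successors. */
    void lva(int num_nodes,
             const uint64_t def[], const uint64_t ref[],
             int succ[][MAX_NODES], const int nsucc[],
             uint64_t live[])
    {
        for (int i = 0; i < num_nodes; i++) live[i] = 0;   /* live[i] := {} */

        bool changed = true;
        while (changed) {                                  /* while live[] changes */
            changed = false;
            for (int i = 0; i < num_nodes; i++) {
                uint64_t out = 0;
                for (int j = 0; j < nsucc[i]; j++)
                    out |= live[succ[i][j]];               /* union over successors */
                uint64_t in = (out & ~def[i]) | ref[i];    /* (out \ def) ∪ ref     */
                if (in != live[i]) { live[i] = in; changed = true; }
            }
        }
    }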

3 Available expressions

Available expressions analysis (AVAIL) has many similarities to LVA. An expression e (typically the RHS of a 3-address instruction) is available at node n if on every path leading to n the expression e has been evaluated and not invalidated by an intervening assignment to a variable occurring in e. This leads to dataflow equations:

avail(n) = ∩_{p ∈ pred(n)} ((avail(p) ∖ kill(p)) ∪ gen(p))   if pred(n) ≠ { }
avail(n) = { }                                               if pred(n) = { }

Here gen(n) gives the expressions freshly computed at n: gen(x = y+z) = { y+z }, for example; but gen(x = x+z) = { } because, although this instruction does compute x + z, it then invalidates it by redefining x.

Safety of analysis

MOV x,#1
MOV y,#2
MOV z,#3
MOV t32,#&x
MOV t33,#&y
MOV t34,#&z
⋯
m: STI t35,#7         def(m) = { }        ref(m) = { t35 }
⋯
n: LDI t36,t37        def(n) = { t36 }    ref(n) = { t37, x, y, z }

Summary

• Data-flow analysis collects information about how data moves through a program
• Variable liveness is a data-flow property
• Live variable analysis (LVA) is a backwards data-flow analysis for determining variable liveness
• LVA may be expressed as a pair of complementary data-flow equations, which can be combined
• A simple iterative algorithm can be used to find the smallest solution to the LVA data-flow equations

Motivation

Lecture 4: Available expression analysis

Programs may contain code whose result is needed, but in which some computation is simply a redundant repetition of earlier computation within the same program. The concept of expression availability is useful in dealing with this situation.

Expressions Availability

Any given program contains a finite number of expressions (i.e. computations which potentially produce values), so we may talk about the set of all expressions of a program.

int z = x * y;
print s + t;
int w = u / v;
⋯

This program contains the expressions { x*y, s+t, u/v, ... }.

Availability is a data-flow property of expressions: “Has the value of this expression already been computed?”

⋯
int z = x * y;

Availability

At each instruction, each expression in the program is either available or unavailable. We therefore usually consider availability from an instruction’s perspective: each instruction (or node of the flowgraph) has an associated set of available expressions.

   int z = x * y;
   print s + t;
n: int w = u / v;         avail(n) = { x*y, s+t }
   ⋯

So far, this is all familiar from live variable analysis. Note that, while expression availability and variable liveness share many similarities (both are simple data-flow properties), they do differ in important ways. By working through the low-level details of the availability property and its associated analysis we can see where the differences lie and get a feel for the capabilities of the general data-flow analysis framework.

Semantic vs. syntactic

For example, availability differs from earlier examples in a subtle but important way: we want to know which expressions are definitely available (i.e. have already been computed) at an instruction, not which ones may be available. As before, we should consider the distinction between semantic and syntactic (or, alternatively, dynamic and static) availability of expressions, and the details of the approximation which we hope to discover by analysis.

An expression is semantically available at a node n if its value gets computed (and not subsequently invalidated) along every execution sequence ending at n.

int x = y * z;
⋯
return y * z;          y*z AVAILABLE

Semantic vs. syntactic

int x = y * z;
⋯
y = a + b;
⋯
return y * z;          y*z UNAVAILABLE

An expression is syntactically available at a node n if its value gets computed (and not subsequently invalidated) along every path from the entry of the flowgraph to n. As before, semantic availability is concerned with the execution behaviour of the program, whereas syntactic availability is concerned with the program’s syntactic structure. And, as expected, only the latter is decidable.

Semantic vs. syntactic

if ((x+1)*(x+1) == y) {
  s = x + y;
}
if (x*x + 2*x + 1 != y) {
  t = x + y;
}
return x + y;                  x+y AVAILABLE

Semantically: one of the conditions will be true, so on every execution path x+y is computed twice. The recomputation of x+y is redundant.

      ADD t32,x,#1
      MUL t33,t32,t32
      CMPNE t33,y,lab1
      ADD s,x,y
lab1: MUL t34,x,x
      MUL t35,x,#2
      ADD t36,t34,t35
      ADD t37,t36,#1
      CMPEQ t37,y,lab2
      ADD t,x,y
lab2: ADD res1,x,y              x+y UNAVAILABLE

On the path through the flowgraph which takes both branches (skipping ADD s,x,y and ADD t,x,y), x+y is only computed once, so x+y is syntactically unavailable at the last instruction. Note that this path never actually occurs during execution.

If an expression is deemed to be available, we may do something dangerous (e.g. remove an instruction which recomputes its value). Whereas with live variable analysis we found safety in assuming that more variables were live, here we find safety in assuming that fewer expressions are available.

Semantic vs. syntactic

(Diagram: within the set of all program expressions, the set syntactically available at n sits inside the set semantically available at n; the semantically available expressions that are not syntactically available are the imprecision, and the rest are semantically unavailable at n.)

This time, we safely underestimate availability:

sem-avail(n) ⊇ syn-avail(n)     (cf. sem-live(n) ⊆ syn-live(n))

Warning

Danger: there is a standard presentation of available expression analysis (textbooks, notes for this course) which is formally satisfying but contains an easily-overlooked subtlety. We’ll first look at an equivalent, more intuitive bottom-up presentation, then amend it slightly to match the version given in the literature.

Available expression analysis

Available expressions is a forwards data-flow analysis: information from past instructions must be propagated forwards through the program to discover which expressions are available. Unlike variable liveness, expression availability flows forwards through the program. As in liveness, though, each instruction has an effect on the availability information as it flows past.

t = x * y;
print x * y;
if (x*y > 0) {
  ⋯
  int z = x * y;
}

Available expression analysis

An instruction makes an expression available when it generates (computes) its current value:

{ }
print a*b;       GENERATE a*b
{ a*b }
c = d + 1;       GENERATE d+1
{ a*b, d+1 }
e = f / g;       GENERATE f/g
{ a*b, d+1, f/g }

An instruction makes an expression unavailable when it kills (invalidates) its current value:

{ a*b, c+1, d/e, d-1 }
a = 7;           KILL a*b
{ c+1, d/e, d-1 }
c = 11;          KILL c+1
{ d/e, d-1 }
d = 13;          KILL d/e, d-1
{ }

Available expression analysis

As in LVA, we can devise functions gen(n) and kill(n) which give the sets of expressions generated and killed by the instruction at node n. The situation is slightly more complicated this time: an assignment to a variable x kills all expressions in the program which contain occurrences of x. In the following, Ex is the set of expressions in the program which contain occurrences of x.

gen( x = 3 ) = { }                kill( x = 3 ) = Ex
gen( print x+1 ) = { x+1 }        kill( print x+1 ) = { }
gen( x = x + y ) = { x+y }        kill( x = x + y ) = Ex

Available expression analysis

As availability flows forwards past an instruction, we want to modify the availability information by adding any expressions which it generates (they become available) and removing any which it kills (they become unavailable). If an instruction both generates and kills expressions, we must remove the killed expressions after adding the generated ones (cf. removing def(n) before adding ref(n)).

{ y+1 }                                    { x+1, y+1 }
print x+1   gen( print x+1 ) = { x+1 }     x = 3   kill( x = 3 ) = Ex
{ x+1, y+1 }                               { y+1 }

{ x+1, y+1 }
x = x + y   gen( x = x + y ) = { x+y },  kill( x = x + y ) = Ex
{ y+1 }

Available expression analysis

So, if we consider in-avail(n) and out-avail(n), the sets of expressions which are available immediately before and immediately after a node, the following equation must hold:

out-avail(n) = (in-avail(n) ∪ gen(n)) ∖ kill(n)

For example, with n: x = x + y, gen(n) = { x+y }, kill(n) = { x+1, x+y } and in-avail(n) = { x+1, y+1 }:

out-avail(n) = (in-avail(n) ∪ gen(n)) ∖ kill(n)
             = ({ x+1, y+1 } ∪ { x+y }) ∖ { x+1, x+y }
             = { x+1, x+y, y+1 } ∖ { x+1, x+y }
             = { y+1 }

Available expression analysis

As in LVA, we have devised one equation for calculating out-avail(n) from the values of gen(n), kill(n) and in-avail(n), and now need another for calculating in-avail(n).

When a node n has a single predecessor m, the information propagates along the control-flow edge as you would expect: in-avail(n) = out-avail(m). When a node has multiple predecessors, the expressions available at the entry of that node are exactly those expressions available at the exit of all of its predecessors (cf. “any of its successors” in LVA).

Available expression analysis Available expression analysis { x+5 } { y-7 } { x+5 } { y-7 } m: z = x * y; n: print x*y; { x+5, x*y } { x*y, y-7 } So the following equation must also hold:

{ x+5, x*y } ∩ { x*y } in-avail(n) = ! out-avail(p) { x*y, y-7 } o: x = 11; p∈pred(n) = { x*y } { } { } p: y = 13;

Data-flow equations

These are the data-flow equations for available expression analysis, and together they tell us everything we need to know about how to propagate availability information through a program. Each is expressed in terms of the other, so we can combine them to create one overall availability equation.

in-avail(n) = ∩_{p ∈ pred(n)} out-avail(p)
out-avail(n) = (in-avail(n) ∪ gen(n)) ∖ kill(n)

avail(n) = ∩_{p ∈ pred(n)} ((avail(p) ∪ gen(p)) ∖ kill(p))

Data-flow equations

Danger: we have overlooked one important detail. At a node n with no predecessors (e.g. the program’s entry node), pred(n) = { } and the equation gives

avail(n) = ∩_{p ∈ pred(n)} ((avail(p) ∪ gen(p)) ∖ kill(p)) = ∩ { } = U   (i.e. all expressions in the program)

Clearly there should be no expressions available here, so we must stipulate explicitly that avail(n) = { } if pred(n) = { }. With this correction, our data-flow equation for expression availability is

avail(n) = ∩_{p ∈ pred(n)} ((avail(p) ∪ gen(p)) ∖ kill(p))   if pred(n) ≠ { }
avail(n) = { }                                               if pred(n) = { }

Data-flow equations

The functions and equations presented so far are correct, and their definitions are fairly intuitive. However, we may wish to have our data-flow equations in a form which more closely matches that of the LVA equations, since this emphasises the similarity between the two analyses and hence is how they are most often presented. A few modifications are necessary to achieve this.

Compare the two pairs of equations:

in-live(n) = (out-live(n) ∖ def(n)) ∪ ref(n)
out-live(n) = ∪_{s ∈ succ(n)} in-live(s)

in-avail(n) = ∩_{p ∈ pred(n)} out-avail(p)
out-avail(n) = (in-avail(n) ∪ gen(n)) ∖ kill(n)

Some differences — the direction of flow, and the union over successors vs. the intersection over predecessors — are inherent in the analyses.

Data-flow equations

Other differences — removing def(n) before adding ref(n) in LVA, but adding gen(n) before removing kill(n) in AVAIL — are an arbitrary result of our definitions. We might instead have decided to define gen(n) and kill(n) to coincide with the following (standard) definitions:

• A node generates an expression e if it must compute the value of e and does not subsequently redefine any of the variables occurring in e.
• A node kills an expression e if it may redefine some of the variables occurring in e and does not subsequently recompute the value of e.

Data-flow equations

By the old definition:
gen( x = x + y ) = { x+y }
kill( x = x + y ) = Ex

By the new definition:
gen( x = x + y ) = { }
kill( x = x + y ) = Ex

(The new kill(n) may visibly differ when n is a basic block.)

Since these new definitions take account of which expressions are generated overall by a node (and exclude those which are generated only to be immediately killed), we may propagate availability information through a node by removing the killed expressions before adding the generated ones, exactly as in LVA:

out-avail(n) = (in-avail(n) ∖ kill(n)) ∪ gen(n)

Data-flow equations Algorithm

From this new equation for out-avail(n) we may produce our final data-flow equation for expression availability:

avail(n) = ∩_{p ∈ pred(n)} ((avail(p) ∖ kill(p)) ∪ gen(p))   if pred(n) ≠ { }
avail(n) = { }                                               if pred(n) = { }

This is the equation you will find in the course notes and standard textbooks on program analysis; remember that it depends on these more subtle definitions of gen(n) and kill(n).

Algorithm

• We again use an array, avail[], to store the available expressions for each node.
• We initialise avail[] such that each node has all expressions available (cf. LVA: no variables live).
• We again iterate application of the data-flow equation at each node until avail[] no longer changes.

for i = 1 to n do avail[i] := U
while (avail[] changes) do
  for i = 1 to n do
    avail[i] := ∩_{p ∈ pred(i)} ((avail[p] ∖ kill(p)) ∪ gen(p))

We can do better if we assume that the flowgraph has a single entry node (the first node in avail[]). Then avail[1] may instead be initialised to the empty set, and we need not bother recalculating availability at the first node during each iteration.

Algorithm

avail[1] := {}
for i = 2 to n do avail[i] := U
while (avail[] changes) do
  for i = 2 to n do
    avail[i] := ∩_{p ∈ pred(i)} ((avail[p] ∖ kill(p)) ∪ gen(p))

As with LVA, this algorithm is guaranteed to terminate since the effect of one iteration is monotonic (it only removes expressions from availability sets) and an empty availability set cannot get any smaller. Any solution to the data-flow equations is safe, but this algorithm is guaranteed to give the largest (and therefore most precise) solution.
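A matching C sketch of this iteration, under the same illustrative assumptions as the LVA sketch (expression sets fit in one 64-bit word, per-node gen/kill and predecessor lists precomputed); node 0 is assumed to be the single entry node, and U is the bit vector of all expressions in the program.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_NODES 1024

    /* avail[i], gen[i], kill[i] are expression bit vectors.
       pred[i]/npred[i] give node i's predecessors; node 0 is the entry. */
    void avail_analysis(int num_nodes, uint64_t U,
                        const uint64_t gen[], const uint64_t kill[],
                        int pred[][MAX_NODES], const int npred[],
                        uint64_t avail[])
    {
        avail[0] = 0;                                      /* entry: avail := {}  */
        for (int i = 1; i < num_nodes; i++) avail[i] = U;  /* others := U         */

        bool changed = true;
        while (changed) {
            changed = false;
            for (int i = 1; i < num_nodes; i++) {
                uint64_t in = U;
                for (int j = 0; j < npred[i]; j++) {
                    int p = pred[i][j];
                    in &= (avail[p] & ~kill[p]) | gen[p]; /* ∩ ((avail\kill)∪gen) */
                }
                if (in != avail[i]) { avail[i] = in; changed = true; }
            }
        }
    }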

Algorithm

Implementation notes:

• If we arrange our programs such that each assignment assigns to a distinct temporary variable, we may number these temporaries and hence number the expressions whose values are assigned to them.
• If the program has n such expressions, we can implement each element of avail[] as an n-bit value, with the mth bit representing the availability of expression number m.
• Again, we can store availability once per basic block and recompute inside a block when necessary. Given that each basic block p has kp instructions p[1], ..., p[kp]:

avail(n) = ∩_{p ∈ pred(n)} ( avail(p) ∖ kill(p[1]) ∪ gen(p[1]) · · · ∖ kill(p[kp]) ∪ gen(p[kp]) )

Safety of analysis Analysis framework

Safety of analysis

• Syntactic availability safely underapproximates semantic availability.
• Address-taken variables are again a problem. For safety we must
  • underestimate ambiguous generation (assume no expressions are generated) and
  • overestimate ambiguous killing (assume all expressions containing address-taken variables are killed); this decreases the size of the largest solution.

Analysis framework

The two data-flow analyses we’ve seen, LVA and AVAIL, clearly share many similarities. In fact, they are both instances of the same simple data-flow analysis framework: some program property is computed by iteratively finding the most precise solution to data-flow equations, which express the relationships between values of that property immediately before and immediately after each node of a flowgraph.

LVA’s data-flow equations have the form

in(n) = (out(n) ∖ ...) ∪ ...
out(n) = ∪_{s ∈ succ(n)} in(s)          (union over successors)

AVAIL’s data-flow equations have the form

out(n) = (in(n) ∖ ...) ∪ ...
in(n) = ∩_{p ∈ pred(n)} out(p)          (intersection over predecessors)

Analysis framework

        ∩        ∪
pred    AVAIL    RD
succ    VBE      LVA

So, given a single algorithm for iterative solution of data-flow equations of this form, we may compute all these analyses and any others which fit into the framework.

...and others
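As a sketch of what such a single algorithm might look like, the C below parameterises the iteration by the combining operator (∪ or ∩) and by the per-node “(x ∖ a) ∪ b” transfer sets; struct Analysis, solve and the single-word bit-vector representation are assumptions made for this illustration, not part of the course material. With intersect=false, a=def, b=ref and successor lists it behaves like the LVA loop; with intersect=true, a=kill, b=gen and predecessor lists it behaves like the AVAIL loop.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_NODES 1024

    struct Analysis {
        bool intersect;                       /* combine with ∩ (AVAIL) or ∪ (LVA)        */
        uint64_t a[MAX_NODES], b[MAX_NODES];  /* per-node transfer: (x \ a) ∪ b           */
        uint64_t init;                        /* starting value: {} for LVA, U for AVAIL  */
        uint64_t boundary;                    /* value at nodes with no neighbours: {}    */
    };

    /* nbr[i] lists node i's successors (backwards analysis, e.g. LVA) or
       predecessors (forwards analysis, e.g. AVAIL).  combined[i] is the set
       flowing into the transfer (out-live / in-avail); result[i] is the set
       after the transfer (in-live / out-avail). */
    void solve(const struct Analysis *an, int num_nodes,
               int nbr[][MAX_NODES], const int nnbr[],
               uint64_t combined[], uint64_t result[])
    {
        for (int i = 0; i < num_nodes; i++) {
            combined[i] = an->init;
            result[i]   = (an->init & ~an->a[i]) | an->b[i];
        }
        bool changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < num_nodes; i++) {
                uint64_t acc;
                if (nnbr[i] == 0) {
                    acc = an->boundary;
                } else {
                    acc = an->intersect ? ~(uint64_t)0 : 0;
                    for (int j = 0; j < nnbr[i]; j++)
                        acc = an->intersect ? (acc & result[nbr[i][j]])
                                            : (acc | result[nbr[i][j]]);
                }
                uint64_t r = (acc & ~an->a[i]) | an->b[i];
                if (acc != combined[i] || r != result[i]) {
                    combined[i] = acc;
                    result[i]   = r;
                    changed     = true;
                }
            }
        }
    }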

Summary

• Expression availability is a data-flow property
• Available expression analysis (AVAIL) is a forwards data-flow analysis for determining expression availability
• AVAIL may be expressed as a pair of complementary data-flow equations, which may be combined
• A simple iterative algorithm can be used to find the largest solution to the AVAIL data-flow equations
• AVAIL and LVA are both instances (among others) of the same data-flow analysis framework

Lecture 5: Data-flow anomalies and clash graphs

Motivation

Optimisation vs. debugging

Both human- and computer-generated programs sometimes contain data-flow anomalies. These anomalies result in the program being worse, in some sense, than it was intended to be. Data-flow analysis is useful in locating, and sometimes correcting, these code anomalies.

Data-flow anomalies may manifest themselves in different ways: some may actually “break” the program (make it crash or exhibit undefined behaviour), others may just make the program “worse” (make it larger or slower than necessary). Any compiler needs to be able to report when a program is broken (i.e. “compiler warnings”), so the identification of data-flow anomalies has applications in both optimisation and bug elimination.

Dead code

Dead code is a simple example of a data-flow anomaly, and LVA allows us to identify it. Recall that code is dead when its result goes unused; if the variable x is not live on exit from an instruction which assigns some value to x, then the whole instruction is dead.

⋯              { x, y, z }
a = x + 11;    { a, y, z }
b = y + 13;    { a, b, z }
c = a * b;     { z }          c DEAD
⋯              { z }
print z;       { }

Dead code

For this kind of anomaly, an automatic remedy is not only feasible but also straightforward: dead code with no live side effects is useless and may be removed.
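A minimal C sketch of that remedy: delete any instruction whose defined variable is not live on exit, provided it has no side effects. The struct Instr fields (a def bit vector and a side_effect flag covering calls, stores and I/O) are illustrative assumptions, not the course’s actual representation.

    #include <stdint.h>
    #include <stdbool.h>

    struct Instr {
        uint64_t def;          /* bit vector of variables written        */
        bool     side_effect;  /* e.g. calls, stores, I/O: must be kept  */
        bool     deleted;
    };

    /* out_live[i] is the liveness bit vector on exit from instruction i,
       as computed by LVA. */
    void eliminate_dead_code(struct Instr *code, const uint64_t out_live[], int n)
    {
        for (int i = 0; i < n; i++) {
            bool result_unused = (code[i].def & out_live[i]) == 0;
            if (result_unused && code[i].def != 0 && !code[i].side_effect)
                code[i].deleted = true;   /* dead: its result is never used */
        }
        /* Re-running LVA and repeating may expose further dead instructions. */
    }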

Dead code

The program resulting from this transformation will remain correct and will be both smaller and faster than before (cf. just smaller in unreachable code elimination), and no programmer intervention is required. Successive iterations may yield further improvements: in the example above, removing c = a * b makes a and b dead in turn, so a = x + 11 and b = y + 13 can be removed by later passes.

Uninitialised variables

In some languages, for example C and our 3-address intermediate code, it is syntactically legitimate for a program to read from a variable before it has definitely been initialised with a value. This kind of behaviour is often undesirable, so we would like a compiler to be able to detect and warn of the situation.

If this situation occurs during execution, the effect of the read is usually undefined and depends upon unpredictable details of implementation and environment. Happily, the liveness information collected by LVA allows a compiler to see easily when a read from an undefined variable is possible.

Uninitialised variables

In a “healthy” program, variable liveness produced by later instructions is consumed by earlier ones; if an instruction demands the value of a variable (hence making it live), it is expected that an earlier instruction will define that variable (hence making it dead again).

             { }
x = 11;      { x }
y = 13;      { x, y }
z = 17;      { x, y }
⋯            { x, y }
print x;     { y }
print y;     { }          ✓

Uninitialised variables

If any variables are still live at the beginning of a program, they represent uses which are potentially unmatched by corresponding definitions, and hence indicate a program with potentially undefined (and therefore incorrect) behaviour.

             { z }        z LIVE
x = 11;      { x, z }
y = 13;      { x, y, z }
⋯            { x, y, z }
print x;     { y, z }
print y;     { z }
print z;     { }          ✗

In this situation, the compiler can issue a warning: “variable z may be used before it is initialised”. However, because LVA computes a safe (syntactic) overapproximation of variable liveness, some of these compiler warnings may be (semantically) spurious:

             { x }        x LIVE
if (p) {
  x = 42;
}
⋯
if (p) {
  print x;
}                         ✗

Uninitialised variables

Here the analysis is being too safe, and the warning is unnecessary, but this imprecision is the nature of our computable approximation to semantic liveness. So the compiler must either risk giving unnecessary warnings about correct code (“false positives”) or failing to give warnings about incorrect code (“false negatives”). Which is worse? Opinions differ.

Although dead code may easily be remedied by the compiler, it’s not generally possible to automatically fix the problem of uninitialised variables. As just demonstrated, even the decision as to whether a warning indicates a genuine problem must often be made by the programmer, who must also fix any such problems by hand.

Uninitialised variables

Note that higher-level languages have the concept of (possibly nested) scope, and our expectations for variable initialisation in “healthy” programs can be extended to these. In general we expect the set of live variables at the beginning of any scope to not contain any of the variables local to that scope.

int x = 5;
int y = 7;
if (p) {
             { x, y, z }    z LIVE
  int z;
  ⋯
  print z;                  ✗
}
print x+y;

Write-write anomalies

While LVA is useful in these cases, some similar data-flow anomalies can only be spotted with a different analysis. Write-write anomalies are an example of this. They occur when a variable may be written twice with no intervening read; the first write may then be considered unnecessary in some sense.

x = 11;
x = 13;
print x;

A simple data-flow analysis can be used to track which variables may have been written but not yet read at each node. In a sense, this involves doing LVA in reverse (i.e. forwards!): at each node we should remove all variables which are referenced, then add all variables which are defined.

in-wnr(n) = ∪_{p ∈ pred(n)} out-wnr(p)
out-wnr(n) = (in-wnr(n) ∖ ref(n)) ∪ def(n)

wnr(n) = ∪_{p ∈ pred(n)} ((wnr(p) ∖ ref(p)) ∪ def(p))

             { }
x = 11;      { x }
y = 13;      { x, y }
z = 17;      { x, y, z }
⋯            { x, y, z }
print x;     { y, z }
y = 19;      { y, z }      y is rewritten here without ever having been read (so the earlier write to y is also dead).
⋯
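A C sketch of this written-but-not-yet-read analysis, structurally the mirror image of the LVA loop (ref removed before def added, union taken over predecessors). As before, the single-word bit vectors, per-node def/ref and predecessor lists are illustrative assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_NODES 1024

    void wnr_analysis(int num_nodes,
                      const uint64_t def[], const uint64_t ref[],
                      int pred[][MAX_NODES], const int npred[],
                      uint64_t wnr[])
    {
        for (int i = 0; i < num_nodes; i++) wnr[i] = 0;

        bool changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < num_nodes; i++) {
                uint64_t in = 0;
                for (int j = 0; j < npred[i]; j++) {
                    int p = pred[i][j];
                    in |= (wnr[p] & ~ref[p]) | def[p];   /* (wnr \ ref) ∪ def */
                }
                if (in != wnr[i]) { wnr[i] = in; changed = true; }
                /* A write-write anomaly at node i: def[i] & in != 0 means some
                   variable defined here may already have been written but not
                   yet read. */
            }
        }
    }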

Write-write anomalies

But, although the second write to a variable may turn an earlier write into dead code, the presence of a write-write anomaly doesn’t necessarily mean that a variable is dead — hence the need for a different analysis.

x = 11;
if (p) {
  x = 13;
}
print x;

x is live throughout this code, but if p is true during execution, x will be written twice before it is read. In most cases, the programmer can remedy this.

if (p) {
  x = 13;
} else {
  x = 11;
}
print x;

This code does the same job, but avoids writing to x twice in succession on any control-flow path.

if (p) {
  x = 13;
}
if (!p) {
  x = 11;
}
print x;

Again, the analysis may be too approximate to notice that a particular write-write anomaly may never occur during any execution, so warnings may be inaccurate.

Write-write anomalies Clash graphs

As with uninitialised variable anomalies, the programmer must be relied upon to investigate the compiler’s warnings and fix any genuine problems which they indicate.

The ability to detect data-flow anomalies is a nice compiler feature, but LVA’s main utility is in deriving a data structure known as a clash graph (aka interference graph).

Clash graphs

When generating intermediate code it is convenient to simply invent as many variables as necessary to hold the results of computations; the extreme of this is “normal form”, in which a new temporary variable is used on each occasion that one is required, with none being reused.

x = (a*b) + c;
y = (a*b) + d;

        ↓  lex, parse, translate

MUL t1,a,b
ADD x,t1,c
MUL t2,a,b
ADD y,t2,d

Clash graphs

This makes generating 3-address code as straightforward as possible, and assumes an imaginary target machine with an unlimited supply of “virtual registers”, one to hold each variable (and temporary) in the program. Such a naïve strategy is obviously wasteful, however, and won’t generate good code for a real target machine.

Before we can work on improving the situation, we must collect information about which variables actually need to be allocated to different registers on the target machine, as opposed to having been incidentally placed in different registers by our translation to normal form. LVA is useful here because it can tell us which variables are simultaneously live, and hence must be kept in separate virtual registers for later retrieval.

                             { }
x = 11;          MOV x,#11
                             { x }
y = 13;          MOV y,#13
                             { x, y }
z = (x+y) * 2;   ADD t1,x,y
                             { t1 }
                 MUL z,t1,#2
                             { z }
a = 17;          MOV a,#17
                             { a, z }
b = 19;          MOV b,#19
                             { a, b, z }
z = z + (a*b);   MUL t2,a,b
                             { t2, z }
                 ADD z,z,t2

Clash graphs

In a program’s clash graph there is one node for each virtual register and an edge between two nodes when their registers are ever simultaneously live.

(Clash graph for the example above: nodes x, y, t1, t2, a, b, z, with edges x—y, a—b, a—z, b—z and t2—z.)

This graph shows us, for example, that a, b and z must all be kept in separate registers, but that we may reuse those registers for the other variables.
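A C sketch of building such a clash graph from per-instruction live sets: any two variables appearing together in some live set clash. The adjacency-matrix representation and the 64-variable bound are illustrative assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_VARS 64

    /* live[i] is the set of variables simultaneously live at point i, as a
       bit vector.  clash[a][b] is set when a and b interfere. */
    void build_clash_graph(const uint64_t live[], int num_points,
                           bool clash[MAX_VARS][MAX_VARS])
    {
        for (int a = 0; a < MAX_VARS; a++)
            for (int b = 0; b < MAX_VARS; b++)
                clash[a][b] = false;

        for (int i = 0; i < num_points; i++)
            for (int a = 0; a < MAX_VARS; a++)
                if (live[i] & ((uint64_t)1 << a))
                    for (int b = a + 1; b < MAX_VARS; b++)
                        if (live[i] & ((uint64_t)1 << b))
                            clash[a][b] = clash[b][a] = true;
    }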

Clash graphs Summary

Guided by the clash graph, the example code can be rewritten to reuse registers (x, y, t1 and t2 are replaced by a and b):

MOV a,#11
MOV b,#13
ADD a,a,b
MUL z,a,#2
MOV a,#17
MOV b,#19
MUL a,a,b
ADD z,z,a

Summary

• Data-flow analysis is helpful in locating (and sometimes correcting) data-flow anomalies
• LVA allows us to identify dead code and possible uses of uninitialised variables
• Write-write anomalies can be identified with a similar analysis
• Imprecision may lead to overzealous warnings
• LVA allows us to construct a clash graph

Lecture 6: Register allocation

Motivation

Normal form is convenient for intermediate code. However, it's extremely wasteful. Real machines only have a small finite number of registers, so at some stage we need to analyse and transform the intermediate representation of a program so that it only requires as many (physical) registers as are really available.

This task is called register allocation.

Graph colouring

Register allocation depends upon the solution of a closely related problem known as graph colouring.

Graph colouring Graph colouring

Graph colouring Graph colouring

For general (non-planar) graphs, however, four colours are not sufficient; there is no bound on how many may be required.

Graph colouring Graph colouring

[Figures: colouring example graphs; with four colours (red, green, blue, yellow) the attempt may fail (✗), and further colours (purple, brown, ...) may be needed (✓).]

Allocation by colouring

This is essentially the same problem that we wish to solve for clash graphs.

• How many colours (i.e. physical registers) are necessary to colour a clash graph such that no two connected vertices have the same colour (i.e. such that no two simultaneously live virtual registers are stored in the same physical register)?
• What colour should each vertex be?

MOV x,#11
MOV y,#13
ADD t1,x,y
MUL z,t1,#2
MOV a,#17
MOV b,#19
MUL t2,a,b
ADD z,z,t2

[Figure: the clash graph over x, y, t1, t2, a, b, z.]

Allocation by colouring Algorithm

Allocating x, t1 and a to r0, y and b to r1, and z to r2 gives:

MOV r0,#11
MOV r1,#13
ADD r0,r0,r1
MUL r2,r0,#2
MOV r0,#17
MOV r1,#19
MUL r0,r0,r1
ADD r2,r2,r0

Algorithm

Finding the minimal colouring for a graph is NP-hard, and therefore difficult to do efficiently. However, we may use a simple heuristic algorithm which chooses a sensible order in which to colour vertices and usually yields satisfactory results on real clash graphs.

Algorithm

• Choose a vertex (i.e. virtual register) which has the least number of incident edges (i.e. clashes).
• Remove the vertex and its edges from the graph, and push the vertex onto a LIFO stack.
• Repeat until the graph is empty.
• Pop each vertex from the stack and colour it in the most conservative way which avoids the colours of its (already-coloured) neighbours.

[Figure: an example clash graph over a, b, c, d, w, x, y, z being reduced, vertex by vertex, onto a stack.]
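A minimal Python sketch of this heuristic (illustrative only, assuming an unlimited supply of colours; the graph representation is invented here):

def colour_graph(vertices, edges):
    """Remove minimal-degree vertices onto a stack, then pop each vertex
    and give it the lowest colour not used by its coloured neighbours.
    Returns a mapping {vertex: colour_number}."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    remaining = {v: set(ns) for v, ns in adj.items()}   # working copy
    stack = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))  # fewest clashes
        stack.append(v)
        for u in remaining[v]:
            remaining[u].discard(v)                     # delete v's edges
        del remaining[v]

    colour = {}
    while stack:
        v = stack.pop()
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:                                # most conservative choice
            c += 1
        colour[v] = c
    return colour

edges = [('a', 'b'), ('a', 'z'), ('b', 'z'), ('x', 'y'), ('t2', 'z')]
print(colour_graph(['x', 'y', 't1', 't2', 'a', 'b', 'z'], edges))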

Algorithm Algorithm

[Figure: the vertices are popped from the stack and assigned colours r0–r3.]

Bear in mind that this is only a heuristic: a better (more minimal) colouring may exist.

[Figures: two small example graphs (over a, b, c and x, y, z) coloured by the heuristic, alongside colourings that use fewer colours.]

Spilling Spilling

This algorithm tries to find an approximately minimal colouring of the clash graph, but it assumes new colours are always available when required. In reality we will usually have a finite number of colours (i.e. physical registers) available; how should the algorithm cope when it runs out of colours?

The quantity of physical registers is strictly limited, but it is usually reasonable to assume that fresh memory locations will always be available. So, when the number of simultaneously live values exceeds the number of physical registers, we may spill the excess values into memory. Operating on values in memory is of course much slower, but it gets the job done.

Spilling

ADD a,b,c

vs.

LDR t1,#0xFFA4
LDR t2,#0xFFA8
ADD t3,t1,t2
STR t3,#0xFFA0

Algorithm

• Choose a vertex with the least number of edges.
• If it has fewer edges than there are colours, remove the vertex and push it onto a stack; otherwise choose a register to spill — e.g. the least-accessed one — and remove its vertex.
• Repeat until the graph is empty.
• Pop each vertex from the stack and colour it.
• Any uncoloured vertices must be spilled.
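The earlier colouring sketch extends naturally to this modified algorithm; here is a hypothetical Python version with a fixed number of colours, using a static access count (as discussed below) to pick spill candidates:

def colour_with_spilling(vertices, edges, num_colours, access_count):
    """As the basic heuristic, but when every remaining vertex has at least
    num_colours edges, remove the least-accessed vertex as a spill candidate.
    Returns (colouring, spilled vertices)."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    remaining = {v: set(ns) for v, ns in adj.items()}
    stack = []
    while remaining:
        low = [v for v in remaining if len(remaining[v]) < num_colours]
        if low:
            v = min(low, key=lambda u: len(remaining[u]))
        else:
            v = min(remaining, key=lambda u: access_count[u])  # spill candidate
        stack.append(v)
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]

    colour, spilled = {}, set()
    while stack:
        v = stack.pop()
        used = {colour[u] for u in adj[v] if u in colour}
        free = [c for c in range(num_colours) if c not in used]
        if free:
            colour[v] = free[0]
        else:
            spilled.add(v)               # no physical register left for v
    return colour, spilled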

Algorithm Algorithm

[Figure: the example clash graph over a, b, c, d, w, x, y, z being reduced with only a limited number of colours available; the least-accessed vertices are removed as spill candidates.]

Static access counts: a: 3, b: 5, c: 7, d: 11, w: 13, x: 17, y: 19, z: 23

Algorithm

[Figure: with only two colours (r0, r1) available, a and b are spilled to memory and the remaining vertices are coloured.]

Choosing the right virtual register to spill will result in a faster, smaller program. The static count of "how many accesses?" is a good start, but doesn't take account of more complex issues like loops and simultaneous liveness with other spilled values.

One easy heuristic is to treat one static access inside a loop as (say) 4 accesses; this generalises to 4^n accesses inside a loop nested to level n.

Algorithm Algorithm

"Slight lie": when spilling to memory, we (normally) need one free register to use as temporary storage for values loaded from and stored back into memory. If any instructions operate on two spilled values simultaneously, we will need two such temporary registers to store both values. So, in practice, when a spill is detected we may need to restart register allocation with one (or two) fewer physical registers available so that these can be kept free for temporary storage of spilled values.

When we are popping vertices from the stack and assigning colours to them, we sometimes have more than one colour to choose from. If the program contains an instruction "MOV a,b" then storing a and b in the same physical register (as long as they don't clash) will allow us to delete that instruction. We can construct a preference graph to show which pairs of registers appear together in MOV instructions, and use it to guide colouring decisions.
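A small sketch (hypothetical, not the notes' algorithm) of how a preference graph can bias the colour choice when several legal colours are available:

def choose_colour(v, adj, prefs, colour, num_colours):
    """Pick a colour for vertex v: avoid the colours of v's clash-graph
    neighbours, but if a MOV partner of v (an edge in the preference graph)
    is already coloured with a legal colour, reuse that colour so the MOV
    can later be deleted."""
    used = {colour[u] for u in adj.get(v, set()) if u in colour}
    for p in prefs.get(v, set()):            # MOV partners of v
        if p in colour and colour[p] not in used:
            return colour[p]
    for c in range(num_colours):
        if c not in used:
            return c
    return None                              # no colour available: spill v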

Non-orthogonal instructions Non-orthogonal instructions

We have assumed that we are free to choose physical registers however we want to, but this is simply not the case on some architectures.

• The x86 MUL instruction expects one of its arguments in the AL register and stores its result into AX.
• The VAX MOVC3 instruction zeroes r0, r2, r4 and r5, storing its results into r1 and r3.

We must be able to cope with such irregularities.

Non-orthogonal instructions

We can handle the situation tidily by pre-allocating a virtual register to each of the target machine's physical registers, e.g. keep v0 in r0, v1 in r1, ..., v31 in r31. When generating intermediate code in normal form, we avoid this set of registers, and use new ones (e.g. v32, v33, ...) for temporaries and user variables. In this way, each physical register is explicitly represented by a unique virtual register.

Non-orthogonal instructions Non-orthogonal instructions

We must now do extra work when generating intermediate code:

• When an instruction requires an operand in a specific physical register (e.g. x86 MUL), we generate a preceding MOV to put the right value into the corresponding virtual register.
• When an instruction produces a result in a specific physical register (e.g. x86 MUL), we generate a trailing MOV to transfer the result into a new virtual register.

If (hypothetically) ADD on the target architecture can only perform r0 = r1 + r2:

x = 19;
y = 23;
z = x + y;

MOV v32,#19
MOV v33,#23
MOV v1,v32
MOV v2,v33
ADD v0,v1,v2
MOV v34,v0

Non-orthogonal instructions

This may seem particularly wasteful, but many of the MOV instructions will be eliminated during register allocation if a preference graph is used.

[Figures: the clash graph and the preference graph over v0, v1, v2, v32, v33, v34.]

Allocation guided by the preference graph (v32 ↦ r1, v33 ↦ r2, v34 ↦ r0) gives:

MOV r1,#19
MOV r2,#23
MOV r1,r1
MOV r2,r2
ADD r0,r1,r2
MOV r0,r0

and the identity MOVs can then be deleted.

Non-orthogonal instructions Non-orthogonal instructions

And finally,

• When we know an instruction is going to corrupt the contents of a physical register, we insert an edge on the clash graph between the corresponding virtual register and all other virtual registers live at that instruction — this prevents the register allocator from trying to store any live values in the corrupted register.

If (hypothetically) MUL on the target architecture corrupts the contents of r0:

MOV v32,#6
MOV v33,#7
MUL v34,v32,v33
⋯

[Figure: the clash graph over v0, v1, v2, v32, v33, v34.]

Non-orthogonal instructions Procedure calling standards

If (hypothetically) MUL on the target architecture corrupts the contents of r0, the extra clash edges keep v32 and v33 out of r0:

MOV r1,#6
MOV r2,#7
MUL r0,r1,r2
⋯

[Figure: the resulting clash graph.]

Procedure calling standards

This final technique of synthesising edges on the clash graph in order to avoid corrupted registers is helpful for dealing with the procedure calling standard of the target architecture. Such a standard will usually dictate that procedure calls (e.g. CALL and CALLI instructions in our 3-address code) should use certain registers for arguments and results, should preserve certain registers over a call, and may corrupt any other registers if necessary.

Procedure calling standards Procedure calling standards

On the ARM, for example:

• Arguments should be placed in r0-r3 before a procedure is called.
• Results should be returned in r0 and r1.
• r4-r8, r10, r11 and r13 should be preserved over procedure calls.

Since a procedure call instruction may corrupt some of the registers (r0-r3, r9, and r12-r15 on the ARM), we can synthesise edges on the clash graph between the corrupted registers and all other virtual registers live at the call instruction.

As before, we may also synthesise MOV instructions to ensure that arguments and results end up in the correct registers, and use the preference graph to guide colouring such that most of these MOVs can be deleted again.

Procedure calling standards

x = 7;
y = 11;
z = 13;
a = f(x,y)+z;

MOV v32,#7
MOV v33,#11
MOV v34,#13
MOV v0,v32
MOV v1,v33
CALL f
MOV v35,v0
ADD v36,v34,v35

[Figure: the clash graph over v32–v36 and v0–v5, v9, ..., with edges synthesised for the registers corrupted by CALL.]

Procedure calling standards

Allocation (v32 ↦ r0, v33 ↦ r1, v34 ↦ r4, v35 ↦ r0, v36 ↦ r0) gives:

MOV r0,#7
MOV r1,#11
MOV r4,#13
MOV r0,r0
MOV r1,r1
CALL f
MOV r0,r0
ADD r0,r4,r0

Summary

• A register allocation phase is required to assign each virtual register to a physical one during compilation
• Registers may be allocated by colouring the vertices of a clash graph
• When the number of physical registers is limited, some virtual registers may be spilled to memory
• Non-orthogonal instructions may be handled with additional MOVs and new edges on the clash graph
• Procedure calling standards are also handled this way

Lecture 7: Redundancy elimination

Motivation

Some expressions in a program may cause redundant recomputation of values. If such recomputation is safely eliminated, the program will usually become faster.

There exist several redundancy elimination optimisations which attempt to perform this task in different ways (and for different specific meanings of “redundancy”).

Common subexpressions Common subexpressions

Common-subexpression elimination is a transformation which is enabled by available expression analysis (AVAIL), in the same way as LVA enables dead-code elimination. Since AVAIL discovers which expressions will have been computed by the time control arrives at an instruction in the program, we can use this information to spot and remove redundant computations.

Recall that an expression is available at an instruction if its value has definitely already been computed and not been subsequently invalidated by assignments to any of the variables occurring in the expression. If the expression e is available on entry to an instruction which computes e, the instruction is performing a redundant computation and can be modified or removed.

Common subexpressions

x = (a*b)+c;
⋯
print a * b;      (a*b AVAILABLE)

We consider this redundantly-computed expression to be a common subexpression: it is common to more than one instruction in the program, and in each of its occurrences it may appear as a subcomponent of some larger expression.

We can eliminate a common subexpression by storing its value into a new temporary variable when it is first computed, and reusing that variable later when the same value is required.

Algorithm Algorithm

• Find a node n which computes an already-available expression e.
• Replace the occurrence of e with a new temporary variable t.
• On each control path backwards from n, find the first instruction calculating e and add a new instruction to store its value into t.
• Repeat until no more redundancy is found.

[Flowgraph: three predecessors compute a = y * z, b = y * z and c = y * z; they join at a node computing x = y * z, where y*z is AVAILABLE.]
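The flowgraph-based algorithm above is more general, but the core idea can be illustrated with a hypothetical Python sketch of local CSE on a single basic block in normal form (so no variable is overwritten within the block):

def local_cse(instructions):
    """instructions: list of (dest, op, arg1, arg2) tuples for one block.
    A repeated (op, arg1, arg2) expression is replaced by a MOV from the
    variable that first computed it."""
    first_computed = {}                 # (op, a1, a2) -> variable holding it
    out = []
    for dest, op, a1, a2 in instructions:
        key = (op, a1, a2)
        if key in first_computed:
            out.append((dest, 'MOV', first_computed[key], None))
        else:
            first_computed[key] = dest
            out.append((dest, op, a1, a2))
    return out

code = [('x', 'MUL', 'a', 'b'), ('t', 'ADD', 'x', 'c'),
        ('y', 'MUL', 'a', 'b'), ('u', 'ADD', 'y', 'd')]
print(local_cse(code))
# [('x','MUL','a','b'), ('t','ADD','x','c'), ('y','MOV','x',None), ('u','ADD','y','d')]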

Algorithm Common subexpressions

[Flowgraph after CSE: each predecessor computes t = y * z and then a = t, b = t or c = t; the join node becomes x = t.]

Our transformed program performs (statically) fewer arithmetic operations: y*z is now computed in three places rather than four.

However, three register copy instructions have also been generated; the program is now larger, and whether it is faster depends upon characteristics of the target architecture.

Common subexpressions Copy propagation

The program might have "got worse" as a result of performing common-subexpression elimination. In particular, introducing a new variable increases register pressure, and might cause spilling. Memory loads and stores are much more expensive than multiplication of registers!

This simple formulation of CSE is fairly careless, and assumes that other compiler phases are going to tidy up afterwards. In addition to register allocation, a transformation called copy propagation is often helpful here.

In copy propagation, we scan forwards from an x=y instruction and replace x with y wherever it appears (as long as neither x nor y have been modified).

Copy propagation

Repeatedly applying the CSE algorithm to

a = y * z
b = y * z
c = y * z
d = y * z

produces a chain of copies:

t3 = y * z
a = t3
t2 = t3
b = t2
t1 = t2
c = t1
d = t1

Copy propagation

After copy propagation, everything refers to t3 (and t1, t2 become dead):

t3 = y * z
a = t3
t2 = t3
b = t3
t1 = t3
c = t3
d = t3

Code motion

Transformations such as CSE are known collectively as code motion transformations: they operate by moving instructions and computations around programs to take advantage of opportunities identified by control- and data-flow analysis. Code motion is particularly useful in eliminating different kinds of redundancy. It's worth looking at other kinds of code motion.

Code hoisting

Code hoisting reduces the size of a program by moving duplicated expression computations to the same place, where they can be combined into a single instruction.

Hoisting relies on a data-flow analysis called very busy expressions (a backwards version of AVAIL) which finds expressions that are definitely going to be evaluated later in the program; these can be moved earlier and possibly combined with each other.

[Flowgraph: after x = 19; y = 23 (where x+y is VERY BUSY), control splits into two branches computing a = x + y and b = x + y.]

Code hoisting Code hoisting

[Flowgraph after hoisting: x = 19; y = 23; t1 = x + y, and the two branches become a = t1 and b = t1.]

Hoisting may have a different effect on execution time depending on the exact nature of the code. The resulting program may be slower, faster, or just the same speed as before.

Loop-invariant code motion

Some expressions inside loops are redundant in the sense that they get recomputed on every iteration even though their value never changes within the loop. Loop-invariant code motion recognises these redundant computations and moves such expressions outside of loop bodies so that they are only evaluated once.

a = ...;
b = ...;
while (...) {
  x = a + b;
  ...
}
print x;

Loop-invariant code motion Loop-invariant code motion

This transformation depends upon a data-flow analysis to discover which assignments may affect the value of a variable ("reaching definitions"). If none of the variables in the expression are redefined inside the loop body (or are only redefined by computations involving other invariant values), the expression is invariant between loop iterations and may safely be relocated before the beginning of the loop.

a = ...;
b = ...;
x = a + b;
while (...) {
  ...
}
print x;

Partial redundancy Partial redundancy

Partial redundancy elimination combines common-subexpression elimination and loop-invariant code motion into one optimisation which improves the performance of code.

An expression is partially redundant when it is computed more than once on some (vs. all) paths through a flowgraph; this is often the case for code inside loops, for example.

a = ...;
b = ...;
while (...) {
  ... = a + b;
  a = ...;
  ... = a + b;
}

Partial redundancy Partial redundancy

a = ...;
b = ...;
... = a + b;
while (...) {
  a = ...;
  ... = a + b;
}

This example gives a faster program of the same size. Partial redundancy elimination can be achieved in its own right using a complex combination of several forwards and backwards data-flow analyses in order to locate partially redundant computations and discover the best places to add and delete instructions.

Putting it all together

a = x + y;
b = x + y;
r = z;
if (a == 42) {
  r = a + b;
  s = x + y;
} else {
  s = a + b;
}
t = b + r;
u = x + y;
return r+s+t+u;

[Flowgraph, with x+y COMMON:]

ADD a,x,y
ADD b,x,y
MOV r,z
  (then branch)  ADD r,a,b ; ADD s,x,y
  (else branch)  ADD s,a,b
ADD t,b,r
ADD u,x,y
⋯

Putting it all together Putting it all together

After CSE, the four computations of x+y share temporaries (t1 and t2 are introduced by the backwards insertion of stores):

ADD t3,x,y
MOV t2,t3
MOV t1,t2
MOV a,t1
MOV b,t1
MOV r,z
  ADD r,a,b ; MOV s,t2        ADD s,a,b
ADD t,b,r
MOV u,t3
⋯

Putting it all together

After copy propagation, everything is a COPY OF t3, so t1 and t2 become DEAD:

ADD t3,x,y
MOV t2,t3
MOV t1,t3
MOV a,t3
MOV b,t3
MOV r,z
  ADD r,t3,t3 ; MOV s,t3      ADD s,t3,t3
ADD t,t3,r
MOV u,t3
⋯

Putting it all together

Putting it all together Putting it all together

After dead-code elimination, a and b are DEAD too, and t3+t3 is VERY BUSY, so it can be hoisted into a new temporary t4:

ADD t3,x,y
MOV a,t3
MOV b,t3
MOV r,z
ADD t4,t3,t3
  MOV r,t4 ; MOV s,t3         MOV s,t4
ADD t,t3,r
MOV u,t3
⋯

Putting it all together

ADD t3,x,y
MOV r,z
ADD t4,t3,t3
  MOV r,t4 ; MOV s,t3         MOV s,t4
ADD t,t3,r
MOV u,t3
⋯

Summary

• Some optimisations exist to reduce or remove redundancy in programs
• One such optimisation, common-subexpression elimination, is enabled by AVAIL
• Copy propagation makes CSE practical
• Other code motion optimisations can also help to reduce redundancy
• These optimisations work together to improve code

Lecture 8: Static single-assignment

Motivation

Intermediate code in normal form permits maximum flexibility in allocating temporary variables to physical registers. This flexibility is not extended to user variables, and sometimes more registers than necessary will be used.

Register allocation can do a better job with user variables if we first translate code into SSA form.

Live ranges Live ranges

User variables are often reassigned and reused many times over the course of a program, so that they become live in many different places. Our intermediate code generation scheme assumes that each user variable is kept in a single virtual register throughout the entire program. This results in each virtual register having a large live range, which is likely to cause clashes.

extern int f(int);
extern void h(int,int);

void g() {
  int a,b,c;
  a = f(1); b = f(2); h(a,b);
  b = f(3); c = f(4); h(b,c);
  c = f(5); a = f(6); h(c,a);
}

Live ranges Live ranges

a = f(1);
b = f(2);
h(a,b);
b = f(3);
c = f(4);
h(b,c);
c = f(5);
a = f(6);
h(c,a);

[Figure: the live ranges of a, b and c all overlap, so 3 registers are needed.]

We may remedy this situation by performing a transformation called live range splitting, in which live ranges are made smaller by using a different virtual register to store a variable's value at different times, thus reducing the potential for clashes.

Live ranges

extern int f(int);
extern void h(int,int);

void g() {
  int a1,a2, b1,b2, c1,c2;
  a1 = f(1); b2 = f(2); h(a1,b2);
  b1 = f(3); c2 = f(4); h(b1,c2);
  c1 = f(5); a2 = f(6); h(c1,a2);
}

[Figure: after splitting, the live ranges overlap only in pairs, so 2 registers are needed.]

Static single-assignment Static single-assignment

Live range splitting is a useful transformation: it gives the same benefits for user variables as normal form gives for temporary variables. However, if each virtual register is only ever assigned to once (statically), we needn't perform live range splitting, since the live ranges are already as small as possible. Code in static single-assignment (SSA) form has this important property.

It is straightforward to transform straight-line code into SSA form: each variable is renamed by being given a subscript, which is incremented every time that variable is assigned to.

v1 = 3;
v2 = v1 + 1;
v3 = v2 + w1;
w2 = v3 + 2;

Static single-assignment
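A tiny Python sketch (hypothetical representation) of this renaming for straight-line code: variables live on entry get subscript 1, and each assignment bumps the subscript.

def to_ssa(statements):
    """statements: list of (dest, vars_read) pairs describing straight-line
    code. Returns the renamed (dest_i, [use_j]) pairs."""
    version = {}
    def use(v):
        if v not in version:
            version[v] = 1               # variable live on entry: subscript 1
        return f"{v}{version[v]}"
    out = []
    for dest, vars_read in statements:
        uses = [use(v) for v in vars_read]
        version[dest] = version.get(dest, 0) + 1   # new definition of dest
        out.append((f"{dest}{version[dest]}", uses))
    return out

# v = 3; v = v + 1; v = v + w; w = v + 2
prog = [('v', []), ('v', ['v']), ('v', ['v', 'w']), ('w', ['v'])]
print(to_ssa(prog))
# [('v1', []), ('v2', ['v1']), ('v3', ['v2', 'w1']), ('w2', ['v3'])]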

When the program's control flow is more complex, extra effort is required to retain the original data-flow behaviour. Where control-flow edges meet, two (or more) differently-named variables must now be merged together.

v1 = 3;
  (one branch)    v2 = v1 + 1;  v3 = v2 + w;
  (other branch)  v4 = v1 - 1;
w = v? * 2;       — which renamed v should be used here?

Static single-assignment Static single-assignment

v1 = 3;
  (one branch)    v2 = v1 + 1;  v3 = v2 + w1;
  (other branch)  v4 = v1 - 1;
v5 = ϕ(v3,v4);
w2 = v5 * 2;

The ϕ-functions in SSA keep track of which variables are merged at control-flow join points. They are not executable since they do not record which variable to choose (cf. gated SSA form).

Static single-assignment

"Slight lie": SSA is useful for much more than register allocation! In fact, the main advantage of SSA form is that, by representing data dependencies as precisely as possible, it makes many optimising transformations simpler and more effective, e.g. constant propagation, loop-invariant code motion, partial-redundancy elimination, and strength reduction.

Phase ordering

We now have many optimisations which we can perform on intermediate code. It is generally a difficult problem to decide in which order to perform these optimisations; different orders may be more appropriate for different programs. Certain optimisations are antagonistic: for example, CSE may superficially improve a program at the expense of making the register allocation phase more difficult (resulting in spills to memory).

Higher-level optimisations Higher-level optimisations

• More modern optimisations than those in Part A
• Part A was mostly imperative; Part B is mostly functional
• Now operating on syntax of source language vs. an intermediate representation
• Functional languages make the presentation clearer, but many optimisations will also be applicable to imperative programs

[Figure: the compiler pipeline (character stream → token stream → parse tree → intermediate code → target code), with optimisation now applied at the parse-tree level.]

Algebraic identities Algebraic identities

The idea behind peephole optimisation of intermediate code can also be applied to abstract syntax trees. There are many trivial examples where one piece of syntax is always (algebraically) equivalent to another piece of syntax which may be smaller or otherwise "better"; simple rewriting of syntax trees with these rules may yield a smaller or faster program.

... e + 0 ...        →        ... e ...

... (e + n) + m ...  →        ... e + (n + m) ...

Algebraic identities

In a lazy functional language,

let x = e in if e′ then ... x ... else e′′
    →
if e′ then let x = e in ... x ... else e′′

provided e′ and e′′ do not contain x. This is still quite boring.

These optimisations are boring, however, since they are always applicable to any syntax tree. We're interested in more powerful transformations which may only be applied when some analysis has confirmed that they are safe.

Strength reduction

More interesting analyses (i.e. ones that aren't purely syntactic) enable more interesting transformations. Strength reduction is an optimisation which replaces expensive operations (e.g. multiplication and division) with less expensive ones (e.g. addition and subtraction). It is most interesting and useful when done inside loops.

For example, it may be advantageous to replace multiplication (2*e) with addition (let x = e in x + x) as before. Multiplication may happen a lot inside loops (e.g. using the loop variable as an index into an array), so if we can spot a recurring multiplication and replace it with an addition we should get a faster program.

Strength reduction Strength reduction

int i;

for (i = 0; i < 100; i++)
{
  v[i] = 0;
}

int i; char *p;

for (i = 0; i < 100; i++)
{
  p = (char *)v + 4*i;
  p[0] = 0; p[1] = 0;
  p[2] = 0; p[3] = 0;
}

Strength reduction Strength reduction

Moving the pointer computation out of the loop and incrementing it instead:

int i; char *p;
p = (char *)v;

for (i = 0; i < 100; i++)
{
  p[0] = 0; p[1] = 0;
  p[2] = 0; p[3] = 0;
  p += 4;
}

Strength reduction Strength reduction

Rewriting the loop test in terms of the pointer, and then tidying the types:

int i; char *p;
p = (char *)v;

for (i = 0; p < (char *)v + 400; i++)
{
  p[0] = 0; p[1] = 0;
  p[2] = 0; p[3] = 0;
  p += 4;
}

int i; int *p;
p = v;

for (i = 0; p < v + 100; i++)
{
  *p = 0;
  p++;
}

Strength reduction

Finally, i is no longer needed at all:

int *p;

for (p = v; p < v + 100; p++)
{
  *p = 0;
}

Multiplication has been replaced with addition.

Strength reduction Strength reduction

Note that, while this code is now almost optimal, it has obfuscated the intent of the original program. Don't be tempted to write code like this! For example, when targeting a 64-bit architecture, the compiler may be able to transform the original loop into fifty 64-bit stores, but will have trouble with our more efficient version.

We are not restricted to replacing multiplication with addition, as long as we have

• a loop variable: i = i ⊕ c
• another variable: j = c2 ⊕ (c1 ⊗ i)

for some operations ⊕ and ⊗ such that

x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z)

Strength reduction

It might be easier to perform strength reduction on the intermediate code, but only if annotations have been placed on the flowgraph to indicate loop structure. At the syntax tree level, all loop structure is apparent.

Summary

• Live range splitting reduces register pressure
• In SSA form, each variable is assigned to only once
• SSA uses ϕ-functions to handle control-flow merges
• SSA aids register allocation and many optimisations
• Optimal ordering of compiler phases is difficult
• Algebraic identities enable code improvements
• Strength reduction uses them to improve loops

Motivation

We reason about programs statically, but we are really trying to make predictions about their dynamic behaviour.

Lecture 9: Abstract interpretation

Why not examine this behaviour directly? It isn't generally feasible (e.g. termination, inputs) to run an entire computation at compile-time, but we can find things out about it by running a simplified version. This is the basic idea of abstract interpretation.

Abstract interpretation

Warning: this will be a heavily simplified view of abstract interpretation; there is only time to give a brief introduction to the ideas, not explore them with depth or rigour.

The key idea is to use an abstraction: a model of (otherwise unmanageable) reality, which

• discards enough detail that the model becomes manageable (e.g. small enough, computable enough), but
• retains enough detail to provide useful insight into the real world.

Abstract interpretation Abstract interpretation

For example, to plan a trip, you might use a map.

• A road map sacrifices a lot of detail —
  • trees, road conditions, individual buildings;
  • an entire dimension —
• but it retains most of the information which is important for planning a journey:
  • place names;
  • roads and how they are interconnected.

Crucially, a road map is a useful abstraction because the route you plan is probably still valid back in reality.

• A cartographer creates an abstraction of reality (a map),
• you perform some computation on that abstraction (plan a route),
• and then you transfer the result of that computation back into the real world (drive to your destination).

Abstract interpretation

Trying to plan a journey by exploring the concrete world instead of the abstraction (i.e. driving around aimlessly) is either very expensive or virtually impossible. A trustworthy map makes it possible — even easy. This is a real application of abstract interpretation, but in this course we're more interested in computer programs.

Multiplying integers

A canonical example is the multiplication of integers. If we want to know whether −1515 × 37 is positive or negative, there are two ways to find out:

• Compute in the concrete world (arithmetic), using the standard interpretation of multiplication. −1515 × 37 = −56055, which is negative.
• Compute in an abstract world, using an abstract interpretation of multiplication: call it ⊗.

Multiplying integers Multiplying integers

In this example the magnitudes of the numbers are insignificant; we care only about their sign, so we can use this information to design our abstraction.

(−) = { z ∈ ℤ | z < 0 }
(0) = { 0 }
(+) = { z ∈ ℤ | z > 0 }

In the concrete world we have all the integers; in the abstract world we have only the values (−), (0) and (+).

We need to define the abstract operator ⊗. Luckily, we have been to primary school.

⊗     (−)   (0)   (+)
(−)   (+)   (0)   (−)
(0)   (0)   (0)   (0)
(+)   (−)   (0)   (+)

Multiplying integers

Armed with our abstraction, we can now tackle the original problem.

abs(−1515) = (−)
abs(37) = (+)
(−) ⊗ (+) = (−)

So, without doing any concrete computation, we have discovered that −1515 × 37 has a negative result.

This is just a toy example, but it demonstrates the methodology: state a problem, devise an abstraction that retains the characteristics of that problem, solve the problem in the abstract world, and then interpret the solution back in the concrete world. This abstraction has avoided doing arithmetic; in compilers, we will mostly be interested in avoiding expensive computation, nontermination or undecidability.

Safety

As always, there are important safety issues. Because an abstraction discards detail, a computation in the abstract world will necessarily produce less precise results than its concrete counterpart. It is important to ensure that this imprecision is safe.

Safety

We consider a particular abstraction to be safe if, whenever a property is true in the abstract world, it must also be true in the concrete world. Our multiplication example is actually quite precise, and therefore trivially safe: the magnitudes of the original integers are irrelevant, so when the abstraction says that the result of a multiplication will be negative, it definitely will be. In general, however, abstractions will be more approximate than this.

Adding integers

A good example is the addition of integers. How do we define the abstract operator ⊕? When adding integers, their (relative) magnitudes are important in determining the sign of the result, but our abstraction has discarded this information. As a result, we need a new abstract value: (?).

(−) = { z ∈ ℤ | z < 0 }
(0) = { 0 }
(+) = { z ∈ ℤ | z > 0 }
(?) = ℤ

⊕     (−)   (0)   (+)
(−)   (−)   (−)   (?)
(0)   (−)   (0)   (+)
(+)   (?)   (+)   (+)
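A minimal Python rendering (illustrative only) of the sign abstraction and its two abstract operators:

NEG, ZERO, POS, ANY = '(-)', '(0)', '(+)', '(?)'

def abs_sign(z):
    """The abstraction function: map a concrete integer to its sign."""
    return NEG if z < 0 else POS if z > 0 else ZERO

def mul_abs(a, b):                      # the ⊗ table
    if ZERO in (a, b):
        return ZERO
    if ANY in (a, b):
        return ANY
    return POS if a == b else NEG

def add_abs(a, b):                      # the ⊕ table: magnitudes are lost
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else ANY         # (+) ⊕ (−) and (−) ⊕ (+) give (?)

print(mul_abs(abs_sign(-1515), abs_sign(37)))   # (-)
print(add_abs(abs_sign(-1515), abs_sign(37)))   # (?)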

Adding integers Adding integers

(?) is less precise than (–), (0) and (+); it means "I don't know", or "it could be anything". Because we want the abstraction to be safe, we must put up with this weakness.

Adding integers

abs(19 + 23) = abs(42) = (+)
abs(19) ⊕ abs(23) = (+) ⊕ (+) = (+)   ☺

[Diagram: the concrete route (19, 23 —+→ 42, then abs) and the abstract route (abs first, then (+), (+) —⊕→ (+)) give the same answer.]

Adding integers

abs(–1515 + 37) = abs(–1478) = (–)
abs(–1515) ⊕ abs(37) = (–) ⊕ (+) = (?)   ☹

[Diagram: the concrete route gives (–); the abstract route (–), (+) —⊕→ gives only (?).]

Safety

Here, safety is represented by the fact that (–) ⊆ (?):

{ z ∈ ℤ | z < 0 } ⊆ ℤ

The result of doing the abstract computation is less precise, but crucially includes the result of doing the concrete computation (and then abstracting), so the abstraction is safe and hasn't missed anything.

Abstraction

Formally, an abstraction of some concrete domain D (e.g. ℘(ℤ)) consists of

• an abstract domain D# (e.g. { (–), (0), (+), (?) }),
• an abstraction function α : D → D# (e.g. abs), and
• a concretisation function γ : D# → D, e.g.:
  • (–) ↦ { z ∈ ℤ | z < 0 },
  • (0) ↦ { 0 }, etc.

[Diagram: α maps the concrete domain to the abstract domain; γ maps back.]

Abstraction

Given a function f from one concrete domain to another (e.g. +^ : ℘(ℤ) × ℘(ℤ) → ℘(ℤ)), we require an abstract function f# (e.g. ⊕) between the corresponding abstract domains. So, for D = ℘(ℤ) and D# = { (–), (0), (+), (?) }, we have:

ℤ × ℤ    —+→    ℤ
D × D    —+^→   D
D# × D#  —⊕→    D#

+^({ 2, 5 }, { –3, –7 }) = { –1, –5, 2, –2 }
⊕((+), (–)) = (?)

where α1,2 and γ1,2 are the appropriate abstraction and concretisation functions relating these domains.

Abstraction

These mathematical details are formally important, but are not examinable on this course. Abstract interpretation can get very theoretical, but what's significant is the idea of using an abstraction to safely model reality.

Recognise that this is what we were doing in data-flow analysis: interpreting 3-address instructions as operations on abstract values — e.g. live variable sets — and then "executing" this abstract program.

Summary

• Abstractions are manageably simple models of unmanageably complex reality
• Abstract interpretation is a general technique for executing simplified versions of computations
• For example, the sign of an arithmetic result can sometimes be determined without doing any arithmetic
• Abstractions are approximate, but must be safe
• Data-flow analysis is a form of abstract interpretation

Motivation

Lecture 10: Strictness analysis

The operations and control structures of imperative languages are strongly influenced by the way most real computer hardware works. This makes imperative languages relatively easy to compile, but (arguably) less expressive; many people use functional languages, but these are harder to compile into efficient imperative machine code.

Strictness optimisation can help to improve the efficiency of compiled functional code.

Call-by-value evaluation

Strict ("eager") functional languages (e.g. ML) use a call-by-value evaluation strategy:

  e2 ⇓ v2     e1[v2/x] ⇓ v1
  ——————————————————————————
       (λx.e1) e2 ⇓ v1

• Efficient in space and time, but
• might evaluate more arguments than necessary.

Call-by-name evaluation

Non-strict ("lazy") functional languages (e.g. Haskell) use a call-by-name evaluation strategy:

   e1[e2/x] ⇓ v
  ———————————————
  (λx.e1) e2 ⇓ v

• Only evaluates arguments when necessary, but
• copies (and redundantly re-evaluates) arguments.

Call-by-need evaluation

One simple optimisation is to use call-by-need evaluation instead of call-by-name. If the language has no side-effects, duplicated instances of an argument can be shared, evaluated once if required, and the resulting value reused. This avoids recomputation and is better than call-by-name, but is still more expensive than call-by-value.

plus(x,y) = if x=0 then y else plus(x-1,y+1)

Using call-by-value:

plus(3,4) → if 3=0 then 4 else plus(3-1,4+1)
          → plus(2,5)
          → plus(1,6)
          → plus(0,7)
          → 7

Using call-by-need:

plus(3,4) → if 3=0 then 4 else plus(3-1,4+1)
          → plus(3-1,4+1)
          → plus(2-1,4+1+1)
          → plus(1-1,4+1+1+1)
          → 4+1+1+1
          → 5+1+1
          → 6+1
          → 7

Replacing CBN with CBV

So why not just replace call-by-name with call-by-value? Because, while replacing call-by-name with call-by-need never changes the semantics of the original program (in the absence of side-effects), replacing CBN with CBV does. In particular, the program's termination behaviour changes.
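Returning to the sharing idea above, here is a tiny Python illustration (hypothetical, not part of the notes) of the difference between call-by-name and call-by-need: an argument is wrapped in a thunk, and under call-by-need its value is cached after the first evaluation.

class Thunk:
    """Delay an argument expression; optionally memoise it (call-by-need)."""
    def __init__(self, compute, memoise=True):
        self.compute = compute
        self.memoise = memoise
        self.value = None
    def force(self):
        if self.value is None or not self.memoise:
            self.value = self.compute()
        return self.value

count = 0
def expensive():                         # stands for an expensive argument
    global count
    count += 1
    return 5

t = Thunk(expensive)                     # call-by-need: shared, evaluated once
print(t.force() * t.force(), count)      # 25 1

count = 0
t = Thunk(expensive, memoise=False)      # call-by-name: re-evaluated each use
print(t.force() * t.force(), count)      # 25 2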

Replacing CBN with CBV Neededness

Assume we have some nonterminating expression, Ω.

• Using CBN, the expression (λx. 42) Ω will evaluate to 42.
• But using CBV, evaluation of (λx. 42) Ω will not terminate: Ω gets evaluated first, even though its value is not needed here.

We should therefore try to use call-by-value wherever possible, but not when it will affect the termination behaviour of a program.

Neededness

Intuitively, it will be safe to use CBV in place of CBN whenever an argument is definitely going to be evaluated. We say that an argument is needed by a function if the function will always evaluate it.

• λx,y. x+y needs both its arguments.
• λx,y. x+1 needs only its first argument.
• λx,y. 42 needs neither of its arguments.

Neededness Neededness

These needed arguments can safely be passed by value: if their evaluation causes nontermination, this will just happen sooner rather than later.

In fact, neededness is too conservative:

λx,y,z. if x then y else Ω

This function might not evaluate y, so only x is needed. But actually it's still safe to pass y by value: if y doesn't get evaluated then the function doesn't terminate anyway, so it doesn't matter if eagerly evaluating y causes nontermination.

Strictness Strictness What we really want is a more refined notion:

What we really want is a more refined notion: it is safe to pass an argument by value when the function fails to terminate whenever the argument fails to terminate. When this more general statement holds, we say the function is strict in that argument.

λx,y,z. if x then y else Ω

is strict in x and strict in y.

If we can develop an analysis that discovers which functions are strict in which arguments, we can use that information to selectively replace CBN with CBV and obtain a more efficient program.

Strictness analysis

We can perform strictness analysis by abstract interpretation. First, we must define a concrete world of programs and values. We will use the simple language of recursion equations, and only consider integer values.

Recursion equations

F1(x1, . . . , xk1) = e1
        · · ·      = · · ·
Fn(x1, . . . , xkn) = en

e ::= xi | Ai(e1, . . . , eri) | Fi(e1, . . . , eki)

where each Ai is a symbol representing a built-in (predefined) function of arity ri.

Recursion equations

For our earlier example,

plus(x,y) = if x=0 then y else plus(x-1,y+1)

we can write the recursion equation

plus(x, y) = cond(eq(x, 0), y, plus(sub1(x), add1(y)))

where cond, eq, 0, sub1 and add1 are built-in functions.

Standard interpretation

We must have some representation of nontermination in our concrete domain. As values we will consider the integer results of terminating computations, ℤ, and a single extra value to represent nonterminating computations: ⊥. Our concrete domain D is therefore ℤ⊥ = ℤ ∪ { ⊥ }.

Standard interpretation Standard interpretation

Each built-in function needs a standard interpretation.

We will interpret each Ai as a function ai on values in D:

cond(⊥, x, y) = ⊥
cond(0, x, y) = y
cond(p, x, y) = x   otherwise

eq(⊥, y) = ⊥
eq(x, ⊥) = ⊥
eq(x, y) = x =ℤ y   otherwise

Standard interpretation

The standard interpretation fi of a user-defined function Fi is constructed from the built-in functions by composition and recursion according to its defining equation.

plus(x, y) = cond(eq(x, 0), y, plus(sub1(x), add1(y)))

Abstract interpretation Abstract interpretation

Our abstraction must capture the properties we're interested in, while discarding enough detail to make analysis computationally feasible. Strictness is all about termination behaviour, and in fact this is all we care about: does evaluation of an expression definitely not terminate (as with Ω), or may it eventually terminate and return a result? Our abstract domain D# is therefore { 0, 1 }.

Abstract interpretation

For each built-in function Ai we need a corresponding strictness function ai# — this provides the strictness interpretation for Ai. Whereas the standard interpretation of each built-in is a function on concrete values from D, the strictness interpretation of each will be a function on abstract values from D# (i.e. 0 and 1).

A formal relationship exists between the standard and abstract interpretations of each built-in function; the mathematical details are in the lecture notes. Informally, we use the same technique as for multiplication and addition of integers in the last lecture: we define the abstract operations using what we know about the behaviour of the concrete operations.

x  y  eq#(x,y)
0  0  0
0  1  0
1  0  0
1  1  1

Abstract interpretation Abstract interpretation

p  x  y  cond#(p,x,y)
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  0
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1

Abstract interpretation

These functions may be expressed more compactly as boolean expressions, treating 0 and 1 from D# as false and true respectively. We can use Karnaugh maps (from IA DigElec) to turn each truth table into a simple boolean expression.

Abstract interpretation

[Karnaugh maps for cond# (over p and the pair x, y) and for eq# (over x and y).]

cond#(p, x, y) = (p ∧ y) ∨ (p ∧ x) = p ∧ (x ∨ y)
eq#(x, y) = x ∧ y

Strictness analysis Strictness analysis

So far, we have set up

• a concrete domain, D, equipped with
  • a standard interpretation ai of each built-in Ai, and
  • a standard interpretation fi of each user-defined Fi;
• and an abstract domain, D#, equipped with
  • an abstract interpretation ai# of each built-in Ai.

Strictness analysis

The point of this analysis is to discover the missing piece: what is the strictness function fi# corresponding to each user-defined Fi? These strictness functions will show us exactly how each Fi is strict with respect to each of its arguments — and that's the information that tells us where we can replace lazy, CBN-style parameter passing with eager CBV.

But recall that the recursion equations show us how to build up each user-defined function, by composition and recursion, from all the built-in functions:

plus(x, y) = cond(eq(x, 0), y, plus(sub1(x), add1(y)))

So we can build up the fi# from the ai# in the same way:

plus#(x, y) = cond#(eq#(x, 0#), y, plus#(sub1#(x), add1#(y)))

Strictness analysis

We already know all the other strictness functions:

cond#(p, x, y) = p ∧ (x ∨ y)
eq#(x, y) = x ∧ y
0# = 1
sub1#(x) = x
add1#(x) = x

So we can use these to simplify the expression for plus#.

Strictness analysis Strictness analysis

plus#(x, y) = cond#(eq#(x, 0#), y, plus#(sub1#(x), add1#(y)))
            = eq#(x, 0#) ∧ (y ∨ plus#(sub1#(x), add1#(y)))
            = eq#(x, 1) ∧ (y ∨ plus#(x, y))
            = x ∧ 1 ∧ (y ∨ plus#(x, y))
            = x ∧ (y ∨ plus#(x, y))

Strictness analysis

plus#(x, y) = x ∧ (y ∨ plus#(x, y))

This is a recursive definition, and so unfortunately doesn't provide us with the strictness function directly. We want a definition of plus# which satisfies this equation — actually we want the least fixed point of this equation, which (as ever!) we can compute iteratively.

Algorithm Algorithm

for i = 1 to n do f#[i] := λx.0
while (f#[] changes) do
  for i = 1 to n do
    f#[i] := λx.ei#

ei# means "ei (from the recursion equations) with each Aj replaced with aj# and each Fj replaced with f#[j]".

We have only one user-defined function, plus, and so only one recursion equation:

plus(x, y) = cond(eq(x, 0), y, plus(sub1(x), add1(y)))

We initialise the corresponding element of our f#[] array to contain the always-0 strictness function of the appropriate arity:

f#[1] := λx,y. 0
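This iteration can be carried out mechanically; a small Python sketch (illustrative only, with strictness functions represented as Python functions over 0/1 values):

from itertools import product

def cond_s(p, x, y): return p & (x | y)     # cond#
def eq_s(x, y):      return x & y           # eq#
ZERO_S = 1                                   # 0#
def sub1_s(x):       return x                # sub1#
def add1_s(x):       return x                # add1#

def table(f):
    """Tabulate a two-argument strictness function over {0, 1}."""
    return {args: f(*args) for args in product((0, 1), repeat=2)}

plus_s = lambda x, y: 0                      # initial approximation: always 0
while True:
    prev = table(plus_s)
    # rebuild plus# from the recursion equation, using the current approximation
    plus_s = (lambda cur: lambda x, y:
              cond_s(eq_s(x, ZERO_S), y, cur(sub1_s(x), add1_s(y))))(plus_s)
    if table(plus_s) == prev:                # least fixed point reached
        break

print(table(plus_s))   # {(0,0): 0, (0,1): 0, (1,0): 0, (1,1): 1}, i.e. x ∧ y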

Algorithm Algorithm

On the first iteration, we calculate e1#:

• The recursion equations say
  e1 = cond(eq(x, 0), y, plus(sub1(x), add1(y)))
• The current contents of f#[] say f1# is λx,y. 0
• So:
  e1# = cond#(eq#(x, 0#), y, (λx,y. 0)(sub1#(x), add1#(y)))

Simplifying:
  e1# = cond#(eq#(x, 0#), y, 0)
Using definitions of cond#, eq# and 0#:
  e1# = (x ∧ 1) ∧ (y ∨ 0)
Simplifying again:
  e1# = x ∧ y

So, at the end of the first iteration,

f#[1] := λx,y. x ∧ y

On the second iteration, we recalculate e1#:

• The recursion equations still say
  e1 = cond(eq(x, 0), y, plus(sub1(x), add1(y)))
• The current contents of f#[] say f1# is λx,y. x ∧ y
• So:
  e1# = cond#(eq#(x, 0#), y, (λx,y. x ∧ y)(sub1#(x), add1#(y)))

Simplifying:
  e1# = cond#(eq#(x, 0#), y, sub1#(x) ∧ add1#(y))
Using definitions of cond#, eq#, 0#, sub1# and add1#:
  e1# = (x ∧ 1) ∧ (y ∨ (x ∧ y))
Simplifying again:
  e1# = x ∧ y

So, at the end of the second iteration,

f#[1] := λx,y. x ∧ y

This is the same result as last time, so we stop.

Algorithm Optimisation

So now, finally, we can see that

plus#(x, y) = x ∧ y

plus#(1, 0) = 1 ∧ 0 = 0
plus#(0, 1) = 0 ∧ 1 = 0

which means our concrete plus function is strict in its first argument and strict in its second argument: we may always safely use CBV when passing either.

Summary

• Functional languages can use CBV or CBN evaluation
• CBV is more efficient but can only be used in place of CBN if termination behaviour is unaffected
• Strictness shows dependencies of termination
• Abstract interpretation may be used to perform strictness analysis of user-defined functions
• The resulting strictness functions tell us when it is safe to use CBV in place of CBN

Lecture 11: Constraint-based analysis

Motivation

Intra-procedural analysis depends upon accurate control-flow information. In the presence of certain language features (e.g. indirect calls) it is nontrivial to predict accurately how control may flow at execution time — the naïve strategy is very imprecise. A constraint-based analysis called 0CFA can compute a more precise estimate of this information.

Constraint-based analysis

Many of the analyses in this course can be thought of in terms of solving systems of constraints. For example, in LVA, we generate equality constraints from each instruction in the program:

in-live(m) = (out-live(m) ∖ def(m)) ∪ ref(m)
out-live(m) = in-live(n) ∪ in-live(o)
in-live(n) = (out-live(n) ∖ def(n)) ∪ ref(n)
⋯

and then iteratively compute their minimal solution.

In the presence of certain language features (e.g. For example, in LVA, we generate equality constraints indirect calls) it is nontrivial to predict accurately from each instruction in the program: how control may flow at execution time — the naïve in-live(m) = (out-live(m) ∖ def(m)) ∪ ref(m) strategy is very imprecise. out-live(m) = in-live(n) ∪ in-live(o) in-live(n) = (out-live(n) ∖ def(n)) ∪ ref(n) A constraint-based analysis called 0CFA can compute a ⋯ more precise estimate of this information. and then iteratively compute their minimal solution.

0CFA

0CFA — "zeroth-order control-flow analysis" — is a constraint-based analysis for discovering which values may reach different places in a program. When functions (or pointers to functions) are present, this provides information about which functions may potentially be called at each call site. We can then build a more precise call graph.

Specimen language

Functional languages are a good candidate for this kind of analysis; they have functions as first-class values, so control flow may be complex. We will use a minimal syntax for expressions:

e ::= x | c | λx. e | let x = e1 in e2

A program in this language is a closed expression.

Specimen program

let id = λx. x in id id 7

[Figure: the syntax tree of this program — a let node binding id to λx. x, with body (id id) 7.]

Program points

(let id2 = (λx4. x5)3 in ((id8 id9)7 710)6)1

Each program point i has an associated flow variable αi. Each αi represents the set of flow values which may be yielded at program point i during execution. For this language the flow values are integers and function closures; in this particular program, the only values available are 710 and (λx4. x5)3.

Generating constraints

The precise value of each αi is undecidable in general, so our analysis will compute a safe overapproximation. From the structure of the program we can generate a set of constraints on the flow variables, which we can then treat as data-flow inequations and iteratively compute their least solution.

(let id2 = (λx4. x5)3 in ((id8 id9)7 710)6)1

Each form of expression generates constraints:

• A constant ca generates αa ⊇ { ca }.
  Here, 710 generates α10 ⊇ { 710 }.

• A λ-abstraction (λxb. e)c generates αc ⊇ { (λxb. e)c }.
  Here, (λx4. x5)3 generates α3 ⊇ { (λx4. x5)3 }.

• A variable use, λxb. ... xa ... or let xb = ... xa ..., generates αa ⊇ αb.
  Here, λx4. ... x5 ... generates α5 ⊇ α4, and let id2 = ... id8 ... and let id2 = ... id9 ... generate α8 ⊇ α2 and α9 ⊇ α2.

• A let expression (let _a = _b in _c)d generates αa ⊇ αb and αd ⊇ αc.
  Here, (let id2 = _3 in _6)1 generates α2 ⊇ α3 and α1 ⊇ α6.

• An application (_a _b)c generates (αb ↦ αc) ⊇ αa.
  Here, (_8 _9)7 generates (α9 ↦ α7) ⊇ α8, and (_7 _10)6 generates (α10 ↦ α6) ⊇ α7.

The complete constraint set is:

α10 ⊇ { 710 }            α8 ⊇ α2        α2 ⊇ α3
α3 ⊇ { (λx4. x5)3 }      α9 ⊇ α2        (α9 ↦ α7) ⊇ α8
α5 ⊇ α4                  α1 ⊇ α6        (α10 ↦ α6) ⊇ α7
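A small Python sketch (illustrative; the representation is invented here) of iteratively solving such constraints: simple inclusions are propagated, and each application constraint (αb ↦ αc) ⊇ αa expands into new inclusions whenever a closure appears in αa.

def solve_0cfa(consts, subsets, applications, points):
    """consts: {point: set of labelled values seeded at that point}
       subsets: set of (lhs, rhs) meaning alpha_lhs ⊇ alpha_rhs
       applications: set of (a, b, c) meaning (alpha_b ↦ alpha_c) ⊇ alpha_a
       Closure values are tuples ('lam', binder_point, body_point, label)."""
    alpha = {p: set(consts.get(p, set())) for p in points}
    subs = set(subsets)
    changed = True
    while changed:
        changed = False
        for a, b, c in applications:
            for val in list(alpha[a]):
                if val[0] == 'lam':
                    _, binder, body, _ = val
                    # the argument flows to the binder; the body's result flows out
                    for edge in ((binder, b), (c, body)):
                        if edge not in subs:
                            subs.add(edge)
                            changed = True
        for lhs, rhs in subs:
            if not alpha[rhs] <= alpha[lhs]:
                alpha[lhs] |= alpha[rhs]
                changed = True
    return alpha

lam = ('lam', 4, 5, 3)                   # (λx4. x5)3
alpha = solve_0cfa(
    consts={10: {('int', 7, 10)}, 3: {lam}},
    subsets={(5, 4), (8, 2), (9, 2), (2, 3), (1, 6)},
    applications={(8, 9, 7), (7, 10, 6)},
    points=range(1, 11))
print(alpha[1])   # both (λx4. x5)3 and 710 may reach point 1, as on the slides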

Solving constraints

We compute the least solution iteratively, starting with every αi = { }. The constant and λ constraints seed α10 = { 710 } and α3 = { (λx4. x5)3 }, and the simple inclusions propagate these values. Whenever the closure (λx4. x5)3 flows into the function position of an application, the application constraint generates new inclusions:

from (α9 ↦ α7) ⊇ α8:    α4 ⊇ α9  and  α7 ⊇ α5
from (α10 ↦ α6) ⊇ α7:   α4 ⊇ α10  and  α6 ⊇ α5

Iterating until nothing changes gives the least solution:

α1 = { (λx4. x5)3, 710 }      α6 = { (λx4. x5)3, 710 }
α2 = { (λx4. x5)3 }           α7 = { (λx4. x5)3, 710 }
α3 = { (λx4. x5)3 }           α8 = { (λx4. x5)3 }
α4 = { (λx4. x5)3, 710 }      α9 = { (λx4. x5)3 }
α5 = { (λx4. x5)3, 710 }      α10 = { 710 }

Using solutions

α10 = { 710 }
α2, α3, α8, α9 = { (λx4. x5)3 }
α1, α4, α5, α6, α7 = { (λx4. x5)3, 710 }

(let id2 = (λx4. x5)3 in ((id8 id9)7 710)6)1

1CFA

0CFA is still imprecise because it is monovariant: each expression has only one flow variable associated with it, so multiple calls to the same function allow multiple values into the single flow variable for the function body, and these values "leak out" at all potential call sites. A better approximation is given by 1CFA ("first-order control-flow analysis"), in which a function has a separate flow variable for each call site in the program; this isolates separate calls to the same function, and so produces a more precise result.

• Many analyses can be formulated using constraints 1CFA is a polyvariant approach. • 0CFA is a constraint-based analysis Another alternative is to use a polymorphic • Inequality constraints are generated from the syntax approach, in which the values themselves are of a program enriched to support specialisation at different call A minimal solution to the constraints provides a safe sites (cf. ML polymorphic types). • approximation to dynamic control-flow behaviour It’s unclear which approach is “best”. • Polyvariant (as in 1CFA) and polymorphic approaches may improve precision

Lecture 12: Inference-based analysis

Motivation

In this part of the course we’re examining several methods of higher-level program analysis. We have so far seen abstract interpretation and constraint-based analysis, two general frameworks for formally specifying (and performing) analyses of programs.

Another alternative framework is inference-based analysis.

Inference-based analysis

Inference systems consist of sets of rules for determining program properties. Typically such a property of an entire program depends recursively upon the properties of the program’s subexpressions; inference systems can directly express this relationship, and show how to recursively compute the property.

Inference-based analysis

An inference system specifies judgements:

Γ ⊢ e : φ

where

• e is an expression (e.g. a complete program)
• Γ is a set of assumptions about free variables of e
• φ is a program property

Type systems

Consider the ML type system, for example. This particular inference system specifies judgements about a well-typedness property:

Γ ⊢ e : t

means “under the assumptions in Γ, the expression e has type t”.

Type systems

We will avoid the more complicated ML typing issues (see Types course for details) and just consider the expressions of the lambda calculus:

e ::= x | λx. e | e1 e2

Our program properties are types t:

t ::= α | int | t1 → t2

Type systems

Γ is a set of type assumptions of the form

{ x1 : t1, ..., xn : tn }

where each identifier xi is assumed to have type ti. We write

Γ[x : t]

to mean Γ with the additional assumption that x has type t (overriding any other assumption about x).

Type systems

In all inference systems, we use a set of rules to inductively define which judgements are valid. In a type system, these are the typing rules.

Type systems

(Var)   Γ[x : t] ⊢ x : t

(Lam)   Γ[x : t] ⊢ e : t′
        ─────────────────
        Γ ⊢ λx.e : t → t′

(App)   Γ ⊢ e1 : t → t′    Γ ⊢ e2 : t
        ─────────────────────────────
        Γ ⊢ e1 e2 : t′

Type systems

Γ = { 2 : int, add : int → int → int, multiply : int → int → int }
e = λx. λy. add (multiply 2 x) y
t = ?

Type systems

Γ = { 2 : int, add : int → int → int, multiply : int → int → int }
e = λx. λy. add (multiply 2 x) y
t = int → int → int

Γ[x : int][y : int] ⊢ add : int → int → int    Γ[x : int][y : int] ⊢ multiply 2 x : int
Γ[x : int][y : int] ⊢ add (multiply 2 x) : int → int    Γ[x : int][y : int] ⊢ y : int
Γ[x : int][y : int] ⊢ add (multiply 2 x) y : int
Γ[x : int] ⊢ λy. add (multiply 2 x) y : int → int
Γ ⊢ λx. λy. add (multiply 2 x) y : int → int → int

Optimisation

In the absence of a compile-time type checker, all values must be tagged with their types and run-time checks must be performed to ensure types match appropriately.

If a type system has shown that the program is well-typed, execution can proceed safely without these tags and checks; if necessary, the final result of evaluation can be tagged with its inferred type.

Hence the final result of evaluation is identical, but less run-time computation is required to produce it.
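The derivation above can be reproduced by a small checker. The following is a minimal sketch, not course code, of checking judgements Γ ⊢ e : t with the (Var), (Lam) and (App) rules; to keep it simple, each λ is annotated with its argument type (the slides leave this implicit), and the tuple encoding of expressions and types is an assumption of this sketch.

def check(gamma, e):
    kind = e[0]
    if kind == "var":                       # (Var): look the identifier up in Γ
        return gamma[e[1]]
    if kind == "lam":                       # (Lam): check the body under Γ[x : t]
        _, x, targ, body = e
        return ("fun", targ, check({**gamma, x: targ}, body))
    if kind == "app":                       # (App): the argument must match the domain
        _, e1, e2 = e
        tf, ta = check(gamma, e1), check(gamma, e2)
        if tf[0] != "fun" or tf[1] != ta:
            raise TypeError("ill-typed application")
        return tf[2]
    raise ValueError("unknown expression")

# Γ = { 2 : int, add : int → int → int, multiply : int → int → int }
int2 = ("fun", "int", ("fun", "int", "int"))
gamma = {"2": "int", "add": int2, "multiply": int2}
# e = λx. λy. add (multiply 2 x) y
e = ("lam", "x", "int",
     ("lam", "y", "int",
      ("app", ("app", ("var", "add"),
               ("app", ("app", ("var", "multiply"), ("var", "2")), ("var", "x"))),
       ("var", "y"))))
print(check(gamma, e))    # ('fun', 'int', ('fun', 'int', 'int')), i.e. int → int → int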

Safety

The safety condition for this inference system is

({} ⊢ e : t) ⇒ ([[e]] ∈ [[t]])

where [[e]] and [[t]] are the denotations of e and t respectively: [[e]] is the value obtained by evaluating e, and [[t]] is the set of all values of type t.

This condition asserts that the run-time behaviour of the program will agree with the type system’s prediction.

Odds and evens

Type-checking is just one application of inference-based program analysis. The properties do not have to be types; in particular, they can carry more (or completely different!) information than traditional types do.

We’ll consider a more program-analysis–related example: detecting odd and even numbers.

Odds and evens

This time, the program property φ has the form

φ ::= odd | even | φ1 → φ2

(Var)   Γ[x : φ] ⊢ x : φ

(Lam)   Γ[x : φ] ⊢ e : φ′
        ─────────────────
        Γ ⊢ λx.e : φ → φ′

(App)   Γ ⊢ e1 : φ → φ′    Γ ⊢ e2 : φ
        ─────────────────────────────
        Γ ⊢ e1 e2 : φ′

Odds and evens

Γ = { 2 : even, add : even → even → even, multiply : even → odd → even }
e = λx. λy. add (multiply 2 x) y
φ = odd → even → even

Γ[x : odd][y : even] ⊢ add : even → even → even    Γ[x : odd][y : even] ⊢ multiply 2 x : even
Γ[x : odd][y : even] ⊢ add (multiply 2 x) : even → even    Γ[x : odd][y : even] ⊢ y : even
Γ[x : odd][y : even] ⊢ add (multiply 2 x) y : even
Γ[x : odd] ⊢ λy. add (multiply 2 x) y : even → even
Γ ⊢ λx. λy. add (multiply 2 x) y : odd → even → even
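Because the (Var), (Lam) and (App) rules are unchanged, the checker sketched after the typing example works here too; only Γ and the λ annotations differ. Again this reuse is an illustration under that sketch's assumptions, not course code.

gamma = {"2":        "even",
         "add":      ("fun", "even", ("fun", "even", "even")),
         "multiply": ("fun", "even", ("fun", "odd",  "even"))}
e = ("lam", "x", "odd",
     ("lam", "y", "even",
      ("app", ("app", ("var", "add"),
               ("app", ("app", ("var", "multiply"), ("var", "2")), ("var", "x"))),
       ("var", "y"))))
print(check(gamma, e))    # ('fun', 'odd', ('fun', 'even', 'even')), i.e. odd → even → even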

Safety

The safety condition for this inference system is

({} ⊢ e : φ) ⇒ ([[e]] ∈ [[φ]])

where [[φ]] is the denotation of φ:

[[odd]]      = { z ∈ ℤ | z is odd }
[[even]]     = { z ∈ ℤ | z is even }
[[φ1 → φ2]]  = [[φ1]] → [[φ2]]

Richer properties

Note that if we want to show a judgement like

Γ ⊢ λx. λy. add (multiply 2 x) (multiply 3 y) : even → even → even

we need more than one assumption about multiply:

Γ = { ..., multiply : even → even → even, multiply : odd → even → even, ... }

Richer properties

This might be undesirable, and one alternative is to enrich our properties instead; in this case we could allow conjunction inside properties, so that our single assumption about multiply looks like:

multiply : even → even → even ∧
           even → odd  → even ∧
           odd  → even → even ∧
           odd  → odd  → odd

We would need to modify the inference system to handle these richer properties.

Summary

• Inference-based analysis is another useful framework
• Inference rules are used to produce judgements about programs and their properties
• Type systems are the best-known example
• Richer properties give more detailed information
• An inference system used for analysis has an associated safety condition

Lecture 13: Effect systems

Motivation

We have so far seen many analyses which deal with control- and data-flow properties of pure languages. However, many languages contain operations with side-effects, so we must also be able to analyse and safely transform these impure programs.

Effect systems, a form of inference-based analysis, are often used for this purpose.

Side-effects

A side-effect is some event (typically a change of state) which occurs as a result of evaluating an expression.

• “x++” changes the value of variable x.
• “malloc(42)” allocates some memory.
• “print 42” outputs a value to a stream.

Side-effects

As an example language, we will use the lambda calculus extended with read and write operations on “channels”.

e ::= x | λx.e | e1 e2 | ξ?x.e | ξ!e1.e2

• ξ represents some channel name.
• ξ?x.e reads an integer from the channel named ξ, binds it to x, and returns the result of evaluating e.
• ξ!e1.e2 evaluates e1, writes the resulting integer to channel ξ, and returns the result of evaluating e2.

Side-effects

Some example expressions:

ξ?x. x         read an integer from channel ξ and return it
ξ!x. y         write the (integer) value of x to channel ξ and return the value of y
ξ?x. ζ!x. x    read an integer from channel ξ, write it to channel ζ and return it

Side-effects

Ignoring their side-effects, the typing rules for these new operations are straightforward.

Side-effects

(Read)    Γ[x : int] ⊢ e : t
          ──────────────────
          Γ ⊢ ξ?x.e : t

(Write)   Γ ⊢ e1 : int    Γ ⊢ e2 : t
          ──────────────────────────
          Γ ⊢ ξ!e1.e2 : t

Effect systems

However, in order to perform any transformations on a program in this language it would be necessary to pay attention to its potential side-effects.

For example, we might need to devise an analysis to tell us which channels may be read or written during evaluation of an expression.

We can do this by modifying our existing type system to create an effect system (or “type and effect system”).

First we must formally define our effects: For example:

An expression has effects F. F = { R" } F is a set containing elements of the form "?x. x

read from channel " R" "!x. y F = { W" }

write to channel " W" "?x. #!x. x F = { R", W# }

Effect systems

But we also need to be able to handle expressions like

λx. ξ!x. x

whose evaluation doesn’t have any immediate effects. In this case, the effect Wξ may occur later, whenever this newly-created function is applied.

Effect systems

To handle these latent effects we extend the syntax of types so that function types are annotated with the effects that may occur when a function is applied:

t ::= int | t1 →F t2

Effect systems

So, although it has no immediate effects, the type of

λx. ξ!x. x

is

int →{Wξ} int

Effect systems

We can now modify the existing type system to make an effect system: an inference system which produces judgements about the type and effects of an expression:

Γ ⊢ e : t, F

Effect systems

(Read)    Γ[x : int] ⊢ e : t, F
          ──────────────────────────
          Γ ⊢ ξ?x.e : t, {Rξ} ∪ F

(Write)   Γ ⊢ e1 : int, F    Γ ⊢ e2 : t, F′
          ──────────────────────────────────
          Γ ⊢ ξ!e1.e2 : t, F ∪ {Wξ} ∪ F′

(Var)     Γ[x : t] ⊢ x : t, {}

(Lam)     Γ[x : t] ⊢ e : t′, F
          ──────────────────────
          Γ ⊢ λx.e : t →F t′, {}

(App)     Γ ⊢ e1 : t →F t′, F′    Γ ⊢ e2 : t, F″
          ──────────────────────────────────────
          Γ ⊢ e1 e2 : t′, F ∪ F′ ∪ F″

Effect systems

{x : int, y : int} ⊢ x : int, {}
{x : int, y : int} ⊢ ξ!x. x : int, {Wξ}
{y : int} ⊢ λx. ξ!x. x : int →{Wξ} int, {}    {y : int} ⊢ y : int, {}
{y : int} ⊢ (λx. ξ!x. x) y : int, {Wξ}
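A type-and-effect judgement Γ ⊢ e : t, F can be computed in the same bottom-up way. The sketch below is not course code: it assumes a tuple encoding of expressions, and represents an annotated function type t1 →F t2 as ("fun", t1, F, t2).

def infer(gamma, e):
    kind = e[0]
    if kind == "int":
        return "int", set()
    if kind == "var":                        # (Var): no effects
        return gamma[e[1]], set()
    if kind == "lam":                        # (Lam): the body's effects become latent
        _, x, targ, body = e
        t, f = infer({**gamma, x: targ}, body)
        return ("fun", targ, f, t), set()
    if kind == "app":                        # (App): latent effects are released
        _, e1, e2 = e
        (t1, f1), (t2, f2) = infer(gamma, e1), infer(gamma, e2)
        assert t1[0] == "fun" and t1[1] == t2
        return t1[3], f1 | f2 | t1[2]
    if kind == "read":                       # (Read): adds Rξ
        _, xi, x, body = e
        t, f = infer({**gamma, x: "int"}, body)
        return t, {("R", xi)} | f
    if kind == "write":                      # (Write): adds Wξ
        _, xi, e1, e2 = e
        (t1, f1), (t2, f2) = infer(gamma, e1), infer(gamma, e2)
        assert t1 == "int"
        return t2, f1 | {("W", xi)} | f2

# (λx. ξ!x. x) y  under {y : int}
e = ("app",
     ("lam", "x", "int", ("write", "xi", ("var", "x"), ("var", "x"))),
     ("var", "y"))
print(infer({"y": "int"}, e))    # ('int', {('W', 'xi')})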

Effect subtyping

We would probably want more expressive control structure in a real programming language. For example, we could add if-then-else:

e ::= x | λx.e | e1 e2 | ξ?x.e | ξ!e1.e2 | if e1 then e2 else e3

Effect subtyping

(Cond)    Γ ⊢ e1 : int, F    Γ ⊢ e2 : t, F′    Γ ⊢ e3 : t, F″
          ───────────────────────────────────────────────────
          Γ ⊢ if e1 then e2 else e3 : t, F ∪ F′ ∪ F″

However, there are some valid uses of if-then-else which this rule cannot handle by itself.

Effect subtyping

if x then λx. ξ!3. x + 1 else λx. x + 2

Here the first branch has type int →{Wξ} int but the second has type int →{} int, so the (Cond) rule cannot be applied.

Effect subtyping

We can solve this problem by adding a new rule to handle subtyping.

(Sub)     Γ ⊢ e : t →F′ t′, F    F′ ⊆ F″
          ──────────────────────────────
          Γ ⊢ e : t →F″ t′, F

Effect subtyping

if x then λx. ξ!3. x + 1 else λx. x + 2

Using (Sub), the second branch λx. x + 2 (of type int →{} int) can also be given the type int →{Wξ} int, since {} ⊆ {Wξ}; both branches then have the same type and the conditional can be typed.

Optimisation

The information discovered by the effect system is useful when deciding whether particular transformations are safe.

An expression with no immediate side-effects is referentially transparent: it can safely be replaced with another expression (with the same value and type) with no change to the semantics of the program.

For example, referentially transparent expressions may safely be removed if LVA says they are dead.

Safety

({} ⊢ e : t, F) ⇒ (v ∈ [[t]] ∧ f ⊆ F, where (v, f) = [[e]])

Extra structure

In this analysis we are using sets of effects.

As a result, we aren’t collecting any information about how many times each effect may occur, or the order in which they may happen.

ξ?x. ζ!x. x         F = { Rξ, Wζ }
ζ!y. ξ?x. x         F = { Rξ, Wζ }
ζ!y. ξ?x. ζ!x. x    F = { Rξ, Wζ }

Extra structure

If we use a different representation of effects, and use different operations on them, we can keep track of more information. One option is to use sequences of effects and use an append operation when combining them.

(Read)    Γ[x : int] ⊢ e : t, F
          ──────────────────────────
          Γ ⊢ ξ?x.e : t, ⟨Rξ⟩ @ F

(Write)   Γ ⊢ e1 : int, F    Γ ⊢ e2 : t, F′
          ──────────────────────────────────
          Γ ⊢ ξ!e1.e2 : t, F @ ⟨Wξ⟩ @ F′

Extra structure

In the new system, these expressions all have different effects:

ξ?x. ζ!x. x         F = ⟨ Rξ; Wζ ⟩
ζ!y. ξ?x. x         F = ⟨ Wζ; Rξ ⟩
ζ!y. ξ?x. ζ!x. x    F = ⟨ Wζ; Rξ; Wζ ⟩
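Only the combining operations change when moving from sets to sequences: set union becomes append. A small variant of the earlier sketch (again an illustration under the same assumptions, not course code):

def infer_seq(gamma, e):
    kind = e[0]
    if kind == "int":
        return "int", []
    if kind == "var":
        return gamma[e[1]], []
    if kind == "lam":
        _, x, targ, body = e
        t, f = infer_seq({**gamma, x: targ}, body)
        return ("fun", targ, f, t), []
    if kind == "app":
        _, e1, e2 = e
        (t1, f1), (t2, f2) = infer_seq(gamma, e1), infer_seq(gamma, e2)
        return t1[3], f1 + f2 + t1[2]
    if kind == "read":                       # ⟨Rξ⟩ @ F
        _, xi, x, body = e
        t, f = infer_seq({**gamma, x: "int"}, body)
        return t, [("R", xi)] + f
    if kind == "write":                      # F @ ⟨Wξ⟩ @ F′
        _, xi, e1, e2 = e
        (t1, f1), (t2, f2) = infer_seq(gamma, e1), infer_seq(gamma, e2)
        return t2, f1 + [("W", xi)] + f2

# ζ!y. ξ?x. ζ!x. x  now yields the sequence ⟨Wζ; Rξ; Wζ⟩ rather than the set {Rξ, Wζ}
e = ("write", "zeta", ("var", "y"),
     ("read", "xi", "x", ("write", "zeta", ("var", "x"), ("var", "x"))))
print(infer_seq({"y": "int"}, e)[1])   # [('W', 'zeta'), ('R', 'xi'), ('W', 'zeta')]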

Extra structure

Whether we use sequences instead of sets depends upon whether we care about the order and number of effects. In the channel example, we probably don’t.

But if we were tracking file accesses, it would be important to ensure that no further read or write effects occurred after a file had been closed. And if we were tracking memory allocation, we would want to ensure that no block of memory got deallocated twice.

Summary

• Effect systems are a form of inference-based analysis
• Side-effects occur when expressions are evaluated
• Function types must be annotated to account for latent effects
• A type system can be modified to produce judgements about both types and effects
• Subtyping may be required to handle annotated types
• Different effect structures may give more information

Lecture 14: Instruction scheduling

character stream → token stream → parse tree → intermediate code → target code
(optimisation is applied to the parse tree, the intermediate code and the target code; decompilation runs from target code back towards intermediate code)

Motivation

We have seen optimisation techniques which involve removing and reordering code at both the source- and intermediate-language levels in an attempt to achieve the smallest and fastest correct program.

These techniques are platform-independent, and pay little attention to the details of the target architecture.

We can improve target code if we consider the architectural characteristics of the target processor.

Single-cycle implementation

In single-cycle processor designs, an entire instruction is executed in a single clock cycle. Each instruction will use some of the processor’s functional units:

Instruction fetch (IF) → Register fetch (RF) → Execute (EX) → Memory access (MEM) → Register write-back (WB)

For example, a load instruction uses all five.

Single-cycle implementation

lw $1,0($0)    IF RF EX MEM WB
lw $2,4($0)                     IF RF EX MEM WB
lw $3,8($0)                                      IF RF EX MEM WB

Single-cycle implementation

On these processors, the order of instructions doesn’t make any difference to execution time: each instruction takes one clock cycle, so n instructions will take n cycles and can be executed in any (correct) order.

In this case we can naïvely translate our optimised 3-address code by expanding each intermediate instruction into the appropriate sequence of target instructions; clever reordering is unlikely to yield any benefits.

Pipelined implementation

In pipelined processor designs (e.g. MIPS R2000), each functional unit works independently and does its job in a single clock cycle, so different functional units can be handling different instructions simultaneously.

These functional units are arranged in a pipeline, and the result from each unit is passed to the next one via a pipeline register before the next clock cycle.

Pipelined implementation

lw $1,0($0)    IF RF EX MEM WB
lw $2,4($0)       IF RF EX MEM WB
lw $3,8($0)          IF RF EX MEM WB

Pipelined implementation

In this multicycle design the clock cycle is much shorter (one functional unit vs. one complete instruction) and ideally we can still execute one instruction per cycle when the pipeline is full. Programs will therefore execute more quickly.

Pipeline hazards

However, it is not always possible to run the pipeline at full capacity. Some situations prevent the next instruction from executing in the next clock cycle: this is a pipeline hazard.

On interlocked hardware (e.g. SPARC) a hazard will cause a pipeline stall; on non-interlocked hardware (e.g. MIPS) the compiler must generate explicit NOPs to avoid errors.

Consider data hazards: these occur when an instruction depends upon the result of an earlier one.

add $3,$1,$2    IF RF EX MEM WB
add $5,$3,$4       STALL      IF RF EX ...

The pipeline must stall until the result of the first add has been written back into register $3.

Pipeline hazards

The severity of this effect can be reduced by using feed-forwarding: extra paths are added between functional units, allowing data to be used before it has been written back into registers.

add $3,$1,$2    IF RF EX MEM WB
add $5,$3,$4       IF RF EX MEM WB

Pipeline hazards

But even when feed-forwarding is used, some combinations of instructions will always result in a stall.

lw  $1,0($0)    IF RF EX MEM WB
add $3,$1,$2       STALL IF RF EX MEM WB

Instruction order

Since particular combinations of instructions cause this problem on pipelined architectures, we can achieve better performance by reordering instructions where possible.

Instruction order

lw  $1,0($0)    IF RF EX MEM WB
add $2,$2,$1       STALL IF RF EX MEM WB
lw  $3,4($0)                IF RF EX MEM WB
add $4,$4,$3                   STALL IF RF EX MEM WB

10 cycles

Instruction order

lw  $1,0($0)    IF RF EX MEM WB
lw  $3,4($0)       IF RF EX MEM WB
add $2,$2,$1          IF RF EX MEM WB
add $4,$4,$3             IF RF EX MEM WB

8 cycles

Instruction dependencies

We can only reorder target-code instructions if the meaning of the program is preserved.

We must therefore identify and respect the data dependencies which exist between instructions.

In particular, whenever an instruction is dependent upon an earlier one, the order of these two instructions must not be reversed.

Instruction dependencies

There are three kinds of data dependency:

• Read after write
• Write after read
• Write after write

Whenever one of these dependencies exists between two instructions, we cannot safely permute them.

Instruction dependencies

Read after write: an instruction reads from a location after an earlier instruction has written to it.

  add $3,$1,$2          add $4,$4,$3
  ⋯                     ⋯
  add $4,$4,$3          add $3,$1,$2    (reads old value ✗)

Instruction dependencies

Write after read: an instruction writes to a location after an earlier instruction has read from it.

  add $4,$4,$3          add $3,$1,$2
  ⋯                     ⋯
  add $3,$1,$2          add $4,$4,$3    (reads new value ✗)

Write after write: an instruction writes to a location after an earlier instruction has written to it.

  add $3,$1,$2          add $3,$4,$5
  ⋯                     ⋯
  add $3,$4,$5          add $3,$1,$2    (writes old value ✗)

Instruction scheduling

We would like to reorder the instructions within each basic block in a way which

• preserves the dependencies between those instructions (and hence the correctness of the program), and
• achieves the minimum possible number of pipeline stalls.

We can address these two goals separately.

Preserving dependencies

Firstly, we can construct a directed acyclic graph (DAG) to represent the dependencies between instructions (a small sketch of this construction appears below):

• For each instruction in the basic block, create a corresponding vertex in the graph.
• For each dependency between two instructions, create a corresponding edge in the graph.
  ‣ This edge is directed: it goes from the earlier instruction to the later one.

Preserving dependencies

1  lw $1,0($0)
2  lw $2,4($0)
3  add $3,$1,$2
4  sw $3,12($0)
5  lw $4,8($0)
6  add $3,$1,$4
7  sw $3,16($0)

[DAG over nodes 1–7; its edges include 1→3, 2→3, 3→4, 1→6, 4→6, 5→6 and 6→7]

Any topological sort of this DAG (i.e. any linear ordering of the vertices which keeps all the edges “pointing forwards”) will maintain the dependencies and hence preserve the correctness of the program.
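A sketch of the DAG construction, not taken from the course materials: each instruction is represented as a (text, defs, uses) triple, and an edge i → j is added whenever a read-after-write, write-after-read or write-after-write dependency holds between an earlier instruction i and a later instruction j. A naive pairwise scan like this also adds some transitively-implied edges (for instance 3 → 7 below); they are redundant but harmless, since they do not change the set of topological sorts.

def build_dag(block):
    edges = set()
    for j in range(len(block)):                  # later instruction
        _, defs_j, uses_j = block[j]
        for i in range(j):                       # earlier instruction
            _, defs_i, uses_i = block[i]
            raw = defs_i & uses_j                # j reads what i wrote
            war = uses_i & defs_j                # j overwrites what i read
            waw = defs_i & defs_j                # j overwrites what i wrote
            if raw or war or waw:
                edges.add((i + 1, j + 1))        # instructions numbered from 1
    return edges

block = [("lw $1,0($0)",  {"$1"}, {"$0"}),
         ("lw $2,4($0)",  {"$2"}, {"$0"}),
         ("add $3,$1,$2", {"$3"}, {"$1", "$2"}),
         ("sw $3,12($0)", set(),  {"$3", "$0"}),
         ("lw $4,8($0)",  {"$4"}, {"$0"}),
         ("add $3,$1,$4", {"$3"}, {"$1", "$4"}),
         ("sw $3,16($0)", set(),  {"$3", "$0"})]
print(sorted(build_dag(block)))
# [(1, 3), (1, 6), (2, 3), (3, 4), (3, 6), (3, 7), (4, 6), (5, 6), (6, 7)]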

Preserving dependencies

The topological sorts of this DAG are:

1, 2, 3, 4, 5, 6, 7      5, 1, 2, 3, 4, 6, 7
2, 1, 3, 4, 5, 6, 7      2, 1, 3, 5, 4, 6, 7
1, 2, 3, 5, 4, 6, 7      2, 1, 5, 3, 4, 6, 7
1, 2, 5, 3, 4, 6, 7      2, 5, 1, 3, 4, 6, 7
1, 5, 2, 3, 4, 6, 7      5, 2, 1, 3, 4, 6, 7

Minimising stalls

Secondly, we want to choose an instruction order which causes the fewest possible pipeline stalls.

Unfortunately, this problem is (as usual) NP-complete and hence difficult to solve in a reasonable amount of time for realistic quantities of instructions.

However, we can devise some static scheduling heuristics to help guide us; we will hence choose a sensible and reasonably optimal instruction order, if not necessarily the absolute best one possible.

Minimising stalls

Each time we’re emitting the next instruction, we should try to choose one which:

• does not conflict with the previous emitted instruction
• is most likely to conflict if first of a pair (e.g. prefer lw to add)
• is as far away as possible (along paths in the DAG) from an instruction which can validly be scheduled last

Algorithm

Armed with the scheduling DAG and the static scheduling heuristics, we can now devise an algorithm to perform instruction scheduling.

Algorithm

• Construct the scheduling DAG.
  ‣ We can do this in O(n²) by scanning backwards through the basic block and adding edges as dependencies arise.
• Initialise the candidate list to contain the minimal elements of the DAG.
• While the candidate list is non-empty:
  • If possible, emit a candidate instruction satisfying all three of the static scheduling heuristics;
  • if no instruction satisfies all the heuristics, either emit NOP (on MIPS) or an instruction satisfying only the last two heuristics (on SPARC).
  • Remove the instruction from the DAG and insert the newly minimal elements into the candidate list.

(A small sketch of this loop appears after the worked example below.)

Algorithm

Working through the example basic block, the candidate list and the emitted instruction at each step are:

Candidates { 1, 2, 5 }    emit 1    lw $1,0($0)
Candidates { 2, 5 }       emit 2    lw $2,4($0)
Candidates { 3, 5 }       emit 5    lw $4,8($0)
Candidates { 3 }          emit 3    add $3,$1,$2
Candidates { 4 }          emit 4    sw $3,12($0)
Candidates { 6 }          emit 6    add $3,$1,$4
Candidates { 7 }          emit 7    sw $3,16($0)

Algorithm

Original code:            Scheduled code:
1  lw $1,0($0)            1  lw $1,0($0)
2  lw $2,4($0)            2  lw $2,4($0)
3  add $3,$1,$2           5  lw $4,8($0)
4  sw $3,12($0)           3  add $3,$1,$2
5  lw $4,8($0)            4  sw $3,12($0)
6  add $3,$1,$4           6  add $3,$1,$4
7  sw $3,16($0)           7  sw $3,16($0)

2 stalls                  no stalls
13 cycles                 11 cycles
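The scheduling loop itself can be sketched as follows (not course code). It reuses build_dag and block from the sketch above, and simplifies the heuristics: an instruction “conflicts” with the previously emitted one if the previous instruction is a load whose destination it uses; loads are preferred as the first of a pair; remaining ties are broken by the longest path to the end of the DAG.

def schedule(block):
    edges = build_dag(block)
    n = len(block)
    succs = {i: {b for (a, b) in edges if a == i} for i in range(1, n + 1)}
    preds = {i: {a for (a, b) in edges if b == i} for i in range(1, n + 1)}

    def height(i):                               # longest path from i to a sink
        return 1 + max((height(j) for j in succs[i]), default=0)

    def conflicts(prev, i):                      # would i stall right after prev?
        if prev is None:
            return False
        return block[prev - 1][0].startswith("lw") and \
               bool(block[prev - 1][1] & block[i - 1][2])

    emitted, remaining, prev = [], set(range(1, n + 1)), None
    while remaining:
        candidates = [i for i in remaining if not (preds[i] & remaining)]
        best = max(candidates, key=lambda i: (not conflicts(prev, i),
                                              block[i - 1][0].startswith("lw"),
                                              height(i)))
        emitted.append(best)
        remaining.remove(best)
        prev = best
    return emitted

print(schedule(block))    # e.g. [1, 2, 5, 3, 4, 6, 7], the stall-free order above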

Dynamic scheduling

Instruction scheduling is important for getting the best performance out of a processor; if the compiler does a bad job (or doesn’t even try), performance will suffer.

As a result, modern processors (e.g. Intel Pentium) have dedicated hardware for performing instruction scheduling dynamically as the code is executing.

This may appear to render compile-time scheduling rather redundant. But:

• This is still compiler technology, just increasingly being implemented in hardware.
• Somebody (now hardware designers) must still understand the principles.
• Embedded processors may not do dynamic scheduling, or may have the option to turn the feature off completely to save power, so it’s still worth doing at compile-time.

Summary

• Instruction pipelines allow a processor to work on executing several instructions at once
• Pipeline hazards cause stalls and impede optimal throughput, even when feed-forwarding is used
• Instructions may be reordered to avoid stalls
• Dependencies between instructions limit reordering
• Static scheduling heuristics may be used to achieve near-optimal scheduling with an O(n²) algorithm

Lecture 15: Register allocation vs. instruction scheduling; legality of reverse engineering

Allocation vs. scheduling

We have seen why register allocation is a useful compilation phase: when done well, it can make the best use of available registers and hence reduce the number of spills to memory.

Unfortunately, by maximising the utilisation of physical registers, register allocation makes instruction scheduling significantly more difficult.

Allocation vs. scheduling

*x := *a;
*y := *b;

   (lexing, parsing, translation)

LDR v36,v32
STR v36,v33
LDR v37,v34
STR v37,v35

   (register allocation)

LDR v5,v1
STR v5,v2
LDR v5,v3
STR v5,v4

   (code generation)

lw $5,0($1)
sw $5,0($2)
lw $5,0($3)
sw $5,0($4)

Allocation vs. scheduling

lw $5,0($1)    IF RF EX MEM WB
sw $5,0($2)       STALL IF RF EX MEM WB
lw $5,0($3)                IF RF EX MEM WB
sw $5,0($4)                   STALL IF RF EX MEM WB

This schedule of instructions produces two pipeline stalls (or requires two NOPs).

Allocation vs. scheduling

Can we reorder them to avoid stalls?

1  lw $5,0($1)
2  sw $5,0($2)
3  lw $5,0($3)
4  sw $5,0($4)

No: 1, 2, 3, 4 is the only correct schedule for these instructions.

Allocation vs. scheduling

We might have done better if register $5 wasn’t so heavily used. If only our register allocation had been less aggressive!

*x := *a;
*y := *b;

   (lexing, parsing, translation)

LDR v36,v32
STR v36,v33
LDR v37,v34
STR v37,v35

   (register allocation)

LDR v5,v1
STR v5,v2
LDR v6,v3
STR v6,v4

   (code generation)

lw $5,0($1)
sw $5,0($2)
lw $6,0($3)
sw $6,0($4)

Allocation vs. scheduling

1  lw $5,0($1)
2  sw $5,0($2)
3  lw $6,0($3)
4  sw $6,0($4)

Now the only dependencies are 1 → 2 and 3 → 4, so the valid schedules include:

1, 2, 3, 4
1, 3, 2, 4
3, 1, 2, 4
1, 3, 4, 2
3, 4, 1, 2

Allocation vs. scheduling

lw $5,0($1)    IF RF EX MEM WB
lw $6,0($3)       IF RF EX MEM WB
sw $5,0($2)          IF RF EX MEM WB
sw $6,0($4)             IF RF EX MEM WB

This schedule of the new instructions produces no stalls.

Allocation vs. scheduling

There is clearly antagonism between register allocation and instruction scheduling: one reduces spills by using fewer registers, but the other can better reduce stalls when more registers are used.

This is related to the phase-order problem discussed earlier in the course, in which we would like to defer optimisation decisions until we know how they will affect later phases in the compiler.

It’s not clear how best to resolve the problem.

Allocation vs. scheduling

One option is to try to allocate physical registers cyclically rather than re-using them at the earliest opportunity.

It is this eager re-use of registers that causes stalls, so if we can avoid it (and still not spill any virtual registers to memory) we will have a better chance of producing an efficient program.

Allocation vs. scheduling

In practice this means that, when doing register allocation by colouring for a basic block, we should

• satisfy all of the important constraints as usual (i.e. clash graph, preference graph),
• see how many spare physical registers we still have left over, and then
• for each unallocated virtual register, try to choose a physical register distinct from all others allocated in the same basic block (sketched below).

So, if we are less zealous about reusing registers, this should hopefully result in a better instruction schedule while not incurring any extra spills.

In general, however, it is rather difficult to predict exactly how our allocation and scheduling phases will interact, and this particular solution is quite ad hoc.

Some recent research (e.g. the CRAIG system in 1995, Touati’s PhD thesis in 2002) has improved the situation.
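The following is a minimal sketch (not course code) of the “less eager reuse” idea: each virtual register is given a physical register that, where possible, has not yet been used in the block, cycling through the spare registers round-robin instead of always picking the lowest-numbered free one. The data representation and names are assumptions of this sketch.

def assign_round_robin(virtuals, clashes, physical):
    # virtuals: virtual registers in order of definition
    # clashes:  dict mapping each virtual to the set of virtuals it clashes with
    # physical: list of available physical register names
    alloc, next_try = {}, 0
    for v in virtuals:
        forbidden = {alloc[u] for u in clashes.get(v, set()) if u in alloc}
        for k in range(len(physical)):           # start from the rotating pointer
            cand = physical[(next_try + k) % len(physical)]
            if cand not in forbidden:
                alloc[v] = cand
                next_try = (next_try + k + 1) % len(physical)
                break
        else:
            alloc[v] = None                      # no register free: would need to spill
    return alloc

# v36 and v37 have disjoint live ranges, so eager reuse would map both to $5;
# round-robin assignment spreads them across $5 and $6, enabling the better schedule above.
print(assign_round_robin(["v36", "v37"], {}, ["$5", "$6"]))   # {'v36': '$5', 'v37': '$6'}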

Allocation vs. scheduling

The same problem also shows up in dynamic scheduling done by hardware.

Executable x86 code, for example, has lots of register reuse because of the small number of physical registers available.

The Pentium copes by actually having more registers than advertised; it does dynamic recolouring using this larger register set, which then enables more effective scheduling.

Lecture 16: Decompilation

Decompilation

character stream → token stream → parse tree → intermediate code → target code
(decompilation works back from target code towards intermediate code)

Motivation

The job of an optimising compiler is to turn human-readable source code into efficient, executable target code.

Although executable code is useful, software is most valuable in source code form, where it can be easily read and modified.

The source code corresponding to an executable is not always available (it may be lost, missing or secret), so we might want to use decompilation to recover it.

Reverse engineering

In general terms, engineering is a process which decreases the level of abstraction of some system.

Ideas → Requirements → Design → Source code → (compiler) → Target code

Reverse engineering

In contrast, reverse engineering is the process of increasing the level of abstraction of some system, making it less suitable for implementation but more suitable for comprehension and modification.

Target code → (decompiler) → Source code → Design → Requirements → Ideas

Legality and ethics

It is quite feasible to decompile and otherwise reverse-engineer most software.

So if reverse-engineering software is technologically possible, is there any ethical barrier to doing it? In particular, when is it legal to do so?

Legality and ethics

Companies and individuals responsible for creating software generally consider source code to be their confidential intellectual property; they will not make it available, and they do not want you to reconstruct it. (There are some well-known exceptions.)

Usually this desire is expressed via an end-user license agreement, either as part of a shrink-wrapped software package or as an agreement to be made at installation time (“click-wrap”).

Legality and ethics

However, the European Union Software Directive of 1991 (91/250/EC) says:

“The authorization of the rightholder shall not be required where [...] translation [of a program is] necessary to achieve the interoperability of [that program] with other programs, provided [...] these acts are performed by [a] person having a right to use a copy of the program”

(The slide reproduces Articles 4–6 of the Directive, on restricted acts, exceptions to them, and decompilation, from which this quotation is assembled.)

Legality and ethics

The more recent European Union Copyright Directive of 2001 (2001/29/EC, aka “EUCD”) is the EU’s implementation of the 1996 WIPO Copyright Treaty.

It is again concerned with the ownership rights of technological IP, but Recital 50 states that:

“[this] legal protection does not affect the specific provisions [of the EUSD]. In particular, it should not apply to [...] computer programs [and shouldn’t] prevent [...] the use of any means of circumventing a technological measure [allowed by the EUSD].”

Legality and ethics

And the USA has its own implementation of the WIPO Copyright Treaty: the Digital Millennium Copyright Act of 1998 (DMCA), which contains a similar exception for reverse engineering:

“This exception permits circumvention [...] for the sole purpose of identifying and analyzing elements of the program necessary to achieve interoperability with other programs, to the extent that such acts [...] are permitted under copyright law.”

Legality and ethics

Predictably enough, the interaction between the EUSD, EUCD and DMCA is complex and unclear, particularly at the increasingly-blurred interfaces between geographical jurisdictions (cf. Dmitry Sklyarov), and between software and other forms of technology (cf. Jon Johansen).

Get a lawyer.

Clean room design

Despite the complexity of legislation, it is possible to do useful reverse-engineering without breaking the law.

In 1982, Compaq produced the first fully IBM-compatible personal computer by using clean room design (aka the “Chinese wall technique”) to reverse-engineer the proprietary IBM BIOS.

This technique is effective in legally circumventing copyrights and trade secrets, although not patents.

Summary

• Register allocation makes scheduling harder by creating extra dependencies between instructions
• Less aggressive register allocation may be desirable
• Some processors allocate and schedule dynamically
• Reverse engineering is used to extract source code and specifications from executable code
• Existing copyright legislation may permit limited reverse-engineering for interoperability purposes

Why decompilation?

This course is ostensibly about Optimising Compilers. It is really about program analysis and transformation.

Decompilation is achieved through analysis and transformation of target code; the transformations just work in the opposite direction.

The decompilation problem

Even simple compilation discards a lot of information:

• Comments
• Function and variable names
• Structured control flow
• Type information

The decompilation problem

Optimising compilation is even worse:

• Dead code and common subexpressions are eliminated
• Algebraic expressions are rewritten
• Code and data are inlined; loops are unrolled
• Unrelated local variables are allocated to the same physical register
• Instructions are reordered by code motion optimisations and instruction scheduling

The decompilation problem

Some of this information is never going to be automatically recoverable (e.g. comments, variable names); some of it we may be able to partially recover if our techniques are sophisticated enough.

Compilation is not injective. Many different source programs may result in the same compiled code, so the best we can do is to pick a reasonable representative source program.

Intermediate code

It is relatively straightforward to extract a flowgraph from an assembler program.

Basic blocks are located in the same way as during forward compilation; we must simply deal with the semantics of the target instructions rather than our intermediate 3-address code.

Intermediate code

For many purposes (e.g. simplicity, retargetability) it might be beneficial to convert the target instructions back into 3-address code when storing it into the flowgraph.

This presents its own problems: for example, many architectures include instructions which test or set condition flags in a status register, so it may be necessary to laboriously reconstruct this behaviour with extra virtual registers and then use dead-code elimination to remove all unnecessary instructions thus generated.

Control reconstruction

A compiler apparently destroys the high-level control structure which is evident in a program’s source code.

After building a flowgraph during decompilation, we can recover some of this structure by attempting to match intervals of the flowgraph against some fixed set of familiar syntactic forms from our high-level language.

For many purposes (e.g. simplicity, retargetability) it might be beneficial to convert the target instructions back into A compiler apparently destroys the high-level control 3-address code when storing it into the flowgraph. structure which is evident in a program’s source code. This presents its own problems: for example, many After building a flowgraph during decompilation, we can architectures include instructions which test or set recover some of this structure by attempting to match condition flags in a status register, so it may be necessary intervals of the flowgraph against some fixed set of to laboriously reconstruct this behaviour with extra familiar syntactic forms from our high-level language. virtual registers and then use dead-code elimination to remove all unnecessary instructions thus generated. Finding loops Dominance

In a flowgraph, we say a node m dominates another node n if control must go through m before it can reach n. Any structured loops from the original program will have been compiled into tests and branches; they The immediate dominator of a node n is the will look like arbitrary (“spaghetti”) control flow. unique node that dominates n but doesn’t dominate any other dominator of n. In order to recover the high-level structure of these loops, we must use dominance. We can represent this dominance relation with a dominance tree in which each edge connects a node with its immediate dominator.

Dominance Dominance ENTRY f ENTRY f a a b c b e c d e d EXIT EXIT Back edges Back edges

ENTRY f ENTRY f a We can now define the concept of a back edge. a b c In a flowgraph, a back edge is one whose head dominates its tail. b e c d e

EXIT d EXIT

Finding loops Finding loops ENTRY f

Each back edge has an associated loop. a

The head of a back edge points to the loop header, and the loop body consists of all the nodes from b c which the tail of the back edge can be reached without passing through the loop header. d e

EXIT Finding loops Finding loops

Here, the loop header contains a conditional which determines whether the loop body is executed, and the last node of the body unconditionally transfers control back to the header. Once each loop has been identified, we can examine its structure to determine what kind of loop it is, and hence how best to represent it in source code. b This structure corresponds to source-level while (...) {...} d syntax.

Finding loops Finding conditionals

Here, the loop header unconditionally allows the body to execute, and the last node of the body tests whether the loop should execute again. A similar principle applies when trying to reconstruct conditionals: we look for structures in the flowgraph which may be represented by a This structure corresponds particular forms of high-level language syntax. to source-level do {...} while (...) e syntax.

Finding conditionals Finding conditionals

The first node in this interval The first node in this interval transfers control to one a transfers control to one node if node if some condition is true, and another node if the some condition is true, otherwise it condition is false; control always reaches some later node. transfers control to another node b (which control also eventually reaches along the first branch). a This structure corresponds c to source-level This structure corresponds to b c if (...) then {...} source-level else {...} d if (...) then {...} syntax. syntax. d

Control reconstruction Type reconstruction

Many source languages also contain rich information We can keep doing this for whatever other control- about the types of variables: integers, booleans, flow constructs are available in our source language. arrays, pointers, and more elaborate data-structure types such as unions and structs. Once an interval of the flowgraph has been matched against a higher-level control structure in At the target code level there are no variables, only this way, its entire subgraph can be replaced with a registers and memory locations. single node which represents that structure and contains all of the information necessary to Types barely exist here: memory contains arbitrary generate the appropriate source code. bytes, and registers contain integers of various bit- widths (possibly floating-point values too). Type reconstruction Type reconstruction

So each user variable may be spread between several Reconstruction of the types of source-level registers — and each register may hold the value of variables is made more difficult by the different variables at different times. combination of SSA and register allocation performed by an optimising compiler. It’s therefore a bit hopeless to try to give a type to each physical register; the notional type of the value SSA splits one user variable into many variables held by any given register will change during execution. — one for each static assignment — and any of these variables with disjoint live ranges may be int x = 42; MOV r3,#42 allocated to the same physical register. ⋯ ⋯ char *y = “42”; MOV r3,#0xFF34

Type reconstruction

Happily, we can undo the damage by once again converting to SSA form: this will split a single register into many registers, each of which can be assigned a different type if necessary.

MOV r3,#42          MOV r3a,#42
⋯                   ⋯
MOV r3,#0xFF34      MOV r3b,#0xFF34

Type reconstruction

int foo (int *x) {
  return x[1] + 2;
}

compiles (for ARM) to

f:  ldr r0,[r0,#4]
    add r0,r0,#2
    mov r15,r14

Type reconstruction

f:  ldr r0,[r0,#4]
    add r0,r0,#2
    mov r15,r14

decompiles to

int f (int r0) {
  r0 = *(int *)(r0 + 4);
  r0 = r0 + 2;
  return r0;
}

Type reconstruction

Converting to SSA form:

int f (int r0) {
  r0 = *(int *)(r0 + 4);
  r0 = r0 + 2;
  return r0;
}

becomes

int f (int r0a) {
  int r0b = *(int *)(r0a + 4);
  int r0c = r0b + 2;
  return r0c;
}

int f (int *r0a) { int f (int *r0a) { int r0b = *(r0a + 1); int r0b = *(r0a + 1); int r0c = r0b + 2; int r0c = r0b + 2; return r0c; return r0c; } } reconstruct types reconstruct syntax

int f (int r0a) { int f (int *r0a) { int r0b = *(int *)(r0a + 4); int r0b = r0a[1]; int r0c = r0b + 2; int r0c = r0b + 2; return r0c; return r0c; } } Type reconstruction Type reconstruction

int f (int *r0a) { int f (int *r0a) { return r0a[1] + 2; return r0a[1] + 2; } }

propagate copies In fact, the return type could be int f (int *r0a) { anything, so more generally: int r0 = r0 [1]; b a T f (T *r0 ) { int r0 = r0 + 2; a c b return r0 [1] + 2; return r0 ; a c } } Type reconstruction Summary

Type reconstruction

This is all achieved using constraint-based analysis: each target instruction generates constraints on the types of the registers, and we then solve these constraints in order to assign types at the source level.

Typing information is often incomplete intraprocedurally (as in the example); constraints generated at call sites help to fill in the gaps. We can also infer unions, structs, etc.

Summary

• Decompilation is another application of program analysis and transformation
• Compilation discards lots of information about programs, some of which can be recovered
• Loops can be identified by using dominator trees
• Other control structure can also be recovered
• Types can be partially reconstructed with constraint-based analysis