CIS 341: Compilers, Lecture 24

Announcements

• HW6: Analysis & Optimizations – constant propagation, among others – Due: TOMORROW at Midnight

• Final Exam:
  – Date: Friday, May 4th
  – Time: noon – 2:00pm
  – Location: Moore 216
  – Coverage: cumulative, but emphasis on material since the midterm


• Phi nodes
• Alloca “promotion”
• Register allocation

REVISITING SSA

Dominator Trees

• Domination is transitive:
  – if A dominates B and B dominates C then A dominates C
• Domination is anti-symmetric:
  – if A dominates B and B dominates A then A = B
• Every flow graph has a dominator tree

– The Hasse diagram of the dominates relation

[Figure: an example control-flow graph (left) and its dominator tree (right)]
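To make the dominates relation concrete, here is a hedged OCaml sketch of the classic iterative computation of dominator sets; the graph representation and the convention that node 1 is the entry are illustrative assumptions, not part of the lecture.

    (* Iterative dominators: dom(n) = {n} ∪ ⋂ { dom(p) | p ∈ preds(n) },
       computed to a fixpoint. Node 1 is assumed to be the entry. *)
    module IntSet = Set.Make (Int)

    let dominators (nodes : int list) (preds : int -> int list)
        : (int, IntSet.t) Hashtbl.t =
      let all = IntSet.of_list nodes in
      let dom = Hashtbl.create 16 in
      (* Entry is dominated only by itself; initialize others to all nodes. *)
      List.iter
        (fun n -> Hashtbl.replace dom n (if n = 1 then IntSet.singleton 1 else all))
        nodes;
      let changed = ref true in
      while !changed do
        changed := false;
        List.iter
          (fun n ->
            if n <> 1 then begin
              let meet =
                match preds n with
                | [] -> IntSet.empty
                | p :: ps ->
                    List.fold_left
                      (fun acc q -> IntSet.inter acc (Hashtbl.find dom q))
                      (Hashtbl.find dom p) ps
              in
              let d = IntSet.add n meet in
              if not (IntSet.equal d (Hashtbl.find dom n)) then begin
                Hashtbl.replace dom n d;
                changed := true
              end
            end)
          nodes
      done;
      dom

The immediate dominator of n (its parent in the dominator tree) is then the unique element of dom(n) \ {n} that is dominated by all the others.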

Phi Functions

• Solution: φ functions
  – Fictitious operator, used only for analysis
    • implemented by Mov at x86 level
  – Chooses among different versions of a variable based on the path by which control enters the phi node.

%uid = phi <ty> v1, <label1>, … , vn, <labeln>

Source:

    int y = …
    int x = …
    int z = …
    if (p) {
      x = y + 1;
    } else {
      x = y * 2;
    }
    z = x + 3;

Compiled LLVM, with a phi at the merge point:

    entry:
      %y1 = …
      %x1 = …
      %z1 = …
      %p = icmp …
      br i1 %p, label %then, label %else
    then:
      %x2 = add i64 %y1, 1
      br label %merge
    else:
      %x3 = mul i64 %y1, 2
      br label %merge
    merge:
      %x4 = phi i64 %x2, %then, %x3, %else
      %z2 = add i64 %x4, 3

Alloca Promotion

• Not all source variables can be allocated to registers
  – If the address of the variable is taken (as permitted in C, for example)
  – If the address of the variable “escapes” (by being passed to a function)
• An alloca instruction is called promotable if neither of the two conditions above holds

    entry:
      %x = alloca i64                // %x cannot be promoted
      %y = call malloc(i64 8)
      %ptr = bitcast i8* %y to i64**
      store i64* %x, i64** %ptr      // store the pointer into the heap

    entry:
      %x = alloca i64                // %x cannot be promoted
      %y = call foo(i64* %x)         // foo may store the pointer into the heap

• Happily, most local variables declared in source programs are promotable
  – That means they can be register allocated
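As a rough illustration of the two conditions, a promotability check can scan every use of the alloca'd slot and reject anything other than a plain load from it or store to it. This OCaml sketch assumes a hypothetical LLVM-lite-style instruction type, not the course's actual datatype:

    (* The slot %x is promotable iff it is only ever used as the address
       operand of a load or store: any other use lets the address escape. *)
    type insn =
      | Alloca of string                 (* %uid = alloca i64         *)
      | Load of string * string          (* %uid = load %ptr          *)
      | Store of string * string         (* store %value, %ptr        *)
      | Call of string * string list     (* %uid = call f(%arg, ...)  *)

    let promotable (x : string) (body : insn list) : bool =
      List.for_all
        (fun i ->
          match i with
          | Alloca _ -> true
          | Load (_, _) -> true                (* reading through %x is fine *)
          | Store (v, _) -> v <> x             (* storing %x itself escapes  *)
          | Call (_, args) -> not (List.mem x args))  (* passing %x escapes  *)
        body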

Phi Placement Alternative

• Less efficient, but easier to understand:

• Place phi nodes "maximally" (i.e. at every node with ≥ 2 predecessors)

• If all values flowing into a phi node are the same, then eliminate it (see the sketch after this list):

    %x = phi t %y, %pred1, %y, %pred2, … , %y, %predK
    // code that uses %x
        ⇒
    // code with %x replaced by %y

• Interleave with other optimizations
  – copy propagation
  – constant propagation
  – etc.
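Here is the sketch promised above: a hedged OCaml rendering of trivial-φ elimination over a hypothetical phi representation (names and structure are illustrative assumptions):

    (* A φ is trivial if, ignoring references to itself, all incoming
       values are one operand %y; every use of its uid can then be
       replaced by %y. Self-references arise in loops; see Aycock &
       Horspool, 2002. *)
    type phi = { uid : string; incoming : string list }  (* one value per pred *)

    let trivial (p : phi) : (string * string) option =
      match
        List.sort_uniq compare (List.filter (fun v -> v <> p.uid) p.incoming)
      with
      | [ y ] -> Some (p.uid, y)    (* replace %uid by %y everywhere *)
      | _ -> None

    (* Iterate to a fixpoint: removing one φ can make another trivial,
       which is why this is interleaved with copy/constant propagation. *)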

Example SSA Optimizations

• How to place phi nodes without breaking SSA?
• Note: the “real” implementation combines many of these steps into one pass
  – Places phis directly at the dominance frontier
• This example also illustrates other common optimizations:
  – Load after store/alloca (LAS/LAA)
  – Dead store/alloca elimination (DSE/DAE)

The pass pipeline: find promotable allocas ⇒ insert maximal φs ⇒ LAS/LAA ⇒ DSE ⇒ DAE ⇒ eliminate φs.

Starting program:

    l1: %p = alloca i64
        store 0, %p
        %b = %y > 0
        br %b, %l2, %l3
    l2: store 1, %p
        br %l3
    l3: %x = load %p
        ret %x

Step 1 – insert loads at the end of each block:

    l1: %p = alloca i64
        store 0, %p
        %b = %y > 0
        %x1 = load %p
        br %b, %l2, %l3
    l2: store 1, %p
        %x2 = load %p
        br %l3
    l3: %x = load %p
        ret %x

Step 2 – insert φ-nodes at each block, joining the loads flowing in from each predecessor:

    l2: %x3 = φ[%x1, %l1]
        store 1, %p
        %x2 = load %p
        br %l3
    l3: %x4 = φ[%x1, %l1; %x2, %l2]
        %x = load %p
        ret %x

Step 3 – insert stores after the φ-nodes:

    l2: %x3 = φ[%x1, %l1]
        store %x3, %p
        store 1, %p
        %x2 = load %p
        br %l3
    l3: %x4 = φ[%x1, %l1; %x2, %l2]
        store %x4, %p
        %x = load %p
        ret %x

Step 4 – loads after stores (LAS): substitute all uses of each load by the value being stored, then remove the load. Here %x1 becomes 0, %x2 becomes 1, and %x becomes %x4:

    l1: %p = alloca i64
        store 0, %p
        %b = %y > 0
        br %b, %l2, %l3
    l2: %x3 = φ[0, %l1]
        store %x3, %p
        store 1, %p
        br %l3
    l3: %x4 = φ[0, %l1; 1, %l2]
        store %x4, %p
        ret %x4

Step 5 – Dead Store Elimination (DSE): eliminate all stores with no subsequent loads. Dead Alloca Elimination (DAE): eliminate all allocas with no subsequent loads/stores. Every store to %p is now dead, and then so is the alloca itself:

    l1: %b = %y > 0
        br %b, %l2, %l3
    l2: %x3 = φ[0, %l1]
        br %l3
    l3: %x4 = φ[0, %l1; 1, %l2]
        ret %x4

Step 6 – eliminate φ nodes:
  – Singletons (like %x3)
  – Those with identical values flowing in from each predecessor
  – See Aycock & Horspool, 2002

Done!

    l1: %b = %y > 0
        br %b, %l2, %l3
    l2: br %l3
    l3: %x4 = φ[0, %l1; 1, %l2]
        ret %x4
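Before moving on, here is a rough OCaml sketch of the LAS step above, over a hypothetical instruction datatype (illustrative names, not the course's actual representation): within one block it tracks the most recent value stored to each pointer and deletes loads that can be substituted.

    (* Load-after-store within one basic block: a load from a pointer with
       a known stored value is dropped, and its uid is scheduled for
       substitution by that value. *)
    type insn =
      | Store of string * string     (* store %value, %ptr *)
      | Load of string * string      (* %uid = load %ptr   *)
      | Other of string

    let las (block : insn list) : insn list * (string * string) list =
      let step (kept, stored, subst) i =
        match i with
        | Store (v, p) -> (i :: kept, (p, v) :: stored, subst)
        | Load (uid, p) -> (
            match List.assoc_opt p stored with
            | Some v -> (kept, stored, (uid, v) :: subst)  (* drop the load *)
            | None -> (i :: kept, stored, subst))
        | Other _ ->
            (* A real pass must clear `stored` across instructions that may
               write memory (calls, stores through aliases). *)
            (i :: kept, stored, subst)
      in
      let kept, _, subst = List.fold_left step ([], [], []) block in
      (List.rev kept, subst)  (* then rewrite all uses per `subst` *)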

LLVM Phi Placement

• This transformation is also sometimes called register promotion
  – older versions of LLVM called this “mem2reg”: memory-to-register promotion

• In practice, LLVM combines this transformation with scalar replacement of aggregates (SROA)
  – i.e. transforming loads/stores of structured data into loads/stores on register-sized data

• These algorithms are (one reason) why LLVM IR allows annotation of predecessor information in the .ll files
  – Simplifies computing the DF (dominance frontier)
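Since the dominance frontier came up above, here is a hedged OCaml sketch of the standard way to compute it from immediate dominators (in the style of Cooper, Harvey & Kennedy); the graph representation and helper names are illustrative assumptions:

    (* For each join node n (>= 2 predecessors), walk up from each
       predecessor toward idom(n); n is in the frontier of every node
       visited strictly before idom(n). *)
    let dominance_frontier (nodes : int list) (preds : int -> int list)
        (idom : int -> int) : (int, int list) Hashtbl.t =
      let df = Hashtbl.create 16 in
      let add n m =
        let cur = Option.value ~default:[] (Hashtbl.find_opt df n) in
        if not (List.mem m cur) then Hashtbl.replace df n (m :: cur)
      in
      List.iter
        (fun n ->
          match preds n with
          | _ :: _ :: _ as ps ->            (* only join nodes matter *)
              List.iter
                (fun p ->
                  let runner = ref p in
                  while !runner <> idom n do
                    add !runner n;          (* n ∈ DF(runner) *)
                    runner := idom !runner
                  done)
                ps
          | _ -> ())
        nodes;
      df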


COMPILER VERIFICATION

Motivation: Compiler Bugs [Yang et al. PLDI 2011]

[Diagram: random test-case generation feeds source programs to LLVM and 8 other C compilers, finding 325 bugs in total, including 202 in LLVM and 79 (25 critical) elsewhere]

Verified Compilation: CompCert [Leroy et al.]
(Not directly applicable to LLVM)

CompCert – A Verified C Compiler

Optimizing C Compiler, proved correct end-to-end with machine-checked proof in Coq

Xavier Leroy INRIA

[Diagram: C language → CompCert Compiler → ISA]

Comparing Behaviors

• Consider two programs P1 and P2, possibly in different languages.
  – e.g. P1 is an Oat program, P2 is its compilation to LL

• The semantics of the languages associate to each program a set of observable behaviors:

    B(P1) and B(P2)

• Note: |B(P)| = 1 if P is deterministic, > 1 otherwise

What is Observable?

• For C-like languages:

    observable behavior ::=
      | terminates(st)    (i.e. observe the final state)
      | diverges
      | goeswrong

• For pure functional languages:

    observable behavior ::=
      | terminates(v)     (i.e. observe the final value)
      | diverges
      | goeswrong

What about I/O?

• Add a trace of input-output events performed:

    t ::= [] | e :: t            (finite traces)
    T ::= [] | e :: T   coind.   (finite and infinite traces)

    observable behavior ::=
      | terminates(t, st)   (end in state st after trace t)
      | diverges(T)         (loop, producing trace T)
      | goeswrong(t)
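This grammar transcribes almost directly into code. A hedged OCaml rendering follows; the type and constructor names are illustrative assumptions, and truly infinite traces need coinduction (as in Coq), approximated here with lazy sequences:

    (* Observable behaviors with I/O traces. *)
    type event = Out of int | In of int      (* illustrative I/O events     *)
    type trace = event list                  (* finite traces t             *)
    type itrace = event Seq.t                (* finite or infinite traces T *)

    type 'st behavior =
      | Terminates of trace * 'st            (* end in state st after trace t *)
      | Diverges of itrace                   (* loop, producing trace T       *)
      | Goeswrong of trace

    (* e.g. P1 = print(1), run in state st: *)
    let b_p1 (st : 'st) : 'st behavior = Terminates ([ Out 1 ], st)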

Examples

• P1: print(1);  / st ⇒ terminates(out(1)::[], st)

• P2: print(1); print(2); / st ⇒ terminates(out(1)::out(2)::[],st)

• P3: WHILE true DO print(1) END / st ⇒ diverges(out(1)::out(1)::…)

• So B(P1) ≠ B(P2) ≠ B(P3)

Bisimulation

• Two programs P1 and P2 are bisimilar whenever: B(P1) = B(P2)

• The two programs are completely indistinguishable.

• But… this is often too strong in practice.

Compilation Reduces Nondeterminism

• Some languages (like C) have underspecified behaviors:
  – Example: order of evaluation of expressions f() + g() (see the sketch below)
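As a concrete illustration, OCaml's own definition likewise leaves the evaluation order of operands unspecified, so even this tiny self-contained program has more than one legitimate observable behavior:

    (* OCaml does not specify the evaluation order of the operands of (+);
       the output may begin "fg" or "gf" depending on the compiler's
       choice, so B(P) contains more than one behavior. *)
    let f () = print_string "f"; 1
    let g () = print_string "g"; 2
    let () = Printf.printf " sum=%d\n" (f () + g ())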

• Concurrent programs often permit nondeterminism
  – Classic optimizations can reduce this nondeterminism
  – Example:  a := x + 1; b := x + 1  ||  x := x+1
     vs.      a := x + 1; b := a      ||  x := x+1

• LLVM explicitly allows nondeterminism:
  – undef values (not part of LLVM lite)
  – see the discussion later

Backward Simulation

• Program P2 can exhibit fewer behaviors than P1: B(P1) ⊇ B(P2)

• All of the behaviors of P2 are permitted by P1, though some of them may have been eliminated.
• Also called refinement.

What about goeswrong?

• Compilers often translate away bad behaviors.

    x := 1/y ; x := 42          vs.    x := 42
    (divide by 0 error)                (always terminates)

• Justifications:
  – Compiled program does not “go wrong” because the program type checks or is otherwise formally verified
  – Or just “garbage in/garbage out”

Safe Backwards Simulation

• Only require the compiled program’s behaviors to agree if the source program could not go wrong:

    goeswrong(t) ∉ B(P1) ⇒ B(P1) ⊇ B(P2)

• Idea: let S be the functional specification of the program: a set of behaviors not containing goeswrong(t).
  – A program P satisfies the spec if B(P) ⊆ S (see the sketch below)
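To make the set-theoretic statements concrete, here is a toy OCaml sketch over finite, list-represented behavior sets; real behavior sets are generally infinite, so everything here is an illustrative assumption:

    (* "P satisfies spec S iff B(P) ⊆ S", on finite behavior sets with a
       simplified behavior type so that equality is decidable. *)
    type beh = Term of int list | Wrong of int list   (* simplified traces *)

    let subset bs1 bs2 = List.for_all (fun b -> List.mem b bs2) bs1

    let satisfies_spec (bp : beh list) (s : beh list) = subset bp s

    (* The lemma below is then just transitivity of ⊆: if B(P2) ⊆ B(P1)
       (safe backward simulation on a well-defined source) and
       B(P1) ⊆ S, then B(P2) ⊆ S. *)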

• Lemma: If P2 is a safe backwards simulation of P1 and P1 satisfies the spec, then P2 does too.

Building Backward Simulations

[Diagram: a source step σ1 → σ2 emitting out(1), related by C[-] to a target sequence τ1 → τ2 → τ3 → … → τn emitting out(1)]

Idea: The event trace along a (target) sequence of steps originating from a compiled program must correspond to some source sequence.

Tricky parts:
  – Must consider all possible target steps
  – If the compiler uses many target steps for one source step, we have to invent some way of relating the intermediate states to the source
  – The compilation function goes the wrong way to help!

Safe Forwards Simulation

• Source program’s behaviors are a subset of the target’s:

    goeswrong(t) ∉ B(P1) ⇒ B(P1) ⊆ B(P2)

• P2 captures all the good behaviors of P1, but could exhibit more (possibly bad) behaviors.

• But: Forward simulation is significantly easier to prove: – Only need to show the existence of a compatible target trace.

Determinism!

• Lemma: If P2 is deterministic then forward simulation implies backward simulation.

• Proof: ∅ ⊂ B(P1) ⊆ B(P2) = {b}, so B(P1) = {b} = B(P2).

• Corollary: safe forward simulation implies safe backward simulation if P2 is deterministic.

Forward Simulations

[Diagram: a source step σ1 → σ2, with σ1 ~ C[σ1] and σ2 ~ C[σ2], simulated by target steps C[σ1] → τ2 → τ3 → … → C[σ2]]

Idea: Show that every transition in the source program:
  – is simulated by some sequence of transitions in the target
  – while preserving a relation ~ between the states

The Vellvm Project [Zhao et al. POPL 2012, CPP 2012, PLDI 2013]

• Formal semantics
• Facilities for creating simulation proofs
• Implemented in Coq
• Extract passes for use with LLVM compiler
• Example: verified memory safety instrumentation

[Diagram: a typed SSA IR with optimizations/transformations and analyses]

Vellvm Framework

[Diagram: the Vellvm framework: a Coq development (syntax, type system and SSA, operational semantics, memory model, proof techniques & metatheory) is extracted to OCaml (bindings, parser, printer) and linked into the LLVM pipeline: C source code → LLVM IR → transform → LLVM IR → other optimizations → target code. A second view highlights a verified transform, extracted from Coq, running between the IR stages.]

Friday, May 4th, noon – 2:00p.m.
Location: Moore 216

FINAL EXAM

Final Exam

• Will mostly cover material since the midterm
  – Starting from Lecture 13
  – Scope / Typechecking / Inference Rules
  – Lambda calculus / closure conversion
  – Objects, inheritance, types, implementation of dynamic dispatch
  – Basic optimizations
  – Dataflow analysis (forward vs. backward, fixpoint computations, etc.)
    • Liveness
  – Graph-coloring Register Allocation
  – Control flow analysis
    • Loops, dominator trees

• Will focus more on the theory side of things
• Format will be similar to the midterm
  – Simple answer, computation, multiple choice, etc.
  – Sample exam from last time is on the web


What have we learned? Where else is it applicable? What next?

COURSE WRAP-UP

Why CIS 341?

• You will learn:
  – Practical applications of theory
  – Parsing
  – How high-level languages are implemented in machine language
  – (A subset of) Intel x86 architecture
  – A deeper understanding of code
  – A little about programming language semantics
  – Functional programming in OCaml
  – How to manipulate complex data structures
  – How to be a better programmer

• Did we meet these goals?

Stuff we didn’t Cover

• We skipped stuff at every level…
• Concrete syntax/parsing:
  – Much more to the theory of parsing… LR(*)
  – Good syntax is art not science!
• Source language features:
  – Exceptions, advanced type systems, type inference, concurrency
• Intermediate languages:
  – Intermediate language design, bytecode, bytecode interpreters, just-in-time compilation (JIT)
• Compilation:
  – Continuation-passing transformation, efficient representations, scalability
• Optimization:
  – Scientific computing, cache optimization, … optimization
• Runtime support:
  – memory management, garbage collection

Related Courses

• CIS 500: Software Foundations – Prof. Pierce – Theoretical course about functional programming, proving program properties, type systems, lambda calculus. Uses the theorem prover Coq.

• CIS 501: Computer Architecture – Prof. Devietti – 371++: pipelining, caches, VM, superscalar, multicore,…

• CIS 552: Advanced Programming – Prof. Weirich – Advanced functional programming in Haskell, including generic programming, metaprogramming, embedded languages, cool tricks with fancy type systems

• CIS 670: Special topics in programming languages

Where to go from here?

• Conferences (proceedings available on the web):
  – Programming Language Design and Implementation (PLDI) – PLDI 2018 is being held in Philly, June 20-23
  – Principles of Programming Languages (POPL)
  – Object Oriented Programming Systems, Languages & Applications (OOPSLA)
  – International Conference on Functional Programming (ICFP)
  – European Symposium on Programming (ESOP)
  – …

• Technologies / Open Source Projects
  – Yacc, lex, bison, flex, …
  – LLVM – low level virtual machine
  – Java virtual machine (JVM), Microsoft’s Common Language Runtime (CLR)
  – Languages: OCaml, F#, Haskell, Scala, Go, Rust, …?

Where else is this stuff applicable?

• General programming
  – In C/C++, better understanding of how the compiler works can help you generate better code.
  – Ability to read assembly output from compiler
  – Experience with functional programming can give you different ways to think about how to solve a problem
• Writing domain specific languages
  – lex/yacc very useful for little utilities
  – understanding abstract syntax and interpretation
• Understanding hardware/software interface
  – Different devices have different instruction sets, programming models

Thanks!

• To the TAs: Richard, Nicolas, Olek, Yishuai – for doing an amazing job putting together the projects for the course.

• To you for taking the class!

• How can I improve the course? – Let me know in course evaluations!
