Pros and Cons of Pointers

• Many procedural languages have pointers Lecture 27 – e.g., C or C++: int *p = &x; • Pointers are powerful and convenient Pointer Analysis – can build arbitrary data structures • Pointers can also hinder optimization – hard to know where pointers are pointing • Basics – must be conservative in their presence • Design Options • Has inspired much research • Pointer Analysis Algorithms – analyses to decide where pointers are pointing – many options and trade-offs • Pointer Analysis Using BDDs – open problem: a scalable accurate analysis • Probabilistic Pointer Analysis

(Slide content courtesy of Greg Steffan, U. of Toronto)

Carnegie Mellon Carnegie Mellon Todd C. Mowry 15-745: Pointer Analysis 1 15-745: Pointer Analysis 2 Todd C. Mowry

Pointer Analysis Basics: Aliases The Pointer Problem

• Two variables are aliases if: • Decide for every pair of pointers at every program point: – they reference the same memory location – do they point to the same memory location? • More useful: • A difficult problem – prove variables reference different locations – shown to be undecidable by Landi, 1992 • Correctness: Alias sets: – report all pairs of pointers which do/may alias • Ambiguous: int x,y; – two pointers which may or may not alias • Accuracy/Precision: int *p = &x; – how few pairs of pointers are reported while remaining correct int *q = &y; – ie., reduce ambiguity to improve accuracy int *r = p; int **s = &q;

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 3 Todd C. Mowry 15-745: Pointer Analysis 4 Todd C. Mowry

1 Many Uses of Pointer Analysis Challenges for Pointer Analysis

• Basic compiler optimizations • Complexity: huge in space and time – , CSE, , live variables, – compare every pointer with every other pointer , loop invariant code motion, redundant – at every program point load/store elimination – potentially considering all program paths to that point • Parallelization • Scalability vs accuracy trade-off – instruction-level parallelism – different analyses motivated for different purposes – thread-level parallelism – many useful algorithms (adds to confusion) • Behavioral synthesis • Coding corner cases – automatically converting C-code into gates – pointer arithmetic (*p++), casting, function pointers, long-jumps • Error detection and program understanding • Whole program? – memory leaks, wild pointers, security holes – most algorithms require the entire program – library code? optimizing at link-time only?

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 5 Todd C. Mowry 15-745: Pointer Analysis 6 Todd C. Mowry

Pointer Analysis: Design Options Representation

• Representation • Track pointer aliases • Heap modeling – <*a, b>, <*a, e>, , • Aggregate modeling <**a, c>, <**a, d>, … • Flow sensitivity – More precise, less efficient • Context sensitivity • Track points-to information – , , , , – Less precise, more efficient

a&ba = &b; b = &c; b = &d; e = b;

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 7 Todd C. Mowry 15-745: Pointer Analysis 8 Todd C. Mowry

2 Heap Modeling Options Aggregate Modeling Options

• Heap merged Arrays Structures – i.e. “no heap modeling” • Allocation site (any call to malloc/calloc) Elements are treated Elements are treated – Consider each to be a unique location … as individual locations as individual locations – Doesn’t differentiate between multiple objects allocated by the … (“field sensitive”) same allocation site or • Shape analysis Treat entire array – Recognize linked lists, trees, DAGs, etc. as a single location or or

… Treat first element Treat entire structure separate from others as a single location

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 9 Todd C. Mowry 15-745: Pointer Analysis 10 Todd C. Mowry

Flow Sensitivity Options Flow Sensitivity Example

• Flow insensitive (assuming allocation-site heap modeling) – The order of statements doesn’t matter Flow Insensitive • Result of analysis is the same regardless of statement order aS7  – Uses a si ngl e gl ob al st at e to st ore result s as they are comput ed S11: a = malloc()(…); – Not very accurate S2: b = malloc(…); • Flow sensitive S3: a = b; S4: a = malloc(…); – The order of the statements matter Flow Sensitive S5: if(c)  – Need a control flow graph a = b; aS7 – Must store results for each program point S6: if(!c) – Improves accuracy a = malloc(…); • S7: … = *a; PhPath sensiiitive Path Sensitive – Each path in a control flow graph is considered aS7 

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 11 Todd C. Mowry 15-745: Pointer Analysis 12 Todd C. Mowry

3 Context Sensitivity Options Pointer Alias Analysis Algorithms

• Context insensitive/sensitive References: – whether to consider different calling contexts • “Points-to analysis in almost linear time , Steensgaard, POPL 1996 – e.g., what are the possibilities for p at S6? • “Program Analysis and Specialization for the C Programming Language , An der sen, Tec hni ca l Repor t, 1994 int f() Context Insensitive: • “Context-sensitive interprocedural point { , Emami et al., PLDI 1994 S4: p = &b; function pointers ” int a, b, *p; • “Pointer analysis: haven't we solved this problem yet? , Hind, PASTE S5: g(); int main() 2001 } { • “Which pointer analysis” should I use? , Hind et al., ISSTA 2000 S1: f(); s-to analysis in the presence of S2: p = &a; Context Sensitive: S3: g(); ” } int g() { S6: … = *p; ” ” }

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 13 Todd C. Mowry 15-745: Pointer Analysis 14 Todd C. Mowry

Address Taken Address Taken Example

• Basic, fast, ultra-conservative algorithm T *p, *q, *r; – flow-insensitive, context-insensitive – often used in production int main() { void f() { g(T **fp) { S1: p = alloc(T); S6: q = alloc(T); T local; • Algorithm: if(…) f(); g(&q); – Generate the set of all variables whose addresses are assigned to s9: p = &local; g(&p); S8: r = alloc(T); another variable. } S4: p = alloc(T); } – Assume that any pointer can potentially point to any variable in that S5: … = *p; set. } • Complexity: O(n) - linear in size of program • Accuracy: very imprecise

pS5 =

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 15 Todd C. Mowry 15-745: Pointer Analysis 16 Todd C. Mowry

4 Anderson’s Algorithm Anderson Example

• Flow-insensitive, context-insensitive, iterative T *p, *q, *r; • Representation: – one points-to graph for entire program int main() { void f() { g(T **fp) { S1: p = alloc(T); S6: q = alloc(T); T local; – each node represents exactly one location if(…) f(); g(&q); • s9: p = &local; For each statement, build the points-to graph: g(&p); S8: r = alloc(T); } y = &x y points-to x S4: p = alloc(T); } S5: … = *p; y = x if x points-to w } then y points-to w *y = x if y points-to z and x points-to w then z points-to w y = *x if x points-to z and z points-to w then y points-to w p = • Iterate until graph no longer changes S5 • Worst case complexity: O(n3), where n = program size Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 17 Todd C. Mowry 15-745: Pointer Analysis 18 Todd C. Mowry

Steensgaard’s Algorithm Steensgaard Example

• Flow-insensitive, context-insensitive T *p, *q, *r; • Representation: – a compact points-to graph for entire program int main() { void f() { g(T **fp) { T local; • S1: p = alloc(T); S6: q = alloc(T); each node can represent mullltiple locations if(…) f(); g(&q); • but can only point to one other node s9: p = &local; g(&p); – i.e. every node has a fan-out of 1 or 0 S8: r = alloc(T); } S4: p = alloc(T); } • union-find data structure implements fan-out S5: … = *p; – “unioning” while finding eliminates need to iterate } • Worst case complexity: O(n) • Precision: less precise than Anderson’s

pS5 =

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 19 Todd C. Mowry 15-745: Pointer Analysis 20 Todd C. Mowry

5 Example with Flow Sensitivity Pointer Analysis Using BDDs

T *p, *q, *r; References: • “Cloning-based context-sensitive po int main() { void f() { g(T **fp) { decision diagrams”, Whaley and Lam, PLDI 2004 T local; S1: p = alloc(T); S6: q = alloc(T); • “S b liym bi to lic l sis po int isiter d” analysis revisited”, Zhu and ClCalman, PDLI 2004 if(…) f(); g(&q); s9: p = &local; • “Points-to analysis using BDDs”, Berndl et al, PDLI 2003 g(&p); S8: r = alloc(T); } S4: p = alloc(T); } inter alias analysis using binary S5: … = *p; }

pS5 = pS9 =

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 21 Todd C. Mowry 15-745: Pointer Analysis 22 Todd C. Mowry

Binary Decision Diagram (BDD) BDD-Based Pointer Analysis

• Use a BDD to represent transfer functions – encode procedure as a function of its calling context – compact and efficient representation • Perform context-sensitive, inter-procedural analysis – similar to dataflow analysis – but across the procedure call graph • Gives accurate results – and scales up to large programs

Binary Decision Tree Truth Table BDD

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 23 Todd C. Mowry 15-745: Pointer Analysis 24 Todd C. Mowry

6 Probabilistic Pointer Analysis Pointer Analysis: Yes, No, & Maybe

References: • “A Probabilistic Pointer Analysis for Speculative Optimizations”, DaSilva and Steffan, ASPLOS 2006 *a = ~ optimize • “C il t s ompf l tis ltith il dier hit itht suppor t for speculative multithreading architecture with ~=*b~ = *b probabilistic points-to analysis”, Shen et al., PPoPP 2003 Pointer Definitely • “Speculative Alias Analysis for Executable Code”, Fernandez and Analysis Definitely Not Espasa, PACT 2002 • “A General Compiler Framework fo Maybe Data Speculative Code Motion”, Dai et al., CGO 2005 *a = ~ ~ = *b • “Speculative register promotion us (ALAT)”, Lin et al., CGO 2003 r Speculative Optimizations Using • Do pointers a and b point to the same location? – Repeat for every pair of pointers at every program point ing Advanced Load Address Table • How can we optimize the “maybe” cases?

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 25 Todd C. Mowry 15-745: Pointer Analysis 26 Todd C. Mowry

Let’s Speculate Data Speculative Optimizations

• Implement a potentially unsafe optimization • EPIC Instruction sets – Verify and Recover if necessary – Support for speculative load/store instructions (e.g., Itanium) • Speculative compiler optimizations int *a, x; int *a, x, tmp; – elimination, redundancy elimination, copy propagation, … … , register promotion while(…) tmp = *a; • Thread-level speculation (TLS) { while(…) – Hardware and compiler support for speculative parallel threads x = *a; { • Transactional programming … x = tmp; – Hardware and software support for speculative parallel } a is probably … transactions loop invariant } Heavy reliance on

detailed profile feedback

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 27 Todd C. Mowry 15-745: Pointer Analysis 28 Todd C. Mowry

7 Can We Quantify “Maybe”? Conventional Pointer Analysis

• Estimate the potential benefit for speculating:

Recovery Maybe *a = ~ optimize penalty Overhead ~=*b~ = *b (if unsuccessful) for verify Probability Pointer Definitelyp = 1.0 of success Analysis Definitelyp = 0.0 Not Expected 0.0Maybe < p < 1.0 speedup *a = ~ ~ = *b (if successful) SPECULATE?

• Do pointers a and b point to the same location? – Repeat for every pair of pointers at every program point

Ideally “maybe” should be a probability.

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 29 Todd C. Mowry 15-745: Pointer Analysis 30 Todd C. Mowry

Probabilistic Pointer Analysis PPA Research Objectives

• Accurate points-to probability information – at every static pointer dereference *a = ~ optimize • Scalable analysis ~=*b~ = *b Probabilistic – Goal: entire SPEC integer benchmark suite Definitelyp = 1.0 Pointer • Understand scalability/accuracy tradeoff Analysis Definitelyp = 0.0 Not – through flexible static memory model 0.0Maybe < p < 1.0 *a = ~ ~ = *b

Improve our understanding of programs • Potential advantage of Probabilistic Pointer Analysis: – it doesn’t need to be safe

Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 31 Todd C. Mowry 15-745: Pointer Analysis 32 Todd C. Mowry

8 Algorithm Design Choices Traditional Points-To Graph

Fixed: int x, y, z, *b = &x; Definitely • Bottom Up / Top Down Approach void foo(int *a) { = pointer = • Linear transfer functions (for scalability) • One-level context and flow sensitive if(… ) = poidinted at = MbMaybe b = &y; Flexible: if(…) b a • Edge profiling (or static prediction) a = &z; • Safe (or unsafe) else(…) • Field sensitive (or field insensitive) a = b; x y z UND while(…) { x = *a; … } Results are inconclusive } Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 33 Todd C. Mowry 34

Probabilistic Points-To Graph Probabilistic Pointer Analysis Results Summary int x, y, z, *b = &x; • Matrix-based, transfer function approach p = 1.0 void foo(int *a) { = pointer = – SUIF/Matlab implementation • Scales to the SPECint 95/2000 benchmarks 001.1 taken(edge profile) p if(… ) = poidinted at = 000.0

9 Pointer Analysis Summary

• Pointers are hard to understand at compile time! – accurate analyses are large and complex • Many different options: – Representation, heap modeling, aggregate modeling, flow sensitivity, context sensitivity • Many algorithms: – Address-taken, Steensgarde, Anderson, Emami – BDD-based, probabilistic • Many trade-offs: – space, time, accuracy, safety • Choose the ri ght type of anal ysi s gi ven how the inf ormati on will be use d

Carnegie Mellon 15-745: Pointer Analysis 37 Todd C. Mowry

10