1 Lecture 20 Pointer Analysis

1 Lecture 20 Pointer Analysis

Pros and Cons of Pointers • Many procedural languages have pointers Lecture 20 – e.g., C or C++: int *p = &x; • Pointers are powerful and convenient Pointer Analysis – can build arbitrary data structures • Pointers can also hinder compiler optimization – hard to know where pointers are pointing • Basics – must be conservative in their presence • Design Options • Has inspired much research • Pointer Analysis Algorithms – analyses to decide where pointers are pointing – many options and trade-offs • Pointer Analysis Using BDDs – open problem: a scalable accurate analysis • Probabilistic Pointer Analysis (Slide content courtesy of Greg Steffan, U. of Toronto) Carnegie Mellon Carnegie Mellon Todd C. Mowry 15-745: Pointer Analysis 1 15-745: Pointer Analysis 2 Todd C. Mowry Pointer Analysis Basics: Aliases The Pointer Alias Analysis Problem • Two variables are aliases if: • Decide for every pair of pointers at every program point: – they reference the same memory location – do they point to the same memory location? • More useful: • A difficult problem – prove variables reference different locations – shown to be undecidable by Landi, 1992 • Correctness: Alias sets: – report all pairs of pointers which do/may alias • Ambiguous: – int x,y; two pointers which may or may not alias • Accuracy/Precision: int *p = &x; – how few pairs of pointers are reported while remaining correct int *q = &y; – ie., reduce ambiguity to improve accuracy int *r = p; int **s = &q; Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 3 Todd C. Mowry 15-745: Pointer Analysis 4 Todd C. Mowry 1 Many Uses of Pointer Analysis Challenges for Pointer Analysis • Basic compiler optimizations • Complexity: huge in space and time – register allocation, CSE, dead code elimination, live variables, – compare every pointer with every other pointer instruction scheduling, loop invariant code motion, redundant load/ – at every program point store elimination – potentially considering all program paths to that point • Parallelization • Scalability vs accuracy trade-off – instruction-level parallelism – different analyses motivated for different purposes – thread-level parallelism – many useful algorithms (adds to confusion) • Behavioral synthesis • Coding corner cases – automatically converting C-code into gates – pointer arithmetic (*p++), casting, function pointers, long-jumps • Error detection and program understanding • Whole program? – memory leaks, wild pointers, security holes – most algorithms require the entire program – library code? optimizing at link-time only? Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 5 Todd C. Mowry 15-745: Pointer Analysis 6 Todd C. Mowry Pointer Analysis: Design Options Representation • Representation • Track pointer aliases • Heap modeling – <*a, b>, <*a, e>, <b, e>, • Aggregate modeling <**a, c>, <**a, d>, … • Flow sensitivity – More precise, less efficient • Context sensitivity • Track points-to information – <a, b>, <b, c>, <b, d>, <e, c>, <e, d> – Less precise, more efficient a = &b; b = &c; b = &d; e = b; Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 7 Todd C. Mowry 15-745: Pointer Analysis 8 Todd C. Mowry 2 Heap Modeling Options Aggregate Modeling Options • Heap merged Arrays Structures – i.e. “no heap modeling” • Allocation site (any call to malloc/calloc) Elements are treated Elements are treated – … Consider each to be a unique location as individual locations as individual locations – Doesn’t differentiate between multiple objects allocated by the … (“field sensitive”) same allocation site or • Shape analysis Treat entire array – Recognize linked lists, trees, DAGs, etc. as a single location or or … Treat first element Treat entire structure separate from others as a single location Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 9 Todd C. Mowry 15-745: Pointer Analysis 10 Todd C. Mowry Flow Sensitivity Options Flow Sensitivity Example • Flow insensitive (assuming allocation-site heap modeling) – The order of statements doesn’t matter Flow Insensitive • Result of analysis is the same regardless of statement order aS7 à – Uses a single global state to store results as they are computed S1: a = malloc(…); – Not very accurate S2: b = malloc(…); • Flow sensitive S3: a = b; S4: a = malloc(…); – The order of the statements matter Flow Sensitive S5: if(c) à – Need a control flow graph a = b; aS7 – Must store results for each program point S6: if(!c) – Improves accuracy a = malloc(…); • S7: … = *a; Path sensitive Path Sensitive – Each path in a control flow graph is considered aS7 à Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 11 Todd C. Mowry 15-745: Pointer Analysis 12 Todd C. Mowry 3 Context Sensitivity Options Pointer Alias Analysis Algorithms • Context insensitive/sensitive References: – whether to consider different calling contexts • “Points-to analysis in almost linear time”, Steensgaard, POPL 1996 – e.g., what are the possibilities for p at S6? • “Program Analysis and Specialization for the C Programming Language”, Andersen, Technical Report, 1994 int f() Context Insensitive: • “Context-sensitive interprocedural points-to analysis in the presence of { function pointers”, Emami et al., PLDI 1994 S4: p = &b; int a, b, *p; • “Pointer analysis: haven't we solved this problem yet?”, Hind, PASTE S5: g(); int main() 2001 } { • “Which pointer analysis should I use?”, Hind et al., ISSTA 2000 S1: f(); S2: p = &a; Context Sensitive: S3: g(); } int g() { S6: … = *p; } Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 13 Todd C. Mowry 15-745: Pointer Analysis 14 Todd C. Mowry Address Taken Address Taken Example • Basic, fast, ultra-conservative algorithm T *p, *q, *r; – flow-insensitive, context-insensitive – often used in production compilers int main() { void f() { g(T **fp) { S1: p = alloc(T); S6: q = alloc(T); T local; • Algorithm: if(…) f(); g(&q); – Generate the set of all variables whose addresses are assigned to s9: p = &local; g(&p); S8: r = alloc(T); another variable. } S4: p = alloc(T); } – Assume that any pointer can potentially point to any variable in that S5: … = *p; set. } • Complexity: O(n) - linear in size of program • Accuracy: very imprecise pS5 = Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 15 Todd C. Mowry 15-745: Pointer Analysis 16 Todd C. Mowry 4 Andersen’s Algorithm Andersen Example • Flow-insensitive, context-insensitive, iterative T *p, *q, *r; • Representation: – one points-to graph for entire program int main() { void f() { g(T **fp) { S1: p = alloc(T); S6: q = alloc(T); T local; – each node represents exactly one location if(…) f(); g(&q); • For each statement, build the points-to graph: s9: p = &local; g(&p); S8: r = alloc(T); } y = &x y points-to x S4: p = alloc(T); } S5: … = *p; y = x if x points-to w } then y points-to w *y = x if y points-to z and x points-to w then z points-to w y = *x if x points-to z and z points-to w then y points-to w p = • Iterate until graph no longer changes S5 • Worst case complexity: O(n3), where n = program size Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 17 Todd C. Mowry 15-745: Pointer Analysis 18 Todd C. Mowry Steensgaard’s Algorithm Steensgaard Example • Flow-insensitive, context-insensitive T *p, *q, *r; • Representation: – a compact points-to graph for entire program int main() { void f() { g(T **fp) { T local; • each node can represent multiple locations S1: p = alloc(T); S6: q = alloc(T); if(…) f(); g(&q); • but can only point to one other node s9: p = &local; g(&p); S8: r = alloc(T); – i.e. every node has a fan-out of 1 or 0 } S4: p = alloc(T); • union-find data structure implements fan-out } S5: … = *p; – “unioning” while finding eliminates need to iterate } • Worst case complexity: O(n) • Precision: less precise than Andersen’s pS5 = Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 19 Todd C. Mowry 15-745: Pointer Analysis 20 Todd C. Mowry 5 Example with Flow Sensitivity Pointer Analysis Using BDDs T *p, *q, *r; References: • “Cloning-based context-sensitive pointer alias analysis using binary int main() { void f() { g(T **fp) { decision diagrams”, Whaley and Lam, PLDI 2004 S1: p = alloc(T); S6: q = alloc(T); T local; • “Symbolic pointer analysis revisited”, Zhu and Calman, PDLI 2004 if(…) f(); g(&q); s9: p = &local; • “Points-to analysis using BDDs”, Berndl et al, PDLI 2003 g(&p); S8: r = alloc(T); } S4: p = alloc(T); } S5: … = *p; } pS5 = pS9 = Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 21 Todd C. Mowry 15-745: Pointer Analysis 22 Todd C. Mowry Binary Decision Diagram (BDD) BDD-Based Pointer Analysis • Use a BDD to represent transfer functions – encode procedure as a function of its calling context – compact and efficient representation • Perform context-sensitive, inter-procedural analysis – similar to dataflow analysis – but across the procedure call graph • Gives accurate results – and scales up to large programs Binary Decision Tree Truth Table BDD Carnegie Mellon Carnegie Mellon 15-745: Pointer Analysis 23 Todd C. Mowry 15-745: Pointer Analysis 24 Todd C. Mowry 6 Probabilistic Pointer Analysis Pointer Analysis: Yes, No, & Maybe References: • “A Probabilistic Pointer Analysis for Speculative Optimizations”, DaSilva and Steffan, ASPLOS 2006 *a = ~ optimize • “Compiler support for speculative multithreading architecture with ~ = *b probabilistic points-to analysis”, Shen et al., PPoPP 2003 Pointer Definitely • “Speculative Alias Analysis for Executable Code”, Fernandez and Analysis Definitely Not Espasa, PACT 2002 • “A General Compiler Framework for Speculative

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    10 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us