Pointer Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Full text available at: http://dx.doi.org/10.1561/2500000014 Pointer Analysis Yannis Smaragdakis University of Athens [email protected] George Balatsouras University of Athens [email protected] Boston — Delft Full text available at: http://dx.doi.org/10.1561/2500000014 Foundations and Trends R in Programming Languages Published, sold and distributed by: now Publishers Inc. PO Box 1024 Hanover, MA 02339 United States Tel. +1-781-985-4510 www.nowpublishers.com [email protected] Outside North America: now Publishers Inc. PO Box 179 2600 AD Delft The Netherlands Tel. +31-6-51115274 The preferred citation for this publication is Y. Smaragdakis and G. Balatsouras. Pointer Analysis. Foundations and Trends R in Programming Languages, vol. 2, no. 1, pp. 1–69, 2015. R This Foundations and Trends issue was typeset in LATEX using a class file designed by Neal Parikh. Printed on acid-free paper. ISBN: 978-1-68083-020-0 c 2015 Y. Smaragdakis and G. Balatsouras All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers. Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen- ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The ‘services’ for users can be found on the internet at: www.copyright.com For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copy- ing, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to pho- tocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1 781 871 0245; www.nowpublishers.com; [email protected] now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail: [email protected] Full text available at: http://dx.doi.org/10.1561/2500000014 Foundations and Trends R in Programming Languages Volume 2, Issue 1, 2015 Editorial Board Editor-in-Chief Mooly Sagiv Tel Aviv University Israel Editors Martín Abadi Robert Harper Ganesan Ramalingam Microsoft Research & CMU Microsoft Research UC Santa Cruz Tim Harris Mooly Sagiv Anindya Banerjee Oracle Tel Aviv University IMDEA Fritz Henglein Davide Sangiorgi Patrick Cousot University of Copenhagen University of Bologna ENS Paris & NYU Rupak Majumdar David Schmidt Oege De Moor MPI-SWS & UCLA Kansas State University University of Oxford Kenneth McMillan Peter Sewell Matthias Felleisen Microsoft Research University of Cambridge Northeastern University J. Eliot B. Moss Scott Stoller John Field UMass, Amherst Stony Brook University Google Andrew C. Myers Peter Stuckey Cormac Flanagan Cornell University University of Melbourne UC Santa Cruz Hanne Riis Nielson Jan Vitek Philippa Gardner TU Denmark Purdue University Imperial College Peter O’Hearn Philip Wadler Andrew Gordon UCL University of Edinburgh Microsoft Research & Benjamin C. Pierce David Walker University of Edinburgh UPenn Princeton University Dan Grossman Andrew Pitts Stephanie Weirich University of Washington University of Cambridge UPenn Full text available at: http://dx.doi.org/10.1561/2500000014 Editorial Scope Topics Foundations and Trends R in Programming Languages publishes survey and tutorial articles in the following topics: • Abstract interpretation • Programming languages for concurrency • Compilation and interpretation techniques • Programming languages for • Domain specific languages parallelism • Formal semantics, including • Program synthesis lambda calculi, process calculi, and process algebra • Program transformations and optimizations • Language paradigms • Mechanical proof checking • Program verification • Memory management • Runtime techniques for • Partial evaluation programming languages • Program logic • Software model checking • Programming language • Static and dynamic program implementation analysis • Programming language security • Type theory and type systems Information for Librarians Foundations and Trends R in Programming Languages, 2015, Volume 2, 4 issues. ISSN paper version 2325-1107. ISSN online version 2325-1131. Also available as a combined paper and online subscription. Full text available at: http://dx.doi.org/10.1561/2500000014 Foundations and Trends R in Programming Languages Vol. 2, No. 1 (2015) 1–69 c 2015 Y. Smaragdakis and G. Balatsouras DOI: 10.1561/2500000014 Pointer Analysis Yannis Smaragdakis George Balatsouras University of Athens University of Athens [email protected] [email protected] Full text available at: http://dx.doi.org/10.1561/2500000014 Contents 1 Introduction 2 2 Core Pointer Analysis 5 2.1 Andersen-Style Points-To Analysis, Declaratively . 7 2.2 Other Approaches . 14 3 Analysis of Realistic Languages 18 3.1 Arrays and Other Language Features . 18 3.2 Exception Analysis . 20 3.3 Reflection Analysis . 23 4 Context Sensitivity 29 4.1 Context-Sensitive Analysis Model . 30 4.2 Call-Site Sensitivity . 34 4.3 Object Sensitivity . 36 4.4 Discussion . 39 5 Flow Sensitivity, Must-Analysis, and ... Pointers 43 5.1 Flow Sensitivity . 43 5.2 Must-Analysis . 45 5.3 Other Directions . 52 ii Full text available at: http://dx.doi.org/10.1561/2500000014 iii 6 Conclusions 59 Acknowledgments 61 References 62 Full text available at: http://dx.doi.org/10.1561/2500000014 Abstract Pointer analysis is a fundamental static program analysis, with a rich literature and wide applications. The goal of pointer analysis is to com- pute an approximation of the set of program objects that a pointer variable or expression can refer to. We present an introduction and survey of pointer analysis tech- niques, with an emphasis on distilling the essence of common analysis algorithms. To this end, we focus on a declarative presentation of a com- mon core of pointer analyses: algorithms are modeled as configurable, yet easy-to-follow, logical specifications. The specifications serve as a starting point for a broader discussion of the literature, as independent threads spun from the declarative model. Y. Smaragdakis and G. Balatsouras. Pointer Analysis. Foundations and Trends R in Programming Languages, vol. 2, no. 1, pp. 1–69, 2015. DOI: 10.1561/2500000014. Full text available at: http://dx.doi.org/10.1561/2500000014 1 Introduction Pointer analysis or points-to analysis is a static program analysis that determines information on the values of pointer variables or expres- sions. Such information offers a static model of a program’s heap. Since the heap is the primary structure for global program data, pointer anal- ysis forms the substrate of most inter-procedural static analyses. Vir- tually all interesting questions one may want to ask of a program will eventually need to query the possible values of a pointer expression, or its relationship to other pointer expressions. The exact representation of such information, i.e., the static abstraction used to model the heap, often serves as a classifier of the analysis algorithm. Although the litera- ture is not entirely consistent on high-level terminology, pointer analysis is a near-synonym of alias analysis. Whereas, however, pointer/points- to analysis typically tries to model heap objects and asks “what objects can a variable point to?”, alias analysis algorithms focus on the closely related question of “can a pair of variables or expressions be aliases, i.e., point to the same object?” [Landi and Ryder, 1992, Emami et al., 1994] In this monograph, we attempt to survey the most common modern approaches to pointer analysis, with an eye towards ease of exposition 2 Full text available at: http://dx.doi.org/10.1561/2500000014 3 and concreteness. Our presentation aspires to be rather more tutorial and hands-on than other surveys of the pointer analysis area. For a thorough view of the literature, with an emphasis on coverage, readers can consult Hind [2001], Ryder [2003], Sridharan et al. [2013], and Kanvar and Khedker [2014]. Our tutorial will develop, in significant detail, a modular, config- urable model of a standard points-to analysis. This analysis model is a skeleton on which we progressively add more flesh, to reflect several realistic features and analysis enhancements from the recent literature. The analysis model will also serve as a firm basis for high-level tangen- tial discussions on topics that deviate from the model: alias analysis, complexity theory results, algorithms not captured well by the formal model, and more. Importantly, our analysis model is executable: The specification of our algorithms is given in Datalog, which is simultaneously a logic and a realistic programming language. The use of Datalog allows us to express the precision aspects of pointer analyses concisely, at almost the same high level as a mathematical formalism, yet with no need to separately treat the topic of how to implement the algorithms so that they perform efficiently. The axes of precision and performance/efficiency characterize every