Pointer Analysis for C Programs Through AST Traversal

Total Page:16

File Type:pdf, Size:1020Kb

Pointer Analysis for C Programs Through AST Traversal Pointer Analysis for C Programs Through AST Traversal Marcio Buss∗ Stephen A. Edwards† Bin Yao Daniel Waddington Department of Computer Science Network Platforms Research Group Columbia University Bell Laboratories, Lucent Technologies New York, NY 10027 Holmdel, NJ 07733 {marcio,sedwards}@cs.columbia.edu {byao,dwaddington}@lucent.com Abstract dismantling the program into increasingly lower-level rep- resentations that deliberately discard most of the original We present a pointer analysis algorithm designed for structure of the source code to simplify its analysis. source-to-source transformations. Existing techniques for By contrast, source-to-source techniques strive to pre- pointer analysis apply a collection of inference rules to a serve everything about the structure of the original source dismantled intermediate form of the source program, mak- so that only minimal, necessary changes are made. As such, ing them difficult to apply to source-to-source tools that they typically manipulate abstract syntax trees that are little generally work on abstract syntax trees to preserve details more than a structured interpretation of the original program of the source program. text. Such trees are often manipulated directly through tree- Our pointer analysis algorithm operates directly on the or term-rewriting systems such as Stratego [15, 16]. abstract syntax tree of a C program and uses a form of stan- In this paper, we present an algorithm developed to per- dard dataflow analysis to compute the desired points-to in- form pointer analysis directly on abstract syntax trees. We formation. We have implemented our algorithm in a source- implemented our algorithm in a source-to-source tool called to-source translation framework and experimental results Proteus [17], which uses Stratego [15] as a back-end, and show that it is practical on real-world examples. find that it works well in practice. 2 Existing Pointer Analysis Techniques 1 Introduction Many techniques have been proposed for pointer analy- The role of pointer analysis in understanding C programs sis of C programs [1, 3, 4, 6, 10, 12, 14, 18]. They differ has been studied for years, being the subject of several PhD mainly in how they group related alias information. Fig- thesis and nearly a hundred research papers [9]. This type ure 1 shows a C fragment and the points-to sets computed of static analysis has been used in a variety of applications by four well-known flow-insensitive algorithms. such as live variable analysis for register allocation and Arrows in the figure represent pointer relationships be- constant propagation, checking for potential runtime errors tween the variables in the head and tail nodes: an arc from (e.g., null pointer dereferencing), static schedulers that need a to b means that variable a points-to variable b, or may to track resource allocation and usage, etc. Despite its appli- point-to that variable, depending on the specific algorithm. cability in several other areas, however, pointer analysis has been targeted primarily at compilation, be it software [9] or hardware [13]. In particular, the use of pointer analysis (and p=&x; p=&y; q=&z; p=q; x=&a; y=&b; z=&c; in fact, static analysis in general) for automated source code transformations remains little explored. Andersen [1] Steensgaard [15] Das [5] Heintze [8] We believe the main reason for this is the different pro- p q p q p q p q gram representations employed in source-to-source tools. x y z x,y,z x,y z x y z Historically, pointer analysis algorithms have been imple- mented in optimizing compilers, which typically proceed by a b c a,b,c a,b,c a b c ∗ supported in part by CNPq Brazilian Research Council, grant number Figure 1. Results of various flow-insensitive 200346/01-6 †supported by an NSF CAREER award, a grant from Intel corporation, pointer analysis algorithms. an award from the SRC, and by New York State’s NYSTAR program Some techniques encapusulate more than one variable in a of these properties depend on the order in which the state- single node, as seen in Steensgaard’s and Das’s approaches, ments of the program execute. As a result, the approach we in order to speed-up the computation. These methods trade adopt is flow-sensitive. precision for running time: variable x, for instance, points- to a, b and c on both techniques, although the code only 2.1 Analysis Accuracy assigns a’s address to x. Broadly, existing techniques can be classified as Another source of approximation commonly found in to- constraint-solving [5, 7, 8] or dataflow-based [6, 11, 12, 18]. day’s approaches is the adoption of the so-called non-visible Members of both groups usually define a minimal gram- variables [10], later renamed to invisible variables [6] or, al- ternatively, extended parameters [18]. When a function call mar for the source language that includes only basic op- 1 erators and statements. They then build templates used to takes place, a parameter p of pointer type might point to a match these statements. The templates are cast as inference variable v that is not in the scope of the called function. To rules [5, 7, 8] or dataflow equations [6, 11, 12, 18]. The al- keep track of such pointer relationships, special symbolic gorithms consist of iterative applications of inference rules names are created in the enclosing scope [6] [10] [18] and or dataflow equations on the statements of the program, dur- then manipulated in place of v whenever p is dereferenced. ing which pointer relationships are derived. This approach When the function call returns to the caller, the information assumes that the C program only contains allowed state- kept in the symbolic name is ’mapped’ back to v. For ex- ample, for a variable x with type int**, symbolic names ments. For instance, a=**b, with two levels of dereference 2 in the right-hand side, is commonly parsed 1 x and 2 x with types int* and int would be created [6] [18]. If an indirect reference, say *x, can lead to an out- = of-scope variable w, the corresponding symbolic name 1 x a is used to represent w. * There are some drawbacks with this approach: it adds * an overhead in the analysis due to this ’mapping’ and ’un- mapping’ of information, and it can become too approxi- b mate as the chain of function calls gets larger. The follow- Existing techniques generally require the preceding ing example shows how spurious aliases can be generated statement to be dismantled into two sub-expressions, each even though symbolic variable 1 a is not accessed within having at most one level of dereference: the called function. int *a, *b; Pointer Relationships = = int main() At P1: At P2: { Mapping: int c,d,e; d t a 1_a => c,d * * if (...) a 1_b => e a = &c; c else b t a = &d; b a e 1_a if (...) b = &c; b It is difficult to employ such an approach to source-to- else 1_b b = &e; source transformations because it is difficult to correlate the /* P1 */ f(); At P3: results calculated on the dismantled program with the origi- } d nal source. Furthermore, it introduces needless intermediate int f() a { /* P2 */ c variables, which can increase the analysis cost. int w; For source-to-source transformations, we want to per- b z = &w; e form the analysis close to the source level. It is particularly } /* P3 */ useful to directly analyze the ASTs and annotate them with the results of the analysis. Hence, we need to be able to Figure 2. Innacuracy due to invisible vari- handle arbitrary compositions of statements. ables. Precision is another issue in source-to-source transfor- mations: we want the most precise analysis practical be- In the example, 1 a stands for more than one program cause otherwise we may make unnecessary changes to the variable, namely c and d, due to the double assignment to code or, even worse, make incorrect changes. A flow- global variable a [6] [18]. When the statement b=&c insensitive analysis cannot, for example, determine that a pointer is initialized before it is used or that a pointer has 1Or global variable different values in different regions of the program. Both 2Process is applied recursively to all levels of pointer type. 2 is analyzed, local variable c has been already mapped to We assume the entire source code of the subject appli- 1 a, thus b is assumed to possibly point to such symbolic cation (multiple translation units, multiple files) is resolved variable. 1 b is then created to represent program variable into a large AST that resides in memory [17], so that we e, and the assignment b = &e induces pointer b to be also are able to jump from one procedure to another through tree associated with 1 b. When f returns, an additional relation- queries. The analysis starts off at the program’s main func- ship between program variables b and d has been created, tion, iterating through its statements. If a function call is even though only local variables were accessed within f .A encountered, its body is recursively analyzed taking into ac- pointer relationship to 1 a was generated because of a rela- count pointers being passed as parameters as well as global tionship to one of the variables 1 a stands for - not all the pointers. When the analysis reaches the end of the function, variables it represents3. it continues at the statement following the function call. The adoption of invisible variables, however, is of rel- Below, we give an overview of some aspects of the im- evant importance if one’s priority is the efficiency of the plementation.
Recommended publications
  • Derivatives of Parsing Expression Grammars
    Derivatives of Parsing Expression Grammars Aaron Moss Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada [email protected] This paper introduces a new derivative parsing algorithm for recognition of parsing expression gram- mars. Derivative parsing is shown to have a polynomial worst-case time bound, an improvement on the exponential bound of the recursive descent algorithm. This work also introduces asymptotic analysis based on inputs with a constant bound on both grammar nesting depth and number of back- tracking choices; derivative and recursive descent parsing are shown to run in linear time and constant space on this useful class of inputs, with both the theoretical bounds and the reasonability of the in- put class validated empirically. This common-case constant memory usage of derivative parsing is an improvement on the linear space required by the packrat algorithm. 1 Introduction Parsing expression grammars (PEGs) are a parsing formalism introduced by Ford [6]. Any LR(k) lan- guage can be represented as a PEG [7], but there are some non-context-free languages that may also be represented as PEGs (e.g. anbncn [7]). Unlike context-free grammars (CFGs), PEGs are unambiguous, admitting no more than one parse tree for any grammar and input. PEGs are a formalization of recursive descent parsers allowing limited backtracking and infinite lookahead; a string in the language of a PEG can be recognized in exponential time and linear space using a recursive descent algorithm, or linear time and space using the memoized packrat algorithm [6]. PEGs are formally defined and these algo- rithms outlined in Section 3.
    [Show full text]
  • Declaring Pointers in Functions C
    Declaring Pointers In Functions C isSchoolgirlish fenestrated Tye or overlain usually moltenly.menace some When babbling Raleigh orsubmersing preappoints his penetratingly. hums wing not Calvinism insidiously Benton enough, always is Normie outdance man? his rectorials if Rodge Please suggest some time binding in practice, i get very complicated you run this waste and functions in the function pointer arguments Also allocated on write in c pointers in declaring function? Been declared in function pointers cannot declare more. After taking these requirements you do some planning start coding. Leave it to the optimizer? Web site in. Functions Pointers in C Programming with Examples Guru99. Yes, here is a complete program written in C, things can get even more hairy. How helpful an action do still want? This title links to the home page. The array of function pointers offers the facility to access the function using the index of the array. If you want to avoid repeating the loop implementations and each branch of the decision has similar code, or distribute it is void, the appropriate function is called through a function pointer in the current state. Can declare functions in function declarations and how to be useful to respective function pointer to the basic concepts of characters used as always going to? It in functions to declare correct to the declaration of your site due to functions on the modified inside the above examples. In general, and you may publicly display copies. You know any type of a read the licenses of it is automatically, as small number types are functionally identical for any of accessing such as student structure? For this reason, every time you need a particular behavior such as drawing a line, but many practical C programs rely on overflow wrapping around.
    [Show full text]
  • Refining Indirect-Call Targets with Multi-Layer Type Analysis
    Where Does It Go? Refining Indirect-Call Targets with Multi-Layer Type Analysis Kangjie Lu Hong Hu University of Minnesota, Twin Cities Georgia Institute of Technology Abstract ACM Reference Format: System software commonly uses indirect calls to realize dynamic Kangjie Lu and Hong Hu. 2019. Where Does It Go? Refining Indirect-Call Targets with Multi-Layer Type Analysis. In program behaviors. However, indirect-calls also bring challenges 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS ’19), November 11–15, 2019, to constructing a precise control-flow graph that is a standard pre- London, United Kingdom. ACM, New York, NY, USA, 16 pages. https://doi. requisite for many static program-analysis and system-hardening org/10.1145/3319535.3354244 techniques. Unfortunately, identifying indirect-call targets is a hard problem. In particular, modern compilers do not recognize indirect- call targets by default. Existing approaches identify indirect-call 1 Introduction targets based on type analysis that matches the types of function Function pointers are commonly used in C/C++ programs to sup- pointers and the ones of address-taken functions. Such approaches, port dynamic program behaviors. For example, the Linux kernel however, suffer from a high false-positive rate as many irrelevant provides unified APIs for common file operations suchas open(). functions may share the same types. Internally, different file systems have their own implementations of In this paper, we propose a new approach, namely Multi-Layer these APIs, and the kernel uses function pointers to decide which Type Analysis (MLTA), to effectively refine indirect-call targets concrete implementation to invoke at runtime.
    [Show full text]
  • Incrementalized Pointer and Escape Analysis Frédéric Vivien, Martin Rinard
    Incrementalized Pointer and Escape Analysis Frédéric Vivien, Martin Rinard To cite this version: Frédéric Vivien, Martin Rinard. Incrementalized Pointer and Escape Analysis. PLDI ’01 Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, Jun 2001, Snowbird, United States. pp.35–46, 10.1145/381694.378804. hal-00808284 HAL Id: hal-00808284 https://hal.inria.fr/hal-00808284 Submitted on 14 Oct 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Incrementalized Pointer and Escape Analysis∗ Fred´ eric´ Vivien Martin Rinard ICPS/LSIIT Laboratory for Computer Science Universite´ Louis Pasteur Massachusetts Institute of Technology Strasbourg, France Cambridge, MA 02139 [email protected]­strasbg.fr [email protected] ABSTRACT to those parts of the program that offer the most attrac- We present a new pointer and escape analysis. Instead of tive return (in terms of optimization opportunities) on the analyzing the whole program, the algorithm incrementally invested resources. Our experimental results indicate that analyzes only those parts of the program that may deliver this approach usually delivers almost all of the benefit of the useful results. An analysis policy monitors the analysis re- whole-program analysis, but at a fraction of the cost.
    [Show full text]
  • SI 413, Unit 3: Advanced Scheme
    SI 413, Unit 3: Advanced Scheme Daniel S. Roche ([email protected]) Fall 2018 1 Pure Functional Programming Readings for this section: PLP, Sections 10.7 and 10.8 Remember there are two important characteristics of a “pure” functional programming language: • Referential Transparency. This fancy term just means that, for any expression in our program, the result of evaluating it will always be the same. In fact, any referentially transparent expression could be replaced with its value (that is, the result of evaluating it) without changing the program whatsoever. Notice that imperative programming is about as far away from this as possible. For example, consider the C++ for loop: for ( int i = 0; i < 10;++i) { /∗ some s t u f f ∗/ } What can we say about the “stuff” in the body of the loop? Well, it had better not be referentially transparent. If it is, then there’s no point in running over it 10 times! • Functions are First Class. Another way of saying this is that functions are data, just like any number or list. Functions are values, in fact! The specific privileges that a function earns by virtue of being first class include: 1) Functions can be given names. This is not a big deal; we can name functions in pretty much every programming language. In Scheme this just means we can do (define (f x) (∗ x 3 ) ) 2) Functions can be arguments to other functions. This is what you started to get into at the end of Lab 2. For starters, there’s the basic predicate procedure?: (procedure? +) ; #t (procedure? 10) ; #f (procedure? procedure?) ; #t 1 And then there are “higher-order functions” like map and apply: (apply max (list 5 3 10 4)) ; 10 (map sqrt (list 16 9 64)) ; '(4 3 8) What makes the functions “higher-order” is that one of their arguments is itself another function.
    [Show full text]
  • Abstract Syntax Trees & Top-Down Parsing
    Abstract Syntax Trees & Top-Down Parsing Review of Parsing • Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree • Issues: – How do we recognize that s ∈ L(G) ? – A parse tree of s describes how s ∈ L(G) – Ambiguity: more than one parse tree (possible interpretation) for some string s – Error: no parse tree for some string s – How do we construct the parse tree? Compiler Design 1 (2011) 2 Abstract Syntax Trees • So far, a parser traces the derivation of a sequence of tokens • The rest of the compiler needs a structural representation of the program • Abstract syntax trees – Like parse trees but ignore some details – Abbreviated as AST Compiler Design 1 (2011) 3 Abstract Syntax Trees (Cont.) • Consider the grammar E → int | ( E ) | E + E • And the string 5 + (2 + 3) • After lexical analysis (a list of tokens) int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ • During parsing we build a parse tree … Compiler Design 1 (2011) 4 Example of Parse Tree E • Traces the operation of the parser E + E • Captures the nesting structure • But too much info int5 ( E ) – Parentheses – Single-successor nodes + E E int 2 int3 Compiler Design 1 (2011) 5 Example of Abstract Syntax Tree PLUS PLUS 5 2 3 • Also captures the nesting structure • But abstracts from the concrete syntax a more compact and easier to use • An important data structure in a compiler Compiler Design 1 (2011) 6 Semantic Actions • This is what we’ll use to construct ASTs • Each grammar symbol may have attributes – An attribute is a property of a programming language construct
    [Show full text]
  • Generalized Escape Analysis and Lightweight Continuations
    Generalized Escape Analysis and Lightweight Continuations Abstract Might and Shivers have published two distinct analyses for rea- ∆CFA and ΓCFA are two analysis techniques for higher-order soning about programs written in higher-order languages that pro- functional languages (Might and Shivers 2006b,a). The former uses vide first-class continuations, ∆CFA and ΓCFA (Might and Shivers an enrichment of Harrison’s procedure strings (Harrison 1989) to 2006a,b). ∆CFA is an abstract interpretation based on a variation reason about control-, environment- and data-flow in a program of procedure strings as developed by Sharir and Pnueli (Sharir and represented in CPS form; the latter tracks the degree of approxi- Pnueli 1981), and, most proximately, Harrison (Harrison 1989). mation inflicted by a flow analysis on environment structure, per- Might and Shivers have applied ∆CFA to problems that require mitting the analysis to exploit extra precision implicit in the flow reasoning about the environment structure relating two code points lattice, which improves not only the precision of the analysis, but in a program. ΓCFA is a general tool for “sharpening” the precision often its run-time as well. of an abstract interpretation by tracking the amount of abstraction In this paper, we integrate these two mechanisms, showing how the analysis has introduced for the specific abstract interpretation ΓCFA’s abstract counting and collection yields extra precision in being performed. This permits the analysis to opportunistically ex- ∆CFA’s frame-string model, exploiting the group-theoretic struc- ploit cases where the abstraction has introduced no true approxi- ture of frame strings. mation.
    [Show full text]
  • Escape from Escape Analysis of Golang
    Escape from Escape Analysis of Golang Cong Wang Mingrui Zhang Yu Jiang∗ KLISS, BNRist, School of Software, KLISS, BNRist, School of Software, KLISS, BNRist, School of Software, Tsinghua University Tsinghua University Tsinghua University Beijing, China Beijing, China Beijing, China Huafeng Zhang Zhenchang Xing Ming Gu Compiler and Programming College of Engineering and Computer KLISS, BNRist, School of Software, Language Lab, Huawei Technologies Science, ANU Tsinghua University Hangzhou, China Canberra, Australia Beijing, China ABSTRACT KEYWORDS Escape analysis is widely used to determine the scope of variables, escape analysis, memory optimization, code generation, go pro- and is an effective way to optimize memory usage. However, the gramming language escape analysis algorithm can hardly reach 100% accurate, mistakes ACM Reference Format: of which can lead to a waste of heap memory. It is challenging to Cong Wang, Mingrui Zhang, Yu Jiang, Huafeng Zhang, Zhenchang Xing, ensure the correctness of programs for memory optimization. and Ming Gu. 2020. Escape from Escape Analysis of Golang. In Software In this paper, we propose an escape analysis optimization ap- Engineering in Practice (ICSE-SEIP ’20), May 23–29, 2020, Seoul, Republic of proach for Go programming language (Golang), aiming to save Korea. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3377813. heap memory usage of programs. First, we compile the source code 3381368 to capture information of escaped variables. Then, we change the code so that some of these variables can bypass Golang’s escape 1 INTRODUCTION analysis mechanism, thereby saving heap memory usage and reduc- Memory management is very important for software engineering.
    [Show full text]
  • Escape Analysis for Java
    Escape Analysis for Java Jong-Deok Choi Manish Gupta Mauricio Serrano Vugranam C. Sreedhar Sam Midkiff IBM T. J. Watson Research Center P. O. Box 218, Yorktown Heights, NY 10598 jdchoi, mgupta, mserrano, vugranam, smidkiff ¡ @us.ibm.com ¢¤£¦¥¨§ © ¦ § § © § This paper presents a simple and efficient data flow algorithm Java continues to gain importance as a language for general- for escape analysis of objects in Java programs to determine purpose computing and for server applications. Performance (i) if an object can be allocated on the stack; (ii) if an object is an important issue in these application environments. In is accessed only by a single thread during its lifetime, so that Java, each object is allocated on the heap and can be deallo- synchronization operations on that object can be removed. We cated only by garbage collection. Each object has a lock asso- introduce a new program abstraction for escape analysis, the ciated with it, which is used to ensure mutual exclusion when connection graph, that is used to establish reachability rela- a synchronized method or statement is invoked on the object. tionships between objects and object references. We show that Both heap allocation and synchronization on locks incur per- the connection graph can be summarized for each method such formance overhead. In this paper, we present escape analy- that the same summary information may be used effectively in sis in the context of Java for determining whether an object (1) different calling contexts. We present an interprocedural al- may escape the method (i.e., is not local to the method) that gorithm that uses the above property to efficiently compute created the object, and (2) may escape the thread that created the connection graph and identify the non-escaping objects for the object (i.e., other threads may access the object).
    [Show full text]
  • Object-Oriented Programming in C
    OBJECT-ORIENTED PROGRAMMING IN C CSCI 5448 Fall 2012 Pritha Srivastava Introduction Goal: To discover how ANSI – C can be used to write object- oriented code To revisit the basic concepts in OO like Information Hiding, Polymorphism, Inheritance etc… Pre-requisites – A good knowledge of pointers, structures and function pointers Table of Contents Information Hiding Dynamic Linkage & Polymorphism Visibility & Access Functions Inheritance Multiple Inheritance Conclusion Information Hiding Data types - a set of values and operations to work on them OO design paradigm states – conceal internal representation of data, expose only operations that can be used to manipulate them Representation of data should be known only to implementer, not to user – the answer is Abstract Data Types Information Hiding Make a header file only available to user, containing a descriptor pointer (which represents the user-defined data type) functions which are operations that can be performed on the data type Functions accept and return generic (void) pointers which aid in hiding the implementation details Information Hiding Set.h Example: Set of elements Type Descriptor operations – add, find and drop. extern const void * Set; Define a header file Set.h (exposed to user) void* add(void *set, const void *element); Appropriate void* find(const void *set, const Abstractions – Header void *element); file name, function name void* drop(void *set, const void reveal their purpose *element); Return type - void* helps int contains(const void *set, const in hiding implementation void *element); details Set.c Main.c - Usage Information Hiding Set.c – Contains void* add (void *_set, void *_element) implementation details of { Set data type (Not struct Set *set = _set; struct Object *element = _element; exposed to user) if ( !element-> in) The pointer Set (in Set.h) is { passed as an argument to element->in = set; add, find etc.
    [Show full text]
  • Making Collection Operations Optimal with Aggressive JIT Compilation
    Making Collection Operations Optimal with Aggressive JIT Compilation Aleksandar Prokopec David Leopoldseder Oracle Labs Johannes Kepler University, Linz Switzerland Austria [email protected] [email protected] Gilles Duboscq Thomas Würthinger Oracle Labs Oracle Labs Switzerland Switzerland [email protected] [email protected] Abstract 1 Introduction Functional collection combinators are a neat and widely ac- Compilers for most high-level languages, such as Scala, trans- cepted data processing abstraction. However, their generic form the source program into an intermediate representa- nature results in high abstraction overheads – Scala collec- tion. This intermediate program representation is executed tions are known to be notoriously slow for typical tasks. We by the runtime system. Typically, the runtime contains a show that proper optimizations in a JIT compiler can widely just-in-time (JIT) compiler, which compiles the intermediate eliminate overheads imposed by these abstractions. Using representation into machine code. Although JIT compilers the open-source Graal JIT compiler, we achieve speedups of focus on optimizations, there are generally no guarantees up to 20× on collection workloads compared to the standard about the program patterns that result in fast machine code HotSpot C2 compiler. Consequently, a sufficiently aggressive – most optimizations emerge organically, as a response to JIT compiler allows the language compiler, such as Scalac, a particular programming paradigm in the high-level lan- to focus on other concerns. guage. Scala’s functional-style collection combinators are a In this paper, we show how optimizations, such as inlin- paradigm that the JVM runtime was unprepared for. ing, polymorphic inlining, and partial escape analysis, are Historically, Scala collections [Odersky and Moors 2009; combined in Graal to produce collections code that is optimal Prokopec et al.
    [Show full text]
  • Call Graph Generation for LLVM (Due at 11:59Pm 3/7)
    CS510 Project 1: Call Graph Generation for LLVM (Due at 11:59pm 3/7) 2/21/17 Project Description A call graph is a directed graph that represents calling relationships between functions in a computer program. Specifically, each node represents a function and each edge (f, g) indicates that function f calls function g [1]. In this project, you are asked to familiarize yourself with the LLVM source code and then write a program analysis pass for LLVM. This analysis will produce the call graph of input program. Installing and Using LLVM We will be using the Low Level Virtual Machine (LLVM) compiler infrastructure [5] de- veloped at the University of Illinois Urbana-Champaign for this project. We assume you have access to an x86 based machine (preferably running Linux, although Windows and Mac OS X should work as well). First download, install, and build the latest LLVM source code from [5]. Follow the instructions on the website [4] for your particular machine configuration. Note that in or- der use a debugger on the LLVM binaries you will need to pass --enable-debug-runtime --disable-optimized to the configure script. Moreover, make the LLVM in debug mode. Read the official documentation [6] carefully, specially the following pages: 1. The LLVM Programmers Manual (http://llvm.org/docs/ProgrammersManual.html) 2. Writing an LLVM Pass tutorial (http://llvm.org/docs/WritingAnLLVMPass.html) Project Requirements LLVM is already able to generate the call graphs of the pro- grams. However, this feature is limited to direct function calls which is not able to detect calls by function pointers.
    [Show full text]