Control-Flow Analysis of Dynamic Languages Via Pointer Analysis

Control-Flow Analysis of Dynamic Languages via Pointer Analysis Steven Lyde William E. Byrd Matthew Might University of Utah flyde,webyrd,[email protected] Abstract assignments of either #t or #f to the parameters of f result We demonstrate how to map a control-flow analysis for in f evaluating to #t. a higher-order language (dynamic languages are typically (define (taut f n) higher-order) into a pointer analysis for a first-order lan- (if (= n 0) f guage, such as C. This allows us to use existing pointer anal- (and (taut (f #t) (- n 1)) ysis tools to perform a control-flow analysis, exploiting their (taut (f #f) (- n 1))))) technical advancements and the engineering effort that went into developing them. We compare the results of two recent (define (g x) parallel pointer analysis tools with a parallel control-flow (lambda (y) analysis tool. While it has been known that a control-flow (or (and x (not y)) analysis of higher-order languages and a pointer analysis of (or (not x) y)))) first-order languages are very similar, we demonstrate that these two analyses are actually more similar than previously (define (h z) z) thought. We present the first mapping between a high-order control-flow analysis and a pointer analysis. (taut g 2) Categories and Subject Descriptors D.3.4 [Programming languages]: Processors—Optimization (taut h 1) General Terms Languages, Theory Because functions are passed around as arguments, it Keywords program analysis, high-order languages, pointer might not be immediately clear which functions can flow analysis to f and thus what functions we are invoking when we call f. There are two call sites which recursively invoke taut, 1. Introduction but the argument is the result of invoking f itself. In order to answer the question of which functions can flow to f, In dynamic languages, because they are typically higher- we need a control-flow analysis. A higher-order control-flow order, it is not always clear from the syntax which functions analysis conservatively bounds the set of functions for a are being called at which call sites. Take for example the higher-order call site. simple Racket program that implements Turner’s tautology Imagine we have an analysis or compiler optimization checker and invokes it two times [6]. The details of the that needs a control-flow analysis, but we don’t have a example are not important, just the illustration it gives. The control-flow analysis immediately available to us. Our anal- function taut takes a function f that represents a boolean ysis might need to know the control flow of the program in expression. The function taut then checks if all possible order to prove some security properties or prove the absence of errors. Our compiler optimization might need to know the control flow of the program to speed up the program by Permission to make digital or hard copies of all or part of this work for personal or inlining functions. classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation There are a large number of tools that can perform a re- on the first page. Copyrights for components of this work owned by others than ACM lated but different analysis: pointer analysis. A pointer anal- must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a ysis tells us which objects variables point to in a program. fee. Request permissions from [email protected]. Wouldn’t it be nice if we could just use these tools? We could CONF ’yy, Month d–d, 20yy, City, ST, Country. Copyright c 20yy ACM 978-1-nnnn-nnnn-n/yy/mm. $15.00. spend a significant amount of time developing a new mod- http://dx.doi.org/10.1145/nnnnnnn.nnnnnnn ern control-flow analysis tool, pulling in the latest features from these similar but different tools, or we could find ways that the answer we get back from a pointer analysis tool to reuse these existing tools to solve the problem we have at will be the same answer we would get from a control- hand. Using existing tools saves us the effort of developing flow analysis tool. our own tool and allows us to rely on mature and well tested • In section 6, we then demonstrate the benefit of this tools. mapping by comparing two recent parallel pointer analy- 1.1 Overview sis tools with a recent parallel control-flow analysis tool called EigenCFA [13]. We compare a control-flow anal- Precise control-flow analysis is expensive. For example, ysis tool that runs on the GPU with a pointer analysis 0CFA as described by Palsberg, which is the analysis we tool that also runs on the GPU [9]. We also compare explore in this paper, is cubic [11]. This work was initially these tools to one that runs on a single CPU with multiple motivated by trying to identify ways we could speed up cores [8]. Both of theses pointer analysis tools were writ- control-flow analysis. With the advancement of GPUs being ten by Mendez-Lojo´ et al. For the benchmarks we used, used for general process computing and more cores being this multithreaded implementation performs the best. We available on commodity machines, one way to speed up the saw that the CPU multicore pointer analysis tool is able analysis is to parallelize it. The only work we know of that to run up to 35 times faster than the GPU control-flow parallelizes control-flow analysis of higher-order languages analysis tool. With the mapping of this paper, we beat the is EigenCFA [13]. We also know of two recent tools that fastest known GPU version of CFA with a pointer analy- parallelize pointer analysis developed by Mendez-Lojo´ et sis tool that runs on the CPU. al. [8, 9]. One option we considered to speed up control-flow analy- The implementation details of a static analysis tool sis was to deeply understand these two tools for pointer anal- matter. With the wide range of work that has been done ysis and port over any ideas that would improve EigenCFA. for pointer analysis, applying these techniques directly to However, we took the option to use those tools directly. But control-flow analysis of dynamic languages is advantageous. in order to do this, we needed to map a control-flow analysis Due to the mapping in this paper, for the chosen benchmarks, into a pointer analysis. In this paper, we demonstrate how to we now have the fastest way we know of to do higher-order do this. To our knowledge, even though the similarities be- control-flow analysis that has the precision of a 0CFA as tween these analyses have been known and there has been a formulated by Palsberg [11]. general feeling in the community that they are the same, this mapping has not been explicitly laid out [10]. 2. Background With this paper we plan to show that we can successfully In this section, we give a brief description of a traditional take the state of the art in pointer analysis and improve upon pointer analysis and a traditional control-flow analysis. the state of the art in control-flow analysis in terms of per- Control-flow analysis of higher-order programs and pointer formance. Our goal is to leverage existing tools available to analysis share much in common with each other and often us from the pointer analysis community. We wish to exploit the pointer analysis and control-flow analysis communities the efficiency of these static analysis tools for control-flow use similar techniques [7]. However, the relationship be- analysis of dynamic languages. tween the techniques is often obscured by differing termi- The contributions of this paper are as follows. nology and presentation styles. The brief overview of the two analyses given here il- • We show that there exists a direct mapping between a lustrates their differences. But the mapping given in the control-flow analysis of higher-order languages and a next section illustrates their similarities. By showing that pointer analysis for first-order languages. In fact, we the gap between pointer analysis of first-order languages and show three different mappings, each serving a slightly control-flow analysis of higher-order languages is even nar- different purpose. The first mapping, in section 3, is to rower than once thought, this allows for further applications help us demonstrate the connection between a control- of the large amount of research that has gone into pointer flow analysis and a pointer analysis. The second map- analysis to be applied to control-flow analyses. ping, in section 5.1, allows us to use the benchmarks that were used by the control-flow analysis tool EigenCFA. 2.1 Pointer Analysis The final mapping, in 5.4, makes a different trade-off by Pointer analysis is one of the most fundamental static anal- containing more inference rules, but results in fewer vari- yses with a broad range of applications. Is is used by tra- ables. Because of this mapping, we end up with one of ditional optimizing compilers and by applications involved the fastest tools for 0CFA. with program verification, bug finding, refactoring, and se- • We show in section 4 that the constraints generated by curity. a traditional control-flow analysis are equivalent to the Pointer analyses can change based on the desired pre- constraints generated by a pointer analysis after going cision.

Control-Flow Analysis of Dynamic Languages Via Pointer Analysis

7. Control Flow First?

A Survey of Hardware-Based Control Flow Integrity (CFI)

Control-Flow Analysis of Functional Programs

Obfuscating C++ Programs Via Control Flow Flattening

Control Flow Statements

Control Flow Statements in Java Tutorials Point

Control Flow Analysis

Code Based Analysis of Object-Oriented Systems Using Extended Control Flow Graph

Computer Science 2. Conditionals and Loops

Statement-Level Control Structures

Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs By: Nancy Pennington Presented By: Michael John Decker Author & Journal

Control Flow