The Set Constraint/CFL Reachability Connection in Practice

The Set Constraint/CFL Reachability Connection in Practice John Kodumal Alex Aiken EECS Department Computer Science Department University of California, Berkeley Stanford University [email protected] [email protected] ABSTRACT implementations to scale has proved as tricky as implement- Many program analyses can be reduced to graph reachabil- ing set constraints, leading to new algorithms and optimiza- ity problems involving a limited form of context-free lan- tion techniques. guage reachability called Dyck-CFL reachability. We show Set constraints and CFL reachability were shown to be a new reduction from Dyck-CFL reachability to set con- closely related in work by Melski and Reps [13]. While their straints that can be used in practice to solve these problems. result is useful for understanding the conceptual similarity Our reduction is much simpler than the general reduction between the two problems, it does not serve as an imple- from context-free language reachability to set constraints. mentation strategy; as with many reductions, the cost of We have implemented our reduction on top of a set con- encoding one problem in the other formalism proves to be straints toolkit and tested its performance on a substantial prohibitive in practice. polymorphic flow analysis application. Furthermore, the Melski-Reps reduction does not show how to relate the state-of-the-art algorithms for set constraints to the state-of-the-art algorithms for CFL reachabil- Categories and Subject Descriptors ity. Optimizations in one formalism are not preserved across D.2.4 [Software Engineering]: Software/Program Verifi- the reduction to the other formalism. Demand-driven algo- cation; F.3.2 [Logics and Meanings of Programs]: Se- rithms for CFL reachability do not automatically lead to mantics of Programming Languages demand algorithms for set constraints (except by applying the reduction first). General Terms Our insight is based on the observation that almost all of the applications of CFL reachability in program analy- Algorithms, Design, Experimentation, Languages, Theory sis are based on Dyck languages, which contain strings of matched parentheses. For Dyck languages, we show an al- Keywords ternative reduction from CFL reachability to set constraints Set constraints, context-free language reachability, flow that addresses the issues mentioned above. The principal analysis, type qualifiers contributions of this work are as follows: • We give a novel construction for converting a Dyck- 1. INTRODUCTION CFL reachability problem into a set constraint prob- Many program analyses have been formulated and imple- lem. The construction is simpler than the more gen- mented using set constraints. Set constraint-based analyses eral reduction described in [13]. In fact, the constraint have been shown to scale to large programs on applications graphs produced by our construction are nearly iso- such as points-to analysis, shape analysis, and receiver class morphic to the original CFL graph. prediction, though scalability requires highly optimized implementations [4, 8, 19, 21]. • We show that on real polymorphic flow analysis prob- Recently, a number of program analyses have been formu- lems, our implementation of Dyck-CFL reachability lated as context-free language (CFL) reachability problems. based on set constraints remains competitive with a These include applications such as type-based polymorphic highly tuned CFL reachability engine. This is some- flow analysis, field-sensitive points-to analysis, and interpro- what surprising since that implementation contains cedural dataflow analysis [15, 17, 18]. Getting CFL-based optimizations that exploit the specific structure of the CFL graphs that arise in flow analysis. These results have several consequences: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are • Our results show that it is possible to use an off-the- not made or distributed for profit or commercial advantage and that copies shelf set constraint solver to solve Dyck-CFL reacha- bear this notice and the full citation on the first page. To copy otherwise, to bility problems without suffering the penalties we nor- republish, to post on servers or to redistribute to lists, requires prior specific mally expect when using a reduction. Furthermore, permission and/or a fee. PLDI'04, June 9–11, 2004, Washington, DC, USA. reasonable performance can be expected without hav- Copyright 2004 ACM 1-58113-807-5/04/0006 ...$5.00. ing to tune the solver for each specific application. G [ fB hu; u0i ; C hu0; vig , add A hu; vi for each production of the form A ! B C G [ fB hu; vig , add A hu; vi for each production of the form A ! B G , add A hu; ui for each production of the form A ! Figure 1: Closure rules for CFL reachability • We believe that our reduction bridges the gap be- S tween the various algorithms for Dyck-CFL reachabil- b a _ ] \ Z x1 d , z1 ity [10] and the algorithms for solving set constraints. BB |> BB (1 )1 || In light of our reduction, it seems that the distinc- BB || BB || tion between these problems is illusory: the manner in ! s | which a problem is specified (as Dyck-CFL reachability y1 / y2 |= BB vs. set constraints) is orthogonal to other algorithmic || BB || BB issues (e.g. whether the solver is online vs. offline, | (2 )2 BB || demand-driven vs. exhaustive). This has some impor- x2 Z d 2 z2 \ ] _ a b tant consequences{ for instance, we plan in the future S to investigate an algorithm for incremental set constraints. Using our reduction, such an algorithm could be used immediately to solve Dyck-CFL reachability Figure 2: Example Dyck-CFL reachability problem problems incrementally. • Furthermore, our reduction shows how techniques used form: to solve set constraints (inductive form and cycle elim- S ! S S j (1 S )1 j · · · j (n S )n j j s ination [4]) can be applied to Dyck-CFL reachability problems. (where s is a distinguished terminal not in O or C). The all-pairs Dyck-CFL reachability problem is the restriction of The remainder of this paper is structured as follows. In the all-pairs CFL reachability problem to Dyck languages. Section 2 we briefly introduce CFL reachability and set con- Figure 1 shows the closure rules for the CFL and Dyck- straints. In Section 3 we review the reduction from CFL CFL reachability algorithms. The rules assume that the reachability to set constraints. Section 4 presents our spe- CFL grammar has been normalized such that no right-hand cialized reduction from Dyck-CFL reachability to set con- side of any production contains more than two symbols1. A straints. In Section 5 we use polymorphic flow analysis as a naive CFL reachability algorithm might apply the rules as case study for our reduction. Section 6 covers related work, follows: whenever the CFL graph matches the form on the and Section 7 concludes. left-hand side of the rule, add the edges indicated by the right-hand side of the rule; iterate this process until no rule 2. PRELIMINARIES induces a new edge addition. The application of these rules produces a graph closure(G) that contains all possible edges We review basic material on CFL reachability and set con- labeled by nonterminals in the grammar. To check whether straints. Readers familiar with CFL reachability may wish there is an S-path from nodes u to v in G, we simply check to skip Section 2.1, which is standard. Section 2.2 is based for the existence of an edge S hu; vi in closure(G). on the framework of [21], which uses some nonstandard no- In general, the all-pairs CFL-reachability problem can be tation. solved in time O(jT [ Nj3n3) for a graph with n nodes. Figure 2 shows an example of a Dyck-CFL reachability 2.1 CFL and Dyck-CFL Reachability problem. The dashed lines show the edges added by the In this subsection we define CFL reachability and Dyck- computation of closure(G). The edges show that there are CFL reachability and describe an approach to solving these S-paths between nodes x1 and z1, and x2 and z2. Self-loop problems. S-paths derived from the production S ! are not shown. Let CF G = (T; N; P; S) be a context free grammar with terminals T , nonterminals N, productions P and start sym- 2.2 Set Constraints bol S. Let G be a directed graph with edges labeled by In this subsection, we review the set constraints frame- elements of T . The notation A hu; vi denotes an edge in G work introduced in [21]. We define a language of set con- from node u to node v labeled with symbol A. Each path straints (essentially a subset of the full language defined in in G defines a word over T by concatenating, in order, the [1, 6]) and discuss inductive form, a graph-based represen- labels of the edges in the path. A path in G is an S-path tation of set constraints. if its word is in the language of CF G. The all-pairs CFL reachability problem determines the pairs of vertices (u; v) 2.2.1 A Language of Set Constraints where there exists an S-path from u to v in G. A constraint system is a finite collection of set constraints. Now let O = f(1; : : : ; (ng and C = f)1; : : : ; )ng be two A set constraint is a relation of the form L ⊆ R, where L disjoint sets of terminals (interpreted as opening and closing and R are set expressions. Set expressions consist of set symbols, respectively). The subscripts are the terminals' indices. The Dyck language over O [ C [ fsg is generated 1For instance, by converting the grammar to Chomsky nor- by a context free grammar with productions of the following mal form.

The Set Constraint/CFL Reachability Connection in Practice

Matroid Theory

Matroids You Have Known

Incremental Dynamic Construction of Layered Polytree Networks )

A Framework for the Evaluation and Management of Network Centrality ∗

1 Plane and Planar Graphs Definition 1 a Graph G(V,E) Is Called Plane If

On Hamiltonian Line-Graphsq

Graph Theory to Gain a Deeper Understanding of What’S Going On

5. Matroid Optimization 5.1 Definition of a Matroid

6.1 Directed Acyclic Graphs 6.2 Trees

Introduction to Graph Theory

Some Trends in Line Graphs

5.6 Euler Paths and Cycles One of the Oldest and Most Beautiful Questions in Graph Theory Originates from a Simple Challenge That Can Be Played by Children