<<

Portland State University PDXScholar

Computer Science Faculty Publications and Presentations Computer Science

9-2015 A Constraint Language for Static Semantic Analysis based on Graphs

Hendrik van Antwerpen Delft University of Technology

Pierre Néron Delft University of Technology

Andrew Tolmach Portland State University

Eelco Visser Delft University of Technology

Guido Wachsmuth Delft University of Technology

Follow this and additional works at: https://pdxscholar.library.pdx.edu/compsci_fac

Part of the Electrical and Electronics Commons, and the Software Engineering Commons Let us know how access to this document benefits ou.y

Citation Details Van Antwerpen, H., Néron, P., Tolmach, A., Visser, E. and Wachsmuth, G. (2015). A Constraint Language for Static Semantic Analysis based on Scope Graphs. Report TUD-SERG-2015-009.

This Report is brought to you for free and open access. It has been accepted for inclusion in Computer Science Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. A Constraint Language for Static Semantic Analysis Based on Scope Graphs

Hendrik van Antwerpen Pierre Néron Andrew Tolmach TU Delft, The Netherlands TU Delft, The Netherlands Portland State University, USA [email protected] [email protected] [email protected]

Eelco Visser Guido Wachsmuth TU Delft, The Netherlands TU Delft, The Netherlands [email protected] [email protected]

Abstract meta-languages for the specification of syntactic and semantic as- In previous work, we introduced scope graphs as a formalism for pects of a language [18]. Such meta-languages should (i) have a describing program binding structure and performing name resolu- clear and clean underlying theory; (ii) handle a broad range of tion in an AST-independent way. In this paper, we show how to use common language features; (iii) be declarative, but be realizable scope graphs to build static semantic analyzers. We use constraints by practical algorithms and tools; (iv) be factored into language- extracted from the AST to specify facts about binding, typing, and specific and language-independent parts, to maximize re-use; and initialization. We treat name and type resolution as separate build- (v) apply to erroneous programs as well as to correct ones. ing blocks, but our approach can handle language constructs—such In recent work we showed how name resolution for lexically- as record field access—for which binding and typing are mutually scoped languages can be formalized in a way that meets these cri- dependent. We also refine and extend our previous scope graph the- teria [14]. The name binding structure of a program is captured in ory to address practical concerns including ambiguity checking and a scope graph which records identifier declarations and references support for a wider range of scope relationships. We describe the and their scoping relationships, while abstracting away program de- details of constraint generation for a model language that illustrates tails. Its building blocks are scopes, which correspond to sets many of the interesting static analysis issues associated with mod- of program points that behave uniformly with respect to resolution. ules and records. A scope contains identifier declarations and references, each tagged with its position in the original AST. Scopes can be connected Categories and Subject Descriptors D.3.1 [Programming Lan- by edges representing lexical nesting or import of named collec- guages]: Formal Definitions and Theory; D.3.2 [Programming tions of declarations such as modules or records. A scope graph Languages]: Language classifications; F.3.1 [Logics and Mean- is constructed from the program AST using a language-dependent ings of Programs]: Specifying and Verifying and Reasoning about traversal, but thereafter, it can be processed in a largely language- Programs; D.3.4 [Programming Languages]: Processors; F.3.2 independent way. A resolution calculus gives a formal definition [Logics and Meanings of Programs]: Semantics of Programming of what it means for a reference to resolve to a declaration. Res- Languages; D.2.6 [Software Engineering]: Programming Envi- olutions are described as paths in the scope graph obeying certain ronments (language-specific) criteria; a given reference may resolve to one or many declarations (or to none). A derived resolution algorithm Keywords Language Specification; Name Binding; Types; Do- computes the set of declarations to which each reference resolves, main Specific Languages; Meta-Theory and is sound and complete with respect to the calculus. In this paper, we refine and extend the scope graph framework 1. Introduction of [14] to a full framework for static semantic analysis. In essence, this involves uniting a type checker with our existing name reso- Language workbenches [6] are tools that support the implemen- lution machinery. Ideally, we would like to keep these two aspects tation of full-fledged programming environments for (domain- separated as much as possible for maximum modularity. And in- specific) programming languages. Ongoing research investigates deed, for many language constructs, a simple two-stage approach— how to reduce implementation effort by factoring out language- name resolution using the scope graph followed by a separate type independent implementation concerns and providing high-level checking step—would work. But the full story is more complicated, because sometimes name resolution also depends on type resolu- tion. For example, in a language that uses dot notation for object field projection, determining the resolution of x in the expression r.x requires first determining the object type of r, which in turn requires name resolution again. Thus, we require a unified mecha- nism for expressing and solving arbitrarily interdependent naming and typing resolution problems. To address this challenge, we base our framework on a language of constraints. Term equality constraints are a standard choice for • We extend the name resolution algorithm of [14] to be paramet- ric over scope reachability and visibility policies defined over extract solve (generalized) scope graph edge labels. Name & Type Constraints Assignment • We give an algorithm for solving combined name and type AST resolution problems and prove that it is sound with respect to this paper the satisfiability specification.

Figure 1. Architecture of our constraint-based approach to static Outline In Section 2, we introduce the constraint language using semantic analysis. A language-specific extraction function trans- example programs in a small model language. In Section 3, we lates an abstract syntax tree to a set of constraints. The generic formally define the syntax and semantics of the constraint language constraint solver (independent of the source language), solves the by defining a satisfaction relation on constraints and an extended constraints and produces a name and type assignment. resolution calculus. In Section 4 we develop a constraint solver and prove that it is sound with respect to the semantics. In Section 5 describing type inference problems while abstracting away from we relate this work to previous work by ourselves and others, and the details of an AST in a particular language. Adopting constraints discuss limitations and ideas for future work. to describe both typing and scoping requirements has the advantage of uniform notation, and, more importantly, provides a clean way 2. Constraints for Static Semantics to combine naming and typing problems. In particular, we extend our previous work to support incomplete scope graphs, which cor- In this section we introduce our approach to constraint-based name respond to constraint sets with (as yet) unresolved variables. and type resolution. We show how scope graph constraints are used Our new framework continues to satisfy the criteria outlined to model name binding and combine them with typing constraints above. (i) The resolution calculus and standard term equality con- to model type consistency. We illustrate the ideas using LMR (Lan- straint theory provide a solid language-independent theory for guage with Modules and Records), a small model language that is name and type resolution. (ii) Our framework supports type check- a variant of the LM (Language with Modules) of [14]. LMR does ing and inference for statically typed, monomorphic languages with not aspire to be a real , but is designed to user-defined types, and can also express uniqueness and complete- represent typical and challenging name and type resolution idioms. ness requirements on declarations and initializers. The framework In the rest of this section we study name and type resolution for inherits from scope graphs the ability to model a broad range a selection of LMR constructs using a series of examples. The full of binding patterns, including many variants of lexical scoping, grammar of LMR is defined in Fig. 5 and a constraint extraction records, and modules. (iii) The constraint language has a declar- algorithm for the entire language is given in Fig. 6. Along the way ative semantics given by a constraint satisfaction relation, which we gradually introduce the concepts of the constraint language. The employs the resolution calculus to define name resolution relative full syntax of the constraint language is defined in Fig. 7. Subse- to a scope graph. We define a constraint resolution algorithm based quent sections formalize the constraint language and its semantics. on our previous name resolution algorithm, extended to support parameterization by a language-specific policy controlling scope 2.1 Declarations and References reachability and visibility, combined with a standard unification We first recall the concepts of the scope graph approach [14], and algorithm. (iv) The constraint language is intended as an inter- adapt them to a constraint-based framework. Consider the example nal language for static semantic analysis tools (Fig. 1). Given the in Fig. 2, which shows a simple LMR program with two global dec- abstract syntax tree of a program, a language-specific extractor pro- larations (top), and, in the boxes below it, the constraints extracted duces a set of constraints that express the name binding and types from it and their solution. Subscripts on expressions and identifiers of the program. A language-independent solver attempts to find a represent AST positions. Thus, x1, x4, and x8 are different occur- solution for the set of extracted constraints, and produces a (par- rences of the same name x. We represent scope graph constraints tial) name and type assignment. Note that the constraint language diagrammatically by the scope graph they specify. is not intended as a domain-specific meta-language (such as NaBL The nodes of a scope graph G represent the three basic notions [12]) to be used by language designers using a language work- derived from the program abstract syntax tree (AST): scopes, dec- bench. Rather, it is intended to be used as an internal language for larations, and references: the implementation of such meta-languages. (v) The application to erroneous programs is work in progress. • A scope is an abstraction of a set of nodes in the AST that behave uniformly with respect to name binding. Scopes are Contributions The specific technical contributions of this paper denoted by identifiers drawn from an abstract enumerable set. are the following: In a scope graph diagram, scopes are represented by circles with numbers representing their identity, e.g. 1 . S(G) denotes the • We introduce a constraint notation for the specification of scope set of scopes of G. graphs and name resolution that is complementary to the de- • A declaration is an occurrence of an identifier that introduces a scription of traditional typing constraints. D name. We write xi for the declaration of name x at position i • We extend the scope graph framework of [14] with unique- in the program. We omit the position i when it is unimportant in ness and completeness constraints to express properties such the context. In diagrams, a declaration is represented by a box as “there are no duplicate declarations in this scope” or “every with an incoming arrow, e.g. x1 . D(G) denotes the set of declared field in this record is initialized.” declarations of G. • We introduce generalized scope graph edge labels to model a • A reference is an occurrence of an identifier referring to a wide range of scope combination policies including transitive R declaration. We write xi for a reference with name x at position and non-transitive imports, and non-overriding includes. i. Again, we sometimes omit the position i. In diagrams, a • We give a specification for satisfiability of combined sets of reference is represented by a box with an outgoing arrow, e.g. name and type resolution constraints. x4 . R(G) denotes the set of references of G. Scope Graph Constraints The edges of a scope graph deter- def x = 1 mine the connections between scopes, declarations, and references. 1 2 def y = (if (x == 0 ) then 3 else x ) Edges are specified directly by means of scope graph constraints 3 4 5 6 7 8 9 (CG in the grammar of Fig. 7), where the ground terms D, R, and Scope graph constraints Typing constraints S represent declarations, references, and scopes, respectively. For D now, we only consider the two basic edges that connect declarations x1 : τ2 τ2 ≡ Int x4 x1 D and references to scopes: y3 : τ9 τ9 ≡ τ7 1 τ9 ≡ τ8 δ4 : τ4 D D • A declaration constraint s x specifies that declaration x x8 y3 δ8 : τ8 τ6 ≡ Bool belongs to scope s. Graphically: s x . τ4 ≡ Int τ5 ≡ Int Resolution constraints • R R τ7 ≡ Int A reference constraint x s specifies that reference x R x4 7→ δ4 Solution belongs to scope s. Graphically: x s . R x8 7→ δ8 D δ4 = δ8 = x !D(1) 1 The “solution” to a set of scope graph constraints is a well-formed τ2 = τ4 = τ5 = Int scope graph, i.e. one in which each declaration and reference be- τ7 = τ8 = τ9 = Int longs to (is connected by an edge with) exactly one scope. Note that τ6 = Bool the existence of nodes (declarations, references, and scopes) of the scope graph is specified implicitly by their appearance in an edge Figure 2. Declarations and references in global scope constraint. For convenience, we sometimes write Sc(xD) = s for D R R s x and Sc(x ) = s for x s. We define by comprehen- • A type declaration constraint D : T associates a type with a sion the sets of declarations and references belonging to a scope s, declaration. This constraint is used in two flavors: associating D D R R as D(s) = {x | Sc(x ) = s} and R(s) = {x | Sc(x ) = s}. a type variable (τ) with a concrete declaration, or associat- In most contexts, constraints and derived notations are implicitly ing a type variable with a declaration variable. In Fig. 2, the D D parameterized by the scope graph under consideration; when they constraints x1 : τ2 and y3 : τ9 associate distinct type variables G D D need to be explicitly parameterized by a scope graph , we use a with declarations x1 and y3 . (For ease of reading, we choose subscript notation (e.g. DG (s)). type variable names corresponding to subexpression label num- bers.) The constraint δ4 : τ4 requires the type of the declaration Resolution Constraints The basic semantic intuition behind R to which x4 resolves to be the same as the type τ4 of the refer- scope graphs is that a reference resolves to a declaration iff there is ence considered as an expression. a path from the reference node to the declaration node. In this case we say that the declaration is visible from the reference. Resolu- • A type equality constraint T ≡ T specifies that two types tion constraints( Res in the grammar) represent requirements on should be equal. In Fig. 2, the constraint τ2 ≡ Int arises from successful name resolution: the constant expression 12, and the constraint τ4 ≡ Int arises from the fact that the == operator takes integer operands. The • A resolution constraint R 7→ D specifies that a given reference constraint τ6 ≡ Bool arises in two ways, from the fact that must resolve to a given declaration. Typically, the declaration == returns a Boolean and the fact that if requires one; since is specified as a declaration variable δ. For example, in Fig. 2 constraints should be thought of as a set, we list each distinct R R constraint only once. the constraints x4 7→ δ4 and x8 7→ δ8 require that references R R x4 and x8 resolve to (as yet unknown) declarations δ4 and δ8, respectively. A solution to a set of typing constraints is a substitution on dec- laration and type variables that satisfies all the constraints. For A solution to a set of resolution constraints is a substitution map- example, the substitution for τ9 can be deduced either from the ping each declaration variable to a declaration, such that applying constraints τ9 ≡ τ7 and τ7 ≡ Int, or from the constraints τ9 ≡ τ8, D this substitution to the constraints generates valid resolutions ac- τ2 ≡ Int and the unification of τ8 and τ2 (via δ8 = x1 ). cording to the scope graph resolution calculus (which we formalize Note that for a program to be both well-bound and well-typed, R R single in Section 3). In Fig. 2, since the only paths starting at x4 and x8 we need to find a substitution on declaration and type vari- D both end at declaration x1 , the (sole) solution to these constraints ables that allows both resolution and typing constraints to be sat- D is a substitution mapping both δ4 and δ8 to x1 . Applying this sub- isfied simultaneously. In this simple example, it is clear that the R D R D stitution yields the valid resolutions x4 7−→ x1 and x8 7−→ x1 . declaration variables are determined solely by the resolution con- In addition to constraints about the resolution of references, straints, but this will not always be the case in general. CRes also includes constraints on properties of name collections N, which are multisets of identifiers. For now we only consider the 2.2 Lexical Scope uniqueness constraint: Only very trivial programs have just a single scope. The left part of Fig. 3 shows an LMR example that illustrates nested lexical scopes. • A uniqueness constraint !N specifies that a given name collec- Scope graphs use edges between scopes to model inclusion of the tion N contains no duplicates. (visible) declarations in one scope in another. They can be used to • A declaration name collection D(s) is obtained by projecting model lexical nesting or direct import of all the names from one the identifiers from the set of declarations in scope s. scope into another, according to the label on the edge.

l Thus, for example, in Fig. 2 the constraint !D(1) requires that • A direct edge constraint s1 s2 specifies a direct l-labeled l scope 1 should have no duplicate declarations. These types of edge from scope s1 to s2. (Graphically: s1 s2 .) The constraints are satisfied when the property they specify holds. general meaning of such an edge is that the declarations visible in s2 are also visible in s1. Or, following the direction of the Ty Typing Constraints Typing constraints(C ) represent require- arrow, that a reference in s1 can be resolved by searching for a ments for type consistency of the program: declaration in s2. to the scope declaring those names (e.g. the body of a module). def n = true module A { 1 2 1 Graphically: x s . def f3 = ( def a2 = 43 fun (n4:Int5){ } The LMR program in the right part of Fig. 3 consists of two f6(n7) module B4 { modules A1 and B4 and an import of the former into the latter. The } import A 8 5 declarations in these modules are contained in 2 and 3 . Each )9 def b6 = a7 of these scopes is associated with the corresponding declaration } of the name of the module, which is represented in a scope graph A Scope graph constraints Scope graph constraints diagram with an open arrow, e.g. 1 2 . These scopes are also child scopes of the program global scope 1 . P f6 2 1 f3 B4 1 A1 Imports A nominal import makes the declarations in an associ- ated scope visible in another, not necessarily lexically related, tar- P P n7 n4 n1 get scope. A nominal import is represented by (1) a regular refer- 3 2 I ence to the name of the scope being imported, and (2) an import Resolution Constraints edge of that name into the target scope: fR 7→ δ nR 7→ δ a7 b6 A5 a2 6 6 7 7 • A nominal edge constraint s l xR specifies a nominal l- !D(1)! D(2) R Resolution constraints labeled edge from scope s to reference x . (Graphically: l Typing constraints R s x ) Such an edge makes visible in s all declarations a7 7→ δ7 !D(1) D !D(2)! D(3) that are visible in the associated scope of the declaration to n1 : τ2 τ2 ≡ Bool R D which x resolves, according to the label on the edge. f3 : τ9 τ9 ≡ Fun[τ5,τ8] Typing constraints D n : τ5 τ5 ≡ Int R 4 aD : τ τ ≡ Int For example, import A5 is represented by the reference A5 in δ : τ τ ≡ Fun[τ ,τ ] 2 3 3 I 6 6 6 7 8 D scope 3 and an import arrow 3 A5 . It is also possible to b6 : τ7 δ7 : τ7 δ7 : τ7 import the declarations of another scope directly, using an (ordi- Solution nary) nameless edge; this feature is used in the next sub-section. Solution D δ7 = a2 δ = fD δ = nD 6 3 7 4 τ3 = Int τ7 = Int Resolving through Imports Name resolution in the presence of τ2 = Bool τ8 = t0 associated scopes and imports proceeds as follows. If a scope S1 R D τ5 = τ7 = Int contains an import xi , which resolves to a declaration xj with τ6 = τ9 = Fun[Int,t0] associated scope S2, then all declarations in S2 are reachable in S1. R D Thus, in the example, reference a7 resolves to declaration a2 since where t0 is any fixed arbitrary type R D the import A5 resolves to declaration A1 , and the associated scope D D Figure 3. Lexical scope and module imports. 2 of A1 contains declaration a2 . Note that the resolution calculus is parameterized by the policy used to disambiguate conflicting In the left part of Fig. 3, scope 2 — corresponding to the body of resolutions. Here we use a default policy that prefers imported the fun — is nested within the program global scope 1 , which declarations over declarations in parents; alternatives are discussed P in Section 3.4. is expressed by the scope edge constraint 2 1 . This edge is labeled P for “parent;” we will see other possible labels shortly. 2.4 Type-Dependent Name Resolution A resolution path starting from a reference may traverse a P edge R D So far, we have seen how to use resolution constraints to express to find a matching declaration, e.g. reference f6 resolves to f3 . However, in order to model shadowing of outer declarations by the dependence of type resolution on name resolution. However, for inner ones, paths that traverse fewer (or no) P edges are preferred, some language constructs the resolution of a name to its declaration R D D depends on the type of another expression. For example, in a so reference n7 resolves to declaration n4 rather than to n1 . The kinds of typing constraints generated by this example are field access expression e.f, in order to resolve the field f, one the same as those from the previous example. Note that the solution first needs to find the type of the expression e and then to look to the typing constraints leaves f’s result type unspecified (since it for f in the scope associated with the type. This scheme induces is never used). dependencies on type resolution, not only from name resolution but also from scope graph construction (one does not know in which 2.3 Imports scope the reference f lies). We model such type-dependent name resolution by using scope graph constraints with scope variables. In addition to lexical scope, many programming languages provide The examples in Fig. 4 illustrate the approach. features for making declarations in scopes selectively available ‘at a distance’. Examples of such constructs are modules with imports Field Declaration and Initialization Before we can study field in ML and classes with inheritance in Java. To model such features, access proper, we need to consider modeling of record types, field scope graphs provide associated scopes and imports. declarations, and record initialization. We identify each record type by the declaration of the record name in its type definition, e.g. Associated Scope The essence of module-like constructs is that D Rec(A1 ). We model the fields of a record type definition as declara- they encapsulate a collection of declarations and make these avail- D tions (here just x2 ) in a scope (here, scope 2 ) associated with the able through import of the module. That requires an association D between the encapsulated declarations and the declaration of the record type name declaration A1 . The resolution constraint !D(2) forbids duplicate field names. module, which is modeled by associated scopes: D To construct a new record of a declared record type (e.g. A1 ), • An association constraint xD s specifies s as the associated we create a new parentless scope (here, scope 3 ) which imports scope of declaration xD. Associated scopes can be used to the field names of the record by importing (the associated scope of) connect the declaration (e.g. a module) of a collection of names the record declaration (via a reference to the name of the type, here • An association constraint D S specifies that a given decla- record A { x : Int } 1 2 3 ration has a given associated scope. def a4 = ( new A5{x6=17})8 def y = ( a .x ) 9 10 11 12 Specifically, we use δ12 ς12 to say that ς12 must be the associ- ated scope of δ12. Scope graph constraints Typing constraints Solving these constraints will lead to a solution for ς12 — in D D x : τ3 τ3 ≡ Int this case the associated scope of A1 , scope 2 — such that the ς a y a 2 12 10 9 4 D 4 R a4 : τ8 τ8 ≡ Rec(δ8) appropriate scope can be imported into scope . After that, x11 D I D y9 : τ12 δ6 : τ6 can be resolved as usual to the corresponding field declaration x2 , 4 A5 1 A1 δ10 : τ10 δ11 : τ12 yielding its type τ3 ≡ Int. τ ≡ τ τ ≡ Int I P 7 6 7 With As a further variant, we discuss an expression form inspired τ ≡ Rec(δ ) x11 x6 3 x2 2 10 12 by the with statement in the Pascal language. In the expression Solution with e do e’, e should be a record-valued expression; the Resolution constraints D field names of the record are added to the lexical environment of δ6 = δ11 = x2 AR 7→ δ xR 7→ δ D e’. That is, a variable reference x in e’ will be interpreted as a 5 8 6 6 δ8 = δ12 = A1 R R D a10 7→ δ10 x11 7→ δ11 δ10 = a ς12 =2 field of the record value when the record has indeed a field with 4 x !D(1)! D(2) τ3 = τ6 = τ7 = τ12 = Int name ; otherwise the variable is considered as a regular reference D in the enclosing lexical context. Static resolution again requires V(3) ≈ R(3) δ12 ς12 τ8 = τ10 = Rec(A ) 1 resolving variables in e’ in the associated scope of the record Figure 4. Field access. type of e, but this time also allowing resolution to the enclosing lexical scope. Replacing (a.x) by (with a do x) in the code

D of Fig. 4 produces identical constraints, with the addition of a scope A5 ). We then process field initializers by putting references to the P R graph edge 4 1 . field names (here just x6 ) into this new scope; these references can only resolve to the field declarations. In order to check that each field of a record type is initialized, This concludes the informal explanation-by-example of the con- we use the following additional kinds of name collections and straint language and its application to LMR. A constraint extraction constraints: algorithm for the full LMR language is given in Fig. 6, but we do not discuss this in detail. Instead, in the next sections we formalize • A reference name collection R(s) denotes the multiset of ref- the syntax and semantics of the constraint language and discuss the erence identifiers of scope s. definition of a resolution algorithm based on the semantics. • A visible name collection V(s) denotes the multiset of decla- ration identifiers that are visible from scope s (i.e., would be 3. Syntax and Semantics of Constraints visible from a reference to the declared identifier in s). In this section we formally define the syntax of the constraint ⊂ • A subset constraint N ∼ N specifies that one name collection language and its declarative semantics. is included in another. ⊂ 3.1 Syntax • An iso constraint N1 ≈ N2 is syntactic sugar forN 1 ∼ N2 ∧ ⊂ N2 ∼ N1 and specifies that two name collections are isomor- Fig. 7 defines the full syntax of the constraint language. Constraints phic. are divided into three categories: Scope graph constraintsC G spec- ify a scope graph which defines the binding structures of the pro- Thus, the constraint V(3) ≈ R(3) requires that the set of visible gram. Resolution constraintsC Res describe requirements for all field declarations V(3) (the declarations visible in scope 3 ) is program names to be properly resolved and, where appropriate, to isomorphic to the set of initializers R(3) (the references in 3 ). be unique or complete. Typing constraintsC Ty describe require- R R Field Access Now we consider the field access a10.x11 at subex- ments for the program to be well-typed. The informal meaning of R pression 12 in Fig. 4. The reference x11 is a field access in the each constraint form was described by a bulleted definition in Sec- R R record value of a10. Thus, x11 should be resolved in a scope con- tion 2. Constraints can be combined using conjunction (C ∧ C) and taining (just) the declarations for the field names, i.e. the associated True represents the trivially satisfiable constraint. R A ground constraint is one having no variables. A scope graph is scope of the type of the a10, namely 2 . Once again, we create a R ground if it is specified by a set of ground scope graph constraints; parentless scope 4 and add the field being accessed (here x8 ) as a reference in that scope. However, in this case we do not know at otherwise it is incomplete. The constraint language is parameterized by a family of type constraint extraction time that 2 is the correct scope to import, be- R constructors c ∈ CT and a set of labels l ∈ L. We describe the cause we do not know the type of a10. That is, the name resolution R R former here and the latter in Section 3.4. of x11 depends on the type resolution of a10. To model this we proceed as follows. We create a new scope Type Constructors Types in T are either type variables τ or variable ς12 that acts as a placeholder for the scope that we want to c( , ..., ) c ∈ C I type constructor applications T T with T , a set of import into scope 4 . We add a direct edge constraint 4 ς12 , language-specific type constructors. Each constructor c has an asso- this time labeled with I rather than P, which makes the reso- ciated arity c :: n. For example, Int and Bool are type constructors lution process more eager to follow the edge (see Section 3.4 with arity 0 and Fun is a type constructor with arity 2. Well-formed R for details). We also have the usual constraints a10 7→ δ10 and constraints respect the arity of the type constructors. R δ10 : τ10 corresponding to reference a10. And we have the con- To represent user-defined types, such as classes in object- straint τ10 ≡ Rec(δ12) for some unknown record type declaration oriented languages or algebraic data types in functional languages, R δ12 because of the use of a10 in the field position of a field access. a type constructor can also include the scope graph declaration To make the connection between the declaration of the record type corresponding to the type definition. For example, record types in and the placeholder scope, we use an association constraint: LMR are represented by Rec(d) with d a type name declaration in prog = decl ∗ prog decl∗ [[ds]] := !D(s) ∧ [[ds]]s decl = module id {decl ∗} | import id ∗ def module decl D D 0 0 P 0 decl | bind [[ xi {ds}]]s := s xi ∧ xi s ∧ s s ∧ !D(s ) ∧ [[ds]]s0 | record id {fdecl ∗} decl R I R fdecl = id : ty [[import xi]]s := xi s ∧ s xi ty = Int decl bind | Bool [[def b]]s := [[b]]s

| id ∗ record { } decl D D 0 0 P 0 fdecl | ty → ty [[ xi fs ]]s := s xi ∧ xi s ∧ s s ∧ !D(s ) ∧ [[fs]]s,s0 exp = int | true = bind D D exp | false [[xi e]]s := s xi ∧ xi : τ ∧ [[e]]s,τ | id [[x : t = e]]bind s xD ∧ xD : τ ∧ [[t]]ty ∧ [[e]]exp | exp ⊕ exp i s := i i s,τ s,τ | if exp then exp else exp [[x :t]]fdecl := s xD ∧ xD : τ ∧ [[t]]ty | fun ( id : ty ){ exp} i sr ,sd d i i sr ,τ | exp exp | letrec tbind in exp [[Int]]ty := t ≡ Int | new id {fbind ∗} s,t | with exp do exp ty [[Bool]]s,t := t ≡ Bool | exp . id bind = id = exp ty ty ty [[t1 → t2]]s,t := t ≡ Fun[τ1,τ2] ∧ [[t1]]s,τ1 ∧ [[t2]]s,τ2 | tbind tbind = id : ty = exp ty R R [[xi]]s,t := t ≡ Rec(δ) ∧ xi s ∧ xi 7→ δ fbind = id = exp

exp 0 P 0 0 D Figure 5. Syntax of LMR. [[fun (xi:t1){e}]]s,t := t ≡ Fun[τ1,τ2] ∧ s s ∧ !D(s ) ∧ s xi

D ty exp ∧ x : τ1 ∧ [[t1]]s,τ ∧ [[e]] 0 i 1 s ,τ2 letrec in exp 0 P 0 bind exp [[ bs e]]s,t := s s ∧ !D(s ) ∧ [[bs]]s0 ∧ [[e]]s0,t exp [[n]]s,t := t ≡ Int For simplicity, we describe the algo- exp rithm as operating over LMR’s concrete [[true]]s,t := t ≡ Bool syntax. The algorithm is defined by a [[false]]exp := t ≡ Bool family of functions indexed by syntactic s,t category (decl, exp, etc.). Each function exp exp exp [[e1 ⊕ e2]]s,t := t ≡ t3 ∧ τ1 ≡ t1 ∧ τ2 ≡ t2 ∧ [[e1]]s,τ1 ∧ [[e2]]s,τ2 takes a syntactic component and possi- (where ⊕ has type t × t → t ) bly one or more auxiliary parameters, 1 2 3 if then else exp exp exp exp and returns a constraint, possibly involv- [[ e1 e2 e3]]s,t := τ1 ≡ Bool ∧ [[e1]]s,τ1 ∧ [[e2]]s,t ∧ [[e3]]s,t ing one or more fresh variables or new exp [[x ]] := xR s ∧ xR 7→ δ ∧ δ : t scope identifiers. Functions are defined i s,t i i exp exp exp by a set of rules, one for each possible [[e1 e2]]s,t := τ ≡ Fun[τ1,t] ∧ [[e1]]s,τ ∧ [[e2]]s,τ1 syntactic form in the category. For ex- exp [[e.x ]]exp [[e]]exp ∧ τ ≡ Rec(δ) ∧ δ ς ∧ s0 I ς ∧ [[x ]]exp ample, [[−]]s,t has twelve rules, and is i s,t := s,τ i s0,t parameterized by the scope s in which [[with e do e ]]exp := [[e ]]exp ∧ τ ≡ Rec(δ) ∧ δ ς identifier references within the expres- 1 2 s,t 1 s,τ 0 P 0 I exp sion are to go and the expected type t ∧ s s ∧ s ς ∧ [[e2]]s0,t of the expression. We use the notation exp ∗ [[new x {bs}]] := xR s ∧ xR 7→ δ ∧ s0 I xR [[−]]c on sequences of items of syntac- i s,t i i i ∗ tic category c to mean the result of ap- fbind 0 0 ∧ [[bs]] 0 ∧ V(s ) ≈ R(s ) ∧ t ≡ Rec(δ) plying [[−]]c to each item and return- s,s ing the conjunction of the resulting con- = fbind R 0 R exp straints, or True for the empty sequence. [[xi e]]s,s0 := xi s ∧ xi 7→ δ ∧ δ : τ ∧ [[e]]s,τ Figure 6. Constraint extraction for LMR. Scope names (s) occurring free in rhs of rules are new ground scopes. Type variables (τ), scope variables (ς), and declaration variables (δ) occurring free are fresh variables. the program; thus, in Fig. 4, the record definition A defines the type Our basic approach to defining satisfaction is as follows. First D Rec(A1 ). assume that we have only ground constraints. Then we can interpret scope graph constraintsC G directly as a ground scope graph. We 3.2 Constraint Satisfaction next define a satisfiability relation |= by cases on ground resolution Res Ty In our approach, the abstract syntax tree of a program p is re- constraintsC and typing constraintsC relative to a context duced by the language-specific extraction function to a constraint (G, ψ), where G is a ground scope graph and ψ is a typing environ- G Ty D(G) [[p]] = C ∧ CRes ∧ C where commutativity and associativity of ment mapping declarations in to unique ground types in T. p p p G conjunction let us group the subconstraints into categories. In particular, resolution constraints are checked against using the G Ty Res (C-TRUE) C := C | C | C | C ∧ C | True G, ψ |= True G := | | l | | l C R S S D S S D S S R G, ψ |= C1 G, ψ |= C2 Res ⊂ (C-AND) C := R 7→ D | D S | !N | N ∼ N G, ψ |= C1 ∧ C2 CTy := T ≡ T | D : T ψ(d) = T (C-TYPEOF) D D := δ | xi G, ψ |= d : T R R D R := xi `G p : xi 7→ xj R D (C-RESOLVE) S := ς | n G, ψ |= xi 7→ xj T := τ | c(T, ..., T) with c ∈ CT d G S (C-SCOPEOF) N := D(S) | R(S) | V(S) G, ψ |= d S

Figure 7. Syntax of constraints ∀x, 1 N G (x) ≤ 1 J K (C-UNIQUE) G, ψ |=! N scope graph resolution calculus (described in Section 3.3). Finally, G N1 G ⊆ N2 G we apply |= with G set toC . J K J ⊂ K (C-SUBNAME) To lift this approach to constraints with variables, we simply G, ψ |= N1 ∼ N2 apply a multi-sorted substitution φ, mapping type variables τ to t1 = t2 (C-EQ) ground types, declaration variables δ to ground declarations and G, ψ |= t1 ≡ t2 scope variables ς to ground scopes. Thus, our overall definition of satisfaction for a program p is: Figure 8. Interpretation of resolution and typing constraints G Res Ty φ(Cp ), ψ |= φ(Cp ) ∧ φ(Cp ) () Resolution paths D R where φ(E) denotes the application of the substitution φ to all the s := D(xi ) | E(l, S) | N(l, xi ,S) variables appearing in E that are in the domain of φ. When the p := [] | s | p · p (inductively generated) proposition  holds we say that ψ and φ resolve p. [] · p = p · [] = p Resolution and Typing Constraints The |= relation is given by (p1 · p2) · p3 = p1 · (p2 · p3) the inductive rules in Fig. 8, where = is the syntactic equality on Well-formed paths R D terms and `G xi 7−→ xj is the resolution relation for graph G. The interpretation of a name collection N G is the multiset defined WF(p) ⇔ labels(p) ∈ E as follows: D(S) G = π(DG (S)), RJ (SK) G = π(RG (S)), and Visibility ordering on paths V(S) =J π({xKD | ∃p, ` p : SJ 7−→KxD}) where π(A) is G i G i label(s ) < label(s ) p < p Jthe multisetK produced by projecting the identifiers from a set A of 1 2 1 2 s1 · p1 < s2 · p2 s · p1 < s · p2 references or declarations. Given a multiset M, 1M (x) denotes the multiplicity of x in M. Edges in scope graph l S1 S2 3.3 Resolution Calculus (E) I ` E(l, S2): S1 −→ S2 The resolution calculus defines the resolution of a reference to a S l yR yR ∈/ ` p : yR 7−→ yD yD S declaration in a scope graph as a most specific, well-formed path 1 i i II i j j 2 R (N) from reference to declaration through a sequence of edges. A path I ` N(l, yi ,S2): S1 −→ S2 p is a list of steps representing the atomic scope transitions in the Transitive closure graph. There are three kinds of steps: (I) I, S ` []: A A • A (direct) edge step E(l, S2) is a direct transition from the  B/∈ ` s : A −→ B , {B} ∪ ` p : B C current scope to the scope S2. This step records the label of SI I S  (T ) the scope transition that is used. I, S ` s · p : A  C • A nominal edge step N(l, yR,S) requires the resolution of ref- Reachable declarations R erence y to a declaration with associated scope S to allow a 0 0 D I, {S} ` p : S  S WF(p) S xi transition between the current scope and scope S. D D (R) I ` p · D(xi ): S xi • D  A complete path always ends with a declaration step D(x ) that Visible declarations stores the declaration the path is leading to. D I ` p : S xi R 0 0 D 0 A path p is a valid resolution in the graph from reference xi ∀j, p (I ` p : S xj ⇒ ¬(p < p)) D R D  (V ) to declaration xi such that `G p : xi 7−→ xi according to D I ` p : S 7−→ xi the calculus rules in Fig. 9. These rules all implicitly apply to Reference resolution a fixed graph G, which we omit to avoid clutter. The calculus defines the resolution relation in terms of edges in the scope graph, xR S {xR} ∪ ` p : S 7−→ xD i i I j (X) reachable declarations, and visible declarations. Here is the set of R D I I ` p : xi 7−→ xj seen imports, a technical device needed to avoid “out of thin air” anomalies in resolution of nominal imports. We often drop I from Figure 9. Resolution calculus from [14] extended for arbitrary a resolution when it is empty. The S component that appears in edge labels and parameterized with well-formedness predicate WF the transitive closure rules is the set of seen scopes that is used to and visibility ordering <. Here label projects the label from a step prevent cycles in the resolution path of a given reference. and labels projects the sequence of labels from a path. Lexical scope R R R R[I](x ) := let (r, s) = EnvE [{x } ∪ I, ∅](Sc(x ))} in L := {P}E := P∗ D < P ( D D Non-transitive imports U if r = P and {x |x ∈ s} = ∅ {xD|xD ∈ s} ∗ ? L := {P, I}E := P · I D < P, D < I, I < P ( (T, ∅) if S ∈ S or re = ∅ Transitive imports Envre [I, S](S) := L∪{D} Envre [I, S](S) ∗ ∗ L := {P, TI}E := P · TI D < P, D < TI, TI < P 0 0 L [  {l ∈L|l EnvL > Envl > EnvD > IS > R Soundness Given this definition, we can prove that the algorithm re re re re on incomplete graphs is correct with respect with the calculus: This termination order is used as the induction principle in most of the proofs. Lemma 2. For any incomplete graph G: xD ∈ R (xR) =⇒ ` xR 7−→ xD ∧ G ↓ xR Correctness on ground scope graphs We want to prove that i G G i R R when this algorithm operates on a ground scope graph, it is sound where RG (x ) denotes the top-level resolution function R[∅](x ) and complete with respect to the calculus presented in Fig. 9. First, for the graph G. Lemma 1 states that this property holds when the graph G is will eliminate clauses from C while instantiating G and filling ψ. ground. We next prove that if the resolution on an incomplete graph The algorithm terminates when the constraint is empty or no more G terminates with a total flag T then for any graph G0 that is an clauses can be solved. Each rule solves one constraint, possibly instance of G, the result is the same. updating components of the tuple or applying a substitution to it.

R Envre [I, S](x )G = (T,D) =⇒ • Rule S-RESOLVE solves resolution constraints xR 7→ δ using R the resolution algorithm from Fig. 11. If a resolution is found, it Envre [I, S](x )G0 = (T,D) (i) is substituted for the variable δ. If the scope graph is incomplete, Proof. We prove this result along with similar result for all the other the algorithm might return U, in which case the constraint is left functions by induction on the termination order of the algorithm. to be solved later. The fact that the result is total implies that the results of all the D • Rule S-ASSOC solves scope association constraints x ς by recursive calls are also total and this allows us to apply the desired looking up the scope S associated with ground declaration xD induction hypothesis (when a P or U flag is raised it is always in the scope graph. By substituting S for ς, the scope graph propagated). becomes more complete, possibly allowing more references to be resolved. Now we show that the resolution is also correct in the partial case. 0 Let G be an incomplete scope graph and G one of its instances. If a • Rule S-EQUAL solves equality constraints T1 ≡ T2. It uses first resolution on G contains a set of declarations for a given name then order unification U(T1,T2), as described in [1]. The resulting the resolution on G0 contains the same declarations for this name: substitution is applied to the tuple. 0 • Rule S-UNIQUE solves !N constraints by checking that the Env [ , ](S) = (_,D) =⇒ Env [ , ](S) 0 = _,D =⇒ re I S G re I S G identifier collection N can be computed and all identifiers in D D D 0 ∀x, {x ∈ D} 6= ∅ ⇒ {x ∈ D} = {x ∈ D } (ii) it are distinct. (1A(x) is the multiplicity of x in A). • UB AME N ⊂ N Proof. We prove this result along with similar result for all the other Rule S-S N solves 1 ∼ 2 constraints by checking N N functions by induction on the termination order of the algorithm, that the identifier collections 1 and 2 can be computed and N N using (i). that every identifier in 1 is also in 2. • Rule S-TYPEOF solves type assignment constraints xD : T . Finally, we can prove Lemma 2: The rule considers two cases. When no type assignment is declared for xD in ψ (i.e. the first time that it is encountered) R D R Proof. Let Sx = RG (x ) and pick xi ∈ Sx. To prove that x the assignment is added to the typing environment ψ. When a D 0 resolves to xi in G, let G be an arbitrary ground instance of G. type assignment is declared (i.e. for subsequent encounters), the D R D Using (ii) we have xi ∈ RG0 (x ) and by Lemma 1 we have type T from the constraint is unified with the type ψ(x ) from R D R D `G0 x 7−→ xi . By , we get that `G x 7−→ xi . the typing environment. To prove stability, let G1 and G2 be ground instances of G. Then R R using (ii), we have RG1 (x ) = RG2 (x ) = Sx, so by definition we The constraint resolution algorithm is sound with respect to the R have G ↓ x . constraints semantics. 4.4 Name Collection Computation Lemma 4 (Constraint Solver correctness). If the algorithm pro- duces a solution to a resolution problem then the solution is valid: This resolution algorithm on partial graphs is used to compute not 0 0 only resolution of references but also the set of names visible from for all C, G, G , ψ : a given scope. Given an incomplete graph G and a scope S, we ∗ 0 0 compute name collections as: (C, G, ∅) −→ (True, G , ψ ) =⇒ 0 0 0 NG (D(S)) = π(DG (S)) NG (R(S)) = π(RG (S)) ∃φ, φ(G) = G ∧ ∀σ, σG , σψ |= σ(φ(C)) D D NG (V(S)) = π({xi | ∃E, EnvE [∅, ∅](S)G = (T,E) ∧ xi ∈ E}) If the computation of a Lemma 3 (Name computation soundness). Proof. To prove this result we first state some results on the auxil- name collection E terminates on an incomplete graph G, its results iary unification. is the semantics of the name collection for any graph G0 that is an instance of G: Unification: If U(t1, t2) = σ then σt1 = σt2 ∧ σσ = σ. See [1] NG (E) = M =⇒ E G0 = M. J K for a survey on unification problem and unification algorithms for 4.5 Constraint Solving Algorithm first order terms. With this name resolution algorithm in hand, Fig. 12 gives an algo- Resolution Soundness: Now we can prove the Lemma 4 of the rithm to solve the constraint system from Section 3. The algorithm constraint resolution algorithm. We first prove that for each reduc- is a non-deterministic rewrite system working over tuples (C, G, ψ) tion step, if the output is satisfiable, the input is also satisfiable in of a constraint, a scope graph, and a typing environment. It is non- the same definition-to-type environment: deterministic in the sense that rules may be applied to any atomic constraint in any order considering that ∧ is associative and com- ∀(C1, G1, ψ1), (C2, G2, ψ2), (C1, G1, ψ1) −→ (C2, G2, ψ2) ⇒ mutative. 0 0 Name resolution introduces ambiguity, since a reference xR ∃σ , σ (G1) = G2 ∧   may resolve to multiple definitions. If this happens the solver ∀σ, (σ(G2), σψ2) |= σ(C2) ⇒ R 0 (1) branches, picking a different resolution for x in every branch. (σG2, σψ2) |= σσ (C1) The returned solution is a set of all the (C, G, ψ) tuples the solver was able to construct. The initial state of the solver is the col- The proof of this property is by case analysis on the reduction lected constraint, the (incomplete) scope graph built from the scope step. From it, we can prove Lemma 4 by a simple induction on graph constraints and an empty typing environment. The algorithm the number of reduction steps. R D D R (x 7→ δ ∧ C, G, ψ) −→ [δ 7→ x ](C, G, ψ) where x ∈ RG (x ) (S-RESOLVE) D D (x ς ∧ C, G, ψ) −→ [ς 7→ S](C, G, ψ) where x S (S-ASSOC) (T1 ≡ T2 ∧ C, G, ψ) −→ σ(C, G, ψ) where U(T1,T2) = σ (S-EQUAL)

(!N ∧ C, G, ψ) −→ (C, G, ψ) where ∀x ∈ NG (N), 1NG (N)(x) = 1 (S-UNIQUE) ⊂ (N1 ∼ N2 ∧ C, G, ψ) −→ (C, G, ψ) where NG (N1) ⊆ NG (N2) (S-SUBNAME)  (C, G, {xD 7→ T } ∪ ψ) if xD 6∈ dom(ψ) (xD : T ∧ C, G, ψ) −→ (S-TYPEOF) (ψ(xD) ≡ T ∧ C, G, ψ) otherwise (True ∧ C, G, ψ) −→ (C, G, ψ) (S-TRUE) Figure 12. Constraint solving algorithm

5. Related Work and Discussion ence. The constraint language developed in this paper provides a In this section, we discuss the relation of this paper with previous solid formal basis for developing a new generation of name bind- and other related work, and discuss limitations and ideas for future ing and type specification languages. work. Prototype Implementation We have developed a prototype im- plementation of the constraint solver and applied it in the IDE gen- Previous Work The work in this paper is based closely on our pre- erated with the Spoofax Language Workbench [10] for the LMR vious theory of name resolution [14], which we extend and general- model language used in this paper. However, the prototype does ize here as follows: (i) a scope graph is now defined directly by a set not yet implement the parameterized name resolution algorithm de- of constraints; (ii) we generalize the parent relation to an arbitrary veloped in this paper, but uses the fixed policy from [14]. In the labeled direct edge between pairs of scopes, and the named import prototype implementation, sets of constraints for erroneous pro- relation to an arbitrary labeled nominal edge between scopes and grams lead to partial solutions with unsolvable residual constraints references; (iii) we extend the resolution algorithm to handle arbi- that can be translated into error messages in an IDE. However, we trary well-formedness conditions expressed as regular expressions have not formalized this; we have only proven the soundness of over arbitrary sets of path labels and arbitrary visibility orderings the solver for successful reductions. Furthermore, the implementa- on labels; (iv) we support partial resolution over incomplete scope tion is not optimized, nor does it support incremental evaluation of graphs; (v) we add the seen-scopes component, previously an arti- constraints in the sense of the NaBL/TS task engine [19]. fact of the resolution algorithm, to the resolution calculus to prevent Constraints The use of constraints to abstract out type inference cyclic resolution paths. problems from the abstract syntax tree is a common approach in The development of the scope graph framework fits in an ongo- implementations and extensions of the Hindley/Milner type sys- ing line of research to provide high-level domain-specific support tem [13] and has been applied to a huge variety of typing features. for name binding and type analysis in the Spoofax Language Work- However, these approaches do not address name resolution using bench [10] using the NaBL and TS meta-DSLs [12, 19, 18]. NaBL constraints, but rather perform name resolution during constraint is a DSL for defining the name binding rules of programming lan- collection. For example, in the work of Palsberg et al. [15, 16] on guages by identifying the references, definitions, scopes, and im- object-oriented type systems, constraints are associated with iden- ports in an abstract syntax tree without recourse to environments or tifiers, which requires these to be resolved before constraint collec- symbol tables [12]. TS is a complementary DSL for defining type tion. We believe that our use of constraints to define static name analysis rules. (The design of TS is not formally published, but it is resolution is novel. Instead of performing name resolution during sketched in [18].) Rules in TS are similar to traditional typing judg- constraint collection, we provide a reusable set of constraints to ments, relating an expression to a type. However, type rules do not express name resolution problems, including name resolution for have to propagate context information, since that is taken care of ‘remote’ names through imports and the interaction between name by the separate binding rules. TS rules refer to the results of name and type resolution in type-dependent name resolution. analysis produced by NaBL (e.g. definition of x has type A variation on traditional type system definitions using infer- t), and NaBL rules refer to the results of type analysis to achieve ence rules is the co-contextual approach of Erdweg et al. [5]. In- type-dependent name resolution. NaBL and TS are implemented by stead of propagating an environment to the sub-terms, environ- generation of (1) a language-specific AST traversal that generates ments are ‘synthesized’ along with type constraints, and the con- ‘tasks’, and (2) a language-independent task engine that evaluates straints and environments for sub-terms are merged. This allows for tasks in order to (incrementally) compute a name and type assign- compositional and incremental processing of name and type con- ment [19]. The resulting name and type analysis engines produce straints. Name resolution is expressed using operations on environ- Eclipse IDE support for editor services such as name and type error ments. It would be interesting to consider a bottom-up collection of checking, reference resolution, and code completion. constraints in our approach. The extraction algorithm of Fig. 6 can While NaBL and TS are used in practice to build language defi- be reformulated as a bottom-up collector, using scope variables as nitions with Spoofax, the lack of a solid theoretical foundation was placeholders for as yet unknown scopes. However, a key difference a problem for further development. The aim to verify properties of with our approach is the support for imports (and nominal instead language definitions [18] requires a semantics that can be explained of structural record types, which requires inspecting the AST asso- to a proof assistant such as Coq. In particular, the semantics of no- ciated with a type declaration), which precludes a representation of tions such as imports and ‘subsequent scope’ were hard to capture. context information using a flat environment. A general challenge NaBL has some limitations in its coverage of name binding pat- lies in the convergence of these approaches: how to realize incre- terns. For example, it cannot express variations on let bindings such mental name and type analysis in the face of imports? as sequential and parallel let. While the task engine is constraint- like, its type resolution is not based on unification, which entails Attribute Grammars Another common approach to the imple- that TS cannot be used to express languages requiring type infer- mentation of static semantic analysis is by means of attribute gram- mars [11]. In traditional attribute grammars all ‘semantic’ oper- Acknowledgments ations are carried out in the value domain. Thus, name resolu- We thank the anonymous reviewers for their feedback on previous tion is expressed by propagating a type environment or symbol versions of this paper. This research was partially funded by the table through attribute values. Kastens and Waite [9] provide a NWO VICI Language Designer’s Workbench project (639.023.206). reusable ADT for the definition of name analysis that bears some Andrew Tolmach was partly supported by a Digiteo Chair at Labo- resemblance to our scope graph framework, although the treat- ratoire de Recherche en Informatique, Université Paris-Sud. ment of modules and imports is only discussed at the implemen- tation level. Such attribute grammars would be a suitable mecha- nism for the definition of constraint collection. The extraction al- gorithm in Fig. 6 could easily be rephrased as an attribute gram- References mar with scopes and type variables as inherited attributes and constraints as synthesized attribute. In reference attribute gram- [1] F. Baader and T. Nipkow. Term rewriting and all that. Cambridge mars [7], attributes can get references to tree nodes as values. Thus, University Press, 1998. attributes can be used to link references (in the scope graph sense) [2] J. A. Brzozowski. Derivatives of regular expressions. JACM, to their declarations. For example, Ekman and Hedin [3] provide a 11(4):481–494, 1964. generic framework for name resolution based on generic reference [3] T. Ekman and G. Hedin. Modular name analysis for Java using attributes. Though this framework is part of the JastAdd Java com- JastAdd. In GTTSE, pages 422–436, 2006. piler, it can be reused for other languages as well. The framework [4] T. Ekman and G. Hedin. The JastAdd extensible Java . In needs to be instantiated with language-specific lookup functions to OOPSLA, pages 1–18, 2007. resolve names. These can be specified modularly per language con- [5] S. Erdweg, O. Bracevac, E. Kuci, M. Krebs, and M. Mezini. A co- struct, making it possible to echo the structure of the Java language contextual formulation of type rules and its application to incremental specification of name binding closely. However, these lookup func- type checking. In OOPSLA, pages 880–897, 2015. tions programatically encode name binding idioms such as lexical scoping, shadowing, and hiding. Reference attributes can also be [6] S. Erdweg, T. van der Storm, M. Völter, L. Tratt, et al. Evaluating and comparing language workbenches: Existing results and benchmarks used in the specification of type analysis. Similar to our approach, for the future. Computer Languages, Systems & Structures, 44:24– name binding and typing rules can be specified mostly separately. 47, 2015. In a generic framework, Ekman and Hedin [4] use reference at- tributes to link language constructs to their types and to represent [7] G. Hedin. Reference attributed grammars. informaticaSI, 24(3), 2000. type relations such as subtyping. Similar to name resolution, in- [8] B. Heeren, J. Hage, and S. D. Swierstra. Scripting the type inference stantiations of the framework need to be encoded programatically. process. In ICFP, pages 3–13, 2003. Modularity and extensibility require particular encoding patterns [9] U. Kastens and W. M. Waite. An abstract data type for name analysis. such as double dispatch. ACTA, 28(6):539–558, 1991. The distinctive feature of our approach is that we treat name res- [10] L. C. L. Kats and E. Visser. The Spoofax language workbench: rules olution using a largely separate mechanism, the scope graph, rather for declarative specification of languages and IDEs. In OOPSLA, than integrating it into type resolution. Since some language con- pages 444–463, 2010. structs require type-dependent name resolution, there is inevitably [11] D. E. Knuth. Semantics of context-free languages. mst, 2(2):127–145, some interaction between naming and typing, but we are still able 1968. to reuse most of our existing name resolution theory, which gives us the ability to handle a very rich variety of name binding schemes. [12] G. D. P. Konat, L. C. L. Kats, G. Wachsmuth, and E. Visser. Declara- tive name binding and scope rules. In SLE, pages 311–331, 2012. Future Work There are many directions for future work. One [13] R. Milner. A theory of type polymorphism in programming. jcss, important goal is to extend our theory to handle languages with 17(3):348–375, 1978. more sophisticated typing features, including subtyping, type- [14] P. Neron, A. P. Tolmach, E. Visser, and G. Wachsmuth. A theory of parameterized classes and functions, and modules with type signa- name resolution. In ESOP, pages 205–231, 2015. tures. To support popular OO language idioms, we also need to add support for multiple independent name spaces (and disambiguation [15] J. Palsberg and M. I. Schwartzbach. Object-oriented type inference. In OOPSLA, pages 146–161, 1991. across them) and type-based overloading resolution. As we make such extensions, we would also like to address the completeness [16] J. Palsberg and M. I. Schwartzbach. Object-oriented type systems. of the constraint resolution algorithm (on suitably restricted sets Wiley professional computing. Wiley, 1994. of constraints). In particular, it would be interesting to integrate [17] H. van Antwerpen, P. Neron, A. P. Tolmach, E. Visser, and G. Wachsmuth. approaches to type error recovery [8, 20, 21] in order to generate A constraint language for static semantic analysis based on scope good quality type error messages automatically. graphs with proofs. Technical Report TUD-SERG-2015-012, Soft- On a pragmatic front, more analysis and implementation ex- ware Engineering Research Group, Delft University of Technol- periments are needed to determine if our approach will scale to ogy, 2015. Available at http://swerl.tudelft.nl/twiki/pub/ Main/TechnicalReports/TUD-SERG-2015-012.pdf. real-world tools. In particular, we need to assess the theoretical and actual efficiency of our constraint solving algorithm. In addition, [18] E. Visser, G. Wachsmuth, A. P. Tolmach, P. Neron, V. A. Vergu, many applications for semantic analysis (e.g. in IDEs) require effi- A. Passalaqua, and G. D. P. Konat. A language designer’s workbench: cient incremental computation of name and type resolution. A one-stop-shop for implementation and verification of language de- signs. In OOPSLA, pages 95–111, 2014. On the usability front, we are interested in evaluating the ex- pressivity and understandability of our constraint language and of [19] G. Wachsmuth, G. D. P. Konat, V. A. Vergu, D. M. Groenewegen, and higher-level name and type specification languages that we express E. Visser. A language independent task engine for incremental name in terms of it. Is there a payoff to the use of high-level, but perhaps and type analysis. In SLE, pages 260–280, 2013. more abstract concepts, in contrast to a direct implementation? [20] D. Zhang and A. C. Myers. Toward general diagnosis of static errors. Finally, we are interested in extending the application of our In POPL, pages 569–582, 2014. building block approach to other tasks where constraint-based [21] D. Zhang, A. C. Myers, D. Vytiniotis, and S. L. P. Jones. Diagnosing methods have proved useful, such as pointer analysis. type errors with class. In PLDI, pages 12–21, 2015.