RbSyn: Type- and Effect-Guided Program Synthesis

Sankha Narayan Guria Jeffrey S. Foster David Van Horn University of Maryland Tufts University University of Maryland College Park, Maryland, USA Medford, Massachusetts, USA College Park, Maryland, USA [email protected] [email protected] [email protected]

Abstract does not explicitly consider side effects, which are perva- In recent years, researchers have explored component-based sive in many domains. For example, consider synthesizing a synthesis, which aims to automatically construct programs method that updates a database. Without reasoning about that operate by composing calls to existing APIs. However, effects—in this case, that the method body needs to change prior work has not considered efficient synthesis of meth- the database—synthesis of such a method reduces to brute- ods with side effects, e.g., web app methods that update a force search, limiting its performance. database. In this paper, we introduce RbSyn, a novel type- In this paper, we address this issue by introducing RbSyn, and effect-guided synthesis tool for Ruby. An RbSyn synthe- a new tool for synthesizing Ruby methods. In RbSyn, the sis goal is specified as the type for the target method anda user specifies the desired method by its type signature and series of test cases it must pass. RbSyn works by recursively a series of test cases it must pass. RbSyn then searches for generating well-typed candidate method bodies whose write a solution by enumerating candidates and checking them effects match the read effects of the test case assertions. After against the tests. The key novelty of RbSyn is that the search finding a set of candidates that separately satisfy each test, is both type- and effect-guided. Specifically, the search begins RbSyn synthesizes a solution that branches to execute the with a typed hole tagged with the method’s return type. Each correct candidate code under the appropriate conditions. We step either replaces a typed hole with an expression of that type, possibly introducing more typed holes; inserts an effect formalize RbSyn on a core, object-oriented language 휆푠푦푛 and describe how the key ideas of the model are scaled-up hole, annotated with a write effect that may be needed to in our implementation for Ruby. We evaluated RbSyn on 19 satisfy a test assertion; or replaces an effect hole with an benchmarks, 12 of which come from popular, open-source expression with the given write effect, possibly inserting Ruby apps. We found that RbSyn synthesizes correct solu- another effect hole. Once this process finds a set of method tions for all benchmarks, with 15 benchmarks synthesizing bodies that cumulatively pass all tests, RbSyn uses a novel in under 9 seconds, while the slowest benchmark takes 83 merging strategy to construct a complete solution: It cre- seconds. Using observed reads to guide synthesize is effec- ates a method whose body branches among the conditions, tive: using type-guidance alone times out on 10 of 12 app executing the corresponding (passing) code, thus yielding benchmarks. We also found that using less precise effect an- a single method that passes all tests. (§2 gives a complete notations leads to worse synthesis performance. In summary, example of RbSyn’s synthesis process.) 휆 we believe type- and effect-guided synthesis is an important We formalize RbSyn for 푠푦푛, a core object-oriented lan- step forward in synthesis of effectful methods from test cases. guage. The synthesis is comprised of three parts. The first part, type-guided synthesis, is similar to prior work Keywords program synthesis, type and effect systems, Ruby [16, 29, 31], but is geared towards imperative, object-oriented programs. The second part is effect-guided synthesis, which 1 Introduction tries to fill an effect hole ^ : 휖 with an expression with arXiv:2102.13183v2 [cs.PL] 8 Apr 2021 effect 휖. In 휆 , an effect accesses a region 퐴.푟, where 퐴 A key task in modern software development is writing code 푠푦푛 is a class and 푟 is an uninterpreted identifier. For example, that composes calls to existing APIs, such as from a library Post.author might indicate reading instance field author or framework. Component-based synthesis aims to carry out of class Post. This notion of effects balances precision and this task automatically, and researchers have shown how to tractability: effects are precise enough to guide synthesis perform component-based synthesis using SMT solvers [25]; effectively, yet coarse enough that reasoning about them is how to synthesize branch conditions [30]; and how to per- simple. The last part of the synthesis algorithm synthesizes form synthesis given a very large number of components [12]. branch conditions to create a merged program that combines This prior work guides the synthesis process using types solutions for individual tests into an overall solution for the or special properties of the synthesis domain, which is crit- complete problem. (§3 discusses our formalism.) ical to achieving good performance. However, prior work Our implementation of RbSyn is built on top of RDL, a Ruby type system [15]. Our implementation extends RDL to Conference’17, July 2017, Washington, DC, USA include effect annotations, including a self region to give 2021. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 https://doi.org/10.1145/nnnnnnn.nnnnnnn 1 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn more precise effect information in the presence of inheri- 1 # User schema{name: Str, username: Str} # Post schema{author: Str, title: Str, slug: Str} tance. Our implementation also makes use of RDL’s type-level 2 3 computations [26] to provide precise typing during synthesis. 4 define :update_post,"(Str, Str,{author:?Str, title: Finally, when searching for solutions, our implementation 5 ?Str, slug:?Str}) → Post", [User, Post] do heuristically prioritizes further exploration of candidates that 6 spec"author can only change titles" do are small and have passed more assertions. (§4 describes our 7 setup { seed_db# add some users and their posts to db implementation.) 8 9 @post = Post.create(author: 'author', slug: We evaluated RbSyn on a suite of 19 benchmarks, in- 10 'hello-world', title: 'Hello World') cluding seven benchmarks we wrote and 12 benchmarks 11 update_post('author', 'hello-world', author: extracted from three widely used, open-source Ruby apps: 12 'dummy', title: 'Foo Bar', slug: 'foobar') Discourse, Gitlab, and Disaspora. For the former, we wrote 13 } postcond { |updated| our own specifications. For the latter, we used unit tests 14 15 assert { updated.id == @post.id } that came with the benchmarks. We found that RbSyn syn- 16 assert { updated.author =="author"} thesizes correct solutions for all benchmarks and does so 17 assert { updated.title =="Foo Bar"} quickly, taking less than 9 seconds each for 15 of the bench- 18 assert { updated.slug == 'hello-world' } marks, and 83 seconds for the slowest benchmark. Moreover, 19 } end type- and effect-guidance is critical. Without it, a majority 20 21 spec"other users cannot change anything" do of the benchmarks time out after five minutes. Finally, we 22 setup { ...# same setup as above except next line examine the tradeoff of effect precision versus performance. 23 update_post('dummy', ...)# other args same We found that restricting effects to class names only causes 3 24 } benchmarks to time out, and restricting effects to only puri- 25 postcond { |updated| ...# same other three asserts assert { updated.title =="Hello World"} ty/impurity causes 10 benchmarks to time out. (§5 discusses 26 27 } the evaluation in detail.) 28 end end We believe that RbSyn is an important step forward in synthesis of effectful methods from test cases. Figure 1. Specification for update_post method 2 Overview In this section, we illustrate RbSyn by using it to synthesize These classes can then be used to invoke singleton (class) a method from a hypothetical web blogging app. This app methods in the synthesized method. For simplicity, we as- makes heavy use of ActiveRecord, a popular database access sume that RbSyn can use only these constants for this exam- library for Ruby on Rails. It is the ActiveRecord methods ple. In practice, RbSyn can synthesize predefined numeric whose side effects RbSyn uses to guide synthesis. or string constants like 0, 1 or the empty string. Figure1 shows the synthesis problem. This particular app Finally, the synthesis problem includes a number of specs, includes database tables for users and posts. In ActiveRe- which are just test cases. Each spec has a title, for human cord, rows of these tables are represented as instances of convenience; a setup block to establish any necessary pre- classes User and Post, respectively. For reference, the table conditions and call the synthesized method; and a postcond schemas are shown in lines1 and2. Each user has a name block with assertions that must hold after the synthesized and username. Each post has the author’s username, the method runs. As we will see below, separating the pre- and post’s title, and a slug, used to compute a permalink. postconditions allows RbSyn to more easily use effects to The goal of this particular synthesis problem, given by guide synthesis. In this example, both specs add a few users the call to define, is to create a method update_post that and a post created by each of them to the database (call to allows users to change the information about a post. Lines4 seed_db, details not shown) and then create a post titled and5 specify the method’s type signature in the format of “Hello World” by the user author. The first spec asserts that RDL [15], a Ruby type system that RbSyn uses for types and update_post allows author to update a post’s title. The sec- type checking. Here, the first two arguments are strings, and ond spec asserts that a dummy user cannot update the post. the last is a finite hash type that describes an instance of The check for id ensures that only existing posts are updated Hash with optional (indicated by ?) keys author, title, and (any new posts will have a new unique id). slug (all symbols, which are just interned strings) that map The final, synthesized solution is shown on the right of to strings. The method itself returns a Post. Figure2. Notice the synthesized code calls several ActiveRe- In addition to the type signature, the synthesis problem cord methods (exists?, where, and first) as well as the also includes a list of constants that can be used in the target hash access method []. Applying solver-aided synthesis to method. In this case, those constants are the classes User this problem would require developing accurate models of and Post, as given by the last argument to define on line5. these methods, which is a difficult challenge [28]. To address 2 Conference’17, July 2017, Washington, DC, USA this limitation, RbSyn instead enumerates candidates, which RbSyn extends RDL’s type annotations to include read can then be run to check them against the specs. As the and write effects. When the expression inside an assert search space is vast, RbSyn uses update_post’s type signa- evaluates to false, RbSyn infers the assert’s read and write ture and the effects from the specs’ postconds the guide the effects based on those of the methods it calls. For example, search. Finally, RbSyn uses a novel merging algorithm to we can give the Post#title1 method, used by the third synthesize the necessary branch condition to yield a solution assertion, the following signature: that satisfies both specs. type Post, :title, '() → Str', read: ['Post.title'] 2.1 Synthesizing Spec Solutions Thus, RbSyn sees that the failing assertion reads Post.title, an abstract effect label. To make the assertion succeed, Rb- The left portion of Figure2 shows the search process RbSyn Syn inserts an effect hole ^ : Post.title in the candidate uses to solve this synthesis problem. To begin, RbSyn ob- program (C10). It also saves the value of the previous candi- serves that the return type of update_post is Post. Thus, date expression in a temporary variable, and inserts a hole the search begins (upper left) by creating a candidate method with the candidate’s type at the end. RbSyn then continues body □:Post, which is a typed hole that must be filled by the search, trying to fill the effect hole with a call to amethod an expression of type Post. RbSyn then iteratively expands whose write effect matches the hole—such a call could poten- holes in candidates, running the specs whenever it produces tially satisfy the failed assertion. Here, RbSyn replaces the fully concretized candidates with no holes. effect hole (C11) with a call to Post#title=, which is such In general, RbSyn can fill a typed hole with a local variable, a method. (We should note that all previous candidates that a constant, or a method call. As there are no local variables failed a spec due to a side effect will also have effect holes (which so far are just parameters) or constants of the ap- added in a similar fashion. We omit these candidates from propriate type, RbSyn chooses a method call. To do so, it the discussion as they do not lead to a solution.) searches through the available method type annotations to RbSyn continues by using type-guided synthesis for the find those that could return Post. In this case, RbSyn takes ad- typed holes of C11—yielding C12, rejected due to assertion vantage of RDL’s type annotations for ActiveRecord [26] to failures—and then C13. After several steps (not shown), Rb- synthesize candidates C1 and C2, among others (not shown). Syn arrives at C14, which fails the spec, and C15, which fully It is straightforward for the user to add type annotations satisfies the first spec. Indeed, we see this exact expression for any other library methods that might be needed by the in lines4–6 of the solution in Figure2. synthesized method. For illustration purposes, we also show a candidate C3 that returns the wrong type. Such candidates 2.2 Merging Solutions are discarded by RbSyn, vastly reducing the search space. Note that C2 contains two method calls, and thus would take RbSyn next uses the same technique to synthesize an expres- two steps to produce, but we show it here as a single step sion that satisfies the second spec, yielding the expression for conciseness. shown on line8. Now RbSyn needs to merge these individual Next, RbSyn tries to fill holes in candidate expressions, solutions into a single solution that passes all specs. At a 푏 starting with smaller candidates. In this case, it first consid- high-level, it does so by constructing a program if 1 then 푒 푏 푒 푒 ers C1, which has a hole of type Class, which is the 1 else if 2 then 2 end, where the 푖 are the solutions 푏 singleton type for the constant Post. Thus, there is only one for the specs and the 푖 are branch conditions capturing the choice for the hole, yielding candidate C4. Since C4 has no conditions under which those expressions pass the specs. 푏 holes, RbSyn runs it against the specs. More specifically, it To create the 푖 , RbSyn uses the same technique again, this runs it against the first spec—as we will discuss shortly, Rb- time synthesizing a boolean-valued expression that evaluates 푖 Syn synthesizes solutions for each spec independently, and to true under the setup of spec . In this case, this process then combines them. In this case, C4 fails the spec (because results in the same branch condition true for both specs. the first post in the database is not the one to be updated, However, since this trivially holds for both specs, this branch due to the initial database seeding) and hence is rejected. condition does not work—we need to find a branch condition Continuing with C5, RbSyn fills in the (finite hash-typed) that distinguishes the two cases. 푏′ hole, yielding choices that include C6 and C7. RbSyn rejects Next RbSyn tries to synthesize a branch condition 1 C6 since there is no way to construct an expression of type that evaluates to true for the setup of the first spec and Int. However, for C7, there are two local variables of type false for the setup of the second. This yields the more pre- 푏′ = Post.exists?(author: arg0, Str from the method arguments. Substituting these yields cise branch condition 1 C8 and C9. C8 uses arg0, the username, to query the Post slug: arg1). This is a sufficient condition, as the update_- table’s slug, so it fails. C9 queries the Post table with the post method is supposed to update a post only if a post correct slug value arg1. This passes the first two assertions with slug arg1 is authored by arg0. It solves an analogous (line 15 onwards) but fails the third, which expects the post title to be updated from “Hello World” to “Foo Bar.” 1A#m indicates instance method m of class A. 3 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn

C4 C1 (☐:Class) Post.first Test Failure ✗ No Terms .first C6 Post.where({ ✗ C2 id: (☐:Int)}) ☐: Post (☐:Class) Post.where(☐:{id: Int, .first .where(☐:{FF.}).first slug: Str, FF.}) Test Failure C5 .first Post.where({ ✗ slug: (☐:Str)}) Post.where({ (☐:Class) C7 C3 Type Error .first slug: arg0}) .exists?(☐:{FF.}) ✗ 1 def update_post(arg0, arg1, arg2) .first C8 2 if Post.exists?(author: arg0, t0 = Post.where({ ✗ Test Failure 3 slug: arg1) slug: arg1}).first C9 t0 = Post.where(slug:arg1).first t0 = Post.where({ t0.title = arg0 Post.where({ 4 slug: arg1}).first slug: arg1}) C14 t0 C12 ✗ Test Failure 5 t0.title=arg2[:title] t0.title = t0 = Post.where({ .first arg2[:author] slug: arg1}).first Effect: Post.title 6 t0 t0 Test Failure (☐:Post).title = ✗ t0 = Post.where({ 7 else (☐:Str) slug: arg1}).first C11 C13 ☐:Post t0 = Post.where({ 8 Post.where(slug: arg1).first t0 = Post.where({ C15 t0.title = (☐:{ slug: arg1}).first slug: arg1}).first author: Str, title: Str, FF.}) (◇:Post.title) 9 end t0.title = [☐:author or title or FF.] C10 ☐:Post arg2[:title] ✓ t0 10 end t0

Figure 2. Left: Steps in the synthesis of solution to the first specification. Note C2 takes two steps to synthesize but isshownas a single composite step. Some choices available to the synthesis algorithm have been omitted for simplicity. Right: Synthesized update_post method.

푏′ synthesis problem for the second spec, yielding 2 = !Post.- Values 푣 ::= nil | true | false | [퐴] exists?(author: arg0, slug: arg1). As these are the Expressions 푒 ::= 푣 | 푥 | 푒; 푒 | 푒.푚(푒) negation of each other, RbSyn then merges these two to- | if 푏 then 푒 else 푒 if-then-else if-then-else if-- gether as (rather than an | let 푥 = 푒 in 푒 | □ : 휏 | ^ : 휖 then-else), yielding the final synthesized program in Fig- Conditionals 푏 ::= 푒 | !푏 | 푏 ∨ 푏 ure2. Types 휏 ::= 퐴 | 휏 ∪ 휏 Programs 푃 ::= def 푚(푥) = 푒 3 Formalism Specs 푠 ::= ⟨푆, 푄⟩ In this section, we formalize RbSyn on 휆 , a core object- 푠푦푛 Setup 푆 ::= 푒; 푥 = 푃 (푒) oriented calculus shown in Figure3. Values 푣 include nil, 푟 Postconditions 푄 ::= assert 푒 | 푄;푄 true, false, and objects [퐴] of class 퐴. Note that we omit Spec Set Ψ := {푠 } fields to keep the presentation simpler. Expressions 푒 include 푖 Synthesis Goal 퐺 ::= ⟨휏 → 휏, Ψ⟩ values, variables 푥, sequences 푒; 푒, method calls 푒.푚(푒), con- ditionals if 푏 then 푒 else 푒, and variable bindings let 푥 = Class Table 퐶푇 ::= ∅ | 퐴.푚 : 휎,퐶푇 푒 in 푒. A conditional guard 푏 can be an expression 푒, a nega- ⟨휖푟 ,휖푤 ⟩ tion !푏, or a disjunction 푏 ∨ 푏. The grammar for guards is Method Types 휎 ::= 휏 −−−−−−→ 휏 limited to match what RbSyn can actually synthesize. Type Env. Γ ::= ∅ | 푥 : 휏, Γ Expressions also include typed holes □ : 휏 and effect holes Dynamic Env. 퐸 ::= 푥 → 푣 ^ : 휖, which are placeholders that are eventually filled with Constants Σ ::= ∅ | 푣 : 휏, Σ an expression of the given type, or expression with the given Effect 휖 ::= • | ∗ | 퐴.∗ | 퐴.푟 | 휖 ∪ 휖 write effect, respectively. We note our synthesis algorithm 푟 ∈ effect regions • ⊆ 휖 휖 ⊆ ∗ only inserts effect holes at positions that can have any type. 퐴1.∗ ⊆ 퐴2. ∗ and 퐴1.푟 ⊆ 퐴2.푟 and 퐴1.푟 ⊆ 퐴2. ∗ if 퐴1 ≤ 퐴2 Types are either classes or unions of types, and we assume 휖1 ⊆ 휖1 ∪ 휖2 휖2 ⊆ 휖1 ∪ 휖2 classes form a lattice with Nil (the class of nil) as the bottom 휖1, 휖1 휖2, 휖2 휖1 휖2, 휖1 휖2 ⟨ 푟 푤⟩ ∪ ⟨ 푟 푤⟩ = ⟨ 푟 ∪ 푟 푤 ∪ 푤⟩ element and Obj as the top element. We write 퐴 ≤ 퐵 when class 퐴 is a subclass of 퐵 according to the lattice. We defer the 푥 ∈ variables, 푚 ∈ methods, 퐴 ∈ classes, definition of effects for the moment. Finally, a synthesized Nil ≤ 휏 휏 ≤ Obj 휏1 ≤ 휏1 ∪ 휏2 휏2 ≤ 휏1 ∪ 휏2 program 푃 is a single method definition def 푚(푥) = 푒. We restrict the method to one argument for convenience. Figure 3. Syntax and Relations of 휆푠푦푛. A spec 푠 in 휆푠푦푛 is a pair of setup code 푆 and a postcondi- tion 푄. A setup 푒1; 푥푟 = 푃 (푒2) includes some initialization 푒1 followed by a special form indicating calling the synthesized 푥푟 and inspect the global state using library methods. We 푃 푒 푥 method in with argument 2 and binding the result to 푟 . write Ψ for a set of specs, and a synthesis goal 퐺 is a pair The postcondition is a sequence of assertions that can test ⟨휏1 → 휏2, Ψ⟩, where 휏1 and 휏2 are the method’s domain and 4 Conference’17, July 2017, Washington, DC, USA range types, respectively, and Ψ are the specs the synthesized Σ, Γ ⊢퐶푇 푒 ⇝ 푒 : 휏 method should satisfy. Γ(푥) = 휏 The next part of Figure3 defines additional notation used T-Var , ⊢ 푥 푥 휏 in the formalism. Synthesized methods can use classes and Σ Γ 퐶푇 ⇝ : methods from a class table 퐶푇 , which maps class and method , 푒 푒 ′ 휏 Σ Γ ⊢퐶푇 1 ⇝ 1 : 1 names to the methods’ types. For example, the class table , 푥 휏 푒 푒 ′ 휏 Σ Γ[ ↦→ 1] ⊢퐶푇 2 ⇝ 2 : 2 has type information for other methods of a target app and T-Let , ⊢ let 푥 = 푒 in 푒 let 푥 = 푒 ′ in 푒 ′ 휏 library methods such as those from ActiveRecord. A method Σ Γ 퐶푇 1 2 ⇝ 1 2 : 2 ⟨휖푟 ,휖푤 ⟩ ′ ′ type 휎 has the form 휏 −−−−−−→ 휏 , where 휏 and 휏 are the T-Hole Σ, Γ ⊢퐶푇 □ : 휏 ⇝ (□ : 휏) : 휏 domain and range types, respectively, and ⟨휖푟 , 휖푤⟩ specifies the method’s read effect 휖푟 and write effect 휖푤 (discussed 푣 : 휏1 ∈ Σ 휏1 ≤ 휏2 shortly). During type-guided synthesis, RbSyn maintains a S-Const Σ, Γ ⊢ □ : 휏 ⇝ 푣 : 휏 type environment Γ mapping variables to their types. When 퐶푇 2 1 executing a synthesized program, the operational semantics Γ(푥) = 휏1 휏1 ≤ 휏2 (omitted) uses a dynamic environment 퐸 mapping variables S-Var Σ, Γ ⊢ □ : 휏 ⇝ 푥 : 휏 to their values. During synthesis, Σ is a list of user-supplied 퐶푇 2 1 constants that can fill holes. 푚 : 휏1 → 휏2 ∈ 퐶푇 (퐴) 휏2 ≤ 휏3 S-App Σ, Γ ⊢퐶푇 □ : 휏3 ⇝ (□ : 퐴).푚(□ : 휏1) : 휏2 Effects. The last part of Figure3 defines effects 휖. In RbSyn, effects are hierarchical names that abstractly label the pro- Figure 4. Type-guided synthesis rules (selected). gram state. The empty effect • denotes no side effect, used for pure computations. The effect ∗ is the top effect, indicating a computation that might touch any state in the program. effects (in our case, reads and writes) and then creating can- Lastly, effect 퐴.∗ denotes code that touches any state within didates using the opposing element of such a pair—can be class 퐴, and 퐴.푟 denotes code that touches the region labeled generalized to more complex effect systems. 푟 퐴 in , where region names are completely abstract. Effects Synthesis Problem. We can now formally specify the syn- can also be unioned together. thesis problem. Given a synthesis goal ⟨휏 → 휏 , {⟨푆 , 푄 ⟩}⟩, 휖 ⊆ 휖 휖 1 2 푖 푖 We define subsumption 1 2 on effects to hold when 2 RbSyn searches for a program 푃 such that, for all 푖, assuming 휖 • ∗ includes 1. Effects and are the bottom and top, respec- that 푆 calls 푃 with an argument of type 휏 , evaluating to 푥 ⊆ 퐴 ≤ 퐴 퐴 .푟 ⊆ 퐴 .푟 푖 1 푟 tively, of the relation, and if 1 2 then 1 2 and of type 휏 , it is the case that 푃 ⊢ 푆 ;푄 ⇓ 푣. In other words, 퐴 .푟 ⊆ 퐴 .∗ 퐴 .∗ ⊆ 퐴 .∗ 2 푖 푖 1 2 and 1 2 . We also have standard rules evaluating the setup followed by the postcondition yields for subsumption with effect unions. some value rather than aborting with a failed assertion. We In RbSyn, all effects arise from calling methods from the omit the evaluation rules as they are standard. class table 퐶푇 , which have effect annotations of the form ⟨휖푟 , 휖푤⟩, where 휖푟 and 휖푤 are the method’s read and write 3.1 Type-Guided Synthesis effects, respectively. We extend subsumption to such paired The first component of RbSyn is type-guided synthesis, effects in the natural way. During synthesis, if RbSyn ob- which creates candidate expressions of a given type by try- serves the failure of an assertion with some read effect 휖푟 , it ing to fill a hole □ : 휏2 where 휏2 is the method return type. tries to fix the failure by inserting a call to some method with Figure4 shows a subset of the type-guided synthesis rules; 휖 휖 ⊆ 휖 write effect 푤 such that 푟 푤, i.e., it tries writing to the the full set can be found in Appendix A.2. These rules have state that is read. For example, in Section2, this technique the form Σ, Γ ⊢퐶푇 푒1 ⇝ 푒2 : 휏, meaning with constants Σ, generated a call to Post#title. in type environment Γ, under class table 퐶푇 , the holes in 푒1 Our effect language is inspired by the region path lists can be rewritten to yield 푒2, which has type 휏. approach of Bocchino Jr et al.[4], but is much simpler. We The rules in Figure4 have two forms. The T- rules apply opted for coarse-grained, abstract effects to make it easier to to expressions whose outermost form is not rewritten. Thus write annotations for library methods. Although class names these rules perform standard type checking. For example, T- are included in the effect language, such names are for hu- Var type checks a variable 푥 by checking its type against the man convenience only—nothing precludes a method in class type environment Γ, leaving the term unchanged. T-Let type- 퐴 퐵.푟 being annotated with an effect to for some other class checks and recursively rewrites (or not) the subexpressions 퐵 . We found that this approach works well for our problem and then rewrites those new expressions into a let-binding, setting of synthesizing code for Ruby apps, where trying ensuring the resulting term is type-correct. Finally, T-Hole to precisely model heap and database state would be diffi- applies to a typed hole that is not being rewritten, in which cult. However, we believe the core of this approach—pairing case it remains the same and has the given type. 5 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn

Σ, Γ, 휖푟 ⊢퐶푇 푒 ↠ 푒 The first must be filled with an expression of the desired effect 휖푟 . The second must have 푒’s type 휏, to preserve type- Σ, Γ ⊢퐶푇 푒 ⇝ 푒 : 휏 S-Eff correctness. For example, it could be filled by 푥, as happened , , 휖 ⊢ 푒 let 푥 = 푒 in ( 휖 휏) Σ Γ 푟 퐶푇 ↠ ^ : 푟 ; □ : in Figure2 when t0 is returned. The rules for working with effect holes are shown in the , 푒 푒 휏 Σ Γ ⊢퐶푇 ⇝ : bottom of Figure5, which extends Figure4. T-EffObj gives an effect hole, that is not rewritten, type Obj. Since this is , 휖 휖 T-EffObj Σ Γ ⊢퐶푇 ^ : ⇝ (^ : ) : Obj the top of the type hierarchy, this ensures an effect hole

⟨ ′ ′ ⟩ can safely be replaced by a term with any type. In other 휖 휖 ′ 푚 휏 휖푟 ,휖푤 휏 퐶푇 퐴 푟 ⊆ 푤 : 1 −−−−−−→ 2 ∈ ( ) words, effect holes are filled for their effects, not their types. ′ S-EffApp S-EffApp does the heavy lifting, filling an effect hole with a Σ, Γ ⊢퐶푇 ^ : 휖푟 ⇝ □ : 휖 ; (□ : 퐴).푚(□ : 휏1) : 휏2 푟 푚 휖 ′ call to a method with a write effect 푤 that subsumes the S-EffNil desired effect 휖 . Of course, this call may itself read state 휖 ′, , ⊢ 휖 nil 푟 푟 Σ Γ 퐶푇 ^ : ⇝ : Nil so the rule precedes the method call with a hole with that effect, in case said state needs to change. Finally, S-EffNil Figure 5. Effect guided synthesis rule replaces an effect hole with nil, which removes it from the program. This is used in case some extra effect holes are The S- rules rewrite typed holes. S-Const replaces a hole added that are not actually needed. by a constant of the correct type from Σ. S-Var is similar, 3.3 Merging Solutions replacing a hole by a variable from Γ. Finally, S-App replaces a hole with a call to a method with the right return type, The last component of RbSyn combines expressions that pass inserting typed holes for the method receiver and argument. individual specs into a final program that passes all specs. More specifically, given a synthesis goal ⟨휏1 → 휏2, {푠푖 }⟩, Type Narrowing. Notice that in these three S-rules, the term RbSyn first uses type- and effect-guided synthesis to create replacing the hole may actually have a subtype of the orig- expressions 푒푖 such that 푒푖 is the solution for spec 푠푖 . Then, inal hole’s type. Thus, type-guided synthesis could narrow RbSyn combines the 푒푖 into a branching program roughly of types in a synthesized program, potentially also narrow- the form if 푏1 then 푒1 else if 푏2 then 푒2 ... for some 푏푖 . ing the search space. For example, consider an expression For each 푖, RbSyn uses the type-guided synthesis rules in . (□1 : Str) append(□2 : Str) that joins two strings, and as- § 3.1 to synthesize a 푏푖 such that under the setup 푆푖 of spec sume the set of constants Σ includes nil. Notice that nil is 푠푖 , conditional 푏푖 evaluates to true, i.e., def 푚(푥) = 푏푖 ⊢ a valid substitution for □1, which will then cause the type of 푆푖 ; assert 푥푟 ⇓ 푣. Note effect-guided synthesis is not used the receiver to narrow to Nil. But then the typing derivation here as the asserted expression 푥푟 is pure. fails because the Nil type has no append method, stopping Notice that while each initial 푏푖 evaluates to true under further exploration along this path. In contrast, if we had the precondition, there is no guarantee it is a sufficient con- typed the replacement term at Str, then RbSyn would have dition for 푠푖 to satisfy the postcondition—especially because fruitlessly continued the search, trying various replacements RbSyn aims to synthesize small expressions, as discussed for □2 only to reject them due to a runtime failure for invok- further in §4. Moreover, there may be multiple 푒푖 that are ac- ing a method on nil. tually the same expression, and therefore could be combined to yield a smaller solution. 3.2 Effect-Guided Synthesis Thus, RbSyn next performs a merging step to create the The second component of RbSyn is effect-guided synthesis, final solution. This process operates on tuples of theform used when type-guided synthesis creates a candidate that ⟨푒,푏, Ψ⟩, which is a hypothesis that the program fragment does not satisfy the postcondition of the tests. If this happens, if 푏 then 푒 satisfies the specs Ψ. RbSyn repeatedly merges RbSyn computes the effect ⟨휖푟 , 휖푤⟩ of the failed assertion in such tuples using an operation ⟨푒1,푏1, Ψ1⟩ ⊕ ⟨푒2,푏2, Ψ2⟩ to the postcondition. (We defer the formal rules for computing represent that if 푏1 then 푒1 else if 푏2 then 푒2 satisfies the Ð this effect to Appendix A.1, as they simply union the effects specs Ψ1 ∪ Ψ2. We define Specs(⟨푒1,푏1, Ψ1⟩ ⊕ ...) = Ψ푖 , i.e., of method calls in the assertion.) Then, we hypothesize that the specs from merged tuples, and Prog(⟨푒1,푏1, Ψ1⟩ ⊕ ...) the assertion may have failed because the region denoted by = def 푚(푥) = if 푏1 then 푒1 else ..., a definition with the 휖푟 is in the wrong state. expression represented by the merged tuples. To potentially fix the state, RbSyn applies a new rule S-Eff, Figure6 defines rewriting rules that are applied to create shown in Figure5. The hypothesis computes the type 휏 of the final solution. Rule1 simplifies the case where 푒1 and 푒2 푒, the candidate expression that failed the postcondition. In are the same and 푏1 implies 푏2, yielding a single expression the conclusion, 푒 is rewritten to let 푥 = 푒 in (^ : 휖푟 ; □ : 휏), and branch that satisfy Ψ1 ∪ Ψ2. Note we omit the symmetric i.e., 푒 is computed, bound to 푥, and two holes are sequenced. case for all rules due to space limitations. Rule2 applies when 6 Conference’17, July 2017, Washington, DC, USA

⟨푒1,푏1, Ψ1⟩ ⊕ ⟨푒2,푏2, Ψ2⟩ = ⟨푒1,푏1, Ψ1 ∪ Ψ2⟩ Algorithm 1 Merge programs (1) if 푒 ≡ 푒 and 푏 =⇒ 푏 1: procedure MergeProgram(candidates = {⟨푒푖,푏푖, Ψ푖 ⟩}) 1 2 1 2 É 2: merged ← { ⟨푒푖,푏푖, Ψ푖 ⟩} ⟨푒1,푏1, Ψ1⟩ ⊕ ⟨푒2,푏2, Ψ2⟩ = ⟨푒1,푏1 ∨ 푏2, Ψ1 ∪ Ψ2⟩ (2) 3: final ← {} 푒 ≡ 푒 푏 ≠⇒ 푏 if 1 2 and 1 2 4: for all 푚 ∈ merged do 푒 ,푏 , 푒 ,푏 , 푒 ,푏푠푦푛, 푒 ,푏푠푦푛, 푚 ← 푚 ⟨ 1 1 Ψ1⟩ ⊕ ⟨ 2 2 Ψ2⟩ = ⟨ 1 1 Ψ1⟩ ⊕ ⟨ 2 2 Ψ2⟩ 5: apply (1)-(3) to until no rewrites possible 6: final ← final ∪ {푚} if ∀⟨푆 , 푄 ⟩ ∈ Specs(푚). if 푒1 . 푒2 and 푏1 =⇒ 푏2 푖 푖 7: Ó Prog(푚) ⊢ 푆 ;푄 ⇓ 푣 푆 , 푄 . 푚 푥 푏푠푦푛 푆 푥 푣 푖 푖 where ∀⟨ 푖 푖 ⟩ ∈ Ψ1 def ( ) = 1 ⊢ 푖 ; assert 푟 ⇓ 푖 8: end for ∧ ∀⟨푆 , 푄 ⟩ ∈ Ψ .def 푚(푥) = 푏푠푦푛 ⊢ 푆 ; assert !푥 ⇓ 푣 푗 푗 2 1 푗 푟 9: return Prog(푚) s.t. 푚 ∈ final 푆 , 푄 . 푚 푥 푏푠푦푛 푆 푥 푣 and ∀⟨ 푖 푖 ⟩ ∈ Ψ1 def ( ) = 2 ⊢ 푖 ; assert ! 푟 ⇓ 10: end procedure 푆 , 푄 . 푚 푥 푏푠푦푛 푆 푥 푣 ∧ ∀⟨ 푗 푗 ⟩ ∈ Ψ2 def ( ) = 2 ⊢ 푗 ; assert 푟 ⇓ (3) Constructing the Final Program. Finally, notice that the merge operation ⊕ is not associative, and it may yield dif- Figure 6. Rewriting rules. ferent results depending on the order in which it is applied. Thus, to get the best solution, RbSyn uses Algorithm1. It builds the set of all possible merged fragments (line2). Then 푏 does not imply 푏 but 푒 and 푒 are the same. In this case, 1 2 1 2 it simplifies each candidate solution using the rewrite rules 푒 satisfies the union of the specs under the disjunction ofthe 1 and only considers a candidate valid if it passes all tests. branch conditions. (Note this rule could also applied if 푏 ⇒ 1 It returns any such program as the solution. This branch 푏 , but the resulting solution would be longer than Rule1 2 merging strategy tries all combinations, so it is less sensi- generates.) Finally, Rule3 applies when 푒 and 푒 differ but 푏 1 2 1 tive to spec order than other component based synthesis implies 푏 . In such a scenario, 푏 holds for both 푒 and 푒 and 2 2 1 2 approaches [30]. In practice, we found that reordering the thus it must be that푏 and푏 are insufficient to branch among 1 2 specs does not have much effect. 푒1 and 푒2. Thus, RbSyn synthesizes a stronger conditional 푏푠푦푛 1 that hold for all specs in Ψ1 and does not hold for the 3.4 Discussion 푏푠푦푛 specs in Ψ2, and the reverse for 2 . For example, recall the application of this rule in the example of §2, to synthesize a Before discussing our implementation in the next section, more precise branch condition because the initial condition we briefly discuss some design choices in our algorithm. true was the same for both branches. Our effect system uses pairs of read and write effectsin RbSyn also includes a number of other merging rules, regions. As mentioned, this core idea could be extended to deferred to Appendix A.4, for further simplifying expressions. any effects in a test assertion that can be paired with aneffect For example, if 푏 then 푒 else if !푏 then 푒 else nil can in the synthesized method body. For example, throwing and 1 1 1 2 catching exceptions, I/O to disk or network, or enabling/dis- be rewritten as if 푏1 then 푒1 else 푒2, which was used to generate the solution in Figure2. abling features in a UI could all be expressed this way. We leave exploring such effect pairs to future work. Checking Implication. Checking the implications in Fig- One convenient feature of our algorithm is that correct- ure6 is challenging since branch conditions may include ness is determined by passing specs, which are directly exe- method calls whose semantics is hard to reason about. To cuted. Thus, the synthesizer can generate as many candidates solve this problem, RbSyn checks implications using a heuris- as it likes—i.e., be as over approximate as it likes—as long tic approach that is effective in practice. Each unique branch as its set of candidates includes the solution. This feature condition 푏 is mapped to a fresh boolean variable 푧. Simi- enables RbSyn to use a fairly simple effect annotation system larly, !푏 is encoded as ¬푧, and 푏1 ∨ 푏2 is encoded as 푧1 ∨ 푧2. compared to effect analysis tools [4]. Then to check an implication 푏1 ⇒ 푏2, RbSyn uses a SAT We could potentially adapt our algorithm to work in a solver to check the implication of the encoding. While this capability-based setting, using the observation that capabili- check could err in either direction (due to not modeling the ties and effects are related [6, 8, 19]. In this setting, assertion semantics of the 푏푖 precisely), we found it works surprisingly failures in tests would indicate specific capabilities needed well in practice. In case the implication check fails due to by the synthesized code. We leave exploring this idea further lack of precision, we fall back on the original ⊕ form which to future work. represents the complete program if 푏1 then 푒1 else if ... Finally, we distinguish typed holes from effect holes, rather without loss of precision. Should the implication check in- than have a single type-and-effect hole, to control where to correctly succeed, it will be caught by running the merged use type-guidance and where to use effect-guidance. When program against the assertions. initially trying to synthesize a method body, we omit effects 7 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn because it is unclear which effects are needed. For example, solution: after running tests and effect-guided synthesis on in Figure1, the second spec has read effects on all fields ofthe such candidates, their size increases, but if they do not pass post, and yet the target method does not write any fields, as more assertions, they are pushed further down the search the spec is checking the case when the post is not modified. queue. We leave experimenting with other search strategies Thus, we cannot simply compute the union of all read effects to future work. in all assertions and use those for effect guidance. Moreover, Effect Annotations. We extended RDL to support effect an- type-guided synthesis often will synthesize effectful expres- notations along with type annotations for library methods. sions, e.g., the call to Post.where in Figure2. Conversely, Programmers specify read and write effects following the our algorithm only places effect holes in positions where grammar in §3. For example a method annotated with a write the type does not matter—hence type information for such a effect Post.author writes to some region author in some hole would not add anything. Nonetheless, type-and-effect object of class Post. Here author is an uninterpreted string, holes would be a simple extension of our approach, and we selected by the programmer. Similarly the labels “.” and “∗” leave exploration of them to future work in other synthesis stand for pure and any region (or simply “impure”), respec- domains. tively. A region Post.* is written as Post for convenience. 4 Implementation One important extension is a self effect region, which in- dicates a read or write to the class of the receiver. This is RbSyn is implemented in approximately 3,600 lines of Ruby, essential for supporting ActiveRecord, whose query methods excluding its dependencies. are inherited by the actual Rails model classes. For exam- Synthesis specifications, as discussed in§2, are written ple, we use the self effect on the exists? query method of in a custom domain-specific language. Each has the form: ActiveRecord::Base. Then at a call Post.exists?, where define :name, "method-sig", [consts,...] do Post inherits from ActiveRecord::Base, we know the query spec "spec1" do setup { ... } postcond { ... } end ... reads the Post table and not any other table. end Effect annotations are similar to frame conditions5 [ , 14, where :name names the method to be synthesized; method-sig 27] used in verification literature. More precise effect an- is its type signature; and consts lists constants that can be notations help RbSyn find a solution faster because it will used in the synthesized method. Each spec is a test case the have fewer methods with subsumed effects than an impre- method must pass: setup describes the test case setup, and cise one, shrinking the search space. But effect precision postcond makes assertions about the results. does not affect the correctness of the synthesized program, In Ruby, do...end and {...} are equivalent syntax for since correctness is ensured by the specs. For example, if creating code blocks, i.e., closures. Having the setup and post- the effect annotation for the method Post#title= shown in condition in separate code blocks allows RbSyn to run the § 2.1 had just Post as its write annotation, synthesis would setup code and check the postcondition independently. still work, but would try more candidate programs. In some RbSyn also has optional hooks for resetting the global cases, coarse effects are required, e.g. the Post.where method state before any setup block is run. This ensures candidate queries records from the Post table. It has the coarser Post programs are tested in a clean slate without being affected by annotation because which columns such a query will access side-effects from previous runs. In our experiments, RbSyn cannot be statically specified: it depends on the arguments. resets the global state by clearing the database. We evaluate some of the tradeoffs in effect precision in§ 5.4. Program Exploration Order. While our synthesis rules are Type Level Computations. RbSyn uses RDL [15, 34] to rea- non-deterministic, our implementation is completely deter- son about types, e.g., checking if one type is a subtype of ministic. This makes it sensitive to the order in which ex- another, and using the type environment and class table to pressions are explored. RbSyn uses two metrics to prioritize find terms that can fill holes. RDL includes type-level compu- search. First, programs are explored in order of their size; tations [26], or comp types, in which certain methods’ types smaller programs are preferred over larger ones. Program include computations that run during type checking. For size is calculated as the number of AST nodes in the program. example, a comp type for the ActiveRecord#joins method Second, RbSyn prefers trying effect-guided synthesis for can compute that A.joins(B) returns a model that includes expressions that have passed more assertions rather than all columns of tables A and B combined. Using a comp type fewer. (Appendix A.1 formally describes counting passed for joins encodes a quadratic number of type signatures, assertions.) Untested candidates are assumed to have passed for different combinations of receivers and arguments, into a zero assertions. In general, expressions are explored in de- single type, and more for joins of more than two tables [26]. creasing order of number of passed assertions, then in in- RbSyn uses RDL’s comp types, but with new type sig- creasing order of program size. natures designed for synthesis. In particular, the previous These metrics combined also help when RbSyn synthe- version of RDL’s comp types gave precise types when the sizes a candidate that does not make any progress towards a receiver and arguments were known, e.g., in A.joins(B), 8 Conference’17, July 2017, Washington, DC, USA

RDL knows exactly which two classes are being joined. But • How does RbSyn perform using code based on existing this may not hold during synthesis, e.g., if B is replaced by a unit tests in widely deployed applications? (§ 5.2) hole in the example, then the exact return type of the joins • How much improvement is type-and-effect guidance call cannot be computed. compared to alternatives such as only type-guidance To address this issue, we modified RDL’s existing comp or only effect-guidance? (§ 5.3) type signatures for ActiveRecord methods like joins so • How does the precision of effect annotations affect that they compute all possible types. For example, if a hole synthesis performance? (§ 5.4) is an argument to joins, then the type finds all models B1, B2, ... that could be joined (i.e., those with associations); 5.1 Benchmarks gives the hole type B1 ∪ B2 ∪ ...; and sets the return type To answer the questions above, we collected a benchmark of joins to a table containing the columns of A, B1, B2,.... suite comprised of programs from the following sources: This over-approximation is narrowed as the argument terms • Synthetic benchmarks is a set of minimal examples that are synthesized, leading to cascading narrowing of types demonstrate features of RbSyn. throughout the program as discussed in § 3.1. • Discourse [23] is a Rails-based discussion platform used by over 1,500 companies and online communities. Optimizations. Synthesis of terms that pass a spec is an • Gitlab [18] is a web-based Git repository manager with expensive procedure. In practice, we found solutions to a wiki, issue tracking, and CI/CD tools built on Rails. single spec often satisfy others. Thus, when confronted with • Diaspora [9] is a distributed social network, with groups a new spec, RbSyn first tries existing solutions and condi- of independent nodes (called Pods), also built on Rails. tionals to see if they hold for the spec, before falling back on synthesis from scratch if needed. This makes the bottleneck We selected these apps because they are popular, well- for synthesis not the number of tests, but the number of maintained, widely used, and representative of programs unique paths through the program. Moreover, this reduces that are written with supporting unit tests. We selected a the number of tuples for merging, as a single expression and subset of the app’s methods for synthesis, choosing ones that conditional tuple can represent multiple specs Ψ. fall into the Ruby grammar we can synthesize: method calls, Finally, we found that in practice, the condition in one hashes, sequences of statements and branches. We currently spec often turns out to be the negation of the condition do not synthesize blocks (lambdas), for/while loops, case in another. Thus during synthesis of conditionals, RbSyn statements, or meta-programming in the synthesized code. tries the negation of already synthesized conditionals before All benchmarks from apps have side effects due to either falling back on synthesis from scratch. database accesses or reading and writing globals. Table1 lists the benchmarks. The first column group lists Limitations. While RbSyn works on a wide range of pro- the app name (or Synthetic for the synthetic benchmarks); grams, as we will demonstrate next, it does have several the benchmark id; the benchmark name; and the number of key limitations. First, RbSyn currently only synthesizes code specs. The synthetic benchmarks exercise features of RbSyn that does not need type casts to be well-typed. This ensures by synthesizing pure methods, methods with side effects, programs do not have type errors at run time, but eliminates methods in which multiple branches are folded into a single some valid programs from consideration. Second, the set of line program, etc. The Discourse benchmarks include a num- constants RbSyn can use during synthesis is fixed ahead of ber of effectful methods in the User model, such as methods time. This places programs that use unlikely constants out to activate an user account, unstage a placeholder account of reach, e.g., we have encountered Rails model methods created for email integration, etc. The Gitlab benchmarks that include raw SQL query strings (instead of only using Ac- include methods that disable two factor authentication for tiveRecord). Finally, because RbSyn uses enumerative search, a user, methods to close and reopen issues, etc. Finally, the it can face a combinatorial explosion when searching for Disaspora benchmarks include methods to confirm a user’s nested method calls, e.g., if there are 푛 possible method calls, email, accept a user invitation, etc. available, synthesizing 퐴.푚(퐴.푚(퐴.푚(푥))) may require an We derived the specs for the non-synthetic benchmarks 푂 (푛3) search. In practice, we did not face this problem as directly from the unit tests included in the app. We split deeply nested method calls are rarely used in Rails apps. each test into setup and postcondition blocks in the obvious way, and we added an appropriate type annotation to the synthesis goal. Across all benchmarks, we started with a 5 Evaluation base set of constants (Σ in §3) to be true, false, 0, 1 and the We evaluated RbSyn by using it to synthesize a range of empty string. Then we added nil and singleton classes (for benchmarks extracted from widely used open source appli- calling class methods) on a per benchmark basis as needed. cations that use a variety of libraries. We pose the following (As with many enumerative search based methods, we rely questions in our evaluation: on the user to provide the right set of constants.) 9 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn

# Asserts # Orig # Lib Time (sec) Meth # Syn

Group ID Name Specs Min Max Paths Meth Median ± SIQR Types Effects Neither Size Paths S1 lvar 1 1 1 1 164 0.34 ± 0.01 1.36 11.97 - 4 1 S2 false 1 1 1 1 164 0.35 ± 0.01 1.37 12.19 - 4 1 S3 method chains 2 1 1 1 164 0.98 ± 0.01 9.56 - - 10 1 S4 user exists 2 1 1 1 164 0.98 ± 0.02 9.52 - - 9 1 S5 branching 3 1 1 2 165 2.49 ± 0.07 38.37 - - 17 2 Synthetic S6 overview (ext) 3 4 4 3 164 12.78 ± 0.09 - - - 72 3 S7 fold branches 3 1 1 1 164 82.44 ± 0.95 218.51 - - 13 1 A1 User#clear_glob... 3 2 2 3 169 2.11 ± 0.04 - - - 24 3 A2 User#activate 2 (3) 1 4 2 170 8.95 ± 0.23 - - - 28 2 A3 User#unstage 3 (4) 1 5 2 164 50.02 ± 0.55 - - - 31 2

Discourse A4 User#check_site... 5 1 1 2 168 51.6 ± 0.23 - - - 28 3 A5 Discussion#build 1 4 4 1 167 0.24 ± 0.01 - - - 18 1 A6 User#disable_two... 1 10 10 1 164 0.25 ± 0.01 - 0.44 - 22 1 A7 Issue#close 1 (2) 3 3 1 166 0.77 ± 0.03 25.99 0.13 0.37 15 1 Gitlab A8 Issue#reopen 1 (3) 5 5 1 166 3.68 ± 0.1 - 0.55 45.66 17 1 A9 Pod#schedule_... 3 (4) 1 1 2 161 2.44 ± 0.04 - - - 19 2 A10 User#process_inv... 1 2 2 2 165 2.64 ± 0.05 0.81 - 0.85 12 1 A11 InvitationCode#use! 1 1 1 1 165 4.23 ± 0.06 - - - 12 1

Diaspora A12 User#confirm_email 7 4 4 2 166 7.28 ± 0.11 - - - 31 3 Table 1. Synthesis benchmarks and results. # Specs is the number of specs used to synthesize the method; Asserts reports the minimum and maximum number of assertions over all specs for every benchmark; # Orig Paths is the number of paths through the method as written in the app; # Lib Meth is the number of library methods used for every benchmark; Time shows the median and semi-interquartile range over 11 runs, followed by the median time for synthesis using only types, only effects and naive term enumeration (Neither). Meth Size is the number of AST nodes in the synthesized method; # Syn Paths shows the number of paths through the synthesized method.

A few apps have several different unit tests with exactly methods, such as those from ActiveRecord, and required the same setup but different assertions in the postcondition. app-specific methods. We slightly modify the set of library We merged any such group of tests into a single spec with methods for A9, as discussed further below. that setup and the union of the assertions as the postcondi- To find effect labels for app-specific methods, we found tion, to ensure that every spec setup can be distinguished examining the method name and quickly scanning its code with a unique branch condition, if necessary. We indicate this was typically quite helpful. Often it was clear if a method was in the # Specs column of Table1 by listing the final number of pure or impure. For impure methods, there were a few cases. specs followed by the original number of tests in parentheses Sometimes, methods access the same object fields irrespec- if they differ. We report the minimum and maximum number tive of how the method is called, so we give such methods the of assertions over all specs per benchmark in the Asserts most precise labels, e.g., we used effect InvitationCode.count columns and the number of paths through the method in the for benchmark A10. Other times, it is apparent the method true canonical solution (from the app) in the # Orig Paths accesses different fields of a class depending on the method’s column. arguments or the global state, so we give these class effect labels, e.g., User (equivalent to User.*). Overall, the sim- Annotations for Benchmarks. Finally, the # Lib Meth col- plicity of the effect system helped here, as we could use umn lists the number of library methods available during human-readable region identifiers even without any object synthesis. These are methods for which we provided type- references, e.g., the effect InvitationCode.count abstracts and-effect annotations. In total, 164 such methods are shared over all possible instances of InvitationCode class. across all benchmarks, including, e.g., ActiveRecord and core The other main category of effect labels was for Rails li- Ruby libraries. Since our benchmarks are sourced from full braries such as ActiveRecord. We constructed these labels apps, they often also depend on some other methods in the by following the documentation. For - app. We wrote type-and-effect annotations for such methods generated column accessor methods, we extended RDL’s and included those annotations only when synthesizing that existing type generating annotations [34] to also generate app. Since RbSyn needs to run the synthesized code, when effects. For example, when RDL creates the type signature running specs we include the code for both general-purpose for an accessor method Post#title for the title column of 10 Conference’17, July 2017, Washington, DC, USA the Post table, it now also creates a read effect annotation 19 TE Enabled Post.title for it. 16 E Only Overall, we found writing effect annotations to be eas- T Only ier than our previous efforts writing type annotations for 13 TE Disabled Ruby [26, 34], though of course we relied on that previous Synthetic experience. We leave a systematic evaluation of the effort of 10 Apps writing effect annotations to future work. 7 # of benchmarks

5.2 Synthesis Correctness and Performance 4 RbSyn successfully synthesized methods that pass the specs 1 for every benchmark. We manually examined the output and 0 25 50 75 100 125 150 175 200 found that the synthesized code is equivalent to the original, Time (s) human-written code, modulo minor differences that do not change the code’s behavior in practice. For example, one such Figure 7. Number of benchmarks synthesized using type- difference occurs with original code that updates multiple and-effect (TE Enabled) guided synthesis relative to using database columns with a single ActiveRecord call, and then only type (T Only) or effectE ( Only) guidance separately and has a sequence of asserts to check that each updated column naive enumeration (TE Disabled). Higher is better. is correct. Because RbSyn considers the effects of assertions in the postcondition one by one, it instead synthesizes a sequence of database updates, one per column. Another dif- because the remainder of the assertion looks at only one par- ference occurs in Gitlab, which uses the state_machine gem ticular field—but that one read is subsumed by the effectof (an external package) to maintain an issue’s state (closed, the reload, making it invisible to RbSyn’s search. As a result, reopened, etc). RbSyn synthesizes correct implementations synthesis for A9 slows down by two orders of magnitude. that work without the gem. We addressed this by removing four ActiveRecord methods The middle group of columns in Table1 summarizes Rb- that manipulate specific fields and adding ActiveRecord’s Syn’s running time. We set a timeout of 300 seconds on all update! method as the only way to write a field back to the experiments. The first column reports performance num- database. An alternative approach would have been to move bers for the full system as the median and semi-interquar- the reload call to be outside the assertion. tile range (SIQR) of 11 runs on a 2016 Macbook Pro with a As this example shows, and as is common with many syn- 2.7GHz Intel Core i7 processor and 16GB RAM. The next thesis problems, performance is very hard to predict. Indeed, three columns show the median performance when RbSyn we can see from Table1 that performance is generally not uses only type-guidance, only effect-guidance, and naive well correlated with either the size of the output program enumeration, respectively. The SIQRs (omitted due to space or with the number of branches. The number of assertions constraints) for these runs are very small compared to the (which direct the side effect guided synthesis) does not cor- median runtime, similar to the performance numbers with all relate with the synthesis time. We do observe that RbSyn’s features enabled. We discuss the runs with certain guidance branch merging strategy is effective, often producing fewer disabled in detail in § 5.3. The right-most group of columns conditionals than there are specs, e.g., in A12 there are seven shows the synthesized method size (in terms of number of specs but only three conditionals. Though, sometimes the AST nodes) and the number of paths through the method (1 results are not always optimal if the branch merging strat- for straight-line code). egy finds a program that passes all tests, but a program with Overall, RbSyn runs quickly, with around 80% of bench- fewer branches exists, e.g., for A4 and A12, RbSyn produces marks solving in less than 9s. Benchmarks like A3 take longer a program with one more branch than the hand-written. because it requires synthesis of nil terms—recall nil is the bottom element of our type lattice, causing RbSyn to synthe- size nil at every typed hole for method arguments. Conse- 5.3 Performance of Type- and Effect-Guidance quently, this requires testing all completed candidates—even Next, we explore the performance benefits of type- and effect- though they eventually fail—consuming significant time. guidance. Figure7 plots the running times from Table1 For one benchmark, A9, we changed the set of default when all features of RbSyn are enabled (TE Enabled), with library methods slightly due to some pathological behavior. only type-guidance (T Only), with only effect-guidance (E This benchmark includes an assertion that invokes ActiveRe- Only) and with neither (TE Disabled). The plot shows the cord’s reload method, which touches all fields of that record. number of benchmarks that complete (푦-axis) in a given But then when RbSyn tries to find matching write effects, amount of time (푥-axis), based on the median running times. it explores a combinatorial explosion of writes to different This experiment serves as a proxy to show how a synthesis subsets of the fields. This effort is almost entirely wasted, procedure that uses type-guidance but not effect-guidance, 11 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn

300 From these experiments, we see that synthesis time in- 290 creases as effect annotation precision decreases, often lead-

80 Precise Effects ing to a timeout. Class labels were sufficient to synthesize 16 Class Effects of 19 benchmarks. Overall, class labels take time similar to Purity Effects 60 precise labels, except for the three cases (A8, A11, and A3) 40 where side-effecting method calls require precise labels to Time (s) quickly find the candidate. As all precise effects are reduced 20 to class effects, RbSyn must try many candidates with class 0 effect before finding the correct one, leading to timeouts. S2 S1 S4 S3 S5 A9 A6 A1 S7 A5 A10 A7 A2 A12 S6 A4 A8 A11 A3 Benchmarks We note that A1 and A4 are slightly faster when using class effects. The reason is an implementation detail. The Figure 8. Performance of RbSyn with varying effect annota- effect holes in these benchmarks can only be correctly filled tion precision: full, class effects only, and purity annotations by methods whose regular annotations are class annotations on library methods. Lower is better. Full height indicates (more precise annotations are not possible). However, when timeout. trying to fill holes, RbSyn first tries all methods with precise annotations, only afterward trying methods with class an- notations. Since the precise annotations never match, this such as SyPet [12] or Myth [16, 29], may have performed if yields worse performance under the precise effect condition adapted for Ruby. than under the class effect condition, when the search could We can clearly see that type- and effect-guided synthesis by chance find the matching methods sooner. performs best, successfully synthesizing all benchmarks; the Purity labels only enabled synthesis of 9 benchmarks, in- slowest takes 83s. In contrast, with both strategies disabled, cluding just 3 of 12 app benchmarks. The purity annotations all but three small benchmarks time out. Performance with are slow in general and only effective in the cases where the only type- or only effect-guidance lies in between. With only number of impure library methods is small. type-guidance, synthesis completes on eight benchmarks, of which the majority are pure methods from the synthetic benchmarks. From apps, it only synthesize A7 and A10. In 6 Related Work these benchmarks, the needed effectful expressions are small Component-Based Synthesis. Several researchers have pro- and hence can be found with essentially brute-force search. posed component-based synthesis, which creates code by With only effect-guidance, synthesis performance signifi- composing calls to existing APIs, as RbSyn does. For example, cantly worse, completing only five benchmarks, of which Jha et al.[25] propose synthesis of loop-free programs for only three are from apps. These benchmarks succeeded be- bit-vector manipulation. Their approach uses formal spec- cause effect-guided synthesis quickly generates the template ifications for synthesis, in contrast to RbSyn, which uses for the effectful method calls and then correctly fills them unit tests. Hoogle+ [24] uses Haskell tests and types to syn- since they are small and can be found quickly by naive enu- thesize potential solutions, primarily geared towards API meration. discovery. CodeHint [17] synthesizes Java programs, using a probabilistic model to guide the search towards expres- 5.4 Effect Annotation Precision vs. Performance sions more often used in practice. SyPet [12] also synthe- Finally, we explore the tradeoff between effect annotation sizes programs that use Java APIs, by modeling them as a precision and synthesis performance. Recall that we found petri net and using SAT-based techniques to find a solution. writing effect annotations easier for our benchmarks than These approaches do not support synthesis of programs with writing type annotations. However, the effort can be further branches, which are common in the domain of web apps. minimized by writing less precise annotations. This will While SyPet supports synthesis with side-effecting methods not affect correctness, since RbSyn only accepts synthesis and CodeHint detects undesirable side effects during the candidates that pass all specs, but it does affect performance. search and avoids them, RbSyn uses side effect information Figure8 plots the median of synthesis times for bench- from test cases to guide the search. marks over 11 runs under three conditions: Precise Effects, which are the effects used above; Class Effects, in which anno- Programming by Example. Myth [16, 29] uses bidirec- tations include only class names and eliminate region labels tional type checking to synthesize programs, using input/out- (e.g., Post.title becomes Post); and Purity Effects, in which put examples as the specification. However, Myth expects ex- the only effect annotations are pure or impure (the • and amples to be trace complete, meaning the user has to provide ∗ effects, respectively, in our formalism). The benchmarks input/output examples for any recursive calls on the function (x-axis) are ordered in increasing order of time for Purity arguments. RbSyn does not synthesize recursive functions, Effects, then Class Effects, and finally Precise Effects. as they are rarely needed in our target domain of Ruby web 12 Conference’17, July 2017, Washington, DC, USA apps. Escher [1] and spreadsheet manipulation tools [20– has the additional cost of requiring formal specifications of 22] all accept input/output examples as a partial specification library method semantics, an impractical task in the Rails set- for synthesis. These tools primarily target users who cannot ting. SuSLik [32] synthesizes heap-manipulating programs program, whereas RbSyn is targeted towards programmers. using separation logic to precisely model the the heap. Rb- In addition, RbSyn’s specs are full unit tests, so they can Syn, in contrast, uses very coarse effects to track accesses check both return values and side effects. 휆2 [13] synthesizes that can go beyond the heap, such as database reads and data structure transformations using higher-order functions, writes. a feature not handled by RbSyn because of our target do- main of Rails web apps, which rarely use such functions. 7 Conclusion STUN [2] uses a program merging strategy that is similar to ours, but it depends on defining domain-specific unification We presented RbSyn, a system for type- and effect-guided operators to safely combine programs under branches. In program synthesis for Ruby. In RbSyn, the synthesis goal is contrast, our approach may be more domain-independent, described by the target method type and a series of specs using preconditions and tests to find correct branch condi- comprising preconditions followed by postconditions that tions. There have been multiple approaches to synthesizing use assertions. The user also supplies the set of constants database programs [7, 11]. Perhaps the closest in purpose to the synthesized method can use, and type-and-effect an- RbSyn is Scythe [38], which synthesizes SQL queries based notations for any library methods it can call. RbSyn then 휏 on input/output examples. Scythe uses a two-phased syn- searches for a solution starting from a hole □ : typed with 휖 thesis process to synthesize an abstract query, after which the method’s return type, inserting (write) effect holes ^ : enumeration is used to concretize the abstract query. In con- derived from the read effects of failing assertions. Finally, trast, the use of comp types [26] allows RbSyn to quickly RbSyn merges together solutions for individual specs by construct a template for a database query. With precise types synthesizing branch conditions to select among the different for the method argument holes, this essentially builds ab- solutions as needed. We evaluated RbSyn by running it on a stract queries for free, whose holes are then filled later during suite of 19 benchmarks, 12 of which are representative pro- synthesis. grams from popular open-source Ruby on Rails apps. RbSyn synthesized correct solutions to all benchmarks, completing Solver-Aided Synthesis. In solver-aided synthesis, synthe- synthesis of 15 of the 19 benchmarks in under 9s, with the sis specifications are transformed to a set of constraints for slowest benchmark solving in 83s. We believe RbSyn demon- a SAT or SMT solver. Synqid [31] uses polymorphic refine- strates a promising new approach to synthesizing effectful ment types as the specification for synthesis. Lifty [33] is programs. a similar type system that verifies information flow control policies and synthesizes program repairs as needed to satisfy Acknowledgments the policies. Both Synqid and Lifty synthesize condition- Thanks to the anonymous reviewers for their helpful com- als using logical abduction. In contrast, RbSyn uses branch ments. This research was supported in part by NSF CCF- merging to synthesize conditionals, since translating Rails 1900563, and NSF CCF-1846350. code and libraries into logical formulas is impractical. Sketch [35] allows users to write partial programs, called sketches, where the omitted parts are then synthesized by References the tool. Migrator [39] uses conflict-driven learning [10] to [1] Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recur- synthesize raw SQL queries, for use in database programs for sive program synthesis. In International conference on computer aided schema refactoring. In contrast, programs synthesized by Rb- verification. Springer, 934–950. [2] Rajeev Alur, Pavol Černy,` and Arjun Radhakrishna. 2015. Synthesis Syn use ActiveRecord to access the database. Rosette [36, 37] through unification. In International Conference on Computer Aided is a solver-aided language that provides access to verifica- Verification. Springer, 163–179. tion and synthesis. It relies on symbolic execution, and thus [3] Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. 2017. Scaling requires significant modeling of external libraries for syn- Enumerative Program Synthesis via Divide and Conquer. In Tools thesizing programs that use such libraries. and for the Construction and Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of the European Joint eusolver [3] synthesizes programs with branches, us- Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, ing an information-gain heuristic via decision tree learning. Sweden, April 22-29, 2017, Proceedings, Part I (Lecture Notes in Computer While, the decision tree learning procedure can produce Science), Vol. 10205. 319–336. branches in an enumerative search setting (provided the in- [4] Robert L Bocchino Jr, Vikram S Adve, Danny Dig, Sarita V Adve, put/output example set is complete), we leave an exploration Stephen Heumann, Rakesh Komuravelli, Jeffrey Overbey, Patrick Sim- mons, Hyojin Sung, and Mohsen Vakilian. 2009. A type and effect of how it compares to our rule-based merging to future work. system for deterministic parallel Java. In Proceedings of the 24th ACM However, eusolver requires a SMT solver to produce coun- SIGPLAN conference on Object oriented programming systems languages terexamples to build the input/output example set which and applications. 97–116. 13 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn

[5] Alexander Borgida, John Mylopoulos, and Raymond Reiter. 1995. On SIGPLAN-SIGACT Symposium on Principles of Programming Languages the frame problem in procedure specifications. IEEE Transactions on 46, 1 (2011), 317–330. Software Engineering 21, 10 (1995), 785–798. [21] Sumit Gulwani, William R Harris, and Rishabh Singh. 2012. Spread- [6] Jonathan Immanuel Brachthäuser, Philipp Schuster, and Klaus Oster- sheet data manipulation using examples. Commun. ACM 55, 8 (2012), mann. 2020. Effects as capabilities: effect handlers and lightweight 97–105. effect polymorphism. Proc. ACM Program. Lang. 4, OOPSLA (2020), [22] William R Harris and Sumit Gulwani. 2011. Spreadsheet table trans- 126:1–126:30. formations from examples. Proceedings of the 32nd ACM SIGPLAN [7] Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2013. Op- Conference on Programming Language Design and Implementation 46, timizing database-backed applications with query synthesis. In ACM 6 (2011), 317–328. SIGPLAN Conference on Programming Language Design and Implemen- [23] Civilized Discourse Construction Kit Inc. 2020. Discourse: A platform tation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, Vol. 48. ACM New for community discussion. https://github.com/discourse/discourse. York, NY, USA, 3–14. [24] Michael B James, Zheng Guo, Ziteng Wang, Shivani Doshi, Hila Peleg, [8] Aaron Craig, Alex Potanin, Lindsay Groves, and Jonathan Aldrich. 2018. Ranjit Jhala, and Nadia Polikarpova. 2020. Digging for fold: synthesis- Capabilities: Effects for Free. In Formal Methods and Software Engi- aided API discovery for Haskell. Proceedings of the ACM on Program- neering - 20th International Conference on Formal Engineering Methods, ming Languages 4, OOPSLA (2020), 1–27. ICFEM 2018, Gold Coast, QLD, Australia, November 12-16, 2018, Proceed- [25] Susmit Jha, Sumit Gulwani, Sanjit A Seshia, and Ashish Tiwari. 2010. ings (Lecture Notes in ), Jing Sun and Meng Sun (Eds.), Oracle-guided component-based program synthesis. In 2010 ACM/IEEE Vol. 11232. Springer, 231–247. 32nd International Conference on Software Engineering, Vol. 1. IEEE, [9] Diaspora Inc. 2020. Diaspora: A privacy-aware, distributed, open 215–224. source social network. https://github.com/diaspora/diaspora. [26] Milod Kazerounian, Sankha Narayan Guria, Niki Vazou, Jeffrey S Foster, [10] Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Pro- and David Van Horn. 2019. Type-level computations for Ruby libraries. gram synthesis using conflict-driven learning. In Proceedings of the In Proceedings of the 40th ACM SIGPLAN Conference on Programming 39th ACM SIGPLAN Conference on Programming Language Design and Language Design and Implementation. 966–979. Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018, [27] Bertrand Meyer. 2015. Framing the Frame Problem. In Dependable Vol. 53. ACM New York, NY, USA, 420–435. Software Systems Engineering. Vol. 40. IOS Press, 193–203. [11] Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat [28] Jaideep Nijjar and Tevfik Bultan. 2011. Bounded verification of Rubyon Chaudhuri. 2017. Component-based synthesis of table consolidation Rails data models. In Proceedings of the 2011 International Symposium and transformation tasks from examples. In Proceedings of the 38th on Software Testing and Analysis. 67–77. ACM SIGPLAN Conference on Programming Language Design and Im- [29] Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-example- plementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, Vol. 52. directed program synthesis. Proceedings of the 36th ACM SIGPLAN ACM New York, NY, USA, 422–436. Conference on Programming Language Design and Implementation 50, [12] Yu Feng, Ruben Martins, Yuepeng Wang, Isil Dillig, and Thomas W 6 (2015), 619–630. Reps. 2017. Component-based synthesis for complex APIs. In Proceed- [30] Daniel Perelman, Sumit Gulwani, Dan Grossman, and Peter Provost. ings of the 44th ACM SIGPLAN Symposium on Principles of Programming 2014. Test-driven synthesis. Proceedings of the 35th ACM SIGPLAN Languages. 599–612. Conference on Programming Language Design and Implementation 49, [13] John K Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data 6, 408–418. structure transformations from input-output examples. Proceedings of [31] Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Pro- the 36th ACM SIGPLAN Conference on Programming Language Design gram synthesis from polymorphic refinement types. Proceedings of and Implementation 50, 6 (2015), 229–239. the 37th ACM SIGPLAN Conference on Programming Language Design [14] Jean-Christophe Filliâtre and Andrei Paskevich. 2013. Why3—where and Implementation 51, 6, 522–538. programs meet provers. In European symposium on programming. [32] Nadia Polikarpova and Ilya Sergey. 2019. Structuring the synthesis of Springer, 125–128. heap-manipulating programs. Proceedings of the ACM on Programming [15] Jeffrey Foster, Brianna Ren, Stephen Strickland, Alexander Yu, Milod Languages 3, POPL (2019), 1–30. Kazerounian, and Sankha Narayan Guria. 2020. RDL: Types, type [33] Nadia Polikarpova, Deian Stefan, Jean Yang, Shachar Itzhaky, Travis checking, and contracts for Ruby. https://github.com/tupl-tufts/rdl. Hance, and Armando Solar-Lezama. 2020. Liquid information flow [16] Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve control. Proc. ACM Program. Lang. 4, ICFP (2020), 105:1–105:30. Zdancewic. 2016. Example-directed synthesis: a type-theoretic in- [34] Brianna M Ren and Jeffrey S Foster. 2016. Just-in-time static type check- terpretation. Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT ing for dynamic languages. In Proceedings of the 37th ACM SIGPLAN Symposium on Principles of Programming Languages 51, 1 (2016), 802– Conference on Programming Language Design and Implementation. 462– 815. 476. [17] Joel Galenson, Philip Reames, Rastislav Bodik, Björn Hartmann, and [35] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, Koushik Sen. 2014. Codehint: Dynamic and interactive synthesis of and Vijay Saraswat. 2006. Combinatorial sketching for finite programs. code snippets. In Proceedings of the 36th International Conference on In Proceedings of the 12th international conference on Architectural Software Engineering. 653–663. support for programming languages and operating systems. 404–415. [18] GitLab B.V. 2020. GitLab is an open source end-to-end software devel- [36] Emina Torlak and Rastislav Bodik. 2013. Growing solver-aided lan- opment platform with built-in version control, issue tracking, code guages with rosette. In Proceedings of the 2013 ACM international sym- review, CI/CD, and more. https://gitlab.com/gitlab-org/gitlab. posium on New ideas, new paradigms, and reflections on programming [19] Colin S. Gordon. 2020. Designing with Static Capabilities and Effects: & software. 135–152. Use, Mention, and Invariants (Pearl). In 34th European Conference [37] Emina Torlak and Rastislav Bodik. 2014. A lightweight symbolic on Object-Oriented Programming, ECOOP 2020, November 15-17, 2020, virtual machine for solver-aided host languages. Proceedings of the Berlin, Germany (Virtual Conference) (LIPIcs), Robert Hirschfeld and 35th ACM SIGPLAN Conference on Programming Language Design and Tobias Pape (Eds.), Vol. 166. Schloss Dagstuhl - Leibniz-Zentrum für Implementation 49, 6 (2014), 530–541. Informatik, 10:1–10:25. [38] Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthe- [20] Sumit Gulwani. 2011. Automating string processing in spreadsheets sizing highly expressive SQL queries from input-output examples. In using input-output examples. Proceedings of the 38th Annual ACM 14 Conference’17, July 2017, Washington, DC, USA

Proceedings of the 38th ACM SIGPLAN Conference on Programming 40th ACM SIGPLAN Conference on Programming Language Design and Language Design and Implementation. 452–466. Implementation. 286–300. [39] Yuepeng Wang, James Dong, Rushi Shah, and Isil Dillig. 2019. Synthe- sizing database programs for schema refactoring. In Proceedings of the

15 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn

Algorithm 2 Synthesis of programs that passes a spec 푠 false-y value, it results in an error, in which case it returns the collected side effects with the error ( E-AssertFail). Eval- 1: procedure Generate(휏1 → 휏2, 퐶푇 , Σ, s, maxSize) uation of a library method, gives a union of its effects with 2: Γ ← [푥 ↦→ 휏1] the already collected effects (E-MethCall). During the eval- 3: 푒0 ← □ : 휏2 uation of a sequence 푄;푄, if the first assertion evaluates toa 4: workList ← [(0, 푒0)] 5: while workList is not empty do value, the evaluation continues discarding all collected effects (E-SeqVal). If the evaluation of an assert yields an error, the 6: (푐, 푒푏) ← pop(workList) evaluation of postcondition terminates with the error as final 7: 휔휏 ← {(푐, 푒푡 ) | Σ, Γ ⊢ 푒푏 ⇝ 푒푡 : 휏} result (E-SeqErr). We define the big step semantics as fol- 8: 휔 ← {(푐, 푒푡 ) ∈ 휔휏 ∧ evaluable(푒푡 )} 푒푣푎푙 푄 ⇓ R ∃퐸′, 푐, ⟨휖 , 휖 ⟩. [푥 → 푣], , ⟨•, •⟩, 푄 →∗ 9: 휔 ← 휔 − 휔 lows: if 푟 푤 푟 0 ↩ 퐶푇 푟 휏 푒푣푎푙 ′ 퐸 , 푐, ⟨휖푟 , 휖푤⟩, R , in other words,J evaluating the postcon-K 10: for all (푐, 푒푡 ) ∈ 휔푒푣푎푙 do Jdition in an environmentK containing the return value of 11: 푐푟 , 푣푟 ← EvalProgram(푒푡, 푠) synthesis goal 푥푟 , will evaluate to a result. 12: if 푣푟 = err(휖푟 , 휖푤) then 13: 휔 ← 휔 ∪ {(푐, 푒 ) | Σ, Γ, 휖 ⊢ 푒 푒 } 푟 푟 푓 푟 푡 ↠ 푓 A.2 Type-Guided Synthesis 14: else Figure 11 shows all the type checking and type-directed 15: return 푒푡 16: end if synthesis rules. The repeated rules are same as §3. T-Nil type nil 17: end for checks the value and assigns it the type Nil. The rules T-True, T-False, T-Obj and T-Err do the same for the values 18: 휔푟 ← {(푐, 푒푏) ∈ 휔푟 ∧ size(푒푏 ) ≤ maxSize} true, false, [퐴] and err(휖푟 , 휖푤) respectively. Similarly T- 19: workList ← reorder(workList + 휔푟 ) 20: end while NegB and T-OrB give the rules to type check conditionals 21: return Error: No solution found that contain negation or disjunction. T-EffHole typechecks 22: end procedure an effect hole with the Obj type. It can be narrowed to a more 23: precise type when synthesis rules are applied. T-Seq does 24: procedure EvalProgram(푒, ⟨푆, 푄⟩) type checking and synthesis for sequences and T-App type 25: 푃 ← def 푚(푥) = 푒, 퐸 ← [] checks or synthesizes terms in the receiver and arguments 푐, 퐸, , , , 푃 푆 푄 ∗ 퐸′, 푐, 휖 , 휖 , of the method call. T-If type checks if-else expressions. 26: ( R) ← 0 ⟨• •⟩ ; ; ↩→퐶푇 ⟨ 푟 푤⟩ R J K J K 27: return (푐, R) A.3 Algorithm 28: end procedure Algorithm2 describes the synthesis of candidates that pass a single spec. The algorithm uses a work list, which ini- Errors E ::= err(휖푟 , 휖푤) tially contains a tuple with the number of passed assertions Results R ::= 푣 | E starting at 0 initially and the initial hole of type 휏2. The first tuple is popped off the work list and applies typeor effect guided synthesis rules, the ⇝ relation, to a base ex- Figure 9. Extended 휆푠푦푛. pression 푒푏 to build the set of new expressions 휔휏 . Next, A Appendix expressions with no holes are filtered into a list of evaluable expressions 휔 . Then, EvalProgram is called to make a A.1 Evaluation Rules 푒푣푎푙 program def 푚(푥) = 푒푡 and evaluate spec 푠 in an environ- We extend 휆푠푦푛 to include errors E. Errors can originate from ment 퐸, starting from a passed assertion count of 0. the evaluation of an assertion assert 푒 and encapsulates the If the evaluation of the postcondition results in an error 휖 휖 푒 read effect 푟 and write effect 푤 inferred from . Results R err(휖푟 , 휖푤), the algorithm proceeds to introduce an effect of an evaluation can either be a value or an error. Collecting hole, using the relation ↠ to build the set 휔휖 . If a program effects while evaluating the postcondition of tests require passes all the assertions then it means a correct solution has special evaluation rules. Figure 10 shows selected rules of been found so, Generate returns it. Finally, the algorithm the small step operational semantics for only postconditions collects all the remainder expressions 휔푟 with holes and (rest omitted as they are standard rules). The rules prove judg- filters the programs that exceed the maximum permissible 퐸, 푐, 휖 , 휖 , 푄 퐸′, 푐 ′, 휖 ′, 휖 ′ , 푄 ′ ments of the form ⟨ 푟 푤⟩ ↩→퐶푇 ⟨ 푟 푤⟩ size maxSize. This bounds the search to particular search that reduce configurationsJ thatK containJ a dynamic environ-K space size. It then takes the filtered programs and programs ment 퐸, counter of passed assertions 푐, the pair of read and from the remainder of the work list and reorders them. The write effects ⟨휖푟 , 휖푤⟩ collected during evaluation, and post- programs are sorted by the number of passed assertions 푐 condition 푄 under evaluation. Rule E-AssertPass applies in the decreasing order and then by the program size in when assertion evaluates to a truthy-y value. It also incre- the increasing order. This assumes that a program that is ments the counter for passed assertions. If 푒 evaluates to a 16 Conference’17, July 2017, Washington, DC, USA

퐸, 푐, 휖 , 휖 , 푄 퐸′, 푐 ′, 휖 ′, 휖 ′ , 푄 ′ ⟨ 푟 푤⟩ ↩→퐶푇 ⟨ 푟 푤⟩ J K J K 푣 ∈ {true, [퐴]} E-AssertPass 퐸, 푐, ⟨휖푟 , 휖푤⟩, assert 푣 ↩→퐶푇 퐸, 푐 + 1, ⟨휖푟 , 휖푤⟩, 푣 J K J K 푣 ∈ {false, nil} E-AssertFail 퐸, 푐, ⟨휖푟 , 휖푤⟩, assert 푣 ↩→퐶푇 퐸, 푐, ⟨휖푟 , 휖푤⟩, err(휖푟 , 휖푤) J K J K 퐸, 푐, 휖 , 휖 , 푒 퐸′, 푐, 휖 ′, 휖 ′ , 푒 ′ ⟨ 푟 푤⟩ ↩→퐶푇 ⟨ 푟 푤⟩ E-AssertStep 퐸, 푐, 휖 J, 휖 , 푒K J퐸′, 푐, 휖 ′, 휖 ′ , K 푒 ′ ⟨ 푟 푤⟩ assert ↩→퐶푇 ⟨ 푟 푤⟩ assert J K J K

⟨휖푟 ,휖푤 ⟩ type_of(푣푟 , 푣푎) = (퐴푟 , 퐴푎) 푚 : 휏푎 −−−−−−→ 휏 ∈ 퐶푇 (퐴) 퐴푟 ≤ 퐴 퐴푎 ≤ 휏푎 call(퐴.푚, 푣푟 , 푣푎) = 푣 E-MethCall 퐸, 푐, 휖 ′, 휖 ′ , 푣 .푚 푣 퐸, 푐, 휖 ′, 휖 ′ 휖 , 휖 , 푣 ⟨ 푟 푤⟩ 푟 ( 푎) ↩→퐶푇 ⟨ 푟 푤⟩ ∪ ⟨ 푟 푤⟩ J K J K 퐸, 푐, 휖 , 휖 , 푄 퐸, 푐, 휖 ′, 휖 ′ , 푄 ′ ⟨ 푟 푤⟩ 1 ↩→퐶푇 ⟨ 푟 푤⟩ 1 J K J K E-SeqStep E-SeqVal 퐸, 푐, 휖 , 휖 , 푄 푄 퐸, 푐, 휖 ′, 휖 ′ , 푄 ′ 푄 퐸, 푐, 휖 , 휖 , 푣 푄 퐸, 푐, , , 푄 ⟨ 푟 푤⟩ 1; 2 ↩→퐶푇 ⟨ 푟 푤⟩ 1; 2 ⟨ 푟 푤⟩ ; 2 ↩→퐶푇 ⟨• •⟩ 2 J K J K J K J K E-SeqErr 퐸, 푐, ⟨휖푟 , 휖푤⟩, err(휖푟 , 휖푤);푄2 ↩→퐶푇 퐸, 푐, ⟨휖푟 , 휖푤⟩, err(휖푟 , 휖푤) J K J K Figure 10. Selected rules for operational semantics of the postcondition 푄

Σ, Γ ⊢퐶푇 푒 ⇝ 푒 : 휏 T-Nil T-True Σ, Γ ⊢퐶푇 nil ⇝ nil : Nil Σ, Γ ⊢퐶푇 true ⇝ true : Bool

T-False T-Obj Σ, Γ ⊢퐶푇 false ⇝ false : Bool Σ, Γ ⊢퐶푇 [퐴] ⇝ [퐴] : 퐴

′ Γ(푥) = 휏 Σ, Γ ⊢퐶푇 푏 ⇝ 푏 : Bool T-Err , 푥 푥 휏 T-Var ′ T-NegB Σ, Γ ⊢퐶푇 err(휖푟 , 휖푤) ⇝ err(휖푟 , 휖푤) : Err Σ Γ ⊢퐶푇 ⇝ : Σ, Γ ⊢퐶푇 !푏 ⇝!푏 : Bool , 푏 푏′ Σ Γ ⊢퐶푇 1 ⇝ 1 : Bool , 푏 푏′ Σ Γ ⊢퐶푇 2 ⇝ 2 : Bool T-OrB T-Hole , 푏 푏 푏′ 푏′ , ⊢ 휏 휏 Σ Γ ⊢퐶푇 1 ∨ 2 ⇝ 1 ∨ 2 : Bool Σ Γ 퐶푇 □ : ⇝ □ :

푣 : 휏1 ∈ Σ 휏1 ≤ 휏2 Γ(푥) = 휏1 휏1 ≤ 휏2 T-EffHole , 휏 푣 휏 S-Const , 휏 푥 휏 S-Var Σ, Γ ⊢퐶푇 ^ : 휖 ⇝ (^ : 휖) : Obj Σ Γ ⊢퐶푇 □ : 2 ⇝ : 1 Σ Γ ⊢퐶푇 □ : 2 ⇝ : 1 , 푒 푒 ′ 휏 , 푒 푒 ′ 휏 푚 : 휏1 → 휏2 ∈ 퐶푇 (퐴) 휏2 ≤ 휏3 Σ Γ ⊢퐶푇 1 ⇝ 1 : 1 Σ Γ ⊢퐶푇 2 ⇝ 2 : 2 , 휏 퐴 .푚 휏 휏 S-App , 푒 푒 푒 ′ 푒 ′ 휏 T-Seq Σ Γ ⊢퐶푇 □ : 3 ⇝ (□ : ) (□ : 1) : 2 Σ Γ ⊢퐶푇 1; 2 ⇝ 1; 2 : 2 ′ ′ , 푒 푒 ′ 휏 Σ, Γ ⊢퐶푇 푒1 ⇝ 푒 : 휏 Σ, Γ ⊢퐶푇 푒2 ⇝ 푒 : 휏3 Σ Γ ⊢퐶푇 1 ⇝ 1 : 1 1 2 , 푥 휏 푒 푒 ′ 휏 푚 : 휏1 → 휏2 ∈ 퐶푇 (퐴) 휏 ≤ 퐴 휏3 ≤ 휏1 Σ Γ[ ↦→ 1] ⊢퐶푇 2 ⇝ 2 : 2 , 푒 .푚 푒 푒 ′.푚 푒 ′ 휏 T-App , 푥 푒 푒 푥 푒 ′ 푒 ′ 휏 T-Let Σ Γ ⊢퐶푇 1 ( 2) ⇝ 1 ( 2) : 2 Σ Γ ⊢퐶푇 let = 1 in 2 ⇝ let = 1 in 2 : 2

′ Σ, Γ ⊢퐶푇 푏 ⇝ 푏 : Bool , 푒 푒 ′ 휏 , 푒 푒 ′ 휏 Σ Γ ⊢퐶푇 1 ⇝ 1 : 1 Σ Γ ⊢퐶푇 2 ⇝ 2 : 2 , 푏 푒 푒 푏′ 푒 ′ 푒 ′ 휏 휏 T-If Σ Γ ⊢퐶푇 if then 1 else 2 ⇝ if then 1 else 2 : 1 ∪ 2

Figure 11. Type checking and type-directed synthesis rules

17 Conference’17, July 2017, Washington, DC, USA Sankha Narayan Guria, Jeffrey S. Foster, and David Van Horn

evaluable 푒1; 푒2 = evaluable 푒1 ∧ evaluable 푒2 evaluable 푒1.푚(푒2) = evaluable 푒1 ∧ evaluable 푒2 evaluable □ : 휏 = false evaluable let 푥 = 푒1 in 푒2 = evaluable 푒1 ∧ evaluable 푒2 evaluable if 푏 then 푒1 else 푒2 = evaluable 푏 ∧ evaluable 푒1 ∧ evaluable 푒2 evaluable !푏 = evaluable 푏 evaluable 푏1 ∨ 푏2 = evaluable 푏1 ∧ evaluable 푏2 evaluable _ = true

size 푒1; 푒2 = size 푒1 + size 푒2 size 푒1.푚(푒2) = size 푒1 + size 푒2 + 1 size let 푥 = 푒1 in 푒2 = size 푒1 + size 푒2 size if 푏 then 푒1 else 푒2 = size 푏 + size 푒1 + size 푒2 size !푏 = size 푏 size 푏1 ∨ 푏2 = size 푏1 + size 푏2 size _ = 0

Figure 12. Helper functions used by RbSyn

⟨푒1,푏1, Ψ1⟩ ⊕ ⟨푒2,푏2, Ψ2⟩ = ⟨푏1,푏1 ∨ 푏2, Ψ1 ∪ Ψ2⟩ a correct program in that search space it will return an error (4) for the same. if 푒 ≡ true, 푒 ≡ false and 푏 ≡ !푏 1 2 1 2 Figure 12 shows the formal definitions of hasHole, and 푒 ,푏 , 푒 ,푏 , 푏 ,푏 푏 , ⟨ 1 1 Ψ1⟩ ⊕ ⟨ 2 2 Ψ2⟩ = ⟨ 2 1 ∨ 2 Ψ1 ∪ Ψ2⟩ size functions. (5) if 푒1 ≡ false, 푒2 ≡ true and 푏1 ≡ !푏2 A.4 Branch pruning rules ⟨푒1,푏1, Ψ1⟩ ⊕ ⟨푒2,푏2, Ψ2⟩ = ⟨푒1,푏1, Ψ1⟩ ⊕ ⟨푒2,푏푔, Ψ2⟩ if 푏푔 ≡ !푏1 Figure 13 formally describes the rules that allows RbSyn to and ∀⟨푆 , 푄 ⟩ ∈ Ψ .def 푚(푥) = 푏 ⊢ 푆 ; assert 푥 ⇓ 푣 푖 푖 2 푔 푖 푟 do term rewriting in 휆 for branch pruning. These are use- (6) 푠푦푛 ful particularly for reducing boolean programs. Rules4 and5 푒 ,푏 , 푒 ,푏 , 푒 ,푏 , 푒 ,푏 , 푏 푏 ⟨ 1 1 Ψ1⟩ ⊕ ⟨ 2 2 Ψ2⟩ = ⟨ 1 푔 Ψ1⟩ ⊕ ⟨ 2 2 Ψ2⟩ if 푔 ≡ ! 2 allows us to rewrite expressions into their branch condition and ∀⟨푆푖, 푄푖 ⟩ ∈ Ψ1.def 푚(푥) = 푏푔 ⊢ 푆푖 ; assert !푥푟 ⇓ 푣 if the expression body is true or false reducing expressions (7) like if 푏 then true else false to 푏. Rules6 and7 guess a conditional that is the negation of the other, if the negation holds for the tests. Any ⊕ term where two branch condi- Figure 13. Branch pruning rules. tions are negations of each other reflect a shorter program if 푏1 then 푒1 else 푒2 in 휆푠푦푛. If it is a boolean program, then more likely correct and smaller will be selected earlier for it might even enable application of rules4 and5 producing processing in the work list. Lastly, if Generate doesn’t find a single line program.

18