Constraint-Based Type-Directed Program Synthesis

Peter-Michael Osera Department of Grinnell College United States of America

Abstract

We explore an approach to type-directed program synthesis rooted in constraint-based type inference techniques. By doing this, we aim to more efficiently synthesize polymorphic code while also tackling advanced typing features such as GADTs that build upon polymorphism. Along the way, we also present an implementation of these techniques in Scythe, a prototype live, type-directed programming tool for the Haskell programming language, and reflect on our initial experience with the tool.

CCS Concepts • Software and its engineering → Semantics; • Theory of computation → Logic and verification.

Keywords Program Synthesis, Type Inference

ACM Reference Format:
Peter-Michael Osera. 2019. Constraint-Based Type-Directed Program Synthesis. In Proceedings of the 4th ACM SIGPLAN International Workshop on Type-Driven Development (TyDe ’19), August 18, 2019, Berlin, Germany. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3331554.3342608

© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-6815-5/19/08.

1 Introduction

Functional programmers frequently comment how richly-typed functional programs just “write themselves”.¹ For example, consider writing down a function that obeys the type:

    f :: a -> Maybe a -> a

Because the return type of the function, a, is polymorphic, we can only produce a value from two sources:

1. The first argument to the function (of type a).
2. The result of pattern matching on the second argument (in the Just case, its argument will have type a).

¹ Once you get over the complexity of the types!

These restrictions highly constrain the set of valid programs that will typecheck, giving the impression that the function writes itself as long as the programmer can navigate the type system appropriately. To do this, they must systematically check the context to see what program elements are relevant to their final goal and try to put them together into a complete program. However, such navigation is usually mechanical and tedious in nature. It would be preferable if a tool automated some or all of this type-directed development process.

Such luxuries are part of the promise of type-directed program synthesis tools. Program synthesis is the automatic generation of programs from specification. In this particular case, the specification of our program is its rich type coupled with auxiliary information, e.g., concrete examples of intended behavior.

1.1 From Typechecking to Program Synthesis

Type-directed synthesis techniques search the space of possible programs primarily through a reinterpretation of the programming language’s type system [6, 19, 22]. Traditionally, type systems are specified by defining a relation between a context, expression, and type, e.g., Γ ⊢ e : τ declares that e has type τ under context Γ. From this specification, we would like to extract a typechecking algorithm for the language. However, because relations do not distinguish between inputs and outputs, it is sometimes not clear how to extract such an algorithm. Bi-directional typechecking systems [21] alleviate these concerns by making it explicit which components of the system are inputs and which are outputs, typically by distinguishing the cases where we check that a term has a type from the cases where we infer that a term has a type. In the former case, the type acts as an input to the system, whereas in the latter case it is an output.

In both cases, the term being typechecked serves as an input to the typechecking algorithm. However, with type-directed program synthesis, we instead view the term as an output and the type as an input. For example, consider the standard function application rule found in most type systems:

\[
\frac{\Gamma \vdash e_1 : \tau_1 \to \tau_2 \qquad \Gamma \vdash e_2 : \tau_1}
     {\Gamma \vdash e_1\ e_2 : \tau_2}
\]
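One possible algorithmic reading of the application rule can be sketched in Haskell. This is only an illustration under the assumption that the types of both sub-terms are already known; the `Type` datatype and the function name `appType` are hypothetical and are not part of the formal system.

```haskell
-- Hypothetical simple types used only for this illustration.
data Type
  = TInt
  | TBool
  | TFun Type Type   -- TFun t1 t2 stands for t1 -> t2
  deriving (Eq, Show)

-- The application rule read as a function: given the type of e1 and the
-- type of e2, return the type of (e1 e2) when both premises hold.
appType :: Type -> Type -> Maybe Type
appType (TFun t1 t2) argTy
  | t1 == argTy = Just t2   -- the argument matches the domain
appType _ _ = Nothing       -- not a function, or the argument mismatches
```

This reading fixes one particular choice of inputs and outputs; the rule itself, being a relation, does not commit to any such choice.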

While we can observe that the function application should be typed at the result type of the function e1, it isn’t clear which parts of the relations are inputs and outputs and the order in which the checks ought to be carried out. A bidirectional interpretation of this rule makes the inputs and outputs explicit:

\[
\frac{\Gamma \vdash e_1 \Rightarrow \tau_1 \to \tau_2 \qquad \Gamma \vdash e_2 \Leftarrow \tau_1}
     {\Gamma \vdash e_1\ e_2 \Rightarrow \tau_2}
\]

Here, the relation Γ ⊢ e1 ⇒ τ1 → τ2 means that we infer that the type of e1 is τ1 → τ2. The relation Γ ⊢ e2 ⇐ τ1 means that we check that the type of e2 is indeed τ1. With this, the procedure for type checking a function application is clear: infer a function type τ1 → τ2 for e1 and then check that e2 has the input type τ1; the type of the overall application is then inferred to be the output type τ2.

With type-directed synthesis, we turn typechecking into term generation by reinterpreting the inputs and outputs of the typechecking relation.

\[
\frac{\Gamma \vdash \tau_1 \to \tau_2 \Rightarrow e_1 \qquad \Gamma \vdash \tau_1 \Rightarrow e_2}
     {\Gamma \vdash \tau_2 \Rightarrow e_1\ e_2}
\]

The relation Γ ⊢ τ ⇒ e asserts that whenever we have a goal type τ we can generate a term e of that type. Now our term generation rule for function application says that whenever we have a goal type τ2, we can generate a function application e1 e2 where the output of the function type agrees with the goal. We can apply this pattern to the other rules of the type system to obtain a complete term generation system for a language. We can further augment the system with type-directed example decomposition [6, 19] or richer types such as refinements [22] to obtain true program synthesis systems.

This type-theoretic interpretation of program synthesis gives us immediate insight into how to synthesize programs for languages with rich type systems. Because rich types constrain the set of possible programs dramatically, program synthesis with types can lead to superior synthesis performance. On top of this, the type-directed synthesis style directly supports type-directed programming, a hallmark of richly-typed functional programming languages.

1.2 The Perils of Polymorphism

However, in supporting rich types, in particular polymorphism, we run into a pair of problems that deserve special attention.

• The type enumeration problem: when dealing with polymorphic types, we must first instantiate them. However, there may be many possible instantiations of a polymorphic type. For example, consider the map function of type (a -> b) -> [a] -> [b]. Even if we know that we need to create a value of type [Bool], there is nothing directly constraining the type variable a. Current systems that handle polymorphic synthesis simply enumerate and explore all the possible types that can be created from the context. However, this may lead to an excessive exploration of type combinations that do not work, e.g., choosing a to be Int but not having any functions of type Int -> Bool available in the context. Furthermore, it may lead to repeated checking of terms that are, themselves, polymorphic, e.g., the empty list [] :: [a] for any type a. We would like to develop a system that systematically searches the space of polymorphic instantiations while minimizing the work done as much as possible.

• Reasoning about richer types: polymorphic types are relatively easy to handle in an ad hoc fashion. However, polymorphic types form the basis for a variety of advanced type features such as generalized algebraic datatypes (GADTs) and typeclasses that are commonly used in advanced functional programming languages. Rather than developing ad hoc solutions for all these related features, it would be useful to have a single framework for tackling them all at once.

1.3 Outline

In this paper we present a solution to the problems described above: a constraint-based approach to type-directed program synthesis. Constraint-based typing is used primarily to specify type inference systems which generate constraints between types in the program and then solve those constraints to discover the types of unknown type variables. In the spirit of type-directed program synthesis, we flip the inputs and outputs of the constraint-based typing system to arrive at a synthesis system that tracks type constraints throughout the synthesis process. This embodies a new strategy for synthesis, “infer types while synthesizing”, that allows for efficient synthesis of polymorphic code while also giving us a framework to tackle GADTs and other advanced type features built on polymorphism.

In section 2, we develop a constraint-based program synthesis system based on a basic constraint-based type inference system. In section 3, we extend the system to account for GADTs. In section 4, we discuss our prototype implementation of this approach in our program synthesis tool, Scythe. And finally in section 5 and section 6 we close by discussing future extensions and related work.

2 Constraint-based Program Synthesis

We first develop a constraint-based program synthesis system for a core functional programming language featuring algebraic datatypes and polymorphism. To do this, we first present a standard Hindley-Milner style type inference system for the language and then show how to re-interpret it

as a program synthesis system, augmented with example propagation in the style of Myth [19].

\[
\begin{array}{rcll}
\sigma & ::= & \forall \overline{a_i}^{i}.\ \tau & \text{Type Schemes} \\
\tau & ::= & \alpha \mid a \mid T\,\overline{\tau_i}^{i} \mid \tau_1 \to \tau_2 & \text{Monotypes} \\
e & ::= & x \mid K \mid \lambda x.\ e \mid e_1\ e_2 \mid \mathsf{case}\ e\ \mathsf{of}\ \overline{m_j}^{j} & \text{Expressions} \\
m & ::= & K\,\overline{x_k}^{k} \to e & \text{Match Branches} \\
b & ::= & x = e & \text{Top-level Bindings} \\
C & ::= & \{\overline{\tau_i \sim \tau_i'}^{i}\} & \text{Constraints} \\
\theta & ::= & \cdot \mid [\tau/a]\theta \mid [\tau/\alpha]\theta & \text{Type Environments} \\
\Gamma & ::= & \cdot \mid x{:}\sigma, \Gamma \mid K{:}\forall \overline{a_i}^{i}.\ C \Rightarrow \overline{\tau_k}^{k} \to T\,\overline{a_i}^{i}, \Gamma & \text{Contexts}
\end{array}
\]

Figure 1. Basic syntax of the object language.

Figure 1 gives the syntax of the object language. Types are composed of top-level type schemes σ which introduce universally quantified type variables. The language of monotypes τ enclosed in type schemes includes type variables a, function types, and saturated algebraic datatypes $T\,\overline{\tau_j}^{j}$. The constraint-generation process also generates unification variables α, β, γ, ..., that are eventually substituted away during the inference process. The context Γ records both the types of variables and constructors K, enforcing that all constructors are (possibly) polymorphic functions whose co-domain is their associated algebraic datatype. The co-domain is restricted to the universally quantified type variables of the declaration; in section 3, we lift this restriction to enable generalized algebraic datatypes.

The expression language is a standard typed, functional language with algebraic datatype constructors and pattern matching restricted to one-level peeling of head constructors from their arguments. Like Haskell, we treat constructors like function application and allow for partial application of constructor values. Conspicuously absent from the expression language are let-bindings. This is because we do not synthesize let-bindings (that are not top-level) during the synthesis process. As we shall see shortly, there is no type-or-example-directed way to synthesize such bindings in the general case, so we elide them from our object language.

Throughout this paper, we refer to sequences of objects either using ellipses, e.g., e1 ... ek, or using overbar notation $\overline{e_k}^{k}$ where we use index variables i, j, k, ... to represent the lengths of these sequences. We denote nested sequences (i.e., sequences of sequences) using nested overbars and both subscripts and superscripts to separate the lengths. For example, $\overline{\overline{e_j^i}^{j}}^{i}$ refers to a sequence (of size i) of sequences (of size j) of expressions e denoted $e_j^i$. Finally, whenever possible we use the metavariables i, j, and k to denote the lengths of the following sequences:

• i: types, e.g., $\forall \overline{a_i}^{i}.\ \tau$, $T\,\overline{\tau_i}^{i}$,
• j: examples, e.g., $\{\overline{S_j \mapsto \chi_j}^{j}\}$, and
• k: arguments (to constructors), e.g., $K\,\overline{e_k}^{k}$.

2.1 Type Inference

Figure 2 gives the type inference rules for the system. As is standard, our constraint-based type inference system operates in two phases: type-and-constraint generation and constraint solving. The primary judgment of the system Γ ⊢ e : τ ⇒ C infers that expression e has type τ under context Γ and generates constraints C. Constraints in this system τ1 ∼ τ2 assert equalities between types, in particular, giving concrete types to unification variables α that are generated for unknown types during the inference process. For example, in the case of an un-annotated lambda (infer-lam), we do not immediately know the type of the argument to the lambda x. Thus, we create a fresh unification variable α and use that variable as the type of the argument. In inferring the type of the body of the lambda, we will generate constraints C that refine the type of α.

Judgment Γ ⊢ e : τ ⇒ C

\[
\textsc{infer-var}\quad
\frac{\text{fresh } \overline{\alpha_i}^{i} \qquad x{:}\forall \overline{a_i}^{i}.\ \tau \in \Gamma}
     {\Gamma \vdash x : [\overline{\alpha_i/a_i}^{i}]\,\tau \Rightarrow \cdot}
\]

\[
\textsc{infer-app}\quad
\frac{\text{fresh } \alpha \qquad \Gamma \vdash e_1 : \tau_1 \Rightarrow C_1 \qquad \Gamma \vdash e_2 : \tau_2 \Rightarrow C_2}
     {\Gamma \vdash e_1\ e_2 : \alpha \Rightarrow C_1 \cup C_2 \cup \{\tau_1 \sim \tau_2 \to \alpha\}}
\]

\[
\textsc{infer-case}\quad
\frac{\text{fresh } \overline{\alpha_i}^{i}, \beta \qquad \Gamma \vdash e : \tau \Rightarrow C \qquad C' = C \cup \{\tau \sim T\,\overline{\alpha_i}^{i}\} \qquad \overline{\Gamma \vdash \overline{\alpha_i}^{i}; m_j : \tau_j \Rightarrow C_j}^{j}}
     {\Gamma \vdash \mathsf{case}\ e\ \mathsf{of}\ \overline{m_j}^{j} : \beta \Rightarrow C' \cup \overline{C_j}^{j} \cup \overline{\{\beta \sim \tau_j\}}^{j}}
\]

\[
\textsc{infer-lam}\quad
\frac{\text{fresh } \alpha \qquad x{:}\alpha, \Gamma \vdash e : \tau \Rightarrow C}
     {\Gamma \vdash \lambda x.\ e : \alpha \to \tau \Rightarrow C}
\]

Judgment $\Gamma \vdash \overline{\alpha_i}^{i}; m : \tau \Rightarrow C$

\[
\textsc{infer-match}\quad
\frac{K{:}\forall \overline{a_i}^{i}.\ \overline{\tau_k}^{k} \to T\,\overline{a_i}^{i} \in \Gamma \qquad \overline{x_k{:}[\overline{\alpha_i/a_i}^{i}]\,\tau_k}^{k}, \Gamma \vdash e : \tau \Rightarrow C}
     {\Gamma \vdash \overline{\alpha_i}^{i}; K\,\overline{x_k}^{k} \to e : \tau \Rightarrow C}
\]

Judgment Γ ⊢ b : σ

\[
\textsc{infer-bind}\quad
\frac{\Gamma \vdash e : \tau \Rightarrow C \qquad \mathrm{unify}(C) = \theta \qquad \overline{\alpha_i}^{i} \notin \mathrm{fuvs}(\theta) \qquad \text{fresh } \overline{a_i}^{i}}
     {\Gamma \vdash f = e : \forall \overline{a_i}^{i}.\ [\overline{a_i/\alpha_i}^{i}]\,\theta\tau}
\]

Figure 2. Type inference system.

In addition to assigning unification variables to types, constraints also serve the purpose of asserting expected equalities between types of sub-expressions. When inferring the type of a function application (infer-app), we assert that the type of the head expression is indeed a function type and the argument expression’s type is the domain of that function type (τ1 ∼ τ2 → α). When inferring the type of a case scrutinee (infer-case), we assert that the type of the scrutinee is indeed an algebraic datatype with fresh unification variables in place of its arguments ($\tau \sim T\,\overline{\alpha_i}^{i}$). And finally, we add constraints when inferring the overall type of a case expression stating that all of the branches have the same type ($\overline{\beta \sim \tau_j}^{j}$).

Polymorphic types interact with the inference system in two ways. The first is let-generalization which occurs at top-level bindings f = e. After generating constraints C for the body of the binding, we solve them using a standard unification algorithm to find a type substitution θ for each of the unification variables found in C (infer-bind). Those unification variables with no substituting types ($\overline{\alpha_i}^{i} \notin \mathrm{fuvs}(\theta)$) become universal type variables in the final type scheme of f. The second way lies in inferring the monotypes we use to instantiate the type schemes of variables. Rather than requiring that the user provides the instantiation via a type application form or guessing the instantiation upfront, we instantiate a type scheme with fresh unification variables $\overline{\alpha_i}^{i}$ that will be later constrained based on how the variable x is used by the program (infer-var).

2.2 Constraint-based Synthesis

Figure 3 gives the syntactic extensions to the core language to support type-and-example-directed program synthesis in the style of Myth [6, 19]. We extend the language with example values χ that the user provides as additional specification to the synthesizer. Intuitively, example values are specifications of values at a particular type that can be decomposed into specifications for those values’ components. For example, each of the values $v_i$ of a saturated constructor example $K\,\overline{v_i}^{i}$ can become examples for synthesizing each of K’s arguments.

However, it is not immediately clear how to decompose values at function type, i.e., lambdas, in a similar way. Like Myth, we introduce input-output pairs v ⇒ χ as a distinguished example value at function type. Such examples are easily decomposed and align with our intuition that we would like to specify the behavior of a function through a collection of input-output examples (either realized as test cases or documentation).

The synthesis system takes as input a context Γ, a goal type τ, and a set of examples, and produces a program of that type that agrees with the set of examples. Ultimately, the system decomposes the examples, assigning sub-values to generated binders (from lambdas and case expressions) as the synthesizer generates them. These are recorded alongside the examples that generated them, leading to the notion of an example world W as a pair of a (value) environment S and an example goal χ for that world. When checking that a program agrees with these examples, we evaluate the program closed with each example world’s environment and check that the resulting value is consistent with that world’s example goal. We write the application of an environment S to an expression e using juxtaposition: Se.

Figure 4 describes the complete synthesis system over the language. In the spirit of bidirectional typechecking [21], we divide type-directed synthesis into two processes which are realized as a pair of judgments in the system:

1. Refinement, Γ ⊢ τ ⊲ W ⇒ e, decomposes a set of input examples according to their types.
2. Generation, Γ ⊢ τ ⇒ e, enumerates terms in a type-directed fashion in the absence of examples.

The separation is, in fact, a division of the syntax of the language into introduction forms and elimination forms for refinement and generation, respectively. From a type inference perspective, type information flows into introduction forms (i.e., we check against their expected types based on their shape) and out of elimination forms (i.e., we infer their types based on their sub-components). From a synthesis perspective, this means we can use the shape of an introduction form to determine how to decompose examples whereas we have no such affordances in the general case for elimination forms. Thus, we must resort to raw term enumeration for those forms.

As a final note, the system as described does not support recursive functions (note that in Figure 4, refine-lam does not bind anything for the function itself) in order to make clear the details of constraint propagation in the synthesis process. Recursion can be easily integrated into the system in the same manner as Myth by treating the set of input-output examples for a function as a partial function and using that function value as a binding for the function being synthesized. This requires that the example set is trace complete so that any call to the function has a known value [19] and is how the Scythe system discussed in section 4 implements recursion.

2.3 Refinement

The non-polymorphic refinement rules are largely identical to those found in Myth. The function of each rule is to

describe how to decompose a set of example worlds according to the goal type’s example form. For lambdas (refine-lam), we decompose a collection of worlds containing input-output examples by assigning the lambda’s binder x the input and synthesizing the body of the lambda with the output as the goal. For constructors (refine-data), if all the examples share the same head K, then we can safely synthesize that constructor and divide up the examples’ arguments into examples for each of that constructor’s arguments. For example, suppose that we have a constructor K : Int → Bool → Int → T with example worlds W = w1, w2, w3:

    w1 = S1 ↦ K 0 True 1
    w2 = S2 ↦ K 2 False 3
    w3 = S3 ↦ K 4 True 5

By refine-data, we will synthesize K □1 □2 □3 with three synthesis sub-goals, one for each of K’s arguments. The sub-goals contain example worlds W1, W2, and W3 respectively:

    W1 = S1 ↦ 0, S2 ↦ 2, S3 ↦ 4
    W2 = S1 ↦ True, S2 ↦ False, S3 ↦ True
    W3 = S1 ↦ 1, S2 ↦ 3, S3 ↦ 5

Notably with refinement, the fully specified goal type is required as input to the procedure, so there is no need to reason about type constraints at this point in the system.²

² Note that even with goal types elided, the examples make inference of the goal type trivial due to canonicity of types. However, we may want to consider type-directed synthesis in the absence of any examples which we discuss in more detail in section 5.

\[
\begin{array}{rcll}
v & ::= & c \mid \lambda x.\ e \mid K\,\overline{v_i}^{i} & \text{Values} \\
\chi & ::= & c \mid K\,\overline{\chi_i}^{i} \mid v \Rightarrow \chi & \text{Examples} \\
X & ::= & \forall \overline{c_i{:}a_i}^{i}.\ \overline{\chi_j}^{j} & \text{Top-level Examples} \\
S & ::= & [\overline{v_i/x_i}^{i}] & \text{Environments} \\
W & ::= & [\overline{S_j \mapsto \chi_j}^{j}] & \text{Example Worlds} \\
\Gamma & ::= & \cdots \mid c{:}a, \Gamma & \text{Contexts}
\end{array}
\]

Figure 3. Syntactic extensions for type-and-example-directed program synthesis.

Unlike other type-directed forms, case-expressions do not immediately imply a particular type. Indeed the type of a case-expression is exactly the type shared by its match bodies. Nevertheless, we can refine examples through case expressions as described by the refine-case rule:

1. Generate a scrutinee expression e at some concrete datatype $T\,\overline{\tau_i}^{i}$.
2. For each constructor K of T, create a match for that constructor using the example worlds $\{\overline{S_j \mapsto \chi_j}^{j}\} \subseteq W$ where the scrutinee would evaluate to a value with head constructor K (written using the filtering operator $W|_e^K$ defined in Figure 5).

When generating a scrutinee, we use the term-generation judgment C; Γ ⊢ τ ⇒ e; C′ which produces a set of type constraints C′ under which e typechecks at type τ. However, because we assume the concrete datatype of type $T\,\overline{\tau_i}^{i}$ upfront, we know that the type does not contain any unification variables and thus we can ignore C′ for now. In theory this means that an implementation of the system must explore all possible combinations of datatypes to arguments which is the exact problem we are trying to solve with constraint-based synthesis! But in practice, we already heavily restrict the kinds of expressions that appear as scrutinees, e.g., to direct arguments of the synthesized function, so such type search is tolerable.

The helper judgment $\Gamma \vdash e; \overline{\tau_i}^{i}; K; \tau \rhd W \Rightarrow m$ is responsible for synthesizing match branches from the scrutinee and candidate constructor. Its sole rule (refine-match) extracts and binds the evaluated-to constructor values’ arguments in the filtered example worlds and continues synthesis of the match’s body.

Finally, we connect refinement and term generation with the refine-guess rule which generates a term e and then checks, for each example world S ↦ χ, that the closed term Se is equivalent to the goal example χ according to the standard evaluation rules of the language (e ≡β v whenever e −→∗ v). The only non-standard case that can arise is the comparison of a lambda (a v) to an input-output example (a χ). We thus define:

    λx:τ. e ≡β v ⇒ χ iff (λx:τ. e) v ≡β χ.

That is, we run the input of the input-output pair through the lambda and check to see that the resulting value is equivalent to the output of the pair.

Polymorphism in Refinement. We must consider polymorphic types at two points within refinement. First, unlike the presentations of the case rule in prior work [19], constructors now have (potentially) polymorphic types. However, because the type $T\,\overline{\tau_i}^{i}$ is fully concrete, we can immediately instantiate the constructor’s polymorphic type ($\theta = [\overline{\tau_i/a_i}^{i}]$) and use that type substitution on each of the types of the arguments of the constructor.
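The constructor instantiation just described, together with the refine-data decomposition of the K example from earlier in this section, might be sketched as follows. This is a minimal illustration, not the Scythe implementation: the names `Type`, `Example`, `ConSig`, `subst`, and `refineData` are hypothetical, environments are left abstract, and only first-order example values are modeled.

```haskell
import Data.List (transpose)
import Data.Maybe (fromMaybe)

-- Hypothetical monotypes: type variables, datatypes, and functions.
data Type = TVar String | TCon String [Type] | TFun Type Type
  deriving (Eq, Show)

-- Hypothetical example values: literals and saturated constructor examples.
data Example = EInt Int | EBool Bool | ECon String [Example]
  deriving (Eq, Show)

-- A constructor signature K : forall as. t1 -> ... -> tk -> T as.
data ConSig = ConSig
  { conName   :: String
  , conParams :: [String]
  , conArgs   :: [Type]
  }

-- Apply a type substitution theta = [ts/as] to a type.
subst :: [(String, Type)] -> Type -> Type
subst th (TVar a)    = fromMaybe (TVar a) (lookup a th)
subst th (TCon t ts) = TCon t (map (subst th) ts)
subst th (TFun a b)  = TFun (subst th a) (subst th b)

-- Sketch of refine-data: if every world's goal has head K, return one
-- sub-goal per constructor argument, pairing the instantiated argument
-- type with the worlds projected onto that argument.
refineData :: ConSig -> [Type] -> [(env, Example)]
           -> Maybe [(Type, [(env, Example)])]
refineData _ _ [] = Nothing
refineData (ConSig k params argTys) tyArgs ws = do
  argLists <- mapM argsOf ws
  let theta  = zip params tyArgs
      goals  = map (subst theta) argTys
      perArg = map (zip (map fst ws)) (transpose argLists)
  pure (zip goals perArg)
  where
    argsOf (_, ECon k' args)
      | k' == k && length args == length argTys = Just args
    argsOf _ = Nothing
```

Returning `Nothing` when the worlds disagree on the head constructor mirrors the side condition of refine-data; a synthesizer would then fall back to another rule, such as refine-case or refine-guess.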

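Judgment Γ ⊢ τ ⊲ W ⇒ e

\[
\textsc{refine-guess}\quad
\frac{C; \Gamma \vdash \tau \Rightarrow e; C' \qquad \overline{S_j e \equiv_\beta \chi_j}^{j} \qquad \mathrm{consistent}(C')}
     {\Gamma \vdash \tau \rhd \{\overline{S_j \mapsto \chi_j}^{j}\} \Rightarrow e}
\]

\[
\textsc{refine-lam}\quad
\frac{\text{fresh } x \qquad x{:}\tau_1, \Gamma \vdash \tau_2 \rhd \{\overline{[v_j/x]S_j \mapsto \chi_j}^{j}\} \Rightarrow e}
     {\Gamma \vdash \tau_1 \to \tau_2 \rhd \{\overline{S_j \mapsto v_j \Rightarrow \chi_j}^{j}\} \Rightarrow \lambda x.\ e}
\]

\[
\textsc{refine-data}\quad
\frac{K{:}\forall \overline{a_i}^{i}.\ \overline{\tau_k}^{k} \to T\,\overline{a_i}^{i} \in \Gamma \qquad \theta = [\overline{\tau_i/a_i}^{i}] \qquad \overline{W_k = \{\overline{S_j \mapsto \chi_k^j}^{j}\}}^{k} \qquad \overline{\Gamma \vdash \theta\tau_k \rhd W_k \Rightarrow e_k}^{k}}
     {\Gamma \vdash T\,\overline{\tau_i}^{i} \rhd \{\overline{S_j \mapsto K\,\overline{\chi_k^j}^{k}}^{j}\} \Rightarrow K\,\overline{e_k}^{k}}
\]

\[
\textsc{refine-case}\quad
\frac{C; \Gamma \vdash T\,\overline{\tau_i}^{i} \Rightarrow e; C' \qquad \overline{K_j{:}\forall \overline{a_i}^{i}.\ \overline{\tau_k^j}^{k} \to T\,\overline{a_i}^{i} \in \Gamma}^{j} \qquad \overline{\Gamma \vdash e; \overline{\tau_i}^{i}; K_j; \tau \rhd W \Rightarrow m_j}^{j}}
     {\Gamma \vdash \tau \rhd W \Rightarrow \mathsf{case}\ e\ \mathsf{of}\ \overline{m_j}^{j}}
\]

Judgment $\Gamma \vdash e; \overline{\tau_i}^{i}; K; \tau \rhd W \Rightarrow m$

\[
\textsc{refine-match}\quad
\frac{\begin{array}{c}
K{:}\forall \overline{a_i}^{i}.\ \overline{\tau_k}^{k} \to T\,\overline{a_i}^{i} \in \Gamma \qquad W|_e^K = \{\overline{S_j \mapsto \chi_j}^{j}\} \qquad \theta = [\overline{\tau_i/a_i}^{i}] \\
\overline{S_j e \longrightarrow^{*} K\,\overline{v_k^j}^{k}}^{j} \qquad \text{fresh } \overline{x_k}^{k} \qquad \overline{x_k{:}\theta\tau_k}^{k}, \Gamma \vdash \tau \rhd \{\overline{[\overline{v_k^j/x_k}^{k}]\,S_j \mapsto \chi_j}^{j}\} \Rightarrow e'
\end{array}}
     {\Gamma \vdash e; \overline{\tau_i}^{i}; K; \tau \rhd W \Rightarrow K\,\overline{x_k}^{k} \to e'}
\]

Judgment Γ ⊢ σ ⊲ W ⇒ e

\[
\textsc{refine-poly}\quad
\frac{\Gamma \vdash \tau \rhd \{\overline{S_j \mapsto \chi_j}^{j}\} \Rightarrow e}
     {\Gamma \vdash \forall \overline{a_i}^{i}.\ \tau \rhd \{\overline{S_j \mapsto \forall \overline{c_k{:}a_k}^{k}.\ \chi_j}^{j}\} \Rightarrow e}
\]

Judgment C; Γ ⊢ τ ⇒ e; C′

\[
\textsc{gen-var}\quad
\frac{\text{fresh } \overline{\alpha_i}^{i} \qquad x{:}\forall \overline{a_i}^{i}.\ C' \Rightarrow \tau' \in \Gamma \qquad \theta = [\overline{\alpha_i/a_i}^{i}] \qquad C'' = C \cup \theta C' \cup \{\tau \sim \theta\tau'\} \qquad \mathrm{consistent}(C'')}
     {C; \Gamma \vdash \tau \Rightarrow x; C''}
\]

\[
\textsc{gen-app}\quad
\frac{\text{fresh } \alpha \qquad C; \Gamma \vdash \alpha \to \tau \Rightarrow e_1; C_1 \qquad C_1; \Gamma \vdash \alpha \Rightarrow e_2; C_2}
     {C; \Gamma \vdash \tau \Rightarrow e_1\ e_2; C_2}
\]

\[
\textsc{gen-unify}\quad
\frac{C \models \tau_1 \sim \tau_2 \qquad C; \Gamma \vdash \tau_2 \Rightarrow e; C'}
     {C; \Gamma \vdash \tau_1 \Rightarrow e; C'}
\]

Figure 4. Type-and-example program synthesis rules.

\[
\begin{array}{l}
e \equiv_\beta v \ \text{iff}\ e \longrightarrow^{*} v \\
\lambda x{:}\tau.\ e \equiv_\beta v \Rightarrow \chi \ \text{iff}\ (\lambda x{:}\tau.\ e)\ v \equiv_\beta \chi \\
W|_e^K = \{S \mapsto \chi \in W \mid S e \longrightarrow^{*} K\ v_1 \cdots v_k\} \\
\mathrm{unify}(C) = \theta \quad \text{(standard unification algorithm)} \\
C \models C' \ \text{iff}\ \forall \tau \sim \tau' \in C'.\ \theta\tau = \theta\tau' \ \text{where}\ \mathrm{unify}(C) = \theta \\
\mathrm{consistent}(C) \ \text{iff}\ \exists \theta.\ \mathrm{unify}(C) = \theta
\end{array}
\]

Figure 5. Auxiliary definitions for the synthesis system.

Secondly, top-level bindings are at type schemes σ, so we must provide examples at polymorphic type for them. However, there is no type-directed syntactic form for polymorphic types! Furthermore, even an explicit term-level type abstraction, e.g., Λa. χ, does not provide much help! This is because the type abstraction does not introduce values of the type variable a, and we need such values to write down complete examples.

We therefore introduce a new top-level example value form X in Figure 3, the polymorphic example $\forall \overline{c_i{:}a_i}^{i}.\ \overline{\chi_j}^{j}$, which introduces a set of polymorphic constants $c_i$ at type variables $a_i$. This example form was first introduced in Osera’s thesis [17] in the context of synthesizing within System F, and we have adapted it to this Haskell-like setting. For example, we might specify for a goal type of ∀a. a → Maybe a → a:

    ∀c1:a, c2:a. c1 ⇒ Nothing ⇒ c1,
                 c1 ⇒ Just c2 ⇒ c2

In these two examples, we introduce polymorphic constants c1 and c2 of type a and use them throughout the input-output examples for the standard fromMaybe function.³

We refine polymorphic examples introduced at top-level bindings (refine-poly) by simply stripping the top-level forall and synthesizing at the contained sub-example value. The binding information for the polymorphic constants is only useful for typechecking the examples themselves; it is not necessary for the synthesis process as the example values are simply carried around as values bound to variables in an example world’s environment.

2.4 Generation

Like refinement, term generation takes as input a goal type. However, during the course of generation, we may need to instantiate type schemes when generating variables. Rather than committing to a choice of a complete instantiation of a type scheme upfront (which requires us to systematically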

search the space of possible types), we infer the instantiation as we generate the complete term. If type-and-example-directed program synthesis captures the idea of “evaluating during enumeration” [19], then our approach of integrating type inference into the term generation adds the idea of “inferring during enumeration”.

³ The astute reader ought to notice that these examples are virtually the definition of fromMaybe already! We reflect on this fact in more detail in section 4.

To do so, we “invert” the appropriate rules from our type inference system (Figure 2) to create a term generation system that infers types during the generation process. This judgment, written C; Γ ⊢ τ ⇒ e; C′, takes a collection of constraints C, a context Γ, and a type τ as input and produces a program e and an updated constraint set C′ as output. In effect, we thread a set of type constraints throughout the generation process which makes concrete the idea that a choice of a component in one part of an expression might influence choices in the rest of the expression. It is worthwhile to note that our rules are not an exact inversion of the type inference system defined in Figure 2 because the generation process is given a concrete goal type to work with as input.

While refinement handles the introduction forms of the language, generation concerns the elimination forms, of which there are only variables and function application. With variable generation (gen-var), we speculatively choose a variable to generate and then create a new constraint recording that the goal type must be equivalent to the type scheme of the variable instantiated with fresh unification variables ($\tau \sim [\overline{\alpha_i/a_i}^{i}]\,\tau'$). And with function application generation (gen-app), we first speculatively generate a function under the constraint that its domain matches our eventual argument type and the co-domain matches our goal type. We then generate function arguments under the constraints accrued from generating the function.

While the initial goal type for generation is a concrete type, we will quickly introduce unknown types, i.e., unification variables, through the process. We eliminate unification variables through unification of the constraints during the generation process (gen-unify). If our goal type is some τ1 (presumably a unification variable) and our current constraint set allows us to conclude that it is equivalent to τ2 (C ⊨ τ1 ∼ τ2) then we can synthesize at this type τ2 instead.

As we generate constraints, we need to take care that these constraints are consistent. An inconsistent set of constraints asserts that in-equal types are equal, e.g., Int ∼ Bool. If at any point we arrive at a set of inconsistent constraints, we know that the currently generated term is not well-typed and should be rejected. In our system, this amounts to using the consistent helper relation to check that whenever we output a constraint set from a rule, it is indeed consistent. consistent(C) simply asserts that a unifier (i.e., type substitution) exists for C as described in Figure 5.

For example, consider generating a program of type [Bool] under the context:

    Γ = map : ∀a b. (a → b) → [a] → [b],
        isEven : Int → Bool,
        l : [Int]

Figure 6 graphically shows one potential generation of a program of this type.

    □ : [Bool]
    ├── □ : α1 → [Bool]
    │   ├── □ : α2 → (α1 → [Bool])     filled by map
    │   └── □ : α2
    │       └── □ : γ1 → γ2            filled by isEven
    └── □ : α1
        └── □ : [Int]                  filled by l

Figure 6. Example generation derivation of map.

Starting at the root, we first apply the gen-app rule which generates a fresh unification variable α1 and two generation sub-goals at types α1 → [Bool] and α1. We apply gen-app again at the first sub-goal which generates another unification variable α2 and two more sub-goals at types α2 → (α1 → [Bool]) and α2.

At the first of these new sub-goals, we can apply gen-var and choose the map variable. This generates the constraint

    α2 → α1 → [Bool] ∼ (γ1 → γ2) → [γ1] → [γ2]

for fresh unification variables γ1 and γ2. Now we can return to the sub-goal with goal type α2 and apply gen-unify to continue generating at type γ1 → γ2 since our constraint implies that α2 ∼ γ1 → γ2. This allows us to apply gen-var and choose the isEven variable, adding the constraint

    γ1 → γ2 ∼ Int → Bool.

Finally, we can return to the sub-goal with goal type α1, apply gen-unify to generate at type [Int] (because α1 ∼ [γ1] ∼ [Int]), and then choose l via gen-var, generating a final, trivial constraint [Int] ∼ [Int].

Because we are ensuring that our generated constraint sets are consistent at every step, the order in which we explore generation sub-goals doesn’t matter for correctness. However, the order heavily influences how much backtracking we will need to perform because of “bad” choices of terms. For example, if rather than synthesizing map first, we instead try to synthesize its arguments, we may make many spurious choices if there are non-list types in the context. Thus, even though our constraint-based synthesis system allows us to consider

i k i Γ :: = · · · | K:∀ai .C ⇒ τk → T ai , Γ Contexts GADTs. In particular, we follow the presentation of Out- sideIn [25] and represent GADTs as standard polymorphic Figure 7. Syntactic extensions for GADT synthesis. constructors coupled with constraints C on the type vari- ables that appear in the output type of the constructor. This allows us to avoid a step of unification during type inference types in a more incremental fashion, we must still be con- and synthesis to discover these constraints on our own. As scious of performance during implementation. In this partic- an example, we can rewrite the declaration of Exp above in ular case, it makes sense to attempt to synthesize functions this style: before their arguments since the function is more likely to prune the search space of types than synthesizing one of its data Exp a where arguments. Lit :: a -> Exp a Plus :: (a ~ Int) => 3 Synthesis with GADTs Exp Int -> Exp Int -> Exp a Traditional algebraic types require that the output type of its Eq :: (a ~ Bool) => i Exp Int -> Exp Int -> Exp a constructors have the form T ai where the ai s are univer- sally quantified type variables from the type of the construc- If :: Exp Bool -> Exp a -> Exp a -> Exp a tor. Generalized algebraic data types (GADTs) lift this restric- tion so that the type arguments of T may be any type, not With this style, our GADT constructor definitions obey the just type variables. While a seemingly insignificant change, same restriction as regular algebraic datatypes—the type GADTs strike a powerful balance between power and com- arguments of the output type are type variables—but the plexity on the spectrum of rich type systems. addition of type constraints allow us greater freedom as to The quintessential example of GADTs-in-action is the what types the constructors produce. tagless interpreter given below in Haskell: Figure 8 gives the updated synthesis rules with generalized algebraic datatypes. 
Because GADTs only affect datatype data Exp a where declarations and constructors, we only need to update the Lit :: a -> Exp a rules for refinement; raw term generation is left untouched. Plus :: Exp Int -> Exp Int -> Exp Int We update the refinement judgment to carry around the Eq :: Exp Int -> Exp Int -> Exp Bool set of constraints C gained through case analysis of GADT If :: Exp Bool -> Exp a -> Exp a -> Exp a values. Unlike term generation, additional constraints gained through GADT analysis are scoped to their matches. Be- eval :: Exp a -> a cause of this, we only need to thread constraints down the eval (Lit x) = x refinement judgment as an input and not gather updated eval (Plus e1 e2) = eval e1 + eval e2 constraints as an output. Most rules simply pass their con- eval (Eq e1 e2) = eval e1 == eval e2 straints to their sub-refinement calls as in the case of lambdas eval (If e1 e2 e3) = (refine-gadt-lam) or to raw term generation (refine-gadt- if eval e1 then eval e2 else eval e3 guess). At the top-level, we begin refinement with the empty Here, we parameterize the type of Exp a by the type a that set of constraints (refine-gadt-binding). the expression evaluates to. This allows us to give eval the Now when generating a case expression (refine-gadt- rich type Exp a -> a which in turn allows us to write eval case), we also extract the constraints associated with each without any case analyses on the values it produces. constructor Ci and add it to the constraint set associated How do GADTs complicate typechecking and consequently with synthesizing that constructor’s match body. Notably, type-directed program synthesis? For example, consider the we instantiate the additional constraints Ci with the type Plus case of eval. If Plus was a regular algebraic datatype environment θ generated by unifying the type of the scru- (say, with type Exp -> Exp -> Exp), the body would not tinee with the output type of the constructors, written θCi . 
typecheck because (+) produces a value of numeric type Constraints are utilized during refinement when synthesiz- whereas the function expects a value of type a. To enable ing constructors (refine-gadt-data). We ensure that we the body of Plus to typecheck, the typechecker must de- only synthesize a constructor if the current set of constraints duce that because the input is a constructor that produces C implies all of the (instantiated) constraints bundled with an Exp Int, then a ∼ Int and thus we must typecheck the the constructor’s type. Note that this stands in contrast to body at type Int. In effect, consuming a GADT via a pattern inconsistencies detected during case generation. If the con- match introduces local type constraints into the environment straints of a constructor are inconsistent with the current that we must consider when typechecking each branch [25]. set of constraints when synthesizing a match alternative With some minor changes, we can adapt the constraint- then the alternative is unreachable and can be elided as an based framework we developed previously to accommodate optimization. Constraint-Based Type-Directed Program Synthesis TyDe ’19, August 18, 2019, Berlin, Germany
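The constraint-style declaration of Exp discussed above can be tried directly in GHC. The following is a minimal sketch, assuming only the GADTs language extension; it is not taken from Scythe. The function evalInt is a hypothetical name introduced here to illustrate an elided match alternative: at scrutinee type Exp Int, the Eq alternative would carry the constraint Int ~ Bool, so it is left out.

```haskell
{-# LANGUAGE GADTs #-}

-- Constraint-style declaration: every constructor returns 'Exp a', and an
-- equality constraint records what the constructor tells us about 'a'.
data Exp a where
  Lit  :: a -> Exp a
  Plus :: (a ~ Int)  => Exp Int -> Exp Int -> Exp a
  Eq   :: (a ~ Bool) => Exp Int -> Exp Int -> Exp a
  If   :: Exp Bool -> Exp a -> Exp a -> Exp a

-- Matching on a constructor brings its constraint into scope for that branch.
eval :: Exp a -> a
eval (Lit x)      = x
eval (Plus e1 e2) = eval e1 + eval e2      -- typechecks because a ~ Int here
eval (Eq e1 e2)   = eval e1 == eval e2     -- typechecks because a ~ Bool here
eval (If c t e)   = if eval c then eval t else eval e

-- Hypothetical helper (not from the paper): the scrutinee has type Exp Int,
-- so an Eq alternative would require Int ~ Bool and is omitted.
evalInt :: Exp Int -> Int
evalInt (Lit n)      = n
evalInt (Plus e1 e2) = evalInt e1 + evalInt e2
evalInt (If c t e)   = if eval c then evalInt t else evalInt e
```

Loading this in GHCi and evaluating, for instance, eval (Plus (Lit 1) (Lit 2)) exercises the Plus branch, whose body is only well-typed because of the local constraint a ~ Int.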

Expression refinement: C; Γ ⊢ τ ⊲ W ⇒ e

\[
\textsc{refine-gadt-lam}\quad
\dfrac{\text{fresh } x \qquad C;\ x{:}\tau_1, \Gamma \vdash \tau_2 \lhd \{[v_j/x]S_j \mapsto \chi_j\}^{j} \Rightarrow e}
      {C;\ \Gamma \vdash \tau_1 \to \tau_2 \lhd \{S_j \mapsto v_j \Rightarrow \chi_j\}^{j} \Rightarrow \lambda x.\, e}
\]

\[
\textsc{refine-gadt-data}\quad
\dfrac{\begin{array}{c}
K{:}\forall\overline{a_i}^{i}.\,C' \Rightarrow \overline{\tau_k}^{k} \to T\,\overline{a_i}^{i} \in \Gamma \qquad
\theta = [\overline{\tau_i/a_i}^{i}] \qquad C \vDash \theta C' \\
\overline{W_k = \{S_j \mapsto \chi_k^{j}\}^{j}}^{k} \qquad
\overline{C;\ \Gamma \vdash \theta\tau_k \lhd W_k \Rightarrow e_k}^{k}
\end{array}}
{C;\ \Gamma \vdash T\,\overline{\tau_i}^{i} \lhd \{S_j \mapsto K\,\overline{\chi_k^{j}}^{k}\}^{j} \Rightarrow K\,\overline{e_k}^{k}}
\]

\[
\textsc{refine-gadt-case}\quad
\dfrac{\begin{array}{c}
C;\ \Gamma \vdash T\,\overline{\tau_i}^{i} \Rightarrow e;\ C' \qquad
\overline{K_j{:}\forall\overline{a_i}^{i}.\,C_j \Rightarrow \overline{\tau_k}^{k} \to T\,\overline{a_i}^{i} \in \Gamma}^{j} \\
\overline{C;\ \Gamma \vdash e;\ \overline{\tau_i}^{i};\ K_j;\ \tau \lhd W \Rightarrow m_j}^{j}
\end{array}}
{C;\ \Gamma \vdash \tau \lhd W \Rightarrow \mathsf{case}\ e\ \mathsf{of}\ \overline{m_j}^{j}}
\]

\[
\textsc{refine-gadt-guess}\quad
\dfrac{C;\ \Gamma \vdash \tau \Rightarrow e;\ C' \qquad \overline{S_j e \equiv_\beta \chi_j}^{j} \qquad \mathsf{consistent}(C')}
      {C;\ \Gamma \vdash \tau \lhd \{S_j \mapsto \chi_j\}^{j} \Rightarrow e}
\]

\[
\textsc{refine-gadt-unify}\quad
\dfrac{C \vDash a \sim \tau \qquad C;\ \Gamma \vdash \tau \lhd W \Rightarrow e}
      {C;\ \Gamma \vdash a \lhd W \Rightarrow e}
\]

Match refinement: C; Γ ⊢ e; τᵢ; K; τ ⊲ W ⇒ m

\[
\textsc{refine-gadt-match}\quad
\dfrac{\begin{array}{c}
K{:}\forall\overline{a_i}^{i}.\,C' \Rightarrow \overline{\tau_k}^{k} \to T\,\overline{a_i}^{i} \in \Gamma \qquad
W|_{e}^{K} = \{S_j \mapsto \chi_j\}^{j} \qquad
\theta = [\overline{\tau_i/a_i}^{i}] \\
\overline{S_j e \longrightarrow^{*} K\,\overline{v_k^{j}}^{k}}^{j} \qquad
\text{fresh } \overline{x_k}^{k} \\
C \cup \theta C';\ \overline{x_k{:}\theta\tau_k}^{k}, \Gamma \vdash \tau \lhd \{[\overline{v_k^{j}/x_k}^{k}]S_j \mapsto \chi_j\}^{j} \Rightarrow e'
\end{array}}
{C;\ \Gamma \vdash e;\ \overline{\tau_i}^{i};\ K;\ \tau \lhd W \Rightarrow K\,\overline{x_k}^{k} \to e'}
\]

Binding refinement: Γ ⊢ σ ⊲ X ⇒ b

\[
\textsc{refine-gadt-binding}\quad
\dfrac{\text{fresh } x \qquad \mathsf{ftvs}(\tau) \subseteq \overline{a_i}^{i} \qquad
\cdot;\ \overline{c_k{:}a_k}^{k}, \Gamma \vdash \tau \lhd \{\cdot \mapsto \chi_j\}^{j} \Rightarrow e}
{\Gamma \vdash \forall\overline{a_i}^{i}.\,\tau \lhd \forall\overline{c_k{:}a_k}^{k}.\,\overline{\chi_j}^{j} \Rightarrow x = e}
\]

Figure 8. Extensions to synthesis system for GADTs.

Finally, as with term generation, we need to use the constraints in refinement to refine the goal type when appropriate. The unification rule for refinement (refine-gadt-unify) operates identically to its generation counterpart but at type variables: at a type variable a, we can use a's assigned type as recorded by C instead.

As an example of the system in action, consider how constraints are generated and utilized while synthesizing part of the eval function introduced at the beginning of this section. After an initial application of refine-gadt-binding and refine-gadt-lam, we arrive at the following refinement state:

    eval e = □ :: a ⊲ W
    Γ = eval : ∀a. Exp a → a, e : Exp a

At this point, we can apply the refine-gadt-case rule to synthesize a case analysis on e. Let us focus on the Plus constructor, which has type:

    ∀b. (b ∼ Int) ⇒ Exp Int → Exp Int → Exp b.

(We α-rename the constructor's bound type variable to b in order to distinguish it from the type variable a from the overall type of the function.) Unifying the return type of Plus with the type of e yields the type substitution θ = [a/b], which then yields the final constraint a ∼ Int. This constraint is added to the set of constraints when synthesizing the body of Plus's match at goal type a. The constraint is then passed to term generation, where gen-unify uses it to change the type of the goal from a to Int, which in turn justifies synthesizing an application of the (+) function. When we go to generate recursive function calls to eval, the constraint is also utilized when instantiating the polymorphic type of eval to Exp a → a with a ∼ Int.

3.1 Metatheory

There are two properties that we ought to consider of the synthesis system we have developed so far:

• Soundness states that synthesized programs are consistent with their specification.
• Completeness states that we are able to synthesize all programs that meet a particular specification.

Here, we briefly discuss these two properties. A summary of the complete system as well as detailed proofs can be found in the extended version of this paper [18].

Soundness. The kinds of specification we consider in this system are types and examples. Thus, we can consider soundness with respect to each. (For brevity's sake, we only state the relevant properties for expression refinement. Similar statements must be made for each of the remaining judgments of the system: match synthesis and term generation.)

Lemma 3.1 (Type Soundness). If C; Γ ⊢ τ ⊲ W ⇒ e then C; Γ ⊢ e : τ.

Proof Sketch. By induction on the synthesis derivation. Intuitively, we synthesize terms in a type-directed manner (the synthesis rules are derived directly from the typing rules of the system), so we expect these terms to be well-typed. □

Lemma 3.2 (Example Soundness). If C; Γ ⊢ τ ⊲ W ⇒ e then for all S ↦ χ ∈ W, S e ≡β χ.
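To make the statement of Lemma 3.2 concrete, the following sketch checks candidate programs against a set of examples in the way the lemma describes. It is only an illustration and not part of the formal system or of Scythe: the type A, the record Example, and the functions candidate, alwaysDefault, and satisfies are hypothetical names. The polymorphic constants a1 and a2 of the fromMaybe examples from subsection 2.3 are modeled as two distinct values of A, and Haskell's (==) stands in for ≡β.

```haskell
-- Polymorphic constants a1 and a2, modeled as opaque, distinct values
-- (an assumption of this sketch).
data A = A1 | A2
  deriving (Eq, Show)

-- One example: the bindings for the two arguments (playing the role of the
-- environment S) together with the expected result (playing the role of chi).
data Example = Example
  { argDefault :: A
  , argMaybe   :: Maybe A
  , expected   :: A
  }

-- The two examples under consideration:
--   fromMaybe a1 Nothing   = a1
--   fromMaybe a1 (Just a2) = a2
examples :: [Example]
examples =
  [ Example A1 Nothing   A1
  , Example A1 (Just A2) A2
  ]

-- A candidate program of the goal type a -> Maybe a -> a.
candidate :: a -> Maybe a -> a
candidate s1 m1 = case m1 of
  Nothing -> s1
  Just a1 -> a1

-- A second well-typed candidate that ignores its second argument.
alwaysDefault :: a -> Maybe a -> a
alwaysDefault s1 _ = s1

-- A program satisfies the examples when running it on the arguments of
-- every example yields that example's expected value.
satisfies :: (A -> Maybe A -> A) -> [Example] -> Bool
satisfies f = all (\ex -> f (argDefault ex) (argMaybe ex) == expected ex)
```

Here satisfies candidate examples holds, while satisfies alwaysDefault examples does not, because the second example expects a2 rather than a1. Both candidates are well-typed, so only the examples distinguish them.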

Proof Sketch. By induction on the synthesis derivation. During refinement, we decompose examples in such a way that example satisfaction of sub-components results in example satisfaction of the whole program. During term generation, the refine-gadt-guess rule ensures example satisfaction directly for generated terms. □

Completeness. This property states that our synthesis system is capable of generating all possible programs given a fixed goal type and set of examples.

Lemma 3.3 (Completeness). If C; Γ ⊢ e : τ and Γ ⊢ W : τ then C; Γ ⊢ τ ⊲ W ⇒ e.

However, our system does not obey this property, in the name of efficiency! For example, note that gen-app does not allow for the generation of case expressions in the position of arguments.

But such cases are not terribly interesting with respect to completeness; a case nested in an argument position of an application could instead be "bubbled up" to encase the entire application (at the cost of code redundancy). Are there more interesting programs that the system is incapable of synthesizing? We leave this exploration to future work.

4 Integrating Constraint-based Synthesis in Scythe

We have implemented constraint-based synthesis for polymorphic types as described in section 2 in an in-development program synthesis tool called Scythe. Scythe supports live, type-directed development for the Haskell programming language by extending prior work on GHC for typed holes [7]. The typed holes facility of GHC allows users to annotate their program with holes. During typechecking, GHC will report the inferred type of each hole and suggest possible variables from the context to fill these holes. The search process currently in GHC is fairly naïve, only suggesting individual components whose type matches the goal. Scythe provides a far more in-depth search of the space of possible programs using the program synthesis techniques described in this work.

To do this, Scythe leverages the unmodified GHC API to interact with existing Haskell code with a high degree of compatibility. It uses GHC to load external libraries and to parse, typecheck, interpret, and manipulate Haskell code. On the surface, this seems difficult to do since the type-and-example-directed program synthesis process interweaves a number of compilation processes, in particular, typechecking and partial evaluation. However, with some engineering techniques and compromises, we can adopt off-the-shelf compiler APIs to provide this rich editing experience, similar to the efforts of Merlin for OCaml [2].

As an example of the system in operation, here is the declaration of the fromMaybe example from subsection 2.3 as realized in the prototype version of Scythe:

    --
    -- | {@
    -- fromMaybe :: a -> Maybe a -> a
    -- fromMaybe a1 Nothing = a1
    -- fromMaybe a1 (Just a2) = a2
    -- @@
    -- ctx=(Just, Nothing)
    -- @}
    fromMaybe :: a -> Maybe a -> a
    fromMaybe s1 m1 = _

Scythe looks for special {@...@} sections in Haddock comments which denote Scythe example blocks. These example blocks contain type-and-example information as well as options to control the synthesizer, such as the ctx option that allows the user to specify the components to consider during synthesis. Note that like Haskell, Scythe infers the universal quantifier for polymorphic example values; all variables of the form ak, where a is a type variable introduced by the function signature and k is a natural number, are considered to be polymorphic constants of type a.

After finding the hole, Scythe offers the following single suggestion to complete it:

    > (ok) case m1 of
    >        Nothing -> s1
    >        Just a1 -> a1

4.1 Experience with Polymorphic Synthesis

To date, we have performed a preliminary investigation of how type-and-example-driven polymorphic synthesis performs within Scythe over small toy functions. We reflect on our experiences and lessons learned so far from this investigation that are motivating our future work in this area.

Difficulties with Examples. Polymorphic example values give us great flexibility in how we specify goals of polymorphic type. However, as discussed in subsection 2.3, a collection of polymorphic example values can quickly become the actual function definition itself! This demonstrates the power of combining polymorphism with example-based reasoning, but it makes the story for a tool like Scythe less compelling at first glance.

This is certainly true when we are forced to give many examples in order to appease the synthesizer. Early synthesizers such as Myth had the problematic requirement of trace completeness: enough examples must be provided so that the synthesizer can carry out execution of any recursive function call made during the synthesis process without having the entire function formed.

However, Scythe lifts this restriction by not immediately rejecting candidate programs that fail due to a lack of examples. A detailed discussion of this feature is beyond the scope of this paper, but the importance of this fact is that Scythe can produce meaningful output with little-to-no examples! As a result, polymorphic example values become much more appealing to use. For example, Figure 9 shows how the system only requires a single example to synthesize the stutter and append functions over lists. Note how the examples (1) in contrast to fromMaybe, do not so obviously imply the implementation of the function and (2) resemble the examples that you might naturally write in documentation.

    -- | {@
    -- stutter :: [a] -> [a]
    -- stutter [a1, a2] = [a1, a1, a2, a2]
    -- @@
    -- recArg=0
    -- @}
    stutter :: [a] -> [a]
    stutter l = _

    > (ok) case l1 of
    >        [] -> l1
    >        (:) a1 l2 -> (:) a1 ((:) a1 (stutter l2))

    -- | {@
    -- append :: [a] -> [a] -> [a]
    -- append [a1, a2] [a3, a4, a5] =
    --   [a1, a2, a3, a4, a5]
    -- @@
    -- recArg=0
    -- @}
    append :: [a] -> [a] -> [a]
    append l1 l2 = _

    > (ok) case l1 of
    >        [] -> l2
    >        (:) a1 l3 -> (:) a1 (append l3 l2)

Figure 9. Example Scythe runs for stutter and append.

The Power of Free Theorems. The guarantees of polymorphic types are well-known [26] and are one of the primary motivations for our work in automating type-directed programming with synthesis techniques. Parametricity tells us that the very shape of a polymorphic type determines very precisely the set of programs that inhabit that type. We witness this in Scythe when we provide no examples to the system and rely on raw-term enumeration. (By default, Scythe limits the depth of nested function application and cases to avoid exponential blow-up in the search space.) For example, here is the output from Scythe when no examples are given for fromMaybe:

    > (?) case m1 of
    >       Nothing -> s1
    >       Just a1 -> a1
    > (?) case m1 of
    >       Nothing -> s1
    >       Just a1 -> s1
    > (?) s1

The question marks in the output denote that while each reported program has the right type, it has not been found to be consistent with any examples (since we have not provided any!). However, because polymorphism leaves so few programs to report, the user can simply review each implementation to see if it matches the behavior they want for the function. This stands in contrast to the shallower search that GHC's hole-filling mechanism conducts, which only suggests s1 as a valid completion (as of GHC version 8.6.4).

It might be the case that some polymorphic function of interest is too complicated to be specified with examples. This example shows that synthesis techniques like those used by Scythe still have the potential to provide meaningful feedback, even when only types drive the synthesis process.

5 Further Extensions

A constraint-based approach to program synthesis opens the doors to a number of additional features that can greatly expand the scope and power of the system. We briefly reflect on them, suggesting next steps forward based on the techniques developed in this paper.

Existential Types. One glaring omission from the system described in Figure 4 is the lack of existential types. In Haskell, existential types are encoded as universally quantified type variables in a constructor's type that are not mentioned in the output type of the constructor. The classic example of this is a type used to box elements of a heterogeneous list:

    data Obj where Obj :: Show a => a -> Obj

From this, we can create our heterogeneous list from a regular list of Obj, e.g., [Obj 1, Obj "hi", Obj True]. With this setup, we have no way of extracting the values from their Obj wrappers (the Obj effectively seals the type variable a from the outside world), but the typeclass constraint allows us to traverse and show the values.

When working with GADT types as in refine-gadt-case and refine-gadt-data, we must take care to identify which of the type variables are truly universal (those that appear in the output type of the constructor) versus those that are existential (those that do not appear in the output type of the constructor). For example, if we were pattern matching on a value with type Obj, its sole match would contain a binder for the existentially bound variable a. However, we must ensure that we do not unify that variable with a concrete type as its type is really sealed! This can be accomplished by calculating via a free variable check which type variables are universal or existential and then skolemizing the existential variables so that they cannot participate in the unification process.

On top of this, we must also decide how to deal with examples at existential type. Can we get away with representing existential types with polymorphic constants and relying on skolemization to assign those constants appropriate types during synthesis, or do we need a new example form for existential values?

Typeclass Constraints. GADTs are only one way that type constraints are introduced in a Haskell program. More common are typeclass constraints. For example, the standard association list lookup function of type

    lookup :: Eq a => a -> [(a, b)] -> Maybe b

constrains its first argument of type a to be an instance of the Eq typeclass, which in turn provides the method (==) to compare values of type a. Typeclasses are ubiquitous in Haskell code, especially in highly polymorphic code that mixes typeclass constraints with multi-parameter typeclasses and functional dependencies [10].

While seemingly unrelated at first glance, Vytiniotis et al. demonstrate how the problems of typeclass constraints and GADT inference can both be solved through their constraint-based type inference system, OutsideIn(X), and implication constraints [25]. In order to add typeclass constraints to our synthesis system, we will likely need to adapt more of their constraint representation and solving techniques.

Partial Type Information. One of the goals of this paper is taking the first steps towards understanding how to marry together constraint-based type inference techniques with program synthesis. However, oddly enough, even though we have adopted a constraint-based approach to synthesis in this work, we have still kept the problems of type inference and program synthesis rather separate! In particular, because we require that the top-level binding of our synthesis problem possesses a fully annotated type, we only need to perform type inference in a very limited setting (instantiating polymorphic types during term generation), which greatly simplified our approach.

A more ambitious marriage of constraint-based type inference and program synthesis would not only leverage constraint-based reasoning techniques in synthesis but truly intertwine type inference and program synthesis. This would involve removing restrictions of having concrete types everywhere in the system (e.g., the scrutinee of refine-case) and reasoning more about how to incrementally refine goal types in tandem with program discovery. Such a change would likely impact performance of the synthesizer significantly but give greater flexibility to the user in terms of specifying a program of interest when the types become complicated.

6 Related Work

Program Synthesis with Types. Program synthesis, the automatic generation of programs from specification, is an enormous field encompassing the programming languages, formal verification, software engineering, and most recently the machine learning communities [8]. Many approaches are used to tackle this problem in a variety of contexts from general-purpose programming to highly-specific domains. Utilizing types as a guiding principle for synthesis is a small, yet important, piece of the puzzle, especially as it pertains to richly-typed functional programming languages and type-directed programming. These approaches span the simply-typed, general-purpose domain [1, 19], components [4, 5, 9, 12], and more richly-typed domains [6, 22]. Many of these prior efforts handle polymorphism but in an ad hoc manner or using an "enumerate all possible types" approach. The approach that we present in the paper is the first, to our knowledge, to attempt an "infer while enumerate" approach to type instantiation.

Type Inference and GADTs. Our adaptation of type inference is heavily rooted in the constraint-based approach proposed by Odersky et al. with their HM(X) framework [14] as well as the presentation of constraint-based typing found in TAPL [20].

GADT type inference, while an undecidable problem [24], has enjoyed continuous refinement in order to develop inference that is tractable but also produces useful results [3, 10, 11, 13, 23–25]. While our system does not perform GADT type inference directly, its approach to GADT type checking and constraint passing is inspired by Vytiniotis's OutsideIn(X) framework [25].

Live Programming. Our exploration of constraint-based, type-directed program synthesis lives in the context of live programming support for type-directed programming. This exploration has begun to take shape in the PL community in several forms. The first is the support for language servers which provide efficient compilation services to live programming tools [2]. The second is support for hole-based development that was exclusively found in dependently-typed programming languages such as Coq, Agda, and Idris, but is now found in languages like Haskell [7]. The third is theoretical explorations of hole-based, interactive development [15, 16].

Acknowledgments

We thank the TyDe 2019 reviewers for their valuable feedback in improving this manuscript and the students of the Grinnell Pioneer PL research group that have worked on Scythe in its various forms over the last few years: Bogdan Abaev, List Berkowitz, Kevin Connors, Shelby Frazier, Reily Grant, Andrew Mack, Griffin Mareske, Ankit Pandey, Dhruv Phumbhra, Jonathan Sadun, Zachary Segall, and Zachary Susag.

This material is based upon work supported by the National Science Foundation under Grant No. 1651817. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

References

[1] Lennart Augustsson. 2004. [Haskell] Announcing Djinn, Version 2004-12-11, a Coding Wizard.
[2] Frédéric Bour, Thomas Refis, and Gabriel Scherer. 2018. Merlin: A Language Server for OCaml (Experience Report). Proceedings of the ACM on Programming Languages 2, ICFP (July 2018), 103:1–103:15. https://doi.org/10.1145/3236798
[3] Sheng Chen and Martin Erwig. 2016. Principal Type Inference for GADTs. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2016). ACM Press, St. Petersburg, FL, USA, 416–428. https://doi.org/10.1145/2837614.2837665
[4] Yu Feng, Ruben Martins, Yuepeng Wang, Isil Dillig, and Thomas W. Reps. 2017. Component-Based Synthesis for Complex APIs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM Press, Paris, France, 599–612. https://doi.org/10.1145/3009837.3009851
[5] John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing Data Structure Transformations from Input-Output Examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015). ACM Press, Portland, OR, USA, 229–239. https://doi.org/10.1145/2737924.2737977
[6] Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. 2016. Example-Directed Synthesis: A Type-Theoretic Interpretation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2016). ACM, St. Petersburg, FL, USA. https://doi.org/10.1145/2914770.2837629
[7] Matthías Páll Gissurarson. 2018. Suggesting Valid Hole Fits for Typed-Holes (Experience Report). In Proceedings of the 11th ACM SIGPLAN International Symposium on Haskell (Haskell 2018). ACM Press, St. Louis, MO, USA, 179–185. https://doi.org/10.1145/3242744.3242760
[8] Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh. 2017. Program Synthesis. Foundations and Trends in Programming Languages 4, 1-2 (2017), 1–119. https://doi.org/10.1561/2500000010
[9] Tihomir Gvero, Viktor Kuncak, Ivan Kuraj, and Ruzica Piskac. 2013. Complete Completion Using Types and Weights. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2013), Vol. 48. ACM Press, New York, NY, USA, 27–38. https://doi.org/10.1145/2499370.2462192
[10] Mark P. Jones. 1995. Functional Programming with Overloading and Higher-Order Polymorphism. In Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques-Tutorial Text. Springer-Verlag, Berlin, Heidelberg, 97–136.
[11] Georgios Karachalias, Tom Schrijvers, Dimitrios Vytiniotis, and Simon Peyton Jones. 2015. GADTs Meet Their Match: Pattern-Matching Warnings That Account for GADTs, Guards, and Laziness. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM Press, Vancouver, BC, Canada, 424–436. https://doi.org/10.1145/2784731.2784748
[12] Susumu Katayama. 2012. An Analytical Inductive Functional Programming System That Avoids Unintended Programs. In Proceedings of the ACM SIGPLAN 2012 Workshop on Partial Evaluation and Program Manipulation (PEPM 2012). ACM Press, Philadelphia, Pennsylvania, USA, 43. https://doi.org/10.1145/2103746.2103758
[13] Gabriela Moreira, Cristiano Vasconcellos, and Rodrigo Ribeiro. 2018. Type Inference for GADTs, OutsideIn and Anti-Unification. In Proceedings of the XXII Brazilian Symposium on Programming Languages (SBLP 2018). ACM Press, Sao Carlos, Brazil, 51–58. https://doi.org/10.1145/3264637.3264644
[14] Martin Odersky, Martin Sulzmann, and Martin Wehr. 1999. Type Inference with Constrained Types. Theory and Practice of Object Systems 5, 1 (Jan. 1999), 35–55. https://doi.org/10.1002/(SICI)1096-9942(199901/03)5:1<35::AID-TAPO4>3.0.CO;2-4
[15] Cyrus Omar, Ian Voysey, Ravi Chugh, and Matthew A. Hammer. 2019. Live Functional Programming with Typed Holes. Proceedings of the ACM on Programming Languages 3, POPL (Jan. 2019), 1–32. https://doi.org/10.1145/3290327
[16] Cyrus Omar, Ian Voysey, Michael Hilton, Jonathan Aldrich, and Matthew A. Hammer. 2017. Hazelnut: A Bidirectionally Typed Structure Editor Calculus. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM Press, Paris, France, 86–99. https://doi.org/10.1145/3009837.3009900
[17] Peter-Michael Osera. 2015. Program Synthesis with Types. PhD Thesis. University of Pennsylvania, Philadelphia, PA.
[18] Peter-Michael Osera. 2019. Constraint-Based Type-Directed Program Synthesis. arXiv:1907.03105 [cs] (July 2019). arXiv:cs/1907.03105
[19] Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-Example-Directed Program Synthesis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015). ACM Press, Portland, OR, USA, 619–630. https://doi.org/10.1145/2737924.2738007
[20] Benjamin C. Pierce. 2002. Types and Programming Languages. MIT Press, Cambridge, MA.
[21] Benjamin C. Pierce and David N. Turner. 2000. Local Type Inference. ACM Transactions on Programming Languages and Systems 22, 1 (Jan. 2000), 1–44. https://doi.org/10.1145/345099.345100
[22] Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2016). ACM Press, Santa Barbara, CA, USA, 522–538. https://doi.org/10.1145/2908080.2908093
[23] Tom Schrijvers, Simon Peyton Jones, Martin Sulzmann, and Dimitrios Vytiniotis. 2009. Complete and Decidable Type Inference for GADTs. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP 2009). ACM Press, Edinburgh, Scotland, 341. https://doi.org/10.1145/1596550.1596599
[24] Vincent Simonet and François Pottier. 2007. A Constraint-Based Approach to Guarded Algebraic Data Types. ACM Transactions on Programming Languages and Systems 29, 1 (Jan. 2007), 1–es. https://doi.org/10.1145/1180475.1180476
[25] Dimitrios Vytiniotis, Simon Peyton Jones, Tom Schrijvers, and Martin Sulzmann. 2011. OutsideIn(X) Modular Type Inference with Local Assumptions. Journal of Functional Programming 21, 4-5 (Sept. 2011), 333–412. https://doi.org/10.1017/S0956796811000098
[26] Philip Wadler. 1989. Theorems for Free! In Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture (FPCA 1989). ACM Press, Imperial College, London, United Kingdom, 347–359. https://doi.org/10.1145/99370.99404