Learning in System F

Joey Velez-Ginorio, Massachusetts Institute of Technology, Cambridge, Massachusetts, U.S.A. ([email protected])
Nada Amin, Harvard University, Cambridge, Massachusetts, U.S.A. ([email protected])

Abstract
Program synthesis, type inhabitance, inductive programming, and theorem proving. Different names for the same problem: learning programs from data. Sometimes the programs are proofs, sometimes they're terms. Sometimes data are examples, and sometimes they're types. Yet the aim is the same. We want to construct a program which satisfies some data. We want to learn a program. What might a programming language look like, if its programs could also be learned? We give it data, and it learns a program from it. This work shows that System F yields an approach for learning from types and examples. Beyond simplicity, System F's expressivity gives us the potential for two key ideas. (1) Learning as a first-class activity. That is, learning can happen anywhere in a program. (2) Bootstrapping its learning to higher-level languages—much the same way we bootstrap the capabilities of a functional core to modern, high-level functional languages. This work takes some first steps in that direction of the design space.

Keywords: Program Synthesis, Inductive Programming

data Nat = Zero
         | S Nat

succ :: Nat -> Nat
succ n = [,]

main :: Nat -> Nat
main = succ

succ :: Nat -> Nat
succ n = case n [Nat] of
  Zero -> S Zero
  S n2 -> S (S n2)

lam x: (forall R. (Unit -> R) -> (R -> R) -> R).
  forall R.
    lam c0: (Unit -> R).
      (lam c1: (R -> R).
        (c1 (((x R) c0) c1)))

Figure 1. An example learning problem, the learned solution in pretty print, and the learned solution in System F.

1 Introduction

1.1 A tricky learning problem
Imagine we're teaching you a program. Your only data is the type 푛푎푡 → 푛푎푡. It takes a natural number, and returns a natural number. Any ideas? Perhaps a program which computes...

푓(푥) = 푥,  푓(푥) = 푥 + 1,  푓(푥) = 푥 + · · ·

The good news is that 푓(푥) = 푥 + 1 is correct. The bad news is that the data let you learn a slew of other programs too. It doesn't constrain learning enough if we want to teach 푓(푥) = 푥 + 1. As teachers, we can provide better data.

Round 2. Imagine we're teaching you a program. But this time we give you an example of the program's behavior. Your data are the type 푛푎푡 → 푛푎푡 and an example 푓(1) = 2. It takes a natural number, and seems to return its successor. Any ideas? Perhaps a program which computes...

푓(푥) = 푥 + 1,  푓(푥) = 푥 + 2 − 1,  푓(푥) = 푥 + · · ·

The good news is that 푓(푥) = 푥 + 1 is correct. And so are all the other programs, as long as we're agnostic to some details. Types and examples impose useful constraints on learning. They are the data we use when learning in System F [Girard et al. 1989].

Existing work can learn successor from similar data [Osera 2015; Polikarpova et al. 2016]. But suppose 푛푎푡 is a Church encoding. For some base type 퐴, 푛푎푡 ≔ (퐴 → 퐴) → (퐴 → 퐴). Natural numbers are then higher-order functions. They take and return functions. In this setting, existing work can no longer learn successor.

1.2 A way forward
The difficulty is with how to handle functions in the return type. The type 푛푎푡 → 푛푎푡 returns a function, a program of type 푛푎푡.
To learn correct programs, you need to ensure candidates are the correct type or that they obey examples.
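To make the problem concrete, here is a minimal Haskell sketch (ours, not code from the paper) of the Church encoding 푛푎푡 ≔ (퐴 → 퐴) → (퐴 → 퐴) from Section 1.1. Successor takes and returns functions, so checking an example like 푓(1) = 2 means comparing functions, which we can only do at arguments we choose.

-- A minimal Haskell sketch (ours, not the paper's code) of the Church
-- encoding nat := (A -> A) -> (A -> A) from Section 1.1, for a base type a.
type Church a = (a -> a) -> (a -> a)

zero, one, two :: Church a
zero _ = id        -- apply the step function zero times
one  s = s         -- apply it once
two  s = s . s     -- apply it twice

-- Successor is higher-order: it takes a function and returns a function.
-- (Named suc to avoid clashing with Prelude's succ.)
suc :: Church a -> Church a
suc n s = s . n s

-- `suc one == two` is not even well typed: functions have no general
-- equality. We can only observe both sides at arguments we pick ourselves.
main :: IO ()
main = print (suc one (+ 1) (0 :: Int) == two (+ 1) 0)   -- True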

Imagine we want to verify that our candidate program 푓 obeys 푓(1) = 2. With the Church encoding, 푓(1) is a function, and so is 2. To check 푓(1) = 2 requires that we decide function equality—which is undecidable in a Turing-complete language [Sipser et al. 2006]. Functions in the return type create this issue. There are two ways out.

1. Don't allow functions in the return type, keep Turing-completeness.
2. Allow functions in the return type, leave Turing-completeness.

Route 1 is the approach of existing work. They don't allow functions in the return type, but keep an expressive Turing-complete language for learning. This can be a productive move, as many interesting programs don't return functions.

Route 2 is the approach we take. We don't impose restrictions on the types or examples we learn from. We instead sacrifice Turing-completeness. We choose a language where function equality is decidable, but which is still expressive enough to learn interesting programs. Our work shows that this too is a productive move, as many interesting programs return functions. This route leads us to several contributions:

• Detail how to learn arbitrary higher-order programs in System F. (Sections 2 & 3)
• Prove the soundness and completeness of learning. (Sections 2 & 3)
• Implement learning, extending strong theoretical guarantees into practice. (Sections 4 & 5)

2 System F
We assume you are familiar with System F, the polymorphic lambda calculus. You should know its syntax, typing, and evaluation. If you don't, we co-opt its specification from [Pierce 2002]. For a comprehensive introduction we defer the confused or rusty there. Additionally, we provide the specification and relevant theorems in the appendix.

Our focus in this section is to motivate System F: its syntax, typing, and evaluation, and why properties of each are advantageous for learning. Treat this section as an answer to the following question: Why learn in System F?

2.1 Syntax
System F's syntax is simple (Figure 2). There aren't many syntactic forms. Whenever we state, prove, or implement things in System F we often use structural recursion on the syntax. A minimal syntax means we are succinct when we state, prove, or implement those things.

While simple, the syntax is still expressive. We can encode many staples of typed functional programming: algebraic data types, inductive types, and more [Pierce 2002]. For example, consider this encoding of products:

휏1 × 휏2 ≔ ∀훼. (휏1 → 휏2 → 훼) → 훼
⟨푒1, 푒2⟩ ≔ Λ훼. 휆푓:(휏1 → 휏2 → 훼). 푓 푒1 푒2
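As a quick illustration of that encoding, here is a Haskell sketch (ours, assuming GHC's RankNTypes extension; not code from the paper's implementation) of the product type above and its projections.

{-# LANGUAGE RankNTypes #-}

-- A sketch (ours) of the product encoding from Section 2.1:
-- t1 * t2 := forall a. (t1 -> t2 -> a) -> a
type Pair t1 t2 = forall a. (t1 -> t2 -> a) -> a

-- <e1, e2> := /\a. \f:(t1 -> t2 -> a). f e1 e2
pair :: t1 -> t2 -> Pair t1 t2
pair e1 e2 = \f -> f e1 e2

-- The projections just pick which component the continuation returns.
first :: Pair t1 t2 -> t1
first p = p (\x _ -> x)

second :: Pair t1 t2 -> t2
second p = p (\_ y -> y)

main :: IO ()
main = print (first (pair (1 :: Int) "two"), second (pair (1 :: Int) "two"))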

2.2 Typing
System F is safe. Its typing ensures both progress and preservation, i.e. that well-typed programs do not get stuck and that they do not change type [Pierce 2002]. When we introduce learning, we lift this safety and extend it to the programs we learn. We omit the presentation of typing, as it's equivalent to the learning relation presented in the next section.

2.3 Evaluation
System F is strongly normalizing. All its programs terminate. As a result, we can use a simple procedure for deciding equality of programs (including functions).

1. Run both programs until they terminate.
2. Check if they share the same normal form, up to alpha-equivalence (renaming of variables).
3. If they do, they are equal. Otherwise, unequal.
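Here is a sketch of that procedure in Haskell (ours, not the paper's implementation). It works over untyped lambda terms with de Bruijn indices, so alpha-equivalence becomes plain structural equality; termination relies on the input coming from a strongly normalizing calculus such as System F.

-- A sketch (ours) of the normalize-then-compare equality check of Section 2.3,
-- on untyped lambda terms with de Bruijn indices.
data Term = Var Int | Lam Term | App Term Term
  deriving (Eq, Show)

-- shift d c t: add d to every variable in t whose index is >= cutoff c
shift :: Int -> Int -> Term -> Term
shift d c (Var k)   = Var (if k >= c then k + d else k)
shift d c (Lam b)   = Lam (shift d (c + 1) b)
shift d c (App f a) = App (shift d c f) (shift d c a)

-- subst j s t: replace variable j with s in t
subst :: Int -> Term -> Term -> Term
subst j s (Var k)   = if k == j then s else Var k
subst j s (Lam b)   = Lam (subst (j + 1) (shift 1 0 s) b)
subst j s (App f a) = App (subst j s f) (subst j s a)

-- Full beta-normalization; fine here because every program terminates.
normalize :: Term -> Term
normalize (Var k)   = Var k
normalize (Lam b)   = Lam (normalize b)
normalize (App f a) =
  case normalize f of
    Lam b -> normalize (shift (-1) 0 (subst 0 (shift 1 0 a) b))
    f'    -> App f' (normalize a)

-- Two programs are equal iff their normal forms coincide.
equalPrograms :: Term -> Term -> Bool
equalPrograms s t = normalize s == normalize t

-- The example from Section 2.3: \x.x  vs  (\y.y) (\z.z)
main :: IO ()
main = print (equalPrograms (Lam (Var 0)) (App (Lam (Var 0)) (Lam (Var 0))))   -- True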

Syntax

푒 ::=                      terms:
    푥                      variable
    푒1 푒2                  application
    휆푥:휏.푒                 abstraction
    푒 ⌈휏⌉                  type application
    Λ훼.푒                   type abstraction

푣 ::=                      values:
    휆푥:휏.푒                 abstraction
    Λ훼.푒                   type abstraction

휏 ::=                      types:
    휏1 → 휏2                function type
    ∀훼.휏                   polymorphic type
    훼                      type variable

Γ ::=                      contexts:
    ·                      empty
    푥:휏, Γ                 variable
    훼, Γ                   type variable

Figure 2. Syntax in System F

For example, this decision procedure renders these programs equal:

휆푥:휏.푥  =훽  (휆푦:(휏 → 휏).푦) 휆푧:휏.푧

The decision procedure checks that two programs exist in the transitive reflexive closure of the evaluation relation. This only works because programs always terminate in System F [Girard et al. 1989].

3 Learning from Types
We present learning as a relation between contexts Γ, programs 푒, and types 휏.

Γ ⊢ 휏 ⇝ 푒

The relation asserts that given a context Γ and type 휏, you can learn program 푒. Unlike typing, we only consider programs 푒 in normal form. We discuss later how this pays dividends in the implementation. With reference to the syntax in the appendix, we define System F programs 푒 in normal form:

푒 ≔ 푒̂ | 휆푥:휏.푒 | Λ훼.푒
푒̂ ≔ 푥 | 푒̂ 푒 | 푒̂ ⌈휏⌉

3.1 Learning, a relation
If you squint, you may think that learning in Figure 4 looks a lot like typing in Figure 3. The semblance isn't superficial, but instead reflects an equivalence between learning and typing which we later state. Despite this, the learning relation isn't redundant. It forms the core of an extended learning relation in the next section, where we learn from both types and examples.

Like typing, we define the relation with a set of inference rules. These rules confer similar benefits to typing. We can prove useful properties of learning, and the rules guide implementation.

(L-Var) says if 푥 is bound to type 휏 in the context Γ, then you can learn the program 푥 of type 휏.

푥:휏 ∈ 푥:휏
――――――――――― (L-Var)
푥:휏 ⊢ 휏 ⇝ 푥

Typing Γ ⊢ 푒 : 휏

푥:휏 ∈ Γ
――――――――― (T-Var)
Γ ⊢ 푥 : 휏

Γ, 푥:휏1 ⊢ 푒2 : 휏2
―――――――――――――――――――――― (T-Abs)
Γ ⊢ 휆푥:휏1.푒2 : 휏1 → 휏2

Γ ⊢ 푒1 : 휏1 → 휏2    Γ ⊢ 푒2 : 휏1
――――――――――――――――――――――――――――――― (T-App)
Γ ⊢ 푒1 푒2 : 휏2

Γ, 훼 ⊢ 푒 : 휏
――――――――――――――――― (T-TAbs)
Γ ⊢ Λ훼.푒 : ∀훼.휏

Γ ⊢ 푒 : ∀훼.휏1
――――――――――――――――――――― (T-TApp)
Γ ⊢ 푒 ⌈휏2⌉ : [휏2/훼]휏1

Figure 3. Typing in System F

Learning Γ ⊢ 휏 ⇝ 푒

푥:휏 ∈ Γ
――――――――― (L-Var)
Γ ⊢ 휏 ⇝ 푥

Γ, 푥:휏1 ⊢ 휏2 ⇝ 푒2
―――――――――――――――――――――――― (L-Abs)
Γ ⊢ 휏1 → 휏2 ⇝ 휆푥:휏1.푒2

Γ ⊢ 휏1 → 휏2 ⇝ 푒̂    Γ ⊢ 휏1 ⇝ 푒
――――――――――――――――――――――――――――――― (L-App)
Γ ⊢ 휏2 ⇝ 푒̂ 푒

Γ, 훼 ⊢ 휏 ⇝ 푒
――――――――――――――――― (L-TAbs)
Γ ⊢ ∀훼.휏 ⇝ Λ훼.푒

Γ ⊢ ∀훼.휏1 ⇝ 푒̂
――――――――――――――――――――― (L-TApp)
Γ ⊢ [휏2/훼]휏1 ⇝ 푒̂ ⌈휏2⌉

Figure 4. Learning from types in System F

(L-Abs) says if 푥 is bound to type 휏1 in the context and you can learn a program 푒2 from type 휏2, then you can learn the program 휆푥:휏1.푒2 from type 휏1 → 휏2, and 푥 is removed from the context.

Γ, 푥:휏1 ⊢ 휏2 → 휏2 ⇝ 휆푦:휏2.푦
――――――――――――――――――――――――――――――――――― (L-Abs)
Γ ⊢ 휏1 → 휏2 → 휏2 ⇝ 휆푥:휏1.휆푦:휏2.푦

(L-App) says that if you can learn a program 푒̂ from type 휏1 → 휏2 and a program 푒 from type 휏1, then you can learn 푒̂ 푒 from type 휏2.

Γ ⊢ 휏 → 휏 ⇝ 푓    Γ ⊢ 휏 ⇝ 푥
――――――――――――――――――――――――――― (L-App)
Γ ⊢ 휏 ⇝ 푓 푥

(L-TAbs) says that if 훼 is in the context, and you can learn a program 푒 from type 휏, then you can learn a program Λ훼.푒 from type ∀훼.휏, and 훼 is removed from the context.

Γ, 훼 ⊢ 훼 → 훼 ⇝ 휆푥:훼.푥
―――――――――――――――――――――――――――― (L-TAbs)
Γ ⊢ ∀훼.훼 → 훼 ⇝ Λ훼.휆푥:훼.푥

(L-TApp) says that if you can learn a program 푒̂ from type ∀훼.휏1, then you can learn the program 푒̂ ⌈휏2⌉ from type [휏2/훼]휏1.

Γ ⊢ ∀훼.훼 → 훼 ⇝ 푓
―――――――――――――――――――― (L-TApp)
Γ ⊢ 휏 → 휏 ⇝ 푓 ⌈휏⌉

3.2 Metatheory
Learning is a relation. Hence we can discuss its metatheory. We care most about two properties: soundness and completeness. Soundness ensures we learn correct programs. Completeness ensures we learn all programs.

We state the relevant theorems in this section. Most of the heavy lifting for proving these properties is done by standard proofs about type systems, like progress and preservation. Learning exploits these properties of type systems to provide similar guarantees.

Lemma 3.1 (Soundness of Learning). If Γ ⊢ 휏 ⇝ 푒 then Γ ⊢ 푒 : 휏.

Lemma 3.2 (Completeness of Learning). If Γ ⊢ 푒 : 휏 then Γ ⊢ 휏 ⇝ 푒.

Structural induction on the learning and typing rules proves these two lemmas. And together, they directly prove the equivalence of typing and learning.

Theorem 3.3 (Equivalence of Typing and Learning). Γ ⊢ 휏 ⇝ 푒 if and only if Γ ⊢ 푒 : 휏.

Because of the equivalence we can extend strong metatheoretic guarantees to learning from examples, for which learning from types is a key step.
4 Learning from Examples
To learn from examples, we extend our learning relation to include examples [휒].

Γ ⊢ 휏 [휒] ⇝ 푒

The relation asserts that given a context Γ, a type 휏, and examples [휒], you can learn program 푒.

Examples are lists of tuples with the inputs and output to a program. For example, [⟨1, 1⟩] describes a program whose input is 1 and output is 1. If we want more than one example, we can add to the list: [⟨1, 1⟩, ⟨2, 2⟩]. And with System F, types are also valid inputs. So [⟨푁푎푡, 1, 1⟩, ⟨푁푎푡, 2, 2⟩] describes a polymorphic program instantiated at type 푁푎푡 whose input is 1 and output is 1. In general, an example takes the form

휒 ≔ ⟨푒, 휒⟩ | ⟨휏, 휒⟩ | ⟨푒, 푁푖푙⟩

Importantly, the syntax doesn't restrict what can be an input or output. Any program 푒 or type 휏 can be an input. Likewise, any program 푒 can be an output. We can describe any input-output relationship in the language. Note that we use the following shorthand notation for examples: ⟨휏, ⟨푒1, ⟨푒2, 푁푖푙⟩⟩⟩ ≡ ⟨휏, 푒1, 푒2⟩.

4.1 Learning, a relation
Unlike the previous learning relation, Figure 5 looks a bit foreign. Nevertheless, the intuition is simple. We demonstrate with learning polymorphic identity from examples. Without loss of generality, assume a base type 푁푎푡 and natural numbers.

· ⊢ ∀훼.훼 → 훼 [⟨푁푎푡, 1, 1⟩] ⇝ ■

Examples describe possible worlds. ⟨푁푎푡, 1, 1⟩ is a world where ■'s inputs are 푁푎푡 and 1, with an output of 1. Throughout learning we need a way to keep track of these distinct worlds. So our first step is always to duplicate ■, so that we have one per example. We also introduce empty let bindings to record constraints on variables at later steps. (L-Wrld) in Figure 5 formalizes this step.

· ⊢ ∀훼.훼 → 훼 [⟨푁푎푡, 1, 1⟩] ⇝ [let (·) 푖푛 ■]

Now, our target type is ∀훼.훼 → 훼. So we know ■ can bind a variable 훼 to some type. And since we have inputs in [⟨푁푎푡, 1, 1⟩], we know what that type is. (L-ETAbs) in Figure 5 formalizes this step.

훼 ⊢ 훼 → 훼 [⟨1, 1⟩] ⇝ [let (훼 = 푁푎푡) 푖푛 ■]

Our target type is now 훼 → 훼. So we know ■ can bind a variable 푥:훼 to some program. And since we have inputs in [⟨1, 1⟩], we know what that program is. (L-EAbs) in Figure 5 formalizes this step.

훼, 푥:훼 ⊢ 훼 [⟨1⟩] ⇝ [let (훼 = 푁푎푡, 푥:훼 = 1) 푖푛 ■]

There aren't any inputs left to add bindings to our possible worlds. Therefore, we invoke the relation for learning from types (Figure 4) to generate a candidate for ■. Then we check

Learning   Γ ⊢ 휏 [휒] ⇝ 푒

Γ ⊢ 휏 [휒1, . . . , 휒푛] ⇝ [let (·) in 푒1, . . . , let (·) in 푒푛]    ⋀ⁿᵢ₌₁ 푒푖 =훽 푒푛
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― (L-Wrld)
Γ ⊢ 휏 [휒1, . . . , 휒푛] ⇝ 푒푛

Γ ⊢ 휏 ⇝ 푒    ⋀ⁿᵢ₌₁ (Γ ⊢ let (푏푖) in 푒 : 휏)    ⋀ⁿᵢ₌₁ (let (푏푖) in 푒 =훽 휒푖)
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― (L-Base)
Γ ⊢ 휏 [⟨휒1⟩, . . . , ⟨휒푛⟩] ⇝ [let (푏1) in 푒, . . . , let (푏푛) in 푒]

Γ, 푥:휏푎 ⊢ 휏푏 [휒1, . . . , 휒푛] ⇝ [let (푏1, 푥:휏푎 = 푒1) in 푒1, . . . , let (푏푛, 푥:휏푎 = 푒푛) in 푒푛]
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― (L-EAbs)
Γ ⊢ 휏푎 → 휏푏 [⟨푒1, 휒1⟩, . . . , ⟨푒푛, 휒푛⟩] ⇝ [let (푏1) in 휆푥:휏푎.푒1, . . . , let (푏푛) in 휆푥:휏푎.푒푛]

Γ, 훼 ⊢ 휏푎 [휒1, . . . , 휒푛] ⇝ [let (푏1, 훼 = 휏푏) in 푒1, . . . , let (푏푛, 훼 = 휏푏) in 푒푛]
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― (L-ETAbs)
Γ ⊢ ∀훼.휏푎 [⟨휏푏, 휒1⟩, . . . , ⟨휏푏, 휒푛⟩] ⇝ [let (푏1) in Λ훼.푒1, . . . , let (푏푛) in Λ훼.푒푛]

Figure 5. Learning from examples in System F

that this candidate evaluates to the correct output in each possible world.

훼, 푥:훼 ⊢ 훼 ⇝ 푥        let (훼 = 푁푎푡, 푥:훼 = 1) 푖푛 푥 =훽 1

With our examples satisfied, we can trivially extract the program Λ훼.휆푥:훼.푥 from the nested let expression. Note that these expressions are merely a syntactic sugaring of System F programs.

let (·) 푖푛 푡 ≡ 푡
let (푥:휏 = 푠) 푖푛 푡 ≡ (휆푥:휏.푡) 푠
let (훼 = 휏) 푖푛 푡 ≡ (Λ훼.푡) ⌈휏⌉
let (푥:휏 = 푡1, 푏푠) 푖푛 푡2 ≡ let (푥:휏 = 푡1) 푖푛 let (푏푠) 푖푛 푡2
where 푏 ≔ · | 푥:휏 = 푒, 푏 | 훼 = 휏, 푏

4.2 Metatheory
As before, we care most about two properties: soundness and completeness. Soundness ensures we learn correct programs from examples. Completeness ensures we learn all programs from examples.

We state the relevant theorems in this section. This time, there is no equivalence to typing. As a result, the proofs are more distinct. We omit the proofs here, but they are not controversial. Completeness and soundness proofs for similar relations exist in [Osera 2015].

Lemma 4.1 (Soundness of Learning). If Γ ⊢ 휏 [휒1, . . . , 휒푛] ⇝ 푒 then Γ ⊢ 푒 : 휏.

Lemma 4.2 (Completeness of Learning). If Γ ⊢ 푒 : 휏 and there exist some [휒1, . . . , 휒푛] which 푒 satisfies, then Γ ⊢ 휏 [휒1, . . . , 휒푛] ⇝ 푒.

Learning from examples maintains strong metatheoretic properties. As a result, it forms the basis for a powerful theory for learning programs from examples. However, the relation is non-deterministic, and does not directly translate to an algorithm. Nevertheless, it does provide a blueprint for what that algorithm looks like—as we saw when learning polymorphic identity. In the next sections we explore the strength of this blueprint by implementation.

5 Implementation
Throughout this section we discuss the transition from theory to practice. In doing so, we answer two questions:

• Does our learning relation yield an algorithm for learning programs from examples? The answer is yes.
• Can we lift learning in System F to learn in higher-level languages? The answer is also yes, but complicated.

5.1 Language: System F + Sugar
Our implementation is System F with sugared Haskell-like syntax for features of modern functional languages: datatype declarations, function declarations, case analysis, and let bindings. The only new syntactic form we introduce is for specifying learning problems. Otherwise, we desugar, typecheck, learn, and run programs in System F (and in that order).

We use standard methods for desugaring these features to System F. Algebraic data types and case analysis are the most involved. Algebraic data are encoded as folds, and case analysis is syntactic sugar for providing the requisite functions for the fold [Girard et al. 1989; Jansen 2013]. As a result, our case syntax deviates slightly from what you may be familiar with. We illustrate with example code from our implemented language.

data Bool = True
          | False

data Nat = Zero
         | S Nat

isZero :: Nat -> Bool
isZero n = case n [Bool] of
  Zero -> True
  S n2 -> False

There are two key distinctions from the typical case syntax. The type application on the case argument, which specifies the return type of the case analysis. And the lack of a recursive function call in the case analysis. For inductive data like natural numbers, this recursive call is typical. Due to our encoding scheme, it's not necessary to recursively call the function. Instead, case analysis is syntactic sugar for defining the functions we use to fold over algebraic data. Under the hood, that case analysis looks like the following code snippet.

lam n:Nat. n [Bool]
  (lam _:Unit. True)
  (lam n2:Bool. False)
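For intuition, here is a Haskell sketch of the fold encoding (ours, assuming GHC's RankNTypes; the implementation's actual encoding may differ in detail). The S branch of a case receives the already-folded result, which is why no recursive call is needed.

{-# LANGUAGE RankNTypes #-}

-- A sketch (ours) of naturals as folds: a number is the function that folds
-- over itself, taking one handler per constructor.
newtype Nat = Nat (forall r. (() -> r) -> (r -> r) -> r)

zero :: Nat
zero = Nat (\z _ -> z ())

suc :: Nat -> Nat
suc (Nat n) = Nat (\z s -> s (n z s))

-- `case n [Bool] of { Zero -> True; S n2 -> False }` desugars to applying
-- the two handlers; no explicit recursion appears.
isZero :: Nat -> Bool
isZero (Nat n) = n (\_ -> True) (\_n2 -> False)

-- In the S handler, the argument is the fold of the predecessor, which is
-- how `mul` in Figure 7 can use `plus b c` without a recursive call.
toInt :: Nat -> Int
toInt (Nat n) = n (\_ -> 0) (+ 1)

main :: IO ()
main = print (isZero (suc zero), toInt (suc (suc zero)))   -- (False, 2)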
Learning specifications are first-class entities in the language, so they can appear anywhere a term can. If we encounter a learning specification during evaluation, we learn the program and then continue evaluation. In practice, we use a special declaration for learning specifications, since writing examples inside of other programs can quickly obfuscate code. The declaration is a type signature followed by a list of examples.

data Bool = True
          | False

data Nat = Zero
         | S Nat

isZero :: Nat -> Bool
isZero n = case n [Bool] of
  Zero -> True
  S n2 -> False

and :: Bool -> Bool -> Bool
and = [, , , ]

bothZero :: Nat -> Nat -> Nat
bothZero = [, ]

When using these declarations for learning, the code above the declaration sets the context for learning. For example, when learning and, the user-defined helper function isZero will be in context, and so will the constructors for Bool and Nat. Interestingly, once we learn and it becomes part of the learning context for bothZero. This means we can set up a curriculum (a sequence) of learning problems. In practice, we find that setting up a curriculum leads to far quicker synthesis. For instance, the program for bothZero is much simpler if you already know and. Therefore it benefits to set up a curriculum, as we did, where you learn and first.

5.2 Learning from examples
Learning from examples has two stages. The first stage collects constraints from your examples. The second stage learns a term from the type which satisfies those constraints.

Our implementation is a faithful rendition of the first stage, captured by rules L-Wrld, L-EAbs, and L-ETAbs in Figure 5. The algorithm for learning only spends a marginal amount of time at this stage, and follows closely the informal description given in Section 4.1.

It's worth noting that the formal rules here are also fairly simple. Early iterations of this work sought to find a minimal basis with which to express similar ideas in the type-directed synthesis literature [Osera 2015]. We pare down the incidental details in this work, and reformulate similar mechanisms in concise notation. For this stage in particular, this lends itself towards a near direct translation to our implementation.

Lastly, a key distinction in this work is that examples describe any input-output relationship in the language. Most related works impose restrictions on examples that break this property. As a result, learning isn't a first-class activity in the language. Learning can't happen in arbitrary spots in a program, because you can't have functions in the output.
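Here is a rough Haskell sketch of that first stage (ours, with simplified stand-in types and terms; the real implementation works over full System F). It peels the inputs of one example off the goal type, in the spirit of L-ETAbs and L-EAbs, producing a possible world: recorded bindings, a residual goal type, and the expected output.

-- A sketch (ours) of stage one: turning one example into a possible world.
data Ty = TVar String | Arrow Ty Ty | Forall String Ty
  deriving Show

data Tm = Var String | IntLit Int   -- stand-in terms for the sketch
  deriving Show

-- An example is a list of inputs (terms or types) and an expected output,
-- mirroring chi ::= <e, chi> | <tau, chi> | <e, Nil>.
data Input   = ITm Tm | ITy Ty      deriving Show
data Example = Example [Input] Tm   deriving Show

-- One possible world: the bindings collected so far (the let bindings of
-- Section 4.1), the residual goal type, and the expected output.
data Binding = BindTy String Ty | BindTm String Ty Tm   deriving Show
data World   = World [Binding] Ty Tm                    deriving Show

-- Peel the example's inputs off the goal type, recording one binding per
-- input: a type binding for a forall (L-ETAbs), a term binding for an
-- arrow (L-EAbs).
world :: Ty -> Example -> Maybe World
world goal (Example ins out) = go goal ins [] (0 :: Int)
  where
    go ty []                        bs _ = Just (World (reverse bs) ty out)
    go (Forall a ty) (ITy t : rest) bs n = go ty rest (BindTy a t : bs) n
    go (Arrow ta tb) (ITm e : rest) bs n =
      go tb rest (BindTm ("x" ++ show n) ta e : bs) (n + 1)
    go _ _ _ _ = Nothing   -- the example does not fit the goal type

-- The world for learning polymorphic identity from <Nat, 1, 1>:
-- bindings (a = Nat, x0:a = 1), residual goal a, expected output 1.
main :: IO ()
main = print (world (Forall "a" (Arrow (TVar "a") (TVar "a")))
                    (Example [ITy (TVar "Nat"), ITm (IntLit 1)] (IntLit 1)))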
5.3 Learning from types
The second stage is learning from types. We have constraints to satisfy, and now need to search for a program that satisfies those constraints. The algorithm for learning spends almost all its time at this stage. This is unsurprising, given that search can be demanding. When learning non-trivial programs, it's always demanding.

The translation from the rules in Figure 4 to an algorithm is less direct than with learning from examples. So we list a set of design principles not inherent to the relation, but which are crucial in the design of its implicit algorithm.

• Enumerate programs by increasing size, where size is the number of nodes in the program's abstract syntax tree.

• Enumerate programs in normal form. An insightful analysis from [Osera 2015] describes how this dramatically reduces the space of programs.
• When instantiating a polymorphic type, only do so from types in the context.

These are the crux of the design commitments we make which deviate from the relation in Figure 4. For the first two points, [Osera 2015] is a treatise for implementing relations like Figure 4. For a more comprehensive tutorial on those design commitments we defer the interested there. Instead, we focus on the last point.

Unrestricted polymorphism is profoundly expressive [Girard et al. 1989]. And this makes learning hard. Because if you enrich your language with polymorphism, suddenly there are many more programs which inhabit a particular type. The last commitment is a way of taming the combinatorial explosion inherent to polymorphism. Similar to type inference, learning is only possible in System F if we make some concessions. And this is one possible concession that results in huge improvements in performance on programs which manipulate algebraic data.

The rules (L-App) and (L-TApp) are what trigger the combinatorial explosion. In (L-App), if I'm trying to learn a program for type 휏2 then one possibility is an application of two programs. To find that application, I have to learn a program of type 휏1 → 휏2. But I don't have the type 휏1 on hand, and have to consider all possible types leading to 휏2. With polymorphic types in the context, there can be many ways to arrive at 휏2. For example, with ∀푋.푋 → 휏2 in the context there are many options: 퐵표표푙 → 휏2, 푁푎푡 → 휏2, (퐵표표푙 → 퐵표표푙) → 휏2, etc.

The inception of the heuristic came when looking at the System F encoding of case analyses over algebraic data. For many useful programs, the type application in the encoding is always at a type in the context. These type applications, like the ones in previous examples, specify the return type. And generally (but not always), the return type of a case analysis is a type that exists as a binding in the context.
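To make these principles concrete, here is a small Haskell sketch (ours, not the paper's implementation) of the search for the simply typed fragment: it enumerates normal forms by increasing size, and it draws candidate argument types for applications only from types occurring in the context, a monomorphic stand-in for the restriction on instantiating polymorphic types. Type abstraction and type application (L-TAbs, L-TApp) are omitted for brevity.

import Data.List (nub)

-- A sketch (ours) of stage two for the simply typed fragment.
data Ty   = TVar String | Arrow Ty Ty                         deriving (Eq, Show)
data Term = Var String | Lam String Ty Term | App Term Term   deriving Show

type Ctx = [(String, Ty)]

-- Normal forms of a given type and exact size (L-Var, L-Abs, L-App).
normals :: Ctx -> Ty -> Int -> [Term]
normals ctx ty n =
  neutrals ctx ty n ++
  case ty of
    Arrow a b | n > 1 ->
      [ Lam x a body | let x = fresh ctx, body <- normals ((x, a) : ctx) b (n - 1) ]
    _ -> []

-- Neutral terms: a variable applied to zero or more normal arguments.
neutrals :: Ctx -> Ty -> Int -> [Term]
neutrals ctx ty 1 = [ Var x | (x, ty') <- ctx, ty' == ty ]
neutrals ctx ty n | n > 1 =
  [ App f a
  | arg <- argTypes ctx                  -- argument types only from the context
  , nf  <- [1 .. n - 2]
  , f   <- neutrals ctx (Arrow arg ty) nf
  , a   <- normals ctx arg (n - 1 - nf) ]
neutrals _ _ _ = []

-- Types appearing in the context (including their subterms).
argTypes :: Ctx -> [Ty]
argTypes ctx = nub [ t | (_, ty) <- ctx, t <- parts ty ]
  where parts t@(Arrow a b) = t : parts a ++ parts b
        parts t             = [t]

fresh :: Ctx -> String
fresh ctx = "x" ++ show (length ctx)

-- Enumerate by increasing size; a small fixed bound keeps the sketch finite.
learn :: Ctx -> Ty -> [Term]
learn ctx ty = concat [ normals ctx ty n | n <- [1 .. 8] ]

main :: IO ()
main = mapM_ print (take 3 (learn ctx (Arrow (TVar "A") (TVar "A"))))
  where ctx = [("f", Arrow (TVar "A") (TVar "A"))]
  -- prints f, \x1.x1, and \x1. f x1, smallest candidates first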
5.4 Strengths/Weaknesses
The strengths of the implementation are in its ability to handle functions in the output, which enables the learning of a class of programs from examples not previously demonstrated. As testament, we show in the next section a slew of programs over algebraic types that we learn. In each case, the programs always return a function, because each program works over higher-order encodings of algebraic data.

Another strength is the emphasis on a functional core for learning. The language we built on top of the functional core is incidental, and we merely lift the machinery for synthesis to the high-level language. But we could have built other high-level languages all while maintaining the same functional core for learning. This departs from typical design strategies for learning, which embed the learning machinery in the high-level language. This forces language designers who want to implement learning in their own languages to adopt idiosyncrasies of other high-level languages to enable learning. System F has its own idiosyncrasies for sure, but these are amenable to the development of a broad swath of languages—each where we can lift the learning machinery.

A weakness of the current implementation is that learning complex programs is difficult. Since all our programs desugar to System F, they are larger than their sugared representation. In practice, learning succeeds (generally, not just in this work) when there are means to tame the combinatorial explosion in programs. As programs grow larger, the more likely you are to feel the explosion.

While our design strategy in lifting synthesis is quite distinct from related works, we still wanted to be able to learn similar programs. And we do, but performance suffers. This is in part because our System F encodings are much larger, but also because we don't make all the same optimizations as related works. Because of this, we view the implementation as successfully demonstrating a proof of concept: that we can lift learning in System F to higher-level languages to learn non-trivial programs. It sets the stage for further research in this direction, helping to carve what a functional core ought to be.

6 Experiments
The implementation comes with a benchmark suite, whose results we show in Figure 6. They are a variety of programs over algebraic data. Some are simple, and others are quite complex, like Figure 7: a curriculum for learning an interpreter for expressions in the natural number semiring. We now analyze these results, ending with some intuitions about when to expect learning to succeed and when to expect it to fail.

Generally, the language succeeds when algebraic data are simple and the folds required are easy to write in case syntax. If it's not easy to write the fold, then you can try and help by providing a curriculum. In many instances, this helps, like with nat_bothZero, but not always. In instances where curricula don't help, there's simply so much in context for learning that performance suffers. Or the shape of the algebraic data makes it such that it's easy to generate well-typed but useless programs. Both of these are the case with nexpr_eval (Figure 7). By the time learning happens at neval there is a lot in the learning context. Additionally, the shape of natural numbers is such that it's very easy to generate many terms of type Nat under any context. You can generate 1, 2, 3, etc. These candidates are generated despite the fact that constants aren't frequently used for the programs we write. Other algebraic data suffer from similar issues.

A path forward to resolving these bottlenecks is to extend the learning relation in Figure 5 to better exploit examples from algebraic data. In [Osera 2015], they develop clever

Name           Size   Time (s)   # Ex   # Help
bexpr_eval      139       2.50     15       7
bool_nand        58       1.08      4       2
bool_or          42       0.06      4       2
bool_and         42       0.03      4       2
bool_not         53       0.02      2       2
bool_xor         72       3.75      4       2
data_fun         37      21.21     13       8
nat_bothZero    192       0.28     10       6
nexpr_eval      230      53.93     10       7
fun_plus         60       0.01      7       6
nat_plus         68       0.05      3       4
church_like      43       9.10      3       2
not_fun2         77       0.02      4       3
not_fun          36       0.05      4       3
poly_id           4      0.002      3       2
poly_twice       12      0.004      3       3
twice            38      0.002      3       3

Figure 6. Performance on a benchmark suite. Breaks down the size of the learned System F program (which includes the size of provided components as well), the time to learn, the number of examples, and the number of helper functions: including constructors from datatype declarations, user-defined functions, and learned functions.

ways to use examples to reduce the combinatorial explosion of the program space. The trick would be to express the semantics of their learning formulation in terms of our functional core for learning, System F. Another path is by moving up the lambda cube [Barendregt 1992].

7 Related Work

7.1 Rosette
Rosette is a programming language which extends Racket with machinery for synthesis, or learning [Torlak and Bodik 2013]. It imbues the language with SMT solvers.¹ With them, it's possible to write programs with holes, called sketches [Solar-Lezama 2008], which an SMT solver can fill. If a sketch has holes amenable to reasoning via SMT, Rosette can be quite effective.

For instance, recent work shows Rosette can reverse-engineer parts of an ISA [Zorn et al. 2017]—the instruction set architecture of a CPU (central processing unit). From bit-level data, you can recover programs which govern the behavior of the CPU. This works because SMT solvers support reasoning about bit-vector arithmetic, the sort of arithmetic governing low-level behavior of CPUs [Kroening and Strichman 2016].

¹ Satisfiability modulo theories (SMT) is a spin on the satisfiability problem, where one checks whether a boolean formula is satisfied by a certain set of assignments to its variables [Barrett and Tinelli 2018].

7.2 Synquid
Synquid is a programming language which uses refinement types to learn programs [Polikarpova et al. 2016]. A refinement type lets you annotate programs with types containing logical predicates [Freeman 1994]. The logical predicates constrain the space of programs which inhabit that type, and are used to guide learning.

Consider learning with and without refinement types. I'm going to teach you a program. But all I'll say is that it's of type 푛푎푡 → 푛푎푡. As we saw in the introduction, the type alone isn't enough to constrain learning. With Synquid, instead of going the route of examples they go the route of enriching types. Enter refinement types, which let you learn as follows. We want you to learn a program. But all we say is that it's of type 푥:푛푎푡 → {푦:푛푎푡 | 푦 = 푥 + 1}. It takes a natural number 푥, and returns a natural number 푦 one greater than 푥. Any ideas? Perhaps a program which computes...

푓(푥) = 푥 + 1,  푓(푥) = 푥 + 2 − 1,  푓(푥) = 푥 + 3 − 2,  · · ·

The good news is that 푓(푥) = 푥 + 1 is correct. And so are all the other programs, as long as I'm agnostic to implementation details. Refinement types impose useful constraints on learning; that's the crux of Synquid.

7.3 Myth
Myth is a programming language which uses types and examples to learn programs [Osera 2015]. Types, like in Synquid, let you annotate programs by their behavior. But Myth doesn't use refinement types; its types aren't as expressive.

data Nat = Zero
         | S Nat

data NExpr = Val Nat
           | NAdd NExpr NExpr
           | NMul NExpr NExpr

plus :: Nat -> Nat -> Nat
plus = [, , <S Zero, Zero, S Zero>]

mul :: Nat -> Nat -> Nat
mul a b = case a [Nat] of
  Zero -> Zero
  S c  -> plus b c

neval :: NExpr -> Nat
neval = [, , , , , , ]

main :: Unit
main = neval

Figure 7. Learning an interpreter for arithmetic expressions in the natural number semiring. Multiplication provided by the user to speed up learning.
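For reference, here is a Haskell sketch (ours) of the semantics the learned neval is meant to have: interpreting expressions in the natural number semiring.

-- A Haskell sketch (ours) of the intended behavior of neval in Figure 7.
data Nat = Zero | S Nat deriving Show

data NExpr = Val Nat | NAdd NExpr NExpr | NMul NExpr NExpr

plus :: Nat -> Nat -> Nat
plus Zero  b = b
plus (S a) b = S (plus a b)

mul :: Nat -> Nat -> Nat
mul Zero  _ = Zero
mul (S a) b = plus b (mul a b)

neval :: NExpr -> Nat
neval (Val n)      = n
neval (NAdd e1 e2) = plus (neval e1) (neval e2)
neval (NMul e1 e2) = mul  (neval e1) (neval e2)

main :: IO ()
main = print (neval (NAdd (Val (S Zero)) (NMul (Val (S (S Zero))) (Val (S Zero)))))
-- 1 + (2 * 1) = 3, printed as S (S (S Zero))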

Instead of deferring to richer types, Myth offers examples to constrain learning. Examples are nice because they're often convenient to provide, and at times easier to provide than richer types. Think of the difference between teaching chess by explaining all its principles, versus teaching chess by playing. In practice, a mix of both tends to do the trick: formally specifying the rules, but also letting example play guide learning, as when my uncle taught me.

This work was most directly inspired by Myth, and we borrow and refine many ideas therein. Of special curiosity, though it isn't a focus of Myth, is the capability of learning as a first-class activity in languages—which we speculate about further in the conclusion.

8 Conclusion
In the 60's, Peter Landin published The next 700 programming languages [Landin 1966]. He noticed a trend at the time, which was that everyone was designing different languages for solving different problems. Not because they had to, but because it was the only obvious design strategy. As a result, the idiosyncrasies of the languages became intertwined with the idiosyncrasies of the problems they were trying to solve.

So he formulated ISWIM, a functional core language with which you could express other languages. And the idea persists to this day. Modern functional languages typically sit on top of a functional core. We don't need fundamentally new languages for every programming task.

It's a matter of empirical investigation, but the most important idea in this paper is whether we need fundamentally new languages for every learning task. As it stands, the proliferation of learning tools (Rosette, Synquid, Myth) means the proliferation of languages. But could there not be a functional core with which to express other learning mechanisms? We tackle this question with the following contributions:

• (On theory) A simple description of learning from examples as a relation.
• (On practice) An implementation of learning from examples in System F.
• (On practice) An implementation of a high-level language, which lifts learning from System F into itself.

This work steps in a different direction of the design space, and while there's clearly plenty left to be desired, we see plenty of trail ahead.

References

Henk P. Barendregt. 1992. Lambda calculi with types. (1992).
Clark Barrett and Cesare Tinelli. 2018. Satisfiability modulo theories. In Handbook of Model Checking. Springer, 305–343.
Tim Freeman. 1994. Refinement Types for ML. Technical Report CMU-CS-94-110. Carnegie Mellon University, Department of Computer Science.
Jean-Yves Girard, Paul Taylor, and Yves Lafont. 1989. Proofs and Types. Vol. 7.
Jan Martin Jansen. 2013. Programming in the λ-calculus: From Church to Scott and back. In The Beauty of Functional Code. Springer, 168–180.
Daniel Kroening and Ofer Strichman. 2016. Decision Procedures. Springer.
Peter J. Landin. 1966. The next 700 programming languages. Commun. ACM 9, 3 (1966), 157–166.
Peter-Michael Santos Osera. 2015. Program Synthesis with Types. PhD Thesis. University of Pennsylvania, Department of Computer Science.
Benjamin C. Pierce. 2002. Types and Programming Languages. MIT Press.
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program synthesis from polymorphic refinement types. In ACM SIGPLAN Notices, Vol. 51. ACM, 522–538.
Michael Sipser et al. 2006. Introduction to the Theory of Computation. Vol. 2. Thomson Course Technology, Boston.
Armando Solar-Lezama. 2008. Program Synthesis by Sketching. Citeseer.
Emina Torlak and Rastislav Bodik. 2013. Growing solver-aided languages with Rosette. In Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software. ACM, 135–152.
Bill Zorn, Dan Grossman, and Luis Ceze. 2017. Solver Aided Reverse Engineering of Architectural. In Proceedings of the 44th Annual International Symposium on Computer Architecture.