Toward Code Generation: A Survey and Lessons from Semantic Parsing

Celine Lee, Intel Labs, University of Pennsylvania ([email protected])
Justin Gottschlich, Intel Labs, University of Pennsylvania ([email protected])
Dan Roth, University of Pennsylvania ([email protected])

Abstract. With the growth of natural language processing techniques and demand for improved software engineering efficiency, there is an emerging interest in translating intention from human languages to programming languages. In this survey paper, we attempt to provide an overview of the growing body of research in this space. We begin by reviewing natural language semantic parsing techniques and draw parallels with program synthesis efforts. We then consider semantic parsing works from an evolutionary perspective, with specific analyses on neuro-symbolic methods, architecture, and supervision. We then analyze advancements in frameworks for semantic parsing for code generation. In closing, we present what we believe are some of the emerging open challenges in this domain.

Figure 1. The Three Pillars of Machine Programming (credit: Gottschlich et al. [28]).

1 Introduction
Machine programming (MP) is the field concerned with the automation of all aspects of software development [28]. According to Gottschlich et al. [28], the field can be reasoned about across three pillars: intention, invention, and adaptation (Figure 1). Intention is concerned with capturing user intent, whether by natural language, visual diagram, software code, or other techniques. Invention explores ways to construct higher-order programs through the composition of existing, or novel creation of, algorithms and data structures. Adaptation focuses on transforming higher-order program representations or legacy software programs to achieve certain characteristics (e.g., performance, security, correctness, etc.) for the desired software and hardware ecosystem.

In this paper, we discuss (i) semantic parsing and (ii) code generation, which chiefly fall within the boundaries of the intention and invention pillars, respectively. In the context of natural language (NL), semantic parsing is generally concerned with converting NL utterances (i.e., the smallest unit of speech) to structured logical forms. Once done, these logical forms can then be utilized to perform various tasks such as question answering [41, 42, 47, 80, 86], machine translation [5, 81], or code generation [38, 49, 73, 87], amongst other things. For semantic parsing for code generation, the end logical form will usually be some form of software code. This may be the direct output program [41, 49, 90, 94] or an intermediate representation of code [21, 42, 68, 87, 88], which can subsequently be fed into a program synthesis component to generate syntactically sound and functionally correct code.

In this survey paper, we first provide an overview of the processes of NL semantic parsing and of code generation in Sections 2 and 3, respectively. In Section 4, we summarize the evolution of techniques in NL and code semantic parsing from the 1970s onward. Section 5 explores the question of supervision for semantic parsing tasks, while Section 6 discusses modern advances in neural semantic parsing for code generation. To conclude, we consider some possible future directions of semantic parsing for code generation.

2 Natural Language Semantic Parsing
Natural language analysis can be segmented into at least two fundamental categories: syntax and semantics. In general, syntax is the way a natural language phrase is structured; the order and arrangement of words and punctuation. The semantics is the meaning that can be derived from such syntax. For example, the two sentences "the boy cannot go" and "it is not possible for the boy to go" are semantically equivalent even though they are syntactically different. By lifting semantics from syntax, NL semantic parsers can map semantically equivalent sentences to the same logical form, even when they have different syntaxes. An example of this is shown in Figure 2, where two syntactically different sentences are shown to be semantically equivalent using an abstract meaning representation (AMR) as the logical form (more details in Section 2.4). For the remainder of this section, we discuss how NL semantic parsers extract meaning from natural language utterances into various machine-understandable representations.

Figure 2. Abstract meaning representation (AMR) of two semantically equivalent, but syntactically different sentences (credit: Banarescu et al. [12]).

Figure 3. An example of a CCG category. Syntax (the ADJ represents an adjective part of speech) is paired with semantics (the lambda term expression) (credit: Artzi et al. [7]).
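As a rough illustration of this idea (and not the AMR formalism itself), the toy Python sketch below maps the two example sentences to one hand-written logical form; the function `toy_parse` and the tuple encoding are invented for this example.

```python
# A toy illustration (not a real AMR parser): two syntactically different
# sentences normalize to the same machine-readable logical form.
LOGICAL_FORM = ("not", ("possible", ("go", "boy")))  # "it is not possible that the boy goes"

def toy_parse(sentence: str):
    """Map a tiny, fixed set of paraphrases to one canonical logical form."""
    s = sentence.lower().rstrip(".")
    if s in {"the boy cannot go",
             "it is not possible for the boy to go"}:
        return LOGICAL_FORM
    raise ValueError("outside this toy example's coverage")

assert toy_parse("The boy cannot go") == \
       toy_parse("It is not possible for the boy to go")
```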

2.1 Purpose of Natural Language Semantic Parsing
The general purpose of structured logical forms for natural language is to have a representation that enables natural language understanding (NLU) and automated reasoning, amongst other things. Historically, NL semantic parsers have enabled computers to query databases [89], play card games [27], and even act as conversational agents such as Apple's Siri and Amazon's Alexa. Semantic parsing has taken on many forms. Early hand-crafted rules made way for supervised learning techniques. Then, due to the labor-intensive and error-prone process of hand-labeling data for paired natural language and logical form datasets, researchers turned their attention toward weakly supervised [9, 20] and other learning techniques that tend to require less labeled data.

2.2 Components of a Semantic Parsing System
Given an NL input, a NL semantic parsing system generates its semantic representation as a structured logical form. These logical forms are also known as meaning representations or programs. To generate these outputs, semantic parsing pipelines are usually trained by some learning algorithm to produce a distribution over the output search space and then search for a best-scoring logical form in that distribution. This logical form can then be executed against some environment to carry out an intended task (e.g., to query a database). Because these logical forms are generally designed to be machine-understandable representations, they tend to have a structured nature, following an underlying formalism by which some grammar can be used to derive valid logical forms. Insight into the grammar of a set of logical forms can be leveraged by different semantic parsing methods to improve performance. For example, semantic parsing for code generation should have a meaning representation that can deterministically be transformed into valid code. Therefore, the output space can be constrained by the underlying grammar formalisms that define syntactically sound code.

2.3 Grammars
Grammar in the context of natural language processing is a set of rules that govern the formation of some structure (parsing of NL utterances) to ensure well-formedness. A familiar example is English grammar: number, tense, gender, and other features must be consistent in an English sentence. The sentence "he eat apples" is grammatically incorrect because a singular subject requires a verb ending in 's.' Likewise, in semantic parsing, a logical form can be grammatically constrained to ensure well-formedness. A set of grammatical rules constrains the search space of possible outputs. Combinatory categorial grammar (CCG) [76, 77] is one such example of a popular grammar formalism. It has historically made frequent appearances in semantic parsing models [45, 91, 92] due to its ability to jointly capture syntactic and semantic information of the constituents that it labels. Consider the example in Figure 3. By pairing syntactic and semantic information, CCGs can describe textual input for a machine to understand the semantics along with the syntax.
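To give a concrete, simplified flavor of the kind of lexical entry shown in Figure 3, the sketch below pairs words with a syntactic category and a lambda-calculus term. The dictionary entries and category strings are illustrative only; this is not a working CCG parser.

```python
# A simplified flavor of a CCG lexicon: each word is paired with a syntactic
# category and a semantic (lambda-calculus) term. Entries are illustrative only.
CCG_LEXICON = {
    "France":  {"category": "NP",         "semantics": "france"},
    "borders": {"category": "(S\\NP)/NP", "semantics": "λx.λy.borders(y, x)"},
    "country": {"category": "N",          "semantics": "λx.country(x)"},
}

def lookup(word):
    """Return the (syntax, semantics) pair a CCG parser would combine."""
    entry = CCG_LEXICON[word]
    return entry["category"], entry["semantics"]

print(lookup("borders"))  # ('(S\\NP)/NP', 'λx.λy.borders(y, x)')
```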
2.4 Meaning Representations
Meaning representations are the end product of NL semantic parsing. The output logical formula, or meaning representation, grounds the semantics of the natural language input in the target machine-understandable domain. A simple example of logical forms is a database query. Popular datasets explored in semantic parsing include Geo880, Jobs640, ATIS, and SQL itself [20, 37, 89]. An example of a SQL query is shown in Figure 4. Another class of semantic parsing outputs is NL instructions to actions, such as the model in Figure 5 for learning Freecell card game moves. Note that in these cases, the semantic parser is specific to a particular domain, and may not generalize to out-of-domain inquiries or differing data structures.

Figure 4. Sample SQL query for a flight reservation (credit: Iyer et al. [37]).

Figure 5. A natural language sentence (denoted x), a logical formula corresponding to the sentence (denoted y), and a lexical alignment between the two (denoted h) (credit: Goldwasser and Roth [27]).

NL semantic parsing efforts have also translated natural language input semantics into lambda calculus expressions. The expressive power of lambda calculus allows the incorporation of constants, logical connectors, quantifications, and lambda expressions to represent functions [92]. For example:

Sentence: what countries border France
Logical Form: λx.country(x) ∧ borders(x, France)
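The logical form above can be grounded in an environment and executed to obtain an answer, which is what makes it machine-understandable. The following minimal Python sketch evaluates the same predicate against a small, made-up fact base; the fact sets and helper names are hypothetical.

```python
# Evaluating the logical form λx.country(x) ∧ borders(x, France) against a tiny,
# made-up fact base. The facts and helper names are hypothetical.
COUNTRIES = {"France", "Spain", "Germany", "Belgium", "Brazil"}
BORDERS = {("Spain", "France"), ("Germany", "France"), ("Belgium", "France")}

def country(x):
    return x in COUNTRIES

def borders(x, y):
    return (x, y) in BORDERS

def denotation():
    """Return every x satisfying country(x) ∧ borders(x, France)."""
    return {x for x in COUNTRIES if country(x) and borders(x, "France")}

print(sorted(denotation()))  # ['Belgium', 'Germany', 'Spain']
```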
Meaning representations can also take on graph-based forms. Graph-based formalisms are advantageous because they are easier for humans to read and they are abstracted from the syntax of the input natural language sentences. Popular graphical representations are the Abstract Meaning Representation [12], Semantic Dependency Parsing [17, 59], Universal Conceptual Cognitive Annotation [1], Dependency-based Compositional Semantics [47], and CCGBank [34].

While most general purpose programming languages are not necessarily abstract in their representations of intention (source code often involves details about variables, memory allocation, class structure, etc.), code can serve as one of the more fundamental machine-understandable logical forms. To lift semantics from the code's syntax, recent works have gone into developing and using meaning representations of code, such as the abstract syntax tree (AST) [2–4, 67, 71, 72], sketching [74], the simplified parse tree (SPT) [51], the contextual flow graph (XFG) [13], the context-aware semantic structure (CASS) [85], and the program-derived semantics graph (PSG) [35]. While none of these representation structures are identical, they somewhat share the common goal of being a vehicle of machine-understandable meaning of the code, abstracted to some extent from the syntax of the code. We discuss the application of some of these structures in Section 3.

3 Program Synthesis / Code Generation
In this section, we review a subset of existing efforts for program synthesis and code generation. Program synthesis is the task of generating software from some high-level program specification [54]. The purpose of this section is to highlight parallels between this process of using a high-level program specification to generate code, and semantic parsing's translation of natural language into meaning representations. Insights from the fields of program synthesis and machine programming can guide semantic parsing efforts, and vice versa.

Code has many different dimensions. Despite what could be construed as a strict syntax for compile-ability, source code can be reasoned about on many different levels: semantics, runtime, memory footprint, and compiler details, to name a few. As a result, the design and selection of a code representation structure, and the subsequent reasoning about said structure, is an interesting and nuanced task.

Many code recommendation systems use code similarity to retrieve different implementations of the same intention. Semantic code representations can provide a foundation on which to extract code similarity. Facebook et al.'s Aroma system for code recommendation [51] represents a program by a simplified parse tree (SPT): the parse tree is defined as a recursive structure of keyword and non-keyword tokens, agnostic to the implementation programming language. Intel et al.'s MISIM system for code similarity [85] defines the context-aware semantic structure (CASS), which leverages the structural advantages of the SPT, but also carries the advantage of incorporating customizable parameters for a wider variety of code context scenarios. The CASS enables configuration at different levels of granularity, allowing it to be more flexible for general, rather than domain-specific, usage. Iyer et al. [35] present the program-derived semantics graph (PSG) to capture the semantics of code at multiple levels of abstraction. Additionally, it offers one layer of concrete syntactic abstraction that is programming language specific, which allows it to be converted into source code.

Program synthesis has also come in the flavor of machine translation [18, 67, 81], pseudocode-to-code transformation [44], and inductive program synthesis by input-output examples [11, 31, 57]. While machine translation found its infancy in translation between human languages, a series of works has gone into translating from natural language to programming languages. Wong and Mooney [81] integrate machine translation techniques with semantic parsing to create a syntax-based WASP: word alignment-based semantic parsing. Quirk et al. [67] further combine machine translation to extract ASTs consistent with a constructed grammar for If-This-Then-That (IFTTT) recipes. Chen et al. [18] tackle program translation by employing tree-to-tree deep neural networks to migrate code from one language into an ecosystem built in a different language. Kulal et al. [44] perform the task of mapping line-to-line pseudocode to code for program synthesis. Inductive program synthesis allows us to produce a program consistent with a set of input-output examples [11, 24, 30, 31, 48, 57, 63, 66]. In these approaches, the program synthesis problem can be framed as the following: first, encode input-output examples as constraints; then, search over the space of possible programs with a constraint solver. A key challenge in this technique is scalability. Pu et al. [66] demonstrate that inductive program synthesis can be made a scalable task by having a model iteratively select a small subset of representative examples, then train on each small subset at a time. Another aspect to consider is evaluation. A fitness function determines how well a given candidate program satisfies some set of requirements. Given these requirements as a set of input-output examples, such as in inductive program synthesis, the fitness function can be learned [53], resulting in more efficient program synthesis because the search is biased toward likely programs.
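At toy scale, the constraint-based framing above can be approximated by enumerating a small program space and keeping programs consistent with the examples. The primitives, depth bound, and examples in the sketch below are made up; real systems replace the brute-force loop with constraint solvers or learned search guidance.

```python
from itertools import product

# Toy inductive synthesis: enumerate compositions of a few primitives and keep
# the ones consistent with the input-output examples.
PRIMITIVES = {
    "double":    lambda x: x * 2,
    "increment": lambda x: x + 1,
    "negate":    lambda x: -x,
    "square":    lambda x: x * x,
}

def synthesize(examples, max_depth=2):
    """Return primitive pipelines f1 -> f2 -> ... matching all (input, output) pairs."""
    solutions = []
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(x, names=names):
                for name in names:
                    x = PRIMITIVES[name](x)
                return x
            if all(program(i) == o for i, o in examples):
                solutions.append(" -> ".join(names))
    return solutions

# f(x) = (x + 1) * 2 is consistent with these examples.
print(synthesize([(1, 4), (3, 8), (0, 2)]))  # ['increment -> double']
```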
The task of program synthesis tends to be more tractable if the output programming language is less complex and thus the output space is more compact. Declarative programming languages such as SQL [15], Halide [69], and GraphIT [93] express the intention of a computation without detailing its execution. For this reason, these types of languages are also called intentional programming languages in the field of machine programming. As a function of the syntax and semantics being closely related, programs written in intentional programming languages tend to be easier to reason about in a semantic space. A byproduct of this phenomenon is that intentional programming languages can also be easier to optimize. Neo [56] and Bao [55] are learned neural models that optimize SQL queries. Verified lifting [40] also demonstrates an ability to optimize Halide code.
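A minimal sketch of this declarative style follows: the SQL states what rows are wanted, while the engine decides how to compute them (scan order, indexes, join strategy). The in-memory table and rows are made up for illustration.

```python
import sqlite3

# Declarative intent: the query describes the desired result, not the execution plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (origin TEXT, destination TEXT, price REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", [
    ("BOS", "SFO", 420.0),
    ("BOS", "SEA", 310.0),
    ("JFK", "SFO", 515.0),
])

rows = conn.execute(
    "SELECT origin, destination FROM flights "
    "WHERE destination = ? AND price < ?", ("SFO", 500.0)
).fetchall()
print(rows)  # [('BOS', 'SFO')]
```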
As we discuss the evolution of semantic parsing in the next few sections, we identify parallels between these program synthesis techniques and the techniques used in semantic parsing.

4 Evolution of Semantic Parsing
This section will review the evolution of NL semantic parsing techniques from early hand-crafted rules in the 1970s, to the rise of learned grammars via statistical learning methods in the late 1990s and early 2000s, to modern neural semantic parsing efforts.

4.1 Rules-based Techniques for Semantic Parsing
Many early approaches to NL semantic parsing used hand-crafted syntactic pattern and phrase matching to parse the meaning from a given natural language input [39, 82]. These rules were then augmented by grammars that incorporate knowledge of semantic categories [33, 50, 80] so that similar words can be generalized to sets of defined categories which then constrain the output space of the logical forms. However, as a function of being based off of pattern-matching rules, these systems are brittle, and as a function of relying on defined semantic categories, these grammar-constrained systems are domain-specific.

4.2 Grammar and Rule Induction for Semantic Parsing
Statistical methods for the induction of NL semantic parsing grammars rose in the late 1990s and early 2000s. Zelle and Mooney [90] present a system that learns rules for language parsing instead of relying on hand-crafting them. The system employs inductive logic programming methods to learn shift-reduce parsers that map sentences into database queries. Kate et al. [41] also induce transformation rules that map natural language to logical form. By learning and recursively applying these rules, Kate et al.'s system maps natural language inputs (or their corresponding syntactic parse trees) to a parse tree in the target representation language.

Other supervised learning approaches have induced grammars to represent the underlying semantics of a knowledge base. In the literature, CCGs have been a popular and powerful tool to capture textual semantics by mapping sentences to their logical forms [8, 9, 91, 92]. Artzi and Zettlemoyer [10] outline a CCG-based NL semantic parsing framework that maps sentences to lambda-calculus representations of their meaning. They induce a grammar and estimate the parameters of the compositional and non-compositional semantics of the space [7]. Artzi's approach uses corpus-level statistics at each lexicon update during learning to prune the induced CCG lexicons. Performance improvements in this system indicate an advantage to maintaining compact CCG lexicons. However, these models are typically unable to learn entities not presented to the model during training [6, 91].

A challenge in semantic parsing has been how to scale a semantic parsing model for different domains. Kwiatkowski et al. [45] present a learned ontology matching model that adapts the output logical form of the semantic parser for each target domain. The parser uses a probabilistic CCG to construct a domain-independent meaning representation from the NL input, then uses the ontology matching model to transform that representation into the target domain.

4.3 Neural Models for NL Semantic Parsing
The modern rise in neural models has been marked by the increase in papers exploring encoder-decoder sequence-to-sequence (seq2seq) frameworks for semantic parsing. These encoders and decoders are often built with recurrent architectures such as LSTM cells (see Figure 6) in an effort to better capture the dynamic and sequential nature of both natural and programming languages. Neural models can enable systems to automatically induce grammars, templates, and features instead of relying on humans to manually annotate them. This provides the model with flexibility to generalize across languages and representations [42]. These encoder-decoder models can bear a resemblance to machine translation systems [81], except that NL semantic parsing systems produce a structured logical form, while machine translation systems traditionally translate a natural language input into another unstructured natural language output [5]. The formal output structure of semantic parsing enables neural decoders to lean on the logic of the induced grammars, while retaining the simplicity of neural translation without latent syntactic variables [42, 68, 87]. Thus, this type of modeling can generalize to different environments while reducing demand for custom-annotated datasets. It should be noted, however, that these neural models tend to have a weakened ability to leverage the logical compositionality of traditional rules-based approaches.

Figure 6. Encoder-decoder model for mapping NL to logical forms (credit: Dong and Lapata [21]).
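The sketch below shows one minimal form such an encoder-decoder with attention can take. Layer sizes, the dot-product attention, and the single-batch usage are arbitrary illustrative choices under our assumptions, not a reproduction of any surveyed system.

```python
import torch
import torch.nn as nn

# A minimal encoder-decoder with dot-product attention, in the spirit of the
# seq2seq semantic parsers discussed above.
class Seq2SeqParser(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, tgt_vocab)  # [decoder state; attention context]

    def forward(self, src_ids, tgt_ids):
        enc_states, hidden = self.encoder(self.src_embed(src_ids))
        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids), hidden)
        # Dot-product attention of each decoder step over all encoder states.
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_states)
        return self.out(torch.cat([dec_states, context], dim=-1))  # per-step logits

model = Seq2SeqParser(src_vocab=1000, tgt_vocab=200)
src = torch.randint(0, 1000, (1, 7))   # e.g., "what countries border france ..."
tgt = torch.randint(0, 200, (1, 5))    # shifted logical-form tokens
logits = model(src, tgt)               # shape (1, 5, 200): one distribution per step
```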
Dong and Lapata [21] present a NL semantic parsing model that employs an attention-enhanced encoder-decoder. In a nod to the hierarchical structure of their system's logical forms, they design a top-down tree decoder to more faithfully decode well-formed meaning representations. Decoding of each token is augmented by: (i) soft alignments from attention computed by the encoder and (ii) parent-feeding connections to provide contextual information from prior time steps. The resulting model has competitive performance and increased flexibility without need for any human-engineered features, making it more scalable to different domains. This work marks the advent of growing attention to neural models for semantic parsing.

A more in-depth discussion of modern techniques in neural semantic parsing models is presented in Section 6, where we focus on semantic parsing models for code generation.

5 Supervision in Semantic Parsing
Parallel to the evolution of NL semantic parsing techniques is the evolution of supervision for semantic parsing. Fully-supervised learning customarily mandates a corpus of paired natural language utterances and their corresponding logical forms. However, this can be expensive to generate and difficult to scale.

Supervision by Denotation. Supervision by denotation is proposed as an alternative, more scalable approach to NL semantic parsing. Instead of directly evaluating the success of a semantic parser by comparing the output to a gold standard "correct" logical form as done in fully-supervised learning, NL semantic parsing can be driven by the response of the context or environment. Clarke et al. [19] present a learning algorithm that relies only on a binary feedback signal for validation. The resulting parser is competitive with fully supervised systems in the Geoquery domain, demonstrating the feasibility of supervision by denotation. Liang et al. [47] also elide the need for labeled logical forms by inducing a NL semantic parsing model from question-answer pairs. They introduce a new semantic representation, dependency-based compositional semantics (DCS), to represent formal semantics as a tree. The model maps a NL utterance to DCS trees, aligns predicates within each tree to the span in the utterance that triggered them, and beam-searches for the best candidates.

However, at least two primary challenges arise with supervision by denotation. (1) The space of potential denotations may be large. This can increase computational tractability challenges in the search problem for intermediate latent representations. (2) A weaker response signal has the potential for spurious logical forms: semantic parsing outputs that happen to produce the correct response, but do not actually carry the correct semantics. Modifications to supervision by denotation have been proposed to address these challenges. Goldwasser and Roth [27] present a method of learning from natural language instructions and a binary feedback signal to address these issues. By capturing alignments between NL instruction textual fragments and meaning representation logical fragments, the model can more closely interpret NL instructions and hopefully reduce the potential for spurious forms. Pasupat and Liang [62] propose a system that performs dynamic programming on denotations to address the issue of exploding latent search space and spurious results. By observing that the space of possible denotations grows more slowly than the space of logical forms, this dynamic programming approach can find all consistent logical forms up to some bounded size. Then, because spurious logical forms will generally give a wrong answer once the context changes, they generate a slightly modified context and examine whether the logical form produces the correct denotation in this alternate context. If it does not, this spurious logical form can be discarded. The combination of limited search space and pruning of spurious forms allows them to use denotation as a stronger supervision signal.
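The miniature sketch below illustrates supervision by denotation under made-up assumptions: candidate logical forms are kept if executing them yields the annotated answer, with no gold logical form needed. The environment, candidates, and answer are all hypothetical, and the third candidate shows why spurious forms slip through.

```python
# Supervision by denotation, in miniature: keep candidates whose execution
# matches the annotated answer. The environment and candidates are made up.
STATE_POPULATION = {"texas": 29, "california": 39, "florida": 22}

CANDIDATES = {
    "argmax(population)": lambda: max(STATE_POPULATION, key=STATE_POPULATION.get),
    "argmin(population)": lambda: min(STATE_POPULATION, key=STATE_POPULATION.get),
    # Spurious form: right answer on this instance, wrong semantics in general.
    "first_alphabetical()": lambda: sorted(STATE_POPULATION)[0],
}

def consistent_forms(expected_answer):
    """Return candidate logical forms whose denotation matches the answer."""
    return [form for form, execute in CANDIDATES.items()
            if execute() == expected_answer]

# Question: "which state has the most people?"  Annotated answer: "california".
print(consistent_forms("california"))
# ['argmax(population)', 'first_alphabetical()']  <- the second is spurious
```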
Weakly Supervised Learning. Artzi and Zettlemoyer [8] use conversational feedback from un-annotated conversation logs as a form of weak supervision to induce a NL semantic parser. In this approach, Artzi and Zettlemoyer model the meaning of user statements in the conversational log with latent variables, and define loss as a rough measure of how well a candidate meaning representation matches the expectations about the conversation context and dialog domain. Artzi and Zettlemoyer [9] demonstrate the viability of weak supervision for mapping natural language instructions to actions in the robotics domain. They run the output of a parsing component through an execution component and observe the result relative to a binary-feedback validation function in the environment. (We discuss other emerging works of weak supervision, such as Snorkel [70], in Section 7.)

Self-supervised Learning. Poon and Domingos [65] present a self-supervised system for NL semantic parsing by recursively clustering semantically equivalent logical forms to abstract away syntactical implementation details. This method, however, does not ground the induced clusters in an ontology. Poon [64] subsequently proposes a grounded unsupervised learning approach that takes a set of natural language questions and a database, learns a probabilistic semantic grammar over it, then constrains the search space using the database schema. For a given input sentence, this system annotates the sentence's dependency tree with latent semantic states derived from the database. The dependency-based logical form is additionally augmented with a state space to account for semantic relationships not represented explicitly in the dependency tree. This semantically-annotated tree can then be deterministically converted into the logical form, without ever needing pre-labeled data. Goldwasser et al. [26] suggest a confidence-driven self-supervised protocol for NL semantic parsing. The model compensates for the lack of training data by employing a self-training protocol driven by confidence estimation to iteratively re-train on high-confidence predicted datapoints.

Open vocabulary NL semantic parsing replaces a formal knowledge base with a probabilistic base learned from a text corpus. In this framework, language is mapped onto queries with predicates (parts of the structure that make claims about the actions or characteristics of the subject) derived directly from the domain text instead of provided through a grammar. The model must also learn the execution models for these induced predicates [43, 46]. Gardner and Krishnamurthy [25] address the challenge of generalizing NL semantic parsing models to out-of-domain and open-vocabulary applications. Instead of mapping language to a single knowledge base query like traditional NL semantic parsers, this approach maps language to a weighted combination of queries plus a distributional component to represent a much broader class of concepts. This approach can expand the applicability of semantic parsing techniques to more complex domains.
6 Modern Advances in Neural Semantic Parsing for Code Generation
Recent semantic parsing efforts have largely centered around neural models, especially encoder-decoder networks. In this section, we discuss modern advances in neural models for semantic parsing for code generation.

6.1 Context and Variable Copying
Information to generate a NL semantic parse is not always encapsulated entirely in a single input sentence. Surrounding context can provide valuable information for disambiguation in NL semantic parsing [8]. Such context can be fed into the encoder to incorporate it into encoder-decoder models. Iyer et al. [38] develop an architecture that leverages programmatic context to write contextually-relevant code. This encoder-decoder system feeds the encoder the input utterance along with featurized representations of environment identifiers, such as the type and name of each variable and method. The decoding process attends first to the natural language, then to the variables and methods of the environment. Performing this attention in two steps can help the model match words in the natural language to the environment identifier representations. Additionally, a supervised copy mechanism [29] based on attention weights is presented to copy an environment token to the output. This allows the model to copy in variable names not seen in the training data or in the input utterance.
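A schematic version of such a copy mechanism is sketched below: the final output distribution mixes a generation distribution over the fixed vocabulary with attention weights over source-side identifiers, so that names unseen in training can still be emitted. The probabilities and names are hand-picked for illustration, not learned values from any surveyed system.

```python
import numpy as np

# Schematic copy mechanism: mix a vocabulary distribution with copy attention
# over context identifiers. All names and probabilities are illustrative.
vocab = ["return", "if", "+", "<UNK>"]
source_tokens = ["user_count", "max_users"]      # identifiers from the context

p_gen = 0.6                                      # probability of generating
gen_dist = np.array([0.70, 0.10, 0.15, 0.05])    # over `vocab`
copy_attention = np.array([0.85, 0.15])          # over `source_tokens`

final = {w: p_gen * p for w, p in zip(vocab, gen_dist)}
for tok, a in zip(source_tokens, copy_attention):
    final[tok] = final.get(tok, 0.0) + (1 - p_gen) * a

best = max(final, key=final.get)
print(best, round(final[best], 3))   # 'return' 0.42; 'user_count' still gets 0.34
```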
6.2 Grammar in Neural Semantic Parsing Models
Grammar can also guide and constrain the decoding process of neural encoder-decoder models. Xiao et al. [83] propose a sequence-based approach to generate a knowledge base query: they sequentialize the target logical forms, then demonstrate the advantage of incorporating grammatical constraints into the derivation sequence. Further, when querying information from semi-structured tables, Krishnamurthy et al. [42] demonstrate the benefits of enforcing type constraints during query generation. They show that including explicit entity linking during encoding can bolster accuracy because it ensures the generation of well-typed logical forms.

As discussed in Section 3, code is dynamic in its representation and semantics, making grammar especially interesting for generating source code. Unlike prior works which focus on domain-specific languages, which may have limited scope and complexity, Yin and Neubig [87] and Ling et al. [49] present methods for generating high-level, general-purpose programming language code such as Python. Ling et al. [49] present a data-driven sequence-to-sequence code generation method that implements multiple predictors: a generative model for code generation, and multiple pointer models to copy keywords from the input. Building off of Ling et al. [49], Yin and Neubig [87] point out that general purpose languages are generally more complex in schema and syntax than other domain-specific semantic parsing challenges. Therefore, code generation models could benefit from a grammar that explicitly captures the target programming language syntax as prior knowledge. Yin and Neubig [87] use abstract syntax trees (ASTs) to constrain the search space by exclusively allowing well-formed outputs: the model generates a series of actions that produce a valid AST. The actions themselves are derived from the underlying syntax of the programming language. This approach demonstrates improved Python code generation, but Yin and Neubig point out a decline in performance as the AST becomes more complex, noting that there may be space for improvement in scalability.

Code idioms are bits of code that occur frequently across a code repository and tend to have a specific semantic purpose. They can be used to enhance grammars in semantic parsing. Iyer et al. [36] apply idiom-based decoding to reduce the burden on training data in grammar-constrained semantic parsing systems. The top-K most frequently seen idioms in the training set are used to collapse the grammar for decoding. This compact set of rules helps supervise the learning of the semantic parser. However, there is a limit to how much idioms help with performance. The paper notes that adding too many idioms may cause the model to over-fit and become less generalizable. Shin et al. [73] mine code idioms with the objective of helping models reason about high-level and low-level concerns at the same time. The mined code idioms are incorporated into the grammar as new named operators, allowing the model to emit entire idioms at once. Incorporating the code idioms into the vocabulary of the neural synthesizer enables it to generate high-level and low-level program constructs interchangeably at each step, which is more like how human programmers write code.
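A much weaker, post-hoc stand-in for the grammar-constrained decoding described above is simply to discard candidate outputs that do not parse. The sketch below does this with Python's ast module; the candidate snippets are made up, and this filter is only an illustration, not the action-based constraint used by Yin and Neubig.

```python
import ast

# Post-hoc well-formedness filtering: keep only candidate Python snippets that
# parse. Grammar-constrained decoders enforce this during generation instead.
def is_well_formed(code: str) -> bool:
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

candidates = [
    "sorted(xs, key=len)[:3]",     # valid expression
    "sorted(xs, key=len[:3",       # unbalanced brackets
    "for x in xs return x",        # invalid statement
]
print([c for c in candidates if is_well_formed(c)])
# ['sorted(xs, key=len)[:3]']
```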
6.3 Sketching
Solar-Lezama et al. [75] present the concept of sketching to generate programs from a high-level description. Through sketching, developers communicate insight through a partial program, i.e., the sketch, that describes the high-level structure of an implementation. Then, synthesis procedures can generate the full implementation of this sketch in a programming language.

Dong and Lapata [22] present a semantic parsing decoder that operates at varying levels of abstraction for encoder-decoder semantic parsing. By operating as a representation of the basic meaning, the sketch can provide a basic idea of the meaning of the utterance while abstracting away from the syntactic details of the actual implementation. The decoder presented in this paper operates in two stages: (1) generating a coarse sketch of the meaning representation that omits low-level details, and (2) filling in those missing details of arguments and variable names. This coarse-to-fine decoding has the advantage of disentangling high- and low-level semantic information so that the program can be modeled at different granularities. An attention-enhanced RNN serves as the fine meaning decoder, using the coarse sketch to constrain the decoding output. This approach does not utilize syntax or grammatical information because the sketch naturally constrains the output space.

Nye et al. [58] present a method to dynamically integrate natural language instructions and examples to infer program sketches. Previous systems have mostly used static, hand-designed intermediate sketch grammars, which tend to be rigid in how much a system relies on pattern recognition vs. symbolic search. When given an easy or familiar task, Nye et al.'s [58] neuro-symbolic synthesis system can rely on learned pattern recognition to produce a more complete output. When given a hard task, the system can produce a less complete sketch and spend more time filling in that sketch using search techniques. By learning how complete to make the sketch for different applications, this model can employ sketching in a more fitting and customized manner.

6.4 Conversational Semantic Parsing
A remaining challenge in neural semantic parsers is interpretability. Dong et al. [23] outline a confidence modeling framework that characterizes three uncertainties: model uncertainty, data uncertainty, and input uncertainty. By computing these confidence metrics for a given prediction and using them as features in a regression model, they obtain a confidence score which can be used to identify which part of the input or model contributes to uncertain predictions. This uncertainty modeling allows models to better understand what they do and do not know. The ability to target weak parts of a model leads us to conversational programming approaches, where the model can query the user for missing or uncertain information.

When performing NL semantic parsing to extract some logical flow, conversational AI allows models to remedy incomplete or invalid requests. By engaging in a back-and-forth dialogue between the program and the human user, gaps and errors in logical forms can be filled and remedied, respectively. Additionally, clarifications can reduce the search space to promise a more accurate output [8, 16, 52].
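The toy sketch below illustrates the two-stage, coarse-to-fine idea from Section 6.3: a coarse template with typed holes is proposed first, and a second pass fills the holes, here by brute force against examples. The template, hole candidates, and examples are all made up, and eval is used purely for compactness of the illustration.

```python
from itertools import product

# Toy coarse-to-fine generation: stage one proposes a sketch with holes,
# stage two fills the holes by checking candidates against examples.
SKETCH = "sorted(xs, reverse=HOLE_REVERSE)[:HOLE_K]"
HOLE_CANDIDATES = {"HOLE_REVERSE": ["True", "False"], "HOLE_K": ["1", "2", "3"]}
EXAMPLES = [([3, 1, 2], [3, 2]), ([5, 4, 9], [9, 5])]   # top-2, descending

def fill(sketch, assignment):
    for hole, value in assignment.items():
        sketch = sketch.replace(hole, value)
    return sketch

def complete_sketch():
    holes = list(HOLE_CANDIDATES)
    for values in product(*(HOLE_CANDIDATES[h] for h in holes)):
        code = fill(SKETCH, dict(zip(holes, values)))
        if all(eval(code, {"xs": xs}) == out for xs, out in EXAMPLES):
            return code
    return None

print(complete_sketch())   # sorted(xs, reverse=True)[:2]
```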
7 Future Directions
The task of connecting human communication with machine data and capabilities, as in the intention pillar of machine programming, has seen considerable headway over the past few decades. Moving forward in semantic parsing, we identify four areas of consideration for the next steps of semantic parsing for code generation.

Evaluation. Current methods of evaluation for semantic parsing correctness use strict matches to gold standards, denotation, hand-wavy heuristics, or some other imperfect form of measuring understanding of semantic meaning. Exact match evaluation may incorrectly penalize mismatching logical forms that actually produce the same semantic meaning. Supervision by denotation can incorrectly reward spurious logical forms. Hand-wavy heuristics such as the Bleu score [61], which evaluates the "quality" of machine translation outputs via a modified n-gram precision metric, have been shown to not reliably capture semantic equivalence for natural language [14] or for code [79]. Efforts to resolve the weaknesses of each of these evaluation methods [20, 27, 62, 64, 65] have not been ineffective, but do not seem to fully address the key issue, which is that an evaluator for semantic parsing should have an understanding of semantic reasoning. We propose that an important next step may be to research either or both of: (i) an evaluator that can identify matching core semantics of different logical forms, and (ii) a meaning representation that has one unique representation for any given semantic intention.

Evaluation of a model's actual understanding could be another open challenge in semantic parsing. Rules- and grammar-based systems [10, 33, 39, 41, 50, 80, 90] are generally more transparent and thus more interpretable than neural models because the rules on which they are built can be examined as the logic of the model. However, neural models tend to function more as "black boxes," where induced logic is mostly evaluated by examining the outputs. We recommend that research moving forward consider how to design this neural learning to better capture the complex nature of both natural language and source code in translating natural language into source code. Compositional generalization is one way to evaluate how well a model understands natural language. Oren et al. [60] investigate compositional generalization in semantic parsing. They find that while well-known and novel extensions to the seq2seq model improve generalization, performance on in-distribution data is still significantly higher than on out-of-distribution data. This suggests a need to make more drastic changes to the neural encoder-decoder approach in order to inject fundamental "understanding." Neural Module Networks (NMNs) can also provide interpretability, compositionality, and improved generalizability to semantic parsers [32, 78]. Gupta et al. [32] investigate how multiple task-specialized NMNs can be composed together to perform complex reasoning over unstructured knowledge such as natural language text. However, the advanced interpretability of NMNs comes at the cost of restricted expressivity because each NMN needs to be individually defined. Gupta et al. [32] reflect that this tradeoff suggests further research to bridge these reasoning gaps.

Code Semantics Representation. We perceive that an ongoing challenge in semantic reasoning about language and code may be the lack of a unified code representation that can capture enough semantics and detail to resolve ambiguity but remain agnostic of programming language. The graph-based meaning representations discussed in Section 2.4 are promising avenues to explore. They have been shown to be effective in code retrieval, so we propose they be deployed in a semantic parsing framework to evaluate their viability for semantic parsing and code generation.

Big picture, developing this language-agnostic meaning representation is a promising direction to facilitate a full machine programming pipeline. If the meaning representation is transformable in all the directions between natural language, meaning representation, and code, this unified intermediate representation enables a sort of "programming by conversation." Not only could software be generated from language, as discussed in this paper, it could also then be translated from source code into natural language. Then, once in natural language form, intention could be iterated upon, and the refined intention could be translated back into software.

Scope. Most current semantic parsing models reduce the scope of the problem (and thus, often, the output space) to make the problem more feasible. This can be done either by working with DSLs, using a selected subset of APIs, or by working only with code snippets of a limited length and complexity. However, actual functional programs tend to be longer and more complex than the code snippet outputs in current semantic parsing models. Generation of programs in a GPL [49] is generally a more complex task than in a DSL due to the wide variety of domains and the high-dimensional code reasoning space. Eventually, we want to be able to work dynamically with larger programs. Steps toward incorporating context into, and using conversational AI in, neural semantic parsing models have shown promising results, and should continue to be pursued.

As the field moves into working with larger programs, existing datasets that match natural language utterances to code snippets will likely not be enough. To obtain more data for neural deep learning, we recommend using data generation models such as Snorkel [70]. Snorkel employs weak supervision to create training data without any need for hand-labeled data. We also suggest that new datasets be developed that account for longer chunks of code. This can be done by factoring in context variables and functions, or by mining larger chunks of code with periodic natural language annotations in the form of comments.
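The sketch below gives a plain-Python flavor of this weak-supervision idea, in the spirit of Snorkel-style labeling functions but without using the Snorkel API itself: cheap heuristics vote on whether a mined (comment, code) pair is a usable training example instead of hand-labeling each pair. The heuristics, labels, and data are entirely made up.

```python
# Weak supervision in the spirit of Snorkel-style labeling functions (plain
# Python stand-ins, not the Snorkel library). Heuristics and data are made up.
ABSTAIN, GOOD, BAD = -1, 1, 0

def lf_comment_mentions_return(pair):
    comment, code = pair
    return GOOD if "return" in comment.lower() and "return" in code else ABSTAIN

def lf_too_short(pair):
    _, code = pair
    return BAD if len(code.strip()) < 10 else ABSTAIN

def lf_has_todo(pair):
    comment, _ = pair
    return BAD if "todo" in comment.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_comment_mentions_return, lf_too_short, lf_has_todo]

def weak_label(pair):
    """Majority vote over non-abstaining heuristics (None if all abstain)."""
    votes = [lf(pair) for lf in LABELING_FUNCTIONS if lf(pair) != ABSTAIN]
    if not votes:
        return None
    return GOOD if votes.count(GOOD) >= votes.count(BAD) else BAD

pair = ("Return the three largest values", "def top3(xs): return sorted(xs)[-3:]")
print(weak_label(pair))   # 1  (kept as a weakly labeled example)
```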
User Interfacing. Future work could also consider the actual impact of code generation on programmer productivity. Xu et al. [84] conducted a study that found no impact on code correctness or generation efficiency when the subjects were asked to work in an IDE with a code generation plug-in. This begs the question of how to develop future models to be not just user-friendly, but user-aiding. Additionally, user interfaces for code generation may not share the same vocabulary between trained programmers and non-technical developers. There may be room for natural language understanding research advances for models to intuit programming concepts from plain English, free of such jargon as "list," "graph," and "iterate."

8 Broader Impact
The objective of this paper is to compile information and synthesize ideas about the fields of semantic parsing for natural language and for code. Since the basis of this paper is primarily educational, we see its broader impacts as largely positive.

Semantic parsing for code generation will likely grant more people a greater capacity to create programs. However, this enhanced capability also carries potential negative impacts. One possible misuse is the creation of malicious code. The models outlined in this paper could be trained to learn to write computer viruses, identify security breaches in other peoples' networks, or any of a number of dangerous behaviors. Aside from continuing to make this malicious code illegal, it may be worthwhile to embed safety features into the prior knowledge of these models, to discourage destructive code. On the flip side, models for code generation could also in the worst case generate programs with egregiously poor performance. If deployed, these programs will consume excessive computation and energy, thus negatively impacting the environment. One way to mitigate this harm is to encourage code reviews for generated code, just as code reviews exist now for human-created code. Another potential misuse of this technology that we would be concerned about is computer science students using this technology to complete their programming assignments, rather than learning how to write them themselves. This will be difficult to discourage from the model development side, but development and usage of a technology to identify coding style could help instructors detect whether students are writing their own code. Overall, researchers, as designers of these semantic parsing systems, should consider the safety and impact of their work on the many different realms of society.

References
Programs. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.
[12] Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse. Association for Computational Linguistics, Sofia, Bulgaria, 178–186. https://www.aclweb.org/anthology/W13-2322
[13] Tal Ben-Nun, Alice Shoshana Jakobovits, and Torsten Hoefler. 2018. Neural Code Comprehension: A Learnable Representation of Code Semantics. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (Montréal, Canada) (NIPS'18). Curran Associates Inc., Red Hook, NY, USA, 3589–3601.
[14] Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluating the Role of Bleu in Machine Translation Research. In 11th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Trento, Italy. https://www.aclweb.org/anthology/E06-1032
[15] Donald D. Chamberlin and Raymond F. Boyce. 1974. SEQUEL: A Structured English Query Language. In Proceedings of the 1974 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control (Ann Arbor, Michigan) (SIGFIDET '74). Association for Computing Machinery, New York, NY, USA, 249–264.
[1] Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
https://doi.org/10. Association for Computational Linguistics, Sofia, Bulgaria, 228–238. 1145/800296.811515 https://www.aclweb.org/anthology/P13-1023 [16] Shobhit Chaurasia and Raymond J. Mooney. 2017. Dialog for Language [2] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. to Code. In Proceedings of the Eighth International Joint Conference on 2018. Learning to Represent Programs with Graphs. In International Natural Language Processing (Volume 2: Short Papers). Asian Federation Conference on Learning Representations. of Natural Language Processing, Taipei, Taiwan, 175–180. https: [3] Uri Alon, Omer Levy, and Eran Yahav. 2019. code2seq: Generating //www.aclweb.org/anthology/I17-2030 Sequences from Structured Representations of Code. In International [17] Wanxiang Che, Yanqiu Shao, Ting Liu, and Yu Ding. 2016. SemEval- . Conference on Learning Representations 2016 Task 9: Chinese Semantic Dependency Parsing. In Proceedings [4] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. of the 10th International Workshop on Semantic Evaluation (SemEval- Code2vec: Learning Distributed Representations of Code. Proc. ACM 2016). Association for Computational Linguistics, San Diego, California, Program. Lang. 3, POPL, Article 40 (Jan. 2019), 29 pages. https: 1074–1080. https://doi.org/10.18653/v1/S16-1167 //doi.org/10.1145/3290353 [18] Xinyun Chen, Chang Liu, and Dawn Song. 2018. Tree-to-Tree Neural [5] Jacob Andreas, Andreas Vlachos, and Stephen Clark. 2013. Semantic Networks for Program Translation. In Proceedings of the 32nd Inter- Parsing as Machine Translation. In Proceedings of the 51st Annual Meet- national Conference on Neural Information Processing Systems (Mon- ing of the Association for Computational Linguistics (Volume 2: Short tréal, Canada) (NIPS’18). Curran Associates Inc., Red Hook, NY, USA, Papers). Association for Computational Linguistics, Sofia, Bulgaria, 2552–2562. 47–52. https://www.aclweb.org/anthology/P13-2009 [19] James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. [6] Yoav Artzi, Dipanjan Das, and Slav Petrov. 2014. Learning Compact Driving Semantic Parsing from the World’s Response. In Proceedings of Lexicons for CCG Semantic Parsing. In Proceedings of the 2014 Confer- the Fourteenth Conference on Computational Natural Language Learning. ence on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Uppsala, Sweden, 18–27. Association for Computational Linguistics, Doha, Qatar, 1273–1283. https://www.aclweb.org/anthology/W10-2903 https://doi.org/10.3115/v1/D14-1134 [20] Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, and [7] Yoav Artzi, Nicholas Fitzgerald, and Luke Zettlemoyer. 2014. Semantic Eduard Hovy. 2019. Iterative Search for Weakly Supervised Semantic Parsing with Combinatory Categorial Grammars. In Proceedings of the Parsing. In Proceedings of the 2019 Conference of the North American 2014 Conference on Empirical Methods in Natural Language Processing: Chapter of the Association for Computational Linguistics: Human Lan- . Association for Computational Linguistics, Doha, Tutorial Abstracts guage Technologies, Volume 1 (Long and Short Papers). Association Qatar. https://www.aclweb.org/anthology/D14-2003 for Computational Linguistics, Minneapolis, Minnesota, 2669–2680. [8] Yoav Artzi and Luke Zettlemoyer. 2011. Bootstrapping Semantic https://doi.org/10.18653/v1/N19-1273 Parsers from Conversations. In Proceedings of the 2011 Conference [21] Li Dong and Mirella Lapata. 2016. 
Language to Logical Form with . Association on Empirical Methods in Natural Language Processing Neural Attention. In Proceedings of the 54th Annual Meeting of the for Computational Linguistics, Edinburgh, Scotland, UK., 421–432. Association for Computational Linguistics (Volume 1: Long Papers). https://www.aclweb.org/anthology/D11-1039 Association for Computational Linguistics, Berlin, Germany, 33–43. [9] Yoav Artzi and Luke Zettlemoyer. 2013. Weakly Supervised Learning https://doi.org/10.18653/v1/P16-1004 of Semantic Parsers for Mapping Instructions to Actions. Transactions [22] Li Dong and Mirella Lapata. 2018. Coarse-to-Fine Decoding for Neu- 1 (2013), 49–62. of the Association for Computational Linguistics https: ral Semantic Parsing. In Proceedings of the 56th Annual Meeting of the //doi.org/10.1162/tacl_a_00209 Association for Computational Linguistics (Volume 1: Long Papers). Asso- [10] Yoav Artzi and Luke S. Zettlemoyer. 2015. Situated understanding and ciation for Computational Linguistics, Melbourne, Australia, 731–742. learning of natural language. Ph.D. Dissertation. https://doi.org/10.18653/v1/P18-1068 [11] Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to Write 9 [23] Li Dong, Chris Quirk, and Mirella Lapata. 2018. Confidence Modeling User Feedback. In Proceedings of the 55th Annual Meeting of the As- for Neural Semantic Parsing. In Proceedings of the 56th Annual Meeting sociation for Computational Linguistics (Volume 1: Long Papers). Asso- of the Association for Computational Linguistics (Volume 1: Long Papers). ciation for Computational Linguistics, Vancouver, Canada, 963–973. Association for Computational Linguistics, Melbourne, Australia, 743– https://doi.org/10.18653/v1/P17-1089 753. https://doi.org/10.18653/v1/P18-1069 [38] Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. [24] John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing 2018. Mapping Language to Code in Programmatic Context. In Proceed- Data Structure Transformations from Input-Output Examples. SIG- ings of the 2018 Conference on Empirical Methods in Natural Language PLAN Not. 50, 6 (June 2015), 229–239. https://doi.org/10.1145/2813885. Processing. Association for Computational Linguistics, Brussels, Bel- 2737977 gium, 1643–1652. https://doi.org/10.18653/v1/D18-1192 [25] Matt Gardner and J. Krishnamurthy. 2017. Open-Vocabulary Semantic [39] Tim Johnson. 1984. Natural Language Computing: The Commercial Parsing with both Distributional Statistics and Formal Knowledge. In Applications. The Knowledge Engineering Review 1, 3 (1984), 11–23. AAAI. https://doi.org/10.1017/S0269888900000588 [26] Dan Goldwasser, Roi Reichart, James Clarke, and Dan Roth. 2011. Con- [40] Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, and Armando Solar- fidence Driven Unsupervised Semantic Parsing. In Proceedings of the Lezama. 2016. Verified Lifting of Stencil Computations. SIGPLAN Not. 49th Annual Meeting of the Association for Computational Linguistics: 51, 6 (June 2016), 711–726. https://doi.org/10.1145/2980983.2908117 Human Language Technologies. Association for Computational Lin- [41] Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney. 2005. Learning guistics, Portland, Oregon, USA, 1486–1495. https://www.aclweb.org/ to Transform Natural to Formal Languages. In Proceedings of the 20th anthology/P11-1149 National Conference on Artificial Intelligence - Volume 3 (Pittsburgh, [27] Dan Goldwasser and Dan Roth. 2014. 
Learning from natural in- Pennsylvania) (AAAI’05). AAAI Press, 1062–1068. structions. Machine Learning 94, 2 (1 Feb. 2014), 205–232. https: [42] Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. 2017. Neural //doi.org/10.1007/s10994-013-5407-y Semantic Parsing with Type Constraints for Semi-Structured Tables. In [28] Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Proceedings of the 2017 Conference on Empirical Methods in Natural Lan- Carbin, Martin Rinard, Regina Barzilay, Saman Amarasinghe, Joshua B guage Processing. Association for Computational Linguistics, Copen- Tenenbaum, and Tim Mattson. 2018. The Three Pillars of Machine hagen, Denmark, 1516–1526. https://doi.org/10.18653/v1/D17-1160 Programming. arXiv:1803.07244 [cs.AI] [43] Jayant Krishnamurthy and Tom M. Mitchell. 2015. Learning a Compo- [29] Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorpo- sitional Semantics for Freebase with an Open Predicate Vocabulary. rating Copying Mechanism in Sequence-to-Sequence Learning. In Pro- Transactions of the Association for Computational Linguistics 3 (2015), ceedings of the 54th Annual Meeting of the Association for Computational 257–270. https://doi.org/10.1162/tacl_a_00137 Linguistics (Volume 1: Long Papers). Association for Computational Lin- [44] S. Kulal, Panupong Pasupat, K. Chandra, Mina Lee, Oded Padon, A. guistics, Berlin, Germany, 1631–1640. https://doi.org/10.18653/v1/P16- Aiken, and Percy Liang. 2019. SPoC: Search-based Pseudocode to Code. 1154 In NeurIPS. [30] Sumit Gulwani. 2011. Automating String Processing in Spreadsheets [45] Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. Using Input-Output Examples. SIGPLAN Not. 46, 1 (Jan. 2011), 317–330. 2013. Scaling Semantic Parsers with On-the-Fly Ontology Matching. https://doi.org/10.1145/1925844.1926423 In Proceedings of the 2013 Conference on Empirical Methods in Natural [31] Sumit Gulwani and Prateek Jain. 2017. Programming by Examples: PL Language Processing. Association for Computational Linguistics, Seat- Meets ML. In Programming Languages and Systems - 15th Asian Sym- tle, Washington, USA, 1545–1556. https://www.aclweb.org/anthology/ posium, APLAS 2017, Suzhou, China, November 27-29, 2017, Proceedings D13-1161 (Lecture Notes in Computer Science, Vol. 10695), Bor-Yuh Evan Chang [46] Mike Lewis and Mark Steedman. 2013. Combined Distributional and (Ed.). Springer, 3–20. https://doi.org/10.1007/978-3-319-71237-6_1 Logical Semantics. Transactions of the Association for Computational [32] Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner. Linguistics 1 (2013), 179–192. https://doi.org/10.1162/tacl_a_00219 2020. Neural Module Networks for Reasoning over Text. In Interna- [47] Percy Liang, Michael Jordan, and Dan Klein. 2011. Learning tional Conference on Learning Representations. Dependency-Based Compositional Semantics. In Proceedings of the [33] Gary G. Hendrix, Earl D. Sacerdoti, Daniel Sagalowicz, and Jonathan 49th Annual Meeting of the Association for Computational Linguistics: Slocum. 1978. Developing a Natural Language Interface to Complex Human Language Technologies. Association for Computational Lin- Data. ACM Trans. Database Syst. 3, 2 (June 1978), 105–147. https: guistics, Portland, Oregon, USA, 590–599. https://www.aclweb.org/ //doi.org/10.1145/320251.320253 anthology/P11-1060 [34] Julia Hockenmaier and Mark Steedman. 2007. CCGbank: A Corpus [48] H. Lieberman. 2001. 
Your wish is my command: Programming by of CCG Derivations and Dependency Structures Extracted from the example. (01 2001). Penn Treebank. Computational Linguistics 33, 3 (2007), 355–396. https: [49] Wang Ling, Phil Blunsom, Edward Grefenstette, Karl Moritz Hermann, //doi.org/10.1162/coli.2007.33.3.355 Tomáš Kočiský, Fumin Wang, and Andrew Senior. 2016. Latent Predic- [35] Roshni G. Iyer, Yizhou Sun, Wei Wang, and Justin Gottschlich. 2020. tor Networks for Code Generation. In Proceedings of the 54th Annual Software Language Comprehension using a Program-Derived Seman- Meeting of the Association for Computational Linguistics (Volume 1: Long tics Graph. arXiv:2004.00768 [cs.AI] Papers). Association for Computational Linguistics, Berlin, Germany, [36] Srinivasan Iyer, Alvin Cheung, and Luke Zettlemoyer. 2019. Learn- 599–609. https://doi.org/10.18653/v1/P16-1057 ing Programmatic Idioms for Scalable Semantic Parsing. In Proceed- [50] Peter C. Lockemann and Frederick B. Thompson. 1969. A Rapidly Ex- ings of the 2019 Conference on Empirical Methods in Natural Lan- tensible Language System (The REL Language Processor). In Interna- guage Processing and the 9th International Joint Conference on Nat- tional Conference on Computational Linguistics COLING 1969: Preprint ural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, No. 34. Sånga Säby, Sweden. https://www.aclweb.org/anthology/C69- November 3-7, 2019, Kentaro Inui, Jing Jiang, Vincent Ng, and Xiao- 3401 jun Wan (Eds.). Association for Computational Linguistics, 5425–5434. [51] Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, and Satish Chandra. https://doi.org/10.18653/v1/D19-1545 2019. Aroma: Code Recommendation via Structural Code Search. Proc. [37] Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, ACM Program. Lang. 3, OOPSLA, Article 152 (Oct. 2019), 28 pages. and Luke Zettlemoyer. 2017. Learning a Neural Semantic Parser from https://doi.org/10.1145/3360578

[52] Semantic Machines, Jacob Andreas, John Bufe, David Burkett, Charles Chen, Josh Clausman, Jean Crawford, Kate Crim, Jordan DeLoach, Leah Dorner, Jason Eisner, Hao Fang, Alan Guo, David Hall, Kristin Hayes, Kellie Hill, Diana Ho, Wendy Iwaszuk, Smriti Jha, Dan Klein, Jayant Krishnamurthy, Theo Lanman, Percy Liang, Christopher H. Lin, Ilya Lintsbakh, Andy McGovern, Aleksandr Nisnevich, Adam Pauls, Dmitrij Petters, Brent Read, Dan Roth, Subhro Roy, Jesse Rusak, Beth Short, Div Slomin, Ben Snyder, Stephon Striplin, Yu Su, Zachary Tellman, Sam Thomson, Andrei Vorobev, Izabela Witoszko, Jason Wolfe, Abby Wray, Yuchen Zhang, and Alexander Zotov. 2020. Task-Oriented Dialogue as Dataflow Synthesis. Transactions of the Association for Computational Linguistics 8 (September 2020). https://www.microsoft.com/en-us/research/publication/task-oriented-dialogue-as-dataflow-synthesis/
[53] Shantanu Mandal, T. Anderson, Javier Turek, Justin Emile Gottschlich, Sheng-Tian Zhou, and A. Muzahid. 2020. Learning Fitness Functions for Machine Programming. arXiv: Neural and Evolutionary Computing (2020).
[54] Zohar Manna and Richard Waldinger. 1980. A Deductive Approach to Program Synthesis. ACM Trans. Program. Lang. Syst. 2, 1 (Jan. 1980), 90–121. https://doi.org/10.1145/357084.357090
[55] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2020. Bao: Learning to Steer Query Optimizers. arXiv:2004.03814 [cs.DB]
[56] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. Proc. VLDB Endow. 12, 11 (July 2019), 1705–1718. https://doi.org/10.14778/3342263.3342644
[57] Aditya Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, and Adam Kalai. 2013. A Machine Learning Framework for Programming by Example. In Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 28), Sanjoy Dasgupta and David McAllester (Eds.). PMLR, Atlanta, Georgia, USA, 187–195. http://proceedings.mlr.press/v28/menon13.html
[58] Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, and Armando Solar-Lezama. 2019. Learning to Infer Program Sketches. In Proceedings of Machine Learning Research, Vol. 97. PMLR, 4861–4870. http://proceedings.mlr.press/v97/nye19a.html
[59] Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Dan Flickinger, Jan Hajič, Angelina Ivanova, and Yi Zhang. 2014. SemEval 2014 Task 8: Broad-Coverage Semantic Dependency Parsing. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Association for Computational Linguistics, Dublin, Ireland, 63–72. https://doi.org/10.3115/v1/S14-2008
[60] Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, and Jonathan Berant. 2020. Improving Compositional Generalization in Semantic Parsing. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 2482–2495. https://doi.org/10.18653/v1/2020.findings-emnlp.225
[61] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 311–318. https://doi.org/10.3115/1073083.1073135
[62] Panupong Pasupat and Percy Liang. 2016. Inferring Logical Forms From Denotations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 23–32. https://doi.org/10.18653/v1/P16-1003
[63] Daniel Perelman, Sumit Gulwani, Dan Grossman, and Peter Provost. 2014. Test-Driven Synthesis. SIGPLAN Not. 49, 6 (June 2014), 408–418. https://doi.org/10.1145/2666356.2594297
[64] Hoifung Poon. 2013. Grounded Unsupervised Semantic Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Sofia, Bulgaria, 933–943. https://www.aclweb.org/anthology/P13-1092
[65] Hoifung Poon and Pedro Domingos. 2009. Unsupervised Semantic Parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 (Singapore) (EMNLP '09). Association for Computational Linguistics, USA, 1–10.
[66] Yewen Pu, Zachery Miranda, Armando Solar-Lezama, and Leslie Kaelbling. 2018. Selecting Representative Examples for Program Synthesis. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, 4161–4170. http://proceedings.mlr.press/v80/pu18b.html
[67] Chris Quirk, Raymond Mooney, and Michel Galley. 2015. Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 878–888. https://doi.org/10.3115/v1/P15-1085
[68] Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract Syntax Networks for Code Generation and Semantic Parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 1139–1149. https://doi.org/10.18653/v1/P17-1105
[69] Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. SIGPLAN Not. 48, 6 (June 2013), 519–530. https://doi.org/10.1145/2499370.2462176
[70] Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason Alan Fries, Sen Wu, and C. Ré. 2019. Snorkel: Rapid Training Data Creation with Weak Supervision. The VLDB Journal 29 (2019), 709–730.
[71] Veselin Raychev, Pavol Bielik, and Martin Vechev. 2016. Probabilistic Model for Code with Decision Trees. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Amsterdam, Netherlands) (OOPSLA 2016). Association for Computing Machinery, New York, NY, USA, 731–747. https://doi.org/10.1145/2983990.2984041
[72] Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting Program Properties from "Big Code". In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Mumbai, India) (POPL '15). Association for Computing Machinery, New York, NY, USA, 111–124. https://doi.org/10.1145/2676726.2677009
[73] Eui Chul Richard Shin, Miltiadis Allamanis, Marc Brockschmidt, and Alex Polozov. 2019. Program Synthesis and Semantic Parsing with Learned Code Idioms. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.). 10824–10834. https://proceedings.neurips.cc/paper/2019/hash/cff34ad343b069ea6920464ad17d4bcf-Abstract.html
[74] Armando Solar-Lezama. 2008. Program Synthesis by Sketching. Ph.D. Dissertation. USA. Advisor(s) Bodik, Rastislav.
[75] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial Sketching for Finite Programs. SIGARCH Comput. Archit. News 34, 5 (Oct. 2006), 404–415. https://doi.org/10.1145/1168919.1168907
[76] Mark Steedman. 1987. Combinatory grammars and parasitic gaps. Natural Language & Linguistic Theory 5 (1987), 403–439.

[77] Mark Steedman and Jason Baldridge. 2007. Combinatory Categorial Grammar.
[78] Sanjay Subramanian, Ben Bogin, Nitish Gupta, Tomer Wolfson, Sameer Singh, Jonathan Berant, and Matt Gardner. 2020. Obtaining Faithful Interpretations from Compositional Neural Networks. In ACL.
[79] Ngoc Tran, Hieu Tran, Son Nguyen, Hoan Nguyen, and Tien Nguyen. 2019. Does BLEU Score Work for Code Migration? In 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC) (May 2019). https://doi.org/10.1109/icpc.2019.00034
[80] David L. Waltz. 1978. An English Language Question Answering System for a Large Relational Database. Commun. ACM 21, 7 (July 1978), 526–539. https://doi.org/10.1145/359545.359550
[81] Yuk Wah Wong and Raymond Mooney. 2006. Learning for Semantic Parsing with Statistical Machine Translation. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference. Association for Computational Linguistics, New York City, USA, 439–446. https://www.aclweb.org/anthology/N06-1056
[82] W. A. Woods. 1973. Progress in Natural Language Understanding: An Application to Lunar Geology. In Proceedings of the June 4-8, 1973, National Computer Conference and Exposition (New York, New York) (AFIPS '73). Association for Computing Machinery, New York, NY, USA, 441–450. https://doi.org/10.1145/1499586.1499695
[83] Chunyang Xiao, Marc Dymetman, and Claire Gardent. 2016. Sequence-based Structured Prediction for Semantic Parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 1341–1350. https://doi.org/10.18653/v1/P16-1127
[84] Frank F. Xu, Bogdan Vasilescu, and Graham Neubig. 2021. In-IDE Code Generation from Natural Language: Promise and Challenges. arXiv:2101.11149 [cs.SE]
[85] Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, and Justin Gottschlich. 2020. MISIM: A Novel Code Similarity System. arXiv:2006.05265 [cs.LG]
[86] Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic Parsing for Single-Relation Question Answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Baltimore, Maryland, 643–648. https://doi.org/10.3115/v1/P14-2105
[87] Pengcheng Yin and Graham Neubig. 2017. A Syntactic Neural Model for General-Purpose Code Generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 440–450. https://doi.org/10.18653/v1/P17-1041
[88] Pengcheng Yin and Graham Neubig. 2018. TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation. In EMNLP.
[89] Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 3911–3921. https://doi.org/10.18653/v1/D18-1425
[90] John M. Zelle and Raymond J. Mooney. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2 (Portland, Oregon) (AAAI'96). AAAI Press, 1050–1055.
[91] Luke Zettlemoyer and Michael Collins. 2007. Online Learning of Relaxed CCG Grammars for Parsing to Logical Form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Association for Computational Linguistics, Prague, Czech Republic, 678–687. https://www.aclweb.org/anthology/D07-1071
[92] Luke S. Zettlemoyer and Michael Collins. 2005. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (Edinburgh, Scotland) (UAI'05). AUAI Press, Arlington, Virginia, USA, 658–666.
[93] Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A High-Performance Graph DSL. Proc. ACM Program. Lang. 2, OOPSLA, Article 121 (Oct. 2018), 30 pages. https://doi.org/10.1145/3276491
[94] Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating Structured Queries From Natural Language Using Reinforcement Learning. arXiv:1709.00103 [cs.CL]
