Abstract Syntax Networks for Code Generation and Semantic Parsing

Maxim Rabinovich∗  Mitchell Stern∗  Dan Klein
Computer Science Division
University of California, Berkeley
{rabinovich,mitchell,klein}@cs.berkeley.edu

∗ Equal contribution.

Abstract

Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs are represented as abstract syntax trees (ASTs) and constructed by a decoder with a dynamically-determined modular structure paralleling the structure of the output tree. On the benchmark HEARTHSTONE dataset for code generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy, compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we perform competitively on the ATIS, JOBS, and GEO semantic parsing datasets with no task-specific engineering.

    name: ['D', 'i', 'r', 'e', ' ', 'W', 'o', 'l', 'f', ' ', 'A', 'l', 'p', 'h', 'a']
    cost: ['2']
    type: ['Minion']
    rarity: ['Common']
    race: ['Beast']
    class: ['Neutral']
    description: ['Adjacent', 'minions', 'have', '+', '1', 'Attack', '.']
    health: ['2']
    attack: ['2']
    durability: ['-1']

    class DireWolfAlpha(MinionCard):
        def __init__(self):
            super().__init__(
                "Dire Wolf Alpha", 2, CHARACTER_CLASS.ALL,
                CARD_RARITY.COMMON, minion_type=MINION_TYPE.BEAST)

        def create_minion(self, player):
            return Minion(2, 2, auras=[
                Aura(ChangeAttack(1), MinionSelector(Adjacent()))
            ])

Figure 1: Example code for the "Dire Wolf Alpha" Hearthstone card.

    show me the fare from ci0 to ci1

    lambda $0 e
    ( exists $1 ( and ( from $1 ci0 )
                      ( to $1 ci1 )
                      ( = ( fare $1 ) $0 ) ) )

Figure 2: Example of a query and its logical form from the ATIS dataset. The ci0 and ci1 tokens are entity abstractions introduced in preprocessing (Dong and Lapata, 2016).

1 Introduction

Tasks like semantic parsing and code generation are challenging in part because they are structured (the output must be well-formed) but not synchronous (the output structure diverges from the input structure).

Sequence-to-sequence models have proven effective for both tasks (Dong and Lapata, 2016; Ling et al., 2016), using encoder-decoder frameworks to exploit the sequential structure on both the input and output side. Yet these approaches do not account for much richer structural constraints on outputs—including well-formedness, well-typedness, and executability. The well-formedness case is of particular interest, since it can readily be enforced by representing outputs as abstract syntax trees (ASTs) (Aho et al., 2006), an approach that can be seen as a much lighter weight version of CCG-based semantic parsing (Zettlemoyer and Collins, 2005).

In this work, we introduce abstract syntax networks (ASNs), an extension of the standard encoder-decoder framework utilizing a modular decoder whose submodels are composed to natively generate ASTs in a top-down manner. The decoding process for any given input follows a dynamically chosen mutual recursion between the modules, where the structure of the tree being produced mirrors the call graph of the recursion. We implement this process using a decoder model built of many submodels, each associated with a specific construct in the AST grammar and invoked when that construct is needed in the output tree.

As is common with neural approaches to structured prediction (Chen and Manning, 2014; Vinyals et al., 2015), our decoder proceeds greedily and accesses not only a fixed encoding but also an attention-based representation of the input (Bahdanau et al., 2014).

Our model significantly outperforms previous architectures for code generation and obtains competitive or state-of-the-art results on a suite of semantic parsing benchmarks. On the HEARTHSTONE dataset for code generation, we achieve a token BLEU score of 79.2 and an exact match accuracy of 22.7%, greatly improving over the previous best results of 67.1 BLEU and 6.1% exact match (Ling et al., 2016).

The flexibility of ASNs makes them readily applicable to other tasks with minimal adaptation. We illustrate this point with a suite of semantic parsing experiments. On the JOBS dataset, we improve on the previous state-of-the-art, achieving 92.9% exact match accuracy as compared to the previous record of 90.7%. Likewise, we perform competitively on the ATIS and GEO datasets, matching or exceeding the exact match reported by Dong and Lapata (2016), though not quite reaching the records held by the best previous semantic parsing approaches (Wang et al., 2014).

1.1 Related work

Encoder-decoder architectures, with and without attention, have been applied successfully both to sequence prediction tasks like machine translation and to tree prediction tasks like constituency parsing (Cross and Huang, 2016; Dyer et al., 2016; Vinyals et al., 2015). In the latter case, work has focused on making the task look like sequence-to-sequence prediction, either by flattening the output tree (Vinyals et al., 2015) or by representing it as a sequence of construction decisions (Cross and Huang, 2016; Dyer et al., 2016). Our work differs from both in its use of a recursive top-down generation procedure.

Dong and Lapata (2016) introduced a sequence-to-sequence approach to semantic parsing, including a limited form of top-down recursion, but without the modularity or tight coupling between output grammar and model characteristic of our approach.

Neural (and probabilistic) modeling of code, including for prediction problems, has a longer history. Allamanis et al. (2015) and Maddison and Tarlow (2014) proposed modeling code with a neural language model, generating concrete syntax trees in left-first depth-first order, focusing on metrics like perplexity and applications like code snippet retrieval. More recently, Shin et al. (2017) attacked the same problem using a grammar-based variational autoencoder with top-down generation similar to ours. Meanwhile, a separate line of work has focused on the problem of program induction from input-output pairs (Balog et al., 2016; Liang et al., 2010; Menon et al., 2013).

The prediction framework most similar in spirit to ours is the doubly-recurrent decoder network introduced by Alvarez-Melis and Jaakkola (2017), which propagates information down the tree using a vertical LSTM and between siblings using a horizontal LSTM. Our model differs from theirs in using a separate module for each grammar construct and learning separate vertical updates for siblings when the AST labels require all siblings to be jointly present; we do, however, use a horizontal LSTM for nodes with variable numbers of children. The differences between our models reflect not only design decisions, but also differences in data—since ASTs have labeled nodes and labeled edges, they come with additional structure that our model exploits.

Apart from ours, the best results on the code-generation task associated with the HEARTHSTONE dataset are based on a sequence-to-sequence approach to the problem (Ling et al., 2016). Abstract syntax networks greatly improve on those results.

Previously, Andreas et al. (2016) introduced neural module networks (NMNs) for visual question answering, with modules corresponding to linguistic substructures within the input query. The primary purpose of the modules in NMNs is to compute deep features of images in the style of convolutional neural networks (CNNs). These features are then fed into a final decision layer. In contrast to the modules we describe here, NMN modules do not make decisions about what to generate or which modules to call next, nor do they maintain recurrent state.

[Figure 3 tree diagrams omitted: (a) the root portion of the AST; (b) an excerpt from the same AST, corresponding to the code snippet Aura(ChangeAttack(1), MinionSelector(Adjacent())).]

Figure 3: Fragments from the abstract syntax tree corresponding to the example code in Figure 1. Blue boxes represent composite nodes, which expand via a constructor with a prescribed set of named children. Orange boxes represent primitive nodes, with their corresponding values written underneath. Solid black squares correspond to constructor fields with sequential cardinality, such as the body of a class definition (Figure 3a) or the arguments of a function call (Figure 3b).

2 Data Representation

2.1 Abstract Syntax Trees

Our model makes use of the Abstract Syntax Description Language (ASDL) framework (Wang et al., 1997), which represents code fragments as trees with typed nodes. Primitive types correspond to atomic values, like integers or identifiers. Accordingly, primitive nodes are annotated with a primitive type and a value of that type—for instance, in Figure 3a, the identifier node storing "create_minion" represents a function of the same name.

    primitive types: identifier, object, ...

    stmt
      = FunctionDef(identifier name, arg* args, stmt* body)
      | ClassDef(identifier name, expr* bases, stmt* body)
      | Return(expr? value)
      | ...

    expr
      = BinOp(expr left, operator op, expr right)
      | Call(expr func, expr* args)
      | Str(string s)
      | Name(identifier id, expr_context ctx)
      | ...

    ...

Figure 4: A simplified fragment of the Python ASDL grammar.[1]

[1] The full grammar can be found online on the documentation page for the Python ast module: https://docs.python.org/3/library/ast.html#abstract-grammar

Composite types correspond to language constructs, like expressions or statements. Each type has a collection of constructors, each of which specifies the particular language construct a node of that type represents. Figure 4 shows constructors for the statement (stmt) and expression (expr) types. The associated language constructs include function and class definitions, return statements, binary operations, and function calls.

Composite types enter syntax trees via composite nodes, annotated with a composite type and a choice of constructor specifying how the node expands. The root node in Figure 3a, for example, is a composite node of type stmt that represents a class definition and therefore uses the ClassDef constructor. In Figure 3b, on the other hand, the root uses the Call constructor because it represents a function call.

Children are specified by named and typed fields of the constructor, which have cardinalities of singular, optional, or sequential. By default, fields have singular cardinality, meaning they correspond to exactly one child. For instance, the ClassDef constructor has a singular name field of type identifier. Fields of optional cardinality are associated with zero or one children, while fields of sequential cardinality are associated with zero or more children—these are designated using ? and * suffixes in the grammar, respectively. Fields of sequential cardinality are often used to represent statement blocks, as in the body field of the ClassDef and FunctionDef constructors.

The grammars needed for semantic parsing can easily be given ASDL specifications as well, using primitive types to represent variables, predicates, and atoms and composite types for standard logical building blocks like lambdas and counting (among others). Figure 2 shows what the resulting λ-calculus trees look like. The ASDL grammars for both λ-calculus and Prolog-style logical forms are quite compact, as Figures 9 and 10 in the appendix show.

2.2 Input Representation

We represent inputs as collections of named components, each of which consists of a sequence of tokens. In the case of semantic parsing, inputs have a single component containing the query sentence. In the case of HEARTHSTONE, the card's name and description are represented as sequences of characters and tokens, respectively, while categorical attributes are represented as single-token sequences. For HEARTHSTONE, we restrict our input and output vocabularies to values that occur more than once in the training set.
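To make the node and field structure above concrete, the sketch below shows one possible way to encode such trees in Python. It is purely illustrative: the class names (PrimitiveNode, CompositeNode), the dictionary-based field layout, and the simplified handling of Name/Num nodes are our own choices, not part of the ASDL framework or the implementation described in this paper.

    # Illustrative sketch only: one possible in-memory encoding of ASDL-style
    # ASTs with composite and primitive nodes and the three field cardinalities.
    from dataclasses import dataclass, field
    from typing import Union

    @dataclass
    class PrimitiveNode:
        type: str     # e.g. "identifier"
        value: str    # e.g. "create_minion"

    @dataclass
    class CompositeNode:
        type: str                  # e.g. "expr"
        constructor: str           # e.g. "Call"
        # Field name -> child(ren); cardinality determines the container:
        #   singular   -> Node
        #   optional   -> Node or None
        #   sequential -> list of Nodes
        fields: dict = field(default_factory=dict)

    Node = Union[PrimitiveNode, CompositeNode]

    # The root of Figure 3b, Aura(ChangeAttack(1), MinionSelector(Adjacent())),
    # could then be written roughly as follows (the real grammar wraps function
    # names in Name nodes; we flatten that here for brevity):
    aura_call = CompositeNode("expr", "Call", {
        "func": PrimitiveNode("identifier", "Aura"),
        "args": [
            CompositeNode("expr", "Call", {
                "func": PrimitiveNode("identifier", "ChangeAttack"),
                "args": [PrimitiveNode("object", "1")],
            }),
            CompositeNode("expr", "Call", {
                "func": PrimitiveNode("identifier", "MinionSelector"),
                "args": [CompositeNode("expr", "Call", {
                    "func": PrimitiveNode("identifier", "Adjacent"),
                    "args": [],
                })],
            }),
        ],
    })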
3 Model Architecture

Our model uses an encoder-decoder architecture with hierarchical attention. The key idea behind our approach is to structure the decoder as a collection of mutually recursive modules. The modules correspond to elements of the AST grammar and are composed together in a manner that mirrors the structure of the tree being generated. A vertical LSTM state is passed from module to module to propagate information during the decoding process.

The encoder uses bidirectional LSTMs to embed each component and a feedforward network to combine them. Component- and token-level attention is applied over the input at each step of the decoding process.

We train our model using negative log likelihood as the loss function. The likelihood encompasses terms for all generation decisions made by the decoder.

3.1 Encoder

Each component c of the input is encoded using a component-specific bidirectional LSTM. This results in forward and backward token encodings (h_c^→, h_c^←) that are later used by the attention mechanism. To obtain an encoding of the input as a whole for decoder initialization, we concatenate the final forward and backward encodings of each component into a single vector and apply a linear projection.

3.2 Decoder Modules

The decoder decomposes into several classes of modules, one per construct in the grammar, which we discuss in turn. Throughout, we let v denote the current vertical LSTM state, and use f to represent a generic feedforward neural network. LSTM updates with hidden state h and input x are notated as LSTM(h, x).

Composite type modules. Each composite type T has a corresponding module whose role is to select among the constructors C for that type. As Figure 5a exhibits, a composite type module receives a vertical LSTM state v as input and applies a feedforward network f_T and a softmax output layer to choose a constructor:

    p(C | T, v) = softmax_C(f_T(v)).

Control is then passed to the module associated with constructor C.

Constructor modules. Each constructor C has a corresponding module whose role is to compute an intermediate vertical LSTM state v_{u,F} for each of its fields F whenever C is chosen at a composite node u.

For each field F of the constructor, an embedding e_F is concatenated with an attention-based context vector c and fed through a feedforward neural network f_C to obtain a context-dependent field embedding:

    ẽ_F = f_C(e_F, c).

An intermediate vertical state for the field F at composite node u is then computed as

    v_{u,F} = LSTM^v(v_u, ẽ_F).

Figure 5b illustrates the process, starting with a single vertical LSTM state and ending with one updated state per field.
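The sketch below gives a minimal picture of these first two module classes. It is written in PyTorch rather than the DyNet implementation the paper describes, and all class names, dimensions, nonlinearities, and the use of LSTMCell for the vertical LSTM are our assumptions, not details from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CompositeTypeModule(nn.Module):
        """Chooses a constructor C for composite type T: p(C | T, v) = softmax(f_T(v))."""
        def __init__(self, hidden_dim, num_constructors):
            super().__init__()
            self.f_T = nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                nn.Linear(hidden_dim, num_constructors))

        def forward(self, v_hidden):
            # v_hidden: hidden part of the vertical LSTM state, shape (hidden_dim,)
            return F.softmax(self.f_T(v_hidden), dim=-1)   # distribution over constructors

    class ConstructorModule(nn.Module):
        """Computes an intermediate vertical state v_{u,F} for each field F of constructor C."""
        def __init__(self, hidden_dim, context_dim, field_names):
            super().__init__()
            self.field_names = field_names
            # Learned field embeddings e_F, one per field of this constructor.
            self.field_emb = nn.ParameterDict(
                {name: nn.Parameter(torch.randn(hidden_dim)) for name in field_names})
            self.f_C = nn.Linear(hidden_dim + context_dim, hidden_dim)   # e~_F = f_C(e_F, c)
            self.vertical_lstm = nn.LSTMCell(hidden_dim, hidden_dim)     # LSTM^v

        def forward(self, v_state, context):
            # v_state: vertical LSTM state (h, c) at node u, each of shape (1, hidden_dim);
            # context: attention-based context vector, shape (context_dim,).
            h_u, c_u = v_state
            field_states = {}
            for name in self.field_names:
                e_tilde = torch.tanh(
                    self.f_C(torch.cat([self.field_emb[name], context], dim=-1)))
                field_states[name] = self.vertical_lstm(e_tilde.unsqueeze(0), (h_u, c_u))
            return field_states   # one v_{u,F} per field

In the full model there would be one such module per composite type and per constructor in the grammar, with a shared attention mechanism supplying the context vector.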

(a) A composite type module choosing a constructor for the corresponding type.
(b) A constructor module computing updated vertical LSTM states.

(c) A constructor field module (sequential cardinality) generating children to populate the field. At each step, the module decides whether to generate a child and continue (white circle) or stop (black circle).
(d) A primitive type module choosing a value from a closed list.

Figure 5: The module classes constituting our decoder. For brevity, we omit the cardinality modules for singular and optional cardinalities.

Constructor field modules. Each field F of a constructor has a corresponding module whose role is to determine the number of children associated with that field and to propagate an updated vertical LSTM state to them.

In the case of fields with singular cardinality, the decision and update are both vacuous, as exactly one child is always generated. Hence these modules forward the field vertical LSTM state v_{u,F} unchanged to the child w corresponding to F:

    v_w = v_{u,F}.    (1)

Fields with optional cardinality can have either zero or one children; this choice is made using a feedforward network applied to the vertical LSTM state:

    p(z_F = 1 | v_{u,F}) = sigmoid(f_F^gen(v_{u,F})).    (2)

If a child is to be generated, then as in (1), the state is propagated forward without modification.

In the case of sequential fields, a horizontal LSTM is employed for both child decisions and state updates. We refer to Figure 5c for an illustration of the recurrent process. After being initialized with a transformation of the vertical state, s_{F,0} = W_F v_{u,F}, the horizontal LSTM iteratively decides whether to generate another child by applying a modified form of (2):

    p(z_{F,i} = 1 | s_{F,i-1}, v_{u,F}) = sigmoid(f_F^gen(s_{F,i-1}, v_{u,F})).

If z_{F,i} = 0, generation stops and the process terminates, as represented by the solid black circle in Figure 5c. Otherwise, the process continues as represented by the white circle in Figure 5c. In that case, the horizontal state s_{F,i-1} is combined with the vertical state v_{u,F} and an attention-based context vector c_{F,i} using a feedforward network f_F^update to obtain a joint context-dependent encoding of the field F and the position i:

    ẽ_{F,i} = f_F^update(v_{u,F}, s_{F,i-1}, c_{F,i}).

The result is used to perform a vertical LSTM update for the corresponding child w_i:

    v_{w_i} = LSTM^v(v_{u,F}, ẽ_{F,i}).

Finally, the horizontal LSTM state is updated using the same field-position encoding, and the process continues:

    s_{F,i} = LSTM^h(s_{F,i-1}, ẽ_{F,i}).

Primitive type modules. Each primitive type T has a corresponding module whose role is to select among the values y within the domain of that type. Figure 5d presents an example of the simplest form of this selection process, where the value y is obtained from a closed list via a softmax layer applied to an incoming vertical LSTM state:

    p(y | T, v) = softmax_y(f_T(v)).

Some string-valued types are open class, however. To deal with these, we allow generation both from a closed list of previously seen values, as in Figure 5d, and synthesis of new values. Synthesis is delegated to a character-level LSTM language model (Bengio et al., 2003), and part of the role of the primitive module for open class types is to choose whether to synthesize a new value or not. During training, we allow the model to use the character LSTM only for unknown strings but include the log probability of that binary decision in the loss in order to ensure the model learns when to generate from the character LSTM.
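Before turning to the overall decoding process, here is a simplified sketch of the most involved of these modules, the sequential constructor field module. It is written in PyTorch as an illustration only: the greedy 0.5 stopping threshold, the attend callable, the zero initialization of the horizontal cell state, and all dimensions are our assumptions rather than details from the paper.

    import torch
    import torch.nn as nn

    class SequentialFieldModule(nn.Module):
        """Sketch of a constructor field module for a sequential-cardinality field F."""
        def __init__(self, hidden_dim, context_dim):
            super().__init__()
            self.W_F = nn.Linear(hidden_dim, hidden_dim)                         # s_{F,0} = W_F v_{u,F}
            self.f_gen = nn.Linear(2 * hidden_dim, 1)                            # continue/stop decision
            self.f_update = nn.Linear(2 * hidden_dim + context_dim, hidden_dim)  # e~_{F,i}
            self.vertical_lstm = nn.LSTMCell(hidden_dim, hidden_dim)             # LSTM^v
            self.horizontal_lstm = nn.LSTMCell(hidden_dim, hidden_dim)           # LSTM^h

        def forward(self, v_field, attend, max_children=50):
            # v_field: vertical state (h, c) for the field, each of shape (1, hidden_dim).
            # attend: callable mapping the current state to a context vector c_{F,i}
            #         of shape (1, context_dim).
            h_v, c_v = v_field
            s_h = torch.tanh(self.W_F(h_v))        # horizontal hidden state s_{F,0}
            s_c = torch.zeros_like(s_h)            # horizontal cell state (assumed zero-init)
            child_states = []
            for i in range(max_children):
                p_gen = torch.sigmoid(self.f_gen(torch.cat([s_h, h_v], dim=-1)))
                if p_gen.item() < 0.5:             # z_{F,i} = 0: stop generating children
                    break
                context = attend(torch.cat([s_h, h_v], dim=-1))
                e_tilde = torch.tanh(
                    self.f_update(torch.cat([h_v, s_h, context], dim=-1)))
                child_states.append(self.vertical_lstm(e_tilde, (h_v, c_v)))   # v_{w_i}
                s_h, s_c = self.horizontal_lstm(e_tilde, (s_h, s_c))           # s_{F,i}
            return child_states

During training one would instead score the gold continue/stop decisions rather than thresholding greedily; the loop above reflects the greedy decoding described in the introduction.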
3.3 Decoding Process

The decoding process proceeds through mutual recursion between the constituting modules, where the syntactic structure of the output tree mirrors the call graph of the generation procedure. At each step, the active decoder module either makes a generation decision, propagates state down the tree, or both.

To construct a composite node of a given type, the decoder calls the appropriate composite type module to obtain a constructor and its associated module. That module is then invoked to obtain updated vertical LSTM states for each of the constructor's fields, and the corresponding constructor field modules are invoked to advance the process to those children.

This process continues downward, stopping at each primitive node, where a value is generated but no further recursion is carried out.

3.4 Attention

Following standard practice for sequence-to-sequence models, we compute a raw bilinear attention score q_t^raw for each token t in the input using the decoder's current state x and the token's encoding e_t:

    q_t^raw = e_t^T W x.

The current state x can be either the vertical LSTM state in isolation or a concatenation of the vertical LSTM state and either a horizontal LSTM state or a character LSTM state (for string generation). Each submodule that computes attention does so using a separate matrix W.

A separate attention score q_c^comp is computed for each component of the input, independent of its content:

    q_c^comp = w_c^T x.

The final token-level attention scores are the sums of the raw token-level scores and the corresponding component-level scores:

    q_t = q_t^raw + q_{c(t)}^comp,

where c(t) denotes the component in which token t occurs. The attention weight vector a is then computed using a softmax:

    a = softmax(q).

Given the weights, the attention-based context is given by:

    c = Σ_t a_t e_t.

Certain decision points that require attention have been highlighted in the description above; however, in our final implementation we made attention available to the decoder at all decision points.

Supervised attention. In the datasets we consider, partial or total copying of input tokens into primitive nodes is quite common. Rather than providing an explicit copying mechanism (Ling et al., 2016), we instead generate alignments where possible to define a set of tokens on which the attention at a given primitive node should be concentrated.[2] If no matches are found, the corresponding set of tokens is taken to be the whole input.

The attention supervision enters the loss through a term that encourages the final attention weights to be concentrated on the specified subset. Formally, if the matched subset of component-token pairs is S, the loss term associated with the supervision would be

    log Σ_t exp(a_t) − log Σ_{t∈S} exp(a_t),    (3)

where a_t is the attention weight associated with token t, and the sum in the first term ranges over all tokens in the input. The loss in (3) can be interpreted as the negative log probability of attending to some token in S.

[2] Alignments are generated using an exact string match heuristic that also included some limited normalization, primarily splitting of special characters, undoing camel case, and lemmatization for the semantic parsing datasets.
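The following sketch spells out the attention computation of Section 3.4 and the supervision term of Eq. (3). It is again a PyTorch illustration under our own assumptions (names, shapes, initialization scale); the paper's implementation uses DyNet and attaches a separate bilinear matrix W to each submodule.

    import torch
    import torch.nn as nn

    class HierarchicalAttention(nn.Module):
        """Component- and token-level attention, one instance per decision point."""
        def __init__(self, state_dim, enc_dim, num_components):
            super().__init__()
            self.W = nn.Parameter(torch.randn(enc_dim, state_dim) * 0.01)        # bilinear matrix
            self.w_comp = nn.Parameter(torch.randn(num_components, state_dim) * 0.01)

        def forward(self, x, token_encodings, token_component):
            # x: decoder state, shape (state_dim,)
            # token_encodings: e_t for all tokens, shape (num_tokens, enc_dim)
            # token_component: component index c(t) of each token, shape (num_tokens,)
            q_raw = token_encodings @ self.W @ x           # q_t^raw = e_t^T W x
            q_comp = self.w_comp @ x                       # q_c^comp = w_c^T x
            q = q_raw + q_comp[token_component]            # q_t = q_t^raw + q_{c(t)}^comp
            a = torch.softmax(q, dim=0)                    # attention weights
            context = a @ token_encodings                  # c = sum_t a_t e_t
            return a, context

    def supervised_attention_loss(a, supervised_mask):
        # Mirrors Eq. (3) as written in the text, computed directly from the
        # attention weights a and a boolean mask selecting the matched set S.
        return torch.logsumexp(a, dim=0) - torch.logsumexp(a[supervised_mask], dim=0)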
4 Experimental evaluation

4.1 Semantic parsing

Data. We use three semantic parsing datasets: JOBS, GEO, and ATIS. All three consist of natural language queries paired with a logical representation of their denotations. JOBS consists of 640 such pairs, with Prolog-style logical representations, while GEO and ATIS consist of 880 and 5,410 such pairs, respectively, with λ-calculus logical forms. We use the same training-test split as Zettlemoyer and Collins (2005) for JOBS and GEO, and the standard training-development-test split for ATIS. We use the preprocessed versions of these datasets made available by Dong and Lapata (2016), where text in the input has been lowercased and stemmed using NLTK (Bird et al., 2009), and matching entities appearing in the same input-output pair have been replaced by numbered abstract identifiers of the same type.

Evaluation. We compute accuracies using tree exact match for evaluation. Following the publicly released code of Dong and Lapata (2016), we canonicalize the order of the children within conjunction and disjunction nodes to avoid spurious errors, but otherwise perform no transformations before comparison.

4.2 Code generation

Data. We use the HEARTHSTONE dataset introduced by Ling et al. (2016), which consists of 665 cards paired with their implementations in the open-source Hearthbreaker engine.[3] Our training-development-test split is identical to that of Ling et al. (2016), with split sizes of 533, 66, and 66, respectively.

[3] Available online at https://github.com/danielyule/hearthbreaker.

Cards contain two kinds of components: textual components that contain the card's name and a description of its function, and categorical ones that contain numerical attributes (attack, health, cost, and durability) or enumerated attributes (rarity, type, race, and class). The name of the card is represented as a sequence of characters, while its description consists of a sequence of tokens split on whitespace and punctuation. All categorical components are represented as single-token sequences.

Evaluation. For direct comparison to the results of Ling et al. (2016), we evaluate our predicted code based on exact match and token-level BLEU relative to the reference implementations from the library. We additionally compute node-based precision, recall, and F1 scores for our predicted trees compared to the reference code ASTs. Formally, these scores are obtained by defining the intersection of the predicted and gold trees as their largest common tree prefix.

4.3 Settings

For each experiment, all feedforward and LSTM hidden dimensions are set to the same value. We select the dimension from {30, 40, 50, 60, 70} for the smaller JOBS and GEO datasets, or from {50, 75, 100, 125, 150} for the larger ATIS and HEARTHSTONE datasets. The dimensionality used for the inputs to the encoder is set to 100 in all cases. We apply dropout to the non-recurrent connections of the vertical and horizontal LSTMs, selecting the noise ratio from {0.2, 0.3, 0.4, 0.5}. All parameters are randomly initialized using Glorot initialization (Glorot and Bengio, 2010).

We perform 200 passes over the data for the JOBS and GEO experiments, or 400 passes for the ATIS and HEARTHSTONE experiments. Early stopping based on exact match is used for the semantic parsing experiments, where performance is evaluated on the training set for JOBS and GEO or on the development set for ATIS. Parameters for the HEARTHSTONE experiments are selected based on development BLEU scores. In order to promote generalization, ties are broken in all cases with a preference toward higher dropout ratios and lower dimensionalities, in that order.

Our system is implemented in Python using the DyNet neural network library (Neubig et al., 2017). We use the Adam optimizer (Kingma and Ba, 2014) with its default settings for optimization, with a batch size of 20 for the semantic parsing experiments, or a batch size of 10 for the HEARTHSTONE experiments.
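Before turning to the results, the following self-contained sketch illustrates the node-based metrics described in Section 4.2 under one natural reading of the largest-common-tree-prefix definition; the tuple encoding of trees and the positional matching of children are our assumptions, not the paper's exact procedure.

    def tree_size(node):
        """Number of nodes in a tree given as (label, [children])."""
        label, children = node
        return 1 + sum(tree_size(c) for c in children)

    def common_prefix_size(pred, gold):
        """Size of a common tree prefix, matching nodes top-down from the root."""
        (p_label, p_children), (g_label, g_children) = pred, gold
        if p_label != g_label:
            return 0
        return 1 + sum(common_prefix_size(p, g)
                       for p, g in zip(p_children, g_children))

    def node_prf(pred, gold):
        common = common_prefix_size(pred, gold)
        precision = common / tree_size(pred)
        recall = common / tree_size(gold)
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Tiny example: the prediction gets the function call right but the argument wrong.
    pred = ("Call", [("Name", []), ("Str", [])])
    gold = ("Call", [("Name", []), ("Num", [])])
    print(node_prf(pred, gold))   # (0.666..., 0.666..., 0.666...)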

4.4 Results

Our results on the semantic parsing datasets are presented in Table 1. Our basic system achieves a new state-of-the-art accuracy of 91.4% on the JOBS dataset, and this number improves to 92.9% when supervised attention is added. On the ATIS and GEO datasets, we respectively exceed and match the results of Dong and Lapata (2016). However, these fall short of the previous best results of 91.3% and 90.4%, respectively, obtained by Wang et al. (2014). This difference may be partially attributable to the use of typing information or rich lexicons in most previous semantic parsing approaches (Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2013; Wang et al., 2014; Zhao and Huang, 2015).

            ATIS                      GEO                       JOBS
    System      Accuracy      System      Accuracy      System      Accuracy
    ZH15        84.2          ZH15        88.9          ZH15        85.0
    ZC07        84.6          KCAZ13      89.0          PEK03       88.0
    WKZ14       91.3          WKZ14       90.4          LJK13       90.7
    DL16        84.6          DL16        87.1          DL16        90.0
    ASN         85.3          ASN         85.7          ASN         91.4
     + SUPATT   85.9           + SUPATT   87.1           + SUPATT   92.9

Table 1: Accuracies for the semantic parsing tasks. ASN denotes our abstract syntax network framework. SUPATT refers to the supervised attention mentioned in Section 3.4.

    System      Accuracy    BLEU    F1
    NEAREST     3.0         65.0    65.7
    LPN         6.1         67.1    –
    ASN         18.2        77.6    72.4
     + SUPATT   22.7        79.2    75.6

Table 2: Results for the HEARTHSTONE task. SUPATT refers to the system with supervised attention mentioned in Section 3.4. LPN refers to the system of Ling et al. (2016). Our nearest neighbor baseline NEAREST follows that of Ling et al. (2016), though it performs somewhat better; its nonzero exact match number stems from spurious repetition in the data.

    class IronbarkProtector(MinionCard):
        def __init__(self):
            super().__init__(
                'Ironbark Protector', 8,
                CHARACTER_CLASS.DRUID,
                CARD_RARITY.COMMON)

        def create_minion(self, player):
            return Minion(
                8, 8, taunt=True)

Figure 6: Cards with minimal descriptions exhibit a uniform structure that our system almost always predicts correctly, as in this instance.

    class ManaWyrm(MinionCard):
        def __init__(self):
            super().__init__(
                'Mana Wyrm', 1,
                CHARACTER_CLASS.MAGE,
                CARD_RARITY.COMMON)

        def create_minion(self, player):
            return Minion(
                1, 3, effects=[
                    Effect(
                        SpellCast(),
                        ActionTag(
                            Give(ChangeAttack(1)),
                            SelfSelector()))
                ])

Figure 7: For many cards with moderately complex descriptions, the implementation follows a functional style that seems to suit our modeling strategy, usually leading to correct predictions.

On the HEARTHSTONE dataset, we improve significantly over the initial results of Ling et al. (2016) across all evaluation metrics, as shown in Table 2. On the more stringent exact match metric, we improve from 6.1% to 18.2%, and on token-level BLEU, we improve from 67.1 to 77.6. When supervised attention is added, we obtain an additional increase of several points on each scale, achieving peak results of 22.7% accuracy and 79.2 BLEU.

4.5 Error Analysis and Discussion

As the examples in Figures 6-8 show, classes in the HEARTHSTONE dataset share a great deal of common structure. As a result, in the simplest cases, such as in Figure 6, generating the code is simply a matter of matching the overall structure and plugging in the correct values in the initializer and a few other places. In such cases, our system generally predicts the correct code, with the exception of instances in which strings are incorrectly transduced. Introducing a dedicated copying mechanism like the one used by Ling et al. (2016) or more specialized machinery for string transduction may alleviate this latter problem.

The next simplest category of card-code pairs consists of those in which the card's logic is mostly implemented via nested function calls. Figure 7 illustrates a typical case, in which the card's effect is triggered by a game event (a spell being cast) and both the trigger and the effect are described by arguments to an Effect constructor. Our system usually also performs well on instances like these, apart from idiosyncratic errors that can take the form of under- or overgeneration or simply substitution of incorrect predicates.

    # Gold code:
    class MultiShot(SpellCard):
        def __init__(self):
            super().__init__(
                'Multi-Shot', 4,
                CHARACTER_CLASS.HUNTER,
                CARD_RARITY.FREE)

        def use(self, player, game):
            super().use(player, game)
            targets = copy.copy(
                game.other_player.minions)
            for i in range(0, 2):
                target = game.random_choice(targets)
                targets.remove(target)
                target.damage(
                    player.effective_spell_damage(3),
                    self)

        def can_use(self, player, game):
            return (
                super().can_use(player, game) and
                (len(game.other_player.minions) >= 2))

    # Predicted code:
    class MultiShot(SpellCard):
        def __init__(self):
            super().__init__(
                'Multi-Shot', 4,
                CHARACTER_CLASS.HUNTER,
                CARD_RARITY.FREE)

        def use(self, player, game):
            super().use(player, game)
            minions = copy.copy(
                game.other_player.minions)
            for i in range(0, 3):
                minion = game.random_choice(minions)
                minions.remove(minion)

        def can_use(self, player, game):
            return (
                super().can_use(player, game) and
                len(game.other_player.minions) >= 3)

Figure 8: Cards with nontrivial logic expressed in an imperative style are the most challenging for our system. In this example, our prediction comes close to the gold code, but misses an important statement in addition to making a few other minor errors. Gold code (top); predicted code (bottom).
Cards whose code includes complex logic expressed in an imperative style, as in Figure 8, pose the greatest challenge for our system. Factors like variable naming, nontrivial control flow, and interleaving of code predictable from the description with code required due to the conventions of the library combine to make the code for these cards difficult to generate. In some instances (as in the figure), our system is nonetheless able to synthesize a close approximation. However, in the most complex cases, the predictions deviate significantly from the correct implementation.

In addition to the specific errors our system makes, some larger issues remain unresolved. Existing evaluation metrics only approximate the actual metric of interest: functional equivalence. Modifications of BLEU, tree F1, and exact match that canonicalize the code—for example, by anonymizing all variables—may prove more meaningful. Direct evaluation of functional equivalence is of course impossible in general (Sipser, 2006), and practically challenging even for the HEARTHSTONE dataset because it requires integrating with the game engine.

Existing work also does not attempt to enforce semantic coherence in the output. Long-distance semantic dependencies, between occurrences of a single variable for example, in particular are not modeled. Nor is well-typedness or executability. Overcoming these evaluation and modeling issues remains an important open problem.

5 Conclusion

ASNs provide a modular encoder-decoder architecture that can readily accommodate a variety of tasks with structured output spaces. They are particularly applicable in the presence of recursive decompositions, where they can provide a simple decoding process that closely parallels the inherent structure of the outputs. Our results demonstrate their promise for tree prediction tasks, and we believe their application to more general output structures is an interesting avenue for future work.

Acknowledgments

MR is supported by an NSF Graduate Research Fellowship and a Fannie and John Hertz Foundation Google Fellowship. MS is supported by an NSF Graduate Research Fellowship.

References

Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2006. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Miltiadis Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. 2015. Bimodal modelling of source code and natural language. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, pages 2123–2132.

David Alvarez-Melis and Tommi S. Jaakkola. 2017. Tree-structured decoding with doubly-recurrent neural networks. In Proceedings of the International Conference on Learning Representations (ICLR) 2017.

Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473.

Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2016. DeepCoder: Learning to write programs. CoRR abs/1611.01989.

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research 3:1137–1155.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media, Inc., 1st edition.

Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, pages 740–750.

James Cross and Liang Huang. 2016. Span-based constituency parsing with a structure-label system and provably optimal dynamic oracles. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, pages 1–11.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. CoRR abs/1601.01280.

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recurrent neural network grammars. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, pages 199–209.

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2010).

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980. http://arxiv.org/abs/1412.6980.

Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke S. Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, Seattle, Washington, USA, pages 1545–1556.

Percy Liang, Michael I. Jordan, and Dan Klein. 2010. Learning programs: A hierarchical Bayesian approach. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, pages 639–646.

Percy Liang, Michael I. Jordan, and Dan Klein. 2013. Learning dependency-based compositional semantics. Computational Linguistics 39(2):389–446. https://doi.org/10.1162/COLI_a_00127.

Wang Ling, Phil Blunsom, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Fumin Wang, and Andrew Senior. 2016. Latent predictor networks for code generation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany, Volume 1: Long Papers.

Chris J. Maddison and Daniel Tarlow. 2014. Structured generative models of natural source code. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, pages 649–657.

Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler W. Lampson, and Adam Kalai. 2013. A machine learning framework for programming by example. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, pages 187–195.

Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, and Pengcheng Yin. 2017. DyNet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980.

Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th International Conference on Intelligent User Interfaces, ACM, pages 149–157.

Richard Shin, Alexander A. Alemi, Geoffrey Irving, and Oriol Vinyals. 2017. Tree-structured variational autoencoder. In Proceedings of the International Conference on Learning Representations (ICLR) 2017.

Michael Sipser. 2006. Introduction to the Theory of Computation. Course Technology, second edition.

Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey E. Hinton. 2015. Grammar as a foreign language. In Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, Quebec, Canada, pages 2773–2781.

Adrienne Wang, Tom Kwiatkowski, and Luke S. Zettlemoyer. 2014. Morpho-syntactic lexical generalization for CCG semantic parsing. In EMNLP, pages 1284–1295.

Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Christopher S. Serra. 1997. The Zephyr abstract syntax description language. In Proceedings of the Conference on Domain-Specific Languages (DSL '97), USENIX Association, Berkeley, CA, USA, pages 17–17.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI '05, Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence, Edinburgh, Scotland, pages 658–666.

Luke S. Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), pages 678–687.

Kai Zhao and Liang Huang. 2015. Type-driven incremental semantic parsing with polymorphism. In NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, pages 1416–1421.

A Appendix

    expr
      = Apply(pred predicate, arg* arguments)
      | Not(expr argument)
      | Or(expr left, expr right)
      | And(expr* arguments)

    arg
      = Literal(lit literal)
      | Variable(var variable)

Figure 9: The Prolog-style grammar we use for the JOBS task.

    expr
      = Variable(var variable)
      | Entity(ent entity)
      | Number(num number)
      | Apply(pred predicate, expr* arguments)
      | Argmax(var variable, expr domain, expr body)
      | Argmin(var variable, expr domain, expr body)
      | Count(var variable, expr body)
      | Exists(var variable, expr body)
      | Lambda(var variable, var_type type, expr body)
      | Max(var variable, expr body)
      | Min(var variable, expr body)
      | Sum(var variable, expr domain, expr body)
      | The(var variable, expr body)
      | Not(expr argument)
      | And(expr* arguments)
      | Or(expr* arguments)
      | Compare(cmp_op op, expr left, expr right)

    cmp_op = Equal | LessThan | GreaterThan

Figure 10: The λ-calculus grammar used by our system.