arXiv:1704.08387v3 [cs.CL] 14 Jun 2017

Learning Structured Natural Language Representations for Semantic Parsing

Jianpeng Cheng†  Siva Reddy†  Vijay Saraswat‡  Mirella Lapata†
†School of Informatics, University of Edinburgh
‡IBM T.J. Watson Research
{jianpeng.cheng,siva.reddy}@ed.ac.uk, [email protected], [email protected]

Abstract

We introduce a neural semantic parser which converts natural language utterances to intermediate representations in the form of predicate-argument structures, which are induced with a transition system and subsequently mapped to target domains. The semantic parser is trained end-to-end using annotated logical forms or their denotations. We achieve the state of the art on SPADES and GRAPHQUESTIONS and obtain competitive results on GEOQUERY and WEBQUESTIONS. The induced predicate-argument structures shed light on the types of representations useful for semantic parsing and how these are different from linguistically motivated ones.

¹Our code will be available at https://github.com/cheng6076/scanner.

1 Introduction

Semantic parsing is the task of mapping natural language utterances to machine interpretable meaning representations. Despite differences in the choice of meaning representation and model structure, most existing work conceptualizes semantic parsing following two main approaches. Under the first approach, an utterance is parsed and grounded to a meaning representation directly via learning a task-specific grammar (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Wong and Mooney, 2006; Kwiatkowksi et al., 2010; Liang et al., 2011; Berant et al., 2013; Flanigan et al., 2014; Pasupat and Liang, 2015; Groschwitz et al., 2015). Under the second approach, the utterance is first parsed to an intermediate task-independent representation tied to a syntactic parser and then mapped to a grounded representation (Kwiatkowski et al., 2013; Reddy et al., 2014, 2016; Krishnamurthy and Mitchell, 2015; Gardner and Krishnamurthy, 2017). A merit of the two-stage approach is that it creates reusable intermediate interpretations, which potentially enables the handling of unseen words and knowledge transfer across domains (Bender et al., 2015).

The successful application of encoder-decoder models (Bahdanau et al., 2015; Sutskever et al., 2014) to a variety of NLP tasks has provided strong impetus to treat semantic parsing as a sequence transduction problem, where an utterance is mapped to a target meaning representation in string format (Dong and Lapata, 2016; Jia and Liang, 2016; Kočiský et al., 2016). Such models still fall under the first approach; however, in contrast to previous work (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Liang et al., 2011), they reduce the need for domain-specific assumptions, grammar learning, and more generally extensive feature engineering. But this modeling flexibility comes at a cost, since it is no longer possible to interpret how meaning composition is performed. Such knowledge plays a critical role in understanding modeling limitations so as to build better semantic parsers. Moreover, without any task-specific prior knowledge, the learning problem is fairly unconstrained, both in terms of the possible derivations to consider and in terms of the target output, which can be ill-formed (e.g., with extra or missing brackets).

In this work, we propose a neural semantic parser that alleviates the aforementioned problems. Our model falls under the second class of approaches, where utterances are first mapped to an intermediate representation containing natural language predicates. However, rather than using an external parser (Reddy et al., 2014, 2016) or manually specified CCG grammars (Kwiatkowski et al., 2013), we induce intermediate representations in the form of predicate-argument structures from data. This is achieved with a transition-based approach which by design yields recursive semantic structures, avoiding the problem of generating ill-formed meaning representations. Compared to existing chart-based semantic parsers (Krishnamurthy and Mitchell, 2012; Cai and Yates, 2013; Berant et al., 2013; Berant and Liang, 2014), the transition-based approach does not require feature decomposition over structures and thereby enables the exploration of rich, non-local features. The output of the transition system is then grounded (e.g., to a knowledge base) with a neural mapping model under the assumption that grounded and ungrounded structures are isomorphic.² As a result, we obtain a neural network that jointly learns to parse natural language semantics and induce a lexicon that helps grounding.

²We discuss the merits and limitations of this assumption in Section 5.

The whole network is trained end-to-end on natural language utterances paired with annotated logical forms or their denotations. We conduct experiments on four datasets, including GEOQUERY (which has logical forms; Zelle and Mooney 1996), SPADES (Bisk et al., 2016), WEBQUESTIONS (Berant et al., 2013), and GRAPHQUESTIONS (Su et al., 2016) (which have denotations). Our semantic parser achieves the state of the art on SPADES and GRAPHQUESTIONS, while obtaining competitive results on GEOQUERY and WEBQUESTIONS. A side-product of our modeling framework is that the induced intermediate representations can contribute to rationalizing neural predictions (Lei et al., 2016). Specifically, they can shed light on the kinds of representations (especially predicates) useful for semantic parsing. Evaluation of the induced predicate-argument relations against syntax-based ones reveals that they are interpretable and meaningful compared to heuristic baselines, but they sometimes deviate from linguistic conventions.

2 Preliminaries

Problem Formulation  Let K denote a knowledge base or, more generally, a reasoning system, and x an utterance paired with a grounded meaning representation G or its denotation y. Our problem is to learn a semantic parser that maps x to G via an intermediate ungrounded representation U. When G is executed against K, it outputs denotation y.

Grounded Meaning Representation  We represent grounded meaning representations in FunQL (Kate et al., 2005), amongst many other alternatives such as lambda calculus (Zettlemoyer and Collins, 2005), λ-DCS (Liang, 2013) or graph queries (Holzschuher and Peinl, 2013; Harris et al., 2013). FunQL is a variable-free query language, where each predicate is treated as a function symbol that modifies an argument list. For example, the FunQL representation for the utterance which states do not border texas is:

    answer(exclude(state(all), next_to(texas)))

where next_to is a domain-specific binary predicate that takes one argument (i.e., the entity texas) and returns a set of entities (e.g., the states bordering Texas) as its denotation. all is a special predicate that returns a collection of entities. exclude is a predicate that returns the difference between two input sets.
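To make these denotations concrete, the following sketch (our own illustration, not part of the paper's system) evaluates the grounded query against a toy GeoQuery-style knowledge base in Python. Every predicate is implemented as a set-valued function; the miniature facts and entity names are invented for exposition.

    # Toy evaluation of answer(exclude(state(all), next_to(texas)))
    # against a hypothetical mini knowledge base (illustrative only).
    STATES = {"texas", "new_mexico", "oklahoma", "maine"}
    BORDERS = {("new_mexico", "texas"), ("oklahoma", "texas")}

    def all_():                  # `all`: the entire collection of entities
        return set(STATES)

    def state(entities):         # type-checking predicate: keep only states
        return {e for e in entities if e in STATES}

    def next_to(entity):         # domain-specific binary predicate: takes one
        return {s for s, t in BORDERS if t == entity}   # argument, returns a set

    def exclude(a, b):           # two-argument meta predicate: set difference
        return a - b

    def answer(denotation):      # denotation wrapper
        return denotation

    # which states do not border texas?
    print(answer(exclude(state(all_()), next_to("texas"))))
    # -> {'texas', 'maine'}: every state except those bordering Texas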
An advantage of FunQL is that the resulting s-expression encodes semantic compositionality and the derivation of the logical form. This property makes FunQL logical forms natural to generate with recurrent neural networks (Vinyals et al., 2015; Choe and Charniak, 2016; Dyer et al., 2016). However, FunQL is less expressive than lambda calculus, partially due to the elimination of variables. A more compact logical formulation which our method also applies to is λ-DCS (Liang, 2013). In the absence of anaphora and composite binary predicates, conversion algorithms exist between FunQL and λ-DCS. However, we leave this to future work.
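As an illustration of our own (not part of the paper's model): because a FunQL s-expression is fully bracketed, its derivation tree can be recovered with a simple recursive-descent reader, so generating the string token by token amounts to a pre-order traversal of the tree.

    import re

    def parse_funql(expr):
        """Recover the derivation tree encoded by a FunQL s-expression.
        Non-terminals become (predicate, children) pairs; entities and
        the special predicate all are leaves."""
        tokens = re.findall(r"[\w$.]+|[(),]", expr)
        pos = 0

        def read():
            nonlocal pos
            head = tokens[pos]
            pos += 1
            if pos < len(tokens) and tokens[pos] == "(":   # non-terminal
                pos += 1                                   # consume '('
                children = [read()]
                while tokens[pos] == ",":
                    pos += 1
                    children.append(read())
                pos += 1                                   # consume ')'
                return (head, children)
            return head                                    # terminal

        return read()

    print(parse_funql("answer(exclude(state(all), next_to(texas)))"))
    # ('answer', [('exclude', [('state', ['all']), ('next_to', ['texas'])])])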

Ungrounded Meaning Representation  We also use FunQL to express ungrounded meaning representations. The latter consist primarily of natural language predicates and domain-general predicates. Assuming for simplicity that domain-general predicates share the same vocabulary in ungrounded and grounded representations, the ungrounded representation for the example utterance is:

    answer(exclude(states(all), border(texas)))

where states and border are natural language predicates. In this work we consider five types of domain-general predicates, illustrated in Table 1.

    Predicate            Usage                                    Sub-categories
    answer               denotation wrapper                       —
    type                 entity type checking                     stateid, cityid, riverid, etc.
    all                  querying for an entire set of entities   —
    aggregation          one-argument meta predicates for sets    count, largest, smallest, etc.
    logical connectives  two-argument meta predicates for sets    intersect, union, exclude

Table 1: List of domain-general predicates.

Notice that domain-general predicates are often implicit, or represent extra-sentential knowledge. For example, the predicate all in the above utterance represents all states in the domain, which are not mentioned in the utterance but are critical for working out the utterance denotation. Finally, note that for certain domain-general predicates, it also makes sense to extract natural language rationales (e.g., not is indicative of exclude), but we do not find this helpful in experiments.

In this work we constrain ungrounded representations to be structurally isomorphic to grounded ones.³ In order to derive the target logical forms, all we have to do is replace predicates in the ungrounded representations with symbols in the knowledge base.

³As a more general definition, we consider two semantic graphs isomorphic if the graph structures governed by domain-general predicates, ignoring local structures containing only natural language predicates, are the same (Section 5).
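Under this isomorphism constraint, grounding amounts to relabeling predicate nodes of the tree while leaving its shape untouched. The sketch below (ours) makes this concrete; the hand-written lexicon is a hypothetical stand-in for the learned neural mapping of Section 3.2, which scores knowledge base symbols for each natural language predicate.

    # Grounding as predicate substitution over an isomorphic tree (illustration).
    UNGROUNDED = ("answer", [("exclude", [("states", ["all"]),
                                          ("border", ["texas"])])])

    # Hypothetical lexicon standing in for the learned ungrounded-to-grounded
    # mapping; domain-general predicates (answer, exclude) are left unchanged.
    LEXICON = {"states": "state", "border": "next_to"}

    def ground(node):
        if isinstance(node, str):          # terminal: entity or all
            return node
        head, children = node
        return (LEXICON.get(head, head), [ground(c) for c in children])

    def to_funql(node):
        if isinstance(node, str):
            return node
        head, children = node
        return f"{head}({', '.join(to_funql(c) for c in children)})"

    print(to_funql(ground(UNGROUNDED)))
    # -> answer(exclude(state(all), next_to(texas)))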
3 Modeling

In this section, we discuss our neural model, which maps utterances to target logical forms. The semantic parsing task is decomposed into two stages: we first explain how an utterance is converted to an intermediate representation (Section 3.1), and then describe how it is grounded to a knowledge base (Section 3.2).

3.1 Generating Ungrounded Representations

At this stage, utterances are mapped to intermediate representations with a transition-based algorithm. In general, the transition system generates the representation by following a derivation tree (which contains a set of applied rules) and some canonical generation order (e.g., pre-order). For FunQL, a simple solution exists since the representation itself encodes the derivation. Consider again answer(exclude(states(all), border(texas))), which is tree structured. Each predicate (e.g., border) can be visualized as a non-terminal node of the tree and each entity (e.g., texas) as a terminal. The predicate all is a special case which acts as a terminal directly. We can generate the tree top-down with a transition system reminiscent of recurrent neural network grammars (RNNGs; Dyer et al. 2016). Similar to RNNG, our algorithm uses a buffer to store input tokens in the utterance and a stack to store partially completed trees. A major difference in our semantic parsing scenario is that tokens in the buffer are not fetched in sequential order or removed from the buffer. This is because the lexical alignment between an utterance and its semantic representation is hidden. Moreover, some domain-general predicates cannot be clearly anchored to a token span. Therefore, we allow the generation algorithm to pick tokens and combine logical forms in arbitrary orders, conditioning on the entire set of sentential features. Alternative solutions in the traditional semantic parsing literature include a floating chart parser (Pasupat and Liang, 2015), which allows logical predicates to be constructed out of thin air.

Our transition system defines three actions, namely NT, TER, and RED, explained below.

NT(X) generates a Non-Terminal predicate. This predicate is either a natural language expression such as border, or one of the domain-general predicates exemplified in Table 1 (e.g., exclude). The type of predicate is determined by the placeholder X; once generated, the predicate is pushed onto the stack and represented as a non-terminal followed by an open bracket (e.g., 'border('). The open bracket will be closed by a reduce operation.

TER(X) generates a TERminal entity or the special predicate all. Note that the terminal choices do not include variables (e.g., $0, $1), since FunQL is a variable-free language which sufficiently captures the semantics of the datasets we work with. The framework could be extended to generate directed acyclic graphs by incorporating variables, with additional transition actions for handling variable mentions and co-reference.

RED stands for REDuce and is used for subtree completion. It recursively pops elements from the stack until an open non-terminal node is encountered. The non-terminal is popped as well, after which a composite term representing the entire subtree, e.g., border(texas), is pushed back onto the stack. If a RED action results in no more open non-terminals being left on the stack, the transition system terminates. Table 2 shows the transition actions used to generate our running example.

    Sentence: which states do not border texas
    Non-terminal symbols in buffer: which, states, do, not, border
    Terminal symbols in buffer: texas

    Stack                                                     Action  NT choice  TER choice
                                                              NT      answer
    answer (                                                  NT      exclude
    answer ( exclude (                                        NT      states
    answer ( exclude ( states (                               TER                all
    answer ( exclude ( states ( all                           RED
    answer ( exclude ( states ( all )                         NT      border
    answer ( exclude ( states ( all ), border (               TER                texas
    answer ( exclude ( states ( all ), border ( texas         RED
    answer ( exclude ( states ( all ), border ( texas )       RED
    answer ( exclude ( states ( all ), border ( texas ) )     RED
    answer ( exclude ( states ( all ), border ( texas ) ) )

Table 2: Actions taken by the transition system for generating the ungrounded meaning representation of the example utterance. Symbols in red indicate domain-general predicates.
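As a minimal illustration of how the three actions assemble a logical form, the sketch below replays the action sequence of Table 2; the sequence is hard-coded here, whereas in the full model both actions and terms are predicted by the neural network.

    # Minimal, non-neural executor for the NT/TER/RED transition system.
    def execute(actions):
        stack = []
        for act, term in actions:
            if act == "NT":             # open non-terminal, e.g. 'border('
                stack.append(term + "(")
            elif act == "TER":          # terminal entity or the predicate all
                stack.append(term)
            elif act == "RED":          # pop to the open non-terminal, rebuild
                args = []
                while not stack[-1].endswith("("):
                    args.append(stack.pop())
                head = stack.pop()
                stack.append(head + ", ".join(reversed(args)) + ")")
        assert len(stack) == 1, "open non-terminals remain"
        return stack[0]

    ACTIONS = [("NT", "answer"), ("NT", "exclude"), ("NT", "states"),
               ("TER", "all"), ("RED", None), ("NT", "border"),
               ("TER", "texas"), ("RED", None), ("RED", None), ("RED", None)]
    print(execute(ACTIONS))
    # -> answer(exclude(states(all), border(texas)))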

The model generates the ungrounded representation U conditioned on utterance x by recursively calling one of the above three actions. Note that U is defined by a sequence of actions (denoted by a) and a sequence of term choices (denoted by u), as shown in Table 2. The conditional probability p(U|x) is factorized over time steps as:

    p(U | x) = p(a, u | x)                                                        (1)
             = \prod_{t=1}^{T} p(a_t | a_{<t}, x) \, p(u_t | a_{\le t}, x)^{I(a_t \neq RED)}   (2)

where I(·) is an indicator function, i.e., a term u_t is predicted only when the action a_t is NT or TER.
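A sketch (ours) of how the factorized objective in Equations (1) and (2) can be scored for the action sequence of Table 2. The uniform per-step distributions are dummies standing in for the model's softmax outputs; only the factorization itself, including the indicator that omits term probabilities at RED steps, follows the text.

    import math

    def log_prob_U(steps, action_dist, term_dist):
        """log p(U|x) = sum_t [log p(a_t|a_<t,x) + I(a_t != RED) log p(u_t|.)].
        steps: list of (action, term) pairs, with term None for RED."""
        history, total = [], 0.0
        for action, term in steps:
            total += math.log(action_dist(history)[action])
            if action != "RED":                 # indicator I(a_t != RED)
                total += math.log(term_dist(history, action)[term])
            history.append(action)
        return total

    # Dummy uniform distributions in place of the neural model's softmaxes.
    ACTIONS3 = ("NT", "TER", "RED")
    TERMS = ("answer", "exclude", "states", "border", "all", "texas")
    action_dist = lambda hist: {a: 1 / len(ACTIONS3) for a in ACTIONS3}
    term_dist = lambda hist, act: {u: 1 / len(TERMS) for u in TERMS}

    steps = [("NT", "answer"), ("NT", "exclude"), ("NT", "states"),
             ("TER", "all"), ("RED", None), ("NT", "border"),
             ("TER", "texas"), ("RED", None), ("RED", None), ("RED", None)]
    print(log_prob_U(steps, action_dist, term_dist))
    # = 10*log(1/3) + 6*log(1/6): ten actions, six non-RED term choices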
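One way to realize the adaptively weighted buffer representation is soft attention over buffer token encodings, conditioned on the stack summary s_t, followed by concatenation to form e_t. The sketch below (ours) uses dot-product scoring for brevity; Bahdanau et al. (2015) use an additive scoring network, and this excerpt does not spell out the paper's exact parameterization, so treat the details as assumptions. Dimensions are arbitrary.

    import numpy as np

    def system_representation(buffer_vecs, s_t):
        """Compute e_t = [b_t; s_t], where b_t is an attention-weighted
        summary of the buffer conditioned on the stack representation s_t.
        buffer_vecs: (n_tokens, d) token encodings; s_t: (d,) stack state."""
        scores = buffer_vecs @ s_t               # one relevance score per token
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        b_t = weights @ buffer_vecs              # adaptively weighted buffer rep
        return np.concatenate([b_t, s_t])        # system representation e_t

    rng = np.random.default_rng(0)
    buffer_vecs = rng.normal(size=(6, 8))  # e.g. "which states do not border texas"
    s_t = rng.normal(size=8)
    print(system_representation(buffer_vecs, s_t).shape)   # (16,)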

References

Jayant Krishnamurthy and Tom Mitchell. 2012. Weakly supervised training of semantic parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Jeju Island, Korea, pages 754–765.

Jayant Krishnamurthy and Tom M. Mitchell. 2015. Learning a compositional semantics for Freebase with an open predicate vocabulary. Transactions of the Association for Computational Linguistics 3:257–270.

Tom Kwiatkowksi, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Cambridge, MA, pages 1223–1233.

Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China, pages 1470–1480.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar, pages 1532–1543.

Siva Reddy, Mirella Lapata, and Mark Steedman. 2014. Large-scale semantic parsing without question-answer pairs. Transactions of the Association for Computational Linguistics 2:377–392.

Siva Reddy, Oscar Täckström, Michael Collins, Tom Kwiatkowski, Dipanjan Das, Mark Steedman, and Mirella Lapata. 2016. Transforming dependency structures to logical forms for semantic parsing. Transactions of the Association for Computational Linguistics 4:127–140.

Sunita Sarawagi and William W. Cohen. 2005. Semi-Markov conditional random fields for information extraction. In Advances in Neural Information Processing Systems 17. MIT Press, pages 1185–1192.

Yu Su, Huan Sun, Brian Sadler, Mudhakar Srivatsa, Izzeddin Gur, Zenghui Yan, and Xifeng Yan. 2016. On generating characteristic-rich question sets for QA evaluation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas, pages 562–572.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27. MIT Press, pages 3104–3112.

Ferhan Ture and Oliver Jojic. 2016. Simple and effective question answering with recurrent neural networks. arXiv preprint arXiv:1606.05029.

Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015. Grammar as a foreign language. In Advances in Neural Information Processing Systems 28. MIT Press, pages 2773–2781.

Yuk Wah Wong and Raymond Mooney. 2006. Learning for semantic parsing with statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference. New York City, USA, pages 439–446.

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning. pages 2048–2057.

Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2016. Question answering on Freebase via relation extraction and textual evidence. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany, pages 2326–2336.

Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Baltimore, Maryland, pages 956–966.

Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. 2015. Semantic parsing via staged query graph generation: Question answering with knowledge base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China, pages 1321–1331.

Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, and Jina Suh. 2016. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Berlin, Germany, pages 201–206.

John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the 13th National Conference on Artificial Intelligence. Portland, Oregon, pages 1050–1055.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence. Edinburgh, Scotland, pages 658–666.

Luke Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic, pages 678–687.

Kai Zhao and Liang Huang. 2015. Type-driven incremental semantic parsing with polymorphism. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, Colorado, pages 1416–1421.